History log of /fs/ocfs2/alloc.c
Revision Date Author Comments
981035b47d7da8ba7c153ed431bf515f593853d8 07-Aug-2014 Yingtai Xie <xieyingtai@huawei.com> ocfs2: correctly check the return value of ocfs2_search_extent_list

ocfs2_search_extent_list may return -1, so we should check the return
value in ocfs2_split_and_insert; otherwise it may cause an out-of-bounds
array index.

Also, ocfs2_search_extent_list can only return a value less than
el->l_next_free_rec, so checking whether it is equal to or larger than
le16_to_cpu(el->l_next_free_rec) is meaningless.
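
The check described above can be sketched in userspace C. This is only an
illustration: the struct and function names (demo_extent_list,
demo_search_extent_list, demo_lookup) are hypothetical stand-ins, not the
kernel's definitions. Only the idea is taken from the commit message: the
search returns -1 on a miss, and the caller must test for that before
indexing the array.

```c
#include <stddef.h>

/* Hypothetical, simplified record covering [cpos, cpos + clusters). */
struct demo_extent_rec {
	unsigned int cpos;
	unsigned int clusters;
};

struct demo_extent_list {
	unsigned short next_free_rec;	/* number of records in use */
	struct demo_extent_rec recs[8];
};

/*
 * Return the index of the record containing cpos, or -1 if none does.
 * A non-negative result is always < next_free_rec, so callers only
 * need to test for -1.
 */
int demo_search_extent_list(const struct demo_extent_list *el,
			    unsigned int cpos)
{
	int i;

	for (i = 0; i < el->next_free_rec; i++) {
		const struct demo_extent_rec *rec = &el->recs[i];

		if (cpos >= rec->cpos && cpos < rec->cpos + rec->clusters)
			return i;
	}
	return -1;
}

/* Caller: refuse to index the array when the search failed. */
const struct demo_extent_rec *
demo_lookup(const struct demo_extent_list *el, unsigned int cpos)
{
	int index = demo_search_extent_list(el, cpos);

	if (index < 0)
		return NULL;	/* would otherwise have read recs[-1] */
	return &el->recs[index];
}
```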

Signed-off-by: Yingtai Xie <xieyingtai@huawei.com>
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
a9e9acaeb0a981a6dfa54b32dd756103aeefa6a7 05-Jun-2014 Xue jiufei <xuejiufei@huawei.com> ocfs2: fix umount hang while shutting down truncate log

Revert commit 75f82eaa502c ("ocfs2: fix NULL pointer dereference when
dismount and ocfs2rec simultaneously") because it may cause a umount
hang while shutting down the truncate log.

fix NULL pointer dereference when dismount and ocfs2rec simultaneously

The situation is as follows:
ocfs2_dismount_volume
 -> ocfs2_recovery_exit
   -> free osb->recovery_map
 -> ocfs2_truncate_shutdown
   -> lock global bitmap inode
     -> ocfs2_wait_for_recovery
       -> check whether osb->recovery_map->rm_used is zero

Because osb->recovery_map is already freed, rm_used can hold any
value, so umount may hang.

To prevent a NULL pointer dereference while getting sys_root_inode, we use
an osb_tl_disable flag to stop scheduling osb_truncate_log_wq after
truncate log shutdown.
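
A minimal userspace sketch of the "disable flag" idea follows. The names
(demo_osb, demo_schedule_truncate_log_flush, demo_truncate_log_shutdown)
are hypothetical, and the sketch deliberately ignores locking and
workqueue details; it only shows that scheduling is refused once shutdown
has set the flag.

```c
#include <stdbool.h>

struct demo_osb {
	bool tl_disable;	/* set once the truncate log is shut down */
	unsigned int queued;	/* how many flushes were scheduled */
};

/* Schedule a deferred truncate-log flush unless shutdown already ran. */
bool demo_schedule_truncate_log_flush(struct demo_osb *osb)
{
	if (osb->tl_disable)
		return false;	/* too late: nothing may be queued now */
	osb->queued++;
	return true;
}

/* Shutdown path: from here on, no new flush work is accepted. */
void demo_truncate_log_shutdown(struct demo_osb *osb)
{
	osb->tl_disable = true;
}
```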

Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6fdb702d6262b18b1b41a35f1f81903b0a2bc2c9 03-Apr-2014 Darrick J. Wong <darrick.wong@oracle.com> ocfs2: call ocfs2_update_inode_fsync_trans when updating any inode

Ensure that ocfs2_update_inode_fsync_trans() is called any time we touch
an inode in a given transaction. This is a follow-on to the previous
patch to reduce lock contention and deadlocking during an fsync
operation.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Wengang <wen.gang.wang@oracle.com>
Cc: Greg Marsden <greg.marsden@oracle.com>
Cc: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2931cdcb49194503b19345c597b68fdcf78396f8 03-Apr-2014 Darrick J. Wong <darrick.wong@oracle.com> ocfs2: improve fsync efficiency and fix deadlock between aio_write and sync_file

Currently, ocfs2_sync_file grabs i_mutex and forces the current journal
transaction to complete. This isn't terribly efficient, since sync_file
really only needs to wait for the last transaction involving that inode
to complete, and this doesn't require i_mutex.

Therefore, implement the necessary bits to track the newest tid
associated with an inode, and teach sync_file to wait for that instead
of waiting for everything in the journal to commit. Furthermore, only
issue the flush request to the drive if jbd2 hasn't already done so.
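
As a rough sketch of the per-inode tid tracking described above, the
following userspace C models a journal with a running and a committed
transaction id. All names (demo_journal, demo_inode, demo_tid_gt,
demo_fsync_must_wait) are hypothetical; the wrap-safe comparison is an
assumption modelled on how transaction ids are usually compared, not a
copy of the kernel code.

```c
#include <stdbool.h>

typedef unsigned int demo_tid_t;

struct demo_journal {
	demo_tid_t running_tid;		/* transaction currently open */
	demo_tid_t committed_tid;	/* newest transaction on disk */
};

struct demo_inode {
	demo_tid_t sync_tid;		/* newest tid that touched this inode */
};

/* Wrap-safe "a is newer than b". */
bool demo_tid_gt(demo_tid_t a, demo_tid_t b)
{
	return (int)(a - b) > 0;
}

/* Call whenever the inode is modified inside a transaction. */
void demo_update_inode_fsync_trans(struct demo_inode *inode,
				   const struct demo_journal *j)
{
	inode->sync_tid = j->running_tid;
}

/* fsync only has to wait if the inode's last tid is not yet committed. */
bool demo_fsync_must_wait(const struct demo_inode *inode,
			  const struct demo_journal *j)
{
	return demo_tid_gt(inode->sync_tid, j->committed_tid);
}
```

The point of the design is that an unrelated, still-running transaction
no longer delays fsync for an inode whose own changes are already on disk.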

This also eliminates the deadlock between ocfs2_file_aio_write() and
ocfs2_sync_file(). aio_write takes i_mutex then calls
ocfs2_aiodio_wait() to wait for unaligned dio writes to finish.
However, if that dio completion involves calling fsync, then we can get
into trouble when some ocfs2_sync_file tries to take i_mutex.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
d62e74be1270c89fbaf7aada8218bfdf62d00a58 10-Feb-2014 Younger Liu <younger.liu@huawei.com> ocfs2: fix issue that ocfs2_setattr() does not deal with new_i_size==i_size

The issue scenario is as follows:

- Create a small file and fallocate a large amount of disk space for it
with the FALLOC_FL_KEEP_SIZE option.

- ftruncate the file back to its original size. The free disk space
does not return to its previous value. This is a real bug, which is
fixed in this patch.

To solve the issue above, we modified ocfs2_setattr(): if
attr->ia_size != i_size_read(inode), it calls ocfs2_truncate_file() and
truncates the disk space to attr->ia_size.

Signed-off-by: Younger Liu <younger.liu@huawei.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Tested-by: Jie Liu <jeff.liu@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Reviewed-by: Jensen <shencanquan@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fb951eb5e167de9f07973ce0dfff674a2019bfab 06-Feb-2014 Zongxun Wang <wangzongxun@huawei.com> ocfs2: free allocated clusters if error occurs after ocfs2_claim_clusters

Even when using the same jbd2 handle, we cannot roll back a transaction.
So once an error occurs after clusters have been allocated successfully,
the allocated clusters will never be used, which means they are lost. For
example, ocfs2_claim_clusters succeeds when expanding a file, but
ocfs2_insert_extent then fails. So we need to free the allocated
clusters if they end up unused.
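
The pattern can be sketched with a toy bitmap allocator. Everything here
(demo_allocator, demo_claim_cluster, demo_insert_extent,
demo_extend_file) is a hypothetical stand-in; the sketch only shows that
a failed step after a successful claim must hand the cluster back
explicitly, because nothing else will.

```c
#include <stdbool.h>

#define DEMO_NCLUSTERS 16

struct demo_allocator {
	bool used[DEMO_NCLUSTERS];
};

/* Claim the first free cluster; return its index or -1 if full. */
int demo_claim_cluster(struct demo_allocator *a)
{
	int i;

	for (i = 0; i < DEMO_NCLUSTERS; i++) {
		if (!a->used[i]) {
			a->used[i] = true;
			return i;
		}
	}
	return -1;
}

void demo_free_cluster(struct demo_allocator *a, int idx)
{
	a->used[idx] = false;
}

/* Stand-in for the step that may fail after allocation. */
int demo_insert_extent(int cluster, bool simulate_failure)
{
	(void)cluster;
	return simulate_failure ? -1 : 0;
}

/* Return the cluster now in use, or a negative value on error. */
int demo_extend_file(struct demo_allocator *a, bool simulate_failure)
{
	int cluster = demo_claim_cluster(a);
	int status;

	if (cluster < 0)
		return -1;

	status = demo_insert_extent(cluster, simulate_failure);
	if (status < 0) {
		/* No rollback will undo the claim, so undo it by hand. */
		demo_free_cluster(a, cluster);
		return status;
	}
	return cluster;
}
```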

Signed-off-by: Zongxun Wang <wangzongxun@huawei.com>
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
aa89762c54800208d5afdcd8e6bf124818f17fe0 22-Jan-2014 Jie Liu <jeff.liu@oracle.com> ocfs2: return EINVAL if the given range to discard is less than block size

For the FITRIM ioctl(2), we should not stay silent if the given range
length is less than a block size, as no data blocks would be
discarded. Hence it should return EINVAL instead. This issue can be
verified via xfstests/generic/288, which tests FITRIM argument
handling.
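
The argument check amounts to something like the following sketch.
demo_check_trim_range is a hypothetical helper, not the function the
patch touches, and it covers only the length test named in the message.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Reject a trim request whose length cannot cover even one block.
 * Returns 0 if the range is acceptable, -EINVAL otherwise.
 */
int demo_check_trim_range(uint64_t len, uint32_t block_size)
{
	if (len < block_size)
		return -EINVAL;
	return 0;
}
```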

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7391a294b861bf2c3b762dfdcf61b9c5f1bffa1f 13-Nov-2013 Rui Xiang <rui.xiang@huawei.com> ocfs2: return ENOMEM when sb_getblk() fails

The only reason for sb_getblk() failing is if it can't allocate the
buffer_head. So return ENOMEM instead when it fails.

[joseph.qi@huawei.com: ocfs2_symlink_get_block() and ocfs2_read_blocks_sync() and ocfs2_read_blocks() need the same change]
Signed-off-by: Rui Xiang <rui.xiang@huawei.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
33add0e3a09a62c7796ea9838243c1cd933d8543 04-Jul-2013 Joseph Qi <joseph.qi@huawei.com> ocfs2: fix mutex_unlock and possible memory leak in ocfs2_remove_btree_range

In ocfs2_remove_btree_range, when ocfs2_lock_refcount_tree or
ocfs2_prepare_refcount_change_for_del fails, the code jumps to out and
then calls mutex_unlock without a preceding mutex_lock. And when
ocfs2_reserve_blocks_for_rec_trunc fails, it should free ref_tree
before returning.
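
The unlock half of this fix is the usual "one label per acquired
resource" unwinding pattern. A sketch with a fake lock follows;
demo_lock, demo_remove_range and the fail_step parameter are
hypothetical and exist only to show which label each failure must jump
to.

```c
struct demo_lock {
	int held;	/* 0 = unlocked, 1 = locked; anything else is a bug */
};

void demo_lock_acquire(struct demo_lock *l) { l->held++; }
void demo_lock_release(struct demo_lock *l) { l->held--; }

/*
 * fail_step: 0 = success, 1 = fail before the lock is taken,
 *            2 = fail after the lock is taken.
 */
int demo_remove_range(struct demo_lock *l, int fail_step)
{
	int ret = 0;

	if (fail_step == 1) {
		ret = -1;
		goto bail;		/* lock never taken: skip the unlock */
	}

	demo_lock_acquire(l);

	if (fail_step == 2) {
		ret = -1;
		goto out_unlock;	/* lock held: must release it */
	}

out_unlock:
	demo_lock_release(l);
bail:
	return ret;
}
```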

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
d787ab0977c58e2c421b8d0ab49e363893ddb814 22-Feb-2013 Tim Gardner <tim.gardner@canonical.com> ocfs2: remove kfree() redundant null checks

smatch analysis indicates a number of redundant NULL checks before
calling kfree(), eg:

fs/ocfs2/alloc.c:6138 ocfs2_begin_truncate_log_recovery() info:
redundant null check on *tl_copy calling kfree()

fs/ocfs2/alloc.c:6755 ocfs2_zero_range_for_truncate() info:
redundant null check on pages calling kfree()

etc....
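
The same property holds for free() in standard C, which makes a
userspace illustration possible. demo_ctx and demo_ctx_release are
hypothetical names; the sketch only shows that the guard can be dropped
because freeing a NULL pointer is defined to do nothing.

```c
#include <stdlib.h>

struct demo_ctx {
	char *buf;	/* may legitimately be NULL */
};

void demo_ctx_release(struct demo_ctx *ctx)
{
	/* No "if (ctx->buf)" guard needed: free(NULL) is a no-op. */
	free(ctx->buf);
	ctx->buf = NULL;
}
```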

[akpm@linux-foundation.org: revert dubious change in ocfs2_begin_truncate_log_recovery()]
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3a251f04fe97c3d335b745c98e4b377e3c3899f2 13-Apr-2012 Al Viro <viro@zeniv.linux.org.uk> ocfs2: ->l_next_free_req breakage on big-endian

It's le16, not le32...
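
Why the width matters can be shown without a big-endian machine. On such
a CPU, converting a little-endian field to native order is a byte swap,
so the sketch below applies the swaps by hand. demo_swab16 and
demo_swab32 are hypothetical helpers, and the value 0x0500 is an assumed
example: it is what a big-endian load sees when the on-disk bytes are
05 00 (the 16-bit value 5).

```c
#include <stdint.h>

/* What a 16-bit little-endian conversion does on a big-endian CPU. */
uint16_t demo_swab16(uint16_t x)
{
	return (uint16_t)((x << 8) | (x >> 8));
}

/* What a 32-bit little-endian conversion does on a big-endian CPU. */
uint32_t demo_swab32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) << 8)  |
	       ((x >> 8) & 0x0000ff00u)  |
	       (x >> 24);
}
```

Swapping the 16-bit field as 16 bits recovers 5. Swapping it as if it
were 32 bits moves the payload into the upper half of the word, so the
low 16 bits come out as zero. On a little-endian CPU both conversions
are no-ops, which is why the mistake goes unnoticed there.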

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
b8a0ae579fb8d9b21008ac386be08b9428902455 12-Oct-2011 Wengang Wang <wen.gang.wang@oracle.com> ocfs2: Commit transactions in error cases -v2

Three cases were found where, on error, journal transactions are neither
committed nor aborted. We should handle these cases by committing the
transactions. Otherwise a journal handle is left behind, which causes
the next ocfs2_start_trans() in the same process context to get the
wrong credits.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
10fca35ff12ad2a7017bce6567cffe9da443d7a2 23-May-2011 Tao Ma <boyu.mt@taobao.com> ocfs2: Add trace event for trim.

Add the corresponding trace event for trim.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
e80de36d8dbff216a384e9204e54d59deeadf344 23-May-2011 Tao Ma <boyu.mt@taobao.com> ocfs2: Add ocfs2_trim_fs for SSD trim support.

Add ocfs2_trim_fs to support trimming freed clusters in the
volume. A range will be given and all the freed clusters greater
than minlen will be discarded to the block layer.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
25985edcedea6396277003854657b5f3cb31a628 31-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi> Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
81bad69749623062fae2f94e2d98dd43d95a36f4 22-Feb-2011 Tao Ma <boyu.mt@taobao.com> ocfs2: Remove DISK_ALLOC from masklog.

Since all 4 files that use DISK_ALLOC (localalloc.c, suballoc.c,
alloc.c and resize.c) have been changed to trace events,
remove the DISK_ALLOC masklog entirely.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
a09d09b8f8d7c8acd46d96e3e9899bd1461fc036 22-Feb-2011 Tao Ma <boyu.mt@taobao.com> ocfs2: Remove mlog(0) from fs/ocfs2/alloc.c

This is the first attempt at replacing debug mlog(0,...) with
trace events. Wengang did some work on this in his original
patch
http://oss.oracle.com/pipermail/ocfs2-devel/2009-November/005513.html
but he didn't finish it.

So this patch removes all mlog(0,...) from alloc.c and adds
the corresponding trace events. Different mlogs have different
solutions.
1. Some are replaced with trace event directly.
2. Some are replaced and some new parameters are added since
I think we need to know the btree owner in that case.
3. Some are combined into one trace event.
4. Some redundant mlogs are removed.
What's more, it defines some event classes so that we can use
them later.

Cc: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
c1e8d35ef5ffb393b94a192034b5e3541e005d75 07-Mar-2011 Tao Ma <boyu.mt@taobao.com> ocfs2: Remove EXIT from masklog.

mlog_exit is used to record the exit status of a function.
But because it is added in so many functions, if we enable it,
the system logs get filled up quickly and cause too much I/O.
So in practice no one can enable it on a production system, or even
on a test system.

This patch removes it or changes it:
1. if all the error paths already use mlog_errno, it is just removed.
Otherwise, it is replaced by mlog_errno.
2. if it is used to print some return value, it is replaced with
mlog(0,...).
mlog_exit_ptr is changed to mlog(0,...).
All those mlog(0,...) will be replaced with trace events later.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
ef6b689b63b9f5227ccee6f16dd9ee3faf58a464 21-Feb-2011 Tao Ma <boyu.mt@taobao.com> ocfs2: Remove ENTRY from masklog.

ENTRY is used to record the entry of a function.
But because it is added in so many functions, if we enable it,
the system logs get filled up quickly and cause too much I/O.
So in practice no one can enable it on a production system, or even
on a test system.

So for mlog_entry_void, we just remove it.
For mlog_entry(...), we replace it with mlog(0,...), which
will be replaced by trace events later.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
aecf58661961a553c254cf14536f70349127affb 23-Dec-2010 Tao Ma <tao.ma@oracle.com> ocfs2: Remove unused truncate function from alloc.c

Tristan Ye has done some refactoring of our truncate
process, so some functions like ocfs2_prepare_truncate and
ocfs2_free_truncate_context are no longer used and we'd better
remove them.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
50308d813bf26500fed671882469939fd19403a3 04-Nov-2010 Tao Ma <tao.ma@oracle.com> ocfs2: Try to free truncate log when meeting ENOSPC in write.

Recently, one of our colleagues hit a problem: if we
write/delete a 32MB file repeatedly, we eventually get an ENOSPC.
The corresponding bug is 1288.
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1288

The real problem is that although we have freed the clusters,
they sit in the truncate log, where they are accumulated so that
we can free them all at once.

So this patch tries to resolve it. If we see -ENOSPC
in ocfs2_write_begin_no_lock, we check whether the truncate
log has enough clusters for our needs. If it does, we
flush the truncate log at that point and try again. This method
was inspired by Mark Fasheh <mfasheh@suse.com>. Thanks.
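
The retry logic described above can be sketched as follows. The model is
hypothetical (demo_fs, demo_alloc, demo_flush_truncate_log,
demo_write_begin) and reduces the truncate log to a counter of clusters
that are freed but not yet allocatable.

```c
#include <errno.h>

struct demo_fs {
	unsigned int free_clusters;	/* immediately allocatable */
	unsigned int tl_clusters;	/* freed, but parked in the truncate log */
};

int demo_alloc(struct demo_fs *fs, unsigned int want)
{
	if (fs->free_clusters < want)
		return -ENOSPC;
	fs->free_clusters -= want;
	return 0;
}

/* Flushing returns the parked clusters to the allocator. */
void demo_flush_truncate_log(struct demo_fs *fs)
{
	fs->free_clusters += fs->tl_clusters;
	fs->tl_clusters = 0;
}

int demo_write_begin(struct demo_fs *fs, unsigned int want)
{
	int ret = demo_alloc(fs, want);

	/* Retry once, and only if the truncate log holds enough clusters. */
	if (ret == -ENOSPC && fs->tl_clusters >= want) {
		demo_flush_truncate_log(fs);
		ret = demo_alloc(fs, want);
	}
	return ret;
}
```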

Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
9b4c0ff32ccd87ab52d4c5bd0a0536febce11370 24-Aug-2010 Jan Kara <jack@suse.cz> ocfs2: Fix deadlock when allocating page

We cannot call grab_cache_page() when holding filesystem locks or with
a transaction started as grab_cache_page() calls page allocation with
GFP_KERNEL flag and thus page reclaim can recurse back into the filesystem
causing deadlocks or various assertion failures. We have to use
find_or_create_page() instead and pass it GFP_NOFS as we do with other
allocations.

Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Tao Ma <tao.ma@oracle.com>
ee149a7c6cbaee0e3a1a7d9e9f92711228ef5236 11-May-2010 Tristan Ye <tristan.ye@oracle.com> Ocfs2: Make ocfs2_find_cpos_for_left_leaf() public.

The original reason for pulling ocfs2_find_cpos_for_left_leaf() out of
alloc.c is to benefit the punching-holes optimization patch. It can,
however, also be used by other functions in the future that want to do
the same job.

Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
78f94673d7faf01677f374f4ebbf324ff1a0aa6e 11-May-2010 Tristan Ye <tristan.ye@oracle.com> Ocfs2: Optimize ocfs2 truncate to use ocfs2_remove_btree_range() instead.

Truncate is just a special case of punching holes (from the new i_size
to the end), so we can take advantage of the existing
ocfs2_remove_btree_range() to reduce the complexity and redundancy in
alloc.c. The goal here is to make truncate more generic and
straightforward.

Several functions only used by ocfs2_commit_truncate() will simply be
removed.

ocfs2_remove_btree_range() was originally used by the hole punching
code, which didn't take refcount trees into account (definitely a bug).
We therefore need to change that func a bit to handle refcount trees.
It must take the refcount lock, calculate and reserve blocks for
refcount tree changes, and decrease refcounts at the end. We replace
ocfs2_lock_allocators() here by adding a new func
ocfs2_reserve_blocks_for_rec_trunc() which accepts some extra blocks to
reserve. This will not hurt any other code using
ocfs2_remove_btree_range() (such as dir truncate and hole punching).

I merged the following steps into one patch since they are
logically doing one thing, though I know it makes the patch a bit
large to review.

1). Remove redundant code used by ocfs2_commit_truncate(), since we're
moving to ocfs2_remove_btree_range anyway.

2). Add a new func ocfs2_reserve_blocks_for_rec_trunc() for purpose of
accepting some extra blocks to reserve.

3). Change ocfs2_prepare_refcount_change_for_del() a bit to fit our
needs. It's safe to do this since it's only being called by
truncate.

4). Change ocfs2_remove_btree_range() a bit to take refcount case into
account.

5). Finally, we change ocfs2_commit_truncate() to call
ocfs2_remove_btree_range() in a proper way.

The patch has passed basic sanity testing; stress tests
with heavier workloads are still expected.

Based on this patch, fixing the punching holes bug will be fairly easy.

Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
c901fb00731e307c2c6e8c7d5eee005df5835f9d 26-Apr-2010 Tao Ma <tao.ma@oracle.com> ocfs2: Make ocfs2_extend_trans() really extend.

In ocfs2, we use ocfs2_extend_trans() to extend a journal handle's
blocks. But if jbd2_journal_extend() fails, it will only restart
with the new number of blocks. This tends to be awkward since
in most cases we want additional reserved blocks. It makes our code
harder to maintain since the caller can't be sure all the original
blocks will not be accessed and dirtied again. There are 15 callers
of ocfs2_extend_trans() in fs/ocfs2, and 12 of them have to add
h_buffer_credits before they call ocfs2_extend_trans(). This patch makes
ocfs2_extend_trans() really extend atop the original block count.
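
A sketch of the "extend atop the original count" behaviour follows. The
journal is modelled as a bare credit counter, and demo_handle,
demo_journal_extend, demo_journal_restart and the can_extend switch are
hypothetical. What matters is that the fallback restart asks for the old
credits plus the new ones, so the caller sees the same total either way.

```c
struct demo_handle {
	int credits;
};

/* In-place extend; returns 0 on success, > 0 if it could not extend. */
int demo_journal_extend(struct demo_handle *h, int nblocks, int can_extend)
{
	if (!can_extend)
		return 1;
	h->credits += nblocks;
	return 0;
}

/* Restart: the handle starts over with exactly nblocks credits. */
int demo_journal_restart(struct demo_handle *h, int nblocks)
{
	h->credits = nblocks;
	return 0;
}

int demo_extend_trans(struct demo_handle *h, int nblocks, int can_extend)
{
	int old_nblocks = h->credits;
	int status = demo_journal_extend(h, nblocks, can_extend);

	if (status > 0) {
		/* Keep the original credits instead of dropping them. */
		status = demo_journal_restart(h, old_nblocks + nblocks);
	}
	return status;
}
```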

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
4fe370afaae49c57619bb0bedb75de7e7c168308 07-Dec-2009 Mark Fasheh <mfasheh@suse.com> ocfs2: use allocation reservations during file write

Add a per-inode reservations structure and pass it through to the
reservations code.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
ec20cec7a351584ca6c70ead012e73d61f9a8e04 19-Mar-2010 Joel Becker <joel.becker@oracle.com> ocfs2: Make ocfs2_journal_dirty() void.

jbd[2]_journal_dirty_metadata() only returns 0. It's been returning 0
since before the kernel moved to git. There is no point in checking
this error.

ocfs2_journal_dirty() has been faithfully returning the status since the
beginning. All over ocfs2, we have blocks of code checking this can't
fail status. In the past few years, we've tried to avoid adding these
checks, because they are pointless. But anyone who looks at our code
assumes they are needed.

Finally, ocfs2_journal_dirty() is made a void function. All error
checking is removed from other files. We'll BUG_ON() the status of
jbd2_journal_dirty_metadata() just in case they change it someday. They
won't.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
74380c479ad83addeff8a172ab95f59557b5b0c3 22-Mar-2010 Tao Ma <tao.ma@oracle.com> ocfs2: Free block to the right block group.

If the block we are going to free was allocated from
a discontiguous block group, we have to use suballoc_loc
to find the right group.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
2b6cb576aa80611f1f6a3c88708d1e68a8d97985 26-Mar-2010 Joel Becker <joel.becker@oracle.com> ocfs2: Set suballoc_loc on allocated metadata.

Get the suballoc_loc from ocfs2_claim_new_inode() or
ocfs2_claim_metadata(). Store it on the appropriate field of the block
we just allocated.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
1ed9b777f77929ae961d6f9cdf828a07200ba71c 06-May-2010 Joel Becker <joel.becker@oracle.com> ocfs2: ocfs2_claim_*() don't need an ocfs2_super argument.

They all take an ocfs2_alloc_context, which has the allocation inode.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Tao Ma <tao.ma@oracle.com>
5dd4056db84387975140ff2568eaa0406f07985e 03-Mar-2010 Christoph Hellwig <hch@infradead.org> dquot: cleanup space allocation / freeing routines

Get rid of the alloc_space, free_space, reserve_space, claim_space and
release_rsv dquot operations - they are always called from the filesystem
and if a filesystem really needs its own (which none currently does)
it can just call into its own routine directly.

Move shared logic into the common __dquot_alloc_space,
dquot_claim_space_nodirty and __dquot_free_space low-level methods,
and rationalize the wrappers around it to move as much as possible
code into the common block for CONFIG_QUOTA vs not. Also rename
all these helpers to be named dquot_* instead of vfs_dq_*.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
b89c54282db0c8634a2d2dc200f196d571750ce5 25-Jan-2010 Tiger Yang <tiger.yang@oracle.com> ocfs2: add extent block stealing for ocfs2 v5

This patch adds an extent block (metadata) stealing mechanism for
extent allocation. The mechanism is the same as inode stealing:
if there is no room in the slot-specific extent_alloc, we try to
allocate an extent block from the next slot.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Acked-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2cfd30adf6130dab3fbb130eb5f7b1fd42a70e31 23-Sep-2009 Christoph Hellwig <hch@lst.de> ocfs: stop using do_sync_mapping_range

do_sync_mapping_range(..., SYNC_FILE_RANGE_WRITE) is a very awkward way
to perform a filemap_fdatawrite_range.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
af901ca181d92aac3a7dc265144a9081a86d8f39 14-Nov-2009 André Goddard Rosa <andre.goddard@gmail.com> tree-wide: fix assorted typos all over the place

That is "success", "unknown", "through", "performance", "[re|un]mapping",
"access", "default", "reasonable", "[con]currently", "temperature",
"channel", "[un]used", "application", "example", "hierarchy", "therefore",
"[over|under]flow", "contiguous", "threshold", "enough" and others.

Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
38a04e432768ec0b016f3c687b4de31ac111ae59 30-Nov-2009 Tao Ma <tao.ma@oracle.com> ocfs2: Find proper end cpos for a leaf refcount block.

The ocfs2 refcount tree is stored as an extent tree, while
the leaf ocfs2_refcount_rec points to a refcount block.

The following steps can trigger a kernel panic.
mkfs.ocfs2 -b 512 -C 1M --fs-features=refcount $DEVICE
mount -t ocfs2 $DEVICE $MNT_DIR
FILE_NAME=$RANDOM
FILE_NAME_1=$RANDOM
FILE_REF="${FILE_NAME}_ref"
FILE_REF_1="${FILE_NAME}_ref_1"
for((i=0;i<305;i++))
do
	# /mnt/1048576 is a file of 1048576 bytes.
	cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME
	cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME_1
done
for((i=0;i<3;i++))
do
	cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME
done

for((i=0;i<2;i++))
do
	cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME
	cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME_1
done

cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME

for((i=0;i<11;i++))
do
	cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME
	cat /mnt/1048576 >> $MNT_DIR/$FILE_NAME_1
done
reflink $MNT_DIR/$FILE_NAME $MNT_DIR/$FILE_REF
# write_f is a program which will write some bytes to a file at offset.
# write_f -f file_name -l offset -w write_bytes.
./write_f -f $MNT_DIR/$FILE_REF -l $[310*1048576] -w 4096
./write_f -f $MNT_DIR/$FILE_REF -l $[306*1048576] -w 4096
./write_f -f $MNT_DIR/$FILE_REF -l $[311*1048576] -w 4096
./write_f -f $MNT_DIR/$FILE_NAME -l $[310*1048576] -w 4096
./write_f -f $MNT_DIR/$FILE_NAME -l $[311*1048576] -w 4096
reflink $MNT_DIR/$FILE_NAME $MNT_DIR/$FILE_REF_1
./write_f -f $MNT_DIR/$FILE_NAME -l $[311*1048576] -w 4096
#kernel panic here.

The reason is that if the ocfs2_extent_rec is the last record
in a leaf extent block, the old solution fails to find the
suitable end cpos. So this patch walks through the b-tree,
finds the next sub root and gets the cpos that the next sub-tree
starts from.

btw, I have run tristan's test case against the patched kernel
for several days and this type of kernel panic never happened again.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
c18b812d127a971901180188b918a7cd98ccd4d6 18-Aug-2009 Tao Ma <tao.ma@oracle.com> ocfs2: Make transaction extend more efficient.

In ocfs2_extend_rotate_transaction, op_credits is the original
credits in the handle and we only want to extend the credits
for the rotation, but the old solution always doubles it. This
is harmless for some minor operations, but for actions like
reflink we may rotate the tree many times and cause the credits
to increase dramatically. So this patch increases the credits
only by the desired amount.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
6ae23c5555176c5b23480c9c578ff27437085ba5 18-Aug-2009 Tao Ma <tao.ma@oracle.com> ocfs2: CoW refcount tree improvement.

During CoW, if the old extent record is refcounted, we allocate
some new clusters and do CoW. We can improve on this.
If the old extent has refcount=1, it is now only
used by this file. So we don't need to allocate new clusters; we just
remove the refcounted flag. We also have to remove
it from the refcount tree without deleting the extent itself.
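
The shortcut can be sketched with a toy refcount table indexed by
physical cluster. demo_refcount_tree, demo_extent and demo_cow_extent
are hypothetical, and the caller is assumed to supply the new physical
location when a real copy is needed.

```c
#include <stdbool.h>

#define DEMO_NPHYS 8

struct demo_refcount_tree {
	unsigned int count[DEMO_NPHYS];	/* 0 = not tracked */
};

struct demo_extent {
	unsigned int phys;	/* physical cluster the extent points at */
	bool refcounted;	/* extent is tracked in the refcount tree */
};

/* Prepare an extent for writing; return true if a real copy was made. */
bool demo_cow_extent(struct demo_refcount_tree *rt, struct demo_extent *e,
		     unsigned int new_phys)
{
	if (!e->refcounted)
		return false;	/* nothing shared: write in place */

	if (rt->count[e->phys] == 1) {
		/* Sole user: drop the tree entry but keep the clusters. */
		rt->count[e->phys] = 0;
		e->refcounted = false;
		return false;
	}

	/* Still shared: give up one reference and move to new clusters. */
	rt->count[e->phys]--;
	e->phys = new_phys;
	e->refcounted = false;
	return true;
}
```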

Signed-off-by: Tao Ma <tao.ma@oracle.com>
6f70fa519976a379d72781d927cf8e5f5b05ec86 25-Aug-2009 Tao Ma <tao.ma@oracle.com> ocfs2: Add CoW support.

This patch adds CoW support for a refcounted record.

The whole process will be:
1. Calculate how many clusters we need to CoW and where we start.
Extents that are not completely encompassed by the write will
be broken on 1MB boundaries.
2. Do CoW for the clusters with the help of page cache.
3. Change the b-tree structure with the new allocated clusters.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
bcbbb24a6a5c5b3e7b8e5284e0bfa23f45c32377 18-Aug-2009 Tao Ma <tao.ma@oracle.com> ocfs2: Decrement refcount when truncating refcounted extents.

Add 'Decrement refcount for delete' into the normal truncate
process. So for a refcounted extent record, decrement the
refcount instead of freeing the clusters.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
1aa75fea64bc26bda9be9b1b20ae253d7a481877 18-Aug-2009 Tao Ma <tao.ma@oracle.com> ocfs2: Add functions for extents refcounted.

Add function ocfs2_mark_extent_refcounted which can mark
an extent refcounted.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
1823cb0b9fe5e6d48017ee3f92428f69c0235d87 18-Aug-2009 Tao Ma <tao.ma@oracle.com> ocfs2: Add support of decrementing refcount for delete.

Given a physical cpos and length, decrement the refcount
in the tree. If the refcount for any portion of the extent goes
to zero, that portion is queued for freeing.
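
A per-cluster sketch of that behaviour is shown below. The flat arrays
and the name demo_decrease_refcount are hypothetical simplifications:
the real structure is a tree of records, not an array with one counter
per cluster.

```c
#define DEMO_NCLUSTERS 8

/*
 * Drop one reference from each cluster in [cpos, cpos + len).
 * Clusters whose count reaches zero are marked in dealloc[] so they can
 * be freed later. Returns how many clusters were queued.
 */
unsigned int demo_decrease_refcount(unsigned int *refcount,
				    unsigned int cpos, unsigned int len,
				    unsigned char *dealloc)
{
	unsigned int i, queued = 0;

	for (i = cpos; i < cpos + len && i < DEMO_NCLUSTERS; i++) {
		if (refcount[i] == 0)
			continue;	/* not tracked: nothing to drop */
		if (--refcount[i] == 0) {
			dealloc[i] = 1;	/* last user gone: queue for freeing */
			queued++;
		}
	}
	return queued;
}
```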

Signed-off-by: Tao Ma <tao.ma@oracle.com>
e2e9f6082b5ff099978774d5c0148e062344c2f9 18-Aug-2009 Tao Ma <tao.ma@oracle.com> ocfs2: move tree path functions to alloc.h.

fs/ocfs2/alloc.c now has more than 7000 lines. It contains our
basic b-tree operations. Although we have already made our b-tree
operations generic, the basic structure ocfs2_path, which is used
to iterate over one b-tree branch, is still static and limited to
use in alloc.c. As the refcount tree needs these functions and I don't
want to add any more code unrelated to b-trees to alloc.c, export them.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
fe924415957e60471536762172d127e85519ef78 18-Aug-2009 Tao Ma <tao.ma@oracle.com> ocfs2: Add refcount b-tree as a new extent tree.

Add the refcount b-tree as a new extent tree so that it can
use the b-tree to store and manipulate ocfs2_refcount_rec.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
555936bfcb1af26c6919d6cedb83710bb03d4322 18-Aug-2009 Tao Ma <tao.ma@oracle.com> ocfs2: Abstract extent split process.

ocfs2_mark_extent_written actually does the following things:
1. check the parameters.
2. initialize the left_path and split_rec.
3. call __ocfs2_mark_extent_written, which will:
   1) check the unwritten flag.
   2) do the real split work.
The whole process is tightly coupled. So this patch
abstracts out 2 different functions so that future b-tree
operations can work with them.

1. __ocfs2_split_extent will accept path and split_rec and do
   the real split work.
2. ocfs2_change_extent_flag will accept a new flag and initialize
   path and split_rec.

So now ocfs2_mark_extent_written will do:
1. check the parameters.
2. call ocfs2_change_extent_flag, which will:
   1) initialize the left_path and split_rec.
   2) check whether the new flags conflict with the old ones.
   3) call __ocfs2_split_extent to do the split.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
853a3a1439b18d5a70ada2cb3fcd468e70b7d095 18-Aug-2009 Tao Ma <tao.ma@oracle.com> ocfs2: Wrap ocfs2_extent_contig in ocfs2_extent_tree.

Add a new operation eo_ocfs2_extent_contig in the extent tree's
operations vector. We want this so that, with the new refcount tree,
refcount trees can always return CONTIG_NONE and
prevent extent merging.
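
An operations vector with an optional contiguity hook could look roughly
like this. All types and names here (demo_tree_ops, demo_default_contig,
demo_never_contig, demo_tree_contig) are hypothetical, and the adjacency
test is a simplified assumption based only on logical offsets.

```c
#include <stddef.h>

enum demo_contig_type {
	DEMO_CONTIG_NONE = 0,
	DEMO_CONTIG_LEFT,
	DEMO_CONTIG_RIGHT
};

struct demo_rec {
	unsigned int cpos;
	unsigned int clusters;
};

struct demo_tree_ops {
	/* Optional hook: NULL means "use the default adjacency test". */
	enum demo_contig_type (*contig)(const struct demo_rec *ext,
					const struct demo_rec *insert);
};

struct demo_tree {
	const struct demo_tree_ops *ops;
};

enum demo_contig_type demo_default_contig(const struct demo_rec *ext,
					  const struct demo_rec *insert)
{
	if (ext->cpos + ext->clusters == insert->cpos)
		return DEMO_CONTIG_RIGHT;	/* insert follows ext */
	if (insert->cpos + insert->clusters == ext->cpos)
		return DEMO_CONTIG_LEFT;	/* insert precedes ext */
	return DEMO_CONTIG_NONE;
}

/* Tree types that must never merge records install this hook. */
enum demo_contig_type demo_never_contig(const struct demo_rec *ext,
					const struct demo_rec *insert)
{
	(void)ext;
	(void)insert;
	return DEMO_CONTIG_NONE;
}

const struct demo_tree_ops demo_data_ops = { NULL };
const struct demo_tree_ops demo_refcount_ops = { demo_never_contig };

enum demo_contig_type demo_tree_contig(const struct demo_tree *t,
				       const struct demo_rec *ext,
				       const struct demo_rec *insert)
{
	if (t->ops->contig)
		return t->ops->contig(ext, insert);
	return demo_default_contig(ext, insert);
}
```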

Signed-off-by: Tao Ma <tao.ma@oracle.com>
5e404e9ed1b05cafb044bd46792e50197df805ed 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: Pass ocfs2_caching_info into ocfs_init_*_extent_tree().

With this commit, extent tree operations are divorced from inodes and
rely on ocfs2_caching_info. Phew!

Signed-off-by: Joel Becker <joel.becker@oracle.com>
a1cf076ba93f9fdf3eb4195f9f43d1e7cb7550f2 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: __ocfs2_mark_extent_written() doesn't need struct inode.

We only allow unwritten extents on data, so the toplevel
ocfs2_mark_extent_written() can use an inode all it wants. But the
subfunction isn't even using the inode argument.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
f3868d0fa2e20d923087a8296fda47b0afe7f9ba 18-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: Teach ocfs2_replace_extent_rec() to use an extent_tree.

Don't use a struct inode anymore.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
d231129f44e7ead14f5f496e664ff1e3883a7b25 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: ocfs2_split_and_insert() no longer needs struct inode.

It already has an extent_tree.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
dbdcf6a48a40e6c9d7081393d793c4f1c5bb4fcf 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: ocfs2_remove_extent() no longer needs struct inode.

One more generic btree function that is isolated from struct inode.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
cbee7e1a6a1a2a3d6eda1f76ffc38a3ed3eeb6cc 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: ocfs2_add_clusters_in_btree() no longer needs struct inode.

One more function that doesn't need a struct inode to pass to its
children.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
cc79d8c19e9d39446525a1026f1a21761f5d3cd2 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: ocfs2_insert_extent() no longer needs struct inode.

One more function down, no inode in the entire insert-extent chain.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
92ba470c44c1404ff18ca0f4ecce1e5b116bb933 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: Make extent map insertion an extent_tree_operation.

ocfs2_insert_extent() wants to insert a record into the extent map if
it's an inode data extent. But since many btrees can call that
function, let's make it an op on ocfs2_extent_tree. Other tree types
can leave it empty.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
627961b77e68b725851cb227db10084bf15f6920 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: ocfs2_figure_insert_type() no longer needs struct inode.

It's not using it, so remove it from the parameter list.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
1ef61b33148a6b32b6d28383cd72ceeddfc7054d 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: Remove inode from ocfs2_figure_extent_contig().

It already has an ocfs2_extent_tree and doesn't need the inode.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
a29702914ad36443d83b5250b3bfa1bf91e6b239 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: Swap inode for extent_tree in ocfs2_figure_merge_contig_type().

We don't want struct inode in generic btree operations.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
b4a176515c715f0c6db1759a39cd9c4175e5a23a 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: ocfs2_extent_contig() only requires the superblock.

Don't pass the inode in. We don't want it around for generic btree
operations.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
3505bec01829a8f690259517add55c7941a4d3d5 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: ocfs2_do_insert_extent() and ocfs2_insert_path() no longer need an inode.

They aren't using it, so remove it from their parameter lists.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
c38e52bb1c0187186bd3c4a2b318ffe69cd2fdf8 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: Give ocfs2_split_record() an extent_tree instead of an inode.

Another on the way to generic btree functions.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
d562862314a7b131a630f7b912490312387542fb 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: ocfs2_insert_at_leaf() doesn't need struct inode.

Give it an ocfs2_extent_tree and it is happy.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
4c911eefca316f580f174940cd67d561b4b7e6e8 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: Make truncating the extent map an extent_tree_operation.

ocfs2_remove_extent() wants to truncate the extent map if it's
truncating an inode data extent. But since many btrees can call that
function, let's make it an op on ocfs2_extent_tree. Other tree types
can leave it empty.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
043beebb6c467a07ccd7aa666095f87fade1c28e 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: ocfs2_truncate_rec() doesn't need struct inode.

It's not using it anymore. Remove it from the parameter list.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
d401dc12fcced123909eba10334fb5d78866d1a9 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: ocfs2_grow_branch() and ocfs2_append_rec_to_path() lose struct inode.

ocfs2_grow_branch() is not really using it other than to pass it to the
subfunctions ocfs2_shift_tree_depth(), ocfs2_find_branch_target(), and
ocfs2_add_branch(). The first two weren't using it either, so they drop
the argument. ocfs2_add_branch() only passed it to
ocfs2_adjust_rightmost_branch(), which drops the inode argument and uses
the ocfs2_extent_tree as well.

ocfs2_append_rec_to_path() can take an ocfs2_extent_tree instead of
the inode. The function ocfs2_adjust_rightmost_records() goes along for
the ride.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
c495dd24ac00654f99540f533185e1fcc9534009 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: ocfs2_try_to_merge_extent() doesn't need struct inode.

It's not using it, so remove it from the parameter list.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
4fe82c312a7d975a9d0f591dc9180c1197ee4270 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: ocfs2_merge_rec_left/right() no longer need struct inode.

Drop it from the parameters - they already have ocfs2_extent_list.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
70f18c08b476e315c8ee17ea34b55ea1957e7e7d 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: ocfs2_rotate_tree_left() no longer needs struct inode.

It already gets ocfs2_extent_tree, so we can just use that. This chains
to the same modification for ocfs2_remove_rightmost_path() and
ocfs2_rotate_rightmost_leaf_left().

Signed-off-by: Joel Becker <joel.becker@oracle.com>
e46f74dc357947e2aed9bdd63cf335c5fd23810b 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: __ocfs2_rotate_tree_left() doesn't need struct inode.

It already has struct ocfs2_extent_tree, which has the caching info. So
we don't need to pass it struct inode.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
1e2dd63fe0b6e99b81904a61090db801978b9520 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: ocfs2_rotate_subtree_left() doesn't need struct inode.

It already has struct ocfs2_extent_tree, which has the caching info. So
we don't need to pass it struct inode.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
09106bae05c3350e8d0ef0ede90b1c3da4bda2f8 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: ocfs2_update_edge_lengths() doesn't need struct inode.

Pass in the extent tree, which is all we need.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
1bbf0b8d606645c7596ee641acfbf042765c9719 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: ocfs2_rotate_tree_right() doesn't need struct inode.

We don't need struct inode in ocfs2_rotate_tree_right() anymore.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
6136ca5f5f9fd38da399e9ff9380f537c1b3b901 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: Drop struct inode from ocfs2_extent_tree_operations.

We can get to the inode from the caching information. Other parent
types don't need it.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
7dc028056750328e74ca807041c822068384fe16 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: Pass ocfs2_extent_tree to ocfs2_get_subtree_root()

Get rid of the inode argument. Use extent_tree instead. This means a
few more functions have to pass an extent_tree around.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
5c601aba8c5d9d5f944cf02b59e3288dd72ae6cf 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: Get inode out of ocfs2_rotate_subtree_root_right().

Pass the ocfs2_extent_list down through ocfs2_rotate_tree_right() and
get rid of struct inode in ocfs2_rotate_subtree_root_right().

Signed-off-by: Joel Becker <joel.becker@oracle.com>
4619c73e7c9bd10bac6b60925fa28d5a2eeaf6ed 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: ocfs2_complete_edge_insert() doesn't need struct inode at all.

Completely unused argument. Get rid of it.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
6641b0ce3274d979338cb67b2f562189dcbc1c28 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: Pass ocfs2_extent_tree to ocfs2_unlink_path()

ocfs2_unlink_path() doesn't need struct inode, so let's pass it struct
ocfs2_extent_tree.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
42a5a7a9a5abf9a566b91c51137921957b9a14e4 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: ocfs2_create_new_meta_bhs() doesn't need struct inode.

Pass struct ocfs2_extent_tree into ocfs2_create_new_meta_bhs(). It no
longer needs struct inode or ocfs2_super.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
facdb77f54f09a33baf6b649496f5dd1d7922a7e 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: ocfs2_find_path() only needs the caching info

ocfs2_find_path and ocfs2_find_leaf() walk our btrees, reading extent
blocks. They need struct ocfs2_caching_info for that, but not struct
inode.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
3d03a305ded8057155bd3c801e64ffef9f534827 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: Pass ocfs2_caching_info to ocfs2_read_extent_block().

extent blocks belong to btrees on more than just inodes, so we want to
pass the ocfs2_caching_info structure directly to
ocfs2_read_extent_block(). A number of places in alloc.c can now drop
struct inode from their argument list.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
d9a0a1f83bf083b55b3c1f16efddecc31abace61 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: Store the ocfs2_caching_info on ocfs2_extent_tree.

What do we cache? Metadata blocks. What are most of our non-inode metadata
blocks? Extent blocks for our btrees. struct ocfs2_extent_tree is the
main structure for managing those. So let's store the associated
ocfs2_caching_info there.

This means that ocfs2_et_root_journal_access() doesn't need struct inode
anymore, and any place that has an et can refer to et->et_ci instead of
INODE_CACHE(inode).

Signed-off-by: Joel Becker <joel.becker@oracle.com>
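
A rough sketch of the idea in this commit, under simplified assumptions. The sketch_* types and the ci_owner field are hypothetical; only the shape (the tree carries a pointer to the caching info, so callers holding a tree no longer need the inode) follows the message above.

```c
/* Hypothetical, simplified stand-ins for the real structures. */
struct sketch_caching_info { unsigned long long ci_owner; };
struct sketch_inode { struct sketch_caching_info ip_cache; };
struct sketch_extent_tree { struct sketch_caching_info *et_ci; };

/* Plays the role of INODE_CACHE(inode) in this sketch. */
static struct sketch_caching_info *sketch_inode_cache(struct sketch_inode *inode)
{
	return &inode->ip_cache;
}

/* The inode is only needed once, when the tree is set up. */
static void sketch_init_dinode_tree(struct sketch_extent_tree *et,
				    struct sketch_inode *inode)
{
	et->et_ci = sketch_inode_cache(inode);
}

/* Later code reaches the cache through the tree alone. */
static unsigned long long sketch_tree_owner(const struct sketch_extent_tree *et)
{
	return et->et_ci->ci_owner;
}
```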
0cf2f7632b1789b811ab20b611c4156e6de2b055 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: Pass struct ocfs2_caching_info to the journal functions.

The next step in divorcing metadata I/O management from struct inode is
to pass struct ocfs2_caching_info to the journal functions. Thus the
journal locks a metadata cache with the cache io_lock function. It also
can compare ci_last_trans and ci_created_trans directly.

This is a large patch because of all the places we change
ocfs2_journal_access..(handle, inode, ...) to
ocfs2_journal_access..(handle, INODE_CACHE(inode), ...).

Signed-off-by: Joel Becker <joel.becker@oracle.com>
8cb471e8f82506937fe5e2e9fb0bf90f6b1f1170 11-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: Take the inode out of the metadata read/write paths.

We are really passing the inode into the ocfs2_read/write_blocks()
functions to get at the metadata cache. This commit passes the cache
directly into the metadata block functions, divorcing them from the
inode.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
60e2ec48665b8495360ca4a6004c5cd52beb2bc1 12-Aug-2009 Tao Ma <tao.ma@oracle.com> ocfs2: release the buffer head in ocfs2_do_truncate.

In ocfs2_do_truncate, we forget to release last_eb_bh, which
causes a memory leak. So call brelse at the end.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
82e12644cf5227dab15201fbcaf0ca6330ebd70f 23-Jul-2009 Tao Ma <tao.ma@oracle.com> ocfs2: Use ocfs2_rec_clusters in ocfs2_adjust_adjacent_records.

In ocfs2_adjust_adjacent_records, we adjust adjacent records
according to the extent_list in the lower level. But the lower level
of the tree can be either a leaf or a branch. If we only use
ocfs2_is_empty_extent, we run into problems when the lower level is a
branch (tree_depth > 1). So use !ocfs2_rec_clusters instead.
Only leaf records can have holes, so add a BUG_ON for non-leaf
branches.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
3c5e10683e684ef45614c9071847e48f633d9806 21-Jul-2009 Tao Ma <tao.ma@oracle.com> ocfs2: Add extra credits and access the modified bh in update_edge_lengths.

In the normal left tree rotation process, we never touch the tree
branches above subtree_index, and ocfs2_extend_rotate_transaction doesn't
reserve credits for them either.

But when we want to delete the rightmost extent block, we have to update
the rightmost records for all of the rightmost branch (see
ocfs2_update_edge_lengths), so we have to allocate extra credits for them.
What's more, we have to access them as well.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
6b791bcc8b2ae21daf95d18cff2f1eca7a64c9a5 12-Jun-2009 Tao Ma <tao.ma@oracle.com> ocfs2: Adjust rightmost path in ocfs2_add_branch.

In ocfs2_add_branch, we use the rightmost rec of the leaf extent block
to generate the e_cpos for the newly added branch. In most cases this
is OK, but if the parent extent block's rightmost rec covers more clusters
than the leaf does, it will cause a kernel panic if we insert some clusters
in it. The message is something like:
(7445,1):ocfs2_insert_at_leaf:3775 ERROR: bug expression:
le16_to_cpu(el->l_next_free_rec) >= le16_to_cpu(el->l_count)
(7445,1):ocfs2_insert_at_leaf:3775 ERROR: inode 66053, depth 0, count 28,
next free 28, rec.cpos 270, rec.clusters 1, insert.cpos 275, insert.clusters 1
[<fa7ad565>] ? ocfs2_do_insert_extent+0xb58/0xda0 [ocfs2]
[<fa7b08f2>] ? ocfs2_insert_extent+0x5bd/0x6ba [ocfs2]
[<fa7b1b8b>] ? ocfs2_add_clusters_in_btree+0x37f/0x564 [ocfs2]
...

The panic can be easily reproduced by the following small test case
(with bs=512, cs=4K; all the error handling is removed so that it is
easier to read).

#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	int fd, i;
	char buf[5] = "test";

	/* O_CREAT requires a mode argument. */
	fd = open(argv[1], O_RDWR|O_CREAT, 0644);

	/* 30 sparse writes, one every 10 clusters (40960 bytes at cs=4K). */
	for (i = 0; i < 30; i++) {
		lseek(fd, 40960 * i, SEEK_SET);
		write(fd, buf, 5);
	}

	/* Cut the file back to 280 clusters. */
	ftruncate(fd, 1146880);

	/* Write into the gap, at cluster 275. */
	lseek(fd, 1126400, SEEK_SET);
	write(fd, buf, 5);

	close(fd);

	return 0;
}

The reason for the panic is that the 30 writes and the ftruncate make
the file's extent list look like:

Tree Depth: 1 Count: 19 Next Free Rec: 1
## Offset Clusters Block#
0 0 280 86183
SubAlloc Bit: 7 SubAlloc Slot: 0
Blknum: 86183 Next Leaf: 0
CRC32: 00000000 ECC: 0000
Tree Depth: 0 Count: 28 Next Free Rec: 28
## Offset Clusters Block# Flags
0 0 1 143368 0x0
1 10 1 143376 0x0
...
26 260 1 143576 0x0
27 270 1 143584 0x0

Now another write at 1126400 (cluster 275), which lands in the gap
between 271 and 280, will trigger ocfs2_add_branch, but the result after
the function looks like:
Tree Depth: 1 Count: 19 Next Free Rec: 2
## Offset Clusters Block#
0 0 280 86183
1 271 0 143592
So the extent records intersect, which makes the following operation bug out.

This patch just tries to remove the gap before we add the new branch, so that
the root (branch) rightmost rec will cover the same right position. So in the
above case, before adding the branch the tree will be changed to
Tree Depth: 1 Count: 19 Next Free Rec: 1
## Offset Clusters Block#
0 0 271 86183
SubAlloc Bit: 7 SubAlloc Slot: 0
Blknum: 86183 Next Leaf: 0
CRC32: 00000000 ECC: 0000
Tree Depth: 0 Count: 28 Next Free Rec: 28
## Offset Clusters Block# Flags
0 0 1 143368 0x0
1 10 1 143376 0x0
...
26 260 1 143576 0x0
27 270 1 143584 0x0
And after branch add, the tree looks like
Tree Depth: 1 Count: 19 Next Free Rec: 2
## Offset Clusters Block#
0 0 271 86183
1 271 0 143592

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
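
The arithmetic behind the example above can be sketched as follows. This is an illustration only: the sketch_* names are hypothetical, the 4K cluster size comes from the test case, and the trim helper merely mimics the effect described (the root record is shortened to end where the last leaf record ends), not the real ocfs2_adjust_rightmost_branch().

```c
#include <stdint.h>

#define SKETCH_CLUSTER_SIZE 4096u	/* cs=4K, as in the test case */

/* Hypothetical simplified extent record: start cluster and length. */
struct sketch_rec { uint32_t cpos; uint32_t clusters; };

/* Convert a byte offset into a cluster offset. */
static uint32_t sketch_bytes_to_cluster(uint64_t byte_offset)
{
	return (uint32_t)(byte_offset / SKETCH_CLUSTER_SIZE);
}

/* Returns 1 if cpos falls inside [rec->cpos, rec->cpos + rec->clusters). */
static int sketch_rec_covers(const struct sketch_rec *rec, uint32_t cpos)
{
	return cpos >= rec->cpos && cpos < rec->cpos + rec->clusters;
}

/* Shorten the root record so it ends where the last leaf record ends. */
static void sketch_trim_to_leaf(struct sketch_rec *root,
				const struct sketch_rec *last_leaf)
{
	root->clusters = last_leaf->cpos + last_leaf->clusters - root->cpos;
}
```

With the root record at (0, 280) and the last leaf record at (270, 1), a new branch starting at cluster 271 falls inside the root record. After trimming the root record to (0, 271), it no longer does.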
9b7895efac906d66d19856194e1ba61f37e231a4 13-Nov-2008 Mark Fasheh <mfasheh@suse.com> ocfs2: Add a name indexed b-tree to directory inodes

This patch makes use of Ocfs2's flexible btree code to add an additional
tree to directory inodes. The new tree stores an array of small,
fixed-length records in each leaf block. Each record stores a hash value,
and pointer to a block in the traditional (unindexed) directory tree where a
dirent with the given name hash resides. Lookup exclusively uses this tree
to find dirents, thus providing us with constant time name lookups.

Some of the hashing code was copied from ext3. Unfortunately, it has lots of
unfixed checkpatch errors. I left that as-is so that tracking changes would
be easier.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
74e77eb30d0ecbb12964d005b439c8b84a505b84 11-Mar-2009 Tao Ma <tao.ma@oracle.com> ocfs2: Fix a bug found by sparse check.

We need to use le32_to_cpu to test rec->e_cpos in
ocfs2_dinode_insert_check.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
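
A small userspace sketch of why the conversion matters. sketch_le32_to_cpu() is a hypothetical, portable stand-in for the kernel's le32_to_cpu(), written against raw bytes so that it behaves the same on any host. On-disk fields such as e_cpos are little-endian, so comparing them without converting first only happens to work on little-endian machines.

```c
#include <stdint.h>

/*
 * Decode a 32-bit little-endian on-disk value, regardless of the
 * host byte order. Hypothetical stand-in for le32_to_cpu().
 */
static uint32_t sketch_le32_to_cpu(const unsigned char disk[4])
{
	return (uint32_t)disk[0] |
	       ((uint32_t)disk[1] << 8) |
	       ((uint32_t)disk[2] << 16) |
	       ((uint32_t)disk[3] << 24);
}
```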
47be12e4eec84c1846f29af64fe25a396b57a026 09-Jan-2009 Tao Ma <tao.ma@oracle.com> ocfs2: Access and dirty the buffer_head in mark_written.

In __ocfs2_mark_extent_written, when we hit the c_split_covers_rec
case, the old solution just replaces the extent record and forgets
to access and dirty the buffer_head. This causes a problem when the
unwritten extent is in an extent block. So access and dirty it.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
fd4ef231962ab44fd1004e87f9d7c6809f00cd64 30-Jan-2009 Mark Fasheh <mfasheh@suse.com> ocfs2: add quota call to ocfs2_remove_btree_range()

We weren't reclaiming the clusters which get free'd from this function,
so any user punching holes in a file would still have those bytes accounted
against him/her. Add the call to vfs_dq_free_space_nodirty() to fix this.
Interestingly enough, the journal credits calculation already took this into
account.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Jan Kara <jack@suse.cz>
c19a28e1193a6c854738d609ae9b2fe2f6e6bea4 08-Jan-2009 Fernando Carrijo <fcarrijo@yahoo.com.br> remove lots of double-semicolons

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9047beabb8a396f0b18de1e4a9ab920cf92054af 05-Jan-2009 Tao Ma <tao.ma@oracle.com> ocfs2: Access the right buffer_head in ocfs2_merge_rec_left.

In commit "ocfs2: Use metadata-specific ocfs2_journal_access_*()
functions", the wrong buffer_head is accessed. So change it
to the right buffer_head.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2a50a743bdaab104155bd9e988d2ba3bb4177263 09-Dec-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Create ocfs2_xattr_value_buf.

When an ocfs2 extended attribute is large enough to require its own
allocation tree, we root it with an ocfs2_xattr_value_root. However,
these roots can be a part of inodes, xattr blocks, or xattr buckets.
Thus, they need a different journal access function for each container.

We wrap the bh, its journal access function, and the value root (xv) in
a structure called ocfs2_xattr_value_buf. This is a package that can
be passed around. In this first pass, we simply pass it to the
extent tree code.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
13723d00e374c2a6d6ccb5af6de965e89c3e1b01 18-Oct-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Use metadata-specific ocfs2_journal_access_*() functions.

The per-metadata-type ocfs2_journal_access_*() functions hook up jbd2
commit triggers and allow us to compute metadata ecc right before the
buffers are written out. This commit provides ecc for inodes, extent
blocks, group descriptors, and quota blocks. It is not safe to use
extended attributes and metaecc at the same time yet.

The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide
the type of block at their root. Before, it didn't matter, but now the
root block must use the appropriate ocfs2_journal_access_*() function.
To keep this abstract, the structures now have a pointer to the matching
journal_access function and a wrapper call to call it.

A few places use naked ocfs2_write_block() calls instead of adding the
blocks to the journal. We make sure to calculate their checksum and ecc
before the write.

Since we pass around the journal_access functions, let's typedef them
in ocfs2.h.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
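
A hedged sketch of the "pointer to the matching journal_access function plus a wrapper" arrangement described above. The sketch_* types, the counters in sketch_handle, and the two-argument signature are assumptions made for illustration; the real typedef lives in ocfs2.h and has a different signature.

```c
/* Hypothetical simplified handle and buffer head. */
struct sketch_handle { int dirty_inodes; int dirty_extent_blocks; };
struct sketch_bh { unsigned long long blkno; };

/* One function type shared by all per-metadata-type access functions. */
typedef int (*sketch_journal_access_func)(struct sketch_handle *handle,
					  struct sketch_bh *bh);

static int sketch_journal_access_di(struct sketch_handle *handle,
				    struct sketch_bh *bh)
{
	(void)bh;
	handle->dirty_inodes++;		/* stands in for the inode trigger */
	return 0;
}

static int sketch_journal_access_eb(struct sketch_handle *handle,
				    struct sketch_bh *bh)
{
	(void)bh;
	handle->dirty_extent_blocks++;	/* stands in for the extent block trigger */
	return 0;
}

/* The tree hides what kind of block sits at its root. */
struct sketch_extent_tree {
	struct sketch_bh *et_root_bh;
	sketch_journal_access_func et_root_journal_access;
};

/* Wrapper: generic code never needs to know the root block type. */
static int sketch_et_root_journal_access(struct sketch_handle *handle,
					 struct sketch_extent_tree *et)
{
	return et->et_root_journal_access(handle, et->et_root_bh);
}
```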
ffdd7a54631f07918b75e324d86713a08c11ec06 18-Oct-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Wrap up the common use cases of ocfs2_new_path().

The majority of ocfs2_new_path() calls are:

ocfs2_new_path(path_root_bh(otherpath),
path_root_el(otherpath));

Let's call that ocfs2_new_path_from_path(). The rest do similar things
from struct ocfs2_extent_tree. Let's call those
ocfs2_new_path_from_et(). This will make the next change easier.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
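
The two wrappers can be sketched like this, with simplified hypothetical types (sketch_bh, sketch_el, sketch_path, sketch_et) in place of the real ones and calloc() in place of the kernel allocator. The real functions take more state; this shows only the shape of the helpers.

```c
#include <stdlib.h>

/* Hypothetical simplified stand-ins for the real structures. */
struct sketch_bh { int id; };
struct sketch_el { int depth; };
struct sketch_path { struct sketch_bh *root_bh; struct sketch_el *root_el; };
struct sketch_et { struct sketch_bh *et_root_bh; struct sketch_el *et_root_el; };

/* Base constructor: returns NULL on allocation failure. */
static struct sketch_path *sketch_new_path(struct sketch_bh *root_bh,
					   struct sketch_el *root_el)
{
	struct sketch_path *path = calloc(1, sizeof(*path));

	if (path) {
		path->root_bh = root_bh;
		path->root_el = root_el;
	}
	return path;
}

/* The common case: a new path sharing the root of an existing path. */
static struct sketch_path *sketch_new_path_from_path(const struct sketch_path *other)
{
	return sketch_new_path(other->root_bh, other->root_el);
}

/* The other case: a new path rooted at an extent tree. */
static struct sketch_path *sketch_new_path_from_et(const struct sketch_et *et)
{
	return sketch_new_path(et->et_root_bh, et->et_root_el);
}
```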
d6b32bbb3eae3fb787f1c33bf9f767ca1ddeb208 17-Oct-2008 Joel Becker <joel.becker@oracle.com> ocfs2: block read meta ecc.

Add block check calls to the read_block validate functions. This is
almost all of the read-side checking of metaecc. xattr buckets are not checked
yet. Writes are also unchecked, and so a read-write mount will quickly fail.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
a90714c150e3ce677c57a9dac3ab1ec342c75a95 09-Oct-2008 Jan Kara <jack@suse.cz> ocfs2: Add quota calls for allocation and freeing of inodes and space

Add quota calls for allocation and freeing of inodes and space, also update
estimates on number of needed credits for a transaction. Move out inode
allocation from ocfs2_mknod_locked() because vfs_dq_init() must be called
outside of a transaction.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
53ef99cad9878f02f27bb30bc304fc42af8bdd6e 19-Nov-2008 Mark Fasheh <mfasheh@suse.com> ocfs2: Remove JBD compatibility layer

JBD2 is fully backwards compatible with JBD and it's been tested enough with
Ocfs2 that we can clean this code up now.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
970e4936d7d15f35d00fd15a14f5343ba78b2fc8 13-Nov-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Validate metadata only when it's read from disk.

Add an optional validation hook to ocfs2_read_blocks(). Now the
validation function is only called when a block was actually read off of
disk. It is not called when the buffer was in cache.

We add a buffer state bit BH_NeedsValidate to flag these buffers. It
must always be one higher than the last JBD2 buffer state bit.

The dinode, dirblock, extent_block, and xattr_block validators are
lifted to this scheme directly. The group_descriptor validator needs to
be split into two pieces. The first part only needs the gd buffer and
is passed to ocfs2_read_block(). The second part requires the dinode as
well, and is called every time. It's only 3 compares, so it's tiny.
This also allows us to clean up the non-fatal gd check used by resize.c.
It now has no magic argument.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
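
A simplified sketch of "validate only on a real disk read". Everything here is hypothetical: the real code flags buffers with BH_NeedsValidate, while this sketch uses a plain cached flag and passes the "disk" contents in as an argument. It shows only that the hook is optional and is skipped on a cache hit.

```c
#include <stddef.h>

/* Hypothetical block: a cached flag plus some payload. */
struct sketch_block { int cached; int data; };

typedef int (*sketch_validate_func)(const struct sketch_block *blk);

static int sketch_validate_calls;	/* counts hook invocations */

/* Example validator: rejects negative payloads. */
static int sketch_validate_nonneg(const struct sketch_block *blk)
{
	sketch_validate_calls++;
	return blk->data >= 0 ? 0 : -1;
}

/*
 * Read a block. disk_data stands in for what the disk would return.
 * The validate hook may be NULL and only runs on a cache miss.
 */
static int sketch_read_block(struct sketch_block *blk, int disk_data,
			     sketch_validate_func validate)
{
	int rc;

	if (blk->cached)
		return 0;		/* cache hit: already validated */

	blk->data = disk_data;		/* the "disk read" */
	if (validate) {
		rc = validate(blk);
		if (rc)
			return rc;	/* invalid blocks are not cached */
	}
	blk->cached = 1;
	return 0;
}
```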
5e96581a377fc6bd76e9b112da9aeb8a7ae8bf22 13-Nov-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Wrap extent block reads in a dedicated function.

We weren't consistently checking extent blocks after we read them.
Most places checked the signature, but none checked h_blkno or
h_fs_signature. Create a toplevel ocfs2_read_extent_block() that does
the read and the validation.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
10995aa2451afa20b721cc7de856cae1a13dba57 13-Nov-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Morph the haphazard OCFS2_IS_VALID_DINODE() checks.

Random places in the code would check a dinode bh to see if it was
valid. Not only did they do different levels of validation, they
handled errors in different ways.

The previous commit unified inode block reads, validating all block
reads in the same place. Thus, these haphazard checks are no longer
necessary. Rather than eliminate them, however, we change them to
BUG_ON() checks. This ensures the assumptions remain true. All of the
code paths to these checks have been audited to ensure they come from a
validated inode read.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
b657c95c11088d77fc1bfc9c84d940f778bf9d12 13-Nov-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Wrap inode block reads in a dedicated function.

The ocfs2 code currently reads inodes off disk with a simple
ocfs2_read_block() call. Each place that does this has a different set
of sanity checks it performs. Some check only the signature. A couple
validate the block number (the block read vs di->i_blkno). A couple
others check for VALID_FL. Only one place validates i_fs_generation. A
couple check nothing. Even when an error is found, they don't all do
the same thing.

We wrap inode reading into ocfs2_read_inode_block(). This will validate
all the above fields, going readonly if they are invalid (they never
should be). ocfs2_read_inode_block_full() is provided for the places
that want to pass read_block flags. Every caller is passing a struct
inode with a valid ip_blkno, so we don't need a separate blkno argument
either.

We will remove the validation checks from the rest of the code in a
later commit, as they are no longer necessary.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
fecc01126d7a244b7e9b563c80663ffdca35343b 13-Nov-2008 Mark Fasheh <mfasheh@suse.com> ocfs2: turn __ocfs2_remove_inode_range() into ocfs2_remove_btree_range()

This patch genericizes the high level handling of extent removal.
ocfs2_remove_btree_range() is nearly identical to
__ocfs2_remove_inode_range(), except that extent tree operations have been
used where necessary. We update ocfs2_remove_inode_range() to use the
generic helper. Now extent tree based structures have an easy way to
truncate ranges.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
2891d290aa6eee0821f7e4ad0b1c4ae4d964b0f1 12-Nov-2008 Tao Ma <tao.ma@oracle.com> ocfs2: Add clusters free in dealloc_ctxt.

Now in ocfs2 xattr set, the whole process is divided into many small
parts that are wrapped in different transactions, so the set doesn't
look like a real transaction. So we want to integrate it into a real
one.

In some cases we will allocate some clusters and free some in just one
transaction. E.g., one xattr is larger than inline size, so it and its
value root are stored within the inode while the value is outside in a
cluster. Then we try to update it with a smaller value (larger than the
size of root but smaller than inline size); we may need to free the
outside cluster while allocating a new bucket (one cluster), since now
the inode may be full. The old solution will lock the global_bitmap (if
the local alloc failed in stress test) and then the truncate log. This
will cause an ABBA lock with truncate log flush.

This patch adds the clusters free in dealloc_ctxt, so that we can record
the free clusters during the transaction and then free them after we
release the global_bitmap in xattr set.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
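
The record-now, free-later idea can be sketched as below. This is a sketch under stated assumptions, not the real dealloc_ctxt: the fixed-size array, the sketch_* names and the return conventions are all hypothetical. During the transaction the clusters are only recorded; the actual free happens in a separate step once the lock has been dropped.

```c
#include <stddef.h>

#define SKETCH_MAX_DEFERRED 8	/* arbitrary limit for the sketch */

/* Hypothetical context that remembers clusters to free later. */
struct sketch_dealloc_ctxt {
	unsigned int nr;
	unsigned long long blkno[SKETCH_MAX_DEFERRED];
	unsigned int clusters[SKETCH_MAX_DEFERRED];
};

/* Inside the transaction: just record the range, take no extra locks. */
static int sketch_cache_cluster_dealloc(struct sketch_dealloc_ctxt *ctxt,
					unsigned long long blkno,
					unsigned int clusters)
{
	if (ctxt->nr == SKETCH_MAX_DEFERRED)
		return -1;
	ctxt->blkno[ctxt->nr] = blkno;
	ctxt->clusters[ctxt->nr] = clusters;
	ctxt->nr++;
	return 0;
}

/*
 * After the global bitmap lock is released: process everything that
 * was recorded. Returns the total number of clusters "freed".
 */
static unsigned int sketch_run_deallocs(struct sketch_dealloc_ctxt *ctxt)
{
	unsigned int i, total = 0;

	for (i = 0; i < ctxt->nr; i++)
		total += ctxt->clusters[i];
	ctxt->nr = 0;
	return total;
}
```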
0fcaa56a2a020dd6f90c202b7084e6f4cbedb6c2 10-Oct-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Simplify ocfs2_read_block()

More than 30 callers of ocfs2_read_block() pass exactly OCFS2_BH_CACHED.
Only six pass a different flag set. Rather than have every caller care,
let's make ocfs2_read_block() take no flags and always do a cached read.
The remaining six places can call ocfs2_read_blocks() directly.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
31d33073ca38603dea705dae45e094a64ca062d6 10-Oct-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Require an inode for ocfs2_read_block(s)().

Now that synchronous readers are using ocfs2_read_blocks_sync(), all
callers of ocfs2_read_blocks() are passing an inode. Use it
unconditionally. Since it's there, we don't need to pass the
ocfs2_super either.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
a81cb88b64a479b78c6dd5666678d50171865db8 07-Oct-2008 Mark Fasheh <mfasheh@suse.com> ocfs2: Don't check for NULL before brelse()

This is pointless as brelse() already does the check.

Signed-off-by: Mark Fasheh
2b4e30fbde425828b17f0e9c8f8e3fd3ecb2bc75 04-Sep-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Switch over to JBD2.

ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is
limiting our maximum filesystem size.

It's a pretty trivial change. Most functions are just renamed. The
only functional change is moving to Jan's inode-based ordered data mode.
It's better, too.

Because JBD2 reads and writes JBD journals, this is compatible with any
existing filesystem. It can even interact with JBD-based ocfs2 as long
as the journal is formatted for JBD.

We provide a compatibility option so that paranoid people can still use
JBD for the time being. This will go away shortly.

[ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to
ocfs2_truncate_for_delete(). --Mark ]

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
8d6220d6a74a33552cf877bcea25503d7f6a59e6 22-Aug-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree()

The original get/put_extent_tree() functions held a reference on
et_root_bh. However, every single caller already has a safe reference,
making the get/put cycle irrelevant.

We change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree(). It
no longer gets a reference on et_root_bh. ocfs2_put_extent_tree() is
removed. Callers now have a simpler init+use pattern.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
1625f8ac151743e452ec062c2989669c508ffa48 22-Aug-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Comment struct ocfs2_extent_tree_operations.

struct ocfs2_extent_tree_operations provides methods for the different
on-disk btrees in ocfs2. Describing what those methods do is probably a
good idea.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
f99b9b7ccf6a691f653cec45f36bfdd1e94769c7 21-Aug-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Make ocfs2_extent_tree the first-class representation of a tree.

We now have three different kinds of extent trees in ocfs2: inode data
(dinode), extended attributes (xattr_tree), and extended attribute
values (xattr_value). There is a nice abstraction for them,
ocfs2_extent_tree, but it is hidden in alloc.c. All the calling
functions have to pick amongst a varied API and pass in type bits and
often extraneous pointers.

A better way is to make ocfs2_extent_tree a first-class object.
Everyone converts their object to an ocfs2_extent_tree via the
ocfs2_get_*_extent_tree() calls, then uses the ocfs2_extent_tree for all
tree calls to alloc.c.

This simplifies a lot of callers and improves readability. It also
provides an easy way to add additional extent tree types, as they only
need to be defined in alloc.c with an ocfs2_get_<new>_extent_tree()
function.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
1e61ee79e2a96f62c007486677319814ce621c3c 21-Aug-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Add an insertion check to ocfs2_extent_tree_operations.

A couple places check an extent_tree for a valid inode. We move that
out to add an eo_insert_check() operation. It can be called from
ocfs2_insert_extent() and elsewhere.

We also have the wrapper calls ocfs2_et_insert_check() and
ocfs2_et_sanity_check() ignore NULL ops. That way we don't have to
provide useless operations for xattr types.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
1a09f556e5415a29cdddaf9a6ebf474194161cf3 21-Aug-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Create specific get_extent_tree functions.

A caller knows what kind of extent tree they have. There's no reason
they have to call ocfs2_get_extent_tree() with a NULL when they could
just as easily call a specific function to their type of extent tree.

Introduce ocfs2_dinode_get_extent_tree(),
ocfs2_xattr_tree_get_extent_tree(), and
ocfs2_xattr_value_get_extent_tree(). They only take the necessary
arguments, calling into the underlying __ocfs2_get_extent_tree() to do
the real work.

__ocfs2_get_extent_tree() is the old ocfs2_get_extent_tree(), but
without needing any switch-by-type logic.

ocfs2_get_extent_tree() is now a wrapper around the specific calls. It
exists because a couple alloc.c functions can take et_type. This will
go later.

Another benefit is that ocfs2_xattr_value_get_extent_tree() can take a
struct ocfs2_xattr_value_root* instead of void*. This gives us
typechecking where we didn't have it before.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
943cced39ee45ed2db25efd25eee8ba49cf2dfc4 21-Aug-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Determine an extent tree's max_leaf_clusters in an et_op.

Provide an optional extent_tree_operation to specify the
max_leaf_clusters of an ocfs2_extent_tree. If not provided, the value
is 0 (unlimited).
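
As a rough userspace illustration of an optional operation with a
default, the sketch below uses hypothetical names (struct tree_ops,
struct tree, tree_fill_max_leaf_clusters()). It is not the code from
alloc.c, only one way such a wrapper might look, and the cap of 16 is
an arbitrary example value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the extent tree and its operations table. */
struct tree;

struct tree_ops {
	/* Optional: tree types without a limit may leave this NULL. */
	void (*fill_max_leaf_clusters)(struct tree *t);
};

struct tree {
	const struct tree_ops *ops;
	uint32_t max_leaf_clusters;	/* 0 means unlimited */
};

/* Wrapper: call the operation if present, otherwise default to 0. */
static void tree_fill_max_leaf_clusters(struct tree *t)
{
	assert(t != NULL && t->ops != NULL);
	if (t->ops->fill_max_leaf_clusters)
		t->ops->fill_max_leaf_clusters(t);
	else
		t->max_leaf_clusters = 0;
}

/* Example operation for a tree type that caps its leaves. */
static void capped_fill_max_leaf_clusters(struct tree *t)
{
	t->max_leaf_clusters = 16;	/* arbitrary illustrative cap */
}

static const struct tree_ops capped_ops = {
	.fill_max_leaf_clusters = capped_fill_max_leaf_clusters,
};

/* A tree type that provides no such operation. */
static const struct tree_ops unlimited_ops = {
	.fill_max_leaf_clusters = NULL,
};
```

With this shape, a tree type that does not care about leaf size simply
omits the callback and callers still see a well-defined value.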

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
1c25d93a4a27c90c3ae33f9e724f7b67783d68d1 21-Aug-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Use struct ocfs2_extent_tree in ocfs2_num_free_extents().

ocfs2_num_free_extents() re-implements the logic of
ocfs2_get_extent_tree(). Now that ocfs2_get_extent_tree() does not
allocate, let's use it in ocfs2_num_free_extents() to simplify the code.

The inode validation code in ocfs2_num_free_extents() is not needed.
All callers are passing in pre-validated inodes.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
0ce1010f1a4319e02574b856d50dfdc0ed855f40 21-Aug-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Provide the get_root_el() method to ocfs2_extent_tree_operations.

The root_el of an ocfs2_extent_tree needs to be calculated from
et->et_object. Make it an operation on et->et_ops.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
ea5efa151265a743f48e3d371992a0100d73a0eb 21-Aug-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Make 'private' into 'object' on ocfs2_extent_tree.

The 'private' pointer was a way to store off xattr values, which don't
live at a set place in the bh. But the concept of "the object
containing the extent tree" is much more generic. For an inode it's the
struct ocfs2_dinode, for an xattr value it's the value. Let's save off
the 'object' at all times. If NULL is passed to
ocfs2_get_extent_tree(), 'object' is set to bh->b_data.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
dc0ce61af418305afa7e0d05d86ab334e0daabf7 21-Aug-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Make ocfs2_extent_tree get/put instead of alloc.

Rather than allocating a struct ocfs2_extent_tree, just put it on the
stack. Fill it with ocfs2_get_extent_tree() and drop it with
ocfs2_put_extent_tree(). Now the callers don't have to ENOMEM, yet
still safely ref the root_bh.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
ce1d9ea621291ed5e985d6677278c6bb20d96a40 21-Aug-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Prefix the ocfs2_extent_tree structure.

The members of the ocfs2_extent_tree structure gain a prefix of 'et_'.
All users are updated.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
35dc0aa3c5e7391319754e0c19cdfc0a28eb5b25 21-Aug-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Prefix the extent tree operations structure.

The ocfs2_extent_tree_operations structure gains a field prefix on its
members. The ->eo_sanity_check() operation gains a wrapper function for
completeness. All of the extent tree operation wrappers gain a
consistent name (ocfs2_et_*()).

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
ca12b7c48942d21b2e7890b820db9d578bc291cd 18-Aug-2008 Tao Ma <tao.ma@oracle.com> ocfs2: Optionally limit extent size in ocfs2_insert_extent()

In xattr bucket, we want to limit the maximum size of a btree leaf,
otherwise we'll lose the benefits of hashing because we'll have to search
large leaves.

So add a new field in ocfs2_extent_tree which indicates the maximum leaf cluster
size we want so that we can prevent ocfs2_insert_extent() from merging the leaf
record even if it is contiguous with an adjacent record.

Other btree types are not affected by this change.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
ba492615f0d32d0210b02c14b24512b4372b13d6 18-Aug-2008 Tao Ma <tao.ma@oracle.com> ocfs2: Add xattr index tree operations

When necessary, an ocfs2_xattr_block will embed an ocfs2_extent_list to
store large numbers of EAs. This patch adds a new type in
ocfs2_extent_tree_type and adds the implementation so that we can re-use the
b-tree code to handle the storage of many EAs.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
fdd77704a8b4666a32120fcd1e4a9fedaf3263d8 18-Aug-2008 Tiger Yang <tiger.yang@oracle.com> ocfs2: reserve inline space for extended attribute

Add the structures and helper functions we want for handling inline extended
attributes. We also update the inline-data handlers so that they properly
function in the event that we have both inline data and inline attributes
sharing an inode block.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
f56654c435c06f2b2bd5751889b1a08a3add7d6c 18-Aug-2008 Tao Ma <tao.ma@oracle.com> ocfs2: Add extent tree operation for xattr value btrees

Add some thin wrappers around ocfs2_insert_extent() for each of the 3
different btree types, ocfs2_dinode_insert_extent(),
ocfs2_xattr_value_insert_extent() and ocfs2_xattr_tree_insert_extent(). The
last is for the xattr index btree, which will be used in a followup patch.

All the old callers in file.c etc. will call ocfs2_dinode_insert_extent(),
while the other two handle the xattr issue. Initialization of the extent
tree is also handled by these functions.

When storing an xattr value which is too large, we will allocate some clusters
for it, and here ocfs2_extent_list and ocfs2_extent_rec will also be used. In
order to re-use the b-tree operation code, a new parameter named "private"
is added into ocfs2_extent_tree and it is used to indicate the root of
ocfs2_extent_list. The reason is that we can't deduce the root from the
buffer_head now. It may be in an inode, an ocfs2_xattr_block or even worse,
in any place in an ocfs2_xattr_bucket.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
0eb8d47e69a2211a36643b180f1843ef45f6017d 18-Aug-2008 Tao Ma <tao.ma@oracle.com> ocfs2: Make high level btree extend code generic

Factor out the non-inode specifics of ocfs2_do_extend_allocation() into a more generic
function, ocfs2_do_cluster_allocation(). ocfs2_do_extend_allocation calls
ocfs2_do_cluster_allocation() now, but the latter can be used for other
btree types as well.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
e7d4cb6bc19658646357eeff134645cd9bc3479f 18-Aug-2008 Tao Ma <tao.ma@oracle.com> ocfs2: Abstract ocfs2_extent_tree in b-tree operations.

In the old extent tree operation, we assume that we are using the
ocfs2_extent_list in ocfs2_dinode as the tree root. As xattr will
also use ocfs2_extent_list to store a large value for an xattr
entry, we refactor the tree operation so that xattr can use it
directly.

The refactoring includes 4 steps:
1. Abstract set/get of last_eb_blk and update_clusters since they may
be stored in different locations for dinode and xattr.
2. Add a new structure named ocfs2_extent_tree to indicate the
extent tree the operation will work on.
3. Remove all uses of fe_bh and di, and use root_bh and root_el in the
extent tree instead. So now all the fe_bh is replaced with
et->root_bh, el with root_el accordingly.
4. Make ocfs2_lock_allocators generic. Until now it could only be used
in file extend allocation. But the whole function is useful when we want
to store large EAs.

Note: This patch doesn't touch ocfs2_commit_truncate() since it is not used
for anything other than truncate inode data btrees.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
811f933df1e55615fd0bb4818f31e3868a8e6e23 18-Aug-2008 Tao Ma <tao.ma@oracle.com> ocfs2: Use ocfs2_extent_list instead of ocfs2_dinode.

ocfs2_extend_meta_needed(), ocfs2_calc_extend_credits() and
ocfs2_reserve_new_metadata() are all useful for extent tree operations. But
they are all limited to an inode btree because they use a struct
ocfs2_dinode parameter. Change their parameter to struct ocfs2_extent_list
(the part of an ocfs2_dinode they actually use) so that the xattr btree code
can use these functions.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
231b87d10920e024efaf0f9e86e1bab7bced1620 18-Aug-2008 Tao Ma <tao.ma@oracle.com> ocfs2: Modify ocfs2_num_free_extents for future xattr usage.

ocfs2_num_free_extents() is used to find the number of free extent records
in an inode btree. Hence, it takes an "ocfs2_dinode" parameter. We want to
use this for extended attribute trees in the future, so genericize the
interface to take a buffer head. A future patch will allow that buffer_head
to contain any structure rooting an ocfs2 btree.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
00dc417fa3e763345b34ccb6034d72de76eea0a1 03-Oct-2008 Mark Fasheh <mfasheh@suse.com> ocfs2: fiemap support

Plug ocfs2 into ->fiemap. Some portions of ocfs2_get_clusters() had to be
refactored so that the extent cache can be skipped in favor of going
directly to the on-disk records. This makes it easier for us to determine
which extent is the last one in the btree. Also, I'm not sure we want to be
caching fiemap lookups anyway as they're not directly related to data
read/write.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: ocfs2-devel@oss.oracle.com
Cc: linux-fsdevel@vger.kernel.org
9d8df6aa9b1ca74127b11537d91de492dbea666a 21-May-2008 Al Viro <viro@ftp.linux.org.uk> ocfs2 endianness fixes

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
b1f3550fa1471b691ad6c2f35b5b22e93eaa5855 05-Mar-2008 Julia Lawall <julia@diku.dk> ocfs2: Use BUG_ON

if (...) BUG(); should be replaced with BUG_ON(...) when the test has no
side-effects to allow a definition of BUG_ON that drops the code completely.

The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@ disable unlikely @ expression E,f; @@

(
if (<... f(...) ...>) { BUG(); }
|
- if (unlikely(E)) { BUG(); }
+ BUG_ON(E);
)

@@ expression E,f; @@

(
if (<... f(...) ...>) { BUG(); }
|
- if (E) { BUG(); }
+ BUG_ON(E);
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
4d0ddb2ce25db2254d468233d942276ecf40bff8 05-Mar-2008 Tao Ma <tao.ma@oracle.com> ocfs2: Add inode stealing for ocfs2_reserve_new_inode

Inode allocation is modified to look in other nodes' allocators during
extreme out of space situations. We retry our own slot when space is freed
back to the global bitmap, or whenever we've allocated more than 1024 inodes
from another slot.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
ad5a4d7093a76fa245e277e6f0f0e168a08aeff7 30-Jan-2008 Tao Ma <tao.ma@oracle.com> ocfs2: Enable cross extent block merge.

In ocfs2_figure_merge_contig_type, we judge whether there exists
a cross extent block merge and enable it by setting CONTIG_LEFT
and CONTIG_RIGHT accordingly.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
677b975282e48d1818df4181336307377d56b04e 30-Jan-2008 Tao Ma <tao.ma@oracle.com> ocfs2: Add support for cross extent block

In ocfs2_merge_rec_left, when we find the merge extent is "CONTIG_RIGHT"
with the first extent record of the next extent block, we will merge it to
the next extent block and change all the related extent blocks accordingly.

In ocfs2_merge_rec_right, when we find the merge extent is "CONTIG_LEFT"
with the last extent record of the previous extent block, we will merge
it to the previous extent block and change all the related extent blocks
accordingly.

As for CONTIG_LEFTRIGHT, we will handle CONTIG_RIGHT first so that when
the index is zero, the merge process will be more efficient and easier.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
eebd2aa355692afaf9906f62118620f1a1c19dbb 05-Feb-2008 Christoph Lameter <clameter@sgi.com> Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user

Simplify page cache zeroing of segments of pages through 3 functions

zero_user_segments(page, start1, end1, start2, end2)

Zeros two segments of the page. It takes the position where to
start and end the zeroing which avoids length calculations and
makes code clearer.

zero_user_segment(page, start, end)

Same for a single segment.

zero_user(page, start, length)

Length variant for the case where we know the length.
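
The relationship between the three variants can be sketched in plain
userspace C as below. This is not the kernel implementation: the
sketch_* names and SKETCH_PAGE_SIZE are hypothetical, the "page" is an
ordinary byte buffer, there is no kmap or cache flushing, and each end
offset is treated as exclusive.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Zero [start1, end1) and [start2, end2) of a page-sized buffer. */
static void sketch_zero_segments(unsigned char *page,
				 unsigned start1, unsigned end1,
				 unsigned start2, unsigned end2)
{
	assert(start1 <= end1 && end1 <= SKETCH_PAGE_SIZE);
	assert(start2 <= end2 && end2 <= SKETCH_PAGE_SIZE);

	if (end1 > start1)
		memset(page + start1, 0, end1 - start1);
	if (end2 > start2)
		memset(page + start2, 0, end2 - start2);
}

/* Single-segment variant: the second segment is empty. */
static void sketch_zero_segment(unsigned char *page,
				unsigned start, unsigned end)
{
	sketch_zero_segments(page, start, end, 0, 0);
}

/* Length variant: convert (start, size) into (start, end). */
static void sketch_zero(unsigned char *page, unsigned start, unsigned size)
{
	sketch_zero_segments(page, start, start + size, 0, 0);
}
```

Passing start/end pairs means a caller that already knows both
boundaries does not have to compute a length first.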

We remove the zero_user_page macro. Issues:

1. It's a macro. Inline functions are preferable.

2. The KM_USER0 macro is only defined for HIGHMEM.

Having to treat this special case everywhere makes the
code needlessly complex. The parameter for zeroing is always
KM_USER0 except in one single case that we open code.

Avoiding KM_USER0 means a lot of code no longer has to deal
with the special casing for HIGHMEM. Dealing with
kmap is only necessary for HIGHMEM configurations. In those
configurations we use KM_USER0 like we do for a series of other
functions defined in highmem.h.

Since KM_USER0 depends on HIGHMEM the existing zero_user_page
could not be an inline function. zero_user_* functions introduced
here can be inline because that constant is not used when these
functions are called.

Also extract the flushing of the caches to be outside of the kmap.

[akpm@linux-foundation.org: fix nfs and ntfs build]
[akpm@linux-foundation.org: fix ntfs build some more]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
c78bad11fbf1272ea021f56458025dc98486d6f4 03-Feb-2008 Joe Perches <joe@perches.com> fs/: Spelling fixes

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
e63aecb651ba73dffc62f9608ee1b7ae2a0ffd4b 19-Oct-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Rename ocfs2_meta_[un]lock

Call this the "inode_lock" now, since it covers both data and meta data.
This patch makes no functional changes.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
e8aed3450c0afd6fdb79ec233f806e3e69454dfe 04-Dec-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Re-journal buffers after transaction extend

ocfs2_extend_trans() might call journal_restart() which will commit dirty
buffers and then restart the transaction. This means that any buffers which
still need changes should be passed to journal_access() again. Some paths
during extend weren't doing this right.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
92295d8054289eff0d52b4d12349f9b9df0f58e4 04-Dec-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Don't panic when truncating an empty extent

This BUG_ON() was unintentionally left in after the sparse file support was
written.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
3cf0c507dd28de0e1a4c24304d806e6b3976f0f5 27-Oct-2007 Roel Kluin <12o3l@tiscali.nl> [PATCH] Fix priority mistakes in fs/ocfs2/{alloc.c, dlmglue.c}

Fixes priority mistakes similar to '!x & y'
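
A minimal sketch of this class of mistake, using a made-up flag
(FLAG_BUSY) and made-up helper names rather than anything from alloc.c
or dlmglue.c. In C, unary '!' binds tighter than binary '&', so
'!x & y' is parsed as '(!x) & y'.

```c
#define FLAG_BUSY 0x4u	/* hypothetical flag bit */

/*
 * What "!flags & FLAG_BUSY" actually means, written with explicit
 * parentheses. !flags is either 0 or 1, and neither value has bit 0x4
 * set, so the result is always 0.
 */
static int not_busy_wrong(unsigned flags)
{
	return (!flags) & FLAG_BUSY;
}

/* What was intended: test the bit first, then negate. */
static int not_busy_right(unsigned flags)
{
	return !(flags & FLAG_BUSY);
}
```

The mistaken form compiles cleanly in many configurations, which is why
such bugs can survive until someone audits for the pattern.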

Signed-off-by: Roel Kluin <12o3l@tiscali.nl>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
5b6a3a2b4a5f071d170f8122038dd647a84810a8 14-Sep-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Write support for directories with inline data

Create all new directories with OCFS2_INLINE_DATA_FL and the inline data
bytes formatted as an empty directory. Inode size field reflects the actual
amount of inline data available, which makes searching for dirent space
very similar to the regular directory search.

Inline-data directories are automatically pushed out to extents on any
insert request which is too large for the available space.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
1afc32b952335f665327a1a9001ba1b44bb76fd9 07-Sep-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Write support for inline data

This fixes up write, truncate, mmap, and RESVSP/UNRESVSP to understand inline
inode data.

For the most part, the changes to the core write code can be relied on to do
the heavy lifting. Any code calling ocfs2_write_begin (including shared
writeable mmap) can count on it doing the right thing with respect to
growing inline data to an extent tree.

Size reducing truncates, including UNRESVSP can simply zero that portion of
the inode block being removed. Size increasing truncates, including RESVSP
have to be a little bit smarter and grow the inode to an extent tree if
necessary.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
1d410a6e337a0d2d5543ad1d9bccb670a7a05312 07-Sep-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Small refactor of truncate zeroing code

We'll want to reuse most of this when pushing inline data back out to an
extent. Keeping this part as a separate patch helps to keep the upcoming
changes for write support uncluttered.

The core portion of ocfs2_zero_cluster_pages() responsible for making sure a
page is mapped and properly dirtied is abstracted out into its own
function, ocfs2_map_and_dirty_page(). Actual functionality doesn't change,
though zeroing becomes optional.

We also turn part of ocfs2_free_write_ctxt() into a common function for
unlocking and freeing a page array. This operation is very common (and
uniform) for Ocfs2 cluster sizes greater than page size, so it makes sense
to keep the code in one place.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
015452b15ff6c2d9fa1f82f28d61e7a66e2df86a 12-Sep-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Remove unused structure field

c_used_tail_recs in struct ocfs2_merge_ctxt is only ever set, so we can
remove it.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
518d7269f3c9129ae51d5f804edff998ab945a40 29-Aug-2007 Tao Mao <tao.ma@oracle.com> ocfs2: remove unused variable

delete_tail_recs in ocfs2_try_to_merge_extent() was only ever set, remove
it.

Signed-off-by: Tao Mao <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
c77534f6fb6d58e27d923b6825ea3b0ef485ab26 29-Aug-2007 Tao Mao <tao.ma@oracle.com> ocfs2: remove mostly unused field from insert structure

ocfs2_insert_type->ins_free_records was only used in one place, and was set
incorrectly in most places. We can free up some memory and lose some code by
removing this.

* Small warning fixup contributed by Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Tao Mao <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
e535e2efd295c3990bb9f654c8bb6bd176ebdc2b 31-Aug-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Fix calculation of i_blocks during truncate

We were setting i_blocks too early - before truncating any allocation.
Correct things to set i_blocks after the allocation change.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
6a18380e7ddd7d1a0493efe3be6475dd92323364 23-Jul-2007 Adrian Bunk <bunk@stusta.de> [2.6 patch] ocfs2_insert_extent(): remove dead code

This patch removes some now dead code.

Spotted by the Coverity checker.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
063c4561f52a74de686fe0ff2f96f4f54c9fecd2 03-Jul-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: support for removing file regions

Provide an internal interface for the removal of arbitrary file regions.

ocfs2_remove_inode_range() takes a byte range within a file and will remove
existing extents within that range. Partial clusters will be zeroed so that
any read from within the region will return zeros.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
35edec1d52c075975991471d624b33b9336226f2 06-Jul-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: update truncate handling of partial clusters

The partial cluster zeroing code used during truncate usually assumes that
the rightmost byte in the range to be zeroed lies on a cluster boundary.
This makes sense for truncate, but punching holes might require zeroing on
non-aligned rightmost boundaries.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
d0c7d7082ee1ec4f95ee57bf86ed39d1a27c4037 03-Jul-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: btree support for removal of arbitrary extents

Add code to the btree paths to support the removal of arbitrary regions
within an existing extent. With proper higher level support this can be used
to "punch holes" in a file. Truncate (a special case of hole punching) could
also be converted to use these methods.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2ae99a60374f360ba07037ebbf33d19b89ac43a6 10-Mar-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Support creation of unwritten extents

This can now be trivially supported with re-use of our existing extend code.

ocfs2_allocate_unwritten_extents() takes a start offset and a byte length
and iterates over the inode, adding extents (marked as unwritten) until len
is reached. Existing extents are skipped over.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
328d5752e1259dfb29b7e65f6c2d145fddbaa750 18-Jun-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: btree changes for unwritten extents

Writes to a region marked as unwritten might result in a record split or
merge. We can support splits by making minor changes to the existing insert
code. Merges require left rotations which mostly re-use right rotation
support functions.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
c3afcbb34426a9291e4c038540129053a72c3cd8 29-May-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: abstract btree growing calls

The top level calls and logic for growing a tree can easily be abstracted
out of ocfs2_insert_extent() into a separate function - ocfs2_grow_tree().

This allows future code to easily grow btrees when needed.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
1f6697d072e6fd0b332a4301c21060dcb89bd623 25-Jun-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: use all extent block suballocators

Now that we have a method to deallocate blocks from them, each node should
allocate extent blocks from their local suballocator file.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
59a5e416d1ab543a5248a2b34d83202c4d55d132 23-Jun-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: plug truncate into cached dealloc routines

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2b604351bc99b4e4504758cbac369b660b71de0b 23-Jun-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: simplify deallocation locking

Deallocation of suballocator blocks, most notably extent blocks, might
involve multiple suballocator inodes.

The locking for this can get extremely complicated, especially when the
suballocator inodes to delete from aren't known until deep within an
unrelated codepath.

Implement a simple scheme for recording the blocks to be unlinked so that
the actual deallocation can be done in a context which won't deadlock.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2e89b2e48e1da09ed483f195968c9172aa95b5e2 09-May-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: take ip_alloc_sem during entire truncate

Use of the alloc sem during truncate was too narrow - we want to protect
the i_size change and page truncation against mmap now.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
1ca1a111b1e6be843c9ce5245dcd570312998d94 28-Apr-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: fix sparse warnings in fs/ocfs2

None of these are actually harmful, but the noise makes looking for real
problems difficult.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
83418978827324918a8cd25ce5227312de1d4468 24-Apr-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Cache extent records

The extent map code was ripped out earlier because of an inability to deal
with holes. This patch adds back a simpler caching scheme requiring far less
code.

Our old extent map caching was designed back when meta data block caching in
Ocfs2 didn't work very well, resulting in many disk reads. These days our
metadata caching is much better, resulting in no unnecessary disk reads. As
a result, extent caching doesn't have to be as fancy, nor does it have to
cache as many extents. Keeping the last 3 extents seen should be sufficient
to give us a small performance boost on some streaming workloads.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
49cb8d2d496ce06869ccca2ab368ed6b0b5b979d 10-Mar-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Read from an unwritten extent returns zeros

Return an optional extent flags field from our lookup functions and wire up
callers to treat unwritten regions as holes for the purpose of returning
zeros to the user.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
e48edee2d8eab812f31f0ff62c6ba635ca2e1e21 08-Mar-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: make room for unwritten extents flag

Due to the size of our group bitmaps, we'll never have a leaf node extent
record with more than 16 bits worth of clusters. Split e_clusters up so that
leaf nodes can get a flags field where we can mark unwritten extents.
Interior nodes whose length references all the child nodes beneath it can't
split their e_clusters field, so we use a union to preserve sizing there.
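
A userspace sketch of the layout idea, with hypothetical struct, field
and flag names and host-endian integer types (the on-disk record uses
little-endian fields). The point is only that a 16-bit count, a
reserved byte and a flags byte can share the space of the original
32-bit count, so the record size does not change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_EXT_UNWRITTEN 0x01u	/* hypothetical "unwritten" flag */

/* Hypothetical mirror of an extent record; names are illustrative. */
struct sketch_extent_rec {
	uint32_t cpos;
	union {
		uint32_t int_clusters;		/* interior: all 32 bits */
		struct {
			uint16_t leaf_clusters;	/* leaf: 16 bits suffice */
			uint8_t  reserved;
			uint8_t  flags;
		} leaf;
	} u;
	uint64_t blkno;
};

/* The split view must occupy exactly the space of the 32-bit field. */
_Static_assert(sizeof(((struct sketch_extent_rec *)0)->u) == sizeof(uint32_t),
	       "leaf view must not grow the record");

static void sketch_rec_mark_unwritten(struct sketch_extent_rec *rec)
{
	assert(rec != NULL);
	rec->u.leaf.flags |= SKETCH_EXT_UNWRITTEN;
}

static int sketch_rec_is_unwritten(const struct sketch_extent_rec *rec)
{
	assert(rec != NULL);
	return (rec->u.leaf.flags & SKETCH_EXT_UNWRITTEN) != 0;
}
```

Code walking such a tree has to know whether it is looking at a leaf or
an interior node before choosing which view of the union to read.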

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
fa41045fcbf78269991d5aebb1820fc51534f05d 01-Mar-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Use do_sync_mapping_range() in ocfs2_zero_tail_for_truncate()

Do this instead of filemap_fdatawrite() - this way we sync only the
range between i_size and the cluster boundary.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
60b11392f1a09433740bda3048202213daa27736 16-Feb-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: zero tail of sparse files on truncate

Since we don't zero on extend anymore, truncate needs to be fixed up to zero
the part of a file between i_size and the end of its cluster. Otherwise a
subsequent extend could expose bad data.

This introduced a new helper, which can be used in ocfs2_write().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
3a0782d09c07aa3ec767ba6089cd15cfbfbfc508 17-Jan-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: teach extend/truncate about sparse files

For ocfs2_truncate_file(), we eliminate the "simple" truncate case which no
longer exists since i_size is not tied to i_clusters. In
ocfs2_extend_file(), we skip the allocation / page zeroing code for file
systems which understand sparse files.

The core truncate code is changed to do a bottom up tree traversal. This
gets abstracted out into its own function. To make things more readable,
most of the special case handling for in-inode extents from
ocfs2_do_truncate() is also removed.

Though write support for sparse files comes in a later patch, we at least
update ocfs2_prepare_inode_for_write() to skip allocation for sparse files.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
363041a5f74b953ab6b705ac9c88e5eda218a24b 17-Jan-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: temporarily remove extent map caching

The code in extent_map.c is not prepared to deal with a subtree being
rotated between lookups. This can happen when filling holes in sparse files.
Instead of a lengthy patch to update the code (which would likely lose the
benefit of caching subtree roots), we remove most of the algorithms and
implement a simple path based lookup. A less ambitious extent caching scheme
will be added in a later patch.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
dcd0538ff4e854fa9d7f4630b359ca8fdb5cb5a8 16-Jan-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: sparse b-tree support

Introduce tree rotations into the b-tree code. This will allow ocfs2 to
support sparse files. Much of the added code is designed to be generic (in
the ocfs2 sense) so that it can later be re-used to implement large
extended attributes.

This patch only adds the rotation code and does minimal updates to callers
of the extent api.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
cd86128088554d64fea1679191509f00e6353c5b 13-Dec-2006 Robert P. J. Day <rpjday@mindspring.com> [PATCH] Fix numerous kcalloc() calls, convert to kzalloc()

All kcalloc() calls of the form "kcalloc(1,...)" are converted to the
equivalent kzalloc() calls, and a few kcalloc() calls with the incorrect
ordering of the first two arguments are fixed.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Greg KH <greg@kroah.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1fabe1481fac9e01bf8bffa60a2307ef379aa5de 10-Oct-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Remove struct ocfs2_journal_handle in favor of handle_t

This is mostly a search and replace as ocfs2_journal_handle is now no more
than a container for a handle_t pointer.

ocfs2_commit_trans() becomes very straightforward, and we remove some out
of date comments / code.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
65eff9ccf86d63eb5c3e9071450a36e4e4fa9564 10-Oct-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: remove handle argument to ocfs2_start_trans()

All callers either pass in NULL directly, or a local variable that is
already set to NULL.

The internals of ocfs2_start_trans() get a nice cleanup as a result.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
02dc1af44e9fa4b8801169891b3a1ba4047537ad 10-Oct-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: pass ocfs2_super * into ocfs2_commit_trans()

This sets us up to remove handle->journal.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
4bcec1847ac4f75c2ee6d091b495f34d8d822e6a 10-Oct-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: remove unused handle argument from ocfs2_meta_lock_full()

Now that this is unused and all callers pass NULL, we can safely remove it.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
e08dc8b9808f06d412904db4d67434bf19984752 06-Oct-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: don't pass handle to ocfs2_meta_lock() in __ocfs2_flush_truncate_log()

Take and drop the locks directly.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
1fc581467e52546195c7ee8233a34d63c1cc1322 05-Oct-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: have ocfs2_extend_trans() take handle_t

No reason to use our wrapper struct in this function, so take the handle_t
directly.

Also fixes a bug where we were incorrectly setting the handle to NULL in
case of a failure from journal_restart().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
c4028958b6ecad064b1a6303a6a5906d4fe48d73 22-Nov-2006 David Howells <dhowells@redhat.com> WorkStruct: make allyesconfig

Fix up for make allyesconfig.

Signed-Off-By: David Howells <dhowells@redhat.com>
eb35746ca5e2211569b91ebb44d55b88ec91f3b0 09-Aug-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Remove overzealous BUG_ON()

The truncate code was never supposed to BUG() on an allocator it doesn't
know about, but rather to ignore it. Right now, this does nothing, but when
we change our allocation paths to use all suballocator files, this will
allow current versions of the fs module to work fine.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
b0697053f9e8de9cea3d510d9e290851ece9460b 03-Mar-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: don't use MLF* in the file system

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
1b1dcc1b57a49136f118a0f16367256ff9994a69 10-Jan-2006 Jes Sorensen <jes@sgi.com> [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem

This patch converts the inode semaphore to a mutex. I have tested it on
XFS and compiled as much as one can consider on an ia64. Anyway your
luck with it might be different.

Modified-by: Ingo Molnar <mingo@elte.hu>

(finished the conversion)

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
ccd979bdbce9fba8412beb3f1de68a9d0171b12c 15-Dec-2005 Mark Fasheh <mark.fasheh@oracle.com> [PATCH] OCFS2: The Second Oracle Cluster Filesystem

The OCFS2 file system module.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>