History log of /fs/xfs/xfs_trans.c
Revision Date Author Comments
2451337dd043901b5270b7586942abe564443e3d 25-Jun-2014 Dave Chinner <dchinner@redhat.com> xfs: global error sign conversion

Convert all the errors the core XFs code to negative error signs
like the rest of the kernel and remove all the sign conversion we
do in the interface layers.

Errors for conversion (and comparison) found via searches like:

$ git grep " E" fs/xfs
$ git grep "return E" fs/xfs
$ git grep " E[A-Z].*;$" fs/xfs

Negation points found via searches like:

$ git grep "= -[a-z,A-Z]" fs/xfs
$ git grep "return -[a-z,A-D,F-Z]" fs/xfs
$ git grep " -[a-z].*;" fs/xfs

[ with some bits I missed from Brian Foster ]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
b474c7ae4395ba684e85fde8f55c8cf44a39afaf 22-Jun-2014 Eric Sandeen <sandeen@sandeen.net> xfs: Nuke XFS_ERROR macro

XFS_ERROR was designed long ago to trap return values, but it's not
runtime configurable, it's not consistently used, and we can do
similar error trapping with ftrace scripts and triggers from
userspace.

Just nuke XFS_ERROR and associated bits.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
d99831ff393ff2e28d6110b41f24d9fecf986222 22-Jun-2014 Eric Sandeen <sandeen@sandeen.net> xfs: return is not a function

return is not a function. "return(EIO);" is silly;
"return (EIO);" moreso. return is not a function.
Nuke the pointless parens.

[dchinner: catch a couple of extra cases in xfs_attr_list.c,
xfs_acl.c and xfs_linux.h.]

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
e4a1e29cb0ace3a322c5c07d33dd1f4ab50dbbb8 14-Apr-2014 Eric Sandeen <sandeen@redhat.com> xfs: remove unused ail pointer arg from xfs_trans_ail_cursor_done()

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
c6f9726444c8f8c7df24950864bf1a4cb2c61b3e 07-Feb-2014 Jie Liu <jeff.liu@oracle.com> xfs: convert xfs_log_commit_cil() to void

Convert xfs_log_commit_cil() to a void function since it return nothing
but 0 in any case, after that we can simplify the relative code logic
in xfs_trans_commit() accordingly.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
a4fbe6ab1e7abecf42b75e9c73701ed33b4ab03b 23-Oct-2013 Dave Chinner <dchinner@redhat.com> xfs: decouple inode and bmap btree header files

Currently the xfs_inode.h header has a dependency on the definition
of the BMAP btree records as the inode fork includes an array of
xfs_bmbt_rec_host_t objects in it's definition.

Move all the btree format definitions from xfs_btree.h,
xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
xfs_format.h to continue the process of centralising the on-disk
format definitions. With this done, the xfs inode definitions are no
longer dependent on btree header files.

The enables a massive culling of unnecessary includes, with close to
200 #include directives removed from the XFS kernel code base.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
239880ef6454ccff2ba8d762c3f86e8278f0ce1c 23-Oct-2013 Dave Chinner <dchinner@redhat.com> xfs: decouple log and transaction headers

xfs_trans.h has a dependency on xfs_log.h for a couple of
structures. Most code that does transactions doesn't need to know
anything about the log, but this dependency means that they have to
include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
files and clean up the includes to be in dependency order.

In doing this, remove the direct include of xfs_trans_reserve.h from
xfs_trans.h so that we remove the dependency between xfs_trans.h and
xfs_mount.h. Hence the xfs_trans.h include can be moved to the
indicate the actual dependencies other header files have on it.

Note that these are kernel only header files, so this does not
translate to any userspace changes at all.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
5706278758e334bf6a15f57c18dc16df19c83957 15-Oct-2013 Dave Chinner <dchinner@redhat.com> xfs: unify directory/attribute format definitions

The on-disk format definitions for the directory and attribute
structures are spread across 3 header files right now, only one of
which is dedicated to defining on-disk structures and their
manipulation (xfs_dir2_format.h). Pull all the format definitions
into a single header file - xfs_da_format.h - and switch all the
code over to point at that.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
70a9883c5f34b215b8a77665cefd0398edc5a9ef 23-Oct-2013 Dave Chinner <dchinner@redhat.com> xfs: create a shared header file for format-related information

All of the buffer operations structures are needed to be exported
for xfs_db, so move them all to a common location rather than
spreading them all over the place. They are verifying the on-disk
format, so while xfs_format.h might be a good place, it is not part
of the on disk format.

Hence we need to create a new header file that we centralise these
related definitions. Start by moving the bffer operations
structures, and then also move all the other definitions that have
crept into xfs_log_format.h and xfs_format.h as there was no other
shared header file to put them in.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
904c17e6832845cc651a4d5108a7d57eacdb61f7 28-Aug-2013 Dave Chinner <dchinner@redhat.com> xfs: finish removing IOP_* macros.

In optimising the CIL operations, some of the IOP_* macros for
calling log item operations were removed. Remove the rest of them as
Christoph requested.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Geoffrey Wehrman <gwehrman@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
3d3c8b5222b92447bffaa4127ee18c757f32a460 12-Aug-2013 Jie Liu <jeff.liu@oracle.com> xfs: refactor xfs_trans_reserve() interface

With the new xfs_trans_res structure has been introduced, the log
reservation size, log count as well as log flags are pre-initialized
at mount time. So it's time to refine xfs_trans_reserve() interface
to be more neat.

Also, introduce a new helper M_RES() to return a pointer to the
mp->m_resv structure to simplify the input.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
0eadd1028858b193ce8cdc36bf443d39b42141ca 12-Aug-2013 Jie Liu <jeff.liu@oracle.com> xfs: Introduce a new structure to hold transaction reservation items

Introduce a new structure xfs_trans_res to hold transaction
reservation item info per log ticket.

We also need to improve xfs_trans_resv_calc() by initializing the
log count as well as log flags for permanent log reservation.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
7fd36c4418ee86712db6871ac95ab23743224bff 12-Aug-2013 Dave Chinner <dchinner@redhat.com> xfs: split out transaction reservation code

The transaction reservation size calculations is used by both kernel
and userspace, but most of the transaction code in xfs_trans.c is
kernel specific. Split all the transaction reservation code out into
it's own files to make sharing with userspace simpler. This just
leaves kernel-only definitions in xfs_trans.h, so it doesn't need to
be shared with userspace anymore, either.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2a3c0acc351baf1e5e584d8d0692f41d5067582c 12-Aug-2013 Dave Chinner <dchinner@redhat.com> xfs: split out on-disk transaction definitions

There's a bunch of definitions in xfs_trans.h that define on-disk
formats - transaction headers that get written into the log, log
item type definitions, etc. Split out everything into a separate
file so that all which remains in xfs_trans.h are kernel only
definitions.

Also, remove the duplicate magic number definitions for
XFS_TRANS_MAGIC...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
6ca1c9063d1952b20c61136e01e6a8987371616b 12-Aug-2013 Dave Chinner <dchinner@redhat.com> xfs: separate dquot on disk format definitions out of xfs_quota.h

The on disk format definitions of the on-disk dquot, log formats and
quota off log formats are all intertwined with other definitions for
quotas. Separate them out into their own header file so they can
easily be shared with userspace.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
b8402b4729495ac719a3f532c2e33ac653b222a8 27-Jun-2013 Dave Chinner <david@fromorbit.com> xfs: Inode create transaction reservations

Define the log and space transaction sizes. Factor the current
create log reservation macro into the two logical halves and reuse
one half for the new icreate transactions. The icreate transaction
is transparent to all the high level create code - the
pre-calculated reservations will correctly set the reservations
dependent on whether the filesystem supports the icreate
transaction.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
a21cd503678286c56b1d0cca1c99349a4aa042f4 28-Jan-2013 Jeff Liu <jeff.liu@oracle.com> xfs: refactor space log reservation for XFS_TRANS_ATTR_SET

Currently, we calculate the attribute set transaction
log space reservation at runtime in two parts:

1) XFS_ATTRSET_LOG_RES() which is calcuated out at mount time.

2) ((ext * (mp)->m_sb.sb_sectsize) + \
(ext * XFS_FSB_TO_B((mp), XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK))) + \
(128 * (ext + (ext * XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK))))))
which is calculated out at runtime since it depend on the given extent length in blocks.

This patch renamed XFS_ATTRSET_LOG_RES(mp) to XFS_ATTRSETM_LOG_RES(mp) to indicate
that it is figured out at mount time. Introduce XFS_ATTRSETRT_LOG_RES(mp) which would
be used to calculate out the unit of the log space reservation for one block.

In this way, the total runtime space for the given extent length can be figured out by:
XFS_ATTRSETM_LOG_RES(mp) + XFS_ATTRSETRT_LOG_RES(mp) * ext

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
a7bd794a0f489a66ad595f2bcab0eac8f232e409 28-Jan-2013 Jeff Liu <jeff.liu@oracle.com> xfs: introduce XFS_SB_LOG_RES() for transactions that modify sb on disk

Introduce a new transaction space reservation XFS_SB_LOG_RES() for
those transactions that need to modify the superblock on disk.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
762d7ba657ed4a0934b4da7dcef058012f252e0f 28-Jan-2013 Jeff Liu <jeff.liu@oracle.com> xfs: calculate XFS_TRANS_QM_QUOTAOFF_END space log reservation at mount time

Convert the calculation for end of quotaoff log space reservation
from runtime to mount time.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
a1bd9557544d59140c4ac87fe405069b9e1aaf99 28-Jan-2013 Jeff Liu <jeff.liu@oracle.com> xfs: calculate XFS_TRANS_QM_QUOTAOFF space log reservation at mount time

Convert the calculation of quota off transaction log space reservation
from runtime to mount time.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
4800104438a4467ffa5ae1e51d5a59c0f64e5f9a 28-Jan-2013 Jeff Liu <jeff.liu@oracle.com> xfs: calculate XFS_TRANS_QM_DQALLOC space log reservation at mount time

The disk quota allocation log space reservation is calcuated at runtime,
this patch does it at mount time.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
f0f2df94faca43fd26f85af7e83df240777c8c37 28-Jan-2013 Jeff Liu <jeff.liu@oracle.com> xfs: calcuate XFS_TRANS_QM_SETQLIM space log reservation at mount time

For adjusting quota limits transactions, we calculate out the log space
reservation at runtime, this patch does it at mount time.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
b0c10b983a3e5cc35f239999df1b8bad1ba5b8f6 28-Jan-2013 Jeff Liu <jeff.liu@oracle.com> xfs: calculate XFS_TRANS_QM_SBCHANGE space log reservation at mount time

The transaction log space for clearing/reseting the quota flags
is calculated out at runtime, this patch can figure it out at
mount time.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
5b292ae3a951a58e32119d73c7ac8f5bec7395a3 01-Feb-2013 Jeff Liu <jeff.liu@oracle.com> xfs: make use of xfs_calc_buf_res() in xfs_trans.c

Refining the existing reservations with xfs_calc_buf_res() in xfs_trans.c

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
4f3b57832ba39223c6f8823d07b9fb206e282ced 28-Jan-2013 Jeff Liu <jeff.liu@oracle.com> xfs: add a helper to figure out the space log reservation per item

Add a new helper xfs_calc_buf_res() to calcuate out the transaction space
reservations per item. xfs_buf_log_overhead() is used to figure out the
extra space for struct xfs_buf_log_format that gets written into the log
for every buffer as well as a log opheader, i.e. struct xlog_op_header.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
CC: Dave Chinner <david@fromorbit.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
d9457dc056249913a7abe8b71dc09e427e590e35 12-Jun-2012 Jan Kara <jack@suse.cz> xfs: Convert to new freezing code

Generic code now blocks all writers from standard write paths. So we add
blocking of all writers coming from ioctl (we get a protection of ioctl against
racing remount read-only as a bonus) and convert xfs_file_aio_write() to a
non-racy freeze protection. We also keep freeze protection on transaction
start to block internal filesystem writes such as removal of preallocated
blocks.

CC: Ben Myers <bpm@sgi.com>
CC: Alex Elder <elder@kernel.org>
CC: xfs@oss.sgi.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
77ba78776e90e8de541f13b326e284c74286252f 02-Apr-2012 Al Viro <viro@zeniv.linux.org.uk> xfs: switch to proper __bitwise type for KM_... flags

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
ad1e95c54eb3980ab2b4683fba29ad0ef954ec51 23-Apr-2012 Dave Chinner <dchinner@redhat.com> xfs: clean up xfs_bit.h includes

With the removal of xfs_rw.h and other changes over time, xfs_bit.h
is being included in many files that don't actually need it. Clean
up the includes as necessary.

Also move the only-used-once xfs_ialloc_find_free() static inline
function out of a header file that is widely included to reduce
the number of needless dependencies on xfs_bit.h.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
4ecbfe637cbcc0f093d1f295ef483f4e31e3987b 29-Apr-2012 Dave Chinner <dchinner@redhat.com> xfs: clean up busy extent naming

Now that the busy extent tracking has been moved out of the
allocation files, clean up the namespace it uses to
"xfs_extent_busy" rather than a mix of "xfs_busy" and
"xfs_alloc_busy".

Signed-off-by: Dave Chinner<dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
efc27b52594e322d4c94e379489fa3690bf74739 29-Apr-2012 Dave Chinner <dchinner@redhat.com> xfs: move busy extent handling to it's own file

To make it easier to handle userspace code merges, move all the busy
extent handling out of the allocation code and into it's own file.
The userspace code does not need the busy extent code, so this
simplifies the merging of the kernel code into the userspace
xfsprogs library.

Because the busy extent code has been almost completely rewritten
over the past couple of years, also update the copyright on this new
file to include the authors that made all those changes.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
60a34607b26b60d6b5c5c928ede7fc84b0f06b85 23-Apr-2012 Dave Chinner <dchinner@redhat.com> xfs: move xfsagino_t to xfs_types.h

Untangle the header file includes a bit by moving the definition of
xfs_agino_t to xfs_types.h. This removes the dependency that xfs_ag.h has on
xfs_inum.h, meaning we don't need to include xfs_inum.h everywhere we include
xfs_ag.h.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
9006fb91cfdf22812923f0536c7531c429c1aeab 20-Feb-2012 Christoph Hellwig <hch@infradead.org> xfs: split and cleanup xfs_log_reserve

Split the log regrant case out of xfs_log_reserve into a separate function,
and merge xlog_grant_log_space and xlog_regrant_write_log_space into their
respective callers. Also replace the XFS_LOG_PERM_RESERV flag, which easily
got misused before the previous cleanups with a simple boolean parameter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
f65020a83ad570c1788f7d8ece67f3487166576b 13-Feb-2012 Jesper Juhl <jj@chaosbits.net> XFS: xfs_trans_add_item() - don't assign in ASSERT() when compare is intended

It looks to me like the two ASSERT()s in xfs_trans_add_item() really
want to do a compare (==) rather than assignment (=).
This patch changes it from the latter to the former.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 05293485a0b6b1f803e8a3c0ff188c38f6969985)
05293485a0b6b1f803e8a3c0ff188c38f6969985 13-Feb-2012 Jesper Juhl <jj@chaosbits.net> XFS: xfs_trans_add_item() - don't assign in ASSERT() when compare is intended

It looks to me like the two ASSERT()s in xfs_trans_add_item() really
want to do a compare (==) rather than assignment (=).
This patch changes it from the latter to the former.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Ben Myers <bpm@sgi.com>
b39342134a6ec72778ffc2ddbd3c0faa10c64676 06-Dec-2011 Christoph Hellwig <hch@infradead.org> xfs: remove the lid_size field in struct log_item_desc

Outside the now removed nodelaylog code this field is only used for
asserts and can be safely removed now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
0244b9603df38bf19155b761689e1a816fc50b0a 06-Dec-2011 Christoph Hellwig <hch@infradead.org> xfs: cleanup the transaction commit path a bit

Now that the nodelaylog mode is gone we can simplify the transaction commit
path a bit by removing the xfs_trans_commit_cil routine. Restoring the
process flags is merged into xfs_trans_commit which already does it for
the error path, and allocating the log vectors is merged into
xlog_cil_format_items, which already fills them with data, thus avoiding
one loop over all log items.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
93b8a5854f247138e401471a9c3b82ccb62ff608 06-Dec-2011 Christoph Hellwig <hch@infradead.org> xfs: remove the deprecated nodelaylog option

The delaylog mode has been the default for a long time, and the nodelaylog
option has been scheduled for removal in Linux 3.3. Remove it and code
only used by it now that we have opened the 3.3 window.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
ddc3415aba1cb2f86d1fcad720cea834ee178f54 19-Sep-2011 Christoph Hellwig <hch@infradead.org> xfs: simplify xfs_trans_ijoin* again

There is no reason to keep a reference to the inode even if we unlock
it during transaction commit because we never drop a reference between
the ijoin and commit. Also use this fact to merge xfs_trans_ijoin_ref
back into xfs_trans_ijoin - the third argument decides if an unlock
is needed now.

I'm actually starting to wonder if allowing inodes to be unlocked
at transaction commit really is worth the effort. The only real
benefit is that they can be unlocked earlier when commiting a
synchronous transactions, but that could be solved by doing the
log force manually after the unlock, too.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
b10370585349d364ff3c550afa7922e6e21f029d 19-Sep-2011 Christoph Hellwig <hch@infradead.org> xfs: unlock the inode before log force in xfs_fsync

Only read the LSN we need to push to with the ilock held, and then release
it before we do the log force to improve concurrency.

This also removes the only direct caller of _xfs_trans_commit, thus
allowing it to be merged into the plain xfs_trans_commit again.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
1d8c95a363bf8cd4d4182dd19c01693b635311c2 18-Jul-2011 Dave Chinner <dchinner@redhat.com> xfs: use a cursor for bulk AIL insertion

Delayed logging can insert tens of thousands of log items into the
AIL at the same LSN. When the committing of log commit records
occur, we can get insertions occurring at an LSN that is not at the
end of the AIL. If there are thousands of items in the AIL on the
tail LSN, each insertion has to walk the AIL to find the correct
place to insert the new item into the AIL. This can consume large
amounts of CPU time and block other operations from occurring while
the traversals are in progress.

To avoid this repeated walk, use a AIL cursor to record
where we should be inserting the new items into the AIL without
having to repeat the walk. The cursor infrastructure already
provides this functionality for push walks, so is a simple extension
of existing code. While this will not avoid the initial walk, it
will avoid repeating it tens of thousands of times during a single
checkpoint commit.

This version includes logic improvements from Christoph Hellwig.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
b2ce39740066604288876c752d8170b3b17a21aa 11-Jul-2011 Alex Elder <aelder@sgi.com> Revert "xfs: fix filesystsem freeze race in xfs_trans_alloc"

This reverts commit 7a249cf83da1813cfa71cfe1e265b40045eceb47.

That commit created a situation that could lead to a filesystem
hang. As Dave Chinner pointed out, xfs_trans_alloc() could hold a
reference to m_active_trans (i.e., keep it non-zero) and then wait
for SB_FREEZE_TRANS to complete. Meanwhile a filesystem freeze
request could set SB_FREEZE_TRANS and then wait for m_active_trans
to drop to zero. Nobody benefits from this sequence of events...

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
7a249cf83da1813cfa71cfe1e265b40045eceb47 08-Jul-2011 Christoph Hellwig <hch@lst.de> xfs: fix filesystsem freeze race in xfs_trans_alloc

As pointed out by Jan xfs_trans_alloc can race with a concurrent filesystem
freeze when it sleeps during the memory allocation. Fix this by moving the
wait_for_freeze call after the memory allocation. This means moving the
freeze into the low-level _xfs_trans_alloc helper, which thus grows a new
argument. Also fix up some comments in that area while at it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <david@fromorbit.com>
1316d4da3f632d5843d5a446203e73067dc40f09 04-Jul-2011 Dave Chinner <dchinner@redhat.com> xfs: unpin stale inodes directly in IOP_COMMITTED

When inodes are marked stale in a transaction, they are treated
specially when the inode log item is being inserted into the AIL.
It tries to avoid moving the log item forward in the AIL due to a
race condition with the writing the underlying buffer back to disk.
The was "fixed" in commit de25c18 ("xfs: avoid moving stale inodes
in the AIL").

To avoid moving the item forward, we return a LSN smaller than the
commit_lsn of the completing transaction, thereby trying to trick
the commit code into not moving the inode forward at all. I'm not
sure this ever worked as intended - it assumes the inode is already
in the AIL, but I don't think the returned LSN would have been small
enough to prevent moving the inode. It appears that the reason it
worked is that the lower LSN of the inodes meant they were inserted
into the AIL and flushed before the inode buffer (which was moved to
the commit_lsn of the transaction).

The big problem is that with delayed logging, the returning of the
different LSN means insertion takes the slow, non-bulk path. Worse
yet is that insertion is to a position -before- the commit_lsn so it
is doing a AIL traversal on every insertion, and has to walk over
all the items that have already been inserted into the AIL. It's
expensive.

To compound the matter further, with delayed logging inodes are
likely to go from clean to stale in a single checkpoint, which means
they aren't even in the AIL at all when we come across them at AIL
insertion time. Hence these were all getting inserted into the AIL
when they simply do not need to be as inodes marked XFS_ISTALE are
never written back.

Transactional/recovery integrity is maintained in this case by the
other items in the unlink transaction that were modified (e.g. the
AGI btree blocks) and committed in the same checkpoint.

So to fix this, simply unpin the stale inodes directly in
xfs_inode_item_committed() and return -1 to indicate that the AIL
insertion code does not need to do any further processing of these
inodes.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
e84661aa84e2e003738563f65155d4f12dc474e7 20-May-2011 Christoph Hellwig <hch@infradead.org> xfs: add online discard support

Now that we have reliably tracking of deleted extents in a
transaction we can easily implement "online" discard support
which calls blkdev_issue_discard once a transaction commits.

The actual discard is a two stage operation as we first have
to mark the busy extent as not available for reuse before we
can start the actual discard. Note that we don't bother
supporting discard for the non-delaylog mode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
8a072a4d4c6a5b6ec32836c467d2996393c76c6f 24-Apr-2011 Christoph Hellwig <hch@infradead.org> xfs: reduce the number of pagb_lock roundtrips in xfs_alloc_clear_busy

Instead of finding the per-ag and then taking and releasing the pagb_lock
for every single busy extent completed sort the list of busy extents and
only switch betweens AGs where nessecary. This becomes especially important
with the online discard support which will hit this lock more often.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
c6f990d1ff8e4e53b12f4175eb7d7ea710c3ca73 27-Jan-2011 Dave Chinner <dchinner@redhat.com> xfs: handle CIl transaction commit failures correctly

Failure to commit a transaction into the CIL is not handled
correctly. This currently can only happen when racing with a
shutdown and requires an explicit shutdown check, so it rare and can
be avoided. Remove the shutdown check and make the CIL commit a void
function to indicate it will always succeed, thereby removing the
incorrectly handled failure case.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
e34a314c5e49fe6b763568f6576b19f1299c33c2 26-Jan-2011 Dave Chinner <dchinner@redhat.com> xfs: fix efi item leak on forced shutdown

After test 139, kmemleak shows:

unreferenced object 0xffff880078b405d8 (size 400):
comm "xfs_io", pid 4904, jiffies 4294909383 (age 1186.728s)
hex dump (first 32 bytes):
60 c1 17 79 00 88 ff ff 60 c1 17 79 00 88 ff ff `..y....`..y....
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff81afb04d>] kmemleak_alloc+0x2d/0x60
[<ffffffff8115c6cf>] kmem_cache_alloc+0x13f/0x2b0
[<ffffffff814aaa97>] kmem_zone_alloc+0x77/0xf0
[<ffffffff814aab2e>] kmem_zone_zalloc+0x1e/0x50
[<ffffffff8147cd6b>] xfs_efi_init+0x4b/0xb0
[<ffffffff814a4ee8>] xfs_trans_get_efi+0x58/0x90
[<ffffffff81455fab>] xfs_bmap_finish+0x8b/0x1d0
[<ffffffff814851b4>] xfs_itruncate_finish+0x2c4/0x5d0
[<ffffffff814a970f>] xfs_setattr+0x8df/0xa70
[<ffffffff814b5c7b>] xfs_vn_setattr+0x1b/0x20
[<ffffffff8117dc00>] notify_change+0x170/0x2e0
[<ffffffff81163bf6>] do_truncate+0x66/0xa0
[<ffffffff81163d0b>] sys_ftruncate+0xdb/0xe0
[<ffffffff8103a002>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

The cause of the leak is that the "remove" parameter of IOP_UNPIN()
is never set when a CIL push is aborted. This means that the EFI
item is never freed if it was in the push being cancelled. The
problem is specific to delayed logging, but has uncovered a couple
of problems with the handling of IOP_UNPIN(remove).

Firstly, we cannot safely call xfs_trans_del_item() from IOP_UNPIN()
in the CIL commit failure path or the iclog write failure path
because for delayed loging we have no transaction context. Hence we
must only call xfs_trans_del_item() if the log item being unpinned
has an active log item descriptor.

Secondly, xfs_trans_uncommit() does not handle log item descriptor
freeing during the traversal of log items on a transaction. It can
reference a freed log item descriptor when unpinning an EFI item.
Hence it needs to use a safe list traversal method to allow items to
be removed from the transaction during IOP_UNPIN().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
1884bd8354c9aec4ca501dc4773c13ad2a09af7b 25-Dec-2010 Jesper Juhl <jj@chaosbits.net> xfs: fix an assignment within an ASSERT()

In fs/xfs/xfs_trans.c::xfs_trans_unreserve_and_mod_sb() at the out:
label we have this:
ASSERT(error = 0);
I believe a comparison was intended, not an assignment. If I'm
right, the patch below fixes that up.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Alex Elder <aelder@sgi.com>
0e57f6a36f9be03e5abb755f524ee91c4aebe854 19-Dec-2010 Dave Chinner <dchinner@redhat.com> xfs: bulk AIL insertion during transaction commit

When inserting items into the AIL from the transaction committed
callbacks, we take the AIL lock for every single item that is to be
inserted. For a CIL checkpoint commit, this can be tens of thousands
of individual inserts, yet almost all of the items will be inserted
at the same point in the AIL because they have the same index.

To reduce the overhead and contention on the AIL lock for such
operations, introduce a "bulk insert" operation which allows a list
of log items with the same LSN to be inserted in a single operation
via a list splice. To do this, we need to pre-sort the log items
being committed into a temporary list for insertion.

The complexity is that not every log item will end up with the same
LSN, and not every item is actually inserted into the AIL. Items
that don't match the commit LSN will be inserted and unpinned as per
the current one-at-a-time method (relatively rare), while items that
are not to be inserted will be unpinned and freed immediately. Items
that are to be inserted at the given commit lsn are placed in a
temporary array and inserted into the AIL in bulk each time the
array fills up.

As a result of this, we trade off AIL hold time for a significant
reduction in traffic. lock_stat output shows that the worst case
hold time is unchanged, but contention from AIL inserts drops by an
order of magnitude and the number of lock traversal decreases
significantly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4957a449a1bce2f5095f57f84114dc038a8f08d5 06-Oct-2010 Christoph Hellwig <hch@infradead.org> xfs: fix the xfs_trans_committed

Use the correct prototype for xfs_trans_committed instead of casting it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
dfe188d4283752086d48380cde40d9801c318667 06-Oct-2010 Christoph Hellwig <hch@infradead.org> xfs: remove unused t_callback field in struct xfs_trans

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
1b0407125f9a5be63e861eb27c8af9e32f20619c 30-Sep-2010 Christoph Hellwig <hch@infradead.org> xfs: do not use xfs_mod_incore_sb_batch for per-cpu counters

Update the per-cpu counters manually in xfs_trans_unreserve_and_mod_sb
and remove support for per-cpu counters from xfs_mod_incore_sb_batch
to simplify it. And added benefit is that we don't have to take
m_sb_lock for transactions that only modify per-cpu counters.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
96540c78583a417113df4d027e6b68a595ab9a09 30-Sep-2010 Christoph Hellwig <hch@infradead.org> xfs: do not use xfs_mod_incore_sb for per-cpu counters

Export xfs_icsb_modify_counters and always use it for modifying
the per-cpu counters. Remove support for per-cpu counters from
xfs_mod_incore_sb to simplify it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
d17c701ce6a548a92f7f8a3cec20299465f36ee3 24-Aug-2010 Dave Chinner <dchinner@redhat.com> xfs: unlock items before allowing the CIL to commit

When we commit a transaction using delayed logging, we need to
unlock the items in the transaciton before we unlock the CIL context
and allow it to be checkpointed. If we unlock them after we release
the CIl context lock, the CIL can checkpoint and complete before
we free the log items. This breaks stale buffer item unlock and
unpin processing as there is an implicit assumption that the unlock
will occur before the unpin.

Also, some log items need to store the LSN of the transaction commit
in the item (inodes and EFIs) and so can race with other transaction
completions if we don't prevent the CIL from checkpointing before
the unlock occurs.

Cc: <stable@kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
438697064aaa2f64e0fcc6586582a3e7ec36005b 20-Jul-2010 Dave Chinner <dchinner@redhat.com> xfs: fix xfs_trans_add_item() lockdep warnings

xfs_trans_add_item() is called with ip->i_ilock held, which means it
is unsafe for memory reclaim to recurse back into the filesystem
(ilock is required in writeback). Hence the allocation needs to be
KM_NOFS to avoid recursion.

Lockdep report indicating memory allocation being called with the
ip->i_ilock held is as follows:

[ 1749.866796] =================================
[ 1749.867788] [ INFO: inconsistent lock state ]
[ 1749.868327] 2.6.35-rc3-dgc+ #25
[ 1749.868741] ---------------------------------
[ 1749.868741] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
[ 1749.868741] dd/2835 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 1749.868741] (&(&ip->i_lock)->mr_lock){++++?.}, at: [<ffffffff813170fb>] xfs_ilock+0x10b/0x190
[ 1749.868741] {IN-RECLAIM_FS-W} state was registered at:
[ 1749.868741] [<ffffffff810b3a97>] __lock_acquire+0x437/0x1450
[ 1749.868741] [<ffffffff810b4b56>] lock_acquire+0xa6/0x160
[ 1749.868741] [<ffffffff810a20b5>] down_write_nested+0x65/0xb0
[ 1749.868741] [<ffffffff813170fb>] xfs_ilock+0x10b/0x190
[ 1749.868741] [<ffffffff8134e819>] xfs_reclaim_inode+0x99/0x310
[ 1749.868741] [<ffffffff8134f56b>] xfs_inode_ag_walk+0x8b/0x150
[ 1749.868741] [<ffffffff8134f6bb>] xfs_inode_ag_iterator+0x8b/0xf0
[ 1749.868741] [<ffffffff8134f7a8>] xfs_reclaim_inode_shrink+0x88/0x90
[ 1749.868741] [<ffffffff81119d07>] shrink_slab+0x137/0x1a0
[ 1749.868741] [<ffffffff8111bbe1>] balance_pgdat+0x421/0x6a0
[ 1749.868741] [<ffffffff8111bf7d>] kswapd+0x11d/0x320
[ 1749.868741] [<ffffffff8109ce56>] kthread+0x96/0xa0
[ 1749.868741] [<ffffffff81035de4>] kernel_thread_helper+0x4/0x10
[ 1749.868741] irq event stamp: 4234335
[ 1749.868741] hardirqs last enabled at (4234335): [<ffffffff81147d25>] kmem_cache_free+0x115/0x220
[ 1749.868741] hardirqs last disabled at (4234334): [<ffffffff81147c4d>] kmem_cache_free+0x3d/0x220
[ 1749.868741] softirqs last enabled at (4233112): [<ffffffff81084dd2>] __do_softirq+0x142/0x260
[ 1749.868741] softirqs last disabled at (4233095): [<ffffffff81035edc>] call_softirq+0x1c/0x50
[ 1749.868741]
[ 1749.868741] other info that might help us debug this:
[ 1749.868741] 2 locks held by dd/2835:
[ 1749.868741] #0: (&(&ip->i_iolock)->mr_lock#2){+.+.+.}, at: [<ffffffff81316edd>] xfs_ilock_nowait+0xed/0x200
[ 1749.868741] #1: (&(&ip->i_lock)->mr_lock){++++?.}, at: [<ffffffff813170fb>] xfs_ilock+0x10b/0x190
[ 1749.868741]
[ 1749.868741] stack backtrace:
[ 1749.868741] Pid: 2835, comm: dd Not tainted 2.6.35-rc3-dgc+ #25
[ 1749.868741] Call Trace:
[ 1749.868741] [<ffffffff810b1faa>] print_usage_bug+0x18a/0x190
[ 1749.868741] [<ffffffff8104264f>] ? save_stack_trace+0x2f/0x50
[ 1749.868741] [<ffffffff810b2400>] ? check_usage_backwards+0x0/0xf0
[ 1749.868741] [<ffffffff810b2f11>] mark_lock+0x331/0x400
[ 1749.868741] [<ffffffff810b3047>] mark_held_locks+0x67/0x90
[ 1749.868741] [<ffffffff810b3111>] lockdep_trace_alloc+0xa1/0xe0
[ 1749.868741] [<ffffffff81147419>] kmem_cache_alloc+0x39/0x1e0
[ 1749.868741] [<ffffffff8133f954>] kmem_zone_alloc+0x94/0xe0
[ 1749.868741] [<ffffffff8133f9be>] kmem_zone_zalloc+0x1e/0x50
[ 1749.868741] [<ffffffff81335f02>] xfs_trans_add_item+0x72/0xb0
[ 1749.868741] [<ffffffff81339e41>] xfs_trans_ijoin+0xa1/0xd0
[ 1749.868741] [<ffffffff81319f82>] xfs_itruncate_finish+0x312/0x5d0
[ 1749.868741] [<ffffffff8133cb87>] xfs_free_eofblocks+0x227/0x280
[ 1749.868741] [<ffffffff8133cd18>] xfs_release+0x138/0x190
[ 1749.868741] [<ffffffff813464c5>] xfs_file_release+0x15/0x20
[ 1749.868741] [<ffffffff81150ebf>] fput+0x13f/0x260
[ 1749.868741] [<ffffffff8114d8c2>] filp_close+0x52/0x80
[ 1749.868741] [<ffffffff8114d9a9>] sys_close+0xb9/0x120
[ 1749.868741] [<ffffffff81034ff2>] system_call_fastpath+0x16/0x1b

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
898621d5a72c6799a9a13fce20443b4b6699899c 24-Jun-2010 Christoph Hellwig <hch@infradead.org> xfs: simplify inode to transaction joining

Currently we need to either call IHOLD or xfs_trans_ihold on an inode when
joining it to a transaction via xfs_trans_ijoin.

This patches instead makes xfs_trans_ijoin usable on it's own by doing
an implicity xfs_trans_ihold, which also allows us to drop the third
argument. For the case where we want to hold a reference on the inode
a xfs_trans_ijoin_ref wrapper is added which does the IHOLD and marks
the inode for needing an xfs_iput. In addition to the cleaner interface
to the caller this also simplifies the implementation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
9412e3181c0ef82efc3d8e88d73e583ec10c34e9 23-Jun-2010 Christoph Hellwig <hch@infradead.org> xfs: merge iop_unpin_remove into iop_unpin

The unpin_remove item operation instances always share most of the
implementation with the respective unpin implementation. So instead
of keeping two different entry points add a remove flag to the unpin
operation and share the code more easily.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
e98c414f9a3134fe7efc56ef8f1d394b54bfd40e 23-Jun-2010 Christoph Hellwig <hch@infradead.org> xfs: simplify log item descriptor tracking

Currently we track log item descriptor belonging to a transaction using a
complex opencoded chunk allocator. This code has been there since day one
and seems to work around the lack of an efficient slab allocator.

This patch replaces it with dynamically allocated log item descriptors
from a dedicated slab pool, linked to the transaction by a linked list.

This allows to greatly simplify the log item descriptor tracking to the
point where it's just a couple hundred lines in xfs_trans.c instead of
a separate file. The external API has also been simplified while we're
at it - the xfs_trans_add_item and xfs_trans_del_item functions to add/
delete items from a transaction have been simplified to the bare minium,
and the xfs_trans_find_item function is replaced with a direct dereference
of the li_desc field. All debug code walking the list of log items in
a transaction is down to a simple list_for_each_entry.

Note that we could easily use a singly linked list here instead of the
double linked list from list.h as the fastpath only does deletion from
sequential traversal. But given that we don't have one available as
a library function yet I use the list.h functions for simplicity.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
3400777ff03a3cd4fdbc6cb15676fc7e7ceefc00 23-Jun-2010 Christoph Hellwig <hch@infradead.org> xfs: remove unneeded #include statements

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
288699fecaffa1ef8f75f92020cbb593a772e487 23-Jun-2010 Christoph Hellwig <hch@infradead.org> xfs: drop dmapi hooks

Dmapi support was never merged upstream, but we still have a lot of hooks
bloating XFS for it, all over the fast pathes of the filesystem.

This patch drops over 700 lines of dmapi overhead. If we'll ever get HSM
support in mainline at least the namespace events can be done much saner
in the VFS instead of the individual filesystem, so it's not like this
is much help for future work.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
025101dca4480eff9da948405e872d5115030850 04-May-2010 Christoph Hellwig <hch@infradead.org> xfs: cleanup log reservation calculactions

Instead of having small helper functions calling big macros do the
calculations for the log reservations directly in the functions.
These are mostly 1:1 from the macros execept that the macros kept
the quota calculations in their callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
71e330b593905e40d6c5afa824d38ee02d70ce5f 21-May-2010 Dave Chinner <dchinner@redhat.com> xfs: Introduce delayed logging core code

The delayed logging code only changes in-memory structures and as
such can be enabled and disabled with a mount option. Add the mount
option and emit a warning that this is an experimental feature that
should not be used in production yet.

We also need infrastructure to track committed items that have not
yet been written to the log. This is what the Committed Item List
(CIL) is for.

The log item also needs to be extended to track the current log
vector, the associated memory buffer and it's location in the Commit
Item List. Extend the log item and log vector structures to enable
this tracking.

To maintain the current log format for transactions with delayed
logging, we need to introduce a checkpoint transaction and a context
for tracking each checkpoint from initiation to transaction
completion. This includes adding a log ticket for tracking space
log required/used by the context checkpoint.

To track all the changes we need an io vector array per log item,
rather than a single array for the entire transaction. Using the new
log vector structure for this requires two passes - the first to
allocate the log vector structures and chain them together, and the
second to fill them out. This log vector chain can then be passed
to the CIL for formatting, pinning and insertion into the CIL.

Formatting of the log vector chain is relatively simple - it's just
a loop over the iovecs on each log vector, but it is made slightly
more complex because we re-write the iovec after the copy to point
back at the memory buffer we just copied into.

This code also needs to pin log items. If the log item is not
already tracked in this checkpoint context, then it needs to be
pinned. Otherwise it is already pinned and we don't need to pin it
again.

The only other complexity is calculating the amount of new log space
the formatting has consumed. This needs to be accounted to the
transaction in progress, and the accounting is made more complex
becase we need also to steal space from it for log metadata in the
checkpoint transaction. Calculate all this at insert time and update
all the tickets, counters, etc correctly.

Once we've formatted all the log items in the transaction, attach
the busy extents to the checkpoint context so the busy extents live
until checkpoint completion and can be processed at that point in
time. Transactions can then be freed at this point in time.

Now we need to issue checkpoints - we are tracking the amount of log space
used by the items in the CIL, so we can trigger background checkpoints when the
space usage gets to a certain threshold. Otherwise, checkpoints need ot be
triggered when a log synchronisation point is reached - a log force event.

Because the log write code already handles chained log vectors, writing the
transaction is trivial, too. Construct a transaction header, add it
to the head of the chain and write it into the log, then issue a
commit record write. Then we can release the checkpoint log ticket
and attach the context to the log buffer so it can be called during
Io completion to complete the checkpoint.

We also need to allow for synchronising multiple in-flight
checkpoints. This is needed for two things - the first is to ensure
that checkpoint commit records appear in the log in the correct
sequence order (so they are replayed in the correct order). The
second is so that xfs_log_force_lsn() operates correctly and only
flushes and/or waits for the specific sequence it was provided with.

To do this we need a wait variable and a list tracking the
checkpoint commits in progress. We can walk this list and wait for
the checkpoints to change state or complete easily, an this provides
the necessary synchronisation for correct operation in both cases.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
ed3b4d6cdc81e8feefdbfa3c584614be301b6d39 20-May-2010 Dave Chinner <david@fromorbit.com> xfs: Improve scalability of busy extent tracking

When we free a metadata extent, we record it in the per-AG busy
extent array so that it is not re-used before the freeing
transaction hits the disk. This array is fixed size, so when it
overflows we make further allocation transactions synchronous
because we cannot track more freed extents until those transactions
hit the disk and are completed. Under heavy mixed allocation and
freeing workloads with large log buffers, we can overflow this array
quite easily.

Further, the array is sparsely populated, which means that inserts
need to search for a free slot, and array searches often have to
search many more slots that are actually used to check all the
busy extents. Quite inefficient, really.

To enable this aspect of extent freeing to scale better, we need
a structure that can grow dynamically. While in other areas of
XFS we have used radix trees, the extents being freed are at random
locations on disk so are better suited to being indexed by an rbtree.

So, use a per-AG rbtree indexed by block number to track busy
extents. This incures a memory allocation when marking an extent
busy, but should not occur too often in low memory situations. This
should scale to an arbitrary number of extents so should not be a
limitation for features such as in-memory aggregation of
transactions.

However, there are still situations where we can't avoid allocating
busy extents (such as allocation from the AGFL). To minimise the
overhead of such occurences, we need to avoid doing a synchronous
log force while holding the AGF locked to ensure that the previous
transactions are safely on disk before we use the extent. We can do
this by marking the transaction doing the allocation as synchronous
rather issuing a log force.

Because of the locking involved and the ordering of transactions,
the synchronous transaction provides the same guarantees as a
synchronous log force because it ensures that all the prior
transactions are already on disk when the synchronous transaction
hits the disk. i.e. it preserves the free->allocate order of the
extent correctly in recovery.

By doing this, we avoid holding the AGF locked while log writes are
in progress, hence reducing the length of time the lock is held and
therefore we increase the rate at which we can allocate and free
from the allocation group, thereby increasing overall throughput.

The only problem with this approach is that when a metadata buffer is
marked stale (e.g. a directory block is removed), then buffer remains
pinned and locked until the log goes to disk. The issue here is that
if that stale buffer is reallocated in a subsequent transaction, the
attempt to lock that buffer in the transaction will hang waiting
the log to go to disk to unlock and unpin the buffer. Hence if
someone tries to lock a pinned, stale, locked buffer we need to
push on the log to get it unlocked ASAP. Effectively we are trading
off a guaranteed log force for a much less common trigger for log
force to occur.

Ideally we should not reallocate busy extents. That is a much more
complex fix to the problem as it involves direct intervention in the
allocation btree searches in many places. This is left to a future
set of modifications.

Finally, now that we track busy extents in allocated memory, we
don't need the descriptors in the transaction structure to point to
them. We can replace the complex busy chunk infrastructure with a
simple linked list of busy extents. This allows us to remove a large
chunk of code, making the overall change a net reduction in code
size.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
b1c1b5b6108ad8e5991a614514f41da436c659d6 23-Mar-2010 Dave Chinner <dchinner@redhat.com> xfs: Clean up xfs_trans_committed code after factoring

Now that the code has been factored, clean up all the remaining
style cruft, simplify the code and re-order functions so that it
doesn't need forward declarations.

Also move the remaining functions that require forward declarations
(xfs_trans_uncommit, xfs_trans_free) so that all the forward
declarations can be removed from the file.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
8e646a55ac69fe620b9e84034c03dd1e8e16a36b 08-Mar-2010 Dave Chinner <dchinner@redhat.com> xfs: update and factor xfs_trans_committed()

The function header to xfs-trans_committed has long had this
comment:

* THIS SHOULD BE REWRITTEN TO USE xfs_trans_next_item()

To prepare for different methods of committing items, convert the
code to use xfs_trans_next_item() and factor the code into smaller,
more digestible chunks.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
a3ccd2ca43d5cdfe0b256be02957dc5f47ec4c39 14-Mar-2010 Christoph Hellwig <hch@infradead.org> xfs: clean up xfs_trans_commit logic even more

> +shut_us_down:
> + shutdown = XFS_FORCED_SHUTDOWN(mp) ? EIO : 0;
> + if (!(tp->t_flags & XFS_TRANS_DIRTY) || shutdown) {
> + xfs_trans_unreserve_and_mod_sb(tp);
> + /*

This whole area in _xfs_trans_commit is still a complete mess.

So while touching this code, unravel this mess as well to make the
whole flow of the function simpler and clearer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
0924378a689ccb05f6d60875742dc28f69bf0129 08-Mar-2010 Dave Chinner <dchinner@redhat.com> xfs: split out iclog writing from xfs_trans_commit()

Split the the part of xfs_trans_commit() that deals with writing the
transaction into the iclog into a separate function. This isolates the
physical commit process from the logical commit operation and makes
it easier to insert different transaction commit paths without affecting
the existing algorithm adversely.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
8e123850863366b738d6dfb9a84045018ff038fc 08-Mar-2010 Dave Chinner <dchinner@redhat.com> xfs: remove stale parameter from ->iop_unpin method

The staleness of a object being unpinned can be directly derived
from the object itself - there is no need to extract it from the
object then pass it as a parameter into IOP_UNPIN().

This means we can kill the XFS_LID_BUF_STALE flag - it is set,
checked and cleared in the same places XFS_BLI_STALE flag in the
xfs_buf_log_item so it is now redundant and hence safe to remove.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
35a8a72f064105807b091a21e004893b219c9ed2 16-Feb-2010 Christoph Hellwig <hch@infradead.org> xfs: stop passing opaque handles to xfs_log.c routines

Currenly we pass opaque xfs_log_ticket_t handles instead of
struct xlog_ticket pointers, and void pointers instead of
struct xlog_in_core pointers to various log manager functions.
Instead pass properly typed pointers after adding forward
declarations for them to xfs_log.h, and adjust the touched
function prototypes to the standard XFS style while at it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
a14a348bff2f99471a28e5928eb6801224c053d8 19-Jan-2010 Christoph Hellwig <hch@infradead.org> xfs: cleanup up xfs_log_force calling conventions

Remove the XFS_LOG_FORCE argument which was always set, and the
XFS_LOG_URGE define, which was never used.

Split xfs_log_force into a two helpers - xfs_log_force which forces
the whole log, and xfs_log_force_lsn which forces up to the
specified LSN. The underlying implementations already were entirely
separate, as were the users.

Also re-indent the new _xfs_log_force/_xfs_log_force which
previously had a weird coding style.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
4139b3b337cffd106744386c842b89dc86e31d4b 19-Jan-2010 Christoph Hellwig <hch@infradead.org> xfs: kill XLOG_VEC_SET_TYPE

This macro only obsfucates the log item type assignments, so kill it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
80641dc66a2d6dfb22af4413227a92b8ab84c7bb 19-Oct-2009 Christoph Hellwig <hch@infradead.org> xfs: I/O completion handlers must use NOFS allocations

When completing I/O requests we must not allow the memory allocator to
recurse into the filesystem, as we might deadlock on waiting for the
I/O completion otherwise. The only thing currently allocating normal
GFP_KERNEL memory is the allocation of the transaction structure for
the unwritten extent conversion. Add a memflags argument to
_xfs_trans_alloc to allow controlling the allocator behaviour.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Thomas Neumann <tneumann@users.sourceforge.net>
Tested-by: Thomas Neumann <tneumann@users.sourceforge.net>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
f95022161d23ee661a48af8f280472209f513a67 05-Jun-2009 Christoph Hellwig <hch@lst.de> xfs: remove ->write_super and stop maintaining ->s_dirt

the write_super method is used for

(1) writing back the superblock periodically from pdflush
(2) called just before ->sync_fs for data integerity syncs

We don't need (1) because we have our own peridoc writeout through xfssyncd,
and we don't need (2) because xfs_fs_sync_fs performs a proper synchronous
superblock writeout after all other data and metadata has been written out.

Also remove ->s_dirt tracking as it's only used to decide when too call
->write_super.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
7d095257e321214e4cf359abd131ba1f09c60cba 08-Jun-2009 Christoph Hellwig <hch@lst.de> xfs: kill xfs_qmops

Kill the quota ops function vector and replace it with direct calls or
stubs in the CONFIG_XFS_QUOTA=n case.

Make sure we check XFS_IS_QUOTA_RUNNING in the right spots. We can remove
the number of those checks because the XFS_TRANS_DQ_DIRTY flag can't be set
otherwise.

This brings us back closer to the way this code worked in IRIX and earlier
Linux versions, but we keep a lot of the more useful factoring of common
code.

Eventually we should also kill xfs_qm_bhv.c, but that's left for a later
patch.

Reduces the size of the source code by about 250 lines and the size of
XFS module by about 1.5 kilobytes with quotas enabled:

text data bss dec hex filename
615957 2960 3848 622765 980ad fs/xfs/xfs.o
617231 3152 3848 624231 98667 fs/xfs/xfs.o.old

Fallout:

- xfs_qm_dqattach is split into xfs_qm_dqattach_locked which expects
the inode locked and xfs_qm_dqattach which does the locking around it,
thus removing XFS_QMOPT_ILOCKED.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
cc09c0dc57de7f7d2ed89d480b5653e5f6a32f2c 17-Nov-2008 Dave Chinner <david@fromorbit.com> [XFS] Fix double free of log tickets

When an I/O error occurs during an intermediate commit on a rolling
transaction, xfs_trans_commit() will free the transaction structure
and the related ticket. However, the duplicate transaction that
gets used as the transaction continues still contains a pointer
to the ticket. Hence when the duplicate transaction is cancelled
and freed, we free the ticket a second time.

Add reference counting to the ticket so that we hold an extra
reference to the ticket over the transaction commit. We drop the
extra reference once we have checked that the transaction commit
did not return an error, thus avoiding a double free on commit
error.

Credit to Nick Piggin for tripping over the problem.

SGI-PV: 989741

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
783a2f656f9674c31d4019708a94af93fa1d1c22 30-Oct-2008 David Chinner <david@fromorbit.com> [XFS] Finish removing the mount pointer from the AIL API

Change all the remaining AIL API functions that are passed struct
xfs_mount pointers to pass pointers directly to the struct xfs_ail being
used. With this conversion, all external access to the AIL is via the
struct xfs_ail. Hence the operation and referencing of the AIL is almost
entirely independent of the xfs_mount that is using it - it is now much
more tightly tied to the log and the items it is tracking in the log than
it is tied to the xfs_mount.

SGI-PV: 988143

SGI-Modid: xfs-linux-melb:xfs-kern:32353a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
fc1829f34d30899701dfd5890030d39e13e1f47d 30-Oct-2008 David Chinner <david@fromorbit.com> [XFS] Add ail pointer into log items

Add an xfs_ail pointer to log items so that the log items can reference
the AIL directly during callbacks without needed a struct xfs_mount.

SGI-PV: 988143

SGI-Modid: xfs-linux-melb:xfs-kern:32352a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
c7e8f268278a292d3823b4352182fa7755a71410 30-Oct-2008 David Chinner <david@fromorbit.com> [XFS] Move the AIL lock into the struct xfs_ail

Bring the ail lock inside the struct xfs_ail. This means the AIL can be
entirely manipulated via the struct xfs_ail rather than needing both the
struct xfs_mount and the struct xfs_ail.

SGI-PV: 988143

SGI-Modid: xfs-linux-melb:xfs-kern:32350a

Signed-off-by: David Chinner <david@fromorbit.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
39dab9d7daf5f664a3569378107a2cb284c8a594 13-Aug-2008 Eric Sandeen <sandeen@sandeen.net> [XFS] remove shouting-indirection macros from xfs_trans.h

SGI-PV: 981498

SGI-Modid: xfs-linux-melb:xfs-kern:31758a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
322ff6b8cd54feb1c4c0426630f3642ab1dd2176 13-Aug-2008 Niv Sardi <xaiki@sgi.com> [XFS] Move xfs_attr_rolltrans to xfs_trans_roll

Move it from the attr code to the transaction code and make
the attr code call the new function.

We rolltrans is really usefull whenever we want to use rolling
transaction, should be generic, it isn't dependent on any part
of the attr code anyway.

We use this excuse to change all the:

if ((error = xfs_attr_rolltrans()))

calls into:

error = xfs_trans_roll();

if (error)

SGI-PV: 981498

SGI-Modid: xfs-linux-melb:xfs-kern:31729a

Signed-off-by: Niv Sardi <xaiki@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
f0e2d93c29dc39ffd24cac180a19d48f700c0706 19-May-2008 Denys Vlasenko <vda.linux@googlemail.com> [XFS] Remove unused arg from kmem_free()

kmem_free() function takes (ptr, size) arguments but doesn't actually use
second one.

This patch removes size argument from all callsites.

SGI-PV: 981498
SGI-Modid: xfs-linux-melb:xfs-kern:31050a

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
413d57c9907c72ed608df2be72ef8ed13a3eeb46 14-Feb-2008 Marcin Slusarz <marcin.slusarz@gmail.com> xfs: convert beX_add to beX_add_cpu (new common API)

remove beX_add functions and replace all uses with beX_add_cpu

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Dave Chinner <dgc@sgi.com>
Cc: Timothy Shimmin <tes@sgi.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
287f3dad14828275d2517c8696ad118c82b9243f 11-Oct-2007 Donald Douwsma <donaldd@sgi.com> [XFS] Unwrap AIL_LOCK

SGI-PV: 970382
SGI-Modid: xfs-linux-melb:xfs-kern:29739a

Signed-off-by: Donald Douwsma <donaldd@sgi.com>
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Tim Shimmin <tes@sgi.com>
b267ce9952374c51099f21d6c3a59c78fa0d7586 30-Aug-2007 Christoph Hellwig <hch@infradead.org> [XFS] kill struct bhv_vfs

Now that struct bhv_vfs doesn't have any members left we can kill it and
go directly from the super_block to the xfs_mount everywhere.

SGI-PV: 969608
SGI-Modid: xfs-linux-melb:xfs-kern:29509a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
2bdf7cd0baa67608ada1517a281af359faf4c58c 28-Aug-2007 Christoph Hellwig <hch@infradead.org> [XFS] superblock endianess annotations

Creates a new xfs_dsb_t that is __be annotated and keeps xfs_sb_t for the
incore one. xfs_xlatesb is renamed to xfs_sb_to_disk and only handles the
incore -> disk conversion. A new helper xfs_sb_from_disk handles the other
direction and doesn't need the slightly hacky table-driven approach
because we only ever read the full sb from disk.

The handling of shared r/o filesystems has been buggy on little endian
system and fixing this required shuffling around of some code in that
area.

SGI-PV: 968563
SGI-Modid: xfs-linux-melb:xfs-kern:29477a

Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
45c34141126a89da07197d5b89c04c6847f1171a 18-Jun-2007 David Chinner <dgc@sgi.com> [XFS] Apply transaction delta counts atomically to incore counters

With the per-cpu superblock counters, batch updates are no longer atomic
across the entire batch of changes. This is not an issue if each
individual change in the batch is applied atomically. Unfortunately, free
block count changes are not applied atomically, and they are applied in a
manner guaranteed to cause problems.

Essentially, the free block count reservation that the transaction took
initially is returned to the in core counters before a second delta takes
away what is used. because these two operations are not atomic, we can
race with another thread that can use the returned transaction reservation
before the transaction takes the space away again and we can then get
ENOSPC being reported in a spot where we don't have an ENOSPC condition,
nor should we ever see one there.

Fix it up by rolling the two deltas into the one so it can be applied
safely (i.e. atomically) to the incore counters.

SGI-PV: 964465
SGI-Modid: xfs-linux-melb:xfs-kern:28796a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
210c6f1caa451623e14a7cd71000d2c2e0d9cc43 24-May-2007 David Chinner <dgc@sgi.com> [XFS] Fix the transaction flags to make lazy superblock counters work.

SGI-PV: 964999
SGI-Modid: xfs-linux-melb:xfs-kern:28653a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
92821e2ba4ae26887223326fb0b95cdab963b768 24-May-2007 David Chinner <dgc@sgi.com> [XFS] Lazy Superblock Counters

When we have a couple of hundred transactions on the fly at once, they all
typically modify the on disk superblock in some way.
create/unclink/mkdir/rmdir modify inode counts, allocation/freeing modify
free block counts.

When these counts are modified in a transaction, they must eventually lock
the superblock buffer and apply the mods. The buffer then remains locked
until the transaction is committed into the incore log buffer. The result
of this is that with enough transactions on the fly the incore superblock
buffer becomes a bottleneck.

The result of contention on the incore superblock buffer is that
transaction rates fall - the more pressure that is put on the superblock
buffer, the slower things go.

The key to removing the contention is to not require the superblock fields
in question to be locked. We do that by not marking the superblock dirty
in the transaction. IOWs, we modify the incore superblock but do not
modify the cached superblock buffer. In short, we do not log superblock
modifications to critical fields in the superblock on every transaction.
In fact we only do it just before we write the superblock to disk every
sync period or just before unmount.

This creates an interesting problem - if we don't log or write out the
fields in every transaction, then how do the values get recovered after a
crash? the answer is simple - we keep enough duplicate, logged information
in other structures that we can reconstruct the correct count after log
recovery has been performed.

It is the AGF and AGI structures that contain the duplicate information;
after recovery, we walk every AGI and AGF and sum their individual
counters to get the correct value, and we do a transaction into the log to
correct them. An optimisation of this is that if we have a clean unmount
record, we know the value in the superblock is correct, so we can avoid
the summation walk under normal conditions and so mount/recovery times do
not change under normal operation.

One wrinkle that was discovered during development was that the blocks
used in the freespace btrees are never accounted for in the AGF counters.
This was once a valid optimisation to make; when the filesystem is full,
the free space btrees are empty and consume no space. Hence when it
matters, the "accounting" is correct. But that means the when we do the
AGF summations, we would not have a correct count and xfs_check would
complain. Hence a new counter was added to track the number of blocks used
by the free space btrees. This is an *on-disk format change*.

As a result of this, lazy superblock counters are a mkfs option and at the
moment on linux there is no way to convert an old filesystem. This is
possible - xfs_db can be used to twiddle the right bits and then
xfs_repair will do the format conversion for you. Similarly, you can
convert backwards as well. At some point we'll add functionality to
xfs_admin to do the bit twiddling easily....

SGI-PV: 964999
SGI-Modid: xfs-linux-melb:xfs-kern:28652a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
1c72bf90037f32fc2b10e0a05dff2640abce8ee2 08-May-2007 Eric Sandeen <sandeen@sandeen.net> [XFS] The last argument "lsn" of xfs_trans_commit() is always called with
NULL.

Patch provided by Eric Sandeen.

SGI-PV: 961693
SGI-Modid: xfs-linux-melb:xfs-kern:28199a

Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
20f4ebf2bf2f57c1a9abb3655391336cc90314b3 10-Feb-2007 David Chinner <dgc@sgi.com> [XFS] Make growfs work for amounts greater than 2TB

The free block modification code has a 32bit interface, limiting the size
the filesystem can be grown even on 64 bit machines. On 32 bit machines,
there are other 32bit variables in transaction structures and interfaces
that need to be expanded to allow this to work.

SGI-PV: 959978
SGI-Modid: xfs-linux-melb:xfs-kern:27894a

Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tim Shimmin <tes@sgi.com>
f6c2d1fa6310a71b1c2e05fc6d9ff9b91489fa0e 20-Jun-2006 Nathan Scott <nathans@sgi.com> [XFS] Remove version 1 directory code. Never functioned on Linux, just
pure bloat.

SGI-PV: 952969
SGI-Modid: xfs-linux-melb:xfs-kern:26251a

Signed-off-by: Nathan Scott <nathans@sgi.com>
34327e138481137a81a2e33060b8eb0944013801 09-Jun-2006 Nathan Scott <nathans@sgi.com> [XFS] Cleanup a missed porting conversion, and freezing.

SGI-PV: 953338
SGI-Modid: xfs-linux-melb:xfs-kern:26109a

Signed-off-by: Nathan Scott <nathans@sgi.com>
59c1b082f5fff8269565039600a2ef18d48649b5 09-Jun-2006 Nathan Scott <nathans@sgi.com> [XFS] Make the pflags test/set wrappers more legible for us mere humans.

SGI-PV: 953338
SGI-Modid: xfs-linux-melb:xfs-kern:26099a

Signed-off-by: Nathan Scott <nathans@sgi.com>
7d04a335b6b2d79e3742ffd28bd651204574e794 09-Jun-2006 Nathan Scott <nathans@sgi.com> [XFS] Shutdown the filesystem if all device paths have gone. Made
shutdown vop flags consistent with sync vop flags declarations too.

SGI-PV: 939911
SGI-Modid: xfs-linux-melb:xfs-kern:26096a

Signed-off-by: Nathan Scott <nathans@sgi.com>
c41564b5af328ea4600b26119f6c9c8e1eb5c28b 29-Mar-2006 Nathan Scott <nathans@sgi.com> [XFS] We really suck at spulling. Thanks to Chris Pascoe for fixing all
these typos.

SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:25539a

Signed-off-by: Nathan Scott <nathans@sgi.com>
2ddd5928d01ca8eb49f55166411b64a5844a8959 17-Mar-2006 Nathan Scott <nathans@sgi.com> [XFS] Correct the dquot reservation component for the link transation.

SGI-PV: 904196
SGI-Modid: xfs-linux-melb:xfs-kern:25476a

Signed-off-by: Nathan Scott <nathans@sgi.com>
8f79405527b50fe27cffcb7081890b5c68439b4f 14-Mar-2006 Nathan Scott <nathans@sgi.com> [XFS] Reduce complexity in xfs_trans_init by pushing complex macros out
into functions and hence reduce the stack footprint there.

SGI-PV: 947312
SGI-Modid: xfs-linux-melb:xfs-kern:25360a

Signed-off-by: Nathan Scott <nathans@sgi.com>
60a204f096dd67683f3993798e14905ee9828ba5 11-Jan-2006 Nathan Scott <nathans@sgi.com> [XFS] Fix a thinko when generating a forced shutdown stack trace.

SGI-PV: 929558
SGI-Modid: xfs-linux-melb:xfs-kern:203817a

Signed-off-by: Nathan Scott <nathans@sgi.com>
0733af213f2859f7228229f3ac053c025f57d0d5 11-Jan-2006 Ryan Hankins <hankins@sgi.com> [XFS] Add a stack trace in the case of xfs_forced_shutdown.

SGI-PV: 929558
SGI-Modid: xfs-linux-melb:xfs-kern:203701a

Signed-off-by: Ryan Hankins <hankins@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
cfcbbbd089eadcaa86abb2c0f352e1ab23e16f72 02-Nov-2005 Nathan Scott <nathans@sgi.com> [XFS] Remove old, broken nolog-mode code - noone plans to ever fix it.

SGI-PV: 944821
SGI-Modid: xfs-linux:xfs-kern:24213a

Signed-off-by: Nathan Scott <nathans@sgi.com>
7b71876980d87c8f237b94d8529ee7fcc05ec2d9 02-Nov-2005 Nathan Scott <nathans@sgi.com> [XFS] Update license/copyright notices to match the prefered SGI
boilerplate.

SGI-PV: 913862
SGI-Modid: xfs-linux:xfs-kern:23903a

Signed-off-by: Nathan Scott <nathans@sgi.com>
a844f4510dce23c07f3923cb42138f5fdd745017 02-Nov-2005 Nathan Scott <nathans@sgi.com> [XFS] Remove xfs_macros.c, xfs_macros.h, rework headers a whole lot.

SGI-PV: 943122
SGI-Modid: xfs-linux:xfs-kern:23901a

Signed-off-by: Nathan Scott <nathans@sgi.com>
f538d4da8d521746ca5ebf8c1a8105eb49bfb45e 02-Nov-2005 Christoph Hellwig <hch@sgi.com> [XFS] write barrier support Issue all log sync operations as ordered
writes. In addition flush the disk cache on fsync if the sync cached
operation didn't sync the log to disk (this requires some additional
bookeping in the transaction and log code). If the device doesn't claim to
support barriers, the filesystem has an extern log volume or the trial
superblock write with barriers enabled failed we disable barriers and
print a warning. We should probably fail the mount completely, but that
could lead to nasty boot failures for the root filesystem. Not enabled by
default yet, needs more destructive testing first.

SGI-PV: 912426
SGI-Modid: xfs-linux:xfs-kern:198723a

Signed-off-by: Christoph Hellwig <hch@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
7e9c63961558092d584936a874cf3fee80002eb6 02-Sep-2005 Tim Shimmin <tes@sgi.com> [XFS] 929956 add log debugging and tracing info

SGI-PV: 931456
SGI-Modid: xfs-linux:xfs-kern:23155a

Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
ba0f32d46049e2b625dabd33c7964f8ca2cd7651 21-Jun-2005 Christoph Hellwig <hch@sgi.com> [XFS] mark various symbols static Patch from Adrian Bunk

SGI-PV: 936255
SGI-Modid: xfs-linux:xfs-kern:192760a

Signed-off-by: Christoph Hellwig <hch@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
4372d6e10349d4e8b012588f86f15c740c73a7c4 21-Jun-2005 Christoph Hellwig <hch@sgi.com> [XFS] Remove dead code. Patch from Adrian Bunk

SGI-PV: 936255
SGI-Modid: xfs-linux:xfs-kern:192759a

Signed-off-by: Christoph Hellwig <hch@sgi.com>
Signed-off-by: Nathan Scott <nathans@sgi.com>
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 17-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org> Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!