History log of /fs/ocfs2/dlmglue.c
Revision Date Author Comments
1848cb5530d3bada86c7b54f4f8b053b2081eb00 10-Oct-2014 Rob Jones <rob.jones@codethink.co.uk> fs/ocfs2/dlmglue.c: use __seq_open_private() not seq_open()

Reduce boilerplate code by using seq_open_private() instead of seq_open()

Signed-off-by: Rob Jones <rob.jones@codethink.co.uk>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
e72db989e1c01fde28aabf7fd29faeaa08538e24 05-Jun-2014 Xue jiufei <xuejiufei@huawei.com> ocfs2: remove some unused code

dlm_recovery_ctxt.received is unused.

ocfs2_should_refresh_lock_res() can only return 0 or 1, so the error
handling code in ocfs2_super_lock() is unneeded.

Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
84d86f83f9d0e8431a3c9eae4c47e9d7ff49a411 03-Apr-2014 Jan Kara <jack@suse.cz> ocfs2: avoid blocking in ocfs2_mark_lockres_freeing() in downconvert thread

If we are dropping last inode reference from downconvert thread, we will
end up calling ocfs2_mark_lockres_freeing() which can block if the lock
we are freeing is queued thus creating an A-A deadlock. Luckily, since
we are the downconvert thread, we can immediately dequeue the lock and
thus avoid waiting in this case.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3e8341516409d026636be4d7534b84e6e90bef37 22-Jan-2014 Goldwyn Rodrigues <rgoldwyn@suse.de> ocfs2: pass ocfs2_cluster_connection to ocfs2_this_node

This is done to differentiate between using and not using controld and
use the connection information accordingly.

We need to be backward compatible. So, we use a new enum
ocfs2_connection_type to identify when controld is used and when it is
not.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
c74a3bdd9b529d924d1abf986079b783dd105ace 22-Jan-2014 Goldwyn Rodrigues <rgoldwyn@suse.de> ocfs2: add clustername to cluster connection

This is an effort of removing ocfs2_controld.pcmk and getting ocfs2 DLM
handling up to the times with respect to DLM (>=4.0.1) and corosync
(2.3.x). AFAIK, cman also is being phased out for a unified corosync
cluster stack.

fs/dlm performs all the functions with respect to fencing and node
management and provides the API's to do so for ocfs2. For all future
references, DLM stands for fs/dlm code.

The advantages are:
+ No need to run an additional userspace daemon (ocfs2_controld)
+ No controld device handling and controld protocol
+ Shifting responsibilities of node management to DLM layer

For backward compatibility, we are keeping the controld handling code.
Once enough time has passed we can remove a significant portion of the
code. This was tested by using the kernel with changes on older
unmodified tools. The kernel used ocfs2_controld as expected, and
displayed the appropriate warning message.

This feature requires modification in the userspace ocfs2-tools. The
changes can be found at: https://github.com/goldwynr/ocfs2-tools branch:
nocontrold Currently, not many checks are present in the userspace code,
but that would change soon.

This patch (of 6):

Add clustername to cluster connection.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
16735d022f72b20ddbb2274b8e109f69575e9b2b 14-Nov-2013 Wolfram Sang <wsa@the-dreams.de> tree-wide: use reinit_completion instead of INIT_COMPLETION

Use this new function to make code more comprehensible, since we are
reinitialzing the completion, not initializing.

[akpm@linux-foundation.org: linux-next resyncs]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org> (personally at LCE13)
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
41003a7bcfed1255032e1e7c7b487e505b22e298 08-May-2013 Zach Brown <zab@redhat.com> aio: remove retry-based AIO

This removes the retry-based AIO infrastructure now that nothing in tree
is using it.

We want to remove retry-based AIO because it is fundemantally unsafe.
It retries IO submission from a kernel thread that has only assumed the
mm of the submitting task. All other task_struct references in the IO
submission path will see the kernel thread, not the submitting task.
This design flaw means that nothing of any meaningful complexity can use
retry-based AIO.

This removes all the code and data associated with the retry machinery.
The most significant benefit of this is the removal of the locking
around the unused run list in the submission path.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Signed-off-by: Zach Brown <zab@redhat.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3278bb748d2437eb1464765f36429e5d6aa91c38 22-Feb-2013 Junxiao Bi <junxiao.bi@oracle.com> ocfs2: unlock super lock if lockres refresh failed

If lockres refresh failed, the super lock will never be released which
will cause some processes on other cluster nodes hung forever.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
03ab30f73dbf2f4f719d2c0c2acef81bf0445eb7 01-Feb-2013 Eric W. Biederman <ebiederm@xmission.com> ocfs2: convert between kuids and kgids and DLM locks

Convert between uid and gids stored in the on the wire format of dlm
locks aka struct ocfs2_meta_lvb and kuids and kgids stored in
inode->i_uid and inode->i_gid.

Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
a75e9ccabd925d16954739bd977c54695c9310d0 31-Jan-2012 Srinivas Eeda <srinivas.eeda@oracle.com> ocfs2: use spinlock irqsave for downconvert lock.patch

When ocfs2dc thread holds dc_task_lock spinlock and receives soft IRQ it
deadlock itself trying to get same spinlock in ocfs2_wake_downconvert_thread.
Below is the stack snippet.

The patch disables interrupts when acquiring dc_task_lock spinlock.

ocfs2_wake_downconvert_thread
ocfs2_rw_unlock
ocfs2_dio_end_io
dio_complete
.....
bio_endio
req_bio_endio
....
scsi_io_completion
blk_done_softirq
__do_softirq
do_softirq
irq_exit
do_IRQ
ocfs2_downconvert_thread
[kthread]

Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
16865b7c42fbce8a4d2b278460e387e719e289cb 12-Dec-2011 roel <roel.kluin@gmail.com> ocfs2: Misplaced parens in unlikley

Fix misplaced parentheses

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
bfe8684869601dacfcb2cd69ef8cfd9045f62170 28-Oct-2011 Miklos Szeredi <mszeredi@suse.cz> filesystems: add set_nlink()

Replace remaining direct i_nlink updates with a new set_nlink()
updater function.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
03efed8a2a1b8e00164eb4720a82a7dd5e368a8e 27-May-2011 Tiger Yang <tiger.yang@oracle.com> ocfs2: Bugfix for hard readonly mount

ocfs2 cannot currently mount a device that is readonly at the media
("hard readonly"). Fix the broken places.
see detail: http://oss.oracle.com/bugzilla/show_bug.cgi?id=1322

[ Description edited -- Joel ]

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Reviewed-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
c1e8d35ef5ffb393b94a192034b5e3541e005d75 07-Mar-2011 Tao Ma <boyu.mt@taobao.com> ocfs2: Remove EXIT from masklog.

mlog_exit is used to record the exit status of a function.
But because it is added in so many functions, if we enable it,
the system logs get filled up quickly and cause too much I/O.
So actually no one can open it for a production system or even
for a test.

This patch just try to remove it or change it. So:
1. if all the error paths already use mlog_errno, it is just removed.
Otherwise, it will be replaced by mlog_errno.
2. if it is used to print some return value, it is replaced with
mlog(0,...).
mlog_exit_ptr is changed to mlog(0.
All those mlog(0,...) will be replaced with trace events later.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
ef6b689b63b9f5227ccee6f16dd9ee3faf58a464 21-Feb-2011 Tao Ma <boyu.mt@taobao.com> ocfs2: Remove ENTRY from masklog.

ENTRY is used to record the entry of a function.
But because it is added in so many functions, if we enable it,
the system logs get filled up quickly and cause too much I/O.
So actually no one can open it for a production system or even
for a test.

So for mlog_entry_void, we just remove it.
for mlog_entry(...), we replace it with mlog(0,...), and they
will be replace by trace event later.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
5bc970e803ad2b1f26771f39376a79dbf0f5bf64 29-Dec-2010 Sunil Mushran <sunil.mushran@oracle.com> ocfs2: Use hrtimer to track ocfs2 fs lock stats

Patch makes use of the hrtimer to track times in ocfs2 lock stats.

The patch is a bit involved to ensure no additional impact on the memory
footprint. The size of ocfs2_inode_cache remains 1280 bytes on 32-bit systems.

A related change was to modify the unit of the max wait time from nanosec to
microsec allowing us to track max time larger than 4 secs. This change
necessitated the bumping of the output version in the debugfs file,
locking_state, from 2 to 3.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
5e98d492406818e6a94c0ba54c61f59d40cefa4a 28-Jun-2010 Goldwyn Rodrigues <rgoldwyn@gmail.com> Track negative entries v3

Track negative dentries by recording the generation number of the parent
directory in d_fsdata. The generation number for the parent directory is
recorded in the inode_info, which increments every time the lock on the
directory is dropped.

If the generation number of the parent directory and the negative dentry
matches, there is no need to perform the revalidate, else a revalidate
is forced. This improves performance in situations where nodes look for
the same non-existent file multiple times.

Thanks Mark for explaining the DLM sequence.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.de>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
33fa1d909c7357be715aa0e9f9e24c3ef5714493 12-Jul-2010 Joe Perches <joe@perches.com> fs/ocfs2: Remove unnecessary casts of private_data

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
ae4f6ef13417deaa49471c0e903914a3ef3be258 28-Apr-2010 Jan Kara <jack@suse.cz> ocfs2: Avoid unnecessary block mapping when refreshing quota info

The position of global quota file info does not change. So we do not have
to do logical -> physical block translation every time we reread it from
disk. Thus we can also avoid taking ip_alloc_sem.

Acked-by: Joel Becker <Joel.Becker@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
9b915181af0a99fe94ef0152e6a4ca9990c3b6d0 27-Feb-2010 Sunil Mushran <sunil.mushran@oracle.com> ocfs2: Use a separate masklog for AST and BASTs

This patch adds a new masklog and uses it allow tracing ASTs and BASTs
in the dlmglue layer. This has been found to be very useful in debugging
cluster locking issues.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
553b5eb91abd5f8e679d23ae547b92c589726814 30-Jan-2010 Joel Becker <joel.becker@oracle.com> ocfs2: Pass the locking protocol into ocfs2_cluster_connect().

Inside the stackglue, the locking protocol structure is hanging off of
the ocfs2_cluster_connection. This takes it one further; the locking
protocol is passed into ocfs2_cluster_connect(). Now different cluster
connections can have different locking protocols with distinct asts.
Note that all locking protocols have to keep their maximum protocol
version in lock-step.

With the protocol structure set in ocfs2_cluster_connect(), there is no
need for the stackglue to have a static pointer to a specific protocol
structure. We can change initialization to only pass in the maximum
protocol version.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
c0e4133851ed94c73ee3d34a2f2a245fcd0a60a1 29-Jan-2010 Joel Becker <joel.becker@oracle.com> ocfs2: Attach the connection to the lksb

We're going to want it in the ast functions, so we convert union
ocfs2_dlm_lksb to struct ocfs2_dlm_lksb and let it carry the connection.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
a796d2862aed8117acc9f470f3429a5ee852912e 29-Jan-2010 Joel Becker <joel.becker@oracle.com> ocfs2: Pass lksbs back from stackglue ast/bast functions.

The stackglue ast and bast functions tried to maintain the fiction that
their arguments were void pointers. In reality, stack_user.c had to
know that the argument was an ocfs2_lock_res in order to get the status
off of the lksb. That's ugly.

This changes stackglue to always pass the lksb as the argument to ast
and bast functions. The caller can always use container_of() to get the
ocfs2_lock_res or user_dlm_lock_res. The net effect to the caller is
zero. They still get back the lockres in their ast. stackglue gets
cleaner, and now can use the lksb itself.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
3ad2f3fbb961429d2aa627465ae4829758bc7e07 03-Feb-2010 Daniel Mack <daniel@caiaq.de> tree-wide: Assorted spelling fixes

In particular, several occurances of funny versions of 'success',
'unknown', 'therefore', 'acknowledge', 'argument', 'achieve', 'address',
'beginning', 'desirable', 'separate' and 'necessary' are fixed.

Signed-off-by: Daniel Mack <daniel@caiaq.de>
Cc: Joe Perches <joe@perches.com>
Cc: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
079b805782f94f4b278132286a8c9bc4655d1c51 03-Feb-2010 Sunil Mushran <sunil.mushran@oracle.com> ocfs2: Plugs race between the dc thread and an unlock ast message

This patch plugs a race between the downconvert thread and an unlock ast message.
Specifically, after the downconvert worker has done its task, the dc thread needs
to check whether an unlock ast made the downconvert moot.

Reported-by: David Teigland <teigland@redhat.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Mark Fasheh <mfasheh@sus.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
db0f6ce69776370232431eb8be85a5b18b0019c0 02-Feb-2010 Sunil Mushran <sunil.mushran@oracle.com> ocfs2: Remove overzealous BUG_ON during blocked lock processing

During blocked lock processing, we should consider the possibility that the
lock is no longer blocking.

Joel Becker <joel.becker@oracle.com> assisted in fixing this issue.

Reported-by: David Teigland <teigland@redhat.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
0d74125a6a68d4f1969ecaf0b3543f315916ccdc 29-Jan-2010 Sunil Mushran <sunil.mushran@oracle.com> ocfs2: Do not downconvert if the lock level is already compatible

During upconvert, if the master were to send a BAST, dlmglue will detect the
upconversion in process and send a cancel convert to the master. Upon receiving
the AST for the cancel convert, it will re-process the lock resource to determine
whether it needs downconverting. Say, the up was from PR to EX and the BAST was
for EX. After the cancel convert, it will need to downconvert to NL.

However, if the node was originally upconverting from NL to EX, then there would
be no reason to downconvert (assuming the same message sequence).

This patch makes dlmglue consider the possibility that the current lock level
is already compatible and that downconverting is not required.

Joel Becker <joel.becker@oracle.com> assisted in fixing this issue.

Fixes ossbz#1178
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1178

Reported-by: Coly Li <coly.li@suse.de>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
a19128260107f951d1b4c421cf98b92f8092b069 21-Jan-2010 Sunil Mushran <sunil.mushran@oracle.com> ocfs2: Prevent a livelock in dlmglue

There is possibility of a livelock in __ocfs2_cluster_lock(). If a node were
to get an ast for an upconvert request, followed immediately by a bast,
there is a small window where the fs may downconvert the lock before the
process requesting the upconvert is able to take the lock.

This patch adds a new flag to indicate that the upconvert is still in
progress and that the dc thread should not downconvert it right now.

Wengang Wang <wen.gang.wang@oracle.com> and Joel Becker
<joel.becker@oracle.com> contributed heavily to this patch.

Reported-by: David Teigland <teigland@redhat.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
0b94a909eb2e2f6990d05fd486a0cb4902ef1ae7 21-Jan-2010 Wengang Wang <wen.gang.wang@oracle.com> ocfs2: Fix setting of OCFS2_LOCK_BLOCKED during bast

During bast, set the OCFS2_LOCK_BLOCKED flag only if the lock needs to
downconverted.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Acked-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2bd632165c1f783888bd4cbed95f2f304829159b 26-Jan-2010 Sunil Mushran <sunil.mushran@oracle.com> ocfs2/trivial: Remove trailing whitespaces

Patch removes trailing whitespaces.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
af901ca181d92aac3a7dc265144a9081a86d8f39 14-Nov-2009 André Goddard Rosa <andre.goddard@gmail.com> tree-wide: fix assorted typos all over the place

That is "success", "unknown", "through", "performance", "[re|un]mapping"
, "access", "default", "reasonable", "[con]currently", "temperature"
, "channel", "[un]used", "application", "example","hierarchy", "therefore"
, "[over|under]flow", "contiguous", "threshold", "enough" and others.

Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
d92bc5127b27f315ef0ef2c1e1829fd6a5cba54a 28-Aug-2009 Coly Li <coly.li@suse.de> dlmglue.c: add missed mlog lines

This patch adds the missed mlog_exit() and mlog_exit_void() lines when routines
return.

Signed-off-by: Coly Li <coly.li@suse.de>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
8dec98edfe9684ce00b580a09dde3dcd21ee785b 18-Aug-2009 Tao Ma <tao.ma@oracle.com> ocfs2: Add new refcount tree lock resource in dlmglue.

refcount tree lock resource is used to protect refcount
tree read/write among multiple nodes.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
a433848132d8cdfb8173745b922ddb919de11527 18-Aug-2009 Tao Ma <tao.ma@oracle.com> ocfs2: Abstract caching info checkpoint.

In meta downconvert, we need to checkpoint the metadata in an inode.
For refcount tree, we also need it. So abstract the process out.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
0cf2f7632b1789b811ab20b611c4156e6de2b055 13-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: Pass struct ocfs2_caching_info to the journal functions.

The next step in divorcing metadata I/O management from struct inode is
to pass struct ocfs2_caching_info to the journal functions. Thus the
journal locks a metadata cache with the cache io_lock function. It also
can compare ci_last_trans and ci_created_trans directly.

This is a large patch because of all the places we change
ocfs2_journal_access..(handle, inode, ...) to
ocfs2_journal_access..(handle, INODE_CACHE(inode), ...).

Signed-off-by: Joel Becker <joel.becker@oracle.com>
8cb471e8f82506937fe5e2e9fb0bf90f6b1f1170 11-Feb-2009 Joel Becker <joel.becker@oracle.com> ocfs2: Take the inode out of the metadata read/write paths.

We are really passing the inode into the ocfs2_read/write_blocks()
functions to get at the metadata cache. This commit passes the cache
directly into the metadata block functions, divorcing them from the
inode.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
cb25797d451dc774d9dbc402a65f16a0e32199fe 04-Jun-2009 Jan Kara <jack@suse.cz> ocfs2: Add lockdep annotations

Add lockdep support to OCFS2. The support also covers all of the cluster
locks except for open locks, journal locks, and local quotafile locks. These
are special because they are acquired for a node, not for a particular process
and lockdep cannot deal with such type of locking.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
df152c241df9e9d2b9a65d37bd02961abe7f591a 22-Jun-2009 Sunil Mushran <sunil.mushran@oracle.com> ocfs2: Disable orphan scanning for local and hard-ro mounts

Local and Hard-RO mounts do not need orphan scanning.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
3211949f8998dde71d9fe2e063de045ece5e0473 20-Jun-2009 Sunil Mushran <sunil.mushran@oracle.com> ocfs2: Do not initialize lvb in ocfs2_orphan_scan_lock_res_init()

We don't access the LVB in our ocfs2_*_lock_res_init() functions.

Since the LVB can become invalid during some cluster recovery
operations, the dlmglue must be able to handle an uninitialized
LVB.

For the orphan scan lock, we initialized an uninitialzed LVB with our
scan sequence number plus one. This starts a normal orphan scan
cycle.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
1c520dfbf391e1617ef61553f815b8006a066c44 20-Jun-2009 Joel Becker <joel.becker@oracle.com> ocfs2: Provide the ocfs2_dlm_lvb_valid() stack API.

The Lock Value Block (LVB) of a DLM lock can be lost when nodes die and
the DLM cannot reconstruct its state. Clients of the DLM need to know
this.

ocfs2's internal DLM, o2dlm, explicitly zeroes out the LVB when it loses
track of the state. This is not a standard behavior, but ocfs2 has
always relied on it. Thus, an o2dlm LVB is always "valid".

ocfs2 now supports both o2dlm and fs/dlm via the stack glue. When
fs/dlm loses track of an LVBs state, it sets a flag
(DLM_SBF_VALNOTVALID) on the Lock Status Block (LKSB). The contents of
the LVB may be garbage or merely stale.

ocfs2 doesn't want to try to guess at the validity of the stale LVB.
Instead, it should be checking the VALNOTVALID flag. As this is the
'standard' way of treating LVBs, we will promote this behavior.

We add a stack glue API ocfs2_dlm_lvb_valid(). It returns non-zero when
the LVB is valid. o2dlm will always return valid, while fs/dlm will
check VALNOTVALID.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
83273932fbefb6ceef9c0b82ac4d23900728f4d9 04-Jun-2009 Srinivas Eeda <srinivas.eeda@oracle.com> ocfs2: timer to queue scan of all orphan slots

When a dentry is unlinked, the unlinking node takes an EX on the dentry lock
before moving the dentry to the orphan directory. Other nodes that have
this dentry in cache have a PR on the same dentry lock. When the EX is
requested, the other nodes flag the corresponding inode as MAYBE_ORPHANED
during downconvert. The inode is finally deleted when the last node to iput
the inode sees that i_nlink==0 and the MAYBE_ORPHANED flag is set.

A problem arises if a node is forced to free dentry locks because of memory
pressure. If this happens, the node will no longer get downconvert
notifications for the dentries that have been unlinked on another node.
If it also happens that node is actively using the corresponding inode and
happens to be the one performing the last iput on that inode, it will fail
to delete the inode as it will not have the MAYBE_ORPHANED flag set.

This patch fixes this shortcoming by introducing a periodic scan of the
orphan directories to delete such inodes. Care has been taken to distribute
the workload across the cluster so that no one node has to perform the task
all the time.

Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
6ca497a83e592d64e050c4d04b6dedb8c915f39a 06-Mar-2009 wengang wang <wen.gang.wang@oracle.com> ocfs2: fix rare stale inode errors when exporting via nfs

For nfs exporting, ocfs2_get_dentry() returns the dentry for fh.
ocfs2_get_dentry() may read from disk when the inode is not in memory,
without any cross cluster lock. this leads to the file system loading a
stale inode.

This patch fixes above problem.

Solution is that in case of inode is not in memory, we get the cluster
lock(PR) of alloc inode where the inode in question is allocated from (this
causes node on which deletion is done sync the alloc inode) before reading
out the inode itsself. then we check the bitmap in the group (the inode in
question allcated from) to see if the bit is clear. if it's clear then it's
stale. if the bit is set, we then check generation as the existing code
does.

We have to read out the inode in question from disk first to know its alloc
slot and allot bit. And if its not stale we read it out using ocfs2_iget().
The second read should then be from cache.

And also we have to add a per superblock nfs_sync_lock to cover the lock for
alloc inode and that for inode in question. this is because ocfs2_get_dentry()
and ocfs2_delete_inode() lock on them in reverse order. nfs_sync_lock is locked
in EX mode in ocfs2_get_dentry() and in PR mode in ocfs2_delete_inode(). so
that mutliple ocfs2_delete_inode() can run concurrently in normal case.

[mfasheh@suse.com: build warning fixes and comment cleanups]
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
c74ff8bb2235d848beb67fcfddae71ecbe3f92b1 03-Feb-2009 Sunil Mushran <sunil.mushran@oracle.com> ocfs2: Cleanup the lockname print in dlmglue.c

The dentry lock has a different format than other locks. This patch fixes
ocfs2_log_dlm_error() macro to make it print the dentry lock correctly.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
a4b91965d39d5d53b470d6aa62cba155a6f3ffe1 30-Jan-2009 Sunil Mushran <sunil.mushran@oracle.com> ocfs2: Wakeup the downconvert thread after a successful cancel convert

When two nodes holding PR locks on a resource concurrently attempt to
upconvert the locks to EX, the master sends a BAST to one of the nodes. This
message tells that node to first cancel convert the upconvert request,
followed by downconvert to a NL. Only when this lock is downconverted to NL,
can the master upconvert the first node's lock to EX.

While the fs was doing the cancel convert, it was forgetting to wake up the
dc thread after a successful cancel, leading to a deadlock.

Reported-and-Tested-by: David Teigland <teigland@redhat.com>
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
73ac36ea14fd18ea3dc057e41b16ff31a3c0bd5a 08-Jan-2009 Coly Li <coyli@suse.de> fix similar typos to successfull

When I review ocfs2 code, find there are 2 typos to "successfull". After
doing grep "successfull " in kernel tree, 22 typos found totally -- great
minds always think alike :)

This patch fixes all the similar typos. Thanks for Randy's ack and comments.

Signed-off-by: Coly Li <coyli@suse.de>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Roland Dreier <rolandd@cisco.com>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
a641dc2a5a1445eb4cb491080dfc41c42a9eb37d 25-Dec-2008 Mark Fasheh <mfasheh@suse.com> ocfs2: remove unneeded lvb casts

dlmglue.c has lots of code which casts the return value of ocfs2_dlm_lvb().
This is pointless however, as ocfs2_dlm_lvb() returns void *.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
85eb8b73d66530bb7b931789ae7a5ec9744eed34 25-Nov-2008 Joel Becker <Joel.Becker@oracle.com> ocfs2: Fix ocfs2_read_quota_block() error handling.

ocfs2_bread() has become ocfs2_read_virt_blocks(), with a prototype to
match ocfs2_read_blocks(). The quota code, converting from
ocfs2_bread(), wraps the call to ocfs2_read_virt_blocks() in
ocfs2_read_quota_block(). Unfortunately, the prototype of
ocfs2_read_quota_block() matches the old prototype of ocfs2_bread().

The problem is that ocfs2_bread() returned the buffer head, and callers
assumed that a NULL pointer was indicative of error. It wasn't. This
is why ocfs2_bread() took an int*err argument as well.

The new prototype of ocfs2_read_virt_blocks() avoids this error handling
confusion. Let's change ocfs2_read_quota_block() to match.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
9e33d69f553aaf11377307e8d6f82deb3385e351 25-Aug-2008 Jan Kara <jack@suse.cz> ocfs2: Implementation of local and global quota file handling

For each quota type each node has local quota file. In this file it stores
changes users have made to disk usage via this node. Once in a while this
information is synced to global file (and thus with other nodes) so that
limits enforcement at least aproximately works.

Global quota files contain all the information about usage and limits. It's
mostly handled by the generic VFS code (which implements a trie of structures
inside a quota file). We only have to provide functions to convert structures
from on-disk format to in-memory one. We also have to provide wrappers for
various quota functions starting transactions and acquiring necessary cluster
locks before the actual IO is really started.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
b657c95c11088d77fc1bfc9c84d940f778bf9d12 13-Nov-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Wrap inode block reads in a dedicated function.

The ocfs2 code currently reads inodes off disk with a simple
ocfs2_read_block() call. Each place that does this has a different set
of sanity checks it performs. Some check only the signature. A couple
validate the block number (the block read vs di->i_blkno). A couple
others check for VALID_FL. Only one place validates i_fs_generation. A
couple check nothing. Even when an error is found, they don't all do
the same thing.

We wrap inode reading into ocfs2_read_inode_block(). This will validate
all the above fields, going readonly if they are invalid (they never
should be). ocfs2_read_inode_block_full() is provided for the places
that want to pass read_block flags. Every caller is passing a struct
inode with a valid ip_blkno, so we don't need a separate blkno argument
either.

We will remove the validation checks from the rest of the code in a
later commit, as they are no longer necessary.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
07f9eebcdfaeefc8f807fa1bcce1d7c3ae6661b1 17-Nov-2008 David Teigland <teigland@redhat.com> ocfs2: fix wake_up in unlock_ast

In ocfs2_unlock_ast(), call wake_up() on lockres before releasing
the spin lock on it. As soon as the spin lock is released, the
lockres can be freed.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
0fcaa56a2a020dd6f90c202b7084e6f4cbedb6c2 10-Oct-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Simplify ocfs2_read_block()

More than 30 callers of ocfs2_read_block() pass exactly OCFS2_BH_CACHED.
Only six pass a different flag set. Rather than have every caller care,
let's make ocfs2_read_block() take no flags and always do a cached read.
The remaining six places can call ocfs2_read_blocks() directly.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
31d33073ca38603dea705dae45e094a64ca062d6 10-Oct-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Require an inode for ocfs2_read_block(s)().

Now that synchronous readers are using ocfs2_read_blocks_sync(), all
callers of ocfs2_read_blocks() are passing an inode. Use it
unconditionally. Since it's there, we don't need to pass the
ocfs2_super either.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
dd25e55ea133b14678cfaa9e205b082b24b26dbc 28-May-2008 Randy Dunlap <randy.dunlap@oracle.com> ocfs2: fix printk format warnings with OCFS2_FS_STATS=n

Fix printk format warnings when OCFS2_FS_STATS=n:

linux-next-20080528/fs/ocfs2/dlmglue.c: In function 'ocfs2_dlm_seq_show':
linux-next-20080528/fs/ocfs2/dlmglue.c:2623: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'int'
linux-next-20080528/fs/ocfs2/dlmglue.c:2623: warning: format '%llu' expects type 'long long unsigned int', but argument 4 has type 'int'
linux-next-20080528/fs/ocfs2/dlmglue.c:2623: warning: format '%llu' expects type 'long long unsigned int', but argument 7 has type 'int'
linux-next-20080528/fs/ocfs2/dlmglue.c:2623: warning: format '%llu' expects type 'long long unsigned int', but argument 8 has type 'int'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
8ddb7b004dfa1832a750e199df8bff4b75b73565 13-May-2008 Sunil Mushran <sunil.mushran@oracle.com> [PATCH 2/2] ocfs2: Instrument fs cluster locks

This patch adds code to track the number of times the fs takes
various cluster locks as well as the times associated with it.
The information is made available to users via debugfs.

This patch was originally written by Jan Kara <jack@suse.cz>.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
e988cf1cfed4ed80bf40528e655fe18bed6a38b6 10-Jul-2008 Mark Fasheh <mfasheh@suse.com> ocfs2: Fix flags in ocfs2_file_lock

The stack-glue merge changed the way we use flags in dlmglue in that we now
use the fs/dlm equivalents. Unfortunately, a merge error left the new flock
code only partially updated. This took a while to show up though, because
the lock level constants are actually identical between o2dlm and fs/dlm.
The *_CONVERT and *_NOQUEUE flags have different values though, which is
eventually causing a crash in flags_to_o2dlm().

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
9c6c877c04ce17d76a35d2173d3a3840d6b796a2 02-Feb-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Add the 'cluster_stack' sysfs file.

Userspace can now query and specify the cluster stack in use via the
/sys/fs/ocfs2/cluster_stack file. By default, it is 'o2cb', which is
the classic stack. Thus, old tools that do not know how to modify this
file will work just fine. The stack cannot be modified if there is a
live filesystem.

ocfs2_cluster_connect() now takes the expected cluster stack as an
argument. This way, the filesystem and the stack glue ensure they are
speaking to the same backend.

If the stack is 'o2cb', the o2cb stack plugin is used. For any other
value, the fsdlm stack plugin is selected.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
286eaa95c5c5915a6b72cc3f0a2534161fd7928b 02-Feb-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Break out stackglue into modules.

We define the ocfs2_stack_plugin structure to represent a stack driver.
The o2cb stack code is split into stack_o2cb.c. This becomes the
ocfs2_stack_o2cb.ko module.

The stackglue generic functions are similarly split into the
ocfs2_stackglue.ko module. This module now provides an interface to
register drivers. The ocfs2_stack_o2cb driver registers itself. As
part of this interface, ocfs2_stackglue can load drivers on demand.
This is accomplished in ocfs2_cluster_connect().

ocfs2_cluster_disconnect() is now notified when a _hangup() is pending.
If a hangup is pending, it will not release the driver module and will
let _hangup() do that.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
63e0c48ae6986a5bbb8e8dd9210c0e6ca79f2e50 31-Jan-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Clean up stackglue initialization

The stack glue initialization function needs a better name so that it can be
used cleanly when stackglue becomes a module.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
cf0acdcd640e9466059e69951c557e90b4bee45a 30-Jan-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Abstract out a debugging function for underlying dlms.

dlmglue.c was still referencing a raw o2dlm lksb in one instance. Let's
create a generic ocfs2_dlm_dump_lksb() function. This allows underlying
DLMs to print whatever they want about their lock.

We then move the o2dlm dump into stackglue.c where it belongs.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
1693a5c0117f8ccd010a666f97aaf0f14fb0a0e4 31-Jan-2008 David Teigland <teigland@redhat.com> ocfs2: handle async EAGAIN from NOQUEUE request

When using fsdlm, -EAGAIN is returned in the async callback for NOQUEUE
requests. Fix up dlmglue to expect this.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
de551246e7bc5558371c3427889a8db1b8cc60f4 01-Feb-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Remove CANCELGRANT from the view of dlmglue.

o2dlm has the non-standard behavior of providing a cancel callback
(unlock_ast) even when the cancel has failed (the locking operation
succeeded without canceling). This is called CANCELGRANT after the
status code sent to the callback. fs/dlm does not provide this
callback, so dlmglue must be changed to live without it.
o2dlm_unlock_ast_wrapper() in stackglue now ignores CANCELGRANT calls.

Because dlmglue no longer sees CANCELGRANT, ocfs2_unlock_ast() no longer
needs to check for it. ocfs2_locking_ast() must catch that a cancel was
tried and clear the cancel state.

Making these changes opens up a locking race. dlmglue uses the the
OCFS2_LOCK_BUSY flag to ensure only one thread is calling the dlm at any
one time. But dlmglue must unlock the lockres before calling into the
dlm. In the small window of time between unlocking the lockres and
calling the dlm, the downconvert thread can try to cancel the lock. The
downconvert thread is checking the OCFS2_LOCK_BUSY flag - it doesn't
know that ocfs2_dlm_lock() has not yet been called.

Because ocfs2_dlm_lock() has not yet been called, the cancel operation
will just be a no-op. There's nothing to cancel. With CANCELGRANT,
dlmglue uses the CANCELGRANT callback to clear up the cancel state.
When it comes around again, it will retry the cancel. Eventually, the
first thread will have called into ocfs2_dlm_lock(), and either the
lock or the cancel will succeed. The downconvert thread can then do its
downconvert.

Without CANCELGRANT, there is nothing to clean up the cancellation
state. The downconvert thread does not know to retry its operations.
More importantly, the original lock may be blocking on the other node
that is trying to cancel us. With neither able to make progress, the
ast is never called and the cancellation state is never cleaned up that
way. dlmglue is deadlocked.

The OCFS2_LOCK_PENDING flag is introduced to remedy this window. It is
set at the same time OCFS2_LOCK_BUSY is. Thus, the downconvert thread
can check whether the lock is cancelable. If not, it just loops around
to try again. Once ocfs2_dlm_lock() is called, the thread then clears
OCFS2_LOCK_PENDING and wakes the downconvert thread. Now, if the
downconvert thread finds the lock BUSY, it can safely try to cancel it.
Whether the cancel works or not, the state will be properly set and the
lock processing can continue.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
0abd6d1803b01c741430af270026d1d95a103d9c 30-Jan-2008 Mark Fasheh <mfasheh@suse.com> ocfs2: Fill node number during cluster stack init

It doesn't make sense to query for a node number before connecting to the
cluster stack. This should be safe to do because node_num is only just
printed,
and we're actually only moving the setting of node num a small amount
further in the mount process.

[ Disconnect when node query fails -- Joel ]

Reviewed-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
6953b4c008628b945bfe0cee97f6e78a98773859 30-Jan-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Move o2hb functionality into the stack glue.

The last bit of classic stack used directly in ocfs2 code is o2hb.
Specifically, the check for heartbeat during mount and the call to
ocfs2_hb_ctl during unmount.

We create an extra API, ocfs2_cluster_hangup(), to encapsulate the call
to ocfs2_hb_ctl. Other stacks will just leave hangup() empty.

The check for heartbeat is moved into ocfs2_cluster_connect(). It will
be matched by a similar check for other stacks.

With this change, only stackglue.c includes cluster/ headers.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
4670c46ded9a18268d1265417ff4ac72145a7917 01-Feb-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Introduce the new ocfs2_cluster_connect/disconnect() API.

This step introduces a cluster stack agnostic API for initializing and
exiting. fs/ocfs2/dlmglue.c no longer uses o2cb/o2dlm knowledge to
connect to the stack. It is all handled in stackglue.c.

heartbeat.c no longer needs to know how it gets called.
ocfs2_do_node_down() is now a clean recovery trigger.

The big gotcha is the ordering of initializations and de-initializations done
underneath ocfs2_cluster_connect(). ocfs2_dlm_init() used to do all
o2dlm initialization in one block. Thus, the o2dlm functionality of
ocfs2_cluster_connect() is very straightforward. ocfs2_dlm_shutdown(),
however, did a few things between de-registration of the eviction
callback and actually shutting down the domain. Now de-registration and
shutdown of the domain are wrapped within the single
ocfs2_cluster_disconnect() call. I've checked the code paths to make
sure we can safely tear down things in ocfs2_dlm_shutdown() before
calling ocfs2_cluster_disconnect(). The filesystem has already set
itself to ignore the callback.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
8f2c9c1b16bf6ed0903b29c49d56fa0109a390e4 01-Feb-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Create the lock status block union.

Wrap the lock status block (lksb) in a union. Later we will add a union
element for the fs/dlm lksb. Create accessors for the status and lvb
fields.

Other than a debugging function, dlmglue.c does not directly reference
the o2dlm locking path anymore.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
7431cd7e8dd0e46e9b12bd6a1ac1286f4b420371 01-Feb-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Use -errno instead of dlm_status for ocfs2_dlm_lock/unlock() API.

Change the ocfs2_dlm_lock/unlock() functions to return -errno values.
This is the first step towards elminiating dlm_status in
fs/ocfs2/dlmglue.c. The change also passes -errno values to
->unlock_ast().

[ Fix a return code in dlmglue.c and change the error translation table into
an array of ints. --Mark ]

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
bd3e76105d4478ab89951a52d1a35250d24a9f16 01-Feb-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Use global DLM_ constants in generic code.

The ocfs2 generic code should use the values in <linux/dlmconstants.h>.
stackglue.c will convert them to o2dlm values.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
24ef1815e5e13e50196eb1ab8ddc0d783443bdf8 30-Jan-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Separate out dlm lock functions.

This is the first in a series of patches to isolate ocfs2 from the
underlying cluster stack. Here we wrap the dlm locking functions with
ocfs2-specific calls. Because ocfs2 always uses the same dlm lock status
callbacks, we can eliminate the callbacks from the filesystem visible
functions.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
553abd046af609191a91af7289d87d477adc659f 01-Feb-2008 Joel Becker <joel.becker@oracle.com> ocfs2: Change the recovery map to an array of node numbers.

The old recovery map was a bitmap of node numbers. This was sufficient
for the maximum node number of 254. Going forward, we want node numbers
to be UINT32. Thus, we need a new recovery map.

Note that we can't keep track of slots here. We must write down the
node number to recovery *before* we get the locks needed to convert a
node number into a slot number.

The recovery map is now an array of unsigned ints, max_slots in size.
It moves to journal.c with the rest of recovery.

Because it needs to be initialized, we move all of recovery initialization
into a new function, ocfs2_recovery_init(). This actually cleans up
ocfs2_initialize_super() a little as well. Following on, recovery cleaup
becomes part of ocfs2_recovery_exit().

A number of node map functions are rendered obsolete and are removed.

Finally, waiting on recovery is wrapped in a function rather than naked
checks on the recovery_event. This is a cleanup from Mark.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
8e8a4603b5422c9145880e73b23bc4c2c4de0098 01-Feb-2008 Mark Fasheh <mfasheh@suse.com> ocfs2: Move slot map access into slot_map.c

journal.c and dlmglue.c would refresh the slot map by hand. Instead, have
the update and clear functions do the work inside slot_map.c. The eventual
result is to make ocfs2_slot_info defined privately in slot_map.c

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
90d99779a4cc134daaf8910d814b7a8a5d1e8970 22-Jan-2008 Jan Engelhardt <jengelh@computergmbh.de> [PATCH] [OCFS2]: constify function pointer tables

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
200bfae37a15e50e0f9aa5683958bdfc3fd55e05 17-Feb-2008 Adrian Bunk <bunk@kernel.org> [2.6 patch] make ocfs2_downconvert_thread() static

This patch makes the needlessly global ocfs2_downconvert_thread()
static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
006000566d4e95b8d1924addfb41094acf0d5ec2 28-Jan-2008 Adrian Bunk <bunk@kernel.org> [2.6 patch] fs/ocfs2/: possible cleanups

This patch contains the following cleanups that are now possible:
- make the following needlessly global functions static:
- dlmglue.c:ocfs2_process_blocked_lock()
- heartbeat.c:ocfs2_node_map_init()
- #if 0 the following unused global function plus support functions:
- heartbeat.c:ocfs2_node_map_is_only()

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
1044e401af9a309637828aa3cc8f3b6409fcbf4e 29-Feb-2008 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Fix writeout in ocfs2_data_convert_worker()

Commit f1f540688eae66c274ff1c1133b5d9c687b28f58 "optimized"
ocfs2_data_convert_worker() to "only do work for regular files".
Unfortunately, I left out a '!', which casued it to *skip* regular files.
This was hidden from testing until recently because the default data
journaling mode (data=ordered) doesn't exercise this code.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
d24fbcda0c4988322949df3d759f1cfb32b32953 26-Jan-2008 Joel Becker <Joel.Becker@oracle.com> ocfs2: Negotiate locking protocol versions.

Currently, when ocfs2 nodes connect via TCP, they advertise their
compatibility level. If the versions do not match, two nodes cannot speak
to each other and they disconnect. As a result, this provides no forward or
backwards compatibility.

This patch implements a simple protocol negotiation at the dlm level by
introducing a major/minor version number scheme for entities that
communicate. Specifically, o2dlm has a major/minor version for interaction
with o2dlm on other nodes, and ocfs2 itself has a major/minor version for
interacting with the filesystem on other nodes.

This will allow rolling upgrades of ocfs2 clusters when changes to the
locking or network protocols can be done in a backwards compatible manner.
In those cases, only the minor number is changed and the negotatied protocol
minor is returned from dlm join. In the far less likely event that a
required protocol change makes backwards compatibility impossible, we simply
bump the major number.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
cf8e06f1a860d8680d6bb4ac8ec7d7724988e46f 21-Dec-2007 Mark Fasheh <mark.fasheh@oracle.com> [PATCH 1/2] ocfs2: add flock lock type

This adds a new dlmglue lock type which is intended to back flock()
requests.

Since these locks are driven from userspace, usage rules are much more
liberal than the typical Ocfs2 internal cluster lock. As a result, we can't
make use of most dlmglue features - lock caching and lock level
optimizations in particular. Additionally, userspace is free to deadlock
itself, so we have to deal with that in the same way as the rest of the
kernel - by allowing a signal to abort a lock request.

In order to keep ocfs2_cluster_lock() complexity down, ocfs2_file_lock()
does it's own dlm coordination. We still use the same helper functions
though, so duplicated code is kept to a minimum.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
e63aecb651ba73dffc62f9608ee1b7ae2a0ffd4b 19-Oct-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Rename ocfs2_meta_[un]lock

Call this the "inode_lock" now, since it covers both data and meta data.
This patch makes no functional changes.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
c934a92d05b549dd2f25db72c5fc3cb9dcf1b611 19-Oct-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Remove data locks

The meta lock now covers both meta data and data, so this just removes the
now-redundant data lock.

Combining locks saves us a round of lock mastery per inode and one less lock
to ping between nodes during read/write.

We don't lose much - since meta locks were always held before a data lock
(and at the same level) ordered writeout mode (the default) ensured that
flushing for the meta data lock also pushed out data anyways.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
f1f540688eae66c274ff1c1133b5d9c687b28f58 19-Oct-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Add data downconvert worker to inode lock

In order to extend inode lock coverage to inode data, we use the same data
downconvert worker with only a small modification to only do work for
regular files.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
34d024f84345807bf44163fac84e921513dde323 25-Sep-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Remove mount/unmount votes

The node maps that are set/unset by these votes are no longer relevant, thus
we can remove the mount and umount votes. Since those are the last two
remaining votes, we can also remove the entire vote infrastructure.

The vote thread has been renamed to the downconvert thread, and the small
amount of functionality related to managing it has been moved into
fs/ocfs2/dlmglue.c. All references to votes have been removed or updated.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
019d1b2247c6898589560c6f3b3e7ec280b0010a 05-Oct-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Create locks at initially requested level

If we have not yet created a cluster lock, ocfs2_cluster_lock() will
first create it at NLMODE, and then convert the lock to either PRMODE or
EXMODE (whichever is requested).

Change ocfs2_cluster_lock() to just create the lock at the initially
requested level. ocfs2_locking_ast() handles this case fine, so the only
update required was in setup of locking state. This should reduce the number
of network messages required for a new lock by one, providing an incremental
performance enhancement.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
3cf0c507dd28de0e1a4c24304d806e6b3976f0f5 27-Oct-2007 Roel Kluin <12o3l@tiscali.nl> [PATCH] Fix priority mistakes in fs/ocfs2/{alloc.c, dlmglue.c}

Fixes priority mistakes similar to '!x & y'

Signed-off-by: Roel Kluin <12o3l@tiscali.nl>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
15b1e36bdb487d67ef924a37b0967453143be53a 07-Sep-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Structure updates for inline data

Add the disk, network and memory structures needed to support data in inode.

Struct ocfs2_inline_data is defined and embedded in ocfs2_dinode for storing
inline data.

A new inode field, i_dyn_features, is added to facilitate tracking of
dynamic inode state. Since it will be used often, we want to mirror it on
ocfs2_inode_info, and transfer it via the meta data lvb.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>
800deef3f6f87fee3a2e89cf7237a1f20c1a78d7 17-May-2007 Christoph Hellwig <hch@lst.de> [PATCH] ocfs2: use list_for_each_entry where benefical

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
e63340ae6b6205fef26b40a75673d1c9c0c8bb90 08-May-2007 Randy Dunlap <randy.dunlap@oracle.com> header cleaning: don't include smp_lock.h when not used

Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6cb129f5675c39944e5fe18fd2530a2eb771b754 26-Apr-2007 Adrian Bunk <bunk@stusta.de> [PATCH] fs/ocfs2/: make 3 functions static

This patch makes the following needlessly global functions static:
- aops.c: ocfs2_write_data_page()
- dlmglue.c: ocfs2_dump_meta_lvb_info()
- file.c: ocfs2_set_inode_size()

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
83418978827324918a8cd25ce5227312de1d4468 24-Apr-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Cache extent records

The extent map code was ripped out earlier because of an inability to deal
with holes. This patch adds back a simpler caching scheme requiring far less
code.

Our old extent map caching was designed back when meta data block caching in
Ocfs2 didn't work very well, resulting in many disk reads. These days our
metadata caching is much better, resulting in no un-necessary disk reads. As
a result, extent caching doesn't have to be as fancy, nor does it have to
cache as many extents. Keeping the last 3 extents seen should be sufficient
to give us a small performance boost on some streaming workloads.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
8110b073a9135acf0a71bccfc20c0d1023f179c6 23-Mar-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Fix up i_blocks calculation to know about holes

Older file systems which didn't support holes did a dumb calculation of
i_blocks based on i_size. This is no longer accurate, so fix things up to
take actual allocation into account.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
363041a5f74b953ab6b705ac9c88e5eda218a24b 17-Jan-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: temporarily remove extent map caching

The code in extent_map.c is not prepared to deal with a subtree being
rotated between lookups. This can happen when filling holes in sparse files.
Instead of a lengthy patch to update the code (which would likely lose the
benefit of caching subtree roots), we remove most of the algorithms and
implement a simple path based lookup. A less ambitious extent caching scheme
will be added in a later patch.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
500086300e6dc5308a7328990bd50d17e075162b 21-Mar-2007 Tiger Yang <tiger.yang@oracle.com> ocfs2: Remove delete inode vote

Ocfs2 currently does cluster-wide node messaging to check the open state of
an inode during delete. This patch removes that mechanism in favor of an
inode cluster lock which is taken at shared read when an inode is first read
and dropped in clear_inode(). This allows a deleting node to test the
liveness of an inode by attempting to take an exclusive lock.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
be9e986b824b41c9d5cc5eca34ee3424c35fd162 19-Apr-2007 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Local mounts should skip inode updates

We don't want the extent map and uptodate cache destruction in
ocfs2_meta_lock_update() on a local mount, so skip that.

This fixes several bugs with uptodate being cleared on buffers and extent
maps being corrupted.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
7f4a2a97e324e8c826d1d983bc8efb5c59194f02 11-Dec-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: always unmap in ocfs2_data_convert_worker()

Mmap-heavy clustered workloads were sometimes finding stale data on mmap
reads. The solution is to call unmap_mapping_range() on any down convert of
a data lock.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
c271c5c22b0a7ca45fda15f1f4d258bca36a5b94 06-Dec-2006 Sunil Mushran <Sunil.Mushran@oracle.com> ocfs2: local mounts

This allows users to format an ocfs2 file system with a special flag,
OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT. When the file system sees this flag, it
will not use any cluster services, nor will it require a cluster
configuration, thus acting like a 'local' file system.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
7f1a37e31f94b4f1c123d32ce9f69205ab2095bd 15-Nov-2006 Tiger Yang <tiger.yang@oracle.com> ocfs2: core atime update functions

This patch adds the core routines for updating atime in ocfs2.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
4bcec1847ac4f75c2ee6d091b495f34d8d822e6a 10-Oct-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: remove unused handle argument from ocfs2_meta_lock_full()

Now that this is unused and all callers pass NULL, we can safely remove it.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
daf29e9cdab7219838c6b6e82380aec3466cf379 07-Oct-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: remove unused ocfs2_handle_add_lock()

This gets us rid of a slab we no longer need, as well as removing the
majority of what's left on ocfs2_journal_handle.

ocfs2_commit_unstarted_handle() has no more real work to do, so remove that
function too.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
da66116eef7da8557762c9b5efdc435bc0afecfa 20-Nov-2006 Adrian Bunk <bunk@stusta.de> [2.6 patch] make ocfs2_create_new_lock() static

This patch makes the needlessly global ocfs2_create_new_lock() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
8e18e2941c53416aa219708e7dcad21fb4bd6794 27-Sep-2006 Theodore Ts'o <tytso@mit.edu> [PATCH] inode_diet: Replace inode.u.generic_ip with inode.i_private

The following patches reduce the size of the VFS inode structure by 28 bytes
on a UP x86. (It would be more on an x86_64 system). This is a 10% reduction
in the inode size on a UP kernel that is configured in a production mode
(i.e., with no spinlock or other debugging functions enabled; if you want to
save memory taken up by in-core inodes, the first thing you should do is
disable the debugging options; they are responsible for a huge amount of bloat
in the VFS inode structure).

This patch:

The filesystem or device-specific pointer in the inode is inside a union,
which is pretty pointless given that all 30+ users of this field have been
using the void pointer. Get rid of the union and rename it to i_private, with
a comment to explain who is allowed to use the void pointer. This is just a
cleanup, but it allows us to reuse the union 'u' for something something where
the union will actually be used.

[judith@osdl.org: powerpc build fix]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Judith Lebzelter <judith@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
0d5dc6c2dd7a3cd2b2f505b0625c4ec9c0e5b4f0 14-Sep-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Teach ocfs2_drop_lock() to use ->set_lvb() callback

With this, we don't need to pass an additional struct with function pointer.

Now that the callbacks are fully used, comment the remaining API.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
b5e500e23e532795fbf79a3cdbcb014f207fdb2a 14-Sep-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Remove ->unblock lockres operation

Have ocfs2_process_blocked_lock() call ocfs2_generic_unblock_lock(), which
gets to be ocfs2_unblock_lock() now that it's the only possible unblock
function.

Remove the ->unblock() callback from the structure, and all lock type
specific unblock functions.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
cc567d89b3af4294580c9c97610d2c1018032e33 14-Sep-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: move downconvert worker to lockres ops

This way lock types don't have to manually pass it to
ocfs2_generic_unblock_lock().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
08280f11de91beac2f5234ce5fc2ed246dfe6a86 14-Sep-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Remove unused dlmglue functions

The meta data unblocking code no longer needs ocfs2_do_unblock_meta() or
ocfs2_can_downconvert_meta_lock(), so remove them.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
810d5aeba18825c754cf47db59eb83814a54bb27 14-Sep-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Have the metadata lock use generic dlmglue functions

Fill in the ->check_downconvert and ->set_lvb callbacks with meta data
specific operations and switch ocfs2_unblock_meta() to call
ocfs2_generic_unblock_lock()

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
5ef0d4ea087740908f4fb57606f6c09e3b90c477 14-Sep-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Add ->set_lvb callback in dlmglue

This allows a lock type to set the value block before downconvert.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
16d5b9567ad5241b5c6e0cc4778c1af6c04bb801 14-Sep-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Add ->check_downconvert callback in dlmglue

This will allow lock types to force a requeue of a lock downconvert.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
f7fbfdd1fc91543253ba742a926a29c289f8e6ca 14-Sep-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Check for refreshing locks in generic unblock function

Tidy up the exit path a bit too.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
b80fc012e03f8f207911b5eafe6916b000e03c8b 13-Sep-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: don't unconditionally pass LVB flags

Allow a lock type to specifiy whether it makes use of the LVB. The only type
which does this right now is the meta data lock. This should save us some
space on network messages since they won't have to needlessly transmit value
blocks.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
aa2623ad80577b37637914e809bafa36994ccdf1 13-Sep-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: combine inode and generic blocking AST functions

There is extremely little difference between the two now. We can remove the
callback from ocfs2_lock_res_ops as well.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
54a7e7552e484c08db221e49c4519ccdeb8882d0 13-Sep-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Add ->get_osb() dlmglue locking operation

Will be used to find the ocfs2_super structure from a given lockres.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2a45f2d13e1dd91bc110801f5818379f2699509c 13-Sep-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: remove ->unlock_ast() callback from ocfs2_lock_res_ops

This was always defined to the same function in all locks, so clean things
up by removing and passing ocfs2_unlock_ast() directly to the DLM.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
e92d57df273a3a7e57688e1d4f5a894870d550d2 13-Sep-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: combine inode and generic AST functions

There is extremely little difference between the two now. We can remove the
callback from ocfs2_lock_res_ops as well.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
f625c9793b6cc64aeb1b6387039d09019c214352 13-Sep-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Clean up lock resource refresh flags

Use of the refresh mechanism is lock-type wide, so move knowledge of that to
the ocfs2_lock_res_ops structure.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
24c19ef40474c3930597f31ae233dc06319bd881 23-Sep-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Remove i_generation from inode lock names

OCFS2 puts inode meta data in the "lock value block" provided by the DLM.
Typically, i_generation is encoded in the lock name so that a deleted inode
on and a new one in the same block don't share the same lvb.

Unfortunately, that scheme means that the read in ocfs2_read_locked_inode()
is potentially thrown away as soon as the meta data lock is taken - we
cannot encode the lock name without first knowing i_generation, which
requires a disk read.

This patch encodes i_generation in the inode meta data lvb, and removes the
value from the inode meta data lock name. This way, the read can be covered
by a lock, and at the same time we can distinguish between an up to date and
a stale LVB.

This will help cold-cache stat(2) performance in particular.

Since this patch changes the protocol version, we take the opportunity to do
a minor re-organization of two of the LVB fields.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
f9e2d82e6395cfa0802446b54b63cc412089d82c 13-Sep-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Encode i_generation in the meta data lvb

When i_generation is removed from the lockname, this will help us determine
whether a meta data lvb has information that is in sync with the local
struct inode.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
4d3b83f7364269b66cdda271f680bd99e77afd96 13-Sep-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Free up some space in the lvb

lvb_version doesn't need to be a whole 32 bits. Make it an 8 bit field to
free up some space. This should be backwards compatible until we use one of
the fields, in which case we'd bump the lvb version anyway.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
d680efe9d8fe0eb99d9dd063a4def6b362cdb40d 08-Sep-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Add new cluster lock type

Replace the dentry vote mechanism with a cluster lock which covers a set
of dentries. This allows us to force d_delete() only on nodes which actually
care about an unlink.

Every node that does a ->lookup() gets a read only lock on the dentry, until
an unlink during which the unlinking node, will request an exclusive lock,
forcing the other nodes who care about that dentry to d_delete() it. The
effect is that we retain a very lightweight ->d_revalidate(), and at the
same time get to make large improvements to the average case performance of
the ocfs2 unlink and rename operations.

This patch adds the cluster lock type which OCFS2 can attach to
dentries. A small number of fs/ocfs2/dcache.c functions are stubbed
out so that this change can compile.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
f0681062b8e369d9fb6f3ce10f4e3fc8cea5f910 08-Sep-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: Update dlmglue for new dlmlock() API

File system lock names are very regular right now, so we really only need to
pass an extra parameter to dlmlock().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
ca4d147e62df370c334898464023aa7f9126abe1 04-Jul-2006 Herbert Poetzl <herbert@13thfloor.at> ocfs2: add ext2 attributes

Support immutable, and other attributes.

Some renaming and other minor fixes done by myself.

Signed-off-by: Herbert Poetzl <herbert@13thfloor.at>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
784270435b001164054e803421a624ef1098519d 04-May-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: clean up some osb fields

Get rid of osb->uuid, osb->proc_sub_dir, and osb->osb_id. Those fields were
unused, or could easily be removed. As a result, we also no longer need
MAX_OSB_ID or ocfs2_globals_lock.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
34af946a22724c4e2b204957f2b24b22a0fb121c 27-Jun-2006 Ingo Molnar <mingo@elte.hu> [PATCH] spin/rwlock init cleanups

locking init cleanups:

- convert " = SPIN_LOCK_UNLOCKED" to spin_lock_init() or DEFINE_SPINLOCK()
- convert rwlocks in a similar manner

this patch was generated automatically.

Motivation:

- cleanliness
- lockdep needs control of lock initialization, which the open-coded
variants do not give
- it's also useful for -rt and for lock debugging in general

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
4b6f5d20b04dcbc3d888555522b90ba6d36c4106 28-Mar-2006 Arjan van de Ven <arjan@infradead.org> [PATCH] Make most file operations structs in fs/ const

This is a conversion to make the various file_operations structs in fs/
const. Basically a regexp job, with a few manual fixups

The goal is both to increase correctness (harder to accidentally write to
shared datastructures) and reducing the false sharing of cachelines with
things that get dirty in .data (while .rodata is nicely read only and thus
cache clean)

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
b0697053f9e8de9cea3d510d9e290851ece9460b 03-Mar-2006 Mark Fasheh <mark.fasheh@oracle.com> ocfs2: don't use MLF* in the file system

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
ccd979bdbce9fba8412beb3f1de68a9d0171b12c 15-Dec-2005 Mark Fasheh <mark.fasheh@oracle.com> [PATCH] OCFS2: The Second Oracle Cluster Filesystem

The OCFS2 file system module.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Kurt Hackel <kurt.hackel@oracle.com>