History log of /fs/dlm/lowcomms.c
Revision Date Author Comments
883854c5457a97190f7b0ee20f03bcd9664fc0c2 12-Jun-2014 Lidong Zhong <lzhong@suse.com> dlm: keep listening connection alive with sctp mode

The connection struct with nodeid 0 is the listening socket,
not a connection to another node. The sctp resend function
was not checking that the nodeid was valid (non-zero), so it
would mistakenly get and resend on the listening connection
when nodeid was zero.

Signed-off-by: Lidong Zhong <lzhong@suse.com>
Signed-off-by: David Teigland <teigland@redhat.com>
676d23690fb62b5d51ba5d659935e9f7d9da9f8e 11-Apr-2014 David S. Miller <davem@davemloft.net> net: Fix use after free by removing length arg from sk_data_ready callbacks.

Several spots in the kernel perform a sequence like:

skb_queue_tail(&sk->s_receive_queue, skb);
sk->sk_data_ready(sk, skb->len);

But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up. So this skb->len access is potentially
to freed up memory.

Furthermore, the skb->len can be modified by the consumer so it is
possible that the value isn't accurate.

And finally, no actual implementation of this callback actually uses
the length argument. And since nobody actually cared about it's
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.

So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.

Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.

Signed-off-by: David S. Miller <davem@davemloft.net>
048ed4b6266144fdee55089c9eef55b0c1d42ba1 21-Jan-2014 wangweidong <wangweidong1@huawei.com> sctp: remove macros sctp_{lock|release}_sock

Redefined {lock|release}_sock to sctp_{lock|release}_sock for user space friendly
code which we haven't use in years, so removing them.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ece35848c1847cdf3dd07954578d3e99238ebbae 10-Dec-2013 Dongmao Zhang <dmzhang@suse.com> dlm: set zero linger time on sctp socket

The recovery time for a failed node was taking a long
time because the failed node could not perform the full
shutdown process. Removing the linger time speeds this
up. The dlm does not care what happens to messages to
or from the failed node.

Signed-off-by: Dongmao Zhang <dmzhang@suse.com>
Signed-off-by: David Teigland <teigland@redhat.com>
06452eb0538827d2158945d20e3d33e359884437 19-Jun-2013 Wei Yongjun <yongjun_wei@trendmicro.com.cn> dlm: remove duplicated include from lowcomms.c

Remove duplicated include.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David Teigland <teigland@redhat.com>
86e92ad299fb0be359efdd61812944497d4d8d52 14-Jun-2013 Mike Christie <michaelc@cs.wisc.edu> dlm: disable nagle for SCTP

For TCP we disable Nagle and I cannot think of why it would be needed
for SCTP. When disabled it seems to improve dlm_lock operations like it
does for TCP.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: David Teigland <teigland@redhat.com>
5d6898714fe2ce485e95ac74479ed40ebd8d5748 14-Jun-2013 Mike Christie <michaelc@cs.wisc.edu> dlm: retry failed SCTP sends

Currently if a SCTP send fails, we lose the data we were trying
to send because the writequeue_entry is released when we do the send.
When this happens other nodes will then hang waiting for a reply.

This adds support for SCTP to retry the send operation.

I also removed the retry limit for SCTP use, because we want
to make sure we try every path during init time and for longer
failures we want to continually retry in case paths come back up
while trying other paths. We will do this until userspace tells us
to stop.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: David Teigland <teigland@redhat.com>
98e1b60ecc441625c91013e88f14cbd1b3c1fa08 14-Jun-2013 Mike Christie <michaelc@cs.wisc.edu> dlm: try other IPs when sctp init assoc fails

Currently, if we cannot create a association to the first IP addr
that is added to DLM, the SCTP init assoc code will just retry
the same IP. This patch adds a simple failover schemes where we
will try one of the addresses that was passed into DLM.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: David Teigland <teigland@redhat.com>
b390ca38d27bd3d2f409e64a6f13d6ff67eb4825 14-Jun-2013 Mike Christie <michaelc@cs.wisc.edu> dlm: clear correct bit during sctp init failure handling

We should be testing and cleaing the init pending bit because later
when sctp_init_assoc is recalled it will be checking that it is not set
and set the bit.

We do not want to touch CF_CONNECT_PENDING here because we will queue
swork and process_send_sockets will then call the connect_action function.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: David Teigland <teigland@redhat.com>
e1631d0c48ca5ba9878f5923ffe58ef6fd2d5fda 14-Jun-2013 Mike Christie <michaelc@cs.wisc.edu> dlm: set sctp assoc id during setup

sctp_assoc was not getting set so later lookups failed.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: David Teigland <teigland@redhat.com>
efad7e6b1a28be599836c8f15ec04f99a98fb04c 14-Jun-2013 Mike Christie <michaelc@cs.wisc.edu> dlm: clear correct init bit during sctp setup

We were clearing the base con's init pending flags, but the
con for the node was the one with the pending bit set.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: David Teigland <teigland@redhat.com>
1b8664341100716202c29d67f24d67094a82971e 09-Apr-2013 Daniel Borkmann <dborkman@redhat.com> net: sctp: introduce uapi header for sctp

This patch introduces an UAPI header for the SCTP protocol,
so that we can facilitate the maintenance and development of
user land applications or libraries, in particular in terms
of header synchronization.

To not break compatibility, some fragments from lksctp-tools'
netinet/sctp.h have been carefully included, while taking care
that neither kernel nor user land breaks, so both compile fine
with this change (for lksctp-tools I tested with the old
netinet/sctp.h header and with a newly adapted one that includes
the uapi sctp header). lksctp-tools smoke test run through
successfully as well in both cases.

Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b67bfe0d42cac56c512dd5da4b1b347a23f4b70a 28-Feb-2013 Sasha Levin <sasha.levin@oracle.com> hlist: drop the node parameter from iterators

I'm not sure why, but the hlist for each entry iterators were conceived

list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
eeee2b5fe1a9db15d3160da8048d9b89108753bf 18-Oct-2012 Wei Yongjun <yongjun_wei@trendmicro.com.cn> dlm: remove unused variable in *dlm_lowcomms_get_buffer()

The variable users is initialized but never used
otherwise, so remove the unused variable.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David Teigland <teigland@redhat.com>
9c5bef5849c9fde1a37ac005299f759440cbaf4c 13-Aug-2012 Ying Xue <ying.xue@windriver.com> dlm: cleanup send_to_sock routine

Remove unnecessary code form send_to_sock routine.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David Teigland <teigland@redhat.com>
4dd40f0cd99a3500c6df80eb8f537678559c761e 10-Aug-2012 Ying Xue <ying.xue@windriver.com> dlm: convert add_sock routine return value type to void

Since add_sock() always returns a success code - 0, its return
value type should be changed from integer to void.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David Teigland <teigland@redhat.com>
b4c798cf695dc7cee9798a686128461ad0070115 10-Aug-2012 Xue Ying <ying.xue@windriver.com> dlm: remove redundant variable assignments

Once the tcp_create_listen_sock() is returned successfully, we
will invoke add_sock() immediately. In add_sock(), the 'con'
variable is assigned to 'sk_user_data', meanwhile, the 'sock' is
also set to 'con->sock'. So it's unnecessary to do the same thing
in tcp_create_listen_sock().

Signed-off-by: Xue Ying <ying.xue@windriver.com>
Signed-off-by: David Teigland <teigland@redhat.com>
36b71a8bfbc92e1ba164e9aec840c0180ee933b5 26-Jul-2012 David Teigland <teigland@redhat.com> dlm: fix deadlock between dlm_send and dlm_controld

A deadlock sometimes occurs between dlm_controld closing
a lowcomms connection through configfs and dlm_send looking
up the address for a new connection in configfs.

dlm_controld does a configfs rmdir which calls
dlm_lowcomms_close which waits for dlm_send to
cancel work on the workqueues.

The dlm_send workqueue thread has called
tcp_connect_to_sock which calls dlm_nodeid_to_addr
which does a configfs lookup and blocks on a lock
held by dlm_controld in the rmdir path.

The solution here is to save the node addresses within
the lowcomms code so that the lowcomms workqueue does
not need to step through configfs to get a node address.

dlm_controld:
wait_for_completion+0x1d/0x20
__cancel_work_timer+0x1b3/0x1e0
cancel_work_sync+0x10/0x20
dlm_lowcomms_close+0x4c/0xb0 [dlm]
drop_comm+0x22/0x60 [dlm]
client_drop_item+0x26/0x50 [configfs]
configfs_rmdir+0x180/0x230 [configfs]
vfs_rmdir+0xbd/0xf0
do_rmdir+0x103/0x120
sys_rmdir+0x16/0x20

dlm_send:
mutex_lock+0x2b/0x50
get_comm+0x34/0x140 [dlm]
dlm_nodeid_to_addr+0x18/0xd0 [dlm]
tcp_connect_to_sock+0xf4/0x2d0 [dlm]
process_send_sockets+0x1d2/0x260 [dlm]
worker_thread+0x170/0x2a0

Signed-off-by: David Teigland <teigland@redhat.com>
513ef596d43cc35a72ae21170075136855641493 30-Mar-2012 David Teigland <teigland@redhat.com> dlm: prevent connections during shutdown

During lowcomms shutdown, a new connection could possibly
be created, and attempt to use a workqueue that's been
destroyed. Similarly, during startup, a new connection
could attempt to use a workqueue that's not been set up
yet. Add a global variable to indicate when new connections
are allowed.

Based on patch by: Christine Caulfield <ccaulfie@redhat.com>

Reported-by: dann frazier <dann.frazier@canonical.com>
Reviewed-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: David Teigland <teigland@redhat.com>
1b189b8889b7d8e0bddc2655d171c43cfd344157 21-Mar-2012 David Teigland <teigland@redhat.com> dlm: last element of dlm_local_addr[] never used

The last element of dlm_local_addr[DLM_MAX_ADDR_COUNT]
was not used because the loop ended at COUNT - 1.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2f2d76cc3e938389feee671b46252dde6880b3b7 08-Mar-2012 Benjamin Poirier <bpoirier@suse.de> dlm: Do not allocate a fd for peeloff

avoids allocating a fd that a) propagates to every kernel thread and
usermodehelper b) is not properly released.

References: http://article.gmane.org/gmane.linux.network.drbd/22529
Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
4e3fd7a06dc20b2d8ec6892233ad2012968fe7b6 21-Nov-2011 Alexey Dobriyan <adobriyan@gmail.com> net: remove ipv6_addr_copy()

C assignment can handle struct in6_addr copying.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bcaadf5c1ac4ff84b52174a84adb86a1e3e806dd 03-Jul-2011 Masatake YAMATO <yamato@redhat.com> dlm: dump address of unknown node

When the dlm fails to make a network connection to another
node, include the address of the node in the error message.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
25985edcedea6396277003854657b5f3cb31a628 31-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi> Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
e43f055a953721ed1787a039ab5e720755596ea2 10-Mar-2011 David Teigland <teigland@redhat.com> dlm: use alloc_workqueue function

Replaces deprecated create_singlethread_workqueue().

Signed-off-by: David Teigland <teigland@redhat.com>
6b155c8fd4d239f7d883d455bbad1be47724bbfc 11-Feb-2011 David Teigland <teigland@redhat.com> dlm: use single thread workqueues

The recent commit to use cmwq for send and recv threads
dcce240ead802d42b1e45ad2fcb2ed4a399cb255 introduced problems,
apparently due to multiple workqueue threads. Single threads
make the problems go away, so return to that until we fully
understand the concurrency issues with multiple threads.

Signed-off-by: David Teigland <teigland@redhat.com>
b9d41052794385f9d47ebb7acf4a772f3ad02398 13-Dec-2010 Namhyung Kim <namhyung@gmail.com> dlm: sanitize work_start() in lowcomms.c

The create_workqueue() returns NULL if failed rather than ERR_PTR().
Fix error checking and remove unnecessary variable 'error'.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: David Teigland <teigland@redhat.com>
f92c8dd7a0eb18124521e2b549f88422e17f707b 12-Nov-2010 Bob Peterson <rpeterso@redhat.com> dlm: reduce cond_resched during send

Calling cond_resched() after every send can unnecessarily
degrade performance. Go back to an old method of scheduling
after 25 messages.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
cb2d45da81c86d5191b19d0f67732a854bc0253c 12-Nov-2010 David Teigland <teigland@redhat.com> dlm: use TCP_NODELAY

Nagling doesn't help and can sometimes hurt dlm comms.

Signed-off-by: David Teigland <teigland@redhat.com>
dcce240ead802d42b1e45ad2fcb2ed4a399cb255 12-Nov-2010 Steven Whitehouse <swhiteho@redhat.com> dlm: Use cmwq for send and receive workqueues

So far as I can tell, there is no reason to use a single-threaded
send workqueue for dlm, since it may need to send to several sockets
concurrently. Both workqueues are set to WQ_MEM_RECLAIM to avoid
any possible deadlocks, WQ_HIGHPRI since locking traffic is highly
latency sensitive (and to avoid a priority inversion wrt GFS2's
glock_workqueue) and WQ_FREEZABLE just in case someone needs to do
that (even though with current cluster infrastructure, it doesn't
make sense as the node will most likely land up ejected from the
cluster) in the future.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: David Teigland <teigland@redhat.com>
b36930dd508e00f0c5083bcd57d25de6d0375c76 11-Nov-2010 David Miller <davem@davemloft.net> dlm: Handle application limited situations properly.

In the normal regime where an application uses non-blocking I/O
writes on a socket, they will handle -EAGAIN and use poll() to
wait for send space.

They don't actually sleep on the socket I/O write.

But kernel level RPC layers that do socket I/O operations directly
and key off of -EAGAIN on the write() to "try again later" don't
use poll(), they instead have their own sleeping mechanism and
rely upon ->sk_write_space() to trigger the wakeup.

So they do effectively sleep on the write(), but this mechanism
alone does not let the socket layers know what's going on.

Therefore they must emulate what would have happened, otherwise
TCP cannot possibly see that the connection is application window
size limited.

Handle this, therefore, like SUNRPC by setting SOCK_NOSPACE and
bumping the ->sk_write_count as needed when we hit the send buffer
limits.

This should make TCP send buffer size auto-tuning and the
->sk_write_space() callback invocations actually happen.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: David Teigland <teigland@redhat.com>
f70cb33b9c270f4f1a7f28327e7d35dbf1a6fc40 03-Aug-2010 Julia Lawall <julia@diku.dk> fs/dlm: Drop unnecessary null test

hlist_for_each_entry binds its first argument to a non-null value, and thus
any null test on the value of that argument is superfluous.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
iterator I;
expression x,E,E1,E2;
statement S,S1,S2;
@@

I(x,...) { <...
- (x != NULL) &&
E
...> }
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David Teigland <teigland@redhat.com>
5a0e3ad6af8660be21ca98a971cd00f331318c05 24-Mar-2010 Tejun Heo <tj@kernel.org> include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
573c24c4af6664ffcd9aa7ba617a35fde2b95534 30-Nov-2009 David Teigland <teigland@redhat.com> dlm: always use GFP_NOFS

Replace all GFP_KERNEL and ls_allocation with GFP_NOFS.
ls_allocation would be GFP_KERNEL for userland lockspaces
and GFP_NOFS for file system lockspaces.

It was discovered that any lockspaces on the system can
affect all others by triggering memory reclaim in the
file system which could in turn call back into the dlm
to acquire locks, deadlocking dlm threads that were
shared by all lockspaces, like dlm_recv.

Signed-off-by: David Teigland <teigland@redhat.com>
6861f350785bf476c2d4e3b9cb69ee36b78df2fc 24-Sep-2009 David Teigland <teigland@redhat.com> dlm: fix socket fd translation

The code to set up sctp sockets was not using the sockfd_lookup()
and sockfd_put() routines to translate an fd to a socket. The
direct fget and fput calls were resulting in error messages from
alloc_fd().

Also clean up two log messages and remove a third, related to
setting up sctp associations.

Signed-off-by: David Teigland <teigland@redhat.com>
04bedd79a7037ee7af816b06c60c738144475c4a 18-Sep-2009 David Teigland <teigland@redhat.com> dlm: fix lowcomms_connect_node for sctp

The recently added dlm_lowcomms_connect_node() from
391fbdc5d527149578490db2f1619951d91f3561 does not work
when using SCTP instead of TCP. The sctp connection code
has nothing to do without data to send. Check for no data
in the sctp connection code and do nothing instead of
triggering a BUG. Also have connect_node() do nothing
when the protocol is sctp.

Signed-off-by: David Teigland <teigland@redhat.com>
1329e3f2c898cfabb6ed236d3fb8c1725197af53 24-Aug-2009 Paolo Bonzini <bonzini@gnu.org> dlm: use kernel_sendpage

Using kernel_sendpage() is cleaner and safer than following
sock->ops ourselves.

Signed-off-by: Paolo Bonzini <bonzini@gnu.org>
Signed-off-by: David Teigland <teigland@redhat.com>
063c4c99630c0b06afad080d2a18bda64172c1a2 11-Aug-2009 Lars Marowsky-Bree <lmb@suse.de> dlm: fix connection close handling

Closing a connection to a node can create problems if there are
outstanding messages for that node. The problems include dlm_send
spinning attempting to reconnect, or BUG from tcp_connect_to_sock()
attempting to use a partially closed connection.

To cleanly close a connection, we now first attempt to send any pending
messages, cancel any remaining workqueue work, and flag the connection
as closed to avoid reconnect attempts.

Signed-off-by: Lars Marowsky-Bree <lmb@suse.de>
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
b5711b8e5a437ca7d35321d19de568b4f76a7739 28-Jul-2009 Casey Dahlin <cdahlin@redhat.com> dlm: fix double-release of socket in error exit path

The last correction to the tcp_connect_to_sock error exit path,
commit a89d63a159b1ba5833be2bef00adf8ad8caac8be, can free an already
freed socket, due to collision with a previous (incomplete) attempt
to fix the same issue, commit 311f6fc77c51926dbdfbeab0a5d88d70f01fa3f4.

Signed-off-by: Casey Dahlin <cdahlin@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
a89d63a159b1ba5833be2bef00adf8ad8caac8be 14-Jul-2009 Casey Dahlin <cdahlin@redhat.com> dlm: free socket in error exit path

In the tcp_connect_to_sock() error exit path, the socket
allocated at the top of the function was not being freed.

Signed-off-by: Casey Dahlin <cdahlin@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
748285ccf7ea76d3d76d0d5f2945ad6fb91f5329 15-May-2009 David Teigland <teigland@redhat.com> dlm: use more NOFS allocation

Change some GFP_KERNEL allocations to use either GFP_NOFS or
ls_allocation (when available) which the fs sets to GFP_NOFS.
The point is to prevent allocations from going back into the
cluster fs in places where that might lead to deadlock.

Signed-off-by: David Teigland <teigland@redhat.com>
391fbdc5d527149578490db2f1619951d91f3561 07-May-2009 Christine Caulfield <ccaulfie@redhat.com> dlm: connect to nodes earlier

Make network connections to other nodes earlier, in the context of
dlm_recoverd. This avoids connecting to nodes from dlm_send where we
try to avoid allocations which could possibly deadlock if memory reclaim
goes into the cluster fs which may try to do a dlm operation.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
5e9ccc372dc855900c4a75b21286038938e288c7 28-Jan-2009 Christine Caulfield <ccaulfie@redhat.com> dlm: replace idr with hash table for connections

Integer nodeids can be too large for the idr code; use a hash
table instead.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2cf12c0bf261e19d9641d7b8aa220e2651a03289 22-Jan-2009 Joe Perches <joe@perches.com> dlm: comment typo fixes

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David Teigland <teigland@redhat.com>
44ad532b3277f0cae55bfe0625d3140cf73af450 22-Jan-2009 Joe Perches <joe@perches.com> dlm: use ipv6_addr_copy

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David Teigland <teigland@redhat.com>
1521848cbb42935a52d11305c054b14461ad061c 13-Nov-2008 Steven Whitehouse <swhiteho@redhat.com> dlm: remove kmap/kunmap

The pages used in lowcomms are not highmem, so kmap is not necessary.

Cc: Christine Caulfield <ccaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
d6d7b702a3a1ca50f7ca2bebaa79c80425156bac 12-Nov-2008 Steven Whitehouse <swhiteho@redhat.com> dlm: fix up memory allocation flags

Use ls_allocation for memory allocations, which a cluster fs sets to
GFP_NOFS. Use GFP_NOFS for allocations when no lockspace struct is
available. Taking dlm locks needs to avoid calling back into the
cluster fs because write-out can require taking dlm locks.

Cc: Christine Caulfield <ccaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
311f6fc77c51926dbdfbeab0a5d88d70f01fa3f4 27-Jun-2008 Masatake YAMATO <yamato@redhat.com> dlm: release socket on error

It seems that `sock' allocated by sock_create_kern in
tcp_connect_to_sock() of dlm/fs/lowcomms.c is not released if
dlm_nodeid_to_addr an error.

Acked-by: Christine Caulfield <ccaulfie@redhat.com>
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
0035a4b14931eb62a5f8a7762284c18e7ab14289 11-May-2008 Marcin Slusarz <marcin.slusarz@gmail.com> dlm: tcp_connect_to_sock should check for -EINVAL, not EINVAL

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Christine Caulfield <ccaulfie@redhat.com>
Cc: David Teigland <teigland@redhat.com>
Cc: cluster-devel@redhat.com
Signed-off-by: David Teigland <teigland@redhat.com>
7a936ce71eed7b887b8a0d6c54dd8a9072f71c9f 12-May-2008 Matthias Kaehlcke <matthias@kaehlcke.net> dlm: convert connections_lock in a mutex

The semaphore connections_lock is used as a mutex. Convert it to the mutex
API.

Signed-off-by: Matthias Kaehlcke <matthias@kaehlcke.net>
Cc: Christine Caulfield <ccaulfie@redhat.com>
Cc: David Teigland <teigland@redhat.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Teigland <teigland@redhat.com>
39bd4177ddbeb4c86e854d3d5c4a6a26088e601e 09-Jan-2008 Patrick Caulfeld <pcaulfie@redhat.com> dlm: close othercons

This patch addresses a problem introduced with the last round of
lowcomms patches where the 'othercon' connections do not get freed when
the DLM shuts down.

This results in the error message
"slab error in kmem_cache_destroy(): cache `dlm_conn': Can't free all
objects"

and the DLM cannot be restarted without a system reboot.

See bz#428119

Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>
Signed-off-by: David Teigland <teigland@redhat.com>
6bd8fedaa16da1e24f38712ee759950d8c5f4f09 26-Oct-2007 Lon Hohberger <lhh@redhat.com> dlm: bind connections from known local address when using TCP

A common problem occurs when multiple IP addresses within the same
subnet are assigned to the same NIC. If we make a connection attempt to
another address on the same subnet as one of those addresses, the
connection attempt will not necessarily be routed from the address we
want.

In the case of the DLM, the other nodes will quickly drop the connection
attempt, causing problems.

This patch makes the DLM bind to the local address it acquired from the
cluster manager when using TCP prior to making a connection, obviating
the need for administrators to "fix" their systems or use clever routing
tricks.

Signed-off-by: Lon Hohberger <lhh@redhat.com>
Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
df61c952622f51facac21dd8dfa4d8a24dcb9657 07-Nov-2007 David S. Miller <davem@sunset.davemloft.net> [DLM] lowcomms: Do not muck with sysctl_rmem_max.

Use SO_RCVBUFFORCE instead.

Signed-off-by: David S. Miller <davem@davemloft.net>
d66f8277f53407754f50ae6bada68f1b68d04d48 14-Sep-2007 Patrick Caulfield <pcaulfie@redhat.com> [DLM] Make dlm_sendd cond_resched more

Under high recovery loads dlm_sendd can monopolise the CPU and cause soft lockups.

This one extra and one moved cond_resched() make it yield a little more during
such times keeping work moving.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
61d96be0f474df354c2ff4a2b2bf410b23a5cd60 20-Aug-2007 Patrick Caulfield <pcaulfie@redhat.com> [DLM] Fix lowcomms socket closing

This patch fixes the slight mess made in lowcomms closing by previous patches
and fixes all sorts of DLM hangs.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
9e5f2825a8b721360b291f14f42cd7a25781156b 02-Aug-2007 Patrick Caulfield <pcaulfie@redhat.com> [DLM] More othercon fixes

The last patch to clean out 'othercon' structures only fixed half the problem.
The attached addresses the other situations too, and fixes bz#238490

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
01c8cab25863de007fe8c598d0033919ea8ae65e 17-Jul-2007 Patrick Caulfield <pcaulfie@redhat.com> [DLM] zero unused parts of sockaddr_storage

When we build a sockaddr_storage for an IP address, clear the unused parts as
they could be used for node comparisons.

I have seen this occasionally make sctp connections fail.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
25720c2d73058f4f929f16093f60817ed52a285c 11-Jul-2007 Patrick Caulfield <pcaulfie@redhat.com> [DLM] Clear othercon pointers when a connection is closed

This patch clears the othercon pointer and frees the memory when a connnection
is closed. This could cause a small memory leak when nodes leave the cluster.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
20c2df83d25c6a95affe6157a4c9cac4cf5ffaac 20-Jul-2007 Paul Mundt <lethal@linux-sh.org> mm: Remove slab destructors from kmem_cache_create().

Slab destructors were no longer supported after Christoph's
c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
f4fadb23ca49abd2f1387a0b7e78b385ebc760ce 27-Jun-2007 akpm@linux-foundation.org <akpm@linux-foundation.org> [GFS2] git-gfs2-nmw-build-fix

Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
97d848365e603def43c69e160937f073bf9cf02e 27-Jun-2007 Patrick Caulfield <pcaulfie@redhat.com> [DLM] Telnet to port 21064 can stop all lockspaces

This patch fixes Red Hat bz#245892

Opening a tcp connection from a cluster member to another cluster member
targeting the dlm port it is enough to stop every dlm operation in the cluster.
This means that GFS and rgmanager will hang.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
afb853fb4eec380b492a3c369f837359359c28e8 01-Jun-2007 Patrick Caulfield <pcaulfie@redhat.com> [DLM] fix socket shutdown

This patch clears the user_data of active sockets as part of cleanup.
This prevents any late-arriving data from trying to add jobs to the work
queue while we are tidying up.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-Off-By: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
617e82e10ccf96a13eb2efd5eac4abef44a87d02 26-Apr-2007 David Teigland <teigland@redhat.com> [DLM] lowcomms style

Replace some printk with log_print, and fix some simple cases of lines
over 80. Also, return -ENOTCONN if lowcomms_start fails due to no local
IP address being available.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
30d3a2373f171e62e4032819f55fed2ec887d0b8 23-Apr-2007 Patrick Caulfield <pcaulfie@redhat.com> [DLM] Lowcomms nodeid range & initialisation fixes

Fix a few range & initialization bugs in lowcomms.
- max_nodeid is really the highest nodeid encountered, so all loops must include
it in their iterations.
- clean dlm_local_count & connection_idr so we can do a clean restart.
- Remove a spurious BUG_ON

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2439fe50724e8693e8b933b3f8125d870bfbdb25 19-Apr-2007 Josef Bacik <jwhiter@redhat.com> [DLM] Fix dlm_lowcoms_stop hang

When you attempt to release a lockspace in DLM, it will hang trying to down a
semaphore that has already been downed. The attached patch fixes the problem.

Signed-off-by: Josef Bacik <jwhiter@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Patrick Caulfield <pcaulfie@redhat.com>
6ed7257b46709e87d79ac2b6b819b7e0c9184998 17-Apr-2007 Patrick Caulfield <pcaulfie@redhat.com> [DLM] Consolidate transport protocols

This patch consolidates the TCP & SCTP protocols for the DLM into a single file
and makes it switchable at run-time (well, at least before the DLM actually
starts up!)

For RHEL5 this patch requires Neil Horman's patch that expands the in-kernel
socket API but that has already been twice ACKed so it should be OK.

The patch adds a new lowcomms.c file that replaces the existing lowcomms-sctp.c
& lowcomms-tcp.c files.

Signed-off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
fdda387f73947e6ae511ec601f5b3c6fbb582aac 02-Nov-2006 Patrick Caulfield <pcaulfie@redhat.com> [DLM] Add support for tcp communications

The following patch adds a TCP based communications layer
to the DLM which is compile time selectable. The existing SCTP
layer gives the advantage of allowing multihoming, whereas
the TCP layer has been heavily tested in previous versions of
the DLM and is known to be robust and therefore can be used as
a baseline for performance testing.

Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
42fb00838a644d03f9a2a5fbbe0b668a5ff5df4d 13-Oct-2006 Patrick Caulfield <pcaulfie@redhat.com> [DLM] fix iovec length in recvmsg

I didn't spot that the msg_iovlen was set to 2 if there
were two elements in the iovec but left at zero if not :(

I think this might be why bob was still seeing trouble.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
4c5e1b1a8c3f591b21f09001d6748296ddff33b8 12-Oct-2006 Patrick Caulfield <pcaulfie@redhat.com> [DLM] fix iovec length in recvmsg

The DLM always passes the iovec length as 1, this is wrong when the circular
buffer wraps round.

Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
38d6fd26ea7f291141039fe340a581dc6f770fc0 09-Oct-2006 Al Viro <viro@ftp.linux.org.uk> [PATCH] dlm gfp_t annotations

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fcc8abc8d4fcdbddc383091449f3696b411aa8fb 10-Aug-2006 David Teigland <teigland@redhat.com> [DLM] move kmap to after spin_unlock

Doing the kmap() while holding the spinlock was causing recursive spinlock
problems. It seems the kmap was scheduling, although there was no warning
as I'd expect. Patrick, do we need locking around the kmap?

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
7d5513d58d072cf38cae9c886653aadac38ef4a9 19-Jun-2006 David Teigland <teigland@redhat.com> [DLM] init rwsem earlier

The nodeinfo_lock rwsem needs to be initialized when the module is loaded
instead of when the dlm is first used.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
47c96298cd0b04b4478206fde55fd6a6431de980 25-May-2006 Steven Whitehouse <swhiteho@redhat.com> [GFS2] Change name due to local_nodeid being a macro

Change names of local_nodeid to dlm_local_nodeid to prevent a
namespace collision. Changed other local variable to match.

Cc: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
1c032c03117c014512195f2e33c3af999f132146 28-Apr-2006 David Teigland <teigland@redhat.com> [DLM] PATCH 2/3 dlm: lowcomms close

When a node is removed from a lockspace configuration, close our
connection to it, clearing any remaining messages for it.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
e7fd41792fc0ee52a05fcaac87511f118328d147 18-Jan-2006 David Teigland <teigland@redhat.com> [DLM] The core of the DLM for GFS2/CLVM

This is the core of the distributed lock manager which is required
to use GFS2 as a cluster filesystem. It is also used by CLVM and
can be used as a standalone lock manager independantly of either
of these two projects.

It implements VAX-style locking modes.

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steve Whitehouse <swhiteho@redhat.com>