History log of /net/sunrpc/xprtsock.c
Revision Date Author Comments
1aff52562939485e503936e17934be077ffaea53 24-Sep-2014 NeilBrown <neilb@suse.de> NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page()

Now that nfs_release_page() doesn't block indefinitely, other deadlock
avoidance mechanisms aren't needed.
- it doesn't hurt for kswapd to block occasionally. If it doesn't
want to block it would clear __GFP_WAIT. The current_is_kswapd()
was only added to avoid deadlocks and we have a new approach for
that.
- memory allocation in the SUNRPC layer can very rarely try to
->releasepage() a page it is trying to handle. The deadlock
is removed as nfs_release_page() doesn't block indefinitely.

So we don't need to set PF_FSTRANS for sunrpc network operations any
more.

Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
3dedbb5ca10ef13f25055776d2f6d9499d9ca1ba 24-Sep-2014 Jason Baron <jbaron@akamai.com> rpc: Add -EPERM processing for xs_udp_send_request()

If an iptables drop rule is added for an nfs server, the client can end up in
a softlockup. Because of the way that xs_sendpages() is structured, the -EPERM
is ignored since the prior bits of the packet may have been successfully queued
and thus xs_sendpages() returns a non-zero value. Then, xs_udp_send_request()
thinks that because some bits were queued it should return -EAGAIN. We then try
the request again and again, resulting in cpu spinning. Reproducer:

1) open a file on the nfs server '/nfs/foo' (mounted using udp)
2) iptables -A OUTPUT -d <nfs server ip> -j DROP
3) write to /nfs/foo
4) close /nfs/foo
5) iptables -D OUTPUT -d <nfs server ip> -j DROP

The softlockup occurs in step 4 above.

The previous patch, allows xs_sendpages() to return both a sent count and
any error values that may have occurred. Thus, if we get an -EPERM, return
that to the higher level code.

With this patch in place we can successfully abort the above sequence and
avoid the softlockup.

I also tried the above test case on an nfs mount on tcp and although the system
does not softlockup, I still ended up with the 'hung_task' firing after 120
seconds, due to the i/o being stuck. The tcp case appears a bit harder to fix,
since -EPERM appears to get ignored much lower down in the stack and does not
propogate up to xs_sendpages(). This case is not quite as insidious as the
softlockup and it is not addressed here.

Reported-by: Yigong Lou <ylou@akamai.com>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
f279cd008fc9742f5ec294d9b8a793a7a0b163ef 24-Sep-2014 Jason Baron <jbaron@akamai.com> rpc: return sent and err from xs_sendpages()

If an error is returned after the first bits of a packet have already been
successfully queued, xs_sendpages() will return a positive 'int' value
indicating success. Callers seem to treat this as -EAGAIN.

However, there are cases where its not a question of waiting for the write
queue to drain. For example, when there is an iptables rule dropping packets
to the destination, the lower level code can return -EPERM only after parts
of the packet have been successfully queued. In this case, we can end up
continuously retrying resulting in a kernel softlockup.

This patch is intended to make no changes in behavior but is in preparation for
subsequent patches that can make decisions based on both on the number of bytes
sent by xs_sendpages() and any errors that may have be returned.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
a743419f420a64d442280845c0377a915b76644f 23-Sep-2014 Benjamin Coddington <bcodding@redhat.com> SUNRPC: Don't wake tasks during connection abort

When aborting a connection to preserve source ports, don't wake the task in
xs_error_report. This allows tasks with RPC_TASK_SOFTCONN to succeed if the
connection needs to be re-established since it preserves the task's status
instead of setting it to the status of the aborting kernel_connect().

This may also avoid a potential conflict on the socket's lock.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
0f7a622ca61621f951af01448b956f2ecf5fad99 05-Sep-2014 Chris Perl <chris.perl@gmail.com> rpc: xs_bind - do not bind when requesting a random ephemeral port

When attempting to establish a local ephemeral endpoint for a TCP or UDP
socket, do not explicitly call bind, instead let it happen implicilty when the
socket is first used.

The main motivating factor for this change is when TCP runs out of unique
ephemeral ports (i.e. cannot find any ephemeral ports which are not a part of
*any* TCP connection). In this situation if you explicitly call bind, then the
call will fail with EADDRINUSE. However, if you allow the allocation of an
ephemeral port to happen implicitly as part of connect (or other functions),
then ephemeral ports can be reused, so long as the combination of (local_ip,
local_port, remote_ip, remote_port) is unique for TCP sockets on the system.

This doesn't matter for UDP sockets, but it seemed easiest to treat TCP and UDP
sockets the same.

This can allow mount.nfs(8) to continue to function successfully, even in the
face of misbehaving applications which are creating a large number of TCP
connections.

Signed-off-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
00cfaa943ec30abbc7109b0b918e0b6a0eef07dc 21-Jun-2014 Daniel Walter <dwalter@google.com> replace strict_strto calls

Replace obsolete strict_strto calls with appropriate kstrto calls

Signed-off-by: Daniel Walter <dwalter@google.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
3601c4a91ebbbf1cf69f66a2abeffc6c64a4fe64 30-Jun-2014 Trond Myklebust <trond.myklebust@primarydata.com> SUNRPC: Ensure that we handle ENOBUFS errors correctly.

Currently, an ENOBUFS error will result in a fatal error for the RPC
call. Normally, we will just want to wait and then retry.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
0f8066bd48785058c339061fef05be0dcfa8dc08 23-May-2014 Tom Herbert <therbert@google.com> sunrpc: Remove sk_no_check setting

Setting sk_no_check to UDP_CSUM_NORCV seems to have no effect.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4e857c58efeb99393cba5a5d0d8ec7117183137c 17-Mar-2014 Peter Zijlstra <peterz@infradead.org> arch: Mass conversion of smp_mb__*()

Mostly scripted conversion of the smp_mb__* barriers.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
676d23690fb62b5d51ba5d659935e9f7d9da9f8e 11-Apr-2014 David S. Miller <davem@davemloft.net> net: Fix use after free by removing length arg from sk_data_ready callbacks.

Several spots in the kernel perform a sequence like:

skb_queue_tail(&sk->s_receive_queue, skb);
sk->sk_data_ready(sk, skb->len);

But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up. So this skb->len access is potentially
to freed up memory.

Furthermore, the skb->len can be modified by the consumer so it is
possible that the value isn't accurate.

And finally, no actual implementation of this callback actually uses
the length argument. And since nobody actually cared about it's
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.

So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.

Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.

Signed-off-by: David S. Miller <davem@davemloft.net>
642aab58db209da990dc11b966933afc622ac4c5 23-Mar-2014 Kinglong Mee <kinglongmee@gmail.com> SUNRPC: Clear xpt_bc_xprt if xs_setup_bc_tcp failed

Don't move the assign of args->bc_xprt->xpt_bc_xprt out of xs_setup_bc_tcp,
because rpc_ping (which is in rpc_create) will using it.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
d531c008d7d9713456abe3d265fc577bba2e1cef 24-Mar-2014 Kinglong Mee <kinglongmee@gmail.com> NFSD/SUNRPC: Check rpc_xprt out of xs_setup_bc_tcp

Besides checking rpc_xprt out of xs_setup_bc_tcp,
increase it's reference (it's important).

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
47f72efa8f32e8182cd4a70d5a9a6d07651093fc 24-Mar-2014 Kinglong Mee <kinglongmee@gmail.com> NFSD: Free backchannel xprt in bc_destroy

Backchannel xprt isn't freed right now.
Free it in bc_destroy, and put the reference of THIS_MODULE.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
315f3812dbd92c7c8f26a8dbba183266ec219795 24-Mar-2014 Kinglong Mee <kinglongmee@gmail.com> SUNRPC: fix memory leak of peer addresses in XPRT

Creating xprt failed after xs_format_peer_addresses,
sunrpc must free those memory of peer addresses in xprt.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2ea24497a1b30dd03dd42b873fa5097913587f4d 10-Feb-2014 Trond Myklebust <trond.myklebust@primarydata.com> SUNRPC: RPC callbacks may be split across several TCP segments

Since TCP is a stream protocol, our callback read code needs to take into
account the fact that RPC callbacks are not always confined to a single
TCP segment.
This patch adds support for multiple TCP segments by ensuring that we
only remove the rpc_rqst structure from the 'free backchannel requests'
list once the data has been completely received. We rely on the fact
that TCP data is ordered for the duration of the connection.

Reported-by: shaobingqing <shaobingqing@bwstor.com.cn>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
06ea0bfe6e6043cb56a78935a19f6f8ebc636226 11-Feb-2014 Trond Myklebust <trond.myklebust@primarydata.com> SUNRPC: Fix races in xs_nospace()

When a send failure occurs due to the socket being out of buffer space,
we call xs_nospace() in order to have the RPC task wait until the
socket has drained enough to make it worth while trying again.
The current patch fixes a race in which the socket is drained before
we get round to setting up the machinery in xs_nospace(), and which
is reported to cause hangs.

Link: http://lkml.kernel.org/r/20140210170315.33dfc621@notabene.brown
Fixes: a9a6b52ee1ba (SUNRPC: Don't start the retransmission timer...)
Reported-by: Neil Brown <neilb@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
63862b5bef7349dd1137e4c70702c67d77565785 11-Jan-2014 Aruna-Hewapathirane <aruna.hewapathirane@gmail.com> net: replace macros net_random and net_srandom with direct calls to prandom

This patch removes the net_random and net_srandom macros and replaces
them with direct calls to the prandom ones. As new commits only seem to
use prandom_u32 there is no use to keep them around.
This change makes it easier to grep for users of prandom_u32.

Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
e8353c7682875329b8e70999e1652fd1bba8973d 31-Dec-2013 Trond Myklebust <trond.myklebust@primarydata.com> SUNRPC: Add tracepoint for socket errors

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2118071d3b0d57a03fad77885f4fdc364798aa87 31-Dec-2013 Trond Myklebust <trond.myklebust@primarydata.com> SUNRPC: Report connection error values to rpc_tasks on the pending queue

Currently we only report EAGAIN, which is not descriptive enough for
softconn tasks.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
28303ca3090c0aa0dbbb72714c51fceb4b939f6d 30-Nov-2013 Weng Meiling <wengmeiling.weng@huawei.com> sunrpc: fix some typos

Signed-off-by: Weng Meiling <wengmeiling.weng@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
f06c3d2bb8e9e6d697ca059556d8de0647e2905d 17-Sep-2013 J. Bruce Fields <bfields@redhat.com> sunrpc: comment typo fix

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
a6b31d18b02ff9d7915c5898c9b5ca41a798cd73 08-Nov-2013 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fix a data corruption issue when retransmitting RPC calls

The following scenario can cause silent data corruption when doing
NFS writes. It has mainly been observed when doing database writes
using O_DIRECT.

1) The RPC client uses sendpage() to do zero-copy of the page data.
2) Due to networking issues, the reply from the server is delayed,
and so the RPC client times out.

3) The client issues a second sendpage of the page data as part of
an RPC call retransmission.

4) The reply to the first transmission arrives from the server
_before_ the client hardware has emptied the TCP socket send
buffer.
5) After processing the reply, the RPC state machine rules that
the call to be done, and triggers the completion callbacks.
6) The application notices the RPC call is done, and reuses the
pages to store something else (e.g. a new write).

7) The client NIC drains the TCP socket send buffer. Since the
page data has now changed, it reads a corrupted version of the
initial RPC call, and puts it on the wire.

This patch fixes the problem in the following manner:

The ordering guarantees of TCP ensure that when the server sends a
reply, then we know that the _first_ transmission has completed. Using
zero-copy in that situation is therefore safe.
If a time out occurs, we then send the retransmission using sendmsg()
(i.e. no zero-copy), We then know that the socket contains a full copy of
the data, and so it will retransmit a faithful reproduction even if the
RPC call completes, and the application reuses the O_DIRECT buffer in
the meantime.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
a1311d87fa034e0de580e3a65f2d5c2e7a1f55f3 31-Oct-2013 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Cleanup xs_destroy()

There is no longer any need for a separate xs_local_destroy() helper.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
93dc41bdc5c853916610576c6b48a1704959c70d 31-Oct-2013 NeilBrown <neilb@suse.de> SUNRPC: close a rare race in xs_tcp_setup_socket.

We have one report of a crash in xs_tcp_setup_socket.
The call path to the crash is:

xs_tcp_setup_socket -> inet_stream_connect -> lock_sock_nested.

The 'sock' passed to that last function is NULL.

The only way I can see this happening is a concurrent call to
xs_close:

xs_close -> xs_reset_transport -> sock_release -> inet_release

inet_release sets:
sock->sk = NULL;
inet_stream_connect calls
lock_sock(sock->sk);
which gets NULL.

All calls to xs_close are protected by XPRT_LOCKED as are most
activations of the workqueue which runs xs_tcp_setup_socket.
The exception is xs_tcp_schedule_linger_timeout.

So presumably the timeout queued by the later fires exactly when some
other code runs xs_close().

To protect against this we can move the cancel_delayed_work_sync()
call from xs_destory() to xs_close().

As xs_close is never called from the worker scheduled on
->connect_worker, this can never deadlock.

Signed-off-by: NeilBrown <neilb@suse.de>
[Trond: Make it safe to call cancel_delayed_work_sync() on AF_LOCAL sockets]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
e3bfab18483a26649ff9cb605aa1410386b1e498 02-Oct-2013 J. Bruce Fields <bfields@redhat.com> sunrpc: comment typo fix

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
8b71798c0d389d4cadc884fc7d68c61ee8cd4f45 26-Sep-2013 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Only update the TCP connect cookie on a successful connect

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
7f260e8575bf53b93b77978c1e39f8e67612759c 24-Sep-2013 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Enable the keepalive option for TCP sockets

For NFSv4 we want to avoid retransmitting RPC calls unless the TCP
connection breaks. However we still want to detect TCP connection
breakage as soon as possible. Do this by setting the keepalive option
with the idle timeout and count set to the 'timeo' and 'retrans' mount
options.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
40b5ea0c25669cb99ba7f4836437a7ebaba91408 04-Sep-2013 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Add tracepoints to help debug socket connection issues

Add client side debugging to help trace socket connection/disconnection
and unexpected state change issues.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
64dc61306ce7da370833289739e2f52dfc6b37ba 23-Jul-2013 Eric Dumazet <edumazet@google.com> net: add sk_stream_is_writeable() helper

Several call sites use the hardcoded following condition :

sk_stream_wspace(sk) >= sk_stream_min_wspace(sk)

Lets use a helper because TCP_NOTSENT_LOWAT support will change this
condition for TCP sockets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fe2c6338fd2c6f383c4d4164262f35c8f3708e1f 12-Jun-2013 Joe Perches <joe@perches.com> net: Convert uses of typedef ctl_table to struct ctl_table

Reduce the uses of this unnecessary typedef.

Done via perl script:

$ git grep --name-only -w ctl_table net | \
xargs perl -p -i -e '\
sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \
s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge'

Reflow the modified lines that now exceed 80 columns.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2fccbd9cc0fdca649b01f1e2d96e5ef85256341a 24-Sep-2012 J. Bruce Fields <bfields@redhat.com> sunrpc: server back channel needs no rpcbind method

XPRT_BOUND is set on server backchannel xprts by xs_setup_bc_tcp()
(using xprt_set_bound()), and is never cleared, so ->rpcbind() will
never need to be called.

Reported-by: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7073ea8799a8cf73db60270986f14e4aae20fa80 21-Feb-2013 J. Bruce Fields <bfields@redhat.com> SUNRPC: attempt AF_LOCAL connect on setup

In the gss-proxy case, setup time is when I know I'll have the right
namespace for the connect.

In other cases, it might be useful to get any connection errors
earlier--though actually in practice it doesn't make any difference for
rpcbind.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
b7993cebb841b0da7a33e9d5ce301a9fd3209165 14-Apr-2013 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Allow rpc_create() to request that TCP slots be unlimited

This is mainly for use by NFSv4.1, where the session negotiation
ultimately wants to decide how many RPC slots we can fill.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
3ed5e2a2c394df4e03a680842c2d07a8680f133b 04-Mar-2013 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Report network/connection errors correctly for SOFTCONN rpc tasks

In the case of a SOFTCONN rpc task, we really want to ensure that it
reports errors like ENETUNREACH back to the caller. Currently, only
some of these errors are being reported back (connect errors are not),
and they are being converted by the RPC layer into EIO.

Reported-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
190b1ecf257be308f0053c371fa7afa1ba5f4932 08-Mar-2013 J. Bruce Fields <bfields@redhat.com> sunrpc: don't attempt to cancel unitialized work

As of dc107402ae06286a9ed33c32daf3f35514a7cb8d "SUNRPC: make AF_LOCAL connect synchronous", we no longer initialize connect_worker in the
AF_LOCAL case, resulting in warnings like:

WARNING: at lib/debugobjects.c:261 debug_print_object+0x8c/0xb0() Hardware name: Bochs
ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfsd auth_rpcgss nfs_acl lockd sunrpc
Pid: 4816, comm: nfsd Tainted: G W 3.8.0-rc2-00049-gdc10740 #801
Call Trace:
[<ffffffff8156ec00>] ? free_obj_work+0x60/0xa0
[<ffffffff81046aaf>] warn_slowpath_common+0x7f/0xc0
[<ffffffff81046ba6>] warn_slowpath_fmt+0x46/0x50
[<ffffffff8156eccc>] debug_print_object+0x8c/0xb0
[<ffffffff81055030>] ? timer_debug_hint+0x10/0x10
[<ffffffff8156f7e3>] debug_object_assert_init+0xe3/0x120
[<ffffffff81057ebb>] del_timer+0x2b/0x80
[<ffffffff8109c4e6>] ? mark_held_locks+0x86/0x110
[<ffffffff81065a29>] try_to_grab_pending+0xd9/0x150
[<ffffffff81065b57>] __cancel_work_timer+0x27/0xc0
[<ffffffff81065c03>] cancel_delayed_work_sync+0x13/0x20
[<ffffffffa0007067>] xs_destroy+0x27/0x80 [sunrpc]
[<ffffffffa00040d8>] xprt_destroy+0x78/0xa0 [sunrpc]
[<ffffffffa0006241>] xprt_put+0x21/0x30 [sunrpc]
[<ffffffffa00030cf>] rpc_free_client+0x10f/0x1a0 [sunrpc]
[<ffffffffa0002ff3>] ? rpc_free_client+0x33/0x1a0 [sunrpc]
[<ffffffffa0002f7e>] rpc_release_client+0x6e/0xb0 [sunrpc]
[<ffffffffa000325d>] rpc_shutdown_client+0xfd/0x1b0 [sunrpc]
[<ffffffffa0017196>] rpcb_put_local+0x106/0x130 [sunrpc]
...

Acked-by: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
dc107402ae06286a9ed33c32daf3f35514a7cb8d 20-Feb-2013 J. Bruce Fields <bfields@redhat.com> SUNRPC: make AF_LOCAL connect synchronous

It doesn't appear that anyone actually needs to connect asynchronously.

Also, using a workqueue for the connect means we lose the namespace
information from the original process. This is a problem since there's
no way to explicitly pass in a filesystem namespace for resolution of an
AF_LOCAL address.

Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
5976687a2b3d1969f02aba16b80ad3ed79be6ad3 04-Feb-2013 Jeff Layton <jlayton@redhat.com> sunrpc: move address copy/cmp/convert routines and prototypes from clnt.h to addr.h

These routines are used by server and client code, so having them in a
separate header would be best.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
6a24dfb645dbcb05b34d08b991d082bdaa3ff072 08-Jan-2013 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Pass pointers to struct rpc_xprt to the congestion window

Avoid access to task->tk_xprt

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
3dc0da278e2b26fa8e353b3a962b2c89e184d353 08-Jan-2013 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fix an RCU dereference in xs_local_rpcbind

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
1b092092bf0e2e8b7af1c2a03f615b4e60b05d47 08-Jan-2013 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Pass a pointer to struct rpc_xprt to the connect callback

Avoid another RCU dereference by passing the pointer to struct rpc_xprt
from the caller.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
a4f0835c604f80f945ab3e72ffd00547145c4b2b 08-Jan-2013 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Eliminate task->tk_xprt accesses that bypass rcu_dereference()

tk_xprt is just a shortcut for tk_client->cl_xprt, however cl_xprt is
defined as an __rcu variable. Replace dereferences of tk_xprt with
non-rcu dereferences where it is safe to do so.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
1efc28780bf84f04dcce4f3f7242e4e6a59bffd4 15-Dec-2012 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: variable 'svsk' is unused in function bc_send_request

Silence a compile time warning.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
4a20a988f732caa343257346ebf145ae8fa437e1 15-Dec-2012 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Handle ECONNREFUSED in xs_local_setup_socket

Silence the unnecessary warning "unhandled error (111) connecting to..."
and convert it to a dprintk for debugging purposes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
b8a13d039cbf7aec3b486fc0ae3996a5449afed2 23-Oct-2012 Weston Andros Adamson <dros@netapp.com> SUNRPC: remove BUG_ON from bc_malloc

Replace BUG_ON() with WARN_ON_ONCE() and NULL return - the caller will handle
this like a memory allocation failure.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
1b7a1819078c68c4df4bba90f76b8113a08460de 23-Oct-2012 Weston Andros Adamson <dros@netapp.com> SUNRPC: remove BUG_ONs from *_reclassify_socket*

Replace multiple BUG_ON() calls with WARN_ON_ONCE() and early return when
sanity checking socket ownership (lock). The bind call will fail if the
socket was unsuccessfully reclassified.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
f878b657ce8e7d3673afe48110ec208a29e38c4a 22-Oct-2012 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Get rid of the xs_error_report socket callback

Chris Perl reports that we're seeing races between the wakeup call in
xs_error_report and the connect attempts. Basically, Chris has shown
that in certain circumstances, the call to xs_error_report causes the
rpc_task that is responsible for reconnecting to wake up early, thus
triggering a disconnect and retry.

Since the sk->sk_error_report() calls in the socket layer are always
followed by a tcp_done() in the cases where we care about waking up
the rpc_tasks, just let the state_change callbacks take responsibility
for those wake ups.

Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Tested-by: Chris Perl <chris.perl@gmail.com>
4bc1e68ed6a8b59be8a79eb719be515a55c7bc68 23-Oct-2012 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Prevent races in xs_abort_connection()

The call to xprt_disconnect_done() that is triggered by a successful
connection reset will trigger another automatic wakeup of all tasks
on the xprt->pending rpc_wait_queue. In particular it will cause an
early wake up of the task that called xprt_connect().

All we really want to do here is clear all the socket-specific state
flags, so we split that functionality out of xs_sock_mark_closed()
into a helper that can be called by xs_abort_connection()

Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Tested-by: Chris Perl <chris.perl@gmail.com>
b9d2bb2ee537424a7f855e1f93eed44eb9ee0854 23-Oct-2012 Trond Myklebust <Trond.Myklebust@netapp.com> Revert "SUNRPC: Ensure we close the socket on EPIPE errors too..."

This reverts commit 55420c24a0d4d1fce70ca713f84aa00b6b74a70e.
Now that we clear the connected flag when entering TCP_CLOSE_WAIT,
the deadlock described in this commit is no longer possible.
Instead, the resulting call to xs_tcp_shutdown() can interfere
with pending reconnection attempts.

Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Tested-by: Chris Perl <chris.perl@gmail.com>
d0bea455dd48da1ecbd04fedf00eb89437455fdc 23-Oct-2012 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Clear the connect flag when socket state is TCP_CLOSE_WAIT

This is needed to ensure that we call xprt_connect() upon the next
call to call_connect().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Tested-by: Chris Perl <chris.perl@gmail.com>
d19751e7b9bd8a01d00372325439589886674f79 11-Sep-2012 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Get rid of the redundant xprt->shutdown bit field

It is only set after everyone has dereferenced the transport,
and serves no useful purpose: setting it is racy, so all the
socket code, etc still needs to be able to cope with the cases
where they miss reading it.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
84e28a307e376f271505af65a7b7e212dd6f61f4 24-Sep-2012 Bryan Schumaker <bjschuma@netapp.com> SUNRPC: Set alloc_slot for backchannel tcp ops

f39c1bfb5a03e2d255451bff05be0d7255298fa4 (SUNRPC: Fix a UDP transport
regression) introduced the "alloc_slot" function for xprt operations,
but never created one for the backchannel operations. This patch fixes
a null pointer dereference when mounting NFS over v4.1.

Call Trace:
[<ffffffffa0207957>] ? xprt_reserve+0x47/0x50 [sunrpc]
[<ffffffffa02023a4>] call_reserve+0x34/0x60 [sunrpc]
[<ffffffffa020e280>] __rpc_execute+0x90/0x400 [sunrpc]
[<ffffffffa020e61a>] rpc_async_schedule+0x2a/0x40 [sunrpc]
[<ffffffff81073589>] process_one_work+0x139/0x500
[<ffffffff81070e70>] ? alloc_worker+0x70/0x70
[<ffffffffa020e5f0>] ? __rpc_execute+0x400/0x400 [sunrpc]
[<ffffffff81073d1e>] worker_thread+0x15e/0x460
[<ffffffff8145c839>] ? preempt_schedule+0x49/0x70
[<ffffffff81073bc0>] ? rescuer_thread+0x230/0x230
[<ffffffff81079603>] kthread+0x93/0xa0
[<ffffffff81465d04>] kernel_thread_helper+0x4/0x10
[<ffffffff81079570>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff81465d00>] ? gs_change+0x13/0x13

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
a519fc7a70d1a918574bb826cc6905b87b482eb9 12-Sep-2012 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Ensure that the TCP socket is closed when in CLOSE_WAIT

Instead of doing a shutdown() call, we need to do an actual close().
Ditto if/when the server is sending us junk RPC headers.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Simon Kirby <sim@hostway.ca>
Cc: stable@vger.kernel.org
f39c1bfb5a03e2d255451bff05be0d7255298fa4 07-Sep-2012 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fix a UDP transport regression

Commit 43cedbf0e8dfb9c5610eb7985d5f21263e313802 (SUNRPC: Ensure that
we grab the XPRT_LOCK before calling xprt_alloc_slot) is causing
hangs in the case of NFS over UDP mounts.

Since neither the UDP or the RDMA transport mechanism use dynamic slot
allocation, we can skip grabbing the socket lock for those transports.
Add a new rpc_xprt_op to allow switching between the TCP and UDP/RDMA
case.

Note that the NFSv4.1 back channel assigns the slot directly
through rpc_run_bc_task, so we can ignore that case.

Reported-by: Dick Streefland <dick.streefland@altium.nl>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.1]
a564b8f0398636ba30b07c0eaebdef7ff7837249 01-Aug-2012 Mel Gorman <mgorman@suse.de> nfs: enable swap on NFS

Implement the new swapfile a_ops for NFS and hook up ->direct_IO. This
will set the NFS socket to SOCK_MEMALLOC and run socket reconnect under
PF_MEMALLOC as well as reset SOCK_MEMALLOC before engaging the protocol
->connect() method.

PF_MEMALLOC should allow the allocation of struct socket and related
objects and the early (re)setting of SOCK_MEMALLOC should allow us to
receive the packets required for the TCP connection buildup.

[jlayton@redhat.com: Restore PF_MEMALLOC task flags in all cases]
[dfeng@redhat.com: Fix handling of multiple swap files]
[a.p.zijlstra@chello.nl: Original patch]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5cf02d09b50b1ee1c2d536c9cf64af5a7d433f56 23-Jul-2012 Jeff Layton <jlayton@redhat.com> nfs: skip commit in releasepage if we're freeing memory for fs-related reasons

We've had some reports of a deadlock where rpciod ends up with a stack
trace like this:

PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14"
#0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
#1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
#2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
#3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
#4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
#5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
#6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
#7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
#8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
#9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
#10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
#11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
#12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
#13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
#14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
#15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
#16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
#17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
#18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
#19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
#20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
#21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
#22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
#23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
#24 [ffff8810343bfee8] kthread at ffffffff8108dd96
#25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

rpciod is trying to allocate memory for a new socket to talk to the
server. The VM ends up calling ->releasepage to get more memory, and it
tries to do a blocking commit. That commit can't succeed however without
a connected socket, so we deadlock.

Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
socket allocation, and having nfs_release_page check for that flag when
deciding whether to do a commit call. Also, set PF_FSTRANS
unconditionally in rpc_async_schedule since that function can also do
allocations sometimes.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
60d354ebebd9d0f760cb6c3b9f53a7ade0f8cd0e 02-Jul-2012 David S. Miller <davem@davemloft.net> sunrpc: Don't do a dst_confirm() on an input routes.

xs_udp_data_ready() is operating on received packets, and tries to
do a dst_confirm() on the dst attached to the SKB.

This isn't right, dst confirmation is for output routes, not input
rights. It's for resetting the timers on the nexthop neighbour entry
for the route, indicating that we've got good evidence that we've
successfully reached it.

Signed-off-by: David S. Miller <davem@davemloft.net>
92769108f5382a0bdb4c35eb80c183fb7797cfae 23-Mar-2012 J. Bruce Fields <bfields@redhat.com> sunrpc: skip portmap calls on sessions backchannel

There's obviously no point to doing portmap calls over the sessions
backchannel.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
09acfea5d8de419ebe84be43b08f7b79c965215f 11-Mar-2012 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fix a few sparse warnings

net/sunrpc/svcsock.c:412:22: warning: incorrect type in assignment
(different address spaces)
- svc_partial_recvfrom now takes a struct kvec, so the variable
save_iovbase needs to be an ordinary (void *)

Make a bunch of variables in net/sunrpc/xprtsock.c static

Fix a couple of "warning: symbol 'foo' was not declared. Should it be
static?" reports.

Fix a couple of conflicting function declarations.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
15a4520621824a3c2eb2de2d1f3984bc1663d3c8 14-Feb-2012 Andy Adamson <andros@netapp.com> SUNRPC: add sending,pending queue and max slot to xprt stats

With static RPC slots, the xprt backlog queue stats were useful in showing
when the transport (TCP) was starved by lack of RPC slots. The new dynamic
RPC slot code, commit d9ba131d8f58c0d2ff5029e7002ab43f913b36f9, always
provides an RPC slot and so only uses the xprt backlog queue when the
tcp_max_slot_table_entries value has been hit or when an allocation error
occurs. All requests are now placed on the xprt sending or pending queue which
need to be monitored for debugging.

The max_slot stat shows the maximum number of dynamic RPC slots reached which is
useful when debugging performance issues.

Add the new fields at the end of the mountstats xprt stanza so that mountstats
outputs the previous correct values and ignores the new fields. Bump
NFS_IOSTATS_VERS.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
24ca9a847791fd53d9b217330b15f3c285827a18 22-Nov-2011 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Ensure we return EAGAIN in xs_nospace if congestion is cleared

By returning '0' instead of 'EAGAIN' when the tests in xs_nospace() fail
to find evidence of socket congestion, we are making the RPC engine believe
that the message was incorrectly sent and so it disconnects the socket
instead of just retrying.

The bug appears to have been introduced by commit
5e3771ce2d6a69e10fcc870cdf226d121d868491 (SUNRPC: Ensure that xs_nospace
return values are propagated).

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 2.6.30]
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
2aa13531bbbc6582874bedfcd853e1058b0fb4f9 10-Nov-2011 Stanislav Kinsbursky <skinsbursky@parallels.com> SUNRPC: destroy freshly allocated transport in case of sockaddr init error

Otherwise we will leak xprt structure and struct net reference.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
d9ba131d8f58c0d2ff5029e7002ab43f913b36f9 18-Jul-2011 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Support dynamic slot allocation for TCP connections

Allow the number of available slots to grow with the TCP window size.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
43cedbf0e8dfb9c5610eb7985d5f21263e313802 17-Jul-2011 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Ensure that we grab the XPRT_LOCK before calling xprt_alloc_slot

This throttles the allocation of new slots when the socket is busy
reconnecting and/or is out of buffer space.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
9e00abc3c20904fd6a5d888bb7023925799ec8a5 14-Jul-2011 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: sunrpc should not explicitly depend on NFS config options

Change explicit references to CONFIG_NFS_V4_1 to implicit ones
Get rid of the unnecessary defines in backchannel_rqst.c and
bc_svc.c: the Makefile takes care of those dependency.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
176e21ee2ec89cae8d45cf1a850ea45a45428fb8 09-May-2011 Chuck Lever <chuck.lever@ORACLE.COM> SUNRPC: Support for RPC over AF_LOCAL transports

TI-RPC introduces the capability of performing RPC over AF_LOCAL
sockets. It uses this mainly for registering and unregistering
local RPC services securely with the local rpcbind, but we could
also conceivably use it as a generic upcall mechanism.

This patch provides a client-side only implementation for the moment.
We might also consider a server-side implementation to provide
AF_LOCAL access to NLM (for statd downcalls, and such like).

Autobinding is not supported on kernel AF_LOCAL transports at this
time. Kernel ULPs must specify the pathname of the remote endpoint
when an AF_LOCAL transport is created. rpcbind supports registering
services available via AF_LOCAL, so the kernel could handle it with
some adjustment to ->rpcbind and ->set_port. But we don't need this
feature for doing upcalls via well-known named sockets.

This has not been tested with ULPs that move a substantial amount of
data. Thus, I can't attest to how robust the write_space and
congestion management logic is.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
61677eeec29e87edc03a1061ae0a04b92507450d 09-May-2011 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Rename xs_encode_tcp_fragment_header()

Clean up: Use a more generic name for xs_encode_tcp_fragment_header();
it's appropriate to use for all stream transport types. We're about
to add new stream transport.

Also, move it to a place where it is more easily shared amongst the
various send_request methods. And finally, replace the "htonl" macro
invocation with its modern equivalent.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
fe19a96b10032035a35779f42ad59e35d6dd8ffd 19-Mar-2011 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Deal with the lack of a SYN_SENT sk->sk_state_change callback...

The TCP connection state code depends on the state_change() callback
being called when the SYN_SENT state is set. However the networking layer
doesn't actually call us back in that case.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
25985edcedea6396277003854657b5f3cb31a628 31-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi> Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
246408dcd5dfeef2df437ccb0ef4d6ee87805f58 22-Mar-2011 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Never reuse the socket port after an xs_close()

If we call xs_close(), we're in one of two situations:
- Autoclose, which means we don't expect to resend a request
- bind+connect failed, which probably means the port is in use

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
4cea288aaf0e11647880cc487350b1dc45d9febc 22-Feb-2011 Ben Hutchings <bhutchings@solarflare.com> sunrpc: Propagate errors from xs_bind() through xs_create_sock()

xs_create_sock() is supposed to return a pointer or an ERR_PTR-encoded
error, but it currently returns 0 if xs_bind() fails.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Cc: stable@kernel.org [v2.6.37]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
f0418aa4b1103f959d64dc18273efa04ee0140e9 08-Dec-2010 J. Bruce Fields <bfields@redhat.com> rpc: allow xprt_class->setup to return a preexisting xprt

This allows us to reuse the xprt associated with a server connection if
one has already been set up.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
99de8ea962bbc11a51ad4c52e3dc93bee5f6ba70 08-Dec-2010 J. Bruce Fields <bfields@redhat.com> rpc: keep backchannel xprt as long as server connection

Multiple backchannels can share the same tcp connection; from rfc 5661 section
2.10.3.1:

A connection's association with a session is not exclusive. A
connection associated with the channel(s) of one session may be
simultaneously associated with the channel(s) of other sessions
including sessions associated with other client IDs.

However, multiple backchannels share a connection, they must all share
the same xid stream (hence the same rpc_xprt); the only way we have to
match replies with calls at the rpc layer is using the xid.

So, keep the rpc_xprt around as long as the connection lasts, in case
we're asked to use the connection as a backchannel again.

Requests to create new backchannel clients over a given server
connection should results in creating new clients that reuse the
existing rpc_xprt.

But to start, just reject attempts to associate multiple rpc_xprt's with
the same underlying bc_xprt.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
d75faea330dbd1873c9094e9926ae306590c0998 01-Dec-2010 J. Bruce Fields <bfields@redhat.com> rpc: move sk_bc_xprt to svc_xprt

This seems obviously transport-level information even if it's currently
used only by the server socket code.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
afe2c511fb2d75f1515081ff1be15bd79cfe722d 14-Dec-2010 Tejun Heo <tj@kernel.org> workqueue: convert cancel_rearming_delayed_work[queue]() users to cancel_delayed_work_sync()

cancel_rearming_delayed_work[queue]() has been superceded by
cancel_delayed_work_sync() quite some time ago. Convert all the
in-kernel users. The conversions are completely equivalent and
trivial.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "David S. Miller" <davem@davemloft.net>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: netdev@vger.kernel.org
Cc: Anton Vorontsov <cbou@mail.ru>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Alex Elder <aelder@sgi.com>
Cc: xfs-masters@oss.sgi.com
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: netfilter-devel@vger.kernel.org
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: linux-nfs@vger.kernel.org
9247685088398cf21bcb513bd2832b4cd42516c4 20-Oct-2010 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Properly initialize sock_xprt.srcaddr in all cases

The source address field in the transport's sock_xprt is initialized
ONLY IF the RPC application passed a pointer to a source address
during the call to rpc_create(). However, xs_bind() subsequently uses
the value of this field without regard to whether the source address
was initialized during transport creation or not.

So far we've been lucky: the uninitialized value of this field is
zeroes. xs_bind(), until recently, used only the sin[6]_addr field in
this sockaddr, and all zeroes is a valid value for this: it means
ANYADDR. This is a happy coincidence.

However, xs_bind() now wants to use the sa_family field as well, and
expects it to be initialized to something other than zero.

Therefore, the source address sockaddr field should be fully
initialized at transport create time in _every_ case, not just when
the RPC application wants to use a specific bind address.

Bruce added a workaround for this missing initialization by adjusting
commit 6bc9638a, but the "right" way to do this is to ensure that the
source address sockaddr is always correctly initialized from the
get-go.

This patch doesn't introduce a behavior change. It's simply a
clean-up of Bruce's fix, to prevent future problems of this kind. It
may look like overkill, but

a) it clearly documents the default initial value of this field,

b) it doesn't assume that the sockaddr_storage memory is first
initialized to any particular value, and

c) it will fail verbosely if some unknown address family is passed
in

Originally introduced by commit d3bc9a1d.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
4232e8634ad82c5a53389e4016de15a8b15c09c3 20-Oct-2010 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Use conventional switch statement when reclassifying sockets

Clean up.

Defensive coding: If "family" is ever something that is neither
AF_INET nor AF_INET6, xs_reclassify_socket6() is not the appropriate
default action. Choose to do nothing in that case.

Introduced by commit 6bc9638a.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
50fa0d40a9d601bb8e6c9a595e90940bc846f7df 05-Oct-2010 Pavel Emelyanov <xemul@parallels.com> sunrpc: Remove dead "else" branch from bc xprt creation

Since the xprt in question is forcibly set to be bound the else
branch of this check is unneeded.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
8c14ff2aaf26d58aa2258a59bd419c906d105938 04-Oct-2010 Pavel Emelyanov <xemul@parallels.com> sunrpc: Remove UDP worker wrappers

Same for UDP sockets creation paths.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
cdd518d524b49e6e80b109bf985376456a2985ce 04-Oct-2010 Pavel Emelyanov <xemul@parallels.com> sunrpc: Remove TCP worker wrappers

The v4 and the v6 wrappers only pass the respective family
to the xs_tcp_setup_socket. This family can be taken from the
xprt's sockaddr.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7dfe1fc36278c3aa0db29356c491db6353678e98 04-Oct-2010 Pavel Emelyanov <xemul@parallels.com> sunrpc: Pass family to setup_socket calls

Now we have a single socket creation routine and can call it
directly from the setup_socket routines.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
6bc9638ab495516f8a34d2ae48f2f43f145e186f 04-Oct-2010 Pavel Emelyanov <xemul@parallels.com> sunrpc: Merge xs_create_sock code

After xs_bind is merged it's easy to merge its callers.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
[bfields@redhat.com: fix address family initialization]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
beb59b68280d9779cc16591115547678d1c74a66 05-Oct-2010 Pavel Emelyanov <xemul@parallels.com> sunrpc: Merge the xs_bind code

There's the only difference betseen the xs_bind4 and the
xs_bind6 - the size of sockaddr structure they use.

Fortunatelly its size can be indirectly get from the transport.

Change since v1:
* use sockaddr_storage instead of sockaddr
* use rpc_set_port instead of manual port assigning

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
[bfields@redhat.com: fix address family initialization]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
573018c07e040b2c3f3cb8251f66fa4a5cb7425d 04-Oct-2010 Pavel Emelyanov <xemul@parallels.com> sunrpc: Call xs_create_sockX directly from setup_socket

Remove now unneeded wrappers that just add type and protocol
to socket creation callback.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
22d44a7d8a03456aa6d0a047c051aa28728e6ecd 04-Oct-2010 Pavel Emelyanov <xemul@parallels.com> sunrpc: Factor out v6 sockets creation

Same patch for v6 protocols.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
22f793268de3b4dff8abfcd873ba7afc1f34224f 04-Oct-2010 Pavel Emelyanov <xemul@parallels.com> sunrpc: Factor out v4 sockets creation

The UDPv4 and TCPv4 socket creation callbacks now look very similar.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
b65c0310611af73569f94c526a1e2323d99b380a 04-Oct-2010 Pavel Emelyanov <xemul@parallels.com> sunrpc: Factor out udp sockets creation

Make it look like the TCP sockets creation.
Unfortunately the git diff made the patch look messy :(

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
58dddac9c55c604f01152832c1c3d2c17a5adea9 04-Oct-2010 Pavel Emelyanov <xemul@parallels.com> sunrpc: Remove duplicate xprt/transport arguments from calls

The xs_tcp_reuse_connection takes the xprt only to pass it down
to the xs_abort_connection. The later one can get it from the given
transport itself.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
a9f5f0f7bf72f3f1451e844681fb3cb5d0b1c80d 04-Oct-2010 Pavel Emelyanov <xemul@parallels.com> sunrpc: Get xprt pointer once in xs_tcp_setup_socket

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
baaf4e487a9c42b345bde14698fd566f864c9287 04-Oct-2010 Pavel Emelyanov <xemul@parallels.com> sunrpc: Remove unused sock arg from xs_next_srcport

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
5d4ec932972a0dd5486c59909e62dc62105d065c 04-Oct-2010 Pavel Emelyanov <xemul@parallels.com> sunrpc: Remove unused sock arg from xs_get_srcport

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
14ec63c3336af7ea5445e0d8f4d26ba3041e40b3 29-Sep-2010 Pavel Emelyanov <xemul@parallels.com> sunrpc: Create sockets in net namespaces

The context is already known in all the sock_create callers.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
37aa2133731d9231eb834f700119f0d3f1ed2664 29-Sep-2010 Pavel Emelyanov <xemul@parallels.com> sunrpc: Tag rpc_xprt with net

The net is known from the xprt_create and this tagging will also
give un the context in the conntection workers where real sockets
are created.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
e204e621b4160c802315bc2d0fa335337c0d62e8 29-Sep-2010 Pavel Emelyanov <xemul@parallels.com> sunrpc: Factor out rpc_xprt freeing

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
bd1722d4316e42a12fe6337ebe34d7e1e2c088b2 29-Sep-2010 Pavel Emelyanov <xemul@parallels.com> sunrpc: Factor out rpc_xprt allocation

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
f064af1e500a2bf4607706f0f458163bdb2a6ea5 22-Sep-2010 Eric Dumazet <eric.dumazet@gmail.com> net: fix a lockdep splat

We have for each socket :

One spinlock (sk_slock.slock)
One rwlock (sk_callback_lock)

Possible scenarios are :

(A) (this is used in net/sunrpc/xprtsock.c)
read_lock(&sk->sk_callback_lock) (without blocking BH)
<BH>
spin_lock(&sk->sk_slock.slock);
...
read_lock(&sk->sk_callback_lock);
...

(B)
write_lock_bh(&sk->sk_callback_lock)
stuff
write_unlock_bh(&sk->sk_callback_lock)

(C)
spin_lock_bh(&sk->sk_slock)
...
write_lock_bh(&sk->sk_callback_lock)
stuff
write_unlock_bh(&sk->sk_callback_lock)
spin_unlock_bh(&sk->sk_slock)

This (C) case conflicts with (A) :

CPU1 [A] CPU2 [C]
read_lock(callback_lock)
<BH> spin_lock_bh(slock)
<wait to spin_lock(slock)>
<wait to write_lock_bh(callback_lock)>

We have one problematic (C) use case in inet_csk_listen_stop() :

local_bh_disable();
bh_lock_sock(child); // spin_lock_bh(&sk->sk_slock)
WARN_ON(sock_owned_by_user(child));
...
sock_orphan(child); // write_lock_bh(&sk->sk_callback_lock)

lockdep is not happy with this, as reported by Tetsuo Handa

It seems only way to deal with this is to use read_lock_bh(callbacklock)
everywhere.

Thanks to Jarek for pointing a bug in my first attempt and suggesting
this solution.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9bbb9e5a33109b2832e2e63dcc7a132924ab374b 12-Aug-2010 Rusty Russell <rusty@rustcorp.com.au> param: use ops in struct kernel_param, rather than get and set fns directly

This is more kernel-ish, saves some space, and also allows us to
expand the ops without breaking all the callers who are happy for the
new members to be NULL.

The few places which defined their own param types are changed to the
new scheme (more which crept in recently fixed in following patches).

Since we're touching them anyway, we change get() and set() to take a
const struct kernel_param (which they really are). This causes some
harmless warnings until we fix them (in following patches).

To reduce churn, module_param_call creates the ops struct so the callers
don't have to change (and casts the functions to reduce warnings).
The modern version which takes an ops struct is called module_param_cb.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ville Syrjala <syrjala@sci.fi>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Alessandro Rubini <rubini@ipvvis.unipv.it>
Cc: Michal Januszewski <spock@gentoo.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: linux-kernel@vger.kernel.org
Cc: linux-input@vger.kernel.org
Cc: linux-fbdev-devel@lists.sourceforge.net
Cc: linux-nfs@vger.kernel.org
Cc: netdev@vger.kernel.org
669502ff31d7dba1849aec7ee2450a3c61f57d39 10-Aug-2010 Andy Chittenden <andyc.bluearc@gmail.com> SUNRPC: fix NFS client over TCP hangs due to packet loss (Bug 16494)

When reusing a TCP connection, ensure that it's aborted if a previous
shutdown attempt has been made on that connection so that the RPC over
TCP recovery mechanism succeeds.

Signed-off-by: Andy Chittenden <andyc.bluearc@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
b76ce56192bcf618013fb9aecd83488cffd645cc 16-Jun-2010 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fix a re-entrancy bug in xs_tcp_read_calldir()

If the attempt to read the calldir fails, then instead of storing the read
bytes, we currently discard them. This leads to a garbage final result when
upon re-entry to the same routine, we read the remaining bytes.

Fixes the regression in bugzilla number 16213. Please see
https://bugzilla.kernel.org/show_bug.cgi?id=16213

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
0a68b0bed08eeb7ec62e0125f17856b1ccb1ea4b 26-May-2010 J. Bruce Fields <bfields@citi.umich.edu> sunrpc: fix leak on error on socket xprt setup

Also collect exit code together while we're at it.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
3fa21e07e6acefa31f974d57fba2b6920a7ebd1a 18-May-2010 Joe Perches <joe@perches.com> net: Remove unnecessary returns from void function()s

This patch removes from net/ (but not any netfilter files)
all the unnecessary return; statements that precede the
last closing brace of void functions.

It does not remove the returns that are immediately
preceded by a label as gcc doesn't like that.

Done via:
$ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d60dbb20a74c2cfa142be0a34dac3c6547ea086c 13-May-2010 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Move the task->tk_bytes_sent and tk_rtt to struct rpc_rqst

It seems strange to maintain stats for bytes_sent in one structure, and
bytes received in another. Try to assemble all the RPC request-related
stats in struct rpc_rqst

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
712a4338669d7d57f952244abb608e6ac07e39da 12-May-2010 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fix xs_setup_bc_tcp()

It is a BUG for anybody to call this function without setting
args->bc_xprt. Trying to return an error value is just wrong, since the
user cannot fix this: it is a programming error, not a user error.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
bbc72cea58f671665b6362be0d4e391813ac0eee 07-May-2010 Chuck Lever <chuck.lever@oracle.com> SUNRPC: RPC metrics and RTT estimator should use same RTT value

Compute an RPC request's RTT once, and use that value both for reporting
RPC metrics, and for adjusting the RTT context used by the RPC client's RTT
estimator algorithm.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
a8ce4a8f37fef0a09a1e920c2e09f67a80426c7e 16-Apr-2010 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fail over more quickly on connect errors

We should not allow soft tasks to wait for longer than the major timeout
period when waiting for a reconnect to occur.

Remove the field xprt->connect_timeout since it has been obsoleted by
xprt->reestablish_timeout.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
0b9e79431377df452348e78262dd5a3dc359eeef 16-Apr-2010 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Move the test for XPRT_CONNECTING into xprt_connect()

This fixes a bug with setting xprt->stat.connect_start.

Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
c9acb42ef1904d15d0fb315061cefbe638f67f3a 19-Mar-2010 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fix a use after free bug with the NFSv4.1 backchannel

The ->release_request() callback was designed to allow the transport layer
to do housekeeping after the RPC call is done. It cannot be used to free
the request itself, and doing so leads to a use-after-free bug in
xprt_release().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
81160e66cca3d3a16b7d88e0e2dccfc5c76f36f9 08-Mar-2010 Joe Perches <joe@perches.com> net/sunrpc: Convert (void)snprintf to snprintf

(Applies on top of "Remove uses of NIPQUAD, use %pI4")

Casts to void of snprintf are most uncommon in kernel source.
9 use casts, 1301 do not.

Remove the remaining uses in net/sunrpc/

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fc0b579168cbe737c83c6b9bbfe265d3ae6baca6 08-Mar-2010 Joe Perches <joe@perches.com> net/sunrpc: Remove uses of NIPQUAD, use %pI4

Originally submitted Jan 1, 2010
http://patchwork.kernel.org/patch/71221/

Convert NIPQUAD to the %pI4 format extension where possible
Convert %02x%02x%02x%02x/NIPQUAD to %08x/ntohl

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5fe46e9d733f19a880ef7e516002bd4c2b833e14 08-Mar-2010 Bian Naimeng <biannm@cn.fujitsu.com> rpc client can not deal with ENOSOCK, so translate it into ENOCONN

If NFSv4 client send a request before connect, or the old connection was broken
because a ETIMEOUT error catched by call_status, ->send_request will return
ENOSOCK, but rpc layer can not deal with it, so make sure ->send_request can
translate ENOSOCK into ENOCONN.

Signed-off-by: Bian Naimeng <biannm@cn.fujitsu.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
9fcfe0c83c3b04a759cde6b8c5f961237f17808b 02-Mar-2010 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Handle EINVAL error returns from the TCP connect operation

This can, for instance, happen if the user specifies a link local IPv6
address.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
5a51f13adf7909caec2f8182767485c30e21364e 14-Jan-2010 H Hartley Sweeten <hartleys@visionengravers.com> xprtsock.c: make bc_{malloc/free} static

xprtsock.c: make bc_{malloc/free} static

The server backchannel buf_alloc and buf_free methods should
be static since they are not used outside this file.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
09a21c4102c8f7893368553273d39c0cadedf9af 03-Dec-2009 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Allow RPCs to fail quickly if the server is unreachable

The kernel sometimes makes RPC calls to services that aren't running.
Because the kernel's RPC client always assumes the hard retry semantic
when reconnecting a connection-oriented RPC transport, the underlying
reconnect logic takes a long while to time out, even though the remote
may have responded immediately with ECONNREFUSED.

In certain cases, like upcalls to our local rpcbind daemon, or for NFS
mount requests, we'd like the kernel to fail immediately if the remote
service isn't reachable. This allows another transport to be tried
immediately, or the pending request can be abandoned quickly.

Introduce a per-request flag which controls how call_transmit_status()
behaves when request transmission fails because the server cannot be
reached.

We don't want soft connection semantics to apply to other errors. The
default case of the switch statement in call_transmit_status() no
longer falls through; the fall through code is copied to the default
case, and a "break;" is added.

The transport's connection re-establishment timeout is also ignored for
such requests. We want the request to fail immediately, so the
reconnect delay is skipped. Additionally, we don't want a connect
failure here to further increase the reconnect timeout value, since
this request will not be retried.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
6d4561110a3e9fa742aeec6717248a491dfb1878 16-Nov-2009 Eric W. Biederman <ebiederm@xmission.com> sysctl: Drop & in front of every proc_handler.

For consistency drop & in front of every proc_handler. Explicity
taking the address is unnecessary and it prevents optimizations
like stubbing the proc_handlers to NULL.

Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
f8572d8f2a2ba75408b97dc24ef47c83671795d7 05-Nov-2009 Eric W. Biederman <ebiederm@xmission.com> sysctl net: Remove unused binary sysctl code

Now that sys_sysctl is a compatiblity wrapper around /proc/sys
all sysctl strategy routines, and all ctl_name and strategy
entries in the sysctl tables are unused, and can be
revmoed.

In addition neigh_sysctl_register has been modified to no longer
take a strategy argument and it's callers have been modified not
to pass one.

Cc: "David Miller" <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
61d0a8e6a8049cea246ee7ec19b042d4ff1f6ef6 23-Sep-2009 Neil Brown <neilb@suse.de> NFS/RPC: fix problems with reestablish_timeout and related code.


[[resending with correct cc: - "vfs.kernel.org" just isn't right!]]

xprt->reestablish_timeout is used to cause TCP connection attempts to
back off if the connection fails so as not to hammer the network,
but to still allow immediate connections when there is no reason to
believe there is a problem.

It is not used for the first connection (when transport->sock is NULL)
but only on reconnects.

It is currently set:

a/ to 0 when xs_tcp_state_change finds a state of TCP_FIN_WAIT1
on the assumption that the client has closed the connection
so the reconnect should be immediate when needed.
b/ to at least XS_TCP_INIT_REEST_TO when xs_tcp_state_change
detects TCP_CLOSING or TCP_CLOSE_WAIT on the assumption that the
server closed the connection so a small delay at least is
required.
c/ as above when xs_tcp_state_change detects TCP_SYN_SENT, so that
it is never 0 while a connection has been attempted, else
the doubling will produce 0 and there will be no backoff.
d/ to double is value (up to a limit) when delaying a connection,
thus providing exponential backoff and
e/ to XS_TCP_INIT_REEST_TO in xs_setup_tcp as simple initialisation.

So you can see it is highly dependant on xs_tcp_state_change being
called as expected. However experimental evidence shows that
xs_tcp_state_change does not see all state changes.
("rpcdebug -m rpc trans" can help show what actually happens).

Results show:
TCP_ESTABLISHED is reported when a connection is made. TCP_SYN_SENT
is never reported, so rule 'c' above is never effective.

When the server closes the connection, TCP_CLOSE_WAIT and
TCP_LAST_ACK *might* be reported, and TCP_CLOSE is always
reported. This rule 'b' above will sometimes be effective, but
not reliably.

When the client closes the connection, it used to result in
TCP_FIN_WAIT1, TCP_FIN_WAIT2, TCP_CLOSE. However since commit
f75e674 (SUNRPC: Fix the problem of EADDRNOTAVAIL syslog floods on
reconnect) we don't see *any* events on client-close. I think this
is because xs_restore_old_callbacks is called to disconnect
xs_tcp_state_change before the socket is closed.
In any case, rule 'a' no longer applies.

So all that is left are rule d, which successfully doubles the
timeout which is never rest, and rule e which initialises the timeout.

Even if the rules worked as expected, there would be a problem because
a successful connection does not reset the timeout, so a sequence
of events where the server closes the connection (e.g. during failover
testing) will cause longer and longer timeouts with no good reason.

This patch:

- sets reestablish_timeout to 0 in xs_close thus effecting rule 'a'
- sets it to 0 in xs_tcp_data_ready to ensure that a successful
connection resets the timeout
- sets it to at least XS_TCP_INIT_REEST_TO after it is doubled,
thus effecting rule c

I have not reimplemented rule b and the new version of rule c
seems sufficient.

I suspect other code in xs_tcp_data_ready needs to be revised as well.
For example I don't think connect_cookie is being incremented as often
as it should be.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
f300baba5a1536070d6d77bf0c8c4ca999bb4f0f 10-Sep-2009 Alexandros Batsakis <batsakis@netapp.com> nfsd41: sunrpc: add new xprt class for nfsv4.1 backchannel

[sunrpc: change idle timeout value for the backchannel]
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
4cfc7e6019caa3e97d2a81c48c8d575d7b38d751 10-Sep-2009 Rahul Iyer <iyer@netapp.com> nfsd41: sunrpc: Added rpc server-side backchannel handling

When the call direction is a reply, copy the xid and call direction into the
req->rq_private_buf.head[0].iov_base otherwise rpc_verify_header returns
rpc_garbage.

Signed-off-by: Rahul Iyer <iyer@netapp.com>
Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[get rid of CONFIG_NFSD_V4_1]
[sunrpc: refactoring of svc_tcp_recvfrom]
[nfsd41: sunrpc: create common send routine for the fore and the back channels]
[nfsd41: sunrpc: Use free_page() to free server backchannel pages]
[nfsd41: sunrpc: Document server backchannel locking]
[nfsd41: sunrpc: remove bc_connect_worker()]
[nfsd41: sunrpc: Define xprt_server_backchannel()[
[nfsd41: sunrpc: remove bc_close and bc_init_auto_disconnect dummy functions]
[nfsd41: sunrpc: eliminate unneeded switch statement in xs_setup_tcp()]
[nfsd41: sunrpc: Don't auto close the server backchannel connection]
[nfsd41: sunrpc: Remove unused functions]
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: change bc_sock to bc_xprt]
[nfsd41: sunrpc: move struct rpc_buffer def into a common header file]
[nfsd41: sunrpc: use rpc_sleep in bc_send_request so not to block on mutex]
[removed cosmetic changes]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[sunrpc: add new xprt class for nfsv4.1 backchannel]
[sunrpc: v2.1 change handling of auto_close and init_auto_disconnect operations for the nfsv4.1 backchannel]
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
[reverted more cosmetic leftovers]
[got rid of xprt_server_backchannel]
[separated "nfsd41: sunrpc: add new xprt class for nfsv4.1 backchannel"]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Cc: Trond Myklebust <trond.myklebust@netapp.com>
[sunrpc: change idle timeout value for the backchannel]
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Acked-by: Trond Myklebust <trond.myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
9dc3b095b78347bfb02c324b5ee2e558f7267396 09-Aug-2009 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Update xprt address strings after an rpcbind completes

After a bind completes, update the transport instance's address
strings so debugging messages display the current port the transport
is connected to.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
c740eff84bcfd63c0497ef880e80171931cb8222 09-Aug-2009 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Kill RPC_DISPLAY_ALL

At some point, I recall that rpc_pipe_fs used RPC_DISPLAY_ALL.
Currently there are no uses of RPC_DISPLAY_ALL outside the transport
modules themselves, so we can safely get rid of it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
fbfffbd5e74c5fa8c9165e110cb5899ec21e6364 09-Aug-2009 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Rename sock_xprt.addr as sock_xprt.srcaddr

Clean up: Give the "addr" and "port" field less ambiguous names.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
c877b849d302d1275452af80b7221a2555dc02e1 09-Aug-2009 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Use rpc_ntop() for constructing transport address strings

Clean up: In addition to using the new generic rpc_ntop() and
rpc_get_port() functions, have the RPC client compute the presentation
address buffer sizes dynamically using kstrdup().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
ba809130bc260fce04141aca01ef9e068d32af2a 09-Aug-2009 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Remove duplicate universal address generation

RPC universal address generation is currently done in several places:
rpcb_clnt.c, nfs4proc.c xprtsock.c, and xprtrdma.c. Remove the
redundant cases that convert a socket address to a universal
address. The nfs4proc.c case takes a pre-formatted presentation
address string, not a socket address, so we'll leave that one.

Because the new uaddr constructor uses the recently introduced
rpc_ntop(), it now supports proper "::" shorthanding for IPv6
addresses. This allows the kernel to register properly formed
universal addresses with the local rpcbind service, in _all_ cases.

The kernel can now also send properly formed universal addresses in
RPCB_GETADDR requests, and support link-local properly when
encoding and decoding IPv6 addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
cbf1107126af2950623fafdaa5c9df43ab00f046 09-Aug-2009 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: convert some sysctls into module parameters

Parameters like the minimum reserved port, and the number of slot entries
should really be module parameters rather than sysctls.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
0d90ba1cd416525c4825c111db862d8b15a02e9b 01-Apr-2009 Ricardo Labiaga <Ricardo.Labiaga@netapp.com> nfs41: Backchannel callback service helper routines

Executes the backchannel task on the RPC state machine using
the existing open connection previously established by the client.

Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>

nfs41: Add bc_svc.o to sunrpc Makefile.

[nfs41: bc_send() does not need to be exported outside RPC module]
[nfs41: xprt_free_bc_request() need not be exported outside RPC module]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Update copyright]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
88b5ed73bcd0f21e008b6e303a02c8b7cb1199f4 17-Jun-2009 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fix a missing "break" option in xs_tcp_setup_socket()

In the case of -EADDRNOTAVAIL and/or unhandled connection errors, we want
to get rid of the existing socket and retry immediately, just as the
comment says. Currently we end up sleeping for a minute, due to the missing
"break" statement.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
44b98efdd0a205bdca2cb63493350d06ff6804b1 01-Apr-2009 Ricardo Labiaga <Ricardo.Labiaga@netapp.com> nfs41: New xs_tcp_read_data()

Handles RPC replies and backchannel callbacks. Traditionally the NFS
client has expected only RPC replies on its open connections. With
NFSv4.1, callbacks can arrive over an existing open connection.

This patch refactors the old xs_tcp_read_request() into an RPC reply handler:
xs_tcp_read_reply(), a new backchannel callback handler: xs_tcp_read_callback(),
and a common routine to read the data off the transport: xs_tcp_read_common().
The new xs_tcp_read_callback() queues callback requests onto a queue where
the callback service (a separate thread) is listening for the processing.

This patch incorporates work and suggestions from Rahul Iyer (iyer@netapp.com)
and Benny Halevy (bhalevy@panasas.com).

xs_tcp_read_callback() drops the connection when the number of expected
callbacks is exceeded. Use xprt_force_disconnect(), ensuring tasks on
the pending queue are awaken on disconnect.

[nfs41: Keep track of RPC call/reply direction with a flag]
[nfs41: Preallocate rpc_rqst receive buffer for handling callbacks]
Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: sunrpc: xs_tcp_read_callback() should use xprt_force_disconnect()]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Moves embedded #ifdefs into #ifdef function blocks]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
f4a2e418bfd03a1f25f515e8a92ecd584d96cfc1 01-Apr-2009 Ricardo Labiaga <Ricardo.Labiaga@netapp.com> nfs41: Process the RPC call direction

Reading and storing the RPC direction is a three step process.

1. xs_tcp_read_calldir() reads the RPC direction, but it will not store it
in the XDR buffer since the 'struct rpc_rqst' is not yet available.

2. The 'struct rpc_rqst' is obtained during the TCP_RCV_COPY_DATA state.
This state need not necessarily be preceeded by the TCP_RCV_READ_CALLDIR.
For example, we may be reading a continuation packet to a large reply.
Therefore, we can't simply obtain the 'struct rpc_rqst' during the
TCP_RCV_READ_CALLDIR state and assume it's available during TCP_RCV_COPY_DATA.

This patch adds a new TCP_RCV_READ_CALLDIR flag to indicate the need to
read the RPC direction. It then uses TCP_RCV_COPY_CALLDIR to indicate the
RPC direction needs to be saved after the 'struct rpc_rqst' has been allocated.

3. The 'struct rpc_rqst' is obtained by the xs_tcp_read_data() helper
functions. xs_tcp_read_common() then saves the RPC direction in the XDR
buffer if TCP_RCV_COPY_CALLDIR is set. This will happen when we're reading
the data immediately after the direction was read. xs_tcp_read_common()
then clears this flag.

[was nfs41: Skip past the RPC call direction]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: sunrpc: Add RPC direction back into the XDR buffer]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: sunrpc: Don't skip past the RPC call direction]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
18dca02aeb3c49dfce87c76be643b139d05cf647 01-Apr-2009 Ricardo Labiaga <Ricardo.Labiaga@netapp.com> nfs41: Add ability to read RPC call direction on TCP stream.

NFSv4.1 callbacks can arrive over an existing connection. This patch adds
the logic to read the RPC call direction (call or reply). It does this by
updating the state machine to look for the call direction invoking
xs_tcp_read_calldir(...) after reading the XID.

[nfs41: Keep track of RPC call/reply direction with a flag]

As per 11/14/08 review of RFC 53/85.

Add a new flag to track whether the incoming message is an RPC call or an
RPC reply. TCP_RPC_REPLY is set in the 'struct sock_xprt' tcp_flags in
xs_tcp_read_calldir() if the message is an RPC reply sent on the forechannel.
It is cleared if the message is an RPC request sent on the back channel.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
adf30907d63893e4208dfe3f5c88ae12bc2f25d5 02-Jun-2009 Eric Dumazet <eric.dumazet@gmail.com> net: skb->dst accessors

Define three accessors to get/set dst attached to a skb

struct dst_entry *skb_dst(const struct sk_buff *skb)

void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of :
dst_release(skb->dst)
skb->dst = NULL;

Delete skb->dst field

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f75e6745aa3084124ae1434fd7629853bdaf6798 21-Apr-2009 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fix the problem of EADDRNOTAVAIL syslog floods on reconnect

See http://bugzilla.kernel.org/show_bug.cgi?id=13034

If the port gets into a TIME_WAIT state, then we cannot reconnect without
binding to a new port.

Tested-by: Petr Vandrovec <petr@vandrovec.name>
Tested-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
55420c24a0d4d1fce70ca713f84aa00b6b74a70e 11-Mar-2009 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Ensure we close the socket on EPIPE errors too...

As long as one task is holding the socket lock, then calls to
xprt_force_disconnect(xprt) will not succeed in shutting down the socket.
In particular, this would mean that a server initiated shutdown will not
succeed until the lock is relinquished.
In order to avoid the deadlock, we should ensure that xs_tcp_send_request()
closes the socket on EPIPE errors too.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
b61d59fffd3e5b6037c92b4c840605831de8a251 11-Mar-2009 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: xs_tcp_connect_worker{4,6}: merge common code

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
25fe6142a57c720452c5e9ddbc1f32309c1e5c19 11-Mar-2009 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Add a sysctl to control the duration of the socket linger timeout

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
7d1e8255cf959fba7ee2317550dfde39f0b936ae 11-Mar-2009 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Add the equivalent of the linger and linger2 timeouts to RPC sockets

This fixes a regression against FreeBSD servers as reported by Tomas
Kasparek. Apparently when using RPC over a TCP socket, the FreeBSD servers
don't ever react to the client closing the socket, and so commit
e06799f958bf7f9f8fae15f0c6f519953fb0257c (SUNRPC: Use shutdown() instead of
close() when disconnecting a TCP socket) causes the setup to hang forever
whenever the client attempts to close and then reconnect.

We break the deadlock by adding a 'linger2' style timeout to the socket,
after which, the client will abort the connection using a TCP 'RST'.

The default timeout is set to 15 seconds. A subsequent patch will put it
under user control by means of a systctl.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
5e3771ce2d6a69e10fcc870cdf226d121d868491 11-Mar-2009 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Ensure that xs_nospace return values are propagated

If xs_nospace() finds that the socket has disconnected, it attempts to
return ENOTCONN, however that value is then squashed by the callers.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
8a2cec295f4499cc9d4452e9b02d4ed071bb42d3 11-Mar-2009 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Delay, then retry on connection errors.

Enforce the comment in xs_tcp_connect_worker4/xs_tcp_connect_worker6 that
we should delay, then retry on certain connection errors.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2a4919919a97911b0aa4b9f5ac1eab90ba87652b 11-Mar-2009 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Return EAGAIN instead of ENOTCONN when waking up xprt->pending

While we should definitely return socket errors to the task that is
currently trying to send data, there is no need to propagate the same error
to all the other tasks on xprt->pending. Doing so actually slows down
recovery, since it causes more than one tasks to attempt socket recovery.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
482f32e65d31cbf88d08306fa5d397cc945c3c26 11-Mar-2009 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Handle socket errors correctly

Ensure that we pick up and handle socket errors as they occur.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
c8485e4d634f6df155040293928707f127f0d06d 11-Mar-2009 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Handle ECONNREFUSED correctly in xprt_transmit()

If we get an ECONNREFUSED error, we currently go to sleep on the
'xprt->sending' wait queue. The problem is that no timeout is set there,
and there is nothing else that will wake the task up later.

We should deal with ECONNREFUSED in call_status, given that is where we
also deal with -EHOSTDOWN, and friends.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
40d2549db5f515e415894def98b49db7d4c56714 11-Mar-2009 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Don't disconnect if a connection is still in progress.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
670f94573104b4a25525d3fcdcd6496c678df172 11-Mar-2009 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Ensure we set XPRT_CLOSING only after we've sent a tcp FIN...

...so that we can distinguish between when we need to shutdown and when we
don't. Also remove the call to xs_tcp_shutdown() from xs_tcp_connect(),
since xprt_connect() makes the same test.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
fe315e76fc3a3f9f7e1581dc22fec7e7719f0896 11-Mar-2009 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Avoid spurious wake-up during UDP connect processing

To clear out old state, the UDP connect workers unconditionally invoke
xs_close() before proceeding with a new connect. Nowadays this causes
a spurious wake-up of the task waiting for the connect to complete.

This is a little racey, but usually harmless. The waiting task
immediately retries the connect via a call_bind/call_connect sequence,
which usually finds the transport already in the connected state
because the connect worker has finished in the background.

To avoid a spurious wake-up, factor the xs_close() logic that resets
the underlying socket into a helper, and have the UDP connect workers
call that helper instead of xs_close().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
01d37c428ae080563c0a3bb8bdfa88c65a6891d3 11-Mar-2009 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: xprt_connect() don't abort the task if the transport isn't bound

If the transport isn't bound, then we should just return ENOTCONN, letting
call_connect_status() and/or call_status() deal with retrying. Currently,
we appear to abort all pending tasks with an EIO error.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
fba91afbec2c004e2c8733ae9e0ca6998e962c64 11-Mar-2009 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fix an Oops due to socket not set up yet...

We can Oops in both xs_udp_send_request() and xs_tcp_send_request() if the
call to xs_sendpages() returns an error due to the socket not yet being
set up.
Deal with that situation by returning a new error: ENOTSOCK, so that we
know to avoid dereferencing transport->sock.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
1f0fa15432e49547c3fa915644c7e0c0975809e7 07-Feb-2009 Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> net/sunrpc/xprtsock.c: some common code found

$ diff-funcs xs_udp_write_space net/sunrpc/xprtsock.c
net/sunrpc/xprtsock.c xs_tcp_write_space
--- net/sunrpc/xprtsock.c:xs_udp_write_space()
+++ net/sunrpc/xprtsock.c:xs_tcp_write_space()
@@ -1,4 +1,4 @@
- * xs_udp_write_space - callback invoked when socket buffer space
+ * xs_tcp_write_space - callback invoked when socket buffer space
* becomes available
* @sk: socket whose state has changed
*
@@ -7,12 +7,12 @@
* progress, otherwise we'll waste resources thrashing kernel_sendmsg
* with a bunch of small requests.
*/
-static void xs_udp_write_space(struct sock *sk)
+static void xs_tcp_write_space(struct sock *sk)
{
read_lock(&sk->sk_callback_lock);

- /* from net/core/sock.c:sock_def_write_space */
- if (sock_writeable(sk)) {
+ /* from net/core/stream.c:sk_stream_write_space */
+ if (sk_stream_wspace(sk) >= sk_stream_min_wspace(sk)) {
struct socket *sock;
struct rpc_xprt *xprt;


$ codiff net/sunrpc/xprtsock.o net/sunrpc/xprtsock.o.new
net/sunrpc/xprtsock.c:
xs_tcp_write_space | -163
xs_udp_write_space | -163
2 functions changed, 326 bytes removed

net/sunrpc/xprtsock.c:
xs_write_space | +179
1 function changed, 179 bytes added

net/sunrpc/xprtsock.o.new:
3 functions changed, 179 bytes added, 326 bytes removed, diff: -147

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
e0db4a786bbd73145b4feb45c75d49b6e60fe72c 03-Nov-2008 David S. Miller <davem@davemloft.net> sunrpc: Fix build warning due to typo in %pI4 format changes.

Noticed by Stephen Hemminger.

Signed-off-by: David S. Miller <davem@davemloft.net>
21454aaad30651ba0dcc16fe5271bc12ee21f132 31-Oct-2008 Harvey Harrison <harvey.harrison@gmail.com> net: replace NIPQUAD() in net/*/

Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u
can be replaced with %pI4

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5b095d98928fdb9e3b75be20a54b7a6cbf6ca9ad 29-Oct-2008 Harvey Harrison <harvey.harrison@gmail.com> net: replace %p6 with %pI6

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4b7a4274ca63dadd9c4f17fc953f3a5d19855c4c 29-Oct-2008 Harvey Harrison <harvey.harrison@gmail.com> net: replace %#p6 format specifier with %pi6

gcc warns when using the # modifier with the %p format specifier,
so we can't use this to omit the colons when needed, introduces
%pi6 instead.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fdb46ee752ed05c94bac71fe3decdb5175ec6e1f 29-Oct-2008 Harvey Harrison <harvey.harrison@gmail.com> net, misc: replace uses of NIP6_FMT with %p6

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b071195deba14b37ce896c26f20349b46e5f9fd2 29-Oct-2008 Harvey Harrison <harvey.harrison@gmail.com> net: replace all current users of NIP6_SEQFMT with %#p6

The define in kernel.h can be done away with at a later time.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2a9e1cfa23fb62da37739af81127dab5af095d99 28-Oct-2008 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Respond promptly to server TCP resets

If the server sends us an RST error while we're in the TCP_ESTABLISHED
state, then that will not result in a state change, and so the RPC client
ends up hanging forever (see
http://bugzilla.kernel.org/show_bug.cgi?id=11154)

We can intercept the reset by setting up an sk->sk_error_report callback,
which will then allow us to initiate a proper shutdown and retry...

We also make sure that if the send request receives an ECONNRESET, then we
shutdown too...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
113aa838ec3a235d883f8357d31d90e16c47fc89 14-Oct-2008 Alan Cox <alan@redhat.com> net: Rationalise email address: Network Specific Parts

Clean up the various different email addresses of mine listed in the code
to a single current and valid address. As Dave says his network merges
for 2.6.28 are now done this seems a good point to send them in where
they won't risk disrupting real changes.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b22602a673b1743bba4b62bb404ffd3b269d2f09 06-Jun-2008 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Ensure all transports set rq_xtime consistently

The RPC client uses the rq_xtime field in each RPC request to determine the
round-trip time of the request. Currently, the rq_xtime field is
initialized by each transport just before it starts enqueing a request to
be sent. However, transports do not handle initializing this value
consistently; sometimes they don't initialize it at all.

To make the measurement of request round-trip time consistent for all
RPC client transport capabilities, pull rq_xtime initialization into the
RPC client's generic transport logic. Now all transports will get a
standardized RTT measure automatically, from:

xprt_transmit()

to

xprt_complete_rqst()

This makes round-trip time calculation more accurate for the TCP transport.
The socket ->sendmsg() method can return "-EAGAIN" if the socket's output
buffer is full, so the TCP transport's ->send_request() method may call
the ->sendmsg() method repeatedly until it gets all of the request's bytes
queued in the socket's buffer.

Currently, the TCP transport sets the rq_xtime field every time through
that loop so the final value is the timestamp just before the *last* call
to the underlying socket's ->sendmsg() method. After this patch, the
rq_xtime field contains a timestamp that reflects the time just before the
*first* call to ->sendmsg().

This is consequential under heavy workloads because large requests often
take multiple ->sendmsg() calls to get all the bytes of a request queued.
The TCP transport causes the request to sleep until the remote end of the
socket has received enough bytes to clear space in the socket's local
output buffer. This delay can be quite significant.

The method introduced by this patch is a more accurate measure of RTT
for stream transports, since the server can cause enough back pressure
to delay (ie increase the latency of) requests from the client.

Additionally, this patch corrects the behavior of the RDMA transport, which
entirely neglected to initialize the rq_xtime field. RPC performance
metrics for RDMA transports now display correct RPC request round trip
times.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Tom Talpey <thomas.talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
7c1d71cf56feebfb5b98219b9d11dfc3a2feca62 17-Apr-2008 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Don't disconnect more than once if retransmitting NFSv4 requests

NFSv4 requires us to ensure that we break the TCP connection before we're
allowed to retransmit a request. However in the case where we're
retransmitting several requests that have been sent on the same
connection, we need to ensure that we don't interfere with the attempt to
reconnect and/or break the connection again once it has been established.

We therefore introduce a 'connection' cookie that is bumped every time a
connection is broken. This allows requests to track if they need to force a
disconnection.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
06b4b681ababc20596aa947595714710f557131d 16-Apr-2008 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: remove XS_SENDMSG_RETRY

The condition for exiting from the loop in xs_tcp_send_request() should be
that we find we're not making progress (i.e. number of bytes sent is 0).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
b6ddf64ffe9d59577a9176856bb6fe69a539f573 18-Apr-2008 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fix up xprt_write_space()

The rest of the networking layer uses SOCK_ASYNC_NOSPACE to signal whether
or not we have someone waiting for buffer memory. Convert the SUNRPC layer
to use the same idiom.
Remove the unlikely()s in xs_udp_write_space and xs_tcp_write_space. In
fact, the most common case will be that there is nobody waiting for buffer
space.

SOCK_NOSPACE is there to tell the TCP layer whether or not the cwnd was
limited by the application window. Ensure that we follow the same idiom as
the rest of the networking layer here too.

Finally, ensure that we clear SOCK_ASYNC_NOSPACE once we wake up, so that
write_space() doesn't keep waking things up on xprt->pending.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
0dc47877a3de00ceadea0005189656ae8dc52669 06-Mar-2008 Harvey Harrison <harvey.harrison@gmail.com> net: replace remaining __FUNCTION__ occurrences

__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ff2d7db848f8db7cade39e55f78f86d77e0de01a 26-Feb-2008 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Ensure that we read all available tcp data

Don't stop until we run out of data, or we hit an error.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
33e01dc7f578813cda074ceaeaf68b0f3ffcc393 14-Jan-2008 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Clean up functions that free address_strings array

Clean up: document the rule (kfree) and the exceptions
(RPC_DISPLAY_PROTO and RPC_DISPLAY_NETID) when freeing the objects in
a transport's address_strings array.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
b454ae906085cf7774fb4756746680c9b03b6f84 08-Jan-2008 Chuck Lever <chuck.lever@oracle.com> SUNRPC: fewer conditionals in the format_ip_address routines

Clean up: have the set up routines explicitly pass the strings to be used
for the transport name and NETID. This removes a number of conditionals
and dependencies on rpc_xprt.prot, which is overloaded.

Tighten up type checking on the address_strings array while we're at it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
ba7392bb37cb12781890f45d7ddee1618e33a036 20-Dec-2007 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Add support for per-client timeout values

In order to be able to support setting the timeo and retrans parameters on
a per-mountpoint basis, we move the rpc_timeout structure into the
rpc_clnt.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2881ae74e68ecfe3b32a90936e5d93a9ba598c3a 20-Dec-2007 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Clean up the transport timeout initialisation

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
663b8858dddbc8e634476960cc1c5f69dadba9b0 02-Jan-2008 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Reconnect immediately whenever the server isn't refusing it.

If we've disconnected from the server, rather than the other way round,
then it makes little sense to wait 3 seconds before reconnecting.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
62da3b24880bccd4ffc32cf8d9a7e23fab475bdd 07-Nov-2007 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Rename xprt_disconnect()

xprt_disconnect() should really only be called when the transport shutdown
is completed, and it is time to wake up any pending tasks. Rename it to
xprt_disconnect_done() in order to reflect the semantical change.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
3ebb067d92ebe5bcfd282acf12bade891d334d07 07-Nov-2007 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Make call_status()/call_decode() call xprt_force_disconnect()

Move the calls to xprt_disconnect() over to xprt_force_disconnect() in
order to enable the transport layer to manage the state of the
XPRT_CONNECTED flag.
Ditto in xs_tcp_read_fraghdr().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
7272dcd31d56580dee7693c21e369fd167e137fe 07-Nov-2007 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: xprt_autoclose() should not call xprt_disconnect()

The transport layer should do that itself whenever appropriate.

Note that the RDMA transport already assumes that it needs to call
xprt_disconnect in xprt_rdma_close().
For TCP sockets, we want to call xprt_disconnect() only after the
connection has been closed by both ends.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
e06799f958bf7f9f8fae15f0c6f519953fb0257c 05-Nov-2007 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Use shutdown() instead of close() when disconnecting a TCP socket

By using shutdown() rather than close() we allow the RPC client to wait
for the TCP close handshake to complete before we start trying to reconnect
using the same port.
We use shutdown(SHUT_WR) only instead of shutting down both directions,
however we wait until the server has closed the connection on its side.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
ef80367071dce7d2533e79ae8f3c84ec42708dc8 31-Dec-2007 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: TCP clear XPRT_CLOSE_WAIT when the socket is closed for writes

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
3b948ae5be5e22532584113e2e02029519bbad8f 05-Nov-2007 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Allow the client to detect if the TCP connection is closed

Add an xprt->state bit to enable the TCP ->state_change() method to signal
whether or not the TCP connection is in the process of closing down.
This will to be used by the reconnection logic in a separate patch.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
67a391d72ca7efb387c30ec761a487e50a3ff085 05-Nov-2007 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fix TCP rebinding logic

Currently the TCP rebinding logic assumes that if we're not using a
reserved port, then we don't need to reconnect on the same port if a
disconnection event occurs. This breaks most RPC duplicate reply cache
implementations.

Also take into account the fact that xprt_min_resvport and
xprt_max_resvport may change while we're reconnecting, since the user may
change them at any time via the sysctls. Ensure that we check the port
boundaries every time we loop in xs_bind4/xs_bind6. Also ensure that if the
boundaries change, we only scan the ports a maximum of 2 times.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
66af1e558538137080615e7ad6d1f2f80862de01 06-Nov-2007 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fix a race in xs_tcp_state_change()

When scheduling the autoclose RPC call, we want to ensure that we don't
race against the test_bit() call in xprt_clear_locked().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
1781f7f5804e52ee2d35328b129602146a8d8254 11-Dec-2007 Herbert Xu <herbert@gondor.apana.org.au> [UDP]: Restore missing inDatagrams increments

The previous move of the the UDP inDatagrams counter caused the
counting of encapsulated packets, SUNRPC data (as opposed to call)
packets and RXRPC packets to go missing.

This patch restores all of these.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
483066d62ec583fb6379377a9bfa8d5645b91c75 24-Oct-2007 Adrian Bunk <bunk@kernel.org> SUNRPC: make sunrpc/xprtsock.c:xs_setup_{udp,tcp}() static

xs_setup_{udp,tcp}() can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
5fe4a33430d90243ff93a77ea31e20f7557bca8a 22-Nov-2007 Adrian Bunk <bunk@kernel.org> [SUNRPC]: Make xprtsock.c:xs_setup_{udp,tcp}() static

xs_setup_{udp,tcp}() can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
02b3d34631831a19ee691516e233756b270eac6d 12-Sep-2007 John Heffner <jheffner@psc.edu> [NET] Cleanup: Use sock_owned_by_user() macro

Changes asserts in sunrpc to use sock_owned_by_user() macro instead of
referencing sock_lock.owner directly.

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2199700f1d660494d2552278354422c51becb686 01-Oct-2007 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fix buggy UDP transmission

xs_sendpages() may return a negative result. We sure as hell don't want to
add that to the 'tk_bytes_sent' tally...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
1321d8d971028e796978f6a48d195c09158b3bcd 24-Sep-2007 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Fix bytes-per-op accounting for RPC over UDP

NFS performance metrics reported zero bytes sent per op when mounting with
UDP. The UDP socket transport wasn't properly counting the number of bytes
sent.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
4fa016eb248cac875541fa199af550a8aefa0e90 10-Sep-2007 \"Talpey, Thomas\ <Thomas.Talpey@netapp.com> NFS/SUNRPC: support transport protocol naming

To prepare for including non-sockets-based RPC transports, select
RPC transports by an identifier (to be used in following patches).

Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
49c36fcc441baf6a4d698e3645d1adf28edaf57b 10-Sep-2007 \"Talpey, Thomas\ <Thomas.Talpey@netapp.com> SUNRPC: rearrange RPC sockets definitions

To prepare for including non-sockets-based RPC transports, move the
sockets-dependent definitions into their own file.

Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
3c341b0b925eee01daae2c594b81e673f659d7cd 10-Sep-2007 \"Talpey, Thomas\ <Thomas.Talpey@netapp.com> SUNRPC: rename the rpc_xprtsock_create structure

To prepare for including non-sockets-based RPC transports, change the
overly suggestive name of the transport creation arguments struct.

Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
bc25571e21e8bd053554209f5b1b228ad71e6b99 10-Sep-2007 \"Talpey, Thomas\ <Thomas.Talpey@netapp.com> SUNRPC: Finish API to load RPC transport implementations dynamically

Allow RPC client transport implementations to be loaded as needed, or
as they become available from distributors or third-party vendors.

Note that we leave the IP sockets implementation in sunrpc.o
permanently, as IP functionality is always available in any
kernel that runs NFS.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
4417c8c41a51a2ae95b2a2fa2811640b368c4151 10-Sep-2007 \"Talpey, Thomas\ <Thomas.Talpey@netapp.com> SUNRPC: export per-transport rpcbind netid's

The rpcbind (v3+) netid is provided by each RPC client transport. This fixes
an omission in IPv6 rpcbind client support, and enables future extension.

Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
756805e7a76bcd2aae07fe31786fe453375e60b1 16-Aug-2007 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Add support for formatted universal addresses

"Universal addresses" are a string representation of an IP address and
port. They are described fully in RFC 3530, section 2.2. Add support
for generating them in the RPC client's socket transport module.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
8945ee5e27156ef9708bc3a11da87ba689aa38b6 06-Aug-2007 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Split xs_reclassify_socket into an IPv4 and IPv6 version

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
95392c593e13fa7546857425971f87e4ded6e0c1 06-Aug-2007 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Add a helper for extracting the address using the correct type

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
8f9d5b1a2e717fb9e0c4d2c60a224ecce905bd23 06-Aug-2007 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Add IPv6 address support to net/sunrpc/xprtsock.c

Finalize support for setting up RPC client transports to remote RPC
services addressed via IPv6.

Based on work done by Gilles Quillard at Bull Open Source.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
68e220bd5c9fc52d8029275cd42e08f573ce3600 06-Aug-2007 Chuck Lever <chuck.lever@oracle.com> SUNRPC: create connect workers for IPv6

Clone separate connect worker functions for connecting AF_INET6 sockets.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
9c3d72de28eed3e882becd7054da2118f8a73131 06-Aug-2007 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Rename IPv4 connect workers

Prepare for introduction of IPv6 versions of same.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16be2d20d999cb65daebfdaf0e560225e28fcb9d 06-Aug-2007 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Refactor a part of socket connect logic into a helper function

Finishing a socket connect is the same for IPv4 and IPv6, so split it out
into a helper.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
90058d37c30ffce0e033ea3dcc6a539111483a58 06-Aug-2007 Chuck Lever <chuck.lever@oracle.com> SUNRPC: create an IPv6-savvy mechanism for binding to a reserved port

Clone xs_bindresvport into two functions, one that can handle IPv4
addresses, and one that can handle IPv6 addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
7dc753f0391ad94868609376f37be4833671b57d 06-Aug-2007 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Rename xs_bind() to prepare for IPv6-specific bind method

Prepare for introduction of IPv6-specific socket bind function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
20612005c51a5ba1bb17902276b9216825958724 06-Aug-2007 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Introduce support for setting the port number in IPv6 addresses

We could clone xs_set_port, but this is easier overall.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
4b6473fba4e832ee1d15737bc38779501c349a61 06-Aug-2007 Chuck Lever <chuck.lever@oracle.com> SUNRPC: add a function to format IPv6 addresses

Clone xs_format_ipv4_peer_addresses into an IPv6 version.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
ba10f2c23471b2ef106eb0c71ead4e9862766b8d 06-Aug-2007 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Rename xs_format_peer_addresses

Prepare to add an IPv6 version of xs_format_peer_addresses by renaming it
to xs_format_ipv4_peer_addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
fbfe3cc677c1a62ca6472abf24d03d4bf9f03a55 06-Aug-2007 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Add hex-formatted address support to rpc_peeraddr2str()

Add support for the NFS client's need to export volume information
with IP addresses formatted in hex instead of decimal.

This isn't used yet, but subsequent patches (not in this series) will
change the NFS client to use this functionality.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
0c43b3d81cca46ab2469f8802f8bd68b49f1b2a5 06-Aug-2007 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Free address buffers in a loop

Use more generic logic to free buffers holding formatted addresses. This
makes it less likely a bug will be introduced when adding additional buffer
types in xs_format_peer_address().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
bda243df2f5beebce92bae22bc01960544783984 06-Aug-2007 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Use standard macros for printing IP addresses

include/linux/kernel.h gives us some nice macros for formatting IP
addresses. Use them.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
b595bb15061567441546be1af883b256bcdfff9c 06-Aug-2007 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Fix a signed v. unsigned comparison in net/sunrpc/xprtsock.c

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
d3bc9a1deb8964d774af8535814cb91bf8f6def0 09-Jul-2007 Frank van Maarseveen <frankvm@frankvm.com> SUNRPC client: add interface for binding to a local address

In addition to binding to a local privileged port the NFS client should
allow binding to a specific local address. This is used by the server
for callbacks. The patch adds the necessary interface.

Signed-off-by: Frank van Maarseveen <frankvm@frankvm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
96802a095171f5b35cf0e1e0d4be943e6696a253 08-Jul-2007 Frank van Maarseveen <frankvm@frankvm.com> SUNRPC: cleanup transport creation argument passing

Cleanup argument passing to functions for creating an RPC transport.

Signed-off-by: Frank van Maarseveen <frankvm@frankvm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
45160d6275814e0c86206e6981f0b92c61a50a21 01-Jul-2007 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Rename rpcb_getport to be consistent with new rpcb_getport_sync name

Clean up, for consistency. Rename rpcb_getport as rpcb_getport_async, to
match the naming scheme of rpcb_getport_sync.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
c1384c9c4c184543375b52a0997d06cd98145164 15-Jun-2007 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: fix hang due to eventd deadlock...

Brian Behlendorf writes:

The root cause of the NFS hang we were observing appears to be a rare
deadlock between the kernel provided usermodehelper API and the linux NFS
client. The deadlock can arise because both of these services use the
generic linux work queues. The usermodehelper API run the specified user
application in the context of the work queue. And NFS submits both cleanup
and reconnect work to the generic work queue for handling. Normally this
is fine but a deadlock can result in the following situation.

- NFS client is in a disconnected state
- [events/0] runs a usermodehelper app with an NFS dependent operation,
this triggers an NFS reconnect.
- NFS reconnect happens to be submitted to [events/0] work queue.
- Deadlock, the [events/0] work queue will never process the
reconnect because it is blocked on the previous NFS dependent
operation which will not complete.`

The solution is simply to run reconnect requests on rpciod.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
e9b1c9c98c051f49a76dcd76f914c02653aecccb 29-Mar-2007 Chuck Lever <chuck.lever@oracle.com> SUNRPC: switch socket-based RPC transports to use rpcbind

Now that we have a version of the portmapper that supports versions 3 and 4
of the rpcbind protocol, use it for new RPC client connections over
sockets.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
0b4d414714f0d2f922d39424b0c5c82ad900a381 14-Feb-2007 Eric W. Biederman <ebiederm@xmission.com> [PATCH] sysctl: remove insert_at_head from register_sysctl

The semantic effect of insert_at_head is that it would allow new registered
sysctl entries to override existing sysctl entries of the same name. Which is
pain for caching and the proc interface never implemented.

I have done an audit and discovered that none of the current users of
register_sysctl care as (excpet for directories) they do not register
duplicate sysctl entries.

So this patch simply removes the support for overriding existing entries in
the sys_sysctl interface since no one uses it or cares and it makes future
enhancments harder.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Corey Minyard <minyard@acm.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2b1bec5f52fec033ed0026e7d85f641e20e1cbb9 14-Feb-2007 Eric W. Biederman <ebiederm@xmission.com> [PATCH] sysctl: sunrpc: don't unnecessarily set ctl_table->de

We don't need this to prevent module unload races so remove the unnecessary
code.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7e35280e517c28b991667a608990227503dd2a30 14-Feb-2007 Eric W. Biederman <ebiederm@xmission.com> [PATCH] sysctl: sunrpc: remove unnecessary insert_at_head flag

Because the sunrpc sysctls don't conflict with any other sysctls the setting
the insert at head flag to register_sysctl has no semantic meaning.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cd354f1ae75e6466a7e31b727faede57a1f89ca5 14-Feb-2007 Tim Schmielau <tim@physik3.uni-rostock.de> [PATCH] remove many unneeded #includes of sched.h

After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there. Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.

To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.

Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm. I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
46121cf7d85869bfe9588bac7ccf55aa0bc7f278 31-Jan-2007 Chuck Lever <chuck.lever@oracle.com> SUNRPC: fix print format for tk_pid

The tk_pid field is an unsigned short. The proper print format specifier for
that type is %5u, not %4d.

Also clean up some miscellaneous print formatting nits.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
ed07536ed6731775219c1df7fa26a7588753e693 07-Dec-2006 Peter Zijlstra <a.p.zijlstra@chello.nl> [PATCH] lockdep: annotate nfs/nfsd in-kernel sockets

Stick NFS sockets in their own class to avoid some lockdep warnings. NFS
sockets are never exposed to user-space, and will hence not trigger certain
code paths that would otherwise pose deadlock scenarios.

[akpm@osdl.org: cleanups]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Steven Dickson <SteveD@redhat.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
[ Fixed patch corruption by quilt, pointed out by Peter Zijlstra ]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fbf76683ff9d1462ec0b2f90ec6ea4793652318c 05-Dec-2006 Chuck Lever <chuck.lever@oracle.com> SUNRPC: relocate the creation of socket-specific tunables

Clean-up:

The RPC client currently creates some sysctls that are specific to the
socket transport. Move those entirely into xprtsock.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
282b32e17f64be2204f1ac96d7f40f92cb768cd7 05-Dec-2006 Chuck Lever <chuck.lever@oracle.com> SUNRPC: create stubs for xprtsock init and cleanup

Over time we will want to add some specific init and cleanup logic for the
xprtsock implementation. Add stub routines for initialization and exit
processing.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
dd4564715eae2c4136f278da9ae1c3bb5af3e509 05-Dec-2006 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Rename skb_reader_t and friends

Clean-up: hch suggested that the RPC client shouldn't pollute the name
space used by the generic skb manipulation routines in net/core/skbuff.c.

Rename a couple of types in xdr.h to adhere to this convention.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
9d29231690925915015c21c1fff73c7118099843 05-Dec-2006 Chuck Lever <chuck.lever@oracle.com> SUNRPC: skb_read_bits is the same as xs_tcp_copy_data

Clean-up: eliminate xs_tcp_copy_data -- it's exactly the same logic as the
common routine skb_read_bits. The UDP and TCP socket read code now share
the same routine for copying data into an xdr_buf.

Now that skb_read_bits() is exported, rename it to avoid confusing it with
a generic skb_* function. As these functions are XDR-specific, they should
not have names that suggest they are of generic use. Also rename
skb_read_and_csum_bits() to be consistent.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
7559c7a28fbcaa0bca028eeebd5f251b09befe6b 05-Dec-2006 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Make address format buffers more generic

For now we will assume that all transports will use the address format
buffers in the rpc_xprt struct to store their addresses. Change
rpc_peer2str() to be a generic routine to handle this, and get rid of the
print_address() op in the rpc_xprt_ops vector.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
314dfd7987c71d7ba0c43ac3bf3d243c102ce025 05-Dec-2006 Chuck Lever <chuck.lever@oracle.com> SUNRPC: move saved socket callback functions to a private data structure

Move the three fields for saving socket callback functions out of the
rpc_xprt structure and into a private data structure maintained in
net/sunrpc/xprtsock.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
7c6e066ec29290bf062f5bff2984bad9be5809c7 05-Dec-2006 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Move the UDP socket bufsize parameters to a private data structure

Move the socket-specific buffer size parameters for UDP sockets to a
private data structure maintained in net/sunrpc/xprtsock.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
c8475461829fd94f30208fbfa4eab7e5584c6495 05-Dec-2006 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Move rpc_xprt socket connect fields into private data structure

Move the socket-specific connection management fields out of the generic
rpc_xprt structure into a private data structure maintained in
net/sunrpc/xprtsock.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
e136d0926ef6a048f6e65b35263c0a9faae3abbe 05-Dec-2006 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Move TCP state flags into xprtsock.c

Move "XPRT_LAST_FRAG" and friends from xprt.h into xprtsock.c, and rename
them to use the naming scheme in use in xprtsock.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
51971139b2342fa1005e87bbfcb52305da3fe891 05-Dec-2006 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Move TCP receive state variables into private data structure

Move the TCP receive state variables from the generic rpc_xprt structure to
a private structure maintained inside net/sunrpc/xprtsock.c.

Also rename a function/variable pair to refer to RPC fragment headers
instead of record markers, to be consistent with types defined in
sunrpc/*.h.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
ee0ac0c227c2a2b6dd1b33c23831100ee895dacf 05-Dec-2006 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Remove sock and inet fields from rpc_xprt

The "sock" and "inet" fields are socket-specific. Move them to a private
data structure maintained entirely within net/sunrpc/xprtsock.c

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
ffc2e518c91942b7ed45fb0ab7deba1ba0c8594a 05-Dec-2006 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Allocate a private data area for socket-specific rpc_xprt fields

When setting up a new transport instance, allocate enough memory for an
rpc_xprt and a private area. As part of the same memory allocation, it
will be easy to find one, given a pointer to the other.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
c8541ecdd5692bcfbcb5305cab9a873288d29175 17-Oct-2006 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Make the transport-specific setup routine allocate rpc_xprt

Change the location where the rpc_xprt structure is allocated so each
transport implementation can allocate a private area from the same
chunk of memory.

Note also that xprt->ops->destroy, rather than xprt_destroy, is now
responsible for freeing rpc_xprt when the transport is destroyed.

Test plan:
Connectathon.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
24c5684b65ff52ebfa942e8086d91a4966121ae7 17-Oct-2006 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Clean up xs_send_pages()

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
65f27f38446e1976cc98fd3004b110fedcddd189 22-Nov-2006 David Howells <dhowells@redhat.com> WorkStruct: Pass the work_struct pointer instead of context data

Pass the work_struct pointer to the work function rather than context data.
The work function can use container_of() to work out the data.

For the cases where the container of the work_struct may go away the moment the
pending bit is cleared, it is made possible to defer the release of the
structure by deferring the clearing of the pending bit.

To make this work, an extra flag is introduced into the management side of the
work_struct. This governs auto-release of the structure upon execution.

Ordinarily, the work queue executor would release the work_struct for further
scheduling or deallocation by clearing the pending bit prior to jumping to the
work function. This means that, unless the driver makes some guarantee itself
that the work_struct won't go away, the work function may not access anything
else in the work_struct or its container lest they be deallocated.. This is a
problem if the auxiliary data is taken away (as done by the last patch).

However, if the pending bit is *not* cleared before jumping to the work
function, then the work function *may* access the work_struct and its container
with no problems. But then the work function must itself release the
work_struct by calling work_release().

In most cases, automatic release is fine, so this is the default. Special
initiators exist for the non-auto-release case (ending in _NAR).


Signed-Off-By: David Howells <dhowells@redhat.com>
52bad64d95bd89e08c49ec5a071fa6dcbe5a1a9c 22-Nov-2006 David Howells <dhowells@redhat.com> WorkStruct: Separate delayable and non-delayable events.

Separate delayable work items from non-delayable work items be splitting them
into a separate structure (delayed_work), which incorporates a work_struct and
the timer_list removed from work_struct.

The work_struct struct is huge, and this limits it's usefulness. On a 64-bit
architecture it's nearly 100 bytes in size. This reduces that by half for the
non-delayable type of event.

Signed-Off-By: David Howells <dhowells@redhat.com>
b7766da7f7395b74dec9e52005b7dac0d09391a4 20-Oct-2006 Chuck Lever <chuck.lever@oracle.com> [PATCH] SUNRPC: fix a typo

Yes, this actually passed tests the way it was.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
d8ed029d6000ba2e2908d9286409e4833c091b4c 27-Sep-2006 Alexey Dobriyan <adobriyan@gmail.com> [SUNRPC]: trivial endianness annotations

pure s/u32/__be32/

[AV: large part based on Alexey's patches]

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
ff9aa5e56df60cc8565a93cc868fe25ae3f20e49 23-Aug-2006 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Eliminate xprt_create_proto and rpc_create_client

The two function call API for creating a new RPC client is now obsolete.
Remove it.

Also, remove an unnecessary check to see whether the caller is capable of
using privileged network services. The kernel RPC client always uses a
privileged ephemeral port by default; callers are responsible for checking
the authority of users to make use of any RPC service, or for specifying
that a nonprivileged port is acceptable.

Test plan:
Repeated runs of Connectathon locking suite. Check network trace to ensure
correctness of NLM requests and replies.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
c4efcb1d3e0bc76aeb9ca6301d19a5079893c6c9 23-Aug-2006 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Use "sockaddr_storage" for storing RPC client's remote peer address

IPv6 addresses are big (128 bytes). Now that no RPC client consumers treat
the addr field in rpc_xprt structs as an opaque, and access it only via the
API calls, we can safely widen the field in the rpc_xprt struct to
accomodate larger addresses.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
edb267a688fcee5335d596752f117a30c7152e44 23-Aug-2006 Chuck Lever <chuck.lever@oracle.com> SUNRPC: add xprt switch API for printing formatted remote peer addresses

Add a new method to the transport switch API to provide a way to convert
the opaque contents of xprt->addr to a human-readable string.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
bbf7c1dd2ae2b4040b41b1065ee9b1b6905b1605 23-Aug-2006 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Introduce transport switch callout for pluggable rpcbind

Introduce a clean transport switch API for plugging in different types of
rpcbind mechanisms. For instance, rpcbind can cleanly replace the
existing portmapper client, or a transport can choose to implement RPC
binding any way it likes.

Test plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked.
Probably need to rig a server where certain services aren't running, or
that returns an error for some typical operation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
ec739ef03dc926d05051c8c5838971445504470a 23-Aug-2006 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Create a helper to tell whether a transport is bound

Hide the contents and format of xprt->addr by eliminating direct uses
of the xprt->addr.sin_port field. This change is required to support
alternate RPC host address formats (eg IPv6).

Test-plan:
Destructive testing (unplugging the network temporarily). Repeated runs of
Connectathon locking suite with UDP and TCP.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
53fad3cbff120d8987f377eff374cf4db4ecb177 10-Aug-2006 Sridhar Samudrala <sri@us.ibm.com> [SUNRPC]: Remove the unnecessary check for highmem in xs_sendpages().

Just call kernel_sendpage() directly.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e6242e928ef1e4ed853f909a7479e4934f4bcb70 08-Aug-2006 Sridhar Samudrala <sri@us.ibm.com> [SUNRPC]: Update to use in-kernel sockets API.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
e0ab53deaa91293a7958d63d5a2cf4c5645ad6f0 27-Jul-2006 Trond Myklebust <Trond.Myklebust@netapp.com> RPC: Ensure that we disconnect TCP socket when client requests error out

If we're part way through transmitting a TCP request, and the client
errors, then we need to disconnect and reconnect the TCP socket in order to
avoid confusing the server.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 031a50c8b9ea82616abd4a4e18021a25848941ce commit)
0da974f4f303a6842516b764507e3c0a03f41e5a 21-Jul-2006 Panagiotis Issaris <takis@issaris.org> [NET]: Conversions from kmalloc+memset to k(z|c)alloc.

Signed-off-by: Panagiotis Issaris <takis@issaris.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
b85d88068444ae5dcb1639bcef770ccbf085dd4e 25-May-2006 Chuck Lever <cel@netapp.com> SUNRPC: select privileged port numbers at random

Make the RPC client select privileged ephemeral source ports at
random. This improves DRC behavior on the server by using the
same port when reconnecting for the same mount point, but using
a different port for fresh mounts.

The Linux TCP implementation already does this for nonprivileged
ports. Note that TCP sockets in TIME_WAIT will prevent quick reuse
of a random ephemeral port number by leaving the port INUSE until
the connection transitions out of TIME_WAIT.

Test plan:
Connectathon against every known server implementation using multiple
mount points. Locking especially.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
ef759a2e54ed434b2f72b52a14edecd6d4eadf74 20-Mar-2006 Chuck Lever <cel@netapp.com> SUNRPC: introduce per-task RPC iostats

Account for various things that occur while an RPC task is executed.
Separate timers for RPC round trip and RPC execution time show how
long RPC requests wait in queue before being sent. Eventually these
will be accumulated at xprt_release time in one place where they can
be viewed from userland.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
262ca07de4d7f1bff20361c1353bb14b3607afb2 20-Mar-2006 Chuck Lever <cel@netapp.com> SUNRPC: add a handful of per-xprt counters

Monitor generic transport events. Add a transport switch callout to
format transport counters for export to user-land.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
632e3bdc5006334cea894d078660b691685e1075 03-Jan-2006 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Ensure client closes the socket when server initiates a close

If the server decides to close the RPC socket, we currently don't actually
respond until either another RPC call is scheduled, or until xprt_autoclose()
gets called by the socket expiry timer (which may be up to 5 minutes
later).

This patch ensures that xprt_autoclose() is called much sooner if the
server closes the socket.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
922004120b10dcb0ce04b55014168e8a7a8c1a0e 03-Jan-2006 Chuck Lever <cel@netapp.com> SUNRPC: transport switch API for setting port number

At some point, transport endpoint addresses will no longer be IPv4. To hide
the structure of the rpc_xprt's address field from ULPs and port mappers,
add an API for setting the port number during an RPC bind operation.

Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked.
Probably need to rig a server where certain services aren't running, or
that returns an error for some typical operation.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
02107148349f31eee7c0fb06fd7a880df73dbd20 03-Jan-2006 Chuck Lever <cel@netapp.com> SUNRPC: switchable buffer allocation

Add RPC client transport switch support for replacing buffer management
on a per-transport basis.

In the current IPv4 socket transport implementation, RPC buffers are
allocated as needed for each RPC message that is sent. Some transport
implementations may choose to use pre-allocated buffers for encoding,
sending, receiving, and unmarshalling RPC messages, however. For
transports capable of direct data placement, the buffers can be carved
out of a pre-registered area of memory rather than from a slab cache.

Test-plan:
Millions of fsx operations. Performance characterization with "sio" and
"iozone". Use oprofile and other tools to look for significant regression
in CPU utilization.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
b079fa7baa86b47579f3f60f86d03d21c76159b8 13-Dec-2005 Trond Myklebust <Trond.Myklebust@netapp.com> RPC: Do not block on skb allocation

If we get something like the following,
[ 125.300636] [<c04086e1>] schedule_timeout+0x54/0xa5
[ 125.305931] [<c040866e>] io_schedule_timeout+0x29/0x33
[ 125.311495] [<c02880c4>] blk_congestion_wait+0x70/0x85
[ 125.317058] [<c014136b>] throttle_vm_writeout+0x69/0x7d
[ 125.322720] [<c014714d>] shrink_zone+0xe0/0xfa
[ 125.327560] [<c01471d4>] shrink_caches+0x6d/0x6f
[ 125.332581] [<c01472a6>] try_to_free_pages+0xd0/0x1b5
[ 125.338056] [<c013fa4b>] __alloc_pages+0x135/0x2e8
[ 125.343258] [<c03b74ad>] tcp_sendmsg+0xaa0/0xb78
[ 125.348281] [<c03d4666>] inet_sendmsg+0x48/0x53
[ 125.353212] [<c0388716>] sock_sendmsg+0xb8/0xd3
[ 125.358147] [<c0388773>] kernel_sendmsg+0x42/0x4f
[ 125.363259] [<c038bc00>] sock_no_sendpage+0x5e/0x77
[ 125.368556] [<c03ee7af>] xs_tcp_send_request+0x2af/0x375
then the socket is blocked until memory is reclaimed, and no
progress can ever be made.

Try to access the emergency pools by using GFP_ATOMIC.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
c556b754967afd0878d65de2cfe0675577b0f62f 01-Nov-2005 Chuck Lever <cel@netapp.com> SUNRPC: allow sunrpc.o to link when CONFIG_SYSCTL is disabled

The sunrpc module should build properly even when CONFIG_SYSCTL is
disabled.

Reported by Jan-Benedict Glaw.

Test plan:
Compile kernel with CONFIG_NFS as a module and built-in, and CONFIG_SYSCTL
enabled and disabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
470056c288334eb0b37be26c9ff8aee37ed1cc7a 26-Aug-2005 Chuck Lever <cel@netapp.com> [PATCH] RPC: rationalize set_buffer_size

In fact, ->set_buffer_size should be completely functionless for non-UDP.

Test-plan:
Check socket buffer size on UDP sockets over time.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
03bf4b707eee06706c9db343dd5c905b7ee47ed2 26-Aug-2005 Chuck Lever <cel@netapp.com> [PATCH] RPC: parametrize various transport connect timeouts

Each transport implementation can now set unique bind, connect,
reestablishment, and idle timeout values. These are variables,
allowing the values to be modified dynamically. This permits
exponential backoff of any of these values, for instance.

As an example, we implement exponential backoff for the connection
reestablishment timeout.

Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
3167e12c0c424f3c323944701615343022d86418 26-Aug-2005 Chuck Lever <cel@netapp.com> [PATCH] RPC: make sure to get the same local port number when reconnecting

Implement a best practice: if the remote end drops our connection, try to
reconnect using the same port number. This is important because the NFS
server's Duplicate Reply Cache often hashes on the source port number.
If the client reuses the port number when it reconnects, the server's DRC
will be more effective.

Based on suggestions by Mike Eisler, Olaf Kirch, and Alexey Kuznetsky.

Test-plan:
Destructive testing.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
529b33c6db0120126b1381faa51406dc463acdc9 26-Aug-2005 Chuck Lever <cel@netapp.com> [PATCH] RPC: allow RPC client's port range to be adjustable

Select an RPC client source port between 650 and 1023 instead of between
1 and 800. The old range conflicts with a number of network services.
Provide sysctls to allow admins to select a different port range.

Note that this doesn't affect user-level RPC library behavior, which
still uses 1 to 800.

Based on a suggestion by Olaf Kirch <okir@suse.de>.

Test-plan:
Repeated mount and unmount. Destructive testing. Idle timeouts.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
555ee3af161b037865793bd4bebc06b58daafde6 26-Aug-2005 Chuck Lever <cel@netapp.com> [PATCH] RPC: clean up after nocong was removed

Clean-up: Move some macros that are specific to the Van Jacobson
implementation into xprt.c. Get rid of the cong_wait field in
rpc_xprt, which is no longer used. Get rid of xprt_clear_backlog.

Test-plan:
Compile with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
ed63c003701a314c4893c11eceb9d68f8f46c662 26-Aug-2005 Chuck Lever <cel@netapp.com> [PATCH] RPC: remove xprt->nocong

Get rid of the "xprt->nocong" variable.

Test-plan:
Use WAN simulation to cause sporadic bursty packet loss with UDP mounts.
Look for significant regression in performance or client stability.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
a58dd398f5db4f73d5c581069fd70a4304cc4f0a 26-Aug-2005 Chuck Lever <cel@netapp.com> [PATCH] RPC: add a release_rqst callout to the RPC transport switch

The final place where congestion control state is adjusted is in
xprt_release, where each request is finally released. Add a callout
there to allow transports to perform additional processing when a
request is about to be released.

Test-plan:
Use WAN simulation to cause sporadic bursty packet loss. Look for significant
regression in performance or client stability.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
1570c1e41eabf6b7031f3e4322a2cf1cbe319fee 26-Aug-2005 Chuck Lever <cel@netapp.com> [PATCH] RPC: add generic interface for adjusting the congestion window

A new interface that allows transports to adjust their congestion window
using the Van Jacobson implementation in xprt.c is provided.

Test-plan:
Use WAN simulation to cause sporadic bursty packet loss. Look for
significant regression in performance or client stability.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
46c0ee8bc4ad3743de05e8b8b20201df44dcb6d3 26-Aug-2005 Chuck Lever <cel@netapp.com> [PATCH] RPC: separate xprt_timer implementations

Allow transports to hook the retransmit timer interrupt. Some transports
calculate their congestion window here so that a retransmit timeout has
immediate effect on the congestion window.

Test-plan:
Use WAN simulation to cause sporadic bursty packet loss. Look for significant
regression in performance or client stability.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
49e9a89086b3cae784a4868ca852863e4f4ea3fe 26-Aug-2005 Chuck Lever <cel@netapp.com> [PATCH] RPC: expose API for serializing access to RPC transports

The next method we abstract is the one that releases a transport,
allowing another task to have access to the transport.

Again, one generic version of this is provided for transports that
don't need the RPC client to perform congestion control, and one
version is for transports that can use the original Van Jacobson
implementation in xprt.c.

Test-plan:
Use WAN simulation to cause sporadic bursty packet loss. Look for
significant regression in performance or client stability.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12a804698b29d040b7cdd92e8a44b0e75164dae9 26-Aug-2005 Chuck Lever <cel@netapp.com> [PATCH] RPC: expose API for serializing access to RPC transports

The next several patches introduce an API that allows transports to
choose whether the RPC client provides congestion control or whether
the transport itself provides it.

The first method we abstract is the one that serializes access to the
RPC transport to prevent the bytes from different requests from mingling
together. This method provides proper request serialization and the
opportunity to prevent new requests from being started because the
transport is congested.

The normal situation is for the transport to handle congestion control
itself. Although NFS over UDP was first, it has been recognized after
years of experience that having the transport provide congestion control
is much better than doing it in the RPC client. Thus TCP, and probably
every future transport implementation, will use the default method,
xprt_lock_write, provided in xprt.c, which does not provide any kind
of congestion control. UDP can continue using the xprt.c-provided
Van Jacobson congestion avoidance implementation.

Test-plan:
Use WAN simulation to cause sporadic bursty packet loss. Look for significant
regression in performance or client stability.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
fe3aca290f17ae4978bd73d02aa4029f1c9c024c 26-Aug-2005 Chuck Lever <cel@netapp.com> [PATCH] RPC: add API to set transport-specific timeouts

Prepare the way to remove the "xprt->nocong" variable by adding a callout
to the RPC client transport switch API to handle setting RPC retransmit
timeouts.

Add a pair of generic helper functions that provide the ability to set a
simple fixed timeout, or to set a timeout based on the state of a round-
trip estimator.

Test-plan:
Use WAN simulation to cause sporadic bursty packet loss. Look for significant
regression in performance or client stability.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
43118c29dea2b23798bd42a147015cceee7fa885 26-Aug-2005 Chuck Lever <cel@netapp.com> [PATCH] RPC: get rid of xprt->stream

Now we can fix up the last few places that use the "xprt->stream"
variable, and get rid of it from the rpc_xprt structure.

Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
808012fbb23a52ec59352445d2076d175ad4ab26 26-Aug-2005 Chuck Lever <cel@netapp.com> [PATCH] RPC: skip over transport-specific heads automatically

Add a generic mechanism for skipping over transport-specific headers
when constructing an RPC request. This removes another "xprt->stream"
dependency.

Test-plan:
Write-intensive workload on a single mount point (try both UDP and
TCP).

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
262965f53defd312a294b45366ea17907b6a616b 11-Aug-2005 Chuck Lever <cel@citi.umich.edu> [PATCH] RPC: separate TCP and UDP socket write paths

Split the RPC client's main socket write path into a TCP version and a UDP
version to eliminate another dependency on the "xprt->stream" variable.

Compiler optimization removes unneeded code from xs_sendpages, as this
function is now called with some constant arguments.

We can now cleanly perform transport protocol-specific return code testing
and error recovery in each path.

Test-plan:
Millions of fsx operations. Performance characterization such as
"sio" or "iozone". Examine oprofile results for any changes before and
after this patch is applied.

Version: Thu, 11 Aug 2005 16:08:46 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
b0d93ad511ce2f37823a07c7a3258117a431f5fb 11-Aug-2005 Chuck Lever <cel@citi.umich.edu> [PATCH] RPC: separate TCP and UDP transport connection logic

Create separate connection worker functions for managing UDP and TCP
transport sockets. This eliminates several dependencies on "xprt->stream".

Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon with
v2, v3, and v4.

Version: Thu, 11 Aug 2005 16:08:18 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
c7b2cae8a634015b72941ba2fc6c4bc9b8d3a129 11-Aug-2005 Chuck Lever <cel@citi.umich.edu> [PATCH] RPC: separate TCP and UDP write space callbacks

Split the socket write space callback function into a TCP version and UDP
version, eliminating one dependence on the "xprt->stream" variable.

Keep the common pieces of this path in xprt.c so other transports can use
it too.

Test-plan:
Write-intensive workload on a single mount point.

Version: Thu, 11 Aug 2005 16:07:51 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
55aa4f58aa43dc9a51fb80010630d94b96053a2e 11-Aug-2005 Chuck Lever <cel@citi.umich.edu> [PATCH] RPC: client-side transport switch cleanup

Clean-up: change some comments to reflect the realities of the new RPC
transport switch mechanism. Get rid of unused xprt_receive() prototype.

Also, organize function prototypes in xprt.h by usage and scope.

Test-plan:
Compile kernel with CONFIG_NFS enabled.

Version: Thu, 11 Aug 2005 16:07:21 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
44fbac2288dfed6f1963ac00bf922c3bcd779cd1 11-Aug-2005 Chuck Lever <cel@citi.umich.edu> [PATCH] RPC: Add helper for waking tasks pending on a transport

Clean-up: remove only reference to xprt->pending from the socket transport
implementation. This makes a cleaner interface for other transport
implementations as well.

Test-plan:
Compile kernel with CONFIG_NFS enabled.

Version: Thu, 11 Aug 2005 16:06:52 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2226feb6bcd0e5e117a9be3ea3dd3ffc14f3e41e 11-Aug-2005 Chuck Lever <cel@citi.umich.edu> [PATCH] RPC: rename the sockstate field

Clean-up: get rid of a name reference to sockets in the generic parts of the
RPC client by renaming the sockstate field in the rpc_xprt structure.

Test-plan:
Compile kernel with CONFIG_NFS enabled.

Version: Thu, 11 Aug 2005 16:05:53 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
4a0f8c04f2ece949d54a0c4fd7490259cf23a58a 11-Aug-2005 Chuck Lever <cel@citi.umich.edu> [PATCH] RPC: Rename sock_lock

Clean-up: replace a name reference to sockets in the generic parts of the RPC
client by renaming sock_lock in the rpc_xprt structure.

Test-plan:
Compile kernel with CONFIG_NFS enabled.

Version: Thu, 11 Aug 2005 16:05:00 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
b4b5cc85ed4ecbe4adbfbc4df028850de67a9f09 11-Aug-2005 Chuck Lever <cel@citi.umich.edu> [PATCH] RPC: Reduce stack utilization in xs_sendpages

Reduce stack utilization of the RPC socket transport's send path.

A couple of unlikely()s are added to ensure the compiler places the
tail processing at the end of the csect.

Test-plan:
Millions of fsx operations. Performance characterization such as "sio" or
"iozone".

Version: Thu, 11 Aug 2005 16:04:30 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
9903cd1c27a1f30e8efea75e125be3b2002f7cb9 11-Aug-2005 Chuck Lever <cel@citi.umich.edu> [PATCH] RPC: transport switch function naming

Introduce block header comments and a function naming convention to the
socket transport implementation. Provide a debug setting for transports
that is separate from RPCDBG_XPRT. Eliminate xprt_default_timeout().

Provide block comments for exposed interfaces in xprt.c, and eliminate
the useless obvious comments.

Convert printk's to dprintk's.

Test-plan:
Compile kernel with CONFIG_NFS enabled.

Version: Thu, 11 Aug 2005 16:04:04 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
a246b0105bbd9a70a698f69baae2042996f2a0e9 11-Aug-2005 Chuck Lever <cel@citi.umich.edu> [PATCH] RPC: introduce client-side transport switch

Move the bulk of client-side socket-specific code into a separate source
file, net/sunrpc/xprtsock.c.

Test-plan:
Millions of fsx operations. Performance characterization such as "sio" or
"iozone". Destructive testing (unplugging the network temporarily, server
reboots). Connectathon with v2, v3, and v4.

Version: Thu, 11 Aug 2005 16:03:38 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>