History log of /net/sunrpc/xprt.c
Revision Date Author Comments
33d90ac0581ce81d1ebfc51918a2757e41a6011c 11-Apr-2013 J. Bruce Fields <bfields@redhat.com> SUNRPC: allow disabling idle timeout

In the gss-proxy case we don't want to have to reconnect at random--we
want to connect only on gss-proxy startup when we can steal gss-proxy's
context to do the connect in the right namespace.

So, provide a flag that allows the rpc_create caller to turn off the
idle timeout.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
ba60eb25ff6be6f8e60488cdfd454e5c612bce60 14-Apr-2013 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fix a livelock problem in the xprt->backlog queue

This patch ensures that we throttle new RPC requests if there are
requests already waiting in the xprt->backlog queue. The reason for
doing this is to fix livelock issues that can occur when an existing
(high priority) task is waiting in the backlog queue, gets woken up
by xprt_free_slot(), but a new task then steals the slot.
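
The throttling idea can be sketched in user-space C. This is only an illustration under stated assumptions: `struct toy_slot_xprt`, `toy_alloc_slot_new()` and `toy_free_slot()` are hypothetical names, and the real patch works on struct rpc_xprt and its wait queues rather than on plain counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified transport: not the kernel's struct rpc_xprt. */
struct toy_slot_xprt {
    unsigned int num_reqs;   /* slots currently in use */
    unsigned int max_reqs;   /* slot limit */
    unsigned int backlog;    /* tasks already waiting for a slot */
};

/* A brand-new request may take a slot only if nobody is already queued. */
bool toy_alloc_slot_new(struct toy_slot_xprt *x)
{
    assert(x != NULL);
    if (x->backlog > 0 || x->num_reqs >= x->max_reqs) {
        x->backlog++;        /* throttle: queue up behind the waiters */
        return false;
    }
    x->num_reqs++;
    return true;
}

/* Freeing a slot passes it to a waiter, if any, instead of releasing it. */
void toy_free_slot(struct toy_slot_xprt *x)
{
    assert(x != NULL && x->num_reqs > 0);
    if (x->backlog > 0)
        x->backlog--;        /* slot goes to a waiter; num_reqs unchanged */
    else
        x->num_reqs--;
}
```

With one waiter queued and a free slot available, a new caller is still refused, so the woken waiter cannot lose the slot to a newcomer.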

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
a9a6b52ee1baa865283a91eb8d443ee91adfca56 22-Feb-2013 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Don't start the retransmission timer when out of socket space

If the socket is full, we're better off just waiting until it empties,
or until the connection is broken. The reason why we generally don't
want to time out is that the call to xprt->ops->release_xprt() will
trigger a connection reset, which isn't helpful...

Let's make an exception for soft RPC calls, since they have to provide
timeout guarantees.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
ad2368d6f5ec6467b9503176e9fb878daf999629 08-Jan-2013 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Avoid RCU dereferences in the transport bind and connect code

Avoid an RCU dereference by removing task->tk_xprt

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
45bc0dce9879505d6fd9ff68dcd0359fb260dfd7 08-Jan-2013 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fix an RCU dereference in xprt_reserve

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
6a24dfb645dbcb05b34d08b991d082bdaa3ff072 08-Jan-2013 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Pass pointers to struct rpc_xprt to the congestion window

Avoid access to task->tk_xprt

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
1b092092bf0e2e8b7af1c2a03f615b4e60b05d47 08-Jan-2013 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Pass a pointer to struct rpc_xprt to the connect callback

Avoid another RCU dereference by passing the pointer to struct rpc_xprt
from the caller.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
a4f0835c604f80f945ab3e72ffd00547145c4b2b 08-Jan-2013 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Eliminate task->tk_xprt accesses that bypass rcu_dereference()

tk_xprt is just a shortcut for tk_client->cl_xprt, however cl_xprt is
defined as an __rcu variable. Replace dereferences of tk_xprt with
non-rcu dereferences where it is safe to do so.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
87ed50036b866db2ec2ba16b2a7aec4a2b0b7c39 07-Jan-2013 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Ensure we release the socket write lock if the rpc_task exits early

If the rpc_task exits while holding the socket write lock before it has
allocated an rpc slot, then the usual mechanism for releasing the write
lock in xprt_release() is defeated.

The problem occurs if the call to xprt_lock_write() initially fails, so
that the rpc_task is put on the xprt->sending wait queue. If the task
exits after being assigned the lock by __xprt_lock_write_func, but
before it has retried the call to xprt_lock_and_alloc_slot(), then
it calls xprt_release() while holding the write lock, but will
immediately exit due to the test for task->tk_rqstp != NULL.
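
A minimal user-space sketch of the failure mode and the fix, assuming hypothetical stand-in types (`struct toy_lock_task`, `struct toy_lock_xprt`) rather than the kernel's rpc_task and rpc_xprt: the release path must drop the write lock even when no slot was ever allocated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for rpc_task and rpc_xprt. */
struct toy_lock_task {
    void *rqstp;                      /* NULL until a slot is allocated */
};

struct toy_lock_xprt {
    struct toy_lock_task *snd_task;   /* current holder of the write lock */
};

/* Release path that also covers a task exiting before it got a slot. */
void toy_xprt_release(struct toy_lock_xprt *xprt, struct toy_lock_task *task)
{
    assert(xprt != NULL && task != NULL);
    if (task->rqstp == NULL) {
        /* No slot yet, but the task may already own the write lock. */
        if (xprt->snd_task == task)
            xprt->snd_task = NULL;
        return;
    }
    /* Normal path: give back the slot, then the lock if this task holds it. */
    task->rqstp = NULL;
    if (xprt->snd_task == task)
        xprt->snd_task = NULL;
}
```

Without the ownership check in the early-exit branch, the lock would stay assigned to a task that no longer exists.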

Reported-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.1]
d19751e7b9bd8a01d00372325439589886674f79 11-Sep-2012 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Get rid of the redundant xprt->shutdown bit field

It is only set after everyone has dereferenced the transport,
and serves no useful purpose: setting it is racy, so all the
socket code, etc still needs to be able to cope with the cases
where they miss reading it.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
f39c1bfb5a03e2d255451bff05be0d7255298fa4 07-Sep-2012 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fix a UDP transport regression

Commit 43cedbf0e8dfb9c5610eb7985d5f21263e313802 (SUNRPC: Ensure that
we grab the XPRT_LOCK before calling xprt_alloc_slot) is causing
hangs in the case of NFS over UDP mounts.

Since neither the UDP nor the RDMA transport mechanism uses dynamic slot
allocation, we can skip grabbing the socket lock for those transports.
Add a new rpc_xprt_op to allow switching between the TCP and UDP/RDMA
case.

Note that the NFSv4.1 back channel assigns the slot directly
through rpc_run_bc_task, so we can ignore that case.

Reported-by: Dick Streefland <dick.streefland@altium.nl>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.1]
2c53040f018b6c36a46eec75b9b937aaa5f78e6d 10-Jul-2012 Ben Hutchings <bhutchings@solarflare.com> net: Fix (nearly-)kernel-doc comments for various functions

Fix incorrect start markers, wrapped summary lines, missing section
breaks, incorrect separators, and some name mismatches.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1afeaf5c29aa07db25760d2fbed5c08a3aec3498 19-May-2012 Trond Myklebust <Trond.Myklebust@netapp.com> sunrpc: fix loss of task->tk_status after rpc_delay call in xprt_alloc_slot

xprt_alloc_slot will call rpc_delay() to make the task wait a bit before
retrying when it gets back an -ENOMEM error from xprt_dynamic_alloc_slot.
The problem is that rpc_delay will clear the task->tk_status, causing
call_reserveresult to abort the task.

The solution is simply to let call_reserveresult handle the ENOMEM error
directly.

Reported-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org [>= 3.1]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
6b34309936ed5c85cbe5868655814065f42c2f38 16-May-2012 Jeff Layton <jlayton@redhat.com> sunrpc: suppress page allocation warnings in xprt_alloc_slot()

It's easily possible for these allocations to fail since we're using
GFP_NOWAIT here. We don't want to spam the logs with warnings about
that though.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
95c961747284a6b83a5e2d81240e214b0fa3464d 15-Apr-2012 Eric Dumazet <eric.dumazet@gmail.com> net: cleanup unsigned to unsigned int

Use of "unsigned int" is preferred to bare "unsigned" in net tree.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4e0038b6b246e4145fc4a53dca61a556d17bc52c 01-Mar-2012 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Move clnt->cl_server into struct rpc_xprt

When the cl_xprt field is updated, the cl_server field will also have
to change. Since the contents of cl_server follow the remote endpoint
of cl_xprt, just move that field to the rpc_xprt.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ cel: simplify check_gss_callback_principal(), whitespace changes ]
[ cel: forward ported to 3.4 ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
0a702195234eb77c4097148285cccf7f095de9cf 17-Feb-2012 Weston Andros Adamson <dros@netapp.com> NFS: include filelayout DS rpc stats in mountstats

Include RPC statistics from all data servers in /proc/self/mountstats for pNFS
filelayout mounts.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
15a4520621824a3c2eb2de2d1f3984bc1663d3c8 14-Feb-2012 Andy Adamson <andros@netapp.com> SUNRPC: add sending,pending queue and max slot to xprt stats

With static RPC slots, the xprt backlog queue stats were useful in showing
when the transport (TCP) was starved by lack of RPC slots. The new dynamic
RPC slot code, commit d9ba131d8f58c0d2ff5029e7002ab43f913b36f9, always
provides an RPC slot and so only uses the xprt backlog queue when the
tcp_max_slot_table_entries value has been hit or when an allocation error
occurs. All requests are now placed on the xprt sending or pending queue which
need to be monitored for debugging.

The max_slot stat shows the maximum number of dynamic RPC slots reached which is
useful when debugging performance issues.

Add the new fields at the end of the mountstats xprt stanza so that mountstats
outputs the previous correct values and ignores the new fields. Bump
NFS_IOSTATS_VERS.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
87e3c0553fcbea79bf9f17fc5694484ecf3ae5e8 01-Feb-2012 Dan Carpenter <dan.carpenter@oracle.com> SUNRPC: remove an unneeded NULL check in xprt_connect()

We check "task->tk_rqstp" and then we dereference it without checking on
the next line. The only caller is call_connect() and that has a check
which prevents it from calling xprt_connect() with a NULL.

	if (task->tk_status < 0)
		return;

If "task->tk_rqstp" were NULL then "tk_status" would be -EAGAIN.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
961a828df64979d2a9faeeeee043391670a193b9 18-Jan-2012 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fix potential races in xprt_lock_write_next()

We have to ensure that the wake up from the waitqueue and the assignment
of xprt->snd_task are atomic. We can do this by assigning the snd_task
while under the waitqueue spinlock.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
c25573b5134294c0be82bfaecc6d08136835b271 01-Dec-2011 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Ensure we always bump the backlog queue in xprt_free_slot

Whenever we free a slot, we know that the resulting xprt->num_reqs will
be less than xprt->max_reqs, so we know that we can release at least one
backlogged rpc_task.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>=3.1]
177c27bf05d0ea508e65afdbe4b6998c81e46af5 28-Jul-2011 Randy Dunlap <rdunlap@xenotime.net> net: fix new sunrpc kernel-doc warning

Fix new kernel-doc warning in sunrpc:

Warning(net/sunrpc/xprt.c:196): No description found for parameter 'xprt'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
34006cee28f7344f9557a4be3816c7891b1bbab1 18-Jul-2011 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Replace xprt->resend and xprt->sending with a priority queue

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
d9ba131d8f58c0d2ff5029e7002ab43f913b36f9 18-Jul-2011 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Support dynamic slot allocation for TCP connections

Allow the number of available slots to grow with the TCP window size.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
21de0a955f3af29fa1100d96f66e6adade89e77a 17-Jul-2011 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Clean up the slot table allocation

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
8d9266ffe4332afc5ac9de401ef6f825b3798585 17-Jul-2011 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Initialise the struct xprt upon allocation

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
43cedbf0e8dfb9c5610eb7985d5f21263e313802 17-Jul-2011 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Ensure that we grab the XPRT_LOCK before calling xprt_alloc_slot

This throttles the allocation of new slots when the socket is busy
reconnecting and/or is out of buffer space.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
9e00abc3c20904fd6a5d888bb7023925799ec8a5 14-Jul-2011 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: sunrpc should not explicitly depend on NFS config options

Change explicit references to CONFIG_NFS_V4_1 to implicit ones
Get rid of the unnecessary defines in backchannel_rqst.c and
bc_svc.c: the Makefile takes care of those dependencies.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
468f86134ee515234afe5c5b3f39f266c50e61a5 18-Apr-2011 Bryan Schumaker <bjschuma@netapp.com> NFSv4.1: Don't update sequence number if rpc_task is not sent

If we fail to contact the gss upcall program, then no message will
be sent to the server. The client still updated the sequence number,
however, and this led to NFS4ERR_SEQ_MISMATCH for the next several
RPC calls.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
ba3c578de274a5438bafbce03f9225936698051c 16-Mar-2011 j223yang@asset.uwaterloo.ca <j223yang@asset.uwaterloo.ca> xprt: remove redundant check

remove redundant check.

Signed-off-by: Jinqiu Yang <crindy646@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
a8de240a9074b72b156d9e6d53f00076e6cd5f03 16-Mar-2011 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Convert struct rpc_xprt to use atomic_t counters

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
4d4a76f3309edc671918a767b336492fbc80a16d 10-Mar-2011 j223yang@asset.uwaterloo.ca <j223yang@asset.uwaterloo.ca> xprt: remove redundant null check

'req' is dereferenced before checked for NULL.
The patch simply removes the check.

Signed-off-by: Jinqiu Yang <crindy646@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
f0418aa4b1103f959d64dc18273efa04ee0140e9 08-Dec-2010 J. Bruce Fields <bfields@redhat.com> rpc: allow xprt_class->setup to return a preexisting xprt

This allows us to reuse the xprt associated with a server connection if
one has already been set up.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
99de8ea962bbc11a51ad4c52e3dc93bee5f6ba70 08-Dec-2010 J. Bruce Fields <bfields@redhat.com> rpc: keep backchannel xprt as long as server connection

Multiple backchannels can share the same tcp connection; from rfc 5661 section
2.10.3.1:

A connection's association with a session is not exclusive. A
connection associated with the channel(s) of one session may be
simultaneously associated with the channel(s) of other sessions
including sessions associated with other client IDs.

However, if multiple backchannels share a connection, they must all share
the same xid stream (hence the same rpc_xprt); the only way we have to
match replies with calls at the rpc layer is using the xid.
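
To illustrate why a shared connection needs a single xid stream, here is a small sketch of matching a reply to its call by xid alone. The `struct toy_call` type and `toy_lookup_xid()` are hypothetical; the kernel keeps its outstanding requests in its own data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one outstanding call on a transport. */
struct toy_call {
    uint32_t xid;      /* transaction id carried in call and reply */
    int session;       /* which (hypothetical) session issued the call */
};

/* Linear scan of the outstanding calls; the xid is the only key. */
struct toy_call *toy_lookup_xid(struct toy_call *calls, size_t n, uint32_t xid)
{
    assert(calls != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (calls[i].xid == xid)
            return &calls[i];
    return NULL;
}
```

Because the lookup key is only the xid, two independent xid generators on one connection could hand out the same value and make replies ambiguous.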

So, keep the rpc_xprt around as long as the connection lasts, in case
we're asked to use the connection as a backchannel again.

Requests to create new backchannel clients over a given server
connection should result in creating new clients that reuse the
existing rpc_xprt.

But to start, just reject attempts to associate multiple rpc_xprt's with
the same underlying bc_xprt.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
8f3a6de313391b6910aa7db185eb9f3e930a51cf 05-Oct-2010 Pavel Emelyanov <xemul@parallels.com> sunrpc: Turn list_for_each-s into the ..._entry-s

Saves some lines of code and some brain ticks when reading one.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
f10fef38d2d1605c977346457d0adb0919d0bbe7 05-Oct-2010 Pavel Emelyanov <xemul@parallels.com> sunrpc: Remove useless if (task == NULL) from xprt_reserve_xprt

The task in question is dereferenced above (and is actually never NULL).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
37aa2133731d9231eb834f700119f0d3f1ed2664 29-Sep-2010 Pavel Emelyanov <xemul@parallels.com> sunrpc: Tag rpc_xprt with net

The net is known from the xprt_create and this tagging will also
give us the context in the connection workers where real sockets
are created.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
e204e621b4160c802315bc2d0fa335337c0d62e8 29-Sep-2010 Pavel Emelyanov <xemul@parallels.com> sunrpc: Factor out rpc_xprt freeing

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
bd1722d4316e42a12fe6337ebe34d7e1e2c088b2 29-Sep-2010 Pavel Emelyanov <xemul@parallels.com> sunrpc: Factor out rpc_xprt allocation

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
c3ae62ae08bb0db3639d8c579e4ff0967d908199 03-Aug-2010 J. Bruce Fields <bfields@redhat.com> SUNRPC: prevent task_cleanup running on freed xprt

We saw a report of a NULL dereference in xprt_autoclose:

https://bugzilla.redhat.com/show_bug.cgi?id=611938

This appears to be the result of an xprt's task_cleanup running after
the xprt is destroyed. Nothing in the current code appears to prevent
that.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
a17c2153d2e271b0cbacae9bed83b0eaa41db7e1 31-Jul-2010 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Move the bound cred to struct rpc_rqst

This will allow us to save the original generic cred in rpc_message, so
that if we migrate from one server to another, we can generate a new bound
cred without having to punt back to the NFS layer.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
ef7ffe8f06895312aeb08a5f8a1d4c90e34335ea 24-May-2010 Alex Riesen <raa.lkml@gmail.com> sunrpc: use formatting of module name in SUNRPC

gcc-4.3.3 produces the warning:
"format not a string literal and no format arguments"

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <cel@citi.umich.edu>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Tom Talpey <tmtalpey@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3fa21e07e6acefa31f974d57fba2b6920a7ebd1a 18-May-2010 Joe Perches <joe@perches.com> net: Remove unnecessary returns from void function()s

This patch removes from net/ (but not any netfilter files)
all the unnecessary return; statements that precede the
last closing brace of void functions.

It does not remove the returns that are immediately
preceded by a label as gcc doesn't like that.

Done via:
$ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d60dbb20a74c2cfa142be0a34dac3c6547ea086c 13-May-2010 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Move the task->tk_bytes_sent and tk_rtt to struct rpc_rqst

It seems strange to maintain stats for bytes_sent in one structure, and
bytes received in another. Try to assemble all the RPC request-related
stats in struct rpc_rqst

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
ff8399709e41bf72b4cb145612a0f9a9f7283c83 07-May-2010 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Replace jiffies-based metrics with ktime-based metrics

Currently RPC performance metrics that tabulate elapsed time use
jiffies time values. This is problematic on systems that use slow
jiffies (for instance 100HZ systems built for paravirtualized
environments). It is also a problem for computing precise latency
statistics for advanced network transports, such as InfiniBand,
that can have round-trip latencies significantly faster than a single
clock tick.

For the RPC client, adopt the high resolution time stamp mechanism
already used by the network layer and blktrace: ktime.

We use ktime format time stamps for all internal computations, and
convert to milliseconds for presentation. As a result, we need only
addition operations in the performance critical paths; multiply/divide
is required only for presentation.
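
The split between accumulation and presentation can be sketched as follows. This is an assumption-laden user-space illustration: `struct toy_metrics` and its helpers are hypothetical, and plain nanosecond counters stand in for the kernel's ktime values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NSEC_PER_MSEC 1000000ULL

/* Hypothetical per-operation metrics; totals are kept in nanoseconds. */
struct toy_metrics {
    uint64_t rtt_total_ns;
    uint64_t ops;
};

/* Hot path: one addition and one increment, no multiply or divide. */
void toy_metrics_add(struct toy_metrics *m, uint64_t rtt_ns)
{
    assert(m != NULL);
    m->rtt_total_ns += rtt_ns;
    m->ops++;
}

/* Presentation path: the only place a division happens. */
uint64_t toy_metrics_rtt_ms(const struct toy_metrics *m)
{
    assert(m != NULL);
    return m->rtt_total_ns / TOY_NSEC_PER_MSEC;
}
```

Two sub-millisecond samples of 400 and 700 microseconds add up to a reported total of 1 ms here, whereas rounding each sample to milliseconds first would report 0.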

We could report RTT metrics in microseconds. In fact the mountstats
format is versioned to accommodate exactly this kind of interface
improvement.

For now, however, we'll stay with millisecond precision for
presentation to maintain backwards compatibility with the handful of
currently deployed user space tools. At a later point, we'll move to
an API such as BDI_STATS where a finer timestamp precision can be
reported.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
bbc72cea58f671665b6362be0d4e391813ac0eee 07-May-2010 Chuck Lever <chuck.lever@oracle.com> SUNRPC: RPC metrics and RTT estimator should use same RTT value

Compute an RPC request's RTT once, and use that value both for reporting
RPC metrics, and for adjusting the RTT context used by the RPC client's RTT
estimator algorithm.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
a8ce4a8f37fef0a09a1e920c2e09f67a80426c7e 16-Apr-2010 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fail over more quickly on connect errors

We should not allow soft tasks to wait for longer than the major timeout
period when waiting for a reconnect to occur.

Remove the field xprt->connect_timeout since it has been obsoleted by
xprt->reestablish_timeout.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
0b9e79431377df452348e78262dd5a3dc359eeef 16-Apr-2010 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Move the test for XPRT_CONNECTING into xprt_connect()

This fixes a bug with setting xprt->stat.connect_start.

Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
ee5ebe851ed60206f150d3f189416f9c63245b66 16-Apr-2010 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Clean up xprt_release()

Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
0eae88f31ca2b88911ce843452054139e028771f 21-Apr-2010 Eric Dumazet <eric.dumazet@gmail.com> net: Fix various endianness glitches

Sparse can help us find endianness bugs, but we need to make some
cleanups to be able to more easily spot real bugs.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c9acb42ef1904d15d0fb315061cefbe638f67f3a 19-Mar-2010 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fix a use after free bug with the NFSv4.1 backchannel

The ->release_request() callback was designed to allow the transport layer
to do housekeeping after the RPC call is done. It cannot be used to free
the request itself, and doing so leads to a use-after-free bug in
xprt_release().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
feb8ca37cc3d83c07fd042509ef1e176cfeb2cfa 03-Dec-2009 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Ensure that we honour autoclose before attempting to reconnect

If the XPRT_CLOSE_WAIT flag is set, we need to ensure that we call
xprt->ops->close() while holding xprt_lock_write() before we can
start reconnecting.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
4cfc7e6019caa3e97d2a81c48c8d575d7b38d751 10-Sep-2009 Rahul Iyer <iyer@netapp.com> nfsd41: sunrpc: Added rpc server-side backchannel handling

When the call direction is a reply, copy the xid and call direction into the
req->rq_private_buf.head[0].iov_base otherwise rpc_verify_header returns
rpc_garbage.

Signed-off-by: Rahul Iyer <iyer@netapp.com>
Signed-off-by: Mike Sager <sager@netapp.com>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[get rid of CONFIG_NFSD_V4_1]
[sunrpc: refactoring of svc_tcp_recvfrom]
[nfsd41: sunrpc: create common send routine for the fore and the back channels]
[nfsd41: sunrpc: Use free_page() to free server backchannel pages]
[nfsd41: sunrpc: Document server backchannel locking]
[nfsd41: sunrpc: remove bc_connect_worker()]
[nfsd41: sunrpc: Define xprt_server_backchannel()]
[nfsd41: sunrpc: remove bc_close and bc_init_auto_disconnect dummy functions]
[nfsd41: sunrpc: eliminate unneeded switch statement in xs_setup_tcp()]
[nfsd41: sunrpc: Don't auto close the server backchannel connection]
[nfsd41: sunrpc: Remove unused functions]
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: change bc_sock to bc_xprt]
[nfsd41: sunrpc: move struct rpc_buffer def into a common header file]
[nfsd41: sunrpc: use rpc_sleep in bc_send_request so not to block on mutex]
[removed cosmetic changes]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[sunrpc: add new xprt class for nfsv4.1 backchannel]
[sunrpc: v2.1 change handling of auto_close and init_auto_disconnect operations for the nfsv4.1 backchannel]
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
[reverted more cosmetic leftovers]
[got rid of xprt_server_backchannel]
[separated "nfsd41: sunrpc: add new xprt class for nfsv4.1 backchannel"]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Cc: Trond Myklebust <trond.myklebust@netapp.com>
[sunrpc: change idle timeout value for the backchannel]
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Acked-by: Trond Myklebust <trond.myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
dd2b63d049480979016b959abc2d141cdddb1389 01-Apr-2009 Ricardo Labiaga <Ricardo.Labiaga@netapp.com> nfs41: Rename rq_received to rq_reply_bytes_recvd

The 'rq_received' member of 'struct rpc_rqst' is used to track when we
have received a reply to our request. With v4.1, the backchannel
can now accept callback requests over the existing connection. Rename
this field to make it clear that it is only used for tracking reply bytes
and not all bytes received on the connection.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
343952fa5aac888934ffc203abed26a823400eb6 01-Apr-2009 Rahul Iyer <iyer@netapp.com> nfs41: Get the rpc_xprt * from the rpc_rqst instead of the rpc_clnt.

Obtain the rpc_xprt from the rpc_rqst so that calls and callback replies
can both use the same code path. A client needs the rpc_xprt in order
to reply to a callback.

Signed-off-by: Rahul Iyer <iyer@netapp.com>
Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
55ae1aabfb108106dd095de2578ceef1c755a8b8 01-Apr-2009 Ricardo Labiaga <Ricardo.Labiaga@netapp.com> nfs41: Add backchannel processing support to RPC state machine

Adds rpc_run_bc_task() which is called by the NFS callback service to
process backchannel requests. It performs similar work to rpc_run_task()
though "schedules" the backchannel task to be executed starting at the
call_transmit state in the RPC state machine.

It also introduces some miscellaneous updates to the argument validation,
call_transmit, and transport cleanup functions to take into account
that there are now forechannel and backchannel tasks.

Backchannel requests do not carry an RPC message structure, since the
payload has already been XDR encoded using the existing NFSv4 callback
mechanism.

Introduce a new transmit state for the client to reply to backchannel
requests. This new state simply reserves the transport and issues the
reply. In case of a connection related error, disconnects the transport and
drops the reply. It requires the forechannel to re-establish the connection
and the server to retransmit the request, as stated in NFSv4.1 section
2.9.2 "Client and Server Transport Behavior".

Note: There is no need to loop attempting to reserve the transport. If EAGAIN
is returned by xprt_prepare_transmit(), return with tk_status == 0,
setting tk_action to call_bc_transmit. rpc_execute() will invoke it again
after the task is taken off the sleep queue.

[nfs41: rpc_run_bc_task() need not be exported outside RPC module]
[nfs41: New call_bc_transmit RPC state]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: Backchannel: No need to loop in call_bc_transmit()]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[rpc_count_iostats incorrectly exits early]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Convert rpc_reply_expected() to inline function]
[Remove unnecessary BUG_ON()]
[Rename variable]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
f9acac1a4710ce88871f1ae323fc91c1cb6e9d52 01-Apr-2009 Ricardo Labiaga <Ricardo.Labiaga@netapp.com> nfs41: Initialize new rpc_xprt callback related fields

Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
f75e6745aa3084124ae1434fd7629853bdaf6798 21-Apr-2009 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fix the problem of EADDRNOTAVAIL syslog floods on reconnect

See http://bugzilla.kernel.org/show_bug.cgi?id=13034

If the port gets into a TIME_WAIT state, then we cannot reconnect without
binding to a new port.

Tested-by: Petr Vandrovec <petr@vandrovec.name>
Tested-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2a4919919a97911b0aa4b9f5ac1eab90ba87652b 11-Mar-2009 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Return EAGAIN instead of ENOTCONN when waking up xprt->pending

While we should definitely return socket errors to the task that is
currently trying to send data, there is no need to propagate the same error
to all the other tasks on xprt->pending. Doing so actually slows down
recovery, since it causes more than one task to attempt socket recovery.
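
A rough sketch of the intended behaviour, using a hypothetical `struct toy_waiter` and `toy_wake_pending()` in place of the kernel's wait queue machinery: only the sending task sees the real socket error, and every other waiter is simply told to retry.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical waiter; status plays the role of task->tk_status. */
struct toy_waiter {
    int status;
};

/* The sender gets the socket error; the pending waiters get -EAGAIN. */
void toy_wake_pending(struct toy_waiter *sender,
                      struct toy_waiter *pending, size_t n, int sock_err)
{
    assert(sender != NULL && (pending != NULL || n == 0));
    sender->status = sock_err;
    for (size_t i = 0; i < n; i++)
        pending[i].status = -EAGAIN;
}
```

In this model only one task is prompted to start socket recovery, while the others retry once the transport is usable again.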

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
c8485e4d634f6df155040293928707f127f0d06d 11-Mar-2009 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Handle ECONNREFUSED correctly in xprt_transmit()

If we get an ECONNREFUSED error, we currently go to sleep on the
'xprt->sending' wait queue. The problem is that no timeout is set there,
and there is nothing else that will wake the task up later.

We should deal with ECONNREFUSED in call_status, given that is where we
also deal with -EHOSTDOWN, and friends.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
441e3e242903f9b190d5764bed73edb58f977413 11-Mar-2009 Tom Talpey <tmtalpey@gmail.com> SUNRPC: dynamically load RPC transport modules on-demand

Provide an api to attempt to load any necessary kernel RPC
client transport module automatically. By convention, the
desired module name is "xprt"+"transport name". For example,
when NFS mounting with "-o proto=rdma", attempt to load the
"xprtrdma" module.

Signed-off-by: Tom Talpey <tmtalpey@gmail.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
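
The naming convention from the on-demand module loading entry above ("xprt" followed by the transport name) can be sketched in user space. `toy_xprt_module_name()` is a hypothetical helper for illustration only; it builds the name and does not load anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "xprt" + transport into buf.
 * Returns 0 on success, -1 if the result (with its NUL) would not fit.
 */
int toy_xprt_module_name(char *buf, size_t len, const char *transport)
{
    assert(buf != NULL && transport != NULL);
    /* sizeof("xprt") counts the prefix plus the terminating NUL. */
    if (strlen(transport) + sizeof("xprt") > len)
        return -1;
    snprintf(buf, len, "xprt%s", transport);
    return 0;
}
```

Passing "rdma" yields "xprtrdma", matching the example given for "-o proto=rdma".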
01d37c428ae080563c0a3bb8bdfa88c65a6891d3 11-Mar-2009 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: xprt_connect() don't abort the task if the transport isn't bound

If the transport isn't bound, then we should just return ENOTCONN, letting
call_connect_status() and/or call_status() deal with retrying. Currently,
we appear to abort all pending tasks with an EIO error.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
c9f6cde6e26ef98ee9c4b6288b126ac9c580d88b 31-Jul-2008 Denis V. Lunev <den@openvz.org> sunrpc: do not pin sunrpc module in the memory

Basically, the try_module_get() calls here are pretty useless. Any other module
using this API will pin sunrpc in memory due to its use of exported symbols.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
b22602a673b1743bba4b62bb404ffd3b269d2f09 06-Jun-2008 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Ensure all transports set rq_xtime consistently

The RPC client uses the rq_xtime field in each RPC request to determine the
round-trip time of the request. Currently, the rq_xtime field is
initialized by each transport just before it starts enqueuing a request to
be sent. However, transports do not handle initializing this value
consistently; sometimes they don't initialize it at all.

To make the measurement of request round-trip time consistent for all
RPC client transport capabilities, pull rq_xtime initialization into the
RPC client's generic transport logic. Now all transports will get a
standardized RTT measure automatically, from:

xprt_transmit()

to

xprt_complete_rqst()

This makes round-trip time calculation more accurate for the TCP transport.
The socket ->sendmsg() method can return "-EAGAIN" if the socket's output
buffer is full, so the TCP transport's ->send_request() method may call
the ->sendmsg() method repeatedly until it gets all of the request's bytes
queued in the socket's buffer.

Currently, the TCP transport sets the rq_xtime field every time through
that loop so the final value is the timestamp just before the *last* call
to the underlying socket's ->sendmsg() method. After this patch, the
rq_xtime field contains a timestamp that reflects the time just before the
*first* call to ->sendmsg().

This is consequential under heavy workloads because large requests often
take multiple ->sendmsg() calls to get all the bytes of a request queued.
The TCP transport causes the request to sleep until the remote end of the
socket has received enough bytes to clear space in the socket's local
output buffer. This delay can be quite significant.

The method introduced by this patch is a more accurate measure of RTT
for stream transports, since the server can cause enough back pressure
to delay (ie increase the latency of) requests from the client.

Additionally, this patch corrects the behavior of the RDMA transport, which
entirely neglected to initialize the rq_xtime field. RPC performance
metrics for RDMA transports now display correct RPC request round trip
times.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Tom Talpey <thomas.talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
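
The difference between stamping before the first ->sendmsg() call and before the last one can be seen in a small simulation. Everything here (fake_sock, fake_rqst, fake_sendmsg(), transmit()) is a hypothetical model with a simulated clock, not the transport code itself.

```c
#include <stddef.h>

struct fake_sock {
	unsigned long now;	/* simulated clock, in ticks */
	size_t room;		/* bytes accepted per sendmsg call; must be > 0 */
};

struct fake_rqst {
	unsigned long rq_xtime;	/* transmit timestamp */
	size_t len;		/* total request length */
	size_t sent;		/* bytes queued so far */
};

/* Each call queues at most sk->room bytes and costs one tick of waiting. */
static size_t fake_sendmsg(struct fake_sock *sk, size_t want)
{
	size_t n = want < sk->room ? want : sk->room;

	sk->now++;
	return n;
}

/* Generic code stamps once, before the first sendmsg call. */
static void transmit(struct fake_sock *sk, struct fake_rqst *req)
{
	req->rq_xtime = sk->now;
	while (req->sent < req->len)
		req->sent += fake_sendmsg(sk, req->len - req->sent);
}
```

Because the timestamp is taken outside the loop, the time spent waiting for socket buffer space counts toward the measured round trip, which is the behaviour the commit message argues for.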
cd983ef81b9d79573848dabf81277c7314220257 11-Jun-2008 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Remove obsolete messages during transport connect

Recent changes to the RPC client's transport connect logic make connect
status values ECONNREFUSED and ECONNRESET impossible.

Clean up xprt_connect_status() to account for these changes.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
0b80ae4201e5128e16e5161825f5cd377a5d1fee 27-Apr-2008 Randy Dunlap <randy.dunlap@oracle.com> sunrpc: fix missing kernel-doc

Fix missing sunrpc kernel-doc:

Warning(linux-2.6.25-git7//net/sunrpc/xprt.c:451): No description found for parameter 'action'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7c1d71cf56feebfb5b98219b9d11dfc3a2feca62 17-Apr-2008 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Don't disconnect more than once if retransmitting NFSv4 requests

NFSv4 requires us to ensure that we break the TCP connection before we're
allowed to retransmit a request. However in the case where we're
retransmitting several requests that have been sent on the same
connection, we need to ensure that we don't interfere with the attempt to
reconnect and/or break the connection again once it has been established.

We therefore introduce a 'connection' cookie that is bumped every time a
connection is broken. This allows requests to track if they need to force a
disconnection.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
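
A minimal model of the cookie idea, assuming each request remembers the cookie value it saw at transmit time. The struct and function names below are invented for the sketch and are not claimed to match the kernel's.

```c
struct fake_xprt {
	unsigned int connect_cookie;	/* bumped every time the connection is broken */
	int connected;
	unsigned int disconnects;	/* how many real disconnects happened */
};

struct fake_rqst {
	unsigned int rq_connect_cookie;	/* cookie seen when the request was sent */
};

static void fake_transmit(const struct fake_xprt *xprt, struct fake_rqst *req)
{
	req->rq_connect_cookie = xprt->connect_cookie;
}

static void fake_reconnect(struct fake_xprt *xprt)
{
	xprt->connected = 1;
}

/* Disconnect only if nobody has broken this connection since req was sent. */
static void conditional_disconnect(struct fake_xprt *xprt,
				   const struct fake_rqst *req)
{
	if (req->rq_connect_cookie != xprt->connect_cookie || !xprt->connected)
		return;
	xprt->connected = 0;
	xprt->connect_cookie++;
	xprt->disconnects++;
}
```

If two requests were sent on the same connection, the first retransmission breaks it and bumps the cookie; the second sees a stale cookie and leaves the new connection alone.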
636ac43318ce6939c1698fb67e714d421314ed71 17-Apr-2008 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Remove the unused export of xprt_force_disconnect

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
1e799b673c6b82b336ab13c48b5651d511ca3000 21-Mar-2008 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fix read ordering problems with req->rq_private_buf.len

We want to ensure that req->rq_private_buf.len is updated before
req->rq_received, so that call_decode() doesn't use an old value for
req->rq_rcv_buf.len.

In 'call_decode()' itself, instead of using task->tk_status (which is set
using req->rq_received), we must use the actual value of
req->rq_private_buf.len when deciding whether or not the received RPC reply
is too short.

Finally ensure that we set req->rq_rcv_buf.len to zero when retrying a
request. A typo meant that we were resetting req->rq_private_buf.len in
call_decode(), and then clobbering that value with the old rq_rcv_buf.len
again in xprt_transmit().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
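
The required ordering (length first, then the "received" marker) can be sketched in portable C. Note the substitution: this sketch uses C11 release/acquire atomics in place of the kernel's own barrier primitives, and struct fake_rqst with its fields is a stand-in rather than the real rpc_rqst.

```c
#include <stdatomic.h>
#include <stddef.h>

struct fake_rqst {
	size_t private_len;	/* stands in for rq_private_buf.len */
	atomic_size_t received;	/* stands in for rq_received */
};

/* Writer: publish the length first, then the "reply received" count. */
static void complete_rqst(struct fake_rqst *req, size_t copied)
{
	req->private_len = copied;
	atomic_store_explicit(&req->received, copied, memory_order_release);
}

/* Reader: returns 1 and fills *len once a reply is visible, else 0. */
static int reply_ready(struct fake_rqst *req, size_t *len)
{
	if (atomic_load_explicit(&req->received, memory_order_acquire) == 0)
		return 0;
	*len = req->private_len;
	return 1;
}
```

The release store pairs with the acquire load, so a reader that observes a non-zero received value is guaranteed to observe the matching length as well.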
b6ddf64ffe9d59577a9176856bb6fe69a539f573 18-Apr-2008 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fix up xprt_write_space()

The rest of the networking layer uses SOCK_ASYNC_NOSPACE to signal whether
or not we have someone waiting for buffer memory. Convert the SUNRPC layer
to use the same idiom.
Remove the unlikely()s in xs_udp_write_space and xs_tcp_write_space. In
fact, the most common case will be that there is nobody waiting for buffer
space.

SOCK_NOSPACE is there to tell the TCP layer whether or not the cwnd was
limited by the application window. Ensure that we follow the same idiom as
the rest of the networking layer here too.

Finally, ensure that we clear SOCK_ASYNC_NOSPACE once we wake up, so that
write_space() doesn't keep waking things up on xprt->pending.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
f6a1cc89309f0ae847a9b6fe418d1c4215e5bc55 22-Feb-2008 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Add a (empty for the moment) destructor for rpc_wait_queues

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
5d00837b90340af9106dcd93af75fd664c8eb87f 22-Feb-2008 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Run rpc timeout functions as callbacks instead of in softirqs

An audit of the current RPC timeout functions shows that they don't really
ever need to run in the softirq context. As long as the softirq is
able to signal that the wakeup is due to a timeout (which it can do by
setting task->tk_status to -ETIMEDOUT) then the callback functions can just
run as standard task->tk_callback functions (in the rpciod/process
context).

The only possible border-line case would be xprt_timer() for the case of
UDP, when the callback is used to reduce the size of the transport
congestion window. In testing, however, the effect of moving that update
to a callback would appear to be minor.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
fda1393938035559b417dd5b26b9cc293a7aee00 22-Feb-2008 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Convert users of rpc_wake_up_task to use rpc_wake_up_queued_task

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
fde95c7554aa77f9a242f32b0b5f8f15395abf52 22-Feb-2008 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Clean up rpc_run_timer()

All RPC timeout callback functions are expected to wake the task up. We can
enforce this by moving the wakeup back into rpc_run_timer.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
65b6e42cdc5b6a1ce2ada31cc294d7e60b22bb43 14-Feb-2008 Randy Dunlap <randy.dunlap@oracle.com> docbook: sunrpc filenames and notation fixes

Use updated file list for docbook files and
fix kernel-doc warnings in sunrpc:
Warning(linux-2.6.24-git12//net/sunrpc/rpc_pipe.c:689): No description found for parameter 'rpc_client'
Warning(linux-2.6.24-git12//net/sunrpc/rpc_pipe.c:765): No description found for parameter 'flags'
Warning(linux-2.6.24-git12//net/sunrpc/clnt.c:584): No description found for parameter 'tk_ops'
Warning(linux-2.6.24-git12//net/sunrpc/clnt.c:618): No description found for parameter 'bufsize'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ba7392bb37cb12781890f45d7ddee1618e33a036 20-Dec-2007 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Add support for per-client timeout values

In order to be able to support setting the timeo and retrans parameters on
a per-mountpoint basis, we move the rpc_timeout structure into the
rpc_clnt.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2881ae74e68ecfe3b32a90936e5d93a9ba598c3a 20-Dec-2007 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Clean up the transport timeout initialisation

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
e8914c65f7f8d4e8701b8e78a12b714872ea0402 14-Jul-2007 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Restrict sunrpc client exports

The sunrpc client exports are not meant to be part of any official kernel
API: they can change at the drop of a hat. Mark them as internal functions
using EXPORT_SYMBOL_GPL.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
a6eaf8bdf9308b51ec84e358915fc65400029519 14-Jul-2007 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Move exported declarations to the function declarations

Do this for all RPC client related functions and XDR functions.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
62da3b24880bccd4ffc32cf8d9a7e23fab475bdd 07-Nov-2007 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Rename xprt_disconnect()

xprt_disconnect() should really only be called when the transport shutdown
is completed, and it is time to wake up any pending tasks. Rename it to
xprt_disconnect_done() in order to reflect the semantic change.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
7272dcd31d56580dee7693c21e369fd167e137fe 07-Nov-2007 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: xprt_autoclose() should not call xprt_disconnect()

The transport layer should do that itself whenever appropriate.

Note that the RDMA transport already assumes that it needs to call
xprt_disconnect in xprt_rdma_close().
For TCP sockets, we want to call xprt_disconnect() only after the
connection has been closed by both ends.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
66af1e558538137080615e7ad6d1f2f80862de01 06-Nov-2007 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fix a race in xs_tcp_state_change()

When scheduling the autoclose RPC call, we want to ensure that we don't
race against the test_bit() call in xprt_clear_locked().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
b24b8a247ff65c01b252025926fe564209fae4fc 24-Jan-2008 Pavel Emelyanov <xemul@openvz.org> [NET]: Convert init_timer into setup_timer

A great deal of code in the kernel initializes timer->function
and timer->data together with a call to init_timer(timer). There
is already a helper for this. Use it for networking code.

The patch is HUGE, but makes the code 130 lines shorter
(98 insertions(+), 228 deletions(-)).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
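
The shape of the conversion can be shown with mock types. Nothing below is the kernel timer API; struct mock_timer, mock_init_timer(), mock_setup_timer() and mock_expire() are illustrative names only.

```c
struct mock_timer {
	void (*function)(unsigned long);
	unsigned long data;
	int initialized;
};

static unsigned long last_fired;	/* records the argument of the last callback */

static void mock_expire(unsigned long data)
{
	last_fired = data;
}

static void mock_init_timer(struct mock_timer *t)
{
	t->initialized = 1;
}

/* One call replaces the open-coded "init, set function, set data" sequence. */
static void mock_setup_timer(struct mock_timer *t,
			     void (*fn)(unsigned long), unsigned long data)
{
	t->function = fn;
	t->data = data;
	mock_init_timer(t);
}
```

Each converted call site shrinks from three statements to one, which is where the net reduction in line count comes from.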
5ba03e82b3dac41bb1c5ca29060aa44b5e44b486 22-Nov-2007 Jiri Slaby <jirislaby@gmail.com> [SUNRPC]: Remove SPIN_LOCK_UNLOCKED

SPIN_LOCK_UNLOCKED is deprecated, use DEFINE_SPINLOCK instead

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
4fa016eb248cac875541fa199af550a8aefa0e90 10-Sep-2007 "Talpey, Thomas" <Thomas.Talpey@netapp.com> NFS/SUNRPC: support transport protocol naming

To prepare for including non-sockets-based RPC transports, select
RPC transports by an identifier (to be used in following patches).

Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
3c341b0b925eee01daae2c594b81e673f659d7cd 10-Sep-2007 "Talpey, Thomas" <Thomas.Talpey@netapp.com> SUNRPC: rename the rpc_xprtsock_create structure

To prepare for including non-sockets-based RPC transports, change the
overly suggestive name of the transport creation arguments struct.

Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
bc25571e21e8bd053554209f5b1b228ad71e6b99 10-Sep-2007 "Talpey, Thomas" <Thomas.Talpey@netapp.com> SUNRPC: Finish API to load RPC transport implementations dynamically

Allow RPC client transport implementations to be loaded as needed, or
as they become available from distributors or third-party vendors.

Note that we leave the IP sockets implementation in sunrpc.o
permanently, as IP functionality is always available in any
kernel that runs NFS.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
81c098af3da7981902e9f8163aeccc2467c4ba6d 10-Sep-2007 "Talpey, Thomas" <Thomas.Talpey@netapp.com> SUNRPC: Provide a new API for registering transport implementations

To allow transport capabilities to be loaded dynamically, provide an API
for registering and unregistering the transports with the RPC client.
Eventually xprt_create_transport() will be changed to search the list of
registered transports when initializing a fresh transport.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
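
A registration API of this kind can be pictured as a small table keyed by transport identifier. The sketch below is a user-space toy: the fixed-size array, the struct mock_xprt_class layout and the mock_* function names are all assumptions, and it ignores locking and module reference counting entirely.

```c
#include <stddef.h>

#define MAX_XPRT_CLASSES 8

struct mock_xprt_class {
	int ident;		/* transport identifier */
	const char *name;
};

static struct mock_xprt_class *classes[MAX_XPRT_CLASSES];

/* Returns 0 on success, -1 if ident is already registered or the table is full. */
static int mock_register(struct mock_xprt_class *c)
{
	int i, slot = -1;

	for (i = 0; i < MAX_XPRT_CLASSES; i++) {
		if (classes[i] == NULL) {
			if (slot < 0)
				slot = i;
		} else if (classes[i]->ident == c->ident) {
			return -1;
		}
	}
	if (slot < 0)
		return -1;
	classes[slot] = c;
	return 0;
}

/* Returns 0 if c was registered and has been removed, -1 otherwise. */
static int mock_unregister(struct mock_xprt_class *c)
{
	int i;

	for (i = 0; i < MAX_XPRT_CLASSES; i++) {
		if (classes[i] == c) {
			classes[i] = NULL;
			return 0;
		}
	}
	return -1;
}

/* What transport creation would do: search the registered list by identifier. */
static struct mock_xprt_class *mock_lookup(int ident)
{
	int i;

	for (i = 0; i < MAX_XPRT_CLASSES; i++)
		if (classes[i] != NULL && classes[i]->ident == ident)
			return classes[i];
	return NULL;
}
```

A lookup that returns NULL is the point at which a caller could fall back to loading a transport module on demand.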
1244480976d357447aeddd3f44977586bfa0462b 10-Sep-2007 "Talpey, Thomas" <Thomas.Talpey@netapp.com> SUNRPC: add EXPORT_SYMBOL_GPL for generic transport functions

As a preface to allowing arbitrary transport modules to be loaded
dynamically, add EXPORT_SYMBOL_GPL for all generic transport functions
that a transport implementation might want to use.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Tom Talpey <tmt@netapp.com>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
96802a095171f5b35cf0e1e0d4be943e6696a253 08-Jul-2007 Frank van Maarseveen <frankvm@frankvm.com> SUNRPC: cleanup transport creation argument passing

Cleanup argument passing to functions for creating an RPC transport.

Signed-off-by: Frank van Maarseveen <frankvm@frankvm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
c1384c9c4c184543375b52a0997d06cd98145164 15-Jun-2007 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: fix hang due to eventd deadlock...

Brian Behlendorf writes:

The root cause of the NFS hang we were observing appears to be a rare
deadlock between the kernel provided usermodehelper API and the linux NFS
client. The deadlock can arise because both of these services use the
generic linux work queues. The usermodehelper API runs the specified user
application in the context of the work queue. And NFS submits both cleanup
and reconnect work to the generic work queue for handling. Normally this
is fine but a deadlock can result in the following situation.

- NFS client is in a disconnected state
- [events/0] runs a usermodehelper app with an NFS dependent operation,
this triggers an NFS reconnect.
- NFS reconnect happens to be submitted to [events/0] work queue.
- Deadlock, the [events/0] work queue will never process the
reconnect because it is blocked on the previous NFS dependent
operation which will not complete.

The solution is simply to run reconnect requests on rpciod.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
a509050bd3b8e0aa269c2241aa10d74ca7701e2f 29-Mar-2007 Chuck Lever <chuck.lever@oracle.com> SUNRPC: introduce rpcbind: replacement for in-kernel portmapper

Introduce a replacement for the in-kernel portmapper client that supports
all 3 versions of the rpcbind protocol. This code is not used yet.

Original code by Groupe Bull updated for the latest kernel, with multiple
bug fixes.

Note that rpcb_clnt.c does not yet support registering via versions 3 and
4 of the rpcbind protocol. That is planned for a later patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
c5a4dd8b7c15927a8fbff83171b57cad675a79b9 29-Mar-2007 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Eliminate side effects from rpc_malloc

Currently rpc_malloc sets req->rq_buffer internally. Make this a more
generic interface: return a pointer to the new buffer (or NULL) and
make the caller set req->rq_buffer and req->rq_bufsize. This looks much
more like kmalloc and eliminates the side effects.

To fix a potential deadlock, this patch also replaces GFP_NOFS with
GFP_NOWAIT in rpc_malloc. This prevents async RPCs from sleeping outside
the RPC's task scheduler while allocating their buffer.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
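
The interface change (allocator returns a pointer, caller updates the request) looks roughly like this in user space. The mock_* names and struct mock_rqst are hypothetical, and plain malloc() stands in for the kernel allocator, so the GFP flag change is not modelled.

```c
#include <stdlib.h>

struct mock_rqst {
	void *rq_buffer;
	size_t rq_bufsize;
};

/* kmalloc-like: returns the new buffer or NULL and touches nothing else. */
static void *mock_rpc_malloc(size_t size)
{
	return malloc(size);
}

/* The caller, not the allocator, updates the request. Returns 0 or -1. */
static int mock_alloc_rqst_buffer(struct mock_rqst *req, size_t size)
{
	void *buf = mock_rpc_malloc(size);

	if (buf == NULL)
		return -1;
	req->rq_buffer = buf;
	req->rq_bufsize = size;
	return 0;
}
```

Keeping the allocator free of side effects means it can be swapped out without knowing anything about the request structure.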
2bea90d43a050bbc4021d44e59beb34f384438db 29-Mar-2007 Chuck Lever <chuck.lever@oracle.com> SUNRPC: RPC buffer size estimates are too large

The RPC buffer size estimation logic in net/sunrpc/clnt.c always
significantly overestimates the requirements for the buffer size.
A little instrumentation demonstrated that in fact rpc_malloc was never
allocating the buffer from the mempool, but almost always called kmalloc.

To compute the size of the RPC buffer more precisely, split p_bufsiz into
two fields; one for the argument size, and one for the result size.

Then, compute the sum of the exact call and reply header sizes, and split
the RPC buffer precisely between the two. That should keep almost all RPC
buffers within the 2KiB buffer mempool limit.

And, we can finally be rid of RPC_SLACK_SPACE!

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
241c39b9ac4bf847013aa06cce6d4d61426a2006 20-Apr-2007 Trond Myklebust <Trond.Myklebust@netapp.com> RPC: Fix the TCP resend semantics for NFSv4

Fix a regression due to the patch "NFS: disconnect before retrying NFSv4
requests over TCP"

The assumption made in xprt_transmit() that the condition
"req->rq_bytes_sent == 0 and request is on the receive list"
should imply that we're dealing with a retransmission is false.
Firstly, it may simply happen that the socket send queue was full
at the time the request was initially sent through xprt_transmit().
Secondly, doing this for each request that was retransmitted implies
that we disconnect and reconnect for _every_ request that happened to
be retransmitted irrespective of whether or not a disconnection has
already occurred.

Fix is to move this logic into the call_status request timeout handler.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
43d78ef2ba5bec26d0315859e8324bfc0be23766 07-Feb-2007 Chuck Lever <chuck.lever@oracle.com> NFS: disconnect before retrying NFSv4 requests over TCP

RFC3530 section 3.1.1 states an NFSv4 client MUST NOT send a request
twice on the same connection unless it is the NULL procedure. Section
3.1.1 suggests that the client should disconnect and reconnect if it
wants to retry a request.

Implement this by adding an rpc_clnt flag that an ULP can use to
specify that the underlying transport should be disconnected on a
major timeout. The NFSv4 client asserts this new flag, and requests
no retries after a minor retransmit timeout.

Note that disconnecting on a retransmit is in general not safe to do
if the RPC client does not reuse the TCP port number when reconnecting.

See http://bugzilla.linux-nfs.org/show_bug.cgi?id=6

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
cca5172a7ec10dfdb0b787cd8e9d5b0b8f179793 10-Feb-2007 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [NET] SUNRPC: Fix whitespace errors.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
46121cf7d85869bfe9588bac7ccf55aa0bc7f278 31-Jan-2007 Chuck Lever <chuck.lever@oracle.com> SUNRPC: fix print format for tk_pid

The tk_pid field is an unsigned short. The proper print format specifier for
that type is %5u, not %4d.

Also clean up some miscellaneous print formatting nits.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
5847e1f4d058677c5e46dc6c3e3c70e8855ea3ba 05-Dec-2006 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Remove pprintk() from net/sunrpc/xprt.c

These appear to be deprecated. Removing them also gets rid of some sparse
noise.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
c8541ecdd5692bcfbcb5305cab9a873288d29175 17-Oct-2006 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Make the transport-specific setup routine allocate rpc_xprt

Change the location where the rpc_xprt structure is allocated so each
transport implementation can allocate a private area from the same
chunk of memory.

Note also that xprt->ops->destroy, rather than xprt_destroy, is now
responsible for freeing rpc_xprt when the transport is destroyed.

Test plan:
Connectathon.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
65f27f38446e1976cc98fd3004b110fedcddd189 22-Nov-2006 David Howells <dhowells@redhat.com> WorkStruct: Pass the work_struct pointer instead of context data

Pass the work_struct pointer to the work function rather than context data.
The work function can use container_of() to work out the data.

For the cases where the container of the work_struct may go away the moment the
pending bit is cleared, it is made possible to defer the release of the
structure by deferring the clearing of the pending bit.

To make this work, an extra flag is introduced into the management side of the
work_struct. This governs auto-release of the structure upon execution.

Ordinarily, the work queue executor would release the work_struct for further
scheduling or deallocation by clearing the pending bit prior to jumping to the
work function. This means that, unless the driver makes some guarantee itself
that the work_struct won't go away, the work function may not access anything
else in the work_struct or its container lest they be deallocated. This is a
problem if the auxiliary data is taken away (as done by the last patch).

However, if the pending bit is *not* cleared before jumping to the work
function, then the work function *may* access the work_struct and its container
with no problems. But then the work function must itself release the
work_struct by calling work_release().

In most cases, automatic release is fine, so this is the default. Special
initiators exist for the non-auto-release case (ending in _NAR).


Signed-Off-By: David Howells <dhowells@redhat.com>
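
The container_of() technique mentioned above can be tried outside the kernel. The macro below is a simplified user-space version built on offsetof(), and struct mock_work, struct mock_xprt and mock_autoclose() are invented for the sketch.

```c
#include <stddef.h>

/* Simplified user-space container_of: step back from a member to its parent. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct mock_work {
	void (*func)(struct mock_work *work);
};

struct mock_xprt {
	int closed;
	struct mock_work task_cleanup;	/* embedded, not pointed to */
};

/* The work function receives only the work item and recovers its container. */
static void mock_autoclose(struct mock_work *work)
{
	struct mock_xprt *xprt =
		container_of(work, struct mock_xprt, task_cleanup);

	xprt->closed = 1;
}
```

Since the work item is embedded in its owner, no separate context pointer has to be stored alongside the function pointer.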
d8ed029d6000ba2e2908d9286409e4833c091b4c 27-Sep-2006 Alexey Dobriyan <adobriyan@gmail.com> [SUNRPC]: trivial endianness annotations

pure s/u32/__be32/

[AV: large part based on Alexey's patches]

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
6b6ca86b77b62b798cf9ca2599036420abce7796 05-Sep-2006 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Add refcounting to the struct rpc_xprt

In a subsequent patch, this will allow the portmapper to take a reference
to the rpc_xprt for which it is updating the port number, fixing an Oops.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
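
Reference counting of this sort follows a familiar get/put pattern. The sketch uses C11 atomics and hypothetical mock_xprt_* names; it is not the kernel's implementation, and it only sets a flag where a real transport would be torn down.

```c
#include <stdatomic.h>

struct mock_xprt {
	atomic_int refcount;
	int destroyed;
};

/* A new transport starts with one reference, held by its creator. */
static void mock_xprt_init(struct mock_xprt *x)
{
	atomic_init(&x->refcount, 1);
	x->destroyed = 0;
}

/* Take an extra reference, e.g. while a port lookup is in flight. */
static struct mock_xprt *mock_xprt_get(struct mock_xprt *x)
{
	atomic_fetch_add(&x->refcount, 1);
	return x;
}

/* Drop a reference; the last one out triggers destruction. */
static void mock_xprt_put(struct mock_xprt *x)
{
	if (atomic_fetch_sub(&x->refcount, 1) == 1)
		x->destroyed = 1;	/* a real transport would be freed here */
}
```

A second holder that has called mock_xprt_get() keeps the structure alive even after the creator drops its own reference.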
da45828e2835057045150b318c4fbe9bb91f18dd 31-Aug-2006 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Clean up soft task error handling

- Ensure that the task aborts the RPC call only when it has actually timed out.
- Ensure that req->rq_majortimeo is initialised correctly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
ff9aa5e56df60cc8565a93cc868fe25ae3f20e49 23-Aug-2006 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Eliminate xprt_create_proto and rpc_create_client

The two function call API for creating a new RPC client is now obsolete.
Remove it.

Also, remove an unnecessary check to see whether the caller is capable of
using privileged network services. The kernel RPC client always uses a
privileged ephemeral port by default; callers are responsible for checking
the authority of users to make use of any RPC service, or for specifying
that a nonprivileged port is acceptable.

Test plan:
Repeated runs of Connectathon locking suite. Check network trace to ensure
correctness of NLM requests and replies.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
c2866763b4029411d166040306691773c12d4caf 23-Aug-2006 Chuck Lever <chuck.lever@oracle.com> SUNRPC: use sockaddr + size when creating remote transport endpoints

Prepare for more generic transport endpoint handling needed by transports
that might use different forms of addressing, such as IPv6.

Introduce a single function call to replace the two-call
xprt_create_proto/rpc_create_client API. Define a new rpc_create_args
structure that allows callers to pass in remote endpoint addresses of
varying length.

Test-plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
c4efcb1d3e0bc76aeb9ca6301d19a5079893c6c9 23-Aug-2006 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Use "sockaddr_storage" for storing RPC client's remote peer address

IPv6 addresses are big (128 bits). Now that all RPC client consumers treat
the addr field in rpc_xprt structs as opaque, and access it only via the
API calls, we can safely widen the field in the rpc_xprt struct to
accommodate larger addresses.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
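
Storing a peer address of any family in a sockaddr_storage plus an explicit length can be sketched with the POSIX socket headers. struct mock_xprt and mock_set_peer() are hypothetical names used only for illustration.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

struct mock_xprt {
	struct sockaddr_storage addr;	/* large enough for any address family */
	size_t addrlen;			/* how much of addr is actually in use */
};

/* Returns 0 on success, -1 if the address does not fit. */
static int mock_set_peer(struct mock_xprt *x, const struct sockaddr *sap,
			 size_t len)
{
	if (len > sizeof(x->addr))
		return -1;
	memcpy(&x->addr, sap, len);
	x->addrlen = len;
	return 0;
}
```

Callers then read the family from addr.ss_family and cast to the matching sockaddr type, rather than assuming an IPv4 layout.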
4a68179d38874c37be2802442a71b847f5d1a2a9 23-Aug-2006 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Make RPC portmapper use per-transport storage

Move connection and bind state that was maintained in the rpc_clnt
structure to the rpc_xprt structure. This will allow the creation of
a clean API for plugging in different types of bind mechanisms.

This brings improvements such as the elimination of a single spin lock to
control serialization for all in-kernel RPC binding. A set of per-xprt
bitops is used to serialize tasks during RPC binding, just like it now
works for making RPC transport connections.

Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP. NFSv2/3 and NFSv4 mounting should be carefully checked.
Probably need to rig a server where certain services aren't running, or
that returns an error for some typical operation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
ec739ef03dc926d05051c8c5838971445504470a 23-Aug-2006 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Create a helper to tell whether a transport is bound

Hide the contents and format of xprt->addr by eliminating direct uses
of the xprt->addr.sin_port field. This change is required to support
alternate RPC host address formats (eg IPv6).

Test-plan:
Destructive testing (unplugging the network temporarily). Repeated runs of
Connectathon locking suite with UDP and TCP.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
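
One way such a helper can hide the address format is to track "bound" as a state bit instead of testing the port field directly. That choice is an assumption of this sketch, as are the MOCK_XPRT_BOUND flag and the mock_* function names.

```c
#define MOCK_XPRT_BOUND (1UL << 0)	/* hypothetical state bit */

struct mock_xprt {
	unsigned long state;
};

/* Callers ask this question instead of peeking at a port number. */
static int mock_xprt_bound(const struct mock_xprt *x)
{
	return (x->state & MOCK_XPRT_BOUND) != 0;
}

static void mock_xprt_set_bound(struct mock_xprt *x)
{
	x->state |= MOCK_XPRT_BOUND;
}

static void mock_xprt_clear_bound(struct mock_xprt *x)
{
	x->state &= ~MOCK_XPRT_BOUND;
}
```

Because no caller inspects the address itself, the stored format can change (for example to hold IPv6 addresses) without touching the callers.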
e0ab53deaa91293a7958d63d5a2cf4c5645ad6f0 27-Jul-2006 Trond Myklebust <Trond.Myklebust@netapp.com> RPC: Ensure that we disconnect TCP socket when client requests error out

If we're part way through transmitting a TCP request, and the client
errors, then we need to disconnect and reconnect the TCP socket in order to
avoid confusing the server.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
(cherry picked from 031a50c8b9ea82616abd4a4e18021a25848941ce commit)
0da974f4f303a6842516b764507e3c0a03f41e5a 21-Jul-2006 Panagiotis Issaris <takis@issaris.org> [NET]: Conversions from kmalloc+memset to k(z|c)alloc.

Signed-off-by: Panagiotis Issaris <takis@issaris.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
bf3fcf89552f24657bcfb6a9d73cd167ebb496c6 25-May-2006 Chuck Lever <cel@netapp.com> SUNRPC: NFS_ROOT always uses the same XIDs

The XID generator uses get_random_bytes to generate an initial XID.
NFS_ROOT starts up before the random driver, though, so get_random_bytes
doesn't set a random XID for NFS_ROOT. This causes NFS_ROOT mount points
to reuse XIDs every time the client is booted. If the client boots often
enough, the server will start serving old replies out of its DRC.

Use net_random() instead.

Test plan:
I/O intensive workloads should perform well and generate no errors. Traces
taken during client reboots should show that NFS_ROOT mounts use unique
XIDs after every reboot.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
43ac3f2961b8616da26114ec6dc76ac2a61f76ad 20-Mar-2006 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Fix memory barriers for req->rq_received

We need to ensure that all writes to the XDR buffers are done before
req->rq_received is visible to other processors.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
e95b85ec9d8c8ad4667f746aa4c9d22c281efc44 20-Mar-2006 Chuck Lever <cel@netapp.com> SUNRPC: minor cleanup

RPC_DEBUG_DATA no longer needed in net/sunrpc/xprt.c.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
11c556b3d8d481829ab5f9933a25d29b00913b5a 20-Mar-2006 Chuck Lever <cel@netapp.com> SUNRPC: provide a mechanism for collecting stats in the RPC client

Add a simple mechanism for collecting stats in the RPC client. Stats are
tabulated during xprt_release. Note that per_cpu shenanigans are not
required here because the RPC client already serializes on the transport
write lock.

Test plan:
Compile kernel with CONFIG_NFS enabled. Basic performance regression
testing with high-speed networking and high performance server.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
ef759a2e54ed434b2f72b52a14edecd6d4eadf74 20-Mar-2006 Chuck Lever <cel@netapp.com> SUNRPC: introduce per-task RPC iostats

Account for various things that occur while an RPC task is executed.
Separate timers for RPC round trip and RPC execution time show how
long RPC requests wait in queue before being sent. Eventually these
will be accumulated at xprt_release time in one place where they can
be viewed from userland.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
262ca07de4d7f1bff20361c1353bb14b3607afb2 20-Mar-2006 Chuck Lever <cel@netapp.com> SUNRPC: add a handful of per-xprt counters

Monitor generic transport events. Add a transport switch callout to
format transport counters for export to user-land.

Test plan:
Compile kernel with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
0065db328533c390fbfb0fe0c46bcf9a278fb99e 03-Jan-2006 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Clean up xprt_destroy()

We ought never to be calling xprt_destroy() if there are still active
rpc_tasks. Optimise away the broken code that attempts to "fix" that case.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
632e3bdc5006334cea894d078660b691685e1075 03-Jan-2006 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: Ensure client closes the socket when server initiates a close

If the server decides to close the RPC socket, we currently don't actually
respond until either another RPC call is scheduled, or until xprt_autoclose()
gets called by the socket expiry timer (which may be up to 5 minutes
later).

This patch ensures that xprt_autoclose() is called much sooner if the
server closes the socket.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
02107148349f31eee7c0fb06fd7a880df73dbd20 03-Jan-2006 Chuck Lever <cel@netapp.com> SUNRPC: switchable buffer allocation

Add RPC client transport switch support for replacing buffer management
on a per-transport basis.

In the current IPv4 socket transport implementation, RPC buffers are
allocated as needed for each RPC message that is sent. Some transport
implementations may choose to use pre-allocated buffers for encoding,
sending, receiving, and unmarshalling RPC messages, however. For
transports capable of direct data placement, the buffers can be carved
out of a pre-registered area of memory rather than from a slab cache.
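
A minimal userspace sketch of the idea, assuming a per-transport method
table with allocate and free hooks. All names here (demo_xprt_ops,
demo_pool_alloc, and so on) are hypothetical and are not taken from
xprt.c; the fixed-slot pool only stands in for a pre-registered memory
area.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-transport method table. */
struct demo_xprt_ops {
	void *(*buf_alloc)(size_t size);
	void  (*buf_free)(void *buf);
};

/* Strategy 1: allocate a fresh buffer for every message. */
static void *demo_heap_alloc(size_t size)
{
	return malloc(size);
}

static void demo_heap_free(void *buf)
{
	free(buf);
}

/* Strategy 2: carve fixed-size slots out of a pre-allocated region. */
#define DEMO_SLOT_SIZE  1024
#define DEMO_SLOT_COUNT 4

static unsigned char demo_pool[DEMO_SLOT_COUNT][DEMO_SLOT_SIZE];
static int demo_pool_used[DEMO_SLOT_COUNT];

static void *demo_pool_alloc(size_t size)
{
	int i;

	if (size > DEMO_SLOT_SIZE)
		return NULL;	/* request does not fit in a slot */
	for (i = 0; i < DEMO_SLOT_COUNT; i++) {
		if (!demo_pool_used[i]) {
			demo_pool_used[i] = 1;
			return demo_pool[i];
		}
	}
	return NULL;		/* every slot is in use */
}

static void demo_pool_free(void *buf)
{
	int i;

	for (i = 0; i < DEMO_SLOT_COUNT; i++) {
		if (buf == (void *)demo_pool[i])
			demo_pool_used[i] = 0;
	}
}

/* Each transport picks the table that matches its buffer strategy. */
static const struct demo_xprt_ops demo_heap_ops = {
	.buf_alloc = demo_heap_alloc,
	.buf_free  = demo_heap_free,
};

static const struct demo_xprt_ops demo_pool_ops = {
	.buf_alloc = demo_pool_alloc,
	.buf_free  = demo_pool_free,
};
```

Generic code in this sketch would call ops->buf_alloc() and
ops->buf_free() without knowing which strategy the transport chose.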

Test-plan:
Millions of fsx operations. Performance characterization with "sio" and
"iozone". Use oprofile and other tools to look for significant regression
in CPU utilization.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
ead5e1c26fdcd969cf40c49cb0589d56879d240d 13-Oct-2005 J. Bruce Fields <bfields@fieldses.org> SUNRPC: Provide a callback to allow free pages allocated during xdr encoding

For privacy, we need to allocate pages to store the encrypted data (the
passed-in pages can't be used without the risk of corrupting data in the
page cache). So we need a way to free that memory after the request has
been transmitted.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
5e5ce5be6f0161d2a069a4f8a1154fe639c5c02f 18-Oct-2005 Trond Myklebust <Trond.Myklebust@netapp.com> RPC: allow call_encode() to delay transmission of an RPC call.

Currently, call_encode will cause the entire RPC call to abort if it returns
an error. This is unnecessarily rigid, and gets in the way of attempts
to allow the NFSv4 layer to order RPC calls that carry sequence ids.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
03bf4b707eee06706c9db343dd5c905b7ee47ed2 26-Aug-2005 Chuck Lever <cel@netapp.com> [PATCH] RPC: parametrize various transport connect timeouts

Each transport implementation can now set unique bind, connect,
reestablishment, and idle timeout values. These are variables,
allowing the values to be modified dynamically. This permits
exponential backoff of any of these values, for instance.

As an example, we implement exponential backoff for the connection
reestablishment timeout.
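
A minimal sketch of exponential backoff for a reconnect delay, assuming
the delay doubles after each failed attempt and is capped at a maximum.
The function name and parameters are hypothetical; the units are
whatever the caller uses (seconds, jiffies, ...).

```c
/*
 * Return the delay to use for the next reconnect attempt.
 * current == 0 means "no attempt has failed yet", so start at init.
 * Otherwise double the delay, never exceeding max.
 */
static unsigned long demo_next_reestablish_timeout(unsigned long current,
						   unsigned long init,
						   unsigned long max)
{
	if (current == 0)
		return init < max ? init : max;
	if (current > max / 2)
		return max;	/* doubling would overshoot the cap */
	return current * 2;
}
```

Because the delay is a variable rather than a constant, a successful
connect can reset it by passing 0 as the current value again.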

Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
555ee3af161b037865793bd4bebc06b58daafde6 26-Aug-2005 Chuck Lever <cel@netapp.com> [PATCH] RPC: clean up after nocong was removed

Clean-up: Move some macros that are specific to the Van Jacobson
implementation into xprt.c. Get rid of the cong_wait field in
rpc_xprt, which is no longer used. Get rid of xprt_clear_backlog.

Test-plan:
Compile with CONFIG_NFS enabled.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
a58dd398f5db4f73d5c581069fd70a4304cc4f0a 26-Aug-2005 Chuck Lever <cel@netapp.com> [PATCH] RPC: add a release_rqst callout to the RPC transport switch

The final place where congestion control state is adjusted is in
xprt_release, where each request is finally released. Add a callout
there to allow transports to perform additional processing when a
request is about to be released.

Test-plan:
Use WAN simulation to cause sporadic bursty packet loss. Look for significant
regression in performance or client stability.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
1570c1e41eabf6b7031f3e4322a2cf1cbe319fee 26-Aug-2005 Chuck Lever <cel@netapp.com> [PATCH] RPC: add generic interface for adjusting the congestion window

A new interface that allows transports to adjust their congestion window
using the Van Jacobson implementation in xprt.c is provided.
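
A simplified sketch of an additive-increase, multiplicative-decrease
window adjustment in the Van Jacobson style. This is an illustration
only, not the code in xprt.c: the names, the scale factor of 256 per
request slot, and the exact rounding are assumptions.

```c
/* Assumed fixed-point scale: 256 units of window equal one request slot. */
#define DEMO_CWND_SCALE 256UL

/*
 * Adjust the congestion window after a request completes.
 * timed_out != 0: halve the window, but keep at least one slot.
 * timed_out == 0: grow by roughly one slot per window's worth of replies.
 */
static unsigned long demo_adjust_cwnd(unsigned long cwnd,
				      unsigned long max_cwnd,
				      int timed_out)
{
	if (cwnd < DEMO_CWND_SCALE)
		cwnd = DEMO_CWND_SCALE;	/* also avoids dividing by zero */

	if (timed_out) {
		cwnd /= 2;
		if (cwnd < DEMO_CWND_SCALE)
			cwnd = DEMO_CWND_SCALE;
	} else {
		cwnd += (DEMO_CWND_SCALE * DEMO_CWND_SCALE + cwnd / 2) / cwnd;
		if (cwnd > max_cwnd)
			cwnd = max_cwnd;
	}
	return cwnd;
}
```

A transport that wants this behaviour would call the helper once per
completed request; a transport that does its own congestion control
would simply never call it.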

Test-plan:
Use WAN simulation to cause sporadic bursty packet loss. Look for
significant regression in performance or client stability.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
46c0ee8bc4ad3743de05e8b8b20201df44dcb6d3 26-Aug-2005 Chuck Lever <cel@netapp.com> [PATCH] RPC: separate xprt_timer implementations

Allow transports to hook the retransmit timer interrupt. Some transports
calculate their congestion window here so that a retransmit timeout has
immediate effect on the congestion window.

Test-plan:
Use WAN simulation to cause sporadic bursty packet loss. Look for significant
regression in performance or client stability.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
49e9a89086b3cae784a4868ca852863e4f4ea3fe 26-Aug-2005 Chuck Lever <cel@netapp.com> [PATCH] RPC: expose API for serializing access to RPC transports

The next method we abstract is the one that releases a transport,
allowing another task to have access to the transport.

Again, one generic version of this is provided for transports that
don't need the RPC client to perform congestion control, and one
version is for transports that can use the original Van Jacobson
implementation in xprt.c.

Test-plan:
Use WAN simulation to cause sporadic bursty packet loss. Look for
significant regression in performance or client stability.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12a804698b29d040b7cdd92e8a44b0e75164dae9 26-Aug-2005 Chuck Lever <cel@netapp.com> [PATCH] RPC: expose API for serializing access to RPC transports

The next several patches introduce an API that allows transports to
choose whether the RPC client provides congestion control or whether
the transport itself provides it.

The first method we abstract is the one that serializes access to the
RPC transport to prevent the bytes from different requests from mingling
together. This method provides proper request serialization and the
opportunity to prevent new requests from being started because the
transport is congested.

The normal situation is for the transport to handle congestion control
itself. Although NFS over UDP was first, it has been recognized after
years of experience that having the transport provide congestion control
is much better than doing it in the RPC client. Thus TCP, and probably
every future transport implementation, will use the default method,
xprt_lock_write, provided in xprt.c, which does not provide any kind
of congestion control. UDP can continue using the xprt.c-provided
Van Jacobson congestion avoidance implementation.
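
A rough userspace sketch of the two serialization methods, assuming the
transport selects one through a method table. The structure layout and
every demo_* name are hypothetical; real code would queue the losing
task instead of just returning false.

```c
#include <stdbool.h>

struct demo_xprt;

/* Hypothetical method table: each transport picks a reserve method. */
struct demo_xprt_ops {
	bool (*reserve_xprt)(struct demo_xprt *xprt, int task_id);
};

struct demo_xprt {
	const struct demo_xprt_ops *ops;
	int owner;		/* task holding the write lock, 0 if none */
	unsigned long cong;	/* requests currently outstanding */
	unsigned long cwnd;	/* congestion window, in requests */
};

/* Default method: serialize senders, no congestion control. */
static bool demo_reserve(struct demo_xprt *xprt, int task_id)
{
	if (xprt->owner != 0)
		return xprt->owner == task_id;
	xprt->owner = task_id;
	return true;
}

/* Congestion-aware method: also refuse new requests when the window is full. */
static bool demo_reserve_cong(struct demo_xprt *xprt, int task_id)
{
	if (xprt->owner != 0)
		return xprt->owner == task_id;
	if (xprt->cong >= xprt->cwnd)
		return false;	/* transport is congested */
	xprt->owner = task_id;
	xprt->cong++;
	return true;
}

/* Drop the write lock so another task can send. */
static void demo_release(struct demo_xprt *xprt, int task_id)
{
	if (xprt->owner == task_id)
		xprt->owner = 0;
}

/* A stream transport would use the first table, a datagram one the second. */
static const struct demo_xprt_ops demo_stream_ops = {
	.reserve_xprt = demo_reserve,
};

static const struct demo_xprt_ops demo_dgram_ops = {
	.reserve_xprt = demo_reserve_cong,
};
```

Callers in this sketch always go through xprt->ops->reserve_xprt(), so
the choice of congestion control stays inside the transport.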

Test-plan:
Use WAN simulation to cause sporadic bursty packet loss. Look for significant
regression in performance or client stability.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
fe3aca290f17ae4978bd73d02aa4029f1c9c024c 26-Aug-2005 Chuck Lever <cel@netapp.com> [PATCH] RPC: add API to set transport-specific timeouts

Prepare the way to remove the "xprt->nocong" variable by adding a callout
to the RPC client transport switch API to handle setting RPC retransmit
timeouts.

Add a pair of generic helper functions that provide the ability to set a
simple fixed timeout, or to set a timeout based on the state of a round-
trip estimator.
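
A minimal sketch of the two kinds of helper, assuming the estimator
keeps a smoothed round-trip time and a variation term. The names are
hypothetical, and the "srtt + 4 * rttvar" formula is the classic
Jacobson-style rule used here as an assumption, not a statement about
what xprt.c computes.

```c
/* Hypothetical round-trip estimator state, in arbitrary time units. */
struct demo_rtt_est {
	unsigned long srtt;	/* smoothed round-trip time */
	unsigned long rttvar;	/* smoothed round-trip variation */
};

/* Helper 1: a simple fixed retransmit timeout. */
static unsigned long demo_timeout_fixed(unsigned long initial)
{
	return initial;
}

/* Helper 2: derive the timeout from the estimator, clamped to [min, max]. */
static unsigned long demo_timeout_from_rtt(const struct demo_rtt_est *est,
					   unsigned long min,
					   unsigned long max)
{
	unsigned long timeout = est->srtt + 4 * est->rttvar;

	if (timeout < min)
		timeout = min;
	if (timeout > max)
		timeout = max;
	return timeout;
}
```

A transport with its own reliable delivery would likely pick the fixed
helper, while a datagram transport could use the estimator-based one.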

Test-plan:
Use WAN simulation to cause sporadic bursty packet loss. Look for significant
regression in performance or client stability.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
43118c29dea2b23798bd42a147015cceee7fa885 26-Aug-2005 Chuck Lever <cel@netapp.com> [PATCH] RPC: get rid of xprt->stream

Now we can fix up the last few places that use the "xprt->stream"
variable, and get rid of it from the rpc_xprt structure.

Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
c7b2cae8a634015b72941ba2fc6c4bc9b8d3a129 11-Aug-2005 Chuck Lever <cel@citi.umich.edu> [PATCH] RPC: separate TCP and UDP write space callbacks

Split the socket write space callback function into a TCP version and UDP
version, eliminating one dependence on the "xprt->stream" variable.

Keep the common pieces of this path in xprt.c so other transports can use
it too.

Test-plan:
Write-intensive workload on a single mount point.

Version: Thu, 11 Aug 2005 16:07:51 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
55aa4f58aa43dc9a51fb80010630d94b96053a2e 11-Aug-2005 Chuck Lever <cel@citi.umich.edu> [PATCH] RPC: client-side transport switch cleanup

Clean-up: change some comments to reflect the realities of the new RPC
transport switch mechanism. Get rid of unused xprt_receive() prototype.

Also, organize function prototypes in xprt.h by usage and scope.

Test-plan:
Compile kernel with CONFIG_NFS enabled.

Version: Thu, 11 Aug 2005 16:07:21 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
44fbac2288dfed6f1963ac00bf922c3bcd779cd1 11-Aug-2005 Chuck Lever <cel@citi.umich.edu> [PATCH] RPC: Add helper for waking tasks pending on a transport

Clean-up: remove only reference to xprt->pending from the socket transport
implementation. This makes a cleaner interface for other transport
implementations as well.

Test-plan:
Compile kernel with CONFIG_NFS enabled.

Version: Thu, 11 Aug 2005 16:06:52 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2226feb6bcd0e5e117a9be3ea3dd3ffc14f3e41e 11-Aug-2005 Chuck Lever <cel@citi.umich.edu> [PATCH] RPC: rename the sockstate field

Clean-up: get rid of a name reference to sockets in the generic parts of the
RPC client by renaming the sockstate field in the rpc_xprt structure.

Test-plan:
Compile kernel with CONFIG_NFS enabled.

Version: Thu, 11 Aug 2005 16:05:53 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
5dc07727f86b25851e95193a0c484ea21b531c47 11-Aug-2005 Chuck Lever <cel@citi.umich.edu> [PATCH] RPC: Rename xprt_lock

Clean-up: Replace the xprt_lock with something more aptly named. This lock
single-threads the XID and request slot reservation process.

Test-plan:
Compile kernel with CONFIG_NFS enabled.

Version: Thu, 11 Aug 2005 16:05:26 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
4a0f8c04f2ece949d54a0c4fd7490259cf23a58a 11-Aug-2005 Chuck Lever <cel@citi.umich.edu> [PATCH] RPC: Rename sock_lock

Clean-up: replace a name reference to sockets in the generic parts of the RPC
client by renaming sock_lock in the rpc_xprt structure.

Test-plan:
Compile kernel with CONFIG_NFS enabled.

Version: Thu, 11 Aug 2005 16:05:00 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
9903cd1c27a1f30e8efea75e125be3b2002f7cb9 11-Aug-2005 Chuck Lever <cel@citi.umich.edu> [PATCH] RPC: transport switch function naming

Introduce block header comments and a function naming convention to the
socket transport implementation. Provide a debug setting for transports
that is separate from RPCDBG_XPRT. Eliminate xprt_default_timeout().

Provide block comments for exposed interfaces in xprt.c, and eliminate
the useless obvious comments.

Convert printk's to dprintk's.

Test-plan:
Compile kernel with CONFIG_NFS enabled.

Version: Thu, 11 Aug 2005 16:04:04 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
a246b0105bbd9a70a698f69baae2042996f2a0e9 11-Aug-2005 Chuck Lever <cel@citi.umich.edu> [PATCH] RPC: introduce client-side transport switch

Move the bulk of client-side socket-specific code into a separate source
file, net/sunrpc/xprtsock.c.

Test-plan:
Millions of fsx operations. Performance characterization such as "sio" or
"iozone". Destructive testing (unplugging the network temporarily, server
reboots). Connectathon with v2, v3, and v4.

Version: Thu, 11 Aug 2005 16:03:38 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
094bb20b9fcab3a1652a77741caba6b78097d622 11-Aug-2005 Chuck Lever <cel@citi.umich.edu> [PATCH] RPC: extract socket logic common to both client and server

Clean-up: Move some code that is common to both RPC client- and server-side
socket transports into its own source file, net/sunrpc/socklib.c.

Test-plan:
Compile kernel with CONFIG_NFS enabled. Millions of fsx operations over
UDP, client and server. Connectathon over UDP.

Version: Thu, 11 Aug 2005 16:03:09 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
eab5c084b858fd95a873fc2b97de9a9ad937b4ed 11-Aug-2005 Chuck Lever <cel@citi.umich.edu> [PATCH] NFS: use a constant value for TCP retransmit timeouts

Implement a best practice: don't use exponential backoff when computing
retransmit timeout values on TCP connections, but simply retransmit
at regular intervals.

This also fixes a bug introduced when xprt_reset_majortimeo() was added.

Test-plan:
Enable RPC debugging and watch timeout behavior on a NFS/TCP mount.

Version: Thu, 11 Aug 2005 16:02:19 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
23475d66bd8600e0c5353f86c1b74f68df27bdb5 11-Aug-2005 Chuck Lever <cel@citi.umich.edu> [PATCH] RPC: Report connection errors properly when mounting with "soft"

Fix up xprt_connect_status: the soft timeout logic was clobbering tk_status,
so TCP connect errors were not properly reported on soft mounts.

Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP.

Version: Thu, 11 Aug 2005 16:01:28 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
7e8d7e3c9e38dab8d28a8667faa4941842f64213 08-Jul-2005 KAMBAROV, ZAUR <kambarov@berkeley.edu> [PATCH] coverity: sunrpc/xprt task null check

In __xprt_lock_write() we check to see if `task' is NULL, but in other places
we just go and dereference it.

`task' shouldn't be NULL anyway, so remove this test.

This defect was found automatically by Coverity Prevent, a static analysis
tool.

Signed-off-by: Zaur Kambarov <zkambarov@coverity.com>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
c54d7e03c3a21b38c587f671704c5a12aa3987fc 25-Jun-2005 David S. Miller <davem@davemloft.net> [SUNRPC]: Fix {s,}size_t printf format strings in xprt.c

Signed-off-by: David S. Miller <davem@davemloft.net>
ae3884621bf5b4caff7785b9a417f262202965b2 22-Jun-2005 Chuck Lever <cel@netapp.com> [PATCH] RPC: kick off socket connect operations faster

Make the socket transport kick the event queue to start socket connects
immediately. This should improve responsiveness of applications that are
sensitive to slow mount operations (like automounters).

We are now also careful to cancel the connect worker before destroying
the xprt. This eliminates a race where xprt_destroy can finish before
the connect worker is even allowed to run.

Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP. Hard-code impossibly small connect timeout.

Version: Fri, 29 Apr 2005 15:32:01 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
20e5ac828dfd23b9080159c62a34f32d2dcd92fc 22-Jun-2005 Chuck Lever <cel@netapp.com> [PATCH] RPC: TCP reconnects are too slow

When the network layer reports a connection close, the RPC task
waiting to reconnect should be notified so it can retry immediately
instead of waiting for the normal connection establishment timeout.

This reverts a change made in 2.6.6 as part of adding client support
for RPC over TCP socket idle timeouts.

Test-plan:
Destructive testing with NFS over TCP mounts.

Version: Fri, 29 Apr 2005 15:31:46 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
0f9dc2b16884bb5957d010ed8e9114e771a05916 22-Jun-2005 Trond Myklebust <Trond.Myklebust@netapp.com> [PATCH] RPC: Clean up socket autodisconnect

Cancel autodisconnect requests inside xprt_transmit() in order to avoid
races.
Use the more efficient del_singleshot_timer_sync().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
7e06b53d796a3740307b54aa2799077f8a0c84e7 22-Jun-2005 Trond Myklebust <Trond.Myklebust@netapp.com> [PATCH] RPC: fix accounting bug in the case of a truncated RPC message

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
e053d1ab62c8ef0eff3dd4c95448cad3c6d2fbf4 22-Jun-2005 Olaf Kirch <okir@suse.de> [PATCH] RPC: Lazy RPC receive buffer allocation

Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 17-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org> Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!