History log of /include/linux/sunrpc/svc.h
Revision Date Author Comments
983c684466e02b21f83c025ea539deee6c0aeac0 03-Aug-2014 Trond Myklebust <trond.myklebust@primarydata.com> SUNRPC: get rid of the request wait queue

We're always _only_ waking up tasks from within the sp_threads list, so
we know that they are enqueued and alive. The rq_wait waitqueue is just
a distraction with extra atomic semantics.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
f15a5cf912f05b572d1f9f3772fba019643f4837 10-Jun-2014 Kinglong Mee <kinglongmee@gmail.com> SUNRPC/NFSD: Change to type of bool for rq_usedeferral and rq_splice_ok

rq_usedeferral and rq_splice_ok are used as 0 and 1, just defined to bool.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
a5cddc885b99458df963a75abbe0b40cbef56c48 13-May-2014 J. Bruce Fields <bfields@redhat.com> nfsd4: better reservation of head space for krb5

RPC_MAX_AUTH_SIZE is scattered around several places. Better to set it
once in the auth code, where this kind of estimate should be made. And
while we're at it we can leave it zero when we're not using krb5i or
krb5p.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2825a7f90753012babe7ee292f4a1eadd3706f92 26-Aug-2013 J. Bruce Fields <bfields@redhat.com> nfsd4: allow encoding across page boundaries

After this we can handle for example getattr of very large ACLs.

Read, readdir, readlink are still special cases with their own limits.

Also we can't handle a new operation starting close to the end of a
page.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
ef11ce24875a8a540adc185e7bce3d7d49c8296f 12-May-2014 NeilBrown <neilb@suse.de> SUNRPC: track whether a request is coming from a loop-back interface.

If an incoming NFS request is coming from the local host, then
nfsd will need to perform some special handling. So detect that
possibility and make the source visible in rq_local.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
7e55b59b2f32afc83452ae250dfd6173c9a7b515 31-Dec-2013 Kinglong Mee <kinglongmee@gmail.com> SUNRPC/NFSD: Support a new option for ignoring the result of svc_register

NFSv4 clients can contact port 2049 directly instead of needing the
portmapper.

Therefore a failure to register to the portmapper when starting an
NFSv4-only server isn't really a problem.

But Gareth Williams reports that an attempt to start an NFSv4-only
server without starting portmap fails:

#rpc.nfsd -N 2 -N 3
rpc.nfsd: writing fd to kernel failed: errno 111 (Connection refused)
rpc.nfsd: unable to set any sockets for nfsd

Add a flag to svc_version to tell the rpc layer it can safely ignore an
rpcbind failure in the NFSv4-only case.

Reported-by: Gareth Williams <gareth@garethwilliams.me.uk>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
28303ca3090c0aa0dbbb72714c51fceb4b939f6d 30-Nov-2013 Weng Meiling <wengmeiling.weng@huawei.com> sunrpc: fix some typos

Signed-off-by: Weng Meiling <wengmeiling.weng@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
11d2a1618e377236facdd404113992bde1083914 30-Aug-2013 J. Bruce Fields <bfields@redhat.com> svcrpc: remove unused rq_resused

I forgot to remove this in
afc59400d6c65bad66d4ad0b2daf879cbff8e23e "nfsd4: cleanup: replace
rq_resused count by rq_next_page pointer".

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
35525b79786b2ba58ef13822198ce22c497bc7a2 06-Jan-2013 Andriy Skulysh <andriy_skulysh@xyratex.com> sunrpc: Fix lockd sleeping until timeout

There is a race in enqueueing thread to a pool and
waking up a thread.
lockd doesn't wake up on reception of lock granted callback
if svc_wake_up() is called before lockd's thread is added
to a pool.

Signed-off-by: Andriy Skulysh <Andriy_Skulysh@xyratex.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
afc59400d6c65bad66d4ad0b2daf879cbff8e23e 11-Dec-2012 J. Bruce Fields <bfields@redhat.com> nfsd4: cleanup: replace rq_resused count by rq_next_page pointer

It may be a matter of personal taste, but I find this makes the code
clearer.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
9695c7057f4887ed54dc1e6c2ef22f72a2be1175 25-Jul-2012 Stanislav Kinsbursky <skinsbursky@parallels.com> SUNRPC: service request network namespace helper introduced

This is a cleanup patch - makes code looks simplier.
It replaces widely used rqstp->rq_xprt->xpt_net by introduced SVC_NET(rqstp).

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
d5497fc693a446ce9100fcf4117c3f795ddfd0d2 15-May-2012 J. Bruce Fields <bfields@redhat.com> nfsd4: move rq_flavor into svc_cred

Move the rq_flavor into struct svc_cred, and use it in setclientid and
exchange_id comparisons as well.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
9793f7c88937e7ac07305ab1af1a519225836823 02-May-2012 Stanislav Kinsbursky <skinsbursky@parallels.com> SUNRPC: new svc_bind() routine introduced

This new routine is responsible for service registration in a specified
network context.

The idea is to separate service creation from per-net operations.

Note also: since registering service with svc_bind() can fail, the
service will be destroyed and during destruction it will try to
unregister itself from rpcbind. In this case unregistration has to be
skipped.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
bb2224df5ffe4f864f5b696199b17db1ce77bc0a 31-Jan-2012 Stanislav Kinsbursky <skinsbursky@parallels.com> Lockd: per-net up and down routines introduced

This patch introduces per-net Lockd initialization and destruction routines.
The logic is the same as in global Lockd up and down routines. Probably the
solution is not the best one. But at least it looks clear.
So per-net "up" routine are called only in case of lockd is running already. If
per-net resources are not allocated yet, then service is being registered with
local portmapper and lockd sockets created.
Per-net "down" routine is called on every lockd_down() call in case of global
users counter is not zero.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
5ecebb7c7fd737cf387a552994df319c063973db 13-Jan-2012 Stanislav Kinsbursky <skinsbursky@parallels.com> SUNRPC: unregister service on creation in current network namespace

On service shutdown we can be sure, that no more users of it left except
current. Thus it looks like using current network namespace context is safe in
this case.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
5247fab5c82779174d50590e0200bf532248a8a1 13-Jan-2012 Stanislav Kinsbursky <skinsbursky@parallels.com> SUNRPC: pass network namespace to service registering routines

Lockd and NFSd services will handle requests from and to many network
nsamespaces. And thus have to be registered and unregistered per network
namespace.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
16d0587090ab93206768f726f71d84ecf55e05c4 25-Oct-2011 Stanislav Kinsbursky <skinsbursky@parallels.com> NFSd: call svc rpcbind cleanup explicitly

We have to call svc_rpcb_cleanup() explicitly from nfsd_last_thread() since
this function is registered as service shutdown callback and thus nobody else
will done it for us.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
849a1cf13d4394d398d91752166e92e9ecd64f8d 30-Aug-2011 Mi Jinlong <mijinlong@cn.fujitsu.com> SUNRPC: Replace svc_addr_u by sockaddr_storage

For IPv6 local address, lockd can not callback to client for
missing scope id when binding address at inet6_bind:

324 if (addr_type & IPV6_ADDR_LINKLOCAL) {
325 if (addr_len >= sizeof(struct sockaddr_in6) &&
326 addr->sin6_scope_id) {
327 /* Override any existing binding, if another one
328 * is supplied by user.
329 */
330 sk->sk_bound_dev_if = addr->sin6_scope_id;
331 }
332
333 /* Binding to link-local address requires an interface */
334 if (!sk->sk_bound_dev_if) {
335 err = -EINVAL;
336 goto out_unlock;
337 }

Replacing svc_addr_u by sockaddr_storage, let rqstp->rq_daddr contains more info
besides address.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
11fd165c68b73434ca1273e21f21db5eecc90926 28-Jul-2011 Eric Dumazet <eric.dumazet@gmail.com> sunrpc: use better NUMA affinities

Use NUMA aware allocations to reduce latencies and increase throughput.

sunrpc kthreads can use kthread_create_on_node() if pool_mode is
"percpu" or "pernode", and svc_prepare_thread()/svc_init_buffer() can
also take into account NUMA node affinity for memory allocations.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: "J. Bruce Fields" <bfields@fieldses.org>
CC: Neil Brown <neilb@suse.de>
CC: David Miller <davem@davemloft.net>
Reviewed-by: Greg Banks <gnb@fastmail.fm>
[bfields@redhat.com: fix up caller nfs41_callback_up]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
1091006c5eb15cba56785bd5b498a8d0b9546903 24-Jan-2011 J. Bruce Fields <bfields@redhat.com> nfsd: turn on reply cache for NFSv4

It's sort of ridiculous that we've never had a working reply cache for
NFSv4.

On the other hand, we may still not: our current reply cache is likely
not very good, especially in the TCP case (which is the only case that
matters for v4). What we really need here is some serious testing.

Anyway, here's a start.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
9e00abc3c20904fd6a5d888bb7023925799ec8a5 14-Jul-2011 Trond Myklebust <Trond.Myklebust@netapp.com> SUNRPC: sunrpc should not explicitly depend on NFS config options

Change explicit references to CONFIG_NFS_V4_1 to implicit ones
Get rid of the unnecessary defines in backchannel_rqst.c and
bc_svc.c: the Makefile takes care of those dependency.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
4a19de0f4b693139bb10b7cc3cfe1f618576ba67 06-Jan-2011 Andy Adamson <andros@netapp.com> NFS rename client back channel transport field

Differentiate from server backchannel

Signed-off-by: Andy Adamson <andros@netapp.com>
Acked-by: Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
9e701c610923aaeac8b38b9202a686d1cc9ee35d 03-Jan-2011 J. Bruce Fields <bfields@redhat.com> svcrpc: simpler request dropping

Currently we use -EAGAIN returns to determine when to drop a deferred
request. On its own, that is error-prone, as it makes us treat -EAGAIN
returns from other functions specially to prevent inadvertent dropping.

So, use a flag on the request instead.

Returning an error on request deferral is still required, to prevent
further processing, but we no longer need worry that an error return on
its own could result in a drop.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
c7af6b0895229bd080b86afc91302b66f6df0378 05-Dec-2009 J. Bruce Fields <bfields@citi.umich.edu> nfsd: remove unused field rq_reffh

This field is never referenced anywhere else. I don't know what it was
intended for.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
78c210efdefe07131f91ed512a3308b15bb14e2f 06-Aug-2009 J. Bruce Fields <bfields@citi.umich.edu> Revert "knfsd: avoid overloading the CPU scheduler with enormous load averages"

This reverts commit 59a252ff8c0f2fa32c896f69d56ae33e641ce7ad.

This helps in an entirely cached workload but not necessarily in
workloads that require waiting on disk.

Conflicts:

include/linux/sunrpc/svc.h
net/sunrpc/svc_xprt.c

Reported-by: Simon Kirby <sim@hostway.ca>
Tested-by: Jesper Krogh <jesper@krogh.cc>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
4bd9b0f4afc76cf972578c702e1bc1b6f2d10ba5 24-Jun-2009 Andy Adamson <andros@netapp.com> nfsd41: use globals for DRC limits

The version 4.1 DRC memory limit and tracking variables are server wide and
session specific. Replace struct svc_serv fields with globals.
Stop using the svc_serv sv_lock.

Add a spinlock to serialize access to the DRC limit management variables which
change on session creation and deletion (usage counter) or (future)
administrative action to adjust the total DRC memory limit.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
9c9f3f5fa62cc4959e4d4d1cf1ec74f2d6ac1197 01-Apr-2009 Andy Adamson <andros@netapp.com> nfs41: sunrpc: add a struct svc_xprt pointer to struct svc_serv for backchannel use

This svc_xprt is passed on to the callback service thread to be later used
to processes incoming svc_rqst's

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
4d6bbb6233c9cf23822a2f66f8470c9f40854b77 01-Apr-2009 Ricardo Labiaga <Ricardo.Labiaga@netapp.com> nfs41: Backchannel bc_svc_process()

Implement the NFSv4.1 backchannel service. Invokes the common callback
processing logic svc_process_common() to authenticate the call and
dispatch the appropriate NFSv4.1 XDR decoder and operation procedure.
It then invokes bc_send() to send the reply over the same connection.
bc_send() is implemented in a separate patch.

At this time there is no slot validation or reply cache handling.

[nfs41: Preallocate rpc_rqst receive buffer for handling callbacks]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Move bc_svc_process() declaration to correct patch]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
56632b5bff5af10eb12d7e9499b5ffcadcb7a7b2 01-Apr-2009 Ricardo Labiaga <Ricardo.Labiaga@netapp.com> nfs41: client callback structures

Adds new list of rpc_xprt structures, and a readers/writers lock to
protect the list. The list is used to preallocate resources for
the backchannel during backchannel requests. Callbacks are not
expected to cause significant latency, so only one callback will
be allowed at this time.

It also adds a pointer to the NFS callback service so that
requests can be directed to it for processing.

New callback members added to svc_serv. The NFSv4.1 callback service will
sleep on the svc_serv->svc_cb_waitq until new callback requests arrive.
The request will be queued in svc_serv->svc_cb_list. This patch adds this
list, the sleep queue and spinlock to svc_serv.

[nfs41: NFSv4.1 callback support]
Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
c3d06f9ce8544fecfe13e377d1e2c2e47fe18dbc 03-Apr-2009 Andy Adamson <andros@netapp.com> nfsd41: hard page limit for DRC

Use no more than 1/128th of the number of free pages at nfsd startup for the
v4.1 DRC.

This is an arbitrary default which should probably end up under the control
of an administrator.

Signed-off-by: Andy Adamson <andros@netapp.com>
[moved added fields in struct svc_serv under CONFIG_NFSD_V4_1]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[fix set_max_drc calculation of sv_drc_max_pages]
[moved NFSD_DRC_SIZE_SHIFT's declaration up in header file]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2f425878b6a71571341dcd3f9e9d1a6f6355da9c 03-Apr-2009 Andy Adamson <andros@netapp.com> nfsd: don't use the deferral service, return NFS4ERR_DELAY

On an NFSv4.1 server cache miss that causes an upcall, NFS4ERR_DELAY will be
returned. It is up to the NFSv4.1 client to resend only the operations that
have not been processed.

Initialize rq_usedeferral to 1 in svc_process(). It sill be turned off in
nfsd4_proc_compound() only when NFSv4.1 Sessions are used.

Note: this isn't an adequate solution on its own. It's acceptable as a way
to get some minimal 4.1 up and working, but we're going to have to find a
way to avoid returning DELAY in all common cases before 4.1 can really be
considered ready.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: reverse rq_nodeferral negative logic]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[sunrpc: initialize rq_usedeferral]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
49a9072f29a1039f142ec98b44a72d7173651c02 19-Mar-2009 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Remove @family argument from svc_create() and svc_create_pooled()

Since an RPC service listener's protocol family is specified now via
svc_create_xprt(), it no longer needs to be passed to svc_create() or
svc_create_pooled(). Remove that argument from the synopsis of those
functions, and remove the sv_family field from the svc_serv struct.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
4b62e58cccff9c5e7ffc7023f7ec24c75fbd549b 19-Mar-2009 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Pass a family argument to svc_register()

The sv_family field is going away. Instead of using sv_family, have
the svc_register() function take a protocol family argument.

Since this argument represents a protocol family, and not an address
family, this argument takes an int, as this is what is passed to
sock_create_kern(). Also make sure svc_register's helpers are
checking for PF_FOO instead of AF_FOO. The value of [AP]F_FOO are
equivalent; this is simply a symbolic change to reflect the semantics
of the value stored in that variable.

sock_create_kern() should return EPFNOSUPPORT if the passed-in
protocol family isn't supported, but it uses EAFNOSUPPORT for this
case. We will stick with that tradition here, as svc_register()
is called by the RPC server in the same path as sock_create_kern().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
03cf6c9f49a8fea953d38648d016e3f46e814991 13-Jan-2009 Greg Banks <gnb@sgi.com> knfsd: add file to export stats about nfsd pools

Add /proc/fs/nfsd/pool_stats to export to userspace various
statistics about the operation of rpc server thread pools.

This patch is based on a forward-ported version of
knfsd-add-pool-thread-stats which has been shipping in the SGI
"Enhanced NFS" product since 2006 and which was previously
posted:

http://article.gmane.org/gmane.linux.nfs/10375

It has also been updated thus:

* moved EXPORT_SYMBOL() to near the function it exports
* made the new struct struct seq_operations const
* used SEQ_START_TOKEN instead of ((void *)1)
* merged fix from SGI PV 990526 "sunrpc: use dprintk instead of
printk in svc_pool_stats_*()" by Harshula Jayasuriya.
* merged fix from SGI PV 964001 "Crash reading pool_stats before
nfsds are started".

Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: Harshula Jayasuriya <harshula@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
59a252ff8c0f2fa32c896f69d56ae33e641ce7ad 13-Jan-2009 Greg Banks <gnb@sgi.com> knfsd: avoid overloading the CPU scheduler with enormous load averages

Avoid overloading the CPU scheduler with enormous load averages
when handling high call-rate NFS loads. When the knfsd bottom half
is made aware of an incoming call by the socket layer, it tries to
choose an nfsd thread and wake it up. As long as there are idle
threads, one will be woken up.

If there are lot of nfsd threads (a sensible configuration when
the server is disk-bound or is running an HSM), there will be many
more nfsd threads than CPUs to run them. Under a high call-rate
low service-time workload, the result is that almost every nfsd is
runnable, but only a handful are actually able to run. This situation
causes two significant problems:

1. The CPU scheduler takes over 10% of each CPU, which is robbing
the nfsd threads of valuable CPU time.

2. At a high enough load, the nfsd threads starve userspace threads
of CPU time, to the point where daemons like portmap and rpc.mountd
do not schedule for tens of seconds at a time. Clients attempting
to mount an NFS filesystem timeout at the very first step (opening
a TCP connection to portmap) because portmap cannot wake up from
select() and call accept() in time.

Disclaimer: these effects were observed on a SLES9 kernel, modern
kernels' schedulers may behave more gracefully.

The solution is simple: keep in each svc_pool a counter of the number
of threads which have been woken but have not yet run, and do not wake
any more if that count reaches an arbitrary small threshold.

Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16
synthetic client threads simulating an rsync (i.e. recursive directory
listing) workload reading from an i386 RH9 install image (161480
regular files in 10841 directories) on the server. That tree is small
enough to fill in the server's RAM so no disk traffic was involved.
This setup gives a sustained call rate in excess of 60000 calls/sec
before being CPU-bound on the server. The server was running 128 nfsds.

Profiling showed schedule() taking 6.7% of every CPU, and __wake_up()
taking 5.2%. This patch drops those contributions to 3.0% and 2.2%.
Load average was over 120 before the patch, and 20.9 after.

This patch is a forward-ported version of knfsd-avoid-nfsd-overload
which has been shipping in the SGI "Enhanced NFS" product since 2006.
It has been posted before:

http://article.gmane.org/gmane.linux.nfs/10374

Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
c9233eb7b0b11ef176d4bf68da2ce85464b6ec39 20-Oct-2008 Jeff Layton <jlayton@redhat.com> sunrpc: add sv_maxconn field to svc_serv (try #3)

svc_check_conn_limits() attempts to prevent denial of service attacks
by having the service close old connections once it reaches a
threshold. This threshold is based on the number of threads in the
service:

(serv->sv_nrthreads + 3) * 20

Once we reach this, we drop the oldest connections and a printk pops
to warn the admin that they should increase the number of threads.

Increasing the number of threads isn't an option however for services
like lockd. We don't want to eliminate this check entirely for such
services but we need some way to increase this limit.

This patch adds a sv_maxconn field to the svc_serv struct. When it's
set to 0, we use the current method to calculate the max number of
connections. RPC services can then set this on an as-needed basis.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
5344b12d4f97d4a9a62d806425977a6ff64b6baf 27-Aug-2008 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Make svc_addr's argument a constant

Clean up: Add extra type safety and squelch a few compiler complaints
in upcoming patches.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
a26cfad6e0a308a2c68df1f1ef50aabd48b17e6d 19-Aug-2008 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Support IPv6 when registering kernel RPC services

In order to advertise NFS-related services on IPv6 interfaces via
rpcbind, the kernel RPC server implementation must use
rpcb_v4_register() instead of rpcb_register().

A new kernel build option allows distributions to use the legacy
v2 call until they integrate an appropriate user-space rpcbind
daemon that can support IPv6 RPC services.

I tried adding some automatic logic to fall back if registering
with a v4 protocol request failed, but there are too many corner
cases. So I just made it a compile-time switch that distributions
can throw when they've replaced portmapper with rpcbind.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
e851db5b05408b89b9a9429a66814b79fabee2a1 01-Jul-2008 Chuck Lever <chuck.lever@oracle.com> SUNRPC: Add address family field to svc_serv data structure

Introduce and initialize an address family field in the svc_serv structure.

This field will determine what family to use for the service's listener
sockets and what families are advertised via the local rpcbind daemon.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
a75c5d01e4235a7dd785548ac756f248b1b40107 10-Jun-2008 Jeff Layton <jlayton@redhat.com> sunrpc: remove sv_kill_signal field from svc_serv struct

Since we no longer make any distinction between shutdown signals with
nfsd, then it becomes easier to just standardize on a particular signal
to use to bring it down (SIGINT, in this case).

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
9867d76ca16b3f455f9ca83861f4ce5c94a25928 10-Jun-2008 Jeff Layton <jlayton@redhat.com> knfsd: convert knfsd to kthread API

This patch is rather large, but I couldn't figure out a way to break it
up that would remain bisectable. It does several things:

- change svc_thread_fn typedef to better match what kthread_create expects
- change svc_pool_map_set_cpumask to be more kthread friendly. Make it
take a task arg and and get rid of the "oldmask"
- have svc_set_num_threads call kthread_create directly
- eliminate __svc_create_thread

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
8774282c4cef82695ccca8bd09976de5d6e49610 07-Apr-2008 Jeff Layton <jlayton@redhat.com> SUNRPC: remove svc_create_thread()

Now that the nfs4 callback thread uses the kthread API, there are no
more users of svc_create_thread(). Remove it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
fbb7878c1a2ee40a1e983bf20f3dd3a80255dcf2 08-Feb-2008 J. Bruce Fields <bfields@citi.umich.edu> nfsd: clean up svc_reserve_auth()

This is a void function attempting to return the return value from
another void function, which seems harmless but extremely weird, and
apparently makes some compilers complain.

While we're there, clean up a little (e.g. the switch statement had a
minor style problem and seemed overkill as long as there's only one
case).

Thanks to Trond for noticing this.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
0113ab34644649aceaac37ef4b7e5c7d5c183be3 29-Jan-2008 Jeff Layton <jlayton@redhat.com> SUNRPC: spin svc_rqst initialization to its own function

Move the initialzation in __svc_create_thread that happens prior to
thread creation to a new function. Export the function to allow
services to have better control over the svc_rqst structs.

Also rearrange the rqstp initialization to prevent NULL pointer
dereferences in svc_exit_thread in case allocations fail.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
260c1d1298f6703d38fdccd3dd5a310766327340 31-Dec-2007 Tom Tucker <tom@opengridcomputing.com> svc: Add transport hdr size for defer/revisit

Some transports have a header in front of the RPC header. The current
defer/revisit processing considers only the iov_len and arg_len to
determine how much to back up when saving the original request
to revisit. Add a field to the rqstp structure to save the size
of the transport header so svc_defer can correctly compute
the start of a request.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
57b1d3babaafea1c395c932914e38c2ff9493001 31-Dec-2007 Tom Tucker <tom@opengridcomputing.com> svc: Removing remaining references to rq_sock in rqstp

This functionally empty patch removes rq_sock and unamed union
from rqstp structure.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
8c7b0172a1db8120d25ecb4eff69664c52ee7639 31-Dec-2007 Tom Tucker <tom@opengridcomputing.com> svc: Make deferral processing xprt independent

This patch moves the transport independent sk_deferred list to the svc_xprt
structure and updates the svc_deferred_req structure to keep pointers to
svc_xprt's directly. The deferral processing code is also moved out of the
transport dependent recvfrom functions and into the generic svc_recv path.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
5148bf4ebc1f59dc6a0ec43a220c55ff0771246e 31-Dec-2007 Tom Tucker <tom@opengridcomputing.com> svc: Add transport specific xpo_release function

The svc_sock_release function releases pages allocated to a thread. For
UDP this frees the receive skb. For RDMA it will post a receive WR
and bump the client credit count.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
9f29868b491beee706931e0cf875a60cb4688754 31-Dec-2007 Tom Tucker <tom@opengridcomputing.com> svc: Change the svc_sock in the rqstp structure to a transport

The rqstp structure contains a pointer to the transport for the
RPC request. This functionaly trivial patch adds an unamed union
with pointers to both svc_sock and svc_xprt. Ultimately the
union will be removed and only the rq_xprt field will remain. This
allows incrementally extracting transport independent interfaces without
one gigundo patch.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
3ab4d8b1215d61736e2a9a26bea7cc2e6b029e3d 17-Jul-2007 J. Bruce Fields <bfields@citi.umich.edu> knfsd: nfsd: set rq_client to ip-address-determined-domain

We want it to be possible for users to restrict exports both by IP address and
by pseudoflavor. The pseudoflavor information has previously been passed
using special auth_domains stored in the rq_client field. After the preceding
patch that stored the pseudoflavor in rq_pflavor, that's now superfluous; so
now we use rq_client for the ip information, as auth_null and auth_unix do.

However, we keep around the special auth_domain in the rq_gssclient field for
backwards compatibility purposes, so we can still do upcalls using the old
"gss/pseudoflavor" auth_domain if upcalls using the unix domain to give us an
appropriate export. This allows us to continue supporting old mountd.

In fact, for this first patch, we always use the "gss/pseudoflavor"
auth_domain (and only it) if it is available; thus rq_client is ignored in the
auth_gss case, and this patch on its own makes no change in behavior; that
will be left to later patches.

Note on idmap: I'm almost tempted to just replace the auth_domain in the idmap
upcall by a dummy value--no version of idmapd has ever used it, and it's
unlikely anyone really wants to perform idmapping differently depending on the
where the client is (they may want to perform *credential* mapping
differently, but that's a different matter--the idmapper just handles id's
used in getattr and setattr). But I'm updating the idmapd code anyway, just
out of general backwards-compatibility paranoia.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
c4170583f655dca5da32bd14173d6a93805fc48b 17-Jul-2007 Andy Adamson <andros@citi.umich.edu> knfsd: nfsd4: store pseudoflavor in request

Add a new field to the svc_rqst structure to record the pseudoflavor that the
request was made with. For now we record the pseudoflavor but don't use it
for anything.

Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cf8208d0eabd1d5d2625ec02a175a294c3f30d36 12-Jun-2007 Jens Axboe <jens.axboe@oracle.com> sendfile: convert nfsd to splice_direct_to_actor()

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
cd123012d99fde4759500fee611e724e4f3016e3 09-May-2007 Jeff Layton <jlayton@redhat.com> RPC: add wrapper for svc_reserve to account for checksum

When the kernel calls svc_reserve to downsize the expected size of an RPC
reply, it fails to account for the possibility of a checksum at the end of
the packet. If a client mounts a NFSv2/3 with sec=krb5i/p, and does I/O
then you'll generally see messages similar to this in the server's ring
buffer:

RPC request reserved 164 but used 208

While I was never able to verify it, I suspect that this problem is also
the root cause of some oopses I've seen under these conditions:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=227726

This is probably also a problem for other sec= types and for NFSv4. The
large reserved size for NFSv4 compound packets seems to generally paper
over the problem, however.

This patch adds a wrapper for svc_reserve that accounts for the possibility
of a checksum. It also fixes up the appropriate callers of svc_reserve to
call the wrapper. For now, it just uses a hardcoded value that I
determined via testing. That value may need to be revised upward as things
change, or we may want to eventually add a new auth_op that attempts to
calculate this somehow.

Unfortunately, there doesn't seem to be a good way to reliably determine
the expected checksum length prior to actually calculating it, particularly
with schemes like spkm3.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5a05ed73e1abfd7e0e7d474817245861deaa18af 06-Mar-2007 NeilBrown <neilb@suse.de> [PATCH] knfsd: remove CONFIG_IPV6 ifdefs from sunrpc server code

They don't really save that much, and aren't worth the hassle.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
95756482c9bfa375418c5a32455494a3042f65cd 12-Feb-2007 Chuck Lever <chuck.lever@oracle.com> [PATCH] knfsd: SUNRPC: support IPv6 addresses in RPC server's UDP receive path

Add support for IPv6 addresses in the RPC server's UDP receive path.

[akpm@linux-foundation.org: cleanups]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
73df0dbaff8d0853387e140f52b6250c486b18a1 12-Feb-2007 Chuck Lever <chuck.lever@oracle.com> [PATCH] knfsd: SUNRPC: Make rq_daddr field address-version independent

The rq_daddr field must support larger addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
27459f0940e16c68e080f5fc7e85aa9eb3f74528 12-Feb-2007 Chuck Lever <chuck.lever@oracle.com> [PATCH] knfsd: SUNRPC: Provide room in svc_rqst for larger addresses

Expand the rq_addr field to allow it to contain larger addresses.

Specifically, we replace a 'sockaddr_in' with a 'sockaddr_storage', then
everywhere the 'sockaddr_in' was referenced, we use instead an accessor
function (svc_addr_in) which safely casts the _storage to _in.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2442222283918c2d1c20ae651d95fe168757938b 12-Feb-2007 Chuck Lever <chuck.lever@oracle.com> [PATCH] knfsd: SUNRPC: Use sockaddr_storage to store address in svc_deferred_req

Sockaddr_storage will allow us to store arbitrary socket addresses in the
svc_deferred_req struct.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ad06e4bd62351bc569cca0f25d68c58dbd298146 12-Feb-2007 Chuck Lever <chuck.lever@oracle.com> [PATCH] knfsd: SUNRPC: Add a function to format the address in an svc_rqst for printing

There are loads of places where the RPC server assumes that the rq_addr fields
contains an IPv4 address. Top among these are error and debugging messages
that display the server's IP address.

Let's refactor the address printing into a separate function that's smart
enough to figure out the difference between IPv4 and IPv6 addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
250f3915183d377d36e012bac9caa7345ce465b8 26-Jan-2007 NeilBrown <neilb@suse.de> [PATCH] knfsd: fix an NFSD bug with full sized, non-page-aligned reads

NFSd assumes that largest number of pages that will be needed for a
request+response is 2+N where N pages is the size of the largest permitted
read/write request. The '2' are 1 for the non-data part of the request, and 1
for the non-data part of the reply.

However, when a read request is not page-aligned, and we choose to use
->sendfile to send it directly from the page cache, we may need N+1 pages to
hold the whole reply. This can overflow and array and cause an Oops.

This patch increases size of the array for holding pages by one and makes sure
that entry is NULL when it is not in use.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7111c66e4e70588c9602035a4996c9cdc2087d2d 20-Oct-2006 Al Viro <viro@ftp.linux.org.uk> [PATCH] fix svc_procfunc declaration

svc_procfunc instances return __be32, not int

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
c6b0a9f87b82f25fa35206ec04b5160372eabab4 06-Oct-2006 NeilBrown <neilb@suse.de> [PATCH] knfsd: tidy up up meaning of 'buffer size' in nfsd/sunrpc

There is some confusion about the meaning of 'bufsz' for a sunrpc server.
In some cases it is the largest message that can be sent or received. In
other cases it is the largest 'payload' that can be included in a NFS
message.

In either case, it is not possible for both the request and the reply to be
this large. One of the request or reply may only be one page long, which
fits nicely with NFS.

So we remove 'bufsz' and replace it with two numbers: 'max_payload' and
'max_mesg'. Max_payload is the size that the server requests. It is used
by the server to check the max size allowed on a particular connection:
depending on the protocol a lower limit might be used.

max_mesg is the largest single message that can be sent or received. It is
calculated as the max_payload, rounded up to a multiple of PAGE_SIZE, and
with PAGE_SIZE added to overhead. Only one of the request and reply may be
this size. The other must be at most one page.

Cc: Greg Banks <gnb@sgi.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
bc5fea4299b8bda5f73c6f79dc35d388caf8bced 04-Oct-2006 Olaf Kirch <okir@suse.de> [PATCH] knfsd: register all RPC programs with portmapper by default

The NFSACL patches introduced support for multiple RPC services listening on
the same transport. However, only the first of these services was registered
with portmapper. This was perfectly fine for nfsacl, as you traditionally do
not want these to show up in a portmapper listing.

The patch below changes the default behavior to always register all services
listening on a given transport, but retains the old behavior for nfsacl
services.

Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
7adae489fe794e3e203ff168595f635d0b845e59 04-Oct-2006 Greg Banks <gnb@melbourne.sgi.com> [PATCH] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP

The limit over UDP remains at 32K. Also, make some of the apparently
arbitrary sizing constants clearer.

The biggest change here involves replacing NFSSVC_MAXBLKSIZE by a function of
the rqstp. This allows it to be different for different protocols (udp/tcp)
and also allows it to depend on the servers declared sv_bufsiz.

Note that we don't actually increase sv_bufsz for nfs yet. That comes next.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
3cc03b164cf01c6f36e64720b58610d292fb26f7 04-Oct-2006 NeilBrown <neilb@suse.de> [PATCH] knfsd: Avoid excess stack usage in svc_tcp_recvfrom

.. by allocating the array of 'kvec' in 'struct svc_rqst'.

As we plan to increase RPCSVC_MAXPAGES from 8 upto 256, we can no longer
allocate an array of this size on the stack. So we allocate it in 'struct
svc_rqst'.

However svc_rqst contains (indirectly) an array of the same type and size
(actually several, but they are in a union). So rather than waste space, we
move those arrays out of the separately allocated union and into svc_rqst to
share with the kvec moved out of svc_tcp_recvfrom (various arrays are used at
different times, so there is no conflict).

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
4452435948424e5322c2a2fefbdc2cf3732cc45d 04-Oct-2006 NeilBrown <neilb@suse.de> [PATCH] knfsd: Replace two page lists in struct svc_rqst with one

We are planning to increase RPCSVC_MAXPAGES from about 8 to about 256. This
means we need to be a bit careful about arrays of size RPCSVC_MAXPAGES.

struct svc_rqst contains two such arrays. However the there are never more
that RPCSVC_MAXPAGES pages in the two arrays together, so only one array is
needed.

The two arrays are for the pages holding the request, and the pages holding
the reply. Instead of two arrays, we can simply keep an index into where the
first reply page is.

This patch also removes a number of small inline functions that probably
server to obscure what is going on rather than clarify it, and opencode the
needed functionality.

Also remove the 'rq_restailpage' variable as it is *always* 0. i.e. if the
response 'xdr' structure has a non-empty tail it is always in the same pages
as the head.

check counters are initilised and incr properly
check for consistant usage of ++ etc
maybe extra some inlines for common approach
general review

Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Magnus Maatta <novell@kiruna.se>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
bfd241600a3b0db4fe43c859f1460d0a958d924a 02-Oct-2006 Greg Banks <gnb@melbourne.sgi.com> [PATCH] knfsd: make rpc threads pools numa aware

Actually implement multiple pools. On NUMA machines, allocate a svc_pool per
NUMA node; on SMP a svc_pool per CPU; otherwise a single global pool. Enqueue
sockets on the svc_pool corresponding to the CPU on which the socket bh is run
(i.e. the NIC interrupt CPU). Threads have their cpu mask set to limit them
to the CPUs in the svc_pool that owns them.

This is the patch that allows an Altix to scale NFS traffic linearly
beyond 4 CPUs and 4 NICs.

Incorporates changes and feedback from Neil Brown, Trond Myklebust, and
Christoph Hellwig.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
a74554429eada89a7ddb47317e6a2968d03e41a2 02-Oct-2006 Greg Banks <gnb@melbourne.sgi.com> [PATCH] knfsd: add svc_set_num_threads

Currently knfsd keeps its own list of all nfsd threads in nfssvc.c; add a new
way of managing the list of all threads in a svc_serv. Add
svc_create_pooled() to allow creation of a svc_serv whose threads are managed
by the sunrpc code. Add svc_set_num_threads() to manage the number of threads
in a service, either per-pool or globally across the service.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
9a24ab5749a31aa10ee60d9310ad72f24d7c38ab 02-Oct-2006 Greg Banks <gnb@melbourne.sgi.com> [PATCH] knfsd: add svc_get

add svc_get() for those occasions when we need to temporarily bump up
svc_serv->sv_nrthreads as a pseudo refcount.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
3262c816a3d7fb1eaabce633caa317887ed549ae 02-Oct-2006 Greg Banks <gnb@melbourne.sgi.com> [PATCH] knfsd: split svc_serv into pools

Split out the list of idle threads and pending sockets from svc_serv into a
new svc_pool structure, and allocate a fixed number (in this patch, 1) of
pools per svc_serv. The new structure contains a lock which takes over
several of the duties of svc_serv->sv_lock, which is now relegated to
protecting only sv_tempsocks, sv_permsocks, and sv_tmpcnt in svc_serv.

The point is to move the hottest fields out of svc_serv and into svc_pool,
allowing a following patch to arrange for a svc_pool per NUMA node or per CPU.
This is a major step towards making the NFS server NUMA-friendly.

Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
36bdfc8bae51339aa27ef8e4ce148185293061ae 02-Oct-2006 Greg Banks <gnb@melbourne.sgi.com> [PATCH] knfsd: move tempsock aging to a timer

Following are 11 patches from Greg Banks which combine to make knfsd more
Numa-aware. They reduce hitting on 'global' data structures, and create some
data-structures that can be node-local.

knfsd threads are bound to a particular node, and the thread to handle a new
request is chosen from the threads that are attach to the node that received
the interrupt.

The distribution of threads across nodes can be controlled by a new file in
the 'nfsd' filesystem, though the default approach of an even spread is
probably fine for most sites.

Some (old) numbers that show the efficacy of these patches: N == number of
NICs == number of CPUs == nmber of clients. Number of NUMA nodes == N/2

N Throughput, MiB/s CPU usage, % (max=N*100)
Before After Before After
--- ------ ---- ----- -----
4 312 435 350 228
6 500 656 501 418
8 562 804 690 589

This patch:

Move the aging of RPC/TCP connection sockets from the main svc_recv() loop to
a timer which uses a mark-and-sweep algorithm every 6 minutes. This reduces
the amount of work that needs to be done in the main RPC loop and the length
of time we need to hold the (effectively global) svc_serv->sv_lock.

[akpm@osdl.org: cleanup]
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
6fb2b47fa16c81317ec282248e6cff521cca31c2 02-Oct-2006 NeilBrown <neilb@suse.de> [PATCH] knfsd: Drop 'serv' option to svc_recv and svc_process

It isn't needed as it is available in rqstp->rq_server, and dropping it allows
some local vars to be dropped.

[akpm@osdl.org: build fix]
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
bc591ccff27e6a85d3a0d6fcb16cfadcc45267a8 02-Oct-2006 NeilBrown <neilb@suse.de> [PATCH] knfsd: add a callback for when last rpc thread finishes

nfsd has some cleanup that it wants to do when the last thread exits, and
there will shortly be some more. So collect this all into one place and
define a callback for an rpc service to call when the service is about to be
destroyed.

[akpm@osdl.org: cleanups, build fix]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
d8ed029d6000ba2e2908d9286409e4833c091b4c 27-Sep-2006 Alexey Dobriyan <adobriyan@gmail.com> [SUNRPC]: trivial endianness annotations

pure s/u32/__be32/

[AV: large part based on Alexey's patches]

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
7699431301b189fca7ccbb64fe54e5a5170f8497 27-Sep-2006 Alexey Dobriyan <adobriyan@gmail.com> [SUNRPC]: svc_{get,put}nl()

* add svc_getnl():
Take network-endian value from buffer, convert to host-endian
and return it.
* add svc_putnl():
Take host-endian value, convert to network-endian and put it
into a buffer.
* annotate svc_getu32()/svc_putu32() as dealing with network-endian.
* convert to svc_getnl(), svc_putnl().

[AV: in large part it's a carved-up Alexey's patch]

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
5c04c46aec16b3267d8fe03af886f2d41e448cd0 30-Jun-2006 J. Bruce Fields <bfields@citi.umich.edu> [PATCH] knfsd: nfsd: mark rqstp to prevent use of sendfile in privacy case

Add a rq_sendfile_ok flag to svc_rqst which will be cleared in the privacy
case so that the wrapping code will get copies of the read data instead of
real page cache pages. This makes life simpler when we encrypt the response.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
6f54e2d0d3a904e55c9c50b78542072f6c42080e 11-Apr-2006 J. Bruce Fields <bfields@citi.umich.edu> [PATCH] knfsd: svcrpc: WARN() instead of returning an error from svc_take_page

Every caller of svc_take_page ignores its return value and assumes it
succeeded. So just WARN() instead of returning an ignored error. This would
have saved some time debugging a recent nfsd4 problem.

If there are still failure cases here, then the result is probably that we
overwrite an earlier part of the reply while xdr-encoding.

While the corrupted reply is a nasty bug, it would be worse to panic here and
create the possibility of a remote DOS; hence WARN() instead of BUG().

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Ingo Oeser <ioe-lkml@rameria.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1918e341383ab787d6c5b17200f4ed901b10c777 19-Jan-2006 J. Bruce Fields <bfields@citi.umich.edu> [PATCH] svcrpc: save and restore the daddr field when request deferred

The server code currently keeps track of the destination address on every
request so that it can reply using the same address. However we forget to do
that in the case of a deferred request. Remedy this oversight. >From folks
at PolyServe.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
0ba7536d5d47e4ecf2259a80b207158dc4e711eb 07-Nov-2005 NeilBrown <neilb@cse.unsw.edu.au> [PATCH] knfsd: Fix some minor sign problems in nfsd/xdr

There are a couple of tests which could possibly be confused by extremely
large numbers appearing in 'xdr' packets. I think the closest to an exploit
you could get would be writing random data from a free page into a file - i.e.
leak data out of kernel space.

I'm fairly sure they cannot be used for remote compromise.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
a257cdd0e2179630d3201c32ba14d7fcb3c3a055 22-Jun-2005 Andreas Gruenbacher <agruen@suse.de> [PATCH] NFSD: Add server support for NFSv3 ACLs.

This adds functions for encoding and decoding POSIX ACLs for the NFSACL
protocol extension, and the GETACL and SETACL RPCs. The implementation is
compatible with NFSACL in Solaris.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
9ba02638e4be28dd4ff724202a640264427c62d1 22-Jun-2005 Andreas Gruenbacher <agruen@suse.de> [PATCH] RPC: Allow the sunrpc server to multiplex serveral programs on a single port

The NFS and NFSACL programs run on the same RPC transport. This patch adds
support for this by converting svc_program into a chained list of programs
(server-side).

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 17-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org> Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!