History log of /fs/file.c
Revision Date Author Comments
e983094d6dce524f3890edfec44b7ca6dbfa1183 31-Aug-2014 Al Viro <viro@zeniv.linux.org.uk> missing annotation in fs/file.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
bde6c3aa993066acb0d6ce32ecabe03b9d5df92d 01-Jul-2014 Paul E. McKenney <paulmck@linux.vnet.ibm.com> rcu: Provide cond_resched_rcu_qs() to force quiescent states in long loops

RCU-tasks requires the occasional voluntary context switch
from CPU-bound in-kernel tasks. In some cases, this requires
instrumenting cond_resched(). However, there is some reluctance
to countenance unconditionally instrumenting cond_resched() (see
http://lwn.net/Articles/603252/), so this commit creates a separate
cond_resched_rcu_qs() that may be used in place of cond_resched() in
locations prone to long-duration in-kernel looping.

This commit currently instruments only RCU-tasks. Future possibilities
include also instrumenting RCU, RCU-bh, and RCU-sched in order to reduce
IPI usage.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
f6c0a1920e0180175bd5e8e4aff8ea5556f1895d 23-Apr-2014 Al Viro <viro@zeniv.linux.org.uk> fs/file.c: don't open-code kvfree()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
7f4b36f9bb930b3b2105a9a2cb0121fa7028c432 14-Mar-2014 Al Viro <viro@zeniv.linux.org.uk> get rid of files_defer_init()

the only thing it's doing these days is calculation of
upper limit for fs.nr_open sysctl and that can be done
statically

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
99aea68134f3c2a27b4d463c91cfa298c3efaccf 16-Mar-2014 Eric Biggers <ebiggers3@gmail.com> vfs: Don't let __fdget_pos() get FMODE_PATH files

Commit bd2a31d522344 ("get rid of fget_light()") introduced the
__fdget_pos() function, which returns the resulting file pointer and
fdput flags combined in an 'unsigned long'. However, it also changed the
behavior to return files with FMODE_PATH set, which shouldn't happen
because read(), write(), lseek(), etc. aren't allowed on such files.
This commit restores the old behavior.

This regression actually had no effect on read() and write() since
FMODE_READ and FMODE_WRITE are not set on file descriptors opened with
O_PATH, but it did cause lseek() on a file descriptor opened with O_PATH
to fail with ESPIPE rather than EBADF.

Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
bd2a31d522344b3ac2fb680bd2366e77a9bd8209 04-Mar-2014 Al Viro <viro@zeniv.linux.org.uk> get rid of fget_light()

instead of returning the flags by reference, we can just have the
low-level primitive return those in lower bits of unsigned long,
with struct file * derived from the rest.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
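A minimal userspace sketch of the encoding described above. It assumes struct file is at least 4-byte aligned, so the two low bits of its address are always zero. The flag names follow the "fdput flags" mentioned in the __fdget_pos() entry, but their values and the helpers (pack_fd, fd_file_of, fd_flags_of) are this sketch's own assumptions, not the kernel API.

```c
#include <stdint.h>

/* Stand-in for the kernel's struct file; only its alignment matters here. */
struct file {
	long f_count;
};

/* Flag bits stored in the low bits of the packed word (assumed values). */
#define FDPUT_FPUT       1UL	/* caller must drop a file reference */
#define FDPUT_POS_UNLOCK 2UL	/* caller must release the f_pos lock */
#define FDPUT_FLAG_MASK  3UL

/* The trick only works if the two low bits of every struct file * are zero. */
_Static_assert(_Alignof(struct file) >= 4, "struct file must be 4-byte aligned");

/* Combine pointer and flags into one unsigned long (assumes an LP64 or
 * ILP32 target, where unsigned long is as wide as a pointer). */
static inline unsigned long pack_fd(struct file *f, unsigned long flags)
{
	return (unsigned long)(uintptr_t)f | (flags & FDPUT_FLAG_MASK);
}

/* Recover the pointer by masking the flag bits off. */
static inline struct file *fd_file_of(unsigned long v)
{
	return (struct file *)(uintptr_t)(v & ~FDPUT_FLAG_MASK);
}

/* Recover just the flag bits. */
static inline unsigned long fd_flags_of(unsigned long v)
{
	return v & FDPUT_FLAG_MASK;
}
```

Returning one word avoids passing the flags back through a pointer argument, at the price of depending on pointer alignment.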
add1f0995454374d90c9d6b2c420d2fba3d0a4e3 12-Feb-2014 Paul E. McKenney <paulmck@linux.vnet.ibm.com> fs: Substitute rcu_access_pointer() for rcu_dereference_raw()

(Trivial patch.)

If the code is looking at the RCU-protected pointer itself, but not
dereferencing it, the rcu_dereference() functions can be downgraded to
rcu_access_pointer(). This commit makes this downgrade in __alloc_fd(),
which simply compares the RCU-protected pointer against NULL with no
dereferencing.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
96c7a2ff21501691587e1ae969b83cbec8b78e08 10-Feb-2014 Eric W. Biederman <ebiederm@xmission.com> fs/file.c:fdtable: avoid triggering OOMs from alloc_fdmem

Recently due to a spike in connections per second memcached on 3
separate boxes triggered the OOM killer from accept. At the time the
OOM killer was triggered there was 4GB out of 36GB free in zone 1. The
problem was that alloc_fdtable was allocating an order 3 page (32KiB) to
hold a bitmap, and there was sufficient fragmentation that the largest
page available was 8KiB.

I find the logic that PAGE_ALLOC_COSTLY_ORDER can't fail pretty dubious
but I do agree that order 3 allocations are very likely to succeed.

There are always pathologies where order > 0 allocations can fail when
there are copious amounts of free memory available. Using the pigeon
hole principle it is easy to show that it requires 1 page more than 50%
of the pages being free to guarantee an order 1 (8KiB) allocation will
succeed, 1 page more than 75% of the pages being free to guarantee an
order 2 (16KiB) allocation will succeed and 1 page more than 87.5% of
the pages being free to guarantee an order 3 allocation will succeed.
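The pigeonhole bound above reduces to a few lines of arithmetic. This sketch assumes an idealized zone whose size is a multiple of 2^order and whose blocks are naturally aligned, as in a buddy allocator. It ignores watermarks, migrate types and per-CPU lists, and the function name is hypothetical.

```c
/*
 * Smallest number of free pages (out of total_pages) that guarantees some
 * naturally aligned block of 2^order pages is entirely free.
 */
unsigned long min_free_for_order(unsigned long total_pages, unsigned int order)
{
	unsigned long block = 1UL << order;		/* pages per aligned block */
	unsigned long blocks = total_pages / block;	/* number of such blocks */

	/*
	 * An adversary can defeat every block by keeping one page in each in
	 * use, i.e. with only `blocks` used pages. One more free page than
	 * that worst case leaves at most blocks - 1 used pages, so at least
	 * one block must be entirely free.
	 */
	return total_pages - blocks + 1;
}
```

For a 1024-page zone this gives 513, 769 and 897 free pages for orders 1, 2 and 3: 50%, 75% and 87.5% of the zone, plus one page.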

A server churning memory with a lot of small requests and replies like
memcached is a common case that, if anything, will skew the odds
against large pages being available.

Therefore let's not give external applications a practical way to kill
Linux server applications, and specify __GFP_NORETRY to the kmalloc in
alloc_fdmem. Unless I am misreading the code, by the time the code
reaches should_alloc_retry in __alloc_pages_slowpath (where
__GFP_NORETRY becomes significant), we have already tried everything
reasonable to allocate a page and the only thing left to do is wait. So
not waiting and falling back to vmalloc immediately seems like the
reasonable thing to do even if there wasn't a chance of triggering the
OOM killer.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
e6ff9a9fa4e05c1c03dec63cdc6a87d6dea02755 13-Jan-2014 Oleg Nesterov <oleg@redhat.com> fs: __fget_light() can use __fget() in slow path

The slow path in __fget_light() can use __fget() to avoid the
code duplication. Saves 232 bytes.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
ad46183445043b562856c60b74db664668fb364b 13-Jan-2014 Oleg Nesterov <oleg@redhat.com> fs: factor out common code in fget_light() and fget_raw_light()

Apart from FMODE_PATH check fget_light() and fget_raw_light() are
identical, shift the code into the new helper, __fget_light(fd, mask).
Saves 208 bytes.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
1deb46e2562561255c34075825fd00f22a858bb3 13-Jan-2014 Oleg Nesterov <oleg@redhat.com> fs: factor out common code in fget() and fget_raw()

Apart from FMODE_PATH check fget() and fget_raw() are identical,
shift the code into the new simple helper, __fget(fd, mask). Saves
160 bytes.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
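The mask-parameter refactoring above can be sketched as follows. This is a simplified, single-threaded model, not the kernel code. The table is passed explicitly instead of coming from current->files. A plain counter replaces the kernel's RCU-protected atomic reference grab. SKETCH_FMODE_PATH is an arbitrary bit standing in for FMODE_PATH, and struct files_sketch is hypothetical.

```c
#include <stddef.h>

#define SKETCH_FMODE_PATH 0x4000u	/* arbitrary bit standing in for FMODE_PATH */

struct file {
	unsigned int f_mode;
	long f_count;
};

/* Hypothetical, fixed-size descriptor table. */
struct files_sketch {
	unsigned int max_fds;
	struct file **fd;
};

/* Shared helper: look up fd and take a reference unless f_mode has a bit in mask. */
static struct file *__fget(struct files_sketch *files, unsigned int fd,
			   unsigned int mask)
{
	struct file *file;

	if (fd >= files->max_fds)
		return NULL;
	file = files->fd[fd];
	if (!file || (file->f_mode & mask))
		return NULL;
	file->f_count++;	/* the kernel uses an atomic increment under RCU */
	return file;
}

/* fget() refuses O_PATH files; fget_raw() accepts them. */
struct file *fget(struct files_sketch *files, unsigned int fd)
{
	return __fget(files, fd, SKETCH_FMODE_PATH);
}

struct file *fget_raw(struct files_sketch *files, unsigned int fd)
{
	return __fget(files, fd, 0);
}
```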
ce08b62d18b3f97cd4e5a39bd5898872b9201875 11-Jan-2014 Oleg Nesterov <oleg@redhat.com> change close_files() to use rcu_dereference_raw(files->fdt)

put_files_struct() and close_files() do rcu_read_lock() to make
rcu_dereference_check_fdtable() happy.

This looks a bit ugly, files_fdtable() just reads the pointer,
we can simply use rcu_dereference_raw() to avoid the warning.

The patch also changes close_files() to return fdt, this avoids
another rcu_read_lock()/files_fdtable() in put_files_struct().

I think close_files() needs more cleanups:

- we do not need xchg() exactly because we are the last
user of this files_struct

- "if (file)" should be turned into WARN_ON(!file)

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
a8d4b8345e0ee48b732126d980efaf0dc373e2b0 11-Jan-2014 Oleg Nesterov <oleg@redhat.com> introduce __fcheck_files() to fix rcu_dereference_check_fdtable(), kill rcu_my_thread_group_empty()

rcu_dereference_check_fdtable() looks very wrong,

1. rcu_my_thread_group_empty() was added by 844b9a8707f1 "vfs: fix
RCU-lockdep false positive due to /proc" but it doesn't really
fix the problem. A CLONE_THREAD (without CLONE_FILES) task can
hit the same race with get_files_struct().

And otoh rcu_my_thread_group_empty() can suppress the correct
warning if the caller is the CLONE_FILES (without CLONE_THREAD)
task.

2. files->count == 1 check is not really right too. Even if this
files_struct is not shared it is not safe to access it lockless
unless the caller is the owner.

Otoh, this check is sub-optimal. files->count == 0 always means
it is safe to use it lockless even if files != current->files,
but put_files_struct() has to take rcu_read_lock(). See the next
patch.

This patch removes the buggy checks and turns fcheck_files() into
__fcheck_files() which uses rcu_dereference_raw(), the "unshared"
callers, fget_light() and fget_raw_light(), can use it to avoid
the warning from RCU-lockdep.

fcheck_files() is trivially reimplemented as rcu_lockdep_assert()
plus __fcheck_files().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
ac3e3c5b1164397656df81b9e9ab4991184d3236 29-Apr-2013 Al Viro <viro@zeniv.linux.org.uk> don't bother with deferred freeing of fdtables

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
eece09ec213e93333010bf4c6bb9175b32229c54 17-Jul-2011 Thomas Gleixner <tglx@linutronix.de> locking: Various static lock initializer fixes

The static lock initializers want to be fed the proper name of the
lock and not some random string. In mainline random strings are
obfuscating the readability of debug output, but for RT they prevent
the spinlock substitution. Fix it up.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
6ae141718e3f9c7e2c620e999c86612a7f415bb1 22-Dec-2012 Greg Kroah-Hartman <gregkh@linuxfoundation.org> misc: remove __dev* attributes.

CONFIG_HOTPLUG is going away as an option. As a result, the __dev*
markings need to be removed.

This change removes the last of the __dev* markings from the kernel from
a variety of different, tiny, places.

Based on patches originally written by Bill Pemberton, but redone by me
in order to handle some of the coding style issues better, by hand.

Cc: Bill Pemberton <wfp5p@virginia.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
a77cfcb429ed98845a4e4df72473b8f37acd890b 30-Nov-2012 Al Viro <viro@zeniv.linux.org.uk> fix off-by-one in argument passed by iterate_fd() to callbacks

Noticed by Pavel Roskin; the thing in his patch I disagree with
was compensating for that shite in callbacks instead of fixing
it once in the iterator itself.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
c4144670fd9b34d6eae22c9f83751745898e8243 02-Oct-2012 Al Viro <viro@zeniv.linux.org.uk> kill daemonize()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
5a8477660d9ddc090203736d7271137265cb25bb 12-Nov-2012 Al Viro <viro@zeniv.linux.org.uk> kill bogus BUG_ON() in do_close_on_exec()

It can be legitimately triggered via procfs access. Now, at least
2 of 3 of get_files_struct() callers in procfs are useless, but
when and if we get rid of those we can always add WARN_ON() here.
BUG_ON() at that spot is simply wrong.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
08f05c49749ee655bef921d12160960a273aad47 31-Oct-2012 Al Viro <viro@ZenIV.linux.org.uk> Return the right error value when dup[23]() newfd argument is too large

Jack Lin reports that the error return from dup3() for the RLIMIT_NOFILE
case changed incorrectly after 3.6.

The culprit is commit f33ff9927f42 ("take rlimit check to callers of
expand_files()") which when it moved the "return -EMFILE" out to the
caller, didn't notice that the dup3() had special code to turn the
EMFILE return into EBADF.

The replace_fd() helper that got added later then inherited the bug too.

Reported-by: Jack Lin <linliangjie@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[ Noted more bugs, wrote proper changelog, fixed up typos - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
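The behavior this fix restores can be observed from userspace on a current Linux system. A dup2() whose newfd is at or above the RLIMIT_NOFILE soft limit should fail with EBADF, not EMFILE. The helper name is hypothetical. It returns -1 when the limit is unlimited or too large to pass as an int.

```c
#include <errno.h>
#include <limits.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Call dup2(fd, newfd) with newfd equal to the RLIMIT_NOFILE soft limit,
 * i.e. one past the highest permitted descriptor. Returns dup2()'s errno,
 * 0 if it unexpectedly succeeded, or -1 if the test cannot be run.
 */
int dup2_past_limit_errno(int fd)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0 ||
	    rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > INT_MAX)
		return -1;
	if (dup2(fd, (int)rl.rlim_cur) == -1)
		return errno;
	return 0;
}
```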
aed976475bff939672b0e21595839c445dcec0fa 09-Oct-2012 Richard W.M. Jones <rjones@redhat.com> dup3: Return an error when oldfd == newfd.

I have tested the attached patch to fix the dup3 regression.

Rich.

From 0944e30e12dec6544b3602626b60ff412375c78f Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" <rjones@redhat.com>
Date: Tue, 9 Oct 2012 14:42:45 +0100
Subject: [PATCH] dup3: Return an error when oldfd == newfd.

The following commit:

commit fe17f22d7fd0e344ef6447238f799bb49f670c6f
Author: Al Viro <viro@zeniv.linux.org.uk>
Date: Tue Aug 21 11:48:11 2012 -0400

take purely descriptor-related stuff from fcntl.c to file.c

was supposed to be just code motion, but it dropped the following two
lines:

	if (unlikely(oldfd == newfd))
		return -EINVAL;

from the dup3 system call. dup3 is not specified by POSIX, so Linux
can do what it likes. However the POSIX proposal for dup3 [1] states
that it should return an error if oldfd == newfd.

[1] http://austingroupbugs.net/view.php?id=411
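The difference between dup2() and dup3() on equal descriptors can be checked from userspace. This is a small sketch assuming glibc, where dup3() is declared only with _GNU_SOURCE. The helper name is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <unistd.h>

/* Returns the errno from dup3(fd, fd, 0), or 0 if the call unexpectedly succeeded. */
int dup3_same_fd_errno(int fd)
{
	if (dup3(fd, fd, 0) == -1)
		return errno;
	return 0;
}
```

With the two lines restored, dup3(fd, fd, 0) fails with EINVAL, while dup2(fd, fd) on a valid descriptor remains a no-op that returns fd.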

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
4557c669ef9801d96cf663331cdd1dcb8fa9c2f1 28-Aug-2012 Al Viro <viro@zeniv.linux.org.uk> export fget_light

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
864bdb3b6cbd9911222543fef1cfe36f88183f44 23-Aug-2012 Al Viro <viro@zeniv.linux.org.uk> new helper: daemonize_descriptors()

descriptor-related parts of daemonize, done right. As the
result we simplify the locking rules for ->files - we
hold task_lock in *all* cases when we modify ->files.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
c3c073f808b22dfae15ef8412b6f7b998644139a 22-Aug-2012 Al Viro <viro@zeniv.linux.org.uk> new helper: iterate_fd()

iterates through the opened files in given descriptor table,
calling a supplied function; we stop once non-zero is returned.
Callback gets struct file *, descriptor number and const void *
argument passed to iterator. It is called with files->file_lock
held, so it is not allowed to block.

tty_io, netprio_cgroup and selinux flush_unauthorized_files()
converted to its use.
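A userspace sketch of the iteration contract described above. The callback receives the const void * argument, the struct file * and the descriptor number, and iteration stops at the first non-zero return. The table type, the sketch's function names and the example callback are hypothetical, and locking is only noted in a comment.

```c
#include <stddef.h>

struct file {
	int id;
};

/* Hypothetical, fixed-size descriptor table. */
struct fd_table_sketch {
	unsigned int max_fds;
	struct file **fd;
};

typedef int (*iterate_fd_cb)(const void *arg, struct file *file, unsigned int fd);

/*
 * Call f for every open descriptor from start upward; stop at the first
 * non-zero return and pass it back. The kernel version does this with
 * files->file_lock held, so callbacks there must not block.
 */
int iterate_fd_sketch(struct fd_table_sketch *t, unsigned int start,
		      iterate_fd_cb f, const void *arg)
{
	int res = 0;

	for (unsigned int n = start; n < t->max_fds; n++) {
		struct file *file = t->fd[n];

		if (!file)
			continue;
		res = f(arg, file, n);	/* n is the descriptor that maps to file */
		if (res)
			break;
	}
	return res;
}

/* Example callback: return fd + 1 for the file equal to arg, so 0 still means "not found". */
int match_file_cb(const void *arg, struct file *file, unsigned int fd)
{
	return (const struct file *)arg == file ? (int)fd + 1 : 0;
}
```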

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
ad47bd7252bf402fe7dba92f5240b5ed16832ae7 22-Aug-2012 Al Viro <viro@zeniv.linux.org.uk> make expand_files() and alloc_fd() static

no callers outside of fs/file.c left

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
b8318b01a8f7f760ae3ecae052ccc7fc123d9508 22-Aug-2012 Al Viro <viro@zeniv.linux.org.uk> take __{set,clear}_{open_fd,close_on_exec}() into fs/file.c

nobody uses those outside anymore.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
8280d16172243702ed43432f826ca6130edb4086 21-Aug-2012 Al Viro <viro@zeniv.linux.org.uk> new helper: replace_fd()

analog of dup2(), except that it takes struct file * as source.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
fe17f22d7fd0e344ef6447238f799bb49f670c6f 21-Aug-2012 Al Viro <viro@zeniv.linux.org.uk> take purely descriptor-related stuff from fcntl.c to file.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
6a6d27de340c89c5323565b49f7851362619925d 21-Aug-2012 Al Viro <viro@zeniv.linux.org.uk> take close-on-exec logics to fs/file.c, clean it up a bit

... and add cond_resched() there, while we are at it. We can
get large latencies as is...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
483ce1d4b8c3b82bc9c9a1dd9dbc44f50b3aaf5a 19-Aug-2012 Al Viro <viro@zeniv.linux.org.uk> take descriptor-related part of close() to file.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
0ee8cdfe6af052deb56dccd54838a1eb32fb4ca2 16-Aug-2012 Al Viro <viro@zeniv.linux.org.uk> take fget() and friends to fs/file.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
f869e8a7f753e3fd43d6483e796774776f645edb 16-Aug-2012 Al Viro <viro@zeniv.linux.org.uk> expose a low-level variant of fd_install() for binder

Similar situation to that of __alloc_fd(); do not use unless you
really have to. You should not touch any descriptor table other
than your own; it's a sure sign of a really bad API design.

As with __alloc_fd(), you *must* use a first-class reference to
struct files_struct; something obtained by get_files_struct(some task)
(let alone direct task->files) will not do. It must be either
current->files, or obtained by get_files_struct(current) by the
owner of that sucker and given to you.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
56007cae94f349387c088e738c7dcb6bc513063b 16-Aug-2012 Al Viro <viro@zeniv.linux.org.uk> move put_unused_fd() and fd_install() to fs/file.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
1983e781da2f7f77906f4ccc2c3dc279cd61d1ff 16-Aug-2012 Al Viro <viro@zeniv.linux.org.uk> trim free_fdtable_rcu()

embedded case isn't hit anymore

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
b9e02af0ae0783894abb576fbab45ec29aa8e7fc 16-Aug-2012 Al Viro <viro@zeniv.linux.org.uk> don't bother with call_rcu() in put_files_struct()

At that point nobody can see us anyway; everything that
looks at files_fdtable(files) is separated from the
guts of put_files_struct(files) - either since files is
current->files or because we fetched it under task_lock()
and hadn't dropped that yet, or because we'd bumped
files->count while holding task_lock()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
7cf4dc3c8dbfdfde163d4636f621cf99a1f63bfb 16-Aug-2012 Al Viro <viro@zeniv.linux.org.uk> move files_struct-related bits from kernel/exit.c to fs/file.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
dcfadfa4ec5a12404a99ad6426871a6b03a62b37 12-Aug-2012 Al Viro <viro@zeniv.linux.org.uk> new helper: __alloc_fd()

Essentially, alloc_fd() in a files_struct we own a reference to.
Most of the time wanting to use it is a sign of lousy API
design (such as android/binder). It's *not* a general-purpose
interface; better that than open-coding its guts, but again,
playing with other process' descriptor table is a sign of bad
design.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
f33ff9927f42045116d738ee47ff7bc59f739bd7 12-Aug-2012 Al Viro <viro@zeniv.linux.org.uk> take rlimit check to callers of expand_files()

... except for one in android, where the check is different
and already done in caller. No need to recalculate rlimit
many times in alloc_fd() either.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
1a7bd2265fc57f29400d57f66275cc5918e30aa6 12-Aug-2012 Al Viro <viro@zeniv.linux.org.uk> make get_unused_fd_flags() a function

... and get_unused_fd() a macro around it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
630d9c47274aa89bfa77fe6556d7818bdcb12992 17-Nov-2011 Paul Gortmaker <paul.gortmaker@windriver.com> fs: reduce the use of module.h wherever possible

For files only using THIS_MODULE and/or EXPORT_SYMBOL, map
them onto including export.h -- or if the file isn't even
using those, then just delete the include. Fix up any implicit
include dependencies that were being masked by module.h along
the way.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
f044db4cb4bf16893812d35b5fbeaaf3e30c9215 22-Feb-2012 Bobby Powers <bobbypowers@gmail.com> fs: Fix close_on_exec pointer in alloc_fdtable

alloc_fdtable allocates space for the open_fds and close_on_exec
bitfields together, as 2 * nr / BITS_PER_BYTE. close_on_exec needs to
point to open_fds + nr / BITS_PER_BYTE, not open_fds + nr /
BITS_PER_LONG, as introduced in 1fd36adc: Replace the fd_sets in
struct fdtable with an array of unsigned longs.
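A sketch of carving both bitmaps out of one buffer. It assumes the offset is applied to a char * cursor, so it is counted in bytes. CHAR_BIT stands in for the kernel's BITS_PER_BYTE. The struct and function names are hypothetical.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

struct fd_bitmaps {
	unsigned long *open_fds;
	unsigned long *close_on_exec;
};

/*
 * Carve both bitmaps for nr descriptors out of one zeroed buffer.
 * nr must be a positive multiple of BITS_PER_LONG.
 */
int carve_fd_bitmaps(struct fd_bitmaps *b, size_t nr)
{
	char *data = calloc(2 * nr / CHAR_BIT, 1);

	if (!data)
		return -1;
	b->open_fds = (unsigned long *)data;
	/*
	 * data is a char *, so the offset is in bytes: nr bits = nr / CHAR_BIT
	 * bytes. Advancing by nr / BITS_PER_LONG here would make the two
	 * bitmaps overlap.
	 */
	data += nr / CHAR_BIT;
	b->close_on_exec = (unsigned long *)data;
	return 0;
}
```

The same address can be reached with typed arithmetic on the unsigned long pointer, open_fds + nr / BITS_PER_LONG. Trouble starts only when the word-count divisor is combined with byte-granular char * arithmetic.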

Signed-off-by: Bobby Powers <bobbypowers@gmail.com>
Link: http://lkml.kernel.org/r/1329888587-3087-1-git-send-email-bobbypowers@gmail.com
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
1fd36adcd98c14d2fd97f545293c488775cb2823 16-Feb-2012 David Howells <dhowells@redhat.com> Replace the fd_sets in struct fdtable with an array of unsigned longs

Replace the fd_sets in struct fdtable with an array of unsigned longs and then
use the standard non-atomic bit operations rather than the FD_* macros.

This:

(1) Removes the abuses of struct fd_set:

(a) Since we don't want to allocate a full fd_set the vast majority of the
time, we actually, in effect, just allocate a just-big-enough array of
unsigned longs and cast it to an fd_set type - so why bother with the
fd_set at all?

(b) Some places outside of the core fdtable handling code (such as
SELinux) want to look inside the array of unsigned longs hidden inside
the fd_set struct for more efficient iteration over the entire set.

(2) Eliminates the use of FD_*() macros in the kernel completely.

(3) Permits the __FD_*() macros to be deleted entirely where not exposed to
userspace.

Signed-off-by: David Howells <dhowells@redhat.com>
Link: http://lkml.kernel.org/r/20120216174954.23314.48147.stgit@warthog.procyon.org.uk
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
1dce27c5aa6770e9d195f2bb7db1db3d4dde5591 16-Feb-2012 David Howells <dhowells@redhat.com> Wrap accesses to the fd_sets in struct fdtable

Wrap accesses to the fd_sets in struct fdtable (for recording open files and
close-on-exec flags) so that we can move away from using fd_sets since we
abuse the fd_set structs by not allocating the full-sized structure under
normal circumstances and by non-core code looking at the internals of the
fd_sets.

The first abuse means that use of FD_ZERO() on these fd_sets is not permitted,
since that cannot be told about their abnormal lengths.

This introduces six wrapper functions for setting, clearing and testing
close-on-exec flags and fd-is-open flags:

void __set_close_on_exec(int fd, struct fdtable *fdt);
void __clear_close_on_exec(int fd, struct fdtable *fdt);
bool close_on_exec(int fd, const struct fdtable *fdt);
void __set_open_fd(int fd, struct fdtable *fdt);
void __clear_open_fd(int fd, struct fdtable *fdt);
bool fd_is_open(int fd, const struct fdtable *fdt);

Note that I've prepended '__' to the names of the set/clear functions because
they require the caller to hold a lock to use them.

Note also that I haven't added wrappers for looking behind the scenes at
the array. Possibly that should exist too.
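The six wrappers listed above can be modeled in userspace with plain bitwise operators on an array of unsigned longs. These operators stand in for the kernel's non-atomic bit operations. The two-field struct fdtable and the FDT_WORD/FDT_BIT macros are simplifications assumed for this sketch.

```c
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Minimal stand-in for struct fdtable: just the two bitmaps. */
struct fdtable {
	unsigned long *open_fds;
	unsigned long *close_on_exec;
};

/* Word index and bit mask for fd within an array of unsigned longs. */
#define FDT_WORD(fd) ((unsigned int)(fd) / BITS_PER_LONG)
#define FDT_BIT(fd)  (1UL << ((unsigned int)(fd) % BITS_PER_LONG))

/* Setters and clearers are non-atomic: the caller is assumed to hold the table lock. */
void __set_close_on_exec(int fd, struct fdtable *fdt)
{
	fdt->close_on_exec[FDT_WORD(fd)] |= FDT_BIT(fd);
}

void __clear_close_on_exec(int fd, struct fdtable *fdt)
{
	fdt->close_on_exec[FDT_WORD(fd)] &= ~FDT_BIT(fd);
}

bool close_on_exec(int fd, const struct fdtable *fdt)
{
	return fdt->close_on_exec[FDT_WORD(fd)] & FDT_BIT(fd);
}

void __set_open_fd(int fd, struct fdtable *fdt)
{
	fdt->open_fds[FDT_WORD(fd)] |= FDT_BIT(fd);
}

void __clear_open_fd(int fd, struct fdtable *fdt)
{
	fdt->open_fds[FDT_WORD(fd)] &= ~FDT_BIT(fd);
}

bool fd_is_open(int fd, const struct fdtable *fdt)
{
	return fdt->open_fds[FDT_WORD(fd)] & FDT_BIT(fd);
}
```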

Signed-off-by: David Howells <dhowells@redhat.com>
Link: http://lkml.kernel.org/r/20120216174942.23314.1364.stgit@warthog.procyon.org.uk
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
6d4831c283530a5f2c6bd8172c13efa236eb149d 28-Apr-2011 Andrew Morton <akpm@linux-foundation.org> vfs: avoid large kmalloc()s for the fdtable

Azurit reports large increases in system time after 2.6.36 when running
Apache. It was bisected down to a892e2d7dcdfa6c76e6 ("vfs: use kmalloc()
to allocate fdmem if possible").

That patch caused the vfs to use kmalloc() for very large allocations and
this is causing excessive work (and presumably excessive reclaim) within
the page allocator.

Fix it by falling back to vmalloc() earlier - when the allocation attempt
would have been considered "costly" by reclaim.
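The allocation policy above can be summarized as a small decision function. This sketch assumes a 4096-byte page and a "costly" order of 3. With those values the kmalloc() cutoff is 32 KiB, matching the order 3 (32KiB) figure in the __GFP_NORETRY entry. The enum and function names are hypothetical. kmalloc_succeeded models the outcome of the kmalloc() attempt rather than performing it.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE    4096UL	/* assumed page size */
#define SKETCH_COSTLY_ORDER 3		/* assumed value of PAGE_ALLOC_COSTLY_ORDER */

enum fdmem_backend { FDMEM_KMALLOC, FDMEM_VMALLOC };

/*
 * Pick the backing allocator for an fdtable buffer of size bytes.
 * Requests above the "costly" size go straight to vmalloc(). Smaller ones
 * try kmalloc() first (with __GFP_NORETRY that attempt fails fast instead
 * of retrying) and fall back to vmalloc() if it fails.
 */
enum fdmem_backend fdmem_choose(size_t size, bool kmalloc_succeeded)
{
	if (size <= (SKETCH_PAGE_SIZE << SKETCH_COSTLY_ORDER) && kmalloc_succeeded)
		return FDMEM_KMALLOC;
	return FDMEM_VMALLOC;
}
```

Because one buffer type can come from either allocator, the matching free has to check which one was used. The kvfree() entry above stops open-coding that check.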

Reported-by: azurIt <azurit@pobox.sk>
Tested-by: azurIt <azurit@pobox.sk>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
a892e2d7dcdfa6c76e60c50a8c7385c65587a2a6 11-Aug-2010 Changli Gao <xiaosuo@gmail.com> vfs: use kmalloc() to allocate fdmem if possible

Use kmalloc() to allocate fdmem if possible.

vmalloc() is used as a fallback solution for fdmem allocation. A new
helper function __free_fdtable() is introduced to reduce the lines of
code.

A potential bug, calling vfree() on memory allocated by kmalloc(), is fixed.

[akpm@linux-foundation.org: use __GFP_NOWARN, uninline alloc_fdmem() and free_fdmem()]
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
b97181f24212f4c29197890ce1b2b9100bcc184d 11-May-2010 Paul E. McKenney <paulmck@linux.vnet.ibm.com> fs: remove all rcu head initializations, except on_stack initializations

Remove all rcu head inits. We don't care about the RCU head state before passing
it to call_rcu() anyway. Only leave the "on_stack" variants so debugobjects can
keep track of objects on stack.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andries Brouwer <aeb@cwi.nl>
d554ed895dc8f293cc712c71f14b101ace82579a 05-Mar-2010 Jiri Slaby <jslaby@suse.cz> fs: use rlimit helpers

Make sure compiler won't do weird things with limits. E.g. fetching them
twice may return 2 different values after writable limits are implemented.

I.e. either use rlimit helpers added in commit 3e10e716abf3 ("resource:
add helpers for fetching rlimits") or ACCESS_ONCE if not applicable.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7dc52157982ab771f40e3c0b7dc55b954c3c2d19 23-Feb-2010 Paul E. McKenney <paulmck@linux.vnet.ibm.com> vfs: Apply lockdep-based checking to rcu_dereference() uses

Add lockdep-ified RCU primitives to alloc_fd(), files_fdtable()
and fcheck_files().

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
LKML-Reference: <1266887105-1528-8-git-send-email-paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
d43c36dc6b357fa1806800f18aa30123c747a6d1 07-Oct-2009 Alexey Dobriyan <adobriyan@gmail.com> headers: remove sched.h from interrupt.h

After m68k's task_thread_info() doesn't refer to current,
it's possible to remove sched.h from interrupt.h and not break m68k!
Many thanks to Heiko Carstens for allowing this.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
1027abe8827b47f7e9c4ed6514fde3d44f79963c 30-Jul-2008 Al Viro <viro@zeniv.linux.org.uk> [PATCH] merge locate_fd() and get_unused_fd()

New primitive: alloc_fd(start, flags). get_unused_fd() and
get_unused_fd_flags() become wrappers on top of it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
4e1e018ecc6f7bfd10fc75b3ff9715cc8164e0a2 26-Jul-2008 Al Viro <viro@zeniv.linux.org.uk> [PATCH] fix RLIM_NOFILE handling

* dup2() should return -EBADF on exceeded sysctl_nr_open
* dup() should *not* return -EINVAL even if you have rlimit set to 0;
it should get -EMFILE instead.

Check for orig_start exceeding rlimit taken to sys_fcntl().
Failing expand_files() in dup{2,3}() now gets -EMFILE remapped to -EBADF.
Consequently, remaining checks for rlimit are taken to expand_files().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
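The dup() half of this entry can be checked from userspace on Linux. With the RLIMIT_NOFILE soft limit set to 0, no descriptor is available, and dup() should fail with EMFILE rather than EINVAL. The helper name is hypothetical. It lowers only the soft limit and restores it afterwards.

```c
#include <errno.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Temporarily lower the RLIMIT_NOFILE soft limit to limit, call dup(fd),
 * then restore the old limit. Returns dup()'s errno, 0 if dup() succeeded,
 * or -1 if the limit could not be read or changed.
 */
int dup_errno_under_limit(int fd, rlim_t limit)
{
	struct rlimit old, tight;
	int newfd, err = 0;

	if (getrlimit(RLIMIT_NOFILE, &old) != 0)
		return -1;
	tight = old;
	tight.rlim_cur = limit;
	if (setrlimit(RLIMIT_NOFILE, &tight) != 0)
		return -1;

	newfd = dup(fd);
	if (newfd == -1)
		err = errno;	/* save errno before further calls can clobber it */
	else
		close(newfd);

	setrlimit(RLIMIT_NOFILE, &old);	/* restore the original soft limit */
	return err;
}
```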
eceea0b3df05ed262ae32e0c6340cc7a3626632d 10-May-2008 Al Viro <viro@zeniv.linux.org.uk> [PATCH] avoid multiplication overflows and signedness issues for max_fds

Limit sysctl_nr_open - we don't want ->max_fds to exceed INT_MAX and
we don't want size calculation for ->fd[] to overflow.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
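The two constraints above, together with keeping the bitmaps in whole words, give an upper bound that depends only on constants. The files_defer_init() entry notes this bound can be computed statically. This sketch computes one such bound. The function name is hypothetical, and the kernel's exact expression may differ.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Largest table size that (a) fits in an int, (b) keeps
 * max_fds * sizeof(struct file *) from overflowing size_t, and
 * (c) is a multiple of BITS_PER_LONG so the bitmaps are whole words.
 */
size_t nr_open_upper_bound(void)
{
	size_t by_int = (size_t)INT_MAX;
	size_t by_size = SIZE_MAX / sizeof(void *);
	size_t lim = by_int < by_size ? by_int : by_size;

	return lim & ~(BITS_PER_LONG - 1);	/* round down to a whole word */
}
```

On a 64-bit target the int limit is the binding one, and the result is 2147483584 (INT_MAX rounded down to a multiple of 64).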
adbecb128cd2cc5d14b0ebef6d020ced0efd0ec6 09-May-2008 Al Viro <viro@zeniv.linux.org.uk> [PATCH] dup_fd() part 4 - race fix

Parent _can_ be a clone task, contrary to the comment. Moreover,
more files could be opened while we allocate a copy, in which case
we end up copying only part into new descriptor table. Since what
we get _is_ affected by all changes in the old range, we can get
rather weird effects - e.g.
dup2(0, 1024); close(0);
in parallel with fork() resulting in child that sees the effect of
close(), but not that of dup2() done just before that close().

What we need is to recalculate the open_count after having reacquired
->file_lock and, if the external fdtable we'd just allocated is too small for
it, free the sucker and redo allocation.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
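The recalculation step can be sketched as a scan for the last non-zero word of open_fds. If the span returned after ->file_lock is reacquired exceeds what was allocated, the copy is freed and the allocation redone. The helper name is hypothetical. The kernel's own helper may round its result differently, for example when no descriptors are open.

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Return how many descriptors, rounded up to whole bitmap words, must be
 * copied to cover every open fd: scan open_fds from the top for the last
 * non-zero word. max_fds is assumed to be a multiple of BITS_PER_LONG.
 */
unsigned int open_fd_span(const unsigned long *open_fds, unsigned int max_fds)
{
	unsigned int words = (unsigned int)(max_fds / BITS_PER_LONG);

	while (words > 0 && open_fds[words - 1] == 0)
		words--;
	return (unsigned int)(words * BITS_PER_LONG);
}
```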
afbec7fff4928c273a1f1bb14dfdfdf62688a193 09-May-2008 Al Viro <viro@zeniv.linux.org.uk> [PATCH] dup_fd() - part 3

merge alloc_files() into dup_fd(), leave setting newf->fdt until the end

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9dec3c4d306b09b31331e475e895bb9674e16d81 09-May-2008 Al Viro <viro@zeniv.linux.org.uk> [PATCH] dup_fd() part 2

use alloc_fdtable() instead of expand_files(), get rid of pointless
grabbing newf->file_lock, kill magic in copy_fdtable() that used to
be there only to skip copying when called from dup_fd().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
02afc6267f6d55d47aba9fcafdbd1b7230d2294a 09-May-2008 Al Viro <viro@zeniv.linux.org.uk> [PATCH] dup_fd() fixes, part 1

Move the sucker to fs/file.c in preparation to the rest

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
f52111b1546943545e67573c4dde1c7613ca33d3 09-May-2008 Al Viro <viro@zeniv.linux.org.uk> [PATCH] take init_files to fs/file.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
5c598b3428c372a1209597cee99a70da20625876 28-Apr-2008 Al Viro <viro@zeniv.linux.org.uk> [PATCH] fix sysctl_nr_open bugs

* if luser with root sets it to something that is not a multiple of
BITS_PER_LONG, the system is screwed.
* if it gets decreased at the wrong time, we can get expand_files()
returning success and _not_ increasing the size of table as asked.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
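One way to address the first problem above is to round the requested table size up to a multiple of BITS_PER_LONG before using it. This sketch uses a common power-of-two rounding idiom. The function name is hypothetical, and n is assumed to be at least 1 and well below UINT_MAX.

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Round n (n >= 1) up to a multiple of BITS_PER_LONG. Because BITS_PER_LONG
 * is a power of two, OR-ing in its low bits and adding one lands exactly on
 * the next multiple (or on n itself if it already is one).
 */
unsigned int round_up_to_long_bits(unsigned int n)
{
	return ((n - 1) | (unsigned int)(BITS_PER_LONG - 1)) + 1;
}
```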
9f3acc3140444a900ab280de942291959f0f615d 24-Apr-2008 Al Viro <viro@zeniv.linux.org.uk> [PATCH] split linux/file.h

Initial splitoff of the low-level stuff; taken to fdtable.h

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9cfe015aa424b3c003baba3841a60dd9b5ad319b 06-Feb-2008 Eric Dumazet <dada1@cosmosbay.com> get rid of NR_OPEN and introduce a sysctl_nr_open

NR_OPEN (historically set to 1024*1024) actually prevents processes from opening
more than 1024*1024 handles.

Unfortunately some production servers hit the not so 'ridiculously high
value' of 1024*1024 file descriptors per process.

Changing NR_OPEN is not considered safe because of potential exhaustion
of vmalloc space.

This patch introduces a new sysctl (/proc/sys/fs/nr_open) which defaults to
1024*1024, so that admins can decide to change this limit if their workload
needs it.

[akpm@linux-foundation.org: export it for sparc64]
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
01b2d93ca4c495f056471189ac6c4e6ac4cbbccb 22-Dec-2006 Vadim Lobanov <vlobanov@speakeasy.net> [PATCH] fdtable: Provide free_fdtable() wrapper

Christoph Hellwig has expressed concerns that the recent fdtable changes
expose the details of the RCU methodology used to release no-longer-used
fdtable structures to the rest of the kernel. The trivial patch below
addresses these concerns by introducing the appropriate free_fdtable()
calls, which simply wrap the release RCU usage. Since free_fdtable() is a
one-liner, it makes sense to promote it to an inline helper.

Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
5466b456ed6748e0bfe02831e570004d4c04c1d7 10-Dec-2006 Vadim Lobanov <vlobanov@speakeasy.net> [PATCH] fdtable: Implement new pagesize-based fdtable allocator

This patch provides an improved fdtable allocation scheme, useful for
expanding fdtable file descriptor entries. The main focus is on the fdarray,
as its memory usage grows 128 times faster than that of an fdset.

The allocation algorithm sizes the fdarray in such a way that its memory usage
increases in easy page-sized chunks. The overall algorithm expands the allowed
size in powers of two, in order to amortize the cost of invoking vmalloc() for
larger allocation sizes. Namely, the following sizes for the fdarray are
considered, and the smallest that accommodates the requested fd count is
chosen:

pagesize / 4
pagesize / 2
pagesize <- memory allocator switch point
pagesize * 2
pagesize * 4
...etc...

Unlike the current implementation, this allocation scheme does not require a
loop to compute the optimal fdarray size, and can be done in efficient
straightline code.
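The size selection above can be sketched in userspace C. This is a minimal re-creation under stated assumptions, not the kernel's alloc_fdtable(): fdarray_capacity() and roundup_pow2() are hypothetical names, and the page size and pointer size are passed in rather than taken from PAGE_SIZE and sizeof(struct file *). Given the highest fd index nr that must fit, it returns the number of fdarray slots, which always corresponds to pagesize / 4 bytes of pointers times a power of two, computed without a loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Round n (n >= 1) up to the next power of two using bit smearing. */
static uint64_t roundup_pow2(uint64_t n)
{
	n--;
	n |= n >> 1;
	n |= n >> 2;
	n |= n >> 4;
	n |= n >> 8;
	n |= n >> 16;
	n |= n >> 32;
	return n + 1;
}

/*
 * Hypothetical helper: number of fdarray slots needed so that fd index nr
 * fits, with the array size in bytes restricted to page_size/4 times a
 * power of two (page_size/4, page_size/2, page_size, page_size*2, ...).
 */
static uint64_t fdarray_capacity(uint64_t nr, size_t page_size, size_t ptr_size)
{
	uint64_t step = page_size / 4 / ptr_size; /* slots in the smallest size */

	nr /= step;                 /* index of the step-sized chunk holding nr */
	nr = roundup_pow2(nr + 1);  /* smallest power-of-two chunk count covering it */
	return nr * step;
}
```

With 4096-byte pages and 8-byte pointers the candidate capacities are 128, 256, 512 (exactly one page) and 1024 slots, matching the pagesize / 4, pagesize / 2, pagesize, pagesize * 2 progression.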

Furthermore, since the fdarray overflows the pagesize boundary long before any
of the fdsets do, it makes sense to optimize run-time by allocating both
fdsets in a single swoop. Even together, they will still be, by far, smaller
than the fdarray. The fdtable->open_fds is now used as the anchor for the
fdset memory allocation.

Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
4fd45812cbe875a620c86a096a5d46c742694b7e 10-Dec-2006 Vadim Lobanov <vlobanov@speakeasy.net> [PATCH] fdtable: Remove the free_files field

An fdtable can either be embedded inside a files_struct or standalone (after
being expanded). When an fdtable is being discarded after all RCU references
to it have expired, we must either free it directly, in the standalone case,
or free the files_struct it is contained within, in the embedded case.

Currently the free_files field controls this behavior, but we can get rid of
it entirely, as all the necessary information is already recorded. We can
distinguish embedded and standalone fdtables using max_fds, and if it is
embedded we can divine the relevant files_struct using container_of().
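A minimal sketch of that distinction, assuming heavily trimmed hypothetical structures (fdtable_sketch, files_sketch) and assuming a table counts as embedded exactly when its max_fds does not exceed the embedded capacity. EMBEDDED_FDS = 32 is an assumption (the kernel ties its equivalent constant to the word size), and container_of_sketch() is a userspace stand-in for the kernel's container_of().

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Assumed capacity of the fdtable embedded in a files_struct. */
#define EMBEDDED_FDS 32

/* Hypothetical, heavily trimmed versions of the kernel structures. */
struct fdtable_sketch {
	unsigned int max_fds;
};

struct files_sketch {
	int count;                     /* stand-in for other files_struct fields */
	struct fdtable_sketch fdtab;   /* the embedded fdtable */
};

/*
 * Return the enclosing files_sketch if fdt is the embedded table, or NULL
 * if it is a standalone (expanded) table that should be freed directly.
 */
static struct files_sketch *embedded_owner(struct fdtable_sketch *fdt)
{
	if (fdt->max_fds <= EMBEDDED_FDS)
		return container_of_sketch(fdt, struct files_sketch, fdtab);
	return NULL;
}
```

Because expansion only ever grows max_fds beyond the embedded capacity, the size alone is enough to tell the two cases apart, which is why a separate free_files field becomes redundant.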

Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
bbea9f69668a3d0cf9feba15a724cd02896f8675 10-Dec-2006 Vadim Lobanov <vlobanov@speakeasy.net> [PATCH] fdtable: Make fdarray and fdsets equal in size

Currently, each fdtable supports three dynamically-sized arrays of data: the
fdarray and two fdsets. The code allows the number of fds supported by the
fdarray (fdtable->max_fds) to differ from the number of fds supported by each
of the fdsets (fdtable->max_fdset).

In practice, it is wasteful for these two sizes to differ: whenever we hit a
limit on the smaller-capacity structure, we will reallocate the entire fdtable
and all the dynamic arrays within it, so any delta in the memory used by the
larger-capacity structure will never be touched at all.

Rather than hogging this excess, we should not allocate it in the first
place; instead, keep the capacities of the fdarray and the fdsets equal. This
patch removes fdtable->max_fdset. As an added bonus, most of the supporting
code becomes simpler.

Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
593be07ae8f6f4a1b1b98813fabb155328f8bc0c 07-Dec-2006 Tejun Heo <htejun@gmail.com> [PATCH] file: kill unnecessary timer in fdtable_defer

free_fdtable_rc() schedules a timer to reschedule fddef->wq if
schedule_work() on it returns 0. However, schedule_work() guarantees that
the target work is executed at least once after the scheduling, regardless
of its return value. A return value of 0 simply means that the work was
already pending and thus no further action was required.

Another problem is that it used the constant '5' as the @expires argument
to mod_timer(), even though @expires is an absolute time in jiffies.

Kill unnecessary fddef->timer.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
65f27f38446e1976cc98fd3004b110fedcddd189 22-Nov-2006 David Howells <dhowells@redhat.com> WorkStruct: Pass the work_struct pointer instead of context data

Pass the work_struct pointer to the work function rather than context data.
The work function can use container_of() to work out the data.

For the cases where the container of the work_struct may go away the moment the
pending bit is cleared, it is made possible to defer the release of the
structure by deferring the clearing of the pending bit.

To make this work, an extra flag is introduced into the management side of the
work_struct. This governs auto-release of the structure upon execution.

Ordinarily, the work queue executor would release the work_struct for further
scheduling or deallocation by clearing the pending bit prior to jumping to the
work function. This means that, unless the driver makes some guarantee itself
that the work_struct won't go away, the work function may not access anything
else in the work_struct or its container lest they be deallocated. This is a
problem if the auxiliary data is taken away (as done by the last patch).

However, if the pending bit is *not* cleared before jumping to the work
function, then the work function *may* access the work_struct and its container
with no problems. But then the work function must itself release the
work_struct by calling work_release().

In most cases, automatic release is fine, so this is the default. Special
initializers exist for the non-auto-release case (ending in _NAR).


Signed-Off-By: David Howells <dhowells@redhat.com>
327dcaadc0bc08ad081aa8e36b6ec7ad7aa45e30 29-Sep-2006 Andrew Morton <akpm@osdl.org> [PATCH] expand_fdtable(): remove pointless unlock+lock

This unlock/lock on a super-unlikely path isn't worth the kernel text.

Cc: Vadim Lobanov <vlobanov@speakeasy.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
74d392aaabfc890cc1f0e80fc5ff13e5d3bcf4c9 29-Sep-2006 Vadim Lobanov <vlobanov@speakeasy.net> [PATCH] Clean up expand_fdtable() and expand_files()

Perform a code cleanup against the expand_fdtable() and expand_files()
functions inside fs/file.c. It aims to make the flow of code within these
functions simpler and easier to understand, via added comments and modest
refactoring.

Signed-off-by: Vadim Lobanov <vlobanov@speakeasy.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
8b0e330b7720a206339887044fa275bf537a5264 27-Sep-2006 Andrew Morton <akpm@osdl.org> [PATCH] alloc_fdtable() cleanup

free_fdset(NULL, ...) is legal.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
a29b0b74e73b66674d20a170e463fe9032f2272a 12-Jul-2006 Andrew Morton <akpm@osdl.org> [PATCH] alloc_fdtable() expansion fix

We're supposed to go to the next power of two if nfds==nr.

Of `nr', not of `nfds'.

Spotted by Rene Scharfe <rene.scharfe@lsrfire.ath.cx>

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
d579091b4385e9386e244622d593fe064aa8e8e7 12-Jul-2006 Kirill Korotaev <dev@openvz.org> [PATCH] fix fdset leakage

Once found, it is obvious. The nfds value calculated when allocating the
fdsets is overwritten by the calculation of the fdtable size, so when we are
unlucky, we try to free fdsets of the wrong size.

Found due to OpenVZ resource management (User Beancounters).

Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
92eb7a2f28d551acedeb5752263267a64b1f5ddf 10-Jul-2006 Andrew Morton <akpm@osdl.org> [PATCH] fix weird logic in alloc_fdtable()

There's a fairly obvious infinite loop in there.

Also, use roundup_pow_of_two() rather than open-coding stuff.

Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
0a945022778f100115d0cb6234eb28fc1b15ccaf 28-Mar-2006 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [PATCH] for_each_possible_cpu: fixes for generic part

Replace for_each_cpu() with for_each_possible_cpu().

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
0c9e63fd38a2fb2181668a0cdd622a3c23cfd567 23-Mar-2006 Eric Dumazet <dada1@cosmosbay.com> [PATCH] Shrinks sizeof(files_struct) and better layout

1) Reduce the size of (struct fdtable) to exactly 64 bytes on 32-bit
platforms, lowering kmalloc() allocated space by 50%.

2) Reduce the size of (files_struct), using a special 32-bit (or
64-bit) embedded_fd_set, instead of a 1024-bit fd_set for the
close_on_exec_init and open_fds_init fields. This saves some RAM (248
bytes per task) as most tasks don't open more than 32 files. The D-cache
footprint for such tasks is also reduced to the minimum.

3) Reduce the size of the allocated fdsets. Currently a full page is
allocated for each of the two fdsets, that is 32768 bits each on x86 for
example, which is way too much. The minimum is now L1_CACHE_BYTES.

Both UP and SMP should benefit from this patch, because most tasks will
touch only one cache line when they open() or close() stdin/stdout/stderr
(fds 0/1/2), since next_fd, close_on_exec_init, open_fds_init and
fd_array[0 .. 2] all sit in the same cache line.
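Because the embedded set is a single machine word, finding the lowest free descriptor for a small process amounts to scanning one word. A minimal sketch, assuming a hypothetical first_zero_bit() helper in place of the kernel's own bitmap search; the start argument plays the role of a next_fd hint.

```c
#include <limits.h>

/* Number of bits in one embedded set word (32 or 64 depending on the ABI). */
#define WORD_BITS ((int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Hypothetical helper: return the lowest clear bit at or after start in a
 * one-word open_fds set, or WORD_BITS if the word is full, meaning the task
 * would need an expanded fdtable.
 */
static int first_zero_bit(unsigned long open_fds, int start)
{
	for (int bit = start; bit < WORD_BITS; bit++)
		if (!(open_fds & (1UL << bit)))
			return bit;
	return WORD_BITS;
}
```

For a task holding only stdin/stdout/stderr, open_fds is 0x7 and the next open() would get fd 3, all without leaving the one cache line described above.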

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
88a2a4ac6b671a4b0dd5d2d762418904c05f4104 05-Feb-2006 Eric Dumazet <dada1@cosmosbay.com> [PATCH] percpu data: only iterate over possible CPUs

percpu_data blindly allocates bootmem memory to store NR_CPUS instances of
cpudata, instead of allocating memory only for possible cpus.

As a preparation for changing that, we need to convert various 0 -> NR_CPUS
loops to use for_each_cpu().

(The above only applies to users of asm-generic/percpu.h. powerpc has gone it
alone and is presently only allocating memory for present CPUs, so it's
currently corrupting memory).

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Jens Axboe <axboe@suse.de>
Cc: Anton Blanchard <anton@samba.org>
Acked-by: William Irwin <wli@holomorphy.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
0b175a7e68c2f51555820efb0a01681e3419c1bc 14-Sep-2005 Dipankar Sarma <dipankar@in.ibm.com> [PATCH] Fix the fdtable freeing in the case of vmalloced fdset/arrays

Noted by David Miller:

"The bug is that free_fd_array() takes a "num" argument, but when
calling it from __free_fdtable() we're instead passing in the size in
bytes (ie. "num * sizeof(struct file *)")."

Yes, it is a bug. I think I messed it up while merging newer
changes with an older version where I was using size in bytes
to optimize.

Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ab2af1f5005069321c5d130f09cce577b03f43ef 09-Sep-2005 Dipankar Sarma <dipankar@in.ibm.com> [PATCH] files: files struct with RCU

This patch eliminates the struct files_struct.file_lock spinlock on the
reader side and uses the RCU refcounting rcuref_xxx API for the f_count
refcounter. Updates to the fdtable are done by allocating a new fdtable
structure and setting files->fdt to point to it. The fdtable structure is
protected by RCU, thereby allowing lock-free lookup. For fd arrays/sets that
are vmalloced, we use keventd to free them, since RCU callbacks can't sleep.
A global list of fdtables to be freed is not scalable, so we use a per-CPU
list. If keventd is already handling the current CPU's work, we use a timer
to defer queueing of that work.

Since the last publication, this patch has been rewritten to avoid using
explicit memory barriers and to use the rcu_assign_pointer() and
rcu_dereference() primitives instead. This required that the fd information
be kept in a separate structure (fdtable) and updated atomically.
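The "publish a whole new structure through one pointer" idea can be approximated in userspace with C11 atomics. This is a swapped-in analogue, not RCU: a release store stands in for rcu_assign_pointer() and an acquire load for rcu_dereference(), and the sketch deliberately omits the grace-period wait, so it never frees a replaced table. All names (table_sketch, cur_table, publish_table, lookup_fd) are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical trimmed fdtable: the size and the array travel together. */
struct table_sketch {
	unsigned int max_fds;
	void **fd;
};

/* Readers follow this pointer; writers replace it wholesale. */
static _Atomic(struct table_sketch *) cur_table;

/* Writer: fully initialize the new table first, then publish it in one store. */
static void publish_table(struct table_sketch *t)
{
	atomic_store_explicit(&cur_table, t, memory_order_release);
}

/*
 * Reader: one load yields a consistent (max_fds, fd) pair, so no lock is
 * needed to bounds-check and index. Returns NULL if fdnum is out of range.
 */
static void *lookup_fd(unsigned int fdnum)
{
	struct table_sketch *t =
		atomic_load_explicit(&cur_table, memory_order_acquire);

	if (t == NULL || fdnum >= t->max_fds)
		return NULL;
	return t->fd[fdnum];
}
```

Keeping the size and the array in one structure is what makes a single pointer store sufficient: a reader can never pair a new max_fds with an old, shorter array. Safely freeing the old table once no reader can still hold it is the part RCU supplies and this sketch leaves out.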

Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
badf16621c1f9d1ac753be056fce11b43d6e0be5 09-Sep-2005 Dipankar Sarma <dipankar@in.ibm.com> [PATCH] files: break up files struct

In order for RCU to work, the file table array, sets and their sizes must
be updated atomically. Instead of ensuring this through too many memory
barriers, we put the arrays and their sizes in a separate structure. This
patch takes the first step of putting the file table elements in a separate
structure, fdtable, that is embedded within files_struct. It also changes all
the users to refer to the file table using the files_fdtable() macro.
Subsequent application of RCU becomes easier after this.

Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 17-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org> Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!