History log of /fs/proc/generic.c
Revision Date Author Comments
4dcc03fc4553943c8bae086aa826bb75171e82a1 08-Aug-2014 Alexey Dobriyan <adobriyan@gmail.com> proc: make proc_subdir_lock static

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dbcdb504417ae108a20454ef89776a614b948571 08-Aug-2014 Alexey Dobriyan <adobriyan@gmail.com> proc: add and remove /proc entry create checks

* remove proc_create(NULL, ...) check, let it oops

* warn about proc_create("", ...) and proc_create("very very long name", ...)
proc code keeps length as u8, no 256+ name length possible

* warn about proc_create("123", ...)
/proc/$PID and /proc/misc namespaces are separate things,
but dumb module might create funky a-la $PID entry.

* remove post mortem strchr('/') check
Triggering it implies either strchr() is buggy or memory corruption.
It should be VFS check anyway.

In reality, none of these checks will ever trigger,
it is preparation for the next patch.

Based on patch from Al Viro.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cdf7e8dded6212cb29f758017a613e4eefc4ce9e 24-Jan-2014 Rui Xiang <rui.xiang@huawei.com> proc: set attributes of pde using accessor functions

Use existing accessors proc_set_user() and proc_set_size() to set
attributes. Just a cleanup.

Signed-off-by: Rui Xiang <rui.xiang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
b26d4cd385fc51e8844e2cdf9ba2051f5bba11a5 26-Oct-2013 Al Viro <viro@zeniv.linux.org.uk> consolidate simple ->d_delete() instances

Rename simple_delete_dentry() to always_delete_dentry() and export it.
Export simple_dentry_operations, while we are at it, and get rid of
their duplicates

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
fd3930f70c8d14008f3377d51ce039806dfc542e 20-Aug-2013 Linus Torvalds <torvalds@linux-foundation.org> proc: more readdir conversion bug-fixes

In the previous commit, Richard Genoud fixed proc_root_readdir(), which
had lost the check for whether all of the non-process /proc entries had
been returned or not.

But that in turn exposed _another_ bug, namely that the original readdir
conversion patch had yet another problem: it had lost the return value
of proc_readdir_de(), so now checking whether it had completed
successfully or not didn't actually work right anyway.

This reinstates the non-zero return for the "end of base entries" that
had also gotten lost in commit f0c3b5093add ("[readdir] convert
procfs"). So now you get all the base entries *and* you get all the
process entries, regardless of getdents buffer size.

(Side note: the Linux "getdents" manual page actually has a nice example
application for testing getdents, which can be easily modified to use
different buffers. Who knew? Man-pages can be useful)

Reported-by: Emmanuel Benisty <benisty.e@gmail.com>
Reported-by: Marc Dionne <marc.c.dionne@gmail.com>
Cc: Richard Genoud <richard.genoud@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
f0c3b5093addc8bfe9fe3a5b01acb7ec7969eafa 16-May-2013 Al Viro <viro@zeniv.linux.org.uk> [readdir] convert procfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
c30480b92cf497aa3b463367a82f1c2fdc5c46e9 12-Apr-2013 David Howells <dhowells@redhat.com> proc: Make the PROC_I() and PDE() macros internal to procfs

Make the PROC_I() and PDE() macros internal to procfs. This means making
PDE_DATA() out of line. This could be made more optimal by storing
PDE()->data into inode->i_private.

Also provide a __PDE_DATA() that is inline and internal to procfs.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
a8ca16ea7b0abb0a7e49492d1123b715f0ec62e8 12-Apr-2013 David Howells <dhowells@redhat.com> proc: Supply a function to remove a proc entry by PDE

Supply a function (proc_remove()) to remove a proc entry (and any subtree
rooted there) by proc_dir_entry pointer rather than by name and (optionally)
root dir entry pointer. This allows us to eliminate all remaining pde->name
accesses outside of procfs.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Grant Likely <grant.likely@linaro.or>
cc: linux-acpi@vger.kernel.org
cc: openipmi-developer@lists.sourceforge.net
cc: devicetree-discuss@lists.ozlabs.org
cc: linux-pci@vger.kernel.org
cc: netdev@vger.kernel.org
cc: netfilter-devel@vger.kernel.org
cc: alsa-devel@alsa-project.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
4a520d2769beb736ba2bd084b8293ce148a1a7ae 12-Apr-2013 David Howells <dhowells@redhat.com> proc: Supply an accessor for getting the data from a PDE's parent

Supply an accessor function for getting the private data from the parent
proc_dir_entry struct of the proc_dir_entry struct associated with an inode.

ReiserFS, for instance, stores the super_block pointer in the proc directory
it makes for that super_block, and a pointer to the respective seq_file show
function in each of the proc files in that directory.

This allows a reduction in the number of file_operations structs, open
functions and seq_operations structs required. The problem otherwise is that
each show function requires two pieces of data but only has storage for one
per PDE (and this has no release function).

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cc: Jerry Chuang <jerry-chuang@realtek.com>
cc: Maxim Mikityanskiy <maxtram95@gmail.com>
cc: YAMANE Toshiaki <yamanetoshi@gmail.com>
cc: linux-wireless@vger.kernel.org
cc: linux-scsi@vger.kernel.org
cc: devel@driverdev.osuosl.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
270b5ac2151707c25d3327722c5badfbd95945bc 12-Apr-2013 David Howells <dhowells@redhat.com> proc: Add proc_mkdir_data()

Add proc_mkdir_data() to allow procfs directories to be created that are
annotated at the time of creation with private data rather than doing this
post-creation. This means no access is then required to the proc_dir_entry
struct to set this.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cc: Neela Syam Kolli <megaraidlinux@lsi.com>
cc: Jerry Chuang <jerry-chuang@realtek.com>
cc: linux-scsi@vger.kernel.org
cc: devel@driverdev.osuosl.org
cc: linux-wireless@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
271a15eabe094538d958dc68ccfc9c36b699247a 12-Apr-2013 David Howells <dhowells@redhat.com> proc: Supply PDE attribute setting accessor functions

Supply accessor functions to set attributes in proc_dir_entry structs.

The following are supplied: proc_set_size() and proc_set_user().

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
cc: linuxppc-dev@lists.ozlabs.org
cc: linux-media@vger.kernel.org
cc: netdev@vger.kernel.org
cc: linux-wireless@vger.kernel.org
cc: linux-pci@vger.kernel.org
cc: netfilter-devel@vger.kernel.org
cc: alsa-devel@alsa-project.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
3cb5bf1bf947d325fcf6e9458952b51cfd7e6677 11-Apr-2013 David Howells <dhowells@redhat.com> proc: Delete create_proc_read_entry()

Delete create_proc_read_entry() as it no longer has any users.

Also delete read_proc_t, write_proc_t, the read_proc member of the
proc_dir_entry struct and the support functions that use them. This saves a
pointer for every PDE allocated.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
866ad9a747bbf5461739fcae6d0a41c8971bbe1d 04-Apr-2013 Al Viro <viro@zeniv.linux.org.uk> procfs: preparations for remove_proc_entry() race fixes

* leave ->proc_fops alone; make ->pde_users negative instead
* trim pde_opener
* move relevant code in fs/proc/inode.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
ad147d011f4e9d4e4309f7974fd19c7f875ccb14 04-Apr-2013 David Howells <dhowells@redhat.com> procfs: Clean up huge if-statement in __proc_file_read()

Switch huge if-statement in __proc_file_read() around. This then puts the
single line loop break immediately after the if-statement and allows us to
de-indent the huge comment and make it take fewer lines. The code following
the if-statement then follows naturally from the call to dp->read_proc().

Signed-off-by: David Howells <dhowells@redhat.com>
80e928f7ebb958f4d79d4099d1c5c0a015a23b93 04-Apr-2013 David Howells <dhowells@redhat.com> proc: Kill create_proc_entry()

Kill create_proc_entry() in favour of create_proc_read_entry(), proc_create()
and proc_create_data().

Signed-off-by: David Howells <dhowells@redhat.com>
d9dda78bad879595d8c4220a067fc029d6484a16 01-Apr-2013 Al Viro <viro@zeniv.linux.org.uk> procfs: new helper - PDE_DATA(inode)

The only part of proc_dir_entry the code outside of fs/proc
really cares about is PDE(inode)->data. Provide a helper
for that; static inline for now, eventually will be moved
to fs/proc, along with the knowledge of struct proc_dir_entry
layout.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
ee21ed0afc2f47007fbd8b22928ecb17316e13e2 31-Mar-2013 Al Viro <viro@zeniv.linux.org.uk> procfs: kill ->write_proc()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
b6cdc7310338e204224f865918f774eb6db0b75d 31-Mar-2013 Al Viro <viro@zeniv.linux.org.uk> procfs: don't allow to use proc_create, create_proc_entry, etc. for directories

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
8ce584c7416d8a85a6f3edc17d1cddefe331e87e 31-Mar-2013 Al Viro <viro@zeniv.linux.org.uk> procfs: add proc_remove_subtree()

just what it sounds like; do that only to procfs subtrees you've
created - doing that to something shared with another driver is
not only antisocial, but might cause interesting races with
proc_create() and its ilk.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
87ebdc00eeb474615496d5f10eed46709e25c707 28-Feb-2013 Andrew Morton <akpm@linux-foundation.org> fs/proc: clean up printks

- use pr_foo() throughout

- remove a couple of duplicated KERN_WARNINGs, via WARN(KERN_WARNING "...")

- nuke a few warnings which I've never seen happen, ever.

Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
d3d009cb965eae7e002ea5badf603ea8f4c34915 26-Jan-2013 Al Viro <viro@zeniv.linux.org.uk> saner proc_get_inode() calling conventions

Make it drop the pde in *all* cases when no new reference to it is
put into an inode - both when an inode had already been set up
(as we were already doing) and when inode allocation has failed.
Makes for simpler logics in callers...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
496ad9aa8ef448058e36ca7a787c61f2e63f0f54 23-Jan-2013 Al Viro <viro@zeniv.linux.org.uk> new helper: file_inode(file)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
dfb2ea45becb198beeb75350d0b7b7ad9076a38f 22-Dec-2012 Eric W. Biederman <ebiederm@xmission.com> proc: Allow proc_free_inum to be called from any context

While testing the pid namespace code I hit this nasty warning.

[ 176.262617] ------------[ cut here ]------------
[ 176.263388] WARNING: at /home/eric/projects/linux/linux-userns-devel/kernel/softirq.c:160 local_bh_enable_ip+0x7a/0xa0()
[ 176.265145] Hardware name: Bochs
[ 176.265677] Modules linked in:
[ 176.266341] Pid: 742, comm: bash Not tainted 3.7.0userns+ #18
[ 176.266564] Call Trace:
[ 176.266564] [<ffffffff810a539f>] warn_slowpath_common+0x7f/0xc0
[ 176.266564] [<ffffffff810a53fa>] warn_slowpath_null+0x1a/0x20
[ 176.266564] [<ffffffff810ad9ea>] local_bh_enable_ip+0x7a/0xa0
[ 176.266564] [<ffffffff819308c9>] _raw_spin_unlock_bh+0x19/0x20
[ 176.266564] [<ffffffff8123dbda>] proc_free_inum+0x3a/0x50
[ 176.266564] [<ffffffff8111d0dc>] free_pid_ns+0x1c/0x80
[ 176.266564] [<ffffffff8111d195>] put_pid_ns+0x35/0x50
[ 176.266564] [<ffffffff810c608a>] put_pid+0x4a/0x60
[ 176.266564] [<ffffffff8146b177>] tty_ioctl+0x717/0xc10
[ 176.266564] [<ffffffff810aa4d5>] ? wait_consider_task+0x855/0xb90
[ 176.266564] [<ffffffff81086bf9>] ? default_spin_lock_flags+0x9/0x10
[ 176.266564] [<ffffffff810cab0a>] ? remove_wait_queue+0x5a/0x70
[ 176.266564] [<ffffffff811e37e8>] do_vfs_ioctl+0x98/0x550
[ 176.266564] [<ffffffff810b8a0f>] ? recalc_sigpending+0x1f/0x60
[ 176.266564] [<ffffffff810b9127>] ? __set_task_blocked+0x37/0x80
[ 176.266564] [<ffffffff810ab95b>] ? sys_wait4+0xab/0xf0
[ 176.266564] [<ffffffff811e3d31>] sys_ioctl+0x91/0xb0
[ 176.266564] [<ffffffff810a95f0>] ? task_stopped_code+0x50/0x50
[ 176.266564] [<ffffffff81939199>] system_call_fastpath+0x16/0x1b
[ 176.266564] ---[ end trace 387af88219ad6143 ]---

It turns out that spin_unlock_bh(proc_inum_lock) is not safe when
put_pid is called with another spinlock held and irqs disabled.

For now take the easy path and use spin_lock_irqsave(proc_inum_lock)
in proc_free_inum and spin_loc_irq in proc_alloc_inum(proc_inum_lock).

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
ee297209bf0a25c6717b7c063e76795142d32f37 21-Dec-2012 Xiaotian Feng <xtfeng@gmail.com> proc: fix inconsistent lock state

Lockdep found an inconsistent lock state when rcu is processing delayed
work in softirq. Currently, kernel is using spin_lock/spin_unlock to
protect proc_inum_ida, but proc_free_inum is called by rcu in softirq
context.

Use spin_lock_bh/spin_unlock_bh fix following lockdep warning.

=================================
[ INFO: inconsistent lock state ]
3.7.0 #36 Not tainted
---------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/1/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
(proc_inum_lock){+.?...}, at: proc_free_inum+0x1c/0x50
{SOFTIRQ-ON-W} state was registered at:
__lock_acquire+0x8ae/0xca0
lock_acquire+0x199/0x200
_raw_spin_lock+0x41/0x50
proc_alloc_inum+0x4c/0xd0
alloc_mnt_ns+0x49/0xc0
create_mnt_ns+0x25/0x70
mnt_init+0x161/0x1c7
vfs_caches_init+0x107/0x11a
start_kernel+0x348/0x38c
x86_64_start_reservations+0x131/0x136
x86_64_start_kernel+0x103/0x112
irq event stamp: 2993422
hardirqs last enabled at (2993422): _raw_spin_unlock_irqrestore+0x55/0x80
hardirqs last disabled at (2993421): _raw_spin_lock_irqsave+0x29/0x70
softirqs last enabled at (2993394): _local_bh_enable+0x13/0x20
softirqs last disabled at (2993395): call_softirq+0x1c/0x30

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(proc_inum_lock);
<Interrupt>
lock(proc_inum_lock);

*** DEADLOCK ***

no locks held by swapper/1/0.

stack backtrace:
Pid: 0, comm: swapper/1 Not tainted 3.7.0 #36
Call Trace:
<IRQ> [<ffffffff810a40f1>] ? vprintk_emit+0x471/0x510
print_usage_bug+0x2a5/0x2c0
mark_lock+0x33b/0x5e0
__lock_acquire+0x813/0xca0
lock_acquire+0x199/0x200
_raw_spin_lock+0x41/0x50
proc_free_inum+0x1c/0x50
free_pid_ns+0x1c/0x50
put_pid_ns+0x2e/0x50
put_pid+0x4a/0x60
delayed_put_pid+0x12/0x20
rcu_process_callbacks+0x462/0x790
__do_softirq+0x1b4/0x3b0
call_softirq+0x1c/0x30
do_softirq+0x59/0xd0
irq_exit+0x54/0xd0
smp_apic_timer_interrupt+0x95/0xa3
apic_timer_interrupt+0x72/0x80
cpuidle_enter_tk+0x10/0x20
cpuidle_enter_state+0x17/0x50
cpuidle_idle_call+0x287/0x520
cpu_idle+0xba/0x130
start_secondary+0x2b3/0x2bc

Signed-off-by: Xiaotian Feng <dannyfeng@tencent.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
46f69557103e11fb963ae5c98b7777e90493241b 15-Dec-2012 Marco Stornelli <marco.stornelli@gmail.com> procfs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
33d6dce607573b5fd7a43168e0d91221b3ca532b 17-Jun-2011 Eric W. Biederman <ebiederm@xmission.com> proc: Generalize proc inode allocation

Generalize the proc inode allocation so that it can be
used without having to having to create a proc_dir_entry.

This will allow namespace file descriptors to remain light
weight entitities but still have the same inode number
when the backing namespace is the same.

Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
17baa2a2c40f2f0330d25819b589a515d36b2d40 05-Oct-2012 yan <clouds.yan@gmail.com> proc: use kzalloc instead of kmalloc and memset

Part of the memory will be written twice after this change, but that
should be negligible.

[akpm@linux-foundation.org: fix __proc_create() coding-style issues, remove unneeded zero-initialisations]
Signed-off-by: yan <clouds.yan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
620727506dc6da0562fa4f6950dedb8a51bd8237 05-Oct-2012 yan <clouds.yan@gmail.com> proc: return -ENOMEM when inode allocation failed

If proc_get_inode() returns NULL then presumably it encountered memory
exhaustion. proc_lookup_de() should return -ENOMEM in this case, not
-EINVAL.

Signed-off-by: yan <clouds.yan@gmail.com>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
00cd8dd3bf95f2cc8435b4cac01d9995635c6d0b 10-Jun-2012 Al Viro <viro@zeniv.linux.org.uk> stop passing nameidata to ->lookup()

Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument. And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
d161a13f974c72fd7ff0069d39a3ae57cb5694ff 24-Jul-2011 Al Viro <viro@zeniv.linux.org.uk> switch procfs to umode_t use

both proc_dir_entry ->mode and populating functions

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
bfe8684869601dacfcb2cd69ef8cfd9045f62170 28-Oct-2011 Miklos Szeredi <mszeredi@suse.cz> filesystems: add set_nlink()

Replace remaining direct i_nlink updates with a new set_nlink()
updater function.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
09570f914914d2beb0db29c5a9c7344934f2fa8c 27-Jul-2011 David Howells <dhowells@redhat.com> proc: make struct proc_dir_entry::name a terminal array rather than a pointer

Since __proc_create() appends the name it is given to the end of the PDE
structure that it allocates, there isn't a need to store a name pointer.
Instead we can just replace the name pointer with a terminal char array of
_unspecified_ length. The compiler will simply append the string to statically
defined variables of PDE type overlapping any hole at the end of the structure
and, unlike specifying an explicitly _zero_ length array, won't give a warning
if you try to statically initialise it with a string of more than zero length.

Also, whilst we're at it:

(1) Move namelen to end just prior to name and reduce it to a single byte
(name shouldn't be longer than NAME_MAX).

(2) Move pde_unload_lock two places further on so that if it's four bytes in
size on a 64-bit machine, it won't cause an unused hole in the PDE struct.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
011159a0a746e03ae42d559ce5c2a70138da3129 13-May-2011 Alexey Dobriyan <adobriyan@gmail.com> airo: correct proc entry creation interfaces

* use proc_mkdir_mode() instead of create_proc_entry(S_IFDIR|...),
export proc_mkdir_mode() for that, oh well.
* don't supply S_IFREG to proc_create_data(), it's unnecessary

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
312ec7e50c4d3f40b3762af651d1aa79a67f556a 24-Mar-2011 Alexey Dobriyan <adobriyan@gmail.com> proc: make struct proc_dir_entry::namelen unsigned int

1. namelen is declared "unsigned short" which hints for "maybe space savings".
Indeed in 2.4 struct proc_dir_entry looked like:

struct proc_dir_entry {
unsigned short low_ino;
unsigned short namelen;

Now, low_ino is "unsigned int", all savings were gone for a long time.
"struct proc_dir_entry" is not that countless to worry about it's size,
anyway.

2. converting from unsigned short to int/unsigned int can only create
problems, we better play it safe.

Space is not really conserved, because of natural alignment for the next
field. sizeof(struct proc_dir_entry) remains the same.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3740a20c4fe8697cb604a7d51395d23472b1768f 13-Jan-2011 Alexey Dobriyan <adobriyan@gmail.com> proc: less LOCK/UNLOCK in remove_proc_entry()

For the common case where a proc entry is being removed and nobody is in
the process of using it, save a LOCK/UNLOCK pair.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6d1b6e4eff89475785f60fa00f65da780f869f36 13-Jan-2011 Alexey Dobriyan <adobriyan@gmail.com> proc: ->low_ino cleanup

- ->low_ino is write-once field -- reading it under locks is unnecessary.

- /proc/$PID stuff never reaches pde_put()/free_proc_entry() --
PROC_DYNAMIC_FIRST check never triggers.

- in proc_get_inode(), inode number always matches proc dir entry, so
save one parameter.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fb045adb99d9b7c562dc7fef834857f78249daa1 07-Jan-2011 Nick Piggin <npiggin@kernel.dk> fs: dcache reduce branches in lookup path

Reduce some branches and memory accesses in dcache lookup by adding dentry
flags to indicate common d_ops are set, rather than having to check them.
This saves a pointer memory access (dentry->d_op) in common path lookup
situations, and saves another pointer load and branch in cases where we
have d_op but not the particular operation.

Patched with:

git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
fe15ce446beb3a33583af81ffe6c9d01a75314ed 07-Jan-2011 Nick Piggin <npiggin@kernel.dk> fs: change d_delete semantics

Change d_delete from a dentry deletion notification to a dentry caching
advise, more like ->drop_inode. Require it to be constant and idempotent,
and not take d_lock. This is how all existing filesystems use the callback
anyway.

This makes fine grained dentry locking of dput and dentry lru scanning
much simpler.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
1025774ce411f2bd4b059ad7b53f0003569b74fa 04-Jun-2010 Christoph Hellwig <hch@lst.de> remove inode_setattr

Replace inode_setattr with opencoded variants of it in all callers. This
moves the remaining call to vmtruncate into the filesystem methods where it
can be replaced with the proper truncate sequence.

In a few cases it was obvious that we would never end up calling vmtruncate
so it was left out in the opencoded variant:

spufs: explicitly checks for ATTR_SIZE earlier
btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
ufs: contains an opencoded simple_seattr + truncate that sets the filesize just above

In addition to that ncpfs called inode_setattr with handcrafted iattrs,
which allowed to trim down the opencoded variant.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
57f87869f073929f8e8b3c73748aabb0cece19aa 26-May-2010 Amerigo Wang <amwang@redhat.com> proc: remove obsolete comments

A quick test shows these comments are obsolete, so just remove them.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5a0e3ad6af8660be21ca98a971cd00f331318c05 24-Mar-2010 Tejun Heo <tj@kernel.org> include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
12bac0d9f4dbf3445a0319beee848d15fa32775e 05-Mar-2010 Alexey Dobriyan <adobriyan@gmail.com> proc: warn on non-existing proc entries

* warn if creation goes on to non-existent directory
* warn if removal goes on from non-existing directory
* warn if non-existing proc entry is removed

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
e17a5765f20d1219c3f05eb17aab11671978e0ec 05-Mar-2010 Alexey Dobriyan <adobriyan@gmail.com> proc: do translation + unlink atomically at remove_proc_entry()

remove_proc_entry() does

lock
lookup parent
unlock
lock
unlink proc entry from lists
unlock

which can be made bit more correct by doing parent translation + unlink
without dropping lock.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
587d4a17d837ac0f17edb26f1b6c80c0abda6343 30-Dec-2009 Helight.Xu <helight.xu@gmail.com> some clean up in fs/proc

EXPORT_SYMBOL(proc_symlink);
EXPORT_SYMBOL(proc_mkdir);
EXPORT_SYMBOL(create_proc_entry);
EXPORT_SYMBOL(proc_create_data);
EXPORT_SYMBOL(remove_proc_entry);

Those EXPORT_SYMBOL shouldn't be in fs/proc/root.c,
should be in fs/proc/generic.c.

Signed-off-by: Helight.Xu <helight.xu@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
135d5655dc58a24eda64e3f6c192d7d605e10050 16-Dec-2009 Alexey Dobriyan <adobriyan@gmail.com> proc: rename de_get() to pde_get() and inline it

* de_get() is trivial -- make inline, save a few bits of code, drop
"refcount is 0" check -- it should be done in some generic refcount
code, don't recall it's was helpful

* rename GET and PUT functions to pde_get(), pde_put() for cool prefix!

* remove obvious and incorrent comments

* in remove_proc_entry() use pde_put(), when I fixed PDE refcounting to
be normal one, remove_proc_entry() was supposed to do "-1" and code now
reflects that.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3dec7f59c370c7b58184d63293c3dc984d475840 20-Feb-2009 Alexey Dobriyan <adobriyan@gmail.com> proc 1/2: do PDE usecounting even for ->read_proc, ->write_proc

struct proc_dir_entry::owner is going to be removed. Now it's only necessary
to protect PDEs which are using ->read_proc, ->write_proc hooks.

However, ->owner assignments are racy and make it very easy for someone to switch
->owner on live PDE (as some subsystems do) without fixing refcounts and so on.

http://bugzilla.kernel.org/show_bug.cgi?id=12454

So, ->owner is on death row.

Proxy file operations exist already (proc_file_operations), just bump usecount
when necessary.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
1681bc30f272dd2fe347b90468791b05c7044f03 13-Jan-2009 Randy Dunlap <randy.dunlap@oracle.com> proc: move fs/proc/inode-alloc.txt comment into a source file

so that people will realize that it exists and can update it as needed.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
d72f71eb0edd629c95715aa7305b0259d3581e34 20-Feb-2009 Al Viro <viro@zeniv.linux.org.uk> constify dentry_operations: procfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
b4df2b92d8461444fac429c75ba6e125c63056bc 27-Oct-2008 Alexey Dobriyan <adobriyan@gmail.com> proc: stop using BKL

There are four BKL users in proc: de_put(), proc_lookup_de(),
proc_readdir_de(), proc_root_readdir(),

1) de_put()
-----------
de_put() is classic atomic_dec_and_test() refcount wrapper -- no BKL
needed. BKL doesn't matter to possible refcount leak as well.

2) proc_lookup_de()
-------------------
Walking PDE list is protected by proc_subdir_lock(), proc_get_inode() is
potentially blocking, all callers of proc_lookup_de() eventually end up
from ->lookup hooks which is protected by directory's ->i_mutex -- BKL
doesn't protect anything.

3) proc_readdir_de()
--------------------
"." and ".." part doesn't need BKL, walking PDE list is under
proc_subdir_lock, calling filldir callback is potentially blocking
because it writes to luserspace. All proc_readdir_de() callers
eventually come from ->readdir hook which is under directory's
->i_mutex -- BKL doesn't protect anything.

4) proc_root_readdir_de()
-------------------------
proc_root_readdir_de is ->readdir hook, see (3).

Since readdir hooks doesn't use BKL anymore, switch to
generic_file_llseek, since it also takes directory's i_mutex.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
6c2f91e077f1b60e7f83b7ee044f965f469cfdb3 14-Sep-2008 Arjan van de Ven <arjan@linux.intel.com> proc: use WARN() rather than printk+backtrace

Use WARN() rather than a printk() + backtrace();
this gives a more standard format message as well as complete
information (including line numbers etc) that will be collected
by kerneloops.org

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
665020c35e89a9e0643e21561e4f8f967f4f2c4b 13-Sep-2008 Alexey Dobriyan <adobriyan@gmail.com> proc: more debugging for "already registered" case

Print parent directory name as well.

The aim is to catch non-creation of parent directory when proc_mkdir will
return NULL and all subsequent registrations go directly in /proc instead
of intended directory.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Fixed insane printk string while at it. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cc996099174dc05b35b7a29301026987990e7f8c 02-Aug-2008 Alexey Dobriyan <adobriyan@gmail.com> [PATCH] proc: inode number fixlet

Ouch, if number taken from IDA is too big, the intent was to signal an
error, not check for overflow and still do overflowing addition.

One still needs 2^28 proc entries to notice this.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9a18540915faaaadd7f71c16fa877a0c19675923 26-Jul-2008 Alexey Dobriyan <adobriyan@gmail.com> [PATCH 2/2] proc: switch inode number allocation to IDA

proc doesn't use "associate pointer with id" feature of IDR, so switch
to IDA.

NOTE, NOTE, NOTE:
Do not apply if release_inode_number() still mantions MAX_ID_MASK!

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
67935df49dae836fa86621861979fafdfd37ae59 26-Jul-2008 Alexey Dobriyan <adobriyan@gmail.com> [PATCH 1/2] proc: fix inode number bogorithmetic

Id which proc gets from IDR for inode number and id which proc removes
from IDR do not match. E.g. 0x11a transforms into 0x8000011a.

Which stayed unnoticed for a long time because, surprise, idr_remove()
masks out that high bit before doing anything.

All of this due to "| ~MAX_ID_MASK" in release_inode_number().

I still don't understand how it's supposed to work, because "| ~MASK"
is not an inversion for "& MAX" operation.

So, use just one nice, working addition. Make start offset unsigned int,
while I'm at it. It's longness is not used anywhere.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
267e2a9c71b8e088ac307f9549f71468e86e26c1 26-Jul-2008 Arjan van de Ven <arjan@linux.intel.com> Use WARN() in fs/proc/

Use WARN() instead of a printk+WARN_ON() pair; this way the message
becomes part of the warning section for better reporting/collection.
This way, the entire if() {} section can collapse into the WARN() as well.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
881adb85358309ea9c6f707394002719982ec607 25-Jul-2008 Alexey Dobriyan <adobriyan@gmail.com> proc: always do ->release

Current two-stage scheme of removing PDE emphasizes one bug in proc:

open
rmmod
remove_proc_entry
close

->release won't be called because ->proc_fops were cleared. In simple
cases it's small memory leak.

For every ->open, ->release has to be done. List of openers is introduced
which is traversed at remove_proc_entry() if neeeded.

Discussions with Al long ago (sigh).

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
78e92b99ec4eb73755abd4e357b0b211eadafd88 02-May-2008 Denis V. Lunev <den@openvz.org> netns: assign PDE->data before gluing entry into /proc tree

In this unfortunate case, proc_mkdir_mode wrapper can't be used anymore and
this is no way to reuse proc_create_data due to nlinks assignment. So,
copy the code from proc_mkdir and assign PDE->data at the appropriate
moment.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
59b7435149eab2dd06dd678742faff6049cb655f 29-Apr-2008 Denis V. Lunev <den@openvz.org> proc: introduce proc_create_data to setup de->data

This set of patches fixes an proc ->open'less usage due to ->proc_fops flip in
the most part of the kernel code. The original OOPS is described in the
commit 2d3a4e3666325a9709cc8ea2e88151394e8f20fc:

Typical PDE creation code looks like:

pde = create_proc_entry("foo", 0, NULL);
if (pde)
pde->proc_fops = &foo_proc_fops;

Notice that PDE is first created, only then ->proc_fops is set up to
final value. This is a problem because right after creation
a) PDE is fully visible in /proc , and
b) ->proc_fops are proc_file_operations which do not have ->open callback. So, it's
possible to ->read without ->open (see one class of oopses below).

The fix is new API called proc_create() which makes sure ->proc_fops are
set up before gluing PDE to main tree. Typical new code looks like:

pde = proc_create("foo", 0, NULL, &foo_proc_fops);
if (!pde)
return -ENOMEM;

Fix most networking users for a start.

In the long run, create_proc_entry() for regular files will go.

In addition to this, proc_create_data is introduced to fix reading from
proc without PDE->data. The race is basically the same as above.

create_proc_entries is replaced in the entire kernel code as new method
is also simply better.

This patch:

The problem is the same as for de->proc_fops. Right now PDE becomes visible
without data set. So, the entry could be looked up without data. This, in
most cases, will simply OOPS.

proc_create_data call is created to address this issue. proc_create now
becomes a wrapper around it.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Chris Mason <chris.mason@oracle.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Karsten Keil <kkeil@suse.de>
Cc: Kyle McMartin <kyle@parisc-linux.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Osterlund <petero2@telia.com>
Cc: Pierre Peiffer <peifferp@gmail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8731f14d37825b54ad0c4c309cba2bc8fdf13a86 29-Apr-2008 Alexey Dobriyan <adobriyan@sw.ru> proc: remove ->get_info infrastructure

Now that last dozen or so users of ->get_info were removed, ditch it too.
Everyone sane shouldd have switched to seq_file interface long ago.

P.S.: Co-existing 3 interfaces (->get_info/->read_proc/->proc_fops) for proc
is long-standing crap, BTW, thus
a) put ->read_proc/->write_proc/read_proc_entry() users on death row,
b) new such users should be rejected,
c) everyone is encouraged to convert his favourite ->read_proc user or
I'll do it, lazy bastards.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5e971dce0b2f6896e02372512df0d1fb0bfe2d55 29-Apr-2008 Alexey Dobriyan <adobriyan@gmail.com> proc: drop several "PDE valid/invalid" checks

proc-misc code is noticeably full of "if (de)" checks when PDE passed is
always valid. Remove them.

Addition of such check in proc_lookup_de() is for failed lookup case.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7cee4e00e0f8aa7290266382ea903a5a1b92c9a1 29-Apr-2008 Alexey Dobriyan <adobriyan@gmail.com> proc: less special case in xlate code

If valid "parent" is passed to proc_create/remove_proc_entry(), then name of
PDE should consist of only one path component, otherwise creation or or
removal will fail. However, if NULL is passed as parent then create/remove
accept full path as a argument. This is arbitrary restriction -- all
infrastructure is in place.

So, patch allows the following to succeed:

create_proc_entry("foo/bar", 0, pde_baz);
remove_proc_entry("baz/foo/bar", &proc_root);

Also makes the following to behave identically:

create_proc_entry("foo/bar", 0, NULL);
create_proc_entry("foo/bar", 0, &proc_root);

Discrepancy noticed by Den Lunev (IIRC).

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
f649d6d32605c7573884613289fb3b9fbd4f99a1 29-Apr-2008 Alexey Dobriyan <adobriyan@gmail.com> proc: simplify locking in remove_proc_entry()

proc_subdir_lock protects only modifying and walking through PDE lists, so
after we've found PDE to remove and actually removed it from lists, there is
no need to hold proc_subdir_lock for the rest of operation.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
e93b4ea20adb20f1f1f07f10ba5d7dd739d2843e 29-Apr-2008 Alexey Dobriyan <adobriyan@sw.ru> proc: print more information when removing non-empty directories

This usually saves one recompile to insert similar printk like below. :)

Sample nastygram:

remove_proc_entry: removing non-empty directory '/proc/foo', leaking at least 'bar'
------------[ cut here ]------------
WARNING: at fs/proc/generic.c:776 remove_proc_entry+0x18a/0x200()
Modules linked in: foo(-) container fan battery dock sbs ac sbshc backlight ipv6 loop af_packet amd_rng sr_mod i2c_amd8111 i2c_amd756 cdrom i2c_core button thermal processor
Pid: 3034, comm: rmmod Tainted: G M 2.6.25-rc1 #5

Call Trace:
[<ffffffff80231974>] warn_on_slowpath+0x64/0x90
[<ffffffff80232a6e>] printk+0x4e/0x60
[<ffffffff802d6c8a>] remove_proc_entry+0x18a/0x200
[<ffffffff8045cd88>] mutex_lock_nested+0x1c8/0x2d0
[<ffffffff8025f0f0>] __try_stop_module+0x0/0x40
[<ffffffff8025effd>] sys_delete_module+0x14d/0x200
[<ffffffff8045df3d>] lockdep_sys_exit_thunk+0x35/0x67
[<ffffffff8031c307>] __up_read+0x27/0xa0
[<ffffffff8045decc>] trace_hardirqs_on_thunk+0x35/0x3a
[<ffffffff8020b6ab>] system_call_after_swapgs+0x7b/0x80

---[ end trace 10ef850597e89c54 ]---

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
e9720acd728a46cb40daa52c99a979f7c4ff195c 07-Mar-2008 Pavel Emelyanov <xemul@openvz.org> [NET]: Make /proc/net a symlink on /proc/self/net (v3)

Current /proc/net is done with so called "shadows", but current
implementation is broken and has little chances to get fixed.

The problem is that dentries subtree of /proc/net directory has
fancy revalidation rules to make processes living in different
net namespaces see different entries in /proc/net subtree, but
currently, tasks see in the /proc/net subdir the contents of any
other namespace, depending on who opened the file first.

The proposed fix is to turn /proc/net into a symlink, which points
to /proc/self/net, which in turn shows what previously was in
/proc/net - the network-related info, from the net namespace the
appropriate task lives in.

# ls -l /proc/net
lrwxrwxrwx 1 root root 8 Mar 5 15:17 /proc/net -> self/net

In other words - this behaves like /proc/mounts, but unlike
"mounts", "net" is not a file, but a directory.

Changes from v2:
* Fixed discrepancy of /proc/net nlink count and selinux labeling
screwup pointed out by Stephen.

To get the correct nlink count the ->getattr callback for /proc/net
is overridden to read one from the net->proc_net entry.

To make selinux still work the net->proc_net entry is initialized
properly, i.e. with the "net" name and the proc_net parent.

Selinux fixes are
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>

Changes from v1:
* Fixed a task_struct leak in get_proc_task_net, pointed out by Paul.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2d3a4e3666325a9709cc8ea2e88151394e8f20fc 08-Feb-2008 Alexey Dobriyan <adobriyan@sw.ru> proc: fix ->open'less usage due to ->proc_fops flip

Typical PDE creation code looks like:

pde = create_proc_entry("foo", 0, NULL);
if (pde)
pde->proc_fops = &foo_proc_fops;

Notice that PDE is first created, only then ->proc_fops is set up to
final value. This is a problem because right after creation
a) PDE is fully visible in /proc , and
b) ->proc_fops are proc_file_operations which do not have ->open callback. So, it's
possible to ->read without ->open (see one class of oopses below).

The fix is new API called proc_create() which makes sure ->proc_fops are
set up before gluing PDE to main tree. Typical new code looks like:

pde = proc_create("foo", 0, NULL, &foo_proc_fops);
if (!pde)
return -ENOMEM;

Fix most networking users for a start.

In the long run, create_proc_entry() for regular files will go.

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000024
printing eip: c1188c1b *pdpt = 000000002929e001 *pde = 0000000000000000
Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/block/sda/sda1/dev
Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom

Pid: 24679, comm: cat Not tainted (2.6.24-rc3-mm1 #2)
EIP: 0060:[<c1188c1b>] EFLAGS: 00210002 CPU: 0
EIP is at mutex_lock_nested+0x75/0x25d
EAX: 000006fe EBX: fffffffb ECX: 00001000 EDX: e9340570
ESI: 00000020 EDI: 00200246 EBP: e9340570 ESP: e8ea1ef8
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 24679, ti=E8EA1000 task=E9340570 task.ti=E8EA1000)
Stack: 00000000 c106f7ce e8ee05b4 00000000 00000001 458003d0 f6fb6f20 fffffffb
00000000 c106f7aa 00001000 c106f7ce 08ae9000 f6db53f0 00000020 00200246
00000000 00000002 00000000 00200246 00200246 e8ee05a0 fffffffb e8ee0550
Call Trace:
[<c106f7ce>] seq_read+0x24/0x28a
[<c106f7aa>] seq_read+0x0/0x28a
[<c106f7ce>] seq_read+0x24/0x28a
[<c106f7aa>] seq_read+0x0/0x28a
[<c10818b8>] proc_reg_read+0x60/0x73
[<c1081858>] proc_reg_read+0x0/0x73
[<c105a34f>] vfs_read+0x6c/0x8b
[<c105a6f3>] sys_read+0x3c/0x63
[<c10025f2>] sysenter_past_esp+0x5f/0xa5
[<c10697a7>] destroy_inode+0x24/0x33
=======================
INFO: lockdep is turned off.
Code: 75 21 68 e1 1a 19 c1 68 87 00 00 00 68 b8 e8 1f c1 68 25 73 1f c1 e8 84 06 e9 ff e8 52 b8 e7 ff 83 c4 10 9c 5f fa e8 28 89 ea ff <f0> fe 4e 04 79 0a f3 90 80 7e 04 00 7e f8 eb f0 39 76 34 74 33
EIP: [<c1188c1b>] mutex_lock_nested+0x75/0x25d SS:ESP 0068:e8ea1ef8

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
94413d8807a3c511a3675be4ce27a4d16d6408ee 08-Feb-2008 Zhang Rui <rui.zhang@intel.com> proc: detect duplicate names on registration

Print a warning if PDE is registered with a name which already exists in
target directory.

Bug report and a simple fix can be found here:
http://bugzilla.kernel.org/show_bug.cgi?id=8798

[\n fixlet and no undescriptive variable usage --adobriyan]
[akpm@linux-foundation.org: make printk comprehensible]
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fd2cbe48883a01f710c2a639877e3b3e4eba6e59 08-Feb-2008 Alexey Dobriyan <adobriyan@sw.ru> proc: remove useless check on symlink removal

proc symlinks always have valid ->data containing destination of symlink. No
need to check it on removal -- proc_symlink() already done it.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
76df0c25d0c34eba9fbb8a44106ed096553ba0e8 08-Feb-2008 Alexey Dobriyan <adobriyan@sw.ru> proc: simplify function prototypes

Move code around so as to reduce the number of forward-declarations.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4237e0d3de38da640d7c977d68f5f7f1d207a631 08-Feb-2008 Alexey Dobriyan <adobriyan@sw.ru> proc: less LOCK operations during lookup

Pseudo-code for lookup effectively is:

LOCK kernel
LOCK proc_subdir_lock
find PDE
UNLOCK proc_subdir_lock

get inode

LOCK proc_subdir_lock
goto unlock
UNLOCK proc_subdir_lock
UNLOCK kernel

We can get rid of LOCK/UNLOCK pair after getting inode simply by jumping
to unlock_kernel() directly.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3790ee4bd86396558eedd86faac1052cb782e4e1 11-Dec-2007 Eric W. Biederman <ebiederm@xmission.com> proc: remove/Fix proc generic d_revalidate

Ultimately to implement /proc perfectly we need an implementation of
d_revalidate because files and directories can be removed behind the back
of the VFS, and d_revalidate is the only way we can let the VFS know that
this has happened.

Unfortunately the linux VFS can not cope with anything in the path to a
mount point going away. So a proper d_revalidate method that calls d_drop
also needs to call have_submounts which is moderately expensive, so you
really don't want a d_revalidate method that unconditionally calls it, but
instead only calls it when the backing object has really gone away.

proc generic entries only disappear on module_unload (when not counting the
fledgling network namespace) so it is quite rare that we actually encounter
that case and has not actually caused us real world trouble yet.

So until we get a proper test for keeping dentries in the dcache fix the
current d_revalidate method by completely removing it. This returns us to
the current status quo.

So with CONFIG_NETNS=n things should look as they have always looked.

For CONFIG_NETNS=y things work most of the time but there are a few rare
corner cases that don't behave properly. As the network namespace is
barely present in 2.6.24 this should not be a problem.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "Denis V. Lunev" <den@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5a622f2d0f86b316b07b55a4866ecb5518dd1cf7 05-Dec-2007 Alexey Dobriyan <adobriyan@sw.ru> proc: fix proc_dir_entry refcounting

Creating PDEs with refcount 0 and "deleted" flag has problems (see below).
Switch to usual scheme:
* PDE is created with refcount 1
* every de_get does +1
* every de_put() and remove_proc_entry() do -1
* once refcount reaches 0, PDE is freed.

This elegantly fixes at least two following races (both observed) without
introducing new locks, without abusing old locks, without spreading
lock_kernel():

1) PDE leak

remove_proc_entry de_put
----------------- ------
[refcnt = 1]
if (atomic_read(&de->count) == 0)
if (atomic_dec_and_test(&de->count))
if (de->deleted)
/* also not taken! */
free_proc_entry(de);
else
de->deleted = 1;
[refcount=0, deleted=1]

2) use after free

remove_proc_entry de_put
----------------- ------
[refcnt = 1]

if (atomic_dec_and_test(&de->count))
if (atomic_read(&de->count) == 0)
free_proc_entry(de);
/* boom! */
if (de->deleted)
free_proc_entry(de);

BUG: unable to handle kernel paging request at virtual address 6b6b6b6b
printing eip: c10acdda *pdpt = 00000000338f8001 *pde = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
Pid: 23161, comm: cat Not tainted (2.6.24-rc2-8c0863403f109a43d7000b4646da4818220d501f #4)
EIP: 0060:[<c10acdda>] EFLAGS: 00210097 CPU: 1
EIP is at strnlen+0x6/0x18
EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 6b6b6b6b EDX: fffffffe
ESI: c128fa3b EDI: f380bf34 EBP: ffffffff ESP: f380be44
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process cat (pid: 23161, ti=f380b000 task=f38f2570 task.ti=f380b000)
Stack: c10ac4f0 00000278 c12ce000 f43cd2a8 00000163 00000000 7da86067 00000400
c128fa20 00896b18 f38325a8 c128fe20 ffffffff 00000000 c11f291e 00000400
f75be300 c128fa20 f769c9a0 c10ac779 f380bf34 f7bfee70 c1018e6b f380bf34
Call Trace:
[<c10ac4f0>] vsnprintf+0x2ad/0x49b
[<c10ac779>] vscnprintf+0x14/0x1f
[<c1018e6b>] vprintk+0xc5/0x2f9
[<c10379f1>] handle_fasteoi_irq+0x0/0xab
[<c1004f44>] do_IRQ+0x9f/0xb7
[<c117db3b>] preempt_schedule_irq+0x3f/0x5b
[<c100264e>] need_resched+0x1f/0x21
[<c10190ba>] printk+0x1b/0x1f
[<c107c8ad>] de_put+0x3d/0x50
[<c107c8f8>] proc_delete_inode+0x38/0x41
[<c107c8c0>] proc_delete_inode+0x0/0x41
[<c1066298>] generic_delete_inode+0x5e/0xc6
[<c1065aa9>] iput+0x60/0x62
[<c1063c8e>] d_kill+0x2d/0x46
[<c1063fa9>] dput+0xdc/0xe4
[<c10571a1>] __fput+0xb0/0xcd
[<c1054e49>] filp_close+0x48/0x4f
[<c1055ee9>] sys_close+0x67/0xa5
[<c10026b6>] sysenter_past_esp+0x5f/0x85
=======================
Code: c9 74 0c f2 ae 74 05 bf 01 00 00 00 4f 89 fa 5f 89 d0 c3 85 c9 57 89 c7 89 d0 74 05 f2 ae 75 01 4f 89 f8 5f c3 89 c1 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 c3 90 90 90 57 83 c9
EIP: [<c10acdda>] strnlen+0x6/0x18 SS:ESP 0068:f380be44

Also, remove broken usage of ->deleted from reiserfs: if sget() succeeds,
module is already pinned and remove_proc_entry() can't happen => nobody
can mark PDE deleted.

Dummy proc root in netns code is not marked with refcount 1. AFAICS, we
never get it, it's just for proper /proc/net removal. I double checked
CLONE_NETNS continues to work.

Patch survives many hours of modprobe/rmmod/cat loops without new bugs
which can be attributed to refcounting.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2b1e300a9dfc3196ccddf6f1d74b91b7af55e416 01-Dec-2007 Eric W. Biederman <ebiederm@xmission.com> [NETNS]: Fix /proc/net breakage

Well I clearly goofed when I added the initial network namespace support
for /proc/net. Currently things work but there are odd details visible to
user space, even when we have a single network namespace.

Since we do not cache proc_dir_entry dentries at the moment we can just
modify ->lookup to return a different directory inode depending on the
network namespace of the process looking at /proc/net, replacing the
current technique of using a magic and fragile follow_link method.

To accomplish that this patch:
- introduces a shadow_proc method to allow different dentries to
be returned from proc_lookup.
- Removes the old /proc/net follow_link magic
- Fixes a weakness in our not caching of proc generic dentries.

As shadow_proc uses a task struct to decided which dentry to return we can
go back later and fix the proc generic caching without modifying any code
that uses the shadow_proc method.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
c2319540cd7330fa9066e5b9b84d357a2c8631a2 29-Nov-2007 Alexey Dobriyan <adobriyan@sw.ru> proc: fix NULL ->i_fop oops

proc_kill_inodes() can clear ->i_fop in the middle of vfs_readdir resulting in
NULL dereference during "file->f_op->readdir(file, buf, filler)".

The solution is to remove proc_kill_inodes() completely:

a) we don't have tricky modules implementing their tricky readdir hooks which
could keeping this revoke from hell.

b) In a situation when module is gone but PDE still alive, standard
readdir will return only "." and "..", because pde->next was cleared by
remove_proc_entry().

c) the race proc_kill_inode() destined to prevent is not completely
fixed, just race window made smaller, because vfs_readdir() is run
without sb_lock held and without file_list_lock held. Effectively,
->i_fop is cleared at random moment, which can't fix properly anything.

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000018
printing eip: c1061205 *pdpt = 0000000005b22001 *pde = 0000000000000000
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw sr_mod k8temp cdrom hwmon amd_rng
Pid: 2033, comm: find Not tainted (2.6.24-rc1-b1d08ac064268d0ae2281e98bf5e82627e0f0c56 #2)
EIP: 0060:[<c1061205>] EFLAGS: 00010246 CPU: 0
EIP is at vfs_readdir+0x47/0x74
EAX: c6b6a780 EBX: 00000000 ECX: c1061040 EDX: c5decf94
ESI: c6b6a780 EDI: fffffffe EBP: c9797c54 ESP: c5decf78
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process find (pid: 2033, ti=c5dec000 task=c64bba90 task.ti=c5dec000)
Stack: c5decf94 c1061040 fffffff7 0805ffbc 00000000 c6b6a780 c1061295 0805ffbc
00000000 00000400 00000000 00000004 0805ffbc 4588eff4 c5dec000 c10026ba
00000004 0805ffbc 00000400 0805ffbc 4588eff4 bfdc6c70 000000dc 0000007b
Call Trace:
[<c1061040>] filldir64+0x0/0xc5
[<c1061295>] sys_getdents64+0x63/0xa5
[<c10026ba>] sysenter_past_esp+0x5f/0x85
=======================
Code: 49 83 78 18 00 74 43 8d 6b 74 bf fe ff ff ff 89 e8 e8 b8 c0 12 00 f6 83 2c 01 00 00 10 75 22 8b 5e 10 8b 4c 24 04 89 f0 8b 14 24 <ff> 53 18 f6 46 1a 04 89 c7 75 0b 8b 56 0c 8b 46 08 e8 c8 66 00
EIP: [<c1061205>] vfs_readdir+0x47/0x74 SS:ESP 0068:c5decf78

hch: "Nice, getting rid of this is a very good step formwards.
Unfortunately we have another copy of this junk in
security/selinux/selinuxfs.c:sel_remove_entries() which would need the
same treatment."

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Acked-by: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
e1a1c997afe907e6ec4799e4be0f38cffd8b418c 15-Nov-2007 Eric W. Biederman <ebiederm@xmission.com> proc: fix proc_kill_inodes to kill dentries on all proc superblocks

It appears we overlooked support for removing generic proc files
when we added support for multiple proc super blocks. Handle
that now.

[akpm@linux-foundation.org: coding-style cleanups]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Acked-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
e12ba74d8ff3e2f73a583500d7095e406df4d093 16-Oct-2007 Mel Gorman <mel@csn.ul.ie> Group short-lived and reclaimable kernel allocations

This patch marks a number of allocations that are either short-lived such as
network buffers or are reclaimable such as inode allocations. When something
like updatedb is called, long-lived and unmovable kernel allocations tend to
be spread throughout the address space which increases fragmentation.

This patch groups these allocations together as much as possible by adding a
new MIGRATE_TYPE. The MIGRATE_RECLAIMABLE type is for allocations that can be
reclaimed on demand, but not moved. i.e. they can be migrated by deleting
them and re-reading the information from elsewhere.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
99fc06df72fe1c9ad3ec274720dcb5658c40bfd2 16-Jul-2007 Changli Gao <xiaosuo@gmail.com> procfs directory entry cleanup

Function proc_register() will assign proc_dir_operations and
proc_dir_inode_operations to ent's members proc_fops and proc_iops
correctly if ent is a directory. So the early assignment isn't
necessary.

Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
786d7e1612f0b0adb6046f19b906609e4fe8b1ba 16-Jul-2007 Alexey Dobriyan <adobriyan@sw.ru> Fix rmmod/read/write races in /proc entries

Fix following races:
===========================================
1. Write via ->write_proc sleeps in copy_from_user(). Module disappears
meanwhile. Or, more generically, system call done on /proc file, method
supplied by module is called, module dissapeares meanwhile.

pde = create_proc_entry()
if (!pde)
return -ENOMEM;
pde->write_proc = ...
open
write
copy_from_user
pde = create_proc_entry();
if (!pde) {
remove_proc_entry();
return -ENOMEM;
/* module unloaded */
}
*boom*
==========================================
2. bogo-revoke aka proc_kill_inodes()

remove_proc_entry vfs_read
proc_kill_inodes [check ->f_op validness]
[check ->f_op->read validness]
[verify_area, security permissions checks]
->f_op = NULL;
if (file->f_op->read)
/* ->f_op dereference, boom */

NOTE, NOTE, NOTE: file_operations are proxied for regular files only. Let's
see how this scheme behaves, then extend if needed for directories.
Directories creators in /proc only set ->owner for them, so proxying for
directories may be unneeded.

NOTE, NOTE, NOTE: methods being proxied are ->llseek, ->read, ->write,
->poll, ->unlocked_ioctl, ->ioctl, ->compat_ioctl, ->open, ->release.
If your in-tree module uses something else, yell on me. Full audit pending.

[akpm@linux-foundation.org: build fix]
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
59cd0cbc75367b82f704f63b104117462275060d 08-May-2007 Darrick J. Wong <djwong@us.ibm.com> Fix race between proc_readdir and remove_proc_entry

Fix the following race:

proc_readdir remove_proc_entry
============ =================

spin_lock(&proc_subdir_lock);
[choose PDE to start filldir from]
spin_unlock(&proc_subdir_lock);
spin_lock(&proc_subdir_lock);
[find PDE]
[free PDE, refcount is 0]
spin_unlock(&proc_subdir_lock);
/* boom */
if (filldir(dirent, de->name, ...

[de_put on error path --adobriyan]
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7695650a924a6859910c8c19dfa43b4d08224d66 08-May-2007 Alexey Dobriyan <adobriyan@openvz.org> Fix race between proc_get_inode() and remove_proc_entry()

proc_lookup remove_proc_entry
=========== =================

lock_kernel();
spin_lock(&proc_subdir_lock);
[find PDE with refcount 0]
spin_unlock(&proc_subdir_lock);
spin_lock(&proc_subdir_lock);
[find PDE with refcount 0]
[check refcount and free PDE]
spin_unlock(&proc_subdir_lock);
proc_get_inode:
de_get(de); /* boom */

Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
77b14db502cb85a031fe8fde6c85d52f3e0acb63 14-Feb-2007 Eric W. Biederman <ebiederm@xmission.com> [PATCH] sysctl: reimplement the sysctl proc support

With this change the sysctl inodes can be cached and nothing needs to be done
when removing a sysctl table.

For a cost of 2K code we will save about 4K of static tables (when we remove
de from ctl_table) and 70K in proc_dir_entries that we will not allocate, or
about half that on a 32bit arch.

The speed feels about the same, even though we can now cache the sysctl
dentries :(

We get the core advantage that we don't need to have a 1 to 1 mapping between
ctl table entries and proc files. Making it possible to have /proc/sys vary
depending on the namespace you are in. The currently merged namespaces don't
have an issue here but the network namespace under /proc/sys/net needs to have
different directories depending on which network adapters are visible. By
simply being a cache different directories being visible depending on who you
are is trivial to implement.

[akpm@osdl.org: fix uninitialised var]
[akpm@osdl.org: fix ARM build]
[bunk@stusta.de: make things static]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
c5ef1c42c51b1b5b4a401a6517bdda30933ddbaf 12-Feb-2007 Arjan van de Ven <arjan@linux.intel.com> [PATCH] mark struct inode_operations const 3

Many struct inode_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
00977a59b951207d38380c75f03a36829950265c 12-Feb-2007 Arjan van de Ven <arjan@linux.intel.com> [PATCH] mark struct file_operations const 6

Many struct file_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2fddfeefeed703b7638af97aa3048f82a2d53b03 08-Dec-2006 Josef "Jeff" Sipek <jsipek@cs.sunysb.edu> [PATCH] proc: change uses of f_{dentry, vfsmnt} to use f_path

Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in the proc
filesystem code.

Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
99ac48f54a91d02140c497edc31dc57d4bc5c85d 28-Mar-2006 Arjan van de Ven <arjan@infradead.org> [PATCH] mark f_ops const in the inode

Mark the f_ops members of inodes as const, as well as fix the
ripple-through this causes by places that copy this f_ops and then "do
stuff" with it.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
64a07bd82ed526d813b64b0957543eef55bdf9c0 26-Mar-2006 Steven Rostedt <rostedt@goodmis.org> [PATCH] protect remove_proc_entry

It has been discovered that the remove_proc_entry has a race in the removing
of entries in the proc file system that are siblings. There's no protection
around the traversing and removing of elements that belong in the same
subdirectory.

This subdirectory list is protected in other areas by the BKL. So the BKL was
at first used to protect this area too, but unfortunately, remove_proc_entry
may be called with spinlocks held. The BKL may schedule, so this was not a
solution.

The final solution was to add a new global spin lock to protect this list,
called proc_subdir_lock. This lock now protects the list in
remove_proc_entry, and I also went around looking for other areas that this
list is modified and added this protection there too. Care must be taken
since these locations call several functions that may also schedule.

Since I don't see any location that these functions that modify the
subdirectory list are called by interrupts, the irqsave/restore versions of
the spin lock was _not_ used.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fee781e6c25772db862d3322b4745a896022a4f1 08-Jan-2006 Adrian Bunk <bunk@stusta.de> [PATCH] fs/proc/: function prototypes belong in header files

Function prototypes belong into header files.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
8b90db0df7187a01fb7177f1f812123138f562cf 30-Dec-2005 Linus Torvalds <torvalds@g5.osdl.org> Insanity avoidance in /proc

The old /proc interfaces were never updated to use loff_t, and are just
generally broken. Now, we should be using the seq_file interface for
all of the proc files, but converting the legacy functions is more work
than most people care for and has little upside..

But at least we can make the non-LFS rules explicit, rather than just
insanely wrapping the offset or something.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2f51201662b28dbf8c15fb7eb972bc51c6cc3fa5 31-Oct-2005 Eric Dumazet <dada1@cosmosbay.com> [PATCH] reduce sizeof(struct file)

Now that RCU applied on 'struct file' seems stable, we can place f_rcuhead
in a memory location that is not anymore used at call_rcu(&f->f_rcuhead,
file_free_rcu) time, to reduce the size of this critical kernel object.

The trick I used is to move f_rcuhead and f_list in an union called f_u

The callers are changed so that f_rcuhead becomes f_u.fu_rcuhead and f_list
becomes f_u.f_list

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2b579beec255d6589fabe51b60933d723630bcd4 07-Sep-2005 Miklos Szeredi <miklos@szeredi.hu> [PATCH] proc: link count fix

This patch fixes bug titled "sunrpc as module and bad proc/sys link count"
reported by Jiri Slaby.

The problem was, that only proc_dir_entry->nlink was updated and the
corresponding inode->i_nlink was not. The fix is to implement the
inode->getattr() method, and update i_nlink (if necessary).

A quick audit of proc code shows that no other attribute changes after
creation.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
008b150a3c4d971cd65d02d107b8fcc860bc959c 20-Aug-2005 Al Viro <viro@parcelfarce.linux.theplanet.co.uk> [PATCH] Fix up symlink function pointers

This fixes up the symlink functions for the calling convention change:

* afs, autofs4, befs, devfs, freevxfs, jffs2, jfs, ncpfs, procfs,
smbfs, sysvfs, ufs, xfs - prototype change for ->follow_link()
* befs, smbfs, xfs - same for ->put_link()

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 17-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org> Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!