History log of /net/compat.c
Revision Date Author Comments
40eea803c6b2cfaab092f053248cbeab3f368412 26-Jul-2014 Andrey Ryabinin <ryabinin.a.a@gmail.com> net: sendmsg: fix NULL pointer dereference

Sasha's report:
> While fuzzing with trinity inside a KVM tools guest running the latest -next
> kernel with the KASAN patchset, I've stumbled on the following spew:
>
> [ 4448.949424] ==================================================================
> [ 4448.951737] AddressSanitizer: user-memory-access on address 0
> [ 4448.952988] Read of size 2 by thread T19638:
> [ 4448.954510] CPU: 28 PID: 19638 Comm: trinity-c76 Not tainted 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813
> [ 4448.956823] ffff88046d86ca40 0000000000000000 ffff880082f37e78 ffff880082f37a40
> [ 4448.958233] ffffffffb6e47068 ffff880082f37a68 ffff880082f37a58 ffffffffb242708d
> [ 4448.959552] 0000000000000000 ffff880082f37a88 ffffffffb24255b1 0000000000000000
> [ 4448.961266] Call Trace:
> [ 4448.963158] dump_stack (lib/dump_stack.c:52)
> [ 4448.964244] kasan_report_user_access (mm/kasan/report.c:184)
> [ 4448.965507] __asan_load2 (mm/kasan/kasan.c:352)
> [ 4448.966482] ? netlink_sendmsg (net/netlink/af_netlink.c:2339)
> [ 4448.967541] netlink_sendmsg (net/netlink/af_netlink.c:2339)
> [ 4448.968537] ? get_parent_ip (kernel/sched/core.c:2555)
> [ 4448.970103] sock_sendmsg (net/socket.c:654)
> [ 4448.971584] ? might_fault (mm/memory.c:3741)
> [ 4448.972526] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3740)
> [ 4448.973596] ? verify_iovec (net/core/iovec.c:64)
> [ 4448.974522] ___sys_sendmsg (net/socket.c:2096)
> [ 4448.975797] ? put_lock_stats.isra.13 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254)
> [ 4448.977030] ? lock_release_holdtime (kernel/locking/lockdep.c:273)
> [ 4448.978197] ? lock_release_non_nested (kernel/locking/lockdep.c:3434 (discriminator 1))
> [ 4448.979346] ? check_chain_key (kernel/locking/lockdep.c:2188)
> [ 4448.980535] __sys_sendmmsg (net/socket.c:2181)
> [ 4448.981592] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
> [ 4448.982773] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
> [ 4448.984458] ? syscall_trace_enter (arch/x86/kernel/ptrace.c:1500 (discriminator 2))
> [ 4448.985621] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
> [ 4448.986754] SyS_sendmmsg (net/socket.c:2201)
> [ 4448.987708] tracesys (arch/x86/kernel/entry_64.S:542)
> [ 4448.988929] ==================================================================

This reports means that we've come to netlink_sendmsg() with msg->msg_name == NULL and msg->msg_namelen > 0.

After this report there was no usual "Unable to handle kernel NULL pointer dereference"
and this gave me a clue that address 0 is mapped and contains valid socket address structure in it.

This bug was introduced in f3d3342602f8bcbf37d7c46641cb9bca7618eb1c
(net: rework recvmsg handler msg_name and msg_namelen logic).
Commit message states that:
"Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
affect sendto as it would bail out earlier while trying to copy-in the
address."
But in fact this affects sendto when address 0 is mapped and contains
socket address structure in it. In such case copy-in address will succeed,
verify_iovec() function will successfully exit with msg->msg_namelen > 0
and msg->msg_name == NULL.

This patch fixes it by setting msg_namelen to 0 if msg_name == NULL.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: <stable@vger.kernel.org>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3a49a0f7181c243aa04e6c5e44ca70a90ead8f9a 04-Mar-2014 Heiko Carstens <heiko.carstens@de.ibm.com> net/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types

In order to allow the COMPAT_SYSCALL_DEFINE macro generate code that
performs proper zero and sign extension convert all 64 bit parameters
to their corresponding 32 bit compat counterparts.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
361d93c46f688d1a2209912bda377bdb48842cce 03-Mar-2014 Heiko Carstens <heiko.carstens@de.ibm.com> net/compat: convert to COMPAT_SYSCALL_DEFINE

Convert all compat system call functions where all parameter types
have a size of four or less than four bytes, or are pointer types
to COMPAT_SYSCALL_DEFINE.
The implicit casts within COMPAT_SYSCALL_DEFINE will perform proper
zero and sign extension to 64 bit of all parameters if needed.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2def2ef2ae5f3990aabdbe8a755911902707d268 31-Jan-2014 PaX Team <pageexec@freemail.hu> x86, x32: Correct invalid use of user timespec in the kernel

The x32 case for the recvmsg() timout handling is broken:

asmlinkage long compat_sys_recvmmsg(int fd, struct compat_mmsghdr __user *mmsg,
unsigned int vlen, unsigned int flags,
struct compat_timespec __user *timeout)
{
int datagrams;
struct timespec ktspec;

if (flags & MSG_CMSG_COMPAT)
return -EINVAL;

if (COMPAT_USE_64BIT_TIME)
return __sys_recvmmsg(fd, (struct mmsghdr __user *)mmsg, vlen,
flags | MSG_CMSG_COMPAT,
(struct timespec *) timeout);
...

The timeout pointer parameter is provided by userland (hence the __user
annotation) but for x32 syscalls it's simply cast to a kernel pointer
and is passed to __sys_recvmmsg which will eventually directly
dereference it for both reading and writing. Other callers to
__sys_recvmmsg properly copy from userland to the kernel first.

The bug was introduced by commit ee4fa23c4bfc ("compat: Use
COMPAT_USE_64BIT_TIME in net/compat.c") and should affect all kernels
since 3.4 (and perhaps vendor kernels if they backported x32 support
along with this code).

Note that CONFIG_X86_X32_ABI gets enabled at build time and only if
CONFIG_X86_X32 is enabled and ld can build x32 executables.

Other uses of COMPAT_USE_64BIT_TIME seem fine.

This addresses CVE-2014-0038.

Signed-off-by: PaX Team <pageexec@freemail.hu>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.4+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
db31c55a6fb245fdbb752a2ca4aefec89afabb06 27-Nov-2013 Dan Carpenter <dan.carpenter@oracle.com> net: clamp ->msg_namelen instead of returning an error

If kmsg->msg_namelen > sizeof(struct sockaddr_storage) then in the
original code that would lead to memory corruption in the kernel if you
had audit configured. If you didn't have audit configured it was
harmless.

There are some programs such as beta versions of Ruby which use too
large of a buffer and returning an error code breaks them. We should
clamp the ->msg_namelen value instead.

Fixes: 1661bf364ae9 ("net: heap overflow in __audit_sockaddr()")
Reported-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Tested-by: Eric Wong <normalperson@yhbt.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f3d3342602f8bcbf37d7c46641cb9bca7618eb1c 21-Nov-2013 Hannes Frederic Sowa <hannes@stressinduktion.org> net: rework recvmsg handler msg_name and msg_namelen logic

This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
set msg_namelen to the proper size <= sizeof(struct sockaddr_storage)
to return msg_name to the user.

This prevents numerous uninitialized memory leaks we had in the
recvmsg handlers and makes it harder for new code to accidentally leak
uninitialized memory.

Optimize for the case recvfrom is called with NULL as address. We don't
need to copy the address at all, so set it to NULL before invoking the
recvmsg handler. We can do so, because all the recvmsg handlers must
cope with the case a plain read() is called on them. read() also sets
msg_name to NULL.

Also document these changes in include/linux/net.h as suggested by David
Miller.

Changes since RFC:

Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
affect sendto as it would bail out earlier while trying to copy-in the
address. It also more naturally reflects the logic by the callers of
verify_iovec.

With this change in place I could remove "
if (!uaddr || msg_sys->msg_namelen == 0)
msg->msg_name = NULL
".

This change does not alter the user visible error logic as we ignore
msg_namelen as long as msg_name is NULL.

Also remove two unnecessary curly brackets in ___sys_recvmsg and change
comments to netdev style.

Cc: David Miller <davem@davemloft.net>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
1661bf364ae9c506bc8795fef70d1532931be1e8 02-Oct-2013 Dan Carpenter <dan.carpenter@oracle.com> net: heap overflow in __audit_sockaddr()

We need to cap ->msg_namelen or it leads to a buffer overflow when we
to the memcpy() in __audit_sockaddr(). It requires CAP_AUDIT_CONTROL to
exploit this bug.

The call tree is:
___sys_recvmsg()
move_addr_to_user()
audit_sockaddr()
__audit_sockaddr()

Reported-by: Jüri Aedla <juri.aedla@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a7526eb5d06b0084ef12d7b168d008fcf516caab 05-Jun-2013 Andy Lutomirski <luto@amacapital.net> net: Unbreak compat_sys_{send,recv}msg

I broke them in this commit:

commit 1be374a0518a288147c6a7398792583200a67261
Author: Andy Lutomirski <luto@amacapital.net>
Date: Wed May 22 14:07:44 2013 -0700

net: Block MSG_CMSG_COMPAT in send(m)msg and recv(m)msg

This patch adds __sys_sendmsg and __sys_sendmsg as common helpers that accept
MSG_CMSG_COMPAT and blocks MSG_CMSG_COMPAT at the syscall entrypoints. It
also reverts some unnecessary checks in sys_socketcall.

Apparently I was suffering from underscore blindness the first time around.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Tested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cb0942b81249798e15c3f04eee2946ef543e8115 27-Aug-2012 Al Viro <viro@zeniv.linux.org.uk> make get_file() return its argument

simplifies a bunch of callers...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
818810472b129004c16fc51bf0a570b60776bfb7 22-Jul-2012 Jesper Juhl <jj@chaosbits.net> net: Fix references to out-of-scope variables in put_cmsg_compat()

In net/compat.c::put_cmsg_compat() we may assign 'data' the address of
either the 'ctv' or 'cts' local variables inside the 'if
(!COMPAT_USE_64BIT_TIME)' branch.

Those variables go out of scope at the end of the 'if' statement, so
when we use 'data' further down in 'copy_to_user(CMSG_COMPAT_DATA(cm),
data, cmlen - sizeof(struct compat_cmsghdr))' there's no telling what
it may be refering to - not good.

Fix the problem by simply giving 'ctv' and 'cts' function scope.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
95c961747284a6b83a5e2d81240e214b0fa3464d 15-Apr-2012 Eric Dumazet <eric.dumazet@gmail.com> net: cleanup unsigned to unsigned int

Use of "unsigned int" is preferred to bare "unsigned" in net tree.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0c5fe1b4221c6701224c2601cf3c692e5721103e 12-Apr-2012 Will Drewry <wad@chromium.org> net/compat.c,linux/filter.h: share compat_sock_fprog

Any other users of bpf_*_filter that take a struct sock_fprog from
userspace will need to be able to also accept a compat_sock_fprog
if the arch supports compat calls. This change allows the existing
compat_sock_fprog be shared.

Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Paris <eparis@redhat.com>

v18: tasered by the apostrophe police
v14: rebase/nochanges
v13: rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc
v12: rebase on to linux-next
v11: introduction
Signed-off-by: James Morris <james.l.morris@oracle.com>
43db362d3adda9e0a915ddb9a8d1a41186e19179 11-Mar-2012 Maciej Żenczykowski <maze@google.com> net: get rid of some pointless casts to sockaddr

The following 4 functions:
move_addr_to_kernel
move_addr_to_user
verify_iovec
verify_compat_iovec
are always effectively called with a sockaddr_storage.

Make this explicit by changing their signature.

This removes a large number of casts from sockaddr_storage to sockaddr.

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ee4fa23c4bfcc635d077a9633d405610de45bc70 20-Feb-2012 H. J. Lu <hjl.tools@gmail.com> compat: Use COMPAT_USE_64BIT_TIME in net/compat.c

Handle 64-bit time structures in the networking core compat code.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: David S. Miller <davem@davemloft.net>
bc3b2d7fb9b014d75ebb79ba371a763dbab5e8cf 15-Jul-2011 Paul Gortmaker <paul.gortmaker@windriver.com> net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules

These files are non modular, but need to export symbols using
the macros now living in export.h -- call out the include so
that things won't break when we remove the implicit presence
of module.h from everywhere.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
228e548e602061b08ee8e8966f567c12aa079682 02-May-2011 Anton Blanchard <anton@samba.org> net: Add sendmmsg socket system call

This patch adds a multiple message send syscall and is the send
version of the existing recvmmsg syscall. This is heavily
based on the patch by Arnaldo that added recvmmsg.

I wrote a microbenchmark to test the performance gains of using
this new syscall:

http://ozlabs.org/~anton/junkcode/sendmmsg_test.c

The test was run on a ppc64 box with a 10 Gbit network card. The
benchmark can send both UDP and RAW ethernet packets.

64B UDP

batch pkts/sec
1 804570
2 872800 (+ 8 %)
4 916556 (+14 %)
8 939712 (+17 %)
16 952688 (+18 %)
32 956448 (+19 %)
64 964800 (+20 %)

64B raw socket

batch pkts/sec
1 1201449
2 1350028 (+12 %)
4 1461416 (+22 %)
8 1513080 (+26 %)
16 1541216 (+28 %)
32 1553440 (+29 %)
64 1557888 (+30 %)

We see a 20% improvement in throughput on UDP send and 30%
on raw socket send.

[ Add sparc syscall entries. -DaveM ]

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8acfe468b0384e834a303f08ebc4953d72fb690a 28-Oct-2010 David S. Miller <davem@davemloft.net> net: Limit socket I/O iovec total length to INT_MAX.

This helps protect us from overflow issues down in the
individual protocol sendmsg/recvmsg handlers. Once
we hit INT_MAX we truncate out the rest of the iovec
by setting the iov_len members to zero.

This works because:

1) For SOCK_STREAM and SOCK_SEQPACKET sockets, partial
writes are allowed and the application will just continue
with another write to send the rest of the data.

2) For datagram oriented sockets, where there must be a
one-to-one correspondance between write() calls and
packets on the wire, INT_MAX is going to be far larger
than the packet size limit the protocol is going to
check for and signal with -EMSGSIZE.

Based upon a patch by Linus Torvalds.

Signed-off-by: David S. Miller <davem@davemloft.net>
c6d409cfd0fd41e7a0847875e4338ad648c9b96b 04-Jun-2010 Eric Dumazet <eric.dumazet@gmail.com> From abbffa2aa9bd6f8df16d0d0a102af677510d8b9a Mon Sep 17 00:00:00 2001
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 3 Jun 2010 04:29:41 +0000
Subject: [PATCH 2/3] net: net/socket.c and net/compat.c cleanups

cleanup patch, to match modern coding style.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
net/compat.c | 47 ++++++++---------
net/socket.c | 165 ++++++++++++++++++++++++++++------------------------------
2 files changed, 102 insertions(+), 110 deletions(-)

diff --git a/net/compat.c b/net/compat.c
index 1cf7590..63d260e 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -81,7 +81,7 @@ int verify_compat_iovec(struct msghdr *kern_msg, struct iovec *kern_iov,
int tot_len;

if (kern_msg->msg_namelen) {
- if (mode==VERIFY_READ) {
+ if (mode == VERIFY_READ) {
int err = move_addr_to_kernel(kern_msg->msg_name,
kern_msg->msg_namelen,
kern_address);
@@ -354,7 +354,7 @@ static int do_set_attach_filter(struct socket *sock, int level, int optname,
static int do_set_sock_timeout(struct socket *sock, int level,
int optname, char __user *optval, unsigned int optlen)
{
- struct compat_timeval __user *up = (struct compat_timeval __user *) optval;
+ struct compat_timeval __user *up = (struct compat_timeval __user *)optval;
struct timeval ktime;
mm_segment_t old_fs;
int err;
@@ -367,7 +367,7 @@ static int do_set_sock_timeout(struct socket *sock, int level,
return -EFAULT;
old_fs = get_fs();
set_fs(KERNEL_DS);
- err = sock_setsockopt(sock, level, optname, (char *) &ktime, sizeof(ktime));
+ err = sock_setsockopt(sock, level, optname, (char *)&ktime, sizeof(ktime));
set_fs(old_fs);

return err;
@@ -389,11 +389,10 @@ asmlinkage long compat_sys_setsockopt(int fd, int level, int optname,
char __user *optval, unsigned int optlen)
{
int err;
- struct socket *sock;
+ struct socket *sock = sockfd_lookup(fd, &err);

- if ((sock = sockfd_lookup(fd, &err))!=NULL)
- {
- err = security_socket_setsockopt(sock,level,optname);
+ if (sock) {
+ err = security_socket_setsockopt(sock, level, optname);
if (err) {
sockfd_put(sock);
return err;
@@ -453,7 +452,7 @@ static int compat_sock_getsockopt(struct socket *sock, int level, int optname,
int compat_sock_get_timestamp(struct sock *sk, struct timeval __user *userstamp)
{
struct compat_timeval __user *ctv =
- (struct compat_timeval __user*) userstamp;
+ (struct compat_timeval __user *) userstamp;
int err = -ENOENT;
struct timeval tv;

@@ -477,7 +476,7 @@ EXPORT_SYMBOL(compat_sock_get_timestamp);
int compat_sock_get_timestampns(struct sock *sk, struct timespec __user *userstamp)
{
struct compat_timespec __user *ctv =
- (struct compat_timespec __user*) userstamp;
+ (struct compat_timespec __user *) userstamp;
int err = -ENOENT;
struct timespec ts;

@@ -502,12 +501,10 @@ asmlinkage long compat_sys_getsockopt(int fd, int level, int optname,
char __user *optval, int __user *optlen)
{
int err;
- struct socket *sock;
+ struct socket *sock = sockfd_lookup(fd, &err);

- if ((sock = sockfd_lookup(fd, &err))!=NULL)
- {
- err = security_socket_getsockopt(sock, level,
- optname);
+ if (sock) {
+ err = security_socket_getsockopt(sock, level, optname);
if (err) {
sockfd_put(sock);
return err;
@@ -557,7 +554,7 @@ struct compat_group_filter {

int compat_mc_setsockopt(struct sock *sock, int level, int optname,
char __user *optval, unsigned int optlen,
- int (*setsockopt)(struct sock *,int,int,char __user *,unsigned int))
+ int (*setsockopt)(struct sock *, int, int, char __user *, unsigned int))
{
char __user *koptval = optval;
int koptlen = optlen;
@@ -640,12 +637,11 @@ int compat_mc_setsockopt(struct sock *sock, int level, int optname,
}
return setsockopt(sock, level, optname, koptval, koptlen);
}
-
EXPORT_SYMBOL(compat_mc_setsockopt);

int compat_mc_getsockopt(struct sock *sock, int level, int optname,
char __user *optval, int __user *optlen,
- int (*getsockopt)(struct sock *,int,int,char __user *,int __user *))
+ int (*getsockopt)(struct sock *, int, int, char __user *, int __user *))
{
struct compat_group_filter __user *gf32 = (void *)optval;
struct group_filter __user *kgf;
@@ -681,7 +677,7 @@ int compat_mc_getsockopt(struct sock *sock, int level, int optname,
__put_user(interface, &kgf->gf_interface) ||
__put_user(fmode, &kgf->gf_fmode) ||
__put_user(numsrc, &kgf->gf_numsrc) ||
- copy_in_user(&kgf->gf_group,&gf32->gf_group,sizeof(kgf->gf_group)))
+ copy_in_user(&kgf->gf_group, &gf32->gf_group, sizeof(kgf->gf_group)))
return -EFAULT;

err = getsockopt(sock, level, optname, (char __user *)kgf, koptlen);
@@ -714,21 +710,22 @@ int compat_mc_getsockopt(struct sock *sock, int level, int optname,
copylen = numsrc * sizeof(gf32->gf_slist[0]);
if (copylen > klen)
copylen = klen;
- if (copy_in_user(gf32->gf_slist, kgf->gf_slist, copylen))
+ if (copy_in_user(gf32->gf_slist, kgf->gf_slist, copylen))
return -EFAULT;
}
return err;
}
-
EXPORT_SYMBOL(compat_mc_getsockopt);

/* Argument list sizes for compat_sys_socketcall */
#define AL(x) ((x) * sizeof(u32))
-static unsigned char nas[20]={AL(0),AL(3),AL(3),AL(3),AL(2),AL(3),
- AL(3),AL(3),AL(4),AL(4),AL(4),AL(6),
- AL(6),AL(2),AL(5),AL(5),AL(3),AL(3),
- AL(4),AL(5)};
+static unsigned char nas[20] = {
+ AL(0), AL(3), AL(3), AL(3), AL(2), AL(3),
+ AL(3), AL(3), AL(4), AL(4), AL(4), AL(6),
+ AL(6), AL(2), AL(5), AL(5), AL(3), AL(3),
+ AL(4), AL(5)
+};
#undef AL

asmlinkage long compat_sys_sendmsg(int fd, struct compat_msghdr __user *msg, unsigned flags)
@@ -827,7 +824,7 @@ asmlinkage long compat_sys_socketcall(int call, u32 __user *args)
compat_ptr(a[4]), compat_ptr(a[5]));
break;
case SYS_SHUTDOWN:
- ret = sys_shutdown(a0,a1);
+ ret = sys_shutdown(a0, a1);
break;
case SYS_SETSOCKOPT:
ret = compat_sys_setsockopt(a0, a1, a[2],
diff --git a/net/socket.c b/net/socket.c
index 367d547..b63c051 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -124,7 +124,7 @@ static int sock_fasync(int fd, struct file *filp, int on);
static ssize_t sock_sendpage(struct file *file, struct page *page,
int offset, size_t size, loff_t *ppos, int more);
static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
- struct pipe_inode_info *pipe, size_t len,
+ struct pipe_inode_info *pipe, size_t len,
unsigned int flags);

/*
@@ -162,7 +162,7 @@ static const struct net_proto_family *net_families[NPROTO] __read_mostly;
* Statistics counters of the socket lists
*/

-static DEFINE_PER_CPU(int, sockets_in_use) = 0;
+static DEFINE_PER_CPU(int, sockets_in_use);

/*
* Support routines.
@@ -309,9 +309,9 @@ static int init_inodecache(void)
}

static const struct super_operations sockfs_ops = {
- .alloc_inode = sock_alloc_inode,
- .destroy_inode =sock_destroy_inode,
- .statfs = simple_statfs,
+ .alloc_inode = sock_alloc_inode,
+ .destroy_inode = sock_destroy_inode,
+ .statfs = simple_statfs,
};

static int sockfs_get_sb(struct file_system_type *fs_type,
@@ -411,6 +411,7 @@ int sock_map_fd(struct socket *sock, int flags)

return fd;
}
+EXPORT_SYMBOL(sock_map_fd);

static struct socket *sock_from_file(struct file *file, int *err)
{
@@ -422,7 +423,7 @@ static struct socket *sock_from_file(struct file *file, int *err)
}

/**
- * sockfd_lookup - Go from a file number to its socket slot
+ * sockfd_lookup - Go from a file number to its socket slot
* @fd: file handle
* @err: pointer to an error code return
*
@@ -450,6 +451,7 @@ struct socket *sockfd_lookup(int fd, int *err)
fput(file);
return sock;
}
+EXPORT_SYMBOL(sockfd_lookup);

static struct socket *sockfd_lookup_light(int fd, int *err, int *fput_needed)
{
@@ -540,6 +542,7 @@ void sock_release(struct socket *sock)
}
sock->file = NULL;
}
+EXPORT_SYMBOL(sock_release);

int sock_tx_timestamp(struct msghdr *msg, struct sock *sk,
union skb_shared_tx *shtx)
@@ -586,6 +589,7 @@ int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
ret = wait_on_sync_kiocb(&iocb);
return ret;
}
+EXPORT_SYMBOL(sock_sendmsg);

int kernel_sendmsg(struct socket *sock, struct msghdr *msg,
struct kvec *vec, size_t num, size_t size)
@@ -604,6 +608,7 @@ int kernel_sendmsg(struct socket *sock, struct msghdr *msg,
set_fs(oldfs);
return result;
}
+EXPORT_SYMBOL(kernel_sendmsg);

static int ktime2ts(ktime_t kt, struct timespec *ts)
{
@@ -664,7 +669,6 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
put_cmsg(msg, SOL_SOCKET,
SCM_TIMESTAMPING, sizeof(ts), &ts);
}
-
EXPORT_SYMBOL_GPL(__sock_recv_timestamp);

inline void sock_recv_drops(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
@@ -720,6 +724,7 @@ int sock_recvmsg(struct socket *sock, struct msghdr *msg,
ret = wait_on_sync_kiocb(&iocb);
return ret;
}
+EXPORT_SYMBOL(sock_recvmsg);

static int sock_recvmsg_nosec(struct socket *sock, struct msghdr *msg,
size_t size, int flags)
@@ -752,6 +757,7 @@ int kernel_recvmsg(struct socket *sock, struct msghdr *msg,
set_fs(oldfs);
return result;
}
+EXPORT_SYMBOL(kernel_recvmsg);

static void sock_aio_dtor(struct kiocb *iocb)
{
@@ -774,7 +780,7 @@ static ssize_t sock_sendpage(struct file *file, struct page *page,
}

static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
- struct pipe_inode_info *pipe, size_t len,
+ struct pipe_inode_info *pipe, size_t len,
unsigned int flags)
{
struct socket *sock = file->private_data;
@@ -887,7 +893,7 @@ static ssize_t sock_aio_write(struct kiocb *iocb, const struct iovec *iov,
*/

static DEFINE_MUTEX(br_ioctl_mutex);
-static int (*br_ioctl_hook) (struct net *, unsigned int cmd, void __user *arg) = NULL;
+static int (*br_ioctl_hook) (struct net *, unsigned int cmd, void __user *arg);

void brioctl_set(int (*hook) (struct net *, unsigned int, void __user *))
{
@@ -895,7 +901,6 @@ void brioctl_set(int (*hook) (struct net *, unsigned int, void __user *))
br_ioctl_hook = hook;
mutex_unlock(&br_ioctl_mutex);
}
-
EXPORT_SYMBOL(brioctl_set);

static DEFINE_MUTEX(vlan_ioctl_mutex);
@@ -907,7 +912,6 @@ void vlan_ioctl_set(int (*hook) (struct net *, void __user *))
vlan_ioctl_hook = hook;
mutex_unlock(&vlan_ioctl_mutex);
}
-
EXPORT_SYMBOL(vlan_ioctl_set);

static DEFINE_MUTEX(dlci_ioctl_mutex);
@@ -919,7 +923,6 @@ void dlci_ioctl_set(int (*hook) (unsigned int, void __user *))
dlci_ioctl_hook = hook;
mutex_unlock(&dlci_ioctl_mutex);
}
-
EXPORT_SYMBOL(dlci_ioctl_set);

static long sock_do_ioctl(struct net *net, struct socket *sock,
@@ -1047,6 +1050,7 @@ out_release:
sock = NULL;
goto out;
}
+EXPORT_SYMBOL(sock_create_lite);

/* No kernel lock held - perfect */
static unsigned int sock_poll(struct file *file, poll_table *wait)
@@ -1147,6 +1151,7 @@ call_kill:
rcu_read_unlock();
return 0;
}
+EXPORT_SYMBOL(sock_wake_async);

static int __sock_create(struct net *net, int family, int type, int protocol,
struct socket **res, int kern)
@@ -1265,11 +1270,13 @@ int sock_create(int family, int type, int protocol, struct socket **res)
{
return __sock_create(current->nsproxy->net_ns, family, type, protocol, res, 0);
}
+EXPORT_SYMBOL(sock_create);

int sock_create_kern(int family, int type, int protocol, struct socket **res)
{
return __sock_create(&init_net, family, type, protocol, res, 1);
}
+EXPORT_SYMBOL(sock_create_kern);

SYSCALL_DEFINE3(socket, int, family, int, type, int, protocol)
{
@@ -1474,7 +1481,8 @@ SYSCALL_DEFINE4(accept4, int, fd, struct sockaddr __user *, upeer_sockaddr,
goto out;

err = -ENFILE;
- if (!(newsock = sock_alloc()))
+ newsock = sock_alloc();
+ if (!newsock)
goto out_put;

newsock->type = sock->type;
@@ -1861,8 +1869,7 @@ SYSCALL_DEFINE3(sendmsg, int, fd, struct msghdr __user *, msg, unsigned, flags)
if (MSG_CMSG_COMPAT & flags) {
if (get_compat_msghdr(&msg_sys, msg_compat))
return -EFAULT;
- }
- else if (copy_from_user(&msg_sys, msg, sizeof(struct msghdr)))
+ } else if (copy_from_user(&msg_sys, msg, sizeof(struct msghdr)))
return -EFAULT;

sock = sockfd_lookup_light(fd, &err, &fput_needed);
@@ -1964,8 +1971,7 @@ static int __sys_recvmsg(struct socket *sock, struct msghdr __user *msg,
if (MSG_CMSG_COMPAT & flags) {
if (get_compat_msghdr(msg_sys, msg_compat))
return -EFAULT;
- }
- else if (copy_from_user(msg_sys, msg, sizeof(struct msghdr)))
+ } else if (copy_from_user(msg_sys, msg, sizeof(struct msghdr)))
return -EFAULT;

err = -EMSGSIZE;
@@ -2191,10 +2197,10 @@ SYSCALL_DEFINE5(recvmmsg, int, fd, struct mmsghdr __user *, mmsg,
/* Argument list sizes for sys_socketcall */
#define AL(x) ((x) * sizeof(unsigned long))
static const unsigned char nargs[20] = {
- AL(0),AL(3),AL(3),AL(3),AL(2),AL(3),
- AL(3),AL(3),AL(4),AL(4),AL(4),AL(6),
- AL(6),AL(2),AL(5),AL(5),AL(3),AL(3),
- AL(4),AL(5)
+ AL(0), AL(3), AL(3), AL(3), AL(2), AL(3),
+ AL(3), AL(3), AL(4), AL(4), AL(4), AL(6),
+ AL(6), AL(2), AL(5), AL(5), AL(3), AL(3),
+ AL(4), AL(5)
};

#undef AL
@@ -2340,6 +2346,7 @@ int sock_register(const struct net_proto_family *ops)
printk(KERN_INFO "NET: Registered protocol family %d\n", ops->family);
return err;
}
+EXPORT_SYMBOL(sock_register);

/**
* sock_unregister - remove a protocol handler
@@ -2366,6 +2373,7 @@ void sock_unregister(int family)

printk(KERN_INFO "NET: Unregistered protocol family %d\n", family);
}
+EXPORT_SYMBOL(sock_unregister);

static int __init sock_init(void)
{
@@ -2490,13 +2498,13 @@ static int dev_ifconf(struct net *net, struct compat_ifconf __user *uifc32)
ifc.ifc_req = NULL;
uifc = compat_alloc_user_space(sizeof(struct ifconf));
} else {
- size_t len =((ifc32.ifc_len / sizeof (struct compat_ifreq)) + 1) *
- sizeof (struct ifreq);
+ size_t len = ((ifc32.ifc_len / sizeof(struct compat_ifreq)) + 1) *
+ sizeof(struct ifreq);
uifc = compat_alloc_user_space(sizeof(struct ifconf) + len);
ifc.ifc_len = len;
ifr = ifc.ifc_req = (void __user *)(uifc + 1);
ifr32 = compat_ptr(ifc32.ifcbuf);
- for (i = 0; i < ifc32.ifc_len; i += sizeof (struct compat_ifreq)) {
+ for (i = 0; i < ifc32.ifc_len; i += sizeof(struct compat_ifreq)) {
if (copy_in_user(ifr, ifr32, sizeof(struct compat_ifreq)))
return -EFAULT;
ifr++;
@@ -2516,9 +2524,9 @@ static int dev_ifconf(struct net *net, struct compat_ifconf __user *uifc32)
ifr = ifc.ifc_req;
ifr32 = compat_ptr(ifc32.ifcbuf);
for (i = 0, j = 0;
- i + sizeof (struct compat_ifreq) <= ifc32.ifc_len && j < ifc.ifc_len;
- i += sizeof (struct compat_ifreq), j += sizeof (struct ifreq)) {
- if (copy_in_user(ifr32, ifr, sizeof (struct compat_ifreq)))
+ i + sizeof(struct compat_ifreq) <= ifc32.ifc_len && j < ifc.ifc_len;
+ i += sizeof(struct compat_ifreq), j += sizeof(struct ifreq)) {
+ if (copy_in_user(ifr32, ifr, sizeof(struct compat_ifreq)))
return -EFAULT;
ifr32++;
ifr++;
@@ -2567,7 +2575,7 @@ static int compat_siocwandev(struct net *net, struct compat_ifreq __user *uifr32
compat_uptr_t uptr32;
struct ifreq __user *uifr;

- uifr = compat_alloc_user_space(sizeof (*uifr));
+ uifr = compat_alloc_user_space(sizeof(*uifr));
if (copy_in_user(uifr, uifr32, sizeof(struct compat_ifreq)))
return -EFAULT;

@@ -2601,9 +2609,9 @@ static int bond_ioctl(struct net *net, unsigned int cmd,
return -EFAULT;

old_fs = get_fs();
- set_fs (KERNEL_DS);
+ set_fs(KERNEL_DS);
err = dev_ioctl(net, cmd, &kifr);
- set_fs (old_fs);
+ set_fs(old_fs);

return err;
case SIOCBONDSLAVEINFOQUERY:
@@ -2710,9 +2718,9 @@ static int compat_sioc_ifmap(struct net *net, unsigned int cmd,
return -EFAULT;

old_fs = get_fs();
- set_fs (KERNEL_DS);
+ set_fs(KERNEL_DS);
err = dev_ioctl(net, cmd, (void __user *)&ifr);
- set_fs (old_fs);
+ set_fs(old_fs);

if (cmd == SIOCGIFMAP && !err) {
err = copy_to_user(uifr32, &ifr, sizeof(ifr.ifr_name));
@@ -2734,7 +2742,7 @@ static int compat_siocshwtstamp(struct net *net, struct compat_ifreq __user *uif
compat_uptr_t uptr32;
struct ifreq __user *uifr;

- uifr = compat_alloc_user_space(sizeof (*uifr));
+ uifr = compat_alloc_user_space(sizeof(*uifr));
if (copy_in_user(uifr, uifr32, sizeof(struct compat_ifreq)))
return -EFAULT;

@@ -2750,20 +2758,20 @@ static int compat_siocshwtstamp(struct net *net, struct compat_ifreq __user *uif
}

struct rtentry32 {
- u32 rt_pad1;
+ u32 rt_pad1;
struct sockaddr rt_dst; /* target address */
struct sockaddr rt_gateway; /* gateway addr (RTF_GATEWAY) */
struct sockaddr rt_genmask; /* target network mask (IP) */
- unsigned short rt_flags;
- short rt_pad2;
- u32 rt_pad3;
- unsigned char rt_tos;
- unsigned char rt_class;
- short rt_pad4;
- short rt_metric; /* +1 for binary compatibility! */
+ unsigned short rt_flags;
+ short rt_pad2;
+ u32 rt_pad3;
+ unsigned char rt_tos;
+ unsigned char rt_class;
+ short rt_pad4;
+ short rt_metric; /* +1 for binary compatibility! */
/* char * */ u32 rt_dev; /* forcing the device at add */
- u32 rt_mtu; /* per route MTU/Window */
- u32 rt_window; /* Window clamping */
+ u32 rt_mtu; /* per route MTU/Window */
+ u32 rt_window; /* Window clamping */
unsigned short rt_irtt; /* Initial RTT */
};

@@ -2793,29 +2801,29 @@ static int routing_ioctl(struct net *net, struct socket *sock,

if (sock && sock->sk && sock->sk->sk_family == AF_INET6) { /* ipv6 */
struct in6_rtmsg32 __user *ur6 = argp;
- ret = copy_from_user (&r6.rtmsg_dst, &(ur6->rtmsg_dst),
+ ret = copy_from_user(&r6.rtmsg_dst, &(ur6->rtmsg_dst),
3 * sizeof(struct in6_addr));
- ret |= __get_user (r6.rtmsg_type, &(ur6->rtmsg_type));
- ret |= __get_user (r6.rtmsg_dst_len, &(ur6->rtmsg_dst_len));
- ret |= __get_user (r6.rtmsg_src_len, &(ur6->rtmsg_src_len));
- ret |= __get_user (r6.rtmsg_metric, &(ur6->rtmsg_metric));
- ret |= __get_user (r6.rtmsg_info, &(ur6->rtmsg_info));
- ret |= __get_user (r6.rtmsg_flags, &(ur6->rtmsg_flags));
- ret |= __get_user (r6.rtmsg_ifindex, &(ur6->rtmsg_ifindex));
+ ret |= __get_user(r6.rtmsg_type, &(ur6->rtmsg_type));
+ ret |= __get_user(r6.rtmsg_dst_len, &(ur6->rtmsg_dst_len));
+ ret |= __get_user(r6.rtmsg_src_len, &(ur6->rtmsg_src_len));
+ ret |= __get_user(r6.rtmsg_metric, &(ur6->rtmsg_metric));
+ ret |= __get_user(r6.rtmsg_info, &(ur6->rtmsg_info));
+ ret |= __get_user(r6.rtmsg_flags, &(ur6->rtmsg_flags));
+ ret |= __get_user(r6.rtmsg_ifindex, &(ur6->rtmsg_ifindex));

r = (void *) &r6;
} else { /* ipv4 */
struct rtentry32 __user *ur4 = argp;
- ret = copy_from_user (&r4.rt_dst, &(ur4->rt_dst),
+ ret = copy_from_user(&r4.rt_dst, &(ur4->rt_dst),
3 * sizeof(struct sockaddr));
- ret |= __get_user (r4.rt_flags, &(ur4->rt_flags));
- ret |= __get_user (r4.rt_metric, &(ur4->rt_metric));
- ret |= __get_user (r4.rt_mtu, &(ur4->rt_mtu));
- ret |= __get_user (r4.rt_window, &(ur4->rt_window));
- ret |= __get_user (r4.rt_irtt, &(ur4->rt_irtt));
- ret |= __get_user (rtdev, &(ur4->rt_dev));
+ ret |= __get_user(r4.rt_flags, &(ur4->rt_flags));
+ ret |= __get_user(r4.rt_metric, &(ur4->rt_metric));
+ ret |= __get_user(r4.rt_mtu, &(ur4->rt_mtu));
+ ret |= __get_user(r4.rt_window, &(ur4->rt_window));
+ ret |= __get_user(r4.rt_irtt, &(ur4->rt_irtt));
+ ret |= __get_user(rtdev, &(ur4->rt_dev));
if (rtdev) {
- ret |= copy_from_user (devname, compat_ptr(rtdev), 15);
+ ret |= copy_from_user(devname, compat_ptr(rtdev), 15);
r4.rt_dev = devname; devname[15] = 0;
} else
r4.rt_dev = NULL;
@@ -2828,9 +2836,9 @@ static int routing_ioctl(struct net *net, struct socket *sock,
goto out;
}

- set_fs (KERNEL_DS);
+ set_fs(KERNEL_DS);
ret = sock_do_ioctl(net, sock, cmd, (unsigned long) r);
- set_fs (old_fs);
+ set_fs(old_fs);

out:
return ret;
@@ -2993,11 +3001,13 @@ int kernel_bind(struct socket *sock, struct sockaddr *addr, int addrlen)
{
return sock->ops->bind(sock, addr, addrlen);
}
+EXPORT_SYMBOL(kernel_bind);

int kernel_listen(struct socket *sock, int backlog)
{
return sock->ops->listen(sock, backlog);
}
+EXPORT_SYMBOL(kernel_listen);

int kernel_accept(struct socket *sock, struct socket **newsock, int flags)
{
@@ -3022,24 +3032,28 @@ int kernel_accept(struct socket *sock, struct socket **newsock, int flags)
done:
return err;
}
+EXPORT_SYMBOL(kernel_accept);

int kernel_connect(struct socket *sock, struct sockaddr *addr, int addrlen,
int flags)
{
return sock->ops->connect(sock, addr, addrlen, flags);
}
+EXPORT_SYMBOL(kernel_connect);

int kernel_getsockname(struct socket *sock, struct sockaddr *addr,
int *addrlen)
{
return sock->ops->getname(sock, addr, addrlen, 0);
}
+EXPORT_SYMBOL(kernel_getsockname);

int kernel_getpeername(struct socket *sock, struct sockaddr *addr,
int *addrlen)
{
return sock->ops->getname(sock, addr, addrlen, 1);
}
+EXPORT_SYMBOL(kernel_getpeername);

int kernel_getsockopt(struct socket *sock, int level, int optname,
char *optval, int *optlen)
@@ -3056,6 +3070,7 @@ int kernel_getsockopt(struct socket *sock, int level, int optname,
set_fs(oldfs);
return err;
}
+EXPORT_SYMBOL(kernel_getsockopt);

int kernel_setsockopt(struct socket *sock, int level, int optname,
char *optval, unsigned int optlen)
@@ -3072,6 +3087,7 @@ int kernel_setsockopt(struct socket *sock, int level, int optname,
set_fs(oldfs);
return err;
}
+EXPORT_SYMBOL(kernel_setsockopt);

int kernel_sendpage(struct socket *sock, struct page *page, int offset,
size_t size, int flags)
@@ -3083,6 +3099,7 @@ int kernel_sendpage(struct socket *sock, struct page *page, int offset,

return sock_no_sendpage(sock, page, offset, size, flags);
}
+EXPORT_SYMBOL(kernel_sendpage);

int kernel_sock_ioctl(struct socket *sock, int cmd, unsigned long arg)
{
@@ -3095,33 +3112,11 @@ int kernel_sock_ioctl(struct socket *sock, int cmd, unsigned long arg)

return err;
}
+EXPORT_SYMBOL(kernel_sock_ioctl);

int kernel_sock_shutdown(struct socket *sock, enum sock_shutdown_cmd how)
{
return sock->ops->shutdown(sock, how);
}
-
-EXPORT_SYMBOL(sock_create);
-EXPORT_SYMBOL(sock_create_kern);
-EXPORT_SYMBOL(sock_create_lite);
-EXPORT_SYMBOL(sock_map_fd);
-EXPORT_SYMBOL(sock_recvmsg);
-EXPORT_SYMBOL(sock_register);
-EXPORT_SYMBOL(sock_release);
-EXPORT_SYMBOL(sock_sendmsg);
-EXPORT_SYMBOL(sock_unregister);
-EXPORT_SYMBOL(sock_wake_async);
-EXPORT_SYMBOL(sockfd_lookup);
-EXPORT_SYMBOL(kernel_sendmsg);
-EXPORT_SYMBOL(kernel_recvmsg);
-EXPORT_SYMBOL(kernel_bind);
-EXPORT_SYMBOL(kernel_listen);
-EXPORT_SYMBOL(kernel_accept);
-EXPORT_SYMBOL(kernel_connect);
-EXPORT_SYMBOL(kernel_getsockname);
-EXPORT_SYMBOL(kernel_getpeername);
-EXPORT_SYMBOL(kernel_getsockopt);
-EXPORT_SYMBOL(kernel_setsockopt);
-EXPORT_SYMBOL(kernel_sendpage);
-EXPORT_SYMBOL(kernel_sock_ioctl);
EXPORT_SYMBOL(kernel_sock_shutdown);
+
--
1.7.0.4
bc10502dba37d3b210efd9f3867212298f13b78e 03-Jun-2010 Eric Dumazet <eric.dumazet@gmail.com> net: use __packed annotation

cleanup patch.

Use new __packed annotation in net/ and include/
(except netfilter)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5a0e3ad6af8660be21ca98a971cd00f331318c05 24-Mar-2010 Tejun Heo <tj@kernel.org> include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
de039f02d877af52b8d0fe77878b8343a0f99d8b 09-Dec-2009 Heiko Carstens <heiko.carstens@de.ibm.com> net: use compat helper functions in compat_sys_recvmmsg

Use (get|put)_compat_timespec helper functions to simplify the code.

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
60c2ffd3d2cf12008747d920ae118df119006003 09-Dec-2009 Heiko Carstens <heiko.carstens@de.ibm.com> net: fix compat_sys_recvmmsg parameter type

compat_sys_recvmmsg has a compat_timespec parameter and not a
timespec parameter. This way we also get rid of an odd cast.

Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5b23136bcf766a58160a319677b366c90f0cd223 01-Dec-2009 Jean-Mickael Guerin <jean-mickael.guerin@6wind.com> net: compat_sys_recvmmsg user timespec arg can be NULL

We must test if user timespec is non-NULL before copying from userpace,
same as sys_recvmmsg().

Commiter note: changed it so that we have just one branch.

Signed-off-by: Jean-Mickael Guerin <jean-mickael.guerin@6wind.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
65a1c4fffaaf5ca166a1263d84ca664d5192cda6 23-Oct-2009 roel kluin <roel.kluin@gmail.com> net: Cleanup redundant tests on unsigned

optlen is unsigned so the `< 0' test is never true.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a2e2725541fad72416326798c2d7fa4dafb7d337 13-Oct-2009 Arnaldo Carvalho de Melo <acme@redhat.com> net: Introduce recvmmsg socket syscall

Meaning receive multiple messages, reducing the number of syscalls and
net stack entry/exit operations.

Next patches will introduce mechanisms where protocols that want to
optimize this operation will provide an unlocked_recvmsg operation.

This takes into account comments made by:

. Paul Moore: sock_recvmsg is called only for the first datagram,
sock_recvmsg_nosec is used for the rest.

. Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
works in the same fashion as the ppoll one.

If the underlying protocol returns a datagram with MSG_OOB set, this
will make recvmmsg return right away with as many datagrams (+ the OOB
one) it has received so far.

. Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
datagrams and then recvmsg returns an error, recvmmsg will return
the successfully received datagrams, store the error and return it
in the next call.

This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
where we will be able to acquire the lock only at batch start and end, not at
every underlying recvmsg call.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b7058842c940ad2c08dd829b21e5c92ebe3b8758 01-Oct-2009 David S. Miller <davem@davemloft.net> net: Make setsockopt() optlen be unsigned.

This provides safety against negative optlen at the type
level instead of depending upon (sometimes non-trivial)
checks against this sprinkled all over the the place, in
each and every implementation.

Based upon work done by Arjan van de Ven and feedback
from Linus Torvalds.

Signed-off-by: David S. Miller <davem@davemloft.net>
1dacc76d0014a034b8aca14237c127d7c19d7726 01-Jul-2009 Johannes Berg <johannes@sipsolutions.net> net/compat/wext: send different messages to compat tasks

Wireless extensions have the unfortunate problem that events
are multicast netlink messages, and are not independent of
pointer size. Thus, currently 32-bit tasks on 64-bit platforms
cannot properly receive events and fail with all kinds of
strange problems, for instance wpa_supplicant never notices
disassociations, due to the way the 64-bit event looks (to a
32-bit process), the fact that the address is all zeroes is
lost, it thinks instead it is 00:00:00:00:01:00.

The same problem existed with the ioctls, until David Miller
fixed those some time ago in an heroic effort.

A different problem caused by this is that we cannot send the
ASSOCREQIE/ASSOCRESPIE events because sending them causes a
32-bit wpa_supplicant on a 64-bit system to overwrite its
internal information, which is worse than it not getting the
information at all -- so we currently resort to sending a
custom string event that it then parses. This, however, has a
severe size limitation we are frequently hitting with modern
access points; this limitation would can be lifted after this
patch by sending the correct binary, not custom, event.

A similar problem apparently happens for some other netlink
users on x86_64 with 32-bit tasks due to the alignment for
64-bit quantities.

In order to fix these problems, I have implemented a way to
send compat messages to tasks. When sending an event, we send
the non-compat event data together with a compat event data in
skb_shinfo(main_skb)->frag_list. Then, when the event is read
from the socket, the netlink code makes sure to pass out only
the skb that is compatible with the task. This approach was
suggested by David Miller, my original approach required
always sending two skbs but that had various small problems.

To determine whether compat is needed or not, I have used the
MSG_CMSG_COMPAT flag, and adjusted the call path for recv and
recvfrom to include it, even if those calls do not have a cmsg
parameter.

I have not solved one small part of the problem, and I don't
think it is necessary to: if a 32-bit application uses read()
rather than any form of recvmsg() it will still get the wrong
(64-bit) event. However, neither do applications actually do
this, nor would it be a regression.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
20d4947353be60e909e6b1a79d241457edd6833f 12-Feb-2009 Patrick Ohly <patrick.ohly@intel.com> net: socket infrastructure for SO_TIMESTAMPING

The overlap with the old SO_TIMESTAMP[NS] options is handled so
that time stamping in software (net_enable_timestamp()) is
enabled when SO_TIMESTAMP[NS] and/or SO_TIMESTAMPING_RX_SOFTWARE
is set. It's disabled if all of these are off.

Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
de11defebf00007677fb7ee91d9b089b78786fbb 20-Nov-2008 Ulrich Drepper <drepper@redhat.com> reintroduce accept4

Introduce a new accept4() system call. The addition of this system call
matches analogous changes in 2.6.27 (dup3(), evenfd2(), signalfd4(),
inotify_init1(), epoll_create1(), pipe2()) which added new system calls
that differed from analogous traditional system calls in adding a flags
argument that can be used to access additional functionality.

The accept4() system call is exactly the same as accept(), except that
it adds a flags bit-mask argument. Two flags are initially implemented.
(Most of the new system calls in 2.6.27 also had both of these flags.)

SOCK_CLOEXEC causes the close-on-exec (FD_CLOEXEC) flag to be enabled
for the new file descriptor returned by accept4(). This is a useful
security feature to avoid leaking information in a multithreaded
program where one thread is doing an accept() at the same time as
another thread is doing a fork() plus exec(). More details here:
http://udrepper.livejournal.com/20407.html "Secure File Descriptor Handling",
Ulrich Drepper).

The other flag is SOCK_NONBLOCK, which causes the O_NONBLOCK flag
to be enabled on the new open file description created by accept4().
(This flag is merely a convenience, saving the use of additional calls
fcntl(F_GETFL) and fcntl (F_SETFL) to achieve the same result.

Here's a test program. Works on x86-32. Should work on x86-64, but
I (mtk) don't have a system to hand to test with.

It tests accept4() with each of the four possible combinations of
SOCK_CLOEXEC and SOCK_NONBLOCK set/clear in 'flags', and verifies
that the appropriate flags are set on the file descriptor/open file
description returned by accept4().

I tested Ulrich's patch in this thread by applying against 2.6.28-rc2,
and it passes according to my test program.

/* test_accept4.c

Copyright (C) 2008, Linux Foundation, written by Michael Kerrisk
<mtk.manpages@gmail.com>

Licensed under the GNU GPLv2 or later.
*/
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

#define PORT_NUM 33333

#define die(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)

/**********************************************************************/

/* The following is what we need until glibc gets a wrapper for
accept4() */

/* Flags for socket(), socketpair(), accept4() */
#ifndef SOCK_CLOEXEC
#define SOCK_CLOEXEC O_CLOEXEC
#endif
#ifndef SOCK_NONBLOCK
#define SOCK_NONBLOCK O_NONBLOCK
#endif

#ifdef __x86_64__
#define SYS_accept4 288
#elif __i386__
#define USE_SOCKETCALL 1
#define SYS_ACCEPT4 18
#else
#error "Sorry -- don't know the syscall # on this architecture"
#endif

static int
accept4(int fd, struct sockaddr *sockaddr, socklen_t *addrlen, int flags)
{
printf("Calling accept4(): flags = %x", flags);
if (flags != 0) {
printf(" (");
if (flags & SOCK_CLOEXEC)
printf("SOCK_CLOEXEC");
if ((flags & SOCK_CLOEXEC) && (flags & SOCK_NONBLOCK))
printf(" ");
if (flags & SOCK_NONBLOCK)
printf("SOCK_NONBLOCK");
printf(")");
}
printf("\n");

#if USE_SOCKETCALL
long args[6];

args[0] = fd;
args[1] = (long) sockaddr;
args[2] = (long) addrlen;
args[3] = flags;

return syscall(SYS_socketcall, SYS_ACCEPT4, args);
#else
return syscall(SYS_accept4, fd, sockaddr, addrlen, flags);
#endif
}

/**********************************************************************/

static int
do_test(int lfd, struct sockaddr_in *conn_addr,
int closeonexec_flag, int nonblock_flag)
{
int connfd, acceptfd;
int fdf, flf, fdf_pass, flf_pass;
struct sockaddr_in claddr;
socklen_t addrlen;

printf("=======================================\n");

connfd = socket(AF_INET, SOCK_STREAM, 0);
if (connfd == -1)
die("socket");
if (connect(connfd, (struct sockaddr *) conn_addr,
sizeof(struct sockaddr_in)) == -1)
die("connect");

addrlen = sizeof(struct sockaddr_in);
acceptfd = accept4(lfd, (struct sockaddr *) &claddr, &addrlen,
closeonexec_flag | nonblock_flag);
if (acceptfd == -1) {
perror("accept4()");
close(connfd);
return 0;
}

fdf = fcntl(acceptfd, F_GETFD);
if (fdf == -1)
die("fcntl:F_GETFD");
fdf_pass = ((fdf & FD_CLOEXEC) != 0) ==
((closeonexec_flag & SOCK_CLOEXEC) != 0);
printf("Close-on-exec flag is %sset (%s); ",
(fdf & FD_CLOEXEC) ? "" : "not ",
fdf_pass ? "OK" : "failed");

flf = fcntl(acceptfd, F_GETFL);
if (flf == -1)
die("fcntl:F_GETFD");
flf_pass = ((flf & O_NONBLOCK) != 0) ==
((nonblock_flag & SOCK_NONBLOCK) !=0);
printf("nonblock flag is %sset (%s)\n",
(flf & O_NONBLOCK) ? "" : "not ",
flf_pass ? "OK" : "failed");

close(acceptfd);
close(connfd);

printf("Test result: %s\n", (fdf_pass && flf_pass) ? "PASS" : "FAIL");
return fdf_pass && flf_pass;
}

static int
create_listening_socket(int port_num)
{
struct sockaddr_in svaddr;
int lfd;
int optval;

memset(&svaddr, 0, sizeof(struct sockaddr_in));
svaddr.sin_family = AF_INET;
svaddr.sin_addr.s_addr = htonl(INADDR_ANY);
svaddr.sin_port = htons(port_num);

lfd = socket(AF_INET, SOCK_STREAM, 0);
if (lfd == -1)
die("socket");

optval = 1;
if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &optval,
sizeof(optval)) == -1)
die("setsockopt");

if (bind(lfd, (struct sockaddr *) &svaddr,
sizeof(struct sockaddr_in)) == -1)
die("bind");

if (listen(lfd, 5) == -1)
die("listen");

return lfd;
}

int
main(int argc, char *argv[])
{
struct sockaddr_in conn_addr;
int lfd;
int port_num;
int passed;

passed = 1;

port_num = (argc > 1) ? atoi(argv[1]) : PORT_NUM;

memset(&conn_addr, 0, sizeof(struct sockaddr_in));
conn_addr.sin_family = AF_INET;
conn_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
conn_addr.sin_port = htons(port_num);

lfd = create_listening_socket(port_num);

if (!do_test(lfd, &conn_addr, 0, 0))
passed = 0;
if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, 0))
passed = 0;
if (!do_test(lfd, &conn_addr, 0, SOCK_NONBLOCK))
passed = 0;
if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, SOCK_NONBLOCK))
passed = 0;

close(lfd);

exit(passed ? EXIT_SUCCESS : EXIT_FAILURE);
}

[mtk.manpages@gmail.com: rewrote changelog, updated test program]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Tested-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: <linux-api@vger.kernel.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
d35aac10eb7bcb3b80bef16b60844af0313f47f7 12-Nov-2008 Patrick Ohly <patrick.ohly@intel.com> net: put_cmsg_compat + SO_TIMESTAMP[NS]: use same name for value as caller

In __sock_recv_timestamp() the additional SCM_TIMESTAMP[NS] is used. This
has the same value as SO_TIMESTAMP[NS], so this is a purely cosmetic change.

Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
aaca0bdca573f3f51ea03139f9c7289541e7bca3 24-Jul-2008 Ulrich Drepper <drepper@redhat.com> flag parameters: paccept

This patch is by far the most complex in the series. It adds a new syscall
paccept. This syscall differs from accept in that it adds (at the userlevel)
two additional parameters:

- a signal mask
- a flags value

The flags parameter can be used to set flag like SOCK_CLOEXEC. This is
imlpemented here as well. Some people argued that this is a property which
should be inherited from the file desriptor for the server but this is against
POSIX. Additionally, we really want the signal mask parameter as well
(similar to pselect, ppoll, etc). So an interface change in inevitable.

The flag value is the same as for socket and socketpair. I think diverging
here will only create confusion. Similar to the filesystem interfaces where
the use of the O_* constants differs, it is acceptable here.

The signal mask is handled as for pselect etc. The mask is temporarily
installed for the thread and removed before the call returns. I modeled the
code after pselect. If there is a problem it's likely also in pselect.

For architectures which use socketcall I maintained this interface instead of
adding a system call. The symmetry shouldn't be broken.

The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/syscall.h>

#ifndef __NR_paccept
# ifdef __x86_64__
# define __NR_paccept 288
# elif defined __i386__
# define SYS_PACCEPT 18
# define USE_SOCKETCALL 1
# else
# error "need __NR_paccept"
# endif
#endif

#ifdef USE_SOCKETCALL
# define paccept(fd, addr, addrlen, mask, flags) \
({ long args[6] = { \
(long) fd, (long) addr, (long) addrlen, (long) mask, 8, (long) flags }; \
syscall (__NR_socketcall, SYS_PACCEPT, args); })
#else
# define paccept(fd, addr, addrlen, mask, flags) \
syscall (__NR_paccept, fd, addr, addrlen, mask, 8, flags)
#endif

#define PORT 57392

#define SOCK_CLOEXEC O_CLOEXEC

static pthread_barrier_t b;

static void *
tf (void *arg)
{
pthread_barrier_wait (&b);
int s = socket (AF_INET, SOCK_STREAM, 0);
struct sockaddr_in sin;
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
sin.sin_port = htons (PORT);
connect (s, (const struct sockaddr *) &sin, sizeof (sin));
close (s);

pthread_barrier_wait (&b);
s = socket (AF_INET, SOCK_STREAM, 0);
sin.sin_port = htons (PORT);
connect (s, (const struct sockaddr *) &sin, sizeof (sin));
close (s);
pthread_barrier_wait (&b);

pthread_barrier_wait (&b);
sleep (2);
pthread_kill ((pthread_t) arg, SIGUSR1);

return NULL;
}

static void
handler (int s)
{
}

int
main (void)
{
pthread_barrier_init (&b, NULL, 2);

struct sockaddr_in sin;
pthread_t th;
if (pthread_create (&th, NULL, tf, (void *) pthread_self ()) != 0)
{
puts ("pthread_create failed");
return 1;
}

int s = socket (AF_INET, SOCK_STREAM, 0);
int reuse = 1;
setsockopt (s, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof (reuse));
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = htonl (INADDR_LOOPBACK);
sin.sin_port = htons (PORT);
bind (s, (struct sockaddr *) &sin, sizeof (sin));
listen (s, SOMAXCONN);

pthread_barrier_wait (&b);

int s2 = paccept (s, NULL, 0, NULL, 0);
if (s2 < 0)
{
puts ("paccept(0) failed");
return 1;
}

int coe = fcntl (s2, F_GETFD);
if (coe & FD_CLOEXEC)
{
puts ("paccept(0) set close-on-exec-flag");
return 1;
}
close (s2);

pthread_barrier_wait (&b);

s2 = paccept (s, NULL, 0, NULL, SOCK_CLOEXEC);
if (s2 < 0)
{
puts ("paccept(SOCK_CLOEXEC) failed");
return 1;
}

coe = fcntl (s2, F_GETFD);
if ((coe & FD_CLOEXEC) == 0)
{
puts ("paccept(SOCK_CLOEXEC) does not set close-on-exec flag");
return 1;
}
close (s2);

pthread_barrier_wait (&b);

struct sigaction sa;
sa.sa_handler = handler;
sa.sa_flags = 0;
sigemptyset (&sa.sa_mask);
sigaction (SIGUSR1, &sa, NULL);

sigset_t ss;
pthread_sigmask (SIG_SETMASK, NULL, &ss);
sigaddset (&ss, SIGUSR1);
pthread_sigmask (SIG_SETMASK, &ss, NULL);

sigdelset (&ss, SIGUSR1);
alarm (4);
pthread_barrier_wait (&b);

errno = 0 ;
s2 = paccept (s, NULL, 0, &ss, 0);
if (s2 != -1 || errno != EINTR)
{
puts ("paccept did not fail with EINTR");
return 1;
}

close (s);

puts ("OK");

return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[akpm@linux-foundation.org: make it compile]
[akpm@linux-foundation.org: add sys_ni stub]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Roland McGrath <roland@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
230b183921ecbaa5fedc0d35ad6ba7bb64b6e06a 20-Jul-2008 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> net: Use standard structures for generic socket address structures.

Use sockaddr_storage{} for generic socket address storage
and ensures proper alignment.
Use sockaddr{} for pointers to omit several casts.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
42908c69f61f75dd70e424263ab89ee52040382b 29-Apr-2008 David L Stevens <dlstevens@us.ibm.com> net: Add compat support for getsockopt (MCAST_MSFILTER)

This patch adds support for getsockopt for MCAST_MSFILTER for
both IPv4 and IPv6. It depends on the previous setsockopt patch,
and uses the same method.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
be666e0a1345ed80f29cb30c73da0ec2ea5c5863 29-Apr-2008 David L Stevens <dlstevens@us.ibm.com> net: Several cleanups for the setsockopt compat support.

1) added missing "__user" for kgsr and kgf pointers
2) verify read for only GROUP_FILTER_SIZE(0). The group_filter
structure definition (via RFC) includes space for one source
in the source list array, but that source need not be present.
So, sizeof(group_filter) > GROUP_FILTER_SIZE(0). Fixed
the user read-check for minimum length to use the smaller size.
3) remove unneeded "&" for gf_slist addresses

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
dae50295488f35d2d617b08a5fae43154c947eec 27-Apr-2008 David L Stevens <dlstevens@us.ibm.com> ipv4/ipv6 compat: Fix SSM applications on 64bit kernels.

Add support on 64-bit kernels for seting 32-bit compatible MCAST*
socket options.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3bc3fe5eed5e866c0871db6d745f3bf58af004ef 18-Dec-2007 Patrick McHardy <kaber@trash.net> [NETFILTER]: ip6_tables: add compat support

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6e23ae2a48750bda407a4a58f52a4865d7308bf5 20-Nov-2007 Patrick McHardy <kaber@trash.net> [NETFILTER]: Introduce NF_INET_ hook values

The IPv4 and IPv6 hook values are identical, yet some code tries to figure
out the "correct" value by looking at the address family. Introduce NF_INET_*
values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__
section for userspace compatibility.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
1ac70e7ad24a88710cf9b6d7ababaefa2b575df0 20-Dec-2007 Wei Yongjun <yjwei@cn.fujitsu.com> [NET]: Fix function put_cmsg() which may cause usr application memory overflow

When used function put_cmsg() to copy kernel information to user
application memory, if the memory length given by user application is
not enough, by the bad length calculate of msg.msg_controllen,
put_cmsg() function may cause the msg.msg_controllen to be a large
value, such as 0xFFFFFFF0, so the following put_cmsg() can also write
data to usr application memory even usr has no valid memory to store
this. This may cause usr application memory overflow.

int put_cmsg(struct msghdr * msg, int level, int type, int len, void *data)
{
struct cmsghdr __user *cm
= (__force struct cmsghdr __user *)msg->msg_control;
struct cmsghdr cmhdr;
int cmlen = CMSG_LEN(len);
~~~~~~~~~~~~~~~~~~~~~
int err;

if (MSG_CMSG_COMPAT & msg->msg_flags)
return put_cmsg_compat(msg, level, type, len, data);

if (cm==NULL || msg->msg_controllen < sizeof(*cm)) {
msg->msg_flags |= MSG_CTRUNC;
return 0; /* XXX: return error? check spec. */
}
if (msg->msg_controllen < cmlen) {
~~~~~~~~~~~~~~~~~~~~~~~~
msg->msg_flags |= MSG_CTRUNC;
cmlen = msg->msg_controllen;
}
cmhdr.cmsg_level = level;
cmhdr.cmsg_type = type;
cmhdr.cmsg_len = cmlen;

err = -EFAULT;
if (copy_to_user(cm, &cmhdr, sizeof cmhdr))
goto out;
if (copy_to_user(CMSG_DATA(cm), data, cmlen - sizeof(struct cmsghdr)))
goto out;
cmlen = CMSG_SPACE(len);
~~~~~~~~~~~~~~~~~~~~~~~~~~~
If MSG_CTRUNC flags is set, msg->msg_controllen is less than
CMSG_SPACE(len), "msg->msg_controllen -= cmlen" will cause unsinged int
type msg->msg_controllen to be a large value.
~~~~~~~~~~~~~~~~~~~~~~~~~~~
msg->msg_control += cmlen;
msg->msg_controllen -= cmlen;
~~~~~~~~~~~~~~~~~~~~~
err = 0;
out:
return err;
}

The same promble exists in put_cmsg_compat(). This patch can fix this
problem.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4a19542e5f694cd408a32c3d9dc593ba9366e2d7 16-Jul-2007 Ulrich Drepper <drepper@redhat.com> O_CLOEXEC for SCM_RIGHTS

Part two in the O_CLOEXEC saga: adding support for file descriptors received
through Unix domain sockets.

The patch is once again pretty minimal, it introduces a new flag for recvmsg
and passes it just like the existing MSG_CMSG_COMPAT flag. I think this bit
is not used otherwise but the networking people will know better.

This new flag is not recognized by recvfrom and recv. These functions cannot
be used for that purpose and the asymmetry this introduces is not worse than
the already existing MSG_CMSG_COMPAT situations.

The patch must be applied on the patch which introduced O_CLOEXEC. It has to
remove static from the new get_unused_fd_flags function but since scm.c cannot
live in a module the function still hasn't to be exported.

Here's a test program to make sure the code works. It's so much longer than
the actual patch...

#include <errno.h>
#include <error.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

#ifndef O_CLOEXEC
# define O_CLOEXEC 02000000
#endif
#ifndef MSG_CMSG_CLOEXEC
# define MSG_CMSG_CLOEXEC 0x40000000
#endif

int
main (int argc, char *argv[])
{
if (argc > 1)
{
int fd = atol (argv[1]);
printf ("child: fd = %d\n", fd);
if (fcntl (fd, F_GETFD) == 0 || errno != EBADF)
{
puts ("file descriptor valid in child");
return 1;
}
return 0;

}

struct sockaddr_un sun;
strcpy (sun.sun_path, "./testsocket");
sun.sun_family = AF_UNIX;

char databuf[] = "hello";
struct iovec iov[1];
iov[0].iov_base = databuf;
iov[0].iov_len = sizeof (databuf);

union
{
struct cmsghdr hdr;
char bytes[CMSG_SPACE (sizeof (int))];
} buf;
struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 1,
.msg_control = buf.bytes,
.msg_controllen = sizeof (buf) };
struct cmsghdr *cmsg = CMSG_FIRSTHDR (&msg);

cmsg->cmsg_level = SOL_SOCKET;
cmsg->cmsg_type = SCM_RIGHTS;
cmsg->cmsg_len = CMSG_LEN (sizeof (int));

msg.msg_controllen = cmsg->cmsg_len;

pid_t child = fork ();
if (child == -1)
error (1, errno, "fork");
if (child == 0)
{
int sock = socket (PF_UNIX, SOCK_STREAM, 0);
if (sock < 0)
error (1, errno, "socket");

if (bind (sock, (struct sockaddr *) &sun, sizeof (sun)) < 0)
error (1, errno, "bind");
if (listen (sock, SOMAXCONN) < 0)
error (1, errno, "listen");

int conn = accept (sock, NULL, NULL);
if (conn == -1)
error (1, errno, "accept");

*(int *) CMSG_DATA (cmsg) = sock;
if (sendmsg (conn, &msg, MSG_NOSIGNAL) < 0)
error (1, errno, "sendmsg");

return 0;
}

/* For a test suite this should be more robust like a
barrier in shared memory. */
sleep (1);

int sock = socket (PF_UNIX, SOCK_STREAM, 0);
if (sock < 0)
error (1, errno, "socket");

if (connect (sock, (struct sockaddr *) &sun, sizeof (sun)) < 0)
error (1, errno, "connect");
unlink (sun.sun_path);

*(int *) CMSG_DATA (cmsg) = -1;

if (recvmsg (sock, &msg, MSG_CMSG_CLOEXEC) < 0)
error (1, errno, "recvmsg");

int fd = *(int *) CMSG_DATA (cmsg);
if (fd == -1)
error (1, 0, "no descriptor received");

char fdname[20];
snprintf (fdname, sizeof (fdname), "%d", fd);
execl ("/proc/self/exe", argv[0], fdname, NULL);
puts ("execl failed");
return 1;
}

[akpm@linux-foundation.org: Fix fastcall inconsistency noted by Michael Buesch]
[akpm@linux-foundation.org: build fix]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Michael Buesch <mb@bu3sch.de>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
92f37fd2ee805aa77925c1e64fd56088b46094fc 26-Mar-2007 Eric Dumazet <dada1@cosmosbay.com> [NET]: Adding SO_TIMESTAMPNS / SCM_TIMESTAMPNS support

Now that network timestamps use ktime_t infrastructure, we can add a new
SOL_SOCKET sockopt SO_TIMESTAMPNS.

This command is similar to SO_TIMESTAMP, but permits transmission of
a 'timespec struct' instead of a 'timeval struct' control message.
(nanosecond resolution instead of microsecond)

Control message is labelled SCM_TIMESTAMPNS instead of SCM_TIMESTAMP

A socket cannot mix SO_TIMESTAMP and SO_TIMESTAMPNS : the two modes are
mutually exclusive.

sock_recv_timestamp() became too big to be fully inlined so I added a
__sock_recv_timestamp() helper function.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: linux-arch@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
e71a4783aae059931f63b2d4e7013e36529badef 11-Apr-2007 Stephen Hemminger <shemminger@linux-foundation.org> [NET] core: whitespace cleanup

Fix whitespace around keywords. Fix indentation especially of switch
statements.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ae40eb1ef30ab4120bd3c8b7e3da99ee53d27a23 19-Mar-2007 Eric Dumazet <dada1@cosmosbay.com> [NET]: Introduce SIOCGSTAMPNS ioctl to get timestamps with nanosec resolution

Now network timestamps use ktime_t infrastructure, we can add a new
ioctl() SIOCGSTAMPNS command to get timestamps in 'struct timespec'.
User programs can thus access to nanosecond resolution.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
b7aa0bf70c4afb9e38be25f5c0922498d0f8684c 20-Apr-2007 Eric Dumazet <dada1@cosmosbay.com> [NET]: convert network timestamps to ktime_t

We currently use a special structure (struct skb_timeval) and plain
'struct timeval' to store packet timestamps in sk_buffs and struct
sock.

This has some drawbacks :
- Fixed resolution of micro second.
- Waste of space on 64bit platforms where sizeof(struct timeval)=16

I suggest using ktime_t that is a nice abstraction of high resolution
time services, currently capable of nanosecond resolution.

As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits
a 8 byte shrink of this structure on 64bit architectures. Some other
structures also benefit from this size reduction (struct ipq in
ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...)

Once this ktime infrastructure adopted, we can more easily provide
nanosecond resolution on top of it. (ioctl SIOCGSTAMPNS and/or
SO_TIMESTAMPNS/SCM_TIMESTAMPNS)

Note : this patch includes a bug correction in
compat_sock_get_timestamp() where a "err = 0;" was missing (so this
syscall returned -ENOENT instead of 0)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: Stephen Hemminger <shemminger@linux-foundation.org>
CC: John find <linux.kernel@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
cd354f1ae75e6466a7e31b727faede57a1f89ca5 14-Feb-2007 Tim Schmielau <tim@physik3.uni-rostock.de> [PATCH] remove many unneeded #includes of sched.h

After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there. Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.

To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.

Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm. I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4768fbcbcfbbcacb785ae08eef33767a0b4fdcdd 09-Feb-2007 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [NET]: Fix whitespace errors.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
effee6a00034a8d83a6dea6d221820d87364ac21 10-Oct-2006 Miklos Szeredi <miklos@szeredi.hu> [NET]: File descriptor loss while receiving SCM_RIGHTS

If more than one file descriptor was sent with an SCM_RIGHTS message,
and on the receiving end, after installing a nonzero (but not all)
file descritpors the process runs out of fds, then the already
installed fds will be lost (userspace will have no way of knowing
about them).

The following patch makes sure, that at least the already installed
fds are sent to userspace. It doesn't solve the issue of losing file
descriptors in case of an EFAULT on the userspace buffer.

Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2722971cbe831117686039d5c334f2c0f560be13 01-Apr-2006 Dmitry Mishin <dim@openvz.org> [NETFILTER]: iptables 32bit compat layer

This patch extends current iptables compatibility layer in order to get
32bit iptables to work on 64bit kernel. Current layer is insufficient due
to alignment checks both in kernel and user space tools.

Patch is for current net-2.6.17 with addition of move of ipt_entry_{match|
target} definitions to xt_entry_{match|target}.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Acked-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
f0ac2614412e2b597e2d5bfbd3960b4f73718b41 22-Mar-2006 Shaun Pereira <spereira@tusc.com.au> [NET]: socket timestamp 32 bit handler for 64 bit kernel

Get socket timestamp handler function that does not use the
ioctl32_hash_table.

Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3fdadf7d27e3fbcf72930941884387d1f4936f04 21-Mar-2006 Dmitry Mishin <dim@openvz.org> [NET]: {get|set}sockopt compatibility layer

This patch extends {get|set}sockopt compatibility layer in order to
move protocol specific parts to their place and avoid huge universal
net/compat.c file in the future.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8920e8f94c44e31a73bdf923b04721e26e88cadd 08-Sep-2005 Al Viro <viro@zeniv.linux.org.uk> [PATCH] Fix 32bit sendmsg() flaw

When we copy 32bit ->msg_control contents to kernel, we walk the same
userland data twice without sanity checks on the second pass.

Second version of this patch: the original broke with 64-bit arches
running 32-bit-compat-mode executables doing sendmsg() syscalls with
unaligned CMSG data areas

Another thing is that we use kmalloc() to allocate and sock_kfree_s()
to free afterwards; less serious, but also needs fixing.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Chris Wright <chrisw@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
d64d3873721cfe870d49d73c3744f06260779ce7 10-Aug-2005 Andrew Morton <akpm@osdl.org> [NET]: Fix memory leak in sys_{send,recv}msg() w/compat

From: Dave Johnson <djohnson+linux-kernel@sw.starentnetworks.com>

sendmsg()/recvmsg() syscalls from o32/n32 apps to a 64bit kernel will
cause a kernel memory leak if iov_len > UIO_FASTIOV for each syscall!

This is because both sys_sendmsg() and verify_compat_iovec() kmalloc a
new iovec structure. Only the one from sys_sendmsg() is free'ed.

I wrote a simple test program to confirm this after identifying the
problem:

http://davej.org/programs/testsendmsg.c

Note that the below fix will break solaris_sendmsg()/solaris_recvmsg() as
it also calls verify_compat_iovec() but expects it to malloc internally.

[ I fixed that. -DaveM ]

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 17-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org> Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!