History log of /net/ipv4/netfilter/ip_tables.c
Revision Date Author Comments
c58dd2dd443c26d856a168db108a0cd11c285bf3 04-Apr-2014 Thomas Graf <tgraf@suug.ch> netfilter: Can't fail and free after table replacement

All xtables variants suffer from the defect that the copy_to_user()
to copy the counters to user memory may fail after the table has
already been exchanged and thus exposed. Return an error at this
point will result in freeing the already exposed table. Any
subsequent packet processing will result in a kernel panic.

We can't copy the counters before exposing the new tables as we
want provide the counter state after the old table has been
unhooked. Therefore convert this into a silent error.

Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
b416c144f46af1a30ddfa4e4319a8f077381ad63 21-Oct-2013 Will Deacon <will.deacon@arm.com> netfilter: x_tables: fix ordering of jumpstack allocation and table update

During kernel stability testing on an SMP ARMv7 system, Yalin Wang
reported the following panic from the netfilter code:

1fe0: 0000001c 5e2d3b10 4007e779 4009e110 60000010 00000032 ff565656 ff545454
[<c06c48dc>] (ipt_do_table+0x448/0x584) from [<c0655ef0>] (nf_iterate+0x48/0x7c)
[<c0655ef0>] (nf_iterate+0x48/0x7c) from [<c0655f7c>] (nf_hook_slow+0x58/0x104)
[<c0655f7c>] (nf_hook_slow+0x58/0x104) from [<c0683bbc>] (ip_local_deliver+0x88/0xa8)
[<c0683bbc>] (ip_local_deliver+0x88/0xa8) from [<c0683718>] (ip_rcv_finish+0x418/0x43c)
[<c0683718>] (ip_rcv_finish+0x418/0x43c) from [<c062b1c4>] (__netif_receive_skb+0x4cc/0x598)
[<c062b1c4>] (__netif_receive_skb+0x4cc/0x598) from [<c062b314>] (process_backlog+0x84/0x158)
[<c062b314>] (process_backlog+0x84/0x158) from [<c062de84>] (net_rx_action+0x70/0x1dc)
[<c062de84>] (net_rx_action+0x70/0x1dc) from [<c0088230>] (__do_softirq+0x11c/0x27c)
[<c0088230>] (__do_softirq+0x11c/0x27c) from [<c008857c>] (do_softirq+0x44/0x50)
[<c008857c>] (do_softirq+0x44/0x50) from [<c0088614>] (local_bh_enable_ip+0x8c/0xd0)
[<c0088614>] (local_bh_enable_ip+0x8c/0xd0) from [<c06b0330>] (inet_stream_connect+0x164/0x298)
[<c06b0330>] (inet_stream_connect+0x164/0x298) from [<c061d68c>] (sys_connect+0x88/0xc8)
[<c061d68c>] (sys_connect+0x88/0xc8) from [<c000e340>] (ret_fast_syscall+0x0/0x30)
Code: 2a000021 e59d2028 e59de01c e59f011c (e7824103)
---[ end trace da227214a82491bd ]---
Kernel panic - not syncing: Fatal exception in interrupt

This comes about because CPU1 is executing xt_replace_table in response
to a setsockopt syscall, resulting in:

ret = xt_jumpstack_alloc(newinfo);
--> newinfo->jumpstack = kzalloc(size, GFP_KERNEL);

[...]

table->private = newinfo;
newinfo->initial_entries = private->initial_entries;

Meanwhile, CPU0 is handling the network receive path and ends up in
ipt_do_table, resulting in:

private = table->private;

[...]

jumpstack = (struct ipt_entry **)private->jumpstack[cpu];

On weakly ordered memory architectures, the writes to table->private
and newinfo->jumpstack from CPU1 can be observed out of order by CPU0.
Furthermore, on architectures which don't respect ordering of address
dependencies (i.e. Alpha), the reads from CPU0 can also be re-ordered.

This patch adds an smp_wmb() before the assignment to table->private
(which is essentially publishing newinfo) to ensure that all writes to
newinfo will be observed before plugging it into the table structure.
A dependent-read barrier is also added on the consumer sides, to ensure
the same ordering requirements are also respected there.

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reported-by: Wang, Yalin <Yalin.Wang@sonymobile.com>
Tested-by: Wang, Yalin <Yalin.Wang@sonymobile.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
f229f6ce481ceb33a966311722b8ef0cb6c25de7 06-Apr-2013 Patrick McHardy <kaber@trash.net> netfilter: add my copyright statements

Add copyright statements to all netfilter files which have had significant
changes done by myself in the past.

Some notes:

- nf_conntrack_ecache.c was incorrectly attributed to Rusty and Netfilter
Core Team when it got split out of nf_conntrack_core.c. The copyrights
even state a date which lies six years before it was written. It was
written in 2005 by Harald and myself.

- net/ipv{4,6}/netfilter.c, net/netfitler/nf_queue.c were missing copyright
statements. I've added the copyright statement from net/netfilter/core.c,
where this code originated

- for nf_conntrack_proto_tcp.c I've also added Jozsef, since I didn't want
it to give the wrong impression

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
30e0c6a6bee24db0166b7ca709277cd693e179f2 25-Mar-2013 Gao feng <gaofeng@cn.fujitsu.com> netfilter: nf_log: prepare net namespace support for loggers

This patch adds netns support to nf_log and it prepares netns
support for existing loggers. It is composed of four major
changes.

1) nf_log_register has been split to two functions: nf_log_register
and nf_log_set. The new nf_log_register is used to globally
register the nf_logger and nf_log_set is used for enabling
pernet support from nf_loggers.

Per netns is not yet complete after this patch, it comes in
separate follow up patches.

2) Add net as a parameter of nf_log_bind_pf. Per netns is not
yet complete after this patch, it only allows to bind the
nf_logger to the protocol family from init_net and it skips
other cases.

3) Adapt all nf_log_packet callers to pass netns as parameter.
After this patch, this function only works for init_net.

4) Make the sysctl net/netfilter/nf_log pernet.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
f0165888610a1701a39670c7eadf63a61fad708d 21-Mar-2013 Gao feng <gaofeng@cn.fujitsu.com> netfilter: use IS_ENABLE to replace if defined in TRACE target

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
0cc8d8df9bb931f1d4ab376f59d8ab8a49f9d4d4 22-Jan-2013 YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org> netfilter: Use IS_ERR_OR_NULL().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
52e804c6dfaa5df1e4b0e290357b82ad4e4cda2c 16-Nov-2012 Eric W. Biederman <ebiederm@xmission.com> net: Allow userns root to control ipv4

Allow an unpriviled user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.

Settings that merely control a single network device are allowed.
Either the network device is a logical network device where
restrictions make no difference or the network device is hardware NIC
that has been explicity moved from the initial network namespace.

In general policy and network stack state changes are allowed
while resource control is left unchanged.

Allow creating raw sockets.
Allow the SIOCSARP ioctl to control the arp cache.
Allow the SIOCSIFFLAG ioctl to allow setting network device flags.
Allow the SIOCSIFADDR ioctl to allow setting a netdevice ipv4 address.
Allow the SIOCSIFBRDADDR ioctl to allow setting a netdevice ipv4 broadcast address.
Allow the SIOCSIFDSTADDR ioctl to allow setting a netdevice ipv4 destination address.
Allow the SIOCSIFNETMASK ioctl to allow setting a netdevice ipv4 netmask.
Allow the SIOCADDRT and SIOCDELRT ioctls to allow adding and deleting ipv4 routes.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
adding, changing and deleting gre tunnels.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
adding, changing and deleting ipip tunnels.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
adding, changing and deleting ipsec virtual tunnel interfaces.

Allow setting the MRT_INIT, MRT_DONE, MRT_ADD_VIF, MRT_DEL_VIF, MRT_ADD_MFC,
MRT_DEL_MFC, MRT_ASSERT, MRT_PIM, MRT_TABLE socket options on multicast routing
sockets.

Allow setting and receiving IPOPT_CIPSO, IP_OPT_SEC, IP_OPT_SID and
arbitrary ip options.

Allow setting IP_SEC_POLICY/IP_XFRM_POLICY ipv4 socket option.
Allow setting the IP_TRANSPARENT ipv4 socket option.
Allow setting the TCP_REPAIR socket option.
Allow setting the TCP_CONGESTION socket option.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e87cc4728f0e2fb663e592a1141742b1d6c63256 13-May-2012 Joe Perches <joe@perches.com> net: Convert net_ratelimit uses to net_<level>_ratelimited

Standardize the net core ratelimited logging functions.

Coalesce formats, align arguments.
Change a printk then vprintk sequence to use printf extension %pV.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
95c961747284a6b83a5e2d81240e214b0fa3464d 15-Apr-2012 Eric Dumazet <eric.dumazet@gmail.com> net: cleanup unsigned to unsigned int

Use of "unsigned int" is preferred to bare "unsigned" in net tree.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
63f6fe92c6b3cbf4c0bbbea4b31fdd3d68e21e4d 16-Jun-2011 Sebastian Andrzej Siewior <sebastian@breakpoint.cc> netfilter: ip_tables: fix compile with debug

Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Signed-off-by: Patrick McHardy <kaber@trash.net>
7f5c6d4f665bb57a19a34ce1fb16cc708c04f219 04-Apr-2011 Eric Dumazet <eric.dumazet@gmail.com> netfilter: get rid of atomic ops in fast path

We currently use a percpu spinlock to 'protect' rule bytes/packets
counters, after various attempts to use RCU instead.

Lately we added a seqlock so that get_counters() can run without
blocking BH or 'writers'. But we really only need the seqcount in it.

Spinlock itself is only locked by the current/owner cpu, so we can
remove it completely.

This cleanups api, using correct 'writer' vs 'reader' semantic.

At replace time, the get_counters() call makes sure all cpus are done
using the old table.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
25985edcedea6396277003854657b5f3cb31a628 31-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi> Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
db856674ac69e31946e56085239757cca3f7655f 20-Mar-2011 Eric Dumazet <eric.dumazet@gmail.com> netfilter: xtables: fix reentrancy

commit f3c5c1bfd4308 (make ip_tables reentrant) introduced a race in
handling the stackptr restore, at the end of ipt_do_table()

We should do it before the call to xt_info_rdunlock_bh(), or we allow
cpu preemption and another cpu overwrites stackptr of original one.

A second fix is to change the underflow test to check the origptr value
instead of 0 to detect underflow, or else we allow a jump from different
hooks.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
78b79876761b86653df89c48a7010b5cbd41a84a 15-Mar-2011 Vasiliy Kulikov <segoon@openwall.com> netfilter: ip_tables: fix infoleak to userspace

Structures ipt_replace, compat_ipt_replace, and xt_get_revision are
copied from userspace. Fields of these structs that are
zero-terminated strings are not checked. When they are used as argument
to a format string containing "%s" in request_module(), some sensitive
information is leaked to userspace via argument of spawned modprobe
process.

The first and the third bugs were introduced before the git epoch; the
second was introduced in 2722971c (v2.6.17-rc1). To trigger the bug
one should have CAP_NET_ADMIN.

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
255d0dc34068a976550ce555e153c0bfcfec7cc6 18-Dec-2010 Eric Dumazet <eric.dumazet@gmail.com> netfilter: x_table: speedup compat operations

One iptables invocation with 135000 rules takes 35 seconds of cpu time
on a recent server, using a 32bit distro and a 64bit kernel.

We eventually trigger NMI/RCU watchdog.

INFO: rcu_sched_state detected stall on CPU 3 (t=6000 jiffies)

COMPAT mode has quadratic behavior and consume 16 bytes of memory per
rule.

Switch the xt_compat algos to use an array instead of list, and use a
binary search to locate an offset in the sorted array.

This halves memory need (8 bytes per rule), and removes quadratic
behavior [ O(N*N) -> O(N*log2(N)) ]

Time of iptables goes from 35 s to 150 ms.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
83723d60717f8da0f53f91cf42a845ed56c09662 10-Jan-2011 Eric Dumazet <eric.dumazet@gmail.com> netfilter: x_tables: dont block BH while reading counters

Using "iptables -L" with a lot of rules have a too big BH latency.
Jesper mentioned ~6 ms and worried of frame drops.

Switch to a per_cpu seqlock scheme, so that taking a snapshot of
counters doesnt need to block BH (for this cpu, but also other cpus).

This adds two increments on seqlock sequence per ipt_do_table() call,
its a reasonable cost for allowing "iptables -L" not block BH
processing.

Reported-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
b5f15ac4f89f84853544c934fc7a744289e95e34 03-Nov-2010 Vasiliy Kulikov <segooon@gmail.com> ipv4: netfilter: ip_tables: fix information leak to userland

Structure ipt_getinfo is copied to userland with the field "name"
that has the last elements unitialized. It leads to leaking of
contents of kernel stack memory.

Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
243bf6e29eef642de0ff62f1ebf58bc2396d6d6e 13-Oct-2010 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: resolve indirect macros 3/3
87a2e70db62fec7348c6e5545eb7b7650c33d81b 13-Oct-2010 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: resolve indirect macros 2/3

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
12b00c2c025b8af697d9a022ea2e928cad889ef1 13-Oct-2010 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: resolve indirect macros 1/3

Many of the used macros are just there for userspace compatibility.
Substitute the in-kernel code to directly use the terminal macro
and stuff the defines into #ifndef __KERNEL__ sections.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
cca77b7c81876d819a5806f408b3c29b5b61a815 23-Aug-2010 Florian Westphal <fw@strlen.de> netfilter: fix CONFIG_COMPAT support

commit f3c5c1bfd430858d3a05436f82c51e53104feb6b
(netfilter: xtables: make ip_tables reentrant) forgot to
also compute the jumpstack size in the compat handlers.

Result is that "iptables -I INPUT -j userchain" turns into -j DROP.

Reported by Sebastian Roesner on #netfilter, closes
http://bugzilla.netfilter.org/show_bug.cgi?id=669.

Note: arptables change is compile-tested only.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
001389b9581c13fe5fc357a0f89234f85af4215d 16-Aug-2010 Eric Dumazet <eric.dumazet@gmail.com> netfilter: {ip,ip6,arp}_tables: avoid lockdep false positive

After commit 24b36f019 (netfilter: {ip,ip6,arp}_tables: dont block
bottom half more than necessary), lockdep can raise a warning
because we attempt to lock a spinlock with BH enabled, while
the same lock is usually locked by another cpu in a softirq context.

Disable again BH to avoid these lockdep warnings.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Diagnosed-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
24b36f0193467fa727b85b4c004016a8dae999b9 02-Aug-2010 Eric Dumazet <eric.dumazet@gmail.com> netfilter: {ip,ip6,arp}_tables: dont block bottom half more than necessary

We currently disable BH for the whole duration of get_counters()

On machines with a lot of cpus and large tables, this might be too long.

We can disable preemption during the whole function, and disable BH only
while fetching counters for the current cpu.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
7df0884ce144396fc151f2af7a73d5fb305f9b03 23-Jul-2010 Changli Gao <xiaosuo@gmail.com> netfilter: iptables: use skb->len for accounting

Use skb->len for accounting as xt_quota does.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
e12f8e29a8526172b7715881503bae636d60bdd8 04-Jun-2010 Eric Dumazet <eric.dumazet@gmail.com> netfilter: vmalloc_node cleanup

Using vmalloc_node(size, numa_node_id()) for temporary storage is not
needed. vmalloc(size) is more respectful of user NUMA policy.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
7489aec8eed4f2f1eb3b4d35763bd3ea30b32ef5 31-May-2010 Eric Dumazet <eric.dumazet@gmail.com> netfilter: xtables: stackptr should be percpu

commit f3c5c1bfd4 (netfilter: xtables: make ip_tables reentrant)
introduced a performance regression, because stackptr array is shared by
all cpus, adding cache line ping pongs. (16 cpus share a 64 bytes cache
line)

Fix this using alloc_percpu()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-By: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
654d0fbdc8fe1041918741ed5b6abc8ad6b4c1d8 13-May-2010 Stephen Hemminger <shemminger@vyatta.com> netfilter: cleanup printk messages

Make sure all printk messages have a severity level.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
af5676039a9479e6ff42c6aab9fac1149ac9597f 13-May-2010 Stephen Hemminger <shemminger@vyatta.com> netfilter: change NF_ASSERT to WARN_ON

Change netfilter asserts to standard WARN_ON. This has the
benefit of backtrace info and also causes netfilter errors
to show up on kerneloops.org.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
4538506be386f9736b83bf9892f829adbbb70fea 04-Jul-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: combine built-in extension structs

Prepare the arrays for use with the multiregister function. The
future layer-3 xt matches can then be easily added to it without
needing more (un)register code.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
b4ba26119b06052888696491f614201817491a0d 07-Jul-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: change hotdrop pointer to direct modification

Since xt_action_param is writable, let's use it. The pointer to
'bool hotdrop' always worried (8 bytes (64-bit) to write 1 byte!).
Surprisingly results in a reduction in size:

text data bss filename
5457066 692730 357892 vmlinux.o-prev
5456554 692730 357892 vmlinux.o

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
62fc8051083a334578c3f4b3488808f210b4565f 07-Jul-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: deconstify struct xt_action_param for matches

In future, layer-3 matches will be an xt module of their own, and
need to set the fragoff and thoff fields. Adding more pointers would
needlessy increase memory requirements (esp. so for 64-bit, where
pointers are wider).

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
4b560b447df83368df44bd3712c0c39b1d79ba04 05-Jul-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: substitute temporary defines by final name

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
de74c16996287250f0d947663127f80c6beebd3c 05-Jul-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: combine struct xt_match_param and xt_target_param

The structures carried - besides match/target - almost the same data.
It is possible to combine them, as extensions are evaluated serially,
and so, the callers end up a little smaller.

text data bss filename
-15318 740 104 net/ipv4/netfilter/ip_tables.o
+15286 740 104 net/ipv4/netfilter/ip_tables.o
-15333 540 152 net/ipv6/netfilter/ip6_tables.o
+15269 540 152 net/ipv6/netfilter/ip6_tables.o

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
ef53d702c3614fb919e8a8291033e3dbccfd1aea 09-Jul-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: dissolve do_match function

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
b5cad0dfd3c80501330215b9a9ae31bcffbd7306 02-May-2010 Jan Engelhardt <jengelh@medozas.de> netfilter: ip_tables: fix compilation when debug is enabled

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
cecc74de25d2cfb08e7702cd38e3f195950f1228 22-Apr-2010 Patrick McHardy <kaber@trash.net> netfilter: ip_tables: convert pr_devel() to pr_debug()

We want to be able to use CONFIG_DYNAMIC_DEBUG in netfilter code, switch
the few existing pr_devel() calls to pr_debug().

Signed-off-by: Patrick McHardy <kaber@trash.net>
5b775eb1c04c2ef33f5e17035e368214214ef9c2 19-Apr-2010 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: remove old comments about reentrancy

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
f3c5c1bfd430858d3a05436f82c51e53104feb6b 19-Apr-2010 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: make ip_tables reentrant

Currently, the table traverser stores return addresses in the ruleset
itself (struct ip6t_entry->comefrom). This has a well-known drawback:
the jumpstack is overwritten on reentry, making it necessary for
targets to return absolute verdicts. Also, the ruleset (which might
be heavy memory-wise) needs to be replicated for each CPU that can
possibly invoke ip6t_do_table.

This patch decouples the jumpstack from struct ip6t_entry and instead
puts it into xt_table_info. Not being restricted by 'comefrom'
anymore, we can set up a stack as needed. By default, there is room
allocated for two entries into the traverser.

arp_tables is not touched though, because there is just one/two
modules and further patches seek to collapse the table traverser
anyhow.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
bd414ee605ff3ac5fcd79f57269a897879ee4cde 23-Mar-2010 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: change matches to return error code

The following semantic patch does part of the transformation:
// <smpl>
@ rule1 @
struct xt_match ops;
identifier check;
@@
ops.checkentry = check;

@@
identifier rule1.check;
@@
check(...) { <...
-return true;
+return 0;
...> }

@@
identifier rule1.check;
@@
check(...) { <...
-return false;
+return -EINVAL;
...> }
// </smpl>

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
b0f38452ff73da7e9e0ddc68cd5c6b93c897ca0d 19-Mar-2010 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: change xt_match.checkentry return type

Restore function signatures from bool to int so that we can report
memory allocation failures or similar using -ENOMEM rather than
always having to pass -EINVAL back.

This semantic patch may not be too precise (checking for functions
that use xt_mtchk_param rather than functions referenced by
xt_match.checkentry), but reviewed, it produced the intended result.

// <smpl>
@@
type bool;
identifier check, par;
@@
-bool check
+int check
(struct xt_mtchk_param *par) { ... }
// </smpl>

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
fd0ec0e6216baea854465bbdb177f2d1b2ccaf22 10-Jul-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: consolidate code into xt_request_find_match

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
d2a7b6bad2c38e41eddb0b24d03627d9e7aa3f7b 10-Jul-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: make use of xt_request_find_target

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
ff67e4e42bd178b1179c4d8e5c1fde18758ce84f 19-Mar-2010 Jan Engelhardt <jengelh@medozas.de> netfilter: xt extensions: use pr_<level> (2)

Supplement to 1159683ef48469de71dc26f0ee1a9c30d131cf89.

Downgrade the log level to INFO for most checkentry messages as they
are, IMO, just an extra information to the -EINVAL code that is
returned as part of a parameter "constraint violation". Leave errors
to real errors, such as being unable to create a LED trigger.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
6b4ff2d7675511a31980fa5379808660e1261f90 26-Feb-2010 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: restore indentation

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
0f234214d15fa914436d304ecf5c3e43449e79f9 24-Feb-2010 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: reduce arguments to translate_table

Just pass in the entire repl struct. In case of a new table (e.g.
ip6t_register_table), the repldata has been previously filled with
table->name and table->size already (in ip6t_alloc_initial_table).

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
6bdb331bc6910d1ccb74dc9852fc858c5916c927 24-Feb-2010 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: optimize call flow around xt_ematch_foreach

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
dcea992aca82cb08b4674c4c783e325835408d1e 24-Feb-2010 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: replace XT_MATCH_ITERATE macro

The macro is replaced by a list.h-like foreach loop. This makes
the code more inspectable.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
0559518b5b99c591226460c0bbf8e6a570c518a8 24-Feb-2010 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: optimize call flow around xt_entry_foreach

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
72b2b1dd77e8feb0b7c0b26dee58f2a1e2c9828c 24-Feb-2010 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: replace XT_ENTRY_ITERATE macro

The macro is replaced by a list.h-like foreach loop. This makes
the code much more inspectable.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
d5d1baa15f5b05e9110403724d5dc72d6d541e04 26-Jun-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: add const qualifiers

This should make it easier to remove redundant arguments later.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
739674fb7febf116e7d647031fab16989a08a965 26-Jun-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: constify args in compat copying functions

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
e3eaa9910b380530cfd2c0670fcd3f627674da8a 17-Jun-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: generate initial table on-demand

The static initial tables are pretty large, and after the net
namespace has been instantiated, they just hang around for nothing.
This commit removes them and creates tables on-demand at runtime when
needed.

Size shrinks by 7735 bytes (x86_64).

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
14c7dbe043d01a83a30633ab6b109ba2ac61d9f7 08-Feb-2010 Alexey Dobriyan <adobriyan@gmail.com> netfilter: xtables: compat out of scope fix

As per C99 6.2.4(2) when temporary table data goes out of scope,
the behaviour is undefined:

if (compat) {
struct foo tmp;
...
private = &tmp;
}
[dereference private]

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
dab1531a07ad7c5be4ebe715a3d08742f0c638e3 08-Feb-2010 Alexey Dobriyan <adobriyan@gmail.com> netfilter: xtables: compat out of scope fix

As per C99 6.2.4(2) when temporary table data goes out of scope,
the behaviour is undefined:

if (compat) {
struct foo tmp;
...
private = &tmp;
}
[dereference private]

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
add67461240c1dadc7c8d97e66f8f92b556ca523 03-Feb-2010 Patrick McHardy <kaber@trash.net> netfilter: add struct net * to target parameters

Signed-off-by: Patrick McHardy <kaber@trash.net>
f54e9367f8499a9bf6b2afbc0dce63e1d53c525a 18-Jan-2010 Alexey Dobriyan <adobriyan@gmail.com> netfilter: xtables: add struct xt_mtdtor_param::net

Add ->net to match destructor list like ->net in constructor list.

Make sure it's set in ebtables/iptables/ip6tables, this requires to
propagate netns up to *_unregister_table().

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
a83d8e8d099fc373a5ca7112ad08c553bb2c180f 18-Jan-2010 Alexey Dobriyan <adobriyan@gmail.com> netfilter: xtables: add struct xt_mtchk_param::net

Some complex match modules (like xt_hashlimit/xt_recent) want netns
information at constructor and destructor time. We propably can play
games at match destruction time, because netns can be passed in object,
but I think it's cleaner to explicitly pass netns.

Add ->net, make sure it's set from ebtables/iptables/ip6tables code.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
3666ed1c4837fd6906da0224c5373d7a2186a193 23-Nov-2009 Joe Perches <joe@perches.com> netfilter: net/ipv[46]/netfilter: Move && and || to end of previous line

Compile tested only.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
35aad0ffdf548617940ca1e78be1f2e0bafc4496 24-Aug-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: mark initial tables constant

The inputted table is never modified, so should be considered const.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
e2fe35c17fed62d4ab5038fa9bc489e967ff8416 18-Jul-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: check for standard verdicts in policies

This adds the second check that Rusty wanted to have a long time ago. :-)

Base chain policies must have absolute verdicts that cease processing
in the table, otherwise rule execution may continue in an unexpected
spurious fashion (e.g. next chain that follows in memory).

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
90e7d4ab5c8b0c4c2e00e4893977f6aeec0f18f1 09-Jul-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: check for unconditionality of policies

This adds a check that iptables's original author Rusty set forth in
a FIXME comment.

Underflows in iptables are better known as chain policies, and are
required to be unconditional or there would be a stochastical chance
for the policy rule to be skipped if it does not match. If that were
to happen, rule execution would continue in an unexpected spurious
fashion.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
a7d51738e757c1ab94595e7d05594c61f0fb32ce 18-Jul-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: ignore unassigned hooks in check_entry_size_and_hooks

The "hook_entry" and "underflow" array contains values even for hooks
not provided, such as PREROUTING in conjunction with the "filter"
table. Usually, the values point to whatever the next rule is. For
the upcoming unconditionality and underflow checking patches however,
we must not inspect that arbitrary rule.

Skipping unassigned hooks seems like a good idea, also because
newinfo->hook_entry and newinfo->underflow will then continue to have
the poison value for detecting abnormalities.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
47901dc2c4a3f1f9af453486a005d31fe9b393f0 09-Jul-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: use memcmp in unconditional check

Instead of inspecting each u32/char open-coded, clean up and make use
of memcmp. On some arches, memcmp is implemented as assembly or GCC's
__builtin_memcmp which can possibly take advantages of known
alignment.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
e5afbba1869a5d9509c61f8962be9bdebf95f7d3 08-Jul-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: iptables: remove unused datalen variable

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
24992eacd8a9f4af286bdaaab627b6802ceb8bce 12-Jun-2009 Patrick McHardy <kaber@trash.net> netfilter: ip_tables: fix build error

Fix build error introduced by commit bb70dfa5 (netfilter: xtables:
consolidate comefrom debug cast access):

net/ipv4/netfilter/ip_tables.c: In function 'ipt_do_table':
net/ipv4/netfilter/ip_tables.c:421: error: 'comefrom' undeclared (first use in this function)
net/ipv4/netfilter/ip_tables.c:421: error: (Each undeclared identifier is reported only once
net/ipv4/netfilter/ip_tables.c:421: error: for each function it appears in.)

Signed-off-by: Patrick McHardy <kaber@trash.net>
a5e78820966e17c2316866e00047e4e7e5480f04 04-Jun-2009 Evgeniy Polyakov <zbr@ioremap.net> netfilter: x_tables: added hook number into match extension parameter structure.

Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
bb70dfa5f8ab4a0f1c699ddb3ef0276d91219b7c 15-Apr-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: consolidate comefrom debug cast access

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
7a6b1c46e28ab0511be26c238d552c00b51b88c5 15-Apr-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: remove another level of indent

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
9452258d8173806f67b1be5b3b00c39b052060c8 15-Apr-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: remove some goto

Combining two ifs, and goto is easily gone.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
a1ff4ac84e58503691058e88d55fa48949822683 15-Apr-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: reduce indent level by one

Cosmetic only. Transformation applied:

-if (foo) { long block; } else { short block; }
+if (!foo) { short block; continue; } long block;

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
98e86403162d08a30b03426c54c2a8fca1f695d1 15-Apr-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: consolidate open-coded logic

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
4f2f6f236af484ada595ff37d0ee1902aa56221f 15-Apr-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: fix const inconsistency

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
ccf5bd8c27daa4184004438273e03d3812b14d75 15-Apr-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: remove redundant casts

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
4ba351cf866257de3da3fbc8d2bcc3907ee9460f 14-Apr-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: use NFPROTO_ in standard targets

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
383ca5b874fad082a61af0f47610a1af83e6bb44 14-Apr-2009 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: use NFPROTO_ for xt_proto_init callsites

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
942e4a2bd680c606af0211e64eb216be2e19bf61 29-Apr-2009 Stephen Hemminger <shemminger@vyatta.com> netfilter: revised locking for x_tables

The x_tables are organized with a table structure and a per-cpu copies
of the counters and rules. On older kernels there was a reader/writer
lock per table which was a performance bottleneck. In 2.6.30-rc, this
was converted to use RCU and the counters/rules which solved the performance
problems for do_table but made replacing rules much slower because of
the necessary RCU grace period.

This version uses a per-cpu set of spinlocks and counters to allow to
table processing to proceed without the cache thrashing of a global
reader lock and keeps the same performance for table updates.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
fa9a86ddc8ecd2830a5e773facc250f110300ae7 02-Apr-2009 Eric Dumazet <dada1@cosmosbay.com> netfilter: use rcu_read_bh() in ipt_do_table()

Commit 784544739a25c30637397ace5489eeb6e15d7d49
(netfilter: iptables: lock free counters) forgot to disable BH
in arpt_do_table(), ipt_do_table() and ip6t_do_table()

Use rcu_read_lock_bh() instead of rcu_read_lock() cures the problem.

Reported-and-bisected-by: Roman Mindalev <r000n@r000n.net>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1f9352ae2253a97b07b34dcf16ffa3b4ca12c558 25-Mar-2009 Patrick McHardy <kaber@trash.net> netfilter: {ip,ip6,arp}_tables: fix incorrect loop detection

Commit e1b4b9f ([NETFILTER]: {ip,ip6,arp}_tables: fix exponential worst-case
search for loops) introduced a regression in the loop detection algorithm,
causing sporadic incorrectly detected loops.

When a chain has already been visited during the check, it is treated as
having a standard target containing a RETURN verdict directly at the
beginning in order to not check it again. The real target of the first
rule is then incorrectly treated as STANDARD target and checked not to
contain invalid verdicts.

Fix by making sure the rule does actually contain a standard target.

Based on patch by Francis Dupont <Francis_Dupont@isc.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
b8dfe498775de912116f275680ddb57c8799d9ef 25-Mar-2009 Eric Dumazet <dada1@cosmosbay.com> netfilter: factorize ifname_compare()

We use same not trivial helper function in four places. We can factorize it.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
08361aa807ae5e5007cd226ca9e34287512de737 20-Feb-2009 Eric Dumazet <dada1@cosmosbay.com> netfilter: ip_tables: unfold two critical loops in ip_packet_match()

While doing oprofile tests I noticed two loops are not properly unrolled by gcc

Using a hand coded unrolled loop provides nice speedup : ipt_do_table
credited of 2.52 % of cpu instead of 3.29 % in tbench.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
784544739a25c30637397ace5489eeb6e15d7d49 20-Feb-2009 Stephen Hemminger <shemminger@vyatta.com> netfilter: iptables: lock free counters

The reader/writer lock in ip_tables is acquired in the critical path of
processing packets and is one of the reasons just loading iptables can cause
a 20% performance loss. The rwlock serves two functions:

1) it prevents changes to table state (xt_replace) while table is in use.
This is now handled by doing rcu on the xt_table. When table is
replaced, the new table(s) are put in and the old one table(s) are freed
after RCU period.

2) it provides synchronization when accesing the counter values.
This is now handled by swapping in new table_info entries for each cpu
then summing the old values, and putting the result back onto one
cpu. On a busy system it may cause sampling to occur at different
times on each cpu, but no packet/byte counts are lost in the process.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Sucessfully tested on my dual quad core machine too, but iptables only (no ipv6 here)
BTW, my new "tbench 8" result is 2450 MB/s, (it was 2150 MB/s not so long ago)

Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
cffee385d7f367e80b288abf4261256477f7760e 31-Oct-2008 Harvey Harrison <harvey.harrison@gmail.com> net: replace NIPQUAD() in net/ipv4/netfilter/

Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u
can be replaced with %pI4

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
916a917dfec18535ff9e2afdafba82e6279eb4f4 08-Oct-2008 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: provide invoked family value to extensions

By passing in the family through which extensions were invoked, a bit
of data space can be reclaimed. The "family" member will be added to
the parameter structures and the check functions be adjusted.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
a2df1648ba615dd5908e9a1fa7b2f133fa302487 08-Oct-2008 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: move extension arguments into compound structure (6/6)

This patch does this for target extensions' destroy functions.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
af5d6dc200eb0fcc6fbd3df1ab4d8969004cb37f 08-Oct-2008 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: move extension arguments into compound structure (5/6)

This patch does this for target extensions' checkentry functions.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
7eb3558655aaa87a3e71a0c065dfaddda521fa6d 08-Oct-2008 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: move extension arguments into compound structure (4/6)

This patch does this for target extensions' target functions.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
6be3d8598e883fb632edf059ba2f8d1b9f4da138 08-Oct-2008 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: move extension arguments into compound structure (3/6)

This patch does this for match extensions' destroy functions.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
9b4fce7a3508a9776534188b6065b206a9608ccf 08-Oct-2008 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: move extension arguments into compound structure (2/6)

This patch does this for match extensions' checkentry functions.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
f7108a20dee44e5bb037f9e48f6a207b42e6ae1c 08-Oct-2008 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: move extension arguments into compound structure (1/6)

The function signatures for Xtables extensions have grown over time.
It involves a lot of typing/replication, and also a bit of stack space
even if they are not used. Realize an NFWS2008 idea and pack them into
structs. The skb remains outside of the struct so gcc can continue to
apply its optimizations.

This patch does this for match extensions' match functions.

A few ambiguities have also been addressed. The "offset" parameter for
example has been renamed to "fragoff" (there are so many different
offsets already) and "protoff" to "thoff" (there is more than just one
protocol here, so clarify).

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
367c679007fa4f990eb7ee381326ec59d8148b0e 08-Oct-2008 Jan Engelhardt <jengelh@medozas.de> netfilter: xtables: do centralized checkentry call (1/2)

It used to be that {ip,ip6,etc}_tables called extension->checkentry
themselves, but this can be moved into the xtables core.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
544473c1664f3a688be949ac078bdee6f4afeef1 14-Apr-2008 Patrick McHardy <kaber@trash.net> [NETFILTER]: {ip,ip6,arp}_tables: return EAGAIN for invalid SO_GET_ENTRIES size

Rule dumping is performed in two steps: first userspace gets the
ruleset size using getsockopt(SO_GET_INFO) and allocates memory,
then it calls getsockopt(SO_GET_ENTRIES) to actually dump the
ruleset. When another process changes the ruleset in between the
sizes from the first getsockopt call doesn't match anymore and
the kernel aborts. Unfortunately it returns EAGAIN, as for multiple
other possible errors, so userspace can't distinguish this case
from real errors.

Return EAGAIN so userspace can retry the operation.

Fixes (with current iptables SVN version) netfilter bugzilla #104.

Signed-off-by: Patrick McHardy <kaber@trash.net>
5452e425adfdfc4647b618e303f73d48f2405b0e 14-Apr-2008 Jan Engelhardt <jengelh@computergmbh.de> [NETFILTER]: annotate {arp,ip,ip6,x}tables with const

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
3b1e0a655f8eba44ab1ee2a1068d169ccfb853b9 25-Mar-2008 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS.

Introduce per-sock inlines: sock_net(), sock_net_set()
and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
0dc47877a3de00ceadea0005189656ae8dc52669 06-Mar-2008 Harvey Harrison <harvey.harrison@gmail.com> net: replace remaining __FUNCTION__ occurrences

__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3cb609d57c20027a8b39fc60b79b930a89da82d4 31-Jan-2008 Alexey Dobriyan <adobriyan@sw.ru> [NETFILTER]: x_tables: create per-netns /proc/net/*_tables_*

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
b0a6363c2418c93f25dd30b8ffcd3fdd4ce23ad6 31-Jan-2008 Patrick McHardy <kaber@trash.net> [NETFILTER]: {ip,arp,ip6}_tables: fix sparse warnings in compat code

CHECK net/ipv4/netfilter/ip_tables.c
net/ipv4/netfilter/ip_tables.c:1453:8: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1453:8: expected int *size
net/ipv4/netfilter/ip_tables.c:1453:8: got unsigned int [usertype] *size
net/ipv4/netfilter/ip_tables.c:1458:44: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1458:44: expected int *size
net/ipv4/netfilter/ip_tables.c:1458:44: got unsigned int [usertype] *size
net/ipv4/netfilter/ip_tables.c:1603:2: warning: incorrect type in argument 2 (different signedness)
net/ipv4/netfilter/ip_tables.c:1603:2: expected unsigned int *i
net/ipv4/netfilter/ip_tables.c:1603:2: got int *<noident>
net/ipv4/netfilter/ip_tables.c:1627:8: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1627:8: expected int *size
net/ipv4/netfilter/ip_tables.c:1627:8: got unsigned int *size
net/ipv4/netfilter/ip_tables.c:1634:40: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/ip_tables.c:1634:40: expected int *size
net/ipv4/netfilter/ip_tables.c:1634:40: got unsigned int *size
net/ipv4/netfilter/ip_tables.c:1653:8: warning: incorrect type in argument 5 (different signedness)
net/ipv4/netfilter/ip_tables.c:1653:8: expected unsigned int *i
net/ipv4/netfilter/ip_tables.c:1653:8: got int *<noident>
net/ipv4/netfilter/ip_tables.c:1666:2: warning: incorrect type in argument 2 (different signedness)
net/ipv4/netfilter/ip_tables.c:1666:2: expected unsigned int *i
net/ipv4/netfilter/ip_tables.c:1666:2: got int *<noident>
CHECK net/ipv4/netfilter/arp_tables.c
net/ipv4/netfilter/arp_tables.c:1285:40: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/arp_tables.c:1285:40: expected int *size
net/ipv4/netfilter/arp_tables.c:1285:40: got unsigned int *size
net/ipv4/netfilter/arp_tables.c:1543:44: warning: incorrect type in argument 3 (different signedness)
net/ipv4/netfilter/arp_tables.c:1543:44: expected int *size
net/ipv4/netfilter/arp_tables.c:1543:44: got unsigned int [usertype] *size
CHECK net/ipv6/netfilter/ip6_tables.c
net/ipv6/netfilter/ip6_tables.c:1481:8: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1481:8: expected int *size
net/ipv6/netfilter/ip6_tables.c:1481:8: got unsigned int [usertype] *size
net/ipv6/netfilter/ip6_tables.c:1486:44: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1486:44: expected int *size
net/ipv6/netfilter/ip6_tables.c:1486:44: got unsigned int [usertype] *size
net/ipv6/netfilter/ip6_tables.c:1631:2: warning: incorrect type in argument 2 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1631:2: expected unsigned int *i
net/ipv6/netfilter/ip6_tables.c:1631:2: got int *<noident>
net/ipv6/netfilter/ip6_tables.c:1655:8: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1655:8: expected int *size
net/ipv6/netfilter/ip6_tables.c:1655:8: got unsigned int *size
net/ipv6/netfilter/ip6_tables.c:1662:40: warning: incorrect type in argument 3 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1662:40: expected int *size
net/ipv6/netfilter/ip6_tables.c:1662:40: got unsigned int *size
net/ipv6/netfilter/ip6_tables.c:1680:8: warning: incorrect type in argument 5 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1680:8: expected unsigned int *i
net/ipv6/netfilter/ip6_tables.c:1680:8: got int *<noident>
net/ipv6/netfilter/ip6_tables.c:1693:2: warning: incorrect type in argument 2 (different signedness)
net/ipv6/netfilter/ip6_tables.c:1693:2: expected unsigned int *i
net/ipv6/netfilter/ip6_tables.c:1693:2: got int *<noident>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
df200969b1627e8f1cda7ce8c0707863f91bb81b 31-Jan-2008 Alexey Dobriyan <adobriyan@sw.ru> [NETFILTER]: netns: put table module on netns stop

When number of entries exceeds number of initial entries, foo-tables code
will pin table module. But during table unregister on netns stop,
that additional pin was forgotten.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
34bd137ba744c2e3a320ff50ac64ae51556cdfae 31-Jan-2008 Alexey Dobriyan <adobriyan@sw.ru> [NETFILTER]: ip_tables: propagate netns from userspace

.. all the way down to table searching functions.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
44d34e721e2c81ccdfb13cf34996309247ae2981 31-Jan-2008 Alexey Dobriyan <adobriyan@sw.ru> [NETFILTER]: x_tables: return new table from {arp,ip,ip6}t_register_table()

Typical table module registers xt_table structure (i.e. packet_filter)
and link it to list during it. We can't use one template for it because
corresponding list_head will become corrupted. We also can't unregister
with template because it wasn't changed at all and thus doesn't know in
which list it is.

So, we duplicate template at the very first step of table registration.
Table modules will save it for use during unregistration time and actual
filtering.

Do it at once to not screw bisection.

P.S.: renaming i.e. packet_filter => __packet_filter is temporary until
full netnsization of table modules is done.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
8d870052079d255917ec4f8431f5ec102707b7af 31-Jan-2008 Alexey Dobriyan <adobriyan@sw.ru> [NETFILTER]: x_tables: per-netns xt_tables

In fact all we want is per-netns set of rules, however doing that will
unnecessary complicate routines such as ipt_hook()/ipt_do_table, so
make full xt_table array per-netns.

Every user stubbed with init_net for a while.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
a98da11d88dbec1d5cebe2c6dbe9939ed8d13f69 31-Jan-2008 Alexey Dobriyan <adobriyan@sw.ru> [NETFILTER]: x_tables: change xt_table_register() return value convention

Switch from 0/-E to ptr/PTR_ERR convention.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
ecb6f85e11627a0fb26a7e2db0d3603c0d602937 31-Jan-2008 Jan Engelhardt <jengelh@computergmbh.de> [NETFILTER]: Use const in struct xt_match, xt_target, xt_table

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
022748a9357c4c1a0113ec1ce5612f383b80156f 15-Jan-2008 Denys Vlasenko <vda.linux@googlemail.com> [NETFILTER]: {ip,ip6}_tables: remove some inlines

This patch removes inlines except those which are used
by packet matching code and thus are performance-critical.

Before:

$ size */*/*/ip*tables*.o
text data bss dec hex filename
6402 500 16 6918 1b06 net/ipv4/netfilter/ip_tables.o
7130 500 16 7646 1dde net/ipv6/netfilter/ip6_tables.o

After:

$ size */*/*/ip*tables*.o
text data bss dec hex filename
6307 500 16 6823 1aa7 net/ipv4/netfilter/ip_tables.o
7010 500 16 7526 1d66 net/ipv6/netfilter/ip6_tables.o

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
e79ec50b9587c175f65f98550d66ad5b96c05dd9 18-Dec-2007 Jan Engelhardt <jengelh@computergmbh.de> [NETFILTER]: Parenthesize macro parameters

Parenthesize macro parameters.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
f01ffbd6e7d001ccf9168b33507958a51ce0ffcf 18-Dec-2007 Patrick McHardy <kaber@trash.net> [NETFILTER]: nf_log: move logging stuff to seperate header

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6d6a55f42d93eeba7add62c3ad6a5409c6fd24ae 18-Dec-2007 Patrick McHardy <kaber@trash.net> [NETFILTER]: ip_tables: remove ipchains compatibility hack

ipchains support has been removed years ago. kill last remains.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
c9d8fe13175140c79982f9d29c6921328f9afad6 18-Dec-2007 Patrick McHardy <kaber@trash.net> [NETFILTER]: {ip,ip6}_tables: fix format strings

Use %zu for sizeof() and remove casts.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
9c54795950d198e77144a18c94e7ed52ea0f3c77 18-Dec-2007 Patrick McHardy <kaber@trash.net> [NETFILTER]: {ip,ip6}_tables: reformat to eliminate differences

Reformat ip_tables.c and ip6_tables.c in order to eliminate non-functional
differences and minimize diff output.

This allows to get a view of the real differences using:

sed -e 's/IP6T/IPT/g' \
-e 's/IP6/IP/g' \
-e 's/INET6/INET/g' \
-e 's/ip6t/ipt/g' \
-e 's/ip6/ip/g' \
-e 's/ipv6/ip/g' \
-e 's/icmp6/icmp/g' \
net/ipv6/netfilter/ip6_tables.c | \
diff -wup /dev/stdin net/ipv4/netfilter/ip_tables.c

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
b386d9f5960a9afce7f077edf2095fccfbb1a8e6 18-Dec-2007 Patrick McHardy <kaber@trash.net> [NETFILTER]: ip_tables: move compat offset calculation to x_tables

Its needed by ip6_tables and arp_tables as well.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
73cd598df46a73d6f02063f2520df115a9b88aa5 18-Dec-2007 Patrick McHardy <kaber@trash.net> [NETFILTER]: ip_tables: fix compat types

Use compat types and compat iterators when dealing with compat entries for
clarity. This doesn't actually make a difference for ip_tables, but is
needed for ip6_tables and arp_tables.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
30c08c41be75145b8850ea14b2d5ee4ee4b705d8 18-Dec-2007 Patrick McHardy <kaber@trash.net> [NETFILTER]: ip_tables: account for struct ipt_entry/struct compat_ipt_entry size diff

Account for size differences when dumping entries or calculating the
entry positions. This doesn't actually make any difference for IPv4
since the structures have the same size, but its logically correct
and needed for IPv6.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
8956695131b8a7878891667469899d667eb5892b 18-Dec-2007 Patrick McHardy <kaber@trash.net> [NETFILTER]: x_tables: make xt_compat_match_from_user usable in iterator macros

Make xt_compat_match_from_user return an int to make it usable in the
*tables iterator macros and kill a now unnecessary wrapper function.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
4b4782486d49547bc589b64acfb956bda05f0fca 18-Dec-2007 Patrick McHardy <kaber@trash.net> [NETFILTER]: ip_tables: reformat compat code

The compat code has some very odd formating, clean it up before porting
it to ip6_tables.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
ac8e27fd89ca16e02fa1f911480bc46d64edc9da 18-Dec-2007 Patrick McHardy <kaber@trash.net> [NETFILTER]: ip_tables: kill useless wrapper

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
9e67d5a739327c44885adebb4f3a538050be73e4 05-Dec-2007 Patrick McHardy <kaber@trash.net> [NETFILTER]: x_tables: remove obsolete overflow check

We're not multiplying the size with the number of CPUs anymore, so the
check is obsolete.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
259d4e41f3ec25f22169daece42729f597b89f9a 05-Dec-2007 Eric Dumazet <dada1@cosmosbay.com> [NETFILTER]: x_tables: struct xt_table_info diet

Instead of using a big array of NR_CPUS entries, we can compute the size
needed at runtime, using nr_cpu_ids

This should save some ram (especially on David's machines where NR_CPUS=4096 :
32 KB can be saved per table, and 64KB for dynamically allocated ones (because
of slab/slub alignements) )

In particular, the 'bootstrap' tables are not any more static (in data
section) but on stack as their size is now very small.

This also should reduce the size used on stack in compat functions
(get_info() declares an automatic variable, that could be bigger than kernel
stack size for big NR_CPUS)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6e23ae2a48750bda407a4a58f52a4865d7308bf5 20-Nov-2007 Patrick McHardy <kaber@trash.net> [NETFILTER]: Introduce NF_INET_ hook values

The IPv4 and IPv6 hook values are identical, yet some code tries to figure
out the "correct" value by looking at the address family. Introduce NF_INET_*
values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__
section for userspace compatibility.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
a18aa31b7774d8b36048e256a02d9d689533fc96 12-Dec-2007 Patrick McHardy <kaber@trash.net> [NETFILTER]: ip_tables: fix compat copy race

When copying entries to user, the kernel makes two passes through the
data, first copying all the entries, then fixing up names and counters.
On the second pass it copies the kernel and match data from userspace
to the kernel again to find the corresponding structures, expecting
that kernel pointers contained in the data are still valid.

This is obviously broken, fix by avoiding the second pass completely
and fixing names and counters while dumping the ruleset, using the
kernel-internal data structures.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
3db05fea51cdb162cfa8f69e9cfb9e228919d2a9 15-Oct-2007 Herbert Xu <herbert@gondor.apana.org.au> [NETFILTER]: Replace sk_buff ** with sk_buff *

With all the users of the double pointers removed, this patch mops up by
finally replacing all occurances of sk_buff ** in the netfilter API by
sk_buff *.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
16fcec35e7d7c4faaa4709f6434a4a25b06d25e3 11-Sep-2007 Neil Horman <nhorman@tuxdriver.com> [NETFILTER]: Fix/improve deadlock condition on module removal netfilter

So I've had a deadlock reported to me. I've found that the sequence of
events goes like this:

1) process A (modprobe) runs to remove ip_tables.ko

2) process B (iptables-restore) runs and calls setsockopt on a netfilter socket,
increasing the ip_tables socket_ops use count

3) process A acquires a file lock on the file ip_tables.ko, calls remove_module
in the kernel, which in turn executes the ip_tables module cleanup routine,
which calls nf_unregister_sockopt

4) nf_unregister_sockopt, seeing that the use count is non-zero, puts the
calling process into uninterruptible sleep, expecting the process using the
socket option code to wake it up when it exits the kernel

4) the user of the socket option code (process B) in do_ipt_get_ctl, calls
ipt_find_table_lock, which in this case calls request_module to load
ip_tables_nat.ko

5) request_module forks a copy of modprobe (process C) to load the module and
blocks until modprobe exits.

6) Process C. forked by request_module process the dependencies of
ip_tables_nat.ko, of which ip_tables.ko is one.

7) Process C attempts to lock the request module and all its dependencies, it
blocks when it attempts to lock ip_tables.ko (which was previously locked in
step 3)

Theres not really any great permanent solution to this that I can see, but I've
developed a two part solution that corrects the problem

Part 1) Modifies the nf_sockopt registration code so that, instead of using a
use counter internal to the nf_sockopt_ops structure, we instead use a pointer
to the registering modules owner to do module reference counting when nf_sockopt
calls a modules set/get routine. This prevents the deadlock by preventing set 4
from happening.

Part 2) Enhances the modprobe utilty so that by default it preforms non-blocking
remove operations (the same way rmmod does), and add an option to explicity
request blocking operation. So if you select blocking operation in modprobe you
can still cause the above deadlock, but only if you explicity try (and since
root can do any old stupid thing it would like.... :) ).

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
0236e667e188af0336cd776e5b54c1f3fd19a03c 09-Jul-2007 Dan Aloni <da-x@monatomic.org> [NETFILTER] net/ipv4/netfilter/ip_tables.c: lower printk severity

Signed-off-by: Dan Aloni <da-x@monatomic.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9f15c5302de4e8b0aac7ca24c36bf26a7fe1a513 08-Jul-2007 Patrick McHardy <kaber@trash.net> [NETFILTER]: x_tables: mark matches and targets __read_mostly

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
ba9dda3ab5a865542e69dfe01edb2436857c9420 08-Jul-2007 Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> [NETFILTER]: x_tables: add TRACE target

The TRACE target can be used to follow IP and IPv6 packets through
the ruleset.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick NcHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
ccb79bdce71f2c04cfa9bfcbaf4d37e2f963d684 08-Jul-2007 Jan Engelhardt <jengelh@gmx.de> [NETFILTER]: x_tables: switch xt_match->checkentry to bool

Switch the return type of match functions to boolean

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
1d93a9cbad608f6398ba6c5b588c504ccd35a2ca 08-Jul-2007 Jan Engelhardt <jengelh@gmx.de> [NETFILTER]: x_tables: switch xt_match->match to bool

Switch the return type of match functions to boolean

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
cff533ac12494fa002e2c46acc94d670e5f636a2 08-Jul-2007 Jan Engelhardt <jengelh@gmx.de> [NETFILTER]: x_tables: switch hotdrop to bool

Switch the "hotdrop" variables to boolean

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
4c1b52bc7a2f5ee01ea3fc248a8748a1c6843f7c 05-Jun-2007 Dmitry Mishin <dim@openvz.org> [NETFILTER]: ip_tables: fix compat related crash

check_compat_entry_size_and_hooks iterates over the matches and calls
compat_check_calc_match, which loads the match and calculates the
compat offsets, but unlike the non-compat version, doesn't call
->checkentry yet. On error however it calls cleanup_matches, which in
turn calls ->destroy, which can result in crashes if the destroy
function (validly) expects to only get called after the checkentry
function.

Add a compat_release_match function that only drops the module reference
on error and rename compat_check_calc_match to compat_find_calc_match to
reflect the fact that it doesn't call the checkentry function.

Reported by Jan Engelhardt <jengelh@linux01.gwdg.de>

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
1b53d9042c04b8eb875d02e65792e9884efc3784 23-Mar-2007 Patrick McHardy <kaber@trash.net> [NETFILTER]: Remove changelogs and CVS IDs

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
eddc9ec53be2ecdbf4efe0efd4a83052594f0ac0 21-Apr-2007 Arnaldo Carvalho de Melo <acme@redhat.com> [SK_BUFF]: Introduce ip_hdr(), remove skb->nh.iph

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c9bdd4b5257406b0608385d19c40b5511decf4f6 13-Mar-2007 Arnaldo Carvalho de Melo <acme@redhat.com> [IP]: Introduce ip_hdrlen()

For the common sequence "skb->nh.iph->ihl * 4", removing a good number of open
coded skb->nh.iph uses, now to go after the rest...

Just out of curiosity, here are the idioms found to get the same result:

skb->nh.iph->ihl << 2
skb->nh.iph->ihl<<2
skb->nh.iph->ihl * 4
skb->nh.iph->ihl*4
(skb->nh.iph)->ihl * sizeof(u32)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e905a9edab7f4f14f9213b52234e4a346c690911 09-Feb-2007 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [NET] IPV4: Fix whitespace errors.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
e60a13e030867078f3c9fef8dca6cd8a5b883478 08-Feb-2007 Jan Engelhardt <jengelh@gmx.de> [NETFILTER]: {ip,ip6}_tables: use struct xt_table instead of redefined structure names

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6709dbbb1978abe039ea4b76c364bf003bf40de5 08-Feb-2007 Jan Engelhardt <jengelh@gmx.de> [NETFILTER]: {ip,ip6}_tables: remove x_tables wrapper functions

Use the x_tables functions directly to make it better visible which
parts are shared between ip_tables and ip6_tables.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
e5b5ef7d2b3fa364cb03407c432ae9979657aa6c 04-Jan-2007 Dmitry Mishin <dim@openvz.org> [NETFILTER]: compat offsets size change

Used by compat code offsets of entries should be 'unsigned int' as entries
array size has this dimension.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
e1b4b9f3986b80d5785d91dbd8d72cfaf9fd1117 12-Dec-2006 Al Viro <viro@zeniv.linux.org.uk> [NETFILTER]: {ip,ip6,arp}_tables: fix exponential worst-case search for loops

If we come to node we'd already marked as seen and it's not a part of path
(i.e. we don't have a loop right there), we already know that it isn't a
part of any loop, so we don't need to revisit it.

That speeds the things up if some chain is refered to from several places
and kills O(exp(table size)) worst-case behaviour (without sleeping,
at that, so if you manage to self-LART that way, you are SOL for a long
time)...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
a96be24679198469df28976c809575423e70d843 12-Dec-2006 Dmitry Mishin <dim@openvz.org> [NETFILTER]: ip_tables: ipt and ipt_compat checks unification

Matches and targets verification is duplicated in normal and compat processing
ways. This patch refactors code in order to remove this.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
f6677f4312ee74f8ca68c4cc4060465607b72b41 05-Dec-2006 Dmitry Mishin <dim@openvz.org> [NETFILTER]: Fix iptables compat hook validation

In compat mode, matches and targets valid hooks checks always successful due
to not initialized e->comefrom field yet. This patch separates this checks from
translation code and moves them after mark_source_chains() call, where these
marks are initialized.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by; Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
74c9c0c17dea729d6089c0c82762babd02e65f84 05-Dec-2006 Dmitry Mishin <dim@openvz.org> [NETFILTER]: Fix {ip,ip6,arp}_tables hook validation

Commit 590bdf7fd2292b47c428111cb1360e312eff207e introduced a regression
in match/target hook validation. mark_source_chains builds a bitmask
for each rule representing the hooks it can be reached from, which is
then used by the matches and targets to make sure they are only called
from valid hooks. The patch moved the match/target specific validation
before the mark_source_chains call, at which point the mask is always zero.

This patch returns back to the old order and moves the standard checks
to mark_source_chains. This allows to get rid of a special case for
standard targets as a nice side-effect.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
920b868ae1dfdac77c5e8c97e7067b23680f043e 31-Oct-2006 Dmitry Mishin <dim@openvz.org> [NETFILTER]: ip_tables: compat code module refcounting fix

This patch fixes bug in iptables modules refcounting on compat error way.

As we are getting modules in check_compat_entry_size_and_hooks(), in case of
later error, we should put them all in translate_compat_table(), not in the
compat_copy_entry_from_user() or compat_copy_match_from_user(), as it is now.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Acked-by: Vasily Averin <vvs@openvz.org>
Acked-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
ef4512e76679b4f4997f60f93f8a576a0d20c26b 31-Oct-2006 Vasily Averin <vvs@openvz.org> [NETFILTER]: ip_tables: compat error way cleanup

This patch adds forgotten compat_flush_offset() call to error way of
translate_compat_table(). May lead to table corruption on the next
compat_do_replace().

Signed-off-by: Vasily Averin <vvs@openvz.org>
Acked-by: Dmitry Mishin <dim@openvz.org>
Acked-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
590bdf7fd2292b47c428111cb1360e312eff207e 31-Oct-2006 Dmitry Mishin <dim@openvz.org> [NETFILTER]: Missed and reordered checks in {arp,ip,ip6}_tables

There is a number of issues in parsing user-provided table in
translate_table(). Malicious user with CAP_NET_ADMIN may crash system by
passing special-crafted table to the *_tables.

The first issue is that mark_source_chains() function is called before entry
content checks. In case of standard target, mark_source_chains() function
uses t->verdict field in order to determine new position. But the check, that
this field leads no further, than the table end, is in check_entry(), which
is called later, than mark_source_chains().

The second issue, that there is no check that target_offset points inside
entry. If so, *_ITERATE_MATCH macro will follow further, than the entry
ends. As a result, we'll have oops or memory disclosure.

And the third issue, that there is no check that the target is completely
inside entry. Results are the same, as in previous issue.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Acked-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
82fac0542e11c0d3316cc8fdafd2a990d2aab692 20-Oct-2006 Björn Steinbrink <B.Steinbrink@gmx.de> [NETFILTER]: Missing check for CAP_NET_ADMIN in iptables compat layer

The 32bit compatibility layer has no CAP_NET_ADMIN check in
compat_do_ipt_get_ctl, which for example allows to list the current
iptables rules even without having that capability (the non-compat
version requires it). Other capabilities might be required to exploit
the bug (eg. CAP_NET_RAW to get the nfnetlink socket?), so a plain user
can't exploit it, but a setup actually using the posix capability system
might very well hit such a constellation of granted capabilities.

Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
3e597c6045502dd0fa98a61aa95ba178f8a2cc03 25-Sep-2006 Al Viro <viro@ftp.linux.org.uk> [PATCH] fix iptables __user misannotations

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
9fa492cdc160cd27ce1046cb36f47d3b2b1efa21 20-Sep-2006 Patrick McHardy <kaber@trash.net> [NETFILTER]: x_tables: simplify compat API

Split the xt_compat_match/xt_compat_target into smaller type-safe functions
performing just one operation. Handle all alignment and size-related
conversions centrally in these function instead of requiring each module to
implement a full-blown conversion function. Replace ->compat callback by
->compat_from_user and ->compat_to_user callbacks, responsible for
converting just a single private structure.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
79030ed07de673e8451a03aecb9ada9f4d75d491 20-Sep-2006 Patrick McHardy <kaber@trash.net> [NETFILTER]: ip_tables: revision support for compat code

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
bec71b162747708d4b45b0cd399b484f52f2901a 20-Sep-2006 Patrick McHardy <kaber@trash.net> [NETFILTER]: ip_tables: fix module refcount leaks in compat error paths

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
90d47db4a06f93f7339618b2a4f0cb032ef8d6d5 20-Sep-2006 Dmitry Mishin <dim@openvz.org> [NETFILTER]: x_tables: small check_entry & module_refcount cleanup

While standard_target has target->me == NULL, module_put() should be
called for it as for others, because there were try_module_get() before.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
efa741656e9ebf5fd6e0432b0d1b3c7f156392d3 22-Aug-2006 Patrick McHardy <kaber@trash.net> [NETFILTER]: x_tables: remove unused size argument to check/destroy functions

The size is verified by x_tables and isn't needed by the modules anymore.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
fe1cb10873b44cf89082465823ee6d4d4ac63ad7 22-Aug-2006 Patrick McHardy <kaber@trash.net> [NETFILTER]: x_tables: remove unused argument to target functions

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
8311731afc439f508ab4d759edadedae75afb73e 18-Aug-2006 Patrick McHardy <kaber@trash.net> [NETFILTER]: ip_tables: fix table locking in ipt_do_table

table->private might change because of ruleset changes, don't use it without
holding the lock.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
0eff66e625306a794ecba4b29ed12f7a147ce219 14-Aug-2006 Patrick McHardy <kaber@trash.net> [NETFILTER]: {arp,ip,ip6}_tables: proper error recovery in init path

Neither of {arp,ip,ip6}_tables cleans up behind itself when something goes
wrong during initialization.

Noticed by Rennie deGraaf <degraaf@cpsc.ucalgary.ca>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6ab3d5624e172c553004ecc862bfeac16d9d68b7 30-Jun-2006 Jörn Engel <joern@wohnheim.fh-wedel.de> Remove obsolete #include <linux/config.h>

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
da298d3a4f01dbc10c54da75d6b5717a99fb9cbc 27-Jun-2006 Patrick McHardy <kaber@trash.net> [NETFILTER]: x_tables: fix xt_register_table error propagation

When xt_register_table fails the error is not properly propagated back.
Based on patch by Lepton Wu <ytht.net@gmail.com>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
fb1bb34d45400f12e0a33f8c487b3795674908a7 25-Jun-2006 Andrew Morton <akpm@osdl.org> [PATCH] remove for_each_cpu()

Convert a few stragglers over to for_each_possible_cpu(), remove
for_each_cpu().

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
7800007c1e2d42cd4120b87b0ba3f3480f17f30a 04-May-2006 Patrick McHardy <kaber@trash.net> [NETFILTER]: x_tables: don't use __copy_{from,to}_user on unchecked memory in compat layer

Noticed by Linus Torvalds <torvalds@osdl.org>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
46c5ea3c9ae7fbc6e52a13c92e59d4fc7f4ca80a 02-May-2006 Patrick McHardy <kaber@trash.net> [NETFILTER] x_tables: fix compat related crash on non-x86

When iptables userspace adds an ipt_standard_target, it calculates the size
of the entire entry as:

sizeof(struct ipt_entry) + XT_ALIGN(sizeof(struct ipt_standard_target))

ipt_standard_target looks like this:

struct xt_standard_target
{
struct xt_entry_target target;
int verdict;
};

xt_entry_target contains a pointer, so when compiled for 64 bit the
structure gets an extra 4 byte of padding at the end. On 32 bit
architectures where iptables aligns to 8 byte it will also have 4
byte padding at the end because it is only 36 bytes large.

The compat_ipt_standard_fn in the kernel adjusts the offsets by

sizeof(struct ipt_standard_target) - sizeof(struct compat_ipt_standard_target),

which will always result in 4, even if the structure from userspace
was already padded to a multiple of 8. On x86 this works out by
accident because userspace only aligns to 4, on all other
architectures this is broken and causes incorrect adjustments to
the size and following offsets.

Thanks to Linus for lots of debugging help and testing.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
6f912042256c12b0927438122594f5379b364f5d 11-Apr-2006 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [PATCH] for_each_possible_cpu: network codes

for_each_cpu() actually iterates across all possible CPUs. We've had mistakes
in the past where people were using for_each_cpu() where they should have been
iterating across only online or present CPUs. This is inefficient and
possibly buggy.

We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the
future.

This patch replaces for_each_cpu with for_each_possible_cpu under /net

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2722971cbe831117686039d5c334f2c0f560be13 01-Apr-2006 Dmitry Mishin <dim@openvz.org> [NETFILTER]: iptables 32bit compat layer

This patch extends current iptables compatibility layer in order to get
32bit iptables to work on 64bit kernel. Current layer is insufficient due
to alignment checks both in kernel and user space tools.

Patch is for current net-2.6.17 with addition of move of ipt_entry_{match|
target} definitions to xt_entry_{match|target}.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Acked-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
65b4b4e81a5094d52cbe372b887b1779abe53f9b 29-Mar-2006 Andrew Morton <akpm@osdl.org> [NETFILTER]: Rename init functions.

Every netfilter module uses `init' for its module_init() function and
`fini' or `cleanup' for its module_exit() function.

Problem is, this creates uninformative initcall_debug output and makes
ctags rather useless.

So go through and rename them all to $(filename)_init and
$(filename)_fini.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
a45049c51ce6a3fecf2a909b591b28164c927112 22-Mar-2006 Pablo Neira Ayuso <pablo@netfilter.org> [NETFILTER]: x_tables: set the protocol family in x_tables targets/matches

Set the family field in xt_[matches|targets] registered.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
57b47a53ec4a67691ba32cff5768e8d78fa6c67f 21-Mar-2006 Ingo Molnar <mingo@elte.hu> [NET]: sem2mutex part 2

Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
c49867347404c46f137a261643ed4fce4376f324 21-Mar-2006 Patrick McHardy <kaber@trash.net> [NETFILTER]: x_tables: add xt_{match,target} arguments to match/target functions

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
1c524830d0b39472f0278989bf1119750a5e234d 21-Mar-2006 Patrick McHardy <kaber@trash.net> [NETFILTER]: x_tables: pass registered match/target data to match/target functions

This allows to make decisions based on the revision (and address family
with a follow-up patch) at runtime.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
1d5cd90976fa0d1cc21554b9d43f5c517323ebfc 21-Mar-2006 Patrick McHardy <kaber@trash.net> [NETFILTER]: Convert ip_tables matches/targets to centralized error checking

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
3cdc7c953eb1e1e1d1b82adbd140bf3451c165b1 21-Mar-2006 Patrick McHardy <kaber@trash.net> [NETFILTER]: Change {ip,ip6,arp}_tables to use centralized error checking

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
ee4bb818ae35f68d1f848eae0a7b150a38eb4168 04-Feb-2006 Kirill Korotaev <dev@openvz.org> [NETFILTER]: Fix possible overflow in netfilters do_replace()

netfilter's do_replace() can overflow on addition within SMP_ALIGN()
and/or on multiplication by NR_CPUS, resulting in a buffer overflow on
the copy_from_user(). In practice, the overflow on addition is
triggerable on all systems, whereas the multiplication one might require
much physical memory to be present due to the check above. Either is
sufficient to overwrite arbitrary amounts of kernel memory.

I really hate adding the same check to all 4 versions of do_replace(),
but the code is duplicate...

Found by Solar Designer during security audit of OpenVZ.org

Signed-Off-By: Kirill Korotaev <dev@openvz.org>
Signed-Off-By: Solar Designer <solar@openwall.com>
Signed-off-by: Patrck McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2e4e6a17af35be359cc8f1c924f8f198fbd478cc 12-Jan-2006 Harald Welte <laforge@netfilter.org> [NETFILTER] x_tables: Abstraction layer for {ip,ip6,arp}_tables

This monster-patch tries to do the best job for unifying the data
structures and backend interfaces for the three evil clones ip_tables,
ip6_tables and arp_tables. In an ideal world we would never have
allowed this kind of copy+paste programming... but well, our world
isn't (yet?) ideal.

o introduce a new x_tables module
o {ip,arp,ip6}_tables depend on this x_tables module
o registration functions for tables, matches and targets are only
wrappers around x_tables provided functions
o all matches/targets that are used from ip_tables and ip6_tables
are now implemented as xt_FOOBAR.c files and provide module aliases
to ipt_FOOBAR and ip6t_FOOBAR
o header files for xt_matches are in include/linux/netfilter/,
include/linux/netfilter_{ipv4,ipv6} contains compatibility wrappers
around the xt_FOOBAR.h headers

Based on this patchset we're going to further unify the code,
gradually getting rid of all the layer 3 specific assumptions.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4fc268d24ceb9f4150777c1b5b2b8e6214e56b2b 11-Jan-2006 Randy Dunlap <rdunlap@xenotime.net> [PATCH] capable/capability.h (net/)

net: Use <linux/capability.h> where capable() is used.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
318360646941d6f3d4c6e4ee99107392728a4079 14-Dec-2005 Eric Dumazet <dada1@cosmosbay.com> [NETFILTER] ip_tables: NUMA-aware allocation

Part of a performance problem with ip_tables is that memory allocation
is not NUMA aware, but 'only' SMP aware (ie each CPU normally touch
separate cache lines)

Even with small iptables rules, the cost of this misplacement can be
high on common workloads. Instead of using one vmalloc() area
(located in the node of the iptables process), we now allocate an area
for each possible CPU, using vmalloc_node() so that memory should be
allocated in the CPU's node if possible.

Port to arp_tables and ip6_tables by Harald Welte.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9b5b5cff9a6655dbb6d2e2be365bb95eec3950eb 30-Nov-2005 Arjan van de Ven <arjan@infradead.org> [NET]: Add const markers to various variables.

the patch below marks various variables const in net/; the goal is to
move them to the .rodata section so that they can't false-share
cachelines with things that get written to, as well as potentially
helping gcc a bit with optimisations. (these were found using a gcc
patch to warn about such variables)

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
c8923c6b852d3a97c1faad0566e38fca330375a7 13-Oct-2005 David S. Miller <davem@davemloft.net> [NETFILTER]: Fix OOPSes on machines with discontiguous cpu numbering.

Original patch by Harald Welte, with feedback from Herbert Xu
and testing by S�bastien Bernard.

EBTABLES, ARP tables, and IP/IP6 tables all assume that cpus
are numbered linearly. That is not necessarily true.

This patch fixes that up by calculating the largest possible
cpu number, and allocating enough per-cpu structure space given
that.

Signed-off-by: David S. Miller <davem@davemloft.net>
05465343bf74e00c8c2c5a310740157de3149f27 22-Aug-2005 Patrick McHardy <kaber@trash.net> [NETFILTER]: Add goto target

Originally written by Henrik Nordstrom <hno@marasystems.com>, taken
from netfilter patch-o-matic and added ip6_tables support.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6869c4d8e066e21623c812c448a05f1ed931c9c6 10-Aug-2005 Harald Welte <laforge@netfilter.org> [NETFILTER]: reduce netfilter sk_buff enlargement

As discussed at netconf'05, we're trying to save every bit in sk_buff.
The patch below makes sk_buff 8 bytes smaller. I did some basic
testing on my notebook and it seems to work.

The only real in-tree user of nfcache was IPVS, who only needs a
single bit. Unfortunately I couldn't find some other free bit in
sk_buff to stuff that bit into, so I introduced a separate field for
them. Maybe the IPVS guys can resolve that to further save space.

Initially I wanted to shrink pkt_type to three bits (PACKET_HOST and
alike are only 6 values defined), but unfortunately the bluetooth code
overloads pkt_type :(

The conntrack-event-api (out-of-tree) uses nfcache, but Rusty just
came up with a way how to do it without any skb fields, so it's safe
to remove it.

- remove all never-implemented 'nfcache' code
- don't have ipvs code abuse 'nfcache' field. currently get's their own
compile-conditional skb->ipvs_property field. IPVS maintainers can
decide to move this bit elswhere, but nfcache needs to die.
- remove skb->nfcache field to save 4 bytes
- move skb->nfctinfo into three unused bits to save further 4 bytes

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
e45b1be8bcb3643808975a426fa3e201a2588e87 21-Jun-2005 Patrick McHardy <kaber@trash.net> [NETFILTER]: Kill lockhelp.h

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 17-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org> Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!