History log of /net/ipv6/netfilter/nf_conntrack_reasm.c
Revision Date Author Comments
d4ad4d22e7ac6b8711b35d7e86eb29f03f8ac153 01-Aug-2014 Nikolay Aleksandrov <nikolay@redhat.com> inet: frags: use kmem_cache for inet_frag_queue

Use kmem_cache to allocate/free inet_frag_queue objects since they're
all the same size per inet_frags user and are alloced/freed in high volumes
thus making it a perfect case for kmem_cache.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
06aa8b8a0345c78f4d9a1fb3f852952b12a0e40c 01-Aug-2014 Nikolay Aleksandrov <nikolay@redhat.com> inet: frags: rename last_in to flags

The last_in field has been used to store various flags different from
first/last frag in so give it a more descriptive name: flags.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1bab4c75075b84675b96992ac47580a57c26958d 24-Jul-2014 Nikolay Aleksandrov <nikolay@redhat.com> inet: frag: set limits and make init_net's high_thresh limit global

This patch makes init_net's high_thresh limit to be the maximum for all
namespaces, thus introducing a global memory limit threshold equal to the
sum of the individual high_thresh limits which are capped.
It also introduces some sane minimums for low_thresh as it shouldn't be
able to drop below 0 (or > high_thresh in the unsigned case), and
overall low_thresh should not ever be above high_thresh, so we make the
following relations for a namespace:
init_net:
high_thresh - max(not capped), min(init_net low_thresh)
low_thresh - max(init_net high_thresh), min (0)

all other namespaces:
high_thresh = max(init_net high_thresh), min(namespace's low_thresh)
low_thresh = max(namespace's high_thresh), min(0)

The major issue with having low_thresh > high_thresh is that we'll
schedule eviction but never evict anything and thus rely only on the
timers.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ab1c724f633080ed2e8a0cfe61654599b55cf8f9 24-Jul-2014 Florian Westphal <fw@strlen.de> inet: frag: use seqlock for hash rebuild

rehash is rare operation, don't force readers to take
the read-side rwlock.

Instead, we only have to detect the (rare) case where
the secret was altered while we are trying to insert
a new inetfrag queue into the table.

If it was changed, drop the bucket lock and recompute
the hash to get the 'new' chain bucket that we have to
insert into.

Joint work with Nikolay Aleksandrov.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e3a57d18b06179d68fcf7a0a06ad844493c65e06 24-Jul-2014 Florian Westphal <fw@strlen.de> inet: frag: remove periodic secret rebuild timer

merge functionality into the eviction workqueue.

Instead of rebuilding every n seconds, take advantage of the upper
hash chain length limit.

If we hit it, mark table for rebuild and schedule workqueue.
To prevent frequent rebuilds when we're completely overloaded,
don't rebuild more than once every 5 seconds.

ipfrag_secret_interval sysctl is now obsolete and has been marked as
deprecated, it still can be changed so scripts won't be broken but it
won't have any effect. A comment is left above each unused secret_timer
variable to avoid confusion.

Joint work with Nikolay Aleksandrov.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3fd588eb90bfbba17091381006ecafe29c45db4a 24-Jul-2014 Florian Westphal <fw@strlen.de> inet: frag: remove lru list

no longer used.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
86e93e470cadedda9181a2bd9aee1d9d2e5e9c0f 24-Jul-2014 Florian Westphal <fw@strlen.de> inet: frag: move evictor calls into frag_find function

First step to move eviction handling into a work queue.

We lose two spots that accounted evicted fragments in MIB counters.

Accounting will be restored since the upcoming work-queue evictor
invokes the frag queue timer callbacks instead.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
fb3cfe6e75b9d05c87265e85e67d7caf6e5b44a7 24-Jul-2014 Florian Westphal <fw@strlen.de> inet: frag: remove hash size assumptions from callers

hide actual hash size from individual users: The _find
function will now fold the given hash value into the required range.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
36c7778218b93d96d88d68f116a711f6a598b72f 24-Jul-2014 Florian Westphal <fw@strlen.de> inet: frag: constify match, hashfn and constructor arguments

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
60ff746739bf805a912484643c720b6124826140 05-May-2014 WANG Cong <xiyou.wangcong@gmail.com> net: rename local_df to ignore_df

As suggested by several people, rename local_df to ignore_df,
since it means "ignore df bit if it is set".

Cc: Maciej Żenczykowski <maze@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6aafeef03b9d9ecf255f3a80ed85ee070260e1ae 06-Nov-2013 Jiri Pirko <jiri@resnulli.us> netfilter: push reasm skb through instead of original frag skbs

Pushing original fragments through causes several problems. For example
for matching, frags may not be matched correctly. Take following
example:

<example>
On HOSTA do:
ip6tables -I INPUT -p icmpv6 -j DROP
ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT

and on HOSTB you do:
ping6 HOSTA -s2000 (MTU is 1500)

Incoming echo requests will be filtered out on HOSTA. This issue does
not occur with smaller packets than MTU (where fragmentation does not happen)
</example>

As was discussed previously, the only correct solution seems to be to use
reassembled skb instead of separete frags. Doing this has positive side
effects in reducing sk_buff by one pointer (nfct_reasm) and also the reams
dances in ipvs and conntrack can be removed.

Future plan is to remove net/ipv6/netfilter/nf_conntrack_reasm.c
entirely and use code in net/ipv6/reassembly.c instead.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b1190570b451fb9fd77be8c115fcdb418c5108a5 23-Oct-2013 Hannes Frederic Sowa <hannes@stressinduktion.org> ipv6: split inet6_hash_frag for netfilter and initialize secrets with net_get_random_once

Defer the fragmentation hash secret initialization for IPv6 like the
previous patch did for IPv4.

Because the netfilter logic reuses the hash secret we have to split it
first. Thus introduce a new nf_hash_frag function which takes care to
seed the hash secret.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
b8dd6a223eb86d537c2c6d8d28916c1f0ba3ea3c 24-Mar-2013 Hannes Frederic Sowa <hannes@stressinduktion.org> netfilter: implement RFC3168 5.3 (ecn protection) for ipv6 fragmentation handling

This change brings netfilter reassembly logic on par with
reassembly.c. The corresponding change in net-next is
(eec2e61 ipv6: implement RFC3168 5.3 (ecn protection) for
ipv6 fragmentation handling)

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
5a3da1fe9561828d0ca7eca664b16ec2b9bf0055 15-Mar-2013 Hannes Frederic Sowa <hannes@stressinduktion.org> inet: limit length of fragment queue hash table bucket lists

This patch introduces a constant limit of the fragment queue hash
table bucket list lengths. Currently the limit 128 is choosen somewhat
arbitrary and just ensures that we can fill up the fragment cache with
empty packets up to the default ip_frag_high_thresh limits. It should
just protect from list iteration eating considerable amounts of cpu.

If we reach the maximum length in one hash bucket a warning is printed.
This is implemented on the caller side of inet_frag_find to distinguish
between the different users of inet_fragment.c.

I dropped the out of memory warning in the ipv4 fragment lookup path,
because we already get a warning by the slab allocator.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
14bbd6a565e1bcdc240d44687edb93f721cfdf99 14-Feb-2013 Pravin B Shelar <pshelar@nicira.com> net: Add skb_unclone() helper function.

This function will be used in next GRE_GSO patch. This patch does
not change any functionality.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Eric Dumazet <edumazet@google.com>
894e2ac82bd0029adce7ad6c8d25501fdd82c994 13-Feb-2013 Michal Kubeček <mkubecek@suse.cz> netfilter: nf_ct_reasm: fix per-netns sysctl initialization

Adjusting of data pointers in net/netfilter/nf_conntrack_frag6_*
sysctl table for other namespaces points to wrong netns_frags
structure and has reversed order of entries.

Problem introduced by commit c038a767cd69 in 3.7-rc1

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
3ef0eb0db4bf92c6d2510fe5c4dc51852746f206 29-Jan-2013 Jesper Dangaard Brouer <brouer@redhat.com> net: frag, move LRU list maintenance outside of rwlock

Updating the fragmentation queues LRU (Least-Recently-Used) list,
required taking the hash writer lock. However, the LRU list isn't
tied to the hash at all, so we can use a separate lock for it.

Original-idea-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d433673e5f9180e05a770c4b2ab18c08ad51cc21 29-Jan-2013 Jesper Dangaard Brouer <brouer@redhat.com> net: frag helper functions for mem limit tracking

This change is primarily a preparation to ease the extension of memory
limit tracking.

The change does reduce the number atomic operation, during freeing of
a frag queue. This does introduce a some performance improvement, as
these atomic operations are at the core of the performance problems
seen on NUMA systems.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
97cf00e93cc24898493e7a058105e3215257ee04 07-Dec-2012 Haibo Xi <haibbo@gmail.com> netfilter: nf_ct_reasm: fix conntrack reassembly expire code

Commit b836c99fd6c9 (ipv6: unify conntrack reassembly expire
code with standard one) use the standard IPv6 reassembly
code(ip6_expire_frag_queue) to handle conntrack reassembly expire.

In ip6_expire_frag_queue, it invoke dev_get_by_index_rcu to get
which device received this expired packet.so we must save ifindex
when NF_conntrack get this packet.

With this patch applied, I can see ICMP Time Exceeded sent
from the receiver when the sender sent out 1/2 fragmented
IPv6 packet.

Signed-off-by: Haibo Xi <haibbo@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
f1df1374dc83d62588667e566e959df384718ad1 27-Oct-2012 Hein Tibosch <hein_tibosch@yahoo.es> netfilter: nf_defrag_ipv6: solve section mismatch in nf_conntrack_reasm

WARNING: net/ipv6/netfilter/nf_defrag_ipv6.o(.text+0xe0): Section mismatch in
reference from the function nf_ct_net_init() to the function
.init.text:nf_ct_frag6_sysctl_register()
The function nf_ct_net_init() references the function
__init nf_ct_frag6_sysctl_register().

In case nf_conntrack_ipv6 is compiled as a module, nf_ct_net_init could be
called after the init code and data are unloaded. Therefore remove the
"__net_init" annotation from nf_ct_frag6_sysctl_register().

Signed-off-by: Hein Tibosch <hein_tibosch@yahoo.es>
Acked-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
4b7cc7fc26598914ba202d04d4196efb26b22e69 25-Sep-2012 Konstantin Khlebnikov <khlebnikov@openvz.org> nf_defrag_ipv6: fix oops on module unloading

fix copy-paste error introduced in linux-next commit
"ipv6: add a new namespace for nf_conntrack_reasm"

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6b102865e7ba9ff1e3c49c32c7187bb427d91798 18-Sep-2012 Amerigo Wang <amwang@redhat.com> ipv6: unify fragment thresh handling code

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b836c99fd6c9dfe52a69fa0ba36ec918f80ce02a 18-Sep-2012 Amerigo Wang <amwang@redhat.com> ipv6: unify conntrack reassembly expire code with standard one

Two years ago, Shan Wei tried to fix this:
http://patchwork.ozlabs.org/patch/43905/

The problem is that RFC2460 requires an ICMP Time
Exceeded -- Fragment Reassembly Time Exceeded message should be
sent to the source of that fragment, if the defragmentation
times out.

"
If insufficient fragments are received to complete reassembly of a
packet within 60 seconds of the reception of the first-arriving
fragment of that packet, reassembly of that packet must be
abandoned and all the fragments that have been received for that
packet must be discarded. If the first fragment (i.e., the one
with a Fragment Offset of zero) has been received, an ICMP Time
Exceeded -- Fragment Reassembly Time Exceeded message should be
sent to the source of that fragment.
"

As Herbert suggested, we could actually use the standard IPv6
reassembly code which follows RFC2460.

With this patch applied, I can see ICMP Time Exceeded sent
from the receiver when the sender sent out 3/4 fragmented
IPv6 UDP packet.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c038a767cd697238b09f7a4ea5a504b4891774e9 18-Sep-2012 Amerigo Wang <amwang@redhat.com> ipv6: add a new namespace for nf_conntrack_reasm

As pointed by Michal, it is necessary to add a new
namespace for nf_conntrack_reasm code, this prepares
for the second patch.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4cdd34084d539c758d00c5dc7bf95db2e4f2bc70 26-Aug-2012 Patrick McHardy <kaber@trash.net> netfilter: nf_conntrack_ipv6: improve fragmentation handling

The IPv6 conntrack fragmentation currently has a couple of shortcomings.
Fragmentes are collected in PREROUTING/OUTPUT, are defragmented, the
defragmented packet is then passed to conntrack, the resulting conntrack
information is attached to each original fragment and the fragments then
continue their way through the stack.

Helper invocation occurs in the POSTROUTING hook, at which point only
the original fragments are available. The result of this is that
fragmented packets are never passed to helpers.

This patch improves the situation in the following way:

- If a reassembled packet belongs to a connection that has a helper
assigned, the reassembled packet is passed through the stack instead
of the original fragments.

- During defragmentation, the largest received fragment size is stored.
On output, the packet is refragmented if required. If the largest
received fragment size exceeds the outgoing MTU, a "packet too big"
message is generated, thus behaving as if the original fragments
were passed through the stack from an outside point of view.

- The ipv6_helper() hook function can't receive fragments anymore for
connections using a helper, so it is switched to use ipv6_skip_exthdr()
instead of the netfilter specific nf_ct_ipv6_skip_exthdr() and the
reassembled packets are passed to connection tracking helpers.

The result of this is that we can properly track fragmented packets, but
still generate ICMPv6 Packet too big messages if we would have before.

This patch is also required as a precondition for IPv6 NAT, where NAT
helpers might enlarge packets up to a point that they require
fragmentation. In that case we can't generate Packet too big messages
since the proper MTU can't be calculated in all cases (f.i. when
changing textual representation of a variable amount of addresses),
so the packet is transparently fragmented iff the original packet or
fragments would have fit the outgoing MTU.

IPVS parts by Jesper Dangaard Brouer <brouer@redhat.com>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
e87cc4728f0e2fb663e592a1141742b1d6c63256 13-May-2012 Joe Perches <joe@perches.com> net: Convert net_ratelimit uses to net_<level>_ratelimited

Standardize the net core ratelimited logging functions.

Coalesce formats, align arguments.
Change a printk then vprintk sequence to use printf extension %pV.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ec8f23ce0f4005b74013d4d122e0d540397a93c9 19-Apr-2012 Eric W. Biederman <ebiederm@xmission.com> net: Convert all sysctl registrations to register_net_sysctl

This results in code with less boiler plate that is a bit easier
to read.

Additionally stops us from using compatibility code in the sysctl
core, hastening the day when the compatibility code can be removed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5dd3df105b9f6cb7dd2472b59e028d0d1c878ecb 19-Apr-2012 Eric W. Biederman <ebiederm@xmission.com> net: Move all of the network sysctls without a namespace into init_net.

This makes it clearer which sysctls are relative to your current network
namespace.

This makes it a little less error prone by not exposing sysctls for the
initial network namespace in other namespaces.

This is the same way we handle all of our other network interfaces to
userspace and I can't honestly remember why we didn't do this for
sysctls right from the start.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0a9ee81349d90c6c85831f38118bf569c60a4d51 29-Aug-2011 Joe Perches <joe@perches.com> netfilter: Remove unnecessary OOM logging messages

Site specific OOM messages are duplications of a generic MM
out of memory message and aren't really useful, so just
delete them.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
9e903e085262ffbf1fc44a17ac06058aca03524a 18-Oct-2011 Eric Dumazet <eric.dumazet@gmail.com> net: add skb frag size accessors

To ease skb->truesize sanitization, its better to be able to localize
all references to skb frags size.

Define accessors : skb_frag_size() to fetch frag size, and
skb_frag_size_{set|add|sub}() to manipulate it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bced94ed5efad836859d9426f37f48d46218e99a 20-Jan-2011 Eric Dumazet <eric.dumazet@gmail.com> netfilter: add a missing include in nf_conntrack_reasm.c

After commit ae90bdeaeac6b (netfilter: fix compilation when conntrack is
disabled but tproxy is enabled) we have following warnings :

net/ipv6/netfilter/nf_conntrack_reasm.c:520:16: warning: symbol
'nf_ct_frag6_gather' was not declared. Should it be static?
net/ipv6/netfilter/nf_conntrack_reasm.c:591:6: warning: symbol
'nf_ct_frag6_output' was not declared. Should it be static?
net/ipv6/netfilter/nf_conntrack_reasm.c:612:5: warning: symbol
'nf_ct_frag6_init' was not declared. Should it be static?
net/ipv6/netfilter/nf_conntrack_reasm.c:640:6: warning: symbol
'nf_ct_frag6_cleanup' was not declared. Should it be static?

Fix this including net/netfilter/ipv6/nf_defrag_ipv6.h

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
be9e9163afcfc3137e7c6377cb0c7b406318fde0 15-Nov-2010 Eric Dumazet <eric.dumazet@gmail.com> netfilter: nf_ct_frag6_sysctl_table is static

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
22e091e5253da1e9ad7c0a82c2c84446fc403efe 12-Nov-2010 Shan Wei <shanwei@cn.fujitsu.com> netfilter: ipv6: fix overlap check for fragments

The type of FRAG6_CB(prev)->offset is int, skb->len is *unsigned* int,
and offset is int.

Without this patch, type conversion occurred to this expression, when
(FRAG6_CB(prev)->offset + prev->len) is less than offset.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
7932c2e55c707350ac166effea2f49afe2e47400 26-Oct-2010 David S. Miller <davem@davemloft.net> netfilter: Add missing CONFIG_SYSCTL checks in ipv6's nf_conntrack_reasm.c

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e97c3e278e951501c2f385de70c3ceacdea78c4a 21-Oct-2010 Balazs Scheidler <bazsi@balabit.hu> tproxy: split off ipv6 defragmentation to a separate module

Like with IPv4, TProxy needs IPv6 defragmentation but does not
require connection tracking. Since defragmentation was coupled
with conntrack, I split off the two, creating an nf_defrag_ipv6 module,
similar to the already existing nf_defrag_ipv4.

Signed-off-by: Balazs Scheidler <bazsi@balabit.hu>
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
1ee89bd0fe72e381c8e4ef887d53c19c8adc6c93 03-Sep-2010 Nicolas Dichtel <nicolas.dichtel@6wind.com> netfilter: discard overlapping IPv6 fragment

RFC5722 prohibits reassembling IPv6 fragments when some data overlaps.

Bug spotted by Zhang Zuotao <zuotao.zhang@6wind.com>.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
21dc330157454046dd7c494961277d76e1c957fe 23-Aug-2010 David S. Miller <davem@davemloft.net> net: Rename skb_has_frags to skb_has_frag_list

SKBs can be "fragmented" in two ways, via a page array (called
skb_shinfo(skb)->frags[]) and via a list of SKBs (called
skb_shinfo(skb)->frag_list).

Since skb_has_frags() tests the latter, it's name is confusing
since it sounds more like it's testing the former.

Signed-off-by: David S. Miller <davem@davemloft.net>
698f93159a735bd29a8767c9f60d9b2d75870f8e 02-Jul-2010 Uwe Kleine-König <u.kleine-koenig@pengutronix.de> fix comment/printk typos concerning "already"

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
ea8fbe8f198edea19116d4b61267e12235513225 05-Jul-2010 Changli Gao <xiaosuo@gmail.com> netfilter: nf_conntrack_reasm: add fast path for in-order fragments

As the fragments are sent in order in most of OSes, such as Windows, Darwin and
FreeBSD, it is likely the new fragments are at the end of the inet_frag_queue.
In the fast path, we check if the skb at the end of the inet_frag_queue is the
prev we expect.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
0b041f8d1e6fb11a6134d37230da8c2182f99110 14-Jun-2010 Shan Wei <shanwei@cn.fujitsu.com> netfilter: defrag: kill unused work parameter of frag_kfree_skb()

The parameter (work) is unused, remove it.
Reported from Eric Dumazet.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
841a5940eb872d70dad2b9ee7f946d8fd13a8c22 14-Jun-2010 Shan Wei <shanwei@cn.fujitsu.com> netfilter: defrag: remove one redundant atomic ops

Instead of doing one atomic operation per frag, we can factorize them.
Reported from Eric Dumazet.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
5a0e3ad6af8660be21ca98a971cd00f331318c05 24-Mar-2010 Tejun Heo <tj@kernel.org> include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
b2e0b385d77069031edb957839aaaa8441b47287 23-Mar-2010 Jan Engelhardt <jengelh@medozas.de> netfilter: ipv6: use NFPROTO values for NF_HOOK invocation

The semantic patch that was used:
// <smpl>
@@
@@
(NF_HOOK
|NF_HOOK_THRESH
|nf_hook
)(
-PF_INET6,
+NFPROTO_IPV6,
...)
// </smpl>

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
9e2dcf72023d1447f09c47d77c99b0c49659e5ce 19-Feb-2010 Patrick McHardy <kaber@trash.net> netfilter: nf_conntrack_reasm: properly handle packets fragmented into a single fragment

When an ICMPV6_PKT_TOOBIG message is received with a MTU below 1280,
all further packets include a fragment header.

Unlike regular defragmentation, conntrack also needs to "reassemble"
those fragments in order to obtain a packet without the fragment
header for connection tracking. Currently nf_conntrack_reasm checks
whether a fragment has either IP6_MF set or an offset != 0, which
makes it ignore those fragments.

Remove the invalid check and make reassembly handle fragment queues
containing only a single fragment.

Reported-and-tested-by: Ulrich Weber <uweber@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
c92b544bd5d8e7ed7d81c77bbecab6df2a95aa53 26-Jan-2010 Shan Wei <shanwei@cn.fujitsu.com> ipv6: conntrack: Add member of user to nf_ct_frag6_queue structure

The commit 0b5ccb2(title:ipv6: reassembly: use seperate reassembly queues for
conntrack and local delivery) has broken the saddr&&daddr member of
nf_ct_frag6_queue when creating new queue. And then hash value
generated by nf_hashfn() was not equal with that generated by fq_find().
So, a new received fragment can't be inserted to right queue.

The patch fixes the bug with adding member of user to nf_ct_frag6_queue structure.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7c070aa947d1a4105742378579c267f6e7fd08a1 20-Jan-2010 Shan Wei <shanwei@cn.fujitsu.com> IPv6: reassembly: replace magic number with macro definitions

Use macro to define high/low thresh value, refer to IPV6_FRAG_TIMEOUT.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
b38f6eddeee510ce8178c2d2db54ed25f1d7cb63 20-Jan-2010 Shan Wei <shanwei@cn.fujitsu.com> netfilter: nf_conntrack_ipv6: delete the redundant macro definitions

The following three macro definitions are never used, so delete them.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
0b5ccb2ee250136dd7385b1c7da28417d0d4d32d 15-Dec-2009 Patrick McHardy <kaber@trash.net> ipv6: reassembly: use seperate reassembly queues for conntrack and local delivery

Currently the same reassembly queue might be used for packets reassembled
by conntrack in different positions in the stack (PREROUTING/LOCAL_OUT),
as well as local delivery. This can cause "packet jumps" when the fragment
completing a reassembled packet is queued from a different position in the
stack than the previous ones.

Add a "user" identifier to the reassembly queue key to seperate the queues
of each caller, similar to what we do for IPv4.

Signed-off-by: Patrick McHardy <kaber@trash.net>
f8572d8f2a2ba75408b97dc24ef47c83671795d7 05-Nov-2009 Eric W. Biederman <ebiederm@xmission.com> sysctl net: Remove unused binary sysctl code

Now that sys_sysctl is a compatiblity wrapper around /proc/sys
all sysctl strategy routines, and all ctl_name and strategy
entries in the sysctl tables are unused, and can be
revmoed.

In addition neigh_sysctl_register has been modified to no longer
take a strategy argument and it's callers have been modified not
to pass one.

Cc: "David Miller" <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
343a99724e4431d8618bea2eb7552e12c965425a 09-Jun-2009 David S. Miller <davem@davemloft.net> netfilter: Use frag list abstraction interfaces.

Signed-off-by: David S. Miller <davem@davemloft.net>
d1238d5337e8e53cddea77c2a26d26b6eb5a982f 16-Mar-2009 Christoph Paasch <christoph.paasch@gmail.com> netfilter: conntrack: check for NEXTHDR_NONE before header sanity checking

NEXTHDR_NONE doesn't has an IPv6 option header, so the first check
for the length will always fail and results in a confusing message
"too short" if debugging enabled. With this patch, we check for
NEXTHDR_NONE before length sanity checkings are done.

Signed-off-by: Christoph Paasch <christoph.paasch@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
6d9f239a1edb31d6133230f478fd1dc2da338ec5 04-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> net: '&' redux

I want to compile out proc_* and sysctl_* handlers totally and
stub them to NULL depending on config options, however usage of &
will prevent this, since taking adress of NULL pointer will break
compilation.

So, drop & in front of every ->proc_handler and every ->strategy
handler, it was never needed in fact.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
93c8b90f01f0dc73891da4e84b26524b61d29d66 01-Oct-2008 Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> ipv6: almost identical frag hashing funcs combined

$ diff-funcs ip6qhashfn reassembly.c netfilter/nf_conntrack_reasm.c
--- reassembly.c:ip6qhashfn()
+++ netfilter/nf_conntrack_reasm.c:ip6qhashfn()
@@ -1,5 +1,5 @@
-static unsigned int ip6qhashfn(__be32 id, struct in6_addr *saddr,
- struct in6_addr *daddr)
+static unsigned int ip6qhashfn(__be32 id, const struct in6_addr *saddr,
+ const struct in6_addr *daddr)
{
u32 a, b, c;

@@ -9,7 +9,7 @@

a += JHASH_GOLDEN_RATIO;
b += JHASH_GOLDEN_RATIO;
- c += ip6_frags.rnd;
+ c += nf_frags.rnd;
__jhash_mix(a, b, c);

a += (__force u32)saddr->s6_addr32[3];

And codiff xx.o.old xx.o.new:

net/ipv6/netfilter/nf_conntrack_reasm.c:
ip6qhashfn | -512
nf_hashfn | +6
nf_ct_frag6_gather | +36
3 functions changed, 42 bytes added, 512 bytes removed, diff: -470
net/ipv6/reassembly.c:
ip6qhashfn | -512
ip6_hashfn | +7
ipv6_frag_rcv | +89
3 functions changed, 96 bytes added, 512 bytes removed, diff: -416

net/ipv6/reassembly.c:
inet6_hash_frag | +510
1 function changed, 510 bytes added, diff: +510

Total: -376

Compile tested.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
547b792cac0a038b9dbf958d3c120df3740b5572 26-Jul-2008 Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> net: convert BUG_TRAP to generic WARN_ON

Removes legacy reinvent-the-wheel type thing. The generic
machinery integrates much better to automated debugging aids
such as kerneloops.org (and others), and is unambiguous due to
better naming. Non-intuively BUG_TRAP() is actually equal to
WARN_ON() rather than BUG_ON() though some might actually be
promoted to BUG_ON() but I left that to future.

I could make at least one BUILD_BUG_ON conversion.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
9a375803feaadb6c34e0807bd9325885dcca5c00 28-Jun-2008 Pavel Emelyanov <xemul@openvz.org> inet fragments: fix race between inet_frag_find and inet_frag_secret_rebuild

The problem is that while we work w/o the inet_frags.lock even
read-locked the secret rebuild timer may occur (on another CPU, since
BHs are still disabled in the inet_frag_find) and change the rnd seed
for ipv4/6 fragments.

It was caused by my patch fd9e63544cac30a34c951f0ec958038f0529e244
([INET]: Omit double hash calculations in xxx_frag_intern) late
in the 2.6.24 kernel, so this should probably be queued to -stable.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
b9c698964614f71b9c8afeca163a945b4c2e2d20 04-Jun-2008 Jarek Poplawski <jarkao2@gmail.com> netfilter: nf_conntrack_ipv6: fix inconsistent lock state in nf_ct_frag6_gather()

[ 63.531438] =================================
[ 63.531520] [ INFO: inconsistent lock state ]
[ 63.531520] 2.6.26-rc4 #7
[ 63.531520] ---------------------------------
[ 63.531520] inconsistent {softirq-on-W} -> {in-softirq-W} usage.
[ 63.531520] tcpsic6/3864 [HC0[0]:SC1[1]:HE1:SE0] takes:
[ 63.531520] (&q->lock#2){-+..}, at: [<c07175b0>] ipv6_frag_rcv+0xd0/0xbd0
[ 63.531520] {softirq-on-W} state was registered at:
[ 63.531520] [<c0143bba>] __lock_acquire+0x3aa/0x1080
[ 63.531520] [<c0144906>] lock_acquire+0x76/0xa0
[ 63.531520] [<c07a8f0b>] _spin_lock+0x2b/0x40
[ 63.531520] [<c0727636>] nf_ct_frag6_gather+0x3f6/0x910
...

According to this and another similar lockdep report inet_fragment
locks are taken from nf_ct_frag6_gather() with softirqs enabled, but
these locks are mainly used in softirq context, so disabling BHs is
necessary.

Reported-and-tested-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
58c0fb0ddd92e5105d61fc85b6e7af7b1f669067 14-Apr-2008 Jan Engelhardt <jengelh@computergmbh.de> [NETFILTER]: annotate rest of nf_conntrack_* with const

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
e8e16b706e8406f1ab3bccab16932ebc513896d8 29-Mar-2008 David S. Miller <davem@davemloft.net> [INET]: inet_frag_evictor() must run with BH disabled

Based upon a lockdep trace from Dave Jones.

Signed-off-by: David S. Miller <davem@davemloft.net>
bc578a54f0fd489d0722303f9a52508495ccaf9a 29-Mar-2008 Joe Perches <joe@perches.com> [NET]: Rename inet_frag.h identifiers COMPLETE, FIRST_IN, LAST_IN to INET_FRAG_*

On Fri, 2008-03-28 at 03:24 -0700, Andrew Morton wrote:
> they should all be renamed.

Done for include/net and net

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
99fa5f53975b4c09beb5d11ed7418fee2dfb43e3 31-Jan-2008 Patrick McHardy <kaber@trash.net> [NETFILTER]: nf_conntrack_ipv6: fix sparse warnings

CHECK net/ipv6/netfilter/nf_conntrack_reasm.c
net/ipv6/netfilter/nf_conntrack_reasm.c:77:18: warning: symbol 'nf_ct_ipv6_sysctl_table' was not declared. Should it be static?
net/ipv6/netfilter/nf_conntrack_reasm.c:586:16: warning: symbol 'nf_ct_frag6_gather' was not declared. Should it be static?
net/ipv6/netfilter/nf_conntrack_reasm.c:662:6: warning: symbol 'nf_ct_frag6_output' was not declared. Should it be static?
net/ipv6/netfilter/nf_conntrack_reasm.c:683:5: warning: symbol 'nf_ct_frag6_kfree_frags' was not declared. Should it be static?
net/ipv6/netfilter/nf_conntrack_reasm.c:698:5: warning: symbol 'nf_ct_frag6_init' was not declared. Should it be static?
net/ipv6/netfilter/nf_conntrack_reasm.c:717:6: warning: symbol 'nf_ct_frag6_cleanup' was not declared. Should it be static?

Based on patch by Stephen Hemminger with suggestions by Yasuyuki KOZAKAI.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
3140c25c82106645a6b1fc469dab7006a1d09fd0 22-Jan-2008 Pavel Emelyanov <xemul@openvz.org> [NETNS][FRAGS]: Make the LRU list per namespace.

The inet_frags.lru_list is used for evicting only, so we have
to make it per-namespace, to evict only those fragments, who's
namespace exceeded its high threshold, but not the whole hash.
Besides, this helps to avoid long loops in evictor.

The spinlock is not per-namespace because it protects the
hash table as well, which is global.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3b4bc4a2bfe80d01ebd4f2b6dcc58986c970ed16 22-Jan-2008 Pavel Emelyanov <xemul@openvz.org> [NETNS][FRAGS]: Isolate the secret interval from namespaces.

Since we have one hashtable to lookup the fragment, having
different secret_interval-s for hash rebuild doesn't make
sense, so move this one to inet_frags.

The inet_frags_ctl becomes empty after this, so remove it.
The appropriate ctl table is kept read-only in namespaces.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
e31e0bdc7e7fb9a4b09d2f3266c035a18fdcee9d 22-Jan-2008 Pavel Emelyanov <xemul@openvz.org> [NETNS][FRAGS]: Make thresholds work in namespaces.

This is the same as with the timeout variable.

Currently, after exceeding the high threshold _all_
the fragments are evicted, but it will be fixed in
later patch.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
b2fd5321dd160ef309dfb6cfc78ed8de4a830659 22-Jan-2008 Pavel Emelyanov <xemul@openvz.org> [NETNS][FRAGS]: Make the net.ipv4.ipfrag_timeout work in namespaces.

Move it to the netns_frags, adjust the usage and
make the appropriate ctl table writable.

Now fragment, that live in different namespaces can
live for different times.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6ddc082223ef0f73717b4133fa7e648842bbfd02 22-Jan-2008 Pavel Emelyanov <xemul@openvz.org> [NETNS][FRAGS]: Make the mem counter per-namespace.

This is also simple, but introduces more changes, since
then mem counter is altered in more places.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
e5a2bb842cd9681d00d4ca963e63e4d3647e66f8 22-Jan-2008 Pavel Emelyanov <xemul@openvz.org> [NETNS][FRAGS]: Make the nqueues counter per-namespace.

This is simple - just move the variable from struct inet_frags
to struct netns_frags and adjust the usage appropriately.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ac18e7509e7df327e30d6e073a787d922eaf211d 22-Jan-2008 Pavel Emelyanov <xemul@openvz.org> [NETNS][FRAGS]: Make the inet_frag_queue lookup work in namespaces.

Since fragment management code is consolidated, we cannot have the
pointer from inet_frag_queue to struct net, since we must know what
king of fragment this is.

So, I introduce the netns_frags structure. This one is currently
empty, but will be eventually filled with per-namespace
attributes. Each inet_frag_queue is tagged with this one.

The conntrack_reasm is not "netns-izated", so it has one static
netns_frags instance to keep working in init namespace.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8d8354d2fb9277f165715a6e1cb92bcc89259975 22-Jan-2008 Pavel Emelyanov <xemul@openvz.org> [NETNS][FRAGS]: Move ctl tables around.

This is a preparation for sysctl netns-ization.
Move the ctl tables to the files, where the tuning
variables reside. Plus make the helpers to register
the tables.

This will simplify the later patches and will keep
similar things closer to each other.

ipv4, ipv6 and conntrack_reasm are patched differently,
but the result is all the tables are in appropriate files.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
c95477090a2ace6d241c184adc3fbfcab9c61ceb 18-Oct-2007 Pavel Emelyanov <xemul@openvz.org> [INET]: Consolidate frag queues freeing

Since we now allocate the queues in inet_fragment.c, we
can safely free it in the same place. The ->destructor
callback thus becomes optional for inet_frags.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
48d60056387c37a17a46feda48613587a90535e5 18-Oct-2007 Pavel Emelyanov <xemul@openvz.org> [INET]: Remove no longer needed ->equal callback

Since this callback is used to check for conflicts in
hashtable when inserting a newly created frag queue, we can
do the same by checking for matching the queue with the
argument, used to create one.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
abd6523d15f40bfee14652619a31a7f65f77f581 18-Oct-2007 Pavel Emelyanov <xemul@openvz.org> [INET]: Consolidate xxx_find() in fragment management

Here we need another callback ->match to check whether the
entry found in hash matches the key passed. The key used
is the same as the creation argument for inet_frag_create.

Yet again, this ->match is the same for netfilter and ipv6.
Running a frew steps forward - this callback will later
replace the ->equal one.

Since the inet_frag_find() uses the already consolidated
inet_frag_create() remove the xxx_frag_create from protocol
codes.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
c6fda282294da882f8d8cc4c513940277dd380f5 18-Oct-2007 Pavel Emelyanov <xemul@openvz.org> [INET]: Consolidate xxx_frag_create()

This one uses the xxx_frag_intern() and xxx_frag_alloc()
routines, which are already consolidated, so remove them
from protocol code (as promised).

The ->constructor callback is used to init the rest of
the frag queue and it is the same for netfilter and ipv6.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
e521db9d790aaa60ae8920e21cb7faedc280fc36 18-Oct-2007 Pavel Emelyanov <xemul@openvz.org> [INET]: Consolidate xxx_frag_alloc()

Just perform the kzalloc() allocation and setup common
fields in the inet_frag_queue(). Then return the result
to the caller to initialize the rest.

The inet_frag_alloc() may return NULL, so check the
return value before doing the container_of(). This looks
ugly, but the xxx_frag_alloc() will be removed soon.

The xxx_expire() timer callbacks are patches,
because the argument is now the inet_frag_queue, not
the protocol specific queue.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2588fe1d782f1686847493ad643157d5d10bf602 18-Oct-2007 Pavel Emelyanov <xemul@openvz.org> [INET]: Consolidate xxx_frag_intern

This routine checks for the existence of a given entry
in the hash table and inserts the new one if needed.

The ->equal callback is used to compare two frag_queue-s
together, but this one is temporary and will be removed
later. The netfilter code and the ipv6 one use the same
routine to compare frags.

The inet_frag_intern() always returns non-NULL pointer,
so convert the inet_frag_queue into protocol specific
one (with the container_of) without any checks.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
f1673ca52c04f1b311abe03fd67cd4d650d19435 15-Oct-2007 Denis V. Lunev <den@openvz.org> [INET]: kmalloc+memset -> kzalloc in frag_alloc_queue

kmalloc + memset -> kzalloc in frag_alloc_queue

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
762cc40801ad757a34527d5e548816cf3b6fc606 15-Oct-2007 Pavel Emelyanov <xemul@openvz.org> [INET]: Consolidate the xxx_put

These ones use the generic data types too, so move
them in one place.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4b6cb5d8e3f5707d7a2e55cf7b05f1ea8bfc7a6d 15-Oct-2007 Pavel Emelyanov <xemul@openvz.org> [INET]: Small cleanup for xxx_put after evictor consolidation

After the evictor code is consolidated there is no need in
passing the extra pointer to the xxx_put() functions.

The only place when it made sense was the evictor code itself.

Maybe this change must got with the previous (or with the
next) patch, but I try to make them shorter as much as
possible to simplify the review (but they are still large
anyway), so this change goes in a separate patch.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
8e7999c44ee95e1e90ac91c83557a04e2948f160 15-Oct-2007 Pavel Emelyanov <xemul@openvz.org> [INET]: Consolidate the xxx_evictor

The evictors collect some statistics for ipv4 and ipv6,
so make it return the number of evicted queues and account
them all at once in the caller.

The XXX_ADD_STATS_BH() macros are just for this case,
but maybe there are places in code, that can make use of
them as well.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
1e4b82873af0f21002e37a81ef063d2e5410deb3 15-Oct-2007 Pavel Emelyanov <xemul@openvz.org> [INET]: Consolidate the xxx_frag_destroy

To make in possible we need to know the exact frag queue
size for inet_frags->mem management and two callbacks:

* to destoy the skb (optional, used in conntracks only)
* to free the queue itself (mandatory, but later I plan to
move the allocation and the destruction of frag_queues
into the common place, so this callback will most likely
be optional too).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
321a3a99e4717b960e21c62fc6a140d21453df7f 15-Oct-2007 Pavel Emelyanov <xemul@openvz.org> [INET]: Consolidate xxx_the secret_rebuild

This code works with the generic data types as well, so
move this into inet_fragment.c

This move makes it possible to hide the secret_timer
management and the secret_rebuild routine completely in
the inet_fragment.c

Introduce the ->hashfn() callback in inet_frags() to get
the hashfun for a given inet_frag_queue() object.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
277e650ddfc6944ef5f5466fd898b8da7f06cd82 15-Oct-2007 Pavel Emelyanov <xemul@openvz.org> [INET]: Consolidate the xxx_frag_kill

Since now all the xxx_frag_kill functions now work
with the generic inet_frag_queue data type, this can
be moved into a common place.

The xxx_unlink() code is moved as well.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
04128f233f2b344f3438cde09723e9946463a573 15-Oct-2007 Pavel Emelyanov <xemul@openvz.org> [INET]: Collect common frag sysctl variables together

Some sysctl variables are used to tune the frag queues
management and it will be useful to work with them in
a common way in the future, so move them into one
structure, moreover they are the same for all the frag
management codes.

I don't place them in the existing inet_frags object,
introduced in the previous patch for two reasons:

1. to keep them in the __read_mostly section;
2. not to export the whole inet_frags objects outside.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7eb95156d9dce2f59794264db336ce007d71638b 15-Oct-2007 Pavel Emelyanov <xemul@openvz.org> [INET]: Collect frag queues management objects together

There are some objects that are common in all the places
which are used to keep track of frag queues, they are:

* hash table
* LRU list
* rw lock
* rnd number for hash function
* the number of queues
* the amount of memory occupied by queues
* secret timer

Move all this stuff into one structure (struct inet_frags)
to make it possible use them uniformly in the future. Like
with the previous patch this mostly consists of hunks like

- write_lock(&ipfrag_lock);
+ write_lock(&ip4_frags.lock);

To address the issue with exporting the number of queues and
the amount of memory occupied by queues outside the .c file
they are declared in, I introduce a couple of helpers.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5ab11c98d3a950faf6922b6166e5f8fc874590e7 15-Oct-2007 Pavel Emelyanov <xemul@openvz.org> [INET]: Move common fields from frag_queues in one place.

Introduce the struct inet_frag_queue in include/net/inet_frag.h
file and place there all the common fields from three structs:

* struct ipq in ipv4/ip_fragment.c
* struct nf_ct_frag6_queue in nf_conntrack_reasm.c
* struct frag_queue in ipv6/reassembly.c

After this, replace these fields on appropriate structures with
this structure instance and fix the users to use correct names
i.e. hunks like

- atomic_dec(&fq->refcnt);
+ atomic_dec(&fq->q.refcnt);

(these occupy most of the patch)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
0d53778e81ac7af266dac8a20cc328328c327112 08-Jul-2007 Patrick McHardy <kaber@trash.net> [NETFILTER]: Convert DEBUGP to pr_debug

Convert DEBUGP to pr_debug and fix lots of non-compiling debug statements.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
e6f689db51a789807edede411b32eb7c9e457948 23-Mar-2007 Patrick McHardy <kaber@trash.net> [NETFILTER]: Use setup_timer

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6b88dd966b42e374dc783c397efc15f5c1458265 20-Mar-2007 Arnaldo Carvalho de Melo <acme@redhat.com> [SK_BUFF] ipv6: Use skb_network_offset in some more places

So that we reduce the number of direct accesses to skb->data.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
b0e380b1d8a8e0aca215df97702f99815f05c094 11-Apr-2007 Arnaldo Carvalho de Melo <acme@redhat.com> [SK_BUFF]: unions of just one member don't get anything done, kill them

Renaming skb->h to skb->transport_header, skb->nh to skb->network_header and
skb->mac to skb->mac_header, to match the names of the associated helpers
(skb[_[re]set]_{transport,network,mac}_header).

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cfe1fc7759fdacb0c650b575daed1692bf3eaece 16-Mar-2007 Arnaldo Carvalho de Melo <acme@redhat.com> [SK_BUFF]: Introduce skb_network_header_len

For the common sequence "skb->h.raw - skb->nh.raw", similar to skb->mac_len,
that is precalculated tho, don't think we need to bloat skb with one more
member, so just use this new helper, reducing the number of non-skbuff.h
references to the layer headers even more.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bff9b61ce330df04c6830d823c30c04203543f01 16-Mar-2007 Arnaldo Carvalho de Melo <acme@redhat.com> [SK_BUFF]: Use the helpers to get the layer header pointer

Some more cases...

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
967b05f64e27d04a4c8879addd0e1c52137e2c9e 13-Mar-2007 Arnaldo Carvalho de Melo <acme@redhat.com> [SK_BUFF]: Introduce skb_set_transport_header

For the cases where the transport header is being set to a offset from
skb->data.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
badff6d01a8589a1c828b0bf118903ca38627f4e 13-Mar-2007 Arnaldo Carvalho de Melo <acme@redhat.com> [SK_BUFF]: Introduce skb_reset_transport_header(skb)

For the common, open coded 'skb->h.raw = skb->data' operation, so that we can
later turn skb->h.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.

This one touches just the most simple cases:

skb->h.raw = skb->data;
skb->h.raw = {skb_push|[__]skb_pull}()

The next ones will handle the slightly more "complex" cases.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0660e03f6b18f19b6bbafe7583265a51b90daf36 26-Apr-2007 Arnaldo Carvalho de Melo <acme@redhat.com> [SK_BUFF]: Introduce ipv6_hdr(), remove skb->nh.ipv6h

Now the skb->nh union has just one member, .raw, i.e. it is just like the
skb->mac union, strange, no? I'm just leaving it like that till the transport
layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or
->mac_header_offset?), ditto for ->{h,nh}.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d56f90a7c96da5187f0cdf07ee7434fe6aa78bbc 11-Apr-2007 Arnaldo Carvalho de Melo <acme@redhat.com> [SK_BUFF]: Introduce skb_network_header()

For the places where we need a pointer to the network header, it is still legal
to touch skb->nh.raw directly if just adding to, subtracting from or setting it
to another layer header.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b7aa0bf70c4afb9e38be25f5c0922498d0f8684c 20-Apr-2007 Eric Dumazet <dada1@cosmosbay.com> [NET]: convert network timestamps to ktime_t

We currently use a special structure (struct skb_timeval) and plain
'struct timeval' to store packet timestamps in sk_buffs and struct
sock.

This has some drawbacks :
- Fixed resolution of micro second.
- Waste of space on 64bit platforms where sizeof(struct timeval)=16

I suggest using ktime_t that is a nice abstraction of high resolution
time services, currently capable of nanosecond resolution.

As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits
a 8 byte shrink of this structure on 64bit architectures. Some other
structures also benefit from this size reduction (struct ipq in
ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...)

Once this ktime infrastructure adopted, we can more easily provide
nanosecond resolution on top of it. (ioctl SIOCGSTAMPNS and/or
SO_TIMESTAMPNS/SCM_TIMESTAMPNS)

Note : this patch includes a bug correction in
compat_sock_get_timestamp() where a "err = 0;" was missing (so this
syscall returned -ENOENT instead of 0)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
CC: Stephen Hemminger <shemminger@linux-foundation.org>
CC: John find <linux.kernel@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
1ab1457c42bc078e5a9becd82a7f9f940b55c53a 09-Feb-2007 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [NET] IPV6: Fix whitespace errors.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
f9f02cca25acf33e5853c6b3cbb0c7146312783f 09-Jan-2007 Patrick McHardy <kaber@trash.net> [NETFILTER]: nf_conntrack_ipv6: fix crash when handling fragments

When IPv6 connection tracking splits up a defragmented packet into
its original fragments, the packets are taken from a list and are
passed to the network stack with skb->next still set. This causes
dev_hard_start_xmit to treat them as GSO fragments, resulting in
a use after free when connection tracking handles the next fragment.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
bff9a89bcac5b68ac0a1ea856b1726a35ae1eabb 03-Dec-2006 Patrick McHardy <kaber@trash.net> [NETFILTER]: nf_conntrack: endian annotations

Resync with Al Viro's ip_conntrack annotations and fix a missed
spot in ip_nat_proto_icmp.c.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
94aec08ea426903a3fb3cafd4d8b900cd50df702 18-Sep-2006 Brian Haley <brian.haley@hp.com> [NETFILTER]: Change tunables to __read_mostly

Change some netfilter tunables to __read_mostly. Also fixed some
incorrect file reference comments while I was in there.

(this will be my last __read_mostly patch unless someone points out
something else that needs it)

Signed-off-by: Brian Haley <brian.haley@hp.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
84fa7933a33f806bbbaae6775e87459b1ec584c0 30-Aug-2006 Patrick McHardy <kaber@trash.net> [NET]: Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE

Replace CHECKSUM_HW by CHECKSUM_PARTIAL (for outgoing packets, whose
checksum still needs to be completed) and CHECKSUM_COMPLETE (for
incoming packets, device supplied full checksum).

Patch originally from Herbert Xu, updated by myself for 2.6.18-rc3.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6ab3d5624e172c553004ecc862bfeac16d9d68b7 30-Jun-2006 Jörn Engel <joern@wohnheim.fh-wedel.de> Remove obsolete #include <linux/config.h>

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
b38dfee3d616ffadb58d4215e3ff9d1d7921031e 10-Jun-2006 Herbert Xu <herbert@gondor.apana.org.au> [NET]: skb_trim audit

I found a few more spots where pskb_trim_rcsum could be used but were not.
This patch changes them to use it.

Also, sk_filter can get paged skb data. Therefore we must use pskb_trim
instead of skb_trim.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
6ea46c9c12da79ec6eead0cf4b3114143dd30bc1 21-Mar-2006 Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> [NETFILTER]: nf_conntrack: use ipv6_addr_equal in nf_ct_reasm

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2e4e6a17af35be359cc8f1c924f8f198fbd478cc 12-Jan-2006 Harald Welte <laforge@netfilter.org> [NETFILTER] x_tables: Abstraction layer for {ip,ip6,arp}_tables

This monster-patch tries to do the best job for unifying the data
structures and backend interfaces for the three evil clones ip_tables,
ip6_tables and arp_tables. In an ideal world we would never have
allowed this kind of copy+paste programming... but well, our world
isn't (yet?) ideal.

o introduce a new x_tables module
o {ip,arp,ip6}_tables depend on this x_tables module
o registration functions for tables, matches and targets are only
wrappers around x_tables provided functions
o all matches/targets that are used from ip_tables and ip6_tables
are now implemented as xt_FOOBAR.c files and provide module aliases
to ipt_FOOBAR and ip6t_FOOBAR
o header files for xt_matches are in include/linux/netfilter/,
include/linux/netfilter_{ipv4,ipv6} contains compatibility wrappers
around the xt_FOOBAR.h headers

Based on this patchset we're going to further unify the code,
gradually getting rid of all the layer 3 specific assumptions.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
181a46a56e9f852060c54247209e93740329b6eb 04-Jan-2006 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [NETFILTER]: Use macro for spinlock_t/rwlock_t initializations/definition.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
302fe1758d85ad9c868e77625f61b7edad106381 15-Nov-2005 Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> [NETFILTER] fix leak of fragment queue at unloading nf_conntrack_ipv6

This patch makes nf_conntrack_ipv6 free all IPv6 fragment queues at module
unloading time. Also introduce a BUG_ON if we ever again have leaks in
the memory accounting.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
1ba430bc3e243d38c0bb2b185bea664b04fc59df 15-Nov-2005 Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> [NETFILTER] nf_conntrack: fix possibility of infinite loop while evicting nf_ct_frag6_queue

This synchronizes nf_ct_reasm with ipv6 reassembly, and fixes a possibility
of an infinite loop if CPUs evict and create nf_ct_frag6_queue in parallel.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7686a02c0ebc11e4f881fe14db3df18569b7dbc1 15-Nov-2005 Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> [NETFILTER]: fix type of sysctl variables in nf_conntrack_ipv6

These variables should be unsigned. This fixes sysctl handler for
nf_ct_frag6_{low,high}_thresh.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9fb9cbb1082d6b31fb45aa1a14432449a0df6cf1 10-Nov-2005 Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> [NETFILTER]: Add nf_conntrack subsystem.

The existing connection tracking subsystem in netfilter can only
handle ipv4. There were basically two choices present to add
connection tracking support for ipv6. We could either duplicate all
of the ipv4 connection tracking code into an ipv6 counterpart, or (the
choice taken by these patches) we could design a generic layer that
could handle both ipv4 and ipv6 and thus requiring only one sub-protocol
(TCP, UDP, etc.) connection tracking helper module to be written.

In fact nf_conntrack is capable of working with any layer 3
protocol.

The existing ipv4 specific conntrack code could also not deal
with the pecularities of doing connection tracking on ipv6,
which is also cured here. For example, these issues include:

1) ICMPv6 handling, which is used for neighbour discovery in
ipv6 thus some messages such as these should not participate
in connection tracking since effectively they are like ARP
messages

2) fragmentation must be handled differently in ipv6, because
the simplistic "defrag, connection track and NAT, refrag"
(which the existing ipv4 connection tracking does) approach simply
isn't feasible in ipv6

3) ipv6 extension header parsing must occur at the correct spots
before and after connection tracking decisions, and there were
no provisions for this in the existing connection tracking
design

4) ipv6 has no need for stateful NAT

The ipv4 specific conntrack layer is kept around, until all of
the ipv4 specific conntrack helpers are ported over to nf_conntrack
and it is feature complete. Once that occurs, the old conntrack
stuff will get placed into the feature-removal-schedule and we will
fully kill it off 6 months later.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>