History log of /net/ipv6/xfrm6_policy.c
Revision Date Author Comments
789f202326640814c52f82e80cef3584d8254623 22-Oct-2014 Li RongQing <roy.qing.li@gmail.com> xfrm6: fix a potential use after free in xfrm6_policy.c

pskb_may_pull() maybe change skb->data and make nh and exthdr pointer
oboslete, so recompute the nd and exthdr

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
67ba4152e8b77eada6a9c64e3c2c84d6112794fc 24-Aug-2014 Ian Morris <ipm@chirality.org.uk> ipv6: White-space cleansing : Line Layouts

This patch makes no changes to the logic of the code but simply addresses
coding style issues as detected by checkpatch.

Both objdump and diff -w show no differences.

A number of items are addressed in this patch:
* Multiple spaces converted to tabs
* Spaces before tabs removed.
* Spaces in pointer typing cleansed (char *)foo etc.
* Remove space after sizeof
* Ensure spacing around comparators such as if statements.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
7e14ea1521d9249d9de7f0ea39c9af054745eebd 14-Mar-2014 Steffen Klassert <steffen.klassert@secunet.com> xfrm6: Add IPsec protocol multiplexer

This patch adds an IPsec protocol multiplexer for ipv6. With
this it is possible to add alternative protocol handlers, as
needed for IPsec virtual tunnel interfaces.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
84502b5ef9849a9694673b15c31bd3ac693010ae 30-Oct-2013 Steffen Klassert <steffen.klassert@secunet.com> xfrm: Fix null pointer dereference when decoding sessions

On some codepaths the skb does not have a dst entry
when xfrm_decode_session() is called. So check for
a valid skb_dst() before dereferencing the device
interface index. We use 0 as the device index if
there is no valid skb_dst(), or at reverse decoding
we use skb_iif as device interface index.

Bug was introduced with git commit bafd4bd4dc
("xfrm: Decode sessions with output interface.").

Reported-by: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
eeb1b73378b560e00ff1da2ef09fed9254f4e128 25-Oct-2013 Steffen Klassert <steffen.klassert@secunet.com> xfrm: Increase the garbage collector threshold

With the removal of the routing cache, we lost the
option to tweak the garbage collector threshold
along with the maximum routing cache size. So git
commit 703fb94ec ("xfrm: Fix the gc threshold value
for ipv4") moved back to a static threshold.

It turned out that the current threshold before we
start garbage collecting is much to small for some
workloads, so increase it from 1024 to 32768. This
means that we start the garbage collector if we have
more than 32768 dst entries in the system and refuse
new allocations if we are above 65536.

Reported-by: Wolfgang Walter <linux@stwm.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
bafd4bd4dcfa13145db7f951251eef3e10f8c278 09-Sep-2013 Steffen Klassert <steffen.klassert@secunet.com> xfrm: Decode sessions with output interface.

The output interface matching does not work on forward
policy lookups, the output interface of the flowi is
always 0. Fix this by setting the output interface when
we decode the session.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
84c4a9dfbf430861e7588d95ae3ff61535dca351 10-May-2013 Cong Wang <amwang@redhat.com> xfrm6: release dev before returning error

We forget to call dev_put() on error path in xfrm6_fill_dst(),
its caller doesn't handle this.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18cf0d0784b4a634472ed24d0d7ca1c721d93e90 18-Feb-2013 Romain KUNTZ <r.kuntz@ipflavors.com> xfrm: release neighbor upon dst destruction

Neighbor is cloned in xfrm6_fill_dst but seems to never be released.
Neighbor entry should be released when XFRM6 dst entry is destroyed
in xfrm6_dst_destroy, otherwise references may be kept forever on
the device pointed by the neighbor entry.

I may not have understood all the subtleties of XFRM & dst so I would
be happy to receive comments on this patch.

Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8d068875caca3b507ffa8a57d521483fd4eebcc7 06-Feb-2013 Michal Kubecek <mkubecek@suse.cz> xfrm: make gc_thresh configurable in all namespaces

The xfrm gc threshold can be configured via xfrm{4,6}_gc_thresh
sysctl but currently only in init_net, other namespaces always
use the default value. This can substantially limit the number
of IPsec tunnels that can be effectively used.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
887c95cc1da53f66a5890fdeab13414613010097 17-Jan-2013 YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org> ipv6: Complete neighbour entry removal from dst_entry.

CC: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
0afe21fdf6cfe0fe8a184d82a399773cc331bf40 16-Nov-2012 Steffen Klassert <steffen.klassert@secunet.com> xfrm6: Remove commented out function call to xfrm6_input_fini

xfrm6_input_fini() is not in the tree since more than 10 years,
so remove the commented out function call.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
c38132865959c47d36b5db1c9289c7391895be6b 15-Nov-2012 Steffen Klassert <steffen.klassert@secunet.com> xfrm: Use a static gc threshold value for ipv6

Unlike ipv4 did, ipv6 does not handle the maximum number of cached
routes dynamically. So no need to try to handle the IPsec gc threshold
value dynamically. This patch sets the IPsec gc threshold value back to
1024 routes, as it is for non-IPsec routes.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
07a936260a94ae4798527ce8f79d4f3b589ab8a3 29-Oct-2012 Amerigo Wang <amwang@redhat.com> ipv6: use IS_ENABLED()

#if defined(CONFIG_FOO) || defined(CONFIG_FOO_MODULE)

can be replaced by

#if IS_ENABLED(CONFIG_FOO)

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9d7b0fc1ef1f17aff57c0dc9a59453d8fca255c3 20-Aug-2012 Patrick McHardy <kaber@trash.net> net: ipv6: fix oops in inet_putpeer()

Commit 97bab73f (inet: Hide route peer accesses behind helpers.) introduced
a bug in xfrm6_policy_destroy(). The xfrm_dst's _rt6i_peer member is not
initialized, causing a false positive result from inetpeer_ptr_is_peer(),
which in turn causes a NULL pointer dereference in inet_putpeer().

Pid: 314, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #17 To Be Filled By O.E.M. To Be Filled By O.E.M./P4S800D-X
EIP: 0060:[<c03abf93>] EFLAGS: 00010246 CPU: 0
EIP is at inet_putpeer+0xe/0x16
EAX: 00000000 EBX: f3481700 ECX: 00000000 EDX: 000dd641
ESI: f3481700 EDI: c05e949c EBP: f551def4 ESP: f551def4
DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 00000070 CR3: 3243d000 CR4: 00000750
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
f551df04 c0423de1 00000000 f3481700 f551df18 c038d5f7 f254b9f8 f551df28
f34f85d8 f551df20 c03ef48d f551df3c c0396870 f30697e8 f24e1738 c05e98f4
f5509540 c05cd2b4 f551df7c c0142d2b c043feb5 f5509540 00000000 c05cd2e8
[<c0423de1>] xfrm6_dst_destroy+0x42/0xdb
[<c038d5f7>] dst_destroy+0x1d/0xa4
[<c03ef48d>] xfrm_bundle_flo_delete+0x2b/0x36
[<c0396870>] flow_cache_gc_task+0x85/0x9f
[<c0142d2b>] process_one_work+0x122/0x441
[<c043feb5>] ? apic_timer_interrupt+0x31/0x38
[<c03967eb>] ? flow_cache_new_hashrnd+0x2b/0x2b
[<c0143e2d>] worker_thread+0x113/0x3cc

Fix by adding a init_dst() callback to struct xfrm_policy_afinfo to
properly initialize the dst's peer pointer.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6700c2709c08d74ae2c3c29b84a30da012dbc7f1 17-Jul-2012 David S. Miller <davem@davemloft.net> net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}()

This will be used so that we can compose a full flow key.

Even though we have a route in this context, we need more. In the
future the routes will be without destination address, source address,
etc. keying. One ipv4 route will cover entire subnets, etc.

In this environment we have to have a way to possess persistent storage
for redirects and PMTU information. This persistent storage will exist
in the FIB tables, and that's why we'll need to be able to rebuild a
full lookup flow key here. Using that flow key will do a fib_lookup()
and create/update the persistent entry.

Signed-off-by: David S. Miller <davem@davemloft.net>
ec18d9a2691d69cd14b48f9b919fddcef28b7f5c 12-Jul-2012 David S. Miller <davem@davemloft.net> ipv6: Add redirect support to all protocol icmp error handlers.

Signed-off-by: David S. Miller <davem@davemloft.net>
97cac0821af4474ec4ba3a9e7a36b98ed9b6db88 03-Jul-2012 David S. Miller <davem@davemloft.net> ipv6: Store route neighbour in rt6_info struct.

This makes for a simplified conversion away from dst_get_neighbour*().

All code outside of ipv6 will use neigh lookups via dst_neigh_lookup*().

Signed-off-by: David S. Miller <davem@davemloft.net>
97bab73f987e2781129cd6f4b6379bf44d808cc6 10-Jun-2012 David S. Miller <davem@davemloft.net> inet: Hide route peer accesses behind helpers.

We encode the pointer(s) into an unsigned long with one state bit.

The state bit is used so we can store the inetpeer tree root to use
when resolving the peer later.

Later the peer roots will be per-FIB table, and this change works to
facilitate that.

Signed-off-by: David S. Miller <davem@davemloft.net>
ec8f23ce0f4005b74013d4d122e0d540397a93c9 19-Apr-2012 Eric W. Biederman <ebiederm@xmission.com> net: Convert all sysctl registrations to register_net_sysctl

This results in code with less boiler plate that is a bit easier
to read.

Additionally stops us from using compatibility code in the sysctl
core, hastening the day when the compatibility code can be removed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4e3fd7a06dc20b2d8ec6892233ad2012968fe7b6 21-Nov-2011 Alexey Dobriyan <adobriyan@gmail.com> net: remove ipv6_addr_copy()

C assignment can handle struct in6_addr copying.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b71d1d426d263b0b6cb5760322efebbfc89d4463 22-Apr-2011 Eric Dumazet <eric.dumazet@gmail.com> inet: constify ip headers and in6_addr

Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers
where possible, to make code intention more obvious.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1958b856c1a59c0f1e892b92debb8c9fe4f364dc 12-Mar-2011 David S. Miller <davem@davemloft.net> net: Put fl6_* macros to struct flowi6 and use them again.

Signed-off-by: David S. Miller <davem@davemloft.net>
4c9483b2fb5d2548c3cc1fe03cdd4484ceeb5d1c 12-Mar-2011 David S. Miller <davem@davemloft.net> ipv6: Convert to use flowi6 where applicable.

Signed-off-by: David S. Miller <davem@davemloft.net>
7e1dc7b6f709dfc1a9ab4b320dbe723f45992693 12-Mar-2011 David S. Miller <davem@davemloft.net> net: Use flowi4 and flowi6 in xfrm layer.

Signed-off-by: David S. Miller <davem@davemloft.net>
6281dcc94a96bd73017b2baa8fa83925405109ef 12-Mar-2011 David S. Miller <davem@davemloft.net> net: Make flowi ports AF dependent.

Create two sets of port member accessors, one set prefixed by fl4_*
and the other prefixed by fl6_*

This will let us to create AF optimal flow instances.

It will work because every context in which we access the ports,
we have to be fully aware of which AF the flowi is anyways.

Signed-off-by: David S. Miller <davem@davemloft.net>
1d28f42c1bd4bb2363d88df74d0128b4da135b4a 12-Mar-2011 David S. Miller <davem@davemloft.net> net: Put flowi_* prefix on AF independent members of struct flowi

I intend to turn struct flowi into a union of AF specific flowi
structs. There will be a common structure that each variant includes
first, much like struct sock_common.

This is the first step to move in that direction.

Signed-off-by: David S. Miller <davem@davemloft.net>
2774c131b1d19920b4587db1cfbd6f0750ad1f15 01-Mar-2011 David S. Miller <davem@davemloft.net> xfrm: Handle blackhole route creation via afinfo.

That way we don't have to potentially do this in every xfrm_lookup()
caller.

Signed-off-by: David S. Miller <davem@davemloft.net>
5e6b930f21b0a442f9d5db97c8314b4d91be1c27 24-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Const'ify address arguments to ->dst_lookup()

Signed-off-by: David S. Miller <davem@davemloft.net>
0c7b3eefb4ab8df245e94feb0d83c1c3450a3d87 23-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Mark flowi arg to ->fill_dst() const.

Signed-off-by: David S. Miller <davem@davemloft.net>
05d8402576c9c1b85bfc9e4f9d6a21c27ccbd5b1 23-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Mark flowi arg to ->get_tos() const.

Signed-off-by: David S. Miller <davem@davemloft.net>
62fa8a846d7de4b299232e330c74b7783539df76 27-Jan-2011 David S. Miller <davem@davemloft.net> net: Implement read-only protection and COW'ing of metrics.

Routing metrics are now copy-on-write.

Initially a route entry points it's metrics at a read-only location.
If a routing table entry exists, it will point there. Else it will
point at the all zero metric place-holder called 'dst_default_metrics'.

The writeability state of the metrics is stored in the low bits of the
metrics pointer, we have two bits left to spare if we want to store
more states.

For the initial implementation, COW is implemented simply via kmalloc.
However future enhancements will change this to place the writable
metrics somewhere else, in order to increase sharing. Very likely
this "somewhere else" will be the inetpeer cache.

Note also that this means that metrics updates may transiently fail
if we cannot COW the metrics successfully.

But even by itself, this patch should decrease memory usage and
increase cache locality especially for routing workloads. In those
cases the read-only metric copies stay in place and never get written
to.

TCP workloads where metrics get updated, and those rare cases where
PMTU triggers occur, will take a very slight performance hit. But
that hit will be alleviated when the long-term writable metrics
move to a more sharable location.

Since the metrics storage went from a u32 array of RTAX_MAX entries to
what is essentially a pointer, some retooling of the dst_entry layout
was necessary.

Most importantly, we need to preserve the alignment of the reference
count so that it doesn't share cache lines with the read-mostly state,
as per Eric Dumazet's alignment assertion checks.

The only non-trivial bit here is the move of the 'flags' member into
the writeable cacheline. This is OK since we are always accessing the
flags around the same moment when we made a modification to the
reference count.

Signed-off-by: David S. Miller <davem@davemloft.net>
7cc2edb83447775a34ed3bf9d29d8295a434b523 26-Jan-2011 David S. Miller <davem@davemloft.net> xfrm6: Don't forget to propagate peer into ipsec route.

Like ipv4, we have to propagate the ipv6 route peer into
the ipsec top-level route during instantiation.

Signed-off-by: David S. Miller <davem@davemloft.net>
fc66f95c68b6d4535a0ea2ea15d5cf626e310956 08-Oct-2010 Eric Dumazet <eric.dumazet@gmail.com> net dst: use a percpu_counter to track entries

struct dst_ops tracks number of allocated dst in an atomic_t field,
subject to high cache line contention in stress workload.

Switch to a percpu_counter, to reduce number of time we need to dirty a
central location. Place it on a separate cache line to avoid dirtying
read only fields.

Stress test :

(Sending 160.000.000 UDP frames,
IP route cache disabled, dual E5540 @2.53GHz,
32bit kernel, FIB_TRIE, SLUB/NUMA)

Before:

real 0m51.179s
user 0m15.329s
sys 10m15.942s

After:

real 0m45.570s
user 0m15.525s
sys 9m56.669s

With a small reordering of struct neighbour fields, subject of a
following patch, (to separate refcnt from other read mostly fields)

real 0m41.841s
user 0m15.261s
sys 8m45.949s

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a02cec2155fbea457eca8881870fd2de1a4c4c76 22-Sep-2010 Eric Dumazet <eric.dumazet@gmail.com> net: return operator cleanup

Change "return (EXPR);" to "return EXPR;"

return is not a function, parentheses are not required.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
44b451f1633896de15d2d52e1a2bd462e80b7814 02-Jul-2010 Peter Kosyh <p.kosyh@gmail.com> xfrm: fix xfrm by MARK logic

While using xfrm by MARK feature in
2.6.34 - 2.6.35 kernels, the mark
is always cleared in flowi structure via memset in
_decode_session4 (net/ipv4/xfrm4_policy.c), so
the policy lookup fails.
IPv6 code is affected by this bug too.

Signed-off-by: Peter Kosyh <p.kosyh@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bc8e4b954e463716a57d8113dd50ae9d47b682a7 22-Apr-2010 Nicolas Dichtel <nicolas.dichtel@6wind.com> xfrm6: ensure to use the same dev when building a bundle

When building a bundle, we set dst.dev and rt6.rt6i_idev.
We must ensure to set the same device for both fields.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
80c802f3073e84c956846e921e8a0b02dfa3755f 07-Apr-2010 Timo Teräs <timo.teras@iki.fi> xfrm: cache bundles instead of policies for outgoing flows

__xfrm_lookup() is called for each packet transmitted out of
system. The xfrm_find_bundle() does a linear search which can
kill system performance depending on how many bundles are
required per policy.

This modifies __xfrm_lookup() to store bundles directly in
the flow cache. If we did not get a hit, we just create a new
bundle instead of doing slow search. This means that we can now
get multiple xfrm_dst's for same flow (on per-cpu basis).

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
87c1e12b5eeb7b30b4b41291bef8e0b41fc3dde9 02-Mar-2010 Herbert Xu <herbert@gondor.apana.org.au> ipsec: Fix bogus bundle flowi

When I merged the bundle creation code, I introduced a bogus
flowi value in the bundle. Instead of getting from the caller,
it was instead set to the flow in the route object, which is
totally different.

The end result is that the bundles we created never match, and
we instead end up with an ever growing bundle list.

Thanks to Jamal for find this problem.

Reported-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
d7c7544c3d5f59033d1bf3236bc7b289f5f26b75 25-Jan-2010 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: deal with dst entries in netns

GC is non-existent in netns, so after you hit GC threshold, no new
dst entries will be created until someone triggers cleanup in init_net.

Make xfrm4_dst_ops and xfrm6_dst_ops per-netns.
This is not done in a generic way, because it woule waste
(AF_MAX - 2) * sizeof(struct dst_ops) bytes per-netns.

Reorder GC threshold initialization so it'd be done before registering
XFRM policies.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f8572d8f2a2ba75408b97dc24ef47c83671795d7 05-Nov-2009 Eric W. Biederman <ebiederm@xmission.com> sysctl net: Remove unused binary sysctl code

Now that sys_sysctl is a compatiblity wrapper around /proc/sys
all sysctl strategy routines, and all ctl_name and strategy
entries in the sysctl tables are unused, and can be
revmoed.

In addition neigh_sysctl_register has been modified to no longer
take a strategy argument and it's callers have been modified not
to pass one.

Cc: "David Miller" <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
db71789c01ae7b641f83c5aa64e7df25122f4b28 05-Aug-2009 David S. Miller <davem@davemloft.net> xfrm6: Fix xfrm6_policy.c build when SYSCTL disabled.

Same as how Randy Dunlap fixed the ipv4 side of things.

Signed-off-by: David S. Miller <davem@davemloft.net>
a33bc5c15154c835aae26f16e6a3a7d9ad4acb45 31-Jul-2009 Neil Horman <nhorman@tuxdriver.com> xfrm: select sane defaults for xfrm[4|6] gc_thresh

Choose saner defaults for xfrm[4|6] gc_thresh values on init

Currently, the xfrm[4|6] code has hard-coded initial gc_thresh values
(set to 1024). Given that the ipv4 and ipv6 routing caches are sized
dynamically at boot time, the static selections can be non-sensical.
This patch dynamically selects an appropriate gc threshold based on
the corresponding main routing table size, using the assumption that
we should in the worst case be able to handle as many connections as
the routing table can.

For ipv4, the maximum route cache size is 16 * the number of hash
buckets in the route cache. Given that xfrm4 starts garbage
collection at the gc_thresh and prevents new allocations at 2 *
gc_thresh, we set gc_thresh to half the maximum route cache size.

For ipv6, its a bit trickier. there is no maximum route cache size,
but the ipv6 dst_ops gc_thresh is statically set to 1024. It seems
sane to select a simmilar gc_thresh for the xfrm6 code that is half
the number of hash buckets in the v6 route cache times 16 (like the v4
code does).

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a44a4a006b860476881ec0098c36584036e1cb91 27-Jul-2009 Neil Horman <nhorman@tuxdriver.com> xfrm: export xfrm garbage collector thresholds via sysctl

Export garbage collector thresholds for xfrm[4|6]_dst_ops

Had a problem reported to me recently in which a high volume of ipsec
connections on a system began reporting ENOBUFS for new connections
eventually.

It seemed that after about 2000 connections we started being unable to
create more. A quick look revealed that the xfrm code used a dst_ops
structure that limited the gc_thresh value to 1024, and always
dropped route cache entries after 2x the gc_thresh.

It seems the most direct solution is to export the gc_thresh values in
the xfrm[4|6] dst_ops as sysctls, like the main routing table does, so
that higher volumes of connections can be supported. This patch has
been tested and allows the reporter to increase their ipsec connection
volume successfully.

Reported-by: Joe Nall <joe@nall.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

ipv4/xfrm4_policy.c | 18 ++++++++++++++++++
ipv6/xfrm6_policy.c | 18 ++++++++++++++++++
2 files changed, 36 insertions(+)
Signed-off-by: David S. Miller <davem@davemloft.net>
59cae0092e4da753b5a2adb32933e0d1b223bcc5 02-Jul-2009 Wei Yongjun <yjwei@cn.fujitsu.com> xfrm6: fix the proto and ports decode of sctp protocol

The SCTP pushed the skb above the sctp chunk header, so the
check of pskb_may_pull(skb, nh + offset + 1 - skb->data) in
_decode_session6() will never return 0 and the ports decode
of sctp will always fail. (nh + offset + 1 - skb->data < 0)

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
09640e6365c679b5642b1c41b6d7078f51689ddf 01-Feb-2009 Harvey Harrison <harvey.harrison@gmail.com> net: replace uses of __constant_{endian}

Base versions handle constant folding now.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fbda33b2b85941c1ae3a0d89522dec5c1b1bd98c 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: ->get_saddr in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c5b3cf46eabe6e7459125fc6e2033b4222665017 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: ->dst_lookup in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ddcfd79680c1dc74eb5f24aa70785c11bf7eec8f 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: dst garbage-collecting in netns

Pass netns pointer to struct xfrm_policy_afinfo::garbage_collect()

[This needs more thoughts on what to do with dst_ops]
[Currently stub to init_net]

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6bb3ce25d05f2990c8a19adaf427531430267c1f 12-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> net: remove struct dst_entry::entry_size

Unused after kmem_cache_zalloc() conversion.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7e3a42a12c4b9d99bfe81cb929cadf0e08a37c49 02-Nov-2008 Nicolas Dichtel <nicolas.dichtel@6wind.com> xfrm6: handling fragment

RFC4301 Section 7.1 says:

"7.1. Tunnel Mode SAs that Carry Initial and Non-Initial Fragments

All implementations MUST support tunnel mode SAs that are configured
to pass traffic without regard to port field (or ICMP type/code or
Mobility Header type) values. If the SA will carry traffic for
specified protocols, the selector set for the SA MUST specify the
port fields (or ICMP type/code or Mobility Header type) as ANY. An
SA defined in this fashion will carry all traffic including initial
and non-initial fragments for the indicated Local/Remote addresses
and specified Next Layer protocol(s)."

But for IPv6, fragment is treated as a protocol. This change catches
protocol transported in fragmented packet. In IPv4, there is no
problem.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
191cd582500f49b32a63040fedeebb0168c720af 15-Aug-2008 Brian Haley <brian.haley@hp.com> netns: Add network namespace argument to rt6_fill_node() and ipv6_dev_get_saddr()

ipv6_dev_get_saddr() blindly de-references dst_dev to get the network
namespace, but some callers might pass NULL. Change callers to pass a
namespace pointer instead.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c346dca10840a874240c78efe3f39acf4312a1f2 25-Mar-2008 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS.

Introduce per-net_device inlines: dev_net(), dev_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
7cbca67c073263c179f605bdbbdc565ab29d801d 25-Mar-2008 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [IPV6]: Support Source Address Selection API (RFC5014).

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
4591db4f37618f37a9f1f25d291c3c7a43a15a21 05-Mar-2008 Daniel Lezcano <dlezcano@fr.ibm.com> [NETNS][IPV6] route6 - add netns parameter to ip6_route_output

Add an netns parameter to ip6_route_output. That will allow to access
to the right routing table for outgoing traffic.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5e5f3f0f801321078c897a5de0b4b4304f234da0 03-Mar-2008 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [IPV6] ADDRCONF: Convert ipv6_get_saddr() to ipv6_dev_get_saddr().

Since most users of ipv6_get_saddr() pass non-NULL as
dst argument, use ipv6_dev_get_saddr() directly.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
e242297055f906e8e225fb95a8edbc88e9052634 31-Jan-2008 Eric Dumazet <dada1@cosmosbay.com> [NET]: should explicitely initialize atomic_t field in struct dst_ops

All but one struct dst_ops static initializations miss explicit
initialization of entries field.

As this field is atomic_t, we should use ATOMIC_INIT(0), and not
rely on atomic_t implementation.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
569d36452ee26c08523cc9f658901c5188640853 18-Jan-2008 Daniel Lezcano <dlezcano@fr.ibm.com> [NETNS][DST] dst: pass the dst_ops as parameter to the gc functions

The garbage collection function receive the dst_ops structure as
parameter. This is useful for the next incoming patchset because it
will need the dst_ops (there will be several instances) and the
network namespace pointer (contained in the dst_ops).

The protocols which do not take care of the namespaces will not be
impacted by this change (expect for the function signature), they do
just ignore the parameter.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a1b051405bc16222d92c73b0c26d65b333a154ee 21-Dec-2007 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM] IPv6: Fix dst/routing check at transformation.

IPv6 specific thing is wrongly removed from transformation at net-2.6.25.
This patch recovers it with current design.

o Update "path" of xfrm_dst since IPv6 transformation should
care about routing changes. It is required by MIPv6 and
off-link destined IPsec.
o Rename nfheader_len which is for non-fragment transformation used by
MIPv6 to rt6i_nfheader_len as IPv6 name space.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
d5422efe680fc55010c6ddca2370ca9548a96355 12-Dec-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Added xfrm_decode_session_reverse and xfrmX_policy_check_reverse

RFC 4301 requires us to relookup ICMP traffic that does not match any
policies using the reverse of its payload. This patch adds the functions
xfrm_decode_session_reverse and xfrmX_policy_check_reverse so we can get
the reverse flow to perform such a lookup.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
0013cabab30ec55830ce63d34c0bdd887eb87644 07-Dec-2007 Daniel Lezcano <dlezcano@fr.ibm.com> [IPV6]: Make xfrm6_init to return an error code.

The xfrm initialization function does not return any error code, so if
there is an error, the caller can not be advise of that. This patch
checks the return code of the different called functions in order to
return a successful or failed initialization.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
5a3e55d68ec5baac578bf32ba67607088c763657 07-Dec-2007 Denis V. Lunev <den@openvz.org> [NET]: Multiple namespaces in the all dst_ifdown routines.

Move dst entries to a namespace loopback to catch refcounting leaks.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
862b82c6f960cc61274d370aa78ce1112f92a83e 14-Nov-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Merge most of the output path

As part of the work on asynchrnous cryptographic operations, we need
to be able to resume from the spot where they occur. As such, it
helps if we isolate them to one spot.

This patch moves most of the remaining family-specific processing into
the common output code.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
25ee3286dcbc830a833354bb1d15567956844813 11-Dec-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Merge common code into xfrm_bundle_create

Half of the code in xfrm4_bundle_create and xfrm6_bundle_create are
common. This patch extracts that logic and puts it into
xfrm_bundle_create. The rest of it are then accessed through afinfo.

As a result this fixes the problem with inter-family transforms where
we treat every xfrm dst in the bundle as if it belongs to the top
family.

This patch also fixes a long-standing error-path bug where we may free
the xfrm states twice.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
66cdb3ca27323a92712d289fc5edc7841d74a139 14-Nov-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Move flow construction into xfrm_dst_lookup

This patch moves the flow construction from the callers of
xfrm_dst_lookup into that function. It also changes xfrm_dst_lookup
so that it takes an xfrm state as its argument instead of explicit
addresses.

This removes any address-specific logic from the callers of
xfrm_dst_lookup which is needed to correctly support inter-family
transforms.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
f04e7e8d7f175c05bbde3ae748bf2541da53721d 14-Nov-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Replace x->type->{local,remote}_addr with flags

The functions local_addr and remote_addr are more than what they're
needed for. The same thing can be done easily with flags on the type
object. This patch does that and simplifies the wrapper functions in
xfrm6_policy accordingly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
fff693888012806370c98c601fbaa141fb12dfca 14-Nov-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Make sure idev is consistent with dev in xfrm_dst

Previously we took the device from the bottom route and idev from the
top route. This is bad because idev may well point to a different
device. This patch changes it so that we get the idev from the device
directly.

It also makes it an error if either dev or idev is NULL. This is
consistent with the rest of the routing code which also treats these
cases as errors.

I've removed the err initialisation in xfrm6_policy.c because it
achieves no purpose and hid a bug when an initial version of this
patch neglected to set err to -ENODEV (fortunately the IPv4 version
warned about it).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
45ff5a3f9a3d0b1b4f063b5285ab39b7fac59471 14-Nov-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Set dst->input to dst_discard

The input function should never be invoked on IPsec dst objects. This
is because we don't apply IPsec on input until after we've made the
routing decision.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
8ce68ceb55fb62d2c8e9a3e94c4ef6ff3e3064ce 14-Nov-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Only set neighbour on top xfrm dst

The neighbour field is only used by dst_confirm which only ever happens on
the top-most xfrm dst. So it's a waste to duplicate for every other xfrm
dst. This patch moves its setting out of the loop so that only the top one
gets set.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
b4ce92775c2e7ff9cf79cca4e0a19c8c5fd6287b 14-Nov-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPV6]: Move nfheader_len into rt6_info

The dst member nfheader_len is only used by IPv6. It's also currently
creating a rather ugly alignment hole in struct dst. Therefore this patch
moves it from there into struct rt6_info.

It also reorders the fields in rt6_info to minimize holes.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
0148894223740da4818d7f4e6f92cbb5481a25b8 14-Nov-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPV6]: Only set nfheader_len for top xfrm dst

We only need to set nfheader_len in the top xfrm dst. This is because
we only ever read the nfheader_len from the top xfrm dst.

It is also easier to count nfheader_len as part of header_len which
then lets us remove the ugly wrapper functions for incrementing and
decrementing header lengths in xfrm6_policy.c.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
1df2e44560c0d72f381126e52a3ba53614c1c484 08-Dec-2007 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [IPV6] XFRM: Fix auditing rt6i_flags; use RTF_xxx flags instead of RTCF_xxx.

RTCF_xxx flags, defined in include/linux/in_route.h) are available for
IPv4 route (rtable) entries only. Use RTF_xxx flags instead, defined
in include/linux/ipv6_route.h, for IPv6 route entries (rt6_info).

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
13996378e6585fb25e582afe7489bf52dde78deb 18-Oct-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Rename mode to outer_mode and add inner_mode

This patch adds a new field to xfrm states called inner_mode. The existing
mode object is renamed to outer_mode.

This is the first part of an attempt to fix inter-family transforms. As it
is we always use the outer family when determining which mode to use. As a
result we may end up shoving IPv4 packets into netfilter6 and vice versa.

What we really want is to use the inner family for the first part of outbound
processing and the outer family for the second part. For inbound processing
we'd use the opposite pairing.

I've also added a check to prevent silly combinations such as transport mode
with inter-family transforms.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
17c2a42a24e1e8dd6aa7cea4f84e034ab1bfff31 18-Oct-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Store afinfo pointer in xfrm_mode

It is convenient to have a pointer from xfrm_state to address-specific
functions such as the output function for a family. Currently the
address-specific policy code calls out to the xfrm state code to get
those pointers when we could get it in an easier way via the state
itself.

This patch adds an xfrm_state_afinfo to xfrm_mode (since they're
address-specific) and changes the policy code to use it. I've also
added an owner field to do reference counting on the module providing
the afinfo even though it isn't strictly necessary today since IPv6
can't be unloaded yet.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
1bfcb10f670f5ff5e1d9f53e59680573524cb142 18-Oct-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Add missing BEET checks

Currently BEET mode does not reinject the packet back into the stack
like tunnel mode does. Since BEET should behave just like tunnel mode
this is incorrect.

This patch fixes this by introducing a flags field to xfrm_mode that
tells the IPsec code whether it should terminate and reinject the packet
back into the stack.

It then sets the flag for BEET and tunnel mode.

I've also added a number of missing BEET checks elsewhere where we check
whether a given mode is a tunnel or not.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2774c7aba6c97a2535be3309a2209770953780b3 27-Sep-2007 Eric W. Biederman <ebiederm@xmission.com> [NET]: Make the loopback device per network namespace.

This patch makes loopback_dev per network namespace. Adding
code to create a different loopback device for each network
namespace and adding the code to free a loopback device
when a network namespace exits.

This patch modifies all users the loopback_dev so they
access it as init_net.loopback_dev, keeping all of the
code compiling and working. A later pass will be needed to
update the users to use something other than the initial network
namespace.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
de3cb747ffac5f2a4a6bb156e7e2fd5229e688e5 26-Sep-2007 Daniel Lezcano <dlezcano@fr.ibm.com> [NET]: Dynamically allocate the loopback device, part 1.

This patch replaces all occurences to the static variable
loopback_dev to a pointer loopback_dev. That provides the
mindless, trivial, uninteressting change part for the dynamic
allocation for the loopback.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-By: Kirill Korotaev <dev@sw.ru>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
59fbb3a61e02deaeaa4fb50792217921f3002d64 27-Jun-2007 Masahide NAKAMURA <nakam@linux-ipv6.org> [IPV6] MIP6: Loadable module support for MIPv6.

This patch makes MIPv6 loadable module named "mip6".

Here is a modprobe.conf(5) example to load it automatically
when user application uses XFRM state for MIPv6:

alias xfrm-type-10-43 mip6
alias xfrm-type-10-60 mip6

Some MIPv6 feature is not included by this modular, however,
it should not be affected to other features like either IPsec
or IPv6 with and without the patch.
We may discuss XFRM, MH (RAW socket) and ancillary data/sockopt
separately for future work.

Loadable features:
* MH receiving check (to send ICMP error back)
* RO header parsing and building (i.e. RH2 and HAO in DSTOPTS)
* XFRM policy/state database handling for RO

These are NOT covered as loadable:
* Home Address flags and its rule on source address selection
* XFRM sub policy (depends on its own kernel option)
* XFRM functions to receive RO as IPv6 extension header
* MH sending/receiving through raw socket if user application
opens it (since raw socket allows to do so)
* RH2 sending as ancillary data
* RH2 operation with setsockopt(2)

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3ff50b7997fe06cd5d276b229967bb52d6b3b6c1 21-Apr-2007 Stephen Hemminger <shemminger@linux-foundation.org> [NET]: cleanup extra semicolons

Spring cleaning time...

There seems to be a lot of places in the network code that have
extra bogus semicolons after conditionals. Most commonly is a
bogus semicolon after: switch() { }

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
cfe1fc7759fdacb0c650b575daed1692bf3eaece 16-Mar-2007 Arnaldo Carvalho de Melo <acme@redhat.com> [SK_BUFF]: Introduce skb_network_header_len

For the common sequence "skb->h.raw - skb->nh.raw", similar to skb->mac_len,
that is precalculated tho, don't think we need to bloat skb with one more
member, so just use this new helper, reducing the number of non-skbuff.h
references to the layer headers even more.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0660e03f6b18f19b6bbafe7583265a51b90daf36 26-Apr-2007 Arnaldo Carvalho de Melo <acme@redhat.com> [SK_BUFF]: Introduce ipv6_hdr(), remove skb->nh.ipv6h

Now the skb->nh union has just one member, .raw, i.e. it is just like the
skb->mac union, strange, no? I'm just leaving it like that till the transport
layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or
->mac_header_offset?), ditto for ->{h,nh}.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d56f90a7c96da5187f0cdf07ee7434fe6aa78bbc 11-Apr-2007 Arnaldo Carvalho de Melo <acme@redhat.com> [SK_BUFF]: Introduce skb_network_header()

For the places where we need a pointer to the network header, it is still legal
to touch skb->nh.raw directly if just adding to, subtracting from or setting it
to another layer header.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d3f23dfe8bbb6bf352a208755e4ff2806315067b 20-Feb-2007 Noriaki TAKAMIYA <takamiya@po.ntts.co.jp> [IPSEC]: More fix is needed for __xfrm6_bundle_create().

Fixed to set fl_tunnel.fl6_src correctly in xfrm6_bundle_create().

Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Acked-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
bda390d5c883d5dff1f3ae2bade4c25869769894 10-Feb-2007 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM] IPV6: Fix outbound RO transformation which is broken by IPsec tunnel patch.

It seems to miss RO mode path by IPv6 over IPv4 IPsec tunnel patch
when it changed semantics to check the mode from
"xfrm[i]->props.mode != XFRM_MODE_TRANSPORT" to
"xfrm[i]->props.mode == XFRM_MODE_TUNNEL" before changing address.
It also makes two incline functions __xfrm6_bundle_addr_{remote,local}
are used by nobody.

This patch fixes it.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
1ab1457c42bc078e5a9becd82a7f9f940b55c53a 09-Feb-2007 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [NET] IPV6: Fix whitespace errors.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
c82f963efe823d3cacaf1f1b7f1a35cc9628b188 06-Feb-2007 Miika Komu <miika@iki.fi> [IPSEC]: IPv6 over IPv4 IPsec tunnel

This is the patch to support IPv6 over IPv4 IPsec

Signed-off-by: Miika Komu <miika@iki.fi>
Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi>
Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ba4e58eca8aa9473b44fdfd312f26c4a2e7798b3 27-Nov-2006 Gerrit Renker <gerrit@erg.abdn.ac.uk> [NET]: Supporting UDP-Lite (RFC 3828) in Linux

This is a revision of the previously submitted patch, which alters
the way files are organized and compiled in the following manner:

* UDP and UDP-Lite now use separate object files
* source file dependencies resolved via header files
net/ipv{4,6}/udp_impl.h
* order of inclusion files in udp.c/udplite.c adapted
accordingly

[NET/IPv4]: Support for the UDP-Lite protocol (RFC 3828)

This patch adds support for UDP-Lite to the IPv4 stack, provided as an
extension to the existing UDPv4 code:
* generic routines are all located in net/ipv4/udp.c
* UDP-Lite specific routines are in net/ipv4/udplite.c
* MIB/statistics support in /proc/net/snmp and /proc/net/udplite
* shared API with extensions for partial checksum coverage

[NET/IPv6]: Extension for UDP-Lite over IPv6

It extends the existing UDPv6 code base with support for UDP-Lite
in the same manner as per UDPv4. In particular,
* UDPv6 generic and shared code is in net/ipv6/udp.c
* UDP-Litev6 specific extensions are in net/ipv6/udplite.c
* MIB/statistics support in /proc/net/snmp6 and /proc/net/udplite6
* support for IPV6_ADDRFORM
* aligned the coding style of protocol initialisation with af_inet6.c
* made the error handling in udpv6_queue_rcv_skb consistent;
to return `-1' on error on all error cases
* consolidation of shared code

[NET]: UDP-Lite Documentation and basic XFRM/Netfilter support

The UDP-Lite patch further provides
* API documentation for UDP-Lite
* basic xfrm support
* basic netfilter support for IPv4 and IPv6 (LOG target)

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
8c689a6eae2d83970e4f34753d513e96fb97a025 08-Nov-2006 Al Viro <viro@zeniv.linux.org.uk> [XFRM]: misc annotations

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
4251320fa2ef93207fbefeb2eda2d265b84fc116 17-Oct-2006 Ville Nuorvala <vnuorval@tcs.hut.fi> [IPV6]: Make sure error handling is done when calling ip6_route_output().

As ip6_route_output() never returns NULL, error checking must be done by
looking at dst->error in stead of comparing dst against NULL.

Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
5b368e61c2bcb2666bb66e2acf1d6d85ba6f474d 05-Oct-2006 Venkat Yekkirala <vyekkirala@trustedcs.com> IPsec: correct semantics for SELinux policy matching

Currently when an IPSec policy rule doesn't specify a security
context, it is assumed to be "unlabeled" by SELinux, and so
the IPSec policy rule fails to match to a flow that it would
otherwise match to, unless one has explicitly added an SELinux
policy rule allowing the flow to "polmatch" to the "unlabeled"
IPSec policy rules. In the absence of such an explicitly added
SELinux policy rule, the IPSec policy rule fails to match and
so the packet(s) flow in clear text without the otherwise applicable
xfrm(s) applied.

The above SELinux behavior violates the SELinux security notion of
"deny by default" which should actually translate to "encrypt by
default" in the above case.

This was first reported by Evgeniy Polyakov and the way James Morris
was seeing the problem was when connecting via IPsec to a
confined service on an SELinux box (vsftpd), which did not have the
appropriate SELinux policy permissions to send packets via IPsec.

With this patch applied, SELinux "polmatching" of flows Vs. IPSec
policy rules will only come into play when there's a explicit context
specified for the IPSec policy rule (which also means there's corresponding
SELinux policy allowing appropriate domains/flows to polmatch to this context).

Secondly, when a security module is loaded (in this case, SELinux), the
security_xfrm_policy_lookup() hook can return errors other than access denied,
such as -EINVAL. We were not handling that correctly, and in fact
inverting the return logic and propagating a false "ok" back up to
xfrm_lookup(), which then allowed packets to pass as if they were not
associated with an xfrm policy.

The solution for this is to first ensure that errno values are
correctly propagated all the way back up through the various call chains
from security_xfrm_policy_lookup(), and handled correctly.

Then, flow_cache_lookup() is modified, so that if the policy resolver
fails (typically a permission denied via the security module), the flow
cache entry is killed rather than having a null policy assigned (which
indicates that the packet can pass freely). This also forces any future
lookups for the same flow to consult the security module (e.g. SELinux)
for current security policy (rather than, say, caching the error on the
flow cache entry).

This patch: Fix the selinux side of things.

This makes sure SELinux polmatching of flow contexts to IPSec policy
rules comes into play only when an explicit context is associated
with the IPSec policy rule.

Also, this no longer defaults the context of a socket policy to
the context of the socket since the "no explicit context" case
is now handled properly.

Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: James Morris <jmorris@namei.org>
a1e59abf824969554b90facd44a4ab16e265afa4 19-Sep-2006 Patrick McHardy <kaber@trash.net> [XFRM]: Fix wildcard as tunnel source

Hashing SAs by source address breaks templates with wildcards as tunnel
source since the source address used for hashing/lookup is still 0/0.
Move source address lookup to xfrm_tmpl_resolve_one() so we can use the
real address in the lookup.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
9d4a706d852411154d0c91b9ffb3bec68b94b25c 24-Aug-2006 David S. Miller <davem@sunset.davemloft.net> [XFRM]: Add generation count to xfrm_state and xfrm_dst.

Each xfrm_state inserted gets a new generation counter
value. When a bundle is created, the xfrm_dst objects
get the current generation counter of the xfrm_state
they will attach to at dst->xfrm.

xfrm_bundle_ok() will return false if it sees an
xfrm_dst with a generation count different from the
generation count of the xfrm_state that dst points to.

This provides a facility by which to passively and
cheaply invalidate cached IPSEC routes during SA
database changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2ce4272a699c731b9736d76126dc742353e381db 24-Aug-2006 Masahide NAKAMURA <nakam@linux-ipv6.org> [IPV6] MIP6: Transformation support mobility header.

Transformation support mobility header.
Based on MIPL2 kernel patch.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
e53820de0f81da1429048634cadc6ef5f50c2f8b 24-Aug-2006 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM] IPV6: Restrict bundle reusing

For outbound transformation, bundle is checked whether it is
suitable for current flow to be reused or not. In such IPv6 case
as below, transformation may apply incorrect bundle for the flow instead
of creating another bundle:

- The policy selector has destination prefix length < 128
(Two or more addresses can be matched it)
- Its bundle holds dst entry of default route whose prefix length < 128
(Previous traffic was used such route as next hop)
- The policy and the bundle were used a transport mode state and
this time flow address is not matched the bundled state.

This issue is found by Mobile IPv6 usage to protect mobility signaling
by IPsec, but it is not a Mobile IPv6 specific.
This patch adds strict check to xfrm_bundle_ok() for each
state mode and address when prefix length is less than 128.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
1b5c229987dc4d0c92a38fac0cde2aeec08cd775 24-Aug-2006 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM] STATE: Support non-fragment outbound transformation headers.

For originated outbound IPv6 packets which will fragment, ip6_append_data()
should know length of extension headers before sending them and
the length is carried by dst_entry.
IPv6 IPsec headers fragment then transformation was
designed to place all headers after fragment header.
OTOH Mobile IPv6 extension headers do not fragment then
it is a good idea to make dst_entry have non-fragment length to tell it
to ip6_append_data().

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
99505a843673faeae962a8cde128c7c034ba6b5e 24-Aug-2006 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM] STATE: Add a hook to obtain local/remote outbound address.

Outbound transformation replaces both source and destination address with
state's end-point addresses at the same time when IPsec tunnel mode.
It is also required to change them for Mobile IPv6 route optimization, but we
should care about the following differences:
- changing result is not end-point but care-of address
- either source or destination is replaced for each state
This hook is a common platform to change outbound address.
Based on MIPL2 kernel patch.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7e49e6de30efa716614e280d97963c570f3acf29 23-Sep-2006 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM]: Add XFRM_MODE_xxx for future use.

Transformation mode is used as either IPsec transport or tunnel.
It is required to add two more items, route optimization and inbound trigger
for Mobile IPv6.
Based on MIPL2 kernel patch.

This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi>

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6ab3d5624e172c553004ecc862bfeac16d9d68b7 30-Jun-2006 Jörn Engel <joern@wohnheim.fh-wedel.de> Remove obsolete #include <linux/config.h>

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
546be2405be119ef55467aace45f337a16e5d424 28-May-2006 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC] xfrm: Undo afinfo lock proliferation

The number of locks used to manage afinfo structures can easily be reduced
down to one each for policy and state respectively. This is based on the
observation that the write locks are only held by module insertion/removal
which are very rare events so there is no need to further differentiate
between the insertion of modules like ipv6 versus esp6.

The removal of the read locks in xfrm4_policy.c/xfrm6_policy.c might look
suspicious at first. However, after you realise that nobody ever takes
the corresponding write lock you'll feel better :)

As far as I can gather it's an attempt to guard against the removal of
the corresponding modules. Since neither module can be unloaded at all
we can leave it to whoever fixes up IPv6 unloading :)

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
e5d25a90886d62d88fdd7cd5c3375f4fe436be64 18-Apr-2006 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [IPV6] XFRM: Fix decoding session with preceding extension header(s).

We did not correctly decode session with preceding extension
header(s). This was because we had already pulled preceding
headers, skb->nh.raw + 40 + 1 - skb->data was minus, and
pskb_may_pull() failed.

We now have IP6CB(skb)->nhoff and skb->h.raw, and we can
start parsing / decoding upper layer protocol from current
position.

Tracked down by Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
and tested by Kazunori Miyazawa <kazunori@miyazawa.org>.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
e3cae904d7df4f86ea1d13d459e667d389cc35e3 18-Apr-2006 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [IPV6] XFRM: Don't use old copy of pointer after pskb_may_pull().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
1b8623545b42c03eb92e51b28c84acf4b8ba00a3 15-Dec-2005 Al Viro <viro@zeniv.linux.org.uk> [PATCH] remove bogus asm/bug.h includes.

A bunch of asm/bug.h includes are both not needed (since it will get
pulled anyway) and bogus (since they are done too early). Removed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9e999993c71e1506378d26d81f842277aff8a250 19-Dec-2005 Patrick McHardy <kaber@trash.net> [XFRM]: Handle DCCP in xfrm{4,6}_decode_session

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
92d63decc0b6a5d600f792fcf5f3ff9718c09a3d 26-May-2005 Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> From: Kazunori Miyazawa <kazunori@miyazawa.org>

[XFRM] Call dst_check() with appropriate cookie

This fixes infinite loop issue with IPv6 tunnel mode.

Signed-off-by: Kazunori Miyazawa <kazunori@miyazawa.org>
Signed-off-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
aabc9761b69f1bfa30a78f7005be95cc9cc06175 04-May-2005 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Store idev entries

I found a bug that stopped IPsec/IPv6 from working. About
a month ago IPv6 started using rt6i_idev->dev on the cached socket dst
entries. If the cached socket dst entry is IPsec, then rt6i_idev will
be NULL.

Since we want to look at the rt6i_idev of the original route in this
case, the easiest fix is to store rt6i_idev in the IPsec dst entry just
as we do for a number of other IPv6 route attributes. Unfortunately
this means that we need some new code to handle the references to
rt6i_idev. That's why this patch is bigger than it would otherwise be.

I've also done the same thing for IPv4 since it is conceivable that
once these idev attributes start getting used for accounting, we
probably need to dereference them for IPv4 IPsec entries too.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 17-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org> Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!