History log of /net/xfrm/xfrm_policy.c
Revision Date Author Comments
e4c1721642bbd42d8142f4811cde0588c28db51d 29-May-2013 Paul Moore <pmoore@redhat.com> xfrm: force a garbage collection after deleting a policy

In some cases after deleting a policy from the SPD the policy would
remain in the dst/flow/route cache for an extended period of time
which caused problems for SELinux as its dynamic network access
controls key off of the number of XFRM policy and state entries.
This patch corrects this problem by forcing a XFRM garbage collection
whenever a policy is sucessfully removed.

Reported-by: Ondrej Moris <omoris@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b5fb82c48b5898c50a9cf75fc957911b56fe1dc5 19-Mar-2013 Baker Zhang <baker.kernel@gmail.com> xfrm: use xfrm direction when lookup policy

because xfrm policy direction has same value with corresponding
flow direction, so this problem is covered.

In xfrm_lookup and __xfrm_policy_check, flow_cache_lookup is used to
accelerate the lookup.

Flow direction is given to flow_cache_lookup by policy_to_flow_dir.

When the flow cache is mismatched, callback 'resolver' is called.

'resolver' requires xfrm direction,
so convert direction back to xfrm direction.

Signed-off-by: Baker Zhang <baker.zhang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b67bfe0d42cac56c512dd5da4b1b347a23f4b70a 28-Feb-2013 Sasha Levin <sasha.levin@oracle.com> hlist: drop the node parameter from iterators

I'm not sure why, but the hlist for each entry iterators were conceived

list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7cb8a93968e395e40a72a50da0b6114e752304b4 11-Feb-2013 Steffen Klassert <steffen.klassert@secunet.com> xfrm: Allow inserting policies with matching mark and different priorities

We currently can not insert policies with mark and mask
such that some flows would be matched from both policies.
We make this possible when the priority of these policies
are different. If both policies match a flow, the one with
the higher priority is used.

Reported-by: Emmanuel Thierry <emmanuel.thierry@telecom-bretagne.eu>
Reported-by: Romain Kuntz <r.kuntz@ipflavors.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
a0073fe18e718a1c815fe8b0120f1ac3c60284ba 05-Feb-2013 Steffen Klassert <steffen.klassert@secunet.com> xfrm: Add a state resolution packet queue

As the default, we blackhole packets until the key manager resolves
the states. This patch implements a packet queue where IPsec packets
are queued until the states are resolved. We generate a dummy xfrm
bundle, the output routine of the returned route enqueues the packet
to a per policy queue and arms a timer that checks for state resolution
when dst_output() is called. Once the states are resolved, the packets
are sent out of the queue. If the states are not resolved after some
time, the queue is flushed.

This patch keeps the defaut behaviour to blackhole packets as long
as we have no states. To enable the packet queue the sysctl
xfrm_larval_drop must be switched off.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
70e94e66aec255aff276397f5ed3f3626c548f1c 29-Jan-2013 YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org> xfrm: Convert xfrm_addr_cmp() to boolean xfrm_addr_equal().

All users of xfrm_addr_cmp() use its result as boolean.
Introduce xfrm_addr_equal() (which is equal to !xfrm_addr_cmp())
and convert all users.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5b653b2a1c3b5634368fde2df958a1398481e580 18-Jan-2013 Michal Kubecek <mkubecek@suse.cz> xfrm: fix freed block size calculation in xfrm_policy_fini()

Missing multiplication of block size by sizeof(struct hlist_head)
can cause xfrm_hash_free() to be called with wrong second argument
so that kfree() is called on a block allocated with vzalloc() or
__get_free_pages() or free_pages() is called with wrong order when
a namespace with enough policies is removed.

Bug introduced by commit a35f6c5d, i.e. versions >= 2.6.29 are
affected.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
433a19548061bb5457b6ab77ed7ea58ca6e43ddb 18-Sep-2012 Li RongQing <roy.qing.li@gmail.com> xfrm: fix a read lock imbalance in make_blackhole

if xfrm_policy_get_afinfo returns 0, it has already released the read
lock, xfrm_policy_put_afinfo should not be called again.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ee8372dd1989287c5eedb69d44bac43f69e496f1 11-Sep-2012 Nicolas Dichtel <nicolas.dichtel@6wind.com> xfrm: invalidate dst on policy insertion/deletion

When a policy is inserted or deleted, all dst should be recalculated.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e1760bd5ffae8cb98cffb030ee8e631eba28f3d8 11-Sep-2012 Eric W. Biederman <ebiederm@xmission.com> userns: Convert the audit loginuid to be a kuid

Always store audit loginuids in type kuid_t.

Print loginuids by converting them into uids in the appropriate user
namespace, and then printing the resulting uid.

Modify audit_get_loginuid to return a kuid_t.

Modify audit_set_loginuid to take a kuid_t.

Modify /proc/<pid>/loginuid on read to convert the loginuid into the
user namespace of the opener of the file.

Modify /proc/<pid>/loginud on write to convert the loginuid
rom the user namespace of the opener of the file.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: Paul Moore <paul@paul-moore.com> ?
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
ef8531b64c3e2443da52e9f05d74a988230eedc5 19-Aug-2012 Eric Dumazet <edumazet@google.com> xfrm: fix RCU bugs

This patch reverts commit 56892261ed1a (xfrm: Use rcu_dereference_bh to
deference pointer protected by rcu_read_lock_bh), and fixes bugs
introduced in commit 418a99ac6ad ( Replace rwlock on xfrm_policy_afinfo
with rcu )

1) We properly use RCU variant in this file, not a mix of RCU/RCU_BH

2) We must defer some writes after the synchronize_rcu() call or a reader
can crash dereferencing NULL pointer.

3) Now we use the xfrm_policy_afinfo_lock spinlock only from process
context, we no longer need to block BH in xfrm_policy_register_afinfo()
and xfrm_policy_unregister_afinfo()

4) Can use RCU_INIT_POINTER() instead of rcu_assign_pointer() in
xfrm_policy_unregister_afinfo()

5) Remove a forward inline declaration (xfrm_policy_put_afinfo()),
and also move xfrm_policy_get_afinfo() declaration.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Fan Du <fan.du@windriver.com>
Cc: Priyanka Jain <Priyanka.Jain@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9d7b0fc1ef1f17aff57c0dc9a59453d8fca255c3 20-Aug-2012 Patrick McHardy <kaber@trash.net> net: ipv6: fix oops in inet_putpeer()

Commit 97bab73f (inet: Hide route peer accesses behind helpers.) introduced
a bug in xfrm6_policy_destroy(). The xfrm_dst's _rt6i_peer member is not
initialized, causing a false positive result from inetpeer_ptr_is_peer(),
which in turn causes a NULL pointer dereference in inet_putpeer().

Pid: 314, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #17 To Be Filled By O.E.M. To Be Filled By O.E.M./P4S800D-X
EIP: 0060:[<c03abf93>] EFLAGS: 00010246 CPU: 0
EIP is at inet_putpeer+0xe/0x16
EAX: 00000000 EBX: f3481700 ECX: 00000000 EDX: 000dd641
ESI: f3481700 EDI: c05e949c EBP: f551def4 ESP: f551def4
DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 00000070 CR3: 3243d000 CR4: 00000750
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
f551df04 c0423de1 00000000 f3481700 f551df18 c038d5f7 f254b9f8 f551df28
f34f85d8 f551df20 c03ef48d f551df3c c0396870 f30697e8 f24e1738 c05e98f4
f5509540 c05cd2b4 f551df7c c0142d2b c043feb5 f5509540 00000000 c05cd2e8
[<c0423de1>] xfrm6_dst_destroy+0x42/0xdb
[<c038d5f7>] dst_destroy+0x1d/0xa4
[<c03ef48d>] xfrm_bundle_flo_delete+0x2b/0x36
[<c0396870>] flow_cache_gc_task+0x85/0x9f
[<c0142d2b>] process_one_work+0x122/0x441
[<c043feb5>] ? apic_timer_interrupt+0x31/0x38
[<c03967eb>] ? flow_cache_new_hashrnd+0x2b/0x2b
[<c0143e2d>] worker_thread+0x113/0x3cc

Fix by adding a init_dst() callback to struct xfrm_policy_afinfo to
properly initialize the dst's peer pointer.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
56892261ed1a854db5363df8bb3fbdb2c6c28d4c 16-Aug-2012 Fan Du <fan.du@windriver.com> xfrm: Use rcu_dereference_bh to deference pointer protected by rcu_read_lock_bh

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
418a99ac6ad487dc9c42e6b0e85f941af56330f2 12-Aug-2012 Priyanka Jain <Priyanka.Jain@freescale.com> Replace rwlock on xfrm_policy_afinfo with rcu

xfrm_policy_afinfo is read mosly data structure.
Write on xfrm_policy_afinfo is done only at the
time of configuration.
So rwlocks can be safely replaced with RCU.

RCUs usage optimizes the performance.

Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f5b0a8743601a4477419171f5046bd07d1c080a0 19-Jul-2012 David S. Miller <davem@davemloft.net> net: Document dst->obsolete better.

Add a big comment explaining how the field works, and use defines
instead of magic constants for the values assigned to it.

Suggested by Joe Perches.

Signed-off-by: David S. Miller <davem@davemloft.net>
141e369de698f2e17bf716b83fcc647ddcb2220c 06-Jul-2012 Steffen Klassert <steffen.klassert@secunet.com> xfrm: Initialize the struct xfrm_dst behind the dst_enty field

We start initializing the struct xfrm_dst at the first field
behind the struct dst_enty. This is error prone because it
might leave a new field uninitialized. So start initializing
the struct xfrm_dst right behind the dst_entry.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d1e31fb02b31ba88d5650d97c35eb58f52bfe0e1 03-Jul-2012 David S. Miller <davem@davemloft.net> xfrm: No need to copy generic neighbour pointer.

Nobody reads it any longer.

Signed-off-by: David S. Miller <davem@davemloft.net>
f894cbf847c9bea1955095bf37aca6c050553167 03-Jul-2012 David S. Miller <davem@davemloft.net> net: Add optional SKB arg to dst_ops->neigh_lookup().

Causes the handler to use the daddr in the ipv4/ipv6 header when
the route gateway is unspecified (local subnet).

Signed-off-by: David S. Miller <davem@davemloft.net>
0c1833797a5a6ec23ea9261d979aa18078720b74 26-May-2012 Gao feng <gaofeng@cn.fujitsu.com> ipv6: fix incorrect ipsec fragment

Since commit ad0081e43a
"ipv6: Fragment locally generated tunnel-mode IPSec6 packets as needed"
the fragment of packets is incorrect.
because tunnel mode needs IPsec headers and trailer for all fragments,
while on transport mode it is sufficient to add the headers to the
first fragment and the trailer to the last.

so modify mtu and maxfraglen base on ipsec mode and if fragment is first
or last.

with my test,it work well(every fragment's size is the mtu)
and does not trigger slow fragment path.

Changes from v1:
though optimization, mtu_prev and maxfraglen_prev can be delete.
replace xfrm mode codes with dst_entry's new frag DST_XFRM_TUNNEL.
add fuction ip6_append_data_mtu to make codes clearer.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bc9b35ad41387379e0b1257b3171da0dca73562d 15-May-2012 David S. Miller <davem@davemloft.net> xfrm: Convert several xfrm policy match functions to bool.

xfrm_selector_match
xfrm_sec_ctx_match
__xfrm4_selector_match
__xfrm6_selector_match

Signed-off-by: David S. Miller <davem@davemloft.net>
6ce74ec75ca690c4fb3a3c5f8b7767d094d93215 16-Feb-2012 Eric Paris <eparis@redhat.com> SELinux: include flow.h where used rather than get it indirectly

We use flow_cache_genid in the selinux xfrm files. This is declared in
net/flow.h However we do not include that file directly anywhere. We have
always just gotten it through a long chain of indirect .h file includes.

on x86_64:

CC security/selinux/ss/services.o
In file included from
/next/linux-next-20120216/security/selinux/ss/services.c:69:0:
/next/linux-next-20120216/security/selinux/include/xfrm.h: In function 'selinux_xfrm_notify_policyload':
/next/linux-next-20120216/security/selinux/include/xfrm.h:51:14: error: 'flow_cache_genid' undeclared (first use in this function)
/next/linux-next-20120216/security/selinux/include/xfrm.h:51:14: note: each undeclared identifier is reported only once for each function it appears in
make[3]: *** [security/selinux/ss/services.o] Error 1

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Eric Paris <eparis@redhat.com>
c0ed1c14a72ca9ebacd51fb94a8aca488b0d361e 21-Dec-2011 Steffen Klassert <steffen.klassert@secunet.com> net: Add a flow_cache_flush_deferred function

flow_cach_flush() might sleep but can be called from
atomic context via the xfrm garbage collector. So add
a flow_cache_flush_deferred() function and use this if
the xfrm garbage colector is invoked from within the
packet path.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
dfd56b8b38fff3586f36232db58e1e9f7885a605 10-Dec-2011 Eric Dumazet <eric.dumazet@gmail.com> net: use IS_ENABLED(CONFIG_IPV6)

Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2721745501a26d0dc3b88c0d2f3aa11471891388 02-Dec-2011 David Miller <davem@davemloft.net> net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}.

To reflect the fact that a refrence is not obtained to the
resulting neighbour entry.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Roland Dreier <roland@purestorage.com>
618f9bc74a039da76fa027ac2600c5b785b964c5 23-Nov-2011 Steffen Klassert <steffen.klassert@secunet.com> net: Move mtu handling down to the protocol depended handlers

We move all mtu handling from dst_mtu() down to the protocol
layer. So each protocol can implement the mtu handling in
a different manner.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ebb762f27fed083cb993a0816393aba4615f6544 23-Nov-2011 Steffen Klassert <steffen.klassert@secunet.com> net: Rename the dst_opt default_mtu method to mtu

We plan to invoke the dst_opt->default_mtu() method unconditioally
from dst_mtu(). So rename the method to dst_opt->mtu() to match
the name with the new meaning.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
26bff940dd975499c6c47438d4395d7d215911e8 22-Nov-2011 Alexey Dobriyan <adobriyan@gmail.com> xfrm: optimize ipv4 selector matching

Current addr_match() is errh, under-optimized.

Compiler doesn't know that memcmp() branch doesn't trigger for IPv4.
Also, pass addresses by value -- they fit into register.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d4cae56219755ccf8acfc8e2c1927009ff29d8c6 26-Sep-2011 Madalin Bucur <madalin.bucur@freescale.com> net: check return value for dst_alloc

return value of dst_alloc must be checked before use

Signed-off-by: Madalin Bucur <madalin.bucur@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d3aaeb38c40e5a6c08dd31a1b64da65c4352be36 18-Jul-2011 David S. Miller <davem@davemloft.net> net: Add ->neigh_lookup() operation to dst_ops

In the future dst entries will be neigh-less. In that environment we
need to have an easy transition point for current users of
dst->neighbour outside of the packet output fast path.

Signed-off-by: David S. Miller <davem@davemloft.net>
69cce1d1404968f78b177a0314f5822d5afdbbfb 18-Jul-2011 David S. Miller <davem@davemloft.net> net: Abstract dst->neighbour accesses behind helpers.

dst_{get,set}_neighbour()

Signed-off-by: David S. Miller <davem@davemloft.net>
12fdb4d3babcde43834c54dee22a69bb73adbae7 30-Jun-2011 Steffen Klassert <steffen.klassert@secunet.com> xfrm: Remove family arg from xfrm_bundle_ok

The family arg is not used any more, so remove it.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
43a4dea4c9d44baae38ddc14b9b6d86fde4c8b88 09-May-2011 Steffen Klassert <steffen.klassert@secunet.com> xfrm: Assign the inner mode output function to the dst entry

As it is, we assign the outer modes output function to the dst entry
when we create the xfrm bundle. This leads to two problems on interfamily
scenarios. We might insert ipv4 packets into ip6_fragment when called
from xfrm6_output. The system crashes if we try to fragment an ipv4
packet with ip6_fragment. This issue was introduced with git commit
ad0081e4 (ipv6: Fragment locally generated tunnel-mode IPSec6 packets
as needed). The second issue is, that we might insert ipv4 packets in
netfilter6 and vice versa on interfamily scenarios.

With this patch we assign the inner mode output function to the dst entry
when we create the xfrm bundle. So xfrm4_output/xfrm6_output from the inner
mode is used and the right fragmentation and netfilter functions are called.
We switch then to outer mode with the output_finish functions.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cf91166223772ef4a2ed98b9874958bf6a2470df 28-Apr-2011 David S. Miller <davem@davemloft.net> net: Use non-zero allocations in dst_alloc().

Make dst_alloc() and it's users explicitly initialize the entire
entry.

The zero'ing done by kmem_cache_zalloc() was almost entirely
redundant.

Signed-off-by: David S. Miller <davem@davemloft.net>
5c1e6aa300a7a669dc469d2dcb20172c6bd8fed9 28-Apr-2011 David S. Miller <davem@davemloft.net> net: Make dst_alloc() take more explicit initializations.

Now the dst->dev, dev->obsolete, and dst->flags values can
be specified as well.

Signed-off-by: David S. Miller <davem@davemloft.net>
fbd5060875d25f7764fd1c3d35b83a8ed1d88d7b 15-Mar-2011 Steffen Klassert <steffen.klassert@secunet.com> xfrm: Refcount destination entry on xfrm_lookup

We return a destination entry without refcount if a socket
policy is found in xfrm_lookup. This triggers a warning on
a negative refcount when freeeing this dst entry. So take
a refcount in this case to fix it.

This refcount was forgotten when xfrm changed to cache bundles
instead of policies for outgoing flows.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
7313714775a6411402f63261c05fbb4ee3d5b64a 15-Mar-2011 Eric Dumazet <eric.dumazet@gmail.com> xfrm: fix __xfrm_route_forward()

This function should return 0 in case of error, 1 if OK
commit 452edd598f60522 (xfrm: Return dst directly from xfrm_lookup())
got it wrong.

Reported-and-bisected-by: Michael Smith <msmith@cbnco.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7e1dc7b6f709dfc1a9ab4b320dbe723f45992693 12-Mar-2011 David S. Miller <davem@davemloft.net> net: Use flowi4 and flowi6 in xfrm layer.

Signed-off-by: David S. Miller <davem@davemloft.net>
56bb8059e1a8bf291054c26367564dc302f6fd8f 12-Mar-2011 David S. Miller <davem@davemloft.net> net: Break struct flowi out into AF specific instances.

Now we have struct flowi4, flowi6, and flowidn for each address
family. And struct flowi is just a union of them all.

It might have been troublesome to convert flow_cache_uli_match() but
as it turns out this function is completely unused and therefore can
be simply removed.

Signed-off-by: David S. Miller <davem@davemloft.net>
6281dcc94a96bd73017b2baa8fa83925405109ef 12-Mar-2011 David S. Miller <davem@davemloft.net> net: Make flowi ports AF dependent.

Create two sets of port member accessors, one set prefixed by fl4_*
and the other prefixed by fl6_*

This will let us to create AF optimal flow instances.

It will work because every context in which we access the ports,
we have to be fully aware of which AF the flowi is anyways.

Signed-off-by: David S. Miller <davem@davemloft.net>
1d28f42c1bd4bb2363d88df74d0128b4da135b4a 12-Mar-2011 David S. Miller <davem@davemloft.net> net: Put flowi_* prefix on AF independent members of struct flowi

I intend to turn struct flowi into a union of AF specific flowi
structs. There will be a common structure that each variant includes
first, much like struct sock_common.

This is the first step to move in that direction.

Signed-off-by: David S. Miller <davem@davemloft.net>
ca116922afa8cc5ad46b00c0a637b1cde5ca478a 11-Mar-2011 David S. Miller <davem@davemloft.net> xfrm: Eliminate "fl" and "pol" args to xfrm_bundle_ok().

There is only one caller of xfrm_bundle_ok(), and that always passes these
parameters as NULL.

Signed-off-by: David S. Miller <davem@davemloft.net>
452edd598f60522c11f7f88fdbab27eb36509d1a 02-Mar-2011 David S. Miller <davem@davemloft.net> xfrm: Return dst directly from xfrm_lookup()

Instead of on the stack.

Signed-off-by: David S. Miller <davem@davemloft.net>
2774c131b1d19920b4587db1cfbd6f0750ad1f15 01-Mar-2011 David S. Miller <davem@davemloft.net> xfrm: Handle blackhole route creation via afinfo.

That way we don't have to potentially do this in every xfrm_lookup()
caller.

Signed-off-by: David S. Miller <davem@davemloft.net>
80c0bc9e37adfc892af82cb6aa8cace79f8a96cb 01-Mar-2011 David S. Miller <davem@davemloft.net> xfrm: Kill XFRM_LOOKUP_WAIT flag.

This can be determined from the flow flags instead.

Signed-off-by: David S. Miller <davem@davemloft.net>
9a7386ec999ae226890faea2661b4c7d494bcbb8 24-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Const'ify sec_path arg to secpath_has_nontransport.

Signed-off-by: David S. Miller <davem@davemloft.net>
22cccb7e03125155624d0893b86a151155f1048e 24-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Const'ify ptr args to xfrm_policy_ok.

Signed-off-by: David S. Miller <davem@davemloft.net>
7db454b9125100877b6aa15009cf9a73c68ac755 24-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Const'ify ptr args to xfrm_state_ok.

Signed-off-by: David S. Miller <davem@davemloft.net>
1786b3891c5d72803e48b990ebad4ac1b6fd9700 24-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Const'ify selector arg to xfrm_dst_update_parent.

Signed-off-by: David S. Miller <davem@davemloft.net>
d3e40a9f5ed53894bc0ba8cf010844f1028afe29 24-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Const'ify policy arg to clone_policy.

Signed-off-by: David S. Miller <davem@davemloft.net>
f299d557cb7fca4219020b19dab28ed26738c3ee 24-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Const'ify policy arg and local selector in xfrm_policy_match.

Signed-off-by: David S. Miller <davem@davemloft.net>
0b597e7edfd865cce7b18e71989a992ad0ca898e 24-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Const'ify local xfrm_address_t pointers in xfrm_policy_lookup_bytype.

Signed-off-by: David S. Miller <davem@davemloft.net>
b4b7c0b389131c34b6c3a6bf3f3c4d17fe59155f 24-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Const'ify selector args in xfrm_migrate paths.

Signed-off-by: David S. Miller <davem@davemloft.net>
5f803b58cd8528a93fbb72fa7b011547e7b1a310 24-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Const'ify address args to hash helpers.

Signed-off-by: David S. Miller <davem@davemloft.net>
dd701754e7d230330adc0e212b94106bbfd34841 24-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Const'ify pointer args to migrate_tmpl_match and xfrm_migrate_check

Signed-off-by: David S. Miller <davem@davemloft.net>
6418c4e07991a7b405d86bd4579c670b50fec99d 24-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Const'ify address arguments to __xfrm_dst_lookup()

Signed-off-by: David S. Miller <davem@davemloft.net>
200ce96e5601391a6d97c87067edf21fa94fb74e 24-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Const'ify selector argument to xfrm_selector_match()

Signed-off-by: David S. Miller <davem@davemloft.net>
dee9f4bceb5fd9dbfcc1567148fccdbf16d6a38a 23-Feb-2011 David S. Miller <davem@davemloft.net> net: Make flow cache paths use a const struct flowi.

Signed-off-by: David S. Miller <davem@davemloft.net>
4ca2e685114c55e6777022a46849795d2aa1d31a 23-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Mark flowi arg to xfrm_resolve_and_create_bundle() const.

Signed-off-by: David S. Miller <davem@davemloft.net>
3f0e18fb0e33784525322e51cbfa10369cebd912 23-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Mark flowi arg to xfrm_dst_{alloc_copy,update_origin}() const.

Signed-off-by: David S. Miller <davem@davemloft.net>
98313adaac2bdaeab0b60fb3c6bfc94dd6704d6f 23-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Mark flowi arg to xfrm_bundle_create() const.

Signed-off-by: David S. Miller <davem@davemloft.net>
a6c2e611152fcdc67047aaa56b75b9cfc592ce71 23-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Mark flowi arg to xfrm_tmpl_resolve{,_one}() const.

Signed-off-by: David S. Miller <davem@davemloft.net>
73ff93cd0249e822c4fee367e1fd4ad4a45a5515 23-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Mark flowi arg to xfrm_expand_policies() const.

Signed-off-by: David S. Miller <davem@davemloft.net>
062cdb43b8a8de888a6e2abd31228163cc5d8ee1 23-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Mark flowi arg to xfrm_policy_{lookup_by_type,match}() const.

Signed-off-by: David S. Miller <davem@davemloft.net>
47209abd7925acb3f61ae59884247b612b8904c8 23-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Kill strict arg to xfrm_bundle_ok().

Always set to "0".

Signed-off-by: David S. Miller <davem@davemloft.net>
e1ad2ab2cf0cabcd81861e2c61870fc27bb27ded 23-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Mark flowi arg to xfrm_selector_match() const.

Signed-off-by: David S. Miller <davem@davemloft.net>
8f029de281b26ec9fd5cd77294db1d35d9876f1a 23-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Mark flowi arg to xfrm_type->reject() const.

Signed-off-by: David S. Miller <davem@davemloft.net>
0c7b3eefb4ab8df245e94feb0d83c1c3450a3d87 23-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Mark flowi arg to ->fill_dst() const.

Signed-off-by: David S. Miller <davem@davemloft.net>
05d8402576c9c1b85bfc9e4f9d6a21c27ccbd5b1 23-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Mark flowi arg to ->get_tos() const.

Signed-off-by: David S. Miller <davem@davemloft.net>
3c7bd1a14071b99d6535b710bc998ae5d3abbb66 16-Feb-2011 David S. Miller <davem@davemloft.net> net: Add initial_ref arg to dst_alloc().

This allows avoiding multiple writes to the initial __refcnt.

The most simplest cases of wanting an initial reference of "1"
in ipv4 and ipv6 have been converted, the rest have been left
along and kept at the existing "0".

Signed-off-by: David S. Miller <davem@davemloft.net>
0b150932197b185ad5816932912e648116c7a96a 11-Feb-2011 Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> xfrm: avoid possible oopse in xfrm_alloc_dst

Commit 80c802f3073e84 (xfrm: cache bundles instead of policies for
outgoing flows) introduced possible oopse when dst_alloc returns NULL.

Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d33e455337ea2c71d09d7f4367d6ad6dd32b6965 14-Dec-2010 David S. Miller <davem@davemloft.net> net: Abstract default MTU metric calculation behind an accessor.

Like RTAX_ADVMSS, make the default calculation go through a dst_ops
method rather than caching the computation in the routing cache
entries.

Now dst metrics are pretty much left as-is when new entries are
created, thus optimizing metric sharing becomes a real possibility.

Signed-off-by: David S. Miller <davem@davemloft.net>
0dbaee3b37e118a96bb7b8eb0d9bbaeeb46264be 13-Dec-2010 David S. Miller <davem@davemloft.net> net: Abstract default ADVMSS behind an accessor.

Make all RTAX_ADVMSS metric accesses go through a new helper function,
dst_metric_advmss().

Leave the actual default metric as "zero" in the real metric slot,
and compute the actual default value dynamically via a new dst_ops
AF specific callback.

For stacked IPSEC routes, we use the advmss of the path which
preserves existing behavior.

Unlike ipv4/ipv6, DecNET ties the advmss to the mtu and thus updates
advmss on pmtu updates. This inconsistency in advmss handling
results in more raw metric accesses than I wish we ended up with.

Signed-off-by: David S. Miller <davem@davemloft.net>
defb3519a64141608725e2dac5a5aa9a3c644bae 09-Dec-2010 David S. Miller <davem@davemloft.net> net: Abstract away all dst_entry metrics accesses.

Use helper functions to hide all direct accesses, especially writes,
to dst_entry metrics values.

This will allow us to:

1) More easily change how the metrics are stored.

2) Implement COW for metrics.

In particular this will help us put metrics into the inetpeer
cache if that is what we end up doing. We can make the _metrics
member a pointer instead of an array, initially have it point
at the read-only metrics in the FIB, and then on the first set
grab an inetpeer entry and point the _metrics member there.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
1c4c40c42da468ef02dc04940930c1926c964558 15-Oct-2010 stephen hemminger <shemminger@vyatta.com> xfrm: make xfrm_bundle_ok local

Only used in one place.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8444cf712c5f71845cba9dc30d8f530ff0d5ff83 20-Sep-2010 Thomas Egerer <thomas.egerer@secunet.com> xfrm: Allow different selector family in temporary state

The family parameter xfrm_state_find is used to find a state matching a
certain policy. This value is set to the template's family
(encap_family) right before xfrm_state_find is called.
The family parameter is however also used to construct a temporary state
in xfrm_state_find itself which is wrong for inter-family scenarios
because it produces a selector for the wrong family. Since this selector
is included in the xfrm_user_acquire structure, user space programs
misinterpret IPv6 addresses as IPv4 and vice versa.
This patch splits up the original init_tempsel function into a part that
initializes the selector respectively the props and id of the temporary
state, to allow for differing ip address families whithin the state.

Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d809ec895505e6f35fb1965f0946381ab4eaa474 12-Jul-2010 Timo Teräs <timo.teras@iki.fi> xfrm: do not assume that template resolving always returns xfrms

xfrm_resolve_and_create_bundle() assumed that, if policies indicated
presence of xfrms, bundle template resolution would always return
some xfrms. This is not true for 'use' level policies which can
result in no xfrm's being applied if there is no suitable xfrm states.
This fixes a crash by this incorrect assumption.

Reported-by: George Spelvin <linux@horizon.com>
Bisected-by: George Spelvin <linux@horizon.com>
Tested-by: George Spelvin <linux@horizon.com>
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
1823e4c80eeae2a774c75569ce3035070e5ee009 22-Jun-2010 Eric Dumazet <eric.dumazet@gmail.com> snmp: add align parameter to snmp_mib_init()

In preparation for 64bit snmp counters for some mibs,
add an 'align' parameter to snmp_mib_init(), instead
of assuming mibs only contain 'unsigned long' fields.

Callers can use __alignof__(type) to provide correct
alignment.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b1312c89f0016f778cac4f1536f1434e132f8713 24-Jun-2010 Timo Teräs <timo.teras@iki.fi> xfrm: check bundle policy existance before dereferencing it

Fix the bundle validation code to not assume having a valid policy.
When we have multiple transformations for a xfrm policy, the bundle
instance will be a chain of bundles with only the first one having
the policy reference. When policy_genid is bumped it will expire the
first bundle in the chain which is equivalent of expiring the whole
chain.

Reported-bisected-and-tested-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
fafeeb6c80e3842c6dc19d05de09a23f23eef0d8 01-Jun-2010 Eric Dumazet <eric.dumazet@gmail.com> xfrm: force a dst reference in __xfrm_route_forward()

Packets going through __xfrm_route_forward() have a not refcounted dst
entry, since we enabled a noref forwarding path.

xfrm_lookup() might incorrectly release this dst entry.

It's a bit late to make invasive changes in xfrm_lookup(), so lets force
a refcount in this path.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3fa21e07e6acefa31f974d57fba2b6920a7ebd1a 18-May-2010 Joe Perches <joe@perches.com> net: Remove unnecessary returns from void function()s

This patch removes from net/ (but not any netfilter files)
all the unnecessary return; statements that precede the
last closing brace of void functions.

It does not remove the returns that are immediately
preceded by a label as gcc doesn't like that.

Done via:
$ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a1aa3483041bd3691c7f029272ccef4ce70bd957 16-May-2010 Timo Teras <timo.teras@iki.fi> xfrm: fix policy unreferencing on larval drop

I mistakenly had the error path to use num_pols to decide how
many policies we need to drop (cruft from earlier patch set
version which did not handle socket policies right).

This is wrong since normally we do not keep explicit references
(instead we hold reference to the cache entry which holds references
to policies). drop_pols is set to num_pols if we are holding the
references, so use that. Otherwise we eventually BUG_ON inside
xfrm_policy_destroy due to premature policy deletion.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
4b021628beb26238087812829cc080da47e4b236 27-Apr-2010 Changli Gao <xiaosuo@gmail.com> xfrm: potential uninitialized variable num_xfrms

potential uninitialized variable num_xfrms

fix compiler warning: 'num_xfrms' may be used uninitialized in this function.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
net/xfrm/xfrm_policy.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
285ead175c5dd5075cab5b6c94f35a3e6c0a3ae6 07-Apr-2010 Timo Teräs <timo.teras@iki.fi> xfrm: remove policy garbage collection

Policies are now properly reference counted and destroyed from
all code paths. The delayed gc is just an overhead now and can
be removed.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
80c802f3073e84c956846e921e8a0b02dfa3755f 07-Apr-2010 Timo Teräs <timo.teras@iki.fi> xfrm: cache bundles instead of policies for outgoing flows

__xfrm_lookup() is called for each packet transmitted out of
system. The xfrm_find_bundle() does a linear search which can
kill system performance depending on how many bundles are
required per policy.

This modifies __xfrm_lookup() to store bundles directly in
the flow cache. If we did not get a hit, we just create a new
bundle instead of doing slow search. This means that we can now
get multiple xfrm_dst's for same flow (on per-cpu basis).

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
fe1a5f031e76bd8761a7803d75b95ee96e84a574 07-Apr-2010 Timo Teräs <timo.teras@iki.fi> flow: virtualize flow cache entry methods

This allows to validate the cached object before returning it.
It also allows to destruct object properly, if the last reference
was held in flow cache. This is also a prepartion for caching
bundles in the flow cache.

In return for virtualizing the methods, we save on:
- not having to regenerate the whole flow cache on policy removal:
each flow matching a killed policy gets refreshed as the getter
function notices it smartly.
- we do not have to call flow_cache_flush from policy gc, since the
flow cache now properly deletes the object if it had any references

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
ea2dea9dacc256fe927857feb423872051642ae7 31-Mar-2010 Timo Teräs <timo.teras@iki.fi> xfrm: remove policy lock when accessing policy->walk.dead

All of the code considers ->dead as a hint that the cached policy
needs to get refreshed. The read side can just drop the read lock
without any side effects.

The write side needs to make sure that it's written only exactly
once. Only possible race is at xfrm_policy_kill(). This is fixed
by checking result of __xfrm_policy_unlink() when needed. It will
always succeed if the policy object is looked up from the hash
list (so some checks are removed), but it needs to be checked if
we are trying to unlink policy via a reference (appropriate
checks added).

Since policy->walk.dead is written exactly once, it no longer
needs to be protected with a write lock.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
87c1e12b5eeb7b30b4b41291bef8e0b41fc3dde9 02-Mar-2010 Herbert Xu <herbert@gondor.apana.org.au> ipsec: Fix bogus bundle flowi

When I merged the bundle creation code, I introduced a bogus
flowi value in the bundle. Instead of getting from the caller,
it was instead set to the flow in the route object, which is
totally different.

The end result is that the bundles we created never match, and
we instead end up with an ever growing bundle list.

Thanks to Jamal for find this problem.

Reported-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
fb977e2ca607a7e74946a1de798f474d1b80b9d6 24-Feb-2010 Jamal Hadi Salim <hadi@cyberus.ca> xfrm: clone mark when cloning policy

When we clone the SP, we should also clone the mark.
Useful for socket based SPs.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
34f8d8846f69f3b5bc3916ba9145e4eebae9394e 22-Feb-2010 Jamal Hadi Salim <hadi@cyberus.ca> xfrm: SP lookups with mark

Allow mark to be used when doing SP lookup

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
8ca2e93b557f2a0b35f7769038abf600177e1122 22-Feb-2010 Jamal Hadi Salim <hadi@cyberus.ca> xfrm: SP lookups signature with mark

pass mark to all SP lookups to prepare them for when we add code
to have them search.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2f1eb65f366b81aa3c22c31e6e8db26168777ec5 19-Feb-2010 Jamal Hadi Salim <hadi@cyberus.ca> xfrm: Flushing empty SPD generates false events

To see the effect make sure you have an empty SPD.
On window1 "ip xfrm mon" and on window2 issue "ip xfrm policy flush"
You get prompt back in window2 and you see the flush event on window1.
With this fix, you still get prompt on window1 but no event on window2.

Thanks to Alexey Dobriyan for finding a bug in earlier version
when using pfkey to do the flushing.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
72032fdbcde8b333e65b3430e1bcb4358e2d6716 18-Feb-2010 jamal <hadi@cyberus.ca> xfrm: Introduce LINUX_MIB_XFRMFWDHDRERROR

XFRMINHDRERROR counter is ambigous when validating forwarding
path. It makes it tricky to debug when you have both in and fwd
validation.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
069c474e88bb7753183f1eadbd7786c27888c8e3 17-Feb-2010 David S. Miller <davem@davemloft.net> xfrm: Revert false event eliding commits.

As reported by Alexey Dobriyan:

--------------------
setkey now takes several seconds to run this simple script
and it spits "recv: Resource temporarily unavailable" messages.

#!/usr/sbin/setkey -f
flush;
spdflush;

add A B ipcomp 44 -m tunnel -C deflate;
add B A ipcomp 45 -m tunnel -C deflate;

spdadd A B any -P in ipsec
ipcomp/tunnel/192.168.1.2-192.168.1.3/use;
spdadd B A any -P out ipsec
ipcomp/tunnel/192.168.1.3-192.168.1.2/use;
--------------------

Obviously applications want the events even when the table
is empty. So we cannot make this behavioral change.

Signed-off-by: David S. Miller <davem@davemloft.net>
7d720c3e4f0c4fc152a6bf17e24244a3c85412d2 16-Feb-2010 Tejun Heo <tj@kernel.org> percpu: add __percpu sparse annotations to net

Add __percpu sparse annotations to net.

These annotations are to make sparse consider percpu variables to be
in a different address space and warn if accessed without going
through percpu accessors. This patch doesn't affect normal builds.

The macro and type tricks around snmp stats make things a bit
interesting. DEFINE/DECLARE_SNMP_STAT() macros mark the target field
as __percpu and SNMP_UPD_PO_STATS() macro is updated accordingly. All
snmp_mib_*() users which used to cast the argument to (void **) are
updated to cast it to (void __percpu **).

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
0dca3a843632c2fbb6e358734fb08fc23e800f50 11-Feb-2010 jamal <hadi@cyberus.ca> xfrm: Flushing empty SPD generates false events

Observed similar behavior on SPD as previouly seen on SAD flushing..
This fixes it.

cheers,
jamal
commit 428b20432dc31bc2e01a94cd451cf5a2c00d2bf4
Author: Jamal Hadi Salim <hadi@cyberus.ca>
Date: Thu Feb 11 05:49:38 2010 -0500

xfrm: Flushing empty SPD generates false events

To see the effect make sure you have an empty SPD.
On window1 "ip xfrm mon" and on window2 issue "ip xfrm policy flush"
You get prompt back in window1 and you see the flush event on window2.
With this fix, you still get prompt on window1 but no event on window2.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>

Signed-off-by: David S. Miller <davem@davemloft.net>
d7c7544c3d5f59033d1bf3236bc7b289f5f26b75 25-Jan-2010 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: deal with dst entries in netns

GC is non-existent in netns, so after you hit GC threshold, no new
dst entries will be created until someone triggers cleanup in init_net.

Make xfrm4_dst_ops and xfrm6_dst_ops per-netns.
This is not done in a generic way, because it woule waste
(AF_MAX - 2) * sizeof(struct dst_ops) bytes per-netns.

Reorder GC threshold initialization so it'd be done before registering
XFRM policies.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e071041be037eca208b62b84469a06bdfc692bea 23-Jan-2010 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: fix "ip xfrm state|policy count" misreport

"ip xfrm state|policy count" report SA/SP count from init_net,
not from netns of caller process.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
96c5340147584481ef0c0afbb5423f7563c1d24a 26-Dec-2009 Ralf Baechle <ralf@linux-mips.org> NET: XFRM: Fix spelling of neighbour.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

net/xfrm/xfrm_policy.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
adf30907d63893e4208dfe3f5c88ae12bc2f25d5 02-Jun-2009 Eric Dumazet <eric.dumazet@gmail.com> net: skb->dst accessors

Define three accessors to get/set dst attached to a skb

struct dst_entry *skb_dst(const struct sk_buff *skb)

void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of :
dst_release(skb->dst)
skb->dst = NULL;

Delete skb->dst field

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
29fa0b301bc823016d1a3bed41c36a8977ef9947 03-Dec-2008 Wei Yongjun <yjwei@cn.fujitsu.com> xfrm: Cleanup for unlink SPD entry

Used __xfrm_policy_unlink() to instead of the dup codes when unlink
SPD entry.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d5654efd3ff1cd0baa935a0c9a5d89862f07d009 03-Dec-2008 Wei Yongjun <yjwei@cn.fujitsu.com> xfrm: Fix kernel panic when flush and dump SPD entries

After flush the SPD entries, dump the SPD entries will cause kernel painc.

Used the following commands to reproduct:

- echo 'spdflush;' | setkey -c
- echo 'spdadd 3ffe:501:ffff:ff01::/64 3ffe:501:ffff:ff04::/64 any -P out ipsec \
ah/tunnel/3ffe:501:ffff:ff00:200:ff:fe00:b0b0-3ffe:501:ffff:ff02:200:ff:fe00:a1a1/require;\
spddump;' | setkey -c
- echo 'spdflush; spddump;' | setkey -c
- echo 'spdadd 3ffe:501:ffff:ff01::/64 3ffe:501:ffff:ff04::/64 any -P out ipsec \
ah/tunnel/3ffe:501:ffff:ff00:200:ff:fe00:b0b0-3ffe:501:ffff:ff02:200:ff:fe00:a1a1/require;\
spddump;' | setkey -c

This is because when flush the SPD entries, the SPD entry is not remove
from the list.

This patch fix the problem by remove the SPD entry from the list.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b27aeadb5948d400df83db4d29590fb9862ba49d 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: per-netns sysctls

Make
net.core.xfrm_aevent_etime
net.core.xfrm_acq_expires
net.core.xfrm_aevent_rseqth
net.core.xfrm_larval_drop

sysctls per-netns.

For that make net_core_path[] global, register it to prevent two
/proc/net/core antries and change initcall position -- xfrm_init() is called
from fs_initcall, so this one should be fs_initcall at least.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c68cd1a01ba56995d85a4a62b195b2b3f6415c64 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: /proc/net/xfrm_stat in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
59c9940ed0ef026673cac52f2eaed77af7d486da 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: per-netns MIBs

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7c2776ee21a60e0d370538bd08b9ed82979f6e3a 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: flush SA/SPDs on netns stop

SA/SPD doesn't pin netns (and it shouldn't), so get rid of them by hand.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fbda33b2b85941c1ae3a0d89522dec5c1b1bd98c 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: ->get_saddr in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c5b3cf46eabe6e7459125fc6e2033b4222665017 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: ->dst_lookup in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ddcfd79680c1dc74eb5f24aa70785c11bf7eec8f 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: dst garbage-collecting in netns

Pass netns pointer to struct xfrm_policy_afinfo::garbage_collect()

[This needs more thoughts on what to do with dst_ops]
[Currently stub to init_net]

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3dd0b4997a1d4f3a3666e400cc75b0279ce96849 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: flushing/pruning bundles in netns

Allow netdevice notifier as result.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
99a66657b2f62ae8b2b1e6ffc6abed051e4561ca 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: xfrm_route_forward() in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f6e1e25d703c0a9ba1863384a16851dec52f8e3a 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: xfrm_policy_check in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
52479b623d3d41df84c499325b6a8c7915413032 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: lookup in netns

Pass netns to xfrm_lookup()/__xfrm_lookup(). For that pass netns
to flow_cache_lookup() and resolver callback.

Take it from socket or netdevice. Stub DECnet to init_net.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cdcbca7c1f1946758cfacb69bc1c7eeaccb11e2d 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: policy walking in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8d1211a6aaea43ea36151c17b0193eb763ff2d7e 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: finding policy in netns

Add netns parameter to xfrm_policy_bysel_ctx(), xfrm_policy_byidx().

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
33ffbbd52c327225a3e28485c39dc5746d81be03 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: policy flushing in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1121994c803f4a4f471d617443ff2a09515725e7 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: policy insertion in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e92303f872600978796ff323bc229d911f905849 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: propagate netns into policy byidx hash

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
98806f75ba2afc716e4d2f915d3ac7687546f9c0 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: trivial netns propagations

Take netns from xfrm_state or xfrm_policy.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
66caf628c3b634c57b14a1a104dcd57e4fab2e3b 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: per-netns policy hash resizing work

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dc2caba7b321289e7d02e63d7216961ccecfa103 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: per-netns policy counts

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a35f6c5de32664d82c072a7e2c7d5c5234de4158 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: per-netns xfrm_policy_bydst hash

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8b18f8eaf9207d53ba3e69f2b98d7290f4dec227 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: per-netns inexact policies

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8100bea7d619e8496ad8e545d1b41f536e076cd5 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: per-netns xfrm_policy_byidx hashmask

Per-netns hashes are independently resizeable.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
93b851c1c93c7d5cd8d94cd3f3a268b2d5460e9e 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: per-netns xfrm_policy_byidx hash

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
adfcf0b27e87d16a6a8c364daa724653d4d8930b 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: per-netns policy list

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0331b1f383e1fa4049f8e75cafeea8f006171c64 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: add struct xfrm_policy::xp_net

Again, to avoid complications with passing netns when not necessary.
Again, ->xp_net is set-once field, once set it never changes.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
50a30657fd7ee77a94a6bf0ad86eba7c37c3032e 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: per-netns km_waitq

Disallow spurious wakeups in __xfrm_lookup().

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d62ddc21b674b5ac1466091ff3fbf7baa53bc92c 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: add netns boilerplate

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c95839693d2a6612af7f75ad877012eba2f69757 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> xfrm: initialise xfrm_policy_gc_work statically

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7a12122c7a00347da9669cfcde82954c9e3d6f5e 13-Nov-2008 Arnaud Ebalard <arno@natisbad.org> net: Remove unused parameter of xfrm_gen_index()

In commit 2518c7c2b3d7f0a6b302b4efe17c911f8dd4049f ("[XFRM]: Hash
policies when non-prefixed."), the last use of xfrm_gen_policy() first
argument was removed, but the argument was left behind in the
prototype.

Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
bbb770e7ab9a436752babfc8765e422d7481be1f 04-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> xfrm: Fix xfrm_policy_gc_lock handling.

From: Alexey Dobriyan <adobriyan@gmail.com>

Based upon a lockdep trace by Simon Arlott.

xfrm_policy_kill() can be called from both BH and
non-BH contexts, so we have to grab xfrm_policy_gc_lock
with BH disabling.

Signed-off-by: David S. Miller <davem@davemloft.net>
21454aaad30651ba0dcc16fe5271bc12ee21f132 31-Oct-2008 Harvey Harrison <harvey.harrison@gmail.com> net: replace NIPQUAD() in net/*/

Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u
can be replaced with %pI4

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d5917a35ac0d8ebfb4a7d0db3b66054009bd4f37 31-Oct-2008 Alexey Dobriyan <adobriyan@gmail.com> xfrm: C99 for xfrm_dev_notifier

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a432226614c5616e3cfd211e0acffa0acfb4770c 23-Oct-2008 fernando@oss.ntt.co <fernando@oss.ntt.co> xfrm: do not leak ESRCH to user space

I noticed that, under certain conditions, ESRCH can be leaked from the
xfrm layer to user space through sys_connect. In particular, this seems
to happen reliably when the kernel fails to resolve a template either
because the AF_KEY receive buffer being used by racoon is full or
because the SA entry we are trying to use is in XFRM_STATE_EXPIRED
state.

However, since this could be a transient issue it could be argued that
EAGAIN would be more appropriate. Besides this error code is not even
documented in the man page for sys_connect (as of man-pages 3.07).

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
5b095d98928fdb9e3b75be20a54b7a6cbf6ca9ad 29-Oct-2008 Harvey Harrison <harvey.harrison@gmail.com> net: replace %p6 with %pI6

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fdb46ee752ed05c94bac71fe3decdb5175ec6e1f 29-Oct-2008 Harvey Harrison <harvey.harrison@gmail.com> net, misc: replace uses of NIP6_FMT with %p6

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13c1d18931ebb5cf407cb348ef2cd6284d68902d 05-Oct-2008 Arnaud Ebalard <arno@natisbad.org> xfrm: MIGRATE enhancements (draft-ebalard-mext-pfkey-enhanced-migrate)

Provides implementation of the enhancements of XFRM/PF_KEY MIGRATE mechanism
specified in draft-ebalard-mext-pfkey-enhanced-migrate-00. Defines associated
PF_KEY SADB_X_EXT_KMADDRESS extension and XFRM/netlink XFRMA_KMADDRESS
attribute.

Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
12a169e7d8f4b1c95252d8b04ed0f1033ed7cfe2 01-Oct-2008 Herbert Xu <herbert@gondor.apana.org.au> ipsec: Put dumpers on the dump list

Herbert Xu came up with the idea and the original patch to make
xfrm_state dump list contain also dumpers:

As it is we go to extraordinary lengths to ensure that states
don't go away while dumpers go to sleep. It's much easier if
we just put the dumpers themselves on the list since they can't
go away while they're going.

I've also changed the order of addition on new states to prevent
a never-ending dump.

Timo Teräs improved the patch to apply cleanly to latest tree,
modified iteration code to be more readable by using a common
struct for entries in the list, implemented the same idea for
xfrm_policy dumping and moved the af_key specific "last" entry
caching to af_key.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
28faa979746b2352cd78a376bf9f52db953bda46 10-Sep-2008 David S. Miller <davem@davemloft.net> ipsec: Make xfrm_larval_drop default to 1.

The previous default behavior is definitely the least user
friendly. Hanging there forever just because the keying
daemon is wedged or the refreshing of the policy can't move
forward is anti-social to say the least.

Signed-off-by: David S. Miller <davem@davemloft.net>
225f40055f779032974a9fce7b2f9c9eda04ff58 09-Sep-2008 Herbert Xu <herbert@gondor.apana.org.au> ipsec: Restore larval states and socket policies in dump

The commit commit 4c563f7669c10a12354b72b518c2287ffc6ebfb3 ("[XFRM]:
Speed up xfrm_policy and xfrm_state walking") inadvertently removed
larval states and socket policies from netlink dumps. This patch
restores them.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
9d7d74029e0f5fde3b88b39892b9b9cfdf4ea10a 03-Sep-2008 Julien Brunel <brunel@diku.dk> net/xfrm: Use an IS_ERR test rather than a NULL test

In case of error, the function xfrm_bundle_create returns an ERR
pointer, but never returns a NULL pointer. So a NULL test that comes
after an IS_ERR test should be deleted.

The semantic match that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@match_bad_null_test@
expression x, E;
statement S1,S2;
@@
x = xfrm_bundle_create(...)
... when != x = E
* if (x != NULL)
S1 else S2
// </smpl>

Signed-off-by: Julien Brunel <brunel@diku.dk>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
721499e8931c5732202481ae24f2dfbf9910f129 20-Jul-2008 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> netns: Use net_eq() to compare net-namespaces for optimization.

Without CONFIG_NET_NS, namespace is always &init_net.
Compiler will be able to omit namespace comparisons with this patch.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2532386f480eefbdd67b48be55fb4fb3e5a6081c 18-Apr-2008 Eric Paris <eparis@redhat.com> Audit: collect sessionid in netlink messages

Previously I added sessionid output to all audit messages where it was
available but we still didn't know the sessionid of the sender of
netlink messages. This patch adds that information to netlink messages
so we can audit who sent netlink messages.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
c5d18e984a313adf5a1a4ae69e0b1d93cf410229 22-Apr-2008 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Fix catch-22 with algorithm IDs above 31

As it stands it's impossible to use any authentication algorithms
with an ID above 31 portably. It just happens to work on x86 but
fails miserably on ppc64.

The reason is that we're using a bit mask to check the algorithm
ID but the mask is only 32 bits wide.

After looking at how this is used in the field, I have concluded
that in the long term we should phase out state matching by IDs
because this is made superfluous by the reqid feature. For current
applications, the best solution IMHO is to allow all algorithms when
the bit masks are all ~0.

The following patch does exactly that.

This bug was identified by IBM when testing on the ppc64 platform
using the NULL authentication algorithm which has an ID of 251.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
03e1ad7b5d871d4189b1da3125c2f12d1b5f7d0b 13-Apr-2008 Paul Moore <paul.moore@hp.com> LSM: Make the Labeled IPsec hooks more stack friendly

The xfrm_get_policy() and xfrm_add_pol_expire() put some rather large structs
on the stack to work around the LSM API. This patch attempts to fix that
problem by changing the LSM API to require only the relevant "security"
pointers instead of the entire SPD entry; we do this for all of the
security_xfrm_policy*() functions to keep things consistent.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
c346dca10840a874240c78efe3f39acf4312a1f2 25-Mar-2008 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS.

Introduce per-net_device inlines: dev_net(), dev_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
9bb182a7007515239091b237fe7169b1328a61d3 22-Feb-2008 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [XFRM] MIP6: Fix address keys for routing search.

Each MIPv6 XFRM state (DSTOPT/RH2) holds either destination or source
address to be mangled in the IPv6 header (that is "CoA").
On Inter-MN communication after both nodes binds each other,
they use route optimized traffic two MIPv6 states applied, and
both source and destination address in the IPv6 header
are replaced by the states respectively.
The packet format is correct, however, next-hop routing search
are not.
This patch fixes it by remembering address pairs for later states.

Based on patch from Masahide NAKAMURA <nakam@linux-ipv6.org>.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
4c563f7669c10a12354b72b518c2287ffc6ebfb3 29-Feb-2008 Timo Teras <timo.teras@iki.fi> [XFRM]: Speed up xfrm_policy and xfrm_state walking

Change xfrm_policy and xfrm_state walking algorithm from O(n^2) to O(n).
This is achieved adding the entries to one more list which is used
solely for walking the entries.

This also fixes some races where the dump can have duplicate or missing
entries when the SPD/SADB is modified during an ongoing dump.

Dumping SADB with 20000 entries using "time ip xfrm state" the sys
time dropped from 1.012s to 0.080s.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
b791160b5af4ea95c72fb59d13079664beca1963 18-Feb-2008 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [XFRM]: Fix ordering issue in xfrm_dst_hash_transfer().

Keep ordering of policy entries with same selector in
xfrm_dst_hash_transfer().

Issue should not appear in usual cases because multiple policy entries
with same selector are basically not allowed so far. Bug was pointed
out by Sebastien Decugis <sdecugis@hongo.wide.ad.jp>.

We could convert bydst from hlist to list and use list_add_tail()
instead.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Sebastien Decugis <sdecugis@hongo.wide.ad.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
1486cbd777316e55aa30aeb37e231ce618c29d2e 12-Jan-2008 Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> [XFRM] xfrm_policy: kill some bloat

net/xfrm/xfrm_policy.c:
xfrm_audit_policy_delete | -692
xfrm_audit_policy_add | -692
2 functions changed, 1384 bytes removed, diff: -1384

net/xfrm/xfrm_policy.c:
xfrm_audit_common_policyinfo | +704
1 function changed, 704 bytes added, diff: +704

net/xfrm/xfrm_policy.o:
3 functions changed, 704 bytes added, 1384 bytes removed, diff: -680

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
64c31b3f76482bb64459e786f9eca3bd0164d153 08-Jan-2008 WANG Cong <xiyou.wangcong@gmail.com> [XFRM] xfrm_policy_destroy: Rename and relative fixes.

Since __xfrm_policy_destroy is used to destory the resources
allocated by xfrm_policy_alloc. So using the name
__xfrm_policy_destroy is not correspond with xfrm_policy_alloc.
Rename it to xfrm_policy_destroy.

And along with some instances that call xfrm_policy_alloc
but not using xfrm_policy_destroy to destroy the resource,
fix them.

Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
d66e37a99d323012165ce91fd5c4518e2fcea0c5 08-Jan-2008 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM] Statistics: Add outbound-dropping error.

o Increment PolError counter when flow_cache_lookup() returns
errored pointer.

o Increment NoStates counter at larval-drop.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
afeb14b49098ba7a51c96e083a4105a0301f94c4 21-Dec-2007 Paul Moore <paul.moore@hp.com> [XFRM]: RFC4303 compliant auditing

This patch adds a number of new IPsec audit events to meet the auditing
requirements of RFC4303. This includes audit hooks for the following events:

* Could not find a valid SA [sections 2.1, 3.4.2]
. xfrm_audit_state_notfound()
. xfrm_audit_state_notfound_simple()

* Sequence number overflow [section 3.3.3]
. xfrm_audit_state_replay_overflow()

* Replayed packet [section 3.4.3]
. xfrm_audit_state_replay()

* Integrity check failure [sections 3.4.4.1, 3.4.4.2]
. xfrm_audit_state_icvfail()

While RFC4304 deals only with ESP most of the changes in this patch apply to
IPsec in general, i.e. both AH and ESP. The one case, integrity check
failure, where ESP specific code had to be modified the same was done to the
AH code for the sake of consistency.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
68277accb3a5f004344f4346498640601b8b7016 21-Dec-2007 Paul Moore <paul.moore@hp.com> [XFRM]: Assorted IPsec fixups

This patch fixes a number of small but potentially troublesome things in the
XFRM/IPsec code:

* Use the 'audit_enabled' variable already in include/linux/audit.h
Removed the need for extern declarations local to each XFRM audit fuction

* Convert 'sid' to 'secid' everywhere we can
The 'sid' name is specific to SELinux, 'secid' is the common naming
convention used by the kernel when refering to tokenized LSM labels,
unfortunately we have to leave 'ctx_sid' in 'struct xfrm_sec_ctx' otherwise
we risk breaking userspace

* Convert address display to use standard NIP* macros
Similar to what was recently done with the SPD audit code, this also also
includes the removal of some unnecessary memcpy() calls

* Move common code to xfrm_audit_common_stateinfo()
Code consolidation from the "less is more" book on software development

* Proper spacing around commas in function arguments
Minor style tweak since I was already touching the code

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
0aa647746e5602e608220c10e51f49709a030f5d 21-Dec-2007 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM]: Support to increment packet dropping statistics.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
558f82ef6e0d25e87f7468c07b6db1fbbf95a855 21-Dec-2007 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM]: Define packet dropping statistics.

This statistics is shown factor dropped by transformation
at /proc/net/xfrm_stat for developer.
It is a counter designed from current transformation source code
and defined as linux private MIB.

See Documentation/networking/xfrm_proc.txt for the detail.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
a1b051405bc16222d92c73b0c26d65b333a154ee 21-Dec-2007 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM] IPv6: Fix dst/routing check at transformation.

IPv6 specific thing is wrongly removed from transformation at net-2.6.25.
This patch recovers it with current design.

o Update "path" of xfrm_dst since IPv6 transformation should
care about routing changes. It is required by MIPv6 and
off-link destined IPsec.
o Rename nfheader_len which is for non-fragment transformation used by
MIPv6 to rt6i_nfheader_len as IPv6 name space.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
aef21785995778f710a60b563e03bf53ba455a47 13-Dec-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Fix zero return value in xfrm_lookup on error

Further testing shows that my ICMP relookup patch can cause xfrm_lookup
to return zero on error which isn't very nice since it leads to the caller
dying on null pointer dereference. The bug is due to not setting err
to ENOENT just before we leave xfrm_lookup in case of no policy.

This patch moves the err setting to where it should be.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
8b7817f3a959ed99d7443afc12f78a7e1fcc2063 12-Dec-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Add ICMP host relookup support

RFC 4301 requires us to relookup ICMP traffic that does not match any
policies using the reverse of its payload. This patch implements this
for ICMP traffic that originates from or terminates on localhost.

This is activated on outbound with the new policy flag XFRM_POLICY_ICMP,
and on inbound by the new state flag XFRM_STATE_ICMP.

On inbound the policy check is now performed by the ICMP protocol so
that it can repeat the policy check where necessary.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
d5422efe680fc55010c6ddca2370ca9548a96355 12-Dec-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Added xfrm_decode_session_reverse and xfrmX_policy_check_reverse

RFC 4301 requires us to relookup ICMP traffic that does not match any
policies using the reverse of its payload. This patch adds the functions
xfrm_decode_session_reverse and xfrmX_policy_check_reverse so we can get
the reverse flow to perform such a lookup.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
815f4e57e9fc67456624ecde0515a901368c78d2 12-Dec-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Make xfrm_lookup flags argument a bit-field

This patch introduces an enum for bits in the flags argument of xfrm_lookup.
This is so that we can cram more information into it later.

Since all current users use just the values 0 and 1, XFRM_LOOKUP_WAIT has
been added with the value 1 << 0 to represent the current meaning of flags.

The test in __xfrm_lookup has been changed accordingly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
5a3e55d68ec5baac578bf32ba67607088c763657 07-Dec-2007 Denis V. Lunev <den@openvz.org> [NET]: Multiple namespaces in the all dst_ifdown routines.

Move dst entries to a namespace loopback to catch refcounting leaks.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
875179fa60ffe2eba1daaefb0af1be97ff5eda6a 01-Dec-2007 Paul Moore <paul.moore@hp.com> [IPSEC]: SPD auditing fix to include the netmask/prefix-length

Currently the netmask/prefix-length of an IPsec SPD entry is not included in
any of the SPD related audit messages. This can cause a problem when the
audit log is examined as the netmask/prefix-length is vital in determining
what network traffic is affected by a particular SPD entry. This patch fixes
this problem by adding two additional fields, "src_prefixlen" and
"dst_prefixlen", to the SPD audit messages to indicate the source and
destination netmasks. These new fields are only included in the audit message
when the netmask/prefix-length is less than the address length, i.e. the SPD
entry applies to a network address and not a host address.

Example audit message:

type=UNKNOWN[1415] msg=audit(1196105849.752:25): auid=0 \
subj=root:system_r:unconfined_t:s0-s0:c0.c1023 op=SPD-add res=1 \
src=192.168.0.0 src_prefixlen=24 dst=192.168.1.0 dst_prefixlen=24

In addition, this patch also fixes a few other things in the
xfrm_audit_common_policyinfo() function. The IPv4 string formatting was
converted to use the standard NIPQUAD_FMT constant, the memcpy() was removed
from the IPv6 code path and replaced with a typecast (the memcpy() was acting
as a slow, implicit typecast anyway), and two local variables were created to
make referencing the XFRM security context and selector information cleaner.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
25ee3286dcbc830a833354bb1d15567956844813 11-Dec-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Merge common code into xfrm_bundle_create

Half of the code in xfrm4_bundle_create and xfrm6_bundle_create are
common. This patch extracts that logic and puts it into
xfrm_bundle_create. The rest of it are then accessed through afinfo.

As a result this fixes the problem with inter-family transforms where
we treat every xfrm dst in the bundle as if it belongs to the top
family.

This patch also fixes a long-standing error-path bug where we may free
the xfrm states twice.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
66cdb3ca27323a92712d289fc5edc7841d74a139 14-Nov-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Move flow construction into xfrm_dst_lookup

This patch moves the flow construction from the callers of
xfrm_dst_lookup into that function. It also changes xfrm_dst_lookup
so that it takes an xfrm state as its argument instead of explicit
addresses.

This removes any address-specific logic from the callers of
xfrm_dst_lookup which is needed to correctly support inter-family
transforms.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
b24b8a247ff65c01b252025926fe564209fae4fc 24-Jan-2008 Pavel Emelyanov <xemul@openvz.org> [NET]: Convert init_timer into setup_timer

Many-many code in the kernel initialized the timer->function
and timer->data together with calling init_timer(timer). There
is already a helper for this. Use it for networking code.

The patch is HUGE, but makes the code 130 lines shorter
(98 insertions(+), 228 deletions(-)).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5951cab136d8b7e84696061dc2e69c402bc94f61 20-Dec-2007 Paul Moore <paul.moore@hp.com> [XFRM]: Audit function arguments misordered

In several places the arguments to the xfrm_audit_start() function are
in the wrong order resulting in incorrect user information being
reported. This patch corrects this by pacing the arguments in the
correct order.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
75b8c133267053c9986a7c8db5131f0e7349e806 11-Dec-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Fix potential dst leak in xfrm_lookup

If we get an error during the actual policy lookup we don't free the
original dst while the caller expects us to always free the original
dst in case of error.

This patch fixes that.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
5e5234ff17ef98932688116025b30958bd28a940 29-Nov-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Fix uninitialised dst warning in __xfrm_lookup

Andrew Morton reported that __xfrm_lookup generates this warning:

net/xfrm/xfrm_policy.c: In function '__xfrm_lookup':
net/xfrm/xfrm_policy.c:1449: warning: 'dst' may be used uninitialized in this function

This is because if policy->action is of an unexpected value then dst will
not be initialised. Of course, in practice this should never happen since
the input layer xfrm_user/af_key will filter out all illegal values. But
the compiler doesn't know that of course.

So this patch fixes this by taking the conservative approach and treat all
unknown actions the same as a blocking action.

Thanks to Andrew for finding this and providing an initial fix.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
13996378e6585fb25e582afe7489bf52dde78deb 18-Oct-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Rename mode to outer_mode and add inner_mode

This patch adds a new field to xfrm states called inner_mode. The existing
mode object is renamed to outer_mode.

This is the first part of an attempt to fix inter-family transforms. As it
is we always use the outer family when determining which mode to use. As a
result we may end up shoving IPv4 packets into netfilter6 and vice versa.

What we really want is to use the inner family for the first part of outbound
processing and the outer family for the second part. For inbound processing
we'd use the opposite pairing.

I've also added a check to prevent silly combinations such as transport mode
with inter-family transforms.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
1bfcb10f670f5ff5e1d9f53e59680573524cb142 18-Oct-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Add missing BEET checks

Currently BEET mode does not reinject the packet back into the stack
like tunnel mode does. Since BEET should behave just like tunnel mode
this is incorrect.

This patch fixes this by introducing a flags field to xfrm_mode that
tells the IPsec code whether it should terminate and reinject the packet
back into the stack.

It then sets the flag for BEET and tunnel mode.

I've also added a number of missing BEET checks elsewhere where we check
whether a given mode is a tunnel or not.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
aa5d62cc8777f733f8b59b5586c0a1989813189e 18-Oct-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Move type and mode map into xfrm_state.c

The type and mode maps are only used by SAs, not policies. So it makes
sense to move them from xfrm_policy.c into xfrm_state.c. This also allows
us to mark xfrm_get_type/xfrm_put_type/xfrm_get_mode/xfrm_put_mode as
static.

The only other change I've made in the move is to get rid of the casts
on the request_module call for types. They're unnecessary because C
will promote them to ints anyway.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
1ecafede835321ebdc396531245adc37d22366f7 09-Oct-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Remove bogus ref count in xfrm_secpath_reject

Constructs of the form

xfrm_state_hold(x);
foo(x);
xfrm_state_put(x);

tend to be broken because foo is either synchronous where this is totally
unnecessary or if foo is asynchronous then the reference count is in the
wrong spot.

In the case of xfrm_secpath_reject, the function is synchronous and therefore
we should just kill the reference count.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2774c7aba6c97a2535be3309a2209770953780b3 27-Sep-2007 Eric W. Biederman <ebiederm@xmission.com> [NET]: Make the loopback device per network namespace.

This patch makes loopback_dev per network namespace. Adding
code to create a different loopback device for each network
namespace and adding the code to free a loopback device
when a network namespace exits.

This patch modifies all users the loopback_dev so they
access it as init_net.loopback_dev, keeping all of the
code compiling and working. A later pass will be needed to
update the users to use something other than the initial network
namespace.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
de3cb747ffac5f2a4a6bb156e7e2fd5229e688e5 26-Sep-2007 Daniel Lezcano <dlezcano@fr.ibm.com> [NET]: Dynamically allocate the loopback device, part 1.

This patch replaces all occurences to the static variable
loopback_dev to a pointer loopback_dev. That provides the
mindless, trivial, uninteressting change part for the dynamic
allocation for the loopback.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-By: Kirill Korotaev <dev@sw.ru>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
e9dc86534051b78e41e5b746cccc291b57a3a311 12-Sep-2007 Eric W. Biederman <ebiederm@xmission.com> [NET]: Make device event notification network namespace safe

Every user of the network device notifiers is either a protocol
stack or a pseudo device. If a protocol stack that does not have
support for multiple network namespaces receives an event for a
device that is not in the initial network namespace it quite possibly
can get confused and do the wrong thing.

To avoid problems until all of the protocol stacks are converted
this patch modifies all netdev event handlers to ignore events on
devices that are not in the initial network namespace.

As the rest of the code is made network namespace aware these
checks can be removed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ab5f5e8b144e4c804ef3aa1ce08a9ca9f01187ce 17-Sep-2007 Joy Latten <latten@austin.ibm.com> [XFRM]: xfrm audit calls

This patch modifies the current ipsec audit layer
by breaking it up into purpose driven audit calls.

So far, the only audit calls made are when add/delete
an SA/policy. It had been discussed to give each
key manager it's own calls to do this, but I found
there to be much redundnacy since they did the exact
same things, except for how they got auid and sid, so I
combined them. The below audit calls can be made by any
key manager. Hopefully, this is ok.

Signed-off-by: Joy Latten <latten@austin.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f7944fb1913130ae7858008af96e52a3a6b04118 25-Aug-2007 Thomas Graf <tgraf@suug.ch> [XFRM] policy: Replace magic number with XFRM_POLICY_OUT

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
b5890d8ba47741425fe3c0d753e1b57bc0561b7b 11-Aug-2007 Jesper Juhl <jesper.juhl@gmail.com> [XFRM]: Clean up duplicate includes in net/xfrm/

This patch cleans up duplicate includes in
net/xfrm/

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
e6e0871cce2ae04f5790543ad2f4ec36b23260ba 01-Aug-2007 Paul Moore <paul.moore@hp.com> Net/Security: fix memory leaks from security_secid_to_secctx()

The security_secid_to_secctx() function returns memory that must be freed
by a call to security_release_secctx() which was not always happening. This
patch fixes two of these problems (all that I could find in the kernel source
at present).

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
48b8d78315bf2aef4b6b4fb41c2c94e0b6600234 26-Jul-2007 Joakim Koskela <jookos@gmail.com> [XFRM]: State selection update to use inner addresses.

This patch modifies the xfrm state selection logic to use the inner
addresses where the outer have been (incorrectly) used. This is
required for beet mode in general and interfamily setups in both
tunnel and beet mode.

Signed-off-by: Joakim Koskela <jookos@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Diego Beltrami <diego.beltrami@gmail.com>
Signed-off-by: Miika Komu <miika@iki.fi>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
20c2df83d25c6a95affe6157a4c9cac4cf5ffaac 20-Jul-2007 Paul Mundt <lethal@linux-sh.org> mm: Remove slab destructors from kmem_cache_create().

Slab destructors were no longer supported after Christoph's
c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
7dc12d6dd6cc1aa489c6f3e34a75e8023c945da8 19-Jul-2007 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [NET] XFRM: Fix whitespace errors.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
bd0bf0765ea1fba80d7085e1f0375ec045631dc1 18-Jul-2007 Patrick McHardy <kaber@trash.net> [XFRM]: Fix crash introduced by struct dst_entry reordering

XFRM expects xfrm_dst->u.next to be same pointer as dst->next, which
was broken by the dst_entry reordering in commit 1e19e02c~, causing
an oops in xfrm_bundle_ok when walking the bundle upwards.

Kill xfrm_dst->u.next and change the only user to use dst->next instead.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
4aa2e62c45b5ca08be2d0d3c0744d7585b56e860 05-Jun-2007 Joy Latten <latten@austin.ibm.com> xfrm: Add security check before flushing SAD/SPD

Currently we check for permission before deleting entries from SAD and
SPD, (see security_xfrm_policy_delete() security_xfrm_state_delete())
However we are not checking for authorization when flushing the SPD and
the SAD completely. It was perhaps missed in the original security hooks
patch.

This patch adds a security check when flushing entries from the SAD and
SPD. It runs the entire database and checks each entry for a denial.
If the process attempting the flush is unable to remove all of the
entries a denial is logged the the flush function returns an error
without removing anything.

This is particularly useful when a process may need to create or delete
its own xfrm entries used for things like labeled networking but that
same process should not be able to delete other entries or flush the
entire database.

Signed-off-by: Joy Latten<latten@austin.ibm.com>
Signed-off-by: Eric Paris <eparis@parisplace.org>
Signed-off-by: James Morris <jmorris@namei.org>
aad0e0b9b6e4f7085d5e2ec4b5bb59ffecd8b1fb 25-May-2007 David S. Miller <davem@sunset.davemloft.net> [XFRM]: xfrm_larval_drop sysctl should be __read_mostly.

Signed-off-by: David S. Miller <davem@davemloft.net>
14e50e57aedb2a89cf79b77782879769794cab7b 25-May-2007 David S. Miller <davem@sunset.davemloft.net> [XFRM]: Allow packet drops during larval state resolution.

The current IPSEC rule resolution behavior we have does not work for a
lot of people, even though technically it's an improvement from the
-EAGAIN buisness we had before.

Right now we'll block until the key manager resolves the route. That
works for simple cases, but many folks would rather packets get
silently dropped until the key manager resolves the IPSEC rules.

We can't tell these folks to "set the socket non-blocking" because
they don't have control over the non-block setting of things like the
sockets used to resolve DNS deep inside of the resolver libraries in
libc.

With that in mind I coded up the patch below with some help from
Herbert Xu which provides packet-drop behavior during larval state
resolution, controllable via sysctl and off by default.

This lays the framework to either:

1) Make this default at some point or...

2) Move this logic into xfrm{4,6}_policy.c and implement the
ARP-like resolution queue we've all been dreaming of.
The idea would be to queue packets to the policy, then
once the larval state is resolved by the key manager we
re-resolve the route and push the packets out. The
packets would timeout if the rule didn't get resolved
in a certain amount of time.

Signed-off-by: David S. Miller <davem@davemloft.net>
b5505c6e1071b32176c7548a3aaf0be49f7c763e 14-May-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Check validity of direction in xfrm_policy_byid

The function xfrm_policy_byid takes a dir argument but finds the policy
using the index instead. We only use the dir argument to update the
policy count for that direction. Since the user can supply any value
for dir, this can corrupt our policy count.

I know this is the problem because a few days ago I was deleting
policies by hand using indicies and accidentally typed in the wrong
direction. It still deleted the policy and at the time I thought
that was cool. In retrospect it isn't such a good idea :)

I decided against letting it delete the policy anyway just in case
we ever remove the connection between indicies and direction.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
5a6d34162f5c6f522f857df274f1c8240f161e11 04-May-2007 Jamal Hadi Salim <hadi@cyberus.ca> [XFRM] SPD info TLV aggregation

Aggregate the SPD info TLVs.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
157bfc25020f7eb731f94140e099307ade47299e 30-Apr-2007 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM]: Restrict upper layer information by bundle.

On MIPv6 usage, XFRM sub policy is enabled.
When main (IPsec) and sub (MIPv6) policy selectors have the same
address set but different upper layer information (i.e. protocol
number and its ports or type/code), multiple bundle should be created.
However, currently we have issue to use the same bundle created for
the first time with all flows covered by the case.

It is useful for the bundle to have the upper layer information
to be restructured correctly if it does not match with the flow.

1. Bundle was created by two policies
Selector from another policy is added to xfrm_dst.
If the flow does not match the selector, it goes to slow path to
restructure new bundle by single policy.

2. Bundle was created by one policy
Flow cache is added to xfrm_dst as originated one. If the flow does
not match the cache, it goes to slow path to try searching another
policy.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ecfd6b183780c6d9e85873693b3ce6c5f4d08b58 29-Apr-2007 Jamal Hadi Salim <hadi@cyberus.ca> [XFRM]: Export SPD info

With this patch you can use iproute2 in user space to efficiently see
how many policies exist in different directions.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
3ff50b7997fe06cd5d276b229967bb52d6b3b6c1 21-Apr-2007 Stephen Hemminger <shemminger@linux-foundation.org> [NET]: cleanup extra semicolons

Spring cleaning time...

There seems to be a lot of places in the network code that have
extra bogus semicolons after conditionals. Most commonly is a
bogus semicolon after: switch() { }

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9d729f72dca9406025bcfa9c1f660d71d9ef0ff5 05-Mar-2007 James Morris <jmorris@namei.org> [NET]: Convert xtime.tv_sec to get_seconds()

Where appropriate, convert references to xtime.tv_sec to the
get_seconds() helper function.

Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
961995582e3752e983dc3906a57546a188007440 20-Mar-2007 Joy Latten <latten@austin.ibm.com> [XFRM]: ipsecv6 needs a space when printing audit record.

This patch adds a space between printing of the src and dst ipv6 addresses.
Otherwise, audit or other test tools may fail to process the audit
record properly because they cannot find the dst address.

Signed-off-by: Joy Latten <latten@austin.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ef41aaa0b755f479012341ac11db9ca5b8928d98 08-Mar-2007 Eric Paris <eparis@redhat.com> [IPSEC]: xfrm_policy delete security check misplaced

The security hooks to check permissions to remove an xfrm_policy were
actually done after the policy was removed. Since the unlinking and
deletion are done in xfrm_policy_by* functions this moves the hooks
inside those 2 functions. There we have all the information needed to
do the security check and it can be done before the deletion. Since
auditing requires the result of that security check err has to be passed
back and forth from the xfrm_policy_by* functions.

This patch also fixes a bug where a deletion that failed the security
check could cause improper accounting on the xfrm_policy
(xfrm_get_policy didn't have a put on the exit path for the hold taken
by xfrm_policy_by*)

It also fixes the return code when no policy is found in
xfrm_add_pol_expire. In old code (at least back in the 2.6.18 days) err
wasn't used before the return when no policy is found and so the
initialization would cause err to be ENOENT. But since err has since
been used above when we don't get a policy back from the xfrm_policy_by*
function we would always return 0 instead of the intended ENOENT. Also
fixed some white space damage in the same area.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Venkat Yekkirala <vyekkirala@trustedcs.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
928ba4169dc1d82c83105831f5ddb5472379b440 13-Feb-2007 Kazunori MIYAZAWA <miyazawa@linux-ipv6.org> [IPSEC]: Fix the address family to refer encap_family

Fix the address family to refer encap_family
when comparing with a kernel generated xfrm_state

Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
13fcfbb0675bf87da694f55dec11cada489a205c 12-Feb-2007 David S. Miller <davem@sunset.davemloft.net> [XFRM]: Fix OOPSes in xfrm_audit_log().

Make sure that this function is called correctly, and
add BUG() checking to ensure the arguments are sane.

Based upon a patch by Joy Latten.

Signed-off-by: David S. Miller <davem@davemloft.net>
a716c1197d608c55adfba45692a890ca64e10df0 09-Feb-2007 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [NET] XFRM: Fix whitespace errors.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
e610e679dd0057403c96cd31f8739792780732ee 08-Feb-2007 David S. Miller <davem@sunset.davemloft.net> [XFRM]: xfrm_migrate() needs exporting to modules.

Needed by xfrm_user and af_key.

Signed-off-by: David S. Miller <davem@davemloft.net>
80c9abaabf4283f7cf4a0b3597cd302506635b7f 08-Feb-2007 Shinta Sugimoto <shinta.sugimoto@ericsson.com> [XFRM]: Extension for dynamic update of endpoint address(es)

Extend the XFRM framework so that endpoint address(es) in the XFRM
databases could be dynamically updated according to a request (MIGRATE
message) from user application. Target XFRM policy is first identified
by the selector in the MIGRATE message. Next, the endpoint addresses
of the matching templates and XFRM states are updated according to
the MIGRATE message.

Signed-off-by: Shinta Sugimoto <shinta.sugimoto@ericsson.com>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
a6c7ab55dda3e16ab5a3cf6f39585aee5876ac3a 17-Jan-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Policy list disorder

The recent hashing introduced an off-by-one bug in policy list insertion.
Instead of adding after the last entry with a lesser or equal priority,
we're adding after the successor of that entry.

This patch fixes this and also adds a warning if we detect a duplicate
entry in the policy list. This should never happen due to this if clause.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
e18b890bb0881bbab6f4f1a6cd20d9c60d66b003 07-Dec-2006 Christoph Lameter <clameter@sgi.com> [PATCH] slab: remove kmem_cache_t

Replace all uses of kmem_cache_t with struct kmem_cache.

The patch was generated using the following script:

#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#

set -e

for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
quilt add $file
sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
mv /tmp/$$ $file
quilt refresh
done

The script was run like this

sh replace kmem_cache_t "struct kmem_cache"

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
c9204d9ca79baac564b49d36d0228a69d7ded084 30-Nov-2006 Joy Latten <latten@austin.ibm.com> audit: disable ipsec auditing when CONFIG_AUDITSYSCALL=n

Disables auditing in ipsec when CONFIG_AUDITSYSCALL is
disabled in the kernel.

Also includes a bug fix for xfrm_state.c as a result of
original ipsec audit patch.

Signed-off-by: Joy Latten <latten@austin.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
161a09e737f0761ca064ee6a907313402f7a54b6 27-Nov-2006 Joy Latten <latten@austin.ibm.com> audit: Add auditing to ipsec

An audit message occurs when an ipsec SA
or ipsec policy is created/deleted.

Signed-off-by: Joy Latten <latten@austin.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
baf5d743d1b8783fdbd5c1260ada2926e5bbaaee 05-Dec-2006 Jamal Hadi Salim <hadi@cyberus.ca> [XFRM] Optimize policy dumping

This change optimizes the dumping of Security policies.

1) Before this change ..
speedopolis:~# time ./ip xf pol

real 0m22.274s
user 0m0.000s
sys 0m22.269s

2) Turn off sub-policies

speedopolis:~# ./ip xf pol

real 0m13.496s
user 0m0.000s
sys 0m13.493s

i suppose the above is to be expected

3) With this change ..
speedopolis:~# time ./ip x policy

real 0m7.901s
user 0m0.008s
sys 0m7.896s
76b3f055f38954c67dab13844eb92203580038f8 01-Dec-2006 Miika Komu <miika@iki.fi> [IPSEC]: Add encapsulation family.

Signed-off-by: Miika Komu <miika@iki.fi>
Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi>
Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
776810217ac558351cdcca01c4c6a9474e4a68c2 09-Nov-2006 Andrew Morton <akpm@osdl.org> [XFRM]: uninline xfrm_selector_match()

Six callsites, huge.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
67f83cbf081a70426ff667e8d14f94e13ed3bdca 09-Nov-2006 Venkat Yekkirala <vyekkirala@trustedcs.com> SELinux: Fix SA selection semantics

Fix the selection of an SA for an outgoing packet to be at the same
context as the originating socket/flow. This eliminates the SELinux
policy's ability to use/sendto SAs with contexts other than the socket's.

With this patch applied, the SELinux policy will require one or more of the
following for a socket to be able to communicate with/without SAs:

1. To enable a socket to communicate without using labeled-IPSec SAs:

allow socket_t unlabeled_t:association { sendto recvfrom }

2. To enable a socket to communicate with labeled-IPSec SAs:

allow socket_t self:association { sendto };
allow socket_t peer_sa_t:association { recvfrom };

Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: James Morris <jmorris@namei.org>
c4028958b6ecad064b1a6303a6a5906d4fe48d73 22-Nov-2006 David Howells <dhowells@redhat.com> WorkStruct: make allyesconfig

Fix up for make allyesconfig.

Signed-Off-By: David Howells <dhowells@redhat.com>
3bccfbc7a7ba4085817deae6e7c67daf0cbd045a 05-Oct-2006 Venkat Yekkirala <vyekkirala@trustedcs.com> IPsec: fix handling of errors for socket policies

This treats the security errors encountered in the case of
socket policy matching, the same as how these are treated in
the case of main/sub policies, which is to return a full lookup
failure.

Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: James Morris <jmorris@namei.org>
5b368e61c2bcb2666bb66e2acf1d6d85ba6f474d 05-Oct-2006 Venkat Yekkirala <vyekkirala@trustedcs.com> IPsec: correct semantics for SELinux policy matching

Currently when an IPSec policy rule doesn't specify a security
context, it is assumed to be "unlabeled" by SELinux, and so
the IPSec policy rule fails to match to a flow that it would
otherwise match to, unless one has explicitly added an SELinux
policy rule allowing the flow to "polmatch" to the "unlabeled"
IPSec policy rules. In the absence of such an explicitly added
SELinux policy rule, the IPSec policy rule fails to match and
so the packet(s) flow in clear text without the otherwise applicable
xfrm(s) applied.

The above SELinux behavior violates the SELinux security notion of
"deny by default" which should actually translate to "encrypt by
default" in the above case.

This was first reported by Evgeniy Polyakov and the way James Morris
was seeing the problem was when connecting via IPsec to a
confined service on an SELinux box (vsftpd), which did not have the
appropriate SELinux policy permissions to send packets via IPsec.

With this patch applied, SELinux "polmatching" of flows Vs. IPSec
policy rules will only come into play when there's a explicit context
specified for the IPSec policy rule (which also means there's corresponding
SELinux policy allowing appropriate domains/flows to polmatch to this context).

Secondly, when a security module is loaded (in this case, SELinux), the
security_xfrm_policy_lookup() hook can return errors other than access denied,
such as -EINVAL. We were not handling that correctly, and in fact
inverting the return logic and propagating a false "ok" back up to
xfrm_lookup(), which then allowed packets to pass as if they were not
associated with an xfrm policy.

The solution for this is to first ensure that errno values are
correctly propagated all the way back up through the various call chains
from security_xfrm_policy_lookup(), and handled correctly.

Then, flow_cache_lookup() is modified, so that if the policy resolver
fails (typically a permission denied via the security module), the flow
cache entry is killed rather than having a null policy assigned (which
indicates that the packet can pass freely). This also forces any future
lookups for the same flow to consult the security module (e.g. SELinux)
for current security policy (rather than, say, caching the error on the
flow cache entry).

This patch: Fix the selinux side of things.

This makes sure SELinux polmatching of flow contexts to IPSec policy
rules comes into play only when an explicit context is associated
with the IPSec policy rule.

Also, this no longer defaults the context of a socket policy to
the context of the socket since the "no explicit context" case
is now handled properly.

Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: James Morris <jmorris@namei.org>
134b0fc544ba062498451611cb6f3e4454221b3d 05-Oct-2006 James Morris <jmorris@namei.org> IPsec: propagate security module errors up from flow_cache_lookup

When a security module is loaded (in this case, SELinux), the
security_xfrm_policy_lookup() hook can return an access denied permission
(or other error). We were not handling that correctly, and in fact
inverting the return logic and propagating a false "ok" back up to
xfrm_lookup(), which then allowed packets to pass as if they were not
associated with an xfrm policy.

The way I was seeing the problem was when connecting via IPsec to a
confined service on an SELinux box (vsftpd), which did not have the
appropriate SELinux policy permissions to send packets via IPsec.

The first SYNACK would be blocked, because of an uncached lookup via
flow_cache_lookup(), which would fail to resolve an xfrm policy because
the SELinux policy is checked at that point via the resolver.

However, retransmitted SYNACKs would then find a cached flow entry when
calling into flow_cache_lookup() with a null xfrm policy, which is
interpreted by xfrm_lookup() as the packet not having any associated
policy and similarly to the first case, allowing it to pass without
transformation.

The solution presented here is to first ensure that errno values are
correctly propagated all the way back up through the various call chains
from security_xfrm_policy_lookup(), and handled correctly.

Then, flow_cache_lookup() is modified, so that if the policy resolver
fails (typically a permission denied via the security module), the flow
cache entry is killed rather than having a null policy assigned (which
indicates that the packet can pass freely). This also forces any future
lookups for the same flow to consult the security module (e.g. SELinux)
for current security policy (rather than, say, caching the error on the
flow cache entry).

Signed-off-by: James Morris <jmorris@namei.org>
ae8c05779ac2f286b872db9ebea0c3c0a031ad1e 04-Oct-2006 David S. Miller <davem@sunset.davemloft.net> [XFRM]: Clearing xfrm_policy_count[] to zero during flush is incorrect.

When we flush policies, we do a type match so we might not
actually delete all policies matching a certain direction.

So keep track of how many policies we actually kill and
subtract that number from xfrm_policy_count[dir] at the
end.

Based upon a patch by Masahide NAKAMURA.

Signed-off-by: David S. Miller <davem@davemloft.net>
a1e59abf824969554b90facd44a4ab16e265afa4 19-Sep-2006 Patrick McHardy <kaber@trash.net> [XFRM]: Fix wildcard as tunnel source

Hashing SAs by source address breaks templates with wildcards as tunnel
source since the source address used for hashing/lookup is still 0/0.
Move source address lookup to xfrm_tmpl_resolve_one() so we can use the
real address in the lookup.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
d1d9facfd1b326e0df587c96f0ee55de2ae9f946 01-Sep-2006 James Morris <jmorris@namei.org> [XFRM]: remove xerr_idxp from __xfrm_policy_check()

It seems that during the MIPv6 respin, some code which was originally
conditionally compiled around CONFIG_XFRM_ADVANCED was accidently left
in after the config option was removed.

This patch removes an extraneous pointer (xerr_idxp) which is no
longer needed.

Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
e5d679f33900c71d1a76ba07c5b04055abd34480 27-Aug-2006 Alexey Dobriyan <adobriyan@gmail.com> [NET]: Use SLAB_PANIC

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
acba48e1a3c95082af1e12c5efaaca3506103a92 26-Aug-2006 David S. Miller <davem@sunset.davemloft.net> [XFRM]: Respect priority in policy lookups.

Even if we find an exact match in the hash table,
we must inspect the inexact list to look for a match
with a better priority.

Noticed by Masahide NAKAMURA <nakam@linux-ipv6.org>.

Signed-off-by: David S. Miller <davem@davemloft.net>
44e36b42a8378be1dcf7e6f8a1cb2710a8903387 24-Aug-2006 David S. Miller <davem@sunset.davemloft.net> [XFRM]: Extract common hashing code into xfrm_hash.[ch]

Signed-off-by: David S. Miller <davem@davemloft.net>
2518c7c2b3d7f0a6b302b4efe17c911f8dd4049f 24-Aug-2006 David S. Miller <davem@sunset.davemloft.net> [XFRM]: Hash policies when non-prefixed.

This idea is from Alexey Kuznetsov.

It is common for policies to be non-prefixed. And for
that case we can optimize lookups, insert, etc. quite
a bit.

For each direction, we have a dynamically sized policy
hash table for non-prefixed policies. We also have a
hash table on policy->index.

For prefixed policies, we have a list per-direction which
we will consult on lookups when a non-prefix hashtable
lookup fails.

This still isn't as efficient as I would like it. There
are four immediate problems:

1) Lots of excessive refcounting, which can be fixed just
like xfrm_state was
2) We do 2 hash probes on insert, one to look for dups and
one to allocate a unique policy->index. Althought I wonder
how much this matters since xfrm_state inserts do up to
3 hash probes and that seems to perform fine.
3) xfrm_policy_insert() is very complex because of the priority
ordering and entry replacement logic.
4) Lots of counter bumping, in addition to policy refcounts,
in the form of xfrm_policy_count[]. This is merely used
to let code path(s) know that some IPSEC rules exist. So
this count is indexed per-direction, maybe that is overkill.

Signed-off-by: David S. Miller <davem@davemloft.net>
1c0953997567b22e32fdf85d3b4bc0f2461fd161 24-Aug-2006 David S. Miller <davem@sunset.davemloft.net> [XFRM]: Purge dst references to deleted SAs passively.

Just let GC and other normal mechanisms take care of getting
rid of DST cache references to deleted xfrm_state objects
instead of walking all the policy bundles.

Signed-off-by: David S. Miller <davem@davemloft.net>
c7f5ea3a4d1ae6b3b426e113358fdc57494bc754 24-Aug-2006 David S. Miller <davem@sunset.davemloft.net> [XFRM]: Do not flush all bundles on SA insert.

Instead, simply set all potentially aliasing existing xfrm_state
objects to have the current generation counter value.

This will make routes get relooked up the next time an existing
route mentioning these aliased xfrm_state objects gets used,
via xfrm_dst_check().

Signed-off-by: David S. Miller <davem@davemloft.net>
9d4a706d852411154d0c91b9ffb3bec68b94b25c 24-Aug-2006 David S. Miller <davem@sunset.davemloft.net> [XFRM]: Add generation count to xfrm_state and xfrm_dst.

Each xfrm_state inserted gets a new generation counter
value. When a bundle is created, the xfrm_dst objects
get the current generation counter of the xfrm_state
they will attach to at dst->xfrm.

xfrm_bundle_ok() will return false if it sees an
xfrm_dst with a generation count different from the
generation count of the xfrm_state that dst points to.

This provides a facility by which to passively and
cheaply invalidate cached IPSEC routes during SA
database changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
41a49cc3c02ace59d4dddae91ea211c330970ee3 24-Aug-2006 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM]: Add sorting interface for state and template.

Under two transformation policies it is required to merge them.
This is a platform to sort state for outbound and templates
for inbound respectively.
It will be used when Mobile IPv6 and IPsec are used at the same time.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4e81bb8336a0ac50289d4d4c7a55e559b994ee8f 24-Aug-2006 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM] POLICY: sub policy support.

Sub policy is introduced. Main and sub policy are applied the same flow.
(Policy that current kernel uses is named as main.)
It is required another transformation policy management to keep IPsec
and Mobile IPv6 lives separate.
Policy which lives shorter time in kernel should be a sub i.e. normally
main is for IPsec and sub is for Mobile IPv6.
(Such usage as two IPsec policies on different database can be used, too.)

Limitation or TODOs:
- Sub policy is not supported for per socket one (it is always inserted as main).
- Current kernel makes cached outbound with flowi to skip searching database.
However this patch makes it disabled only when "two policies are used and
the first matched one is bypass case" because neither flowi nor bundle
information knows about transformation template size.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
df0ba92a99ca757039dfa84a929281ea3f7a50e8 24-Aug-2006 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM]: Trace which secpath state is reject factor.

For Mobile IPv6 usage, it is required to trace which secpath state is
reject factor in order to notify it to user space (to know the address
which cannot be used route optimized communication).

Based on MIPL2 kernel patch.

This patch was also written by: Henrik Petander <petander@tcs.hut.fi>

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
e53820de0f81da1429048634cadc6ef5f50c2f8b 24-Aug-2006 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM] IPV6: Restrict bundle reusing

For outbound transformation, bundle is checked whether it is
suitable for current flow to be reused or not. In such IPv6 case
as below, transformation may apply incorrect bundle for the flow instead
of creating another bundle:

- The policy selector has destination prefix length < 128
(Two or more addresses can be matched it)
- Its bundle holds dst entry of default route whose prefix length < 128
(Previous traffic was used such route as next hop)
- The policy and the bundle were used a transport mode state and
this time flow address is not matched the bundled state.

This issue is found by Mobile IPv6 usage to protect mobility signaling
by IPsec, but it is not a Mobile IPv6 specific.
This patch adds strict check to xfrm_bundle_ok() for each
state mode and address when prefix length is less than 128.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9e51fd371a022318c5b64b831c43026e89bc4f75 24-Aug-2006 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM]: Rename secpath_has_tunnel to secpath_has_nontransport.

On current kernel inbound transformation state is allowed transport and
disallowed tunnel mode when mismatch is occurred between tempates and states.
As the result of adding two more modes by Mobile IPv6, this function name
is misleading. Inbound transformation can allow only transport mode
when mismatch is occurred between template and secpath.
Based on MIPL2 kernel patch.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
f3bd484021d9486b826b422a017d75dd0bd258ad 24-Aug-2006 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM]: Restrict authentication algorithm only when inbound transformation protocol is IPsec.

For Mobile IPv6 usage, routing header or destination options header is
used and it doesn't require this comparison. It is checked only for
IPsec template.

Based on MIPL2 kernel patch.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7e49e6de30efa716614e280d97963c570f3acf29 23-Sep-2006 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM]: Add XFRM_MODE_xxx for future use.

Transformation mode is used as either IPsec transport or tunnel.
It is required to add two more items, route optimization and inbound trigger
for Mobile IPv6.
Based on MIPL2 kernel patch.

This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi>

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
beb8d13bed80f8388f1a9a107d07ddd342e627e8 05-Aug-2006 Venkat Yekkirala <vyekkirala@TrustedCS.com> [MLSXFRM]: Add flow labeling

This labels the flows that could utilize IPSec xfrms at the points the
flows are defined so that IPSec policy and SAs at the right label can
be used.

The following protos are currently not handled, but they should
continue to be able to use single-labeled IPSec like they currently
do.

ipmr
ip_gre
ipip
igmp
sit
sctp
ip6_tunnel (IPv6 over IPv6 tunnel device)
decnet

Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e0d1caa7b0d5f02e4f34aa09c695d04251310c6c 25-Jul-2006 Venkat Yekkirala <vyekkirala@TrustedCS.com> [MLSXFRM]: Flow based matching of xfrm policy and state

This implements a seemless mechanism for xfrm policy selection and
state matching based on the flow sid. This also includes the necessary
SELinux enforcement pieces.

Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d49c73c729e2ef644558a1f441c044bfacdc9744 14-Aug-2006 David S. Miller <davem@sunset.davemloft.net> [IPSEC]: Validate properly in xfrm_dst_check()

If dst->obsolete is -1, this is a signal from the
bundle creator that we want the XFRM dst and the
dsts that it references to be validated on every
use.

I misunderstood this intention when I changed
xfrm_dst_check() to always return NULL.

Now, when we purge a dst entry, by running dst_free()
on it. This will set the dst->obsolete to a positive
integer, and we want to return NULL in that case so
that the socket does a relookup for the route.

Thus, if dst->obsolete<0, let stale_bundle() validate
the state, else always return NULL.

In general, we need to do things more intelligently
here because we flush too much state during rule
changes. Herbert Xu has some ideas wherein the key
manager gives us some help in this area. We can also
use smarter state management algorithms inside of
the kernel as well.

Signed-off-by: David S. Miller <davem@davemloft.net>
0da974f4f303a6842516b764507e3c0a03f41e5a 21-Jul-2006 Panagiotis Issaris <takis@issaris.org> [NET]: Conversions from kmalloc+memset to k(z|c)alloc.

Signed-off-by: Panagiotis Issaris <takis@issaris.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6ab3d5624e172c553004ecc862bfeac16d9d68b7 30-Jun-2006 Jörn Engel <joern@wohnheim.fh-wedel.de> Remove obsolete #include <linux/config.h>

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
b59f45d0b2878ab76f8053b0973654e6621828ee 28-May-2006 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC] xfrm: Abstract out encapsulation modes

This patch adds the structure xfrm_mode. It is meant to represent
the operations carried out by transport/tunnel modes.

By doing this we allow additional encapsulation modes to be added
without clogging up the xfrm_input/xfrm_output paths.

Candidate modes include 4-to-6 tunnel mode, 6-to-4 tunnel mode, and
BEET modes.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
546be2405be119ef55467aace45f337a16e5d424 28-May-2006 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC] xfrm: Undo afinfo lock proliferation

The number of locks used to manage afinfo structures can easily be reduced
down to one each for policy and state respectively. This is based on the
observation that the write locks are only held by module insertion/removal
which are very rare events so there is no need to further differentiate
between the insertion of modules like ipv6 versus esp6.

The removal of the read locks in xfrm4_policy.c/xfrm6_policy.c might look
suspicious at first. However, after you realise that nobody ever takes
the corresponding write lock you'll feel better :)

As far as I can gather it's an attempt to guard against the removal of
the corresponding modules. Since neither module can be unloaded at all
we can leave it to whoever fixes up IPv6 unloading :)

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
e959d8121fcbfee6ec049cc617e9423d1799f2e4 29-Apr-2006 Ingo Molnar <mingo@elte.hu> [XFRM]: fix incorrect xfrm_policy_afinfo_lock use

xfrm_policy_afinfo_lock can be taken in bh context, at:

[<c013fe1a>] lockdep_acquire_read+0x54/0x6d
[<c0f6e024>] _read_lock+0x15/0x22
[<c0e8fcdb>] xfrm_policy_get_afinfo+0x1a/0x3d
[<c0e8fd10>] xfrm_decode_session+0x12/0x32
[<c0e66094>] ip_route_me_harder+0x1c9/0x25b
[<c0e770d3>] ip_nat_local_fn+0x94/0xad
[<c0e2bbc8>] nf_iterate+0x2e/0x7a
[<c0e2bc50>] nf_hook_slow+0x3c/0x9e
[<c0e3a342>] ip_push_pending_frames+0x2de/0x3a7
[<c0e53e19>] icmp_push_reply+0x136/0x141
[<c0e543fb>] icmp_reply+0x118/0x1a0
[<c0e54581>] icmp_echo+0x44/0x46
[<c0e53fad>] icmp_rcv+0x111/0x138
[<c0e36764>] ip_local_deliver+0x150/0x1f9
[<c0e36be2>] ip_rcv+0x3d5/0x413
[<c0df760f>] netif_receive_skb+0x337/0x356
[<c0df76c3>] process_backlog+0x95/0x110
[<c0df5fe2>] net_rx_action+0xa5/0x16d
[<c012d8a7>] __do_softirq+0x6f/0xe6
[<c0105ec2>] do_softirq+0x52/0xb1

this means that all write-locking of xfrm_policy_afinfo_lock must be
bh-safe. This patch fixes xfrm_policy_register_afinfo() and
xfrm_policy_unregister_afinfo().

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
8dff7c29707b7514043539f5ab5e0a6eb7bd9dcd 29-Apr-2006 Ingo Molnar <mingo@elte.hu> [XFRM]: fix softirq-unsafe xfrm typemap->lock use

xfrm typemap->lock may be used in softirq context, so all write_lock()
uses must be softirq-safe.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
dbe5b4aaafc715b12dbbea309d3d17958d01fd65 01-Apr-2006 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Kill unused decap state structure

This patch removes the *_decap_state structures which were previously
used to share state between input/post_input. This is no longer
needed.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4a3e2f711a00a1feb72ae12fdc749da10179d185 21-Mar-2006 Arjan van de Ven <arjan@infradead.org> [NET] sem2mutex: net/

Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
a70fcb0ba337956d91476e2e7c3e71d9df940a82 21-Mar-2006 David S. Miller <davem@davemloft.net> [XFRM]: Add some missing exports.

To fix the case of modular xfrm_user.

Signed-off-by: David S. Miller <davem@davemloft.net>
6c5c8ca7ff20523e427b955aa84cef407934710f 21-Mar-2006 Jamal Hadi Salim <hadi@cyberus.ca> [IPSEC]: Sync series - policy expires

This is similar to the SA expire insertion patch - only it inserts
expires for SP.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
752c1f4c78fe86d0fd6497387f763306b0d8fc53 27-Feb-2006 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Kill post_input hook and do NAT-T in esp_input directly

The only reason post_input exists at all is that it gives us the
potential to adjust the checksums incrementally in future which
we ought to do.

However, after thinking about it for a bit we can adjust the
checksums without using this post_input stuff at all. The crucial
point is that only the inner-most NAT-T SA needs to be considered
when adjusting checksums. What's more, the checksum adjustment
comes down to a single u32 due to the linearity of IP checksums.

We just happen to have a spare u32 lying around in our skb structure :)
When ip_summed is set to CHECKSUM_NONE on input, the value of skb->csum
is currently unused. All we have to do is to make that the checksum
adjustment and voila, there goes all the post_input and decap structures!

I've left in the decap data structures for now since it's intricately
woven into the sec_path stuff. We can kill them later too.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
42cf93cd464e0df3c85d298c647411bae6d99e6e 21-Feb-2006 Patrick McHardy <kaber@trash.net> [NETFILTER]: Fix bridge netfilter related in xfrm_lookup

The bridge-netfilter code attaches a fake dst_entry with dst->ops == NULL
to purely bridged packets. When these packets are SNATed and a policy
lookup is done, xfrm_lookup crashes because it tries to dereference
dst->ops.

Change xfrm_lookup not to dereference dst->ops before checking for the
DST_NOXFRM flag and set this flag in the fake dst_entry.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
995110143880fd9cb255fa5df05f8950c56fb43a 20-Feb-2006 Patrick McHardy <kaber@trash.net> [XFRM]: Fix policy double put

The policy is put once immediately and once at the error label, which results
in the following Oops:

kernel BUG at net/xfrm/xfrm_policy.c:250!
invalid opcode: 0000 [#2]
PREEMPT
[...]
CPU: 0
EIP: 0060:[<c028caf7>] Not tainted VLI
EFLAGS: 00210246 (2.6.16-rc3 #39)
EIP is at __xfrm_policy_destroy+0xf/0x46
eax: d49f2000 ebx: d49f2000 ecx: f74bd880 edx: f74bd280
esi: d49f2000 edi: 00000001 ebp: cd506dcc esp: cd506dc8
ds: 007b es: 007b ss: 0068
Process ssh (pid: 31970, threadinfo=cd506000 task=cfb04a70)
Stack: <0>cd506000 cd506e34 c028e92b ebde7280 cd506e58 cd506ec0 f74bd280 00000000
00000214 0000000a 0000000a 00000000 00000002 f7ae6000 00000000 cd506e58
cd506e14 c0299e36 f74bd280 e873fe00 c02943fd cd506ec0 ebde7280 f271f440
Call Trace:
[<c0103a44>] show_stack_log_lvl+0xaa/0xb5
[<c0103b75>] show_registers+0x126/0x18c
[<c0103e68>] die+0x14e/0x1db
[<c02b6809>] do_trap+0x7c/0x96
[<c0104237>] do_invalid_op+0x89/0x93
[<c01035af>] error_code+0x4f/0x54
[<c028e92b>] xfrm_lookup+0x349/0x3c2
[<c02b0b0d>] ip6_datagram_connect+0x317/0x452
[<c0281749>] inet_dgram_connect+0x49/0x54
[<c02404d2>] sys_connect+0x51/0x68
[<c0240928>] sys_socketcall+0x6f/0x166
[<c0102aa1>] syscall_call+0x7/0xb

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
00de651d14baabc5c1d2f32c49d9a984d8891c8e 14-Feb-2006 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Fix strange IPsec freeze.

Problem discovered and initial patch by Olaf Kirch:

there's a problem with IPsec that has been bugging some of our users
for the last couple of kernel revs. Every now and then, IPsec will
freeze the machine completely. This is with openswan user land,
and with kernels up to and including 2.6.16-rc2.

I managed to debug this a little, and what happens is that we end
up looping in xfrm_lookup, and never get out. With a bit of debug
printks added, I can this happening:

ip_route_output_flow calls xfrm_lookup

xfrm_find_bundle returns NULL (apparently we're in the
middle of negotiating a new SA or something)

We therefore call xfrm_tmpl_resolve. This returns EAGAIN
We go to sleep, waiting for a policy update.
Then we loop back to the top

Apparently, the dst_orig that was passed into xfrm_lookup
has been dropped from the routing table (obsolete=2)
This leads to the endless loop, because we now create
a new bundle, check the new bundle and find it's stale
(stale_bundle -> xfrm_bundle_ok -> dst_check() return 0)

People have been testing with the patch below, which seems to fix the
problem partially. They still see connection hangs however (things
only clear up when they start a new ping or new ssh). So the patch
is obvsiouly not sufficient, and something else seems to go wrong.

I'm grateful for any hints you may have...

I suggest that we simply bail out always. If the dst decides to die
on us later on, the packet will be dropped anyway. So there is no
great urgency to retry here. Once we have the proper resolution
queueing, we can then do the retry again.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
1b8623545b42c03eb92e51b28c84acf4b8ba00a3 15-Dec-2005 Al Viro <viro@zeniv.linux.org.uk> [PATCH] remove bogus asm/bug.h includes.

A bunch of asm/bug.h includes are both not needed (since it will get
pulled anyway) and bogus (since they are done too early). Removed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
09a626600b437d91f6b13ade5c7c4b374893c54e 09-Jan-2006 Kris Katterjohn <kjak@users.sourceforge.net> [NET]: Change some "if (x) BUG();" to "BUG_ON(x);"

This changes some simple "if (x) BUG();" statements to "BUG_ON(x);"

Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
eb9c7ebe6980c41cf6ae889e301c3b49f473ee9f 07-Jan-2006 Patrick McHardy <kaber@trash.net> [NETFILTER]: Handle NAT in IPsec policy checks

Handle NAT of decapsulated IPsec packets by reconstructing the struct flowi
of the original packet from the conntrack information for IPsec policy
checks.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
3e3850e989c5d2eb1aab6f0fd9257759f0f4cbc6 07-Jan-2006 Patrick McHardy <kaber@trash.net> [NETFILTER]: Fix xfrm lookup in ip_route_me_harder/ip6_route_me_harder

ip_route_me_harder doesn't use the port numbers of the xfrm lookup and
uses ip_route_input for non-local addresses which doesn't do a xfrm
lookup, ip6_route_me_harder doesn't do a xfrm lookup at all.

Use xfrm_decode_session and do the lookup manually, make sure both
only do the lookup if the packet hasn't been transformed already.

Makeing sure the lookup only happens once needs a new field in the
IP6CB, which exceeds the size of skb->cb. The size of skb->cb is
increased to 48b. Apparently the IPv6 mobile extensions need some
more room anyway.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
df71837d5024e2524cd51c93621e558aa7dd9f3f 14-Dec-2005 Trent Jaeger <tjaeger@cse.psu.edu> [LSM-IPSec]: Security association restriction.

This patch series implements per packet access control via the
extension of the Linux Security Modules (LSM) interface by hooks in
the XFRM and pfkey subsystems that leverage IPSec security
associations to label packets. Extensions to the SELinux LSM are
included that leverage the patch for this purpose.

This patch implements the changes necessary to the XFRM subsystem,
pfkey interface, ipv4/ipv6, and xfrm_user interface to restrict a
socket to use only authorized security associations (or no security
association) to send/receive network packets.

Patch purpose:

The patch is designed to enable access control per packets based on
the strongly authenticated IPSec security association. Such access
controls augment the existing ones based on network interface and IP
address. The former are very coarse-grained, and the latter can be
spoofed. By using IPSec, the system can control access to remote
hosts based on cryptographic keys generated using the IPSec mechanism.
This enables access control on a per-machine basis or per-application
if the remote machine is running the same mechanism and trusted to
enforce the access control policy.

Patch design approach:

The overall approach is that policy (xfrm_policy) entries set by
user-level programs (e.g., setkey for ipsec-tools) are extended with a
security context that is used at policy selection time in the XFRM
subsystem to restrict the sockets that can send/receive packets via
security associations (xfrm_states) that are built from those
policies.

A presentation available at
www.selinux-symposium.org/2005/presentations/session2/2-3-jaeger.pdf
from the SELinux symposium describes the overall approach.

Patch implementation details:

On output, the policy retrieved (via xfrm_policy_lookup or
xfrm_sk_policy_lookup) must be authorized for the security context of
the socket and the same security context is required for resultant
security association (retrieved or negotiated via racoon in
ipsec-tools). This is enforced in xfrm_state_find.

On input, the policy retrieved must also be authorized for the socket
(at __xfrm_policy_check), and the security context of the policy must
also match the security association being used.

The patch has virtually no impact on packets that do not use IPSec.
The existing Netfilter (outgoing) and LSM rcv_skb hooks are used as
before.

Also, if IPSec is used without security contexts, the impact is
minimal. The LSM must allow such policies to be selected for the
combination of socket and remote machine, but subsequent IPSec
processing proceeds as in the original case.

Testing:

The pfkey interface is tested using the ipsec-tools. ipsec-tools have
been modified (a separate ipsec-tools patch is available for version
0.5) that supports assignment of xfrm_policy entries and security
associations with security contexts via setkey and the negotiation
using the security contexts via racoon.

The xfrm_user interface is tested via ad hoc programs that set
security contexts. These programs are also available from me, and
contain programs for setting, getting, and deleting policy for testing
this interface. Testing of sa functions was done by tracing kernel
behavior.

Signed-off-by: Trent Jaeger <tjaeger@cse.psu.edu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
9b78a82c1cf19aa813bdaa184fa840a3ba811750 22-Dec-2005 David S. Miller <davem@sunset.davemloft.net> [IPSEC]: Fix policy updates missed by sockets

The problem is that when new policies are inserted, sockets do not see
the update (but all new route lookups do).

This bug is related to the SA insertion stale route issue solved
recently, and this policy visibility problem can be fixed in a similar
way.

The fix is to flush out the bundles of all policies deeper than the
policy being inserted. Consider beginning state of "outgoing"
direction policy list:

policy A --> policy B --> policy C --> policy D

First, realize that inserting a policy into a list only potentially
changes IPSEC routes for that direction. Therefore we need not bother
considering the policies for other directions. We need only consider
the existing policies in the list we are doing the inserting.

Consider new policy "B'", inserted after B.

policy A --> policy B --> policy B' --> policy C --> policy D

Two rules:

1) If policy A or policy B matched before the insertion, they
appear before B' and thus would still match after inserting
B'

2) Policy C and D, now "shadowed" and after policy B', potentially
contain stale routes because policy B' might be selected
instead of them.

Therefore we only need flush routes assosciated with policies
appearing after a newly inserted policy, if any.

Signed-off-by: David S. Miller <davem@davemloft.net>
399c180ac5f0cb66ef9479358e0b8b6bafcbeafe 19-Dec-2005 David S. Miller <davem@sunset.davemloft.net> [IPSEC]: Perform SA switchover immediately.

When we insert a new xfrm_state which potentially
subsumes an existing one, make sure all cached
bundles are flushed so that the new SA is used
immediately.

Signed-off-by: David S. Miller <davem@davemloft.net>
80b30c1023dbd795faf948dee0cfb3b270b56d47 15-Oct-2005 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Kill obsolete get_mss function

Now that we've switched over to storing MTUs in the xfrm_dst entries,
we no longer need the dst's get_mss methods. This patch gets rid of
them.

It also documents the fact that our MTU calculation is not optimal
for ESP.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
dd0fc66fb33cd610bc1a5db8a5e232d34879b4d7 07-Oct-2005 Al Viro <viro@ftp.linux.org.uk> [PATCH] gfp flags annotations - part 1

- added typedef unsigned int __nocast gfp_t;

- replaced __nocast uses for gfp flags with gfp_t - it gives exactly
the same warnings as far as sparse is concerned, doesn't change
generated code (from gcc point of view we replaced unsigned int with
typedef) and documents what's going on far better.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
77d8d7a6848c81084f413e1ec4982123a56e2ccb 05-Oct-2005 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Document that policy direction is derived from the index.

Here is a patch that adds a helper called xfrm_policy_id2dir to
document the fact that the policy direction can be and is derived
from the index.

This is based on a patch by YOSHIFUJI Hideaki and 210313105@suda.edu.cn.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
83fa3400ebcba307a60909824a251be984eb9567 05-Oct-2005 Randy Dunlap <rdunlap@xenotime.net> [XFRM]: fix sparse gfp nocast warnings

Fix implicit nocast warnings in xfrm code:
net/xfrm/xfrm_policy.c:232:47: warning: implicit cast to nocast type

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
e104411b82f5c4d19752c335492036abdbf5880d 09-Sep-2005 Patrick McHardy <kaber@trash.net> [XFRM]: Always release dst_entry on error in xfrm_lookup

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
ba89966c1984513f4f2cc0a6c182266be44ddd03 26-Aug-2005 Eric Dumazet <dada1@cosmosbay.com> [NET]: use __read_mostly on kmem_cache_t , DEFINE_SNMP_STAT pointers

This patch puts mostly read only data in the right section
(read_mostly), to help sharing of these data between CPUS without
memory ping pongs.

On one of my production machine, tcp_statistics was sitting in a
heavily modified cache line, so *every* SNMP update had to force a
reload.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
72cb6962a91f2af9eef69a06198e1949c10259ae 20-Jun-2005 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Add xfrm_init_state

This patch adds xfrm_init_state which is simply a wrapper that calls
xfrm_get_type and subsequently x->type->init_state. It also gets rid
of the unused args argument.

Abstracting it out allows us to add common initialisation code, e.g.,
to set family-specific flags.

The add_time setting in xfrm_user.c was deleted because it's already
set by xfrm_state_alloc.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: James Morris <jmorris@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4666faab095230ec8aa62da6c33391287f281154 19-Jun-2005 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC] Kill spurious hard expire messages

This patch ensures that the hard state/policy expire notifications are
only sent when the state/policy is successfully removed from their
respective tables.

As it is, it's possible for a state/policy to both expire through
reaching a hard limit, as well as being deleted by the user.

Note that this behaviour isn't actually forbidden by RFC 2367.
However, it is a quality of implementation issue.

As an added bonus, the restructuring in this patch will help
eventually in moving the expire notifications from softirq
context into process context, thus improving their reliability.

One important side-effect from this change is that SAs reaching
their hard byte/packet limits are now deleted immediately, just
like SAs that have reached their hard time limits.

Previously they were announced immediately but only deleted after
30 seconds.

This is bad because it prevents the system from issuing an ACQUIRE
command until the existing state was deleted by the user or expires
after the time is up.

In the scenario where the expire notification was lost this introduces
a 30 second delay into the system for no good reason.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
92d63decc0b6a5d600f792fcf5f3ff9718c09a3d 26-May-2005 Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> From: Kazunori Miyazawa <kazunori@miyazawa.org>

[XFRM] Call dst_check() with appropriate cookie

This fixes infinite loop issue with IPv6 tunnel mode.

Signed-off-by: Kazunori Miyazawa <kazunori@miyazawa.org>
Signed-off-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
aabc9761b69f1bfa30a78f7005be95cc9cc06175 04-May-2005 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Store idev entries

I found a bug that stopped IPsec/IPv6 from working. About
a month ago IPv6 started using rt6i_idev->dev on the cached socket dst
entries. If the cached socket dst entry is IPsec, then rt6i_idev will
be NULL.

Since we want to look at the rt6i_idev of the original route in this
case, the easiest fix is to store rt6i_idev in the IPsec dst entry just
as we do for a number of other IPv6 route attributes. Unfortunately
this means that we need some new code to handle the references to
rt6i_idev. That's why this patch is bigger than it would otherwise be.

I've also done the same thing for IPv4 since it is conceivable that
once these idev attributes start getting used for accounting, we
probably need to dereference them for IPv4 IPsec entries too.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 17-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org> Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!