History log of /net/xfrm/xfrm_state.c
Revision Date Author Comments
0244790c8ad2408dfb313e5c886e6e5a808ea946 29-Aug-2014 Ying Xue <ying.xue@windriver.com> xfrm: remove useless hash_resize_mutex locks

In xfrm_state.c, hash_resize_mutex is defined as a local variable
and only used in xfrm_hash_resize() which is declared as a work
handler of xfrm.state_hash_work. But when the xfrm.state_hash_work
work is put in the global workqueue(system_wq) with schedule_work(),
the work will be really inserted in the global workqueue if it was
not already queued, otherwise, it is still left in the same position
on the the global workqueue. This means the xfrm_hash_resize() work
handler is only executed once at any time no matter how many times
its work is scheduled, that is, xfrm_hash_resize() is not called
concurrently at all, so hash_resize_mutex is redundant for us.

Cc: Christophe Gouault <christophe.gouault@6wind.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2e71029e2c32ecd59a2e8f351517bfbbad42ac11 22-Apr-2014 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> xfrm: Remove useless xfrm_audit struct.

Commit f1370cc4 "xfrm: Remove useless secid field from xfrm_audit." changed
"struct xfrm_audit" to have either
{ audit_get_loginuid(current) / audit_get_sessionid(current) } or
{ INVALID_UID / -1 } pair.

This means that we can represent "struct xfrm_audit" as "bool".
This patch replaces "struct xfrm_audit" argument with "bool".

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
f1370cc4a01e61007ab3020c761cef6b88ae3729 18-Apr-2014 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> xfrm: Remove useless secid field from xfrm_audit.

It seems to me that commit ab5f5e8b "[XFRM]: xfrm audit calls" is doing
something strange at xfrm_audit_helper_usrinfo().
If secid != 0 && security_secid_to_secctx(secid) != 0, the caller calls
audit_log_task_context() which basically does
secid != 0 && security_secid_to_secctx(secid) == 0 case
except that secid is obtained from current thread's context.

Oh, what happens if secid passed to xfrm_audit_helper_usrinfo() was
obtained from other thread's context? It might audit current thread's
context rather than other thread's context if security_secid_to_secctx()
in xfrm_audit_helper_usrinfo() failed for some reason.

Then, are all the caller of xfrm_audit_helper_usrinfo() passing either
secid obtained from current thread's context or secid == 0?
It seems to me that they are.

If I didn't miss something, we don't need to pass secid to
xfrm_audit_helper_usrinfo() because audit_log_task_context() will
obtain secid from current thread's context.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
870a2df4ca026817eb87bb2f9daaa60a93fd051a 06-Mar-2014 Nicolas Dichtel <nicolas.dichtel@6wind.com> xfrm: rename struct xfrm_filter

iproute2 already defines a structure with that name, let's use another one to
avoid any conflict.

CC: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
cc9ab60e57964d463ff31b9621c8d7e786aee042 19-Feb-2014 Steffen Klassert <steffen.klassert@secunet.com> xfrm: Cleanup error handling of xfrm_state_clone

The error pointer passed to xfrm_state_clone() is unchecked,
so remove it and indicate an error by returning a null pointer.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
ee5c23176fcc820f7a56d3e86001532af0d59b1e 19-Feb-2014 Steffen Klassert <steffen.klassert@secunet.com> xfrm: Clone states properly on migration

We loose a lot of information of the original state if we
clone it with xfrm_state_clone(). In particular, there is
no crypto algorithm attached if the original state uses
an aead algorithm. This patch add the missing information
to the clone state.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
8c0cba22e196122d26c92980943474eb53db8deb 19-Feb-2014 Steffen Klassert <steffen.klassert@secunet.com> xfrm: Take xfrm_state_lock in xfrm_migrate_state_find

A comment on xfrm_migrate_state_find() says that xfrm_state_lock
is held. This is apparently not the case, but we need it to
traverse through the state lists.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
35ea790d7883dd660208f78eae50ebfd6b8bd14a 19-Feb-2014 Steffen Klassert <steffen.klassert@secunet.com> xfrm: Fix NULL pointer dereference on sub policy usage

xfrm_state_sort() takes the unsorted states from the src array
and stores them into the dst array. We try to get the namespace
from the dst array which is empty at this time, so take the
namespace from the src array instead.

Fixes: 283bc9f35bbbc ("xfrm: Namespacify xfrm state/policy locks")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
d3623099d3509fa68fa28235366049dd3156c63a 14-Feb-2014 Nicolas Dichtel <nicolas.dichtel@6wind.com> ipsec: add support of limited SA dump

The goal of this patch is to allow userland to dump only a part of SA by
specifying a filter during the dump.
The kernel is in charge to filter SA, this avoids to generate useless netlink
traffic (it save also some cpu cycles). This is particularly useful when there
is a big number of SA set on the system.

Note that I removed the union in struct xfrm_state_walk to fix a problem on arm.
struct netlink_callback->args is defined as a array of 6 long and the first long
is used in xfrm code to flag the cb as initialized. Hence, we must have:
sizeof(struct xfrm_state_walk) <= sizeof(long) * 5.
With the union, it was false on arm (sizeof(struct xfrm_state_walk) was
sizeof(long) * 7), due to the padding.
In fact, whatever the arch is, this union seems useless, there will be always
padding after it. Removing it will not increase the size of this struct (and
reduce it on arm).

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
0f24558e91563888d51e9be5b70981da920c37ac 12-Feb-2014 Horia Geanta <horia.geanta@freescale.com> xfrm: avoid creating temporary SA when there are no listeners

In the case when KMs have no listeners, km_query() will fail and
temporary SAs are garbage collected immediately after their allocation.
This causes strain on memory allocation, leading even to OOM since
temporary SA alloc/free cycle is performed for every packet
and garbage collection does not keep up the pace.

The sane thing to do is to make sure we have audience before
temporary SA allocation.

Signed-off-by: Horia Geanta <horia.geanta@freescale.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
63862b5bef7349dd1137e4c70702c67d77565785 11-Jan-2014 Aruna-Hewapathirane <aruna.hewapathirane@gmail.com> net: replace macros net_random and net_srandom with direct calls to prandom

This patch removes the net_random and net_srandom macros and replaces
them with direct calls to the prandom ones. As new commits only seem to
use prandom_u32 there is no use to keep them around.
This change makes it easier to grep for users of prandom_u32.

Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4440e8548153e9e6d56db9abe6f3bc0e5b9eb74f 27-Nov-2013 Eric Paris <eparis@redhat.com> audit: convert all sessionid declaration to unsigned int

Right now the sessionid value in the kernel is a combination of u32,
int, and unsigned int. Just use unsigned int throughout.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
c454997e68eb013510ba128283ad3b4aefeff630 03-Jan-2014 Fan Du <fan.du@windriver.com> {pktgen, xfrm} Introduce xfrm_state_lookup_byspi for pktgen

Introduce xfrm_state_lookup_byspi to find user specified by custom
from "pgset spi xxx". Using this scheme, any flow regardless its
saddr/daddr could be transform by SA specified with configurable
spi.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
4ae770bf58ce6315e0f4a1bd4b7e574bdfa59ca9 03-Jan-2014 Fan Du <fan.du@windriver.com> {pktgen, xfrm} Correct xfrm_state_lock usage in xfrm_stateonly_find

Acquiring xfrm_state_lock in process context is expected to turn BH off,
as this lock is also used in BH context, namely xfrm state timer handler.
Otherwise it surprises LOCKDEP with below messages.

[ 81.422781] pktgen: Packet Generator for packet performance testing. Version: 2.74
[ 81.725194]
[ 81.725211] =========================================================
[ 81.725212] [ INFO: possible irq lock inversion dependency detected ]
[ 81.725215] 3.13.0-rc2+ #92 Not tainted
[ 81.725216] ---------------------------------------------------------
[ 81.725218] kpktgend_0/2780 just changed the state of lock:
[ 81.725220] (xfrm_state_lock){+.+...}, at: [<ffffffff816dd751>] xfrm_stateonly_find+0x41/0x1f0
[ 81.725231] but this lock was taken by another, SOFTIRQ-safe lock in the past:
[ 81.725232] (&(&x->lock)->rlock){+.-...}
[ 81.725232]
[ 81.725232] and interrupts could create inverse lock ordering between them.
[ 81.725232]
[ 81.725235]
[ 81.725235] other info that might help us debug this:
[ 81.725237] Possible interrupt unsafe locking scenario:
[ 81.725237]
[ 81.725238] CPU0 CPU1
[ 81.725240] ---- ----
[ 81.725241] lock(xfrm_state_lock);
[ 81.725243] local_irq_disable();
[ 81.725244] lock(&(&x->lock)->rlock);
[ 81.725246] lock(xfrm_state_lock);
[ 81.725248] <Interrupt>
[ 81.725249] lock(&(&x->lock)->rlock);
[ 81.725251]
[ 81.725251] *** DEADLOCK ***
[ 81.725251]
[ 81.725254] no locks held by kpktgend_0/2780.
[ 81.725255]
[ 81.725255] the shortest dependencies between 2nd lock and 1st lock:
[ 81.725269] -> (&(&x->lock)->rlock){+.-...} ops: 8 {
[ 81.725274] HARDIRQ-ON-W at:
[ 81.725276] [<ffffffff8109a64b>] __lock_acquire+0x65b/0x1d70
[ 81.725282] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[ 81.725284] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[ 81.725289] [<ffffffff816dc3a3>] xfrm_timer_handler+0x43/0x290
[ 81.725292] [<ffffffff81059437>] __tasklet_hrtimer_trampoline+0x17/0x40
[ 81.725300] [<ffffffff8105a1b7>] tasklet_hi_action+0xd7/0xf0
[ 81.725303] [<ffffffff81059ac6>] __do_softirq+0xe6/0x2d0
[ 81.725305] [<ffffffff8105a026>] irq_exit+0x96/0xc0
[ 81.725308] [<ffffffff8177fd0a>] smp_apic_timer_interrupt+0x4a/0x60
[ 81.725313] [<ffffffff8177e96f>] apic_timer_interrupt+0x6f/0x80
[ 81.725316] [<ffffffff8100b7c6>] arch_cpu_idle+0x26/0x30
[ 81.725329] [<ffffffff810ace28>] cpu_startup_entry+0x88/0x2b0
[ 81.725333] [<ffffffff8102e5b0>] start_secondary+0x190/0x1f0
[ 81.725338] IN-SOFTIRQ-W at:
[ 81.725340] [<ffffffff8109a61d>] __lock_acquire+0x62d/0x1d70
[ 81.725342] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[ 81.725344] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[ 81.725347] [<ffffffff816dc3a3>] xfrm_timer_handler+0x43/0x290
[ 81.725349] [<ffffffff81059437>] __tasklet_hrtimer_trampoline+0x17/0x40
[ 81.725352] [<ffffffff8105a1b7>] tasklet_hi_action+0xd7/0xf0
[ 81.725355] [<ffffffff81059ac6>] __do_softirq+0xe6/0x2d0
[ 81.725358] [<ffffffff8105a026>] irq_exit+0x96/0xc0
[ 81.725360] [<ffffffff8177fd0a>] smp_apic_timer_interrupt+0x4a/0x60
[ 81.725363] [<ffffffff8177e96f>] apic_timer_interrupt+0x6f/0x80
[ 81.725365] [<ffffffff8100b7c6>] arch_cpu_idle+0x26/0x30
[ 81.725368] [<ffffffff810ace28>] cpu_startup_entry+0x88/0x2b0
[ 81.725370] [<ffffffff8102e5b0>] start_secondary+0x190/0x1f0
[ 81.725373] INITIAL USE at:
[ 81.725375] [<ffffffff8109a31a>] __lock_acquire+0x32a/0x1d70
[ 81.725385] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[ 81.725388] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[ 81.725390] [<ffffffff816dc3a3>] xfrm_timer_handler+0x43/0x290
[ 81.725394] [<ffffffff81059437>] __tasklet_hrtimer_trampoline+0x17/0x40
[ 81.725398] [<ffffffff8105a1b7>] tasklet_hi_action+0xd7/0xf0
[ 81.725401] [<ffffffff81059ac6>] __do_softirq+0xe6/0x2d0
[ 81.725404] [<ffffffff8105a026>] irq_exit+0x96/0xc0
[ 81.725407] [<ffffffff8177fd0a>] smp_apic_timer_interrupt+0x4a/0x60
[ 81.725409] [<ffffffff8177e96f>] apic_timer_interrupt+0x6f/0x80
[ 81.725412] [<ffffffff8100b7c6>] arch_cpu_idle+0x26/0x30
[ 81.725415] [<ffffffff810ace28>] cpu_startup_entry+0x88/0x2b0
[ 81.725417] [<ffffffff8102e5b0>] start_secondary+0x190/0x1f0
[ 81.725420] }
[ 81.725421] ... key at: [<ffffffff8295b9c8>] __key.46349+0x0/0x8
[ 81.725445] ... acquired at:
[ 81.725446] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[ 81.725449] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[ 81.725452] [<ffffffff816dc057>] __xfrm_state_delete+0x37/0x140
[ 81.725454] [<ffffffff816dc18c>] xfrm_state_delete+0x2c/0x50
[ 81.725456] [<ffffffff816dc277>] xfrm_state_flush+0xc7/0x1b0
[ 81.725458] [<ffffffffa005f6cc>] pfkey_flush+0x7c/0x100 [af_key]
[ 81.725465] [<ffffffffa005efb7>] pfkey_process+0x1c7/0x1f0 [af_key]
[ 81.725468] [<ffffffffa005f139>] pfkey_sendmsg+0x159/0x260 [af_key]
[ 81.725471] [<ffffffff8162c16f>] sock_sendmsg+0xaf/0xc0
[ 81.725476] [<ffffffff8162c99c>] SYSC_sendto+0xfc/0x130
[ 81.725479] [<ffffffff8162cf3e>] SyS_sendto+0xe/0x10
[ 81.725482] [<ffffffff8177dd12>] system_call_fastpath+0x16/0x1b
[ 81.725484]
[ 81.725486] -> (xfrm_state_lock){+.+...} ops: 11 {
[ 81.725490] HARDIRQ-ON-W at:
[ 81.725493] [<ffffffff8109a64b>] __lock_acquire+0x65b/0x1d70
[ 81.725504] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[ 81.725507] [<ffffffff81774e4b>] _raw_spin_lock_bh+0x3b/0x70
[ 81.725510] [<ffffffff816dc1df>] xfrm_state_flush+0x2f/0x1b0
[ 81.725513] [<ffffffffa005f6cc>] pfkey_flush+0x7c/0x100 [af_key]
[ 81.725516] [<ffffffffa005efb7>] pfkey_process+0x1c7/0x1f0 [af_key]
[ 81.725519] [<ffffffffa005f139>] pfkey_sendmsg+0x159/0x260 [af_key]
[ 81.725522] [<ffffffff8162c16f>] sock_sendmsg+0xaf/0xc0
[ 81.725525] [<ffffffff8162c99c>] SYSC_sendto+0xfc/0x130
[ 81.725527] [<ffffffff8162cf3e>] SyS_sendto+0xe/0x10
[ 81.725530] [<ffffffff8177dd12>] system_call_fastpath+0x16/0x1b
[ 81.725533] SOFTIRQ-ON-W at:
[ 81.725534] [<ffffffff8109a67a>] __lock_acquire+0x68a/0x1d70
[ 81.725537] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[ 81.725539] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[ 81.725541] [<ffffffff816dd751>] xfrm_stateonly_find+0x41/0x1f0
[ 81.725544] [<ffffffffa008af03>] mod_cur_headers+0x793/0x7f0 [pktgen]
[ 81.725547] [<ffffffffa008bca2>] pktgen_thread_worker+0xd42/0x1880 [pktgen]
[ 81.725550] [<ffffffff81078f84>] kthread+0xe4/0x100
[ 81.725555] [<ffffffff8177dc6c>] ret_from_fork+0x7c/0xb0
[ 81.725565] INITIAL USE at:
[ 81.725567] [<ffffffff8109a31a>] __lock_acquire+0x32a/0x1d70
[ 81.725569] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[ 81.725572] [<ffffffff81774e4b>] _raw_spin_lock_bh+0x3b/0x70
[ 81.725574] [<ffffffff816dc1df>] xfrm_state_flush+0x2f/0x1b0
[ 81.725576] [<ffffffffa005f6cc>] pfkey_flush+0x7c/0x100 [af_key]
[ 81.725580] [<ffffffffa005efb7>] pfkey_process+0x1c7/0x1f0 [af_key]
[ 81.725583] [<ffffffffa005f139>] pfkey_sendmsg+0x159/0x260 [af_key]
[ 81.725586] [<ffffffff8162c16f>] sock_sendmsg+0xaf/0xc0
[ 81.725589] [<ffffffff8162c99c>] SYSC_sendto+0xfc/0x130
[ 81.725594] [<ffffffff8162cf3e>] SyS_sendto+0xe/0x10
[ 81.725597] [<ffffffff8177dd12>] system_call_fastpath+0x16/0x1b
[ 81.725599] }
[ 81.725600] ... key at: [<ffffffff81cadef8>] xfrm_state_lock+0x18/0x50
[ 81.725606] ... acquired at:
[ 81.725607] [<ffffffff810995c0>] check_usage_backwards+0x110/0x150
[ 81.725609] [<ffffffff81099e96>] mark_lock+0x196/0x2f0
[ 81.725611] [<ffffffff8109a67a>] __lock_acquire+0x68a/0x1d70
[ 81.725614] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[ 81.725616] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[ 81.725627] [<ffffffff816dd751>] xfrm_stateonly_find+0x41/0x1f0
[ 81.725629] [<ffffffffa008af03>] mod_cur_headers+0x793/0x7f0 [pktgen]
[ 81.725632] [<ffffffffa008bca2>] pktgen_thread_worker+0xd42/0x1880 [pktgen]
[ 81.725635] [<ffffffff81078f84>] kthread+0xe4/0x100
[ 81.725637] [<ffffffff8177dc6c>] ret_from_fork+0x7c/0xb0
[ 81.725640]
[ 81.725641]
[ 81.725641] stack backtrace:
[ 81.725645] CPU: 0 PID: 2780 Comm: kpktgend_0 Not tainted 3.13.0-rc2+ #92
[ 81.725647] Hardware name: innotek GmbH VirtualBox, BIOS VirtualBox 12/01/2006
[ 81.725649] ffffffff82537b80 ffff880018199988 ffffffff8176af37 0000000000000007
[ 81.725652] ffff8800181999f0 ffff8800181999d8 ffffffff81099358 ffffffff82537b80
[ 81.725655] ffffffff81a32def ffff8800181999f4 0000000000000000 ffff880002cbeaa8
[ 81.725659] Call Trace:
[ 81.725664] [<ffffffff8176af37>] dump_stack+0x46/0x58
[ 81.725667] [<ffffffff81099358>] print_irq_inversion_bug.part.42+0x1e8/0x1f0
[ 81.725670] [<ffffffff810995c0>] check_usage_backwards+0x110/0x150
[ 81.725672] [<ffffffff81099e96>] mark_lock+0x196/0x2f0
[ 81.725675] [<ffffffff810994b0>] ? check_usage_forwards+0x150/0x150
[ 81.725685] [<ffffffff8109a67a>] __lock_acquire+0x68a/0x1d70
[ 81.725691] [<ffffffff810899a5>] ? sched_clock_local+0x25/0x90
[ 81.725694] [<ffffffff81089b38>] ? sched_clock_cpu+0xa8/0x120
[ 81.725697] [<ffffffff8109a31a>] ? __lock_acquire+0x32a/0x1d70
[ 81.725699] [<ffffffff816dd751>] ? xfrm_stateonly_find+0x41/0x1f0
[ 81.725702] [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[ 81.725704] [<ffffffff816dd751>] ? xfrm_stateonly_find+0x41/0x1f0
[ 81.725707] [<ffffffff810899a5>] ? sched_clock_local+0x25/0x90
[ 81.725710] [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[ 81.725712] [<ffffffff816dd751>] ? xfrm_stateonly_find+0x41/0x1f0
[ 81.725715] [<ffffffff810971ec>] ? lock_release_holdtime.part.26+0x1c/0x1a0
[ 81.725717] [<ffffffff816dd751>] xfrm_stateonly_find+0x41/0x1f0
[ 81.725721] [<ffffffffa008af03>] mod_cur_headers+0x793/0x7f0 [pktgen]
[ 81.725724] [<ffffffffa008bca2>] pktgen_thread_worker+0xd42/0x1880 [pktgen]
[ 81.725727] [<ffffffffa008ba71>] ? pktgen_thread_worker+0xb11/0x1880 [pktgen]
[ 81.725729] [<ffffffff8109cf9d>] ? trace_hardirqs_on+0xd/0x10
[ 81.725733] [<ffffffff81775410>] ? _raw_spin_unlock_irq+0x30/0x40
[ 81.725745] [<ffffffff8151faa0>] ? e1000_clean+0x9d0/0x9d0
[ 81.725751] [<ffffffff81094310>] ? __init_waitqueue_head+0x60/0x60
[ 81.725753] [<ffffffff81094310>] ? __init_waitqueue_head+0x60/0x60
[ 81.725757] [<ffffffffa008af60>] ? mod_cur_headers+0x7f0/0x7f0 [pktgen]
[ 81.725759] [<ffffffff81078f84>] kthread+0xe4/0x100
[ 81.725762] [<ffffffff81078ea0>] ? flush_kthread_worker+0x170/0x170
[ 81.725765] [<ffffffff8177dc6c>] ret_from_fork+0x7c/0xb0
[ 81.725768] [<ffffffff81078ea0>] ? flush_kthread_worker+0x170/0x170

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
3e94c2dcfd7ca297bd7e0a8d96be1e76dec711a3 24-Dec-2013 Weilong Chen <chenweilong@huawei.com> xfrm: checkpatch errors with foo * bar

This patch clean up some checkpatch errors like this:
ERROR: "foo * bar" should be "foo *bar"
ERROR: "(foo*)" should be "(foo *)"

Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
9b7a787d0da7db3127f6e04f8f8159632da50a36 24-Dec-2013 Weilong Chen <chenweilong@huawei.com> xfrm: checkpatch errors with space

This patch cleanup some space errors.

Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
776e9dd90ca223b82166eb2835389493b5914cba 16-Dec-2013 Fan Du <fan.du@windriver.com> xfrm: export verify_userspi_info for pkfey and netlink interface

In order to check against valid IPcomp spi range, export verify_userspi_info
for both pfkey and netlink interface.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
5b8ef3415a21f173ab115e90ec92c071a03f22d7 27-Aug-2013 Steffen Klassert <steffen.klassert@secunet.com> xfrm: Remove ancient sleeping when the SA is in acquire state

We now queue packets to the policy if the states are not yet resolved,
this replaces the ancient sleeping code. Also the sleeping can cause
indefinite task hangs if the needed state does not get resolved.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
283bc9f35bbbcb0e9ab4e6d2427da7f9f710d52d 07-Nov-2013 Fan Du <fan.du@windriver.com> xfrm: Namespacify xfrm state/policy locks

By semantics, xfrm layer is fully name space aware,
so will the locks, e.g. xfrm_state/pocliy_lock.
Ensure exclusive access into state/policy link list
for different name space with one global lock is not
right in terms of semantics aspect at first place,
as they are indeed mutually independent with each
other, but also more seriously causes scalability
problem.

One practical scenario is on a Open Network Stack,
more than hundreds of lxc tenants acts as routers
within one host, a global xfrm_state/policy_lock
becomes the bottleneck. But onces those locks are
decoupled in a per-namespace fashion, locks contend
is just with in specific name space scope, without
causing additional SPD/SAD access delay for other
name space.

Also this patch improve scalability while as without
changing original xfrm behavior.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
f59bbdfa5c6e2a2f74f0e03d1beab6ddb9b3d466 27-Sep-2013 Fan Du <fan.du@windriver.com> xfrm: Simplify SA looking up when using wildcard source

__xfrm4/6_state_addr_check is a four steps check, all we need to do
is checking whether the destination address match when looking SA
using wildcard source address. Passing saddr from flow is worst option,
as the checking needs to reach the fourth step while actually only
one time checking will do the work.

So, simplify this process by only checking destination address when
using wildcard source address for looking up SAs.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
6f1156383a419a4d19bdc196ffa8d4dbe2f01b36 23-Sep-2013 Fan Du <fan.du@windriver.com> xfrm: Force SA to be lookup again if SA in acquire state

If SA is in the process of acquiring, which indicates this SA is more
promising and precise than the fall back option, i.e. using wild card
source address for searching less suitable SA.

So, here bail out, and try again.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
0806ae4cc8722b2d2822fe3fa3f516f2da6b9459 23-Aug-2013 Nicolas Dichtel <nicolas.dichtel@6wind.com> xfrm: announce deleation of temporary SA

Creation of temporary SA are announced by netlink, but there is no notification
for the deletion.
This patch fix this asymmetric situation.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
99565a6c471cbb66caa68347c195133017559943 15-Aug-2013 Fan Du <fan.du@windriver.com> xfrm: Make xfrm_state timer monotonic

xfrm_state timer should be independent of system clock change,
so switch to CLOCK_BOOTTIME base which is not only monotonic but
also counting suspend time.

Thus issue reported in commit: 9e0d57fd6dad37d72a3ca6db00ca8c76f2215454
("xfrm: SAD entries do not expire correctly after suspend-resume")
could ALSO be avoided.

v2: Use CLOCK_BOOTTIME to count suspend time, but still monotonic.

Signed-off-by: Fan Du <fan.du@windriver.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
628e341f319f1a64a4639088faba952e4ec8f0a8 14-Aug-2013 Hannes Frederic Sowa <hannes@stressinduktion.org> xfrm: make local error reporting more robust

In xfrm4 and xfrm6 we need to take care about sockets of the other
address family. This could happen because a 6in4 or 4in6 tunnel could
get protected by ipsec.

Because we don't want to have a run-time dependency on ipv6 when only
using ipv4 xfrm we have to embed a pointer to the correct local_error
function in xfrm_state_afinet and look it up when returning an error
depending on the socket address family.

Thanks to vi0ss for the great bug report:
<https://bugzilla.kernel.org/show_bug.cgi?id=58691>

v2:
a) fix two more unsafe interpretations of skb->sk as ipv6 socket
(xfrm6_local_dontfrag and __xfrm6_output)
v3:
a) add an EXPORT_SYMBOL_GPL(xfrm_local_error) to fix a link error when
building ipv6 as a module (thanks to Steffen Klassert)

Reported-by: <vi0oss@gmail.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
e473fcb472574de978e47f980aeca510020a1286 26-Jun-2013 Mathias Krause <minipli@googlemail.com> xfrm: constify mark argument of xfrm_find_acq()

The mark argument is read only, so constify it. Also make dummy_mark in
af_key const -- only used as dummy argument for this very function.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
a947b0a93efa9a25c012aa88848f4cf8d9b41280 22-Feb-2013 Nicolas Dichtel <nicolas.dichtel@6wind.com> xfrm: allow to avoid copying DSCP during encapsulation

By default, DSCP is copying during encapsulation.
Copying the DSCP in IPsec tunneling may be a bit dangerous because packets with
different DSCP may get reordered relative to each other in the network and then
dropped by the remote IPsec GW if the reordering becomes too big compared to the
replay window.

It is possible to avoid this copy with netfilter rules, but it's very convenient
to be able to configure it for each SA directly.

This patch adds a toogle for this purpose. By default, it's not set to maintain
backward compatibility.

Field flags in struct xfrm_usersa_info is full, hence I add a new attribute.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
b67bfe0d42cac56c512dd5da4b1b347a23f4b70a 28-Feb-2013 Sasha Levin <sasha.levin@oracle.com> hlist: drop the node parameter from iterators

I'm not sure why, but the hlist for each entry iterators were conceived

list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
70e94e66aec255aff276397f5ed3f3626c548f1c 29-Jan-2013 YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org> xfrm: Convert xfrm_addr_cmp() to boolean xfrm_addr_equal().

All users of xfrm_addr_cmp() use its result as boolean.
Introduce xfrm_addr_equal() (which is equal to !xfrm_addr_cmp())
and convert all users.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7a9885b93bb623524468b37d04146d3422e099eb 17-Jan-2013 Cong Wang <amwang@redhat.com> xfrm: use separated locks to protect pointers of struct xfrm_state_afinfo

afinfo->type_map and afinfo->mode_map deserve separated locks,
they are different things.

We should just take RCU read lock to protect afinfo itself,
but not for the inner pointers.

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
85168c0036fadf3657470b93cd9babeee8d123d7 16-Jan-2013 Cong Wang <amwang@redhat.com> xfrm: replace rwlock on xfrm_km_list with rcu

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
44abdc3047aecafc141dfbaf1ed4bcb77f5d1652 16-Jan-2013 Cong Wang <amwang@redhat.com> xfrm: replace rwlock on xfrm_state_afinfo with rcu

Similar to commit 418a99ac6ad487dc9c42e6b0e85f941af56330f2
(Replace rwlock on xfrm_policy_afinfo with rcu), the rwlock
on xfrm_state_afinfo can be replaced by RCU too.

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
bb65a9cb953fdfe9c507e8dbb6c4ec2540484bd3 28-Dec-2012 Li RongQing <roy.qing.li@gmail.com> xfrm: removes a superfluous check and add a statistic

Remove the check if x->km.state equal to XFRM_STATE_VALID in
xfrm_state_check_expire(), which will be done before call
xfrm_state_check_expire().

add a LINUX_MIB_XFRMOUTSTATEINVALID statistic to record the
outbound error due to invalid xfrm state.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
e1760bd5ffae8cb98cffb030ee8e631eba28f3d8 11-Sep-2012 Eric W. Biederman <ebiederm@xmission.com> userns: Convert the audit loginuid to be a kuid

Always store audit loginuids in type kuid_t.

Print loginuids by converting them into uids in the appropriate user
namespace, and then printing the resulting uid.

Modify audit_get_loginuid to return a kuid_t.

Modify audit_set_loginuid to take a kuid_t.

Modify /proc/<pid>/loginuid on read to convert the loginuid into the
user namespace of the opener of the file.

Modify /proc/<pid>/loginud on write to convert the loginuid
rom the user namespace of the opener of the file.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: Paul Moore <paul@paul-moore.com> ?
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
15e473046cb6e5d18a4d0057e61d76315230382b 07-Sep-2012 Eric W. Biederman <ebiederm@xmission.com> netlink: Rename pid to portid to avoid confusion

It is a frequent mistake to confuse the netlink port identifier with a
process identifier. Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.

I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.

I have successfully built an allyesconfig kernel with this change.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
599901c3e4204e9d9c5a24df5402cd91617a2a26 29-Aug-2012 Julia Lawall <Julia.Lawall@lip6.fr> net/xfrm/xfrm_state.c: fix error return code

Initialize return variable before exiting on an error path.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
when != &ret
*if(...)
{
... when != ret = e2
when forall
return ret;
}

// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
65e0736bc2ac314bd374e93c24dd0698ac5ee66d 15-Aug-2012 Fan Du <fan.du@windriver.com> xfrm: remove redundant parameter "int dir" in struct xfrm_mgr.acquire

Sematically speaking, xfrm_mgr.acquire is called when kernel intends to ask
user space IKE daemon to negotiate SAs with peers. IOW the direction will
*always* be XFRM_POLICY_OUT, so remove int dir for clarity.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e3c0d04750751389d5116267f8cf4687444d9a50 30-Jul-2012 Fan Du <fdu@windriver.com> Fix unexpected SA hard expiration after changing date

After SA is setup, one timer is armed to detect soft/hard expiration,
however the timer handler uses xtime to do the math. This makes hard
expiration occurs first before soft expiration after setting new date
with big interval. As a result new child SA is deleted before rekeying
the new one.

Signed-off-by: Fan Du <fdu@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4e3fd7a06dc20b2d8ec6892233ad2012968fe7b6 21-Nov-2011 Alexey Dobriyan <adobriyan@gmail.com> net: remove ipv6_addr_copy()

C assignment can handle struct in6_addr copying.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8fcbc63701b01e913e6a13937f765fabf9c969c5 07-Jul-2011 Tushar Gohad <tgohad@mvista.com> XFRM: Fix memory leak in xfrm_state_update

Upon "ip xfrm state update ..", xfrm_add_sa() takes an extra reference on
the user-supplied SA and forgets to drop the reference when
xfrm_state_update() returns 0. This leads to a memory leak as the
parameter SA is never freed. This change attempts to fix the leak by
calling __xfrm_state_put() when xfrm_state_update() updates a valid SA
(err = 0). The parameter SA is added to the gc list when the final
reference is dropped by xfrm_add_sa() upon completion.

Signed-off-by: Tushar Gohad <tgohad@mvista.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
b71d1d426d263b0b6cb5760322efebbfc89d4463 22-Apr-2011 Eric Dumazet <eric.dumazet@gmail.com> inet: constify ip headers and in6_addr

Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers
where possible, to make code intention more obvious.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
af2f464e326ebad57284cfdecb03f1606e89bbc7 28-Mar-2011 Steffen Klassert <steffen.klassert@secunet.com> xfrm: Assign esn pointers when cloning a state

When we clone a xfrm state we have to assign the replay_esn
and the preplay_esn pointers to the state if we use the
new replay detection method. To this end, we add a
xfrm_replay_clone() function that allocates memory for
the replay detection and takes over the necessary values
from the original state.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
a454f0ccefbfdbfc0e1aa8a5f8010af5e48b8845 22-Mar-2011 Wei Yongjun <yjwei@cn.fujitsu.com> xfrm: Fix initialize repl field of struct xfrm_state

Commit 'xfrm: Move IPsec replay detection functions to a separate file'
(9fdc4883d92d20842c5acea77a4a21bb1574b495)
introduce repl field to struct xfrm_state, and only initialize it
under SA's netlink create path, the other path, such as pf_key,
ipcomp/ipcomp6 etc, the repl field remaining uninitialize. So if
the SA is created by pf_key, any input packet with SA's encryption
algorithm will cause panic.

int xfrm_input()
{
...
x->repl->advance(x, seq);
...
}

This patch fixed it by introduce new function __xfrm_init_state().

Pid: 0, comm: swapper Not tainted 2.6.38-next+ #14 Bochs Bochs
EIP: 0060:[<c078e5d5>] EFLAGS: 00010206 CPU: 0
EIP is at xfrm_input+0x31c/0x4cc
EAX: dd839c00 EBX: 00000084 ECX: 00000000 EDX: 01000000
ESI: dd839c00 EDI: de3a0780 EBP: dec1de88 ESP: dec1de64
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process swapper (pid: 0, ti=dec1c000 task=c09c0f20 task.ti=c0992000)
Stack:
00000000 00000000 00000002 c0ba27c0 00100000 01000000 de3a0798 c0ba27c0
00000033 dec1de98 c0786848 00000000 de3a0780 dec1dea4 c0786868 00000000
dec1debc c074ee56 e1da6b8c de3a0780 c074ed44 de3a07a8 dec1decc c074ef32
Call Trace:
[<c0786848>] xfrm4_rcv_encap+0x22/0x27
[<c0786868>] xfrm4_rcv+0x1b/0x1d
[<c074ee56>] ip_local_deliver_finish+0x112/0x1b1
[<c074ed44>] ? ip_local_deliver_finish+0x0/0x1b1
[<c074ef32>] NF_HOOK.clone.1+0x3d/0x44
[<c074ef77>] ip_local_deliver+0x3e/0x44
[<c074ed44>] ? ip_local_deliver_finish+0x0/0x1b1
[<c074ec03>] ip_rcv_finish+0x30a/0x332
[<c074e8f9>] ? ip_rcv_finish+0x0/0x332
[<c074ef32>] NF_HOOK.clone.1+0x3d/0x44
[<c074f188>] ip_rcv+0x20b/0x247
[<c074e8f9>] ? ip_rcv_finish+0x0/0x332
[<c072797d>] __netif_receive_skb+0x373/0x399
[<c0727bc1>] netif_receive_skb+0x4b/0x51
[<e0817e2a>] cp_rx_poll+0x210/0x2c4 [8139cp]
[<c072818f>] net_rx_action+0x9a/0x17d
[<c0445b5c>] __do_softirq+0xa1/0x149
[<c0445abb>] ? __do_softirq+0x0/0x149

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d8647b79c3b7e223ac051439d165bc8e7bbb832f 08-Mar-2011 Steffen Klassert <steffen.klassert@secunet.com> xfrm: Add user interface for esn and big anti-replay windows

This patch adds a netlink based user interface to configure
esn and big anti-replay windows. The new netlink attribute
XFRMA_REPLAY_ESN_VAL is used to configure the new implementation.
If the XFRM_STATE_ESN flag is set, we use esn and support for big
anti-replay windows for the configured state. If this flag is not
set we use the new implementation with 32 bit sequence numbers.
A big anti-replay window can be configured in this case anyway.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
9fdc4883d92d20842c5acea77a4a21bb1574b495 08-Mar-2011 Steffen Klassert <steffen.klassert@secunet.com> xfrm: Move IPsec replay detection functions to a separate file

To support multiple versions of replay detection, we move the replay
detection functions to a separate file and make them accessible
via function pointers contained in the struct xfrm_replay.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
1d28f42c1bd4bb2363d88df74d0128b4da135b4a 12-Mar-2011 David S. Miller <davem@davemloft.net> net: Put flowi_* prefix on AF independent members of struct flowi

I intend to turn struct flowi into a union of AF specific flowi
structs. There will be a common structure that each variant includes
first, much like struct sock_common.

This is the first step to move in that direction.

Signed-off-by: David S. Miller <davem@davemloft.net>
a70486f0e669730bad6713063e3f59e2e870044f 28-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Pass const xfrm_address_t objects to xfrm_state_lookup* and xfrm_find_acq.

Signed-off-by: David S. Miller <davem@davemloft.net>
33765d06033cc4ba4d9ae6d3d606ef3f28773c1b 24-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Const'ify xfrm_address_t args to xfrm_state_find.

This required a const'ification in xfrm_init_tempstate() too.

Signed-off-by: David S. Miller <davem@davemloft.net>
1f673c5fe2eca9007e60d82186473aa94090ea4c 24-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Remove unused 'saddr' and 'daddr' args to xfrm_state_look_at.

Signed-off-by: David S. Miller <davem@davemloft.net>
9aa600889be2f6a6a5fed85a33d4530920662965 24-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Const'ify xfrm_address_t args to __xfrm_state_lookup{,_byaddr}.

Signed-off-by: David S. Miller <davem@davemloft.net>
046860138e3f244d19e59c4fb1ef637803f3abbf 24-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Const'ify xfrm_tmpl arg to xfrm_init_tempstate.

Signed-off-by: David S. Miller <davem@davemloft.net>
2ab38503d0dff932cb657d8ef6055f28910ac0ef 24-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Const'ify xfrm_address_t args to xfrm_*_hash.

Signed-off-by: David S. Miller <davem@davemloft.net>
183cad12785ffc036571c4b789dc084ec61a1bad 24-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Const'ify pointer args to km_migrate() and implementations.

Signed-off-by: David S. Miller <davem@davemloft.net>
214e005bc32c7045b8554f9f0fb07b3fcce2cd42 24-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Pass km_event pointers around as const when possible.

Signed-off-by: David S. Miller <davem@davemloft.net>
b520e9f616f4f29c8d2557ba704b74ce6d79ff07 23-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Mark flowi arg to xfrm_state_find() const.

Signed-off-by: David S. Miller <davem@davemloft.net>
1a898592b2bde7b109e121ccb7498d40396fb5c7 23-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Mark flowi arg to xfrm_init_tempstate() const.

Signed-off-by: David S. Miller <davem@davemloft.net>
4a08ab0fe424925352729f1c99b39b1ed876fb14 23-Feb-2011 David S. Miller <davem@davemloft.net> xfrm: Mark flowi arg to xfrm_state_look_at() const.

Signed-off-by: David S. Miller <davem@davemloft.net>
78347c8c6b2ddf20535bc1b18d749a3bbdea2a60 07-Dec-2010 Thomas Egerer <thomas.egerer@secunet.com> xfrm: Fix xfrm_state_migrate leak

xfrm_state_migrate calls kfree instead of xfrm_state_put to free
a failed state. According to git commit 553f9118 this can cause
memory leaks.

Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
8444cf712c5f71845cba9dc30d8f530ff0d5ff83 20-Sep-2010 Thomas Egerer <thomas.egerer@secunet.com> xfrm: Allow different selector family in temporary state

The family parameter xfrm_state_find is used to find a state matching a
certain policy. This value is set to the template's family
(encap_family) right before xfrm_state_find is called.
The family parameter is however also used to construct a temporary state
in xfrm_state_find itself which is wrong for inter-family scenarios
because it produces a selector for the wrong family. Since this selector
is included in the xfrm_user_acquire structure, user space programs
misinterpret IPv6 addresses as IPv4 and vice versa.
This patch splits up the original init_tempsel function into a part that
initializes the selector respectively the props and id of the temporary
state, to allow for differing ip address families whithin the state.

Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
34996cb91dd72f0b0456d8fd3fef4aaee62232f2 31-Mar-2010 Herbert Xu <herbert@gondor.apana.org.au> xfrm: Remove xfrm_state_genid

The xfrm state genid only needs to be matched against the copy
saved in xfrm_dst. So we don't need a global genid at all. In
fact, we don't even need to initialise it.

Based on observation by Timo Teräs.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
5a0e3ad6af8660be21ca98a971cd00f331318c05 24-Mar-2010 Tejun Heo <tj@kernel.org> include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
3d6acfa7641fd0a35f608b142f61e79f7ed8db43 22-Feb-2010 Jamal Hadi Salim <hadi@cyberus.ca> xfrm: SA lookups with mark

Allow mark to be added to the SA lookup

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
bd55775c8dd656fc69b3a42a1c4ab32abb7e8af9 23-Feb-2010 Jamal Hadi Salim <hadi@cyberus.ca> xfrm: SA lookups signature with mark

pass mark to all SA lookups to prepare them for when we add code
to have them search.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
9e64cc9572b43afcbcd2d004538db435f2cd0587 19-Feb-2010 Jamal Hadi Salim <hadi@cyberus.ca> xfrm: Flushing empty SAD generates false events

To see the effect make sure you have an empty SAD.
On window1 "ip xfrm mon" and on window2 issue "ip xfrm state flush"
You get prompt back in window2 and you see the flush event on window1.
With this fix, you still get prompt on window1 but no event on window2.

Thanks to Alexey Dobriyan for finding a bug in earlier version
when using pfkey to do the flushing.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
069c474e88bb7753183f1eadbd7786c27888c8e3 17-Feb-2010 David S. Miller <davem@davemloft.net> xfrm: Revert false event eliding commits.

As reported by Alexey Dobriyan:

--------------------
setkey now takes several seconds to run this simple script
and it spits "recv: Resource temporarily unavailable" messages.

#!/usr/sbin/setkey -f
flush;
spdflush;

add A B ipcomp 44 -m tunnel -C deflate;
add B A ipcomp 45 -m tunnel -C deflate;

spdadd A B any -P in ipsec
ipcomp/tunnel/192.168.1.2-192.168.1.3/use;
spdadd B A any -P out ipsec
ipcomp/tunnel/192.168.1.3-192.168.1.2/use;
--------------------

Obviously applications want the events even when the table
is empty. So we cannot make this behavioral change.

Signed-off-by: David S. Miller <davem@davemloft.net>
6836b9bdd98e3b500cd49512484df68f46e14659 16-Feb-2010 jamal <hadi@cyberus.ca> xfrm: avoid spinlock in get_acqseq() used by xfrm user

Eric's version fixed it for pfkey. This one is for xfrm user.
I thought about amortizing those two get_acqseq()s but it seems
reasonable to have two of these sequence spaces for the two different
interfaces.

cheers,
jamal
commit d5168d5addbc999c94aacda8f28a4a173756a72b
Author: Jamal Hadi Salim <hadi@cyberus.ca>
Date: Tue Feb 16 06:51:22 2010 -0500

xfrm: avoid spinlock in get_acqseq() used by xfrm user

This is in the same spirit as commit 28aecb9d7728dc26bf03ce7925fe622023a83a2a
by Eric Dumazet.
Use atomic_inc_return() in get_acqseq() to avoid taking a spinlock

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
553f9118abc4fc53674fff87f6fe5fa3f56a41ed 15-Feb-2010 Herbert Xu <herbert@gondor.apana.org.au> xfrm: Fix xfrm_state_clone leak

xfrm_state_clone calls kfree instead of xfrm_state_put to free
a failed state. Depending on the state of the failed state, it
can cause leaks to things like module references.

All states should be freed by xfrm_state_put past the point of
xfrm_init_state.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
19f4c7133fc1b94001b997c4843d0a9192ee63e5 11-Feb-2010 jamal <hadi@cyberus.ca> xfrm: Flushing empty SAD generates false events

To see the effect make sure you have an empty SAD.
-On window1 "ip xfrm mon"
-on window2 issue "ip xfrm state flush"
You get prompt back in window1
and you see the flush event on window2.
With this fix, you still get prompt on window1 but no
event on window2.

I was tempted to return -ESRCH on window1 (which would
show "RTNETLINK answers: No such process") but didnt want
to change current behavior.

cheers,
jamal
commit 5f3dd4a772326166e1bcf54acc2391df00dc7ab5
Author: Jamal Hadi Salim <hadi@cyberus.ca>
Date: Thu Feb 11 04:41:36 2010 -0500

xfrm: Flushing empty SAD generates false events

To see the effect make sure you have an empty SAD.
On window1 "ip xfrm mon" and on window2 issue "ip xfrm state flush"
You get prompt back in window1 and you see the flush event on window2.
With this fix, you still get prompt on window1 but no event on window2.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>

Signed-off-by: David S. Miller <davem@davemloft.net>
e071041be037eca208b62b84469a06bdfc692bea 23-Jan-2010 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: fix "ip xfrm state|policy count" misreport

"ip xfrm state|policy count" report SA/SP count from init_net,
not from netns of caller process.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
4447bb33f09444920a8f1d89e1540137429351b6 25-Nov-2009 Martin Willi <martin@strongswan.org> xfrm: Store aalg in xfrm_state with a user specified truncation length

Adding a xfrm_state requires an authentication algorithm specified
either as xfrm_algo or as xfrm_algo_auth with a specific truncation
length. For compatibility, both attributes are dumped to userspace,
and we also accept both attributes, but prefer the new syntax.

If no truncation length is specified, or the authentication algorithm
is specified using xfrm_algo, the truncation length from the algorithm
description in the kernel is used.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9e0d57fd6dad37d72a3ca6db00ca8c76f2215454 09-Nov-2009 Yury Polyanskiy <polyanskiy@gmail.com> xfrm: SAD entries do not expire correctly after suspend-resume

This fixes the following bug in the current implementation of
net/xfrm: SAD entries timeouts do not count the time spent by the machine
in the suspended state. This leads to the connectivity problems because
after resuming local machine thinks that the SAD entry is still valid, while
it has already been expired on the remote server.

The cause of this is very simple: the timeouts in the net/xfrm are bound to
the old mod_timer() timers. This patch reassigns them to the
CLOCK_REALTIME hrtimer.

I have been using this version of the patch for a few months on my
machines without any problems. Also run a few stress tests w/o any
issues.

This version of the patch uses tasklet_hrtimer by Peter Zijlstra
(commit 9ba5f0).

This patch is against 2.6.31.4. Please CC me.

Signed-off-by: Yury Polyanskiy <polyanskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1802571b9865c0fc1d8d0fa39cf73275f3a75af3 28-Jun-2009 Wei Yongjun <yjwei@cn.fujitsu.com> xfrm: use xfrm_addr_cmp() instead of compare addresses directly

Clean up to use xfrm_addr_cmp() instead of compare addresses directly.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
6a783c9067e3f71aac61a9262fe42c1f68efd4fc 27-Apr-2009 Nicolas Dichtel <nicolas.dichtel@6wind.com> xfrm: wrong hash value for temporary SA

When kernel inserts a temporary SA for IKE, it uses the wrong hash
value for dst list. Two hash values were calcultated before: one with
source address and one with a wildcard source address.

Bug hinted by Junwei Zhang <junwei.zhang@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7d0b591c655ca0d72ebcbd242cf659a20a8995c5 27-Mar-2009 Chuck Ebbert <cebbert@redhat.com> xfrm: spin_lock() should be spin_unlock() in xfrm_state.c

spin_lock() should be spin_unlock() in xfrm_state_walk_done().

caused by:
commit 12a169e7d8f4b1c95252d8b04ed0f1033ed7cfe2
"ipsec: Put dumpers on the dump list"

Reported-by: Marc Milgram <mmilgram@redhat.com>
Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
08ec9af1c0622b0858099a8644a33af02dd3019f 13-Mar-2009 David S. Miller <davem@davemloft.net> xfrm: Fix xfrm_state_find() wrt. wildcard source address.

The change to make xfrm_state objects hash on source address
broke the case where such source addresses are wildcarded.

Fix this by doing a two phase lookup, first with fully specified
source address, next using saddr wildcarded.

Reported-by: Nicolas Dichtel <nicolas.dichtel@dev.6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d81d228567f55af517796638075dbbce9b40d7af 04-Dec-2008 Martin Willi <martin@strongswan.org> xfrm: Accept XFRM_STATE_AF_UNSPEC SAs on IPv4/IPv6 only hosts

Installing SAs using the XFRM_STATE_AF_UNSPEC fails on hosts with
support for one address family only. This patch accepts such SAs, even
if the processing of not supported packets will fail.

Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
b27aeadb5948d400df83db4d29590fb9862ba49d 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: per-netns sysctls

Make
net.core.xfrm_aevent_etime
net.core.xfrm_acq_expires
net.core.xfrm_aevent_rseqth
net.core.xfrm_larval_drop

sysctls per-netns.

For that make net_core_path[] global, register it to prevent two
/proc/net/core antries and change initcall position -- xfrm_init() is called
from fs_initcall, so this one should be fs_initcall at least.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7c2776ee21a60e0d370538bd08b9ed82979f6e3a 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: flush SA/SPDs on netns stop

SA/SPD doesn't pin netns (and it shouldn't), so get rid of them by hand.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
db983c1144884cab10d6397532f4bf05eb0c01d2 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: KM reporting in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a6483b790f8efcd8db190c1c0ff93f9d9efe919a 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: per-netns NETLINK_XFRM socket

Stub senders to init_net's one temporarily.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
284fa7da300adcb700b44df2f64a536b434d4650 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: state walking in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5447c5e401c49aba0c36bb1066f2d25b152553b7 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: finding states in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
12604d8aaa38ac4e24299c9803fefdb301a16421 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: fixup xfrm_alloc_spi()

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
221df1ed33c9284fc7a6f6e47ca7f8d5f3665d43 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: state lookup in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0e6024519b4da2d9413b97be1de8122d5709ccc1 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: state flush in netns

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
98806f75ba2afc716e4d2f915d3ac7687546f9c0 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: trivial netns propagations

Take netns from xfrm_state or xfrm_policy.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
64d0cd009718ce64cf0f388142ead7ea41f1f3c8 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: propagate netns into bydst/bysrc/byspi hash functions

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
50a30657fd7ee77a94a6bf0ad86eba7c37c3032e 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: per-netns km_waitq

Disallow spurious wakeups in __xfrm_lookup().

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c78371441c0d957f54c9f8a35b3ee5a378d14808 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: per-netns state GC work

State GC is per-netns, and this is part of it.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b8a0ae20b0eecd4b86a113d2abe2fa5a582b30a6 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: per-netns state GC list

km_waitq is going to be made per-netns to disallow spurious wakeups
in __xfrm_lookup().

To not wakeup after every garbage-collected xfrm_state (which potentially
can be from different netns) make state GC list per-netns.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
630827338585022b851ec0a6335df8e436c900e4 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: per-netns xfrm_hash_work

All of this is implicit passing which netns's hashes should be resized.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0bf7c5b019518d3fe9cb96b9c97bf44d251472c3 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: per-netns xfrm_state counts

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
529983ecabeae3d8e61c9e27079154b1b8544dcd 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: per-netns xfrm_state_hmask

Since hashtables are per-netns, they can be independently resized.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b754a4fd8f58d245c9b5e92914cce09c4309cb67 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: per-netns xfrm_state_byspi hash

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d320bbb306f2085892bc958781e8fcaf5d491589 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: per-netns xfrm_state_bysrc hash

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
73d189dce486cd6693fa29169b1aac0872efbcea 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: per-netns xfrm_state_bydst hash

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9d4139c76905833afcb77fe8ccc17f302a0eb9ab 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: per-netns xfrm_state_all list

This is done to get
a) simple "something leaked" check
b) cover possible DoSes when other netns puts many, many xfrm_states
onto a list.
c) not miss "alien xfrm_state" check in some of list iterators in future.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
673c09be457bb23aa0eaaa79804cbb342210d195 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: add struct xfrm_state::xs_net

To avoid unnecessary complications with passing netns around.

* set once, very early after allocating
* once set, never changes

For a while create every xfrm_state in init_net.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d62ddc21b674b5ac1466091ff3fbf7baa53bc92c 26-Nov-2008 Alexey Dobriyan <adobriyan@gmail.com> netns xfrm: add netns boilerplate

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
21454aaad30651ba0dcc16fe5271bc12ee21f132 31-Oct-2008 Harvey Harrison <harvey.harrison@gmail.com> net: replace NIPQUAD() in net/*/

Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u
can be replaced with %pI4

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5b095d98928fdb9e3b75be20a54b7a6cbf6ca9ad 29-Oct-2008 Harvey Harrison <harvey.harrison@gmail.com> net: replace %p6 with %pI6

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fdb46ee752ed05c94bac71fe3decdb5175ec6e1f 29-Oct-2008 Harvey Harrison <harvey.harrison@gmail.com> net, misc: replace uses of NIP6_FMT with %p6

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
13c1d18931ebb5cf407cb348ef2cd6284d68902d 05-Oct-2008 Arnaud Ebalard <arno@natisbad.org> xfrm: MIGRATE enhancements (draft-ebalard-mext-pfkey-enhanced-migrate)

Provides implementation of the enhancements of XFRM/PF_KEY MIGRATE mechanism
specified in draft-ebalard-mext-pfkey-enhanced-migrate-00. Defines associated
PF_KEY SADB_X_EXT_KMADDRESS extension and XFRM/netlink XFRMA_KMADDRESS
attribute.

Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
12a169e7d8f4b1c95252d8b04ed0f1033ed7cfe2 01-Oct-2008 Herbert Xu <herbert@gondor.apana.org.au> ipsec: Put dumpers on the dump list

Herbert Xu came up with the idea and the original patch to make
xfrm_state dump list contain also dumpers:

As it is we go to extraordinary lengths to ensure that states
don't go away while dumpers go to sleep. It's much easier if
we just put the dumpers themselves on the list since they can't
go away while they're going.

I've also changed the order of addition on new states to prevent
a never-ending dump.

Timo Teräs improved the patch to apply cleanly to latest tree,
modified iteration code to be more readable by using a common
struct for entries in the list, implemented the same idea for
xfrm_policy dumping and moved the af_key specific "last" entry
caching to af_key.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
5c1824587f0797373c95719a196f6098f7c6d20c 23-Sep-2008 Herbert Xu <herbert@gondor.apana.org.au> ipsec: Fix xfrm_state_walk race

As discovered by Timo Teräs, the currently xfrm_state_walk scheme
is racy because if a second dump finishes before the first, we
may free xfrm states that the first dump would walk over later.

This patch fixes this by storing the dumps in a list in order
to calculate the correct completion counter which cures this
problem.

I've expanded netlink_cb in order to accomodate the extra state
related to this. It shouldn't be a big deal since netlink_cb
is kmalloced for each dump and we're just increasing it by 4 or
8 bytes.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
08569908fffec3625e29eec7cf7577eaa512e719 10-Sep-2008 David S. Miller <davem@davemloft.net> ipsec: Add missing list_del() in xfrm_state_gc_task().

Otherwise entries stay on the GC todo list forever, even after we free
them.

Signed-off-by: David S. Miller <davem@davemloft.net>
abb81c4f3cb9b8d421f1e5474811ef1d461d341c 10-Sep-2008 Herbert Xu <herbert@gondor.apana.org.au> ipsec: Use RCU-like construct for saved state within a walk

Now that we save states within a walk we need synchronisation
so that the list the saved state is on doesn't disappear from
under us.

As it stands this is done by keeping the state on the list which
is bad because it gets in the way of the management of the state
life-cycle.

An alternative is to make our own pseudo-RCU system where we use
counters to indicate which state can't be freed immediately as
it may be referenced by an ongoing walk when that resumes.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
225f40055f779032974a9fce7b2f9c9eda04ff58 09-Sep-2008 Herbert Xu <herbert@gondor.apana.org.au> ipsec: Restore larval states and socket policies in dump

The commit commit 4c563f7669c10a12354b72b518c2287ffc6ebfb3 ("[XFRM]:
Speed up xfrm_policy and xfrm_state walking") inadvertently removed
larval states and socket policies from netlink dumps. This patch
restores them.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
37b08e34a98c664bea86e3fae718ac45a46b7276 03-Sep-2008 David S. Miller <davem@davemloft.net> ipsec: Fix deadlock in xfrm_state management.

Ever since commit 4c563f7669c10a12354b72b518c2287ffc6ebfb3
("[XFRM]: Speed up xfrm_policy and xfrm_state walking") it is
illegal to call __xfrm_state_destroy (and thus xfrm_state_put())
with xfrm_state_lock held. If we do, we'll deadlock since we
have the lock already and __xfrm_state_destroy() tries to take
it again.

Fix this by pushing the xfrm_state_put() calls after the lock
is dropped.

Signed-off-by: David S. Miller <davem@davemloft.net>
547b792cac0a038b9dbf958d3c120df3740b5572 26-Jul-2008 Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> net: convert BUG_TRAP to generic WARN_ON

Removes legacy reinvent-the-wheel type thing. The generic
machinery integrates much better to automated debugging aids
such as kerneloops.org (and others), and is unambiguous due to
better naming. Non-intuively BUG_TRAP() is actually equal to
WARN_ON() rather than BUG_ON() though some might actually be
promoted to BUG_ON() but I left that to future.

I could make at least one BUILD_BUG_ON conversion.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2532386f480eefbdd67b48be55fb4fb3e5a6081c 18-Apr-2008 Eric Paris <eparis@redhat.com> Audit: collect sessionid in netlink messages

Previously I added sessionid output to all audit messages where it was
available but we still didn't know the sessionid of the sender of
netlink messages. This patch adds that information to netlink messages
so we can audit who sent netlink messages.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
5e2c433d9f84dd9b0e01ef8607380d53a7f64d69 27-Apr-2008 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [XFRM] AUDIT: Fix flowlabel text format ambibuity.

Flowlabel text format was not correct and thus ambiguous.
For example, 0x00123 or 0x01203 are formatted as 0x123.
This is not what audit tools want.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
df9dcb4588aca9cc243cf1f3f454361a84e1cbdb 24-Mar-2008 Kazunori MIYAZAWA <kazunori@miyazawa.org> [IPSEC]: Fix inter address family IPsec tunnel handling.

Signed-off-by: Kazunori MIYAZAWA <kazunori@miyazawa.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4c563f7669c10a12354b72b518c2287ffc6ebfb3 29-Feb-2008 Timo Teras <timo.teras@iki.fi> [XFRM]: Speed up xfrm_policy and xfrm_state walking

Change xfrm_policy and xfrm_state walking algorithm from O(n^2) to O(n).
This is achieved adding the entries to one more list which is used
solely for walking the entries.

This also fixes some races where the dump can have duplicate or missing
entries when the SPD/SADB is modified during an ongoing dump.

Dumping SADB with 20000 entries using "time ip xfrm state" the sys
time dropped from 1.012s to 0.080s.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
0c11b9428f619ab377c92eff2f160a834a6585dd 10-Jan-2008 Al Viro <viro@zeniv.linux.org.uk> [PATCH] switch audit_get_loginuid() to task_struct *

all callers pass something->audit_context

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
5255dc6e14ce640ccb3e062362510a00ac59bbcd 01-Feb-2008 Adrian Bunk <bunk@kernel.org> [XFRM]: Remove unused exports.

This patch removes the following no longer used EXPORT_SYMBOL's:
- xfrm_input.c: xfrm_parse_spi
- xfrm_state.c: xfrm_replay_check
- xfrm_state.c: xfrm_replay_advance

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
533cb5b0a63f28ecab5503cfceb77e641fa7f7c4 31-Jan-2008 Eric Dumazet <dada1@cosmosbay.com> [XFRM]: constify 'struct xfrm_type'

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6666351df90656677723f8232b3fdd26a500e51e 08-Jan-2008 Eric Dumazet <dada1@cosmosbay.com> [XFRM]: xfrm_state_clone() should be static, not exported

xfrm_state_clone() is not used outside of net/xfrm/xfrm_state.c
There is no need to export it.

Spoted by sparse checker.
CHECK net/xfrm/xfrm_state.c
net/xfrm/xfrm_state.c:1103:19: warning: symbol 'xfrm_state_clone' was not
declared. Should it be static?

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cf35f43e6e41b160d8dedd80a127210fd3be9ada 06-Jan-2008 Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> [XFRM]: Kill some bloat

net/xfrm/xfrm_state.c:
xfrm_audit_state_delete | -589
xfrm_replay_check | -542
xfrm_audit_state_icvfail | -520
xfrm_audit_state_add | -589
xfrm_audit_state_replay_overflow | -523
xfrm_audit_state_notfound_simple | -509
xfrm_audit_state_notfound | -521
7 functions changed, 3793 bytes removed, diff: -3793

net/xfrm/xfrm_state.c:
xfrm_audit_helper_pktinfo | +522
xfrm_audit_helper_sainfo | +598
2 functions changed, 1120 bytes added, diff: +1120

net/xfrm/xfrm_state.o:
9 functions changed, 1120 bytes added, 3793 bytes removed, diff: -2673

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
9a429c4983deae020f1e757ecc8f547b6d4e2f2b 02-Jan-2008 Eric Dumazet <dada1@cosmosbay.com> [NET]: Add some acquires/releases sparse annotations.

Add __acquires() and __releases() annotations to suppress some sparse
warnings.

example of warnings :

net/ipv4/udp.c:1555:14: warning: context imbalance in 'udp_seq_start' - wrong
count at exit
net/ipv4/udp.c:1571:13: warning: context imbalance in 'udp_seq_stop' -
unexpected unlock

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
afeb14b49098ba7a51c96e083a4105a0301f94c4 21-Dec-2007 Paul Moore <paul.moore@hp.com> [XFRM]: RFC4303 compliant auditing

This patch adds a number of new IPsec audit events to meet the auditing
requirements of RFC4303. This includes audit hooks for the following events:

* Could not find a valid SA [sections 2.1, 3.4.2]
. xfrm_audit_state_notfound()
. xfrm_audit_state_notfound_simple()

* Sequence number overflow [section 3.3.3]
. xfrm_audit_state_replay_overflow()

* Replayed packet [section 3.4.3]
. xfrm_audit_state_replay()

* Integrity check failure [sections 3.4.4.1, 3.4.4.2]
. xfrm_audit_state_icvfail()

While RFC4304 deals only with ESP most of the changes in this patch apply to
IPsec in general, i.e. both AH and ESP. The one case, integrity check
failure, where ESP specific code had to be modified the same was done to the
AH code for the sake of consistency.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
68277accb3a5f004344f4346498640601b8b7016 21-Dec-2007 Paul Moore <paul.moore@hp.com> [XFRM]: Assorted IPsec fixups

This patch fixes a number of small but potentially troublesome things in the
XFRM/IPsec code:

* Use the 'audit_enabled' variable already in include/linux/audit.h
Removed the need for extern declarations local to each XFRM audit fuction

* Convert 'sid' to 'secid' everywhere we can
The 'sid' name is specific to SELinux, 'secid' is the common naming
convention used by the kernel when refering to tokenized LSM labels,
unfortunately we have to leave 'ctx_sid' in 'struct xfrm_sec_ctx' otherwise
we risk breaking userspace

* Convert address display to use standard NIP* macros
Similar to what was recently done with the SPD audit code, this also also
includes the removal of some unnecessary memcpy() calls

* Move common code to xfrm_audit_common_stateinfo()
Code consolidation from the "less is more" book on software development

* Proper spacing around commas in function arguments
Minor style tweak since I was already touching the code

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4bda4f250d21c3e4f2a2da5f4cef829a434a4046 14-Dec-2007 Pavel Emelyanov <xemul@openvz.org> [XFRM]: Fix potential race vs xfrm_state(only)_find and xfrm_hash_resize.

The _find calls calculate the hash value using the
xfrm_state_hmask, without the xfrm_state_lock. But the
value of this mask can change in the _resize call under
the state_lock, so we risk to fail in finding the desired
entry in hash.

I think, that the hash value is better to calculate
under the state lock.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
df01812eba19834e48abd43246abedfbc4feeb7e 07-Dec-2007 Denis Cheng <crquan@gmail.com> [XFRM] net/xfrm/xfrm_state.c: use LIST_HEAD instead of LIST_HEAD_INIT

single list_head variable initialized with LIST_HEAD_INIT could almost
always can be replaced with LIST_HEAD declaration, this shrinks the code
and looks better.

Signed-off-by: Denis Cheng <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b24b8a247ff65c01b252025926fe564209fae4fc 24-Jan-2008 Pavel Emelyanov <xemul@openvz.org> [NET]: Convert init_timer into setup_timer

Many-many code in the kernel initialized the timer->function
and timer->data together with calling init_timer(timer). There
is already a helper for this. Use it for networking code.

The patch is HUGE, but makes the code 130 lines shorter
(98 insertions(+), 228 deletions(-)).

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2d60abc2a937bf77575c3b8c83faeeb84a84e654 04-Jan-2008 Eric Dumazet <dada1@cosmosbay.com> [XFRM]: Do not define km_migrate() if !CONFIG_XFRM_MIGRATE

In include/net/xfrm.h we find :

#ifdef CONFIG_XFRM_MIGRATE
extern int km_migrate(struct xfrm_selector *sel, u8 dir, u8 type,
struct xfrm_migrate *m, int num_bundles);
...
#endif

We can also guard the function body itself in net/xfrm/xfrm_state.c
with same condition.

(Problem spoted by sparse checker)
make C=2 net/xfrm/xfrm_state.o
...
net/xfrm/xfrm_state.c:1765:5: warning: symbol 'km_migrate' was not declared. Should it be static?
...

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5951cab136d8b7e84696061dc2e69c402bc94f61 20-Dec-2007 Paul Moore <paul.moore@hp.com> [XFRM]: Audit function arguments misordered

In several places the arguments to the xfrm_audit_start() function are
in the wrong order resulting in incorrect user information being
reported. This patch corrects this by pacing the arguments in the
correct order.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9ab4c954ce2b2b3c485bee7e425fda05946893be 12-Dec-2007 Paul Moore <paul.moore@hp.com> [XFRM]: Display the audited SPI value in host byte order.

Currently the IPsec protocol SPI values are written to the audit log in
network byte order which is different from almost all other values which
are recorded in host byte order. This patch corrects this inconsistency
by writing the SPI values to the audit record in host byte order.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5dba4797115c8fa05c1a4d12927a6ae0b33ffc41 27-Nov-2007 Patrick McHardy <kaber@trash.net> [XFRM]: Fix leak of expired xfrm_states

The xfrm_timer calls __xfrm_state_delete, which drops the final reference
manually without triggering destruction of the state. Change it to use
xfrm_state_put to add the state to the gc list when we're dropping the
last reference. The timer function may still continue to use the state
safely since the final destruction does a del_timer_sync().

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
13996378e6585fb25e582afe7489bf52dde78deb 18-Oct-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Rename mode to outer_mode and add inner_mode

This patch adds a new field to xfrm states called inner_mode. The existing
mode object is renamed to outer_mode.

This is the first part of an attempt to fix inter-family transforms. As it
is we always use the outer family when determining which mode to use. As a
result we may end up shoving IPv4 packets into netfilter6 and vice versa.

What we really want is to use the inner family for the first part of outbound
processing and the outer family for the second part. For inbound processing
we'd use the opposite pairing.

I've also added a check to prevent silly combinations such as transport mode
with inter-family transforms.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
17c2a42a24e1e8dd6aa7cea4f84e034ab1bfff31 18-Oct-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Store afinfo pointer in xfrm_mode

It is convenient to have a pointer from xfrm_state to address-specific
functions such as the output function for a family. Currently the
address-specific policy code calls out to the xfrm state code to get
those pointers when we could get it in an easier way via the state
itself.

This patch adds an xfrm_state_afinfo to xfrm_mode (since they're
address-specific) and changes the policy code to use it. I've also
added an owner field to do reference counting on the module providing
the afinfo even though it isn't strictly necessary today since IPv6
can't be unloaded yet.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
aa5d62cc8777f733f8b59b5586c0a1989813189e 18-Oct-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Move type and mode map into xfrm_state.c

The type and mode maps are only used by SAs, not policies. So it makes
sense to move them from xfrm_policy.c into xfrm_state.c. This also allows
us to mark xfrm_get_type/xfrm_put_type/xfrm_get_mode/xfrm_put_mode as
static.

The only other change I've made in the move is to get rid of the casts
on the request_module call for types. They're unnecessary because C
will promote them to ints anyway.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
658b219e9379d75fbdc578b9630b598098471258 09-Oct-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Move common code into xfrm_alloc_spi

This patch moves some common code that conceptually belongs to the xfrm core
from af_key/xfrm_user into xfrm_alloc_spi.

In particular, the spin lock on the state is now taken inside xfrm_alloc_spi.
Previously it also protected the construction of the response PF_KEY/XFRM
messages to user-space. This is inconsistent as other identical constructions
are not protected by the state lock. This is bad because they in fact should
be protected but only in certain spots (so as not to hold the lock for too
long which may cause packet drops).

The SPI byte order conversion has also been moved.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
75ba28c633952f7994a7117c98ae6515b58f8d30 09-Oct-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Remove gratuitous km wake-up events on ACQUIRE

There is no point in waking people up when creating/updating larval states
because they'll just go back to sleep again as larval states by definition
cannot be found by xfrm_state_find.

We should only wake them up when the larvals mature or die.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
cdf7e668d4327a33e11be04c4cb9bcc604eaaa0f 09-Oct-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Unexport xfrm_replay_notify

Now that the only callers of xfrm_replay_notify are in xfrm, we can remove
the export.

This patch also removes xfrm_aevent_doreplay since it's now called in just
one spot.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
83815dea47cf3e98ccbb6aecda08cba1ba91208f 09-Oct-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Move xfrm_state_check into xfrm_output.c

The functions xfrm_state_check and xfrm_state_check_space are only used by
the output code in xfrm_output.c so we can move them over.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
ab5f5e8b144e4c804ef3aa1ce08a9ca9f01187ce 17-Sep-2007 Joy Latten <latten@austin.ibm.com> [XFRM]: xfrm audit calls

This patch modifies the current ipsec audit layer
by breaking it up into purpose driven audit calls.

So far, the only audit calls made are when add/delete
an SA/policy. It had been discussed to give each
key manager it's own calls to do this, but I found
there to be much redundnacy since they did the exact
same things, except for how they got auid and sid, so I
combined them. The below audit calls can be made by any
key manager. Hopefully, this is ok.

Signed-off-by: Joy Latten <latten@austin.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b5890d8ba47741425fe3c0d753e1b57bc0561b7b 11-Aug-2007 Jesper Juhl <jesper.juhl@gmail.com> [XFRM]: Clean up duplicate includes in net/xfrm/

This patch cleans up duplicate includes in
net/xfrm/

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
48b8d78315bf2aef4b6b4fb41c2c94e0b6600234 26-Jul-2007 Joakim Koskela <jookos@gmail.com> [XFRM]: State selection update to use inner addresses.

This patch modifies the xfrm state selection logic to use the inner
addresses where the outer have been (incorrectly) used. This is
required for beet mode in general and interfamily setups in both
tunnel and beet mode.

Signed-off-by: Joakim Koskela <jookos@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Diego Beltrami <diego.beltrami@gmail.com>
Signed-off-by: Miika Komu <miika@iki.fi>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7dc12d6dd6cc1aa489c6f3e34a75e8023c945da8 19-Jul-2007 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [NET] XFRM: Fix whitespace errors.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
628529b6ee334fedc8d25ce56205bb99566572b9 03-Jul-2007 Jamal Hadi Salim <hadi@cyberus.ca> [XFRM] Introduce standalone SAD lookup

This allows other in-kernel functions to do SAD lookups.
The only known user at the moment is pktgen.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
281216177a407f78cfd650ee4391afc487577193 19-Jun-2007 Patrick McHardy <kaber@trash.net> [XFRM]: Fix MTU calculation for non-ESP SAs

My IPsec MTU optimization patch introduced a regression in MTU calculation
for non-ESP SAs, the SA's header_len needs to be subtracted from the MTU if
the transform doesn't provide a ->get_mtu() function.

Reported-and-tested-by: Marco Berizzi <pupilla@hotmail.com>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
4aa2e62c45b5ca08be2d0d3c0744d7585b56e860 05-Jun-2007 Joy Latten <latten@austin.ibm.com> xfrm: Add security check before flushing SAD/SPD

Currently we check for permission before deleting entries from SAD and
SPD, (see security_xfrm_policy_delete() security_xfrm_state_delete())
However we are not checking for authorization when flushing the SPD and
the SAD completely. It was perhaps missed in the original security hooks
patch.

This patch adds a security check when flushing entries from the SAD and
SPD. It runs the entire database and checks each entry for a denial.
If the process attempting the flush is unable to remove all of the
entries a denial is logged the the flush function returns an error
without removing anything.

This is particularly useful when a process may need to create or delete
its own xfrm entries used for things like labeled networking but that
same process should not be able to delete other entries or flush the
entire database.

Signed-off-by: Joy Latten<latten@austin.ibm.com>
Signed-off-by: Eric Paris <eparis@parisplace.org>
Signed-off-by: James Morris <jmorris@namei.org>
01e67d08faa782f1a4d38de702331f5904def6ad 25-May-2007 David S. Miller <davem@sunset.davemloft.net> [XFRM]: Allow XFRM_ACQ_EXPIRES to be tunable via sysctl.

Signed-off-by: David S. Miller <davem@davemloft.net>
af11e31609d93765c1b22611592543e028f7aa54 04-May-2007 Jamal Hadi Salim <hadi@cyberus.ca> [XFRM] SAD info TLV aggregationx

Aggregate the SAD info TLVs.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
28d8909bc790d936ce33f4402adf7577533bbd4b 26-Apr-2007 Jamal Hadi Salim <hadi@cyberus.ca> [XFRM]: Export SAD info.

On a system with a lot of SAs, counting SAD entries chews useful
CPU time since you need to dump the whole SAD to user space;
i.e something like ip xfrm state ls | grep -i src | wc -l
I have seen taking literally minutes on a 40K SAs when the system
is swapping.
With this patch, some of the SAD info (that was already being tracked)
is exposed to user space. i.e you do:
ip xfrm state count
And you get the count; you can also pass -s to the command line and
get the hash info.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
3ff50b7997fe06cd5d276b229967bb52d6b3b6c1 21-Apr-2007 Stephen Hemminger <shemminger@linux-foundation.org> [NET]: cleanup extra semicolons

Spring cleaning time...

There seems to be a lot of places in the network code that have
extra bogus semicolons after conditionals. Most commonly is a
bogus semicolon after: switch() { }

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
c5c2523893747f88a83376abad310c8ad13f7197 09-Apr-2007 Patrick McHardy <kaber@trash.net> [XFRM]: Optimize MTU calculation

Replace the probing based MTU estimation, which usually takes 2-3 iterations
to find a fitting value and may underestimate the MTU, by an exact calculation.

Also fix underestimation of the XFRM trailer_len, which causes unnecessary
reallocations.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
9d729f72dca9406025bcfa9c1f660d71d9ef0ff5 05-Mar-2007 James Morris <jmorris@namei.org> [NET]: Convert xtime.tv_sec to get_seconds()

Where appropriate, convert references to xtime.tv_sec to the
get_seconds() helper function.

Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4c4d51a7316b164ba08af61aa0c124a88bc60450 05-Apr-2007 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Reject packets within replay window but outside the bit mask

Up until this point we've accepted replay window settings greater than
32 but our bit mask can only accomodate 32 packets. Thus any packet
with a sequence number within the window but outside the bit mask would
be accepted.

This patch causes those packets to be rejected instead.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
75e252d981c0e80c14ce90df246e9b1300474c4f 13-Mar-2007 Joy Latten <latten@austin.ibm.com> [XFRM]: Fix missing protocol comparison of larval SAs.

I noticed that in xfrm_state_add we look for the larval SA in a few
places without checking for protocol match. So when using both
AH and ESP, whichever one gets added first, deletes the larval SA.
It seems AH always gets added first and ESP is always the larval
SA's protocol since the xfrm->tmpl has it first. Thus causing the
additional km_query()

Adding the check eliminates accidental double SA creation.

Signed-off-by: Joy Latten <latten@austin.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a716c1197d608c55adfba45692a890ca64e10df0 09-Feb-2007 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [NET] XFRM: Fix whitespace errors.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
80c9abaabf4283f7cf4a0b3597cd302506635b7f 08-Feb-2007 Shinta Sugimoto <shinta.sugimoto@ericsson.com> [XFRM]: Extension for dynamic update of endpoint address(es)

Extend the XFRM framework so that endpoint address(es) in the XFRM
databases could be dynamically updated according to a request (MIGRATE
message) from user application. Target XFRM policy is first identified
by the selector in the MIGRATE message. Next, the endpoint addresses
of the matching templates and XFRM states are updated according to
the MIGRATE message.

Signed-off-by: Shinta Sugimoto <shinta.sugimoto@ericsson.com>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
cdca72652adf597f7fef821a27595fd0dd5eea19 06-Feb-2007 Miika Komu <miika@iki.fi> [IPSEC]: exporting xfrm_state_afinfo

This patch exports xfrm_state_afinfo.

Signed-off-by: Miika Komu <miika@iki.fi>
Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi>
Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
c9204d9ca79baac564b49d36d0228a69d7ded084 30-Nov-2006 Joy Latten <latten@austin.ibm.com> audit: disable ipsec auditing when CONFIG_AUDITSYSCALL=n

Disables auditing in ipsec when CONFIG_AUDITSYSCALL is
disabled in the kernel.

Also includes a bug fix for xfrm_state.c as a result of
original ipsec audit patch.

Signed-off-by: Joy Latten <latten@austin.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
161a09e737f0761ca064ee6a907313402f7a54b6 27-Nov-2006 Joy Latten <latten@austin.ibm.com> audit: Add auditing to ipsec

An audit message occurs when an ipsec SA
or ipsec policy is created/deleted.

Signed-off-by: Joy Latten <latten@austin.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
94b9bb5480e73cec4552b19fc3f809742b4ebf67 05-Dec-2006 Jamal Hadi Salim <hadi@cyberus.ca> [XFRM] Optimize SA dumping

Same comments as in "[XFRM] Optimize policy dumping"

The numbers are (20K SAs):
5d36b1803d875cf101fdb972ff9c56663e508e39 08-Nov-2006 Al Viro <viro@zeniv.linux.org.uk> [XFRM]: annotate ->new_mapping()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
c4028958b6ecad064b1a6303a6a5906d4fe48d73 22-Nov-2006 David Howells <dhowells@redhat.com> WorkStruct: make allyesconfig

Fix up for make allyesconfig.

Signed-Off-By: David Howells <dhowells@redhat.com>
2fab22f2d3290ff7c602fe62f22e825c48e97a06 25-Oct-2006 Patrick McHardy <kaber@trash.net> [XFRM]: Fix xfrm_state accounting

xfrm_state_num needs to be increased for XFRM_STATE_ACQ states created
by xfrm_state_find() to prevent the counter from going negative when
the state is destroyed.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
918049f0135854a1583f9b3b88f44dbf2b027329 13-Oct-2006 David S. Miller <davem@sunset.davemloft.net> [XFRM]: Fix xfrm_state_num going negative.

Missing counter bump when hashing in a new ACQ
xfrm_state.

Now that we have two spots to do the hash grow
check, break it out into a helper function.

Signed-off-by: David S. Miller <davem@davemloft.net>
667bbcb6c099d1b74f95c6963ddf37a32e7afc29 04-Oct-2006 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM] STATE: Use destination address for src hash.

Src hash is introduced for Mobile IPv6 route optimization usage.
On current kenrel code it is calculated with source address only.
It results we uses the same hash value for outbound state (when
the node has only one address for Mobile IPv6).
This patch use also destination address as peer information for
src hash to be dispersed.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7b4dc3600e4877178ba94c7fbf7e520421378aa6 28-Sep-2006 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM]: Do not add a state whose SPI is zero to the SPI hash.

SPI=0 is used for acquired IPsec SA and MIPv6 RO state.
Such state should not be added to the SPI hash
because we do not care about it on deleting path.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
8122adf06e6f93ae608cf8227e878cc44f4a8fd1 28-Sep-2006 Al Viro <viro@zeniv.linux.org.uk> [XFRM]: xfrm_spi_hash() annotations

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
61f4627b2fecce9d5c9645e4b47e75a0c29ad8c3 28-Sep-2006 Al Viro <viro@zeniv.linux.org.uk> [XFRM]: xfrm_replay_advance() annotations

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
a252cc2371930debe3162f1ac91467b9791324cb 28-Sep-2006 Al Viro <viro@zeniv.linux.org.uk> [XFRM]: xrfm_replay_check() annotations

seq argument is net-endian

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
a94cfd19744a568d97b14bbaa500b2a0c3684f34 28-Sep-2006 Al Viro <viro@zeniv.linux.org.uk> [XFRM]: xfrm_state_lookup() annotations

spi argument of xfrm_state_lookup() is net-endian

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
26977b4ed728ae911a162b16dbfe1a165b7cf9a1 28-Sep-2006 Al Viro <viro@zeniv.linux.org.uk> [XFRM]: xfrm_alloc_spi() annotated

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
a9917c06652165fe4eeb9ab7a5d1e0674e90e508 01-Sep-2006 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM] STATE: Fix flusing with hash mask.

This is a minor fix about transformation state flushing
for net-2.6.19. Please apply it.

Signed-off-by: David S. Miller <davem@davemloft.net>
44e36b42a8378be1dcf7e6f8a1cb2710a8903387 24-Aug-2006 David S. Miller <davem@sunset.davemloft.net> [XFRM]: Extract common hashing code into xfrm_hash.[ch]

Signed-off-by: David S. Miller <davem@davemloft.net>
c1969f294e624d5b642fc8e6ab9468b7c7791fa8 24-Aug-2006 David S. Miller <davem@sunset.davemloft.net> [XFRM]: Hash xfrm_state objects by source address too.

The source address is always non-prefixed so we should use
it to help give entropy to the bydst hash.

Signed-off-by: David S. Miller <davem@davemloft.net>
a47f0ce05ae12ce9acad62896ff703175764104e 24-Aug-2006 David S. Miller <davem@sunset.davemloft.net> [XFRM]: Kill excessive refcounting of xfrm_state objects.

The refcounting done for timers and hash table insertions
are just wasted cycles. We can eliminate all of this
refcounting because:

1) The implicit refcount when the xfrm_state object is active
will always be held while the object is in the hash tables.
We never kfree() the xfrm_state until long after we've made
sure that it has been unhashed.

2) Timers are even easier. Once we mark that x->km.state as
anything other than XFRM_STATE_VALID (__xfrm_state_delete
sets it to XFRM_STATE_DEAD), any timer that fires will
do nothing and return without rearming the timer.

Therefore we can defer the del_timer calls until when the
object is about to be freed up during GC. We have to use
del_timer_sync() and defer it to GC because we can't do
a del_timer_sync() while holding x->lock which all callers
of __xfrm_state_delete hold.

This makes SA changes even more light-weight.

Signed-off-by: David S. Miller <davem@davemloft.net>
1c0953997567b22e32fdf85d3b4bc0f2461fd161 24-Aug-2006 David S. Miller <davem@sunset.davemloft.net> [XFRM]: Purge dst references to deleted SAs passively.

Just let GC and other normal mechanisms take care of getting
rid of DST cache references to deleted xfrm_state objects
instead of walking all the policy bundles.

Signed-off-by: David S. Miller <davem@davemloft.net>
c7f5ea3a4d1ae6b3b426e113358fdc57494bc754 24-Aug-2006 David S. Miller <davem@sunset.davemloft.net> [XFRM]: Do not flush all bundles on SA insert.

Instead, simply set all potentially aliasing existing xfrm_state
objects to have the current generation counter value.

This will make routes get relooked up the next time an existing
route mentioning these aliased xfrm_state objects gets used,
via xfrm_dst_check().

Signed-off-by: David S. Miller <davem@davemloft.net>
2575b65434d56559bd03854450b9b6aaf19b9c90 24-Aug-2006 David S. Miller <davem@sunset.davemloft.net> [XFRM]: Simplify xfrm_spi_hash

It can use __xfrm{4,6}_addr_hash().

Signed-off-by: David S. Miller <davem@davemloft.net>
a624c108e5595b5827796c253481436929cd5344 24-Aug-2006 David S. Miller <davem@sunset.davemloft.net> [XFRM]: Put more keys into destination hash function.

Besides the daddr, key the hash on family and reqid too.

Signed-off-by: David S. Miller <davem@davemloft.net>
9d4a706d852411154d0c91b9ffb3bec68b94b25c 24-Aug-2006 David S. Miller <davem@sunset.davemloft.net> [XFRM]: Add generation count to xfrm_state and xfrm_dst.

Each xfrm_state inserted gets a new generation counter
value. When a bundle is created, the xfrm_dst objects
get the current generation counter of the xfrm_state
they will attach to at dst->xfrm.

xfrm_bundle_ok() will return false if it sees an
xfrm_dst with a generation count different from the
generation count of the xfrm_state that dst points to.

This provides a facility by which to passively and
cheaply invalidate cached IPSEC routes during SA
database changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
f034b5d4efdfe0fb9e2a1ce1d95fa7914f24de49 24-Aug-2006 David S. Miller <davem@sunset.davemloft.net> [XFRM]: Dynamic xfrm_state hash table sizing.

The grow algorithm is simple, we grow if:

1) we see a hash chain collision at insert, and
2) we haven't hit the hash size limit (currently 1*1024*1024 slots), and
3) the number of xfrm_state objects is > the current hash mask

All of this needs some tweaking.

Remove __initdata from "hashdist" so we can use it safely at run time.

Signed-off-by: David S. Miller <davem@davemloft.net>
8f126e37c0b250310a48a609bedf92a19a5559ec 24-Aug-2006 David S. Miller <davem@sunset.davemloft.net> [XFRM]: Convert xfrm_state hash linkage to hlists.

Signed-off-by: David S. Miller <davem@davemloft.net>
edcd582152090bfb0ccb4ad444c151798a73eda8 24-Aug-2006 David S. Miller <davem@sunset.davemloft.net> [XFRM]: Pull xfrm_state_by{spi,src} hash table knowledge out of afinfo.

Signed-off-by: David S. Miller <davem@davemloft.net>
2770834c9f44afd1bfa13914c7285470775af657 24-Aug-2006 David S. Miller <davem@sunset.davemloft.net> [XFRM]: Pull xfrm_state_bydst hash table knowledge out of afinfo.

Signed-off-by: David S. Miller <davem@davemloft.net>
41a49cc3c02ace59d4dddae91ea211c330970ee3 24-Aug-2006 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM]: Add sorting interface for state and template.

Under two transformation policies it is required to merge them.
This is a platform to sort state for outbound and templates
for inbound respectively.
It will be used when Mobile IPv6 and IPsec are used at the same time.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
97a64b4577ae2bc5599dbd008a3cd9e25de9b9f5 24-Aug-2006 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM]: Introduce XFRM_MSG_REPORT.

XFRM_MSG_REPORT is a message as notification of state protocol and
selector from kernel to user-space.

Mobile IPv6 will use it when inbound reject is occurred at route
optimization to make user-space know a binding error requirement.

Based on MIPL2 kernel patch.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
060f02a3bdd4d9ba8aa3c48e9b470672b1f3a585 24-Aug-2006 Noriaki TAKAMIYA <takamiya@po.ntts.co.jp> [XFRM] STATE: Introduce care-of address.

Care-of address is carried by state as a transformation option like
IPsec encryption/authentication algorithm.

Based on MIPL2 kernel patch.

Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
fbd9a5b47ee9c319ff0cae584391241ce78ffd6b 24-Aug-2006 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM] STATE: Common receive function for route optimization extension headers.

XFRM_STATE_WILDRECV flag is introduced; the last resort state is set
it and receives packet which is not route optimized but uses such
extension headers i.e. Mobile IPv6 signaling (binding update and
acknowledgement). A node enabled Mobile IPv6 adds the state.

Based on MIPL2 kernel patch.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
eb2971b68a7d17a7d0fa2c7fc6fbc4bfe41cd694 24-Aug-2006 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM] STATE: Search by address using source address list.

This is a support to search transformation states by its addresses
by using source address list for Mobile IPv6 usage.
To use it from user-space, it is also added a message type for
source address as a xfrm state option.
Based on MIPL2 kernel patch.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6c44e6b7ab500d7e3e3f406c83325671be51a752 24-Aug-2006 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM] STATE: Add source address list.

Support source address based searching.
Mobile IPv6 will use it.
Based on MIPL2 kernel patch.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5794708f11551b6d19b10673abf4b0202f66b44d 23-Sep-2006 Masahide NAKAMURA <nakam@linux-ipv6.org> [XFRM]: Introduce a helper to compare id protocol.

Put the helper to header for future use.
Based on MIPL2 kernel patch.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
cb969f072b6d67770b559617f14e767f47e77ece 25-Jul-2006 Venkat Yekkirala <vyekkirala@TrustedCS.com> [MLSXFRM]: Default labeling of socket specific IPSec policies

This defaults the label of socket-specific IPSec policies to be the
same as the socket they are set on.

Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e0d1caa7b0d5f02e4f34aa09c695d04251310c6c 25-Jul-2006 Venkat Yekkirala <vyekkirala@TrustedCS.com> [MLSXFRM]: Flow based matching of xfrm policy and state

This implements a seemless mechanism for xfrm policy selection and
state matching based on the flow sid. This also includes the necessary
SELinux enforcement pieces.

Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0da974f4f303a6842516b764507e3c0a03f41e5a 21-Jul-2006 Panagiotis Issaris <takis@issaris.org> [NET]: Conversions from kmalloc+memset to k(z|c)alloc.

Signed-off-by: Panagiotis Issaris <takis@issaris.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
244055fdc8dd39407a33d4eb9f4053dd4ca8f1bb 29-Jun-2006 Adrian Bunk <bunk@stusta.de> [XFRM]: unexport xfrm_state_mtu

This patch removes the unused EXPORT_SYMBOL(xfrm_state_mtu).

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
b59f45d0b2878ab76f8053b0973654e6621828ee 28-May-2006 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC] xfrm: Abstract out encapsulation modes

This patch adds the structure xfrm_mode. It is meant to represent
the operations carried out by transport/tunnel modes.

By doing this we allow additional encapsulation modes to be added
without clogging up the xfrm_input/xfrm_output paths.

Candidate modes include 4-to-6 tunnel mode, 6-to-4 tunnel mode, and
BEET modes.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
546be2405be119ef55467aace45f337a16e5d424 28-May-2006 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC] xfrm: Undo afinfo lock proliferation

The number of locks used to manage afinfo structures can easily be reduced
down to one each for policy and state respectively. This is based on the
observation that the write locks are only held by module insertion/removal
which are very rare events so there is no need to further differentiate
between the insertion of modules like ipv6 versus esp6.

The removal of the read locks in xfrm4_policy.c/xfrm6_policy.c might look
suspicious at first. However, after you realise that nobody ever takes
the corresponding write lock you'll feel better :)

As far as I can gather it's an attempt to guard against the removal of
the corresponding modules. Since neither module can be unloaded at all
we can leave it to whoever fixes up IPv6 unloading :)

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
f3111502c065d32dcb021f55e30398aaebd8fb0f 29-Apr-2006 Ingo Molnar <mingo@elte.hu> [XFRM]: fix incorrect xfrm_state_afinfo_lock use

xfrm_state_afinfo_lock can be read-locked from bh context, so take it
in a bh-safe manner in xfrm_state_register_afinfo() and
xfrm_state_unregister_afinfo(). Found by the lock validator.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2717096ab41eacdbf07352dca6826b59470eb39a 15-Apr-2006 Jamal Hadi Salim <hadi@cyberus.ca> [XFRM]: Fix aevent timer.

Send aevent immediately if we have sent nothing since last timer and
this is the first packet.

Fixes a corner case when packet threshold is very high, the timer low
and a very low packet rate input which is bursty.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
a70fcb0ba337956d91476e2e7c3e71d9df940a82 21-Mar-2006 David S. Miller <davem@davemloft.net> [XFRM]: Add some missing exports.

To fix the case of modular xfrm_user.

Signed-off-by: David S. Miller <davem@davemloft.net>
ee857a7d672859cf4eb735d32bce22c8b7ad0bd2 21-Mar-2006 David S. Miller <davem@davemloft.net> [XFRM]: Move xfrm_nl to xfrm_state.c from xfrm_user.c

xfrm_user could be modular, and since generic code uses this symbol
now...

Signed-off-by: David S. Miller <davem@davemloft.net>
0ac8475248164553ffe21948c7b1a4b9d2a935dc 21-Mar-2006 David S. Miller <davem@davemloft.net> [XFRM]: Make sure xfrm_replay_timer_handler() is declared early enough.

Signed-off-by: David S. Miller <davem@davemloft.net>
6c5c8ca7ff20523e427b955aa84cef407934710f 21-Mar-2006 Jamal Hadi Salim <hadi@cyberus.ca> [IPSEC]: Sync series - policy expires

This is similar to the SA expire insertion patch - only it inserts
expires for SP.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
53bc6b4d29c07664f3abe029b7e6878a1067899a 21-Mar-2006 Jamal Hadi Salim <hadi@cyberus.ca> [IPSEC]: Sync series - SA expires

This patch allows a user to insert SA expires. This is useful to
do on an HA backup for the case of byte counts but may not be very
useful for the case of time based expiry.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
980ebd25794f0f87ac32844e2c73e9e81f0a72ba 21-Mar-2006 Jamal Hadi Salim <hadi@cyberus.ca> [IPSEC]: Sync series - acquire insert

This introduces a feature similar to the one described in RFC 2367:
"
... the application needing an SA sends a PF_KEY
SADB_ACQUIRE message down to the Key Engine, which then either
returns an error or sends a similar SADB_ACQUIRE message up to one or
more key management applications capable of creating such SAs.
...
...
The third is where an application-layer consumer of security
associations (e.g. an OSPFv2 or RIPv2 daemon) needs a security
association.

Send an SADB_ACQUIRE message from a user process to the kernel.

<base, address(SD), (address(P),) (identity(SD),) (sensitivity,)
proposal>

The kernel returns an SADB_ACQUIRE message to registered
sockets.

<base, address(SD), (address(P),) (identity(SD),) (sensitivity,)
proposal>

The user-level consumer waits for an SADB_UPDATE or SADB_ADD
message for its particular type, and then can use that
association by using SADB_GET messages.

"
An app such as OSPF could then use ipsec KM to get keys

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
f8cd54884e675dfaf0c86cc7c088adb6ca9d7638 21-Mar-2006 Jamal Hadi Salim <hadi@cyberus.ca> [IPSEC]: Sync series - core changes

This patch provides the core functionality needed for sync events
for ipsec. Derived work of Krisztian KOVACS <hidden@balabit.hu>

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
21380b81ef8699179b535e197a95b891a7badac7 22-Feb-2006 Herbert Xu <herbert@gondor.apana.org.au> [XFRM]: Eliminate refcounting confusion by creating __xfrm_state_put().

We often just do an atomic_dec(&x->refcnt) on an xfrm_state object
because we know there is more than 1 reference remaining and thus
we can elide the heavier xfrm_state_put() call.

Do this behind an inline function called __xfrm_state_put() so that is
more obvious and also to allow us to more cleanly add refcount
debugging later.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
df71837d5024e2524cd51c93621e558aa7dd9f3f 14-Dec-2005 Trent Jaeger <tjaeger@cse.psu.edu> [LSM-IPSec]: Security association restriction.

This patch series implements per packet access control via the
extension of the Linux Security Modules (LSM) interface by hooks in
the XFRM and pfkey subsystems that leverage IPSec security
associations to label packets. Extensions to the SELinux LSM are
included that leverage the patch for this purpose.

This patch implements the changes necessary to the XFRM subsystem,
pfkey interface, ipv4/ipv6, and xfrm_user interface to restrict a
socket to use only authorized security associations (or no security
association) to send/receive network packets.

Patch purpose:

The patch is designed to enable access control per packets based on
the strongly authenticated IPSec security association. Such access
controls augment the existing ones based on network interface and IP
address. The former are very coarse-grained, and the latter can be
spoofed. By using IPSec, the system can control access to remote
hosts based on cryptographic keys generated using the IPSec mechanism.
This enables access control on a per-machine basis or per-application
if the remote machine is running the same mechanism and trusted to
enforce the access control policy.

Patch design approach:

The overall approach is that policy (xfrm_policy) entries set by
user-level programs (e.g., setkey for ipsec-tools) are extended with a
security context that is used at policy selection time in the XFRM
subsystem to restrict the sockets that can send/receive packets via
security associations (xfrm_states) that are built from those
policies.

A presentation available at
www.selinux-symposium.org/2005/presentations/session2/2-3-jaeger.pdf
from the SELinux symposium describes the overall approach.

Patch implementation details:

On output, the policy retrieved (via xfrm_policy_lookup or
xfrm_sk_policy_lookup) must be authorized for the security context of
the socket and the same security context is required for resultant
security association (retrieved or negotiated via racoon in
ipsec-tools). This is enforced in xfrm_state_find.

On input, the policy retrieved must also be authorized for the socket
(at __xfrm_policy_check), and the security context of the policy must
also match the security association being used.

The patch has virtually no impact on packets that do not use IPSec.
The existing Netfilter (outgoing) and LSM rcv_skb hooks are used as
before.

Also, if IPSec is used without security contexts, the impact is
minimal. The LSM must allow such policies to be selected for the
combination of socket and remote machine, but subsequent IPSec
processing proceeds as in the original case.

Testing:

The pfkey interface is tested using the ipsec-tools. ipsec-tools have
been modified (a separate ipsec-tools patch is available for version
0.5) that supports assignment of xfrm_policy entries and security
associations with security contexts via setkey and the negotiation
using the security contexts via racoon.

The xfrm_user interface is tested via ad hoc programs that set
security contexts. These programs are also available from me, and
contain programs for setting, getting, and deleting policy for testing
this interface. Testing of sa functions was done by tracing kernel
behavior.

Signed-off-by: Trent Jaeger <tjaeger@cse.psu.edu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
399c180ac5f0cb66ef9479358e0b8b6bafcbeafe 19-Dec-2005 David S. Miller <davem@sunset.davemloft.net> [IPSEC]: Perform SA switchover immediately.

When we insert a new xfrm_state which potentially
subsumes an existing one, make sure all cached
bundles are flushed so that the new SA is used
immediately.

Signed-off-by: David S. Miller <davem@davemloft.net>
a51482bde22f99c63fbbb57d5d46cc666384e379 08-Nov-2005 Jesper Juhl <jesper.juhl@gmail.com> [NET]: kfree cleanup

From: Jesper Juhl <jesper.juhl@gmail.com>

This is the net/ part of the big kfree cleanup patch.

Remove pointless checks for NULL prior to calling kfree() in net/.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Arnaldo Carvalho de Melo <acme@conectiva.com.br>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
80b30c1023dbd795faf948dee0cfb3b270b56d47 15-Oct-2005 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Kill obsolete get_mss function

Now that we've switched over to storing MTUs in the xfrm_dst entries,
we no longer need the dst's get_mss methods. This patch gets rid of
them.

It also documents the fact that our MTU calculation is not optimal
for ESP.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
d094cd83c06e06e01d8edb540555f3f64e4081c2 20-Jun-2005 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Add xfrm_state_afinfo->init_flags

This patch adds the xfrm_state_afinfo->init_flags hook which allows
each address family to perform any common initialisation that does
not require a corresponding destructor call.

It will be used subsequently to set the XFRM_STATE_NOPMTUDISC flag
in IPv4.

It also fixes up the error codes returned by xfrm_init_state.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: James Morris <jmorris@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
72cb6962a91f2af9eef69a06198e1949c10259ae 20-Jun-2005 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC]: Add xfrm_init_state

This patch adds xfrm_init_state which is simply a wrapper that calls
xfrm_get_type and subsequently x->type->init_state. It also gets rid
of the unused args argument.

Abstracting it out allows us to add common initialisation code, e.g.,
to set family-specific flags.

The add_time setting in xfrm_user.c was deleted because it's already
set by xfrm_state_alloc.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: James Morris <jmorris@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f60f6b8f70c756fc786d68f02ec17a1e84db645f 19-Jun-2005 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC] Use XFRM_MSG_* instead of XFRM_SAP_*

This patch removes XFRM_SAP_* and converts them over to XFRM_MSG_*.
The netlink interface is meant to map directly onto the underlying
xfrm subsystem. Therefore rather than using a new independent
representation for the events we can simply use the existing ones
from xfrm_user.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
bf08867f91a43aa3ba2e4598c06c4769a6cdddf6 19-Jun-2005 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC] Turn km_event.data into a union

This patch turns km_event.data into a union. This makes code that
uses it clearer.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
4666faab095230ec8aa62da6c33391287f281154 19-Jun-2005 Herbert Xu <herbert@gondor.apana.org.au> [IPSEC] Kill spurious hard expire messages

This patch ensures that the hard state/policy expire notifications are
only sent when the state/policy is successfully removed from their
respective tables.

As it is, it's possible for a state/policy to both expire through
reaching a hard limit, as well as being deleted by the user.

Note that this behaviour isn't actually forbidden by RFC 2367.
However, it is a quality of implementation issue.

As an added bonus, the restructuring in this patch will help
eventually in moving the expire notifications from softirq
context into process context, thus improving their reliability.

One important side-effect from this change is that SAs reaching
their hard byte/packet limits are now deleted immediately, just
like SAs that have reached their hard time limits.

Previously they were announced immediately but only deleted after
30 seconds.

This is bad because it prevents the system from issuing an ACQUIRE
command until the existing state was deleted by the user or expires
after the time is up.

In the scenario where the expire notification was lost this introduces
a 30 second delay into the system for no good reason.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
26b15dad9f1c19d6d4f7b999b07eaa6d98e4b375 19-Jun-2005 Jamal Hadi Salim <hadi@cyberus.ca> [IPSEC] Add complete xfrm event notification

Heres the final patch.
What this patch provides

- netlink xfrm events
- ability to have events generated by netlink propagated to pfkey
and vice versa.
- fixes the acquire lets-be-happy-with-one-success issue

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
5c5d281a93e9816966b6131ccec19519dab0e103 22-Apr-2005 Patrick McHardy <kaber@trash.net> [XFRM]: Fix existence lookup in xfrm_state_find

Use 'daddr' instead of &tmpl->id.daddr, since the latter
might be zero. Also, only perform the lookup when
tmpl->id.spi is non-zero.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 17-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org> Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!