History log of /net/ipv4/fib_semantics.c
Revision Date Author Comments
f76936d07c4eeb36d8dbb64ebd30ab46ff85d9f7 13-Oct-2014 Jiri Pirko <jiri@resnulli.us> ipv4: fix nexthop attlen check in fib_nh_match

fib_nh_match does not match nexthops correctly. Example:

ip route add 172.16.10/24 nexthop via 192.168.122.12 dev eth0 \
nexthop via 192.168.122.13 dev eth0
ip route del 172.16.10/24 nexthop via 192.168.122.14 dev eth0 \
nexthop via 192.168.122.15 dev eth0

Del command is successful and route is removed. After this patch
applied, the route is correctly matched and result is:
RTNETLINK answers: No such process

Please consider this for stable trees as well.

Fixes: 4e902c57417c4 ("[IPv4]: FIB configuration using struct fib_config")
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
caa415270c732505240bb60171c44a7838c555e8 04-Sep-2014 Eric Dumazet <edumazet@google.com> ipv4: fix a race in update_or_create_fnhe()

nh_exceptions is effectively used under rcu, but lacks proper
barriers. Between kzalloc() and setting of nh->nh_exceptions(),
we need a proper memory barrier.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 4895c771c7f00 ("ipv4: Add FIB nexthop exceptions.")
Signed-off-by: David S. Miller <davem@davemloft.net>
aeefa1ecfc799b0ea2c4979617f14cecd5cccbfd 06-May-2014 Sergey Popovich <popovich_sergei@mail.ru> ipv4: fib_semantics: increment fib_info_cnt after fib_info allocation

Increment fib_info_cnt in fib_create_info() right after successfuly
alllocating fib_info structure, overwise fib_metrics allocation failure
leads to fib_info_cnt incorrectly decremented in free_fib_info(), called
on error path from fib_create_info().

Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
6a662719c9868b3d6c7d26b3a085f0cd3cc15e64 16-Apr-2014 Cong Wang <cwang@twopensource.com> ipv4, fib: pass LOOPBACK_IFINDEX instead of 0 to flowi4_iif

As suggested by Julian:

Simply, flowi4_iif must not contain 0, it does not
look logical to ignore all ip rules with specified iif.

because in fib_rule_match() we do:

if (rule->iifindex && (rule->iifindex != fl->flowi_iif))
goto out;

flowi4_iif should be LOOPBACK_IFINDEX by default.

We need to move LOOPBACK_IFINDEX to include/net/flow.h:

1) It is mostly used by flowi_iif

2) Fix the following compile error if we use it in flow.h
by the patches latter:

In file included from include/linux/netfilter.h:277:0,
from include/net/netns/netfilter.h:5,
from include/net/net_namespace.h:21,
from include/linux/netdevice.h:43,
from include/linux/icmpv6.h:12,
from include/linux/ipv6.h:61,
from include/net/ipv6.h:16,
from include/linux/sunrpc/clnt.h:27,
from include/linux/nfs_fs.h:30,
from init/do_mounts.c:32:
include/net/flow.h: In function ‘flowi4_init_output’:
include/net/flow.h:84:32: error: ‘LOOPBACK_IFINDEX’ undeclared (first use in this function)

Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c9cb6b6ec18ca91d090178d3c6d1dde7428c791b 28-Dec-2013 Stephen Hemminger <stephen@networkplumber.org> ipv4: make fib_detect_death static

Make fib_detect_death function static only used in one file.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
9877b25382e770618d0a36a3024d8a3c67eee9ea 17-Oct-2013 Joe Perches <joe@perches.com> fib: Use const struct nl_info * in rtmsg_fib

The rtmsg_fib function doesn't modify this argument so mark
it const.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2ffae99d1fac272952b5a395759823717760ce37 27-Jun-2013 Timo Teräs <timo.teras@iki.fi> ipv4: use next hop exceptions also for input routes

Commit d2d68ba9 (ipv4: Cache input routes in fib_info nexthops)
assmued that "locally destined, and routed packets, never trigger
PMTU events or redirects that will be processed by us".

However, it seems that tunnel devices do trigger PMTU events in certain
cases. At least ip_gre, ip6_gre, sit, and ipip do use the inner flow's
skb_dst(skb)->ops->update_pmtu to propage mtu information from the
outer flows. These can cause the inner flow mtu to be decreased. If
next hop exceptions are not consulted for pmtu, IP fragmentation will
not be done properly for these routes.

It also seems that we really need to have the PMTU information always
for netfilter TCPMSS clamp-to-pmtu feature to work properly.

So for the time being, cache separate copies of input routes for
each next hop exception.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
b67bfe0d42cac56c512dd5da4b1b347a23f4b70a 28-Feb-2013 Sasha Levin <sasha.levin@oracle.com> hlist: drop the node parameter from iterators

I'm not sure why, but the hlist for each entry iterators were conceived

list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
d94ce9b283736a876b2e6dec665c68e5e8b5d55e 21-Oct-2012 Eric Dumazet <edumazet@google.com> ipv4: 16 slots in initial fib_info hash table

A small host typically needs ~10 fib_info structures, so create initial
hash table with 16 slots instead of only one. This removes potential
false sharing and reallocs/rehashes (1->2->4->8->16)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f8a17175c63fd3e8b573719f7538816f8c96abf4 08-Oct-2012 Julian Anastasov <ja@ssi.bg> ipv4: make sure nh_pcpu_rth_output is always allocated

Avoid checking nh_pcpu_rth_output in fast path,
abort fib_info creation on alloc_percpu failure.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
f4ef85bbda96324785097356336bc79cdd37db0a 04-Oct-2012 Eric Dumazet <edumazet@google.com> ipv4: add a fib_type to fib_info

commit d2d68ba9fe8 (ipv4: Cache input routes in fib_info nexthops.)
introduced a regression for forwarding.

This was hard to reproduce but the symptom was that packets were
delivered to local host instead of being forwarded.

David suggested to add fib_type to fib_info so that we dont
inadvertently share same fib_info for different purposes.

With help from Julian Anastasov who provided very helpful
hints, reproduced here :

<quote>
Can it be a problem related to fib_info reuse
from different routes. For example, when local IP address
is created for subnet we have:

broadcast 192.168.0.255 dev DEV proto kernel scope link src
192.168.0.1
192.168.0.0/24 dev DEV proto kernel scope link src 192.168.0.1
local 192.168.0.1 dev DEV proto kernel scope host src 192.168.0.1

The "dev DEV proto kernel scope link src 192.168.0.1" is
a reused fib_info structure where we put cached routes.
The result can be same fib_info for 192.168.0.255 and
192.168.0.0/24. RTN_BROADCAST is cached only for input
routes. Incoming broadcast to 192.168.0.255 can be cached
and can cause problems for traffic forwarded to 192.168.0.0/24.
So, this patch should solve the problem because it
separates the broadcast from unicast traffic.

And the ip_route_input_slow caching will work for
local and broadcast input routes (above routes 1 and 3) just
because they differ in scope and use different fib_info.

</quote>

Many thanks to Chris Clayton for his patience and help.

Reported-by: Chris Clayton <chris2553@googlemail.com>
Bisected-by: Chris Clayton <chris2553@googlemail.com>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Tested-by: Chris Clayton <chris2553@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15e473046cb6e5d18a4d0057e61d76315230382b 07-Sep-2012 Eric W. Biederman <ebiederm@xmission.com> netlink: Rename pid to portid to avoid confusion

It is a frequent mistake to confuse the netlink port identifier with a
process identifier. Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.

I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.

I have successfully built an allyesconfig kernel with this change.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c5038a8327b980a5b279fa193163c468011de009 01-Aug-2012 David S. Miller <davem@davemloft.net> ipv4: Cache routes in nexthop exception entries.

Signed-off-by: David S. Miller <davem@davemloft.net>
d26b3a7c4b3b26319f18bb645de93eba8f4bdcd5 31-Jul-2012 Eric Dumazet <edumazet@google.com> ipv4: percpu nh_rth_output cache

Input path is mostly run under RCU and doesnt touch dst refcnt

But output path on forwarding or UDP workloads hits
badly dst refcount, and we have lot of false sharing, for example
in ipv4_mtu() when reading rt->rt_pmtu

Using a percpu cache for nh_rth_output gives a nice performance
increase at a small cost.

24 udpflood test on my 24 cpu machine (dummy0 output device)
(each process sends 1.000.000 udp frames, 24 processes are started)

before : 5.24 s
after : 2.06 s
For reference, time on linux-3.5 : 6.60 s

Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
54764bb647b2e847c512acf8d443df965da35000 31-Jul-2012 Eric Dumazet <eric.dumazet@gmail.com> ipv4: Restore old dst_free() behavior.

commit 404e0a8b6a55 (net: ipv4: fix RCU races on dst refcounts) tried
to solve a race but added a problem at device/fib dismantle time :

We really want to call dst_free() as soon as possible, even if sockets
still have dst in their cache.
dst_release() calls in free_fib_info_rcu() are not welcomed.

Root of the problem was that now we also cache output routes (in
nh_rth_output), we must use call_rcu() instead of call_rcu_bh() in
rt_free(), because output route lookups are done in process context.

Based on feedback and initial patch from David Miller (adding another
call_rcu_bh() call in fib, but it appears it was not the right fix)

I left the inet_sk_rx_dst_set() helper and added __rcu attributes
to nh_rth_output and nh_rth_input to better document what is going on in
this code.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
404e0a8b6a55d5e1cd138c6deb1bca9abdf75d8c 30-Jul-2012 Eric Dumazet <edumazet@google.com> net: ipv4: fix RCU races on dst refcounts

commit c6cffba4ffa2 (ipv4: Fix input route performance regression.)
added various fatal races with dst refcounts.

crashes happen on tcp workloads if routes are added/deleted at the same
time.

The dst_free() calls from free_fib_info_rcu() are clearly racy.

We need instead regular dst refcounting (dst_release()) and make
sure dst_release() is aware of RCU grace periods :

Add DST_RCU_FREE flag so that dst_release() respects an RCU grace period
before dst destruction for cached dst

Introduce a new inet_sk_rx_dst_set() helper, using atomic_inc_not_zero()
to make sure we dont increase a zero refcount (On a dst currently
waiting an rcu grace period before destruction)

rt_cache_route() must take a reference on the new cached route, and
release it if was not able to install it.

With this patch, my machines survive various benchmarks.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c6cffba4ffa26a8ffacd0bb9f3144e34f20da7de 26-Jul-2012 David S. Miller <davem@davemloft.net> ipv4: Fix input route performance regression.

With the routing cache removal we lost the "noref" code paths on
input, and this can kill some routing workloads.

Reinstate the noref path when we hit a cached route in the FIB
nexthops.

With help from Eric Dumazet.

Reported-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d2d68ba9fe8b38eb03124b3176a013bb8aa2b5e5 17-Jul-2012 David S. Miller <davem@davemloft.net> ipv4: Cache input routes in fib_info nexthops.

Caching input routes is slightly simpler than output routes, since we
don't need to be concerned with nexthop exceptions. (locally
destined, and routed packets, never trigger PMTU events or redirects
that will be processed by us).

However, we have to elide caching for the DIRECTSRC and non-zero itag
cases.

Signed-off-by: David S. Miller <davem@davemloft.net>
f2bb4bedf35d5167a073dcdddf16543f351ef3ae 17-Jul-2012 David S. Miller <davem@davemloft.net> ipv4: Cache output routes in fib_info nexthops.

If we have an output route that lacks nexthop exceptions, we can cache
it in the FIB info nexthop.

Such routes will have DST_HOST cleared because such routes refer to a
family of destinations, rather than just one.

The sequence of the handling of exceptions during route lookup is
adjusted to make the logic work properly.

Before we allocate the route, we lookup the exception.

Then we know if we will cache this route or not, and therefore whether
DST_HOST should be set on the allocated route.

Then we use DST_HOST to key off whether we should store the resulting
route, during rt_set_nexthop(), in the FIB nexthop cache.

With help from Eric Dumazet.

Signed-off-by: David S. Miller <davem@davemloft.net>
5abf7f7e0f6bdbfcac737f636497d7016d9507eb 17-Jul-2012 Eric Dumazet <edumazet@google.com> ipv4: fix rcu splat

free_nh_exceptions() should use rcu_dereference_protected(..., 1)
since its called after one RCU grace period.

Also add some const-ification in recent code.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4895c771c7f006b4b90f9d6b1d2210939ba57b38 17-Jul-2012 David S. Miller <davem@davemloft.net> ipv4: Add FIB nexthop exceptions.

In a regime where we have subnetted route entries, we need a way to
store persistent storage about destination specific learned values
such as redirects and PMTU values.

This is implemented here via nexthop exceptions.

The initial implementation is a 2048 entry hash table with relaiming
starting at chain length 5. A more sophisticated scheme can be
devised if that proves necessary.

Signed-off-by: David S. Miller <davem@davemloft.net>
710ab6c03122cf464510f8c86eb0a179e80b2d61 10-Jul-2012 David S. Miller <davem@davemloft.net> ipv4: Enforce max MTU metric at route insertion time.

Rather than at every struct rtable creation.

Signed-off-by: David S. Miller <davem@davemloft.net>
f4530fa574df4d833506c53697ed1daa0d390bf4 06-Jul-2012 David S. Miller <davem@davemloft.net> ipv4: Avoid overhead when no custom FIB rules are installed.

If the user hasn't actually installed any custom rules, or fiddled
with the default ones, don't go through the whole FIB rules layer.

It's just pure overhead.

Instead do what we do with CONFIG_IP_MULTIPLE_TABLES disabled, check
the individual tables by hand, one by one.

Also, move fib_num_tclassid_users into the ipv4 network namespace.

Signed-off-by: David S. Miller <davem@davemloft.net>
7a9bc9b81a5bc6e44ebc80ef781332e4385083f2 29-Jun-2012 David S. Miller <davem@davemloft.net> ipv4: Elide fib_validate_source() completely when possible.

If rpfilter is off (or the SKB has an IPSEC path) and there are not
tclassid users, we don't have to do anything at all when
fib_validate_source() is invoked besides setting the itag to zero.

We monitor tclassid uses with a counter (modified only under RTNL and
marked __read_mostly) and we protect the fib_validate_source() real
work with a test against this counter and whether rpfilter is to be
done.

Having a way to know whether we need no tclassid processing or not
also opens the door for future optimized rpfilter algorithms that do
not perform full FIB lookups.

Signed-off-by: David S. Miller <davem@davemloft.net>
6fac262526ee91ee66210b8919a4297dcf7d544e 18-Jun-2012 David S. Miller <davem@davemloft.net> ipv4: Cap ADVMSS metric in the FIB rather than the routing cache.

It makes no sense to execute this limit test every time we create a
routing cache entry.

We can't simply error out on these things since we've silently
accepted and truncated them forever.

Signed-off-by: David S. Miller <davem@davemloft.net>
e49cc0da7283088c5e03d475ffe2fdcb24a6d5b1 23-May-2012 Yanmin Zhang <yanmin_zhang@linux.intel.com> ipv4: fix the rcu race between free_fib_info and ip_route_output_slow

We hit a kernel OOPS.

<3>[23898.789643] BUG: sleeping function called from invalid context at
/data/buildbot/workdir/ics/hardware/intel/linux-2.6/arch/x86/mm/fault.c:1103
<3>[23898.862215] in_atomic(): 0, irqs_disabled(): 0, pid: 10526, name:
Thread-6683
<4>[23898.967805] HSU serial 0000:00:05.1: 0000:00:05.2:HSU serial prevented me
to suspend...
<4>[23899.258526] Pid: 10526, comm: Thread-6683 Tainted: G W
3.0.8-137685-ge7742f9 #1
<4>[23899.357404] HSU serial 0000:00:05.1: 0000:00:05.2:HSU serial prevented me
to suspend...
<4>[23899.904225] Call Trace:
<4>[23899.989209] [<c1227f50>] ? pgtable_bad+0x130/0x130
<4>[23900.000416] [<c1238c2a>] __might_sleep+0x10a/0x110
<4>[23900.007357] [<c1228021>] do_page_fault+0xd1/0x3c0
<4>[23900.013764] [<c18e9ba9>] ? restore_all+0xf/0xf
<4>[23900.024024] [<c17c007b>] ? napi_complete+0x8b/0x690
<4>[23900.029297] [<c1227f50>] ? pgtable_bad+0x130/0x130
<4>[23900.123739] [<c1227f50>] ? pgtable_bad+0x130/0x130
<4>[23900.128955] [<c18ea0c3>] error_code+0x5f/0x64
<4>[23900.133466] [<c1227f50>] ? pgtable_bad+0x130/0x130
<4>[23900.138450] [<c17f6298>] ? __ip_route_output_key+0x698/0x7c0
<4>[23900.144312] [<c17f5f8d>] ? __ip_route_output_key+0x38d/0x7c0
<4>[23900.150730] [<c17f63df>] ip_route_output_flow+0x1f/0x60
<4>[23900.156261] [<c181de58>] ip4_datagram_connect+0x188/0x2b0
<4>[23900.161960] [<c18e981f>] ? _raw_spin_unlock_bh+0x1f/0x30
<4>[23900.167834] [<c18298d6>] inet_dgram_connect+0x36/0x80
<4>[23900.173224] [<c14f9e88>] ? _copy_from_user+0x48/0x140
<4>[23900.178817] [<c17ab9da>] sys_connect+0x9a/0xd0
<4>[23900.183538] [<c132e93c>] ? alloc_file+0xdc/0x240
<4>[23900.189111] [<c123925d>] ? sub_preempt_count+0x3d/0x50

Function free_fib_info resets nexthop_nh->nh_dev to NULL before releasing
fi. Other cpu might be accessing fi. Fixing it by delaying the releasing.

With the patch, we ran MTBF testing on Android mobile for 12 hours
and didn't trigger the issue.

Thank Eric for very detailed review/checking the issue.

Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Signed-off-by: Kun Jiang <kunx.jiang@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f3756b79e8f76cb92830383c215deba146fe0a26 02-Apr-2012 David S. Miller <davem@davemloft.net> ipv4: Stop using NLA_PUT*().

These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
9ffc93f203c18a70623f21950f1dd473c9ec48cd 28-Mar-2012 David Howells <dhowells@redhat.com> Remove all #inclusions of asm/system.h

Remove all #inclusions of asm/system.h preparatory to splitting and killing
it. Performed with the following command:

perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`

Signed-off-by: David Howells <dhowells@redhat.com>
058bd4d2a4ff0aaa4a5381c67e776729d840c785 11-Mar-2012 Joe Perches <joe@perches.com> net: Convert printks to pr_<level>

Use a more current kernel messaging style.

Convert a printk block to print_hex_dump.
Coalesce formats, align arguments.
Use %s, __func__ instead of embedding function names.

Some messages that were prefixed with <foo>_close are
now prefixed with <foo>_fini. Some ah4 and esp messages
are now not prefixed with "ip ".

The intent of this patch is to later add something like
#define pr_fmt(fmt) "IPv4: " fmt.
to standardize the output messages.

Text size is trivially reduced. (x86-32 allyesconfig)

$ size net/ipv4/built-in.o*
text data bss dec hex filename
887888 31558 249696 1169142 11d6f6 net/ipv4/built-in.o.new
887934 31558 249800 1169292 11d78c net/ipv4/built-in.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19c1ea14c930db5e9c0cd7c3c6f4d01457dfcd69 04-Sep-2011 Yan, Zheng <zheng.z.yan@intel.com> ipv4: Fix fib_info->fib_metrics leak

Commit 4670994d(net,rcu: convert call_rcu(fc_rport_free_rcu) to
kfree_rcu()) introduced a memory leak. This patch reverts it.

Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4670994d150a86ebd53ab353a2af517c5465bfaf 18-Mar-2011 Lai Jiangshan <laijs@cn.fujitsu.com> net,rcu: convert call_rcu(fc_rport_free_rcu) to kfree_rcu()

The rcu callback fc_rport_free_rcu() just calls a kfree(),
so we use kfree_rcu() instead of the call_rcu(fc_rport_free_rcu).

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
37e826c513883099c298317bad1b3b677b2905fb 25-Mar-2011 David S. Miller <davem@davemloft.net> ipv4: Fix nexthop caching wrt. scoping.

Move the scope value out of the fib alias entries and into fib_info,
so that we always use the correct scope when recomputing the nexthop
cached source address.

Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
436c3b66ec9824a633724ae42de1c416af4f2063 25-Mar-2011 David S. Miller <davem@davemloft.net> ipv4: Invalidate nexthop cache nh_saddr more correctly.

Any operation that:

1) Brings up an interface
2) Adds an IP address to an interface
3) Deletes an IP address from an interface

can potentially invalidate the nh_saddr value, requiring
it to be recomputed.

Perform the recomputation lazily using a generation ID.

Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
fcd13f42c9d6ab7b1024b9b7125a2e8db3cc00b2 24-Mar-2011 Eric Dumazet <eric.dumazet@gmail.com> ipv4: fix fib metrics

Alessandro Suardi reported that we could not change route metrics :

ip ro change default .... advmss 1400

This regression came with commit 9c150e82ac50 (Allocate fib metrics
dynamically). fib_metrics is no longer an array, but a pointer to an
array.

Reported-by: Alessandro Suardi <alessandro.suardi@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Alessandro Suardi <alessandro.suardi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9ade22861f922344788321e374c542c92bc049b6 12-Mar-2011 David S. Miller <davem@davemloft.net> ipv4: Use flowi4 in FIB layer.

Signed-off-by: David S. Miller <davem@davemloft.net>
22bd5b9b13f2931ac80949f8bfbc40e8cab05be7 12-Mar-2011 David S. Miller <davem@davemloft.net> ipv4: Pass ipv4 flow objects into fib_lookup() paths.

To start doing these conversions, we need to add some temporary
flow4_* macros which will eventually go away when all the protocol
code paths are changed to work on AF specific flowi objects.

Signed-off-by: David S. Miller <davem@davemloft.net>
1d28f42c1bd4bb2363d88df74d0128b4da135b4a 12-Mar-2011 David S. Miller <davem@davemloft.net> net: Put flowi_* prefix on AF independent members of struct flowi

I intend to turn struct flowi into a union of AF specific flowi
structs. There will be a common structure that each variant includes
first, much like struct sock_common.

This is the first step to move in that direction.

Signed-off-by: David S. Miller <davem@davemloft.net>
1b7fe59322bef9e7a2c05b64a07a66b875299736 11-Mar-2011 David S. Miller <davem@davemloft.net> ipv4: Kill flowi arg to fib_select_multipath()

Completely unused.

Signed-off-by: David S. Miller <davem@davemloft.net>
a7ac8fc1d8d26c975c460a69aa7b9d5b5d5d29b0 08-Mar-2011 David S. Miller <davem@davemloft.net> ipv4: Fix scope value used in route src-address caching.

We have to use cfg->fc_scope not the final nh_scope value.

Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
1fc050a13473348f5c439de2bb41c8e92dba5588 08-Mar-2011 David S. Miller <davem@davemloft.net> ipv4: Cache source address in nexthop entries.

When doing output route lookups, we have to select the source address
if the user has not specified an explicit one.

First, if the route has an explicit preferred source address
specified, then we use that.

Otherwise we search the route's outgoing interface for a suitable
address.

This search can be precomputed and cached at route insertion time.

The only missing part is that we have to refresh this precomputed
value any time addresses are added or removed from the interface, and
this is accomplished by fib_update_nh_saddrs().

Signed-off-by: David S. Miller <davem@davemloft.net>
3be0686b6e2f953afe83626e871b4a7b0ceae49b 08-Mar-2011 David S. Miller <davem@davemloft.net> ipv4: Inline fib_semantic_match into check_leaf

This elimiates a lot of pure overhead due to parameter
passing.

Signed-off-by: David S. Miller <davem@davemloft.net>
4c8237cd76a0510677dc2e3dd0f8866ec8e0b1e5 07-Mar-2011 David S. Miller <davem@davemloft.net> ipv4: Validate route entry type at insert instead of every lookup.

fib_semantic_match() requires that if the type doesn't signal an
automatic error, it must be of type RTN_UNICAST, RTN_LOCAL,
RTN_BROADCAST, RTN_ANYCAST, or RTN_MULTICAST.

Checking this every route lookup is pointless work.

Instead validate it during route insertion, via fib_create_info().

Also, there was nothing making sure the type value was less than
RTN_MAX, so add that missing check while we're here.

Signed-off-by: David S. Miller <davem@davemloft.net>
31d409373cca3517a30540b51f55dcb1f5af0d49 14-Feb-2011 Eric Dumazet <eric.dumazet@gmail.com> ipv4: fix rcu lock imbalance in fib_select_default()

Commit 0c838ff1ade7 (ipv4: Consolidate all default route selection
implementations.) forgot to remove one rcu_read_unlock() from
fib_select_default().

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
123b9731b14f49cd41c91ed2b6c31e515615347c 02-Feb-2011 David S. Miller <davem@davemloft.net> ipv4: Rename fib_hash_* locals in fib_semantics.c

To avoid confusion with the recently deleted fib_hash.c
code, use "fib_info_hash_*" instead of plain "fib_hash_*".

Signed-off-by: David S. Miller <davem@davemloft.net>
0c838ff1ade71162775afffd9e5c6478a60bdca6 01-Feb-2011 David S. Miller <davem@davemloft.net> ipv4: Consolidate all default route selection implementations.

Both fib_trie and fib_hash have a local implementation of
fib_table_select_default(). This is completely unnecessary
code duplication.

Since we now remember the fib_table and the head of the fib
alias list of the default route, we can implement one single
generic version of this routine.

Looking at the fib_hash implementation you may get the impression
that it's possible for there to be multiple top-level routes in
the table for the default route. The truth is, it isn't, the
insert code will only allow one entry to exist in the zero
prefix hash table, because all keys evaluate to zero and all
keys in a hash table must be unique.

Signed-off-by: David S. Miller <davem@davemloft.net>
5b4704419cbd0b7597a91c19f9e8e8b17c1af071 01-Feb-2011 David S. Miller <davem@davemloft.net> ipv4: Remember FIB alias list head and table in lookup results.

This will be used later to implement fib_select_default() in a
completely generic manner, instead of the current situation where the
default route is re-looked up in the TRIE/HASH table and then the
available aliases are analyzed.

Signed-off-by: David S. Miller <davem@davemloft.net>
725d1e1b457dc2bbebb337677e73efe7c6d14da5 28-Jan-2011 David S. Miller <davem@davemloft.net> ipv4: Attach FIB info to dst_default_metrics when possible

If there are no explicit metrics attached to a route, hook
fi->fib_info up to dst_default_metrics.

Signed-off-by: David S. Miller <davem@davemloft.net>
9c150e82ac50a611237bbebd508d17f6347d577c 28-Jan-2011 David S. Miller <davem@davemloft.net> ipv4: Allocate fib metrics dynamically.

This is the initial gateway towards super-sharing metrics
if they are all set to zero for a route.

Signed-off-by: David S. Miller <davem@davemloft.net>
c7066f70d9610df0b9406cc635fc09e86136e714 14-Jan-2011 Patrick McHardy <kaber@trash.net> netfilter: fix Kconfig dependencies

Fix dependencies of netfilter realm match: it depends on NET_CLS_ROUTE,
which itself depends on NET_SCHED; this dependency is missing from netfilter.

Since matching on realms is also useful without having NET_SCHED enabled and
the option really only controls whether the tclassid member is included in
route and dst entries, rename the config option to IP_ROUTE_CLASSID and move
it outside of traffic scheduling context to get rid of the NET_SCHED dependeny.

Reported-by: Vladis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
5811662b15db018c740c57d037523683fd3e6123 12-Nov-2010 Changli Gao <xiaosuo@gmail.com> net: use the macros defined for the members of flowi

Use the macros defined for the members of flowi to clean the code up.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9b0c290e78d667e6a483bde8c7cef7dd15f49017 21-Oct-2010 Eric Dumazet <eric.dumazet@gmail.com> fib: introduce fib_alias_accessed() helper

Perf tools session at NFWS 2010 pointed out a false sharing on struct
fib_alias that can be avoided pretty easily, if we set FA_S_ACCESSED bit
only if needed (ie : not already set)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8723e1b4ad9be4444423b4d41509ce859a629649 19-Oct-2010 Eric Dumazet <eric.dumazet@gmail.com> inet: RCU changes in inetdev_by_index()

Convert inetdev_by_index() to not increment in_dev refcount.

Callers hold RCU or RTNL, and should not decrement in_dev refcount.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ebc0ffae5dfb4447e0a431ffe7fe1d467c48bbb9 05-Oct-2010 Eric Dumazet <eric.dumazet@gmail.com> fib: RCU conversion of fib_lookup()

fib_lookup() converted to be called in RCU protected context, no
reference taken and released on a contended cache line (fib_clntref)

fib_table_lookup() and fib_semantic_match() get an additional parameter.

struct fib_info gets an rcu_head field, and is freed after an rcu grace
period.

Stress test :
(Sending 160.000.000 UDP frames on same neighbour,
IP route cache disabled, dual E5540 @2.53GHz,
32bit kernel, FIB_HASH) (about same results for FIB_TRIE)

Before patch :

real 1m31.199s
user 0m13.761s
sys 23m24.780s

After patch:

real 1m5.375s
user 0m14.997s
sys 15m50.115s

Before patch Profile :

13044.00 15.4% __ip_route_output_key vmlinux
8438.00 10.0% dst_destroy vmlinux
5983.00 7.1% fib_semantic_match vmlinux
5410.00 6.4% fib_rules_lookup vmlinux
4803.00 5.7% neigh_lookup vmlinux
4420.00 5.2% _raw_spin_lock vmlinux
3883.00 4.6% rt_set_nexthop vmlinux
3261.00 3.9% _raw_read_lock vmlinux
2794.00 3.3% fib_table_lookup vmlinux
2374.00 2.8% neigh_resolve_output vmlinux
2153.00 2.5% dst_alloc vmlinux
1502.00 1.8% _raw_read_lock_bh vmlinux
1484.00 1.8% kmem_cache_alloc vmlinux
1407.00 1.7% eth_header vmlinux
1406.00 1.7% ipv4_dst_destroy vmlinux
1298.00 1.5% __copy_from_user_ll vmlinux
1174.00 1.4% dev_queue_xmit vmlinux
1000.00 1.2% ip_output vmlinux

After patch Profile :

13712.00 15.8% dst_destroy vmlinux
8548.00 9.9% __ip_route_output_key vmlinux
7017.00 8.1% neigh_lookup vmlinux
4554.00 5.3% fib_semantic_match vmlinux
4067.00 4.7% _raw_read_lock vmlinux
3491.00 4.0% dst_alloc vmlinux
3186.00 3.7% neigh_resolve_output vmlinux
3103.00 3.6% fib_table_lookup vmlinux
2098.00 2.4% _raw_read_lock_bh vmlinux
2081.00 2.4% kmem_cache_alloc vmlinux
2013.00 2.3% _raw_spin_lock vmlinux
1763.00 2.0% __copy_from_user_ll vmlinux
1763.00 2.0% ip_output vmlinux
1761.00 2.0% ipv4_dst_destroy vmlinux
1631.00 1.9% eth_header vmlinux
1440.00 1.7% _raw_read_unlock_bh vmlinux

Reference results, if IP route cache is enabled :

real 0m29.718s
user 0m10.845s
sys 7m37.341s

25213.00 29.5% __ip_route_output_key vmlinux
9011.00 10.5% dst_release vmlinux
4817.00 5.6% ip_push_pending_frames vmlinux
4232.00 5.0% ip_finish_output vmlinux
3940.00 4.6% udp_sendmsg vmlinux
3730.00 4.4% __copy_from_user_ll vmlinux
3716.00 4.4% ip_route_output_flow vmlinux
2451.00 2.9% __xfrm_lookup vmlinux
2221.00 2.6% ip_append_data vmlinux
1718.00 2.0% _raw_spin_lock_bh vmlinux
1655.00 1.9% __alloc_skb vmlinux
1572.00 1.8% sock_wfree vmlinux
1345.00 1.6% kfree vmlinux

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6a31d2a97c04ffe9b161ec0177a2296366ff9249 04-Oct-2010 Eric Dumazet <eric.dumazet@gmail.com> fib: cleanups

Code style cleanups before upcoming functional changes.
C99 initializer for fib_props array.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5a0e3ad6af8660be21ca98a971cd00f331318c05 24-Mar-2010 Tejun Heo <tj@kernel.org> include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
d088dde7b19628f386c486f16cd471f5b5682ca8 03-Feb-2010 Christoph Egger <siccegge@stud.informatik.uni-erlangen.de> ipv4: obsolete config in kernel source (IP_ROUTE_PERVASIVE)

CONFIG_IP_ROUTE_PERVASIVE is missing a corresponding config
IP_ROUTE_PERVASIVE somewhere in KConfig (and missing it for ages
already) so it looks like some aging artefact no longer needed.

Therefor this patch kills of the only remaining reference to that
config Item removing the already unrechable code snipet.

Signed-off-by: Christoph Egger <siccegge@stud.informatik.uni-erlangen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
71fceff0ea36d5a6cffecb272b8b3970535fe905 15-Jan-2010 David S. Miller <davem@davemloft.net> ipv4: Use less conflicting local var name in change_nexthops() loop macro.

As noticed by H Hartley Sweeten, since change_nexthops() uses 'nh'
as it's iterator variable, it can conflict with other existing
local vars.

Use "nexthop_nh" to avoid the conflict and make it easier to figure
out where this magic variable comes from.

Signed-off-by: David S. Miller <davem@davemloft.net>
09ad9bc752519cc167d0a573e1acf69b5c707c67 26-Nov-2009 Octavian Purdila <opurdila@ixiacom.com> net: use net_eq to compare nets

Generated with the following semantic patch

@@
struct net *n1;
struct net *n2;
@@
- n1 == n2
+ net_eq(n1, n2)

@@
struct net *n1;
struct net *n2;
@@
- n1 != n2
+ !net_eq(n1, n2)

applied over {include,net,drivers/net}.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e204a345a0b11c6522ec06e62dee71628a492408 18-May-2009 Rami Rosen <ramirose@gmail.com> ipv4: cleanup - remove two unused parameters from fib_semantic_match().

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1ce85fe402137824246bad03ff85f3913d565c17 25-Feb-2009 Pablo Neira Ayuso <pablo@netfilter.org> netlink: change nlmsg_notify() return value logic

This patch changes the return value of nlmsg_notify() as follows:

If NETLINK_BROADCAST_ERROR is set by any of the listeners and
an error in the delivery happened, return the broadcast error;
else if there are no listeners apart from the socket that
requested a change with the echo flag, return the result of the
unicast notification. Thus, with this patch, the unicast
notification is handled in the same way of a broadcast listener
that has set the NETLINK_BROADCAST_ERROR socket flag.

This patch is useful in case that the caller of nlmsg_notify()
wants to know the result of the delivery of a netlink notification
(including the broadcast delivery) and take any action in case
that the delivery failed. For example, ctnetlink can drop packets
if the event delivery failed to provide reliable logging and
state-synchronization at the cost of dropping packets.

This patch also modifies the rtnetlink code to ignore the return
value of rtnl_notify() in all callers. The function rtnl_notify()
(before this patch) returned the error of the unicast notification
which makes rtnl_set_sk_err() reports errors to all listeners. This
is not of any help since the origin of the change (the socket that
requested the echoing) notices the ENOBUFS error if the notification
fails and should resync itself.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
d9319100c1ad7d0ed4045ded767684ad25670436 03-Nov-2008 Jianjun Kong <jianjun@zeuux.org> net: clean up net/ipv4/ah4.c esp4.c fib_semantics.c inet_connection_sock.c inetpeer.c ip_output.c

Signed-off-by: Jianjun Kong <jianjun@zeuux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
0b040829952d84bf2a62526f0e24b624e0699447 11-Jun-2008 Adrian Bunk <bunk@kernel.org> net: remove CVS keywords

This patch removes CVS keywords that weren't updated for a long time
from comments.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
709772e6e06564ed94ba740de70185ac3d792773 11-Jun-2008 Krzysztof Piotr Oledzki <ole@ans.pl> net: Fix routing tables with id > 255 for legacy software

Most legacy software do not like tables > 255 as rtm_table is u8
so tb_id is sent &0xff and it is possible to mismatch for example
table 510 with table 254 (main).

This patch introduces RT_TABLE_COMPAT=252 so the code uses it if
tb_id > 255. It makes such old applications happy, new
ones are still able to use RTA_TABLE to get a proper table id.

Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
57d7a6009241fe04a699e5f65d55cf5cb7008fc7 16-Apr-2008 Denis V. Lunev <den@openvz.org> [NETNS]: Add netns refcnt debug into fib_info.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4814bdbd590e835ecec2d5e505165ec1c19796b2 01-Feb-2008 Denis V. Lunev <den@openvz.org> [NETNS]: Lookup in FIB semantic hashes taking into account the namespace.

The namespace is not available in the fib_sync_down_addr, add it as a
parameter.

Looking up a device by the pointer to it is OK. Looking up using a
result from fib_trie/fib_hash table lookup is also safe. No need to
fix that at all. So, just fix lookup by address and insertion to the
hash table path.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7462bd744e8882f9ebb9220d46fd4fec8b35b082 01-Feb-2008 Denis V. Lunev <den@openvz.org> [NETNS]: Add a namespace mark to fib_info.

This is required to make fib_info lookups namespace aware. In the
other case initial namespace devices are marked as dead in the local
routing table during other namespace stop.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
85326fa54b5516d8859617cc5fdfce8ae19c1480 01-Feb-2008 Denis V. Lunev <den@openvz.org> [IPV4]: fib_sync_down rework.

fib_sync_down can be called with an address and with a device. In
reality it is called either with address OR with a device. The
codepath inside is completely different, so lets separate it into two
calls for these two cases.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
86167a377f1c4fb40742302ae7682dd574abde86 22-Jan-2008 Denis V. Lunev <den@openvz.org> [NETNS]: Pass correct namespace in context fib_check_nh.

Correct network namespace is already used in fib_check_nh. Re-work its
usage for better readability and pass into fib_lookup &
inetdev_by_index.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7fee0ca23711ce1a6b13d3ab78915809a72a59ec 22-Jan-2008 Denis V. Lunev <den@openvz.org> [NETNS]: Add netns parameter to inetdev_by_index.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
da0e28cb68a7e22b47c6ae1a5b12cb538c13c69f 22-Jan-2008 Denis V. Lunev <den@openvz.org> [NETNS]: Add netns parameter to fib_lookup.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
88ebc72f68974965b41ad7e8e441df57a530e386 13-Jan-2008 David S. Miller <davem@davemloft.net> [IPV4] FIB: Include nexthop device indexes in fib_info hashfn.

Signed-off-by: David S. Miller <davem@davemloft.net>
a6db9010922f2c02db2bbea8c17c50e451be38d9 13-Jan-2008 Stephen Hemminger <stephen.hemminger@vyatta.com> [IPV4] FIB: printk related cleanups

printk related cleanups:
* Get rid of unused printk wrappers.
* Make bug checks into KERN_WARNING because KERN_DEBUG gets ignored
* Turn one cryptic old message into something real
* Make sure all messages have KERN_XXX

Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4d1169c1e781e5853317c6b75620d678b2c4854e 10-Jan-2008 Denis V. Lunev <den@openvz.org> [NETNS]: Add netns to nl_info structure.

nl_info is used to track the end-user destination of routing change
notification. This is a natural object to hold a namespace on. Place
it there and utilize the context in the appropriate places.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6b175b26c1048d331508940ad3516ead1998084f 10-Jan-2008 Eric W. Biederman <ebiederm@xmission.com> [NETNS]: Add netns parameter to inet_(dev_)add_type.

The patch extends the inet_addr_type and inet_dev_addr_type with the
network namespace pointer. That allows to access the different tables
relatively to the network namespace.

The modification of the signature function is reported in all the
callers of the inet_addr_type using the pointer to the well known
init_net.

Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c17860a039bbde134324ad6f9331500635f5799d 08-Dec-2007 Denis V. Lunev <den@openvz.org> [IPV4]: no need pass pointer to a default into fib_detect_death

ipv4: no need pass pointer to a default into fib_detect_death

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
88f8349164cf4035e63db9e6f484e0ce9cb17e47 26-Nov-2007 Joonwoo Park <joonwpark81@gmail.com> [IPV4] fib_semantics: kmalloc + memset conversion to kzalloc

fib_semantics: kmalloc + memset conversion to kzalloc
fix to avoid memset entirely.

Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
97c53cacf00d1f5aa04adabfebcc806ca8b22b10 20-Nov-2007 Denis V. Lunev <den@openvz.org> [NET]: Make rtnetlink infrastructure network namespace aware (v3)

After this patch none of the netlink callback support anything
except the initial network namespace but the rtnetlink infrastructure
now handles multiple network namespaces.

Changes from v2:
- IPv6 addrlabel processing

Changes from v1:
- no need for special rtnl_unlock handling
- fixed IPv6 ndisc

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8f4c1f9b049df3be11090f1c2c4738700302acae 12-Sep-2007 Thomas Graf <tgraf@suug.ch> [NETLINK]: Introduce nested and byteorder flag to netlink attribute

This change allows the generic attribute interface to be used within
the netfilter subsystem where this flag was initially introduced.

The byte-order flag is yet unused, it's intended use is to
allow automatic byte order convertions for all atomic types.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
881d966b48b035ab3f3aeaae0f3d3f9b584f45b2 17-Sep-2007 Eric W. Biederman <ebiederm@xmission.com> [NET]: Make the device list and device lookups per namespace.

This patch makes most of the generic device layer network
namespace safe. This patch makes dev_base_head a
network namespace variable, and then it picks up
a few associated variables. The functions:
dev_getbyhwaddr
dev_getfirsthwbytype
dev_get_by_flags
dev_get_by_name
__dev_get_by_name
dev_get_by_index
__dev_get_by_index
dev_ioctl
dev_ethtool
dev_load
wireless_process_ioctl

were modified to take a network namespace argument, and
deal with it.

vlan_ioctl_set and brioctl_set were modified so their
hooks will receive a network namespace argument.

So basically anthing in the core of the network stack that was
affected to by the change of dev_base was modified to handle
multiple network namespaces. The rest of the network stack was
simply modified to explicitly use &init_net the initial network
namespace. This can be fixed when those components of the network
stack are modified to handle multiple network namespaces.

For now the ifindex generator is left global.

Fundametally ifindex numbers are per namespace, or else
we will have corner case problems with migration when
we get that far.

At the same time there are assumptions in the network stack
that the ifindex of a network device won't change. Making
the ifindex number global seems a good compromise until
the network stack can cope with ifindex changes when
you change namespaces, and the like.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e06e7c615877026544ad7f8b309d1a3706410383 11-Jun-2007 David S. Miller <davem@sunset.davemloft.net> [IPV4]: The scheduled removal of multipath cached routing support.

With help from Chris Wedgwood.

Signed-off-by: David S. Miller <davem@davemloft.net>
b8f558313506b5bc435f2e031f3bec4b1725098e 23-May-2007 Milan Kocian <milon@wq.cz> [RTNETLINK]: Fix sending netlink message when replace route.

When you replace route via ip r r command the netlink multicast message is
not send. This patch corrects it. NL message is sent with NLM_F_REPLACE
flag.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=8320

Signed-off-by: Milan Kocian <milon@wq.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3ff50b7997fe06cd5d276b229967bb52d6b3b6c1 21-Apr-2007 Stephen Hemminger <shemminger@linux-foundation.org> [NET]: cleanup extra semicolons

Spring cleaning time...

There seems to be a lot of places in the network code that have
extra bogus semicolons after conditionals. Most commonly is a
bogus semicolon after: switch() { }

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
a0ee18b9b7d3847976c6fb315c06a34fb296de0e 25-Mar-2007 Thomas Graf <tgraf@suug.ch> [IPv4] fib: Fix out of bound access of fib_props[]

Fixes a typo which caused fib_props[] to have the wrong size
and makes sure the value used to index the array which is
provided by userspace via netlink is checked to avoid out of
bound access.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
e905a9edab7f4f14f9213b52234e4a346c690911 09-Feb-2007 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [NET] IPV4: Fix whitespace errors.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
26932566a42d46aee7e5d526cb34fba9380cad10 01-Feb-2007 Patrick McHardy <kaber@trash.net> [NETLINK]: Don't BUG on undersized allocations

Currently netlink users BUG when the allocated skb for an event
notification is undersized. While this is certainly a kernel bug,
its not critical and crashing the kernel is too drastic, especially
when considering that these errors have appeared multiple times in
the past and it BUGs even if no listeners are present.

This patch replaces BUG by WARN_ON and changes the notification
functions to inform potential listeners of undersized allocations
using a unique error code (EMSGSIZE).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
339bf98ffc6a8d8eb16fc532ac57ffbced2f8a68 10-Nov-2006 Thomas Graf <tgraf@suug.ch> [NETLINK]: Do precise netlink message allocations where possible

Account for the netlink message header size directly in nlmsg_new()
instead of relying on the caller calculate it correctly.

Replaces error handling of message construction functions when
constructing notifications with bug traps since a failure implies
a bug in calculating the size of the skb.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
81f7bf6cbaca02c034b0393c51fc22b29cba20f7 28-Sep-2006 Al Viro <viro@zeniv.linux.org.uk> [IPV4]: net/ipv4/fib annotations

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
1e8aa6f125d959d1a9f16a3f15e9ebbc6df63935 27-Sep-2006 Al Viro <viro@zeniv.linux.org.uk> [IPV4] bug: open-coded inet_make_mask() in fib_semantic_match() is broken

... and works only on little-endian

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
1ef1b8c85bb4954b7dca816f1bb7fd54bcd78078 27-Sep-2006 Al Viro <viro@zeniv.linux.org.uk> [IPV4]: fib_semantic_match() annotations

'mask' and 'zone' arguments are net-endian

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
d878e72e419db9ff4c66848375ee30a19820e4de 27-Sep-2006 Al Viro <viro@zeniv.linux.org.uk> [IPV4]: ip_fib_check_default() annotated

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
17fb2c64394a2d5106540d69fc83c183ee7c206e 27-Sep-2006 Al Viro <viro@zeniv.linux.org.uk> [IPV4]: RTA_{DST,SRC,GATEWAY,PREFSRC} annotated

these are passed net-endian; use be32 netlink accessors

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
b83738ae003dde613712ddb1e90a8a01f5587b51 27-Sep-2006 Al Viro <viro@zeniv.linux.org.uk> [IPV4]: FIB_RES_PREFSRC() annotated

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
be403ea1856f1428b5912b42184acbba808c41d6 18-Aug-2006 Thomas Graf <tgraf@suug.ch> [IPv4]: Convert FIB dumping to use new netlink api

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
4e902c57417c4c285b98ba2722468d1c3ed83d1b 18-Aug-2006 Thomas Graf <tgraf@suug.ch> [IPv4]: FIB configuration using struct fib_config

Introduces struct fib_config replacing the ugly struct kern_rta
prone to ordering issues. Avoids creating faked netlink messages
for auto generated routes or requests via ioctl.

A new interface net/nexthop.h is added to help navigate through
nexthop configuration arrays.

A new struct nl_info will be used to carry the necessary netlink
information to be used for notifications later on.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
f21c7bc5f6a0a5bd03988886ff46656bc3f255b7 15-Aug-2006 Thomas Graf <tgraf@suug.ch> [IPv4] route: Convert route notifications to use rtnl_notify()

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
9e762a4a89b302cb3b26a1f9bb33eff459eaeca9 11-Aug-2006 Patrick McHardy <kaber@trash.net> [NET]: Introduce RTA_TABLE/FRA_TABLE attributes

Introduce RTA_TABLE route attribute and FRA_TABLE routing rule attribute
to hold 32 bit routing table IDs. Usespace compatibility is provided by
continuing to accept and send the rtm_table field, but because of its
limited size it can only carry the low 8 bits of the table ID. This
implies that if larger IDs are used, _all_ userspace programs using them
need to use RTA_TABLE.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2dfe55b47e3d66ded5a84caf71e0da5710edf48b 11-Aug-2006 Patrick McHardy <kaber@trash.net> [NET]: Use u32 for routing table IDs

Use u32 for routing table IDs in net/ipv4 and net/decnet in preparation of
support for a larger number of routing tables. net/ipv6 already uses u32
everywhere and needs no further changes. No functional changes are made by
this patch.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
832b4c5e184391773e462653aa862a8cab71f38d 30-Aug-2006 Stephen Hemminger <shemminger@osdl.org> [IPV4] fib: convert reader/writer to spinlock

Ther is no point in using a more expensive reader/writer lock
for a low contention lock like the fib_info_lock. The only
reader case is in handling route redirects.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6e8fcbf64024f9056ba122abbb66554aa76bae5d 18-Aug-2006 Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> [IPV4]: severe locking bug in fib_semantics.c

Found in 2.4 by Yixin Pan <yxpan@hotmail.com>.

> When I read fib_semantics.c of Linux-2.4.32, write_lock(&fib_info_lock) =
> is used in fib_release_info() instead of write_lock_bh(&fib_info_lock). =
> Is the following case possible: a BH interrupts fib_release_info() while =
> holding the write lock, and calls ip_check_fib_default() which calls =
> read_lock(&fib_info_lock), and spin forever.

Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
8265abc082d2283b4ef20237efadb71c6f16ed0c 22-Jul-2006 Patrick McHardy <kaber@trash.net> [IPV4]: Fix nexthop realm dumping for multipath routes

Routing realms exist per nexthop, but are only returned to userspace
for the first nexthop. This is due to the fact that iproute2 only
allows to set the realm for the first nexthop and the kernel refuses
multipath routes where only a single realm is present.

Dump all realms for multipath routes to enable iproute to correctly
display them.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
0da974f4f303a6842516b764507e3c0a03f41e5a 21-Jul-2006 Panagiotis Issaris <takis@issaris.org> [NET]: Conversions from kmalloc+memset to k(z|c)alloc.

Signed-off-by: Panagiotis Issaris <takis@issaris.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6ab3d5624e172c553004ecc862bfeac16d9d68b7 30-Jun-2006 Jörn Engel <joern@wohnheim.fh-wedel.de> Remove obsolete #include <linux/config.h>

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
28633514afd68afa77ed2fa34fa53626837bf2d5 10-Feb-2006 Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> [NETLINK]: illegal use of pid in rtnetlink

When a netlink message is not related to a netlink socket,
it is issued by kernel socket with pid 0. Netlink "pid" has nothing
to do with current->pid. I called it incorrectly, if it was named "port",
the confusion would be avoided.

Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
14c850212ed8f8cbb5972ad6b8812e08a0bc901c 27-Dec-2005 Arnaldo Carvalho de Melo <acme@mandriva.com> [INET_SOCK]: Move struct inet_sock & helper functions to net/inet_sock.h

To help in reducing the number of include dependencies, several files were
touched as they were getting needed headers indirectly for stuff they use.

Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had
linux/dccp.h include twice.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9b5b5cff9a6655dbb6d2e2be365bb95eec3950eb 30-Nov-2005 Arjan van de Ven <arjan@infradead.org> [NET]: Add const markers to various variables.

the patch below marks various variables const in net/; the goal is to
move them to the .rodata section so that they can't false-share
cachelines with things that get written to, as well as potentially
helping gcc a bit with optimisations. (these were found using a gcc
patch to warn about such variables)

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
e5ed639913eea3e4783a550291775ab78dd84966 03-Oct-2005 Herbert Xu <herbert@gondor.apana.org.au> [IPV4]: Replace __in_dev_get with __in_dev_get_rcu/rtnl

The following patch renames __in_dev_get() to __in_dev_get_rtnl() and
introduces __in_dev_get_rcu() to cover the second case.

1) RCU with refcnt should use in_dev_get().
2) RCU without refcnt should use __in_dev_get_rcu().
3) All others must hold RTNL and use __in_dev_get_rtnl().

There is one exception in net/ipv4/route.c which is in fact a pre-existing
race condition. I've marked it as such so that we remember to fix it.

This patch is based on suggestions and prior work by Suzanne Wood and
Paul McKenney.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
e5b4376074e02b783e56a8f7c42d544e18112c4e 25-Aug-2005 Robert Olsson <Robert.Olsson@data.slu.se> [IPV4]: Prepare FIB core for RCU.

* RCU versions of hlist_***_rcu
* fib_alias partial rcu port just whats needed now.

Signed-off-by: Robert Olsson <Robert.Olsson@data.slu.se>
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ac6d439d2097b72ea0cbc2322ce1263a38bc1fd0 15-Aug-2005 Patrick McHardy <kaber@trash.net> [NETLINK]: Convert netlink users to use group numbers instead of bitmasks

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
b7656e7f2944984befa3ab99a5b99f99a23b302b 05-Aug-2005 David S. Miller <davem@davemloft.net> [IPV4]: Fix memory leak during fib_info hash expansion.

When we grow the tables, we forget to free the olds ones
up.

Noticed by Yan Zheng.

Signed-off-by: David S. Miller <davem@davemloft.net>
9ed19f339e12e731986de84134ac293cd15910a7 19-Jun-2005 Jamal Hadi Salim <hadi@cyberus.ca> [NETLINK]: Set correct pid for ioctl originating netlink events

This patch ensures that netlink events created as a result of programns
using ioctls (such as ifconfig, route etc) contains the correct PID of
those events.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
b6544c0b4cf2bd96195f3cdb7cebfb35090fc557 19-Jun-2005 Jamal Hadi Salim <hadi@cyberus.ca> [NETLINK]: Correctly set NLM_F_MULTI without checking the pid

This patch rectifies some rtnetlink message builders that derive the
flags from the pid. It is now explicit like the other cases
which get it right. Also fixes half a dozen dumpers which did not
set NLM_F_MULTI at all.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 17-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org> Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!