History log of /net/sched/cls_u32.c
Revision Date Author Comments
a0efb80ce3abacfd22a4284c3730924fc2f1f077 01-Oct-2014 WANG Cong <xiyou.wangcong@gmail.com> net_sched: avoid calling tcf_unbind_filter() in call_rcu callback

This fixes the following crash:

[ 63.976822] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 63.980094] CPU: 1 PID: 15 Comm: ksoftirqd/1 Not tainted 3.17.0-rc6+ #648
[ 63.980094] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 63.980094] task: ffff880117dea690 ti: ffff880117dfc000 task.ti: ffff880117dfc000
[ 63.980094] RIP: 0010:[<ffffffff817e6d07>] [<ffffffff817e6d07>] u32_destroy_key+0x27/0x6d
[ 63.980094] RSP: 0018:ffff880117dffcc0 EFLAGS: 00010202
[ 63.980094] RAX: ffff880117dea690 RBX: ffff8800d02e0820 RCX: 0000000000000000
[ 63.980094] RDX: 0000000000000001 RSI: 0000000000000002 RDI: 6b6b6b6b6b6b6b6b
[ 63.980094] RBP: ffff880117dffcd0 R08: 0000000000000000 R09: 0000000000000000
[ 63.980094] R10: 00006c0900006ba8 R11: 00006ba100006b9d R12: 0000000000000001
[ 63.980094] R13: ffff8800d02e0898 R14: ffffffff817e6d4d R15: ffff880117387a30
[ 63.980094] FS: 0000000000000000(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
[ 63.980094] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 63.980094] CR2: 00007f07e6732fed CR3: 000000011665b000 CR4: 00000000000006e0
[ 63.980094] Stack:
[ 63.980094] ffff88011a9cd300 ffffffff82051ac0 ffff880117dffce0 ffffffff817e6d68
[ 63.980094] ffff880117dffd70 ffffffff810cb4c7 ffffffff810cb3cd ffff880117dfffd8
[ 63.980094] ffff880117dea690 ffff880117dea690 ffff880117dfffd8 000000000000000a
[ 63.980094] Call Trace:
[ 63.980094] [<ffffffff817e6d68>] u32_delete_key_freepf_rcu+0x1b/0x1d
[ 63.980094] [<ffffffff810cb4c7>] rcu_process_callbacks+0x3bb/0x691
[ 63.980094] [<ffffffff810cb3cd>] ? rcu_process_callbacks+0x2c1/0x691
[ 63.980094] [<ffffffff817e6d4d>] ? u32_destroy_key+0x6d/0x6d
[ 63.980094] [<ffffffff810780a4>] __do_softirq+0x142/0x323
[ 63.980094] [<ffffffff810782a8>] run_ksoftirqd+0x23/0x53
[ 63.980094] [<ffffffff81092126>] smpboot_thread_fn+0x203/0x221
[ 63.980094] [<ffffffff81091f23>] ? smpboot_unpark_thread+0x33/0x33
[ 63.980094] [<ffffffff8108e44d>] kthread+0xc9/0xd1
[ 63.980094] [<ffffffff819e00ea>] ? do_wait_for_common+0xf8/0x125
[ 63.980094] [<ffffffff8108e384>] ? __kthread_parkme+0x61/0x61
[ 63.980094] [<ffffffff819e43ec>] ret_from_fork+0x7c/0xb0
[ 63.980094] [<ffffffff8108e384>] ? __kthread_parkme+0x61/0x61

tp could be freed in call_rcu callback too, the order is not guaranteed.

John Fastabend says:

====================
Its worth noting why this is safe. Any running schedulers will either
read the valid class field or it will be zeroed.

All schedulers today when the class is 0 do a lookup using the
same call used by the tcf_exts_bind(). So even if we have a running
classifier hit the null class pointer it will do a lookup and get
to the same result. This is particularly fragile at the moment because
the only way to verify this is to audit the schedulers call sites.
====================

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
18d0264f630e200772bf236ac5747c47e908501e 25-Sep-2014 WANG Cong <xiyou.wangcong@gmail.com> net_sched: remove the first parameter from tcf_exts_destroy()

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <hadi@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a2aeb02a8e6a9fef397c344245a54eeae67341f6 22-Sep-2014 Eric Dumazet <edumazet@google.com> net: sched: fix compile warning in cls_u32

$ grep CONFIG_CLS_U32_MARK .config
# CONFIG_CLS_U32_MARK is not set

net/sched/cls_u32.c: In function 'u32_change':
net/sched/cls_u32.c:852:1: warning: label 'errout' defined but not used
[-Wunused-label]

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
de5df63228fcfbd5bb7fd883774c18fec9e61f12 20-Sep-2014 John Fastabend <john.fastabend@gmail.com> net: sched: cls_u32 changes to knode must appear atomic to readers

Changes to the cls_u32 classifier must appear atomic to the
readers. Before this patch if a change is requested for both
the exts and ifindex, first the ifindex is updated then the
exts with tcf_exts_change(). This opens a small window where
a reader can have a exts chain with an incorrect ifindex. This
violates the the RCU semantics.

Here we resolve this by always passing u32_set_parms() a copy
of the tc_u_knode to work on and then inserting it into the hash
table after the updates have been successfully applied.

Tested with the following short script:

#tc filter add dev p3p2 parent 8001:0 protocol ip prio 99 handle 1: \
u32 divisor 256

#tc filter add dev p3p2 parent 8001:0 protocol ip prio 99 \
u32 link 1: hashkey mask ffffff00 at 12 \
match ip src 192.168.8.0/2

#tc filter add dev p3p2 parent 8001:0 protocol ip prio 102 \
handle 1::10 u32 classid 1:2 ht 1: \
match ip src 192.168.8.0/8 match ip tos 0x0a 1e

#tc filter change dev p3p2 parent 8001:0 protocol ip prio 102 \
handle 1::10 u32 classid 1:2 ht 1: \
match ip src 1.1.0.0/8 match ip tos 0x0b 1e

CC: Eric Dumazet <edumazet@google.com>
CC: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a1ddcfee2d9ae172d0095f3f8227f7fa53288c65 20-Sep-2014 John Fastabend <john.fastabend@gmail.com> net: cls_u32: fix missed pcpu_success free_percpu

This fixes a missed free_percpu in the unwind code path and when
keys are destroyed.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4e2840eee6b21cb5230bd7cac8407badb201aac3 17-Sep-2014 John Fastabend <john.fastabend@gmail.com> net: sched: cls_u32: rcu can not be last node

tc_u32_sel 'sel' in tc_u_knode expects to be the last element in the
structure and pads the structure with tc_u32_key fields for each key.

kzalloc(sizeof(*n) + s->nkeys*sizeof(struct tc_u32_key), GFP_KERNEL)

CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a96366bf263919c529baa74a0b029c82a8388045 16-Sep-2014 John Fastabend <john.fastabend@gmail.com> net: sched: cls_u32 add missing rcu_assign_pointer and annotation

Add missing rcu_assign_pointer and missing annotation for ht_up
in cls_u32.c

Caught by kbuild bot,

>> net/sched/cls_u32.c:378:36: sparse: incorrect type in initializer (different address spaces)
net/sched/cls_u32.c:378:36: expected struct tc_u_hnode *ht
net/sched/cls_u32.c:378:36: got struct tc_u_hnode [noderef] <asn:4>*ht_up
>> net/sched/cls_u32.c:610:54: sparse: incorrect type in argument 4 (different address spaces)
net/sched/cls_u32.c:610:54: expected struct tc_u_hnode *ht
net/sched/cls_u32.c:610:54: got struct tc_u_hnode [noderef] <asn:4>*ht_up
>> net/sched/cls_u32.c:684:18: sparse: incorrect type in assignment (different address spaces)
net/sched/cls_u32.c:684:18: expected struct tc_u_hnode [noderef] <asn:4>*ht_up
net/sched/cls_u32.c:684:18: got struct tc_u_hnode *[assigned] ht
>> net/sched/cls_u32.c:359:18: sparse: dereference of noderef expression

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
80aab73de4a076fc70ad5cc60395d935c40e605d 16-Sep-2014 John Fastabend <john.fastabend@gmail.com> net: sched: fix unsued cpu variable

kbuild test robot reported an unused variable cpu in cls_u32.c
after the patch below. This happens when PERF and MARK config
variables are disabled

Fix this is to use separate variables for perf and mark
and define the cpu variable inside the ifdef logic.

Fixes: 459d5f626da7 ("net: sched: make cls_u32 per cpu")'
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1ce87720d456e471de0fbd814dc5d1fe10fc1c44 13-Sep-2014 John Fastabend <john.fastabend@gmail.com> net: sched: make cls_u32 lockless

Make cls_u32 classifier safe to run without holding lock. This patch
converts statistics that are kept in read section u32_classify into
per cpu counters.

This patch was tested with a tight u32 filter add/delete loop while
generating traffic with pktgen. By running pktgen on vlan devices
created on top of a physical device we can hit the qdisc layer
correctly. For ingress qdisc's a loopback cable was used.

for i in {1..100}; do
q=`echo $i%8|bc`;
echo -n "u32 tos: iteration $i on queue $q";
tc filter add dev p3p2 parent $p prio $i u32 match ip tos 0x10 0xff \
action skbedit queue_mapping $q;
sleep 1;
tc filter del dev p3p2 prio $i;

echo -n "u32 tos hash table: iteration $i on queue $q";
tc filter add dev p3p2 parent $p protocol ip prio $i handle 628: u32 divisor 1
tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
match ip protocol 17 0xff link 628: offset at 0 mask 0xf00 shift 6 plus 0
tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
ht 628:0 match ip tos 0x10 0xff action skbedit queue_mapping $q
sleep 2;
tc filter del dev p3p2 prio $i
sleep 1;
done

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
459d5f626da75573e985a7197b0919c3b143146c 13-Sep-2014 John Fastabend <john.fastabend@gmail.com> net: sched: make cls_u32 per cpu

This uses per cpu counters in cls_u32 in preparation
to convert over to rcu.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8f787cd1cc1ea51cde3bba82bd0a63b343f88a32 13-Sep-2014 John Fastabend <john.fastabend@gmail.com> net: sched: make cls_u32 lockless

Make cls_u32 classifier safe to run without holding lock. This patch
converts statistics that are kept in read section u32_classify into
per cpu counters.

This patch was tested with a tight u32 filter add/delete loop while
generating traffic with pktgen. By running pktgen on vlan devices
created on top of a physical device we can hit the qdisc layer
correctly. For ingress qdisc's a loopback cable was used.

for i in {1..100}; do
q=`echo $i%8|bc`;
echo -n "u32 tos: iteration $i on queue $q";
tc filter add dev p3p2 parent $p prio $i u32 match ip tos 0x10 0xff \
action skbedit queue_mapping $q;
sleep 1;
tc filter del dev p3p2 prio $i;

echo -n "u32 tos hash table: iteration $i on queue $q";
tc filter add dev p3p2 parent $p protocol ip prio $i handle 628: u32 divisor 1
tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
match ip protocol 17 0xff link 628: offset at 0 mask 0xf00 shift 6 plus 0
tc filter add dev p3p2 parent $p protocol ip prio $i u32 \
ht 628:0 match ip tos 0x10 0xff action skbedit queue_mapping $q
sleep 2;
tc filter del dev p3p2 prio $i
sleep 1;
done

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f4f640502dfbe9b658f9008ee614932bb463d541 13-Sep-2014 John Fastabend <john.fastabend@gmail.com> net: sched: make cls_u32 per cpu

This uses per cpu counters in cls_u32 in preparation
to convert over to rcu.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7801db8aec957fa6610efe0ee26a6c8bc0f1d73b 18-Jul-2014 Cong Wang <cwang@twopensource.com> net_sched: avoid generating same handle for u32 filters

When kernel generates a handle for a u32 filter, it tries to start
from the max in the bucket. So when we have a filter with the max (fff)
handle, it will cause kernel always generates the same handle for new
filters. This can be shown by the following command:

tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent ffff: protocol ip pref 770 handle 800::fff u32 match ip protocol 1 0xff
tc filter add dev eth0 parent ffff: protocol ip pref 770 u32 match ip protocol 1 0xff
...

we will get some u32 filters with same handle:

# tc filter show dev eth0 parent ffff:
filter protocol ip pref 770 u32
filter protocol ip pref 770 u32 fh 800: ht divisor 1
filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0
match 00010000/00ff0000 at 8
filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0
match 00010000/00ff0000 at 8
filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0
match 00010000/00ff0000 at 8
filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0
match 00010000/00ff0000 at 8

handles should be unique. This patch fixes it by looking up a bitmap,
so that can guarantee the handle is as unique as possible. For compatibility,
we still start from 0x800.

Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2f7ef2f8790f5bf53db4fc6b2310943139285827 25-Apr-2014 Cong Wang <cwang@twopensource.com> sched, cls: check if we could overwrite actions when changing a filter

When actions are attached to a filter, they are a part of the filter
itself, so when changing a filter we should allow to overwrite the actions
inside as well.

In my specific case, when I tried to _append_ a new action to an existing
filter which already has an action, I got EEXIST since kernel refused
to overwrite the existing one in kernel.

This patch checks if we are changing the filter checking NLM_F_CREATE flag
(Sigh, filters don't use NLM_F_REPLACE...) and then passes the boolean down
to actions. This fixes the problem above.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a8701a6c7ae0142393d0fe87a1e7778bd04d1ac7 10-Jan-2014 WANG Cong <xiyou.wangcong@gmail.com> net_sched: avoid casting void pointer

tp->root is a void* pointer, no need to cast it.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2519a602c273c5254781bc55b6e678a17e469a12 10-Jan-2014 WANG Cong <xiyou.wangcong@gmail.com> net_sched: optimize tcf_match_indev()

tcf_match_indev() is called in fast path, it is not wise to
search for a netdev by ifindex and then compare by its name,
just compare the ifindex.

Also, dev->name could be changed by user-space, therefore
the match would be always fail, but dev->ifindex could
be consistent.

BTW, this will also save some bytes from the core struct of u32.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
832d1d5bfaefafa5aa40282f6765c6d996fe384e 10-Jan-2014 WANG Cong <xiyou.wangcong@gmail.com> net_sched: add struct net pointer to tcf_proto_ops->dump

It will be needed by the next patch.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5da57f422d89c504a1d72dadd4e19d3dca8e974e 16-Dec-2013 WANG Cong <xiyou.wangcong@gmail.com> net_sched: cls: refactor out struct tcf_ext_map

These information can be saved in tcf_exts, and this will
simplify the code.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
33be627159913b094bb578e83e9a7fdc66c10208 16-Dec-2013 WANG Cong <xiyou.wangcong@gmail.com> net_sched: act: use standard struct list_head

Currently actions are chained by a singly linked list,
therefore it is a bit hard to add and remove a specific
entry. Convert it to struct list_head so that in the
latter patch we can remove an action without finding
its head.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
82d567c26647aa2e54c6814cd593a32f7799e387 10-Dec-2013 Yang Yingliang <yangyingliang@huawei.com> net_sched: change "foo* bar" to "foo *bar"

"foo* bar" or "foo * bar" should be "foo *bar".

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c1b52739e45f5969b208ebc377f52468280af11e 14-Jan-2013 Benjamin LaHaise <bcrl@kvack.org> pkt_sched: namespace aware act_mirred

Eric Dumazet pointed out that act_mirred needs to find the current net_ns,
and struct net pointer is not provided in the call chain. His original
patch made use of current->nsproxy->net_ns to find the network namespace,
but this fails to work correctly for userspace code that makes use of
netlink sockets in different network namespaces. Instead, pass the
"struct net *" down along the call chain to where it is needed.

This version removes the ifb changes as Eric has submitted that patch
separately, but is otherwise identical to the previous version.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
af4c6641f5ad445fe6d0832da42406dbd9a37ce4 25-May-2012 Eric W. Biederman <ebiederm@xmission.com> net sched: Pass the skb into change so it can access NETLINK_CB

cls_flow.c plays with uids and gids. Unless I misread that
code it is possible for classifiers to depend on the specific uid and
gid values. Therefore I need to know the user namespace of the
netlink socket that is installing the packet classifiers. Pass
in the rtnetlink skb so I can access the NETLINK_CB of the passed
packet. In particular I want access to sk_user_ns(NETLINK_CB(in_skb).ssk).

Pass in not the user namespace but the incomming rtnetlink skb into
the the classifier change routines as that is generally the more useful
parameter.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
e87cc4728f0e2fb663e592a1141742b1d6c63256 13-May-2012 Joe Perches <joe@perches.com> net: Convert net_ratelimit uses to net_<level>_ratelimited

Standardize the net core ratelimited logging functions.

Coalesce formats, align arguments.
Change a printk then vprintk sequence to use printf extension %pV.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1b34ec43c9b3de44a5420841ab293d1b2035a94c 29-Mar-2012 David S. Miller <davem@davemloft.net> pkt_sched: Stop using NLA_PUT*().

These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
dc7f9f6e8838556f226c2ebd1da7bb305cb25654 06-Jul-2011 Eric Dumazet <eric.dumazet@gmail.com> net: sched: constify tcf_proto and tc_action

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
86fce3ba1e731cf6d97a4157a192ffa60dc7ec0b 20-Feb-2011 stephen hemminger <shemminger@vyatta.com> cls_u32: fix sparse warnings

The variable _data is used in asm-generic to define sections
which causes sparse warnings, so just rename the variable.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cc7ec456f82da7f89a5b376e613b3ac4311b3e9a 19-Jan-2011 Eric Dumazet <eric.dumazet@gmail.com> net_sched: cleanups

Cleanup net/sched code to current CodingStyle and practices.

Reduce inline abuse

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4e18b3edf71f5d4ad653e3c2ff6560878e965f96 04-Oct-2010 Dan Carpenter <error27@gmail.com> cls_u32: signedness bug

skb_headroom() is unsigned so "skb_headroom(skb) + toff" is also
unsigned and can't be less than zero. This test was added in 66d50d25:
"u32: negative offset fix" It was supposed to fix a regression.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
66d50d25502cd9b7d6e3ebbf4e241259c1283eaf 02-Aug-2010 stephen hemminger <shemminger@vyatta.com> u32: negative offset fix

It was possible to use a negative offset in a u32 match to reference
the ethernet header or other parts of the link layer header.
This fixes the regression caused by:

commit fbc2e7d9cf49e0bf89b9e91fd60a06851a855c5d
Author: Changli Gao <xiaosuo@gmail.com>
Date: Wed Jun 2 07:32:42 2010 -0700

cls_u32: use skb_header_pointer() to dereference data safely

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fbc2e7d9cf49e0bf89b9e91fd60a06851a855c5d 02-Jun-2010 Changli Gao <xiaosuo@gmail.com> cls_u32: use skb_header_pointer() to dereference data safely

use skb_header_pointer() to dereference data safely

the original skb->data dereference isn't safe, as there isn't any skb->len or
skb_is_nonlinear() check. skb_header_pointer() is used instead in this patch.
And when the skb isn't long enough, we terminate the function u32_classify()
immediately with -1.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6ff9c3644e72bfac20844e0155c2cc8108602820 12-May-2010 stephen hemminger <shemminger@vyatta.com> net sched: printk message severity

The previous patch encourage me to go look at all the messages in
the network scheduler and fix them. Many messages were missing
any severity level. Some serious ones that should never happen
were turned into WARN(), and the random noise messages that were
handled changed to pr_debug().

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5a0e3ad6af8660be21ca98a971cd00f331318c05 24-Mar-2010 Tejun Heo <tj@kernel.org> include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
b138338056fc423c61a583d45f8aa64cfad87131 24-Mar-2010 Frans Pop <elendil@planet.nl> net: remove trailing space in messages

Signed-off-by: Frans Pop <elendil@planet.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
6f57321422e0d359e83c978c2b03db77b967b7d5 06-Jan-2009 Jarek Poplawski <jarkao2@gmail.com> pkt_sched: cls_u32: Fix locking in u32_change()

New nodes are inserted in u32_change() under rtnl_lock() with wmb(),
so without tcf_tree_lock() like in other classifiers (e.g. cls_fw).
This isn't enough without rmb() on the read side, but on the other
hand adding such barriers doesn't give any savings, so the lock is
added instead.

Reported-by: m0sia <m0sia@plotinka.ru>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
47a1a1d4be2910b13a8e90f75c17e253c39531ff 19-Nov-2008 Patrick McHardy <kaber@trash.net> pkt_sched: remove unnecessary xchg() in packet classifiers

The use of xchg() hasn't been necessary since 2.2.something when proper
locking was added to packet schedulers. In the case of classifiers they
mostly weren't even necessary before that since they're mainly used
to assign a NULL pointer to the filter root in the ->destroy path;
the root is destroyed immediately after that.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
547b792cac0a038b9dbf958d3c120df3740b5572 26-Jul-2008 Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> net: convert BUG_TRAP to generic WARN_ON

Removes legacy reinvent-the-wheel type thing. The generic
machinery integrates much better to automated debugging aids
such as kerneloops.org (and others), and is unambiguous due to
better naming. Non-intuively BUG_TRAP() is actually equal to
WARN_ON() rather than BUG_ON() though some might actually be
promoted to BUG_ON() but I left that to future.

I could make at least one BUILD_BUG_ON conversion.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
72b25a913ed9b1ab49c7022adaf3f271a65ea219 19-Jul-2008 David S. Miller <davem@davemloft.net> pkt_sched: Get rid of u32_list.

The u32_list is just an indirect way of maintaining a reference
to a U32 node on a per-qdisc basis.

Just add an explicit node pointer for u32 to struct Qdisc an do
away with this global list.

Signed-off-by: David S. Miller <davem@davemloft.net>
e56cfad132f2ae269082359d279c17230c987e74 13-Apr-2008 Jarek Poplawski <jarkao2@gmail.com> [NET_SCHED] cls_u32: refcounting fix for u32_delete()

Deleting of nonroot hnodes mostly doesn't work in u32_delete():
refcnt == 1 is expected, but such hnodes' refcnts are initialized
with 0 and charged only with "link" nodes. Now they'll start with
1 like usual. Thanks to Patrick McHardy for an improving suggestion.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
0382b9c35469be273ed10fa374496a924055a3c8 18-Mar-2008 Al Viro <viro@zeniv.linux.org.uk> [PKT_SCHED]: annotate cls_u32

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
5239008b0de2507a531440b8c3019fb9c116fb1a 01-Feb-2008 Patrick McHardy <kaber@trash.net> [NET_SCHED]: Constify struct tcf_ext_map

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6fa8c0144b770dac941cf2c15053b6e24f046c8a 24-Jan-2008 Patrick McHardy <kaber@trash.net> [NET_SCHED]: Use nla_policy for attribute validation in classifiers

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
1587bac49f8491b5006a78f8d726111b71757941 24-Jan-2008 Patrick McHardy <kaber@trash.net> [NET_SCHED]: Use typeful attribute parsing helpers

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
24beeab539c6f42c4a93e2ff7c3b5f272e60da45 24-Jan-2008 Patrick McHardy <kaber@trash.net> [NET_SCHED]: Use typeful attribute construction helpers

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
57e1c487a4f5754cb77abeb00adb21faa88c484f 24-Jan-2008 Patrick McHardy <kaber@trash.net> [NET_SCHED]: Use NLA_PUT_STRING for string dumping

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
4b3550ef530cfc153fa91f0b37cbda448bad11c6 24-Jan-2008 Patrick McHardy <kaber@trash.net> [NET_SCHED]: Use nla_nest_start/nla_nest_end

Use nla_nest_start/nla_nest_end for dumping nested attributes.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
cee63723b358e594225e812d6e14a2a0abfd5c88 24-Jan-2008 Patrick McHardy <kaber@trash.net> [NET_SCHED]: Propagate nla_parse return value

nla_parse() returns more detailed errno codes, propagate them back on
error.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
add93b610a4e66d36d0cf0b2596c3d3bcfdaee39 23-Jan-2008 Patrick McHardy <kaber@trash.net> [NET_SCHED]: Convert classifiers from rtnetlink to new netlink API

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2eb9d75c723252c1fa8f0206e6a0df220e3c64c0 23-Jan-2008 Patrick McHardy <kaber@trash.net> [NET_SCHED]: mark classifier ops __read_mostly

Additionally remove unnecessary NULL initilizations of the next pointer.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
b226801676d9533d09da511eb379fe970fa1a770 11-Nov-2007 Radu Rendec <radu.rendec@ines.ro> [PKT_SCHED] CLS_U32: Use ffs() instead of C code on hash mask to get first set bit.

Computing the rank of the first set bit in the hash mask (for using later
in u32_hash_fold()) was done with plain C code. Using ffs() instead makes
the code more readable and improves performance (since ffs() is better
optimized in assembler).

Using the conditional operator on hash mask before applying ntohl() also
saves one ntohl() call if mask is 0.

Signed-off-by: Radu Rendec <radu.rendec@ines.ro>
Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
543821c6f5dea5221426eaf1eac98b100249c7ac 07-Nov-2007 Radu Rendec <radu.rendec@ines.ro> [PKT_SCHED] CLS_U32: Fix endianness problem with u32 classifier hash masks.

While trying to implement u32 hashes in my shaping machine I ran into
a possible bug in the u32 hash/bucket computing algorithm
(net/sched/cls_u32.c).

The problem occurs only with hash masks that extend over the octet
boundary, on little endian machines (where htonl() actually does
something).

Let's say that I would like to use 0x3fc0 as the hash mask. This means
8 contiguous "1" bits starting at b6. With such a mask, the expected
(and logical) behavior is to hash any address in, for instance,
192.168.0.0/26 in bucket 0, then any address in 192.168.0.64/26 in
bucket 1, then 192.168.0.128/26 in bucket 2 and so on.

This is exactly what would happen on a big endian machine, but on
little endian machines, what would actually happen with current
implementation is 0x3fc0 being reversed (into 0xc03f0000) by htonl()
in the userspace tool and then applied to 192.168.x.x in the u32
classifier. When shifting right by 16 bits (rank of first "1" bit in
the reversed mask) and applying the divisor mask (0xff for divisor
256), what would actually remain is 0x3f applied on the "168" octet of
the address.

One could say is this can be easily worked around by taking endianness
into account in userspace and supplying an appropriate mask (0xfc03)
that would be turned into contiguous "1" bits when reversed
(0x03fc0000). But the actual problem is the network address (inside
the packet) not being converted to host order, but used as a
host-order value when computing the bucket.

Let's say the network address is written as n31 n30 ... n0, with n0
being the least significant bit. When used directly (without any
conversion) on a little endian machine, it becomes n7 ... n0 n8 ..n15
etc in the machine's registers. Thus bits n7 and n8 would no longer be
adjacent and 192.168.64.0/26 and 192.168.128.0/26 would no longer be
consecutive.

The fix is to apply ntohl() on the hmask before computing fshift,
and in u32_hash_fold() convert the packet data to host order before
shifting down by fshift.

With helpful feedback from Jamal Hadi Salim and Jarek Poplawski.

Signed-off-by: David S. Miller <davem@davemloft.net>
cfcabdcc2d5a810208e5bb3974121b7ed60119aa 09-Oct-2007 Stephen Hemminger <shemminger@linux-foundation.org> [NET]: sparse warning fixes

Fix a bunch of sparse warnings. Mostly about 0 used as
NULL pointer, and shadowed variable declarations.
One notable case was that hash size should have been unsigned.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
bf1b803b01b00c3801e0aa373ba0305f8278e260 08-Oct-2007 Stephen Hemminger <shemminger@linux-foundation.org> [PKT_SCHED] cls_u32: error code isn't been propogated properly

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
c3bc7cff8fddb6ff9715be8bfc3d911378c4d69d 15-Jul-2007 Patrick McHardy <kaber@trash.net> [NET_SCHED]: Kill CONFIG_NET_CLS_POLICE

The NET_CLS_ACT option is now a full replacement for NET_CLS_POLICE,
remove the old code. The config option will be kept around to select
the equivalent NET_CLS_ACT options for a short time to allow easier
upgrades.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
0ba48053831d5b89ee2afaefaae1c06eae80cb05 03-Jul-2007 Patrick McHardy <kaber@trash.net> [NET_SCHED]: Remove unnecessary includes

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
3ff50b7997fe06cd5d276b229967bb52d6b3b6c1 21-Apr-2007 Stephen Hemminger <shemminger@linux-foundation.org> [NET]: cleanup extra semicolons

Spring cleaning time...

There seems to be a lot of places in the network code that have
extra bogus semicolons after conditionals. Most commonly is a
bogus semicolon after: switch() { }

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
dc5fc579b90ed0a9a4e55b0218cdbaf0a8cf2e67 26-Mar-2007 Arnaldo Carvalho de Melo <acme@redhat.com> [NETLINK]: Use nlmsg_trim() where appropriate

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
27a884dc3cb63b93c2b3b643f5b31eed5f8a4d26 20-Apr-2007 Arnaldo Carvalho de Melo <acme@redhat.com> [SK_BUFF]: Convert skb->tail to sk_buff_data_t

So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
on 64bit architectures, allowing us to combine the 4 bytes hole left by the
layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
:-)

Many calculations that previously required that skb->{transport,network,
mac}_header be first converted to a pointer now can be done directly, being
meaningful as offsets or pointers.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d56f90a7c96da5187f0cdf07ee7434fe6aa78bbc 11-Apr-2007 Arnaldo Carvalho de Melo <acme@redhat.com> [SK_BUFF]: Introduce skb_network_header()

For the places where we need a pointer to the network header, it is still legal
to touch skb->nh.raw directly if just adding to, subtracting from or setting it
to another layer header.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cd354f1ae75e6466a7e31b727faede57a1f89ca5 14-Feb-2007 Tim Schmielau <tim@physik3.uni-rostock.de> [PATCH] remove many unneeded #includes of sched.h

After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there. Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.

To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.

Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm. I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10297b99315e5e08fe623ba56da35db1fee69ba9 09-Feb-2007 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [NET] SCHED: Fix whitespace errors.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
82e91ffef60e6eba9848fe149ce1eecd2b5aef12 10-Nov-2006 Thomas Graf <tgraf@suug.ch> [NET]: Turn nfmark into generic mark

nfmark is being used in various subsystems and has become
the defacto mark field for all kinds of packets. Therefore
it makes sense to rename it to `mark' and remove the
dependency on CONFIG_NETFILTER.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
c0956bd25161bff45304d482cda51ca4b3b572f1 15-Aug-2006 Ralf Hildebrandt <Ralf.Hildebrandt@charite.de> [PKT_SCHED] cls_u32: Fix typo.

Signed-off-by: Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
0da974f4f303a6842516b764507e3c0a03f41e5a 21-Jul-2006 Panagiotis Issaris <takis@issaris.org> [NET]: Conversions from kmalloc+memset to k(z|c)alloc.

Signed-off-by: Panagiotis Issaris <takis@issaris.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6ab3d5624e172c553004ecc862bfeac16d9d68b7 30-Jun-2006 Jörn Engel <joern@wohnheim.fh-wedel.de> Remove obsolete #include <linux/config.h>

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
1ae39a430b692552e5aceb63fa35ce95fcbadc6a 23-Mar-2006 Patrick McHardy <kaber@trash.net> [NET_SCHED]: cls_u32: remove unnecessary NULL-ptr check

In both cases n can't be NULL without crashing anyway.

Coverity #78

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
a51482bde22f99c63fbbb57d5d46cc666384e379 08-Nov-2005 Jesper Juhl <jesper.juhl@gmail.com> [NET]: kfree cleanup

From: Jesper Juhl <jesper.juhl@gmail.com>

This is the net/ part of the big kfree cleanup patch.

Remove pointless checks for NULL prior to calling kfree() in net/.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Arnaldo Carvalho de Melo <acme@conectiva.com.br>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 17-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org> Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!