57a02c39c1c20ed03a86f8014c11a8c18b94cac3 |
|
01-Oct-2014 |
Fabian Frederick <fabf@skynet.be> |
inet: frags: add __init to ip4_frags_ctl_register ip4_frags_ctl_register is only called by __init ipfrag_init Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
|
d4ad4d22e7ac6b8711b35d7e86eb29f03f8ac153 |
|
01-Aug-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
inet: frags: use kmem_cache for inet_frag_queue Use kmem_cache to allocate/free inet_frag_queue objects since they're all the same size per inet_frags user and are alloced/freed in high volumes thus making it a perfect case for kmem_cache. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
2e404f632f44979ddf0ce0808a438249a72d7015 |
|
01-Aug-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
inet: frags: use INET_FRAG_EVICTED to prevent icmp messages Now that we have INET_FRAG_EVICTED we might as well use it to stop sending icmp messages in the "frag_expire" functions instead of stripping INET_FRAG_FIRST_IN from their flags when evicting. Also fix the comment style in ip6_expire_frag_queue(). Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
06aa8b8a0345c78f4d9a1fb3f852952b12a0e40c |
|
01-Aug-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
inet: frags: rename last_in to flags The last_in field has been used to store various flags different from first/last frag in so give it a more descriptive name: flags. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
1bab4c75075b84675b96992ac47580a57c26958d |
|
24-Jul-2014 |
Nikolay Aleksandrov <nikolay@redhat.com> |
inet: frag: set limits and make init_net's high_thresh limit global This patch makes init_net's high_thresh limit to be the maximum for all namespaces, thus introducing a global memory limit threshold equal to the sum of the individual high_thresh limits which are capped. It also introduces some sane minimums for low_thresh as it shouldn't be able to drop below 0 (or > high_thresh in the unsigned case), and overall low_thresh should not ever be above high_thresh, so we make the following relations for a namespace:

init_net:
  high_thresh - max(not capped), min(init_net low_thresh)
  low_thresh - max(init_net high_thresh), min (0)

all other namespaces:
  high_thresh = max(init_net high_thresh), min(namespace's low_thresh)
  low_thresh = max(namespace's high_thresh), min(0)

The major issue with having low_thresh > high_thresh is that we'll schedule eviction but never evict anything and thus rely only on the timers. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
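The threshold relations in this commit message can be illustrated with a small userspace sketch. This is not the kernel's code: `struct frag_limits`, `high_thresh_ok()` and `low_thresh_ok()` are hypothetical names chosen for the example, and the sketch only checks whether a proposed value falls inside the range the message describes.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-namespace limits, in bytes. */
struct frag_limits {
    int low_thresh;
    int high_thresh;
};

/* high_thresh must stay at or above the namespace's own low_thresh.
 * For init_net there is no upper cap; every other namespace is capped
 * by init_net's high_thresh. */
bool high_thresh_ok(const struct frag_limits *ns,
                    const struct frag_limits *init_ns, int new_high)
{
    assert(ns != NULL && init_ns != NULL);
    int max = (ns == init_ns) ? INT_MAX : init_ns->high_thresh;
    return new_high >= ns->low_thresh && new_high <= max;
}

/* low_thresh must stay between 0 and the namespace's own high_thresh. */
bool low_thresh_ok(const struct frag_limits *ns, int new_low)
{
    assert(ns != NULL);
    return new_low >= 0 && new_low <= ns->high_thresh;
}
```

With these checks a namespace can never end up with low_thresh above high_thresh, which is the situation the message warns about (eviction scheduled but nothing ever evicted).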
|
ab1c724f633080ed2e8a0cfe61654599b55cf8f9 |
|
24-Jul-2014 |
Florian Westphal <fw@strlen.de> |
inet: frag: use seqlock for hash rebuild rehash is rare operation, don't force readers to take the read-side rwlock. Instead, we only have to detect the (rare) case where the secret was altered while we are trying to insert a new inetfrag queue into the table. If it was changed, drop the bucket lock and recompute the hash to get the 'new' chain bucket that we have to insert into. Joint work with Nikolay Aleksandrov. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
e3a57d18b06179d68fcf7a0a06ad844493c65e06 |
|
24-Jul-2014 |
Florian Westphal <fw@strlen.de> |
inet: frag: remove periodic secret rebuild timer merge functionality into the eviction workqueue. Instead of rebuilding every n seconds, take advantage of the upper hash chain length limit. If we hit it, mark table for rebuild and schedule workqueue. To prevent frequent rebuilds when we're completely overloaded, don't rebuild more than once every 5 seconds. ipfrag_secret_interval sysctl is now obsolete and has been marked as deprecated, it still can be changed so scripts won't be broken but it won't have any effect. A comment is left above each unused secret_timer variable to avoid confusion. Joint work with Nikolay Aleksandrov. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
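The rebuild policy described here (mark the table when a chain hits its length limit, then rebuild at most once every 5 seconds) can be sketched as below. This is a userspace illustration only; `struct frag_table`, `note_chain_limit_hit()` and `try_rebuild()` are hypothetical names, and time is passed in as plain seconds rather than kernel jiffies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MIN_REBUILD_INTERVAL_SEC 5 /* "not more than once every 5 seconds" */

/* Hypothetical table state for this sketch. */
struct frag_table {
    bool rebuild_pending;   /* set when a hash chain hit its length limit */
    long last_rebuild_sec;  /* time of the last rebuild */
};

/* Called from the insert path when a chain is found to be too long. */
void note_chain_limit_hit(struct frag_table *t)
{
    assert(t != NULL);
    t->rebuild_pending = true;
}

/* Called from the worker. Returns true if a rebuild should run now. */
bool try_rebuild(struct frag_table *t, long now_sec)
{
    assert(t != NULL);
    if (!t->rebuild_pending)
        return false;
    if (now_sec - t->last_rebuild_sec < MIN_REBUILD_INTERVAL_SEC)
        return false; /* too soon, keep the request pending */
    t->rebuild_pending = false;
    t->last_rebuild_sec = now_sec;
    return true;
}
```

A request that arrives too early stays pending, so the rebuild still happens once the interval has passed.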
|
3fd588eb90bfbba17091381006ecafe29c45db4a |
|
24-Jul-2014 |
Florian Westphal <fw@strlen.de> |
inet: frag: remove lru list no longer used. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
434d305405ab86414f6ea3f261307d443a2c3506 |
|
24-Jul-2014 |
Florian Westphal <fw@strlen.de> |
inet: frag: don't account number of fragment queues The 'nqueues' counter is protected by the lru list lock, once that's removed this needs to be converted to atomic counter. Given this isn't used for anything except for reporting it to userspace via /proc, just remove it. We still report the memory currently used by fragment reassembly queues. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
b13d3cbfb8e8a8f53930af67d1ebf05149f32c24 |
|
24-Jul-2014 |
Florian Westphal <fw@strlen.de> |
inet: frag: move eviction of queues to work queue When the high_thresh limit is reached we try to toss the 'oldest' incomplete fragment queues until memory limits are below the low_thresh value. This happens in softirq/packet processing context. This has two drawbacks: 1) processors might evict a queue that was about to be completed by another cpu, because they will compete wrt. resource usage and resource reclaim. 2) LRU list maintenance is expensive. But when constantly overloaded, even the 'least recently used' element is recent, so removing 'lru' queue first is not 'fairer' than removing any other fragment queue. This moves eviction out of the fast path: When the low threshold is reached, a work queue is scheduled which then iterates over the table and removes the queues that exceed the memory limits of the namespace. It sets a new flag called INET_FRAG_EVICTED on the evicted queues so the proper counters will get incremented when the queue is forcefully expired. When the high threshold is reached, no more fragment queues are created until we're below the limit again. The LRU list is now unused and will be removed in a followup patch. Joint work with Nikolay Aleksandrov. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
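The two-threshold behaviour described in this message can be sketched as a single decision function. This is a simplified userspace illustration: `struct admit` and `frag_admit()` are hypothetical names, and the use of strict `>` comparisons is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Result of the admission check for a new fragment queue (hypothetical). */
struct admit {
    bool create_queue;      /* may a new fragment queue be allocated? */
    bool schedule_evictor;  /* should the eviction work be scheduled? */
};

/* Above low_thresh the evictor is scheduled; above high_thresh no new
 * queue is created until memory use drops again. */
struct admit frag_admit(long mem_used, long low_thresh, long high_thresh)
{
    assert(low_thresh <= high_thresh);
    struct admit a = { true, false };
    if (mem_used > low_thresh)
        a.schedule_evictor = true;
    if (mem_used > high_thresh)
        a.create_queue = false;
    return a;
}
```

The point of the design is that the packet-processing path only makes this cheap decision, while the actual eviction runs later in the work queue.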
|
86e93e470cadedda9181a2bd9aee1d9d2e5e9c0f |
|
24-Jul-2014 |
Florian Westphal <fw@strlen.de> |
inet: frag: move evictor calls into frag_find function First step to move eviction handling into a work queue. We lose two spots that accounted evicted fragments in MIB counters. Accounting will be restored since the upcoming work-queue evictor invokes the frag queue timer callbacks instead. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
fb3cfe6e75b9d05c87265e85e67d7caf6e5b44a7 |
|
24-Jul-2014 |
Florian Westphal <fw@strlen.de> |
inet: frag: remove hash size assumptions from callers hide actual hash size from individual users: The _find function will now fold the given hash value into the required range. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
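Folding a hash value into the table's range, as the _find function is said to do here, usually amounts to a mask when the table size is a power of two. A minimal sketch, assuming a hypothetical table size `FRAG_HASHSZ` of 1024 buckets and a hypothetical helper name `frag_bucket()`:

```c
#include <assert.h>
#include <stdint.h>

#define FRAG_HASHSZ 1024u /* assumed bucket count; must be a power of two */

/* Fold an arbitrary 32-bit hash into a valid bucket index. */
uint32_t frag_bucket(uint32_t hash)
{
    assert((FRAG_HASHSZ & (FRAG_HASHSZ - 1)) == 0);
    return hash & (FRAG_HASHSZ - 1);
}
```

Callers can then return any 32-bit hash without knowing how many buckets the table has.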
|
36c7778218b93d96d88d68f116a711f6a598b72f |
|
24-Jul-2014 |
Florian Westphal <fw@strlen.de> |
inet: frag: constify match, hashfn and constructor arguments Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
7c3d5ab1f35f5475b1a1fbe74143683cfc092d33 |
|
03-May-2014 |
Vasily Averin <vvs@parallels.com> |
ipv4: fix "conntrack zones" support for defrag user check in ip_expire Defrag user check in ip_expire was not updated after adding support for "conntrack zones". This bug manifests as an RFC violation, since the router will send the icmp time exceeded message when using conntrack zones. Signed-off-by: Vasily Averin <vvs@openvz.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
7539fadcb8146a5f0db51e80d99c9e724efec7b0 |
|
16-Dec-2013 |
Tom Herbert <therbert@google.com> |
net: Add utility functions to clear rxhash In several places 'skb->rxhash = 0' is being done to clear the rxhash value in an skb. This does not clear l4_rxhash which could still be set so that the rxhash wouldn't be recalculated on subsequent call to skb_get_rxhash. This patch adds an explicit function to clear all the rxhash related information in the skb properly. skb_clear_hash_if_not_l4 clears the rxhash only if it is not marked as l4_rxhash. Fixed up places where 'skb->rxhash = 0' was being called. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
e7b519ba55aeb675daee1d304e80d752c385f7f0 |
|
23-Oct-2013 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv4: initialize ip4_frags hash secret as late as possible Defer the generation of the first hash secret for the ipv4 fragmentation cache as late as possible. ip4_frags.rnd gets initially seeded by inet_frags_init and regularly reseeded by inet_frag_secret_rebuild. Either we call ipqhashfn directly from ip_fragment.c in which case we initialize the secret directly. If we first get called by inet_frag_secret_rebuild we install a new secret by a manual call to get_random_bytes. This secret will be overwritten as soon as the first call to ipqhashfn happens. This is safe because we won't race while publishing the new secrets with anyone else. Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
97599dc792b45b1669c3cdb9a4b365aad0232f65 |
|
16-Apr-2013 |
Eric Dumazet <edumazet@google.com> |
net: drop dst before queueing fragments Commit 4a94445c9a5c (net: Use ip_route_input_noref() in input path) added a bug in IP defragmentation handling, as non refcounted dst could escape an RCU protected section. Commit 64f3b9e203bd068 (net: ip_expire() must revalidate route) fixed the case of timeouts, but not the general problem. Tom Parkin noticed crashes in UDP stack and provided a patch, but further analysis permitted us to pinpoint the root cause. Before queueing a packet into a frag list, we must drop its dst, as this dst has limited lifetime (RCU protected) When/if a packet is finally reassembled, we use the dst of the very last skb, still protected by RCU and valid, as the dst of the reassembled packet. Use same logic in IPv6, as there is no need to hold dst references. Reported-by: Tom Parkin <tparkin@katalix.com> Tested-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
be991971d53e0f5b6d13e3940192054216590072 |
|
22-Mar-2013 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
inet: generalize ipv4-only RFC3168 5.3 ecn fragmentation handling for future use by ipv6 This patch just moves some code around to make the ip4_frag_ecn_table and IPFRAG_ECN_* constants accessible from the other reassembly engines. I also renamed ip4_frag_ecn_table to ip_frag_ecn_table. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jesper Dangaard Brouer <jbrouer@redhat.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
5a3da1fe9561828d0ca7eca664b16ec2b9bf0055 |
|
15-Mar-2013 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
inet: limit length of fragment queue hash table bucket lists This patch introduces a constant limit of the fragment queue hash table bucket list lengths. Currently the limit 128 is chosen somewhat arbitrarily and just ensures that we can fill up the fragment cache with empty packets up to the default ip_frag_high_thresh limits. It should just protect from list iteration eating considerable amounts of cpu. If we reach the maximum length in one hash bucket a warning is printed. This is implemented on the caller side of inet_frag_find to distinguish between the different users of inet_fragment.c. I dropped the out of memory warning in the ipv4 fragment lookup path, because we already get a warning by the slab allocator. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jesper Dangaard Brouer <jbrouer@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
14bbd6a565e1bcdc240d44687edb93f721cfdf99 |
|
14-Feb-2013 |
Pravin B Shelar <pshelar@nicira.com> |
net: Add skb_unclone() helper function. This function will be used in next GRE_GSO patch. This patch does not change any functionality. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Eric Dumazet <edumazet@google.com>
|
3ef0eb0db4bf92c6d2510fe5c4dc51852746f206 |
|
29-Jan-2013 |
Jesper Dangaard Brouer <brouer@redhat.com> |
net: frag, move LRU list maintenance outside of rwlock Updating the fragmentation queues LRU (Least-Recently-Used) list, required taking the hash writer lock. However, the LRU list isn't tied to the hash at all, so we can use a separate lock for it. Original-idea-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
d433673e5f9180e05a770c4b2ab18c08ad51cc21 |
|
29-Jan-2013 |
Jesper Dangaard Brouer <brouer@redhat.com> |
net: frag helper functions for mem limit tracking This change is primarily a preparation to ease the extension of memory limit tracking. The change does reduce the number of atomic operations during freeing of a frag queue. This does introduce some performance improvement, as these atomic operations are at the core of the performance problems seen on NUMA systems. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c2a936600f78aea00d3312ea4b66a79a4619f9b4 |
|
15-Jan-2013 |
Jesper Dangaard Brouer <brouer@redhat.com> |
net: increase fragment memory usage limits Increase the amount of memory usage limits for incomplete IP fragments.

Arguing for new thresh high/low values:

  High threshold = 4 MBytes
  Low threshold = 3 MBytes

The fragmentation memory accounting code, tries to account for the real memory usage, by measuring both the size of frag queue struct (inet_frag_queue (ipv4:ipq/ipv6:frag_queue)) and the SKB's truesize.

We want to be able to handle/hold-on-to enough fragments, to ensure good performance, without causing incomplete fragments to hurt scalability, by causing the number of inet_frag_queue to grow too much (resulting longer searches for frag queues).

For IPv4, how much memory does the largest frag consume.

Maximum size fragment is 64K, which is approx 44 fragments with MTU(1500) sized packets. Sizeof(struct ipq) is 200. A 1500 byte packet results in a truesize of 2944 (not 2048 as I first assumed)

  (44*2944)+200 = 129736 bytes

The current default high thresh of 262144 bytes, is obviously problematic, as only two 64K fragments can fit in the queue at the same time.

How many 64K fragment can we fit into 4 MBytes:

  4*2^20/((44*2944)+200) = 32.34 fragment in queues

An attacker could send a separate/distinct fake fragment packets per queue, causing us to allocate one inet_frag_queue per packet, and thus attacking the hash table and its lists.

How many frag queue do we need to store, and given a current hash size of 64, what is the average list length.

Using one MTU sized fragment per inet_frag_queue, each consuming (2944+200) 3144 bytes.

  4*2^20/(2944+200) = 1334 frag queues -> 21 avg list length

An attack could send small fragments, the smallest packet I could send resulted in a truesize of 896 bytes (I'm a little surprised by this).

  4*2^20/(896+200) = 3827 frag queues -> 59 avg list length

When increasing these number, we also need to followup with improvements, that is going to help scalability. Simply increasing the hash size, is not enough as the current implementation does not have a per hash bucket locking. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
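The sizing arithmetic in this message can be reproduced with a few lines of C. The constants are the figures quoted in the message (200 bytes for struct ipq, truesize 2944 for an MTU-sized packet, 896 for the smallest packet, 44 fragments per 64K datagram); the function names `queue_cost()` and `queues_that_fit()` are hypothetical. Integer division is used, so results are rounded down and may differ by one from the rounded figures in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Figures quoted in the commit message. */
enum {
    IPQ_SIZE       = 200,  /* sizeof(struct ipq) */
    TRUESIZE_MTU   = 2944, /* truesize of a 1500-byte packet */
    TRUESIZE_SMALL = 896,  /* truesize of the smallest packet tried */
    FRAGS_PER_64K  = 44    /* MTU-sized fragments in a 64K datagram */
};

/* Memory accounted for one frag queue holding nfrags fragments. */
size_t queue_cost(size_t nfrags, size_t truesize)
{
    return nfrags * truesize + IPQ_SIZE;
}

/* How many such queues fit under a memory limit (rounded down). */
size_t queues_that_fit(size_t limit, size_t cost)
{
    assert(cost > 0);
    return limit / cost;
}
```

For example, `queues_that_fit(4u << 20, queue_cost(FRAGS_PER_64K, TRUESIZE_MTU))` gives the number of full 64K reassemblies that fit under the 4 MByte high threshold, and dividing the single-fragment result by the hash size of 64 gives the average list length discussed in the message.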
|
1bf3751ec90cc3174e01f0d701e8449ce163d113 |
|
10-Dec-2012 |
Johannes Berg <johannes.berg@intel.com> |
ipv4: ip_check_defrag must not modify skb before unsharing ip_check_defrag() might be called from af_packet within the RX path where shared SKBs are used, so it must not modify the input SKB before it has unshared it for defragmentation. Use skb_copy_bits() to get the IP header and only pull in everything later. The same is true for the other caller in macvlan as it is called from dev->rx_handler which can also get a shared SKB. Reported-by: Eric Leblond <eric@regit.org> Cc: stable@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
464dc801c76aa0db88e16e8f5f47c6879858b9b2 |
|
16-Nov-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
net: Don't export sysctls to unprivileged users In preparation for supporting the creation of network namespaces by unprivileged users, modify all of the per net sysctl exports and refuse to allow them to unprivileged users. This makes it safe for unprivileged users in general to access per net sysctls, and allows sysctls to be exported to unprivileged users on an individual basis as they are deemed safe. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
6b102865e7ba9ff1e3c49c32c7187bb427d91798 |
|
18-Sep-2012 |
Amerigo Wang <amwang@redhat.com> |
ipv6: unify fragment thresh handling code Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Michal Kubeček <mkubecek@suse.cz> Cc: David Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
5f2d04f1f9b52604fca6ee08a77972c0df67e082 |
|
26-Aug-2012 |
Patrick McHardy <kaber@trash.net> |
ipv4: fix path MTU discovery with connection tracking IPv4 conntrack defragments incoming packet at the PRE_ROUTING hook and (in case of forwarded packets) refragments them at POST_ROUTING independent of the IP_DF flag. Refragmentation uses the dst_mtu() of the local route without caring about the original fragment sizes, thereby breaking PMTUD. This patch fixes this by keeping track of the largest received fragment with IP_DF set and generates an ICMP fragmentation required error during refragmentation if that size exceeds the MTU. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: David S. Miller <davem@davemloft.net>
|
c6cffba4ffa26a8ffacd0bb9f3144e34f20da7de |
|
26-Jul-2012 |
David S. Miller <davem@davemloft.net> |
ipv4: Fix input route performance regression. With the routing cache removal we lost the "noref" code paths on input, and this can kill some routing workloads. Reinstate the noref path when we hit a cached route in the FIB nexthops. With help from Eric Dumazet. Reported-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
38a424e4657462fe9f8b76f01a0e879abde99ab4 |
|
01-Jul-2012 |
David Miller <davem@davemloft.net> |
ipv4: Kill ip_route_input_noref(). The "noref" argument to ip_route_input_common() is now always ignored because we do not cache routes, and in that case we must always grab a reference to the resulting 'dst'. Signed-off-by: David S. Miller <davem@davemloft.net>
|
c10237e077cef50e925f052e49f3b4fead9d71f9 |
|
28-Jun-2012 |
David S. Miller <davem@davemloft.net> |
Revert "ipv4: tcp: dont cache unconfirmed intput dst" This reverts commit c074da2810c118b3812f32d6754bd9ead2f169e7. This change has several unwanted side effects: 1) Sockets will cache the DST_NOCACHE route in sk->sk_rx_dst and we'll thus never create a real cached route. 2) All TCP traffic will use DST_NOCACHE and never use the routing cache at all. Signed-off-by: David S. Miller <davem@davemloft.net>
|
c074da2810c118b3812f32d6754bd9ead2f169e7 |
|
27-Jun-2012 |
Eric Dumazet <edumazet@google.com> |
ipv4: tcp: dont cache unconfirmed intput dst DDOS synflood attacks hit badly IP route cache. On typical machines, this cache is allowed to hold up to 8 Millions dst entries, 256 bytes for each, for a total of 2GB of memory. rt_garbage_collect() triggers and tries to cleanup things. Eventually route cache is disabled but machine is under fire and might OOM and crash. This patch exploits the new TCP early demux, to set a nocache boolean in case incoming TCP frame is for a not yet ESTABLISHED or TIMEWAIT socket. This 'nocache' boolean is then used in case dst entry is not found in route cache, to create an unhashed dst entry (DST_NOCACHE) SYN-cookie-ACK sent use a similar mechanism (ipv4: tcp: dont cache output dst for syncookies), so after this patch, a machine is able to absorb a DDOS synflood attack without polluting its IP route cache. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c0efc887dcadbdbfe171f028acfab9c7c00e9dde |
|
10-Jun-2012 |
David S. Miller <davem@davemloft.net> |
inet: Pass inetpeer root into inet_getpeer*() interfaces. Otherwise we reference potentially non-existing members when ipv6 is disabled. Signed-off-by: David S. Miller <davem@davemloft.net>
|
54db0cc2ba0d38166acc2d6bae21721405305537 |
|
08-Jun-2012 |
Gao feng <gaofeng@cn.fujitsu.com> |
inetpeer: add parameter net for inet_getpeer_v4,v6 add struct net as a parameter of inet_getpeer_v[4,6], use net to replace &init_net. and modify some places to provide net for inet_getpeer_v[4,6] Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
3cc4949269e01f39443d0fcfffb5bc6b47878d45 |
|
19-May-2012 |
Eric Dumazet <edumazet@google.com> |
ipv4: use skb coalescing in defragmentation ip_frag_reasm() can use skb_try_coalesce() to build optimized skb, reducing memory used by them (truesize), and reducing number of cache line misses and overhead for the consumer. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
cbc264cacd08e51fd4a64b5d5b1ba48f523990d1 |
|
18-May-2012 |
Eric Dumazet <edumazet@google.com> |
ip_frag: struct inet_frags match() method returns a bool - match() method returns a boolean - return (A && B && C && D) -> return A && B && C && D - fix indentation Signed-off-by: Eric Dumazet <edumazet@google.com>
|
e87cc4728f0e2fb663e592a1141742b1d6c63256 |
|
13-May-2012 |
Joe Perches <joe@perches.com> |
net: Convert net_ratelimit uses to net_<level>_ratelimited Standardize the net core ratelimited logging functions. Coalesce formats, align arguments. Change a printk then vprintk sequence to use printf extension %pV. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
ec8f23ce0f4005b74013d4d122e0d540397a93c9 |
|
19-Apr-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
net: Convert all sysctl registrations to register_net_sysctl This results in code with less boiler plate that is a bit easier to read. Additionally stops us from using compatibility code in the sysctl core, hastening the day when the compatibility code can be removed. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
4344475797a16ef948385780943f7a5cf09f0675 |
|
19-Apr-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
net: Kill register_sysctl_rotable register_sysctl_rotable never caught on as an interesting way to register sysctls. My take on the situation is that what we want are sysctls that we can only see in the initial network namespace. What we have implemented with register_sysctl_rotable are sysctls that we can see in all of the network namespaces and can only change in the initial network namespace. That is a very silly way to go. Just register the network sysctls in the initial network namespace and we don't have any weird special cases to deal with. The sysctls affected are: /proc/sys/net/ipv4/ipfrag_secret_interval /proc/sys/net/ipv4/ipfrag_max_dist /proc/sys/net/ipv6/ip6frag_secret_interval /proc/sys/net/ipv6/mld_max_msf I really don't expect anyone will miss them if they can't read them in a child user namespace. CC: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
cbf8f7bb200f5dbdc9ce11243431440720db03dc |
|
19-Apr-2012 |
Eric Dumazet <edumazet@google.com> |
ipv4: dont drop packet in defrag but consume it When defragmentation is finalized, we clone a packet and kfree_skb() it. Call consume_skb() to not confuse dropwatch, since it's not a drop. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
afd465030acb4098abcb6b965a5aebc7ea2209e0 |
|
12-Mar-2012 |
Joe Perches <joe@perches.com> |
net: ipv4: Standardize prefixes for message logging Add #define pr_fmt(fmt) as appropriate. Add "IPv4: ", "TCP: ", and "IPsec: " to appropriate files. Standardize on "UDPLite: " for appropriate uses. Some prefixes were previously "UDPLITE: " and "UDP-Lite: ". Add KBUILD_MODNAME ": " to icmp and gre. Remove embedded prefixes as appropriate. Add missing "\n" to pr_info in gre.c. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
058bd4d2a4ff0aaa4a5381c67e776729d840c785 |
|
11-Mar-2012 |
Joe Perches <joe@perches.com> |
net: Convert printks to pr_<level> Use a more current kernel messaging style. Convert a printk block to print_hex_dump. Coalesce formats, align arguments. Use %s, __func__ instead of embedding function names. Some messages that were prefixed with <foo>_close are now prefixed with <foo>_fini. Some ah4 and esp messages are now not prefixed with "ip ". The intent of this patch is to later add something like #define pr_fmt(fmt) "IPv4: " fmt. to standardize the output messages. Text size is trivially reduced. (x86-32 allyesconfig) $ size net/ipv4/built-in.o* text data bss dec hex filename 887888 31558 249696 1169142 11d6f6 net/ipv4/built-in.o.new 887934 31558 249800 1169292 11d78c net/ipv4/built-in.o.old Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
42b2aa86c6670347a2a07e6d7af0e0ecc8fdbff9 |
|
29-Nov-2011 |
Justin P. Mattock <justinmattock@gmail.com> |
treewide: Fix typos in various parts of the kernel, and fix some comments. The below patch fixes some typos in various parts of the kernel, as well as fixes some comments. Please let me know if I missed anything, and I will try to get it changed and resent. Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
9e903e085262ffbf1fc44a17ac06058aca03524a |
|
18-Oct-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: add skb frag size accessors To ease skb->truesize sanitization, its better to be able to localize all references to skb frags size. Define accessors : skb_frag_size() to fetch frag size, and skb_frag_size_{set|add|sub}() to manipulate it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
bc416d9768aa9a2e46eb11354a9c58399dafeb01 |
|
06-Oct-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
macvlan: handle fragmented multicast frames Fragmented multicast frames are delivered to a single macvlan port, because ip defrag logic considers other samples are redundant. Implement a defrag step before trying to send the multicast frame. Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
595fc71baa1e80420fe89a400ff2d9cc099d22fc |
|
05-Jul-2011 |
David S. Miller <davem@davemloft.net> |
ipv4: Add ip_defrag() agent IP_DEFRAG_AF_PACKET. Elide the ICMP on frag queue timeouts unconditionally for this user. Signed-off-by: David S. Miller <davem@davemloft.net>
|
1d1652cbdb9885e4d73972263e4cdbe1b0beebfe |
|
17-May-2011 |
David S. Miller <davem@davemloft.net> |
ipv4: Don't use enums as bitmasks in ip_fragment.c Noticed by Joe Perches. Signed-off-by: David S. Miller <davem@davemloft.net>
|
5173cc057787560c127c6e9737f308c833dc4ff3 |
|
16-May-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
ipv4: more compliant RFC 3168 support Commit 6623e3b24a5e (ipv4: IP defragmentation must be ECN aware) was an attempt to not lose "Congestion Experienced" (CE) indications when performing datagram defragmentation. Stefanos Harhalakis raised the point that RFC 3168 requirements were not completely met by this commit. In particular, we MUST detect invalid combinations and eventually drop illegal frames. Reported-by: Stefanos Harhalakis <v13@v13.gr> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
64f3b9e203bd06855072e295557dca1485a2ecba |
|
04-May-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: ip_expire() must revalidate route Commit 4a94445c9a5c (net: Use ip_route_input_noref() in input path) added a bug in IP defragmentation handling, in case timeout is fired. When a frame is defragmented, we use last skb dst field when building final skb. Its dst is valid, since we are in rcu read section. But if a timeout occurs, we take first queued fragment to build one ICMP TIME EXCEEDED message. Problem is all queued skb have weak dst pointers, since we escaped RCU critical section after their queueing. icmp_send() might dereference a now freed (and possibly reused) part of memory. Calling skb_dst_drop() and ip_route_input_noref() to revalidate route is the only possible choice. Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
6623e3b24a5ebb07e81648c478d286a1329ab891 |
|
05-Jan-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
ipv4: IP defragmentation must be ECN aware RFC3168 (The Addition of Explicit Congestion Notification to IP) states : 5.3. Fragmentation ECN-capable packets MAY have the DF (Don't Fragment) bit set. Reassembly of a fragmented packet MUST NOT lose indications of congestion. In other words, if any fragment of an IP packet to be reassembled has the CE codepoint set, then one of two actions MUST be taken: * Set the CE codepoint on the reassembled packet. However, this MUST NOT occur if any of the other fragments contributing to this reassembly carries the Not-ECT codepoint. * The packet is dropped, instead of being reassembled, for any other reason. This patch implements this requirement for IPv4, choosing the first action : If one fragment had NO-ECT codepoint reassembled frame has NO-ECT ElIf one fragment had CE codepoint reassembled frame has CE Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
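The rule chosen in this message (Not-ECT on any fragment wins, otherwise CE on any fragment wins) can be sketched as follows. The codepoint values are the two-bit ECN field values from RFC 3168; the function name `reassembled_ecn()` is hypothetical, and falling back to the first fragment's codepoint when neither case applies is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* RFC 3168 ECN codepoints (low two bits of the TOS byte). */
enum {
    ECN_NOT_ECT = 0x0,
    ECN_ECT_1   = 0x1,
    ECN_ECT_0   = 0x2,
    ECN_CE      = 0x3,
    ECN_MASK    = 0x3
};

/* Decide the ECN codepoint of the reassembled datagram from the TOS
 * bytes of its n fragments. */
unsigned char reassembled_ecn(const unsigned char *frag_tos, size_t n)
{
    assert(frag_tos != NULL && n > 0);
    int saw_not_ect = 0, saw_ce = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned char ecn = frag_tos[i] & ECN_MASK;
        if (ecn == ECN_NOT_ECT)
            saw_not_ect = 1;
        else if (ecn == ECN_CE)
            saw_ce = 1;
    }
    if (saw_not_ect)
        return ECN_NOT_ECT; /* any Not-ECT fragment forces Not-ECT */
    if (saw_ce)
        return ECN_CE;      /* otherwise a CE mark must not be lost */
    return frag_tos[0] & ECN_MASK; /* assumption: keep first fragment's */
}
```

This covers only the behaviour described in this commit; it does not detect or drop invalid combinations.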
|
b534ecf1cd26f094497da6ae28a6ab64cdbe1617 |
|
30-Nov-2010 |
David S. Miller <davem@davemloft.net> |
inetpeer: Make inet_getpeer() take an inet_peer_adress_t pointer. And make an inet_getpeer_v4() helper, update callers. Signed-off-by: David S. Miller <davem@davemloft.net>
|
a02cec2155fbea457eca8881870fd2de1a4c4c76 |
|
22-Sep-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: return operator cleanup Change "return (EXPR);" to "return EXPR;" return is not a function, parentheses are not required. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
21dc330157454046dd7c494961277d76e1c957fe |
|
23-Aug-2010 |
David S. Miller <davem@davemloft.net> |
net: Rename skb_has_frags to skb_has_frag_list SKBs can be "fragmented" in two ways, via a page array (called skb_shinfo(skb)->frags[]) and via a list of SKBs (called skb_shinfo(skb)->frag_list). Since skb_has_frags() tests the latter, its name is confusing since it sounds more like it's testing the former. Signed-off-by: David S. Miller <davem@davemloft.net>
|
4bc2f18ba4f22a90ab593c0a580fc9a19c4777b6 |
|
09-Jul-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net/ipv4: EXPORT_SYMBOL cleanups CodingStyle cleanups EXPORT_SYMBOL should immediately follow the symbol declaration. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
d6bebca92c663fb216c072193945946f3807ca7f |
|
29-Jun-2010 |
Changli Gao <xiaosuo@gmail.com> |
fragment: add fast path for in-order fragments As the fragments are sent in order by most OSes, such as Windows, Darwin and FreeBSD, it is likely that new fragments belong at the end of the inet_frag_queue. In the fast path, we check if the skb at the end of the inet_frag_queue is the prev we expect. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- include/net/inet_frag.h | 1 + net/ipv4/ip_fragment.c | 12 ++++++++++++ net/ipv6/reassembly.c | 11 +++++++++++ 3 files changed, 24 insertions(+) Signed-off-by: David S. Miller <davem@davemloft.net>
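A minimal userspace sketch of the fast-path idea follows. The `struct frag` and `struct frag_queue` types and both function names are simplified stand-ins invented for this example, not the kernel's `sk_buff` or `inet_frag_queue`; the point is only that checking the tail first avoids walking the list when fragments arrive in order.

```c
#include <stddef.h>

/* Simplified stand-ins for a queued fragment and its queue (assumed types). */
struct frag {
    int offset;            /* byte offset of this fragment in the datagram */
    int len;
    struct frag *next;
};

struct frag_queue {
    struct frag *head;
    struct frag *tail;     /* last queued fragment, kept for the fast path */
};

/*
 * Return the fragment after which a new fragment starting at `offset`
 * should be linked, or NULL if it belongs at the head of the queue.
 */
struct frag *find_prev(const struct frag_queue *q, int offset)
{
    struct frag *prev = q->tail;

    /* Fast path: the new fragment starts after the last queued one. */
    if (prev && prev->offset < offset)
        return prev;

    /* Slow path: walk the list from the head to find the position. */
    prev = NULL;
    for (struct frag *next = q->head; next; next = next->next) {
        if (next->offset >= offset)
            break;
        prev = next;
    }
    return prev;
}

/* Link `f` into the queue in offset order and keep `tail` up to date. */
void frag_insert(struct frag_queue *q, struct frag *f)
{
    struct frag *prev = find_prev(q, f->offset);

    if (prev) {
        f->next = prev->next;
        prev->next = f;
    } else {
        f->next = q->head;
        q->head = f;
    }
    if (!f->next)
        q->tail = f;
}
```

For in-order senders every insertion takes the fast path, so the cost per fragment stays constant instead of growing with the queue length. Overlap handling, which the real reassembly code must do, is left out of this sketch.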
|
a95d8c88bea0c93505e1d143d075f112be2b25e3 |
|
14-Jun-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
ipfrag : frag_kfree_skb() cleanup Third param (work) is unused, remove it. Remove __inline__ and inline qualifiers. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
d27f9b35827ec91d71d3561c127a0a8135fb470d |
|
14-Jun-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
ip_frag: Remove some atomic ops Instead of doing one atomic operation per frag, we can factorize them. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
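The factorization described here can be sketched in userspace with C11 atomics. The types and function names below are hypothetical and the kernel uses its own `atomic_t` API rather than `<stdatomic.h>`; the sketch only contrasts one atomic operation per fragment with a single operation on a locally accumulated sum.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Assumed, simplified fragment record carrying its memory charge. */
struct frag {
    size_t truesize;
    struct frag *next;
};

/* Before: one atomic read-modify-write per fragment. */
void uncharge_each(atomic_size_t *mem, const struct frag *list)
{
    for (const struct frag *f = list; f; f = f->next)
        atomic_fetch_sub(mem, f->truesize);
}

/* After: sum in a plain local variable, then one atomic operation. */
void uncharge_batched(atomic_size_t *mem, const struct frag *list)
{
    size_t sum = 0;

    for (const struct frag *f = list; f; f = f->next)
        sum += f->truesize;
    atomic_fetch_sub(mem, sum);
}
```

Both functions leave the counter at the same final value; the batched version touches the shared cache line once per list instead of once per fragment.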
|
5a0e3ad6af8660be21ca98a971cd00f331318c05 |
|
24-Mar-2010 |
Tejun Heo <tj@kernel.org> |
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies.
The percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities to include those headers directly instead of assuming availability. As this conversion needs to touch a large number of source files, the following script is used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.
* Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include blocks and tries to put the new include such that its order conforms to its surroundings. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly because the file doesn't have a fitting include block), it prints out an error message indicating which .h file needs to be added to the file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files.
2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files.
3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually widely available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary.
6. percpu.h was updated not to include slab.h.
7. Build tests were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable on most builds of the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
|
d1c9ae6d1e7b95cedc8e39e8949e795379a0669e |
|
02-Feb-2010 |
Patrick McHardy <kaber@trash.net> |
ipv4: ip_fragment: fix unbalanced rcu_read_unlock() Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
e9017b55189355e9e6569990a18919e83f35bccb |
|
23-Jan-2010 |
Shan Wei <shanwei@cn.fujitsu.com> |
IP: Send an ICMP "Fragment Reassembly Timeout" message when enabling connection track No matter whether connection tracking is enabled, an end host should send an ICMPv4 "Fragment Reassembly Timeout" message when defragmentation times out. The reasons are the following two points: 1. RFC 792 says: "If a host reassembling a fragmented datagram cannot complete the reassembly due to missing fragments within its time limit it discards the datagram, and it may send a time exceeded message. If fragment zero is not available then no time exceeded need be sent at all." Read more: http://www.faqs.org/rfcs/rfc792.html#ixzz0aOXRD7Wp 2. Patrick McHardy also agrees with this opinion. :-) For the discussion of this opinion, refer to http://patchwork.ozlabs.org/patch/41649 The patch fixes the problem like this: When connection tracking is enabled, fragments are received at the PRE_ROUTING hook. If they fail to reassemble, ip_expire() will be called. Before sending an ICMP "Fragment Reassembly Timeout" message, the patch searches the routing table to get the destination entry, only for host type. The patch has been tested on both host type and route type. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
2c8c1e7297e19bdef3c178c3ea41d898a7716e3e |
|
17-Jan-2010 |
Alexey Dobriyan <adobriyan@gmail.com> |
net: spread __net_init, __net_exit __net_init/__net_exit are apparently not going away, so use them to full extent. In some cases __net_init was removed, because it was called from __net_exit code. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
bbf31bf18d34caa87dd01f08bf713635593697f2 |
|
30-Nov-2009 |
David Ford <david@blue-labs.org> |
ipv4: additional update of dev_net(dev) to struct *net in ip_fragment.c, NULL ptr OOPS ipv4 ip_frag_reasm(), fully replace 'dev_net(dev)' with 'net', defined previously patched into 2.6.29. Between 2.6.28.10 and 2.6.29, net/ipv4/ip_fragment.c was patched, changing from dev_net(dev) to container_of(...). Unfortunately the goto section (out_fail) on oversized packets inside ip_frag_reasm() didn't get touched up as well. Oversized IP packets cause a NULL pointer dereference and immediate hang. I discovered this running openvasd and my previous email on this is titled: NULL pointer dereference at 2.6.32-rc8:net/ipv4/ip_fragment.c:566 Signed-off-by: David Ford <david@blue-labs.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
09ad9bc752519cc167d0a573e1acf69b5c707c67 |
|
26-Nov-2009 |
Octavian Purdila <opurdila@ixiacom.com> |
net: use net_eq to compare nets Generated with the following semantic patch
@@
struct net *n1;
struct net *n2;
@@
- n1 == n2
+ net_eq(n1, n2)
@@
struct net *n1;
struct net *n2;
@@
- n1 != n2
+ !net_eq(n1, n2)
applied over {include,net,drivers/net}. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
f8572d8f2a2ba75408b97dc24ef47c83671795d7 |
|
05-Nov-2009 |
Eric W. Biederman <ebiederm@xmission.com> |
sysctl net: Remove unused binary sysctl code Now that sys_sysctl is a compatibility wrapper around /proc/sys all sysctl strategy routines, and all ctl_name and strategy entries in the sysctl tables are unused, and can be removed. In addition neigh_sysctl_register has been modified to no longer take a strategy argument and its callers have been modified not to pass one. Cc: "David Miller" <davem@davemloft.net> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: netdev@vger.kernel.org Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
69df9d5993bd7dd7499ad0e98fe824147fbe5667 |
|
06-Nov-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
ip_frag: dont touch device refcount When sending fragmentation expiration ICMP V4/V6 messages, we can avoid touching device refcount, thanks to RCU Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
d7fcf1a5cae2c970e9afe7192fe0c13d931247e0 |
|
09-Jun-2009 |
David S. Miller <davem@davemloft.net> |
ipv4: Use frag list abstraction interfaces. Signed-off-by: David S. Miller <davem@davemloft.net>
|
adf30907d63893e4208dfe3f5c88ae12bc2f25d5 |
|
02-Jun-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: skb->dst accessors Define three accessors to get/set dst attached to a skb:
struct dst_entry *skb_dst(const struct sk_buff *skb)
void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
void skb_dst_drop(struct sk_buff *skb)
The last one should replace occurrences of: dst_release(skb->dst); skb->dst = NULL;
Delete skb->dst field. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
2bad35b7c9588eb5e65c03bcae54e7eb6b1a6504 |
|
19-Mar-2009 |
Jorge Boncompte [DTI2] <jorge@dti2.net> |
netns: oops in ip[6]_frag_reasm incrementing stats dev can be NULL in ip[6]_frag_reasm for skb's coming from RAW sockets. Quagga's OSPFD sends fragmented packets on a RAW socket, when netfilter conntrack reassembles them on the OUTPUT path you hit this code path. You can test it with something like "hping2 -0 -d 2000 -f AA.BB.CC.DD" With help from Jarek Poplawski. Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
6d9f239a1edb31d6133230f478fd1dc2da338ec5 |
|
04-Nov-2008 |
Alexey Dobriyan <adobriyan@gmail.com> |
net: '&' redux I want to compile out proc_* and sysctl_* handlers totally and stub them to NULL depending on config options, however usage of & will prevent this, since taking the address of a NULL pointer will break compilation. So, drop & in front of every ->proc_handler and every ->strategy handler, it was never needed in fact. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
fd3f8c4cb632c28ef915a535617a0fcddcfe3f80 |
|
03-Nov-2008 |
Jianjun Kong <jianjun@zeuux.org> |
net: clean up net/ipv4/ip_fragment.c tcp_timer.c ip_input.c Signed-off-by: Jianjun Kong <jianjun@zeuux.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
673d57e72398edfedc93fb50ff58048077c9d587 |
|
31-Oct-2008 |
Harvey Harrison <harvey.harrison@gmail.com> |
net: replace NIPQUAD() in net/ipv4/ net/ipv6/ Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u can be replaced with %pI4 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
113aa838ec3a235d883f8357d31d90e16c47fc89 |
|
14-Oct-2008 |
Alan Cox <alan@redhat.com> |
net: Rationalise email address: Network Specific Parts Clean up the various different email addresses of mine listed in the code to a single current and valid address. As Dave says his network merges for 2.6.28 are now done this seems a good point to send them in where they won't risk disrupting real changes. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
547b792cac0a038b9dbf958d3c120df3740b5572 |
|
26-Jul-2008 |
Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> |
net: convert BUG_TRAP to generic WARN_ON Removes legacy reinvent-the-wheel type thing. The generic machinery integrates much better to automated debugging aids such as kerneloops.org (and others), and is unambiguous due to better naming. Non-intuitively BUG_TRAP() is actually equal to WARN_ON() rather than BUG_ON() though some might actually be promoted to BUG_ON() but I left that to future. I could make at least one BUILD_BUG_ON conversion. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c5346fe396f5e22bbfb3ec037c43891c3c57d3e6 |
|
17-Jul-2008 |
Pavel Emelyanov <xemul@openvz.org> |
mib: add net to IP_ADD_STATS_BH Very simple - only ip_evictor (fragments) requires such. This patch completes the IP_XXX_STATS patching. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
7c73a6faffae0bfae70639113aecf06af666e714 |
|
17-Jul-2008 |
Pavel Emelyanov <xemul@openvz.org> |
mib: add net to IP_INC_STATS_BH Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
84a3aa000eacbaf841d745b07ef3a3280899056b |
|
17-Jul-2008 |
Pavel Emelyanov <xemul@openvz.org> |
ipv4: prepare net initialization for IP accounting Some places, that deal with IP statistics already have where to get a struct net from, but use it directly, without declaring a separate variable on the stack. So, save this net on the stack for future IP_XXX_STATS macros. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
9a375803feaadb6c34e0807bd9325885dcca5c00 |
|
28-Jun-2008 |
Pavel Emelyanov <xemul@openvz.org> |
inet fragments: fix race between inet_frag_find and inet_frag_secret_rebuild The problem is that while we work w/o the inet_frags.lock even read-locked the secret rebuild timer may occur (on another CPU, since BHs are still disabled in the inet_frag_find) and change the rnd seed for ipv4/6 fragments. It was caused by my patch fd9e63544cac30a34c951f0ec958038f0529e244 ([INET]: Omit double hash calculations in xxx_frag_intern) late in the 2.6.24 kernel, so this should probably be queued to -stable. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
0b040829952d84bf2a62526f0e24b624e0699447 |
|
11-Jun-2008 |
Adrian Bunk <bunk@kernel.org> |
net: remove CVS keywords This patch removes CVS keywords that weren't updated for a long time from comments. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
7d291ebb834278e30c211b26fb7076adcb636ad9 |
|
19-May-2008 |
Pavel Emelyanov <xemul@openvz.org> |
inet: Register fragmentation some ctls at read-only root. Parts of fragments-related sysctls are read-only, but this is done by cloning all the tables and dropping write-bits from mode. Do the same but with read-only root. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
0a64b4b811025ce0386ad84d81504e4ff7985856 |
|
19-May-2008 |
Pavel Emelyanov <xemul@openvz.org> |
inet: Rename fragmentation sysctl-related functions/variables. The fragments sysctls also contain some that are to be visible, but read-only, in net namespaces. The naming convention in net/core/sysctl_net_core.c is that tables that are to be registered in namespaces have an "ns" word in their names. So rename the ones in ipv4/ip_fragment.c and ipv6/reassembly.c to fit this. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
a7d632b6b4ad1c92746ed409e41f9dc571ec04e2 |
|
14-Apr-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV4]: Use NIPQUAD_FMT to format ipv4 addresses. And use %u to format port. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
bc578a54f0fd489d0722303f9a52508495ccaf9a |
|
29-Mar-2008 |
Joe Perches <joe@perches.com> |
[NET]: Rename inet_frag.h identifiers COMPLETE, FIRST_IN, LAST_IN to INET_FRAG_* On Fri, 2008-03-28 at 03:24 -0700, Andrew Morton wrote: > they should all be renamed. Done for include/net and net Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c346dca10840a874240c78efe3f39acf4312a1f2 |
|
25-Mar-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS. Introduce per-net_device inlines: dev_net(), dev_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
cb84663e4d239f23f0d872bc6463c272e74daad8 |
|
24-Mar-2008 |
Denis V. Lunev <den@openvz.org> |
[NETNS]: Process IP layer in the context of the correct namespace. Replace all the rest of the init_net with a proper net on the IP layer. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
12b101555f4a67db67a66966a516075bd477741f |
|
21-Mar-2008 |
Phil Oester <kernel@linuxace.com> |
[IPV4]: Fix null dereference in ip_defrag Been seeing occasional panics in my testing of 2.6.25-rc in ip_defrag. Offending line in ip_defrag is here: net = skb->dev->nd_net where dev is NULL. Bisected the problem down to commit ac18e7509e7df327e30d6e073a787d922eaf211d ([NETNS][FRAGS]: Make the inet_frag_queue lookup work in namespaces). Below patch (idea from Patrick McHardy) fixes the problem for me. Signed-off-by: Phil Oester <kernel@linuxace.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
81566e8322c3f6c6f9a2277fe0e440fee8d917bd |
|
22-Jan-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[NETNS][FRAGS]: Make the pernet subsystem for fragments. On namespace start we mainly prepare the ctl variables. When the namespace is stopped we have to kill all the fragments that point to this namespace. The inet_frags_exit_net() handles it. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
3140c25c82106645a6b1fc469dab7006a1d09fd0 |
|
22-Jan-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[NETNS][FRAGS]: Make the LRU list per namespace. The inet_frags.lru_list is used for evicting only, so we have to make it per-namespace, to evict only those fragments whose namespace exceeded its high threshold, but not the whole hash. Besides, this helps to avoid long loops in evictor. The spinlock is not per-namespace because it protects the hash table as well, which is global. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
3b4bc4a2bfe80d01ebd4f2b6dcc58986c970ed16 |
|
22-Jan-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[NETNS][FRAGS]: Isolate the secret interval from namespaces. Since we have one hashtable to lookup the fragment, having different secret_interval-s for hash rebuild doesn't make sense, so move this one to inet_frags. The inet_frags_ctl becomes empty after this, so remove it. The appropriate ctl table is kept read-only in namespaces. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
e31e0bdc7e7fb9a4b09d2f3266c035a18fdcee9d |
|
22-Jan-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[NETNS][FRAGS]: Make thresholds work in namespaces. This is the same as with the timeout variable. Currently, after exceeding the high threshold _all_ the fragments are evicted, but it will be fixed in later patch. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
b2fd5321dd160ef309dfb6cfc78ed8de4a830659 |
|
22-Jan-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[NETNS][FRAGS]: Make the net.ipv4.ipfrag_timeout work in namespaces. Move it to the netns_frags, adjust the usage and make the appropriate ctl table writable. Now fragments that live in different namespaces can live for different times. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
e4a2d5c2bccd5bd29de5ae4f14ff4448fac9cfc8 |
|
22-Jan-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[NETNS][FRAGS]: Duplicate sysctl tables for new namespaces. Each namespace has to have own tables to tune their different parameters, so duplicate the tables and register them. All the tables in sub-namespaces are temporarily made read-only. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
6ddc082223ef0f73717b4133fa7e648842bbfd02 |
|
22-Jan-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[NETNS][FRAGS]: Make the mem counter per-namespace. This is also simple, but introduces more changes, since then mem counter is altered in more places. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
e5a2bb842cd9681d00d4ca963e63e4d3647e66f8 |
|
22-Jan-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[NETNS][FRAGS]: Make the nqueues counter per-namespace. This is simple - just move the variable from struct inet_frags to struct netns_frags and adjust the usage appropriately. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
ac18e7509e7df327e30d6e073a787d922eaf211d |
|
22-Jan-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[NETNS][FRAGS]: Make the inet_frag_queue lookup work in namespaces. Since fragment management code is consolidated, we cannot have the pointer from inet_frag_queue to struct net, since we must know what kind of fragment this is. So, I introduce the netns_frags structure. This one is currently empty, but will be eventually filled with per-namespace attributes. Each inet_frag_queue is tagged with this one. The conntrack_reasm is not "netns-izated", so it has one static netns_frags instance to keep working in init namespace. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
8d8354d2fb9277f165715a6e1cb92bcc89259975 |
|
22-Jan-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[NETNS][FRAGS]: Move ctl tables around. This is a preparation for sysctl netns-ization. Move the ctl tables to the files, where the tuning variables reside. Plus make the helpers to register the tables. This will simplify the later patches and will keep similar things closer to each other. ipv4, ipv6 and conntrack_reasm are patched differently, but the result is all the tables are in appropriate files. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
45542479fb261342d5244869cf3ca4636b7ffd43 |
|
18-Oct-2007 |
David Howells <dhowells@redhat.com> |
[NET]: Fix uninitialised variable in ip_frag_reasm() Fix uninitialised variable in ip_frag_reasm(). err should be set to -ENOMEM if the initial call of skb_clone() fails. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c95477090a2ace6d241c184adc3fbfcab9c61ceb |
|
18-Oct-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[INET]: Consolidate frag queues freeing Since we now allocate the queues in inet_fragment.c, we can safely free it in the same place. The ->destructor callback thus becomes optional for inet_frags. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
48d60056387c37a17a46feda48613587a90535e5 |
|
18-Oct-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[INET]: Remove no longer needed ->equal callback Since this callback is used to check for conflicts in hashtable when inserting a newly created frag queue, we can do the same by checking for matching the queue with the argument, used to create one. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
abd6523d15f40bfee14652619a31a7f65f77f581 |
|
18-Oct-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[INET]: Consolidate xxx_find() in fragment management Here we need another callback ->match to check whether the entry found in hash matches the key passed. The key used is the same as the creation argument for inet_frag_create. Yet again, this ->match is the same for netfilter and ipv6. Running a few steps forward - this callback will later replace the ->equal one. Since the inet_frag_find() uses the already consolidated inet_frag_create() remove the xxx_frag_create from protocol codes. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
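The role of a ->match callback in a shared lookup routine can be sketched as follows. All names here (`struct fq`, `struct frags_ops`, `frag_find`, `ip4_match`, the key layout, the table size) are hypothetical simplifications, not the kernel's definitions, and the sketch omits locking and the create-on-miss step that the commit message attributes to inet_frag_find().

```c
#include <stddef.h>

#define FRAG_HASHSZ 64                 /* assumed table size for the sketch */

/* Generic part shared by all protocols (stand-in for a common queue type). */
struct fq {
    struct fq *next;                   /* hash chain */
};

/* Per-protocol operations and hash table. */
struct frags_ops {
    struct fq *hash[FRAG_HASHSZ];
    int (*match)(const struct fq *q, const void *key);
};

/* Generic lookup: protocol knowledge lives only in the match callback. */
struct fq *frag_find(struct frags_ops *f, const void *key, unsigned int hash)
{
    for (struct fq *q = f->hash[hash % FRAG_HASHSZ]; q; q = q->next)
        if (f->match(q, key))
            return q;
    return NULL;
}

/* An IPv4-flavoured user: the key doubles as the creation argument. */
struct ip4_key {
    unsigned int id, saddr, daddr;
};

struct ip4_q {
    struct fq q;                       /* first member, so the cast below is valid */
    struct ip4_key key;
};

int ip4_match(const struct fq *q, const void *arg)
{
    const struct ip4_q *qp = (const struct ip4_q *)q;
    const struct ip4_key *k = arg;

    return qp->key.id == k->id &&
           qp->key.saddr == k->saddr &&
           qp->key.daddr == k->daddr;
}
```

Because the callback compares a queue against the same key that would be used to create it, one callback can serve both lookup and duplicate detection on insert, which is why a separate ->equal callback becomes unnecessary.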
|
c6fda282294da882f8d8cc4c513940277dd380f5 |
|
18-Oct-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[INET]: Consolidate xxx_frag_create() This one uses the xxx_frag_intern() and xxx_frag_alloc() routines, which are already consolidated, so remove them from protocol code (as promised). The ->constructor callback is used to init the rest of the frag queue and it is the same for netfilter and ipv6. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
e521db9d790aaa60ae8920e21cb7faedc280fc36 |
|
18-Oct-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[INET]: Consolidate xxx_frag_alloc() Just perform the kzalloc() allocation and setup common fields in the inet_frag_queue(). Then return the result to the caller to initialize the rest. The inet_frag_alloc() may return NULL, so check the return value before doing the container_of(). This looks ugly, but the xxx_frag_alloc() will be removed soon. The xxx_expire() timer callbacks are patched, because the argument is now the inet_frag_queue, not the protocol specific queue. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
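The NULL check before container_of() matters because container_of() only subtracts a member offset; applied to NULL it can produce a non-NULL garbage pointer. A userspace sketch, assuming hypothetical types `frag_queue_base` and `ipq_sketch`, a simplified `container_of` macro, and a `simulate_failure` flag that exists only to exercise the error path:

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified userspace version of the kernel macro (no type checking). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Common part, embedded in each protocol-specific queue (assumed layout). */
struct frag_queue_base {
    int refcnt;
};

struct ipq_sketch {
    unsigned int id;                 /* protocol-specific field */
    struct frag_queue_base q;        /* embedded common part, not at offset 0 */
};

/* Generic allocator: returns the embedded common part, or NULL on failure. */
static struct frag_queue_base *frag_alloc(int simulate_failure)
{
    struct ipq_sketch *qp = simulate_failure ? NULL : calloc(1, sizeof(*qp));

    if (!qp)
        return NULL;
    qp->q.refcnt = 1;                /* set up the common fields */
    return &qp->q;
}

/* Protocol-side wrapper: check for NULL before converting the pointer. */
struct ipq_sketch *ipq_create(int simulate_failure)
{
    struct frag_queue_base *q = frag_alloc(simulate_failure);

    if (!q)
        return NULL;
    return container_of(q, struct ipq_sketch, q);
}
```

In this sketch the embedded member sits at a non-zero offset, so skipping the check would turn an allocation failure into a bogus non-NULL pointer.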
|
2588fe1d782f1686847493ad643157d5d10bf602 |
|
18-Oct-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[INET]: Consolidate xxx_frag_intern This routine checks for the existence of a given entry in the hash table and inserts the new one if needed. The ->equal callback is used to compare two frag_queue-s together, but this one is temporary and will be removed later. The netfilter code and the ipv6 one use the same routine to compare frags. The inet_frag_intern() always returns non-NULL pointer, so convert the inet_frag_queue into protocol specific one (with the container_of) without any checks. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
fd9e63544cac30a34c951f0ec958038f0529e244 |
|
18-Oct-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[INET]: Omit double hash calculations in xxx_frag_intern Since the hash value is already calculated in xxx_find, we can simply use it later. This is already done in netfilter code, so make the same in ipv4 and ipv6. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
f1673ca52c04f1b311abe03fd67cd4d650d19435 |
|
15-Oct-2007 |
Denis V. Lunev <den@openvz.org> |
[INET]: kmalloc+memset -> kzalloc in frag_alloc_queue kmalloc + memset -> kzalloc in frag_alloc_queue Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
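The same simplification has a direct userspace analogue, shown here with `malloc`/`memset` versus `calloc` since `kmalloc` and `kzalloc` are kernel-only. The struct and function names are invented for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed stand-in for a fragment queue structure. */
struct frag_queue_sketch {
    int refcnt;
    void *fragments;
};

/* Before: allocate, then zero in a second step. */
struct frag_queue_sketch *alloc_two_step(void)
{
    struct frag_queue_sketch *q = malloc(sizeof(*q));

    if (!q)
        return NULL;
    memset(q, 0, sizeof(*q));
    return q;
}

/* After: a single call that returns zeroed memory. */
struct frag_queue_sketch *alloc_zeroed(void)
{
    return calloc(1, sizeof(struct frag_queue_sketch));
}
```

Both variants hand back an all-zero object; the single-call form is shorter and cannot forget the zeroing step.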
|
762cc40801ad757a34527d5e548816cf3b6fc606 |
|
15-Oct-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[INET]: Consolidate the xxx_put These ones use the generic data types too, so move them in one place. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
4b6cb5d8e3f5707d7a2e55cf7b05f1ea8bfc7a6d |
|
15-Oct-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[INET]: Small cleanup for xxx_put after evictor consolidation After the evictor code is consolidated there is no need to pass the extra pointer to the xxx_put() functions. The only place where it made sense was the evictor code itself. Maybe this change should go with the previous (or with the next) patch, but I try to make them as short as possible to simplify the review (but they are still large anyway), so this change goes in a separate patch. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
8e7999c44ee95e1e90ac91c83557a04e2948f160 |
|
15-Oct-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[INET]: Consolidate the xxx_evictor The evictors collect some statistics for ipv4 and ipv6, so make it return the number of evicted queues and account them all at once in the caller. The XXX_ADD_STATS_BH() macros are just for this case, but maybe there are places in code, that can make use of them as well. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
1e4b82873af0f21002e37a81ef063d2e5410deb3 |
|
15-Oct-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[INET]: Consolidate the xxx_frag_destroy To make it possible we need to know the exact frag queue size for inet_frags->mem management and two callbacks: * to destroy the skb (optional, used in conntracks only) * to free the queue itself (mandatory, but later I plan to move the allocation and the destruction of frag_queues into the common place, so this callback will most likely be optional too). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
321a3a99e4717b960e21c62fc6a140d21453df7f |
|
15-Oct-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[INET]: Consolidate the xxx_secret_rebuild This code works with the generic data types as well, so move this into inet_fragment.c This move makes it possible to hide the secret_timer management and the secret_rebuild routine completely in the inet_fragment.c Introduce the ->hashfn() callback in inet_frags() to get the hashfun for a given inet_frag_queue() object. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
277e650ddfc6944ef5f5466fd898b8da7f06cd82 |
|
15-Oct-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[INET]: Consolidate the xxx_frag_kill Since all the xxx_frag_kill functions now work with the generic inet_frag_queue data type, this can be moved into a common place. The xxx_unlink() code is moved as well. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
04128f233f2b344f3438cde09723e9946463a573 |
|
15-Oct-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[INET]: Collect common frag sysctl variables together Some sysctl variables are used to tune the frag queues management and it will be useful to work with them in a common way in the future, so move them into one structure, moreover they are the same for all the frag management codes. I don't place them in the existing inet_frags object, introduced in the previous patch for two reasons: 1. to keep them in the __read_mostly section; 2. not to export the whole inet_frags objects outside. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
7eb95156d9dce2f59794264db336ce007d71638b |
|
15-Oct-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[INET]: Collect frag queues management objects together There are some objects that are common in all the places which are used to keep track of frag queues, they are: * hash table * LRU list * rw lock * rnd number for hash function * the number of queues * the amount of memory occupied by queues * secret timer Move all this stuff into one structure (struct inet_frags) to make it possible to use them uniformly in the future. Like with the previous patch this mostly consists of hunks like - write_lock(&ipfrag_lock); + write_lock(&ip4_frags.lock); To address the issue with exporting the number of queues and the amount of memory occupied by queues outside the .c file they are declared in, I introduce a couple of helpers. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
5ab11c98d3a950faf6922b6166e5f8fc874590e7 |
|
15-Oct-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[INET]: Move common fields from frag_queues in one place. Introduce the struct inet_frag_queue in include/net/inet_frag.h file and place there all the common fields from three structs: * struct ipq in ipv4/ip_fragment.c * struct nf_ct_frag6_queue in nf_conntrack_reasm.c * struct frag_queue in ipv6/reassembly.c After this, replace these fields on appropriate structures with this structure instance and fix the users to use correct names i.e. hunks like - atomic_dec(&fq->refcnt); + atomic_dec(&fq->q.refcnt); (these occupy most of the patch) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
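The embedding pattern described in this commit can be sketched in plain userspace C. This is only an illustration under stated assumptions: every `demo_` name is hypothetical, the refcount is a plain `int` rather than the kernel's `atomic_t`, and the field layout is not the real `struct inet_frag_queue`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct inet_frag_queue: the shared fields. */
struct demo_frag_queue {
	int refcnt;	/* plain int here; the kernel uses atomic_t */
	int len;
};

/* Hypothetical stand-in for struct ipq: protocol-specific fields
 * plus the embedded common part, conventionally named "q". */
struct demo_ipq {
	struct demo_frag_queue q;
	unsigned int saddr;
	unsigned int daddr;
};

/* Userspace analogue of the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic code only sees the common part, so one helper
 * can serve every protocol that embeds it. */
static void demo_frag_put(struct demo_frag_queue *q)
{
	assert(q->refcnt > 0);
	q->refcnt--;	/* corresponds to atomic_dec(&fq->q.refcnt) */
}

/* Protocol code recovers its own structure from the common part. */
static struct demo_ipq *demo_ipq_from_queue(struct demo_frag_queue *q)
{
	return demo_container_of(q, struct demo_ipq, q);
}
```

A caller holding a `struct demo_ipq *fq` would pass `&fq->q` to the generic helper, which is the shape of the `fq->refcnt` to `fq->q.refcnt` hunks the message mentions.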
|
776c729e8d91b2740583a2169678f2d3f383458b |
|
14-Oct-2007 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPV4]: Change ip_defrag to return an integer Now that ip_defrag always returns the packet given to it on input, we can change it to return an integer indicating error instead. This patch does that and updates all its callers accordingly. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
1706d58763c36133d7fce6cc78b1444fd40db28c |
|
14-Oct-2007 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPV4]: Make ip_defrag return the same packet This patch is a bit of a hack. However it is worth it if you consider that this is the only reason why we have to carry around the struct sk_buff ** pointers in netfilter. It makes ip_defrag always return the packet that was given to it on input. It does this by cloning the packet and replacing its original contents with the head fragment if necessary. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
881d966b48b035ab3f3aeaae0f3d3f9b584f45b2 |
|
17-Sep-2007 |
Eric W. Biederman <ebiederm@xmission.com> |
[NET]: Make the device list and device lookups per namespace. This patch makes most of the generic device layer network namespace safe. This patch makes dev_base_head a network namespace variable, and then it picks up a few associated variables. The functions: dev_getbyhwaddr dev_getfirsthwbytype dev_get_by_flags dev_get_by_name __dev_get_by_name dev_get_by_index __dev_get_by_index dev_ioctl dev_ethtool dev_load wireless_process_ioctl were modified to take a network namespace argument, and deal with it. vlan_ioctl_set and brioctl_set were modified so their hooks will receive a network namespace argument. So basically anything in the core of the network stack that was affected by the change of dev_base was modified to handle multiple network namespaces. The rest of the network stack was simply modified to explicitly use &init_net the initial network namespace. This can be fixed when those components of the network stack are modified to handle multiple network namespaces. For now the ifindex generator is left global. Fundamentally ifindex numbers are per namespace, or else we will have corner case problems with migration when we get that far. At the same time there are assumptions in the network stack that the ifindex of a network device won't change. Making the ifindex number global seems a good compromise until the network stack can cope with ifindex changes when you change namespaces, and the like. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
eddc9ec53be2ecdbf4efe0efd4a83052594f0ac0 |
|
21-Apr-2007 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
[SK_BUFF]: Introduce ip_hdr(), remove skb->nh.iph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c9bdd4b5257406b0608385d19c40b5511decf4f6 |
|
13-Mar-2007 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
[IP]: Introduce ip_hdrlen() For the common sequence "skb->nh.iph->ihl * 4", removing a good number of open coded skb->nh.iph uses, now to go after the rest... Just out of curiosity, here are the idioms found to get the same result: skb->nh.iph->ihl << 2 skb->nh.iph->ihl<<2 skb->nh.iph->ihl * 4 skb->nh.iph->ihl*4 (skb->nh.iph)->ihl * sizeof(u32) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
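The equivalence of the idioms listed above can be checked with a small userspace sketch. It assumes a hypothetical `struct demo_iphdr` holding only the IHL field; it is not the kernel's `struct iphdr`, and `demo_ip_hdrlen()` is an illustrative analogue of the helper, not its actual definition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct iphdr: only the IHL field matters here. */
struct demo_iphdr {
	uint8_t ihl;	/* header length in 32-bit words, 5..15 for valid IPv4 */
};

/* Illustrative analogue of the helper: header length in bytes. */
static unsigned int demo_ip_hdrlen(const struct demo_iphdr *iph)
{
	return iph->ihl * 4u;
}

/* Returns 1 if the open-coded idioms give the same byte count. */
static int demo_idioms_agree(uint8_t ihl)
{
	struct demo_iphdr h = { .ihl = ihl };
	unsigned int want = demo_ip_hdrlen(&h);

	return (unsigned int)(h.ihl << 2) == want &&
	       h.ihl * sizeof(uint32_t) == want;
}

/* Walks every valid IHL value and asserts that the idioms agree. */
static void demo_check_all(void)
{
	for (uint8_t ihl = 5; ihl <= 15; ihl++)
		assert(demo_idioms_agree(ihl));
}
```

Since all the spellings compute the same value, a single named helper mainly buys readability and one place to change if the header access method changes.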
|
d56f90a7c96da5187f0cdf07ee7434fe6aa78bbc |
|
11-Apr-2007 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
[SK_BUFF]: Introduce skb_network_header() For the places where we need a pointer to the network header, it is still legal to touch skb->nh.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
132adf54639cf7dd9315e8df89c2faa59f6e46d9 |
|
09-Mar-2007 |
Stephen Hemminger <shemminger@linux-foundation.org> |
[IPV4]: cleanup Add whitespace around keywords. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
b7aa0bf70c4afb9e38be25f5c0922498d0f8684c |
|
20-Apr-2007 |
Eric Dumazet <dada1@cosmosbay.com> |
[NET]: convert network timestamps to ktime_t We currently use a special structure (struct skb_timeval) and plain 'struct timeval' to store packet timestamps in sk_buffs and struct sock. This has some drawbacks : - Fixed resolution of one microsecond. - Waste of space on 64bit platforms where sizeof(struct timeval)=16 I suggest using ktime_t that is a nice abstraction of high resolution time services, currently capable of nanosecond resolution. As sizeof(ktime_t) is 8 bytes, using ktime_t in 'struct sock' permits a 8 byte shrink of this structure on 64bit architectures. Some other structures also benefit from this size reduction (struct ipq in ipv4/ip_fragment.c, struct frag_queue in ipv6/reassembly.c, ...) Once this ktime infrastructure is adopted, we can more easily provide nanosecond resolution on top of it. (ioctl SIOCGSTAMPNS and/or SO_TIMESTAMPNS/SCM_TIMESTAMPNS) Note : this patch includes a bug correction in compat_sock_get_timestamp() where a "err = 0;" was missing (so this syscall returned -ENOENT instead of 0) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: Stephen Hemminger <shemminger@linux-foundation.org> CC: John find <linux.kernel@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
|
e905a9edab7f4f14f9213b52234e4a346c690911 |
|
09-Feb-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET] IPV4: Fix whitespace errors. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
47c6bf7760bb8021bf7782f05bcd3b9f73ed2c2e |
|
12-Dec-2006 |
Peter Zijlstra <a.p.zijlstra@chello.nl> |
fix typo in net/ipv4/ip_fragment.c Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
182777700d912a69824245e9ee99148ac0aa57d7 |
|
27-Sep-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[IPV4]: ip_fragment.c endianness annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
ab32ea5d8a760e7dd4339634e95d7be24ee5b842 |
|
22-Sep-2006 |
Brian Haley <brian.haley@hp.com> |
[NET/IPV4/IPV6]: Change some sysctl variables to __read_mostly Change net/core, ipv4 and ipv6 sysctl variables to __read_mostly. Couldn't actually measure any performance increase while testing (.3% I consider noise), but seems like the right thing to do. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
84fa7933a33f806bbbaae6775e87459b1ec584c0 |
|
30-Aug-2006 |
Patrick McHardy <kaber@trash.net> |
[NET]: Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE Replace CHECKSUM_HW by CHECKSUM_PARTIAL (for outgoing packets, whose checksum still needs to be completed) and CHECKSUM_COMPLETE (for incoming packets, device supplied full checksum). Patch originally from Herbert Xu, updated by myself for 2.6.18-rc3. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
6ab3d5624e172c553004ecc862bfeac16d9d68b7 |
|
30-Jun-2006 |
Jörn Engel <joern@wohnheim.fh-wedel.de> |
Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
55c0022e53452360064ea264c41410c70565d9f8 |
|
10-Apr-2006 |
David S. Miller <davem@sunset.davemloft.net> |
[IPV4] ip_fragment: Always compute hash with ipfrag_lock held. Otherwise we could compute an inaccurate hash due to the random seed changing. Noticed by Zach Brown and patch is based upon some feedback from Herbert Xu. Signed-off-by: David S. Miller <davem@davemloft.net>
|
76ab608d86cf1ef5c5c46819b5733eb9f9f964f8 |
|
06-Jan-2006 |
Alexey Dobriyan <adobriyan@gmail.com> |
[NET]: Endian-annotate struct iphdr And fix trivial warnings that emerged. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
89cee8b1cbb9dac40c92ef1968aea2b45f82fd18 |
|
14-Dec-2005 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPV4]: Safer reassembly Another spin of Herbert Xu's "safer ip reassembly" patch for 2.6.16. (The original patch is here: http://marc.theaimsgroup.com/?l=linux-netdev&m=112281936522415&w=2 and my only contribution is to have tested it.) This patch (optionally) does additional checks before accepting IP fragments, which can greatly reduce the possibility of reassembling fragments which originated from different IP datagrams. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Arthur Kepner <akepner@sgi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
e7c8a41e817f381ac5c2a59ecc81b483bd68a7df |
|
16-Nov-2005 |
Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> |
[IPV4,IPV6]: replace handmade list with hlist in IPv{4,6} reassembly Both of ipq and frag_queue have *next and **prev, and they can be replaced with hlist. Thanks Arnaldo Carvalho de Melo for the suggestion. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
48bc41a49c4f3aa760dff84e7f71437f5ed520fe |
|
07-Sep-2005 |
Stephen Hemminger <shemminger@osdl.org> |
[IPV4]: Reassembly trim not clearing CHECKSUM_HW This was found by inspection while looking for checksum problems with the skge driver that sets CHECKSUM_HW. It did not fix the problem, but it looks like it is needed. If IP reassembly is trimming an overlapping fragment, it should reset (or adjust) the hardware checksum flag on the skb. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
a61bbcf28a8cb0ba56f8193d512f7222e711a294 |
|
15-Aug-2005 |
Patrick McHardy <kaber@trash.net> |
[NET]: Store skb->timestamp as offset to a base timestamp Reduces skb size by 8 bytes on 64-bit. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
64ce207306debd7157f47282be94770407bec01c |
|
10-Aug-2005 |
Patrick McHardy <kaber@trash.net> |
[NET]: Make NETDEBUG pure printk wrappers Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
ca9334523c853e407da7b3a0bd02f54d0fa59414 |
|
08-Aug-2005 |
Heikki Orsila <heikki.orsila@iki.fi> |
[IPV4]: Debug cleanup Here's a small patch to cleanup NETDEBUG() use in net/ipv4/ for Linux kernel 2.6.13-rc5. Also weird use of indentation is changed in some places. Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
|
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 |
|
17-Apr-2005 |
Linus Torvalds <torvalds@ppc970.osdl.org> |
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
|