6a3db02c6f32102f98e1bd8d33b28ba5ec4528bb |
|
30-Jun-2009 |
Chia-chi Yeh <chiachi@android.com> |
net: Replace AID_NET_RAW checks with capable(CAP_NET_RAW). Signed-off-by: Chia-chi Yeh <chiachi@android.com>
|
c7c3ec4903d32c60423ee013d96e94602f66042c |
|
12-May-2008 |
Robert Love <rlove@google.com> |
net: socket ioctl to reset connections matching local address Introduce a new socket ioctl, SIOCKILLADDR, that nukes all sockets bound to the same local address. This is useful in situations with dynamic IPs, to kill stuck connections. Signed-off-by: Brian Swetland <swetland@google.com> net: fix tcp_v4_nuke_addr Signed-off-by: Dima Zavin <dima@android.com> net: ipv4: Fix a spinlock recursion bug in tcp_v4_nuke. We can't hold the lock while calling to tcp_done(), so we drop it before calling. We then have to start at the top of the chain again. Signed-off-by: Dima Zavin <dima@android.com> net: ipv4: Fix race in tcp_v4_nuke_addr(). To fix a recursive deadlock in 2.6.29, we stopped holding the hash table lock across tcp_done() calls. This fixed the deadlock, but introduced a race where the socket could die or change state. Fix: Before unlocking the hash table, we grab a reference to the socket. We can then unlock the hash table without risk of the socket going away. We then lock the socket, which is safe because it is pinned. We can then call tcp_done() without recursive deadlock and without race. Upon return, we unlock the socket and then unpin it, killing it. Change-Id: Idcdae072b48238b01bdbc8823b60310f1976e045 Signed-off-by: Robert Love <rlove@google.com> Acked-by: Dima Zavin <dima@android.com> ipv4: disable bottom halves around call to tcp_done(). Signed-off-by: Robert Love <rlove@google.com> Signed-off-by: Colin Cross <ccross@android.com> ipv4: Move sk_error_report inside bh_lock_sock in tcp_v4_nuke_addr When sk_error_report is called, it wakes up the user-space thread, which then calls tcp_close. When the tcp_close is interrupted by the tcp_v4_nuke_addr ioctl thread running tcp_done, it leaks 392 bytes and triggers a WARN_ON. This patch moves the call to sk_error_report inside the bh_lock_sock, which matches the locking used in tcp_v4_err. Signed-off-by: Colin Cross <ccross@android.com>
|
db0e07948289cd4d332b82b1b5e9fe05846b2bc6 |
|
15-Oct-2008 |
Robert Love <rlove@google.com> |
Paranoid network. With CONFIG_ANDROID_PARANOID_NETWORK, require specific uids/gids to instantiate network sockets. Signed-off-by: Robert Love <rlove@google.com> paranoid networking: Use in_egroup_p() to check group membership The previous group_search() caused trouble for partners with module builds. in_egroup_p() is also cleaner. Signed-off-by: Nick Pelly <npelly@google.com> Fix 2.6.29 build. Signed-off-by: Arve Hjønnevåg <arve@android.com> net: Fix compilation of the IPv6 module Fix compilation of the IPv6 module -- current->euid does not exist anymore, current_euid() is what needs to be used. Signed-off-by: Steinar H. Gunderson <sesse@google.com> net: bluetooth: Remove the AID_NET_BT* gid numbers Removed bluetooth checks for AID_NET_BT and AID_NET_BT_ADMIN which are not useful anymore. This is in preparation for getting rid of all the AID_* gids. Signed-off-by: JP Abgrall <jpa@google.com>
|
f4713a3dfad045d46afcb9c2a7d0bba288920ed4 |
|
26-Nov-2014 |
Willem de Bruijn <willemb@google.com> |
net-timestamp: make tcp_recvmsg call ipv6_recv_error for AF_INET6 socks TCP timestamping introduced MSG_ERRQUEUE handling for TCP sockets. If the socket is of family AF_INET6, call ipv6_recv_error instead of ip_recv_error. This change is more complex than a single branch due to the loadable ipv6 module. It reuses a pre-existing indirect function call from ping. The ping code is safe to call, because it is part of the core ipv6 module and always present when AF_INET6 sockets are active. Fixes: 4ed2d765 (net-timestamp: TCP timestamping) Signed-off-by: Willem de Bruijn <willemb@google.com> ---- It may also be worthwhile to add WARN_ON_ONCE(sk->family == AF_INET6) to ip_recv_error. Signed-off-by: David S. Miller <davem@davemloft.net>
|
1e16aa3ddf863c6b9f37eddf52503230a62dedb3 |
|
20-Oct-2014 |
Florian Westphal <fw@strlen.de> |
net: gso: use feature flag argument in all protocol gso handlers skb_gso_segment() has a 'features' argument representing offload features available to the output path. A few handlers, e.g. GRE, instead re-fetch the features of skb->dev and use those instead of the provided ones when handing encapsulation/tunnels. Depending on dev->hw_enc_features of the output device skb_gso_segment() can then return NULL even when the caller has disabled all GSO feature bits, as segmentation of inner header thinks device will take care of segmentation. This e.g. affects the tbf scheduler, which will silently drop GRE-encap GSO skbs that did not fit the remaining token quota as the segmentation does not work when device supports corresponding hw offload capabilities. Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
2c804d0f8fc7799981d9fdd8c88653541b28c1a7 |
|
01-Oct-2014 |
Eric Dumazet <edumazet@google.com> |
ipv4: mentions skb_gro_postpull_rcsum() in inet_gro_receive() Proper CHECKSUM_COMPLETE support needs to adjust skb->csum when we remove one header. Its done using skb_gro_postpull_rcsum() In the case of IPv4, we know that the adjustment is not really needed, because the checksum over IPv4 header is 0. Lets add a comment to ease code comprehension and avoid copy/paste errors. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
53e50398968d43338c4d932114e68bc099fc5fbd |
|
20-Sep-2014 |
Tom Herbert <therbert@google.com> |
net: Remove gso_send_check as an offload callback The send_check logic was only interesting in cases of TCP offload and UDP UFO where the checksum needed to be initialized to the pseudo header checksum. Now we've moved that logic into the related gso_segment functions so gso_send_check is no longer needed. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
9667e9bb3f366435dde74f22578876daae850feb |
|
09-Sep-2014 |
Tom Herbert <therbert@google.com> |
ipip: Add gro callbacks to ipip offload Add inet_gro_receive and inet_gro_complete to ipip_offload to support GRO. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
49a601589caaf0e93194c0cc9b4ecddbe75dd2d5 |
|
05-Sep-2014 |
Vincent Bernat <vincent@bernat.im> |
net/ipv4: bind ip_nonlocal_bind to current netns net.ipv4.ip_nonlocal_bind sysctl was global to all network namespaces. This patch allows to set a different value for each network namespace. Signed-off-by: Vincent Bernat <vincent@bernat.im> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c3caf1192f904de2f1381211f564537235d50de3 |
|
15-Jul-2014 |
Jerry Chu <hkchu@google.com> |
net-gre-gro: Fix a bug that breaks the forwarding path Fixed a bug that was introduced by my GRE-GRO patch (bf5a755f5e9186406bbf50f4087100af5bd68e40 net-gre-gro: Add GRE support to the GRO stack) that breaks the forwarding path because various GSO related fields were not set. The bug will cause on the egress path either the GSO code to fail, or a GRE-TSO capable (NETIF_F_GSO_GRE) NICs to choke. The following fix has been tested for both cases. Signed-off-by: H.K. Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
4749c09c37030ccdc44aecebe0f71b02a377fc14 |
|
05-Jun-2014 |
Tom Herbert <therbert@google.com> |
gre: Call gso_make_checksum Call gso_make_checksum. This should have the benefit of using a checksum that may have been previously computed for the packet. This also adds NETIF_F_GSO_GRE_CSUM to differentiate devices that offload GRE GSO with and without the GRE checksum offloaed. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
0f4f4ffa7b7c3d29d0537a126145c9f8d8ed5dbc |
|
05-Jun-2014 |
Tom Herbert <therbert@google.com> |
net: Add GSO support for UDP tunnels with checksum Added a new netif feature for GSO_UDP_TUNNEL_CSUM. This indicates that a device is capable of computing the UDP checksum in the encapsulating header of a UDP tunnel. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
b26ba202e0500eb852e89499ece1b2deaa64c3a7 |
|
23-May-2014 |
Tom Herbert <therbert@google.com> |
net: Eliminate no_check from protosw It doesn't seem like an protocols are setting anything other than the default, and allowing to arbitrarily disable checksums for a whole protocol seems dangerous. This can be done on a per socket basis. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
122ff243f5f104194750ecbc76d5946dd1eec934 |
|
13-May-2014 |
WANG Cong <xiyou.wangcong@gmail.com> |
ipv4: make ip_local_reserved_ports per netns ip_local_port_range is already per netns, so should ip_local_reserved_ports be. And since it is none by default we don't actually need it when we don't enable CONFIG_SYSCTL. By the way, rename inet_is_reserved_local_port() to inet_is_local_reserved_port() Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
ba6b918ab234186d3aa1663e296586a1b526b77a |
|
06-May-2014 |
Cong Wang <xiyou.wangcong@gmail.com> |
ping: move ping_group_range out of CONFIG_SYSCTL Similarly, when CONFIG_SYSCTL is not set, ping_group_range should still work, just that no one can change it. Therefore we should move it out of sysctl_net_ipv4.c. And, it should not share the same seqlock with ip_local_port_range. BTW, rename it to ->ping_group_range instead. Cc: David S. Miller <davem@davemloft.net> Cc: Francois Romieu <romieu@fr.zoreil.com> Reported-by: Stefan de Konink <stefan@konink.de> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c9d8f1a64225dfcc2f721d73a5984a2444920744 |
|
06-May-2014 |
Cong Wang <xiyou.wangcong@gmail.com> |
ipv4: move local_port_range out of CONFIG_SYSCTL When CONFIG_SYSCTL is not set, ip_local_port_range should still work, just that no one can change it. Therefore we should move it out of sysctl_inet.c. Also, rename it to ->ip_local_ports instead. Cc: David S. Miller <davem@davemloft.net> Cc: Francois Romieu <romieu@fr.zoreil.com> Reported-by: Stefan de Konink <stefan@konink.de> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
698365fa1874aa7635d51667a34a2842228e9837 |
|
06-May-2014 |
WANG Cong <xiyou.wangcong@gmail.com> |
net: clean up snmp stats code commit 8f0ea0fe3a036a47767f9c80e (snmp: reduce percpu needs by 50%) reduced snmp array size to 1, so technically it doesn't have to be an array any more. What's more, after the following commit: commit 933393f58fef9963eac61db8093689544e29a600 Date: Thu Dec 22 11:58:51 2011 -0600 percpu: Remove irqsafe_cpu_xxx variants We simply say that regular this_cpu use must be safe regardless of preemption and interrupt state. That has no material change for x86 and s390 implementations of this_cpu operations. However, arches that do not provide their own implementation for this_cpu operations will now get code generated that disables interrupts instead of preemption. probably no arch wants to have SNMP_ARRAY_SZ == 2. At least after almost 3 years, no one complains. So, just convert the array to a single pointer and remove snmp_mib_init() and snmp_mib_free() as well. Cc: Christoph Lameter <cl@linux.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
57a7744e09867ebcfa0ccf1d6d529caa7728d552 |
|
14-Mar-2014 |
Eric W. Biederman <ebiederm@xmission.com> |
net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irq Replace the bh safe variant with the hard irq safe variant. We need a hard irq safe variant to deal with netpoll transmitting packets from hard irq context, and we need it in most if not all of the places using the bh safe variant. Except on 32bit uni-processor the code is exactly the same so don't bother with a bh variant, just have a hard irq safe variant that everyone can use. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
91a48a2e85a3b70ce10ead34b4ab5347f8d215c9 |
|
24-Feb-2014 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv4: ipv6: better estimate tunnel header cut for correct ufo handling Currently the UFO fragmentation process does not correctly handle inner UDP frames. (The following tcpdumps are captured on the parent interface with ufo disabled while tunnel has ufo enabled, 2000 bytes payload, mtu 1280, both sit device): IPv6: 16:39:10.031613 IP (tos 0x0, ttl 64, id 3208, offset 0, flags [DF], proto IPv6 (41), length 1300) 192.168.122.151 > 1.1.1.1: IP6 (hlim 64, next-header Fragment (44) payload length: 1240) 2001::1 > 2001::8: frag (0x00000001:0|1232) 44883 > distinct: UDP, length 2000 16:39:10.031709 IP (tos 0x0, ttl 64, id 3209, offset 0, flags [DF], proto IPv6 (41), length 844) 192.168.122.151 > 1.1.1.1: IP6 (hlim 64, next-header Fragment (44) payload length: 784) 2001::1 > 2001::8: frag (0x00000001:0|776) 58979 > 46366: UDP, length 5471 We can see that fragmentation header offset is not correctly updated. (fragmentation id handling is corrected by 916e4cf46d0204 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")). IPv4: 16:39:57.737761 IP (tos 0x0, ttl 64, id 3209, offset 0, flags [DF], proto IPIP (4), length 1296) 192.168.122.151 > 1.1.1.1: IP (tos 0x0, ttl 64, id 57034, offset 0, flags [none], proto UDP (17), length 1276) 192.168.99.1.35961 > 192.168.99.2.distinct: UDP, length 2000 16:39:57.738028 IP (tos 0x0, ttl 64, id 3210, offset 0, flags [DF], proto IPIP (4), length 792) 192.168.122.151 > 1.1.1.1: IP (tos 0x0, ttl 64, id 57035, offset 0, flags [none], proto UDP (17), length 772) 192.168.99.1.13531 > 192.168.99.2.20653: UDP, length 51109 In this case fragmentation id is incremented and offset is not updated. First, I aligned inet_gso_segment and ipv6_gso_segment: * align naming of flags * ipv6_gso_segment: setting skb->encapsulation is unnecessary, as we always ensure that the state of this flag is left untouched when returning from upper gso segmenation function * ipv6_gso_segment: move skb_reset_inner_headers below updating the fragmentation header data, we don't care for updating fragmentation header data * remove currently unneeded comment indicating skb->encapsulation might get changed by upper gso_segment callback (gre and udp-tunnel reset encapsulation after segmentation on each fragment) If we encounter an IPIP or SIT gso skb we now check for the protocol == IPPROTO_UDP and that we at least have already traversed another ip(6) protocol header. The reason why we have to special case GSO_IPIP and GSO_SIT is that we reset skb->encapsulation to 0 while skb_mac_gso_segment the inner protocol of GSO_UDP_TUNNEL or GSO_GRE packets. Reported-by: Wolfgang Walter <linux@stwm.de> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Tom Herbert <therbert@google.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
8ed1dc44d3e9e8387a104b1ae8f92e9a3fbf1b1e |
|
09-Jan-2014 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv4: introduce hardened ip_no_pmtu_disc mode This new ip_no_pmtu_disc mode only allowes fragmentation-needed errors to be honored by protocols which do more stringent validation on the ICMP's packet payload. This knob is useful for people who e.g. want to run an unmodified DNS server in a namespace where they need to use pmtu for TCP connections (as they are used for zone transfers or fallback for requests) but don't want to use possibly spoofed UDP pmtu information. Currently the whitelisted protocols are TCP, SCTP and DCCP as they check if the returned packet is in the window or if the association is valid. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: John Heffner <johnwheffner@gmail.com> Suggested-by: Florian Weimer <fweimer@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
bf5a755f5e9186406bbf50f4087100af5bd68e40 |
|
07-Jan-2014 |
Jerry Chu <hkchu@google.com> |
net-gre-gro: Add GRE support to the GRO stack This patch built on top of Commit 299603e8370a93dd5d8e8d800f0dff1ce2c53d36 ("net-gro: Prepare GRO stack for the upcoming tunneling support") to add the support of the standard GRE (RFC1701/RFC2784/RFC2890) to the GRO stack. It also serves as an example for supporting other encapsulation protocols in the GRO stack in the future. The patch supports version 0 and all the flags (key, csum, seq#) but will flush any pkt with the S (seq#) flag. This is because the S flag is not support by GSO, and a GRO pkt may end up in the forwarding path, thus requiring GSO support to break it up correctly. Currently the "packet_offload" structure only contains L3 (ETH_P_IP/ ETH_P_IPV6) GRO offload support so the encapped pkts are limited to IP pkts (i.e., w/o L2 hdr). But support for other protocol type can be easily added, so is the support for GRE variations like NVGRE. The patch also support csum offload. Specifically if the csum flag is on and the h/w is capable of checksumming the payload (CHECKSUM_COMPLETE), the code will take advantage of the csum computed by the h/w when validating the GRE csum. Note that commit 60769a5dcd8755715c7143b4571d5c44f01796f1 "ipv4: gre: add GRO capability" already introduces GRO capability to IPv4 GRE tunnels, using the gro_cells infrastructure. But GRO is done after GRE hdr has been removed (i.e., decapped). The following patch applies GRO when pkts first come in (before hitting the GRE tunnel code). There is some performance advantage for applying GRO as early as possible. Also this approach is transparent to other subsystem like Open vSwitch where GRE decap is handled outside of the IP stack hence making it harder for the gro_cells stuff to apply. On the other hand, some NICs are still not capable of hashing on the inner hdr of a GRE pkt (RSS). In that case the GRO processing of pkts from the same remote host will all happen on the same CPU and the performance may be suboptimal. I'm including some rough preliminary performance numbers below. Note that the performance will be highly dependent on traffic load, mix as usual. Moreover it also depends on NIC offload features hence the following is by no means a comprehesive study. Local testing and tuning will be needed to decide the best setting. All tests spawned 50 copies of netperf TCP_STREAM and ran for 30 secs. (super_netperf 50 -H 192.168.1.18 -l 30) An IP GRE tunnel with only the key flag on (e.g., ip tunnel add gre1 mode gre local 10.246.17.18 remote 10.246.17.17 ttl 255 key 123) is configured. The GRO support for pkts AFTER decap are controlled through the device feature of the GRE device (e.g., ethtool -K gre1 gro on/off). 1.1 ethtool -K gre1 gro off; ethtool -K eth0 gro off thruput: 9.16Gbps CPU utilization: 19% 1.2 ethtool -K gre1 gro on; ethtool -K eth0 gro off thruput: 5.9Gbps CPU utilization: 15% 1.3 ethtool -K gre1 gro off; ethtool -K eth0 gro on thruput: 9.26Gbps CPU utilization: 12-13% 1.4 ethtool -K gre1 gro on; ethtool -K eth0 gro on thruput: 9.26Gbps CPU utilization: 10% The following tests were performed on a different NIC that is capable of csum offload. I.e., the h/w is capable of computing IP payload csum (CHECKSUM_COMPLETE). 2.1 ethtool -K gre1 gro on (hence will use gro_cells) 2.1.1 ethtool -K eth0 gro off; csum offload disabled thruput: 8.53Gbps CPU utilization: 9% 2.1.2 ethtool -K eth0 gro off; csum offload enabled thruput: 8.97Gbps CPU utilization: 7-8% 2.1.3 ethtool -K eth0 gro on; csum offload disabled thruput: 8.83Gbps CPU utilization: 5-6% 2.1.4 ethtool -K eth0 gro on; csum offload enabled thruput: 8.98Gbps CPU utilization: 5% 2.2 ethtool -K gre1 gro off 2.2.1 ethtool -K eth0 gro off; csum offload disabled thruput: 5.93Gbps CPU utilization: 9% 2.2.2 ethtool -K eth0 gro off; csum offload enabled thruput: 5.62Gbps CPU utilization: 8% 2.2.3 ethtool -K eth0 gro on; csum offload disabled thruput: 7.69Gbps CPU utilization: 8% 2.2.4 ethtool -K eth0 gro on; csum offload enabled thruput: 8.96Gbps CPU utilization: 5-6% Signed-off-by: H.K. Jerry Chu <hkchu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
974eda11c54290a1be8f8b155edae7d791e5ce57 |
|
14-Dec-2013 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
inet: make no_pmtu_disc per namespace and kill ipv4_config The other field in ipv4_config, log_martians, was converted to a per-interface setting, so we can just remove the whole structure. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
299603e8370a93dd5d8e8d800f0dff1ce2c53d36 |
|
12-Dec-2013 |
Jerry Chu <hkchu@google.com> |
net-gro: Prepare GRO stack for the upcoming tunneling support This patch modifies the GRO stack to avoid the use of "network_header" and associated macros like ip_hdr() and ipv6_hdr() in order to allow an arbitary number of IP hdrs (v4 or v6) to be used in the encapsulation chain. This lays the foundation for various IP tunneling support (IP-in-IP, GRE, VXLAN, SIT,...) to be added later. With this patch, the GRO stack traversing now is mostly based on skb_gro_offset rather than special hdr offsets saved in skb (e.g., skb->network_header). As a result all but the top layer (i.e., the the transport layer) must have hdrs of the same length in order for a pkt to be considered for aggregation. Therefore when adding a new encap layer (e.g., for tunneling), one must check and skip flows (e.g., by setting NAPI_GRO_CB(p)->same_flow to 0) that have a different hdr length. Note that unlike the network header, the transport header can and will continue to be set by the GRO code since there will be at most one "transport layer" in the encap chain. Signed-off-by: H.K. Jerry Chu <hkchu@google.com> Suggested-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
0e0d44ab4275549998567cd4700b43f7496eb62b |
|
28-Aug-2013 |
Steffen Klassert <steffen.klassert@secunet.com> |
net: Remove FLOWI_FLAG_CAN_SLEEP FLOWI_FLAG_CAN_SLEEP was used to notify xfrm about the posibility to sleep until the needed states are resolved. This code is gone, so FLOWI_FLAG_CAN_SLEEP is not needed anymore. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
dcd607718385d02ce3741de225927a57f528f93b |
|
08-Nov-2013 |
Eric Dumazet <edumazet@google.com> |
inet: fix a UFO regression While testing virtio_net and skb_segment() changes, Hannes reported that UFO was sending wrong frames. It appears this was introduced by a recent commit : 8c3a897bfab1 ("inet: restore gso for vxlan") The old condition to perform IP frag was : tunnel = !!skb->encapsulation; ... if (!tunnel && proto == IPPROTO_UDP) { So the new one should be : udpfrag = !skb->encapsulation && proto == IPPROTO_UDP; ... if (udpfrag) { Initialization of udpfrag must be done before call to ops->callbacks.gso_segment(skb, features), as skb_udp_tunnel_segment() clears skb->encapsulation (We want udpfrag to be true for UFO, false for VXLAN) With help from Alexei Starovoitov Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
827da44c61419f29ae3be198c342e2147f1a10cb |
|
08-Oct-2013 |
John Stultz <john.stultz@linaro.org> |
net: Explicitly initialize u64_stats_sync structures for lockdep In order to enable lockdep on seqcount/seqlock structures, we must explicitly initialize any locks. The u64_stats_sync structure, uses a seqcount, and thus we need to introduce a u64_stats_init() function and use it to initialize the structure. This unfortunately adds a lot of fairly trivial initialization code to a number of drivers. But the benefit of ensuring correctness makes this worth while. Because these changes are required for lockdep to be enabled, and the changes are quite trivial, I've not yet split this patch out into 30-some separate patches, as I figured it would be better to get the various maintainers thoughts on how to best merge this change along with the seqcount lockdep enablement. Feedback would be appreciated! Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: James Morris <jmorris@namei.org> Cc: Jesse Gross <jesse@nicira.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Mirko Lindner <mlindner@marvell.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Roger Luethi <rl@hellgate.ch> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Simon Horman <horms@verge.net.au> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Wensong Zhang <wensong@linux-vs.org> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
8c3a897bfab10f68f90252440bb29e6749a7312a |
|
28-Oct-2013 |
Eric Dumazet <edumazet@google.com> |
inet: restore gso for vxlan Alexei reported a performance regression on vxlan, caused by commit 3347c9602955 "ipv4: gso: make inet_gso_segment() stackable" GSO vxlan packets were not properly segmented, adding IP fragments while they were not expected. Rename 'bool tunnel' to 'bool encap', and add a new boolean to express the fact that UDP should be fragmented. This fragmentation is triggered by skb->encapsulation being set. Remove a "skb->encapsulation = 1" added in above commit, as its not needed, as frags inherit skb->frag from original GSO skb. Reported-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
61c1db7fae21ed33c614356a43bf6580c5e53118 |
|
21-Oct-2013 |
Eric Dumazet <edumazet@google.com> |
ipv6: sit: add GSO/TSO support Now ipv6_gso_segment() is stackable, its relatively easy to implement GSO/TSO support for SIT tunnels Performance results, when segmentation is done after tunnel device (as no NIC is yet enabled for TSO SIT support) : Before patch : lpq84:~# ./netperf -H 2002:af6:1153:: -Cc MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:1153:: () port 0 AF_INET6 Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 87380 16384 16384 10.00 3168.31 4.81 4.64 2.988 2.877 After patch : lpq84:~# ./netperf -H 2002:af6:1153:: -Cc MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:1153:: () port 0 AF_INET6 Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 87380 16384 16384 10.00 5525.00 7.76 5.17 2.763 1.840 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
a4fe34bf902b8f709c635ab37f1f39de0b86cff2 |
|
20-Oct-2013 |
Eric W. Biederman <ebiederm@xmission.com> |
tcp_memcontrol: Remove the per netns control. The code that is implemented is per memory cgroup not per netns, and having per netns bits is just confusing. Remove the per netns bits to make it easier to see what is really going on. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
1bbdceef1e535add893bf71d7b7ab102e4eb69eb |
|
19-Oct-2013 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
inet: convert inet_ehash_secret and ipv6_hash_secret to net_get_random_once Initialize the ehash and ipv6_hash_secrets with net_get_random_once. Each compilation unit gets its own secret now: ipv4/inet_hashtables.o ipv4/udp.o ipv6/inet6_hashtables.o ipv6/udp.o rds/connection.o The functions still get inlined into the hashing functions. In the fast path we have at most two (needed in ipv6) if (unlikely(...)). Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
cb32f511a70be8967ac9025cf49c44324ced9a39 |
|
19-Oct-2013 |
Eric Dumazet <edumazet@google.com> |
ipip: add GSO/TSO support Now inet_gso_segment() is stackable, its relatively easy to implement GSO/TSO support for IPIP Performance results, when segmentation is done after tunnel device (as no NIC is yet enabled for TSO IPIP support) : Before patch : lpq83:~# ./netperf -H 7.7.9.84 -Cc MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.9.84 () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 87380 16384 16384 10.00 3357.88 5.09 3.70 2.983 2.167 After patch : lpq83:~# ./netperf -H 7.7.9.84 -Cc MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.9.84 () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 87380 16384 16384 10.00 7710.19 4.52 6.62 1.152 1.687 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
3347c960295583eee3fd58e5c539fb1972fbc005 |
|
19-Oct-2013 |
Eric Dumazet <edumazet@google.com> |
ipv4: gso: make inet_gso_segment() stackable In order to support GSO on IPIP, we need to make inet_gso_segment() stackable. It should not assume network header starts right after mac header. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
47d27aad44169372f358cda88a223883f6760fa5 |
|
18-Oct-2013 |
Eric Dumazet <edumazet@google.com> |
ipv4: gso: send_check() & segment() cleanups inet_gso_segment() and inet_gso_send_check() are called by skb_mac_gso_segment() under rcu lock, no need to use rcu_read_lock() / rcu_read_unlock() Avoid calling ip_hdr() twice per function. We can use ip_send_check() helper. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
421b3885bf6d56391297844f43fb7154a6396e12 |
|
07-Oct-2013 |
Shawn Bohrer <sbohrer@rgmadvisors.com> |
udp: ipv4: Add udp early demux The removal of the routing cache introduced a performance regression for some UDP workloads since a dst lookup must be done for each packet. This change caches the dst per socket in a similar manner to what we do for TCP by implementing early_demux. For UDP multicast we can only cache the dst if there is only one receiving socket on the host. Since caching only works when there is one receiving socket we do the multicast socket lookup using RCU. For UDP unicast we only demux sockets with an exact match in order to not break forwarding setups. Additionally since the hash chains may be long we only check the first socket to see if it is a match and not waste extra time searching the whole chain when we might not find an exact match. Benchmark results from a netperf UDP_RR test: Before 87961.22 transactions/s After 89789.68 transactions/s Benchmark results from a fio 1 byte UDP multicast pingpong test (Multicast one way unicast response): Before 12.97us RTT After 12.63us RTT Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
9a3bab6b05383f1e4c3716b3615500c51285959e |
|
24-Sep-2013 |
Eric Dumazet <edumazet@google.com> |
net: net_secret should not depend on TCP A host might need net_secret[] and never open a single socket. Problem added in commit aebda156a570782 ("net: defer net_secret[] initialization") Based on prior patch from Hannes Frederic Sowa. Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@strressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
5a17a390de7bdbcfff9b8f344273a886ca4cf8bf |
|
02-Sep-2013 |
Cong Wang <amwang@redhat.com> |
net: make snmp_mib_free static inline Fengguang reported: net/built-in.o: In function `in6_dev_finish_destroy': (.text+0x4ca7d): undefined reference to `snmp_mib_free' this is due to snmp_mib_free() is defined when CONFIG_INET is enabled, but in6_dev_finish_destroy() is now moved to core kernel. I think snmp_mib_free() is small enough to be inlined, so just make it static inline. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
5b9b6263775d45ccc0c8b27344bfb1a97cf6f725 |
|
12-Jun-2013 |
Eric Dumazet <edumazet@google.com> |
gro: remove a sparse error Fix following sparse error : net/ipv4/af_inet.c:1410:59: warning: restricted __be16 degrades to integer added in commit db8caf3dbc77599 ("gro: should aggregate frames without DF") Reported-by: kbuild test robot <fengguang.wu@intel.com> From: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
da5bab079f9b7d90ba234965a14914ace55e45e9 |
|
08-Jun-2013 |
Daniel Borkmann <dborkman@redhat.com> |
net: udp4: move GSO functions to udp_offload Similarly to TCP offloading and UDPv6 offloading, move all related UDPv4 functions to udp_offload.c to make things more explicit. Also, by this, we can make those functions static. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
28850dc7c71da9d0c0e39246e9ff6913f41f8d0a |
|
07-Jun-2013 |
Daniel Borkmann <dborkman@redhat.com> |
net: tcp: move GRO/GSO functions to tcp_offload Would be good to make things explicit and move those functions to a new file called tcp_offload.c, thus make this similar to tcpv6_offload.c. While moving all related functions into tcp_offload.c, we can also make some of them static, since they are only used there. Also, add an explicit registration function. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
db8caf3dbc77599dc90f4ea0a803cd1d97116f30 |
|
31-May-2013 |
Eric Dumazet <edumazet@google.com> |
gro: should aggregate frames without DF GRO on IPv4 doesn't aggregate frames if they don't have DF bit set. Some servers use IP_MTU_DISCOVER/IP_PMTUDISC_PROBE, so linux receivers are unable to aggregate this kind of traffic. The right thing to do is to allow aggregation as long as the DF bit has same value on all segments. bnx2x LRO does this correctly. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jerry Chu <hkchu@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Ben Hutchings <bhutchings@solarflare.com> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
0d89d2035fe063461a5ddb609b2c12e7fb006e44 |
|
23-May-2013 |
Simon Horman <horms@verge.net.au> |
MPLS: Add limited GSO support In the case where a non-MPLS packet is received and an MPLS stack is added it may well be the case that the original skb is GSO but the NIC used for transmit does not support GSO of MPLS packets. The aim of this code is to provide GSO in software for MPLS packets whose skbs are GSO. SKB Usage: When an implementation adds an MPLS stack to a non-MPLS packet it should do the following to skb metadata: * Set skb->inner_protocol to the old non-MPLS ethertype of the packet. skb->inner_protocol is added by this patch. * Set skb->protocol to the new MPLS ethertype of the packet. * Set skb->network_header to correspond to the end of the L3 header, including the MPLS label stack. I have posted a patch, "[PATCH v3.29] datapath: Add basic MPLS support to kernel" which adds MPLS support to the kernel datapath of Open vSwtich. That patch sets the above requirements in datapath/actions.c:push_mpls() and was used to exercise this code. The datapath patch is against the Open vSwtich tree but it is intended that it be added to the Open vSwtich code present in the mainline Linux kernel at some point. Features: I believe that the approach that I have taken is at least partially consistent with the handling of other protocols. Jesse, I understand that you have some ideas here. I am more than happy to change my implementation. This patch adds dev->mpls_features which may be used by devices to advertise features supported for MPLS packets. A new NETIF_F_MPLS_GSO feature is added for devices which support hardware MPLS GSO offload. Currently no devices support this and MPLS GSO always falls back to software. Alternate Implementation: One possible alternate implementation is to teach netif_skb_features() and skb_network_protocol() about MPLS, in a similar way to their understanding of VLANs. I believe this would avoid the need for net/mpls/mpls_gso.c and in particular the calls to __skb_push() and __skb_push() in mpls_gso_segment(). I have decided on the implementation in this patch as it should not introduce any overhead in the case where mpls_gso is not compiled into the kernel or inserted as a module. MPLS GSO suggested by Jesse Gross. Based in part on "v4 GRE: Add TCP segmentation offload for GRE" by Pravin B Shelar. Cc: Jesse Gross <jesse@nicira.com> Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
9b3eb5edf33897dc9128aa27300066153d4f8b9c |
|
02-May-2013 |
Pravin B Shelar <pshelar@nicira.com> |
gre: Fix GREv4 TCPv6 segmentation. For ipv6 traffic, GRE can generate packet with strange GSO bits, e.g. ipv4 packet with SKB_GSO_TCPV6 flag set. Therefore following patch relaxes check in inet gso handler to allow such packet for segmentation. This patch also fixes wrong skb->protocol set that was done in gre_gso_segment() handler. Reported-by: Steinar H. Gunderson <sesse@google.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
aebda156a570782a86fc4426842152237a19427d |
|
29-Apr-2013 |
Eric Dumazet <edumazet@google.com> |
net: defer net_secret[] initialization Instead of feeding net_secret[] at boot time, defer the init at the point first socket is created. This permits some platforms to use better entropy sources than the ones available at boot time. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
330305cc4a6b0cb75c22fc01b8826f0ad755550f |
|
24-Mar-2013 |
Pravin B Shelar <pshelar@nicira.com> |
ipv4: Fix ip-header identification for gso packets. ip-header id needs to be incremented even if IP_DF flag is set. This behaviour was changed in commit 490ab08127cebc25e3a26 (IP_GRE: Fix IP-Identification). Following patch fixes it so that identification is always incremented. Reported-by: Cong Wang <amwang@redhat.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c54419321455631079c7d6e60bc732dd0c5914c5 |
|
25-Mar-2013 |
Pravin B Shelar <pshelar@nicira.com> |
GRE: Refactor GRE tunneling code. Following patch refactors GRE code into ip tunneling code and GRE specific code. Common tunneling code is moved to ip_tunnel module. ip_tunnel module is written as generic library which can be used by different tunneling implementations. ip_tunnel module contains following components: - packet xmit and rcv generic code. xmit flow looks like (gre_xmit/ipip_xmit)->ip_tunnel_xmit->ip_local_out. - hash table of all devices. - lookup for tunnel devices. - control plane operations like device create, destroy, ioctl, netlink operations code. - registration for tunneling modules, like gre, ipip etc. - define single pcpu_tstats dev->tstats. - struct tnl_ptk_info added to pass parsed tunnel packet parameters. ipip.h header is renamed to ip_tunnel.h Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
25c7704d8bdcf6746dab4397953df51759924b37 |
|
24-Mar-2013 |
Pravin B Shelar <pshelar@nicira.com> |
ipv4: Fix ip-header identification for gso packets. ip-header id needs to be incremented even if IP_DF flag is set. This behaviour was changed in commit 490ab08127cebc25e3a26 (IP_GRE: Fix IP-Identification). Following patch fixes it so that identification is always incremented. Reported-by: Cong Wang <amwang@redhat.com> Acked-by: Cong Wang <amwang@redhat.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
|
731362674580cb0c696cd1b1a03d8461a10cf90a |
|
07-Mar-2013 |
Pravin B Shelar <pshelar@nicira.com> |
tunneling: Add generic Tunnel segmentation. Adds generic tunneling offloading support for IPv4-UDP based tunnels. GSO type is added to request this offload for a skb. netdev feature NETIF_F_UDP_TUNNEL is added for hardware offloaded udp-tunnel support. Currently no device supports this feature, software offload is used. This can be used by tunneling protocols like VXLAN. CC: Jesse Gross <jesse@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
ec5f061564238892005257c83565a0b58ec79295 |
|
07-Mar-2013 |
Pravin B Shelar <pshelar@nicira.com> |
net: Kill link between CSUM and SG features. Earlier SG was unset if CSUM was not available for given device to force skb copy to avoid sending inconsistent csum. Commit c9af6db4c11c (net: Fix possible wrong checksum generation) added explicit flag to force copy to fix this issue. Therefore there is no need to link SG and CSUM, following patch kills this link between there two features. This patch is also required following patch in series. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
490ab08127cebc25e3a260a74556b56ce5f47c0f |
|
22-Feb-2013 |
Pravin B Shelar <pshelar@nicira.com> |
IP_GRE: Fix IP-Identification. GRE-GSO generates ip fragments with id 0,2,3,4... for every GSO packet, which is not correct. Following patch fixes it by setting ip-header id unique id of fragments are allowed. As Eric Dumazet suggested it is optimized by using inner ip-header whenever inner packet is ipv4. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
5b0520425e5ea81ba95ec486dd6bbb59a09fff0e |
|
21-Feb-2013 |
Li Wei <lw@cn.fujitsu.com> |
ipv4: fix error handling in icmp_protocol. Now we handle icmp errors in each transport protocol's err_handler, for icmp protocols, that is ping_err. Since this handler only care of those icmp errors triggered by echo request, errors triggered by echo reply(which sent by kernel) are sliently ignored. So wrap ping_err() with icmp_err() to deal with those icmp errors. Signed-off-by: Li Wei <lw@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
08dcdbf6a7b9d14c2302c5bd0c5390ddf122f664 |
|
21-Feb-2013 |
Eric Dumazet <edumazet@google.com> |
ipv6: use a stronger hash for tcp It looks like its possible to open thousands of TCP IPv6 sessions on a server, all landing in a single slot of TCP hash table. Incoming packets have to lookup sockets in a very long list. We should hash all bits from foreign IPv6 addresses, using a salt and hash mix, not a simple XOR. inet6_ehashfn() can also separately use the ports, instead of xoring them. Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
68c331631143f5f039baac99a650e0b9e1ea02b6 |
|
14-Feb-2013 |
Pravin B Shelar <pshelar@nicira.com> |
v4 GRE: Add TCP segmentation offload for GRE Following patch adds GRE protocol offload handler so that skb_gso_segment() can segment GRE packets. SKB GSO CB is added to keep track of total header length so that skb_segment can push entire header. e.g. in case of GRE, skb_segment need to push inner and outer headers to every segment. New NETIF_F_GRE_GSO feature is added for devices which support HW GRE TSO offload. Currently none of devices support it therefore GRE GSO always fall backs to software GSO. [ Compute pkt_len before ip_local_out() invocation. -DaveM ] Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c9af6db4c11ccc6c3e7f19bbc15d54023956f97c |
|
11-Feb-2013 |
Pravin B Shelar <pshelar@nicira.com> |
net: Fix possible wrong checksum generation. Patch cef401de7be8c4e (net: fix possible wrong checksum generation) fixed wrong checksum calculation but it broke TSO by defining new GSO type but not a netdev feature for that type. net_gso_ok() would not allow hardware checksum/segmentation offload of such packets without the feature. Following patch fixes TSO and wrong checksum. This patch uses same logic that Eric Dumazet used. Patch introduces new flag SKBTX_SHARED_FRAG if at least one frag can be modified by the user. but SKBTX_SHARED_FRAG flag is kept in skb shared info tx_flags rather than gso_type. tx_flags is better compared to gso_type since we can have skb with shared frag without gso packet. It does not link SHARED_FRAG to GSO, So there is no need to define netdev feature for this. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
547472b8e1da72ae226430c0c4273e36fc8ca768 |
|
05-Feb-2013 |
David S. Miller <davem@davemloft.net> |
ipv4: Disallow non-namespace aware protocols to register. All in-tree ipv4 protocol implementations are now namespace aware. Therefore all the run-time checks are superfluous. Reject registry of any non-namespace aware ipv4 protocol. Eventually we'll remove prot->netns_ok and this registry time check as well. Signed-off-by: David S. Miller <davem@davemloft.net>
|
cef401de7be8c4e155c6746bfccf721a4fa5fab9 |
|
25-Jan-2013 |
Eric Dumazet <edumazet@google.com> |
net: fix possible wrong checksum generation Pravin Shelar mentioned that GSO could potentially generate wrong TX checksum if skb has fragments that are overwritten by the user between the checksum computation and transmit. He suggested to linearize skbs but this extra copy can be avoided for normal tcp skbs cooked by tcp_sendmsg(). This patch introduces a new SKB_GSO_SHARED_FRAG flag, set in skb_shinfo(skb)->gso_type if at least one frag can be modified by the user. Typical sources of such possible overwrites are {vm}splice(), sendfile(), and macvtap/tun/virtio_net drivers. Tested: $ netperf -H 7.7.8.84 MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 3959.52 $ netperf -H 7.7.8.84 -t TCP_SENDFILE TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 3216.80 Performance of the SENDFILE is impacted by the extra allocation and copy, and because we use order-0 pages, while the TCP_STREAM uses bigger pages. Reported-by: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
50c3a487d507568359ccbce1b15bcdfc01edf7ef |
|
22-Jan-2013 |
YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org> |
ipv4: Use IS_ERR_OR_NULL(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
95c7e0e4d4792f9c9629ad759c9be047a3ddbcc5 |
|
09-Jan-2013 |
YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org> |
ipv4: Use FIELD_SIZEOF() in inet_init(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
3594698a1fb8e5ae60a92c72ce9ca280256939a7 |
|
16-Nov-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
net: Make CAP_NET_BIND_SERVICE per user namespace Allow privileged users in any user namespace to bind to privileged sockets in network namespaces they control. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
52e804c6dfaa5df1e4b0e290357b82ad4e4cda2c |
|
16-Nov-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
net: Allow userns root to control ipv4 Allow an unpriviled user who has created a user namespace, and then created a network namespace to effectively use the new network namespace, by reducing capable(CAP_NET_ADMIN) and capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns, CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls. Settings that merely control a single network device are allowed. Either the network device is a logical network device where restrictions make no difference or the network device is hardware NIC that has been explicity moved from the initial network namespace. In general policy and network stack state changes are allowed while resource control is left unchanged. Allow creating raw sockets. Allow the SIOCSARP ioctl to control the arp cache. Allow the SIOCSIFFLAG ioctl to allow setting network device flags. Allow the SIOCSIFADDR ioctl to allow setting a netdevice ipv4 address. Allow the SIOCSIFBRDADDR ioctl to allow setting a netdevice ipv4 broadcast address. Allow the SIOCSIFDSTADDR ioctl to allow setting a netdevice ipv4 destination address. Allow the SIOCSIFNETMASK ioctl to allow setting a netdevice ipv4 netmask. Allow the SIOCADDRT and SIOCDELRT ioctls to allow adding and deleting ipv4 routes. Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for adding, changing and deleting gre tunnels. Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for adding, changing and deleting ipip tunnels. Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for adding, changing and deleting ipsec virtual tunnel interfaces. Allow setting the MRT_INIT, MRT_DONE, MRT_ADD_VIF, MRT_DEL_VIF, MRT_ADD_MFC, MRT_DEL_MFC, MRT_ASSERT, MRT_PIM, MRT_TABLE socket options on multicast routing sockets. Allow setting and receiving IPOPT_CIPSO, IP_OPT_SEC, IP_OPT_SID and arbitrary ip options. Allow setting IP_SEC_POLICY/IP_XFRM_POLICY ipv4 socket option. Allow setting the IP_TRANSPARENT ipv4 socket option. Allow setting the TCP_REPAIR socket option. Allow setting the TCP_CONGESTION socket option. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
f191a1d17f227032c159e5499809f545402b6dc6 |
|
15-Nov-2012 |
Vlad Yasevich <vyasevic@redhat.com> |
net: Remove code duplication between offload structures Move the offload callbacks into its own structure. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
808a8f884554f93315f663b2694addb4a177c578 |
|
15-Nov-2012 |
Vlad Yasevich <vyasevic@redhat.com> |
ipv4: Pull GSO registration out of inet_init() Since GSO/GRO support is now separated, make IPv4 GSO a stand-alone init call and not part of inet_init(). Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
bca49f843eac59cbb1ddd1f4a5d65fcc23b62efd |
|
15-Nov-2012 |
Vlad Yasevich <vyasevic@redhat.com> |
ipv4: Switch to using the new offload infrastructure. Switch IPv4 code base to using the new GRO/GSO calls and data. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
de27d001d187721de775ed9e0c485aa6f517c745 |
|
15-Nov-2012 |
Vlad Yasevich <vyasevic@redhat.com> |
net: Add net protocol offload registration infrustructure Create a new data structure for IPv4 protocols that holds GRO/GSO callbacks and a new array to track the protocols that register GRO/GSO. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
22061d8014455b01eb018bd6c35a1b3040ccc230 |
|
15-Nov-2012 |
Vlad Yasevich <vyasevic@redhat.com> |
net: Switch to using the new packet offload infrustructure Convert to using the new GSO/GRO registration mechanism and new packet offload structure. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
bb68b64724a4fd6b93d83b39aeffa4aadb2562fc |
|
18-Sep-2012 |
Christoph Paasch <christoph.paasch@uclouvain.be> |
ipv4: Don't add TCP-code in inet_sock_destruct Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Acked-by: H.K. Jerry Chu <hkchu@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
7ab4551f3b391818e29263279031dca1e26417c6 |
|
06-Sep-2012 |
Eric Dumazet <edumazet@google.com> |
tcp: fix TFO regression Fengguang Wu reported various panics and bisected to commit 8336886f786fdac (tcp: TCP Fast Open Server - support TFO listeners) Fix this by making sure socket is a TCP socket before accessing TFO data structures. [ 233.046014] kfree_debugcheck: out of range ptr ea6000000bb8h. [ 233.047399] ------------[ cut here ]------------ [ 233.048393] kernel BUG at /c/kernel-tests/src/stable/mm/slab.c:3074! [ 233.048393] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC [ 233.048393] Modules linked in: [ 233.048393] CPU 0 [ 233.048393] Pid: 3929, comm: trinity-watchdo Not tainted 3.6.0-rc3+ #4192 Bochs Bochs [ 233.048393] RIP: 0010:[<ffffffff81169653>] [<ffffffff81169653>] kfree_debugcheck+0x27/0x2d [ 233.048393] RSP: 0018:ffff88000facbca8 EFLAGS: 00010092 [ 233.048393] RAX: 0000000000000031 RBX: 0000ea6000000bb8 RCX: 00000000a189a188 [ 233.048393] RDX: 000000000000a189 RSI: ffffffff8108ad32 RDI: ffffffff810d30f9 [ 233.048393] RBP: ffff88000facbcb8 R08: 0000000000000002 R09: ffffffff843846f0 [ 233.048393] R10: ffffffff810ae37c R11: 0000000000000908 R12: 0000000000000202 [ 233.048393] R13: ffffffff823dbd5a R14: ffff88000ec5bea8 R15: ffffffff8363c780 [ 233.048393] FS: 00007faa6899c700(0000) GS:ffff88001f200000(0000) knlGS:0000000000000000 [ 233.048393] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 233.048393] CR2: 00007faa6841019c CR3: 0000000012c82000 CR4: 00000000000006f0 [ 233.048393] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 233.048393] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 233.048393] Process trinity-watchdo (pid: 3929, threadinfo ffff88000faca000, task ffff88000faec600) [ 233.048393] Stack: [ 233.048393] 0000000000000000 0000ea6000000bb8 ffff88000facbce8 ffffffff8116ad81 [ 233.048393] ffff88000ff588a0 ffff88000ff58850 ffff88000ff588a0 0000000000000000 [ 233.048393] ffff88000facbd08 ffffffff823dbd5a ffffffff823dbcb0 ffff88000ff58850 [ 233.048393] Call Trace: [ 233.048393] [<ffffffff8116ad81>] kfree+0x5f/0xca [ 233.048393] [<ffffffff823dbd5a>] inet_sock_destruct+0xaa/0x13c [ 233.048393] [<ffffffff823dbcb0>] ? inet_sk_rebuild_header +0x319/0x319 [ 233.048393] [<ffffffff8231c307>] __sk_free+0x21/0x14b [ 233.048393] [<ffffffff8231c4bd>] sk_free+0x26/0x2a [ 233.048393] [<ffffffff825372db>] sctp_close+0x215/0x224 [ 233.048393] [<ffffffff810d6835>] ? lock_release+0x16f/0x1b9 [ 233.048393] [<ffffffff823daf12>] inet_release+0x7e/0x85 [ 233.048393] [<ffffffff82317d15>] sock_release+0x1f/0x77 [ 233.048393] [<ffffffff82317d94>] sock_close+0x27/0x2b [ 233.048393] [<ffffffff81173bbe>] __fput+0x101/0x20a [ 233.048393] [<ffffffff81173cd5>] ____fput+0xe/0x10 [ 233.048393] [<ffffffff810a3794>] task_work_run+0x5d/0x75 [ 233.048393] [<ffffffff8108da70>] do_exit+0x290/0x7f5 [ 233.048393] [<ffffffff82707415>] ? retint_swapgs+0x13/0x1b [ 233.048393] [<ffffffff8108e23f>] do_group_exit+0x7b/0xba [ 233.048393] [<ffffffff8108e295>] sys_exit_group+0x17/0x17 [ 233.048393] [<ffffffff8270de10>] tracesys+0xdd/0xe2 [ 233.048393] Code: 59 01 5d c3 55 48 89 e5 53 41 50 0f 1f 44 00 00 48 89 fb e8 d4 b0 f0 ff 84 c0 75 11 48 89 de 48 c7 c7 fc fa f7 82 e8 0d 0f 57 01 <0f> 0b 5f 5b 5d c3 55 48 89 e5 0f 1f 44 00 00 48 63 87 d8 00 00 [ 233.048393] RIP [<ffffffff81169653>] kfree_debugcheck+0x27/0x2d [ 233.048393] RSP <ffff88000facbca8> Reported-by: Fengguang Wu <wfg@linux.intel.com> Tested-by: Fengguang Wu <wfg@linux.intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: "H.K. Jerry Chu" <hkchu@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: H.K. Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
8336886f786fdacbc19b719c1f7ea91eb70706d4 |
|
31-Aug-2012 |
Jerry Chu <hkchu@google.com> |
tcp: TCP Fast Open Server - support TFO listeners This patch builds on top of the previous patch to add the support for TFO listeners. This includes - 1. allocating, properly initializing, and managing the per listener fastopen_queue structure when TFO is enabled 2. changes to the inet_csk_accept code to support TFO. E.g., the request_sock can no longer be freed upon accept(), not until 3WHS finishes 3. allowing a TCP_SYN_RECV socket to properly poll() and sendmsg() if it's a TFO socket 4. properly closing a TFO listener, and a TFO socket before 3WHS finishes 5. supporting TCP_FASTOPEN socket option 6. modifying tcp_check_req() to use to check a TFO socket as well as request_sock 7. supporting TCP's TFO cookie option 8. adding a new SYN-ACK retransmit handler to use the timer directly off the TFO socket rather than the listener socket. Note that TFO server side will not retransmit anything other than SYN-ACK until the 3WHS is completed. The patch also contains an important function "reqsk_fastopen_remove()" to manage the somewhat complex relation between a listener, its request_sock, and the corresponding child socket. See the comment above the function for the detail. Signed-off-by: H.K. Jerry Chu <hkchu@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
a9e050f4e7f9d36afe0dcc0bddba864ee442715e |
|
06-Aug-2012 |
Eric Dumazet <edumazet@google.com> |
net: tcp: GRO should be ECN friendly While doing TCP ECN tests, I discovered GRO was reordering packets if it receives one packet with CE set, while previous packets in same NAPI run have ECT(0) for the same flow : 09:25:25.857620 IP (tos 0x2,ECT(0), ttl 64, id 27893, offset 0, flags [DF], proto TCP (6), length 4396) 172.30.42.19.54550 > 172.30.42.13.44139: Flags [.], seq 233801:238145, ack 1, win 115, options [nop,nop,TS val 3397779 ecr 1990627], length 4344 09:25:25.857626 IP (tos 0x3,CE, ttl 64, id 27892, offset 0, flags [DF], proto TCP (6), length 1500) 172.30.42.19.54550 > 172.30.42.13.44139: Flags [.], seq 232353:233801, ack 1, win 115, options [nop,nop,TS val 3397779 ecr 1990627], length 1448 09:25:25.857638 IP (tos 0x0, ttl 64, id 34581, offset 0, flags [DF], proto TCP (6), length 64) 172.30.42.13.44139 > 172.30.42.19.54550: Flags [.], cksum 0xac8f (incorrect -> 0xca69), ack 232353, win 1271, options [nop,nop,TS val 1990627 ecr 3397779,nop,nop,sack 1 {233801:238145}], length 0 We have two problems here : 1) GRO reorders packets If NIC gave packet1, then packet2, which happen to be from "different flows" GRO feeds stack with packet2, then packet1. I have yet to understand how to solve this problem. 2) GRO is not ECN friendly Delivering packets out of order makes TCP stack not as fast as it could be. In this patch I suggest we make the tos test not part of the 'same_flow' determination, but part of the 'should flush' logic Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
cf60af03ca4e71134206809ea892e49b92a88896 |
|
19-Jul-2012 |
Yuchung Cheng <ycheng@google.com> |
net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN) sendmsg() (or sendto()) with MSG_FASTOPEN is a combo of connect(2) and write(2). The application should replace connect() with it to send data in the opening SYN packet. For blocking socket, sendmsg() blocks until all the data are buffered locally and the handshake is completed like connect() call. It returns similar errno like connect() if the TCP handshake fails. For non-blocking socket, it returns the number of bytes queued (and transmitted in the SYN-data packet) if cookie is available. If cookie is not available, it transmits a data-less SYN packet with Fast Open cookie request option and returns -EINPROGRESS like connect(). Using MSG_FASTOPEN on connecting or connected socket will result in simlar errno like repeating connect() calls. Therefore the application should only use this flag on new sockets. The buffer size of sendmsg() is independent of the MSS of the connection. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
783237e8daf13481ee234997cbbbb823872ac388 |
|
19-Jul-2012 |
Yuchung Cheng <ycheng@google.com> |
net-tcp: Fast Open client - sending SYN-data This patch implements sending SYN-data in tcp_connect(). The data is from tcp_sendmsg() with flag MSG_FASTOPEN (implemented in a later patch). The length of the cookie in tcp_fastopen_req, init'd to 0, controls the type of the SYN. If the cookie is not cached (len==0), the host sends data-less SYN with Fast Open cookie request option to solicit a cookie from the remote. If cookie is not available (len > 0), the host sends a SYN-data with Fast Open cookie option. If cookie length is negative, the SYN will not include any Fast Open option (for fall back operations). To deal with middleboxes that may drop SYN with data or experimental TCP option, the SYN-data is only sent once. SYN retransmits do not include data or Fast Open options. The connection will fall back to regular TCP handshake. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
41063e9dd11956f2d285e12e4342e1d232ba0ea2 |
|
20-Jun-2012 |
David S. Miller <davem@davemloft.net> |
ipv4: Early TCP socket demux. Input packet processing for local sockets involves two major demuxes. One for the route and one for the socket. But we can optimize this down to one demux for certain kinds of local sockets. Currently we only do this for established TCP sockets, but it could at least in theory be expanded to other kinds of connections. If a TCP socket is established then it's identity is fully specified. This means that whatever input route was used during the three-way handshake must work equally well for the rest of the connection since the keys will not change. Once we move to established state, we cache the receive packet's input route to use later. Like the existing cached route in sk->sk_dst_cache used for output packets, we have to check for route invalidations using dst->obsolete and dst->ops->check(). Early demux occurs outside of a socket locked section, so when a route invalidation occurs we defer the fixup of sk->sk_rx_dst until we are actually inside of established state packet processing and thus have the socket locked. Signed-off-by: David S. Miller <davem@davemloft.net>
|
f9242b6b28d61295f2bf7e8adfb1060b382e5381 |
|
20-Jun-2012 |
David S. Miller <davem@davemloft.net> |
inet: Sanitize inet{,6} protocol demux. Don't pretend that inet_protos[] and inet6_protos[] are hashes, thay are just a straight arrays. Remove all unnecessary hash masking. Document MAX_INET_PROTOS. Use RAW_HTABLE_SIZE when appropriate. Reported-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
e3192690a3c889767d1161b228374f4926d92af0 |
|
03-Jun-2012 |
Joe Perches <joe@perches.com> |
net: Remove casts to same type Adding casts of objects to the same type is unnecessary and confusing for a human reader. For example, this cast: int y; int *p = (int *)&y; I used the coccinelle script below to find and remove these unnecessary casts. I manually removed the conversions this script produces of casts with __force and __user. @@ type T; T *p; @@ - (T *)p + p Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
4a17fd5229c1b6066aa478f6b690f8293ce811a1 |
|
19-Apr-2012 |
Pavel Emelyanov <xemul@parallels.com> |
sock: Introduce named constants for sk_reuse Name them in a "backward compatible" manner, i.e. reuse or not are still 1 and 0 respectively. The reuse value of 2 means that the socket with it will forcibly reuse everyone else's port. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
5e73ea1a31c3612aa6dfe44f864ca5b7b6a4cff9 |
|
15-Apr-2012 |
Daniel Baluta <dbaluta@ixiacom.com> |
ipv4: fix checkpatch errors Fix checkpatch errors of the following type: * ERROR: "foo * bar" should be "foo *bar" * ERROR: "(foo*)" should be "(foo *)" Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
9ffc93f203c18a70623f21950f1dd473c9ec48cd |
|
28-Mar-2012 |
David Howells <dhowells@redhat.com> |
Remove all #inclusions of asm/system.h Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command: perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *` Signed-off-by: David Howells <dhowells@redhat.com>
|
afd465030acb4098abcb6b965a5aebc7ea2209e0 |
|
12-Mar-2012 |
Joe Perches <joe@perches.com> |
net: ipv4: Standardize prefixes for message logging Add #define pr_fmt(fmt) as appropriate. Add "IPv4: ", "TCP: ", and "IPsec: " to appropriate files. Standardize on "UDPLite: " for appropriate uses. Some prefixes were previously "UDPLITE: " and "UDP-Lite: ". Add KBUILD_MODNAME ": " to icmp and gre. Remove embedded prefixes as appropriate. Add missing "\n" to pr_info in gre.c. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
058bd4d2a4ff0aaa4a5381c67e776729d840c785 |
|
11-Mar-2012 |
Joe Perches <joe@perches.com> |
net: Convert printks to pr_<level> Use a more current kernel messaging style. Convert a printk block to print_hex_dump. Coalesce formats, align arguments. Use %s, __func__ instead of embedding function names. Some messages that were prefixed with <foo>_close are now prefixed with <foo>_fini. Some ah4 and esp messages are now not prefixed with "ip ". The intent of this patch is to later add something like #define pr_fmt(fmt) "IPv4: " fmt. to standardize the output messages. Text size is trivially reduced. (x86-32 allyesconfig) $ size net/ipv4/built-in.o* text data bss dec hex filename 887888 31558 249696 1169142 11d6f6 net/ipv4/built-in.o.new 887934 31558 249800 1169292 11d78c net/ipv4/built-in.o.old Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
4c507d2897bd9be810b3403ade73b04cf6fdfd4a |
|
09-Feb-2012 |
Jiri Benc <jbenc@redhat.com> |
net: implement IP_RECVTOS for IP_PKTOPTIONS Currently, it is not easily possible to get TOS/DSCP value of packets from an incoming TCP stream. The mechanism is there, IP_PKTOPTIONS getsockopt with IP_RECVTOS set, the same way as incoming TTL can be queried. This is not actually implemented for TOS, though. This patch adds this functionality, both for IPv4 (IP_PKTOPTIONS) and IPv6 (IPV6_2292PKTOPTIONS). For IPv4, like in the IP_RECVTTL case, the value of the TOS field is stored from the other party's ACK. This is needed for proxies which require DSCP transparency. One such example is at http://zph.bratcheda.org/. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
3dc43e3e4d0b52197d3205214fe8f162f9e0c334 |
|
11-Dec-2011 |
Glauber Costa <glommer@parallels.com> |
per-netns ipv4 sysctl_tcp_mem This patch allows each namespace to independently set up its levels for tcp memory pressure thresholds. This patch alone does not buy much: we need to make this values per group of process somehow. This is achieved in the patches that follows in this patchset. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c8f44affb7244f2ac3e703cab13d55ede27621bb |
|
15-Nov-2011 |
Michał Mirosław <mirq-linux@rere.qmqm.pl> |
net: introduce and use netdev_features_t for device features sets v2: add couple missing conversions in drivers split unexporting netdev_fix_features() implemented %pNF convert sock::sk_route_(no?)caps Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
acb32ba3dee66d58704caeeb8c6ff95f60efdc66 |
|
08-Nov-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
ipv4: reduce percpu needs for icmpmsg mibs Reading /proc/net/snmp on a machine with a lot of cpus is very expensive (can be ~88000 us). This is because ICMPMSG MIB uses 4096 bytes per cpu, and folding values for all possible cpus can read 16 Mbytes of memory. ICMP messages are not considered as fast path on a typical server, and eventually few cpus handle them anyway. We can afford an atomic operation instead of using percpu data. This saves 4096 bytes per cpu and per network namespace. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
686dc6b64b58e69715ce92177da0732a6464db69 |
|
15-Oct-2011 |
Gerrit Renker <gerrit@erg.abdn.ac.uk> |
ipv4: compat_ioctl is local to af_inet.c, make it static ipv4: compat_ioctl is local to af_inet.c, make it static Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
29c486df6a208432b370bd4be99ae1369ede28d8 |
|
31-Aug-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: ipv4: relax AF_INET check in bind() commit d0733d2e29b65 (Check for mistakenly passed in non-IPv4 address) added regression on legacy apps that use bind() with AF_UNSPEC family. Relax the check, but make sure the bind() is done on INADDR_ANY addresses, as AF_UNSPEC has probably no sane meaning for other addresses. Bugzilla reference : https://bugzilla.kernel.org/show_bug.cgi?id=42012 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-and-bisected-by: Rene Meier <r_meier@freenet.de> CC: Marcus Meissner <meissner@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c349a528cd47e2272ded0ea358363855e86180da |
|
04-Jul-2011 |
Marcus Meissner <meissner@novell.com> |
net: bind() fix error return on wrong address family Hi, Reinhard Max also pointed out that the error should EAFNOSUPPORT according to POSIX. The Linux manpages have it as EINVAL, some other OSes (Minix, HPUX, perhaps BSD) use EAFNOSUPPORT. Windows uses WSAEFAULT according to MSDN. Other protocols error values in their af bind() methods in current mainline git as far as a brief look shows: EAFNOSUPPORT: atm, appletalk, l2tp, llc, phonet, rxrpc EINVAL: ax25, bluetooth, decnet, econet, ieee802154, iucv, netlink, netrom, packet, rds, rose, unix, x25, No check?: can/raw, ipv6/raw, irda, l2tp/l2tp_ip Ciao, Marcus Signed-off-by: Marcus Meissner <meissner@suse.de> Cc: Reinhard Max <max@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
1eddceadb0d6441cd39b2c38705a8f5fec86e770 |
|
17-Jun-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: rfs: enable RFS before first data packet is received Le jeudi 16 juin 2011 à 23:38 -0400, David Miller a écrit : > From: Ben Hutchings <bhutchings@solarflare.com> > Date: Fri, 17 Jun 2011 00:50:46 +0100 > > > On Wed, 2011-06-15 at 04:15 +0200, Eric Dumazet wrote: > >> @@ -1594,6 +1594,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb) > >> goto discard; > >> > >> if (nsk != sk) { > >> + sock_rps_save_rxhash(nsk, skb->rxhash); > >> if (tcp_child_process(sk, nsk, skb)) { > >> rsk = nsk; > >> goto reset; > >> > > > > I haven't tried this, but it looks reasonable to me. > > > > What about IPv6? The logic in tcp_v6_do_rcv() looks very similar. > > Indeed ipv6 side needs the same fix. > > Eric please add that part and resubmit. And in fact I might stick > this into net-2.6 instead of net-next-2.6 > OK, here is the net-2.6 based one then, thanks ! [PATCH v2] net: rfs: enable RFS before first data packet is received First packet received on a passive tcp flow is not correctly RFS steered. One sock_rps_record_flow() call is missing in inet_accept() But before that, we also must record rxhash when child socket is setup. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Tom Herbert <therbert@google.com> CC: Ben Hutchings <bhutchings@solarflare.com> CC: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@conan.davemloft.net>
|
8f0ea0fe3a036a47767f9c80e81b13e379a1f43b |
|
10-Jun-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
snmp: reduce percpu needs by 50% SNMP mibs use two percpu arrays, one used in BH context, another in USER context. With increasing number of cpus in machines, and fact that ipv6 uses per network device ipstats_mib, this is consuming a lot of memory if many network devices are registered. commit be281e554e2a (ipv6: reduce per device ICMP mib sizes) shrinked percpu needs for ipv6, but we can reduce memory use a bit more. With recent percpu infrastructure (irqsafe_cpu_inc() ...), we no longer need this BH/USER separation since we can update counters in a single x86 instruction, regardless of the BH/USER context. Other arches than x86 might need to disable irq in their irqsafe_cpu_inc() implementation : If this happens to be a problem, we can make SNMP_ARRAY_SZ arch dependent, but a previous poll ( https://lkml.org/lkml/2011/3/17/174 ) to arch maintainers did not raise strong opposition. Only on 32bit arches, we need to disable BH for 64bit counters updates done from USER context (currently used for IP MIB) This also reduces vmlinux size : 1) x86_64 build $ size vmlinux.before vmlinux.after text data bss dec hex filename 7853650 1293772 1896448 11043870 a8841e vmlinux.before 7850578 1293772 1896448 11040798 a8781e vmlinux.after 2) i386 build $ size vmlinux.before vmlinux.afterpatch text data bss dec hex filename 6039335 635076 3670016 10344427 9dd7eb vmlinux.before 6037342 635076 3670016 10342434 9dd022 vmlinux.afterpatch Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Andi Kleen <andi@firstfloor.org> CC: Ingo Molnar <mingo@elte.hu> CC: Tejun Heo <tj@kernel.org> CC: Christoph Lameter <cl@linux-foundation.org> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org CC: linux-arch@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
d0733d2e29b652b2e7b1438ececa732e4eed98eb |
|
02-Jun-2011 |
Marcus Meissner <meissner@suse.de> |
net/ipv4: Check for mistakenly passed in non-IPv4 address Check against mistakenly passing in IPv6 addresses (which would result in an INADDR_ANY bind) or similar incompatible sockaddrs. Signed-off-by: Marcus Meissner <meissner@suse.de> Cc: Reinhard Max <max@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c319b4d76b9e583a5d88d6bf190e079c4e43213d |
|
13-May-2011 |
Vasiliy Kulikov <segoon@openwall.com> |
net: ipv4: add IPPROTO_ICMP socket kind This patch adds IPPROTO_ICMP socket kind. It makes it possible to send ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY messages without any special privileges. In other words, the patch makes it possible to implement setuid-less and CAP_NET_RAW-less /bin/ping. In order not to increase the kernel's attack surface, the new functionality is disabled by default, but is enabled at bootup by supporting Linux distributions, optionally with restriction to a group or a group range (see below). Similar functionality is implemented in Mac OS X: http://www.manpagez.com/man/4/icmp/ A new ping socket is created with socket(PF_INET, SOCK_DGRAM, PROT_ICMP) Message identifiers (octets 4-5 of ICMP header) are interpreted as local ports. Addresses are stored in struct sockaddr_in. No port numbers are reserved for privileged processes, port 0 is reserved for API ("let the kernel pick a free number"). There is no notion of remote ports, remote port numbers provided by the user (e.g. in connect()) are ignored. Data sent and received include ICMP headers. This is deliberate to: 1) Avoid the need to transport headers values like sequence numbers by other means. 2) Make it easier to port existing programs using raw sockets. ICMP headers given to send() are checked and sanitized. The type must be ICMP_ECHO and the code must be zero (future extensions might relax this, see below). The id is set to the number (local port) of the socket, the checksum is always recomputed. ICMP reply packets received from the network are demultiplexed according to their id's, and are returned by recv() without any modifications. IP header information and ICMP errors of those packets may be obtained via ancillary data (IP_RECVTTL, IP_RETOPTS, and IP_RECVERR). ICMP source quenches and redirects are reported as fake errors via the error queue (IP_RECVERR); the next hop address for redirects is saved to ee_info (in network order). socket(2) is restricted to the group range specified in "/proc/sys/net/ipv4/ping_group_range". It is "1 0" by default, meaning that nobody (not even root) may create ping sockets. Setting it to "100 100" would grant permissions to the single group (to either make /sbin/ping g+s and owned by this group or to grant permissions to the "netadmins" group), "0 4294967295" would enable it for the world, "100 4294967295" would enable it for the users, but not daemons. The existing code might be (in the unlikely case anyone needs it) extended rather easily to handle other similar pairs of ICMP messages (Timestamp/Reply, Information Request/Reply, Address Mask Request/Reply etc.). Userspace ping util & patch for it: http://openwall.info/wiki/people/segoon/ping For Openwall GNU/*/Linux it was the last step on the road to the setuid-less distro. A revision of this patch (for RHEL5/OpenVZ kernels) is in use in Owl-current, such as in the 2011/03/12 LiveCD ISOs: http://mirrors.kernel.org/openwall/Owl/current/iso/ Initially this functionality was written by Pavel Kankovsky for Linux 2.4.32, but unfortunately it was never made public. All ping options (-b, -p, -Q, -R, -s, -t, -T, -M, -I), are tested with the patch. PATCH v3: - switched to flowi4. - minor changes to be consistent with raw sockets code. PATCH v2: - changed ping_debug() to pr_debug(). - removed CONFIG_IP_PING. - removed ping_seq_fops.owner field (unused for procfs). - switched to proc_net_fops_create(). - switched to %pK in seq_printf(). PATCH v1: - fixed checksumming bug. - CAP_NET_RAW may not create icmp sockets anymore. RFC v2: - minor cleanups. - introduced sysctl'able group range to restrict socket(2). Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
6e869138101bc607e4780187210b79d531f9b2ce |
|
07-May-2011 |
David S. Miller <davem@davemloft.net> |
ipv4: Use cork flow in inet_sk_{reselect_saddr,rebuild_header}() These two functions must be invoked only when the socket is locked (because socket identity modifications are made non-atomically). Therefore we can use the cork flow for output route lookups. Signed-off-by: David S. Miller <davem@davemloft.net>
|
31e4543db29fb85496a122b965d6482c8d1a2bfe |
|
04-May-2011 |
David S. Miller <davem@davemloft.net> |
ipv4: Make caller provide on-stack flow key to ip_route_output_ports(). Signed-off-by: David S. Miller <davem@davemloft.net>
|
b883187785004cacea053569f1655fada0dfc299 |
|
29-Apr-2011 |
David S. Miller <davem@davemloft.net> |
ipv4: Fetch route saddr from flow key in inet_sk_reselect_saddr(). Now that output route lookups update the flow with source address selection, we can fetch it from fl4->saddr instead of rt->rt_src Signed-off-by: David S. Miller <davem@davemloft.net>
|
f6d8bd051c391c1c0458a30b2a7abcd939329259 |
|
21-Apr-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
inet: add RCU protection to inet->opt We lack proper synchronization to manipulate inet->opt ip_options Problem is ip_make_skb() calls ip_setup_cork() and ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options), without any protection against another thread manipulating inet->opt. Another thread can change inet->opt pointer and free old one under us. Use RCU to protect inet->opt (changed to inet->inet_opt). Instead of handling atomic refcounts, just copy ip_options when necessary, to avoid cache line dirtying. We cant insert an rcu_head in struct ip_options since its included in skb->cb[], so this patch is large because I had to introduce a new ip_options_rcu structure. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
2d7192d6cbab20e153c47fa1559ffd41ceef0e79 |
|
26-Apr-2011 |
David S. Miller <davem@davemloft.net> |
ipv4: Sanitize and simplify ip_route_{connect,newports}() These functions are used together as a unit for route resolution during connect(). They address the chicken-and-egg problem that exists when ports need to be allocated during connect() processing, yet such port allocations require addressing information from the routing code. It's currently more heavy handed than it needs to be, and in particular we allocate and initialize a flow object twice. Let the callers provide the on-stack flow object. That way we only need to initialize it once in the ip_route_connect() call. Later, if ip_route_newports() needs to do anything, it re-uses that flow object as-is except for the ports which it updates before the route re-lookup. Also, describe why this set of facilities are needed and how it works in a big comment. Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
|
b71d1d426d263b0b6cb5760322efebbfc89d4463 |
|
22-Apr-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
inet: constify ip headers and in6_addr Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
78fbfd8a653ca972afe479517a40661bfff6d8c3 |
|
12-Mar-2011 |
David S. Miller <davem@davemloft.net> |
ipv4: Create and use route lookup helpers. The idea here is this minimizes the number of places one has to edit in order to make changes to how flows are defined and used. Signed-off-by: David S. Miller <davem@davemloft.net>
|
b23dd4fe42b455af5c6e20966b7d6959fa8352ea |
|
02-Mar-2011 |
David S. Miller <davem@davemloft.net> |
ipv4: Make output route lookup return rtable directly. Instead of on the stack. Signed-off-by: David S. Miller <davem@davemloft.net>
|
273447b352e69c327efdecfd6e1d6fe3edbdcd14 |
|
01-Mar-2011 |
David S. Miller <davem@davemloft.net> |
ipv4: Kill can_sleep arg to ip_route_output_flow() This boolean state is now available in the flow flags. Signed-off-by: David S. Miller <davem@davemloft.net>
|
420d44daa7aa1cc847e9e527f0a27a9ce61768ca |
|
01-Mar-2011 |
David S. Miller <davem@davemloft.net> |
ipv4: Make final arg to ip_route_output_flow to be boolean "can_sleep" Since that is what the current vague "flags" argument means. Signed-off-by: David S. Miller <davem@davemloft.net>
|
abdf7e7239da270e68262728f125ea94b9b7d42d |
|
01-Mar-2011 |
David S. Miller <davem@davemloft.net> |
ipv4: Can final ip_route_connect() arg to boolean "can_sleep". Since that's what the current vague "flags" thing means. Signed-off-by: David S. Miller <davem@davemloft.net>
|
709b46e8d90badda1898caea50483c12af178e96 |
|
29-Jan-2011 |
Eric W. Biederman <ebiederm@xmission.com> |
net: Add compat ioctl support for the ipv4 multicast ioctl SIOCGETSGCNT SIOCGETSGCNT is not a unique ioctl value as it it maps tio SIOCPROTOPRIVATE +1, which unfortunately means the existing infrastructure for compat networking ioctls is insufficient. A trivial compact ioctl implementation would conflict with: SIOCAX25ADDUID SIOCAIPXPRISLT SIOCGETSGCNT_IN6 SIOCGETSGCNT SIOCRSSCAUSE SIOCX25SSUBSCRIP SIOCX25SDTEFACILITIES To make this work I have updated the compat_ioctl decode path to mirror the the normal ioctl decode path. I have added an ipv4 inet_compat_ioctl function so that I can have ipv4 specific compat ioctls. I have added a compat_ioctl function into struct proto so I can break out ioctls by which kind of ip socket I am using. I have added a compat_raw_ioctl function because SIOCGETSGCNT only works on raw sockets. I have added a ipmr_compat_ioctl that mirrors the normal ipmr_ioctl. This was necessary because unfortunately the struct layout for the SIOCGETSGCNT has unsigned longs in it so changes between 32bit and 64bit kernels. This change was sufficient to run a 32bit ip multicast routing daemon on a 64bit kernel. Reported-by: Bill Fenner <fenner@aristanetworks.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
04ed3e741d0f133e02bed7fa5c98edba128f90e7 |
|
25-Jan-2011 |
Michał Mirosław <mirq-linux@rere.qmqm.pl> |
net: change netdev->features to u32 Quoting Ben Hutchings: we presumably won't be defining features that can only be enabled on 64-bit architectures. Occurences found by `grep -r` on net/, drivers/net, include/ [ Move features and vlan_features next to each other in struct netdev, as per Eric Dumazet's suggestion -DaveM ] Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
5811662b15db018c740c57d037523683fd3e6123 |
|
12-Nov-2010 |
Changli Gao <xiaosuo@gmail.com> |
net: use the macros defined for the members of flowi Use the macros defined for the members of flowi to clean the code up. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
49e8ab03ebcacd8e37660ffec20c0c46721a2800 |
|
19-Aug-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: build_ehash_secret() and rt_bind_peer() cleanups Now cmpxchg() is available on all arches, we can use it in build_ehash_secret() and rt_bind_peer() instead of using spinlocks. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
|
7ba42910073f8432934d61a6c08b1023c408fb62 |
|
10-Jul-2010 |
Changli Gao <xiaosuo@gmail.com> |
inet, inet6: make tcp_sendmsg() and tcp_sendpage() through inet_sendmsg() and inet_sendpage() a new boolean flag no_autobind is added to structure proto to avoid the autobind calls when the protocol is TCP. Then sock_rps_record_flow() is called int the TCP's sendmsg() and sendpage() pathes. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- include/net/inet_common.h | 4 ++++ include/net/sock.h | 1 + include/net/tcp.h | 8 ++++---- net/ipv4/af_inet.c | 15 +++++++++------ net/ipv4/tcp.c | 11 +++++------ net/ipv4/tcp_ipv4.c | 3 +++ net/ipv6/af_inet6.c | 8 ++++---- net/ipv6/tcp_ipv6.c | 3 +++ 8 files changed, 33 insertions(+), 20 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
|
4ce3c183fcade7f4b30a33dae90cd774c3d9e094 |
|
30-Jun-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
snmp: 64bit ipstats_mib for all arches /proc/net/snmp and /proc/net/netstat expose SNMP counters. Width of these counters is either 32 or 64 bits, depending on the size of "unsigned long" in kernel. This means user program parsing these files must already be prepared to deal with 64bit values, regardless of user program being 32 or 64 bit. This patch introduces 64bit snmp values for IPSTAT mib, where some counters can wrap pretty fast if they are 32bit wide. # netstat -s|egrep "InOctets|OutOctets" InOctets: 244068329096 OutOctets: 244069348848 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
1823e4c80eeae2a774c75569ce3035070e5ee009 |
|
22-Jun-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
snmp: add align parameter to snmp_mib_init() In preparation for 64bit snmp counters for some mibs, add an 'align' parameter to snmp_mib_init(), instead of assuming mibs only contain 'unsigned long' fields. Callers can use __alignof__(type) to provide correct alignment. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> CC: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
7b2ff18ee7b0ec4bc3162f821e221781aaca48bd |
|
15-Jun-2010 |
Jiri Olsa <jolsa@redhat.com> |
net - IP_NODEFRAG option for IPv4 socket this patch is implementing IP_NODEFRAG option for IPv4 socket. The reason is, there's no other way to send out the packet with user customized header of the reassembly part. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
d8d1f30b95a635dbd610dcc5eb641aca8f4768cf |
|
11-Jun-2010 |
Changli Gao <xiaosuo@gmail.com> |
net-next: remove useless union keyword remove useless union keyword in rtable, rt6_info and dn_route. Since there is only one member in a union, the union keyword isn't useful. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
e3826f1e946e7d2354943232f1457be1455a29e2 |
|
05-May-2010 |
Amerigo Wang <amwang@redhat.com> |
net: reserve ports for applications using fixed port numbers (Dropped the infiniband part, because Tetsuo modified the related code, I will send a separate patch for it once this is accepted.) This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which allows users to reserve ports for third-party applications. The reserved ports will not be used by automatic port assignments (e.g. when calling connect() or bind() with port number 0). Explicit port allocation behavior is unchanged. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c58dc01babfd58ec9e71a6ce080150dc27755d88 |
|
28-Apr-2010 |
David S. Miller <davem@davemloft.net> |
net: Make RFS socket operations not be inet specific. Idea from Eric Dumazet. As for placement inside of struct sock, I tried to choose a place that otherwise has a 32-bit hole on 64-bit systems. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
|
0eae88f31ca2b88911ce843452054139e028771f |
|
21-Apr-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: Fix various endianness glitches Sparse can help us find endianness bugs, but we need to make some cleanups to be able to more easily spot real bugs. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
aa395145165cb06a0d0885221bbe0ce4a564391d |
|
20-Apr-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: sk_sleep() helper Define a new function to return the waitqueue of a "struct sock". static inline wait_queue_head_t *sk_sleep(struct sock *sk) { return sk->sk_sleep; } Change all read occurrences of sk_sleep by a call to this function. Needed for a future RCU conversion. sk_sleep wont be a field directly available. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
fec5e652e58fa6017b2c9e06466cb2a6538de5b4 |
|
17-Apr-2010 |
Tom Herbert <therbert@google.com> |
rfs: Receive Flow Steering This patch implements receive flow steering (RFS). RFS steers received packets for layer 3 and 4 processing to the CPU where the application for the corresponding flow is running. RFS is an extension of Receive Packet Steering (RPS). The basic idea of RFS is that when an application calls recvmsg (or sendmsg) the application's running CPU is stored in a hash table that is indexed by the connection's rxhash which is stored in the socket structure. The rxhash is passed in skb's received on the connection from netif_receive_skb. For each received packet, the associated rxhash is used to look up the CPU in the hash table, if a valid CPU is set then the packet is steered to that CPU using the RPS mechanisms. The convolution of the simple approach is that it would potentially allow OOO packets. If threads are thrashing around CPUs or multiple threads are trying to read from the same sockets, a quickly changing CPU value in the hash table could cause rampant OOO packets-- we consider this a non-starter. To avoid OOO packets, this solution implements two types of hash tables: rps_sock_flow_table and rps_dev_flow_table. rps_sock_table is a global hash table. Each entry is just a CPU number and it is populated in recvmsg and sendmsg as described above. This table contains the "desired" CPUs for flows. rps_dev_flow_table is specific to each device queue. Each entry contains a CPU and a tail queue counter. The CPU is the "current" CPU for a matching flow. The tail queue counter holds the value of a tail queue counter for the associated CPU's backlog queue at the time of last enqueue for a flow matching the entry. Each backlog queue has a queue head counter which is incremented on dequeue, and so a queue tail counter is computed as queue head count + queue length. When a packet is enqueued on a backlog queue, the current value of the queue tail counter is saved in the hash entry of the rps_dev_flow_table. And now the trick: when selecting the CPU for RPS (get_rps_cpu) the rps_sock_flow table and the rps_dev_flow table for the RX queue are consulted. When the desired CPU for the flow (found in the rps_sock_flow table) does not match the current CPU (found in the rps_dev_flow table), the current CPU is changed to the desired CPU if one of the following is true: - The current CPU is unset (equal to RPS_NO_CPU) - Current CPU is offline - The current CPU's queue head counter >= queue tail counter in the rps_dev_flow table. This checks if the queue tail has advanced beyond the last packet that was enqueued using this table entry. This guarantees that all packets queued using this entry have been dequeued, thus preserving in order delivery. Making each queue have its own rps_dev_flow table has two advantages: 1) the tail queue counters will be written on each receive, so keeping the table local to interrupting CPU s good for locality. 2) this allows lockless access to the table-- the CPU number and queue tail counter need to be accessed together under mutual exclusion from netif_receive_skb, we assume that this is only called from device napi_poll which is non-reentrant. This patch implements RFS for TCP and connected UDP sockets. It should be usable for other flow oriented protocols. There are two configuration parameters for RFS. The "rps_flow_entries" kernel init parameter sets the number of entries in the rps_sock_flow_table, the per rxqueue sysfs entry "rps_flow_cnt" contains the number of entries in the rps_dev_flow table for the rxqueue. Both are rounded to power of two. The obvious benefit of RFS (over just RPS) is that it achieves CPU locality between the receive processing for a flow and the applications processing; this can result in increased performance (higher pps, lower latency). The benefits of RFS are dependent on cache hierarchy, application load, and other factors. On simple benchmarks, we don't necessarily see improvement and sometimes see degradation. However, for more complex benchmarks and for applications where cache pressure is much higher this technique seems to perform very well. Below are some benchmark results which show the potential benfit of this patch. The netperf test has 500 instances of netperf TCP_RR test with 1 byte req. and resp. The RPC test is an request/response test similar in structure to netperf RR test ith 100 threads on each host, but does more work in userspace that netperf. e1000e on 8 core Intel No RFS or RPS 104K tps at 30% CPU No RFS (best RPS config): 290K tps at 63% CPU RFS 303K tps at 61% CPU RPC test tps CPU% 50/90/99% usec latency Latency StdDev No RFS/RPS 103K 48% 757/900/3185 4472.35 RPS only: 174K 73% 415/993/2468 491.66 RFS 223K 73% 379/651/1382 315.61 Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
b6c6712a42ca3f9fa7f4a3d7c40e3a9dd1fd9e03 |
|
09-Apr-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: sk_dst_cache RCUification With latest CONFIG_PROVE_RCU stuff, I felt more comfortable to make this work. sk->sk_dst_cache is currently protected by a rwlock (sk_dst_lock) This rwlock is readlocked for a very small amount of time, and dst entries are already freed after RCU grace period. This calls for RCU again :) This patch converts sk_dst_lock to a spinlock, and use RCU for readers. __sk_dst_get() is supposed to be called with rcu_read_lock() or if socket locked by user, so use appropriate rcu_dereference_check() condition (rcu_read_lock_held() || sock_owned_by_user(sk)) This patch avoids two atomic ops per tx packet on UDP connected sockets, for example, and permits sk_dst_lock to be much less dirtied. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
6503d96168f891ffa3b70ae6c9698a1a722025a0 |
|
01-Apr-2010 |
Changli Gao <xiaosuo@gmail.com> |
net: check the length of the socket address passed to connect(2) check the length of the socket address passed to connect(2). Check the length of the socket address passed to connect(2). If the length is invalid, -EINVAL will be returned. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- net/bluetooth/l2cap.c | 3 ++- net/bluetooth/rfcomm/sock.c | 3 ++- net/bluetooth/sco.c | 3 ++- net/can/bcm.c | 3 +++ net/ieee802154/af_ieee802154.c | 3 +++ net/ipv4/af_inet.c | 5 +++++ net/netlink/af_netlink.c | 3 +++ 7 files changed, 20 insertions(+), 3 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>
|
5a0e3ad6af8660be21ca98a971cd00f331318c05 |
|
24-Mar-2010 |
Tejun Heo <tj@kernel.org> |
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
|
ec733b15a3ef0b5759141a177f8044a2f40c41e7 |
|
18-Mar-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: snmp mib cleanup There is no point to align or pad mibs to cache lines, they are per cpu allocated with a 8 bytes alignment anyway. This wastes space for no gain. This patch removes __SNMP_MIB_ALIGN__ Since SNMP mibs contain "unsigned long" fields only, we can relax the allocation alignment from "unsigned long long" to "unsigned long" Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
7d720c3e4f0c4fc152a6bf17e24244a3c85412d2 |
|
16-Feb-2010 |
Tejun Heo <tj@kernel.org> |
percpu: add __percpu sparse annotations to net Add __percpu sparse annotations to net. These annotations are to make sparse consider percpu variables to be in a different address space and warn if accessed without going through percpu accessors. This patch doesn't affect normal builds. The macro and type tricks around snmp stats make things a bit interesting. DEFINE/DECLARE_SNMP_STAT() macros mark the target field as __percpu and SNMP_UPD_PO_STATS() macro is updated accordingly. All snmp_mib_*() users which used to cast the argument to (void **) are updated to cast it to (void __percpu **). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
|
c84b3268da3b85c9d8a9e504e1001a14ed829e94 |
|
06-Nov-2009 |
Eric Paris <eparis@redhat.com> |
net: check kern before calling security subsystem Before calling capable(CAP_NET_RAW) check if this operations is on behalf of the kernel or on behalf of userspace. Do not do the security check if it is on behalf of the kernel. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
3f378b684453f2a028eda463ce383370545d9cc9 |
|
06-Nov-2009 |
Eric Paris <eparis@redhat.com> |
net: pass kern to net_proto_family create function The generic __sock_create function has a kern argument which allows the security system to make decisions based on if a socket is being created by the kernel or by userspace. This patch passes that flag to the net_proto_family specific create function, so it can do the same thing. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
13f18aa05f5abe135f47b6417537ae2b2fedc18c |
|
06-Nov-2009 |
Eric Paris <eparis@redhat.com> |
net: drop capability from protocol definitions struct can_proto had a capability field which wasn't ever used. It is dropped entirely. struct inet_protosw had a capability field which can be more clearly expressed in the code by just checking if sock->type = SOCK_RAW. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
38bfd8f5bec496e8e0db8849e01c99a33479418a |
|
29-Oct-2009 |
Cyrill Gorcunov <gorcunov@openvz.org> |
net,socket: introduce DECLARE_SOCKADDR helper to catch overflow at build time proto_ops->getname implies copying protocol specific data into storage unit (particulary to __kernel_sockaddr_storage). So when we implement new protocol support we should keep such a detail in mind (which is easy to forget about). Lets introduce DECLARE_SOCKADDR helper which check if storage unit is not overfowed at build time. Eventually inet_getname is switched to use DECLARE_SOCKADDR (to show example of usage). Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c720c7e8383aff1cb219bddf474ed89d850336e3 |
|
15-Oct-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
inet: rename some inet_sock fields In order to have better cache layouts of struct sock (separate zones for rx/tx paths), we need this preliminary patch. Goal is to transfert fields used at lookup time in the first read-mostly cache line (inside struct sock_common) and move sk_refcnt to a separate cache line (only written by rx path) This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr, sport and id fields. This allows a future patch to define these fields as macros, like sk_refcnt, without name clashes. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
ec1b4cf74c81bfd0fbe5bf62bafc86c45917e72f |
|
05-Oct-2009 |
Stephen Hemminger <shemminger@vyatta.com> |
net: mark net_proto_ops as const All usages of structure net_proto_ops should be declared const. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
914a9ab386a288d0f22252fc268ecbc048cdcbd5 |
|
02-Oct-2009 |
Atis Elsts <atis@mikrotik.com> |
net: Use sk_mark for routing lookup in more places This patch against v2.6.31 adds support for route lookup using sk_mark in some more places. The benefits from this patch are the following. First, SO_MARK option now has effect on UDP sockets too. Second, ip_queue_xmit() and inet_sk_rebuild_header() could fail to do routing lookup correctly if TCP sockets with SO_MARK were used. Signed-off-by: Atis Elsts <atis@mikrotik.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
|
32613090a96dba2ca2cc524c8d4749d3126fdde5 |
|
14-Sep-2009 |
Alexey Dobriyan <adobriyan@gmail.com> |
net: constify struct net_protocol Remove long removed "inet_protocol_base" declaration. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
3d1427f87002735aa54c370558e0c2bacc61f31e |
|
29-Aug-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
ipv4: af_inet.c cleanups Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
d7ca4cc01fd154f2da30ae6dae160fa5800af758 |
|
09-Jul-2009 |
Sridhar Samudrala <sri@us.ibm.com> |
udpv4: Handle large incoming UDP/IPv4 packets and support software UFO. - validate and forward GSO UDP/IPv4 packets from untrusted sources. - do software UFO if the outgoing device doesn't support UFO. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
2307f866f542f3397d24f78d0efd74f4ab214a96 |
|
04-Jun-2009 |
Rami Rosen <ramirose@gmail.com> |
ipv4: remove ip_mc_drop_socket() declaration from af_inet.c. ip_mc_drop_socket() method is declared in linux/igmp.h, which is included anyhow in af_inet.c. So there is no need for this declaration. This patch removes it from af_inet.c. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
f771bef98004d9d141b085d987a77d06669d4f4f |
|
28-May-2009 |
Nivedita Singhvi <niv@us.ibm.com> |
ipv4: New multicast-all socket option After some discussion offline with Christoph Lameter and David Stevens regarding multicast behaviour in Linux, I'm submitting a slightly modified patch from the one Christoph submitted earlier. This patch provides a new socket option IP_MULTICAST_ALL. In this case, default behaviour is _unchanged_ from the current Linux standard. The socket option is set by default to provide original behaviour. Sockets wishing to receive data only from multicast groups they join explicitly will need to clear this socket option. Signed-off-by: Nivedita Singhvi <niv@us.ibm.com> Signed-off-by: Christoph Lameter<cl@linux.com> Acked-by: David Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
1075f3f65d0e0f49351b7d4310e9f94483972a51 |
|
26-May-2009 |
Herbert Xu <herbert@gondor.apana.org.au> |
ipv4: Use 32-bit loads for ID and length in GRO This patch optimises the IPv4 GRO code by using 32-bit loads (instead of 16-bit ones) on the ID and length checks in the receive function. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
a5b1cf288d4200506ab62fbb86cc81ace948a306 |
|
26-May-2009 |
Herbert Xu <herbert@gondor.apana.org.au> |
gro: Avoid unnecessary comparison after skb_gro_header For the overwhelming majority of cases, skb_gro_header's return value cannot be NULL. Yet we must check it because of its current form. This patch splits it up into multiple functions in order to avoid this. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
573636cbafeba88f7c88832ae053dafe265bf94b |
|
17-Apr-2009 |
Eric Dumazet <dada1@cosmosbay.com> |
[PATCH] net: remove superfluous call to synchronize_net() inet_register_protosw() function is responsible for adding a new inet protocol into a global table (inetsw[]) that is used with RCU rules. As soon as the store of the pointer is done, other cpus might see this new protocol in inetsw[], so we have to make sure new protocol is ready for use. All pending memory updates should thus be committed to memory before setting the pointer. This is correctly done using rcu_assign_pointer() synchronize_net() is typically used at unregister time, after unsetting the pointer, to make sure no other cpu is still using the object we want to dismantle. Using it at register time is only adding an artificial delay that could hide a real bug, and this bug could popup if/when synchronize_rcu() can proceed faster than now. This saves about 13 ms on boot time on a HZ=1000 8 cpus machine ;) (4 calls to inet_register_protosw(), and about 3200 us per call) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
7546dd97d27306d939c13e03318aae695badaa88 |
|
09-Mar-2009 |
Stephen Hemminger <shemminger@vyatta.com> |
net: convert usage of packet_type to read_mostly Protocols that use packet_type can be __read_mostly section for better locality. Elminate any unnecessary initializations of NULL. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
313e458f81ec3852106c5a83830fe0d4f405a71a |
|
20-Feb-2009 |
Rusty Russell <rusty@rustcorp.com.au> |
alloc_percpu: add align argument to __alloc_percpu. This prepares for a real __alloc_percpu, by adding an alignment argument. Only one place uses __alloc_percpu directly, and that's for a string. tj: af_inet also uses __alloc_percpu(), update it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Jens Axboe <axboe@kernel.dk>
|
a5ad24be728d4352b71a81fba471aa41eb71f83a |
|
08-Feb-2009 |
Herbert Xu <herbert@gondor.apana.org.au> |
gro: Optimise IPv4 packet reception As this function can be called more than half a million times for 10GbE, it's important to optimise it as much as we can. This patch does some obvious changes to use 2-byte and 4-byte operations instead of byte-oriented ones where possible. Bit ops are also used to replace logical ops to reduce branching. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
f15fbcd7d857ca2ea20b57ba6dfe63aab89d0b8b |
|
02-Feb-2009 |
Herbert Xu <herbert@gondor.apana.org.au> |
ipv4: Delete redundant sk_family assignment sk_alloc now sets sk_family so this is redundant. In fact it caught my eye because sock_init_data already uses sk_family so this is too late anyway. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
09640e6365c679b5642b1c41b6d7078f51689ddf |
|
01-Feb-2009 |
Harvey Harrison <harvey.harrison@gmail.com> |
net: replace uses of __constant_{endian} Base versions handle constant folding now. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
86911732d3996a9da07914b280621450111bb6da |
|
29-Jan-2009 |
Herbert Xu <herbert@gondor.apana.org.au> |
gro: Avoid copying headers of unmerged packets Unfortunately simplicity isn't always the best. The fraginfo interface turned out to be suboptimal. The problem was quite obvious. For every packet, we have to copy the headers from the frags structure into skb->head, even though for 99% of the packets this part is immediately thrown away after the merge. LRO didn't have this problem because it directly read the headers from the frags structure. This patch attempts to address this by creating an interface that allows GRO to access the headers in the first frag without having to copy it. Because all drivers that use frags place the headers in the first frag this optimisation should be enough. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
b4ee07df3d8121060200dbe1c6686a4e0682bee2 |
|
26-Dec-2008 |
Alexey Dobriyan <adobriyan@gmail.com> |
netns: igmp: allow IPPROTO_IGMP sockets in netns Looks like everything is already ready. Required for ebtables(8) for one thing. Also, required for ipmr per-netns (coming soon). (Benjamin) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
bf296b125b21b8d558ceb6ec30bb4eba2730cd6b |
|
16-Dec-2008 |
Herbert Xu <herbert@gondor.apana.org.au> |
tcp: Add GRO support This patch adds the TCP-specific portion of GRO. The criterion for merging is extremely strict (the TCP header must match exactly apart from the checksum) so as to allow refragmentation. Otherwise this is pretty much identical to LRO, except that we support the merging of ECN packets. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
73cc19f1556b95976934de236fd9043f7208844f |
|
16-Dec-2008 |
Herbert Xu <herbert@gondor.apana.org.au> |
ipv4: Add GRO infrastructure This patch adds GRO support for IPv4. The criteria for merging is more stringent than LRO, in particular, we require all fields in the IP header to be identical except for the length, ID and checksum. In addition, the ID must form an arithmetic sequence with a difference of one. The ID requirement might seem overly strict, however, most hardware TSO solutions already obey this rule. Linux itself also obeys this whether GSO is in use or not. In future we could relax this rule by storing the IDs (or rather making sure that we don't drop them when pulling the aggregate skb's tail). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
04f258ce7f085dd69422fa01d41c8f0194a0e270 |
|
24-Nov-2008 |
Eric Dumazet <dada1@cosmosbay.com> |
net: some optimizations in af_inet 1) Use eq_net() in inet_netns_ok() to speedup socket creation if !CONFIG_NET_NS 2) Reorder the tests about inet_ehash_secret generation (once only) Use the unlikely() macro when testing if inet_ehash_secret already generated. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c82838458200ec4167ce7083b0a17474150c5bf7 |
|
20-Nov-2008 |
Balazs Scheidler <bazsi@balabit.hu> |
TPROXY: supply a struct flowi->flags argument in inet_sk_rebuild_header() inet_sk_rebuild_header() does a new route lookup if the dst_entry associated with a socket becomes stale. However inet_sk_rebuild_header() didn't use struct flowi->flags, causing the route lookup to fail for foreign-bound IP_TRANSPARENT sockets, causing an error state to be set for the sockets in question. Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
|
673d57e72398edfedc93fb50ff58048077c9d587 |
|
31-Oct-2008 |
Harvey Harrison <harvey.harrison@gmail.com> |
net: replace NIPQUAD() in net/ipv4/ net/ipv6/ Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u can be replaced with %pI4 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
b9fb15067ce93497bef852c05e406d7a96212a9a |
|
01-Oct-2008 |
Tóth László Attila <panther@balabit.hu> |
ipv4: Allow binding to non-local addresses if IP_TRANSPARENT is set Setting IP_TRANSPARENT is not really useful without allowing non-local binds for the socket. To make user-space code simpler we allow these binds even if IP_TRANSPARENT is set but IP_FREEBIND is not. Signed-off-by: Tóth László Attila <panther@balabit.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
|
bd7b1533cd6a68c734062aa69394bec7e2b1718e |
|
15-Jul-2008 |
Al Viro <viro@zeniv.linux.org.uk> |
[PATCH] sysctl: make sure that /proc/sys/net/ipv4 appears before per-ns ones Massage ipv4 initialization - make sure that net.ipv4 appears as non-per-net-namespace before it shows up in per-net-namespace sysctls. That's the only change outside of sysctl.c needed to get sane ordering rules and data structures for sysctls (esp. for procfs side of that mess). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
547b792cac0a038b9dbf958d3c120df3740b5572 |
|
26-Jul-2008 |
Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> |
net: convert BUG_TRAP to generic WARN_ON Removes legacy reinvent-the-wheel type thing. The generic machinery integrates much better to automated debugging aids such as kerneloops.org (and others), and is unambiguous due to better naming. Non-intuively BUG_TRAP() is actually equal to WARN_ON() rather than BUG_ON() though some might actually be promoted to BUG_ON() but I left that to future. I could make at least one BUILD_BUG_ON conversion. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
|
696adfe84c11c571a1e0863460ff0ec142b4e5a9 |
|
25-Jul-2008 |
Paul E. McKenney <paulmck@linux.vnet.ibm.com> |
list_for_each_rcu must die: networking All uses of list_for_each_rcu() can be profitably replaced by the easier-to-use list_for_each_entry_rcu(). This patch makes this change for networking, in preparation for removing the list_for_each_rcu() API entirely. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
d89cbbb1e69a3bfea38038fb058bd51013902ef3 |
|
18-Jul-2008 |
Pavel Emelyanov <xemul@openvz.org> |
ipv4: clean the init_ipv4_mibs error paths After moving all the stuff outside this function it looks a bit ugly - make it look better. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
923c6586b0dc0a00df07a1608185437145a0c68b |
|
18-Jul-2008 |
Pavel Emelyanov <xemul@openvz.org> |
mib: put icmpmsg statistics on struct net Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
b60538a0d737609213e4b758881913498d3ff0b4 |
|
18-Jul-2008 |
Pavel Emelyanov <xemul@openvz.org> |
mib: put icmp statistics on struct net Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
386019d3514b3ed9de8d0b05b67e638a7048375b |
|
18-Jul-2008 |
Pavel Emelyanov <xemul@openvz.org> |
mib: put udplite statistics on struct net Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
2f275f91a438abd8eec5321798d66a4ffe6869fa |
|
18-Jul-2008 |
Pavel Emelyanov <xemul@openvz.org> |
mib: put udp statistics on struct net Similar to... ouch, I repeat myself. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
61a7e26028b94805fd686a6dc9dbd9941f8f19b0 |
|
18-Jul-2008 |
Pavel Emelyanov <xemul@openvz.org> |
mib: put net statistics on struct net Similar to ip and tcp ones :) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
a20f5799ca7ceb24d63c74b6fdad4b0c0ee91f4f |
|
18-Jul-2008 |
Pavel Emelyanov <xemul@openvz.org> |
mib: put ip statistics on struct net Similar to tcp one. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
57ef42d59d1c1d79be59fc3c6380ae14234e38c3 |
|
18-Jul-2008 |
Pavel Emelyanov <xemul@openvz.org> |
mib: put tcp statistics on struct net Proc temporary uses stats from init_net. BTW, TCP_XXX_STATS are beautiful (w/o do { } while (0) facing) again :) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
9b4661bd6e5437508e0920608f3213c23212cd1b |
|
18-Jul-2008 |
Pavel Emelyanov <xemul@openvz.org> |
ipv4: add pernet mib operations These ones are currently empty, but stuff from init_ipv4_mibs will sequentially migrate there. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
a9c19329eccdb145a08a4a2e969d7b40c54c9bcc |
|
17-Jul-2008 |
Pavel Emelyanov <xemul@openvz.org> |
tcp: add net to tcp_mib_init This one sets TCP MIBs after zeroing them, and thus requires the net. The existing single caller can use init_net (temporarily). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
03d2f897e9fb3218989baa2139a951ce7f5414bf |
|
02-Jul-2008 |
Wang Chen <wangchen@cn.fujitsu.com> |
ipv4: Do cleanup for ip_mr_init Same as ip6_mr_init(), make ip_mr_init() return errno if fails. But do not do error handling in inet_init(), just print a msg. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
0b040829952d84bf2a62526f0e24b624e0699447 |
|
11-Jun-2008 |
Adrian Bunk <bunk@kernel.org> |
net: remove CVS keywords This patch removes CVS keywords that weren't updated for a long time from comments. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
801678c5a3b4c79236970bcca27c733f5559e0d1 |
|
29-Apr-2008 |
Hirofumi Nakagawa <hnakagawa@miraclelinux.com> |
Remove duplicated unlikely() in IS_ERR() Some drivers have duplicated unlikely() macros. IS_ERR() already has unlikely() in itself. This patch cleans up such pointless code. Signed-off-by: Hirofumi Nakagawa <hnakagawa@miraclelinux.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Jeff Garzik <jeff@garzik.org> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: David Brownell <david-b@pacbell.net> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.de> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
a7d632b6b4ad1c92746ed409e41f9dc571ec04e2 |
|
14-Apr-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV4]: Use NIPQUAD_FMT to format ipv4 addresses. And use %u to format port. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
5616bdd6dfeb4e36be499dbac245e4d3be90a138 |
|
03-Apr-2008 |
Denis V. Lunev <den@openvz.org> |
[INET]: uc_ttl assignment in inet_ctl_sock_create is redundant. uc_ttl is initialized in inet(6)_create and never changed except setsockopt ioctl. Remove this assignment. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
5677242f432102dea9e6eceec1dc089e2f709ca4 |
|
03-Apr-2008 |
Denis V. Lunev <den@openvz.org> |
[NETNS]: Inet control socket should not hold a namespace. This is a generic requirement, so make inet_ctl_sock_create namespace aware and create a inet_ctl_sock_destroy wrapper around sk_release_kernel. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
eee4fe4ded6e9c196168aee8f9787771f4df9c90 |
|
03-Apr-2008 |
Denis V. Lunev <den@openvz.org> |
[INET]: Let inet_ctl_sock_create return sock rather than socket. All upper protocol layers are already use sock internally. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
3d58b5fa8e4c461ab09afdacd3d1754fccca06ad |
|
03-Apr-2008 |
Denis V. Lunev <den@openvz.org> |
[INET]: Rename inet_csk_ctl_sock_create to inet_ctl_sock_create. This call is nothing common with INET connection sockets code. It simply creates an unhashes kernel sockets for protocol messages. Move the new call into af_inet.c after the rename. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
3b1e0a655f8eba44ab1ee2a1068d169ccfb853b9 |
|
25-Mar-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS. Introduce per-sock inlines: sock_net(), sock_net_set() and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
92f1fecb45ef97acae94463302f79228a4b454d9 |
|
24-Mar-2008 |
Denis V. Lunev <den@openvz.org> |
[NETNS]: Enable TCP/UDP/ICMP inside namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
2342fd7e146f05edeb13feb03490c13a1bdab2e0 |
|
24-Mar-2008 |
Denis V. Lunev <den@openvz.org> |
[NETNS]: Allow to create sockets in non-initial namespace. Allow to create sockets in the namespace if the protocol ok with this. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
05cf89d40c85e622dac20e44713168767be5c520 |
|
24-Mar-2008 |
Denis V. Lunev <den@openvz.org> |
[NETNS]: Process INET socket layer in the correct namespace. Replace all the reast of the init_net with a proper net on the socket layer. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
e6f1cebf71c4e7aae7dfa43414ce2631291def9f |
|
18-Mar-2008 |
Al Viro <viro@zeniv.linux.org.uk> |
[NET] endianness noise: INADDR_ANY Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
db8dac20d5199307dcfcf4e01dac4bda5edf9e89 |
|
07-Mar-2008 |
David S. Miller <davem@davemloft.net> |
[UDP]: Revert udplite and code split. This reverts commit db1ed684f6c430c4cdad67d058688b8a1b5e607c ("[IPV6] UDP: Rename IPv6 UDP files."), commit 8be8af8fa4405652e6c0797db5465a4be8afb998 ("[IPV4] UDP: Move IPv4-specific bits to other file.") and commit e898d4db2749c6052072e9bc4448e396cbdeb06a ("[UDP]: Allow users to configure UDP-Lite."). First, udplite is of such small cost, and it is a core protocol just like TCP and normal UDP are. We spent enormous amounts of effort to make udplite share as much code with core UDP as possible. All of that work is less valuable if we're just going to slap a config option on udplite support. It is also causing build failures, as reported on linux-next, showing that the changeset was not tested very well. In fact, this is the second build failure resulting from the udplite change. Finally, the config options provided was a bool, instead of a modular option. Meaning the udplite code does not even get build tested by allmodconfig builds, and furthermore the user is not presented with a reasonable modular build option which is particularly needed by distribution vendors. Signed-off-by: David S. Miller <davem@davemloft.net>
|
0dc47877a3de00ceadea0005189656ae8dc52669 |
|
06-Mar-2008 |
Harvey Harrison <harvey.harrison@gmail.com> |
net: replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
e898d4db2749c6052072e9bc4448e396cbdeb06a |
|
29-Feb-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[UDP]: Allow users to configure UDP-Lite. Let's give users an option for disabling UDP-Lite (~4K). old: | text data bss dec hex filename | 286498 12432 6072 305002 4a76a net/ipv4/built-in.o | 193830 8192 3204 205226 321aa net/ipv6/ipv6.o new (without UDP-Lite): | text data bss dec hex filename | 284086 12136 5432 301654 49a56 net/ipv4/built-in.o | 191835 7832 3076 202743 317f7 net/ipv6/ipv6.o Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
a5710d6582868a96cf0fe02eac11d34cfbc783c9 |
|
29-Feb-2008 |
Denis V. Lunev <den@openvz.org> |
[ICMP]: Add return code to icmp_init. icmp_init could fail and this is normal for namespace other than initial. So, the panic should be triggered only on init_net initialization path. Additionally create rollback path for icmp_init as a separate function. It will also be used later during namespace destruction. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
9b0f976f27f00a81cf47643d90854659626795b4 |
|
29-Feb-2008 |
Denis V. Lunev <den@openvz.org> |
[INET]: Remove struct net_proto_family* from _init calls. struct net_proto_family* is not used in icmp[v6]_init, ndisc_init, igmp_init and tcp_v4_init. Remove it. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
e5b13cb10de209f924fdf9478214bcf7e4008d6d |
|
29-Feb-2008 |
Denis V. Lunev <den@openvz.org> |
[NETNS]: Process devinet ioctl in the correct namespace. Add namespace parameter to devinet_ioctl and locate device inside it for state changes. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
f1b050bf7a88910f9f00c9c8989c1bf5a67dd140 |
|
23-Jan-2008 |
Denis V. Lunev <den@openvz.org> |
[NETNS]: Add namespace parameter to ip_route_output_flow. Needed to propagate it down to the __ip_route_output_key. Signed_off_by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
1bad118a330d494b23663fce94d4e9d9d5065fa7 |
|
10-Jan-2008 |
Denis V. Lunev <den@openvz.org> |
[NETNS]: Pass namespace through ip_rt_ioctl. ... up to rtentry_to_fib_config Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
6b175b26c1048d331508940ad3516ead1998084f |
|
10-Jan-2008 |
Eric W. Biederman <ebiederm@xmission.com> |
[NETNS]: Add netns parameter to inet_(dev_)add_type. The patch extends the inet_addr_type and inet_dev_addr_type with the network namespace pointer. That allows to access the different tables relatively to the network namespace. The modification of the signature function is reported in all the callers of the inet_addr_type using the pointer to the well known init_net. Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
7b1a74fdbb9ec38a9780620fae25519fde4b21ee |
|
10-Jan-2008 |
Denis V. Lunev <den@openvz.org> |
[NETNS]: Refactor fib initialization so it can handle multiple namespaces. This patch makes the fib to be initialized as a subsystem for the network namespaces. The code does not handle several namespaces yet, so in case of a creation of a network namespace, the creation/initialization will not occur. Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
61a0265344786a548e8a0b26cb668e78a71f9602 |
|
10-Jan-2008 |
Denis V. Lunev <den@openvz.org> |
[NETNS]: Add namespace to API for routing /proc entries creation. This adds netns parameter to fib_proc_init/exit and replaces __init specifier with __net_init. After this, we will not yet have these proc files show info from the specific namespace - this will be done when these tables become namespaced. Acked-by: Benjamin Thery <benjamin.thery@bull.net> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
95766fff6b9a78d11fc2d3812dd035381690b55d |
|
31-Dec-2007 |
Hideo Aoki <haoki@redhat.com> |
[UDP]: Add memory accounting. Signed-off-by: Takahiro Yasui <tyasui@redhat.com> Signed-off-by: Hideo Aoki <haoki@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
32e569b7277f13c4b27bb29c761189963e49ce7a |
|
16-Dec-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[IPV4]: Pass the net pointer to the arp_req_set_proxy() This one will need to set the IPV4_DEVCONF_ALL(PROXY_ARP), but there's no ways to get the net right in place, so we have to pull one from the inet_ioctl's struct sock. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
8b7817f3a959ed99d7443afc12f78a7e1fcc2063 |
|
12-Dec-2007 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPSEC]: Add ICMP host relookup support RFC 4301 requires us to relookup ICMP traffic that does not match any policies using the reverse of its payload. This patch implements this for ICMP traffic that originates from or terminates on localhost. This is activated on outbound with the new policy flag XFRM_POLICY_ICMP, and on inbound by the new state flag XFRM_STATE_ICMP. On inbound the policy check is now performed by the ICMP protocol so that it can repeat the policy check where necessary. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c69bce20dda7f79160856a338298d65a284ba303 |
|
24-Jan-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET]: Remove unused "mibalign" argument for snmp_mib_init(). With fixes from Arnaldo Carvalho de Melo. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
9ba639797606acbcd97be886f41fbce163914e7b |
|
05-Dec-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[IPV4]: Cleanup the sysctl_net_ipv4.c file This includes several cleanups: * tune Makefile to compile out this file when SYSCTL=n. Now it looks like net/core/sysctl_net_core.c one; * move the ipv4_config to af_inet.c to exist all the time; * remove additional sysctl_ip_nonlocal_bind declaration (it is already declared in net/ip.h); * remove no nonger needed ifdefs from this file. This is a preparation for using ctl paths for net/ipv4/ sysctl table. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
9c55e01c0cc835818475a6ce8c4d684df9949ac8 |
|
07-Nov-2007 |
Jens Axboe <jens.axboe@oracle.com> |
[TCP]: Splice receive support. Support for network splice receive. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
6257ff2177ff02d7f260a7a501876aa41cb9a9f6 |
|
01-Nov-2007 |
Pavel Emelyanov <xemul@openvz.org> |
[NET]: Forget the zero_it argument of sk_alloc() Finally, the zero_it argument can be completely removed from the callers and from the function prototype. Besides, fix the checkpatch.pl warnings about using the assignments inside if-s. This patch is rather big, and it is a part of the previous one. I splitted it wishing to make the patches more readable. Hope this particular split helped. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
96793b482540f3a26e2188eaf75cb56b7829d3e3 |
|
17-Sep-2007 |
David L Stevens <dlstevens@us.ibm.com> |
[IPV4]: Add ICMPMsgStats MIB (RFC 4293) Background: RFC 4293 deprecates existing individual, named ICMP type counters to be replaced with the ICMPMsgStatsTable. This table includes entries for both IPv4 and IPv6, and requires counting of all ICMP types, whether or not the machine implements the type. These patches "remove" (but not really) the existing counters, and replace them with the ICMPMsgStats tables for v4 and v6. It includes the named counters in the /proc places they were, but gets the values for them from the new tables. It also counts packets generated from raw socket output (e.g., OutEchoes, MLD queries, RA's from radvd, etc). Changes: 1) create icmpmsg_statistics mib 2) create icmpv6msg_statistics mib 3) modify existing counters to use these 4) modify /proc/net/snmp to add "IcmpMsg" with all ICMP types listed by number for easy SNMP parsing 5) modify /proc/net/snmp printing for "Icmp" to get the named data from new counters. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c40f6fff401f693f627d3d44ef7663b993943517 |
|
17-Sep-2007 |
Denis Cheng <crquan@gmail.com> |
[IPV4] af_inet.c: use ARRAY_SIZE macro from kernel.h instead Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
1b8d7ae42d02e483ad94035cca851e4f7fbecb40 |
|
09-Oct-2007 |
Eric W. Biederman <ebiederm@xmission.com> |
[NET]: Make socket creation namespace safe. This patch passes in the namespace a new socket should be created in and has the socket code do the appropriate reference counting. By virtue of this all socket create methods are touched. In addition the socket create methods are modified so that they will fail if you attempt to create a socket in a non-default network namespace. Failing if we attempt to create a socket outside of the default network namespace ensures that as we incrementally make the network stack network namespace aware we will not export functionality that someone has not audited and made certain is network namespace safe. Allowing us to partially enable network namespaces before all of the exotic protocols are supported. Any protocol layers I have missed will fail to compile because I now pass an extra parameter into the socket creation code. [ Integrated AF_IUCV build fixes from Andrew Morton... -DaveM ] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
3516ffb0fef710749daf288c0fe146503e0cf9d4 |
|
03-Aug-2007 |
David S. Miller <davem@sunset.davemloft.net> |
[TCP]: Invoke tcp_sendmsg() directly, do not use inet_sendmsg(). As discovered by Evegniy Polyakov, if we try to sendmsg after a connection reset, we can do incredibly stupid things. The core issue is that inet_sendmsg() tries to autobind the socket, but we should never do that for TCP. Instead we should just go straight into TCP's sendmsg() code which will do all of the necessary state and pending socket error checks. TCP's sendpage already directly vectors to tcp_sendpage(), so this merely brings sendmsg() in line with that. Signed-off-by: David S. Miller <davem@davemloft.net>
|
d212f87b068c9d72065ef579d85b5ee6b8b59381 |
|
27-Jun-2007 |
Stephen Hemminger <shemminger@linux-foundation.org> |
[NET]: IPV6 checksum offloading in network devices The existing model for checksum offload does not correctly handle devices that can offload IPV4 and IPV6 only. The NETIF_F_HW_CSUM flag implies device can do any arbitrary protocol. This patch: * adds NETIF_F_IPV6_CSUM for those devices * fixes bnx2 and tg3 devices that need it * add NETIF_F_IPV6_CSUM to ipv6 output (incl GSO) * fixes assumptions about NETIF_F_ALL_CSUM in nat * adjusts bridge union of checksumming computation Signed-off-by: David S. Miller <davem@davemloft.net>
|
e63340ae6b6205fef26b40a75673d1c9c0c8bb90 |
|
08-May-2007 |
Randy Dunlap <randy.dunlap@oracle.com> |
header cleaning: don't include smp_lock.h when not used Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
5e0f04351d11e07a23b5ab4914282cbb78027e50 |
|
25-Apr-2007 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPV4]: Consolidate common SNMP code This patch moves the SNMP code shared between IPv4/IPv6 from proc.c into net/ipv4/af_inet.c. This makes sense because these functions aren't specific to /proc. As a result we can again skip proc.o if /proc is disabled. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
334901700f9f58993ebd7f6136d3f9062460d34d |
|
21-Apr-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV4] SNMP: Move some statistic bits to net/ipv4/proc.c. This also fixes memory leak in error path. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
be776281aee54626a474ba06f91926b98bdd180d |
|
27-Mar-2007 |
Eric Dumazet <dada1@cosmosbay.com> |
[NET]: inet_ehash_secret should be __read_mostly and set only once There is a very tiny probability that build_ehash_secret() is called at the same time by different CPUS. Also, using __read_mostly is a must for inet_ehash_secret Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
b3da2cf37c5c6e47698957a25ab43a7223dbb90f |
|
23-Mar-2007 |
David S. Miller <davem@davemloft.net> |
[INET]: Use jhash + random secret for ehash. The days are gone when this was not an issue, there are folks out there with huge bot networks that can be used to attack the established hash tables on remote systems. So just like the routing cache and connection tracking hash, use Jenkins hash with random secret input. Signed-off-by: David S. Miller <davem@davemloft.net>
|
badff6d01a8589a1c828b0bf118903ca38627f4e |
|
13-Mar-2007 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
[SK_BUFF]: Introduce skb_reset_transport_header(skb) For the common, open coded 'skb->h.raw = skb->data' operation, so that we can later turn skb->h.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple cases: skb->h.raw = skb->data; skb->h.raw = {skb_push|[__]skb_pull}() The next ones will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
eddc9ec53be2ecdbf4efe0efd4a83052594f0ac0 |
|
21-Apr-2007 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
[SK_BUFF]: Introduce ip_hdr(), remove skb->nh.iph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
d56f90a7c96da5187f0cdf07ee7434fe6aa78bbc |
|
11-Apr-2007 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
[SK_BUFF]: Introduce skb_network_header() For the places where we need a pointer to the network header, it is still legal to touch skb->nh.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
132adf54639cf7dd9315e8df89c2faa59f6e46d9 |
|
09-Mar-2007 |
Stephen Hemminger <shemminger@linux-foundation.org> |
[IPV4]: cleanup Add whitespace around keywords. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
ae40eb1ef30ab4120bd3c8b7e3da99ee53d27a23 |
|
19-Mar-2007 |
Eric Dumazet <dada1@cosmosbay.com> |
[NET]: Introduce SIOCGSTAMPNS ioctl to get timestamps with nanosec resolution Now network timestamps use ktime_t infrastructure, we can add a new ioctl() SIOCGSTAMPNS command to get timestamps in 'struct timespec'. User programs can thus access to nanosecond resolution. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> CC: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
e905a9edab7f4f14f9213b52234e4a346c690911 |
|
09-Feb-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET] IPV4: Fix whitespace errors. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
8eb9086f21c73b38b5ca27558db4c91d62d0e70b |
|
08-Feb-2007 |
David S. Miller <davem@sunset.davemloft.net> |
[IPV4/IPV6]: Always wait for IPSEC SA resolution in socket contexts. Do this even for non-blocking sockets. This avoids the silly -EAGAIN that applications can see now, even for non-blocking sockets in some cases (f.e. connect()). With help from Venkat Tekkirala. Signed-off-by: David S. Miller <davem@davemloft.net>
|
469de9b90f739f130ab3d483e819888e977596b8 |
|
09-Jan-2007 |
Paul Moore <paul.moore@hp.com> |
[INET]: style updates for the inet_sock->is_icsk assignment fix A quick patch to change the inet_sock->is_icsk assignment to better fit with existing kernel coding style. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
cbbd7d4f36a61631f8c0d73be43df985d1e7d6a6 |
|
05-Jan-2007 |
Paul Moore <paul.moore@hp.com> |
[INET]: Fix incorrect "inet_sock->is_icsk" assignment. The inet_create() and inet6_create() functions incorrectly set the inet_sock->is_icsk field. Both functions assume that the is_icsk field is large enough to hold at least a INET_PROTOSW_ICSK value when it is actually only a single bit. This patch corrects the assignment by doing a boolean comparison whose result will safely fit into a single bit field. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
714e85be3557222bc25f69c252326207c900a7db |
|
15-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[IPV6]: Assorted trivial endianness annotations. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
ba4e58eca8aa9473b44fdfd312f26c4a2e7798b3 |
|
27-Nov-2006 |
Gerrit Renker <gerrit@erg.abdn.ac.uk> |
[NET]: Supporting UDP-Lite (RFC 3828) in Linux This is a revision of the previously submitted patch, which alters the way files are organized and compiled in the following manner: * UDP and UDP-Lite now use separate object files * source file dependencies resolved via header files net/ipv{4,6}/udp_impl.h * order of inclusion files in udp.c/udplite.c adapted accordingly [NET/IPv4]: Support for the UDP-Lite protocol (RFC 3828) This patch adds support for UDP-Lite to the IPv4 stack, provided as an extension to the existing UDPv4 code: * generic routines are all located in net/ipv4/udp.c * UDP-Lite specific routines are in net/ipv4/udplite.c * MIB/statistics support in /proc/net/snmp and /proc/net/udplite * shared API with extensions for partial checksum coverage [NET/IPv6]: Extension for UDP-Lite over IPv6 It extends the existing UDPv6 code base with support for UDP-Lite in the same manner as per UDPv4. In particular, * UDPv6 generic and shared code is in net/ipv6/udp.c * UDP-Litev6 specific extensions are in net/ipv6/udplite.c * MIB/statistics support in /proc/net/snmp6 and /proc/net/udplite6 * support for IPV6_ADDRFORM * aligned the coding style of protocol initialisation with af_inet6.c * made the error handling in udpv6_queue_rcv_skb consistent; to return `-1' on error on all error cases * consolidation of shared code [NET]: UDP-Lite Documentation and basic XFRM/Netfilter support The UDP-Lite patch further provides * API documentation for UDP-Lite * basic xfrm support * basic netfilter support for IPv4 and IPv6 (LOG target) Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
72a3effaf633bcae9034b7e176bdbd78d64a71db |
|
16-Nov-2006 |
Eric Dumazet <dada1@cosmosbay.com> |
[NET]: Size listen hash tables using backlog hint We currently allocate a fixed size (TCP_SYNQ_HSIZE=512) slots hash table for each LISTEN socket, regardless of various parameters (listen backlog for example) On x86_64, this means order-1 allocations (might fail), even for 'small' sockets, expecting few connections. On the contrary, a huge server wanting a backlog of 50000 is slowed down a bit because of this fixed limit. This patch makes the sizing of listen hash table a dynamic parameter, depending of : - net.core.somaxconn tunable (default is 128) - net.ipv4.tcp_max_syn_backlog tunable (default : 256, 1024 or 128) - backlog value given by user application (2nd parameter of listen()) For large allocations (bigger than PAGE_SIZE), we use vmalloc() instead of kmalloc(). We still limit memory allocation with the two existing tunables (somaxconn & tcp_max_syn_backlog). So for standard setups, this patch actually reduce RAM usage. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
3ca3c68e76686bee058937ade2b96f4de58ee434 |
|
28-Sep-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[IPV4]: struct ip_options annotations ->faddr is net-endian; annotated as such, variables inferred to be net-endian annotated. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
321efff7c3b7a26fa0522cb12b2af2ac82c05f1e |
|
28-Sep-2006 |
Olaf Kirch <okir@suse.de> |
[IPV4]: Fix order in inet_init failure path. This is just a minor buglet I came across by accident - when inet_init fails to register raw_prot, it jumps to out_unregister_udp_proto which should unregister UDP _and_ TCP. Signed-off-by: Olaf Kirch <okir@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
bada8adc4e6622764205921e6ba3f717aa03c882 |
|
27-Sep-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[IPV4]: ip_route_connect() ipv4 address arguments annotated annotated address arguments (port number left alone for now); ditto for inferred net-endian variables in callers. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
ef047f5e1085d6393748d1ee27d6327905f098dc |
|
01-Sep-2006 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET]: Use BUILD_BUG_ON() for checking size of skb->cb. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
ab32ea5d8a760e7dd4339634e95d7be24ee5b842 |
|
22-Sep-2006 |
Brian Haley <brian.haley@hp.com> |
[NET/IPV4/IPV6]: Change some sysctl variables to __read_mostly Change net/core, ipv4 and ipv6 sysctl variables to __read_mostly. Couldn't actually measure any performance increase while testing (.3% I consider noise), but seems like the right thing to do. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
bf0d52492d82ad70684e17c8a46942c13d0e140e |
|
22-Sep-2006 |
Dave Jones <davej@redhat.com> |
[NET]: Remove unnecessary config.h includes from net/ config.h is automatically included by kbuild these days. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
beb8d13bed80f8388f1a9a107d07ddd342e627e8 |
|
05-Aug-2006 |
Venkat Yekkirala <vyekkirala@TrustedCS.com> |
[MLSXFRM]: Add flow labeling This labels the flows that could utilize IPSec xfrms at the points the flows are defined so that IPSec policy and SAs at the right label can be used. The following protos are currently not handled, but they should continue to be able to use single-labeled IPSec like they currently do. ipmr ip_gre ipip igmp sit sctp ip6_tunnel (IPv6 over IPv6 tunnel device) decnet Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
a430a43d087545c96542ee64573237919109d370 |
|
08-Jul-2006 |
Herbert Xu <herbert@gondor.apana.org.au> |
[NET] gso: Fix up GSO packets with broken checksums Certain subsystems in the stack (e.g., netfilter) can break the partial checksum on GSO packets. Until they're fixed, this patch allows this to work by recomputing the partial checksums through the GSO mechanism. Once they've all been converted to update the partial checksum instead of clearing it, this workaround can be removed. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
bbcf467dab42ea3c85f368df346c82af2fbba665 |
|
04-Jul-2006 |
Herbert Xu <herbert@gondor.apana.org.au> |
[NET]: Verify gso_type too in gso_segment We don't want nasty Xen guests to pass a TCPv6 packet in with gso_type set to TCPv4 or even UDP (or a packet that's both TCP and UDP). Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
576a30eb6453439b3c37ba24455ac7090c247b5a |
|
27-Jun-2006 |
Herbert Xu <herbert@gondor.apana.org.au> |
[NET]: Added GSO header verification When GSO packets come from an untrusted source (e.g., a Xen guest domain), we need to verify the header integrity before passing it to the hardware. Since the first step in GSO is to verify the header, we can reuse that code by adding a new bit to gso_type: SKB_GSO_DODGY. Packets with this bit set can only be fed directly to devices with the corresponding bit NETIF_F_GSO_ROBUST. If the device doesn't have that bit, then the skb is fed to the GSO engine which will allow the packet to be sent to the hardware if it passes the header check. This patch changes the sg flag to a full features flag. The same method can be used to implement TSO ECN support. We simply have to mark packets with CWR set with SKB_GSO_ECN so that only hardware with a corresponding NETIF_F_TSO_ECN can accept them. The GSO engine can either fully segment the packet, or segment the first MTU and pass the rest to the hardware for further segmentation. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
f4c50d990dcf11a296679dc05de3873783236711 |
|
22-Jun-2006 |
Herbert Xu <herbert@gondor.apana.org.au> |
[NET]: Add software TSOv4 This patch adds the GSO implementation for IPv4 TCP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
a536e0778781c1f2ce38cf8e1d82ce258f0193c1 |
|
29-Apr-2006 |
Heiko Carstens <heiko.carstens@de.ibm.com> |
[IPV4]: inet_init() -> fs_initcall Convert inet_init to an fs_initcall to make sure its called before any device driver's initcall. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
543d9cfeec4d58ad3fd974db5531b06b6b95deb4 |
|
21-Mar-2006 |
Arnaldo Carvalho de Melo <acme@mandriva.com> |
[NET]: Identation & other cleanups related to compat_[gs]etsockopt cset No code changes, just tidying up, in some cases moving EXPORT_SYMBOLs to just after the function exported, etc. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
3fdadf7d27e3fbcf72930941884387d1f4936f04 |
|
21-Mar-2006 |
Dmitry Mishin <dim@openvz.org> |
[NET]: {get|set}sockopt compatibility layer This patch extends {get|set}sockopt compatibility layer in order to move protocol specific parts to their place and avoid huge universal net/compat.c file in the future. Signed-off-by: Dmitry Mishin <dim@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
4fc268d24ceb9f4150777c1b5b2b8e6214e56b2b |
|
11-Jan-2006 |
Randy Dunlap <rdunlap@xenotime.net> |
[PATCH] capable/capability.h (net/) net: Use <linux/capability.h> where capable() is used. Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
b5e5fa5e093e42cab4ee3d6dcbc4f450ad29a723 |
|
03-Jan-2006 |
Christoph Hellwig <hch@lst.de> |
[NET]: Add a dev_ioctl() fallback to sock_ioctl() Currently all network protocols need to call dev_ioctl as the default fallback in their ioctl implementations. This patch adds a fallback to dev_ioctl to sock_ioctl if the protocol returned -ENOIOCTLCMD. This way all the procotol ioctl handlers can be simplified and we don't need to export dev_ioctl. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
|
14c850212ed8f8cbb5972ad6b8812e08a0bc901c |
|
27-Dec-2005 |
Arnaldo Carvalho de Melo <acme@mandriva.com> |
[INET_SOCK]: Move struct inet_sock & helper functions to net/inet_sock.h To help in reducing the number of include dependencies, several files were touched as they were getting needed headers indirectly for stuff they use. Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had linux/dccp.h include twice. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
90ddc4f0470427df306f308ad03db6b6b21644b8 |
|
22-Dec-2005 |
Eric Dumazet <dada1@cosmosbay.com> |
[NET]: move struct proto_ops to const I noticed that some of 'struct proto_ops' used in the kernel may share a cache line used by locks or other heavily modified data. (default linker alignement is 32 bytes, and L1_CACHE_LINE is 64 or 128 at least) This patch makes sure a 'struct proto_ops' can be declared as const, so that all cpus can share all parts of it without false sharing. This is not mandatory : a driver can still use a read/write structure if it needs to (and eventually a __read_mostly) I made a global stubstitute to change all existing occurences to make them const. This should reduce the possibility of false sharing on SMP, and speedup some socket system calls. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
d83d8461f902c672bc1bd8fbc6a94e19f092da97 |
|
14-Dec-2005 |
Arnaldo Carvalho de Melo <acme@mandriva.com> |
[IP_SOCKGLUE]: Remove most of the tcp specific calls As DCCP needs to be called in the same spots. Now we have a member in inet_sock (is_icsk), set at sock creation time from struct inet_protosw->flags (if INET_PROTOSW_ICSK is set, like for TCP and DCCP) to see if a struct sock instance is a inet_connection_sock for places like the ones in ip_sockglue.c (v4 and v6) where we previously were looking if sk_type was SOCK_STREAM, that is insufficient because we now use the same code for DCCP, that has sk_type SOCK_DCCP. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
86c8f9d158f68538a971a47206a46a22c7479bac |
|
03-Dec-2005 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPV4] Fix EPROTONOSUPPORT error in inet_create There is a coding error in inet_create that causes it to always return ESOCKTNOSUPPORT. It should return EPROTONOSUPPORT when there are protocols registered for a given socket type but none of them match the requested protocol. This is based on a patch by Jayachandran C. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
a51482bde22f99c63fbbb57d5d46cc666384e379 |
|
08-Nov-2005 |
Jesper Juhl <jesper.juhl@gmail.com> |
[NET]: kfree cleanup From: Jesper Juhl <jesper.juhl@gmail.com> This is the net/ part of the big kfree cleanup patch. Remove pointless checks for NULL prior to calling kfree() in net/. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Arnaldo Carvalho de Melo <acme@conectiva.com.br> Acked-by: Marcel Holtmann <marcel@holtmann.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Andrew Morton <akpm@osdl.org>
|
cb7b593c2c808b32a1ea188599713c434b95f849 |
|
09-Sep-2005 |
Stephen Hemminger <shemminger@osdl.org> |
[IPV4] fib_trie: fix proc interface Create one iterator for walking over FIB trie, and use it for all the /proc functions. Add a /proc/net/route output for backwards compatibility with old applications. Make initialization of fib_trie same as fib_hash so no #ifdef is needed in af_inet.c Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=5209 Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
ba89966c1984513f4f2cc0a6c182266be44ddd03 |
|
26-Aug-2005 |
Eric Dumazet <dada1@cosmosbay.com> |
[NET]: use __read_mostly on kmem_cache_t , DEFINE_SNMP_STAT pointers This patch puts mostly read only data in the right section (read_mostly), to help sharing of these data between CPUS without memory ping pongs. On one of my production machine, tcp_statistics was sitting in a heavily modified cache line, so *every* SNMP update had to force a reload. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
20380731bc2897f2952ae055420972ded4cd786e |
|
16-Aug-2005 |
Arnaldo Carvalho de Melo <acme@mandriva.com> |
[NET]: Fix sparse warnings Of this type, mostly: CHECK net/ipv6/netfilter.c net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static? net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static? Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
bb97d31f5130d677644d9931ef38613d1164ec94 |
|
10-Aug-2005 |
Arnaldo Carvalho de Melo <acme@ghostprotocols.net> |
[INET]: Make inet_create try to load protocol modules Syntax is net-pf-PROTOCOL_FAMILY-PROTOCOL-SOCK_TYPE and if this fails net-pf-PROTOCOL_FAMILY-PROTOCOL. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
295f7324ff8d9ea58b4d3ec93b1aaa1d80e048a9 |
|
10-Aug-2005 |
Arnaldo Carvalho de Melo <acme@ghostprotocols.net> |
[ICSK]: Introduce reqsk_queue_prune from code in tcp_synack_timer With this we're very close to getting all of the current TCP refactorings in my dccp-2.6 tree merged, next changeset will export some functions needed by the current DCCP code and then dccp-2.6.git will be born! Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
0a5578cf8e5e045aaa68643c17ce885426697c6b |
|
10-Aug-2005 |
Arnaldo Carvalho de Melo <acme@ghostprotocols.net> |
[ICSK]: Generalise tcp_listen_{start,stop} This also moved inet_iif from tcp to inet_hashtables.h, as it is needed by the inet_lookup callers, perhaps this needs a bit of polishing, but for now seems fine. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
32519f11d38ea8f4f60896763bacec7db1760f9c |
|
10-Aug-2005 |
Arnaldo Carvalho de Melo <acme@ghostprotocols.net> |
[INET]: Introduce inet_sk_rebuild_header From tcp_v4_rebuild_header, that already was pretty generic, I only needed to use sk->sk_protocol instead of the hardcoded IPPROTO_TCP and establish the requirement that INET transport layer protocols that want to use this function map TCP_SYN_SENT to its equivalent state. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
e6848976b721eeb5551cd94673faafeef78d9f35 |
|
10-Aug-2005 |
Arnaldo Carvalho de Melo <acme@ghostprotocols.net> |
[NET]: Cleanup INET_REFCNT_DEBUG code Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c877efb207bf4629cfa97ac13412f7392a873485 |
|
19-Jul-2005 |
Stephen Hemminger <shemminger@osdl.org> |
[IPV4]: Fix up lots of little whitespace indentation stuff in fib_trie. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
30e224d76f34e041c30df66a4dcbeeb53556ea3f |
|
05-Jul-2005 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPV4]: Fix crash in ip_rcv while booting related to netconsole Makes IPv4 ip_rcv registration happen last in af_inet. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
19baf839ff4a8daa1f2a7400897094fc18e4f5e9 |
|
21-Jun-2005 |
Robert Olsson <Robert.Olsson@data.slu.se> |
[IPV4]: Add LC-Trie FIB lookup algorithm. Signed-off-by: Robert Olsson <Robert.Olsson@data.slu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
|
cdac4e07748934e37e415437055ed591aed9eb21 |
|
14-Jun-2005 |
Neil Horman <nhorman@redhat.com> |
[SCTP] Add support for ip_nonlocal_bind sysctl & IP_FREEBIND socket option Signed-off-by: Neil Horman <nhorman@redhat.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
02c30a84e6298b6b20a56f0896ac80b47839e134 |
|
06-May-2005 |
Jesper Juhl <juhl-lkml@dif.dk> |
[PATCH] update Ross Biro bouncing email address Ross moved. Remove the bad email address so people will find the correct one in ./CREDITS. Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
5523662c4cd585b892811d7bb3e25d9a787e19b3 |
|
26-Apr-2005 |
Al Viro <viro@parcelfarce.linux.theplanet.co.uk> |
[NET]: kill gratitious includes of major.h A lot of places in there are including major.h for no reason whatsoever. Removed. And yes, it still builds. The history of that stuff is often amusing. E.g. for net/core/sock.c the story looks so, as far as I've been able to reconstruct it: we used to need major.h in net/socket.c circa 1.1.early. In 1.1.13 that need had disappeared, along with register_chrdev(SOCKET_MAJOR, "socket", &net_fops) in sock_init(). Include had not. When 1.2 -> 1.3 reorg of net/* had moved a lot of stuff from net/socket.c to net/core/sock.c, this crap had followed... Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
b453257f057b834fdf9f4a6ad6133598b79bd982 |
|
26-Apr-2005 |
Al Viro <viro@www.linux.org.uk> |
[PATCH] kill gratitious includes of major.h under net/* A lot of places in there are including major.h for no reason whatsoever. Removed. And yes, it still builds. The history of that stuff is often amusing. E.g. for net/core/sock.c the story looks so, as far as I've been able to reconstruct it: we used to need major.h in net/socket.c circa 1.1.early. In 1.1.13 that need had disappeared, along with register_chrdev(SOCKET_MAJOR, "socket", &net_fops) in sock_init(). Include had not. When 1.2 -> 1.3 reorg of net/* had moved a lot of stuff from net/socket.c to net/core/sock.c, this crap had followed... Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 |
|
17-Apr-2005 |
Linus Torvalds <torvalds@ppc970.osdl.org> |
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
|