99a6ea48b591877d1cd6a51732c40a1d5321d961 |
|
31-Mar-2014 |
Lorenzo Colitti <lorenzo@google.com> |
net: core: Support UID-based routing. This contains the following commits: 1. cc2f522 net: core: Add a UID range to fib rules. 2. d7ed2bd net: core: Use the socket UID in routing lookups. 3. 2f9306a net: core: Add a RTA_UID attribute to routes. This is so that userspace can do per-UID route lookups. 4. 8e46efb net: ipv6: Use the UID in IPv6 PMTUD IPv4 PMTUD already does this because ipv4_sk_update_pmtu uses __build_flow_key, which includes the UID. Bug: 15413527 Change-Id: I81bd31dae655de9cce7d7a1f9a905dc1c2feba7c Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
|
f77d602124d865c38705df7fa25c03de9c284ad2 |
|
09-May-2013 |
Eric Dumazet <edumazet@google.com> |
ipv6: do not clear pinet6 field We have seen multiple NULL dereferences in __inet6_lookup_established() After analysis, I found that inet6_sk() could be NULL while the check for sk_family == AF_INET6 was true. Bug was added in linux-2.6.29 when RCU lookups were introduced in UDP and TCP stacks. Once an IPv6 socket, using SLAB_DESTROY_BY_RCU is inserted in a hash table, we no longer can clear pinet6 field. This patch extends logic used in commit fcbdf09d9652c891 ("net: fix nulls list corruptions in sk_prot_alloc") TCP/UDP/UDPLite IPv6 protocols provide their own .clear_sk() method to make sure we do not clear pinet6 field. At socket clone phase, we do not really care, as cloning the parent (non NULL) pinet6 is not adding a fatal race. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
6a5dc9e598fe90160fee7de098fa319665f5253e |
|
29-Apr-2013 |
Eric Dumazet <edumazet@google.com> |
net: Add MIB counters for checksum errors Add MIB counters for checksum errors in IP layer, and TCP/UDP/ICMP layers, to help diagnose problems. $ nstat -a | grep Csum IcmpInCsumErrors 72 0.0 TcpInCsumErrors 382 0.0 UdpInCsumErrors 463221 0.0 Icmp6InCsumErrors 75 0.0 Udp6InCsumErrors 173442 0.0 IpExtInCsumErrors 10884 0.0 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
44046a593eb770dbecdabf1c82bcd252f2a8337b |
|
19-Mar-2013 |
Tom Parkin <tparkin@katalix.com> |
udp: add encap_destroy callback Users of udp encapsulation currently have an encap_rcv callback which they can use to hook into the udp receive path. In situations where a encapsulation user allocates resources associated with a udp encap socket, it may be convenient to be able to also hook the proto .destroy operation. For example, if an encap user holds a reference to the udp socket, the destroy hook might be used to relinquish this reference. This patch adds a socket destroy hook into udp, which is set and enabled in the same way as the existing encap_rcv hook. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
842df0739776fc9af7ac15968b44415a31ba9be4 |
|
08-Mar-2013 |
Hannes Frederic Sowa <hannes@stressinduktion.org> |
ipv6: use newly introduced __ipv6_addr_needs_scope_id and ipv6_iface_scope_id This patch requires multicast interface-scoped addresses to supply a sin6_scope_id. Because the sin6_scope_id is now also correctly used in case of interface-scoped multicast traffic this enables one to use interface scoped addresses over interfaces which are not targeted by the default multicast route (the route has to be put there manually, though). getsockname() and getpeername() now return the correct sin6_scope_id in case of interface-local mc addresses. v2: a) rebased ontop of patch 1/4 (now uses ipv6_addr_props) v3: a) reverted changes for ipv6_addr_props v4: a) unchanged Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>dave Signed-off-by: David S. Miller <davem@davemloft.net>
|
73df66f8b1926c59cbc83000af6bf37ecc5509dd |
|
31-Jan-2013 |
Tom Parkin <tparkin@katalix.com> |
ipv6: rename datagram_send_ctl and datagram_recv_ctl The datagram_*_ctl functions in net/ipv6/datagram.c are IPv6-specific. Since datagram_send_ctl is publicly exported it should be appropriately named to reflect the fact that it's for IPv6 only. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
72289b96c943757220ccc681fe2e22b46e21aced |
|
22-Jan-2013 |
Tom Herbert <therbert@google.com> |
soreuseport: UDP/IPv6 implementation Motivation for soreuseport would be something like a DNS server. An alternative would be to recv on the same socket from multiple threads. As in the case of TCP, the load across these threads tends to be disproportionate and we also see a lot of contection on the socket lock. Note that SO_REUSEADDR already allows multiple UDP sockets to bind to the same port, however there is no provision to prevent hijacking and nothing to distribute packets across all the sockets sharing the same bound port. This patch does not change the semantics of SO_REUSEADDR, but provides usable functionality of it for unicast. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
acb3e04119fbf9145eb6d6bb707f6fb662ab4d3b |
|
07-Jan-2013 |
Cong Wang <amwang@redhat.com> |
ipv6: move csum_ipv6_magic() and udp6_csum_init() into static library As suggested by David, udp6_csum_init() is too big to be inlined, move it to ipv6 static library, net/ipv6/ip6_checksum.c. And the generic csum_ipv6_magic() too. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c6b641a4c6b32f39db678c2441cb1ef824110d74 |
|
15-Nov-2012 |
Vlad Yasevich <vyasevic@redhat.com> |
ipv6: Pull IPv6 GSO registration out of the module Sing GSO support is now separate, pull it out of the module and make it its own init call. Remove the cleanup functions as they are no longer called. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
5edbb07dc9474b7d4cd4391a2e6551ad067a0f96 |
|
15-Nov-2012 |
Vlad Yasevich <vyasevic@redhat.com> |
ipv6: Separate out UDP offload functionality Pull UDP GSO code into a separate file in preparation for moving the code out of the module. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
3336288a9feaa809839adbaf05778dc2f16665dc |
|
15-Nov-2012 |
Vlad Yasevich <vyasevic@redhat.com> |
ipv6: Switch to using new offload infrastructure. Switch IPv6 protocol to using the new GRO/GSO calls and data. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
8ca896cfdd17f32f5aa2747644733ebf3725360d |
|
15-Nov-2012 |
Vlad Yasevich <vyasevic@redhat.com> |
ipv6: Add new offload registration infrastructure. Create a new data structure for IPv6 protocols that holds GRO/GSO callbacks and a new array to track the protocols that register GRO/GSO. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
979402b16cde048ced4839e21cc49e0779352b80 |
|
06-Sep-2012 |
Eric Dumazet <edumazet@google.com> |
udp: increment UDP_MIB_INERRORS if copy failed In UDP recvmsg(), we miss an increase of UDP_MIB_INERRORS if the copy of skb to userspace failed for whatever reason. Reported-by: Shawn Bohrer <sbohrer@rgmadvisors.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
a7cb5a49bf64ba64864ae16a6be028f8b0d3cc06 |
|
24-May-2012 |
Eric W. Biederman <ebiederm@xmission.com> |
userns: Print out socket uids in a user namespace aware fashion. Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Sridhar Samudrala <sri@us.ibm.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
ec18d9a2691d69cd14b48f9b919fddcef28b7f5c |
|
12-Jul-2012 |
David S. Miller <davem@davemloft.net> |
ipv6: Add redirect support to all protocol icmp error handlers. Signed-off-by: David S. Miller <davem@davemloft.net>
|
22911fc581f6a241e2897a7a8603e97344a6ec82 |
|
27-Jun-2012 |
Eric Dumazet <edumazet@google.com> |
net: skb_free_datagram_locked() doesnt drop all packets dropwatch wrongly diagnose all received UDP packets as drops. This patch removes trace_kfree_skb() done in skb_free_datagram_locked(). Locations calling skb_free_datagram_locked() should do it on their own. As a result, drops are accounted on the right function. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
81aded24675ebda5de8a68843250ad15584ac38a |
|
15-Jun-2012 |
David S. Miller <davem@davemloft.net> |
ipv6: Handle PMTU in ICMP error handlers. One tricky issue on the ipv6 side vs. ipv4 is that the ICMP callouts to handle the error pass the 32-bit info cookie in network byte order whereas ipv4 passes it around in host byte order. Like the ipv4 side, we have two helper functions. One for when we have a socket context and one for when we do not. ip6ip6 tunnels are not handled here, because they handle PMTU events by essentially relaying another ICMP packet-too-big message back to the original sender. This patch allows us to get rid of rt6_do_pmtu_disc(). It handles all kinds of situations that simply cannot happen when we do the PMTU update directly using a fully resolved route. In fact, the "plen == 128" check in ip6_rt_update_pmtu() can very likely be removed or changed into a BUG_ON() check. We should never have a prefixed ipv6 route when we get there. Another piece of strange history here is that TCP and DCCP, unlike in ipv4, never invoke the update_pmtu() method from their ICMP error handlers. This is incredibly astonishing since this is the context where we have the most accurate context in which to make a PMTU update, namely we have a fully connected socket and associated cached socket route. Signed-off-by: David S. Miller <davem@davemloft.net>
|
3dde25988292864a582b4a9389b1ae835aa3fe80 |
|
19-May-2012 |
Jeffrin Jose <ahiliation@yahoo.co.in> |
net:ipv6:fixed space issues relating to operators. Fixed space issues relating to operators found by checkpatch.pl tool in net/ipv6/udp.c Signed-off-by: Jeffrin Jose <ahiliation@yahoo.co.in> Signed-off-by: David S. Miller <davem@davemloft.net>
|
9a52e97e24d6da744e6c3332b8dd478a4974983e |
|
19-May-2012 |
Jeffrin Jose <ahiliation@yahoo.co.in> |
net:ipv6:fixed a trailing white space issue. Fixed a trailing white space issue found by checkpatch.pl tool in net/ipv6/udp.c Signed-off-by: Jeffrin Jose <ahiliation@yahoo.co.in> Signed-off-by: David S. Miller <davem@davemloft.net>
|
d7f3f62167bc2299d9669888b493b6e6ba561c35 |
|
27-Apr-2012 |
Benjamin LaHaise <bcrl@kvack.org> |
net/ipv6/udp: UDP encapsulation: introduce encap_rcv hook into IPv6 Now that the sematics of udpv6_queue_rcv_skb() match IPv4's udp_queue_rcv_skb(), introduce the UDP encap_rcv() hook for IPv6. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
cb80ef463d1881757ade3197cdf875a2ff856651 |
|
27-Apr-2012 |
Benjamin LaHaise <bcrl@kvack.org> |
net/ipv6/udp: UDP encapsulation: move socket locking into udpv6_queue_rcv_skb() In order to make sure that when the encap_rcv() hook is introduced it is not called with the socket lock held, move socket locking from callers into udpv6_queue_rcv_skb(), matching what happens in IPv4. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
f7ad74fef3af6c6e2ef7f01c5589d77fe7db3d7c |
|
27-Apr-2012 |
Benjamin LaHaise <bcrl@kvack.org> |
net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb This is the first step in reworking the IPv6 UDP code to be structured more like the IPv4 UDP code. This patch creates __udpv6_queue_rcv_skb() with the equivalent sematics to __udp_queue_rcv_skb(), and wires it up to the backlog_rcv method. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
f545a38f74584cc7424cb74f792a00c6d2589485 |
|
23-Apr-2012 |
Eric Dumazet <edumazet@google.com> |
net: add a limit parameter to sk_add_backlog() sk_add_backlog() & sk_rcvqueues_full() hard coded sk_rcvbuf as the memory limit. We need to make this limit a parameter for TCP use. No functional change expected in this patch, all callers still using the old sk_rcvbuf limit. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
3f518bf745cbd6007d8069100fb9cb09e960c872 |
|
21-Feb-2012 |
Pavel Emelyanov <xemul@parallels.com> |
datagram: Add offset argument to __skb_recv_datagram This one is only considered for MSG_PEEK flag and the value pointed by it specifies where to start peeking bytes from. If the offset happens to point into the middle of the returned skb, the offset within this skb is put back to this very argument. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c4062dfc425e94290ac427a98d6b4721dd2bc91f |
|
08-Feb-2012 |
Erich E. Hoover <ehoover@mines.edu> |
ipv6: Implement IPV6_UNICAST_IF socket option. The IPV6_UNICAST_IF feature is the IPv6 compliment to IP_UNICAST_IF. Signed-off-by: Erich E. Hoover <ehoover@mines.edu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
fce823381e3c082ba1b2e15d5151d1aa8afdc9e9 |
|
09-Dec-2011 |
Pavel Emelyanov <xemul@parallels.com> |
udp: Export code sk lookup routines The UDP diag get_exact handler will require them to find a socket by provided net, [sd]addr-s, [sd]ports and device. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
59c2cdae2791c0b2ee13d148edc6b771e7e7953f |
|
01-Dec-2011 |
David S. Miller <davem@davemloft.net> |
Revert "udp: remove redundant variable" This reverts commit 81d54ec8479a2c695760da81f05b5a9fb2dbe40a. If we take the "try_again" goto, due to a checksum error, the 'len' has already been truncated. So we won't compute the same values as the original code did. Reported-by: paul bilke <fsmail@conspiracy.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
4e3fd7a06dc20b2d8ec6892233ad2012968fe7b6 |
|
21-Nov-2011 |
Alexey Dobriyan <adobriyan@gmail.com> |
net: remove ipv6_addr_copy() C assignment can handle struct in6_addr copying. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c8f44affb7244f2ac3e703cab13d55ede27621bb |
|
15-Nov-2011 |
Michał Mirosław <mirq-linux@rere.qmqm.pl> |
net: introduce and use netdev_features_t for device features sets v2: add couple missing conversions in drivers split unexporting netdev_fix_features() implemented %pNF convert sock::sk_route_(no?)caps Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
d826eb14ecef3574b6b3be55e5f4329f4a76fbf3 |
|
09-Nov-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
ipv4: PKTINFO doesnt need dst reference Le lundi 07 novembre 2011 à 15:33 +0100, Eric Dumazet a écrit : > At least, in recent kernels we dont change dst->refcnt in forwarding > patch (usinf NOREF skb->dst) > > One particular point is the atomic_inc(dst->refcnt) we have to perform > when queuing an UDP packet if socket asked PKTINFO stuff (for example a > typical DNS server has to setup this option) > > I have one patch somewhere that stores the information in skb->cb[] and > avoid the atomic_{inc|dec}(dst->refcnt). > OK I found it, I did some extra tests and believe its ready. [PATCH net-next] ipv4: IP_PKTINFO doesnt need dst reference When a socket uses IP_PKTINFO notifications, we currently force a dst reference for each received skb. Reader has to access dst to get needed information (rt_iif & rt_spec_dst) and must release dst reference. We also forced a dst reference if skb was put in socket backlog, even without IP_PKTINFO handling. This happens under stress/load. We can instead store the needed information in skb->cb[], so that only softirq handler really access dst, improving cache hit ratios. This removes two atomic operations per packet, and false sharing as well. On a benchmark using a mono threaded receiver (doing only recvmsg() calls), I can reach 720.000 pps instead of 570.000 pps. IP_PKTINFO is typically used by DNS servers, and any multihomed aware UDP application. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
73cb88ecb950ee67906d02354f781ea293bcf895 |
|
30-Oct-2011 |
Arjan van de Ven <arjan@linux.intel.com> |
net: make the tcp and udp file_operations for the /proc stuff const the tcp and udp code creates a set of struct file_operations at runtime while it can also be done at compile time, with the added benefit of then having these file operations be const. the trickiest part was to get the "THIS_MODULE" reference right; the naive method of declaring a struct in the place of registration would not work for this reason. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
ec0506dbe4e240ecd4c32bf74c84a88ce1ddb414 |
|
28-Aug-2011 |
Maciej Żenczykowski <maze@google.com> |
net: relax PKTINFO non local ipv6 udp xmit check Allow transparent sockets to be less restrictive about the source ip of ipv6 udp packets being sent. Google-Bug-Id: 5018138 Signed-off-by: Maciej Żenczykowski <maze@google.com> CC: "Erik Kline" <ek@google.com> CC: "Lorenzo Colitti" <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
bdeab991918663aed38757904219e8398214334c |
|
14-Aug-2011 |
Tom Herbert <therbert@google.com> |
rps: Add flag to skb to indicate rxhash is based on L4 tuple The l4_rxhash flag was added to the skb structure to indicate that the rxhash value was computed over the 4 tuple for the packet which includes the port information in the encapsulated transport packet. This is used by the stack to preserve the rxhash value in __skb_rx_tunnel. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
33d480ce6d43326e2541fd79b3548858a174ec3c |
|
11-Aug-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: cleanup some rcu_dereference_raw RCU api had been completed and rcu_access_pointer() or rcu_dereference_protected() are better than generic rcu_dereference_raw() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
87c48fa3b4630905f98268dde838ee43626a060c |
|
22-Jul-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
ipv6: make fragment identifications less predictable IPv6 fragment identification generation is way beyond what we use for IPv4 : It uses a single generator. Its not scalable and allows DOS attacks. Now inetpeer is IPv6 aware, we can use it to provide a more secure and scalable frag ident generator (per destination, instead of system wide) This patch : 1) defines a new secure_ipv6_id() helper 2) extends inet_getid() to provide 32bit results 3) extends ipv6_select_ident() with a new dest parameter Reported-by: Fernando Gont <fernando@gont.com.ar> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
9cfaa8def1c795a512bc04f2aec333b03724ca2e |
|
21-Jun-2011 |
Xufeng Zhang <xufeng.zhang@windriver.com> |
udp/recvmsg: Clear MSG_TRUNC flag when starting over for a new packet Consider this scenario: When the size of the first received udp packet is bigger than the receive buffer, MSG_TRUNC bit is set in msg->msg_flags. However, if checksum error happens and this is a blocking socket, it will goto try_again loop to receive the next packet. But if the size of the next udp packet is smaller than receive buffer, MSG_TRUNC flag should not be set, but because MSG_TRUNC bit is not cleared in msg->msg_flags before receive the next packet, MSG_TRUNC is still set, which is wrong. Fix this problem by clearing MSG_TRUNC flag when starting over for a new packet. Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
32c90254ed4a0c698caa0794ebb4de63fcc69631 |
|
21-Jun-2011 |
Xufeng Zhang <xufeng.zhang@windriver.com> |
ipv6/udp: Use the correct variable to determine non-blocking condition udpv6_recvmsg() function is not using the correct variable to determine whether or not the socket is in non-blocking operation, this will lead to unexpected behavior when a UDP checksum error occurs. Consider a non-blocking udp receive scenario: when udpv6_recvmsg() is called by sock_common_recvmsg(), MSG_DONTWAIT bit of flags variable in udpv6_recvmsg() is cleared by "flags & ~MSG_DONTWAIT" in this call: err = sk->sk_prot->recvmsg(iocb, sk, msg, size, flags & MSG_DONTWAIT, flags & ~MSG_DONTWAIT, &addr_len); i.e. with udpv6_recvmsg() getting these values: int noblock = flags & MSG_DONTWAIT int flags = flags & ~MSG_DONTWAIT So, when udp checksum error occurs, the execution will go to csum_copy_err, and then the problem happens: csum_copy_err: ............... if (flags & MSG_DONTWAIT) return -EAGAIN; goto try_again; ............... But it will always go to try_again as MSG_DONTWAIT has been cleared from flags at call time -- only noblock contains the original value of MSG_DONTWAIT, so the test should be: if (noblock) return -EAGAIN; This is also consistent with what the ipv4/udp code does. Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
71338aa7d050c86d8765cd36e46be514fb0ebbce |
|
23-May-2011 |
Dan Rosenberg <drosenberg@vsecurity.com> |
net: convert %p usage to %pK The %pK format specifier is designed to hide exposed kernel pointers, specifically via /proc interfaces. Exposing these pointers provides an easy target for kernel write vulnerabilities, since they reveal the locations of writable structures containing easily triggerable function pointers. The behavior of %pK depends on the kptr_restrict sysctl. If kptr_restrict is set to 0, no deviation from the standard %p behavior occurs. If kptr_restrict is set to 1, the default, if the current user (intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG (currently in the LSM tree), kernel pointers using %pK are printed as 0's. If kptr_restrict is set to 2, kernel pointers using %pK are printed as 0's regardless of privileges. Replacing with 0's was chosen over the default "(null)", which cannot be parsed by userland %p, which expects "(nil)". The supporting code for kptr_restrict and %pK are currently in the -mm tree. This patch converts users of %p in net/ to %pK. Cases of printing pointers to the syslog are not covered, since this would eliminate useful information for postmortem debugging and the reading of the syslog is already optionally protected by the dmesg_restrict sysctl. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Cc: James Morris <jmorris@namei.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Thomas Graf <tgraf@infradead.org> Cc: Eugene Teo <eugeneteo@kernel.org> Cc: Kees Cook <kees.cook@canonical.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David S. Miller <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
96339d6c490a32de35fa798ca7922d13a8538ecd |
|
22-Apr-2011 |
Shan Wei <shanwei@cn.fujitsu.com> |
net:use help function of skb_checksum_start_offset to calculate offset Although these are equivalent, but the skb_checksum_start_offset() is more readable. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
b71d1d426d263b0b6cb5760322efebbfc89d4463 |
|
22-Apr-2011 |
Eric Dumazet <eric.dumazet@gmail.com> |
inet: constify ip headers and in6_addr Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers where possible, to make code intention more obvious. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
a9cf73ea7ff78f52662c8658d93c226effbbedde |
|
20-Apr-2011 |
Shan Wei <shanwei@cn.fujitsu.com> |
ipv6: udp: fix the wrong headroom check At this point, skb->data points to skb_transport_header. So, headroom check is wrong. For some case:bridge(UFO is on) + eth device(UFO is off), there is no enough headroom for IPv6 frag head. But headroom check is always false. This will bring about data be moved to there prior to skb->head, when adding IPv6 frag header to skb. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
47482f132a689af168fae3055ff1899dfd032d3a |
|
06-Apr-2011 |
Neil Horman <nhorman@tuxdriver.com> |
ipv6: Enable RFS sk_rxhash tracking for ipv6 sockets (v2) properly record sk_rxhash in ipv6 sockets (v2) Noticed while working on another project that flows to sockets which I had open on a test systems weren't getting steered properly when I had RFS enabled. Looking more closely I found that: 1) The affected sockets were all ipv6 2) They weren't getting steered because sk->sk_rxhash was never set from the incomming skbs on that socket. This was occuring because there are several points in the IPv4 tcp and udp code which save the rxhash value when a new connection is established. Those calls to sock_rps_save_rxhash were never added to the corresponding ipv6 code paths. This patch adds those calls. Tested by myself to properly enable RFS functionalty on ipv6. Change notes: v2: Filtered UDP to only arm RFS on bound sockets (Eric Dumazet) Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
1958b856c1a59c0f1e892b92debb8c9fe4f364dc |
|
12-Mar-2011 |
David S. Miller <davem@davemloft.net> |
net: Put fl6_* macros to struct flowi6 and use them again. Signed-off-by: David S. Miller <davem@davemloft.net>
|
4c9483b2fb5d2548c3cc1fe03cdd4484ceeb5d1c |
|
12-Mar-2011 |
David S. Miller <davem@davemloft.net> |
ipv6: Convert to use flowi6 where applicable. Signed-off-by: David S. Miller <davem@davemloft.net>
|
6281dcc94a96bd73017b2baa8fa83925405109ef |
|
12-Mar-2011 |
David S. Miller <davem@davemloft.net> |
net: Make flowi ports AF dependent. Create two sets of port member accessors, one set prefixed by fl4_* and the other prefixed by fl6_* This will let us to create AF optimal flow instances. It will work because every context in which we access the ports, we have to be fully aware of which AF the flowi is anyways. Signed-off-by: David S. Miller <davem@davemloft.net>
|
1d28f42c1bd4bb2363d88df74d0128b4da135b4a |
|
12-Mar-2011 |
David S. Miller <davem@davemloft.net> |
net: Put flowi_* prefix on AF independent members of struct flowi I intend to turn struct flowi into a union of AF specific flowi structs. There will be a common structure that each variant includes first, much like struct sock_common. This is the first step to move in that direction. Signed-off-by: David S. Miller <davem@davemloft.net>
|
68d0c6d34d586a893292d4fb633a3bf8c547b222 |
|
01-Mar-2011 |
David S. Miller <davem@davemloft.net> |
ipv6: Consolidate route lookup sequences. Route lookups follow a general pattern in the ipv6 code wherein we first find the non-IPSEC route, potentially override the flow destination address due to ipv6 options settings, and then finally make an IPSEC search using either xfrm_lookup() or __xfrm_lookup(). __xfrm_lookup() is used when we want to generate a blackhole route if the key manager needs to resolve the IPSEC rules (in this case -EREMOTE is returned and the original 'dst' is left unchanged). Otherwise plain xfrm_lookup() is used and when asynchronous IPSEC resolution is necessary, we simply fail the lookup completely. All of these cases are encapsulated into two routines, ip6_dst_lookup_flow and ip6_sk_dst_lookup_flow. The latter of which handles unconnected UDP datagram sockets. Signed-off-by: David S. Miller <davem@davemloft.net>
|
04ed3e741d0f133e02bed7fa5c98edba128f90e7 |
|
25-Jan-2011 |
Michał Mirosław <mirq-linux@rere.qmqm.pl> |
net: change netdev->features to u32 Quoting Ben Hutchings: we presumably won't be defining features that can only be enabled on 64-bit architectures. Occurences found by `grep -r` on net/, drivers/net, include/ [ Move features and vlan_features next to each other in struct netdev, as per Eric Dumazet's suggestion -DaveM ] Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
fcbdf09d9652c8919dcf47072e3ae7dcb4eb98ac |
|
16-Dec-2010 |
Octavian Purdila <opurdila@ixiacom.com> |
net: fix nulls list corruptions in sk_prot_alloc Special care is taken inside sk_port_alloc to avoid overwriting skc_node/skc_nulls_node. We should also avoid overwriting skc_bind_node/skc_portaddr_node. The patch fixes the following crash: BUG: unable to handle kernel paging request at fffffffffffffff0 IP: [<ffffffff812ec6dd>] udp4_lib_lookup2+0xad/0x370 [<ffffffff812ecc22>] __udp4_lib_lookup+0x282/0x360 [<ffffffff812ed63e>] __udp4_lib_rcv+0x31e/0x700 [<ffffffff812bba45>] ? ip_local_deliver_finish+0x65/0x190 [<ffffffff812bbbf8>] ? ip_local_deliver+0x88/0xa0 [<ffffffff812eda35>] udp_rcv+0x15/0x20 [<ffffffff812bba45>] ip_local_deliver_finish+0x65/0x190 [<ffffffff812bbbf8>] ip_local_deliver+0x88/0xa0 [<ffffffff812bb2cd>] ip_rcv_finish+0x32d/0x6f0 [<ffffffff8128c14c>] ? netif_receive_skb+0x99c/0x11c0 [<ffffffff812bb94b>] ip_rcv+0x2bb/0x350 [<ffffffff8128c14c>] netif_receive_skb+0x99c/0x11c0 Signed-off-by: Leonard Crestez <lcrestez@ixiacom.com> Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c07224005dd3fe746246acadc9be652a588a4d7f |
|
09-Dec-2010 |
Jiri Pirko <jpirko@redhat.com> |
net/ipv6/udp.c: fix typo in flush_stack() skb1 should be passed as parameter to sk_rcvqueues_full() here. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
68835aba4d9b74e2f94106d13b6a4bddc447c4c8 |
|
30-Nov-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: optimize INET input path further Followup of commit b178bb3dfc30 (net: reorder struct sock fields) Optimize INET input path a bit further, by : 1) moving sk_refcnt close to sk_lock. This reduces number of dirtied cache lines by one on 64bit arches (and 64 bytes cache line size). 2) moving inet_daddr & inet_rcv_saddr at the beginning of sk (same cache line than hash / family / bound_dev_if / nulls_node) This reduces number of accessed cache lines in lookups by one, and dont increase size of inet and timewait socks. inet and tw sockets now share same place-holder for these fields. Before patch : offsetof(struct sock, sk_refcnt) = 0x10 offsetof(struct sock, sk_lock) = 0x40 offsetof(struct sock, sk_receive_queue) = 0x60 offsetof(struct inet_sock, inet_daddr) = 0x270 offsetof(struct inet_sock, inet_rcv_saddr) = 0x274 After patch : offsetof(struct sock, sk_refcnt) = 0x44 offsetof(struct sock, sk_lock) = 0x48 offsetof(struct sock, sk_receive_queue) = 0x68 offsetof(struct inet_sock, inet_daddr) = 0x0 offsetof(struct inet_sock, inet_rcv_saddr) = 0x4 compute_score() (udp or tcp) now use a single cache line per ignored item, instead of two. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c31504dc0d1dc853dcee509d9999169a9097a717 |
|
15-Nov-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
udp: use atomic_inc_not_zero_hint UDP sockets refcount is usually 2, unless an incoming frame is going to be queued in receive or backlog queue. Using atomic_inc_not_zero_hint() permits to reduce latency, because processor issues less memory transactions. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
0d7da9ddd9a4eb7808698d04b98bf9d62d02649b |
|
25-Oct-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: add __rcu annotation to sk_filter Add __rcu annotation to : (struct sock)->sk_filter And use appropriate rcu primitives to reduce sparse warnings if CONFIG_SPARSE_RCU_POINTER=y Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
aa976fc011efa1f0e3290c6c9addf7c20757f885 |
|
21-Oct-2010 |
Balazs Scheidler <bazsi@balabit.hu> |
tproxy: added udp6_lib_lookup function Just like with IPv4, we need access to the UDP hash table to look up local sockets, but instead of exporting the global udp_table, export a lookup function. Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
|
88440ae70eda83d0cc94148d404f4990c9f1289c |
|
21-Oct-2010 |
Balazs Scheidler <bazsi@balabit.hu> |
tproxy: added const specifiers to udp lookup functions The parameters for various UDP lookup functions were non-const, even though they could be const. TProxy has some const references and instead of downcasting it, I added const specifiers along the path. Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
|
719f835853a92f6090258114a72ffe41f09155cd |
|
08-Sep-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
udp: add rehash on connect() commit 30fff923 introduced in linux-2.6.33 (udp: bind() optimisation) added a secondary hash on UDP, hashed on (local addr, local port). Problem is that following sequence : fd = socket(...) connect(fd, &remote, ...) not only selects remote end point (address and port), but also sets local address, while UDP stack stored in secondary hash table the socket while its local address was INADDR_ANY (or ipv6 equivalent) Sequence is : - autobind() : choose a random local port, insert socket in hash tables [while local address is INADDR_ANY] - connect() : set remote address and port, change local address to IP given by a route lookup. When an incoming UDP frame comes, if more than 10 sockets are found in primary hash table, we switch to secondary table, and fail to find socket because its local address changed. One solution to this problem is to rehash datagram socket if needed. We add a new rehash(struct socket *) method in "struct proto", and implement this method for UDP v4 & v6, using a common helper. This rehashing only takes care of secondary hash table, since primary hash (based on local port only) is not changed. Reported-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
|
20c59de2e6b6bc74bbf714dcd4e720afe8d516cf |
|
01-Jun-2010 |
Arnaud Ebalard <arno@natisbad.org> |
ipv6: Refactor update of IPv6 flowi destination address for srcrt (RH) option There are more than a dozen occurrences of following code in the IPv6 stack: if (opt && opt->srcrt) { struct rt0_hdr *rt0 = (struct rt0_hdr *) opt->srcrt; ipv6_addr_copy(&final, &fl.fl6_dst); ipv6_addr_copy(&fl.fl6_dst, rt0->addr); final_p = &final; } Replace those with a helper. Note that the helper overrides final_p in all cases. This is ok as final_p was previously initialized to NULL when declared. Signed-off-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
b1faf5666438090a4dc4fceac8502edc7788b7e3 |
|
01-Jun-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: sock_queue_err_skb() dont mess with sk_forward_alloc Correct sk_forward_alloc handling for error_queue would need to use a backlog of frames that softirq handler could not deliver because socket is owned by user thread. Or extend backlog processing to be able to process normal and error packets. Another possibility is to not use mem charge for error queue, this is what I implemented in this patch. Note: this reverts commit 29030374 (net: fix sk_forward_alloc corruptions), since we dont need to lock socket anymore. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
2903037400a26e7c0cc93ab75a7d62abfacdf485 |
|
29-May-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: fix sk_forward_alloc corruptions As David found out, sock_queue_err_skb() should be called with socket lock hold, or we risk sk_forward_alloc corruption, since we use non atomic operations to update this field. This patch adds bh_lock_sock()/bh_unlock_sock() pair to three spots. (BH already disabled) 1) skb_tstamp_tx() 2) Before calling ip_icmp_error(), in __udp4_lib_err() 3) Before calling ipv6_icmp_error(), in __udp6_lib_err() Reported-by: Anton Blanchard <anton@samba.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
8a74ad60a546b13bd1096b2a61a7a5c6fd9ae17c |
|
26-May-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: fix lock_sock_bh/unlock_sock_bh This new sock lock primitive was introduced to speedup some user context socket manipulation. But it is unsafe to protect two threads, one using regular lock_sock/release_sock, one using lock_sock_bh/unlock_sock_bh This patch changes lock_sock_bh to be careful against 'owned' state. If owned is found to be set, we must take the slow path. lock_sock_bh() now returns a boolean to say if the slow path was taken, and this boolean is used at unlock_sock_bh time to call the appropriate unlock function. After this change, BH are either disabled or enabled during the lock_sock_bh/unlock_sock_bh protected section. This might be misleading, so we rename these functions to lock_sock_fast()/unlock_sock_fast(). Reported-by: Anton Blanchard <anton@samba.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Anton Blanchard <anton@samba.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
d6bc0149d8f2300bffa03ea6fea3ca39744277a6 |
|
06-May-2010 |
Bjørn Mork <bjorn@mork.no> |
ipv6: udp: make short packet logging consistent with ipv4 Adding addresses and ports to the short packet log message, like ipv4/udp.c does it, makes these messages a lot more useful: [ 822.182450] UDPv6: short packet: From [2001:db8:ffb4:3::1]:47839 23715/178 to [2001:db8:ffb4:3:5054:ff:feff:200]:1234 This requires us to drop logging in case pskb_may_pull() fails, which also is consistent with ipv4/udp.c Signed-off-by: Bjørn Mork <bjorn@mork.no> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
f84af32cbca70a3c6d30463dc08c7984af11c277 |
|
29-Apr-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: ip_queue_rcv_skb() helper When queueing a skb to socket, we can immediately release its dst if target socket do not use IP_CMSG_PKTINFO. tcp_data_queue() can drop dst too. This to benefit from a hot cache line and avoid the receiver, possibly on another cpu, to dirty this cache line himself. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
4b0b72f7dd617b13abd1b04c947e15873e011a24 |
|
28-Apr-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: speedup udp receive path Since commit 95766fff ([UDP]: Add memory accounting.), each received packet needs one extra sock_lock()/sock_release() pair. This added latency because of possible backlog handling. Then later, ticket spinlocks added yet another latency source in case of DDOS. This patch introduces lock_sock_bh() and unlock_sock_bh() synchronization primitives, avoiding one atomic operation and backlog processing. skb_free_datagram_locked() uses them instead of full blown lock_sock()/release_sock(). skb is orphaned inside locked section for proper socket memory reclaim, and finally freed outside of it. UDP receive path now take the socket spinlock only once. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c377411f2494a931ff7facdbb3a6839b1266bcf6 |
|
28-Apr-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: sk_add_backlog() take rmem_alloc into account Current socket backlog limit is not enough to really stop DDOS attacks, because user thread spend many time to process a full backlog each round, and user might crazy spin on socket lock. We should add backlog size and receive_queue size (aka rmem_alloc) to pace writers, and let user run without being slow down too much. Introduce a sk_rcvqueues_full() helper, to avoid taking socket lock in stress situations. Under huge stress from a multiqueue/RPS enabled NIC, a single flow udp receiver can now process ~200.000 pps (instead of ~100 pps before the patch) on a 8 core machine. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
4b340ae20d0e2366792abe70f46629e576adaf5e |
|
23-Apr-2010 |
Brian Haley <brian.haley@hp.com> |
IPv6: Complete IPV6_DONTFRAG support Finally add support to detect a local IPV6_DONTFRAG event and return the relevant data to the user if they've enabled IPV6_RECVPATHMTU on the socket. The next recvmsg() will return no data, but have an IPV6_PATHMTU as ancillary data. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
13b52cd44670e3359055e9918d0e766d89836425 |
|
23-Apr-2010 |
Brian Haley <brian.haley@hp.com> |
IPv6: Add dontfrag argument to relevant functions Add dontfrag argument to relevant functions for IPV6_DONTFRAG support, as well as allowing the value to be passed-in via ancillary cmsg data. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
0eae88f31ca2b88911ce843452054139e028771f |
|
21-Apr-2010 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: Fix various endianness glitches Sparse can help us find endianness bugs, but we need to make some cleanups to be able to more easily spot real bugs. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
1223c67c0938d2df309fde618bd82c87c8c1af04 |
|
08-Apr-2010 |
Jorge Boncompte [DTI2] <jorge@dti2.net> |
udp: fix for unicast RX path optimization Commits 5051ebd275de672b807c28d93002c2fb0514a3c9 and 5051ebd275de672b807c28d93002c2fb0514a3c9 ("ipv[46]: udp: optimize unicast RX path") broke some programs. After upgrading a L2TP server to 2.6.33 it started to fail, tunnels going up an down, after the 10th tunnel came up. My modified rp-l2tp uses a global unconnected socket bound to (INADDR_ANY, 1701) and one connected socket per tunnel after parameter negotiation. After ten sockets were open and due to mixed parameters to udp[46]_lib_lookup2() kernel started to drop packets. Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
5a0e3ad6af8660be21ca98a971cd00f331318c05 |
|
24-Mar-2010 |
Tejun Heo <tj@kernel.org> |
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
|
a3a858ff18a72a8d388e31ab0d98f7e944841a62 |
|
04-Mar-2010 |
Zhu Yi <yi.zhu@intel.com> |
net: backlog functions rename sk_add_backlog -> __sk_add_backlog sk_add_backlog_limited -> sk_add_backlog Signed-off-by: Zhu Yi <yi.zhu@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
55349790d7cbf0d381873a7ece1dcafcffd4aaa9 |
|
04-Mar-2010 |
Zhu Yi <yi.zhu@intel.com> |
udp: use limited socket backlog Make udp adapt to the limited socket backlog change. Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
3ffe533c87281b68d469b279ff3a5056f9c75862 |
|
18-Feb-2010 |
Alexey Dobriyan <adobriyan@gmail.com> |
ipv6: drop unused "dev" arg of icmpv6_send() Dunno, what was the idea, it wasn't used for a long time. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
81d54ec8479a2c695760da81f05b5a9fb2dbe40a |
|
10-Feb-2010 |
Gerrit Renker <gerrit@erg.abdn.ac.uk> |
udp: remove redundant variable The variable 'copied' is used in udp_recvmsg() to emphasize that the passed 'len' is adjusted to fit the actual datagram length. But the same can be done by adjusting 'len' directly. This patch thus removes the indirection. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
2c8c1e7297e19bdef3c178c3ea41d898a7716e3e |
|
17-Jan-2010 |
Alexey Dobriyan <adobriyan@gmail.com> |
net: spread __net_init, __net_exit __net_init/__net_exit are apparently not going away, so use them to full extent. In some cases __net_init was removed, because it was called from __net_exit code. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
856540ee3116ac04a49bc06c2f30f54dd3faf7db |
|
09-Nov-2009 |
Brian Haley <brian.haley@hp.com> |
IPv6: use ipv6_addr_v4mapped() Change udp6_portaddr_hash() to use ipv6_addr_v4mapped() inline instead of ipv6_addr_type(). Signed-off-by: Brian Haley <brian.haley@hp.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
30fff9231fad757c061285e347b33c5149c2c2e4 |
|
09-Nov-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
udp: bind() optimisation UDP bind() can be O(N^2) in some pathological cases. Thanks to secondary hash tables, we can make it O(N) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
f6b8f32ca71406de718391369490f6b1e81fe0bb |
|
08-Nov-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
udp: multicast RX should increment SNMP/sk_drops counter in allocation failures When skb_clone() fails, we should increment sk_drops and SNMP counters. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
a1ab77f97ed03f5dae66ae4c64375beffab83772 |
|
08-Nov-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
ipv6: udp: Optimise multicast reception IPV6 UDP multicast rx path is a bit complex and can hold a spinlock for a long time. Using a small (32 or 64 entries) stack of socket pointers can help to perform expensive operations (skb_clone(), udp_queue_rcv_skb()) outside of the lock, in most cases. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
fddc17defa22d8caba1cdfb2e22b50bb4b9f35c0 |
|
08-Nov-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
ipv6: udp: optimize unicast RX path We first locate the (local port) hash chain head If few sockets are in this chain, we proceed with previous lookup algo. If too many sockets are listed, we take a look at the secondary (port, address) hash chain. We choose the shortest chain and proceed with a RCU lookup on the elected chain. But, if we chose (port, address) chain, and fail to find a socket on given address, we must try another lookup on (port, in6addr_any) chain to find sockets not bound to a particular IP. -> No extra cost for typical setups, where the first lookup will probabbly be performed. RCU lookups everywhere, we dont acquire spinlock. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
d4cada4ae1c012815f95fa507eb86a0ae9d607d7 |
|
08-Nov-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
udp: split sk_hash into two u16 hashes Union sk_hash with two u16 hashes for udp (no extra memory taken) One 16 bits hash on (local port) value (the previous udp 'hash') One 16 bits hash on (local address, local port) values, initialized but not yet used. This second hash is using jenkin hash for better distribution. Because the 'port' is xored later, a partial hash is performed on local address + net_hash_mix(net) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
13f18aa05f5abe135f47b6417537ae2b2fedc18c |
|
06-Nov-2009 |
Eric Paris <eparis@redhat.com> |
net: drop capability from protocol definitions struct can_proto had a capability field which wasn't ever used. It is dropped entirely. struct inet_protosw had a capability field which can be more clearly expressed in the code by just checking if sock->type = SOCK_RAW. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
9d410c796067686b1e032d54ce475b7055537138 |
|
30-Oct-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: fix sk_forward_alloc corruption On UDP sockets, we must call skb_free_datagram() with socket locked, or risk sk_forward_alloc corruption. This requirement is not respected in SUNRPC. Add a convenient helper, skb_free_datagram_locked() and use it in SUNRPC Reported-by: Francis Moreau <francis.moro@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
8edf19c2fe028563fc6ea9cb1995b8ee4172d4b6 |
|
15-Oct-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: sk_drops consolidation part 2 - skb_kill_datagram() can increment sk->sk_drops itself, not callers. - UDP on IPV4 & IPV6 dropped frames (because of bad checksum or policy checks) increment sk_drops Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c720c7e8383aff1cb219bddf474ed89d850336e3 |
|
15-Oct-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
inet: rename some inet_sock fields In order to have better cache layouts of struct sock (separate zones for rx/tx paths), we need this preliminary patch. Goal is to transfert fields used at lookup time in the first read-mostly cache line (inside struct sock_common) and move sk_refcnt to a separate cache line (only written by rx path) This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr, sport and id fields. This allows a future patch to define these fields as macros, like sk_refcnt, without name clashes. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
766e9037cc139ee25ed93ee5ad11e1450c4b99f6 |
|
15-Oct-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: sk_drops consolidation sock_queue_rcv_skb() can update sk_drops itself, removing need for callers to take care of it. This is more consistent since sock_queue_rcv_skb() also reads sk_drops when queueing a skb. This adds sk_drops managment to many protocols that not cared yet. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
3b885787ea4112eaa80945999ea0901bf742707f |
|
12-Oct-2009 |
Neil Horman <nhorman@tuxdriver.com> |
net: Generalize socket rx gap / receive queue overflow cmsg Create a new socket level option to report number of queue overflows Recently I augmented the AF_PACKET protocol to report the number of frames lost on the socket receive queue between any two enqueued frames. This value was exported via a SOL_PACKET level cmsg. AFter I completed that work it was requested that this feature be generalized so that any datagram oriented socket could make use of this option. As such I've created this patch, It creates a new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a SOL_SOCKET level cmsg that reports the nubmer of times the sk_receive_queue overflowed between any two given frames. It also augments the AF_PACKET protocol to take advantage of this new feature (as it previously did not touch sk->sk_drops, which this patch uses to record the overflow count). Tested successfully by me. Notes: 1) Unlike my previous patch, this patch simply records the sk_drops value, which is not a number of drops between packets, but rather a total number of drops. Deltas must be computed in user space. 2) While this patch currently works with datagram oriented protocols, it will also be accepted by non-datagram oriented protocols. I'm not sure if thats agreeable to everyone, but my argument in favor of doing so is that, for those protocols which aren't applicable to this option, sk_drops will always be zero, and reporting no drops on a receive queue that isn't used for those non-participating protocols seems reasonable to me. This also saves us having to code in a per-protocol opt in mechanism. 3) This applies cleanly to net-next assuming that commit 977750076d98c7ff6cbda51858bb5a5894a9d9ab (my af packet cmsg patch) is reverted Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
f86dcc5aa8c7908f2c287e7a211228df599e3e71 |
|
07-Oct-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
udp: dynamically size hash tables at boot time UDP_HTABLE_SIZE was initialy defined to 128, which is a bit small for several setups. 4000 active UDP sockets -> 32 sockets per chain in average. An incoming frame has to lookup all sockets to find best match, so long chains hurt latency. Instead of a fixed size hash table that cant be perfect for every needs, let UDP stack choose its table size at boot time like tcp/ip route, using alloc_large_system_hash() helper Add an optional boot parameter, uhash_entries=x so that an admin can force a size between 256 and 65536 if needed, like thash_entries and rhash_entries. dmesg logs two new lines : [ 0.647039] UDP hash table entries: 512 (order: 0, 4096 bytes) [ 0.647099] UDP Lite hash table entries: 512 (order: 0, 4096 bytes) Maximal size on 64bit arches would be 65536 slots, ie 1 MBytes for non debugging spinlocks. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
b301e82cf8104cfddbe5452ebe625bab49597c64 |
|
07-Oct-2009 |
Brian Haley <brian.haley@hp.com> |
IPv6: use ipv6_addr_set_v4mapped() Might as well use the ipv6_addr_set_v4mapped() inline we created last year. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
51953d5bc43e468f24cc573a45cde1d32af129b8 |
|
05-Oct-2009 |
Brian Haley <brian.haley@hp.com> |
Use sk_mark for IPv6 routing lookups Atis Elsts wrote: > Not sure if there is need to fill the mark from skb in tunnel xmit functions. In any case, it's not done for GRE or IPIP tunnels at the moment. Ok, I'll just drop that part, I'm not sure what should be done in this case. > Also, in this patch you are doing that for SIT (v6-in-v4) tunnels only, and not doing it for v4-in-v6 or v6-in-v6 tunnels. Any reason for that? I just sent that patch out too quickly, here's a better one with the updates. Add support for IPv6 route lookups using sk_mark. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
b7058842c940ad2c08dd829b21e5c92ebe3b8758 |
|
01-Oct-2009 |
David S. Miller <davem@davemloft.net> |
net: Make setsockopt() optlen be unsigned. This provides safety against negative optlen at the type level instead of depending upon (sometimes non-trivial) checks against this sprinkled all over the the place, in each and every implementation. Based upon work done by Arjan van de Ven and feedback from Linus Torvalds. Signed-off-by: David S. Miller <davem@davemloft.net>
|
41135cc836a1abeb12ca1416bdb29e87ad021153 |
|
14-Sep-2009 |
Alexey Dobriyan <adobriyan@gmail.com> |
net: constify struct inet6_protocol Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
6ce9e7b5fe3195d1ae6e3a0753d4ddcac5cd699e |
|
03-Sep-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
ip: Report qdisc packet drops Christoph Lameter pointed out that packet drops at qdisc level where not accounted in SNMP counters. Only if application sets IP_RECVERR, drops are reported to user (-ENOBUFS errors) and SNMP counters updated. IP_RECVERR is used to enable extended reliable error message passing, but these are not needed to update system wide SNMP stats. This patch changes things a bit to allow SNMP counters to be updated, regardless of IP_RECVERR being set or not on the socket. Example after an UDP tx flood # netstat -s ... IP: 1487048 outgoing packets dropped ... Udp: ... SndbufErrors: 1487048 send() syscalls, do however still return an OK status, to not break applications. Note : send() manual page explicitly says for -ENOBUFS error : "The output queue for a network interface was full. This generally indicates that the interface has stopped sending, but may be caused by transient congestion. (Normally, this does not occur in Linux. Packets are just silently dropped when a device queue overflows.) " This is not true for IP_RECVERR enabled sockets : a send() syscall that hit a qdisc drop returns an ENOBUFS error. Many thanks to Christoph, David, and last but not least, Alexey ! Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
e651f03afe833326faa0abe55948c1c6cfd0b8ac |
|
09-Aug-2009 |
Gerrit Renker <gerrit@erg.abdn.ac.uk> |
inet6: Conversion from u8 to int This replaces assignments of the type "int on LHS" = "u8 on RHS" with simpler code. The LHS can express all of the unsigned right hand side values, hence the assigned value can not be negative. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
ba73542585a4a3c8a708f502e62e6e63dd74b66c |
|
09-Jul-2009 |
Sridhar Samudrala <sri@us.ibm.com> |
udpv6: Handle large incoming UDP/IPv6 packets and support software UFO - validate and forward GSO UDP/IPv6 packets from untrusted sources. - do software UFO if the outgoing device doesn't support UFO. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
493c6be3fedfe24aa676949b237b9b104d911abf |
|
09-Jul-2009 |
Sridhar Samudrala <sri@us.ibm.com> |
udpv6: Fix HW checksum support for outgoing UFO packets - add HW checksum support for outgoing large UDP/IPv6 packets destined for a UFO enabled device. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
d5fdd6babcfc2b0e6a8da1acf492a69fb54b4c47 |
|
23-Jun-2009 |
Brian Haley <brian.haley@hp.com> |
ipv6: Use correct data types for ICMPv6 type and code Change all the code that deals directly with ICMPv6 type and code values to use u8 instead of a signed int as that's the actual data type. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
31e6d363abcd0d05766c82f1a9c905a4c974a199 |
|
18-Jun-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: correct off-by-one write allocations reports commit 2b85a34e911bf483c27cfdd124aeb1605145dc80 (net: No more expensive sock_hold()/sock_put() on each tx) changed initial sk_wmem_alloc value. We need to take into account this offset when reporting sk_wmem_alloc to user, in PROC_FS files or various ioctls (SIOCOUTQ/TIOCOUTQ) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
adf30907d63893e4208dfe3f5c88ae12bc2f25d5 |
|
02-Jun-2009 |
Eric Dumazet <eric.dumazet@gmail.com> |
net: skb->dst accessors Define three accessors to get/set dst attached to a skb struct dst_entry *skb_dst(const struct sk_buff *skb) void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst) void skb_dst_drop(struct sk_buff *skb) This one should replace occurrences of : dst_release(skb->dst) skb->dst = NULL; Delete skb->dst field Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
499923c7a3254971873e55a1690d07d3700eea47 |
|
09-Apr-2009 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
ipv6: Fix NULL pointer dereference with time-wait sockets Commit b2f5e7cd3dee2ed721bf0675e1a1ddebb849aee6 (ipv6: Fix conflict resolutions during ipv6 binding) introduced a regression where time-wait sockets were not treated correctly. This resulted in the following: BUG: unable to handle kernel NULL pointer dereference at 0000000000000062 IP: [<ffffffff805d7d61>] ipv4_rcv_saddr_equal+0x61/0x70 ... Call Trace: [<ffffffffa033847b>] ipv6_rcv_saddr_equal+0x1bb/0x250 [ipv6] [<ffffffffa03505a8>] inet6_csk_bind_conflict+0x88/0xd0 [ipv6] [<ffffffff805bb18e>] inet_csk_get_port+0x1ee/0x400 [<ffffffffa0319b7f>] inet6_bind+0x1cf/0x3a0 [ipv6] [<ffffffff8056d17c>] ? sockfd_lookup_light+0x3c/0xd0 [<ffffffff8056ed49>] sys_bind+0x89/0x100 [<ffffffff80613ea2>] ? trace_hardirqs_on_thunk+0x3a/0x3c [<ffffffff8020bf9b>] system_call_fastpath+0x16/0x1b Tested-by: Brian Haley <brian.haley@hp.com> Tested-by: Ed Tomlinson <edt@aei.ca> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
b2f5e7cd3dee2ed721bf0675e1a1ddebb849aee6 |
|
24-Mar-2009 |
Vlad Yasevich <vladislav.yasevich@hp.com> |
ipv6: Fix conflict resolutions during ipv6 binding The ipv6 version of bind_conflict code calls ipv6_rcv_saddr_equal() which at times wrongly identified intersections between addresses. It particularly broke down under a few instances and caused erroneous bind conflicts. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
9f690db7ff4cb32493c0b0b13334cc4f5fd49a6b |
|
16-Dec-2008 |
Yang Hongyang <yanghy@cn.fujitsu.com> |
ipv6: fix the outgoing interface selection order in udpv6_sendmsg() 1.When no interface is specified in an IPV6_PKTINFO ancillary data item, the interface specified in an IPV6_PKTINFO sticky optionis is used. RFC3542: 6.7. Summary of Outgoing Interface Selection This document and [RFC-3493] specify various methods that affect the selection of the packet's outgoing interface. This subsection summarizes the ordering among those in order to ensure deterministic behavior. For a given outgoing packet on a given socket, the outgoing interface is determined in the following order: 1. if an interface is specified in an IPV6_PKTINFO ancillary data item, the interface is used. 2. otherwise, if an interface is specified in an IPV6_PKTINFO sticky option, the interface is used. Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
52479b623d3d41df84c499325b6a8c7915413032 |
|
26-Nov-2008 |
Alexey Dobriyan <adobriyan@gmail.com> |
netns xfrm: lookup in netns Pass netns to xfrm_lookup()/__xfrm_lookup(). For that pass netns to flow_cache_lookup() and resolver callback. Take it from socket or netdevice. Stub DECnet to init_net. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
88ab1932eac721c6e7336708558fa5ed02c85c80 |
|
17-Nov-2008 |
Eric Dumazet <dada1@cosmosbay.com> |
udp: Use hlist_nulls in UDP RCU code This is a straightforward patch, using hlist_nulls infrastructure. RCUification already done on UDP two weeks ago. Using hlist_nulls permits us to avoid some memory barriers, both at lookup time and delete time. Patch is large because it adds new macros to include/net/sock.h. These macros will be used by TCP & DCCP in next patch. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
0856f93958c488f0cc656be53c26dfd20663bdb3 |
|
02-Nov-2008 |
Wei Yongjun <yjwei@cn.fujitsu.com> |
udp: Fix the SNMP counter of UDP_MIB_INERRORS UDP packets received in udpv6_recvmsg() are not only IPv6 UDP packets, but also have IPv4 UDP packets, so when do the counter of UDP_MIB_INERRORS in udpv6_recvmsg(), we should check whether the packet is a IPv6 UDP packet or a IPv4 UDP packet. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
f26ba1751145edbf52b2c89a40e389f2fbdfc1af |
|
02-Nov-2008 |
Wei Yongjun <yjwei@cn.fujitsu.com> |
udp: Fix the SNMP counter of UDP_MIB_INDATAGRAMS If UDP echo is sent to xinetd/echo-dgram, the UDP reply will be received at the sender. But the SNMP counter of UDP_MIB_INDATAGRAMS will be not increased, UDP6_MIB_INDATAGRAMS will be increased instead. Endpoint A Endpoint B UDP Echo request -----------> (IPv4, Dst port=7) <---------- UDP Echo Reply (IPv4, Src port=7) This bug is come from this patch cb75994ec311b2cd50e5205efdcc0696abd6675d. It do counter UDP[6]_MIB_INDATAGRAMS until udp[v6]_recvmsg. Because xinetd used IPv6 socket to receive UDP messages, thus, when received UDP packet, the UDP6_MIB_INDATAGRAMS will be increased in function udpv6_recvmsg() even if the packet is a IPv4 UDP packet. This patch fixed the problem. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
920a46115ca3fa88990276d98520abab85495b2d |
|
02-Nov-2008 |
Eric Dumazet <dada1@cosmosbay.com> |
udp: multicast packets need to check namespace Current UDP multicast delivery is not namespace aware. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
96631ed16c514cf8b28fab991a076985ce378c26 |
|
29-Oct-2008 |
Eric Dumazet <dada1@cosmosbay.com> |
udp: introduce sk_for_each_rcu_safenext() Corey Minyard found a race added in commit 271b72c7fa82c2c7a795bc16896149933110672d (udp: RCU handling for Unicast packets.) "If the socket is moved from one list to another list in-between the time the hash is calculated and the next field is accessed, and the socket has moved to the end of the new list, the traversal will not complete properly on the list it should have, since the socket will be on the end of the new list and there's not a way to tell it's on a new list and restart the list traversal. I think that this can be solved by pre-fetching the "next" field (with proper barriers) before checking the hash." This patch corrects this problem, introducing a new sk_for_each_rcu_safenext() macro. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
271b72c7fa82c2c7a795bc16896149933110672d |
|
29-Oct-2008 |
Eric Dumazet <dada1@cosmosbay.com> |
udp: RCU handling for Unicast packets. Goals are : 1) Optimizing handling of incoming Unicast UDP frames, so that no memory writes should happen in the fast path. Note: Multicasts and broadcasts still will need to take a lock, because doing a full lockless lookup in this case is difficult. 2) No expensive operations in the socket bind/unhash phases : - No expensive synchronize_rcu() calls. - No added rcu_head in socket structure, increasing memory needs, but more important, forcing us to use call_rcu() calls, that have the bad property of making sockets structure cold. (rcu grace period between socket freeing and its potential reuse make this socket being cold in CPU cache). David did a previous patch using call_rcu() and noticed a 20% impact on TCP connection rates. Quoting Cristopher Lameter : "Right. That results in cacheline cooldown. You'd want to recycle the object as they are cache hot on a per cpu basis. That is screwed up by the delayed regular rcu processing. We have seen multiple regressions due to cacheline cooldown. The only choice in cacheline hot sensitive areas is to deal with the complexity that comes with SLAB_DESTROY_BY_RCU or give up on RCU." - Because udp sockets are allocated from dedicated kmem_cache, use of SLAB_DESTROY_BY_RCU can help here. Theory of operation : --------------------- As the lookup is lockfree (using rcu_read_lock()/rcu_read_unlock()), special attention must be taken by readers and writers. Use of SLAB_DESTROY_BY_RCU is tricky too, because a socket can be freed, reused, inserted in a different chain or in worst case in the same chain while readers could do lookups in the same time. In order to avoid loops, a reader must check each socket found in a chain really belongs to the chain the reader was traversing. If it finds a mismatch, lookup must start again at the begining. This *restart* loop is the reason we had to use rdlock for the multicast case, because we dont want to send same message several times to the same socket. We use RCU only for fast path. Thus, /proc/net/udp still takes spinlocks. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
645ca708f936b2fbeb79e52d7823e3eb2c0905f8 |
|
29-Oct-2008 |
Eric Dumazet <dada1@cosmosbay.com> |
udp: introduce struct udp_table and multiple spinlocks UDP sockets are hashed in a 128 slots hash table. This hash table is protected by *one* rwlock. This rwlock is readlocked each time an incoming UDP message is handled. This rwlock is writelocked each time a socket must be inserted in hash table (bind time), or deleted from this table (close time) This is not scalable on SMP machines : 1) Even in read mode, lock() and unlock() are atomic operations and must dirty a contended cache line, shared by all cpus. 2) A writer might be starved if many readers are 'in flight'. This can happen on a machine with some NIC receiving many UDP messages. User process can be delayed a long time at socket creation/dismantle time. This patch prepares RCU migration, by introducing 'struct udp_table and struct udp_hslot', and using one spinlock per chain, to reduce contention on central rwlock. Introducing one spinlock per chain reduces latencies, for port randomization on heavily loaded UDP servers. This also speedup bindings to specific ports. udp_lib_unhash() was uninlined, becoming to big. Some cleanups were done to ease review of following patch (RCUification of UDP Unicast lookups) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
23542618deb77cfed312842fe8c41ed19fb16470 |
|
07-Oct-2008 |
KOVACS Krisztian <hidden@sch.bme.hu> |
inet: Don't lookup the socket if there's a socket attached to the skb Use the socket cached in the skb if it's present. Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
607c4aaf03041c8bd81555a0218050c0f895088e |
|
07-Oct-2008 |
KOVACS Krisztian <hidden@sch.bme.hu> |
inet: Add udplib_lookup_skb() helpers To be able to use the cached socket reference in the skb during input processing we add a new set of lookup functions that receive the skb on their argument list. Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
d97106ea52aa57e63ff40d04479016836bbb5a4e |
|
09-Aug-2008 |
Herbert Xu <herbert@gondor.apana.org.au> |
udp: Drop socket lock for encapsulated packets The socket lock is there to protect the normal UDP receive path. Encapsulation UDP sockets don't need that protection. In fact the locking is deadly for them as they may contain another UDP packet within, possibly with the same addresses. Also the nested bit was copied from TCP. TCP needs it because of accept(2) spawning sockets. This simply doesn't apply to UDP so I've removed it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
ef28d1a20f9f18ebf1be15ef6f097a76f9a63499 |
|
06-Jul-2008 |
Pavel Emelyanov <xemul@openvz.org> |
MIB: add struct net to UDP6_INC_STATS_BH Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
235b9f7ac53489011d32efeb89e12e308fdd2c64 |
|
06-Jul-2008 |
Pavel Emelyanov <xemul@openvz.org> |
MIB: add struct net to UDP6_INC_STATS_USER As simple as the patch #1 in this set. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
cb61cb9b8b5ef6c2697d84e5015e314626eb2fba |
|
18-Jun-2008 |
Eric Dumazet <dada1@cosmosbay.com> |
udp: sk_drops handling In commits 33c732c36169d7022ad7d6eb474b0c9be43a2dc1 ([IPV4]: Add raw drops counter) and a92aa318b4b369091fd80433c80e62838db8bc1c ([IPV6]: Add raw drops counter), Wang Chen added raw drops counter for /proc/net/raw & /proc/net/raw6 This patch adds this capability to UDP sockets too (/proc/net/udp & /proc/net/udp6). This means that 'RcvbufErrors' errors found in /proc/net/snmp can be also be examined for each udp socket. # grep Udp: /proc/net/snmp Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors Udp: 23971006 75 899420 16390693 146348 0 # cat /proc/net/udp sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt --- uid timeout inode ref pointer drops 75: 00000000:02CB 00000000:0000 07 00000000:00000000 00:00000000 00000000 --- 0 0 2358 2 ffff81082a538c80 0 111: 00000000:006F 00000000:0000 07 00000000:00000000 00:00000000 00000000 --- 0 0 2286 2 ffff81042dd35c80 146348 In this example, only port 111 (0x006F) was flooded by messages that user program could not read fast enough. 146348 messages were lost. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
19c7578fb22b0aef103222cae9b522f03ae489d6 |
|
17-Jun-2008 |
Pavel Emelyanov <xemul@openvz.org> |
udp: add struct net argument to udp_hashfn Every caller already has this one. The new argument is currently unused, but this will be fixed shortly. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
e31634931d00081c75e3fb3f3ec51a50dbf108bb |
|
17-Jun-2008 |
Pavel Emelyanov <xemul@openvz.org> |
udp: provide a struct net pointer for __udp[46]_lib_mcast_deliver They both calculate the hash chain, but currently do not have a struct net pointer, so pass one there via additional argument, all the more so their callers already have such. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
d6266281f8175e3ad68c28b20a609b278b47ade5 |
|
17-Jun-2008 |
Pavel Emelyanov <xemul@openvz.org> |
udp: introduce a udp_hashfn function Currently the chain to store a UDP socket is calculated with simple (x & (UDP_HTABLE_SIZE - 1)). But taking net into account would make this calculation a bit more complex, so moving it into a function would help. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
7d06b2e053d2d536348e3a0f6bb02982a41bea37 |
|
15-Jun-2008 |
Brian Haley <brian.haley@hp.com> |
net: change proto destroy method to return void Change struct proto destroy function pointer to return void. Noticed by Al Viro. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
0b040829952d84bf2a62526f0e24b624e0699447 |
|
11-Jun-2008 |
Adrian Bunk <bunk@kernel.org> |
net: remove CVS keywords This patch removes CVS keywords that weren't updated for a long time from comments. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
36d926b94a9908937593e5669162305a071b9cc3 |
|
04-Jun-2008 |
Denis V. Lunev <den@openvz.org> |
[IPV6]: inet_sk(sk)->cork.opt leak IPv6 UDP sockets wth IPv4 mapped address use udp_sendmsg to send the data actually. In this case ip_flush_pending_frames should be called instead of ip6_flush_pending_frames. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
91e1908f569dd96a25a3947de8771e6cc93999dd |
|
04-Jun-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] NETNS: Handle ancillary data in appropriate namespace. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
a3c960899e042bc1c2b730a2115fa32da7802039 |
|
03-Jun-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] UDP: Possible dst leak in udpv6_sendmsg. ip6_sk_dst_lookup returns held dst entry. It should be released on all paths beyond this point. Add missed release when up->pending is set. Bug report and initial patch by Denis V. Lunev <den@openvz.org>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Denis V. Lunev <den@openvz.org>
|
876c7f41961dc5172b03cbf2dca65f05003f28a0 |
|
11-Apr-2008 |
Brian Haley <Brian.Haley@hp.com> |
[IPv6]: Change IPv6 unspecified destination address to ::1 for raw and un-connected sockets This patch fixes a difference between IPv4 and IPv6 when sending packets to the unspecified address (either 0.0.0.0 or ::) when using raw or un-connected UDP sockets. There are two cases where IPv6 either fails to send anything, or sends with the destination address set to ::. For example: --> ping -c1 0.0.0.0 PING 0.0.0.0 (127.0.0.1) 56(84) bytes of data. 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.032 ms --> ping6 -c1 :: PING ::(::) 56 data bytes ping: sendmsg: Invalid argument Doing a sendto("0.0.0.0") reveals: 10:55:01.495090 IP localhost.32780 > localhost.7639: UDP, length 100 Doing a sendto("::") reveals: 10:56:13.262478 IP6 fe80::217:8ff:fe7d:4718.32779 > ::.7639: UDP, length 100 If you issue a connect() first in the UDP case, it will be sent to ::1, similar to what happens with TCP. This restores the BSD-ism. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
4ad96d39a2d74c1b2e400b602da2594f5098fc26 |
|
29-Mar-2008 |
Denis V. Lunev <den@openvz.org> |
[UDP]: Remove owner from udp_seq_afinfo. Move it to udp_seq_afinfo->seq_fops as should be. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
3ba9441bdf07370670a684e6d95dfc523476677f |
|
29-Mar-2008 |
Denis V. Lunev <den@openvz.org> |
[UDP]: Place file operations directly into udp_seq_afinfo. No need to have separate never-used variable. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
dda61925f84d89e2f2a4597d6298a05a2bc05c20 |
|
29-Mar-2008 |
Denis V. Lunev <den@openvz.org> |
[UDP]: Move seq_ops from udp_iter_state to udp_seq_afinfo. No need to create seq_operations for each instance of 'netstat'. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
bdcde3d71a67e97f25e851f3ca97c9bb5ef03e7f |
|
29-Mar-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[SOCK]: Drop inuse pcounter from struct proto (v2). An uppercut - do not use the pcounter on struct proto. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
878628fbf2589eb24357e42027d5f54b1dafd3c8 |
|
25-Mar-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET] NETNS: Omit namespace comparision without CONFIG_NET_NS. Introduce an inline net_eq() to compare two namespaces. Without CONFIG_NET_NS, since no namespace other than &init_net exists, it is always 1. We do not need to convert 1) inline vs inline and 2) inline vs &init_net comparisons. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
3b1e0a655f8eba44ab1ee2a1068d169ccfb853b9 |
|
25-Mar-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS. Introduce per-sock inlines: sock_net(), sock_net_set() and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
c346dca10840a874240c78efe3f39acf4312a1f2 |
|
25-Mar-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS. Introduce per-net_device inlines: dev_net(), dev_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
6b75d0908185bf853b188afa6f269426f6554c5b |
|
10-Mar-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Optimize hop-limit determination. Last part of hop-limit determination is always: hoplimit = dst_metric(dst, RTAX_HOPLIMIT); if (hoplimit < 0) hoplimit = ipv6_get_hoplimit(dst->dev). Let's consolidate it as ip6_dst_hoplimit(dst). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
6ba5a3c52da00015e739469e3b00cd6d0d4c5c67 |
|
23-Mar-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[UDP]: Make full use of proto.h.udp_hash innovation. After this we have only udp_lib_get_port to get the port and two stubs for ipv4 and ipv6. No difference in udp and udplite except for initialized h.udp_hash member. I tried to find a graceful way to drop the only difference between udp_v4_get_port and udp_v6_get_port (i.e. the rcv_saddr comparison routine), but adding one more callback on the struct proto didn't appear such :( Maybe later. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
0c96d8c50bffb7f02690dd8a8cf1adb8e07e100f |
|
21-Mar-2008 |
Daniel Lezcano <dlezcano@fr.ibm.com> |
[NETNS][IPV6] udp6 - make proc per namespace The proc init/exit functions take a new network namespace parameter in order to register/unregister /proc/net/udp6 for a namespace. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
b8ad0cbc58f703972e9e37c4e2a8081dd7e6a551 |
|
07-Mar-2008 |
Daniel Lezcano <dlezcano@fr.ibm.com> |
[NETNS][IPV6] mcast - handle several network namespace This patch make use of the network namespace information at the right places to handle the multicast for several network namespaces. It makes the socket control to be per namespace too. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
db8dac20d5199307dcfcf4e01dac4bda5edf9e89 |
|
07-Mar-2008 |
David S. Miller <davem@davemloft.net> |
[UDP]: Revert udplite and code split. This reverts commit db1ed684f6c430c4cdad67d058688b8a1b5e607c ("[IPV6] UDP: Rename IPv6 UDP files."), commit 8be8af8fa4405652e6c0797db5465a4be8afb998 ("[IPV4] UDP: Move IPv4-specific bits to other file.") and commit e898d4db2749c6052072e9bc4448e396cbdeb06a ("[UDP]: Allow users to configure UDP-Lite."). First, udplite is of such small cost, and it is a core protocol just like TCP and normal UDP are. We spent enormous amounts of effort to make udplite share as much code with core UDP as possible. All of that work is less valuable if we're just going to slap a config option on udplite support. It is also causing build failures, as reported on linux-next, showing that the changeset was not tested very well. In fact, this is the second build failure resulting from the udplite change. Finally, the config options provided was a bool, instead of a modular option. Meaning the udplite code does not even get build tested by allmodconfig builds, and furthermore the user is not presented with a reasonable modular build option which is particularly needed by distribution vendors. Signed-off-by: David S. Miller <davem@davemloft.net>
|
db1ed684f6c430c4cdad67d058688b8a1b5e607c |
|
21-Feb-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] UDP: Rename IPv6 UDP files. Rename net/ipv6/udp.c to net/ipv6/udp_ipv6.c Rename net/ipv6/udplite.c to net/ipv6/udplite_ipv6.c. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
e898d4db2749c6052072e9bc4448e396cbdeb06a |
|
29-Feb-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[UDP]: Allow users to configure UDP-Lite. Let's give users an option for disabling UDP-Lite (~4K). old: | text data bss dec hex filename | 286498 12432 6072 305002 4a76a net/ipv4/built-in.o | 193830 8192 3204 205226 321aa net/ipv6/ipv6.o new (without UDP-Lite): | text data bss dec hex filename | 284086 12136 5432 301654 49a56 net/ipv4/built-in.o | 191835 7832 3076 202743 317f7 net/ipv6/ipv6.o Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
fa4d3c6210380c55cf7f295d28fd981fbcbb828c |
|
31-Jan-2008 |
Pavel Emelyanov <xemul@openvz.org> |
[NETNS]: Udp sockets per-net lookup. Add the net parameter to udp_get_port family of calls and udp_lookup one and use it to filter sockets. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
77d0d350e96c9453be255d8eff8dc97555710b17 |
|
22-Jan-2008 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6] UDP,UDPLITE: Sparse: {__udp6_lib,udp,udplite}_err() are of void. Fix following sparse warnings: | net/ipv6/udp.c:262:2: warning: returning void-valued expression | net/ipv6/udplite.c:29:2: warning: returning void-valued expression Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
95766fff6b9a78d11fc2d3812dd035381690b55d |
|
31-Dec-2007 |
Hideo Aoki <haoki@redhat.com> |
[UDP]: Add memory accounting. Signed-off-by: Takahiro Yasui <tyasui@redhat.com> Signed-off-by: Hideo Aoki <haoki@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
9055e051b8d4b266054fe511a65a9888d30fa64f |
|
14-Dec-2007 |
Herbert Xu <herbert@gondor.apana.org.au> |
[UDP]: Move udp_stats_in6 into net/ipv4/udp.c Now that external users may increment the counters directly, we need to ensure that udp_stats_in6 is always available. Otherwise we'd either have to requrie the external users to be built as modules or ipv6 to be built-in. This isn't too bad because udp_stats_in6 is just a pair of pointers plus an EXPORT, e.g., just 40 (16 + 24) bytes on x86-64. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
09f7709f4929666006931f1d4efc498a6d419bbc |
|
13-Dec-2007 |
Daniel Lezcano <dlezcano@fr.ibm.com> |
[IPV6]: fix section mismatch warnings Removed useless and buggy __exit section in the different ipv6 subsystems. Otherwise they will be called inside an init section during rollbacking in case of an error in the protocol initialization. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
bb72845e699d3c84e5f861b51db686107a51dea5 |
|
13-Dec-2007 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPSEC]: Make callers of xfrm_lookup to use XFRM_LOOKUP_WAIT This patch converts all callers of xfrm_lookup that used an explicit value of 1 to indiciate blocking to use the new flag XFRM_LOOKUP_WAIT. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
7f4e4868f3ce0e946f116c28fa4fe033be5e4ba9 |
|
11-Dec-2007 |
Daniel Lezcano <dlezcano@fr.ibm.com> |
[IPV6]: make the protocol initialization to return an error code This patchset makes the different protocols to return an error code, so the af_inet6 module can check the initialization was correct or not. The raw6 was taken into account to be consistent with the rest of the protocols, but the registration is at the same place. Because the raw6 has its own init function, the proto and the ops structure can be moved inside the raw6.c file. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
a59322be07c964e916d15be3df473fb7ba20c41e |
|
05-Dec-2007 |
Herbert Xu <herbert@gondor.apana.org.au> |
[UDP]: Only increment counter on first peek/recv The previous move of the the UDP inDatagrams counter caused each peek of the same packet to be counted separately. This may be undesirable. This patch fixes this by adding a bit to sk_buff to record whether this packet has already been seen through skb_recv_datagram. We then only increment the counter when the packet is seen for the first time. The only dodgy part is the fact that skb_recv_datagram doesn't have a good way of returning this new bit of information. So I've added a new function __skb_recv_datagram that does return this and made skb_recv_datagram a wrapper around it. The plan is to eventually replace all uses of skb_recv_datagram with this new function at which time it can be renamed its proper name. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
1781f7f5804e52ee2d35328b129602146a8d8254 |
|
11-Dec-2007 |
Herbert Xu <herbert@gondor.apana.org.au> |
[UDP]: Restore missing inDatagrams increments The previous move of the the UDP inDatagrams counter caused the counting of encapsulated packets, SUNRPC data (as opposed to call) packets and RXRPC packets to go missing. This patch restores all of these. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
27ab2568649d5ba6c5a20212079b7c4f6da4ca0d |
|
05-Dec-2007 |
Herbert Xu <herbert@gondor.apana.org.au> |
[UDP]: Avoid repeated counting of checksum errors due to peeking Currently it is possible for two processes to peek on the same socket and end up incrementing the error counter twice for the same packet. This patch fixes it by making skb_kill_datagram return whether it succeeded in unlinking the packet and only incrementing the counter if it did. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
b2bf1e2659b1cba5e65f81781cfd530be447f80b |
|
03-Dec-2007 |
Wang Chen <wangchen@cn.fujitsu.com> |
[UDP]: Clean up for IS_UDPLITE macro Since we have macro IS_UDPLITE, we can use it. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
cb75994ec311b2cd50e5205efdcc0696abd6675d |
|
03-Dec-2007 |
Wang Chen <wangchen@cn.fujitsu.com> |
[UDP]: Defer InDataGrams increment until recvmsg() does checksum Thanks dave, herbert, gerrit, andi and other people for your discussion about this problem. UdpInDatagrams can be confusing because it counts packets that might be dropped later. Move UdpInDatagrams into recvmsg() as allowed by the RFC. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c5a432f1a18b4b2efe691dd6bbb30d86a281f783 |
|
06-Nov-2007 |
Eric Dumazet <dada1@cosmosbay.com> |
[IPV6]: Use the {DEFINE|REF}_PROTO_INUSE infrastructure Trivial patch to make "tcpv6,udpv6,udplitev6,rawv6" protocols uses the fast "inuse sockets" infrastructure Each protocol use then a static percpu var, instead of a dynamic one. This saves some ram and some cpu cycles Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
7a0ff716c2282f4b8d89c65850a4f17399628154 |
|
06-Nov-2007 |
Mitsuru Chinen <mitch@linux.vnet.ibm.com> |
[IPv6] SNMP: Restore Udp6InErrors incrementation As the checksum verification is postponed till user calls recv or poll, the inrementation of Udp6InErrors counter should be also postponed. Currently, it is postponed in non-blocking operation case. However it should be postponed in all case like the IPv4 code. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
e5bbef20e017efcb10700398cc048c49b98628e0 |
|
15-Oct-2007 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPV6]: Replace sk_buff ** with sk_buff * in input handlers With all the users of the double pointers removed from the IPv6 input path, this patch converts all occurances of sk_buff ** to sk_buff * in IPv6 input handlers. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
e773e4faa19c54c2f32ddd16add2919588488bd9 |
|
25-Aug-2007 |
Brian Haley <brian.haley@hp.com> |
[IPV6]: Add v4mapped address inline Add v4mapped address inline to avoid calls to ipv6_addr_type(). Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
cd562c9859f648d78224e9fc0dafa5a3d5000fdb |
|
15-Sep-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Just increment OutDatagrams once per a datagram. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
df2bc459a3ad71f8b44c358bf7169acf9caf4acd |
|
06-Jun-2007 |
David S. Miller <davem@sunset.davemloft.net> |
[UDP]: Revert 2-pass hashing changes. This reverts changesets: 6aaf47fa48d3c44280810b1b470261d340e4ed87 b7b5f487ab39bc10ed0694af35651a03d9cb97ff de34ed91c4ffa4727964a832c46e624dd1495cf5 fc038410b4b1643766f8033f4940bcdb1dace633 There are still some correctness issues recently discovered which do not have a known fix that doesn't involve doing a full hash table scan on port bind. So revert for now. Signed-off-by: David S. Miller <davem@davemloft.net>
|
14e50e57aedb2a89cf79b77782879769794cab7b |
|
25-May-2007 |
David S. Miller <davem@sunset.davemloft.net> |
[XFRM]: Allow packet drops during larval state resolution. The current IPSEC rule resolution behavior we have does not work for a lot of people, even though technically it's an improvement from the -EAGAIN buisness we had before. Right now we'll block until the key manager resolves the route. That works for simple cases, but many folks would rather packets get silently dropped until the key manager resolves the IPSEC rules. We can't tell these folks to "set the socket non-blocking" because they don't have control over the non-block setting of things like the sockets used to resolve DNS deep inside of the resolver libraries in libc. With that in mind I coded up the patch below with some help from Herbert Xu which provides packet-drop behavior during larval state resolution, controllable via sysctl and off by default. This lays the framework to either: 1) Make this default at some point or... 2) Move this logic into xfrm{4,6}_policy.c and implement the ARP-like resolution queue we've all been dreaming of. The idea would be to queue packets to the policy, then once the larval state is resolved by the key manager we re-resolve the route and push the packets out. The packets would timeout if the rule didn't get resolved in a certain amount of time. Signed-off-by: David S. Miller <davem@davemloft.net>
|
fc038410b4b1643766f8033f4940bcdb1dace633 |
|
10-May-2007 |
David S. Miller <davem@sunset.davemloft.net> |
[UDP]: Fix AF-specific references in AF-agnostic code. __udp_lib_port_inuse() cannot make direct references to inet_sk(sk)->rcv_saddr as that is ipv4 specific state and this code is used by ipv6 too. Use an operations vector to solve this, and this also paves the way for ipv6 support for non-wild saddr hashing in UDP. Signed-off-by: David S. Miller <davem@davemloft.net>
|
604763722c655c7e3f31ecf6f7b4dafcd26a7a15 |
|
09-Apr-2007 |
Herbert Xu <herbert@gondor.apana.org.au> |
[NET]: Treat CHECKSUM_PARTIAL as CHECKSUM_UNNECESSARY When a transmitted packet is looped back directly, CHECKSUM_PARTIAL maps to the semantics of CHECKSUM_UNNECESSARY. Therefore we should treat it as such in the stack. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
4bedb45203eab92a87b4c863fe2d0cded633427f |
|
13-Mar-2007 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
[SK_BUFF]: Introduce udp_hdr(), remove skb->h.uh Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
0660e03f6b18f19b6bbafe7583265a51b90daf36 |
|
26-Apr-2007 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
[SK_BUFF]: Introduce ipv6_hdr(), remove skb->nh.ipv6h Now the skb->nh union has just one member, .raw, i.e. it is just like the skb->mac union, strange, no? I'm just leaving it like that till the transport layer is done with, when we'll rename skb->mac.raw to skb->mac_header (or ->mac_header_offset?), ditto for ->{h,nh}. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
eddc9ec53be2ecdbf4efe0efd4a83052594f0ac0 |
|
21-Apr-2007 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
[SK_BUFF]: Introduce ip_hdr(), remove skb->nh.iph Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
add459aa1afe05472abc96f6a29aefd0c84e73d6 |
|
09-Mar-2007 |
Stephen Hemminger <shemminger@linux-foundation.org> |
[UDP]: ipv6 style cleanup Fix whitespace around keywords. Eliminate unnecessary ()'s on return statements. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
759e5d006462d53fb708daa8284b4ad909415da1 |
|
26-Mar-2007 |
Herbert Xu <herbert@gondor.apana.org.au> |
[UDP]: Clean up UDP-Lite receive checksum This patch eliminates some duplicate code for the verification of receive checksums between UDP-Lite and UDP. It does this by introducing __skb_checksum_complete_head which is identical to __skb_checksum_complete_head apart from the fact that it takes a length parameter rather than computing the first skb->len bytes. As a result UDP-Lite will be able to use hardware checksum offload for packets which do not use partial coverage checksums. It also means that UDP-Lite loopback no longer does unnecessary checksum verification. If any NICs start support UDP-Lite this would also start working automatically. This patch removes the assumption that msg_flags has MSG_TRUNC clear upon entry in recvmsg. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
1ab6eb62b02e0949a392fb19bf31ba59ae1022b1 |
|
07-Mar-2007 |
Herbert Xu <herbert@gondor.apana.org.au> |
[UDP6]: Restore sk_filter optimisation This reverts the changeset [IPV6]: UDPv6 checksum. We always need to check UDPv6 checksum because it is mandatory. The sk_filter optimisation has nothing to do whether we verify the checksum. It simply postpones it to the point when the user calls recv or poll. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
b59e139bbd5c789700aa9cefe7eb6590bc516b86 |
|
30-Mar-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPv6]: Fix incorrect length check in rawv6_sendmsg() In article <20070329.142644.70222545.davem@davemloft.net> (at Thu, 29 Mar 2007 14:26:44 -0700 (PDT)), David Miller <davem@davemloft.net> says: > From: Sridhar Samudrala <sri@us.ibm.com> > Date: Thu, 29 Mar 2007 14:17:28 -0700 > > > The check for length in rawv6_sendmsg() is incorrect. > > As len is an unsigned int, (len < 0) will never be TRUE. > > I think checking for IPV6_MAXPLEN(65535) is better. > > > > Is it possible to send ipv6 jumbo packets using raw > > sockets? If so, we can remove this check. > > I don't see why such a limitation against jumbo would exist, > does anyone else? > > Thanks for catching this Sridhar. A good compiler should simply > fail to compile "if (x < 0)" when 'x' is an unsigned type, don't > you think :-) Dave, we use "int" for returning value, so we should fix this anyway, IMHO; we should not allow len > INT_MAX. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
cd354f1ae75e6466a7e31b727faede57a1f89ca5 |
|
14-Feb-2007 |
Tim Schmielau <tim@physik3.uni-rostock.de> |
[PATCH] remove many unneeded #includes of sched.h After Al Viro (finally) succeeded in removing the sched.h #include in module.h recently, it makes sense again to remove other superfluous sched.h includes. There are quite a lot of files which include it but don't actually need anything defined in there. Presumably these includes were once needed for macros that used to live in sched.h, but moved to other header files in the course of cleaning it up. To ease the pain, this time I did not fiddle with any header files and only removed #includes from .c-files, which tend to cause less trouble. Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha, arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig, allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all configs in arch/arm/configs on arm. I also checked that no new warnings were introduced by the patch (actually, some warnings are removed that were emitted by unnecessarily included header files). Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
95f30b336b944e3e418f825044b4793d9e9aac09 |
|
10-Feb-2007 |
Eric Dumazet <dada1@cosmosbay.com> |
[UDP]: UDP can use sk_hash to speedup lookups In a prior patch, I introduced a sk_hash field (__sk_common.skc_hash) to let tcp lookups use one cache line per unmatched entry instead of two. We can also use sk_hash to speedup UDP part as well. We store in sk_hash the hnum value, and use sk->sk_hash (same cache line than 'next' pointer), instead of inet->num (different cache line) Note : We still have a false sharing problem for SMP machines, because sock_hold(sock) dirties the cache line containing the 'next' pointer. Not counting the udp_hash_lock rwlock. (did someone mentioned RCU ? :) ) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
1ab1457c42bc078e5a9becd82a7f9f940b55c53a |
|
09-Feb-2007 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[NET] IPV6: Fix whitespace errors. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
8eb9086f21c73b38b5ca27558db4c91d62d0e70b |
|
08-Feb-2007 |
David S. Miller <davem@sunset.davemloft.net> |
[IPV4/IPV6]: Always wait for IPSEC SA resolution in socket contexts. Do this even for non-blocking sockets. This avoids the silly -EAGAIN that applications can see now, even for non-blocking sockets in some cases (f.e. connect()). With help from Venkat Tekkirala. Signed-off-by: David S. Miller <davem@davemloft.net>
|
4c0a6cb0db19de411c4bf7fcdc79d4c7c4ccafb1 |
|
27-Nov-2006 |
Gerrit Renker <gerrit@erg.abdn.ac.uk> |
[UDP(-Lite)]: consolidate v4 and v6 get|setsockopt code This patch consolidates set/getsockopt code between UDP(-Lite) v4 and 6. The justification is that UDP(-Lite) is a transport-layer protocol and therefore the socket option code (at least in theory) should be AF-independent. Furthermore, there is the following code reduplication: * do_udp{,v6}_getsockopt is 100% identical between v4 and v6 * do_udp{,v6}_setsockopt is identical up to the following differerence --v4 in contrast to v4 additionally allows the experimental encapsulation types UDP_ENCAP_ESPINUDP and UDP_ENCAP_ESPINUDP_NON_IKE --the remainder is identical between v4 and v6 I believe that this difference is of little relevance. The advantages in not duplicating twice almost completely identical code. The patch further simplifies the interface of udp{,v6}_push_pending_frames, since for the second argument (struct udp_sock *up) it always holds that up = udp_sk(sk); where sk is the first function argument. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
8e5200f54062b8af0ed1d186ea0f113854786d89 |
|
21-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[NET]: Fix assorted misannotations (from md5 and udplite merges). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
7d9e9b3df491d5e1c3ed76c5cff7ace6094124c1 |
|
15-Nov-2006 |
David S. Miller <davem@sunset.davemloft.net> |
[IPV6]: udp.c build fix Signed-off-by: David S. Miller <davem@davemloft.net>
|
f6ab028804bdc580fe0915494dbf31f5ea473ca7 |
|
16-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[NET]: Make mangling a checksum (0 -> 0xffff on the wire) explicit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
868c86bcb5bdea7ed8d45979b17bb919af9254db |
|
15-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[NET]: annotate csum_ipv6_magic() callers in net/* Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
e69a4adc669fe210817ec50ae3f9a7a5ad62d4e8 |
|
15-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[IPV6]: Misc endianness annotations. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
ba4e58eca8aa9473b44fdfd312f26c4a2e7798b3 |
|
27-Nov-2006 |
Gerrit Renker <gerrit@erg.abdn.ac.uk> |
[NET]: Supporting UDP-Lite (RFC 3828) in Linux This is a revision of the previously submitted patch, which alters the way files are organized and compiled in the following manner: * UDP and UDP-Lite now use separate object files * source file dependencies resolved via header files net/ipv{4,6}/udp_impl.h * order of inclusion files in udp.c/udplite.c adapted accordingly [NET/IPv4]: Support for the UDP-Lite protocol (RFC 3828) This patch adds support for UDP-Lite to the IPv4 stack, provided as an extension to the existing UDPv4 code: * generic routines are all located in net/ipv4/udp.c * UDP-Lite specific routines are in net/ipv4/udplite.c * MIB/statistics support in /proc/net/snmp and /proc/net/udplite * shared API with extensions for partial checksum coverage [NET/IPv6]: Extension for UDP-Lite over IPv6 It extends the existing UDPv6 code base with support for UDP-Lite in the same manner as per UDPv4. In particular, * UDPv6 generic and shared code is in net/ipv6/udp.c * UDP-Litev6 specific extensions are in net/ipv6/udplite.c * MIB/statistics support in /proc/net/snmp6 and /proc/net/udplite6 * support for IPV6_ADDRFORM * aligned the coding style of protocol initialisation with af_inet6.c * made the error handling in udpv6_queue_rcv_skb consistent; to return `-1' on error on all error cases * consolidation of shared code [NET]: UDP-Lite Documentation and basic XFRM/Netfilter support The UDP-Lite patch further provides * API documentation for UDP-Lite * basic xfrm support * basic netfilter support for IPv4 and IPv6 (LOG target) Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
04ce69093f91547d3a7c4fc815d2868195591340 |
|
08-Nov-2006 |
Al Viro <viro@zeniv.linux.org.uk> |
[IPV6]: 'info' argument of ipv6 ->err_handler() is net-endian Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
f2776ff047229c3e7cee2454e2704dd6f98fa32f |
|
22-Nov-2006 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Fix address/interface handling in UDP and DCCP, according to the scoping architecture. TCP and RAW do not have this issue. Closes Bug #7432. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
1e0c14f49d6b393179f423abbac47f85618d3d46 |
|
03-Oct-2006 |
Herbert Xu <herbert@gondor.apana.org.au> |
[UDP]: Fix MSG_PROBE crash UDP tracks corking status through the pending variable. The IP layer also tracks it through the socket write queue. It is possible for the two to get out of sync when MSG_PROBE is used. This patch changes UDP to check the write queue to ensure that the two stay in sync. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
132a55f3c5c0b1a364d32f65595ad8838c30a60e |
|
03-Oct-2006 |
Herbert Xu <herbert@gondor.apana.org.au> |
[UDP6]: Fix flowi clobbering The udp6_sendmsg function uses a shared buffer to store the flow without taking any locks. This leads to races with SMP. This patch moves the flowi object onto the stack. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
25030a7f9eeab2dcefff036469e0e2b4f956198f |
|
27-Aug-2006 |
Gerrit Renker <gerrit@erg.abdn.ac.uk> |
[UDP]: Unify UDPv4 and UDPv6 ->get_port() This patch creates one common function which is called by udp_v4_get_port() and udp_v6_get_port(). As a result, * duplicated code is removed * udp_port_rover and local port lookup can now be removed from udp.h * further savings follow since the same function will be used by UDP-Litev4 and UDP-Litev6 In contrast to the patch sent in response to Yoshifujis comments (fixed by this variant), the code below also removes the EXPORT_SYMBOL(udp_port_rover), since udp_port_rover can now remain local to net/ipv4/udp.c. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
|
8e1ef0a95b87e8b4292b2ba733e8cb854ea2d2fe |
|
30-Aug-2006 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Cache source address as well in ipv6_pinfo{}. Based on MIPL2 kernel patch. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
|
f8d8fda54a1bfcf8cf829e44c494b2b4582819aa |
|
15-Aug-2006 |
David S. Miller <davem@sunset.davemloft.net> |
[IPV6] udp: Fix type in previous change. UDPv6 stats are UDP6_foo not UDP_foo. Signed-off-by: David S. Miller <davem@davemloft.net>
|
a18135eb9389c26d36ef5c05bd8bc526e0cbe883 |
|
15-Aug-2006 |
David S. Miller <davem@sunset.davemloft.net> |
[IPV6]: Add UDP_MIB_{SND,RCV}BUFERRORS handling. Signed-off-by: David S. Miller <davem@davemloft.net>
|
84fa7933a33f806bbbaae6775e87459b1ec584c0 |
|
30-Aug-2006 |
Patrick McHardy <kaber@trash.net> |
[NET]: Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE Replace CHECKSUM_HW by CHECKSUM_PARTIAL (for outgoing packets, whose checksum still needs to be completed) and CHECKSUM_COMPLETE (for incoming packets, device supplied full checksum). Patch originally from Herbert Xu, updated by myself for 2.6.18-rc3. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
beb8d13bed80f8388f1a9a107d07ddd342e627e8 |
|
05-Aug-2006 |
Venkat Yekkirala <vyekkirala@TrustedCS.com> |
[MLSXFRM]: Add flow labeling This labels the flows that could utilize IPSec xfrms at the points the flows are defined so that IPSec policy and SAs at the right label can be used. The following protos are currently not handled, but they should continue to be able to use single-labeled IPSec like they currently do. ipmr ip_gre ipip igmp sit sctp ip6_tunnel (IPv6 over IPv6 tunnel device) decnet Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
497c615abad7ee81994dd592194535aea2aad617 |
|
31-Jul-2006 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPV6]: Audit all ip6_dst_lookup/ip6_dst_store calls The current users of ip6_dst_lookup can be divided into two classes: 1) The caller holds no locks and is in user-context (UDP). 2) The caller does not want to lookup the dst cache at all. The second class covers everyone except UDP because most people do the cache lookup directly before calling ip6_dst_lookup. This patch adds ip6_sk_dst_lookup for the first class. Similarly ip6_dst_store users can be divded into those that need to take the socket dst lock and those that don't. This patch adds __ip6_dst_store for those (everyone except UDP/datagram) that don't need an extra lock. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
6ab3d5624e172c553004ecc862bfeac16d9d68b7 |
|
30-Jun-2006 |
Jörn Engel <joern@wohnheim.fh-wedel.de> |
Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
|
543d9cfeec4d58ad3fd974db5531b06b6b95deb4 |
|
21-Mar-2006 |
Arnaldo Carvalho de Melo <acme@mandriva.com> |
[NET]: Identation & other cleanups related to compat_[gs]etsockopt cset No code changes, just tidying up, in some cases moving EXPORT_SYMBOLs to just after the function exported, etc. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
3fdadf7d27e3fbcf72930941884387d1f4936f04 |
|
21-Mar-2006 |
Dmitry Mishin <dim@openvz.org> |
[NET]: {get|set}sockopt compatibility layer This patch extends {get|set}sockopt compatibility layer in order to move protocol specific parts to their place and avoid huge universal net/compat.c file in the future. Signed-off-by: Dmitry Mishin <dim@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
951dbc8ac714b04c36296b8b5c36c8e036ce433f |
|
07-Jan-2006 |
Patrick McHardy <kaber@trash.net> |
[IPV6]: Move nextheader offset to the IP6CB Move nextheader offset to the IP6CB to make it possible to pass a packet to ip6_input_finish multiple times and have it skip already parsed headers. As a nice side effect this gets rid of the manual hopopts skipping in ip6_input_finish. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
3305b80c214c642b89cd5c21af83bc91ec13f8bd |
|
14-Dec-2005 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IP]: Simplify and consolidate MSG_PEEK error handling When a packet is obtained from skb_recv_datagram with MSG_PEEK enabled it is left on the socket receive queue. This means that when we detect a checksum error we have to be careful when trying to free the packet as someone could have dequeued it in the time being. Currently this delicate logic is duplicated three times between UDPv4, UDPv6 and RAWv6. This patch moves them into a one place and simplifies the code somewhat. This is based on a suggestion by Eric Dumazet. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
df9890c31a1a447254f39e40c3fd81ad6547945b |
|
19-Nov-2005 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Fix sending extension headers before and including routing header. Based on suggestion from Masahide Nakamura <nakam@linux-ipv6.org>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
fb286bb2990a107009dbf25f6ffebeb7df77f9be |
|
10-Nov-2005 |
Herbert Xu <herbert@gondor.apana.org.au> |
[NET]: Detect hardware rx checksum faults correctly Here is the patch that introduces the generic skb_checksum_complete which also checks for hardware RX checksum faults. If that happens, it'll call netdev_rx_csum_fault which currently prints out a stack trace with the device name. In future it can turn off RX checksum. I've converted every spot under net/ that does RX checksum checks to use skb_checksum_complete or __skb_checksum_complete with the exceptions of: * Those places where checksums are done bit by bit. These will call netdev_rx_csum_fault directly. * The following have not been completely checked/converted: ipmr ip_vs netfilter dccp This patch is based on patches and suggestions from Stephen Hemminger and David S. Miller. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
87bf9c97b4b3af8dec7b2b79cdfe7bfc0a0a03b2 |
|
04-Oct-2005 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Fix infinite loop in udp_v6_get_port(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
a5e7c210fefd2454c757a3542e41063407ca7108 |
|
03-Oct-2005 |
David S. Miller <davem@sunset.davemloft.net> |
[IPV6]: Fix leak added by udp connect dst caching fix. Based upon a patch from Mitsuru KANDA <mk@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
987905ded3d19c196dae25cac46c569cac9594b8 |
|
18-Sep-2005 |
Mitsuru KANDA <mk@linux-ipv6.org> |
[IPV6]: Check connect(2) status for IPv6 UDP socket (Re: xfrm_lookup) I think we should cache the per-socket route(dst_entry) only when the IPv6 UDP socket is connect(2)'ed. (which is same as IPv4 UDP send behavior) Signed-off-by: Mitsuru KANDA <mk@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
40796c5e8f2a93008e9034b3110a7e7b1fa0fba0 |
|
15-Sep-2005 |
David L Stevens <dlstevens@us.ibm.com> |
[IPV6]: Fix per-socket multicast filtering in sk_reuse case per-socket multicast filters were not being applied to all sockets in the case of an exact-match bound address, due to an over-exuberant "return" in the look-up code. Fix below. IPv4 does not have this problem. Thanks to Hoerdt Mickael for reporting the bug. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
e104411b82f5c4d19752c335492036abdbf5880d |
|
09-Sep-2005 |
Patrick McHardy <kaber@trash.net> |
[XFRM]: Always release dst_entry on error in xfrm_lookup Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
42ca89c18b75e1c4c3b02aa5589ad3aa916909a8 |
|
08-Sep-2005 |
Stephen Hemminger <shemminger@osdl.org> |
[IPV6]: Need to use pskb_trim_rcsum(). Fix pskb_trim usage in ipv6. Only the udp one is really a bug, other places are just doing equivalent code. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
|
41a1f8ea4fbfcdc4232f023732584aae2220de31 |
|
08-Sep-2005 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Support IPV6_{RECV,}TCLASS socket options / ancillary data. Based on patch from David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
333fad5364d6b457c8d837f7d05802d2aaf8a961 |
|
08-Sep-2005 |
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> |
[IPV6]: Support several new sockopt / ancillary data in Advanced API (RFC3542). Support several new socket options / ancillary data: IPV6_RECVPKTINFO, IPV6_PKTINFO, IPV6_RECVHOPOPTS, IPV6_HOPOPTS, IPV6_RECVDSTOPTS, IPV6_DSTOPTS, IPV6_RTHDRDSTOPTS, IPV6_RECVRTHDR, IPV6_RTHDR, IPV6_RECVHOPOPTS, IPV6_HOPOPTS Old semantics are preserved as IPV6_2292xxxx so that we can maintain backward compatibility. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
|
ba89966c1984513f4f2cc0a6c182266be44ddd03 |
|
26-Aug-2005 |
Eric Dumazet <dada1@cosmosbay.com> |
[NET]: use __read_mostly on kmem_cache_t , DEFINE_SNMP_STAT pointers This patch puts mostly read only data in the right section (read_mostly), to help sharing of these data between CPUS without memory ping pongs. On one of my production machine, tcp_statistics was sitting in a heavily modified cache line, so *every* SNMP update had to force a reload. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
20380731bc2897f2952ae055420972ded4cd786e |
|
16-Aug-2005 |
Arnaldo Carvalho de Melo <acme@mandriva.com> |
[NET]: Fix sparse warnings Of this type, mostly: CHECK net/ipv6/netfilter.c net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static? net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static? Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
64ce207306debd7157f47282be94770407bec01c |
|
10-Aug-2005 |
Patrick McHardy <kaber@trash.net> |
[NET]: Make NETDEBUG pure printk wrappers Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
c752f0739f09b803aed191c4765a3b6650a08653 |
|
10-Aug-2005 |
Arnaldo Carvalho de Melo <acme@ghostprotocols.net> |
[TCP]: Move the tcp sock states to net/tcp_states.h Lots of places just needs the states, not even linux/tcp.h, where this enum was, needs it. This speeds up development of the refactorings as less sources are rebuilt when things get moved from net/tcp.h. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
|
e0f9f8586a0b21fb3c7a4ead3804008d57dfdef7 |
|
19-Jun-2005 |
Herbert Xu <herbert@gondor.apana.org.au> |
[IPV4/IPV6]: Replace spin_lock_irq with spin_lock_bh In light of my recent patch to net/ipv4/udp.c that replaced the spin_lock_irq calls on the receive queue lock with spin_lock_bh, here is a similar patch for all other occurences of spin_lock_irq on receive/error queue locks in IPv4 and IPv6. In these stacks, we know that they can only be entered from user or softirq context. Therefore it's safe to disable BH only. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
|
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 |
|
17-Apr-2005 |
Linus Torvalds <torvalds@ppc970.osdl.org> |
Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
|