History log of /include/net/route.h
Revision	Date	Author	Comments
83511cc43b561d957b0295c53063127bc5b185da	29-Sep-2015	Amit Pundir <amit.pundir@linaro.org>	net: core: fix UID-based routing build Use kuid_t GLOBAL_ROOT_UID instead of "0"; otherwise we run into the following build failure: include/net/route.h: In function 'ip_route_output_ports': include/net/route.h:143:55: error: type mismatch in conditional expression daddr, saddr, dport, sport, sk ? sock_i_uid(sk) : 0); ^ Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
2de7009bb038330dd9e9ea4567bbe19b14e14edf	08-Jul-2014	Sreeram Ramachandran <sreeram@google.com>	Handle 'sk' being NULL in UID-based routing. Bug: 15413527 Change-Id: Iab1fae9da6053b284591628ef1de878761b137b1 Signed-off-by: Sreeram Ramachandran <sreeram@google.com> Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
ba3d8d3f9f65807763b2e0e1ea7645d74a962248	31-Mar-2014	Lorenzo Colitti <lorenzo@google.com>	net: core: Support UID-based routing. This contains the following commits: 1. cc2f522 net: core: Add a UID range to fib rules. 2. d7ed2bd net: core: Use the socket UID in routing lookups. 3. 2f9306a net: core: Add a RTA_UID attribute to routes. This is so that userspace can do per-UID route lookups. 4. 8e46efb net: ipv6: Use the UID in IPv6 PMTUD IPv4 PMTUD already does this because ipv4_sk_update_pmtu uses __build_flow_key, which includes the UID. Bug: 15413527 Change-Id: I81bd31dae655de9cce7d7a1f9a905dc1c2feba7c Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
0b8c7f6f2a26ed2dee24881299fc69f554596dbb	21-Mar-2014	Li RongQing <roy.qing.li@gmail.com>	ipv4: remove ip_rt_dump from route.c ip_rt_dump does nothing since the IPv4 route cache was removed, so we can remove it. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
f87c10a8aa1e82498c42d0335524d6ae7cf5a52b	09-Jan-2014	Hannes Frederic Sowa <hannes@stressinduktion.org>	ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing While forwarding we should not use the protocol path mtu to calculate the mtu for a forwarded packet but instead use the interface mtu. We mark forwarded skbs in ip_forward with IPSKB_FORWARDED, which was introduced for multicast forwarding. But as it does not conflict with our usage in the unicast code path, it is perfect for reuse. I moved the functions ip_sk_accept_pmtu, ip_sk_use_pmtu and ip_skb_dst_mtu along with the new ip_dst_mtu_maybe_forward to net/ip.h to fix circular dependencies because of IPSKB_FORWARDED. Because someone might have written software that probes destinations manually and expects the kernel to honour those path mtus, I introduced a new per-namespace "ip_forward_use_pmtu" knob so someone can disable this new behaviour. We also still use mtus which are locked on a route for forwarding. The reason for this change is that path mtu information can be injected into the kernel via e.g. the icmp_err protocol handler without verification of local sockets. As such, this could cause the IPv4 forwarding path to wrongfully emit fragmentation needed notifications or start to fragment packets along a path. Tunnel and ipsec output paths clear IPCB again, thus IPSKB_FORWARDED won't be set and further fragmentation logic will use the path mtu to determine the fragmentation size. They also recheck the packet size with the help of path mtu discovery and report appropriate errors. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: John Heffner <johnwheffner@gmail.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
0e0d44ab4275549998567cd4700b43f7496eb62b	28-Aug-2013	Steffen Klassert <steffen.klassert@secunet.com>	net: Remove FLOWI_FLAG_CAN_SLEEP FLOWI_FLAG_CAN_SLEEP was used to notify xfrm about the possibility to sleep until the needed states are resolved. This code is gone, so FLOWI_FLAG_CAN_SLEEP is not needed anymore. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
482fc6094afad572a4ea1fd722e7b11ca72022a0	05-Nov-2013	Hannes Frederic Sowa <hannes@stressinduktion.org>	ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE Sockets marked with IP_PMTUDISC_INTERFACE won't do path mtu discovery, their sockets won't accept and install new path mtu information and they will always use the interface mtu for outgoing packets. It is guaranteed that the packet is not fragmented locally. But we won't set the DF-Flag on the outgoing frames. Florian Weimer had the idea to use this flag to ensure DNS servers are never generating outgoing fragments. They may well be fragmented on the path, but the server never stores or uses path mtu values, which could well be forged in an attack. (The root of the problem with path MTU discovery is that there is no reliable way to authenticate ICMP Fragmentation Needed But DF Set messages because they are sent from intermediate routers with their source addresses, and the ICMP payload will not always contain sufficient information to identify a flow.) Recent research in the DNS community showed that it is possible to implement an attack where DNS cache poisoning is feasible by spoofing fragments. This work was done by Amir Herzberg and Haya Shulman: <https://sites.google.com/site/hayashulman/files/fragmentation-poisoning.pdf> This issue was previously discussed among the DNS community, e.g. <http://www.ietf.org/mail-archive/web/dnsext/current/msg01204.html>, without leading to fixes. This patch depends on the patch "ipv4: fix DO and PROBE pmtu mode regarding local fragmentation with UFO/CORK" for the enforcement of the non-fragmentable checks. If users other than ip_append_page/data should use these semantics too, we have to add a new flag to IPCB(skb)->flags to suppress local fragmentation and check for this in ip_finish_output. Many thanks to Florian Weimer for the idea and feedback while implementing this patch. Cc: David S. Miller <davem@davemloft.net> Suggested-by: Florian Weimer <fweimer@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
0baf2b35fc70ab16c385963d2502da26a55d2cb7	16-Oct-2013	Eric Dumazet <edumazet@google.com>	ipv4: shrink rt_cache_stat Half of the rt_cache_stat fields are no longer used after the IP route cache removal, so let's shrink this per cpu area. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
aa6615814533c634190019ee3a5b10490026d545	24-Sep-2013	Francesco Fusco <ffusco@redhat.com>	ipv4: processing ancillary IP_TOS or IP_TTL If IP_TOS or IP_TTL are specified as ancillary data, then sendmsg() sends out packets with the specified TTL or TOS overriding the socket values specified with the traditional setsockopt(). The struct inet_cork stores the values of TOS, TTL and priority that are passed through the struct ipcm_cookie. If there are user-specified TOS (tos != -1) or TTL (ttl != 0) in the struct ipcm_cookie, these values are used to override the per-socket values. In case of TOS also the priority is changed accordingly. Two helper functions get_rttos and get_rtconn_flags are defined to take into account the presence of a user specified TOS value when computing RT_TOS and RT_CONN_FLAGS. Signed-off-by: Francesco Fusco <ffusco@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2702c4bb89b2a9bfe49f3db77a1bac88d5c6404c	22-Sep-2013	Joe Perches <joe@perches.com>	route.h: Remove extern from function prototypes There is a mix of function prototypes with and without extern in the kernel sources. Standardize on not using extern for function prototypes. Function prototypes don't need to be written with extern. extern is assumed by the compiler. Its use is as unnecessary as using auto to declare automatic/local variables in a block. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
0ea9d5e3e0e03a63b11392f5613378977dae7eca	13-Aug-2013	Hannes Frederic Sowa <hannes@stressinduktion.org>	xfrm: introduce helper for safe determination of mtu The skb->sk socket can be of the AF_INET or AF_INET6 address family. Thus we always have to make sure we are referring to the correct interpretation of skb->sk. We only depend on header defines to query the mtu, so we don't introduce a new dependency to ipv6 by this change. Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
6da025fa23bb10c82f80de319c837ed2b02306e4	28-Oct-2012	Eric Dumazet <edumazet@google.com>	ipv4: avoid a test in ip_rt_put() We can save a test in ip_rt_put(), considering dst_release() accepts a NULL parameter, and dst is first element in rtable. Add a BUILD_BUG_ON() to catch any change that could break this assertion. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <amwang@redhat.com> Acked-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
155e8336c373d14d87a7f91e356d85ef4b93b8f9	08-Oct-2012	Julian Anastasov <ja@ssi.bg>	ipv4: introduce rt_uses_gateway Add a new flag to remember when a route is via a gateway. We will use it to allow rt_gateway to contain the address of a directly connected host for the cases when DST_NOCACHE is used or when the NH exception caches a per-destination route without the DST_NOCACHE flag, i.e. when routes are not used for other destinations. This way we force the neighbour resolving to work with the routed destination, but we can use a different address in the packet, a feature needed for IPVS-DR, where the original packet for a virtual IP is routed via the route to the real IP. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
bafa6d9d89072c1a18853afe9ee5de05c491c13a	07-Sep-2012	Nicolas Dichtel <nicolas.dichtel@6wind.com>	ipv4/route: arg delay is useless in rt_cache_flush() Since route cache deletion (89aef8921bfbac22f), delay is no more used. Remove it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
4ccfe6d4109252dfadcd6885f33ed600ee03dbf8	07-Sep-2012	Nicolas Dichtel <nicolas.dichtel@6wind.com>	ipv4/route: arg delay is useless in rt_cache_flush() Since route cache deletion (89aef8921bfbac22f), delay is no more used. Remove it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
caacf05e5ad1abf0a2864863da4e33024bc68ec6	01-Aug-2012	David S. Miller <davem@davemloft.net>	ipv4: Properly purge netdev references on uncached routes. When a device is unregistered, we have to purge all of the references to it that may exist in the entire system. If a route is uncached, we currently have no way of accomplishing this. So create a global list that is scanned when a network device goes down. This mirrors the logic in net/core/dst.c's dst_ifdown(). Signed-off-by: David S. Miller <davem@davemloft.net>
c6cffba4ffa26a8ffacd0bb9f3144e34f20da7de	26-Jul-2012	David S. Miller <davem@davemloft.net>	ipv4: Fix input route performance regression. With the routing cache removal we lost the "noref" code paths on input, and this can kill some routing workloads. Reinstate the noref path when we hit a cached route in the FIB nexthops. With help from Eric Dumazet. Reported-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
13378cad02afc2adc6c0e07fca03903c7ada0b37	23-Jul-2012	David S. Miller <davem@davemloft.net>	ipv4: Change rt->rt_iif encoding. On input packet processing, rt->rt_iif will be zero if we should use skb->dev->ifindex. Since we access rt->rt_iif consistently via inet_iif(), that is the only spot whose interpretation has to be adjusted. Signed-off-by: David S. Miller <davem@davemloft.net>
2860583fe840d972573363dfa190b2149a604534	17-Jul-2012	David S. Miller <davem@davemloft.net>	ipv4: Kill rt->fi It's not really needed. We only grabbed a reference to the fib_info for the sake of fib_info local metrics. However, fib_info objects are freed using RCU, as are therefore their private metrics (if any). We would have triggered a route cache flush if we eliminated a reference to a fib_info object in the routing tables. Therefore, any existing cached routes will first check and see that they have been invalidated before an errant reference to these metric values would occur. Signed-off-by: David S. Miller <davem@davemloft.net>
9917e1e8762745191eba5a3bf2040278cbddbee1	17-Jul-2012	David S. Miller <davem@davemloft.net>	ipv4: Turn rt->rt_route_iif into rt->rt_is_input. That is this value's only use, as a boolean to indicate whether a route is an input route or not. So implement it that way, using a u16 gap present in the struct already. Signed-off-by: David S. Miller <davem@davemloft.net>
4fd551d7bed93af60af61c5a324b8f5dff37953a	17-Jul-2012	David S. Miller <davem@davemloft.net>	ipv4: Kill rt->rt_oif Never actually used. It was being set on output routes to the original OIF specified in the flow key used for the lookup. Adjust the only user, ipmr_rt_fib_lookup(), for greater correctness of the flowi4_oif and flowi4_iif values, thanks to feedback from Julian Anastasov. Signed-off-by: David S. Miller <davem@davemloft.net>
f8126f1d5136be1ca1a3536d43ad7a710b5620f8	13-Jul-2012	David S. Miller <davem@davemloft.net>	ipv4: Adjust semantics of rt->rt_gateway. In order to allow prefixed routes, we have to adjust how rt_gateway is set and interpreted. The new interpretation is: 1) rt_gateway == 0, destination is on-link, nexthop is iph->daddr 2) rt_gateway != 0, destination requires a nexthop gateway Abstract the fetching of the proper nexthop value using a new inline helper, rt_nexthop(), as suggested by Joe Perches. Signed-off-by: David S. Miller <davem@davemloft.net> Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
f1ce3062c53809d862d8a04e7a0566c3cc4e0bda	12-Jul-2012	David S. Miller <davem@davemloft.net>	ipv4: Remove 'rt_dst' from 'struct rtable' Signed-off-by: David S. Miller <davem@davemloft.net>
b48698895de86e07b685f8e4b8db0f1cd5a97e9a	01-Jul-2012	David Miller <davem@davemloft.net>	ipv4: Remove 'rt_mark' from 'struct rtable' Signed-off-by: David S. Miller <davem@davemloft.net>
d6c0a4f609847d6e65658913f9ccbcb1c137cff3	01-Jul-2012	David Miller <davem@davemloft.net>	ipv4: Kill 'rt_src' from 'struct rtable' Signed-off-by: David S. Miller <davem@davemloft.net>
1a00fee4ffb22312a0ac40045ecd6f222b55eb3d	01-Jul-2012	David Miller <davem@davemloft.net>	ipv4: Remove rt_key_{src,dst,tos} from struct rtable. They are always used in contexts where they can be reconstituted, or where the finally resolved rt->rt_{src,dst} is semantically equivalent. Signed-off-by: David S. Miller <davem@davemloft.net>
38a424e4657462fe9f8b76f01a0e879abde99ab4	01-Jul-2012	David Miller <davem@davemloft.net>	ipv4: Kill ip_route_input_noref(). The "noref" argument to ip_route_input_common() is now always ignored because we do not cache routes, and in that case we must always grab a reference to the resulting 'dst'. Signed-off-by: David S. Miller <davem@davemloft.net>
89aef8921bfbac22f00e04f8450f6e447db13e42	17-Jul-2012	David S. Miller <davem@davemloft.net>	ipv4: Delete routing cache. The ipv4 routing cache is non-deterministic, performance-wise, and is subject to reasonably easy-to-launch denial of service attacks. The routing cache works great for well-behaved traffic, and the world was a much friendlier place when the tradeoffs that led to the routing cache's design were considered. What it boils down to is that the performance of the routing cache is a product of the traffic patterns seen by a system rather than being a product of the contents of the routing tables. The former is controllable by external entities. Even for "well behaved" legitimate traffic, high volume sites can see hit rates in the routing cache of only ~10%. Signed-off-by: David S. Miller <davem@davemloft.net>
1f42539d257af671d56d4bdbcf13aef31abff6ef	12-Jul-2012	David S. Miller <davem@davemloft.net>	ipv4: Kill ip_rt_redirect(). No longer needed, as the protocol handlers now all properly propagate the redirect back into the routing code. Signed-off-by: David S. Miller <davem@davemloft.net>
b42597e2f36e2043756aa7462faddf630962f061	12-Jul-2012	David S. Miller <davem@davemloft.net>	ipv4: Add ipv4_redirect() and ipv4_sk_redirect() helper functions. Signed-off-by: David S. Miller <davem@davemloft.net>
94206125c4aac32e43c25bfe1b827e7ab993b7dc	12-Jul-2012	David S. Miller <davem@davemloft.net>	ipv4: Rearrange arguments to ip_rt_redirect() Pass in the SKB rather than just the IP addresses, so that policy and other aspects can reside in ip_rt_redirect() rather than icmp_redirect(). Signed-off-by: David S. Miller <davem@davemloft.net>
f185071ddf799e194ba015d040d3d49cdbfa7e48	10-Jul-2012	David S. Miller <davem@davemloft.net>	ipv4: Remove inetpeer from routes. No longer used. Signed-off-by: David S. Miller <davem@davemloft.net>
5943634fc5592037db0693b261f7f4bea6bb9457	10-Jul-2012	David S. Miller <davem@davemloft.net>	ipv4: Maintain redirect and PMTU info in struct rtable again. Maintaining this in the inetpeer entries was not the right way to do this at all. Signed-off-by: David S. Miller <davem@davemloft.net>
3e12939a2a67fbb4cbd962c3b9bc398c73319766	10-Jul-2012	David S. Miller <davem@davemloft.net>	inet: Kill FLOWI_FLAG_PRECOW_METRICS. No longer needed. TCP writes metrics, but now in its own special cache that does not dirty the route metrics. Therefore there is no longer any reason to pre-cow metrics in this way. Signed-off-by: David S. Miller <davem@davemloft.net>
41347dcdd81988b8e60853257b2875285cc17a4e	28-Jun-2012	David S. Miller <davem@davemloft.net>	ipv4: Kill rt->rt_spec_dst, no longer used. Signed-off-by: David S. Miller <davem@davemloft.net>
c10237e077cef50e925f052e49f3b4fead9d71f9	28-Jun-2012	David S. Miller <davem@davemloft.net>	Revert "ipv4: tcp: dont cache unconfirmed intput dst" This reverts commit c074da2810c118b3812f32d6754bd9ead2f169e7. This change has several unwanted side effects: 1) Sockets will cache the DST_NOCACHE route in sk->sk_rx_dst and we'll thus never create a real cached route. 2) All TCP traffic will use DST_NOCACHE and never use the routing cache at all. Signed-off-by: David S. Miller <davem@davemloft.net>
c074da2810c118b3812f32d6754bd9ead2f169e7	27-Jun-2012	Eric Dumazet <edumazet@google.com>	ipv4: tcp: dont cache unconfirmed intput dst DDoS synflood attacks hit the IP route cache badly. On typical machines, this cache is allowed to hold up to 8 million dst entries, 256 bytes each, for a total of 2GB of memory. rt_garbage_collect() triggers and tries to clean things up. Eventually the route cache is disabled, but the machine is under fire and might OOM and crash. This patch exploits the new TCP early demux to set a nocache boolean when an incoming TCP frame is for a socket that is not yet ESTABLISHED or TIMEWAIT. This 'nocache' boolean is then used, when the dst entry is not found in the route cache, to create an unhashed dst entry (DST_NOCACHE). SYN-cookie ACKs use a similar mechanism (ipv4: tcp: dont cache output dst for syncookies), so after this patch a machine is able to absorb a DDoS synflood attack without polluting its IP route cache. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
36393395536064e483b73d173f6afc103eadfbc4	15-Jun-2012	David S. Miller <davem@davemloft.net>	ipv4: Handle PMTU in all ICMP error handlers. With ip_rt_frag_needed() removed, we have to explicitly update PMTU information in every ICMP error handler. Create two helper functions to facilitate this. 1) ipv4_sk_update_pmtu() This updates the PMTU when we have a socket context to work with. 2) ipv4_update_pmtu() Raw version, used when no socket context is available. For this interface, we essentially just pass in explicit arguments for the flow identity information we would have extracted from the socket. And you'll notice that ipv4_sk_update_pmtu() is simply implemented in terms of ipv4_update_pmtu() Note that __ip_route_output_key() is used, rather than something like ip_route_output_flow() or ip_route_output_key(). This is because we absolutely do not want to end up with a route that does IPSEC encapsulation and the like. Instead, we only want the route that would get us to the node described by the outermost IP header. Reported-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
55afabaa0df0dd139c8796a71beb43d1216fbe43	12-Jun-2012	David S. Miller <davem@davemloft.net>	inet: Fix BUG triggered by __rt{,6}_get_peer(). If no peer actually gets attached (either because create is zero or the peer allocation fails) we'll trigger a BUG because we unconditionally do an rt{,6}_peer_ptr() afterwards. Fix this by guarding it with the proper check. Signed-off-by: David S. Miller <davem@davemloft.net>
46517008e1168dc926cf2c47d529efc07eca85c0	10-Jun-2012	David S. Miller <davem@davemloft.net>	ipv4: Kill ip_rt_frag_needed(). There is zero point to this function. Its only real substance is to perform an extremely outdated BSD4.2 ICMP check, which we can safely remove. If you really have an MTU-limited link being routed by a BSD4.2-derived system, here's a nickel, go buy yourself a real router. The other actions of ip_rt_frag_needed(), checking and conditionally updating the peer, are done by the per-protocol handlers of the ICMP event. TCP, UDP, et al. have a handler which will receive this event and transmit it back into the associated route via dst_ops->update_pmtu(). This simplification is important, because it eliminates the one place where we do not have a proper route context in which to make an inetpeer lookup. Signed-off-by: David S. Miller <davem@davemloft.net>
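The build failure fixed by 83511cc4 ("net: core: fix UID-based routing build"), together with the NULL-socket handling from 2de7009b, can be illustrated in userspace. The kuid_t typedef and the KUIDT_INIT/GLOBAL_ROOT_UID macros below mirror the kernel's include/linux/uidgid.h; struct fake_sock, fake_sock_i_uid() and flow_uid() are hypothetical stand-ins for the socket and for sock_i_uid(), not kernel code. The point is that once the UID is wrapped in a struct, both arms of the conditional must be kuid_t, so a bare 0 no longer compiles.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>   /* uid_t */

/* Mirrors the kernel's wrapper: a struct, so it cannot mix with plain ints. */
typedef struct { uid_t val; } kuid_t;
#define KUIDT_INIT(value) ((kuid_t){ value })
#define GLOBAL_ROOT_UID KUIDT_INIT(0)

/* Hypothetical stand-in for a socket; not struct sock. */
struct fake_sock {
	kuid_t uid;
};

/* Hypothetical stand-in for sock_i_uid(); callers must pass a real socket. */
static kuid_t fake_sock_i_uid(const struct fake_sock *sk)
{
	assert(sk != NULL);
	return sk->uid;
}

/*
 * UID for the flow key: the socket owner's UID, or root when there is no
 * socket. Writing "sk ? fake_sock_i_uid(sk) : 0" here fails to compile,
 * because one arm is a struct and the other is an int.
 */
static kuid_t flow_uid(const struct fake_sock *sk)
{
	return sk ? fake_sock_i_uid(sk) : GLOBAL_ROOT_UID;
}
```

For example, flow_uid(NULL).val is 0, while a socket whose uid.val is 1000 yields 1000.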
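The "UID range in fib rules" part of ba3d8d3f ("net: core: Support UID-based routing") can be pictured as a first-match rule list, where each rule covers an inclusive UID range and names the routing table to consult. This is a simplified userspace sketch: struct uid_rule, uid_rule_matches() and select_table() are hypothetical names, real fib rules match on many more fields, and the kernel's own matching code is structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>   /* uid_t */

/* Hypothetical, simplified rule: only the UID range is modelled. */
struct uid_rule {
	uid_t uid_start;   /* first UID covered (inclusive) */
	uid_t uid_end;     /* last UID covered (inclusive) */
	int table;         /* routing table to consult on a match */
};

static bool uid_rule_matches(const struct uid_rule *r, uid_t uid)
{
	assert(r->uid_start <= r->uid_end);
	return uid >= r->uid_start && uid <= r->uid_end;
}

/* Rules are tried in order; the first matching rule picks the table. */
static int select_table(const struct uid_rule *rules, size_t n, uid_t uid,
                        int default_table)
{
	for (size_t i = 0; i < n; i++)
		if (uid_rule_matches(&rules[i], uid))
			return rules[i].table;
	return default_table;
}
```

First-match ordering is what lets a narrow rule placed earlier override a broader one placed later.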
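The trick in 6da025fa ("ipv4: avoid a test in ip_rt_put()") depends on dst being the first member of struct rtable and on dst_release() accepting NULL. Because of that, a NULL rtable pointer becomes a NULL dst pointer, and no separate check is needed. The userspace sketch below uses heavily trimmed, hypothetical versions of the structs and functions. It substitutes C11 _Static_assert for BUILD_BUG_ON(). The kernel writes &rt->dst; the sketch uses the equivalent first-member cast instead, which remains well-defined in standard C when rt is NULL.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */

/* Hypothetical, trimmed stand-ins; not the kernel's definitions. */
struct dst_entry { int refcnt; };
struct rtable {
	struct dst_entry dst;      /* must stay the first member */
	unsigned int rt_gateway;
};

/* Like the kernel's dst_release(), this accepts NULL. */
static void dst_release(struct dst_entry *dst)
{
	if (dst)
		dst->refcnt--;
}

static void ip_rt_put(struct rtable *rt)
{
	/* Compile-time check standing in for BUILD_BUG_ON(). */
	_Static_assert(offsetof(struct rtable, dst) == 0,
	               "dst must be the first member of struct rtable");
	/* A struct pointer converts to a pointer to its first member, and
	 * NULL stays NULL, so no separate "if (rt)" test is needed. */
	dst_release((struct dst_entry *)rt);
}
```

If someone later inserts a field before dst, the static assertion stops the build instead of letting the NULL shortcut silently break.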
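The rt_iif encoding introduced in 13378cad ("ipv4: Change rt->rt_iif encoding") stays contained because every reader goes through one accessor. Under that encoding, zero means "use the receiving device's ifindex". The sketch follows the commit message's description. struct net_device_sketch, struct rtable_sketch, struct skb_sketch and inet_iif_sketch() are hypothetical, trimmed stand-ins, not the kernel's sk_buff, rtable or inet_iif().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed stand-ins; not kernel structures. */
struct net_device_sketch { int ifindex; };
struct rtable_sketch { int rt_iif; };      /* 0 on input: "use the device" */
struct skb_sketch {
	const struct net_device_sketch *dev;
	const struct rtable_sketch *rt;
};

/* The single place that knows how rt_iif is encoded. */
static int inet_iif_sketch(const struct skb_sketch *skb)
{
	assert(skb->rt != NULL && skb->dev != NULL);
	int iif = skb->rt->rt_iif;

	if (iif)
		return iif;
	return skb->dev->ifindex;
}
```

Because callers never read rt_iif directly, changing its meaning required adjusting only this accessor.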