History log of /net/ipv4/ping.c
Revision Date Author Comments
125ce76ea25ca1e1f634f96ca87ec34bf7c13a8c 03-Mar-2015 Lorenzo Colitti <lorenzo@google.com> net: ping: Return EAFNOSUPPORT when appropriate.

1. For an IPv4 ping socket, ping_check_bind_addr does not check
the family of the socket address that's passed in. Instead,
make it behave like inet_bind, which enforces either that the
address family is AF_INET, or that the family is AF_UNSPEC and
the address is 0.0.0.0.
2. For an IPv6 ping socket, ping_check_bind_addr returns EINVAL
if the socket family is not AF_INET6. Return EAFNOSUPPORT
instead, for consistency with inet6_bind.
3. Make ping_v4_sendmsg and ping_v6_sendmsg return EAFNOSUPPORT
instead of EINVAL if an incorrect socket address structure is
passed in.
4. Make IPv6 ping sockets be IPv6-only. The code does not support
IPv4, and it cannot easily be made to support IPv4 because
the protocol numbers for ICMP and ICMPv6 are different. This
makes connect(::ffff:192.0.2.1) fail with EAFNOSUPPORT instead
of making the socket unusable.

Among other things, this fixes an oops that can be triggered by:

int s = socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP);
struct sockaddr_in6 sin6 = {
.sin6_family = AF_INET6,
.sin6_addr = in6addr_any,
};
bind(s, (struct sockaddr *) &sin6, sizeof(sin6));

Change-Id: If06ca86d9f1e4593c0d6df174caca3487c57a241
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
858c4e49d1acb72a3c32711f0cbe8cf14b0f663a 02-May-2015 David S. Miller <davem@davemloft.net> ipv4: Missing sk_nulls_node_init() in ping_unhash().

If we don't do that, then the poison value is left in the ->pprev
backlink.

This can cause crashes if we do a disconnect, followed by a connect().

Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Wen Xu <hotdog3645@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bug: 20770158
Change-Id: I944eb20fddea190892c2da681d934801d268096d
ba3d8d3f9f65807763b2e0e1ea7645d74a962248 31-Mar-2014 Lorenzo Colitti <lorenzo@google.com> net: core: Support UID-based routing.

This contains the following commits:

1. cc2f522 net: core: Add a UID range to fib rules.
2. d7ed2bd net: core: Use the socket UID in routing lookups.
3. 2f9306a net: core: Add a RTA_UID attribute to routes.
This is so that userspace can do per-UID route lookups.
4. 8e46efb net: ipv6: Use the UID in IPv6 PMTUD
IPv4 PMTUD already does this because ipv4_sk_update_pmtu
uses __build_flow_key, which includes the UID.

Bug: 15413527
Change-Id: I81bd31dae655de9cce7d7a1f9a905dc1c2feba7c
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
f4713a3dfad045d46afcb9c2a7d0bba288920ed4 26-Nov-2014 Willem de Bruijn <willemb@google.com> net-timestamp: make tcp_recvmsg call ipv6_recv_error for AF_INET6 socks

TCP timestamping introduced MSG_ERRQUEUE handling for TCP sockets.
If the socket is of family AF_INET6, call ipv6_recv_error instead
of ip_recv_error.

This change is more complex than a single branch due to the loadable
ipv6 module. It reuses a pre-existing indirect function call from
ping. The ping code is safe to call, because it is part of the core
ipv6 module and always present when AF_INET6 sockets are active.

Fixes: 4ed2d765 (net-timestamp: TCP timestamping)
Signed-off-by: Willem de Bruijn <willemb@google.com>

----

It may also be worthwhile to add WARN_ON_ONCE(sk->family == AF_INET6)
to ip_recv_error.
Signed-off-by: David S. Miller <davem@davemloft.net>
91a0b603469069cdcce4d572b7525ffc9fd352a6 24-Nov-2014 Jane Zhou <a17711@motorola.com> net/ping: handle protocol mismatching scenario

ping_lookup() may return a wrong sock if sk_buff's and sock's protocols
dont' match. For example, sk_buff's protocol is ETH_P_IPV6, but sock's
sk_family is AF_INET, in that case, if sk->sk_bound_dev_if is zero, a wrong
sock will be returned.
the fix is to "continue" the searching, if no matching, return NULL.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Jane Zhou <a17711@motorola.com>
Signed-off-by: Yiwei Zhao <gbjc64@motorola.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
49a601589caaf0e93194c0cc9b4ecddbe75dd2d5 05-Sep-2014 Vincent Bernat <vincent@bernat.im> net/ipv4: bind ip_nonlocal_bind to current netns

net.ipv4.ip_nonlocal_bind sysctl was global to all network
namespaces. This patch allows to set a different value for each
network namespace.

Signed-off-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: David S. Miller <davem@davemloft.net>
4330487acfff0cf1d7b14d238583a182e0a444bb 01-Aug-2014 Duan Jiong <duanj.fnst@cn.fujitsu.com> net: use inet6_iif instead of IP6CB()->iif

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ba6b918ab234186d3aa1663e296586a1b526b77a 06-May-2014 Cong Wang <xiyou.wangcong@gmail.com> ping: move ping_group_range out of CONFIG_SYSCTL

Similarly, when CONFIG_SYSCTL is not set, ping_group_range should still
work, just that no one can change it. Therefore we should move it out of
sysctl_net_ipv4.c. And, it should not share the same seqlock with
ip_local_port_range.

BTW, rename it to ->ping_group_range instead.

Cc: David S. Miller <davem@davemloft.net>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Reported-by: Stefan de Konink <stefan@konink.de>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c9d8f1a64225dfcc2f721d73a5984a2444920744 06-May-2014 Cong Wang <xiyou.wangcong@gmail.com> ipv4: move local_port_range out of CONFIG_SYSCTL

When CONFIG_SYSCTL is not set, ip_local_port_range should still work,
just that no one can change it. Therefore we should move it out of sysctl_inet.c.
Also, rename it to ->ip_local_ports instead.

Cc: David S. Miller <davem@davemloft.net>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Reported-by: Stefan de Konink <stefan@konink.de>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b04c46190219a4f845e46a459e3102137b7f6cac 14-Apr-2014 Wang, Xiaoming <xiaoming.wang@intel.com> net: ipv4: current group_info should be put after using.

Plug a group_info refcount leak in ping_init.
group_info is only needed during initialization and
the code failed to release the reference on exit.
While here move grabbing the reference to a place
where it is actually needed.

Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
Signed-off-by: Zhang Dongxing <dongxing.zhang@intel.com>
Signed-off-by: xiaoming wang <xiaoming.wang@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c8e6ad0829a723a74cd2fea9996a3392d2579a18 18-Feb-2014 Hannes Frederic Sowa <hannes@stressinduktion.org> ipv6: honor IPV6_PKTINFO with v4 mapped addresses on sendmsg

In case we decide in udp6_sendmsg to send the packet down the ipv4
udp_sendmsg path because the destination is either of family AF_INET or
the destination is an ipv4 mapped ipv6 address, we don't honor the
maybe specified ipv4 mapped ipv6 address in IPV6_PKTINFO.

We simply can check for this option in ip_cmsg_send because no calls to
ipv6 module functions are needed to do so.

Reported-by: Gert Doering <gert@space.net>
Cc: Tore Anderson <tore@fud.no>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
82b276cd2b0bacd58e7c307bf8856925a68c4d14 20-Jan-2014 Hannes Frederic Sowa <hannes@stressinduktion.org> ipv6: protect protocols not handling ipv4 from v4 connection/bind attempts

Some ipv6 protocols cannot handle ipv4 addresses, so we must not allow
connecting and binding to them. sendmsg logic does already check msg->name
for this but must trust already connected sockets which could be set up
for connection to ipv4 address family.

Per-socket flag ipv6only is of no use here, as it is under users control
by setsockopt.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
4b261c75a99f29c93a0b6babfc180cdf566bd654 20-Jan-2014 Hannes Frederic Sowa <hannes@stressinduktion.org> ipv6: make IPV6_RECVPKTINFO work for ipv4 datagrams

We currently don't report IPV6_RECVPKTINFO in cmsg access ancillary data
for IPv4 datagrams on IPv6 sockets.

This patch splits the ip6_datagram_recv_ctl into two functions, one
which handles both protocol families, AF_INET and AF_INET6, while the
ip6_datagram_recv_specific_ctl only handles IPv6 cmsg data.

ip6_datagram_recv_*_ctl never reported back any errors, so we can make
them return void. Also provide a helper for protocols which don't offer dual
personality to further use ip6_datagram_recv_ctl, which is exported to
modules.

I needed to shuffle the code for ping around a bit to make it easier to
implement dual personality for ping ipv6 sockets in future.

Reported-by: Gert Doering <gert@space.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
342dfc306fb32155314dad277f3c3686b83fb9f1 17-Jan-2014 Steffen Hurrle <steffen@hurrle.net> net: add build-time checks for msg->msg_name size

This is a follow-up patch to f3d3342602f8bc ("net: rework recvmsg
handler msg_name and msg_namelen logic").

DECLARE_SOCKADDR validates that the structure we use for writing the
name information to is not larger than the buffer which is reserved
for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
consistently in sendmsg code paths.

Signed-off-by: Steffen Hurrle <steffen@hurrle.net>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ea074b3495a023c2cf969605d51991b2b9df9514 28-Dec-2013 Stephen Hemminger <stephen@networkplumber.org> ipv4: ping make local stuff static

Don't export ping_table or ping_v4_sendmsg. Both are only used
inside ping code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
f1d8cba61c3c4b1eb88e507249c4cb8d635d9a76 28-Nov-2013 Eric Dumazet <edumazet@google.com> inet: fix possible seqlock deadlocks

In commit c9e9042994d3 ("ipv4: fix possible seqlock deadlock") I left
another places where IP_INC_STATS_BH() were improperly used.

udp_sendmsg(), ping_v4_sendmsg() and tcp_v4_connect() are called from
process context, not from softirq context.

This was detected by lockdep seqlock support.

Reported-by: jongman heo <jongman.heo@samsung.com>
Fixes: 584bdf8cbdf6 ("[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP")
Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
85fbaa75037d0b6b786ff18658ddf0b4014ce2a4 23-Nov-2013 Hannes Frederic Sowa <hannes@stressinduktion.org> inet: fix addr_len/msg->msg_namelen assignment in recv_error and rxpmtu functions

Commit bceaa90240b6019ed73b49965eac7d167610be69 ("inet: prevent leakage
of uninitialized memory to user in recv syscalls") conditionally updated
addr_len if the msg_name is written to. The recv_error and rxpmtu
functions relied on the recvmsg functions to set up addr_len before.

As this does not happen any more we have to pass addr_len to those
functions as well and set it to the size of the corresponding sockaddr
length.

This broke traceroute and such.

Fixes: bceaa90240b6 ("inet: prevent leakage of uninitialized memory to user in recv syscalls")
Reported-by: Brad Spengler <spender@grsecurity.net>
Reported-by: Tom Labanowski
Cc: mpb <mpb.mail@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
cf970c002d270c36202bd5b9c2804d3097a52da0 18-Nov-2013 Hannes Frederic Sowa <hannes@stressinduktion.org> ping: prevent NULL pointer dereference on write to msg_name

A plain read() on a socket does set msg->msg_name to NULL. So check for
NULL pointer first.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
bceaa90240b6019ed73b49965eac7d167610be69 18-Nov-2013 Hannes Frederic Sowa <hannes@stressinduktion.org> inet: prevent leakage of uninitialized memory to user in recv syscalls

Only update *addr_len when we actually fill in sockaddr, otherwise we
can return uninitialized memory from the stack to the caller in the
recvfrom, recvmmsg and recvmsg syscalls. Drop the the (addr_len == NULL)
checks because we only get called with a valid addr_len pointer either
from sock_common_recvmsg or inet_recvmsg.

If a blocking read waits on a socket which is concurrently shut down we
now return zero and set msg_msgnamelen to 0.

Reported-by: mpb <mpb.mail@gmail.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
652586df95e5d76b37d07a11839126dcfede1621 14-Nov-2013 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> seq_file: remove "%n" usage from seq_file users

All seq_printf() users are using "%n" for calculating padding size,
convert them to use seq_setwidth() / seq_pad() pair.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Joe Perches <joe@perches.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
c2bb06db59eaf92eb5ca9c6faed590597c6ceccb 09-Oct-2013 Eric Dumazet <edumazet@google.com> net: fix build errors if ipv6 is disabled

CONFIG_IPV6=n is still a valid choice ;)

It appears we can remove dead code.

Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
efe4208f47f907b86f528788da711e8ab9dea44d 04-Oct-2013 Eric Dumazet <edumazet@google.com> ipv6: make lookups simpler and faster

TCP listener refactoring, part 4 :

To speed up inet lookups, we moved IPv4 addresses from inet to struct
sock_common

Now is time to do the same for IPv6, because it permits us to have fast
lookups for all kind of sockets, including upcoming SYN_RECV.

Getting IPv6 addresses in TCP lookups currently requires two extra cache
lines, plus a dereference (and memory stall).

inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6

This patch is way bigger than its IPv4 counter part, because for IPv4,
we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6,
it's not doable easily.

inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr

And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
at the same offset.

We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
macro.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0bbf87d852d243680ed7074110ccc1dea003b61a 28-Sep-2013 Eric W. Biederman <ebiederm@xmission.com> net ipv4: Convert ipv4.ip_local_port_range to be per netns v3

- Move sysctl_local_ports from a global variable into struct netns_ipv4.
- Modify inet_get_local_port_range to take a struct net, and update all
of the callers.
- Move the initialization of sysctl_local_ports into
sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c

v2:
- Ensure indentation used tabs
- Fixed ip.h so it applies cleanly to todays net-next

v3:
- Compile fixes of strange callers of inet_get_local_port_range.
This patch now successfully passes an allmodconfig build.
Removed manual inlining of inet_get_local_port_range in ipv4_local_port_range

Originally-by: Samya <samya@twitter.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
aa6615814533c634190019ee3a5b10490026d545 24-Sep-2013 Francesco Fusco <ffusco@redhat.com> ipv4: processing ancillary IP_TOS or IP_TTL

If IP_TOS or IP_TTL are specified as ancillary data, then sendmsg() sends out
packets with the specified TTL or TOS overriding the socket values specified
with the traditional setsockopt().

The struct inet_cork stores the values of TOS, TTL and priority that are
passed through the struct ipcm_cookie. If there are user-specified TOS
(tos != -1) or TTL (ttl != 0) in the struct ipcm_cookie, these values are
used to override the per-socket values. In case of TOS also the priority
is changed accordingly.

Two helper functions get_rttos and get_rtconn_flags are defined to take
into account the presence of a user specified TOS value when computing
RT_TOS and RT_CONN_FLAGS.

Signed-off-by: Francesco Fusco <ffusco@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d14c5ab6bef6a46170b84c3589b27768e979f93d 15-Aug-2013 Francesco Fusco <ffusco@redhat.com> net: proc_fs: trivial: print UIDs as unsigned int

UIDs are printed in the proc_fs as signed int, whereas
they are unsigned int.

Signed-off-by: Francesco Fusco <ffusco@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a06a2d378dbf099c874ad58de07a8e54ffdc94d3 12-Jun-2013 Wu Fengguang <fengguang.wu@intel.com> net: ping_check_bind_addr() etc. can be static

net/ipv4/ping.c:286:5: sparse: symbol 'ping_check_bind_addr' was not declared. Should it be static?
net/ipv4/ping.c:355:6: sparse: symbol 'ping_set_saddr' was not declared. Should it be static?
net/ipv4/ping.c:370:6: sparse: symbol 'ping_clear_saddr' was not declared. Should it be static?

net/ipv6/ping.c:60:5: sparse: symbol 'dummy_ipv6_recv_error' was not declared. Should it be static?
net/ipv6/ping.c:64:5: sparse: symbol 'dummy_ip6_datagram_recv_ctl' was not declared. Should it be static?
net/ipv6/ping.c:69:5: sparse: symbol 'dummy_icmpv6_err_convert' was not declared. Should it be static?
net/ipv6/ping.c:73:6: sparse: symbol 'dummy_ipv6_icmp_error' was not declared. Should it be static?
net/ipv6/ping.c:75:5: sparse: symbol 'dummy_ipv6_chk_addr' was not declared. Should it be static?
net/ipv6/ping.c:201:5: sparse: symbol 'ping_v6_seq_show' was not declared. Should it be static?

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c26d6b46da3ee86fa8a864347331e5513ca84c2b 03-Jun-2013 Cong Wang <amwang@redhat.com> ping: always initialize ->sin6_scope_id and ->sin6_flowinfo

If we don't need scope id, we should initialize it to zero.
Same for ->sin6_flowinfo.

Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d862e546142328d18377a4704be97f2ae301847a 31-May-2013 Lorenzo Colitti <lorenzo@google.com> net: ipv6: Implement /proc/net/icmp6.

The format is based on /proc/net/icmp and /proc/net/{udp,raw}6.

Compiles and displays reasonable results with CONFIG_IPV6={n,m,y}
Couldn't figure out how to test without CONFIG_PROC_FS enabled.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8cc785f6f429c2a3fb81745dc142cbd72a462c4a 31-May-2013 Lorenzo Colitti <lorenzo@google.com> net: ipv4: make the ping /proc code AF-independent

Introduce a ping_seq_afinfo structure (similar to its UDP
equivalent) and use it to make some of the ping /proc functions
address-family independent. Rename the remaining ping /proc
functions from ping_* to ping_v4_*.

Compiles and displays reasonable results with CONFIG_IPV6={n,m,y}

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6d0bfe22611602f36617bc7aa2ffa1bbb2f54c67 22-May-2013 Lorenzo Colitti <lorenzo@google.com> net: ipv6: Add IPv6 support to the ping socket.

This adds the ability to send ICMPv6 echo requests without a
raw socket. The equivalent ability for ICMPv4 was added in
2011.

Instead of having separate code paths for IPv4 and IPv6, make
most of the code in net/ipv4/ping.c dual-stack and only add a
few IPv6-specific bits (like the protocol definition) to a new
net/ipv6/ping.c. Hopefully this will reduce divergence and/or
duplication of bugs in the future.

Caveats:

- Setting options via ancillary data (e.g., using IPV6_PKTINFO
to specify the outgoing interface) is not yet supported.
- There are no separate security settings for IPv4 and IPv6;
everything is controlled by /proc/net/ipv4/ping_group_range.
- The proc interface does not yet display IPv6 ping sockets
properly.

Tested with a patched copy of ping6 and using raw socket calls.
Compiles and works with all of CONFIG_IPV6={n,m,y}.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bf84a01063eaab2f1a37d72d1b903445b3a25a4e 14-Apr-2013 Daniel Borkmann <dborkman@redhat.com> net: sock: make sock_tx_timestamp void

Currently, sock_tx_timestamp() always returns 0. The comment that
describes the sock_tx_timestamp() function wrongly says that it
returns an error when an invalid argument is passed (from commit
20d4947353be, ``net: socket infrastructure for SO_TIMESTAMPING'').
Make the function void, so that we can also remove all the unneeded
if conditions that check for such a _non-existant_ error case in the
output path.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b531ed61a2a2a77eeb2f7c88b49aa5ec7d9880d8 21-Feb-2013 Li Wei <lw@cn.fujitsu.com> ipv4: fix a bug in ping_err().

We should get 'type' and 'code' from the outer ICMP header.

Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ece31ffd539e8e2b586b1ca5f50bc4f4591e3893 18-Feb-2013 Gao feng <gaofeng@cn.fujitsu.com> net: proc: change proc_net_remove to remove_proc_entry

proc_net_remove is only used to remove proc entries
that under /proc/net,it's not a general function for
removing proc entries of netns. if we want to remove
some proc entries which under /proc/net/stat/, we still
need to call remove_proc_entry.

this patch use remove_proc_entry to replace proc_net_remove.
we can remove proc_net_remove after this patch.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d4beaa66add8aebf83ab16d2fde4e4de8dac36df 18-Feb-2013 Gao feng <gaofeng@cn.fujitsu.com> net: proc: change proc_net_fops_create to proc_create

Right now, some modules such as bonding use proc_create
to create proc entries under /proc/net/, and other modules
such as ipv4 use proc_net_fops_create.

It looks a little chaos.this patch changes all of
proc_net_fops_create to proc_create. we can remove
proc_net_fops_create after this patch.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8141ed9fcedb278f4a3a78680591bef1e55f75fb 21-Jan-2013 Steffen Klassert <steffen.klassert@secunet.com> ipv4: Add a socket release callback for datagram sockets

This implements a socket release callback function to check
if the socket cached route got invalid during the time
we owned the socket. The function is used from udp, raw
and ping sockets.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7064d16e162adf8199f0288b694e6af823ed5431 24-May-2012 Eric W. Biederman <ebiederm@xmission.com> userns: Use kgids for sysctl_ping_group_range

- Store sysctl_ping_group_range as a paire of kgid_t values
instead of a pair of gid_t values.
- Move the kgid conversion work from ping_init_sock into ipv4_ping_group_range
- For invalid cases reset to the default disabled state.

With the kgid_t conversion made part of the original value sanitation
from userspace understand how the code will react becomes clearer
and it becomes possible to set the sysctl ping group range from
something other than the initial user namespace.

Cc: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
a7cb5a49bf64ba64864ae16a6be028f8b0d3cc06 24-May-2012 Eric W. Biederman <ebiederm@xmission.com> userns: Print out socket uids in a user namespace aware fashion.

Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
55be7a9c6074f749d617a7fc1914c9a23505438c 12-Jul-2012 David S. Miller <davem@davemloft.net> ipv4: Add redirect support to all protocol icmp error handlers.

Signed-off-by: David S. Miller <davem@davemloft.net>
36393395536064e483b73d173f6afc103eadfbc4 15-Jun-2012 David S. Miller <davem@davemloft.net> ipv4: Handle PMTU in all ICMP error handlers.

With ip_rt_frag_needed() removed, we have to explicitly update PMTU
information in every ICMP error handler.

Create two helper functions to facilitate this.

1) ipv4_sk_update_pmtu()

This updates the PMTU when we have a socket context to
work with.

2) ipv4_update_pmtu()

Raw version, used when no socket context is available. For this
interface, we essentially just pass in explicit arguments for
the flow identity information we would have extracted from the
socket.

And you'll notice that ipv4_sk_update_pmtu() is simply implemented
in terms of ipv4_update_pmtu()

Note that __ip_route_output_key() is used, rather than something like
ip_route_output_flow() or ip_route_output_key(). This is because we
absolutely do not want to end up with a route that does IPSEC
encapsulation and the like. Instead, we only want the route that
would get us to the node described by the outermost IP header.

Reported-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ae2975bc3476243b45a1e2344236d7920c268f38 15-Nov-2011 Eric W. Biederman <ebiederm@xmission.com> userns: Convert group_info values from gid_t to kgid_t.

As a first step to converting struct cred to be all kuid_t and kgid_t
values convert the group values stored in group_info to always be
kgid_t values. Unless user namespaces are used this change should
have no effect.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
95c961747284a6b83a5e2d81240e214b0fa3464d 15-Apr-2012 Eric Dumazet <eric.dumazet@gmail.com> net: cleanup unsigned to unsigned int

Use of "unsigned int" is preferred to bare "unsigned" in net tree.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5e73ea1a31c3612aa6dfe44f864ca5b7b6a4cff9 15-Apr-2012 Daniel Baluta <dbaluta@ixiacom.com> ipv4: fix checkpatch errors

Fix checkpatch errors of the following type:
* ERROR: "foo * bar" should be "foo *bar"
* ERROR: "(foo*)" should be "(foo *)"

Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9ffc93f203c18a70623f21950f1dd473c9ec48cd 28-Mar-2012 David Howells <dhowells@redhat.com> Remove all #inclusions of asm/system.h

Remove all #inclusions of asm/system.h preparatory to splitting and killing
it. Performed with the following command:

perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`

Signed-off-by: David Howells <dhowells@redhat.com>
058bd4d2a4ff0aaa4a5381c67e776729d840c785 11-Mar-2012 Joe Perches <joe@perches.com> net: Convert printks to pr_<level>

Use a more current kernel messaging style.

Convert a printk block to print_hex_dump.
Coalesce formats, align arguments.
Use %s, __func__ instead of embedding function names.

Some messages that were prefixed with <foo>_close are
now prefixed with <foo>_fini. Some ah4 and esp messages
are now not prefixed with "ip ".

The intent of this patch is to later add something like
#define pr_fmt(fmt) "IPv4: " fmt.
to standardize the output messages.

Text size is trivially reduced. (x86-32 allyesconfig)

$ size net/ipv4/built-in.o*
text data bss dec hex filename
887888 31558 249696 1169142 11d6f6 net/ipv4/built-in.o.new
887934 31558 249800 1169292 11d78c net/ipv4/built-in.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a5e7424d424f6398a198ead79d99e0a3c2f24ce8 21-Feb-2012 David S. Miller <davem@davemloft.net> ipv4: ping: Fix recvmsg MSG_OOB error handling.

Don't return an uninitialized variable as the error, return
-EOPNOTSUPP instead.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
76e21053b5bf33a07c76f99d27a74238310e3c71 08-Feb-2012 Erich E. Hoover <ehoover@mines.edu> ipv4: Implement IP_UNICAST_IF socket option.

The IP_UNICAST_IF feature is needed by the Wine project. This patch
implements the feature by setting the outgoing interface in a similar
fashion to that of IP_MULTICAST_IF. A separate option is needed to
handle this feature since the existing options do not provide all of
the characteristics required by IP_UNICAST_IF, a summary is provided
below.

SO_BINDTODEVICE:
* SO_BINDTODEVICE requires administrative privileges, IP_UNICAST_IF
does not. From reading some old mailing list articles my
understanding is that SO_BINDTODEVICE requires administrative
privileges because it can override the administrator's routing
settings.
* The SO_BINDTODEVICE option restricts both outbound and inbound
traffic, IP_UNICAST_IF only impacts outbound traffic.

IP_PKTINFO:
* Since IP_PKTINFO and IP_UNICAST_IF are independent options,
implementing IP_UNICAST_IF with IP_PKTINFO will likely break some
applications.
* Implementing IP_UNICAST_IF on top of IP_PKTINFO significantly
complicates the Wine codebase and reduces the socket performance
(doing this requires a lot of extra communication between the
"server" and "user" layers).

bind():
* bind() does not work on broadcast packets, IP_UNICAST_IF is
specifically intended to work with broadcast packets.
* Like SO_BINDTODEVICE, bind() restricts both outbound and inbound
traffic.

Signed-off-by: Erich E. Hoover <ehoover@mines.edu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
747465ef7a082033e086dedc8189febfda43b015 16-Jan-2012 Eric Dumazet <eric.dumazet@gmail.com> net: fix some sparse errors

make C=2 CF="-D__CHECK_ENDIAN__" M=net

And fix flowi4_init_output() prototype for sport

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fb120c0a2775b0d2d095a99ea8432bfb5d444ab2 17-Nov-2011 Eric Dumazet <eric.dumazet@gmail.com> ping: dont increment ICMP_MIB_INERRORS

ping module incorrectly increments ICMP_MIB_INERRORS if feeded with a
frame not belonging to its own sockets.

RFC 2011 states that ICMP_MIB_INERRORS should count "the number of ICMP
messages which the entiry received but determined as having
ICMP-specific errors (bad ICMP checksums, bad length, etc.)."

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bc3b2d7fb9b014d75ebb79ba371a763dbab5e8cf 15-Jul-2011 Paul Gortmaker <paul.gortmaker@windriver.com> net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules

These files are non modular, but need to export symbols using
the macros now living in export.h -- call out the include so
that things won't break when we remove the implicit presence
of module.h from everywhere.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
8ad2475e3555346fbd738e77da12578b97d10505 20-Jun-2011 Jesper Juhl <jj@chaosbits.net> ipv4, ping: Remove duplicate icmp.h include

Remove the duplicate inclusion of net/icmp.h from net/ipv4/ping.c

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
e44ba033c5654dbfda53461c9b1f7dd9bd1d198f 20-Jun-2011 Vitaliy Ivanov <vitalivanov@gmail.com> treewide: remove duplicate includes

Many stupid corrections of duplicated includes based on the output of
scripts/checkincludes.pl.

Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
19a76fa9593bad778dabeeec1f6c2df6effe9ca3 23-May-2011 Eric Dumazet <eric.dumazet@gmail.com> net: ping: cleanups ping_v4_unhash()

net/ipv4/ping.c: In function ‘ping_v4_unhash’:
net/ipv4/ping.c:140:28: warning: variable ‘hslot’ set but not used

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
75e308c894c4a5e47c005b8e821ae5f539ad2ef3 18-May-2011 Changli Gao <xiaosuo@gmail.com> net: ping: fix the coding style

The characters in a line should be no more than 80.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bb0cd2fb539c4e454ed32e32194acc03b75753f3 18-May-2011 Changli Gao <xiaosuo@gmail.com> net: ping: make local functions static

As these functions are only used in this file.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f56e03e8dc149bf0ac2888d6843584f48c8700fc 17-May-2011 Vasiliy Kulikov <segoon@openwall.com> net: ping: fix build failure

If CONFIG_PROC_SYSCTL=n the building process fails:

ping.c:(.text+0x52af3): undefined reference to `inet_get_ping_group_range_net'

Moved inet_get_ping_group_range_net() to ping.c.

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1a8218e96271790a07dd7065a2ef173e0f67e328 15-May-2011 Eric Dumazet <eric.dumazet@gmail.com> net: ping: dont call udp_ioctl()

udp_ioctl() really handles UDP and UDPLite protocols.

1) It can increment UDP_MIB_INERRORS in case first_packet_length() finds
a frame with bad checksum.

2) It has a dependency on sizeof(struct udphdr), not applicable to
ICMP/PING

If ping sockets need to handle SIOCINQ/SIOCOUTQ ioctl, this should be
done differently.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1b1cb1f78a5e9d54c13e176020c3e8ded5d081ce 14-May-2011 Eric Dumazet <eric.dumazet@gmail.com> net: ping: small changes

ping_table is not __read_mostly, since it contains one rwlock,
and is static to ping.c

ping_port_rover & ping_v4_lookup are static

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c319b4d76b9e583a5d88d6bf190e079c4e43213d 13-May-2011 Vasiliy Kulikov <segoon@openwall.com> net: ipv4: add IPPROTO_ICMP socket kind

This patch adds IPPROTO_ICMP socket kind. It makes it possible to send
ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY messages
without any special privileges. In other words, the patch makes it
possible to implement setuid-less and CAP_NET_RAW-less /bin/ping. In
order not to increase the kernel's attack surface, the new functionality
is disabled by default, but is enabled at bootup by supporting Linux
distributions, optionally with restriction to a group or a group range
(see below).

Similar functionality is implemented in Mac OS X:
http://www.manpagez.com/man/4/icmp/

A new ping socket is created with

socket(PF_INET, SOCK_DGRAM, PROT_ICMP)

Message identifiers (octets 4-5 of ICMP header) are interpreted as local
ports. Addresses are stored in struct sockaddr_in. No port numbers are
reserved for privileged processes, port 0 is reserved for API ("let the
kernel pick a free number"). There is no notion of remote ports, remote
port numbers provided by the user (e.g. in connect()) are ignored.

Data sent and received include ICMP headers. This is deliberate to:
1) Avoid the need to transport headers values like sequence numbers by
other means.
2) Make it easier to port existing programs using raw sockets.

ICMP headers given to send() are checked and sanitized. The type must be
ICMP_ECHO and the code must be zero (future extensions might relax this,
see below). The id is set to the number (local port) of the socket, the
checksum is always recomputed.

ICMP reply packets received from the network are demultiplexed according
to their id's, and are returned by recv() without any modifications.
IP header information and ICMP errors of those packets may be obtained
via ancillary data (IP_RECVTTL, IP_RETOPTS, and IP_RECVERR). ICMP source
quenches and redirects are reported as fake errors via the error queue
(IP_RECVERR); the next hop address for redirects is saved to ee_info (in
network order).

socket(2) is restricted to the group range specified in
"/proc/sys/net/ipv4/ping_group_range". It is "1 0" by default, meaning
that nobody (not even root) may create ping sockets. Setting it to "100
100" would grant permissions to the single group (to either make
/sbin/ping g+s and owned by this group or to grant permissions to the
"netadmins" group), "0 4294967295" would enable it for the world, "100
4294967295" would enable it for the users, but not daemons.

The existing code might be (in the unlikely case anyone needs it)
extended rather easily to handle other similar pairs of ICMP messages
(Timestamp/Reply, Information Request/Reply, Address Mask Request/Reply
etc.).

Userspace ping util & patch for it:
http://openwall.info/wiki/people/segoon/ping

For Openwall GNU/*/Linux it was the last step on the road to the
setuid-less distro. A revision of this patch (for RHEL5/OpenVZ kernels)
is in use in Owl-current, such as in the 2011/03/12 LiveCD ISOs:
http://mirrors.kernel.org/openwall/Owl/current/iso/

Initially this functionality was written by Pavel Kankovsky for
Linux 2.4.32, but unfortunately it was never made public.

All ping options (-b, -p, -Q, -R, -s, -t, -T, -M, -I), are tested with
the patch.

PATCH v3:
- switched to flowi4.
- minor changes to be consistent with raw sockets code.

PATCH v2:
- changed ping_debug() to pr_debug().
- removed CONFIG_IP_PING.
- removed ping_seq_fops.owner field (unused for procfs).
- switched to proc_net_fops_create().
- switched to %pK in seq_printf().

PATCH v1:
- fixed checksumming bug.
- CAP_NET_RAW may not create icmp sockets anymore.

RFC v2:
- minor cleanups.
- introduced sysctl'able group range to restrict socket(2).

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>