History log of /drivers/net/veth.c
Revision Date Author Comments
5517750f058edd111bcabe5e116056cc63b1f39c 14-Jul-2014 Tom Gundersen <teg@jklm.no> net: rtnetlink - make create_link take name_assign_type

This passes down NET_NAME_USER (or NET_NAME_ENUM) to alloc_netdev(),
for any device created over rtnetlink.

v9: restore reverse-christmas-tree order of local variables

Signed-off-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
bb446c19fefd7b4435adb12a9dd7666adc5b553a 24-Jun-2014 WANG Cong <xiyou.wangcong@gmail.com> veth: add netpoll support

It is trivial to add netpoll support to veth: since
it is not a stacked device, we don't need to set up and
clean up netpoll.

Reported-by: Stefan Priebe <s.priebe@profihost.ag>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3f8c707b9a83cd956af65796081b6c8cb8716089 28-Mar-2014 Vlad Yasevich <vyasevic@redhat.com> veth: Turn off vlan rx acceleration in vlan_features

For completeness, turn off vlan rx acceleration in vlan_features so
that it doesn't show up on q-in-q setups.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
57a7744e09867ebcfa0ccf1d6d529caa7728d552 14-Mar-2014 Eric W. Biederman <ebiederm@xmission.com> net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irq

Replace the bh safe variant with the hard irq safe variant.

We need a hard irq safe variant to deal with netpoll transmitting
packets from hard irq context, and we need it in most if not all of
the places using the bh safe variant.

Except on 32bit uni-processor the code is exactly the same so don't
bother with a bh variant, just have a hard irq safe variant that
everyone can use.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8d0d21f4053c07714802cbe8b1fe26913ec296cc 18-Feb-2014 Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> veth: Fix vlan_features so as to be able to use stacked vlan interfaces

Even if we create a stacked vlan interface such as veth0.10.20, it sends
single tagged frames (tagged with only vid 10).
Because vlan_features of a veth interface has the
NETIF_F_HW_VLAN_[CTAG/STAG]_TX bits, veth0.10 also has that feature, so
dev_hard_start_xmit(veth0.10) doesn't call __vlan_put_tag() and
vlan_dev_hard_start_xmit(veth0.10) overwrites vlan_tci.
This prevents us from using a combination of 802.1ad and 802.1Q
in containers, etc.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f7b12606b5de323a2bb5ca1696558efde8f25441 18-Feb-2014 Jiri Pirko <jiri@resnulli.us> rtnl: make ifla_policy static

The only place this is used outside rtnetlink.c is veth. So provide
wrapper function for this usage.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
1c213bd24ad04f4430031d20d740d7783162b099 13-Feb-2014 WANG Cong <xiyou.wangcong@gmail.com> net: introduce netdev_alloc_pcpu_stats() for drivers

There are many drivers calling alloc_percpu() to allocate pcpu stats
and then initializing ->syncp. So just introduce a helper function for them.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
827da44c61419f29ae3be198c342e2147f1a10cb 08-Oct-2013 John Stultz <john.stultz@linaro.org> net: Explicitly initialize u64_stats_sync structures for lockdep

In order to enable lockdep on seqcount/seqlock structures, we
must explicitly initialize any locks.

The u64_stats_sync structure uses a seqcount, and thus we need
to introduce a u64_stats_init() function and use it to initialize
the structure.

This unfortunately adds a lot of fairly trivial initialization code
to a number of drivers. But the benefit of ensuring correctness makes
this worthwhile.

Because these changes are required for lockdep to be enabled, and the
changes are quite trivial, I've not yet split this patch out into 30-some
separate patches, as I figured it would be better to get the various
maintainers' thoughts on how to best merge this change along with
the seqcount lockdep enablement.

Feedback would be appreciated!

Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mirko Lindner <mlindner@marvell.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Roger Luethi <rl@hellgate.ch>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Simon Horman <horms@verge.net.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
82d8189826d54740607e6a240e602850ef62a07d 26-Oct-2013 Eric Dumazet <edumazet@google.com> veth: extend features to support tunneling

While investigating a recent vxlan regression, I found veth
was using a zero feature set for vxlan tunnels.

We have to segment GSO frames, copy the payload, and do the checksum.

This patch brings a ~200% performance increase.

We probably have to add hw_enc_features support
on other virtual devices.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5c70ef85a2f26d8a0e1aaa7b4cbfff44fda36585 04-Oct-2013 Gao feng <gaofeng@cn.fujitsu.com> veth: allow to setup multicast address for veth device

We can only set up a multicast address for a network device when
net_device_ops->ndo_set_rx_mode is not null.

Some configurations need to add a multicast address to a net
device, such as the netfilter cluster match module.

Add a fake ndo_set_rx_mode function to allow this operation.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b343ca84b4e3ba65508503333c923a797801a588 09-Oct-2013 David S. Miller <davem@davemloft.net> Revert "veth: Showing peer of veth type dev in ip link (kernel side)"

This reverts commit 612c337306f00dc8d396830212de51c475844791.

As per Stephen Hemminger, the layout of the netlink attribute
is not implemented correctly so revert this for now.

Signed-off-by: David S. Miller <davem@davemloft.net>
612c337306f00dc8d396830212de51c475844791 04-Oct-2013 Masatake YAMATO <yamato@redhat.com> veth: Showing peer of veth type dev in ip link (kernel side)

ip link has the ability to show extra information about a network device if
the kernel provides such information. With this patch the veth driver can
provide its peer ifindex information to the ip command via the netlink
interface.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
b69bbddfa136dc53ac319d58bc38b41f8aefffea 18-Jul-2013 Flavio Leitner <fbl@redhat.com> veth: add vlan features

The veth device doesn't provide the vlan features,
so TSO for example is disabled and that causes
performance issues when using tagged traffic.

The test topology looks like this:

   br0                      br1
  /   \                    /   \
vnet  veth0.10 ----- veth1.10  vnet
 VM                             VM

The netperf results with current veth driver:
MIGRATED TCP STREAM TEST from 192.168.1.1 ()
port 0 AF_INET to 192.168.1.2 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.01    2210.22

Now after applying the proposed patch:
MIGRATED TCP STREAM TEST from 192.168.1.1 ()
port 0 AF_INET to 192.168.1.2 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    13067.47

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3f8b96379a820318db37f7b6e81e6e459ad56efe 09-Jun-2013 Hong zhi guo <honkiko@gmail.com> veth: remove redundant call of dev_alloc_name

dev_alloc_name() is called in the subsequent register_netdevice(), so
there is no need to call it here.
Tested with "ip link add type veth" and "ip link add xxx%d type veth".

Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
28d2b136ca6c7bf7173a43a90f747ecda5b0520d 19-Apr-2013 Patrick McHardy <kaber@trash.net> net: vlan: announce STAG offload capability in some drivers

- macvlan: propagate STAG filtering capabilities from underlying device
- ifb: announce STAG tagging support in addition to CTAG tagging support
- veth: announce STAG tagging/stripping support in addition to CTAG support

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
f646968f8f7c624587de729115d802372b9063dd 19-Apr-2013 Patrick McHardy <kaber@trash.net> net: vlan: rename NETIF_F_HW_VLAN_* feature flags to NETIF_F_HW_VLAN_CTAG_*

Rename the hardware VLAN acceleration features to include "CTAG" to indicate
that they only support CTAGs. Follow up patches will introduce 802.1ad
service provider tagging (STAGs) and require the distinction for hardware not
supporting accelerating both.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
f45a5c267da35174e22cec955093a7513dc1623d 08-Feb-2013 Eric Dumazet <edumazet@google.com> veth: fix NULL dereference in veth_dellink()

commit d0e2c55e7c940 (veth: avoid a NULL deref in veth_stats_one)
added another NULL deref in veth_dellink().

# ip link add name veth1 type veth peer name veth0
# rmmod veth

We crash because veth_dellink() is called twice, so we must
take care of NULL peer.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2efd32ee1b60b0b31404ca47c1ce70e5a5d24ebc 10-Jan-2013 Eric Dumazet <edumazet@google.com> veth: fix a NULL deref in netif_carrier_off

In commit d0e2c55e7c94 (veth: avoid a NULL deref in veth_stats_one)
we now clear the peer pointers in veth_dellink()

veth_close() must therefore make sure the peer pointer is set.

Reported-by: Tom Parkin <tom.parkin@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d0e2c55e7c940a3ee91e9e23a2683b593690f1e9 04-Jan-2013 Eric Dumazet <edumazet@google.com> veth: avoid a NULL deref in veth_stats_one

commit 2681128f0ced8a (veth: extend device features) added a NULL deref
in veth_stats_one(), as veth_get_stats64() was not testing whether the peer
device was set up or not.

At init time, we call dev_get_stats() before the veth pair is fully set up.

[ 178.854758] [<ffffffffa00f5677>] veth_get_stats64+0x47/0x70 [veth]
[ 178.861013] [<ffffffff814f0a2d>] dev_get_stats+0x6d/0x130
[ 178.866486] [<ffffffff81504efc>] rtnl_fill_ifinfo+0x47c/0x930
[ 178.872299] [<ffffffff81505b93>] rtmsg_ifinfo+0x83/0x100
[ 178.877678] [<ffffffff81505cc6>] rtnl_configure_link+0x76/0xa0
[ 178.883580] [<ffffffffa00f52fa>] veth_newlink+0x16a/0x350 [veth]
[ 178.889654] [<ffffffff815061cc>] rtnl_newlink+0x4dc/0x5e0
[ 178.895128] [<ffffffff81505e1e>] ? rtnl_newlink+0x12e/0x5e0
[ 178.900769] [<ffffffff8150587d>] rtnetlink_rcv_msg+0x11d/0x310
[ 178.906669] [<ffffffff81505760>] ? __rtnl_unlock+0x20/0x20
[ 178.912225] [<ffffffff81521f89>] netlink_rcv_skb+0xa9/0xd0
[ 178.917779] [<ffffffff81502d55>] rtnetlink_rcv+0x25/0x40
[ 178.923159] [<ffffffff815218d1>] netlink_unicast+0x1b1/0x230
[ 178.928887] [<ffffffff81521c4e>] netlink_sendmsg+0x2fe/0x3b0
[ 178.934615] [<ffffffff814dbe22>] sock_sendmsg+0xd2/0xf0

So we must check whether the peer was set up in veth_get_stats64().

As pointed out by Ben Hutchings, priv->peer is missing proper
synchronization. Adding RCU protection is a safe and well documented
way to make sure we don't access about to be freed or already
freed data.

Reported-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
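
The NULL-peer check described in commit d0e2c55e7c94 above can be sketched in plain userspace C. This is only an illustration under stated assumptions: the demo_* types and function are hypothetical stand-ins, not the driver's real structures, and the RCU protection the commit mentions is not modelled (in kernel code the pointer read would go through an RCU accessor such as rcu_dereference()).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the driver's private data. */
struct demo_stats {
    uint64_t packets;
    uint64_t bytes;
};

struct demo_dev {
    struct demo_dev *peer;   /* NULL until the pair is fully set up */
    struct demo_stats tx;    /* what this end has transmitted */
};

/*
 * Fill *rx with what this device has received, i.e. what its peer has
 * transmitted. If the peer is missing (not set up yet, or already
 * unlinked), report zeros instead of dereferencing NULL.
 */
static void demo_get_rx_stats(const struct demo_dev *dev,
                              struct demo_stats *rx)
{
    /* Read the pointer once; kernel code would use an RCU accessor. */
    const struct demo_dev *peer = dev->peer;

    rx->packets = 0;
    rx->bytes = 0;
    if (peer) {
        rx->packets = peer->tx.packets;
        rx->bytes = peer->tx.bytes;
    }
}
```

The same "read the pointer once, then test it" shape applies to the later fixes in veth_close() and veth_dellink() listed earlier in this log.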
8093315a91340bca52549044975d8c7f673b28a1 29-Dec-2012 Eric Dumazet <edumazet@google.com> veth: extend device features

veth is lacking most modern facilities, like SG, checksums, TSO.

It makes sense to extend dev->features to get them, or GRO aggregation
is defeated by a forced segmentation.

Reported-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2681128f0ced8aa4e66f221197e183cc16d244fe 29-Dec-2012 Eric Dumazet <edumazet@google.com> veth: reduce stat overhead

veth stats are a bit bloated. There is no need to account transmit
and receive stats, since they are absolutely symmetric.

Also use a per device atomic64_t for the dropped counter, as it
should never be used in fast path.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c07135633bee3f01a6454d15b6411f32cfbeb2fd 30-Nov-2012 Rami Rosen <ramirose@gmail.com> rtnelink: remove unused parameter from rtnl_create_link().

This patch removes an unused parameter (src_net) from the rtnl_create_link()
method and from the method's single invocation, in veth.
This parameter was used in the past when calling
ops->get_tx_queues(src_net, tb) in rtnl_create_link().
The get_tx_queues() member of rtnl_link_ops was replaced by two methods,
get_num_tx_queues() and get_num_rx_queues(), which take no
parameters. This was done in commit d40156aa5ecbd51fed932ed4813df82b56e5ff4d by
Jiri Pirko ("rtnl: allow to specify different num for rx and tx queue count").

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
23ea5a963768ff162a9ff8654589d7f7e1dfb780 30-Oct-2012 Hannes Frederic Sowa <hannes@stressinduktion.org> veth: allow changing the mac address while interface is up

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
e6f8f1a739b652c56e6f959d6714d92e05621e21 08-Aug-2012 Pavel Emelyanov <xemul@parallels.com> veth: Allow to create peer link with given ifindex

The ifinfomsg is in there (thanks kaber@ for foreseeing this a long time ago),
so take the given ifindex and register the netdev with it.

Ben noticed that this code path previously ignored ifmp->ifi_index and
userland could be passing in garbage. Thus it may now fail occasionally
because the value clashes with an existing interface.

To address this it's assumed that if the caller specifies the ifindex for
the veth master device, then it's aware of this possibility and should
explicitly specify (or set to 0 for auto-assignment) the peer's ifindex as
well. With this the compatibility with old tools not setting ifindex is
preserved.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f2cedb63df14342ad40a8b5b324fc5d94a60b665 15-Feb-2012 Danny Kukawka <danny.kukawka@bisect.de> net: replace random_ether_addr() with eth_hw_addr_random()

Replace usage of random_ether_addr() with eth_hw_addr_random()
to set addr_assign_type correctly to NET_ADDR_RANDOM.

Change the trivial cases.

v2: adapt to renamed eth_hw_addr_random()

Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
237114384ab22c174ec4641e809f8e6cbcfce774 15-Feb-2012 Thomas Graf <tgraf@suug.ch> veth: Enforce minimum size of VETH_INFO_PEER

VETH_INFO_PEER carries struct ifinfomsg plus optional IFLA
attributes. A minimal size of sizeof(struct ifinfomsg) must be
enforced or we may risk accessing that struct beyond the limits
of the netlink message.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
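
The size check described in commit 237114384ab2 above can be sketched as follows. This is a minimal userspace illustration, not the driver's code: demo_attr and demo_ifinfomsg are hypothetical stand-ins that mirror the general shape of a netlink attribute header (whose length field covers the header itself) and of struct ifinfomsg.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute header: len counts the header plus the payload. */
struct demo_attr {
    uint16_t len;
    uint16_t type;
};

/* Hypothetical fixed-size message expected at the start of the payload. */
struct demo_ifinfomsg {
    unsigned char  family;
    unsigned char  pad;
    unsigned short type;
    int            index;
    unsigned int   flags;
    unsigned int   change;
};

/*
 * Return true only if the attribute payload is large enough to hold a
 * complete demo_ifinfomsg, so that reading the struct cannot run past
 * the end of the message.
 */
static bool demo_peer_attr_valid(const struct demo_attr *attr)
{
    if (attr->len < sizeof(*attr))
        return false;   /* malformed: shorter than its own header */

    return (size_t)attr->len - sizeof(*attr) >=
           sizeof(struct demo_ifinfomsg);
}
```

Any optional attributes that follow the fixed struct would only be parsed after this check has passed.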
84b405011166e663fe9ef56c29b1d76f59b35568 21-Nov-2011 Rick Jones <rick.jones2@hp.com> Sweep away N/A fw_version dustbunnies from the .get_drvinfo routine of a number of drivers

Per discussion with Ben Hutchings and David Miller, go through and
remove assignments of "N/A" to fw_version in various drivers'
.get_drvinfo routines. While there clean-up some use of bare
constants and such.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
34324dc2bf27c1773045fea63cb11f7e2a6ad2b9 15-Nov-2011 Michał Mirosław <mirq-linux@rere.qmqm.pl> net: remove NETIF_F_NO_CSUM feature bit

The only distinct use is checking if NETIF_F_NOCACHE_COPY should be
enabled by default. The check heuristic is altered a bit here,
so it hits other people than before. The default shouldn't be
trusted for performance-critical cases anyway.

For all other uses NETIF_F_NO_CSUM is equivalent to NETIF_F_HW_CSUM.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
33a5ba144e3e7ffc1cd4a1d205e99c16078885bf 15-Nov-2011 Rick Jones <rick.jones2@hp.com> net: sweep-up some straglers in strlcpy conversion of .get_drvinfo routines

Convert some remaining stragglers' .get_drvinfo routines to use strlcpy
rather than strcpy/strncpy.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8ce120f11898c921329a5f618d01dcc1e8e69cac 05-Nov-2011 Eric Dumazet <eric.dumazet@gmail.com> net: better pcpu data alignment

Tunnels can force an alignment of their percpu data to reduce number of
cache lines used in fast path, or read in .ndo_get_stats()

percpu_alloc() is a very fine grained allocator, so any small hole will
be used anyway.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9d9779e723a5d23b94abbe5bb7d1197921f6f3dd 03-Jul-2011 Paul Gortmaker <paul.gortmaker@windriver.com> drivers/net: Add module.h to drivers who were implicitly using it

The device.h header was including module.h, making it present for
most of these drivers. But we want to clean that up. Call out the
include of module.h in the modular network drivers.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
550fd08c2cebad61c548def135f67aba284c6162 26-Jul-2011 Neil Horman <nhorman@tuxdriver.com> net: Audit drivers to identify those needing IFF_TX_SKB_SHARING cleared

After the last patch, we are left in a state in which only drivers calling
ether_setup have IFF_TX_SKB_SHARING set (we assume that drivers touching real
hardware call ether_setup for their net_devices and don't hold any state in
their skbs). There are a handful of drivers that violate this assumption of
course, and need to be fixed up. This patch identifies those drivers, and marks
them as not being able to support the safe transmission of skbs by clearing the
IFF_TX_SKB_SHARING flag in priv_flags.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Karsten Keil <isdn@linux-pingi.de>
CC: "David S. Miller" <davem@davemloft.net>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Patrick McHardy <kaber@trash.net>
CC: Krzysztof Halasa <khc@pm.waw.pl>
CC: "John W. Linville" <linville@tuxdriver.com>
CC: Greg Kroah-Hartman <gregkh@suse.de>
CC: Marcel Holtmann <marcel@holtmann.org>
CC: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
81b16ba2f1cc93a1ee1dda48be2ea2d91a0cb72e 06-Jul-2011 Eric Dumazet <eric.dumazet@gmail.com> veth: Kill unused tx_dropped

Followup to commit f82528bc13a (Exclude duplicated checking for
iface-up) : We no longer need percpu tx_dropped field.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3600cdadb7ab9ee5f4e73ed01242c3e8b8e3282c 06-Jul-2011 David S. Miller <davem@davemloft.net> veth: Kill unused code label and code block.

Signed-off-by: David S. Miller <davem@davemloft.net>
f82528bc13a157335dc53e78ce801883b26831e2 28-Jun-2011 Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Exclude duplicated checking for iface-up. This flag is checked in the 'is_skb_forwardable' function, which is a subroutine of 'dev_forward_skb'.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cf05c700cf6dd6f28bd95586d3040f809fd365f5 20-Jun-2011 Eric Dumazet <eric.dumazet@gmail.com> veth: fix 64bit stats on 32bit arches

Using 64bit stats on 32bit arches requires synchronization, or readers
can get transient values.

Fixes bug introduced in commit 6311cc44a2 (veth: convert to 64 bit
statistics)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
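
The synchronization that commit cf05c700cf6d above refers to is a sequence-counter retry protocol. The following userspace sketch shows the idea with C11 atomics; it is an assumption-laden illustration, not the kernel's u64_stats_sync implementation. All demo_* names are hypothetical, a single writer per structure is assumed, and the plain 64-bit fields stand in for values that a 32-bit machine would read in two halves.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-CPU counters guarded by a sequence counter. */
struct demo_pcpu_stats {
    uint64_t packets;
    uint64_t bytes;
    atomic_uint seq;   /* odd while the (single) writer is mid-update */
};

/* Writer side: bump the sequence before and after touching the counters. */
static void demo_stats_update(struct demo_pcpu_stats *s, uint64_t len)
{
    atomic_fetch_add(&s->seq, 1);   /* begin: sequence becomes odd */
    s->packets++;
    s->bytes += len;
    atomic_fetch_add(&s->seq, 1);   /* end: sequence becomes even */
}

/*
 * Reader side: retry until a snapshot is taken with an even sequence
 * number that did not change while the counters were being copied.
 */
static void demo_stats_read(struct demo_pcpu_stats *s,
                            uint64_t *packets, uint64_t *bytes)
{
    unsigned int start;

    do {
        start = atomic_load(&s->seq);
        *packets = s->packets;
        *bytes = s->bytes;
    } while ((start & 1) || start != atomic_load(&s->seq));
}
```

Without the retry loop, a reader on a 32-bit machine could combine the low half of a counter from before an update with the high half from after it, which is the "transient value" the commit message describes.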
6311cc44a23bb42636f5076fef0a67859d0a0102 08-Jun-2011 stephen hemminger <shemminger@vyatta.com> veth: convert to 64 bit statistics

Not much change, as the device was already keeping per cpu statistics.
Use the recent 64 bit statistics interface.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
534ea99b063de7c30262a8e22f0ab44dd7d11a71 13-May-2011 Shan Wei <shanwei@cn.fujitsu.com> net: drivers: kill two unused macro definitions

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6c8c44462ac8ac3f95929328f0c56e9e8b6dd524 30-Apr-2011 Jiri Pirko <jpirko@redhat.com> Revert: veth: remove unneeded ifname code from veth_newlink()

84c49d8c3e4abefb0a41a77b25aa37ebe8d6b743 ("veth: remove unneeded
ifname code from veth_newlink()") caused regression on veth
creation. This patch reverts the original one.

Reported-by: Michał Mirosław <mirqus@gmail.com>
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
707394972093e2056e1e8cc39be19cf9bcb3e7b3 27-Apr-2011 David Decotigny <decot@google.com> ethtool: cosmetic: Use ethtool ethtool_cmd_speed API

This updates the network drivers so that they don't access the
ethtool_cmd::speed field directly, but use ethtool_cmd_speed()
instead.

For most of the drivers, these changes are purely cosmetic and don't
fix any problem, such as for those 1GbE/10GbE drivers that indirectly
call their own ethtool get_settings()/mii_ethtool_gset(). The changes
are meant to enforce code consistency and provide robustness with
future larger throughputs, at the expense of a few CPU cycles for each
ethtool operation.

All drivers compiled with make allyesconfig on x86_64 have been
updated.

Tested: make allyesconfig on x86_64 + e1000e/bnx2x work
Signed-off-by: David Decotigny <decot@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a2c725fa39b79fcc3f09151e847cc006ff0d4389 31-Mar-2011 Michał Mirosław <mirq-linux@rere.qmqm.pl> veth: convert to hw_features

This should probably get TSO available as it's basically a loopback device.
Offloads are left disabled by default - as before.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
675071a2ef3f4a6d25ee002a7437d50431168344 22-Mar-2011 Eric W. Biederman <ebiederm@aristanetworks.com> veth: Fix the byte counters

Commit 44540960 "veth: move loopback logic to common location" introduced
a bug in the byte counters. I don't understand why that happened as it
is not explained in the comments and the mtu check in dev_forward_skb
retains the assumption that skb->len is the total length of the packet.

I just measured this empirically by setting up a veth pair between two
noop network namespaces and attempting a telnet connection between
the two. I saw three packets in each direction and the byte counters were
exactly 14*3 = 42 bytes high in each direction. I got the actual
packet lengths with tcpdump.

So remove the extra ETH_HLEN from the veth byte count totals.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
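
The arithmetic in commit 675071a2ef3f above can be checked with a small sketch. The packet lengths used with it are made up for illustration; only the header length of 14 bytes and the count of three packets come from the commit message. The demo_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_HLEN 14   /* length of an Ethernet header in bytes */

/* Byte count as the buggy code kept it: header length added on top. */
static uint64_t demo_bytes_overcounted(const uint32_t *pkt_len, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += pkt_len[i] + DEMO_ETH_HLEN;
    return total;
}

/* Byte count after the fix: the length already includes the header. */
static uint64_t demo_bytes_fixed(const uint32_t *pkt_len, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += pkt_len[i];
    return total;
}
```

For any three packets the two totals differ by 3 * 14 = 42 bytes, matching the excess reported in the commit message.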
84c49d8c3e4abefb0a41a77b25aa37ebe8d6b743 24-Jan-2011 Jiri Pirko <jpirko@redhat.com> veth: remove unneeded ifname code from veth_newlink()

The code is not needed because tb[IFLA_IFNAME] is already
processed in rtnl_newlink(). Remove this redundancy.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0b7967503dc97864f283a3a06fbe23e041876138 14-Dec-2010 Michał Mirosław <mirq-linux@rere.qmqm.pl> net/veth: Fix packet checksumming

We can't change ip_summed from CHECKSUM_PARTIAL to CHECKSUM_NONE
or CHECKSUM_UNNECESSARY because checksum in packet's headers is
not valid and will cause invalid checksum when frame is forwarded.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
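
The rule stated in commit 0b7967503dc9 above, that a packet in the CHECKSUM_PARTIAL state must keep that state when it is handed to the peer, can be sketched as follows. The demo_* enum and function are hypothetical, and the decision to mark unchecksummed packets as not needing verification when the receiver trusts the sender is an assumption made for illustration, not something the commit message states.

```c
/* Hypothetical mirror of the checksum states named in the commit. */
enum demo_ip_summed {
    DEMO_CHECKSUM_NONE,
    DEMO_CHECKSUM_UNNECESSARY,
    DEMO_CHECKSUM_COMPLETE,
    DEMO_CHECKSUM_PARTIAL
};

/*
 * Decide the checksum state of a packet as it crosses to the peer.
 * A PARTIAL packet carries an unfinished checksum in its headers, so
 * relabelling it would let an invalid checksum escape if the frame is
 * forwarded later. Only NONE is ever upgraded here (an assumption).
 */
static enum demo_ip_summed
demo_rx_ip_summed(enum demo_ip_summed tx_state, int receiver_trusts_sender)
{
    if (tx_state == DEMO_CHECKSUM_NONE && receiver_trusts_sender)
        return DEMO_CHECKSUM_UNNECESSARY;

    return tx_state;   /* in particular, PARTIAL stays PARTIAL */
}
```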
807540baae406c84dcb9c1c8ef07a56d2d2ae84a 23-Sep-2010 Eric Dumazet <eric.dumazet@gmail.com> drivers/net: return operator cleanup

Change "return (EXPR);" to "return EXPR;"

return is not a function, parentheses are not required.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6ec82562ffc6f297d0de36d65776cff8e5704867 06-May-2010 Eric Dumazet <eric.dumazet@gmail.com> veth: Dont kfree_skb() after dev_forward_skb()

In case of congestion, netif_rx() frees the skb, so we must assume
dev_forward_skb() also consumes the skb.

Bug introduced by commit 445409602c092
(veth: move loopback logic to common location)

We must change dev_forward_skb() to always consume skb, and veth to not
double free it.

Bug report : http://marc.info/?l=linux-netdev&m=127310770900442&w=3

Reported-by: Martín Ferrari <martin.ferrari@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5a0e3ad6af8660be21ca98a971cd00f331318c05 24-Mar-2010 Tejun Heo <tj@kernel.org> include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
3729d5021257b283f7fce33d957893162ccb2c9d 26-Feb-2010 Patrick McHardy <kaber@trash.net> rtnetlink: support specifying device flags on device creation

commit e8469ed959c373c2ff9e6f488aa5a14971aebe1f
Author: Patrick McHardy <kaber@trash.net>
Date: Tue Feb 23 20:41:30 2010 +0100

Support specifying the initial device flags when creating a device through
rtnl_link. Devices allocated by rtnl_create_link() are marked as INITIALIZING
in order to suppress netlink registration notifications. To complete setup,
rtnl_configure_link() must be called, which performs the device flag changes
and invokes the deferred notifiers if everything went well.

Two examples:

# add macvlan to eth0
#
$ ip link add link eth0 up allmulticast on type macvlan

[LINK]11: macvlan0@eth0: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
link/ether 26:f8:84:02:f9:2a brd ff:ff:ff:ff:ff:ff
[ROUTE]ff00::/8 dev macvlan0 table local metric 256 mtu 1500 advmss 1440 hoplimit 0
[ROUTE]fe80::/64 dev macvlan0 proto kernel metric 256 mtu 1500 advmss 1440 hoplimit 0
[LINK]11: macvlan0@eth0: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500
link/ether 26:f8:84:02:f9:2a
[ADDR]11: macvlan0 inet6 fe80::24f8:84ff:fe02:f92a/64 scope link
valid_lft forever preferred_lft forever
[ROUTE]local fe80::24f8:84ff:fe02:f92a via :: dev lo table local proto none metric 0 mtu 16436 advmss 16376 hoplimit 0
[ROUTE]default via fe80::215:e9ff:fef0:10f8 dev macvlan0 proto kernel metric 1024 mtu 1500 advmss 1440 hoplimit 0
[NEIGH]fe80::215:e9ff:fef0:10f8 dev macvlan0 lladdr 00:15:e9:f0:10:f8 router STALE
[ROUTE]2001:6f8:974::/64 dev macvlan0 proto kernel metric 256 expires 0sec mtu 1500 advmss 1440 hoplimit 0
[PREFIX]prefix 2001:6f8:974::/64 dev macvlan0 onlink autoconf valid 14400 preferred 131084
[ADDR]11: macvlan0 inet6 2001:6f8:974:0:24f8:84ff:fe02:f92a/64 scope global dynamic
valid_lft 86399sec preferred_lft 14399sec

# add VLAN to eth1, eth1 is down
#
$ ip link add link eth1 up type vlan id 1000
RTNETLINK answers: Network is down

<no events>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
47d742752df4c1088589d4424840bc761613ab2a 16-Feb-2010 Tejun Heo <tj@kernel.org> percpu: add __percpu sparse annotations to net drivers

Add __percpu sparse annotations to net drivers.

These annotations are to make sparse consider percpu variables to be
in a different address space and warn if accessed without going
through percpu accessors. This patch doesn't affect normal builds.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
445409602c09219767c06497c0dc2285eac244ed 26-Nov-2009 Arnd Bergmann <arnd@arndb.de> veth: move loopback logic to common location

The veth driver contains code to forward an skb
from the start_xmit function of one network
device into the receive path of another device.

Moving that code into a common location lets us
reuse the code for direct forwarding of data
between macvlan ports, and possibly in other
drivers.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2b1c8b0f925c3f5943cf95d263d72927baae88e7 18-Nov-2009 Eric Dumazet <eric.dumazet@gmail.com> veth: Fix veth_get_stats()

veth_get_stats() can be called in parallel on several cpus.

It's better to not reset dev->stats as it could give wrong result on
one cpu. Use temporary variables, then store the final results.

Also, we should loop over all possible cpus, not only online cpus,
or cpu hotplug can suddenly give wrong veth stats.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
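
The approach described in this commit can be sketched in userspace C. This is
only an illustration, not the driver code: `NR_POSSIBLE_CPUS`,
`struct pcpu_stats`, `struct dev_stats` and `sum_stats()` are hypothetical
names, and a fixed-size array stands in for the kernel's per-CPU storage and
possible-CPU mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: stands in for the set of possible (not just online) CPUs. */
#define NR_POSSIBLE_CPUS 4

/* Hypothetical per-CPU counter block. */
struct pcpu_stats {
	uint64_t rx_packets;
	uint64_t tx_packets;
};

/* Hypothetical device-wide totals that readers see. */
struct dev_stats {
	uint64_t rx_packets;
	uint64_t tx_packets;
};

/*
 * Sum into local temporaries and store the totals once at the end, so a
 * concurrent reader never observes a zeroed or half-accumulated result.
 */
void sum_stats(const struct pcpu_stats *percpu, struct dev_stats *out)
{
	uint64_t rx = 0, tx = 0;
	size_t cpu;

	assert(percpu != NULL && out != NULL);

	for (cpu = 0; cpu < NR_POSSIBLE_CPUS; cpu++) {
		rx += percpu[cpu].rx_packets;
		tx += percpu[cpu].tx_packets;
	}

	out->rx_packets = rx;
	out->tx_packets = tx;
}
```

Walking every possible slot means counters left behind by a CPU that has since
gone offline still contribute to the totals.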
81adee47dfb608df3ad0b91d230fb3cef75f0060 08-Nov-2009 Eric W. Biederman <ebiederm@aristanetworks.com> net: Support specifying the network namespace upon device creation.

There is no good reason to not support userspace specifying the
network namespace during device creation, and it makes it easier
to create a network device and pass it to a child network namespace
with a well known name.

We have to be careful to ensure that the target network namespace
for the new device exists through the life of the call. To keep
that logic clear I have factored out the network namespace grabbing
logic into rtnl_link_get_net.

In addition we need to continue to pass the source network namespace
to the rtnl_link_ops.newlink method so that we can find the base
device source network namespace.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
24540535d33f72505807be3e7ef2e94f3726f971 30-Oct-2009 Eric Dumazet <eric.dumazet@gmail.com> veth: Fix veth_dellink method

In commit 23289a37e2b127dfc4de1313fba15bb4c9f0cd5b
(net: add a list_head parameter to dellink() method),
I forgot to actually use this parameter in veth_dellink.

I remember feeling a bit uncomfortable about veth_close(),
because it does :

netif_carrier_off(dev);
netif_carrier_off(priv->peer);

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
23289a37e2b127dfc4de1313fba15bb4c9f0cd5b 27-Oct-2009 Eric Dumazet <eric.dumazet@gmail.com> net: add a list_head parameter to dellink() method

Adding a list_head parameter to rtnl_link_ops->dellink() methods
allows us to queue devices on a list, in order to dismantle
them all at once.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e7dcaa4755e35d7540bf19f316f8798357c53fa0 03-Oct-2009 Christoph Lameter <cl@linux-foundation.org> this_cpu: Eliminate get/put_cpu

There are cases where we can use this_cpu_ptr and as the result
of using this_cpu_ptr() we no longer need to determine the
currently executing cpu.

In those places no get/put_cpu combination is needed anymore.
The local cpu variable can be eliminated.

Preemption still needs to be disabled and enabled since the
modification of the per cpu variables is not atomic. There may
be multiple per cpu variables modified and those must all
be from the same processor.

Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
cc: Eric Biederman <ebiederm@aristanetworks.com>
cc: Stephen Hemminger <shemminger@vyatta.com>
cc: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
0fc0b732eaa38beb93a6fb62f77c7bd9622c76ec 02-Sep-2009 Stephen Hemminger <shemminger@vyatta.com> netdev: drivers should make ethtool_ops const

No need to put ethtool_ops in data, they should be const.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
424efe9caf6047ffbcd6b383ff4d2347254aabf1 31-Aug-2009 Stephen Hemminger <shemminger@vyatta.com> netdev: convert pseudo drivers to netdev_tx_t

These are all drivers that don't touch real hardware.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
27a242e92f77c955433ce0347533f401ecdcd0f3 21-Jul-2009 Ben Greear <greearb@candelatech.com> veth: Zero timestamp in xmit path.

This patch zeroes the timestamp before handing the packet to
the peer interface. This lets the peer recalculate the rx timestamp
if it cares about timestamps.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6ed106549d17474ca17a16057f4c0ed4eba5a7ca 23-Jun-2009 Patrick McHardy <kaber@trash.net> net: use NETDEV_TX_OK instead of 0 in ndo_start_xmit() functions

This patch is the result of an automatic spatch transformation to convert
all ndo_start_xmit() return values of 0 to NETDEV_TX_OK.

Some occurrences are missed by the automatic conversion, those will be
handled in a separate patch.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
11687a1099583273a8a98ec42af62b5bb5a69e45 25-Jun-2009 David S. Miller <davem@davemloft.net> Revert "veth: prevent oops caused by netdev destructor"

This reverts commit ae0e8e82205c903978a79ebf5e31c670b61fa5b4.

This change had two problems:

1) Since it frees the stats in the drivers' close method, we
can OOPS in the transmit routine.

2) stats are no longer remembered across ifdown/ifup which
disagrees with how every other device operates.

Thanks to analysis and test patch from Serge E. Hallyn
and initial OOPS report by Sachin Sant.

Signed-off-by: David S. Miller <davem@davemloft.net>
60df914e295a21a223e43a7ee01e0c73c64dd111 30-May-2009 Eric Dumazet <eric.dumazet@gmail.com> veth: dont release skb->dst in veth_xmit()

No need to release skb->dst, it's now done by core network.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ae0e8e82205c903978a79ebf5e31c670b61fa5b4 27-Apr-2009 Stephen Hemminger <shemminger@vyatta.com> veth: prevent oops caused by netdev destructor

From: Stephen Hemminger <shemminger@vyatta.com>

The veth driver will oops if sysfs hooks are open while module is removed.

The net device destructor can not point to code in a module; basically
there are only two possible safe values: NULL - no destructor, or
free_netdev - free on last use

Signed-off-by: David S. Miller <davem@davemloft.net>
38d408152a86598a50680a82fe3353b506630409 04-Mar-2009 Eric Biederman <ebiederm@aristanetworks.com> veth: Allow setting the L3 MTU

The limitation to only 1500 byte mtu's limits the utility of the veth
device for testing routing. So implement a configurable
MTU.

For consistency I drop packets on the receive side when they are
larger than the MTU. I count those drops. And I allow
a little padding for vlan headers.

I also test the mtu when a new device is created with netlink
because that path currently bypasses the current mtu setting
code.

Signed-off-by: Eric Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
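
The receive-side check described in this commit could look roughly like the
following userspace sketch. All names here (`struct rx_dev`, `rx_accept()`,
the `SKETCH_*` constants) are hypothetical, and the padding value of an
Ethernet header plus one 4-byte VLAN tag is an assumption about what "a little
padding for vlan headers" means, not a quote from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed sizes: 14-byte Ethernet header plus one 4-byte VLAN tag. */
#define SKETCH_ETH_HLEN  14
#define SKETCH_VLAN_HLEN 4
#define SKETCH_MTU_PAD   (SKETCH_ETH_HLEN + SKETCH_VLAN_HLEN)

/* Hypothetical receiving device: its L3 MTU and a drop counter. */
struct rx_dev {
	unsigned int mtu;
	unsigned long rx_dropped;
};

/*
 * Accept a frame only if its L2 length fits within the receiver's L3 MTU
 * plus the header padding; otherwise count it as dropped.
 */
bool rx_accept(struct rx_dev *rcv, size_t frame_len)
{
	assert(rcv != NULL);

	if (frame_len > (size_t)rcv->mtu + SKETCH_MTU_PAD) {
		rcv->rx_dropped++;
		return false;
	}
	return true;
}
```

With an MTU of 1500, this sketch accepts frames of up to 1518 bytes and counts
anything longer as a drop.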
2cf48a10aa1f45c7b1f1117a829f2f8a1a1309e2 25-Feb-2009 Eric W. Biederman <ebiederm@xmission.com> veth: Fix carrier detect

The current implementation of carrier detect in veth is broken.
It reports the link is down until both sides of the veth pair
are administratively up and then forever after it reports link up.

So fix veth so that it only reports link up when both interfaces
of the pair are administratively up.

Signed-off-by: Eric Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
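
The rule stated in this commit (carrier is up only while both ends are
administratively up) can be modelled in a few lines of userspace C. This is a
sketch under assumptions: `struct veth_end`, `end_open()` and `end_close()`
are hypothetical stand-ins for the driver's open/close handlers, and plain
booleans replace the kernel's carrier state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one end of a pair. */
struct veth_end {
	bool admin_up;          /* set by "ip link set ... up/down" */
	bool carrier;           /* what carrier detect reports */
	struct veth_end *peer;
};

/* Bringing one end up raises carrier only if the peer is already up. */
void end_open(struct veth_end *e)
{
	assert(e != NULL && e->peer != NULL);

	e->admin_up = true;
	if (e->peer->admin_up) {
		e->carrier = true;
		e->peer->carrier = true;
	}
}

/* Taking either end down drops carrier on both ends. */
void end_close(struct veth_end *e)
{
	assert(e != NULL && e->peer != NULL);

	e->admin_up = false;
	e->carrier = false;
	e->peer->carrier = false;
}
```

The old behaviour described above corresponds to never clearing `carrier` in
the close path, which is why the link kept reporting up forever after.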
ee923623177249cf22c43419ad0e8ff926dd1f58 22-Feb-2009 Daniel Lezcano <daniel.lezcano@free.fr> veth : add the set_mac_address capability

Fix lost set_mac_address capability.

Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
008298231abbeb91bc7be9e8b078607b816d1a4a 21-Nov-2008 Stephen Hemminger <shemminger@vyatta.com> netdev: add more functions to netdevice ops

This patch moves neigh_setup and hard_start_xmit into the network device ops
structure. For bisection, fix all the previously converted drivers as well.
Bonding driver took the biggest hit on this.

Added a prefetch of the hard_start_xmit in the fast path to try and reduce
any impact this would have.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4456e7bdf74c9f27e2312a6f197b2da467541433 20-Nov-2008 Stephen Hemminger <shemminger@vyatta.com> veth: convert to net_device_ops

Convert to net_device_ops function table.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3717746ef8b5a9279281b5d318496710984ed739 29-Oct-2008 Daniel Lezcano <dlezcano@fr.ibm.com> veth: remove unused list

The veth network device is stored in a list in the netdev private.
AFAICS, this list is never used so I removed this list from the code.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bb7bba3d56963be59bc6764c8021290ed55205ad 29-Oct-2008 Daniel Lezcano <dlezcano@fr.ibm.com> veth: Remove useless veth field

The veth private structure contains a netdev pointer referring to its peer.
This field is never used and it is pointless because if we can access
the veth_priv, that means we already have the netdev which is stored
in veth_priv->dev.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c346dca10840a874240c78efe3f39acf4312a1f2 25-Mar-2008 YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> [NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS.

Introduce per-net_device inlines: dev_net(), dev_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
c15853f2c1c9baaa27bbc494cd183be96f6d9bb9 20-Feb-2008 Daniel Lezcano <dlezcano@fr.ibm.com> veth: fix dev refcount race

When deleting the veth driver, veth_close calls netif_carrier_off
for the two extremities of the network device. netif_carrier_off on
the peer device will fire an event and hold a reference on the peer
device. Just after, the peer is unregistered taking the rtnl_lock while
the linkwatch_event is scheduled. If __linkwatch_run_queue does not
occur before the unregistering, unregister_netdevice will wait for
the dev refcount to reach zero holding the rtnl_lock and linkwatch_event
will wait for the rtnl_lock and hold the dev refcount.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
68365458a4252fa993b91a00f7a0b18fed399f0d 21-Jan-2008 Patrick McHardy <kaber@trash.net> [NET]: rtnl_link: fix use-after-free

When unregistering the rtnl_link_ops, all existing devices using
the ops are destroyed. With nested devices this may lead to a
use-after-free despite the use of for_each_netdev_safe() in case
the upper device is next in the device list and is destroyed
by the NETDEV_UNREGISTER notifier.

The easy fix is to restart scanning the device list after removing
a device. Alternatively we could add new devices to the front of
the list to avoid having dependent devices follow the device they
depend on. A third option would be to only restart scanning if
dev->iflink of the next device matches dev->ifindex of the current
one. For now this seems like the safest solution.

With this patch, the veth rtnl_link_ops unregistration can use
rtnl_link_unregister() directly since it now also handles destruction
of multiple devices at once.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
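
The "restart scanning after removing a device" fix can be illustrated with a
small userspace model. Everything here is hypothetical (`struct sketch_dev`,
`destroy_dev()`, `unregister_ops()`), and an array with an `alive` flag stands
in for the kernel's device list, so the sketch shows the control flow rather
than an actual use-after-free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device record. lower is the index of the device this one
 * is stacked on, or -1 if it is not stacked on anything. */
struct sketch_dev {
	bool alive;
	bool uses_ops;   /* created by the link ops being unregistered */
	int lower;
};

/* Destroy one device; like an unregister notifier, this also destroys
 * any upper device stacked on top of it. */
void destroy_dev(struct sketch_dev *devs, size_t n, size_t idx)
{
	size_t i;

	assert(devs != NULL && idx < n);

	devs[idx].alive = false;
	for (i = 0; i < n; i++)
		if (devs[i].alive && devs[i].lower == (int)idx)
			destroy_dev(devs, n, i);
}

/* Remove every device using the ops. Because a removal may take other
 * entries with it, the scan restarts from the beginning each time
 * instead of trusting a saved "next" position. Returns the number of
 * top-level removals performed. */
size_t unregister_ops(struct sketch_dev *devs, size_t n)
{
	size_t removed = 0;
	size_t i;

restart:
	for (i = 0; i < n; i++) {
		if (devs[i].alive && devs[i].uses_ops) {
			destroy_dev(devs, n, i);
			removed++;
			goto restart;
		}
	}
	return removed;
}
```

Restarting costs extra passes over the list, but it never dereferences a
cached successor that a nested removal may already have freed.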
ecef969e5b376f98b142e22deb1cec2f23e4f5d6 26-Dec-2007 Stephen Hemminger <shemminger@linux-foundation.org> [VETH]: move veth.h to include/linux

Move veth.h from net/ to linux/ since it is a user api, and add it to
user header processing Kbuild.

[ Use header-y as suggested by Sam Ravnborg. -DaveM ]

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
b9f2c0440d806e01968c3ed4def930a43be248ad 04-Oct-2007 Jeff Garzik <jeff@garzik.org> [netdrvr] Stop using legacy hooks ->self_test_count, ->get_stats_count

These have been superseded by the new ->get_sset_count() hook.

Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
881d966b48b035ab3f3aeaae0f3d3f9b584f45b2 17-Sep-2007 Eric W. Biederman <ebiederm@xmission.com> [NET]: Make the device list and device lookups per namespace.

This patch makes most of the generic device layer network
namespace safe. This patch makes dev_base_head a
network namespace variable, and then it picks up
a few associated variables. The functions:
dev_getbyhwaddr
dev_getfirsthwbytype
dev_get_by_flags
dev_get_by_name
__dev_get_by_name
dev_get_by_index
__dev_get_by_index
dev_ioctl
dev_ethtool
dev_load
wireless_process_ioctl

were modified to take a network namespace argument, and
deal with it.

vlan_ioctl_set and brioctl_set were modified so their
hooks will receive a network namespace argument.

So basically anything in the core of the network stack that was
affected by the change of dev_base was modified to handle
multiple network namespaces. The rest of the network stack was
simply modified to explicitly use &init_net the initial network
namespace. This can be fixed when those components of the network
stack are modified to handle multiple network namespaces.

For now the ifindex generator is left global.

Fundamentally ifindex numbers are per namespace, or else
we will have corner case problems with migration when
we get that far.

At the same time there are assumptions in the network stack
that the ifindex of a network device won't change. Making
the ifindex number global seems a good compromise until
the network stack can cope with ifindex changes when
you change namespaces, and the like.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e314dbdc1c0dc6a548ecf0afce28ecfd538ff568 26-Sep-2007 Pavel Emelyanov <xemul@openvz.org> [NET]: Virtual ethernet device driver.

Veth stands for Virtual ETHernet. It is a simple tunnel driver
that works at the link layer and looks like a pair of ethernet
devices interconnected with each other.

Mainly it allows communication between network namespaces, but
it can be used as is as well.

The newlink callback is organized that way to make it easy to
create the peer device in the separate namespace when we have
them in kernel.

This implementation uses another interface - the RTM_NEWLINK
message introduced by Patrick.

Bug fixes from Daniel Lezcano.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>