History log of /net/openvswitch/datapath.c
Revision Date Author Comments
8ec609d8b561468691b60347ff594bd443ea58c0 12-Nov-2014 Pravin B Shelar <pshelar@nicira.com> openvswitch: Convert dp rcu read operation to locked operations

dp read operations depends on ovs_dp_cmd_fill_info(). This API
needs to looup vport to find dp name, but vport lookup can
fail. Therefore to keep vport reference alive we need to
take ovs lock.

Introduced by commit 6093ae9abac1 ("openvswitch: Minimize
dp and vport critical sections").

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
330966e501ffe282d7184fde4518d5e0c24bc7f8 20-Oct-2014 Florian Westphal <fw@strlen.de> net: make skb_gso_segment error handling more robust

skb_gso_segment has three possible return values:
1. a pointer to the first segmented skb
2. an errno value (IS_ERR())
3. NULL. This can happen when GSO is used for header verification.

However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL
and would oops when NULL is returned.

Note that these call sites should never actually see such a NULL return
value; all callers mask out the GSO bits in the feature argument.

However, there have been issues with some protocol handlers erronously not
respecting the specified feature mask in some cases.

It is preferable to get 'have to turn off hw offloading, else slow' reports
rather than 'kernel crashes'.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
f5796684069e0c71c65bce6a6d4766114aec1396 04-Oct-2014 Jesse Gross <jesse@nicira.com> openvswitch: Add support for Geneve tunneling.

The Openvswitch implementation is completely agnostic to the options
that are in use and can handle newly defined options without
further work. It does this by simply matching on a byte array
of options and allowing userspace to setup flows on this array.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Singed-off-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Thomas Graf <tgraf@noironetworks.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6b205b2ca17e88ef5e10451b720056b790cc63a5 04-Oct-2014 Jesse Gross <jesse@nicira.com> openvswitch: Factor out allocation and verification of actions.

As the size of the flow key grows, it can put some pressure on the
stack. This is particularly true in ovs_flow_cmd_set(), which needs several
copies of the key on the stack. One of those uses is logically separate,
so this factors it out to reduce stack pressure and improve readibility.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
67fa034194bf82a3d5ca841759d921297daa63ca 04-Oct-2014 Jesse Gross <jesse@nicira.com> openvswitch: Add support for matching on OAM packets.

Some tunnel formats have mechanisms for indicating that packets are
OAM frames that should be handled specially (either as high priority or
not forwarded beyond an endpoint). This provides support for allowing
those types of packets to be matched.

Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9b67aa4a82492f128adfccc63e61ab57c1ce1dfd 18-Sep-2014 Samuel Gauthier <samuel.gauthier@6wind.com> openvswitch: restore OVS_FLOW_CMD_NEW notifications

Since commit fb5d1e9e127a ("openvswitch: Build flow cmd netlink reply only if needed."),
the new flows are not notified to the listeners of OVS_FLOW_MCGROUP.

This commit fixes the problem by using the genl function, ie
genl_has_listerners() instead of netlink_has_listeners().

Signed-off-by: Samuel Gauthier <samuel.gauthier@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
971427f353f3c42c8dcef62e7124440df68eb809 16-Sep-2014 Andy Zhou <azhou@nicira.com> openvswitch: Add recirc and hash action.

Recirc action allows a packet to reenter openvswitch processing.
currently openvswitch lookup flow for packet received and execute
set of actions on that packet, with help of recirc action we can
process/modify the packet and recirculate it back in openvswitch
for another pass.

OVS hash action calculates 5-tupple hash and set hash in flow-key
hash. This can be used along with recirculation for distributing
packets among different ports for bond devices.
For example:
OVS bonding can use following actions:
Match on: bond flow; Action: hash, recirc(id)
Match on: recirc-id == id and hash lower bits == a;
Action: output port_bond_a

Signed-off-by: Andy Zhou <azhou@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
8c8b1b83fcdd0f05e1f66ed6f8a2e831d5d374a2 16-Sep-2014 Pravin B Shelar <pshelar@nicira.com> openvswitch: Use tun_key only for egress tunnel path.

Currently tun_key is used for passing tunnel information
on ingress and egress path, this cause confusion. Following
patch removes its use on ingress path make it egress only parameter.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
83c8df26a3b654871c0503fcf6eac61777e12ea1 16-Sep-2014 Pravin B Shelar <pshelar@nicira.com> openvswitch: refactor ovs flow extract API.

OVS flow extract is called on packet receive or packet
execute code path. Following patch defines separate API
for extracting flow-key in packet execute code path.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
2ff3e4e4868675da1024175215991fa6d9856731 16-Sep-2014 Pravin B Shelar <pshelar@nicira.com> openvswitch: Remove pkt_key from OVS_CB

OVS keeps pointer to packet key in skb->cb, but the packet key is
store on stack. This could make code bit tricky. So it is better to
get rid of the pointer.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
c5eba0b6f84eb4f0fdc1d8a4abc1c7d40db6e8a6 03-Sep-2014 Li RongQing <roy.qing.li@gmail.com> openvswitch: distinguish between the dropped and consumed skb

distinguish between the dropped and consumed skb, not assume the skb
is consumed always

Cc: Thomas Graf <tgraf@noironetworks.com>
Cc: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4ee45ea05c8710c7ab8a5eb1a72700b874712746 02-Sep-2014 Li RongQing <roy.qing.li@gmail.com> openvswitch: fix a memory leak

The user_skb maybe be leaked if the operation on it failed and codes
skipped into the label "out:" without calling genlmsg_unicast.

Cc: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2072ec846a4c4ee99a6e59ce989b49b22edad59d 07-Aug-2014 Jean Sacren <sakiwit@gmail.com> openvswitch: fix duplicate #include headers

The #include headers net/genetlink.h and linux/genetlink.h both were
included twice, so delete each of the duplicate.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Cc: Pravin Shelar <pshelar@nicira.com>
Cc: dev@openvswitch.org
Signed-off-by: David S. Miller <davem@davemloft.net>
d0e992aa0270663872c56f07473a7f43adee5bd5 26-Jul-2014 Himangi Saraogi <himangi774@gmail.com> openvswitch: Use IS_ERR_OR_NULL

This patch introduces the use of the macro IS_ERR_OR_NULL in place of
tests for NULL and IS_ERR.

The following Coccinelle semantic patch was used for making the change:

@@
expression e;
@@

- e == NULL || IS_ERR(e)
+ IS_ERR_OR_NULL(e)
|| ...

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f53e38317d581399eb67809d6b6b6c2c107db50c 18-Jul-2014 Andy Zhou <azhou@nicira.com> openvswitch: Avoid memory corruption in queue_userspace_packet()

In queue_userspace_packet(), the ovs_nla_put_flow return value is
not checked. This is fine as long as key_attr_size() returns the
correct value. In case it does not, the current code may corrupt buffer
memory. Add a run time assertion catch this case to avoid silent
failure.

Reported-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
5cd667b0a4567048bb555927d6ee564f4e5620a9 18-Jul-2014 Alex Wang <alexw@nicira.com> openvswitch: Allow each vport to have an array of 'port_id's.

In order to allow handlers directly read upcalls from datapath,
we need to support per-handler netlink socket for each vport in
datapath. This commit makes this happen. Also, it is guaranteed
to be backward compatible with previous branch.

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
48e48a70c08a8a68f8697f8b30cb83775bda8001 16-Jul-2014 stephen hemminger <stephen@networkplumber.org> openvswitch: make generic netlink group const

Generic netlink tables can be const.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5b9e7e1607956e2454ccbd94ccf5631309ade054 26-Jun-2014 Jiri Pirko <jiri@resnulli.us> openvswitch: introduce rtnl ops stub

This stub now allows userspace to see IFLA_INFO_KIND for ovs master and
IFLA_INFO_SLAVE_KIND for slave.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
4a46b24e147dfa9b858026da02cad0bdd4e149d2 01-Jul-2014 Alex Wang <alexw@nicira.com> openvswitch: Use exact lookup for flow_get and flow_del.

Due to the race condition in userspace, there is chance that two
overlapping megaflows could be installed in datapath. And this
causes userspace unable to delete the less inclusive megaflow flow
even after it timeout, since the flow_del logic will stop at the
first match of masked flow.

This commit fixes the bug by making the kernel flow_del and flow_get
logic check all masks in that case.

Introduced by 03f0d916a (openvswitch: Mega flow implementation).

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
ad55200734c65a3ec5d0c39d6ea904008baea536 07-May-2014 Ben Pfaff <blp@nicira.com> openvswitch: Fix tracking of flags seen in TCP flows.

Flow statistics need to take into account the TCP flags from the packet
currently being processed (in 'key'), not the TCP flags matched by the
flow found in the kernel flow table (in 'flow').

This bug made the Open vSwitch userspace fin_timeout action have no effect
in many cases.
This bug is introduced by commit 88d73f6c411ac2f0578 (openvswitch: Use
TCP flags in the flow key for stats.)

Reported-by: Len Gao <leng@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
0c200ef94c9492205e18a18c25650cf27939889c 07-May-2014 Pravin B Shelar <pshelar@nicira.com> openvswitch: Simplify genetlink code.

Following patch get rid of struct genl_family_and_ops which is
redundant due to changes to struct genl_family.

Signed-off-by: Kyle Mestery <mestery@noironetworks.com>
Acked-by: Kyle Mestery <mestery@noironetworks.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
893f139b9a6c00c097b9082a90f3041cfb3a0d20 06-May-2014 Jarno Rajahalme <jrajahalme@nicira.com> openvswitch: Minimize ovs_flow_cmd_new|set critical sections.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
37bdc87ba00dadd0156db77ba48224d042202435 05-May-2014 Jarno Rajahalme <jrajahalme@nicira.com> openvswitch: Split ovs_flow_cmd_new_or_set().

Following patch will be easier to reason about with separate
ovs_flow_cmd_new() and ovs_flow_cmd_set() functions.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
aed067783e505bf66dcafa8647d08619eb5b1c55 05-May-2014 Jarno Rajahalme <jrajahalme@nicira.com> openvswitch: Minimize ovs_flow_cmd_del critical section.

ovs_flow_cmd_del() now allocates reply (if needed) after the flow has
already been removed from the flow table. If the reply allocation
fails, a netlink error is signaled with netlink_set_err(), as is
already done in ovs_flow_cmd_new_or_set() in the similar situation.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
0e9796b4af9ef490e203158cb738a5a4986eb75c 05-May-2014 Jarno Rajahalme <jrajahalme@nicira.com> openvswitch: Reduce locking requirements.

Reduce and clarify locking requirements for ovs_flow_cmd_alloc_info(),
ovs_flow_cmd_fill_info() and ovs_flow_cmd_build_info().

A datapath pointer is available only when holding a lock. Change
ovs_flow_cmd_fill_info() and ovs_flow_cmd_build_info() to take a
dp_ifindex directly, rather than a datapath pointer that is then
(only) used to get the dp_ifindex. This is useful, since the
dp_ifindex is available even when the datapath pointer is not, both
before and after taking a lock, which makes further critical section
reduction possible.

Make ovs_flow_cmd_alloc_info() take an 'acts' argument instead a
'flow' pointer. This allows some future patches to do the allocation
before acquiring the flow pointer.

The locking requirements after this patch are:

ovs_flow_cmd_alloc_info(): May be called without locking, must not be
called while holding the RCU read lock (due to memory allocation).
If 'acts' belong to a flow in the flow table, however, then the
caller must hold ovs_mutex.

ovs_flow_cmd_fill_info(): Either ovs_mutex or RCU read lock must be held.

ovs_flow_cmd_build_info(): This calls both of the above, so the caller
must hold ovs_mutex.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
6093ae9abac18871afd0bbc5cf093dff53112fcb 05-May-2014 Jarno Rajahalme <jrajahalme@nicira.com> openvswitch: Minimize dp and vport critical sections.

Move most memory allocations away from the ovs_mutex critical
sections. vport allocations still happen while the lock is taken, as
changing that would require major refactoring. Also, vports are
created very rarely so it should not matter.

Change ovs_dp_cmd_get() now only takes the rcu_read_lock(), rather
than ovs_lock(), as nothing need to be changed. This was done by
ovs_vport_cmd_get() already.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
fb5d1e9e127ad1542e5db20cd8620a1509baef69 05-May-2014 Jarno Rajahalme <jrajahalme@nicira.com> openvswitch: Build flow cmd netlink reply only if needed.

Use netlink_has_listeners() and NLM_F_ECHO flag to determine if a
reply is needed or not for OVS_FLOW_CMD_NEW, OVS_FLOW_CMD_SET, or
OVS_FLOW_CMD_DEL. Currently, OVS userspace does not request a reply
for OVS_FLOW_CMD_NEW, but usually does for OVS_FLOW_CMD_DEL, as stats
may have changed.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
bb6f9a708d4067713afae2e9eb2637f6b4c01ecb 05-May-2014 Jarno Rajahalme <jrajahalme@nicira.com> openvswitch: Clarify locking.

Remove unnecessary locking from functions that are always called with
appropriate locking.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
be52c9e96a6657d117bb0ec6e11438fb246af5c7 05-May-2014 Jarno Rajahalme <jrajahalme@nicira.com> openvswitch: Avoid assigning a NULL pointer to flow actions.

Flow SET can accept an empty set of actions, with the intended
semantics of leaving existing actions unmodified. This seems to have
been brokin after OVS 1.7, as we have assigned the flow's actions
pointer to NULL in this case, but we never check for the NULL pointer
later on. This patch restores the intended behavior and documents it
in the include/linux/openvswitch.h.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
23dabf88abb48a866fdb19ee08ebcf1ddd9b1840 27-Mar-2014 Jarno Rajahalme <jrajahalme@nicira.com> openvswitch: Remove 5-tuple optimization.

The 5-tuple optimization becomes unnecessary with a later per-NUMA
node stats patch. Remove it first to make the changes easier to
grasp.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
36d5fe6a000790f56039afe26834265db0a3ad4c 26-Mar-2014 Zoltan Kiss <zoltan.kiss@citrix.com> core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors

skb_zerocopy can copy elements of the frags array between skbs, but it doesn't
orphan them. Also, it doesn't handle errors, so this patch takes care of that
as well, and modify the callers accordingly. skb_tx_error() is also added to
the callers so they will signal the failed delivery towards the creator of the
skb.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
57a7744e09867ebcfa0ccf1d6d529caa7728d552 14-Mar-2014 Eric W. Biederman <ebiederm@xmission.com> net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irq

Replace the bh safe variant with the hard irq safe variant.

We need a hard irq safe variant to deal with netpoll transmitting
packets from hard irq context, and we need it in most if not all of
the places using the bh safe variant.

Except on 32bit uni-processor the code is exactly the same so don't
bother with a bh variant, just have a hard irq safe variant that
everyone can use.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
42ee19e2939277a5277c307e517ce2d7ba5f0703 16-Feb-2014 Jarno Rajahalme <jrajahalme@nicira.com> openvswitch: Fix race.

ovs_vport_cmd_dump() did rcu_read_lock() only after getting the
datapath, which could have been deleted in between. Resolved by
taking rcu_read_lock() before the get_dp() call.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
3c7eacfc8a9a4c2bd48e0093c4f43cf69afd5210 14-Feb-2014 Jiri Pirko <jiri@resnulli.us> ovs: fix dp check in ovs_dp_reset_user_features

This fixes crash when userspace does "ovs-dpctl add-dp dev" where dev is
existing non-dp netdevice.

Introduced by:
commit 44da5ae5fbea4686f667dc854e5ea16814e44c59
"openvswitch: Drop user features if old user space attempted to create datapath"

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jesse Gross <jesse@nicira.com>
df9d9fdf8fdad710949ce52a403684c991ced29b 15-Feb-2014 WANG Cong <xiyou.wangcong@gmail.com> openvswitch: rename ->sync to ->syncp

Openvswitch defines u64_stats_sync as ->sync rather than ->syncp,
so fails to compile with netdev_alloc_pcpu_stats(). So just rename it to ->syncp.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: 1c213bd24ad04f4430031 (net: introduce netdev_alloc_pcpu_stats() for drivers)
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1c213bd24ad04f4430031d20d740d7783162b099 13-Feb-2014 WANG Cong <xiyou.wangcong@gmail.com> net: introduce netdev_alloc_pcpu_stats() for drivers

There are many drivers calling alloc_percpu() to allocate pcpu stats
and then initializing ->syncp. So just introduce a helper function for them.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c14e0953ca51dbcb8d1ac92acbdcff23d0caa158 03-Feb-2014 Andy Zhou <azhou@nicira.com> openvswitch: Suppress error messages on megaflow updates

With subfacets, we'd expect megaflow updates message to carry
the original micro flow. If not, EINVAL is returned and kernel
logs an error message. Now that the user space subfacet layer is
removed, it is expected that flow updates can arrive with a
micro flow other than the original. Change the return code to
EEXIST and remove the kernel error log message.

Reported-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
45fb9c35b27c9982e9a55d04ed0a5230a2d0b306 23-Jan-2014 Daniele Di Proietto <daniele.di.proietto@gmail.com> openvswitch: Fix ovs_dp_cmd_msg_size()

commit 43d4be9cb55f3bac5253e9289996fd9d735531db (openvswitch: Allow user space
to announce ability to accept unaligned Netlink messages) introduced
OVS_DP_ATTR_USER_FEATURES netlink attribute in datapath responses,
but the attribute size was not taken into account in ovs_dp_cmd_msg_size().

Signed-off-by: Daniele Di Proietto <daniele.di.proietto@gmail.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
e80857cce82da31e41a6599fc888dfc92e0167cc 21-Jan-2014 Andy Zhou <azhou@nicira.com> openvswitch: Fix kernel panic on ovs_flow_free

Both mega flow mask's reference counter and per flow table mask list
should only be accessed when holding ovs_mutex() lock. However
this is not true with ovs_flow_table_flush(). The patch fixes this bug.

Reported-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
aea0bb4f8ee513537ad84b9f3f609f96e272d98e 14-Jan-2014 Thomas Graf <tgraf@suug.ch> openvswitch: Pad OVS_PACKET_ATTR_PACKET if linear copy was performed

While the zerocopy method is correctly omitted if user space
does not support unaligned Netlink messages. The attribute is
still not padded correctly as skb_zerocopy() will not ensure
padding and the attribute size is no longer pre calculated
though nla_reserve() which ensured padding previously.

This patch applies appropriate padding if a linear data copy
was performed in skb_zerocopy().

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
443cd88c8a31379e95326428bbbd40af25c1d440 17-Dec-2013 Stephen Hemminger <stephen@networkplumber.org> ovs: make functions local

Several functions and datastructures could be local
Found with 'make namespacecheck'

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jesse Gross <jesse@nicira.com>
09c5e6054e206ecf13945f50711856a5cb2d5de1 13-Dec-2013 Thomas Graf <tgraf@suug.ch> openvswitch: Compute checksum in skb_gso_segment() if needed

The copy & csum optimization is no longer present with zerocopy
enabled. Compute the checksum in skb_gso_segment() directly by
dropping the HW CSUM capability from the features passed in.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
bda56f143c9dc38ae7926ba21ebeb35359a6c051 13-Dec-2013 Thomas Graf <tgraf@suug.ch> openvswitch: Use skb_zerocopy() for upcall

Use of skb_zerocopy() can avoid the expensive call to memcpy()
when copying the packet data into the Netlink skb. Completes
checksum through skb_checksum_help() if not already done in
GSO segmentation.

Zerocopy is only performed if user space supported unaligned
Netlink messages. memory mapped netlink i/o is preferred over
zerocopy if it is set up.

Cost of upcall is significantly reduced from:
+ 7.48% vhost-8471 [k] memcpy
+ 5.57% ovs-vswitchd [k] memcpy
+ 2.81% vhost-8471 [k] csum_partial_copy_generic

to:
+ 5.72% ovs-vswitchd [k] memcpy
+ 3.32% vhost-5153 [k] memcpy
+ 0.68% vhost-5153 [k] skb_zerocopy

(megaflows disabled)

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
8055a89cfa533f70bea5970727a50e220bb7d18e 13-Dec-2013 Thomas Graf <tgraf@suug.ch> openvswitch: Pass datapath into userspace queue functions

Allows removing the net and dp_ifindex argument and simplify the
code.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
44da5ae5fbea4686f667dc854e5ea16814e44c59 13-Dec-2013 Thomas Graf <tgraf@suug.ch> openvswitch: Drop user features if old user space attempted to create datapath

Drop user features if an outdated user space instance that does not
understand the concept of user_features attempted to create a new
datapath.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
43d4be9cb55f3bac5253e9289996fd9d735531db 13-Dec-2013 Thomas Graf <tgraf@suug.ch> openvswitch: Allow user space to announce ability to accept unaligned Netlink messages

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
e298e505700604c97e6a9edb21cebb080bdb91f6 30-Oct-2013 Pravin B Shelar <pshelar@nicira.com> openvswitch: Per cpu flow stats.

With mega flow implementation ovs flow can be shared between
multiple CPUs which makes stats updates highly contended
operation. This patch uses per-CPU stats in cases where a flow
is likely to be shared (if there is a wildcard in the 5-tuple
and therefore likely to be spread by RSS). In other situations,
it uses the current strategy, saving memory and allocation time.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
795449d8b846a42d11d47d6ff2f51ab2967411c3 30-Nov-2013 Thomas Graf <tgraf@suug.ch> openvswitch: Enable memory mapped Netlink i/o

Use memory mapped Netlink i/o for all unicast openvswitch
communication if a ring has been set up.

Benchmark
* pktgen -> ovs internal port
* 5M pkts, 5M flows
* 4 threads, 8 cores

Before:
Result: OK: 67418743(c67108212+d310530) usec, 5000000 (9000byte,0frags)
74163pps 5339Mb/sec (5339736000bps) errors: 0
+ 2.98% ovs-vswitchd [k] copy_user_generic_string
+ 2.49% ovs-vswitchd [k] memcpy
+ 1.84% kpktgend_2 [k] memcpy
+ 1.81% kpktgend_1 [k] memcpy
+ 1.81% kpktgend_3 [k] memcpy
+ 1.78% kpktgend_0 [k] memcpy

After:
Result: OK: 24229690(c24127165+d102524) usec, 5000000 (9000byte,0frags)
206358pps 14857Mb/sec (14857776000bps) errors: 0
+ 2.80% ovs-vswitchd [k] memcpy
+ 1.31% kpktgend_2 [k] memcpy
+ 1.23% kpktgend_0 [k] memcpy
+ 1.09% kpktgend_1 [k] memcpy
+ 1.04% kpktgend_3 [k] memcpy
+ 0.96% ovs-vswitchd [k] copy_user_generic_string

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
663efa3696232300a8ad3a46bb10482fc0b861cf 03-Dec-2013 Jesse Gross <jesse@nicira.com> openvswitch: Silence RCU lockdep checks from flow lookup.

Flow lookup can happen either in packet processing context or userspace
context but it was annotated as requiring RCU read lock to be held. This
also allows OVS mutex to be held without causing warnings.

Reported-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Reviewed-by: Thomas Graf <tgraf@redhat.com>
5bb506324d150578afadd10c3198ef5b29f5876b 25-Nov-2013 Andy Zhou <azhou@nicira.com> openvswitch: Change ovs_flow_tbl_lookup_xx() APIs

API changes only for code readability. No functional chnages.

This patch removes the underscored version. Added a new API
ovs_flow_tbl_lookup_stats() that returns the n_mask_hits.

Reported by: Ben Pfaff <blp@nicira.com>
Reviewed-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2a94fe48f32ccf7321450a2cc07f2b724a444e5b 19-Nov-2013 Johannes Berg <johannes.berg@intel.com> genetlink: make multicast groups const, prevent abuse

Register generic netlink multicast groups as an array with
the family and give them contiguous group IDs. Then instead
of passing the global group ID to the various functions that
send messages, pass the ID relative to the family - for most
families that's just 0 because the only have one group.

This avoids the list_head and ID in each group, adding a new
field for the mcast group ID offset to the family.

At the same time, this allows us to prevent abusing groups
again like the quota and dropmon code did, since we can now
check that a family only uses a group it owns.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
68eb55031da7c967d954e5f9415cd05f4abdb692 19-Nov-2013 Johannes Berg <johannes.berg@intel.com> genetlink: pass family to functions using groups

This doesn't really change anything, but prepares for the
next patch that will change the APIs to pass the group ID
within the family, rather than the global group ID.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
62b68e99faa802352e9cb2ae91adecd8dfddf1b8 19-Nov-2013 Johannes Berg <johannes.berg@intel.com> genetlink: add and use genl_set_err()

Add a static inline to generic netlink to wrap netlink_set_err()
to make it easier to use here - use it in openvswitch (the only
generic netlink user of netlink_set_err()).

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c53ed7423619b4e8108914a9f31b426dd58ad591 19-Nov-2013 Johannes Berg <johannes.berg@intel.com> genetlink: only pass array to genl_register_family_with_ops()

As suggested by David Miller, make genl_register_family_with_ops()
a macro and pass only the array, evaluating ARRAY_SIZE() in the
macro, this is a little safer.

The openvswitch has some indirection, assing ops/n_ops directly in
that code. This might ultimately just assign the pointers in the
family initializations, saving the struct genl_family_and_ops and
code (once mcast groups are handled differently.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4534de8305b3f1460a527a0cda0e3dc2224c6f0c 14-Nov-2013 Johannes Berg <johannes.berg@intel.com> genetlink: make all genl_ops users const

Now that genl_ops are no longer modified in place when
registering, they can be made const. This patch was done
mostly with spatch:

@@
identifier ops;
@@
+const
struct genl_ops ops[] = {
...
};

(except the struct thing in net/openvswitch/datapath.c)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
827da44c61419f29ae3be198c342e2147f1a10cb 08-Oct-2013 John Stultz <john.stultz@linaro.org> net: Explicitly initialize u64_stats_sync structures for lockdep

In order to enable lockdep on seqcount/seqlock structures, we
must explicitly initialize any locks.

The u64_stats_sync structure, uses a seqcount, and thus we need
to introduce a u64_stats_init() function and use it to initialize
the structure.

This unfortunately adds a lot of fairly trivial initialization code
to a number of drivers. But the benefit of ensuring correctness makes
this worth while.

Because these changes are required for lockdep to be enabled, and the
changes are quite trivial, I've not yet split this patch out into 30-some
separate patches, as I figured it would be better to get the various
maintainers thoughts on how to best merge this change along with
the seqcount lockdep enablement.

Feedback would be appreciated!

Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mirko Lindner <mlindner@marvell.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Roger Luethi <rl@hellgate.ch>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Simon Horman <horms@verge.net.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
df23e9f642830f10c505c8a3d57772ad1238c701 23-Oct-2013 Jarno Rajahalme <jrajahalme@nicira.com> openvswitch: Widen TCP flags handling.

Widen TCP flags handling from 7 bits (uint8_t) to 12 bits (uint16_t).
The kernel interface remains at 8 bits, which makes no functional
difference now, as none of the higher bits is currently of interest
to the userspace.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
1bd7116f1cb833c998cddb6b188df463342069d8 22-Oct-2013 Andy Zhou <azhou@nicira.com> openvswitch: collect mega flow mask stats

Collect mega flow mask stats. ovs-dpctl show command can be used to
display them for debugging and performance tuning.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
618ed0c805b64c820279f50732110ab873221c3b 04-Oct-2013 Pravin B Shelar <pshelar@nicira.com> openvswitch: Simplify mega-flow APIs.

Hides mega-flow implementation in flow_table.c rather than
datapath.c.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
b637e4988c2d689bb43f943a5af0e684a4981159 04-Oct-2013 Pravin B Shelar <pshelar@nicira.com> openvswitch: Move mega-flow list out of rehashing struct.

ovs-flow rehash does not touch mega flow list. Following patch
moves it dp struct datapath. Avoid one extra indirection for
accessing mega-flow list head on every packet receive.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
e64457191a259537bbbfaebeba9a8043786af96f 04-Oct-2013 Pravin B Shelar <pshelar@nicira.com> openvswitch: Restructure datapath.c and flow.c

Over the time datapath.c and flow.c has became pretty large files.
Following patch restructures functionality of component into three
different components:

flow.c: contains flow extract.
flow_netlink.c: netlink flow api.
flow_table.c: flow table api.

This patch restructures code without changing logic.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
e7f133290660d976da8cb20e9bc7310d0cd19341 17-Sep-2013 Pravin B Shelar <pshelar@nicira.com> openvswitch: Move flow table rehashing to flow install.

Rehashing in ovs-workqueue can cause ovs-mutex lock contentions
in case of heavy flow setups where both needs ovs-mutex. So by
moving rehashing to flow-setup we can eliminate contention.
This also simplify ovs locking and reduces dependence on
workqueue.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
a175a723301a8a4a9fedf9ce5b8ca586e7a97b40 22-Aug-2013 Joe Stringer <joe@wand.net.nz> openvswitch: Add SCTP support

This patch adds support for rewriting SCTP src,dst ports similar to the
functionality already available for TCP/UDP.

Rewriting SCTP ports is expensive due to double-recalculation of the
SCTP checksums; this is performed to ensure that packets traversing OVS
with invalid checksums will continue to the destination with any
checksum corruption intact.

Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
03f0d916aa0317592dda11bd17c7357858719b6c 08-Aug-2013 Andy Zhou <azhou@nicira.com> openvswitch: Mega flow implementation

Add wildcarded flow support in kernel datapath.

Wildcarded flow can improve OVS flow set up performance by avoid sending
matching new flows to the user space program. The exact performance boost
will largely dependent on wildcarded flow hit rate.

In case all new flows hits wildcard flows, the flow set up rate is
within 5% of that of linux bridge module.

Pravin has made significant contributions to this patch. Including API
clean ups and bug fixes.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
59a35d60af3ab80037ad2fa2d60671ce2818b9e4 31-Jul-2013 Pravin B Shelar <pshelar@nicira.com> openvswitch: Use RCU lock for dp dump operation.

RCUfy dp-dump operation which is already read-only. This
makes all ovs dump operations lockless.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
d57170b1b1d71382a0d9cf31b01364a97add3f19 31-Jul-2013 Pravin B Shelar <pshelar@nicira.com> openvswitch: Use RCU lock for flow dump operation.

Flow dump operation is read-only operation. There is no need to
take ovs-lock. Following patch use rcu-lock for dumping flows.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
30444e981ba28e892c439017fbc011d867f02a7d 13-May-2013 Jesse Gross <jesse@nicira.com> openvswitch: Fix bad merge resolution.

git silently included an extra hunk in vport_cmd_set() during
automatic merging. This code is unreachable so it does not actually
introduce a problem but it is clearly incorrect.

Signed-off-by: Jesse Gross <jesse@nicira.com>
a3e82996a8874c4cfe8c7f1be4d552018d8cba7e 18-Jun-2013 Pravin B Shelar <pshelar@nicira.com> openvswitch: Optimize flow key match for non tunnel flows.

Following patch adds start offset for sw_flow-key, so that we can
skip tunneling information in key for non-tunnel flows.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7d5437c709ded4f152cb8b305d17972d6707f20c 18-Jun-2013 Pravin B Shelar <pshelar@nicira.com> openvswitch: Add tunneling interface.

Add ovs tunnel interface for set tunnel action for userspace.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
74f84a5726c7d08c27745305e67474b8645c541d 18-Jun-2013 Pravin B Shelar <pshelar@nicira.com> openvswitch: Copy individual actions.

Rather than validating actions and then copying all actiaons
in one block, following patch does same operation in single pass.
This validate and copy action one by one. This is required for
ovs tunneling patch.

This patch does not change any functionality.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
93d8fd1514b6862c3370ea92be3f3b4216e0bf8f 13-Jun-2013 Pravin B Shelar <pshelar@nicira.com> openvswitch: Simplify interface ovs_flow_metadata_from_nlattrs()

This is not functional change, this is just code cleanup.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
f44f340883388b57fe03edfb0982e038e57a992c 13-May-2013 Jesse Gross <jesse@nicira.com> openvswitch: Immediately exit on error in ovs_vport_cmd_set().

It is an error to try to change the type of a vport using the set
command. However, while we check that this is an error, we still
proceed to allocate memory which then gets freed immediately.
This stops processing after noticing the error, which does not
actually fix a bug but is more correct.

Signed-off-by: Jesse Gross <jesse@nicira.com>
cff63a52924c6a78fa525c67d81480c85736ff3c 29-Apr-2013 Thomas Graf <tgraf@suug.ch> openvswitch: Remove unneeded ovs_netdev_get_ifindex()

The only user is get_dpifindex(), no need to redirect via the port
operations.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3a4e0d6a95b2b6f7b22eb7c7361a0fc4289478eb 23-Apr-2013 Pravin B Shelar <pshelar@nicira.com> openvswitch: Use parallel_ops genl.

OVS locking was recently changed to have private OVS lock which
simplified overall locking. Therefore there is no need to have
another global genl lock to protect OVS data structures. Following
patch uses of parallel_ops genl family for OVS. This also allows
more granual OVS locking using ovs_mutex for protecting OVS data
structures, which gives more concurrencey. E.g multiple genl
operations OVS_PACKET_CMD_EXECUTE can run in parallel, etc.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
86a9bad3ab6b6f858fd4443b48738cabbb6d094c 19-Apr-2013 Patrick McHardy <kaber@trash.net> net: vlan: add protocol argument to packet tagging functions

Add a protocol argument to the VLAN packet tagging functions. In case of HW
tagging, we need that protocol available in the ndo_start_xmit functions,
so it is stored in a new field in the skb. The new field fits into a hole
(on 64 bit) and doesn't increase the sks's size.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
8e4e1713e4978447c5f799aa668dcc6d2cb0dee9 15-Apr-2013 Pravin B Shelar <pshelar@nicira.com> openvswitch: Simplify datapath locking.

Currently OVS uses combination of genl and rtnl lock to protect
datapath state. This was done due to networking stack locking.
But this has complicated locking and there are few lock ordering
issues with new tunneling protocols.
Following patch simplifies locking by introducing new ovs mutex
and now this lock is used to protect entire ovs state.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
ed661185859cecfcbe3a0e585563525498b3f405 29-Mar-2013 Thomas Graf <tgraf@suug.ch> openvswitch: Move common genl notify code into ovs_notify()

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
c3ff8cfe3e7748a93c4815b76e464d54c7efd241 29-Mar-2013 Thomas Graf <tgraf@suug.ch> openvswitch: Refine Netlink message size calculation and kill FLOW_BUFSIZE

Kills the FLOW_BUFSIZE constant which needs to be calculated manually
and replaces it with key_attr_size() based on nla_total_size().
Calculates the size of datapath messages instead of relying on
NLMSG_DEFAULT_SIZE and moves the existing message size calculations
into own functions for clarity.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
32686a9d2988516788cfcc402e1355c1eba1186a 29-Mar-2013 Thomas Graf <tgraf@suug.ch> openvswitch: Use nla_memcpy() to memcpy() data from attributes

Less error prone as it takes into account the length of both the
destination buffer and the source attribute and documents when
data is copied from an attribute.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
dded45fc179a07f4463ce37fc376977568655836 29-Mar-2013 Thomas Graf <tgraf@suug.ch> openvswitch: Specify the minimal length of OVS_PACKET_ATTR_PACKET in the policy

Specifying the minimal length in the policy makes it reuseable
and documents the interface.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
e5c5d22e8dcf7c2d430336cbf8e180bd38e8daf1 28-Mar-2013 Simon Horman <horms@verge.net.au> net: add ETH_P_802_3_MIN

Add a new constant ETH_P_802_3_MIN, the minimum ethernet type for
an 802.3 frame. Frames with a lower value in the ethernet type field
are Ethernet II.

Also update all the users of this value that David Miller and
I could find to use the new constant.

Also correct a bug in util.c. The comparison with ETH_P_802_3_MIN
should be >= not >.

As suggested by Jesse Gross.

Compile tested only.

Cc: David Miller <davem@davemloft.net>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: John W. Linville <linville@tuxdriver.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Bart De Schuymer <bart.de.schuymer@pandora.be>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: linux-bluetooth@vger.kernel.org
Cc: netfilter-devel@vger.kernel.org
Cc: bridge@lists.linux-foundation.org
Cc: linux-wireless@vger.kernel.org
Cc: linux1394-devel@lists.sourceforge.net
Cc: linux-media@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: dev@openvswitch.org
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
a9341512c372fcc628dabc619898d910a06c54bc 26-Mar-2013 Jesse Gross <jesse@nicira.com> openvswitch: Preallocate reply skb in ovs_vport_cmd_set().

Allocation of the Netlink notification skb can potentially fail
after changing vport configuration. In general, we try to avoid
this by undoing any change we made but that is difficult for existing
objects. This avoids the problem by preallocating the buffer (which
is fixed size).

Signed-off-by: Jesse Gross <jesse@nicira.com>
b67bfe0d42cac56c512dd5da4b1b347a23f4b70a 28-Feb-2013 Sasha Levin <sasha.levin@oracle.com> hlist: drop the node parameter from iterators

I'm not sure why, but the hlist for each entry iterators were conceived

list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
a15ff76c955d17cf58313097e4a24124da022b1d 15-Feb-2013 Rich Lane <rlane@bigswitch.com> openvswitch: Call genlmsg_end in queue_userspace_packet

Without genlmsg_end the upcall message ends (according to nlmsg_len)
after the struct ovs_header.

Signed-off-by: Rich Lane <rlane@bigswitch.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
cb7c5bdffb727a3d4dea5247d9d1d52238b01d90 08-Feb-2013 Rich Lane <rlane@bigswitch.com> openvswitch: Fix ovs_vport_cmd_new return value on success

If the pointer does not represent an error then the PTR_ERR
macro may still return a nonzero value.

Signed-off-by: Rich Lane <rlane@bigswitch.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
734907e82d21a75a514b80164185427a832a00c0 08-Feb-2013 Rich Lane <rlane@bigswitch.com> openvswitch: Fix ovs_vport_cmd_del return value on success

If the pointer does not represent an error then the PTR_ERR macro may still
return a nonzero value. The fix is the same as in ovs_vport_cmd_set.

Signed-off-by: Rich Lane <rlane@bigswitch.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
4490108b4a5ada14c7be712260829faecc814ae5 16-Feb-2013 Ben Pfaff <blp@nicira.com> openvswitch: Allow OVS_USERSPACE_ATTR_USERDATA to be variable length.

Until now, the optional OVS_USERSPACE_ATTR_USERDATA attribute had to be
exactly 64 bits long, if it was present. However, 64 bits is not enough
space to associate as much information with a flow as would be convenient
for some userspace features now under development. This commit generalizes
the attribute, allowing it to be any length.

This generalization is backward-compatible: if userspace only uses 64-bit
attributes, then it will not see any change in behavior.

CC: Romain Lenglet <rlenglet@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
12b0004d1d1e2a9aa667412d479041e403bcafae 05-Feb-2013 Cong Wang <amwang@redhat.com> net: adjust skb_gso_segment() for calling in rx path

skb_gso_segment() is almost always called in tx path,
except for openvswitch. It calls this function when
it receives the packet and tries to queue it to user-space.
In this special case, the ->ip_summed check inside
skb_gso_segment() is no longer true, as ->ip_summed value
has different meanings on rx path.

This patch adjusts skb_gso_segment() so that we can at least
avoid such warnings on checksum.

Cc: Jesse Gross <jesse@nicira.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3523b29bd2cd58432ea1bf3629695ec1ab607722 09-Jan-2013 YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org> openvswitch: Use FIELD_SIZEOF() in dp_init().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14408dba8440ef629a3a2827bc4c7b5045889295 09-Jan-2013 Jarno Rajahalme <jarno.rajahalme@nsn.com> openvswitch: Change ENOENT return value to ENODEV in lookup_vport().

This reduces the number of valid "no such device" error values that
need special attention by the caller.

Userspace code will need to keep on checking for both ENODEV and
ENOENT as long as older kernel modules are around.

Signed-off-by: Jarno Rajahalme <jarno.rajahalme@nsn.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
39c7caebc94e851f58b84b54659156dd30522e8e 26-Nov-2012 Ansis Atteka <aatteka@nicira.com> openvswitch: add skb mark matching and set action

This patch adds support for skb mark matching and set action.

Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
404f2f1019c0293bd91dc1c03c8557ec97d9d104 13-Nov-2012 Shan Wei <davidshan@tencent.com> net: openvswitch: use this_cpu_ptr per-cpu helper

just use more faster this_cpu_ptr instead of per_cpu_ptr(p, smp_processor_id());

Signed-off-by: Shan Wei <davidshan@tencent.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
3fdbd1ce11e5c0d7cafbe44c942c5cad61113d7b 14-Nov-2012 Ansis Atteka <aatteka@nicira.com> openvswitch: add ipv6 'set' action

This patch adds ipv6 set action functionality. It allows to change
traffic class, flow label, hop-limit, ipv6 source and destination
address fields.

Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
15e473046cb6e5d18a4d0057e61d76315230382b 07-Sep-2012 Eric W. Biederman <ebiederm@xmission.com> netlink: Rename pid to portid to avoid confusion

It is a frequent mistake to confuse the netlink port identifier with a
process identifier. Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.

I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.

I have successfully built an allyesconfig kernel with this change.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
15eac2a74277bc7de68a7c2a64a7c91b4b6f5961 23-Aug-2012 Pravin B Shelar <pshelar@nicira.com> openvswitch: Increase maximum number of datapath ports.

Use hash table to store ports of datapath. Allow 64K ports per switch.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
46df7b814548849deee01f50bc75f8f5ae8cd767 23-Feb-2012 Pravin B Shelar <pshelar@nicira.com> openvswitch: Add support for network namespaces.

Following patch adds support for network namespace to openvswitch.
Since it must release devices when namespaces are destroyed, a
side effect of this patch is that the module no longer keeps a
refcount but instead cleans up any state when it is unloaded.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
4185392da4b4b494e51934c51b999b4df424afba 07-Aug-2012 Jesse Gross <jesse@nicira.com> openvswitch: Relax set header validation.

When installing a flow with an action to set a particular field we
need to validate that the packets that are part of the flow actually
contain that header. With IP we use zeroed addresses and with TCP/UDP
the check is for zeroed ports. This check is overly broad and can catch
packets like DHCP requests that have a zero source address in a
legitimate header. This changes the check to look for a zeroed protocol
number for IP or for both ports be zero for TCP/UDP before considering
the header to not exist.

Reported-by: Ethan Jackson <ethan@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
a1b5d0dd28e9cb4fe42ad2df4ebbe5cce96866d7 20-Jul-2012 Ben Pfaff <blp@nicira.com> openvswitch: Check gso_type for correct sk_buff in queue_gso_packets().

At the point where it was used, skb_shinfo(skb)->gso_type referred to a
post-GSO sk_buff. Thus, it would always be 0. We want to know the pre-GSO
gso_type, so we need to obtain it before segmenting.

Before this change, the kernel would pass inconsistent data to userspace:
packets for UDP fragments with nonzero offset would be passed along with
flow keys that indicate a zero offset (that is, the flow key for "later"
fragments claimed to be "first" fragments). This inconsistency tended
to confuse Open vSwitch userspace, causing it to log messages about
"failed to flow_del" the flows with "later" fragments.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
92e5dfc34cf39c20ae1087bd5e676238b5d0dfac 20-Jul-2012 Pravin B Shelar <pshelar@nicira.com> openvswitch: Check currect return value from skb_gso_segment()

Fix return check typo.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
8aa51d64c1f526e43b1e7f89fb8b98c2fd583f4b 13-May-2012 Dan Carpenter <dan.carpenter@oracle.com> openvswitch: checking wrong variable in queue_userspace_packet()

"skb" is non-NULL here, for example we dereference it in skb_clone().
The intent was to test "nskb" which was just set.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
072ae6314a191e3a9fc309b1e4e539ac7abc48ad 08-May-2012 Pravin B Shelar <pshelar@nicira.com> openvswitch: Validation of IPv6 set port action uses IPv4 header

When the kernel validates set TCP/UDP port actions, it looks at
the ports in the existing flow to make sure that the L4 header exists.
However, these actions always use the IPv4 version of the struct.
Following patch fixes this by checking for flow ip protocol first.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
caf2ee14bbc2c6bd73cf0decf576007e0239a482 04-May-2012 Raju Subramanian <rsubramanian@nicira.com> openvswitch: Replace Nicira Networks.

Replaced all instances of Nicira Networks(, Inc) to Nicira, Inc.

Signed-off-by: Raju Subramanian <rsubramanian@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
4cb6e116bb97c8b87a1f4f95e99d0c8dda2a6e9b 04-May-2012 Ansis Atteka <aatteka@nicira.com> openvswitch: Release rtnl_lock if ovs_vport_cmd_build_info() failed.

This patch fixes a possible lock-up bug where rtnl_lock might not
get released.

Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
03fbf8b38792448370343f240131d9fde19d0387 09-Apr-2012 Ansis Atteka <aatteka@nicira.com> openvswitch: Do not send notification if ovs_vport_set_options() failed

There is no need to send a notification if ovs_vport_set_options() failed
and ovs_vport_cmd_set() did not change anything.

Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
028d6a6767456d6c84a72d3451f19fe7ca7b47db 30-Mar-2012 David S. Miller <davem@davemloft.net> openvswitch: Stop using NLA_PUT*().

These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.

Signed-off-by: David S. Miller <davem@davemloft.net>
9ffc93f203c18a70623f21950f1dd473c9ec48cd 28-Mar-2012 David Howells <dhowells@redhat.com> Remove all #inclusions of asm/system.h

Remove all #inclusions of asm/system.h preparatory to splitting and killing
it. Performed with the following command:

perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`

Signed-off-by: David Howells <dhowells@redhat.com>
651a68ea2ce9738b84e928836053b2e0fb5db2ba 07-Mar-2012 Ben Pfaff <blp@nicira.com> openvswitch: Honor dp_ifindex, when specified, for vport lookup by name.

When OVS_VPORT_ATTR_NAME is specified and dp_ifindex is nonzero, the
logical behavior would be for the vport name lookup scope to be limited
to the specified datapath, but in fact the dp_ifindex value was ignored.
This commit causes the search scope to be honored.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
77676fdbd54f0c6fdb71d55d9758bebc69a00fc4 17-Jan-2012 Ben Pfaff <blp@nicira.com> openvswitch: Fix multipart datapath dumps.

The logic to split up the list of datapaths into multiple Netlink messages
was simply wrong, causing the list to be terminated after the first part.
Only about the first 50 datapaths would be dumped. This fixes the
problem.

Reported-by: Paul Ingram <paul@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8d9d399f14ea65aeb50c7404e986bceede93bb99 14-Jan-2012 Devendra Naga <devendra.aaru@gmail.com> net: remove version.h includes in net/openvswitch/

remove version.h includes in net/openswitch/ as reported by make versioncheck.

Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ccb1352e76cff0524e7ccb2074826a092dd13016 26-Oct-2011 Jesse Gross <jesse@nicira.com> net: Add Open vSwitch kernel components.

Open vSwitch is a multilayer Ethernet switch targeted at virtualized
environments. In addition to supporting a variety of features
expected in a traditional hardware switch, it enables fine-grained
programmatic extension and flow-based control of the network.
This control is useful in a wide variety of applications but is
particularly important in multi-server virtualization deployments,
which are often characterized by highly dynamic endpoints and the need
to maintain logical abstractions for multiple tenants.

The Open vSwitch datapath provides an in-kernel fast path for packet
forwarding. It is complemented by a userspace daemon, ovs-vswitchd,
which is able to accept configuration from a variety of sources and
translate it into packet processing rules.

See http://openvswitch.org for more information and userspace
utilities.

Signed-off-by: Jesse Gross <jesse@nicira.com>