History log of /drivers/infiniband/core/cma.c
Revision Date Author Comments
30dc5e63d6a5ad24894b5512d10b228d73645a44 26-Mar-2014 Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> RDMA/core: Add support for iWARP Port Mapper user space service

This patch adds iWARP Port Mapper (IWPM) Version 2 support. The iWARP
Port Mapper implementation is based on the port mapper specification
section in the Sockets Direct Protocol paper -
http://www.rdmaconsortium.org/home/draft-pinkerton-iwarp-sdp-v1.0.pdf

Existing iWARP RDMA providers use the same IP address as the native
TCP/IP stack when creating RDMA connections. They need a mechanism to
claim the TCP ports used for RDMA connections to prevent TCP port
collisions when other host applications use TCP ports. The iWARP Port
Mapper provides a standard mechanism to accomplish this. Without this
service it is possible for an RDMA application to bind/listen on a
port that is already being used by a native TCP host application. If
that happens, the incoming TCP connection data can be passed to the
RDMA stack with error.

The iWARP Port Mapper solution doesn't contain any changes to the
existing network stack in the kernel space. All the changes are
contained within the infiniband tree and also in user space.

The iWARP Port Mapper service is implemented as a user space daemon
process. Source for the IWPM service is located at
http://git.openfabrics.org/git?p=~tnikolova/libiwpm-1.0.0/.git;a=summary

The iWARP driver (port mapper client) sends to the IWPM service the
local IP address and TCP port it has received from the RDMA
application, when starting a connection. The IWPM service performs a
socket bind from user space to get an available TCP port, called a
mapped port, and communicates it back to the client. In that sense,
the IWPM service is used to map the TCP port, which the RDMA
application uses to any port available from the host TCP port
space. The mapped ports are used in iWARP RDMA connections to avoid
collisions with native TCP stack which is aware that these ports are
taken. When an RDMA connection using a mapped port is terminated, the
client notifies the IWPM service, which then releases the TCP port.
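
The mapping step described above can be sketched in user space as
follows. This is a minimal illustration, not the IWPM daemon's code: the
PortMapper class and its method names are hypothetical. It assumes the
service holds each mapped port by keeping a bound socket open, and
releases the port by closing that socket.

```python
import socket


class PortMapper:
    """Toy stand-in for the user space port mapper (hypothetical API)."""

    def __init__(self):
        # (ip, app_port) -> socket that keeps the mapped port claimed
        self._held = {}

    def add_mapping(self, ip, app_port):
        """Claim a free TCP port for (ip, app_port) and return it."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind((ip, 0))  # port 0: let the TCP stack pick a free port
        self._held[(ip, app_port)] = sock
        return sock.getsockname()[1]

    def remove_mapping(self, ip, app_port):
        """Release the mapped port once the RDMA connection is gone."""
        self._held.pop((ip, app_port)).close()
```

Because the socket stays bound for the lifetime of the mapping, the
native TCP stack will not hand the same port to another application.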

The message exchange between the IWPM service and the iWARP drivers
(between user space and kernel space) is implemented using netlink
sockets.

1) Netlink interface functions are added: ibnl_unicast() and
ibnl_multicast() for sending netlink messages to user space

2) The signature of the existing ibnl_put_msg() is changed to be more
generic

3) Two netlink clients are added: RDMA_NL_NES, RDMA_NL_C4IW
corresponding to the two iWARP drivers - nes and cxgb4 which use
the IWPM service

4) Enums are added to enumerate the attributes in the netlink
messages, which are exchanged between the user space IWPM service
and the iWARP drivers

Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: PJ Waskiewicz <pj.waskiewicz@solidfire.com>

[ Fold in range checking fixes and nlh_next removal as suggested by Dan
Carpenter and Steve Wise. Fix sparse endianness in hash. - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
b2853fd6c2d0f383dbdf7427e263eb576a633867 27-Mar-2014 Moni Shoua <monis@mellanox.com> IB/core: Don't resolve passive side RoCE L2 address in CMA REQ handler

The code that resolves the passive side source MAC within the rdma_cm
connection request handler was both redundant and buggy, so remove it.

It was redundant since later, when an RC QP is modified to RTR state,
the resolution will take place in the ib_core module. It was buggy
because this callback also deals with UD SIDR exchange, for which we
incorrectly looked at the REQ member of the CM event and dereferenced
a random value.

Fixes: dd5f03beb4f7 ("IB/core: Ethernet L2 attributes in verbs/cm structures")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
5462eddd7a78131ccb514d52473625d99769215e 16-Nov-2013 Somnath Kotur <somnath.kotur@emulex.com> RDMA/cma: Handle global/non-linklocal IPv6 addresses in cma_check_linklocal()

If addr is not a linklocal address, the code incorrectly fails to
return and ends up assigning the scope ID to the scope id of the
address, which is wrong. Fix by checking if it's a link local address
first, and immediately return 0 if not.
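
The corrected control flow can be modelled roughly as below. This is a
sketch only: check_linklocal() and the dict standing in for the device
address are hypothetical, and the -EINVAL result for a link-local
address without a scope ID is an assumption about the surrounding code.

```python
import errno
import ipaddress


def check_linklocal(dev_addr, addr, scope_id):
    """Record the scope ID only for link-local IPv6 addresses."""
    ip = ipaddress.ip_address(addr)
    if ip.version != 6 or not ip.is_link_local:
        return 0  # the fix: return early, leave dev_addr untouched
    if scope_id == 0:
        return -errno.EINVAL  # assumed: link-local needs a scope ID
    dev_addr["bound_dev_if"] = scope_id
    return 0
```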

Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
7b85627b9f02f9b0fb2ef5f021807f4251135857 12-Dec-2013 Moni Shoua <monis@mellanox.com> IB/cma: IBoE (RoCE) IP-based GID addressing

Currently, the IB core and specifically the RDMA-CM assumes that IBoE
(RoCE) gids encode related Ethernet netdevice interface MAC address
and possibly VLAN id.

Change GIDs to be treated as if they encode the interface IP address.

Since Ethernet layer 2 address parameters are no longer encoded
within gids, we have to extend the Infiniband address structures (e.g.
ib_ah_attr) with layer 2 address parameters, namely mac and vlan.
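
As an illustration of an IP-based GID, the sketch below builds the 16
GID bytes from an interface address. It assumes IPv4 addresses are
carried in IPv4-mapped IPv6 form (::ffff:a.b.c.d) and IPv6 addresses are
used as-is; ip_to_gid() is a hypothetical helper, not a kernel function.

```python
import ipaddress


def ip_to_gid(ip):
    """Return the 16-byte GID that encodes the given IP address."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        # IPv4-mapped IPv6: 80 zero bits, 16 one bits, then the address
        return b"\x00" * 10 + b"\xff\xff" + addr.packed
    return addr.packed
```

Note that neither form carries the MAC or VLAN id, which is why those
have to travel in the address structures instead.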

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
63862b5bef7349dd1137e4c70702c67d77565785 11-Jan-2014 Aruna-Hewapathirane <aruna.hewapathirane@gmail.com> net: replace macros net_random and net_srandom with direct calls to prandom

This patch removes the net_random and net_srandom macros and replaces
them with direct calls to the prandom ones. As new commits only seem to
use prandom_u32 there is no use to keep them around.
This change makes it easier to grep for users of prandom_u32.

Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
dd5f03beb4f76ae65d76d8c22a8815e424fc607c 12-Dec-2013 Matan Barak <matanb@mellanox.com> IB/core: Ethernet L2 attributes in verbs/cm structures

This patch adds support for Ethernet L2 attributes in the
verbs/cm/cma structures.

When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority
in a manner similar to how the IB L2 (and the L4 PKEY) attributes are used.

Thus, those attributes were added to the following structures:

* ib_ah_attr - added dmac
* ib_qp_attr - added smac and vlan_id, (sl remains vlan priority)
* ib_wc - added smac, vlan_id
* ib_sa_path_rec - added smac, dmac, vlan_id
* cm_av - added smac and vlan_id

For the path record structure, extra care was taken to avoid the new
fields when packing it into wire format, so we don't break the IB CM
and SA wire protocol.

On the active side, the CM fills its internal structures from the
path provided by the ULP. We add there taking the ETH L2 attributes
and placing them into the CM Address Handle (struct cm_av).

On the passive side, the CM fills its internal structures from the WC
associated with the REQ message. We add there taking the ETH L2
attributes from the WC.

When the HW driver provides the required ETH L2 attributes in the WC,
they set the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core
code checks for the presence of these flags, and in their absence does
address resolution from the ib_init_ah_from_wc() helper function.

ib_modify_qp_is_ok is also updated to consider the link layer. Some
parameters are mandatory for Ethernet link layer, while they are
irrelevant for IB. Vendor drivers are modified to support the new
function signature.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
352b9056352fdfc2e70f81e3f73b9b5c85b2207b 10-Nov-2013 Michal Nazarewicz <mina86@mina86.com> RDMA/cma: Remove unused argument and minor dead code

The dev variable is never assigned after being initialised.

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
be9130cc927f13ad0159650ed03159ecc82c0262 24-Sep-2013 Doug Ledford <dledford@redhat.com> IB/cma: Check for GID on listening device first

As a simple optimization that should speed up the vast majority of
connect attempts on IB devices, when we are searching for the GID of an
incoming connection in the cached GID lists of devices, search the
device that received the incoming connection request first. If we
don't find it there, then move on to other devices.

This reduces the time to perform 10,000 connections considerably.
Prior to this patch, a bad run of cmtime would look like this:

connect : 12399.26 12351.10 8609.00 1239.93

With this patch, it looks more like this:

connect : 5864.86 5799.80 8876.00 586.49
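
The search order can be sketched as below. This is a simplified model
with hypothetical names: each device is reduced to a flat list of cached
GIDs, and ports are ignored.

```python
def find_gid(devices, gid, listen_dev=None):
    """Return (device, index) for gid, or None if no device has it.

    devices maps a device name to its list of cached GIDs.
    """
    order = list(devices)
    if listen_dev in devices:
        # Probe the device that received the request before the others.
        order.remove(listen_dev)
        order.insert(0, listen_dev)
    for name in order:
        if gid in devices[name]:
            return name, devices[name].index(gid)
    return None
```

When the GID lives on the listening device, the lookup ends on the first
device probed regardless of where that device sits in the list.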

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
29f27e8477aee4e619889add998a1976d67b60a5 24-Sep-2013 Doug Ledford <dledford@redhat.com> IB/cma: Use cached gids

The cma_acquire_dev function was changed by commit 3c86aa70bf67
("RDMA/cm: Add RDMA CM support for IBoE devices") to use find_gid_port()
because multiport devices might have either IB or IBoE formatted gids.
The old function assumed that all ports on the same device used the
same GID format.

However, when it was changed to use find_gid_port(), we inadvertently
lost usage of the GID cache. This turned out to be a very costly
change. In our testing, each iteration through each index of the GID
table takes roughly 35us. When you have multiple devices in a system,
and the GID you are looking for is on one of the later devices, the
code loops through all of the GID indexes on all of the early devices
before it finally succeeds on the target device. This pathological
search behavior combined with 35us per GID table index retrieval
produces results such as the following from the cmtime application
that's part of the latest librdmacm git repo:

ib1:
step total ms max ms min us us / conn
create id : 29.42 0.04 1.00 2.94
bind addr : 186705.66 19.00 18556.00 18670.57
resolve addr : 41.93 9.68 619.00 4.19
resolve route: 486.93 0.48 101.00 48.69
create qp : 4021.95 6.18 330.00 402.20
connect : 68350.39 68588.17 24632.00 6835.04
disconnect : 1460.43 252.65 -1862269.00 146.04
destroy : 41.16 0.04 2.00 4.12

ib0:
step total ms max ms min us us / conn
create id : 28.61 0.68 1.00 2.86
bind addr : 2178.86 2.95 201.00 217.89
resolve addr : 51.26 16.85 845.00 5.13
resolve route: 620.08 0.43 92.00 62.01
create qp : 3344.40 6.36 273.00 334.44
connect : 6435.99 6368.53 7844.00 643.60
disconnect : 5095.38 321.90 757.00 509.54
destroy : 37.13 0.02 2.00 3.71

Clearly, both the bind address and connect operations suffer
a huge penalty for being anything other than the default
GID on the first port in the system.

After applying this patch, the numbers now look like this:

ib1:
step total ms max ms min us us / conn
create id : 30.15 0.03 1.00 3.01
bind addr : 80.27 0.04 7.00 8.03
resolve addr : 43.02 13.53 589.00 4.30
resolve route: 482.90 0.45 100.00 48.29
create qp : 3986.55 5.80 330.00 398.66
connect : 7141.53 7051.29 5005.00 714.15
disconnect : 5038.85 193.63 918.00 503.88
destroy : 37.02 0.04 2.00 3.70

ib0:
step total ms max ms min us us / conn
create id : 34.27 0.05 1.00 3.43
bind addr : 26.45 0.04 1.00 2.64
resolve addr : 38.25 10.54 760.00 3.82
resolve route: 604.79 0.43 97.00 60.48
create qp : 3314.95 6.34 273.00 331.49
connect : 12399.26 12351.10 8609.00 1239.93
disconnect : 5096.76 270.72 1015.00 509.68
destroy : 37.10 0.03 2.00 3.71

It's worth noting that we still suffer a bit of a penalty on
connect to the wrong device, but the penalty is much less than
it used to be. Follow on patches deal with this penalty.

Many thanks to Neil Horman for helping to track down the slow
function, which allowed us to determine that the original patch
mentioned above backed out cache usage and to identify just how
much that impacted the system.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
eb072c4b8da0ba87bc870c7911aae180bae34d4a 06-Nov-2013 Eyal Perry <eyalpe@mellanox.com> RDMA/cma: Set IBoE SL (user-priority) by egress map when using vlans

On top of commit 366cddb40 "IB/rdma_cm: TOS <=> UP mapping for IBoE", add
support for the case where a vlan egress map is used.

When the IBoE session is being set over a vlan, inherit the socket priority
to vlan priority mapping which was configured for the vlan device egress map.

Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
0bbf87d852d243680ed7074110ccc1dea003b61a 28-Sep-2013 Eric W. Biederman <ebiederm@xmission.com> net ipv4: Convert ipv4.ip_local_port_range to be per netns v3

- Move sysctl_local_ports from a global variable into struct netns_ipv4.
- Modify inet_get_local_port_range to take a struct net, and update all
of the callers.
- Move the initialization of sysctl_local_ports into
sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c

v2:
- Ensure indentation used tabs
- Fixed ip.h so it applies cleanly to today's net-next

v3:
- Compile fixes of strange callers of inet_get_local_port_range.
This patch now successfully passes an allmodconfig build.
Removed manual inlining of inet_get_local_port_range in ipv4_local_port_range

Originally-by: Samya <samya@twitter.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
24d44a391f1b5d56e9c7a4fc1edd085687864ff9 04-Jul-2013 Steve Wise <swise@opengridcomputing.com> RDMA/cma: Add IPv6 support for iWARP

Modify the type of local_addr and remote_addr fields in struct
iw_cm_id from struct sockaddr_in to struct sockaddr_storage to hold
IPv6 and IPv4 addresses uniformly.

Change the references of local_addr and remote_addr in cxgb4, cxgb3,
nes and amso drivers to match this. However, to be able to actually run
traffic over IPv6, low-level drivers have to add code to support this.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>

[ Fix unused variable warnings when INFINIBAND_NES_DEBUG not set.
- Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
5eb695c1773b439fb668127d3738d348a46a2748 25-Jul-2013 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Only call cma_save_ib_info() for CM REQs

Calling cma_save_ib_info() for CM SIDR REQs results in a crash
accessing an invalid path record pointer.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
e511d1ae16745baca1e6d807c5b963716e8bdd01 25-Jul-2013 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Fix accessing invalid private data for UD

If an application is using AF_IB with a UD QP, but does not provide any
private data, we will end up accessing invalid memory. Check for this
case and handle it appropriately.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
8fb488d740582314534c278b5d1e3a1888b850b9 25-Jul-2013 Paul Bolle <pebolle@tiscali.nl> RDMA/cma: Fix gcc warning

Building cma.o triggers this gcc warning:

drivers/infiniband/core/cma.c: In function ‘rdma_resolve_addr’:
drivers/infiniband/core/cma.c:465:23: warning: ‘port’ may be used uninitialized in this function [-Wmaybe-uninitialized]
drivers/infiniband/core/cma.c:426:5: note: ‘port’ was declared here

This is a false positive, as "port" will always be initialized if we're
at "found". But if we assign to "id_priv->id.port_num" directly, we can
drop "port". That will, obviously, silence gcc.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
8c6ffba0eddc8c110dbf444f51354ce42069abfc 15-Jul-2013 Rusty Russell <rusty@rustcorp.com.au> PTR_RET is now PTR_ERR_OR_ZERO(): Replace most.

Sweep of the simple cases.

Cc: netdev@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
ce117ffac2e933344d0ea01b1cb3b56627fcb6e7 29-May-2013 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Export AF_IB statistics

Report AF_IB source and destination addresses through netlink
interface.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
5bc2b7b397b02026a0596a7807443a18422733fa 29-May-2013 Sean Hefty <sean.hefty@intel.com> RDMA/ucma: Allow user space to specify AF_IB when joining multicast

Allow user space applications to join multicast groups using MGIDs
directly. MGIDs may be passed using AF_IB addresses. Since the
current multicast join command only supports addresses as large as
sockaddr_in6, define a new structure for joining addresses specified
using sockaddr_ib.

Since AF_IB allows the user to specify the qkey when resolving a
remote UD QP address, when joining the multicast group use the qkey
value, if one has been assigned.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
cf53936f229d81131fef475919f163ce566a205f 29-May-2013 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Export cma_get_service_id()

Allow the rdma_ucm to query the IB service ID formed or allocated by
the rdma_cm by exporting the cma_get_service_id() functionality.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
94d0c939416480066d4e4d69e0d3c217bc083cea 29-May-2013 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Only listen on IB devices when using AF_IB

If an rdma_cm_id is bound to AF_IB, with a wild card address, only
listen on IB devices.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
5c438135adf90b33cb00e5351becf1e557bbdd9d 29-May-2013 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Set qkey for AF_IB

Allow the user to specify the qkey when using AF_IB. The qkey is
added to struct rdma_ucm_conn_param in place of a reserved field, but
for backwards compatibility, is only accessed if the associated
rdma_cm_id is using AF_IB.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
e8160e15930969de709ba9b46df9571448b78ce5 29-May-2013 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Expose private data when using AF_IB

If the source or destination address is AF_IB, then do not reserve a
portion of the private data in the IB CM REQ or SIDR REQ messages for
the cma header. Instead, all private data should be exported to the
user. When AF_IB is used, the rdma cm does not have sufficient
information to fill in the cma header. Additionally, this will be
necessary to support any IB connection through the rdma cm interface.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
fbaa1a6d852aa6878059acb18cbc336746795a56 29-May-2013 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Merge cma_get/save_net_info

With the removal of SDP related code, we can merge cma_get_net_info()
with cma_save_net_info(), since we're only ever dealing with a single
header format.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
01602f113f7ceee51e04edd1db972b79bedfe3aa 29-May-2013 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Remove unused SDP related code

The SDP protocol was never merged upstream. Remove unused SDP related
code from the RDMA CM.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
496ce3ce17f4b4f1b5f6edf9d2aedc4787a31c2f 29-May-2013 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Add support for AF_IB to cma_get_service_id()

cma_get_service_id() forms the service ID based on the port space and
port number of the rdma_cm_id. Extend the call to support AF_IB,
which contains the service ID directly. This will be needed to
support any arbitrary SID.
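
A rough model of the two cases follows. The function signature is
hypothetical, and both the RDMA_PS_TCP value and the
(port space << 16) + port layout are assumptions about how the IP-based
service ID is formed.

```python
RDMA_PS_TCP = 0x0106  # assumed value of the TCP port space constant


def get_service_id(family, ps=None, port=None, sib_sid=None):
    """Return the 64-bit service ID for an id's bound address."""
    if family == "AF_IB":
        # AF_IB addresses carry the service ID directly.
        return sib_sid
    # IP families: derive it from the port space and the port number.
    return (ps << 16) + port
```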

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
f68194ca88ecca70e7e9064949d9d1f6e4b3a647 29-May-2013 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Add support for AF_IB to rdma_resolve_route()

Allow rdma_resolve_route() to handle the case where the user specified
the source and destination addresses using AF_IB.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
f17df3b0dede861e3c3e20225731fcbe1b1041c3 29-May-2013 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Add support for AF_IB to rdma_resolve_addr()

Allow the user to specify the remote address using AF_IB format. When
AF_IB is used, the remote address simply needs to be recorded, and no
resolution using ARP is done. The local address may still need to be
matched with a local IB device.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
4ae7152e0bf98ac91a2835dce07f6fdb9f6407bd 29-May-2013 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Verify that source and dest sa_family are the same

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
b0569e40753aa4f742cfd792447a093ab7937563 29-May-2013 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Restrict AF_IB loopback to binding to IB devices only

If a user specifies AF_IB as the source address for a loopback
connection, limit the resolution to IB devices only.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
f4753834b5d06cae9a1d5453c96760571876a014 29-May-2013 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Add helper functions to return id address information

Provide inline helpers to extract source and destination address data
from the rdma_cm_id.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
6a3e362d3ce60d6a9f634572486c2c21a4ccfe69 29-May-2013 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Do not modify sa_family when setting loopback address

cma_resolve_loopback is called after an rdma_cm_id has been
bound to a specific sa_family and port. Once the
source sa_family for the id has been set, do not modify it.
Only the actual IP address portion of the source address
needs to be set.

As part of this fix, we can simplify setting the source address
by moving the loopback address assignment from cma_resolve_loopback
to cma_bind_loopback. cma_bind_loopback is only invoked when
the source address is the loopback address.

Finally, add loopback support for AF_IB as part of the change.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
680f920a2e24725e694d9958a08226384750217b 29-May-2013 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Allow user to specify AF_IB when binding

Modify rdma_bind_addr to allow the user to specify AF_IB when binding
to a device. AF_IB indicates that the user is not mapping an IP
address to the native IB addressing. (The mapping may have already
been done, or is not needed)

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
58afdcb7382234ebd780e43b17edde92a5853cca 29-May-2013 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Update port reservation to support AF_IB

The AF_IB uses a 64-bit service id (SID), which the user can control
through the use of a mask. The rdma_cm will assign values to the
unmasked portions of the SID based on the selected port space and port
number.

Because the IB spec divides the SID range into several regions, a
SID/mask combination may fall into one of the existing port space
ranges as defined by the RDMA CM IP Annex. Map the AF_IB SID to the
correct RDMA port space.
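
The masking rule can be written as a one-line bit operation. This is a
sketch under the assumption that a 1 bit in the mask means "the user
controls this bit"; fill_sid() is a hypothetical name.

```python
U64 = (1 << 64) - 1


def fill_sid(user_sid, mask, assigned):
    """Keep the user's bits where mask is 1; fill the rest from assigned."""
    return ((user_sid & mask) | (assigned & ~mask)) & U64
```

With a mask of all ones the user's SID is used unchanged; with a mask of
zero the rdma_cm chooses every bit.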

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
ef560861c01c301cde3da154eb9c1c2619924c3a 29-May-2013 Sean Hefty <sean.hefty@intel.com> IB/addr: Add AF_IB support to ip_addr_size

Add support for AF_IB to ip_addr_size, and rename the function to
account for the change. Give the compiler more control over whether
the call should be inline or not by moving the definition into the .c
file, removing the static inline, and exporting it.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2e2d190c5eb05d5a2615f4092e5fe821710404f9 29-May-2013 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Include AF_IB in loopback and any address checks

Enhance checks for loopback and any address to support AF_IB in
addition to AF_INET and AF_INET6. This will allow future patches to
use AF_IB when binding and resolving addresses.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
c8dea2f9f078395ebac7c92aeb919f02ff3fca88 29-May-2013 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Allow enabling reuseaddr in any state

The rdma_cm only allows setting reuseaddr if the corresponding
rdma_cm_id is in the idle state. Allow setting this value in other
states. This brings the behavior more in line with sockets.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
351638e7deeed2ec8ce451b53d33921b3da68f83 28-May-2013 Jiri Pirko <jiri@resnulli.us> net: pass info struct via netdevice notifier

So far, only net_device * could be passed along with netdevice notifier
event. This patch provides a possibility to pass custom structure
able to provide info that event listener needs to know.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>

v2->v3: fix typo on simeth
shortened dev_getter
shortened notifier_info struct name
v1->v2: fix notifier_call parameter in call_netdevice_notifier()
Signed-off-by: David S. Miller <davem@davemloft.net>
b67bfe0d42cac56c512dd5da4b1b347a23f4b70a 28-Feb-2013 Sasha Levin <sasha.levin@oracle.com> hlist: drop the node parameter from iterators

I'm not sure why, but the hlist for each entry iterators were conceived

list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small number of places were using the 'node' parameter; these
were modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foundation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3b069c5d857a5f1b8cb6bb74c70d9446089f5077 28-Feb-2013 Tejun Heo <tj@kernel.org> IB/core: convert to idr_alloc()

Convert to the much saner new idr interface.

v2: Mike triggered WARN_ON() in idr_preload() because send_mad(),
which may be used from non-process context, was calling
idr_preload() unconditionally. Preload iff @gfp_mask has
__GFP_WAIT.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reported-by: "Marciniszyn, Mike" <mike.marciniszyn@intel.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
63f05be2c075022d1b3227037f82ad75c4d5ab1d 28-Nov-2012 shefty <sean.hefty@intel.com> RDMA/cm: Change return value from find_gid_port()

Problem reported by Dan Carpenter <dan.carpenter@oracle.com>:

The patch 3c86aa70bf67: "RDMA/cm: Add RDMA CM support for IBoE
devices" from Oct 13, 2010, leads to the following warning:
net/sunrpc/xprtrdma/svc_rdma_transport.c:722 svc_rdma_create()
error: passing non neg 1 to ERR_PTR

This bug would result in a NULL dereference. svc_rdma_create() is
supposed to return ERR_PTRs or valid pointers, but instead it returns
ERR_PTRs, valid pointers and 1.

The call tree is:

svc_rdma_create()
=> rdma_bind_addr()
=> cma_acquire_dev()
=> find_gid_port()

rdma_bind_addr() should return a valid errno. Fix this by having
find_gid_port() also return a valid errno. If we can't find the
specified GID on a given port, return -EADDRNOTAVAIL, rather than
-EAGAIN, to better indicate the error. We also drop using the
special return value of '1' and instead pass through the error
returned by the underlying verbs call. On such errors, rather
than aborting the search, we simply continue to check the next
device/port.
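
The new return convention can be modelled as below. This is a sketch,
not the kernel code: a port is reduced to either a list of GIDs or a
negative errno standing in for a failed verbs query, and the -ENODEV
result when nothing matches is an assumption.

```python
import errno


def find_gid_port(gid_table, gid):
    """Return 0 if gid is on the port, else a negative errno."""
    if isinstance(gid_table, int):
        return gid_table  # pass the underlying verbs error through
    if gid in gid_table:
        return 0
    return -errno.EADDRNOTAVAIL  # GID not present on this port


def acquire_dev(ports, gid):
    """Return the first (device, port) key holding gid, or an errno."""
    for key, gid_table in ports.items():
        if find_gid_port(gid_table, gid) == 0:
            return key
        # Any error just moves the search on to the next device/port.
    return -errno.ENODEV
```

Every failure path now yields a negative errno, so a caller that wraps
the result in ERR_PTR() never sees the old positive '1'.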

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
809d5fc9bf6589276a12bd4fd611e4c7ff9940c3 04-Oct-2012 Gao feng <gaofeng@cn.fujitsu.com> infiniband: pass rdma_cm module to netlink_dump_start

Set netlink_dump_control.module to avoid panic.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
4ede178a5eb55fe70070fcd0960b58ba6a5643a8 04-Oct-2012 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Check that retry count values are in range

The retry_count and rnr_retry_count connection parameters are both
3-bit values. Check that the values are in range and reduce if
they're not.

This fixes a problem reported by Doug Ledford <dledford@redhat.com>
that resulted in the userspace rping test (part of the librdmacm
samples) failing to run over Intel IB HCAs.
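
The range check described above can be sketched as follows. This is an
illustration only: the function and macro names are hypothetical, and the
actual patch uses min_t() inside the kernel as noted below.

```c
#include <assert.h>
#include <stdint.h>

/* A 3-bit field can hold the values 0..7. */
#define RETRY_COUNT_MAX 7u

/* Reduce an out-of-range request to the largest value the field can carry. */
uint8_t clamp_retry_count(uint8_t requested)
{
	return requested < RETRY_COUNT_MAX ? requested
					   : (uint8_t)RETRY_COUNT_MAX;
}
```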

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

[ Use min_t() to avoid warnings about type mismatch. - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2a22fb8c69275903b8be4c6203aa08bfac374844 30-Aug-2012 Dotan Barak <dotanb@dev.mellanox.co.il> RDMA/cma: Use consistent component mask for IPoIB port space multicast joins

CMA multicast joins for the IPoIB port space need to use the same
component mask used by the ipoib driver. Otherwise, it's possible for
the CMA to create a group to which a join made by ipoib will fail, or
vice versa. Some of the component mask fields set by ipoib weren't
set by the CMA; fix that.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
4e289045286c5f20d7d8f86b18679bad9e9a2003 27-Jul-2012 Fengguang Wu <fengguang.wu@intel.com> RDMA/cma: Use PTR_RET rather than if (IS_ERR(...)) + PTR_ERR

Suggested by scripts/coccinelle/api/ptr_ret.cocci.

Signed-off-by: Roland Dreier <roland@purestorage.com>
d90f9b3591b3b5fa86178e318008fc1c531a84dc 06-Jul-2012 Roland Dreier <roland@purestorage.com> IB: Use IS_ENABLED(CONFIG_IPV6)

Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
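
For readers unfamiliar with the idiom, here is one way a macro of this kind
can be built in plain C. It is a sketch under assumptions: all macro names
below are hypothetical, and it is not claimed to match the kernel's
definition at the time of this commit. The option is treated as enabled
when either CONFIG_FOO or CONFIG_FOO_MODULE is defined to 1.

```c
#include <assert.h>

/* Pretend the option was configured as a module (=m). */
#define CONFIG_IPV6_MODULE 1

/* If the argument expanded to 1, the placeholder turns into "0," and
 * shifts a 1 into the second argument slot; otherwise the 0 stays there. */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_DEFINED_1(x) IS_DEFINED_2(x)

#define IS_BUILTIN_SKETCH(option) IS_DEFINED_1(option)
#define IS_MODULE_SKETCH(option)  IS_DEFINED_1(option##_MODULE)
#define IS_ENABLED_SKETCH(option) \
	(IS_BUILTIN_SKETCH(option) || IS_MODULE_SKETCH(option))

/* Usable in ordinary C expressions, not only in #if directives. */
int ipv6_support_compiled_in(void)
{
	return IS_ENABLED_SKETCH(CONFIG_IPV6);
}
```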

Signed-off-by: Roland Dreier <roland@purestorage.com>
68602120e496a31d8e3b36d0bfc7d9d2456fb05c 14-Jun-2012 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Allow user to restrict listens to bound address family

Provide an option for the user to specify that listens should only
accept connections where the incoming address family matches that of
the locally bound address. This is used to support the equivalent of
IPV6_V6ONLY socket option, which allows an app to only accept
connection requests directed to IPv6 addresses.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
406b6a25f85271397739a7e9f5af1df665b8a0d0 14-Jun-2012 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Listen on specific address family

The rdma_cm maps IPv4 and IPv6 addresses to the same service ID. This
prevents apps from listening only for IPv4 or IPv6 addresses. It also
results in an app binding to an IPv4 address receiving connection
requests for an IPv6 address.

Change this to match socket behavior: restrict listens on IPv4
addresses to only IPv4 addresses, and if a listen is on an IPv6
address, allow it to receive either IPv4 or IPv6 addresses, based on
its address family binding.
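
The matching rule above can be sketched as a small predicate. This is an
illustration, not the code in cma.c: the function name is hypothetical, and
the "address family binding" of an IPv6 listener is modeled here as a
single afonly flag (an assumption).

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

/* Would a listener bound to bound_family accept a request that arrived
 * for an address of request_family? */
bool listener_accepts(int bound_family, bool afonly, int request_family)
{
	if (bound_family == AF_INET)
		return request_family == AF_INET;

	if (bound_family == AF_INET6)
		return request_family == AF_INET6 ||
		       (!afonly && request_family == AF_INET);

	return false;
}
```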

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
5b0ec991c0576c54db75803fbcb0ef5bebfa0828 14-Jun-2012 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Bind to a specific address family

The RDMA CM uses a single port space for all associated (tcp, udp,
etc.) port bindings, regardless of the address family that the user
binds to. The result is that if a user binds to AF_INET, but does not
specify an IP address, the bind will occur for AF_INET6. This causes
an attempt to bind to the same port using AF_INET6 to fail, and
connection requests to AF_INET6 will match with the AF_INET listener.
Align the behavior with sockets and restrict the bind to AF_INET only.

If a user binds to AF_INET6, we bind the port to AF_INET6 and
AF_INET depending on the value of bindv6only.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
4dd81e895655c59bd19d7a8f03a5de1310f4aeb6 14-Jun-2012 Sean Hefty <sean.hefty@intel.com> RDMA/cma: QP type check on received REQs should be AND not OR

Change || check to the intended && when checking the QP type in a
received connection request against the listening endpoint.
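
A simplified sketch of the difference between the two operators follows.
The enums and function names are hypothetical and the real check covers
more cases; the point is only that with || a connection request passes
even when the QP types differ.

```c
#include <assert.h>
#include <stdbool.h>

enum cm_event_sketch { REQ_RECEIVED, SIDR_REQ_RECEIVED };	/* hypothetical */
enum qp_type_sketch { QPT_RC_SKETCH, QPT_UD_SKETCH };		/* hypothetical */

/* Unintended form: a matching event OR a matching QP type is enough. */
bool req_matches_buggy(enum cm_event_sketch ev,
		       enum qp_type_sketch req_qp,
		       enum qp_type_sketch listen_qp)
{
	return (ev == REQ_RECEIVED) || (req_qp == listen_qp);
}

/* Intended form: the event AND the QP type must both match. */
bool req_matches_fixed(enum cm_event_sketch ev,
		       enum qp_type_sketch req_qp,
		       enum qp_type_sketch listen_qp)
{
	return (ev == REQ_RECEIVED) && (req_qp == listen_qp);
}
```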

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
b6cec8aa4a799d1e146095f0ba52454710f5ede4 25-Apr-2012 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Fix lockdep false positive recursive locking

The following lockdep problem was reported by Or Gerlitz <ogerlitz@mellanox.com>:

[ INFO: possible recursive locking detected ]
3.3.0-32035-g1b2649e-dirty #4 Not tainted
---------------------------------------------
kworker/5:1/418 is trying to acquire lock:
(&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa0138a41>] rdma_destroy_id+0x33/0x1f0 [rdma_cm]

but task is already holding lock:
(&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa0135130>] cma_disable_callback+0x24/0x45 [rdma_cm]

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&id_priv->handler_mutex);
lock(&id_priv->handler_mutex);

*** DEADLOCK ***

May be due to missing lock nesting notation

3 locks held by kworker/5:1/418:
#0: (ib_cm){.+.+.+}, at: [<ffffffff81042ac1>] process_one_work+0x210/0x4a6
#1: ((&(&work->work)->work)){+.+.+.}, at: [<ffffffff81042ac1>] process_one_work+0x210/0x4a6
#2: (&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa0135130>] cma_disable_callback+0x24/0x45 [rdma_cm]

stack backtrace:
Pid: 418, comm: kworker/5:1 Not tainted 3.3.0-32035-g1b2649e-dirty #4
Call Trace:
[<ffffffff8102b0fb>] ? console_unlock+0x1f4/0x204
[<ffffffff81068771>] __lock_acquire+0x16b5/0x174e
[<ffffffff8106461f>] ? save_trace+0x3f/0xb3
[<ffffffff810688fa>] lock_acquire+0xf0/0x116
[<ffffffffa0138a41>] ? rdma_destroy_id+0x33/0x1f0 [rdma_cm]
[<ffffffff81364351>] mutex_lock_nested+0x64/0x2ce
[<ffffffffa0138a41>] ? rdma_destroy_id+0x33/0x1f0 [rdma_cm]
[<ffffffff81065a78>] ? trace_hardirqs_on_caller+0x11e/0x155
[<ffffffff81065abc>] ? trace_hardirqs_on+0xd/0xf
[<ffffffffa0138a41>] rdma_destroy_id+0x33/0x1f0 [rdma_cm]
[<ffffffffa0139c02>] cma_req_handler+0x418/0x644 [rdma_cm]
[<ffffffffa012ee88>] cm_process_work+0x32/0x119 [ib_cm]
[<ffffffffa0130299>] cm_req_handler+0x928/0x982 [ib_cm]
[<ffffffffa01302f3>] ? cm_req_handler+0x982/0x982 [ib_cm]
[<ffffffffa0130326>] cm_work_handler+0x33/0xfe5 [ib_cm]
[<ffffffff81065a78>] ? trace_hardirqs_on_caller+0x11e/0x155
[<ffffffffa01302f3>] ? cm_req_handler+0x982/0x982 [ib_cm]
[<ffffffff81042b6e>] process_one_work+0x2bd/0x4a6
[<ffffffff81042ac1>] ? process_one_work+0x210/0x4a6
[<ffffffff813669f3>] ? _raw_spin_unlock_irq+0x2b/0x40
[<ffffffff8104316e>] worker_thread+0x1d6/0x350
[<ffffffff81042f98>] ? rescuer_thread+0x241/0x241
[<ffffffff81046a32>] kthread+0x84/0x8c
[<ffffffff8136e854>] kernel_thread_helper+0x4/0x10
[<ffffffff81366d59>] ? retint_restore_args+0xe/0xe
[<ffffffff810469ae>] ? __init_kthread_worker+0x56/0x56
[<ffffffff8136e850>] ? gs_change+0xb/0xb

The actual locking is fine, since we're dealing with different locks,
but from the same lock class. cma_disable_callback() acquires the
listening id mutex, whereas rdma_destroy_id() acquires the mutex for
the new connection id. To fix this, delay the call to
rdma_destroy_id() until we've released the listening id mutex.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
366cddb4028858079a9a8145bd713c13a9edb3dc 04-Apr-2012 Amir Vadai <amirv@mellanox.com> IB/rdma_cm: TOS <=> UP mapping for IBoE

Both tagged traffic and untagged traffic use tc tool mapping.
Treat RDMA TOS same as IP TOS when mapping to SL

Signed-off-by: Amir Vadai <amirv@mellanox.com>
CC: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
46ea5061c750fc6b088b9566234287115fb70f98 06-Dec-2011 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Fix endianness bugs

Fix endianness bugs reported by sparse in the RDMA core stack. Note
that these are real bugs, but don't affect any existing code to the
best of my knowledge. The mlid issue would only affect kernel users
of rdma_join_multicast which have the rdma_cm attach/detach its QP.
There are no current in-tree users that do this. (rdma_join_multicast
may be called by user space applications, which do not have
this issue.) And the pkey setting is simply returned as
informational.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
04ded1672402577cd3f390c764f3046cc704a42a 06-Dec-2011 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Verify private data length

private_data_len is defined as a u8. If the user specifies a large
private_data size (> 220 bytes), we will calculate a total length that
exceeds 255, resulting in private_data_len wrapping back to 0. This
can lead to overwriting random kernel memory. Avoid this by verifying
that the resulting size fits into a u8.
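
The wrap-around can be reproduced in a few lines of userspace C. This is a
sketch under assumptions: the 36-byte header size and both function names
are hypothetical and chosen only to make the arithmetic concrete.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define HDR_SIZE_SKETCH 36u	/* hypothetical header size */

/* Unchecked: the sum is silently truncated to 8 bits. */
uint8_t unchecked_total_len(size_t user_len)
{
	return (uint8_t)(HDR_SIZE_SKETCH + user_len);
}

/* Checked: return the total length, or -EINVAL if it does not fit a u8. */
int checked_total_len(size_t user_len)
{
	size_t sum = HDR_SIZE_SKETCH + user_len;

	if (sum < user_len || sum > UINT8_MAX)
		return -EINVAL;
	return (int)sum;
}
```

With these assumed sizes, a 220-byte request makes the unchecked length
wrap to 0, while the checked variant rejects it.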

Reported-by: B. Thery <benjamin.thery@bull.net>
Addresses: <http://bugs.openfabrics.org/bugzilla/show_bug.cgi?id=2335>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
4e3fd7a06dc20b2d8ec6892233ad2012968fe7b6 21-Nov-2011 Alexey Dobriyan <adobriyan@gmail.com> net: remove ipv6_addr_copy()

C assignment can handle struct in6_addr copying.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e4dd23d753c3cb0d8533d353069e8b2e8a666360 27-May-2011 Paul Gortmaker <paul.gortmaker@windriver.com> infiniband: Fix up module files that need to include module.h

They had been getting it implicitly via device.h but we can't
rely on that for the future, due to a pending cleanup so fix
it now.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
18c441a6c3741991bfb87a3c6c541d30f0eb9c7c 29-May-2011 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Support XRC QPs

Allow users to connect XRC QPs through the rdma_cm.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2d2e94152928209de13dea0535242c0e457bdcbb 29-May-2011 Sean Hefty <sean.hefty@intel.com> RDMA/cm: Define new RDMA port space specific to IB

Add RDMA_PS_IB. XRC QP types will use the IB port space when operating
over the RDMA CM. For the 'IP protocol' field value, we select 0x3F,
which is listed as being for 'any local network'.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
3ebeebc38b4b13384aba97f2e4acd6b48d47a65c 25-Sep-2011 Kumar Sanghvi <kumaras@chelsio.com> RDMA/iwcm: Propagate ird/ord values upwards

Update struct iw_cm_event to support propagating the ird/ord values
upwards to the application.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
f45ee80eb0dda1fbf32bf63189627a9e1e157a95 06-Oct-2011 Hefty, Sean <sean.hefty@intel.com> RDMA/cma: Check for NULL conn_param in rdma_accept

Check that conn_param is not null before dereferencing it when
processing rdma_accept(). This is necessary to prevent a possible
system crash, which can be caused by user space.

Problem found by code inspection.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
9595480c5dd1f01e477e8c993d6b24fa484eca3f 06-Oct-2011 Hefty, Sean <sean.hefty@intel.com> RDMA/cma: Fix crash in cma_req_handler

The RDMA CM uses the local qp_type to determine how to process an
incoming request. This can result in an incoming REQ being treated as
a SIDR REQ and vice versa. Fix this by switching off the event type
instead, and for good measure verify that the listener supports the
incoming connection request.

This problem showed up when a user space application mismatched the QP
types between a client and server app.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2efdd6a038b2d72e74deb8d4725db1663135268c 12-Jul-2011 Moni Shoua <monis@mellanox.co.il> RDMA/cma: Don't allow IPoIB port space for IBoE

This patch fixes a kernel crash in cma_set_qkey().

When the link layer is Ethernet, it is wrong to use IPoIB port space
since no IPoIB interface is available. Specifically, setting the
Q_Key when port space is RDMA_PS_IPOIB requires MGID calculation and
an SA query, which doesn't make sense over Ethernet.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
0c9361fcdeccd316ed67d85aa819c08066964c5b 17-Jul-2011 Jack Morgenstein <jackm@dev.mellanox.co.il> RDMA/cma: Avoid assigning an IS_ERR value to cm_id pointer in CMA id object

Avoid assigning an IS_ERR value to the cm_id pointer. This fixes a
few anomalies in the error flow due to confusion about checking for
NULL vs IS_ERR, and eliminates the need to test for the IS_ERR value
every time we wish to determine if the cma_id object has a cm device
associated with it.

Also, eliminate the now-unnecessary procedure cma_has_cm_dev (we can
check directly for the existence of the device pointer -- for a
non-NULL check, makes no difference if it is the iwarp or the ib
pointer).

Finally, make a few code changes here to improve coding consistency.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
83e9502d8db142822f3302e6a46a45082d3a27b7 13-Jan-2011 Nir Muchtar <nirm@voltaire.com> RDMA/cma: Save PID of ID's owner

Save the PID associated with an RDMA CM ID for reporting via netlink.

Signed-off-by: Nir Muchtar <nirm@voltaire.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
753f618ae03a699cc30044d64eea84165016dc6e 03-Jan-2011 Nir Muchtar <nirm@voltaire.com> RDMA/cma: Add support for netlink statistics export

Add callbacks and data types for statistics export of all current
devices/ids. The schema for RDMA CM is a series of netlink messages.
Each one contains an rdma_cm_id_stats struct. Additionally, two netlink
attributes are created for the addresses for each message (if
applicable).

The attribute types used are:
RDMA_NL_RDMA_CM_ATTR_SRC_ADDR (The source address for this ID)
RDMA_NL_RDMA_CM_ATTR_DST_ADDR (The destination address for this ID)
sockaddr_* structs are encapsulated within these attributes.

In other words, every transaction contains a series of messages like:

-------message 1-------
struct rdma_cm_id_stats {
	__u32 qp_num;
	__u32 bound_dev_if;
	__u32 port_space;
	__s32 pid;
	__u8 cm_state;
	__u8 node_type;
	__u8 port_num;
	__u8 reserved;
}
RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute - contains the source address
RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute - contains the destination address
-------end 1-------
-------message 2-------
struct rdma_cm_id_stats
RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute
RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute
-------end 2-------

Signed-off-by: Nir Muchtar <nirm@voltaire.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
b26f9b9949013fec31b23c426fc463164ae08891 01-Apr-2010 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Pass QP type into rdma_create_id()

The RDMA CM currently infers the QP type from the port space selected
by the user. In the future (eg with RDMA_PS_IB or XRC), there may not
be a 1-1 correspondence between port space and QP type. For netlink
export of RDMA CM state, we want to export the QP type to userspace,
so it is cleaner to explicitly associate a QP type to an ID.

Modify rdma_create_id() to allow the user to specify the QP type, and
use it to make our selections of datagram versus connected mode.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
550e5ca77e96989c5e19f60e017205b2bcc615a5 20-May-2011 Nir Muchtar <nirm@voltaire.com> RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h>

Move cma.c's internal definition of enum cma_state to enum rdma_cm_state
in an exported header so that it can be exported via RDMA netlink.

Signed-off-by: Nir Muchtar <nirm@voltaire.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
a9bb79128aa659f97b774b97c9bb1bdc74444595 10-May-2011 Hefty, Sean <sean.hefty@intel.com> RDMA/cma: Add an ID_REUSEADDR option

Lustre requires that clients bind to a privileged port number before
connecting to a remote server. On larger clusters (typically more
than about 1000 nodes), the number of privileged ports is exhausted,
resulting in lustre being unusable.

To handle this, we add support for reusable addresses to the rdma_cm.
This mimics the behavior of the socket option SO_REUSEADDR. A user
may set an rdma_cm_id to reuse an address before calling
rdma_bind_addr() (explicitly or implicitly). If set, other
rdma_cm_id's may be bound to the same address, provided that they all
have reuse enabled, and there are no active listens.

If rdma_listen() is called on an rdma_cm_id that has reuse enabled, it
will only succeed if there are no other id's bound to that same
address. The reuse option is exported to user space. The behavior of
the kernel reuse implementation was verified against that given by
sockets.
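
The bind and listen rules above can be modeled roughly as follows. This is
a sketch, not the kernel implementation: struct bound_id_sketch and both
function names are hypothetical, and each entry stands for one rdma_cm_id
already bound to the address in question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct bound_id_sketch {	/* hypothetical model of an existing binding */
	bool reuseaddr;
	bool listening;
};

/* A new id may share the address only if it and every existing id have
 * reuse enabled and none of the existing ids is listening. */
bool can_bind(const struct bound_id_sketch *others, size_t n, bool new_reuse)
{
	for (size_t i = 0; i < n; i++) {
		if (!new_reuse || !others[i].reuseaddr || others[i].listening)
			return false;
	}
	return true;
}

/* A listen succeeds only if no other id is bound to the same address. */
bool can_listen(size_t n_others)
{
	return n_others == 0;
}
```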

This patch is derived from a patch by Ira Weiny <weiny2@llnl.gov>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
43b752daae9445a3b2b075a236840d801fce1593 10-May-2011 Hefty, Sean <sean.hefty@intel.com> RDMA/cma: Fix handling of IPv6 addressing in cma_use_port

cma_use_port() assumes that the sockaddr is an IPv4 address. Since
IPv6 addressing is supported (and also to support other address
families) make the code more generic in its address handling.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
a396d43a35fb91f2d4920a4700d25ecc5ec92404 23-Feb-2011 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Replace global lock in rdma_destroy_id() with id-specific one

rdma_destroy_id currently uses the global rdma cm 'lock' to test if an
rdma_cm_id has been bound to a device. This prevents an active
address resolution callback handler from assigning a device to the
rdma_cm_id after rdma_destroy_id checks for one.

Instead, we can replace the use of the global lock around the check to
the rdma_cm_id device pointer by setting the id state to destroying,
then flushing all active callbacks. The latter is accomplished by
acquiring and releasing the handler_mutex. Any active handler will
complete first, and any newly scheduled handlers will find the
rdma_cm_id in an invalid state.

In addition to optimizing the current locking scheme, the use of the
rdma_cm_id mutex is a more intuitive synchronization mechanism than
that of the global lock. These changes are based on feedback from
Doug Ledford <dledford@redhat.com> while he was trying to debug a
crash in the rdma cm destroy path.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
25ae21a10112875763c18b385624df713a288a05 23-Feb-2011 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Fix crash in request handlers

Doug Ledford and Red Hat reported a crash when running the rdma_cm on
a real-time OS. The crash has the following call trace:

cm_process_work
cma_req_handler
cma_disable_callback
rdma_create_id
kzalloc
init_completion
cma_get_net_info
cma_save_net_info
cma_any_addr
cma_zero_addr
rdma_translate_ip
rdma_copy_addr
cma_acquire_dev
rdma_addr_get_sgid
ib_find_cached_gid
cma_attach_to_dev
ucma_event_handler
kzalloc
ib_copy_ah_attr_to_user
cma_comp

[ preempted ]

cma_write
copy_from_user
ucma_destroy_id
copy_from_user
_ucma_find_context
ucma_put_ctx
ucma_free_ctx
rdma_destroy_id
cma_exch
cma_cancel_operation
rdma_node_get_transport

rt_mutex_slowunlock
bad_area_nosemaphore
oops_enter

They were able to reproduce the crash multiple times with the
following details:

Crash seems to always happen on the:
mutex_unlock(&conn_id->handler_mutex);
as conn_id looks to have been freed during this code path.

An examination of the code shows that a race exists in the request
handlers. When a new connection request is received, the rdma_cm
allocates a new connection identifier. This identifier has a single
reference count on it. If a user calls rdma_destroy_id() from another
thread after receiving a callback, rdma_destroy_id will proceed to
destroy the id and free the associated memory. However, the request
handlers may still be in the process of running. When control returns
to the request handlers, they can attempt to access the newly created
identifiers.

Fix this by holding a reference on the newly created rdma_cm_id until
the request handler is through accessing it.
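
The effect of the extra reference can be shown with a single-threaded
simulation. This is only a sketch: the struct and function names are
hypothetical, the counter is a plain int rather than an atomic, and
"freeing" is modeled by a flag so the outcome can be inspected safely.

```c
#include <assert.h>
#include <stdbool.h>

struct conn_id_sketch {		/* hypothetical stand-in for the new id */
	int refcount;
	bool freed;
};

void id_get(struct conn_id_sketch *id)
{
	id->refcount++;
}

void id_put(struct conn_id_sketch *id)
{
	if (--id->refcount == 0)
		id->freed = true;	/* real code would free the memory here */
}

/* Returns true if the id is still alive when the handler touches it after
 * the user's destroy has already run. */
bool handler_survives_destroy(bool handler_holds_ref)
{
	struct conn_id_sketch id = { .refcount = 1, .freed = false };
	bool alive;

	if (handler_holds_ref)
		id_get(&id);	/* the fix: extra reference for the handler */

	id_put(&id);		/* user thread: destroy drops the initial ref */
	alive = !id.freed;	/* handler resumes and accesses the id */

	if (handler_holds_ref)
		id_put(&id);	/* handler is done; last reference frees it */

	return alive;
}
```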

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
af7bd463761c6abd8ca8d831f9cc0ac19f3b7d4b 26-Aug-2010 Eli Cohen <eli@dev.mellanox.co.il> IB/core: Add VLAN support for IBoE

Add 802.1q VLAN support to IBoE. The VLAN tag is encoded within the
GID derived from a link local address in the following way:

GID[11] GID[12] contain the VLAN ID when the GID contains a VLAN.

The 3 bits user priority field of the packets are identical to the 3
bits of the SL.

In case of rdma_cm apps, the TOS field is used to generate the SL
field by doing a shift right of 5 bits, effectively taking the 3 MS bits
of the TOS field.
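
As a worked example of the TOS to SL mapping, the helper below (a
hypothetical name, used here only for illustration) keeps the top three
bits of the 8-bit TOS value, so a TOS of 0xE0 yields SL 7 and any TOS
below 0x20 yields SL 0.

```c
#include <assert.h>
#include <stdint.h>

/* Keep the 3 most significant bits of the 8-bit TOS value. */
uint8_t tos_to_sl(uint8_t tos)
{
	return tos >> 5;
}
```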

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
3c86aa70bf677a31b71c8292e349242e26cbc743 13-Oct-2010 Eli Cohen <eli@dev.mellanox.co.il> RDMA/cm: Add RDMA CM support for IBoE devices

Add support for IBoE device binding and IP --> GID resolution. Path
resolving and multicast joining are implemented within cma.c by
filling in the responses and running callbacks in the CMA work queue.

IP --> GID resolution always yields IPv6 link local addresses; remote
GIDs are derived from the destination MAC address of the remote port.
Multicast GIDs are always mapped to multicast MACs as is done in IPv6.
(IPv4 multicast is enabled by translating IPv4 multicast addresses to
IPv6 multicast as described in
<http://www.mail-archive.com/ipng@sunroof.eng.sun.com/msg02134.html>.)

Some helper functions are added to ib_addr.h.
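
The two address derivations described above can be sketched in userspace C.
These are illustrations with hypothetical function names, not the ib_addr.h
helpers: the first builds a link-local GID from a MAC using the standard
modified EUI-64 layout (an assumption about the exact byte layout used
here), and the second maps a multicast GID to an Ethernet multicast MAC the
way IPv6 does (33:33 followed by the low 32 bits of the address).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* fe80::/64 prefix plus a modified EUI-64 interface identifier. */
void mac_to_link_local_gid(const uint8_t mac[6], uint8_t gid[16])
{
	memset(gid, 0, 16);
	gid[0] = 0xfe;
	gid[1] = 0x80;
	gid[8] = mac[0] ^ 0x02;		/* flip the universal/local bit */
	gid[9] = mac[1];
	gid[10] = mac[2];
	gid[11] = 0xff;
	gid[12] = 0xfe;
	gid[13] = mac[3];
	gid[14] = mac[4];
	gid[15] = mac[5];
}

/* IPv6-style multicast mapping: 33:33 plus the last four GID bytes. */
void mgid_to_mcast_mac(const uint8_t mgid[16], uint8_t mac[6])
{
	mac[0] = 0x33;
	mac[1] = 0x33;
	memcpy(&mac[2], &mgid[12], 4);
}
```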

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
9893e742a0d942dda2277e9f3e19b726900adf27 15-May-2010 Julia Lawall <julia@diku.dk> IB/core: Use kmemdup() instead of kmalloc()+memcpy()

Use kmemdup when some other buffer is immediately copied into the
allocated region.

A simplified version of the semantic patch that makes this change is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression from,to,size,flag;
statement S;
@@

- to = \(kmalloc\|kzalloc\)(size,flag);
+ to = kmemdup(from,size,flag);
if (to==NULL || ...) S
- memcpy(to, from, size);
// </smpl>
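
For comparison, a userspace analogue of the allocate-and-copy helper might
look like the sketch below. The name memdup_sketch is hypothetical; the
kernel's kmemdup() additionally takes a gfp flags argument, as shown in the
semantic patch above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocate len bytes and copy src into them; returns NULL on failure.
 * The caller owns the result and releases it with free(). */
void *memdup_sketch(const void *src, size_t len)
{
	void *dst = malloc(len);

	if (dst)
		memcpy(dst, src, len);
	return dst;
}
```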

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
5d7220e8dc24feed4bbd66667b7696906a147ac4 15-Apr-2010 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> RDMA/cma: Randomize local port allocation

Randomize local port allocation in the way sctp_get_port_local() does.
Update rover at the end of loop since we're likely to pick a valid port
on the first try.
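
The general shape of such a randomized search is sketched below. It is not
the cma.c code: pick_port and the in_use callback are hypothetical, rand()
stands in for the kernel's random source, and the persistent rover update
mentioned above is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Returns a free port in [low, high], or -1 if every port is taken.
 * in_use is a hypothetical callback standing in for the port lookup. */
int pick_port(int low, int high, bool (*in_use)(int port))
{
	int remaining = high - low + 1;
	int rover = low + rand() % remaining;	/* random starting point */

	for (int i = 0; i < remaining; i++) {
		if (!in_use(rover))
			return rover;
		if (++rover > high)
			rover = low;		/* wrap around the range */
	}
	return -1;
}

/* Example predicates: every port but 4242 is busy / every port is busy. */
bool all_but_4242_in_use(int port)
{
	return port != 4242;
}

bool everything_in_use(int port)
{
	(void)port;
	return true;
}
```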

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ae2d9293d7cfba70f44f59cfedb44122828c73b8 25-Mar-2010 Sean Hefty <sean.hefty@intel.com> RDMA/cm: Set num_paths when manually assigning path records

When manually assigning the path records to use for a connection, save
the number of paths that were set. Otherwise, checks against num_path
will show 0, even though path record data is available.

This was discovered by manually setting the path records from user
space, then querying the kernel to see if the correct path records
were assigned, only to discover that the kernel returned 0 path
records to the query.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
5a0e3ad6af8660be21ca98a971cd00f331318c05 24-Mar-2010 Tejun Heo <tj@kernel.org> include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
8523c0480979080e8088e40f25459e5b2d19f621 09-Feb-2010 Sean Hefty <sean.hefty@intel.com> RDMA/cm: Revert association of an RDMA device when binding to loopback

Revert the following change from commit 6f8372b6 ("RDMA/cm: fix
loopback address support")

The defined behavior of rdma_bind_addr is to associate an RDMA
device with an rdma_cm_id, as long as the user specified a non-
zero address. (ie they weren't just trying to reserve a port)
Currently, if the loopback address is passed to rdma_bind_addr,
no device is associated with the rdma_cm_id. Fix this.

It turns out that important apps such as Open MPI depend on
rdma_bind_addr() NOT associating any RDMA device when binding to a
loopback address. Open MPI is being updated to deal with this, but at
least until a new Open MPI release is available, maintain the previous
behavior: allow rdma_bind_addr() to succeed, but do not bind to a
device.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
fd4582a3999e03fa9eae315bf14c88fd32d44035 06-Jan-2010 Robert P. J. Day <rpjday@crashcourse.ca> IB/addr: Correct CONFIG_IPv6 to CONFIG_IPV6

Correct misspelled "CONFIG_IPv6" that was introduced in commit
d14714df ("IB/addr: Fix IPv6 routing lookup"). The config variable
should be all uppercase.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>

[ This was my fault when I munged the original patch. - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
d14714df61681cfecf945a58436edf197327e87f 20-Nov-2009 Sean Hefty <sean.hefty@intel.com> IB/addr: Fix IPv6 routing lookup

Include link scope as part of address resolution. Combine local
and remote address resolution into a single, simpler code path.
Fix error checking in the IPv6 routing lookups.

Based on work from:
David Wilder <dwilder@us.ibm.com>
Jason Gunthorpe <jgunthorpe@obsidianresearch.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

[ Fix up cma_check_linklocal() for !IPV6 case. - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
6f8372b69c3198e06cecb1df2cb9682d0c55e657 19-Nov-2009 Sean Hefty <sean.hefty@intel.com> RDMA/cm: fix loopback address support

The RDMA CM is intended to support the use of a loopback address
when establishing a connection; however, the behavior of the CM
when loopback addresses are used is confusing and does not always
work, depending on whether loopback was specified by the server,
the client, or both.

The defined behavior of rdma_bind_addr is to associate an RDMA
device with an rdma_cm_id, as long as the user specified a non-
zero address. (ie they weren't just trying to reserve a port)
Currently, if the loopback address is passed to rdma_bind_addr,
no device is associated with the rdma_cm_id. Fix this.

If a loopback address is specified by the client as the destination
address for a connection, it will fail to establish a connection.
This is true even if the server is listening across all addresses or
on the loopback address itself. The issue is that the server tries
to translate the IP address carried in the REQ message to a local
net_device address, which fails. The translation is not needed in
this case, since the REQ carries the actual HW address that should
be used.

Finally, cleanup loopback support to be more transport neutral.
Replace separate calls to get/set the sgid and dgid from the
device address to a single call that behaves correctly depending
on the format of the device address. And support both IPv4 and
IPv6 address formats.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

[ Fixed RDS build by s/ib_addr_get/rdma_addr_get/ - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
c4315d85f9b76834289fd503796c01b8311c4b84 19-Nov-2009 Sean Hefty <sean.hefty@intel.com> IB/addr: Store net_device type instead of translating to RDMA transport

The struct rdma_dev_addr stores net_device address information:
the source device address, destination hardware address, and
broadcast address. For consistency, store the net_device type
rather than converting it to the rdma_node_type.

The type indicates the format of the various hardware addresses,
which is what we're concerned with, and not the RDMA node type
that the address may map to.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
6266ed6e4164466177238b11ecb825a3a108a3e4 19-Nov-2009 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Replace net_device pointer with index

Provide the device interface when resolving route information to
ensure that the correct outbound device is used. This will also
simplify processing of sin6_scope_id for IPv6 support.

Based on work from:
David Wilder <dwilder@us.ibm.com>
Jason Gunthorpe <jgunthrope@obsidianresearch.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
e2e626972e652d18520f84d69fc06cfa307d11ff 19-Nov-2009 Jason Gunthorpe <jgunthorpe@obsidianresearch.com> RDMA/cma: Fix AF_INET6 support in multicast joining

If joining to an AF_INET6 address, we need to map the address to a MGID
in the same way as the IP stack. The old code would just fall through to
the IPv4 case and generate garbage.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
1c9b281997b5876c0c8ed62506b56db89d262b57 19-Nov-2009 Jason Gunthorpe <jgunthorpe@obsidianresearch.com> RDMA/cma: Correct detection of SA Created MGID

RDMA CM treats AF_INET6 addresses that are either 0 or prefixed with
FF1x:A01B::/32 as MGIDs, but the detection for the prefix was buggy;
fix it up.
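
A minimal sketch of the check described above, assuming the address is
available as 16 bytes in network order. The helper name looks_like_mgid()
is hypothetical and this is not the code from the patch; it only
illustrates the "all zero or FF1x:A01B::/32" rule, where x is any scope
nibble.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the 16-byte IPv6 address (network
 * byte order) is either all zero or carries the FF1x:A01B prefix,
 * where the low nibble of the second byte (the scope) may be anything.
 */
static int looks_like_mgid(const uint8_t addr[16])
{
	static const uint8_t zero[16];

	if (memcmp(addr, zero, sizeof(zero)) == 0)
		return 1;

	return addr[0] == 0xff &&
	       (addr[1] & 0xf0) == 0x10 &&	/* mask out the scope nibble */
	       addr[2] == 0xa0 &&
	       addr[3] == 0x1b;
}
```

Masking only the scope nibble is the point of the sketch: comparing the
first 32 bits without the mask would reject valid prefixes such as
FF12:A01B or FF1E:A01B.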

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
716abb1fdf3274ac81dc404f3659cc05d8cdf606 23-Jun-2009 Peter Huewe <peterhuewe@gmx.de> RDMA: Add __init/__exit macros to addr.c and cma.c

Add __init and __exit annotations to the module_init/module_exit
functions from drivers/infiniband/core/addr.c and cma.c.

Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
d2ca39f262806aa2f035f680a14aa55ff9e3d889 08-Apr-2009 Yossi Etigin <root@voltaire.com> RDMA/cma: Create cm id even when IB port is down

When doing rdma_resolve_addr(), if the relevant IB port is down, the
function fails and the cm_id is not bound to the correct device.
Therefore, the application does not have a device handle and cannot wait
for the port to become active. The function fails because the
underlying IPoIB interface is not joined to the broadcast group and
therefore the SA does not have a multicast record to take a Q_Key
from.

The fix is to use lazy Q_Key resolution - cma_set_qkey() will set
id_priv->qkey if it was not set, and will be called just before the
Q_Key is really required.
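
The lazy resolution pattern can be sketched in user space as follows.
The struct, the lookup stub and the placeholder Q_Key value are
assumptions made for illustration; only the idea that the Q_Key is
resolved on first use, and that zero means "not set yet", is taken from
the description above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the private CM ID state. */
struct id_priv_sketch {
	uint32_t qkey;		/* 0 means "not resolved yet" (assumption) */
	int lookups;		/* counts how often the expensive lookup ran */
};

/* Stub for the expensive lookup; the returned value is a placeholder. */
static uint32_t lookup_qkey_stub(struct id_priv_sketch *id)
{
	id->lookups++;
	return 0x1234;
}

/* Resolve the Q_Key only if it has not been set already. */
static int set_qkey_lazily(struct id_priv_sketch *id)
{
	if (id->qkey)
		return 0;	/* already resolved, nothing to do */

	id->qkey = lookup_qkey_stub(id);
	return 0;
}
```

Callers invoke set_qkey_lazily() right before the Q_Key is needed, so
creating and binding the ID never depends on the lookup succeeding.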

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
84adeee9aaa0d81712de1e0ea74caed3398e4a1d 01-Apr-2009 Yossi Etigin <yosefe@Voltaire.COM> RDMA/cma: Use rate from IPoIB broadcast when joining IPoIB multicast groups

When joining an IPoIB multicast group, use the same rate as in the
broadcast group. Otherwise, if the RDMA CM creates this group before
IPoIB does, it might get a different rate. This will cause IPoIB to
fail joining to the same group later on, because IPoIB uses strict
rate selection.

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
1f5175adeaa1d161f603ef351785a19814dfe900 24-Dec-2008 Aleksey Senin <alekseys@voltaire.com> RDMA/cma: Add IPv6 support

Handle AF_INET6 cases where required, and use struct sockaddr_storage
wherever an IPv6 address might be stored.

Signed-off-by: Aleksey Senin <aleksey@alst60.(none)>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
3f44675439b136d51179d31eb5a498383cb38624 04-Aug-2008 Roland Dreier <rolandd@cisco.com> RDMA/cma: Remove padding arrays by using struct sockaddr_storage

There are a few places where the RDMA CM code handles IPv6 by doing

	struct sockaddr addr;
	u8 pad[sizeof(struct sockaddr_in6) -
	       sizeof(struct sockaddr)];

This is fragile and ugly; handle this in a better way with just

	struct sockaddr_storage addr;

[ Also roll in patch from Aleksey Senin <alekseys@voltaire.com> to
switch to struct sockaddr_storage and get rid of padding arrays in
struct rdma_addr. ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
38ca83a588662f0af684ba2567dd910a564268ab 22-Jul-2008 Amir Vadai <amirv@mellanox.co.il> RDMA/cma: Add RDMA_CM_EVENT_TIMEWAIT_EXIT event

Consumers that want to re-use their QPs in new connections need to
know when the QP has exited the timewait state. Report the timewait
event through the rdma_cm.

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
dd5bdff83b19d9174126e0398b47117c3a80e22d 22-Jul-2008 Or Gerlitz <ogerlitz@voltaire.com> RDMA/cma: Add RDMA_CM_EVENT_ADDR_CHANGE event

Add an RDMA_CM_EVENT_ADDR_CHANGE event that can be used by rdma-cm
consumers that wish to have their RDMA sessions always use the same
links (e.g. <hca/port>) as the IP stack does. In the current code, this
does not happen when bonding is used and a fail-over has happened but
the IB link used by an already existing session is operating fine.

Use the netevent notification for sensing that a change has happened
in the IP stack, then scan the rdma-cm ID list to see if there is an
ID that is "misaligned" with respect to the IP stack, and deliver
RDMA_CM_EVENT_ADDR_CHANGE for this ID. The consumer can act on the
event or just ignore it.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
de910bd92137005b5e1ecaf2ce68053d7d7d5350 15-Jul-2008 Or Gerlitz <ogerlitz@voltaire.com> RDMA/cma: Simplify locking needed for serialization of callbacks

The RDMA CM has some logic in place to make sure that callbacks on a
given CM ID are delivered to the consumer in a serialized manner.
Specifically it has code to protect against a device removal racing
with a running callback function.

This patch simplifies this logic by using a mutex per ID instead of a
wait queue and atomic variable. This means that cma_disable_remove()
now is more properly named to cma_disable_callback(), and
cma_enable_remove() can now be removed because it just would become a
trivial wrapper around mutex_unlock().

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
64c5e613b9dd34ef1281ed6d22478609667ae36a 15-Jul-2008 Or Gerlitz <ogerlitz@voltaire.com> RDMA/addr: Keep pointer to netdevice in struct rdma_dev_addr

Keep a pointer to the local (src) netdevice in struct rdma_dev_addr,
and copy it in as part of rdma_copy_addr(). Use rdma_translate_ip()
in cma_new_conn_id() to reduce some code duplication and also make
sure the src_dev member gets set.

In a high-availability configuration the netdevice pointer can be used
by the RDMA CM to align RDMA sessions to use the same links as the IP
stack does under fail-over and route change cases.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
468f2239bcc71ae0f345c3fe58c797cf4627daf4 15-Jul-2008 Roland Dreier <rolandd@cisco.com> RDMA/cma: Add missing newlines to printk()s

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
a9474917099e007c0f51d5474394b5890111614f 15-Jul-2008 Sean Hefty <sean.hefty@intel.com> RDMA: Fix license text

The license text for several files references a third software license
that was inadvertently copied in. Update the license to what was
intended. This update was based on a request from HP.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
10f32065a25d22bf7894fa39ff2ce8492922086a 17-Apr-2008 Julia Lawall <julia@diku.dk> RDMA/iwcm: Test rdma_create_id() for IS_ERR rather than 0

The function rdma_create_id() always returns either a valid pointer or
a value made with ERR_PTR, so its result should be tested with IS_ERR,
not with a test for 0.

The problem was found using the following semantic match.
(http://www.emn.fr/x-info/coccinelle/)

//<smpl>
@a@
expression E, E1;
statement S,S1;
position p;
@@

E = rdma_create_id(...)
... when != E = E1
if@p (E) S else S1

@n@
position a.p;
expression E,E1;
statement S,S1;
@@

E = NULL
... when != E = E1
if@p (E) S else S1

@depends on !n@
expression E;
statement S,S1;
position a.p;
@@

* if@p (E)
S else S1
//</smpl>
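
Why a test for 0 misses the failure can be seen in a small user-space
sketch. The ERR_PTR(), PTR_ERR() and IS_ERR() helpers below are
simplified re-implementations written for this example (the kernel
provides its own in <linux/err.h>), and create_id_stub() is a
hypothetical stand-in for rdma_create_id().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified user-space versions of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int dummy_id;

/* Hypothetical stand-in: returns a valid pointer or ERR_PTR(-ENOMEM). */
static void *create_id_stub(int fail)
{
	return fail ? ERR_PTR(-ENOMEM) : (void *)&dummy_id;
}

/* The incorrect check: an error pointer is never NULL. */
static int failed_by_null_test(const void *id)
{
	return id == NULL;
}

/* The correct check. */
static int failed_by_is_err_test(const void *id)
{
	return IS_ERR(id);
}
```

With a failing create_id_stub(1), failed_by_null_test() reports success
while failed_by_is_err_test() reports the error, and PTR_ERR() recovers
-ENOMEM.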

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
1b90c137cc2a0e9b813a8ae316827c493c664146 29-Mar-2008 Al Viro <viro@ftp.linux.org.uk> trivial endianness annotations: infiniband core

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ead595aeb0974171eddd012df115424752413c26 13-Feb-2008 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Do not issue MRA if user rejects connection request

There's an undesirable interaction with issuing MRA requests to
increase connection timeouts and the listen backlog.

When the rdma_cm receives a connection request, it queues an MRA with
the ib_cm. (The ib_cm will send an MRA if it receives a duplicate
REQ.) The rdma_cm will then create a new rdma_cm_id and give that to
the user, which in this case is the rdma_user_cm.

If the listen backlog maintained in the rdma_user_cm is full, it
destroys the rdma_cm_id, which in turn destroys the ib_cm_id. The
ib_cm_id generates a REJ because the state of the ib_cm_id has changed
to MRA sent, versus REQ received. When the backlog is full, we just
want to drop the REQ so that it is retried later.

Fix this by deferring queuing the MRA until after the user of the
rdma_cm has examined the connection request.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
1ab352768fc73838b062776ca5d1add3876a019f 23-Jan-2008 Denis V. Lunev <den@openvz.org> [NETNS]: Add namespace parameter to ip_dev_find.

ip_dev_find() needs a namespace to pass it to fib_get_table(), so add
an argument.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6360a02af1599e46b023ccbb85545ed97c6f662c 16-Dec-2007 Joe Perches <joe@perches.com> [IPV4] drivers/infiniband: Use ipv4_is_<type>

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5851bb893e5bb87150817c180ccddcf4e78db1b6 04-Jan-2008 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Override default responder_resources with user value

By default, the responder_resources parameter is set to that received
in a connection request. The passive side may override this value
when accepting the connection. Use the value provided by the passive
side when transitioning the QP to RTR state, rather than the value
given in the connect request. Without this change, the RTR transition
may fail if the passive side supports fewer responder_resources than
that in the request.

For code consistency and to protect against QP destruction, restructure
overriding initiator_depth to match how responder_resources is set.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
a9e527e3f9f4510e9f3450ca3bc51bc3ef2854fd 10-Dec-2007 Rolf Manderscheid <rvm@obsidianresearch.com> IPoIB: improve IPv4/IPv6 to IB mcast mapping functions

An IPoIB subnet on an IB fabric that spans multiple IB subnets can't
use link-local scope in multicast GIDs. The existing routines that
map IP/IPv6 multicast addresses into IB link-level addresses hard-code
the scope to link-local, and they also leave the partition key field
uninitialised. This patch adds a parameter (the link-level broadcast
address) to the mapping routines, allowing them to initialise both the
scope and the P_Key appropriately, and fixes up the call sites.

The next step will be to add a way to configure the scope for an IPoIB
interface.

Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
45d9478da106c749203056f56e94d0e370dfac87 08-Dec-2007 Vladimir Sokolovsky <vlad@mellanox.co.il> RDMA/cma: Reenable device removal on passive side

Enable conn_id remove on the passive side after connection
establishment. This corrects an issue where the IB driver can't be
unloaded after running applications over RDS. The 'dev_remove' counter
does not reach 0 for established connections on the passive side.

This problem is limited to device removal, and only occurs on the
passive side if there are established connections.

Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
8d8293cfb38b042835eeded7c1d90f75ca243e87 29-Oct-2007 Steve Wise <swise@opengridcomputing.com> RDMA/iwcm: Set initiator depth and responder resources to device max values

Set the initiator depth and responder resources to the device max
values for new connect request events in the iWARP connection manager.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
a25de534f89c515c82d3553c42d3bb02c2d1a7da 19-Oct-2007 Anton Arapov <aarapov@redhat.com> [INET]: Justification for local port range robustness.

This is a follow-up patch to Stephen's patches. Stephen's patches
disallow using a port range of one single port and break the meaning
of the 'remaining' variable, which has a different meaning in some
places. My patch restores the meaning of the 'remaining' variable: it
should mean how many ports are remaining and nothing else. My patch
also allows using a single port.

I am sure we must be able to use the mentioned port range; this is not
restricted by the documentation and does not break current behavior.

Useful links:
Patches posted by Stephen Hemminger
http://marc.info/?l=linux-netdev&m=119206106218187&w=2
http://marc.info/?l=linux-netdev&m=119206109918235&w=2

Andrew Morton's comment
http://marc.info/?l=linux-kernel&m=119248225007737&w=2

1. Allows using a port range of one single port.
2. Gives back sense of 'remaining' variable.

Signed-off-by: Anton Arapov <aarapov@redhat.com>
Acked-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
d02d1f5359e795bac9a4461698521680cddd782b 09-Oct-2007 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Fix deadlock destroying listen requests

Deadlock condition reported by Kanoj Sarcar <kanoj@netxen.com>.
The deadlock occurs when a connection request arrives at the same
time that a wildcard listen is being destroyed.

A wildcard listen maintains per device listen requests for each
RDMA device in the system. The per device listens are automatically
added and removed when RDMA devices are inserted or removed from
the system.

When a wildcard listen is destroyed, rdma_destroy_id() acquires
the rdma_cm's device mutex ('lock') to protect against hot-plug
events adding or removing per device listens. It then tries to
destroy the per device listens by calling ib_destroy_cm_id() or
iw_destroy_cm_id(). It does this while holding the device mutex.

However, if the underlying iw/ib CM reports a connection request
while this is occurring, the rdma_cm callback function will try
to acquire the same device mutex. Since we're in a callback,
the ib_destroy_cm_id() or iw_destroy_cm_id() calls will block until
their callback thread returns, but the callback is blocked waiting for
the device mutex.

Fix this by re-working how per device listens are destroyed. Use
rdma_destroy_id(), which avoids the deadlock, in place of
cma_destroy_listen(). Additional synchronization is added to handle
device hot-plug events and ensure that the id is not destroyed twice.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
c5483388bb4d771007ef36478db038e07922a020 24-Sep-2007 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Add locking around QP accesses

If a user allocates a QP on an rdma_cm_id, the rdma_cm will automatically
transition the QP through its states (RTR, RTS, error, etc.). While the
QP state transitions are occurring, the QP itself must remain valid.
Provide locking around the QP pointer to prevent its destruction while
accessing the pointer.

This fixes an issue reported by Olaf Kirch from Oracle that resulted in
a system crash:

"An incoming connection arrives and we decide to tear down the nascent
connection. The remote ends decides to do the same. We start to shut
down the connection, and call rdma_destroy_qp on our cm_id. ... Now
apparently a 'connect reject' message comes in from the other host,
and cma_ib_handler() is called with an event of IB_CM_REJ_RECEIVED.
It calls cma_modify_qp_err, which for some odd reason tries to modify
the exact same QP we just destroyed."

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
227b60f5102cda4e4ab792b526a59c8cb20cd9f8 11-Oct-2007 Stephen Hemminger <shemminger@linux-foundation.org> [INET]: local port range robustness

Expansion of original idea from Denis V. Lunev <den@openvz.org>

Add robustness and locking to the local_port_range sysctl.
1. Enforce that low < high when setting.
2. Use seqlock to ensure atomic update.

The locking might seem like overkill, but there are
cases where sysadmin might want to change value in the
middle of a DoS attack.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
dcb3f974da827c964cb8d419fbb4350cdc08a559 01-Aug-2007 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Queue IB CM MRAs to avoid unnecessary remote retries

Automatically queue MRA message to decrease the number of retries sent
by the remote side during connection establishment. This also has the
effect of increasing the overall connection timeout without using a
longer retry time in the case of dropped packets.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
a81c994d5eef87ed77cb30d8343d6be296528b3f 09-Aug-2007 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Add ability to specify type of service

Provide support to specify a type of service for a communication
identifier. A new function call is used when dealing with IPv4
addresses. For IPv6 addresses, the ToS is specified through the
traffic class field in the sockaddr_in6 structure.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

[ The comments Eitan Zahavi and myself have made over the v1 post at
<http://lists.openfabrics.org/pipermail/general/2007-August/039247.html>
were fully addressed. ]

Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
8f076531cd20fbf83ed889024c8133d0c71a1fe4 17-Jul-2007 Dotan Barak <dotanb@dev.mellanox.co.il> RDMA/cma: Remove local write permission from QP access flags

Local write permission makes no sense as part of the QP access flags,
since the access flags only control what the remote end of the
connection is allowed to do. Remove the code in the RDMA CM that
initializes qp_access_flags with IB_ACCESS_LOCAL_WRITE.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
1d84612649427a85e1f311baa7215f9a6252d856 18-Jun-2007 Sean Hefty <sean.hefty@intel.com> IB/cm: Include HCA ACK delay in local ACK timeout

The IB CM should include the HCA ACK delay when calculating the local
ACK timeout value to use for RC QPs. If the HCA ACK delay is large
enough relative to the packet life time, then if it is not taken into
account, the calculated timeout value ends up being too small, which
can result in "retry exceeded" errors.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
bf2944bd56c7a48cc3962a860dbc4ceee6b1ace8 05-Jun-2007 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Fix initialization of next_port

next_port should be between sysctl_local_port_range[0] and [1].
However, it is initially set to a random value with get_random_bytes().
If the value is negative when treated as a signed integer, next_port
can end up outside the expected range because of the result of the %
operator being negative.
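
The effect can be reproduced in user space. The two helpers below are
hypothetical and are not the code from the patch; they only contrast a
signed and an unsigned remainder when mapping a random value into a
port range [low, high).

```c
#include <stdio.h>

/*
 * Buggy form: in C the % operator truncates toward zero, so a negative
 * rnd yields a negative remainder and the result falls below low.
 */
static int pick_port_signed(int rnd, int low, int high)
{
	return rnd % (high - low) + low;
}

/*
 * Sketch of a fix: treat the random value as unsigned so that the
 * remainder is always in [0, high - low).
 */
static int pick_port_unsigned(unsigned int rnd, int low, int high)
{
	return (int)(rnd % (unsigned int)(high - low)) + low;
}

/* Print both results for one random value. */
static void show(int rnd, int low, int high)
{
	printf("signed: %d unsigned: %d\n",
	       pick_port_signed(rnd, low, high),
	       pick_port_unsigned((unsigned int)rnd, low, high));
}
```

For example, with the range 32768 to 61000 and a random value of -7, the
signed variant returns 32761, which is below the lower bound, while the
unsigned variant stays inside the range.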

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
6c719f5c6c823901fac2d46b83db5a69ba7e9152 07-May-2007 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Add check to validate that cm_id is bound to a device

Several checks in the rdma_cm check against the state of the
cm_id, but only to validate that the cm_id is bound to an underlying
transport specific CM and an RDMA device. Make the check explicit
in what we're trying to check for, since we're not synchronizing
against the cm_id state.

This will allow a user to disconnect a cm_id or reject a connection
after receiving a device removal event.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
be65f086f2a50c478b2f5ecf4c55a52a4e95059a 07-May-2007 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Fix synchronization with device removal in cma_iw_handler

The cma_iw_handler needs to validate the state of the rdma_cm_id before
processing a new connection request to ensure that a device removal is
not already being processed for the same rdma_cm_id. Without the state
check, the user can receive simultaneous callbacks for the same cm_id, or
a callback after they've destroyed the cm_id.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
8aa08602bdd617a9cdd147f19076a8c8a70e03ef 07-May-2007 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Simplify device removal handling code

Add a new routine and rename another to encapsulate common code for
synchronizing with device removal.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
cb164b8c6a2bdf995c938e5f157d41465b18e5c3 05-Mar-2007 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Initialize rdma_bind_list in cma_alloc_any_port()

The struct rdma_bind_list fields for hlist are not being initialized,
resulting in a corrupted list. Fix this by using kzalloc() to make
sure all pointers are NULL.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
1836854f25b1bc63766bff06aeeb83d2a602b050 22-Feb-2007 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Remove unused node_guid from cma_device structure

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
962063e64b0c55d270979fa0e4ae26daedac6282 22-Feb-2007 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Request reversible paths only

The rdma_cm requires that path records be reversible. Set the
reversible bit when issuing a path record query.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
c8f6a362bf3eb28ade6027b49bb160a336dd51c0 16-Feb-2007 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Add multicast communication support

Extend rdma_cm to support multicast communication. Multicast support
is added to the existing RDMA_PS_UDP port space, as well as a new
RDMA_PS_IPOIB port space. The latter port space allows joining the
multicast groups used by IPoIB, which enables offloading IPoIB traffic
to a separate QP. The port space determines the signature used in the
MGID when joining the group. The newly added RDMA_PS_IPOIB also
allows for unicast operations, similar to RDMA_PS_UDP.

Supporting the RDMA_PS_IPOIB requires changing how UD QPs are initialized,
since we can no longer assume that the qkey is constant. This requires
saving the Q_Key to use when attaching to a device, so that it is
available when creating the QP. The Q_Key information is exported to
the user through the existing rdma_init_qp_attr() interface.

Multicast support is also exported to userspace through the rdma_ucm.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
c7f743a669c27f9c392e78fda8829db9d6d50f43 01-Feb-2007 Sean Hefty <sean.hefty@intel.com> IB: Remove redundant "_wq" from workqueue names

Signed-off-by: Roland Dreier <rolandd@cisco.com>
aedec08050255db1989a38b59616dd973dfe660b 30-Jan-2007 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Increment port number after close to avoid re-use

Randomize the starting port number and avoid re-using port values
immediately after they are closed. Instead keep track of the last
port value used and increment it every time a new port number is
assigned, to better replicate other port spaces.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
881a045fc5b454b57c69e010acecd5830d87e242 15-Dec-2006 Steve Wise <swise@opengridcomputing.com> RDMA/iwcm: iWARP connection timeouts shouldn't be reported as rejects

The iWARP CM should report timeouts as event RDMA_CM_EVENT_UNREACHABLE,
not event RDMA_CM_EVENT_REJECTED.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
628e5f6d39d5a6be96c1272a6709f2dd3ec8b7ce 01-Dec-2006 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Add support for RDMA_PS_UDP

Allow the use of UD QPs through the rdma_cm, in order to provide
address translation services for resolving IB addresses for datagram
messages using SIDR.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
0fe313b000b6a699afbbb59ef9c47a2b22146f1e 01-Dec-2006 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Allow early transition to RTS to handle lost CM messages

During connection establishment, the passive side of a connection can
receive messages from the active side before the connection event has
been delivered to the user. Allow the passive side to send messages
in response to received data before the event is delivered. To handle
the case where the connection messages are lost, a new rdma_notify()
function is added that users may invoke to force a connection into the
established state.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
a1b1b61f80aba49f1e0f32b0e4b1c35be91c57fa 01-Dec-2006 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Report connect info with connect events

Connection information was never given to the recipient of a
connection request or reply message. Only the event was delivered.
Report the connection data with the event to allow the user to
reject the connection based on the requested parameters, or adjust
their resources to match the request.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
9b2e9c0c241e532d923fff23d9a7c0bd31bd96b1 01-Dec-2006 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Remove unneeded qp_type parameter from rdma_cm

The qp_type parameter into the rdma_cm is unneeded, and can be
misleading. The QP type should be determined from the port space.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
e31353eaeca736981ec13b46089d30147342b28b 24-Oct-2006 Dotan Barak <dotanb@mellanox.co.il> RDMA/cm: Remove setting local write as part of QP access flags

The qp_access_flags are for remote access permissions only, so
IB_ACCESS_LOCAL_WRITE is an invalid value. Remove it from the values
set by cm_init_qp_init_attr() and cma_init_ib_qp().

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
a1a733f65b091fdad3d0783e648c92b491933ab6 17-Oct-2006 Krishna Kumar <krkumar2@in.ibm.com> RDMA/cma: Rewrite cma_req_handler() to encapsulate common code

Rewrite cma_req_handler error handling case to encapsulate
common code.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
e4022274cf8df1f78f9e20ba7e199a9edf655422 16-Oct-2006 Krishna Kumar <krkumar2@in.ibm.com> RDMA/cma: Remove redundant check in cma_add_one

Remove redundant check of node_guid in cma_add_one().

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
e82153b54d75af31d5d4a84efe441e5719f34cfc 16-Oct-2006 Krishna Kumar <krkumar2@in.ibm.com> RDMA/cma: Optimize cma_bind_loopback() to check for empty list

Optimize to test for an empty list first. This ends up simplifying
the code too.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
c4028958b6ecad064b1a6303a6a5906d4fe48d73 22-Nov-2006 David Howells <dhowells@redhat.com> WorkStruct: make allyesconfig

Fix up for make allyesconfig.

Signed-Off-By: David Howells <dhowells@redhat.com>
7a118df3ea23820b9922a1b51cd2f24e464f4c17 31-Oct-2006 Sean Hefty <sean.hefty@intel.com> RDMA/addr: Use client registration to fix module unload race

Require registration with ib_addr module to prevent caller from
unloading while a callback is in progress.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
255d0c14b3757e8bd85add874e4dca4c3621b39e 24-Oct-2006 Krishna Kumar <krkumar2@in.ibm.com> RDMA/cma: rdma_bind_addr() leaks a cma_dev reference count

rdma_bind_addr() leaks a cma_dev reference count in failure case.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
3f168d2b66d2314fea40614a3b966c1a0b6241a9 29-Sep-2006 Krishna Kumar <krkumar2@in.ibm.com> RDMA/cma: Optimize error handling

Reorganize code relating to cma_get_net_info() and rdma_create_id() to
optimize error case handling (no need to alloc memory/etc. as part of
rdma_create_id() if input parameters are wrong).

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
94de178ac636e9d6f89b11cb3dd400b777942ac9 29-Sep-2006 Krishna Kumar <krkumar2@in.ibm.com> RDMA/cma: Eliminate unnecessary remove_list

Eliminate remove_list by using list_del_init() instead during device
removal handling.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
8f0472d331619d5d74927978d0dde5b4935e41a5 29-Sep-2006 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Set status correctly on route resolution error

On reporting a route error, also include the status for the error,
rather than indicating a status of 0 when an error has occurred.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
6e35aabee125999f4b3c01326f5339fa74a89259 29-Sep-2006 Krishna Kumar <krkumar2@in.ibm.com> RDMA/cma: Fix device removal race

The race is as follows:

A process : cma_process_remove() calls cma_remove_id_dev(),
which sets id state to CMA_DEVICE_REMOVAL and
calls wait_event(dev_remove).

B process : cma_req_handler() had incremented dev_remove,
and calls cma_acquire_ib_dev() and on failure
calls cma_release_remove(), which does a
wake_up of cma_process_remove(). Then
cma_req_handler() calls rdma_destroy_id();

A Process : cma_remove_id_dev() gets woken and checks the
state of id, and since it is still (wrongly)
CMA_DEVICE_REMOVAL, it calls notify_user(id)
and if that fails, the caller - cma_process_remove()
calls rdma_destroy_id(id). Two processes can
call rdma_destroy_id(), resulting in one
de-referencing kfreed id_priv.

Fix is for process B to set CMA_DESTROYING in cma_req_handler()
so that process A will return instead of doing a rdma_destroy_id().

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
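
Illustrative note on the device removal race above: the following is a minimal user-space sketch of the state handshake the commit describes, assuming C11 atomics in place of the kernel's locking. The demo_* names, the state enum, and the destroy counter are hypothetical and exist only to replay the interleaving in a single thread. The point is that once the failing request handler marks the id as destroying, the remover sees a state other than "device removal" when it wakes and backs off.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical subset of connection states. */
enum demo_state { DEMO_CONNECT, DEMO_DEVICE_REMOVAL, DEMO_DESTROYING };

struct demo_id_priv {
	_Atomic int state;
	int destroy_calls; /* counts teardown attempts, for the demo only */
};

/* Unconditionally move to a new state and return the previous one. */
static int demo_exch(struct demo_id_priv *id, int new_state)
{
	return atomic_exchange(&id->state, new_state);
}

/* True only while the id is still in the expected state. */
static bool demo_comp(struct demo_id_priv *id, int expected)
{
	return atomic_load(&id->state) == expected;
}

static void demo_destroy_id(struct demo_id_priv *id)
{
	id->destroy_calls++;
}

/* Process B: request handler failure path. */
static void demo_req_handler_fail(struct demo_id_priv *id)
{
	/* Claim teardown before waking the remover. */
	demo_exch(id, DEMO_DESTROYING);
	/* The wake-up of process A would happen here. */
	demo_destroy_id(id);
}

/* Process A: remover, after it has been woken. */
static void demo_remove_after_wakeup(struct demo_id_priv *id)
{
	if (!demo_comp(id, DEMO_DEVICE_REMOVAL))
		return; /* another path owns teardown, so back off */
	demo_destroy_id(id);
}
```

Replaying the sequence (A sets the removal state, B fails and destroys, A wakes) leaves exactly one teardown call in this model, which is the outcome the fix is after.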
675a027c3db25a439f6ea744bb0c284f983dbfb9 29-Sep-2006 Krishna Kumar <krkumar2@in.ibm.com> RDMA/cma: Fix leak of cm_ids in case of failures

cma_connect_ib() and cma_connect_iw() leak cm_ids in failure cases.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
c1a0b23bf477c2e1068905f4e2b5c3cee139e853 22-Aug-2006 Michael S. Tsirkin <mst@mellanox.co.il> IB/sa: Require SA registration

Require users to register with SA module, to prevent the sa_query
module text from going away while an SA query callback is still
running. Update all in-tree users for the new interface.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
61a73c708f37295892176bc911b178278df6a091 02-Sep-2006 Sean Hefty <sean.hefty@intel.com> RDMA/cma: Protect against adding device during destruction

Closes a window where address resolution can attach an rdma_cm_id to a
device during destruction of the rdma_cm_id. This can result in the
rdma_cm_id remaining in the device list after its memory has been
freed.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
07ebafbaaa72aa6a35472879008f5a1d1d469a0c 03-Aug-2006 Tom Tucker <tom@opengridcomputing.com> RDMA: iWARP Core Changes.

Modifications to the existing rdma header files, core files, drivers,
and ulp files to support iWARP, including:
- Hook iWARP CM into the build system and use it in rdma_cm.
- Convert enum ib_node_type to enum rdma_node_type, which includes
the possibility of RDMA_NODE_RNIC, and update everything for this.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
3cd965646b7cb75ae84dd0daf6258adf20e4f169 23-Sep-2006 Roland Dreier <rolandd@cisco.com> IB: Whitespace fixes

Remove some trailing whitespace that has snuck in despite the best
efforts of whitespace=error-all. Also fix a few other whitespace
bogosities.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
d5bb75999cb5733ad936ff000023221fe7a13c59 13-Sep-2006 Michael S. Tsirkin <mst@mellanox.co.il> RDMA/cma: Increase the IB CM retry count in CMA

3 seems like a low number of IB Communication Manager retries to set;
we see connections failing under stress, and in any case 3 just looks
like an arbitrary number. 15 is the max value allowed by the
InfiniBand spec.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
74f76fbac72c84ac78620698a584d403b655e62a 14-Jul-2006 Ira Weiny <weiny2@llnl.gov> [PATCH] IB/cm: set private data length for reject messages

Set private data length for reject messages to the correct size. Fix from
openib svn r8483.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
f0ee3404cce2c45f8b95b341dd6311cd92e5cee0 14-Jul-2006 Michael S. Tsirkin <mst@mellanox.co.il> [PATCH] IB/addr: gid structure alignment fix

The device address contains unsigned character arrays, which contain raw GID
addresses. The GIDs may not be naturally aligned, so do not cast them to
structures or unions.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
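
Illustrative note on the GID alignment fix above: a minimal sketch, assuming a hypothetical device address layout, of reading a GID out of a raw byte array by copying rather than by casting. The demo_* types, the buffer size, and the offset used in the usage note are assumptions and are not taken from the kernel headers. Copying with memcpy() is well defined at any byte offset, whereas casting a misaligned unsigned char pointer to a structure or union pointer is not.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte GID with a structured view. */
union demo_gid {
	uint8_t raw[16];
	struct {
		uint64_t subnet_prefix;
		uint64_t interface_id;
	} global;
};

/* Hypothetical device address holding raw, possibly unaligned bytes. */
struct demo_dev_addr {
	unsigned char bytes[32];
};

/* Copy the GID out instead of casting (a->bytes + offset) to a union. */
static void demo_get_gid(const struct demo_dev_addr *a, size_t offset,
			 union demo_gid *out)
{
	assert(offset + sizeof(*out) <= sizeof(a->bytes));
	memcpy(out, a->bytes + offset, sizeof(*out));
}
```

A caller would fill a local union demo_gid with demo_get_gid(&addr, 4, &gid) and then use gid.raw or gid.global freely, since the local copy is naturally aligned.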
5fd571cbc13db113bda26c20673e1ec54bfd26b4 21-Jun-2006 Eric Sesterhenn <snakebyte@gmx.de> [PATCH] Array overrun in drivers/infiniband/core/cma.c

This was spotted by Coverity (ID 1300). Since the array has only four
elements, we should just use those four.

Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
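
Illustrative note on the array overrun fix above: the commit text only states that the array has four elements, so the sketch below is a generic example of the pattern and not the patched function. The demo_* names and the zero-address check are hypothetical. Deriving the loop bound from the array itself keeps every index within the four valid slots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical address type made of four 32-bit words. */
struct demo_in6_addr {
	uint32_t addr32[4];
};

/* Returns true when all four words are zero; valid indices are 0..3. */
static bool demo_is_zero_addr(const struct demo_in6_addr *ip6)
{
	uint32_t acc = 0;

	assert(ip6 != NULL);
	for (size_t i = 0; i < DEMO_ARRAY_SIZE(ip6->addr32); i++)
		acc |= ip6->addr32[i];
	return acc == 0;
}
```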
e51060f08a61965c4dd91516d82fe90617152590 18-Jun-2006 Sean Hefty <sean.hefty@intel.com> IB: IP address based RDMA connection manager

Kernel connection management agent over InfiniBand that connects based
on IP addresses. The agent defines a generic RDMA connection
abstraction to support clients wanting to connect over different RDMA
devices.

The agent also handles RDMA device hotplug events on behalf of clients.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>