Cross Reference: /drivers/infiniband/ulp/ipoib/ipoib

History log of /drivers/infiniband/ulp/ipoib/ipoib_cm.c
Revision	Date	Author	Comments
09b93088d75009807b72293f26e2634430ce5ba9	11-May-2014	Or Gerlitz <ogerlitz@mellanox.com>	IB: Add a QP creation flag to use GFP_NOIO allocations This addresses a problem where NFS client writes over IPoIB connected mode may deadlock on memory allocation/writeback. The problem is not directly memory reclamation. There is an indirect dependency between network filesystems writing back pages and ipoib_cm_tx_init() due to how a kworker is used. Page reclaim cannot make forward progress until ipoib_cm_tx_init() succeeds and it is stuck in page reclaim itself waiting for network transmission. Ordinarily this situation may be avoided by having the caller use GFP_NOFS but ipoib_cm_tx_init() does not have that information. To address this, take a general approach and add a new QP creation flag that tells the low-level hardware driver to use GFP_NOIO for the memory allocations related to the new QP. Use the new flag in the ipoib connected mode path, and if the driver doesn't support it, re-issue the QP creation without the flag. Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
22252b4e09f19431f82dc26428e0edf697d19a04	16-Oct-2013	Tal Alon <talal@mellanox.com>	IPoIB: Change CM skb memory allocation to be non-atomic during init Change CM skb memory allocation to use GFP_KERNEL when possible. During device init there's no need to use GFP_ATOMIC when allocating memory for the CM skbs -- use GFP_KERNEL instead. Signed-off-by: Tal Alon <talal@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
49b8e74438062aba379ce5d3f91b9e6e2e9c6745	09-Aug-2013	Jim Foraker <foraker1@llnl.gov>	IPoIB: Fix race in deleting ipoib_neigh entries In several places, this snippet is used when removing neigh entries: list_del(&neigh->list); ipoib_neigh_free(neigh); The list_del() removes neigh from the associated struct ipoib_path, while ipoib_neigh_free() removes neigh from the device's neigh entry lookup table. Both of these operations are protected by the priv->lock spinlock. The table however is also protected via RCU, and so naturally the lock is not held when doing reads. This leads to a race condition, in which a thread may successfully look up a neigh entry that has already been deleted from neigh->list. Since the previous deletion will have marked the entry with poison, a second list_del() on the object will cause a panic: #5 [ffff8802338c3c70] general_protection at ffffffff815108c5 [exception RIP: list_del+16] RIP: ffffffff81289020 RSP: ffff8802338c3d20 RFLAGS: 00010082 RAX: dead000000200200 RBX: ffff880433e60c88 RCX: 0000000000009e6c RDX: 0000000000000246 RSI: ffff8806012ca298 RDI: ffff880433e60c88 RBP: ffff8802338c3d30 R8: ffff8806012ca2e8 R9: 00000000ffffffff R10: 0000000000000001 R11: 0000000000000000 R12: ffff8804346b2020 R13: ffff88032a3e7540 R14: ffff8804346b26e0 R15: 0000000000000246 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 #6 [ffff8802338c3d38] ipoib_cm_tx_handler at ffffffffa066fe0a [ib_ipoib] #7 [ffff8802338c3d98] cm_process_work at ffffffffa05149a7 [ib_cm] #8 [ffff8802338c3de8] cm_work_handler at ffffffffa05161aa [ib_cm] #9 [ffff8802338c3e38] worker_thread at ffffffff81090e10 #10 [ffff8802338c3ee8] kthread at ffffffff81096c66 #11 [ffff8802338c3f48] kernel_thread at ffffffff8100c0ca We move the list_del() into ipoib_neigh_free(), so that deletion happens only once, after the entry has been successfully removed from the lookup table. This same behavior is already used in ipoib_del_neighs_by_gid() and __ipoib_reap_neigh(). Signed-off-by: Jim Foraker <foraker1@llnl.gov> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com> Reviewed-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
50bea5c0d583e255e3d8a193f8b20e177bac8ec8	08-May-2013	Andrew Morton <akpm@linux-foundation.org>	drivers/infiniband/hw: rename random32() to prandom_u32() Use preferable function name which implies using a pseudo-random number generator. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cc529c0d72bd2490801ce324caf2ea0a0c1d7b1e	04-Mar-2013	Akinobu Mita <akinobu.mita@gmail.com>	RDMA: Rename random32() to prandom_u32() Use more preferable function name which implies using a pseudo-random number generator. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: Steve Wise <swise@chelsio.com> Cc: linux-rdma@vger.kernel.org Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
1ee9e2aa7b31427303466776f455d43e5e3c9275	26-Feb-2013	Mike Marciniszyn <mike.marciniszyn@intel.com>	IPoIB: Fix send lockup due to missed TX completion Commit f0dc117abdfa ("IPoIB: Fix TX queue lockup with mixed UD/CM traffic") attempts to solve an issue where unprocessed UD send completions can deadlock the netdev. The patch doesn't fully resolve the issue because if more than half the tx_outstanding's were UD and all of the destinations are RC reachable, arming the CQ doesn't solve the issue. This patch uses the IB_CQ_REPORT_MISSED_EVENTS on the ib_req_notify_cq(). If the rc is above 0, the UD send cq completion callback is called directly to re-arm the send completion timer. This issue is seen in very large parallel filesystem deployments and the patch has been shown to correct the issue. Cc: <stable@vger.kernel.org> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
7e5a90c25f89128c096dbdb0e5451962438b1e05	04-Feb-2013	Shlomo Pongratz <shlomop@mellanox.com>	IPoIB: Fix crash due to skb double destruct After commit b13912bbb4a2 ("IPoIB: Call skb_dst_drop() once skb is enqueued for sending"), using connected mode and running multithreaded iperf for long time, ie iperf -c <IP> -P 16 -t 3600 results in a crash. After the above-mentioned patch, the driver is calling skb_orphan() and skb_dst_drop() after calling post_send() in ipoib_cm.c::ipoib_cm_send() (also in ipoib_ib.c::ipoib_send()) The problem with this is, as is written in a comment in both routines, "it's entirely possible that the completion handler will run before we execute anything after the post_send()." This leads to running the skb cleanup routines simultaneously in two different contexts. The solution is to always perform the skb_orphan() and skb_dst_drop() before queueing the send work request. If an error occurs, then it will be no different than the regular case where dev_free_skb_any() in the completion path, which is assumed to be after these two routines. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
b13912bbb4a2d445c35db24586eb6ecf4707d52d	19-Dec-2012	Roland Dreier <roland@purestorage.com>	IPoIB: Call skb_dst_drop() once skb is enqueued for sending Currently, IPoIB delays collecting send completions for TX packets in order to batch work more efficiently. It does skb_orphan() right after queuing the packets so that destructors run early, to avoid problems like holding socket send buffers for too long (since we might not collect a send completion until a long time after the packet is actually sent). However, IPoIB clears IFF_XMIT_DST_RELEASE because it actually looks at skb_dst() to update the PMTU when it gets a too-long packet. This means that the packets sitting in the TX ring with uncollected send completions are holding a reference on the dst. We've seen this lead to pathological behavior with respect to route and neighbour GC. The easy fix for this is to call skb_dst_drop() when we call skb_orphan(). Also, give packets sent via connected mode (CM) the same skb_orphan() / skb_dst_drop() treatment that packets sent via datagram mode get. Signed-off-by: Roland Dreier <roland@purestorage.com>
71d9c5f9e60846fa40c9efadda122d9cf275c1d2	03-Oct-2012	Roland Dreier <roland@purestorage.com>	IPoIB: Fix build with CONFIG_INFINIBAND_IPOIB_CM=n With the new netlink support in commit 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks") we need ipoib_set_mode() to be available even if connected mode isn't built. Move the function from ipoib_cm.c to ipoib_main.c (and make a few CM-related macros available unconditonally). This fixes the build error drivers/built-in.o: In function 'ipoib_changelink': ipoib_netlink.c:(.text+0x6a5fc9): undefined reference to 'ipoib_set_mode' ipoib_netlink.c:(.text+0x6a5fe3): undefined reference to 'ipoib_set_mode' when CONFIG_INFINIBAND_IPOIB_CM isn't set. Reported-by: Randy Dunlap <rdunlap@xenotime.net> Reported-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
862096a8bbf8f992f6d0a1a8786ffd3fc7437e48	27-Sep-2012	Or Gerlitz <ogerlitz@mellanox.com>	IB/ipoib: Add more rtnl_link_ops callbacks Add the rtnl_link_ops changelink and fill_info callbacks, through which the admin can now set/get the driver mode, etc policies. Maintain the proprietary sysfs entries only for legacy childs. For child devices, set dev->iflink to point to the parent device ifindex, such that user space tools can now correctly show the uplink relation as done for vlan, macvlan, etc devices. Pointed out by Patrick McHardy <kaber@trash.net> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
fa16ebed31f336e41970f3f0ea9e8279f6be2d27	13-Aug-2012	Shlomo Pongratz <shlomop@mellanox.com>	IB/ipoib: Add missing locking when CM object is deleted Commit b63b70d87741 ("IPoIB: Use a private hash table for path lookup in xmit path") introduced a bug where in ipoib_cm_destroy_tx() a CM object is moved between lists without any supported locking. Under a stress test, this eventually leads to list corruption and a crash. Previously when this routine was called, callers were taking the device priv lock. Currently this function is called from the RCU callback associated with neighbour deletion. Fix the race by taking the same lock we used to before. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
b63b70d8774175b6f8393c495fe455f0fba55ce1	24-Jul-2012	Shlomo Pongratz <shlomop@mellanox.com>	IPoIB: Use a private hash table for path lookup in xmit path Dave Miller <davem@davemloft.net> provided a detailed description of why the way IPoIB is using neighbours for its own ipoib_neigh struct is buggy: Any time an ipoib_neigh is changed, a sequence like the following is made: spin_lock_irqsave(&priv->lock, flags); /* * It's safe to call ipoib_put_ah() inside * priv->lock here, because we know that * path->ah will always hold one more reference, * so ipoib_put_ah() will never do more than * decrement the ref count. / if (neigh->ah) ipoib_put_ah(neigh->ah); list_del(&neigh->list); ipoib_neigh_free(dev, neigh); spin_unlock_irqrestore(&priv->lock, flags); ipoib_path_lookup(skb, n, dev); This doesn't work, because you're leaving a stale pointer to the freed up ipoib_neigh in the special neigh->ha pointer cookie. Yes, it even fails with all the locking done to protect _changes_ to ipoib_neigh(n), and with the code in ipoib_neigh_free() that NULLs out the pointer. The core issue is that read side calls to to_ipoib_neigh(n) are not being synchronized at all, they are performed without any locking. So whether we hold the lock or not when making changes to ipoib_neigh(n) you still can have threads see references to freed up ipoib_neigh objects. cpu 1 cpu 2 n = ipoib_neigh() ipoib_neigh() = NULL kfree(n) n->foo == OOPS [..] Perhaps the ipoib code can have a private path database it manages entirely itself, which holds all the necessary information and is looked up by some generic key which is available easily at transmit time and does not involve generic neighbour entries. See <http://marc.info/?l=linux-rdma&m=132812793105624&w=2> and <http://marc.info/?l=linux-rdma&w=2&r=1&s=allows+references+to+freed+memory&q=b> for the full discussion. This patch aims to solve the race conditions found in the IPoIB driver. The patch removes the connection between the core networking neighbour structure and the ipoib_neigh structure. In addition to avoiding the race described above, it allows us to handle SKBs carrying IP packets that don't have any associated neighbour. We add an ipoib_neigh hash table with N buckets where the key is the destination hardware address. The ipoib_neigh is fetched from the hash table and instead of the stashed location in the neighbour structure. The hash table uses both RCU and reference counting to guarantee that no ipoib_neigh instance is ever deleted while in use. Fetching the ipoib_neigh structure instance from the hash also makes the special code in ipoib_start_xmit that handles remote and local bonding failover redundant. Aged ipoib_neigh instances are deleted by a garbage collection task that runs every M seconds and deletes every ipoib_neigh instance that was idle for at least 2*M seconds. The deletion is safe since the ipoib_neigh instances are protected using RCU and reference count mechanisms. The number of buckets (N) and frequency of running the GC thread (M), are taken from the exported arb_tbl. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
6700c2709c08d74ae2c3c29b84a30da012dbc7f1	17-Jul-2012	David S. Miller <davem@davemloft.net>	net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}() This will be used so that we can compose a full flow key. Even though we have a route in this context, we need more. In the future the routes will be without destination address, source address, etc. keying. One ipv4 route will cover entire subnets, etc. In this environment we have to have a way to possess persistent storage for redirects and PMTU information. This persistent storage will exist in the FIB tables, and that's why we'll need to be able to rebuild a full lookup flow key here. Using that flow key will do a fib_lookup() and create/update the persistent entry. Signed-off-by: David S. Miller <davem@davemloft.net>
d90f9b3591b3b5fa86178e318008fc1c531a84dc	06-Jul-2012	Roland Dreier <roland@purestorage.com>	IB: Use IS_ENABLED(CONFIG_IPV6) Instead of testing defined(CONFIG_IPV6) \|\| defined(CONFIG_IPV6_MODULE) Signed-off-by: Roland Dreier <roland@purestorage.com>
fec14d2fcebe824377ef0305babc365d039f6b39	30-Aug-2011	Paul Gortmaker <paul.gortmaker@windriver.com>	infiniband: add moduleparam.h to drivers/infiniband as required These files were getting the moduleparam infrastructure from the implicit presence of module.h being everywhere, but that is going away soon. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
9e903e085262ffbf1fc44a17ac06058aca03524a	18-Oct-2011	Eric Dumazet <eric.dumazet@gmail.com>	net: add skb frag size accessors To ease skb->truesize sanitization, its better to be able to localize all references to skb frags size. Define accessors : skb_frag_size() to fetch frag size, and skb_frag_size_{set\|add\|sub}() to manipulate it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
787adb9d6ad9afb498a1580a7d8ad05f779c488a	18-Oct-2011	Dotan Barak <dotanb@dev.mellanox.co.il>	IPoIB: Use the right function to do DMA unmap pages Pages that were mapped using ib_dma_map_page() should be unmapped using ib_dma_unmap_page(). Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>
96104eda01695a26da2c8f7423ec0ba3509c8c97	24-May-2011	Sean Hefty <sean.hefty@intel.com>	RDMA/core: Add SRQ type field Currently, there is only a single ("basic") type of SRQ, but with XRC support we will add a second. Prepare for this by defining an SRQ type and setting all current users to IB_SRQT_BASIC. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
5581be3b48914b0f9126b14daa02d334928322b0	25-Aug-2011	Ian Campbell <Ian.Campbell@citrix.com>	IPoIB: convert to SKB paged frag API. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: linux-rdma@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
3d96c74d8983b16bc7ecb196e61a2173fcc3f09f	19-Apr-2011	Michał Mirosław <mirq-linux@rere.qmqm.pl>	net: infiniband/ulp/ipoib: convert to hw_features Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
948579cd8c6ea7c8c98c52b79f4470952e182ebd	05-Nov-2010	Joe Perches <joe@perches.com>	RDMA: Use vzalloc() to replace vmalloc()+memset(0) Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
8ae31e5b1fc73751d800d551fb30340caa53c7dd	11-Jan-2011	Or Gerlitz <ogerlitz@voltaire.com>	IPoIB: Add GRO support Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
5a0e3ad6af8660be21ca98a971cd00f331318c05	24-Mar-2010	Tejun Heo <tj@kernel.org>	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
a48f509b26cec53338f4b0abd52ecea35e3974b8	04-Mar-2010	Or Gerlitz <ogerlitz@voltaire.com>	IPoIB: Include return code in trace message for ib_post_send() failures Print the return code of ib_post_send() if it fails to make these debugging messages more useful. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
f0dc117abdfa9a0e96c3d013d836460ef3cd08c7	03-Mar-2010	Eli Cohen <eli@mellanox.co.il>	IPoIB: Fix TX queue lockup with mixed UD/CM traffic The IPoIB UD QP reports send completions to priv->send_cq, which is usually left unarmed; it only gets armed when the number of outstanding send requests reaches the size of the TX queue. This arming is done only in the send path for the UD QP. However, when sending CM packets, the net queue may be stopped for the same reasons but no measures are taken to recover the UD path from a lockup. Consider this scenario: a host sends high rate of both CM and UD packets, with a TX queue length of N. If at some time the number of outstanding UD packets is more than N/2 and the overall outstanding packets is N-1, and CM sends a packet (making the number of outstanding sends equal N), the TX queue will be stopped. When all the CM packets complete, the number of outstanding packets will still be higher than N/2 so the TX queue will not be restarted. Fix this by calling ib_req_notify_cq() when the queue is stopped in the CM path. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
3ffe533c87281b68d469b279ff3a5056f9c75862	18-Feb-2010	Alexey Dobriyan <adobriyan@gmail.com>	ipv6: drop unused "dev" arg of icmpv6_send() Dunno, what was the idea, it wasn't used for a long time. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
cd0bcf4cb963a147baf0b79d94c25ba86220f708	06-Sep-2009	Roland Dreier <rolandd@cisco.com>	IPoIB: Remove unused <rdma/ib_cache.h> includes Signed-off-by: Roland Dreier <rolandd@cisco.com>
451f14439847db302e5104c44458b2dbb4b1829d	31-Aug-2009	Eric Dumazet <eric.dumazet@gmail.com>	drivers: Kill now superfluous ->last_rx stores The generic packet receive code takes care of setting netdev->last_rx when necessary, for the sake of the bonding ARP monitor. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Neil Horman <nhorman@txudriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
adf30907d63893e4208dfe3f5c88ae12bc2f25d5	02-Jun-2009	Eric Dumazet <eric.dumazet@gmail.com>	net: skb->dst accessors Define three accessors to get/set dst attached to a skb struct dst_entry skb_dst(const struct sk_buff skb) void skb_dst_set(struct sk_buff skb, struct dst_entry dst) void skb_dst_drop(struct sk_buff *skb) This one should replace occurrences of : dst_release(skb->dst) skb->dst = NULL; Delete skb->dst field Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
26574401fef6766f6c3ca25b5c13febe662d2a32	13-May-2009	Eric W. Biederman <ebiederm@xmission.com>	net: Fix ipoib rtnl_lock sysfs deadlock. Network device sysfs files that grab the rtnl_lock unconditionally will deadlock if accessed when the network device is being unregistered. So use trylock and syscall_restart to avoid this deadlock. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
5b095d98928fdb9e3b75be20a54b7a6cbf6ca9ad	29-Oct-2008	Harvey Harrison <harvey.harrison@gmail.com>	net: replace %p6 with %pI6 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
fcace2fe7a86237c451b09aaf7e2e9d19e09887f	29-Oct-2008	Harvey Harrison <harvey.harrison@gmail.com>	infiniband: ipoib replace IPOIB_GID_FMT with %p6 Replace all uses of IPOIB_GID_FMT, IPOIB_GID_RAW_ARG() and IPOIB_GID_ARG() Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
943c246e9ba9078a61b6bcc5b4a8131ce8befb64	30-Sep-2008	Roland Dreier <rolandd@cisco.com>	IPoIB: Use netif_tx_lock() and get rid of private tx_lock, LLTX Currently, IPoIB is an LLTX driver that uses its own IRQ-disabling tx_lock. Not only do we want to get rid of LLTX, this actually causes problems because of the skb_orphan() done with this tx_lock held: some skb destructors expect to be run with interrupts enabled. The simplest fix for this is to get rid of the driver-private tx_lock and stop using LLTX. We kill off priv->tx_lock and use netif_tx_lock[_bh]() instead; the patch to do this is a tiny bit tricky because we need to update places that take priv->lock inside the tx_lock to disable IRQs, rather than relying on tx_lock having already disabled IRQs. Also, there are a couple of places where we need to disable BHs to make sure we have a consistent context to call netif_tx_lock() (since we no longer can use _irqsave() variants), and we also have to change ipoib_send_comp_handler() to call drain_tx_cq() through a timer rather than directly, because ipoib_send_comp_handler() runs in interrupt context and drain_tx_cq() must run in BH context so it can call netif_tx_lock(). Signed-off-by: Roland Dreier <rolandd@cisco.com>
b1404069f64457c94de241738fdca142c2e5698f	09-Aug-2008	David J. Wilder <dwilder@us.ibm.com>	IPoIB/cm: Use vmalloc() to allocate rx_rings There are users that are running UDP applications that require a large receive queue size in order to get good performance. To prevent allocation failures for rx_rings when using non-SRQ mode and large recv_queue_size (1K or larger), use vmalloc() instead of kcalloc() to alocate rx_rings. Signed-off-by: David Wilder <dwilder@us.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
e08198169ec5facb3d85bb455efa44a2f8327842	30-Jul-2008	Roland Dreier <rolandd@cisco.com>	IPoIB/cm: Set correct SG list in ipoib_cm_init_rx_wr() wr->sg_list should be set to the sge pointer passed in, not priv->cm.rx_sge. Reported-by: Hoang-Nam Nguyen <HNGUYEN@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
e112373fd6aa280bd2cbc0d5cc3809115325a1be	15-Jul-2008	Eli Cohen <eli@dev.mellanox.co.il>	IPoIB/cm: Reduce connected mode TX object size Since IPoIB connected mode does not NETIF_F_SG, we only have one DMA mapping per send, so we don't need a mapping[] array. Define a new struct with a single u64 mapping member and use it for the CM tx_ring. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
bd3606715effbf37df986548c43bbed0842b49d5	15-Jul-2008	Eli Cohen <eli@mellanox.co.il>	IPoIB: Use dev_set_mtu() to change mtu When the driver sets the MTU of the net device outside of its change_mtu method, it should make use of dev_set_mtu() instead of directly setting the mtu field of struct netdevice. Otherwise functions registered to be called upon MTU change will not get called (this is done through call_netdevice_notifiers() in dev_set_mtu()). Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
c8c2afe360b7366f586f6bece1109a72ea334876	15-Jul-2008	Eli Cohen <eli@mellanox.co.il>	IPoIB: Use rtnl lock/unlock when changing device flags Use of this lock is required to synchronize changes to the netdvice's data structs. Also move the call to ipoib_flush_paths() after the modification of the netdevice flags in set_mode(). Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
a7d834c4bc6be73e8f83eaa5072fac3c5549f7f2	15-Jul-2008	Roland Dreier <rolandd@cisco.com>	IPoIB/cm: Fix racy use of receive WR/SGL in ipoib_cm_post_receive_nonsrq() For devices that don't support SRQs, ipoib_cm_post_receive_nonsrq() is called from both ipoib_cm_handle_rx_wc() and ipoib_cm_nonsrq_init_rx(), and these two callers are not synchronized against each other. However, ipoib_cm_post_receive_nonsrq() always reuses the same receive work request and scatter list structures, so multiple callers can end up stepping on each other, which leads to posting garbled work requests. Fix this by having the caller pass in the ib_recv_wr and ib_sge structures to use, and allocating new local structures in ipoib_cm_nonsrq_init_rx(). Based on a patch by Pradeep Satyanarayana <pradeep@us.ibm.com> and David Wilder <dwilder@us.ibm.com>, with debugging help from Hoang-Nam Nguyen <hnguyen@de.ibm.com>. Signed-off-by: Roland Dreier <rolandd@cisco.com>
f89271da32bc1a636cf4eb078e615930886cd013	15-Jul-2008	Eli Cohen <eli@dev.mellanox.co.il>	IPoIB: Copy small received SKBs in connected mode The connected mode implementation in the IPoIB driver has a large overhead in the way SKBs are handled in the receive flow. It usually allocates an SKB with as big as was used in the currently received SKB and moves unused fragments from the old SKB to the new one. This involves a loop on all the remaining fragments and incurs overhead on the CPU. This patch, for small SKBs, allocates an SKB just large enough to contain the received data and copies to it the data from the received SKB. The newly allocated SKB is passed to the stack and the old SKB is reposted. When running netperf, UDP small messages, without this pach I get: UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 14.4.3.178 (14.4.3.178) port 0 AF_INET Socket Message Elapsed Messages Size Size Time Okay Errors Throughput bytes bytes secs # # 10^6bits/sec 114688 128 10.00 5142034 0 526.31 114688 10.00 1130489 115.71 With this patch I get both send and receive at ~315 mbps. The reason that send performance actually slows down is as follows: When using this patch, the overhead of the CPU for handling RX packets is dramatically reduced. As a result, we do not experience RNR NAK messages from the receiver which cause the connection to be closed and reopened again; when the patch is not used, the receiver cannot handle the packets fast enough so there is less time to post new buffers and hence the mentioned RNR NACKs. So what happens is that the application thinks it posted a certain number of packets for transmission but these packets are flushed and do not really get transmitted. Since the connection gets opened and closed many times, each time netperf gets the CPU time that otherwise would have been given to IPoIB to actually transmit the packets. This can be verified when looking at the port counters -- the output of ifconfig and the oputput of netperf (this is for the case without the patch): tx packets ========== port counter: 1,543,996 ifconfig: 1,581,426 netperf: 5,142,034 rx packets ========== netperf 1,1304,089 Signed-off-by: Eli Cohen <eli@mellanox.co.il>