• Home
  • History
  • Annotate
  • only in /drivers/infiniband/ulp/ipoib/
History log of /drivers/infiniband/ulp/ipoib/
Revision Date Author Comments (<<< Hide modified files) (Show modified files >>>)
d927d505c59a0c7353343174e6225c43c61fba6d 11-Jan-2012 Or Gerlitz <ogerlitz@mellanox.com> IB: Change CQE "csum_ok" field to a bit flag

Use a bit in wc_flags rather then a whole integer to hold the
"checksum OK" flag. By itself, this change doesn't reduce the size of
struct ib_wc on 64bit machines -- it stays on 56 bytes because of
padding. However, it will allow to add more fields in the future
without enlarging the struct. Also, it will let us have a unified
approach with future libibverbs checksum offload reporting, because a
bit flag doesn't break the library ABI.

This patch was suggested during conversation with Liran Liss
<liranl@mellanox.com>.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
poib_ib.c
936d7de3d736e0737542641269436f4b5968e9ef 07-Feb-2012 Roland Dreier <roland@purestorage.com> IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses

Commit a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound
explicit.") made it possible for a netdev driver to use skb->cb
between its header_ops.create method and its .ndo_start_xmit
method. Use this in ipoib_hard_header() to stash away the LL address
(GID + QPN), instead of the "ipoib_pseudoheader" hack. This allows
IPoIB to stop lying about its hard_header_len, which will let us fix
the L2 check for GRO.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib.h
poib_main.c
poib_multicast.c
17e6abeec4cb8df1e33ea0e2b889586c731a68be 02-Dec-2011 David Miller <davem@davemloft.net> infiniband: ipoib: Sanitize neighbour handling in ipoib_main.c

Reduce the number of dst_get_neighbour_noref() calls within a single
call chain. Primarily by passing the neighbour pointer down to the
helper functions.

Handle dst_get_neighbour_noref() returning NULL in ipoib_start_xmit()
by incrementing the dropped counter and freeing the packet. We don't
want it to fall through into the ARP/RARP/multicast handling, since
that should only happen when skb_dst() is NULL.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Roland Dreier <roland@purestorage.com>
poib_main.c
2721745501a26d0dc3b88c0d2f3aa11471891388 02-Dec-2011 David Miller <davem@davemloft.net> net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}.

To reflect the fact that a refrence is not obtained to the
resulting neighbour entry.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Roland Dreier <roland@purestorage.com>
poib_main.c
poib_multicast.c
b3613118eb30a589d971e4eccbbb2a1314f5dfd4 02-Dec-2011 David S. Miller <davem@davemloft.net> Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
596b9b68ef118f7409afbc78487263e08ef96261 25-Jul-2011 David Miller <davem@davemloft.net> neigh: Add infrastructure for allocating device neigh privates.

netdev->neigh_priv_len records the private area length.

This will trigger for neigh_table objects which set tbl->entry_size
to zero, and the first instances of this will be forthcoming.

Signed-off-by: David S. Miller <davem@davemloft.net>
poib_main.c
a493f1a24a496711d96b91c4dc0a1bd35eb6954b 30-Nov-2011 Roland Dreier <roland@purestorage.com> Merge branches 'cxgb4', 'ipoib', 'misc' and 'qib' into for-next
580da35a31f91a594f3090b7a2c39b85cb051a12 29-Nov-2011 Eric Dumazet <eric.dumazet@gmail.com> IB: Fix RCU lockdep splats

Commit f2c31e32b37 ("net: fix NULL dereferences in check_peer_redir()")
forgot to take care of infiniband uses of dst neighbours.

Many thanks to Marc Aurele who provided a nice bug report and feedback.

Reported-by: Marc Aurele La France <tsi@ualberta.ca>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
poib_main.c
poib_multicast.c
3874397c0bdec3c21ce071711cd105165179b8eb 21-Nov-2011 Mike Marciniszyn <mike.marciniszyn@qlogic.com> IB/ipoib: Prevent hung task or softlockup processing multicast response

This following can occur with ipoib when processing a multicast reponse:

BUG: soft lockup - CPU#0 stuck for 67s! [ib_mad1:982]
Modules linked in: ...
CPU 0:
Modules linked in: ...
Pid: 982, comm: ib_mad1 Not tainted 2.6.32-131.0.15.el6.x86_64 #1 ProLiant DL160 G5
RIP: 0010:[<ffffffff814ddb27>] [<ffffffff814ddb27>] _spin_unlock_irqrestore+0x17/0x20
RSP: 0018:ffff8802119ed860 EFLAGS: 00000246
0000000000000004 RBX: ffff8802119ed860 RCX: 000000000000a299
RDX: ffff88021086c700 RSI: 0000000000000246 RDI: 0000000000000246
RBP: ffffffff8100bc8e R08: ffff880210ac229c R09: 0000000000000000
R10: ffff88021278aab8 R11: 0000000000000000 R12: ffff8802119ed860
R13: ffffffff8100be6e R14: 0000000000000001 R15: 0000000000000003
FS: 0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000006d4840 CR3: 0000000209aa5000 CR4: 00000000000406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffffa032c247>] ? ipoib_mcast_send+0x157/0x480 [ib_ipoib]
[<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
[<ffffffffa03283d4>] ? ipoib_path_lookup+0x124/0x2d0 [ib_ipoib]
[<ffffffffa03286fc>] ? ipoib_start_xmit+0x17c/0x430 [ib_ipoib]
[<ffffffff8141e758>] ? dev_hard_start_xmit+0x2c8/0x3f0
[<ffffffff81439d0a>] ? sch_direct_xmit+0x15a/0x1c0
[<ffffffff81423098>] ? dev_queue_xmit+0x388/0x4d0
[<ffffffffa032d6b7>] ? ipoib_mcast_join_finish+0x2c7/0x510 [ib_ipoib]
[<ffffffffa032dab8>] ? ipoib_mcast_sendonly_join_complete+0x1b8/0x1f0 [ib_ipoib]
[<ffffffffa02a0946>] ? mcast_work_handler+0x1a6/0x710 [ib_sa]
[<ffffffffa015f01e>] ? ib_send_mad+0xfe/0x3c0 [ib_mad]
[<ffffffffa00f6c93>] ? ib_get_cached_lmc+0xa3/0xb0 [ib_core]
[<ffffffffa02a0f9b>] ? join_handler+0xeb/0x200 [ib_sa]
[<ffffffffa029e4fc>] ? ib_sa_mcmember_rec_callback+0x5c/0xa0 [ib_sa]
[<ffffffffa029e79c>] ? recv_handler+0x3c/0x70 [ib_sa]
[<ffffffffa01603a4>] ? ib_mad_completion_handler+0x844/0x9d0 [ib_mad]
[<ffffffffa015fb60>] ? ib_mad_completion_handler+0x0/0x9d0 [ib_mad]
[<ffffffff81088830>] ? worker_thread+0x170/0x2a0
[<ffffffff8108e160>] ? autoremove_wake_function+0x0/0x40
[<ffffffff810886c0>] ? worker_thread+0x0/0x2a0
[<ffffffff8108ddf6>] ? kthread+0x96/0xa0
[<ffffffff8100c1ca>] ? child_rip+0xa/0x20

Coinciding with stack trace is the following message:

ib0: ib_address_create failed

The code below in ipoib_mcast_join_finish() will note the above
failure in the address handle but otherwise continue:

ah = ipoib_create_ah(dev, priv->pd, &av);
if (!ah) {
ipoib_warn(priv, "ib_address_create failed\n");
} else {

The while loop at the bottom of ipoib_mcast_join_finish() will attempt
to send queued multicast packets in mcast->pkt_queue and eventually
end up in ipoib_mcast_send():

if (!mcast->ah) {
if (skb_queue_len(&mcast->pkt_queue) < IPOIB_MAX_MCAST_QUEUE)
skb_queue_tail(&mcast->pkt_queue, skb);
else {
++dev->stats.tx_dropped;
dev_kfree_skb_any(skb);
}

My read is that the code will requeue the packet and return to the
ipoib_mcast_join_finish() while loop and the stage is set for the
"hung" task diagnostic as the while loop never sees a non-NULL ah, and
will do nothing to resolve.

There are GFP_ATOMIC allocates in the provider routines, so this is
possible and should be dealt with.

The test that induced the failure is associated with a host SM on the
same server during a shutdown.

This patch causes ipoib_mcast_join_finish() to exit with an error
which will flush the queued mcast packets. Nothing is done to unwind
the QP attached state so that subsequent sends from above will retry
the join.

Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
Reviewed-by: Gary Leshner <gary.leshner@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
poib_ib.c
poib_main.c
poib_multicast.c
9ca36f7db29a1e4bde58fa7cf98b542c032b7180 17-Nov-2011 David S. Miller <davem@davemloft.net> infiniband: Update net drivers for netdev_features_t changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
poib_main.c
32aaeffbd4a7457bf2f7448b33b5946ff2a960eb 07-Nov-2011 Linus Torvalds <torvalds@linux-foundation.org> Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux

* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
Revert "tracing: Include module.h in define_trace.h"
irq: don't put module.h into irq.h for tracking irqgen modules.
bluetooth: macroize two small inlines to avoid module.h
ip_vs.h: fix implicit use of module_get/module_put from module.h
nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
include: replace linux/module.h with "struct module" wherever possible
include: convert various register fcns to macros to avoid include chaining
crypto.h: remove unused crypto_tfm_alg_modname() inline
uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
pm_runtime.h: explicitly requires notifier.h
linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
miscdevice.h: fix up implicit use of lists and types
stop_machine.h: fix implicit use of smp.h for smp_processor_id
of: fix implicit use of errno.h in include/linux/of.h
of_platform.h: delete needless include <linux/module.h>
acpi: remove module.h include from platform/aclinux.h
miscdevice.h: delete unnecessary inclusion of module.h
device_cgroup.h: delete needless include <linux/module.h>
net: sch_generic remove redundant use of <linux/module.h>
net: inet_timewait_sock doesnt need <linux/module.h>
...

Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
- drivers/media/dvb/frontends/dibx000_common.c
- drivers/media/video/{mt9m111.c,ov6650.c}
- drivers/mfd/ab3550-core.c
- include/linux/dmaengine.h
f470f8d4e702593ee1d0852871ad80373bce707b 01-Nov-2011 Linus Torvalds <torvalds@linux-foundation.org> Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (62 commits)
mlx4_core: Deprecate log_num_vlan module param
IB/mlx4: Don't set VLAN in IBoE WQEs' control segment
IB/mlx4: Enable 4K mtu for IBoE
RDMA/cxgb4: Mark QP in error before disabling the queue in firmware
RDMA/cxgb4: Serialize calls to CQ's comp_handler
RDMA/cxgb3: Serialize calls to CQ's comp_handler
IB/qib: Fix issue with link states and QSFP cables
IB/mlx4: Configure extended active speeds
mlx4_core: Add extended port capabilities support
IB/qib: Hold links until tuning data is available
IB/qib: Clean up checkpatch issue
IB/qib: Remove s_lock around header validation
IB/qib: Precompute timeout jiffies to optimize latency
IB/qib: Use RCU for qpn lookup
IB/qib: Eliminate divide/mod in converting idx to egr buf pointer
IB/qib: Decode path MTU optimization
IB/qib: Optimize RC/UC code by IB operation
IPoIB: Use the right function to do DMA unmap pages
RDMA/cxgb4: Use correct QID in insert_recv_cqe()
RDMA/cxgb4: Make sure flush CQ entries are collected on connection close
...
504255f8d0480cf293962adf4bc3aecac645ae71 01-Nov-2011 Roland Dreier <roland@purestorage.com> Merge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'fdr', 'ipath', 'ipoib', 'misc', 'mlx4', 'misc', 'nes', 'qib' and 'xrc' into for-next
fec14d2fcebe824377ef0305babc365d039f6b39 30-Aug-2011 Paul Gortmaker <paul.gortmaker@windriver.com> infiniband: add moduleparam.h to drivers/infiniband as required

These files were getting the moduleparam infrastructure from the
implicit presence of module.h being everywhere, but that is going
away soon.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
poib_cm.c
poib_ib.c
poib_multicast.c
b108d9764cff25262bf764542ed1998d3e568962 27-May-2011 Paul Gortmaker <paul.gortmaker@windriver.com> infiniband: add in export.h for files using EXPORT_SYMBOL/THIS_MODULE

These were getting it implicitly via device.h --> module.h but
we are going to stop that when we clean up the headers.

Fix these in advance so the tree remains biscect-clean.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
poib_fs.c
9e903e085262ffbf1fc44a17ac06058aca03524a 18-Oct-2011 Eric Dumazet <eric.dumazet@gmail.com> net: add skb frag size accessors

To ease skb->truesize sanitization, its better to be able to localize
all references to skb frags size.

Define accessors : skb_frag_size() to fetch frag size, and
skb_frag_size_{set|add|sub}() to manipulate it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib_cm.c
poib_ib.c
787adb9d6ad9afb498a1580a7d8ad05f779c488a 18-Oct-2011 Dotan Barak <dotanb@dev.mellanox.co.il> IPoIB: Use the right function to do DMA unmap pages

Pages that were mapped using ib_dma_map_page() should be unmapped
using ib_dma_unmap_page().

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
poib_cm.c
96104eda01695a26da2c8f7423ec0ba3509c8c97 24-May-2011 Sean Hefty <sean.hefty@intel.com> RDMA/core: Add SRQ type field

Currently, there is only a single ("basic") type of SRQ, but with XRC
support we will add a second. Prepare for this by defining an SRQ type
and setting all current users to IB_SRQT_BASIC.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
poib_cm.c
e36fb88a9a0fb8ac4b87c8ac709214a408de6d97 04-Oct-2011 Marcel Apfelbaum <marcela@dev.mellanox.co.il> IPoIB: Handle extended rates in debugfs

Use new function ib_rate_to_mbps() to handle printing rate in debugfs,
so that we handle extended rates.

Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
poib_fs.c
8decf868790b48a727d7e7ca164f2bcd3c1389c0 22-Sep-2011 David S. Miller <davem@davemloft.net> Merge branch 'master' of github.com:davem330/net

Conflicts:
MAINTAINERS
drivers/net/Kconfig
drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
drivers/net/ethernet/broadcom/tg3.c
drivers/net/wireless/iwlwifi/iwl-pci.c
drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c
drivers/net/wireless/rt2x00/rt2800usb.c
drivers/net/wireless/wl12xx/main.c
5581be3b48914b0f9126b14daa02d334928322b0 25-Aug-2011 Ian Campbell <Ian.Campbell@citrix.com> IPoIB: convert to SKB paged frag API.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux-rdma@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
poib_cm.c
poib_ib.c
afc4b13df143122f99a0eb10bfefb216c2806de0 16-Aug-2011 Jiri Pirko <jpirko@redhat.com> net: remove use of ndo_set_multicast_list in drivers

replace it by ndo_set_rx_mode

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib_main.c
22cfb0bf6721bb1f865f67bc21e3c36c272faf36 16-Aug-2011 Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> IPoIB: Fix possible NULL dereference in ipoib_start_xmit()

Fix a bug introduced in 69cce1d14049 ("net: Abstract dst->neighbour
accesses behind helpers.") where we might dereference skb_dst(skb)
even if it is NULL, which causes:

[ 240.944030] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
[ 240.948007] IP: [<ffffffffa0366ce9>] ipoib_start_xmit+0x39/0x280 [ib_ipoib]
[...]
[ 240.948007] Call Trace:
[ 240.948007] <IRQ>
[ 240.948007] [<ffffffff812cd5e0>] dev_hard_start_xmit+0x2a0/0x590
[ 240.948007] [<ffffffff8131f680>] ? arp_create+0x70/0x200
[ 240.948007] [<ffffffff812e8e1f>] sch_direct_xmit+0xef/0x1c0

Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=41212
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: Roland Dreier <roland@purestorage.com>
poib_main.c
60063497a95e716c9a689af3be2687d261f115b4 27-Jul-2011 Arun Sharma <asharma@fb.com> atomic: use <linux/atomic.h>

This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>

Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
poib.h
69cce1d1404968f78b177a0314f5822d5afdbbfb 18-Jul-2011 David S. Miller <davem@davemloft.net> net: Abstract dst->neighbour accesses behind helpers.

dst_{get,set}_neighbour()

Signed-off-by: David S. Miller <davem@davemloft.net>
poib_main.c
poib_multicast.c
3d96c74d8983b16bc7ecb196e61a2173fcc3f09f 19-Apr-2011 Michał Mirosław <mirq-linux@rere.qmqm.pl> net: infiniband/ulp/ipoib: convert to hw_features

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib.h
poib_cm.c
poib_ethtool.c
poib_ib.c
poib_main.c
6a108a14fa356ef607be308b68337939e56ea94e 20-Jan-2011 David Rientjes <rientjes@google.com> kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT

The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option
is used to configure any non-standard kernel with a much larger scope than
only small devices.

This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes
references to the option throughout the kernel. A new CONFIG_EMBEDDED
option is added that automatically selects CONFIG_EXPERT when enabled and
can be used in the future to isolate options that should only be
considered for embedded systems (RISC architectures, SLOB, etc).

Calling the option "EXPERT" more accurately represents its intention: only
expert users who understand the impact of the configuration changes they
are making should enable it.

Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: David Woodhouse <david.woodhouse@intel.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Greg KH <gregkh@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Robin Holt <holt@sgi.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
config
948579cd8c6ea7c8c98c52b79f4470952e182ebd 05-Nov-2010 Joe Perches <joe@perches.com> RDMA: Use vzalloc() to replace vmalloc()+memset(0)

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
poib_main.c
8ae31e5b1fc73751d800d551fb30340caa53c7dd 11-Jan-2011 Or Gerlitz <ogerlitz@voltaire.com> IPoIB: Add GRO support

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
poib_ib.c
poib_main.c
19e364f6801e38972673278adedaab1abf6f854c 11-Jan-2011 Or Gerlitz <ogerlitz@voltaire.com> IPoIB: Remove LRO support

As a first step in moving from LRO to GRO, revert commit af40da894e9
("IPoIB: add LRO support"). Also eliminate the ethtool set_flags
callback which isn't needed anymore. Finally, we need to include
<linux/sched.h> directly to get the declaration of restart_syscall()
(which used to be included implicitly through <linux/inet_lro.h>).

Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
config
poib.h
poib_ethtool.c
poib_ib.c
poib_main.c
9e5fca251f44832cb996961048ea977f80faf6ea 27-Oct-2010 Linus Torvalds <torvalds@linux-foundation.org> Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (63 commits)
IB/qib: clean up properly if pci_set_consistent_dma_mask() fails
IB/qib: Allow driver to load if PCIe AER fails
IB/qib: Fix uninitialized pointer if CONFIG_PCI_MSI not set
IB/qib: Fix extra log level in qib_early_err()
RDMA/cxgb4: Remove unnecessary KERN_<level> use
RDMA/cxgb3: Remove unnecessary KERN_<level> use
IB/core: Add link layer type information to sysfs
IB/mlx4: Add VLAN support for IBoE
IB/core: Add VLAN support for IBoE
IB/mlx4: Add support for IBoE
mlx4_en: Change multicast promiscuous mode to support IBoE
mlx4_core: Update data structures and constants for IBoE
mlx4_core: Allow protocol drivers to find corresponding interfaces
IB/uverbs: Return link layer type to userspace for query port operation
IB/srp: Sync buffer before posting send
IB/srp: Use list_first_entry()
IB/srp: Reduce number of BUSY conditions
IB/srp: Eliminate two forward declarations
IB/mlx4: Signal node desc changes to SM by using FW to generate trap 144
IB: Replace EXTRA_CFLAGS with ccflags-y
...
732eacc0542d0aa48797f675888b85d6065af837 26-Oct-2010 Hagen Paul Pfeifer <hagen@jauu.net> replace nested max/min macros with {max,min}3 macro

Use the new {max,min}3 macros to save some cycles and bytes on the stack.
This patch substitutes trivial nested macros with their counterpart.

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Joe Perches <joe@perches.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
poib_main.c
116e9535fe5e00bafab7a637f306b110cf95cff5 27-Oct-2010 Roland Dreier <rolandd@cisco.com> Merge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'ehca', 'iboe', 'ipoib', 'misc', 'mlx4', 'nes', 'qib' and 'srp' into for-next
c3aa9b186b95025d4ba4e90d6140c9887dfaae0a 20-Sep-2010 Eli Cohen <eli@dev.mellanox.co.il> IPoIB: Set dev_id field of net_device

Use the net device's dev_id field to encode the port number of the pci
device. This can be used to to associate a net device with the pci
device's port. The encoding is: dev_id = port - 1.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
7b4c876961ad6ddcfacd69b25fe7e13ff41fe322 28-Sep-2010 Eli Cohen <eli@mellanox.co.il> IPoIB: Skip IBoE ports

IPoIB is IP-over-Infiniband link layer. In the case of IBoE, the link
layer is Ethernet and IP can work directly over Ethernet, so disable
IPoIB for non-IB_LINK_LAYER_INFINIBAND ports.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
fed1db33fe85573487a4732d628ac5afdb5dc776 27-Aug-2010 Christoph Lameter <cl@linux.com> IPoIB: Set pkt_type correctly for multicast packets (fix IGMP breakage)

IGMP processing is broken because the IPOIB does not set the
skb->pkt_type the right way for multicast traffic. All incoming
packets are set to PACKET_HOST which means that igmp_recv() will
ignore the IGMP broadcasts/multicasts.

This in turn means that the IGMP timers are firing and are sending
information about multicast subscriptions unnecessarily. In a large
private network this can cause traffic spikes.

Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_ib.c
6ba74014c1ab0e37af7de6f64b4eccbbae3cb9e7 04-Aug-2010 Linus Torvalds <torvalds@linux-foundation.org> Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1443 commits)
phy/marvell: add 88ec048 support
igb: Program MDICNFG register prior to PHY init
e1000e: correct MAC-PHY interconnect register offset for 82579
hso: Add new product ID
can: Add driver for esd CAN-USB/2 device
l2tp: fix export of header file for userspace
can-raw: Fix skb_orphan_try handling
Revert "net: remove zap_completion_queue"
net: cleanup inclusion
phy/marvell: add 88e1121 interface mode support
u32: negative offset fix
net: Fix a typo from "dev" to "ndev"
igb: Use irq_synchronize per vector when using MSI-X
ixgbevf: fix null pointer dereference due to filter being set for VLAN 0
e1000e: Fix irq_synchronize in MSI-X case
e1000e: register pm_qos request on hardware activation
ip_fragment: fix subtracting PPPOE_SES_HLEN from mtu twice
net: Add getsockopt support for TCP thin-streams
cxgb4: update driver version
cxgb4: add new PCI IDs
...

Manually fix up conflicts in:
- drivers/net/e1000e/netdev.c: due to pm_qos registration
infrastructure changes
- drivers/net/phy/marvell.c: conflict between adding 88ec048 support
and cleaning up the IDs
- drivers/net/wireless/ipw2x00/ipw2100.c: trivial ipw2100_pm_qos_req
conflict (registration change vs marking it static)
7a52b34b07122ff5f45258d47f260f8a525518f0 06-Jun-2010 Or Gerlitz <ogerlitz@voltaire.com> IPoIB: Fix world-writable child interface control sysfs attributes

Sumeet Lahorani <sumeet.lahorani@oracle.com> reported that the IPoIB
child entries are world-writable; however we don't want ordinary users
to be able to create and destroy child interfaces, so fix them to be
writable only by root.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
39827be26b36ef9cdbc661c92a269e0484cd9ef5 03-Jul-2010 Ben Hutchings <bhutchings@solarflare.com> IB/{nes, ipoib}: Pass supported flags to ethtool_op_set_flags()

Following commit 1437ce3983bcbc0447a0dedcd644c14fe833d266 "ethtool:
Change ethtool_op_set_flags to validate flags", ethtool_op_set_flags
takes a third parameter and cannot be used directly as an
implementation of ethtool_ops::set_flags.

Changes nes and ipoib driver to pass in the appropriate value.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib_ethtool.c
f8965467f366fd18f01feafb5db10512d7b4422c 21-May-2010 Linus Torvalds <torvalds@linux-foundation.org> Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1674 commits)
qlcnic: adding co maintainer
ixgbe: add support for active DA cables
ixgbe: dcb, do not tag tc_prio_control frames
ixgbe: fix ixgbe_tx_is_paused logic
ixgbe: always enable vlan strip/insert when DCB is enabled
ixgbe: remove some redundant code in setting FCoE FIP filter
ixgbe: fix wrong offset to fc_frame_header in ixgbe_fcoe_ddp
ixgbe: fix header len when unsplit packet overflows to data buffer
ipv6: Never schedule DAD timer on dead address
ipv6: Use POSTDAD state
ipv6: Use state_lock to protect ifa state
ipv6: Replace inet6_ifaddr->dead with state
cxgb4: notify upper drivers if the device is already up when they load
cxgb4: keep interrupts available when the ports are brought down
cxgb4: fix initial addition of MAC address
cnic: Return SPQ credit to bnx2x after ring setup and shutdown.
cnic: Convert cnic_local_flags to atomic ops.
can: Fix SJA1000 command register writes on SMP systems
bridge: fix build for CONFIG_SYSFS disabled
ARCNET: Limit com20020 PCI ID matches for SOHARD cards
...

Fix up various conflicts with pcmcia tree drivers/net/
{pcmcia/3c589_cs.c, wireless/orinoco/orinoco_cs.c and
wireless/orinoco/spectrum_cs.c} and feature removal
(Documentation/feature-removal-schedule.txt).

Also fix a non-content conflict due to pm_qos_requirement getting
renamed in the PM tree (now pm_qos_request) in net/mac80211/scan.c
d414371795d54fa916938f948105d08928abfbb9 04-Mar-2010 Or Gerlitz <ogerlitz@voltaire.com> IPoIB: Allow disabling/enabling TSO on the fly through ethtool

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_ethtool.c
871039f02f8ec4ab2e5e9010718caa8e085786f1 11-Apr-2010 David S. Miller <davem@davemloft.net> Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:
drivers/net/stmmac/stmmac_main.c
drivers/net/wireless/wl12xx/wl1271_cmd.c
drivers/net/wireless/wl12xx/wl1271_main.c
drivers/net/wireless/wl12xx/wl1271_spi.c
net/core/ethtool.c
net/mac80211/scan.c
22bedad3ce112d5ca1eaf043d4990fa2ed698c87 01-Apr-2010 Jiri Pirko <jpirko@redhat.com> net: convert multicast list to list_head

Converts the list and the core manipulating with it to be the same as uc_list.

+uses two functions for adding/removing mc address (normal and "global"
variant) instead of a function parameter.
+removes dev_mcast.c completely.
+exposes netdev_hw_addr_list_* macros along with __hw_addr_* functions for
manipulation with lists on a sandbox (used in bonding and 80211 drivers)

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib_multicast.c
5a0e3ad6af8660be21ca98a971cd00f331318c05 24-Mar-2010 Tejun Heo <tj@kernel.org> include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
poib_cm.c
poib_fs.c
poib_ib.c
poib_multicast.c
poib_verbs.c
poib_vlan.c
3e4aa12f8a81506c44f04b4f0eb7663981c5a282 22-Mar-2010 Jiri Pirko <jpirko@redhat.com> ipoib: remove addrlen check for mc addresses

Finally this bit can be removed. Currently, after the bonding driver is
changed/fixed (32a806c194ea112cfab00f558482dd97bee5e44e net-next-2.6),
that's not possible for an addr with different length than dev->addr_len
to be present in list. Removing this check as in new mc_list there will be
no addrlen in the record.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib_multicast.c
a48f509b26cec53338f4b0abd52ecea35e3974b8 04-Mar-2010 Or Gerlitz <ogerlitz@voltaire.com> IPoIB: Include return code in trace message for ib_post_send() failures

Print the return code of ib_post_send() if it fails to make these
debugging messages more useful.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
poib_ib.c
f0dc117abdfa9a0e96c3d013d836460ef3cd08c7 03-Mar-2010 Eli Cohen <eli@mellanox.co.il> IPoIB: Fix TX queue lockup with mixed UD/CM traffic

The IPoIB UD QP reports send completions to priv->send_cq, which is
usually left unarmed; it only gets armed when the number of
outstanding send requests reaches the size of the TX queue. This
arming is done only in the send path for the UD QP. However, when
sending CM packets, the net queue may be stopped for the same reasons
but no measures are taken to recover the UD path from a lockup.

Consider this scenario: a host sends high rate of both CM and UD
packets, with a TX queue length of N. If at some time the number of
outstanding UD packets is more than N/2 and the overall outstanding
packets is N-1, and CM sends a packet (making the number of
outstanding sends equal N), the TX queue will be stopped. When all
the CM packets complete, the number of outstanding packets will still
be higher than N/2 so the TX queue will not be restarted.

Fix this by calling ib_req_notify_cq() when the queue is stopped in
the CM path.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
3ff1562ea48cddaa5ac1adcb8892227389a4c96c 03-Mar-2010 Linus Torvalds <torvalds@linux-foundation.org> Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (48 commits)
IB/srp: Clean up error path in srp_create_target_ib()
IB/srp: Split send and recieve CQs to reduce number of interrupts
RDMA/nes: Add support for KR device id 0x0110
IB/uverbs: Use anon_inodes instead of private infinibandeventfs
IB/core: Fix and clean up ib_ud_header_init()
RDMA/cxgb3: Mark RDMA device with CXIO_ERROR_FATAL when removing
RDMA/cxgb3: Don't allocate the SW queue for user mode CQs
RDMA/cxgb3: Increase the max CQ depth
RDMA/cxgb3: Doorbell overflow avoidance and recovery
IB/core: Pack struct ib_device a little tighter
IB/ucm: Clean whitespace errors
IB/ucm: Increase maximum devices supported
IB/ucm: Use stack variable 'base' in ib_ucm_add_one
IB/ucm: Use stack variable 'devnum' in ib_ucm_add_one
IB/umad: Clean whitespace
IB/umad: Increase maximum devices supported
IB/umad: Use stack variable 'base' in ib_umad_init_port
IB/umad: Use stack variable 'devnum' in ib_umad_init_port
IB/umad: Remove port_table[]
IB/umad: Convert *cdev to cdev in struct ib_umad_port
...
6c74651c3bce418d3b29edfdeb72664f9441509a 27-Feb-2010 Jiri Pirko <jpirko@redhat.com> ipoib: returned back addrlen check for mc addresses

Apparently bogus mc address can break IPOIB multicast processing. Therefore
returning the check for addrlen back until this is resolved in bonding (I don't
see any other point from where mc address with non-dev->addr_len length can came
from).

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib_multicast.c
fbf219f1c89b15e90ec2db5a3e9636376dc623db 24-Feb-2010 Jiri Pirko <jpirko@redhat.com> infiniband: convert to use netdev_for_each_mc_addr

Due to the loop complexicity in nes_nic.c, I'm using char* to copy mc addresses
to it.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib_multicast.c
3ffe533c87281b68d469b279ff3a5056f9c75862 18-Feb-2010 Alexey Dobriyan <adobriyan@gmail.com> ipv6: drop unused "dev" arg of icmpv6_send()

Dunno, what was the idea, it wasn't used for a long time.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib_cm.c
757bebb3f989f10acc6f105e89305b0d19aa7c55 12-Feb-2010 Or Gerlitz <ogerlitz@voltaire.com> IPoIB: Remove TX moderation settings from ethtool support

As of commit f56bcd8 ("IPoIB: Use separate CQ for UD send
completions"), there are no TX interrupts. Change the ethtool code
not to report TX moderation settings, so users will not be misled to
think they can control TX interrupt moderation. Pointed out by Alex
Vainman <alexv@voltaire.com>

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_ethtool.c
0cd4d0fd9b0a4e10c091fc6316d1bf92885dcd9c 09-Dec-2009 David J. Wilder <dwilder@us.ibm.com> IPoIB: Clear ipoib_neigh.dgid in ipoib_neigh_alloc()

IPoIB can miss a change in destination GID under some conditions. The
problem is caused when ipoib_neigh->dgid contains a stale address.
The fix is to set ipoib_neigh->dgid to zero in ipoib_neigh_alloc().

This can happen when a system using bonding on its IPoIB interfaces
has switched its active interface from interface A to B and back to A.
The system that fails over will not correctly processes the 2nd
address change, as described below.

When an address has changed neighbor->ha is updated with the new
address. Each neighbor has an associated ipoib_neigh.
ipoib_neigh->dgid also holds a copy of the remote node's hardware
address. When an address changes neighbor->ha is updated by the
network layer (arp code) with the new address. IPoIB detects this
change in ipoib_start_xmit() by comparing neighbor->ha with
ipoib_neigh->dgid. The bug is that ipoib_neigh->dgid may already
contain the new address (A) thus the change from B to A is missed by
ipoib. Here is the sequence of events:

ipoib_neigh->dgid = A and neighbor->ha = A

The address is switched to B (the first switch)

neighbor->ha = B

The change is seen in ipoib_start_xmit() -- neighbor->ha !=
ipoib_neigh->dgid so ipoib_neigh is released, and a new one is
allocated.

The allocator may return the same chunk of memory that was just
released, therefore ipoib_neigh->dgid still contains A at this point.

ipoib_neigh->dgid should be updated in neigh_add_path(), but if the
following conditions are true dgid is not updated:

1) __path_find() returns a path
2) path->ah is NULL

The remote system now switches from address B to A, neighbor->ha is
updated to A.

Now we have again : ipoib_neigh->dgid = A and neighbor->ha = A

Since the addresses are the same ipoib won't process the change in
address. Fix this by zeroing out the dgid field when allocating a new
struct ipoib_neigh.

Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
d7757be133cc05620608af46acd178686681b7ef 25-Sep-2009 Linus Torvalds <torvalds@linux-foundation.org> Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IPoIB: Don't turn on carrier for a non-active port
IB/mthca: Fix access to freed memory in catastrophic event handling
mlx4_core: Pass cache line size to device FW
RDMA/nes: Remove duplicate .ndo_set_mac_address field initialization
IB/mad: Fix lock-lock-timer deadlock in RMPP code
5ee95120841fd623c48d7d971182cf58e3b0c8de 24-Sep-2009 Moni Shoua <monis@Voltaire.COM> IPoIB: Don't turn on carrier for a non-active port

Multicast joins can succeed even if the IB port is down. This happens
when the SM runs on the same port with the requesting port. However,
IPoIB calls netif_carrier_on() when the join of the broadcast group
succeeds, without caring about the state of the IB port. The result
is an IPoIB interface in RUNNING state but without an active IB port
to support it.

If a bonding interface uses this IPoIB interface as a slave it might
not detect that this slave is almost useless and failover
functionality will be damaged. The fix checks the state of the IB
port in the carrier_task before calling netif_carrier_on().

Adresses: https://bugs.openfabrics.org/show_bug.cgi?id=1726
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
d7e9660ad9d5e0845f52848bce31bcf5cdcdea6b 14-Sep-2009 Linus Torvalds <torvalds@linux-foundation.org> Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits)
netxen: update copyright
netxen: fix tx timeout recovery
netxen: fix file firmware leak
netxen: improve pci memory access
netxen: change firmware write size
tg3: Fix return ring size breakage
netxen: build fix for INET=n
cdc-phonet: autoconfigure Phonet address
Phonet: back-end for autoconfigured addresses
Phonet: fix netlink address dump error handling
ipv6: Add IFA_F_DADFAILED flag
net: Add DEVTYPE support for Ethernet based devices
mv643xx_eth.c: remove unused txq_set_wrr()
ucc_geth: Fix hangs after switching from full to half duplex
ucc_geth: Rearrange some code to avoid forward declarations
phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs
drivers/net/phy: introduce missing kfree
drivers/net/wan: introduce missing kfree
net: force bridge module(s) to be GPL
Subject: [PATCH] appletalk: Fix skb leak when ipddp interface is not loaded
...

Fixed up trivial conflicts:

- arch/x86/include/asm/socket.h

converted to <asm-generic/socket.h> in the x86 tree. The generic
header has the same new #define's, so that works out fine.

- drivers/net/tun.c

fix conflict between 89f56d1e9 ("tun: reuse struct sock fields") that
switched over to using 'tun->socket.sk' instead of the redundantly
available (and thus removed) 'tun->sk', and 2b980dbd ("lsm: Add hooks
to the TUN driver") which added a new 'tun->sk' use.

Noted in 'next' by Stephen Rothwell.
5e47596bee12597824a3b5b21e20f80b61e58a35 06-Sep-2009 Jason Gunthorpe <jgunthorpe@obsidianresearch.com> IPoIB: Check multicast address format

Check that the format of multicast link addresses is correct before
taking them from dev->mc_list to priv->multicast_list. This way we
never try to send a bogus address to the SA, which prevents badness
from erronous 'ip maddr addr add', broken bonding drivers, etc.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
721d67cdca5b7642b380ca0584de8dceecf6102f 06-Sep-2009 Roland Dreier <rolandd@cisco.com> IPoIB: Drop priv->lock before calling ipoib_send()

IPoIB currently must use irqsave locking for priv->lock, since it is
taken from interrupt context in one path. However, ipoib_send() does
skb_orphan(), and the network stack locking is not IRQ-safe.
Therefore we need to make sure we don't hold priv->lock when calling
ipoib_send() to avoid lockdep warnings (the code was almost certainly
safe in practice, since the only code path that takes priv->lock from
interrupt context would never call into the network stack).

Addresses: http://bugzilla.kernel.org/show_bug.cgi?id=13757
Reported-by: Bart Van Assche <bart.vanassche@gmail.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
poib_multicast.c
cd0bcf4cb963a147baf0b79d94c25ba86220f708 06-Sep-2009 Roland Dreier <rolandd@cisco.com> IPoIB: Remove unused <rdma/ib_cache.h> includes

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
poib_ib.c
451f14439847db302e5104c44458b2dbb4b1829d 31-Aug-2009 Eric Dumazet <eric.dumazet@gmail.com> drivers: Kill now superfluous ->last_rx stores

The generic packet receive code takes care of setting
netdev->last_rx when necessary, for the sake of the
bonding ARP monitor.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Neil Horman <nhorman@txudriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib_cm.c
poib_ib.c
adf30907d63893e4208dfe3f5c88ae12bc2f25d5 02-Jun-2009 Eric Dumazet <eric.dumazet@gmail.com> net: skb->dst accessors

Define three accessors to get/set dst attached to a skb

struct dst_entry *skb_dst(const struct sk_buff *skb)

void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)

void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of :
dst_release(skb->dst)
skb->dst = NULL;

Delete skb->dst field

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib_cm.c
poib_main.c
poib_multicast.c
86d15cd83363a9787039895cb1a1b6be50f82ad3 31-May-2009 Eric Dumazet <eric.dumazet@gmail.com> net: unset IFF_XMIT_DST_RELEASE for qeth and ipoib

Last two drivers that need skb->dst in their start_xmit() function

Tell dev_hard_start_xmit() to no release it by unsetting IFF_XMIT_DST_RELEASE

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib_main.c
26574401fef6766f6c3ca25b5c13febe662d2a32 13-May-2009 Eric W. Biederman <ebiederm@xmission.com> net: Fix ipoib rtnl_lock sysfs deadlock.

Network device sysfs files that grab the rtnl_lock unconditionally
will deadlock if accessed when the network device is being
unregistered. So use trylock and syscall_restart to avoid this
deadlock.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib_cm.c
poib_vlan.c
e028cc55cc5c90a1c57eefe560a0cbb4df1fed14 20-Apr-2009 Yossi Etigin <yosefe@voltaire.com> IPoIB: Disable NAPI while CQ is being drained

If NAPI is enabled while IPoIB's CQ is being drained, it creates a
race on priv->ibwc between ipoib_poll() and ipoib_drain_cq(), leading
to memory corruption.

The solution is to enable/disable NAPI in ipoib_ib_dev_{open/stop}()
instead of in ipoib_{open/stop}(), and sync NAPI on the INITIALIZED
flag instead on the ADMIN_UP flag. This way NAPI will be disabled when
ipoib_drain_cq() is called.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1587>.

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_ib.c
poib_main.c
edb5abb1e2a84fd8802a3577d95eac84fe1405ab 31-Mar-2009 Roland Dreier <rolandd@cisco.com> IPoIB: Avoid free_netdev() BUG when destroying a child interface

We have to release the RTNL before calling free_netdev() so that the
device state has a chance to become NETREG_UNREGISTERED. Otherwise
when removing a child interface, we hit the BUG() that tests the
device state in free_netdev().

Reported-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_vlan.c
13220a94d35708d5378114e96ffcc88d0a74fe99 26-Mar-2009 Linus Torvalds <torvalds@linux-foundation.org> Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1750 commits)
ixgbe: Allow Priority Flow Control settings to survive a device reset
net: core: remove unneeded include in net/core/utils.c.
e1000e: update version number
e1000e: fix close interrupt race
e1000e: fix loss of multicast packets
e1000e: commonize tx cleanup routine to match e1000 & igb
netfilter: fix nf_logger name in ebt_ulog.
netfilter: fix warning in ebt_ulog init function.
netfilter: fix warning about invalid const usage
e1000: fix close race with interrupt
e1000: cleanup clean_tx_irq routine so that it completely cleans ring
e1000: fix tx hang detect logic and address dma mapping issues
bridge: bad error handling when adding invalid ether address
bonding: select current active slave when enslaving device for mode tlb and alb
gianfar: reallocate skb when headroom is not enough for fcb
Bump release date to 25Mar2009 and version to 0.22
r6040: Fix second PHY address
qeth: fix wait_event_timeout handling
qeth: check for completion of a running recovery
qeth: unregister MAC addresses during recovery.
...

Manually fixed up conflicts in:
drivers/infiniband/hw/cxgb3/cxio_hal.h
drivers/infiniband/hw/nes/nes_nic.c
fe8114e8e1d15ba07ddcaebc4741957a1546f307 20-Mar-2009 Stephen Hemminger <shemminger@vyatta.com> infiniband: convert ipoib to net_device_ops

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib_main.c
71d98b4628ee869d62814f6d8607d76cab4b9ec5 17-Feb-2009 Jack Morgenstein <jackm@dev.mellanox.co.il> IPoIB: In unicast_arp_send(), only free newly-created paths

If path_rec_start() returns error, call path_free() only if the path
was newly-created. If we free an existing path whose valid flag was zero,
(but do not detach it from the list) we cause corruption of the
path list (of which it is a member), and get a kernel crash.

The simplest solution is to not free an existing path -- just leave it
in the list as-is (i.e., with its valid flag cleared).

Thanks to Yossi Etigin of Voltaire for identifying the problem flow
which caused the kernel crash.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Moni Shua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
288379f050284087578b77e04f040b57db3db3f8 20-Jan-2009 Ben Hutchings <bhutchings@solarflare.com> net: Remove redundant NAPI functions

Following the removal of the unused struct net_device * parameter from
the NAPI functions named *netif_rx_* in commit 908a7a1, they are
exactly equivalent to the corresponding *napi_* functions and are
therefore redundant.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib_ib.c
3c20962086b0ceb5498ba840e5a91bf4a692aae9 16-Jan-2009 Yossi Etigin <yosefe@Voltaire.COM> IPoIB: Do not print error messages for multicast join retries

When IPoIB tries to join a multicast group, and the SA module's SM
address handle is NULL (because of an SM change, etc), the join
returns with -EAGAIN status. In that case, don't print an error
message unless multicast debugging is enabled.

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
cbbe1efa4972350286b52cb48aefaa11e198c0fb 15-Jan-2009 Roland Dreier <rolandd@cisco.com> IPoIB: Fix deadlock between ipoib_open() and child interface create

Fix a deadlock between child interface creation/deletion and ipoib
start/stop. The former takes vlan_mutex, and then might take RTNL via
register_netdev()/unregister_netdev(). The latter is executed with
RTNL held, and tries to take vlan_mutex, which can lead to an AB-BA
deadlock.

Fix this by having the child interface creation/deletion code take the
RTNL first so vlan_mutex always nests inside RTNL. We can use
register_netdevice() for child interfaces because we form the
interface name from the parent interface and hence don't need the '%'
expansion of register_netdev().

Reported-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_vlan.c
b8a1b1ce14252b59b2d5c89de25b54f9bfd4cc5e 14-Jan-2009 Roland Dreier <rolandd@cisco.com> IPoIB: Fix hang in napi_disable() if P_Key is never found

After commit fe25c561 ("IPoIB: Don't enable NAPI when it's already
enabled"), if an interface is brought up but the corresponding P_Key
never appears, then ipoib_stop() will hang in napi_disable(), because
ipoib_open() returns before it does napi_enable().

Fix this by changing ipoib_open() to call napi_enable() even if the
P_Key isn't present.

Reported-by: Yossi Etigin <yosefe@Voltaire.COM>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
50df48f59d656d58a1734df5cfe00cdc9a74e8b5 13-Jan-2009 Yossi Etigin <yosefe@Voltaire.COM> IPoIB: Do not join broadcast group if interface is brought down

Because the ipoib_workqueue is not flushed when ipoib interface is
brought down, ipoib_mcast_join() may trigger a join to the broadcast
group after priv->broadcast was set to NULL (during cleanup). This
will cause the system to be a member of the broadcast group when
interface is down. As a side effect, this breaks the optimization of
setting the Q_key only when joining the broadcast group.

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
a50df398cddf6b757bdbf30f5f0875982ef5c660 09-Jan-2009 Yossi Etigin <yosefe@Voltaire.COM> IPoIB: Fix loss of connectivity after bonding failover on both sides

Fix bonding failover in the case both peers failover and the
gratuitous ARP is lost. In that case, the sender side will create an
ipoib_neigh and issue a path request with the old GID first. When
skb->dst->neighbour->ha changes due to ARP refresh, this ipoib_neigh
will not be added to the path->list of the path of the new GID,
because the ipoib_neigh already exists. It will not have an AH
either, because of sender-side failover. Therefore, it will not get
an AH when the path is resolved.

The solution here is to compare GIDs in ipoib_start_xmit() even if
neigh->ah is invalid. Comparing with an uninitialized value of
neigh->dgid should be fine, since a spurious match is harmless (and
astronomically unlikely too).

Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
908a7a16b852ffd618a9127be8d62432182d81b4 23-Dec-2008 Neil Horman <nhorman@tuxdriver.com> net: Remove unused netdev arg from some NAPI interfaces.

When the napi api was changed to separate its 1:1 binding to the net_device
struct, the netif_rx_[prep|schedule|complete] api failed to remove the now
vestigual net_device structure parameter. This patch cleans up that api by
properly removing it..

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib_ib.c
198d6ba4d7f48c94f990f4604f0b3d73925e0ded 19-Nov-2008 David S. Miller <davem@davemloft.net> Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:

drivers/isdn/i4l/isdn_net.c
fs/cifs/connect.c
ff79ae80837cf45cb703b34824dd3862d2ddcb24 12-Nov-2008 Yossi Etigin <yosefe@Voltaire.COM> IPoIB: Fix crash in path_rec_completion()

Fix a crash in path_rec_completion() during an SM up/down loop. If
more than one path record request is issued, the first completion
releases path->done, allowing ipoib_flush_paths() to free the path,
and thus corrupting it for the second completion.

Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM
change events") added the field path->valid and changed the test "if
(!path)" to "if (!path || !path->valid)". This change made it
possible for a path with an outstanding query to pass the test and
issue another query on the same path. Having two queries on the same
path leads to a crash.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1325>.

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
93a3ab939ba90e00e193f0bad98f43fbdfbd925d 12-Nov-2008 Yossi Etigin <yosefe@Voltaire.COM> IPoIB: Fix hang in ipoib_flush_paths()

ipoib_flush_paths() can hang during an SM up/down loop: if
path_rec_start() fails (for instance, because there is no sm_ah), the
path is still added to the path list by neigh_add_path(). Then,
ipoib_flush_paths() will wait for path->done, but it will never
complete because the request was not issued at all. Fix this by
completing path->done if issuing the query fails.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1329>.

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
fe25c56190bbc0951d7c53b4ccd148e669d69938 12-Nov-2008 Yossi Etigin <yosefe@Voltaire.COM> IPoIB: Don't enable NAPI when it's already enabled

If a P_Key is not present when an interface is created, ipoib_open()
will return after doing napi_enable(). ipoib_open() will be called
again from ipoib_pkey_poll() when the P_Key appears, after NAPI has
already been enabled, and try to enable it again. This triggers a
BUG_ON() in napi_enable().

Fix this by moving the call to napi_enable() to after the test for
P_Key presence.

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
5b095d98928fdb9e3b75be20a54b7a6cbf6ca9ad 29-Oct-2008 Harvey Harrison <harvey.harrison@gmail.com> net: replace %p6 with %pI6

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib_cm.c
poib_main.c
poib_multicast.c
8c165a8383ef56e84b541fa638be5cf1440010e7 29-Oct-2008 Harvey Harrison <harvey.harrison@gmail.com> infiniband: remove IPOIB_GID_RAW_ARG, IPOIB_GID_ARG, IPOIB_GID_FMT

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib.h
fcace2fe7a86237c451b09aaf7e2e9d19e09887f 29-Oct-2008 Harvey Harrison <harvey.harrison@gmail.com> infiniband: ipoib replace IPOIB_GID_FMT with %p6

Replace all uses of IPOIB_GID_FMT, IPOIB_GID_RAW_ARG() and IPOIB_GID_ARG()

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib_cm.c
poib_main.c
poib_multicast.c
83bb63f62bda28be88b21216fbb59838a10f2348 23-Oct-2008 Or Gerlitz <ogerlitz@voltaire.com> IPoIB: Set netdev offload features properly for child (VLAN) interfaces

Child devices were created without any offload features set, fix this by
moving the code that computes the features into generic function which is
now called through non-child and child device creation.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>

-- v1 has a bug where the 'result' flag in ipoib_vlan_add may be used uninitialized
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_main.c
poib_vlan.c
70c9c0db549245a49cabf42d5a74688077254d46 23-Oct-2008 Or Gerlitz <ogerlitz@voltaire.com> IPoIB: Clean up ethtool support

Add a get_rx_csum method. Remove the driver's own get_tso method, as
the ethtool kernel code uses the default one if nothing is provided.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_ethtool.c
2767840a5ca73fde62b25e0209aa9269ec4fa7c7 11-Oct-2008 Roland Dreier <rolandd@cisco.com> IPoIB: Always initialize poll_timer to avoid crash on unload

ipoib_ib_dev_stop() does del_timer_sync(&priv->poll_timer), but if a
P_key for an interface is not found, poll_timer is not initialized, so
this leads to a crash or hang. Fix this by moving where poll_timer is
initialized to ipoib_ib_dev_init(), which is always called.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1172>.

Debugged-by: Yosef Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_ib.c
943c246e9ba9078a61b6bcc5b4a8131ce8befb64 30-Sep-2008 Roland Dreier <rolandd@cisco.com> IPoIB: Use netif_tx_lock() and get rid of private tx_lock, LLTX

Currently, IPoIB is an LLTX driver that uses its own IRQ-disabling
tx_lock. Not only do we want to get rid of LLTX, this actually causes
problems because of the skb_orphan() done with this tx_lock held: some
skb destructors expect to be run with interrupts enabled.

The simplest fix for this is to get rid of the driver-private tx_lock
and stop using LLTX. We kill off priv->tx_lock and use
netif_tx_lock[_bh]() instead; the patch to do this is a tiny bit
tricky because we need to update places that take priv->lock inside
the tx_lock to disable IRQs, rather than relying on tx_lock having
already disabled IRQs.

Also, there are a couple of places where we need to disable BHs to
make sure we have a consistent context to call netif_tx_lock() (since
we no longer can use _irqsave() variants), and we also have to change
ipoib_send_comp_handler() to call drain_tx_cq() through a timer rather
than directly, because ipoib_send_comp_handler() runs in interrupt
context and drain_tx_cq() must run in BH context so it can call
netif_tx_lock().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_cm.c
poib_ib.c
poib_main.c
poib_multicast.c
c9da4bad5b80c3d9884e2c6ad8d2091252c32d5e 26-Sep-2008 Roland Dreier <rolandd@cisco.com> IPoIB: Fix crash when path record fails after path flush

Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM
change events") changed how paths are flushed on an SM event. This
change introduces a problem if the path record query triggered by
fails, causing path->ah to become NULL. A later successful path query
will then trigger WARN_ON() in path_rec_completion(), and crash
because path->ah has already been freed, so the ipoib_put_ah() inside
the lock in path_rec_completion() may actually drop the last reference
(contrary to the comment that claims this is safe).

Fix this by updating path->ah and freeing old_ah only when the path
record query is successful. This prevents the neighbour AH and that
path AH from getting out of sync.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1194>

Reported-by: Rabah Salem <ravah@mellanox.com>
Debugged-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
e8224e4b804b4fd26723191c1891101a5959bb8a 16-Sep-2008 Yossi Etigin <yossi.openib@gmail.com> IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop()

Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with
ipoib_stop(). We avoid it by scheduling the piece of code that takes
the lock on ipoib_workqueue instead of executing it directly. This
works because we only flush the ipoib_workqueue with the RTNL not held.

The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down()
which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(),
which calls ipoib_mcast_leave(). The latter calls
ib_sa_free_multicast(), and this waits until the multicast completion
handler finishes. This handler is ipoib_mcast_join_complete(), which
waits for the rtnl_lock(), which was already taken by ipoib_stop().

This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on
RTNL in ipoib_stop()").

Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_main.c
poib_multicast.c
a77a57a1a22afc31891d95879fe3cf2ab03838b0 20-Aug-2008 Roland Dreier <rolandd@cisco.com> IPoIB: Fix deadlock on RTNL in ipoib_stop()

Commit c8c2afe3 ("IPoIB: Use rtnl lock/unlock when changing device
flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which
is run from the ipoib_workqueue. However, ipoib_stop() (which is run
inside rtnl_lock()) flushes this workqueue, which leads to a deadlock
if the join task is pending.

Fix this by simply not flushing the workqueue from ipoib_stop(). It
turns out that we really don't care about workqueue tasks running
during or after ipoib_stop(), as long as we make sure to flush the
workqueue before unregistering a netdev.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1114>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
poib_multicast.c
b1404069f64457c94de241738fdca142c2e5698f 09-Aug-2008 David J. Wilder <dwilder@us.ibm.com> IPoIB/cm: Use vmalloc() to allocate rx_rings

There are users that are running UDP applications that require a large
receive queue size in order to get good performance. To prevent
allocation failures for rx_rings when using non-SRQ mode and large
recv_queue_size (1K or larger), use vmalloc() instead of kcalloc() to
alocate rx_rings.

Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
e08198169ec5facb3d85bb455efa44a2f8327842 30-Jul-2008 Roland Dreier <rolandd@cisco.com> IPoIB/cm: Set correct SG list in ipoib_cm_init_rx_wr()

wr->sg_list should be set to the sge pointer passed in, not
priv->cm.rx_sge.

Reported-by: Hoang-Nam Nguyen <HNGUYEN@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
9905922446f6dc02fd4650c8f59114d6bdb5b777 25-Jul-2008 Roland Dreier <rolandd@cisco.com> IPoIB: Correct help text for INFINIBAND_IPOIB_DEBUG

The help text for INFINIBAND_IPOIB_DEBUG refers to "ipoib_debugfs,"
which no longer exists. Correct this to talk about the files under
debugfs that are really created.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
config
99c3a5a9e388e0ac166c617aaf02150e778d2779 25-Jul-2008 Roland Dreier <rolandd@cisco.com> IPoIB/cm: Connected mode is no longer EXPERIMENTAL

Connected mode is now tested and used by lots of people. No need to
hide it under CONFIG_EXPERIMENTAL.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
config
01b3fc8b15432f7931e40fe099839e1559fb0e09 22-Jul-2008 Or Gerlitz <ogerlitz@voltaire.com> IPoIB: Include err code in trace message for ib_sa_path_rec_get() failures

Print the return code of ib_sa_path_rec_get() if it fails to help
debug errors.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
49997d75152b3d23c53b0fa730599f2f74c92c65 18-Jul-2008 David S. Miller <davem@davemloft.net> Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6

Conflicts:

Documentation/powerpc/booting-without-of.txt
drivers/atm/Makefile
drivers/net/fs_enet/fs_enet-main.c
drivers/pci/pci-acpi.c
net/8021q/vlan.c
net/iucv/iucv.c
b9e40857682ecfc5bcd0356a23ff409883ffb982 15-Jul-2008 David S. Miller <davem@davemloft.net> netdev: Do not use TX lock to protect address lists.

Now that we have a specific lock to protect the network
device unicast and multicast lists, remove extraneous
grabs of the TX lock in cases where the code only needs
address list protection.

Signed-off-by: David S. Miller <davem@davemloft.net>
poib_multicast.c
e308a5d806c852f56590ffdd3834d0df0cbed8d7 15-Jul-2008 David S. Miller <davem@davemloft.net> netdev: Add netdev->addr_list_lock protection.

Add netif_addr_{lock,unlock}{,_bh}() helpers.

Use them to protect operations that operate on or read
the network device unicast and multicast address lists.

Also use them in cases where the code simply wants to
block calls into the driver's ->set_rx_mode() and
->set_multicast_list() methods.

Signed-off-by: David S. Miller <davem@davemloft.net>
poib_multicast.c
bc3a290b51aaefc6a6af2d6e6d52ed32387c416c 15-Jul-2008 Eli Cohen <eli@dev.mellanox.co.il> IPoIB: Double default RX/TX ring sizes

Increase IPoIB ring sizes to twice their original sizes (RX: 128->256,
TX: 64->128) to act as a shock absorber for high traffic peaks. With
the current settings, we have seen cases that there are many calls to
netif_stop_queue(), which causes degradation in throughput. Also,
larger receive buffer sizes help IPoIB in CM mode to avoid experiencing
RNR NAK conditions due to insufficient receive buffers at the SRQ.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
e112373fd6aa280bd2cbc0d5cc3809115325a1be 15-Jul-2008 Eli Cohen <eli@dev.mellanox.co.il> IPoIB/cm: Reduce connected mode TX object size

Since IPoIB connected mode does not NETIF_F_SG, we only have one DMA
mapping per send, so we don't need a mapping[] array. Define a new
struct with a single u64 mapping member and use it for the CM tx_ring.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_cm.c
bd3606715effbf37df986548c43bbed0842b49d5 15-Jul-2008 Eli Cohen <eli@mellanox.co.il> IPoIB: Use dev_set_mtu() to change mtu

When the driver sets the MTU of the net device outside of its
change_mtu method, it should make use of dev_set_mtu() instead of
directly setting the mtu field of struct netdevice. Otherwise
functions registered to be called upon MTU change will not get called
(this is done through call_netdevice_notifiers() in dev_set_mtu()).

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
poib_multicast.c
c8c2afe360b7366f586f6bece1109a72ea334876 15-Jul-2008 Eli Cohen <eli@mellanox.co.il> IPoIB: Use rtnl lock/unlock when changing device flags

Use of this lock is required to synchronize changes to the netdvice's
data structs. Also move the call to ipoib_flush_paths() after the
modification of the netdevice flags in set_mode().

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
poib_multicast.c
9eae554c171e086c89ab83da2a2d3c8bf958fcb5 15-Jul-2008 Roland Dreier <rolandd@cisco.com> IPoIB: Get rid of ipoib_mcast_detach() wrapper

ipoib_mcast_detach() does nothing except call ib_detach_mcast(), so just
use the core API in the one place that does a multicast group detach.

add/remove: 0/1 grow/shrink: 0/1 up/down: 0/-105 (-105)
function old new delta
ipoib_mcast_leave 357 319 -38
ipoib_mcast_detach 67 - -67

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_multicast.c
poib_verbs.c
d0de13622d5ac658efe7c51521dbdbe0752aa3dd 15-Jul-2008 Eli Cohen <eli@mellanox.co.il> IPoIB: Only set Q_Key once: after joining broadcast group

The current code will set the Q_Key for any join of a non-sendonly
multicast group. The operation involves a modify QP operation, which
is fairly heavyweight, and is only really required after the join of
the broadcast group. Fix this by adding a parameter to ipoib_mcast_attach()
to control when the Q_Key is set.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_multicast.c
poib_verbs.c
5892eff91ad60ba365ae7f75050ce464036c5396 15-Jul-2008 Eli Cohen <eli@mellanox.co.il> IPoIB: Remove priv->mcast_mutex

No need for a mutex around calls to ib_attach_mcast/ib_detach_mcast
since these operations are synchronized at the HW driver layer.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_main.c
poib_verbs.c
c03d4731b5b6de45b95a10bf1d510dde423d6757 15-Jul-2008 Eli Cohen <eli@mellanox.co.il> IPoIB: Remove unused IPOIB_MCAST_STARTED code

The IPOIB_MCAST_STARTED flag is not used at all since commit b3e2749b
("IPoIB: Don't drop multicast sends when they can be queued"), so
remove it.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_multicast.c
ee1e2c82c245a5fb2864e9dbcdaab3390fde3fcc 15-Jul-2008 Moni Shoua <monis@Voltaire.COM> IPoIB: Refresh paths instead of flushing them on SM change events

The patch tries to solve the problem of device going down and paths being
flushed on an SM change event. The method is to mark the paths as candidates for
refresh (by setting the new valid flag to 0), and wait for an ARP
probe a new path record query.

The solution requires a different and less intrusive handling of SM
change event. For that, the second argument of the flush function
changes its meaning from a boolean flag to a level. In most cases, SM
failover doesn't cause LID change so traffic won't stop. In the rare
cases of LID change, the remote host (the one that hadn't changed its
LID) will lose connectivity until paths are refreshed. This is no
worse than the current state. In fact, preventing the device from
going down saves packets that otherwise would be lost.

Signed-off-by: Moni Levy <monil@voltaire.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_ib.c
poib_main.c
poib_verbs.c
af40da894e96d5c826d38be3ea53ee00d9de0367 15-Jul-2008 Vladimir Sokolovsky <vlad@mellanox.co.il> IPoIB: add LRO support

Add "ipoib_use_lro" module parameter to enable LRO and an
"ipoib_lro_max_aggr" module parameter to set the max number of packets
to be aggregated. Make LRO controllable and LRO statistics accessible
through ethtool.

Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
config
poib.h
poib_ethtool.c
poib_ib.c
poib_main.c
12406734051a26e9fe4c8568e931dfddbb72d431 15-Jul-2008 Ron Livne <ronli@voltaire.com> IPoIB: Use multicast loopback blocking if available

Set IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK for IPoIB's UD QPs if
supported by the underlying device. This creates an improvement of up
to 39% in bandwidth when sending multicast packets with IPoIB, and an
improvment of 12% in cpu usage.

Signed-off-by: Ron Livne <ronli@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_verbs.c
a7d834c4bc6be73e8f83eaa5072fac3c5549f7f2 15-Jul-2008 Roland Dreier <rolandd@cisco.com> IPoIB/cm: Fix racy use of receive WR/SGL in ipoib_cm_post_receive_nonsrq()

For devices that don't support SRQs, ipoib_cm_post_receive_nonsrq() is
called from both ipoib_cm_handle_rx_wc() and ipoib_cm_nonsrq_init_rx(),
and these two callers are not synchronized against each other.
However, ipoib_cm_post_receive_nonsrq() always reuses the same receive
work request and scatter list structures, so multiple callers can end
up stepping on each other, which leads to posting garbled work
requests.

Fix this by having the caller pass in the ib_recv_wr and ib_sge
structures to use, and allocating new local structures in
ipoib_cm_nonsrq_init_rx().

Based on a patch by Pradeep Satyanarayana <pradeep@us.ibm.com> and
David Wilder <dwilder@us.ibm.com>, with debugging help from Hoang-Nam
Nguyen <hnguyen@de.ibm.com>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
f89271da32bc1a636cf4eb078e615930886cd013 15-Jul-2008 Eli Cohen <eli@dev.mellanox.co.il> IPoIB: Copy small received SKBs in connected mode

The connected mode implementation in the IPoIB driver has a large
overhead in the way SKBs are handled in the receive flow. It usually
allocates an SKB with as big as was used in the currently received SKB
and moves unused fragments from the old SKB to the new one. This
involves a loop on all the remaining fragments and incurs overhead on
the CPU. This patch, for small SKBs, allocates an SKB just large
enough to contain the received data and copies to it the data from the
received SKB. The newly allocated SKB is passed to the stack and the
old SKB is reposted.

When running netperf, UDP small messages, without this pach I get:

UDP UNIDIRECTIONAL SEND TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
14.4.3.178 (14.4.3.178) port 0 AF_INET
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # 10^6bits/sec

114688 128 10.00 5142034 0 526.31
114688 10.00 1130489 115.71

With this patch I get both send and receive at ~315 mbps.

The reason that send performance actually slows down is as follows:
When using this patch, the overhead of the CPU for handling RX packets
is dramatically reduced. As a result, we do not experience RNR NAK
messages from the receiver which cause the connection to be closed and
reopened again; when the patch is not used, the receiver cannot handle
the packets fast enough so there is less time to post new buffers and
hence the mentioned RNR NACKs. So what happens is that the
application *thinks* it posted a certain number of packets for
transmission but these packets are flushed and do not really get
transmitted. Since the connection gets opened and closed many times,
each time netperf gets the CPU time that otherwise would have been
given to IPoIB to actually transmit the packets. This can be verified
when looking at the port counters -- the output of ifconfig and the
oputput of netperf (this is for the case without the patch):

tx packets
==========
port counter: 1,543,996
ifconfig: 1,581,426
netperf: 5,142,034

rx packets
==========
netperf 1,1304,089

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
poib.h
poib_cm.c
poib_main.c
f3781d2e89f12dd5afa046dc56032af6e39bd116 15-Jul-2008 Roland Dreier <rolandd@cisco.com> RDMA: Remove subversion $Id tags

They don't get updated by git and so they're worse than useless.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_cm.c
poib_fs.c
poib_ib.c
poib_main.c
poib_multicast.c
poib_verbs.c
poib_vlan.c
e1d50dce5af77cb6d33555af70e2b8748dd84009 21-May-2008 Jack Morgenstein <jackm@dev.mellanox.co.il> IPoIB: Test for NULL broadcast object in ipiob_mcast_join_finish()

We saw a kernel oops in our regression testing when a multicast "join
finish" occurred just after the interface was -- this is
<https://bugs.openfabrics.org/show_bug.cgi?id=1040>. The test
randomly causes the HCA physical port to go down then up.

The cause of this is that ipoib_mcast_join_finish() processing happen
just after ipoib_mcast_dev_flush() was invoked (in which case the
broadcast pointer is NULL). This patch tests for and handles the case
where priv->broadcast is NULL.

Cc: <stable@kernel.org>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
57ce41d1d18279cc90223f3deadca70c7de1cfca 01-May-2008 Eli Cohen <eli@dev.mellanox.co.il> IB/ipoib: Fix transmit queue stalling forever

Commit f56bcd80 ("IPoIB: Use separate CQ for UD send completions")
introduced a bug where the transmit queue could get stopped and never
woken up. The problem is that send completions are only polled at the
end of the xmit function, so if the send queue fills up and the xmit
path stops the queue, then there is no way for send completions to
ever get polled, and so the transmit queue stays stopped forever.

Fix this by arming the send CQ just before posting the last send
request that fills the send queue. Then, when the completion event
handler is called, drain the send CQ. Since it is possible that not
enough send completions are in the CQ, verify that the the net queue
has been woken up after draining the send CQ, and if not arm a timer
and drain again at the timer function.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_ib.c
poib_verbs.c
b4132efa1a47858d741ecb05b8735e6fcb603bc8 29-Apr-2008 Eli Cohen <eli@dev.mellanox.co.il> IPoIB: Copy child MTU from parent

When creating a child interface, copy the MTU information from the
parent. Otherwise when the child's multicast join completes, the MTU
will not be updated since the code does

dev->mtu = min(priv->mcast_mtu, priv->admin_mtu);

and priv->admin_mtu will be set to 0.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_vlan.c
f56bcd8013566d4ad4759ae5fc85a6660e4655c7 29-Apr-2008 Eli Cohen <eli@dev.mellanox.co.il> IPoIB: Use separate CQ for UD send completions

Use a dedicated CQ for UD send completions. Also, do not arm the UD
send CQ, which reduces the number of interrupts generated. This patch
farther reduces overhead by not calling poll CQ for every posted send
WR -- it does polls only when there 16 or more outstanding work requests.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_cm.c
poib_ethtool.c
poib_ib.c
poib_main.c
poib_verbs.c
bc7b3a36ba02e4053ca38653e6a753082d9add03 23-Apr-2008 Shirley Ma <mashirle@us.ibm.com> IPoIB: Handle 4K IB MTU for UD (datagram) mode

This patch enables IPoIB to use 4K UD messages (when the underlying
device and fabrics support a 4K MTU) by using two scatter buffers when
PAGE_SIZE is less than or equal to thhe HCA IB MTU size. The first
buffer is for IPoIB header + GRH header, and the second buffer is the
IPoIB payload, which is 4K-4.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_ib.c
poib_main.c
poib_multicast.c
poib_verbs.c
poib_vlan.c
9fdd5e5bf682130d1e1dd83d06e99eeafa645c0c 17-Apr-2008 Roland Dreier <rolandd@cisco.com> IPoIB: Handle case when P_Key is deleted and re-added at same index

If a P_Key is deleted and then re-added at the same index, then IPoIB
gets confused because __ipoib_ib_dev_flush() only checks whether the
index is the same without checking whether the P_Key was present, so
the interface is stopped when the P_Key is deleted, but the event when
the P_Key is re-added gets ignored and the interface never gets
restarted.

Also, switch to using ib_find_pkey() instead of ib_find_cached_pkey()
everywhere in IPoIB, since none of the places that look for P_Keys are
in a fast path or in non-sleeping context, and in general we want to
kill off the whole caching infrastructure eventually. This also fixes
consistency problems caused because some IPoIB queries were cached and
some were uncached during the window where the cache was not updated.

Thanks to Venkata Subramonyam <vsubramo@cisco.com> for debugging this
problem and testing this fix.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
poib_ib.c
28d52b3cd8d48ef0ff77d4a8a7a21fc2816bb0a5 17-Apr-2008 Eli Cohen <eli@dev.mellanox.co.il> IPoIB: Support modifying IPoIB CQ event moderation

This can be used to tune at run time the parameters controlling the
event (interrupt) generation rate and thus reduce the overhead
incurred by handling interrupts resulting in better throughput. Since
IPoIB uses a single CQ for both RX and TX, RX is chosen to dictate
configuration for both RX and TX.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_ethtool.c
82c24c18afc5e1c2a955bdc2762b19721283365c 17-Apr-2008 Eli Cohen <eli@dev.mellanox.co.il> IPoIB: Add basic ethtool support

Just add the infrastructure so we can add functionality later.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
akefile
poib.h
poib_ethtool.c
poib_main.c
40ca1988e03c001747d0b4cc1b25cf38297c9f9e 17-Apr-2008 Eli Cohen <eli@dev.mellanox.co.il> IPoIB: Add LSO support

For HCAs that support TCP segmentation offload (IB_DEVICE_UD_TSO), set
NETIF_F_TSO and use HW LSO to offload TCP segmentation.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_cm.c
poib_ib.c
poib_main.c
poib_verbs.c
157de229468b2a63bbc8f9a7d37c70a2c9731ac8 17-Apr-2008 Robert P. J. Day <rpjday@crashcourse.ca> IB: Use shorter list_splice_init() for brevity

Convert list_splice() + INIT_LIST_HEAD() to the equivalent list_splice_init()

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
6046136c742e32d5e6431cdcd8957638d1816821 17-Apr-2008 Eli Cohen <eli@dev.mellanox.co.il> IPoIB: Use checksum offload support if available

For HCAs that support checksum offload (ie that set IB_DEVICE_UD_IP_CSUM
in the device capabilities flags), have IPoIB set NETIF_F_IP_CSUM and
use the HCA to generate and verify IP checksums.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_cm.c
poib_ib.c
poib_main.c
10313cbb92206450b450e14f2b3f6ccde42d9a34 12-Mar-2008 Roland Dreier <rolandd@cisco.com> IPoIB: Allocate priv->tx_ring with vmalloc()

Commit 7143740d ("IPoIB: Add send gather support") made struct
ipoib_tx_buf significantly larger, since the mapping member changed
from a single u64 to an array with MAX_SKB_FRAGS + 1 entries. This
means that allocating tx_rings with kzalloc() may fail because there
is not enough contiguous memory for the new, much bigger size. Fix
this regression by allocating the rings with vmalloc() instead.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
poib_main.c
4200406b8fbbf309f4fffb339bd16c4553ae0c30 12-Mar-2008 Roland Dreier <rolandd@cisco.com> IPoIB/cm: Set tx_wr.num_sge in connected mode post_send()

Commit 7143740d ("IPoIB: Add send gather support") made it possible
for tx_wr.num_sge to be != 1 -- this happens if send gather support is
enabled. However, the code in the connected mode post_send() function
assumes the old invariant, namely that tx_wr.num_sge is always 1. Fix
this by explicitly setting tx_wr.num_sge to 1 in the CM post_send().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
b3e2749bf32f61e7beb259eb7cfb066d2ec6ad65 11-Mar-2008 Or Gerlitz <ogerlitz@voltaire.com> IPoIB: Don't drop multicast sends when they can be queued

When set_multicast_list() is called the multicast task is restarted
and the IPOIB_MCAST_STARTED bit is cleared. As a result for some
window of time, multicast packets are not transmitted nor queued but
rather dropped by ipoib_mcast_send(). These dropped packets are
painful in two cases:

- bonding fail-over which both calls set_multicast_list() on the new
active slave and sends Gratuitous ARP through that slave.

- IP_DROP_MEMBERSHIP code which both calls set_multicast_list() on the
device and issues IGMP leave.

In both these cases, depending on the scheduling of the IPoIB
multicast task, the packets would be dropped. As a result, in the
bonding case, the failover would not be detected by the peers until
their neighbour is renewed the neighbour (which takes a few tens of
seconds). In the IGMP case, the IP router doesn't get an IGMP leave
and would only learn on that from further probes on the group (also a
delay of at least a few tens of seconds).

Fix this by allowing transmission (or queuing) depending on the
IPOIB_FLAG_OPER_UP flag instead of the IPOIB_MCAST_STARTED flag.

Signed-off-by: Olga Shern <olgas@voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
ec229e5e81b3cf757e5e8b6a8bd0b4f32fe52f8c 13-Feb-2008 Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com> IPoIB/cm: Fix ipoib_cm_dev_stop() cleanup when drain times out

Commit efcd9971 ("IPoIB/cm: Factor out ipoib_cm_free_rx_reap_list()")
introduced a bug in ipoib_cm_dev_stop() when the receive drain times
out. In that case, the function moves all the pending rx stuff into a
private list but then calls ipoib_cm_free_rx_reap_list(), which
handles a different list.

Fix this by moving everything to the rx_reap_list that will actually
get freed up.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=906>.

Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
a9d1884925c80b96a621939a4fef5d74de58debe 14-Feb-2008 Eli Cohen <eli@dev.mellanox.co.il> IPoIB: Remove unused struct ipoib_cm_tx.ibwc member

struct ipoib_cm_tx.ibwc is unused since commit 1b524963 ("IPoIB/cm:
Use common CQ for CM send completions"), so remove it.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
poib.h
167c42655cca188657aa9bb4e06d1194af3c73a5 13-Feb-2008 Jack Morgenstein <jackm@dev.mellanox.co.il> IPoIB: On P_Key change event, reset state properly

In P_Key event handling, if the old P_Key is no longer available, the
driver must call ipoib_ib_dev_stop() -- just as it does when the P_Key
is still available (see procedure __ipoib_ib_dev_flush()).

When a P_Key becomes available, the driver will perform ipoib_open(),
which assumes that the QP is in RESET, the cm_id has been
destroyed/deleted, etc. If ipoib_ib_dev_stop() is not called as
described above, then these assumptions will be false, and the attempt
to bring the interface up will fail.

Found by Mellanox QA.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_ib.c
7143740d26098aca84ecc7376ccfe2c58fd0412e 30-Jan-2008 Eli Cohen <eli@mellanox.co.il> IPoIB: Add send gather support

This patch acts as a preparation for using checksum offload for IB
devices capable of inserting/verifying checksum in IP packets. The
patch does not actaully turn on NETIF_F_SG - we defer that to the
patches adding checksum offload capabilities.

We only add support for send gathers for datagram mode, since existing
HW does not support checksum offload on connected QPs.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_cm.c
poib_ib.c
poib_verbs.c
eb14032f9eb595621270f3269f40094adb3144e8 30-Jan-2008 Eli Cohen <eli@mellanox.co.il> IPoIB: Add high DMA feature flag

All current InfiniBand devices can handle all DMA addresses, and it's
hard to imagine anyone would be silly enough to build a new device
that couldn't. Therefore, enable the NETIF_F_HIGHDMA feature for IPoIB.

This has no effect for no, but is needed when we enable gather/scatter
support and checksum stateless offloads.

Signed-off-by: Eli Cohen <eli@mellnaox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
7bc531dd883b955e6198c8e202161f22d2e8c472 28-Jan-2008 Or Gerlitz <ogerlitz@voltaire.com> IPoIB: Remove a misleading debug print

Commit 732a2170 ("IB/ipoib: Bound the net device to the ipoib_neigh
structue") left a misleading debug print (n->dev would be a bond
device only if boding is used). Clean it up.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
bafff9741704959e99fb65a7327c017251019a19 17-Jan-2008 Or Gerlitz <ogerlitz@voltaire.com> IPoIB: Handle bonding failover race for connected neighbours too

Move up the code that checks for a situation where the remote GID
stored in the ipoib_neigh is different than the one present in the
neighbour (handle gratuitous ARP) or that a bonding fail over has
happened but the neighbour still has a pointer to an ipoib_neigh
created by a different device than the current slave. This will cause
the driver to apply the check also for connected mode neighbours.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
1cf18d5aab5144866cb5221250905623b03a32a9 22-Jan-2008 Jan Engelhardt <jengelh@computergmbh.de> IPoIB: Constify seq_operations function pointer tables

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_fs.c
48fe5e594c979177b7f20affd027be56e8ea2767 15-Nov-2007 Krishna Kumar <krkumar2@in.ibm.com> IPoIB: Remove redundant check of netif_queue_stopped() in xmit handler

qdisc_run() now tests for queue_stopped() before calling
__qdisc_run(), and the same check is done in every iteration of
__qdisc_run(), so another check is not required in the driver xmit.
This means that ipoib_start_xmit() no longer needs to test
netif_queue_stopped(); the test was added to fix earlier kernels,
where the networking stack did not guarantee that the xmit method of
an LLTX driver would not be called after the queue was stopped, but
current kernels do provide this guarantee.

To validate, I put a debug in the TX_BUSY path which never hit with 64
threads running overnight exercising this code a few 100 million
times.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
586a693448676de5174e752426ced69ec79ab174 21-Dec-2007 Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com> IPoIB/CM: Enable SRQ support on HCAs that support fewer than 16 SG entries

Some HCAs (such as ehca2) support SRQ, but only support fewer than 16 SG
entries for SRQs. Currently IPoIB/CM implicitly assumes all HCAs will
support 16 SG entries for SRQs (to handle a 64K MTU with 4K pages). This
patch removes that restriction by limiting the maximum MTU in connected
mode to what the maximum number of SRQ SG entries allows.

This patch addresses <https://bugs.openfabrics.org/show_bug.cgi?id=728>

Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_cm.c
poib_main.c
a9e527e3f9f4510e9f3450ca3bc51bc3ef2854fd 10-Dec-2007 Rolf Manderscheid <rvm@obsidianresearch.com> IPoIB: improve IPv4/IPv6 to IB mcast mapping functions

An IPoIB subnet on an IB fabric that spans multiple IB subnets can't
use link-local scope in multicast GIDs. The existing routines that
map IP/IPv6 multicast addresses into IB link-level addresses hard-code
the scope to link-local, and they also leave the partition key field
uninitialised. This patch adds a parameter (the link-level broadcast
address) to the mapping routines, allowing them to initialise both the
scope and the P_Key appropriately, and fixes up the call sites.

The next step will be to add a way to configure the scope for an IPoIB
interface.

Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
68e995a295720439ad2bf8677114cdf9d262d905 25-Jan-2008 Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com> IPoIB/cm: Add connected mode support for devices without SRQs

Some IB adapters (notably IBM's eHCA) do not implement SRQs (shared
receive queues). The current IPoIB connected mode support only works
on devices that support SRQs.

Fix this by adding support for using the receive queue of each
connected mode receive QP. The disadvantage of this compared to using
an SRQ is that it means a full queue of receives must be posted for
each remote connected mode peer, which means that total memory usage
is potentially much higher than when using SRQs. To manage this, add
a new module parameter "max_nonsrq_conn_qp" that limits the number of
connections allowed per interface.

The rest of the changes are fairly straightforward: we use a table of
struct ipoib_cm_rx to hold all the active connections, and put the
table index of the connection in the high bits of receive WR IDs.
This is needed because we cannot rely on the struct ib_wc.qp field for
non-SRQ receive completions. Most of the rest of the changes just
test whether or not an SRQ is available, and post receives or find
received packets in the right place depending on the answer.

Cleaning up dead connections actually becomes simpler, because we do
not have to do the "last WQE reached" dance that is required to
destroy QPs attached to an SRQ. We just move the QP to the error
state and wait for all pending receives to be flushed.

Signed-off-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>

[ Completely rewritten and split up, based on Pradeep's work. Several
bugs fixed and no doubt several bugs introduced. - Roland ]

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_cm.c
poib_main.c
poib_verbs.c
efcd99717f76c6d19dd81203c60fe198480de522 25-Jan-2008 Roland Dreier <rolandd@cisco.com> IPoIB/cm: Factor out ipoib_cm_free_rx_reap_list()

Factor out the code for going through the rx_reap list of struct
ipoib_cm_rx and freeing each one. This consolidates the code
duplicated between ipoib_cm_dev_stop() and ipoib_cm_rx_reap() and
reduces the risk of error when adding additional accounting.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
7b3687df66cab4ecd6efb42cfa0c7de60cc4e3b9 25-Jan-2008 Roland Dreier <rolandd@cisco.com> IPoIB/cm: Factor out ipoib_cm_create_srq()

Factor out the code to create an SRQ and allocate the receive ring in
ipoib_cm_dev_init() into a new function ipoib_cm_create_srq(). This
will make the code neater when support for devices that don't implement
SRQs is added.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
1efb61444ca3a598dfafb7a6c573c5d5d42d3432 25-Jan-2008 Roland Dreier <rolandd@cisco.com> IPoIB/cm: Factor out ipoib_cm_free_rx_ring()

Factor out the code to unmap/free skbs and free the receive ring in
ipoib_cm_dev_cleanup() into a new function ipoib_cm_free_rx_ring().
This function will be called from a couple of other places when
support for devices that don't implement SRQs is added.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
2337f80941ac22f747ce6fd2c7a79e91d911a3ce 24-Oct-2007 Roland Dreier <rolandd@cisco.com> IPoIB: Trivial formatting cleanups

Fix whitespace blunders, convert "foo* bar" to "foo *bar", etc.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_cm.c
poib_ib.c
poib_main.c
poib_multicast.c
poib_verbs.c
1401b53acc0328d96bacb2a3393d2852699df96b 26-Nov-2007 Jack Morgenstein <jackm@dev.mellanox.co.il> IPoIB: Fix oops if xmit is called when priv->broadcast is NULL

If a port goes down, ipoib_ib_dev_down() is invoked -- which flushes
the mcasts (clearing priv->broadcast) and clearing the path record
cache. If ipoib_start_xmit() is then invoked (before the broadcast
group is rejoined), a kernel oops results from attempting to access
priv->broadcast, which is still unset.

Returning NULL from path_rec_create() if priv->broadcast is NULL is a
harmless way of bypassing the problem -- the offending packet is
simply discarded "without prejudice."

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
09f60f8f54c5e2391f0b7c38dccd7b00d83587ab 26-Oct-2007 Roland Dreier <rolandd@cisco.com> IPoIB/cm: Fix receive QP cleanup

Commit 1b524963 ("IPoIB/cm: Use common CQ for CM send completions")
changed how the high-order bits of work request IDs were used, which
had the effect that IPOIB_CM_RX_DRAIN_WRID was no longer handled as a
connected mode receive completion. This leads to the messages

ib1: cm send completion event with wrid 1073741823 (> 64)
ib1: RX drain timing out

when an interface with connected mode QPs is brought down. Fix this
by making sure that both IPOIB_OP_CM and IPOIB_OP_RECV are set in
IPOIB_CM_RX_DRAIN_WRID.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
0b776eb5426752d4e53354ac89e3710d857e09a7 23-Oct-2007 Linus Torvalds <torvalds@woody.linux-foundation.org> Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
mlx4_core: Increase command timeout for INIT_HCA to 10 seconds
IPoIB/cm: Use common CQ for CM send completions
IB/uverbs: Fix checking of userspace object ownership
IB/mlx4: Sanity check userspace send queue sizes
IPoIB: Rewrite "if (!likely(...))" as "if (unlikely(!(...)))"
IB/ehca: Enable large page MRs by default
IB/ehca: Change meaning of hca_cap_mr_pgsize
IB/ehca: Fix ehca_encode_hwpage_size() and alloc_fmr()
IB/ehca: Fix masking error in {,re}reg_phys_mr()
IB/ehca: Supply QP token for SRQ base QPs
IPoIB: Use round_jiffies() for ah_reap_task
RDMA/cma: Fix deadlock destroying listen requests
RDMA/cma: Add locking around QP accesses
IB/mthca: Avoid alignment traps when writing doorbells
mlx4_core: Kill mlx4_write64_raw()
1b524963fd2d7fb20ea68df497151aa9d17fbca4 16-Aug-2007 Michael S. Tsirkin <mst@mellanox.co.il> IPoIB/cm: Use common CQ for CM send completions

Use the same CQ for CM send completions as for all other IPoIB
completions. This means all completions are processed via the same
NAPI polling routine. This should help reduce the number of
interrupts for bi-directional traffic (such as TCP) and fixes "driver
is hogging interrupts" errors reported for IPoIB send side, e.g.
<https://bugs.openfabrics.org/show_bug.cgi?id=508>

To do this, keep a per-interface counter of outstanding send WRs, and
stop the interface when this counter reaches the send queue size to
avoid CQ overruns.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_cm.c
poib_ib.c
poib_main.c
fd312561adcc90e924f35d3032d5493aeb4c3017 18-Oct-2007 Roland Dreier <rolandd@cisco.com> IPoIB: Rewrite "if (!likely(...))" as "if (unlikely(!(...)))"

It's too hard to figure out what "!likely(...)" really means, and who
knows how compilers interpret the hint.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
69fc507a1424ce31d2096be5b1e5b1750bdfe235 15-Oct-2007 Anton Blanchard <anton@samba.org> IPoIB: Use round_jiffies() for ah_reap_task

Use round_jiffies() to align the 1 second ah_reap_task with other work
and potentially save power by sleeping cores for longer.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_ib.c
200d1713b47200aa478f27e454e3d957264d49be 10-Oct-2007 Moni Shoua <monis@voltaire.com> IB/ipoib: Verify address handle validity on send

When the bonding device senses a carrier loss of its active slave it replaces
that slave with a new one. In between the times when the carrier of an IPoIB
device goes down and ipoib_neigh is destroyed, it is possible that the
bonding driver will send a packet on a new slave that uses an old ipoib_neigh.
This patch detects and prevents this from happenning.

Signed-off-by: Moni Shoua <monis at voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
Acked-by: Roland Dreier <rdreier@cisco.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
poib_main.c
732a2170f499ce7cf5f0bdd4f9e0b0c8337b67e1 10-Oct-2007 Moni Shoua <monis@voltaire.com> IB/ipoib: Bound the net device to the ipoib_neigh structue

IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
call.

When using the bonding driver, neighbours are created by the net stack on behalf
of the bonding (master) device. On the tx flow the bonding code gets an skb such
that skb->dev points to the master device, it changes this skb to point on the
slave device and calls the slave hard_start_xmit function.

Under this scheme, ipoib_neigh_destructor assumption that for each struct
neighbour it gets, n->dev is an ipoib device and hence netdev_priv(n->dev)
can be casted to struct ipoib_dev_priv is buggy.

To fix it, this patch adds a dev field to struct ipoib_neigh which is used
instead of the struct neighbour dev one, when n->dev->flags has the
IFF_MASTER bit set.

Signed-off-by: Moni Shoua <monis at voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
Acked-by: Roland Dreier <rdreier@cisco.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
poib.h
poib_main.c
poib_multicast.c
ce9d3c9a6a9aef61525be07fe6ba27d937236aa2 12-Oct-2007 Linus Torvalds <torvalds@woody.linux-foundation.org> Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (87 commits)
mlx4_core: Fix section mismatches
IPoIB: Allow setting policy to ignore multicast groups
IB/mthca: Mark error paths as unlikely() in post_srq_recv functions
IB/ipath: Minor fix to ordering of freeing and zeroing of tid pages.
IB/ipath: Remove redundant link state checks
IB/ipath: Fix IB_EVENT_PORT_ERR event
IB/ipath: Better handling of unexpected GPIO interrupts
IB/ipath: Maintain active time on all chips
IB/ipath: Fix QHT7040 serial number check
IB/ipath: Indicate a couple of chip bugs to userspace
IB/ipath: iba6110 rev4 no longer needs recv header overrun workaround
IB/ipath: Use counters in ipath_poll and cleanup interrupts in ipath_close
IB/ipath: Remove duplicate copy of LMC
IB/ipath: Add ability to set the LMC via the sysfs debugging interface
IB/ipath: Optimize completion queue entry insertion and polling
IB/ipath: Implement IB_EVENT_QP_LAST_WQE_REACHED
IB/ipath: Generate flush CQE when QP is in error state
IB/ipath: Remove redundant code
IB/ipath: Future proof eeprom checksum code (contents reading)
IB/ipath: UC RDMA WRITE with IMMEDIATE doesn't send the immediate
...
9153f66a5b8e63c61374df4e6a4cbd0056e45178 10-Oct-2007 Roland Dreier <rdreier@cisco.com> IPoIB: Fix unused variable warning

The conversion to use netdevice internal stats left an unused variable
in ipoib_neigh_free(), since there's no longer any reason to get
netdev_priv() in order to increment dropped packets. Delete the
unused priv variable.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
poib_main.c
de90351219a1f1fd3cb45cf6fcc4e9d6407fd2c9 29-Sep-2007 Roland Dreier <rolandd@cisco.com> [IPoIB]: Convert to netdevice internal stats

Use the stats member of struct netdevice in IPoIB, so we can save
memory by deleting the stats member of struct ipoib_dev_priv, and save
code by deleting ipoib_get_stats().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib.h
poib_cm.c
poib_ib.c
poib_main.c
poib_multicast.c
3b04ddde02cf1b6f14f2697da5c20eca5715017f 09-Oct-2007 Stephen Hemminger <shemminger@linux-foundation.org> [NET]: Move hardware header operations out of netdevice.

Since hardware header operations are part of the protocol class
not the device instance, make them into a separate object and
save memory.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib_main.c
10d024c1b2fd58af8362670d7d6e5ae52fc33353 17-Sep-2007 Ralf Baechle <ralf@linux-mips.org> [NET]: Nuke SET_MODULE_OWNER macro.

It's been a useless no-op for long enough in 2.6 so I figured it's time to
remove it. The number of people that could object because they're
maintaining unified 2.4 and 2.6 drivers is probably rather small.

[ Handled drivers added by netdev tree and some missed IRDA cases... -DaveM ]

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib_main.c
bea3348eef27e6044b6161fd04c3152215f96411 04-Oct-2007 Stephen Hemminger <shemminger@linux-foundation.org> [NET]: Make NAPI polling independent of struct net_device objects.

Several devices have multiple independant RX queues per net
device, and some have a single interrupt doorbell for several
queues.

In either case, it's easier to support layouts like that if the
structure representing the poll is independant from the net
device itself.

The signature of the ->poll() call back goes from:

int foo_poll(struct net_device *dev, int *budget)

to

int foo_poll(struct napi_struct *napi, int budget)

The caller is returned the number of RX packets processed (or
the number of "NAPI credits" consumed if you want to get
abstract). The callee no longer messes around bumping
dev->quota, *budget, etc. because that is all handled in the
caller upon return.

The napi_struct is to be embedded in the device driver private data
structures.

Furthermore, it is the driver's responsibility to disable all NAPI
instances in it's ->stop() device close handler. Since the
napi_struct is privatized into the driver's private data structures,
only the driver knows how to get at all of the napi_struct instances
it may have per-device.

With lots of help and suggestions from Rusty Russell, Roland Dreier,
Michael Chan, Jeff Garzik, and Jamal Hadi Salim.

Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra,
Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan.

[ Ported to current tree and all drivers converted. Integrated
Stephen's follow-on kerneldoc additions, and restored poll_list
handling to the old style to fix mutual exclusion issues. -DaveM ]

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib.h
poib_ib.c
poib_main.c
335a64a5a958002bc238c90de695e120c3c8c120 08-Oct-2007 Or Gerlitz <ogerlitz@voltaire.com> IPoIB: Allow setting policy to ignore multicast groups

The kernel IB stack allows (through the RDMA CM) userspace
applications to join and use multicast groups from the IPoIB MGID
range. This allows multicast traffic to be handled directly from
userspace QPs, without going through the kernel stack, which gives
better performance for some applications.

However, to fully interoperate with IP multicast, such userspace
applications need to participate in IGMP reports and queries, or else
routers may not forward the multicast traffic to the system where the
application is running. The simplest way to do this is to share the
kernel IGMP implementation by using the IP_ADD_MEMBERSHIP option to
join multicast groups that are being handled directly in userspace.

However, in such cases, the actual multicast traffic should not also
be handled by the IPoIB interface, because that would burn resources
handling multicast packets that will just be discarded in the kernel.

To handle this, this patch adds lookup on the database used for IB
multicast group reference counting when IPoIB is joining multicast
groups, and if a multicast group is already handled by user space,
then the IPoIB kernel driver ignores the group. This is controlled by
a per-interface policy flag. When the flag is set, IPoIB will not
join and attach its QP to a multicast group which already has an entry
in the database; when the flag is cleared, IPoIB will behave as before
this change.

For each IPoIB interface, the /sys/class/net/$intf/umcast attribute
controls the policy flag. The default value is off/0.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_main.c
poib_multicast.c
poib_vlan.c
ede6bc04f3a07a9c93f02c92cdc281d254398321 07-Oct-2007 Dotan Barak <dotanb@dev.mellanox.co.il> IPoIB/cm: Clean up initialization of QP attr in ipoib_cm_create_tx_qp()

Make the way QP is being created in ipoib_cm_create_tx_qp()
consistent with ipoib_cm_create_rx_qp().

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
81668838c4583b19276b16382e0c61e21ef5adf0 02-Aug-2007 Sean Hefty <sean.hefty@intel.com> IPoIB: Specify Traffic Class with path record queries for QoS support

To support QoS within and between subnets, modify IPoIB to request
specific Traffic Class values with path record queries, using
the value associated with the IPoIB broadcast group.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

[ See some comments I made on this at v1 and v2 of the posts
<http://lists.openfabrics.org/pipermail/general/2007-August/039275.html>
<http://lists.openfabrics.org/pipermail/general/2007-September/040312.html> ]

Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_main.c
poib_multicast.c
ca6de177acef8f2c7c3901ea583a263364ca7bbb 21-Aug-2007 Eli Cohen <eli@mellanox.co.il> IPoIB: Fix error path memory leak

Clean up properly if ib_query_pkey() or ib_query_gid() fail.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
b3ac60fc243f2312d27ecded058ef96f52f25fe0 10-Oct-2007 Eli Cohen <eli@mellanox.co.il> IPoIB: Fix typo to end statement with ';' instead of ','

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_verbs.c
ce423ef50ee1b6b7db63c748034423aa0afce224 10-Oct-2007 Roland Dreier <rolandd@cisco.com> IPoIB: Make sure no receives are handled when stopping device

The current IPoIB code might process receive completions from
ipoib_drain_cq() when bringing down the interface. This could cause
packets to be passed up the stack without the device's poll method
being called. Avoid this by setting the status of any successful
completions to IB_WC_WR_FLUSH_ERR.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_ib.c
6958e827f187c9c5cd39af075567f74f02bf3dd1 06-Aug-2007 Jack Morgenstein <jackm@dev.mellanox.co.il> IPoIB: Fix leak in ipoib_transport_dev_init() error path

ipoib_transport_dev_init() calls ipoib_cm_dev_init(), so it needs to
call ipoib_cm_dev_cleanup() to unwind that on the error path.

Found by Dotan Barak of Mellanox.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_verbs.c
1d84612649427a85e1f311baa7215f9a6252d856 18-Jun-2007 Sean Hefty <sean.hefty@intel.com> IB/cm: Include HCA ACK delay in local ACK timeout

The IB CM should include the HCA ACK delay when calculating the local
ACK timeout value to use for RC QPs. If the HCA ACK delay is large
enough relative to the packet life time, then if it is not taken into
account, the calculated timeout value ends up being too small, which
can result in "retry exceeded" errors.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
1b844afe9e67d6cd441ae6df71051b4004f31dd2 10-Jul-2007 Roland Dreier <rolandd@cisco.com> IPoIB: Recycle loopback skbs instead of freeing and reallocating

InfiniBand HCAs replicate multicast packets back to the QP that sent
them if that QP is attached to the destination multicast group. This
means that IPoIB multicasts are often replicated back to the receive
queue of the interface that generated them. To avoid confusing the
network stack, we drop these duplicates within the IPoIB driver.

However, there's no reason to free the skb that received the duplicate
and then immediately allocate a new skb to post to the receive queue.
We can be more efficient and just repost the same skb.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_ib.c
20089ca55786a243c7b72becd1bf670f4e2c2028 10-Jul-2007 Roland Dreier <rolandd@cisco.com> IPoIB/cm: Fix warning if IPV6 is not enabled

Fix

drivers/infiniband/ulp/ipoib/ipoib_cm.c:1151: warning: unused variable 'dev'

by getting rid of the variable dev, which is only used if CONFIG_IPV6
is enabled, and replacing the one use of it with the value it is
assigned, namely priv->dev.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
06cc85086e6896939f8c68f8518224748f6b0b2f 23-May-2007 Jan Engelhardt <jengelh@linux01.gwdg.de> IB: Use menuconfig for InfiniBand menu

Change Kconfig objects from "menu, config" into "menuconfig" so
that the user can disable the whole feature without having to
enter the menu first.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
config
841adfca9c5fc0fec6b1f0b2e5eb7a3b239a7730 29-Jun-2007 Ralph Campbell <ralph.campbell@qlogic.com> IPoIB/cm: Partial error clean up unmaps wrong address

If a page can't be allocated for the frag list of a skb, the code to
unmap the partially allocated list is off by one. For exaple, if
'frags' equals one, i == 0, and the alloc_page() fails, then the old
loop would have unmapped mapping[1] which is uninitialized. The same
would happen if the call to ib_dma_map_page() failed.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Acked-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
13ef5f44c3931dff1d75443a875e97b588d4b8f0 21-Jun-2007 Roland Dreier <rolandd@cisco.com> IPoIB/cm: Remove dead definition of struct ipoib_cm_id

It's completely unused.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
82c3aca6ad9004169df8f2f8c0747686fe4003b3 20-Jun-2007 Michael S. Tsirkin <mst@dev.mellanox.co.il> IPoIB/cm: Fix interoperability when MTU doesn't match

IPoIB connected mode currently rejects a connection request unless the
supported MTU is >= the local netdevice MTU. This breaks
interoperability with implementations that might have tweaked
IPOIB_CM_MTU, and there's real no longer a reason to do so: this test
is just a leftover from when we did not tweak MTU per-connection. Fix
this by making the test as permissive as possible.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
3ec7393a6858a1716e74aa81be6af76fd180021d 19-Jun-2007 Michael S. Tsirkin <mst@dev.mellanox.co.il> IPoIB/cm: Initialize RX before moving QP to RTR

Fix a crasher bug in IPoIB CM: once a QP is in the RTR state, a
receive completion (or even an asynchronous error) might be observed
on this QP, so we have to initialize all of our receive data
structures before moving to the RTR state.

As an optimization (since modify_qp might take a long time), the
jiffies update done when moving RX to the passive_ids list is also
left in place to reduce the chance of the RX being misdetected as
stale.

This fixes bug <https://bugs.openfabrics.org/show_bug.cgi?id=662>.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
ec56dc0b7f6c3fec20bbc2e98ff1a06edf2fc9b9 28-May-2007 Michael S. Tsirkin <mst@dev.mellanox.co.il> IPoIB/cm: Fix performance regression on Mellanox

commit 518b1646 ("IPoIB/cm: Fix SRQ WR leak") introduced a severe
performance regression on Mellanox cards, because keeping a QP in the
error state for extended periods of time moves hardware to the slow
path (until the QP is destroyed). For example, MPI latency goes from
~3 usecs to ~7 usecs.

Fix this by posting a send WR on one of the QPs that are being
flushed, instead of using a separate drain QP that is kept in the
error state.

This fixes bug <https://bugs.openfabrics.org/show_bug.cgi?id=636>,
reported and bisected by Scott Weitzenkamp at Cisco and debugged by
Sasha Mikheev at Voltaire.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_cm.c
2dfbfc37121d307e1f1d24c2979382cb17b19347 24-May-2007 Michael S. Tsirkin <mst@dev.mellanox.co.il> IPoIB/cm: Drain cq in ipoib_cm_dev_stop()

Since NAPI polling is disabled while ipoib_cm_dev_stop() is running,
ipoib_cm_dev_stop() must poll the CQ itself in order to see the
packets draining.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_cm.c
poib_ib.c
8fd357a6e3375083f7d321413eb8f6739491f342 24-May-2007 Michael S. Tsirkin <mst@dev.mellanox.co.il> IPoIB/cm: Fix timeout check in ipoib_cm_dev_stop()

time_after() was used backwards, so the timeout occurred immediately.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
518b1646f8a31904ca637b8df0c1e31c34a7a3c2 21-May-2007 Michael S. Tsirkin <mst@dev.mellanox.co.il> IPoIB/cm: Fix SRQ WR leak

SRQ WR leakage has been observed with IPoIB/CM: e.g. flipping ports on
and off will, with time, leak out all WRs and then all connections
will start getting RNR NAKs. Fix this in the way suggested by spec:
move the QP being destroyed to the error state, wait for "Last WQE
Reached" event and then post WR on a "drain QP" connected to the same
CQ. Once we observe a completion on the drain QP, it's safe to call
ib_destroy_qp.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_cm.c
poib_verbs.c
24bd1e4e32e88cd3d0675482d15bea498a922ca8 18-May-2007 Michael S. Tsirkin <mst@dev.mellanox.co.il> IB/ipoib: Fix typos in error messages

Trivial error message fixups.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_ib.c
poib_multicast.c
26bbf13ce1ca21ec69175bcc4b995cb8ffdf8593 19-May-2007 Yosef Etigin <yosefe@voltaire.com> IPoIB: Handle P_Key table reordering

SM reconfiguration or failover possibly causes a shuffling of the values
in the P_Key table. Right now, IPoIB only queries for the P_Key index
once when it creates the device QP, and hence there are problems if the
index of a P_Key value changes. Fix this by using the PKEY_CHANGE event
to trigger a recheck of the P_Key index.

Signed-off-by: Yosef Etigin <yosefe@voltaire.com>
Acked-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_ib.c
poib_main.c
poib_verbs.c
7c5b9ef8577bfa7b74ea58fc9ff2934ffce13532 14-May-2007 Michael S. Tsirkin <mst@dev.mellanox.co.il> IPoIB/cm: Optimize stale connection detection

In the presence of some running RX connections, we repeat
queue_delayed_work calls each 4 RX WRs, which is a waste. It's enough
to start stale task when a first passive connection is added, and
rerun it every IPOIB_CM_RX_DELAY as long as there are outstanding
passive connections.

This removes some code from RX data path.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
972d45fb43f0f0793fa275c4a22998106760cd61 07-May-2007 Linus Torvalds <torvalds@woody.linux-foundation.org> Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband

* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
IPoIB: Convert to NAPI
IB: Return "maybe missed event" hint from ib_req_notify_cq()
IB: Add CQ comp_vector support
IB/ipath: Fix a race condition when generating ACKs
IB/ipath: Fix two more spin lock problems
IB/fmr_pool: Add prefix to all printks
IB/srp: Set proc_name
IB/srp: Add orig_dgid sysfs attribute to scsi_host
IPoIB/cm: Don't crash if remote side uses one QP for both directions
RDMA/cxgb3: Support for new abort logic
RDMA/cxgb3: Initialize cpu_idx field in cpl_close_listserv_req message
RDMA/cxgb3: Fail qp creation if the requested max_inline is too large
RDMA/cxgb3: Fix TERM codes
IPoIB/cm: Fix error handling in ipoib_cm_dev_open()
IB/ipath: Don't corrupt pending mmap list when unmapped objects are freed
IB/mthca: Work around kernel QP starvation
IB/ipath: Don't put QP in timeout queue if waiting to send
IB/ipath: Don't call spin_lock_irq() from interrupt context
8d1cc86a6278687efbab7b8c294ab01efe4d4231 07-May-2007 Roland Dreier <rolandd@cisco.com> IPoIB: Convert to NAPI

Convert the IP-over-InfiniBand network device driver over to using
NAPI to handle completions for the main CQ. This covers all receives
as well as datagram mode sends; send completions for connected mode
connections are still handled from interrupt context.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_cm.c
poib_ib.c
poib_main.c
f4fd0b224d60044d2da5ca02f8f2b5150c1d8731 03-May-2007 Michael S. Tsirkin <mst@dev.mellanox.co.il> IB: Add CQ comp_vector support

Add a num_comp_vectors member to struct ib_device and extend
ib_create_cq() to pass in a comp_vector parameter -- this parallels
the userspace libibverbs API. Update all hardware drivers to set
num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector
value. Pass the value of num_comp_vectors to userspace rather than
hard-coding a value of 1.

We want multiple CQ event vector support (via MSI-X or similar for
adapters that can generate multiple interrupts), but it's not clear
how many vectors we want, or how we want to deal with policy issues
such as how to decide which vector to use or how to set up interrupt
affinity. This patch is useful for experimenting, since no core
changes will be necessary when updating a driver to support multiple
vectors, and we know that we want to make at least these changes
anyway.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
poib_verbs.c
d6ef7d68f6f51c5b9de01c542dab8b90067a9c27 02-May-2007 Michael S. Tsirkin <mst@dev.mellanox.co.il> IPoIB/cm: Don't crash if remote side uses one QP for both directions

The IPoIB CM spec allows the use of a single connection in both
active->passive and passive->active directions. The current Linux
code uses one connection for both directions, but if another node only
uses one connection for both directions, we oops when we try to look
up the passive connection. Fix by checking that qp_context is
non-NULL before dereferencing it.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
poib_cm.c
6473d160b4aba8023bcf38519a5989694dfd51a7 06-Mar-2007 Jean Delvare <khali@linux-fr.org> PCI: Cleanup the includes of <linux/pci.h>

I noticed that many source files include <linux/pci.h> while they do
not appear to need it. Here is an attempt to clean it all up.

In order to find all possibly affected files, I searched for all
files including <linux/pci.h> but without any other occurence of "pci"
or "PCI". I removed the include statement from all of these, then I
compiled an allmodconfig kernel on both i386 and x86_64 and fixed the
false positives manually.

My tests covered 66% of the affected files, so there could be false
positives remaining. Untested files are:

arch/alpha/kernel/err_common.c
arch/alpha/kernel/err_ev6.c
arch/alpha/kernel/err_ev7.c
arch/ia64/sn/kernel/huberror.c
arch/ia64/sn/kernel/xpnet.c
arch/m68knommu/kernel/dma.c
arch/mips/lib/iomap.c
arch/powerpc/platforms/pseries/ras.c
arch/ppc/8260_io/enet.c
arch/ppc/8260_io/fcc_enet.c
arch/ppc/8xx_io/enet.c
arch/ppc/syslib/ppc4xx_sgdma.c
arch/sh64/mach-cayman/iomap.c
arch/xtensa/kernel/xtensa_ksyms.c
arch/xtensa/platform-iss/setup.c
drivers/i2c/busses/i2c-at91.c
drivers/i2c/busses/i2c-mpc.c
drivers/media/video/saa711x.c
drivers/misc/hdpuftrs/hdpu_cpustate.c
drivers/misc/hdpuftrs/hdpu_nexus.c
drivers/net/au1000_eth.c
drivers/net/fec_8xx/fec_main.c
drivers/net/fec_8xx/fec_mii.c
drivers/net/fs_enet/fs_enet-main.c
drivers/net/fs_enet/mac-fcc.c
drivers/net/fs_enet/mac-fec.c
drivers/net/fs_enet/mac-scc.c
drivers/net/fs_enet/mii-bitbang.c
drivers/net/fs_enet/mii-fec.c
drivers/net/ibm_emac/ibm_emac_core.c
drivers/net/lasi_82596.c
drivers/parisc/hppb.c
drivers/sbus/sbus.c
drivers/video/g364fb.c
drivers/video/platinumfb.c
drivers/video/stifb.c
drivers/video/valkyriefb.c
include/asm-arm/arch-ixp4xx/dma.h
sound/oss/au1550_ac97.c

I would welcome test reports for these files. I am fine with removing
the untested files from the patch if the general opinion is that these
changes aren't safe. The tested part would still be nice to have.

Note that this patch depends on another header fixup patch I submitted
to LKML yesterday:
[PATCH] scatterlist.h needs types.h
http://lkml.org/lkml/2007/3/01/141

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
poib.h
347fcfbed261fdd11f46fa03d524e1bddddab3a6 01-May-2007 Michael S. Tsirkin <mst@dev.mellanox.co.il> IPoIB/cm: Fix error handling in ipoib_cm_dev_open()

If skb allocation fails when we start the device, we call
ipoib_cm_dev_stop() even though ipoib_cm_dev_open() did not run to
completion, so we pass an invalid pointer to ib_destroy_cm_id and get
an oops.

Fix by clearing cm.id on error, and testing it in cm_dev_stop().
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=561>

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
afc2e82c0851317931a9bfdb98271253371825c6 27-Apr-2007 Linus Torvalds <torvalds@woody.linux-foundation.org> Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband

* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband: (49 commits)
IB: Set class_dev->dev in core for nice device symlink
IB/ehca: Implement modify_port
IB/umad: Clarify documentation of transaction ID
IPoIB/cm: spin_lock_irqsave() -> spin_lock_irq() replacements
IB/mad: Change SMI to use enums rather than magic return codes
IB/umad: Implement GRH handling for sent/received MADs
IB/ipoib: Use ib_init_ah_from_path to initialize ah_attr
IB/sa: Set src_path_bits correctly in ib_init_ah_from_path()
IB/ucm: Simplify ib_ucm_event()
RDMA/ucma: Simplify ucma_get_event()
IB/mthca: Simplify CQ cleaning in mthca_free_qp()
IB/mthca: Fix mthca_write_mtt() on HCAs with hidden memory
IB/mthca: Update HCA firmware revisions
IB/ipath: Fix WC format drift between user and kernel space
IB/ipath: Check that a UD work request's address handle is valid
IB/ipath: Remove duplicate stuff from ipath_verbs.h
IB/ipath: Check reserved memory keys
IB/ipath: Fix unit selection when all CPU affinity bits set
IB/ipath: Don't allow QPs 0 and 1 to be opened multiple times
IB/ipath: Disable IB link earlier in shutdown sequence
...
459a98ed881802dee55897441bc7f77af614368e 19-Mar-2007 Arnaldo Carvalho de Melo <acme@redhat.com> [SK_BUFF]: Introduce skb_reset_mac_header(skb)

For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can
later turn skb->mac.raw into a offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.

This one touches just the most simple case, next will handle the slightly more
"complex" cases.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib_cm.c
poib_ib.c
37aebbde7023d75bf09fbadb6796276d0a65a068 25-Apr-2007 Roland Dreier <rolandd@cisco.com> IPoIB/cm: spin_lock_irqsave() -> spin_lock_irq() replacements

There are quite a few places in ipoib_cm.c where we know IRQs are
enabled because we do something that sleeps in the same function, so
we can convert several occurrences of spin_lock_irqsave() to a plain
spin_lock_irq(). This cleans up the source a little and makes the
code smaller too:

add/remove: 0/0 grow/shrink: 1/5 up/down: 3/-51 (-48)
function old new delta
ipoib_cm_tx_reap 403 406 +3
ipoib_cm_stale_task 146 145 -1
ipoib_cm_dev_stop 173 172 -1
ipoib_cm_tx_handler 964 956 -8
ipoib_cm_rx_handler 956 937 -19
ipoib_cm_skb_reap 212 190 -22

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
46f1b3d7aff99ef4c1e729e023b9c8ee51de5973 05-Apr-2007 Sean Hefty <sean.hefty@intel.com> IB/ipoib: Use ib_init_ah_from_path to initialize ah_attr

To support destinations that are not on the local IB subnet, IPoIB
should include the GRH information when constructing an address
handle. Using the existing ib_init_ah_from_path() call will do this
for us.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
poib_main.c
a89875fc7e23ec91561bc3742df3bd5d12b376b4 19-Apr-2007 Roland Dreier <rolandd@cisco.com> IPoIB: Remove pointless opcode field from debugging output

There's no point in printing the opcode field in the completion
handling debugging output, since the type of completion is already
printed at the beginning of the line. In fact the opcode field is not
even defined for completions with a status other than success.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
poib_ib.c
6371ea3d48e17d4638a91a4a1e0364e56204e418 10-Apr-2007 Michael S. Tsirkin <mst@dev.mellanox.co.il> IPoIB/cm: Fix DMA direction typo

Receive buffers need to be mapped with DMA_FROM_DEVICE. Incorrectly
mapping with DMA_TO_DEVICE causes a hard lock on ppc64 machines with
an IOMMU.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=431>

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
ecbb416939da77c0d107409976499724baddce7b 24-Mar-2007 Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> [NET]: Fix neighbour destructor handling.

->neigh_destructor() is killed (not used), replaced with
->neigh_cleanup(), which is called when neighbor entry goes to dead
state. At this point everything is still valid: neigh->dev,
neigh->parms etc.

The device should guarantee that dead neighbor entries (neigh->dead !=
0) do not get private part initialized, otherwise nobody will cleanup
it.

I think this is enough for ipoib which is the only user of this thing.
Initialization private part of neighbor entries happens in ipib
start_xmit routine, which is not reached when device is down. But it
would be better to add explicit test for neigh->dead in any case.

Signed-off-by: David S. Miller <davem@davemloft.net>
poib_main.c
77d8e1efea0e313edc710160c232a6fd2dc9f907 21-Mar-2007 Michael S. Tsirkin <mst@dev.mellanox.co.il> IB/ipoib: Fix thinko in packet length checks

The packet length checks in ipoib are broken: we add 4 bytes (IPoIB
encapsulation header) when sending a packet, not 20 bytes (hardware
address length) to each packet. Therefore, if connected mode is
enabled so that the interface MTU is larger than the multicast MTU,
IPoIB may end up trying to send too-long multicast packets. For
example, multicast is broken if a message of size 2048 bytes is sent
on an interface with UD MTU 2048, because 2048 is bigger than the real
limit of 2044 but the code tests against the wrong limit of 2060.

This patch fixes <https://bugs.openfabrics.org/show_bug.cgi?id=418>,
submitted by Scott Weitzenkamp <sweitzen@cisco.com>.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
poib_ib.c
d04d01b113be5b88418eb30087753c3de0a39fd8 22-Mar-2007 Michael S. Tsirkin <mst@dev.mellanox.co.il> IPoIB: Fix use-after-free in path_rec_completion()

The connected mode code added the possibility that an neigh struct
gets freed in the list_for_each_entry() loop in path_rec_completion(),
which causes a use-after-free. Fix this by changing to the _safe
variant of the list walking macro.

This was spotted by the Coverity checker (CID 1567).

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
e07832b66285124038a96b25a2056e91a55d8b1e 19-Mar-2007 Sean Hefty <sean.hefty@intel.com> IPoIB: Fix race in detaching from mcast group before attaching

There's a race between ipoib_mcast_leave() and ipoib_mcast_join_finish()
where we can try to detach from a multicast group before we've
attached to it. Fix this by reordering the code in ipoib_mcast_leave
to free the multicast group first, which waits for the multicast
callback thread (which calls ipoib_mcast_join_finish()) to complete
before detaching from the group.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
60a596dab7c82bdfa5ee7abcee8e0ce385d4ef21 22-Mar-2007 Michael S. Tsirkin <mst@dev.mellanox.co.il> IPoIB/cm: Fix reaping of stale connections

The sense of the time_after_eq() test in ipoib_cm_stale_task() is
reversed so that only non-stale connections are reaped. Fix this by
changing to time_before_eq().

Noticed by Pradeep Satyanarayana <pradeep@us.ibm.com>.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
55c9adde13dfc6b738e0f70c071a0622e52f35ed 08-Mar-2007 Shirley Ma <xma@us.ibm.com> IPoIB: Turn on interface's carrier after broadcast group is joined

Do netif_carrier_on() right after the IPv4 broadcast multicast group
is joined, rather than waiting for all of the initial set of multicast
group joins to finish. This allows at least IPv4 traffic to limp
along on broken fabrics where not all multicast groups can be joined.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
a27cbe878203076247c1b5287f5ab59ed143b560 27-Feb-2007 Roland Dreier <rolandd@cisco.com> IPoIB: Only handle async events for one port

An asynchronous event carries the port number that the event occurred
on, so there's no reason for an IPoIB interface to process an event
associated with a different local HCA port.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_verbs.c
843613b04744d5b65c2f37975c5310f366a0d070 26-Feb-2007 Roland Dreier <rolandd@cisco.com> IPoIB: Correct debugging output when path record lookup fails

If path_rec_completion() is passed a non-NULL path record pointer
along with an unsuccessful status value, the tracing code incorrectly
prints the (invalid) DLID from the path record rather than the more
interesting status code. The actual logic of the function correctly
uses the path record only if the status indicates a successful lookup.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
658bcef619f50d9eb6028452ff9e1ad4a96c2af9 22-Feb-2007 Roland Dreier <rolandd@cisco.com> IPoIB: Remove unused local_rate tracking

Now that low-level drivers handle the conversion from an absolute rate
to a relative rate, there's no need for the IPoIB driver to keep track
of the local port's data rate.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_multicast.c
1812063ba3365aeb5eb9117861a7fb05432f7ed0 20-Feb-2007 Michael S. Tsirkin <mst@mellanox.co.il> IPoIB/cm: Improve small message bandwidth

Avoid the overhead of freeing/reallocating and mapping/unmapping for
DMA pages that have not been written to by hardware.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
faec2f7b96b555055d0aa6cc6b83a537270bed52 16-Feb-2007 Sean Hefty <sean.hefty@intel.com> IB/sa: Track multicast join/leave requests

The IB SA tracks multicast join/leave requests on a per port basis and
does not do any reference counting: if two users of the same port join
the same group, and one leaves that group, then the SA will remove the
port from the group even though there is one user who wants to stay a
member left. Therefore, in order to support multiple users of the
same multicast group from the same port, we need to perform reference
counting locally.

To do this, add an multicast submodule to ib_sa to perform reference
counting of multicast join/leave operations. Modify ib_ipoib (the
only in-kernel user of multicast) to use the new interface.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
8a2e65f87c66ab1e720f49378750cdd800f9e9cf 15-Feb-2007 Michael S. Tsirkin <mst@mellanox.co.il> IPoIB: CM error handling thinko fix

ipoib_cm_alloc_rx_skb() might be called from IRQ context, so it must
use dev_kfree_skb_any(), not kfree_skb().

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
551fd6122d247d76124c4fdb6eb898cc8e3d74aa 16-Feb-2007 Roland Dreier <rolandd@cisco.com> IPoIB: Only allow root to change between datagram and connected mode

Change the permissions of the "mode" sysfs attribute to be S_IWUSR
instead of S_IWUGO.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_cm.c
93bbad8fe13a25dcf7f3bc628a71d1a7642ae61b 14-Feb-2007 Linus Torvalds <torvalds@woody.linux-foundation.org> Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband

* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
IB/mthca: Always fill MTTs from CPU
IB/mthca: Merge MR and FMR space on 64-bit systems
IB/mthca: Fix access to MTT and MPT tables on non-cache-coherent CPUs
IB/mthca: Give reserved MTTs a separate cache line
IB/mthca: Fix reserved MTTs calculation on mem-free HCAs
RDMA/cxgb3: Add driver for Chelsio T3 RNIC
IB: Remove redundant "_wq" from workqueue names
RDMA/cma: Increment port number after close to avoid re-use
IB/ehca: Fix memleak on module unloading
IB/mthca: Work around gcc bug on sparc64
IPoIB: Connected mode experimental support
IB/core: Use ARRAY_SIZE macro for mandatory_table
IB/mthca: Use correct structure size in call to memset()
2b8693c0617e972fc0b2fd1ebf8de97e15b656c3 12-Feb-2007 Arjan van de Ven <arjan@linux.intel.com> [PATCH] mark struct file_operations const 3

Many struct file_operations in the kernel can be "const". Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data. In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
poib_fs.c
839fcaba355abaffb7b44f0f4504093acb0b11cf 05-Feb-2007 Michael S. Tsirkin <mst@mellanox.co.il> IPoIB: Connected mode experimental support

The following patch adds experimental support for IPoIB connected
mode, as defined by the draft from the IETF ipoib working group. The
idea is to increase performance by increasing the MTU from the maximum
of 2K (theoretically 4K) supported by IPoIB on top of UD. With this
code, I'm able to get 800MByte/sec or more with netperf without
options on a Mellanox 4x back-to-back DDR system.

Some notes on code:
1. SRQ is used for scalability to large cluster sizes
2. Only RC connections are used (UC does not support SRQ now)
3. Retry count is set to 0 since spec draft warns against retries
4. Each connection is used for data transfers in only 1 direction, so
each connection is either active(TX) or passive (RX). 2 sides that
want to communicate create 2 connections.
5. Each active (TX) connection has a separate CQ for send completions -
this keeps the code simple without CQ resize and other tricks
6. To detect stale passive side connections (where the remote side is
down), we keep an LRU list of passive connections (updated once per
second per connection) and destroy a connection after it has been
unused for several seconds. The LRU rule makes it possible to avoid
scanning connections that have recently been active.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
config
akefile
poib.h
poib_cm.c
poib_ib.c
poib_main.c
poib_multicast.c
poib_verbs.c
poib_vlan.c
43cb76d91ee85f579a69d42bc8efc08bac560278 09-Apr-2002 Greg Kroah-Hartman <gregkh@suse.de> Network: convert network devices to use struct device instead of class_device

This lets the network core have the ability to handle suspend/resume
issues, if it wants to.

Thanks to Frederik Deweerdt <frederik.deweerdt@gmail.com> for the arm
driver fixes.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
poib_main.c
poib_vlan.c
82b399133b6ebf667ee635fc69ef26b61eede4bc 12-Dec-2006 Roland Dreier <rolandd@cisco.com> IPoIB: Make sure struct ipoib_neigh.queue is always initialized

Move the initialization of ipoib_neigh's skb_queue into
ipoib_neigh_alloc(), since commit 2745b5b7 ("IPoIB: Fix skb leak when
freeing neighbour") will make iterate over the skb_queue to free any
packets left over when freeing the ipoib_neigh structure.

This fixes a crash when freeing ipoib_neigh structures allocated in
ipoib_mcast_send(), which otherwise don't have their skb_queue
initialized.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
37ccf9df974f55e99bf21278133b065cbbcf3f79 12-Dec-2006 Ralph Campbell <ralph.campbell@qlogic.com> IPoIB: Use the new verbs DMA mapping functions

Convert IPoIB to use the new DMA mapping functions
for kernel verbs consumers.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_ib.c
4c1ac1b49122b805adfa4efc620592f68dccf5db 05-Dec-2006 David Howells <dhowells@redhat.com> Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6

Conflicts:

drivers/infiniband/core/iwcm.c
drivers/net/chelsio/cxgb2.c
drivers/net/wireless/bcm43xx/bcm43xx_main.c
drivers/net/wireless/prism54/islpci_eth.c
drivers/usb/core/hub.h
drivers/usb/input/hid-core.c
net/core/netpoll.c

Fix up merge failures with Linus's head and fix new compilation failures.

Signed-Off-By: David Howells <dhowells@redhat.com>
2745b5b713bf3457d8977c62dc2b3aa61f4a14b0 16-Nov-2006 Michael S. Tsirkin <mst@mellanox.co.il> IPoIB: Fix skb leak when freeing neighbour

ipoib_neigh_free() is sometimes called while neighbour is still alive,
so it might still have queued skbs. Fix skb leak in this case.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_main.c
poib_multicast.c
c4028958b6ecad064b1a6303a6a5906d4fe48d73 22-Nov-2006 David Howells <dhowells@redhat.com> WorkStruct: make allyesconfig

Fix up for make allyesconfig.

Signed-Off-By: David Howells <dhowells@redhat.com>
poib.h
poib_ib.c
poib_main.c
poib_multicast.c
073ae841d6a5098f7c6e17fc1f329350d950d1ce 16-Nov-2006 Michael S. Tsirkin <mst@mellanox.co.il> IPoIB: Clear high octet in QP number

IPoIB assumes that high (reserved) octet in the hardware address is 0,
and copies it into the QPN. This violates RFC 4391 (which requires
that the high 8 bits are ignored on receive), and will result in an
invalid QPN being used when interoperating with IPoIB connected mode.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
73fbe8be73512b8a3ffa3d20c9d7f531af99679c 10-Oct-2006 Roland Dreier <rolandd@cisco.com> IPoIB: Check for DMA mapping error for TX packets

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_ib.c
cab00891c5489cb6d0cde0a55d39bd5f2871fa70 03-Oct-2006 Matt LaPlante <kernel1@cyberdogtech.com> Still more typo fixes

Signed-off-by: Adrian Bunk <bunk@stusta.de>
config
8e18e2941c53416aa219708e7dcad21fb4bd6794 27-Sep-2006 Theodore Ts'o <tytso@mit.edu> [PATCH] inode_diet: Replace inode.u.generic_ip with inode.i_private

The following patches reduce the size of the VFS inode structure by 28 bytes
on a UP x86. (It would be more on an x86_64 system). This is a 10% reduction
in the inode size on a UP kernel that is configured in a production mode
(i.e., with no spinlock or other debugging functions enabled; if you want to
save memory taken up by in-core inodes, the first thing you should do is
disable the debugging options; they are responsible for a huge amount of bloat
in the VFS inode structure).

This patch:

The filesystem or device-specific pointer in the inode is inside a union,
which is pretty pointless given that all 30+ users of this field have been
using the void pointer. Get rid of the union and rename it to i_private, with
a comment to explain who is allowed to use the void pointer. This is just a
cleanup, but it allows us to reuse the union 'u' for something something where
the union will actually be used.

[judith@osdl.org: powerpc build fix]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Judith Lebzelter <judith@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
poib_fs.c
a8bfca024326560d86c6323b0504288ca55a75fc 23-Sep-2006 Eli Cohen <eli@dev.mellanox.co.il> IPoIB: Add some likely/unlikely annotations in hot path

Signed-off-by: Eli Cohen <eli@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_ib.c
poib_main.c
507c33504686e733a14ef0b2dc9db0c20fae4653 21-Sep-2006 Dotan Barak <dotanb@dev.mellanox.co.il> IPoIB: Remove unused include of vmalloc.h

IPoIB doesn't use anything from <linux/vmalloc.h>, so don't include it.

Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
5ccd025553d73e523212ee0860b7f4a75e886bfa 23-Sep-2006 Eli Cohen <eli@mellanox.co.il> IPoIB: Rejoin all multicast groups after a port event

When ipoib_ib_dev_flush() is called because of a port event, the
driver needs to rejoin all multicast groups, since the flush will call
ipoib_mcast_dev_flush() (via ipoib_ib_dev_down()). Otherwise no
(non-broadcast) multicast groups will be rejoined until the networking
core calls ->set_multicast_list again, and so multicast reception will
be broken for potentially a long time.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_ib.c
d0df6d6d4539241179a1ef5394787825bf05bbce 23-Sep-2006 Roland Dreier <rolandd@cisco.com> IPoIB: Create MCGs with all attributes required by RFC

RFC 4391 ("Transmission of IP over InfiniBand (IPoIB)") says:

If the IB multicast group does not already exist, one must be
created first with the IPoIB link MTU. The MGID MUST use the same
P_Key, Q_Key, SL, MTU, and HopLimit as those used in the
broadcast-GID. The rest of attributes SHOULD follow the values used
in the broadcast-GID as well.

However, the current IPoIB driver is only setting the attributes
required by the InfiniBand spec to create a multicast group, so in
particular the MTU and HopLimit are not being set. Add these
attributes when creating MCGs, and also set the Rate attribute, since
IPoIB pays attention to that attribute as well.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
c1a0b23bf477c2e1068905f4e2b5c3cee139e853 22-Aug-2006 Michael S. Tsirkin <mst@mellanox.co.il> IB/sa: Require SA registration

Require users to register with SA module, to prevent the sa_query
module text from going away while an SA query callback is still
running. Update all in-tree users for the new interface.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_main.c
poib_multicast.c
2439a6e65ff09729c3b4215f134dc5cd4e8a30c0 23-Sep-2006 Roland Dreier <rolandd@cisco.com> IPoIB: Refactor completion handling

Split up ipoib_ib_handle_wc() into ipoib_ib_handle_rx_wc() and
ipoib_ib_handle_tx_wc() to make the code easier to read. This will
also help implement NAPI in the future.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_ib.c
07ebafbaaa72aa6a35472879008f5a1d1d469a0c 03-Aug-2006 Tom Tucker <tom@opengridcomputing.com> RDMA: iWARP Core Changes.

Modifications to the existing rdma header files, core files, drivers,
and ulp files to support iWARP, including:
- Hook iWARP CM into the build system and use it in rdma_cm.
- Convert enum ib_node_type to enum rdma_node_type, which includes
the possibility of RDMA_NODE_RNIC, and update everything for this.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
3cd965646b7cb75ae84dd0daf6258adf20e4f169 23-Sep-2006 Roland Dreier <rolandd@cisco.com> IB: Whitespace fixes

Remove some trailing whitespace that has snuck in despite the best
efforts of whitespace=error-all. Also fix a few other whitespace
bogosities.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
9217b27b12eb5ab910d14b3376c2b6cd13d87711 03-Aug-2006 Michael S. Tsirkin <mst@mellanox.co.il> IB/ipoib: Fix flush/start xmit race (from code review)

Prevent flush task from freeing the ipoib_neigh pointer, while
ipoib_start_xmit() is accessing the ipoib_neigh through the pointer it
has loaded from the skb's hardware address.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
c11bd42a7676b49d41c780f2ede3709204f8da83 14-Sep-2006 Eli Cohen <eli@mellanox.co.il> IPoIB: Retry failed send-only multicast group joins

When a send-only multicast group join fails, mcast->query must be set
to NULL. Otherwise, IPoIB will never retry the join and the multicast
group will never be reachable.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
8ddc7c5326064434048ec1ecfe57659e08345cc1 13-Jul-2006 Or Gerlitz <ogerlitz@voltaire.com> IB/ipoib: Remove broken link from Kconfig and documentation

Remove references to the IPoIB IETF working group as it has been closed.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
config
8a7f752125a930a83f4d8dfe37fa5a081ab19d31 19-Jul-2006 Michael S. Tsirkin <mst@mellanox.co.il> IB/ipoib: Fix packet loss after hardware address update

The neighbour ha field may get updated without destroying the
neighbour. In this case, the ha field gets out of sync with the
address handle stored in ipoib_neigh->ah, with the result that
the ah field would point to an incorrect path, resulting in all
packets being lost.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_main.c
624d01f899f6bbd75fd06890f231e1f46555d376 24-Jul-2006 Or Gerlitz <ogerlitz@voltaire.com> IB/ipoib: Fix oops with ipoib_debug_mcast set

Need to set mcast->ah before debug code dereferences it.

Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
6ab3d5624e172c553004ecc862bfeac16d9d68b7 30-Jun-2006 Jörn Engel <joern@wohnheim.fh-wedel.de> Remove obsolete #include <linux/config.h>

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
poib.h
179e09172ab663b8587ecc46bb18a56a770304a9 26-Jun-2006 Akinobu Mita <mita@miraclelinux.com> [PATCH] drivers: use list_move()

This patch converts the combination of list_del(A) and list_add(A, B) to
list_move(A, B) under drivers/.

Acked-by: Corey Minyard <minyard@mvista.com>
Cc: Ben Collins <bcollins@debian.org>
Acked-by: Roland Dreier <rolandd@cisco.com>
Cc: Alasdair Kergon <dm-devel@redhat.com>
Cc: Gerd Knorr <kraxel@bytesex.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frank Pavlic <fpavlic@de.ibm.com>
Acked-by: Matthew Wilcox <matthew@wil.cx>
Cc: Andrew Vasquez <linux-driver@qlogic.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
poib_multicast.c
4c84a39c8adba6bf2f829b217e78bfd61478191a 20-Jun-2006 Linus Torvalds <torvalds@g5.osdl.org> Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (46 commits)
IB/uverbs: Don't serialize with ib_uverbs_idr_mutex
IB/mthca: Make all device methods truly reentrant
IB/mthca: Fix memory leak on modify_qp error paths
IB/uverbs: Factor out common idr code
IB/uverbs: Don't decrement usecnt on error paths
IB/uverbs: Release lock on error path
IB/cm: Use address handle helpers
IB/sa: Add ib_init_ah_from_path()
IB: Add ib_init_ah_from_wc()
IB/ucm: Get rid of duplicate P_Key parameter
IB/srp: Factor out common request reset code
IB/srp: Support SRP rev. 10 targets
[SCSI] srp.h: Add I/O Class values
IB/fmr: Use device's max_map_map_per_fmr attribute in FMR pool.
IB/mthca: Fill in max_map_per_fmr device attribute
IB/ipath: Add client reregister event generation
IB/mthca: Add client reregister event generation
IB: Move struct port_info from ipath to <rdma/ib_smi.h>
IPoIB: Handle client reregister events
IB: Add client reregister event type
...
932ff279a43ab7257942cddff2595acd541cc49b 09-Jun-2006 Herbert Xu <herbert@gondor.apana.org.au> [NET]: Add netif_tx_lock

Various drivers use xmit_lock internally to synchronise with their
transmission routines. They do so without setting xmit_lock_owner.
This is fine as long as netpoll is not in use.

With netpoll it is possible for deadlocks to occur if xmit_lock_owner
isn't set. This is because if a printk occurs while xmit_lock is held
and xmit_lock_owner is not set can cause netpoll to attempt to take
xmit_lock recursively.

While it is possible to resolve this by getting netpoll to use
trylock, it is suboptimal because netpoll's sole objective is to
maximise the chance of getting the printk out on the wire. So
delaying or dropping the message is to be avoided as much as possible.

So the only alternative is to always set xmit_lock_owner. The
following patch does this by introducing the netif_tx_lock family of
functions that take care of setting/unsetting xmit_lock_owner.

I renamed xmit_lock to _xmit_lock to indicate that it should not be
used directly. I didn't provide irq versions of the netif_tx_lock
functions since xmit_lock is meant to be a BH-disabling lock.

This is pretty much a straight text substitution except for a small
bug fix in winbond. It currently uses
netif_stop_queue/spin_unlock_wait to stop transmission. This is
unsafe as an IRQ can potentially wake up the queue. So it is safer to
use netif_tx_disable.

The hamradio bits used spin_lock_irq but it is unnecessary as
xmit_lock must never be taken in an IRQ handler.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib_multicast.c
508e434123b136c96d1bf989e8d4ab0c8d7498b1 18-Jun-2006 Leonid Arsh <leonida@voltaire.com> IPoIB: Handle client reregister events

Handle client reregister events by treating them just like LID or
SM changes -- flush all cached paths and rejoin multicast groups.

Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_verbs.c
37c22a77212c13201497378cc8becc5c95d0f3f5 29-May-2006 Jack Morgenstein <jackm@mellanox.co.il> IPoIB: Fix kernel unaligned access on ia64

Fix misaligned access faults on ia64: never cast a misaligned
neighbour->ha + 4 pointer to union ib_gid type; pass a void * pointer
instead. The memcpy was being optimized to use full word accesses
because the compiler thought that union ib_gid is always aligned.

The cast in IPOIB_GID_ARG is safe, since it is fixed to access each
byte separately.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_main.c
poib_multicast.c
31c02e215700c2b704d9441f629ae87bb9aeb561 18-Jun-2006 Roland Dreier <rolandd@cisco.com> IPoIB: Avoid using stale last_send counter when reaping AHs

The comparisons of priv->tx_tail to ah->last_send in ipoib_free_ah()
and ipoib_post_receive() are slightly unsafe, because priv->tx_lock is
not held and hence a stale value of ah->last_send might be used, which
would lead to freeing an AH before the driver was really done with it.
The simple way to fix this is to the optimization of early free from
ipoib_free_ah() and unconditionally queue AHs for reaping, and then
take priv->tx_lock in __ipoib_reap_ah().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_ib.c
959eb39297e8c82f61fbfc283ad4ff11c883bf1e 05-Jun-2006 Eli Cohen <eli@mellanox.co.il> IPoIB: Fix AH leak at interface down

When ipoib_stop() is called it first calls netif_stop_queue() to stop
the kernel from passing more packets to the network driver. However,
the completion handler may call netif_wake_queue() re-enabling packet
transfer.

This might result in leaks (we see AH leaks which we think can be
attributed to this bug) as new packets get posted while the interface
is going down.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_ib.c
5941d079f2c3bdf0dffed1afb8941678fcd0bcb7 10-May-2006 Roland Dreier <rolandd@cisco.com> IPoIB: Free child interfaces properly

When deleting a child interface with a non-default P_Key via
/sys/class/net/ibX/delete_child, the interface must be freed with
free_netdev() (rather than kfree() on the private data).

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_vlan.c
f697f74a6b189702474b2fd457e1f9365fa213e3 10-Apr-2006 Roland Dreier <rolandd@cisco.com> IPoIB: Use spin_lock_irq() instead of spin_lock_irqsave()

We know ipoib_flush_paths() is called from plain process context with
interrupts enabled, since it does wait_for_completion(). So there's
no need to use spin_lock_irqsave() -- spin_lock_irq() is fine.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
a30bb96c6f5aca6513e4dbd94962da03d14b20a9 05-Apr-2006 Eli Cohen <eli@mellanox.co.il> IPoIB: Close race in ipoib_flush_paths()

ib_sa_cancel_query() must be called with priv->lock held since
a completion might arrive and set path->query to NULL.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
0f4852513fb07405ce88da40d8c497060561246e 10-Apr-2006 Shirley Ma <xma@us.ibm.com> IPoIB: Make send and receive queue sizes tunable

Make IPoIB's send and receive queue sizes tunable via module
parameters ("send_queue_size" and "recv_queue_size"). This allows the
queue sizes to be enlarged to fix disastrously bad performance on some
platforms and workloads, without bloating memory usage when large
queues aren't needed.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_ib.c
poib_main.c
poib_verbs.c
f2de3b06126ddb07d0e4617225d74dce0855add3 05-Apr-2006 Eli Cohen <eli@mellanox.co.il> IPoIB: Wait for join to finish before freeing mcast struct

ipoib_mcast_restart_task() might free an mcast object while a join
request is still outstanding, leading to an oops when the query
completes. Fix this by waiting for query to complete, similar to what
ipoib_stop_thread() is doing. The wait for mcast completion code is
consolidated in wait_for_mcast_join().

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
bf6a9e31cfa768ce0a8e18474b3ca808641d9243 10-Apr-2006 Jack Morgenstein <jackm@mellanox.co.il> IB: simplify static rate encoding

Push translation of static rate to HCA format into low-level drivers,
where it belongs. For static rate encoding, use encoding of rate
field from IB standard PathRecord, with addition of value 0, for
backwards compatibility with current usage. The changes are:

- Add enum ib_rate to midlayer includes.
- Get rid of static rate translation in IPoIB; just use static rate
directly from Path and MulticastGroup records.
- Update mthca driver to translate absolute static rate into the
format used by hardware. This also fixes mthca's static rate
handling for HCAs that are capable of 4X DDR.

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_fs.c
poib_main.c
poib_multicast.c
d2e0655ede1d91c3a586455d03a4a2d57e659830 04-Apr-2006 Michael S. Tsirkin <mst@mellanox.co.il> IPoIB: Consolidate private neighbour data handling

Consolidate IPoIB's private neighbour data handling into
ipoib_neigh_alloc() and ipoib_neigh_free(). This will make it easier
to keep track of the neighbour structures that IPoIB is handling, and
is a nice cleanup of the code:

add/remove: 2/1 grow/shrink: 1/8 up/down: 100/-178 (-78)
function old new delta
ipoib_neigh_alloc - 61 +61
ipoib_neigh_free - 36 +36
ipoib_mcast_join_finish 1288 1291 +3
path_rec_completion 575 573 -2
ipoib_mcast_join_task 664 660 -4
ipoib_neigh_destructor 101 92 -9
ipoib_neigh_setup_dev 14 3 -11
ipoib_neigh_setup 17 - -17
path_free 238 215 -23
ipoib_mcast_free 329 306 -23
ipoib_mcast_send 718 684 -34
neigh_add_path 705 650 -55

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_main.c
poib_multicast.c
f5545d24b8aa9fccd8071203e83bc9f4b26e17a6 02-Apr-2006 Roland Dreier <rolandd@cisco.com> IPoIB: Always build debugging code unless CONFIG_EMBEDDED=y

Don't allow CONFIG_INFINIBAND_IPOIB_DEBUG to be disabled unless
CONFIG_EMBEDDED is selected. We want users (and especially distros)
to have this turned on unless they really need to save space, because
by the time we want debugging output, it's usually too late to rebuild
a kernel. The debugging output can be controlled at runtime via the
debug_level module parameter in sysfs.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
config
ef12d4561900d4af37f46a8f2d97dec3c4d59bf9 29-Mar-2006 Roland Dreier <rolandd@cisco.com> IPoIB: Fix oops with raw sockets

ipoib_hard_header() needs to handle the case that daddr is NULL. This
can happen when packets are injected via a raw socket, and IPoIB
shouldn't oops in this case.

Reported by Anton Blanchard <anton@samba.org>

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
7a343d4c46bc59fe617f968e996ce2fd67c5d179 23-Mar-2006 Leonid Arsh <leonida@voltaire.com> IPoIB: P_Key change event handling

This patch causes the network interface to respond to P_Key change
events correctly. As a result, you'll see a child interface in the
"RUNNING" state (netif_carrier_on()) only when the corresponding P_Key
is configured by the SM. When SM removes a P_Key, the "RUNNING" state
will be disabled for the corresponding network interface. To
implement this, I added IB_EVENT_PKEY_CHANGE event handling. To
prevent flushing the device before the device is open by the "delay
open" mechanism, I added an additional device flag called
IPOIB_FLAG_INITIALIZED.

This also prevents the child network interface from trying to join to
multicast groups until the PKEY is configured. We used to get error
messages like:

ib0.f2f2: couldn't attach QP to multicast group ff12:401b:f2f2:0:0:0:ffff:ffff

in this case. To fix this, I just check IPOIB_FLAG_OPER_UP flag in
ipoib_set_mcast_list().

Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_ib.c
poib_main.c
poib_verbs.c
4e37b956161c3a3b160972c11c55f07b38b9830c 22-Mar-2006 Leonid Arsh <leonida@voltaire.com> IPoIB: Fix network interface "RUNNING" status

With the current IPoIB driver, the status of network interfaces stays
"RUNNING" even if the link goes down (for example because a cable is
unplugged). Fix this by flushing the IPoIB interface when the link
goes down.

Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_verbs.c
6f633c8d69415aabbccfcc494008e8e1300a98c1 25-Mar-2006 Leonid Arsh <leonida@voltaire.com> IPoIB: Pass correct pointer when flushing child interfaces

ipoib_ib_dev_flush() should get passed cpriv->dev, not &cpriv->dev.

Signed-off-by: Leonid Arsh <leonida@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_ib.c
3d1f337b3e7378923c89f37afb573a918ef40be5 21-Mar-2006 Linus Torvalds <torvalds@g5.osdl.org> Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (235 commits)
[NETFILTER]: Add H.323 conntrack/NAT helper
[TG3]: Don't mark tg3_test_registers() as returning const.
[IPV6]: Cleanups for net/ipv6/addrconf.c (kzalloc, early exit) v2
[IPV6]: Nearly complete kzalloc cleanup for net/ipv6
[IPV6]: Cleanup of net/ipv6/reassambly.c
[BRIDGE]: Remove duplicate const from is_link_local() argument type.
[DECNET]: net/decnet/dn_route.c: fix inconsequent NULL checking
[TG3]: make drivers/net/tg3.c:tg3_request_irq() static
[BRIDGE]: use LLC to send STP
[LLC]: llc_mac_hdr_init const arguments
[BRIDGE]: allow show/store of group multicast address
[BRIDGE]: use llc for receiving STP packets
[BRIDGE]: stp timer to jiffies cleanup
[BRIDGE]: forwarding remove unneeded preempt and bh diasables
[BRIDGE]: netfilter inline cleanup
[BRIDGE]: netfilter VLAN macro cleanup
[BRIDGE]: netfilter dont use __constant_htons
[BRIDGE]: netfilter whitespace
[BRIDGE]: optimize frame pass up
[BRIDGE]: use kzalloc
...
e35fc385655ac584902edd98dd07ac488e986aa1 21-Mar-2006 Arnaldo Carvalho de Melo <acme@mandriva.com> [INFINIBAND] ipoib: Remove leftover use of neigh_ops->destructor

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib_multicast.c
c5ecd62c25400a3c6856e009f84257d5bd03f03b 21-Mar-2006 Michael S. Tsirkin <mst@mellanox.co.il> [NET]: Move destructor from neigh->ops to neigh_params

struct neigh_ops currently has a destructor field, which no in-kernel
drivers outside of infiniband use. The infiniband/ulp/ipoib in-tree
driver stashes some info in the neighbour structure (the results of
the second-stage lookup from ARP results to real link-level path), and
it uses neigh->ops->destructor to get a callback so it can clean up
this extra info when a neighbour is freed. We've run into problems
with this: since the destructor is in an ops field that is shared
between neighbours that may belong to different net devices, there's
no way to set/clear it safely.

The following patch moves this field to neigh_parms where it can be
safely set, together with its twin neigh_setup. Two additional
patches in the patch series update ipoib to use this new interface.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib_main.c
bfef73fa78ca1e56175dcbd33aa11de4764f85a5 20-Mar-2006 Roland Dreier <rolandd@cisco.com> IPoIB: Get rid of useless test of queue length

In neigh_add_path(), the queue of delayed packets can never be full,
because the queue is always freshly created and cannot be found by any
other code path. In fact, the test of the queue length is worse than
useless: if somehow the test ever triggered and path_rec_start() also
failed, then dev_kfree_skb_any() will be called twice on the same skb.
Fix this by deleting the useless test. Pointed out by Michael
S. Tsirkin <mst@mellanox.co.il>.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
0b3ea0829cbcdaee6e018a83a2949ef458213f3b 20-Mar-2006 Jack Morgenstein <jackm@mellanox.co.il> IPoIB: Move ipoib_ib_dev_flush() to ipoib workqueue

Move ipoib_ib_dev_flush() to ipoib's workqueue. This keeps it ordered
with respect to other work scheduled by the ipoib driver. This fixes
problems with races, for example:
- ipoib_ib_dev_flush() has started running because of an IB event
- user does ifconfig ib0 down
- ipoib_mcast_stop_thread() gets called twice and waits for the same
completion twice

Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_ib.c
poib_main.c
poib_verbs.c
8b9ab02b690e988f19c9d740ef642d7d833d23d5 07-Mar-2006 Roland Dreier <rdreier@cisco.com> IPoIB: Fix build now that neighbour destructor is in neigh_params

Fix the IPoIB build (which is broken in net-2.6.17 because of my
screw-up, which left out this chunk in ipoib_multicast.c).
The neighbour destructor is now in neigh_params, so we don't
need to clear it in the ops structure.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
9acf6a8570dcfc9f55724b8b71099fc8768e8c26 02-Mar-2006 Michael S. Tsirkin <mst@mellanox.co.il> IPoIB: Fix multicast race between canceling and completing

ipoib_mcast_stop_thread currently tests mcast->query and if it is
NULL, does not perform wait_for_completion on the mcast and frees the
mcast object directly.

However, since both operations are done without locking, it is
possible that ipoib_mcast_join_complete is in progress on this mcast
object and has set mcast->query to NULL already.

Solve this by:
- taking priv->lock before we change mcast->query in ipoib_mcast_join_complete,
and keeping it until we no longer need the mcast object
- taking priv->lock around mcast->query test in ipoib_mcast_stop_thread

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
54d07e2a1ead2f093ce054cda2e0f5ec163c650c 02-Mar-2006 Eli Cohen <eli@mellanox.co.il> IPoIB: Clean up if posting receives fails

If posting receives in ipoib_ib_dev_open() fails, call
ipoib_ib_dev_stop() to move the device's QP back to the RESET state so
that we can try again later.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_ib.c
7343b231f22cec11f069bcdbb0c9a417df2750d3 28-Feb-2006 Eli Cohen <eli@mellanox.co.il> IPoIB: Close race in setting mcast->ah

ipoib_mcast_send() tests mcast->ah twice. If this value is changed
between these two points, we leak an skb. However,
ipoib_mcast_join_finish() sets mcast->ah with no locking, so it could
race against ipoib_mcast_send().

As a solution, take priv->lock around assignment to mcast->ah thus
making sure ipoib_mcast_send() (which also takes priv->lock) is not in
flight.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
44af79f9524c29d6850591cc972f2667a27234d4 21-Feb-2006 Michael S. Tsirkin <mst@mellanox.co.il> IPoIB: clarify to_ipoib_neigh()

Cosmetic change: make alignment explicit in to_ipoib_neigh.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
20b83382d1c5d4d1a73fc5671261db5239d1dbb3 11-Feb-2006 Roland Dreier <rolandd@cisco.com> IPoIB: Yet another fix for send-only joins

Even after the last fix, it's still possible for a send-only join to
start before the join for the broadcast group has finished. This
could cause us to create a multicast group using attributes from the
broadcast group that haven't been initialized yet, so we would use
garbage for the Q_Key, etc. Fix this by waiting until the broadcast
group's attached flag is set before starting send-only joins.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
7bcb974ef6a0ae903888272c92c66ea779388c01 08-Feb-2006 Michael S. Tsirkin <mst@mellanox.co.il> IPoIB: Fix another send-only join race

Further, there's an additional issue that I saw in testing:
ipoib_mcast_send may get called when priv->broadcast is NULL (e.g. if
the device was downed and then upped internally because of a port
event).

If this happends and the send-only join request gets completed before
priv->broadcast is set, we get an oops.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
479a079663bd4c5f3d2714643b1b8c406aaba3e0 08-Feb-2006 Michael S. Tsirkin <mst@mellanox.co.il> IPoIB: Don't start send-only joins while multicast thread is stopped

Fix the following race scenario:
- Device is up.
- Port event or set mcast list triggers ipoib_mcast_stop_thread,
this cancels the query and waits on mcast "done" completion.
- Completion is called and "done" is set.
- Meanwhile, ipoib_mcast_send arrives and starts a new query,
re-initializing "done".

Fix this by adding a "multicast started" bit and checking it before
starting a send-only join.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_multicast.c
b36f170b617a7cd147b694dabf504e856a50ee9d 17-Jan-2006 Michael S. Tsirkin <mst@mellanox.co.il> IPoIB: Lock accesses to multicast packet queues

Avoid corrupting mcast->pkt_queue by serializing access with
priv->tx_lock. Also, update dropped packet statistics to count
multicast packets removed from pkt_queue as dropped.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
47f7a0714b67b904a3a36e2f2d85904e8064219b 17-Jan-2006 Michael S. Tsirkin <mst@mellanox.co.il> IPoIB: Make sure path is fully initialized before using it

The SA path record query completion can initialize path->pathrec.dlid
before IPoIB's callback runs and initializes path->ah, so we must test
ah rather than dlid.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
95ed644fd12f53c6fc778f3f246974e5fe3a9468 13-Jan-2006 Ingo Molnar <mingo@elte.hu> IB: convert from semaphores to mutexes

semaphore to mutex conversion by Ingo and Arjan's script.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
[ Sanity-checked on real IB hardware ]
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_ib.c
poib_main.c
poib_multicast.c
poib_verbs.c
poib_vlan.c
988bd50300ef2e2d5cb8563e2ac99453dd9acd86 12-Jan-2006 Eli Cohen <eli@mellanox.co.il> IPoIB: Fix memory leak of multicast group structures

The current handling of multicast groups in IPoIB ends up never
freeing send-only multicast groups. It turns out the logic was much
more complicated than it needed to be; we can fix this bug and
completely kill ipoib_mcast_dev_down() at the same time.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_ib.c
poib_multicast.c
78bfe0b5b67fe126ed98608e42e42fb6ed9aabd4 11-Jan-2006 Michael S. Tsirkin <mst@mellanox.co.il> IPoIB: Take dev->xmit_lock around mc_list accesses

dev->mc_list accesses must be protected by dev->xmit_lock.
Found by Eli Cohen <eli@mellanox.co.il>.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
97460df37ea3335ca11562568932c9f9facfecdb 10-Jan-2006 Eli Cohen <eli@mellanox.co.il> IPoIB: Fix address handle refcounting for multicast groups

Multiple ipoib_neigh structures on mcast->neigh_list may point to the
same ah. This means that ipoib_mcast_free() can't just make a list of
ah structs to free, since this might end up trying to add the same ah
to the list more than once. Handle this in ipoib_multicast.c in the
same way as it is handled in ipoib_main.c for struct ipoib_path.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
70b4c8cdc168bb5d18e23fd205c4ede1b756a8b2 10-Jan-2006 Eli Cohen <eli@mellanox.co.il> IPoIB: Fix error path in ipoib_mcast_dev_flush()

Don't leak memory on allocation failure for broadcast mcast group.
Also, print a warning to match handling for other mcast groups.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
14c850212ed8f8cbb5972ad6b8812e08a0bc901c 27-Dec-2005 Arnaldo Carvalho de Melo <acme@mandriva.com> [INET_SOCK]: Move struct inet_sock & helper functions to net/inet_sock.h

To help in reducing the number of include dependencies, several files were
touched as they were getting needed headers indirectly for stuff they use.

Thanks also to Alan Menegotto for pointing out that net/dccp/proto.c had
linux/dccp.h include twice.

Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
poib_main.c
poib_multicast.c
267ee88ed34c76dc527eeb3d95f9f9558ac99973 29-Nov-2005 Roland Dreier <rolandd@cisco.com> IPoIB: fix error handling in ipoib_open

If ipoib_ib_dev_up() fails after ipoib_ib_dev_open() is called, then
ipoib_ib_dev_stop() needs to be called to clean up.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
4f71055a45a503273c039d80db8ba9b13cb17549 29-Nov-2005 Michael S. Tsirkin <mst@mellanox.co.il> IPoIB: protect child list in ipoib_ib_dev_flush

race condition: ipoib_ib_dev_flush is accessing child list without locks.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_ib.c
2e86541ec878de9ec5771600a77f451a80bebfc4 29-Nov-2005 Roland Dreier <rolandd@cisco.com> IPoIB: don't zero members after we allocate with kzalloc

ipoib_mcast_alloc() uses kzalloc(), so there's no need to zero out
members of the mcast struct after it's allocated.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
de922487890936470660e89f9095aee980637989 29-Nov-2005 Michael S. Tsirkin <mst@mellanox.co.il> IPoIB: reinitialize mcast structs' completions for every query

Make sure mcast->done is initialized to uncompleted value before we
submit a new query, so that it's safe to wait on.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
5872a9fc28e6cd3a4e51479a50970d19a01573b3 29-Nov-2005 Roland Dreier <rolandd@cisco.com> IPoIB: always set path->query to NULL when query finishes

Always set path->query to NULL when the SA path record query
completes, rather than only when we don't have an address handle.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
65c7eddaba33995e013ef3c04718f6dc8fdf2335 29-Nov-2005 Roland Dreier <rolandd@cisco.com> IPoIB: reinitialize path struct's completion for every query

It's possible that IPoIB will issue multiple SA queries for the same
path struct. Therefore the struct's completion needs to be
initialized for each query rather than only once when the struct is
allocated, or else we might not wait long enough for later queries to
finish and free the path struct too soon.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
8c608a32e3cd7ff14498ad996ca32d1452245a97 07-Nov-2005 Roland Dreier <rolandd@cisco.com> [IPoIB] no need to set skb->dev right before freeing skb

For cut-and-paste reasons, the IPoIB driver was setting skb->dev right
before calling dev_kfree_skb_any(). Get rid of this.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
1732b0ef3b3a02e3df328086fb3018741c5476da 07-Nov-2005 Roland Dreier <rolandd@cisco.com> [IPoIB] add path record information in debugfs

Add ibX_path files to debugfs that contain information about the IPoIB
path cache. IPoIB ARP only gives GIDs, which the IPoIB driver must
resolve to real IB paths through the ib_sa module. For debugging,
when the ARP table looks OK but traffic isn't flowing, it's useful to
be able to see if the resolution from GID to path worked.

Also clean up the formatting of the existing _mcg debugfs files.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_fs.c
poib_main.c
poib_multicast.c
poib_vlan.c
8ae5a8a24f7fe797027d481f88c1464b0e47eede 03-Nov-2005 Roland Dreier <rolandd@cisco.com> [IPoIB] don't compile debug code if debugging isn't enabled

Don't build ipoib_mcast_iter_ functions if CONFIG_INFINIBAND_IPOIB_DEBUG
is not enabled -- their only callers will not be built either.

Also move the prototype for ipoib_open() to ipoib.h to fix a sparse warning.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_ib.c
poib_multicast.c
21a384897d48c116b879924c3dd9e96f6f1e764b 02-Nov-2005 Roland Dreier <rolandd@cisco.com> [IPoIB] remove unneeded initializations to 0

Shrink our source and .text a little by removing a few assignments of
NULL and 0 to memory that is already cleared as part of the allocation.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
de6eb66b56d9df5ce6bd254994f05e065214e8cd 02-Nov-2005 Roland Dreier <rolandd@cisco.com> [IB] kzalloc() conversions

Replace kmalloc()+memset(,0,) with kzalloc(), for a net savings of 35
source lines and about 500 bytes of text.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
poib_multicast.c
3bc12e75b23c0499cc2c0873a5f77494be173761 30-Oct-2005 Roland Dreier <rolandd@cisco.com> [IPoIB] cleanups: fix comment, remove useless variables

Minor cleanups: fix a misleading comment, and get rid of attr_mask
variables that are only used to hold constants (just use the constants
directly).

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_ib.c
poib_verbs.c
a20583a7c2e35d80b1dfc1f60c9729498838725e 29-Oct-2005 Roland Dreier <rolandd@cisco.com> [IPoIB] use spin_trylock_irqsave()

Use spin_trylock_irqsave() in ipoib_start_xmit() instead of
reinventing it out of local_irq_save(), spin_trylock() and
local_irq_restore().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
1993d683f39f77ddb46a662d7146247877d50b8f 29-Oct-2005 Roland Dreier <rolandd@cisco.com> [IPoIB] Drop RX packets when out of memory

Change the way IPoIB handles RX packets when it can't allocate a new
receive skbuff. If the allocation of a new receive skb fails, we now
drop the packet we just received and repost the original receive skb.
This means that the receive ring always stays full and we don't have
to monkey around with trying to schedule a refill task for later.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_ib.c
poib_main.c
4b2d319b53810ab00ef3d8fdfc1c1ab0647ab0a7 18-Oct-2005 Roland Dreier <rolandd@cisco.com> [IPoIB] Improve ipoib_timeout() output

Use jiffies_to_msecs() so we print a human-readable time so
we don't have to worry about what HZ is configured to, and
print out a few values to make post-mortem analysis easier.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
5b6810e048435de508ef66aebd6b78db13d651b8 11-Oct-2005 Roland Dreier <rolandd@cisco.com> [IPoIB] Rename ipoib_create_qp() -> ipoib_init_qp() and fix error cleanup

ipoib_create_qp() no longer creates IPoIB's QP, so it shouldn't
destroy the QP on failure -- that unwinding happens elsewhere, so the
current code can cause a double free. While we're at it, the
function's name should match what it actually does, so rename it to
ipoib_init_qp().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_ib.c
poib_verbs.c
d70ed6075f15bdbb0548d162394bf10332769c88 29-Sep-2005 Roland Dreier <rolandd@cisco.com> [IPoIB] Rename IPoIB's path_lookup() to avoid name clashes

Rename IPoIB driver's path_lookup() to ipoib_path_lookup() to avoid a
clashes with the kernel global path_lookup(). We don't hit this with
the current kernel source, but some external patches seem to trigger
this, and it's cleaner to avoid clashing with global names anyway.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
refs/heads/for-linus
poib_main.c
8d2cae0651502028bf64844508ab18528bbd65c2 20-Sep-2005 Roland Dreier <rolandd@cisco.com> [PATCH] IPoIB: Don't flush workqueue from within workqueue

ipoib_mcast_restart_task() is always called from within the
single-threaded IPoIB workqueue, so flushing the workqueue from within
the function can lead to a recursion overflow. But since we're
running in a single-threaded workqueue, we're already synchronized
against other items in the workqueue, so just get rid of the flush in
ipoib_mcast_restart_task().

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_ib.c
poib_multicast.c
ce5b65cc9626feac0d4ffb96f798407e50c45575 18-Sep-2005 Hal Rosenstock <halr@voltaire.com> [PATCH] IPoIB: Fix SA client retransmission strategy

We got a little mixed up with what the backoff member holds in the
IPoIB multicast group structure: sometimes it was used as a number of
seconds, and sometimes it was used as a number of jiffies. Fix the
code so that backoff is always in seconds.

Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_multicast.c
51574e0398a2d93cbf7f26e36b673cd919062268 12-Sep-2005 Michael S. Tsirkin <mst@mellanox.co.il> [PATCH] IPoIB: fix module removal race

Since ipoib uses queue_delayed_work to run flush task on port state events,
it must flush scheduled work after unregistering the event handler.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
06c56e44f3e32a859420ecac97996cc6f12827bb 01-Sep-2005 Michael S. Tsirkin <mst@mellanox.co.il> [PATCH] IPoIB: fix memory leak

Fix IPoIB memory leak on device removal.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
a4d61e84804f3b14cc35c5e2af768a07c0f64ef6 25-Aug-2005 Roland Dreier <roland@eddore.topspincom.com> [PATCH] IB: move include files to include/rdma

Move the InfiniBand headers from drivers/infiniband/include to include/rdma.
This allows InfiniBand-using code to live elsewhere, and lets us remove the
ugly EXTRA_CFLAGS include path from the InfiniBand Makefiles.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
akefile
poib.h
poib_ib.c
poib_verbs.c
1ad62a19f177e61d4dde111ba35fb4badd0c2106 24-Aug-2005 Michael S. Tsirkin <mst@mellanox.co.il> [PATCH] IPoIB: Fix device removal race

Currently we may have work scheduled in default kernel workqueue when
the device is going down. The device could get freed before this
workqueue gets serviced. I am actually seeing this causing system
hangs.

The following patch fixes this by using ipoib_workqueue which gets
flushed when the device is going down.

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
4ce059378c04b40c2e9f658b1c6a2e9078b85c7c 19-Aug-2005 Roland Dreier <roland@eddore.topspincom.com> [PATCH] IPoIB: Set full membership bit in P_Keys

Always make sure that the full membership bit is set in the P_Keys
that IPoIB uses. This makes sure that all hosts join the correct
multicast groups so that hosts that are partial partition members
can talk to the rest of the network.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
2aeba9a03b0d249fc710b9939fc089ce53d8cd30 15-Aug-2005 Olaf Hering <olh@suse.de> [PATCH] IB: Remove unnecessary includes of <linux/version.h>

changing CONFIG_LOCALVERSION rebuilds too much, for no appearent reason.
Remove unneeded includes of <linux/version.h>.

Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
poib_vlan.c
97f52eb438be7caebe026421545619d8a0c1398a 14-Aug-2005 Sean Hefty <sean.hefty@intel.com> [PATCH] IB: sparse endianness cleanup

Fix sparse warnings. Use __be* where appropriate.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_fs.c
poib_main.c
poib_multicast.c
92a6b34bf4d0d11c54b2a6bdd6240f98cb326200 14-Aug-2005 Hal Rosenstock <halr@voltaire.com> [PATCH] IB: Eliminate redundant NULL checks

IPoIB: Eliminate NULL checks prior to calling kfree

Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
2a1d9b7f09aaaacf235656cb32a40ba2c79590b3 11-Aug-2005 Roland Dreier <roland@eddore.topspincom.com> [PATCH] IB: Add copyright notices

Make some lawyers happy and add copyright notices for people who
forgot to include them when they actually touched the code.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib.h
poib_ib.c
poib_main.c
poib_multicast.c
poib_verbs.c
0dca0f7bf82face7b700890318d5550fd542cabf 28-Jul-2005 Hal Rosenstock <halr@voltaire.com> [PATCH] [IPoIB] Handle sending of unicast RARP responses

RARP replies are another valid case where IPoIB may need to send a
unicast packet with no neighbour structure.

Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_main.c
2181858bb814b51de8ec25b3ddd37cd06c53b0c9 27-Jul-2005 Roland Dreier <roland@eddore.topspincom.com> [IB/ipoib]: Fix unsigned comparisons to handle wraparound

Fix handling of tx_head/tx_tail comparisons to handle wraparound.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
poib_ib.c
9adec1a808603698bd7ff47f3883bd7cd1383f90 17-Apr-2005 Roland Dreier <roland@topspin.com> [PATCH] IPoIB: convert to debugfs

Convert IPoIB to use debugfs instead of its own custom debugging filesystem.

Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
poib_fs.c
poib_main.c
e6ded99cbbbfef2cef537d717ad61d2f77f4dfd6 17-Apr-2005 Roland Dreier <roland@topspin.com> [PATCH] IPoIB: fix static rate calculation

Correct and simplify calculation of static rate. We need to round up the
quotient of (local_rate - path_rate) / path_rate. To round up we add
(path_rate - 1) to the numerator, so the quotient simplifies to (local_rate -
1) / path_rate.

No idea how I came up with the old formula.

Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
poib_main.c
poib_multicast.c
62241eb497721be7640e5d9330e60f4a88a4db46 17-Apr-2005 Hal Rosenstock <halr@voltaire.com> [PATCH] IPoIB: set skb->mac.raw on receive

Set skb->mac.raw on receive. This fixes crashes when this is
dereferenced, for example by netfilter or when PF_PACKET is used.

Signed-off-by: Hal Rosenstock <halr@voltaire.com>
Signed-off-by: Roland Dreier <roland@topspin.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
poib_ib.c
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 17-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org> Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
config
akefile
poib.h
poib_fs.c
poib_ib.c
poib_main.c
poib_multicast.c
poib_verbs.c
poib_vlan.c