History log of /drivers/infiniband/hw/qib/qib.h
Revision Date Author Comments
7d7632add8dd99f68b21546efff08a5a162de184 07-Mar-2014 Mike Marciniszyn <mike.marciniszyn@intel.com> IB/qib: Modify software pma counters to use percpu variables

The counters, unicast_xmit, unicast_rcv, multicast_xmit, multicast_rcv
are now maintained as percpu variables.

The mad code is modified to add a z_ latch so that the percpu counters
monotonically increase, with appropriate adjustments in the reset and
read logic to maintain the z_ latch.

This patch also corrects the fact that unicast_xmit wasn't handled
at all for UC and RC QPs.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
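
The z_ latch idea from the commit above can be illustrated in user-space C.
This is only a sketch under stated assumptions, not the driver code: the
names pma_counter, pma_inc, pma_total, pma_read and pma_reset are
hypothetical, and a plain array stands in for a kernel percpu variable. The
per-CPU slots only ever grow; a "reset" records the current total in the
latch, and a read reports the growth since that latch.

```c
#include <stdint.h>

#define NCPUS 4 /* hypothetical CPU count for this sketch */

struct pma_counter {
	uint64_t percpu[NCPUS]; /* stand-in for a percpu variable */
	uint64_t z_latch;       /* total captured at the last reset */
};

/* Each CPU increments only its own slot, so slots never decrease. */
static void pma_inc(struct pma_counter *c, int cpu)
{
	c->percpu[cpu]++;
}

/* Sum of all per-CPU slots: monotonically increasing. */
static uint64_t pma_total(const struct pma_counter *c)
{
	uint64_t sum = 0;

	for (int i = 0; i < NCPUS; i++)
		sum += c->percpu[i];
	return sum;
}

/* Value reported to the reader: growth since the last reset. */
static uint64_t pma_read(const struct pma_counter *c)
{
	return pma_total(c) - c->z_latch;
}

/* "Reset" moves the latch instead of zeroing the per-CPU slots. */
static void pma_reset(struct pma_counter *c)
{
	c->z_latch = pma_total(c);
}
```

Moving the latch rather than clearing the slots means a reset never has to
write to another CPU's slot.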
1ed88dd7d0b361e677b2690f573e5c274bb25c87 07-Mar-2014 Mike Marciniszyn <mike.marciniszyn@intel.com> IB/qib: Add percpu counter replacing qib_devdata int_counter

This patch replaces the dd->int_counter with a percpu counter.

The maintenance of qib_stats.sps_ints and int_counter is
combined into the new counter.

There are two new functions added to read the counter:
- qib_int_counter (for a particular qib_devdata)
- qib_sps_ints (for all HCAs)

A z_int_counter is added to allow the interrupt detection logic
to determine if interrupts have occurred since z_int_counter
was "reset".

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
bea25e82c61fdf693949178594ee58aced72927d 07-Aug-2013 Paul Bolle <pebolle@tiscali.nl> IB/qib: Make qib_driver static

struct pci_driver qib_driver is only used in qib_init.c. Remove it
from qib.h and make it static in qib_init.c.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Roland Dreier <roland@purestorage.com>
4668e4b527d2263a74a85334243bb740905da51e 19-Jul-2013 CQ Tang <cq.tang@intel.com> IB/qib: Improve SDMA performance

1. The code accepts chunks of messages, and splits the chunk into
packets when converting packets into sdma queue entries. Adjacent
packets will use user buffer pages smartly to avoid pinning the
same page multiple times.

2. Instead of discarding all the work when SDMA queue is full, the
work is saved in a pending queue. Whenever there are enough SDMA
queue free entries, pending queue is directly put onto SDMA queue.

3. An interrupt handler is used to progress this pending queue.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: CQ Tang <cq.tang@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>

[ Fixed up sparse warnings. - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
0b3ddf380ca7aa6a009cc3e1944933fff8113b6a 11-Jul-2013 Dean Luick <dean.luick@intel.com> IB/qib: Log all SDMA errors unconditionally

This patch adds code to log SDMA errors for supportability purposes.

Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
ddb8876589702a9396d15d9d4075e6388d0600cf 15-Jun-2013 Mike Marciniszyn <mike.marciniszyn@intel.com> IB/qib: Convert opcode counters to per-context

This fix changes the opcode relative counters for receive to per
context.

Profiling has shown that when multiple contexts are being used there
is a lot of cache activity associated with these counters.

The code formerly kept these counters per port, but only provided the
interface to read per HCA. This patch converts the read of counters
to per HCA and adds the debugfs hooks to be able to read the file as a
sequence of opcodes.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
85caafe307a06e4f9993c8f3c994a07374c07831 04-Jun-2013 Mike Marciniszyn <mike.marciniszyn@intel.com> IB/qib: Optimize CQ callbacks

The current workqueue implementation has the following performance
deficiencies on QDR HCAs:

- The CQ call backs tend to run on the CPUs processing the
receive queues
- The single thread queue isn't optimal for multiple HCAs

This patch adds a dedicated per HCA bound thread to process CQ callbacks.

Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
e0f30baca1ebe5547f6760f760b8c4e189fc1203 28-May-2013 Ramkrishna Vepa <ramkrishna.vepa@intel.com> IB/qib: Add optional NUMA affinity

This patch adds context relative numa affinity conditioned on the
module parameter numa_aware. The qib_ctxtdata has an additional
node_id member and qib_create_ctxtdata() has an additional node_id
parameter.

The allocations within the hdr queue and eager queue setup routines
now take this additional member and adjust allocations as necessary.
PSM will pass either the current numa node or the node closest to the
HCA depending on numa_aware. Verbs will always use the node closest to
the HCA.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Vinit Agnihotri <vinit.abhay.agnihotri@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
8469ba39a6b77917e8879680aed17229bf72f263 31-May-2013 Mike Marciniszyn <mike.marciniszyn@intel.com> IB/qib: Add DCA support

This patch adds DCA cache warming for systems that support DCA.

The code uses cpu affinity notification to react to an affinity change
from a user mode program like irqbalance and (re-)program the chip
accordingly. This notification avoids reading the current cpu on every
interrupt.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>

[ Add Kconfig dependency on SMP && GENERIC_HARDIRQS to avoid failure to
build due to undefined struct irq_affinity_notify. - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
1d3520357df99baf4ad89f86268ac96cd38092d9 07-Sep-2012 Stephen Hemminger <shemminger@vyatta.com> make drivers with pci error handlers const

Covers the rest of the uses of pci error handler.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
5d7fe4efbf0878e0ef12c8f93e7a16c750494b7e 23-Jul-2012 Mike Marciniszyn <mike.marciniszyn@intel.com> IB/qib: Fix size of cc_supported_table_entries

Commit 36a8f01cd24b ("IB/qib: Add congestion control agent
implementation") tries to store the value 1984 in a u8, which leads to
truncation. Fix this by making the member big enough.

This bug was detected by a smatch warning.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
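
The truncation described in the commit above is easy to reproduce outside
the kernel. The sketch below is illustrative only: the struct and function
names are hypothetical, and the 16-bit width is an assumption made for the
example (the commit message says only that the member was made big enough).

```c
#include <stdint.h>

#define CC_TABLE_ENTRIES 1984 /* value quoted in the commit message */

struct cc_narrow { uint8_t entries; };  /* too small for 1984 */
struct cc_wide   { uint16_t entries; }; /* assumed wider type */

/* Store v in an 8-bit member and read it back: keeps v modulo 256. */
static unsigned store_narrow(unsigned v)
{
	struct cc_narrow s;

	s.entries = (uint8_t)v;
	return s.entries;
}

/* Store v in a 16-bit member: values up to 65535 survive intact. */
static unsigned store_wide(unsigned v)
{
	struct cc_wide s;

	s.entries = (uint16_t)v;
	return s.entries;
}
```

Since 1984 = 7 * 256 + 192, the 8-bit member silently holds 192.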
36a8f01cd24b125aa027c71c1288588edde5322d 19-Jul-2012 Mike Marciniszyn <mike.marciniszyn@intel.com> IB/qib: Add congestion control agent implementation

Add a congestion control agent in the driver that handles gets and
sets from the congestion control manager in the fabric for the
Performance Scale Messaging (PSM) library.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
551ace124d0ef471e8a5fee3ef9e5bb7460251be 19-Jul-2012 Mike Marciniszyn <mike.marciniszyn@intel.com> IB/qib: Reduce sdma_lock contention

Profiling has shown that sdma_lock is proving a bottleneck for
performance. The situations include:
- RDMA reads when krcvqs > 1
- post sends from multiple threads

For RDMA read the current global qib_wq mechanism runs on all CPUs
and contends for the sdma_lock when multiple RDMA read requests are
fielded on different CPUs. For post sends, the direct call to
qib_do_send() from multiple threads causes the contention.

Since the sdma mechanism is per port, this fix converts the existing
workqueue to a per port single thread workqueue to reduce the lock
contention in the RDMA read case, and for any other case where the QP
is scheduled via the workqueue mechanism from more than 1 CPU.

For the post send case, this patch modifies the post send code to test
for a non empty sdma engine. If the sdma is not idle the (now single
thread) workqueue will be used to trigger the send engine instead of
the direct call to qib_do_send().

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
1c94283ddbe8a9945c4aaac8b0be90d47f97f2df 07-May-2012 Mike Marciniszyn <mike.marciniszyn@intel.com> IB/qib: Add cache line awareness to qib_qp and qib_devdata structures

This patch reorganizes the QP and devdata files to be more cache line aware.

qib_qp fields in particular are split into read-mostly, send, and receive fields.

qib_devdata fields are split into read-mostly and read/write fields.

Testing has shown that bidirectional tests improve by as much as 100%
with this patch.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
bb77a077232e78476d7bc39c080f9e6685cbfd3c 07-May-2012 Mike Marciniszyn <mike.marciniszyn@intel.com> IB/qib: Optimize pio ack buffer allocation

This patch optimizes pio buffer allocation in the kernel.

For qib, kernel pio buffers are used for sending acks. The code to
allocate the buffer would always start at 0 until it found a buffer.

This means that an average of 64 comparisons were done on each
allocate, since the busy bit won't be cleared until the bits are
refreshed when buffers are exhausted.

This patch adds two new fields in the devdata struct, last_pio and
min_kernel_pio. last_pio is the last buffer that was allocated.
min_kernel_pio is the lowest potential available buffer.

min_kernel_pio is modified as contexts are allocated and deallocated.

Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
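
The effect of the two fields named in the commit above, last_pio and
min_kernel_pio, can be sketched as follows. This is a simplified user-space
model, not the driver's allocator: NBUFS, the busy[] array, the probes
counter and the wrap-around policy are all assumptions made for
illustration. The search resumes just past the last buffer handed out
instead of restarting at 0, and never looks below min_kernel_pio.

```c
#include <stdbool.h>

#define NBUFS 128 /* hypothetical number of PIO buffers */

struct pio_state {
	bool busy[NBUFS];
	unsigned min_kernel_pio; /* lowest potentially available buffer */
	unsigned last_pio;       /* last buffer that was allocated */
};

/*
 * Returns a buffer index, or -1 if every kernel buffer is busy.
 * *probes is incremented once per busy-bit comparison.
 */
static int pio_alloc(struct pio_state *s, unsigned *probes)
{
	unsigned span = NBUFS - s->min_kernel_pio;
	unsigned start = s->last_pio + 1;

	if (start < s->min_kernel_pio || start >= NBUFS)
		start = s->min_kernel_pio;

	for (unsigned n = 0; n < span; n++) {
		unsigned i = s->min_kernel_pio +
			     (start - s->min_kernel_pio + n) % span;

		(*probes)++;
		if (!s->busy[i]) {
			s->busy[i] = true;
			s->last_pio = i;
			return (int)i;
		}
	}
	return -1;
}
```

In this model, consecutive allocations each cost one comparison while free
buffers remain ahead of last_pio.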
a778f3fddc6fc2ed4c065f6e160d517a5959f949 26-Feb-2012 Mike Marciniszyn <mike.marciniszyn@qlogic.com> IB/qib: Add logic for affinity hint

Call irq_set_affinity_hint() to give userspace programs such as
irqbalance the information to be able to distribute qib interrupts
appropriately.

The logic allocates all non-receive interrupts to the first CPU local
to the HCA. Receive interrupts are allocated round robin starting
with the second CPU local to the HCA with potential wrap back to the
second CPU.

This patch also adds a refinement to the name registered for MSI-X
interrupts so that user level scripts can determine the device
associated with the IRQs when there are multiple HCAs with a
potentially different set of local CPUs.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
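
The CPU selection policy described in the commit above can be sketched in a
few lines. This is an illustration, not the driver's logic: the
affinity_picker type and its functions are hypothetical, and the list of
CPUs local to the HCA is assumed to be given as a plain array with at least
one entry. Non-receive interrupts always get the first local CPU; receive
interrupts rotate through the remaining CPUs and wrap back to the second.

```c
#include <stddef.h>

struct affinity_picker {
	const int *local_cpus; /* CPUs local to the HCA, n >= 1 */
	size_t ncpus;
	size_t next_rcv;       /* index of the next receive CPU */
};

static void picker_init(struct affinity_picker *p, const int *cpus, size_t n)
{
	p->local_cpus = cpus;
	p->ncpus = n;
	p->next_rcv = (n > 1) ? 1 : 0;
}

static int pick_cpu(struct affinity_picker *p, int is_receive)
{
	int cpu;

	/* Non-receive interrupts, or a single local CPU: first CPU. */
	if (!is_receive || p->ncpus == 1)
		return p->local_cpus[0];

	cpu = p->local_cpus[p->next_rcv];
	if (++p->next_rcv >= p->ncpus)
		p->next_rcv = 1; /* wrap back to the second CPU */
	return cpu;
}
```

The value returned here is what a driver would turn into the mask passed to
irq_set_affinity_hint().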
af061a644a0e4d4778fe6cd2246479c1962e153b 23-Sep-2011 Mike Marciniszyn <mike.marciniszyn@qlogic.com> IB/qib: Use RCU for qpn lookup

The heavy weight spinlock in qib_lookup_qpn() is replaced with RCU.
The hash list itself is now accessed via jhash functions instead of mod.

The changes should benefit multiple receive contexts in different
processors by not contending for the lock just to read the hash
structures.

The patch also adds a lookaside_qp (pointer) and a lookaside_qpn in
the context. The interrupt handler will test the current packet's qpn
against lookaside_qpn if the lookaside_qp pointer is non-NULL. The
pointer is NULL'ed when the interrupt handler exits.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
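
The lookaside check described in the commit above amounts to a one-entry
cache in front of the hash lookup. The sketch below models only that part;
RCU and jhash are left out, the slow path is a hypothetical linear search,
and every name other than lookaside_qp and lookaside_qpn is invented for
the example.

```c
#include <stddef.h>
#include <stdint.h>

struct qp { uint32_t qpn; };

struct rcv_ctx {
	struct qp *lookaside_qp; /* NULL when nothing is cached */
	uint32_t lookaside_qpn;
	unsigned slow_lookups;   /* sketch-only statistic */
};

/* Hypothetical slow path standing in for the hash lookup. */
static struct qp *table_lookup(struct qp *table, size_t n, uint32_t qpn)
{
	for (size_t i = 0; i < n; i++)
		if (table[i].qpn == qpn)
			return &table[i];
	return NULL;
}

static struct qp *ctx_find_qp(struct rcv_ctx *c, struct qp *table,
			      size_t n, uint32_t qpn)
{
	struct qp *qp;

	/* Fast path: same QPN as the previous packet in this pass. */
	if (c->lookaside_qp && c->lookaside_qpn == qpn)
		return c->lookaside_qp;

	c->slow_lookups++;
	qp = table_lookup(table, n, qpn);
	c->lookaside_qp = qp;
	c->lookaside_qpn = qpn;
	return qp;
}

/* The cached pointer is dropped when the handler exits. */
static void ctx_handler_exit(struct rcv_ctx *c)
{
	c->lookaside_qp = NULL;
}
```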
9e1c0e43257b6df1ef012dd37c3f0f93b1ee47af 23-Sep-2011 Mike Marciniszyn <mike.marciniszyn@qlogic.com> IB/qib: Eliminate divide/mod in converting idx to egr buf pointer

The context init now saves a shift derived from rcvegrbufs_perchunk in
rcvegrbufs_perchunk_shift using ilog2. A BUG_ON() protects the
power of 2 assumption.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
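
The arithmetic behind the commit above: when the number of buffers per chunk
is a power of two, the divide and the mod become a shift and a mask. A
minimal user-space sketch follows; egr_ctx, egr_ctx_init, egr_locate and
ilog2_u32 are hypothetical names, and assert() stands in for the kernel's
BUG_ON().

```c
#include <assert.h>
#include <stdint.h>

/* Integer log2 for a nonzero value (stand-in for the kernel's ilog2). */
static unsigned ilog2_u32(uint32_t v)
{
	unsigned s = 0;

	while (v >>= 1)
		s++;
	return s;
}

struct egr_ctx {
	uint32_t perchunk;       /* buffers per chunk, power of two */
	unsigned perchunk_shift; /* log2(perchunk), saved at init */
};

static void egr_ctx_init(struct egr_ctx *c, uint32_t perchunk)
{
	/* Guard the power-of-two assumption. */
	assert(perchunk != 0 && (perchunk & (perchunk - 1)) == 0);
	c->perchunk = perchunk;
	c->perchunk_shift = ilog2_u32(perchunk);
}

/* Map a buffer index to (chunk, offset) without divide or mod. */
static void egr_locate(const struct egr_ctx *c, uint32_t idx,
		       uint32_t *chunk, uint32_t *offset)
{
	*chunk = idx >> c->perchunk_shift;  /* idx / perchunk */
	*offset = idx & (c->perchunk - 1);  /* idx % perchunk */
}
```

With 32 buffers per chunk, index 70 lands in chunk 2 at offset 6, the same
result as 70 / 32 and 70 % 32.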
53ab1c6498371723c31b18400fab10a902a15a63 06-Oct-2011 Mike Marciniszyn <mike.marciniszyn@qlogic.com> IB/qib: Correct nfreectxts for multiple HCAs

The code that was recently introduced to report the number
of free contexts is flawed for multiple HCAs:

/* Return the number of free user ports (contexts) available. */
return scnprintf(buf, PAGE_SIZE, "%u\n", dd->cfgctxts -
dd->first_user_ctxt - (u32)qib_stats.sps_ctxts);

The qib_stats is global to the module, not per HCA, so the code is broken
for multiple HCAs.

This patch adds a qib_devdata field, freectxts, that reflects the free
contexts for this HCA.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
e67306a38063d75f61d405527ff8bf1c8e92eb84 21-Jul-2011 Mike Marciniszyn <mike.marciniszyn@qlogic.com> IB/qib: Defer HCA error events to tasklet

With ib_qib options:

options ib_qib krcvqs=1 pcie_caps=0x51 rcvhdrcnt=4096 singleport=1 ibmtu=4

a run of ib_write_bw -a yields the following:

------------------------------------------------------------------
#bytes   #iterations  BW peak[MB/sec]  BW average[MB/sec]
1048576  5000         2910.64          229.80
------------------------------------------------------------------

The top cpu use in a profile is:

CPU: Intel Architectural Perfmon, speed 2400.15 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask
of 0x00 (No unit mask) count 1002300
Counted LLC_MISSES events (Last level cache demand requests from this core that
missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000
samples  %        samples  %        app name   symbol name
15237    29.2642  964      17.1195  ib_qib.ko  qib_7322intr
12320    23.6618  1040     18.4692  ib_qib.ko  handle_7322_errors
4106     7.8860   0        0        vmlinux    vsnprintf


Analysis of the stats, profile, the code, and the annotated profile indicates:
- All of the overflow interrupts (one per packet overflow) are
serviced on CPU0 with no mitigation on the frequency.
- All of the receive interrupts are being serviced by CPU0. (That is
the way truescale.cmds statically allocates the kctx IRQs to CPU)
- The code is spending all of its time servicing QIB_I_C_ERROR
RcvEgrFullErr interrupts on CPU0, starving the packet receive
processing.
- The decode_err routine is very inefficient, using a printf variant
to format a "%s" and continuing to loop after the errs mask has been
cleared.
- Both qib_7322intr and handle_7322_errors read pci registers, which
is very inefficient.

The fix does the following:
- Adds a tasklet to service QIB_I_C_ERROR
- Replaces the very inefficient scnprintf() with a memcpy(). A field
is added to qib_hwerror_msgs to save the sizeof("string") at
compile time so that a strlen is not needed during err_decode().
- The most frequent errors (Overflows) are serviced first to exit the
loop as early as possible.
- The loop now exits as soon as the errs mask is clear rather than
fruitlessly looping through the msp array.

With this fix the performance changes to:

------------------------------------------------------------------
#bytes   #iterations  BW peak[MB/sec]  BW average[MB/sec]
1048576  5000         2990.64          2941.35
------------------------------------------------------------------

During testing of the error handling overflow patch, it was determined
that some CPUs were slower when servicing both overflow and receive
interrupts on CPU0 with different MSI interrupt vectors.

This patch adds an option (krcvq01_no_msi) to not use a dedicated MSI
interrupt for kctx's < 2 and to service them on the default interrupt.
For some CPUs, the interrupt enter/exit is more costly
than the additional PCI read in the default handler.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
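
The decode changes listed in the commit above (a compile-time string size,
memcpy() instead of a printf variant, most frequent error first, early exit
once the mask is clear) can be sketched in user-space C. This is an
illustration under assumptions, not the driver's err_decode(): the struct
layout, the ERR_MSG macro, the bit positions, the comma separator, the
visited counter, and every message except RcvEgrFullErr are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct hwerror_msg {
	uint64_t mask;
	const char *msg;
	size_t sz; /* sizeof("string"), includes the NUL */
};

/* Capture the size at compile time so no strlen() is needed later. */
#define ERR_MSG(bit, text) { (bit), (text), sizeof(text) }

static const struct hwerror_msg err_msgs[] = {
	ERR_MSG(1u << 0, "RcvEgrFullErr"),   /* most frequent first */
	ERR_MSG(1u << 1, "RcvHdrFullErr"),   /* hypothetical entry */
	ERR_MSG(1u << 2, "SendUnderRunErr"), /* hypothetical entry */
};

/*
 * Writes a comma-separated list of set errors into buf and returns its
 * length. *visited counts table entries examined, to show the early exit.
 */
static size_t err_decode(char *buf, size_t blen, uint64_t errs,
			 unsigned *visited)
{
	size_t n = sizeof(err_msgs) / sizeof(err_msgs[0]);
	size_t used = 0;

	if (blen == 0)
		return 0;
	buf[0] = '\0';

	/* Stop as soon as no error bits remain. */
	for (size_t i = 0; i < n && errs; i++) {
		const struct hwerror_msg *m = &err_msgs[i];
		size_t len = m->sz - 1; /* text without the NUL */

		(*visited)++;
		if (!(errs & m->mask))
			continue;
		errs &= ~m->mask;

		/* Room for optional comma, text, and trailing NUL? */
		if (used + (used ? 1 : 0) + len + 1 > blen)
			break;
		if (used)
			buf[used++] = ',';
		memcpy(buf + used, m->msg, len);
		used += len;
		buf[used] = '\0';
	}
	return used;
}
```

When only the first (most frequent) bit is set, the loop examines a single
table entry before returning.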
25985edcedea6396277003854657b5f3cb31a628 31-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi> Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
19ede2e422496b2a064b9b22823c6afb66ff927b 11-Jan-2011 Mike Marciniszyn <mike.marciniszyn@qlogic.com> IB/qib: Fix interrupt mitigation

For SusieQ we need to write to the interrupt timer register before
updating the header queue head with interrupt count. This is to
ensure that the timer is enabled properly and a receive available
interrupt is delivered. Otherwise this interrupt can be lost if the
receiver header/eager queues are full before the timer is enabled.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
82fdb0ab54096b8dbc8558e2dd37e9e0ac180db8 22-Oct-2010 Jason Gunthorpe <jgunthorpe@obsidianresearch.com> IB/qib: Fix extra log level in qib_early_err()

Noticed this odd looking thing in dmesg:

ib_qib 0000:02:00.0: <3>ib_qib: Unable to enable pcie error reporting: -5

which is due to a bad use of dev_info.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Acked-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ba818afdc62590e95e45d63be96954ea568925bf 05-Aug-2010 David Miller <davem@davemloft.net> IB/qib: Add missing <linux/slab.h> include

Fix build failure on sparc64 which is missing the include of
<linux/slab.h> via <asm/pci.h> that x86, powerpc, ia64, etc. have.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
cc323b2aaa3921c4eeec309ff64256b0c43ca752 03-Jun-2010 Ralph Campbell <ralph.campbell@qlogic.com> IB/qib: Avoid variable-length array

Rather than use a variable size array allocation on the stack,
define a constant for the maximum array size possible.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
fce24a9d28f8b99fd0eacc14e252ab4fca9527a7 18-Jun-2010 Dave Olson <dave.olson@qlogic.com> IB/qib: Don't mark VL15 bufs as WC to avoid a rare 7322 chip problem

Don't set write combining via PAT on the VL15 buffers to avoid a rare
problem with unaligned writes from interrupt-flushed store buffers.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
f931551bafe1f10ded7f5282e2aa162c267a2e5d 24-May-2010 Ralph Campbell <ralph.campbell@qlogic.com> IB/qib: Add new qib driver for QLogic PCIe InfiniBand adapters

Add a low-level IB driver for QLogic PCIe adapters.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>