History log of /net/rds/ib_send.c
Revision Date Author Comments
71fd762f2eef6acc848e262ac934fc694b49204e 18-May-2014 Manuel Schölling <manuel.schoelling@gmx.de> net: rds: Use time_after() for time comparison

To be future-proof and for better readability the time comparisons are modified
to use time_after() instead of raw math.

Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
18fc25c94eadc52a42c025125af24657a93638c0 03-Dec-2013 Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> rds: prevent BUG_ON triggered on congestion update to loopback

After congestion update on a local connection, when rds_ib_xmit returns
less bytes than that are there in the message, rds_send_xmit calls
back rds_ib_xmit with an offset that causes BUG_ON(off & RDS_FRAG_SIZE)
to trigger.

For a 4Kb PAGE_SIZE rds_ib_xmit returns min(8240,4096)=4096 when actually
the message contains 8240 bytes. rds_send_xmit thinks there is more to send
and calls rds_ib_xmit again with a data offset "off" of 4096-48(rds header)
=4048 bytes thus hitting the BUG_ON(off & RDS_FRAG_SIZE) [RDS_FRAG_SIZE=4k].

The commit 6094628bfd94323fc1cea05ec2c6affd98c18f7f
"rds: prevent BUG_ON triggering on congestion map updates" introduced
this regression. That change was addressing the triggering of a different
BUG_ON in rds_send_xmit() on PowerPC architecture with 64Kbytes PAGE_SIZE:
BUG_ON(ret != 0 &&
conn->c_xmit_sg == rm->data.op_nents);
This was the sequence it was going through:
(rds_ib_xmit)
/* Do not send cong updates to IB loopback */
if (conn->c_loopback
&& rm->m_inc.i_hdr.h_flags & RDS_FLAG_CONG_BITMAP) {
rds_cong_map_updated(conn->c_fcong, ~(u64) 0);
return sizeof(struct rds_header) + RDS_CONG_MAP_BYTES;
}
rds_ib_xmit returns 8240
rds_send_xmit:
c_xmit_data_off = 0 + 8240 - 48 (rds header accounted only the first time)
= 8192
c_xmit_data_off < 65536 (sg->length), so calls rds_ib_xmit again
rds_ib_xmit returns 8240
rds_send_xmit:
c_xmit_data_off = 8192 + 8240 = 16432, calls rds_ib_xmit again
and so on (c_xmit_data_off 24672,32912,41152,49392,57632)
rds_ib_xmit returns 8240
On this iteration this sequence causes the BUG_ON in rds_send_xmit:
while (ret) {
tmp = min_t(int, ret, sg->length - conn->c_xmit_data_off);
[tmp = 65536 - 57632 = 7904]
conn->c_xmit_data_off += tmp;
[c_xmit_data_off = 57632 + 7904 = 65536]
ret -= tmp;
[ret = 8240 - 7904 = 336]
if (conn->c_xmit_data_off == sg->length) {
conn->c_xmit_data_off = 0;
sg++;
conn->c_xmit_sg++;
BUG_ON(ret != 0 &&
conn->c_xmit_sg == rm->data.op_nents);
[c_xmit_sg = 1, rm->data.op_nents = 1]

What the current fix does:
Since the congestion update over loopback is not actually transmitted
as a message, all that rds_ib_xmit needs to do is let the caller think
the full message has been transmitted and not return partial bytes.
It will return 8240 (RDS_CONG_MAP_BYTES+48) when PAGE_SIZE is 4Kb.
And 64Kb+48 when page size is 64Kb.

Reported-by: Josh Hunt <joshhunt00@gmail.com>
Tested-by: Honggang Li <honli@redhat.com>
Acked-by: Bang Nguyen <bang.nguyen@oracle.com>
Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cb0a60564943db21ed3af975ac3d578cdc80b329 16-Jun-2011 Manuel Zerpies <manuel.f.zerpies@ww.stud.uni-erlangen.de> net/rds: use prink_ratelimited() instead of printk_ratelimit()

Since printk_ratelimit() shouldn't be used anymore (see comment in
include/linux/printk.h), replace it with printk_ratelimited()

Signed-off-by: Manuel Zerpies <manuel.f.zerpies@ww.stud.uni-erlangen.de>
Signed-off-by: David S. Miller <davem@conan.davemloft.net>
25985edcedea6396277003854657b5f3cb31a628 31-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi> Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
6094628bfd94323fc1cea05ec2c6affd98c18f7f 02-Mar-2011 Neil Horman <nhorman@tuxdriver.com> rds: prevent BUG_ON triggering on congestion map updates

Recently had this bug halt reported to me:

kernel BUG at net/rds/send.c:329!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in: rds sunrpc ipv6 dm_mirror dm_region_hash dm_log ibmveth sg
ext4 jbd2 mbcache sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt
dm_mod [last unloaded: scsi_wait_scan]
NIP: d000000003ca68f4 LR: d000000003ca67fc CTR: d000000003ca8770
REGS: c000000175cab980 TRAP: 0700 Not tainted (2.6.32-118.el6.ppc64)
MSR: 8000000000029032 <EE,ME,CE,IR,DR> CR: 44000022 XER: 00000000
TASK = c00000017586ec90[1896] 'krdsd' THREAD: c000000175ca8000 CPU: 0
GPR00: 0000000000000150 c000000175cabc00 d000000003cb7340 0000000000002030
GPR04: ffffffffffffffff 0000000000000030 0000000000000000 0000000000000030
GPR08: 0000000000000001 0000000000000001 c0000001756b1e30 0000000000010000
GPR12: d000000003caac90 c000000000fa2500 c0000001742b2858 c0000001742b2a00
GPR16: c0000001742b2a08 c0000001742b2820 0000000000000001 0000000000000001
GPR20: 0000000000000040 c0000001742b2814 c000000175cabc70 0800000000000000
GPR24: 0000000000000004 0200000000000000 0000000000000000 c0000001742b2860
GPR28: 0000000000000000 c0000001756b1c80 d000000003cb68e8 c0000001742b27b8
NIP [d000000003ca68f4] .rds_send_xmit+0x4c4/0x8a0 [rds]
LR [d000000003ca67fc] .rds_send_xmit+0x3cc/0x8a0 [rds]
Call Trace:
[c000000175cabc00] [d000000003ca67fc] .rds_send_xmit+0x3cc/0x8a0 [rds]
(unreliable)
[c000000175cabd30] [d000000003ca7e64] .rds_send_worker+0x54/0x100 [rds]
[c000000175cabdb0] [c0000000000b475c] .worker_thread+0x1dc/0x3c0
[c000000175cabed0] [c0000000000baa9c] .kthread+0xbc/0xd0
[c000000175cabf90] [c000000000032114] .kernel_thread+0x54/0x70
Instruction dump:
4bfffd50 60000000 60000000 39080001 935f004c f91f0040 41820024 813d017c
7d094a78 7d290074 7929d182 394a0020 <0b090000> 40e2ff68 4bffffa4 39200000
Kernel panic - not syncing: Fatal exception
Call Trace:
[c000000175cab560] [c000000000012e04] .show_stack+0x74/0x1c0 (unreliable)
[c000000175cab610] [c0000000005a365c] .panic+0x80/0x1b4
[c000000175cab6a0] [c00000000002fbcc] .die+0x21c/0x2a0
[c000000175cab750] [c000000000030000] ._exception+0x110/0x220
[c000000175cab910] [c000000000004b9c] program_check_common+0x11c/0x180

Signed-off-by: David S. Miller <davem@davemloft.net>
20c72bd5f5f902e5a8745d51573699605bf8d21c 25-Aug-2010 Andy Grover <andy.grover@oracle.com> RDS: Implement masked atomic operations

Add two CMSGs for masked versions of cswp and fadd. args
struct modified to use a union for different atomic op type's
arguments. Change IB to do masked atomic ops. Atomic op type
in rds_message similarly unionized.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
59f740a6aeb2cde2f79fe0df38262d4c1ef35cd8 03-Aug-2010 Zach Brown <zach.brown@oracle.com> RDS/IB: print string constants in more places

This prints the constant identifier for work completion status and rdma
cm event types, like we already do for IB event types.

A core string array helper is added that each string type uses.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
f046011cd73c372267befd10242988eb744649fe 14-Jul-2010 Zach Brown <zach.brown@oracle.com> RDS/IB: track signaled sends

We're seeing bugs today where IB connection shutdown clears the send
ring while the tasklet is processing completed sends. Implementation
details cause this to dereference a null pointer. Shutdown needs to
wait for send completion to stop before tearing down the connection. We
can't simply wait for the ring to empty because it may contain
unsignaled sends that will never be processed.

This patch tracks the number of signaled sends that we've posted and
waits for them to complete. It also makes sure that the tasklet has
finished executing.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
0f4b1c7e89e699f588807a914ec6e6396c851a72 04-Jun-2010 Zach Brown <zach.brown@oracle.com> rds: fix rds_send_xmit() serialization

rds_send_xmit() was changed to hold an interrupt masking spinlock instead of a
mutex so that it could be called from the IB receive tasklet path. This broke
the TCP transport because its xmit method can block and masks and unmasks
interrupts.

This patch serializes callers to rds_send_xmit() with a simple bit instead of
the current spinlock or previous mutex. This enables rds_send_xmit() to be
called from any context and to call functions which block. Getting rid of the
c_send_lock exposes the bare c_lock acquisitions which are changed to block
interrupts.

A waitqueue is added so that rds_conn_shutdown() can wait for callers to leave
rds_send_xmit() before tearing down partial send state. This lets us get rid
of c_senders.

rds_send_xmit() is changed to check the conn state after acquiring the
RDS_IN_XMIT bit to resolve races with the shutdown path. Previously both
worked with the conn state and then the lock in the same order, allowing them
to race and execute the paths concurrently.

rds_send_reset() isn't racing with rds_send_xmit() now that rds_conn_shutdown()
properly ensures that rds_send_xmit() can't start once the conn state has been
changed. We can remove its previous use of the spinlock.

Finally, c_send_generation is redundant. Callers can race to test the c_flags
bit by simply retrying instead of racing to test the c_send_generation atomic.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
89bf9d4158b5a1b6bd00960eb2e47601ec8cc138 19-May-2010 Zach Brown <zach.brown@oracle.com> RDS/IB: get the xmit max_sge from the RDS IB device on the connection

rds_ib_xmit_rdma() was calling ib_get_client_data() to get at the rds_ibdevice
just to get the max_sge for the transmit. This patch instead has it get it
directly off the rds_ibdev which is stored on the connection.

The current code won't free the rds_ibdev until all the IB connections that use
it are freed. So it's safe to reference the rds_ibdev this way. In the future
it also makes it easier to support proper reference counting of the rds_ibdev
struct.

As an additional bonus, this gets rid of the performance hit of calling in to
the IB stack to look up the rds_ibdev. The current implementation in the IB
stack acquires an interrupt blocking spinlock to protect the registration of
client callback data.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
1cc2228c599f173d77000a250bf0541294e1a7be 12-May-2010 Chris Mason <chris.mason@oracle.com> rds: Fix reference counting on the for xmit_atomic and xmit_rdma

This makes sure we have the proper number of references in
rds_ib_xmit_atomic and rds_ib_xmit_rdma. We also consistently
drop references the same way for all message types as the IOs end.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
c9e65383a20d9a656db70efbf67e57f8115ad776 12-May-2010 Chris Mason <chris.mason@oracle.com> rds: Fix RDMA message reference counting

The RDS send_xmit code was trying to get fancy with message
counting and was dropping the final reference on the RDMA messages
too early. This resulted in memory corruption and oopsen.

The fix here is to always add a ref as the parts of the message passes
through rds_send_xmit, and always drop a ref as the parts of the message
go through completion handling.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
51e2cba8b5936c13b40f0fa11aa4e84683dbc751 30-Mar-2010 Andy Grover <andy.grover@oracle.com> RDS: Move atomic stats from general to ib-specific area

Signed-off-by: Andy Grover <andy.grover@oracle.com>
ff3d7d36134ef7138803734fdbf91cc986ea7976 01-Mar-2010 Andy Grover <andy.grover@oracle.com> RDS: Perform unmapping ops in stages

Previously, RDS would wait until the final send WR had completed
and then handle cleanup. With silent ops, we do not know
if an atomic, rdma, or data op will be last. This patch
handles any of these cases by keeping a pointer to the last
op in the message in m_last_op.

When the TX completion event fires, rds dispatches to per-op-type
cleanup functions, and then does whole-message cleanup, if the
last op equalled m_last_op.

This patch also moves towards having op-specific functions take
the op struct, instead of the overall rm struct.

rds_ib_connection has a pointer to keep track of a a partially-
completed data send operation. This patch changes it from an
rds_message pointer to the narrower rm_data_op pointer, and
modifies places that use this pointer as needed.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
6c7cc6e4694dc464ae884332f2a322973497e3cf 28-Jan-2010 Andy Grover <andy.grover@oracle.com> RDS: Rename data op members prefix from m_ to op_

For consistency.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
f8b3aaf2ba8ca9e27b47f8bfdff07c8b968f2c05 01-Mar-2010 Andy Grover <andy.grover@oracle.com> RDS: Remove struct rds_rdma_op

A big changeset, but it's all pretty dumb.

struct rds_rdma_op was already embedded in struct rm_rdma_op.
Remove rds_rdma_op and put its members in rm_rdma_op. Rename
members with "op_" prefix instead of "r_", for consistency.

Of course this breaks a lot, so fixup the code accordingly.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
241eef3e2f51fe4ad50abacd7f79c4e2d468197e 20-Jan-2010 Andy Grover <andy.grover@oracle.com> RDS: Implement silent atomics

Signed-off-by: Andy Grover <andy.grover@oracle.com>
c8de3f1005e8359ea07083e37f3f993646e1adba 16-Jan-2010 Andy Grover <andy.grover@oracle.com> RDS/IB: Make all flow control code conditional on i_flowctl

Maybe things worked fine with the flow control code running
even in the non-flow-control case, but making it explicitly
conditional helps the non-fc case be easier to read.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
1d34f175712b59ad292ecbbaa8fc05402a1fd8ed 15-Jan-2010 Andy Grover <andy.grover@oracle.com> RDS: Remove unsignaled_bytes sysctl

Removed unsignaled_bytes sysctl and code to signal
based on it. I believe unsignaled_wrs is more than
sufficient for our purposes.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
da5a06cef5724737af4315715632f0a07dd5e116 14-Jan-2010 Andy Grover <andy.grover@oracle.com> RDS: rewrite rds_ib_xmit

Now that the header always goes first, it is possible to
simplify rds_ib_xmit. Instead of having a path to handle 0-byte
dgrams and another path to handle >0, these can both be handled
in one path. This lets us eliminate xmit_populate_wr().

Rename sent to bytes_sent, to differentiate better from other
variable named "send".

Signed-off-by: Andy Grover <andy.grover@oracle.com>
919ced4ce7d6ac62dd5be62d8993fe22a527d53a 14-Jan-2010 Andy Grover <andy.grover@oracle.com> RDS/IB: Remove ib_[header/data]_sge() functions

These functions were to cope with differently ordered
sg entries depending on RDS 3.0 or 3.1+. Now that
we've dropped 3.0 compatibility we no longer need them.

Also, modify usage sites for these to refer to sge[0] or [1]
directly. Reorder code to initialize header sgs first.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
6f3d05db0da0b874afd2dd229bed715133532f8d 14-Jan-2010 Andy Grover <andy.grover@oracle.com> RDS/IB: Remove dead code

Signed-off-by: Andy Grover <andy.grover@oracle.com>
9c030391e8741695ff6114703e4edccccb634479 12-Jan-2010 Andy Grover <andy.grover@oracle.com> RDS/IB: eliminate duplicate code

both atomics and rdmas need to convert ib-specific completion codes
into RDS status codes. Rename rds_ib_rdma_send_complete to
rds_ib_send_complete, and have it take a pointer to the function to
call with the new error code.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
15133f6e67d8d646d0744336b4daa3135452cb0d 12-Jan-2010 Andy Grover <andy.grover@oracle.com> RDS: Implement atomic operations

Implement a CMSG-based interface to do FADD and CSWP ops.

Alter send routines to handle atomic ops.

Add atomic counters to stats.

Add xmit_atomic() to struct rds_transport

Inline rds_ib_send_unmap_rdma into unmap_rm

Signed-off-by: Andy Grover <andy.grover@oracle.com>
ff87e97a9d70c9ae133d3d3d7792b26ab85f4297 12-Jan-2010 Andy Grover <andy.grover@oracle.com> RDS: make m_rdma_op a member of rds_message

This eliminates a separate memory alloc, although
it is now necessary to add an "r_active" flag, since
it is no longer to use the m_rdma_op pointer as an
indicator of if an rdma op is present.

rdma SGs allocated from rm sg pool.

rds_rm_size also gets bigger. It's a little inefficient to
run through CMSGs twice, but it makes later steps a lot smoother.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
21f79afa5fda2820671a8f64c3d0e43bb118053b 12-Jan-2010 Andy Grover <andy.grover@oracle.com> RDS: fold rdma.h into rds.h

RDMA is now an intrinsic part of RDS, so it's easier to just have
a single header.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
e779137aa76d38d5c33a98ed887092ae4e4f016f 12-Jan-2010 Andy Grover <andy.grover@oracle.com> RDS: break out rdma and data ops into nested structs in rds_message

Clearly separate rdma-related variables in rm from data-related ones.
This is in anticipation of adding atomic support.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
8690bfa17aea4c42da1bcf90a7af93d161eca624 12-Jan-2010 Andy Grover <andy.grover@oracle.com> RDS: cleanup: remove "== NULL"s and "!= NULL"s in ptr comparisons

Favor "if (foo)" style over "if (foo != NULL)".

Signed-off-by: Andy Grover <andy.grover@oracle.com>
450d06c0208ad195ccd74a7edd11321e316791ad 11-Mar-2010 Sherman Pun <sherman.pun@sun.com> RDS: Properly unmap when getting a remote access error

If the RDMA op has aborted with a remote access error,
in addition to what we already do (tell userspace it has
completed with an error) also unmap it and put() the rm.

Otherwise, hangs may occur on arches that track maps and
will not exit without proper cleanup.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2e7b3b994529d4760231a45a6b88950187bda877 11-Mar-2010 Andy Grover <andy.grover@oracle.com> RDS: Fix congestion issues for loopback

We have two kinds of loopback: software (via loop transport)
and hardware (via IB). sw is used for 127.0.0.1, and doesn't
support rdma ops. hw is used for sends to local device IPs,
and supports rdma. Both are used in different cases.

For both of these, when there is a congestion map update, we
want to call rds_cong_map_updated() but not actually send
anything -- since loopback local and foreign congestion maps
point to the same spot, they're already in sync.

The old code never called sw loop's xmit_cong_map(),so
rds_cong_map_updated() wasn't being called for it. sw loop
ports would not work right with the congestion monitor.

Fixing that meant that hw loopback now would send congestion maps
to itself. This is also undesirable (racy), so we check for this
case in the ib-specific xmit code.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
735f61e62611161588123930823af6e6a9fd5c2c 11-Mar-2010 Andy Grover <andy.grover@oracle.com> RDS: Do not BUG() on error returned from ib_post_send

BUGging on a runtime error code should be avoided. This
patch also eliminates all other BUG()s that have no real
reason to exist.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f64f9e719261a87818dd192a3a2352e5b20fbd0f 30-Nov-2009 Joe Perches <joe@perches.com> net: Move && and || to end of previous line

Not including net/atm/

Compiled tested x86 allyesconfig only
Added a > 80 column line or two, which I ignored.
Existing checkpatch plaints willfully, cheerfully ignored.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7b70d0336da777c00395cc7a503497c2cdabd1a8 09-Apr-2009 Steve Wise <swise@opengridcomputing.com> RDS/IW+IB: Allow max credit advertise window.

Fix hack that restricts the credit advertisement to 127.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d39e0602bb987133321d358d9b837d67c27b223d 09-Apr-2009 Steve Wise <swise@opengridcomputing.com> RDS/IW+IB: Set the RDS_LL_SEND_FULL bit when we're throttled.

The RDS_LL_SEND_FULL bit should be set when we stop transmitted due to
flow control. Otherwise the send worker will keep trying as opposed to
sleeping until we unthrottle. Saves CPU.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6a0979df32296c3ba75a346db47a18292a231c6e 24-Feb-2009 Andy Grover <andy.grover@oracle.com> RDS/IB: Implement IB-specific datagram send.

Specific to IB is a credits-based flow control mechanism, in
addition to the expected usage of the IB API to package outgoing
data into work requests.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>