History log of /net/rds/rdma.c
Revision Date Author Comments
dee49f203a7feef5d00c416b7dc7e34a7caba8e1 14-Oct-2014 Cong Wang <cwang@twopensource.com> rds: avoid calling sock_kfree_s() on allocation failure

It is okay to free a NULL pointer but not okay to mischarge the socket optmem
accounting. Compile test only.

Reported-by: rucsoftsec@gmail.com
Cc: Chien Yen <chien.yen@oracle.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
218854af84038d828a32f061858b1902ed2beec6 17-Nov-2010 Dan Rosenberg <drosenberg@vsecurity.com> rds: Integer overflow in RDS cmsg handling

In rds_cmsg_rdma_args(), the user-provided args->nr_local value is
restricted to less than UINT_MAX. This seems to need a tighter upper
bound, since the calculation of total iov_size can overflow, resulting
in a small sock_kmalloc() allocation. This would probably just result
in walking off the heap and crashing when calling rds_rdma_pages() with
a high count value. If it somehow doesn't crash here, then memory
corruption could occur soon after.

Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
d139ff0907dac9ef72fb2cf301e345bac3aec42f 28-Oct-2010 Andy Grover <andy.grover@oracle.com> RDS: Let rds_message_alloc_sgs() return NULL

Even with the previous fix, we still are reading the iovecs once
to determine SGs needed, and then again later on. Preallocating
space for sg lists as part of rds_message seemed like a good idea
but it might be better to not do this. While working to redo that
code, this patch attempts to protect against userspace rewriting
the rds_iovec array between the first and second accesses.

The consequences of this would be either a too-small or too-large
sg list array. Too large is not an issue. This patch changes all
callers of message_alloc_sgs to handle running out of preallocated
sgs, and fail gracefully.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fc8162e3c034af743d8def435fda6396603d321f 28-Oct-2010 Andy Grover <andy.grover@oracle.com> RDS: Copy rds_iovecs into kernel memory instead of rereading from userspace

Change rds_rdma_pages to take a passed-in rds_iovec array instead
of doing copy_from_user itself.

Change rds_cmsg_rdma_args to copy rds_iovec array once only. This
eliminates the possibility of userspace changing it after our
sanity checks.

Implement stack-based storage for small numbers of iovecs, based
on net/socket.c, to save an alloc in the extremely common case.

Although this patch reduces iovec copies in cmsg_rdma_args to 1,
we still do another one in rds_rdma_extra_size. Getting rid of
that one will be trickier, so it'll be a separate patch.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f4a3fc03c1d73753879fb655b8cd628b29f6706b 28-Oct-2010 Andy Grover <andy.grover@oracle.com> RDS: Clean up error handling in rds_cmsg_rdma_args

We don't need to set ret = 0 at the end -- it's initialized to 0.

Also, don't increment s_send_rdma stat if we're exiting with an
error.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a09f69c49b84b161ebd4dd09d3cce1b68297f1d3 28-Oct-2010 Andy Grover <andy.grover@oracle.com> RDS: Return -EINVAL if rds_rdma_pages returns an error

rds_cmsg_rdma_args would still return success even if rds_rdma_pages
returned an error (or overflowed).

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1b1f693d7ad6d193862dcb1118540a030c5e761f 28-Oct-2010 Linus Torvalds <torvalds@linux-foundation.org> net: fix rds_iovec page count overflow

As reported by Thomas Pollet, the rdma page counting can overflow. We
get the rdma sizes in 64-bit unsigned entities, but then limit it to
UINT_MAX bytes and shift them down to pages (so with a possible "+1" for
an unaligned address).

So each individual page count fits comfortably in an 'unsigned int' (not
even close to overflowing into signed), but as they are added up, they
might end up resulting in a signed return value. Which would be wrong.

Catch the case of tot_pages turning negative, and return the appropriate
error code.

Reported-by: Thomas Pollet <thomas.pollet@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9b9d2e00bfa592aceda7b43da76c670df61faa97 18-Sep-2010 Dan Carpenter <error27@gmail.com> rds: signedness bug

In the original code if the copy_from_user() fails in rds_rdma_pages()
then the error handling fails and we get a stack trace from kmalloc().

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
20c72bd5f5f902e5a8745d51573699605bf8d21c 25-Aug-2010 Andy Grover <andy.grover@oracle.com> RDS: Implement masked atomic operations

Add two CMSGs for masked versions of cswp and fadd. args
struct modified to use a union for different atomic op type's
arguments. Change IB to do masked atomic ops. Atomic op type
in rds_message similarly unionized.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
2c3a5f9abb1dc5efdab8ba9a568b1661c65fd1e3 02-Mar-2010 Andy Grover <andy.grover@oracle.com> RDS: Add flag for silent ops. Do atomic op before RDMA

Add a flag to the API so users can indicate they want
silent operations. This is needed because silent ops
cannot be used with USE_ONCE MRs, so we can't just
assume silent.

Also, change send_xmit to do atomic op before rdma op if
both are present, and centralize the hairy logic to determine if
we want to attempt silent, or not.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
7e3bd65ebfd5d6cd76b8b979920c632d6e6b4b2a 02-Mar-2010 Andy Grover <andy.grover@oracle.com> RDS: Move some variables around for consistency

Also, add a comment.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
f8b3aaf2ba8ca9e27b47f8bfdff07c8b968f2c05 01-Mar-2010 Andy Grover <andy.grover@oracle.com> RDS: Remove struct rds_rdma_op

A big changeset, but it's all pretty dumb.

struct rds_rdma_op was already embedded in struct rm_rdma_op.
Remove rds_rdma_op and put its members in rm_rdma_op. Rename
members with "op_" prefix instead of "r_", for consistency.

Of course this breaks a lot, so fixup the code accordingly.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
d0ab25a83c4a08cd98b73a37d3f4c069f7b4f50b 28-Jan-2010 Andy Grover <andy.grover@oracle.com> RDS: purge atomic resources too in rds_message_purge()

Add atomic_free_op function, analogous to rdma_free_op,
and call it in rds_message_purge().

Signed-off-by: Andy Grover <andy.grover@oracle.com>
4324879df06ba4db01a0b455af2d003f117e6aa3 28-Jan-2010 Andy Grover <andy.grover@oracle.com> RDS: Inline rdma_prepare into cmsg_rdma_args

cmsg_rdma_args just calls rdma_prepare and does a little
arg checking -- not quite enough to justify its existence.
Plus, it is the only caller of rdma_prepare().

Signed-off-by: Andy Grover <andy.grover@oracle.com>
6200ed7799d9225f363f157ab61f1566cfd80e19 12-Jan-2010 Andy Grover <andy.grover@oracle.com> RDS: Whitespace

Tidy up some whitespace issues.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
d22faec22c2ab2364fd8fc3c8159b0b5b28b0fd1 12-Jan-2010 Andy Grover <andy.grover@oracle.com> RDS: Do not mask address when pinning pages

This does not appear to be necessary.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
40589e74f7ba855f3a887c9d4abe9d100c5b039c 12-Jan-2010 Andy Grover <andy.grover@oracle.com> RDS: Base init_depth and responder_resources on hw values

Instead of using a constant for initiator_depth and
responder_resources, read the per-QP values when the
device is enumerated, and then use these values when creating
the connection.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
15133f6e67d8d646d0744336b4daa3135452cb0d 12-Jan-2010 Andy Grover <andy.grover@oracle.com> RDS: Implement atomic operations

Implement a CMSG-based interface to do FADD and CSWP ops.

Alter send routines to handle atomic ops.

Add atomic counters to stats.

Add xmit_atomic() to struct rds_transport

Inline rds_ib_send_unmap_rdma into unmap_rm

Signed-off-by: Andy Grover <andy.grover@oracle.com>
f4dd96f7b27743e568cec519eff0f951c56833c6 12-Jan-2010 Andy Grover <andy.grover@oracle.com> RDS: make sure all sgs alloced are initialized

rds_message_alloc_sgs() now returns correctly-initialized
sg lists, so calleds need not do this themselves.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
ff87e97a9d70c9ae133d3d3d7792b26ab85f4297 12-Jan-2010 Andy Grover <andy.grover@oracle.com> RDS: make m_rdma_op a member of rds_message

This eliminates a separate memory alloc, although
it is now necessary to add an "r_active" flag, since
it is no longer to use the m_rdma_op pointer as an
indicator of if an rdma op is present.

rdma SGs allocated from rm sg pool.

rds_rm_size also gets bigger. It's a little inefficient to
run through CMSGs twice, but it makes later steps a lot smoother.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
21f79afa5fda2820671a8f64c3d0e43bb118053b 12-Jan-2010 Andy Grover <andy.grover@oracle.com> RDS: fold rdma.h into rds.h

RDMA is now an intrinsic part of RDS, so it's easier to just have
a single header.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
3ef13f3c22aaea28aff383cb0883481d24885456 12-Jan-2010 Andy Grover <andy.grover@oracle.com> RDS: cleanup/fix rds_rdma_unuse

First, it looks to me like the atomic_inc is wrong.
We should be decrementing refcount only once here, no? It's
already being done by the mr_put() at the end.

Second, simplify the logic a bit by bailing early (with a warning)
if !mr.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
e779137aa76d38d5c33a98ed887092ae4e4f016f 12-Jan-2010 Andy Grover <andy.grover@oracle.com> RDS: break out rdma and data ops into nested structs in rds_message

Clearly separate rdma-related variables in rm from data-related ones.
This is in anticipation of adding atomic support.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
8690bfa17aea4c42da1bcf90a7af93d161eca624 12-Jan-2010 Andy Grover <andy.grover@oracle.com> RDS: cleanup: remove "== NULL"s and "!= NULL"s in ptr comparisons

Favor "if (foo)" style over "if (foo != NULL)".

Signed-off-by: Andy Grover <andy.grover@oracle.com>
35b52c70534cb7193b218ec12efe6bc595312097 01-Apr-2010 Tina Yang <tina.yang@oracle.com> RDS: Fix corrupted rds_mrs

On second look at this bug (OFED #2002), it seems that the
collision is not with the retransmission queue (packet acked
by the peer), but with the local send completion. A theoretical
sequence of events (from time t0 to t3) is thought to be as
follows,

Thread #1
t0:
sock_release
rds_release
rds_send_drop_to /* wait on send completion */
t2:
rds_rdma_drop_keys() /* destroy & free all mrs */

Thread #2
t1:
rds_ib_send_cq_comp_handler
rds_ib_send_unmap_rm
rds_message_unmapped /* wake up #1 @ t0 */
t3:
rds_message_put
rds_message_purge
rds_mr_put /* memory corruption detected */

The problem with the rds_rdma_drop_keys() is it could
remove a mr's refcount more than its due (i.e. repeatedly
as long as it still remains in the tree (mr->r_refcount > 0)).
Theoretically it should remove only one reference - reference
by the tree.

/* Release any MRs associated with this socket */
while ((node = rb_first(&rs->rs_rdma_keys))) {
mr = container_of(node, struct rds_mr, r_rb_node);
if (mr->r_trans == rs->rs_transport)
mr->r_invalidate = 0;
rds_mr_put(mr);
}

I think the correct way of doing it is to remove the mr from
the tree and rds_destroy_mr it first, then a rds_mr_put()
to decrement its reference count by one. Whichever thread
holds the last reference will free the mr via rds_mr_put().

Signed-off-by: Tina Yang <tina.yang@oracle.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
9e2effba2c16fc3bd47da605116485afe01e0be0 13-Mar-2010 Andy Grover <andy.grover@oracle.com> RDS: Fix BUG_ONs to not fire when in a tasklet

in_interrupt() is true in softirqs. The BUG_ONs are supposed
to check for if irqs are disabled, so we should use
BUG_ON(irqs_disabled()) instead, duh.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
5a0e3ad6af8660be21ca98a971cd00f331318c05 24-Mar-2010 Tejun Heo <tj@kernel.org> include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
561c7df63e259203515509a7ad075382a42bff0c 11-Mar-2010 Andy Grover <andy.grover@oracle.com> RDS: Do not call set_page_dirty() with irqs off

set_page_dirty() unconditionally re-enables interrupts, so
if we call it with irqs off, they will be on after the call,
and that's bad. This patch moves the call after we've re-enabled
interrupts in send_drop_to(), so it's safe.

Also, add BUG_ONs to let us know if we ever do call set_page_dirty
with interrupts off.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
f64f9e719261a87818dd192a3a2352e5b20fbd0f 30-Nov-2009 Joe Perches <joe@perches.com> net: Move && and || to end of previous line

Not including net/atm/

Compiled tested x86 allyesconfig only
Added a > 80 column line or two, which I ignored.
Existing checkpatch plaints willfully, cheerfully ignored.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
244546f0d3101c5441f5b14cfe8a79d62679eaea 30-Oct-2009 Andy Grover <andy.grover@oracle.com> RDS: Add GET_MR_FOR_DEST sockopt

RDS currently supports a GET_MR sockopt to establish a
memory region (MR) for a chunk of memory. However, the fastreg
method ties a MR to a particular destination. The GET_MR_FOR_DEST
sockopt allows the remote machine to be specified, and thus
support for fastreg (aka FRWRs).

Note that this patch does *not* do all of this - it simply
implements the new sockopt in terms of the old one, so applications
can begin to use the new sockopt in preparation for cutover to
FRWRs.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
830eb7d56c18ff4c29acd8b0bb48db404660321f 09-Apr-2009 Andy Grover <andy.grover@oracle.com> RDS: use get_user_pages_fast()

Use the new function that is simpler and faster.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7acd4a794c1530af063e51f3f7171e75556458f3 09-Apr-2009 Andy Grover <andy.grover@oracle.com> RDS: Fix ordering in a conditional

Putting the constant first is a supposed "best practice" that actually makes
the code harder to read.

Thanks to Roland Dreier for finding a bug in this "simple, obviously correct"
patch.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
eff5f53bef75c0864a5da06bb688939092b848dc 24-Feb-2009 Andy Grover <andy.grover@oracle.com> RDS: RDMA support

Some transports may support RDMA features. This handles the
non-transport-specific parts, like pinning user pages and
tracking mapped regions.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>