6df5a128f0fde6315a44e80b30412997147f5efd |
|
01-Oct-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Suppress scsi command send completions Singal completion of every 32 scsi commands and suppress all the rest. We don't do anything upon getting the completion so no need to "just consume" it. Cleanup of scsi command is done in cleanup_task callback. Still keep dataout and control send completions as we may need to cleanup there. This helps reducing the amount of interrupts/completions in the IO path. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
6e6fe2fb1d61b4baef1cf350049c6877583681ee |
|
01-Oct-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Optimize completion polling Poll in batch of 16. Since we don't want it on the stack, keep under iser completion context (iser_comp). Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
ff3dd52d267165347d6f92a90016e692d074a00c |
|
01-Oct-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Use beacon to indicate all completions were consumed Avoid post_send counting (atomic) in the IO path just to keep track of how many completions we need to consume. Use a beacon post to indicate that all prior posts completed. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
6aabfa76f5e5281e5db128a34420d8f33b8574f7 |
|
01-Oct-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Use single CQ for RX and TX This will solve a possible condition where we might miss TX completion (flush error) during session teardown. Since we are using a single CQ, we don't need to actively drain the TX CQ, instead just wait for flush_completion (when counters reach zero) and remove iser_poll_for_flush_errors(). This patch might introduce a minor performance regression on its own, but the next patches will enhance performance using a single CQ for RX and TX. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
183cfa434ec90897b1423ce4f916e8a237139133 |
|
01-Oct-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Use internal polling budget to avoid possible live-lock We need a way to guarentee that we don't stay in soft-IRQ context for too long. We might starve other pending CQ tasklets or worse lock against application trying to issue IO on the running CPU. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
bf17554035ab2aaf770321208ce48e69aab71cc8 |
|
01-Oct-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Centralize iser completion contexts Introduce iser_comp which centralizes all iser completion related items and is referenced by iser_device and each ib_conn. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
aea8f4df6da46add468c44875348e1045bffeeb7 |
|
01-Oct-2014 |
Ariel Nahum <arieln@mellanox.com> |
IB/iser: Use iser_warn instead of BUG_ON in iser_conn_release In case iscsid was violently killed (SIGKILL) during its error recovery stage, we may never get a connection teardown sequence for some of the old connections. No harm done, but when we try to unload the module we will need to cleanup all these connections. So we actually may end-up here - so it's not a BUG_ON(), just give a relaxed warning that this happened and continue with normal unload. BUG_ON() will cause segfault on module_exit and we don't want that. Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
8c204e69ced1a8c0d74f8b6d7a1393d055c5c4fa |
|
01-Oct-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Signal iSCSI layer that transport is broken in error completions Previously we notified iscsi layer about the connection layer when we consumed all of our flush errors. This was racy as there was no guarentee that iscsi_conn wasn't terminated by then (which ends up in an invalid memory access). In case we got a non FLUSH error completion, we are guarenteed that iscsi_conn is still alive. We should notify iSCSI layer with iscsi_conn_failure to initiate error handling. While we are at it, add a nice kernel-doc style documentation. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
c107a6c0cf1ab883ea87ca50136f4fc5204a4b82 |
|
01-Oct-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Don't bound release_work completions timeouts We no longer rely on iscsi connection teardown sequence, so no need to give a grace period and continue cleanup if it expired. Have iser_conn_release wait for full completion before freeing iser_conn. ib_completion: Guaranteed to come when: - Got DISCONNECTED/ADDR_CHANGE event or - iSCSI called ep_disconnect/conn_stop Guaranteed to finish when: - Got TIMEWAIT_EXIT/DEVICE_REMOVAL event - All Flush errors are consumed - IB related resources are destroyed stop_completion: Guaranteed to come when: - iSCSI calls conn_stop Guaranteed to finish when: - All inflight tasks were cleaned up Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
c47a3c9ed5be167f49a6fd3f696dac03536282eb |
|
01-Oct-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Fix DEVICE REMOVAL handling in the absence of iscsi daemon iscsi daemon is in user-space, thus we can't rely on it to be invoked at connection teardown (if not running or does not receive CPU time). This patch addresses the issue by re-structuring iSER connection teardown logic and CM events handling. The CM events will dictate the RDMA resources destruction (ib_conn) and iser_conn is kept around as long as iscsi_conn is left around allowing iscsi/iser callbacks to continue after RDMA transport was destroyed. This patch introduces a separation in logic when handling CM events: - DISCONNECTED_HANDLER, ADDR_CHANGED This events indicate the start of teardown process. Actions: 1. Terminate the connection: rdma_disconnect (send DREQ/DREP) 2. Notify iSCSI of connection failure 3. Change state to TERMINATING 4. Poll for all flush errors to be consumed - TIMEWAIT_EXIT, DEVICE_REMOVAL These events indicate the final stage of termination process and we can free RDMA related resources. Actions: 1. Call disconnected handler (we are not guaranteed that DISCONNECTED event was invoked in the past) 2. Cleanup RDMA related resources 3. For DEVICE_REMOVAL return non-zero rc from cma_handler to implicitly destroy the cm_id (Can't rely on user-space, make sure we have forward progress) We replace flush_completion (indicate all flushes were consumed) with ib_completion (rdma resources were cleaned up). The iser_conn_release_work will wait for teardown completions: - conn_stop was completed (tasks were cleaned-up) - stop_completion - RDMA resources were destroyed - ib_completion And then will continue to free iser connection representation (iser_conn). Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
96f15198c1457df29b51ed151b1e5b2a223d1346 |
|
01-Oct-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Extend iser_free_ib_conn_res() Put all connection IB related resources release in this routine. One exception is the cm_id which cannot be destroyed as the routine is protected by the state mutex. Also move its position to avoid forward declaration. While at it fix qp NULL assignment. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
a4ee3539f6e2955815b93350bbce01e8915d27f8 |
|
01-Oct-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Re-introduce ib_conn Structure that describes the RDMA relates connection objects. Static member of iser_conn. This patch does not change any functionality Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
5716af6e5234402b2017f41beb36c086201fae42 |
|
01-Oct-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Rename ib_conn -> iser_conn Two reasons why we choose to do this: 1. No point today calling struct iser_conn by another name ib_conn 2. In the next patches we will restructure iser control plane representation - struct iser_conn: connection logical representation - struct ib_conn: connection RDMA layout representation This patch does not change any functionality. Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
c33b15f00bbfb9324dc38e5176f576a0f46e0873 |
|
02-Sep-2014 |
Roi Dayan <roid@mellanox.com> |
IB/iser: Fix RX/TX CQ resource leak on error flow When failing to allocate TX CQ we already allocated RX CQ, so we need to make sure we release it. Also, when failing to register notification to the RX CQ we currently leak both RX and TX CQs of the current index, fix that too. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
8d4aca7f04a098086c6545df158e5a022cbbfcd1 |
|
31-Jul-2014 |
Roi Dayan <roid@mellanox.com> |
IB/iser: Clarify a duplicate counters check This is to prevent someone from thinking that this code section is redundant. Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
9a6d3234a192d4a3a51df1042c13af13f996242a |
|
31-Jul-2014 |
Ariel Nahum <arieln@mellanox.com> |
IB/iser: Replace connection waitqueue with completion object Instead of waiting for events and condition changes of the iser connection state, we wait for explicit completion of connection establishment and teardown. Separate connection establishment wait object from the teardown object to avoid a situation where racing connection establishment and teardown may concurrently wakeup each other. ep_poll will wait for up_completion invoked by iser_connected_handler() and iser release worker will wait for flush_completion before releasing the connection. Bound the completion wait with a 30 seconds timeout for cases where iscsid (the user space iscsi daemon) is too slow or gone. Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
504130c039f917aba8b145fe8ea99be95e662fca |
|
31-Jul-2014 |
Ariel Nahum <arieln@mellanox.com> |
IB/iser: Protect iser state machine with a mutex The iser connection state lookups and transitions are not fully protected. Some transitions are protected with a spinlock, and in some cases the state is accessed unprotected due to specific assumptions of the flow. Introduce a new mutex to protect the connection state access. We use a mutex since we need to also include a scheduling operations executed under the state lock. Each state transition/condition and its corresponding action will be protected with the state mutex. The rdma_cm events handler acquires the mutex when handling connection events. Since iser connection state can transition to DOWN concurrently during connection establishment, we bailout from addr/route resolution events when the state is not PENDING. This addresses a scenario where ep_poll retries expire during CMA connection establishment. In this case ep_disconnect is invoked while CMA events keep coming (address/route resolution, connected, etc...). Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
f1a8bf0983207bebebd13c0507cb341fbffc5ed7 |
|
31-Jul-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Remove redundant return code in iser_free_ib_conn_res() Make it void. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
0a6907588a8b9bf1aa9ee84b809a1d49caea594a |
|
31-Jul-2014 |
Ariel Nahum <arieln@mellanox.com> |
IB/iser: Seperate iser_conn and iscsi_endpoint storage space iser connection needs asynchronous cleanup completions which are triggered in ep_disconnect. As a result we are keeping the corresponding iscsi_endpoint structure hanging for no good reason. In order to avoid that, we seperate iser_conn from iscsi_endpoint storage space to have their destruction being independent. iscsi_endpoint will be destroyed at ep_disconnect stage, while the iser connection will wait for asynchronous completions to be released in an orderly fashion. Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
2ea32938f3a702d08c5cc2cc9cb8b11235eaad8c |
|
31-Jul-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Fix responder resources advertisement The iser initiator is the RDMA responder so it should publish to the target the max inflight rdma read requests its local HCA can handle in responder_resources (max_qp_rd_atom). The iser target should take the min of that and its local HCA max inflight oustanding rdma read requests (max_qp_init_rd_atom). We keep initiator_depth set to 1 in order to compat with old targets. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
9579d603502d0f24272c4dd70451d97c8d306b54 |
|
31-Jul-2014 |
Roi Dayan <roid@mellanox.com> |
IB/iser: Add TIMEWAIT_EXIT event handling In case the DISCONNECTED event is not delivered after rdma_disconnect is called, the CM waits TIMEWAIT seconds and delivers the TIMEWAIT_EXIT local event. We use this as the notification needed to continue in the teardown and release sequence. Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
96ed02d4be3c68527130decd3a7d18240da4bee5 |
|
31-Jul-2014 |
Roi Dayan <roid@mellanox.com> |
IB/iser: Support IPv6 address family Replace struct sockaddr_in with struct sockaddr which supports both IPv4 and IPv6, and print using the %pIS format directive. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
66d4e62d27875f3ae417e3b73396e290bb184b5c |
|
22-May-2014 |
Ariel Nahum <arieln@mellanox.com> |
IB/iser: Fix a possible race in iser connection states transition In some circumstances (multiple targets), RDMA_CM ESTABLISHED event and ep_disconnect may race. In this case, the iser connection state may transition to UP (after ep_disconnect transitioned it to TERMINATING), while the connection is being torn down. Upon RDMA_CM event ESTABLISHED we allow iser connection state to transition to UP only from PENDING. We also make sure to protect this state change (done under the connection lock). Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
b73c3adabdb1e2cb2f2c69bc3cbb9306aa3f9700 |
|
22-May-2014 |
Ariel Nahum <arieln@mellanox.com> |
IB/iser: Simplify connection management iSER relies on refcounting to manage iser connections establishment and teardown. Following commit 39ff05dbbbdb ("IB/iser: Enhance disconnection logic for multi-pathing"), iser connection maintain 3 references: - iscsi_endpoint (at creation stage) - cma_id (at connection request stage) - iscsi_conn (at bind stage) We can avoid taking explicit refcounts by correctly serializing iser teardown flows (graceful and non-graceful). Our approach is to trigger a scheduled work to handle ordered teardown by gracefully waiting for 2 cleanup stages to complete: 1. Cleanup of live pending tasks indicated by iscsi_conn_stop completion 2. Flush errors processing Each completed stage will notify a waiting worker thread when it is done to allow teardwon continuation. Since iSCSI connection establishment may trigger endpoint disconnect without a successful endpoint connect, we rely on the iscsi <-> iser binding (.conn_bind) to learn about the teardown policy we should take wrt cleanup stages. Since all cleanup worker threads are scheduled (release_wq) in .ep_disconnect it is safe to assume that when module_exit is called, all cleanup workers are already scheduled. Thus proper module unload shall flush all scheduled works before allowing safe exit, to guarantee no resources got left behind. Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
3ee07d27ac10450ebf5769441fd313c722c03e4a |
|
01-Apr-2014 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/iser: Update Mellanox copyright note Update Mellanox copyrights for 2014 on the iser initiator driver. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
4f9208ad3f6625f707210c00c7e7f7a0f688d019 |
|
01-Apr-2014 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/iser: Print QP information once connection is established Add an iser info print with the local/remote QP information carried out when the connection is established. While here, fix a little leftover from the T10 work and set a debug print to be carried in debug and not info level. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
4667f5dfb0c36a86339652aca694736ebc590871 |
|
01-Apr-2014 |
Ariel Nahum <arieln@mellanox.com> |
IB/iser: Remove struct iscsi_iser_conn The iscsi stack has existing mechanisms to link back and forth between the iscsi connection and the iscsi transport (e.g iser/tcp) connection. This is done through a dd_data pointer field in struct iscsi_conn which can be set to point to the transport connection, etc. The iscsi_iser_conn structure was used to get this linking done in another way, which is uneeded and adds extra complication to the iser code, so we just remove it. Signed-off-by: Ariel Nahum <arieln@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
1d6c2b736fd251c1f827704b1234d81b28c112d6 |
|
01-Apr-2014 |
Roi Dayan <roid@mellanox.com> |
IB/iser: Drain the tx cq once before looping on the rx cq The iser disconnection flow isn't done before all the inflight recv/send buffers posted to the QP are either flushed or normally completed to the CQ that serves this connection. The condition check is done in iser_handle_comp_error(). Currently, it's possible for the send buffer completion that makes the posted send buffers counter reach zero to be polled in the drain tx call, which is after the rx cq is fully drained. Since this completion might be not an error one (for example, it might be a completion of the logout request iSCSI PDU) we will skip iser_handle_comp_error(). So the connection will never terminate from the iscsi stack point of view, and we hang. To resolve this race, do the draining of the tx cq before the loop on the rx cq. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
39c978cd1704dc03f6913bd40f858def7fd31185 |
|
28-Mar-2014 |
Randy Dunlap <rdunlap@infradead.org> |
IB/iser: Fix sector_t format warning Fix pr_err (printk) format warning: drivers/infiniband/ulp/iser/iser_verbs.c:1181:4: warning: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'sector_t' [-Wformat] Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
0a7a08ad6f5f344d592fe63403f48e67395e08bf |
|
05-Mar-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Implement check_protection Once the iSCSI transaction is completed we must implement check_protection in order to notify on DIF errors that may have occured. The routine boils down to calling ib_check_mr_status to get the signature status of the transaction. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Alex Tabachnik <alext@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
6b5a8fb0d22f95fff1eefe1545aa2c7771cacc3f |
|
05-Mar-2014 |
Alex Tabachnik <alext@mellanox.com> |
IB/iser: Initialize T10-PI resources During connection establishment we also initialize T10-PI resources (QP, PI contexts) in order to support SCSI's protection operations. Signed-off-by: Alex Tabachnik <alext@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
7f73384752af3caf0756cf807b2f7fbac50982d1 |
|
05-Mar-2014 |
Alex Tabachnik <alext@mellanox.com> |
IB/iser: Introduce pi_enable, pi_guard module parameters Use modparams to activate protection information support. pi_enable bool: Based on this parameter iSER will know if it should support T10-PI. We don't want to do this by default as it requires to allocate and initialize extra resources. In case pi_enable=N, iSER won't publish to SCSI midlayer any DIF capabilities. pi_guard int: Based on this parameter iSER will publish DIX guard type support to SCSI midlayer. 0 means CRC is allowed to be passed in DIX buffers, 1 (or non-zero) means IP-CSUM is allowed to be passed in DIX buffers. Note that over the wire, only CRC is allowed. In the next phase, it is worth considering passing these parameters from iscsid via nlmsg. This will allow these parameters to be connection based rather than global. Signed-off-by: Alex Tabachnik <alext@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
73bc06b7edd8ce4ccbce7ffd28978ce16b97e5d8 |
|
05-Mar-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Replace fastreg descriptor valid bool with indicators container In T10-PI support we will have memory keys for protection buffers and signature transactions. We prefer to compact indicators rather than keeping multiple bools. This commit does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Alex Tabachnik <alext@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
65198d6b843bf43650781f71caac1266d6b407cb |
|
05-Mar-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Keep IB device attributes under iser_device For T10-PI offload support, we will need to know the device signature offload capability upon every connection establishment. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Alex Tabachnik <alext@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
310b347c6017ca5f00fa1e574c2d9c5b1088a786 |
|
05-Mar-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Move fast_reg_descriptor initialization to a function fastreg descriptor will include protection information context. In order to place the logic in one place we introduce iser_create_fr_desc function. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Alex Tabachnik <alext@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
7306b8fad467c4c3c1e3fc68b237427cac1533a7 |
|
05-Mar-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Avoid FRWR notation, use fastreg instead FRWR stands for "fast registration work request". We want to avoid calling the fastreg pool with that name, instead we name it fastreg which stands for "fast registration". This pool will include more elements in the future, so it is a good idea to generalize the name. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Alex Tabachnik <alext@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
db523b8de13545488b6ff6c952b4527596f3c16a |
|
22-Jan-2014 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Suppress completions for fast registration work requests In case iSER uses fast registration method, it should not request for successful completions on fast registration nor local invalidate requests. We color wr_id with ISER_FRWR_LI_WRID in order to correctly consume error completions. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
7d9eacf9457efc6b614665e1095336c11ad83f0d |
|
04-Feb-2014 |
Roi Dayan <roid@mellanox.com> |
IB/iser: Avoid dereferencing iscsi_iser conn object when not bound to iser connection Fix a possible NULL pointer dereference in disconnection flow. This can happen if the target disconnected/rejected the connection request, e.g before the binding stage between iscsi connection to the transport connection. Signed-off-by: Alex Tabachnik <alext@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
27ae2d1ea578dce73ab5368a6bf31c342004e709 |
|
19-Aug-2013 |
Roi Dayan <roid@mellanox.com> |
IB/iser: Fix possible memory leak in iser_create_frwr_pool() Fix leak where desc is not being freed in error flows. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
5587856c9659ac2d6ab201141aa8a5c2ff3be4cd |
|
27-Jul-2013 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Introduce fast memory registration model (FRWR) Newer HCAs and Virtual functions may not support FMRs but rather a fast registration model, which we call FRWR - "Fast Registration Work Requests". This model was introduced in 00f7ec36c ("RDMA/core: Add memory management extensions support") and works when the IB device supports the IB_DEVICE_MEM_MGT_EXTENSIONS capability. Upon creating the iser device iser will test whether the HCA supports FMRs. If no support for FMRs, check if IB_DEVICE_MEM_MGT_EXTENSIONS is supported and assign function pointers that handle fast registration and allocation of appropriate resources (fast_reg descriptors). Registration is done using posting IB_WR_FAST_REG_MR to the QP and invalidations using posting IB_WR_LOCAL_INV. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
e657571b76faf96a1de1aaf40b2111adcf76c673 |
|
27-Jul-2013 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Place the fmr pool into a union in iser's IB conn struct This is preparation step for other memory registration methods to be added. In addition, change reg/unreg routines signature to indicate they use FMRs. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
b4e155ffbbd65cba77207bc5522c7b734a5c8c9d |
|
27-Jul-2013 |
Sagi Grimberg <sagig@mellanox.com> |
IB/iser: Generalize rdma memory registration Currently the driver uses FMRs as the only means to register the memory pointed by SG provided by the SCSI mid-layer with the RDMA device. As preparation step for adding more methods for fast path memory registration, make the alloc/free and reg/unreg calls function pointers, which are for now just set to the existing FMR ones. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
b7f04513090cf12394de27588a1956d1f97188cb |
|
27-Jul-2013 |
Shlomo Pongratz <shlomop@mellanox.com> |
IB/iser: Accept session->cmds_max from user space Use cmds_max passed from user space to be the number of PDUs to be supported for the session instead of hard-coded ISCSI_DEF_XMIT_CMDS_MAX. This allow controlling the max number of SCSI commands for the session. Also don't ignore the qdepth passed from user space. Derive from session->cmds_max the actual number of RX buffers and FMR pool size to allocate during the connection bind phase. Since the iser transport connection is established before the iscsi session/connection are created and bound, we still use one hard-coded quantity ISER_DEF_XMIT_CMDS_MAX to compute the maximum number of work-requests to be supported by the RC QP used for the connection. The above quantity is made to be a power of two between ISCSI_TOTAL_CMDS_MIN (16) and ISER_DEF_XMIT_CMDS_MAX (512) inclusive. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
986db0d6c08125bdf50d8ffdc3b0307aa2871e3e |
|
27-Jul-2013 |
Shlomo Pongratz <shlomop@mellanox.com> |
IB/iser: Restructure allocation/deallocation of connection resources This is a preparation step to a patch that accepts the number of max SCSI commands to be supported a session from user space iSCSI tools. Move the allocation of the login buffer, FMR pool and its associated page vector from iser_create_ib_conn_res() (which is called prior when we actually know how many commands should be supported) to iser_alloc_rx_descriptors() (which is called during the iscsi connection bind step where this quantity is known). Also do small refactoring around the deallocation to make that path similar to the allocation one. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
28f292e879a6acf745005e75196fe8f7cc504103 |
|
08-May-2013 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/iser: Add Mellanox copyright Add Mellanox copyright to the iser initiator source code which I maintain. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
5b61ff43a774b9843402fb280fec6d700e1fe583 |
|
08-May-2013 |
Roi Dayan <roid@mellanox.com> |
IB/iser: Fix device removal flow Change the code to destroy the "last opened" rdma_cm id after making sure we released all other objects (QP, CQs, PD, etc) associated with the IB device. Since iser accesses the IB device using the rdma_cm id, we need to free any objects that are related to the device that is associated with the rdma_cm id prior to destroying that id. When this isn't done, the low level driver that created this device can be unloaded before iser has a chance to free all the objects and a such a call may invoke code segment which isn't valid any more and crash. Cc: Sean Hefty <sean.hefty@intel.com Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
8d8399deb0434df6efb3a54a0efeac677e310fc1 |
|
01-May-2013 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/iser: Add support for iser CM REQ additional info Annex A12 of the IBTA spec defines additional information that needs to be provided through the CM exchange relating to usage of ZBVA (Zero Based VAs) and Send With Invalidate over an iSER connection. Currently, the initiator sets both to not supported, but does provide the header so that existing iSER targets can be patched to start looking on the private data carried by the CM. This is a preparation step to enable iSER with HW drivers for which FMRs are not supported, such as mlx4 VF instances or new HW devices which might support only FRWR (Fast Registration Work-Requests) along the details of the IB_DEVICE_MEM_MGT_EXTENSIONS device capability. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
4f3638826124b722b2c0dde5097bca182bf74125 |
|
01-May-2013 |
Roi Dayan <roid@mellanox.com> |
IB/iser: Move informational messages from error to info level Introduce iser_info() and move informational messages that were printed as errors to use that macro. Also, cleanup printk leftovers to use the existing macros. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> [ Use pr_warn(... instead of printk(KERN_WARNING .... - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
|
5525d210fd55952262f30ba45b2acceb4a6a50e9 |
|
21-Feb-2013 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/iser: Enable iser when FMRs are not supported Reuse the "SG unaligned for FMR" driver flow to make the initiator functional when running over driver instance which doesn't support FMRs, such as a mlx4 virtual function. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Alex Tabachnik <alext@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
5a33a6694942bc86e487d00cd8feebeae5f14546 |
|
23-Sep-2012 |
Alex Tabachnik <alext@mellanox.com> |
IB/iser: Add more RX CQs to scale out processing of SCSI responses RX/TX CQs will now be selected from a per HCA pool. For the RX flow this has the effect of using different interrupt vectors when using low level drivers (such as mlx4) that map the "vector" param provided by the ULP on CQ creation to a dedicated IRQ/MSI-X vector. This allows the RX flow processing of IO responses to be distributed across multiple CPUs. QPs (--> iSER sessions) are assigned to CQs in round robin order using the CQ with the minimum number of sessions attached to it. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Alex Tabachnik <alext@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
7d9c0de4ab4ec00b8349d1af0b736a0d473671c2 |
|
29-Apr-2012 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/iser: Fix error flow in iser ep connection establishment The current error flow code was releasing the IB connection object and calling iscsi_destroy_endpoint() directly without going through the reference counting mechanism introduced in commit 39ff05d ("IB/iser: Enhance disconnection logic for multi-pathing"). This resulted in a double free of the iscsi endpoint object, which causes a kernel NULL pointer dereference. Fix that by plugging into the IB conn reference counting correctly. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
d474186f19d7ac1c7fbb293fdcfa46103e45e2ca |
|
01-Mar-2012 |
Doug Ledford <dledford@redhat.com> |
IB/iser: Free IB connection resources in the proper place We allocate the login dma buffers in iser_verbs.c as part of alloc_ib_conn_resources(), however we are freeing them in iser_initiator.c as part of iser_free_rx_descriptors(). This is needlessly confusing. We have an alloc_rx_descriptors() and it doesn't alloc something that the free_rx_descriptors() frees, and we have an alloc_ib_conn_resources() that allocs something not freed by free_ib_conn_resources(). Clean that up. Signed-off-by: Doug Ledford <dledford@redhat.com> [ Fix build error in iser_free_ib_conn_res(). - Or ] Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
2c4ce609347f2a45792c8d9ebb5af11217766cb6 |
|
03-Nov-2011 |
Or Gerlitz <ogerlitz@mellanox.com> |
IB/iser: Use separate buffers for the login request/response The driver counted on the transactional nature of iSCSI login/text flows and used the same buffer for both the request and the response. We also went further and did DMA mapping only once, with DMA_FROM_DEVICE, which violates the DMA mapping API. Fix that by using different buffers, one for requests and one for responses, and use the correct DMA mapping direction for each. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
b26f9b9949013fec31b23c426fc463164ae08891 |
|
01-Apr-2010 |
Sean Hefty <sean.hefty@intel.com> |
RDMA/cma: Pass QP type into rdma_create_id() The RDMA CM currently infers the QP type from the port space selected by the user. In the future (eg with RDMA_PS_IB or XRC), there may not be a 1-1 correspondence between port space and QP type. For netlink export of RDMA CM state, we want to export the QP type to userspace, so it is cleaner to explicitly associate a QP type to an ID. Modify rdma_create_id() to allow the user to specify the QP type, and use it to make our selections of datagram versus connected mode. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
|
9fda1ac5fa09c49e9148f85be14f55e2bb856c0f |
|
06-May-2010 |
Dan Carpenter <error27@gmail.com> |
IB/iser: Fix error flow in iser_create_ib_conn_res() We shouldn't free things here because we free them later. The call tree looks like this: iser_connect() ==> initiating the connection establishment and later iser_cma_handler() => iser_route_handler() => iser_create_ib_conn_res() if we fail here, eventually iser_conn_release() is called, resulting in a double free. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
39ff05dbbbdb082bbabf06206c56b3cd4ef73904 |
|
05-May-2010 |
Or Gerlitz <ogerlitz@voltaire.com> |
IB/iser: Enhance disconnection logic for multi-pathing The iser connection teardown flow isn't over until the underlying Connection Manager (e.g the IB CM) delivers a disconnected or timeout event through the RDMA-CM. When the remote (target) side isn't reachable, e.g when some HW e.g port/hca/switch isn't functioning or taken down administratively, the CM timeout flow is used and the event may be generated only after relatively long time -- on the order of tens of seconds. The current iser code exposes this possibly long delay to higher layers, specifically to the iscsid daemon and iscsi kernel stack. As a result, the iscsi stack doesn't respond well: this low-level CM delay is added to the fail-over time under HA schemes such as the one provided by DM multipath through the multipathd(8) service. This patch enhances the reference counting scheme on iser's IB connections so that the disconnect flow initiated by iscsid from user space (ep_disconnect) doesn't wait for the CM to deliver the disconnect/timeout event. (The connection teardown isn't done from iser's view point until the event is delivered) The iser ib (rdma) connection object is destroyed when its reference count reaches zero. When this happens on the RDMA-CM callback context, extra care is taken so that the RDMA-CM does the actual destroying of the associated ID, since doing it in the callback is prohibited. The reference count of iser ib connection normally reaches three, where the <ref, deref> relations are 1. conn <init, terminate> 2. conn <bind, stop/destroy> 3. cma id <create, disconnect/error/timeout callbacks> With this patch, multipath fail-over time is about 30 seconds, while without this patch, multipath fail-over time is about 130 seconds. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
d265b9808272c9f25e1c36d3fb5ddb466efd90e9 |
|
05-May-2010 |
Or Gerlitz <ogerlitz@voltaire.com> |
IB/iser: Remove buggy back-pointer setting The iscsi connection object life cycle includes binding and unbinding (conn_stop) to/from the iscsi transport connection object. Since iscsi connection objects are recycled, at the time the transport connection (e.g iser's IB connection) is released, it is not valid to touch the iscsi connection tied to the transport back-pointer since it may already point to a different transport connection. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
2110f9bf37511df06220bb7e977f417baecf2950 |
|
05-May-2010 |
Or Gerlitz <ogerlitz@voltaire.com> |
IB/iser: Add asynchronous event handler Add handler to handle events such as port up and down. This is useful when testing high-availability schemes such as multi-pathing. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
5a0e3ad6af8660be21ca98a971cd00f331318c05 |
|
24-Mar-2010 |
Tejun Heo <tj@kernel.org> |
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
|
aae3c995ff74a183d15207436d383942485b2edd |
|
08-Feb-2010 |
Or Gerlitz <ogerlitz@voltaire.com> |
IB/iser: Remove unnecessary connection checks Remove unnecessary checks for the IB connection state and for QP overflow, as conn state changes are reported by iSER to libiscsi and handled there. QP overflow is theoretically possible only when unsolicited data-outs are used; anyway it's being checked and handled by HW drivers. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
f19624aa92003969ba822cd3c552800965aa530b |
|
08-Feb-2010 |
Or Gerlitz <ogerlitz@voltaire.com> |
IB/iser: Simplify send flow/descriptors Simplify and shrink the logic/code used for the send descriptors. Changes include removing struct iser_dto (an unnecessary abstraction), using struct iser_regd_buf only for handling SCSI commands, using dma_sync instead of dma_map/unmap, etc. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
78ad0a34dc138047529058c5f2265664cb70a052 |
|
08-Feb-2010 |
Or Gerlitz <ogerlitz@voltaire.com> |
IB/iser: Use different CQ for send completions Use a different CQ for send completions, where send completions are polled by the interrupt-driven receive completion handler. Therefore, interrupts aren't used for the send CQ. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
704315f082d473b34047817f0a6a01924f38501e |
|
08-Feb-2010 |
Or Gerlitz <ogerlitz@voltaire.com> |
IB/iser: Remove atomic counter for posted receive buffers Now that both the posting and reaping of receive buffers is done in the completion path, the counter of outstanding buffers not be atomic. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
bcc60c381d857ced653e912cbe6121294773e147 |
|
08-Feb-2010 |
Or Gerlitz <ogerlitz@voltaire.com> |
IB/iser: New receive buffer posting logic Currently, the recv buffer posting logic is based on the transactional nature of iSER which allows for posting a buffer before sending a PDU. Change this to post only when the number of outstanding recv buffers is below a water mark and in a batched manner, thus simplifying and optimizing the data path. Use a pre-allocated ring of recv buffers instead of allocating from kmem cache. A special treatment is given to the login response buffer whose size must be 8K unlike the size of buffers used for any other purpose which is 128 bytes. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
1cef4659850eeb862c248c7670e404d7a1711ed1 |
|
08-Feb-2010 |
Or Gerlitz <ogerlitz@voltaire.com> |
IB/iser: Revert commit bba7ebb "avoid recv buffer exhaustion" We will make a major change in the recv buffer posting logic, after which the problem commit bba7ebb "avoid recv buffer exhaustion caused by unexpected PDUs" comes to solve doesn't exist any more, so revert it. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
94e2bd688820aed72b4f8092f88c2ccf64e003de |
|
16-Oct-2009 |
Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> |
tree-wide: fix some typos and punctuation in comments fix some typos and punctuation in comments Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
1aedb7721f05461f777fdee25b50d8a168c425ed |
|
27-Feb-2009 |
Or Gerlitz <ogerlitz@voltaire.com> |
IB/iser: Remove hard setting of path MTU Remove hard setting of the IB MTU used by iSER's RC queue-pair to 1K, as this was done due to inter-op issues with an old iser target which is not used any more. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
bba7ebba3b17f4fe8c5907a32e16d9bd3fcf5192 |
|
21-Dec-2008 |
David Disseldorp <ddiss@sgi.com> |
IB/iser: Avoid recv buffer exhaustion caused by unexpected PDUs iSCSI/iSER targets may send PDUs without a prior request from the initiator. RFC 5046 refers to these PDUs as "unexpected". NOP-In PDUs with itt=RESERVED and Asynchronous Message PDUs occupy this category. The amount of active "unexpected" PDU's an iSER target may have at any time is governed by the MaxOutstandingUnexpectedPDUs key, which is not yet supported. Currently when an iSER target sends an "unexpected" PDU, the initiators recv buffer consumed by the PDU is not replaced. If over initial_post_recv_bufs_num "unexpected" PDUs are received then the receive queue will run out of receive work requests entirely. This patch ensures recv buffers consumed by "unexpected" PDUs are replaced in the next iser_post_receive_control() call. Signed-off-by: David Disseldorp <ddiss@sgi.com> Signed-off-by: Ken Sandars <ksandars@sgi.com> Acked-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
63779436ab4ad0867bcea53bf853b0004d7b895d |
|
31-Oct-2008 |
Harvey Harrison <harvey.harrison@gmail.com> |
drivers: replace NIPQUAD() Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u can be replaced with %pI4 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
7a8fc9b248e77a4eab0613acf30a6811799786b3 |
|
17-Aug-2008 |
Adrian Bunk <bunk@kernel.org> |
removed unused #include <linux/version.h>'s This patch lets the files using linux/version.h match the files that #include it. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
2f5de1512884da8c74bec2c76e8f114b972ab4be |
|
22-Jul-2008 |
Or Gerlitz <ogerlitz@voltaire.com> |
IB/iser: Add support for RDMA_CM_EVENT_ADDR_CHANGE event Enhance iser to act upon notification on network stack changes that make its RDMA connection unaligned with the link used by the stack for the <src,dst> IPs used to establish the connection. When RDMA_CM_EVENT_ADDR_CHANGE arrives, just disconnect the connection, assuming that the user space iscsid daemon will reconnect, and the new connection will be aligned with the IP stack. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
f3781d2e89f12dd5afa046dc56032af6e39bd116 |
|
15-Jul-2008 |
Roland Dreier <rolandd@cisco.com> |
RDMA: Remove subversion $Id tags They don't get updated by git and so they're worse than useless. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
412eeafa0a51a8d86545d0be637bf84e4374fccf |
|
21-May-2008 |
Mike Christie <michaelc@cs.wisc.edu> |
[SCSI] iser: Modify iser to take a iscsi_endpoint struct in ep callouts and session setup This hooks iser into the iscsi endpoint code. Previously it handled the lookup and allocation. This has been made generic so bnx2i and iser can share it. It also allows us to pass iser the leading conn's ep, so we know the ib_deivce being used and can set it as the scsi_host's parent. And that allows scsi-ml to set the dma_mask based on those values. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
|
b40977d95fb3a1898ace6a7d97e4ed1a33a440a4 |
|
21-May-2008 |
Mike Christie <michaelc@cs.wisc.edu> |
[SCSI] iser: fix handling of scsi cmnds during recovery. After the stop_conn callback has returned the LLD should not touch the scsi cmds. iscsi_tcp and libiscsi use the conn->recv_lock and suspend_rx field to halt recv path processing, but iser does not have any protection. This patch modifies iser so that userspace can just call the ep_disconnect callback, which will halt all recv IO, before calling the stop_conn callback so we do not have to worry about the conn->recv_lock and suspend rx field. iser just needs to stop the send side from accessing the ib conn. Fixup to handle when the ep poll fails and ep disconnect is called from Erez. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
|
d97c51707d7d0716881be84ffd2100449852e44b |
|
17-Apr-2008 |
Erez Zilber <erezz@voltaire.com> |
IB/iser: Release connection resources on RDMA_CM_EVENT_DEVICE_REMOVAL event When a RDMA_CM_EVENT_DEVICE_REMOVAL event is raised, iSER should release the connection resources. This is necessary when the IB HCA module is unloaded while open-iscsi is still running. Currently, iSER just BUG()s. Signed-off-by: Erez Zilber <erezz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
d33ed425c6cc14370d8c418b504328d2c3db58b4 |
|
04-Mar-2008 |
Arne Redlich <arne.redlich@xiranet.com> |
IB/iser: Handle iser_device allocation error gracefully "iser_device" allocation failure is "handled" with a BUG_ON() right before dereferencing the NULL-pointer - fix this! Signed-off-by: Arne Redlich <arne.redlich@xiranet.com> Signed-off-by: Erez Zilber <erezz@voltaire.com>
|
9a378270c085080b2f38dee6308de4d8413b5141 |
|
04-Mar-2008 |
Arne Redlich <arne.redlich@xiranet.com> |
IB/iser: Fix list iteration bug The iteration through the list of "iser_device"s during device lookup/creation is broken -- it might result in an infinite loop if more than one HCA is used with iSER. Fix this by using list_for_each_entry() instead of the open-coded flawed list iteration code. Signed-off-by: Arne Redlich <arne.redlich@xiranet.com> Signed-off-by: Erez Zilber <erezz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
a4ef1451dfba92f51934e8331f634497b9ed3393 |
|
17-Jan-2008 |
Erez Zilber <erezz@voltaire.com> |
IB/iser: Print information about unhandled RDMA CM events Some RDMA CM events are not supported or not handled in iSER. This patch adds some info (printk) for the user about them. Signed-off-by: Erez Zilber <erezz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
38dc732f47948b9f91ae846806159a16aab1015f |
|
25-Jan-2008 |
Oliver Pinter <oliver.pntr@gmail.com> |
IB/iser: Typo fix (s/destory/destroy/) Signed-off-by: Oliver Pinter <oliver.pntr@gmail.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
ec2a1344ad348a789b1d9d9b32cccbef33161574 |
|
10-Oct-2007 |
Roland Dreier <rolandd@cisco.com> |
IB/iser: Remove unnecessary includes <asm/scatterlist.h> is not needed because everyplace it appears, <linux/scatterlist.h> also appears. <asm/io.h> is not needed because nothing seems to be using device IO anyway. Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
41179e2de6962b46d1d9f2b4437243ac740efdec |
|
18-Jul-2007 |
Roland Dreier <rolandd@cisco.com> |
IB/iser: Make a couple of functions static Make iser_conn_release() and iser_start_rdma_unaligned_sg() static, since they are only used in the .c file where they are defined. In addition to being a cleanup, this even shrinks the generated code by allowing the single call of iser_start_rdma_unaligned_sg() to be inlined into its callsite. On x86_64: add/remove: 0/1 grow/shrink: 1/0 up/down: 466/-533 (-67) function old new delta iser_reg_rdma_mem 1518 1984 +466 iser_start_rdma_unaligned_sg 533 - -533 Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
1548271ece9e9312fd5feb41fd58773b56a71d39 |
|
30-May-2007 |
Mike Christie <michaelc@cs.wisc.edu> |
[SCSI] libiscsi: make can_queue configurable This patch allows us to set can_queue and cmds_per_lun from userspace when we create the session/host. From there we can set it on a per target basis. The patch fully converts iscsi_tcp, but only hooks up ib_iser for cmd_per_lun since it currently has a lots of preallocations based on can_queue. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Cc: Roland Dreier <rdreier@cisco.com> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
|
e63340ae6b6205fef26b40a75673d1c9c0c8bb90 |
|
08-May-2007 |
Randy Dunlap <randy.dunlap@oracle.com> |
header cleaning: don't include smp_lock.h when not used Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
f4fd0b224d60044d2da5ca02f8f2b5150c1d8731 |
|
03-May-2007 |
Michael S. Tsirkin <mst@dev.mellanox.co.il> |
IB: Add CQ comp_vector support Add a num_comp_vectors member to struct ib_device and extend ib_create_cq() to pass in a comp_vector parameter -- this parallels the userspace libibverbs API. Update all hardware drivers to set num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector value. Pass the value of num_comp_vectors to userspace rather than hard-coding a value of 1. We want multiple CQ event vector support (via MSI-X or similar for adapters that can generate multiple interrupts), but it's not clear how many vectors we want, or how we want to deal with policy issues such as how to decide which vector to use or how to set up interrupt affinity. This patch is useful for experimenting, since no core changes will be necessary when updating a driver to support multiple vectors, and we know that we want to make at least these changes anyway. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
1d426d6418d1914b592c9c307c02e488d9182fa8 |
|
01-Apr-2007 |
Erez Zilber <erezz@voltaire.com> |
IB/iser: Don't defer connection failure notification to workqueue When a connection is terminated asynchronously from the iSCSI layer's perspective, iSER needs to notify the iSCSI layer that the connection has failed. This is done using a workqueue (switched to from the iSER tasklet context). Meanwhile, the connection object (that holds the work struct) is released. If the workqueue function wasn't called yet, it will be called later with a NULL pointer, which will crash the kernel. The context switch (tasklet to workqueue) is not required, and everything can be done from the iSER tasklet. This eliminates the NULL work struct bug (and simplifies the code). Signed-off-by: Erez Zilber <erezz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
c4028958b6ecad064b1a6303a6a5906d4fe48d73 |
|
22-Nov-2006 |
David Howells <dhowells@redhat.com> |
WorkStruct: make allyesconfig Fix up for make allyesconfig. Signed-Off-By: David Howells <dhowells@redhat.com>
|
87e8df7a273c7c1acb864c90b5253609c44375a6 |
|
27-Sep-2006 |
Erez Zilber <erezz@voltaire.com> |
IB/iser: Have iSER data transaction object point to iSER conn iSER uses a data transaction object (struct iser_dto) as part of its IB data descriptors (struct iser_desc) management. It also uses a hierarchy of connection structures pointing to each other. A DTO may exist even after the iscsi_iser connection pointed by it is destroyed (eg one that is bound to a post receive buffer which was flushed by the IB HW). Hence DTOs need point to the lowest connection, which is struct iser_conn. Signed-off-by: Erez Zilber <erezz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
d81110285f7f6c07a0ce8f99a5ff158a647cd649 |
|
10-Sep-2006 |
Erez Zilber <erezz@voltaire.com> |
IB/iser: Do not use FMR for a single dma entry sg Fast Memory Registration (fmr) is used to register for rdma an sg whose elements are not linearly sequential after dma mapping. The IB verbs layer provides an "all dma memory MR (memory region)" which can be used for RDMA-ing a dma linearly sequential buffer. Change the code to use the dma mr instead of doing fmr when dma mapping produces a single dma entry sg. Signed-off-by: Erez Zilber <erezz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
8dfa0876d3dde5f9c1818a4c35caaabc3ddba78b |
|
10-Sep-2006 |
Erez Zilber <erezz@voltaire.com> |
IB/iser: make FMR "page size" be 4K and not PAGE_SIZE As iser is able to use at most one rdma operation for the execution of a scsi command, and registration of the sg associated with scsi command has its restrictions, the code checks if an sg is "aligned for rdma". Alignment for rdma is measured in "fmr page" units whose possible resolutions are different between HCAs and can be smaller, equal or bigger to the system page size. When the system page size is bigger than 4KB (eg the default with ia64 kernels) there a bigger chance that an sg would be aligned for rdma if the fmr page size is 4KB. Change the code to create FMR whose pages are of size 4KB and to take that into account when processing the sg. Signed-off-by: Erez Zilber <erezz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|
adfaa888a292e7f38fb43668d8994f246e371f0f |
|
14-Jul-2006 |
Michael S. Tsirkin <mst@mellanox.co.il> |
[PATCH] fmr pool: remove unnecessary pointer dereference ib_fmr_pool_map_phys gets the virtual address by pointer but never writes there, and users (e.g. srp) seem to assume this and ignore the value returned. This patch cleans up the API to get the VA by value, and updates all users. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Acked-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
1cfa0a75dbef1d5bf687aacafabb023288f6b36a |
|
11-May-2006 |
Or Gerlitz <ogerlitz@voltaire.com> |
IB/iser: iSER RDMA CM (CMA) and IB verbs interaction This file contains the low level interaction with the RDMA CM and the IB verbs, where iSER is consumer of both. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
|