History log of /drivers/vhost/vhost.h
Revision Date Author Comments
47283bef7ed356629467d1fac61687756e48f254 05-Jun-2014 Michael S. Tsirkin <mst@redhat.com> vhost: move memory pointer to VQs

commit 2ae76693b8bcabf370b981cd00c36cd41d33fabc
vhost: replace rcu with mutex
replaced rcu sync for memory accesses with VQ mutex locl/unlock.
This is correct since all accesses are under VQ mutex, but incomplete:
we still do useless rcu lock/unlock operations, someone might copy this
code into some other context where this won't be right.
This use of RCU is also non standard and hard to understand.
Let's copy the pointer to each VQ structure, this way
the access rules become straight-forward, and there's
no need for RCU anymore.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
ea16c51433510f7f758382dec5b933fc0797f244 05-Jun-2014 Michael S. Tsirkin <mst@redhat.com> vhost: move acked_features to VQs

Refactor code to make sure features are only accessed
under VQ mutex. This makes everything simpler, no need
for RCU here anymore.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
59566b6e8cc8f8a05022e0f01a1b90974003f092 06-Dec-2013 Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> vhost: remove the dead branch

Since vhost_dev_init() forever return 0, some branches are never run,
therefore need to be removed.

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
22fa90c7fb479694d6affebc049d21f06b714be6 07-May-2013 Asias He <asias@redhat.com> vhost: Remove custom vhost rcu usage

Now, vq->private_data is always accessed under vq mutex. No need to play
the vhost rcu trick.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
6ac1afbf6132df0fcb0898f3509371305af7de16 06-May-2013 Asias He <asias@redhat.com> vhost: Make vhost a separate module

Currently, vhost-net and vhost-scsi are sharing the vhost core code.
However, vhost-scsi shares the code by including the vhost.c file

Making vhost a separate module makes it is easier to share code with
other vhost devices.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
05c05351943cc03bf5c77e86953b24ae6fb21368 06-Jun-2013 Michael S. Tsirkin <mst@redhat.com> vhost: check owner before we overwrite ubuf_info

If device has an owner, we shouldn't touch ubuf_info
since it might be in use.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e40ab7484f8e9cc9f5e59ae53214800e50f5615e 06-May-2013 Asias He <asias@redhat.com> vhost: Remove vhost_enable_zcopy in vhost.h

It is net.c specific.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
ab00c42a567b21024383db3fac98118497d6270b 06-May-2013 Asias He <asias@redhat.com> vhost: Remove comments for hdr in vhost.h

It is supposed to be removed when hdr is moved into vhost_net_virtqueue.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
8570a6e72c5bfb1aee8225b8c4814cdc28417a20 06-May-2013 Asias He <asias@redhat.com> vhost: Move VHOST_NET_FEATURES to net.c

vhost.h should not depend on device specific marcos like

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
54db63c2ca153ebf0c868cdf79f52ec9e701d38c 06-May-2013 Asias He <asias@redhat.com> vhost: Export vhost_dev_set_owner

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
150b9e51ae975ca1fe468c565870fbc4a96e0574 28-Apr-2013 Michael S. Tsirkin <mst@redhat.com> vhost: fix error handling in RESET_OWNER ioctl

RESET_OWNER ioctl would leave the fd in a bad state if
memory allocation failed: device is stopped
but owner is not reset. Make state changes
after allocating memory, such that a failed
ioctl has no effect.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
81f95a55802be669b3191b2828c34006d0f04214 28-Apr-2013 Michael S. Tsirkin <mst@redhat.com> vhost: move per-vq net specific fields out to net

This will remove the need for vhost scsi to pull
in virtio-net.h.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2839400f8fe28ce216eeeba3fb97bdf90977f7ad 27-Apr-2013 Asias He <asias@redhat.com> vhost: move vhost-net zerocopy fields to net.c

On top of 'vhost: Allow device specific fields per vq', we can move device
specific fields to device virt queue from vhost virt queue.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
3ab2e420ec1caf4ead233f3161ac7d86fe5d2a9f 27-Apr-2013 Asias He <asias@redhat.com> vhost: Allow device specific fields per vq

This is useful for any device who wants device specific fields per vq.
For example, tcm_vhost wants a per vq field to track requests which are
in flight on the vq. Also, on top of this we can add patches to move
things like ubufs from vhost.h out to net.c.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2b8b328b61c799957a456a5a8dab8cc7dea68575 28-Jan-2013 Jason Wang <jasowang@redhat.com> vhost_net: handle polling errors when setting backend

Currently, the polling errors were ignored, which can lead following issues:

- vhost remove itself unconditionally from waitqueue when stopping the poll,
this may crash the kernel since the previous attempt of starting may fail to
add itself to the waitqueue
- userspace may think the backend were successfully set even when the polling

Solve this by:

- check poll->wqh before trying to remove from waitqueue
- report polling errors in vhost_poll_start(), tx_poll_start(), the return value
will be checked and returned when userspace want to set the backend

After this fix, there still could be a polling failure after backend is set, it
will addressed by the next patch.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
935cdee7ee159569b0aaa10bd9244660f6672b08 06-Dec-2012 Michael S. Tsirkin <mst@redhat.com> vhost: avoid backend flush on vring ops

vring changes already do a flush internally where appropriate, so we do
not need a second flush.

It's currently not very expensive but a follow-up patch makes flush more
heavy-weight, so remove the extra flush here to avoid regressing
performance if call or kick fds are changed on data path.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
b211616d712551874db3ce0fb44196f6faad2c34 01-Nov-2012 Michael S. Tsirkin <mst@redhat.com> vhost: move -net specific code out

Zerocopy handling code is vhost-net specific.
Move it from vhost.c/vhost.h out to net.c

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c4fcb586c337f8b8de3a3d7ba8514eed03695f6e 01-Nov-2012 Michael S. Tsirkin <mst@redhat.com> vhost: track zero copy failures using DMA length

This will be used to disable zerocopy when error rate
is high.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
70e4cb9aafb3caa9e5eb25614a5bc40b0ffa9ffd 01-Nov-2012 Michael S. Tsirkin <mst@redhat.com> vhost-net: cleanup macros for DMA status tracking

Better document macros for DMA tracking. Add an
explicit one for DMA in progress instead of
relying on user supplying len != 1.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
e19d6763cc300fcb706bd291b24ac06be71e1ce6 01-Nov-2012 Michael S. Tsirkin <mst@redhat.com> skb: report completion status for zero copy skbs

Even if skb is marked for zero copy, net core might still decide
to copy it later which is somewhat slower than a copy in user context:
besides copying the data we need to pin/unpin the pages.

Add a parameter reporting such cases through zero copy callback:
if this happens a lot, device can take this into account
and switch to copying in user context.

This patch updates all users but ignores the passed value for now:
it will be used by follow-up patches.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
163049aefdc04323a2d17ec9f2862027b43b0502 21-Jul-2012 Stefan Hajnoczi <stefanha@gmail.com> vhost: make vhost work queue visible

The vhost work queue allows processing to be done in vhost worker thread
context, which uses the owner process mm. Access to the vring and guest
memory is typically only possible from vhost worker context so it is
useful to allow work to be queued directly by users.

Currently vhost_net only uses the poll wrappers which do not expose the
work queue functions. However, for tcm_vhost (vhost_scsi) it will be
necessary to queue custom work.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
0dd05a3b60ddf0e216fbd9e5116a5273966fd9db 21-Jul-2012 Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> vhost: Separate vhost-net features from vhost features

In order for other vhost devices to use the VHOST_FEATURES bits the
vhost-net specific bits need to be moved to their own VHOST_NET_FEATURES

(Asias: Update drivers/vhost/test.c to use VHOST_NET_FEATURES)

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Zhi Yong Wu <wuzhy@cn.ibm.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Asias He <asias@redhat.com>
Signed-off-by: Nicholas A. Bellinger <nab@risingtidesystems.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
ca8f4fb21d08747013cce9cf1840aa5bfc31f2d8 09-Apr-2012 Michael S. Tsirkin <mst@redhat.com> skbuff: struct ubuf_info callback type safety

The skb struct ubuf_info callback gets passed struct ubuf_info
itself, not the arg value as the field name and the function signature
seem to imply. Rename the arg field to ctx to match usage,
add documentation and change the callback argument type
to make usage clear and to have compiler check correctness.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ea5d404655ba3b356d0c06d6a3c4f24112124522 27-Nov-2011 Michael S. Tsirkin <mst@redhat.com> vhost: fix release path lockdep checks

We shouldn't hold any locks on release path. Pass a flag to
vhost_dev_cleanup to use the lockdep info correctly.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
60063497a95e716c9a689af3be2687d261f115b4 27-Jul-2011 Arun Sharma <asharma@fb.com> atomic: use <linux/atomic.h>

This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>

Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
f59281dafb832b161133743fcf3dc29051e6fdb8 21-Jun-2011 Jason Wang <jasowang@redhat.com> vhost: init used ring after backend was set

Move the used ring initialization after backend was set. This
makes it possible to disable the backend and tweak the used ring,
then restart. This will also make it possible to log the used ring
write correctly.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
bab632d69ee48a106e779b60cc01adfe80a72807 18-Jul-2011 Michael S. Tsirkin <mst@redhat.com> vhost: vhost TX zero-copy support

>From: Shirley Ma <mashirle@us.ibm.com>

This adds experimental zero copy support in vhost-net,
disabled by default. To enable, set
experimental_zcopytx module option to 1.

This patch maintains the outstanding userspace buffers in the
sequence it is delivered to vhost. The outstanding userspace buffers
will be marked as done once the lower device buffers DMA has finished.
This is monitored through last reference of kfree_skb callback. Two
buffer indices are used for this purpose.

The vhost-net device passes the userspace buffers info to lower device
skb through message control. DMA done status check and guest
notification are handled by handle_tx: in the worst case is all buffers
in the vq are in pending/done status, so we need to notify guest to
release DMA done buffers first before we get any new buffers from the

One known problem is that if the guest stops submitting
buffers, buffers might never get used until some
further action, e.g. device reset. This does not
seem to affect linux guests.

Signed-off-by: Shirley <xma@us.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8ea8cf89e19aeb596b818ee5f2bec8a8b0586b60 20-May-2011 Michael S. Tsirkin <mst@redhat.com> vhost: support event index

Support the new event index feature. When acked,
utilize it to reduce the # of interrupts sent to the guest.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
5e18247b02d60a1ea4bf98c05e139461ca9aec64 18-Jan-2011 Michael S. Tsirkin <mst@redhat.com> vhost: rcu annotation fixup

When built with rcu checks enabled, vhost triggers
bogus warnings as vhost features are read without
dev->mutex sometimes, and private pointer is read
with our kind of rcu where work serves as a
read side critical section.

Fixing it properly is not trivial.
Disable the warnings by stubbing out the checks for now.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
a290aec88a9c4747353ea7aa9b2569bd61297c3c 29-Nov-2010 Jason Wang <jasowang@redhat.com> vhost: fix typos in comment

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
e0e9b406470b8dd75a115cf82c4791f41615c4c3 14-Sep-2010 Jason Wang <jasowang@redhat.com> vhost: max s/g to match qemu

Qemu supports up to UIO_MAXIOV s/g so we have to match that because guest
drivers may rely on this.

Allocate indirect and log arrays dynamically to avoid using too much contigious
memory and make the length of hdr array to match the header length since each
iovec entry has a least one byte.

Test with copying large files w/ and w/o migration in both linux and windows

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
28457ee69c473a903e51e26c7bcd6f1e9eceb93e 09-Mar-2010 Arnd Bergmann <arnd@relay.de.ibm.com> vhost: add __rcu annotations

Also add rcu_dereference_protected() for code paths where locks are held.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
8dd014adfea6f173c1ef6378f7e5e7924866c923 27-Jul-2010 David Stevens <dlstevens@us.ibm.com> vhost-net: mergeable buffers support

This adds support for mergeable buffers in vhost-net: this is needed
for older guests without indirect buffer support, as well
as for zero copy with some devices.

Includes changes by Michael S. Tsirkin to make the
patch as low risk as possible (i.e., close to no changes
when feature is disabled).

Signed-off-by: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
c23f3445e68e1db0e74099f264bc5ff5d55ebdeb 02-Jun-2010 Tejun Heo <tj@kernel.org> vhost: replace vhost_workqueue with per-vhost kthread

Replace vhost_workqueue with per-vhost kthread. Other than callback
argument change from struct work_struct * to struct vhost_work *,
there's no visible change to vhost_poll_*() interface.

This conversion is to make each vhost use a dedicated kthread so that
resource control via cgroup can be applied.

Partially based on Sridhar Samudrala's patch.

* Updated to use sub structure vhost_work instead of directly using
vhost_poll at Michael's suggestion.

* Added flusher wake_up() optimization at Michael's suggestion.

Changes by MST:
* Converted atomics/barrier use to a spinlock.
* Create thread on SET_OWNER
* Fix flushing

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Sridhar Samudrala <samudrala.sridhar@gmail.com>
d5675bd204efd87a174eeea592de23c4c4e7f908 24-Jun-2010 Michael S. Tsirkin <mst@redhat.com> vhost: break out of polling loop on error

When ring parsing fails, we currently handle this
as ring empty condition. This means that we enable
kicks and recheck ring empty: if this not empty,
we re-start polling which of course will fail again.

Instead, let's return a negative error code and stop polling.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
3a4d5c94e959359ece6d6b55045c3f046677f55c 14-Jan-2010 Michael S. Tsirkin <mst@redhat.com> vhost_net: a kernel-level virtio server

What it is: vhost net is a character device that can be used to reduce
the number of system calls involved in virtio networking.
Existing virtio net code is used in the guest without modification.

There's similarity with vringfd, with some differences and reduced scope
- uses eventfd for signalling
- structures can be moved around in memory at any time (good for
migration, bug work-arounds in userspace)
- write logging is supported (good for migration)
- support memory table and not just an offset (needed for kvm)

common virtio related code has been put in a separate file vhost.c and
can be made into a separate module if/when more backends appear. I used
Rusty's lguest.c as the source for developing this part : this supplied
me with witty comments I wouldn't be able to write myself.

What it is not: vhost net is not a bus, and not a generic new system
call. No assumptions are made on how guest performs hypercalls.
Userspace hypervisors are supported as well as kvm.

How it works: Basically, we connect virtio frontend (configured by
userspace) to a backend. The backend could be a network device, or a tap
device. Backend is also configured by userspace, including vlan/mac

Status: This works for me, and I haven't see any crashes.
Compared to userspace, people reported improved latency (as I save up to
4 system calls per packet), as well as better bandwidth and CPU

Features that I plan to look at in the future:
- mergeable buffers
- zero copy
- scalability tuning: figure out the best threading model to use

Note on RCU usage (this is also documented in vhost.h, near
private_pointer which is the value protected by this variant of RCU):
what is happening is that the rcu_dereference() is being used in a
workqueue item. The role of rcu_read_lock() is taken on by the start of
execution of the workqueue item, of rcu_read_unlock() by the end of
execution of the workqueue item, and of synchronize_rcu() by
flush_workqueue()/flush_work(). In the future we might need to apply
some gcc attribute or sparse annotation to the function passed to
INIT_WORK(). Paul's ack below is for this RCU usage.

(Includes fixes by Alan Cox <alan@linux.intel.com>,
David L Stevens <dlstevens@us.ibm.com>,
Chris Wright <chrisw@redhat.com>)

Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>