Cross Reference: /drivers/vfio/vfio_iommu

History log of /drivers/vfio/vfio_iommu_type1.c
Revision	Date	Author	Comments
f5c9ecebaf2a2c9381973798e389cc019dd983e0	29-Sep-2014	Will Deacon <will.deacon@arm.com>	vfio/iommu_type1: add new VFIO_TYPE1_NESTING_IOMMU IOMMU type VFIO allows devices to be safely handed off to userspace by putting them behind an IOMMU configured to ensure DMA and interrupt isolation. This enables userspace KVM clients, such as kvmtool and qemu, to further map the device into a virtual machine. With IOMMUs such as the ARM SMMU, it is then possible to provide SMMU translation services to the guest operating system, which are nested with the existing translation installed by VFIO. However, enabling this feature means that the IOMMU driver must be informed that the VFIO domain is being created for the purposes of nested translation. This patch adds a new IOMMU type (VFIO_TYPE1_NESTING_IOMMU) to the VFIO type-1 driver. The new IOMMU type acts identically to the VFIO_TYPE1v2_IOMMU type, but additionally sets the DOMAIN_ATTR_NESTING attribute on its IOMMU domains. Cc: Joerg Roedel <joro@8bytes.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
eb165f0584d2c073dd343bdc609e2f94d143037e	05-Sep-2014	Joerg Roedel <jroedel@suse.de>	vfio: Convert to use new iommu_capable() API function Cc: Alex Williamson <alex.williamson@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
c8dbca165bb090f926996a572ea2b5b577b34b70	30-May-2014	Alex Williamson <alex.williamson@redhat.com>	vfio/iommu_type1: Avoid overflow Coverity reports use of a tained scalar used as a loop boundary. For the most part, any values passed from userspace for a DMA mapping size, IOVA, or virtual address are valid, with some alignment constraints. The size is ultimately bound by how many pages the user is able to lock, IOVA is tested by the IOMMU driver when doing a map, and the virtual address needs to pass get_user_pages. The only problem I can find is that we do expect the __u64 user values to fit within our variables, which might not happen on 32bit platforms. Add a test for this and return error on overflow. Also propagate use of the type-correct local variables throughout the function. The above also points to the 'end' variable, which can be zero if we're operating at the very top of the address space. We try to account for this, but our loop botches it. Rework the loop to use the remaining size as our loop condition rather than the IOVA vs end. Detected by Coverity: CID 714659 Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
668f9abbd4334e6c29fa8acd71635c4f9101caa7	04-Mar-2014	David Rientjes <rientjes@google.com>	mm: close PageTail race Commit bf6bddf1924e ("mm: introduce compaction and migration for ballooned pages") introduces page_count(page) into memory compaction which dereferences page->first_page if PageTail(page). This results in a very rare NULL pointer dereference on the aforementioned page_count(page). Indeed, anything that does compound_head(), including page_count() is susceptible to racing with prep_compound_page() and seeing a NULL or dangling page->first_page pointer. This patch uses Andrea's implementation of compound_trans_head() that deals with such a race and makes it the default compound_head() implementation. This includes a read memory barrier that ensures that if PageTail(head) is true that we return a head page that is neither NULL nor dangling. The patch then adds a store memory barrier to prep_compound_page() to ensure page->first_page is set. This is the safest way to ensure we see the head page that we are expecting, PageTail(page) is already in the unlikely() path and the memory barriers are unfortunately required. Hugetlbfs is the exception, we don't enforce a store memory barrier during init since no race is possible. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Holger Kiehl <Holger.Kiehl@dwd.de> Cc: Christoph Lameter <cl@linux.com> Cc: Rafael Aquini <aquini@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
aa429318279b90192f35a97e9ccdc1e83b3a9624	26-Feb-2014	Alex Williamson <alex.williamson@redhat.com>	vfio/type1: Add extension to test DMA cache coherence of IOMMU Now that the type1 IOMMU backend can support IOMMU_CACHE, we need to be able to test whether coherency is currently enforced. Add an extension for this. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
1ef3e2bc04223ff956dc62abaf2dff1f3322a431	26-Feb-2014	Alex Williamson <alex.williamson@redhat.com>	vfio/iommu_type1: Multi-IOMMU domain support We currently have a problem that we cannot support advanced features of an IOMMU domain (ex. IOMMU_CACHE), because we have no guarantee that those features will be supported by all of the hardware units involved with the domain over its lifetime. For instance, the Intel VT-d architecture does not require that all DRHDs support snoop control. If we create a domain based on a device behind a DRHD that does support snoop control and enable SNP support via the IOMMU_CACHE mapping option, we cannot then add a device behind a DRHD which does not support snoop control or we'll get reserved bit faults from the SNP bit in the pagetables. To add to the complexity, we can't know the properties of a domain until a device is attached. We could pass this problem off to userspace and require that a separate vfio container be used, but we don't know how to handle page accounting in that case. How do we know that a page pinned in one container is the same page as a different container and avoid double billing the user for the page. The solution is therefore to support multiple IOMMU domains per container. In the majority of cases, only one domain will be required since hardware is typically consistent within a system. However, this provides us the ability to validate compatibility of domains and support mixed environments where page table flags can be different between domains. To do this, our DMA tracking needs to change. We currently try to coalesce user mappings into as few tracking entries as possible. The problem then becomes that we lose granularity of user mappings. We've never guaranteed that a user is able to unmap at a finer granularity than the original mapping, but we must honor the granularity of the original mapping. This coalescing code is therefore removed, allowing only unmaps covering complete maps. The change in accounting is fairly small here, a typical QEMU VM will start out with roughly a dozen entries, so it's arguable if this coalescing was ever needed. We also move IOMMU domain creation to the point where a group is attached to the container. An interesting side-effect of this is that we now have access to the device at the time of domain creation and can probe the devices within the group to determine the bus_type. This finally makes vfio_iommu_type1 completely device/bus agnostic. In fact, each IOMMU domain can host devices on different buses managed by different physical IOMMUs, and present a single DMA mapping interface to the user. When a new domain is created, mappings are replayed to bring the IOMMU pagetables up to the state of the current container. And of course, DMA mapping and unmapping automatically traverse all of the configured IOMMU domains. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: Varun Sethi <Varun.Sethi@freescale.com>
d93b3ac0edb85b1c1e8fe0a065c1e5045783e2f6	11-Oct-2013	Antonios Motakis <a.motakis@virtualopensystems.com>	VFIO: vfio_iommu_type1: fix bug caused by break in nested loop In vfio_iommu_type1.c there is a bug in vfio_dma_do_map, when checking that pages are not already mapped. Since the check is being done in a for loop nested within the main loop, breaking out of it does not create the intended behavior. If the underlying IOMMU driver returns a non-NULL value, this will be ignored and mapping the DMA range will be attempted anyway, leading to unpredictable behavior. This interracts badly with the ARM SMMU driver issue fixed in the patch that was submitted with the title: "[PATCH 2/2] ARM: SMMU: return NULL on error in arm_smmu_iova_to_phys" Both fixes are required in order to use the vfio_iommu_type1 driver with an ARM SMMU. This patch refactors the function slightly, in order to also make this kind of bug less likely. Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
8d38ef1948bd415a5cb653a5c0ec16f3402aaca1	01-Jul-2013	Alex Williamson <alex.williamson@redhat.com>	vfio/type1: Fix leak on error path We also don't handle unpinning zero pages as an error on other exits so we can fix that inconsistency by rolling in the next conditional return. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
f5bfdbf252ad54aa4b6f765cbe89750087fcc155	26-Jun-2013	Alex Williamson <alex.williamson@redhat.com>	vfio/type1: Fix missed frees and zero sized removes With hugepage support we can only properly aligned and sized ranges. We only guarantee that we can unmap the same ranges mapped and not arbitrary sub-ranges. This means we might not free anything or might free more than requested. The vfio unmap interface started storing the unmapped size to return to userspace to handle this. This patch fixes a few places where we don't properly handle those cases, moves a memory allocation to a place where failure is an option and checks our loops to make sure we don't get into an infinite loop trying to remove an overlap. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
5c6c2b21ecc9adfc47cfaf93404937b6ecd8395d	21-Jun-2013	Alex Williamson <alex.williamson@redhat.com>	vfio: Provide module option to disable vfio_iommu_type1 hugepage support Add a module option to vfio_iommu_type1 to disable IOMMU hugepage support. This causes iommu_map to only be called with single page mappings, disabling the IOMMU driver's ability to use hugepages. This option can be enabled by loading vfio_iommu_type1 with disable_hugepages=1 or dynamically through sysfs. If enabled dynamically, only new mappings are restricted. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
166fd7d94afdac040b28c473e45241820ca522a2	21-Jun-2013	Alex Williamson <alex.williamson@redhat.com>	vfio: hugepage support for vfio_iommu_type1 We currently send all mappings to the iommu in PAGE_SIZE chunks, which prevents the iommu from enabling support for larger page sizes. We still need to pin pages, which means we step through them in PAGE_SIZE chunks, but we can batch up contiguous physical memory chunks to allow the iommu the opportunity to use larger pages. The approach here is a bit different that the one currently used for legacy KVM device assignment. Rather than looking at the vma page size and using that as the maximum size to pass to the iommu, we instead simply look at whether the next page is physically contiguous. This means we might ask the iommu to map a 4MB region, while legacy KVM might limit itself to a maximum of 2MB. Splitting our mapping path also allows us to be smarter about locked memory because we can more easily unwind if the user attempts to exceed the limit. Therefore, rather than assuming that a mapping will result in locked memory, we test each page as it is pinned to determine whether it locks RAM vs an mmap'd MMIO region. This should result in better locking granularity and less locked page fudge factors in userspace. The unmap path uses the same algorithm as legacy KVM. We don't want to track the pfn for each mapping ourselves, but we need the pfn in order to unpin pages. We therefore ask the iommu for the iova to physical address translation, ask it to unpin a page, and see how many pages were actually unpinned. iommus supporting large pages will often return something bigger than a page here, which we know will be physically contiguous and we can unpin a batch of pfns. iommus not supporting large mappings won't see an improvement in batching here as they only unmap a page at a time. With this change, we also make a clarification to the API for mapping and unmapping DMA. We can only guarantee unmaps at the same granularity as used for the original mapping. In other words, unmapping a subregion of a previous mapping is not guaranteed and may result in a larger or smaller unmapping than requested. The size field in the unmapping structure is updated to reflect this. Previously this was unmodified on mapping, always returning the the requested unmap size. This is now updated to return the actual unmap size on success, allowing userspace to appropriately track mappings. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
cd9b22685e4ccd728550d51fbe108c473f89df4f	21-Jun-2013	Alex Williamson <alex.williamson@redhat.com>	vfio: Convert type1 iommu to use rbtree We need to keep track of all the DMA mappings of an iommu container so that it can be automatically unmapped when the user releases the file descriptor. We currently do this using a simple list, where we merge entries with contiguous iovas and virtual addresses. Using a tree for this is a bit more efficient and allows us to use common code instead of inventing our own. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
73fa0d10d077d9521ee2dace2307ae2c9a965336	31-Jul-2012	Alex Williamson <alex.williamson@redhat.com>	vfio: Type1 IOMMU implementation This VFIO IOMMU backend is designed primarily for AMD-Vi and Intel VT-d hardware, but is potentially usable by anything supporting similar mapping functionality. We arbitrarily call this a Type1 backend for lack of a better name. This backend has no IOVA or host memory mapping restrictions for the user and is optimized for relatively static mappings. Mapped areas are pinned into system memory. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>