Blob Blame History Raw
From 8cfbdbdc24815417a3ab35101ccf706b9a23ff17 Mon Sep 17 00:00:00 2001
From: Paul Mackerras <paulus@ozlabs.org>
Date: Thu, 23 Aug 2018 10:08:58 +1000
Subject: [PATCH] KVM: PPC: Book3S: Fix guest DMA when guest partially backed
 by THP pages

References: bsc#1077761, git-fixes, bsc#1103948, bsc#1103949
Patch-mainline: v4.19-rc1
Git-commit: 8cfbdbdc24815417a3ab35101ccf706b9a23ff17

Commit 76fa4975f3ed ("KVM: PPC: Check if IOMMU page is contained in
the pinned physical page", 2018-07-17) added some checks to ensure
that guest DMA mappings don't attempt to map more than the guest is
entitled to access. However, errors in the logic mean that legitimate
guest requests to map pages for DMA are being denied in some
situations. Specifically, if the first page of the range passed to
mm_iommu_get() is mapped with a normal page, and subsequent pages are
mapped with transparent huge pages, we end up with mem->pageshift ==
0. That means that the page size checks in mm_iommu_ua_to_hpa() and
mm_iommu_up_to_hpa_rm() will always fail for every page in that
region, and thus the guest can never map any memory in that region for
DMA, typically leading to a flood of error messages like this:

  qemu-system-ppc64: VFIO_MAP_DMA: -22
  qemu-system-ppc64: vfio_dma_map(0x10005f47780, 0x800000000000000, 0x10000, 0x7fff63ff0000) = -22 (Invalid argument)

The logic errors in mm_iommu_get() are:

  (a) use of 'ua' not 'ua + (i << PAGE_SHIFT)' in the find_linux_pte()
      call (meaning that find_linux_pte() returns the pte for the
      first address in the range, not the address we are currently up
      to);
  (b) use of 'pageshift' as the variable to receive the hugepage shift
      returned by find_linux_pte() - for a normal page this gets set
      to 0, leading to us setting mem->pageshift to 0 when we conclude
      that the pte returned by find_linux_pte() didn't match the page
      we were looking at;
  (c) comparing 'compshift', which is a page order, i.e. log base 2 of
      the number of pages, with 'pageshift', which is a log base 2 of
      the number of bytes.

To fix these problems, this patch introduces 'cur_ua' to hold the
current user address and uses that in the find_linux_pte() call;
introduces 'pteshift' to hold the hugepage shift found by
find_linux_pte(); and compares 'pteshift' with 'compshift +
PAGE_SHIFT' rather than 'compshift'.

The patch also moves the local_irq_restore to the point after the PTE
pointer returned by find_linux_pte() has been dereferenced because
otherwise the PTE could change underneath us, and adds a check to
avoid doing the find_linux_pte() call once mem->pageshift has been
reduced to PAGE_SHIFT, as an optimization.

Fixes: 76fa4975f3ed ("KVM: PPC: Check if IOMMU page is contained in the pinned physical page")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Michal Suchanek <msuchanek@suse.de>
---
 arch/powerpc/mm/mmu_context_iommu.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/mm/mmu_context_iommu.c b/arch/powerpc/mm/mmu_context_iommu.c
index a4ca57612558..c9ee9e23845f 100644
--- a/arch/powerpc/mm/mmu_context_iommu.c
+++ b/arch/powerpc/mm/mmu_context_iommu.c
@@ -129,6 +129,7 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
 	long i, j, ret = 0, locked_entries = 0;
 	unsigned int pageshift;
 	unsigned long flags;
+	unsigned long cur_ua;
 	struct page *page = NULL;
 
 	mutex_lock(&mem_list_mutex);
@@ -177,7 +178,8 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
 	}
 
 	for (i = 0; i < entries; ++i) {
-		if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
+		cur_ua = ua + (i << PAGE_SHIFT);
+		if (1 != get_user_pages_fast(cur_ua,
 					1/* pages */, 1/* iswrite */, &page)) {
 			ret = -EFAULT;
 			for (j = 0; j < i; ++j)
@@ -196,7 +198,7 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
 		if (is_migrate_cma_page(page)) {
 			if (mm_iommu_move_page_from_cma(page))
 				goto populate;
-			if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
+			if (1 != get_user_pages_fast(cur_ua,
 						1/* pages */, 1/* iswrite */,
 						&page)) {
 				ret = -EFAULT;
@@ -210,20 +212,21 @@ long mm_iommu_get(struct mm_struct *mm, unsigned long ua, unsigned long entries,
 		}
 populate:
 		pageshift = PAGE_SHIFT;
-		if (PageCompound(page)) {
+		if (mem->pageshift > PAGE_SHIFT && PageCompound(page)) {
 			pte_t *pte;
 			struct page *head = compound_head(page);
 			unsigned int compshift = compound_order(head);
+			unsigned int pteshift;
 
 			local_irq_save(flags); /* disables as well */
-			pte = find_linux_pte(mm->pgd, ua, NULL, &pageshift);
-			local_irq_restore(flags);
+			pte = find_linux_pte(mm->pgd, cur_ua, NULL, &pteshift);
 
 			/* Double check it is still the same pinned page */
 			if (pte && pte_page(*pte) == head &&
-					pageshift == compshift)
-				pageshift = max_t(unsigned int, pageshift,
+			    pteshift == compshift + PAGE_SHIFT)
+				pageshift = max_t(unsigned int, pteshift,
 						PAGE_SHIFT);
+			local_irq_restore(flags);
 		}
 		mem->pageshift = min(mem->pageshift, pageshift);
 		mem->hpas[i] = page_to_pfn(page) << PAGE_SHIFT;
-- 
2.13.7