Blob Blame History Raw
From 0e997db8386f0ef000ba8894caa0990a2f301be7 Mon Sep 17 00:00:00 2001
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date: Fri, 29 Apr 2022 11:04:13 +0100
Subject: drm/i915: Enable THP on Icelake and beyond
Git-commit: 23dd74db02d75579d8d4eb0b88c7ad119e782269
Patch-mainline: v6.0-rc1
References: jsc#PED-1166 jsc#PED-1168 jsc#PED-1170 jsc#PED-1218 jsc#PED-1220 jsc#PED-1222 jsc#PED-1223 jsc#PED-1225 jsc#PED-2849

We have a statement from HW designers that the GPU read regression when
using 2M pages was fixed from Icelake onwards, which was also confirmed
by bencharking Eero did last year:

"""
When IOMMU is disabled, enabling THP causes following perf changes on
TGL-H (GT1):

    10-15% SynMark Batch[0-3]
    5-10% MemBW GPU texture, SynMark ShMapVsm
    3-5% SynMark TerrainFly* + Geom* + Fill* + CSCloth + Batch4
    1-3% GpuTest Triangle, SynMark TexMem* + DeferredAA + Batch[5-7]
          + few others
    -7% MemBW GPU blend

In the above 3D benchmark names, * means all the variants of tests with
the same prefix. For example "SynMark TexMem*", means both TexMem128 &
TexMem512 tests in the synthetic (Intel internal) SynMark test suite.

In the (public, but proprietary) GfxBench & GLB(enchmark) test suites,
there are both onscreen and offscreen variants of each test. Unless
explicitly stated otherwise, numbers are for both variants.

All tests are run with FullHD monitor. All tests are fullscreen except
for GLB and GpuTest ones, which are run in 1/2 screen window (GpuTest
triangle is run both in fullscreen and 1/2 screen window).
"""

Since the only regression is MemBW GPU blend, against many more gains,
it sounds it is time to enable THP on Gen11+.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
References: https://gitlab.freedesktop.org/drm/intel/-/issues/430
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220429100414.647857-1-tvrtko.ursulin@linux.intel.com
Acked-by: Patrik Jakobsson <pjakobsson@suse.de>
---
 drivers/gpu/drm/i915/gem/i915_gemfs.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gemfs.c b/drivers/gpu/drm/i915/gem/i915_gemfs.c
index ee87874e59dc..c5a6bbc842fc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gemfs.c
+++ b/drivers/gpu/drm/i915/gem/i915_gemfs.c
@@ -28,12 +28,14 @@ int i915_gemfs_init(struct drm_i915_private *i915)
 	 *
 	 * One example, although it is probably better with a per-file
 	 * control, is selecting huge page allocations ("huge=within_size").
-	 * However, we only do so to offset the overhead of iommu lookups
-	 * due to bandwidth issues (slow reads) on Broadwell+.
+	 * However, we only do so on platforms which benefit from it, or to
+	 * offset the overhead of iommu lookups, where with latter it is a net
+	 * win even on platforms which would otherwise see some performance
+	 * regressions such a slow reads issue on Broadwell and Skylake.
 	 */
 
 	opts = NULL;
-	if (i915_vtd_active(i915)) {
+	if (GRAPHICS_VER(i915) >= 11 || i915_vtd_active(i915)) {
 		if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
 			opts = huge_opt;
 			drm_info(&i915->drm,
@@ -41,7 +43,10 @@ int i915_gemfs_init(struct drm_i915_private *i915)
 				 opts);
 		} else {
 			drm_notice(&i915->drm,
-				   "Transparent Hugepage support is recommended for optimal performance when IOMMU is enabled!\n");
+				   "Transparent Hugepage support is recommended for optimal performance%s\n",
+				   GRAPHICS_VER(i915) >= 11 ?
+				   " on this platform!" :
+				   " when IOMMU is enabled!");
 		}
 	}
 
-- 
2.38.1