Blob Blame History Raw
From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date: Tue, 10 Apr 2018 12:27:04 +0100
Subject: drm/i915/pmu: Inspect runtime PM state more carefully while
 estimating RC6
Git-commit: e6be6bd85654dba55b97758f937c46835d961a44
Patch-mainline: v4.17-rc2
References: FATE#326289 FATE#326079 FATE#326049 FATE#322398 FATE#326166

While thinking about sporadic failures of perf_pmu/rc6-runtime-pm* tests
on some CI machines I have concluded that: a) the PMU readout of RC6 can
race against runtime PM transitions, and b) there are other reasons than
being runtime suspended which can cause intel_runtime_pm_get_if_in_use to
fail.

Therefore when estimating RC6 the code needs to assert we are indeed in
suspended state, and if not, the best we can do is return the last known
RC6 value.

Without this check we can calculate the estimated value based on un-
initialized or inappropriate internal state, which can result in over-
estimation, or in any case incorrect value being returned.

v2:
 * Re-arrange the code a bit to avoid second unlock and return branch.
   (Chris Wilson)

v3:
 * Insert some strategic blank lines and improve commit msg.
   (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 1fe699e30113 ("drm/i915/pmu: Fix sleep under atomic in RC6 readout")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105010
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180410112704.24462-1-tvrtko.ursulin@linux.intel.com
(cherry picked from commit 2924bdee21edd6785a4df1b4d17fd3cb265fddd9)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Petr Tesarik <ptesarik@suse.com>
---
 drivers/gpu/drm/i915/i915_pmu.c |   41 ++++++++++++++++++++++++++++------------
 1 file changed, 29 insertions(+), 12 deletions(-)

--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -473,20 +473,37 @@ static u64 get_rc6(struct drm_i915_priva
 		spin_lock_irqsave(&i915->pmu.lock, flags);
 		spin_lock(&kdev->power.lock);
 
-		if (!i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur)
-			i915->pmu.suspended_jiffies_last =
-						kdev->power.suspended_jiffies;
-
-		val = kdev->power.suspended_jiffies -
-		      i915->pmu.suspended_jiffies_last;
-		val += jiffies - kdev->power.accounting_timestamp;
+		/*
+		 * After the above branch intel_runtime_pm_get_if_in_use failed
+		 * to get the runtime PM reference we cannot assume we are in
+		 * runtime suspend since we can either: a) race with coming out
+		 * of it before we took the power.lock, or b) there are other
+		 * states than suspended which can bring us here.
+		 *
+		 * We need to double-check that we are indeed currently runtime
+		 * suspended and if not we cannot do better than report the last
+		 * known RC6 value.
+		 */
+		if (kdev->power.runtime_status == RPM_SUSPENDED) {
+			if (!i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur)
+				i915->pmu.suspended_jiffies_last =
+						  kdev->power.suspended_jiffies;
+
+			val = kdev->power.suspended_jiffies -
+			      i915->pmu.suspended_jiffies_last;
+			val += jiffies - kdev->power.accounting_timestamp;
+
+			val = jiffies_to_nsecs(val);
+			val += i915->pmu.sample[__I915_SAMPLE_RC6].cur;
+
+			i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur = val;
+		} else if (i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur) {
+			val = i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur;
+		} else {
+			val = i915->pmu.sample[__I915_SAMPLE_RC6].cur;
+		}
 
 		spin_unlock(&kdev->power.lock);
-
-		val = jiffies_to_nsecs(val);
-		val += i915->pmu.sample[__I915_SAMPLE_RC6].cur;
-		i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur = val;
-
 		spin_unlock_irqrestore(&i915->pmu.lock, flags);
 	}