From 7fcc17d0cb12938d2b3507973a6f93fc9ed2c7a1 Mon Sep 17 00:00:00 2001
From: Lukasz Luba <lukasz.luba@arm.com>
Date: Tue, 3 Aug 2021 11:27:43 +0100
Subject: [PATCH] PM: EM: Increase energy calculation precision
Git-commit: 7fcc17d0cb12938d2b3507973a6f93fc9ed2c7a1
Patch-mainline: v5.15-rc1
References: git-fixes stable-5.14.4

The Energy Model (EM) provides useful information about device power in
each performance state to other subsystems, such as the Energy Aware
Scheduler (EAS). The energy calculation in EAS is based on the EM function
em_cpu_energy(). The current implementation of that function uses
em_perf_state::cost as a pre-computed cost coefficient equal to:
cost = power * max_frequency / frequency.
The 'power' is expressed in milli-Watts (or in an abstract scale).
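As a rough illustration of the coefficient above, the sketch below fills in
cost for each entry of a small table using integer division. The names
perf_state_sketch and compute_costs() and the 'scale' parameter are
hypothetical and are not taken from the kernel sources; 'scale' is assumed
to be 1 for milli-Watt resolution and 1000 for the finer resolution this
patch is about.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a performance state; not the kernel's struct. */
struct perf_state_sketch {
	uint64_t frequency;	/* e.g. kHz */
	uint64_t power;		/* milli-Watts or abstract scale */
	uint64_t cost;		/* filled in by compute_costs() */
};

/*
 * cost = power * max_frequency / frequency, with integer division.
 * The table is assumed to be sorted by ascending frequency, so the last
 * entry holds max_frequency. 'scale' multiplies power before the division
 * (assumption: 1 keeps the original resolution, 1000 increases it).
 */
static void compute_costs(struct perf_state_sketch *table, size_t nr_states,
			  uint64_t scale)
{
	uint64_t fmax = table[nr_states - 1].frequency;

	for (size_t i = 0; i < nr_states; i++)
		table[i].cost = fmax * (table[i].power * scale) /
				table[i].frequency;
}
```

With made-up values of power = 100 and frequency = 700000 under a
max_frequency of 1500000, a scale of 1 gives a cost of 214, while a scale
of 1000 gives 214285, keeping digits that the first division discards.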

There are corner cases where the EAS energy calculations for two
Performance Domains (PDs) return the same value. The EAS compares these
values to choose the smaller one. It might happen that these values are
equal due to rounding error. In such a scenario, we need better resolution,
e.g. 1000 times better. To make this possible, increase the resolution of
em_perf_state::cost for 64-bit architectures. The cost of increasing the
resolution on 32-bit is pretty high (64-bit division) and is not justified,
since no new 32-bit big.LITTLE EAS systems are expected that would benefit
from this higher resolution.

This patch avoids the rounding-to-milli-Watt errors which might occur in
the EAS energy estimation for each PD. The rounding error is common for
small tasks, which have small utilization values.

There are two places in the code where it makes a difference:
1. In find_energy_efficient_cpu(), where we search for best_delta. We
might suffer there when two PDs return the same result, like in the
example below.

Scenario:
A lightly utilized system, e.g. ~200 sum_util for PD0 and ~220 for PD1.
There are quite a few small tasks with ~10-15 util. These tasks would
suffer from the rounding error. Such utilization values are typical when
running games on Android. One of our partners has reported 5..10mA less
battery drain when running with increased resolution.

Some details:
We have two PDs: PD0 (big) and PD1 (little)
Let's compare w/o patch set ('old') and w/ patch set ('new')
We are comparing energy w/ task and w/o task placed in the PDs

a) 'old' w/o patch set, PD0
task_util = 13
cost = 480
sum_util_w/o_task = 215
sum_util_w_task = 228
scale_cpu = 1024
energy_w/o_task = 480 * 215 / 1024 = 100.78 => 100
energy_w_task = 480 * 228 / 1024 = 106.87 => 106
energy_diff = 106 - 100 = 6
(this is equal to 'old' PD1's energy_diff in 'c)')

b) 'new' w/ patch set, PD0
task_util = 13
cost = 480 * 1000 = 480000
sum_util_w/o_task = 215
sum_util_w_task = 228
scale_cpu = 1024
energy_w/o_task = 480000 * 215 / 1024 = 100781
energy_w_task = 480000 * 228 / 1024 = 106875
energy_diff = 106875 - 100781 = 6094
(this is not equal to 'new' PD1's energy_diff in 'd)')

c) 'old' w/o patch set, PD1
task_util = 13
cost = 160
sum_util_w/o_task = 283
sum_util_w_task = 296
scale_cpu = 355
energy_w/o_task = 160 * 283 / 355 = 127.55 => 127
energy_w_task = 160 * 296 / 355 = 133.41 => 133
energy_diff = 133 - 127 = 6
(this is equal to 'old' PD0's energy_diff in 'a)')

d) 'new' w/ patch set, PD1
task_util = 13
cost = 160 * 1000 = 160000
sum_util_w/o_task = 283
sum_util_w_task = 296
scale_cpu = 355
energy_w/o_task = 160000 * 283 / 355 = 127549
energy_w_task = 160000 * 296 / 355 = 133408
energy_diff = 133408 - 127549 = 5859
(this is not equal to 'new' PD0's energy_diff in 'b)')

2. In the 6% energy margin filter at the end of
find_energy_efficient_cpu(). With this patch, the margin comparison also
has better resolution, which makes better task placement possible.
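The arithmetic in cases a) to d) can be reproduced with a small sketch.
It is a simplification that follows only the formula used in the example
(cost * sum_util / scale_cpu with integer division); the helper names
pd_energy() and pd_energy_diff() are hypothetical and do not exist in the
kernel.

```c
#include <stdint.h>

/*
 * Simplified energy estimate for one Performance Domain, following the
 * arithmetic of the example: cost * sum_util / scale_cpu, truncated by
 * integer division.
 */
static uint64_t pd_energy(uint64_t cost, uint64_t sum_util, uint64_t scale_cpu)
{
	return cost * sum_util / scale_cpu;
}

/* Energy difference between the PD with and without the task placed on it. */
static uint64_t pd_energy_diff(uint64_t cost, uint64_t sum_util,
			       uint64_t task_util, uint64_t scale_cpu)
{
	return pd_energy(cost, sum_util + task_util, scale_cpu) -
	       pd_energy(cost, sum_util, scale_cpu);
}
```

With the example values, pd_energy_diff(480, 215, 13, 1024) and
pd_energy_diff(160, 283, 13, 355) both give 6, whereas with the cost
multiplied by 1000 the results are 6094 and 5859, so the two PDs can be
told apart.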

Fixes: 27871f7a8a341ef ("PM: Introduce an Energy Model management framework")
Reported-by: CCJ Yeh <CCj.Yeh@mediatek.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Takashi Iwai <tiwai@suse.de>

---
 include/linux/energy_model.h | 16 ++++++++++++++++
 kernel/power/energy_model.c  |  4 +++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 3f221dbf5f95..1834752c5617 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -53,6 +53,22 @@ struct em_perf_domain {
 #ifdef CONFIG_ENERGY_MODEL
 #define EM_MAX_POWER 0xFFFF
 
+/*
+ * Increase resolution of energy estimation calculations for 64-bit
+ * architectures. The extra resolution improves decision made by EAS for the
+ * task placement when two Performance Domains might provide similar energy
+ * estimation values (w/o better resolution the values could be equal).
+ *
+ * We increase resolution only if we have enough bits to allow this increased
+ * resolution (i.e. 64-bit). The costs for increasing resolution when 32-bit
+ * are pretty high and the returns do not justify the increased costs.
+ */
+#ifdef CONFIG_64BIT
+#define em_scale_power(p) ((p) * 1000)
+#else
+#define em_scale_power(p) (p)
+#endif
+
 struct em_data_callback {
 	/**
 	 * active_power() - Provide power at the next performance state of
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 0f4530b3a8cd..a332ccd829e2 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -170,7 +170,9 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 	/* Compute the cost of each performance state. */
 	fmax = (u64) table[nr_states - 1].frequency;
 	for (i = 0; i < nr_states; i++) {
-		table[i].cost = div64_u64(fmax * table[i].power,
+		unsigned long power_res = em_scale_power(table[i].power);
+
+		table[i].cost = div64_u64(fmax * power_res,
 					  table[i].frequency);
 	}
 
-- 
2.26.2