From 7fcc17d0cb12938d2b3507973a6f93fc9ed2c7a1 Mon Sep 17 00:00:00 2001
From: Lukasz Luba <lukasz.luba@arm.com>
Date: Tue, 3 Aug 2021 11:27:43 +0100
Subject: [PATCH] PM: EM: Increase energy calculation precision
Git-commit: 7fcc17d0cb12938d2b3507973a6f93fc9ed2c7a1
Patch-mainline: v5.15-rc1
References: git-fixes stable-5.14.4

The Energy Model (EM) provides useful information about device power in
each performance state to other subsystems, such as the Energy Aware
Scheduler (EAS). The energy calculation in EAS is based on the EM
function em_cpu_energy(). The current implementation of that function
uses em_perf_state::cost as a pre-computed cost coefficient equal to:
cost = power * max_frequency / frequency.
The 'power' is expressed in milli-Watts (or in an abstract scale).
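
As an illustration only (not part of this patch), the coefficient can be
sketched in plain C. The names sketch_cost() and SKETCH_POWER_SCALE are
hypothetical, and the factor 1000 is an assumption that mirrors the
resolution increase this message proposes for 64-bit builds:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the extra resolution factor (assumed 1000). */
#define SKETCH_POWER_SCALE 1000ULL

/*
 * cost = power * max_frequency / frequency, using truncating integer
 * division. When 'scaled' is non-zero the power is multiplied by
 * SKETCH_POWER_SCALE first, so fewer low-order digits are lost.
 */
static uint64_t sketch_cost(uint64_t power_mw, uint64_t fmax,
			    uint64_t freq, int scaled)
{
	uint64_t power_res = scaled ? power_mw * SKETCH_POWER_SCALE : power_mw;

	assert(freq != 0);
	return fmax * power_res / freq;
}
```

With made-up inputs power = 100, max_frequency = 2000000 and
frequency = 1400000, the unscaled cost truncates to 142, while the
scaled cost keeps 142857.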

There are corner cases when the EAS energy calculation for two Performance
Domains (PDs) returns the same value. The EAS compares these values to
choose the smaller one. It might happen that these values are equal due to
rounding error. In such a scenario, we need better resolution, e.g. 1000
times better. To make this possible, increase the resolution in the
em_perf_state::cost for 64-bit architectures. The cost of increasing
resolution on 32-bit is pretty high (64-bit division) and is not justified
since there are no new 32-bit big.LITTLE EAS systems expected which would
benefit from this higher resolution.

This patch avoids the milli-Watt rounding errors which might occur in the
EAS energy estimation for each PD. The rounding error is common for small
tasks which have a small utilization value.

There are two places in the code where it makes a difference:
1. In find_energy_efficient_cpu(), where we search for best_delta. We
might suffer there when two PDs return the same result, like in the
example below.

Scenario:
Lightly utilized system, e.g. ~200 sum_util for PD0 and ~220 for PD1.
There are quite a few small tasks of ~10-15 util. These tasks suffer from
the rounding error. These utilization values are typical when running games
on Android. One of our partners has reported 5..10mA less battery drain
when running with increased resolution.

Some details:
We have two PDs: PD0 (big) and PD1 (little)
Let's compare w/o patch set ('old') and w/ patch set ('new')
We are comparing energy w/ task and w/o task placed in the PDs

a) 'old' w/o patch set, PD0
task_util = 13
cost = 480
sum_util_w/o_task = 215
sum_util_w_task = 228
scale_cpu = 1024
energy_w/o_task = 480 * 215 / 1024 = 100.78 => 100
energy_w_task = 480 * 228 / 1024 = 106.87 => 106
energy_diff = 106 - 100 = 6
(this is equal to 'old' PD1's energy_diff in 'c)')

b) 'new' w/ patch set, PD0
task_util = 13
cost = 480 * 1000 = 480000
sum_util_w/o_task = 215
sum_util_w_task = 228
energy_w/o_task = 480000 * 215 / 1024 = 100781
energy_w_task = 480000 * 228 / 1024 = 106875
energy_diff = 106875 - 100781 = 6094
(this is not equal to 'new' PD1's energy_diff in 'd)')

c) 'old' w/o patch set, PD1
task_util = 13
cost = 160
sum_util_w/o_task = 283
sum_util_w_task = 296
scale_cpu = 355
energy_w/o_task = 160 * 283 / 355 = 127.55 => 127
energy_w_task = 160 * 296 / 355 = 133.41 => 133
energy_diff = 133 - 127 = 6
(this is equal to 'old' PD0's energy_diff in 'a)')

d) 'new' w/ patch set, PD1
task_util = 13
cost = 160 * 1000 = 160000
sum_util_w/o_task = 283
sum_util_w_task = 296
scale_cpu = 355
energy_w/o_task = 160000 * 283 / 355 = 127549
energy_w_task = 160000 * 296 / 355 = 133408
energy_diff = 133408 - 127549 = 5859
(this is not equal to 'new' PD0's energy_diff in 'b)')

2. In the 6% energy margin filter at the end of
find_energy_efficient_cpu(). With this patch the margin comparison also
has better resolution, which allows better task placement.
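
The arithmetic in examples a) to d) can be reproduced with a minimal C
sketch. It assumes the simplified estimate
energy = cost * sum_util / scale_cpu with truncating integer division,
as used in those examples. The helper names are hypothetical and this is
not the kernel's em_cpu_energy():

```c
#include <assert.h>
#include <stdint.h>

/* Simplified estimate from the examples: cost * sum_util / scale_cpu. */
static uint64_t sketch_energy(uint64_t cost, uint64_t sum_util,
			      uint64_t scale_cpu)
{
	assert(scale_cpu != 0);
	return cost * sum_util / scale_cpu;
}

/* Energy increase caused by placing a task of 'task_util' in a domain. */
static uint64_t sketch_energy_diff(uint64_t cost, uint64_t sum_util,
				   uint64_t task_util, uint64_t scale_cpu)
{
	return sketch_energy(cost, sum_util + task_util, scale_cpu) -
	       sketch_energy(cost, sum_util, scale_cpu);
}
```

With the values from the examples, sketch_energy_diff() gives 6 for both
PDs at the old resolution (a tie), and 6094 for PD0 versus 5859 for PD1
once the costs are multiplied by 1000, so PD1 can be preferred.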

Fixes: 27871f7a8a341ef ("PM: Introduce an Energy Model management framework")
Reported-by: CCJ Yeh <CCj.Yeh@mediatek.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Takashi Iwai <tiwai@suse.de>

---
 include/linux/energy_model.h | 16 ++++++++++++++++
 kernel/power/energy_model.c  |  4 +++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 3f221dbf5f95..1834752c5617 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -53,6 +53,22 @@ struct em_perf_domain {
 #ifdef CONFIG_ENERGY_MODEL
 #define EM_MAX_POWER 0xFFFF
 
+/*
+ * Increase resolution of energy estimation calculations for 64-bit
+ * architectures. The extra resolution improves decision made by EAS for the
+ * task placement when two Performance Domains might provide similar energy
+ * estimation values (w/o better resolution the values could be equal).
+ *
+ * We increase resolution only if we have enough bits to allow this increased
+ * resolution (i.e. 64-bit). The costs for increasing resolution when 32-bit
+ * are pretty high and the returns do not justify the increased costs.
+ */
+#ifdef CONFIG_64BIT
+#define em_scale_power(p) ((p) * 1000)
+#else
+#define em_scale_power(p) (p)
+#endif
+
 struct em_data_callback {
 	/**
 	 * active_power() - Provide power at the next performance state of
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 0f4530b3a8cd..a332ccd829e2 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -170,7 +170,9 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 	/* Compute the cost of each performance state. */
 	fmax = (u64) table[nr_states - 1].frequency;
 	for (i = 0; i < nr_states; i++) {
-		table[i].cost = div64_u64(fmax * table[i].power,
+		unsigned long power_res = em_scale_power(table[i].power);
+
+		table[i].cost = div64_u64(fmax * power_res,
 					  table[i].frequency);
 	}
 
-- 
2.26.2