From: Mel Gorman <mgorman@suse.de>
Date: Fri, 24 Apr 2020 14:45:41 +0100
Subject: [PATCH] sched/nohz: Avoid disabling the tick for very short durations

Patch-mainline: Never, upstream favours power consumption over performance
References: bnc#754690, bsc#1158748

The decision on whether to disable the tick is based on when the
next tick is expected to update and some predictions made by the idle
governor. However, some workloads can idle for very brief periods due
to rapid context switching, ping-pong workloads or those communicating
rapidly with kernel threads. In these cases, the time to the next tick is
irrelevant and the predictions are not great. When this happens, the time
to exit from idle is substantial and performance can be severely degraded.

This patch avoids disabling the timer interrupt until the CPU
has been idling at least as long as the time a running task
is considered cache-hot.

SLE15-SP4: This showed very inconsistent results, and nohz now works
	very differently than it did in previous versions of SLE.
	Leave disabled until there is solid evidence it's a consistent
	win. See results at
	http://laplace.suse.de/pt-master/SLE15-SP4/0003-noshortnohz

Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/sched/nohz.h |  5 +++++
 kernel/sched/core.c        | 10 ++++++++++
 kernel/time/tick-sched.c   |  3 +++
 3 files changed, 18 insertions(+)

diff --git a/include/linux/sched/nohz.h b/include/linux/sched/nohz.h
index 6d67e9a5af6b..3ef0d61f9f46 100644
--- a/include/linux/sched/nohz.h
+++ b/include/linux/sched/nohz.h
@@ -9,6 +9,11 @@
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
 extern void nohz_balance_enter_idle(int cpu);
 extern int get_nohz_timer_target(void);
+#ifdef CONFIG_SMP
+extern bool nohz_sched_idling_cpu(int cpu);
+#else
+static inline bool nohz_sched_idling_cpu(int cpu) { return false; }
+#endif
 #else
 static inline void nohz_balance_enter_idle(int cpu) { }
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fed93994bc00..b4ac8247f791 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -631,6 +631,16 @@ void wake_up_nohz_cpu(int cpu)
 		wake_up_idle_cpu(cpu);
 }
 
+/*
+ * Returns true if a CPU is idling on average for more than the cost of
+ * migration. This is used to avoid entering nohz for very short
+ * durations as the exit cost from a nohz idle is substantial.
+ */
+bool nohz_sched_idling_cpu(int cpu)
+{
+	return cpu_rq(cpu)->avg_idle >= sysctl_sched_migration_cost;
+}
+
 static void nohz_csd_func(void *info)
 {
 	struct rq *rq = info;
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index be9707f68024..8021d0ccc4ad 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -890,6 +890,9 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 	if (need_resched())
 		return false;
 
+	if (!nohz_sched_idling_cpu(cpu))
+		return false;
+
 	if (unlikely(local_softirq_pending())) {
 		static int ratelimit;