From: Mel Gorman <mgorman@suse.de>
Date: Fri, 24 Apr 2020 14:45:41 +0100
Subject: [PATCH] sched/nohz: Avoid disabling the tick for very short durations

Patch-mainline: Never, upstream favours power consumption over performance
References: bnc#754690, bsc#1158748

The decision on whether to disable the tick is based on when the next
timer event is expected and on predictions made by the idle governor.
However, some workloads idle for very brief periods due to rapid context
switching, ping-pong workloads, or frequent communication with kernel
threads. In these cases, the time to the next tick is irrelevant and the
predictions are poor. When this happens, the cost of exiting idle is
substantial and performance can be severely degraded.

This patch avoids disabling the timer interrupt until the CPU has
been idling, on average, at least as long as the time a running task
is considered cache-hot (sysctl_sched_migration_cost).

Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Mel Gorman <mgorman@suse.de>
[dwagner: Update define section to support UP configs]
Signed-off-by: Daniel Wagner <dwagner@suse.de>
---
 include/linux/sched/nohz.h |    2 ++
 kernel/sched/core.c        |   10 ++++++++++
 kernel/time/tick-sched.c   |    3 +++
 3 files changed, 15 insertions(+)

--- a/include/linux/sched/nohz.h
+++ b/include/linux/sched/nohz.h
@@ -9,8 +9,10 @@
 #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
 extern void nohz_balance_enter_idle(int cpu);
 extern int get_nohz_timer_target(void);
+extern bool nohz_sched_idling_cpu(int cpu);
 #else
 static inline void nohz_balance_enter_idle(int cpu) { }
+static inline bool nohz_sched_idling_cpu(int cpu) { return false; }
 #endif
 
 #ifdef CONFIG_NO_HZ_COMMON
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -631,6 +631,16 @@ void wake_up_nohz_cpu(int cpu)
 		wake_up_idle_cpu(cpu);
 }
 
+/*
+ * Returns true if a CPU idles, on average, for at least the cost of
+ * migration. This is used to avoid entering nohz for very short
+ * durations as the exit cost from a nohz idle is substantial.
+ */
+bool nohz_sched_idling_cpu(int cpu)
+{
+	return cpu_rq(cpu)->avg_idle >= sysctl_sched_migration_cost;
+}
+
 static inline bool got_nohz_idle_kick(void)
 {
 	int cpu = smp_processor_id();
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -902,6 +902,9 @@ static bool can_stop_idle_tick(int cpu,
 	if (need_resched())
 		return false;
 
+	if (!nohz_sched_idling_cpu(cpu))
+		return false;
+
 	if (unlikely(local_softirq_pending())) {
 		static int ratelimit;