From: Mike Galbraith <mgalbraith@suse.de>
Date: Mon, 11 Jun 2012 18:21:05 +0200
Subject: [PATCH] sched: ratelimit nohz

Patch-mainline: never, SUSE specific
References: Scheduler enhancements for I7 (bnc#754690)

Entering nohz code on every micro-idle is costing ~10% throughput for netperf
TCP_RR when scheduling cross-cpu.

The higher the context switch rate, the more nohz entry costs.  With this
patch and some cycle recovery patches in my tree, max cross cpu context
switch rate is improved by ~16%, a large portion of which is this
ratelimiting. In addition, it is known that this improves the database
initialisation times of sysbench-oltp by at least 20% on some Intel machines.

Note: a similar patch was in mainline briefly, but was reverted due to one
complaint wrt laptop using more power.  The earlier version raised ticks/s
on my Q6600 box from ~85 to ~128; this version does not, and also blocks
the mb() in rcu_needs_cpu().

Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
---
 include/linux/sched/nohz.h |    5 +++++
 kernel/sched/core.c        |    8 ++++++++
 kernel/time/tick-sched.c   |    5 +++--
 3 files changed, 16 insertions(+), 2 deletions(-)

--- a/include/linux/sched/nohz.h
+++ b/include/linux/sched/nohz.h
@@ -25,6 +25,11 @@ static inline void set_cpu_sd_state_idle
 #ifdef CONFIG_NO_HZ_COMMON
 void calc_load_enter_idle(void);
 void calc_load_exit_idle(void);
+#ifdef CONFIG_SMP
+extern int sched_needs_cpu(int cpu);
+#else
+static inline int sched_needs_cpu(int cpu) { return 0; }
+#endif
 #else
 static inline void calc_load_enter_idle(void) { }
 static inline void calc_load_exit_idle(void) { }
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -620,6 +620,14 @@ static inline bool got_nohz_idle_kick(vo
 	return false;
 }
 
+int sched_needs_cpu(int cpu)
+{
+	if (tick_nohz_full_cpu(cpu))
+		return 0;
+
+	return cpu_rq(cpu)->avg_idle < sysctl_sched_migration_cost;
+}
+
 #else /* CONFIG_NO_HZ_COMMON */
 
 static inline bool got_nohz_idle_kick(void)
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -693,8 +693,9 @@ static ktime_t tick_nohz_stop_sched_tick
 	 * minimal delta which brings us back to this place
 	 * immediately. Lather, rinse and repeat...
 	 */
-	if (rcu_needs_cpu(basemono, &next_rcu) || arch_needs_cpu() ||
-	    irq_work_needs_cpu() || local_timer_softirq_pending()) {
+	if (sched_needs_cpu(cpu) || rcu_needs_cpu(basemono, &next_rcu) ||
+	    arch_needs_cpu() || irq_work_needs_cpu() ||
+	    local_timer_softirq_pending()) {
 		next_tick = basemono + TICK_NSEC;
 	} else {
 		/*