From: Mike Galbraith <mgalbraith@suse.de>
Date: Tue, 12 Jun 2012 07:23:26 +0200
Subject: [PATCH] sched: optimize latency defaults for throughput

Patch-mainline: Never, SUSE specific
References: Scheduler enhancements for I7 (bnc#754690, bnc#1144446)

Adjust latency defaults to match SLE 15 SP1:

1. min_granularity = 2ms
   This setting drops sched_nr_latency to 3, switching the scheduler to
   its cache-preservation strategy sooner (see the sketch after this
   list).  LAST_BUDDY uses this to try to give the CPU back to a
   wakeup-preempted task, preserving its footprint should a fast mover
   (a la kthread) briefly preempt, thus avoiding selecting another task
   on every preemption and thrashing the cache.
2. wakeup_granularity = 2.5ms
   Reduces wakeup preemption to the minimum, thus reducing
   over-scheduling.
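
For reference, fair.c keeps sched_nr_latency at sysctl_sched_latency /
sysctl_sched_min_granularity (rederived in sched_proc_update_handler()
whenever either tunable changes).  A minimal userspace sketch of that
arithmetic, assuming the 6ms sysctl_sched_latency default is left
untouched (this patch does not change it):

  /* Standalone sketch, not kernel code: models how fair.c
   * rederives sched_nr_latency from the two tunables. */
  #include <stdio.h>

  #define DIV_ROUND_UP(n, d)  (((n) + (d) - 1) / (d))

  int main(void)
  {
          unsigned long long latency  = 6000000ULL; /* 6ms, assumed unchanged */
          unsigned long long min_gran = 2000000ULL; /* 2ms, set by this patch */

          /* 6ms / 2ms == 3, matching the value hardcoded below */
          printf("sched_nr_latency = %llu\n",
                 DIV_ROUND_UP(latency, min_gran));
          return 0;
  }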

The fundamental problem with the SP1-wide latency target: our
preemption model is sleep based, and sleep is measured against that
latency target.  While sleep is a very natural preemption metric, it
has one basic flaw: as resource contention increases, so does total
sleep time, rendering it less and less effective as a differentiator
as load climbs.  Ergo, when tuning for loaded-box performance, the
target has to be set to the minimum that still allows preemption to
occur, to minimize thrash.
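
Concretely, the wakeup granularity gates preemption: a waking task
preempts only when the current task's vruntime exceeds the wakee's by
more than the (weight-scaled) granularity, so a larger granularity
means less preemption.  A simplified standalone model of that check
(the real one is wakeup_preempt_entity() in fair.c; per-entity weight
scaling is omitted here):

  /* Simplified userspace model of fair.c's wakeup_preempt_entity();
   * weight scaling of the granularity is omitted. */
  #include <stdio.h>

  typedef long long s64;

  static int should_preempt(s64 curr_vruntime, s64 wakee_vruntime, s64 gran)
  {
          s64 vdiff = curr_vruntime - wakee_vruntime;

          if (vdiff <= 0)      /* curr has less/equal vruntime: no preempt */
                  return 0;
          return vdiff > gran; /* preempt only beyond the granularity */
  }

  int main(void)
  {
          /* A 2ms vruntime lead preempts at the old 1ms granularity,
           * but not at the 2.5ms set by this patch. */
          printf("%d\n", should_preempt(2000000, 0, 1000000)); /* 1 */
          printf("%d\n", should_preempt(2000000, 0, 2500000)); /* 0 */
          return 0;
  }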

To date, no one has presented a model that is as cheap as, and works
better than, Linus' sleep-time model despite its limitations, so here
we sit with a good ("perfect is the enemy of good") preemption model
in a scheduler... that tries to be... perfectly... fair, with all
that implies.

Testing confirmed that this primarily improves tbench at high thread
counts and hackbench in general.  Most other examined workloads were
performance-neutral.  Archived test results are available at
http://laplace.suse.de/pt-master/SLE-15-SP2/0002-mgorman-sched-tuning-defaults/

Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 kernel/sched/fair.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bc9cfeaac8bd..44bb152f92a6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -56,15 +56,15 @@ enum sched_tunable_scaling sysctl_sched_tunable_scaling = SCHED_TUNABLESCALING_L
 /*
  * Minimal preemption granularity for CPU-bound tasks:
  *
- * (default: 0.75 msec * (1 + ilog(ncpus)), units: nanoseconds)
+ * (default: 2 msec * (1 + ilog(ncpus)), units: nanoseconds)
  */
-unsigned int sysctl_sched_min_granularity			= 750000ULL;
-static unsigned int normalized_sysctl_sched_min_granularity	= 750000ULL;
+unsigned int sysctl_sched_min_granularity			= 2000000ULL;
+static unsigned int normalized_sysctl_sched_min_granularity	= 2000000ULL;
 
 /*
  * This value is kept at sysctl_sched_latency/sysctl_sched_min_granularity
  */
-static unsigned int sched_nr_latency = 8;
+static unsigned int sched_nr_latency = 3;
 
 /*
  * After fork, child runs first. If set to 0 (default) then
@@ -79,10 +79,10 @@ unsigned int sysctl_sched_child_runs_first __read_mostly;
  * and reduces their over-scheduling. Synchronous workloads will still
  * have immediate wakeup/sleep latencies.
  *
- * (default: 1 msec * (1 + ilog(ncpus)), units: nanoseconds)
+ * (default: 2.5 msec * (1 + ilog(ncpus)), units: nanoseconds)
  */
-unsigned int sysctl_sched_wakeup_granularity			= 1000000UL;
-static unsigned int normalized_sysctl_sched_wakeup_granularity	= 1000000UL;
+unsigned int sysctl_sched_wakeup_granularity			= 2500000UL;
+static unsigned int normalized_sysctl_sched_wakeup_granularity	= 2500000UL;
 
 const_debug unsigned int sysctl_sched_migration_cost	= 500000UL;