From 7f7644a6e22f184f3b05d839958939613437ea3e Mon Sep 17 00:00:00 2001
From: Mel Gorman <mgorman@techsingularity.net>
Date: Mon, 31 Jan 2022 14:47:00 +0000
Subject: [PATCH] sched/fair: Revert update_pick_idlest() Select group with
 lowest group_util when idle_cpus are equal

References: bnc#1193175
Patch-mainline: Never, upstream would need a more comprehensive solution

This reverts commit 3edecfef028536cb19a120ec8788bd8a11f93b9e.

Commit 3edecfef0285 ("sched/fair: update_pick_idlest() Select group with
lowest group_util when idle_cpus are equal") intended to differentiate
between two group_spare groups with equal number of idle CPUs by selecting
the one with lower utilisation. However, it can pick a group with identical
or nearly identical utilisation. This hurts fork/exec latencies, as more
CPUs may be selected at fork time and those CPUs may be in a deeper
c-state.
The lower-utilisation group should only be selected if there is a large
difference in utilisation, to mitigate the impact on short-lived new tasks.
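
As an illustration only, the "large difference" check could look something
like the sketch below; the 25% margin is a made-up number for the sake of
the example, not a tuned value:

	case group_has_spare:
		/* Select group with most idle CPUs */
		if (idlest_sgs->idle_cpus > sgs->idle_cpus)
			return false;

		/*
		 * Hypothetical: on an idle_cpus tie, only switch to the
		 * candidate group if its utilisation is at least 25%
		 * lower than the current idlest group's.
		 */
		if (idlest_sgs->idle_cpus == sgs->idle_cpus &&
		    sgs->group_util + (idlest_sgs->group_util >> 2) >=
		    idlest_sgs->group_util)
			return false;
		break;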

The fundamental problem is that the upstream code is following the intent
of finding the idlest group regardless of c-state. However, for short-lived
tasks, it spreads tasks to idle groups as the utilisation has not decayed
to 0 quickly enough. While this can be hacked around, it's the wrong
solution. A more comprehensive solution would be to account for the number
of new tasks waking in a group and to somehow identify, when a task starts
running for the first time, which scheduler group holds that accounting.
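
Purely to illustrate that direction, a sketch of such accounting might look
like the following; the field and helpers are hypothetical names invented
for this example and do not exist in the kernel:

	/* Hypothetical: tasks placed in this group that have not yet run */
	atomic_t nr_newly_placed;	/* new field in struct sched_group */

	/* Fork/exec-time placement would charge the chosen group ... */
	static inline void charge_new_task(struct sched_group *sg)
	{
		atomic_inc(&sg->nr_newly_placed);
	}

	/* ... and the charge would be dropped when the task first runs. */
	static inline void uncharge_new_task(struct sched_group *sg)
	{
		atomic_dec(&sg->nr_newly_placed);
	}

The hard, unsolved part is reliably finding the group that holds the charge
by the time the task first runs, as the sched domains may have been rebuilt
in the meantime.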

If that gets prototyped, validated and upstreamed in time, it will be
included in SLE 15 SP4 pre-GA and replace this patch.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 kernel/sched/fair.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ccf38153b9d3..321b111daf3e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9187,14 +9187,8 @@ static bool update_pick_idlest(struct sched_group *idlest,
 
 	case group_has_spare:
 		/* Select group with most idle CPUs */
-		if (idlest_sgs->idle_cpus > sgs->idle_cpus)
+		if (idlest_sgs->idle_cpus >= sgs->idle_cpus)
 			return false;
-
-		/* Select group with lowest group_util */
-		if (idlest_sgs->idle_cpus == sgs->idle_cpus &&
-			idlest_sgs->group_util <= sgs->group_util)
-			return false;
-
 		break;
 	}