From 1c92c3ec4d7635c2d8159e6dec886758bb6e5623 Mon Sep 17 00:00:00 2001
From: Denis Kirjanov
Date: Oct 26 2023 09:29:36 +0000
Subject: Merge branch 'users/mgorman/SLE12-SP5/for-next' into SLE12-SP5

Pull scheduler fixes from Mel Gorman

---
diff --git a/blacklist.conf b/blacklist.conf
index 16e4991..777fdc3 100644
--- a/blacklist.conf
+++ b/blacklist.conf
@@ -3032,3 +3032,23 @@ f2d3155e2a6bac44d16f04415a321e8707d895c6 # too many changes in between
 a8faed3a02eeb75857a3b5d660fa80fe79db77a3 # for architecture and configuration irrelevant in SLE12
 36d763509be326bb383b1b1852a129ff58d74e3b # comment only
 ac52578d6e8d300dd50f790f29a24169b1edd26c # Fixes: tag was wrong
+1ca4fa3ab604734e38e2a3000c9abf788512ffa7 # Cosmetic, debugging patch for unused config
+99687cdbb3f6c8e32bcc7f37496e811f30460e48 # Sparse warning fix
+1b02cd6a2d7f3e2a6a5262887d2cb2912083e42f # Missing dependencies, fix only in the event of a customer bug
+1a010e29cfa00fee2888fd2fd4983f848cbafb58 # Guard against unlikely tuning value, fix only in the event of a customer bug
+d1e7fd6462ca9fc76650fbe6ca800e35b24267da # KABI hazard, fix only in the event of a customer bug
+26a8b12747c975b33b4a82d62e4a307e1c07f31b # Complex dependencies missing, fix only in the event of a customer bug
+01cfcde9c26d8555f0e6e9aea9d6049f87683998 # Complex dependencies missing that applies to an extreme corner case, fix only in the event of a customer bug
+e5c6b312ce3cc97e90ea159446e6bfa06645364d # Fix to experimental feature, fix only in the event of a customer bug
+83d40a61046f73103b4e5d8f1310261487ff63b0 # Mostly cosmetic fix to a build warning
+42288cb44c4b5fff7653bc392b583a2b8bd6a8c0 # Fix only in the event of a customer bug
+dd02d4234c9a2214a81c57a16484304a1a51872a # Potentially surprising change in behaviour, fix only in the event of a customer bug
+9b58e976b3b391c0cf02e038d53dd0478ed3013c # Potentially surprising change in behaviour, fix only in the event of a customer bug
+248cc9993d1cc12b8e9ed716cc3fc09f6c3517dd # Potentially surprising change in behaviour, fix only in the event of a customer bug
+e4a38402c36e42df28eb1a5394be87e6571fb48a # KABI hazard, fix only in the event of a customer bug
+244226035a1f9b2b6c326e55ae5188fab4f428cb # Complex dependencies missing, fix only in the event of a customer bug
+b759caa1d9f667b94727b2ad12589cbc4ce13a82 # Complex dependencies missing, fix only in the event of a customer bug
+c56ab1b3506ba0e7a872509964b100912bde165d # Complex dependencies missing, fix only in the event of a customer bug
+d81304bc6193554014d4372a01debdf65e1e9a4d # Complex dependencies missing, fix only in the event of a customer bug
+44c7b80bffc3a657a36857098d5d9c49d94e652b # Complex dependencies missing, fix only in the event of a customer bug
+aa69c36f31aadc1669bfa8a3de6a47b5e6c98ee8 # Complex dependencies missing, fix only in the event of a customer bug
diff --git a/patches.suse/0001-sched-rt-Fix-rq-clock_update_flags-RQCF_ACT_SKIP-war.patch b/patches.suse/0001-sched-rt-Fix-rq-clock_update_flags-RQCF_ACT_SKIP-war.patch
index ef95f81..de3daaa 100644
--- a/patches.suse/0001-sched-rt-Fix-rq-clock_update_flags-RQCF_ACT_SKIP-war.patch
+++ b/patches.suse/0001-sched-rt-Fix-rq-clock_update_flags-RQCF_ACT_SKIP-war.patch
@@ -64,7 +64,7 @@ index 73dac5f85e7d..bba4f0111a36 100644
 --- a/kernel/sched/rt.c
 +++ b/kernel/sched/rt.c
 @@ -831,6 +831,8 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
-	struct rq *rq = rq_of_rt_rq(rt_rq);
+	continue;
 
 	raw_spin_lock(&rq->lock);
 +	update_rq_clock(rq);
diff --git a/patches.suse/sched-Avoid-scale-real-weight-down-to-zero.patch b/patches.suse/sched-Avoid-scale-real-weight-down-to-zero.patch
new file mode 100644
index 0000000..a7487e8
--- /dev/null
+++ b/patches.suse/sched-Avoid-scale-real-weight-down-to-zero.patch
@@ -0,0 +1,79 @@
+From 3c4dafc1e19a1a043b93801202c99242f88b5463 Mon Sep 17 00:00:00 2001
+From: Michael Wang
+Date: Wed, 18 Mar 2020 10:15:15 +0800
+Subject: [PATCH] sched: Avoid scale real weight down to zero
+
+References: git fixes (sched)
+Patch-mainline: v5.7-rc1
+Git-commit: 26cf52229efc87e2effa9d788f9b33c40fb3358a
+
+During our testing, we found a case where group shares were no longer
+working correctly. The cgroup topology looks like:
+
+  /sys/fs/cgroup/cpu/A      (shares=102400)
+  /sys/fs/cgroup/cpu/A/B    (shares=2)
+  /sys/fs/cgroup/cpu/A/B/C  (shares=1024)
+
+  /sys/fs/cgroup/cpu/D      (shares=1024)
+  /sys/fs/cgroup/cpu/D/E    (shares=1024)
+  /sys/fs/cgroup/cpu/D/E/F  (shares=1024)
+
+The same benchmark runs in groups C and F with no other tasks running;
+the benchmark is capable of consuming all of the CPUs.
+
+We would expect group C to win more CPU resources, since it can enjoy
+all the shares of group A, but it is F that wins far more.
+
+The reason is group B, whose shares are 2: since
+A->cfs_rq.load.weight == B->se.load.weight == B->shares/nr_cpus,
+A->cfs_rq.load.weight becomes very small.
+
+And in calc_group_shares() we calculate shares as:
+
+  load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg);
+  shares = (tg_shares * load) / tg_weight;
+
+Since 'cfs_rq->load.weight' is so small, the load becomes 0 after the
+scale down; although 'tg_shares' is 102400, the shares of the se that
+stands for group A on the root cfs_rq become 2.
+
+Meanwhile, the se of D on the root cfs_rq is far bigger than 2, so it
+wins the battle.
+
+Thus when scale_load_down() scales the real weight down to 0, it no
+longer tells the real story: the caller gets the wrong information and
+the calculation goes wrong.
+
+This patch adds a check in scale_load_down() so that the real weight is
+always >= MIN_SHARES after scaling; with it applied, group C wins as
+expected.
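To make the arithmetic concrete: with SCHED_FIXEDPOINT_SHIFT == 10, a weight of 2 shifts down to 0 under the old macro, while the fixed macro clamps any non-zero weight to MIN_SHARES (2). A minimal user-space sketch of the before/after behaviour; the max() helper and the demo main() are illustrative additions, and the statement expression requires GCC or Clang:

    #include <stdio.h>

    #define SCHED_FIXEDPOINT_SHIFT 10
    #define max(a, b) ((a) > (b) ? (a) : (b))

    /* Old behaviour: small weights underflow to 0. */
    #define scale_load_down_old(w) ((w) >> SCHED_FIXEDPOINT_SHIFT)

    /* Fixed behaviour: non-zero weights never scale below 2 (MIN_SHARES). */
    #define scale_load_down_fixed(w) \
    ({ \
            unsigned long __w = (w); \
            if (__w) \
                    __w = max(2UL, __w >> SCHED_FIXEDPOINT_SHIFT); \
            __w; \
    })

    int main(void)
    {
            /* B->shares/nr_cpus with shares=2 yields a tiny weight. */
            unsigned long w = 2;

            printf("old:   %lu -> %lu\n", w, scale_load_down_old(w));   /* 2 -> 0 */
            printf("fixed: %lu -> %lu\n", w, scale_load_down_fixed(w)); /* 2 -> 2 */
            return 0;
    }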
+
+Suggested-by: Peter Zijlstra
+Signed-off-by: Michael Wang
+Signed-off-by: Peter Zijlstra (Intel)
+Reviewed-by: Vincent Guittot
+Link: https://lkml.kernel.org/r/38e8e212-59a1-64b2-b247-b6d0b52d8dc1@linux.alibaba.com
+Signed-off-by: Mel Gorman
+---
+ kernel/sched/sched.h | 8 +++++++-
+ 1 file changed, 7 insertions(+), 1 deletion(-)
+
+diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
+index ba32909bbecc..f516e7a9d43f 100644
+--- a/kernel/sched/sched.h
++++ b/kernel/sched/sched.h
+@@ -88,7 +88,13 @@ static inline void cpu_load_update_active(struct rq *this_rq) { }
+ #ifdef CONFIG_64BIT
+ # define NICE_0_LOAD_SHIFT	(SCHED_FIXEDPOINT_SHIFT + SCHED_FIXEDPOINT_SHIFT)
+ # define scale_load(w)		((w) << SCHED_FIXEDPOINT_SHIFT)
+-# define scale_load_down(w)	((w) >> SCHED_FIXEDPOINT_SHIFT)
++# define scale_load_down(w) \
++({ \
++	unsigned long __w = (w); \
++	if (__w) \
++		__w = max(2UL, __w >> SCHED_FIXEDPOINT_SHIFT); \
++	__w; \
++})
+ #else
+ # define NICE_0_LOAD_SHIFT	(SCHED_FIXEDPOINT_SHIFT)
+ # define scale_load(w)		(w)
diff --git a/patches.suse/sched-Reenable-interrupts-in-do_sched_yield.patch b/patches.suse/sched-Reenable-interrupts-in-do_sched_yield.patch
new file mode 100644
index 0000000..2be0a65
--- /dev/null
+++ b/patches.suse/sched-Reenable-interrupts-in-do_sched_yield.patch
@@ -0,0 +1,42 @@
+From 4d35a8d31a8bb37ffc33267351eae679663b18db Mon Sep 17 00:00:00 2001
+From: Thomas Gleixner
+Date: Tue, 20 Oct 2020 16:46:55 +0200
+Subject: [PATCH] sched: Reenable interrupts in do_sched_yield()
+
+References: git fixes (sched)
+Patch-mainline: v5.11-rc1
+Git-commit: 345a957fcc95630bf5535d7668a59ed983eb49a7
+
+do_sched_yield() invokes schedule() with interrupts disabled, which is
+not allowed. This goes back to the pre-git era, to commit a6efb709806c
+("[PATCH] irqlock patch 2.5.27-H6") in the history tree.
+
+Reenable interrupts and remove the misleading comment which "explains" it.
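Pieced together from the hunk that follows, the tail of the syscall after the fix reads roughly as below; the function prologue is reconstructed from the upstream function and may differ slightly in this tree. The point is the ordering: the lock is dropped with rq_unlock_irq(), so interrupts are back on before schedule() runs.

    SYSCALL_DEFINE0(sched_yield)
    {
            struct rq_flags rf;
            struct rq *rq;

            rq = this_rq_lock_irq(&rf);   /* IRQs off, rq->lock held */

            schedstat_inc(rq->yld_count);
            current->sched_class->yield_task(rq);

            preempt_disable();
            rq_unlock_irq(rq, &rf);       /* drop lock AND re-enable IRQs */
            sched_preempt_enable_no_resched();

            schedule();                   /* now entered with IRQs enabled */
            return 0;
    }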
+
+Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
+Signed-off-by: Thomas Gleixner
+Signed-off-by: Peter Zijlstra (Intel)
+Link: https://lkml.kernel.org/r/87r1pt7y5c.fsf@nanos.tec.linutronix.de
+Signed-off-by: Mel Gorman
+---
+ kernel/sched/core.c | 6 +-----
+ 1 file changed, 1 insertion(+), 5 deletions(-)
+
+diff --git a/kernel/sched/core.c b/kernel/sched/core.c
+index 616ffabb6d9f..a9fa554d73eb 100644
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -5043,12 +5043,8 @@ SYSCALL_DEFINE0(sched_yield)
+ 	schedstat_inc(rq->yld_count);
+ 	current->sched_class->yield_task(rq);
+ 
+-	/*
+-	 * Since we are going to call schedule() anyway, there's
+-	 * no need to preempt or enable interrupts:
+-	 */
+ 	preempt_disable();
+-	rq_unlock(rq, &rf);
++	rq_unlock_irq(rq, &rf);
+ 	sched_preempt_enable_no_resched();
+ 
+ 	schedule();
diff --git a/patches.suse/sched-core-Fix-migration-to-invalid-CPU-in-__set_cpus_allowed_ptr.patch b/patches.suse/sched-core-Fix-migration-to-invalid-CPU-in-__set_cpus_allowed_ptr.patch
new file mode 100644
index 0000000..db8d93f
--- /dev/null
+++ b/patches.suse/sched-core-Fix-migration-to-invalid-CPU-in-__set_cpus_allowed_ptr.patch
@@ -0,0 +1,83 @@
+From 9aa958f52d3b3d71544358106531f20d2887a466 Mon Sep 17 00:00:00 2001
+From: KeMeng Shi
+Date: Mon, 16 Sep 2019 06:53:28 +0000
+Subject: [PATCH] sched/core: Fix migration to invalid CPU in
+ __set_cpus_allowed_ptr()
+
+References: git fixes (sched)
+Patch-mainline: v5.4-rc1
+Git-commit: 714e501e16cd473538b609b3e351b2cc9f7f09ed
+
+An oops can be triggered in the scheduler when running qemu on arm64:
+
+ Unable to handle kernel paging request at virtual address ffff000008effe40
+ Internal error: Oops: 96000007 [#1] SMP
+ Process migration/0 (pid: 12, stack limit = 0x00000000084e3736)
+ pstate: 20000085 (nzCv daIf -PAN -UAO)
+ pc : __ll_sc___cmpxchg_case_acq_4+0x4/0x20
+ lr : move_queued_task.isra.21+0x124/0x298
+ ...
+ Call trace:
+  __ll_sc___cmpxchg_case_acq_4+0x4/0x20
+  __migrate_task+0xc8/0xe0
+  migration_cpu_stop+0x170/0x180
+  cpu_stopper_thread+0xec/0x178
+  smpboot_thread_fn+0x1ac/0x1e8
+  kthread+0x134/0x138
+  ret_from_fork+0x10/0x18
+
+__set_cpus_allowed_ptr() chooses an active dest_cpu from the affinity
+mask to migrate the process to if the process is not currently running
+on any of the CPUs specified in the affinity mask. It will choose an
+invalid dest_cpu (dest_cpu >= nr_cpu_ids, 1024 in my virtual machine) if
+the CPUs in the affinity mask are deactivated by cpu_down() after the
+cpumask_intersects() check. The subsequent cpumask_test_cpu() of dest_cpu
+then indexes past the end of the mask and may pass if the corresponding
+bit happens to be set. As a consequence, the kernel accesses an invalid
+rq address associated with the invalid CPU in
+migration_cpu_stop->__migrate_task->move_queued_task and the oops occurs.
+
+To reproduce the crash:
+
+ 1) A process repeatedly binds itself to cpu0 and cpu1 in turn by calling
+    sched_setaffinity.
+
+ 2) A shell script repeatedly does "echo 0 > /sys/devices/system/cpu/cpu1/online"
+    and "echo 1 > /sys/devices/system/cpu/cpu1/online" in turn.
+
+ 3) The oops appears if the invalid CPU bit is set in memory after the
+    cpumask is tested.
+
+Signed-off-by: KeMeng Shi
+Signed-off-by: Peter Zijlstra (Intel)
+Reviewed-by: Valentin Schneider
+Cc: Linus Torvalds
+Cc: Peter Zijlstra
+Cc: Thomas Gleixner
+Link: https://lkml.kernel.org/r/1568616808-16808-1-git-send-email-shikemeng@huawei.com
+Signed-off-by: Ingo Molnar
+Signed-off-by: Mel Gorman
+---
+ kernel/sched/core.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/kernel/sched/core.c b/kernel/sched/core.c
+index dec420fefa87..616ffabb6d9f 100644
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -1173,7 +1173,8 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
+ 	if (cpumask_equal(&p->cpus_allowed, new_mask))
+ 		goto out;
+ 
+-	if (!cpumask_intersects(new_mask, cpu_valid_mask)) {
++	dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask);
++	if (dest_cpu >= nr_cpu_ids) {
+ 		ret = -EINVAL;
+ 		goto out;
+ 	}
+@@ -1194,7 +1195,6 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
+ 	if (cpumask_test_cpu(task_cpu(p), new_mask))
+ 		goto out;
+ 
+-	dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask);
+ 	if (task_running(rq, p) || p->state == TASK_WAKING) {
+ 		struct migration_arg arg = { p, dest_cpu };
+ 		/* Need help from migration thread: drop lock and wait. */
diff --git a/patches.suse/sched-core-Mitigate-race-cpus_share_cache-update_top_cache_domain.patch b/patches.suse/sched-core-Mitigate-race-cpus_share_cache-update_top_cache_domain.patch
new file mode 100644
index 0000000..d1bf89c
--- /dev/null
+++ b/patches.suse/sched-core-Mitigate-race-cpus_share_cache-update_top_cache_domain.patch
@@ -0,0 +1,58 @@
+From 61492d9ac3fbbbed05f0e771bcddc8064d34a6c4 Mon Sep 17 00:00:00 2001
+From: Vincent Donnefort
+Date: Thu, 4 Nov 2021 17:51:20 +0000
+Subject: [PATCH] sched/core: Mitigate race
+ cpus_share_cache()/update_top_cache_domain()
+
+References: git fixes (sched)
+Patch-mainline: v5.16-rc1
+Git-commit: 42dc938a590c96eeb429e1830123fef2366d9c80
+
+Nothing protects the access to the per_cpu variable sd_llc_id. When testing
+the same CPU (i.e. this_cpu == that_cpu), a race condition exists with
+update_top_cache_domain(). One scenario being:
+
+              CPU1                                 CPU2
+  ==================================================================
+
+  per_cpu(sd_llc_id, CPUX) => 0
+                                    partition_sched_domains_locked()
+                                      detach_destroy_domains()
+  cpus_share_cache(CPUX, CPUX)          update_top_cache_domain(CPUX)
+    per_cpu(sd_llc_id, CPUX) => 0
+                                          per_cpu(sd_llc_id, CPUX) = CPUX
+    per_cpu(sd_llc_id, CPUX) => CPUX
+    return false
+
+ttwu_queue_cond() wouldn't catch smp_processor_id() == cpu and the result
+is a warning triggered from ttwu_queue_wakelist().
+
+Avoid such a race in cpus_share_cache() by always returning true when
+this_cpu == that_cpu.
+
+Fixes: 518cd6234178 ("sched: Only queue remote wakeups when crossing cache boundaries")
+Reported-by: Jing-Ting Wu
+Signed-off-by: Vincent Donnefort
+Signed-off-by: Peter Zijlstra (Intel)
+Reviewed-by: Valentin Schneider
+Reviewed-by: Vincent Guittot
+Link: https://lore.kernel.org/r/20211104175120.857087-1-vincent.donnefort@arm.com
+Signed-off-by: Mel Gorman
+---
+ kernel/sched/core.c | 3 +++
+ 1 file changed, 3 insertions(+)
+
+diff --git a/kernel/sched/core.c b/kernel/sched/core.c
+index a9fa554d73eb..b27d09d514a1 100644
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -1892,6 +1892,9 @@ void wake_up_if_idle(int cpu)
+ 
+ bool cpus_share_cache(int this_cpu, int that_cpu)
+ {
++	if (this_cpu == that_cpu)
++		return true;
++
+ 	return per_cpu(sd_llc_id, this_cpu) == per_cpu(sd_llc_id, that_cpu);
+ }
+ #endif /* CONFIG_SMP */
diff --git a/patches.suse/sched-correct-SD_flags-returned-by-tl-sd_flags.patch b/patches.suse/sched-correct-SD_flags-returned-by-tl-sd_flags.patch
new file mode 100644
index 0000000..6a59154
--- /dev/null
+++ b/patches.suse/sched-correct-SD_flags-returned-by-tl-sd_flags.patch
@@ -0,0 +1,37 @@
+From 7f63cdadaf5dc6b8da65b2e0d69814ee7a795672 Mon Sep 17 00:00:00 2001
+From: Peng Liu
+Date: Tue, 9 Jun 2020 23:09:36 +0800
+Subject: [PATCH] sched: correct SD_flags returned by tl->sd_flags()
+
+References: git fixes (sched)
+Patch-mainline: v5.9-rc1
+Git-commit: 9b1b234bb86bcdcdb142e900d39b599185465dbb
+
+During sched domain init, we check whether non-topological SD_flags are
+returned by tl->sd_flags(); if any are found, we fire a warning and
+correct the violation, but the code failed to actually correct it.
+Correct this.
+
+Fixes: 143e1e28cb40 ("sched: Rework sched_domain topology definition")
+Signed-off-by: Peng Liu
+Signed-off-by: Peter Zijlstra (Intel)
+Reviewed-by: Vincent Guittot
+Reviewed-by: Valentin Schneider
+Link: https://lkml.kernel.org/r/20200609150936.GA13060@iZj6chx1xj0e0buvshuecpZ
+Signed-off-by: Mel Gorman
+---
+ kernel/sched/topology.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
+index 76a3380de05b..1c2906a99a14 100644
+--- a/kernel/sched/topology.c
++++ b/kernel/sched/topology.c
+@@ -1121,7 +1121,7 @@ sd_init(struct sched_domain_topology_level *tl,
+ 	sd_flags = (*tl->sd_flags)();
+ 	if (WARN_ONCE(sd_flags & ~TOPOLOGY_SD_FLAGS,
+ 			"wrong sd_flags in topology description\n"))
+-		sd_flags &= ~TOPOLOGY_SD_FLAGS;
++		sd_flags &= TOPOLOGY_SD_FLAGS;
+ 
+ 	*sd = (struct sched_domain){
+ 		.min_interval		= sd_weight,
diff --git a/patches.suse/sched-fair-Don-t-balance-task-to-its-current-running-CPU.patch b/patches.suse/sched-fair-Don-t-balance-task-to-its-current-running-CPU.patch
new file mode 100644
index 0000000..451d789
--- /dev/null
+++ b/patches.suse/sched-fair-Don-t-balance-task-to-its-current-running-CPU.patch
@@ -0,0 +1,93 @@
+From 2dda9d9e798105274ee3de4645756131b4926bc2 Mon Sep 17 00:00:00 2001
+From: Yicong Yang
+Date: Tue, 30 May 2023 16:25:07 +0800
+Subject: [PATCH] sched/fair: Don't balance task to its current running CPU
+
+References: git fixes (sched)
+Patch-mainline: v6.5-rc1
+Git-commit: 0dd37d6dd33a9c23351e6115ae8cdac7863bc7de
+
+We've run into a case where the balancer tries to balance a
+migration-disabled task and triggers the warning in set_task_cpu()
+like below:
+
+ ------------[ cut here ]------------
+ WARNING: CPU: 7 PID: 0 at kernel/sched/core.c:3115 set_task_cpu+0x188/0x240
+ Modules linked in: hclgevf xt_CHECKSUM ipt_REJECT nf_reject_ipv4 <...snip>
+ CPU: 7 PID: 0 Comm: swapper/7 Kdump: loaded Tainted: G O 6.1.0-rc4+ #1
+ Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V5.B221.01 12/09/2021
+ pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
+ pc : set_task_cpu+0x188/0x240
+ lr : load_balance+0x5d0/0xc60
+ sp : ffff80000803bc70
+ x29: ffff80000803bc70 x28: ffff004089e190e8 x27: ffff004089e19040
+ x26: ffff007effcabc38 x25: 0000000000000000 x24: 0000000000000001
+ x23: ffff80000803be84 x22: 000000000000000c x21: ffffb093e79e2a78
+ x20: 000000000000000c x19: ffff004089e19040 x18: 0000000000000000
+ x17: 0000000000001fad x16: 0000000000000030 x15: 0000000000000000
+ x14: 0000000000000003 x13: 0000000000000000 x12: 0000000000000000
+ x11: 0000000000000001 x10: 0000000000000400 x9 : ffffb093e4cee530
+ x8 : 00000000fffffffe x7 : 0000000000ce168a x6 : 000000000000013e
+ x5 : 00000000ffffffe1 x4 : 0000000000000001 x3 : 0000000000000b2a
+ x2 : 0000000000000b2a x1 : ffffb093e6d6c510 x0 : 0000000000000001
+ Call trace:
+  set_task_cpu+0x188/0x240
+  load_balance+0x5d0/0xc60
+  rebalance_domains+0x26c/0x380
+  _nohz_idle_balance.isra.0+0x1e0/0x370
+  run_rebalance_domains+0x6c/0x80
+  __do_softirq+0x128/0x3d8
+  ____do_softirq+0x18/0x24
+  call_on_irq_stack+0x2c/0x38
+  do_softirq_own_stack+0x24/0x3c
+  __irq_exit_rcu+0xcc/0xf4
+  irq_exit_rcu+0x18/0x24
+  el1_interrupt+0x4c/0xe4
+  el1h_64_irq_handler+0x18/0x2c
+  el1h_64_irq+0x74/0x78
+  arch_cpu_idle+0x18/0x4c
+  default_idle_call+0x58/0x194
+  do_idle+0x244/0x2b0
+  cpu_startup_entry+0x30/0x3c
+  secondary_start_kernel+0x14c/0x190
+  __secondary_switched+0xb0/0xb4
+ ---[ end trace 0000000000000000 ]---
+
+Further investigation shows that the warning is superfluous: the
+migration-disabled task is simply going to be migrated to its current
+running CPU. This is because, on load balance, if the dst_cpu is not
+allowed by the task, we re-select a new_dst_cpu as a candidate. If no
+task can be balanced to dst_cpu, we try to balance the task to the
+new_dst_cpu instead. In this case, when the migration-disabled task is
+not on a CPU, it is only allowed to run on its current CPU, so load
+balance selects its current CPU as new_dst_cpu and later triggers the
+warning above.
+
+The new_dst_cpu is chosen from env->dst_grpmask. Currently it contains
+the CPUs in sched_group_span(), and with overlapping groups it is
+possible to run into this case. This patch makes env->dst_grpmask use
+group_balance_mask(), which excludes any CPUs of the busiest group, and
+solves the issue. For balancing in a domain with no overlapping groups
+the behaviour stays the same as before.
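In load_balance() terms, new_dst_cpu is picked from env->dst_grpmask intersected with the task's allowed CPUs. A migration-disabled task allows only its current CPU, so the fix only has to guarantee that this CPU cannot appear in dst_grpmask. A toy sketch with bitmaps; the mask values are invented purely for illustration:

    #include <stdio.h>

    int main(void)
    {
            /* Migration-disabled task: only its current CPU2 is allowed. */
            unsigned long cpus_allowed = 1UL << 2;

            /* With overlapping groups, sched_group_span() may include CPU2... */
            unsigned long span_mask = 0x0e;    /* CPUs 1,2,3 */
            /* ...while group_balance_mask() excludes the busiest group's CPUs. */
            unsigned long balance_mask = 0x0a; /* CPUs 1,3 */

            /* Old: CPU2 is a candidate, so the task would "move" to itself. */
            printf("span    & allowed = %#lx\n", span_mask & cpus_allowed);    /* 0x4 */
            /* New: empty intersection, no bogus new_dst_cpu is selected. */
            printf("balance & allowed = %#lx\n", balance_mask & cpus_allowed); /* 0 */
            return 0;
    }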
+
+Suggested-by: Vincent Guittot
+Signed-off-by: Yicong Yang
+Signed-off-by: Peter Zijlstra (Intel)
+Reviewed-by: Vincent Guittot
+Link: https://lore.kernel.org/r/20230530082507.10444-1-yangyicong@huawei.com
+Signed-off-by: Mel Gorman
+---
+ kernel/sched/fair.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
+index e5152b0bcf67..dd85cf9d8187 100644
+--- a/kernel/sched/fair.c
++++ b/kernel/sched/fair.c
+@@ -8715,7 +8715,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
+ 		.sd		= sd,
+ 		.dst_cpu	= this_cpu,
+ 		.dst_rq		= this_rq,
+-		.dst_grpmask	= sched_group_span(sd->groups),
++		.dst_grpmask	= group_balance_mask(sd->groups),
+ 		.idle		= idle,
+ 		.loop_break	= sched_nr_migrate_break,
+ 		.cpus		= cpus,
diff --git a/patches.suse/sched-rt-Minimize-rq-lock-contention-in-do_sched_rt_period_timer.patch b/patches.suse/sched-rt-Minimize-rq-lock-contention-in-do_sched_rt_period_timer.patch
new file mode 100644
index 0000000..00556c4
--- /dev/null
+++ b/patches.suse/sched-rt-Minimize-rq-lock-contention-in-do_sched_rt_period_timer.patch
@@ -0,0 +1,52 @@
+From 782f0d9f62f0e8cb135e2991fdc71356d6c15e6b Mon Sep 17 00:00:00 2001
+From: Dave Kleikamp
+Date: Mon, 15 May 2017 14:14:13 -0500
+Subject: [PATCH] sched/rt: Minimize rq->lock contention in
+ do_sched_rt_period_timer()
+
+References: git fixes (sched)
+Patch-mainline: v4.13-rc1
+Git-commit: c249f255aab86b9b187ba319b9d2684841ac7c8d
+
+With CONFIG_RT_GROUP_SCHED=y, do_sched_rt_period_timer() sequentially
+takes each CPU's rq->lock. On a large, busy system, the cumulative time it
+takes to acquire each lock can be excessive, even triggering a watchdog
+timeout.
+
+If rt_rq->rt_time and rt_rq->rt_nr_running are both zero, this function does
+nothing while holding the lock, so don't bother taking it at all.
+
+Signed-off-by: Dave Kleikamp
+Signed-off-by: Peter Zijlstra (Intel)
+Cc: Linus Torvalds
+Cc: Peter Zijlstra
+Cc: Thomas Gleixner
+Link: http://lkml.kernel.org/r/a767637b-df85-912f-ba69-c90ee00a3fb6@oracle.com
+Signed-off-by: Ingo Molnar
+Signed-off-by: Mel Gorman
+---
+ kernel/sched/rt.c | 11 +++++++++++
+ 1 file changed, 11 insertions(+)
+
+diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
+index ba240da708bb..b28045f8a3a0 100644
+--- a/kernel/sched/rt.c
++++ b/kernel/sched/rt.c
+@@ -829,6 +829,17 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
+ 	int enqueue = 0;
+ 	struct rt_rq *rt_rq = sched_rt_period_rt_rq(rt_b, i);
+ 	struct rq *rq = rq_of_rt_rq(rt_rq);
++	int skip;
++
++	/*
++	 * When span == cpu_online_mask, taking each rq->lock
++	 * can be time-consuming. Try to avoid it when possible.
++	 */
++	raw_spin_lock(&rt_rq->rt_runtime_lock);
++	skip = !rt_rq->rt_time && !rt_rq->rt_nr_running;
++	raw_spin_unlock(&rt_rq->rt_runtime_lock);
++	if (skip)
++		continue;
+ 
+ 	raw_spin_lock(&rq->lock);
+ 	if (rt_rq->rt_time) {
diff --git a/patches.suse/sched-rt-Restore-rt_runtime-after-disabling-RT_RUNTIME_SHARE.patch b/patches.suse/sched-rt-Restore-rt_runtime-after-disabling-RT_RUNTIME_SHARE.patch
new file mode 100644
index 0000000..a98c1f0
--- /dev/null
+++ b/patches.suse/sched-rt-Restore-rt_runtime-after-disabling-RT_RUNTIME_SHARE.patch
@@ -0,0 +1,73 @@
+From dad98a02679182b373701f296f498d0e83051b86 Mon Sep 17 00:00:00 2001
+From: Hailong Liu
+Date: Wed, 18 Jul 2018 08:46:55 +0800
+Subject: [PATCH] sched/rt: Restore rt_runtime after disabling RT_RUNTIME_SHARE
+
+References: git fixes (sched)
+Patch-mainline: v4.18-rc8
+Git-commit: f3d133ee0a17d5694c6f21873eec9863e11fa423
+
+The NO_RT_RUNTIME_SHARE feature is used to prevent a CPU from borrowing
+runtime for a spinning RT task.
+
+However, if the RT_RUNTIME_SHARE feature is enabled and an rt_rq has
+already borrowed enough rt_runtime at the beginning, rt_runtime cannot
+be restored to its initial bandwidth after we disable RT_RUNTIME_SHARE.
+
+E.g. on my PC with 4 cores, the procedure to reproduce is:
+1) Make sure RT_RUNTIME_SHARE is enabled
+ cat /sys/kernel/debug/sched_features
+  GENTLE_FAIR_SLEEPERS START_DEBIT NO_NEXT_BUDDY LAST_BUDDY
+  CACHE_HOT_BUDDY WAKEUP_PREEMPTION NO_HRTICK NO_DOUBLE_TICK
+  LB_BIAS NONTASK_CAPACITY TTWU_QUEUE NO_SIS_AVG_CPU SIS_PROP
+  NO_WARN_DOUBLE_CLOCK RT_PUSH_IPI RT_RUNTIME_SHARE NO_LB_MIN
+  ATTACH_AGE_LOAD WA_IDLE WA_WEIGHT WA_BIAS
+2) Start a spinning RT task
+ ./loop_rr &
+3) Set its affinity to the last CPU
+ taskset -p 8 $pid_of_loop_rr
+4) Observe that the last CPU has borrowed runtime
+ cat /proc/sched_debug | grep rt_runtime
+  .rt_runtime : 950.000000
+  .rt_runtime : 900.000000
+  .rt_runtime : 950.000000
+  .rt_runtime : 1000.000000
+5) Disable RT_RUNTIME_SHARE
+ echo NO_RT_RUNTIME_SHARE > /sys/kernel/debug/sched_features
+6) Observe that rt_runtime has not been restored
+ cat /proc/sched_debug | grep rt_runtime
+  .rt_runtime : 950.000000
+  .rt_runtime : 900.000000
+  .rt_runtime : 950.000000
+  .rt_runtime : 1000.000000
+
+This patch helps restore rt_runtime after RT_RUNTIME_SHARE is disabled.
+
+Signed-off-by: Hailong Liu
+Signed-off-by: Jiang Biao
+Signed-off-by: Peter Zijlstra (Intel)
+Cc: Linus Torvalds
+Cc: Peter Zijlstra
+Cc: Thomas Gleixner
+Cc: zhong.weidong@zte.com.cn
+Link: http://lkml.kernel.org/r/1531874815-39357-1-git-send-email-liu.hailong6@zte.com.cn
+Signed-off-by: Ingo Molnar
+Signed-off-by: Mel Gorman
+---
+ kernel/sched/rt.c | 2 ++
+ 1 file changed, 2 insertions(+)
+
+diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
+index b28045f8a3a0..89b3448e2686 100644
+--- a/kernel/sched/rt.c
++++ b/kernel/sched/rt.c
+@@ -836,6 +836,8 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
+ 	 * can be time-consuming. Try to avoid it when possible.
+ 	 */
+ 	raw_spin_lock(&rt_rq->rt_runtime_lock);
++	if (!sched_feat(RT_RUNTIME_SHARE) && rt_rq->rt_runtime != RUNTIME_INF)
++		rt_rq->rt_runtime = rt_b->rt_runtime;
+ 	skip = !rt_rq->rt_time && !rt_rq->rt_nr_running;
+ 	raw_spin_unlock(&rt_rq->rt_runtime_lock);
+ 	if (skip)
diff --git a/series.conf b/series.conf
index 6275774..1d78484 100644
--- a/series.conf
+++ b/series.conf
@@ -1223,6 +1223,7 @@
 	patches.suse/0001-smp-Avoid-sending-needless-IPI-in-smp_call_function_.patch
 	patches.suse/0001-smp-cpumask-Use-non-atomic-cpumask_-set-clear-_cpu.patch
 	patches.suse/0001-sched-core-Allow-__sched_setscheduler-in-interrupts-.patch
+	patches.suse/sched-rt-Minimize-rq-lock-contention-in-do_sched_rt_period_timer.patch
 	patches.suse/0001-x86-tsc-Fold-set_cyc2ns_scale-into-caller.patch
 	patches.suse/0001-sched-clock-Fix-early-boot-preempt-assumption-in-__s.patch
 	patches.suse/0001-sched-deadline-Zero-out-positive-runtime-after-throt.patch
@@ -36054,6 +36055,7 @@
 	patches.suse/perf-x86-intel-fix-unwind-errors-from-pebs-entries-mk-ii.patch
 	patches.suse/perf-core-fix-crash-when-using-hw-tracing-kernel-filters.patch
 	patches.suse/stop_machine-Disable-preemption-after-queueing-stopp.patch
+	patches.suse/sched-rt-Restore-rt_runtime-after-disabling-RT_RUNTIME_SHARE.patch
 	patches.suse/x86-entry-64-remove-ebx-handling-from-error_entry-exit.patch
 	patches.suse/x86-boot-fix-if_changed-build-flip-flop-bug
 	patches.suse/squashfs-more-metadata-hardening.patch
@@ -53265,6 +53267,7 @@
 	patches.suse/nfsd-degraded-slot-count-more-gracefully-as-allocati.patch
 	patches.suse/ima-always-return-negative-code-for-error.patch
 	patches.suse/efi-Restrict-efivar_ssdt_load-when-the-kernel-is-loc.patch
+	patches.suse/sched-core-Fix-migration-to-invalid-CPU-in-__set_cpus_allowed_ptr.patch
 	patches.suse/powerpc-book3s64-radix-Remove-WARN_ON-in-destroy_con.patch
 	patches.suse/KVM-PPC-Book3S-HV-use-smp_mb-when-setting-clearing-h.patch
 	patches.suse/powerpc-pseries-Read-TLB-Block-Invalidate-Characteri.patch
@@ -56026,6 +56029,7 @@
 	patches.suse/sched-fair-reorder-enqueue-dequeue_task_fair-path.patch
 	patches.suse/sched-fair-fix-reordering-of-enqueue-dequeue_task_fair.patch
 	patches.suse/sched-fair-fix-enqueue_task_fair-warning.patch
+	patches.suse/sched-Avoid-scale-real-weight-down-to-zero.patch
 	patches.suse/irqchip-bcm2835-Quiesce-IRQs-left-enabled-by-bootloa.patch
 	patches.suse/timekeeping-Prevent-32bit-truncation-in-scale64_chec.patch
 	patches.suse/fbdev-g364fb-Fix-build-failure.patch
@@ -57719,6 +57723,7 @@
 	patches.suse/0001-block-improve-discard-bio-alignment-in-__blkdev_issu.patch
 	patches.suse/block-Use-non-_rcu-version-of-list-functions-for-tag.patch
 	patches.suse/s390-ap-rework-crypto-config-info-and-default-domain-code
+	patches.suse/sched-correct-SD_flags-returned-by-tl-sd_flags.patch
 	patches.suse/x86-speculation-merge-one-test-in-spectre_v2_user_select_mitigation.patch
 	patches.suse/x86-mce-inject-fix-a-wrong-assignment-of-i_mce-status.patch
 	patches.suse/platform-x86-intel-hid-Fix-return-value-check-in-che.patch
@@ -59429,6 +59434,7 @@
 	patches.suse/s390-cio-fix-use-after-free-in-ccw_device_destroy_console
 	patches.suse/s390-smp-perform-initial-cpu-reset-also-for-smt-siblings
 	patches.suse/x86-kprobes-restore-btf-if-the-single-stepping-is-cancelled.patch
+	patches.suse/sched-Reenable-interrupts-in-do_sched_yield.patch
 	patches.suse/x86-apic-Fix-x2apic-enablement-without-interrupt-rem.patch
 	patches.suse/x86-msi-Only-use-high-bits-of-MSI-address-for-DMAR-u.patch
 	patches.suse/x86-ioapic-Handle-Extended-Destination-ID-field-in-R.patch
@@ -62360,6 +62366,7 @@
patches.suse/cifs-release-lock-earlier-in-dequeue_mid-error-case.patch patches.suse/cifs-fix-memory-leak-of-smb3_fs_context_dup-server_hostname.patch patches.suse/cifs-fix-potential-use-after-free-bugs.patch + patches.suse/sched-core-Mitigate-race-cpus_share_cache-update_top_cache_domain.patch patches.suse/PCI-MSI-Destroy-sysfs-before-freeing-entries.patch patches.suse/msft-hv-2480-x86-hyperv-Fix-NULL-deref-in-set_hv_tscchange_cb-if-.patch patches.suse/printk-Remove-printk.h-inclusion-in-percpu.h.patch @@ -64216,6 +64223,7 @@ patches.suse/svcrdma-Prevent-page-release-when-nothing-was-receiv.patch patches.suse/x86-retbleed-add-_x86_return_thunk-alignment-checks.patch patches.suse/x86-microcode-AMD-Load-late-on-both-threads-too.patch + patches.suse/sched-fair-Don-t-balance-task-to-its-current-running-CPU.patch patches.suse/net-mana-Add-support-for-vlan-tagging.patch patches.suse/net-nfc-Fix-use-after-free-caused-by-nfc_llcp_find_l.patch patches.suse/0003-fbdev-omapfb-lcd_mipid-Fix-an-error-handling-path-in.patch