From 1c92c3ec4d7635c2d8159e6dec886758bb6e5623 Mon Sep 17 00:00:00 2001
From: Denis Kirjanov
Date: Oct 26 2023 09:29:36 +0000
Subject: Merge branch 'users/mgorman/SLE12-SP5/for-next' into SLE12-SP5

Pull scheduler fixes from Mel Gorman

---
diff --git a/blacklist.conf b/blacklist.conf
index 16e4991..777fdc3 100644
--- a/blacklist.conf
+++ b/blacklist.conf
@@ -3032,3 +3032,23 @@ f2d3155e2a6bac44d16f04415a321e8707d895c6 # too many changes in between
 a8faed3a02eeb75857a3b5d660fa80fe79db77a3 # for architecture and configuration irrelevant in SLE12
 36d763509be326bb383b1b1852a129ff58d74e3b # comment only
 ac52578d6e8d300dd50f790f29a24169b1edd26c # Fixes: tag was wrong
+1ca4fa3ab604734e38e2a3000c9abf788512ffa7 # Cosmetic, debugging patch for unused config
+99687cdbb3f6c8e32bcc7f37496e811f30460e48 # Sparse warning fix
+1b02cd6a2d7f3e2a6a5262887d2cb2912083e42f # Missing dependencies, fix only in the event of a customer bug
+1a010e29cfa00fee2888fd2fd4983f848cbafb58 # Guard against unlikely tuning value, fix only in the event of a customer bug
+d1e7fd6462ca9fc76650fbe6ca800e35b24267da # KABI hazard, fix only in the event of a customer bug
+26a8b12747c975b33b4a82d62e4a307e1c07f31b # Complex dependencies missing, fix only in the event of a customer bug
+01cfcde9c26d8555f0e6e9aea9d6049f87683998 # Complex dependencies missing that applies to an extreme corner case, fix only in the event of a customer bug
+e5c6b312ce3cc97e90ea159446e6bfa06645364d # Fix to experimental feature, fix only in the event of a customer bug
+83d40a61046f73103b4e5d8f1310261487ff63b0 # Mostly cosmetic fix to a build warning
+42288cb44c4b5fff7653bc392b583a2b8bd6a8c0 # Fix only in the event of a customer bug
+dd02d4234c9a2214a81c57a16484304a1a51872a # Potentially surprising change in behaviour, fix only in the event of a customer bug
+9b58e976b3b391c0cf02e038d53dd0478ed3013c # Potentially surprising change in behaviour, fix only in the event of a customer bug
+248cc9993d1cc12b8e9ed716cc3fc09f6c3517dd # Potentially surprising change in behaviour, fix only in the event of a customer bug
+e4a38402c36e42df28eb1a5394be87e6571fb48a # KABI hazard, fix only in the event of a customer bug
+244226035a1f9b2b6c326e55ae5188fab4f428cb # Complex dependencies missing, fix only in the event of a customer bug
+b759caa1d9f667b94727b2ad12589cbc4ce13a82 # Complex dependencies missing, fix only in the event of a customer bug
+c56ab1b3506ba0e7a872509964b100912bde165d # Complex dependencies missing, fix only in the event of a customer bug
+d81304bc6193554014d4372a01debdf65e1e9a4d # Complex dependencies missing, fix only in the event of a customer bug
+44c7b80bffc3a657a36857098d5d9c49d94e652b # Complex dependencies missing, fix only in the event of a customer bug
+aa69c36f31aadc1669bfa8a3de6a47b5e6c98ee8 # Complex dependencies missing, fix only in the event of a customer bug
diff --git a/patches.suse/0001-sched-rt-Fix-rq-clock_update_flags-RQCF_ACT_SKIP-war.patch b/patches.suse/0001-sched-rt-Fix-rq-clock_update_flags-RQCF_ACT_SKIP-war.patch
index ef95f81..de3daaa 100644
--- a/patches.suse/0001-sched-rt-Fix-rq-clock_update_flags-RQCF_ACT_SKIP-war.patch
+++ b/patches.suse/0001-sched-rt-Fix-rq-clock_update_flags-RQCF_ACT_SKIP-war.patch
@@ -64,7 +64,7 @@ index 73dac5f85e7d..bba4f0111a36 100644
 --- a/kernel/sched/rt.c
 +++ b/kernel/sched/rt.c
 @@ -831,6 +831,8 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
-	struct rq *rq = rq_of_rt_rq(rt_rq);
+	continue;
 
 	raw_spin_lock(&rq->lock);
 +	update_rq_clock(rq);
diff --git a/patches.suse/sched-Avoid-scale-real-weight-down-to-zero.patch b/patches.suse/sched-Avoid-scale-real-weight-down-to-zero.patch
new file mode 100644
index 0000000..a7487e8
--- /dev/null
+++ b/patches.suse/sched-Avoid-scale-real-weight-down-to-zero.patch
@@ -0,0 +1,79 @@
+From 3c4dafc1e19a1a043b93801202c99242f88b5463 Mon Sep 17 00:00:00 2001
+From: Michael Wang
+Date: Wed, 18 Mar 2020 10:15:15 +0800
+Subject: [PATCH] sched: Avoid scale real weight down to zero
+
+References: git fixes (sched)
+Patch-mainline: v5.7-rc1
+Git-commit: 26cf52229efc87e2effa9d788f9b33c40fb3358a
+
+During our testing, we found a case where group shares were no longer
+working correctly. The cgroup topology looks like:
+
+  /sys/fs/cgroup/cpu/A      (shares=102400)
+  /sys/fs/cgroup/cpu/A/B    (shares=2)
+  /sys/fs/cgroup/cpu/A/B/C  (shares=1024)
+
+  /sys/fs/cgroup/cpu/D      (shares=1024)
+  /sys/fs/cgroup/cpu/D/E    (shares=1024)
+  /sys/fs/cgroup/cpu/D/E/F  (shares=1024)
+
+The same benchmark runs in groups C and F with no other tasks running;
+the benchmark is capable of consuming all of the CPUs.
+
+We would expect group C to win more CPU resources, since it can enjoy
+all the shares of group A, but it is F that wins far more.
+
+The reason is group B, whose shares are 2: since
+A->cfs_rq.load.weight == B->se.load.weight == B->shares/nr_cpus,
+A->cfs_rq.load.weight becomes very small.
+
+And in calc_group_shares() we calculate shares as:
+
+  load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg);
+  shares = (tg_shares * load) / tg_weight;
+
+Since 'cfs_rq->load.weight' is so small, the load becomes 0 after the
+scale down; although 'tg_shares' is 102400, the shares of the se that
+stands for group A on the root cfs_rq become 2.
+
+Meanwhile, the se of D on the root cfs_rq is far bigger than 2, so it
+wins the battle.
+
+Thus when scale_load_down() scales the real weight down to 0, it no
+longer tells the real story: the caller gets the wrong information and
+the calculation goes wrong.
+
+This patch adds a check in scale_load_down() so that the real weight is
+always >= MIN_SHARES after scaling; with it applied, group C wins as
+expected.
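To make the arithmetic concrete: with SCHED_FIXEDPOINT_SHIFT == 10, a weight of 2 shifts down to 0 under the old macro, while the fixed macro clamps any non-zero weight to MIN_SHARES (2). A minimal user-space sketch of the before/after behaviour; the max() helper and the demo main() are illustrative additions, and the statement expression requires GCC or Clang:

    #include <stdio.h>

    #define SCHED_FIXEDPOINT_SHIFT 10
    #define max(a, b) ((a) > (b) ? (a) : (b))

    /* Old behaviour: small weights underflow to 0. */
    #define scale_load_down_old(w) ((w) >> SCHED_FIXEDPOINT_SHIFT)

    /* Fixed behaviour: non-zero weights never scale below 2 (MIN_SHARES). */
    #define scale_load_down_fixed(w) \
    ({ \
            unsigned long __w = (w); \
            if (__w) \
                    __w = max(2UL, __w >> SCHED_FIXEDPOINT_SHIFT); \
            __w; \
    })

    int main(void)
    {
            /* B->shares/nr_cpus with shares=2 yields a tiny weight. */
            unsigned long w = 2;

            printf("old:   %lu -> %lu\n", w, scale_load_down_old(w));   /* 2 -> 0 */
            printf("fixed: %lu -> %lu\n", w, scale_load_down_fixed(w)); /* 2 -> 2 */
            return 0;
    }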
+
+Suggested-by: Peter Zijlstra
+Signed-off-by: Michael Wang
+Signed-off-by: Peter Zijlstra (Intel)
+Reviewed-by: Vincent Guittot
+Link: https://lkml.kernel.org/r/38e8e212-59a1-64b2-b247-b6d0b52d8dc1@linux.alibaba.com
+Signed-off-by: Mel Gorman
+---
+ kernel/sched/sched.h | 8 +++++++-
+ 1 file changed, 7 insertions(+), 1 deletion(-)
+
+diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
+index ba32909bbecc..f516e7a9d43f 100644
+--- a/kernel/sched/sched.h
++++ b/kernel/sched/sched.h
+@@ -88,7 +88,13 @@ static inline void cpu_load_update_active(struct rq *this_rq) { }
+ #ifdef CONFIG_64BIT
+ # define NICE_0_LOAD_SHIFT	(SCHED_FIXEDPOINT_SHIFT + SCHED_FIXEDPOINT_SHIFT)
+ # define scale_load(w)		((w) << SCHED_FIXEDPOINT_SHIFT)
+-# define scale_load_down(w)	((w) >> SCHED_FIXEDPOINT_SHIFT)
++# define scale_load_down(w) \
++({ \
++	unsigned long __w = (w); \
++	if (__w) \
++		__w = max(2UL, __w >> SCHED_FIXEDPOINT_SHIFT); \
++	__w; \
++})
+ #else
+ # define NICE_0_LOAD_SHIFT	(SCHED_FIXEDPOINT_SHIFT)
+ # define scale_load(w)		(w)
diff --git a/patches.suse/sched-Reenable-interrupts-in-do_sched_yield.patch b/patches.suse/sched-Reenable-interrupts-in-do_sched_yield.patch
new file mode 100644
index 0000000..2be0a65
--- /dev/null
+++ b/patches.suse/sched-Reenable-interrupts-in-do_sched_yield.patch
@@ -0,0 +1,42 @@
+From 4d35a8d31a8bb37ffc33267351eae679663b18db Mon Sep 17 00:00:00 2001
+From: Thomas Gleixner
+Date: Tue, 20 Oct 2020 16:46:55 +0200
+Subject: [PATCH] sched: Reenable interrupts in do_sched_yield()
+
+References: git fixes (sched)
+Patch-mainline: v5.11-rc1
+Git-commit: 345a957fcc95630bf5535d7668a59ed983eb49a7
+
+do_sched_yield() invokes schedule() with interrupts disabled, which is
+not allowed. This goes back to the pre-git era, to commit a6efb709806c
+("[PATCH] irqlock patch 2.5.27-H6") in the history tree.
+
+Reenable interrupts and remove the misleading comment which "explains" it.
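Pieced together from the hunk that follows, the tail of the syscall after the fix reads roughly as below; the function prologue is reconstructed from the upstream function and may differ slightly in this tree. The point is the ordering: the lock is dropped with rq_unlock_irq(), so interrupts are back on before schedule() runs.

    SYSCALL_DEFINE0(sched_yield)
    {
            struct rq_flags rf;
            struct rq *rq;

            rq = this_rq_lock_irq(&rf);   /* IRQs off, rq->lock held */

            schedstat_inc(rq->yld_count);
            current->sched_class->yield_task(rq);

            preempt_disable();
            rq_unlock_irq(rq, &rf);       /* drop lock AND re-enable IRQs */
            sched_preempt_enable_no_resched();

            schedule();                   /* now entered with IRQs enabled */
            return 0;
    }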
+
+Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
+Signed-off-by: Thomas Gleixner
+Signed-off-by: Peter Zijlstra (Intel)
+Link: https://lkml.kernel.org/r/87r1pt7y5c.fsf@nanos.tec.linutronix.de
+Signed-off-by: Mel Gorman
+---
+ kernel/sched/core.c | 6 +-----
+ 1 file changed, 1 insertion(+), 5 deletions(-)
+
+diff --git a/kernel/sched/core.c b/kernel/sched/core.c
+index 616ffabb6d9f..a9fa554d73eb 100644
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -5043,12 +5043,8 @@ SYSCALL_DEFINE0(sched_yield)
+ 	schedstat_inc(rq->yld_count);
+ 	current->sched_class->yield_task(rq);
+ 
+-	/*
+-	 * Since we are going to call schedule() anyway, there's
+-	 * no need to preempt or enable interrupts:
+-	 */
+ 	preempt_disable();
+-	rq_unlock(rq, &rf);
++	rq_unlock_irq(rq, &rf);
+ 	sched_preempt_enable_no_resched();
+ 
+ 	schedule();
diff --git a/patches.suse/sched-core-Fix-migration-to-invalid-CPU-in-__set_cpus_allowed_ptr.patch b/patches.suse/sched-core-Fix-migration-to-invalid-CPU-in-__set_cpus_allowed_ptr.patch
new file mode 100644
index 0000000..db8d93f
--- /dev/null
+++ b/patches.suse/sched-core-Fix-migration-to-invalid-CPU-in-__set_cpus_allowed_ptr.patch
@@ -0,0 +1,83 @@
+From 9aa958f52d3b3d71544358106531f20d2887a466 Mon Sep 17 00:00:00 2001
+From: KeMeng Shi
+Date: Mon, 16 Sep 2019 06:53:28 +0000
+Subject: [PATCH] sched/core: Fix migration to invalid CPU in
+ __set_cpus_allowed_ptr()
+
+References: git fixes (sched)
+Patch-mainline: v5.4-rc1
+Git-commit: 714e501e16cd473538b609b3e351b2cc9f7f09ed
+
+An oops can be triggered in the scheduler when running qemu on arm64:
+
+ Unable to handle kernel paging request at virtual address ffff000008effe40
+ Internal error: Oops: 96000007 [#1] SMP
+ Process migration/0 (pid: 12, stack limit = 0x00000000084e3736)
+ pstate: 20000085 (nzCv daIf -PAN -UAO)
+ pc : __ll_sc___cmpxchg_case_acq_4+0x4/0x20
+ lr : move_queued_task.isra.21+0x124/0x298
+ ...
+ Call trace:
+  __ll_sc___cmpxchg_case_acq_4+0x4/0x20
+  __migrate_task+0xc8/0xe0
+  migration_cpu_stop+0x170/0x180
+  cpu_stopper_thread+0xec/0x178
+  smpboot_thread_fn+0x1ac/0x1e8
+  kthread+0x134/0x138
+  ret_from_fork+0x10/0x18
+
+__set_cpus_allowed_ptr() chooses an active dest_cpu from the affinity
+mask to migrate the process to if the process is not currently running
+on any of the CPUs specified in the affinity mask. It will choose an
+invalid dest_cpu (dest_cpu >= nr_cpu_ids, 1024 in my virtual machine) if
+the CPUs in the affinity mask are deactivated by cpu_down() after the
+cpumask_intersects() check. The subsequent cpumask_test_cpu() of dest_cpu
+then indexes past the end of the mask and may pass if the corresponding
+bit happens to be set. As a consequence, the kernel accesses an invalid
+rq address associated with the invalid CPU in
+migration_cpu_stop->__migrate_task->move_queued_task and the oops occurs.
+
+To reproduce the crash:
+
+ 1) A process repeatedly binds itself to cpu0 and cpu1 in turn by calling
+    sched_setaffinity.
+
+ 2) A shell script repeatedly does "echo 0 > /sys/devices/system/cpu/cpu1/online"
+    and "echo 1 > /sys/devices/system/cpu/cpu1/online" in turn.
+
+ 3) The oops appears if the invalid CPU bit is set in memory after the
+    cpumask is tested.
+
+Signed-off-by: KeMeng Shi
+Signed-off-by: Peter Zijlstra (Intel)
+Reviewed-by: Valentin Schneider
+Cc: Linus Torvalds
+Cc: Peter Zijlstra
+Cc: Thomas Gleixner
+Link: https://lkml.kernel.org/r/1568616808-16808-1-git-send-email-shikemeng@huawei.com
+Signed-off-by: Ingo Molnar
+Signed-off-by: Mel Gorman
+---
+ kernel/sched/core.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/kernel/sched/core.c b/kernel/sched/core.c
+index dec420fefa87..616ffabb6d9f 100644
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -1173,7 +1173,8 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
+ 	if (cpumask_equal(&p->cpus_allowed, new_mask))
+ 		goto out;
+ 
+-	if (!cpumask_intersects(new_mask, cpu_valid_mask)) {
++	dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask);
++	if (dest_cpu >= nr_cpu_ids) {
+ 		ret = -EINVAL;
+ 		goto out;
+ 	}
+@@ -1194,7 +1195,6 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
+ 	if (cpumask_test_cpu(task_cpu(p), new_mask))
+ 		goto out;
+ 
+-	dest_cpu = cpumask_any_and(cpu_valid_mask, new_mask);
+ 	if (task_running(rq, p) || p->state == TASK_WAKING) {
+ 		struct migration_arg arg = { p, dest_cpu };
+ 		/* Need help from migration thread: drop lock and wait. */
diff --git a/patches.suse/sched-core-Mitigate-race-cpus_share_cache-update_top_cache_domain.patch b/patches.suse/sched-core-Mitigate-race-cpus_share_cache-update_top_cache_domain.patch
new file mode 100644
index 0000000..d1bf89c
--- /dev/null
+++ b/patches.suse/sched-core-Mitigate-race-cpus_share_cache-update_top_cache_domain.patch
@@ -0,0 +1,58 @@
+From 61492d9ac3fbbbed05f0e771bcddc8064d34a6c4 Mon Sep 17 00:00:00 2001
+From: Vincent Donnefort
+Date: Thu, 4 Nov 2021 17:51:20 +0000
+Subject: [PATCH] sched/core: Mitigate race
+ cpus_share_cache()/update_top_cache_domain()
+
+References: git fixes (sched)
+Patch-mainline: v5.16-rc1
+Git-commit: 42dc938a590c96eeb429e1830123fef2366d9c80
+
+Nothing protects the access to the per_cpu variable sd_llc_id. When testing
+the same CPU (i.e. this_cpu == that_cpu), a race condition exists with
+update_top_cache_domain(). One scenario being:
+
+              CPU1                                 CPU2
+  ==================================================================
+
+  per_cpu(sd_llc_id, CPUX) => 0
+                                    partition_sched_domains_locked()
+                                      detach_destroy_domains()
+  cpus_share_cache(CPUX, CPUX)          update_top_cache_domain(CPUX)
+    per_cpu(sd_llc_id, CPUX) => 0
+                                          per_cpu(sd_llc_id, CPUX) = CPUX
+    per_cpu(sd_llc_id, CPUX) => CPUX
+    return false
+
+ttwu_queue_cond() wouldn't catch smp_processor_id() == cpu and the result
+is a warning triggered from ttwu_queue_wakelist().
+
+Avoid such a race in cpus_share_cache() by always returning true when
+this_cpu == that_cpu.
+
+Fixes: 518cd6234178 ("sched: Only queue remote wakeups when crossing cache boundaries")
+Reported-by: Jing-Ting Wu
+Signed-off-by: Vincent Donnefort
+Signed-off-by: Peter Zijlstra (Intel)
+Reviewed-by: Valentin Schneider
+Reviewed-by: Vincent Guittot
+Link: https://lore.kernel.org/r/20211104175120.857087-1-vincent.donnefort@arm.com
+Signed-off-by: Mel Gorman
+---
+ kernel/sched/core.c | 3 +++
+ 1 file changed, 3 insertions(+)
+
+diff --git a/kernel/sched/core.c b/kernel/sched/core.c
+index a9fa554d73eb..b27d09d514a1 100644
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -1892,6 +1892,9 @@ void wake_up_if_idle(int cpu)
+ 
+ bool cpus_share_cache(int this_cpu, int that_cpu)
+ {
++	if (this_cpu == that_cpu)
++		return true;
++
+ 	return per_cpu(sd_llc_id, this_cpu) == per_cpu(sd_llc_id, that_cpu);
+ }
+ #endif /* CONFIG_SMP */
diff --git a/patches.suse/sched-correct-SD_flags-returned-by-tl-sd_flags.patch b/patches.suse/sched-correct-SD_flags-returned-by-tl-sd_flags.patch
new file mode 100644
index 0000000..6a59154
--- /dev/null
+++ b/patches.suse/sched-correct-SD_flags-returned-by-tl-sd_flags.patch
@@ -0,0 +1,37 @@
+From 7f63cdadaf5dc6b8da65b2e0d69814ee7a795672 Mon Sep 17 00:00:00 2001
+From: Peng Liu
+Date: Tue, 9 Jun 2020 23:09:36 +0800
+Subject: [PATCH] sched: correct SD_flags returned by tl->sd_flags()
+
+References: git fixes (sched)
+Patch-mainline: v5.9-rc1
+Git-commit: 9b1b234bb86bcdcdb142e900d39b599185465dbb
+
+During sched domain init, we check whether non-topological SD_flags are
+returned by tl->sd_flags(); if any are found, we fire a warning and
+correct the violation, but the code failed to actually correct it.
+Correct this.
+
+Fixes: 143e1e28cb40 ("sched: Rework sched_domain topology definition")
+Signed-off-by: Peng Liu
+Signed-off-by: Peter Zijlstra (Intel)
+Reviewed-by: Vincent Guittot
+Reviewed-by: Valentin Schneider
+Link: https://lkml.kernel.org/r/20200609150936.GA13060@iZj6chx1xj0e0buvshuecpZ
+Signed-off-by: Mel Gorman
+---
+ kernel/sched/topology.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
+index 76a3380de05b..1c2906a99a14 100644
+--- a/kernel/sched/topology.c
++++ b/kernel/sched/topology.c
+@@ -1121,7 +1121,7 @@ sd_init(struct sched_domain_topology_level *tl,
+ 	sd_flags = (*tl->sd_flags)();
+ 	if (WARN_ONCE(sd_flags & ~TOPOLOGY_SD_FLAGS,
+ 			"wrong sd_flags in topology description\n"))
+-		sd_flags &= ~TOPOLOGY_SD_FLAGS;
++		sd_flags &= TOPOLOGY_SD_FLAGS;
+ 
+ 	*sd = (struct sched_domain){
+ 		.min_interval		= sd_weight,
diff --git a/patches.suse/sched-fair-Don-t-balance-task-to-its-current-running-CPU.patch b/patches.suse/sched-fair-Don-t-balance-task-to-its-current-running-CPU.patch
new file mode 100644
index 0000000..451d789
--- /dev/null
+++ b/patches.suse/sched-fair-Don-t-balance-task-to-its-current-running-CPU.patch
@@ -0,0 +1,93 @@
+From 2dda9d9e798105274ee3de4645756131b4926bc2 Mon Sep 17 00:00:00 2001
+From: Yicong Yang
+Date: Tue, 30 May 2023 16:25:07 +0800
+Subject: [PATCH] sched/fair: Don't balance task to its current running CPU
+
+References: git fixes (sched)
+Patch-mainline: v6.5-rc1
+Git-commit: 0dd37d6dd33a9c23351e6115ae8cdac7863bc7de
+
+We've run into a case where the balancer tries to balance a
+migration-disabled task and triggers the warning in set_task_cpu()
+like below:
+
+ ------------[ cut here ]------------
+ WARNING: CPU: 7 PID: 0 at kernel/sched/core.c:3115 set_task_cpu+0x188/0x240
+ Modules linked in: hclgevf xt_CHECKSUM ipt_REJECT nf_reject_ipv4 <...snip>
+ CPU: 7 PID: 0 Comm: swapper/7 Kdump: loaded Tainted: G O 6.1.0-rc4+ #1
+ Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V5.B221.01 12/09/2021
+ pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
+ pc : set_task_cpu+0x188/0x240
+ lr : load_balance+0x5d0/0xc60
+ sp : ffff80000803bc70
+ x29: ffff80000803bc70 x28: ffff004089e190e8 x27: ffff004089e19040
+ x26: ffff007effcabc38 x25: 0000000000000000 x24: 0000000000000001
+ x23: ffff80000803be84 x22: 000000000000000c x21: ffffb093e79e2a78
+ x20: 000000000000000c x19: ffff004089e19040 x18: 0000000000000000
+ x17: 0000000000001fad x16: 0000000000000030 x15: 0000000000000000
+ x14: 0000000000000003 x13: 0000000000000000 x12: 0000000000000000
+ x11: 0000000000000001 x10: 0000000000000400 x9 : ffffb093e4cee530
+ x8 : 00000000fffffffe x7 : 0000000000ce168a x6 : 000000000000013e
+ x5 : 00000000ffffffe1 x4 : 0000000000000001 x3 : 0000000000000b2a
+ x2 : 0000000000000b2a x1 : ffffb093e6d6c510 x0 : 0000000000000001
+ Call trace:
+  set_task_cpu+0x188/0x240
+  load_balance+0x5d0/0xc60
+  rebalance_domains+0x26c/0x380
+  _nohz_idle_balance.isra.0+0x1e0/0x370
+  run_rebalance_domains+0x6c/0x80
+  __do_softirq+0x128/0x3d8
+  ____do_softirq+0x18/0x24
+  call_on_irq_stack+0x2c/0x38
+  do_softirq_own_stack+0x24/0x3c
+  __irq_exit_rcu+0xcc/0xf4
+  irq_exit_rcu+0x18/0x24
+  el1_interrupt+0x4c/0xe4
+  el1h_64_irq_handler+0x18/0x2c
+  el1h_64_irq+0x74/0x78
+  arch_cpu_idle+0x18/0x4c
+  default_idle_call+0x58/0x194
+  do_idle+0x244/0x2b0
+  cpu_startup_entry+0x30/0x3c
+  secondary_start_kernel+0x14c/0x190
+  __secondary_switched+0xb0/0xb4
+ ---[ end trace 0000000000000000 ]---
+
+Further investigation shows that the warning is superfluous: the
+migration-disabled task is simply going to be migrated to its current
+running CPU. This is because, on load balance, if the dst_cpu is not
+allowed by the task, we re-select a new_dst_cpu as a candidate. If no
+task can be balanced to dst_cpu, we try to balance the task to the
+new_dst_cpu instead. In this case, when the migration-disabled task is
+not on a CPU, it is only allowed to run on its current CPU, so load
+balance selects its current CPU as new_dst_cpu and later triggers the
+warning above.
+
+The new_dst_cpu is chosen from env->dst_grpmask. Currently it contains
+the CPUs in sched_group_span(), and with overlapping groups it is
+possible to run into this case. This patch makes env->dst_grpmask use
+group_balance_mask(), which excludes any CPUs of the busiest group, and
+solves the issue. For balancing in a domain with no overlapping groups
+the behaviour stays the same as before.
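In load_balance() terms, new_dst_cpu is picked from env->dst_grpmask intersected with the task's allowed CPUs. A migration-disabled task allows only its current CPU, so the fix only has to guarantee that this CPU cannot appear in dst_grpmask. A toy sketch with bitmaps; the mask values are invented purely for illustration:

    #include <stdio.h>

    int main(void)
    {
            /* Migration-disabled task: only its current CPU2 is allowed. */
            unsigned long cpus_allowed = 1UL << 2;

            /* With overlapping groups, sched_group_span() may include CPU2... */
            unsigned long span_mask = 0x0e;    /* CPUs 1,2,3 */
            /* ...while group_balance_mask() excludes the busiest group's CPUs. */
            unsigned long balance_mask = 0x0a; /* CPUs 1,3 */

            /* Old: CPU2 is a candidate, so the task would "move" to itself. */
            printf("span    & allowed = %#lx\n", span_mask & cpus_allowed);    /* 0x4 */
            /* New: empty intersection, no bogus new_dst_cpu is selected. */
            printf("balance & allowed = %#lx\n", balance_mask & cpus_allowed); /* 0 */
            return 0;
    }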
+
+Suggested-by: Vincent Guittot
+Signed-off-by: Yicong Yang
+Signed-off-by: Peter Zijlstra (Intel)
+Reviewed-by: Vincent Guittot
+Link: https://lore.kernel.org/r/20230530082507.10444-1-yangyicong@huawei.com
+Signed-off-by: Mel Gorman
+---
+ kernel/sched/fair.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
+index e5152b0bcf67..dd85cf9d8187 100644
+--- a/kernel/sched/fair.c
++++ b/kernel/sched/fair.c
+@@ -8715,7 +8715,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
+ 		.sd		= sd,
+ 		.dst_cpu	= this_cpu,
+ 		.dst_rq		= this_rq,
+-		.dst_grpmask	= sched_group_span(sd->groups),
++		.dst_grpmask	= group_balance_mask(sd->groups),
+ 		.idle		= idle,
+ 		.loop_break	= sched_nr_migrate_break,
+ 		.cpus		= cpus,
diff --git a/patches.suse/sched-rt-Minimize-rq-lock-contention-in-do_sched_rt_period_timer.patch b/patches.suse/sched-rt-Minimize-rq-lock-contention-in-do_sched_rt_period_timer.patch
new file mode 100644
index 0000000..00556c4
--- /dev/null
+++ b/patches.suse/sched-rt-Minimize-rq-lock-contention-in-do_sched_rt_period_timer.patch
@@ -0,0 +1,52 @@
+From 782f0d9f62f0e8cb135e2991fdc71356d6c15e6b Mon Sep 17 00:00:00 2001
+From: Dave Kleikamp
+Date: Mon, 15 May 2017 14:14:13 -0500
+Subject: [PATCH] sched/rt: Minimize rq->lock contention in
+ do_sched_rt_period_timer()
+
+References: git fixes (sched)
+Patch-mainline: v4.13-rc1
+Git-commit: c249f255aab86b9b187ba319b9d2684841ac7c8d
+
+With CONFIG_RT_GROUP_SCHED=y, do_sched_rt_period_timer() sequentially
+takes each CPU's rq->lock. On a large, busy system, the cumulative time it
+takes to acquire each lock can be excessive, even triggering a watchdog
+timeout.
+
+If rt_rq->rt_time and rt_rq->rt_nr_running are both zero, this function does
+nothing while holding the lock, so don't bother taking it at all.
+
+Signed-off-by: Dave Kleikamp
+Signed-off-by: Peter Zijlstra (Intel)
+Cc: Linus Torvalds
+Cc: Peter Zijlstra
+Cc: Thomas Gleixner
+Link: http://lkml.kernel.org/r/a767637b-df85-912f-ba69-c90ee00a3fb6@oracle.com
+Signed-off-by: Ingo Molnar
+Signed-off-by: Mel Gorman
+---
+ kernel/sched/rt.c | 11 +++++++++++
+ 1 file changed, 11 insertions(+)
+
+diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
+index ba240da708bb..b28045f8a3a0 100644
+--- a/kernel/sched/rt.c
++++ b/kernel/sched/rt.c
+@@ -829,6 +829,17 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
+ 	int enqueue = 0;
+ 	struct rt_rq *rt_rq = sched_rt_period_rt_rq(rt_b, i);
+ 	struct rq *rq = rq_of_rt_rq(rt_rq);
++	int skip;
++
++	/*
++	 * When span == cpu_online_mask, taking each rq->lock
++	 * can be time-consuming. Try to avoid it when possible.
++	 */
++	raw_spin_lock(&rt_rq->rt_runtime_lock);
++	skip = !rt_rq->rt_time && !rt_rq->rt_nr_running;
++	raw_spin_unlock(&rt_rq->rt_runtime_lock);
++	if (skip)
++		continue;
+ 
+ 	raw_spin_lock(&rq->lock);
+ 	if (rt_rq->rt_time) {
diff --git a/patches.suse/sched-rt-Restore-rt_runtime-after-disabling-RT_RUNTIME_SHARE.patch b/patches.suse/sched-rt-Restore-rt_runtime-after-disabling-RT_RUNTIME_SHARE.patch
new file mode 100644
index 0000000..a98c1f0
--- /dev/null
+++ b/patches.suse/sched-rt-Restore-rt_runtime-after-disabling-RT_RUNTIME_SHARE.patch
@@ -0,0 +1,73 @@
+From dad98a02679182b373701f296f498d0e83051b86 Mon Sep 17 00:00:00 2001
+From: Hailong Liu
+Date: Wed, 18 Jul 2018 08:46:55 +0800
+Subject: [PATCH] sched/rt: Restore rt_runtime after disabling RT_RUNTIME_SHARE
+
+References: git fixes (sched)
+Patch-mainline: v4.18-rc8
+Git-commit: f3d133ee0a17d5694c6f21873eec9863e11fa423
+
+The NO_RT_RUNTIME_SHARE feature is used to prevent a CPU from borrowing
+runtime for a spinning RT task.
+
+However, if the RT_RUNTIME_SHARE feature is enabled and an rt_rq has
+already borrowed enough rt_runtime at the beginning, rt_runtime cannot
+be restored to its initial bandwidth after we disable RT_RUNTIME_SHARE.
+
+E.g. on my PC with 4 cores, the procedure to reproduce is:
+1) Make sure RT_RUNTIME_SHARE is enabled
+ cat /sys/kernel/debug/sched_features
+  GENTLE_FAIR_SLEEPERS START_DEBIT NO_NEXT_BUDDY LAST_BUDDY
+  CACHE_HOT_BUDDY WAKEUP_PREEMPTION NO_HRTICK NO_DOUBLE_TICK
+  LB_BIAS NONTASK_CAPACITY TTWU_QUEUE NO_SIS_AVG_CPU SIS_PROP
+  NO_WARN_DOUBLE_CLOCK RT_PUSH_IPI RT_RUNTIME_SHARE NO_LB_MIN
+  ATTACH_AGE_LOAD WA_IDLE WA_WEIGHT WA_BIAS
+2) Start a spinning RT task
+ ./loop_rr &
+3) Set its affinity to the last CPU
+ taskset -p 8 $pid_of_loop_rr
+4) Observe that the last CPU has borrowed runtime
+ cat /proc/sched_debug | grep rt_runtime
+  .rt_runtime : 950.000000
+  .rt_runtime : 900.000000
+  .rt_runtime : 950.000000
+  .rt_runtime : 1000.000000
+5) Disable RT_RUNTIME_SHARE
+ echo NO_RT_RUNTIME_SHARE > /sys/kernel/debug/sched_features
+6) Observe that rt_runtime has not been restored
+ cat /proc/sched_debug | grep rt_runtime
+  .rt_runtime : 950.000000
+  .rt_runtime : 900.000000
+  .rt_runtime : 950.000000
+  .rt_runtime : 1000.000000
+
+This patch helps restore rt_runtime after RT_RUNTIME_SHARE is disabled.
+
+Signed-off-by: Hailong Liu
+Signed-off-by: Jiang Biao
+Signed-off-by: Peter Zijlstra (Intel)
+Cc: Linus Torvalds
+Cc: Peter Zijlstra
+Cc: Thomas Gleixner
+Cc: zhong.weidong@zte.com.cn
+Link: http://lkml.kernel.org/r/1531874815-39357-1-git-send-email-liu.hailong6@zte.com.cn
+Signed-off-by: Ingo Molnar
+Signed-off-by: Mel Gorman
+---
+ kernel/sched/rt.c | 2 ++
+ 1 file changed, 2 insertions(+)
+
+diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
+index b28045f8a3a0..89b3448e2686 100644
+--- a/kernel/sched/rt.c
++++ b/kernel/sched/rt.c
+@@ -836,6 +836,8 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
+ 	 * can be time-consuming. Try to avoid it when possible.
+ 	 */
+ 	raw_spin_lock(&rt_rq->rt_runtime_lock);
++	if (!sched_feat(RT_RUNTIME_SHARE) && rt_rq->rt_runtime != RUNTIME_INF)
++		rt_rq->rt_runtime = rt_b->rt_runtime;
+ 	skip = !rt_rq->rt_time && !rt_rq->rt_nr_running;
+ 	raw_spin_unlock(&rt_rq->rt_runtime_lock);
+ 	if (skip)
diff --git a/series.conf b/series.conf
index 6275774..1d78484 100644
--- a/series.conf
+++ b/series.conf
@@ -1223,6 +1223,7 @@
 	patches.suse/0001-smp-Avoid-sending-needless-IPI-in-smp_call_function_.patch
 	patches.suse/0001-smp-cpumask-Use-non-atomic-cpumask_-set-clear-_cpu.patch
 	patches.suse/0001-sched-core-Allow-__sched_setscheduler-in-interrupts-.patch
+	patches.suse/sched-rt-Minimize-rq-lock-contention-in-do_sched_rt_period_timer.patch
 	patches.suse/0001-x86-tsc-Fold-set_cyc2ns_scale-into-caller.patch
 	patches.suse/0001-sched-clock-Fix-early-boot-preempt-assumption-in-__s.patch
 	patches.suse/0001-sched-deadline-Zero-out-positive-runtime-after-throt.patch
@@ -36054,6 +36055,7 @@
 	patches.suse/perf-x86-intel-fix-unwind-errors-from-pebs-entries-mk-ii.patch
 	patches.suse/perf-core-fix-crash-when-using-hw-tracing-kernel-filters.patch
 	patches.suse/stop_machine-Disable-preemption-after-queueing-stopp.patch
+	patches.suse/sched-rt-Restore-rt_runtime-after-disabling-RT_RUNTIME_SHARE.patch
 	patches.suse/x86-entry-64-remove-ebx-handling-from-error_entry-exit.patch
 	patches.suse/x86-boot-fix-if_changed-build-flip-flop-bug
 	patches.suse/squashfs-more-metadata-hardening.patch
@@ -53265,6 +53267,7 @@
 	patches.suse/nfsd-degraded-slot-count-more-gracefully-as-allocati.patch
 	patches.suse/ima-always-return-negative-code-for-error.patch
 	patches.suse/efi-Restrict-efivar_ssdt_load-when-the-kernel-is-loc.patch
+	patches.suse/sched-core-Fix-migration-to-invalid-CPU-in-__set_cpus_allowed_ptr.patch
 	patches.suse/powerpc-book3s64-radix-Remove-WARN_ON-in-destroy_con.patch
 	patches.suse/KVM-PPC-Book3S-HV-use-smp_mb-when-setting-clearing-h.patch
 	patches.suse/powerpc-pseries-Read-TLB-Block-Invalidate-Characteri.patch
@@ -56026,6 +56029,7 @@
 	patches.suse/sched-fair-reorder-enqueue-dequeue_task_fair-path.patch
 	patches.suse/sched-fair-fix-reordering-of-enqueue-dequeue_task_fair.patch
 	patches.suse/sched-fair-fix-enqueue_task_fair-warning.patch
+	patches.suse/sched-Avoid-scale-real-weight-down-to-zero.patch
 	patches.suse/irqchip-bcm2835-Quiesce-IRQs-left-enabled-by-bootloa.patch
 	patches.suse/timekeeping-Prevent-32bit-truncation-in-scale64_chec.patch
 	patches.suse/fbdev-g364fb-Fix-build-failure.patch
@@ -57719,6 +57723,7 @@
 	patches.suse/0001-block-improve-discard-bio-alignment-in-__blkdev_issu.patch
 	patches.suse/block-Use-non-_rcu-version-of-list-functions-for-tag.patch
 	patches.suse/s390-ap-rework-crypto-config-info-and-default-domain-code
+	patches.suse/sched-correct-SD_flags-returned-by-tl-sd_flags.patch
 	patches.suse/x86-speculation-merge-one-test-in-spectre_v2_user_select_mitigation.patch
 	patches.suse/x86-mce-inject-fix-a-wrong-assignment-of-i_mce-status.patch
 	patches.suse/platform-x86-intel-hid-Fix-return-value-check-in-che.patch
@@ -59429,6 +59434,7 @@
 	patches.suse/s390-cio-fix-use-after-free-in-ccw_device_destroy_console
 	patches.suse/s390-smp-perform-initial-cpu-reset-also-for-smt-siblings
 	patches.suse/x86-kprobes-restore-btf-if-the-single-stepping-is-cancelled.patch
+	patches.suse/sched-Reenable-interrupts-in-do_sched_yield.patch
 	patches.suse/x86-apic-Fix-x2apic-enablement-without-interrupt-rem.patch
 	patches.suse/x86-msi-Only-use-high-bits-of-MSI-address-for-DMAR-u.patch
 	patches.suse/x86-ioapic-Handle-Extended-Destination-ID-field-in-R.patch
@@ -62360,6 +62366,7 @@
patches.suse/cifs-release-lock-earlier-in-dequeue_mid-error-case.patch patches.suse/cifs-fix-memory-leak-of-smb3_fs_context_dup-server_hostname.patch patches.suse/cifs-fix-potential-use-after-free-bugs.patch + patches.suse/sched-core-Mitigate-race-cpus_share_cache-update_top_cache_domain.patch patches.suse/PCI-MSI-Destroy-sysfs-before-freeing-entries.patch patches.suse/msft-hv-2480-x86-hyperv-Fix-NULL-deref-in-set_hv_tscchange_cb-if-.patch patches.suse/printk-Remove-printk.h-inclusion-in-percpu.h.patch @@ -64216,6 +64223,7 @@ patches.suse/svcrdma-Prevent-page-release-when-nothing-was-receiv.patch patches.suse/x86-retbleed-add-_x86_return_thunk-alignment-checks.patch patches.suse/x86-microcode-AMD-Load-late-on-both-threads-too.patch + patches.suse/sched-fair-Don-t-balance-task-to-its-current-running-CPU.patch patches.suse/net-mana-Add-support-for-vlan-tagging.patch patches.suse/net-nfc-Fix-use-after-free-caused-by-nfc_llcp_find_l.patch patches.suse/0003-fbdev-omapfb-lcd_mipid-Fix-an-error-handling-path-in.patch