Petr Mladek 4a6e4c
From 07edfece8bcb0580a1828d939e6f8d91a8603eb2 Mon Sep 17 00:00:00 2001
Petr Mladek 4a6e4c
From: Frederic Weisbecker <frederic@kernel.org>
Petr Mladek 4a6e4c
Date: Wed, 1 Dec 2021 16:19:44 +0100
Petr Mladek 4a6e4c
Subject: [PATCH] workqueue: Fix unbind_workers() VS wq_worker_running() race
Petr Mladek 4a6e4c
Git-commit: 07edfece8bcb0580a1828d939e6f8d91a8603eb2
Petr Mladek 4a6e4c
Patch-mainline: v5.17-rc1
Petr Mladek 4a6e4c
References: bsc#1195062
Petr Mladek 4a6e4c
Petr Mladek 4a6e4c
At CPU-hotplug time, unbind_worker() may preempt a worker while it is
Petr Mladek 4a6e4c
waking up. In that case the following scenario can happen:
Petr Mladek 4a6e4c
Petr Mladek 4a6e4c
        unbind_workers()                     wq_worker_running()
Petr Mladek 4a6e4c
        --------------                      -------------------
Petr Mladek 4a6e4c
        	                      if (!(worker->flags & WORKER_NOT_RUNNING))
Petr Mladek 4a6e4c
        	                          //PREEMPTED by unbind_workers
Petr Mladek 4a6e4c
        worker->flags |= WORKER_UNBOUND;
Petr Mladek 4a6e4c
        [...]
Petr Mladek 4a6e4c
        atomic_set(&pool->nr_running, 0);
Petr Mladek 4a6e4c
        //resume to worker
Petr Mladek 4a6e4c
		                              atomic_inc(&worker->pool->nr_running);
Petr Mladek 4a6e4c
Petr Mladek 4a6e4c
After unbind_worker() resets pool->nr_running, the value is expected to
Petr Mladek 4a6e4c
remain 0 until the pool ever gets rebound in case cpu_up() is called on
Petr Mladek 4a6e4c
the target CPU in the future. But here the race leaves pool->nr_running
Petr Mladek 4a6e4c
with a value of 1, triggering the following warning when the worker goes
Petr Mladek 4a6e4c
idle:
Petr Mladek 4a6e4c
Petr Mladek 4a6e4c
	WARNING: CPU: 3 PID: 34 at kernel/workqueue.c:1823 worker_enter_idle+0x95/0xc0
Petr Mladek 4a6e4c
	Modules linked in:
Petr Mladek 4a6e4c
	CPU: 3 PID: 34 Comm: kworker/3:0 Not tainted 5.16.0-rc1+ #34
Petr Mladek 4a6e4c
	Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
Petr Mladek 4a6e4c
	Workqueue:  0x0 (rcu_par_gp)
Petr Mladek 4a6e4c
	RIP: 0010:worker_enter_idle+0x95/0xc0
Petr Mladek 4a6e4c
	Code: 04 85 f8 ff ff ff 39 c1 7f 09 48 8b 43 50 48 85 c0 74 1b 83 e2 04 75 99 8b 43 34 39 43 30 75 91 8b 83 00 03 00 00 85 c0 74 87 <0f> 0b 5b c3 48 8b 35 70 f1 37 01 48 8d 7b 48 48 81 c6 e0 93  0
Petr Mladek 4a6e4c
	RSP: 0000:ffff9b7680277ed0 EFLAGS: 00010086
Petr Mladek 4a6e4c
	RAX: 00000000ffffffff RBX: ffff93465eae9c00 RCX: 0000000000000000
Petr Mladek 4a6e4c
	RDX: 0000000000000000 RSI: ffff9346418a0000 RDI: ffff934641057140
Petr Mladek 4a6e4c
	RBP: ffff934641057170 R08: 0000000000000001 R09: ffff9346418a0080
Petr Mladek 4a6e4c
	R10: ffff9b768027fdf0 R11: 0000000000002400 R12: ffff93465eae9c20
Petr Mladek 4a6e4c
	R13: ffff93465eae9c20 R14: ffff93465eae9c70 R15: ffff934641057140
Petr Mladek 4a6e4c
	FS:  0000000000000000(0000) GS:ffff93465eac0000(0000) knlGS:0000000000000000
Petr Mladek 4a6e4c
	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Petr Mladek 4a6e4c
	CR2: 0000000000000000 CR3: 000000001cc0c000 CR4: 00000000000006e0
Petr Mladek 4a6e4c
	DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Petr Mladek 4a6e4c
	DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Petr Mladek 4a6e4c
	Call Trace:
Petr Mladek 4a6e4c
	  <TASK>
Petr Mladek 4a6e4c
	  worker_thread+0x89/0x3d0
Petr Mladek 4a6e4c
	  ? process_one_work+0x400/0x400
Petr Mladek 4a6e4c
	  kthread+0x162/0x190
Petr Mladek 4a6e4c
	  ? set_kthread_struct+0x40/0x40
Petr Mladek 4a6e4c
	  ret_from_fork+0x22/0x30
Petr Mladek 4a6e4c
	  </TASK>
Petr Mladek 4a6e4c
Petr Mladek 4a6e4c
Also due to this incorrect "nr_running == 1", further queued work may
Petr Mladek 4a6e4c
end up not being served, because no worker is awaken at work insert time.
Petr Mladek 4a6e4c
This raises rcutorture writer stalls for example.
Petr Mladek 4a6e4c
Petr Mladek 4a6e4c
Fix this with disabling preemption in the right place in
Petr Mladek 4a6e4c
wq_worker_running().
Petr Mladek 4a6e4c
Petr Mladek 4a6e4c
It's worth noting that if the worker migrates and runs concurrently with
Petr Mladek 4a6e4c
unbind_workers(), it is guaranteed to see the WORKER_UNBOUND flag update
Petr Mladek 4a6e4c
due to set_cpus_allowed_ptr() acquiring/releasing rq->lock.
Petr Mladek 4a6e4c
Petr Mladek 4a6e4c
Fixes: 6d25be5782e4 ("sched/core, workqueues: Distangle worker accounting from rq lock")
Petr Mladek 4a6e4c
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Petr Mladek 4a6e4c
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Petr Mladek 4a6e4c
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Petr Mladek 4a6e4c
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Petr Mladek 4a6e4c
Cc: Thomas Gleixner <tglx@linutronix.de>
Petr Mladek 4a6e4c
Cc: Ingo Molnar <mingo@redhat.com>
Petr Mladek 4a6e4c
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Petr Mladek 4a6e4c
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Petr Mladek 4a6e4c
Signed-off-by: Tejun Heo <tj@kernel.org>
Petr Mladek 4a6e4c
Acked-by: Petr Mladek <pmladek@suse.com>
Petr Mladek 4a6e4c
Petr Mladek 4a6e4c
---
Petr Mladek 4a6e4c
 kernel/workqueue.c | 9 +++++++++
Petr Mladek 4a6e4c
 1 file changed, 9 insertions(+)
Petr Mladek 4a6e4c
Petr Mladek 4a6e4c
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
Petr Mladek 4a6e4c
index 332361cf215f..5094573e8b45 100644
Petr Mladek 4a6e4c
--- a/kernel/workqueue.c
Petr Mladek 4a6e4c
+++ b/kernel/workqueue.c
Petr Mladek 4a6e4c
@@ -868,8 +868,17 @@ void wq_worker_running(struct task_struct *task)
Petr Mladek 4a6e4c
 
Petr Mladek 4a6e4c
 	if (!worker->sleeping)
Petr Mladek 4a6e4c
 		return;
Petr Mladek 4a6e4c
+
Petr Mladek 4a6e4c
+	/*
Petr Mladek 4a6e4c
+	 * If preempted by unbind_workers() between the WORKER_NOT_RUNNING check
Petr Mladek 4a6e4c
+	 * and the nr_running increment below, we may ruin the nr_running reset
Petr Mladek 4a6e4c
+	 * and leave with an unexpected pool->nr_running == 1 on the newly unbound
Petr Mladek 4a6e4c
+	 * pool. Protect against such race.
Petr Mladek 4a6e4c
+	 */
Petr Mladek 4a6e4c
+	preempt_disable();
Petr Mladek 4a6e4c
 	if (!(worker->flags & WORKER_NOT_RUNNING))
Petr Mladek 4a6e4c
 		atomic_inc(&worker->pool->nr_running);
Petr Mladek 4a6e4c
+	preempt_enable();
Petr Mladek 4a6e4c
 	worker->sleeping = 0;
Petr Mladek 4a6e4c
 }
Petr Mladek 4a6e4c
 
Petr Mladek 4a6e4c
-- 
Petr Mladek 4a6e4c
2.26.2
Petr Mladek 4a6e4c