From e45cdc71d1fa5ac3a57b23acc31eb959e4f60135 Mon Sep 17 00:00:00 2001
From: Andy Lutomirski <luto@kernel.org>
Date: Thu, 3 Dec 2020 21:07:06 -0800
Subject: [PATCH] membarrier: Execute SYNC_CORE on the calling thread
Git-commit: e45cdc71d1fa5ac3a57b23acc31eb959e4f60135
Patch-mainline: v5.10
References: git-fixes

membarrier()'s MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE is documented as
syncing the core on all sibling threads but not necessarily the calling
thread.  This behavior is fundamentally buggy and cannot be used safely.

Suppose a user program has two threads.  Thread A is on CPU 0 and thread B
is on CPU 1.  Thread A modifies some text and calls
membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE).

Then thread B executes the modified code.  At any point after
membarrier() decides which CPUs to target, thread A could be preempted
and replaced by thread B on CPU 0; this could even happen on exit from
the membarrier() syscall.  If that happens, thread B ends up running on
CPU 0 without having synced.
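
For illustration only, a minimal userspace sketch of the racy pattern
(not part of the patch; code_buf, patch_instructions() and the thread
split are hypothetical, and registration/error handling is elided to a
comment):

    #include <linux/membarrier.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static void *code_buf;  /* hypothetical executable JIT buffer */

    static void patch_instructions(void *buf)
    {
            /* hypothetical: rewrite some instructions in buf */
    }

    static void thread_a(void)  /* starts on CPU 0 */
    {
            patch_instructions(code_buf);
            /*
             * Documented to core-sync all sibling threads, but not
             * necessarily the caller.  (Assumes the process already
             * did MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE.)
             */
            syscall(__NR_membarrier,
                    MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0);
            /* If thread B migrates to CPU 0 here, it runs stale code. */
    }

    static void thread_b(void)  /* starts on CPU 1 */
    {
            ((void (*)(void))code_buf)();  /* execute patched code */
    }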

In principle, this could be fixed by arranging for the scheduler to issue
sync_core_before_usermode() whenever switching between two threads in the
same mm if there is any possibility of a concurrent membarrier() call, but
this would have considerable overhead.  Instead, make membarrier() sync the
calling CPU as well.
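
The diff below switches the SYNC_CORE case to on_each_cpu_mask().  As a
simplified sketch of the semantic difference being relied on (close to,
but not identical to, the actual kernel/smp.c implementation):

    /*
     * smp_call_function_many(mask, fn, arg, true) IPIs only the
     * *remote* CPUs in mask.  on_each_cpu_mask(mask, fn, arg, true)
     * behaves roughly like:
     */
    preempt_disable();
    smp_call_function_many(mask, fn, arg, true);    /* remote CPUs */
    if (cpumask_test_cpu(smp_processor_id(), mask))
            fn(arg);    /* ...plus the calling CPU; the real helper
                         * disables IRQs around fn() */
    preempt_enable();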

As an optimization, the new arrangement also avoids an extra smp_mb() in
the default barrier-only mode and an extra rseq preempt on the caller.

Fixes: 70216e18e519 ("membarrier: Provide core serializing command, *_SYNC_CORE")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/r/250ded637696d490c69bef1877148db86066881c.1607058304.git.luto@kernel.org
Signed-off-by: Frederic Weisbecker <fweisbecker@suse.com>
---
 kernel/sched/membarrier.c |   36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -166,7 +166,8 @@ static int membarrier_private_expedited(
 			return -EPERM;
 	}
 
-	if (atomic_read(&mm->mm_users) == 1 || num_online_cpus() == 1)
+	if (flags != MEMBARRIER_FLAG_SYNC_CORE &&
+	    (atomic_read(&mm->mm_users) == 1 || num_online_cpus() == 1))
 		return 0;
 
 	/*
@@ -183,25 +184,32 @@ static int membarrier_private_expedited(
 	for_each_online_cpu(cpu) {
 		struct task_struct *p;
 
-		/*
-		 * Skipping the current CPU is OK even through we can be
-		 * migrated at any point. The current CPU, at the point
-		 * where we read raw_smp_processor_id(), is ensured to
-		 * be in program order with respect to the caller
-		 * thread. Therefore, we can skip this CPU from the
-		 * iteration.
-		 */
-		if (cpu == raw_smp_processor_id())
-			continue;
 		p = rcu_dereference(cpu_rq(cpu)->curr);
 		if (p && p->mm == mm)
 			__cpumask_set_cpu(cpu, tmpmask);
 	}
 	rcu_read_unlock();
 
-	preempt_disable();
-	smp_call_function_many(tmpmask, ipi_func, NULL, 1);
-	preempt_enable();
+	/*
+	 * For regular membarrier, we can save a few cycles by
+	 * skipping the current cpu -- we're about to do smp_mb()
+	 * below, and if we migrate to a different cpu, this cpu
+	 * and the new cpu will execute a full barrier in the
+	 * scheduler.
+	 *
+	 * For SYNC_CORE, we do need a barrier on the current cpu --
+	 * otherwise, if we are migrated and replaced by a different
+	 * task in the same mm just before, during, or after
+	 * membarrier, we will end up with some thread in the mm
+	 * running without a core sync.
+	 */
+	if (flags != MEMBARRIER_FLAG_SYNC_CORE) {
+		preempt_disable();
+		smp_call_function_many(tmpmask, ipi_func, NULL, 1);
+		preempt_enable();
+	} else {
+		on_each_cpu_mask(tmpmask, ipi_func, NULL, true);
+	}
 
 	free_cpumask_var(tmpmask);
 	cpus_read_unlock();