Davidlohr Bueso fbc2c0
From 50972fe78f24f1cd0b9d7bbf1f87d2be9e4f412e Mon Sep 17 00:00:00 2001
Davidlohr Bueso fbc2c0
From: Prateek Sood <prsood@codeaurora.org>
Davidlohr Bueso fbc2c0
Date: Fri, 14 Jul 2017 19:17:56 +0530
Davidlohr Bueso fbc2c0
Subject: [PATCH] locking/osq_lock: Fix osq_lock queue corruption
Davidlohr Bueso fbc2c0
Git-commit: 50972fe78f24f1cd0b9d7bbf1f87d2be9e4f412e 
Benjamin Poirier b965d1
Patch-mainline: v4.14-rc1
Davidlohr Bueso fbc2c0
References: bsc#1050549
Davidlohr Bueso fbc2c0
Davidlohr Bueso fbc2c0
Fix ordering of link creation between node->prev and prev->next in
Davidlohr Bueso fbc2c0
osq_lock(). A case in which the status of optimistic spin queue is
Davidlohr Bueso fbc2c0
CPU6->CPU2 in which CPU6 has acquired the lock.
Davidlohr Bueso fbc2c0
Davidlohr Bueso fbc2c0
        tail
Davidlohr Bueso fbc2c0
          v
Davidlohr Bueso fbc2c0
  ,-. <- ,-.
Davidlohr Bueso fbc2c0
  |6|    |2|
Davidlohr Bueso fbc2c0
  `-' -> `-'
Davidlohr Bueso fbc2c0
Davidlohr Bueso fbc2c0
At this point if CPU0 comes in to acquire osq_lock, it will update the
Davidlohr Bueso fbc2c0
tail count.
Davidlohr Bueso fbc2c0
Davidlohr Bueso fbc2c0
  CPU2			CPU0
Davidlohr Bueso fbc2c0
  ----------------------------------
Davidlohr Bueso fbc2c0
Davidlohr Bueso fbc2c0
				       tail
Davidlohr Bueso fbc2c0
				         v
Davidlohr Bueso fbc2c0
			  ,-. <- ,-.    ,-.
Davidlohr Bueso fbc2c0
			  |6|    |2|    |0|
Davidlohr Bueso fbc2c0
			  `-' -> `-'    `-'
Davidlohr Bueso fbc2c0
Davidlohr Bueso fbc2c0
After tail count update if CPU2 starts to unqueue itself from
Davidlohr Bueso fbc2c0
optimistic spin queue, it will find an updated tail count with CPU0 and
Davidlohr Bueso fbc2c0
update CPU2 node->next to NULL in osq_wait_next().
Davidlohr Bueso fbc2c0
Davidlohr Bueso fbc2c0
  unqueue-A
Davidlohr Bueso fbc2c0
Davidlohr Bueso fbc2c0
	       tail
Davidlohr Bueso fbc2c0
	         v
Davidlohr Bueso fbc2c0
  ,-. <- ,-.    ,-.
Davidlohr Bueso fbc2c0
  |6|    |2|    |0|
Davidlohr Bueso fbc2c0
  `-'    `-'    `-'
Davidlohr Bueso fbc2c0
Davidlohr Bueso fbc2c0
  unqueue-B
Davidlohr Bueso fbc2c0
Davidlohr Bueso fbc2c0
  ->tail != curr && !node->next
Davidlohr Bueso fbc2c0
Davidlohr Bueso fbc2c0
If reordering of following stores happen then prev->next where prev
Davidlohr Bueso fbc2c0
being CPU2 would be updated to point to CPU0 node:
Davidlohr Bueso fbc2c0
Davidlohr Bueso fbc2c0
				       tail
Davidlohr Bueso fbc2c0
				         v
Davidlohr Bueso fbc2c0
			  ,-. <- ,-.    ,-.
Davidlohr Bueso fbc2c0
			  |6|    |2|    |0|
Davidlohr Bueso fbc2c0
			  `-'    `-' -> `-'
Davidlohr Bueso fbc2c0
Davidlohr Bueso fbc2c0
  osq_wait_next()
Davidlohr Bueso fbc2c0
    node->next <- 0
Davidlohr Bueso fbc2c0
    xchg(node->next, NULL)
Davidlohr Bueso fbc2c0
Davidlohr Bueso fbc2c0
	       tail
Davidlohr Bueso fbc2c0
	         v
Davidlohr Bueso fbc2c0
  ,-. <- ,-.    ,-.
Davidlohr Bueso fbc2c0
  |6|    |2|    |0|
Davidlohr Bueso fbc2c0
  `-'    `-'    `-'
Davidlohr Bueso fbc2c0
Davidlohr Bueso fbc2c0
  unqueue-C
Davidlohr Bueso fbc2c0
Davidlohr Bueso fbc2c0
At this point if next instruction
Davidlohr Bueso fbc2c0
	WRITE_ONCE(next->prev, prev);
Davidlohr Bueso fbc2c0
in CPU2 path is committed before the update of CPU0 node->prev = prev then
Davidlohr Bueso fbc2c0
CPU0 node->prev will point to CPU6 node.
Davidlohr Bueso fbc2c0
Davidlohr Bueso fbc2c0
	       tail
Davidlohr Bueso fbc2c0
    v----------. v
Davidlohr Bueso fbc2c0
  ,-. <- ,-.    ,-.
Davidlohr Bueso fbc2c0
  |6|    |2|    |0|
Davidlohr Bueso fbc2c0
  `-'    `-'    `-'
Davidlohr Bueso fbc2c0
     `----------^
Davidlohr Bueso fbc2c0
Davidlohr Bueso fbc2c0
At this point if CPU0 path's node->prev = prev is committed resulting
Davidlohr Bueso fbc2c0
in change of CPU0 prev back to CPU2 node. CPU2 node->next is NULL
Davidlohr Bueso fbc2c0
currently,
Davidlohr Bueso fbc2c0
Davidlohr Bueso fbc2c0
				       tail
Davidlohr Bueso fbc2c0
			                 v
Davidlohr Bueso fbc2c0
			  ,-. <- ,-. <- ,-.
Davidlohr Bueso fbc2c0
			  |6|    |2|    |0|
Davidlohr Bueso fbc2c0
			  `-'    `-'    `-'
Davidlohr Bueso fbc2c0
			     `----------^
Davidlohr Bueso fbc2c0
Davidlohr Bueso fbc2c0
so if CPU0 gets into unqueue path of osq_lock it will keep spinning
Davidlohr Bueso fbc2c0
in infinite loop as condition prev->next == node will never be true.
Davidlohr Bueso fbc2c0
Davidlohr Bueso fbc2c0
Signed-off-by: Prateek Sood <prsood@codeaurora.org>
Davidlohr Bueso fbc2c0
[ Added pictures, rewrote comments. ]
Davidlohr Bueso fbc2c0
Davidlohr Bueso fbc2c0
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Davidlohr Bueso fbc2c0
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso fbc2c0
Cc: Peter Zijlstra <peterz@infradead.org>
Davidlohr Bueso fbc2c0
Cc: Thomas Gleixner <tglx@linutronix.de>
Davidlohr Bueso fbc2c0
Cc: sramana@codeaurora.org
Davidlohr Bueso fbc2c0
Link: http://lkml.kernel.org/r/1500040076-27626-1-git-send-email-prsood@codeaurora.org
Davidlohr Bueso fbc2c0
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Davidlohr Bueso fbc2c0
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Davidlohr Bueso fbc2c0
Davidlohr Bueso fbc2c0
---
Davidlohr Bueso fbc2c0
 kernel/locking/osq_lock.c | 13 +++++++++++++
Davidlohr Bueso fbc2c0
 1 file changed, 13 insertions(+)
Davidlohr Bueso fbc2c0
Davidlohr Bueso fbc2c0
diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
Davidlohr Bueso fbc2c0
index a3167941093b..a74ee6abd039 100644
Davidlohr Bueso fbc2c0
--- a/kernel/locking/osq_lock.c
Davidlohr Bueso fbc2c0
+++ b/kernel/locking/osq_lock.c
Davidlohr Bueso fbc2c0
@@ -109,6 +109,19 @@ bool osq_lock(struct optimistic_spin_queue *lock)
Davidlohr Bueso fbc2c0
 
Davidlohr Bueso fbc2c0
 	prev = decode_cpu(old);
Davidlohr Bueso fbc2c0
 	node->prev = prev;
Davidlohr Bueso fbc2c0
+
Davidlohr Bueso fbc2c0
+	/*
Davidlohr Bueso fbc2c0
+	 * osq_lock()			unqueue
Davidlohr Bueso fbc2c0
+	 *
Davidlohr Bueso fbc2c0
+	 * node->prev = prev		osq_wait_next()
Davidlohr Bueso fbc2c0
+	 * WMB				MB
Davidlohr Bueso fbc2c0
+	 * prev->next = node		next->prev = prev // unqueue-C
Davidlohr Bueso fbc2c0
+	 *
Davidlohr Bueso fbc2c0
+	 * Here 'node->prev' and 'next->prev' are the same variable and we need
Davidlohr Bueso fbc2c0
+	 * to ensure these stores happen in-order to avoid corrupting the list.
Davidlohr Bueso fbc2c0
+	 */
Davidlohr Bueso fbc2c0
+	smp_wmb();
Davidlohr Bueso fbc2c0
+
Davidlohr Bueso fbc2c0
 	WRITE_ONCE(prev->next, node);
Davidlohr Bueso fbc2c0
 
Davidlohr Bueso fbc2c0
 	/*
Davidlohr Bueso fbc2c0
-- 
Davidlohr Bueso fbc2c0
2.12.0
Davidlohr Bueso fbc2c0