Miroslav Benes 8e95b8
From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Miroslav Benes 8e95b8
Date: Mon, 2 Nov 2020 15:31:27 -0500
Miroslav Benes 8e95b8
Subject: ring-buffer: Fix recursion protection transitions between interrupt
Miroslav Benes 8e95b8
 context
Miroslav Benes 8e95b8
Git-commit: b02414c8f045ab3b9afc816c3735bc98c5c3d262
Miroslav Benes 8e95b8
Patch-mainline: v5.10-rc3
Miroslav Benes 8e95b8
References: git-fixes
Miroslav Benes 8e95b8
Miroslav Benes 8e95b8
The recursion protection of the ring buffer depends on preempt_count() to be
Miroslav Benes 8e95b8
correct. But it is possible that the ring buffer gets called after an
Miroslav Benes 8e95b8
interrupt comes in but before it updates the preempt_count(). This will
Miroslav Benes 8e95b8
trigger a false positive in the recursion code.
Miroslav Benes 8e95b8
Miroslav Benes 8e95b8
Use the same trick from the ftrace function callback recursion code which
Miroslav Benes 8e95b8
uses a "transition" bit that gets set, to allow for a single recursion for
Miroslav Benes 8e95b8
to handle transitions between contexts.
Miroslav Benes 8e95b8
Miroslav Benes 8e95b8
Cc: stable@vger.kernel.org
Miroslav Benes 8e95b8
Fixes: 567cd4da54ff4 ("ring-buffer: User context bit recursion checking")
Miroslav Benes 8e95b8
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Miroslav Benes 8e95b8
[ mb: adapted to the current SLE15-SP1 code ]
Miroslav Benes 8e95b8
Acked-by: Miroslav Benes <mbenes@suse.cz>
Miroslav Benes 8e95b8
---
Miroslav Benes 8e95b8
 kernel/trace/ring_buffer.c |   54 ++++++++++++++++++++++++++++++++++++---------
Miroslav Benes 8e95b8
 1 file changed, 44 insertions(+), 10 deletions(-)
Miroslav Benes 8e95b8
Miroslav Benes 8e95b8
--- a/kernel/trace/ring_buffer.c
Miroslav Benes 8e95b8
+++ b/kernel/trace/ring_buffer.c
Miroslav Benes 8e95b8
@@ -417,14 +417,16 @@ struct rb_event_info {
Miroslav Benes 8e95b8
 
Miroslav Benes 8e95b8
 /*
Miroslav Benes 8e95b8
  * Used for which event context the event is in.
Miroslav Benes 8e95b8
- *  NMI     = 0
Miroslav Benes 8e95b8
- *  IRQ     = 1
Miroslav Benes 8e95b8
- *  SOFTIRQ = 2
Miroslav Benes 8e95b8
- *  NORMAL  = 3
Miroslav Benes 8e95b8
+ *  TRANSITION = 0
Miroslav Benes 8e95b8
+ *  NMI     = 1
Miroslav Benes 8e95b8
+ *  IRQ     = 2
Miroslav Benes 8e95b8
+ *  SOFTIRQ = 3
Miroslav Benes 8e95b8
+ *  NORMAL  = 4
Miroslav Benes 8e95b8
  *
Miroslav Benes 8e95b8
  * See trace_recursive_lock() comment below for more details.
Miroslav Benes 8e95b8
  */
Miroslav Benes 8e95b8
 enum {
Miroslav Benes 8e95b8
+	RB_CTX_TRANSITION,
Miroslav Benes 8e95b8
 	RB_CTX_NMI,
Miroslav Benes 8e95b8
 	RB_CTX_IRQ,
Miroslav Benes 8e95b8
 	RB_CTX_SOFTIRQ,
Miroslav Benes 8e95b8
@@ -2555,10 +2557,10 @@ rb_wakeups(struct ring_buffer *buffer, s
Miroslav Benes 8e95b8
  * a bit of overhead in something as critical as function tracing,
Miroslav Benes 8e95b8
  * we use a bitmask trick.
Miroslav Benes 8e95b8
  *
Miroslav Benes 8e95b8
- *  bit 0 =  NMI context
Miroslav Benes 8e95b8
- *  bit 1 =  IRQ context
Miroslav Benes 8e95b8
- *  bit 2 =  SoftIRQ context
Miroslav Benes 8e95b8
- *  bit 3 =  normal context.
Miroslav Benes 8e95b8
+ *  bit 1 =  NMI context
Miroslav Benes 8e95b8
+ *  bit 2 =  IRQ context
Miroslav Benes 8e95b8
+ *  bit 3 =  SoftIRQ context
Miroslav Benes 8e95b8
+ *  bit 4 =  normal context.
Miroslav Benes 8e95b8
  *
Miroslav Benes 8e95b8
  * This works because this is the order of contexts that can
Miroslav Benes 8e95b8
  * preempt other contexts. A SoftIRQ never preempts an IRQ
Miroslav Benes 8e95b8
@@ -2581,6 +2583,30 @@ rb_wakeups(struct ring_buffer *buffer, s
Miroslav Benes 8e95b8
  * The least significant bit can be cleared this way, and it
Miroslav Benes 8e95b8
  * just so happens that it is the same bit corresponding to
Miroslav Benes 8e95b8
  * the current context.
Miroslav Benes 8e95b8
+ *
Miroslav Benes 8e95b8
+ * Now the TRANSITION bit breaks the above slightly. The TRANSITION bit
Miroslav Benes 8e95b8
+ * is set when a recursion is detected at the current context, and if
Miroslav Benes 8e95b8
+ * the TRANSITION bit is already set, it will fail the recursion.
Miroslav Benes 8e95b8
+ * This is needed because there's a lag between the changing of
Miroslav Benes 8e95b8
+ * interrupt context and updating the preempt count. In this case,
Miroslav Benes 8e95b8
+ * a false positive will be found. To handle this, one extra recursion
Miroslav Benes 8e95b8
+ * is allowed, and this is done by the TRANSITION bit. If the TRANSITION
Miroslav Benes 8e95b8
+ * bit is already set, then it is considered a recursion and the function
Miroslav Benes 8e95b8
+ * ends. Otherwise, the TRANSITION bit is set, and that bit is returned.
Miroslav Benes 8e95b8
+ *
Miroslav Benes 8e95b8
+ * On the trace_recursive_unlock(), the TRANSITION bit will be the first
Miroslav Benes 8e95b8
+ * to be cleared. Even if it wasn't the context that set it. That is,
Miroslav Benes 8e95b8
+ * if an interrupt comes in while NORMAL bit is set and the ring buffer
Miroslav Benes 8e95b8
+ * is called before preempt_count() is updated, since the check will
Miroslav Benes 8e95b8
+ * be on the NORMAL bit, the TRANSITION bit will then be set. If an
Miroslav Benes 8e95b8
+ * NMI then comes in, it will set the NMI bit, but when the NMI code
Miroslav Benes 8e95b8
+ * does the trace_recursive_unlock() it will clear the TRANSTION bit
Miroslav Benes 8e95b8
+ * and leave the NMI bit set. But this is fine, because the interrupt
Miroslav Benes 8e95b8
+ * code that set the TRANSITION bit will then clear the NMI bit when it
Miroslav Benes 8e95b8
+ * calls trace_recursive_unlock(). If another NMI comes in, it will
Miroslav Benes 8e95b8
+ * set the TRANSITION bit and continue.
Miroslav Benes 8e95b8
+ *
Miroslav Benes 8e95b8
+ * Note: The TRANSITION bit only handles a single transition between context.
Miroslav Benes 8e95b8
  */
Miroslav Benes 8e95b8
 
Miroslav Benes 8e95b8
 static __always_inline int
Miroslav Benes 8e95b8
@@ -2599,8 +2625,16 @@ trace_recursive_lock(struct ring_buffer_
Miroslav Benes 8e95b8
 	} else
Miroslav Benes 8e95b8
 		bit = RB_CTX_NORMAL;
Miroslav Benes 8e95b8
 
Miroslav Benes 8e95b8
-	if (unlikely(val & (1 << bit)))
Miroslav Benes 8e95b8
-		return 1;
Miroslav Benes 8e95b8
+	if (unlikely(val & (1 << bit))) {
Miroslav Benes 8e95b8
+		/*
Miroslav Benes 8e95b8
+		 * It is possible that this was called by transitioning
Miroslav Benes 8e95b8
+		 * between interrupt context, and preempt_count() has not
Miroslav Benes 8e95b8
+		 * been updated yet. In this case, use the TRANSITION bit.
Miroslav Benes 8e95b8
+		 */
Miroslav Benes 8e95b8
+		bit = RB_CTX_TRANSITION;
Miroslav Benes 8e95b8
+		if (val & (1 << bit))
Miroslav Benes 8e95b8
+			return 1;
Miroslav Benes 8e95b8
+	}
Miroslav Benes 8e95b8
 
Miroslav Benes 8e95b8
 	val |= (1 << bit);
Miroslav Benes 8e95b8
 	cpu_buffer->current_context = val;