From 11752adb68a388724b1935d57bf543897c34d80b Mon Sep 17 00:00:00 2001
From: Waiman Long <longman@redhat.com>
Date: Tue, 7 Nov 2017 16:18:06 -0500
Subject: [PATCH] locking/pvqspinlock: Implement hybrid PV queued/unfair locks
Git-commit: 11752adb68a388724b1935d57bf543897c34d80b
Patch-mainline: v4.15-rc1
References: bsc#1050549

Currently, all the lock waiters entering the slowpath will do one
lock stealing attempt to acquire the lock. That helps performance,
especially in VMs with over-committed vCPUs. However, the current
pvqspinlocks still don't perform as well as unfair locks in many cases.
On the other hand, unfair locks do have the problem of lock starvation
that pvqspinlocks don't have.

This patch combines the best attributes of an unfair lock and a
pvqspinlock into a hybrid lock with two modes - queued mode and unfair
mode. A lock waiter goes into the unfair mode when there are waiters
in the wait queue but the pending bit isn't set. Otherwise, it will
go into the queued mode, waiting in the queue for its turn.

On a 2-socket 36-core E5-2699 v3 system (HT off), a kernel build
(make -j<n>) was done in a VM with unpinned vCPUs 3 times with the
best time selected, where <n> is the number of vCPUs available. The
build times of the original pvqspinlock, hybrid pvqspinlock and unfair
lock with various numbers of vCPUs are as follows:

  vCPUs    pvqlock    hybrid pvqlock    unfair lock
  -----    -------    --------------    -----------
    30     342.1s         329.1s          329.1s
    36     314.1s         305.3s          307.3s
    45     345.0s         302.1s          306.6s
    54     365.4s         308.6s          307.8s
    72     358.9s         293.6s          303.9s
   108     343.0s         285.9s          304.2s

The hybrid pvqspinlock performs better than or comparably to the
unfair lock.

By turning on QUEUED_LOCK_STAT, the table below shows the number
of lock acquisitions in unfair mode and queued mode after a kernel
build with various numbers of vCPUs.

  vCPUs    queued mode    unfair mode
  -----    -----------    -----------
    30      9,130,518        294,954
    36     10,856,614        386,809
    45      8,467,264     11,475,373
    54      6,409,987     19,670,855
    72      4,782,063     25,712,180

It can be seen that as the VM becomes more and more over-committed,
the ratio of locks acquired in unfair mode increases. This is all
done automatically to get the best possible overall performance.

Using a kernel locking microbenchmark with the number of locking
threads equal to the number of vCPUs available on the same machine,
the minimum, average and maximum (min/avg/max) numbers of locking
operations done per thread in a 5-second testing interval are shown
below:

  vCPUs    hybrid pvqlock             unfair lock
  -----    --------------             -----------
    36     822,135/881,063/950,363    75,570/313,496/  690,465
    54     542,435/581,664/625,937    35,460/204,280/  457,172
    72     397,500/428,177/499,299    17,933/150,679/  708,001
   108     257,898/288,150/340,871     3,085/181,176/1,257,109

It can be seen that the hybrid pvqspinlocks are fairer and more
performant than the unfair locks in this test.

The table below shows the kernel build times on a smaller 2-socket
16-core 32-thread E5-2620 v4 system.

  vCPUs    pvqlock    hybrid pvqlock    unfair lock
  -----    -------    --------------    -----------
    16     436.8s         433.4s          435.6s
    36     366.2s         364.8s          364.5s
    48     423.6s         376.3s          370.2s
    64     433.1s         376.6s          376.8s

Again, the performance of the hybrid pvqspinlock was comparable to
that of the unfair lock.

Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Eduardo Valentin <eduval@amazon.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1510089486-3466-1-git-send-email-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>

---
 kernel/locking/qspinlock_paravirt.h | 47 ++++++++++++++++++++++++++++++-------
 1 file changed, 38 insertions(+), 9 deletions(-)

diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 15b6a39366c6..6ee477765e6c 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -61,21 +61,50 @@ struct pv_node {
 #include "qspinlock_stat.h"
 
 /*
+ * Hybrid PV queued/unfair lock
+ *
  * By replacing the regular queued_spin_trylock() with the function below,
  * it will be called once when a lock waiter enter the PV slowpath before
- * being queued. By allowing one lock stealing attempt here when the pending
- * bit is off, it helps to reduce the performance impact of lock waiter
- * preemption without the drawback of lock starvation.
+ * being queued.
+ *
+ * The pending bit is set by the queue head vCPU of the MCS wait queue in
+ * pv_wait_head_or_lock() to signal that it is ready to spin on the lock.
+ * When that bit becomes visible to the incoming waiters, no lock stealing
+ * is allowed. The function will return immediately to make the waiters
+ * enter the MCS wait queue. So lock starvation shouldn't happen as long
+ * as the queued mode vCPUs are actively running to set the pending bit
+ * and hence disabling lock stealing.
+ *
+ * When the pending bit isn't set, the lock waiters will stay in the unfair
+ * mode spinning on the lock unless the MCS wait queue is empty. In this
+ * case, the lock waiters will enter the queued mode slowpath trying to
+ * become the queue head and set the pending bit.
+ *
+ * This hybrid PV queued/unfair lock combines the best attributes of a
+ * queued lock (no lock starvation) and an unfair lock (good performance
+ * on not heavily contended locks).
  */
-#define queued_spin_trylock(l)	pv_queued_spin_steal_lock(l)
-static inline bool pv_queued_spin_steal_lock(struct qspinlock *lock)
+#define queued_spin_trylock(l)	pv_hybrid_queued_unfair_trylock(l)
+static inline bool pv_hybrid_queued_unfair_trylock(struct qspinlock *lock)
 {
 	struct __qspinlock *l = (void *)lock;
 
-	if (!(atomic_read(&lock->val) & _Q_LOCKED_PENDING_MASK) &&
-	    (cmpxchg_acquire(&l->locked, 0, _Q_LOCKED_VAL) == 0)) {
-		qstat_inc(qstat_pv_lock_stealing, true);
-		return true;
+	/*
+	 * Stay in unfair lock mode as long as queued mode waiters are
+	 * present in the MCS wait queue but the pending bit isn't set.
+	 */
+	for (;;) {
+		int val = atomic_read(&lock->val);
+
+		if (!(val & _Q_LOCKED_PENDING_MASK) &&
+		    (cmpxchg_acquire(&l->locked, 0, _Q_LOCKED_VAL) == 0)) {
+			qstat_inc(qstat_pv_lock_stealing, true);
+			return true;
+		}
+		if (!(val & _Q_TAIL_MASK) || (val & _Q_PENDING_MASK))
+			break;
+
+		cpu_relax();
+	}
 
 	return false;
-- 
2.13.6