Date:      Sat, 28 Aug 2021 02:21:31 GMT
From:      Alexander Motin <mav@FreeBSD.org>
To:        src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-branches@FreeBSD.org
Subject:   git: 97e945e0fffc - stable/13 - sched_ule(4): Use trylock when stealing load.
Message-ID:  <202108280221.17S2LVT7052187@gitrepo.freebsd.org>

The branch stable/13 has been updated by mav:

URL: https://cgit.FreeBSD.org/src/commit/?id=97e945e0fffc18a06cd39cf6220dfaca32a25bbb

commit 97e945e0fffc18a06cd39cf6220dfaca32a25bbb
Author:     Alexander Motin <mav@FreeBSD.org>
AuthorDate: 2021-08-02 02:42:01 +0000
Commit:     Alexander Motin <mav@FreeBSD.org>
CommitDate: 2021-08-28 02:17:56 +0000

    sched_ule(4): Use trylock when stealing load.
    
    Under some load patterns it is possible for several CPUs to try to
    steal a thread from the same CPU despite the randomization
    introduced.  This can cause significant lock contention when an
    idle thread, while holding one queue lock, tries to acquire
    another.  Using a trylock on the remote queue both reduces the
    contention and makes lock ordering easier to handle.  If we can't
    get the lock inside tdq_trysteal(), we just return, letting
    tdq_idled() handle it.  If it happens in tdq_idled(), we repeat
    the search for load, skipping this CPU.
    
    On a 2-socket 80-thread Xeon system I observe a dramatic reduction
    in lock spinning time when doing random uncached 4KB reads from 12
    ZVOLs, while IOPS increase from 327K to 403K.
    
    MFC after:      1 month
    
    (cherry picked from commit 8bb173fb5bc33a02d5a4670c9a60bba0ece07bac)
---
 sys/kern/sched_ule.c | 39 +++++++++++++++++++++++++--------------
 1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/sys/kern/sched_ule.c b/sys/kern/sched_ule.c
index 55af36f40bc1..3185ae3a6e1f 100644
--- a/sys/kern/sched_ule.c
+++ b/sys/kern/sched_ule.c
@@ -300,6 +300,8 @@ static struct tdq	tdq_cpu;
 #define	TDQ_LOCK_ASSERT(t, type)	mtx_assert(TDQ_LOCKPTR((t)), (type))
 #define	TDQ_LOCK(t)		mtx_lock_spin(TDQ_LOCKPTR((t)))
 #define	TDQ_LOCK_FLAGS(t, f)	mtx_lock_spin_flags(TDQ_LOCKPTR((t)), (f))
+#define	TDQ_TRYLOCK(t)		mtx_trylock_spin(TDQ_LOCKPTR((t)))
+#define	TDQ_TRYLOCK_FLAGS(t, f)	mtx_trylock_spin_flags(TDQ_LOCKPTR((t)), (f))
 #define	TDQ_UNLOCK(t)		mtx_unlock_spin(TDQ_LOCKPTR((t)))
 #define	TDQ_LOCKPTR(t)		((struct mtx *)(&(t)->tdq_lock))
 
@@ -989,13 +991,22 @@ tdq_idled(struct tdq *tdq)
 		if (steal->tdq_load < steal_thresh ||
 		    steal->tdq_transferable == 0)
 			goto restart;
-		tdq_lock_pair(tdq, steal);
 		/*
-		 * We were assigned a thread while waiting for the locks.
-		 * Switch to it now instead of stealing a thread.
+		 * Try to lock both queues. If we are assigned a thread while
+		 * waiting for the lock, switch to it now instead of stealing.
+		 * If we can't get the lock, then somebody likely got there
+		 * first so continue searching.
 		 */
-		if (tdq->tdq_load)
-			break;
+		TDQ_LOCK(tdq);
+		if (tdq->tdq_load > 0) {
+			mi_switch(SW_VOL | SWT_IDLE);
+			return (0);
+		}
+		if (TDQ_TRYLOCK_FLAGS(steal, MTX_DUPOK) == 0) {
+			TDQ_UNLOCK(tdq);
+			CPU_CLR(cpu, &mask);
+			continue;
+		}
 		/*
 		 * The data returned by sched_highest() is stale and
 		 * the chosen CPU no longer has an eligible thread, or
@@ -1948,18 +1959,18 @@ tdq_trysteal(struct tdq *tdq)
 		if (steal->tdq_load < steal_thresh ||
 		    steal->tdq_transferable == 0)
 			continue;
-		tdq_lock_pair(tdq, steal);
 		/*
-		 * If we get to this point, unconditonally exit the loop
-		 * to bound the time spent in the critcal section.
-		 *
-		 * If a thread was added while interrupts were disabled don't
-		 * steal one here.
+		 * Try to lock both queues. If we are assigned a thread while
+		 * waiting for the lock, switch to it now instead of stealing.
+		 * If we can't get the lock, then somebody likely got there
+		 * first.  At this point unconditionally exit the loop to
+		 * bound the time spent in the critical section.
 		 */
-		if (tdq->tdq_load > 0) {
-			TDQ_UNLOCK(steal);
+		TDQ_LOCK(tdq);
+		if (tdq->tdq_load > 0)
+			break;
+		if (TDQ_TRYLOCK_FLAGS(steal, MTX_DUPOK) == 0)
 			break;
-		}
 		/*
 		 * The data returned by sched_highest() is stale and
                  * the chosen CPU no longer has an eligible thread.

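A minimal userland sketch of the steal-with-trylock pattern from the diff
above (not FreeBSD kernel code): pthread_mutex_trylock() stands in for
mtx_trylock_spin(), and the struct runq / steal_one() names are
hypothetical.  Note the return conventions differ: pthread_mutex_trylock()
returns 0 on success, while mtx_trylock_spin() returns non-zero on
success, which is why the diff tests TDQ_TRYLOCK_FLAGS() == 0 for failure.

#include <pthread.h>
#include <stdbool.h>

struct runq {
	pthread_mutex_t	lock;		/* protects 'load' */
	int		load;		/* number of queued tasks */
};

/*
 * Try to move one task from 'victim' to 'self'.  Returns false only
 * when the victim's lock was busy, meaning the caller should keep
 * searching other queues.  The trylock avoids both piling up on a
 * contended victim and the self-then-victim lock-order deadlock that
 * a blocking lock would risk in this simplified sketch.
 */
static bool
steal_one(struct runq *self, struct runq *victim)
{
	pthread_mutex_lock(&self->lock);
	if (self->load > 0) {
		/* Work arrived locally while searching; just run it. */
		pthread_mutex_unlock(&self->lock);
		return (true);
	}
	if (pthread_mutex_trylock(&victim->lock) != 0) {
		/* Somebody else likely got there first; look elsewhere. */
		pthread_mutex_unlock(&self->lock);
		return (false);
	}
	if (victim->load > 0) {
		victim->load--;
		self->load++;
	}
	pthread_mutex_unlock(&victim->lock);
	pthread_mutex_unlock(&self->lock);
	return (true);
}

In the actual patch, the equivalent of the false return is handled in
tdq_idled() by clearing the victim CPU from the search mask and
continuing, while tdq_trysteal() simply exits its loop.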

