Date:      Thu, 31 May 2018 15:58:28 +0000 (UTC)
From:      Mateusz Guzik <mjg@FreeBSD.org>
To:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-11@freebsd.org
Subject:   svn commit: r334437 - in stable/11/sys: kern sys
Message-ID:  <201805311558.w4VFwSL6065914@repo.freebsd.org>

Author: mjg
Date: Thu May 31 15:58:28 2018
New Revision: 334437
URL: https://svnweb.freebsd.org/changeset/base/334437

Log:
  MFC r329276,r329451,r330294,r330414,r330415,r330418,r331109,r332394,r332398,
      r333831:
  
      rwlock: diff-reduction of runlock compared to sx sunlock
  
  ==
  
      Undo LOCK_PROFILING pessimisation after r313454 and r313455
  
      With the option used to compile the kernel, both sx and rw shared ops
      would always go to the slow path, which added avoidable overhead even
      when the facility was disabled.
  
      Furthermore, the increased time spent doing uncontested shared lock
      acquires was bogusly added to the total wait time, somewhat skewing the
      results.
  
      Restore the old behaviour of taking the slow path only when profiling is
      enabled.
  
      This change is a no-op for kernels without LOCK_PROFILING (which is the
      default).
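
      A minimal sketch of the restored fast-path gating, taken from the
      __rw_rlock_int() hunk in the diff below (simplified, not verbatim):

          v = RW_READ_VALUE(rw);
          /* Drop into the hard path only if profiling is enabled or the
           * uncontested try fails; otherwise stay on the fast path. */
          if (__predict_false(LOCKSTAT_PROFILE_ENABLED(rw__acquire) ||
              !__rw_rlock_try(rw, td, &v, true LOCK_FILE_LINE_ARG)))
                  __rw_rlock_hard(rw, td, v LOCK_FILE_LINE_ARG);
          else
                  lock_profile_obtain_lock_success(&rw->lock_object, 0, 0,
                      file, line);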
  
  ==
  
      sx: fix adaptive spinning broken in r327397
  
      The condition was flipped.
  
      In particular, heavy multithreaded kernel builds on zfs started suffering
      due to nested sx locks.
  
      For instance make -s -j 128 buildkernel:
  
      before: 3326.67s user 1269.62s system 6981% cpu 1:05.84 total
      after: 3365.55s user 911.27s system 6871% cpu 1:02.24 total
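
      The flipped check and its fix, as applied in the kern_sx.c hunks below
      (sketch; SX_NOADAPTIVE marks locks that opted out of adaptive spinning):

          /* broken (r327397): spinning enabled only for NOADAPTIVE locks */
          adaptive = ((sx->lock_object.lo_flags & SX_NOADAPTIVE) != 0);
          /* fixed: spin unless adaptive spinning was explicitly disabled */
          adaptive = ((sx->lock_object.lo_flags & SX_NOADAPTIVE) == 0);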
  
  ==
  
      locks: fix a corner case in r327399
  
      If there were exactly rowner_retries/asx_retries (by default: 10) transitions
      between read and write state and the waiters still did not get the lock, the
      next owner -> reader transition would result in the code correctly falling
      back to turnstile/sleepq, where it would incorrectly think it was waiting
      for a writer and decide to leave the turnstile/sleepq to loop back. From that
      point on it would keep taking turnstile/sleepq trips until the lock was
      released.
  
      The bug sometimes manifested itself in stalls during -j 128 package builds.
  
      Refactor the code to fix the bug; while here, remove some of the gratuitous
      differences between rw and sx locks.
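
      A condensed sketch of the refactored bookkeeping (see the kern_rwlock.c
      and kern_sx.c hunks below); the spinner now records what it actually
      waited for, so the turnstile/sleepq code only bails out when the
      blocking writer has gone away:

          enum { READERS, WRITER } sleep_reason = READERS;

          /* while adaptively spinning */
          if (!(v & RW_LOCK_READ))
                  sleep_reason = WRITER;
          else if (RW_READERS(v) > 0)
                  sleep_reason = READERS;

          /* at the turnstile: only abandon it if we blocked on a writer
           * that has since gone away, not merely because readers exist */
          if (RW_READERS(v) > 0 && sleep_reason == WRITER) {
                  turnstile_cancel(ts);
                  continue;
          }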
  
  ==
  
      sx: don't do an atomic op in upgrade if it cannot succeed
  
      The code already pays the cost of reading the lock to obtain the waiters
      flag. Checking whether there is more than one reader is not a problem and
      avoids dirtying the line.
  
      This also fixes a small corner case: if waiters were to show up between
      reading the flag and upgrading the lock, the operation would fail even
      though it should not. No correctness change here though.
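
      The resulting upgrade loop, as it appears in the kern_sx.c hunk below:

          success = 0;
          x = SX_READ_VALUE(sx);
          for (;;) {
                  if (SX_SHARERS(x) > 1)
                          break;          /* cannot succeed, skip the atomic */
                  waiters = (x & SX_LOCK_EXCLUSIVE_WAITERS);
                  if (atomic_fcmpset_acq_ptr(&sx->sx_lock, &x,
                      (uintptr_t)curthread | waiters)) {
                          success = 1;
                          break;
                  }
          }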
  
  ==
  
      mtx: tidy up recursion handling in thread lock
  
      Normally, after grabbing the lock it has to be verified that we got the
      right one to begin with. However, if we are recursing, the lock must not
      have changed, so the check can be avoided. In particular this avoids a
      lock read for the non-recursing case which found out the lock was changed.
  
      While here, avoid an irq trip if this happens.
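
      Sketch of the recursion case in thread_lock_flags_() (see the
      kern_mutex.c hunk below):

          if (v == tid) {
                  m->mtx_recurse++;
                  /* Recursing, so td_lock cannot have changed under us;
                   * assert instead of re-reading and re-checking it. */
                  MPASS(m == td->td_lock);
                  break;
          }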
  
  ==
  
      locks: slightly depessimize lockstat
  
      The slow path is always taken when lockstat is enabled. This induces
      rdtsc (or other) calls to get the cycle count even when there was no
      contention.
  
      Still go to the slow path to not mess with the fast path, but avoid
      the heavy lifting unless necessary.
  
      This reduces sys and real time during -j 80 buildkernel:
      before: 3651.84s user 1105.59s system 5394% cpu 1:28.18 total
      after: 3685.99s user 975.74s system 5450% cpu 1:25.53 total
      disabled: 3697.96s user 411.13s system 5261% cpu 1:18.10 total
  
      Note that this is still a significant hit.
  
      LOCK_PROFILING results are not affected.
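
      Sketch of the new early bail-out in the slow path, following the
      __mtx_lock_sleep() hunk below:

          if (LOCKSTAT_PROFILE_ENABLED(adaptive__acquire)) {
                  /* Grab the lock first if we can; only failed attempts
                   * pay for lockstat_nsecs() (rdtsc) and the rest. */
                  while (v == MTX_UNOWNED) {
                          if (_mtx_obtain_lock_fetch(m, &v, tid))
                                  goto out_lockstat;
                  }
                  doing_lockprof = 1;
                  all_time -= lockstat_nsecs(&m->lock_object);
          }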
  
  ==
  
      rw: whack avoidable re-reads in try_upgrade
  
  ==
  
      locks: extend speculative spin waiting for readers to drain
  
      Now that 10 years have passed since the original limit of 10000 was
      committed, bump it a little bit.
  
      Spinning while waiting for a writer is semi-informed in the sense that we
      always know whether the owner is running and base the decision to spin on
      that. However, no such information is provided for read-locking. In
      particular this means that it is possible for a write-spinner to completely
      waste cpu time waiting for the lock to be released, while the reader holding
      it was preempted and is now waiting for the spinner to go off cpu.
  
      Nonetheless, in the majority of cases it is an improvement to spin instead
      of instantly giving up and going to sleep.
  
      The current approach is pretty simple: snatch the number of current readers
      and perform that many pauses before checking again. The total number of
      pauses to execute is limited to 10k. If the lock is still not free by
      that time, go to sleep.
  
      Given the previously noted problem of not knowing whether spinning makes
      any sense to begin with, the new limit has to remain rather conservative.
      But at the very least it should also be related to the machine. Waiting
      for writers uses parameters selected based on the number of activated
      hardware threads. The upper limit of pause instructions to be executed
      in between re-reads of the lock is typically 16384 or 32768; that value
      was selected as the limit on total spins. The lower bound is kept at the
      already present 10000 so as not to change it for smaller machines.
  
      Bumping the limit reduces system time by a few % during benchmarks like
      buildworld, buildkernel and others. Tested on 2- and 4-socket machines
      (Broadwell, Skylake).
  
      Figuring out how to make a more informed decision while not pessimizing
      the fast path is left as an exercise for the reader.
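
      Sketch of the scaled limit and the pause loop (sx shown; the rw path is
      analogous), following the kern_sx.c hunks below:

          /* sx_lock_delay_init(): tie the cap to the machine size */
          asx_retries = 10;
          asx_loops = max(10000, sx_delay.max);

          /* write-lock slow path: pause once per current reader, give up
           * after asx_loops pauses in total */
          for (i = 0; i < asx_loops; i += n) {
                  n = SX_SHARERS(x);
                  lock_delay_spin(n);
                  x = SX_READ_VALUE(sx);
                  /* (early-exit checks on the re-read value elided) */
          }
          if (i < asx_loops)
                  continue;       /* lock state changed, retry from the top */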
  
  ==
  
      fix uninitialized variable warning in reader locks
  
  Approved by:	re (marius)

Modified:
  stable/11/sys/kern/kern_mutex.c
  stable/11/sys/kern/kern_rwlock.c
  stable/11/sys/kern/kern_sx.c
  stable/11/sys/sys/lockstat.h

Modified: stable/11/sys/kern/kern_mutex.c
==============================================================================
--- stable/11/sys/kern/kern_mutex.c	Thu May 31 15:41:56 2018	(r334436)
+++ stable/11/sys/kern/kern_mutex.c	Thu May 31 15:58:28 2018	(r334437)
@@ -486,8 +486,25 @@ __mtx_lock_sleep(volatile uintptr_t *c, uintptr_t v)
 #if defined(KDTRACE_HOOKS) || defined(LOCK_PROFILING)
 	int doing_lockprof;
 #endif
+
 	td = curthread;
 	tid = (uintptr_t)td;
+	m = mtxlock2mtx(c);
+
+#ifdef KDTRACE_HOOKS
+	if (LOCKSTAT_PROFILE_ENABLED(adaptive__acquire)) {
+		while (v == MTX_UNOWNED) {
+			if (_mtx_obtain_lock_fetch(m, &v, tid))
+				goto out_lockstat;
+		}
+		doing_lockprof = 1;
+		all_time -= lockstat_nsecs(&m->lock_object);
+	}
+#endif
+#ifdef LOCK_PROFILING
+	doing_lockprof = 1;
+#endif
+
 	if (SCHEDULER_STOPPED_TD(td))
 		return;
 
@@ -496,7 +513,7 @@ __mtx_lock_sleep(volatile uintptr_t *c, uintptr_t v)
 #elif defined(KDTRACE_HOOKS)
 	lock_delay_arg_init(&lda, NULL);
 #endif
-	m = mtxlock2mtx(c);
+
 	if (__predict_false(v == MTX_UNOWNED))
 		v = MTX_READ_VALUE(m);
 
@@ -527,13 +544,6 @@ __mtx_lock_sleep(volatile uintptr_t *c, uintptr_t v)
 		CTR4(KTR_LOCK,
 		    "_mtx_lock_sleep: %s contested (lock=%p) at %s:%d",
 		    m->lock_object.lo_name, (void *)m->mtx_lock, file, line);
-#ifdef LOCK_PROFILING
-	doing_lockprof = 1;
-#elif defined(KDTRACE_HOOKS)
-	doing_lockprof = lockstat_enabled;
-	if (__predict_false(doing_lockprof))
-		all_time -= lockstat_nsecs(&m->lock_object);
-#endif
 
 	for (;;) {
 		if (v == MTX_UNOWNED) {
@@ -637,10 +647,6 @@ retry_turnstile:
 #endif
 #ifdef KDTRACE_HOOKS
 	all_time += lockstat_nsecs(&m->lock_object);
-#endif
-	LOCKSTAT_PROFILE_OBTAIN_LOCK_SUCCESS(adaptive__acquire, m, contested,
-	    waittime, file, line);
-#ifdef KDTRACE_HOOKS
 	if (sleep_time)
 		LOCKSTAT_RECORD1(adaptive__block, m, sleep_time);
 
@@ -649,7 +655,10 @@ retry_turnstile:
 	 */
 	if (lda.spin_cnt > sleep_cnt)
 		LOCKSTAT_RECORD1(adaptive__spin, m, all_time - sleep_time);
+out_lockstat:
 #endif
+	LOCKSTAT_PROFILE_OBTAIN_LOCK_SUCCESS(adaptive__acquire, m, contested,
+	    waittime, file, line);
 }
 
 #ifdef SMP
@@ -685,6 +694,20 @@ _mtx_lock_spin_cookie(volatile uintptr_t *c, uintptr_t
 	tid = (uintptr_t)curthread;
 	m = mtxlock2mtx(c);
 
+#ifdef KDTRACE_HOOKS
+	if (LOCKSTAT_PROFILE_ENABLED(adaptive__acquire)) {
+		while (v == MTX_UNOWNED) {
+			if (_mtx_obtain_lock_fetch(m, &v, tid))
+				goto out_lockstat;
+		}
+		doing_lockprof = 1;
+		spin_time -= lockstat_nsecs(&m->lock_object);
+	}
+#endif
+#ifdef LOCK_PROFILING
+	doing_lockprof = 1;
+#endif
+
 	if (__predict_false(v == MTX_UNOWNED))
 		v = MTX_READ_VALUE(m);
 
@@ -707,13 +730,7 @@ _mtx_lock_spin_cookie(volatile uintptr_t *c, uintptr_t
 	PMC_SOFT_CALL( , , lock, failed);
 #endif
 	lock_profile_obtain_lock_failed(&m->lock_object, &contested, &waittime);
-#ifdef LOCK_PROFILING
-	doing_lockprof = 1;
-#elif defined(KDTRACE_HOOKS)
-	doing_lockprof = lockstat_enabled;
-	if (__predict_false(doing_lockprof))
-		spin_time -= lockstat_nsecs(&m->lock_object);
-#endif
+
 	for (;;) {
 		if (v == MTX_UNOWNED) {
 			if (_mtx_obtain_lock_fetch(m, &v, tid))
@@ -744,13 +761,12 @@ _mtx_lock_spin_cookie(volatile uintptr_t *c, uintptr_t
 #endif
 #ifdef KDTRACE_HOOKS
 	spin_time += lockstat_nsecs(&m->lock_object);
-#endif
-	LOCKSTAT_PROFILE_OBTAIN_LOCK_SUCCESS(spin__acquire, m,
-	    contested, waittime, file, line);
-#ifdef KDTRACE_HOOKS
 	if (lda.spin_cnt != 0)
 		LOCKSTAT_RECORD1(spin__spin, m, spin_time);
+out_lockstat:
 #endif
+	LOCKSTAT_PROFILE_OBTAIN_LOCK_SUCCESS(spin__acquire, m,
+	    contested, waittime, file, line);
 }
 #endif /* SMP */
 
@@ -806,10 +822,8 @@ _thread_lock(struct thread *td)
 		WITNESS_LOCK(&m->lock_object, LOP_EXCLUSIVE, file, line);
 		return;
 	}
-	if (m->mtx_recurse != 0)
-		m->mtx_recurse--;
-	else
-		_mtx_release_lock_quick(m);
+	MPASS(m->mtx_recurse == 0);
+	_mtx_release_lock_quick(m);
 slowpath_unlocked:
 	spinlock_exit();
 slowpath_noirq:
@@ -863,9 +877,10 @@ thread_lock_flags_(struct thread *td, int opts, const 
 	if (__predict_false(doing_lockprof))
 		spin_time -= lockstat_nsecs(&td->td_lock->lock_object);
 #endif
+	spinlock_enter();
+
 	for (;;) {
 retry:
-		spinlock_enter();
 		m = td->td_lock;
 		thread_lock_validate(m, opts, file, line);
 		v = MTX_READ_VALUE(m);
@@ -877,6 +892,7 @@ retry:
 			}
 			if (v == tid) {
 				m->mtx_recurse++;
+				MPASS(m == td->td_lock);
 				break;
 			}
 			lock_profile_obtain_lock_failed(&m->lock_object,
@@ -889,15 +905,18 @@ retry:
 				} else {
 					_mtx_lock_indefinite_check(m, &lda);
 				}
-				if (m != td->td_lock)
+				if (m != td->td_lock) {
+					spinlock_enter();
 					goto retry;
+				}
 				v = MTX_READ_VALUE(m);
 			} while (v != MTX_UNOWNED);
 			spinlock_enter();
 		}
 		if (m == td->td_lock)
 			break;
-		__mtx_unlock_spin(m);	/* does spinlock_exit() */
+		MPASS(m->mtx_recurse == 0);
+		_mtx_release_lock_quick(m);
 	}
 	LOCK_LOG_LOCK("LOCK", &m->lock_object, opts, m->mtx_recurse, file,
 	    line);

Modified: stable/11/sys/kern/kern_rwlock.c
==============================================================================
--- stable/11/sys/kern/kern_rwlock.c	Thu May 31 15:41:56 2018	(r334436)
+++ stable/11/sys/kern/kern_rwlock.c	Thu May 31 15:58:28 2018	(r334437)
@@ -93,8 +93,8 @@ struct lock_class lock_class_rw = {
 };
 
 #ifdef ADAPTIVE_RWLOCKS
-static int __read_frequently rowner_retries = 10;
-static int __read_frequently rowner_loops = 10000;
+static int __read_frequently rowner_retries;
+static int __read_frequently rowner_loops;
 static SYSCTL_NODE(_debug, OID_AUTO, rwlock, CTLFLAG_RD, NULL,
     "rwlock debugging");
 SYSCTL_INT(_debug_rwlock, OID_AUTO, retry, CTLFLAG_RW, &rowner_retries, 0, "");
@@ -107,7 +107,15 @@ SYSCTL_INT(_debug_rwlock, OID_AUTO, delay_base, CTLFLA
 SYSCTL_INT(_debug_rwlock, OID_AUTO, delay_max, CTLFLAG_RW, &rw_delay.max,
     0, "");
 
-LOCK_DELAY_SYSINIT_DEFAULT(rw_delay);
+static void
+rw_lock_delay_init(void *arg __unused)
+{
+
+	lock_delay_default_init(&rw_delay);
+	rowner_retries = 10;
+	rowner_loops = max(10000, rw_delay.max);
+}
+LOCK_DELAY_SYSINIT(rw_lock_delay_init);
 #endif
 
 /*
@@ -436,9 +444,23 @@ __rw_rlock_hard(struct rwlock *rw, struct thread *td, 
 #endif
 #if defined(KDTRACE_HOOKS) || defined(LOCK_PROFILING)
 	uintptr_t state;
-	int doing_lockprof;
+	int doing_lockprof = 0;
 #endif
 
+#ifdef KDTRACE_HOOKS
+	if (LOCKSTAT_PROFILE_ENABLED(rw__acquire)) {
+		if (__rw_rlock_try(rw, td, &v, false LOCK_FILE_LINE_ARG))
+			goto out_lockstat;
+		doing_lockprof = 1;
+		all_time -= lockstat_nsecs(&rw->lock_object);
+		state = v;
+	}
+#endif
+#ifdef LOCK_PROFILING
+	doing_lockprof = 1;
+	state = v;
+#endif
+
 	if (SCHEDULER_STOPPED())
 		return;
 
@@ -454,17 +476,6 @@ __rw_rlock_hard(struct rwlock *rw, struct thread *td, 
 	lock_profile_obtain_lock_failed(&rw->lock_object,
 	    &contested, &waittime);
 
-#ifdef LOCK_PROFILING
-	doing_lockprof = 1;
-	state = v;
-#elif defined(KDTRACE_HOOKS)
-	doing_lockprof = lockstat_enabled;
-	if (__predict_false(doing_lockprof)) {
-		all_time -= lockstat_nsecs(&rw->lock_object);
-		state = v;
-	}
-#endif
-
 	for (;;) {
 		if (__rw_rlock_try(rw, td, &v, false LOCK_FILE_LINE_ARG))
 			break;
@@ -613,6 +624,7 @@ retry_ts:
 		LOCKSTAT_RECORD4(rw__spin, rw, all_time - sleep_time,
 		    LOCKSTAT_READER, (state & RW_LOCK_READ) == 0,
 		    (state & RW_LOCK_READ) == 0 ? 0 : RW_READERS(state));
+out_lockstat:
 #endif
 	/*
 	 * TODO: acquire "owner of record" here.  Here be turnstile dragons
@@ -643,9 +655,12 @@ __rw_rlock_int(struct rwlock *rw LOCK_FILE_LINE_ARG_DE
 	WITNESS_CHECKORDER(&rw->lock_object, LOP_NEWORDER, file, line, NULL);
 
 	v = RW_READ_VALUE(rw);
-	if (__predict_false(LOCKSTAT_OOL_PROFILE_ENABLED(rw__acquire) ||
+	if (__predict_false(LOCKSTAT_PROFILE_ENABLED(rw__acquire) ||
 	    !__rw_rlock_try(rw, td, &v, true LOCK_FILE_LINE_ARG)))
 		__rw_rlock_hard(rw, td, v LOCK_FILE_LINE_ARG);
+	else
+		lock_profile_obtain_lock_success(&rw->lock_object, 0, 0,
+		    file, line);
 
 	LOCK_LOG_LOCK("RLOCK", &rw->lock_object, 0, 0, file, line);
 	WITNESS_LOCK(&rw->lock_object, 0, file, line);
@@ -758,22 +773,19 @@ __rw_runlock_hard(struct rwlock *rw, struct thread *td
 	if (SCHEDULER_STOPPED())
 		return;
 
+	if (__rw_runlock_try(rw, td, &v))
+		goto out_lockstat;
+
+	/*
+	 * Ok, we know we have waiters and we think we are the
+	 * last reader, so grab the turnstile lock.
+	 */
+	turnstile_chain_lock(&rw->lock_object);
+	v = RW_READ_VALUE(rw);
 	for (;;) {
 		if (__rw_runlock_try(rw, td, &v))
 			break;
 
-		/*
-		 * Ok, we know we have waiters and we think we are the
-		 * last reader, so grab the turnstile lock.
-		 */
-		turnstile_chain_lock(&rw->lock_object);
-		v = RW_READ_VALUE(rw);
-retry_ts:
-		if (__rw_runlock_try(rw, td, &v)) {
-			turnstile_chain_unlock(&rw->lock_object);
-			break;
-		}
-
 		v &= (RW_LOCK_WAITERS | RW_LOCK_WRITE_SPINNER);
 		MPASS(v & RW_LOCK_WAITERS);
 
@@ -801,7 +813,7 @@ retry_ts:
 		}
 		v |= RW_READERS_LOCK(1);
 		if (!atomic_fcmpset_rel_ptr(&rw->rw_lock, &v, setv))
-			goto retry_ts;
+			continue;
 		if (LOCK_LOG_TEST(&rw->lock_object, 0))
 			CTR2(KTR_LOCK, "%s: %p last succeeded with waiters",
 			    __func__, rw);
@@ -817,10 +829,11 @@ retry_ts:
 		MPASS(ts != NULL);
 		turnstile_broadcast(ts, queue);
 		turnstile_unpend(ts, TS_SHARED_LOCK);
-		turnstile_chain_unlock(&rw->lock_object);
 		td->td_rw_rlocks--;
 		break;
 	}
+	turnstile_chain_unlock(&rw->lock_object);
+out_lockstat:
 	LOCKSTAT_PROFILE_RELEASE_RWLOCK(rw__release, rw, LOCKSTAT_READER);
 }
 
@@ -839,9 +852,11 @@ _rw_runlock_cookie_int(struct rwlock *rw LOCK_FILE_LIN
 	td = curthread;
 	v = RW_READ_VALUE(rw);
 
-	if (__predict_false(LOCKSTAT_OOL_PROFILE_ENABLED(rw__release) ||
+	if (__predict_false(LOCKSTAT_PROFILE_ENABLED(rw__release) ||
 	    !__rw_runlock_try(rw, td, &v)))
 		__rw_runlock_hard(rw, td, v LOCK_FILE_LINE_ARG);
+	else
+		lock_profile_release_lock(&rw->lock_object);
 
 	TD_LOCKS_DEC(curthread);
 }
@@ -870,7 +885,7 @@ __rw_wlock_hard(volatile uintptr_t *c, uintptr_t v LOC
 #ifdef ADAPTIVE_RWLOCKS
 	int spintries = 0;
 	int i, n;
-	int sleep_reason = 0;
+	enum { READERS, WRITER } sleep_reason = READERS;
 #endif
 	uintptr_t x;
 #ifdef LOCK_PROFILING
@@ -887,10 +902,28 @@ __rw_wlock_hard(volatile uintptr_t *c, uintptr_t v LOC
 #endif
 #if defined(KDTRACE_HOOKS) || defined(LOCK_PROFILING)
 	uintptr_t state;
-	int doing_lockprof;
+	int doing_lockprof = 0;
 #endif
 
 	tid = (uintptr_t)curthread;
+	rw = rwlock2rw(c);
+
+#ifdef KDTRACE_HOOKS
+	if (LOCKSTAT_PROFILE_ENABLED(rw__acquire)) {
+		while (v == RW_UNLOCKED) {
+			if (_rw_write_lock_fetch(rw, &v, tid))
+				goto out_lockstat;
+		}
+		doing_lockprof = 1;
+		all_time -= lockstat_nsecs(&rw->lock_object);
+		state = v;
+	}
+#endif
+#ifdef LOCK_PROFILING
+	doing_lockprof = 1;
+	state = v;
+#endif
+
 	if (SCHEDULER_STOPPED())
 		return;
 
@@ -899,7 +932,6 @@ __rw_wlock_hard(volatile uintptr_t *c, uintptr_t v LOC
 #elif defined(KDTRACE_HOOKS)
 	lock_delay_arg_init(&lda, NULL);
 #endif
-	rw = rwlock2rw(c);
 	if (__predict_false(v == RW_UNLOCKED))
 		v = RW_READ_VALUE(rw);
 
@@ -924,17 +956,6 @@ __rw_wlock_hard(volatile uintptr_t *c, uintptr_t v LOC
 	lock_profile_obtain_lock_failed(&rw->lock_object,
 	    &contested, &waittime);
 
-#ifdef LOCK_PROFILING
-	doing_lockprof = 1;
-	state = v;
-#elif defined(KDTRACE_HOOKS)
-	doing_lockprof = lockstat_enabled;
-	if (__predict_false(doing_lockprof)) {
-		all_time -= lockstat_nsecs(&rw->lock_object);
-		state = v;
-	}
-#endif
-
 	for (;;) {
 		if (v == RW_UNLOCKED) {
 			if (_rw_write_lock_fetch(rw, &v, tid))
@@ -951,9 +972,11 @@ __rw_wlock_hard(volatile uintptr_t *c, uintptr_t v LOC
 		 * running on another CPU, spin until the owner stops
 		 * running or the state of the lock changes.
 		 */
-		sleep_reason = 1;
-		owner = lv_rw_wowner(v);
-		if (!(v & RW_LOCK_READ) && TD_IS_RUNNING(owner)) {
+		if (!(v & RW_LOCK_READ)) {
+			sleep_reason = WRITER;
+			owner = lv_rw_wowner(v);
+			if (!TD_IS_RUNNING(owner))
+				goto ts;
 			if (LOCK_LOG_TEST(&rw->lock_object, 0))
 				CTR3(KTR_LOCK, "%s: spinning on %p held by %p",
 				    __func__, rw, owner);
@@ -968,9 +991,10 @@ __rw_wlock_hard(volatile uintptr_t *c, uintptr_t v LOC
 			KTR_STATE0(KTR_SCHED, "thread", sched_tdname(curthread),
 			    "running");
 			continue;
-		}
-		if ((v & RW_LOCK_READ) && RW_READERS(v) &&
-		    spintries < rowner_retries) {
+		} else if (RW_READERS(v) > 0) {
+			sleep_reason = READERS;
+			if (spintries == rowner_retries)
+				goto ts;
 			if (!(v & RW_LOCK_WRITE_SPINNER)) {
 				if (!atomic_fcmpset_ptr(&rw->rw_lock, &v,
 				    v | RW_LOCK_WRITE_SPINNER)) {
@@ -988,15 +1012,15 @@ __rw_wlock_hard(volatile uintptr_t *c, uintptr_t v LOC
 				if ((v & RW_LOCK_WRITE_SPINNER) == 0)
 					break;
 			}
-			KTR_STATE0(KTR_SCHED, "thread", sched_tdname(curthread),
-			    "running");
 #ifdef KDTRACE_HOOKS
-			lda.spin_cnt += rowner_loops - i;
+			lda.spin_cnt += i;
 #endif
+			KTR_STATE0(KTR_SCHED, "thread", sched_tdname(curthread),
+			    "running");
 			if (i < rowner_loops)
 				continue;
-			sleep_reason = 2;
 		}
+ts:
 #endif
 		ts = turnstile_trywait(&rw->lock_object);
 		v = RW_READ_VALUE(rw);
@@ -1016,7 +1040,7 @@ retry_ts:
 				turnstile_cancel(ts);
 				continue;
 			}
-		} else if (RW_READERS(v) > 0 && sleep_reason == 1) {
+		} else if (RW_READERS(v) > 0 && sleep_reason == WRITER) {
 			turnstile_cancel(ts);
 			continue;
 		}
@@ -1093,6 +1117,7 @@ retry_ts:
 		LOCKSTAT_RECORD4(rw__spin, rw, all_time - sleep_time,
 		    LOCKSTAT_WRITER, (state & RW_LOCK_READ) == 0,
 		    (state & RW_LOCK_READ) == 0 ? 0 : RW_READERS(state));
+out_lockstat:
 #endif
 	LOCKSTAT_PROFILE_OBTAIN_RWLOCK_SUCCESS(rw__acquire, rw, contested,
 	    waittime, file, line, LOCKSTAT_WRITER);
@@ -1185,7 +1210,7 @@ __rw_wunlock_hard(volatile uintptr_t *c, uintptr_t v L
 int
 __rw_try_upgrade_int(struct rwlock *rw LOCK_FILE_LINE_ARG_DEF)
 {
-	uintptr_t v, x, tid;
+	uintptr_t v, setv, tid;
 	struct turnstile *ts;
 	int success;
 
@@ -1205,12 +1230,12 @@ __rw_try_upgrade_int(struct rwlock *rw LOCK_FILE_LINE_
 	 */
 	tid = (uintptr_t)curthread;
 	success = 0;
+	v = RW_READ_VALUE(rw);
 	for (;;) {
-		v = rw->rw_lock;
 		if (RW_READERS(v) > 1)
 			break;
 		if (!(v & RW_LOCK_WAITERS)) {
-			success = atomic_cmpset_acq_ptr(&rw->rw_lock, v, tid);
+			success = atomic_fcmpset_acq_ptr(&rw->rw_lock, &v, tid);
 			if (!success)
 				continue;
 			break;
@@ -1220,7 +1245,8 @@ __rw_try_upgrade_int(struct rwlock *rw LOCK_FILE_LINE_
 		 * Ok, we think we have waiters, so lock the turnstile.
 		 */
 		ts = turnstile_trywait(&rw->lock_object);
-		v = rw->rw_lock;
+		v = RW_READ_VALUE(rw);
+retry_ts:
 		if (RW_READERS(v) > 1) {
 			turnstile_cancel(ts);
 			break;
@@ -1231,16 +1257,16 @@ __rw_try_upgrade_int(struct rwlock *rw LOCK_FILE_LINE_
 		 * If we obtain the lock with the flags set, then claim
 		 * ownership of the turnstile.
 		 */
-		x = rw->rw_lock & RW_LOCK_WAITERS;
-		success = atomic_cmpset_ptr(&rw->rw_lock, v, tid | x);
+		setv = tid | (v & RW_LOCK_WAITERS);
+		success = atomic_fcmpset_ptr(&rw->rw_lock, &v, setv);
 		if (success) {
-			if (x)
+			if (v & RW_LOCK_WAITERS)
 				turnstile_claim(ts);
 			else
 				turnstile_cancel(ts);
 			break;
 		}
-		turnstile_cancel(ts);
+		goto retry_ts;
 	}
 	LOCK_LOG_TRY("WUPGRADE", &rw->lock_object, 0, success, file, line);
 	if (success) {

Modified: stable/11/sys/kern/kern_sx.c
==============================================================================
--- stable/11/sys/kern/kern_sx.c	Thu May 31 15:41:56 2018	(r334436)
+++ stable/11/sys/kern/kern_sx.c	Thu May 31 15:58:28 2018	(r334437)
@@ -143,8 +143,8 @@ struct lock_class lock_class_sx = {
 #endif
 
 #ifdef ADAPTIVE_SX
-static __read_frequently u_int asx_retries = 10;
-static __read_frequently u_int asx_loops = 10000;
+static __read_frequently u_int asx_retries;
+static __read_frequently u_int asx_loops;
 static SYSCTL_NODE(_debug, OID_AUTO, sx, CTLFLAG_RD, NULL, "sxlock debugging");
 SYSCTL_UINT(_debug_sx, OID_AUTO, retries, CTLFLAG_RW, &asx_retries, 0, "");
 SYSCTL_UINT(_debug_sx, OID_AUTO, loops, CTLFLAG_RW, &asx_loops, 0, "");
@@ -156,7 +156,15 @@ SYSCTL_INT(_debug_sx, OID_AUTO, delay_base, CTLFLAG_RW
 SYSCTL_INT(_debug_sx, OID_AUTO, delay_max, CTLFLAG_RW, &sx_delay.max,
     0, "");
 
-LOCK_DELAY_SYSINIT_DEFAULT(sx_delay);
+static void
+sx_lock_delay_init(void *arg __unused)
+{
+
+	lock_delay_default_init(&sx_delay);
+	asx_retries = 10;
+	asx_loops = max(10000, sx_delay.max);
+}
+LOCK_DELAY_SYSINIT(sx_lock_delay_init);
 #endif
 
 void
@@ -411,6 +419,7 @@ int
 sx_try_upgrade_int(struct sx *sx LOCK_FILE_LINE_ARG_DEF)
 {
 	uintptr_t x;
+	uintptr_t waiters;
 	int success;
 
 	if (SCHEDULER_STOPPED())
@@ -425,9 +434,18 @@ sx_try_upgrade_int(struct sx *sx LOCK_FILE_LINE_ARG_DE
 	 * to maintain the SX_LOCK_EXCLUSIVE_WAITERS flag if set so that
 	 * we will wake up the exclusive waiters when we drop the lock.
 	 */
-	x = sx->sx_lock & SX_LOCK_EXCLUSIVE_WAITERS;
-	success = atomic_cmpset_acq_ptr(&sx->sx_lock, SX_SHARERS_LOCK(1) | x,
-	    (uintptr_t)curthread | x);
+	success = 0;
+	x = SX_READ_VALUE(sx);
+	for (;;) {
+		if (SX_SHARERS(x) > 1)
+			break;
+		waiters = (x & SX_LOCK_EXCLUSIVE_WAITERS);
+		if (atomic_fcmpset_acq_ptr(&sx->sx_lock, &x,
+		    (uintptr_t)curthread | waiters)) {
+			success = 1;
+			break;
+		}
+	}
 	LOCK_LOG_TRY("XUPGRADE", &sx->lock_object, 0, success, file, line);
 	if (success) {
 		WITNESS_UPGRADE(&sx->lock_object, LOP_EXCLUSIVE | LOP_TRYLOCK,
@@ -531,8 +549,8 @@ _sx_xlock_hard(struct sx *sx, uintptr_t x, int opts LO
 #ifdef ADAPTIVE_SX
 	volatile struct thread *owner;
 	u_int i, n, spintries = 0;
+	enum { READERS, WRITER } sleep_reason = READERS;
 	bool adaptive;
-	int sleep_reason = 0;
 #endif
 #ifdef LOCK_PROFILING
 	uint64_t waittime = 0;
@@ -548,11 +566,28 @@ _sx_xlock_hard(struct sx *sx, uintptr_t x, int opts LO
 	int64_t all_time = 0;
 #endif
 #if defined(KDTRACE_HOOKS) || defined(LOCK_PROFILING)
-	uintptr_t state;
+	uintptr_t state = 0;
 #endif
 	int extra_work = 0;
 
 	tid = (uintptr_t)curthread;
+
+#ifdef KDTRACE_HOOKS
+	if (LOCKSTAT_PROFILE_ENABLED(sx__acquire)) {
+		while (x == SX_LOCK_UNLOCKED) {
+			if (atomic_fcmpset_acq_ptr(&sx->sx_lock, &x, tid))
+				goto out_lockstat;
+		}
+		extra_work = 1;
+		all_time -= lockstat_nsecs(&sx->lock_object);
+		state = x;
+	}
+#endif
+#ifdef LOCK_PROFILING
+	extra_work = 1;
+	state = x;
+#endif
+
 	if (SCHEDULER_STOPPED())
 		return (0);
 
@@ -582,7 +617,7 @@ _sx_xlock_hard(struct sx *sx, uintptr_t x, int opts LO
 		    sx->lock_object.lo_name, (void *)sx->sx_lock, file, line);
 
 #ifdef ADAPTIVE_SX
-	adaptive = ((sx->lock_object.lo_flags & SX_NOADAPTIVE) != 0);
+	adaptive = ((sx->lock_object.lo_flags & SX_NOADAPTIVE) == 0);
 #endif
 
 #ifdef HWPMC_HOOKS
@@ -591,16 +626,6 @@ _sx_xlock_hard(struct sx *sx, uintptr_t x, int opts LO
 	lock_profile_obtain_lock_failed(&sx->lock_object, &contested,
 	    &waittime);
 
-#ifdef LOCK_PROFILING
-	extra_work = 1;
-	state = x;
-#elif defined(KDTRACE_HOOKS)
-	extra_work = lockstat_enabled;
-	if (__predict_false(extra_work)) {
-		all_time -= lockstat_nsecs(&sx->lock_object);
-		state = x;
-	}
-#endif
 #ifndef INVARIANTS
 	GIANT_SAVE(extra_work);
 #endif
@@ -626,37 +651,33 @@ _sx_xlock_hard(struct sx *sx, uintptr_t x, int opts LO
 		 * running or the state of the lock changes.
 		 */
 		if ((x & SX_LOCK_SHARED) == 0) {
+			sleep_reason = WRITER;
 			owner = lv_sx_owner(x);
-			if (TD_IS_RUNNING(owner)) {
-				if (LOCK_LOG_TEST(&sx->lock_object, 0))
-					CTR3(KTR_LOCK,
-				    "%s: spinning on %p held by %p",
-					    __func__, sx, owner);
-				KTR_STATE1(KTR_SCHED, "thread",
-				    sched_tdname(curthread), "spinning",
-				    "lockname:\"%s\"",
-				    sx->lock_object.lo_name);
-				do {
-					lock_delay(&lda);
-					x = SX_READ_VALUE(sx);
-					owner = lv_sx_owner(x);
-				} while (owner != NULL &&
-					    TD_IS_RUNNING(owner));
-				KTR_STATE0(KTR_SCHED, "thread",
-				    sched_tdname(curthread), "running");
-				continue;
-			}
-			sleep_reason = 1;
-		} else if (SX_SHARERS(x) && spintries < asx_retries) {
-			KTR_STATE1(KTR_SCHED, "thread",
-			    sched_tdname(curthread), "spinning",
-			    "lockname:\"%s\"", sx->lock_object.lo_name);
+			if (!TD_IS_RUNNING(owner))
+				goto sleepq;
+			if (LOCK_LOG_TEST(&sx->lock_object, 0))
+				CTR3(KTR_LOCK, "%s: spinning on %p held by %p",
+				    __func__, sx, owner);
+			KTR_STATE1(KTR_SCHED, "thread", sched_tdname(curthread),
+			    "spinning", "lockname:\"%s\"",
+			    sx->lock_object.lo_name);
+			do {
+				lock_delay(&lda);
+				x = SX_READ_VALUE(sx);
+				owner = lv_sx_owner(x);
+			} while (owner != NULL && TD_IS_RUNNING(owner));
+			KTR_STATE0(KTR_SCHED, "thread", sched_tdname(curthread),
+			    "running");
+			continue;
+		} else if (SX_SHARERS(x) > 0) {
+			sleep_reason = READERS;
+			if (spintries == asx_retries)
+				goto sleepq;
 			spintries++;
+			KTR_STATE1(KTR_SCHED, "thread", sched_tdname(curthread),
+			    "spinning", "lockname:\"%s\"",
+			    sx->lock_object.lo_name);
 			for (i = 0; i < asx_loops; i += n) {
-				if (LOCK_LOG_TEST(&sx->lock_object, 0))
-					CTR4(KTR_LOCK,
-			    "%s: shared spinning on %p with %u and %u",
-					    __func__, sx, spintries, i);
 				n = SX_SHARERS(x);
 				lock_delay_spin(n);
 				x = SX_READ_VALUE(sx);
@@ -667,11 +688,10 @@ _sx_xlock_hard(struct sx *sx, uintptr_t x, int opts LO
 #ifdef KDTRACE_HOOKS
 			lda.spin_cnt += i;
 #endif
-			KTR_STATE0(KTR_SCHED, "thread",
-			    sched_tdname(curthread), "running");
+			KTR_STATE0(KTR_SCHED, "thread", sched_tdname(curthread),
+			    "running");
 			if (i < asx_loops)
 				continue;
-			sleep_reason = 2;
 		}
 sleepq:
 #endif
@@ -703,7 +723,7 @@ retry_sleepq:
 					sleepq_release(&sx->lock_object);
 					continue;
 				}
-			} else if (SX_SHARERS(x) > 0 && sleep_reason == 1) {
+			} else if (SX_SHARERS(x) > 0 && sleep_reason == WRITER) {
 				sleepq_release(&sx->lock_object);
 				continue;
 			}
@@ -793,6 +813,7 @@ retry_sleepq:
 		LOCKSTAT_RECORD4(sx__spin, sx, all_time - sleep_time,
 		    LOCKSTAT_WRITER, (state & SX_LOCK_SHARED) == 0,
 		    (state & SX_LOCK_SHARED) == 0 ? 0 : SX_SHARERS(state));
+out_lockstat:
 #endif
 	if (!error)
 		LOCKSTAT_PROFILE_OBTAIN_RWLOCK_SUCCESS(sx__acquire, sx,
@@ -921,10 +942,24 @@ _sx_slock_hard(struct sx *sx, int opts, uintptr_t x LO
 	int64_t all_time = 0;
 #endif
 #if defined(KDTRACE_HOOKS) || defined(LOCK_PROFILING)
-	uintptr_t state;
+	uintptr_t state = 0;
 #endif
 	int extra_work = 0;
 
+#ifdef KDTRACE_HOOKS
+	if (LOCKSTAT_PROFILE_ENABLED(sx__acquire)) {
+		if (__sx_slock_try(sx, &x LOCK_FILE_LINE_ARG))
+			goto out_lockstat;
+		extra_work = 1;
+		all_time -= lockstat_nsecs(&sx->lock_object);
+		state = x;
+	}
+#endif
+#ifdef LOCK_PROFILING
+	extra_work = 1;
+	state = x;
+#endif
+
 	if (SCHEDULER_STOPPED())
 		return (0);
 
@@ -935,7 +970,7 @@ _sx_slock_hard(struct sx *sx, int opts, uintptr_t x LO
 #endif
 
 #ifdef ADAPTIVE_SX
-	adaptive = ((sx->lock_object.lo_flags & SX_NOADAPTIVE) != 0);
+	adaptive = ((sx->lock_object.lo_flags & SX_NOADAPTIVE) == 0);
 #endif
 
 #ifdef HWPMC_HOOKS
@@ -944,16 +979,6 @@ _sx_slock_hard(struct sx *sx, int opts, uintptr_t x LO
 	lock_profile_obtain_lock_failed(&sx->lock_object, &contested,
 	    &waittime);
 
-#ifdef LOCK_PROFILING
-	extra_work = 1;
-	state = x;
-#elif defined(KDTRACE_HOOKS)
-	extra_work = lockstat_enabled;
-	if (__predict_false(extra_work)) {
-		all_time -= lockstat_nsecs(&sx->lock_object);
-		state = x;
-	}
-#endif
 #ifndef INVARIANTS
 	GIANT_SAVE(extra_work);
 #endif
@@ -1095,6 +1120,7 @@ retry_sleepq:
 		LOCKSTAT_RECORD4(sx__spin, sx, all_time - sleep_time,
 		    LOCKSTAT_READER, (state & SX_LOCK_SHARED) == 0,
 		    (state & SX_LOCK_SHARED) == 0 ? 0 : SX_SHARERS(state));
+out_lockstat:
 #endif
 	if (error == 0) {
 		LOCKSTAT_PROFILE_OBTAIN_RWLOCK_SUCCESS(sx__acquire, sx,
@@ -1120,9 +1146,12 @@ _sx_slock_int(struct sx *sx, int opts LOCK_FILE_LINE_A
 
 	error = 0;
 	x = SX_READ_VALUE(sx);
-	if (__predict_false(LOCKSTAT_OOL_PROFILE_ENABLED(sx__acquire) ||
+	if (__predict_false(LOCKSTAT_PROFILE_ENABLED(sx__acquire) ||
 	    !__sx_slock_try(sx, &x LOCK_FILE_LINE_ARG)))
 		error = _sx_slock_hard(sx, opts, x LOCK_FILE_LINE_ARG);
+	else
+		lock_profile_obtain_lock_success(&sx->lock_object, 0, 0,
+		    file, line);
 	if (error == 0) {
 		LOCK_LOG_LOCK("SLOCK", &sx->lock_object, 0, 0, file, line);
 		WITNESS_LOCK(&sx->lock_object, 0, file, line);
@@ -1250,9 +1279,11 @@ _sx_sunlock_int(struct sx *sx LOCK_FILE_LINE_ARG_DEF)
 	LOCK_LOG_LOCK("SUNLOCK", &sx->lock_object, 0, 0, file, line);
 
 	x = SX_READ_VALUE(sx);
-	if (__predict_false(LOCKSTAT_OOL_PROFILE_ENABLED(sx__release) ||
+	if (__predict_false(LOCKSTAT_PROFILE_ENABLED(sx__release) ||
 	    !_sx_sunlock_try(sx, &x)))
 		_sx_sunlock_hard(sx, x LOCK_FILE_LINE_ARG);
+	else
+		lock_profile_release_lock(&sx->lock_object);
 
 	TD_LOCKS_DEC(curthread);
 }

Modified: stable/11/sys/sys/lockstat.h
==============================================================================
--- stable/11/sys/sys/lockstat.h	Thu May 31 15:41:56 2018	(r334436)
+++ stable/11/sys/sys/lockstat.h	Thu May 31 15:58:28 2018	(r334437)
@@ -107,12 +107,7 @@ extern volatile int lockstat_enabled;
 	LOCKSTAT_RECORD1(probe, lp, a);					\
 } while (0)
 
-#ifndef LOCK_PROFILING
 #define	LOCKSTAT_PROFILE_ENABLED(probe)		__predict_false(lockstat_enabled)
-#define	LOCKSTAT_OOL_PROFILE_ENABLED(probe)	LOCKSTAT_PROFILE_ENABLED(probe)
-#else
-#define	LOCKSTAT_OOL_PROFILE_ENABLED(probe)	1
-#endif
 
 struct lock_object;
 uint64_t lockstat_nsecs(struct lock_object *);
@@ -137,10 +132,7 @@ uint64_t lockstat_nsecs(struct lock_object *);
 #define	LOCKSTAT_PROFILE_RELEASE_RWLOCK(probe, lp, a)  			\
 	LOCKSTAT_PROFILE_RELEASE_LOCK(probe, lp)
 
-#ifndef LOCK_PROFILING
 #define	LOCKSTAT_PROFILE_ENABLED(probe)		0
-#endif
-#define	LOCKSTAT_OOL_PROFILE_ENABLED(probe)	1
 
 #endif /* !KDTRACE_HOOKS */
 


