From owner-cvs-src@FreeBSD.ORG  Mon Nov 26 10:31:42 2007
Return-Path: <owner-cvs-src@FreeBSD.ORG>
Delivered-To: cvs-src@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BA73116A418;
	Mon, 26 Nov 2007 10:31:42 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 6E06213C478;
	Mon, 26 Nov 2007 10:31:42 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id 4195746BD1;
	Mon, 26 Nov 2007 05:35:15 -0500 (EST)
Date: Mon, 26 Nov 2007 10:31:35 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Daniel Eischen <eischen@vigrid.com>
In-Reply-To: <Pine.GSO.4.64.0711260008090.11036@sea.ntplx.net>
Message-ID: <20071126102104.L65286@fledge.watson.org>
References: <200711081447.lA8EltXO052057@repoman.freebsd.org>
	<47492064.7080108@freebsd.org>
	<Pine.GSO.4.64.0711251207590.8538@sea.ntplx.net>
	<4749B971.3000703@elischer.org>
	<Pine.GSO.4.64.0711260008090.11036@sea.ntplx.net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Stephan Uphoff <ups@freebsd.org>, cvs-src@freebsd.org,
	src-committers@freebsd.org,
	Julian Elischer <julian@elischer.org>, cvs-all@freebsd.org
Subject: Re: cvs commit: src/share/man/man9 locking.9 rmlock.9 src/sys/conf
 files src/sys/kern kern_rmlock.c subr_lock.c subr_pcpu.c        
 subr_smp.c src/sys/sys _rmlock.h lock.h pcpu.h rmlock.h smp.h
X-BeenThere: cvs-src@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: CVS commit messages for the src tree <cvs-src.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-src>,
	<mailto:cvs-src-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/cvs-src>
List-Post: <mailto:cvs-src@freebsd.org>
List-Help: <mailto:cvs-src-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-src>,
	<mailto:cvs-src-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 26 Nov 2007 10:31:43 -0000


On Mon, 26 Nov 2007, Daniel Eischen wrote:

>> Spin and blocking mutexes should in my opinion be defined as different 
>> structures, at least in name so that the compiler hits you with a clue-bat 
>> when you try use a spin-lock with non-spinlock ops etc.
>
> That seems nice, but doesn't go along well with trying to keep a similar API 
> as Solaris.  It is convenient to share the APIs so that it is easy to change 
> the mtx_init() from a default to a spin type without changing the rest of 
> the code.  We really shouldn't have a need for spin mutexes if the default 
> mutexes are adaptive, and if mutexes taken by interrupt handlers are 
> initialized in a manner similar to Solaris.  There really shouldn't be a 
> separate API for spin mutexes.  Once a mutex is initialized as MTX_DEF or 
> MTX_SPIN, that should be sufficient.

While I agree from a code hacking perspective, one of the benefits of having 
different types and APIs in the code is that you can have more rich static 
checking without having to try and manage the state associated with flags at 
init-time.  For example, maintaining the MTX_SPIN/MTX_DEF state in Coverity 
would be difficult or impossible, because global context isn't generally 
available, but by using different types locally in analyzed code, you can have 
more rigorous checking.  The "classic" example is with rwlocks/mutexes and 
sxlocks and the check for unbounded sleeping: it's ok to tsleep/msleep while 
holding an sxlock, but not while holding a mutex or rwlock unless it's the 
argument to the sleep call, and hence implicitly dropped.  If we combined all 
the types, we couldn't enforce that statically, but by using different types, 
local static analysis can flag the sleep attempt.  In much code in the kernel, 
we accomplish lock "pluggability" by using macros rather than by sharing data 
types, allowing changes at compile-time but not run-time, and this appears to 
work well in practice.

>> not sure why sx-locks exist at all, as they seem to be a variant of sleep. 
>> I think it's just a convenience function set to allow one to implement a 
>> sleep-derived synchronisation.
>
> Hmm, sx locks seem similar to rmlocks, at least according to the 
> description:
>
>  Shared/exclusive locks are used to protect data that are read far
>  more often than they are written.
>
> Do we need both?

rmlocks are most similar to rwlocks, as they have priority propogation and 
permit only bounded sleeping.  There's some argument that we might want to 
realign the rwlock API to match the rmlock API to make them more easily 
pluggable.

A really important but less well-understood element of our kernel locking 
model is the careful distinction made between bounded and unbounded sleeps. 
For the core "sleeping" synchronization primitives -- default mutexes, 
rwlocks, and rmlocks, it's permitted only to acquire locks in this class and 
not to perform potentially unbounded I/O waits while holding them.  sx locks 
and the increasingly decrepid lockmgr locks, as well as custom constructions 
that may still persist involving msleep and condition variables, may be held 
over unbounded I/O waiting, such as page faults, disk I/O, socket I/O, etc. 
The former class of locks necessarily always falls after the latter class of 
locks in the global lock order, and we don't permit acquisition of the 
unbounded sleep locks or other unbounded sleep primitives in sensitive 
contexts, such as in interrupt threads, software interrupt threads, etc. 
This prevents a broad class of deadlocks from even being expressed -- hence 
the recent issues with ipf and pf holding non-sleep able (mutex/rwlock/rmlock) 
primitives over copyin/copyout, which is forbidden by witness because those 
locks are intended to be shared between sensitive and less sensitive contexts.

Just to illustrate how this plays out in practice: each socket buffer has two 
locks -- the socket buffer mutex and the socket buffer sx lock (historically a 
mutex-interlocked msleep lock).  The socket buffer mutex maintains the basic 
consistency of the data structure, and may be acquired from ithreads/netisr 
and the user thread accessing the socket buffer.  That mutex is sufficient to 
access and (in general although not in one specific case) modify the data 
structure.  The socket buffer sx lock, on the other hand, is about serializing 
I/O operations to prevent undesired I/O interlacing during simultaneous 
requests from user space, and is acquired only from user threads.  Because a 
single I/O may span many user data pages, and we don't want interlacing of 
those I/Os, we acquire it at the beginning of a particular send() call, and 
drop it at the end only once the complete operation has finished, so that you 
don't get interlacing from other threads sending on the same socket when the 
first threads priority drops, it faults on a page, or the like.  This is not 
disimilar to the pair of locks present on a vnode -- the vnode interlock mutex 
(low level structure consistency) and the vnode lock (a lockmgr lock 
protecting consistency over I/O).

Robert N M Watson
Computer Laboratory
University of Cambridge