From owner-cvs-src@FreeBSD.ORG Mon Nov 26 10:31:42 2007 Return-Path: Delivered-To: cvs-src@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BA73116A418; Mon, 26 Nov 2007 10:31:42 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42]) by mx1.freebsd.org (Postfix) with ESMTP id 6E06213C478; Mon, 26 Nov 2007 10:31:42 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [209.31.154.41]) by cyrus.watson.org (Postfix) with ESMTP id 4195746BD1; Mon, 26 Nov 2007 05:35:15 -0500 (EST) Date: Mon, 26 Nov 2007 10:31:35 +0000 (GMT) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Daniel Eischen In-Reply-To: Message-ID: <20071126102104.L65286@fledge.watson.org> References: <200711081447.lA8EltXO052057@repoman.freebsd.org> <47492064.7080108@freebsd.org> <4749B971.3000703@elischer.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Stephan Uphoff , cvs-src@freebsd.org, src-committers@freebsd.org, Julian Elischer , cvs-all@freebsd.org Subject: Re: cvs commit: src/share/man/man9 locking.9 rmlock.9 src/sys/conf files src/sys/kern kern_rmlock.c subr_lock.c subr_pcpu.c subr_smp.c src/sys/sys _rmlock.h lock.h pcpu.h rmlock.h smp.h X-BeenThere: cvs-src@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: CVS commit messages for the src tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 26 Nov 2007 10:31:43 -0000 On Mon, 26 Nov 2007, Daniel Eischen wrote: >> Spin and blocking mutexes should in my opinion be defined as different >> structures, at least in name so that the compiler hits you with a clue-bat >> when you try use a spin-lock with non-spinlock ops etc. > > That seems nice, but doesn't go along well with trying to keep a similar API > as Solaris. It is convenient to share the APIs so that it is easy to change > the mtx_init() from a default to a spin type without changing the rest of > the code. We really shouldn't have a need for spin mutexes if the default > mutexes are adaptive, and if mutexes taken by interrupt handlers are > initialized in a manner similar to Solaris. There really shouldn't be a > separate API for spin mutexes. Once a mutex is initialized as MTX_DEF or > MTX_SPIN, that should be sufficient. While I agree from a code hacking perspective, one of the benefits of having different types and APIs in the code is that you can have more rich static checking without having to try and manage the state associated with flags at init-time. For example, maintaining the MTX_SPIN/MTX_DEF state in Coverity would be difficult or impossible, because global context isn't generally available, but by using different types locally in analyzed code, you can have more rigorous checking. The "classic" example is with rwlocks/mutexes and sxlocks and the check for unbounded sleeping: it's ok to tsleep/msleep while holding an sxlock, but not while holding a mutex or rwlock unless it's the argument to the sleep call, and hence implicitly dropped. If we combined all the types, we couldn't enforce that statically, but by using different types, local static analysis can flag the sleep attempt. In much code in the kernel, we accomplish lock "pluggability" by using macros rather than by sharing data types, allowing changes at compile-time but not run-time, and this appears to work well in practice. >> not sure why sx-locks exist at all, as they seem to be a variant of sleep. >> I think it's just a convenience function set to allow one to implement a >> sleep-derived synchronisation. > > Hmm, sx locks seem similar to rmlocks, at least according to the > description: > > Shared/exclusive locks are used to protect data that are read far > more often than they are written. > > Do we need both? rmlocks are most similar to rwlocks, as they have priority propogation and permit only bounded sleeping. There's some argument that we might want to realign the rwlock API to match the rmlock API to make them more easily pluggable. A really important but less well-understood element of our kernel locking model is the careful distinction made between bounded and unbounded sleeps. For the core "sleeping" synchronization primitives -- default mutexes, rwlocks, and rmlocks, it's permitted only to acquire locks in this class and not to perform potentially unbounded I/O waits while holding them. sx locks and the increasingly decrepid lockmgr locks, as well as custom constructions that may still persist involving msleep and condition variables, may be held over unbounded I/O waiting, such as page faults, disk I/O, socket I/O, etc. The former class of locks necessarily always falls after the latter class of locks in the global lock order, and we don't permit acquisition of the unbounded sleep locks or other unbounded sleep primitives in sensitive contexts, such as in interrupt threads, software interrupt threads, etc. This prevents a broad class of deadlocks from even being expressed -- hence the recent issues with ipf and pf holding non-sleep able (mutex/rwlock/rmlock) primitives over copyin/copyout, which is forbidden by witness because those locks are intended to be shared between sensitive and less sensitive contexts. Just to illustrate how this plays out in practice: each socket buffer has two locks -- the socket buffer mutex and the socket buffer sx lock (historically a mutex-interlocked msleep lock). The socket buffer mutex maintains the basic consistency of the data structure, and may be acquired from ithreads/netisr and the user thread accessing the socket buffer. That mutex is sufficient to access and (in general although not in one specific case) modify the data structure. The socket buffer sx lock, on the other hand, is about serializing I/O operations to prevent undesired I/O interlacing during simultaneous requests from user space, and is acquired only from user threads. Because a single I/O may span many user data pages, and we don't want interlacing of those I/Os, we acquire it at the beginning of a particular send() call, and drop it at the end only once the complete operation has finished, so that you don't get interlacing from other threads sending on the same socket when the first threads priority drops, it faults on a page, or the like. This is not disimilar to the pair of locks present on a vnode -- the vnode interlock mutex (low level structure consistency) and the vnode lock (a lockmgr lock protecting consistency over I/O). Robert N M Watson Computer Laboratory University of Cambridge