Date:      Sat, 29 Jun 2013 21:16:27 +0800
From:      Julian Elischer <julian@freebsd.org>
To:        John Baldwin <jhb@freebsd.org>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject:   Re: svn commit: r252346 - head/share/man/man9
Message-ID:  <51CEDE2B.60204@freebsd.org>
In-Reply-To: <201306281633.r5SGXjFU017827@svn.freebsd.org>
References:  <201306281633.r5SGXjFU017827@svn.freebsd.org>

thanks!


On 6/29/13 12:33 AM, John Baldwin wrote:
> Author: jhb
> Date: Fri Jun 28 16:33:45 2013
> New Revision: 252346
> URL: http://svnweb.freebsd.org/changeset/base/252346
>
> Log:
>    Make a pass over this page to correct and clarify a few things as well as
>    some general word-smithing.
>    - Don't claim that adaptive mutexes have a timeout (they don't).
>    - Don't treat pool mutexes as a separate primitive in a few places.
>    - Describe sleepable read-mostly locks as a separate lock type and add
>      them to the various tables.
>    - Don't claim that sx locks are less efficient.  That hasn't been true in
>      a few years now.
>    - Describe lockmanager locks next to sx locks since they are very similar
>      in terms of rules, etc., and so that all the lock primitives are
>      grouped together before the non-lock primitives.
>    - Similarly, move the section on Giant after the description of all the
>      non-lock primitives to preserve grouping.
>    - Condition variables work on several types of locks, not just mutexes.
>    - Add a bit of language to compare/contrast condition variables with
>      sleep/wakeup.
>    - Add a note about why pause(9) is unique.
>    - Add some language to define bounded vs unbounded sleeps and explain
>      why they are treated separately (bounded sleeps only need CPU time
>      to make forward progress).
>    - Don't state that using mtx_sleep() is a bad idea.  It is in fact rather
>      necessary.
>    - Rework the interaction table a bit.  First, it did not really
>      include sleepable rmlocks and it left out lockmgr entirely.  To get
>      things to fit, combine similar lock types into the same column / row,
>      and explicitly state what "sleep" means.  The notes about recursion
>      and lock order were also a bit banal (lock order is always important,
>      not just in the few places annotated here), so remove them.  In
>      particular, the lock order note would need to be on just about every
>      cell.  If we want to document recursion I think a better approach
>      would be a separate table summarizing the recursion rules for each
>      lock as having too many notes clutters the table.
>    - Tweak the tables to use less indentation so everything still fits with
>      the added columns.
>    - Correct a few cells in the context mode table.
>    - Use mdoc markup instead of explicit markup in a few places.
>    
>    Requested by:	julian
>    MFC after:	2 weeks
>
> Modified:
>    head/share/man/man9/locking.9
>
> Modified: head/share/man/man9/locking.9
> ==============================================================================
> --- head/share/man/man9/locking.9	Fri Jun 28 16:24:14 2013	(r252345)
> +++ head/share/man/man9/locking.9	Fri Jun 28 16:33:45 2013	(r252346)
> @@ -33,53 +33,52 @@
>   .Sh DESCRIPTION
>   The
>   .Em FreeBSD
> -kernel is written to run across multiple CPUs and as such requires
> -several different synchronization primitives to allow the developers
> -to safely access and manipulate the many data types required.
> +kernel is written to run across multiple CPUs and as such provides
> +several different synchronization primitives to allow developers
> +to safely access and manipulate many data types.
>   .Ss Mutexes
> -Mutexes (also erroneously called "sleep mutexes") are the most commonly used
> +Mutexes (also called "blocking mutexes") are the most commonly used
>   synchronization primitive in the kernel.
>   A thread acquires (locks) a mutex before accessing data shared with other
>   threads (including interrupt threads), and releases (unlocks) it afterwards.
>   If the mutex cannot be acquired, the thread requesting it will wait.
> -Mutexes are by default adaptive, meaning that
> +Mutexes are adaptive by default, meaning that
>   if the owner of a contended mutex is currently running on another CPU,
> -then a thread attempting to acquire the mutex will briefly spin
> -in the hope that the owner is only briefly holding it,
> -and might release it shortly.
> -If the owner does not do so, the waiting thread proceeds to yield the processor,
> -allowing other threads to run.
> -If the owner is not currently actually running then the spin step is skipped.
> +then a thread attempting to acquire the mutex will spin rather than yielding
> +the processor.
>   Mutexes fully support priority propagation.
>   .Pp
>   See
>   .Xr mutex 9
>   for details.
> -.Ss Spin mutexes
> -Spin mutexes are variation of basic mutexes; the main difference between
> -the two is that spin mutexes never yield the processor - instead, they spin,
> -waiting for the thread holding the lock,
> -(which must be running on another CPU), to release it.
> -Spin mutexes disable interrupts while the held so as to not get pre-empted.
> -Since disabling interrupts is expensive, they are also generally slower.
> -Spin mutexes should be used only when necessary, e.g. to protect data shared
> +.Ss Spin Mutexes
> +Spin mutexes are a variation of basic mutexes; the main difference between
> +the two is that spin mutexes never block.
> +Instead, they spin while waiting for the lock to be released.
> +Note that a thread that holds a spin mutex must never yield its CPU to
> +avoid deadlock.
> +Unlike ordinary mutexes, spin mutexes disable interrupts when acquired.
> +Since disabling interrupts can be expensive, they are generally slower to
> +acquire and release.
> +Spin mutexes should be used only when absolutely necessary,
> +e.g. to protect data shared
>   with interrupt filter code (see
>   .Xr bus_setup_intr 9
> -for details).
> -.Ss Pool mutexes
> -With most synchronization primitives, such as mutexes, programmer must
> -provide a piece of allocated memory to hold the primitive.
> +for details),
> +or for scheduler internals.
> +.Ss Mutex Pools
> +With most synchronization primitives, such as mutexes, the programmer must
> +provide memory to hold the primitive.
>   For example, a mutex may be embedded inside the structure it protects.
> -Pool mutex is a variant of mutex without this requirement - to lock or unlock
> -a pool mutex, one uses address of the structure being protected with it,
> -not the mutex itself.
> -Pool mutexes are seldom used.
> +Mutex pools provide a preallocated set of mutexes to avoid this
> +requirement.
> +Note that mutexes from a pool may only be used as leaf locks.
>   .Pp
>   See
>   .Xr mtx_pool 9
>   for details.
> -.Ss Reader/writer locks
> -Reader/writer locks allow shared access to protected data by multiple threads,
> +.Ss Reader/Writer Locks
> +Reader/writer locks allow shared access to protected data by multiple threads
>   or exclusive access by a single thread.
>   The threads with shared access are known as
>   .Em readers
> @@ -91,26 +90,16 @@ since it may modify protected data.
>   Reader/writer locks can be treated as mutexes (see above and
>   .Xr mutex 9 )
>   with shared/exclusive semantics.
> -More specifically, regular mutexes can be
> -considered to be equivalent to a write-lock on an
> -.Em rw_lock.
> -The
> -.Em rw_lock
> -locks have priority propagation like mutexes, but priority
> -can be propagated only to an exclusive holder.
> +Reader/writer locks support priority propagation like mutexes,
> +but priority is propagated only to an exclusive holder.
>   This limitation comes from the fact that shared owners
>   are anonymous.
> -Another important property is that shared holders of
> -.Em rw_lock
> -can recurse, but exclusive locks are not allowed to recurse.
> -This ability should not be used lightly and
> -.Em may go away.
>   .Pp
>   See
>   .Xr rwlock 9
>   for details.
> -.Ss Read-mostly locks
> -Mostly reader locks are similar to
> +.Ss Read-Mostly Locks
> +Read-mostly locks are similar to
>   .Em reader/writer
>   locks but optimized for very infrequent write locking.
>   .Em Read-mostly
> @@ -122,21 +111,41 @@ data structure.
>   See
>   .Xr rmlock 9
>   for details.
> +.Ss Sleepable Read-Mostly Locks
> +Sleepable read-mostly locks are a variation on read-mostly locks.
> +Threads holding an exclusive lock may sleep,
> +but threads holding a shared lock may not.
> +Priority is propagated to shared owners but not to exclusive owners.
>   .Ss Shared/exclusive locks
>   Shared/exclusive locks are similar to reader/writer locks; the main difference
> -between them is that shared/exclusive locks may be held during unbounded sleep
> -(and may thus perform an unbounded sleep).
> -They are inherently less efficient than mutexes, reader/writer locks
> -and read-mostly locks.
> -They do not support priority propagation.
> -They should be considered to be closely related to
> -.Xr sleep 9 .
> -They could in some cases be
> -considered a conditional sleep.
> +between them is that shared/exclusive locks may be held during unbounded sleep.
> +Acquiring a contested shared/exclusive lock can perform an unbounded sleep.
> +These locks do not support priority propagation.
>   .Pp
>   See
>   .Xr sx 9
>   for details.
> +.Ss Lockmanager locks
> +Lockmanager locks are sleepable shared/exclusive locks used mostly in
> +.Xr VFS 9
> +.Po
> +as a
> +.Xr vnode 9
> +lock
> +.Pc
> +and in the buffer cache
> +.Po
> +.Xr BUF_LOCK 9
> +.Pc .
> +They have features other lock types do not have such as sleep
> +timeouts, blocking upgrades,
> +writer starvation avoidance, draining, and an interlock mutex,
> +but this makes them complicated to both use and implement;
> +for this reason, they should be avoided.
> +.Pp
> +See
> +.Xr lock 9
> +for details.
>   .Ss Counting semaphores
>   Counting semaphores provide a mechanism for synchronizing access
>   to a pool of resources.
> @@ -149,43 +158,21 @@ See
>   .Xr sema 9
>   for details.
>   .Ss Condition variables
> -Condition variables are used in conjunction with mutexes to wait for
> -conditions to occur.
> -A thread must hold the mutex before calling the
> -.Fn cv_wait* ,
> +Condition variables are used in conjunction with locks to wait for
> +a condition to become true.
> +A thread must hold the associated lock before calling one of the
> +.Fn cv_wait ,
>   functions.
> -When a thread waits on a condition, the mutex
> -is atomically released before the thread yields the processor,
> -then reacquired before the function call returns.
> +When a thread waits on a condition, the lock
> +is atomically released before the thread yields the processor
> +and reacquired before the function call returns.
> +Condition variables may be used with blocking mutexes,
> +reader/writer locks, read-mostly locks, and shared/exclusive locks.
>   .Pp
>   See
>   .Xr condvar 9
>   for details.
> -.Ss Giant
> -Giant is an instance of a mutex, with some special characteristics:
> -.Bl -enum
> -.It
> -It is recursive.
> -.It
> -Drivers can request that Giant be locked around them
> -by not marking themselves MPSAFE.
> -Note that infrastructure to do this is slowly going away as non-MPSAFE
> -drivers either became properly locked or disappear.
> -.It
> -Giant must be locked first before other locks.
> -.It
> -It is OK to hold Giant while performing unbounded sleep; in such case,
> -Giant will be dropped before sleeping and picked up after wakeup.
> -.It
> -There are places in the kernel that drop Giant and pick it back up
> -again.
> -Sleep locks will do this before sleeping.
> -Parts of the network or VM code may do this as well, depending on the
> -setting of a sysctl.
> -This means that you cannot count on Giant keeping other code from
> -running if your code sleeps, even if you want it to.
> -.El
> -.Ss Sleep/wakeup
> +.Ss Sleep/Wakeup
>   The functions
>   .Fn tsleep ,
>   .Fn msleep ,
> @@ -194,7 +181,12 @@ The functions
>   .Fn wakeup ,
>   and
>   .Fn wakeup_one
> -handle event-based thread blocking.
> +also handle event-based thread blocking.
> +Unlike condition variables,
> +arbitrary addresses may be used as wait channels and an dedicated
> +structure does not need to be allocated.
> +However, care must be taken to ensure that wait channel addresses are
> +unique to an event.
>   If a thread must wait for an external event, it is put to sleep by
>   .Fn tsleep ,
>   .Fn msleep ,
> @@ -214,9 +206,10 @@ the thread is being put to sleep.
>   All threads sleeping on a single
>   .Fa chan
>   are woken up later by
> -.Fn wakeup ,
> -often called from inside an interrupt routine, to indicate that the
> -resource the thread was blocking on is available now.
> +.Fn wakeup
> +.Pq often called from inside an interrupt routine
> +to indicate that the
> +event the thread was blocking on has occurred.
>   .Pp
>   Several of the sleep functions including
>   .Fn msleep ,
> @@ -232,122 +225,168 @@ includes the
>   flag, then the lock will not be reacquired before returning.
>   The lock is used to ensure that a condition can be checked atomically,
>   and that the current thread can be suspended without missing a
> -change to the condition, or an associated wakeup.
> +change to the condition or an associated wakeup.
>   In addition, all of the sleep routines will fully drop the
>   .Va Giant
>   mutex
> -(even if recursed)
> +.Pq even if recursed
>   while the thread is suspended and will reacquire the
>   .Va Giant
> -mutex before the function returns.
> +mutex
> +.Pq restoring any recursion
> +before the function returns.
>   .Pp
> -See
> -.Xr sleep 9
> -for details.
> -.Ss Lockmanager locks
> -Shared/exclusive locks, used mostly in
> -.Xr VFS 9 ,
> -in particular as a
> -.Xr vnode 9
> -lock.
> -They have features other lock types do not have, such as sleep timeout,
> -writer starvation avoidance, draining, and interlock mutex, but this makes them
> -complicated to implement; for this reason, they are deprecated.
> +The
> +.Fn pause
> +function is a special sleep function that waits for a specified
> +amount of time to pass before the thread resumes execution.
> +This sleep cannot be terminated early by either an explicit
> +.Fn wakeup
> +or a signal.
>   .Pp
>   See
> -.Xr lock 9
> +.Xr sleep 9
>   for details.
> +.Ss Giant
> +Giant is a special mutex used to protect data structures that do not
> +yet have their own locks.
> +Since it provides semantics akin to the old
> +.Xr spl 9
> +interface,
> +Giant has special characteristics:
> +.Bl -enum
> +.It
> +It is recursive.
> +.It
> +Drivers can request that Giant be locked around them
> +by not marking themselves MPSAFE.
> +Note that infrastructure to do this is slowly going away as non-MPSAFE
> +drivers either became properly locked or disappear.
> +.It
> +Giant must be locked before other non-sleepable locks.
> +.It
> +Giant is dropped during unbounded sleeps and reacquired after wakeup.
> +.It
> +There are places in the kernel that drop Giant and pick it back up
> +again.
> +Sleep locks will do this before sleeping.
> +Parts of the network or VM code may do this as well.
> +This means that you cannot count on Giant keeping other code from
> +running if your code sleeps, even if you want it to.
> +.El
>   .Sh INTERACTIONS
> -The primitives interact and have a number of rules regarding how
> +The primitives can interact and have a number of rules regarding how
>   they can and can not be combined.
> -Many of these rules are checked using the
> -.Xr witness 4
> -code.
> -.Ss Bounded vs. unbounded sleep
> -The following primitives perform bounded sleep:
> - mutexes, pool mutexes, reader/writer locks and read-mostly locks.
> -.Pp
> -The following primitives may perform an unbounded sleep:
> -shared/exclusive locks, counting semaphores, condition variables, sleep/wakeup and lockmanager locks.
> -.Pp
> +Many of these rules are checked by
> +.Xr witness 4 .
> +.Ss Bounded vs. Unbounded Sleep
> +A bounded sleep
> +.Pq or blocking
> +is a sleep where the only resource needed to resume execution of a thread
> +is CPU time for the owner of a lock that the thread is waiting to acquire.
> +An unbounded sleep
> +.Po
> +often referred to as simply
> +.Dq sleeping
> +.Pc
> +is a sleep where a thread is waiting for an external event or for a condition
> +to become true.
> +In particular,
> +since there is always CPU time available,
> +a dependency chain of threads in bounded sleeps should always make forward
> +progress.
> +This requires that no thread in a bounded sleep is waiting for a lock held
> +by a thread in an unbounded sleep.
> +To avoid priority inversions,
> +a thread in a bounded sleep lends its priority to the owner of the lock
> +that it is waiting for.
> +.Pp
> +The following primitives perform bounded sleeps:
> +mutexes, reader/writer locks and read-mostly locks.
> +.Pp
> +The following primitives perform unbounded sleeps:
> +sleepable read-mostly locks, shared/exclusive locks, lockmanager locks,
> +counting semaphores, condition variables, and sleep/wakeup.
> +.Ss General Principles
> +.Bl -bullet
> +.It
>   It is an error to do any operation that could result in yielding the processor
>   while holding a spin mutex.
> +.It
> +It is an error to do any operation that could result in unbounded sleep
> +while holding any primitive from the 'bounded sleep' group.
> +For example, it is an error to try to acquire a shared/exclusive lock while
> +holding a mutex, or to try to allocate memory with M_WAITOK while holding a
> +reader/writer lock.
>   .Pp
> -As a general rule, it is an error to do any operation that could result
> -in unbounded sleep while holding any primitive from the 'bounded sleep' group.
> -For example, it is an error to try to acquire shared/exclusive lock while
> -holding mutex, or to try to allocate memory with M_WAITOK while holding
> -read-write lock.
> -.Pp
> -As a special case, it is possible to call
> +Note that the lock passed to one of the
>   .Fn sleep
>   or
> -.Fn mtx_sleep
> -while holding a single mutex.
> -It will atomically drop that mutex and reacquire it as part of waking up.
> -This is often a bad idea because it generally relies on the programmer having
> -good knowledge of all of the call graph above the place where
> -.Fn mtx_sleep
> -is being called and assumptions the calling code has made.
> -Because the lock gets dropped during sleep, one must re-test all
> -the assumptions that were made before, all the way up the call graph to the
> -place where the lock was acquired.
> -.Pp
> +.Fn cv_wait
> +functions is dropped before the thread enters the unbounded sleep and does
> +not violate this rule.
> +.It
>   It is an error to do any operation that could result in yielding of
>   the processor when running inside an interrupt filter.
> -.Pp
> +.It
>   It is an error to do any operation that could result in unbounded sleep when
>   running inside an interrupt thread.
> +.El
>   .Ss Interaction table
>   The following table shows what you can and can not do while holding
> -one of the synchronization primitives discussed:
> -.Bl -column ".Ic xxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXX" -offset indent
> -.It Em "       You want:" Ta spin-mtx Ta mutex Ta rwlock Ta rmlock Ta sx Ta sleep
> -.It Em "You have:     " Ta ------ Ta ------ Ta ------ Ta ------ Ta ------ Ta ------
> -.It spin mtx  Ta \&ok-1 Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no-3
> -.It mutex     Ta \&ok Ta \&ok-1 Ta \&ok Ta \&ok Ta \&no Ta \&no-3
> -.It rwlock    Ta \&ok Ta \&ok Ta \&ok-2 Ta \&ok Ta \&no Ta \&no-3
> -.It rmlock    Ta \&ok Ta \&ok Ta \&ok Ta \&ok-2 Ta \&no-5 Ta \&no-5
> -.It sx        Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&no-2 Ta \&ok-4
> +one of the locking primitives discussed.  Note that
> +.Dq sleep
> +includes
> +.Fn sema_wait ,
> +.Fn sema_timedwait ,
> +any of the
> +.Fn cv_wait
> +functions,
> +and any of the
> +.Fn sleep
> +functions.
> +.Bl -column ".Ic xxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXX" -offset 3n
> +.It Em "       You want:" Ta spin mtx Ta mutex/rw Ta rmlock Ta sleep rm Ta sx/lk Ta sleep
> +.It Em "You have:     " Ta -------- Ta -------- Ta ------ Ta -------- Ta ------ Ta ------
> +.It spin mtx  Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no-1
> +.It mutex/rw  Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no-1
> +.It rmlock    Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no-1
> +.It sleep rm  Ta \&ok Ta \&ok Ta \&ok Ta \&ok-2 Ta \&ok-2 Ta \&ok-2/3
> +.It sx        Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok-3
> +.It lockmgr   Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok
>   .El
>   .Pp
>   .Em *1
> -Recursion is defined per lock.
> -Lock order is important.
> +There are calls that atomically release this primitive when going to sleep
> +and reacquire it on wakeup
> +.Po
> +.Fn mtx_sleep ,
> +.Fn rw_sleep ,
> +.Fn msleep_spin ,
> +etc.
> +.Pc .
>   .Pp
>   .Em *2
> -Readers can recurse though writers can not.
> -Lock order is important.
> +These cases are only allowed while holding a write lock on a sleepable
> +read-mostly lock.
>   .Pp
>   .Em *3
> -There are calls that atomically release this primitive when going to sleep
> -and reacquire it on wakeup (e.g.
> -.Fn mtx_sleep ,
> -.Fn rw_sleep
> -and
> -.Fn msleep_spin ) .
> -.Pp
> -.Em *4
> -Though one can sleep holding an sx lock, one can also use
> -.Fn sx_sleep
> -which will atomically release this primitive when going to sleep and
> +Though one can sleep while holding this lock,
> +one can also use a
> +.Fn sleep
> +function to atomically release this primitive when going to sleep and
>   reacquire it on wakeup.
>   .Pp
> -.Em *5
> -.Em Read-mostly
> -locks can be initialized to support sleeping while holding a write lock.
> -See
> -.Xr rmlock 9
> -for details.
> +Note that non-blocking try operations on locks are always permitted.
>   .Ss Context mode table
>   The next table shows what can be used in different contexts.
>   At this time this is a rather easy to remember table.
> -.Bl -column ".Ic Xxxxxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXX" -offset indent
> -.It Em "Context:"  Ta spin mtx Ta mutex Ta sx Ta rwlock Ta rmlock Ta sleep
> +.Bl -column ".Ic Xxxxxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXX" -offset 3n
> +.It Em "Context:"  Ta spin mtx Ta mutex/rw Ta rmlock Ta sleep rm Ta sx/lk Ta sleep
>   .It interrupt filter:  Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no
> -.It interrupt thread:  Ta \&ok Ta \&ok Ta \&no Ta \&ok Ta \&ok Ta \&no
> -.It callout:    Ta \&ok Ta \&ok Ta \&no Ta \&ok Ta \&no Ta \&no
> -.It syscall:    Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok
> +.It interrupt thread:  Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no
> +.It callout:    Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no
> +.It system call:    Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok
>   .El
>   .Sh SEE ALSO
>   .Xr witness 4 ,
>
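
The condition-variable rule in the revised text quoted above (the associated
lock is atomically released while the thread waits and is reacquired before
the wait returns) may be easier to see in a small example. What follows is
only a userland analogy using POSIX threads, not the kernel condvar(9) API
and not code from this commit. The names struct mailbox, mailbox_init,
mailbox_put, mailbox_get, producer, and mailbox_demo are hypothetical and
exist purely for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical one-slot mailbox; not part of any FreeBSD interface. */
struct mailbox {
	pthread_mutex_t lock;	/* protects 'full' and 'value' */
	pthread_cond_t	ready;	/* signalled when 'full' becomes true */
	int		full;
	int		value;
};

static void
mailbox_init(struct mailbox *mb)
{
	int error;

	error = pthread_mutex_init(&mb->lock, NULL);
	assert(error == 0);
	error = pthread_cond_init(&mb->ready, NULL);
	assert(error == 0);
	(void)error;
	mb->full = 0;
	mb->value = 0;
}

/* Store a value and wake one waiter; the condition changes under the lock. */
static void
mailbox_put(struct mailbox *mb, int value)
{
	pthread_mutex_lock(&mb->lock);
	mb->value = value;
	mb->full = 1;
	pthread_cond_signal(&mb->ready);
	pthread_mutex_unlock(&mb->lock);
}

/* Wait until a value is present, then take it. */
static int
mailbox_get(struct mailbox *mb)
{
	int value;

	pthread_mutex_lock(&mb->lock);
	/*
	 * pthread_cond_wait() releases the mutex while waiting and
	 * reacquires it before returning, so the condition must be
	 * re-tested in a loop after every wakeup.
	 */
	while (!mb->full)
		pthread_cond_wait(&mb->ready, &mb->lock);
	value = mb->value;
	mb->full = 0;
	pthread_mutex_unlock(&mb->lock);
	return (value);
}

/* Thread body: deliver a single value to the mailbox passed in 'arg'. */
static void *
producer(void *arg)
{
	mailbox_put(arg, 42);
	return (NULL);
}

/* Start a producer thread, wait for its value, and return that value. */
static int
mailbox_demo(void)
{
	struct mailbox mb;
	pthread_t td;
	int error, value;

	mailbox_init(&mb);
	error = pthread_create(&td, NULL, producer, &mb);
	assert(error == 0);
	(void)error;
	value = mailbox_get(&mb);
	pthread_join(td, NULL);
	pthread_cond_destroy(&mb.ready);
	pthread_mutex_destroy(&mb.lock);
	return (value);
}
```

Because the waiter checks the condition only while holding the lock, and the
producer changes it only while holding the same lock, the waiter cannot miss
the wakeup. This is the same reason the quoted sleep(9) text pairs a lock
with the wait channel.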



