Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 15 Sep 2010 15:22:15 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-hackers@freebsd.org
Cc:        Andrey Simonenko <simon@comsys.ntu-kpi.kiev.ua>, Matthew Fleming <mdf356@gmail.com>
Subject:   Re: Questions about mutex implementation in kern/kern_mutex.c
Message-ID:  <201009151522.15593.jhb@freebsd.org>
In-Reply-To: <AANLkTimJV-oB_uTSbUTtbSrR5fXgWGk00dEV7L-Gobrf@mail.gmail.com>
References:  <20100915134415.GA23727@pm513-1.comsys.ntu-kpi.kiev.ua> <AANLkTimJV-oB_uTSbUTtbSrR5fXgWGk00dEV7L-Gobrf@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wednesday, September 15, 2010 11:46:00 am Matthew Fleming wrote:
> I'll take a stab at answering these...
> 
> On Wed, Sep 15, 2010 at 6:44 AM, Andrey Simonenko
> <simon@comsys.ntu-kpi.kiev.ua> wrote:
> > Hello,
> >
> > I have questions about mutex implementation in kern/kern_mutex.c
> > and sys/mutex.h files (current versions of these files):
> >
> > 1. Is the following statement correct for a volatile pointer or integer
> >   variable: if a volatile variable is updated by the compare-and-set
> >   instruction (e.g. atomic_cmpset_ptr(&val, ...)), then the current
> >   value of such variable can be read without any special instruction
> >   (e.g. v = val)?
> >
> >   I checked Assembler code for a function with "v = val" and "val = v"
> >   like statements generated for volatile variable and simple variable
> >   and found differences: on ia64 "v = val" was implemented by ld.acq and
> >   "val = v" was implemented by st.rel; on mips and sparc64 Assembler code
> >   can have different order of lines for volatile and simple variable
> >   (depends on the code of a function).
> 
> I think this depends somewhat on the hardware and what you mean by
> "current" value.
> 
> If you want a value that is not in-flux, then something like
> atomic_cmpset_ptr() setting to the current value is needed, so that
> you force any other atomic_cmpset to fail.  However, since there is no
> explicit lock involved, there is no strong meaning for "current" value
> and a read that does not rely on a value cached in a register is
> likely sufficient.  While the "volatile" keyword in C has no explicit
> hardware meaning, it often means that a load from memory (or,
> presumably, L1-L3 cache) is required.

Actually, all we care about is getting a consistent snapshot of the value of 
the lock cookie at some point in time.  For that 'v = val' works fine.  The 
value may certainly be stale, but the mutex code handles these races in two 
ways:

  1) If MTX_CONTESTED is not set, then the lock cookie value can change at
     any time to either be unlocked, locked by another thread, or to
     become contested.  If any of those actions occur, then the attempt to
     set the MTX_CONTESTED bit via atomic_cmpset() in _mtx_lock_sleep() will
     fail causing the code to retry its loop until it succesfully sets
     MTX_CONTESTED or it notices a different lock cookie state.

  2) Once MTX_CONTESTED is set, the value of the lock cookie will not be
     changed unless the associated turnstile chain is locked.  This means that
     once we have locked the turnstile chain and verified that MTX_CONTESTED
     is set (or successfully set the bit), we can call turnstile_wait() to
     block without assured that the owner of the lock will resume this thread
     via turnstile_wakeup() when it releases the lock.

> > 2. Let there is a default (sleep) mutex and adaptive mutexes is enabled.
> >   A thread tries to obtain lock quickly and fails, _mtx_lock_sleep()
> >   is called, it gets the address of the current mutex's owner thread
> >   and checks whether that owner thread is running (on another CPU).
> >   How does _mtx_lock_sleep() know that that thread still exists
> >   (lines 311-337 in kern_mutex.c)?
> >
> >   When adaptive mutexes was implemented there was explicit locking
> >   around adaptive mutexes code.  When turnstile in mutex code was
> >   implemented that's locking logic was changed.
> 
> It appears that it's possible for the thread pointer to be recycled
> between fetching the value of owner and looking at TD_IS_RUNNING.  On
> actual hardware, this race is unlikely to occur due to the time it
> takes for a thread to release a lock and perform all of thread exit
> code before the struct thread is returned to the uma zone.  However,
> even once returned to the uma zone on many FreeBSD implementations the
> access is safe as the address of the thread is still dereferenceable,
> due to the implementation of uma zones.
> 
> On e.g. AIX this issue was different because the address range for
> threads was determined at compile time (one giant table) and the array
> only grew, never shrank, so the thread pointer was always valid and
> would be recycled at first opportunity.
> 
> It appears to me, from a strict correctness standpoint, that the use
> of uma_zalloc/uma_zfree for thread objects is not safe.  But from a
> practical implementation POV, the unsafe access in kern_mutex.c will
> not cause trouble in the absence of a hypervisor controlling when
> virtual CPUs get runtime.

Yes, it is a known "accepted" race.  This does probably warrant a comment
to say as much.  One could perhaps remove the race by using the owning
thread's td_cpu to do a pcpu_find() and comparing pc_curthread against
the cached 'owner' value instead.  I think even in that case you can still
be subject to the same theoretical race however if a HV prevented you from
running in between setting 'owner' and dereferencing 'owner->td_oncpu'.

However, I might actually prefer switching to the 'pc_curthread' approach
only because it does less work on each spin.

> > 3. Why there is no any memory barrier in mtx_init()?  If another thread
> >   (on another CPU) finds that mutex is initialized using mtx_initialized()
> >   then it can mtx_lock() it and mtx_lock() it second time, as a result
> >   mtx_recurse field will be increased, but its value still can be
> >   uninitialized on architecture with relaxed memory ordering model.
> 
> It seems to me that it's generally a programming error to rely on the
> return of mtx_initialized(), as there is no serialization with e.g. a
> thread calling mtx_destroy().  A fully correct serialization model
> would require that a single thread initialize the mtx and then create
> any worker threads that will use the mtx.

Yes, it is the caller's job to not expose a mtx until after it has been 
initialized.  A memory barrier in mtx_init() can't solve all those races.  If 
you put an object containing a mutex on a global queue and only invoke 
mtx_init() after dropping the global lock protecting the global queue, no 
amount of memory barriers in mtx_init() will save you.

-- 
John Baldwin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201009151522.15593.jhb>