Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 10 May 2013 12:11:35 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        Marcel Moolenaar <marcel@xcllnt.net>
Cc:        attilio@freebsd.org, svn-src-head@freebsd.org, svn-src-all@freebsd.org, Marcel Moolenaar <marcel@freebsd.org>, src-committers@freebsd.org
Subject:   Re: svn commit: r250411 - in head/sys: conf kern sys
Message-ID:  <201305101211.35808.jhb@freebsd.org>
In-Reply-To: <405C7C78-A626-4836-BD90-16FD08DD3196@xcllnt.net>
References:  <201305091628.r49GSI33039873@svn.freebsd.org> <201305100952.45101.jhb@freebsd.org> <405C7C78-A626-4836-BD90-16FD08DD3196@xcllnt.net>

next in thread | previous in thread | raw e-mail | index | archive | help
On Friday, May 10, 2013 11:46:54 am Marcel Moolenaar wrote:
> > 
> > 2) vnode locks from a local filesystem that report a LOR with a "devfs"
> >   vnode.  Typical reports are either "ufs" -> "devfs" or in some cases
> >   "ufs" -> "devfs" -> "ufs".  As with 1), I would much rather tag the
> >   offending location than to disable all WITNESS checking on vnode locks.
> 
> With more file system types in use, this will get mixed up with the
> other file systems and noise you get is rather severe. It is a big
> problem for us at Juniper.

Note, it is very specific that the second lock is always "devfs".  I think
that points to this being isolated to a few specific places, not a generic
ordering problem.

> This is a similar approach to what Attilio suggested (IMO) and I'm
> very concerned on 2 grounds:
> 1.  Disable/suspending witness and making it easy to do so has known
>     side-effects: people will use it simple to get rid of witness
>     warnings and not even bother to understand the root cause of the
>     witness complaints and thus fix the root cause problem. Fixing
>     the root cause could be improving witness, or fixing the locks.
>     If we add means and ways to have locks not checked by witness
>     then witness stops being the useful tool it mostly is now.

The goal is to make the disabling as fine-grained as possible because...

> 2.  This, to me, also doesn't come close to fixing the root cause
>     problem. It's a just a different bandaid. What I would like to
>     see is a statement as to why witness reports LORs (in this case)
>     and why they are not, so that we can improve witness.
>     For example, a statement could go like this: witness operates on
>     lock names. All vnode structures have a lock and they bare the
>     name of the file system. Consequently, we have many instances of
>     a particular lock type that share the same name. This causes,
>     ... (please complete this statement appropriately).
>     If our root cause problem is fundamentally that witness cannot
>     distinguish 1 UFS vnode lock from another UFS vnode lock then
>     we should fix that. Either:
>     1.  Construct unique lock names when witness is enabled, or
>     2.  Create the lock with a flag to tell witness that there are
> 	multiple locks of the same name and as such does not have
> 	enough the knowledge to report LORs adequately. How we
> 	change witness in the face of such a flag is another story
> 	and a good subject for ongoing discussion.

Some things witness can't really grok (at least not well).  It's really
that simple.  Take the dirhash case.  Here is the comment in the code:

/*
 * Locking:
 *
 * The relationship between inode and dirhash is protected either by an
 * exclusive vnode lock or the vnode interlock where a shared vnode lock
 * may be used.  The dirhash_mtx is acquired after the dirhash lock.  To
 * handle teardown races, code wishing to lock the dirhash for an inode
 * when using a shared vnode lock must obtain a private reference on the
 * dirhash while holding the vnode interlock.  They can drop it once they
 * have obtained the dirhash lock and verified that the dirhash wasn't
 * recycled while they waited for the dirhash lock.
 *
 * ufsdirhash_build() acquires a shared lock on the dirhash when it is
 * successful.  This lock is released after a call to ufsdirhash_lookup().
 *
 * Functions requiring exclusive access use ufsdirhash_acquire() which may
 * free a dirhash structure that was recycled by ufsdirhash_recycle().
 *
 * The dirhash lock may be held across io operations.
 *
 * WITNESS reports a lock order reversal between the "bufwait" lock
 * and the "dirhash" lock.  However, this specific reversal will not
 * cause a deadlock.  To get a deadlock, one would have to lock a
 * buffer followed by the dirhash while a second thread locked a
 * buffer while holding the dirhash lock.  The second order can happen
 * under a shared or exclusive vnode lock for the associated directory
 * in lookup().  The first order, however, can only happen under an
 * exclusive vnode lock (e.g. unlink(), rename(), etc.).  Thus, for
 * a thread to be doing a "bufwait" -> "dirhash" order, it has to hold
 * an exclusive vnode lock.  That exclusive vnode lock will prevent
 * any other threads from doing a "dirhash" -> "bufwait" order.
 */

So the issue here is that shared locks and exclusive locks are not quite
the same.  Perhaps this is something that WITNESS could "know" generically,
and one solution might be to use separate witness "objects" for shared
acquires on a lock vs exclusive acquires on a lock, but I'm not sure
that wouldn't result in false negatives (i.e. WITNESS not reporting a
LOR that can deadlock).  My intuition is that the general rule would not
be safe, so I would rather annotate the specific false positive in this
case to remove the noise of a known-false-positive.

Note that there are other instances where we make WITNESS less specific
because it doesn't know about details.  For example, almost all uses of
the "DUPOK" flags err on the side of false negatives.  For PROC_LOCK,
acquiring duplicate locks is only ok if the second PROC_LOCK is a parent
(or ancestor) of the first PROC_LOCK.  However, for WITNESS we just allow
any duplicate lock pairing (partly because there is no support for adding
rules to the duplicate check currently).  Vnode locks are another example
of this actually.  You can lock a child of a directory while holding the
vnode lock of the parent directory but not vice versa.  That particular
rule would be very hard to verify in WITNESS directly (given two vnode
lock pointers, could you "prove" that one is a parent directory of the
other non-trivially in witness_checkorder()?  I seriously doubt it), so we
accept some false-negatives to avoid false-positive LORs on every path
lookup.  Thus, in practice I think your goal of having WITNESS understand
every possible nuance is a bit unrealistic and not consistent with our
use of it historically.

If you accept the notion that we're going to have to cope with imperfection,
then I hope you can appreciate my desire to make that as fine-grained as
possible.

-- 
John Baldwin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201305101211.35808.jhb>