Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 11 May 2013 23:09:17 -0700
From:      Alfred Perlstein <bright@mu.org>
To:        Jeff Roberson <jroberson@jroberson.net>
Cc:        Marcel Moolenaar <marcel@FreeBSD.org>, jhb@freebsd.org, svn-src-all@freebsd.org, alfred@freebsd.org, attilio@freebsd.org, src-committers@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r250411 - in head/sys: conf kern sys
Message-ID:  <518F320D.3070304@mu.org>
In-Reply-To: <alpine.BSF.2.00.1305111405421.2005@desktop>
References:  <201305091628.r49GSI33039873@svn.freebsd.org> <alpine.BSF.2.00.1305111405421.2005@desktop>

next in thread | previous in thread | raw e-mail | index | archive | help
Can we just admit to ourselves that tweaks to debugging macros/printing 
and WITNESS are our kernel developer's "bikeshed zone" and get over the 
fact that people's needs may diverge and changing non-default behavior 
in non-critical paths is not going to be the death of the kernel as we 
know it?

I could certainly believe that this sort of thing needs long and 
thorough discussion if it wasn't the equivalent of style tweaks to manpages.

Let's leave the long and lengthy discussions to things that matter such 
as standards compliance, ABI, API and really cool performance and 
stability stuff.

-Alfred




On 5/11/13 5:43 PM, Jeff Roberson wrote:
> On Thu, 9 May 2013, Marcel Moolenaar wrote:
>
>> Author: marcel
>> Date: Thu May  9 16:28:18 2013
>> New Revision: 250411
>> URL: http://svnweb.freebsd.org/changeset/base/250411
>>
>> Log:
>>  Add option WITNESS_NO_VNODE to suppress printing LORs between VNODE
>>  locks. To support this, VNODE locks are created with the LK_IS_VNODE
>>  flag. This flag is propagated down using the LO_IS_VNODE flag.
>>
>>  Note that WITNESS still records the LOR. Only the printing and the
>>  optional entering into the kernel debugger is bypassed with the
>>  WITNESS_NO_VNODE option.
>
> I'm replying to the original commit because the resulting thread got 
> way out of hand.  We need to all take a deep breath and take a 
> pragmatic approach to solving the problem at hand.
>
> Let me first say I understand the utility here as this is also coming 
> up in my organization.  Test, and users, do not want to see erroneous 
> warning messages.  I understand that.  Let's find a solution.
>
> Secondly, I think this project has grown too far for us to commit 
> changes like this without some focused discussion.  We need to be more 
> mindful of the size of the impact and the number of people who are 
> interested in a particular area.  I'm not picking on you Marcel 
> because this sort of thing has been coming up lately and we have all 
> been guilty of it from time to time.  There are more companies and 
> individuals than ever trying to push work into the repository and 
> we're having some growing pains.
>
> I am intimately familiar with the problems that lead to these 
> erroneous witness messages as I have tracked down many of them and am 
> even responsible for the code that generates them in some cases.  Let 
> me first outline a handful of generic problems.  The root cause is 
> that witness can not determine the real order between two locks due to 
> relationships too complex to describe with a pair of strings.
>
> One example, which has been brought up, is the hierarchical nature of 
> vnode locks.  This impacts vnodes within one filesystem but it also 
> involves vnodes between two different filesystems as you cross mount 
> points.  We can construct perfectly valid and deadlock free chains of 
> mount points that have two different filesystem types in different 
> orders which will LOR at the boundaries.  We already skip duplicates 
> to avoid this problem within each filesystem.  We need to skip 
> cross-filesystem duplicates, most desirably at the few specific places 
> where this happens. This problem comes up especially for devfs because 
> we lock devvps while file vnodes are locked but we lock devfs 
> directories after the rootfs lock when crossing mountpoints in lookup.
>
> A second example, is locks of a fundamentally different type that have 
> a complex ordering relationship.  For example, a vnode lock may be 
> acquired after a buf lock belonging to the parent's directory block.  
> A cg buf lock may be acquired after any file buf lock.  Here we want 
> to ignore interactions between these two specific types at this 
> particular location but not others as they may be unsafe.
>
> The third example, is a complex locking pattern with shared locks as 
> presented by dirhash.  We are seeing a similar pattern develop in the 
> vm where we are going to use an exclusive object lock to protect pages 
> or a shared object lock + a page lock.  The semantics only get more 
> complex as we push for more scalability. I expect to see more of these 
> patterns develop.
>
> None of these problems can be solved with names alone.  So far we've 
> just lived with the warnings and we're no longer willing to accept 
> that. What we need is a solution that blesses the specific instances 
> and the specific lock classes involved without silencing legitimate 
> warnings that may only occur after new code is added. For example, it 
> may be safe to add a sx lock around some vnode code but you may not 
> notice that you LOR if you silence all witness warnings related to the 
> vnode lock site.
>
> I believe that the perfect solution would be a mechanism that could 
> teach witness about and enforce these specific relationships.  
> However, that may be computationally prohibitive and too complex to 
> code.  A more reasonable option would be to bless the specific 
> relationships at the specific call sites. Turning all witness off at 
> particular sites or with particular types renders important 
> infrastructure useless for very large functional areas.  It's also 
> important to distinguish between squelching the error message from 
> eliminating the other state that is saved at lock sites.
>
> We already have lock names and types.  What I would propose we do is 
> make the type 'vnode' for all vnodes and 'buf' for all bufs with the 
> names used for the specific filesystems.  Then you could specify a 
> DUPOK that automatically blesses any filesystem to filesystem related 
> LORs.  In this way witness still records the call sites and unrelated 
> LORs or panics still have the acquisition information.  You could 
> eventually unwind this to only DUPOK at the specific currently known 
> places that we anticipate multiple vnodes.
>
> To solve the buf and other complex LORs involving multiple types you 
> would make lock variants that accept a blessed list of the specific 
> locks you are blessing.  We already have support for a blessed list in 
> witness.  We just want something like this per-call.
>
> For example;  When acquiring a child vnode lock after a parent lock 
> when a parent's buf is held I would do something like this:
>
> vn_lock_flags(vp, LK_EXCLUSIVE, { "vnode", "bufwait", NULL });
>
> Written properly dead code elimination will take care of it for 
> non-debug builds.  You could come up with other ways of writing this 
> that don't involve passing pointers down the stack.  For example, 
> making a unique token for this blessed pair and passing it in the high 
> bits of the flags. I don't really care about the particular 
> implementation but I think this is the right model.
>
> Thanks,
> Jeff
>
>>
>> Modified:
>>  head/sys/conf/options
>>  head/sys/kern/kern_lock.c
>>  head/sys/kern/subr_witness.c
>>  head/sys/kern/vfs_subr.c
>>  head/sys/sys/lock.h
>>  head/sys/sys/lockmgr.h
>>
>> Modified: head/sys/conf/options
>> ============================================================================== 
>>
>> --- head/sys/conf/options    Thu May  9 16:09:39 2013 (r250410)
>> +++ head/sys/conf/options    Thu May  9 16:28:18 2013 (r250411)
>> @@ -672,6 +672,7 @@ KTR_ENTRIES        opt_global.h
>> KTR_VERBOSE        opt_ktr.h
>> WITNESS            opt_global.h
>> WITNESS_KDB        opt_witness.h
>> +WITNESS_NO_VNODE    opt_witness.h
>> WITNESS_SKIPSPIN    opt_witness.h
>>
>> # options for ACPI support
>>
>> Modified: head/sys/kern/kern_lock.c
>> ============================================================================== 
>>
>> --- head/sys/kern/kern_lock.c    Thu May  9 16:09:39 2013 (r250410)
>> +++ head/sys/kern/kern_lock.c    Thu May  9 16:28:18 2013 (r250411)
>> @@ -393,6 +393,8 @@ lockinit(struct lock *lk, int pri, const
>>         iflags |= LO_WITNESS;
>>     if (flags & LK_QUIET)
>>         iflags |= LO_QUIET;
>> +    if (flags & LK_IS_VNODE)
>> +        iflags |= LO_IS_VNODE;
>>     iflags |= flags & (LK_ADAPTIVE | LK_NOSHARE);
>>
>>     lk->lk_lock = LK_UNLOCKED;
>>
>> Modified: head/sys/kern/subr_witness.c
>> ============================================================================== 
>>
>> --- head/sys/kern/subr_witness.c    Thu May  9 16:09:39 2013 (r250410)
>> +++ head/sys/kern/subr_witness.c    Thu May  9 16:28:18 2013 (r250411)
>> @@ -1289,7 +1289,19 @@ witness_checkorder(struct lock_object *l
>>             w->w_reversed = w1->w_reversed = 1;
>>             witness_increment_graph_generation();
>>             mtx_unlock_spin(&w_mtx);
>> -
>> +
>> +#ifdef WITNESS_NO_VNODE
>> +            /*
>> +             * There are known LORs between VNODE locks. They are
>> +             * not an indication of a bug. VNODE locks are flagged
>> +             * as such (LO_IS_VNODE) and we don't yell if the LOR
>> +             * is between 2 VNODE locks.
>> +             */
>> +            if ((lock->lo_flags & LO_IS_VNODE) != 0 &&
>> +                (lock1->li_lock->lo_flags & LO_IS_VNODE) != 0)
>> +                return;
>> +#endif
>> +
>>             /*
>>              * Ok, yell about it.
>>              */
>>
>> Modified: head/sys/kern/vfs_subr.c
>> ============================================================================== 
>>
>> --- head/sys/kern/vfs_subr.c    Thu May  9 16:09:39 2013 (r250410)
>> +++ head/sys/kern/vfs_subr.c    Thu May  9 16:28:18 2013 (r250411)
>> @@ -1037,7 +1037,7 @@ alloc:
>>      * By default, don't allow shared locks unless filesystems
>>      * opt-in.
>>      */
>> -    lockinit(vp->v_vnlock, PVFS, tag, VLKTIMEOUT, LK_NOSHARE);
>> +    lockinit(vp->v_vnlock, PVFS, tag, VLKTIMEOUT, LK_NOSHARE | 
>> LK_IS_VNODE);
>>     /*
>>      * Initialize bufobj.
>>      */
>>
>> Modified: head/sys/sys/lock.h
>> ============================================================================== 
>>
>> --- head/sys/sys/lock.h    Thu May  9 16:09:39 2013    (r250410)
>> +++ head/sys/sys/lock.h    Thu May  9 16:28:18 2013    (r250411)
>> @@ -79,6 +79,7 @@ struct lock_class {
>> #define    LO_SLEEPABLE    0x00100000    /* Lock may be held while 
>> sleeping. */
>> #define    LO_UPGRADABLE    0x00200000    /* Lock may be 
>> upgraded/downgraded. */
>> #define    LO_DUPOK    0x00400000    /* Don't check for duplicate 
>> acquires */
>> +#define    LO_IS_VNODE    0x00800000    /* Tell WITNESS about a 
>> VNODE lock */
>> #define    LO_CLASSMASK    0x0f000000    /* Class index bitmask. */
>> #define LO_NOPROFILE    0x10000000      /* Don't profile this lock */
>>
>>
>> Modified: head/sys/sys/lockmgr.h
>> ============================================================================== 
>>
>> --- head/sys/sys/lockmgr.h    Thu May  9 16:09:39 2013 (r250410)
>> +++ head/sys/sys/lockmgr.h    Thu May  9 16:28:18 2013 (r250411)
>> @@ -146,6 +146,7 @@ _lockmgr_args_rw(struct lock *lk, u_int
>> #define    LK_NOWITNESS    0x000010
>> #define    LK_QUIET    0x000020
>> #define    LK_ADAPTIVE    0x000040
>> +#define    LK_IS_VNODE    0x000080    /* Tell WITNESS about a VNODE 
>> lock */
>>
>> /*
>>  * Additional attributes to be used in lockmgr().
>>
>




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?518F320D.3070304>