Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 12 Sep 2002 06:50:33 -0700 (PDT)
From:      Don Lewis <dl-freebsd@catspoiler.org>
To:        iedowse@maths.tcd.ie
Cc:        current@FreeBSD.ORG, jeff@FreeBSD.ORG
Subject:   Re: nfs_inactive() bug? -> panic: lockmgr: locking against myself 
Message-ID:  <200209121350.g8CDoXwr000721@gw.catspoiler.org>
In-Reply-To: <200209111224.aa37675@salmon.maths.tcd.ie>

next in thread | previous in thread | raw e-mail | index | archive | help
On 11 Sep, Ian Dowse wrote:
> In message <200209110658.g8B6w2wr097059@gw.catspoiler.org>, Don Lewis writes:
>>
>>A potentially better solution just occurred to me.  It looks like it
>>would be better if vrele() waited to decrement v_usecount until *after*
>>the call to VOP_INACTIVE() (and after the call to VI_LOCK()).  If that
>>were done, nfs_inactive() wouldn't need to muck with v_usecount at all.
> 
> I've looked at this before; I think some filesystems (ufs anyway)
> depend on v_usecount being 0 when VOP_INACTIVE() is called.

Yup, though it would easy to tweak ufs_inactive() to expect the
v_usecount to be one greater.  A bigger problem is the call to
vrecycle() in ufs_inactive().

> The
> patch I have had lying around for quite a while is below. It adds
> a vnode flag to avoid recursion into the last-reference handling
> code in vrele/vput, which is the real problem.
> 
> It also guarantees that a vnode will not be recycled during
> VOP_INACTIVE(), so the nfs code no longer needs to hold an extra
> reference in the first place. The flag manipulation code got a bit
> messy after Jeff's vnode flag splitting work, so the patch could
> probably be improved.

If the need for nfs_inactive() to manipulate the reference count is
eliminated, that should be sufficient to eliminate the vrele/vput
recursion problem as well.
 
> Whatever way this is done, we should try to avoid adding more hacks
> to the nfs_inactive() code anyway.

Tweaking nfs_inactive() has the advantage of being the least intrusive
fix for this problem, but it isn't architecturally clean.

I'd also argue that decrementing the vnode reference count to zero, but
holding on to the reference for an extended period of time while doing
I/O on the vnode is also a kludge.  It also causes problems that require
kludgey fixes, like the reference count manipulation in nfs_inactive().

For the most part the VI_INACTIVE flag is just another way of indicating
that we are holding a reference to the vnode, so both the flag and the
reference count need to be checked to see if the vnode is in use.  The
advantage of adding this flag versus just bumping the reference count is
that the flag allows us to avoid the bugs in the current implementation
while not requiring changes to code that expects the reference count to
be zero in code called from VOP_INACTIVE().

Here are the three proposed fixes along with their advantages and
disadvantages:

  Inline the v_usecount decrement code in nfs_inactive()

    + Least intrusive fix for the recursion bug

    - Adds a kludge to a kludge

  Call VOP_INACTIVE() before decrementing the reference count and tweak
  the foo_inactive() methods to expect the reference count to be one
  instead of zero

    + Eliminates the kludge from nfs_inactive()

    + We aren't lying about holding a reference, so hopefully less
      potential for bugs

    - It is not obvious to how to adapt ufs_inactive()


  Add VI_INACTIVE flag to prevent the vnode from being recycled

    + Eliminates the kludge from nfs_inactive()

    + Doesn't require changes to the other filesystems

    - Code complexity

    - Programmer needs to know when to look at VI_INACTIVE and when
      looking at just the reference count is ok


After looking at ufs_inactive(), I'd like to add a fourth proposal that
would involve splitting VOP_INACTIVE() into two separate methods.  The
first method would be called before the reference count is decremented
and would do any I/O.  The second method would be called after the
reference count is decremented and could only be used for freeing
resources, manipulating flags, putting it on the free list with
vrecycle(), etc.

    + Eliminates the kludge from nfs_inactive()

    + We don't lie about holding a reference while doing I/O

    - More extensive code changes

    - Bikeshed arguments about what to call the two methods


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200209121350.g8CDoXwr000721>