Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 15 Jun 2003 22:06:59 -0700 (PDT)
From:      Don Lewis <truckman@FreeBSD.org>
To:        jroberson@chesapeake.net
Cc:        arch@FreeBSD.org
Subject:   Re: vnode/buf locking deadlock between nfsiod and getblk()
Message-ID:  <200306160507.h5G56xM7047970@gw.catspoiler.org>
In-Reply-To: <20030615233553.N36168-100000@mail.chesapeake.net>

next in thread | previous in thread | raw e-mail | index | archive | help
On 15 Jun, Jeff Roberson wrote:
> On Sat, 14 Jun 2003, Don Lewis wrote:
> 

>> I finally dug around in the code and discovered that the problem is
>> fairly fundamental.  If a thread calls VOP_STRATEGY() to for
>> asynchronous I/O on an NFS mounted filesystem, or if it calls
>> nfs_biord() which decides to do readahead, the request is handled by
>> nfs_asyncio(), which uses BUF_KERNPROC() to transfer ownership of the
>> buf lock to the system, and queues the buf on nmp->nm_bufq for nfsiod to
>> handle later.
>>
>> Everything is fine if nfsiod is able to service the request before
>> another thread requests the buf.  The problem occurs when another thread
>> attempts to do I/O on the file, grabs the vnode lock and then tries to
>> grab the buf lock before nfsiod has gotten around servicing the request.
>> The thread requesting the I/O can't proceed until it gets the buf lock,
>> which won't happen until the the queue request has been serviced, and
>> nfsiod can handle the I/O request in the buf because it can't obtain the
>> vnode lock.  The only reason that we don't see this failure is that
>> nfsiod is not requesting the vnode lock and is allowing nfs_doio() to
>> play with an unlocked vnode (or one locked by another thread).
>>
>> I came up with three possible ways of fixing this, none of which sound
>> very appealing:
>>
>> 	Fix nfs_doio() so that it and the functions that it calls don't
>>         touch any vnode fields that require the vnode lock.
>>
>> 	When attempting to lock a buf whose current lockholder is
>>         LK_KERNPROC, back off by dropping the vnode lock and retrying.
> 
> There are several other places in the kernel that use the solution above.
> If you go backwards from the buf to the vnode lock you must trylock or
> equivalent.

I had actually thought about doing this in nfssvc_iod(), which is where
we try to grab the locks in the wrong order, but I didn't attempt to
actually implement this because I wasn't convinced that the
implementation I had in mind wouldn't spin in a tight loop in some
circumstances.  Now that I've looked at it in more detail, I'm pretty
sure that we can't drop the lock on the buf in nfssvc_iod() to back all
the way out since it looks like the buf needs to be remain locked until
the I/O request is completed.  The problem is nfsiod can't complete the
I/O request without the vnode lock, and it can't get the vnode lock if
some other thread has locked the vnode and is waiting to lock the buf.

My suggestion above is to have nfs_getcacheblk() or getblk() temporarily
fail and have it's caller drop any vnode locks and retry.  This strikes
me as worst of these three choices.

>> 	When attempting to lock a buf whose current lockholder is
>>         LK_KERNPROC, steal the buf back and do the requested I/O
>>         synchronously before proceeding if the previously requested I/O
>>         was not already in progress.

I actually like this one the best.  If there large amount of I/O queued
on the NFS mount point for nfsiod to handle, this has the effect of
giving a higher priority to I/O requests that a process is waiting for
than write behind and speculative read ahead requests.


Another undesirable "fix" would be to give away the vnode lock to nfsiod
at the same time as the the buf lock is given away.  That should do
wonders for performance ...



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200306160507.h5G56xM7047970>