Date:      Mon, 16 Mar 1998 17:09:01 +0200
From:      Anatoly Vorobey <mellon@pobox.com>
To:        fs@FreeBSD.ORG
Subject:   NFS
Message-ID:  <19980316170901.06534@techunix.technion.ac.il>

This is a possibly clueless question; a few days ago I knew
nothing about NFS internals, so please forgive my ignorance.
I'm trying to learn NFS and VFS internals by debugging a few
crash scenarios.

How is NFS supposed (if it is) to deal with deadlocks resulting
from upcalls?

Example: currently it's possible to hang the machine by mounting
an NFS-exported fs _locally_, on the same machine, and copying
with cp or dd a large (>2Mb) file from a local fs to the "imported"
fs. 

E.g. mount_nfs localhost:/usr /local ; cp LARGEFILE /local

The system times out indefinitely in the NFS client code; soft-mounting
does not solve the problem. Using NFS 2 does solve the problem.

None of John Dyson's latest fixes addresses this; it seems
to be a more fundamental problem. 

Here's why it happens. As cp keeps issuing write()'s, and they
become nfs_write()'s, nfs_write() keeps filling buffer after
buffer and calls nfs_doio to write them. Since it's NFS 3, the
write is async by default (until commit comes along), and 
nfs_doio marks the buffer dirty and delayed-write, and sends
the write (later biodone will release the buffer onto the dirty
queue). The "server", which is the same machine, keeps receiving
these writes, and since it's NFS 3 again, it calls bdwrite() 
instead of bwrite(), also putting them onto the dirty queue. 
At some point, the server's bdwrite() will discover there are too many
dirty buffers (numdirtybuffers>=highdirtybuffers, which is 256
by default, thus the approx. 2Mb limit), and will try to flush
dirty buffers. However, some of those dirty buffers are _client's_
dirty buffers, and flushing them will try to nfs_commit(). This
nfs_commit() will fail because we still haven't returned from the
previous nfs_write() (the server needs to flush buffers in order
to perform it). We're in a deadlock. If it's a soft mount, after
a few minutes nfs_write() will timeout, and nfs_commit() will get
a chance to receive its reply from the server; however, it won't:
the server is locked trying to nfsrv_commit() - it can't do that
before nfsrv_write()->bdwrite()->flushdirtybuffers() return.
The client can't even resend the commit, since the NFS send window
has shrunk after all those timeouts. 
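
To make the circular wait explicit, here is a minimal user-space sketch
(Python, not kernel code) that writes the blocking relationships described
above as a wait-for graph and walks it. The node labels reuse the function
names from this message; the graph encoding and the find_cycle() helper are
illustrative assumptions and do not correspond to anything in the FreeBSD
source.

```python
# Toy wait-for graph for the hang described above.
# Each key is blocked until the value makes progress.
WAITS_FOR = {
    "nfs_write (client)": "nfsrv_write (server)",
    "nfsrv_write (server)": "flushdirtybuffers (server)",
    "flushdirtybuffers (server)": "nfs_commit (client)",
    "nfs_commit (client)": "nfsrv_commit (server)",
    "nfsrv_commit (server)": "nfsrv_write (server)",
}


def find_cycle(start, waits_for):
    """Follow wait-for edges from start; return the cycle as a list, or None."""
    path = []
    cur = start
    while cur is not None:
        if cur in path:
            # Everything from the first visit of cur onward forms the cycle.
            return path[path.index(cur):]
        path.append(cur)
        cur = waits_for.get(cur)
    return None


if __name__ == "__main__":
    cycle = find_cycle("nfs_write (client)", WAITS_FOR)
    if cycle is None:
        print("no circular wait")
    else:
        print(" -> ".join(cycle + cycle[:1]))
```

If the last edge (nfsrv_commit waiting on nfsrv_write) is removed, the walk
ends without finding a cycle, which matches the observation that the hang
needs the server to be stuck inside the flush while the commit arrives.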

Note that although importing an NFS-exported fs locally is bizarre,
the same scenario can happen on two machines which are importing
from each other, when there're enough dirty buffers on each. The
problem is, formally, that nfs_writerpc(), which sits on a layer
below the buffer cache, leads to an upcall on the server, which can
in turn lead to the server making calls on the buffer cache layer. 

There may be several ways to fix this, but at this point I'm not
even sure whether it's considered a problem, or how bad it should
be considered. (I don't have two machines to test a deadlock
between two.) 

Note that if you cat LARGEFILE instead of cp or dd, it never
hangs. The reason is that cat sends 1024-byte blocks instead
of full buffers or more to nfs_write, and nfs_write deals with it
by bdwrite()'ing them and not calling nfs_doio() at all - it'll
get called later when there's a need to purge the dirty cache. 
This consideration leads to discovering a _bug_ in nfs_doio:
when it both sends a full buffer and puts it into dirty cache,
it never checks if there's a need to flush buffers, and 
numdirtybuffers merrily grows much greater than highdirtybuffers
(it can't check really, because it doesn't see highdirtybuffers which
is local to vfs_bio.c; it shouldn't ++numdirtybuffers itself
but rather should call bdirty() (not bdwrite()), which is currently
never called by anyone and should also be slightly modified; I
can send a patch for this to whomever's interested). 
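
A rough user-space sketch (Python, not kernel code) of the accounting bug
described above: one path bumps the dirty counter directly, the way nfs_doio
is said to, and the other goes through a helper that also checks the
high-water mark. The class, its method names, and the "flush down to half the
limit" policy are all hypothetical; only the names numdirtybuffers and
highdirtybuffers and the default of 256 come from this message. This models
the counter only and does not address the deadlock itself.

```python
HIGH_DIRTY = 256  # highdirtybuffers default quoted in this message


class BufCacheModel:
    """Hypothetical model of the dirty-buffer counter, not the vfs_bio.c code."""

    def __init__(self, high=HIGH_DIRTY):
        self.high = high
        self.numdirty = 0
        self.flushes = 0

    def mark_dirty_unchecked(self):
        # Models the behaviour attributed to nfs_doio: count, never check.
        self.numdirty += 1

    def mark_dirty_checked(self):
        # Models a bdirty()-style helper: count, then flush at the limit.
        self.numdirty += 1
        if self.numdirty >= self.high:
            self.flush()

    def flush(self):
        # Assumed policy: write buffers back until half the limit remains.
        self.numdirty = self.high // 2
        self.flushes += 1


if __name__ == "__main__":
    unchecked, checked = BufCacheModel(), BufCacheModel()
    for _ in range(1000):
        unchecked.mark_dirty_unchecked()
        checked.mark_dirty_checked()
    print("unchecked:", unchecked.numdirty, "checked:", checked.numdirty)
```

With the unchecked path the counter simply tracks the number of writes, while
the checked path stays below the limit because every crossing triggers a flush.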

--
Anatoly Vorobey,
mellon@pobox.com http://pobox.com/~mellon/
"Angels can fly because they take themselves lightly" - G.K.Chesterton