Date:      Tue, 4 Aug 1998 23:18:25 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        wollman@khavrinen.lcs.mit.edu (Garrett Wollman)
Cc:        freebsd-fs@FreeBSD.ORG, core@FreeBSD.ORG
Subject:   Re: Exclusive locking for directory lookups?
Message-ID:  <199808042318.QAA11288@usr07.primenet.com>
In-Reply-To: <199808041758.NAA03021@khavrinen.lcs.mit.edu> from "Garrett Wollman" at Aug 4, 98 01:58:33 pm

> Does anybody remember why plain-jane directory lookups (i.e., not
> deleting or creating anything) require an exclusive lock on all the
> directory vnodes along the path?  It would seem to be that only shared
> locks should be necessary in those cases...

Because there is no flag from namei() to indicate whether the
terminal path component is going to be modified or not.  This is
more a symptom of the way namei() is implemented; it can't
support inheritance of such a flag (or inheritance of a POSIX
namespace escape, "//<escape>/<rest-of-path>") for the same reason.

The locking is a tail-chase down the tree -- that is, the lock is
one-behind (parent and child) and is not held to the root.
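The one-behind pattern can be illustrated with a small user-space model.
This is only a sketch under stated assumptions: toy_node, toy_lock(),
toy_unlock() and toy_walk() are hypothetical names invented for the
illustration, the "lock" is a plain flag rather than a real vnode lock,
and nothing here is taken from the namei() source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy node standing in for a directory vnode. */
struct toy_node {
    const char      *name;
    struct toy_node *children;  /* first entry in this directory */
    struct toy_node *sibling;   /* next entry in the same directory */
    int              locked;    /* a flag, not a real lock */
};

static int locks_held;      /* locks currently held by the walker */
static int max_locks_held;  /* high-water mark, to show "one-behind" */

static void toy_lock(struct toy_node *n)
{
    assert(!n->locked);
    n->locked = 1;
    if (++locks_held > max_locks_held)
        max_locks_held = locks_held;
}

static void toy_unlock(struct toy_node *n)
{
    assert(n->locked);
    n->locked = 0;
    locks_held--;
}

/* Find the entry called name[0..len) in dir, or NULL. */
static struct toy_node *toy_find(struct toy_node *dir, const char *name,
                                 size_t len)
{
    for (struct toy_node *c = dir->children; c != NULL; c = c->sibling)
        if (strlen(c->name) == len && memcmp(c->name, name, len) == 0)
            return c;
    return NULL;
}

/*
 * Walk path from root.  Returns the final node, still locked, or NULL
 * with nothing locked if a component is missing.
 */
static struct toy_node *toy_walk(struct toy_node *root, const char *path)
{
    struct toy_node *parent = root;

    toy_lock(parent);
    while (*path != '\0') {
        while (*path == '/')
            path++;
        if (*path == '\0')
            break;
        size_t len = strcspn(path, "/");
        struct toy_node *child = toy_find(parent, path, len);
        if (child == NULL) {
            toy_unlock(parent);
            return NULL;
        }
        toy_lock(child);     /* take the child while the parent is held */
        toy_unlock(parent);  /* then drop the parent: one-behind */
        parent = child;
        path += len;
    }
    return parent;
}
```

Walking "/usr/src/sys" through such a tree never holds more than two
locks at once (parent and child), and only the last node is still
locked when the walk returns.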

The real question is whether the race this is protecting against
still exists with a unified VM and buffer cache.  If it does not,
you could reexamine the need for it.  For this to work, you would
need my patch to have namei() return EEXIST itself, instead of
duplicating the code every place you want a lookup to fail if the
target exists (i.e., the create/rename target cases, etc.); you
need to distinguish internal vs. external error returns for this
case.  My gut feeling is that it is still necessary, even though
the race it was intended to protect against was a VM and buffer
cache coherency problem with multiple accesses in the "write
entry" case, mostly because of the late buffer mapping.  I could
be wrong here, though.

In any case, that would mean you would need to add a flag to
indicate a terminal component lookup to VOP_LOOKUP.  This is
somewhat problematic, because an underlying FS is permitted to
eat as many components as it wants to, according to the design.

To get around this, the idea that it is the terminal component would
have to be indicated by the nonexistence of a "next component".
Implementing this approach would require pre-parsing the path into
components.  The easiest way to do this would be to keep a separate
"total length" in the path component buffer, and replace the path
separators with NUL, treating it as a pre-strtok'ed string.  If you
go this route, consider providing access macros as well, making the
underlying FS advance cn_nameptr (if it consumes extra components),
and in general making the structure opaque enough that we could
support multiple namespaces (the current VFAT short name binding and
assumption of ISO 8859-1 character set instead of Unicode is broken).
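A rough user-space sketch of the pre-parsed buffer might look like the
following.  All of the names (toy_path, toy_path_parse(),
toy_path_first(), toy_path_next(), TOY_PATH_MAX) are hypothetical and
are not part of struct componentname or any existing kernel interface.
Static functions stand in for the access macros, and trailing-slash
semantics are deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PATH_MAX 256    /* arbitrary limit for the sketch */

/* Hypothetical pre-parsed path: separators become NUL bytes. */
struct toy_path {
    char   buf[TOY_PATH_MAX];
    size_t total_len;       /* length of the original path string */
};

/* Copy path into p and replace every '/' with NUL.  Returns 0 or -1. */
static int toy_path_parse(struct toy_path *p, const char *path)
{
    size_t n = strlen(path);

    if (n >= TOY_PATH_MAX)
        return -1;
    memcpy(p->buf, path, n + 1);
    p->total_len = n;
    for (size_t i = 0; i < n; i++)
        if (p->buf[i] == '/')
            p->buf[i] = '\0';
    return 0;
}

/* Skip NUL separators from s up to end; NULL if nothing is left. */
static const char *toy_skip(const char *s, const char *end)
{
    while (s < end && *s == '\0')
        s++;
    return s < end ? s : NULL;
}

/* First component, or NULL if the path has none. */
static const char *toy_path_first(const struct toy_path *p)
{
    return toy_skip(p->buf, p->buf + p->total_len);
}

/* Component after cur, or NULL if cur is the terminal component. */
static const char *toy_path_next(const struct toy_path *p, const char *cur)
{
    const char *end = p->buf + p->total_len;

    assert(cur >= p->buf && cur <= end);
    return toy_skip(cur + strlen(cur), end);
}
```

Because the total length is stored separately, the embedded NULs do not
hide the rest of the path, and a consumer that eats several components
only has to move its current pointer forward past them.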

The underlying VOP_LOOKUP for the "writing an entry" case would
use the accessor macro to ask for the start of the next component;
if it got a NULL back, it would know it was terminal, and that it
needed to lock (handling the EEXIST case in namei() lets you
avoid this lock, if the lookup would succeed).
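One way to read this is as a small decision function, sketched below.
The names (toy_op, toy_lock, toy_next_component(), toy_choose_lock())
are hypothetical, and mapping "needed to lock" to an exclusive lock and
everything else to a shared lock is an assumption of the sketch.  The
EEXIST shortcut in namei() is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operation and lock kinds, not the kernel's own flags. */
enum toy_op   { TOY_OP_LOOKUP, TOY_OP_CREATE, TOY_OP_RENAME };
enum toy_lock { TOY_LOCK_SHARED, TOY_LOCK_EXCLUSIVE };

/*
 * Accessor over a NUL-separated path buffer ending at end: returns the
 * start of the component after cur, or NULL if cur is the last one.
 */
static const char *toy_next_component(const char *cur, const char *end)
{
    const char *s;

    assert(cur <= end);
    s = cur + strlen(cur);
    while (s < end && *s == '\0')
        s++;
    return s < end ? s : NULL;
}

/*
 * Only the terminal component of an entry-writing operation gets the
 * exclusive lock; intermediate components and plain lookups stay shared.
 */
static enum toy_lock toy_choose_lock(enum toy_op op, const char *cur,
                                     const char *end)
{
    int terminal = (toy_next_component(cur, end) == NULL);

    if (terminal && op != TOY_OP_LOOKUP)
        return TOY_LOCK_EXCLUSIVE;
    return TOY_LOCK_SHARED;
}
```

With a buffer holding "usr", "src" and "new" separated by NULs, a create
would take shared locks for "usr" and "src" and an exclusive lock only
when it reaches "new"; a plain lookup of the same path stays shared
throughout.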


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
