Date:      Tue, 17 Aug 1999 13:44:34 -0700 (PDT)
From:      Bill Studenmund <wrstuden@nas.nasa.gov>
To:        Terry Lambert <tlambert@primenet.com>
Cc:        Hackers@FreeBSD.ORG, fs@FreeBSD.ORG
Subject:   Re: BSD XFS Port & BSD VFS Rewrite
Message-ID:  <Pine.SOL.3.96.990817092121.6014C-100000@marcy.nas.nasa.gov>
In-Reply-To: <199908170231.TAA08526@usr02.primenet.com>

On Tue, 17 Aug 1999, Terry Lambert wrote:

> > > > > 2.	Advisory locks are hung off private backing objects.
> > I'm not sure. The struct lock * is only used by layered filesystems, so
> > they can keep track both of the underlying vnode lock, and if needed their
> > own vnode lock. For advisory locks, would we want to keep track both of
> > locks on our layer and the layer below? Don't we want either one or the
> > other? i.e. layers bypass to the one below, or deal with it all
> > themselves.
> 
> I think you want the lock on the intermediate layer: basically, on
> every vnode that has data associated with it that is unique to a
> layer.  Let's not forget, also, that you can expose a layer into
> the namespace in one place, and expose it covered under another
> layer, at another.  If you locked down to the backing object, then
> the only issue you would be left with is one or more intermediate
> backing objects.

Right. That exported struct lock * makes locking down to the lowest-level
file easy - you just feed it to the lock manager, and you're locking the
same lock the lowest-level fs uses. You thereby lock all vnodes stacked
over this one at the same time. Otherwise, you just call VOP_LOCK on the
layer below and then take your own lock.
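
In code, roughly (a sketch only; the v_vnlock field and the layer_node
guts are assumptions, and the lockmgr()/VOP_LOCK() signatures are from
memory):

/*
 * Sketch of a layer's lock op.  If the fs below exported its
 * struct lock * (assumed here to live in vp->v_vnlock), one
 * lockmgr() call locks the whole stack of vnodes at once.
 */
int
layer_lock(ap)
	struct vop_lock_args /* {
		struct vnode *a_vp;
		int a_flags;
		struct proc *a_p;
	} */ *ap;
{
	struct vnode *vp = ap->a_vp;
	struct layer_node *ln = VTOLAYER(vp);	/* hypothetical macro */
	int error;

	if (vp->v_vnlock != NULL)
		/* Same lock the lowest-level fs uses. */
		return (lockmgr(vp->v_vnlock, ap->a_flags,
		    &vp->v_interlock, ap->a_p));

	/* No exported lock: lock the layer below, then ourselves. */
	error = VOP_LOCK(ln->ln_lowervp, ap->a_flags, ap->a_p);
	if (error)
		return (error);
	return (lockmgr(&ln->ln_lock, ap->a_flags,
	    &vp->v_interlock, ap->a_p));
}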

> For a layer with an intermediate backing object, I'm prepared to
> declare it "special", and proxy the operation down to any inferior
> backing object (e.g. a union FS that adds files from two FS's
> together, rather than just directory entry lists).  I think such
> layers are the exception, not the rule.

Actually, isn't the only problem when you have vnode fan-in (union FS)?
i.e. a plain compressing layer, with exactly one vnode below it, shouldn't
introduce vnode locking problems.
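
To make the fan-in case concrete (field names roughly as in the 4.4BSD
union_node, abbreviated):

/*
 * A union node has two vnodes below it, so there is no single
 * lower struct lock * it could export as "the" lock for the
 * stack.  A compressing layer has exactly one lower vnode and
 * can pass the exported lock straight through.
 */
struct union_node {
	struct vnode	*un_uppervp;	/* upper (writable) fs */
	struct vnode	*un_lowervp;	/* lower (read-only) fs */
	/* ... */
};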

> I think that export policies are the realm of /etc/exports.
> 
> The problem with each FS implementing its own policy, is that this
> is another place that copyinstr() gets called, when it shouldn't.

Well, my thought was that, like with the current code, most every fs would
just call vfs_export() when it's presented with an export operation. But by
retaining the option of having the fs do its own thing, we can support
different export semantics if desired.
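
i.e. the default stays trivial; a sketch (the VFS_EXPORT op itself is
the assumption here, and the three-argument vfs_export() signature is
from memory):

/*
 * Sketch: the common-case export op just delegates to the
 * generic code, keeping the usual /etc/exports semantics.
 */
int
genfs_export(mp, nep, argp)
	struct mount *mp;
	struct netexport *nep;
	struct export_args *argp;
{
	return (vfs_export(mp, nep, argp));
}

An fs that wants different semantics supplies its own op instead of
this one.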

> Right.  The "covering" operation is not the same as the "marking as
> covered" operation.  Both need to be at the higher level.
> Not really.  Julian Elischer had code that mounted a /devfs under
> / automatically, before the user was ever allowed to see /.  As a
> result, the FS that you were left with was indistinguishable from
> what I describe.
> 
> The only real difference is that, as a translucent mount over /devfs,
> the one I describe would be capable of implementing persistent changes
> to the /devfs, as whiteouts.  I don't think this is really that
> desirable, but some people won't accept a devfs that doesn't have
> traditional persistence semantics (e.g. "chmod" vs. modifying a
> well known kernel data structure as an administrative operation).

That wouldn't be hard to do. :-)

> I guess the other difference is that you don't have to worry about
> large minor numbers when you are bringing up a new platform via
> NFS from an old platform that can't support large minors in its FS
> at all.  ;-).

True. :-)

> I would resolve this by passing a standard option to the mount code
> in user space.  For root mounts, a vnode is passed down.  For other
> mounts, the vnode is parsed and passed if the option is specified.

Or maybe add a field to vfsops. This field would say what the mount call
expects (I want a block device, a regular file, a directory, etc.), so it
fits. :-)

Also, if we leave it to userland, what happens if someone writes a
program which calls sys_mount() with something the fs doesn't expect? :-)
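
Something like this (entirely illustrative; vfs_srctype and its values
are made up):

/* What the fs expects as its mount source argument. */
enum vfs_srctype {
	VFS_SRC_NONE,		/* nothing at all (e.g. procfs) */
	VFS_SRC_BLKDEV,		/* block device (ffs, cd9660, ...) */
	VFS_SRC_REGFILE,	/* regular file */
	VFS_SRC_DIR		/* directory (null, umap, union) */
};

struct vfsops {
	/* ... existing ops ... */
	enum	vfs_srctype vfs_srctype;
};

With that, sys_mount() can do the namei() and the type check once,
centrally, rather than trusting whatever userland handed it.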

> I think that you will only be able to find rare examples of FS's
> that don't take device names as arguments.  But for those, you
> don't specify the option, and it gets "NULL", and whatever local
> options you specify.

I agree; I can't see a leaf fs not taking a device node. But layered fs's
will certainly want something else. :-)
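
e.g. the null layer just wants the pathname of the directory to stack
over; its mount argument is (shape as in 4.4BSD, from memory):

/*
 * Mount argument for the null layer: no device node, just the
 * lower directory.
 */
struct null_args {
	char	*target;	/* pathname of the lower fs */
};

So a mount_null invocation hands down a directory path, not a device.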

> The point is that, for FS's that can be both root and sub-root,
> the mount code doesn't have to make the decision, it can be punted
> to higher level code, in one place, where the code can be centrally
> maintained and kept from getting "stale" when things change out
> from under it.

True.

And with good comments we can catch the times when the centrally-located
code changes and breaks an assumption made by the fs. :-)

> > Except for a minor buglet with device nodes, stacking works in NetBSD at
> > present. :-)
> 
> Have you tried Heidemann's student's stacking layers?  There is one
> encryption, and one per-file compression with namespace hiding, that
> I think it would be hard pressed to keep up with.  But I'll give it
> the benefit of the doubt.  8-).

Nope. The problem is that while stacking (null, umap, and overlay fs's)
works, we don't have the coherency issues worked out so that upper layers
can cache data, i.e. so that the lower fs knows it has to ask the upper
layers to give pages back. :-) But multiple ls -lR's work fine. :-)
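
The missing piece is some way for the lower fs to call back up the
stack, very roughly (everything here is hypothetical: neither the
v_stackedabove list nor VOP_PURGEPAGES exists):

/*
 * Hypothetical shape of the missing coherency hook: before the
 * lower fs modifies a range, it asks each layer stacked above it
 * to give its cached pages back.
 */
int
lower_purge_upper(lowervp, off, len)
	struct vnode *lowervp;
	off_t off, len;
{
	struct vnode *vp;
	int error;

	for (vp = lowervp->v_stackedabove; vp != NULL;
	    vp = vp->v_stackedabove) {
		error = VOP_PURGEPAGES(vp, off, len);
		if (error)
			return (error);
	}
	return (0);	/* now safe to touch the lower pages */
}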

> > I agree it's ugly, but it has the advantage that it doesn't grow the
> > on-disk inode. A lot of folks have designs on the remaining 64 bits free.
> > :-)
> 
> Well, so long as we can resolve the issue for a long, long time;
> I plan on being around to have to put up with the bugs, if I can
> wrangle it... 8-).

:-)

I bet by then (559447 AD) we won't be using ffs, so the problem will be
moot. :-)

Take care,

Bill


