Date:      Wed, 6 Jan 1999 19:03:58 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        dyson@iquest.net
Cc:        tlambert@primenet.com, dillon@apollo.backplane.com, pfgiffun@bachue.usc.unal.edu.co, freebsd-hackers@FreeBSD.ORG
Subject:   Re: questions/problems with vm_fault() in Stable
Message-ID:  <199901061905.MAA09634@usr05.primenet.com>
In-Reply-To: <199901060255.VAA02192@y.dyson.net> from "John S. Dyson" at Jan 5, 99 09:55:45 pm

> > The distinction I'm trying to make is between VFS layers that are VFS
> > consumers and providers, and VFS layers that are VFS providers,
> > but VM system consumers.  VM system consumers are "local media" FS's.
>
> The VM system is a provider :-).  The VFS provides only the file naming
> abstraction.

I know that.  But there are three consumers of VFS's.  Only the
first one works correctly in the -current code:

	system calls	NFS server	VFS
	------------	----------	---
	VFS		VFS		VFS

In the provider case, where the VFS is itself the consumer, you have
three cases as well:

	VFS		VFS		VFS
	---		----------	---
	VM		NFS client	VFS

Again, only the first one works correctly in the -current code.


As I said before, I'm trying to distinguish the third case in both
situations: a VFS provider designed to consume VFS's instead of
local resources.


> > The problem is in ensuring object coherency.  Here we have this
> > enormous investment in a unified VM and buffer cache, which is mostly
> > just a way for us to avoid the coherency complications that having
> > separate VM and buffer cache cause, and we're talking about doing
> > something that will reintroduce the complications.
>
> Please look at the ideas again -- using the VM mechanisms for layering
> eliminates the IMPOSSIBILITY for a VFS layering scheme (as it currently
> is in any fashion) to provide coherence.  A bidirectional object oriented
> scheme can easily provide coherence.  Tell me, what protocol exists
> (or can exist) in the current VFS framework to provide coherence for
> mmap, or file I/O with different layers?  Answer: none.  Thinking of
> things as files begs the fact that files are all a specific instance
> of a memory object, let's forget about "files" and local filesystems.
> (I guess that it is possible for a VFS to provide coherence, at the
> expense of continual "sync" operations -- that is just wrong.)

Thinking in terms of a "protocol" is already thinking in terms
of having to maintain coherency, instead of the coherency being
implicit.

It's wrong to access pages off a vnode directly without using a
VOP_GETPAGES/VOP_PUTPAGES as an accessor function.  If, on the
other hand, all of the references are appropriately encapsulated
behind an abstract interface like VOP_GETPAGES/VOP_PUTPAGES,
then you don't have to hang the pages off of the objects themselves.


> Think in terms of "stuff" or "objects."  The only way that coherence
> can work in the current framework is in the non-layered and/or local
> case.  If you get much beyond the typical filesystem structure, then
> problems start arising, and hacks tend to be the expedient solution.
> Rather than fight an existing structure, it seems that the correct
> thing is to rethink the structure that can architecturally provide
> the features needed and desired -- without hackery and special rules.

I think that alias references are hackery of the type you are
objecting to.

I have no doubt that between you, Matt, and David, they can be made
to work.  But they are an inelegant increase in complexity that is
not merited by the situation.

I think that the correct way to deal with this is to acknowledge
that there will be stacking layers that only provide semantics;
a directory hierarchy abstraction, or an ACL mechanism, or a
quota mechanism, etc.  If you look at the FFS/UFS layers
interaction, you see this.  The vnodes for directories, which are
containers for sequences of blocks, do not originate in a different
layer than the vnodes for files (also containers for sequences of
blocks).

There are VFS stacking layers you could envision that would need
anonymous virtual memory.  A block-by-block compression layer
(a file compression layer would be solidly in the previous list),
a cryptographic layer, etc.  Any layer that needs to modify the
data content representation.  Hell, one of my hobby horses is a
layer that you'd stack on top of an NFS client to do ISO-8859-X
to Unicode conversion so that a Unicode capable system could
consume NFS resources from an 8-bit legacy system.

But these anonymous memory aware VFS stacking layers are the
exceptions, just as a VFS layer that interacts with the VM
abstraction of a physical device is an exception.  Most VFS stacking
layers that you can envision are semantic only.  And for a semantic
layer it just doesn't make sense to instance an alias object
instead of accessing the real object.

The problem here is the BS of the default VNOPS supporting the
generic getpage/putpage code, and of the NFS code using the old
bmap mechanism.  The generic code needs to go away.  The bmap
references need to go away.


The way to access pages is to call the vnode's getpages/putpages,
and if it is a VFS layer where you are suggesting an alias object,
the default implementation is to fall through to the same functions
for the vnode backing the vnode you are referencing, until the
operation gets down to the vnode off which the pages are actually
hung.
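
To make the fall-through concrete, here is a minimal user-space sketch.
It is a toy model, not the FreeBSD struct vnode or the real VOP_GETPAGES
signature; every toy_* name is made up for illustration.  A semantic-only
layer forwards the request to the vnode it is stacked on, and only the
leaf hands out pages.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: NOT the kernel's struct vnode or VOP_GETPAGES. */
struct toy_vnode;

typedef int (*toy_getpages_fn)(struct toy_vnode *vp, size_t pageno,
    const char **pagep);

struct toy_vnode {
	toy_getpages_fn   getpages;  /* this layer's accessor */
	struct toy_vnode *lower;     /* vnode we are stacked on, or NULL */
	const char      **pages;     /* pages hung off this vnode (leaf only) */
	size_t            npages;
};

/* Leaf: the pages really are hung off this vnode. */
int
toy_leaf_getpages(struct toy_vnode *vp, size_t pageno, const char **pagep)
{
	if (pageno >= vp->npages)
		return (-1);
	*pagep = vp->pages[pageno];
	return (0);
}

/* Semantic-only layer: fall through to the same op on the lower vnode. */
int
toy_passthru_getpages(struct toy_vnode *vp, size_t pageno,
    const char **pagep)
{
	assert(vp->lower != NULL);
	return (vp->lower->getpages(vp->lower, pageno, pagep));
}

/* The only entry point a consumer uses; it never touches vp->pages. */
int
toy_getpages(struct toy_vnode *vp, size_t pageno, const char **pagep)
{
	return (vp->getpages(vp, pageno, pagep));
}
```

Stack two pass-through layers on a leaf and call toy_getpages() on the
top one: the request lands on the leaf, and no layer in between holds a
second copy of the page that would have to be kept coherent.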


If you need to support a "default" generic mechanism, then there
has to be a VOP that the mechanism can call to ask for the vnode
object off which the pages are hung, a VOP_GETBACKINGVP, which
each stacking layer that doesn't contain the backing vnode itself
implements.  If we did the "default VNOPS" approach, we would
implement it as an underlying stacked vnode dereference, and the
FS's that had vnodes that had pages hung off them would return
the vp that they were passed.
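
A rough sketch of what I mean, again as a user-space toy.
VOP_GETBACKINGVP does not exist in the tree; the toy_* names, the struct
layout and the calling convention here are all assumptions made up for
the example.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: a hypothetical "get backing vnode" operation. */
struct toy_vnode;

typedef struct toy_vnode *(*toy_getbackingvp_fn)(struct toy_vnode *vp);

struct toy_vnode {
	toy_getbackingvp_fn getbackingvp;  /* per-layer implementation */
	struct toy_vnode   *lower;         /* stacked-on vnode, or NULL */
};

/* FS whose vnodes have pages hung off them: return the vp passed in. */
struct toy_vnode *
toy_leaf_getbackingvp(struct toy_vnode *vp)
{
	return (vp);
}

/* "Default VNOPS" version: dereference the underlying stacked vnode. */
struct toy_vnode *
toy_default_getbackingvp(struct toy_vnode *vp)
{
	if (vp->lower == NULL)
		return (NULL);
	return (vp->lower->getbackingvp(vp->lower));
}

/* What a generic mechanism would call before touching any pages. */
struct toy_vnode *
toy_vop_getbackingvp(struct toy_vnode *vp)
{
	assert(vp != NULL);
	return (vp->getbackingvp(vp));
}
```

Only the leaf FS has to write any code; every semantic layer in between
inherits the default.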

If you want to call this a "protocol", go ahead, but it doesn't
need any coherency notification functions to implement, so it
really doesn't meet the definition.


> > Maybe I'm coming in in the middle of a long private discussion (it
> > feels like it, and Julian has hinted that I am), but I think that
> > it's necessary to fix some of the things we all know are broken
> > before we try to get tricky... my 2 cents.
>
> The problem with the VFS is that it doesn't appear to be possible to
> fully fix it given the goal of coherent layering.  There is no sane way
> that a lower layer invalidation can properly propagate upwards to
> other consumers without treating the data (or objects) with a proper
> invalidation protocol.

You don't need this.  An invalidation is a return of a NULL from
a VOP_GETBACKINGVP call, if an invalidation exists at all (the only
case where it could exist would be the VM system using the object
itself as a backing store, and the object being accessed externally
to the machine where it was being accessed, e.g., an NFS server
failure.  Even so, what we are talking about is a kludge to deal with
the inadequacies of the error channel in the fault path, not some
intrinsic need.  The "deadfs" crap falls into this same category).
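
As a sketch of how little machinery that takes (a user-space toy again;
the toy_* names, the "dead" flag and the choice of EIO are assumptions
for illustration, not existing kernel code): the generic path asks for
the backing vnode and treats a NULL as the whole of the invalidation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model only: "dead" stands in for e.g. an NFS server failure. */
struct toy_vnode {
	struct toy_vnode *lower;  /* stacked-on vnode, NULL for the leaf */
	int               dead;   /* nonzero once the backing store is gone */
};

/* Hypothetical backing-vnode lookup: walk down, NULL if invalidated. */
struct toy_vnode *
toy_getbackingvp(struct toy_vnode *vp)
{
	while (vp != NULL && vp->lower != NULL)
		vp = vp->lower;
	if (vp == NULL || vp->dead)
		return (NULL);
	return (vp);
}

/* Generic fault-path stand-in: no notification callbacks anywhere. */
int
toy_generic_fault(struct toy_vnode *vp, struct toy_vnode **backingp)
{
	struct toy_vnode *bvp;

	assert(backingp != NULL);
	bvp = toy_getbackingvp(vp);
	if (bvp == NULL)
		return (EIO);   /* the invalidation, reported as an error */
	*backingp = bvp;
	return (0);
}
```

Nothing propagates upward here; the upper layers find out the next time
they ask, which is the only time it matters to them.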

I think the problem that you are trying to solve is implicit in
the creation of aliases.  If you don't create aliases, then the
problem melts away.


> Any of the problems with the existing VFS/VM scheme have been with the
> intricacies of dealing with VFS special cases, and dealing with the
> I/O abstraction of buffers as a cache.

Well, this is typically where I get all worked up.  I think that the
existence of special cases at all is an architectural error, and
therefore that attempting to deal with them instead of fixing the
architecture is merely a compounding of the original error.


> Forget "files" and think
> "blobs of memory."  Once the notion of file is forgotten, then shadowing,
> invalidation and aliasing of memory become very obvious...

Which is why I think the idea that a vnode and a vm object should
be synonymous is wrong.  Kirk called the vnode "the structure that
took over the kernel".  It's time to curb that dog.


> I wouldn't really care if the new scheme would be called "VFS", but forget
> vnodes and bp's...  They are unnecessary and undesirable abstractions except
> at the lowest layers (the leaf filesystems that have those legacy concepts.)

A vnode is a container object through which an accessor function
can get pages.  It may contain pages itself, OR it may contain a
reference to another vnode container.  Eventually, you get down to
a leaf vnode that contains pages.

Mirroring the vnode stacking relationship with an alias stacking
relationship is a coherency nightmare.  It shouldn't be done.

A vnode containing another vnode *is* a necessary abstraction.  It
defines a semantic boundary.  That's what it's for, that's the intent
of the stacking vnode architecture.  To define semantic boundaries.

The few cases where a vnode on top of a stacking boundary needs to
manipulate the data itself (e.g., running a one-time-pad
transformation on encrypted data) are exceptions (worse than
exceptions: you can't permit the unencrypted data to be written to
swap as cleartext).

Even in these cases, the primary function of the act of stacking the
vnodes is *not* to obtain anonymous pages to coherency shadow the
underlying contents; it's to define a semantic boundary.  The
anonymous pages are an implementation detail, nothing more.

It's absolutely imperative to the value of VFS stacking to preserve
the idea of having an object to encapsulate semantic boundaries.  The
entire point of VFS stacking is to enable stacking of semantics.


I think before any sweeping changes go into this area, that it
should be a requirement that those proposing the changes read
John Heidemann's master's thesis, and ensure that the design goals
of the stacking architecture are not compromised in the change
process, as they were in the original 4.4BSD integration of
John's code, and as they continue to be in the various *BSD's.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
