Date:      Thu, 7 Jan 1999 22:26:37 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        dillon@apollo.backplane.com (Matthew Dillon)
Cc:        tlambert@primenet.com, dyson@iquest.net, pfgiffun@bachue.usc.unal.edu.co, freebsd-hackers@FreeBSD.ORG
Subject:   Re: questions/problems with vm_fault() in Stable
Message-ID:  <199901072226.PAA11860@usr01.primenet.com>
In-Reply-To: <199901070403.UAA27395@apollo.backplane.com> from "Matthew Dillon" at Jan 6, 99 08:03:58 pm

> :>     The VFS layer should make no
> :>     assumptions whatsoever as to who attaches to it on the frontside,
> :>     and who it is attached to on the backside.
> :
> :Fine and dandy, if you can tell me the answers to the following
> :questions:
> :
> :1)	The system call layer makes VFS calls.  How can I stack a
> :	VFS *on top of* the system call layer?
> 
>     The system call layer is not a VFS layer. 

Right.  It's a VFS consumer.  It has a VFS interface on its bottom end.

Hence the whole consumer/provider issue.


> :2)	The NFS server VFS makes RPC calls.  How can I stack a
> :	VFS *under* the NFS server VFS?
> 
>     An NFS server in its current incarnation is not a VFS layer.  

Right.  It's also a VFS consumer and an RPC-transported NFS service
provider.

I actually meant to say "NFS client" here; the NFS client is a VFS
provider, but an RPC transport consumer.  It has a VFS interface
on its top end.


>     The point is that if you go around trying to assign special circumstances
>     to the head or tail of VFS layers, you wind up in the same boat where we 
>     are now - where VFS layers which were never designed to terminate on
>     anything other than a hard block device now, magically, are being
>     terminated on a soft block.  For example, UFS/FFS was never designed
>     to terminate on memory, much less swap-backed memory.  Then along
>     came MFS and suddenly there were (and still are) all sorts of problems.

I'd argue that MFS is an inappropriate use of the UFS code, since the
UFS code doesn't acknowledge the idea of shrinking block-backing
objects, and barely (with severe fragmentation based degradation)
recognizes growing block-backing objects.

So the idea that there are "suddenly" problems as a result of an
"unexpected but legitimate use" is predicated on the false
assumption that such use is legitimate.

I also believe that the canonically correct way to deal with MFS
issues (if you *insist* on using an inappropriate FS architecture)
is by providing a device interface to anonymous memory, and creating
the FS on that "device" instead.  Clean pages that haven't been
written at all don't need to be instanced, and you will have
(effectively) the same solution as the current MFS solution, but
without the current MFS problems.


> :I'm not trying to 'type' a VFS layer.  The problem is that some
> :idiot (who was right) thought it'd be faster to implement block
> :access in FS's that need block access, instead of creating a generic
> :"stream tail" that implemented the buffer cache interface.
> :
> :If they had done that, then the VOP_GETPAGES/VOP_PUTPAGES would
> :directly access the VOP_GETBLOCKRANGE/VOP_PUTBLOCKRANGE of the
> 
>     There's nothing wrong with block access - it is, after all, what most
>     VFS layers expect.  But to implement block access only through 
>     VOP_GETPAGES/PUTPAGES is insane.

OK.  Define another interface with a couple of functions and a smaller
footprint, and call the functions something else.  The point is that
the interaction footprint for UFS/FFS right now is over 150 kernel
functions.

Try linking *just* the FFS/UFS/"kern/vfs*" object files to see what
the kernel interface consumption exposure is for the FS.

It is *hardly* the ideal of portable, abstract code.

I don't care what you call the interface, so long as it meets the
criteria of (1) it has a small footprint, and (2) it's sufficiently
abstract so as to not render VFS layers non-portable between OSes
(for example, Rhapsody, VxWorks, FreeBSD, and Windows 98).


>     That's why the vm_object model
>     needs to be extended to encompass VFS layering - so the VM
>     system can use it's ability to cache pages to shortcut (when possible)
>     multiple VFS layers and to maintain cache coherency between VFS
>     layers, and in order to get the efficiency that cache coherency gives
>     you - much less memory waste.

There's *no* memory waste, if you don't instance incoherent copies
of pages in the first place.

The ability to "shortcut multiple VFS layers" is an artifact of
the non-collapse of stacks.  The UFS/FFS interaction is an example
of a correct collapse.  If the interfaces weren't skewed, NULLFS
would be another example, where multiple NULLFS instances collapsed
to *no* local vnode definitions, and one call boundary.  Instead,
you are suggesting that we instance vnodes in each NULLFS layer,
and that we complicate this by associating VM object aliases with
each layer instance to deal with the coherency issues that come
from adding VM object aliases in the first place, and *then* we
"shortcut" page references (and *only* page references, as pigs which
are more equal than other references) by referencing through the
alias.

This is rather an insane amount of useless complexity to get around
the coherency problems which wouldn't exist had you not introduced
vnodes in the null stacking layer case as placeholders for your
coherency mechanism, don't you think?


>     The GETPAGES/PUTPAGES model *cannot* maintain cache coherency across
>     VFS layers.  It doesn't work.  It has never worked.  That's the fraggin
>     problem!

Works on SunOS.  Works on Solaris.  If you have a source license,
or sign non-disclosure, John Heidemann will show you the code.


> :OK.  You are considering the case where I have two vnodes pointing
> :to the same page, and I invalidate the page in the underlying vnode,
> :and asking "how do I make the reference in the upper vnode go away?",
> :right?
> :
> :The way you "make the reference in the upper vnode go away" is by
> :not putting a blessed reference there in the first place.  Problem
> :solved.  No coherency problem because the problem page is not
> :cached in two places.
> 
>     Uh, I think you missed the point.  What you are basically saying is:
>     "I don't want cache coherency".... because that is what you get.  That
>     is, in fact, what we have now, and it means that things like MFS
>     waste a whole lot of memory double-caching pages and that it is not
>     possible to span VFS layers across a network in any meaningful way.

No.  What I'm saying is that I don't want to allow things that
result in coherency tracking problems in the first place.

It's like the aluminum plaques in an Alzheimer's sufferer: there's
no proven connection between aluminum consumption and these plaques,
but it's unlikely that the human body is capable of transmuting
potassium into aluminum, through some magical process, in the
absence of dietary aluminum that could be used by the body in
their construction.

If you don't introduce the building blocks for the problem, then the
problem can't be built.


> :The page's validity is known by whether or not its valid bit is
> :set.  What you *do* have to do is go through the routines for
> :VOP_GETPAGES/VOP_PUTPAGES if you want to change the status of a
> 
>     This doesn't work if the VFS layering traverses a network.  Furthermore,
>     it is *extremely* inefficient.

So it traverses a network.  The issue of latency is not going to go
away if you make something that's 1/1,000th of the latency into
1/10,000th of the latency.  You're optimizing the wrong thing, if
network transport is your concern.

>     Hell, it doesn't even maintain cache
>     coherency even if you DO use VOP_GETPAGES/PUTPAGES.

Not true.  Add the address of the memory to be filled as an argument
(it's there already), and the data from the remote end can be
marshalled via the argument descriptor.


>     Now, Terry, if you are arguing that we don't need cache coherency, then
>     ok ... but if you are arguing that we should have cache coherency, you
>     need to reexamine the plate.

If there's only one object, there's not a coherency problem.  If you
make more than one object, then you need explicit instead of implicit
coherency.  If everything you need is in the descriptor, however,
then it's marshallable over *any* interface.  Network, or user/kernel
proxy for a user-space development environment.


>     I, and John, are arguing that a lack of cache coherency (covering both
>     data pages and filesystem ops) is a serious stumbling block that needs
>     to be addressed.  Big time.

I agree with you.  We just disagree about the method of addressing it.
I think that the method should be similar to what Heidemann has already
demonstrated as working in SunOS and Solaris.  I don't think that you
need to add a huge amount of parallel complexity to the VFS VM object
interaction; I think, instead, you need to consider simplifying the
VFS vnode interaction model so that VM complications are unnecessary.

Yeah, VM aliases could be useful somewhere, but I think the VFS system
is one place where they are further down an already wrong road.  Think
about a NULLFS layer, and how to make it truly NULL; that's the minimal
set.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
