Date:      Fri, 8 Jan 1999 02:11:30 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        dillon@apollo.backplane.com (Matthew Dillon)
Cc:        tlambert@primenet.com, dyson@iquest.net, pfgiffun@bachue.usc.unal.edu.co, freebsd-hackers@FreeBSD.ORG
Subject:   Re: questions/problems with vm_fault() in Stable
Message-ID:  <199901080211.TAA00869@usr01.primenet.com>
In-Reply-To: <199901072306.PAA35328@apollo.backplane.com> from "Matthew Dillon" at Jan 7, 99 03:06:21 pm

OK, now on to the non-MFS alias issues:


> :There's *no* memory waste, if you don't instance incoherent copies
> :of pages in the first place.
> 
>     You are ignoring the point.  I mmap() a file.  I mmap() the block device
>     underlying the filesystem the file resides on.  I start accessing pages.
>     Poof... no coherency, plus lots of wasted memory.

No.  You mmap the file.  This instances a vnode, and a VM object using
the vnode as a swap store.

The VM object using the vnode as a swap store makes VOP_GETPAGES and
VOP_PUTPAGES calls through the VFS interface in order to satisfy
page read and write faults, respectively.

  For a 386, which does not support write faulting in kernel mode,
  this is a problem.  You have to unmap the page, and handle the
  translation lookaside in the fault handler (welcome to the 386
  complications to the copyin/copyout routines since time immemorial).
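
A minimal sketch of the fault path described above, assuming a toy
operations vector: the VM object holds only a pointer to its backing
vnode, and faults are resolved by calling through that vnode's
getpages/putpages entries.  Every name here (toy_vnode, toy_vm_fault,
memfs_*) is hypothetical and stands in for the much larger real
interfaces; it is an illustration, not the FreeBSD implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

struct toy_vnode;

/* Hypothetical, cut-down operations vector; the real VOP table is larger. */
struct toy_vnodeops {
	int (*getpages)(struct toy_vnode *vp, size_t pgno, unsigned char *page);
	int (*putpages)(struct toy_vnode *vp, size_t pgno,
	    const unsigned char *page);
};

struct toy_vnode {
	const struct toy_vnodeops *ops;
	unsigned char *blocks;		/* stand-in for the file's storage */
	size_t npages;
};

/* A VM object that uses a vnode as its backing ("swap") store. */
struct toy_vm_object {
	struct toy_vnode *backing;
};

enum toy_fault { TOY_FAULT_READ, TOY_FAULT_WRITE };

/* Memory-backed toy FS: page in from the vnode's storage. */
int
memfs_getpages(struct toy_vnode *vp, size_t pgno, unsigned char *page)
{
	if (pgno >= vp->npages)
		return (EINVAL);
	memcpy(page, vp->blocks + pgno * TOY_PAGE_SIZE, TOY_PAGE_SIZE);
	return (0);
}

/* Memory-backed toy FS: page out to the vnode's storage. */
int
memfs_putpages(struct toy_vnode *vp, size_t pgno, const unsigned char *page)
{
	if (pgno >= vp->npages)
		return (EINVAL);
	memcpy(vp->blocks + pgno * TOY_PAGE_SIZE, page, TOY_PAGE_SIZE);
	return (0);
}

const struct toy_vnodeops memfs_ops = { memfs_getpages, memfs_putpages };

/* The object never touches storage itself; it asks the backing vnode. */
int
toy_vm_fault(struct toy_vm_object *obj, enum toy_fault type, size_t pgno,
    unsigned char *page)
{
	struct toy_vnode *vp = obj->backing;

	if (type == TOY_FAULT_READ)
		return (vp->ops->getpages(vp, pgno, page));
	return (vp->ops->putpages(vp, pgno, page));
}

/* Round-trip one page through the object; returns 0 on success. */
int
toy_fault_demo(void)
{
	static unsigned char blocks[2 * TOY_PAGE_SIZE];
	static unsigned char in[TOY_PAGE_SIZE], out[TOY_PAGE_SIZE];
	struct toy_vnode vn = { &memfs_ops, blocks, 2 };
	struct toy_vm_object obj = { &vn };

	memset(in, 0xAB, sizeof(in));
	assert(toy_vm_fault(&obj, TOY_FAULT_WRITE, 1, in) == 0);
	assert(toy_vm_fault(&obj, TOY_FAULT_READ, 1, out) == 0);
	return (memcmp(in, out, sizeof(in)));
}
```

Swapping memfs_ops for another vector changes where the pages come
from without the VM object knowing, which is the property the
argument relies on.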

The underlying VFS makes VM calls, and these go against the real
(VM) backing object.  It would be a mistake to try and tunnel the
VM through the VFS to alias these objects to the same object.  For
one thing, they may not be on the same machine.

I don't really understand how you expect to be able to use a file
on an NFS as a swap store for a program image without a separate
local and remote copy of the object.


>     You are assuming that VFS devices can be collapsed together such
>     that the inner layers are not independently accessible, and thus
>     cannot be independently accessed without going through the upper
>     VFS layers.  This is an extremely restrictive view which fails
>     utterly in a number of already existing cases and fails even
>     worse when you try to extend the model across a network.

No, I'm not allowing aliases in the case of adjacent local layers.

Consider the case of an ACL VFS stacking layer stacked on top of
an NFS client and under a system call layer.

If you expose the ACL layer separately from the layer on which it is
mounted, then you can access the file in two places in the directory
hierarchy.

The underlying NFS has vnodes that point to NFSnodes.  These have VM
objects associated with them.

The ACL FS stacks on top of this VFS.  It *also* has vnodes.  These
vnodes are used as abstract credential holders, and point to the
underlying vnode in the NFS.

These vnodes DO *NOT* have VM objects associated with them.

When someone reads or writes a page in the buffer cache from a
user space program, then the fault results in a VOP_GETPAGES or
VOP_PUTPAGES, respectively.

For the NFS, the information is directly accessed (with lease
controls -- otherwise known as opportunistic locks -- for cache
coherency) from the underlying vnode's VM objects.

For the ACL FS stacked on the NFS, the VOP_GETPAGES and VOP_PUTPAGES
*ARE PASSED DOWN*.  The entire purpose of the ACLFS is to impose
access semantics on the underlying vnode objects.  The way it does
this is by a namespace escape of an access control file that it
itself accesses by way of direct calls to the underlying NFS.  A
(really) hidden file, in other words.
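
The pass-down behaviour can be sketched as follows, assuming a toy
vnode with a "lower" pointer.  The ACL check is reduced to a single
permitted uid (the text above keeps the ACL in a hidden file instead),
and all names are hypothetical.  The point is only that the upper
vnode carries no VM object, so there is never a second copy of the
page to keep coherent.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

struct toy_vm_object {
	unsigned char page[TOY_PAGE_SIZE];	/* the one real copy */
};

struct toy_vnode {
	struct toy_vnode *lower;	/* next layer down; NULL at the bottom */
	struct toy_vm_object *object;	/* non-NULL only on the bottom vnode */
	int acl_uid;			/* sole permitted uid; -1 = no check */
};

/*
 * Impose this layer's access semantics, then either satisfy the
 * request from our own VM object or pass it down the stack.
 */
int
toy_getpages(struct toy_vnode *vp, int uid, unsigned char *out)
{
	if (vp->acl_uid >= 0 && uid != vp->acl_uid)
		return (EACCES);
	if (vp->object != NULL) {
		memcpy(out, vp->object->page, TOY_PAGE_SIZE);
		return (0);
	}
	if (vp->lower == NULL)
		return (EINVAL);	/* malformed stack: no object anywhere */
	return (toy_getpages(vp->lower, uid, out));
}

/* ACL layer over an "NFS" vnode; returns 0 if the behaviour is as expected. */
int
toy_acl_demo(void)
{
	static struct toy_vm_object obj;
	static unsigned char buf[TOY_PAGE_SIZE];
	struct toy_vnode nfs = { NULL, &obj, -1 };
	struct toy_vnode acl = { &nfs, NULL, 100 };

	memset(obj.page, 0x5A, sizeof(obj.page));
	assert(toy_getpages(&acl, 200, buf) == EACCES);	/* denied above */
	assert(toy_getpages(&acl, 100, buf) == 0);	/* passed down */
	return (buf[0] == 0x5A ? 0 : 1);
}
```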


If we stack a QUOTA FS on top of this, and expose it in another
location in the directory hierarchy, the same rules apply.  It
probably even uses a similar namespace escape to hide a quota
file at the root of the directory hierarchy for the mounted FS.


If we stack a RESOURCE FS that turns file creations into directory
creations, and supports VOP_LOOKUP based inherited flagging
semantics on top of this, the same rules apply, but it's a little
more complex.  When a file is created, it actually creates a
directory with the file name, with an internal file named something
like "filedata" (leaving the 4 character upper case namespace for
resources in the "resource fork", or the "_xxx" namespace with an
underscore prefix for OS/2 extended attributes, for the file).
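
As a sketch of that name mapping, assuming hypothetical helper names
(resfs_data_path, resfs_resource_path) and "ICON" purely as an
illustrative 4 character resource type: the visible file "report"
becomes the directory "report", with the data fork in
"report/filedata" and a resource in "report/ICON".  The underscore
namespace is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Build the path of the data fork for a visible file name. */
int
resfs_data_path(const char *file, char *buf, size_t len)
{
	int n;

	assert(file != NULL && buf != NULL);
	n = snprintf(buf, len, "%s/filedata", file);
	return ((n < 0 || (size_t)n >= len) ? -1 : 0);	/* -1: truncated */
}

/* Resource names are exactly four upper case characters. */
int
resfs_is_resource_type(const char *type)
{
	size_t i;

	assert(type != NULL);
	if (strlen(type) != 4)
		return (0);
	for (i = 0; i < 4; i++)
		if (!isupper((unsigned char)type[i]))
			return (0);
	return (1);
}

/* Build the path of one resource; rejects names outside the namespace. */
int
resfs_resource_path(const char *file, const char *type, char *buf, size_t len)
{
	int n;

	assert(file != NULL && buf != NULL);
	if (!resfs_is_resource_type(type))
		return (-1);
	n = snprintf(buf, len, "%s/%s", file, type);
	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}
```

Because "filedata" is lower case and eight characters long, it can
never collide with a resource name under this rule.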


In each case, however, the requests to obtain the page for reading
and writing go down to the vnode object off of which it is hung.


OK, that's most FS's for which metadata is tunneled; that's pretty
easy to understand.  And in the last case, it doesn't really make
sense to expose the FS hierarchy underlying the RESOURCE FS
layer, because it exposes you to non-cache based namespace coherency
problems (e.g., how do you handle someone CD'ing into a directory
that's a file, and removing the "filedata" file containing the file
data fork, without also removing the associated resource fork or
extended attribute?  ... You can't, and the VM cache coherency
protocol you propose won't handle this non-VM coherency problem,
either).



Part II: The case where cache coherency is a real issue

So now we build a cryptographic FS.  It uses any CDROM as a one
time pad, and does duplicate elimination on the CDROM data so
that runs of identical data, especially 0's, are not adjacent
to allow statistical analysis.  At the same time, it XOR's in
a repeating password so that pattern data is not differentially
analyzable (we could fix this by using peephole techniques to
machine-eliminate repeating patterns, and deal with common phrase
elimination (English speakers' CDROMs probably contain English text,
etc.), but we're going to be lazy about the implementation).


The OTPFS (One Time Pad FS) has vnode objects that stack on top
of the underlying vnode objects.


Now we have two problems:

(1)	The coherency issue can not be dealt with via aliases
	because the data in the decrypted form of a page can
	not be used.  In other words, there is no direct alias
	between one page and the other.  They are *procedurally*
	related, but not content-identical (we used an OTP to get
	around the issue of N:M byte relationships where N != M).

	Nevertheless, we must deal with read and write faults,
	and update the upper pages on the former and the lower
	pages on the latter.

(2)	Because this is sensitive data, it should not be written
	to persistent storage.  That means that the anonymous
	pages can't be that anonymous.  The pages can't be backed
	by persistent storage, only memory, and only memory that
	is protected from view by other processes.  So you can't
	use a file or swap as the backing store for any dirty
	unencrypted pages; instead, you must reencrypt dirty
	pages and store them out.  Since you are using an OTP, if
	you used the same offset on the CDROM to do this, then you
	would compromise the pad.  Therefore, you must store metadata
	with the offset into the pad, as well.  Probably, you want
	to *not* write dirty data for as long as possible, since
	each write of a page will eat another 4k of your pad.

Neither of these is amenable to the standard vmobject/vmobject alias
solution.  The only way to deal with the fault issue is procedurally.
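
A minimal sketch of that procedural relationship, under stated
assumptions: the pad is a plain byte array rather than a CDROM, the
duplicate elimination step is omitted, and every name (otp_state,
otp_meta, otp_putpage, otp_getpage) is hypothetical.  Each page-out
consumes a fresh 4k of pad and records the offset in per-page
metadata; page-in repeats the same XOR at the recorded offset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OTP_PAGE 4096

struct otp_meta {
	size_t pad_off;			/* where in the pad this page was keyed */
};

struct otp_state {
	const unsigned char *pad;	/* stand-in for the CDROM contents */
	size_t pad_len;
	size_t next_off;		/* first pad byte not yet consumed */
	const unsigned char *pw;	/* repeating password */
	size_t pw_len;
};

/* XOR with pad and repeating password; the same call encrypts and decrypts. */
void
otp_xform(const struct otp_state *st, size_t pad_off,
    const unsigned char *in, unsigned char *out)
{
	size_t i;

	for (i = 0; i < OTP_PAGE; i++)
		out[i] = in[i] ^ st->pad[pad_off + i] ^ st->pw[i % st->pw_len];
}

/* Page out: never reuse pad, so take the next unused 4k and remember it. */
int
otp_putpage(struct otp_state *st, struct otp_meta *m,
    const unsigned char *plain, unsigned char *cipher)
{
	if (st->pad_len - st->next_off < OTP_PAGE)
		return (-1);		/* pad exhausted */
	m->pad_off = st->next_off;
	st->next_off += OTP_PAGE;
	otp_xform(st, m->pad_off, plain, cipher);
	return (0);
}

/* Page in: undo the transformation at the recorded pad offset. */
void
otp_getpage(const struct otp_state *st, const struct otp_meta *m,
    const unsigned char *cipher, unsigned char *plain)
{
	otp_xform(st, m->pad_off, cipher, plain);
}

/* Write one page twice against a two-page pad; returns 0 if as expected. */
int
otp_demo(void)
{
	static unsigned char pad[2 * OTP_PAGE];
	static unsigned char plain[OTP_PAGE], c1[OTP_PAGE], c2[OTP_PAGE];
	static unsigned char back[OTP_PAGE];
	static const unsigned char pw[] = "secret";
	struct otp_state st = { pad, sizeof(pad), 0, pw, sizeof(pw) - 1 };
	struct otp_meta m1, m2, m3;
	size_t i;

	for (i = 0; i < sizeof(pad); i++)
		pad[i] = (unsigned char)(i + i / OTP_PAGE);
	memset(plain, 'A', sizeof(plain));
	assert(otp_putpage(&st, &m1, plain, c1) == 0);
	assert(otp_putpage(&st, &m2, plain, c2) == 0);
	assert(m1.pad_off != m2.pad_off);	/* fresh pad per write */
	assert(memcmp(c1, c2, OTP_PAGE) != 0);	/* same data, different bytes */
	otp_getpage(&st, &m2, c2, back);
	assert(memcmp(back, plain, OTP_PAGE) == 0);
	return (otp_putpage(&st, &m3, plain, c1) == -1 ? 0 : 1);
}
```

The upper and lower pages are never byte-identical, so there is
nothing for a page alias to point at; the relation exists only as the
call to otp_xform().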

If an access is done at the intermediate layer (say because you don't
want to send cleartext over the net between a remote accessor and the
machine on which the data is stored), then the "getpages" needs to
operate on the underlying object.  In other words, a cached copy must
be invalidated.

Luckily, we do not keep a true cached copy.  We merely need to check
the underlying page against the upper page for timestamp when a
getpages occurs on the upper level page.  If the lower level page
(and pad offset) has changed out from under it, then the page is
updated from the lower level page.  Again, we see that we must
provide procedural access for page contents.  The cached copy is
dependent on a lower page reference, even if it does not result in
a pad translation.
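
The check can be sketched like this, assuming a per-page stamp that
the lower layer bumps on every change.  The structures and the
one-byte XOR "translation" are hypothetical stand-ins for the real
pad translation; only the compare-then-refresh logic is the point.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE 64	/* small page, to keep the sketch short */

struct lower_page {
	unsigned long stamp;		/* bumped whenever the page changes */
	size_t pad_off;			/* pad offset it was written with */
	unsigned char data[TOY_PAGE];	/* encrypted contents */
};

struct upper_page {
	int valid;
	unsigned long stamp;		/* lower stamp seen at last refresh */
	size_t pad_off;			/* lower pad offset seen at last refresh */
	unsigned char data[TOY_PAGE];	/* translated contents */
};

/* Stand-in for the pad translation: XOR with a byte derived from pad_off. */
void
toy_translate(const struct lower_page *lp, unsigned char *out)
{
	size_t i;

	for (i = 0; i < TOY_PAGE; i++)
		out[i] = lp->data[i] ^ (unsigned char)(lp->pad_off + 1);
}

/* A write arriving at the lower layer, e.g. from a remote accessor. */
void
lower_write(struct lower_page *lp, unsigned char fill, size_t new_pad_off)
{
	size_t i;

	assert(lp != NULL);
	for (i = 0; i < TOY_PAGE; i++)
		lp->data[i] = fill;
	lp->pad_off = new_pad_off;
	lp->stamp++;
}

/*
 * getpages on the upper page: always reference the lower page, but
 * only retranslate if it changed.  Returns 1 if the upper page was
 * rebuilt, 0 if it was still current.
 */
int
upper_getpage(struct upper_page *up, const struct lower_page *lp)
{
	assert(up != NULL && lp != NULL);
	if (up->valid && up->stamp == lp->stamp && up->pad_off == lp->pad_off)
		return (0);
	toy_translate(lp, up->data);
	up->stamp = lp->stamp;
	up->pad_off = lp->pad_off;
	up->valid = 1;
	return (1);
}
```

Note that even the "hit" path reads the lower page's stamp, which is
the lower page reference the paragraph above says cannot be avoided.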

We could simplify this considerably by requiring a pad translation
for each unencrypted page reference.  We don't do this for two
reasons:  first, because it's overhead; second, because we need to
prove to ourselves that we can handle the coherency issue for multiple
accessors at multiple levels, without resorting to VM page aliases.


Part III: Where do we need aliases?

Aliases would be useful if we wanted to tunnel page mappings between
an underlying VM object and an upper level VM object using the FS
which owns the underlying object as a file store object for
potentially non-adjacent physical blocks.

In other words, it's a VN device (file as block device) optimization
that isn't strictly necessary, and, given the potential non-adjacency
of the underlying blocks, probably not a useful one.  The mapping
maintenance overhead will drive the cost up to the point that the
optimization has no value in all but special (contiguous) cases.

Aliases would also be useful if one vnode directly represented the
blocks of some underlying vnode.

But the savings here are minimal; it would be trivial to cache the
FINALVP (the vnode pointer that has the underlying VM object
association) as a vnode pointer instead of a VM object alias.
Dereferencing a vnode to get an alias object, and dereferencing
the alias object to get the actual VM object, is no less expensive
than dereferencing a vnode to get a vnode, and then dereferencing
that vnode to get the actual VM object.
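
A sketch of caching the FINALVP as a plain vnode pointer, assuming a
toy vnode layout; the names (toy_vnode, finalvp, toy_hops) are
hypothetical, and the counter exists only to show that the stack is
walked once.  Invalidating the cached pointer when the stack is
rearranged is not handled here.

```c
#include <assert.h>
#include <stddef.h>

struct toy_vm_object {
	int id;
};

struct toy_vnode {
	struct toy_vnode *lower;	/* next layer down; NULL at the bottom */
	struct toy_vm_object *object;	/* non-NULL only on the bottom vnode */
	struct toy_vnode *finalvp;	/* cached; NULL until first lookup */
};

int toy_hops;	/* counts lower-pointer traversals, for illustration only */

/* Walk down the stack to the vnode that actually owns the VM object. */
struct toy_vnode *
toy_find_finalvp(struct toy_vnode *vp)
{
	assert(vp != NULL);
	while (vp->object == NULL && vp->lower != NULL) {
		vp = vp->lower;
		toy_hops++;
	}
	return (vp);
}

/* vnode -> vnode -> object: two dereferences once the cache is warm. */
struct toy_vm_object *
toy_vnode_object(struct toy_vnode *vp)
{
	assert(vp != NULL);
	if (vp->finalvp == NULL)
		vp->finalvp = toy_find_finalvp(vp);
	return (vp->finalvp->object);
}
```

With a QUOTA layer over an ACL layer over NFS, the first call walks
two links and later calls walk none, which is the same cost an alias
object would have.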

Moreover, this type of optimization assumes a great depth of stacked
vnodes.  While this might occur in some specialized cases, this type
of optimization is best left as an option for the VFS implementor.

Indeed, one could easily envision an "ALIASFS" layer, whose sole
reason for existence was to provide a vnode that cached the
underlying vnode that contained the VM object, many layers below
itself.  A much saner implementation than introducing aliases
everywhere in the expectation of a performance win of a double
dereference over a stack traversal.


Part IV: Conclusion

So we don't need aliases in almost any case.  In the general case
of an object that aliases an object, a special caching layer, or
per layer caching can be employed.  Those cases are rare.

In transformational layers, such as our OTPFS, the aliases are
useless because the data is not the same, and, in fact, increased
anonymity of VM resources is counterproductive to the purposes of
the FS.  It damages their ability to do what they are intended to
do.

In pure semantic layers, and even in semantic layers that tunnel
their information, like our ACLFS, our QUOTAFS, and our RESOURCEFS,
it's useless because it would be counterproductive to use real
vnodes at these layers in the first place.  Using real vnodes in
these layers would, in fact, add needless translational complexity
(as we see in the 1992 code in /sys/miscfs/nullfs/null_vnops.c
to support the ugly null_bypass() VOP -- something not necessary
on other platforms because of more correct paging architecture).


In other words, we don't need aliases, except in cases where we
introduce unnecessary abstraction and complexity, for the apparent
purpose of requiring aliases.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message


