Date:      Sun, 13 Sep 1998 22:43:34 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        Don.Lewis@tsc.tdk.com (Don Lewis)
Cc:        freebsd-fs@FreeBSD.ORG
Subject:   Re: vm system interaction with nullfs
Message-ID:  <199809132243.PAA21022@usr04.primenet.com>
In-Reply-To: <199809130701.AAA22846@salsa.gv.tsc.tdk.com> from "Don Lewis" at Sep 13, 98 00:01:42 am

> Since the vm system keeps track of what it has in memory by (vnode, offset),
> how is this supposed to work when stackable filesystems are in use which
> create multiple vnodes for a single filesystem object, or is this broken?

Yes, this is still broken.

This was the primary reason for the migration of a putpages/getpages
into all "bottom-of-stack" FS's.

The general fix is to create a "getfinalvp".  This would allow you
to page through an object, while allowing layers stacked on top to
dictate layout/content.
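
To make the idea concrete, here is a minimal user-space sketch for
the fully transparent (nullfs-like) case.  It is not code from the
FreeBSD tree: struct model_vnode, model_getfinalvp() and
model_make_pagekey() are all hypothetical names, and the "lower"
pointer stands in for whatever a real layer keeps in its private
data.  The point is only that the (vnode, offset) key is computed
from the final vp, so every vnode in the stack names the same pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a stacked vnode; not the kernel's struct vnode. */
struct model_vnode {
	struct model_vnode *lower;	/* vnode we are stacked on, NULL at bottom */
};

/* The key the VM system would use to identify a cached page. */
struct model_pagekey {
	const struct model_vnode *vp;
	long offset;
};

/* Walk down the stack to the bottom-of-stack vnode. */
struct model_vnode *
model_getfinalvp(struct model_vnode *vp)
{
	assert(vp != NULL);
	while (vp->lower != NULL)
		vp = vp->lower;
	return vp;
}

/* Build the page key from the final vp rather than the vp we were handed. */
struct model_pagekey
model_make_pagekey(struct model_vnode *vp, long offset)
{
	struct model_pagekey key;

	key.vp = model_getfinalvp(vp);
	key.offset = offset;
	return key;
}
```

With two null layers stacked over one bottom vnode, keys built from
either null vnode compare equal, so there is no aliasing.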

Another part of the puzzle is that the vnode locking is overly
complex (VOP_LOCK).  The specific remedy for this case is to move
the locking code for top level VFS consumers (system calls, NFS
server, etc. are all VFS consumers) into the common code; this
resolves the recursion issues.

FS's that stack on top of other FS's must call the general locking
code on the underlying vnodes.

That is, it is the responsibility of a VFS consumer to ensure the lock
state before making VOP calls on the vnode(s).
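
A rough user-space sketch of that division of labor follows.  All
of the names (model_lock, model_vop_read, model_sys_read,
MODEL_ENOLOCK) are made up for illustration and the lock is a plain
flag, not a real kernel lock.  The VOP itself never takes the lock
on its own vnode; it only checks it.  The system call path locks
the top vnode, and the stacking layer, acting as a VFS consumer of
the layer below, locks the lower vnode before calling down.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_ENOLOCK	(-1)	/* hypothetical error: caller did not lock */

struct model_vnode {
	struct model_vnode *lower;	/* NULL for a bottom-of-stack vnode */
	int locked;			/* stand-in for the real vnode lock */
	int reads;			/* counts reads that reached the bottom */
};

/* General locking code shared by every VFS consumer. */
void
model_lock(struct model_vnode *vp)
{
	assert(vp != NULL && !vp->locked);
	vp->locked = 1;
}

void
model_unlock(struct model_vnode *vp)
{
	assert(vp != NULL && vp->locked);
	vp->locked = 0;
}

/* A VOP: relies on the caller having established the lock state. */
int
model_vop_read(struct model_vnode *vp)
{
	int error;

	if (!vp->locked)
		return MODEL_ENOLOCK;
	if (vp->lower == NULL) {
		vp->reads++;		/* bottom layer does the real work */
		return 0;
	}
	/* Stacking layer: we are the consumer of the lower vnode. */
	model_lock(vp->lower);
	error = model_vop_read(vp->lower);
	model_unlock(vp->lower);
	return error;
}

/* Top-level consumer, e.g. a system call or the NFS server. */
int
model_sys_read(struct model_vnode *vp)
{
	int error;

	model_lock(vp);
	error = model_vop_read(vp);
	model_unlock(vp);
	return error;
}
```

Calling model_vop_read() directly on an unlocked vnode fails, while
model_sys_read() on the top of a two-layer stack succeeds and leaves
both vnodes unlocked afterwards.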

The last part is that operations that otherwise need specific
knowledge of the in-core inode need to be divorced from that knowledge.
For the most part, this means that advisory locks must be asserted
on vnodes, not on in-core inodes.  This thins down the code that
you must duplicate going down, and makes it possible to proxy
locks down.  Again, the code goes into the VFS consumers.  The
typical implementation for terminal (bottom-of-stack) VFS's is
to implement the code as a veto-based operation, and the terminal
FS's never veto the operation.
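
One way to picture the veto arrangement is the sketch below.  It is
an assumption-laden model, not the kernel's advisory locking code:
the lock state is a single flag on the vnode, and the names
(model_advlock, model_terminal_veto, model_passthru_veto,
model_refusing_veto, MODEL_EVETO, MODEL_EBUSY) are invented here.
The common code asks the FS whether it objects and then asserts the
lock on the vnode itself; the FS never touches an in-core inode.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_EVETO	(-1)	/* hypothetical: some layer vetoed the lock */
#define MODEL_EBUSY	(-2)	/* hypothetical: vnode is already locked */

struct model_vnode;
typedef int (*model_veto_t)(struct model_vnode *vp);

struct model_vnode {
	struct model_vnode *lower;	/* NULL for a terminal FS */
	model_veto_t veto;		/* returns nonzero to veto the lock */
	int adv_locked;			/* advisory lock state lives on the vnode */
};

/* Terminal (bottom-of-stack) FS: never vetoes. */
int
model_terminal_veto(struct model_vnode *vp)
{
	(void)vp;
	return 0;
}

/* Transparent stacking layer: proxies the question down. */
int
model_passthru_veto(struct model_vnode *vp)
{
	return vp->lower->veto(vp->lower);
}

/* Hypothetical layer with a policy of its own: always refuses. */
int
model_refusing_veto(struct model_vnode *vp)
{
	(void)vp;
	return 1;
}

/* Common code in the VFS consumer: ask for vetoes, then lock the vnode. */
int
model_advlock(struct model_vnode *vp)
{
	assert(vp != NULL && vp->veto != NULL);
	if (vp->veto(vp))
		return MODEL_EVETO;
	if (vp->adv_locked)
		return MODEL_EBUSY;
	vp->adv_locked = 1;
	return 0;
}
```

The terminal FS contributes one trivial function, and a pass-through
layer contributes one line; everything else is shared.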


These steps, taken together, move most of the complexity of stacking
into a single set of common routines, and make operations much
thinner for stacking layers that don't implement them directly, but
instead pass the operations down.


> Unless this works right, it looks like you'll end up with multiple copies
> of the same disk blocks in memory and in memory copies may all be different.

It's worse than merely aliasing the VM objects (in reality, the
VM object associated with the vnode would be aliased if you used this
code today, and the free order would be the inverse of the stacking
order -- in other words, it would *always* get them wrong).

The existence of multiple copies is less of a problem than the
fact that you don't know the right backing object for a given
piece of data in a stack (consider the case of a "quotafs", where
quotas are implemented using a stacking layer instead of being an
integrated feature of the FS).  This is the point of the "getfinalvp"
suggestion -- probably more properly "getbackingvp".


> It would seem that in the case of nullfs and similar transparent filesystems,
> the vm system should always use the lowest vnode, but this doesn't seem to
> be implemented (though I could just be getting lost in the maze of twisty
> little passages).  It's even messier if the layer isn't transparent,
> like an encryption layer.

Not in all cases, actually.  For a cryptographic FS (such as the
one written by one of John Heidemann's students, which John sent
me), you will want a backing object for the unencrypted data,
separate from the backing object of the on-disk data.

There are also cases where the in-core data and the backing data
aren't the same size.  For some of these (like a compression layer),
you would implement this via the compression layer's vp's get/putpages,
and not operate on the backing store directly.

One could also imagine an "attributed" FS, where the files have
"attribute binary" or "attribute text + character set".  For
example, if you were to use a Unicode representation of an NFS
mounted legacy filesystem, you might want to attribute text files
on the basis of the remote character set (e.g., ISO 8859-2), and
do a two-for-one page expansion of the data for it to be locally
usable.


In any of these cases, the "getbackingvp" would return an object
higher up in the stack than the actual backing object, and it
would have vm pages that it would fill from the underlying backing
vp using the get/putpages of the vp for the actual backing object.
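
As an illustration of the two-for-one case, here is a small
user-space sketch.  Everything in it is hypothetical: the names,
the "owns_pages" flag, the 8-byte page size, and the simplifying
assumption that the remote character set is ISO 8859-1, where each
byte widens directly to a 16-bit code unit (ISO 8859-2 would need a
mapping table).  model_getbackingvp() stops at the first layer that
keeps its own pages, and that layer's getpage fills each of its
pages from half of a page of the vp below it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE	8	/* tiny page size, for illustration only */

struct model_vnode {
	struct model_vnode *lower;	/* NULL at the bottom of the stack */
	int owns_pages;			/* layer keeps its own VM pages */
	const uint8_t *data;		/* file contents (bottom layer only) */
	size_t size;			/* length of data in bytes */
};

/* Stop at the first layer that owns pages, else fall to the bottom. */
struct model_vnode *
model_getbackingvp(struct model_vnode *vp)
{
	assert(vp != NULL);
	while (vp->lower != NULL && !vp->owns_pages)
		vp = vp->lower;
	return vp;
}

/* Bottom layer getpage: copy one page of bytes, return bytes copied. */
size_t
model_bottom_getpage(struct model_vnode *vp, size_t pageno, uint8_t *page)
{
	size_t off = pageno * MODEL_PAGE;
	size_t n = 0;

	while (n < MODEL_PAGE && off + n < vp->size) {
		page[n] = vp->data[off + n];
		n++;
	}
	return n;
}

/*
 * Expansion layer getpage: one upper page holds MODEL_PAGE / 2 16-bit
 * units, which come from half of one lower page.  Returns units filled.
 */
size_t
model_expand_getpage(struct model_vnode *vp, size_t pageno, uint16_t *page)
{
	uint8_t lowpage[MODEL_PAGE];
	size_t half = MODEL_PAGE / 2;
	size_t start = (pageno % 2) * half;	/* which half of the lower page */
	size_t got = model_bottom_getpage(vp->lower, pageno / 2, lowpage);
	size_t n = 0;

	while (n < half && start + n < got) {
		page[n] = lowpage[start + n];	/* widen 8 bits to 16 */
		n++;
	}
	return n;
}
```

A null layer stacked on top of the expansion layer resolves to the
expansion layer's vp, not to the bottom vp, so the VM system caches
the widened data exactly once.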


Really, you should go to ftp.cs.ucla.edu, and read up on the
stacking architecture.  The documents in the "ficus" directory
are the actual design documents for the 4.4BSD stacking
architecture.



					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
