Date:      Tue, 10 Mar 1998 08:57:35 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        mellon@pobox.com (Anatoly Vorobey)
Cc:        current@FreeBSD.ORG
Subject:   Re: vnode_pager: *** WARNING *** stale FS code in system
Message-ID:  <199803100857.BAA12940@usr09.primenet.com>
In-Reply-To: <19980309152125.08053@techunix.technion.ac.il> from "Anatoly Vorobey" at Mar 9, 98 03:21:25 pm

NOTE: This post has "some of the fun stuff" in it.  I love this type
of thing.  8-).

This probably goes a tiny way towards Nate's "FS 101".


> Probably a silly question, but still:
> 
> What about the NFS as a whole? It's both a provider and a consumer
> of VFS. Does it mean it's a stackable FS? (forgetting for a moment
> the networking details; e.g. export and mount a local filesystem for
> proof of concept).

No.  It's two separate pieces, the client and the server.

John Heidemann's net proxy layer (which allows you to stack two
arbitrary layers across a network by proxying the argument
descriptor contents and the VOP descriptor contents) isn't either
(though it's a lot more useful than NFS.  8-)).

> And if it does, how does it manage to exist if you're saying a lot
> of work has to be done to make stackable FSs possible?

It isn't a stackable FS.


> Another question: which OSes _today_ provide for stackable file systems?
> Not FreeBSD, apparently not Linux or Solaris. NT has got "filesystem
> filters" kind of drivers which stack above the FS (and on each other
> if needed) - this is neat and often useful - but hasn't got stackable
> filesystems IIRC. 

Well, Windows 95 has them.  I and two other guys ported the Heidemann
framework to it for a commercial product.  If FreeBSD had all the
patches I did, and undid a couple of other changes (or they were
redone in the Windows 95 code), *and* FreeBSD ran ELF -- then the same
loadable modules could be used by both systems.  8-).

SunOS has them (if you get the DES key from John and download the
code off the UCLA CS Department FTP server).

You can do something similar with the IFSMGR.VXD code in Windows95;
it's called a "Miniport Driver".  You can't do it to the VFAT.VXD
or VFAT32.VXD, unfortunately, because Microsoft implemented the TSD
in the VFAT*.VXD.  A TSD is a "Type Specific Driver"; it's what
recognizes the partition ID and exports the "cooked device" to the
rest of Windows95.

You can do the same thing under NT.  The documentation will run you
$75,000.


> A third question: can you give a (few?) example(s?) of hypothetical
> useful stackable file systems, besides NULLFS?

Sure.  Let me make a definition, first:

Namespace Escapes

	A namespace escape is used by a stacking layer to implement
	stacking layer specific storage.

	Namespace escapes can be implemented one of several ways:

	1)	A file on the root of the FS.

		The file is read and/or written by the stacking layer.
		The file may or may not be hidden.  The stacking layer
		may or may not protect the file against modification
		by users.  It uses the schg and sunlnk flags to protect
		against modification; it uses a new flag, shide, to
		hide the file from directory lookups.  It uses the
		sappnd flag if it wants to keep an immutable log.

		This implementation is called a "Namespace Incursion",
		because it puts a special file name in the namespace,
		and (potentially) obstructs the user from using the
		name.  The user can tell it's there because it denies
		the user the use of the same name.

		This is the best way to implement a namespace escape,
		if you plan on mounting the FS both with and without
		the stacking layer (example: an FS which will be
		accessed by more than one OS), and you only need to
		store data that applies to the FS as a whole.

	2)	A file in each directory of the FS.

		This is the same as #1, but each directory has a file,
		instead of just the root.

		This is the best way to implement a namespace escape,
		if you plan on mounting the FS both with and without
		the stacking layer (example: an FS which will be
		accessed by more than one OS), and you only need to
		store data that applies to individual directories or
		files.

	3)	A "root-level redirection".

		A stacking layer that implements a root-level redirection
		makes it look like all user accesses to the root of the
		FS do not actually occur on the root of the FS.

		It does this by making a subdirectory, and pretending
		that the subdirectory is the root, every time it gets
		a request for the root (for instance, "./root").  It
		then keeps its own files and directories in the real
		root.

		This is the best way to implement a namespace escape,
		if you will *always* be mounting the FS with the
		stacking layer stacked over top of the underlying FS,
		and you only need to store data that applies to the
		FS as a whole.

	4)	A "directory-level redirection".

		A stacking layer that implements a directory-level
		redirection makes it look like all user accesses to
		each directory in the FS, including the root, are
		actually occurring on subdirectories.

		This is the classic "files are directories" type of
		implementation that you see Mac and OS/2 people
		lamenting the lack of all of the time.

		When a file is created, you create a directory instead,
		create a file called "data" (or "default", or any other
		implementation defined name), and when the user goes
		to access the file by name, you act as if he had
		asked for the "data" file in the directory with the
		name of the file he asked for.

		You can then create other "streams" (or "forks" or
		"extended attributes") within the file.

		This is the best way to implement a namespace escape,
		if you will *always* be mounting the FS with the
		stacking layer stacked over top of the underlying FS,
		and you need to store data that applies to individual
		directories or files.


	5)	A "partial directory-level redirection".

		A stacking layer that implements a partial directory-level
		redirection is actually implementing a sort of combination
		of #4 and #2.

		The stacking layer creates a hidden directory, and
		stores its own files in the directory, usually in
		a subdirectory with the same name as the file the
		data applies to, or just a file with the same name if
		it only needs one extra data stream per file.

		The actual files are still in the same place.

		This is the best way to implement a namespace escape,
		if you plan on mounting the FS both with and without
		the stacking layer (example: an FS which will be
		accessed by more than one OS), and you need to store
		a lot of different types of data that apply to
		individual directories or files.
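
As a rough user-space illustration of redirection techniques #3, #4,
and #5, the sketch below maps a user-visible path to the path a layer
might actually use on the underlying FS.  It is not kernel code and not
taken from any real layer.  The names "root", "data", and ".nse", and
the choice of one hidden directory per parent directory for #5, are
assumptions made only for this example.

```python
import posixpath

REAL_ROOT_SUBDIR = "root"  # assumed subdirectory posing as the root (#3)
DATA_NAME = "data"         # assumed name of the default stream (#4)
HIDDEN_DIR = ".nse"        # assumed hidden directory name (#5)


def _clean(user_path):
    """Return the path relative to the FS root, with '.' and '..' resolved."""
    return posixpath.normpath("/" + user_path).lstrip("/")


def root_redirect(user_path):
    """Technique #3: what the user sees as '/' is really '/root' below."""
    rel = _clean(user_path)
    if not rel:
        return "/" + REAL_ROOT_SUBDIR
    return posixpath.join("/", REAL_ROOT_SUBDIR, rel)


def dir_redirect(user_path, stream=DATA_NAME):
    """Technique #4: each file is really a directory holding its streams."""
    return posixpath.join("/", _clean(user_path), stream)


def partial_redirect(user_path, stream=None):
    """Technique #5: the file stays where it is; the extra data lives in a
    hidden directory beside it, as one file or as a subdirectory of streams."""
    parent, name = posixpath.split(_clean(user_path))
    shadow = posixpath.join("/", parent, HIDDEN_DIR, name)
    if stream is None:
        return shadow
    return posixpath.join(shadow, stream)
```

For instance, dir_redirect("/home/bob", "rsrc") yields
"/home/bob/rsrc", while partial_redirect("/home/bob") yields
"/home/.nse/bob" and leaves "/home/bob" itself untouched.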


OK, now onto the fun stuff.  8-).  What kind of stacking FS layers can
you build?  Well, you are pretty much limited by your imagination.  Here
are 10 of them that I've thought up off the top of my head (I admit,
I've been thinking about this for a while now, so several of them aren't
original; you may have seen them before):


QUOTFS		This layer implements quotas using a file on the FS
		root.

		The file is hidden from normal users of the FS using
		either namespace escape technique #3, or it uses
		technique #1 so that you can put quotas on MSDOS FS's
		(which need to also be mounted by DOS/Windows).

UMSDOSFS	This layer implements compatibility with the Linux
		UMSDOSFS.

		It works using namespace escape technique #2, because
		it has to be compatible with Linux.

		In each directory, it looks for (and creates, if not
		present) a hidden file named "--LINUX-.---".  This
		file stores things about the files in the directory
		it's in, like UNIX UID, UNIX GID, UNIX permissions,
		etc.; everything that MSDOSFS is too stupid to save for
		you.  Even long filenames, if you didn't mount it as
		a VFAT/VFAT32 mount because it was a DOS 2.11 or
		Windows 3.11 drive.  ;-).

		With minor modifications, this FS would allow FreeBSD
		to boot using a subdirectory of an MSDOSFS as its
		root filesystem (so that people could "try it out"
		without needing to repartition their DOS drives).

ATALKFS		This layer implements resource forks for Macintosh
		client machines.

		It works using namespace escape technique #5.  The
		directories it creates are named ".AppleDesktop" and
		".AppleDouble".

		It uses this technique because it makes the code
		"plug compatible" with netatalk.  8-).

		If FreeBSD's namei is patched to correctly inherit
		flags down, and to pass them as part of an opaque
		cn_pnbuf (needs the "nameifree" fixes to make it
		opaque), then you can use "the POSIX namespace
		escape" to access the forks from UNIX.  Example:

		Open the data fork for the file "bob" in the current
		directory (the default namespace):

			bob

		Open the data fork for the file "bob" in the current
		directory, naming the fork explicitly:

			//ATALKFS/data/bob

		Open the resource fork for the file "bob" in the
		current directory:

			//ATALKFS/rsrc/bob

		Open the resource fork for the file "tom" in the
		"/tmp" directory:

			//ATALKFS/rsrc//tmp/tom
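
To make the shape of these "//" names concrete, here is a small
user-space sketch of how such a name could be split into its layer,
its namespace (fork), and the remaining path.  The function name
parse_escape and the decision to pass malformed names through
untouched are assumptions for illustration; the real work would happen
inside namei, which this does not model.

```python
def parse_escape(path):
    """Split '//LAYER/namespace/rest' into (layer, namespace, rest).

    Anything that does not have that shape is treated as an ordinary
    name in the default namespace and returned as (None, None, path).
    """
    if not path.startswith("//") or path.startswith("///"):
        return (None, None, path)
    # Drop the leading '//' and split off at most two components, so any
    # further slashes (including a leading one) stay in the remainder.
    parts = path[2:].split("/", 2)
    if len(parts) < 3 or not all(parts):
        return (None, None, path)
    layer, namespace, rest = parts
    return (layer, namespace, rest)
```

With this split, "//ATALKFS/rsrc/bob" gives a relative remainder
("bob"), and "//ATALKFS/rsrc//tmp/tom" gives an absolute one
("/tmp/tom"), which is why the second form has a doubled slash in the
middle.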

CRYPTFS		This layer implements cryptography for the underlying
		FS.

		It uses namespace escape technique #3 to maintain
		state, and because the FS is useless without the
		cryptographic layer.

		If the encrypted and decrypted data are not the same
		size, it uses namespace escape technique #4.

		VOP_READ/VOP_WRITE are trapped by this layer.  So are
		file creates, deletions, and so on (events for which
		the cryptographic state needs to be synchronized).  A
		good implementation will log transactions like this
		before performing them, in case it crashes halfway
		through an operation.
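
The "log it before you do it" idea can be sketched in a few lines.
This is only a toy in-memory model, with a Python list standing in for
an append-only log file and a dict standing in for the per-file
cryptographic state; the class and method names are invented for the
example.

```python
class LoggedLayer:
    """Toy layer that records each operation before performing it."""

    def __init__(self):
        self.log = []      # append-only records: (txid, event, op, name)
        self.state = {}    # name -> per-file state (stand-in for key data)
        self._next_txid = 0

    def _run(self, op, name, action):
        txid = self._next_txid
        self._next_txid += 1
        self.log.append((txid, "begin", op, name))   # logged first
        action()                                     # an exception here models a crash
        self.log.append((txid, "commit", op, name))  # only reached on success

    def create(self, name, file_state):
        self._run("create", name, lambda: self.state.__setitem__(name, file_state))

    def delete(self, name):
        self._run("delete", name, lambda: self.state.pop(name))

    def unfinished(self):
        """Operations that were begun but never committed, oldest first."""
        committed = {txid for (txid, event, _, _) in self.log if event == "commit"}
        return [(op, name) for (txid, event, op, name) in self.log
                if event == "begin" and txid not in committed]
```

After a crash, a recovery pass would walk unfinished() and either
finish or undo each entry; which of the two is right depends on the
operation and is left out here.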

COMPFS		This layer implements file level compression for the
		underlying FS.

		It uses namespace escape technique #4 to maintain any
		uncompressed copies of files currently in use.

		When the COMPFS "fsck" is run, it removes non-compressed
		files ("forks").  This is done after a crash.

		A cleaner process follows closes.  If the file isn't
		reopened after a short period of time, the uncompressed
		image is removed.
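
The open/close/cleaner life cycle described for COMPFS might look
roughly like the following in-memory model.  It uses zlib purely as a
stand-in compressor, handles reads only, and takes the current time as
an explicit argument so the behaviour is easy to follow; the class
name, method names, and the grace_seconds parameter are all assumptions
made for this sketch.

```python
import zlib


class CompLayer:
    """Toy model: compressed files plus temporary uncompressed images."""

    def __init__(self, grace_seconds=30.0):
        self.compressed = {}    # name -> compressed bytes (the lower FS)
        self.uncompressed = {}  # name -> uncompressed image (the "fork")
        self.open_count = {}    # name -> number of current opens
        self.closed_at = {}     # name -> time of the last close
        self.grace = grace_seconds

    def store(self, name, data):
        self.compressed[name] = zlib.compress(data)

    def open(self, name):
        if name not in self.uncompressed:
            self.uncompressed[name] = zlib.decompress(self.compressed[name])
        self.open_count[name] = self.open_count.get(name, 0) + 1
        self.closed_at.pop(name, None)   # a reopen cancels pending cleanup
        return self.uncompressed[name]

    def close(self, name, now):
        self.open_count[name] -= 1
        if self.open_count[name] == 0:
            self.closed_at[name] = now

    def clean(self, now):
        """The cleaner: drop images that have stayed closed for the grace period."""
        for name, when in list(self.closed_at.items()):
            if now - when >= self.grace:
                del self.uncompressed[name]
                del self.closed_at[name]

    def fsck(self):
        """After a crash: discard every uncompressed image."""
        self.uncompressed.clear()
        self.open_count.clear()
        self.closed_at.clear()
```

A real layer would also have to recompress an image that was written
to before discarding it, which this read-only model does not attempt.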

EAFS		This layer implements OS/2 extended attributes (these
		are like Mac resource forks, only you can have more
		than one of them).

		It is like ATALKFS, but uses namespace escape technique
		#4 so that it can store multiple streams.

		Like ATALKFS, it would benefit from POSIX "//" based
		namespace selection.

ACLFS		This layer implements Access Control Lists (ACL's).

		It uses namespace escape technique #4 so it can store
		as many file attributes as it wants.

		This layer extends the VOP's with a "VOP_ACL".

UNRMFS		This layer implements "unrm".

		It uses namespace escape technique #4 so it can store
		as many file forks as it wants.  This lets you delete
		the same file twice, and get back either copy because
		both are saved.

		It has a companion kernel process that can be told
		to go around the FS looking for deleted files older
		than a set age (e.g. over one month old), and
		"purge" (that is, *really* delete) them.

		This layer depends on the POSIX namespace selection
		to allow (1) the deleted files to be listed by a
		VOP_READDIR (this requires that the FS manage its
		own vnodes so it can "know" if a given directory
		was opened via the POSIX namespace selection or not,
		since VOP_READDIR doesn't know the directory path),
		(2) to allow a user "purge" command to be built, so
		purges can happen under user control, and (3) to allow
		a user "unrm" command to be built (which simply renames
		from the deleted to the default namespace).
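
The delete-twice, list, unrm, and purge behaviour can be modelled in
user space as below.  This is a sketch with invented names (UnrmLayer
and its methods), dicts in place of the on-disk forks, and timestamps
passed in by the caller; it says nothing about how the vnode handling
would actually be done.

```python
class UnrmLayer:
    """Toy model of a layer that keeps every deleted copy of a file."""

    def __init__(self):
        self.live = {}     # name -> contents (the default namespace)
        self.deleted = {}  # name -> list of (deleted_at, contents), oldest first

    def write(self, name, data):
        self.live[name] = data

    def remove(self, name, now):
        """Delete: move the contents into the deleted namespace."""
        self.deleted.setdefault(name, []).append((now, self.live.pop(name)))

    def list_deleted(self):
        """What a readdir of the deleted namespace might report."""
        return [(name, index, when)
                for name in sorted(self.deleted)
                for index, (when, _) in enumerate(self.deleted[name])]

    def unrm(self, name, index=-1):
        """Rename one deleted copy (the newest by default) back to the default namespace."""
        if name in self.live:
            raise FileExistsError(name)
        _, data = self.deleted[name].pop(index)
        if not self.deleted[name]:
            del self.deleted[name]
        self.live[name] = data

    def purge(self, now, max_age):
        """Really delete every copy that was deleted more than max_age ago."""
        for name in list(self.deleted):
            kept = [(w, d) for (w, d) in self.deleted[name] if now - w <= max_age]
            if kept:
                self.deleted[name] = kept
            else:
                del self.deleted[name]
```

Deleting "a" twice leaves two entries under "a" in the deleted
namespace, and unrm("a", 0) brings back the older one while the newer
copy stays available until it is restored or purged.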

FLSFS		This layer implements File Level Security.


		It is like ATALKFS, but uses namespace escape technique
		#4 so that it can store multiple streams.

		This layer requires a session manager process.

		When a user attempts to access a file for which file
		level security is active, a message is sent to the
		user's session manager requesting credential
		information from the user for the file.

		The session manager can be a "pre-authentication"
		mechanism, where credentials are entered in using
		a command line tool, after login.  Or it can be a
		"password cache" mechanism, like Windows 95 uses
		(this kind of defeats the purpose, but is useful
		for other uses of session management credentials,
		like an SMBFS or NCPFS).  Or the session manager
		can be "active".

		An "active" session manager is the most interesting.
		Using its knowledge of the console, or being built
		into the "screen" program, or being built into the
		xdm, the session manager can actually interact with
		the user on behalf of the FLSFS (or SMBFS, or NCPFS,
		etc.) to interactively ask the user for credentials.

		This would let you have password protection per file,
		even going so far as to have different passwords for
		different users for a file (an entire passwd file
		could be supported, including password aging, etc.).


NSEFS		This layer implements a shared namespace escape
		for other layers to share between them.

		At this point you might be wondering about a stack
		consisting of multiple FS's that use techniques #3,
		#4, or #5.  You might be worried that these would
		tend to add up rather quickly.  8-).

		The answer is to implement a stacking layer that
		*only* does namespace escaping, and implements a
		new VOP called VOP_NSE.  FS's which need a namespace
		escape can use this VOP (one of the arguments is
		"technique").

		Alternately, you can leave it up to the NSEFS stacking
		layer to decide by looking at the FS, and specify "I
		need one file" or "I need a file per directory" or
		"I need a file per file" or "I need multiple files
		per file".
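
One way to picture the second option is a small decision function that
turns the stated need, plus whether the FS will always be mounted with
the layer on top, into one of the five techniques defined earlier.
The mapping below is only one reading of those definitions, and the
names Need and choose_technique are made up for the example.

```python
from enum import Enum


class Need(Enum):
    ONE_FILE = "one file"
    FILE_PER_DIRECTORY = "a file per directory"
    FILE_PER_FILE = "a file per file"
    MULTIPLE_FILES_PER_FILE = "multiple files per file"


def choose_technique(need, always_stacked):
    """Return the namespace escape technique number (1 to 5).

    always_stacked is True when the underlying FS is never mounted
    without the stacking layer, so redirection (#3, #4) is safe.
    """
    if need is Need.ONE_FILE:
        return 3 if always_stacked else 1
    if need is Need.FILE_PER_DIRECTORY:
        return 4 if always_stacked else 2
    # Per-file data: full redirection if always stacked, otherwise the
    # hidden-directory form that leaves the real files in place.
    return 4 if always_stacked else 5
```

Under this reading, a quota layer on an MSDOS FS that DOS also mounts
would ask for Need.ONE_FILE with always_stacked=False and get
technique #1.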

The possibilities are practically endless.  8-).

					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message


