Date:      Wed, 10 Jul 1996 14:56:01 -0700 (MST)
From:      Terry Lambert <terry@lambert.org>
To:        michaelh@cet.co.jp (Michael Hancock)
Cc:        freebsd-fs@FreeBSD.ORG, terry@lambert.org
Subject:   Re: Fixing Union_mounts
Message-ID:  <199607102156.OAA27403@phaeton.artisoft.com>
In-Reply-To: <Pine.SV4.3.93.960710105207.28386D-100000@parkplace.cet.co.jp> from "Michael Hancock" at Jul 10, 96 11:26:40 am

> [Please trim off current and leave fs when replying]

OK.

> Terry posted this reply to the "making in /usr/src" thread.  I'd like to
> see all this stackable fs stuff made usable.
> 
> I have some questions on Terry's remedies, items 2) and 4) below:
> 
> 2) Moving vnode locking to the vnode from the per fs inode will help fix
> the stacking problems, but what will it do for future advanced file
> systems that need to have special locking requirements?

It will not impact them in any way.  Specifically, the change is from:

	syscall()
		VOP_LOCK()
			return xxx_lock()
				return kern_lock.c lock


to:

	syscall()
		if( kern_lock.c lock == SUCCESS) {
			if( VOP_LOCK() == FAILURE) {	/* VOP_LOCK() returns xxx_lock() */
				kern_lock.c unlock
			}
		}

Which is to say that the per FS lock code gets the opportunity to veto
the locking, but in the default case, will never veto.  This leaves
room for the complex FS's to veto at will.
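
For concreteness, here is a minimal standalone C sketch of that ordering;
the names (generic_vnode_lock(), fs_veto_lock(), and so on) are
illustrative stand-ins, not the actual kernel symbols:

	#include <stdio.h>

	#define SUCCESS	0
	#define FAILURE	1

	/* Stand-in for the shared kern_lock.c lock/unlock on the vnode. */
	static int  generic_vnode_lock(void)   { return SUCCESS; }
	static void generic_vnode_unlock(void) { }

	/*
	 * Stand-in for the per FS VOP_LOCK(); the default implementation
	 * always agrees, while a complex FS may return FAILURE to veto.
	 */
	static int fs_veto_lock(void) { return SUCCESS; }

	/* New ordering: take the generic lock first, then let the FS veto. */
	static int lock_vnode(void)
	{
		if (generic_vnode_lock() != SUCCESS)
			return FAILURE;
		if (fs_veto_lock() == FAILURE) {
			generic_vnode_unlock();	/* back out the generic lock */
			return FAILURE;
		}
		return SUCCESS;
	}

	int main(void)
	{
		printf("lock: %s\n", lock_vnode() == SUCCESS ? "held" : "vetoed");
		return 0;
	}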

The same goes for advisory locking.  It should be obvious how the
lock veto will work for NFS client locking:

	if( local lock == SUCCESS) {
		if( remote lock == FAILURE)
			local unlock
	}

This has the advantage of preventing local conflicts from being
appealed over the wire (and perhaps encountering race conditions
as a result).


> 4) Moving the vnodes from the global pool to a per fs pool to improve
> locality of reference.  Won't this make it hard to manage memory?  How
> will efficient reclaim operations be implemented?

The memory is allocable per mount instance.

The problem with the recovery is in the divorce of the per FS in core
inode from the per FS in core vnode, as implemented primarily by
vclean() and its family of routines.

Specifically, there is already a "max open" limit on the allocated
inodes in the same respect, with the same memory fragmentation
issues arising as a result.


The reclaim operation will be done by multiplexing ffs_vrele the same
way ffs_vget, ffs_fhtovp, and ffs_vptofh (operations which also deal
with per FS vnode-inode association) are currently multiplexed through
VFS_VGET, etc.
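
A rough standalone C sketch of that multiplexing, using a made-up ops
table rather than the real struct vfsops; the stub names are
illustrative, and the vrele slot is the hypothetical addition:

	#include <stdio.h>

	struct vnode;				/* opaque for this sketch */

	/*
	 * Illustrative per FS operations table; 4.4BSD already dispatches
	 * vget this way, and the idea is to add a vrele slot so each FS
	 * controls its own vnode-inode association on release.
	 */
	struct fs_ops {
		int	(*vget)(int ino, struct vnode **vpp);
		int	(*vrele)(struct vnode *vp);
	};

	static int ffs_vget_stub(int ino, struct vnode **vpp)
	{
		(void)ino; *vpp = NULL; return 0;
	}

	static int ffs_vrele_stub(struct vnode *vp)
	{
		(void)vp; return 0;		/* keep inode/vnode associated */
	}

	static const struct fs_ops ffs_ops = { ffs_vget_stub, ffs_vrele_stub };

	#define VFS_VGET(ops, ino, vpp)	((ops)->vget((ino), (vpp)))
	#define VFS_VRELE(ops, vp)	((ops)->vrele((vp)))

	int main(void)
	{
		struct vnode *vp;

		VFS_VGET(&ffs_ops, 2, &vp);
		VFS_VRELE(&ffs_ops, vp);
		printf("vget/vrele dispatched through the per FS ops table\n");
		return 0;
	}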


The net effect of a real cleanup (which will require something similar
to this to be implemented, in any case) will be to actually reduce the
number of cache misses -- since there are frequent cases where a vnode
is recycled leaving the buffer cache contents in core.  A subsequent
read fails to detect this fact, and the disk is actually read instead
of a cache hit occurring.  This is a relatively huge overhead, and it
is unnecessary.

This is only foundation work, since it requires a cleanup of the
vclean/etc. interfaces in kern/vfs_subr.c.  It will have *some* effect,
in that an inode in the current ihash without an associated vnode (in
the current implementation) will always have a recoverable vnode.  This
should be an immediate win for ihashget() cache hits, at least in those
FS's that implement in core inode hashing (FFS/LFS/EXT2).


> This stacked fs stuff is really cool.  You can implement a simple undelete
> in the Union layer by making whiteout entries (See the 4.4 daemon book).
> This would only work for the duration of the mount unlike Novell's
> persistent transactional stuff, but still very useful.

Better than that.  You could implement a persistent whiteout or umsdos
type attribution in a file the same way, by stacking on top of the
existing FS, and "swallowing" your own file to do the dirty deed.
The duration would be permanent, assuming mount order is preserved.

This was the initial intent of the "mount over" capability:  the mount
of the underlying FS would take place, then the FS would be "probed"
for stacking by looking for specific "swallow" files to determine if
another FS should mount the FS again on the same mount point,
interposing its layer.
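
As a hedged sketch of that probe step (the swallow-file names and
helper functions below are made up purely for illustration):

	#include <stdio.h>
	#include <string.h>

	/*
	 * Hypothetical: each stacking layer names the "swallow" file it
	 * looks for; if that file exists on the freshly mounted FS, the
	 * layer re-mounts over the same mount point, interposing itself.
	 */
	struct stack_layer {
		const char	*swallow_file;	/* e.g. ".quotas" for a quota layer */
		const char	*name;
	};

	static int
	file_exists_on_mount(const char *mntpt, const char *file)
	{
		(void)mntpt;			/* stand-in for a real lookup */
		return strcmp(file, ".quotas") == 0;
	}

	static void
	probe_for_stacking(const char *mntpt, const struct stack_layer *l, int n)
	{
		int i;

		for (i = 0; i < n; i++)
			if (file_exists_on_mount(mntpt, l[i].swallow_file))
				printf("would interpose %s layer over %s\n",
				    l[i].name, mntpt);
	}

	int main(void)
	{
		struct stack_layer layers[] = {
			{ ".whiteouts",	"union" },
			{ ".quotas",	"quota" },
		};

		probe_for_stacking("/usr", layers, 2);
		return 0;
	}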


This is specifically most useful right now for implementing a "quota"
layer: ripping the quota code out of UFS in particular, and applying
it to any FS which has a quota file on it.  8-).


> There are already crypto-fs implementations out there, but I'd like to see
> more; especially non-ITAR-restricted ones that can be used world-wide.

There is also a file-compression (not block-compression) FS, which two
of John Heidemann's students implemented as part of a class project.

There is also the concept of a persistent replicated network FS with
intermittent network connectivity (basically, what the FICUS project
implied) for nomadic computing and docking/undocking at geographically
separate locations (I use a floating license from the West coast office
to create a "PowerPoint" presentation, fly across the country, plug
in my laptop to the East coast office network, and use a floating
license from the East coast office to make the actual presentation
to the board).


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


