From owner-freebsd-fs Mon Jul 8 07:05:43 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id HAA05606 for fs-outgoing; Mon, 8 Jul 1996 07:05:43 -0700 (PDT) Received: from mail.ruhrgebiet.individual.net (in-ruhr.ruhr.de [193.100.176.38]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id HAA05601 for ; Mon, 8 Jul 1996 07:05:34 -0700 (PDT) Received: by mail.ruhrgebiet.individual.net (8.7.1/8.6.12) with UUCP id PAA01079 for freebsd.org!freebsd-fs; Mon, 8 Jul 1996 15:24:15 +0200 (MET DST) Received: by robkaos.ruhr.de (/\oo/\ Smail3.1.29.1 #29.1) id ; Sun, 7 Jul 96 22:34 MET DST Message-Id: From: robsch@robkaos.ruhr.de (Robert Schien) Subject: procfs To: freebsd-fs@freebsd.org Date: Sun, 7 Jul 1996 22:34:07 +0200 (MET DST) X-Mailer: ELM [version 2.4 PL24] Content-Type: text Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk

Is it possible to freeze the current status of a process and restart it at a later time so that it begins execution at the point where it was frozen? My problem is that I do some number crunching: I want to save the process and restart it the next day or so. It would be nice to have such a feature. Does any kind of *nix or other OS support this?
TIA Robert From owner-freebsd-fs Mon Jul 8 11:49:16 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id LAA29450 for fs-outgoing; Mon, 8 Jul 1996 11:49:16 -0700 (PDT) Received: from baygate.bayarea.net (baygate.bayarea.net [204.71.212.2]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id LAA29442 for ; Mon, 8 Jul 1996 11:49:14 -0700 (PDT) Received: (from mcnab@localhost) by baygate.bayarea.net (8.6.9/8.6.9) id LAA27246; Mon, 8 Jul 1996 11:42:51 -0700 Date: Mon, 8 Jul 1996 11:42:51 -0700 From: David McNab Message-Id: <199607081842.LAA27246@baygate.bayarea.net> To: robsch@robkaos.ruhr.de CC: freebsd-fs@freebsd.org In-reply-to: (robsch@robkaos.ruhr.de) Subject: Re: procfs Reply-to: David McNab Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk Robert wrote: |Is it possible to freeze the current status of a process and |restart it at a later time so that it begins execution at |the point where it was frozen? You can suspend it, which means it won't eat any CPU time and can be easily paged out (won't consume any memory -- well, hardly any). But this won't be persistent across boots. If you want real checkpointing -- the kernel writes the executable and relevant context to a file and can later restart it -- then the only UNIX OS I'm familiar with that provides it, albeit in a slightly limited way, is UNICOS, Cray's UNIX. The hardware overhead's a bitch, though :^). It's a hard problem. There's lots of state scattered throughout the "system" that's hard to record and regenerate. It's especially hard if you are doing any networking, because then you have state in foreign address spaces. Most people seem to end up writing their number cruncher so that it periodically hits a "sync point" where they can easily checkpoint it themselves. 
-- Dave McNab

From owner-freebsd-fs Mon Jul 8 14:16:55 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id OAA09297 for fs-outgoing; Mon, 8 Jul 1996 14:16:55 -0700 (PDT) Received: from ra.ibr.cs.tu-bs.de (ra.ibr.cs.tu-bs.de [134.169.246.34]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id OAA09265 for ; Mon, 8 Jul 1996 14:16:26 -0700 (PDT) Received: from achill [134.169.34.18] by ra.ibr.cs.tu-bs.de (8.6.10/tubsibr) with ESMTP id XAA24073; Mon, 8 Jul 1996 23:14:46 +0200 Received: from petri@localhost by achill.ibr.cs.tu-bs.de (8.6.10/tubsibr) id XAA24089; Mon, 8 Jul 1996 23:14:45 +0200 Date: Mon, 8 Jul 1996 23:14:45 +0200 From: Stefan Petri Message-Id: <199607082114.XAA24089@achill.ibr.cs.tu-bs.de> To: freebsd-fs@freefall.freebsd.org, mcnab@bayarea.net, robsch@robkaos.ruhr.de Subject: Checkpointing [Re: procfs] Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk

A collection of references about checkpointing and process migration can be found at http://www.cs.tu-bs.de/~petri/pgmigrefs.html

Stefan

From owner-freebsd-fs Tue Jul 9 19:26:51 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id TAA27512 for fs-outgoing; Tue, 9 Jul 1996 19:26:51 -0700 (PDT) Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id TAA27493; Tue, 9 Jul 1996 19:26:43 -0700 (PDT) Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.7.5/CET-v2.1) with SMTP id CAA29004; Wed, 10 Jul 1996 02:26:40 GMT Date: Wed, 10 Jul 1996 11:26:40 +0900 (JST) From: Michael Hancock To: freebsd-fs@FreeBSD.ORG cc: freebsd-current@FreeBSD.ORG Subject: Fixing Union_mounts In-Reply-To: <199606251931.MAA00496@phaeton.artisoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk

[Please trim off current and leave fs when replying]

Terry
posted this reply to the "making in /usr/src" thread. I'd like to see all this stackable fs stuff made usable.

I have some questions on Terry's remedies items 2) and 4) below:

2) Moving vnode locking to the vnode from the per fs inode will help fix the stacking problems, but what will it do for future advanced file systems that need to have special locking requirements?

4) Moving the vnodes from the global pool to a per fs pool to improve locality of reference. Won't this make it hard to manage memory? How will efficient reclaim operations be implemented?

This stacked fs stuff is really cool. You can implement a simple undelete in the Union layer by making whiteout entries (see the 4.4 daemon book). This would only work for the duration of the mount, unlike Novell's persistent transactional stuff, but it is still very useful.

There are already crypto-fs implementations out there, but I'd like to see more; especially non-ITAR-restricted ones that can be used world-wide.

Regards, Mike Hancock

On Tue, 25 Jun 1996, Terry Lambert wrote:
> This is the intrinsic "union" option.
>
> It does not work.
>
> It does not work because VOP_ADVLOCK does not veto.
>
> It does not work because VOP_LOCK can not be stacked because it is
> stupidly referencing flags specific to the underlying vnode for lock
> resolution instead of the union vnode.
>
> It does not work because VOP_LOOKUP, VOP_RENAME, etc. can not
> be stacked because they actually deallocate path structures that
> were allocated by code in vfs_syscalls.c, instead of the buffers
> being deallocated in vfs_syscalls.c as well, as you would expect
> in a proper idempotent layering implementation.
>
> VOP_LOCK stupidly references these flags because vclean needs them.
>
> vclean is an abomination before God, and is a half-kludge to deal
> with not having both vnode/offset and dev/offset based cache
> references simultaneously.
>
> Use of vnode/offset cache entries is a result of the unified cache
> implementation.
It saves a bmap call when moving data to/from
> user space. It's why FreeBSD has faster I/O than most other systems.
>
> The lack of a parallel dev/offset based caching allows us to be lazy,
> and enlarges the bit limit on FS storage, though it does not help
> the inherent limit on file size (due to mapping).
>
> The lack of a parallel dev/offset results in the need for
> implementation of a "second chance cache" via ihash. Still, we
> will discard perfectly good pages from cache as a side effect of
> having no way to reassociate them with a vnode.
>
> The use of a global vnode pool instead of per FS mount instance vnode
> allocations damages cache locality. Combined with vclean, it also
> damages cache coherency.
>
> To repair:
>
> 1) Fix the stackability issues with the VFS interface itself,
> which will incidentally cause the VFS to more closely conform
> to the Heidemann Thesis design on which it is based. Currently
> it only implements a subset of the specified functionality.
>
> 2) Migrate the vnode locking to the vnode instead of the per FS
> inode; get rid of the second chance cache at the same time
> (the Lite2 code does some of this). The pointer should have
> been in the vnode, not the inode, from the very beginning.
>
> 3) Move the directory name cache out of the per FS code and
> into the lookup code.
>
> 4) Move the vnodes from the global pool; establish a per-FS
> vnode free routine.
>
> 5) Establish VOP_GETPAGE/VOP_PUTPAGE, etc...
>
> 6) Union mounts will then work without kludges in lookup, locking,
> and other code. They *could* be made to work with great, gross
> kludges and changes to at least 3 FS's (that I know of), but
> that's a kludge I won't do.
>
> Terry Lambert
> terry@lambert.org
> ---
> Any opinions in this posting are my own and not those of my present
> or previous employers.
-- michaelh@cet.co.jp http://www.cet.co.jp CET Inc., Daiichi Kasuya BLDG 8F 2-5-12, Higashi Shinbashi, Minato-ku, Tokyo 105 Japan Tel: +81-3-3437-1761 Fax: +81-3-3437-1766

From owner-freebsd-fs Wed Jul 10 01:27:42 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id BAA21545 for fs-outgoing; Wed, 10 Jul 1996 01:27:42 -0700 (PDT) Received: from soleil.uvsq.fr (soleil.uvsq.fr [193.51.24.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id BAA21527; Wed, 10 Jul 1996 01:27:31 -0700 (PDT) Received: from guillotin.prism.uvsq.fr (guillotin.prism.uvsq.fr [193.51.25.1]) by soleil.uvsq.fr (8.7.5/jtpda-5.2) with ESMTP id KAA17976 ; Wed, 10 Jul 1996 10:27:28 +0200 (METDST) Received: from angrand.prism.uvsq.fr (angrand.prism.uvsq.fr [193.51.25.85]) by guillotin.prism.uvsq.fr (8.7.5/jtpda-5.2) with ESMTP id KAA00273 ; Wed, 10 Jul 1996 10:27:27 +0200 (MET DST) Received: from (son@localhost) by angrand.prism.uvsq.fr (8.7.5/jtpda-5.2) id LAA02630 ; Wed, 10 Jul 1996 11:30:07 +0200 (MET DST) Date: Wed, 10 Jul 1996 11:30:07 +0200 (MET DST) Message-Id: <199607100930.LAA02630@angrand.prism.uvsq.fr> From: Nicolas Souchu To: freebsd-fs@freebsd.org CC: freebsd-scsi@freebsd.org Subject: msdosfs and scsi Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk

I've developed a polling driver for a SCSI ZIP 100 drive connected to the parallel port. http://www.prism.uvsq.fr/~son/ppa3.html

Here is my question/problem: when polling, the system load is horrible... so I want to insert some tsleep() calls in the driver. In fact, when data is not available, the process which runs in the driver is scheduled with:

	s = splbio();
	tsleep(..., PRIBIO, "mywait", 1);
	splx(s);

BUT: doing this leads 2 concurrent processes to a deadlock.

	$ mount -t msdos /dev/sd0s4 /zip
	$ time dd if=/dev/zero of=/zip/file bs=8192 count=512 &
	$ ls -l /zip

dd is waiting on channel "getblk", ls is waiting on channel "msdhgt".
Debugging the driver shows that dd is scheduled and ls starts reading data from the disk. But then everything stops.

Should the driver be atomic until returning SUCCESSFULLY_QUEUED? Why? Why not? I can get more info if you need it...

nicolas -- Nicolas.Souchu@prism.uvsq.fr Laboratoire PRiSM - Versailles, FRANCE

From owner-freebsd-fs Wed Jul 10 10:45:31 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id KAA29185 for fs-outgoing; Wed, 10 Jul 1996 10:45:31 -0700 (PDT) Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id KAA29164; Wed, 10 Jul 1996 10:45:25 -0700 (PDT) Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.7.5/CET-v2.1) with SMTP id RAA04866; Wed, 10 Jul 1996 17:45:17 GMT Date: Thu, 11 Jul 1996 02:45:16 +0900 (JST) From: Michael Hancock To: freebsd-fs@FreeBSD.ORG cc: freebsd-current@FreeBSD.ORG Subject: Re: Fixing Union_mounts In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk

Never mind. A mail search for "FS and Layering" on -current makes for some good reading. I have some absorbing to do.
-mike hancock

From owner-freebsd-fs Wed Jul 10 15:00:50 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id PAA19316 for fs-outgoing; Wed, 10 Jul 1996 15:00:50 -0700 (PDT) Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id PAA19296 for ; Wed, 10 Jul 1996 15:00:41 -0700 (PDT) Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id OAA27403; Wed, 10 Jul 1996 14:56:01 -0700 From: Terry Lambert Message-Id: <199607102156.OAA27403@phaeton.artisoft.com> Subject: Re: Fixing Union_mounts To: michaelh@cet.co.jp (Michael Hancock) Date: Wed, 10 Jul 1996 14:56:01 -0700 (MST) Cc: freebsd-fs@FreeBSD.ORG, terry@lambert.org In-Reply-To: from "Michael Hancock" at Jul 10, 96 11:26:40 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk

> [Please trim off current and leave fs when replying]

OK.

> Terry posted this reply to the "making in /usr/src" thread. I'd like to
> see all this stackable fs stuff made usable.
>
> I have some questions on Terry's remedies items 2) and 4) below:
>
> 2) Moving vnode locking to the vnode from the per fs inode will
> help fix the stacking problems, but what will it do for future advanced
> file systems that need to have special locking requirements?

It will not impact them in any way. Specifically, the change is from:

	syscall()
	    VOP_LOCK()
	        return xxx_lock()
	            return kern_lock.c lock

to:

	syscall()
	    if( kern_lock.c lock == SUCCESS) {
	        if( VOP_LOCK() return xxx_lock() == FAILURE) {
	            kern_lock.c unlock
	        }
	    }

Which is to say that the per FS lock code gets the opportunity to veto the locking, but in the default case, will never veto. This leaves room for the complex FS's to veto at will.

The same goes for advisory locking.
It should be obvious how the lock veto will work for NFS client locking:

	if( local lock == SUCCESS) {
	    if( remote lock == FAILURE)
	        local unlock
	}

This has the advantage of preventing local conflicts from being appealed over the wire (and perhaps encountering race conditions as a result).

> 4) Moving the vnodes from the global pool to a per fs pool to improve
> locality of reference. Won't this make it hard to manage memory? How
> will efficient reclaim operations be implemented?

The memory is allocable per mount instance.

The problem with the recovery is in the divorce of the per FS in core inode from the per FS in core vnode, as implemented primarily by the vclean() and family of routines.

Specifically, there is already a "max open" limit on the allocated inodes, in the same respect, and with the same memory fragmentation issues coming up as a result.

The reclaim operation will be done by multiplexing ffs_vrele the same way ffs_vget, ffs_fhtovp, and ffs_vptofh (operations which also deal with per FS vnode-inode association) currently multiplex VFS_VGET, etc..

The net effect of a real cleanup (which will require something similar to this to be implemented, in any case) will be to actually reduce the number of cache misses -- since there are frequent cases where a vnode is recycled leaving the buffer cache contents in core. A subsequent read fails to detect this fact, and the disk is actually read instead of a cache hit occurring. This is a relatively huge overhead, and it is unnecessary.

This is only foundation work, since it requires a cleanup of the vclean/etc. interfaces in kern/vfs_subr.c. It will have *some* effect, in that an inode in the current ihash without an associated vnode (in the current implementation) will always have a recoverable vnode. This should be an immediate win for ihashget() cache hits, at least in those FS's that implement in core inode hashing (FFS/LFS/EXT2).

> This stacked fs stuff is really cool.
You can implement a simple undelete
> in the Union layer by making whiteout entries (See the 4.4 daemon book).
> This would only work for the duration of the mount unlike Novell's
> persistent transactional stuff, but still very useful.

Better than that. You could implement a persistent whiteout or umsdos type attribution in a file the same way, by stacking on top of the existing FS, and "swallowing" your own file to do the dirty deed. The duration would be permanent, assuming mount order is preserved.

This was the initial intent of the "mount over" capability: the mount of the underlying FS would take place, then the FS would be "probed" for stacking by looking for specific "swallow" files to determine if another FS should mount the FS again on the same mount point, interposing its layer.

This is specifically most useful right now for implementing a "quota" layer: ripping the quota code out of UFS in particular, and applying it to any FS which has a quota file on it. 8-).

> There are already crypto-fs implementations out there, but I'd like to see
> more; especially non-ITAR-restricted ones that can be used world-wide.

There is a file-compression (not block compression) FS, which two of John Heidemann's students implemented as part of a class project, as well.

There is also the concept of a persistent replicated network FS with intermittent network connectivity (basically, what the FICUS project implied) for nomadic computing and docking/undocking at geographically separate locations (I use a floating license from the West coast office to create a "PowerPoint" presentation, fly across the country, plug in my laptop to the East coast office network, and use a floating license from the East coast office to make the actual presentation to the board).

Regards,
Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present or previous employers.
From owner-freebsd-fs Wed Jul 10 19:45:39 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id TAA08262 for fs-outgoing; Wed, 10 Jul 1996 19:45:39 -0700 (PDT) Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id TAA08252 for ; Wed, 10 Jul 1996 19:45:33 -0700 (PDT) Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.7.5/CET-v2.1) with SMTP id CAA08956 for ; Thu, 11 Jul 1996 02:45:30 GMT Date: Thu, 11 Jul 1996 11:45:30 +0900 (JST) From: Michael Hancock To: freebsd-fs@FreeBSD.ORG Subject: Re: Fixing Union_mounts In-Reply-To: <199607102156.OAA27403@phaeton.artisoft.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk

Thanks, the mail archives still left a lot of questions.

For now 2) is clear. I need to look at the code more to completely understand 4).

It would be interesting to hear from other FS/VM people so we can archive this discussion. Hopefully, agreements can be made towards fully realizing stackable mounts.

-mike hancock

On Wed, 10 Jul 1996, Terry Lambert wrote:
> > 2) Moving vnode locking to the vnode from the per fs inode will
> > help fix the stacking problems, but what will it do for future advanced
> > file systems that need to have special locking requirements?
>
> Which is to say that the per FS lock code gets the opportunity to veto
> the locking, but in the default case, will never veto. This leaves
> room for the complex FS's to veto at will.
>
> The same goes for advisory locking. It should be obvious how the
> lock veto will work for NFS client locking:
>
> This has the advantage of preventing local conflicts from being
> appealed over the wire (and perhaps encountering race conditions
> as a result).
> > 4) Moving the vnodes from the global pool to a per fs pool to improve
> > locality of reference.
Won't this make it hard to manage memory? How
> > will efficient reclaim operations be implemented?
>
> The memory is allocable per mount instance.
>
> The problem with the recovery is in the divorce of the per FS in core
> inode from the per FS in core vnode, as implemented primarily by the
> vclean() and family of routines.
>
> Specifically, there is already a "max open" limit on the allocated
> inodes, in the same respect, and with the same memory fragmentation
> issues coming up as a result.
>
> The reclaim operation will be done by multiplexing ffs_vrele the same
> way ffs_vget, ffs_fhtovp, and ffs_vptofh (operations which also deal
> with per FS vnode-inode association) currently multiplex VFS_VGET,
> etc..
>
> The net effect of a real cleanup (which will require something similar
> to this to be implemented, in any case) will be to actually reduce the
> number of cache misses -- since there are frequent cases where a vnode
> is recycled leaving the buffer cache contents in core. A subsequent
> read fails to detect this fact, and the disk is actually read instead
> of a cache hit occurring. This is a relatively huge overhead, and it
> is unnecessary.
>
> This is only foundation work, since it requires a cleanup of the
> vclean/etc. interfaces in kern/vfs_subr.c. It will have *some* effect,
> in that an inode in the current ihash without an associated vnode (in
> the current implementation) will always have a recoverable vnode. This
> should be an immediate win for ihashget() cache hits, at least in those
> FS's that implement in core inode hashing (FFS/LFS/EXT2).
>
> > This stacked fs stuff is really cool. You can implement a simple undelete
> > in the Union layer by making whiteout entries (See the 4.4 daemon book).
> > This would only work for the duration of the mount unlike Novell's
> > persistent transactional stuff, but still very useful.
>
> Better than that.
You could implement a persistent whiteout or umsdos
> type attribution in a file the same way, by stacking on top of the
> existing FS, and "swallowing" your own file to do the dirty deed.
> The duration would be permanent, assuming mount order is preserved.
>
> This was the initial intent of the "mount over" capability: the mount
> of the underlying FS would take place, then the FS would be "probed"
> for stacking by looking for specific "swallow" files to determine if
> another FS should mount the FS again on the same mount point
> interposing its layer.
>
> This is specifically most useful right now for implementing a "quota"
> layer: ripping the quota code out of UFS in particular, and applying
> it to any FS which has a quota file on it. 8-).
>
> > There are already crypto-fs implementations out there, but I'd like to see
> > more; especially non-ITAR-restricted ones that can be used world-wide.
>
> There is a file-compression (not block compression) FS, which two of
> John Heidemann's students implemented as part of a class project, as
> well.
>
> There is also the concept of a persistent replicated network FS with
> intermittent network connectivity (basically, what the FICUS project
> implied) for nomadic computing and docking/undocking at geographically
> separate locations (I use a floating license from the West coast office
> to create a "PowerPoint" presentation, fly across the country, plug
> in my laptop to the East coast office network, and use a floating
> license from the East coast office to make the actual presentation
> to the board).
>
> Regards,
> Terry Lambert
> terry@lambert.org
> ---
> Any opinions in this posting are my own and not those of my present
> or previous employers.
> -- michaelh@cet.co.jp http://www.cet.co.jp CET Inc., Daiichi Kasuya BLDG 8F 2-5-12, Higashi Shinbashi, Minato-ku, Tokyo 105 Japan Tel: +81-3-3437-1761 Fax: +81-3-3437-1766 From owner-freebsd-fs Wed Jul 10 20:11:49 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id UAA09673 for fs-outgoing; Wed, 10 Jul 1996 20:11:49 -0700 (PDT) Received: from who.cdrom.com (who.cdrom.com [204.216.27.3]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id UAA09668 for ; Wed, 10 Jul 1996 20:11:47 -0700 (PDT) Received: from ccs.sogang.ac.kr (ccs.sogang.ac.kr [163.239.1.1]) by who.cdrom.com (8.6.12/8.6.11) with ESMTP id UAA09347 for ; Wed, 10 Jul 1996 20:11:39 -0700 Received: from cslsun10.sogang.ac.kr by ccs.sogang.ac.kr (8.7.5/Sogang) id LAA24730; Thu, 11 Jul 1996 11:54:31 +0900 (KST) Received: by cslsun10.sogang.ac.kr (4.1/SMI-4.1) id AA06369; Thu, 11 Jul 96 11:51:23 KST Date: Thu, 11 Jul 96 11:51:23 KST From: heo@cslsun10.sogang.ac.kr (Heo Sung Gwan) Message-Id: <9607110251.AA06369@cslsun10.sogang.ac.kr> Apparently-To: Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk Hi, From owner-freebsd-fs Wed Jul 10 20:37:49 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id UAA10699 for fs-outgoing; Wed, 10 Jul 1996 20:37:49 -0700 (PDT) Received: from dyson.iquest.net (dyson.iquest.net [198.70.144.127]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id UAA10694 for ; Wed, 10 Jul 1996 20:37:43 -0700 (PDT) Received: (from root@localhost) by dyson.iquest.net (8.7.5/8.6.9) id WAA07219; Wed, 10 Jul 1996 22:37:11 -0500 (EST) From: "John S. 
Dyson" Message-Id: <199607110337.WAA07219@dyson.iquest.net> Subject: Re: Fixing Union_mounts To: michaelh@cet.co.jp (Michael Hancock) Date: Wed, 10 Jul 1996 22:37:11 -0500 (EST) Cc: freebsd-fs@FreeBSD.ORG In-Reply-To: from "Michael Hancock" at Jul 11, 96 11:45:30 am Reply-To: dyson@FreeBSD.ORG X-Mailer: ELM [version 2.4 PL24 ME8] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk

> Thanks, the mail archives still left a lot of questions.
>
> For now 2) is clear. I need to look at the code more to completely
> understand 4).
>
> It would be interesting to hear from other FS/VM people so we can archive
> this discussion. Hopefully, agreements can be made towards fully
> realizing stackable mounts.

My two cents (pence, lira, yen, etc...):

I hope to look at this thread this weekend. I know that we need to get off our duffs and start making progress on the FS front. My FreeBSD time is right now tied up on making the swapon/swapoff stuff real.

There is action about to happen on the Jeffrey Hsu Lite-2 stuff, and I heard that Kirk's ordered-delay writes project might be starting. This weekend I am going to dedicate a day or so to work with people to understand all of this so that I can help contribute. DG needs to get involved also, and I think that his time is freeing up (that damn -stable release has tied up very, very valuable resources.)

My language (composition) skills suck, but I have excellent reading skills. Reading Terry's stuff is sometimes very difficult :-). Makes me think that I have 2nd grade reading skills at times... I have convinced myself that he is right about the management of the namei buffers, but there is MUCH MUCH more to do!!!

Some people have been commenting (in private and -core email) that we are moving too fast... On some fronts we are moving like molasses, and I sure would like to see more progress.
I am kind of the VM person (well, DG and I are), but do not feel nearly as competent on the FS front. As an ABSOLUTE minimum, I can provide a lot of "nice" hooks into the VM system for filesystem memory management (LFS really needs help, for example.)

I really see the need for a fairly close, collaborative effort on the FS code structure and filesystems. However, there are, at times, diverging opinions on how things should be done. We need to get organized!!! I am at my physical limit now, working a regular job, needing to find another SO or pseudo-SO, and of course my most important SO needs attention (the FreeBSD issues that I already have committed to.)

I don't know if this position is acceptable to all, but I am thinking that we need to (eventually) empower an FS development team, like we have kind of done with the VM stuff.

Sorry for my rambling, John

From owner-freebsd-fs Wed Jul 10 21:57:31 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id VAA14643 for fs-outgoing; Wed, 10 Jul 1996 21:57:31 -0700 (PDT) Received: from ccs.sogang.ac.kr (ccs.sogang.ac.kr [163.239.1.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id VAA14638 for ; Wed, 10 Jul 1996 21:57:27 -0700 (PDT) Received: from cslsun10.sogang.ac.kr by ccs.sogang.ac.kr (8.7.5/Sogang) id NAA27547; Thu, 11 Jul 1996 13:54:37 +0900 (KST) Received: by cslsun10.sogang.ac.kr (4.1/SMI-4.1) id AA06370; Thu, 11 Jul 96 11:52:55 KST Date: Thu, 11 Jul 96 11:52:55 KST From: heo@cslsun10.sogang.ac.kr (Heo Sung Gwan) Message-Id: <9607110252.AA06370@cslsun10.sogang.ac.kr> To: undisclosed-recipients:; Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk

Hi, I want to know something about lfs (log-structured filesystem).
From owner-freebsd-fs Wed Jul 10 22:45:25 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id WAA18419 for fs-outgoing; Wed, 10 Jul 1996 22:45:25 -0700 (PDT) Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id WAA18412 for ; Wed, 10 Jul 1996 22:45:22 -0700 (PDT) Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.7.5/CET-v2.1) with SMTP id FAA10203; Thu, 11 Jul 1996 05:44:58 GMT Date: Thu, 11 Jul 1996 14:44:58 +0900 (JST) From: Michael Hancock Reply-To: Michael Hancock To: Heo Sung Gwan cc: freebsd-fs@FreeBSD.ORG Subject: lfs) In-Reply-To: <9607110252.AA06370@cslsun10.sogang.ac.kr> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk

On Thu, 11 Jul 1996, Heo Sung Gwan wrote:
> Hi,
> I want to know something about lfs(log-structured filesystem).

The sources are available with FreeBSD. You can also get the following books:

Unix Internals: The New Frontiers (by Uresh Vahalia) -- there's lots of stuff in there about file systems.

The Design and Implementation of the 4.4BSD Operating System (by McKusick et al.)

You might want to poke around http://www.usenix.org to find related papers. A good source would also be http://deas.harvard.edu (look at Margo Seltzer's work).
-mike hancock

From owner-freebsd-fs Wed Jul 10 23:02:29 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id XAA19340 for fs-outgoing; Wed, 10 Jul 1996 23:02:29 -0700 (PDT) Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id XAA19332 for ; Wed, 10 Jul 1996 23:02:26 -0700 (PDT) Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.7.5/CET-v2.1) with SMTP id GAA10311 for ; Thu, 11 Jul 1996 06:02:24 GMT Date: Thu, 11 Jul 1996 15:02:24 +0900 (JST) From: Michael Hancock Reply-To: Michael Hancock To: freebsd-fs@FreeBSD.ORG Subject: Re: Fixing Union_mounts In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk

Sorry, I can't shut up. I'm fuzzy on 4), and will be until I read the sources more. I just want to back up and talk about the design objectives.

The fathers of 4.4 thought having a global vnode pool vs. partitioning the pools per fs was a win for kernel memory management when several different file systems are in use. Your design goals seem to come from an SMP perspective, which means we need to think differently to understand what you're saying.

If we step back and look at this from the point of view of the 4.4 implementers, what are the consequences of moving away from a global vnode pool? What are the wins?
-mike hancock From owner-freebsd-fs Wed Jul 10 23:37:49 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id XAA20656 for fs-outgoing; Wed, 10 Jul 1996 23:37:49 -0700 (PDT) Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id XAA20651 for ; Wed, 10 Jul 1996 23:37:46 -0700 (PDT) Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.7.5/CET-v2.1) with SMTP id GAA10559; Thu, 11 Jul 1996 06:37:33 GMT Date: Thu, 11 Jul 1996 15:37:33 +0900 (JST) From: Michael Hancock To: Heo Sung Gwan cc: freebsd-fs@FreeBSD.ORG Subject: Re: lfs) In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk On Thu, 11 Jul 1996, Michael Hancock wrote: > papers. A good source would also be http://deas.harvard.edu (look at > Margo Seltzer's work). Oops. http://www.deas.harvard.edu -mh From owner-freebsd-fs Wed Jul 10 23:58:48 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id XAA21481 for fs-outgoing; Wed, 10 Jul 1996 23:58:48 -0700 (PDT) Received: from parkplace.cet.co.jp (parkplace.cet.co.jp [202.32.64.1]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id XAA21474; Wed, 10 Jul 1996 23:58:45 -0700 (PDT) Received: from localhost (michaelh@localhost) by parkplace.cet.co.jp (8.7.5/CET-v2.1) with SMTP id GAA10724; Thu, 11 Jul 1996 06:58:42 GMT Date: Thu, 11 Jul 1996 15:58:42 +0900 (JST) From: Michael Hancock To: dyson@FreeBSD.ORG cc: freebsd-fs@FreeBSD.ORG Subject: Re: Fixing Union_mounts In-Reply-To: <199607110337.WAA07219@dyson.iquest.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-fs@FreeBSD.ORG X-Loop: FreeBSD.org Precedence: bulk Thanks, I guess it's bad timing with all the release work happening now. On Wed, 10 Jul 1996, John S. 
Dyson wrote: > I hope to look at this thread this weekend. I know that we need to get > off our duffs starting to make progress on the FS front. My FreeBSD time > is right now tied up on making the swapon/swapoff stuff real. > > There is action about to happen on the Jeffery Hsu Lite-2 stuff, and > I heard that Kirk's ordered-delay writes project might be starting. This Yes, the Lite2 stuff is needed to proceed further. Regarding Delayed-Ordered Writes, here's an excerpt from Terry's Usenet posting on the UnixWare group: >Contrast this with the UnixWare 2.x UFS, which uses Delayed >Ordered Writes. These require significant changes to each >FS's structure to implement, and do not scale reentrancy >per vnode across multiple processors for a particular vnode >buffer. They are about 35% slower than soft updates under >loading, and tend to have bad cache effects. I agree that things should probably slow down, but so that we can sit down and do more *designing*. DOW is a performance optimization, and before doing that I think we should take a harder look at the framework that serves as the foundation for all further work. I'd hate to see the same mistakes made in SVR4/MP go into 4.4BSD. Identifying these mistakes might be hard, but I think we should try.
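[Editor's note: the common ground between Delayed Ordered Writes and soft updates is an ordering invariant: metadata that a disk block depends on must reach stable storage before the block itself. Here is a toy C sketch of that invariant, with invented names; it is neither UnixWare's DOW nor the soft-updates implementation, both of which track full dependency graphs.]

```c
#include <assert.h>
#include <stddef.h>

/* Toy buffer with at most one dependency. */
struct buf {
    int         id;       /* which disk block this is */
    int         written;  /* has it reached "disk" yet? */
    struct buf *dep;      /* block that must hit disk before this one */
};

static int write_order[16];
static int nwrites;

static void bwrite(struct buf *bp) {  /* simulate the disk write */
    write_order[nwrites++] = bp->id;
    bp->written = 1;
}

/* Flush a buffer, honouring the invariant: the dependency (e.g. a new
 * inode) goes to disk before the dependent block (e.g. the directory
 * block naming it), so a crash can never leave a name pointing at an
 * uninitialized inode. */
static void flush(struct buf *bp) {
    if (bp->dep != NULL && !bp->dep->written)
        flush(bp->dep);
    if (!bp->written)
        bwrite(bp);
}
```

The design debate is about where this dependency information lives and when flushing happens, which is why Terry argues the two schemes differ sharply in locking and cache behaviour even though they enforce the same invariant.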
-mike hancock From owner-freebsd-fs Fri Jul 12 17:29:30 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id RAA26864 for fs-outgoing; Fri, 12 Jul 1996 17:29:30 -0700 (PDT) Received: from veda.is (root@ubiq.veda.is [193.4.230.60]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id RAA26838; Fri, 12 Jul 1996 17:28:07 -0700 (PDT) Received: (from adam@localhost) by veda.is (8.7.5/8.7.3) id AAA07477; Sat, 13 Jul 1996 00:27:59 GMT From: Adam David Message-Id: <199607130027.AAA07477@veda.is> Subject: strangest weirdness To: freebsd-current@freebsd.org Date: Sat, 13 Jul 1996 00:27:54 +0000 (GMT) Cc: freebsd-fs@freebsd.org X-Mailer: ELM [version 2.4ME+ PL22 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk Well I have just seen what seems to be an unusual filesystem glitch. I was doing 'make depend' in 2 kernel directories concurrently, and at the same time as another kernel 'make all' was getting towards the end of its processing. Both instances of 'make depend' broke by invoking the editor 'ex' on an empty temporary file, following the first invocation of 'mkdep'. No other instances of 'ex' were running at the time as far as I can tell. This was with an NFS /usr, and I believe that the 'make' executable was reinstalled after the 'make all' was started but before the 'make depend' was started. (yes, it's called stress testing. ;) I have also noticed that executables dump core often on client machines when the files on the fileserver have been updated "under their feet". Okay I know "if it hurts, don't do that", but why do these glitches occur? 
-- Adam David From owner-freebsd-fs Sat Jul 13 02:10:34 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id CAA25278 for fs-outgoing; Sat, 13 Jul 1996 02:10:34 -0700 (PDT) Received: from irz301.inf.tu-dresden.de (irz301.inf.tu-dresden.de [141.76.1.11]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id CAA25259; Sat, 13 Jul 1996 02:10:27 -0700 (PDT) Received: from sax.sax.de by irz301.inf.tu-dresden.de (8.6.12/8.6.12-s1) with ESMTP id LAA26066; Sat, 13 Jul 1996 11:10:13 +0200 Received: (from uucp@localhost) by sax.sax.de (8.6.12/8.6.12-s1) with UUCP id LAA10257; Sat, 13 Jul 1996 11:10:13 +0200 Received: (from j@localhost) by uriah.heep.sax.de (8.7.5/8.6.9) id KAA22966; Sat, 13 Jul 1996 10:21:11 +0200 (MET DST) From: J Wunsch Message-Id: <199607130821.KAA22966@uriah.heep.sax.de> Subject: Re: strangest weirdness To: freebsd-current@freebsd.org, freebsd-fs@freebsd.org Date: Sat, 13 Jul 1996 10:21:11 +0200 (MET DST) Cc: adam@veda.is (Adam David) Reply-To: joerg_wunsch@uriah.heep.sax.de (Joerg Wunsch) In-Reply-To: <199607130027.AAA07477@veda.is> from Adam David at "Jul 13, 96 00:27:54 am" X-Phone: +49-351-2012 669 X-PGP-Fingerprint: DC 47 E6 E4 FF A6 E9 8F 93 21 E0 7D F9 12 D6 4E X-Mailer: ELM [version 2.4ME+ PL17 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk As Adam David wrote: > I have also noticed that executables dump core often on client machines when > the files on the fileserver have been updated "under their feet". Okay I know > "if it hurts, don't do that", but why do these glitches occur? Terry will certainly jump in now and explain to you that it would be better to copy the entire executable into local swap instead of relying on the ability to page it in from the NFS server. The latter is what we're doing right now -- so you are simply not expected to remove it on the server.
The Unix semantics of ``a file will only be removed once the last reference to it has disappeared'' don't work over NFS, since the server simply doesn't know (and cannot know, due to the statelessness of the protocol) which clients still hold references on some file. These semantics are emulated in the case where you unlink a file on the client that still has other references, by renaming the file on the server first and removing it later. -- cheers, J"org joerg_wunsch@uriah.heep.sax.de -- http://www.sax.de/~joerg/ -- NIC: JW11-RIPE Never trust an operating system you don't have sources for. ;-) From owner-freebsd-fs Sat Jul 13 21:09:14 1996 Return-Path: owner-fs Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id VAA05394 for fs-outgoing; Sat, 13 Jul 1996 21:09:14 -0700 (PDT) Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id VAA05378; Sat, 13 Jul 1996 21:09:11 -0700 (PDT) Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id VAA06010; Sat, 13 Jul 1996 21:03:16 -0700 From: Terry Lambert Message-Id: <199607140403.VAA06010@phaeton.artisoft.com> Subject: Re: strangest weirdness To: joerg_wunsch@uriah.heep.sax.de Date: Sat, 13 Jul 1996 21:03:16 -0700 (MST) Cc: freebsd-current@freebsd.org, freebsd-fs@freebsd.org, adam@veda.is In-Reply-To: <199607130821.KAA22966@uriah.heep.sax.de> from "J Wunsch" at Jul 13, 96 10:21:11 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-fs@freebsd.org X-Loop: FreeBSD.org Precedence: bulk > > I have also noticed that executables dump core often on client machines when > > the files on the fileserver have been updated "under their feet". Okay I know > > "if it hurts, don't do that", but why do these glitches occur?
> > Terry will certainly jump in now and explain to you that it would be > better to copy the entire executable into local swap instead of > relying on the ability to page it in from the NFS server. The latter > is what we're doing right now -- so you are simply not expected to > remove it on the server. The Unix semantics of ``a file will only be > removed once the last reference to it has disappeared'' don't work over > NFS, since the server simply doesn't know (and cannot know, due to the > statelessness of the protocol) which clients still hold references on > some file. These semantics are emulated in the case where you > unlink a file on the client that still has other references, by > renaming the file on the server first and removing it later. Actually, you could implement a simple distributed cache coherency protocol for executables with a slight modification of the rpc.statd code in current and a minor change to the NFS client. It wouldn't be an NFS spec compliant implementation afterwards, but it would solve the problem. I would like to see a flag in the mount structure for FS's, inherited from the FS type, so that the dev of an inode about to be exec'ed may be dereferenced through the mount struct to decide if the image is coming from local stable storage, local removable storage, or network storage. I would also like to see an option where an executable image could be forced into local memory. If swap is available, it would be considered to be local memory. I would like to see a default of the current behaviour, with sysctl-based controls to cause the exec to force the image into local memory in the local removable media case or the network storage case, or both, under user configuration. To solve your problem (and for my personal defaults selection), you would set the flag for the exec-from-network-storage case.
Mach, Linux, SunOS, Solaris, SVR4, SCO Xenix, etc., all have the behaviour of using an image for swap store, and when the image is modified without notification (the image is modified on the NFS server case) or when the image is "deleted" without notification (the CDROM/floppy removal case), the client system is the one that suffers. What makes this particularly onerous in NFS is that one NFS client can intentionally crash another NFS client of the same server, given knowledge of what programs are running and a writable server store. In addition, this method can be used to hack an otherwise secure client, typically by rewriting the target page on the server so that when the accept completes on sendmail, it throws up a shell, or something similar. Sendmail is SUID root, so it is a bad example, but telnetd (/usr/libexec/telnetd) is a good candidate for this hack, since the ruserok() check does not specifically block "bin", only "root", and telnetd is owned by "bin". Since this bin-owned binary will be run by root (inetd) on a client connect, it is an ideal place to hack. Besides the security issues, it's just plain annoying to have the client quit functioning, or hang pending a page-in from the server when the server has gone down (diskless/dataless configurations of SunOS are frequently sworn at for this failing). Regards, Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
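[Editor's note: Terry's proposal reduces to a small policy check at exec time. The C sketch below illustrates it; every name here -- the enum, the struct fields, the policy variables -- is invented for illustration and is not FreeBSD's actual API.]

```c
#include <assert.h>

/* Where a filesystem's backing store lives; per the proposal, this
 * would be inherited into the mount structure from the FS type. */
enum storage {
    ST_LOCAL_STABLE,     /* local fixed disk */
    ST_LOCAL_REMOVABLE,  /* CDROM, floppy */
    ST_NETWORK           /* NFS and friends */
};

struct mount { enum storage m_storage; };
struct vnode { struct mount *v_mount; };

/* Policy knobs, as if settable through sysctl.  The defaults preserve
 * the current behaviour: page the image in from its backing store. */
static int exec_copy_removable = 0;
static int exec_copy_network = 0;

/* At exec time: should this image be forced into local memory (or
 * local swap), instead of being demand-paged from its backing store
 * later, when the store may have changed or gone away? */
static int exec_should_copy(const struct vnode *vp) {
    switch (vp->v_mount->m_storage) {
    case ST_LOCAL_REMOVABLE:
        return exec_copy_removable;
    case ST_NETWORK:
        return exec_copy_network;
    default:
        return 0;
    }
}
```

Turning on the network-storage knob would address both the core dumps Adam reported and the client-crash attack Terry describes, at the cost of swap space for every networked executable.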