FreeBSD Mail Archives

Date:      Fri, 21 Mar 1997 10:40:15 -0700 (MST)
From:      Terry Lambert <terry@lambert.org>
To:        dfr@render.com (Doug Rabson)
Cc:        terry@lambert.org, msmith@atrad.adelaide.edu.au, bde@zeta.org.au, dgy@rtd.com, hackers@FreeBSD.ORG, helbig@MX.BA-Stuttgart.De
Subject:   Re: wd driver questions
Message-ID:  <199703211740.KAA15929@phaeton.artisoft.com>
In-Reply-To: <sen2rxsdjm.fsf@minnow.render.com> from "Doug Rabson" at Mar 21, 97 11:13:49 am

> > The NFS_NOSERVER code changes the way the lease VOP's operate, not
> > because the code should be conditional, but because the code is badly
> > integrated.
> > 
> > The registration of lease management mechanisms should take place
> > for all VFS consumers, not just NFS.  This is a generic issue with
> > not being able to register a set of event callback entry points for
> > a module.  The *real* problem here is that the NFS server wants to
> > register event handlers, but the server and the client code are in
> > the same module space.  This is really a source organization issue.
> 
> Interesting idea.  Any chance of a design for such a system?

It should take place in the vfs_init() (which doesn't have a macro op)
or the VFS_START() (which does).

One problem here is that there is no corresponding "deinit"; it would
have to be defined.

You would generate two classes of event initially:

1)	VFSEVENT( type, fs)

	A VFS event of 'type' has occurred on the file system 'fs'

2)	VOPEVENT( type, fs, vn)

	A vnode operation event of type 'type' has occurred on the
	file system 'fs' affecting the file system vnode 'vn'
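Both classes could travel in a single record tagged by class. The following is a minimal sketch, not existing kernel code: `struct fs_event`, `vfsevent()`, `vopevent()` and the `EV_VFS`/`EV_VOP` tags are hypothetical names, and `struct mount`/`struct vnode` are left opaque.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-ins for the kernel's mount and vnode structures. */
struct mount;
struct vnode;

/* The two event classes: whole-FS events and vnode operation events. */
enum event_class { EV_VFS, EV_VOP };

struct fs_event {
	enum event_class  ev_class;
	int               ev_type;   /* hypothetical event code, e.g. "directory modified" */
	struct mount     *ev_fs;     /* file system the event occurred on */
	struct vnode     *ev_vn;     /* affected vnode; NULL for VFS events */
};

/* VFSEVENT( type, fs): an event of 'type' on the file system 'fs'. */
static struct fs_event
vfsevent(int type, struct mount *fs)
{
	struct fs_event ev = { EV_VFS, type, fs, NULL };
	return ev;
}

/* VOPEVENT( type, fs, vn): a vnode operation event affecting 'vn' on 'fs'. */
static struct fs_event
vopevent(int type, struct mount *fs, struct vnode *vn)
{
	assert(vn != NULL);   /* a VOP event always names a vnode */
	struct fs_event ev = { EV_VOP, type, fs, vn };
	return ev;
}
```

Keeping one record type means a single handler signature can serve both classes; a handler that only cares about VFS events just ignores `ev_vn`.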

This really goes back to defining the FS in terms of events and handling
of events... this needs to be done for generic support of soft updates
in all FS's, in any case.

The "event management" would be a very generic subsystem in the
kernel, so an FS specific implementation is probably an error.  The
lease code would be changed over to use "lease events" instead, and
the NFS server would register to receive the events.  If there were no NFS
server registration for lease events, then the events would be
ignored (all events for which there is not a handler are simply
ignored).  For events with multiple handlers, all handlers are
called (two browser windows open on the same directory, and so on).
I suppose we could add an arbitrary "priority" field, and provide
the ability for a handler to "swallow" an event to make sure that it
does not propagate, but I'd rather not deal with inheritance issues
just yet.
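The dispatch rule above (every registered handler is called, an event nobody handles is dropped, no priorities and no swallowing) can be sketched with a fixed table of handler slots per event type. This is a minimal sketch under those assumptions; `event_register`, `event_deregister`, `event_send`, `count_handler` and the table sizes are hypothetical, and a kernel version would need locking and dynamic allocation.

```c
#include <assert.h>
#include <stddef.h>

#define EV_MAXTYPES     16   /* hypothetical limits for this sketch */
#define EV_MAXHANDLERS  8

typedef void (*ev_handler_fn)(int type, void *arg, void *ctx);

struct ev_slot {
	ev_handler_fn  fn;
	void          *ctx;   /* per-registration context handed back to the handler */
};

static struct ev_slot ev_table[EV_MAXTYPES][EV_MAXHANDLERS];

/* Register a handler for 'type'; returns 0 on success, -1 if the slots are full. */
static int
event_register(int type, ev_handler_fn fn, void *ctx)
{
	assert(type >= 0 && type < EV_MAXTYPES);
	for (int i = 0; i < EV_MAXHANDLERS; i++) {
		if (ev_table[type][i].fn == NULL) {
			ev_table[type][i].fn = fn;
			ev_table[type][i].ctx = ctx;
			return 0;
		}
	}
	return -1;
}

/* The reflexive operation: remove a previously registered handler. */
static void
event_deregister(int type, ev_handler_fn fn, void *ctx)
{
	assert(type >= 0 && type < EV_MAXTYPES);
	for (int i = 0; i < EV_MAXHANDLERS; i++) {
		if (ev_table[type][i].fn == fn && ev_table[type][i].ctx == ctx) {
			ev_table[type][i].fn = NULL;
			ev_table[type][i].ctx = NULL;
		}
	}
}

/*
 * Deliver an event to every handler registered for its type.  An event
 * with no handlers is silently ignored.  Returns the number of handlers
 * that were called.
 */
static int
event_send(int type, void *arg)
{
	int called = 0;

	assert(type >= 0 && type < EV_MAXTYPES);
	for (int i = 0; i < EV_MAXHANDLERS; i++) {
		if (ev_table[type][i].fn != NULL) {
			ev_table[type][i].fn(type, arg, ev_table[type][i].ctx);
			called++;
		}
	}
	return called;
}

/* Example handler (hypothetical): counts deliveries through its context. */
static void
count_handler(int type, void *arg, void *ctx)
{
	(void)type;
	(void)arg;
	++*(int *)ctx;
}
```

With no NFS server loaded, a lease event sent through `event_send()` finds no handlers and costs only the table scan; two browsers watching one directory would simply be two registrations for the same type.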


One could consider "soft" interrupts and "top end drivers" as event
subsystem clients.


> > I could see a number of useful events that a VFS consumer at the
> > system call level would want to register.  For instance, directory
> > modification events would probably be useful for a file browser
> > application to monitor.
> 
> Directory modification events would be pretty useful for the NFS
> server as well to extend the useful lifetime of NFS client cookies.
> The event would need to supply more information than just 'changed'
> since the seek values are still valid unless the directory was
> compacted.

Yes; this is more like a "lock range decoalesce" for the modified
block.  The client cookies in the "lock range" at the time of the
event would be invalidated.  This would resolve the Sun NFSv3
interoperability issue for the most part (there is still a potential
boundary condition, where 4.4BSD does directory truncation, and the
previous versions of UNIX and clone systems did not; but that's pretty
easy to deal with: if the cookie is past EOF, it's invalid).
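One way to express that invalidation rule, assuming each directory modification event reports the byte range [start, end) it rewrote plus the new directory size, and that each cookie remembers the modification generation at which it was issued. The generation counter, the fixed-size damage log and all of the names are assumptions for illustration, not part of any NFS implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAXDAMAGE 32   /* hypothetical fixed-size log for this sketch */

/* A directory read cookie: an offset plus the generation it was issued at. */
struct dir_cookie {
	uint64_t off;
	uint64_t gen;
};

/* Byte range [start, end) rewritten by one modification event. */
struct damage {
	uint64_t start, end;
	uint64_t gen;
};

struct dir_state {
	uint64_t      eof;       /* current directory size */
	uint64_t      gen;       /* bumped on every modification event */
	int           ndamage;
	struct damage log[MAXDAMAGE];
};

/* Handler for a directory modification event covering [start, end). */
static void
dir_modified(struct dir_state *ds, uint64_t start, uint64_t end, uint64_t new_eof)
{
	assert(ds->ndamage < MAXDAMAGE);
	ds->gen++;
	ds->log[ds->ndamage++] = (struct damage){ start, end, ds->gen };
	ds->eof = new_eof;
}

/* Hand out a cookie for offset 'off', stamped with the current generation. */
static struct dir_cookie
cookie_issue(const struct dir_state *ds, uint64_t off)
{
	return (struct dir_cookie){ off, ds->gen };
}

/*
 * A cookie stays valid unless an event issued after it rewrote the range
 * it points into, or the directory was truncated so it is past EOF.
 */
static bool
cookie_valid(const struct dir_state *ds, struct dir_cookie c)
{
	if (c.off > ds->eof)   /* past EOF: the truncation case */
		return false;
	for (int i = 0; i < ds->ndamage; i++) {
		const struct damage *d = &ds->log[i];
		if (d->gen > c.gen && c.off >= d->start && c.off < d->end)
			return false;
	}
	return true;
}
```

A cookie pointing before or after the rewritten range survives the event, which is the point: only cookies inside the modified block are lost, instead of every outstanding cookie for the directory.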


If you are planning on doing any of this, I'm going to have to look
to see how much of my mount code changes made it in via Jeffrey Hsu;
the opportunity to modify the VFSOP operation is something that
merits *serious* design consideration... much of the original CSRG
work in the VFS integration was haphazard, IMO, and since we are no
longer under the legal pressures that caused them to have to cut
corners in the first place, we should step back before diving in.  I
would like to have input here, if you'll let me.


Clearly, the code needs to have reflexive ops (to allow us to
deregister event sinks), but there are other issues as well;
the following, in addition to the deinit, will affect the contents
of the vfsops as well:

o	The original CSRG vfs_init depended on a statically linked
	FS to get the number of elements in a 'struct vnodeop_desc'
	array from the first FS, which was hard-coded to be FFS.
	If it still does this, then kern/vnode_if.sh should be changed
	to emit code for vnode_if.c to calculate this from the template
	using 'sizeof( vfs_op_descs)/sizeof( vfs_op_descs[ 0])'
	instead.  This removes the init dependency on FFS, or on a
	statically linked FS at all (patches submitted 06/06/94).

o	The difference between a 'vfs_mount' and a 'vfs_root' should
	be destroyed.  This can be accomplished by causing all mounts
	to occur into the mounted FS list as if they were root mounts,
	and then handling the "mount point covering" in common code
	shared by all FS's, instead of duplicating parts of it in each
	FS's "mount" routine.  This gets rid of 'vfs_root', and changes
	'vfs_mount' significantly (partial patches for root mount
	merge into mount submitted 06/14/94).

o	When a 'mount' occurs (the mounted FS list is updated), the
	second stage is to map the FS into the FS hierarchy (root
	FS inferior name space).  Since this is in common code, a
	'mapping event' can be generated, and 'handled' by the handlers.
	One handler would be registered by the NFS server... and thus
	all of the export processing moves out of the per FS code and
	up into the NFS server code, where it belongs, so that FS's
	do not have to have specific knowledge of exports (cf. the
	current code).

o	The previous change provides, as a side effect, the ability
	to handle FS media arrival events cleanly.  A mount is just
	an event handler in the aforementioned event subsystem, and
	a volume arrival (or departure) is "just another event".

o	The VFS_QUOTACTL should go away; quotas should be implemented
	as a stacking layer so they can apply to all file systems,
	not just UFS/FFS derived FS's.
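For the first item, the operation count can fall out of the template table at compile time instead of being borrowed from FFS. This is a minimal sketch: `struct vnodeop_desc` is reduced to two fields, the three descriptors are made up, and `vfs_op_count`/`vfs_op_assign_offsets` are hypothetical names; the real table is generated by kern/vnode_if.sh and is much larger.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the real descriptor structure. */
struct vnodeop_desc {
	int         vdesc_offset;   /* slot in each FS's operations vector */
	const char *vdesc_name;
};

/* Made-up descriptors standing in for what vnode_if.c would define. */
static struct vnodeop_desc vop_lookup_desc = { 0, "vop_lookup" };
static struct vnodeop_desc vop_open_desc   = { 0, "vop_open" };
static struct vnodeop_desc vop_close_desc  = { 0, "vop_close" };

/* The template table, as the generator script would emit it. */
static struct vnodeop_desc *vfs_op_descs[] = {
	&vop_lookup_desc,
	&vop_open_desc,
	&vop_close_desc,
};

/*
 * Element count computed from the template itself, with no reference to
 * any particular FS.  Dividing by sizeof(vfs_op_descs[0]) stays correct
 * whether the table holds structures or pointers to them.
 */
static const size_t vfs_op_count =
    sizeof(vfs_op_descs) / sizeof(vfs_op_descs[0]);

/* Give every operation its slot number in the per-FS vectors. */
static void
vfs_op_assign_offsets(void)
{
	for (size_t i = 0; i < vfs_op_count; i++) {
		assert(vfs_op_descs[i] != NULL);
		vfs_op_descs[i]->vdesc_offset = (int)i;
	}
}
```

Because the count is a compile-time constant of the template, vfs_init no longer needs any statically linked FS to exist before it can size the operation vectors.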


More common code == less potential for failures resulting from partially
propagated changes with global effect == narrower change scoping == an
overall more robust system.


> > Heh.  You aren't listening to me... you don't get rid of the boot-loader
> > code.  If you don't get rid of it, you can still use it.
> > 
> > It's like the BIOS based boot being able to use the INT 13 redirector
> > supplied by OnTrack, when you boot from an OnTrack drive.  As long as
> > you don't override it, it's still there.
> 
> Actually, I expect the boot loader will have to be quite simple.  To
> be practical, even with a 3 stage bootstrap the third stage will have
> to fit into 64k since it will need to use INT 13 for its disk access
> and our tools can't (and shouldn't) generate anything except tiny
> model programs.  As a result, it will have severely truncated
> read-only file system support (see libsa from NetBSD).  This is
> sufficient to load up the kernel.  The boot will be discarded as soon
> as the kernel is entered.

It's tempting to implement a protected mode VMM in the third stage boot;
have you seen:

	Protected Mode Software Architecture
	_Tom Shanley_, MindShare, Inc.
	Addison-Wesley Developers Press
	ISBN 0-201-55447-X

Yet?


> I was reading through libsa and our boot code yesterday and I believe
> that a 3 stage bootstrap for biosboot would be pretty easy.  If the
> third stage was written using libsa then life would be much easier
> when writing an ELF loader.  The filesystem and file descriptor
> support in libsa mimic normal syscalls making it possible to write and
> test the loader in userland before changing the bootstrap.

Yes; I would like to see the objects move between the systems, actually,
which is why I was talking about a vnode-as-fd based kernel file I/O
subsystem with Mike the other day.


> I for one find fiddling with the bootstrap a hair raising experience.
> I have some bad memories from the 386bsd days with bootstraps and
> disklabels.  Shudder.

Well, this is an issue for "device arrival" events (implied on a
"probe true" for a physical device) which are handled by a device
mapping layer handler.  Used in conjunction with a devfs, this would
"magically" solve all the partition and diskslice and ... problems.

I've discussed this before, but in case it wasn't obvious how the
algorithm could work, here's a 50,000 foot view of the idea:


	probe()
	{
		if( found)
			event_send( RAW_DEVICE_ARRIVE, some_real_dev);

		return;
	}


	RAW_DEVICE_ARRIVE( ... dev)
	{

		/*
		 * Raw device: look for a mapping layer that
		 * recognizes the format of the data on this
		 * device... ie:
		 *	o	DOS partition table
		 *	o	DOS extended partition table
		 *	o	BSD diskslice
		 *	o	SVR4 vtoc
		 *	o	BAD144
		 *	o	etc. ...
		 */
		for( i = 0; i < num_log_to_phys_layers; i++) {
			if( map_layer[ i].recognize( dev))
				return;
		}

		/* no mapping layer claimed it: it is an end device */
		event_send( END_DEVICE_ARRIVE, ... dev);

		return;
	}


	END_DEVICE_ARRIVE( ... dev)
	{
		/*
		 * End device: distinguished because it does not
		 * have a logical-to-physical mapping layer.  It
		 * must be an FS  or it's just a device...
		 */
		lastmp = NULL;
		for( i = 0; i < num_fs_types; i++) {
			if( ( mp = fstype[ i].mount( lastmp, dev)) != NULL) {
				lastmp = mp;
				/*
				 * Mounted into mount tab... need
				 * to mount into FS hierarchy, if
				 * mount point defined...
				 */
				mount_into_fs( mp);
				/*
				 * fall through for other FS's so we
				 * can support quotas, umsdosfs, and
				 * so on...
				 */
			}
		}

		if( !lastmp) {
			/*
			 * device not mounted... diagnostic to console
			 * if debugging, etc.
			 */
		}
		return;
	}

	
	/* map_layer[ DOS_PARTITIONING] ... */
	int
	dos_partition_recognize( dev)
	{
		/*
		 * recognition sequence...
		 */
		if( !recognize)
			return( 0);	/* not ours!*/

		/*
		 * Each valid partition is a new raw device...
		 */
		for( i = 0; i < 4; i++) {
			if( !valid( i))
				continue;
			event_send( RAW_DEVICE_ARRIVE, dos_p_mkdev(i));
		}

		return( 1);
	}


And so on... the details of the mapping layer device implementation,
and the collapse of one logical-to-physical mapping on top of another,
etc., are implementation issues (we can discuss them separately,
or off line).
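To check that the recursion bottoms out where the pseudocode says it should, the arrival chain can be simulated in user space. This is a sketch under loud assumptions: devices are fake records carrying a format tag and child partitions, the mapping layers and FS types are table-driven functions, and `event_send()` is a direct switch rather than a registry; `struct dev`, `map_recognize()`, `fs_mount()` and the `FMT_*` formats are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NELEM(a) (sizeof(a) / sizeof((a)[0]))

/* Simulated on-media data formats (hypothetical). */
enum fmt { FMT_DOSPART, FMT_BSDLABEL, FMT_UFS, FMT_MSDOSFS, FMT_UNKNOWN };

/* A simulated device: what its data looks like and, if partitioned, its pieces. */
struct dev {
	const char       *name;
	enum fmt          fmt;
	int               nchild;
	const struct dev *child[4];
};

enum event { RAW_DEVICE_ARRIVE, END_DEVICE_ARRIVE };

static void event_send(enum event ev, const struct dev *dev);

/* Record of mounted devices, so the outcome can be inspected. */
static const char *mounted[16];
static int nmounted;

/* Generic mapping layer: claim the device if it carries our format. */
static int
map_recognize(const struct dev *dev, enum fmt mine)
{
	if (dev->fmt != mine)
		return 0;				/* not ours! */
	for (int i = 0; i < dev->nchild; i++)	/* each partition is a new raw device */
		event_send(RAW_DEVICE_ARRIVE, dev->child[i]);
	return 1;
}
static int dos_recognize(const struct dev *d) { return map_recognize(d, FMT_DOSPART); }
static int bsd_recognize(const struct dev *d) { return map_recognize(d, FMT_BSDLABEL); }
static int (*const map_layer[])(const struct dev *) = { dos_recognize, bsd_recognize };

/* Generic FS "mount": succeed if the device holds our on-disk format. */
static int
fs_mount(const struct dev *dev, enum fmt mine)
{
	if (dev->fmt != mine)
		return 0;
	assert(nmounted < (int)NELEM(mounted));
	mounted[nmounted++] = dev->name;
	return 1;
}
static int ufs_mount(const struct dev *d)     { return fs_mount(d, FMT_UFS); }
static int msdosfs_mount(const struct dev *d) { return fs_mount(d, FMT_MSDOSFS); }
static int (*const fstype[])(const struct dev *) = { ufs_mount, msdosfs_mount };

static void
raw_device_arrive(const struct dev *dev)
{
	for (size_t i = 0; i < NELEM(map_layer); i++)
		if (map_layer[i](dev))
			return;			/* a mapping layer claimed it */
	event_send(END_DEVICE_ARRIVE, dev);	/* no layer did: end device */
}

static void
end_device_arrive(const struct dev *dev)
{
	int any = 0;

	for (size_t i = 0; i < NELEM(fstype); i++)	/* try every FS type */
		any |= fstype[i](dev);
	if (!any)
		printf("%s: no file system recognized\n", dev->name);
}

static void
event_send(enum event ev, const struct dev *dev)
{
	switch (ev) {
	case RAW_DEVICE_ARRIVE:
		raw_device_arrive(dev);
		break;
	case END_DEVICE_ARRIVE:
		end_device_arrive(dev);
		break;
	}
}
```

Sending RAW_DEVICE_ARRIVE for a DOS-partitioned disk whose first slice holds a BSD label (one UFS partition, one unrecognized partition) and whose second slice holds an MSDOS file system mounts the UFS partition and the MSDOS slice, and reports the remaining partition to the console; no partition-specific code runs outside its own mapping layer.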



					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

