From owner-freebsd-questions  Thu Aug  3 15:24:13 1995
Return-Path: questions-owner
Received: (from majordom@localhost)
          by freefall.cdrom.com (8.6.11/8.6.6) id PAA05953
          for questions-outgoing; Thu, 3 Aug 1995 15:24:13 -0700
Received: from cs.weber.edu (cs.weber.edu [137.190.16.16])
          by freefall.cdrom.com (8.6.11/8.6.6) with SMTP id PAA05946
          for <questions@freebsd.org>; Thu, 3 Aug 1995 15:24:08 -0700
Received: by cs.weber.edu (4.1/SMI-4.1.1)
	id AA27649; Thu, 3 Aug 95 16:16:58 MDT
From: terry@cs.weber.edu (Terry Lambert)
Message-Id: <9508032216.AA27649@cs.weber.edu>
Subject: Re: Loadable kernel modules/filesystems
To: wollman@halloran-eldar.lcs.mit.edu (Garrett Wollman)
Date: Thu, 3 Aug 95 16:16:57 MDT
Cc: questions@freebsd.org
In-Reply-To: <9508031515.AA20210@halloran-eldar.lcs.mit.edu> from "Garrett Wollman" at Aug 3, 95 11:15:07 am
X-Mailer: ELM [version 2.4dev PL52]
Sender: questions-owner@freebsd.org
Precedence: bulk

> >> > So i can add frogfs to the kernel and mount a frogfs file system without
> >> > having to recompile my kernel or modify system header files.
> >> 
> >> Right, you can indeed do that now.
> 
> > No, I can't.  The vfssw and vfsconf arrays in kern/vfs_init.c are
> > limited to MOUNT_MAXTYPE + 1 entries, and this + 1 entry is there
> > soley to tag the end of the list for vfsinit() in the same file.
> 
> #undef MOUNT_MAXTYPE
> #define MOUNT_MAXTYPE however_big_you_want

This is possible; the default should probably be larger in mount.h
to avoid needing a kernel recompilation, if this is the way the table
is going to be maintained.

What is the change of changing the table to a linked list?  The only
impact would be on file system type lookup, which is only done on
mount, load, and unload.  All of which are allowed to be marginally
slower than the minimum number of nanoseconds possible.

The main magic here would be to use a linker set for the whole kernel
to provide a list of subsystem meta-initialization routines to be
called by the kernel, the entry points of which w ould add the entries
in LKM style on a per file system basis, establishing the initial
linked list.

This would get rid of the majority of static initialization code in
the vfs_init.c modules, as well as some of the other messy code
out there.  In fact, it would allow us to support/unsupport things
like bus specific code without #ifdef's.

> I think this is a remnant of a previous attempt on my part to only
> give as an estimate the actual size of the entries that are present.
> Note that slot zero is reserved; there's probably no point in copying
> out the empty slot zero entry.

Yeah; I didn't consider this a big issue; I'd really like to kill off
the slotted table entirey.

> > Even if you aren't, it bogusly causes the the mount command for the
> > file system type used to provide bogus arguments to the stolen slot,
> 
> Um, excuse me?  Where did you get this idea?  The filesystem is loaded
> by the filesystem-specific mount command; it has no way of providing
> any other sort of arguments.

For the FFS and other currently supported file system types, at least,
it provides an index of MOUNT_UFS in the VFS_SET argument, which really
dictates the slot in which the file system must be loaded.

Increasing MOUNT_MAXTYPE actually fixes the fact that in the current
code, a dynamically loaded FS would overwrite first the devfs slot,
then the union, then the cd9660, etc.  Unless it also specified one of
the slots using MOUNT_ manifest constants.

As it is, the devfs can only successfully load in the devfs slot, etc..

That's what I meant.

> > I won't even get into what happens with the pointer dereference
> > on multiple type-specific vnodeops entries, except to say that
> > only the first is handled correctly, the way that the vfs_opv_init()
> > function is being called in the load.
> 
> The way that the vfs_opv_init() function is being called in the load
> is functionally equivalent to the way it was called in the bad old
> days when all filesystems were statically linked.

Actually, I loaded FFS as a loadable file system.  It required setting
up the ffs_vnodeop_opv_desc, ffs_specop_opv_desc, and ffs_fifoop_opv_desc
and it seemed like the only one of these getting a vector allocated
in vfs_init.c vfs_opv_init() when called from the kern_lkm load code
was the first entry.

When I turned on the DODEBUG messages, there was one "vector at ..."
message per static entry loaded, but only one for all entries for
the loaded module.

If this is actually correct behaviour (?) then I'm wrong about where
the problem is... that would put the error in the vfsinit() code
instead.

> > So if I have a limited implementation, am not using a file system
> > type near the end of the table and don't plan on using a file
> > system type between the end of the table and the point I want
> > allocated, then it *is* possible to do this in a kinda half-assed
> > way.
> 
> Take your obnoxious ad-hominem arguments elsewhere, please,
> Mr. Know-It-All.  They are not welcome here.

This wasn't an ad-hominim argument; I didn't try to assign blame.  I
have to agree in retrospect, that it probably was obnoxious, however.
I appologize for that.  But without your MOUNT_MAXTYPE increase, we
are still talking a full kernel rebuild to load a new file system type
without causing a conflict with a defined type, which is not a happy
prospect.  Even with it, the problems are not fully fixed.

> > Of course, the code is the only documentation for this information.
> 
> Which is no surprise, since it was never documented anywhere.

I agree.  I was commenting on it being non-intuitive where the "gotchas"
were more than anything else.

> > slot when the time cane to use it, nor leave potentially confusing
> > mount_XXX mount commands able to make the mount() call but unable to
> > actually perform the mout for the FS in question.
> 
> Which again is simply and utterly false, as five seconds of code
> inspection would have told you if you came down from your
> high-and-mighty seat and actually looked at it.

If I load three file systems in with an index of -1, I *can* demand
load the cd9660 file system.  However, in so doing, I will blow
over top of the thrid new file system.  Here is the code line from
the file isofs/cd9660/cd9660_vfsops.c

VFS_SET(cd9660_vfsops, cd9660, MOUNT_CD9660, VFCF_READONLY);

And the code from kern/kern_lkm.c

                i = args->lkm_offset = vfc->vfc_index;
[ i = MOUNT_CD9660]
                if (i < 0) {
                        for (i = MOUNT_MAXTYPE - 1; i >= 0; i--) {
                                if(vfsconf[i] == &void_vfsconf)
                                        break;
                        }
                }
                if (i < 0) {
                        return EINVAL;
                }
[ i = MOUNT_CD9660]
                args->lkm_offset = vfc->vfc_index = i;

                vfsconf[i] = vfc;
[ data overwrite ]
                vfssw[i] = vfc->vfc_vfsops;
[ data overwrite ]


Subsequent mount attempts on the third new file system will get the
CD9660 code instead.


This isn't an attempt to trash anyone, this is simply pointing out
that a problem exists, and attempting to find a generally acceptable
fix for it.

Note that upping the MOUNT_MAXTYPE will just mean loading more
file systems before the problem occurs.

The fact that I loaded three was simply to demonstrate a collision
with a configured file system type (CD9660).  If I had picked the
devfs instead, that would only require that I load one.


I think you could argue for forcing an index of -1 for file systems
compiled as loaded modules.  If this was done in the mount.h code,
then you'd lose some of the feature set intended to be in loadable
modules (namely explicit replacement), but you'd resolve the conflict
with an EINVAL error when the static VFS tables were full instead
of causing a potential panic sitation.  Combined with the MOUNT_MAXTYPE
increase that you suggested, this would work around the issue, with
the sole exception of the use of manifest constants for the FS specific
mount commands system calls.


If you don't object, I'd like to look at setting up per module meta
initialization linker sets instead, and convert to a linked list by
putting the entry itself on the list (using a reserved field in the
per file system structure itself for the link vector).


					Regards,
					Terry Lambert
					terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.