Date:      Wed, 17 Jun 1998 20:20:23 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        michaelh@cet.co.jp (Michael Hancock)
Cc:        Matthew.Alton@anheuser-busch.com, FreeBSD-fs@FreeBSD.ORG, Scott.Smallie@anheuser-busch.com, Hackers@FreeBSD.ORG
Subject:   Re: Filesystem Development Toolkit
Message-ID:  <199806172020.NAA26615@usr01.primenet.com>
In-Reply-To: <Pine.SV4.3.95.980617190738.4908A-100000@parkplace.cet.co.jp> from "Michael Hancock" at Jun 17, 98 08:09:48 pm

[ ... NTFS ... ]

> In FreeBSD a partially implemented framework exists, but it needs to be
> cleaned up.  There are 2 major problems with it now:
> 
> 1) There are ref counting and locking layering violations in the code.
> I've cleaned these up for everything except vop_rename, vop_mknod, and
> vop_symlink so far.
> 
> 2) There is no object coherence management done.  This is a more serious
> problem that is inherent in any stacked design where you want to cache
> objects, data, and attributes in different layers that represent a file.

I cannot emphasize enough how badly these two errors interact with
an NTFS implementation.

Linux has a read-only NTFS, and will not be able to implement an actual
read-write NTFS without a major overhaul of their VFS/VOP interface.

FreeBSD also needs an overhaul, but it is somewhat less dire.  At one
point in time I had a loaner PPC that I was doing a FreeBSD port on;
this port suffered when I had to look for paying work and return
the machine.

The machine was running AIX, and I had the PPC booting (via PPCBug,
not OpenBoot) into a single user system.  I was able to mount and
access an AIX JFS, which has many of the same issues as NTFS, or
SGI's XFS (or, to a lesser extent, Veritas's VXFS -- lesser because
VXFS's directory management code was SVR4 UFS derived).

The main issue here is that VOP_ABORTOP is not used correctly in FreeBSD's
VFS interface.  It is used to recover cn_pnbuf's allocated by the caller
(ie: it is not reflexive).

For file systems which are, in effect, transaction interfaces, such
as journalling or log structured FS's, VOP_ABORTOP needs to actually
be capable of aborting a transaction that spans a number of VOP calls,
such as VOP_LOOKUP calls that result in the allocation of a directory
entry slot for later use by a rename, mknod, or link operation, or
slot locks for use by unlink and rename.

Add to this that VOP_ABORTOP is not called in the correct places,
because the cn_pnbuf is freed by the operations themselves when they
should instead tag the operation as complete.

VOP_ABORTOP needs to become VOP_TRANSACT, and needs to take a "BEGIN",
"COMMIT", or "ABORT".  The "BEGIN" needs to return a transaction ID to
the caller,
which is provided as a context argument to the other VOP's ... in most
cases, this transaction identifier will be an opaque reference to a
proc pointer; for NTFS, it needs to be a real transaction, where the
proc pointer is a member.  In other words, this is about as FS specific
as TFS's vnodes.
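
A minimal sketch of the BEGIN/COMMIT/ABORT shape described above, in
plain C outside the kernel.  Every name here (vop_transact, txn_id_t,
txn_reserve_slot, the fixed-size table) is hypothetical and is not an
existing FreeBSD interface; the "owner" pointer only stands in for the
proc pointer, and the slot counter stands in for directory entry slots
reserved by lookups.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: none of these names exist in FreeBSD. */
enum txn_op { TXN_BEGIN, TXN_COMMIT, TXN_ABORT };

typedef int txn_id_t;

#define TXN_MAX 8

struct txn {
    int   active;    /* nonzero between BEGIN and COMMIT/ABORT */
    void *owner;     /* stands in for the proc pointer */
    int   reserved;  /* directory slots reserved so far */
};

static struct txn txn_table[TXN_MAX];
static int committed_slots;  /* slots made permanent by COMMIT */

static struct txn *txn_lookup(txn_id_t id)
{
    if (id < 0 || id >= TXN_MAX || !txn_table[id].active)
        return NULL;
    return &txn_table[id];
}

/* BEGIN stores a new ID in *idp; COMMIT and ABORT consume *idp. */
int vop_transact(enum txn_op op, void *owner, txn_id_t *idp)
{
    struct txn *t;
    txn_id_t i;

    assert(idp != NULL);
    if (op == TXN_BEGIN) {
        for (i = 0; i < TXN_MAX; i++) {
            if (!txn_table[i].active) {
                txn_table[i].active = 1;
                txn_table[i].owner = owner;
                txn_table[i].reserved = 0;
                *idp = i;
                return 0;
            }
        }
        return -1;  /* table full */
    }
    if (op != TXN_COMMIT && op != TXN_ABORT)
        return -1;
    t = txn_lookup(*idp);
    if (t == NULL || t->owner != owner)
        return -1;  /* stale ID or wrong owner */
    if (op == TXN_COMMIT)
        committed_slots += t->reserved;
    /* On ABORT the reservations are simply dropped. */
    t->active = 0;
    return 0;
}

/* What a lookup might do when it reserves a slot for a later operation. */
int txn_reserve_slot(txn_id_t id)
{
    struct txn *t = txn_lookup(id);

    if (t == NULL)
        return -1;
    t->reserved++;
    return 0;
}

int txn_committed_slots(void)
{
    return committed_slots;
}
```

The point of the sketch is only that the ID returned by BEGIN is the
context for every later call, so an ABORT can undo work done by
several calls rather than just releasing a buffer.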


> This will require taking a hard look at the top half of the kernel code
> where calls are made to things like vop_rdwr, mmap, vop_{put|get}pages and
> the implementation themselves to properly design it.  In a layered
> environment you need to make sure that operations are either done on the
> same vnode of a file or you need a cache_mgr to manage coherence between
> the cached objects hanging off of all the vnodes that represent the file. 

Yes.  But one of the main benefits of FreeBSD over NetBSD is the unification
of the coherency model.

I really think that the code should be using macrotized VM calls, such
that coherency can be enforced automatically in a unified VM, and
manually in a non-unified or partially unified (like NetBSD's UVM)
VM system.
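
As a rough illustration of what "macrotized" could mean here, the
following user space C sketch hides the coherency step behind a macro.
The names (FS_UNIFIED_VM, FS_PAGE_SYNC, FS_PAGE_MAPPED, struct fs_page)
are invented for the example and are not real kernel interfaces; the
two char arrays only model "the cached copy" and "the mapped copy".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FS_PAGE_SIZE 16

/* Hypothetical model of one page as seen by the FS code. */
struct fs_page {
    char cache[FS_PAGE_SIZE];   /* buffer cache copy */
#ifndef FS_UNIFIED_VM
    char mapped[FS_PAGE_SIZE];  /* separate copy seen through mappings */
#endif
    int manual_syncs;           /* how often the FS had to sync by hand */
};

#ifdef FS_UNIFIED_VM
/* Unified VM: one copy, so coherency is automatic and sync is a no-op. */
#define FS_PAGE_MAPPED(pg) ((pg)->cache)
#define FS_PAGE_SYNC(pg)   ((void)0)
#else
/* Non-unified VM: two copies, so the macro performs the manual sync. */
#define FS_PAGE_MAPPED(pg) ((pg)->mapped)
#define FS_PAGE_SYNC(pg)                                      \
    do {                                                      \
        memcpy((pg)->mapped, (pg)->cache, FS_PAGE_SIZE);      \
        (pg)->manual_syncs++;                                 \
    } while (0)
#endif

/* FS code is written once and never mentions which VM model is in use. */
void fs_page_write(struct fs_page *pg, size_t off, const char *src,
    size_t len)
{
    assert(off <= FS_PAGE_SIZE && len <= FS_PAGE_SIZE - off);
    memcpy(pg->cache + off, src, len);
    FS_PAGE_SYNC(pg);
}
```

Built with or without -DFS_UNIFIED_VM, fs_page_write() is unchanged
and the mapped view sees the written data; only the macro bodies
differ.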

The real issue here is portability of the FS code without dependency on
the host kernel implementation.


> One reason that I'd like to see a user-space layer implemented is that it
> would represent an extra requirement in the design of the solution to the
> problems in 2) above.  i.e. Instead of putting in VM calls here and there
> you would be forced to think of a cleaner solution, otherwise you will
> have to implement a lot of weird system calls to emulate those VM calls.

Right.  The coherency model for such an interface requires that the
interface be reflexive.  For a user space implementation, the FS can
replace the VM macro references with a wrapper to proxy pseudo VM objects
into and out of the kernel.

This is another major impetus to making the interfaces reflexive,
such that if you call something with a buffer you allocate, then *you*
are expected to perform the deallocation.  You can't proxy a deallocation
of a cn_pnbuf in user space if you allocated the thing in the kernel.
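
A small sketch of the reflexive rule, in ordinary user space C.  The
names (struct pathbuf, pathbuf_alloc, fs_op_mknod, do_mknod) are
hypothetical; struct pathbuf only stands in for a cn_pnbuf, and the
"operation" is a trivial check so the example stays self-contained.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a path name buffer such as cn_pnbuf. */
struct pathbuf {
    char  *buf;
    size_t len;
};

int live_pathbufs;  /* number of buffers currently allocated */

int pathbuf_alloc(struct pathbuf *pb, const char *path)
{
    size_t n = strlen(path) + 1;

    pb->buf = malloc(n);
    if (pb->buf == NULL)
        return -1;
    memcpy(pb->buf, path, n);
    pb->len = n;
    live_pathbufs++;
    return 0;
}

void pathbuf_free(struct pathbuf *pb)
{
    if (pb->buf != NULL) {
        free(pb->buf);
        pb->buf = NULL;
        pb->len = 0;
        live_pathbufs--;
    }
}

/* Callee: borrows the buffer and never frees it, even on failure. */
int fs_op_mknod(const struct pathbuf *pb)
{
    return (pb->len > 1 && pb->buf[0] == '/') ? 0 : -1;
}

/* Caller: allocates, calls down, and frees on every path. */
int do_mknod(const char *path)
{
    struct pathbuf pb;
    int error;

    if (pathbuf_alloc(&pb, path) != 0)
        return -1;
    error = fs_op_mknod(&pb);
    pathbuf_free(&pb);  /* same layer that allocated it */
    return error;
}
```

Because fs_op_mknod() never owns the buffer, it could be replaced by a
proxy into another address space without anyone having to free memory
they did not allocate.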

The one exception allowed is locks, which are objects which are "held".
They are in a different abstraction domain.  Locks are already abstract
in that they are opaque, and you must use operators against them.  You
can operate on an abstract interface using proxies all you want, and
never break anything.  Path name buffers and vnodes need to be similarly
abstract.

For vnodes, this means that a proxied interface must manage the data
portion of the vnodes itself, as an abstract object.  This is somewhat
different than the current FreeBSD model, though there are non-integrated
patches (not by me) that clean this up somewhat.

You would then need to macrotize vnode allocations and releases from a
common pool the same way, to allow the pool to be proxied into user
space.


Right now, FreeBSD is not in a good position to support user space
FS development.  Some work is needed to make intermediate stacking
layer development possible.

For bottom level device access, which is what a local media FS like
NTFS or FFS has to do, there are over 120 kernel interfaces being
imported.  That's 120 interfaces to proxy.

Some of these are as simple as each and every FS calling the same kernel
interface using the address of a local object ("inode") to implement
an abstract data reference (vnode->inode->ref), instead of hanging
the reference off the vnode directly (vnode->ref).  A direct hanging
would allow the kernel interface to use the default VOP's (which break
layer collapse, and are therefore evil), or, better, put the common
calls in common code, and invert the call to make it veto-based.  This
would preserve the ability to collapse null VOPs in N layers to one
layer of function calls, without having to implement null layer stub
functions for coherency (this is currently what Tor Egge's patches
to null FS do to work around the interface problems -- the nullfs isn't
very "null" after that).
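
One way to picture the veto-based inversion, as a user space C sketch.
The names (struct vnode_s, struct layer, veto_ref, common_vref) are
made up for the example and do not correspond to the real vnode or VOP
structures.  The common code owns the reference count hanging directly
off the vnode; a layer only gets called if it registered a veto hook,
so a null layer costs nothing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical vnode with the reference hung directly off it. */
struct vnode_s {
    int ref;
};

typedef int (*veto_fn)(const struct vnode_s *vp);

/* One stacking layer; a NULL hook means the layer has nothing to say. */
struct layer {
    veto_fn veto_ref;
};

/*
 * Common code: ask each interested layer whether it objects, then do
 * the work once.  *calls counts how many layer functions were invoked.
 */
int common_vref(struct vnode_s *vp, const struct layer *layers,
    size_t nlayers, size_t *calls)
{
    size_t i;
    int error;

    for (i = 0; i < nlayers; i++) {
        if (layers[i].veto_ref == NULL)
            continue;  /* null layer collapses to no call at all */
        (*calls)++;
        error = layers[i].veto_ref(vp);
        if (error != 0)
            return error;  /* vetoed: the common work is skipped */
    }
    vp->ref++;
    return 0;
}

/* Example hook: a layer that refuses more than two references. */
int deny_when_busy(const struct vnode_s *vp)
{
    return (vp->ref >= 2) ? EBUSY : 0;
}
```

With a stack of three layers where only the middle one sets veto_ref,
each common_vref() call makes exactly one layer call, and no stub
functions are needed in the two null layers.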

Either approach, however, reduces the number of kernel interfaces an
FS must consume to be a full implementation, and therefore reduces the
number of interfaces which must be proxied to user space, and back again.



> If you implemented a cache_mgr then you could reduce the number of system
> calls you would need to implement and use in your user-land emulation of
> the kernel APIs.

This is a proxy gateway as a single system call, with a large number of
proxies.  It would be much better to reduce the number of calls that must
be proxied, or at least some combination of both (clearly, there must be
a proxy gateway of some kind).
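
For concreteness, a proxy gateway of the "single entry point, many
proxies" kind might look roughly like the following user space C
sketch.  All names (proxy_gateway, struct proxy_req, the two
operations, the four-entry pool) are hypothetical; a real gateway
would be a system call and would need one table entry per proxied
kernel interface, which is why the size of that table matters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation codes; one per proxied interface. */
enum proxy_op { PROXY_VNODE_ALLOC, PROXY_VNODE_RELEASE, PROXY_OP_COUNT };

struct proxy_req {
    int  op;      /* one of enum proxy_op */
    long arg;     /* input argument */
    long result;  /* output value */
};

#define POOL_SIZE 4
static int pool_in_use[POOL_SIZE];  /* toy stand-in for a vnode pool */

static int do_vnode_alloc(struct proxy_req *req)
{
    long i;

    for (i = 0; i < POOL_SIZE; i++) {
        if (!pool_in_use[i]) {
            pool_in_use[i] = 1;
            req->result = i;
            return 0;
        }
    }
    return -1;  /* pool exhausted */
}

static int do_vnode_release(struct proxy_req *req)
{
    if (req->arg < 0 || req->arg >= POOL_SIZE || !pool_in_use[req->arg])
        return -1;
    pool_in_use[req->arg] = 0;
    return 0;
}

typedef int (*proxy_handler)(struct proxy_req *);

static const proxy_handler proxy_table[PROXY_OP_COUNT] = {
    do_vnode_alloc,
    do_vnode_release,
};

/* The single gateway: validate the op code, then dispatch. */
int proxy_gateway(struct proxy_req *req)
{
    assert(req != NULL);
    if (req->op < 0 || req->op >= PROXY_OP_COUNT)
        return -1;
    return proxy_table[req->op](req);
}
```

Every interface the FS consumes becomes another row in proxy_table, so
reducing the number of consumed interfaces shrinks the gateway
directly.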


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message


