Date: Wed, 17 Jun 1998 20:20:23 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: michaelh@cet.co.jp (Michael Hancock)
Cc: Matthew.Alton@anheuser-busch.com, FreeBSD-fs@FreeBSD.ORG, Scott.Smallie@anheuser-busch.com, Hackers@FreeBSD.ORG
Subject: Re: Filesystem Development Toolkit
Message-ID: <199806172020.NAA26615@usr01.primenet.com>
In-Reply-To: <Pine.SV4.3.95.980617190738.4908A-100000@parkplace.cet.co.jp> from "Michael Hancock" at Jun 17, 98 08:09:48 pm
[ ... NTFS ... ]

> In FreeBSD a partially implemented framework exists, but it needs to be
> cleaned up.  There are 2 major problems with it now:
>
> 1) There are ref counting and locking layering violations in the code.
> I've cleaned these up for everything except vop_rename, vop_mknod, and
> vop_symlink so far.
>
> 2) There is no object coherence management done.  This is a more serious
> problem that is inherent in any stacked design where you want to cache
> objects, data, and attributes in different layers that represent a file.

I cannot emphasize enough how these two errors interact with an NTFS
implementation.

Linux has a read-only NTFS, and will not be able to implement a real
read-write NTFS without a major overhaul of its VFS/VOP interface.
FreeBSD also needs an overhaul, but its situation is somewhat less dire.

At one point I had a loaner PPC that I was doing a FreeBSD port on; the
port suffered when I had to look for paying work, and I returned the
machine.  The machine was running AIX, and I had the PPC booting (via
PPCBug, not OpenBoot) into a single user system.  I was able to mount and
access an AIX JFS, which has many of the same issues as NTFS or SGI's XFS
(or, to a lesser extent, Veritas's VXFS; lesser because VXFS's directory
management code was derived from SVR4 UFS).

The main issue here is that VOP_ABORTOP is not used correctly in
FreeBSD's VFS interface.  It is used to recover cn_pnbuf's allocated by
the caller (i.e., the interface is not reflexive: the buffer is allocated
on one side of the call and freed on the other).

For file systems which are, in effect, transaction interfaces, such as
journalling or log structured FS's, VOP_ABORTOP needs to be capable of
actually aborting a transaction that spans a number of VOP calls: for
example, VOP_LOOKUP calls that allocate a directory entry slot for later
use by a rename, mknod, or link operation, or that take slot locks for
use by unlink and rename.
Add to this that VOP_ABORTOP is not called in all the places it should
be, because operations that should merely tag the operation as complete
free the cn_pnbuf themselves instead.

VOP_ABORTOP needs to become VOP_TRANSACT, taking a "BEGIN", "COMMIT", or
"ABORT" argument.  "BEGIN" needs to return a transaction ID to the
caller, which is then provided as a context argument to the other VOP's.
In most cases, this transaction identifier will be an opaque reference to
a proc pointer; for NTFS, it needs to be a real transaction, of which the
proc pointer is a member.  In other words, this is about as FS specific
as TFS's vnodes.

> This will require taking a hard look at the top half of the kernel code
> where calls are made to things like vop_rdwr, mmap, vop_{put|get}pages and
> the implementation themselves to properly design it.  In a layered
> environment you need to make sure that operations are either done on the
> same vnode of a file or you need a cache_mgr to manage coherence between
> the cached objects hanging off of all the vnodes that represent the file.

Yes.  But one of the main benefits of FreeBSD over NetBSD is the
unification of the coherency model.

I really think that the code should be using macrotized VM calls, such
that coherency can be enforced automatically in a unified VM, and
manually in a non-unified or partially unified VM system (like NetBSD's
UVM).  The real issue here is portability of the FS code without
dependency on the host kernel implementation.

> One reason that I'd like to see a user-space layer implemented is that it
> would represent an extra requirement in the design of the solution to the
> problems in 2) above.  i.e. Instead of putting in VM calls here and there
> you would be forced to think of a cleaner solution, otherwise you will
> have to implement a lot of weird system calls to emulate those VM calls.

Right.  The coherency model for such an interface requires that the
interface be reflexive.
For a user space implementation, the FS can replace the VM macro
references with a wrapper that proxies pseudo VM objects into and out of
the kernel.

This is another major impetus for making the interfaces reflexive: if
you call something with a buffer you allocated, then *you* are expected
to perform the deallocation.  You can't proxy the deallocation of a
cn_pnbuf in user space if the thing was allocated in the kernel.

The one allowed exception is locks, which are objects that are "held".
They are in a different abstraction domain.  Locks are already abstract,
in that they are opaque and you must use operators against them.  You
can operate on an abstract interface using proxies all you want, and
never break anything.

Path name buffers and vnodes need to be similarly abstract.  For vnodes,
this means that a proxied interface must manage the data portion of the
vnodes itself, as an abstract object.  This is somewhat different from
the current FreeBSD model, though there are non-integrated patches (not
by me) that clean this up somewhat.  You would then need to macrotize
vnode allocations and releases from a common pool in the same way, to
allow the pool to be proxied into user space.

Right now, FreeBSD is not in a good position to support user space FS
development.  Some work is needed to make development of intermediate
stacking layers possible.

For bottom level device access, which is what a local media FS like NTFS
or FFS has to do, there are over 120 kernel interfaces being imported.
That's 120 interfaces to proxy.

Some of these are as simple as each and every FS calling the same kernel
interface using the address of a local object ("inode") to implement an
abstract data reference (vnode->inode->ref), instead of hanging the
reference off the vnode directly (vnode->ref).
Hanging the reference directly off the vnode would allow the kernel
interface to use the default VOP's (which break layer collapse, and are
therefore evil), or, better, to put the common calls in common code and
invert the call to make it veto-based.  This would preserve the ability
to collapse null VOPs in N layers to one layer of function calls, without
having to implement null layer stub functions for coherency.  (Stub
functions are currently what Tor Egge's patches to null FS do to work
around the interface problems; the nullfs isn't very "null" after that.)

Either approach, however, reduces the number of kernel interfaces an FS
must consume to be a full implementation, and therefore reduces the
number of interfaces which must be proxied to user space and back again.

> If you implemented a cache_mgr then you could reduce the number of system
> calls you would need to implement and use in your user-land emulation of
> the kernel APIs.

This is a proxy gateway as a single system call, with a large number of
proxies.  It would be much better to reduce the number of calls that
must be proxied, or to use some combination of both (clearly, there must
be a proxy gateway of some kind).

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.