From owner-freebsd-chat Wed Mar 31 22:47:47 1999
Delivered-To: freebsd-chat@freebsd.org
Received: from iquest3.iquest.net (iquest3.iquest.net [209.43.20.203])
    by hub.freebsd.org (Postfix) with SMTP id 4FF62151AB
    for ; Wed, 31 Mar 1999 22:47:44 -0800 (PST)
    (envelope-from toor@dyson.iquest.net)
Received: (qmail 16961 invoked from network); 1 Apr 1999 06:47:22 -0000
Received: from dyson.iquest.net (198.70.144.127)
    by iquest3.iquest.net with SMTP; 1 Apr 1999 06:47:22 -0000
Received: (from root@localhost)
    by dyson.iquest.net (8.9.1/8.9.1) id BAA19102;
    Thu, 1 Apr 1999 01:47:20 -0500 (EST)
From: "John S. Dyson"
Message-Id: <199904010647.BAA19102@dyson.iquest.net>
Subject: Re: Linux vs. FreeBSD: The Storage Wars
In-Reply-To: <199903312141.OAA21836@usr07.primenet.com>
    from Terry Lambert at "Mar 31, 99 09:41:58 pm"
To: tlambert@primenet.com (Terry Lambert)
Date: Thu, 1 Apr 1999 01:47:20 -0500 (EST)
Cc: tlambert@primenet.com, dyson@iquest.net, hamellr@dsinw.com,
    unknown@riverstyx.net, freebsd-newbies@FreeBSD.ORG,
    freebsd-chat@FreeBSD.ORG
X-Mailer: ELM [version 2.4ME+ PL32 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-chat@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> > > Excuse me, ask Kirk. He designed the damn FFS with those reserved fields
> > > for a reason.
> >
> > Those reserved, unused fields? It makes little difference as to how the
> > problem is fixed, if the problem isn't fixed :-).
>
> Ugh. Now you're just trying to get my goat.
>

Not really.

>
> > > This is what stacking layers and namespace escapes were invented for.
> > > This is why John's students have been able to implement such VFS stacking
> > > layers (albeit, not in FreeBSD, where layer stacking is broken), but the
> > > architectural principles are surely not that difficult to grasp.
> >
> > The framework as it is, is super broken, and any fixes to date are
> > only expedient and insufficient.
>
> And or "too complicated to swallow in a small amount of time without
> fully understanding the problem".
>

I fully understand the problem, and know that the current structure is
not the right way to do it.

>
> > It requires total architecture
> > rewrite if you want reasonable efficiency (not throwing performance
> > away) and coherency. Half solutions need not apply -- if the framework
> > as it was conceived and implemented so far in *BSD was fully implemented,
> > there would either be intractable coherency problems, or probably
> > intolerable efficiency issues.
>
> Any putative efficiency penalties (granting their existence for the sake
> of discussion) would be paid only by the stacking layers themselves, and
> as it currently doesn't work, you aren't going to be paying an efficiency
> penalty for anything you currently use.
>
> So efficiency is a NULL argument.
>

It cannot be a NULL argument, because continually polishing the t*rd
isn't really solving the problem.

>
> IF VM alias objects are to be introduced (and that's a big mother "if",
> in my opinion), it should only be done *after* it is proven, using
> formal analysis methods, that unintentional aliases have been rendered
> impossible.
>

The current VM backing scheme is correct and needs only minor extension.
In fact, the VM backing is natural (e.g. copy on write), whilst the
current VFS layering doesn't handle the needed semantics for coherency
without lots of traversal of the layers. Bottom line, the VM backing
already DOES work, or nothing in the system would work.

>
> The only way I see clear for this to happen is if they don't both
> exist in the code at the same time.
>

Yep, get rid of the unintentional VFS layering bugs, by taking advantage
of the already needed VM layering for any kind of reasonable VM behavior.
That VM stuff is there anyway, so why muck it up with a parallel, and
semantically incorrect (or inefficient) structure?
The VM layering already has the needed mechanisms for handling shared
(and modified) memory "repositories." Constraining oneself to the
current VFS layering simply complicates the system with two different
kinds of layering schemes. Don't forget that sometimes generalization of
a problem simplifies it -- and the VFS scheme is TOO conventionally-file
oriented, and not very oriented towards data. The "file" abstraction is
too specific.

I admit that the VM schemes need to be better documented for those who
haven't read the MACH (and the new daemon book) information, but once
the underlying principles are understood, it is clear that files are a
paradigm that is too focused towards one kind of thinking. Such new
documentation would mostly be a repeat of already available materials
anyway.

As soon as a "file" is abstracted to "memory objects", then things
become easier. A memory object can reside anywhere, and have all kinds
of inheritance attributes, and interrelations. (A file can also, but the
scheme as presented in 4.4BSD VFS doesn't do so -- and to expand the
notion of file to what I call "memory objects" changes the current
layering code so severely as to make it better to almost start over.)

The Heidemann framework is a good document on the needed semantics from
a file standpoint, but addresses weakly the issues of the memory objects
(be they in memory, on disk, or across a network.) With correct
protocols, the "memory object" scheme actually does what the programmer
expects. The current VFS layering framework only very weakly handles the
issues of the "data containers" or "memory objects" themselves. The
non-bidirectional nature of the current layering also forgets the
forward movement of OS design. (Of course, if every I/O call or access
to memory traverses the entire chain, then the current framework might
work.) The memory oriented approaches eliminate (or at least handle) the
aliasing and local caching issues correctly.
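
(To make the backing-object idea concrete, here is a minimal sketch in
Python. It is a toy model under stated assumptions, not FreeBSD's
vm_object code: the MemoryObject class and its read/write methods are
names invented for this illustration. A read walks down a chain of
objects until some object holds the page, and a write lands only in the
top object, which is the copy-on-write behavior referred to above.)

```python
class MemoryObject:
    """Toy stand-in for a VM object (hypothetical, for illustration only)."""

    def __init__(self, backing=None):
        self.pages = {}         # page index -> bytes held privately by this object
        self.backing = backing  # next object down the chain, or None at the bottom

    def read(self, index):
        # Walk down the chain until some object holds the page.
        obj = self
        while obj is not None:
            if index in obj.pages:
                return obj.pages[index]
            obj = obj.backing
        return b""  # no object in the chain holds the page; treat it as empty

    def write(self, index, data):
        # Copy on write: the modified page is stored in this object only,
        # so objects further down the chain are left untouched.
        self.pages[index] = data


# One shared object at the bottom, with two private views layered on it.
base = MemoryObject()
base.write(0, b"original")
view_a = MemoryObject(backing=base)
view_b = MemoryObject(backing=base)

view_a.write(0, b"changed by A")  # private to view_a
base.write(1, b"shared update")   # visible through both views

print(view_a.read(0))  # b'changed by A'
print(view_b.read(0))  # b'original'
print(view_b.read(1))  # b'shared update'
```

(In this model both views see pages written to the shared bottom object
until they modify those pages privately, and nothing is converted or
copied at a layer boundary.)
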
The original 4.4/2 framework was so bad that even local mmapped objects
are only weakly coherent (actually not even that), let alone any other
caching in the pipeline. With the memory schemes, the problem solves
itself (with only minor consideration for the additional expected file
semantics.) It is only because of the proper implementation of VM
coherency that the current code works local to a given vnode. It is only
a small VM extension, and definition for use, to make an entire layered
scheme work. By reworking the entire VFS layering scheme (still looking
somewhat like the current implementation, but properly abstracted) the
entire solution (instead of a hack solution) can be made available.

Remember, both FILE and MEMORY data need to be presented to the user,
and FILE data is a narrow picture of memory. MEMORY can easily be made
more specific by presenting it as a file -- however expanding the
semantics of a file to memory is more complex (especially with sharing.)
When a conversion to MEMORY from FILE and back again has to be done at
every layer, then a scheme is going to be very inefficient or complex.
If the abstraction is kept as memory at each layer, then complexities
are lessened. Since each layer might have to present a memory image
(either as caching or mmap), then with a file representation, each layer
has to do the "hard" conversion (given the anachronistic file-only
abstraction.) There is NO cost in keeping the abstraction as memory as
long as possible in the chain. If a conversion is needed at machine
boundaries, it might be possible to avoid the file abstraction entirely,
and create a (MEMORY <--> SOCKET) protocol directly. (It might not be
needed to create and use a more complex (MEMORY <--> NFS <--> SOCKET)
thing.)

>
> > Why do you put words in my mouth about doubling inode size? Straw man...
>
> You are mentioning ACL's. The most current FS ACL work is being done
> in NetBSD (not FreeBSD).
> I thought you were referencing a modern
> research project when you referenced ACL's. My mistake.
>

Yep... Making assumptions about what I have been thinking shows that
arguments about such might be misguided.

>
> > The ODS will need rework before Y2038 anyway. I suspect that if the code
> > is working by 2010, things will be all well. A UFS2 would eventually be a
> > good thing.
>
> Fie. You are the one who originally posted about seeing years of work
> frittered away. I am not prepared to repeat that journey; it is a fool's
> quest.
>

Fallacious argument -- you aren't the author of the original code or
those changes, are you? The author of the code apparently accepted the
changes. (In fact, the changes were also compatible with other users and
developers on the codebase.)

>
> > The ODS will be changed in the future anyway. It makes little
> > difference as to where the data is. There is room in the inode, if
> > needed.
> >
> > You are making a mountain out of a molehill.
>
> I think you need to go byte counting in the inode structure looking
> for the room you claim is there. Without modifying the inode into
> incompatibility with existing FS's, it's just not there.
>

I suggest coming up with a solution then. Of course, I suggest that
UFS/ODS2 needs to be thought through. Taking micro pot-shots doesn't
really solve the problem (or the other problems that needed to be solved
in the shorter term.)

John

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-chat" in the body of the message