From owner-freebsd-chat Wed Mar 31 14:24:58 1999
Delivered-To: freebsd-chat@freebsd.org
Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132])
    by hub.freebsd.org (Postfix) with ESMTP id 19C3C14C26
    for ; Wed, 31 Mar 1999 14:24:56 -0800 (PST)
    (envelope-from tlambert@usr07.primenet.com)
Received: (from daemon@localhost)
    by smtp02.primenet.com (8.8.8/8.8.8) id PAA11262;
    Wed, 31 Mar 1999 15:24:37 -0700 (MST)
Received: from usr07.primenet.com(206.165.6.207)
    via SMTP by smtp02.primenet.com, id smtpd011239;
    Wed Mar 31 15:24:32 1999
Received: (from tlambert@localhost)
    by usr07.primenet.com (8.8.5/8.8.5) id PAA24710;
    Wed, 31 Mar 1999 15:24:31 -0700 (MST)
From: Terry Lambert
Message-Id: <199903312224.PAA24710@usr07.primenet.com>
Subject: Re: Linux vs. FreeBSD: The Storage Wars
To: dg@root.com
Date: Wed, 31 Mar 1999 22:24:30 +0000 (GMT)
Cc: tlambert@primenet.com, unknown@riverstyx.net, dyson@iquest.net,
    freebsd-chat@FreeBSD.ORG
In-Reply-To: <199903302303.PAA13468@implode.root.com>
    from "David Greenman" at Mar 30, 99 03:03:32 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-chat@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> I really shouldn't get into this, but a couple of points:
>
> 1) According to people who should know, John H. did the implementation and
>    integration of the stackable filesystems support in 4.4BSD himself, so if
>    you have a complaint about the way it was done, then blame him and not
>    anyone else.

The problems were not the stacking layer architecture itself, but the
variant nature of the vnode object outside the stacking architecture,
and the breakage that resulted from the change in VM model and lock
management. There is also the NFS cookie stuff in VOP_LOOKUP, which I
believe is traceable to a NetBSD design.

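For readers who have not worked with the stacking model under
discussion, the following is a toy sketch in Python, not kernel code.
It shows only the general idea: each layer holds a reference to the
layer below it and either handles an operation itself or passes it
down unchanged. Every class and method name here is hypothetical and
chosen for illustration; none of it reflects the actual 4.4BSD vnode
interfaces.

```python
class BaseLayer:
    """Bottom of the stack: keeps file contents in a dict (stand-in for a real FS)."""

    def __init__(self):
        self.files = {}

    def op(self, name, *args):
        # Dispatch by operation name, loosely like indexing an ops vector.
        return getattr(self, "op_" + name)(*args)

    def op_write(self, path, data):
        self.files[path] = data
        return len(data)

    def op_read(self, path):
        return self.files[path]


class PassThroughLayer:
    """A layer that implements nothing itself and forwards every operation down."""

    def __init__(self, lower):
        self.lower = lower

    def op(self, name, *args):
        handler = getattr(self, "op_" + name, None)
        if handler is not None:
            return handler(*args)
        # Not handled at this layer: bypass to the layer below.
        return self.lower.op(name, *args)


class UpperCaseLayer(PassThroughLayer):
    """Overrides read only; write falls through to the layer below."""

    def op_read(self, path):
        return self.lower.op("read", path).upper()
```

Stacking UpperCaseLayer on PassThroughLayer on BaseLayer, a write
falls through both upper layers to the base, while a read is
transformed on the way back up. The point of the model is that a
layer only needs to know about the operations it changes.
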
If there are two problems with the model itself, as integrated, they
are the freeing of the pathname buffer in VOP_ABORTOP ("callee frees")
and the non-veto nature of certain non-atomic but idempotent
operations. These are both *trivial* to fix, and they can be blamed
more on the model of hanging lists (e.g. advisory locks) off of FS
objects instead of vnodes... a consequence of the environment into
which the code was being integrated, not the code itself.

One might make an argument for the inability to inherit POSIX
namespace escapes down across pathname component lookups; however, I
view this as an artifact of the pathname lookup code that makes the
VOP calls, and thus exogenous damage.

> 2) If (1) "works" in BSD/OS (and after hearing what Mike K. has to say, I'm
>    certainly not convinced of this), it's only because they spent a lot of
>    time making it work.

The BSD4.4-Lite2 release aggravated the situation by elevating certain
problems. Chief among these are the lockmgr changes, which centralized
code, but failed to push the calling interfaces for the code up.
Another failure of this code was the reliance on existing instances
for the initialization cases. Finally, we can blame this code for the
mount model changes being, at best, benign reorganization, and at
worst, detrimental to the possibility of a future where device
existence is more dynamic than should require a human to intervene in
the mount status of a volume.

> 3) There were/are a lot of architectural problems in the LFS code. That it
>    takes 1MB of RAM per mounted filesystem is one of them. Its amusing
>    buffer management mechanisms are another. Margo knows this as well as
>    anyone. LFS was never production quality; it was written as a proof of
>    concept that worked well enough to get some benchmark numbers from and
>    that's about it. The benchmark numbers weren't that great, so there wasn't
>    sufficient interest to put in that last 10% that takes 90% of the time.

I know this.

The problem I have is that the LFS code worked as a proof of concept,
and now, due to maintenance failures on the part of the responsible
persons making tangential changes, the code no longer functions as a
proof of concept. I personally despise the term "bit rot"; code does
not mutate, it is only orphaned through improper maintenance. I am
fully willing to admit that this might be a character flaw.

It's very obvious that FreeBSD is currently poorly able to support a
corporate LDAP directory (for example) in a reliable and
fault-tolerant way, without resorting to custom hardware solutions.

> 4) The use of the spare time field in FFS for sub-second time keeping is
>    consistent with what BSD/OS (and apparently Solaris and others) have done.
>    Kirk's of the opinion that we'll have to move to larger inodes anyway
>    due to the limitations of [32bit] block pointers, so using the spare
>    field for sub-second time keeping, rather than Y2038, isn't an issue in
>    his opinion.

The 32 bit block issue is resolvable, although unprettily, with the
addition of another indirect block type and a flag bit.

The sub-second time keeping is only really meaningful to programs
which now depend upon it. This limits the utility to the mtime alone,
and there is sufficient spare space that this could have been kept at
sub-32 bit precision without taking the reserved fields.

If it's true that this was a benign change rather than a
short-sighted one, then this raises the question of new FS design.
Unfortunately, the brokenness of the existing stacking does not lend
itself to resolving this problem, and it appears likely that this
issue will remain unaddressed so long as those capable of addressing
it don't/can't do so.

> 5) Kirk is ready to see your generalized "soft updates", so get busy.

They are on my list. For them to be verified to work, FS stacking must
first work.

I may also need an indemnification against claims of derivation due to
my position at Whistle, my familiarity with Kirk's code, and the
commercial license under which it is distributed. It may well be that
I have to wait the two years based on the license.

To give perspective to this, I have 145 critical technologies on my
list. Just one of these is taking me three Internet Drafts to address
adequately (so far), and a possible reformation of the DNSIND working
group.

> 6) Regarding IPv6: Time has proven that we made the right decision by
>    waiting.

Agreed.

>    It was sufficient motivation to get the various camps to merge their
>    efforts. The merged IPv6 will be brought into FreeBSD as soon as it is
>    ready.

The issue is one of migration strategy. There are other areas of
research which remain woefully unexplored.

Unfortunately, I don't have as much time as it would take to explore
everything which needs to be explored; I should probably "sell out"
for a few years to put myself in Matt's position where he can dedicate
16 hours a day to the problems he sees as most important.

Several IPv4/IPv6 migration related issues are obvious, however, so
there is no need for a detailed defense of their existence, merely
their enumeration:

o   Link management based on the credentials of the entity
    creating the link demand for transiently connected (NOT
    mobile) systems.

o   Binding of sockets to interfaces instead of addresses, so
    that daemons don't need to be reconfigured when network
    configuration is changed.

o   Trust of interfaces based on physical topology (Obtuse
    Systems addresses this issue, somewhat, though not very
    publicly).

o   Connection to service rather than to server. This is an
    important one, as it impinges on server anonymity.

I view these areas as "best performed in the context of an IPv6 and
with a knowledge of the ``blessed'' IPv4-to-IPv6 migration strategy".

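The second item in the list (binding to interfaces rather than
addresses) can be made concrete with the sockets API as it exists
today. What follows is a minimal sketch in Python, assuming a loopback
address of 127.0.0.1 and kernel-assigned ports; the function names are
hypothetical. A listener bound to one address is tied to that address
and has to be reconfigured if the host is renumbered, while a wildcard
bind survives renumbering but gives up any say over where connections
arrive. Neither is the per-interface binding described above; the
sketch only shows the two choices a daemon has without it.

```python
import socket


def listen_on_address(addr, port=0):
    """Bind a TCP listener to one specific local address."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Raises OSError if addr is not configured on this host; port 0
    # asks the kernel to pick a free port.
    s.bind((addr, port))
    s.listen(1)
    return s


def listen_on_wildcard(port=0):
    """Bind to the wildcard address (INADDR_ANY): accept on every local address."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("", port))
    s.listen(1)
    return s
```

Calling getsockname() on each listener shows the difference: the first
reports the configured address, the second reports the wildcard.
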
Tying these areas to IPv6 may seem silly; but to put it in context, it
was the large scale distribution of BSD4.4 derivatives with TTCP/IP
and RFC 1323 which drove routers to support option negotiation. It is
the wide scale distribution of a standardized research platform (not
merely patches) that will enable these research areas.

                    Terry Lambert
                    terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.