Date:      Wed, 31 Mar 1999 22:24:30 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        dg@root.com
Cc:        tlambert@primenet.com, unknown@riverstyx.net, dyson@iquest.net, freebsd-chat@FreeBSD.ORG
Subject:   Re: Linux vs. FreeBSD: The Storage Wars
Message-ID:  <199903312224.PAA24710@usr07.primenet.com>
In-Reply-To: <199903302303.PAA13468@implode.root.com> from "David Greenman" at Mar 30, 99 03:03:32 pm

>    I really shouldn't get into this, but a couple of points:
> 
> 1) According to people who should know, John H. did the implementation and
>    integration of the stackable filesystems support in 4.4BSD himself, so if
>    you have a complaint about the way it was done, then blame him and not
>    anyone else.

The problems were not the stacking layer architecture itself, but the
variant nature of the vnode object outside the stacking architecture,
and the breakage that resulted from the change in VM model and lock
management.  There is also the NFS cookie stuff in VOP_LOOKUP, which
I believe is traceable to a NetBSD design.

If there are two problems with the model itself, as integrated, they
are the freeing of the pathname buffer in VOP_ABORTOP ("callee frees")
and the non-veto nature of certain non-atomic but idempotent operations.
These are both *trivial* to fix, and they can be blamed more on the
model of hanging lists (e.g. advisory locks) off of FS objects instead
of vnodes... a consequence of the environment into which the code was
being integrated, not the code itself.
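To make the ownership issue concrete, here is a toy model in C of "callee frees" versus "caller frees" for a pathname buffer. All names (`struct name_buf`, `lower_abort_*`, `upper_after_abort`) are hypothetical, and this is not the 4.4BSD VOP interface; it is only a sketch of one reading of the complaint: a stacking layer that wants to offer a name to another layer after an abort cannot do so if the lower layer has already freed it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pathname-buffer holder; not the 4.4BSD componentname. */
struct name_buf {
    char *path;   /* heap-allocated pathname, NULL once released */
    int   frees;  /* number of times the buffer has been released */
};

/* Release the buffer; the assert catches a double free. */
static void name_release(struct name_buf *nb)
{
    assert(nb->path != NULL);
    free(nb->path);
    nb->path = NULL;
    nb->frees++;
}

/* Lower layer that aborts, "callee frees" convention: the buffer is
 * released on the way out, regardless of who allocated it. */
static int lower_abort_callee_frees(struct name_buf *nb)
{
    name_release(nb);
    return -1;
}

/* Lower layer that aborts, "caller frees" convention: the buffer is
 * left untouched and remains owned by the caller. */
static int lower_abort_caller_frees(struct name_buf *nb)
{
    (void)nb;
    return -1;
}

/* Upper (stacking) layer: forwards to a lower layer and, after that
 * layer aborts, checks whether the name is still usable (e.g. to offer
 * it to another lower layer, as a union-style layer might).  Returns 1
 * if the name survived the abort, 0 otherwise.  The upper layer frees
 * the buffer itself only when it still owns it. */
static int upper_after_abort(struct name_buf *nb,
                             int (*lower)(struct name_buf *))
{
    int usable;

    (void)lower(nb);
    usable = (nb->path != NULL);
    if (usable)
        name_release(nb);
    return usable;
}
```

With `lower_abort_caller_frees` the upper layer still holds a usable name and releases it exactly once; with `lower_abort_callee_frees` the name is gone before the upper layer regains control.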

One might make an argument for the inability to inherit POSIX namespace
escapes down across pathname component lookups; however, I view this
as an artifact of the pathname lookup code that makes the VOP calls,
and thus exogenous damage.


> 2) If (1) "works" in BSD/OS (and after hearing what Mike K. has to say, I'm
>    certainly not convinced of this), it's only because they spent a lot of
>    time making it work.

The BSD4.4-Lite2 release aggravated the situation by elevating certain
problems.  Chief among these are the lockmgr changes, which centralized
code, but failed to push the calling interfaces for the code up.  Another
failure of this code was the reliance on existing instances for the
initialization cases.  Finally, we can blame this code for the mount
model changes being, at best, benign reorganization and, at worst,
detrimental to the possibility of a future where device existence is
dynamic enough that a human should not have to intervene in the mount
status of a volume.


> 3) There were/are a lot of architectural problems in the LFS code. That it
>    takes 1MB of RAM per mounted filesystem is one of them. Its amusing
>    buffer management mechanisms are another. Margo knows this as well as
>    anyone. LFS was never production quality; it was written as a proof of
>    concept that worked well enough to get some benchmark numbers from and
>    that's about it. The benchmark numbers weren't that great, so there wasn't
>    sufficient interest to put in that last 10% that takes 90% of the time.

I know this.  The problem I have is that the LFS code worked as a
proof of concept, and now, due to maintenance failures on the part of
the responsible persons making tangential changes, the code no longer
functions as a proof of concept.  I personally despise the term "bit rot";
code does not mutate, it is only orphaned through improper maintenance.  I
am fully willing to admit that this might be a character flaw.

It's very obvious that FreeBSD is currently poorly able to support a
corporate LDAP directory (for example) in a reliable and fault-tolerant
way, without resorting to custom hardware solutions.


> 4) The use of the spare time field in FFS for sub-second time keeping is
>    consistent with what BSD/OS (and apparently Solaris and others) have done.
>    Kirk's of the opinion that we'll have to move to larger inodes anyway
>    due to the limitations of [32bit] block pointers, so using the spare
>    field for sub-second time keeping, rather than Y2038, isn't an issue in
>    his opinion.

The 32-bit block issue is resolvable, if not prettily, with the
addition of another indirect block type and a flag bit.
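A back-of-the-envelope sketch of the limits involved, assuming unsigned 32-bit block numbers, a fixed block size, and the FFS inode layout of 12 direct pointers plus single, double, and triple indirect blocks (fragments and signed disk addresses are ignored). The `levels` parameter shows what one more indirect level would add; the flag-bit scheme itself is not modeled, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Entries in one indirect block for a given pointer width (4 or 8 bytes). */
static uint64_t ptrs_per_indirect(uint64_t block_size, unsigned ptr_bytes)
{
    assert(ptr_bytes == 4 || ptr_bytes == 8);
    return block_size / ptr_bytes;
}

/* Upper bound on bytes addressable when block numbers are unsigned
 * 32-bit values: 2^32 blocks of block_size bytes each. */
static uint64_t max_bytes_32bit_blocks(uint64_t block_size)
{
    assert(block_size <= (UINT64_C(1) << 31));  /* keep the product in 64 bits */
    return (UINT64_C(1) << 32) * block_size;
}

/* Blocks reachable from one inode with `ndirect` direct pointers and
 * `levels` indirect levels (FFS: 12 direct and 3 levels, i.e. single,
 * double, triple), each indirect block holding `nindir` pointers. */
static uint64_t max_file_blocks(unsigned ndirect, uint64_t nindir,
                                unsigned levels)
{
    uint64_t total = ndirect;
    uint64_t span = 1;   /* blocks reachable via the pointer at this level */

    for (unsigned i = 0; i < levels; i++) {
        span *= nindir;  /* each extra level multiplies the fan-out */
        total += span;
    }
    return total;
}
```

With 8 KB blocks an indirect block holds 2048 32-bit pointers, so the triple-indirect tree alone spans more than 2^32 blocks; in that configuration the 32-bit block number, not the tree depth, is the binding limit (2^45 bytes, i.e. 32 TiB).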

The sub-second time keeping is only really meaningful to programs which
now depend upon it.  This limits the utility to the mtime alone, and
there is sufficient spare space that this could have been kept at
sub-32 bit precision without taking the reserved fields.
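The arithmetic behind the precision argument can be sketched as follows. A signed 32-bit seconds counter runs out at 2^31 - 1 seconds after the epoch (2038-01-19 03:14:07 UTC), and a nanosecond count below 10^9 needs only 30 bits, since 2^30 = 1073741824. The packing shown (30 bits of nanoseconds plus 2 spare bits in one 32-bit word) is a hypothetical layout for illustration, not the FFS or BSD/OS on-disk format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout for illustration only (not an FFS on-disk format):
 * one 32-bit word with nanoseconds in the low 30 bits and 2 spare bits
 * on top. */
#define NSEC_BITS 30u
#define NSEC_MASK ((UINT32_C(1) << NSEC_BITS) - 1u)   /* 0x3FFFFFFF */

/* Nonzero if a seconds-since-1970 value fits a signed 32-bit field;
 * the last such second is 2^31 - 1, i.e. 2038-01-19 03:14:07 UTC. */
static int fits_time32(int64_t secs)
{
    return secs >= INT32_MIN && secs <= INT32_MAX;
}

/* Smallest number of bits that can represent every value 0..max_value. */
static unsigned bits_needed(uint32_t max_value)
{
    unsigned bits = 0;

    while (bits < 32 && (UINT64_C(1) << bits) <= max_value)
        bits++;
    return bits;
}

static uint32_t pack_nsec(uint32_t nsec, uint32_t spare)
{
    assert(nsec < 1000000000u);   /* 999999999 < 2^30, so it fits */
    assert(spare < 4u);           /* only 2 bits are left over */
    return (spare << NSEC_BITS) | nsec;
}

static uint32_t unpack_nsec(uint32_t word)  { return word & NSEC_MASK; }
static uint32_t unpack_spare(uint32_t word) { return word >> NSEC_BITS; }
```

For example, `pack_nsec(999999999u, 1u)` yields a word from which `unpack_nsec` and `unpack_spare` recover 999999999 and 1; two leftover bits of this kind are the sort of room that could extend a seconds field without touching reserved inode fields.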

If it's true that this was a benign change rather than a short-sighted
one, then this raises the question of new FS design.  Unfortunately, the
brokenness of the existing stacking does not lend itself to resolving
this problem, and it appears likely that this issue will remain
unaddressed so long as those capable of addressing it don't/can't do so.


> 5) Kirk is ready to see your generalized "soft updates", so get busy.

They are on my list.  For them to be verified to work, FS stacking
must first work.  I may also need an indemnification against claims
of derivation due to my position at Whistle, my familiarity with Kirk's
code, and the commercial license under which it is distributed.  It
may well be that I have to wait the two years based on the license.

To give perspective to this, I have 145 critical technologies on my
list.  Just one of these is taking me three internet drafts to address
adequately (so far), and a possible reformation of the DNSIND working
group.


> 6) Regarding IPv6: Time has proven that we made the right decision by
>    waiting.

Agreed.

>    It was sufficient motivation to get the various camps to merge their
>    efforts. The merged IPv6 will be brought into FreeBSD as soon as it is
>    ready.

The issue is one of migration strategy.  There are other areas of
research that remain woefully unexplored.  Unfortunately, I don't
have as much time as it would take to explore everything that needs
to be explored; I should probably "sell out" for a few years to put
myself in Matt's position, where he can dedicate 16 hours a day to the
problems he sees as most important.  Several IPv4/IPv6 migration-related
issues are obvious, however, so there is no need for a detailed defense
of their existence, merely their enumeration:

o	Link management based on the credentials of the entity creating
	the link demand for transiently connected (NOT mobile) systems.

o	Binding of sockets to interfaces instead of addresses, so that
	daemons don't need to be reconfigured when network configuration
	is changed.

o	Trust of interfaces based on physical topology (Obtuse systems
	addresses this issue somewhat, though not very publicly).

o	Connection to service rather than to server.  This is an
	important one, as it impinges on server anonymity.

I view these areas as "best performed in the context of IPv6 and
with a knowledge of the ``blessed'' IPv4-to-IPv6 migration strategy".
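Short of changing the stack, the nearest user-space approximation of binding to an interface is to resolve the interface name to whatever address it currently carries at bind time, using getifaddrs(3) (available on FreeBSD and Linux). This is only a sketch under that assumption: it still binds to an address, it takes the first matching address only, and the helper names `if_current_addr` and `bind_to_interface` are hypothetical. Binding to the interface inside the stack, as proposed in the list, would survive renumbering without the daemon having to rebind.

```c
#define _DEFAULT_SOURCE   /* glibc: expose BSD interfaces; harmless elsewhere */
#include <sys/types.h>
#include <sys/socket.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <assert.h>
#include <string.h>

/* Copy the first address of `family` (AF_INET or AF_INET6) currently
 * configured on interface `ifname` into *out.  Returns 0 on success,
 * -1 if there is no such address or getifaddrs() fails. */
static int if_current_addr(const char *ifname, int family,
                           struct sockaddr_storage *out, socklen_t *outlen)
{
    struct ifaddrs *list, *ifa;
    int rc = -1;

    assert(family == AF_INET || family == AF_INET6);
    if (getifaddrs(&list) != 0)
        return -1;
    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != family)
            continue;
        if (strcmp(ifa->ifa_name, ifname) != 0)
            continue;
        *outlen = (family == AF_INET) ? sizeof(struct sockaddr_in)
                                      : sizeof(struct sockaddr_in6);
        memset(out, 0, sizeof(*out));
        memcpy(out, ifa->ifa_addr, *outlen);
        rc = 0;
        break;
    }
    freeifaddrs(list);
    return rc;
}

/* Bind `fd` to whatever address `ifname` has right now; `port` is in
 * host byte order.  A daemon would call this again after the interface
 * is reconfigured instead of hard-coding an address. */
static int bind_to_interface(int fd, const char *ifname, int family,
                             unsigned short port)
{
    struct sockaddr_storage ss;
    socklen_t len;

    if (if_current_addr(ifname, family, &ss, &len) != 0)
        return -1;
    if (family == AF_INET)
        ((struct sockaddr_in *)&ss)->sin_port = htons(port);
    else
        ((struct sockaddr_in6 *)&ss)->sin6_port = htons(port);
    return bind(fd, (struct sockaddr *)&ss, len);
}
```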

This may seem silly, but to put it in context, it was the large-scale
distribution of BSD4.4 derivatives with TTCP/IP and RFC 1323 that drove
routers to support option negotiation.  It is the wide-scale distribution
of a standardized research platform (not merely patches) that will
enable these research areas.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

