Date:      Tue, 30 Mar 1999 22:07:43 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        toor@dyson.iquest.net (John S. Dyson)
Cc:        tlambert@primenet.com, unknown@riverstyx.net, dyson@iquest.net, freebsd-chat@FreeBSD.ORG
Subject:   Re: Linux vs. FreeBSD: The Storage Wars
Message-ID:  <199903302207.PAA05079@usr04.primenet.com>
In-Reply-To: <199903302028.PAA16589@dyson.iquest.net> from "John S. Dyson" at Mar 30, 99 03:28:23 pm

> > Why doesn't FreeBSD FS stacking work?
>
> It never did, and there hasn't been much demand.

Hey, speak for yourself.  I've gone so far as to approach John Heidemann
about rereleasing his code donated to the CSRG under the GPL for a
Linux implementation (yes, I'm deadly serious).

John's stuff worked before it was damaged into inoperability, and
it currently works fine on BSDI.


> It actually would be worthwhile to totally remove the stacking, or fix
> it with a VM approach.  It is totally wrong to use buffer/VP approach,
> but there are those who advocate it (too many people are "bp" heads --
> bp's are good only for I/O, not object or caching representation.)

It is wrong to think of vnodes as caching objects instead of backing
objects.

Yes, I know all of the unified VM and buffer cache centric arguments
in favor of this, but the point of having a well-defined framework
and API is the ability to share FS code with other OS's.  And not all
other OS's have unified VM and buffer caches.  Implementation of a
common API must take into account the lowest common denominator, or
you will be creating a FreeBSD-specific API that is not generally useful.


> FS stacking will not help to gain commercial work, but properly working
> reasonably sized file I/O does.

It is (or should be, since the announcement on these lists last week)
well known that Veritas is porting to Linux.

This code would work in FreeBSD as well, if the Linux and FreeBSD
VFS stacking frameworks were identical APIs.

Because the APIs are not identical (in fact, both are sufficiently
fluid and architecturally damaged as to render them nearly useless),
this work is bound to be another checkmark in the Linux column that
will remain absent from the FreeBSD column.
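To make "identical stacking APIs" concrete: in a Heidemann-style framework each layer exports an operations vector plus a generic bypass, so a layer implements only the operations it changes and forwards everything else to the layer below. The C below is a minimal, hypothetical sketch of that shape; `struct vnode`, `vop_call`, `layer_bypass`, and the three-entry operation table are invented simplifications, not the actual FreeBSD or Linux interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation IDs standing in for a real VOP descriptor table. */
enum vop_id { VOP_LOOKUP, VOP_READ, VOP_WRITE, VOP_NOPS };

struct vnode;
typedef int (*vop_fn)(struct vnode *vp, enum vop_id op, void *args);

struct vnode {
    const vop_fn *ops;     /* this layer's operations vector, indexed by vop_id */
    struct vnode *lower;   /* vnode of the layer below; NULL at the bottom */
    void *data;            /* layer-private state */
};

/* Dispatch an operation through whatever layer vp belongs to. */
int vop_call(struct vnode *vp, enum vop_id op, void *args)
{
    assert(vp != NULL && op < VOP_NOPS);
    return vp->ops[op](vp, op, args);
}

/* Generic bypass: pass the unmodified operation to the layer below.
 * A layer only supplies functions for the operations it changes. */
int layer_bypass(struct vnode *vp, enum vop_id op, void *args)
{
    assert(vp->lower != NULL);
    return vop_call(vp->lower, op, args);
}

/* Bottom layer: a stand-in "real" filesystem whose result encodes the op. */
static int base_op(struct vnode *vp, enum vop_id op, void *args)
{
    (void)vp;
    (void)args;
    return 100 + (int)op;
}

/* Middle layer: counts reads in its private data, then bypasses. */
static int count_read(struct vnode *vp, enum vop_id op, void *args)
{
    ++*(int *)vp->data;
    return layer_bypass(vp, op, args);
}

const vop_fn base_ops[VOP_NOPS] = { base_op, base_op, base_op };
const vop_fn count_ops[VOP_NOPS] = {
    [VOP_LOOKUP] = layer_bypass,
    [VOP_READ] = count_read,
    [VOP_WRITE] = layer_bypass,
};
/* Top layer: a pure null layer that bypasses everything. */
const vop_fn null_ops[VOP_NOPS] = { layer_bypass, layer_bypass, layer_bypass };
```

Because the counting layer supplies only `count_read` and bypasses the rest, it can be slid between any two layers that share this interface; a layer written against a different operations vector or calling convention could not be, which is the portability problem described above.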


> >  Why was X.25 broken, and then
> > not fixed?
>
> I don't know.  I guess that it is an orphan, where the implementation hasn't
> been commercially interesting (or the companies aren't contributing the work
> back.)  It certainly isn't interesting from a research or fun standpoint.
> Since companies like Whistle do networking, it would be nice to see a
> contribution in that area?  (I know that there is supposedly a lot of X.25
> stuff out there, but why hasn't it been supported?  Answer: apparently
> other types of X.25 interfacing methods are being used.)

The real answer is that it was broken when someone was permitted to
change interfaces upon which it depended, but was not thereafter held
accountable for keeping the "unsexy" code working.  To use your
terminology, "it was a cowboy what done it".

> >  Why was LFS broken, and then not fixed?
>
> It was always broken, and has always been basically a festering mess.

I think Margo Seltzer would take some issue with this.  I would trust
her authority as an FS expert above that of anyone in the core team;
after all, file systems are her life's work.


> LFS wasn't rewritten because softupdates has been the better answer
> for most of what LFS can do.

Soft updates as they are realized in FreeBSD are a tiny fraction of
what they could, and should, have been.  I have had discussions with
both Ganger and Patt via email, and discussions in person with Kirk
about the general solution for the problem.

The FreeBSD solution is far from general (or dependencies would be
capable of spanning stacking layers, and it would be possible to
build a transaction system into the kernel, accessible from user
space, that makes such implied data consistency guarantees as are
needed to support true database systems).

While Dr. McKusick has made some good points in favor of the less
general solution (including "that's not what Whistle paid him to do"),
his arguments about dependency representation are not among them.
There is no reason for the dependency representation to bloat up as
a result of generalizing the relationships.  The code that lacks
generality is not the dependency representation, nor the dependency
conflict resolution, but rather the conflict and dependent event
registration mechanisms.  Right now, the edges and the nodal
relationships are hard-coded in the structure of the code.  It
is entirely possible to replace this code with code that implements
resolver and event-of-interest registration at the time filesystems
are instanced.  Yes, this requires running Warshall's algorithm at
instance time to precalculate the relationships -- BUT THIS IS NOT
RUNTIME OVERHEAD, any more than the VOP descriptor arrays are
runtime overhead.  Clever use of Hamiltonians would allow
incremental calculation of Warshall's algorithm by precomputing everything
but leaf nodes.  Sedgewick discusses this algorithm in his book.
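The instance-time precomputation is small enough to show. The C below is a minimal sketch of the idea, not the FreeBSD soft updates code: the dependency kinds, the example ordering edges, and the `dep_register`/`dep_close`/`dep_must_precede` functions are all hypothetical. Direct edges are registered while a filesystem is instanced, Warshall's algorithm closes them into a reachability table once, and each runtime ordering check is then a single array lookup. It does not attempt the incremental, leaf-node-only refinement mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical dependency event kinds; the names are illustrative only. */
enum dep_kind {
    DEP_BLOCK_ALLOC,   /* data block allocated and initialized */
    DEP_INDIR_BLOCK,   /* indirect block points at the data block */
    DEP_INODE_INIT,    /* inode written with its block pointers */
    DEP_DIR_ENTRY,     /* directory entry names the inode */
    DEP_NKINDS
};

/* reach[a][b] is true when an event of kind a must reach disk before kind b. */
static bool reach[DEP_NKINDS][DEP_NKINDS];

/* Called while a filesystem is being instanced, once per direct edge. */
void dep_register(enum dep_kind before, enum dep_kind after)
{
    assert(before < DEP_NKINDS && after < DEP_NKINDS);
    reach[before][after] = true;
}

/* Warshall's algorithm: O(n^3) over the n event kinds, run once at
 * instance time.  Afterwards reach[][] holds the transitive closure. */
void dep_close(void)
{
    for (int k = 0; k < DEP_NKINDS; k++)
        for (int i = 0; i < DEP_NKINDS; i++)
            if (reach[i][k])
                for (int j = 0; j < DEP_NKINDS; j++)
                    if (reach[k][j])
                        reach[i][j] = true;
}

/* Runtime query: a single table lookup, no graph walk. */
bool dep_must_precede(enum dep_kind a, enum dep_kind b)
{
    assert(a < DEP_NKINDS && b < DEP_NKINDS);
    return reach[a][b];
}
```

For example, registering block-before-indirect, indirect-before-inode, and inode-before-directory-entry, then calling `dep_close()`, makes `dep_must_precede(DEP_BLOCK_ALLOC, DEP_DIR_ENTRY)` true without any traversal at write time.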

Even so, there is still a reason for journalling and logging.  If
nothing else, they allow for deterministic failure recovery, whereas
soft updates merely guarantee consistency, without any recourse
for software fault tolerance in the face of implied relationships
(e.g., the relationship between a "records" file and an "index" file
in a simple relational database).

These issues cannot be resolved until it is possible to acknowledge
a transaction as having completed ONLY AFTER SUFFICIENT INFORMATION
IS COMMITTED TO STABLE STORAGE, SUCH THAT IT MAY BE ROLLED FORWARD
AFTER A FAILURE.  This distinction is of paramount importance, and
cannot be over-emphasized.
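The capitalized rule is, in effect, the write-ahead logging rule. As a minimal POSIX sketch, assuming a hypothetical line-oriented log format (`R <redo>` records followed by a `C` commit marker, invented here for illustration), `txn_commit()` reports completion only after `fsync()` has returned for the commit marker, and `roll_forward()` counts only the transactions whose marker reached the log.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Write all of buf, retrying on short writes and EINTR. */
static bool write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return false;
        }
        buf += n;
        len -= (size_t)n;
    }
    return true;
}

/* Hypothetical log format: "R <redo>\n" then "C\n".  Returns true, meaning
 * the caller may acknowledge the transaction, only once the commit marker
 * is on stable storage. */
bool txn_commit(int log_fd, const char *redo)
{
    char rec[256];
    assert(redo != NULL);
    int len = snprintf(rec, sizeof rec, "R %s\n", redo);
    if (len < 0 || (size_t)len >= sizeof rec)
        return false;
    /* 1. Force the redo record out before the commit marker is written,
     *    so a durable marker always has durable data behind it. */
    if (!write_all(log_fd, rec, (size_t)len) || fsync(log_fd) != 0)
        return false;
    /* 2. Force the commit marker; from here on the transaction can be
     *    rolled forward after a crash. */
    if (!write_all(log_fd, "C\n", 2) || fsync(log_fd) != 0)
        return false;
    return true;
}

/* Crash recovery: count the transactions that can be rolled forward,
 * i.e. redo records followed by a commit marker.  A trailing redo record
 * without a marker was never acknowledged and is ignored. */
int roll_forward(FILE *log)
{
    char line[256];
    int committed = 0;
    bool pending = false;
    while (fgets(line, sizeof line, log) != NULL) {
        if (line[0] == 'R') {
            pending = true;
        } else if (line[0] == 'C' && pending) {
            /* A real system would re-apply the pending redo record here. */
            committed++;
            pending = false;
        }
    }
    return committed;
}
```

The redo record is synced before its commit marker is written, so a marker can never become durable ahead of the data it commits; the price is two synchronous flushes per transaction, which real logging systems usually amortize by batching commits.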


> It is totally wrong to implement a bp
> based LFS anyway, note the hacks in vfs_bio to support that travesty.

With respect, these are historical artifacts that also applied to
the FFS of the same code vintage, and which predate the unification
of the VM and buffer cache code.  This is a case of failure to cross
the T's and dot the I's during the VM and buffer cache unification,
wherein the equivalent FFS issues *were* addressed.  Code does not mutate.  If
code stops working, it is a failure in maintenance, not a failure of
the code (presuming it worked beforehand -- and LFS did; it merely
lacked a cleaner process to deal with issues like garbage collection
and fragmentation -- issues addressed in later versions of Margo's code).


> >  Why does the VM
> > system like to write password database pages back to the crontab, if
> > you stress the system by running newsyslog once a minute from a cron
> that modifies copy-on-write pages mmap'ping the password database into
> > code, as if the pointers in the pwent pointed back to static buffers
> > in the C library?
>
> Which version? and please PR it.  I have *never* seen it in person recently,
> and locally hacked kernels can cause unexpected brokenness.  The problem
> of modified programs has been fixed a long time ago.  Also, it has taken
> awhile to find someone competent to work on the VM/VFS code.  There
> is a possibility now, but most of the people with the "balls" to work
> on the code with commit frenzies, are often not careful enough to do so.

I believe Matt has much of this in hand.  But it is certainly not finding
its way back into 2.x-STABLE, per the development model.  Yes, I know
that -current is 4.x now, and 3.1-STABLE is the maintenance target, but
the fact remains that these problems were identified during the period
of time when the 2.x-STABLE branch was *supposedly* being actively
maintained.  I *personally* identified two of these problems, in great
gory detail, and their existence was "pooh-poohed" until 2.x was no
longer an active maintenance release.  I had to find explicit
demonstration cases for the people who didn't feel like bothering to
try to follow my theoretical arguments, and refused to work from
anything but concrete examples.


> Time for cowboys is LONG LONG gone, and it seems that cowboys are the most
> commonly available resource.

What do you expect, when you set up camp outside Dodge City?  Bankers?


> > In a more general sense: Why are most of the Usenix papers scheduled
> this year not about work done on FreeBSD, if FreeBSD is the premier
> > research OS?  Where is the research?
>
> A lot of work is done privately.  Research != papers, there is NO advantage
> for a FreeBSD team member to give away the mechanisms for FreeBSD's behavior.

Malarkey.  What do you care if the software running the ATM machine and
using the correct algorithm is FreeBSD, or some other software using
the correct algorithm?  The point of the exercise is to increase overall
correctness in the world.

What's the point of using a BSD license, if the intent is not to spread
the code as far and as wide as possible?  Cf. TCP/IP.

Obscurity hurts everyone.  The obscurity of the VM algorithms (not to
pick favorites, but the VM system is one place where complexity was
allowed to grow in FreeBSD unshackled by the "we must understand this
if you do" mentality) was, in fact, damaging to Matt's ability to
contribute.  It was not Matt's cowboy nature, but rather the inability
of a core team to impose a vetting process on someone who could spend
between 12 and 16 hours a day coding on nothing but FreeBSD.


> > Linux has shown a willingness to implement design that FreeBSD has
> > only given lip service to, time and again.  Linux is, unfortunately,
> > where research is taking place.
>
> The ones doing real work will continue
> to use BSD for now.  I don't consider the catchup game that you have alluded
> to as "research", but only catchup.  You are confusing "catchup" with
> research.  Do you see the difference?  (Linux's VM research is a very
> entertaining example: can you say lots of knobs that you need to tweak?)
> FreeBSD's VM has lots of knobs, but those knobs are only desirable for
> atypical configurations.

I see the difference.  However, the VM system is about the only place
that this can be inexpertly defended.  All other places, Linux is close
enough that you have to defend such issues with very hard facts.  But
compare either to SVR4 ES/MP, or Dynix, of 5 years ago, and both FreeBSD
and Linux have areas which are still *laughably* primitive, with no
apparent interest or desire to address them.  SMP is one such area; a
firm DDI/DKI is another.


> You can talk a good talk, but I would have adopted your work if it
> was worthwhile to do so (I wanted to, in fact.)  I didn't have the
> energy to maintain the mess that your changes would have caused.  It
> is better to deal with the mess one knows, rather than the mess
> that one doesn't :-).  The changes that did get adopted were good,
> but did require support.
> 
> (Sometimes your stuff was good, but much of the time, not complete
>  enough.)

"Better the devil you know" has never been a sound technological
argument.  I don't need to reach into my own arse for my examples
(though such examples abound); I can point at the networking stuff
that Garrett did, which was brilliant, but which was ripped out because
it was not completed in what someone arbitrarily decided was a
timely fashion.  There is code from Julian, PHK, and Bruce Evans
that falls into this same category, as do William's serial driver code
and Vadim Antonov's floppy tape driver design (from BSDI).  There are
literally thousands of such examples.


> > Julian's right; someone needs to do real architectural work.
>
> Time for Julian/Terry BSD.

You've been reading too much advocacy.  I have had sufficient
opportunity for such a thing in the past.  And I have resisted.  I
have resisted not only my own opportunity, but that of others, as
well.  Schism is not the answer, unless you have a social framework
ready to go in the post-schism universe.

I am frankly of the opinion now that much of "the FreeBSD problem"
is a macro effect of a micro rule, imposed by the tools available,
and, in fact, CVS in particular.  Many macro behaviours derive from
micro rules which prohibit individual behaviours which are, in fact,
available to the group.

Like the patchkit before it (something which, sociologically, I still
deeply regret), the use of CVS in the current system limits the
size, length, magnitude, and duration of branches which diverge from
the common vision (and common visions, themselves, are myopic by their
natures).


> I am not really interested in armchair
> quarterbacks unless they are willing and able to help solve the problems.

And likewise, for people willing and able to accept that help.  The
sword of Damocles is a two-edged blade, as were most gladii, and
that blade cuts both ways.


> One reason why my code hadn't made it into FreeBSD's tree when I left,
> was because of QC issues.  It takes restraint to keep from hacking the
> tree, and yet there is the need for architectural work.  But who?

Pick someone.  Someone with a vision in excess of six months.  Pick
Kirk McKusick, if he's willing, or David Greenman, if he can be freed
from the morass of crises and minutiae which has obviously been
dragging him away from the architect's drafting table.  Don't involve
the architect(s) (or allow them to involve themselves!) in the petty
day-to-day infighting.

But for God's sake, pick someone.


> There are precious few people available to do the FreeBSD architectural
> work (who are competent enough.)  I do not include myself in that group,
> but would support someone who is willing and able.  If my work would not
> be wasted, I would aggressively support such a developer (and continually
> do in the background.)

As would I.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

