From owner-freebsd-chat  Wed Mar 31 22:47:47 1999
Delivered-To: freebsd-chat@freebsd.org
Received: from iquest3.iquest.net (iquest3.iquest.net [209.43.20.203])
	by hub.freebsd.org (Postfix) with SMTP id 4FF62151AB
	for <freebsd-chat@FreeBSD.ORG>; Wed, 31 Mar 1999 22:47:44 -0800 (PST)
	(envelope-from toor@dyson.iquest.net)
Received: (qmail 16961 invoked from network); 1 Apr 1999 06:47:22 -0000
Received: from dyson.iquest.net (198.70.144.127)
  by iquest3.iquest.net with SMTP; 1 Apr 1999 06:47:22 -0000
Received: (from root@localhost)
	by dyson.iquest.net (8.9.1/8.9.1) id BAA19102;
	Thu, 1 Apr 1999 01:47:20 -0500 (EST)
From: "John S. Dyson" <toor@dyson.iquest.net>
Message-Id: <199904010647.BAA19102@dyson.iquest.net>
Subject: Re: Linux vs. FreeBSD: The Storage Wars
In-Reply-To: <199903312141.OAA21836@usr07.primenet.com> from Terry Lambert at "Mar 31, 99 09:41:58 pm"
To: tlambert@primenet.com (Terry Lambert)
Date: Thu, 1 Apr 1999 01:47:20 -0500 (EST)
Cc: tlambert@primenet.com, dyson@iquest.net, hamellr@dsinw.com,
	unknown@riverstyx.net, freebsd-newbies@FreeBSD.ORG,
	freebsd-chat@FreeBSD.ORG
X-Mailer: ELM [version 2.4ME+ PL32 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-chat@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> 
> > > Excuse me, ask Kirk.  He designed the damn FFS with those reserved fields
> > > for a reason.
> >
> > Those reserved, unused fields?  It makes little difference as to how the
> > problem is fixed, if the problem isn't fixed :-).
> 
> Ugh.  Now you're just trying to get my goat.
>
Not really.

> > > 
> > > This is what stacking layers and namespace escapes were invented for.
> > > This is why John's students have been able to implement such VFS stacking
> > > layers (albeit, not in FreeBSD, where layer stacking is broken), but the
> > > architectural principles are surely not that difficult to grasp.
> >
> > The framework as it is, is super broken, and any fixes to date are
> > only expedient and insufficient.
> 
> And or "too complicated to swallow in a small amount of time without
> fully understanding the problem".
>
I fully understand the problem, and know that the current structure
is not the right way to do it.

> 
> > It requires total architecture
> > rewrite if you want reasonable efficiency (not throwing performance
> > away) and coherency.  Half solutions need not apply -- if the framework
> > as it was conceived and implemented so far in *BSD was fully implemented,
> > there would either be intractable coherency problems, or probably
> > intolerable efficiency issues.
> 
> Any putative efficiency penalties (granting their existance for the sake
> of discussion) would be paid only by the stacking layers themselves, and
> as it currently doesn't work, you aren't going to be paying an efficiency
> penalty for anything you currently use.
>
> 
> So efficiency is a NULL argument.
>
It cannot be a NULL argument, because continually polishing the t*rd isn't
really solving the problem.

> 
> IF VM alias objects are to be introduced (and that's a big mother "if",
> in my opinion), it should only be done *after* it is proven, using
> formal analysis methods, that unintentional aliases have been rendered
> impossible.
>
The current VM backing scheme is correct and needs only minor extension.
In fact, the VM backing is natural (e.g. copy on write), whilst the
current VFS layering doesn't handle the needed semantics for coherency
without lots of traversal of the layers.

Bottom line, the VM backing already DOES work, or nothing in the system
would work.
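
To make the copy-on-write point concrete, here is a minimal sketch of a
chain of backing objects.  The names (MemObject, read_page, write_page) are
made up for illustration and are not the FreeBSD vm_object code; the sketch
only shows how a read falls through to the backing object while a write
lands in a private copy.

```python
PAGE_SIZE = 4096


class MemObject:
    """Hypothetical memory object: private pages plus an optional backing object."""

    def __init__(self, backing=None):
        self.pages = {}          # page index -> bytearray, only pages this object owns
        self.backing = backing   # next object in the chain, or None

    def read_page(self, index):
        # Walk the backing chain until some object owns the page.
        obj = self
        while obj is not None:
            if index in obj.pages:
                return bytes(obj.pages[index])
            obj = obj.backing
        return bytes(PAGE_SIZE)  # never-written pages read as zeros

    def write_page(self, index, offset, data):
        # Copy on write: pull the page into this object before modifying it.
        if index not in self.pages:
            self.pages[index] = bytearray(self.read_page(index))
        self.pages[index][offset:offset + len(data)] = data


base = MemObject()
base.write_page(0, 0, b"original")
shadow = MemObject(backing=base)     # stands in for a private mapping of base
shadow.write_page(0, 0, b"modified")
```

After this runs, base still holds its own page and shadow holds a private
copy; pages that shadow never touched are still read through from base.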

> 
> The only way I see clear for this to happen is if they don't both
> exist in the code at the same time.
>
Yep, get rid of the unintentional VFS layering bugs, by taking advantage
of the already needed VM layering for any kind of reasonable VM behavior.
That VM stuff is there anyway, so why muck it up with a parallel, and
semantically incorrect (or inefficient) structure?  The VM layering already
has the needed mechanisms for handling shared (and modified) memory
"repositories."

Constraining oneself to the current VFS layering simply complicates
the system with two different kinds of layering schemes.  Don't forget
that sometimes generalizing a problem simplifies it -- and the
VFS scheme is TOO conventionally-file oriented, and not very oriented
towards data.

The "file" abstraction is too specific.  I admit that the VM schemes
need to be better documented for those who haven't read the MACH
(and the new daemon book) information, but once the underlying principles
are understood, it is clear that files are a paradigm that is too
focused on one kind of thinking.  Such new documentation would mostly
be a repeat of already available materials anyway.

As soon as a "file" is abstracted to "memory objects", then things become
easier.  A memory object can reside anywhere, and have all kinds of
inheritance attributes, and interrelations.  (A file can also, but the
scheme as presented in 4.4BSD VFS doesn't do so -- and expanding the
notion of file to what I call "memory objects" changes the current
layering code so severely that it is almost better to start over.)

The Heidemann framework is a good document on the needed semantics
from a file standpoint, but addresses weakly the issues of the memory
objects (be they in memory, on disk, or across a network.)  With correct
protocols, the "memory object" scheme actually does what the programmer
expects.  The current VFS layering framework only very weakly handles the
issues of the "data containers" or "memory objects" themselves.  The
non-bidirectional nature of the current layering also forgets the 
forward movement of OS design.  (Of course, if every I/O call or access
to memory traverses the entire chain, then the current framework might work.)
The memory oriented approaches eliminate (or at least handle) the aliasing
and local caching issues correctly.  The original 4.4/2 framework was so bad,
that even local mmaped objects are only weakly coherent (actually not even
that), let alone any other caching in the pipeline.  With the memory
schemes, the problem solves itself (with only minor consideration for the
additional expected file semantics.)  It is only because of the proper
implementation of VM coherency that the current code works locally for a
given vnode.  It takes only a small VM extension, and a definition of its
use, to make an entire layered scheme work.  By reworking the entire VFS
layering scheme (still
looking somewhat like the current implementation, but properly abstracted)
the entire solution (instead of a hack solution) can be made available.
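
The local caching issue above can be sketched as follows.  All of the names
(Store, CachingLayer, SharedLayer) are hypothetical and this is not any
existing VFS code; it only shows that two layers with private caches over
the same store can disagree, while layers that share one cache cannot.

```python
class Store:
    """Hypothetical bottom layer holding the real data."""

    def __init__(self):
        self.blocks = {}

    def read(self, key):
        return self.blocks.get(key)

    def write(self, key, value):
        self.blocks[key] = value


class CachingLayer:
    """Layer that keeps a private copy of whatever it reads or writes."""

    def __init__(self, lower):
        self.lower = lower
        self.cache = {}

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.lower.read(key)
        return self.cache[key]

    def write(self, key, value):
        self.cache[key] = value
        self.lower.write(key, value)


class SharedLayer(CachingLayer):
    """Same layer, but the cache is one object shared by every layer on the store."""

    def __init__(self, lower, shared_cache):
        super().__init__(lower)
        self.cache = shared_cache


store = Store()
store.write("blk0", "old")

# Private caches: layer a keeps serving its stale copy after b writes.
a, b = CachingLayer(store), CachingLayer(store)
a.read("blk0")
b.write("blk0", "new")
stale = a.read("blk0")

# One shared cache: both layers see the update.
shared = {}
c, d = SharedLayer(store, shared), SharedLayer(store, shared)
c.read("blk0")
d.write("blk0", "newer")
fresh = c.read("blk0")
```

Here stale is still the old value, while fresh tracks the latest write.  The
alternative for the private-cache case is to traverse or invalidate every
layer on each access.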

Remember, both FILE and MEMORY data need to be presented to the user, and
FILE data is a narrow picture of memory.  MEMORY can easily be made more
specific by presenting it as a file -- however expanding the semantics of
a file to memory is more complex (especially with sharing.)   When a
conversion from FILE to MEMORY and back again has to be done at every
layer, the scheme is going to be very inefficient or complex.  If the
abstraction is kept as memory at each layer, then complexities are lessened.
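
As a rough illustration of "presenting memory as a file", the sketch below
puts file semantics (a cursor plus read/write/seek) on top of a flat memory
object.  MemoryObject and FileView are invented names, and the details are
assumptions for the sake of the example.

```python
class MemoryObject:
    """Hypothetical flat memory object that grows on demand."""

    def __init__(self):
        self.data = bytearray()

    def load(self, offset, length):
        return bytes(self.data[offset:offset + length])

    def store(self, offset, payload):
        end = offset + len(payload)
        if end > len(self.data):
            # Zero-fill up to the end of the new write.
            self.data.extend(bytes(end - len(self.data)))
        self.data[offset:end] = payload


class FileView:
    """File semantics layered over a memory object: just a cursor and three calls."""

    def __init__(self, obj):
        self.obj = obj
        self.pos = 0

    def seek(self, pos):
        self.pos = pos

    def read(self, n):
        chunk = self.obj.load(self.pos, n)
        self.pos += len(chunk)
        return chunk

    def write(self, payload):
        self.obj.store(self.pos, payload)
        self.pos += len(payload)
        return len(payload)


obj = MemoryObject()
a = FileView(obj)
b = FileView(obj)    # a second view of the same object, e.g. another layer
a.write(b"hello")
```

The file view adds nothing but a position, so going from memory to file is
cheap.  Both views stay coherent because neither holds data of its own.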

Since each layer might have to present a memory image (either as caching
or mmap), then with a file representation, each layer has to do the "hard"
conversion (given the anachronistic file-only abstraction.)  There is NO cost
in keeping the abstraction as memory as long as possible in the chain.  If
a conversion is needed at machine boundaries, it might be possible to
avoid the file abstraction entirely, and create a (MEMORY <--> SOCKET)
protocol directly.   (It might not be necessary to create and use a more
complex (MEMORY <--> NFS <--> SOCKET) thing.)
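
A direct (MEMORY <--> SOCKET) exchange could be as small as the sketch
below.  The header layout (object id, byte offset, payload length) and the
function names are assumptions made up for this example, not an existing
protocol.

```python
import struct

# Hypothetical header: object id (u64), byte offset (u64), payload length (u32),
# all in network byte order.
HEADER = struct.Struct("!QQI")


def encode_update(object_id, offset, payload):
    """Build one message describing a write to a memory object."""
    return HEADER.pack(object_id, offset, len(payload)) + payload


def decode_update(message):
    """Split a message back into (object_id, offset, payload)."""
    object_id, offset, length = HEADER.unpack_from(message)
    payload = message[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated message")
    return object_id, offset, payload


def apply_update(objects, message):
    """Apply a received message to the receiver's copy of the object."""
    object_id, offset, payload = decode_update(message)
    buf = objects.setdefault(object_id, bytearray())
    end = offset + len(payload)
    if end > len(buf):
        buf.extend(bytes(end - len(buf)))   # zero-fill any gap
    buf[offset:end] = payload
```

The bytes returned by encode_update would be what goes over the socket.
Nothing in the message refers to a file name, a file handle, or a file
position.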

> 
> 
> > Why do you put words in my mouth about doubling inode size?  Straw man...
> 
> You are mentioning ACL's.  The most current FS ACL work is being done
> in NetBSD (not FreeBSD).  I thought you were referencing a modern
> research project when you referenced ACL's.  My mistake.
>
Yep...  Assuming what I have been thinking about shows that arguments
about it might be misguided.

> >
> > The ODS will need rework before Y2038 anyway.  I suspect that if the code
> > is working by 2010, things will be all well.  A UFS2 would eventually be a
> > good thing.
> 
> Fie.  You are the one who originally posted about seeing years of work
> frittered away.  I am not prepared to repeat that journey; it is a fool's
> quest.
>
Fallacious argument -- you aren't the author of the original code or
those changes, are you?  The author of the code apparently accepted the
changes.  (In fact, the changes were also compatible with other users
and developers on the codebase.)

> 
> > The ODS will be changed in the future anyway.  It makes little
> > difference as to where the data is.  There is room in the inode, if
> > needed.
> > 
> > You are making a mountain out of a molehill.
> 
> I think you need to go byte counting in the inode structure looking
> for the room you claim is there.  Without modifying the inode into
> incompatability with existing FS's, it's just not there.
> 
I suggest coming up with a solution then.  Of course, I suggest that
UFS/ODS2 needs to be thought through.  Taking micro pot-shots doesn't
really solve the problem (or the other problems that needed to be solved
in the shorter term.)

John

