Date:      Wed, 3 Nov 1999 10:18:58 +0100
From:      Eivind Eklund <eivind@FreeBSD.org>
To:        Greg Lehey <grog@lemis.com>
Cc:        Don <don@calis.blacksun.org>, Jacques Vidrine <n@nectar.com>, freebsd-fs@FreeBSD.org
Subject:   Re: journaling UFS and LFS
Message-ID:  <19991103101858.E72085@bitbox.follo.net>
In-Reply-To: <19991102154614.55760@mojave.sitaranetworks.com>; from grog@lemis.com on Tue, Nov 02, 1999 at 03:46:14PM -0500
References:  <19991030233304.03DB31DA4@bone.nectar.com> <Pine.BSF.4.05.9910301936530.44134-100000@calis.blacksun.org> <19991101171936.J72085@bitbox.follo.net> <19991102154614.55760@mojave.sitaranetworks.com>

On Tue, Nov 02, 1999 at 03:46:14PM -0500, Greg Lehey wrote:
> On Monday,  1 November 1999 at 17:19:36 +0100, Eivind Eklund wrote:
> > On Sat, Oct 30, 1999 at 07:40:35PM -0400, Don wrote:
> >> This is getting off topic. What features would you like to see in a new
> >> file system? Some suggestions were made. Would you like to add anything to
> >> this list?
> >
> > Yes.
> > * Easy to do concurrent access from multiple hosts to the same
> >   physical media
> 
> You can never do this in the general case (where any host may request
> access to any part of the disk).  The best you could do there is a
> file server, but they're not quite our terms of reference.

I don't get this.  To give a little more detail on what I mean: You
have the FS export a bunch of locks into the DLM (Distributed Lock
Manager) you are running (probably over the bus you use to share
access to the disks, but you can use another connection medium as long
as it is available), and the host that wants to do something to some
part of the FS grabs the relevant lock.  You also design the disk
layout to allow writing in a transactional way, so a host failure
while the host holds a lock doesn't hurt the other hosts accessing the
same physical media.

I don't get what "general case" there is, as you're designing the
system - could you please explain?
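To make the scheme concrete, here is a purely illustrative, single-process
sketch of the lock-then-transactional-write pattern.  A real DLM spans
hosts over the shared bus; here a plain in-process lock stands in for the
distributed lock, and "commit" is a single atomic pointer switch standing
in for an atomic on-disk commit.  All names (SketchDLM, TransactionalBlock,
"inode:42") are invented for the example, not G2 or any real DLM API.

```python
import threading

class SketchDLM:
    """Stand-in for a Distributed Lock Manager: one lock per FS resource."""
    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()
    def lock(self, resource):
        with self._guard:
            lk = self._locks.setdefault(resource, threading.Lock())
        lk.acquire()
    def unlock(self, resource):
        self._locks[resource].release()

class TransactionalBlock:
    """A 'disk block' updated by writing a full shadow copy first, then
    making one atomic commit step -- so a host crash mid-write never
    leaves other hosts looking at half-written data."""
    def __init__(self, data):
        self._committed = data
    def update(self, new_data):
        shadow = new_data            # write the complete new copy first
        self._committed = shadow     # single atomic commit step
    def read(self):
        return self._committed

dlm = SketchDLM()
block = TransactionalBlock(b"old contents")

def host_write(data):
    dlm.lock("inode:42")             # grab the lock covering this FS object
    try:
        block.update(data)
    finally:
        dlm.unlock("inode:42")       # a crash before this point => DLM
                                     # recovery revokes the lock; the block
                                     # is still in a consistent state

host_write(b"new contents")
print(block.read())
```

The point of the transactional layout is the failure case: a host that
dies while holding the lock has either committed fully or not at all, so
the surviving hosts only ever see consistent on-disk state.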

> > * Ability to span more than one disk
> 
> That's not necessarily a file system feature.  Vinum does that now.

Sure.  The reason for having it in the FS is that you can optimize for
the independence of your spindles.  This lets you:
* Write logs and data to separate spindles (increasing performance)
* Give performance guarantees proportional to the number and features
  of your spindles, instead of being limited by what your weakest link
  can do (times one)
* Optimize data layout to be able to do a semi-recovery after losing
  one of your spindles
* (irrelevant unless we extend the userland interface, which was
  planned for G2) Give different guarantees for different files in the
  same namespace.  You may need RAID-0 to get the speed wanted for one
  non-critical file, while wanting RAID-5 to store a file that needs
  safe storage but doesn't need fast streaming.
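A rough sketch of what the FS could do with that spindle knowledge,
assuming a hypothetical internal API (Volume, Policy, and the file names
are all invented for illustration): log appends and data writes land on
separate devices so they don't contend for the same disk arm, and each
file can carry its own layout policy.

```python
from enum import Enum

class Policy(Enum):
    RAID0 = "stripe (fast, no redundancy)"
    RAID5 = "parity (safe, slower streaming writes)"

class Volume:
    """Stand-in for one independent spindle."""
    def __init__(self, name):
        self.name, self.writes = name, []
    def write(self, record):
        self.writes.append(record)

log_dev  = Volume("spindle0")   # dedicated log spindle
data_dev = Volume("spindle1")   # dedicated data spindle

# per-file guarantees in one namespace (the G2 userland-interface idea)
per_file_policy = {
    "scratch.dat": Policy.RAID0,   # speed over safety
    "payroll.db":  Policy.RAID5,   # safety over streaming speed
}

def fs_write(path, blob):
    # log record and data block go to *different* spindles, so the two
    # writes proceed on independent disks
    log_dev.write((path, "intent"))
    data_dev.write((path, blob, per_file_policy.get(path, Policy.RAID5)))

fs_write("scratch.dat", b"x" * 4)
```

A volume manager below the FS can stripe or mirror, but only the FS
knows which write is a log record and which file wants which guarantee,
which is the argument for putting the spanning logic in the FS itself.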

> > I have design papers on the FS designed for G2, which was intended to
> > support all of the features I've seen listed so far.  It has a couple
> > of drawbacks:
> > (1) It is not designed to have the semantics of a standard Unix
> >     filesystem.
> 
> That doesn't surprise me, if you want to implement the first of your
> suggestions.

Actually, that's not a problem - but we decided against pushing any
complexity into the bottom end filesystem if we could do it well in a
stacking layer.

> Is there anything in there which would be of interest in our
> environment?

As I said, it supports all the features I've seen mentioned (by
anybody) so far in the discussion.  Its most significant design goal
was to support Highly Available Systems; that is, clusters.  The
design allows more than one machine in a cluster to access a shared
disk with a HAS-FS on it, with the system as a whole surviving the
(unplanned) loss of any individual member.

I think we ended up supporting transactions built from several file
operations in a multi-machine context, too, but I'm not 100% sure (it
is almost 1 1/2 years since Simon and I did the design, which was done
during a single three-week session in the same physical location, and
I've not worked with the spec since).

Eivind.

