From owner-freebsd-fs Sun Oct 31 3: 5:30 1999 Delivered-To: freebsd-fs@freebsd.org Received: from smtp1.xs4all.nl (smtp1.xs4all.nl [194.109.127.48]) by hub.freebsd.org (Postfix) with ESMTP id E8EB314C38 for ; Sun, 31 Oct 1999 03:05:17 -0800 (PST) (envelope-from rr@xs4all.nl) Received: from xs3.xs4all.nl (xs3.xs4all.nl [194.109.6.44]) by smtp1.xs4all.nl (8.9.3/8.9.3) with ESMTP id MAA19130 for ; Sun, 31 Oct 1999 12:05:15 +0100 (CET) Received: (from rr@localhost) by xs3.xs4all.nl (8.9.0/8.9.0) id MAA28836 for freebsd-fs@FreeBSD.ORG; Sun, 31 Oct 1999 12:05:14 +0100 (CET) Date: Sun, 31 Oct 1999 12:05:14 +0100 From: Rodney To: freebsd-fs@FreeBSD.ORG Subject: feature list journalled fs Message-ID: <19991031120514.A28103@xs4all.nl> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org

hi,

here's my list of features I'd like to see in a journalled fs. Have to admit this list is heavily inspired (ok, copied) from the VxFS features. Apart from the buzzwords, some of them make sense and some of them don't, but it should give us some stuff to discuss:

1) extent-based allocation: coding this should be easy; it's just an address-length pair identifying the starting block address and the length of the extent. I've seen this coded up in qnxfs under Linux. I think the vsta filesystem does something similar.

2) fast filesystem recovery, obviously

3) ACLs would be nice, AFS style?

4) online defrag and resizing (while users are online)

5) online backup/snapshot

6) vinum integration (vague)

7) built-in features that make databases like msql/mysql/oracle very happy. (vague)

Also, b?trees for indexing sound cool, though the xfs implementation seems quite heavy (they maintain 2 of them), i.e. overkill? The way b+trees are used in the Be fs (bfs) might be more appropriate.

Comments ?

rodney
--
"I can't understand why people are frightened of new ideas.
I'm frightened of old ones."
--John Cage -- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sun Oct 31 10:53:52 1999 Delivered-To: freebsd-fs@freebsd.org Received: from angel.algonet.se (angel.algonet.se [194.213.74.112]) by hub.freebsd.org (Postfix) with SMTP id B913C14CAC for ; Sun, 31 Oct 1999 10:53:48 -0800 (PST) (envelope-from mal@algonet.se) Received: (qmail 6333 invoked from network); 31 Oct 1999 19:53:44 +0100 Received: from kent.algonet.se (194.213.74.90) by angel.algonet.se with SMTP; 31 Oct 1999 19:53:44 +0100 Received: from kairos.algonet.se ([194.213.74.18]) by algonet.se (BLUETAIL Mail Robustifier1.0.4) with ESMTP ; Sun, 31 Oct 1999 18:53:44 GMT Received: (mal@localhost) by kairos.algonet.se (8.8.8+Sun/8.6.12) id TAA03522; Sun, 31 Oct 1999 19:53:43 +0100 (MET) To: freebsd-fs@FreeBSD.org Cc: ezk@cs.columbia.edu (Erez Zadok) Subject: Re: stupidfs - easily extensible test file systems? From: Mats Lofkvist Date: 31 Oct 1999 19:53:43 +0100 In-Reply-To: ezk@cs.columbia.edu's message of "29 Oct 1999 05:23:26 +0800" Message-ID: Lines: 26 X-Mailer: Gnus v5.6.45/Emacs 20.3 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org ezk@cs.columbia.edu (Erez Zadok) writes: > Robert, it's been done. To some degree that's nullfs (if nullfs had been > working; the VFS is broken). I've written stackable f/s templates exactly > for the purpose of developers using them to build other f/s w/o having the > many hassles of writing a full f/s. My wrapper templates, called wrapfs, > work on freebsd, linux, and solaris. You can build all kinds of f/s using > them, including f/s that do not require persistent storage. > > See > http://www.cs.columbia.edu/~ezk/research > for papers, and > http://www.cs.columbia.edu/~ezk/research/software > for tarballs. Is wrapfs/fist actively updated for FreeBSD? (I noted that the latest FreeBSD version is almost a year old and for 3.0 only.) 
And does anyone know if this has a chance of being a standard part of FreeBSD, and how it relates to the general cleanup of the stacking fs code that seems to be on the "todo sometime in the future" list for FreeBSD? _ Mats Lofkvist mal@algonet.se To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sun Oct 31 12: 5:20 1999 Delivered-To: freebsd-fs@freebsd.org Received: from haldjas.folklore.ee (Haldjas.folklore.ee [193.40.6.121]) by hub.freebsd.org (Postfix) with ESMTP id 6F9A414C01 for ; Sun, 31 Oct 1999 12:05:15 -0800 (PST) (envelope-from narvi@haldjas.folklore.ee) Received: from localhost (narvi@localhost) by haldjas.folklore.ee (8.9.3/8.9.3) with SMTP id WAA25104; Sun, 31 Oct 1999 22:04:55 +0200 (EET) (envelope-from narvi@haldjas.folklore.ee) Date: Sun, 31 Oct 1999 22:04:55 +0200 (EET) From: Narvi To: Rodney Cc: freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs In-Reply-To: <19991031120514.A28103@xs4all.nl> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org

On Sun, 31 Oct 1999, Rodney wrote:
> hi,
>
> here's my list of features I'd like to see in a
> journalled fs. Have to admit this list is heavily
> inspired (ok, copied) from the VxFS features,
> apart from the buzzwords,
> some of them make sense, some of them don't
> but it should give us some stuff to discuss:
>
> Comments ?
>

You forgot to include *anything* that in any way relates to journaling. As did others. Which leaves the question whether you want a journaled filesystem at all, or just a filesystem conforming to a lot of buzzwords.

IMHO it would be good to have a journaled filesystem. 8-) If it is extensible enough to easily allow a selection of the buzzwords that have been thrown around, so much the better. But the utility of the features would hopefully be tested before actually being incorporated.
If somebody is making a list of 'features' then they should add:

* Can optimise data placement for the case that the partition it resides on is not located on a single spindle but resides on n spindles. Think of vinum being standard in the system.

> rodney
> --
> "I can't understand why people are frightened of new ideas.
> I'm frightened of old ones." --John Cage
>

-- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sun Oct 31 14:11:25 1999 Delivered-To: freebsd-fs@freebsd.org Received: from cs.columbia.edu (cs.columbia.edu [128.59.16.20]) by hub.freebsd.org (Postfix) with ESMTP id C531814EF6 for ; Sun, 31 Oct 1999 14:11:21 -0800 (PST) (envelope-from ezk@shekel.mcl.cs.columbia.edu) Received: from shekel.mcl.cs.columbia.edu (shekel.mcl.cs.columbia.edu [128.59.18.15]) by cs.columbia.edu (8.9.1/8.9.1) with ESMTP id RAA27674; Sun, 31 Oct 1999 17:11:21 -0500 (EST) Received: (from ezk@localhost) by shekel.mcl.cs.columbia.edu (8.9.1/8.9.1) id RAA00014; Sun, 31 Oct 1999 17:11:20 -0500 (EST) Date: Sun, 31 Oct 1999 17:11:20 -0500 (EST) Message-Id: <199910312211.RAA00014@shekel.mcl.cs.columbia.edu> X-Authentication-Warning: shekel.mcl.cs.columbia.edu: ezk set sender to ezk@shekel.mcl.cs.columbia.edu using -f From: Erez Zadok To: Mats Lofkvist Cc: freebsd-fs@FreeBSD.org, ezk@cs.columbia.edu (Erez Zadok) Subject: Re: stupidfs - easily extensible test file systems? In-reply-to: Your message of "31 Oct 1999 19:53:43 +0100." Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org In message , Mats Lofkvist writes: > ezk@cs.columbia.edu (Erez Zadok) writes: > > > Robert, it's been done. To some degree that's nullfs (if nullfs had been > > working; the VFS is broken). I've written stackable f/s templates exactly > > for the purpose of developers using them to build other f/s w/o having the > > many hassles of writing a full f/s.
My wrapper templates, called wrapfs, > > work on freebsd, linux, and solaris. You can build all kinds of f/s using > > them, including f/s that do not require persistent storage. > > > > See > > http://www.cs.columbia.edu/~ezk/research > > for papers, and > > http://www.cs.columbia.edu/~ezk/research/software > > for tarballs. > > Is wrapfs/fist actively updated for FreeBSD? (I noted that the > latest FreeBSD version is almost a year old and for 3.0 only.) I will be updating this port for 3.3 and 4.0 in the two weeks following LISA, i.e. by end of November. > And does anyone know if this has a chance being a standard part > of FreeBSD, and how it relates to the general cleanup of the > stacking fs code that seem to be on the "todo sometime in the > future" list for FreeBSD? What do you mean by "this"? My code will be fixed soon. The problem is that I'm forced to use synchronous writes to work around the VFS problems. I don't expect the VFS to be fixed any time soon. It's been broken for a long time and there aren't too many "customers" complaining about it, or it would have been fixed by now. It just doesn't appear to be a high priority for the freebsd developers. I think it's too late for 3.x, but now would be a good time for freebsd to put those fixes into 4.0, before it becomes the default stable version. Many people on this list understand the problems and know how to fix them. There are even some experimental patches made by Eivind Eklund, but those patches aren't part of the kernel. Eivind's patches used to be in http://www.freebsd.org/~eivind/VOP_GETBACKINGOBJECT.patch and now they appear to be in http://www.freebsd.org/~eivind/FixNULL.patch (Eivind, can you confirm the new URL? FixNull.patch seems to include stuff unrelated to the VFS, such as scsi driver fixes. Thanks.) There's also been talk about some people (McKusick et al) rewriting the whole VFS. 
While I think that's a great idea, it's a large undertaking and will take a long while for busy people like McKusick to complete. I think a complete rewrite, if any, should be scheduled for 5.x. I would therefore suggest that a simpler fix such as Eivind's be incorporated into 4.0 so people can use stackable f/s (unionfs, nullfs, and my wrapfs/cryptfs, etc.) in the more immediate future. > Mats Lofkvist > mal@algonet.se I'd like to mention that I understand the pressures the freebsd developers are under. From a support perspective, you have to prioritize your human resources based on customer needs. There are, however, enough people (myself included) who are willing to work together and come up with a design and an implementation of the VFS fixes, and we are willing to spend our personal (i.e., free) time to do so. All we ask is commitment on the part of the management to include such patches in a none-too-distant future release. BTW, if there's enough momentum and FreeBSD developers/hackers attending LISA, we can have a brainstorming meeting in Seattle... Thanks, Erez Zadok. --- Columbia University Department of Computer Science.
EMail: ezk@cs.columbia.edu Web: http://www.cs.columbia.edu/~ezk To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sun Oct 31 22:55:27 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.tvol.com (mail.wgate.com [38.219.83.4]) by hub.freebsd.org (Postfix) with ESMTP id 4C99315248 for ; Sun, 31 Oct 1999 22:55:20 -0800 (PST) (envelope-from rjesup@wgate.com) Received: from jesup.eng.tvol.net (jesup.eng.tvol.net [10.32.2.26]) by mail.tvol.com (8.8.8/8.8.3) with ESMTP id BAA07275 for ; Mon, 1 Nov 1999 01:50:35 -0500 (EST) Reply-To: Randell Jesup To: freebsd-fs@FreeBSD.ORG Subject: Re: journaling UFS and LFS References: From: Randell Jesup Date: 01 Nov 1999 02:51:47 +0000 In-Reply-To: Don's message of "Sat, 30 Oct 1999 19:40:35 -0400 (EDT)" Message-ID: X-Mailer: Gnus v5.6.43/Emacs 20.4 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Don writes: >> Most corporate IT managers wouldn't know a filesystem if they were >> bitten by one. >That is absolutely the case. That is why I can not suggest that >softupdates is as good as a journaled file system. The people I deal with >at least know the buzzword and they want to make sure that whatever >solution they go with will have it. Question: is the fsck time for softupdates the same as for plain UFS (when it needs to fsck, which should be (much) less often, if I remember correctly). Even the occasional long-fsck-time can be a problem for a high-availability production environment. Side question: why is it that there are certain errors (inode out of range, for example) that fsck barfs on and exits? I actually had to go in to the source for fsck and modify it to recover a drive of a coworker (with important changes since the last nightly backup). And please don't say "just clrinode it and retry". 
First, if you have more than a couple of them this can take a LONG time and lots of manual intervention (in this case, hundreds or more likely thousands of manual clrinodes would have been needed). Second, if that's the suggested resolution, why not make it possible to do from within fsck? If it's REALLY dangerous, then warn people about that, or stop the normal automatic mode from doing this correction without another option (the --i_really_mean_it_i_live_for_danger option). :-) If I hadn't known filesystems and been able to hack the source, the coworker would have lost some important work. -- Randell Jesup, Worldgate Communications, ex-Scala, ex-Amiga OS team ('88-94) rjesup@wgate.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sun Oct 31 23:33:40 1999 Delivered-To: freebsd-fs@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 9547D15419 for ; Sun, 31 Oct 1999 23:33:33 -0800 (PST) (envelope-from bright@wintelcom.net) Received: from localhost (bright@localhost) by fw.wintelcom.net (8.9.3/8.9.3) with ESMTP id XAA20576; Sun, 31 Oct 1999 23:56:49 -0800 (PST) Date: Sun, 31 Oct 1999 23:56:48 -0800 (PST) From: Alfred Perlstein To: Randell Jesup Cc: freebsd-fs@FreeBSD.ORG Subject: Re: journaling UFS and LFS In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On 1 Nov 1999, Randell Jesup wrote: > Don writes: > >> Most corporate IT managers wouldn't know a filesystem if they were > >> bitten by one. > >That is absolutely the case. That is why I can not suggest that > >softupdates is as good as a journaled file system. The people I deal with > >at least know the buzzword and they want to make sure that whatever > >solution they go with will have it. 
> > Question: is the fsck time for softupdates the same as for > plain UFS (when it needs to fsck, which should be (much) less often, > if I remember correctly). Even the occasional long-fsck-time can be > a problem for a high-availability production environment. > > Side question: why is it that there are certain errors (inode out > of range, for example) that fsck barfs on and exits? I actually had to > go in to the source for fsck and modify it to recover a drive of a > coworker (with important changes since the last nightly backup). And > please don't say "just clrinode it and retry". First, if you have > more than a couple of them this can take a LONG time and lots of > manual intervention (in this case, hundreds or more likely thousands of > manual clrinodes would have been needed). Second, if that's the suggested > resolution, why not make it possible to do from within fsck? If it's > REALLY dangerous, then warn people about that, or stop the normal > automatic mode from doing this correction without another option (the > --i_really_mean_it_i_live_for_danger option). :-) A url to your patches would be appreciated. -Alfred > If I hadn't known filesystems and been able to hack the source, > the coworker would have lost some important work. 
> > -- > Randell Jesup, Worldgate Communications, ex-Scala, ex-Amiga OS team ('88-94) > rjesup@wgate.com > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 2:12:54 1999 Delivered-To: freebsd-fs@freebsd.org Received: from akat.civ.cvut.cz (akat.civ.cvut.cz [147.32.235.105]) by hub.freebsd.org (Postfix) with SMTP id 4EF8C14E04 for ; Mon, 1 Nov 1999 02:12:40 -0800 (PST) (envelope-from pechy@hp735.cvut.cz) Received: from localhost (pechy@localhost) by akat.civ.cvut.cz (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA21363; Mon, 1 Nov 1999 11:11:06 +0100 Date: Mon, 1 Nov 1999 11:11:06 +0100 From: Jan Pechanec X-Sender: pechy@akat.civ.cvut.cz To: Poul-Henning Kamp Cc: Greg Lehey , Bernd Walter , Don , Alfred Perlstein , freebsd-fs@FreeBSD.ORG Subject: Re: Journaling In-Reply-To: <4407.941213948@critter.freebsd.dk> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Fri, 29 Oct 1999, Poul-Henning Kamp wrote: Vahalia [UNIX Internals, Prentice-Hall] says that FFS is an original BSD filesystem and UFS is rewritten FFS for vnode layer. Jan. >In message <19991029095858.50758@mojave.worldwide.lemis.com>, Greg Lehey writes: >>On Wednesday, 27 October 1999 at 19:32:00 +0200, Bernd Walter wrote: >>> The number of partitions has nothing to do with with the filesystem you use. >>> FFS is not a partitionsheme but a filesystem. >>> UFS is a historic filesystem on which FFS is based. >> >>Well, in fact they're the same thing. The *old* name is FFS (Fast >>File System). When System V.4 was released, they adopted FFS as the >>standard file system and called it the UNIX File System. > >...Whereas in *BSD "UFS" refers to the unix sematics layer (directory >manipulation and all that) and "FFS" refers to the underlying storage >object manager (which only understands inodes and their layout.) 
> >-- >Poul-Henning Kamp FreeBSD coreteam member >phk@FreeBSD.ORG "Real hackers run -current on their laptop." >FreeBSD -- It will take a long time before progress goes too far! > > >To Unsubscribe: send mail to majordomo@FreeBSD.org >with "unsubscribe freebsd-fs" in the body of the message > -- Jan PECHANEC (mailto:pechy@hp735.cvut.cz) Computing Center CTU (Zikova 4, Praha 6, 166 35, Czech Republic) http://www.civ.cvut.cz, tel: +420 2 2435 2969, http://pechy.civ.cvut.cz To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 2:40:30 1999 Delivered-To: freebsd-fs@freebsd.org Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.40.131]) by hub.freebsd.org (Postfix) with ESMTP id 34FAD14A0B for ; Mon, 1 Nov 1999 02:40:26 -0800 (PST) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.9.3/8.9.2) with ESMTP id LAA20349; Mon, 1 Nov 1999 11:36:56 +0100 (CET) (envelope-from phk@critter.freebsd.dk) To: Jan Pechanec Cc: Greg Lehey , Bernd Walter , Don , Alfred Perlstein , freebsd-fs@FreeBSD.ORG Subject: Re: Journaling In-reply-to: Your message of "Mon, 01 Nov 1999 11:11:06 +0100." Date: Mon, 01 Nov 1999 11:36:56 +0100 Message-ID: <20347.941452616@critter.freebsd.dk> From: Poul-Henning Kamp Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org In message , Jan Pechanec writes: > > Vahalia [UNIX Internals, Prentice-Hall] says that FFS is an >original BSD filesystem and UFS is rewritten FFS for vnode layer. > Well, who do you trust, Kirk & the source, or Vahalia ? Poul-Henning >On Fri, 29 Oct 1999, Poul-Henning Kamp wrote: >> >>...Whereas in *BSD "UFS" refers to the unix sematics layer (directory >>manipulation and all that) and "FFS" refers to the underlying storage >>object manager (which only understands inodes and their layout.) 
-- Poul-Henning Kamp FreeBSD coreteam member phk@FreeBSD.ORG "Real hackers run -current on their laptop." FreeBSD -- It will take a long time before progress goes too far! To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 6:36:49 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.tvol.com (mail.wgate.com [38.219.83.4]) by hub.freebsd.org (Postfix) with ESMTP id DBFBC14BFA for ; Mon, 1 Nov 1999 06:36:42 -0800 (PST) (envelope-from rjesup@wgate.com) Received: from jesup.eng.tvol.net (jesup.eng.tvol.net [10.32.2.26]) by mail.tvol.com (8.8.8/8.8.3) with ESMTP id JAA20835 for ; Mon, 1 Nov 1999 09:32:00 -0500 (EST) Reply-To: Randell Jesup To: freebsd-fs@FreeBSD.ORG Subject: Re: Features of a journaled file system References: <19991031014032.A3510@keltia.freenix.fr> <381B85AB.68EF4A45@zk3.dec.com> From: Randell Jesup Date: 01 Nov 1999 10:33:10 +0000 In-Reply-To: Chang Song's message of "Sat, 30 Oct 1999 19:56:27 -0400" Message-ID: X-Mailer: Gnus v5.6.43/Emacs 20.4 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org

Chang Song writes:
>> > Should the file system use b-trees? What other technologies should such a
>>
>> B-trees would help a lot in some cases. UFS performance has always been
>> abysmal with large directories...
>
>I think a B+ tree is too complex to maintain and implement.
>Extendible hashing (GFS uses it) is a great compromise. Easier to implement
>yet competitive or sometimes faster than a B+ tree.

I'm a big fan of hashing for directories. Add something to (say) cause the FS to add a hash level to a chain that grows too large (or to all the chains of a directory that grows too large), and the benefit is almost as large as a b+ tree for access/modify, and add/delete would be (much?) faster (it's been a while since I looked at b+ trees for directories - OS/2 uses them if I remember correctly).
I slightly prefer adding hash table levels according to chain length, but that might require some extra bookkeeping (not a lot, just a counter per chain). I've written FS's that kept duplicate directory lists: one hashed (single level) for speed, and one sequential for speed at listing directories. It has the nice side-effect of making the FS more easily recoverable, at the cost of some disk space and slightly slower create/delete. (The sequential blocks were heavily compressed, and the hash tables only had the head file-header block (inode) of each chain. (I'm not saying this design should be duplicated - it was a way to avoid changing too much of an FS written in ASM, while speeding up dir listings; just mentioning.) -- Randell Jesup, Worldgate Communications, ex-Scala, ex-Amiga OS team ('88-94) rjesup@wgate.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 7: 4:41 1999 Delivered-To: freebsd-fs@freebsd.org Received: from worf.qntm.com (worf.qntm.com [146.174.250.100]) by hub.freebsd.org (Postfix) with ESMTP id 84DEA14BC2 for ; Mon, 1 Nov 1999 07:04:29 -0800 (PST) (envelope-from Stephen.Byan@quantum.com) Received: from mail3.qntm.com by worf.qntm.com with ESMTP (1.40.112.12/16.2) id AA146378668; Mon, 1 Nov 1999 07:04:28 -0800 Received: from milcmima.qntm.com (milcmima.qntm.com [146.174.18.61]) by mail3.qntm.com (8.8.6/8.8.6) with ESMTP id HAA14917 for ; Mon, 1 Nov 1999 07:04:29 -0800 (PST) Received: by milcmima.qntm.com with Internet Mail Service (5.5.2650.10) id ; Mon, 1 Nov 1999 07:04:25 -0800 Message-Id: <8133266FE373D11190CD00805FA768BF02EE9DD7@shrcmsg1.tdh.qntm.com> From: Stephen Byan To: freebsd-fs@FreeBSD.ORG Subject: RE: journaling UFS and LFS Date: Mon, 1 Nov 1999 07:04:24 -0800 Mime-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2650.10) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org 
>Chang Song [mailto:song@zk3.dec.com] wrote: > >Don wrote: >> >> Softupdates is definitely a viable solution however it does not address >> several issues and the license is not a BSD license so it makes me >> uncomfortable. > >Could you let me know what SoftUpdate does not address? >Thank you. One potential problem with soft updates is that the order of creation/deletion/truncation/etc of files is not preserved through a crash or power outage, whereas UFS and logged file systems (not logging file systems as in LFS; what do you call the kind that maintains a recovery log in addition to its regular metadata?) preserve this ordering. I wonder how many recovery strategies are broken by soft updates. Anyone have any data? Regards, -Steve Steve Byan Design Engineer Quantum Corporation MS 1-3/E23 333 South Street Shrewsbury, MA 01545 voice: (508) 770-3414 fax: (508) 770-2604 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 7:52:43 1999 Delivered-To: freebsd-fs@freebsd.org Received: from angel.algonet.se (angel.algonet.se [194.213.74.112]) by hub.freebsd.org (Postfix) with SMTP id 535DA14C92 for ; Mon, 1 Nov 1999 07:52:37 -0800 (PST) (envelope-from mal@algonet.se) Received: (qmail 13586 invoked from network); 1 Nov 1999 16:52:35 +0100 Received: from enok.algonet.se (194.213.74.88) by angel.algonet.se with SMTP; 1 Nov 1999 16:52:35 +0100 Received: from kairos.algonet.se ([194.213.74.18]) by algonet.se (BLUETAIL Mail Robustifier1.0.4) with ESMTP ; Mon, 01 Nov 1999 15:52:35 GMT Received: (mal@localhost) by kairos.algonet.se (8.8.8+Sun/8.6.12) id QAA19191; Mon, 1 Nov 1999 16:52:34 +0100 (MET) Date: Mon, 1 Nov 1999 16:52:34 +0100 (MET) Message-Id: <199911011552.QAA19191@kairos.algonet.se> X-Authentication-Warning: kairos.algonet.se: mal set sender to mal@kairos.algonet.se using -f From: Mats Lofkvist To: ezk@cs.columbia.edu Cc: freebsd-fs@FreeBSD.org In-reply-to:
<199910312211.RAA00014@shekel.mcl.cs.columbia.edu> (message from Erez Zadok on Sun, 31 Oct 1999 17:11:20 -0500 (EST)) Subject: Re: stupidfs - easily extensible test file systems? References: <199910312211.RAA00014@shekel.mcl.cs.columbia.edu> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > And does anyone know if this has a chance being a standard part > of FreeBSD, and how it relates to the general cleanup of the > stacking fs code that seem to be on the "todo sometime in the > future" list for FreeBSD? What do you mean by "this"? My code will be fixed soon. The problem is that I'm forced to use synchronous writes to work around the VFS problems. I don't expect the VFS to be fixed any time soon. It's been broken for a long time and there aren't too many "customers" complaining about it, or it would have been fixed by now. It just doesn't appear to be a high priority for the freebsd developers. I think it's too late for 3.x, but now would be a good time for freebsd to put those fixes into 4.0, before it becomes the default stable version. (My limited VFS knowledge shows here, but what the heck..) What I wondered was if fist/wrapfs helps cleaning up the FreeBSD VFS code, is only using it as is, or if it is incompatible with what the FreeBSD architects have in mind. I.e. is it a good idea to build a new FreeBSD filesystem using wrapfs? 
_ Mats Lofkvist mal@algonet.se To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 8:21:15 1999 Delivered-To: freebsd-fs@freebsd.org Received: from ns1.yes.no (ns1.yes.no [195.204.136.10]) by hub.freebsd.org (Postfix) with ESMTP id 66C6914BFA for ; Mon, 1 Nov 1999 08:21:09 -0800 (PST) (envelope-from eivind@bitbox.follo.net) Received: from bitbox.follo.net (bitbox.follo.net [195.204.143.218]) by ns1.yes.no (8.9.3/8.9.3) with ESMTP id RAA04291; Mon, 1 Nov 1999 17:19:38 +0100 (CET) Received: (from eivind@localhost) by bitbox.follo.net (8.8.8/8.8.6) id RAA73394; Mon, 1 Nov 1999 17:19:37 +0100 (MET) Date: Mon, 1 Nov 1999 17:19:36 +0100 From: Eivind Eklund To: Don Cc: Jacques Vidrine , freebsd-fs@FreeBSD.ORG Subject: Re: journaling UFS and LFS Message-ID: <19991101171936.J72085@bitbox.follo.net> References: <19991030233304.03DB31DA4@bone.nectar.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: ; from don@calis.blacksun.org on Sat, Oct 30, 1999 at 07:40:35PM -0400 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Sat, Oct 30, 1999 at 07:40:35PM -0400, Don wrote: > This is getting off topic. What features would you like to see in a new > file system. Some suggestions were made. Would you like to add anything to > this list? Yes. * Easy to do concurrent access from multiple hosts to the same physical media * Ability to span more than one disk * Performance guarantees I have design papers on the FS designed for G2, which was intended to support all of the features I've seen listed so far. It has a couple of drawbacks: (1) It is not designed to have the semantics of a standard Unix filesystem. It is designed to run at the bottom end of a chain of stacked filesystems. If you want e.g. symlinks to work, you need to stack a layer. (2) It is not designed to run on a single spindle. 
Single spindle performance will be horrible. Eivind. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 10:46: 1 1999 Delivered-To: freebsd-fs@freebsd.org Received: from cs.columbia.edu (cs.columbia.edu [128.59.16.20]) by hub.freebsd.org (Postfix) with ESMTP id C610014A20 for ; Mon, 1 Nov 1999 10:45:52 -0800 (PST) (envelope-from ezk@shekel.mcl.cs.columbia.edu) Received: from shekel.mcl.cs.columbia.edu (shekel.mcl.cs.columbia.edu [128.59.18.15]) by cs.columbia.edu (8.9.1/8.9.1) with ESMTP id NAA08720; Mon, 1 Nov 1999 13:45:51 -0500 (EST) Received: (from ezk@localhost) by shekel.mcl.cs.columbia.edu (8.9.1/8.9.1) id NAA02285; Mon, 1 Nov 1999 13:45:51 -0500 (EST) Date: Mon, 1 Nov 1999 13:45:51 -0500 (EST) Message-Id: <199911011845.NAA02285@shekel.mcl.cs.columbia.edu> X-Authentication-Warning: shekel.mcl.cs.columbia.edu: ezk set sender to ezk@shekel.mcl.cs.columbia.edu using -f From: Erez Zadok To: Mats Lofkvist Cc: ezk@cs.columbia.edu, freebsd-fs@FreeBSD.org Subject: Re: stupidfs - easily extensible test file systems? In-reply-to: Your message of "Mon, 01 Nov 1999 16:52:34 +0100." <199911011552.QAA19191@kairos.algonet.se> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org In message <199911011552.QAA19191@kairos.algonet.se>, Mats Lofkvist writes: > > > And does anyone know if this has a chance being a standard part > > of FreeBSD, and how it relates to the general cleanup of the > > stacking fs code that seem to be on the "todo sometime in the > > future" list for FreeBSD? > > What do you mean by "this"? My code will be fixed soon. The problem > is that I'm forced to use synchronous writes to work around the VFS > problems. I don't expect the VFS to be fixed any time soon. It's been > broken for a long time and there aren't too many "customers" > complaining about it, or it would have been fixed by now. 
It just > doesn't appear to be a high priority for the freebsd developers. I > think it's too late for 3.x, but now would be a good time for freebsd > to put those fixes into 4.0, before it becomes the default stable > version. > > (My limited VFS knowledge shows here, but what the heck..) > > What I wondered was if fist/wrapfs helps cleaning up the FreeBSD VFS code, > is only using it as is, or if it is incompatible with what the FreeBSD > architects have in mind. > > I.e. is it a good idea to build a new FreeBSD filesystem using wrapfs? Yes. That's the premise of my Ph.D. work: (1) I provide you with stackable templates that do not change the VFS, do not change anything else in the OS, and do not modify lower level file systems (FFS, NFS, etc.) That way, when my templates are not in use, the performance of the rest of the system remains the same (which was not true for past stackable vnode interface works). (2) My wrapfs templates use the VFS as is. That was an important goal for me, knowing full well that requiring any significant changes to the VFS will never be accepted by any OS vendor. Requiring big changes was one reason why all the work done by Sun and UCLA is not available in modern, common OSs; no one wants to rewrite the VFS and all file systems to conform to a new "real" stackable interface. (3) The wrapfs templates export a simple API that's similar across different OSS. I have templates for FreeBSD, Linux, and Solaris. When an OS makes small changes to their VFS, I update the wrapfs templates as needed. People who used wrapfs as a basis for another file system don't have to worry too much about kernel internals. Creating the wrapfs templates was the first half of my Ph.D. work. The second half is the creation of a high level stackable f/s language, which I call FiST. Fistgen, the language translator, uses wrapfs templates and f/s descriptions to produce f/s modules automatically for your choice OS. That's a summary of things. 
If you want more details, I'll be happy to provide them. You can also read my USENIX'99 paper titled "Extending File Systems Using Stackable Templates", available in http://www.cs.columbia.edu/~ezk/research/ Also, there's a WIP paper on FiST in http://www.cs.columbia.edu/~ezk/research/wip.html Now, going back to FreeBSD: I didn't want to change the FreeBSD VFS, so I worked around it. I used synchronous writes to "solve" the backing-object problem. The result was a wrapfs template that is slower due to all those synchronous writes, but at least it works (unlike nullfs and unionfs). When the FreeBSD VFS is fixed, I will produce updated wrapfs templates that don't need synchronous writes. After that, you'd have to port your diffs from the old wrapfs template to the new one (that could be automated using a 3-way diff). If fistgen is out by then, using new templates would be as easy as rerunning fistgen on your existing ".fist" file. > Mats Lofkvist > mal@algonet.se Erez. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 12:32:37 1999 Delivered-To: freebsd-fs@freebsd.org Received: from apollo.sitaranetworks.com (apollo.sitaranetworks.com [199.103.141.105]) by hub.freebsd.org (Postfix) with ESMTP id 1F864153E9 for ; Mon, 1 Nov 1999 12:30:57 -0800 (PST) (envelope-from grog@lemis.com) Message-ID: <19991029095858.50758@mojave.worldwide.lemis.com> Date: Fri, 29 Oct 1999 09:58:58 -0400 From: Greg Lehey To: Bernd Walter , Don Cc: Alfred Perlstein , freebsd-fs@FreeBSD.ORG Subject: Re: Journaling References: <19991027095431.45462@mojave.worldwide.lemis.com> <19991027193200.A52144@cicely7.cicely.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <19991027193200.A52144@cicely7.cicely.de>; from Bernd Walter on Wed, Oct 27, 1999 at 07:32:00PM +0200 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Wednesday, 27 October 1999 at 19:32:00 
+0200, Bernd Walter wrote: > The number of partitions has nothing to do with with the filesystem you use. > FFS is not a partitionsheme but a filesystem. > UFS is a historic filesystem on which FFS is based. Well, in fact they're the same thing. The *old* name is FFS (Fast File System). When System V.4 was released, they adopted FFS as the standard file system and called it the UNIX File System. Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 12:34:11 1999 Delivered-To: freebsd-fs@freebsd.org Received: from apollo.sitaranetworks.com (apollo.sitaranetworks.com [199.103.141.105]) by hub.freebsd.org (Postfix) with ESMTP id D62DB15815 for ; Mon, 1 Nov 1999 12:30:57 -0800 (PST) (envelope-from grog@lemis.com) Message-ID: <19991028085348.39481@mojave.worldwide.lemis.com> Date: Thu, 28 Oct 1999 08:53:48 -0400 From: Greg Lehey To: "Kenneth D. Merry" , Don Cc: Bernd Walter , Alfred Perlstein , freebsd-fs@FreeBSD.ORG Subject: Re: Journaling References: <199910280305.VAA13281@panzer.kdm.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <199910280305.VAA13281@panzer.kdm.org>; from Kenneth D. Merry on Wed, Oct 27, 1999 at 09:05:04PM -0600 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Wednesday, 27 October 1999 at 21:05:04 -0600, Kenneth D. Merry wrote: > Don wrote... >>> Actually, it's technically 8 partitions, a-h, but c is "special", and >>> shouldn't normally be used. >> Correct C represents the entire disk. >> >>> This is a disklabel limitation, not a filesystem limitation. I believe >>> that Solaris x86 may be able to do 16 partitions (or so a guy at Sun told >>> me). >> >> I will have to check this out. Thanks for the info. Is there any reason >> that disklabel has this limit? > > It has been that way for a long time. 
I'm not sure why the limit is 8, but > it is. (Someone might know. I suspect it was just an arbitrary value > chosen a long time ago.) Changing it might break backwards compatibility, > though. There was some discussion about increasing it at one point. But as you say, it would probably confuse some programs, and I personally don't see any need. Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 12:34:12 1999 Delivered-To: freebsd-fs@freebsd.org Received: from apollo.sitaranetworks.com (apollo.sitaranetworks.com [199.103.141.105]) by hub.freebsd.org (Postfix) with ESMTP id 548CD157CF for ; Mon, 1 Nov 1999 12:30:57 -0800 (PST) (envelope-from grog@lemis.com) Message-ID: <19991028085243.24656@mojave.worldwide.lemis.com> Date: Thu, 28 Oct 1999 08:52:43 -0400 From: Greg Lehey To: Don Cc: Bernd Walter , Alfred Perlstein , freebsd-fs@FreeBSD.ORG Subject: Re: Journaling References: <19991027173720.06226@mojave.worldwide.lemis.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: ; from Don on Wed, Oct 27, 1999 at 09:59:23PM -0400 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Wednesday, 27 October 1999 at 21:59:23 -0400, Don wrote: >>> [snipped in original: claim that Vinum wasn't ready for production] >> >> Oh, does it? What problems have you seen? You'd better tell all the >> people who are using it in production, too. > > Ok can we stop with the insults? The point of this thread is research not > attacks on anyone. I have seen problems with disk mirroring using vinum in > which attempting to synchronize a new disk after a previous had failed > caused a kernel panic and left me with no way to recreate the failed disk. > This may have been fixed, however. 
At the time the problem was > reproducible and I did not have the time to investigate further. If a tree falls in the forest, and nobody hears it, did it fall? As I said above: "What problems have you seen?". A kernel panic (is there any other kind?) is a matter you should report. We *have* had problems in Vinum; as you say, this isn't necessarily the case at the moment. >> UFS on System V uses the System V partition table, which allows 15 >> partitions. I don't know what use even 7 are, which is probably one >> of the reasons nobody has done anything about it. > Actually I simply run everything off of the root partition and allocate > all of the space to that. > >> Yes, this is the usual result of using too many file system >> partitions. > > No, this is a result of a mistake in estimating the size that a given > partition should be. This includes /var and / (although perhaps I > should simply have a single file system mounted off of /) Indeed. But my crystal ball is broken, and I can't find anybody to repair it. How do *you* foresee the future? In any case, even if you can, what benefit do you have from a maze of twisty little file systems, all different? >> I'm not sure what you're talking about here, but the best thing I can >> think of is Vinum. > > Vinum is a volume manager. I don't see why it keeps coming up in reference > to a journaled file system. It doesn't. You were talking about partitioning, which also has nothing to do with a journalling file system. On Wednesday, 27 October 1999 at 22:06:30 -0400, Don wrote: >>> Ok nevermind :) Either way vinum is not up to snuff. It still has a way to >>> go before it can be used in a production environment. >> >> Oh, does it? What problems have you seen? You'd better tell all the >> people who are using it in production, too. > > Perhaps you should read the vinum known bugs page. What was that you were saying about insults above? > That list is far too long for a production application. Ah.
Could you define the correct length? "0" is not an answer. > If you dont feel it is too long then by all means use it. When I > stop seeing the words "data corruption" and "kernel panic" on the > known bugs page then I will use vinum. Maybe you should read the context. Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 12:34:23 1999 Delivered-To: freebsd-fs@freebsd.org Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id 7D95315844 for ; Mon, 1 Nov 1999 12:34:08 -0800 (PST) (envelope-from tlambert@usr02.primenet.com) Received: (from daemon@localhost) by smtp05.primenet.com (8.9.1/8.9.1) id NAA55372; Mon, 1 Nov 1999 13:33:56 -0700 Received: from usr02.primenet.com(206.165.6.202) via SMTP by smtp05.primenet.com, id smtpdWO37qa; Mon Nov 1 13:33:50 1999 Received: (from tlambert@localhost) by usr02.primenet.com (8.8.5/8.8.5) id NAA01820; Mon, 1 Nov 1999 13:33:29 -0700 (MST) From: Terry Lambert Message-Id: <199911012033.NAA01820@usr02.primenet.com> Subject: Re: Journaling To: phk@critter.freebsd.dk (Poul-Henning Kamp) Date: Mon, 1 Nov 1999 20:33:29 +0000 (GMT) Cc: pechy@hp735.cvut.cz, grog@lemis.com, ticso@cicely.de, don@calis.blacksun.org, bright@wintelcom.net, freebsd-fs@FreeBSD.ORG In-Reply-To: <20347.941452616@critter.freebsd.dk> from "Poul-Henning Kamp" at Nov 1, 99 11:36:56 am X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > Jan Pechanec writes: > > > > Vahalia [UNIX Internals, Prentice-Hall] says that FFS is an > >original BSD filesystem and UFS is rewritten FFS for vnode layer. > > Well, who do you trust, Kirk & the source, or Vahalia ? 
The statement in Vahalia is ambiguous; I tried to get it amended during technical editing. The only really bad call in the book, IMO, is that Vahalia likes the Solaris Slab allocator, but I believe that it is seriously sub-optimal for SMP systems, and I personally prefer the Dynix Zone allocator, which he doesn't like as much. The UFS in System V was originally the Net/2 FFS code, with minor entry point rewrites for insertion into the VFS switch list in the System V kernel. The most significant differences in the current SVR4.2 code for the UFS compared to the current Berkeley FFS are: o No support for vnode stacking, even though the original Heidemann code was done on an SVR4 (Solaris) platform. o The vnodes are owned by the file systems. The ability to do this in BSD UNIX is missing at this time, which makes things like XFS and VXFS, etc., much harder to port. There is some non-general kludge code to support TRW's TFS code's ownership of vnodes; it would be nice to generalize this to enable easier porting of FS code to FreeBSD. o No support for soft updates, even though the original Ganger/Patt code was developed under SVR4.0.2 ES/MP. o Support for Delayed Ordered Writes (DOW). This is a method of staging writes; it is similar in result to soft updates, with hard-coded pool drains at synchronization points (where soft updates would invoke a contention resolver, DOW forces a flush of all pending writes, to ensure ordering guarantees). DOW is covered by a USL patent. o Buffer cache synchronization is still handled manually, even within Solaris, which has a unified VM and buffer cache. In particular, like FreeBSD, there are some code errors which make msync() necessary for some uses. The SVR4.2 buffer cache is not unified with the VM system in standard System V, although the VM system is significantly reworked for SMP (Steve Baumel of USL did much of the rewrite).
Having spent time in the bowels of that code, and in the bowels of VXFS, and in a derivative of the code of my own design, I can guarantee you that UFS and VXFS are derived from Net/2 code; more correctly, UFS is derived from Net/2, and VXFS is derived from UFS, in particular, its directory handling code has USL copyrights all over it. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 13:10:46 1999 Delivered-To: freebsd-fs@freebsd.org Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id 86B0A14DEA for ; Mon, 1 Nov 1999 13:10:17 -0800 (PST) (envelope-from tlambert@usr02.primenet.com) Received: (from daemon@localhost) by smtp05.primenet.com (8.9.1/8.9.1) id OAA60198; Mon, 1 Nov 1999 14:10:09 -0700 Received: from usr02.primenet.com(206.165.6.202) via SMTP by smtp05.primenet.com, id smtpdbkSLia; Mon Nov 1 14:10:05 1999 Received: (from tlambert@localhost) by usr02.primenet.com (8.8.5/8.8.5) id OAA03339; Mon, 1 Nov 1999 14:10:03 -0700 (MST) From: Terry Lambert Message-Id: <199911012110.OAA03339@usr02.primenet.com> Subject: Re: Journaling To: grog@lemis.com (Greg Lehey) Date: Mon, 1 Nov 1999 21:10:03 +0000 (GMT) Cc: don@calis.blacksun.org, bright@wintelcom.net, freebsd-fs@FreeBSD.ORG In-Reply-To: <19991027095431.45462@mojave.worldwide.lemis.com> from "Greg Lehey" at Oct 27, 99 09:54:31 am X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > Kirk McKusick has been working for the last year or so on > > a combination of "soft-updates" (complete) and "snapshots" > > (not released yet), once complete FFS will have the equivelant > > of logging AND 
snapshots like the netapp appliance. > > I am familiar with softupdates but not with snapshots. Snapshots are where you put a peg in the soft updates clock and export the state as of the peg. This lets you have a consistent copy of the filesystem state, guaranteed, which will not mutate out from under you while you are, for example, doing a backup of the system. This is a far cry from journalling, which, unless you do an LRU on your journal allocations, doesn't have the capability for "snapshots" (which would be "all journal entries prior to the time of the snapshot"). > The reason for starting a new project was basically to once > and for all get rid of UFS. I assume you mean the on-disk structure. Having been in the bowels of VXFS (Veritas) in SVR4.2, I can guarantee you that the on-disk directory structure is derived from the SVR4 UFS implementation, and that the only real changes are to the way inodes and inode data are stored. > While there is nothing wrong with UFS it does have some limitations which > I would like to eliminate such as a limit of 7 slices. This is a limit of the disklabel partitioning scheme; you might as well say you want to address the 4 partition limit in the FAT FS, since it bears the same relation. The big things that journalling buys you over soft updates or logging are: 1) The ability to come back up at the last valid journalled state, without checking the FS. Like soft updates and LFS, this only works if you can tell the difference between a panic and a power failure; otherwise, you still need a full fsck. If you know this, then it saves you the background cleanup of the cylinder group bitmaps that soft updates requires, and the background "cleanerd" that LFS requires. 2) The ability to roll things forward following a crash, inasmuch as you know them to be true. This saves you in the case of implied state between user files, without a synchronous commit process in effect (e.g. an index file for a record file).
> I would also like to add functionality such as the ability to > grow and shrink partitions etc. You can actually grow partitions with FFS. Der Mouse has written a program to extend FFS size, and it is publicly available for download. The problem that arises is that the relative fragmentation rates for the old and new zones are not constant. If you think of the block allocation process as a hashing process, you effectively hash the blocks onto the disk. The original reason for a large free reserve was based on Knuth (Sorting and Searching, The Art of Computer Programming vol. 3), which states that a hash fill in excess of 85% is the point of diminishing returns for a perfect hash. This actually means that the correct free reserve for a hard disk, for optimal performance, is 15%, which is almost twice the 8% set by MINFREE in fs.h (whose comments are wrong now, as well). So effectively, someone needs to write a defragger. This is actually quite trivial to do; it's just a lot of grunt work, and the danger of a bug is rather amplified, so a lot of rigor would be needed, as well. The case of shrinking the available space is trivial, given a defragger, since you can easily define a "no fly zone" for the defragmentation process to get the data moved out of the region that you are going to take away. In any event, this is unrelated to the idea of journalling. > Softupdates is also not recommended for use on the root partition and This is actually a chicken-and-egg problem with setting the bit, not really an issue of "not being recommended for the root fs"; it's a bit hard to tunefs /. It's likely that the integration of character and block devices will make it impossible, without a separate boot, since you will no longer be able to "cheat". > it still seems to be just a little flaky. Every once in a while I wind up > with a problem which I have traced to softupdates but which I could > not recreate.
(To be fair I have not had a problem in a month or two now) I think these are more VM issues than anything else; when things change, they tend to break where they are most fragile. The order guarantees in soft updates must be rigidly enforced by the systems on which it depends. If you are not running a UPS, and you are using soft updates, you should make sure to turn off write-caching on your disk drive, since it doesn't do cache flush ("committed to stable storage") notification, and the cache flush operation, if exported by the drive, is not integrated, so soft updates can neither force a flush at a synchronization point, nor can it intentionally stall writes over a synchronization point, pending flush notification. In any case, so long as you use it correctly, you should not be experiencing any problems, and I'm sure many of us would be very interested in knowing about any problems you see (Julian and Kirk, especially). Again, soft updates is a contention resolution technology that is used to guarantee ordering of metadata writes. I believe that there are good technical arguments why you might want to use soft updates technology, even if you had journalled metadata, to allow dependency-ordered log data to be logged on a clock tick rather than on a synchronization point, and to ensure that the journalling process itself does not become a bottleneck. That said, without a distributed cache coherency protocol, you would potentially have to give up some goals, such as multiple machine access to the same filesystem over a shared SCSI bus, like XFS, for example. > > In so far as codebase there is the LFS project, currently > > fixed (afaik) in NetBSD, perhaps porting that to FreeBSD > > would be worthwhile. > > This is indeed going to be the starting point for this project but > I hope I would be able to take it far beyond this. Logging and journalling are very different animals, even if some of the tricks that both do are conceptually similar.
I would actually _discourage_ using the LFS as a starting point for a JFS, since I believe that it would limit your options in a number of subtle, but important ways. Also note that XFS is log structured (they have posted their logging code under GPL, up at SGI, as a "teaser" while they "clean" the remainder of their code of encumbrances, presumably USL). Actually, AIX has a device driver writer's supplementary guide, which comes with source code for an MFS for AIX, and goes into great detail about the AIX GFS (think file system switch) abstraction, and into some detail on the AIX JFS, as well. I was able to, for example, reverse engineer the entry points for the file locking code, which was not externalized in AIX 4, in support of a shared file descriptor pool that could be used by multiple processes -- a poor man's "rfork". You have to order this book separately, since it doesn't come with the full documentation set. You might also want to look at the NTFS implementation, as it is described in the thin (about 1/4 inch thick) Helen Custer book. I believe that kernel changes, and in particular, changes to the way VOP_ABORT has to be called and implemented for journalling, will be necessary. It may be easier for you to make these changes with a partially working example, by making the existing NTFS code read/write instead of read-only. Don't despair: Linux is going to require much more extensive VFS changes to support journalling than FreeBSD, so you are ahead of the game, even though the Linux JFS project is already under way. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 13:22:48 1999 Delivered-To: freebsd-fs@freebsd.org Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133]) by hub.freebsd.org (Postfix) with ESMTP id 712CB14CC1 for ; Mon, 1 Nov 1999 13:22:34 -0800 (PST) (envelope-from tlambert@usr02.primenet.com) Received: (from daemon@localhost) by smtp03.primenet.com (8.9.3/8.9.3) id OAA15950; Mon, 1 Nov 1999 14:21:46 -0700 (MST) Received: from usr02.primenet.com(206.165.6.202) via SMTP by smtp03.primenet.com, id smtpdAAAEXay4D; Mon Nov 1 14:21:41 1999 Received: (from tlambert@localhost) by usr02.primenet.com (8.8.5/8.8.5) id OAA03623; Mon, 1 Nov 1999 14:19:24 -0700 (MST) From: Terry Lambert Message-Id: <199911012119.OAA03623@usr02.primenet.com> Subject: Re: Journaling To: dhw@whistle.com (David Wolfskill) Date: Mon, 1 Nov 1999 21:19:23 +0000 (GMT) Cc: bright@wintelcom.net, don@calis.blacksun.org, freebsd-fs@FreeBSD.ORG In-Reply-To: <199910271440.HAA31103@pau-amma.whistle.com> from "David Wolfskill" at Oct 27, 99 07:40:16 am X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > >> Kirk McKusick has been working for the last year or so on > >> a combination of "soft-updates" (complete) and "snapshots" > >> (not released yet), once complete FFS will have the equivelant > >> of logging AND snapshots like the netapp appliance. > > > >I am familiar with softupdates but not with snapshots. > > Take a look at Network Appliance's "WAFL". (They have some white > papers up on their Web site, http://www.netapp.com/. In particular, the > one at http://www.netapp.com/tech_library/3002.html descibes WAFL and > snapshots.) 
Note that the internal implementation of the Network Appliance embedded OS is a non-preemptive cooperative multitasking model, similar to the internal implementation of NetWare, where threads either run to completion or until an explicit yield (this is also why NetWare never did the SMP thing correctly for Native NetWare, and why NetWare for UNIX is able to beat its performance numbers on identical single-processor hardware, but really kicks its butt when it comes to SMP hardware). The upshot of this is that the WAFL implementation makes some seriously invalid-for-FreeBSD assumptions about not having to have explicit synchronization primitives anywhere. Short of going to a similar kernel model (kernel threads handling device drivers are a generally bad idea for a lot of reasons, including the one where NT was able to kick Linux's ass with the Microsoft-specified four ethernet cards on a 4 processor SMP box in the Netcraft and Ziff Davis labs tests), you would have to add significant overhead to the WAFL design discussed in those documents. It would perform very poorly in a standard UNIX kernel without significant organizational changes; the large number of synchronization points implied by the threads model would all have to become explicit locks. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 13:28:49 1999 Delivered-To: freebsd-fs@freebsd.org Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id 2611F14E2A for ; Mon, 1 Nov 1999 13:28:29 -0800 (PST) (envelope-from tlambert@usr02.primenet.com) Received: (from daemon@localhost) by smtp05.primenet.com (8.9.1/8.9.1) id OAA46230; Mon, 1 Nov 1999 14:28:18 -0700 Received: from usr02.primenet.com(206.165.6.202) via SMTP by smtp05.primenet.com, id smtpdCtTf7a; Mon Nov 1 14:28:14 1999 Received: (from tlambert@localhost) by usr02.primenet.com (8.8.5/8.8.5) id OAA03945; Mon, 1 Nov 1999 14:28:11 -0700 (MST) From: Terry Lambert Message-Id: <199911012128.OAA03945@usr02.primenet.com> Subject: Re: Journaling To: bde@zeta.org.au (Bruce Evans) Date: Mon, 1 Nov 1999 21:28:11 +0000 (GMT) Cc: Brendon_Meyer@fmi.com, grog@lemis.com, freebsd-fs@FreeBSD.ORG In-Reply-To: from "Bruce Evans" at Oct 30, 99 04:18:12 pm X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > The supply of 'FDISK' style slices is essentially unlimited. I believe the > limit is 2G or 4G slices for the 'FDISK' (extended) data structure. FreeBSD > drivers only support the first 30 and FreeBSD fdisk only supports the first 4. The slice size limit is based on the 8G overall limit on a partition in which you can place an extended partition. In actual fact, the DOS partition table has a 32-bit alternate size field, which is the count of sectors in the partition, that is supposed to be used when the C/H/S values are all set to zero; clearly, it breaks some backward compatibility when used.
This puts the upper bound on a single partition at ~1TB, and the offset is the same, so you should be able to map an ~2TB disk in two partitions using FDISK partitioning. The only caveat is that you won't be able to share the disk with older versions of DOS and Windows, unless you put a smaller partition of standard C/H/S values up front. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 13:52:20 1999 Delivered-To: freebsd-fs@freebsd.org Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134]) by hub.freebsd.org (Postfix) with ESMTP id C870A15028 for ; Mon, 1 Nov 1999 13:51:59 -0800 (PST) (envelope-from tlambert@usr02.primenet.com) Received: (from daemon@localhost) by smtp04.primenet.com (8.9.3/8.9.3) id OAA27759; Mon, 1 Nov 1999 14:51:27 -0700 (MST) Received: from usr02.primenet.com(206.165.6.202) via SMTP by smtp04.primenet.com, id smtpdAAAQFaWd2; Mon Nov 1 14:51:18 1999 Received: (from tlambert@localhost) by usr02.primenet.com (8.8.5/8.8.5) id OAA05179; Mon, 1 Nov 1999 14:51:45 -0700 (MST) From: Terry Lambert Message-Id: <199911012151.OAA05179@usr02.primenet.com> Subject: Re: journaling UFS and LFS To: Stephen.Byan@quantum.com (Stephen Byan) Date: Mon, 1 Nov 1999 21:51:44 +0000 (GMT) Cc: freebsd-fs@FreeBSD.ORG In-Reply-To: <8133266FE373D11190CD00805FA768BF02EE9DD7@shrcmsg1.tdh.qntm.com> from "Stephen Byan" at Nov 1, 99 07:04:24 am X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > >> Softupdates is definitely a viable solution however it does not address > >> several issues and the license is not a BSD license so it makes me > >> uncomfortable. The license issue is a Whistle thing. 
Talk to Julian and get him to pound on Doug Brent, preferably before December 31st of this year. > >Could you let me know what SoftUpdate does not address? > >Thank you. > > One potential problem with soft updates is that the order of > creation/deletion/truncation/etc of files is not preserved through a crash > or power outage, whereas UFS and logged file systems (not logging file > systems as in LFS; what do you call the kind that maintain a recovery log in > addition to their regular metadata?) preserve this ordering. I wonder how > many recovery strategies are broken by soft updates. Anyone have any data? This is not strictly true of soft updates, if you have a well-behaved disk drive. The problem with the current implementation is that, when you have uncooperative hardware, you have to sacrifice some of your performance by disabling write caching. Probably, you have not disabled write caching. If the drive would notify when the data has truly been committed to stable storage, as opposed to the write cache, or even if there were an out-of-band mechanism to force the drive to flush its write cache (and eat the stall that would have to be introduced for this to work), you could get significantly better performance without risking your data. The main recovery strategy that soft updates allows is that, after a crash, the file system state is consistent, with the exception of unallocated blocks showing as allocated in the unflushed-at-the-time-of-the-crash cylinder group bitmaps. Technically, you could lock access to particular cylinder groups as you were fixing up their bitmaps, and effectively do your fsck in the background. One real problem that remains unaddressed in this case, however, is the chicken-and-egg problem. That is, there is no way to distinguish a power failure or an FS-unrelated panic from an FS-related panic, such as a real disk hardware or buffer cache corrupting failure -- data non-corrupting vs. data corrupting crashes.
Without this information, it is unsafe to assume that the crash was an uncorrupting crash, and do the abbreviated fsck. Adding this information would require adding a new bit into the superblock, and being willing to write the superblock back in the event of a panic. You would probably have to add a flags parameter to the front of the panic() function in order to tell it what kind of crash was happening; this would be a hell of a lot safer than, for example, a global variable. Another thing that could mitigate this, at least on relatively quiescent systems (e.g. it'd work for power failures in the middle of the night, but wouldn't work for systems with disk writes going on) would be "soft read-only". This would flush all writes, and then if no new writes came in for "a while", you would set a flag on the in-core FS structure that you were marking it "soft read-only", and then write out the superblock marking it clean. Subsequent writes would be permitted, but only when the "soft read-only" bit was cleared, after remarking the superblock dirty again. We actually implemented both soft updates and soft read-only in our port of FFS to Windows 95, at Artisoft. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 15:44:36 1999 Delivered-To: freebsd-fs@freebsd.org Received: from implode.root.com (root.com [209.102.106.178]) by hub.freebsd.org (Postfix) with ESMTP id 0CC6A14C81 for ; Mon, 1 Nov 1999 15:44:31 -0800 (PST) (envelope-from dg@implode.root.com) Received: from implode.root.com (localhost [127.0.0.1]) by implode.root.com (8.8.8/8.8.5) with ESMTP id PAA07714; Mon, 1 Nov 1999 15:38:21 -0800 (PST) Message-Id: <199911012338.PAA07714@implode.root.com> To: Terry Lambert Cc: Stephen.Byan@quantum.com (Stephen Byan), freebsd-fs@FreeBSD.ORG Subject: Re: journaling UFS and LFS In-reply-to: Your message of "Mon, 01 Nov 1999 21:51:44 GMT." <199911012151.OAA05179@usr02.primenet.com> From: David Greenman Reply-To: dg@root.com Date: Mon, 01 Nov 1999 15:38:21 -0800 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org >> >> Softupdates is definitely a viable solution however it does not address >> >> several issues and the license is not a BSD license so it makes me >> >> uncomfortable. > >The license issue is a Whistle thing. Talk to Julian and get him >to pound on Doug Brent, preferrably before December 31st of this year. How is the softupdates license a Whistle thing? It seems to me that it is a Kirk McKusick and Sun MicroSystems thing. -DG David Greenman Co-founder/Principal Architect, The FreeBSD Project - http://www.freebsd.org Creator of high-performance Internet servers - http://www.terasolutions.com Pave the road of life with opportunities. 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 15:52:44 1999 Delivered-To: freebsd-fs@freebsd.org Received: from uni4nn.gn.iaf.nl (osmium.gn.iaf.nl [193.67.144.12]) by hub.freebsd.org (Postfix) with ESMTP id 685A014A0D for ; Mon, 1 Nov 1999 15:52:18 -0800 (PST) (envelope-from wilko@yedi.iaf.nl) Received: from yedi.iaf.nl (uucp@localhost) by uni4nn.gn.iaf.nl (8.9.2/8.9.2) with UUCP id AAA29469; Tue, 2 Nov 1999 00:37:21 +0100 (MET) Received: (from wilko@localhost) by yedi.iaf.nl (8.9.3/8.9.3) id XAA25459; Mon, 1 Nov 1999 23:06:03 +0100 (CET) (envelope-from wilko) From: Wilko Bulte Message-Id: <199911012206.XAA25459@yedi.iaf.nl> Subject: Re: journaling UFS and LFS In-Reply-To: <199911012151.OAA05179@usr02.primenet.com> from Terry Lambert at "Nov 1, 1999 9:51:44 pm" To: tlambert@primenet.com (Terry Lambert) Date: Mon, 1 Nov 1999 23:06:03 +0100 (CET) Cc: Stephen.Byan@quantum.com, freebsd-fs@FreeBSD.ORG X-Organisation: Private FreeBSD site - Arnhem, The Netherlands X-pgp-info: PGP public key at 'finger wilko@freefall.freebsd.org' X-Mailer: ELM [version 2.4ME+ PL43 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org As Terry Lambert wrote ... > > >> Softupdates is definitely a viable solution however it does not address > > >> several issues and the license is not a BSD license so it makes me > > >> uncomfortable. > > The license issue is a Whistle thing. Talk to Julian and get him > to pound on Doug Brent, preferrably before December 31st of this year. > > > > >Could you let me know what SoftUpdate does not address? > > >Thank you. 
> > > > One potential problem with soft updates is that the order of > > creation/deletion/truncation/etc of files is not preserved through a crash > > or power outage, whereas UFS and logged file systems (not logging file > > systems as in LFS; rather, shall we say, the kind that maintain a recovery log in > > addition to their regular metadata) preserve this ordering. I wonder how > > many recovery strategies are broken by soft updates. Anyone have any data? > > This is not strictly true of soft updates, if you have a well > behaved disk drive. > > The problem with the current implementation is that, when you > have uncooperative hardware, you have to sacrifice some of your > performance by disabling write caching. > > Probably, you have not disabled write caching. > > If the drive would notify when the data has truly been committed > to stable storage, as opposed to the write cache, or even if there > were an out-of-band mechanism to force the drive to flush its write > cache (and eat the stall that would have to be introduced for this > to work), you could get significantly better performance without > risking your data. On SCSI you should be able to use the SYNCHRONIZE CACHE cmd to get the data onto the platter. How to prioritise this cmd into the various queues is another matter. (As are less-than-well-behaved SCSI devices that A. don't implement the cmd or B. implement it wrongly or C. forget all about write caching when hit with a SCSI bus reset or... 
you get my point) Wilko -- | / o / / _ Arnhem, The Netherlands - Powered by FreeBSD - |/|/ / / /( (_) Bulte WWW : http://www.tcja.nl http://www.freebsd.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 16:16:39 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.tvol.com (mail.wgate.com [38.219.83.4]) by hub.freebsd.org (Postfix) with ESMTP id 86A5414DB4 for ; Mon, 1 Nov 1999 16:16:35 -0800 (PST) (envelope-from rjesup@wgate.com) Received: from jesup.eng.tvol.net (jesup.eng.tvol.net [10.32.2.26]) by mail.tvol.com (8.8.8/8.8.3) with ESMTP id TAA14664 for ; Mon, 1 Nov 1999 19:11:55 -0500 (EST) Reply-To: Randell Jesup To: freebsd-fs@FreeBSD.ORG Subject: Re: journaling UFS and LFS References: <199911012151.OAA05179@usr02.primenet.com> From: Randell Jesup Date: 01 Nov 1999 20:13:04 +0000 In-Reply-To: Terry Lambert's message of "Mon, 1 Nov 1999 21:51:44 +0000 (GMT)" Message-ID: X-Mailer: Gnus v5.6.43/Emacs 20.4 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Terry Lambert writes: >Another thing that could mitigate this, at least on relatively >quiescent systems (e.g. it'd work for power failures in the >middle of the night, but wouldn't work for systems with disk >writes going on) would be "soft read-only". This would flush >all writes, and then if no new writes came in for "a while", >you would set a flag on the in code FS structure that you were >marking it "soft read-only", and then write out the superblock >marking it clean. Subsequent writes would be permitted, but >only when the "soft read-only" bit was cleared, after remarking >the super block dirty again. This scheme was used for the Amiga FS's - in fact it was critical for them, since there was no explicit 'shutdown' command. 
The root block (equivalent to superblock) would be marked dirty (and flushed to disk) if metadata (including file sizes) changed, and if there was no write activity for a second or two it would be flushed and the root block would be written with a clean flag. (This is a simplification, of course.) On a single-user system, the disks are often (usually) quiescent and thus would be marked clean (even during use - mine's totally quiet right now). On busier systems or under load the superblock would rarely be left in the clean state, however. Also, because of write ordering and the way files were created, during validation (aka fsck) the disk was readable; in some instances if there were corruption a file or directory might not be accessible, and an error would be returned (of course, the validation process would normally fix said error when it got to it). If something tried to write to an unvalidated drive, the filesystem would return an error, and the Write()/Create()/Delete()/etc OS code would put up an error/retry requester, which would automatically go away (and retry) once the drive validated. Validation was also quite fast by fsck standards. Not all problems could be solved by the built-in validator; disk-recovery tools could attempt to fix even very seriously hosed disks. Since the disk was usually mostly readable even with an uncorrectable error, often the disk recovery program could be run from the bad partition itself if need be. Of course, this is mostly of historical interest at this point, but some of the ideas used in it show up moderately often (witness msg above). 
-- Randell Jesup, Worldgate Communications, ex-Scala, ex-Amiga OS team ('88-94) rjesup@wgate.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 17:44: 6 1999 Delivered-To: freebsd-fs@freebsd.org Received: from web120.yahoomail.com (web120.yahoomail.com [205.180.60.121]) by hub.freebsd.org (Postfix) with SMTP id D0A3214E94 for ; Mon, 1 Nov 1999 17:44:05 -0800 (PST) (envelope-from dyeske@yahoo.com) Message-ID: <19991102014632.4418.rocketmail@web120.yahoomail.com> Received: from [209.186.12.16] by web120.yahoomail.com; Mon, 01 Nov 1999 17:46:32 PST Date: Mon, 1 Nov 1999 17:46:32 -0800 (PST) From: David Yeske Subject: unsubscribe To: freebsd-fs@FreeBSD.ORG MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org unsubscribe ===== __________________________________________________ Do You Yahoo!? Bid and sell for free at http://auctions.yahoo.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 18:47:40 1999 Delivered-To: freebsd-fs@freebsd.org Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by hub.freebsd.org (Postfix) with ESMTP id E012A14D42 for ; Mon, 1 Nov 1999 18:47:27 -0800 (PST) (envelope-from robert@cyrus.watson.org) Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3]) by fledge.watson.org (8.9.3/8.9.3) with SMTP id VAA22569; Mon, 1 Nov 1999 21:20:21 -0500 (EST) (envelope-from robert@cyrus.watson.org) Date: Mon, 1 Nov 1999 21:20:21 -0500 (EST) From: Robert Watson X-Sender: robert@fledge.watson.org Reply-To: Robert Watson To: Rodney Cc: freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs In-Reply-To: <19991031120514.A28103@xs4all.nl> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: 
bulk X-Loop: FreeBSD.org On Sun, 31 Oct 1999, Rodney wrote: > here's my list of features I'd like to see in a > journalled fs. Have to admit this list is heavily > inspired ( ok , copied ) from the VxFS features, > apart from the buzz words, > some of them make sense, some of them don't > but it should give us some stuff to discuss: > > 1) extent based allocation > coding this should be easy, it's just an address-length pair > identifying the starting block address and the length of the > extent. I've seen this coded up in qnxfs under linux. > I think the vsta filesystem does something similar. > 2) fast filesystem recovery , obviously > 3) acls would be nice , afs style ? > 4) online defrag and resizing (while users are online) > 5) online backup/snapshot > 6) vinum integration (vague) > 7) built-in features that make databases very happy > like msql/mysql/oracle. (vague) > > also b?trees for indexing sounds cool, though the xfs > implementation seems quite heavy (they maintain 2 of them) > , ie over-kill ? > The way b+trees are used in the Be fs (bfs) might be more > appropriate. I guess I'd be interested in more separation of the on-top semantics and filestore. I.e., some piece of code provides a transactional filestore--inodes, attributes, and blocks of data. On top of that, a semantics layer can build directories, acls, etc. This way the transactional implementation doesn't make a mess of the otherwise clean ufs-like behavior. I'm not sure how feasible that is, but it would be nice if possible. I'd also like to see things like ACLs implemented as attributes for storage purposes--while the VFS layer would expose vop_get/set_acl, et al, the top file system layer (not in the traditional layering sense of the word) would convert these to internal vop_get/set_extattr calls. 
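The attribute-backed ACL idea could look roughly like the following toy sketch. The names here (acl_entry, ea_set/ea_get, "posix1e.acl") are invented and are not the FreeBSD VFS API; the point is only that the ACL operations reduce entirely to generic extended-attribute reads and writes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Sketch of the layering idea: the ACL interface is implemented
 * purely in terms of a generic extended-attribute store.  All names
 * are invented for illustration.
 */
struct acl_entry { uint32_t tag; uint32_t id; uint32_t perm; };

/* Pretend extended-attribute store: one named blob per file. */
static uint8_t ea_buf[256];
static size_t  ea_len;

static void ea_set(const char *name, const void *data, size_t len)
{
    (void)name;                 /* only one attribute in this toy */
    memcpy(ea_buf, data, len);
    ea_len = len;
}

static size_t ea_get(const char *name, void *data, size_t max)
{
    (void)name;
    size_t n = ea_len < max ? ea_len : max;
    memcpy(data, ea_buf, n);
    return n;
}

/* "vop_setacl"/"vop_getacl" built on top of the attribute store. */
static void acl_set(const struct acl_entry *e, size_t n)
{
    ea_set("posix1e.acl", e, n * sizeof(*e));
}

static size_t acl_get(struct acl_entry *e, size_t max)
{
    return ea_get("posix1e.acl", e, max * sizeof(*e)) / sizeof(*e);
}
```

Because the filestore only ever sees opaque attribute blobs, an extension such as ACLs (or MAC labels) never has to touch the transactional machinery underneath.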
I'm really interested in an FS that provides transactional consistency over the inodes (or equiv) and attributes in an extensible way, allowing people to develop extensions (such as ACLs, MAC, etc) without having to understand the filestore in all its complexity. BTW, a useful thing to address would be consistency across layers in a stacked file system--something that I haven't really seen discussed anywhere... Robert N M Watson robert@fledge.watson.org http://www.watson.org/~robert/ PGP key fingerprint: AF B5 5F FF A6 4A 79 37 ED 5F 55 E9 58 04 6A B1 TIS Labs at Network Associates, Safeport Network Services To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 2 4:42:20 1999 Delivered-To: freebsd-fs@freebsd.org Received: from antioche.lip6.fr (antioche.lip6.fr [132.227.74.11]) by hub.freebsd.org (Postfix) with ESMTP id 40CF114DFC for ; Tue, 2 Nov 1999 04:42:09 -0800 (PST) (envelope-from bouyer@antioche.lip6.fr) Received: from antifer.ipv6.lip6.fr (antifer.ipv6.lip6.fr [132.227.72.132]) by antioche.lip6.fr (8.9.3/8.9.3) with ESMTP id NAA01796; Tue, 2 Nov 1999 13:41:53 +0100 (MET) Received: (bouyer@localhost) by antifer.ipv6.lip6.fr (8.8.8/8.6.4) id NAA18983; Tue, 2 Nov 1999 13:41:52 +0100 (MET) Date: Tue, 2 Nov 1999 13:41:52 +0100 From: Manuel Bouyer To: Terry Lambert Cc: "Kenneth D. 
Merry" , don@calis.blacksun.org, ticso@cicely.de, grog@lemis.com, bright@wintelcom.net, freebsd-fs@FreeBSD.ORG Subject: Re: Journaling Message-ID: <19991102134152.A18969@antioche.lip6.fr> References: <199910280305.VAA13281@panzer.kdm.org> <199910291710.KAA16646@usr02.primenet.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.95.6us In-Reply-To: <199910291710.KAA16646@usr02.primenet.com>; from Terry Lambert on Fri, Oct 29, 1999 at 05:10:14PM +0000 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Fri, Oct 29, 1999 at 05:10:14PM +0000, Terry Lambert wrote: > NetBSD currently supports 16. > > Yes, it breaks backward compatibility. No, NetBSD supports 16 only on ports that started with 16. Others are still 8. There are discussions about how to move to a higher number (not 16, but at least 64 or more) without breaking backward compatibility ... -- Manuel Bouyer, LIP6, Universite Paris VI. Manuel.Bouyer@lip6.fr -- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 2 4:47:12 1999 Delivered-To: freebsd-fs@freebsd.org Received: from antioche.lip6.fr (antioche.lip6.fr [132.227.74.11]) by hub.freebsd.org (Postfix) with ESMTP id 0978314DFC for ; Tue, 2 Nov 1999 04:47:04 -0800 (PST) (envelope-from bouyer@antioche.lip6.fr) Received: from antifer.ipv6.lip6.fr (antifer.ipv6.lip6.fr [132.227.72.132]) by antioche.lip6.fr (8.9.3/8.9.3) with ESMTP id NAA01855; Tue, 2 Nov 1999 13:47:02 +0100 (MET) Received: (bouyer@localhost) by antifer.ipv6.lip6.fr (8.8.8/8.6.4) id NAA18991; Tue, 2 Nov 1999 13:47:01 +0100 (MET) Date: Tue, 2 Nov 1999 13:47:01 +0100 From: Manuel Bouyer To: Kelly Yancey Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Journaling Message-ID: <19991102134701.B18969@antioche.lip6.fr> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.95.6us In-Reply-To: ; from Kelly Yancey on Sat, Oct 30, 1999 at 
05:54:56PM -0400 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Sat, Oct 30, 1999 at 05:54:56PM -0400, Kelly Yancey wrote: > Slightly off topic (as if the topic were about journalling anymore in > this thread anyway :) )... > From my perusal of the code, it looks as if the NetBSD change from > 386BSD's partition ID of 165 (which we still use) to 169 is unrelated to > the change to 16 partitions. Actually, I can't find where it is useful at > all; I would have assumed that if they were going to break > backward-compatibility by going to 16 partitions, switching MBR partition > IDs at the same time would be logical. > Does anyone here know the reasoning between switching MBR partition IDs? It's because FreeBSD also uses 165, this makes it hard to install both OSes on the same HD. -- Manuel Bouyer, LIP6, Universite Paris VI. Manuel.Bouyer@lip6.fr {Net,Free}BSD: 22 ans d'experience feront toujours la difference -- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 2 8:31:29 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mojave.sitaranetworks.com (mojave.sitaranetworks.com [199.103.141.157]) by hub.freebsd.org (Postfix) with ESMTP id C03AB15719 for ; Tue, 2 Nov 1999 08:31:22 -0800 (PST) (envelope-from grog@lemis.com) Message-ID: <19991102102601.54815@mojave.sitaranetworks.com> Date: Tue, 2 Nov 1999 10:26:01 -0500 From: Greg Lehey To: Don , freebsd-fs@FreeBSD.ORG Subject: Re: Features of a journaled file system Reply-To: Greg Lehey References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: ; from Don on Sat, Oct 30, 1999 at 06:56:24PM -0400 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Saturday, 30 October 1999 at 18:56:24 -0400, Don wrote: > What are the features people would like to see in a new FreeBSD file > system? Some of the ones I have heard listed are: > 1. 
Ability to grow a FS > 2. Ability to shrink a FS > 3. Access control lists on files and file systems > 4. Extensibility. (The ability to easily add new features to the > filesystem without having to rewrite utilities such as fsck) None of these are specific features of a journalling file system. They're probably all desirable. Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 2 8:40:59 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mojave.sitaranetworks.com (mojave.sitaranetworks.com [199.103.141.157]) by hub.freebsd.org (Postfix) with ESMTP id 89D5514BF5 for ; Tue, 2 Nov 1999 08:40:50 -0800 (PST) (envelope-from grog@lemis.com) Message-ID: <19991102102703.16459@mojave.sitaranetworks.com> Date: Tue, 2 Nov 1999 10:27:03 -0500 From: Greg Lehey To: Chang Song , Ollivier Robert Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Features of a journaled file system Reply-To: Greg Lehey References: <19991031014032.A3510@keltia.freenix.fr> <381B85AB.68EF4A45@zk3.dec.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <381B85AB.68EF4A45@zk3.dec.com>; from Chang Song on Sat, Oct 30, 1999 at 07:56:27PM -0400 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Saturday, 30 October 1999 at 19:56:27 -0400, Chang Song wrote: > Ollivier Robert wrote: >> >> According to Don: >>> Should the file system use b-trees? What other technologies should such a >> >> B-trees would help a lot in some cases. UFS performance has always been >> abysmal with large directories... > > I think a B+ tree is too complex to maintain and implement. Tandem has been using such a system since 1974. I can't remember anybody having much in the way of problems with it. 
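For a feel of why B-trees help with large directories: a UFS directory lookup scans entries linearly, while a B+tree lookup does an ordered search at each node. The sketch below uses invented names, with a sorted array standing in for a single B+tree node, and simply counts name comparisons for the two approaches.

```c
#include <assert.h>
#include <string.h>

/*
 * Toy comparison of directory-lookup cost: linear scan (UFS-style
 * directory blocks) vs. binary search over sorted keys, which is the
 * per-node operation a B+tree directory performs.  Names invented.
 */
static int linear_lookup(const char *names[], int n, const char *key,
                         int *cmps)
{
    for (int i = 0; i < n; i++) {
        (*cmps)++;
        if (strcmp(names[i], key) == 0)
            return i;
    }
    return -1;
}

static int binary_lookup(const char *names[], int n, const char *key,
                         int *cmps)     /* names must be sorted */
{
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        (*cmps)++;
        int c = strcmp(names[mid], key);
        if (c == 0)
            return mid;
        if (c < 0)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}
```

With thousands of entries, the linear scan grows proportionally while the ordered search grows logarithmically, which is the gap people see with large directories.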
Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 2 9:28:59 1999 Delivered-To: freebsd-fs@freebsd.org Received: from ns1.yes.no (ns1.yes.no [195.204.136.10]) by hub.freebsd.org (Postfix) with ESMTP id D41D614A09 for ; Tue, 2 Nov 1999 09:28:35 -0800 (PST) (envelope-from eivind@bitbox.follo.net) Received: from bitbox.follo.net (bitbox.follo.net [195.204.143.218]) by ns1.yes.no (8.9.3/8.9.3) with ESMTP id SAA23846; Tue, 2 Nov 1999 18:28:28 +0100 (CET) Received: (from eivind@localhost) by bitbox.follo.net (8.8.8/8.8.6) id SAA81192; Tue, 2 Nov 1999 18:28:27 +0100 (MET) Date: Tue, 2 Nov 1999 18:28:27 +0100 From: Eivind Eklund To: Erez Zadok Cc: Mats Lofkvist , freebsd-fs@FreeBSD.ORG Subject: Re: stupidfs - easily extensible test file systems? Message-ID: <19991102182827.B72085@bitbox.follo.net> References: <199910312211.RAA00014@shekel.mcl.cs.columbia.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: <199910312211.RAA00014@shekel.mcl.cs.columbia.edu>; from ezk@cs.columbia.edu on Sun, Oct 31, 1999 at 05:11:20PM -0500 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Sun, Oct 31, 1999 at 05:11:20PM -0500, Erez Zadok wrote: > Many people on this list understand the problems and know how to fix them. > There are even some experimental patches made by Eivind Eklund, but those > patches aren't part of the kernel. Eivind's patches used to be in > > http://www.freebsd.org/~eivind/VOP_GETBACKINGOBJECT.patch > > and now they appear to be in > > http://www.freebsd.org/~eivind/FixNULL.patch > > (Eivind, can you confirm the new URL? FixNull.patch seems to include stuff > unrelated to the VFS, such as scsi driver fixes. Thanks.) 
The URL is correct - those fixes are there because the environment I used for working on those patches was somewhat unusual (cross-compilation from a RELENG_2_2 box), and that brokenness was in the way of me doing FS work, so it is fixed in that tree (though not committed, as I was not sure it was a good idea). > There's also been talk about some people (McKusick et al) rewriting the > whole VFS. While I think that's a great idea, it's a large undertaking and > will take a long while for busy people like McKusick to complete. I think a > complete rewrite, if any, should be scheduled for 5.x. I would therefore > suggest that a simpler fix such as Eivind's be incorporated into 4.0 so > people can use stackable f/s (unionfs, nullfs, and my wrapfs/cryptfs, etc.) > in the more immediate future. My patches don't solve the entire problem. If they actually had created a working environment for stacking layers, they would have been in the kernel already. Eivind. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 2 10:58: 0 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mojave.sitaranetworks.com (mojave.sitaranetworks.com [199.103.141.157]) by hub.freebsd.org (Postfix) with ESMTP id 638971536A for ; Tue, 2 Nov 1999 10:57:55 -0800 (PST) (envelope-from grog@lemis.com) Message-ID: <19991102123553.21474@mojave.sitaranetworks.com> Date: Tue, 2 Nov 1999 12:35:53 -0500 From: Greg Lehey To: Rodney , freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs Reply-To: Greg Lehey References: <19991031120514.A28103@xs4all.nl> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <19991031120514.A28103@xs4all.nl>; from Rodney on Sun, Oct 31, 1999 at 12:05:14PM +0100 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Sunday, 31 October 1999 at 12:05:14 +0100, Rodney wrote: > > > hi, > > here's my list of features I'd like to see in a > 
journalled fs. Have to admit this list is heavily > inspired ( ok , copied ) from the VxFS features, > apart from th buzz words, > some of them make sense, some of them don't > but it should give us some stuff to discus: > [snip] > 6) vinum integration (vague) Vinum is just a virtual disk. As such, any file system should work on it. Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 2 13:45:58 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mojave.sitaranetworks.com (mojave.sitaranetworks.com [199.103.141.157]) by hub.freebsd.org (Postfix) with ESMTP id 3300814D37 for ; Tue, 2 Nov 1999 13:45:48 -0800 (PST) (envelope-from grog@lemis.com) Message-ID: <19991102154051.35226@mojave.sitaranetworks.com> Date: Tue, 2 Nov 1999 15:40:51 -0500 From: Greg Lehey To: Randell Jesup , freebsd-fs@FreeBSD.ORG Subject: Re: journaling UFS and LFS Reply-To: Greg Lehey References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: ; from Randell Jesup on Mon, Nov 01, 1999 at 02:51:47AM +0000 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Monday, 1 November 1999 at 2:51:47 +0000, Randell Jesup wrote: > Don writes: >>> Most corporate IT managers wouldn't know a filesystem if they were >>> bitten by one. >> That is absolutely the case. That is why I can not suggest that >> softupdates is as good as a journaled file system. The people I deal with >> at least know the buzzword and they want to make sure that whatever >> solution they go with will have it. > > Question: is the fsck time for softupdates the same as for > plain UFS (when it needs to fsck, which should be (much) less often, > if I remember correctly). My understanding is that the fsck is identical. The only advantage that soft updates brings is that the danger of damage is much less. 
> Even the occasional long-fsck-time can be a problem for a > high-availability production environment. Agreed. This is the biggest advantage of a log-based fs. > Side question: why is it that there are certain errors (inode out > of range, for example) that fsck barfs on and exits? Because it's broken. We should be able to recognize and fix all these problems. Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 2 13:59:24 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mojave.sitaranetworks.com (mojave.sitaranetworks.com [199.103.141.157]) by hub.freebsd.org (Postfix) with ESMTP id A53671546F for ; Tue, 2 Nov 1999 13:59:08 -0800 (PST) (envelope-from grog@lemis.com) Message-ID: <19991102155021.38326@mojave.sitaranetworks.com> Date: Tue, 2 Nov 1999 15:50:21 -0500 From: Greg Lehey To: fs@FreeBSD.org, Don Subject: Re: journaling UFS and LFS Reply-To: Greg Lehey References: <86hfj63es8.fsf@not.demophon.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: ; from Don on Mon, Nov 01, 1999 at 07:38:42AM -0500 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org [moved to fs] On Monday, 1 November 1999 at 7:38:42 -0500, Don wrote: >> *Very* different from LFS. (What are features? "Has files and >> directories"? Time-complexity? Implementation details? Buzzwords?) > > You know. Features. As in those things that people would like to see in > such a file system. The features we would like to see have already > been listed. Please see the archives if you want to know what was > considered a "feature". > > Besides, VxFS has a closer feature set to what I would like to see. Has anybody thought of lobbying Veritas to release VxFS? I think you might just find some open ears. 
If anybody's serious about this, contact me privately and I can give some suggestions. Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 2 13:59:30 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mojave.sitaranetworks.com (mojave.sitaranetworks.com [199.103.141.157]) by hub.freebsd.org (Postfix) with ESMTP id C92911546E; Tue, 2 Nov 1999 13:59:08 -0800 (PST) (envelope-from grog@lemis.com) Message-ID: <19991102154614.55760@mojave.sitaranetworks.com> Date: Tue, 2 Nov 1999 15:46:14 -0500 From: Greg Lehey To: Eivind Eklund , Don Cc: Jacques Vidrine , freebsd-fs@FreeBSD.org Subject: Re: journaling UFS and LFS Reply-To: Greg Lehey References: <19991030233304.03DB31DA4@bone.nectar.com> <19991101171936.J72085@bitbox.follo.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <19991101171936.J72085@bitbox.follo.net>; from Eivind Eklund on Mon, Nov 01, 1999 at 05:19:36PM +0100 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Monday, 1 November 1999 at 17:19:36 +0100, Eivind Eklund wrote: > On Sat, Oct 30, 1999 at 07:40:35PM -0400, Don wrote: >> This is getting off topic. What features would you like to see in a new >> file system. Some suggestions were made. Would you like to add anything to >> this list? > > Yes. > * Easy to do concurrent access from multiple hosts to the same > physical media You can never do this in the general case (where any host may request access to any part of the disk). The best you could do there is a file server, but they're not quite our terms of reference. > * Ability to span more than one disk That's not necessarily a file system feature. Vinum does that now. > I have design papers on the FS designed for G2, which was intended to > support all of the features I've seen listed so far. 
It has a couple > of drawbacks: > (1) It is not designed to have the semantics of a standard Unix > filesystem. That doesn't surprise me, if you want to implement the first of your suggestions. Is there anything in there which would be of interest in our environment? Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 2 15:54:56 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.du.gtn.com (mail.du.gtn.com [194.77.9.57]) by hub.freebsd.org (Postfix) with ESMTP id B752D14E3E for ; Tue, 2 Nov 1999 15:54:32 -0800 (PST) (envelope-from ticso@mail.cicely.de) Received: from mail.cicely.de (cicely.de [194.231.9.142]) by mail.du.gtn.com (8.9.3/8.9.3) with ESMTP id AAA22553; Wed, 3 Nov 1999 00:47:42 +0100 (MET) Received: (from ticso@localhost) by mail.cicely.de (8.9.0/8.9.0) id AAA88098; Wed, 3 Nov 1999 00:54:16 +0100 (CET) Date: Wed, 3 Nov 1999 00:54:16 +0100 From: Bernd Walter To: Greg Lehey Cc: Rodney , freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs Message-ID: <19991103005415.A88044@cicely7.cicely.de> References: <19991031120514.A28103@xs4all.nl> <19991102123553.21474@mojave.sitaranetworks.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: <19991102123553.21474@mojave.sitaranetworks.com> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Tue, Nov 02, 1999 at 12:35:53PM -0500, Greg Lehey wrote: > On Sunday, 31 October 1999 at 12:05:14 +0100, Rodney wrote: > > > > > > hi, > > > > here's my list of features I'd like to see in a > > journalled fs. 
Have to admit this list is heavily > > inspired ( ok , copied ) from the VxFS features, > > apart from the buzz words, > > some of them make sense, some of them don't > > but it should give us some stuff to discuss: > > [snip] > > 6) vinum integration (vague) > > Vinum is just a virtual disk. As such, any file system should work on > it. > It is more than that - it is a volume manager. Maybe you are not aware of how far it goes beyond a virtual disk. It manages disks and can find its drives properly even if they have changed devices - that works really well; I was able to remove nearly all wired configurations for drives, and I even run a volume with only a single-drive plex, just to get this feature. It can (or should be able to) resize a volume and should inform the system about it. I have some ideas about how to make FFS resizable without needing to freeze or umount it first and without losing inodes. Vinum is the frontend for managing the size of the volume, and it should inform the fs driver about any change, because there should be no need to manually call an additional tool. My point involves modifying FFS, but the same would apply to any fs. -- B.Walter COSMO-Project http://www.cosmo-project.de ticso@cicely.de Usergroup info@cosmo-project.de To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 2 16:17:14 1999 Delivered-To: freebsd-fs@freebsd.org Received: from kronos.alcnet.com (kronos.alcnet.com [63.69.28.22]) by hub.freebsd.org (Postfix) with ESMTP id E046A152F8 for ; Tue, 2 Nov 1999 16:17:02 -0800 (PST) (envelope-from kbyanc@posi.net) X-Provider: ALC Communications, Inc. 
http://www.alcnet.com/ Received: from localhost (kbyanc@localhost) by kronos.alcnet.com (8.9.3/8.9.3/antispam) with ESMTP id TAA68704; Tue, 2 Nov 1999 19:16:55 -0500 (EST) Date: Tue, 2 Nov 1999 19:16:55 -0500 (EST) From: Kelly Yancey X-Sender: kbyanc@kronos.alcnet.com To: Bernd Walter Cc: Rodney , freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs In-Reply-To: <19991103005415.A88044@cicely7.cicely.de> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Wed, 3 Nov 1999, Bernd Walter wrote: > On Tue, Nov 02, 1999 at 12:35:53PM -0500, Greg Lehey wrote: > > On Sunday, 31 October 1999 at 12:05:14 +0100, Rodney wrote: > > > > > > > > > hi, > > > > > > here's my list of features I'd like to see in a > > > journalled fs. Have to admit this list is heavily > > > inspired ( ok , copied ) from the VxFS features, > > > apart from th buzz words, > > > some of them make sense, some of them don't > > > but it should give us some stuff to discus: > > > [snip] > > > 6) vinum integration (vague) > > > > Vinum is just a virtual disk. As such, any file system should work on > > it. > > > It is more than that - it is a volume manager. > Maybe you are not clear how far you got beyound the virtual disk. > It manages disks and can find it's drive properly if they changed devices - > that's working relay fine that I was able to remove nearly all wire > configurations for drives and I'm eaven run a volume with only one single > drive plex - just to get this feature. > It can (or should be able to) resize a volume and should inform the system > about. I am under the impression that you can only enlarge a vinum volume if it in a RAID 0 configuration (concatenation). Obviously, it would be very difficult to enlarge a RAID 1 or RAID 5 configuration as it would require restriping the data across all disks; I'm not familiar with any product, hardware or software, that can do this. 
Besides, this would be an issue for any RAID controller as well. Anyone with a RAID controller can add a new disk to their RAID 0 and enlarge the virtual disk. Those controllers aren't going to tell you about the increased disk size any more than vinum does. Beyond that, who is to say that the entire size of the new, enlarged virtual disk is supposed to be dedicated to FFS? Is it not possible, however unlikely, for a sysadmin to add disk space to a RAID array and partition the new space as, say, FAT32?

I think what Greg was getting at is that, as far as the file system is concerned, vinum just looks like a disk. Whatever else vinum may be, to the file system it just looks like a disk.

> I have some ideas about how to get FFS resizeable without needing to
> freeze or umount it first and without losing inodes.

This is great, but I think that "vinum hooks" are no more needed than "ccd hooks" or "DPT hooks". User-land tools should allow the administrator to resize the file system at the administrator's discretion. Beyond the technical issues of providing hooks to automatically extend file systems, there is the social question of whether that is what the user wanted. User-land tools solve both problems.

> Vinum is the frontend for managing the size of the volume and it should
> inform the fs driver about any change, because there is no need to
> manually call an additional tool.
> My point is modifying FFS but that's the same for any fs.

No (see above). Forget about vinum, just worry about disks. Vinum will play nice and pretend to be a disk. In the end you will have a cleaner solution that plays nice with others too. Everyone will love the fact that they can extend any disk, on command, either by adding drives to their vinum config, adding drives to their hardware RAID array, or finally wiping Windows off their home system.
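The grow case both sides keep returning to is mechanically simple because FFS allocates space in fixed-size cylinder groups: growing in place amounts to appending whole new cylinder groups after the existing ones, so no existing inode or data block has to move. A minimal sketch of the size arithmetic only, under that assumption (the helper name is hypothetical, not actual FFS code):

```c
#include <assert.h>

/*
 * Hypothetical helper: how many whole cylinder groups can be appended
 * when the underlying device grows from old_blocks to new_blocks?
 * In this sketch only complete cylinder groups are added; any
 * remainder stays unused until the volume grows further.
 */
static long extra_cylinder_groups(long old_blocks, long new_blocks,
                                  long blocks_per_cg)
{
    if (new_blocks <= old_blocks || blocks_per_cg <= 0)
        return 0;               /* nothing (or nothing usable) to add */
    return (new_blocks - old_blocks) / blocks_per_cg;
}
```

A userland grow tool would compute this, write the new cylinder-group metadata into the added space, and then update the superblock totals - which is why the operation can leave every existing inode untouched.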
-- Kelly Yancey - kbyanc@posi.net - Richmond, VA Director of Technical Services, ALC Communications http://www.alcnet.com/ Maintainer, BSD Driver Database http://www.posi.net/freebsd/drivers/ Coordinator, Team FreeBSD http://www.posi.net/freebsd/Team-FreeBSD/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 2 16:21:59 1999 Delivered-To: freebsd-fs@freebsd.org Received: from kronos.alcnet.com (kronos.alcnet.com [63.69.28.22]) by hub.freebsd.org (Postfix) with ESMTP id 4B484152B2 for ; Tue, 2 Nov 1999 16:21:56 -0800 (PST) (envelope-from kbyanc@posi.net) X-Provider: ALC Communications, Inc. http://www.alcnet.com/ Received: from localhost (kbyanc@localhost) by kronos.alcnet.com (8.9.3/8.9.3/antispam) with ESMTP id TAA83776; Tue, 2 Nov 1999 19:21:51 -0500 (EST) Date: Tue, 2 Nov 1999 19:21:51 -0500 (EST) From: Kelly Yancey X-Sender: kbyanc@kronos.alcnet.com To: Bernd Walter Cc: Rodney , freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > I am under the impression that you can only enlarge a vinum volume if it > in a RAID 0 configuration (concatenation). Obviously, it would be very > difficult to enlarge a RAID 1 or RAID 5 configuration as it would require > restriping the data across all disks; I'm not familiar with any product, > hardware or software, that can do this. Oops, my mistake. Scratch the RAID 1, mirroring should be relatively simple to extend. But the rest of the discussion is still valid, I think. 
Kelly -- Kelly Yancey - kbyanc@posi.net - Richmond, VA Director of Technical Services, ALC Communications http://www.alcnet.com/ Maintainer, BSD Driver Database http://www.posi.net/freebsd/drivers/ Coordinator, Team FreeBSD http://www.posi.net/freebsd/Team-FreeBSD/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 3 1:21:41 1999 Delivered-To: freebsd-fs@freebsd.org Received: from ns1.yes.no (ns1.yes.no [195.204.136.10]) by hub.freebsd.org (Postfix) with ESMTP id 53BA1157CF for ; Wed, 3 Nov 1999 01:21:33 -0800 (PST) (envelope-from eivind@bitbox.follo.net) Received: from bitbox.follo.net (bitbox.follo.net [195.204.143.218]) by ns1.yes.no (8.9.3/8.9.3) with ESMTP id KAA05915; Wed, 3 Nov 1999 10:19:03 +0100 (CET) Received: (from eivind@localhost) by bitbox.follo.net (8.8.8/8.8.6) id KAA85241; Wed, 3 Nov 1999 10:18:58 +0100 (MET) Date: Wed, 3 Nov 1999 10:18:58 +0100 From: Eivind Eklund To: Greg Lehey Cc: Don , Jacques Vidrine , freebsd-fs@FreeBSD.org Subject: Re: journaling UFS and LFS Message-ID: <19991103101858.E72085@bitbox.follo.net> References: <19991030233304.03DB31DA4@bone.nectar.com> <19991101171936.J72085@bitbox.follo.net> <19991102154614.55760@mojave.sitaranetworks.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: <19991102154614.55760@mojave.sitaranetworks.com>; from grog@lemis.com on Tue, Nov 02, 1999 at 03:46:14PM -0500 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Tue, Nov 02, 1999 at 03:46:14PM -0500, Greg Lehey wrote: > On Monday, 1 November 1999 at 17:19:36 +0100, Eivind Eklund wrote: > > On Sat, Oct 30, 1999 at 07:40:35PM -0400, Don wrote: > >> This is getting off topic. What features would you like to see in a new > >> file system. Some suggestions were made. Would you like to add anything to > >> this list? > > > > Yes. 
> > * Easy to do concurrent access from multiple hosts to the same
> >   physical media
>
> You can never do this in the general case (where any host may request
> access to any part of the disk).  The best you could do there is a
> file server, but they're not quite our terms of reference.

I don't get this. To give a little more detail on what I mean: you have the FS export a bunch of locks into the DLM (Distributed Lock Manager) you are running (probably over the bus you use to share access to the disks, but you can use another connection medium as long as it is there), and the host that wants to do something to some part of the FS grabs the relevant lock. You also design the disk layout to allow writing in a transactional way, so a host failing while it holds a lock doesn't hurt the other hosts accessing the same physical media. I don't get what "general case" there is, as you're designing the system - could you please explain?

> > * Ability to span more than one disk
>
> That's not necessarily a file system feature.  Vinum does that now.

Sure. The reason for having it in the FS is that you can optimize for the independence of your spindles. This lets you:

* Write logs and data to separate spindles (increasing performance)
* Give performance guarantees proportional to the number and features of
  your spindles, instead of being limited by what your weakest link can
  do (times one)
* Optimize data layout to be able to do a semi-recovery after losing one
  of your spindles
* (irrelevant unless we extend the userland interface, which was planned
  for G2) Give different guarantees for different files in the same
  namespace.  You may need RAID-0 to get the speed wanted for one
  non-critical file, while wanting RAID-5 to store a file that needs safe
  storage but doesn't need fast streaming.

> > I have design papers on the FS designed for G2, which was intended to
> > support all of the features I've seen listed so far.
> > It has a couple of drawbacks:
> > (1) It is not designed to have the semantics of a standard Unix
> >     filesystem.
>
> That doesn't surprise me, if you want to implement the first of your
> suggestions.

Actually, that's not a problem - but we decided against pushing any complexity into the bottom-end filesystem if we could do it well in a stacking layer.

> Is there anything in there which would be of interest in our
> environment?

As I said, it supports all features I've seen mentioned (by anybody) so far in the discussion. Its most significant design goal was to support Highly Available Systems; that is, clusters. The design allows more than one machine in a cluster to access a shared disk with a HAS-FS on it, with the system as a whole surviving the (unplanned) loss of any individual member. I think we ended up supporting transactions built from several file operations in a multi-machine context, too, but I'm not 100% sure (it is almost 1 1/2 years since Simon and I did the design, which was done during a single three-week session in the same physical location, and I've not worked with the spec since).

Eivind.
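The shared-disk scheme Eivind describes can be modeled in miniature. Below is a toy, single-process sketch of the DLM idea (all names hypothetical - a real DLM distributes this state over the cluster interconnect): each shared FS resource has a lock a host must hold before touching it, and a dead host's locks can simply be broken because on-disk updates are transactional.

```c
#include <assert.h>
#include <stddef.h>

/* One lock per shared FS resource (say, an allocation group). */
struct dlm_lock {
    int owner;                  /* 0 = unlocked, otherwise a host id */
};

/* Try to take the lock for `host`; returns 1 on success, 0 if busy.
 * Re-acquiring a lock you already hold succeeds. */
static int dlm_acquire(struct dlm_lock *l, int host)
{
    if (l->owner == 0 || l->owner == host) {
        l->owner = host;
        return 1;
    }
    return 0;
}

static void dlm_release(struct dlm_lock *l, int host)
{
    if (l->owner == host)
        l->owner = 0;
}

/*
 * When the cluster declares a host dead, its locks are broken.  This is
 * only safe because the disk layout is transactional: a half-finished
 * update by the dead host never becomes visible to the survivors.
 */
static void dlm_break_locks(struct dlm_lock *locks, size_t n, int host)
{
    for (size_t i = 0; i < n; i++)
        if (locks[i].owner == host)
            locks[i].owner = 0;
}
```

The interesting property is the last function: lock recovery after an unplanned host loss is what separates this design from a plain mutex per resource.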
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 3 1:54: 1 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.du.gtn.com (mail.du.gtn.com [194.77.9.57]) by hub.freebsd.org (Postfix) with ESMTP id 5E1FB15530 for ; Wed, 3 Nov 1999 01:53:55 -0800 (PST) (envelope-from ticso@mail.cicely.de) Received: from mail.cicely.de (cicely.de [194.231.9.142]) by mail.du.gtn.com (8.9.3/8.9.3) with ESMTP id KAA02544; Wed, 3 Nov 1999 10:46:59 +0100 (MET) Received: (from ticso@localhost) by mail.cicely.de (8.9.0/8.9.0) id KAA90686; Wed, 3 Nov 1999 10:53:33 +0100 (CET) Date: Wed, 3 Nov 1999 10:53:33 +0100 From: Bernd Walter To: Kelly Yancey Cc: Bernd Walter , Rodney , freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs Message-ID: <19991103105333.A89617@cicely7.cicely.de> References: <19991103005415.A88044@cicely7.cicely.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Tue, Nov 02, 1999 at 07:16:55PM -0500, Kelly Yancey wrote: > On Wed, 3 Nov 1999, Bernd Walter wrote: > > > On Tue, Nov 02, 1999 at 12:35:53PM -0500, Greg Lehey wrote: > > > On Sunday, 31 October 1999 at 12:05:14 +0100, Rodney wrote: > > > > > > > > > > > > hi, > > > > > > > > here's my list of features I'd like to see in a > > > > journalled fs. Have to admit this list is heavily > > > > inspired ( ok , copied ) from the VxFS features, > > > > apart from th buzz words, > > > > some of them make sense, some of them don't > > > > but it should give us some stuff to discus: > > > > [snip] > > > > 6) vinum integration (vague) > > > > > > Vinum is just a virtual disk. As such, any file system should work on > > > it. > > > > > It is more than that - it is a volume manager. > > Maybe you are not clear how far you got beyound the virtual disk. 
> > It manages disks and can find its drives properly if they changed
> > devices - that works reliably enough that I was able to remove nearly
> > all wired-down configurations for drives, and I even run a volume with
> > only one single drive plex - just to get this feature.
> > It can (or should be able to) resize a volume and should inform the
> > system about it.
>
> I am under the impression that you can only enlarge a vinum volume if it
> is in a RAID 0 configuration (concatenation). Obviously, it would be very
> difficult to enlarge a RAID 1 or RAID 5 configuration as it would require
> restriping the data across all disks; I'm not familiar with any product,
> hardware or software, that can do this.

In the case of striping, which applies to RAID 5 and striped RAID 0 configurations, it is indeed not simple to do. But think of a RAID 5 volume which is extended by concatenating another RAID 5 set. This is not doable with vinum - but I'm sure that won't change before anyone actually wants to use such a feature.

> Besides the fact that this would be an issue for any RAID controller

No.
Most controllers I have seen increase the size of a disk - not a volume.

> also. Anyone with a RAID controller can add a new disk to their RAID 0 and
> enlarge the virtual disk. Those controllers aren't going to tell you about
> the increased disk size any more than vinum does. Beyond that, who is to

They don't need to, because the partition the fs is on won't increase when the virtual disk gets bigger.

> say that the entire size of the new, enlarged, virtual disk is supposed
> to be dedicated to FFS. Is it not possible, however unlikely, for a
> sysadmin to add disk space to a RAID array and partition it as say FAT32?

That's why it may be interesting to add such hooks to disklabel.

> I think what Greg was getting at as far as the file system is concerned,
> vinum just looks like a disk.
> > > I have some ideas about how to get FFS resizeable without needing to
> > > freeze or umount it first and without losing inodes.
> >
> > This is great, but I think that "vinum hooks" are no more needed than
> > "ccd hooks" or "DPT hooks". User-land tools should allow the
> > administrator to resize the file system at the administrator's
> > discretion. Beyond the technical issues of providing hooks to
> > automatically extend file systems, there is the social implication of
> > whether that is what the user wanted. User-land tools solve both
> > problems.

DPT hooks would be pointless because those controllers don't change the size of a partition. ccd volumes should be partitioned too, and ccd is not that useful any more compared to vinum. vinum and disklabel are the places for such hooks, but I think vinum is the more useful one. Greg is already about to implement spare disk support. What about a kind of spare disk which is scheduled to enlarge an FS automatically when it runs out of space? Features like this need interaction between the fs and the volume manager. Of course hardware RAIDs are a consideration too - but that's more difficult.

> > Vinum is the frontend for managing the size of the volume and it should
> > inform the fs driver about any change, because there is no need to
> > manually call an additional tool.
> > My point is modifying FFS but that's the same for any fs.
>
> No (see above). Forget about vinum, just worry about disks. Vinum will
> play nice and pretend to be a disk. In the end you will have a cleaner
> solution that plays nice with others too. Everyone will love the fact
> that they can extend any disk, on command, either by adding drives to
> their vinum config, their hardware RAID array, or finally wiping Windows
> off their home system.

I don't want vinum or anything else like it to know how to resize an fs, but I want it to be able to call the needed tools automatically.
Think of shrinking: first you have to find out how big the new partition will become, then you need to shrink the fs, and finally you have to shrink the volume. Three steps, each with the possibility to shoot yourself in the foot. If vinum calls the tool and says "the user wants this volume shrunk by 134 MB; do what is needed so I can do what the user wants", it is easier and less likely to get you into trouble.

--
B.Walter              COSMO-Project         http://www.cosmo-project.de
ticso@cicely.de       Usergroup             info@cosmo-project.de

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message

From owner-freebsd-fs Wed Nov 3 8:40:41 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from kronos.alcnet.com (kronos.alcnet.com [63.69.28.22]) by hub.freebsd.org (Postfix) with ESMTP id 36BBD15103 for ; Wed, 3 Nov 1999 08:40:35 -0800 (PST) (envelope-from kbyanc@posi.net)
X-Provider: ALC Communications, Inc. http://www.alcnet.com/
Received: from localhost (kbyanc@localhost) by kronos.alcnet.com (8.9.3/8.9.3/antispam) with ESMTP id LAA27925; Wed, 3 Nov 1999 11:40:24 -0500 (EST)
Date: Wed, 3 Nov 1999 11:40:24 -0500 (EST)
From: Kelly Yancey
X-Sender: kbyanc@kronos.alcnet.com
To: Bernd Walter
Cc: freebsd-fs@FreeBSD.ORG
Subject: Re: feature list journalled fs
In-Reply-To: <19991103105333.A89617@cicely7.cicely.de>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Wed, 3 Nov 1999, Bernd Walter wrote:
> >
> > I am under the impression that you can only enlarge a vinum volume if it
> > is in a RAID 0 configuration (concatenation). Obviously, it would be
> > very difficult to enlarge a RAID 1 or RAID 5 configuration as it would
> > require restriping the data across all disks; I'm not familiar with any
> > product, hardware or software, that can do this.
> > In case of Striping which is valid for Raid5 and concatenated Raid0 configrations > it is not simply possible to do. > But think of a Raid5 volume which is extended with concatenating another Raid5 set. > This is not doable with vinum - but I'm shure that this won't happen before anyone > is using such a feature feature. That sounds more like a RAID 5/0 config. While I've never seen a hardware vendor advertise support for such a creature, it should theoretically be possible. However, vinum volumes can only provide mirroring between plexes so it is impossible for vinum to extend a volume composed of RAID 5 plexes via concatenation. On the other hand, I see that Greg has "Extending striped and RAID-5 plexes" on his TODO list for vinum, presumably by [shudder] restriping everything. > > > Besides the fact that this would be an issue for any RAID controller > > No. > Most Controllers I have seen increases the size of a disk - not a volume. Sorry, I was thinking about the software in RAID controllers in the same terms as vinum. You are correct, though, that to the OS it appears as a single disk which has been enlarged. The same thing, though, is true with vinum; it should appear simply as though the disk were enlarged (albeit a "virtual disk"). No file system should care whether a disk is a "real" disk or a "virtual" disk or else a "virtual" disk isn't very virtual. > > > also. Anyone with a RAID controller can add a new disk to their RAID 0 and > > enlarge the virtual disk. Those controllers aren't going to tell you about > > the increased disk size any more than vinum does. Beyond that, who is to > > They don't need, because the partition the fs is on won't increases if the > virtual disk is getting bigger. I need to clarify terminology here just for myself, because otherwise we're getting into confusing territory... partition: UNIX-style partitions of which there can be 8 (lettered a-h); exist in the disklabel of a slice. 
slice: PC-style partitioning of disk space, of which there can be 4;
       these exist in the master boot record.

vinum doesn't support partitions; I don't know whether it supports slices.

Now, if vinum supports slices, then vinum doesn't care what filesystem one puts on it (i.e. how it is sliced up). In which case, one could use vinum to manage a virtual disk with NTFS on one slice and FFS on another. However, if it does not support slices, which I suspect it doesn't, then the entire volume must be dedicated to a single file system. So arguably, yes, if someone were to extend the size of the virtual disk (presumably by adding physical disks to the plex), it would be reasonable to assume that any existing filesystem should be extended to fill the new space.

What I can't figure out is why Greg doesn't support slicing / partitioning the virtual disk (this is really the only thing that prevents it from being 100% transparent in my estimation). With an MBR, vinum could be used to hold any filesystem (e.g. NTFS, ext2, or FAT32) or any combination thereof; with a disklabel, vinum wouldn't require kludges like newfs -v.

> > say that the entire size of the new, enlarged, virtual disk is supposed
> > to be dedicated to FFS. Is it not possible, however unlikely, for a
> > sysadmin to add disk space to a RAID array and partition it as say
> > FAT32?
>
> That's why it may be interesting to add such hooks to disklabel.

You are saying that when someone updates the disklabel to specify a larger partition, the hooks would be used to notify the filesystem, which could then do the dirty work? You haven't happened to visit the Pacific Northwest recently, perhaps near the town of Redmond, WA? :) Seriously, such hooks would have to be in the kernel, not the disklabel program, on the off chance someone uses a tool other than disklabel to edit the partition table.

> > I think what Greg was getting at is that, as far as the file system is
> > concerned, vinum just looks like a disk.
> > Whatever else vinum may be, to the file system it just looks like a
> > disk.
> >
> > > I have some ideas about how to get FFS resizeable without needing to
> > > freeze or umount it first and without losing inodes.
> >
> > This is great, but I think that "vinum hooks" are no more needed than
> > "ccd hooks" or "DPT hooks". User-land tools should allow the
> > administrator to resize the file system at the administrator's
> > discretion. Beyond the technical issues of providing hooks to
> > automatically extend file systems, there is the social implication of
> > whether that is what the user wanted. User-land tools solve both
> > problems.
>
> DPT hooks would be pointless because those controllers don't change the
> size of a partition. ccd volumes should be partitioned too, and ccd is
> not that useful any more compared to vinum.
> vinum and disklabel are the places for such hooks, but I think vinum is
> the more useful one.
> Greg is already about to implement spare disk support.
> What about a kind of spare disk which is scheduled to enlarge an FS
> automatically when it runs out of space?
> Features like this need interaction between the fs and the volume
> manager.
> Of course hardware RAIDs are a consideration too - but that's more
> difficult.

Basically what we need is a filesystem-specific resize function: userland tools could use a syscall to request that a filesystem be resized, and the filesystem itself would do the implementation. Assuming vinum remains the special case of only allowing one file system on it, it would be safe for it to call the filesystem resize routine when it brings the spare on-line. However, personally I would like to see vinum become a true virtual disk, allowing multiple file systems. In that case, I don't see where anything other than userland tools would access this interface.

> > No (see above). Forget about vinum, just worry about disks. Vinum will
> > play nice and pretend to be a disk. In the end you will have a cleaner
> > solution that plays nice with others too.
> > Everyone will love the fact that they can extend any disk, on command,
> > either by adding drives to their vinum config, their hardware RAID
> > array, or finally wiping Windows off their home system.
>
> I don't want vinum or anything else like it to know how to resize an fs,
> but I want it to be able to call the needed tools automatically.
> Think of shrinking: first you have to find out how big the new partition
> will become, then you need to shrink the fs, and finally you have to
> shrink the volume.
> Three steps, each with the possibility to shoot yourself in the foot.
> If vinum calls the tool and says "the user wants this volume shrunk by
> 134 MB; do what is needed so I can do what the user wants", it is easier
> and less likely to get you into trouble.

This is nice in theory. The tools should still be there to access the functionality, though. My only question is: how does vinum *know* what you want to do? Clearly, in its current state, it is easy to determine when to enlarge a filesystem (basically whenever more space is available); but you can't *know* when the user wants to shrink the filesystem. Userland tools are the only way for the user to tell you.
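The ordering constraint both sides of this exchange circle around can be pinned down: growing must enlarge the volume before the filesystem, while shrinking must shrink the filesystem first and reduce the volume only if that succeeded. A minimal sketch of that sequencing (hypothetical names, not a real vinum or FFS interface):

```c
#include <assert.h>

/* The two-step plans an orchestrating tool (or vinum itself) would
 * follow.  Hypothetical sketch; no such enum exists in FreeBSD. */
enum resize_step { GROW_VOLUME, GROW_FS, SHRINK_FS, SHRINK_VOLUME };

/*
 * Fill in the two ordered steps for resizing from cur to want blocks.
 * Growing makes room first; shrinking vacates blocks first, so the
 * volume is never cut out from under live filesystem data.
 * Returns 0 on success, -1 if the request is invalid or a no-op.
 */
static int plan_resize(long cur, long want, enum resize_step out[2])
{
    if (cur <= 0 || want <= 0 || cur == want)
        return -1;
    if (want > cur) {
        out[0] = GROW_VOLUME;   /* enlarge the container... */
        out[1] = GROW_FS;       /* ...then let the fs claim it */
    } else {
        out[0] = SHRINK_FS;     /* evacuate blocks first... */
        out[1] = SHRINK_VOLUME; /* ...only then cut the volume */
    }
    return 0;
}
```

Whether the caller of `plan_resize` is a userland tool (Kelly's position) or the volume manager itself (Bernd's), the step order is the same; the disagreement is only about who triggers it.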
Kelly -- Kelly Yancey - kbyanc@posi.net - Richmond, VA Director of Technical Services, ALC Communications http://www.alcnet.com/ Maintainer, BSD Driver Database http://www.posi.net/freebsd/drivers/ Coordinator, Team FreeBSD http://www.posi.net/freebsd/Team-FreeBSD/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 3 9:29:50 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.du.gtn.com (mail.du.gtn.com [194.77.9.57]) by hub.freebsd.org (Postfix) with ESMTP id A0C68154D7 for ; Wed, 3 Nov 1999 09:29:32 -0800 (PST) (envelope-from ticso@mail.cicely.de) Received: from mail.cicely.de (cicely.de [194.231.9.142]) by mail.du.gtn.com (8.9.3/8.9.3) with ESMTP id SAA10842; Wed, 3 Nov 1999 18:22:38 +0100 (MET) Received: (from ticso@localhost) by mail.cicely.de (8.9.0/8.9.0) id SAA92054; Wed, 3 Nov 1999 18:29:13 +0100 (CET) Date: Wed, 3 Nov 1999 18:29:13 +0100 From: Bernd Walter To: Kelly Yancey Cc: Bernd Walter , freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs Message-ID: <19991103182912.A92011@cicely7.cicely.de> References: <19991103105333.A89617@cicely7.cicely.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Wed, Nov 03, 1999 at 11:40:24AM -0500, Kelly Yancey wrote: > On Wed, 3 Nov 1999, Bernd Walter wrote: > > That sounds more like a RAID 5/0 config. While I've never seen a > hardware vendor advertise support for such a creature, it should > theoretically be possible. That's what I mean. With the Metadisk software on Solaris it is possible to do. > However, vinum volumes can only provide mirroring between plexes so > it is impossible for vinum to extend a volume composed of RAID 5 plexes > via concatenation. 
> On the other hand, I see that Greg has "Extending striped and RAID-5
> plexes" on his TODO list for vinum, presumably by [shudder] restriping
> everything.

I assume Greg will do the right thing, so everyone should be happy.

> Sorry, I was thinking about the software in RAID controllers in the same
> terms as vinum. You are correct, though, that to the OS it appears as a
> single disk which has been enlarged. The same thing, though, is true with
> vinum; it should appear simply as though the disk were enlarged (albeit a
> "virtual disk").
> No file system should care whether a disk is a "real" disk or a
> "virtual" disk or else a "virtual" disk isn't very virtual.

To be exact, vinum does not create a disk in the usual way. The volume it creates doesn't get partitioned like the ccd ones do.

> vinum doesn't support partitions; I don't know whether it supports
> slices.
>
> Now, if vinum supports slices, then vinum doesn't care what filesystem
> one puts on it (i.e. how it is sliced up). In which case, one could use
> vinum to manage a virtual disk with NTFS on one slice and FFS on another.
> However, if it does not support slices, which I suspect it doesn't, then
> the entire volume must be dedicated to a single file system. So
> arguably, yes, if someone were to extend the size of the virtual disk
> (presumably by adding physical disks to the plex), it would be reasonable
> to assume that any existing filesystem should be extended to fill the new
> space.

I don't see the need for doing that. If you want to have - say - a RAID 5 volume partitioned, you can just as well create two volumes, each with one RAID 5 plex. The layout on the disk should be the same.

> What I can't figure out is why Greg doesn't support slicing
> / partitioning the virtual disk (this is really the only thing that
> prevents it from being 100% transparent in my estimation). With an
> MBR, vinum could be used to hold any filesystem (e.g. NTFS, ext2, or
> FAT32) or any combination thereof; with a disklabel, vinum wouldn't
> require kludges like newfs -v.

That's a drive-naming thing, not the label. Vinum creates an artificial label, and you only need to use newfs -v in some cases. I usually name my volumes d0, d1, d2, ... and then I don't need the -v switch. If I remember this right, the volume name needs to end in 0-9 or a-h. Vinum volumes are usable only with an operating system supporting vinum anyway; it is more an fs issue that limits their further use. The partition name may be a point - I never thought about it.

> > > say that the entire size of the new, enlarged, virtual disk is
> > > supposed to be dedicated to FFS. Is it not possible, however
> > > unlikely, for a sysadmin to add disk space to a RAID array and
> > > partition it as say FAT32?
> >
> > That's why it may be interesting to add such hooks to disklabel.
>
> You are saying that when someone updates the disklabel to specify a
> larger partition, the hooks would be used to notify the filesystem, which
> could then do the dirty work?
> You haven't happened to visit the Pacific Northwest recently, perhaps
> near the town of Redmond, WA? :) Seriously, such hooks would have to be
> in the kernel, not the disklabel program, on the off chance someone uses
> a tool other than disklabel to edit the partition table.

That's an option too - but of course anyone should always be able to do everything manually.

> Basically what we need is a filesystem-specific resize function: userland
> tools could use a syscall to request that a filesystem be resized, and
> the filesystem itself would do the implementation. Assuming vinum remains
> the special case of only allowing one file system on it, it would be safe
> for it to call the filesystem resize routine when it brings the spare
> on-line. However, personally I would like to see vinum become a true
> virtual disk, allowing multiple file systems.
> In which case, I don't see where anything other than userland tools would
> access this interface.

In my opinion vinum should not remain a special case but become the usual one. Vinum brings the toolset to manage and handle volumes - why not implement hooks for the dependencies?

> > > No (see above). Forget about vinum, just worry about disks. Vinum
> > > will play nice and pretend to be a disk. In the end you will have a
> > > cleaner solution that plays nice with others too. Everyone will love
> > > the fact that they can extend any disk, on command, either by adding
> > > drives to their vinum config, their hardware RAID array, or finally
> > > wiping Windows off their home system.
> >
> > I don't want vinum or anything else like it to know how to resize an
> > fs, but I want it to be able to call the needed tools automatically.
> > Think of shrinking: first you have to find out how big the new
> > partition will become, then you need to shrink the fs, and finally you
> > have to shrink the volume.
> > Three steps, each with the possibility to shoot yourself in the foot.
> > If vinum calls the tool and says "the user wants this volume shrunk by
> > 134 MB; do what is needed so I can do what the user wants", it is
> > easier and less likely to get you into trouble.
>
> This is nice in theory. The tools should still be there to access the
> functionality, though. My only question is: how does vinum *know* what
> you want to do? Clearly, in its current state, it is easy to determine
> when to enlarge a filesystem (basically whenever more space is
> available); but you can't *know* when the user wants to shrink the
> filesystem. Userland tools are the only way for the user to tell you.

You tell vinum to shrink the volume. Vinum tells the fs tool that the volume is about to become smaller, and by how much. The fs tool asks the user whether he wants to do this, after some sanity checks.
If the user declines, nothing happens; if the fs shrinks successfully, vinum shrinks the volume; and if shrinking the fs fails, vinum should refuse to proceed. The missing key is how to determine which kind of fs has to be handled, but that's mostly a matter of definition.

--
B.Walter              COSMO-Project         http://www.cosmo-project.de
ticso@cicely.de       Usergroup             info@cosmo-project.de

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message

From owner-freebsd-fs Wed Nov 3 11:43:58 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from mojave.sitaranetworks.com (mojave.sitaranetworks.com [199.103.141.157]) by hub.freebsd.org (Postfix) with ESMTP id 31B4F1502E for ; Wed, 3 Nov 1999 11:43:49 -0800 (PST) (envelope-from grog@mojave.sitaranetworks.com)
Message-ID: <19991103144037.41321@mojave.sitaranetworks.com>
Date: Wed, 3 Nov 1999 14:40:37 -0500
From: Greg Lehey
To: Kelly Yancey , Bernd Walter
Cc: freebsd-fs@FreeBSD.ORG
Subject: Re: feature list journalled fs
Reply-To: Greg Lehey
References: <19991103105333.A89617@cicely7.cicely.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: ; from Kelly Yancey on Wed, Nov 03, 1999 at 11:40:24AM -0500
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Wednesday, 3 November 1999 at 11:40:24 -0500, Kelly Yancey wrote:
> On Wed, 3 Nov 1999, Bernd Walter wrote:
>>>
>>> I am under the impression that you can only enlarge a vinum volume if
>>> it is in a RAID 0 configuration (concatenation). Obviously, it would be
>>> very difficult to enlarge a RAID 1 or RAID 5 configuration as it would
>>> require restriping the data across all disks; I'm not familiar with any
>>> product, hardware or software, that can do this.
>>
>> In the case of striping, which applies to RAID 5 and striped RAID 0
>> configurations, it is not simple to do.  But think of a
>> RAID 5 volume which is extended by concatenating another RAID 5
>> set.
This is not doable with vinum - but I'm sure that this won't >> happen before anyone is using such a feature. > > That sounds more like a RAID 5/0 config. While I've never seen a > hardware vendor advertise support for such a creature, it should > theoretically be possible. > However, vinum volumes can only provide mirroring between plexes, so > it is impossible for vinum to extend a volume composed of RAID 5 plexes > via concatenation. On the other hand, I see that Greg has "Extending > striped and RAID-5 plexes" on his TODO list for vinum, presumably by > [shudder] restriping everything. That's what I'm thinking of. Yes, it's slow and ugly, but it's a function that people want. The obvious current alternative is to back up the entire volume to tape, rebuild the volume (including reinitializing in the case of RAID-5) and restore the data. By comparison, restriping looks pretty :-) There is another way to do this now, on line, if you have enough disk: create another plex, start it, remove the original plex, remodel it nearer to the heart's desire, and start it. It's slow, but not as slow as backing up to tape, and you can continue to access the volume while you're doing it. >>> Besides the fact that this would be an issue for any RAID controller >> >> No. >> Most controllers I have seen increase the size of a disk - not a volume. > > Sorry, I was thinking about the software in RAID controllers in the same > terms as vinum. You are correct, though, that to the OS it appears as a > single disk which has been enlarged. The same thing, though, is true with > vinum; it should appear simply as though the disk were enlarged (albeit a > "virtual disk"). Correct. I don't really see a difference here, except maybe in terminology. Note that many operating systems refer to disks as volumes, however. > No file system should care whether a disk is a "real" disk or a > "virtual" disk or else a "virtual" disk isn't very virtual. Almost correct.
It's useful to understand the geometry of a stripe set when setting up ufs; it's very easy to end up with all cylinder groups on the same spindle. >>> also. Anyone with a RAID controller can add a new disk to their RAID 0 and >>> enlarge the virtual disk. Those controllers aren't going to tell you about >>> the increased disk size any more than vinum does. Beyond that, who is to >> >> They don't need to, because the partition the fs is on won't increase if the >> virtual disk gets bigger. > > I need to clarify terminology here just for myself, because otherwise > we're getting into confusing territory... > > partition: UNIX-style partitions of which there can be 8 (lettered a-h); > exist in the disklabel of a slice. > slice: PC-style partitioning of disk space of which there can be 4; > exist in the master boot record. > > vinum doesn't support partitions; I don't know whether it supports > slices. Vinum does support partitions, because there's nothing you can do to stop it doing so. They just don't make sense in a Vinum context. > Now, if vinum supports slices, then vinum doesn't care what filesystem > one puts on it (ie how it is sliced up). In which case, one could use > vinum to manage a virtual disk with NTFS on one slice and FFS on another. > However, if it does not support slices, which I suspect it doesn't, then > the entire volume must be dedicated to a single file system. So > arguably, yes, if someone were to extend the size of the virtual disk > (presumably by adding physical disks to the plex), it would be reasonable > to assume that any existing filesystem should be extended to fill the new > space. Slices are supported too, at least as far as the underlying disk code is fooled by a Vinum volume. But they don't make sense. > What I can't figure out is why Greg doesn't support slicing > / partitioning the virtual disk (this is really the only thing that > prevents it from being 100% transparent in my estimation).
As I said, they are supported, but they don't make sense. Vinum has its own, more flexible method for subdividing disks. > With an MBR, vinum could be used to hold any filesystem (ie. NTFS, > ext2, or FAT32) or any combination thereof; It can now. You don't need an MBR, since the bootstrap doesn't understand Vinum. And the usefulness of ext2 or NTFS file systems is limited, since Linux and NT don't understand Vinum. > with a disklabel vinum wouldn't require kludges like newfs -v. newfs -v is needed because newfs *without* -v is a kludge. It shouldn't assume anything from the name of a partition. >>> say that the entire size of the new, enlarged, virtual disk is supposed to be >>> dedicated to FFS. Is it not possible, however unlikely, for a sysadmin to >>> add disk space to a RAID array and partition it as say FAT32? >> >> That's why it may be interesting to add such hooks to disklabel. > > You are saying so that when someone updates the disklabel to specify a > larger partition, the hooks would be used to notify the filesystem which > could then do the dirty work? > You haven't happened to visit the Pacific Northwest recently, perhaps near > the town of Redmond, WA? :) Seriously, such hooks would have to be in the > kernel, not the disklabel program, on the off chance someone uses a tool > other than disklabel to edit the partition table. I suppose it's possible to get the Vinum daemon to do this. In principle the idea makes sense, but it would need to be done right. I can think of a lot of more important stuff to do first. >>> I think what Greg was getting at is that, as far as the file system is concerned, >>> vinum just looks like a disk. Whatever else vinum may be, to the file >>> system it just looks like a disk. >>> >>>> I have some ideas about how to get FFS resizeable without needing to freeze or >>>> umount it before and without losing inodes. >>> >>> This is great, but I think that "vinum hooks" are no more needed than >>> "ccd hooks" or "DPT hooks".
User-land tools should allow the administrator >>> to resize the file system at the administrator's discretion. Beyond the >>> technical issues of providing hooks to automatically extend file systems, >>> there is the social implication of whether that is what the user wanted. >>> User-land tools solve both problems. >> >> DPT should be obsolete because they don't change the size of a partition. >> ccds have to be partitioned too and are not that useful any more compared to >> vinum. >> vinum and disklabel are the hooks, but I think vinum is more useful. >> Greg is already about to implement spare disk support. >> What about a kind of spare disk which is scheduled to increase a FS >> automatically when it runs out of space? >> Features like this need interaction between the fs and the volume manager. >> Of course hardware RAIDs are a point too - but that's more difficult. > > Basically what we need is a filesystem-specific resize function: > userland tools could use a syscall to request that a filesystem be resized, and > the filesystem itself would do the implementation. Resizing a file system is not a thing you can do in a system call. Much needs to be done in user context. > Assuming vinum remains the special case of only allowing one file > system on it, I'd rather hope that this should become the norm. > it would be safe for it to call the filesystem resize routine when it > brings the spare on-line. However, personally I would like to see > vinum become a true virtual disk, It is :-) > allowing multiple file systems. It doesn't make any sense to do this. > In which case, I don't see where anything other than userland tools > would access this interface. That's the case at the moment.
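The division of labour argued for in this exchange (the kernel or vinum reports a size change; a userland tool decides whether and how to resize) can be sketched as a small decision function. This is an editorial illustration in Python with made-up names, not FreeBSD or vinum code:

```python
def plan_resize(fs_blocks, dev_blocks, user_confirmed_shrink=False):
    """Decide what a userland resize tool should do, given the current
    filesystem size and the size of the underlying (possibly virtual)
    disk. Growing is safe to do automatically; shrinking is destructive,
    so it only happens when the user explicitly asked for it."""
    if dev_blocks > fs_blocks:
        return ("grow", dev_blocks)       # new space holds no live data
    if dev_blocks < fs_blocks:
        if not user_confirmed_shrink:
            return ("refuse", fs_blocks)  # never shrink behind the user's back
        return ("shrink", dev_blocks)
    return ("noop", fs_blocks)
```

This mirrors the point made above: a volume manager can report the new device size, but only a userland tool can supply the user's intent to shrink.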
Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 3 13: 7:29 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.du.gtn.com (mail.du.gtn.com [194.77.9.57]) by hub.freebsd.org (Postfix) with ESMTP id E641114DC6 for ; Wed, 3 Nov 1999 13:07:13 -0800 (PST) (envelope-from ticso@mail.cicely.de) Received: from mail.cicely.de (cicely.de [194.231.9.142]) by mail.du.gtn.com (8.9.3/8.9.3) with ESMTP id WAA24867; Wed, 3 Nov 1999 22:00:13 +0100 (MET) Received: (from ticso@localhost) by mail.cicely.de (8.9.0/8.9.0) id WAA92866; Wed, 3 Nov 1999 22:06:49 +0100 (CET) Date: Wed, 3 Nov 1999 22:06:48 +0100 From: Bernd Walter To: Greg Lehey Cc: Kelly Yancey , Bernd Walter , freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs Message-ID: <19991103220648.A92524@cicely7.cicely.de> References: <19991103105333.A89617@cicely7.cicely.de> <19991103144037.41321@mojave.sitaranetworks.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: <19991103144037.41321@mojave.sitaranetworks.com> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Wed, Nov 03, 1999 at 02:40:37PM -0500, Greg Lehey wrote: > > > > You are saying so that when someone updates the disklabel to specify a > > larger partition, the hooks would be used to notify the filesystem which > > could then do the dirty work? > > You haven't happened to visit the Pacific Northwest recent, perhaps near > > the town of Redmond, WA? :) Seriously, such hooks would have to be in the > > kernel, not the disklabel program, in the off chance someone uses a tool > > other than disklabel to edit the partition table. > > I suppose it's possible to get the Vinum daemon to do this. In > principle the idea makes sense, but it would need to be done right. 
I > can think of a lot of more important stuff to do first. At least resizing should work before that. > > > > Basically what we need is a filesystem-specific resize function which > > userland tools could use a syscall to request a filesystem be resized, and > > the filesystem itself would do the implementation. > > Resizing a file system is not a thing you can do in a system call. > Much needs to be done in user context. > I have to agree. Everything possible should be done in user mode, because this keeps the code nonresident and the kernel small. But several things need to be done in sync with the in-core information of the fs - at least if you don't want to freeze the fs and resync the in-core state. -- B.Walter COSMO-Project http://www.cosmo-project.de ticso@cicely.de Usergroup info@cosmo-project.de To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 3 13:22:21 1999 Delivered-To: freebsd-fs@freebsd.org Received: from kronos.alcnet.com (kronos.alcnet.com [63.69.28.22]) by hub.freebsd.org (Postfix) with ESMTP id 436E014CE0 for ; Wed, 3 Nov 1999 13:22:17 -0800 (PST) (envelope-from kbyanc@posi.net) X-Provider: ALC Communications, Inc. http://www.alcnet.com/ Received: from localhost (kbyanc@localhost) by kronos.alcnet.com (8.9.3/8.9.3/antispam) with ESMTP id QAA51846; Wed, 3 Nov 1999 16:21:09 -0500 (EST) Date: Wed, 3 Nov 1999 16:21:09 -0500 (EST) From: Kelly Yancey X-Sender: kbyanc@kronos.alcnet.com To: Greg Lehey Cc: freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs In-Reply-To: <19991103144037.41321@mojave.sitaranetworks.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Wed, 3 Nov 1999, Greg Lehey wrote: > > > No file system should care whether a disk is a "real" disk or a > > "virtual" disk or else a "virtual" disk isn't very virtual. > > Almost correct.
It's useful to understand the geometry of a stripe > set when setting up ufs; it's very easy to end up with all cylinder > groups on the same spindle. But wouldn't that apply to any RAID configuration, not just vinum? Albeit difficult to extract such information from any hardware vendor. > > Vinum does support partitions, because there's nothing you can do to > stop it doing so. They just don't make sense in a Vinum context. > I was misled by http://www.lemis.com/vinum/Object-naming.html into believing vinum didn't support partitions: "Volumes appear to the system to be identical to disks, with one exception. Unlike UNIX drives, Vinum does not partition volumes, which thus do not contain a partition table. This has required modification to some disk utilities, notably newfs, which previously tried to interpret the last letter of a Vinum volume name as a partition identifier." > Slices are supported too, at least as far as the underlying disk code > is fooled by a Vinum volume. But they don't make sense. Well, I would say that they make sense in the sense that vinum creates a virtual disk which should appear and behave exactly like any physical disk. You can slice up a physical disk and put FFS and NTFS on separate slices. The "don't make sense" part seems to stem from the fact that 99.9% of people don't have any reason to do so: the OSs associated with the other file systems can't use vinum to access the virtual disk. I've been looking at it as creating something that is indistinguishable from a physical disk drive. In which case, anything a physical disk can do, a vinum disk should do too. The theory being that other tools won't have to adapt to handle a special case for vinum. And you have done that. Very well. I considered newfs -v a special case, but now I Think Different(tm). :) > It can now. You don't need an MBR, since the bootstrap doesn't > understand Vinum.
And the usefulness of ext2 or NTFS file systems is > limited, since Linux and NT don't understand Vinum. The point wasn't so much the "boot" part of the master boot record, but rather the PC-compatible partition table that is stored in the first sector with the MBR. But otherwise, I think we are on the same wavelength. > newfs -v is needed because newfs *without* -v is a kludge. It > shouldn't assume anything from the name of a partition. I see the light. I was thinking about how things are and trying to figure out why vinum didn't emulate the "standard" behaviour. Now I understand that it is simply because the "standard" behaviour is misguided. This makes very good sense. > > Basically what we need is a filesystem-specific resize function which > > userland tools could use a syscall to request a filesystem be resized, and > > the filesystem itself would do the implementation. > > Resizing a file system is not a thing you can do in a system call. > Much needs to be done in user context. Good point. I was thinking along the lines that the filesystem code would be best suited to understand the disk layout, so ideally one would inform the file system that it needed to do some resizing and it would take care of it. Now that I think about it some more, this was ill-conceived: not only would it be unreasonable to put the functionality in the kernel, but newfs gives a good precedent for userland tools to implement file-system-specific functionality such as resizing. > > > Assuming vinum remains the special case of only allowing one file > > system on it, > > I'd rather hope that this should become the norm. I meant the virtual disk that vinum presents to the world. I guess things for physical disks are like that now, if you regard each wd0s1e as a virtual disk encompassing a subset of the physical disk. In that line of thinking, the slice table is merely a header written to the disk which represents a first level of virtualization; disklabels provide a second level.
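This layered-virtualization picture (slice table as a first level, disklabel as a second) amounts to composing offset translations. A minimal editorial sketch, with made-up block numbers:

```python
def make_window(start, length):
    """One level of virtualization: a (start, length) window onto the
    device below, e.g. a slice within a disk, or a partition within a
    slice. A vinum volume is the same idea with a fancier mapping."""
    def translate(block):
        if not 0 <= block < length:
            raise ValueError("block outside window")
        return start + block
    return translate

# Hypothetical layout: slice wd0s1 starts at block 63 of disk wd0,
# and partition 'e' starts at block 1024 within that slice.
slice_map = make_window(63, 2_000_000)   # wd0s1 within wd0
part_map = make_window(1024, 500_000)    # wd0s1e within wd0s1

# Block 10 of wd0s1e -> block 1034 of wd0s1 -> block 1097 of wd0.
physical = slice_map(part_map(10))
```

Each layer is just another window, which is why wd0s1e and a vinum volume look alike to the file system above them.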
Looking at things this way, vinum volumes are equivalent to wd0s1e-style volumes. They are both virtual disks (although vinum vastly more configurable :) ), neither necessarily specifies a 1-1 mapping with physical disks, and both can only contain a single file system. I feel enlightened :) Thank you master. > > Greg > -- -- Kelly Yancey - kbyanc@posi.net - Richmond, VA Director of Technical Services, ALC Communications http://www.alcnet.com/ Maintainer, BSD Driver Database http://www.posi.net/freebsd/drivers/ Coordinator, Team FreeBSD http://www.posi.net/freebsd/Team-FreeBSD/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 3 16:20:46 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.enteract.com (mail.enteract.com [207.229.143.33]) by hub.freebsd.org (Postfix) with ESMTP id 94F6A155C2 for ; Wed, 3 Nov 1999 16:20:37 -0800 (PST) (envelope-from dscheidt@enteract.com) Received: from shell-2.enteract.com (dscheidt@shell-2.enteract.com [207.229.143.41]) by mail.enteract.com (8.9.3/8.9.3) with SMTP id SAA17504 for ; Wed, 3 Nov 1999 18:18:42 -0600 (CST) (envelope-from dscheidt@enteract.com) Date: Wed, 3 Nov 1999 18:18:42 -0600 (CST) From: David Scheidt To: freebsd-fs@FreeBSD.org Subject: Filesystems reading list? Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Does anyone have a list of readings in modern filesystem design? I understand the basics, at some high-level. What technical stuff do I need to read to get up to speed? 
Thanks, David Scheidt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 3 16:21:39 1999 Delivered-To: freebsd-fs@freebsd.org Received: from zed.ludd.luth.se (zed.ludd.luth.se [130.240.16.33]) by hub.freebsd.org (Postfix) with ESMTP id C24331511E for ; Wed, 3 Nov 1999 16:21:23 -0800 (PST) (envelope-from pantzer@speedy.ludd.luth.se) Received: from speedy.ludd.luth.se (pantzer@speedy.ludd.luth.se [130.240.16.164]) by zed.ludd.luth.se (8.8.5/8.8.5) with ESMTP id BAA29232; Thu, 4 Nov 1999 01:20:37 +0100 Message-Id: <199911040020.BAA29232@zed.ludd.luth.se> X-Mailer: exmh version 2.0.1 12/23/97 To: Kelly Yancey Cc: freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs In-Reply-To: Message from Kelly Yancey of "Tue, 02 Nov 1999 19:16:55 EST." Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 04 Nov 1999 01:20:36 +0100 From: Mattias Pantzare Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > I am under the impression that you can only enlarge a vinum volume if it > is in a RAID 0 configuration (concatenation). Obviously, it would be very > difficult to enlarge a RAID 1 or RAID 5 configuration as it would require > restriping the data across all disks; I'm not familiar with any product, > hardware or software, that can do this. Solaris DiskSuite almost extends RAID 5 configurations. You can add disks to a RAID 5 set, but the extra disks will only hold data, no parity. I think that it is a strange mix of RAID 5 and concatenation. All data is still parity protected. It might not be as fast as a true RAID 5, but it can be very useful.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 3 17:18:47 1999 Delivered-To: freebsd-fs@freebsd.org Received: from login-2.eunet.no (login-2.eunet.no [193.71.71.239]) by hub.freebsd.org (Postfix) with ESMTP id 8441A14EF4 for ; Wed, 3 Nov 1999 17:18:39 -0800 (PST) (envelope-from mbendiks@eunet.no) Received: from login-1.eunet.no (mbendiks@login-1.eunet.no [193.71.71.238]) by login-2.eunet.no (8.9.3/8.9.3/GN) with ESMTP id CAA35867; Thu, 4 Nov 1999 02:18:32 +0100 (CET) (envelope-from mbendiks@eunet.no) Received: from localhost (mbendiks@localhost) by login-1.eunet.no (8.9.3/8.8.8) with ESMTP id CAA81477; Thu, 4 Nov 1999 02:18:31 +0100 (CET) (envelope-from mbendiks@eunet.no) X-Authentication-Warning: login-1.eunet.no: mbendiks owned process doing -bs Date: Thu, 4 Nov 1999 02:18:31 +0100 (CET) From: Marius Bendiksen To: Robert Watson Cc: freebsd-fs@FreeBSD.ORG Subject: Re: stupidfs - easily extensible test file systems? In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org I believe V9fs covers this. --- Marius Bendiksen, ScanCall AS On Thu, 28 Oct 1999, Robert Watson wrote: > > I'm in the process of hacking up a stupidfs -- i.e., a minimal file system > module that provides simplistic (i.e., stupid) implementations of all the > relevant vnops and vfsops based on in-kernel memory. The purpose of > stupidfs is to allow file system extension developers (like myself) to be > able to add new vnops and implement them in a simple file system without > having to deal initially with the issue of permenant storage in the file > stores, distributed file systems, etc. 
It would be a poor-man's MFS > (although perhaps more useful than MFS because it doesn't have the weight > of UFS/FFS tangled up in it, which is what has stopped me from using MFS > to do the same kind of testing), with it only really being useful for this > testing purpose. > > However, as this will take a little bit to write, I thought I'd ask if > anyone else has done this already? :-) > > Right now I pretty much have it to the point where I can see the directory > structure, create files of up to 1k, etc, etc, but there's a fair amount > more to do before it's useful. Those people working on ACLs and MACs for > POSIX.1e have needed a test framework that doesn't involve seriously > hurting themselves on the sharp edges of FFS and MFS, but that still > allows them to actually see the results in a file system. Layering would > be another option [if only it worked]. And even with layering, there are > still complications in implementation -- more complicated, than saying > "gee, let's extend the inode to have *this* structure in it" and just > having it work as it backs to nothing and isn't tangled up in the idea of > backing to something (e.g., MFS). 
> > Robert N M Watson > > robert@fledge.watson.org http://www.watson.org/~robert/ > PGP key fingerprint: AF B5 5F FF A6 4A 79 37 ED 5F 55 E9 58 04 6A B1 > TIS Labs at Network Associates, Safeport Network Services > > > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-fs" in the body of the message > > > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 3 18:13:30 1999 Delivered-To: freebsd-fs@freebsd.org Received: from macalpine.cornfed.com (macalpine.cornfed.com [208.58.42.162]) by hub.freebsd.org (Postfix) with ESMTP id 2691014A27 for ; Wed, 3 Nov 1999 18:13:26 -0800 (PST) (envelope-from fwmiller@macalpine.cornfed.com) Received: (from fwmiller@localhost) by macalpine.cornfed.com (8.8.8/8.8.8) id VAA03158; Wed, 3 Nov 1999 21:11:49 -0500 (EST) (envelope-from fwmiller) From: "Frank W. Miller" Message-Id: <199911040211.VAA03158@macalpine.cornfed.com> Subject: Re: Filesystems reading list? In-Reply-To: from David Scheidt at "Nov 3, 99 06:18:42 pm" To: freebsd-fs@FreeBSD.ORG Date: Wed, 3 Nov 1999 21:11:49 -0500 (EST) Cc: fwmiller@macalpine.cornfed.com (Frank W. Miller) X-Mailer: ELM [version 2.4ME+ PL38 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > Does anyone have a list of readings in modern filesystem design? I > understand the basics, at some high-level. What technical stuff do I need > to read to get up to speed? > I would recommend the following papers: McKusick, M. K., Joy, W. N., Leffler, S. J., and Fabry, R. S., ``A Fast File System for UNIX'', ACM TOCS 2, 3 (Aug. 1984) pp. 181-197. Kleinman, S., ``Vnodes: An Architecture for Multiple File System Types in Sun UNIX'', Proc. of the Summer 1986 Conference, USENIX, 1986. Rosenthal, D., ``Evolving the Vnode Interface'', Proc. 
of the Summer 1990 Conference, USENIX, 1990. Skinner, G. and Wong, T., ``Stacking Vnodes: A Progress Report'', Proc. of the Summer 1993 Conference, USENIX, 1993. Heidemann, J. and Popek, G, ``File-System Development with Stackable Layers'', ACM TOCS, 12, 1, 1994. and the book: McKusick, M., Bostic, K., Karels, M., and Quarterman, J., The Design and Implementation of the 4.4BSD Operating System, Addison-Wesley, 1996. Later, FM -- Frank W. Miller Cornfed Systems Inc www.cornfed.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 3 18:51:41 1999 Delivered-To: freebsd-fs@freebsd.org Received: from cs.columbia.edu (cs.columbia.edu [128.59.16.20]) by hub.freebsd.org (Postfix) with ESMTP id DE8BC15615 for ; Wed, 3 Nov 1999 18:51:37 -0800 (PST) (envelope-from ezk@shekel.mcl.cs.columbia.edu) Received: from shekel.mcl.cs.columbia.edu (shekel.mcl.cs.columbia.edu [128.59.18.15]) by cs.columbia.edu (8.9.1/8.9.1) with ESMTP id VAA10939; Wed, 3 Nov 1999 21:49:25 -0500 (EST) Received: (from ezk@localhost) by shekel.mcl.cs.columbia.edu (8.9.1/8.9.1) id VAA26232; Wed, 3 Nov 1999 21:49:24 -0500 (EST) Date: Wed, 3 Nov 1999 21:49:24 -0500 (EST) Message-Id: <199911040249.VAA26232@shekel.mcl.cs.columbia.edu> X-Authentication-Warning: shekel.mcl.cs.columbia.edu: ezk set sender to ezk@shekel.mcl.cs.columbia.edu using -f From: Erez Zadok To: "Frank W. Miller" Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Filesystems reading list? In-reply-to: Your message of "Wed, 03 Nov 1999 21:11:49 EST." <199911040211.VAA03158@macalpine.cornfed.com> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org In message <199911040211.VAA03158@macalpine.cornfed.com>, "Frank W. Miller" writes: > > > > Does anyone have a list of readings in modern filesystem design? I > > understand the basics, at some high-level. What technical stuff do I need > > to read to get up to speed? 
> > > > I would recommend the following papers: [...] All good papers. It depends what area or field you'd like to get into wrt filesystems, David. The list Frank supplied leans more towards stackable f/s. There are other papers if you're interested in distributed/network file systems (e.g., nfs, coda), high-performance file systems (xfs, reiserfs), automounter file systems (amd, automounter/autofs, hlfsd, Blaze's CFS), extent-like file systems, journaling file systems, numerous special-purpose file systems, and even more numerous tweaks to existing file systems. I have an extensive library of f/s papers I've collected over the past decade, and I can probably give you pointers to many. > Later, > FM > > -- > Frank W. Miller > Cornfed Systems Inc > www.cornfed.com > > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-fs" in the body of the message Erez. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 4 10: 2:23 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mojave.sitaranetworks.com (mojave.sitaranetworks.com [199.103.141.157]) by hub.freebsd.org (Postfix) with ESMTP id C524C14C3E for ; Thu, 4 Nov 1999 10:02:11 -0800 (PST) (envelope-from grog@mojave.sitaranetworks.com) Message-ID: <19991104130052.13342@mojave.sitaranetworks.com> Date: Thu, 4 Nov 1999 13:00:52 -0500 From: Greg Lehey To: Bernd Walter , Kelly Yancey Cc: Rodney , freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs Reply-To: Greg Lehey References: <19991103005415.A88044@cicely7.cicely.de> <19991103105333.A89617@cicely7.cicely.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <19991103105333.A89617@cicely7.cicely.de>; from Bernd Walter on Wed, Nov 03, 1999 at 10:53:33AM +0100 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Wednesday, 3 November 1999 at 10:53:33 +0100, Bernd Walter wrote: > On Tue, Nov 02, 1999 at
07:16:55PM -0500, Kelly Yancey wrote: >> On Wed, 3 Nov 1999, Bernd Walter wrote: >> >>> On Tue, Nov 02, 1999 at 12:35:53PM -0500, Greg Lehey wrote: >>>> On Sunday, 31 October 1999 at 12:05:14 +0100, Rodney wrote: >>>>> >>>>> >>>>> hi, >>>>> >>>>> here's my list of features I'd like to see in a >>>>> journalled fs. Have to admit this list is heavily >>>>> inspired ( ok , copied ) from the VxFS features; >>>>> apart from the buzz words, >>>>> some of them make sense, some of them don't, >>>>> but it should give us some stuff to discuss: >>>>> [snip] >>>>> 6) vinum integration (vague) >>>> >>>> Vinum is just a virtual disk. As such, any file system should work on >>>> it. >>>> >>> It is more than that - it is a volume manager. >>> Maybe you are not clear how far you got beyond the virtual disk. >>> It manages disks and can find its drives properly if they changed devices - >>> that's working really well; I was able to remove nearly all wire >>> configurations for drives, and I even run a volume with only a single >>> drive plex - just to get this feature. >>> It can (or should be able to) resize a volume and should inform the system >>> about it. >> >> I am under the impression that you can only enlarge a vinum volume if it >> is in a RAID 0 configuration (concatenation). Obviously, it would be very >> difficult to enlarge a RAID 1 or RAID 5 configuration as it would require >> restriping the data across all disks; I'm not familiar with any product, >> hardware or software, that can do this. > > In the case of striping, which is valid for RAID 5 and concatenated RAID 0 configurations, > it is not simply possible to do. > But think of a RAID 5 volume which is extended by concatenating another RAID 5 set. > This is not doable with vinum - but I'm sure that this won't happen before anyone > is using such a feature. Well, I'm sure that nobody will use this feature until it's available :-) Yes, I remember you asking for this feature.
I suppose I should add it to the wish list (I just forgot to do it). Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 4 13:28: 7 1999 Delivered-To: freebsd-fs@freebsd.org Received: from kronos.alcnet.com (kronos.alcnet.com [63.69.28.22]) by hub.freebsd.org (Postfix) with ESMTP id 8B8201513D for ; Thu, 4 Nov 1999 13:27:59 -0800 (PST) (envelope-from kbyanc@posi.net) X-Provider: ALC Communications, Inc.
http://www.alcnet.com/ Received: from localhost (kbyanc@localhost) by kronos.alcnet.com (8.9.3/8.9.3/antispam) with ESMTP id QAA19515; Thu, 4 Nov 1999 16:26:58 -0500 (EST) Date: Thu, 4 Nov 1999 16:26:58 -0500 (EST) From: Kelly Yancey X-Sender: kbyanc@kronos.alcnet.com To: Greg Lehey Cc: Mattias Pantzare , freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs In-Reply-To: <19991104161317.49512@mojave.sitaranetworks.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > Solaris DiskSuite almost extends RAID 5 configurations. You can add disks to a > > RAID 5 set, but the extra disks will only hold data, no parity. > > That's normal for RAID-5. Only one disk in any stripe contains the > parity information. > > Greg Except that, if I understand correctly, the new disks don't have parity for them stored anywhere. They aren't really in on the RAID 5 game, but tag along and just pretend to be. All this talk about extending RAID 5 plexes has got me thinking about the oft-overlooked RAID 4. I realize this isn't currently implemented in vinum, but I understand it has similar (although slightly different, not worse, just different) performance characteristics to RAID 5. But I would think that RAID 4 would be much simpler to extend because only 1 disk contains the parity; rather than restriping, one only needs to recalculate parity.
But then again, RAID 4 is one of the black sheep of the RAID family :) Kelly -- Kelly Yancey - kbyanc@posi.net - Richmond, VA Director of Technical Services, ALC Communications http://www.alcnet.com/ Maintainer, BSD Driver Database http://www.posi.net/freebsd/drivers/ Coordinator, Team FreeBSD http://www.posi.net/freebsd/Team-FreeBSD/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 4 14: 0:54 1999 Delivered-To: freebsd-fs@freebsd.org Received: from kronos.alcnet.com (kronos.alcnet.com [63.69.28.22]) by hub.freebsd.org (Postfix) with ESMTP id 2317614C2B for ; Thu, 4 Nov 1999 14:00:51 -0800 (PST) (envelope-from kbyanc@posi.net) X-Provider: ALC Communications, Inc. http://www.alcnet.com/ Received: from localhost (kbyanc@localhost) by kronos.alcnet.com (8.9.3/8.9.3/antispam) with ESMTP id QAA20219; Thu, 4 Nov 1999 16:58:21 -0500 (EST) Date: Thu, 4 Nov 1999 16:58:21 -0500 (EST) From: Kelly Yancey X-Sender: kbyanc@kronos.alcnet.com To: Greg Lehey Cc: freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs In-Reply-To: <19991104163941.53462@mojave.sitaranetworks.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Thu, 4 Nov 1999, Greg Lehey wrote: > On Thursday, 4 November 1999 at 16:26:58 -0500, Kelly Yancey wrote: > >>> Solaris DiskSuite almost extends RAID 5 configurations. You can add disks to a > >>> RAID 5 set, but the extra disks will only hold data, no parity. > >> > >> That's normal for RAID-5. Only one disk in any stripe contains the > >> parity information. > >> > >> Greg > > > > Except that, if I understand correctly, the new disks don't have parity > > for them stored anywhere. They aren't really in on the RAID 5 game, but > > tag along and just pretend to be. > > Ah. What reason do you have to assume that that's the case?
I just thought that was what he meant by "but the extra disks will only hold data, no parity". In RAID 5, all disks will hold a portion of the parity information. > > > All this talk about extending RAID 5 plexes has got me thinking about > > the oft-overlooked RAID 4. I realize this isn't currently implemented in > > vinum, but I understand it has similar (although slightly different, not > > worse, just different) performance characteristics to RAID 5. > > It has worse performance characteristics than RAID-5. It also has no > redeeming virtues, except possibly code simplicity. I've never had hardware which supported it (always just 0, 1, 0/1, and 5), so I don't have any practical experience (read "arm waving"), but I have read (mainly in Adaptec paraphernalia) that RAID 4 is supposed to have slightly better read characteristics. > > > But I would think that RAID 4 would be much simpler to extend > > because only 1 disk contains the parity; rather than > > restriping, one only needs to recalculate parity. > > No, the effort is the same. It's not recalculating parity that's the > killer, it's moving all the data around. Consider the first stripe in > the plex (which looks identical for RAID-4 and RAID-5): > > Disk 1 2 3 4 5 6 7 8 9 > --------------------------------------------------------- > | | | | | | | | P | > --------------------------------------------------------- > > You have a storage of 7 blocks, each of stripe size (say 7 MB for a 1 > MB stripe size). The first stripe contains the data for 0 to 6 MB, > the second stripe contains the data for 7 to 13 MB, the third for 14 > to 20, and so on. > > Add a disk and you get: > > ------------------------------------------------------------------ > | | | | | | | | | P | > ------------------------------------------------------------------ > > Now the first stripe must contain the data for 0 to 7 MB, the second > stripe for 8 to 15 MB, the third for 16 to 23, and so on. See the > problem?
Recalculating parity is only part of it, and deciding where > it ends up (stays on disk 8 for RAID-4, moves to a possibly different > place for RAID-5) is trivial. > > Greg I was thinking that with RAID 4 specifying a single disk to hold all parity information, the volume manager would record which disk held the parity information (disk 8 in your example above) so adding another disk would result in: ------------------------------------------------------------------ | | | | | | | | P | | ------------------------------------------------------------------ Which looks odd, but would work, right? Then only parity would need to be recalculated. It only doesn't work with RAID 5 because the data is supposed to be distributed uniformly across the disks, so restriping is required. Kelly -- Kelly Yancey - kbyanc@posi.net - Richmond, VA Director of Technical Services, ALC Communications http://www.alcnet.com/ Maintainer, BSD Driver Database http://www.posi.net/freebsd/drivers/ Coordinator, Team FreeBSD http://www.posi.net/freebsd/Team-FreeBSD/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 4 14:32: 2 1999 Delivered-To: freebsd-fs@freebsd.org Received: from kronos.alcnet.com (kronos.alcnet.com [63.69.28.22]) by hub.freebsd.org (Postfix) with ESMTP id 5E29D1568B for ; Thu, 4 Nov 1999 14:31:59 -0800 (PST) (envelope-from kbyanc@posi.net) X-Provider: ALC Communications, Inc. 
http://www.alcnet.com/ Received: from localhost (kbyanc@localhost) by kronos.alcnet.com (8.9.3/8.9.3/antispam) with ESMTP id RAA20925; Thu, 4 Nov 1999 17:31:17 -0500 (EST) Date: Thu, 4 Nov 1999 17:31:17 -0500 (EST) From: Kelly Yancey X-Sender: kbyanc@kronos.alcnet.com To: Greg Lehey Cc: freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs In-Reply-To: <19991104170819.58641@mojave.sitaranetworks.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > Then only parity would need to be recalculated. It only doesn't work > > with RAID 5 because the data is supposed to be distributed uniformly > > across the disks, so restriping is required. > > Well, no, you've missed the point: with the exception of the first > stripe, *all* the data in the plex needs to be reshuffled, whether > you're doing RAID-4 or RAID-5. > > Greg Hmm. Yes, I suppose maintaining the location of data on the disk might be important :) [ looking for dunce cap ]. 
The new space would have to appear at the end of the volume, not scattered throughout its internals :) Kelly -- Kelly Yancey - kbyanc@posi.net - Richmond, VA Director of Technical Services, ALC Communications http://www.alcnet.com/ Maintainer, BSD Driver Database http://www.posi.net/freebsd/drivers/ Coordinator, Team FreeBSD http://www.posi.net/freebsd/Team-FreeBSD/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 4 14:32: 7 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.du.gtn.com (mail.du.gtn.com [194.77.9.57]) by hub.freebsd.org (Postfix) with ESMTP id 8F292156C4 for ; Thu, 4 Nov 1999 14:31:59 -0800 (PST) (envelope-from ticso@mail.cicely.de) Received: from mail.cicely.de (cicely.de [194.231.9.142]) by mail.du.gtn.com (8.9.3/8.9.3) with ESMTP id XAA15254; Thu, 4 Nov 1999 23:24:42 +0100 (MET) Received: (from ticso@localhost) by mail.cicely.de (8.9.0/8.9.0) id XAA97857; Thu, 4 Nov 1999 23:31:20 +0100 (CET) Date: Thu, 4 Nov 1999 23:31:20 +0100 From: Bernd Walter To: Kelly Yancey Cc: Greg Lehey , Mattias Pantzare , freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs Message-ID: <19991104233119.A97812@cicely7.cicely.de> References: <19991104161317.49512@mojave.sitaranetworks.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Thu, Nov 04, 1999 at 04:26:58PM -0500, Kelly Yancey wrote: > > > Solaris DiskSuite almost extends RAID 5 configurations. You can add disks to a > > > RAID 5 set, but the extra disks will only hold data, no parity. > > > > That's normal for RAID-5. Only one disk in any stripe contains the > > parity information. > > > > Greg > > Except that, if I understand correctly, the new disks don't have parity > for them stored anywhere. They aren't really in on the RAID 5 game, but > tag along and just pretend to be.
If you concatenate a single disk to a raid5 set you will have only the raid5 range redundant. You need two or more R1/R5 sets to remain redundant. > > All this talk about extending RAID 5 plexes has got me thinking about > the oft-overlooked RAID 4. I realize this isn't currently implemented in > vinum, but I understand it has similar (although slightly different, not > worse, just different) performance characteristics to RAID 5. But I would > think that RAID 4 would be much simpler to extend because of the fact The simplification shouldn't be that much, but R4 is usually slower because the biggest load is on the parity, and with R4 that's not balanced between the disks. > only 1 disk contains the parity; rather than restriping, one only needs to > recalculate parity. No - that's pretty much the same if the basis is striping. Besides, it should be possible to get a concatenated Raid4 layout with vinum if you create a R5 plex with a stripe size equal to the subdisk size. Nevertheless, vinum's parity locking is not optimal for this case. R4 is only interesting because you can convert from R0 to R4 and back without the need to copy any data blocks. It shouldn't be much work to implement R4, but who really needs it?
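Bernd's last point, that you can convert R0 to R4 and back without copying any data blocks, follows from how dedicated parity works: the parity disk is just the bytewise XOR of the data disks at each offset, so it can be computed (or discarded) while every data block stays exactly where it is. A minimal sketch in Python, illustrative only (the tiny 4-byte "blocks" and the disk layout are invented for the example, this is not vinum code):

```python
# Sketch: turning a 3-disk RAID-0 stripe set into RAID-4 by computing one
# dedicated parity disk. No data block moves; only the new disk is written.
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def make_parity_disk(data_disks):
    """Compute the dedicated RAID-4 parity disk, stripe by stripe."""
    return [xor_blocks(stripe) for stripe in zip(*data_disks)]

# Three data disks, two stripes each (hypothetical 4-byte blocks).
disks = [
    [b"\x01\x01\x01\x01", b"\x0a\x0a\x0a\x0a"],
    [b"\x02\x02\x02\x02", b"\x0b\x0b\x0b\x0b"],
    [b"\x04\x04\x04\x04", b"\x0c\x0c\x0c\x0c"],
]
parity = make_parity_disk(disks)

# Any single data disk can now be rebuilt from the others plus parity.
lost = 1
rebuilt = [xor_blocks([disks[i][s] for i in range(3) if i != lost] + [parity[s]])
           for s in range(2)]
assert rebuilt == disks[lost]
```

Converting back to R0 is the degenerate case: drop the parity disk and the data layout is untouched.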
-- B.Walter COSMO-Project http://www.cosmo-project.de ticso@cicely.de Usergroup info@cosmo-project.de To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 4 15:25: 8 1999 Delivered-To: freebsd-fs@freebsd.org Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133]) by hub.freebsd.org (Postfix) with ESMTP id 0E06014DFD for ; Thu, 4 Nov 1999 15:25:01 -0800 (PST) (envelope-from tlambert@usr07.primenet.com) Received: (from daemon@localhost) by smtp03.primenet.com (8.9.3/8.9.3) id QAA05335; Thu, 4 Nov 1999 16:24:23 -0700 (MST) Received: from usr07.primenet.com(206.165.6.207) via SMTP by smtp03.primenet.com, id smtpdAAA4faWAj; Thu Nov 4 16:23:50 1999 Received: (from tlambert@localhost) by usr07.primenet.com (8.8.5/8.8.5) id QAA20462; Thu, 4 Nov 1999 16:23:04 -0700 (MST) From: Terry Lambert Message-Id: <199911042323.QAA20462@usr07.primenet.com> Subject: Re: journaling UFS and LFS To: dg@root.com Date: Thu, 4 Nov 1999 23:23:04 +0000 (GMT) Cc: tlambert@primenet.com, Stephen.Byan@quantum.com, freebsd-fs@FreeBSD.ORG In-Reply-To: <199911012338.PAA07714@implode.root.com> from "David Greenman" at Nov 1, 99 03:38:21 pm X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > >> >> Softupdates is definitely a viable solution, however it does not address > >> >> several issues and the license is not a BSD license, so it makes me > >> >> uncomfortable. > > > >The license issue is a Whistle thing. Talk to Julian and get him > >to pound on Doug Brent, preferably before December 31st of this year. > > How is the softupdates license a Whistle thing? It seems to me that it is > a Kirk McKusick and Sun MicroSystems thing. Whistle requested the license so that Whistle could maintain an edge over the competition in the same product space.
The duration that it is under the license in the source tree was negotiated between Whistle and Kirk for that reason. The purpose of the Whistle financial support for the implementation was technically to get rid of the UPS in the InterJet. I was one of the main evangelists of this approach within Whistle, having worked on an FFS with Soft Updates implementation at the company I worked at prior to coming to work for Whistle. As I said, talk to Julian. I believe we (Whistle) can (and always intended to) release the code under the UCB license after recouping R&D costs, and there was in fact a contractually specified date for this happening. I don't currently have access to the contract. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 4 15:27:48 1999 Delivered-To: freebsd-fs@freebsd.org Received: from zed.ludd.luth.se (zed.ludd.luth.se [130.240.16.33]) by hub.freebsd.org (Postfix) with ESMTP id 8CAC115022 for ; Thu, 4 Nov 1999 15:27:39 -0800 (PST) (envelope-from pantzer@speedy.ludd.luth.se) Received: from speedy.ludd.luth.se (pantzer@speedy.ludd.luth.se [130.240.16.164]) by zed.ludd.luth.se (8.8.5/8.8.5) with ESMTP id AAA02859; Fri, 5 Nov 1999 00:26:14 +0100 Message-Id: <199911042326.AAA02859@zed.ludd.luth.se> X-Mailer: exmh version 2.0.1 12/23/97 To: Greg Lehey Cc: freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs In-Reply-To: Message from Greg Lehey of "Thu, 04 Nov 1999 16:13:17 EST."
<19991104161317.49512@mojave.sitaranetworks.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable Date: Fri, 05 Nov 1999 00:26:14 +0100 From: Mattias Pantzare Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > On Thursday, 4 November 1999 at 1:20:36 +0100, Mattias Pantzare wrote: > >> I am under the impression that you can only enlarge a vinum volume if it is > >> in a RAID 0 configuration (concatenation). Obviously, it would be very > >> difficult to enlarge a RAID 1 or RAID 5 configuration as it would require > >> restriping the data across all disks; I'm not familiar with any product, > >> hardware or software, that can do this. > > > > Solaris DiskSuite almost extends RAID 5 configurations. You can add disks to a > > RAID 5 set, but the extra disks will only hold data, no parity. > > That's normal for RAID-5. Only one disk in any stripe contains the > parity information. Disk, not stripe. > > > I think that it is a strange mix of RAID 5 and concatenation. All > > data is still parity protected. It might not be as fast as a true > > RAID 5, but it can be very useful. > > What's the difference? If you have 3 disks and 3 stripes and number sectors from 1 to 6: Disk 1 Disk 2 Disk 3 1 2 P P 3 4 5 P 6 Then add a new disk: Disk 1 Disk 2 Disk 3 New Disk 1 2 P 7 P 3 4 8 5 P 6 9 All you have to do is recalculate the new parity data when you write new data, if you zero the new disk before using it. Disk accesses will not be spread out as in normal RAID5, but you still get parity protection.
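Mattias's zeroing trick works because of a basic XOR identity: x XOR 0 == x, so a zero-filled column can join an existing parity set without invalidating any stored parity; only later writes to the new column need the usual read-modify-write of the parity block. A toy sketch in Python (the block values are hypothetical, this is not DiskSuite code):

```python
# Extending a parity-protected set with a zeroed disk: the old parity
# stays valid, because XORing in an all-zero block changes nothing.
def xor_blocks(*blocks):
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

BLK = 4
d1, d2 = b"\x11" * BLK, b"\x22" * BLK
parity = xor_blocks(d1, d2)                 # parity before the new disk exists

new = b"\x00" * BLK                         # freshly zeroed new column
assert xor_blocks(d1, d2, new) == parity    # old parity is still correct

# Writing to the new column: parity ^= old_value ^ new_value
data = b"\x77" * BLK
parity = xor_blocks(parity, new, data)
new = data
assert xor_blocks(d1, d2, new) == parity    # parity consistent after the write
```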
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 4 15:29: 8 1999 Delivered-To: freebsd-fs@freebsd.org Received: from smtp01.primenet.com (smtp01.primenet.com [206.165.6.131]) by hub.freebsd.org (Postfix) with ESMTP id D3C2515022 for ; Thu, 4 Nov 1999 15:29:01 -0800 (PST) (envelope-from tlambert@usr07.primenet.com) Received: (from daemon@localhost) by smtp01.primenet.com (8.9.3/8.9.3) id QAA04567; Thu, 4 Nov 1999 16:28:05 -0700 (MST) Received: from usr07.primenet.com(206.165.6.207) via SMTP by smtp01.primenet.com, id smtpdAAAYyayYh; Thu Nov 4 16:27:48 1999 Received: (from tlambert@localhost) by usr07.primenet.com (8.8.5/8.8.5) id QAA20534; Thu, 4 Nov 1999 16:26:32 -0700 (MST) From: Terry Lambert Message-Id: <199911042326.QAA20534@usr07.primenet.com> Subject: Re: Journaling To: bouyer@antioche.lip6.fr (Manuel Bouyer) Date: Thu, 4 Nov 1999 23:26:32 +0000 (GMT) Cc: tlambert@primenet.com, ken@kdm.org, don@calis.blacksun.org, ticso@cicely.de, grog@lemis.com, bright@wintelcom.net, freebsd-fs@FreeBSD.ORG In-Reply-To: <19991102134152.A18969@antioche.lip6.fr> from "Manuel Bouyer" at Nov 2, 99 01:41:52 pm X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > NetBSD currently supports 16. > > > > Yes, it breaks backward compatibility. > > No, NetBSD supports 16 only on ports that started with 16. > Others are still 8. There are discussions about how to move to a higher > number (not 16, but at least 64 or more) without breaking backward > compatibility ... You can't cross mount media between OSs with the same byte ordering. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 4 15:34:54 1999 Delivered-To: freebsd-fs@freebsd.org Received: from kronos.alcnet.com (kronos.alcnet.com [63.69.28.22]) by hub.freebsd.org (Postfix) with ESMTP id 07E661518A for ; Thu, 4 Nov 1999 15:34:49 -0800 (PST) (envelope-from kbyanc@posi.net) X-Provider: ALC Communications, Inc. http://www.alcnet.com/ Received: from localhost (kbyanc@localhost) by kronos.alcnet.com (8.9.3/8.9.3/antispam) with ESMTP id SAA22104; Thu, 4 Nov 1999 18:33:24 -0500 (EST) Date: Thu, 4 Nov 1999 18:33:24 -0500 (EST) From: Kelly Yancey X-Sender: kbyanc@kronos.alcnet.com To: Greg Lehey Cc: Bernd Walter , Mattias Pantzare , freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs In-Reply-To: <19991104174750.37515@mojave.sitaranetworks.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Thu, 4 Nov 1999, Greg Lehey wrote: > > That's for writing. When throughput becomes the limit, the write > throughput of RAID-4 is limited to about 2 / n of the write throughput > of RAID-5. On reading (randomly), it's (n - 1) / n. > I think that it has been significantly proven that RAID 4 is not very useful, and I regret bringing it up...sometimes the mind wanders :).
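For concreteness, Greg's figures can be evaluated for a few array sizes. This just plugs numbers into the 2/n and (n - 1)/n ratios he quotes; it is not a benchmark and makes no claims beyond that arithmetic:

```python
# Relative throughput of RAID-4 versus RAID-5 per Greg's approximations:
# every small write funnels through the one parity disk (~2/n of RAID-5's
# write rate); random reads merely lose the parity disk ((n - 1)/n).
def raid4_relative(n):
    return {"write_vs_raid5": 2 / n, "read_vs_raid5": (n - 1) / n}

for n in (4, 8, 16):
    r = raid4_relative(n)
    print(f"{n} disks: writes {r['write_vs_raid5']:.0%}, "
          f"reads {r['read_vs_raid5']:.0%} of RAID-5")
```

With 8 disks, for example, these ratios put RAID-4 writes at roughly a quarter of RAID-5's rate while random reads stay close to parity, which is why the write path dominates the comparison.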
Kelly -- Kelly Yancey - kbyanc@posi.net - Richmond, VA Director of Technical Services, ALC Communications http://www.alcnet.com/ Maintainer, BSD Driver Database http://www.posi.net/freebsd/drivers/ Coordinator, Team FreeBSD http://www.posi.net/freebsd/Team-FreeBSD/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 4 15:41: 0 1999 Delivered-To: freebsd-fs@freebsd.org Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133]) by hub.freebsd.org (Postfix) with ESMTP id B95C31518B for ; Thu, 4 Nov 1999 15:40:53 -0800 (PST) (envelope-from tlambert@usr07.primenet.com) Received: (from daemon@localhost) by smtp03.primenet.com (8.9.3/8.9.3) id QAA11596; Thu, 4 Nov 1999 16:39:15 -0700 (MST) Received: from usr07.primenet.com(206.165.6.207) via SMTP by smtp03.primenet.com, id smtpdAAAEyaGGw; Thu Nov 4 16:39:08 1999 Received: (from tlambert@localhost) by usr07.primenet.com (8.8.5/8.8.5) id QAA21054; Thu, 4 Nov 1999 16:39:22 -0700 (MST) From: Terry Lambert Message-Id: <199911042339.QAA21054@usr07.primenet.com> Subject: Re: Features of a journaled file system To: grog@lemis.com Date: Thu, 4 Nov 1999 23:39:22 +0000 (GMT) Cc: don@calis.blacksun.org, freebsd-fs@FreeBSD.ORG In-Reply-To: <19991102102601.54815@mojave.sitaranetworks.com> from "Greg Lehey" at Nov 2, 99 10:26:01 am X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > On Saturday, 30 October 1999 at 18:56:24 -0400, Don wrote: > > What are the features people would like to see in a new FreeBSD file > > system? Some of the ones I have heard listed are: > > 1. Ability to grow a FS > > 2. Ability to shrink a FS > > 3. Acess control lists on files and file systems > > 4. Extensibility. 
(The ability to easily add new features to the > > filesystem without having to rewrite utilities such as fsck) > > None of these are specific features of a journalling file system. > They're probably all desirable. ACLs, in particular, should not be a feature of an FS that manages block allocation policy, but should instead be a semantic access stacking layer. It is trivial to write an ACL (or quota) stacking layer, given working stacking layers. I think that requests like ACLs, extended attributes, user and group disk quotas, NT security policy management, etc., etc., should all go on the "make stacking layers work" list, _not_ the "write a journalled FS" list. That said, a journalled FS would be a useful thing to have, and not just for the marketing bullet item. I am thinking in particular about how very easy it would be to implement a userland-accessible transactioning system and record-based file layout semantics with such a beast... 8-). I also like the idea of re-separating the UFS and FFS layers, so that you could initially work on just the journalling issues, and we could tackle b-tree based directory management (for example) in a separate stacking layer... just like the UFS stacking layer does by overlaying an alphabetic name, link, and symlink supporting semantics on top of the FFS namespace, which is basically a flat numeric namespace that knows how to do block management in numerically (inode number) named objects. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
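The stacking idea Terry describes can be shown with a deliberately simplified sketch: a layer that forwards every operation to the file system below it and interposes only where it adds semantics, here an ACL check on open. The classes and the tiny "fd" strings are hypothetical stand-ins invented for this example, not the FreeBSD VFS interface:

```python
class LowerFS:
    """Stand-in for the underlying file system layer."""
    def open(self, path, mode):
        return f"fd:{path}"
    def read(self, fd, nbytes):
        return b"data"

class ACLLayer:
    """Stacks on any lower layer: adds ACL checks, passes everything else through."""
    def __init__(self, lower, acl):
        self._lower = lower
        self._acl = acl                   # path -> set of users allowed

    def open(self, path, mode, user):
        allowed = self._acl.get(path)     # no entry -> no ACL -> allow
        if allowed is not None and user not in allowed:
            raise PermissionError(path)
        return self._lower.open(path, mode)

    def __getattr__(self, name):          # every other operation: pure pass-through
        return getattr(self._lower, name)

fs = ACLLayer(LowerFS(), {"/secret": {"alice"}})
assert fs.open("/tmp/x", "r", user="bob") == "fd:/tmp/x"  # no ACL: allowed
assert fs.read("fd:/tmp/x", 4) == b"data"                 # forwarded untouched
try:
    fs.open("/secret", "r", user="bob")
except PermissionError:
    pass
else:
    raise AssertionError("ACL check should have denied bob")
```

The point of the design is the one Terry makes: the lower layer needs to know nothing about ACLs, and the same wrapper could hold quotas or extended attributes instead, without the FS that manages block allocation ever changing.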
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 4 15:46:40 1999 Delivered-To: freebsd-fs@freebsd.org Received: from caspian.plutotech.com (caspian.plutotech.com [206.168.67.80]) by hub.freebsd.org (Postfix) with ESMTP id 325DD1518D for ; Thu, 4 Nov 1999 15:46:36 -0800 (PST) (envelope-from gibbs@caspian.plutotech.com) Received: from caspian.plutotech.com (localhost [127.0.0.1]) by caspian.plutotech.com (8.9.3/8.9.1) with ESMTP id PAA05113; Thu, 4 Nov 1999 15:45:26 -0700 (MST) (envelope-from gibbs@caspian.plutotech.com) Message-Id: <199911042245.PAA05113@caspian.plutotech.com> X-Mailer: exmh version 2.1.0 09/18/1999 To: Kelly Yancey Cc: Greg Lehey , Bernd Walter , Mattias Pantzare , freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs In-reply-to: Your message of "Thu, 04 Nov 1999 18:33:24 EST." Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 04 Nov 1999 15:45:26 -0700 From: "Justin T. Gibbs" Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org >On Thu, 4 Nov 1999, Greg Lehey wrote: > >> >> That's for writing. When throughput becomes the limit, the write >> throughput of RAID-4 is limited to about 2 / n of the write throughput >> of RAID-5. On reading (randomly), it's (n - 1) / n. >> > > I think that it has been significantly proven that RAID 4 is not very >userful, and I regret bringing it up...sometimes the mind wonders :). It all depends on your application. If you are dealing with a data set composed of large, fixed sized entries, RAID 3 or 4 (they are almost identical) will always outperform RAID5. 
-- Justin To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message
From owner-freebsd-fs Thu Nov 4 15:53:30 1999 Delivered-To: freebsd-fs@freebsd.org Received: from alpo.whistle.com (alpo.whistle.com [207.76.204.38]) by hub.freebsd.org (Postfix) with ESMTP id 903E51518D for ; Thu, 4 Nov 1999 15:53:15 -0800 (PST) (envelope-from julian@whistle.com) Received: from current1.whiste.com (current1.whistle.com [207.76.205.22]) by alpo.whistle.com (8.9.1a/8.9.1) with ESMTP id PAA87407; Thu, 4 Nov 1999 15:45:31 -0800 (PST) Date: Thu, 4 Nov 1999 15:45:30 -0800 (PST) From: Julian Elischer To: Terry Lambert Cc: dg@root.com, Stephen.Byan@quantum.com, freebsd-fs@FreeBSD.ORG Subject: Re: journaling UFS and LFS In-Reply-To: <199911042323.QAA20462@usr07.primenet.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Thu, 4 Nov 1999, Terry Lambert wrote: > > >> >> Softupdates is definitely a viable solution however it does not address > > >> >> several issues and the license is not a BSD license so it makes me > > >> >> uncomfortable. > > > > > >The license issue is a Whistle thing. Talk to Julian and get him > > >to pound on Doug Brent, preferably before December 31st of this year. > > > > How is the softupdates license a Whistle thing? It seems to me that it is > > a Kirk McKusick and Sun MicroSystems thing. > > Whistle requested the license so that Whistle could maintain an > edge over the competition in the same product space. The duration > that it is under the license in the source tree was negotiated > between Whistle and Kirk for that reason. > > The purpose of the Whistle financial support for the implementation > was technically to get rid of the UPS in the InterJet.
I was one > of the main evangelists of this approach within Whistle, having > worked on an FFS with Soft Updates implementation at the company > I worked at prior to coming to work for Whistle. > > As I said, talk to Julian. I believe we (Whistle) can (and always > intended to) release the code under the UCB license after recouping R&D > costs, and there was in fact a contractually specified date > for this happening. I don't currently have access to the contract. Terry is slightly mis-stating the situation. Whistle basically asked Kirk what his plans were and offered to support his development if he agreed that he would not license it to a few specified competitors (not my idea, but the number is countable on one hand). Obviously this only holds for as long as he is generally licensing it. When he releases it, our agreement becomes void (or so I believe). I vaguely remember that we had a request that it not be released in less than N months or something. Since N was less than or equal to M, which was Kirk's own needs, this was a non-issue. Basically Whistle didn't want to be subsidising some particular competitors. On the other hand, Whistle wanted the technology in FreeBSD and generally usable. The agreement had an end-of-life clause and I believe that it's actually run out, or close to it. Part of this is that it had to be explainable to the investors as not being a gift to the opposition.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 4 18:26: 0 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.du.gtn.com (mail.du.gtn.com [194.77.9.57]) by hub.freebsd.org (Postfix) with ESMTP id B92571568B for ; Thu, 4 Nov 1999 18:25:58 -0800 (PST) (envelope-from ticso@mail.cicely.de) Received: from mail.cicely.de (cicely.de [194.231.9.142]) by mail.du.gtn.com (8.9.3/8.9.3) with ESMTP id DAA29104; Fri, 5 Nov 1999 03:19:01 +0100 (MET) Received: (from ticso@localhost) by mail.cicely.de (8.9.0/8.9.0) id DAA99081; Fri, 5 Nov 1999 03:25:39 +0100 (CET) Date: Fri, 5 Nov 1999 03:25:38 +0100 From: Bernd Walter To: Terry Lambert Cc: Manuel Bouyer , ken@kdm.org, don@calis.blacksun.org, ticso@cicely.de, grog@lemis.com, bright@wintelcom.net, freebsd-fs@FreeBSD.ORG Subject: Re: Journaling Message-ID: <19991105032538.A98956@cicely7.cicely.de> References: <19991102134152.A18969@antioche.lip6.fr> <199911042326.QAA20534@usr07.primenet.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: <199911042326.QAA20534@usr07.primenet.com> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Thu, Nov 04, 1999 at 11:26:32PM +0000, Terry Lambert wrote: > > > NetBSD currently supports 16. > > > > > > Yes, it breaks backward compatibility. > > > > No, NetBSD supports 16 only on ports that started with 16. > > Others are still 8. There are discussions about how to move to a higher > > number (not 16, but at least 64 or more) without breaking backward > > compatibility ... > > You can't cross mount media between OSs with the same byte ordering. > The difference between FreeBSD-i386 and alpha produces the same kind of frustration.
-- B.Walter COSMO-Project http://www.cosmo-project.de ticso@cicely.de Usergroup info@cosmo-project.de To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Nov 5 1:14:25 1999 Delivered-To: freebsd-fs@freebsd.org Received: from antioche.lip6.fr (antioche.lip6.fr [132.227.74.11]) by hub.freebsd.org (Postfix) with ESMTP id 60ED415280 for ; Fri, 5 Nov 1999 01:14:21 -0800 (PST) (envelope-from bouyer@antioche.lip6.fr) Received: from antifer.ipv6.lip6.fr (antifer.ipv6.lip6.fr [132.227.72.132]) by antioche.lip6.fr (8.9.3/8.9.3) with ESMTP id KAA08520; Fri, 5 Nov 1999 10:11:29 +0100 (MET) Received: (bouyer@localhost) by antifer.ipv6.lip6.fr (8.8.8/8.6.4) id KAA00614; Fri, 5 Nov 1999 10:10:54 +0100 (MET) Date: Fri, 5 Nov 1999 10:10:54 +0100 From: Manuel Bouyer To: Terry Lambert Cc: ken@kdm.org, don@calis.blacksun.org, ticso@cicely.de, grog@lemis.com, bright@wintelcom.net, freebsd-fs@FreeBSD.ORG Subject: Re: Journaling Message-ID: <19991105101054.B584@antioche.lip6.fr> References: <19991102134152.A18969@antioche.lip6.fr> <199911042326.QAA20534@usr07.primenet.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.95.6us In-Reply-To: <199911042326.QAA20534@usr07.primenet.com>; from Terry Lambert on Thu, Nov 04, 1999 at 11:26:32PM +0000 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Thu, Nov 04, 1999 at 11:26:32PM +0000, Terry Lambert wrote: > You can't cross mount media between OSs with the same byte ordering. Why? I surely did miss something here ... Or maybe you meant 'port' instead of 'OS'? In which case this is true, but it's because of differences in the on-disk disklabel format (dependent on firmware). If your media doesn't have a disklabel, no problem (I do this between my i386 and sparc; NetBSD supports byte-swapped FFS).
If your media is partitioned then you have to put an in-core disklabel matching its partitioning before mounting it. -- Manuel Bouyer, LIP6, Universite Paris VI. Manuel.Bouyer@lip6.fr -- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Nov 5 6:26:12 1999 Delivered-To: freebsd-fs@freebsd.org Received: from akat.civ.cvut.cz (akat.civ.cvut.cz [147.32.235.105]) by hub.freebsd.org (Postfix) with SMTP id BB26A14D32 for ; Fri, 5 Nov 1999 06:26:01 -0800 (PST) (envelope-from pechy@hp735.cvut.cz) Received: from localhost (pechy@localhost) by akat.civ.cvut.cz (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA10512; Fri, 5 Nov 1999 15:23:23 +0100 Date: Fri, 5 Nov 1999 15:23:22 +0100 From: Jan Pechanec X-Sender: pechy@akat.civ.cvut.cz To: Erez Zadok Cc: Robert Watson , freebsd-fs@FreeBSD.ORG Subject: Re: stupidfs - easily extensible test file systems? In-Reply-To: <199910282122.RAA07811@shekel.mcl.cs.columbia.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Thu, 28 Oct 1999, Erez Zadok wrote: Hi, I think that it is a bit different. What Robert is hacking is a filesystem where a programmer with no VFS experience can see how the VFS works. I have just read some of your papers, Erez, and I think that wrapfs is meant to spare me from dealing with the VFS itself (just encode and decode routines). I think that Robert's effort is very useful; I wanted to write something like this myself (purpose: to learn and _touch_ the VFS interface). Robert, do you carry on with it? BTW, do you know why deadfs was written? There is no doc in FreeBSD. From what I saw in the source code, operations just fail. Thank you, Jan. >Robert, it's been done. To some degree that's nullfs (if nullfs had been >working; the VFS is broken). 
I've written stackable f/s templates exactly >for the purpose of developers using them to build other f/s w/o having the >many hassles of writing a full f/s. My wrapper templates, called wrapfs, >work on freebsd, linux, and solaris. You can build all kinds of f/s using >them, including f/s that do not require persistent storage. > >See > http://www.cs.columbia.edu/~ezk/research >for papers, and > http://www.cs.columbia.edu/~ezk/research/software >for tarballs. > >Let me know if you have any questions. > >Erez Zadok. >Columbia University Department of Computer Science. >EMail: ezk@cs.columbia.edu Web: http://www.cs.columbia.edu/~ezk > > >To Unsubscribe: send mail to majordomo@FreeBSD.org >with "unsubscribe freebsd-fs" in the body of the message > -- Jan PECHANEC (mailto:pechy@hp735.cvut.cz) Computing Center CTU (Zikova 4, Praha 6, 166 35, Czech Republic) http://www.civ.cvut.cz, tel: +420 2 2435 2969, http://pechy.civ.cvut.cz To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Nov 5 7: 2:14 1999 Delivered-To: freebsd-fs@freebsd.org Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by hub.freebsd.org (Postfix) with ESMTP id E05991505E for ; Fri, 5 Nov 1999 07:02:04 -0800 (PST) (envelope-from robert@cyrus.watson.org) Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3]) by fledge.watson.org (8.9.3/8.9.3) with SMTP id JAA51639; Fri, 5 Nov 1999 09:59:56 -0500 (EST) (envelope-from robert@cyrus.watson.org) Date: Fri, 5 Nov 1999 09:59:56 -0500 (EST) From: Robert Watson X-Sender: robert@fledge.watson.org Reply-To: Robert Watson To: Jan Pechanec Cc: Erez Zadok , freebsd-fs@FreeBSD.ORG Subject: Re: stupidfs - easily extensible test file systems? 
In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Fri, 5 Nov 1999, Jan Pechanec wrote: > On Thu, 28 Oct 1999, Erez Zadok wrote: > > Hi, > > I think that it is a bit different. What Robert is hacking is > a filesystem where a programmer with no VFS experience can see how > the VFS works. I have just read some of your papers, Erez, and I think > that wrapfs is meant to spare me from dealing with the VFS itself (just > encode and decode routines). > > I think that Robert's effort is very useful; I wanted > to write something like this myself (purpose: to learn and _touch_ the VFS > interface). Robert, do you carry on with it? > > BTW, do you know why deadfs was written? There is no doc in FreeBSD. > From what I saw in the source code, operations just fail. Because wrapfs doesn't work in 3.3-RELEASE yet, and because of the reasons you mention, I decided to keep working on a stupidfs :-). That is, I don't want to add functionality to an existing file system by stacking, but rather to have a new simple file system whose semantics I can modify in ways not encouraged by the stacking of file systems. I am currently traveling (IETF next week, Active Network conference in Albuquerque the week after) so won't get back to my development machines for about two weeks. After that time, I hope to get a stupidfs implementation to the point where it might be useful for others to see, so I'll put it online. As I mentioned before, the goal is to have a really simple file system with no backing store, appropriate for use when experimenting with new VOPs, etc. It won't be fully functioning (for example, I probably won't even bother to implement symlinks) but it will be *simple*, meaning it can be modified easily. 
It will also be separable into an entirely separate module, unlike UFS, which has fingers everywhere, so it can easily be loaded and unloaded on demand during development. I wouldn't encourage anyone to use it in production--it will make a fair amount of use of kernel memory, as it has no backing store--but for development it should be useful. Robert N M Watson robert@fledge.watson.org http://www.watson.org/~robert/ PGP key fingerprint: AF B5 5F FF A6 4A 79 37 ED 5F 55 E9 58 04 6A B1 TIS Labs at Network Associates, Safeport Network Services To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Nov 5 9:19:37 1999 Delivered-To: freebsd-fs@freebsd.org Received: from parker.yahoo.com (parker.yahoo.com [205.216.162.204]) by hub.freebsd.org (Postfix) with ESMTP id A4C2F1522B for ; Fri, 5 Nov 1999 09:19:25 -0800 (PST) (envelope-from jh@parker.yahoo.com) Received: from parker.yahoo.com (localhost.yahoo.com [127.0.0.1]) by parker.yahoo.com (8.8.8/8.6.12) with ESMTP id JAA23410; Fri, 5 Nov 1999 09:15:46 -0800 (PST) Message-Id: <199911051715.JAA23410@parker.yahoo.com> To: Jan Pechanec Cc: Erez Zadok , Robert Watson , freebsd-fs@FreeBSD.ORG Subject: deadfs, Re: stupidfs - easily extensible test file systems? In-reply-to: Your message of "Fri, 05 Nov 1999 15:23:22 +0100." Date: Fri, 05 Nov 1999 09:15:46 -0800 From: John Hanley Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > BTW, do you know why deadfs was written? There is no doc in FreeBSD. > From what I saw in the source code, operations just fail. Deadfs exists so you can V_BAD a vnode to revoke access to a tty or pty. (Or revoke access to a filesystem, upon forcible umount.) 
There used to be an ugly security problem: someone would log in, start background jobs that could read/write the tty or pty, and their login shell would exit; some hapless person would then log in on that tty or pty and be abused by the background jobs that still held an open file descriptor. Nowadays, upon logout we V_BAD those file descriptors so the background jobs can do no harm, but they are allowed to finish their computations and write their results to disk. Cheers, JH To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Nov 5 10:43:53 1999 Delivered-To: freebsd-fs@freebsd.org Received: from cs.columbia.edu (cs.columbia.edu [128.59.16.20]) by hub.freebsd.org (Postfix) with ESMTP id 684B214C84 for ; Fri, 5 Nov 1999 10:43:49 -0800 (PST) (envelope-from ezk@shekel.mcl.cs.columbia.edu) Received: from shekel.mcl.cs.columbia.edu (shekel.mcl.cs.columbia.edu [128.59.18.15]) by cs.columbia.edu (8.9.1/8.9.1) with ESMTP id NAA22581; Fri, 5 Nov 1999 13:43:10 -0500 (EST) Received: (from ezk@localhost) by shekel.mcl.cs.columbia.edu (8.9.1/8.9.1) id NAA21856; Fri, 5 Nov 1999 13:43:09 -0500 (EST) Date: Fri, 5 Nov 1999 13:43:09 -0500 (EST) Message-Id: <199911051843.NAA21856@shekel.mcl.cs.columbia.edu> X-Authentication-Warning: shekel.mcl.cs.columbia.edu: ezk set sender to ezk@shekel.mcl.cs.columbia.edu using -f From: Erez Zadok To: Jan Pechanec Cc: Erez Zadok , Robert Watson , freebsd-fs@FreeBSD.ORG Subject: Re: stupidfs - easily extensible test file systems? In-reply-to: Your message of "Fri, 05 Nov 1999 15:23:22 +0100." Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org In message , Jan Pechanec writes: > On Thu, 28 Oct 1999, Erez Zadok wrote: > > Hi, > > I think that it is a bit different. What Robert is hacking is > a filesystem where a programmer with no VFS experience can see how > the VFS works. 
I have just read some of your papers, Erez, and I think > that wrapfs is meant to spare me from dealing with the VFS itself (just > encode and decode routines). The encode and decode routines that wrapfs exports are an API that greatly simplifies two difficult tasks: (1) modifying file names (e.g., translating b/t unix and 8.3 names) (2) modifying file data (e.g., encryption) Every other task you want to accomplish in wrapfs, you do right in the actual f/s routines, right in the code itself. For example, if you wanted to add acl support (as I've done w/ a trivial aclfs based on wrapfs), you add the right code in lookup(). If you want to create an unrmfs (another prototype I've got), you add it in unlink(). If you wish, you can also touch the read/write/getpage/putpage routines directly and not use the encode/decode API functions. But you'll find that there's a substantial amount of support code needed to deal with data pages, locking, and a lot more stuff around it. All of this is detailed in my Usenix 99 paper. > I think that Robert's effort is very useful; I wanted > to write something like this myself (purpose: to learn and _touch_ the VFS > interface). Robert, do you carry on with it? I commend you, but don't be surprised if what you produce in the end is almost identical to wrapfs in functionality. In many ways, wrapfs is "stupid", b/c it only provides a thin layer that passes all VOPs to the layer below it, while maintaining semantics. Wrapfs does not do much more than that. That's why I'm telling you now that your stupidfs may wind up being very similar to wrapfs. You cannot get stacking functionality with much less than wrapfs does. If you actually intend to modify the VFS, and add new VOPs, that'll be neat too. But I think you'll find it a bit difficult to get VFS changes merged into the main source tree... :-) And if you do change the VFS, you'll find that your stupidfs does more than, and is "smarter" than, wrapfs. 
You cannot introduce new VOPs w/o changing the VFS, and if you change the VFS, you must make sure that other (native) file systems do something reasonable with these new VOPs. > -- > Jan PECHANEC (mailto:pechy@hp735.cvut.cz) Computing Center CTU (Zikova 4, > Praha 6, 166 35, Czech Republic) http://www.civ.cvut.cz, tel: +420 2 2435 > 2969, http://pechy.civ.cvut.cz Jan et al. I'm not trying to "hawk my merchandise" on you, but rather to save you a great deal of effort repeating that which has been done before. I've created and released my wrapfs templates so that others could build on them, and create hopefully really useful (even commercial) file systems. It may sound corny, but I hope that my work will revitalize the stagnating field of stackable file systems research. You may save a lot of time, and still be able to learn much, by starting with my wrapfs code, and modifying it to your needs. I will be happy to help you in any way I can. Cheers, Erez. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Nov 5 11:15:54 1999 Delivered-To: freebsd-fs@freebsd.org Received: from cs.columbia.edu (cs.columbia.edu [128.59.16.20]) by hub.freebsd.org (Postfix) with ESMTP id 0435D14BE9 for ; Fri, 5 Nov 1999 11:15:49 -0800 (PST) (envelope-from ezk@shekel.mcl.cs.columbia.edu) Received: from shekel.mcl.cs.columbia.edu (shekel.mcl.cs.columbia.edu [128.59.18.15]) by cs.columbia.edu (8.9.1/8.9.1) with ESMTP id OAA24974; Fri, 5 Nov 1999 14:14:50 -0500 (EST) Received: (from ezk@localhost) by shekel.mcl.cs.columbia.edu (8.9.1/8.9.1) id OAA23367; Fri, 5 Nov 1999 14:14:49 -0500 (EST) Date: Fri, 5 Nov 1999 14:14:49 -0500 (EST) Message-Id: <199911051914.OAA23367@shekel.mcl.cs.columbia.edu> X-Authentication-Warning: shekel.mcl.cs.columbia.edu: ezk set sender to ezk@shekel.mcl.cs.columbia.edu using -f From: Erez Zadok To: Robert Watson Cc: Jan Pechanec , Erez Zadok , freebsd-fs@FreeBSD.ORG Subject: Re: stupidfs 
- easily extensible test file systems? In-reply-to: Your message of "Fri, 05 Nov 1999 09:59:56 EST." Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org In message , Robert Watson writes: [...] > Because wrapfs doesn't work in 3.3-RELEASE yet, and because of the reasons > you mention, I decided to keep working on a stupidfs :-). I'll be updating wrapfs for 3.3 once I return from LISA. With luck, it'll work again before you return from Albuquerque. > That is, I > don't want to add functionality to an existing file system by stacking, > but rather to have a new simple file system that I can modify the > semantics of in ways not encouraged by the stacking of file systems. If I understand you right (maybe I didn't), there are two ways to do that: (1) Create a simple *native* (disk based?) file system template from which you can possibly create new file systems that put data on disks and floppies, right? In theory, one should be able to create msdosfs and ffs from such a template. In practice, there are so many details to work out that getting something barely working for a "stupid" template will require substantial effort. Such a template would be very useful if it has these two characteristics: (a) be small and simple, and (b) require little modification to create file systems such as msdosfs and ffs. I believe that with current OS technology, it is impossible to get both 'a' and 'b' done. (2) If what you want is a file system that can work with other file systems, then you're essentially asking for stacking. Yes, stacking file systems usually have to maintain VFS semantics, so that a layer is kept independent from other layers, either above or below it. It is possible, however, for a stackable f/s to violate this principle; for example, you can muck with direct disk blocks and inode blocks from a stackable f/s. It's not something I'd recommend, but it is possible. 
> I am > currently traveling (IETF next week, Active Network conference in > Albuquerque the week after) so won't get back to my development machines > for about two weeks. After that time, I hope to get a stupidfs > implementation to the point where it might be useful for others to see, so > I'll put it online. As I mentioned before, the goal is to have a really > simple file system with no backing store, appropriate for use when > experimenting with new VOPs, etc, etc. I'd be very interested in seeing this. I would also suggest that before you dive into coding, you write out a detailed design, and post it to this list, so we could all comment on it. Note that extensible VFSs have been an expressed desire of stackable file systems research from the very early days. In order to support file system extensibility without changing the OS or other file systems, I had to give up the idea of creating new VOPs. That is, you cannot add new VOPs using wrapfs; you could create new ioctls, however, which are the poor man's extensible model. IOW, if you created an infrastructure that can extend the VFS, you'd have something that wrapfs cannot do --- something that people have been requesting for some time. (So don't call it "stupid" :-) If you haven't already, you should read up on all of the classic stacking papers first, from Rosenthal, Skinner & Wong, Heidemann, Popek, etc. Then you might look into papers on Spring, BSD's Unionfs, and the HURD. All of these talk about mechanisms for VFS extensibility that would be useful for you. > It won't be fully functioning (for > example, I probably won't even bother to implement symlinks) but it will > be *simple*, meaning it can be modified easily. 
> > I wouldn't encourage anyone to use it in production--it will make a fair > amount of use of kernel memory, as it won't back to a process--but for > development it should be useful. I think you have to be very careful about your implementation. You cannot encourage people to use something in PRODUCTION that has not been thoroughly tested, and esp. if it's missing functionality. If you want your f/s to be useful, make sure it works with existing VFSs and existing file systems. At the very least, make sure it won't damage people's installations. It would be nice if "all" it did was _add_ new VOPs, while keeping existing ones unchanged. I'm speaking from experience here. I've developed wrapfs on solaris, freebsd, and linux. In the early days, I've dealt with bugs that easily corrupted active memory and resulted in total corruption of system and boot partitions, to a point where a reinstallation was required. After a few frustrating reinstallations, I wound up setting up automatic OS installation systems (network-based booting, installing off of an auxiliary disk, even using identical disks and dd'ing a good copy onto a trashed one). > Robert N M Watson > > robert@fledge.watson.org http://www.watson.org/~robert/ > PGP key fingerprint: AF B5 5F FF A6 4A 79 37 ED 5F 55 E9 58 04 6A B1 > TIS Labs at Network Associates, Safeport Network Services Good luck. Let me know if I can help. Erez. 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Nov 5 13:43:46 1999 Delivered-To: freebsd-fs@freebsd.org Received: from zed.ludd.luth.se (zed.ludd.luth.se [130.240.16.33]) by hub.freebsd.org (Postfix) with ESMTP id F3E1415369 for ; Fri, 5 Nov 1999 13:43:40 -0800 (PST) (envelope-from pantzer@speedy.ludd.luth.se) Received: from speedy.ludd.luth.se (pantzer@speedy.ludd.luth.se [130.240.16.164]) by zed.ludd.luth.se (8.8.5/8.8.5) with ESMTP id WAA02426; Fri, 5 Nov 1999 22:42:49 +0100 Message-Id: <199911052142.WAA02426@zed.ludd.luth.se> X-Mailer: exmh version 2.0.1 12/23/97 To: Greg Lehey Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Extending RAID-5 plexes (was: feature list journalled fs) In-Reply-To: Message from Greg Lehey of "Thu, 04 Nov 1999 18:37:37 EST." <19991104183737.04186@mojave.sitaranetworks.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Fri, 05 Nov 1999 22:42:49 +0100 From: Mattias Pantzare Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > Take a look at /usr/src/sys/dev/vinum/vinumraid5.c and tell me how to > modify the code to make that work in a general case. Will do. But don't hold your breath :-) To get to know vinum I tried to use it on devices made with vnconfig. Should I debug the panic I got or will it just not work? 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Nov 5 13:53:12 1999 Delivered-To: freebsd-fs@freebsd.org Received: from yana.lemis.com (yana.lemis.com [192.109.197.140]) by hub.freebsd.org (Postfix) with ESMTP id B924A14C8F for ; Fri, 5 Nov 1999 13:53:07 -0800 (PST) (envelope-from grog@mojave.sitaranetworks.com) Received: from mojave.sitaranetworks.com ([199.103.141.157]) by yana.lemis.com (8.8.8/8.8.8) with ESMTP id IAA07050; Sat, 6 Nov 1999 08:21:21 +1030 (CST) (envelope-from grog@mojave.sitaranetworks.com) Message-ID: <19991105165042.50293@mojave.sitaranetworks.com> Date: Fri, 5 Nov 1999 16:50:42 -0500 From: Greg Lehey To: Mattias Pantzare Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Extending RAID-5 plexes (was: feature list journalled fs) Reply-To: Greg Lehey References: <199911052142.WAA02426@zed.ludd.luth.se> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <199911052142.WAA02426@zed.ludd.luth.se>; from Mattias Pantzare on Fri, Nov 05, 1999 at 10:42:49PM +0100 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Friday, 5 November 1999 at 22:42:49 +0100, Mattias Pantzare wrote: >> >> Take a look at /usr/src/sys/dev/vinum/vinumraid5.c and tell me how to >> modify the code to make that work in a general case. > > Will do. But don't hold your breath :-) I won't. > To get to know vinum I tried to use it on devices made with > vnconfig. Should I debug the panic I got or will it just not work? I don't know of any a priori reason why it shouldn't work. If you send me a stack trace, I should be able to help. Note the instructions in vinum(4). 
Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Nov 5 14:13:45 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.du.gtn.com (mail.du.gtn.com [194.77.9.57]) by hub.freebsd.org (Postfix) with ESMTP id BB55014F74 for ; Fri, 5 Nov 1999 14:13:39 -0800 (PST) (envelope-from ticso@mail.cicely.de) Received: from mail.cicely.de (cicely.de [194.231.9.142]) by mail.du.gtn.com (8.9.3/8.9.3) with ESMTP id XAA04385; Fri, 5 Nov 1999 23:06:23 +0100 (MET) Received: (from ticso@localhost) by mail.cicely.de (8.9.0/8.9.0) id XAA02793; Fri, 5 Nov 1999 23:13:02 +0100 (CET) Date: Fri, 5 Nov 1999 23:13:02 +0100 From: Bernd Walter To: Greg Lehey Cc: Mattias Pantzare , freebsd-fs@FreeBSD.ORG Subject: Re: Extending RAID-5 plexes (was: feature list journalled fs) Message-ID: <19991105231302.A2771@cicely7.cicely.de> References: <199911052142.WAA02426@zed.ludd.luth.se> <19991105165042.50293@mojave.sitaranetworks.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: <19991105165042.50293@mojave.sitaranetworks.com> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Fri, Nov 05, 1999 at 04:50:42PM -0500, Greg Lehey wrote: > On Friday, 5 November 1999 at 22:42:49 +0100, Mattias Pantzare wrote: > > > To get to know vinum I tried to use it on devices made with > > vnconfig. Should I debug the panic I got or will it just not work? > > I don't know of any a priori reason why it shouldn't work. If you > send me a stack trace, I should be able to help. Note the > instructions in vinum(4). > vn devices are file based. I'm pretty sure that its strategy function can't be called in interrupt context, as happens in RAID-5 cases. vn calls VOP_READ and VOP_WRITE directly from strategy, without a queue like real disk drivers use. 
-- B.Walter COSMO-Project http://www.cosmo-project.de ticso@cicely.de Usergroup info@cosmo-project.de To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Nov 5 14:32:37 1999 Delivered-To: freebsd-fs@freebsd.org Received: from yana.lemis.com (yana.lemis.com [192.109.197.140]) by hub.freebsd.org (Postfix) with ESMTP id B791D14D3D for ; Fri, 5 Nov 1999 14:32:30 -0800 (PST) (envelope-from grog@mojave.sitaranetworks.com) Received: from mojave.sitaranetworks.com ([199.103.141.157]) by yana.lemis.com (8.8.8/8.8.8) with ESMTP id JAA07085; Sat, 6 Nov 1999 09:01:44 +1030 (CST) (envelope-from grog@mojave.sitaranetworks.com) Message-ID: <19991105173107.38019@mojave.sitaranetworks.com> Date: Fri, 5 Nov 1999 17:31:07 -0500 From: Greg Lehey To: Bernd Walter Cc: Mattias Pantzare , freebsd-fs@FreeBSD.ORG Subject: Re: Extending RAID-5 plexes (was: feature list journalled fs) Reply-To: Greg Lehey References: <199911052142.WAA02426@zed.ludd.luth.se> <19991105165042.50293@mojave.sitaranetworks.com> <19991105231302.A2771@cicely7.cicely.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <19991105231302.A2771@cicely7.cicely.de>; from Bernd Walter on Fri, Nov 05, 1999 at 11:13:02PM +0100 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Friday, 5 November 1999 at 23:13:02 +0100, Bernd Walter wrote: > On Fri, Nov 05, 1999 at 04:50:42PM -0500, Greg Lehey wrote: >> On Friday, 5 November 1999 at 22:42:49 +0100, Mattias Pantzare wrote: >> >>> To get to know vinum I tried to use it on devices made with >>> vnconfig. Should I debug the panic I got or will it just not work? >> >> I don't know of any a priori reason why it shouldn't work. If you >> send me a stack trace, I should be able to help. Note the >> instructions in vinum(4). > > vn devices are file based. 
> I'm pretty sure that its strategy function can't be called in interrupt context > as happens in RAID-5 cases. > vn calls VOP_READ and VOP_WRITE directly from strategy, without a queue like > real disk drivers use. Yes, that's reasonable. I'd still like to see a trace. We could accommodate vnodes by getting the daemon to complete things. That would make access still slower, and I can't really see a good reason for it, but it's possible. Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Nov 6 2:49: 2 1999 Delivered-To: freebsd-fs@freebsd.org Received: from alpo.whistle.com (alpo.whistle.com [207.76.204.38]) by hub.freebsd.org (Postfix) with ESMTP id 5D98914C40 for ; Sat, 6 Nov 1999 02:49:01 -0800 (PST) (envelope-from julian@whistle.com) Received: from current1.whiste.com (current1.whistle.com [207.76.205.22]) by alpo.whistle.com (8.9.1a/8.9.1) with ESMTP id CAA38753; Sat, 6 Nov 1999 02:48:43 -0800 (PST) Date: Sat, 6 Nov 1999 02:48:42 -0800 (PST) From: Julian Elischer To: Jan Pechanec Cc: Erez Zadok , Robert Watson , freebsd-fs@FreeBSD.ORG Subject: Re: stupidfs - easily extensible test file systems? In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Fri, 5 Nov 1999, Jan Pechanec wrote: > > BTW, do you know why deadfs was written? There is no doc in FreeBSD. > From what I saw in the source code, operations just fail. 
When you have a vnode open, and for some reason the filesystem the vnode points to disappears (e.g. the disk is removed, or the PC-CARD is removed, or many other possibilities), then you cannot track down all the users of that vnode very easily, so instead you 'fiddle' with it to make it reference the DEADFS (use VGONE) and when the users try to use it again they will safely get an error, but at least the system will not core-dump when they access a nonexistent filesystem/device. julian To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Nov 6 7:59: 6 1999 Delivered-To: freebsd-fs@freebsd.org Received: from zed.ludd.luth.se (zed.ludd.luth.se [130.240.16.33]) by hub.freebsd.org (Postfix) with ESMTP id 7C97D14D6F for ; Sat, 6 Nov 1999 07:59:04 -0800 (PST) (envelope-from pantzer@speedy.ludd.luth.se) Received: from speedy.ludd.luth.se (pantzer@speedy.ludd.luth.se [130.240.16.164]) by zed.ludd.luth.se (8.8.5/8.8.5) with ESMTP id QAA19311; Sat, 6 Nov 1999 16:58:56 +0100 Message-Id: <199911061558.QAA19311@zed.ludd.luth.se> X-Mailer: exmh version 2.0.1 12/23/97 To: grog@lemis.com Cc: freebsd-fs@freebsd.org Subject: RAID-5 and failure Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Sat, 06 Nov 1999 16:58:55 +0100 From: Mattias Pantzare Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org What happens if the data part of a write to a RAID-5 plex completes but not the parity part (or the other way)? 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Nov 6 8: 7:11 1999 Delivered-To: freebsd-fs@freebsd.org Received: from gw.nectar.com (gw.nectar.com [209.98.143.44]) by hub.freebsd.org (Postfix) with ESMTP id 082D314CD5 for ; Sat, 6 Nov 1999 08:07:09 -0800 (PST) (envelope-from nectar@nectar.com) Received: from bone.nectar.com (bone.nectar.com [10.0.0.105]) by gw.nectar.com (Postfix) with ESMTP id 76BB951723; Sat, 6 Nov 1999 10:05:38 -0600 (CST) Received: from bone.nectar.com (localhost [127.0.0.1]) by bone.nectar.com (Postfix) with ESMTP id C77771D7A; Sat, 6 Nov 1999 10:07:02 -0600 (CST) X-Mailer: exmh version 2.1.0 09/18/1999 X-Exmh-Isig-CompType: repl X-Exmh-Isig-Folder: mlist/freebsd/fs X-PGP-RSAfprint: 00 F9 E6 A2 C5 4D 0A 76 26 8B 8B 57 73 D0 DE EE X-PGP-RSAkey: http://www.nectar.com/nectar-rsa.txt X-PGP-DSSfprint: AB2F 8D71 A4F4 467D 352E 8A41 5D79 22E4 71A2 8C73 X-PGP-DHfprint: 2D50 12E5 AB38 60BA AF4B 0778 7242 4460 1C32 F6B1 X-PGP-DH-DSSkey: http://www.nectar.com/nectar-dh-dss.txt From: Jacques Vidrine To: Julian Elischer Cc: Jan Pechanec , Erez Zadok , Robert Watson , freebsd-fs@FreeBSD.ORG In-reply-to: References: Subject: Re: stupidfs - easily extensible test file systems? Mime-Version: 1.0 Content-Type: text/plain Date: Sat, 06 Nov 1999 10:07:02 -0600 Message-Id: <19991106160702.C77771D7A@bone.nectar.com> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On 6 November 1999 at 2:48, Julian Elischer wrote: > When you have a vnode open, and for some reason the filesystem the vnode > points to disappears (e.g. the disk is removed, or the PC-CARD is removed, > or many other possibilities), [snip] The most common case in most systems is probably revoke(2). 
-- Jacques Vidrine / n@nectar.com / nectar@FreeBSD.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Nov 6 8:34:53 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.du.gtn.com (mail.du.gtn.com [194.77.9.57]) by hub.freebsd.org (Postfix) with ESMTP id 1C8F014C3B for ; Sat, 6 Nov 1999 08:34:44 -0800 (PST) (envelope-from ticso@mail.cicely.de) Received: from mail.cicely.de (cicely.de [194.231.9.142]) by mail.du.gtn.com (8.9.3/8.9.3) with ESMTP id RAA24945; Sat, 6 Nov 1999 17:27:53 +0100 (MET) Received: (from ticso@localhost) by mail.cicely.de (8.9.0/8.9.0) id RAA09170; Sat, 6 Nov 1999 17:34:34 +0100 (CET) Date: Sat, 6 Nov 1999 17:34:34 +0100 From: Bernd Walter To: Mattias Pantzare Cc: grog@lemis.com, freebsd-fs@FreeBSD.ORG Subject: Re: RAID-5 and failure Message-ID: <19991106173434.A9143@cicely7.cicely.de> References: <199911061558.QAA19311@zed.ludd.luth.se> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: <199911061558.QAA19311@zed.ludd.luth.se> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Sat, Nov 06, 1999 at 04:58:55PM +0100, Mattias Pantzare wrote: > What happens if the data part of a write to a RAID-5 plex completes but not the > parity part (or the other way)? > The parity is not in sync - what else?
-- B.Walter COSMO-Project http://www.cosmo-project.de ticso@cicely.de Usergroup info@cosmo-project.de To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Nov 6 9:16:55 1999 Delivered-To: freebsd-fs@freebsd.org Received: from zed.ludd.luth.se (zed.ludd.luth.se [130.240.16.33]) by hub.freebsd.org (Postfix) with ESMTP id D237314C92 for ; Sat, 6 Nov 1999 09:16:53 -0800 (PST) (envelope-from pantzer@speedy.ludd.luth.se) Received: from speedy.ludd.luth.se (pantzer@speedy.ludd.luth.se [130.240.16.164]) by zed.ludd.luth.se (8.8.5/8.8.5) with ESMTP id SAA20783; Sat, 6 Nov 1999 18:16:49 +0100 Message-Id: <199911061716.SAA20783@zed.ludd.luth.se> X-Mailer: exmh version 2.0.1 12/23/97 To: Bernd Walter Cc: freebsd-fs@FreeBSD.ORG Subject: Re: RAID-5 and failure In-Reply-To: Message from Bernd Walter of "Sat, 06 Nov 1999 17:34:34 +0100." <19991106173434.A9143@cicely7.cicely.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Sat, 06 Nov 1999 18:16:47 +0100 From: Mattias Pantzare Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > On Sat, Nov 06, 1999 at 04:58:55PM +0100, Mattias Pantzare wrote: > > What happens if the data part of a write to a RAID-5 plex completes but not the > > parity part (or the other way)? > > > The parity is not in sync - what else? The system could detect it and recalculate the parity. Or give a warning to the user so the user knows that the data is not safe.
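The situation Pantzare describes - the classic RAID-5 "write hole" - can be simulated in a few lines of Python (a hypothetical sketch, not vinum code): after a crash between the data write and the parity write, nothing in the stripe records which half landed, so without a log the only way to notice is to recompute the parity and compare.

```python
from functools import reduce

def xor_parity(blocks):
    """Bytewise XOR of the data blocks in a stripe."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# A consistent stripe: parity matches the data blocks.
data = [bytearray(b"\x01\x02"), bytearray(b"\x10\x20")]
par = bytearray(xor_parity(data))

# Simulated crash: the new data block lands on disk, but the matching
# parity write never happens.
data[0] = bytearray(b"\xaa\xbb")

# Nothing in the stripe marks which half went out, so the only log-free
# detection is a full recompute-and-compare pass over the plex.
stale = bytes(par) != xor_parity(data)
assert stale
```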
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Nov 6 9:33:28 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.du.gtn.com (mail.du.gtn.com [194.77.9.57]) by hub.freebsd.org (Postfix) with ESMTP id 548BE14E97 for ; Sat, 6 Nov 1999 09:33:25 -0800 (PST) (envelope-from ticso@mail.cicely.de) Received: from mail.cicely.de (cicely.de [194.231.9.142]) by mail.du.gtn.com (8.9.3/8.9.3) with ESMTP id SAA28192; Sat, 6 Nov 1999 18:26:34 +0100 (MET) Received: (from ticso@localhost) by mail.cicely.de (8.9.0/8.9.0) id SAA09438; Sat, 6 Nov 1999 18:33:16 +0100 (CET) Date: Sat, 6 Nov 1999 18:33:16 +0100 From: Bernd Walter To: Mattias Pantzare Cc: Bernd Walter , freebsd-fs@FreeBSD.ORG Subject: Re: RAID-5 and failure Message-ID: <19991106183316.A9420@cicely7.cicely.de> References: <199911061716.SAA20783@zed.ludd.luth.se> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: <199911061716.SAA20783@zed.ludd.luth.se> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Sat, Nov 06, 1999 at 06:16:47PM +0100, Mattias Pantzare wrote: > > On Sat, Nov 06, 1999 at 04:58:55PM +0100, Mattias Pantzare wrote: > > > What happens if the data part of a write to a RAID-5 plex completes but not the > > > parity part (or the other way)? > > > > > The parity is not in sync - what else? > > The system could detect it and recalculate the parity. Or give a warning to > the user so the user knows that the data is not safe. That's not possible, because you need to write more than a single sector to keep parity in sync, and that is not atomic. In case one of the writes fails, vinum will do everything needed to work with it and to inform the user. Vinum will take the subdisk down: such drives should run with write reallocation enabled, so a disk that returns a write error is badly broken.
If the system panics or power fails between such writes, there is no way to find out whether the parity is broken besides verifying the complete plex after reboot - the problem should be the same for all the usual hardware and software solutions - Greg has already begun or finished recalculating and checking the parity. I assume that's the reason why some systems use 520-byte sectors - maybe they write timestamps or generation numbers in a single write within the sector. -- B.Walter COSMO-Project http://www.cosmo-project.de ticso@cicely.de Usergroup info@cosmo-project.de To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Nov 6 10:27:27 1999 Delivered-To: freebsd-fs@freebsd.org Received: from zed.ludd.luth.se (zed.ludd.luth.se [130.240.16.33]) by hub.freebsd.org (Postfix) with ESMTP id 61E5514EFE for ; Sat, 6 Nov 1999 10:27:24 -0800 (PST) (envelope-from pantzer@speedy.ludd.luth.se) Received: from speedy.ludd.luth.se (pantzer@speedy.ludd.luth.se [130.240.16.164]) by zed.ludd.luth.se (8.8.5/8.8.5) with ESMTP id TAA22113; Sat, 6 Nov 1999 19:27:21 +0100 Message-Id: <199911061827.TAA22113@zed.ludd.luth.se> X-Mailer: exmh version 2.0.1 12/23/97 To: Bernd Walter Cc: freebsd-fs@FreeBSD.ORG Subject: Re: RAID-5 and failure In-Reply-To: Message from Bernd Walter of "Sat, 06 Nov 1999 18:33:16 +0100." <19991106183316.A9420@cicely7.cicely.de> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable Date: Sat, 06 Nov 1999 19:27:20 +0100 From: Mattias Pantzare Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > If the system panics or power fails between such writes, there is no way to > find out whether the parity is broken besides verifying the complete plex after > reboot - the problem should be the same for all the usual hardware and software > solutions - Greg has already begun or finished recalculating and checking the > parity.
This is really an optimisation issue: if you just write without using two-phase commit, then you have to recalculate parity after a power failure. (One might keep track of the regions of the disk that have had writes lately and only recalculate those.) Or you do as it says under Two-phase commitment in http://www.sunworld.com/sunworldonline/swol-09-1995/swol-09-raid5-2.html. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Nov 6 10:33:55 1999 Delivered-To: freebsd-fs@freebsd.org Received: from hromeo.algonet.se (hromeo.algonet.se [194.213.74.10]) by hub.freebsd.org (Postfix) with SMTP id 7BCDA14E82 for ; Sat, 6 Nov 1999 10:33:48 -0800 (PST) (envelope-from mal@algonet.se) Received: (qmail 22267 invoked from network); 6 Nov 1999 19:33:47 +0100 Received: from enok.algonet.se (194.213.74.88) by hromeo.algonet.se with SMTP; 6 Nov 1999 19:33:47 +0100 Received: from kairos.algonet.se ([194.213.74.18]) by algonet.se (BLUETAIL Mail Robustifier1.0.4) with ESMTP ; Sat, 06 Nov 1999 18:33:47 GMT Received: (mal@localhost) by kairos.algonet.se (8.8.8+Sun/8.6.12) id TAA04881; Sat, 6 Nov 1999 19:33:46 +0100 (MET) To: freebsd-fs@freebsd.org Subject: Re: stupidfs - easily extensible test file systems? References: <80113h$n8e$1@FreeBSD.csie.NCTU.edu.tw> From: Mats Lofkvist Date: 06 Nov 1999 19:33:46 +0100 In-Reply-To: julian@whistle.com's message of "6 Nov 1999 18:49:21 +0800" Message-ID: Lines: 25 X-Mailer: Gnus v5.6.45/Emacs 20.3 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org julian@whistle.com (Julian Elischer) writes: > On Fri, 5 Nov 1999, Jan Pechanec wrote: > > > > BTW, don't you know why deadfs was written? No doc in FreeBSD. > > From what I saw in the source code, operations just fail. > > > When you have a vnode open, and for some reason the filesystem the vnode > points to disappears (e.g.
the disk is removed, or the PC-CARD is removed, > or many other possibilities), then you cannot track down all the users of > that vnode very easily, so instead you 'fiddle' with it to make it > reference the DEADFS (use VGONE) and when the users try to use it again they > will safely get an error, but at least the system will > not core-dump when they access a non-existent filesystem/device. I guess deadfs is also what makes the -f (force) flag to umount work, and that is a truly great feature in FreeBSD missing in many other unixen (e.g. Solaris {and Linux, I believe}). Having to track down all processes with open descriptors on e.g. an nfs mount before being able to umount it is a real pain in the *; most times I give up on it and reboot the machine instead. _ Mats Lofkvist mal@algonet.se To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Nov 6 11: 8: 5 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.du.gtn.com (mail.du.gtn.com [194.77.9.57]) by hub.freebsd.org (Postfix) with ESMTP id CF03C14BDC for ; Sat, 6 Nov 1999 11:08:02 -0800 (PST) (envelope-from ticso@mail.cicely.de) Received: from mail.cicely.de (cicely.de [194.231.9.142]) by mail.du.gtn.com (8.9.3/8.9.3) with ESMTP id UAA03422; Sat, 6 Nov 1999 20:01:12 +0100 (MET) Received: (from ticso@localhost) by mail.cicely.de (8.9.0/8.9.0) id UAA09809; Sat, 6 Nov 1999 20:07:54 +0100 (CET) Date: Sat, 6 Nov 1999 20:07:54 +0100 From: Bernd Walter To: Mattias Pantzare Cc: Bernd Walter , freebsd-fs@FreeBSD.ORG Subject: Re: RAID-5 and failure Message-ID: <19991106200754.A9682@cicely7.cicely.de> References: <199911061827.TAA22113@zed.ludd.luth.se> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: <199911061827.TAA22113@zed.ludd.luth.se> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Sat, Nov 06, 1999 at 07:27:20PM +0100, Mattias Pantzare wrote: > > If
the system panics or power fails between such writes, there is no way to > > find out whether the parity is broken besides verifying the complete plex after > > reboot - the problem should be the same for all the usual hardware and software > > solutions - Greg has already begun or finished recalculating and checking the > > parity. > > This is really an optimisation issue: if you just write without using > two-phase commit, then you have to recalculate parity after a power failure. > (One might keep track of the regions of the disk that have had writes lately > and only recalculate those.) > > Or you do as it says under Two-phase commitment in > http://www.sunworld.com/sunworldonline/swol-09-1995/swol-09-raid5-2.html. > That's exactly what vinum does at this moment, but without the log. You need persistent memory for this, such as nv-memory or a log area on some disk. nv-memory on PCs is usually too small and maybe too slow for such purposes. I assume that a log area on a participating disk is not a good idea. On a different disk it would be an option, but that still needs implementation. -- B.Walter COSMO-Project http://www.cosmo-project.de ticso@cicely.de Usergroup info@cosmo-project.de To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message
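The log Bernd describes can be modeled as a toy two-phase scheme (a hypothetical Python sketch; the class and names are invented for illustration): record the stripe in persistent storage before touching data or parity, clear the record afterwards, and after a crash rebuild parity only for the stripes still named in the log instead of verifying the whole plex.

```python
from functools import reduce

def xor_parity(blocks):
    """Bytewise XOR of the data blocks in a stripe."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

class LoggedRaid5:
    """Toy model of the intent-log idea: note a stripe in a persistent log
    before writing data or parity, clear the note once both are on disk."""

    def __init__(self, stripes):
        self.stripes = stripes  # stripe number -> list of data blocks
        self.parity = {s: xor_parity(d) for s, d in stripes.items()}
        self.log = set()        # stands in for nv-memory or a log disk

    def write(self, stripe, idx, block, crash_after_data=False):
        self.log.add(stripe)                 # phase 1: record the intent
        self.stripes[stripe][idx] = block    # data write
        if crash_after_data:
            return                           # power fails: parity never written
        self.parity[stripe] = xor_parity(self.stripes[stripe])
        self.log.discard(stripe)             # phase 2: commit complete

    def recover(self):
        """Rebuild parity only for stripes the log says were in flight."""
        for s in list(self.log):
            self.parity[s] = xor_parity(self.stripes[s])
            self.log.discard(s)

r = LoggedRaid5({0: [b"\x01", b"\x02"], 1: [b"\x0f", b"\xf0"]})
r.write(0, 0, b"\xaa", crash_after_data=True)  # simulated power failure
assert r.log == {0}                            # recovery knows where to look
r.recover()
assert r.parity[0] == xor_parity(r.stripes[0])
```

The trade-off Bernd raises is visible here: every write costs two extra log updates, which is why the log wants fast persistent memory rather than a seek on a participating data disk.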