From owner-freebsd-fs Sun Oct 31 3: 5:30 1999 Delivered-To: freebsd-fs@freebsd.org Received: from smtp1.xs4all.nl (smtp1.xs4all.nl [194.109.127.48]) by hub.freebsd.org (Postfix) with ESMTP id E8EB314C38 for ; Sun, 31 Oct 1999 03:05:17 -0800 (PST) (envelope-from rr@xs4all.nl) Received: from xs3.xs4all.nl (xs3.xs4all.nl [194.109.6.44]) by smtp1.xs4all.nl (8.9.3/8.9.3) with ESMTP id MAA19130 for ; Sun, 31 Oct 1999 12:05:15 +0100 (CET) Received: (from rr@localhost) by xs3.xs4all.nl (8.9.0/8.9.0) id MAA28836 for freebsd-fs@FreeBSD.ORG; Sun, 31 Oct 1999 12:05:14 +0100 (CET) Date: Sun, 31 Oct 1999 12:05:14 +0100 From: Rodney To: freebsd-fs@FreeBSD.ORG Subject: feature list journalled fs Message-ID: <19991031120514.A28103@xs4all.nl> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org

hi,

here's my list of features I'd like to see in a journalled fs. Have to admit this list is heavily inspired (ok, copied) from the VxFS features. Apart from the buzzwords, some of them make sense and some of them don't, but it should give us some stuff to discuss:

1) extent-based allocation: coding this should be easy; it's just an address-length pair identifying the starting block address and the length of the extent. I've seen this coded up in qnxfs under Linux. I think the vsta filesystem does something similar.

2) fast filesystem recovery, obviously

3) ACLs would be nice, AFS style?

4) online defrag and resizing (while users are online)

5) online backup/snapshot

6) vinum integration (vague)

7) built-in features that make databases like msql/mysql/oracle very happy. (vague)

Also, b?trees for indexing sound cool, though the xfs implementation seems quite heavy (they maintain 2 of them), i.e. overkill? The way b+trees are used in the Be fs (bfs) might be more appropriate.

Comments ?

rodney
--
"I can't understand why people are frightened of new ideas.
I'm frightened of old ones."
--John Cage -- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sun Oct 31 10:53:52 1999 Delivered-To: freebsd-fs@freebsd.org Received: from angel.algonet.se (angel.algonet.se [194.213.74.112]) by hub.freebsd.org (Postfix) with SMTP id B913C14CAC for ; Sun, 31 Oct 1999 10:53:48 -0800 (PST) (envelope-from mal@algonet.se) Received: (qmail 6333 invoked from network); 31 Oct 1999 19:53:44 +0100 Received: from kent.algonet.se (194.213.74.90) by angel.algonet.se with SMTP; 31 Oct 1999 19:53:44 +0100 Received: from kairos.algonet.se ([194.213.74.18]) by algonet.se (BLUETAIL Mail Robustifier1.0.4) with ESMTP ; Sun, 31 Oct 1999 18:53:44 GMT Received: (mal@localhost) by kairos.algonet.se (8.8.8+Sun/8.6.12) id TAA03522; Sun, 31 Oct 1999 19:53:43 +0100 (MET) To: freebsd-fs@FreeBSD.org Cc: ezk@cs.columbia.edu (Erez Zadok) Subject: Re: stupidfs - easily extensible test file systems? From: Mats Lofkvist Date: 31 Oct 1999 19:53:43 +0100 In-Reply-To: ezk@cs.columbia.edu's message of "29 Oct 1999 05:23:26 +0800" Message-ID: Lines: 26 X-Mailer: Gnus v5.6.45/Emacs 20.3 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org ezk@cs.columbia.edu (Erez Zadok) writes: > Robert, it's been done. To some degree that's nullfs (if nullfs had been > working; the VFS is broken). I've written stackable f/s templates exactly > for the purpose of developers using them to build other f/s w/o having the > many hassles of writing a full f/s. My wrapper templates, called wrapfs, > work on freebsd, linux, and solaris. You can build all kinds of f/s using > them, including f/s that do not require persistent storage. > > See > http://www.cs.columbia.edu/~ezk/research > for papers, and > http://www.cs.columbia.edu/~ezk/research/software > for tarballs. Is wrapfs/fist actively updated for FreeBSD? (I noted that the latest FreeBSD version is almost a year old and for 3.0 only.) 
And does anyone know if this has a chance of being a standard part of FreeBSD, and how it relates to the general cleanup of the stacking fs code that seems to be on the "todo sometime in the future" list for FreeBSD? _ Mats Lofkvist mal@algonet.se To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sun Oct 31 12: 5:20 1999 Delivered-To: freebsd-fs@freebsd.org Received: from haldjas.folklore.ee (Haldjas.folklore.ee [193.40.6.121]) by hub.freebsd.org (Postfix) with ESMTP id 6F9A414C01 for ; Sun, 31 Oct 1999 12:05:15 -0800 (PST) (envelope-from narvi@haldjas.folklore.ee) Received: from localhost (narvi@localhost) by haldjas.folklore.ee (8.9.3/8.9.3) with SMTP id WAA25104; Sun, 31 Oct 1999 22:04:55 +0200 (EET) (envelope-from narvi@haldjas.folklore.ee) Date: Sun, 31 Oct 1999 22:04:55 +0200 (EET) From: Narvi To: Rodney Cc: freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs In-Reply-To: <19991031120514.A28103@xs4all.nl> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org

On Sun, 31 Oct 1999, Rodney wrote:
> hi,
>
> here's my list of features I'd like to see in a
> journalled fs. Have to admit this list is heavily
> inspired (ok, copied) from the VxFS features,
> apart from the buzzwords,
> some of them make sense, some of them don't
> but it should give us some stuff to discuss:
>
> Comments ?
>

You forgot to include *anything* that in any way relates to journaling. As did others. Which leaves the question whether you want a journaled filesystem at all, or just a filesystem conforming to a lot of buzzwords.

IMHO it would be good to have a journaled filesystem. 8-) If it is extensible enough to easily allow a selection of the buzzwords that have been thrown around, so much the better. But the utility of the features would hopefully be tested before actually being incorporated.
If somebody is making a list of 'features' then they should add:

* Can optimise data placement for the case that the partition it resides on is not located on a single spindle but resides on n spindles. Think of vinum being standard in the system.

> rodney
> --
> "I can't understand why people are frightened of new ideas.
> I'm frightened of old ones." --John Cage
>

-- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sun Oct 31 14:11:25 1999 Delivered-To: freebsd-fs@freebsd.org Received: from cs.columbia.edu (cs.columbia.edu [128.59.16.20]) by hub.freebsd.org (Postfix) with ESMTP id C531814EF6 for ; Sun, 31 Oct 1999 14:11:21 -0800 (PST) (envelope-from ezk@shekel.mcl.cs.columbia.edu) Received: from shekel.mcl.cs.columbia.edu (shekel.mcl.cs.columbia.edu [128.59.18.15]) by cs.columbia.edu (8.9.1/8.9.1) with ESMTP id RAA27674; Sun, 31 Oct 1999 17:11:21 -0500 (EST) Received: (from ezk@localhost) by shekel.mcl.cs.columbia.edu (8.9.1/8.9.1) id RAA00014; Sun, 31 Oct 1999 17:11:20 -0500 (EST) Date: Sun, 31 Oct 1999 17:11:20 -0500 (EST) Message-Id: <199910312211.RAA00014@shekel.mcl.cs.columbia.edu> X-Authentication-Warning: shekel.mcl.cs.columbia.edu: ezk set sender to ezk@shekel.mcl.cs.columbia.edu using -f From: Erez Zadok To: Mats Lofkvist Cc: freebsd-fs@FreeBSD.org, ezk@cs.columbia.edu (Erez Zadok) Subject: Re: stupidfs - easily extensible test file systems? In-reply-to: Your message of "31 Oct 1999 19:53:43 +0100." Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org In message , Mats Lofkvist writes: > ezk@cs.columbia.edu (Erez Zadok) writes: > > > Robert, it's been done. To some degree that's nullfs (if nullfs had been > > working; the VFS is broken). I've written stackable f/s templates exactly > > for the purpose of developers using them to build other f/s w/o having the > > many hassles of writing a full f/s.
My wrapper templates, called wrapfs, > > work on freebsd, linux, and solaris. You can build all kinds of f/s using > > them, including f/s that do not require persistent storage. > > > > See > > http://www.cs.columbia.edu/~ezk/research > > for papers, and > > http://www.cs.columbia.edu/~ezk/research/software > > for tarballs. > > Is wrapfs/fist actively updated for FreeBSD? (I noted that the > latest FreeBSD version is almost a year old and for 3.0 only.) I will be updating this port for 3.3 and 4.0 in the two weeks following LISA, i.e. by end of November. > And does anyone know if this has a chance being a standard part > of FreeBSD, and how it relates to the general cleanup of the > stacking fs code that seem to be on the "todo sometime in the > future" list for FreeBSD? What do you mean by "this"? My code will be fixed soon. The problem is that I'm forced to use synchronous writes to work around the VFS problems. I don't expect the VFS to be fixed any time soon. It's been broken for a long time and there aren't too many "customers" complaining about it, or it would have been fixed by now. It just doesn't appear to be a high priority for the freebsd developers. I think it's too late for 3.x, but now would be a good time for freebsd to put those fixes into 4.0, before it becomes the default stable version. Many people on this list understand the problems and know how to fix them. There are even some experimental patches made by Eivind Eklund, but those patches aren't part of the kernel. Eivind's patches used to be in http://www.freebsd.org/~eivind/VOP_GETBACKINGOBJECT.patch and now they appear to be in http://www.freebsd.org/~eivind/FixNULL.patch (Eivind, can you confirm the new URL? FixNull.patch seems to include stuff unrelated to the VFS, such as scsi driver fixes. Thanks.) There's also been talk about some people (McKusick et al) rewriting the whole VFS. 
While I think that's a great idea, it's a large undertaking and will take a long while for busy people like McKusick to complete. I think a complete rewrite, if any, should be scheduled for 5.x. I would therefore suggest that a simpler fix such as Eivind's be incorporated into 4.0 so people can use stackable f/s (unionfs, nullfs, and my wrapfs/cryptfs, etc.) in the more immediate future. > Mats Lofkvist > mal@algonet.se I'd like to mention that I understand the pressures the freebsd developers are under. From a support perspective, you have to prioritize your human resources based on customer needs. There are, however, enough people (myself included) who are willing to work together and come up with a design and an implementation of the VFS fixes, and we are willing to spend our personal (i.e., free) time to do so. All we ask is commitment on the part of the management to include such patches in a none-too-distant future release. BTW, if there's enough momentum and FreeBSD developers/hackers attending LISA, we can have a brainstorming meeting in Seattle... Thanks, Erez Zadok. --- Columbia University Department of Computer Science.
EMail: ezk@cs.columbia.edu Web: http://www.cs.columbia.edu/~ezk To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sun Oct 31 22:55:27 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.tvol.com (mail.wgate.com [38.219.83.4]) by hub.freebsd.org (Postfix) with ESMTP id 4C99315248 for ; Sun, 31 Oct 1999 22:55:20 -0800 (PST) (envelope-from rjesup@wgate.com) Received: from jesup.eng.tvol.net (jesup.eng.tvol.net [10.32.2.26]) by mail.tvol.com (8.8.8/8.8.3) with ESMTP id BAA07275 for ; Mon, 1 Nov 1999 01:50:35 -0500 (EST) Reply-To: Randell Jesup To: freebsd-fs@FreeBSD.ORG Subject: Re: journaling UFS and LFS References: From: Randell Jesup Date: 01 Nov 1999 02:51:47 +0000 In-Reply-To: Don's message of "Sat, 30 Oct 1999 19:40:35 -0400 (EDT)" Message-ID: X-Mailer: Gnus v5.6.43/Emacs 20.4 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Don writes: >> Most corporate IT managers wouldn't know a filesystem if they were >> bitten by one. >That is absolutely the case. That is why I can not suggest that >softupdates is as good as a journaled file system. The people I deal with >at least know the buzzword and they want to make sure that whatever >solution they go with will have it. Question: is the fsck time for softupdates the same as for plain UFS (when it needs to fsck, which should be (much) less often, if I remember correctly). Even the occasional long-fsck-time can be a problem for a high-availability production environment. Side question: why is it that there are certain errors (inode out of range, for example) that fsck barfs on and exits? I actually had to go in to the source for fsck and modify it to recover a drive of a coworker (with important changes since the last nightly backup). And please don't say "just clrinode it and retry". 
First, if you have more than a couple of them this can take a LONG time and lots of manual intervention (in this case, hundreds or more likely thousands of manual clrinodes would have been needed). Second, if that's the suggested resolution, why not make it possible to do from within fsck? If it's REALLY dangerous, then warn people about that, or stop the normal automatic mode from doing this correction without another option (the --i_really_mean_it_i_live_for_danger option). :-) If I hadn't known filesystems and been able to hack the source, the coworker would have lost some important work. -- Randell Jesup, Worldgate Communications, ex-Scala, ex-Amiga OS team ('88-94) rjesup@wgate.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sun Oct 31 23:33:40 1999 Delivered-To: freebsd-fs@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 9547D15419 for ; Sun, 31 Oct 1999 23:33:33 -0800 (PST) (envelope-from bright@wintelcom.net) Received: from localhost (bright@localhost) by fw.wintelcom.net (8.9.3/8.9.3) with ESMTP id XAA20576; Sun, 31 Oct 1999 23:56:49 -0800 (PST) Date: Sun, 31 Oct 1999 23:56:48 -0800 (PST) From: Alfred Perlstein To: Randell Jesup Cc: freebsd-fs@FreeBSD.ORG Subject: Re: journaling UFS and LFS In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On 1 Nov 1999, Randell Jesup wrote: > Don writes: > >> Most corporate IT managers wouldn't know a filesystem if they were > >> bitten by one. > >That is absolutely the case. That is why I can not suggest that > >softupdates is as good as a journaled file system. The people I deal with > >at least know the buzzword and they want to make sure that whatever > >solution they go with will have it. 
> > Question: is the fsck time for softupdates the same as for > plain UFS (when it needs to fsck, which should be (much) less often, > if I remember correctly). Even the occasional long-fsck-time can be > a problem for a high-availability production environment. > > Side question: why is it that there are certain errors (inode out > of range, for example) that fsck barfs on and exits? I actually had to > go in to the source for fsck and modify it to recover a drive of a > coworker (with important changes since the last nightly backup). And > please don't say "just clrinode it and retry". First, if you have > more than a couple of them this can take a LONG time and lots of > manual intervention (in this case, hundreds or more likely thousands of > manual clrinodes would have been needed). Second, if that's the suggested > resolution, why not make it possible to do from within fsck? If it's > REALLY dangerous, then warn people about that, or stop the normal > automatic mode from doing this correction without another option (the > --i_really_mean_it_i_live_for_danger option). :-) A url to your patches would be appreciated. -Alfred > If I hadn't known filesystems and been able to hack the source, > the coworker would have lost some important work. 
> > -- > Randell Jesup, Worldgate Communications, ex-Scala, ex-Amiga OS team ('88-94) > rjesup@wgate.com > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 2:12:54 1999 Delivered-To: freebsd-fs@freebsd.org Received: from akat.civ.cvut.cz (akat.civ.cvut.cz [147.32.235.105]) by hub.freebsd.org (Postfix) with SMTP id 4EF8C14E04 for ; Mon, 1 Nov 1999 02:12:40 -0800 (PST) (envelope-from pechy@hp735.cvut.cz) Received: from localhost (pechy@localhost) by akat.civ.cvut.cz (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id LAA21363; Mon, 1 Nov 1999 11:11:06 +0100 Date: Mon, 1 Nov 1999 11:11:06 +0100 From: Jan Pechanec X-Sender: pechy@akat.civ.cvut.cz To: Poul-Henning Kamp Cc: Greg Lehey , Bernd Walter , Don , Alfred Perlstein , freebsd-fs@FreeBSD.ORG Subject: Re: Journaling In-Reply-To: <4407.941213948@critter.freebsd.dk> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Fri, 29 Oct 1999, Poul-Henning Kamp wrote: Vahalia [UNIX Internals, Prentice-Hall] says that FFS is an original BSD filesystem and UFS is rewritten FFS for vnode layer. Jan. >In message <19991029095858.50758@mojave.worldwide.lemis.com>, Greg Lehey writes: >>On Wednesday, 27 October 1999 at 19:32:00 +0200, Bernd Walter wrote: >>> The number of partitions has nothing to do with with the filesystem you use. >>> FFS is not a partitionsheme but a filesystem. >>> UFS is a historic filesystem on which FFS is based. >> >>Well, in fact they're the same thing. The *old* name is FFS (Fast >>File System). When System V.4 was released, they adopted FFS as the >>standard file system and called it the UNIX File System. > >...Whereas in *BSD "UFS" refers to the unix sematics layer (directory >manipulation and all that) and "FFS" refers to the underlying storage >object manager (which only understands inodes and their layout.) 
> >-- >Poul-Henning Kamp FreeBSD coreteam member >phk@FreeBSD.ORG "Real hackers run -current on their laptop." >FreeBSD -- It will take a long time before progress goes too far! > > >To Unsubscribe: send mail to majordomo@FreeBSD.org >with "unsubscribe freebsd-fs" in the body of the message > -- Jan PECHANEC (mailto:pechy@hp735.cvut.cz) Computing Center CTU (Zikova 4, Praha 6, 166 35, Czech Republic) http://www.civ.cvut.cz, tel: +420 2 2435 2969, http://pechy.civ.cvut.cz To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 2:40:30 1999 Delivered-To: freebsd-fs@freebsd.org Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.40.131]) by hub.freebsd.org (Postfix) with ESMTP id 34FAD14A0B for ; Mon, 1 Nov 1999 02:40:26 -0800 (PST) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.9.3/8.9.2) with ESMTP id LAA20349; Mon, 1 Nov 1999 11:36:56 +0100 (CET) (envelope-from phk@critter.freebsd.dk) To: Jan Pechanec Cc: Greg Lehey , Bernd Walter , Don , Alfred Perlstein , freebsd-fs@FreeBSD.ORG Subject: Re: Journaling In-reply-to: Your message of "Mon, 01 Nov 1999 11:11:06 +0100." Date: Mon, 01 Nov 1999 11:36:56 +0100 Message-ID: <20347.941452616@critter.freebsd.dk> From: Poul-Henning Kamp Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org In message , Jan Pechanec writes: > > Vahalia [UNIX Internals, Prentice-Hall] says that FFS is an >original BSD filesystem and UFS is rewritten FFS for vnode layer. > Well, who do you trust, Kirk & the source, or Vahalia ? Poul-Henning >On Fri, 29 Oct 1999, Poul-Henning Kamp wrote: >> >>...Whereas in *BSD "UFS" refers to the unix sematics layer (directory >>manipulation and all that) and "FFS" refers to the underlying storage >>object manager (which only understands inodes and their layout.) 
-- Poul-Henning Kamp FreeBSD coreteam member phk@FreeBSD.ORG "Real hackers run -current on their laptop." FreeBSD -- It will take a long time before progress goes too far! To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 6:36:49 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.tvol.com (mail.wgate.com [38.219.83.4]) by hub.freebsd.org (Postfix) with ESMTP id DBFBC14BFA for ; Mon, 1 Nov 1999 06:36:42 -0800 (PST) (envelope-from rjesup@wgate.com) Received: from jesup.eng.tvol.net (jesup.eng.tvol.net [10.32.2.26]) by mail.tvol.com (8.8.8/8.8.3) with ESMTP id JAA20835 for ; Mon, 1 Nov 1999 09:32:00 -0500 (EST) Reply-To: Randell Jesup To: freebsd-fs@FreeBSD.ORG Subject: Re: Features of a journaled file system References: <19991031014032.A3510@keltia.freenix.fr> <381B85AB.68EF4A45@zk3.dec.com> From: Randell Jesup Date: 01 Nov 1999 10:33:10 +0000 In-Reply-To: Chang Song's message of "Sat, 30 Oct 1999 19:56:27 -0400" Message-ID: X-Mailer: Gnus v5.6.43/Emacs 20.4 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org

Chang Song writes:
>> > Should the file system use b-trees? What other technologies should such a
>>
>> B-trees would help a lot in some cases. UFS performance has always been
>> abysmal with large directories...
>
>I think a B+ tree is too complex to maintain and implement.
>Extendible hashing (GFS uses it) is a great compromise. Easier to implement
>yet competitive or sometimes faster than a B+ tree.

I'm a big fan of hashing for directories. Add something to (say) cause the FS to add a hash level to a chain that grows too large (or to all the chains of a directory that grows too large), and the benefit is almost as large as a b+ tree for access/modify, and add/delete would be (much?) faster (it's been a while since I looked at b+ trees for directories - OS/2 uses them if I remember correctly).
I slightly prefer adding hash table levels according to chain length, but that might require some extra bookkeeping (not a lot, just a counter per chain). I've written FS's that kept duplicate directory lists: one hashed (single level) for speed, and one sequential for speed at listing directories. It has the nice side-effect of making the FS more easily recoverable, at the cost of some disk space and slightly slower create/delete. (The sequential blocks were heavily compressed, and the hash tables only had the head file-header block (inode) of each chain. (I'm not saying this design should be duplicated - it was a way to avoid changing too much of an FS written in ASM, while speeding up dir listings; just mentioning.) -- Randell Jesup, Worldgate Communications, ex-Scala, ex-Amiga OS team ('88-94) rjesup@wgate.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 7: 4:41 1999 Delivered-To: freebsd-fs@freebsd.org Received: from worf.qntm.com (worf.qntm.com [146.174.250.100]) by hub.freebsd.org (Postfix) with ESMTP id 84DEA14BC2 for ; Mon, 1 Nov 1999 07:04:29 -0800 (PST) (envelope-from Stephen.Byan@quantum.com) Received: from mail3.qntm.com by worf.qntm.com with ESMTP (1.40.112.12/16.2) id AA146378668; Mon, 1 Nov 1999 07:04:28 -0800 Received: from milcmima.qntm.com (milcmima.qntm.com [146.174.18.61]) by mail3.qntm.com (8.8.6/8.8.6) with ESMTP id HAA14917 for ; Mon, 1 Nov 1999 07:04:29 -0800 (PST) Received: by milcmima.qntm.com with Internet Mail Service (5.5.2650.10) id ; Mon, 1 Nov 1999 07:04:25 -0800 Message-Id: <8133266FE373D11190CD00805FA768BF02EE9DD7@shrcmsg1.tdh.qntm.com> From: Stephen Byan To: freebsd-fs@FreeBSD.ORG Subject: RE: journaling UFS and LFS Date: Mon, 1 Nov 1999 07:04:24 -0800 Mime-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2650.10) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org 
>Chang Song [mailto:song@zk3.dec.com] wrote: > >Don wrote: >> >> Softupdates is definitely a viable solution however it does not address >> several issues and the license is not a BSD license so it makes me >> uncomfortable. > >Could you let me know what SoftUpdate does not address? >Thank you. One potential problem with soft updates is that the order of creation/deletion/truncation/etc of files is not preserved through a crash or power outage, whereas UFS and logged file systems (not logging file systems as in LFS; what do you call the kind that maintains a recovery log in addition to its regular metadata?) preserve this ordering. I wonder how many recovery strategies are broken by soft updates. Anyone have any data? Regards, -Steve Steve Byan Design Engineer Quantum Corporation MS 1-3/E23 333 South Street Shrewsbury, MA 01545 voice: (508) 770-3414 fax: (508) 770-2604 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 7:52:43 1999 Delivered-To: freebsd-fs@freebsd.org Received: from angel.algonet.se (angel.algonet.se [194.213.74.112]) by hub.freebsd.org (Postfix) with SMTP id 535DA14C92 for ; Mon, 1 Nov 1999 07:52:37 -0800 (PST) (envelope-from mal@algonet.se) Received: (qmail 13586 invoked from network); 1 Nov 1999 16:52:35 +0100 Received: from enok.algonet.se (194.213.74.88) by angel.algonet.se with SMTP; 1 Nov 1999 16:52:35 +0100 Received: from kairos.algonet.se ([194.213.74.18]) by algonet.se (BLUETAIL Mail Robustifier1.0.4) with ESMTP ; Mon, 01 Nov 1999 15:52:35 GMT Received: (mal@localhost) by kairos.algonet.se (8.8.8+Sun/8.6.12) id QAA19191; Mon, 1 Nov 1999 16:52:34 +0100 (MET) Date: Mon, 1 Nov 1999 16:52:34 +0100 (MET) Message-Id: <199911011552.QAA19191@kairos.algonet.se> X-Authentication-Warning: kairos.algonet.se: mal set sender to mal@kairos.algonet.se using -f From: Mats Lofkvist To: ezk@cs.columbia.edu Cc: freebsd-fs@FreeBSD.org In-reply-to:
<199910312211.RAA00014@shekel.mcl.cs.columbia.edu> (message from Erez Zadok on Sun, 31 Oct 1999 17:11:20 -0500 (EST)) Subject: Re: stupidfs - easily extensible test file systems? References: <199910312211.RAA00014@shekel.mcl.cs.columbia.edu> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > And does anyone know if this has a chance being a standard part > of FreeBSD, and how it relates to the general cleanup of the > stacking fs code that seem to be on the "todo sometime in the > future" list for FreeBSD? What do you mean by "this"? My code will be fixed soon. The problem is that I'm forced to use synchronous writes to work around the VFS problems. I don't expect the VFS to be fixed any time soon. It's been broken for a long time and there aren't too many "customers" complaining about it, or it would have been fixed by now. It just doesn't appear to be a high priority for the freebsd developers. I think it's too late for 3.x, but now would be a good time for freebsd to put those fixes into 4.0, before it becomes the default stable version. (My limited VFS knowledge shows here, but what the heck..) What I wondered was if fist/wrapfs helps cleaning up the FreeBSD VFS code, is only using it as is, or if it is incompatible with what the FreeBSD architects have in mind. I.e. is it a good idea to build a new FreeBSD filesystem using wrapfs? 
_ Mats Lofkvist mal@algonet.se To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 8:21:15 1999 Delivered-To: freebsd-fs@freebsd.org Received: from ns1.yes.no (ns1.yes.no [195.204.136.10]) by hub.freebsd.org (Postfix) with ESMTP id 66C6914BFA for ; Mon, 1 Nov 1999 08:21:09 -0800 (PST) (envelope-from eivind@bitbox.follo.net) Received: from bitbox.follo.net (bitbox.follo.net [195.204.143.218]) by ns1.yes.no (8.9.3/8.9.3) with ESMTP id RAA04291; Mon, 1 Nov 1999 17:19:38 +0100 (CET) Received: (from eivind@localhost) by bitbox.follo.net (8.8.8/8.8.6) id RAA73394; Mon, 1 Nov 1999 17:19:37 +0100 (MET) Date: Mon, 1 Nov 1999 17:19:36 +0100 From: Eivind Eklund To: Don Cc: Jacques Vidrine , freebsd-fs@FreeBSD.ORG Subject: Re: journaling UFS and LFS Message-ID: <19991101171936.J72085@bitbox.follo.net> References: <19991030233304.03DB31DA4@bone.nectar.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: ; from don@calis.blacksun.org on Sat, Oct 30, 1999 at 07:40:35PM -0400 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Sat, Oct 30, 1999 at 07:40:35PM -0400, Don wrote: > This is getting off topic. What features would you like to see in a new > file system. Some suggestions were made. Would you like to add anything to > this list? Yes. * Easy to do concurrent access from multiple hosts to the same physical media * Ability to span more than one disk * Performance guarantees I have design papers on the FS designed for G2, which was intended to support all of the features I've seen listed so far. It has a couple of drawbacks: (1) It is not designed to have the semantics of a standard Unix filesystem. It is designed to run at the bottom end of a chain of stacked filesystems. If you want e.g. symlinks to work, you need to stack a layer. (2) It is not designed to run on a single spindle. 
Single spindle performance will be horrible. Eivind. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 10:46: 1 1999 Delivered-To: freebsd-fs@freebsd.org Received: from cs.columbia.edu (cs.columbia.edu [128.59.16.20]) by hub.freebsd.org (Postfix) with ESMTP id C610014A20 for ; Mon, 1 Nov 1999 10:45:52 -0800 (PST) (envelope-from ezk@shekel.mcl.cs.columbia.edu) Received: from shekel.mcl.cs.columbia.edu (shekel.mcl.cs.columbia.edu [128.59.18.15]) by cs.columbia.edu (8.9.1/8.9.1) with ESMTP id NAA08720; Mon, 1 Nov 1999 13:45:51 -0500 (EST) Received: (from ezk@localhost) by shekel.mcl.cs.columbia.edu (8.9.1/8.9.1) id NAA02285; Mon, 1 Nov 1999 13:45:51 -0500 (EST) Date: Mon, 1 Nov 1999 13:45:51 -0500 (EST) Message-Id: <199911011845.NAA02285@shekel.mcl.cs.columbia.edu> X-Authentication-Warning: shekel.mcl.cs.columbia.edu: ezk set sender to ezk@shekel.mcl.cs.columbia.edu using -f From: Erez Zadok To: Mats Lofkvist Cc: ezk@cs.columbia.edu, freebsd-fs@FreeBSD.org Subject: Re: stupidfs - easily extensible test file systems? In-reply-to: Your message of "Mon, 01 Nov 1999 16:52:34 +0100." <199911011552.QAA19191@kairos.algonet.se> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org In message <199911011552.QAA19191@kairos.algonet.se>, Mats Lofkvist writes: > > > And does anyone know if this has a chance being a standard part > > of FreeBSD, and how it relates to the general cleanup of the > > stacking fs code that seem to be on the "todo sometime in the > > future" list for FreeBSD? > > What do you mean by "this"? My code will be fixed soon. The problem > is that I'm forced to use synchronous writes to work around the VFS > problems. I don't expect the VFS to be fixed any time soon. It's been > broken for a long time and there aren't too many "customers" > complaining about it, or it would have been fixed by now. 
It just > doesn't appear to be a high priority for the freebsd developers. I > think it's too late for 3.x, but now would be a good time for freebsd > to put those fixes into 4.0, before it becomes the default stable > version. > > (My limited VFS knowledge shows here, but what the heck..) > > What I wondered was if fist/wrapfs helps cleaning up the FreeBSD VFS code, > is only using it as is, or if it is incompatible with what the FreeBSD > architects have in mind. > > I.e. is it a good idea to build a new FreeBSD filesystem using wrapfs? Yes. That's the premise of my Ph.D. work: (1) I provide you with stackable templates that do not change the VFS, do not change anything else in the OS, and do not modify lower level file systems (FFS, NFS, etc.) That way, when my templates are not in use, the performance of the rest of the system remains the same (which was not true for past stackable vnode interface works). (2) My wrapfs templates use the VFS as is. That was an important goal for me, knowing full well that requiring any significant changes to the VFS will never be accepted by any OS vendor. Requiring big changes was one reason why all the work done by Sun and UCLA is not available in modern, common OSs; no one wants to rewrite the VFS and all file systems to conform to a new "real" stackable interface. (3) The wrapfs templates export a simple API that's similar across different OSS. I have templates for FreeBSD, Linux, and Solaris. When an OS makes small changes to their VFS, I update the wrapfs templates as needed. People who used wrapfs as a basis for another file system don't have to worry too much about kernel internals. Creating the wrapfs templates was the first half of my Ph.D. work. The second half is the creation of a high level stackable f/s language, which I call FiST. Fistgen, the language translator, uses wrapfs templates and f/s descriptions to produce f/s modules automatically for your choice OS. That's a summary of things. 
If you want more details, I'll be happy to provide them. You can also read my USENIX'99 paper titled "Extending File Systems Using Stackable Templates", available in http://www.cs.columbia.edu/~ezk/research/ Also, there's a WIP paper on FiST in http://www.cs.columbia.edu/~ezk/research/wip.html Now, going back to FreeBSD: I didn't want to change the FreeBSD VFS, so I worked around it. I used synchronous writes to "solve" the backing-object problem. The result was a wrapfs template that is slower due to all those synchronous writes, but at least it works (unlike nullfs and unionfs). When the FreeBSD VFS is fixed, I will produce updated wrapfs templates that don't need synchronous writes. After that, you'd have to port your diffs from the old wrapfs template to the new one (that could be automated using a 3-way diff). If fistgen is out by then, using new templates would be as easy as rerunning fistgen on your existing ".fist" file. > Mats Lofkvist > mal@algonet.se Erez. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 12:32:37 1999 Delivered-To: freebsd-fs@freebsd.org Received: from apollo.sitaranetworks.com (apollo.sitaranetworks.com [199.103.141.105]) by hub.freebsd.org (Postfix) with ESMTP id 1F864153E9 for ; Mon, 1 Nov 1999 12:30:57 -0800 (PST) (envelope-from grog@lemis.com) Message-ID: <19991029095858.50758@mojave.worldwide.lemis.com> Date: Fri, 29 Oct 1999 09:58:58 -0400 From: Greg Lehey To: Bernd Walter , Don Cc: Alfred Perlstein , freebsd-fs@FreeBSD.ORG Subject: Re: Journaling References: <19991027095431.45462@mojave.worldwide.lemis.com> <19991027193200.A52144@cicely7.cicely.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <19991027193200.A52144@cicely7.cicely.de>; from Bernd Walter on Wed, Oct 27, 1999 at 07:32:00PM +0200 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Wednesday, 27 October 1999 at 19:32:00 
+0200, Bernd Walter wrote: > The number of partitions has nothing to do with with the filesystem you use. > FFS is not a partitionsheme but a filesystem. > UFS is a historic filesystem on which FFS is based. Well, in fact they're the same thing. The *old* name is FFS (Fast File System). When System V.4 was released, they adopted FFS as the standard file system and called it the UNIX File System. Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 12:34:11 1999 Delivered-To: freebsd-fs@freebsd.org Received: from apollo.sitaranetworks.com (apollo.sitaranetworks.com [199.103.141.105]) by hub.freebsd.org (Postfix) with ESMTP id D62DB15815 for ; Mon, 1 Nov 1999 12:30:57 -0800 (PST) (envelope-from grog@lemis.com) Message-ID: <19991028085348.39481@mojave.worldwide.lemis.com> Date: Thu, 28 Oct 1999 08:53:48 -0400 From: Greg Lehey To: "Kenneth D. Merry" , Don Cc: Bernd Walter , Alfred Perlstein , freebsd-fs@FreeBSD.ORG Subject: Re: Journaling References: <199910280305.VAA13281@panzer.kdm.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <199910280305.VAA13281@panzer.kdm.org>; from Kenneth D. Merry on Wed, Oct 27, 1999 at 09:05:04PM -0600 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Wednesday, 27 October 1999 at 21:05:04 -0600, Kenneth D. Merry wrote: > Don wrote... >>> Actually, it's technically 8 partitions, a-h, but c is "special", and >>> shouldn't normally be used. >> Correct C represents the entire disk. >> >>> This is a disklabel limitation, not a filesystem limitation. I believe >>> that Solaris x86 may be able to do 16 partitions (or so a guy at Sun told >>> me). >> >> I will have to check this out. Thanks for the info. Is there any reason >> that disklabel has this limit? > > It has been that way for a long time. 
I'm not sure why the limit is 8, but > it is. (Someone might know. I suspect it was just an arbitrary value > chosen a long time ago.) Changing it might break backwards compatibility, > though. There was some discussion about increasing it at one point. But as you say, it would probably confuse some programs, and I personally don't see any need. Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 12:34:12 1999 Delivered-To: freebsd-fs@freebsd.org Received: from apollo.sitaranetworks.com (apollo.sitaranetworks.com [199.103.141.105]) by hub.freebsd.org (Postfix) with ESMTP id 548CD157CF for ; Mon, 1 Nov 1999 12:30:57 -0800 (PST) (envelope-from grog@lemis.com) Message-ID: <19991028085243.24656@mojave.worldwide.lemis.com> Date: Thu, 28 Oct 1999 08:52:43 -0400 From: Greg Lehey To: Don Cc: Bernd Walter , Alfred Perlstein , freebsd-fs@FreeBSD.ORG Subject: Re: Journaling References: <19991027173720.06226@mojave.worldwide.lemis.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: ; from Don on Wed, Oct 27, 1999 at 09:59:23PM -0400 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Wednesday, 27 October 1999 at 21:59:23 -0400, Don wrote: >>> [snipped in original: claim that Vinum wasn't ready for production] >> >> Oh, does it? What problems have you seen? You'd better tell all the >> people who are using it in production, too. > > Ok can we stop with the insults? The point of this thread is research not > attacks on anyone. I have seen problems with disk mirroring using vinum in > which attempting to synchronize a new disk after a previous had failed > caused a kernel panic and left me with no way to recreate the failed disk. > This may have been fixed, however. 
At the time the problem was > reproducible and I did not have the time to investigate further. If a tree falls in the forest, and nobody hears it, did it fall? As I said above: "What problems have you seen?". A kernel panic (is there any other kind?) is a matter you should report. We *have* had problems in Vinum; as you say, this isn't necessarily the case at the moment. >> UFS on System V uses the System V partition table, which allows 15 >> partitions. I don't know what use even 7 are, which is probably one >> of the reasons nobody has done anything about it. > Actually I simply run everything off of the root partition and allocate > all of the space to that. > >> Yes, this is the usual result of using too many file system >> partitions. > > No, this is a result of a mistake in estimating the size that a given > partition should be. This includes /var and / (although perhaps I > should simply have a single file system mounted off of /) Indeed. But my crystal ball is broken, and I can't find anybody to repair it. How do *you* foresee the future? In any case, even if you can, what benefit do you have from a maze of twisty little file systems, all different? >> I'm not sure what you're talking about here, but the best thing I can >> think of is Vinum. > > Vinum is a volume manager. I don't see why it keeps coming up in reference > to a journaled file system. It doesn't. You were talking about partitioning, which also has nothing to do with a journalling file system. On Wednesday, 27 October 1999 at 22:06:30 -0400, Don wrote: >>> Ok nevermind :) Either way vinum is not up to snuff. It still has a way to >>> go before it can be used in a production environment. >> >> Oh, does it? What problems have you seen? You'd better tell all the >> people who are using it in production, too. > > Perhaps you should read the vinum known bugs page. What was that you were saying about insults above? > That list is far too long for a production application. Ah.
Could you define the correct length? "0" is not an answer. > If you dont feel it is too long then by all means use it. When I > stop seeing the words "data corruption" and "kernel panic" on the > known bugs page then I will use vinum. Maybe you should read the context. Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 12:34:23 1999 Delivered-To: freebsd-fs@freebsd.org Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id 7D95315844 for ; Mon, 1 Nov 1999 12:34:08 -0800 (PST) (envelope-from tlambert@usr02.primenet.com) Received: (from daemon@localhost) by smtp05.primenet.com (8.9.1/8.9.1) id NAA55372; Mon, 1 Nov 1999 13:33:56 -0700 Received: from usr02.primenet.com(206.165.6.202) via SMTP by smtp05.primenet.com, id smtpdWO37qa; Mon Nov 1 13:33:50 1999 Received: (from tlambert@localhost) by usr02.primenet.com (8.8.5/8.8.5) id NAA01820; Mon, 1 Nov 1999 13:33:29 -0700 (MST) From: Terry Lambert Message-Id: <199911012033.NAA01820@usr02.primenet.com> Subject: Re: Journaling To: phk@critter.freebsd.dk (Poul-Henning Kamp) Date: Mon, 1 Nov 1999 20:33:29 +0000 (GMT) Cc: pechy@hp735.cvut.cz, grog@lemis.com, ticso@cicely.de, don@calis.blacksun.org, bright@wintelcom.net, freebsd-fs@FreeBSD.ORG In-Reply-To: <20347.941452616@critter.freebsd.dk> from "Poul-Henning Kamp" at Nov 1, 99 11:36:56 am X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > Jan Pechanec writes: > > > > Vahalia [UNIX Internals, Prentice-Hall] says that FFS is an > >original BSD filesystem and UFS is rewritten FFS for vnode layer. > > Well, who do you trust, Kirk & the source, or Vahalia ? 
The statement in Vahalia is ambiguous; I tried to get it amended during technical editing. The only really bad call in the book, IMO, is that Vahalia likes the Solaris Slab allocator, but I believe that it is seriously sub-optimal for SMP systems, and I personally prefer the Dynix Zone allocator, which he doesn't like as much. The UFS in System V was originally the Net/2 FFS code, with minor entry point rewrites for insertion into the VFS switch list in the System V kernel. The most significant differences in the current SVR4.2 code for the UFS compared to the current Berkeley FFS are: o No support for vnode stacking, even though the original Heidemann code was done on an SVR4 (Solaris) platform. o The vnodes are owned by the file systems. The ability to do this in BSD UNIX is missing at this time, which makes things like XFS and VXFS, etc., much harder to port. There is some non-general kludge code to support TRW's TFS code's ownership of vnodes; it would be nice to generalize this to enable easier porting of FS code to FreeBSD. o No support for soft updates, even though the original Ganger/Patt code was developed under SVR4.0.2 ES/MP. o Support for Delayed Ordered Writes (DOW). This is a method of staging writes; it is similar in result to soft updates, with hard-coded pool drains at synchronization points (where soft updates would invoke a contention resolver, DOW forces a flush of all pending writes, to ensure ordering guarantees). DOW is covered by a USL patent. o Buffer cache synchronization is still handled manually, even within Solaris, which has a unified VM and buffer cache. In particular, like FreeBSD, there are some code errors which make msync() necessary for some uses. The SVR4.2 buffer cache is not unified with the VM system in standard System V, although the VM system is significantly reworked for SMP (Steve Baumel of USL did much of the rewrite).
Having spent time in the bowels of that code, and in the bowels of VXFS, and in a derivative of the code of my own design, I can guarantee you that UFS and VXFS are derived from Net/2 code; more correctly, UFS is derived from Net/2, and VXFS is derived from UFS, in particular, its directory handling code has USL copyrights all over it. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 13:10:46 1999 Delivered-To: freebsd-fs@freebsd.org Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id 86B0A14DEA for ; Mon, 1 Nov 1999 13:10:17 -0800 (PST) (envelope-from tlambert@usr02.primenet.com) Received: (from daemon@localhost) by smtp05.primenet.com (8.9.1/8.9.1) id OAA60198; Mon, 1 Nov 1999 14:10:09 -0700 Received: from usr02.primenet.com(206.165.6.202) via SMTP by smtp05.primenet.com, id smtpdbkSLia; Mon Nov 1 14:10:05 1999 Received: (from tlambert@localhost) by usr02.primenet.com (8.8.5/8.8.5) id OAA03339; Mon, 1 Nov 1999 14:10:03 -0700 (MST) From: Terry Lambert Message-Id: <199911012110.OAA03339@usr02.primenet.com> Subject: Re: Journaling To: grog@lemis.com (Greg Lehey) Date: Mon, 1 Nov 1999 21:10:03 +0000 (GMT) Cc: don@calis.blacksun.org, bright@wintelcom.net, freebsd-fs@FreeBSD.ORG In-Reply-To: <19991027095431.45462@mojave.worldwide.lemis.com> from "Greg Lehey" at Oct 27, 99 09:54:31 am X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > Kirk McKusick has been working for the last year or so on > > a combination of "soft-updates" (complete) and "snapshots" > > (not released yet), once complete FFS will have the equivelant > > of logging AND 
snapshots like the netapp appliance. > > I am familiar with softupdates but not with snapshots. Snapshots are where you put a peg in the soft updates clock and export the state as of the peg. This lets you have a consistent copy of the filesystem state, guaranteed, which will not mutate out from under you while you are, for example, doing a backup of the system. This is a far cry from journalling, which, unless you do an LRU on your journal allocations, doesn't have the capability for "snapshots" (which would be "all journal entries prior to the time of the snapshot"). > The reason for starting a new project was basically to once > and for all get rid of UFS. I assume you mean the on-disk structure. Having been in the bowels of VXFS (Veritas) in SVR4.2, I can guarantee you that the on-disk directory structure is derived from the SVR4 UFS implementation, and that the only real changes are to the way inodes and inode data are stored. > While there is nothing wrong with UFS it does have some limitations which > I would like to eliminate such as a limit of 7 slices. This is a limit of the disklabel partitioning scheme; you might as well say you want to address the 4 partition limit in the FAT FS, since it bears the same relation. The big things that journalling buys you over soft updates or logging are: 1) The ability to come back up at the last valid journalled state, without checking the FS. Like soft updates and LFS, this only works if you can tell the difference between a panic and a power failure; otherwise, you still need a full fsck. If you know this, then it saves you the background cleanup of the cylinder group bitmaps that soft updates requires, and the background "cleanerd" that LFS requires. 2) The ability to roll things forward following a crash, inasmuch as you know them to be true. This saves you in the case of implied state between user files, without a synchronous commit process in effect (e.g. an index file for a record file).
> I would also like to add functionality such as the ability to > grow and shrink partitions etc. You can actually grow partitions with FFS. Der Mouse has written a program to extend FFS size, and it is publicly available for download. The problem that arises is that the relative fragmentation rates for the old and new zones are not constant. If you think of the block allocation process as a hashing process, you effectively hash the blocks onto the disk. The original reason for a large free reserve was based on Knuth (Sorting and Searching, The Art of Computer Programming vol. 3), which states that a hash fill in excess of 85% is the point of diminishing returns for a perfect hash. This actually means that the correct free reserve for a hard disk, for optimal performance, is 15%, which is almost twice the 8% set by MINFREE in fs.h (whose comments are wrong now, as well). So effectively, someone needs to write a defragger. This is actually quite trivial to do; it's just a lot of grunt work, and the danger of a bug is rather amplified, so a lot of rigor would be needed, as well. The case of shrinking the available space is trivial, given a defragger, since you can easily define a "no fly zone" for the defragmentation process to get the data moved out of the region that you are going to take away. In any event, this is unrelated to the idea of journalling. > Softupdates is also not recommended for use on the root partition and This is actually a chicken-and-egg problem with setting the bit, not really an issue of "not being recommended for the root fs"; it's a bit hard to tunefs /. It's likely that the integration of character and block devices will make it impossible, without a separate boot, since you will no longer be able to "cheat". > it still seems to be just a little flaky. Every once in a while I wind up > with a problem which I have traced to softupdates but which I could > not recreate.
(To be fair I have not had a problem in a month or two now) I think these are more VM issues than anything else; when things change, they tend to break where they are most fragile. The order guarantees in soft updates must be rigidly enforced by the systems on which it depends. If you are not running a UPS, and you are using soft updates, you should make sure to turn off write-caching on your disk drive, since it doesn't do cache flush ("committed to stable storage") notification, and the cache flush operation, if exported by the drive, is not integrated, so soft updates can neither force a flush at a synchronization point, nor can it intentionally stall writes over a synchronization point, pending flush notification. In any case, so long as you use it correctly, you should not be experiencing any problems, and I'm sure many of us would be very interested in knowing about any problems you see (Julian and Kirk, especially). Again, soft updates is a contention resolution technology that is used to guarantee ordering of metadata writes. I believe that there are good technical arguments why you might want to use soft updates technology, even if you had journalled metadata, to allow dependency-ordered log data to be logged on a clock tick rather than on a synchronization point, and to ensure that the journalling process itself does not become a bottleneck. That said, without a distributed cache coherency protocol, you would potentially have to give up some goals, such as multiple machine access to the same filesystem over a shared SCSI bus, like XFS, for example. > > In so far as codebase there is the LFS project, currently > > fixed (afaik) in NetBSD, perhaps porting that to FreeBSD > > would be worthwhile. > > This is indeed going to be the starting point for this project but > I hope I would be able to take it far beyond this. Logging and journalling are very different animals, even if some of the tricks that both do are conceptually similar.
I would actually _discourage_ using the LFS as a starting point for a JFS, since I believe that it would limit your options in a number of subtle, but important ways. Also note that XFS is log structured (they have posted their logging code under GPL, up at SGI, as a "teaser" while they "clean" the remainder of their code of encumbrances, presumably USL). Actually, AIX has a device driver writer's supplementary guide, which comes with source code for an MFS for AIX, and goes into great detail about the AIX GFS (think file system switch) abstraction, and into some detail on the AIX JFS, as well. I was able to, for example, reverse engineer the entry points for the file locking code, which was not externalized in AIX 4, in support of a shared file descriptor pool that could be used by multiple processes -- a poor man's "rfork". You have to order this book separately, since it doesn't come with the full documentation set. You might also want to look at the NTFS implementation, as it is described in the thin (about 1/4 inch thick) Helen Custer book. I believe that kernel changes, and in particular, changes to the way VOP_ABORT has to be called and implemented for journalling, will be necessary. It may be easier for you to make these changes with a partially working example, by making the existing NTFS code read/write instead of read-only. Don't despair: Linux is going to require much more extensive VFS changes to support journalling than FreeBSD, so you are ahead of the game, even though the Linux JFS project is already under way. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 13:22:48 1999 Delivered-To: freebsd-fs@freebsd.org Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133]) by hub.freebsd.org (Postfix) with ESMTP id 712CB14CC1 for ; Mon, 1 Nov 1999 13:22:34 -0800 (PST) (envelope-from tlambert@usr02.primenet.com) Received: (from daemon@localhost) by smtp03.primenet.com (8.9.3/8.9.3) id OAA15950; Mon, 1 Nov 1999 14:21:46 -0700 (MST) Received: from usr02.primenet.com(206.165.6.202) via SMTP by smtp03.primenet.com, id smtpdAAAEXay4D; Mon Nov 1 14:21:41 1999 Received: (from tlambert@localhost) by usr02.primenet.com (8.8.5/8.8.5) id OAA03623; Mon, 1 Nov 1999 14:19:24 -0700 (MST) From: Terry Lambert Message-Id: <199911012119.OAA03623@usr02.primenet.com> Subject: Re: Journaling To: dhw@whistle.com (David Wolfskill) Date: Mon, 1 Nov 1999 21:19:23 +0000 (GMT) Cc: bright@wintelcom.net, don@calis.blacksun.org, freebsd-fs@FreeBSD.ORG In-Reply-To: <199910271440.HAA31103@pau-amma.whistle.com> from "David Wolfskill" at Oct 27, 99 07:40:16 am X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > >> Kirk McKusick has been working for the last year or so on > >> a combination of "soft-updates" (complete) and "snapshots" > >> (not released yet), once complete FFS will have the equivelant > >> of logging AND snapshots like the netapp appliance. > > > >I am familiar with softupdates but not with snapshots. > > Take a look at Network Appliance's "WAFL". (They have some white > papers up on their Web site, http://www.netapp.com/. In particular, the > one at http://www.netapp.com/tech_library/3002.html descibes WAFL and > snapshots.) 
Note that the internal implementation of the Network Appliance embedded OS is a non-preemptive cooperative multitasking model, similar to the internal implementation of NetWare, where threads either run to completion or until an explicit yield (this is also why NetWare never did the SMP thing correctly for Native NetWare, and why NetWare for UNIX is able to beat its performance numbers on identical single-processor hardware, but really kicks its butt when it comes to SMP hardware). The upshot of this is that the WAFL implementation makes some seriously invalid-for-FreeBSD assumptions about not having to have explicit synchronization primitives anywhere. Short of going to a similar kernel model (kernel threads handling device drivers are a generally bad idea for a lot of reasons, including the one where NT was able to kick Linux's ass with the Microsoft-specified four ethernet cards on a 4 processor SMP box in the Netcraft and Ziff Davis labs tests), you would have to add significant overhead to the WAFL design discussed in those documents. It would perform very poorly in a standard UNIX kernel without significant organizational changes; the large number of synchronization points implied by the threads model would all have to become explicit locks. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 13:28:49 1999 Delivered-To: freebsd-fs@freebsd.org Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id 2611F14E2A for ; Mon, 1 Nov 1999 13:28:29 -0800 (PST) (envelope-from tlambert@usr02.primenet.com) Received: (from daemon@localhost) by smtp05.primenet.com (8.9.1/8.9.1) id OAA46230; Mon, 1 Nov 1999 14:28:18 -0700 Received: from usr02.primenet.com(206.165.6.202) via SMTP by smtp05.primenet.com, id smtpdCtTf7a; Mon Nov 1 14:28:14 1999 Received: (from tlambert@localhost) by usr02.primenet.com (8.8.5/8.8.5) id OAA03945; Mon, 1 Nov 1999 14:28:11 -0700 (MST) From: Terry Lambert Message-Id: <199911012128.OAA03945@usr02.primenet.com> Subject: Re: Journaling To: bde@zeta.org.au (Bruce Evans) Date: Mon, 1 Nov 1999 21:28:11 +0000 (GMT) Cc: Brendon_Meyer@fmi.com, grog@lemis.com, freebsd-fs@FreeBSD.ORG In-Reply-To: from "Bruce Evans" at Oct 30, 99 04:18:12 pm X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > The supply of 'FDISK' style slices is essentially unlimited. I believe the > limit is 2G or 4G slices for the 'FDISK' (extended) data structure. FreeBSD > drivers only support the first 30 and FreeBSD fdisk only supports the first 4. The slice size limit is based on the 8G overall limit on a partition in which you can place an extended partition. In actual fact, the DOS partition table has a 32-bit alternate size field, which is the count of sectors in the partition, that is supposed to be used when the C/H/S values are all set to zero; clearly, it breaks some backward compatibility when used.
This puts the upper bound on a single partition at ~1TB, and the offset is the same, so you should be able to map an ~2TB disk in two partitions using FDISK partitioning. The only caveat is that you won't be able to share the disk with older versions of DOS and Windows, unless you put a smaller partition of standard C/H/S values up front. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 13:52:20 1999 Delivered-To: freebsd-fs@freebsd.org Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134]) by hub.freebsd.org (Postfix) with ESMTP id C870A15028 for ; Mon, 1 Nov 1999 13:51:59 -0800 (PST) (envelope-from tlambert@usr02.primenet.com) Received: (from daemon@localhost) by smtp04.primenet.com (8.9.3/8.9.3) id OAA27759; Mon, 1 Nov 1999 14:51:27 -0700 (MST) Received: from usr02.primenet.com(206.165.6.202) via SMTP by smtp04.primenet.com, id smtpdAAAQFaWd2; Mon Nov 1 14:51:18 1999 Received: (from tlambert@localhost) by usr02.primenet.com (8.8.5/8.8.5) id OAA05179; Mon, 1 Nov 1999 14:51:45 -0700 (MST) From: Terry Lambert Message-Id: <199911012151.OAA05179@usr02.primenet.com> Subject: Re: journaling UFS and LFS To: Stephen.Byan@quantum.com (Stephen Byan) Date: Mon, 1 Nov 1999 21:51:44 +0000 (GMT) Cc: freebsd-fs@FreeBSD.ORG In-Reply-To: <8133266FE373D11190CD00805FA768BF02EE9DD7@shrcmsg1.tdh.qntm.com> from "Stephen Byan" at Nov 1, 99 07:04:24 am X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > >> Softupdates is definitely a viable solution however it does not address > >> several issues and the license is not a BSD license so it makes me > >> uncomfortable. The license issue is a Whistle thing. 
Talk to Julian and get him to pound on Doug Brent, preferably before December 31st of this year. > >Could you let me know what SoftUpdate does not address? > >Thank you. > > One potential problem with soft updates is that the order of > creation/deletion/truncation/etc of files is not preserved through a crash > or power outage, whereas UFS and logged file systems (not logging file > systems as in LFS; what do you call the kind that maintain a recovery log in > addition to their regular metadata?) preserve this ordering. I wonder how > many recovery strategies are broken by soft updates. Anyone have any data? This is not strictly true of soft updates, if you have a well-behaved disk drive. The problem with the current implementation is that, when you have uncooperative hardware, you have to sacrifice some of your performance by disabling write caching. Probably, you have not disabled write caching. If the drive would notify when the data has truly been committed to stable storage, as opposed to the write cache, or even if there were an out-of-band mechanism to force the drive to flush its write cache (and eat the stall that would have to be introduced for this to work), you could get significantly better performance without risking your data. The main recovery strategy that soft updates allows is that, after a crash, the file system state is consistent, with the exception of unallocated blocks showing as allocated in the unflushed-at-the-time-of-the-crash cylinder group bitmaps. Technically, you could lock access to particular cylinder groups as you were fixing up their bitmaps, and effectively do your fsck in the background. One real problem that remains unaddressed in this case, however, is the chicken-and-egg problem. That is, there is no way to distinguish a power failure or an FS-unrelated panic from an FS-related panic, such as a real disk hardware or buffer cache corrupting failure -- data non-corrupting vs. data corrupting crashes.
Without this information, it is unsafe to assume that the crash was an uncorrupting crash, and do the abbreviated fsck. Adding this information would require adding a new bit into the superblock, and being willing to write the superblock back in the event of a panic. You would probably have to add a flags parameter to the front of the panic() function in order to tell it what kind of crash was happening; this would be a hell of a lot safer than, for example, a global variable. Another thing that could mitigate this, at least on relatively quiescent systems (e.g. it'd work for power failures in the middle of the night, but wouldn't work for systems with disk writes going on) would be "soft read-only". This would flush all writes, and then if no new writes came in for "a while", you would set a flag on the in-core FS structure that you were marking it "soft read-only", and then write out the superblock marking it clean. Subsequent writes would be permitted, but only when the "soft read-only" bit was cleared, after remarking the superblock dirty again. We actually implemented both soft updates and soft read-only in our port of FFS to Windows 95, at Artisoft. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 15:44:36 1999 Delivered-To: freebsd-fs@freebsd.org Received: from implode.root.com (root.com [209.102.106.178]) by hub.freebsd.org (Postfix) with ESMTP id 0CC6A14C81 for ; Mon, 1 Nov 1999 15:44:31 -0800 (PST) (envelope-from dg@implode.root.com) Received: from implode.root.com (localhost [127.0.0.1]) by implode.root.com (8.8.8/8.8.5) with ESMTP id PAA07714; Mon, 1 Nov 1999 15:38:21 -0800 (PST) Message-Id: <199911012338.PAA07714@implode.root.com> To: Terry Lambert Cc: Stephen.Byan@quantum.com (Stephen Byan), freebsd-fs@FreeBSD.ORG Subject: Re: journaling UFS and LFS In-reply-to: Your message of "Mon, 01 Nov 1999 21:51:44 GMT." <199911012151.OAA05179@usr02.primenet.com> From: David Greenman Reply-To: dg@root.com Date: Mon, 01 Nov 1999 15:38:21 -0800 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org >> >> Softupdates is definitely a viable solution however it does not address >> >> several issues and the license is not a BSD license so it makes me >> >> uncomfortable. > >The license issue is a Whistle thing. Talk to Julian and get him >to pound on Doug Brent, preferrably before December 31st of this year. How is the softupdates license a Whistle thing? It seems to me that it is a Kirk McKusick and Sun MicroSystems thing. -DG David Greenman Co-founder/Principal Architect, The FreeBSD Project - http://www.freebsd.org Creator of high-performance Internet servers - http://www.terasolutions.com Pave the road of life with opportunities. 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 15:52:44 1999 Delivered-To: freebsd-fs@freebsd.org Received: from uni4nn.gn.iaf.nl (osmium.gn.iaf.nl [193.67.144.12]) by hub.freebsd.org (Postfix) with ESMTP id 685A014A0D for ; Mon, 1 Nov 1999 15:52:18 -0800 (PST) (envelope-from wilko@yedi.iaf.nl) Received: from yedi.iaf.nl (uucp@localhost) by uni4nn.gn.iaf.nl (8.9.2/8.9.2) with UUCP id AAA29469; Tue, 2 Nov 1999 00:37:21 +0100 (MET) Received: (from wilko@localhost) by yedi.iaf.nl (8.9.3/8.9.3) id XAA25459; Mon, 1 Nov 1999 23:06:03 +0100 (CET) (envelope-from wilko) From: Wilko Bulte Message-Id: <199911012206.XAA25459@yedi.iaf.nl> Subject: Re: journaling UFS and LFS In-Reply-To: <199911012151.OAA05179@usr02.primenet.com> from Terry Lambert at "Nov 1, 1999 9:51:44 pm" To: tlambert@primenet.com (Terry Lambert) Date: Mon, 1 Nov 1999 23:06:03 +0100 (CET) Cc: Stephen.Byan@quantum.com, freebsd-fs@FreeBSD.ORG X-Organisation: Private FreeBSD site - Arnhem, The Netherlands X-pgp-info: PGP public key at 'finger wilko@freefall.freebsd.org' X-Mailer: ELM [version 2.4ME+ PL43 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org As Terry Lambert wrote ... > > >> Softupdates is definitely a viable solution however it does not address > > >> several issues and the license is not a BSD license so it makes me > > >> uncomfortable. > > The license issue is a Whistle thing. Talk to Julian and get him > to pound on Doug Brent, preferrably before December 31st of this year. > > > > >Could you let me know what SoftUpdate does not address? > > >Thank you. 
> > > > One potential problem with soft updates is that the order of > > creation/deletion/truncation/etc of files is not preserved through a crash > > or power outage, whereas UFS and logged file systems (not logging file > > systems as in LFS; rather, shall we say, the kind that maintain a recovery log in > > addition to their regular metadata) preserve this ordering. I wonder how > > many recovery strategies are broken by soft updates. Anyone have any data? > > This is not strictly true of soft updates, if you have a well > behaved disk drive. > > The problem with the current implementation is that, when you > have uncooperative hardware, you have to sacrifice some of your > performance by disabling write caching. > > Probably, you have not disabled write caching. > > If the drive would notify when the data has truly been committed > to stable storage, as opposed to the write cache, or even if there > were an out-of-band mechanism to force the drive to flush its write > cache (and eat the stall that would have to be introduced for this > to work), you could get significantly better performance without > risking your data. On SCSI you should be able to use the SYNCHRONIZE CACHE cmd to get the data onto the platter. How to prioritise this cmd into the various queues is another matter. (As are less-than-well-behaved SCSI devices that A. don't implement the cmd or B. implement it wrongly or C. forget all about write caching when hit with a SCSI bus reset or... 
you get my point) Wilko -- | / o / / _ Arnhem, The Netherlands - Powered by FreeBSD - |/|/ / / /( (_) Bulte WWW : http://www.tcja.nl http://www.freebsd.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 16:16:39 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.tvol.com (mail.wgate.com [38.219.83.4]) by hub.freebsd.org (Postfix) with ESMTP id 86A5414DB4 for ; Mon, 1 Nov 1999 16:16:35 -0800 (PST) (envelope-from rjesup@wgate.com) Received: from jesup.eng.tvol.net (jesup.eng.tvol.net [10.32.2.26]) by mail.tvol.com (8.8.8/8.8.3) with ESMTP id TAA14664 for ; Mon, 1 Nov 1999 19:11:55 -0500 (EST) Reply-To: Randell Jesup To: freebsd-fs@FreeBSD.ORG Subject: Re: journaling UFS and LFS References: <199911012151.OAA05179@usr02.primenet.com> From: Randell Jesup Date: 01 Nov 1999 20:13:04 +0000 In-Reply-To: Terry Lambert's message of "Mon, 1 Nov 1999 21:51:44 +0000 (GMT)" Message-ID: X-Mailer: Gnus v5.6.43/Emacs 20.4 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Terry Lambert writes: >Another thing that could mitigate this, at least on relatively >quiescent systems (e.g. it'd work for power failures in the >middle of the night, but wouldn't work for systems with disk >writes going on) would be "soft read-only". This would flush >all writes, and then if no new writes came in for "a while", >you would set a flag on the in code FS structure that you were >marking it "soft read-only", and then write out the superblock >marking it clean. Subsequent writes would be permitted, but >only when the "soft read-only" bit was cleared, after remarking >the super block dirty again. This scheme was used for the Amiga FS's - in fact it was critical for them, since there was no explicit 'shutdown' command. 
The root block (equivalent to superblock) would be marked dirty (and flushed to disk) if metadata (including file sizes) changed, and if there was no write activity for a second or two it would be flushed and the root block would be written with a clean flag. (This is a simplification, of course.) On a single-user system, the disks are often (usually) quiescent and thus would be marked clean (even during use - mine's totally quiet right now). On busier systems or under load the superblock would rarely be left in the clean state, however. Also, because of write ordering and the way files were created, during validation (aka fsck) the disk was readable; in some instances if there were corruption a file or directory might not be accessible, and an error would be returned (of course, the validation process would normally fix said error when it got to it). If something tried to write to an unvalidated drive, the filesystem would return an error, and the Write()/Create()/Delete()/etc OS code would put up an error/retry requester, which would automatically go away (and retry) once the drive validated. Validation was also quite fast by fsck standards. Not all problems could be solved by the built-in validator; disk-recovery tools could attempt to fix even very seriously hosed disks. Since the disk was usually mostly readable even with an uncorrectable error, often the disk recovery program could be run from the bad partition itself if need be. Of course, this is mostly of historical interest at this point, but some of the ideas used in it show up moderately often (witness msg above). 
-- Randell Jesup, Worldgate Communications, ex-Scala, ex-Amiga OS team ('88-94) rjesup@wgate.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 17:44: 6 1999 Delivered-To: freebsd-fs@freebsd.org Received: from web120.yahoomail.com (web120.yahoomail.com [205.180.60.121]) by hub.freebsd.org (Postfix) with SMTP id D0A3214E94 for ; Mon, 1 Nov 1999 17:44:05 -0800 (PST) (envelope-from dyeske@yahoo.com) Message-ID: <19991102014632.4418.rocketmail@web120.yahoomail.com> Received: from [209.186.12.16] by web120.yahoomail.com; Mon, 01 Nov 1999 17:46:32 PST Date: Mon, 1 Nov 1999 17:46:32 -0800 (PST) From: David Yeske Subject: unsubscribe To: freebsd-fs@FreeBSD.ORG MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org unsubscribe ===== __________________________________________________ Do You Yahoo!? Bid and sell for free at http://auctions.yahoo.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Nov 1 18:47:40 1999 Delivered-To: freebsd-fs@freebsd.org Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by hub.freebsd.org (Postfix) with ESMTP id E012A14D42 for ; Mon, 1 Nov 1999 18:47:27 -0800 (PST) (envelope-from robert@cyrus.watson.org) Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3]) by fledge.watson.org (8.9.3/8.9.3) with SMTP id VAA22569; Mon, 1 Nov 1999 21:20:21 -0500 (EST) (envelope-from robert@cyrus.watson.org) Date: Mon, 1 Nov 1999 21:20:21 -0500 (EST) From: Robert Watson X-Sender: robert@fledge.watson.org Reply-To: Robert Watson To: Rodney Cc: freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs In-Reply-To: <19991031120514.A28103@xs4all.nl> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: 
bulk X-Loop: FreeBSD.org On Sun, 31 Oct 1999, Rodney wrote: > here's my list of features I'd like to see in a > journalled fs. Have to admit this list is heavily > inspired ( ok , copied ) from the VxFS features, > apart from the buzz words, > some of them make sense, some of them don't > but it should give us some stuff to discuss: > > 1) extent based allocation > coding this should be easy, it's just an address-length pair > identifying the starting block address and the length of the > extent. I've seen this coded up in qnxfs under linux. > I think the vsta filesystem does something similar. > 2) fast filesystem recovery , obviously > 3) acls would be nice , afs style ? > 4) online defrag and resizing (while users are online) > 5) online backup/snapshot > 6) vinum integration (vague) > 7) built-in features that make databases very happy > like msql/mysql/oracle. (vague) > > also b?trees for indexing sounds cool, though the xfs > implementation seems quite heavy (they maintain 2 of them) > , ie over-kill ? > The way b+trees are used in the Be fs (bfs) might be more > appropriate. I guess I'd be interested in more separation of the on-top semantics and filestore. I.e., some piece of code provides a transactional filestore--inodes, attributes, and blocks of data. On top of that, a semantics layer can build directories, acls, etc. This way the transactional implementation doesn't make a mess of the otherwise clean ufs-like behavior. I'm not sure how feasible that is, but it would be nice if possible. I'd also like to see things like ACLs implemented as attributes for storage purposes--while the VFS layer would expose vop_get/set_acl, et al, the top file system layer (not in the traditional layering sense of the word) would convert these to internal vop_get/set_extattr calls. 
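The attribute-backed ACL idea could look roughly like the following toy sketch. The names here (acl_entry, ea_set/ea_get, "posix1e.acl") are invented and are not the FreeBSD VFS API; the point is only that the ACL operations reduce entirely to generic extended-attribute reads and writes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Sketch of the layering idea: the ACL interface is implemented
 * purely in terms of a generic extended-attribute store.  All names
 * are invented for illustration.
 */
struct acl_entry { uint32_t tag; uint32_t id; uint32_t perm; };

/* Pretend extended-attribute store: one named blob per file. */
static uint8_t ea_buf[256];
static size_t  ea_len;

static void ea_set(const char *name, const void *data, size_t len)
{
    (void)name;                 /* only one attribute in this toy */
    memcpy(ea_buf, data, len);
    ea_len = len;
}

static size_t ea_get(const char *name, void *data, size_t max)
{
    (void)name;
    size_t n = ea_len < max ? ea_len : max;
    memcpy(data, ea_buf, n);
    return n;
}

/* "vop_setacl"/"vop_getacl" built on top of the attribute store. */
static void acl_set(const struct acl_entry *e, size_t n)
{
    ea_set("posix1e.acl", e, n * sizeof(*e));
}

static size_t acl_get(struct acl_entry *e, size_t max)
{
    return ea_get("posix1e.acl", e, max * sizeof(*e)) / sizeof(*e);
}
```

Because the filestore only ever sees opaque attribute blobs, an extension such as ACLs (or MAC labels) never has to touch the transactional machinery underneath.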
I'm really interested in an FS that provides transactional consistency over the inodes (or equiv) and attributes in an extensible way, allowing people to develop extensions (such as ACLs, MAC, etc) without having to understand the filestore in all its complexity. BTW, a useful thing to address would be consistency across layers in a stacked file system--something that I haven't really seen discussed anywhere... Robert N M Watson robert@fledge.watson.org http://www.watson.org/~robert/ PGP key fingerprint: AF B5 5F FF A6 4A 79 37 ED 5F 55 E9 58 04 6A B1 TIS Labs at Network Associates, Safeport Network Services To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 2 4:42:20 1999 Delivered-To: freebsd-fs@freebsd.org Received: from antioche.lip6.fr (antioche.lip6.fr [132.227.74.11]) by hub.freebsd.org (Postfix) with ESMTP id 40CF114DFC for ; Tue, 2 Nov 1999 04:42:09 -0800 (PST) (envelope-from bouyer@antioche.lip6.fr) Received: from antifer.ipv6.lip6.fr (antifer.ipv6.lip6.fr [132.227.72.132]) by antioche.lip6.fr (8.9.3/8.9.3) with ESMTP id NAA01796; Tue, 2 Nov 1999 13:41:53 +0100 (MET) Received: (bouyer@localhost) by antifer.ipv6.lip6.fr (8.8.8/8.6.4) id NAA18983; Tue, 2 Nov 1999 13:41:52 +0100 (MET) Date: Tue, 2 Nov 1999 13:41:52 +0100 From: Manuel Bouyer To: Terry Lambert Cc: "Kenneth D. 
Merry" , don@calis.blacksun.org, ticso@cicely.de, grog@lemis.com, bright@wintelcom.net, freebsd-fs@FreeBSD.ORG Subject: Re: Journaling Message-ID: <19991102134152.A18969@antioche.lip6.fr> References: <199910280305.VAA13281@panzer.kdm.org> <199910291710.KAA16646@usr02.primenet.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.95.6us In-Reply-To: <199910291710.KAA16646@usr02.primenet.com>; from Terry Lambert on Fri, Oct 29, 1999 at 05:10:14PM +0000 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Fri, Oct 29, 1999 at 05:10:14PM +0000, Terry Lambert wrote: > NetBSD currently supports 16. > > Yes, it breaks backward compatibility. No, NetBSD supports 16 only on ports that started with 16. Others are still 8. There are discussions about how to move to a higher number (not 16, but at least 64 or more) without breaking backward compatibility ... -- Manuel Bouyer, LIP6, Universite Paris VI. Manuel.Bouyer@lip6.fr -- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 2 4:47:12 1999 Delivered-To: freebsd-fs@freebsd.org Received: from antioche.lip6.fr (antioche.lip6.fr [132.227.74.11]) by hub.freebsd.org (Postfix) with ESMTP id 0978314DFC for ; Tue, 2 Nov 1999 04:47:04 -0800 (PST) (envelope-from bouyer@antioche.lip6.fr) Received: from antifer.ipv6.lip6.fr (antifer.ipv6.lip6.fr [132.227.72.132]) by antioche.lip6.fr (8.9.3/8.9.3) with ESMTP id NAA01855; Tue, 2 Nov 1999 13:47:02 +0100 (MET) Received: (bouyer@localhost) by antifer.ipv6.lip6.fr (8.8.8/8.6.4) id NAA18991; Tue, 2 Nov 1999 13:47:01 +0100 (MET) Date: Tue, 2 Nov 1999 13:47:01 +0100 From: Manuel Bouyer To: Kelly Yancey Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Journaling Message-ID: <19991102134701.B18969@antioche.lip6.fr> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.95.6us In-Reply-To: ; from Kelly Yancey on Sat, Oct 30, 1999 at 
05:54:56PM -0400 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Sat, Oct 30, 1999 at 05:54:56PM -0400, Kelly Yancey wrote: > Slightly off topic (as if the topic were about journalling anymore in > this thread anyway :) )... > From my perusal of the code, it looks as if the NetBSD change from > 386BSD's partition ID of 165 (which we still use) to 169 is unrelated to > the change to 16 partitions. Actually, I can't find where it is useful at > all; I would have assumed that if they were going to break > backward-compatibility by going to 16 partitions, switching MBR partition > IDs at the same time would be logical. > Does anyone here know the reasoning between switching MBR partition IDs? It's because FreeBSD also uses 165, this makes it hard to install both OSes on the same HD. -- Manuel Bouyer, LIP6, Universite Paris VI. Manuel.Bouyer@lip6.fr {Net,Free}BSD: 22 ans d'experience feront toujours la difference -- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 2 8:31:29 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mojave.sitaranetworks.com (mojave.sitaranetworks.com [199.103.141.157]) by hub.freebsd.org (Postfix) with ESMTP id C03AB15719 for ; Tue, 2 Nov 1999 08:31:22 -0800 (PST) (envelope-from grog@lemis.com) Message-ID: <19991102102601.54815@mojave.sitaranetworks.com> Date: Tue, 2 Nov 1999 10:26:01 -0500 From: Greg Lehey To: Don , freebsd-fs@FreeBSD.ORG Subject: Re: Features of a journaled file system Reply-To: Greg Lehey References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: ; from Don on Sat, Oct 30, 1999 at 06:56:24PM -0400 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Saturday, 30 October 1999 at 18:56:24 -0400, Don wrote: > What are the features people would like to see in a new FreeBSD file > system? Some of the ones I have heard listed are: > 1. 
Ability to grow a FS > 2. Ability to shrink a FS > 3. Access control lists on files and file systems > 4. Extensibility. (The ability to easily add new features to the > filesystem without having to rewrite utilities such as fsck) None of these are specific features of a journalling file system. They're probably all desirable. Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 2 8:40:59 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mojave.sitaranetworks.com (mojave.sitaranetworks.com [199.103.141.157]) by hub.freebsd.org (Postfix) with ESMTP id 89D5514BF5 for ; Tue, 2 Nov 1999 08:40:50 -0800 (PST) (envelope-from grog@lemis.com) Message-ID: <19991102102703.16459@mojave.sitaranetworks.com> Date: Tue, 2 Nov 1999 10:27:03 -0500 From: Greg Lehey To: Chang Song , Ollivier Robert Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Features of a journaled file system Reply-To: Greg Lehey References: <19991031014032.A3510@keltia.freenix.fr> <381B85AB.68EF4A45@zk3.dec.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <381B85AB.68EF4A45@zk3.dec.com>; from Chang Song on Sat, Oct 30, 1999 at 07:56:27PM -0400 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Saturday, 30 October 1999 at 19:56:27 -0400, Chang Song wrote: > Ollivier Robert wrote: >> >> According to Don: >>> Should the file system use b-trees? What other technologies should such a >> >> B-trees would help a lot in some cases. UFS performance has always been >> abysmal with large directories... > > I think a B+ tree is too complex to maintain and implement. Tandem has been using such a system since 1974. I can't remember anybody having much in the way of problems with it. 
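For a feel of why B-trees help with large directories: a UFS directory lookup scans entries linearly, while a B+tree lookup does an ordered search at each node. The sketch below uses invented names, with a sorted array standing in for a single B+tree node, and simply counts name comparisons for the two approaches.

```c
#include <assert.h>
#include <string.h>

/*
 * Toy comparison of directory-lookup cost: linear scan (UFS-style
 * directory blocks) vs. binary search over sorted keys, which is the
 * per-node operation a B+tree directory performs.  Names invented.
 */
static int linear_lookup(const char *names[], int n, const char *key,
                         int *cmps)
{
    for (int i = 0; i < n; i++) {
        (*cmps)++;
        if (strcmp(names[i], key) == 0)
            return i;
    }
    return -1;
}

static int binary_lookup(const char *names[], int n, const char *key,
                         int *cmps)     /* names must be sorted */
{
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        (*cmps)++;
        int c = strcmp(names[mid], key);
        if (c == 0)
            return mid;
        if (c < 0)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}
```

With thousands of entries, the linear scan grows proportionally while the ordered search grows logarithmically, which is the gap people see with large directories.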
Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 2 9:28:59 1999 Delivered-To: freebsd-fs@freebsd.org Received: from ns1.yes.no (ns1.yes.no [195.204.136.10]) by hub.freebsd.org (Postfix) with ESMTP id D41D614A09 for ; Tue, 2 Nov 1999 09:28:35 -0800 (PST) (envelope-from eivind@bitbox.follo.net) Received: from bitbox.follo.net (bitbox.follo.net [195.204.143.218]) by ns1.yes.no (8.9.3/8.9.3) with ESMTP id SAA23846; Tue, 2 Nov 1999 18:28:28 +0100 (CET) Received: (from eivind@localhost) by bitbox.follo.net (8.8.8/8.8.6) id SAA81192; Tue, 2 Nov 1999 18:28:27 +0100 (MET) Date: Tue, 2 Nov 1999 18:28:27 +0100 From: Eivind Eklund To: Erez Zadok Cc: Mats Lofkvist , freebsd-fs@FreeBSD.ORG Subject: Re: stupidfs - easily extensible test file systems? Message-ID: <19991102182827.B72085@bitbox.follo.net> References: <199910312211.RAA00014@shekel.mcl.cs.columbia.edu> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: <199910312211.RAA00014@shekel.mcl.cs.columbia.edu>; from ezk@cs.columbia.edu on Sun, Oct 31, 1999 at 05:11:20PM -0500 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Sun, Oct 31, 1999 at 05:11:20PM -0500, Erez Zadok wrote: > Many people on this list understand the problems and know how to fix them. > There are even some experimental patches made by Eivind Eklund, but those > patches aren't part of the kernel. Eivind's patches used to be in > > http://www.freebsd.org/~eivind/VOP_GETBACKINGOBJECT.patch > > and now they appear to be in > > http://www.freebsd.org/~eivind/FixNULL.patch > > (Eivind, can you confirm the new URL? FixNull.patch seems to include stuff > unrelated to the VFS, such as scsi driver fixes. Thanks.) 
The URL is correct - those fixes are there because the environment I used for working on those patches was somewhat unusual (cross-compilation from a RELENG_2_2 box), and that brokenness was in the way of me doing FS work, so it is fixed in that tree (though not committed, as I was not sure it was a good idea). > There's also been talk about some people (McKusick et al) rewriting the > whole VFS. While I think that's a great idea, it's a large undertaking and > will take a long while for busy people like McKusick to complete. I think a > complete rewrite, if any, should be scheduled for 5.x. I would therefore > suggest that a simpler fix such as Eivind's be incorporated into 4.0 so > people can use stackable f/s (unionfs, nullfs, and my wrapfs/cryptfs, etc.) > in the more immediate future. My patches don't solve the entire problem. If they actually had created a working environment for stacking layers, they would have been in the kernel already. Eivind. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 2 10:58: 0 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mojave.sitaranetworks.com (mojave.sitaranetworks.com [199.103.141.157]) by hub.freebsd.org (Postfix) with ESMTP id 638971536A for ; Tue, 2 Nov 1999 10:57:55 -0800 (PST) (envelope-from grog@lemis.com) Message-ID: <19991102123553.21474@mojave.sitaranetworks.com> Date: Tue, 2 Nov 1999 12:35:53 -0500 From: Greg Lehey To: Rodney , freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs Reply-To: Greg Lehey References: <19991031120514.A28103@xs4all.nl> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <19991031120514.A28103@xs4all.nl>; from Rodney on Sun, Oct 31, 1999 at 12:05:14PM +0100 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Sunday, 31 October 1999 at 12:05:14 +0100, Rodney wrote: > > > hi, > > here's my list of features I'd like to see in a > 
journalled fs. Have to admit this list is heavily > inspired ( ok , copied ) from the VxFS features, > apart from th buzz words, > some of them make sense, some of them don't > but it should give us some stuff to discus: > [snip] > 6) vinum integration (vague) Vinum is just a virtual disk. As such, any file system should work on it. Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 2 13:45:58 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mojave.sitaranetworks.com (mojave.sitaranetworks.com [199.103.141.157]) by hub.freebsd.org (Postfix) with ESMTP id 3300814D37 for ; Tue, 2 Nov 1999 13:45:48 -0800 (PST) (envelope-from grog@lemis.com) Message-ID: <19991102154051.35226@mojave.sitaranetworks.com> Date: Tue, 2 Nov 1999 15:40:51 -0500 From: Greg Lehey To: Randell Jesup , freebsd-fs@FreeBSD.ORG Subject: Re: journaling UFS and LFS Reply-To: Greg Lehey References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: ; from Randell Jesup on Mon, Nov 01, 1999 at 02:51:47AM +0000 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Monday, 1 November 1999 at 2:51:47 +0000, Randell Jesup wrote: > Don writes: >>> Most corporate IT managers wouldn't know a filesystem if they were >>> bitten by one. >> That is absolutely the case. That is why I can not suggest that >> softupdates is as good as a journaled file system. The people I deal with >> at least know the buzzword and they want to make sure that whatever >> solution they go with will have it. > > Question: is the fsck time for softupdates the same as for > plain UFS (when it needs to fsck, which should be (much) less often, > if I remember correctly). My understanding is that the fsck is identical. The only advantage that soft updates brings is that the danger of damage is much less. 
> Even the occasional long-fsck-time can be a problem for a > high-availability production environment. Agreed. This is the biggest advantage of a log-based fs. > Side question: why is it that there are certain errors (inode out > of range, for example) that fsck barfs on and exits? Because it's broken. We should be able to recognize and fix all these problems. Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 2 13:59:24 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mojave.sitaranetworks.com (mojave.sitaranetworks.com [199.103.141.157]) by hub.freebsd.org (Postfix) with ESMTP id A53671546F for ; Tue, 2 Nov 1999 13:59:08 -0800 (PST) (envelope-from grog@lemis.com) Message-ID: <19991102155021.38326@mojave.sitaranetworks.com> Date: Tue, 2 Nov 1999 15:50:21 -0500 From: Greg Lehey To: fs@FreeBSD.org, Don Subject: Re: journaling UFS and LFS Reply-To: Greg Lehey References: <86hfj63es8.fsf@not.demophon.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: ; from Don on Mon, Nov 01, 1999 at 07:38:42AM -0500 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org [moved to fs] On Monday, 1 November 1999 at 7:38:42 -0500, Don wrote: >> *Very* different from LFS. (What are features? "Has files and >> directories"? Time-complexity? Implementation details? Buzzwords?) > > You know. Features. As in those things that people would like to see in > such a file system. The features we would like to see have already > been listed. Please see the archives if you want to know what was > considered a "feature". > > Besides, VxFS has a closer feature set to what I would like to see. Has anybody thought of lobbying Veritas to release VxFS? I think you might just find some open ears. 
If anybody's serious about this, contact me privately and I can give some suggestions. Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 2 13:59:30 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mojave.sitaranetworks.com (mojave.sitaranetworks.com [199.103.141.157]) by hub.freebsd.org (Postfix) with ESMTP id C92911546E; Tue, 2 Nov 1999 13:59:08 -0800 (PST) (envelope-from grog@lemis.com) Message-ID: <19991102154614.55760@mojave.sitaranetworks.com> Date: Tue, 2 Nov 1999 15:46:14 -0500 From: Greg Lehey To: Eivind Eklund , Don Cc: Jacques Vidrine , freebsd-fs@FreeBSD.org Subject: Re: journaling UFS and LFS Reply-To: Greg Lehey References: <19991030233304.03DB31DA4@bone.nectar.com> <19991101171936.J72085@bitbox.follo.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <19991101171936.J72085@bitbox.follo.net>; from Eivind Eklund on Mon, Nov 01, 1999 at 05:19:36PM +0100 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Monday, 1 November 1999 at 17:19:36 +0100, Eivind Eklund wrote: > On Sat, Oct 30, 1999 at 07:40:35PM -0400, Don wrote: >> This is getting off topic. What features would you like to see in a new >> file system. Some suggestions were made. Would you like to add anything to >> this list? > > Yes. > * Easy to do concurrent access from multiple hosts to the same > physical media You can never do this in the general case (where any host may request access to any part of the disk). The best you could do there is a file server, but they're not quite our terms of reference. > * Ability to span more than one disk That's not necessarily a file system feature. Vinum does that now. > I have design papers on the FS designed for G2, which was intended to > support all of the features I've seen listed so far. 
It has a couple > of drawbacks: > (1) It is not designed to have the semantics of a standard Unix > filesystem. That doesn't surprise me, if you want to implement the first of your suggestions. Is there anything in there which would be of interest in our environment? Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 2 15:54:56 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.du.gtn.com (mail.du.gtn.com [194.77.9.57]) by hub.freebsd.org (Postfix) with ESMTP id B752D14E3E for ; Tue, 2 Nov 1999 15:54:32 -0800 (PST) (envelope-from ticso@mail.cicely.de) Received: from mail.cicely.de (cicely.de [194.231.9.142]) by mail.du.gtn.com (8.9.3/8.9.3) with ESMTP id AAA22553; Wed, 3 Nov 1999 00:47:42 +0100 (MET) Received: (from ticso@localhost) by mail.cicely.de (8.9.0/8.9.0) id AAA88098; Wed, 3 Nov 1999 00:54:16 +0100 (CET) Date: Wed, 3 Nov 1999 00:54:16 +0100 From: Bernd Walter To: Greg Lehey Cc: Rodney , freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs Message-ID: <19991103005415.A88044@cicely7.cicely.de> References: <19991031120514.A28103@xs4all.nl> <19991102123553.21474@mojave.sitaranetworks.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: <19991102123553.21474@mojave.sitaranetworks.com> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Tue, Nov 02, 1999 at 12:35:53PM -0500, Greg Lehey wrote: > On Sunday, 31 October 1999 at 12:05:14 +0100, Rodney wrote: > > > > > > hi, > > > > here's my list of features I'd like to see in a > > journalled fs. 
Have to admit this list is heavily > > inspired ( ok , copied ) from the VxFS features, > > apart from the buzz words, > > some of them make sense, some of them don't > > but it should give us some stuff to discuss: > > [snip] > > 6) vinum integration (vague) > > Vinum is just a virtual disk. As such, any file system should work on > it. > It is more than that - it is a volume manager. Maybe you are not aware of how far it goes beyond a virtual disk. It manages disks and can find its drives properly even if they have changed devices - that works really well; I was able to remove nearly all wired configurations for drives, and I even run a volume with only a single-drive plex, just to get this feature. It can (or should be able to) resize a volume and should inform the system about it. I have some ideas about how to make FFS resizable without needing to freeze or umount it first and without losing inodes. Vinum is the frontend for managing the size of the volume, and it should inform the fs driver about any change, because there should be no need to manually call an additional tool. My point involves modifying FFS, but the same would apply to any fs. -- B.Walter COSMO-Project http://www.cosmo-project.de ticso@cicely.de Usergroup info@cosmo-project.de To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 2 16:17:14 1999 Delivered-To: freebsd-fs@freebsd.org Received: from kronos.alcnet.com (kronos.alcnet.com [63.69.28.22]) by hub.freebsd.org (Postfix) with ESMTP id E046A152F8 for ; Tue, 2 Nov 1999 16:17:02 -0800 (PST) (envelope-from kbyanc@posi.net) X-Provider: ALC Communications, Inc. 
http://www.alcnet.com/ Received: from localhost (kbyanc@localhost) by kronos.alcnet.com (8.9.3/8.9.3/antispam) with ESMTP id TAA68704; Tue, 2 Nov 1999 19:16:55 -0500 (EST) Date: Tue, 2 Nov 1999 19:16:55 -0500 (EST) From: Kelly Yancey X-Sender: kbyanc@kronos.alcnet.com To: Bernd Walter Cc: Rodney , freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs In-Reply-To: <19991103005415.A88044@cicely7.cicely.de> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Wed, 3 Nov 1999, Bernd Walter wrote: > On Tue, Nov 02, 1999 at 12:35:53PM -0500, Greg Lehey wrote: > > On Sunday, 31 October 1999 at 12:05:14 +0100, Rodney wrote: > > > > > > > > > hi, > > > > > > here's my list of features I'd like to see in a > > > journalled fs. Have to admit this list is heavily > > > inspired ( ok , copied ) from the VxFS features, > > > apart from th buzz words, > > > some of them make sense, some of them don't > > > but it should give us some stuff to discus: > > > [snip] > > > 6) vinum integration (vague) > > > > Vinum is just a virtual disk. As such, any file system should work on > > it. > > > It is more than that - it is a volume manager. > Maybe you are not clear how far you got beyound the virtual disk. > It manages disks and can find it's drive properly if they changed devices - > that's working relay fine that I was able to remove nearly all wire > configurations for drives and I'm eaven run a volume with only one single > drive plex - just to get this feature. > It can (or should be able to) resize a volume and should inform the system > about. I am under the impression that you can only enlarge a vinum volume if it in a RAID 0 configuration (concatenation). Obviously, it would be very difficult to enlarge a RAID 1 or RAID 5 configuration as it would require restriping the data across all disks; I'm not familiar with any product, hardware or software, that can do this. 
Besides, this would be an issue for any RAID controller as well. Anyone with a RAID controller can add a new disk to their RAID 0 and enlarge the virtual disk. Those controllers aren't going to tell you about the increased disk size any more than vinum does. Beyond that, who is to say that the entire size of the new, enlarged virtual disk is supposed to be dedicated to FFS? Is it not possible, however unlikely, for a sysadmin to add disk space to a RAID array and partition the new space as, say, FAT32?

I think what Greg was getting at is that, as far as the file system is concerned, vinum just looks like a disk. Whatever else vinum may be, to the file system it just looks like a disk.

> I have some ideas about how to get FFS resizeable without needing to
> freeze or umount it first and without losing inodes.

This is great, but I think that "vinum hooks" are no more needed than "ccd hooks" or "DPT hooks". User-land tools should allow the administrator to resize the file system at the administrator's discretion. Beyond the technical issues of providing hooks to automatically extend file systems, there is the social question of whether that is what the user wanted. User-land tools solve both problems.

> Vinum is the frontend for managing the size of the volume and it should
> inform the fs driver about any change, because there is no need to
> manually call an additional tool.
> My point is modifying FFS but that's the same for any fs.

No (see above). Forget about vinum, just worry about disks. Vinum will play nice and pretend to be a disk. In the end you will have a cleaner solution that plays nice with others too. Everyone will love the fact that they can extend any disk, on command, either by adding drives to their vinum config, adding drives to their hardware RAID array, or finally wiping Windows off their home system.
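The grow case both sides keep returning to is mechanically simple because FFS allocates space in fixed-size cylinder groups: growing in place amounts to appending whole new cylinder groups after the existing ones, so no existing inode or data block has to move. A minimal sketch of the size arithmetic only, under that assumption (the helper name is hypothetical, not actual FFS code):

```c
#include <assert.h>

/*
 * Hypothetical helper: how many whole cylinder groups can be appended
 * when the underlying device grows from old_blocks to new_blocks?
 * In this sketch only complete cylinder groups are added; any
 * remainder stays unused until the volume grows further.
 */
static long extra_cylinder_groups(long old_blocks, long new_blocks,
                                  long blocks_per_cg)
{
    if (new_blocks <= old_blocks || blocks_per_cg <= 0)
        return 0;               /* nothing (or nothing usable) to add */
    return (new_blocks - old_blocks) / blocks_per_cg;
}
```

A userland grow tool would compute this, write the new cylinder-group metadata into the added space, and then update the superblock totals - which is why the operation can leave every existing inode untouched.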
-- Kelly Yancey - kbyanc@posi.net - Richmond, VA Director of Technical Services, ALC Communications http://www.alcnet.com/ Maintainer, BSD Driver Database http://www.posi.net/freebsd/drivers/ Coordinator, Team FreeBSD http://www.posi.net/freebsd/Team-FreeBSD/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Nov 2 16:21:59 1999 Delivered-To: freebsd-fs@freebsd.org Received: from kronos.alcnet.com (kronos.alcnet.com [63.69.28.22]) by hub.freebsd.org (Postfix) with ESMTP id 4B484152B2 for ; Tue, 2 Nov 1999 16:21:56 -0800 (PST) (envelope-from kbyanc@posi.net) X-Provider: ALC Communications, Inc. http://www.alcnet.com/ Received: from localhost (kbyanc@localhost) by kronos.alcnet.com (8.9.3/8.9.3/antispam) with ESMTP id TAA83776; Tue, 2 Nov 1999 19:21:51 -0500 (EST) Date: Tue, 2 Nov 1999 19:21:51 -0500 (EST) From: Kelly Yancey X-Sender: kbyanc@kronos.alcnet.com To: Bernd Walter Cc: Rodney , freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > I am under the impression that you can only enlarge a vinum volume if it > in a RAID 0 configuration (concatenation). Obviously, it would be very > difficult to enlarge a RAID 1 or RAID 5 configuration as it would require > restriping the data across all disks; I'm not familiar with any product, > hardware or software, that can do this. Oops, my mistake. Scratch the RAID 1, mirroring should be relatively simple to extend. But the rest of the discussion is still valid, I think. 
Kelly -- Kelly Yancey - kbyanc@posi.net - Richmond, VA Director of Technical Services, ALC Communications http://www.alcnet.com/ Maintainer, BSD Driver Database http://www.posi.net/freebsd/drivers/ Coordinator, Team FreeBSD http://www.posi.net/freebsd/Team-FreeBSD/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 3 1:21:41 1999 Delivered-To: freebsd-fs@freebsd.org Received: from ns1.yes.no (ns1.yes.no [195.204.136.10]) by hub.freebsd.org (Postfix) with ESMTP id 53BA1157CF for ; Wed, 3 Nov 1999 01:21:33 -0800 (PST) (envelope-from eivind@bitbox.follo.net) Received: from bitbox.follo.net (bitbox.follo.net [195.204.143.218]) by ns1.yes.no (8.9.3/8.9.3) with ESMTP id KAA05915; Wed, 3 Nov 1999 10:19:03 +0100 (CET) Received: (from eivind@localhost) by bitbox.follo.net (8.8.8/8.8.6) id KAA85241; Wed, 3 Nov 1999 10:18:58 +0100 (MET) Date: Wed, 3 Nov 1999 10:18:58 +0100 From: Eivind Eklund To: Greg Lehey Cc: Don , Jacques Vidrine , freebsd-fs@FreeBSD.org Subject: Re: journaling UFS and LFS Message-ID: <19991103101858.E72085@bitbox.follo.net> References: <19991030233304.03DB31DA4@bone.nectar.com> <19991101171936.J72085@bitbox.follo.net> <19991102154614.55760@mojave.sitaranetworks.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: <19991102154614.55760@mojave.sitaranetworks.com>; from grog@lemis.com on Tue, Nov 02, 1999 at 03:46:14PM -0500 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Tue, Nov 02, 1999 at 03:46:14PM -0500, Greg Lehey wrote: > On Monday, 1 November 1999 at 17:19:36 +0100, Eivind Eklund wrote: > > On Sat, Oct 30, 1999 at 07:40:35PM -0400, Don wrote: > >> This is getting off topic. What features would you like to see in a new > >> file system. Some suggestions were made. Would you like to add anything to > >> this list? > > > > Yes. 
> > * Easy to do concurrent access from multiple hosts to the same
> >   physical media
>
> You can never do this in the general case (where any host may request
> access to any part of the disk).  The best you could do there is a
> file server, but they're not quite our terms of reference.

I don't get this. To give a little more detail on what I mean: you have the FS export a bunch of locks into the DLM (Distributed Lock Manager) you are running (probably over the bus you use to share access to the disks, but you can use another connection medium as long as it is there), and the host that wants to do something to some part of the FS grabs the relevant lock. You also design the disk layout to allow writing in a transactional way, so a host failing while it holds a lock doesn't hurt the other hosts accessing the same physical media. I don't get what "general case" there is, as you're designing the system - could you please explain?

> > * Ability to span more than one disk
>
> That's not necessarily a file system feature.  Vinum does that now.

Sure. The reason for having it in the FS is that you can optimize for the independence of your spindles. This lets you:

* Write logs and data to separate spindles (increasing performance)
* Give performance guarantees proportional to the number and features of
  your spindles, instead of being limited by what your weakest link can
  do (times one)
* Optimize data layout to be able to do a semi-recovery after losing one
  of your spindles
* (irrelevant unless we extend the userland interface, which was planned
  for G2) Give different guarantees for different files in the same
  namespace.  You may need RAID-0 to get the speed wanted for one
  non-critical file, while wanting RAID-5 to store a file that needs safe
  storage but doesn't need fast streaming.

> > I have design papers on the FS designed for G2, which was intended to
> > support all of the features I've seen listed so far.
> > It has a couple of drawbacks:
> > (1) It is not designed to have the semantics of a standard Unix
> >     filesystem.
>
> That doesn't surprise me, if you want to implement the first of your
> suggestions.

Actually, that's not a problem - but we decided against pushing any complexity into the bottom-end filesystem if we could do it well in a stacking layer.

> Is there anything in there which would be of interest in our
> environment?

As I said, it supports all features I've seen mentioned (by anybody) so far in the discussion. Its most significant design goal was to support Highly Available Systems; that is, clusters. The design allows more than one machine in a cluster to access a shared disk with a HAS-FS on it, with the system as a whole surviving the (unplanned) loss of any individual member. I think we ended up supporting transactions built from several file operations in a multi-machine context, too, but I'm not 100% sure (it is almost 1 1/2 years since Simon and I did the design, which was done during a single three-week session in the same physical location, and I've not worked with the spec since).

Eivind.
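The shared-disk scheme Eivind describes can be modeled in miniature. Below is a toy, single-process sketch of the DLM idea (all names hypothetical - a real DLM distributes this state over the cluster interconnect): each shared FS resource has a lock a host must hold before touching it, and a dead host's locks can simply be broken because on-disk updates are transactional.

```c
#include <assert.h>
#include <stddef.h>

/* One lock per shared FS resource (say, an allocation group). */
struct dlm_lock {
    int owner;                  /* 0 = unlocked, otherwise a host id */
};

/* Try to take the lock for `host`; returns 1 on success, 0 if busy.
 * Re-acquiring a lock you already hold succeeds. */
static int dlm_acquire(struct dlm_lock *l, int host)
{
    if (l->owner == 0 || l->owner == host) {
        l->owner = host;
        return 1;
    }
    return 0;
}

static void dlm_release(struct dlm_lock *l, int host)
{
    if (l->owner == host)
        l->owner = 0;
}

/*
 * When the cluster declares a host dead, its locks are broken.  This is
 * only safe because the disk layout is transactional: a half-finished
 * update by the dead host never becomes visible to the survivors.
 */
static void dlm_break_locks(struct dlm_lock *locks, size_t n, int host)
{
    for (size_t i = 0; i < n; i++)
        if (locks[i].owner == host)
            locks[i].owner = 0;
}
```

The interesting property is the last function: lock recovery after an unplanned host loss is what separates this design from a plain mutex per resource.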
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 3 1:54: 1 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.du.gtn.com (mail.du.gtn.com [194.77.9.57]) by hub.freebsd.org (Postfix) with ESMTP id 5E1FB15530 for ; Wed, 3 Nov 1999 01:53:55 -0800 (PST) (envelope-from ticso@mail.cicely.de) Received: from mail.cicely.de (cicely.de [194.231.9.142]) by mail.du.gtn.com (8.9.3/8.9.3) with ESMTP id KAA02544; Wed, 3 Nov 1999 10:46:59 +0100 (MET) Received: (from ticso@localhost) by mail.cicely.de (8.9.0/8.9.0) id KAA90686; Wed, 3 Nov 1999 10:53:33 +0100 (CET) Date: Wed, 3 Nov 1999 10:53:33 +0100 From: Bernd Walter To: Kelly Yancey Cc: Bernd Walter , Rodney , freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs Message-ID: <19991103105333.A89617@cicely7.cicely.de> References: <19991103005415.A88044@cicely7.cicely.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Tue, Nov 02, 1999 at 07:16:55PM -0500, Kelly Yancey wrote: > On Wed, 3 Nov 1999, Bernd Walter wrote: > > > On Tue, Nov 02, 1999 at 12:35:53PM -0500, Greg Lehey wrote: > > > On Sunday, 31 October 1999 at 12:05:14 +0100, Rodney wrote: > > > > > > > > > > > > hi, > > > > > > > > here's my list of features I'd like to see in a > > > > journalled fs. Have to admit this list is heavily > > > > inspired ( ok , copied ) from the VxFS features, > > > > apart from th buzz words, > > > > some of them make sense, some of them don't > > > > but it should give us some stuff to discus: > > > > [snip] > > > > 6) vinum integration (vague) > > > > > > Vinum is just a virtual disk. As such, any file system should work on > > > it. > > > > > It is more than that - it is a volume manager. > > Maybe you are not clear how far you got beyound the virtual disk. 
> > It manages disks and can find its drives properly if they changed
> > devices - that works reliably enough that I was able to remove nearly
> > all wired-down configurations for drives, and I even run a volume with
> > only one single drive plex - just to get this feature.
> > It can (or should be able to) resize a volume and should inform the
> > system about it.
>
> I am under the impression that you can only enlarge a vinum volume if it
> is in a RAID 0 configuration (concatenation). Obviously, it would be very
> difficult to enlarge a RAID 1 or RAID 5 configuration as it would require
> restriping the data across all disks; I'm not familiar with any product,
> hardware or software, that can do this.

In the case of striping, which applies to RAID 5 and striped RAID 0 configurations, it is indeed not simple to do. But think of a RAID 5 volume which is extended by concatenating another RAID 5 set. This is not doable with vinum - but I'm sure that won't change before anyone actually wants to use such a feature.

> Besides the fact that this would be an issue for any RAID controller

No.
Most controllers I have seen increase the size of a disk - not a volume.

> also. Anyone with a RAID controller can add a new disk to their RAID 0 and
> enlarge the virtual disk. Those controllers aren't going to tell you about
> the increased disk size any more than vinum does. Beyond that, who is to

They don't need to, because the partition the fs is on won't increase when the virtual disk gets bigger.

> say that the entire size of the new, enlarged, virtual disk is supposed
> to be dedicated to FFS. Is it not possible, however unlikely, for a
> sysadmin to add disk space to a RAID array and partition it as say FAT32?

That's why it may be interesting to add such hooks to disklabel.

> I think what Greg was getting at as far as the file system is concerned,
> vinum just looks like a disk.
> > > I have some ideas about how to get FFS resizeable without needing to
> > > freeze or umount it first and without losing inodes.
> >
> > This is great, but I think that "vinum hooks" are no more needed than
> > "ccd hooks" or "DPT hooks". User-land tools should allow the
> > administrator to resize the file system at the administrator's
> > discretion. Beyond the technical issues of providing hooks to
> > automatically extend file systems, there is the social implication of
> > whether that is what the user wanted. User-land tools solve both
> > problems.

DPT hooks would be pointless because those controllers don't change the size of a partition. ccd volumes should be partitioned too, and ccd is not that useful any more compared to vinum. vinum and disklabel are the places for such hooks, but I think vinum is the more useful one. Greg is already about to implement spare disk support. What about a kind of spare disk which is scheduled to enlarge an FS automatically when it runs out of space? Features like this need interaction between the fs and the volume manager. Of course hardware RAIDs are a consideration too - but that's more difficult.

> > Vinum is the frontend for managing the size of the volume and it should
> > inform the fs driver about any change, because there is no need to
> > manually call an additional tool.
> > My point is modifying FFS but that's the same for any fs.
>
> No (see above). Forget about vinum, just worry about disks. Vinum will
> play nice and pretend to be a disk. In the end you will have a cleaner
> solution that plays nice with others too. Everyone will love the fact
> that they can extend any disk, on command, either by adding drives to
> their vinum config, their hardware RAID array, or finally wiping Windows
> off their home system.

I don't want vinum or anything else like it to know how to resize an fs, but I want it to be able to call the needed tools automatically.
Think of shrinking: first you have to find out how big the new partition will become, then you need to shrink the fs, and finally you have to shrink the volume. Three steps, each with the possibility to shoot yourself in the foot. If vinum calls the tool and says "the user wants this volume shrunk by 134 MB; do what is needed so I can do what the user wants", it is easier and less likely to get you into trouble.

--
B.Walter              COSMO-Project         http://www.cosmo-project.de
ticso@cicely.de       Usergroup             info@cosmo-project.de

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message

From owner-freebsd-fs Wed Nov 3 8:40:41 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from kronos.alcnet.com (kronos.alcnet.com [63.69.28.22]) by hub.freebsd.org (Postfix) with ESMTP id 36BBD15103 for ; Wed, 3 Nov 1999 08:40:35 -0800 (PST) (envelope-from kbyanc@posi.net)
X-Provider: ALC Communications, Inc. http://www.alcnet.com/
Received: from localhost (kbyanc@localhost) by kronos.alcnet.com (8.9.3/8.9.3/antispam) with ESMTP id LAA27925; Wed, 3 Nov 1999 11:40:24 -0500 (EST)
Date: Wed, 3 Nov 1999 11:40:24 -0500 (EST)
From: Kelly Yancey
X-Sender: kbyanc@kronos.alcnet.com
To: Bernd Walter
Cc: freebsd-fs@FreeBSD.ORG
Subject: Re: feature list journalled fs
In-Reply-To: <19991103105333.A89617@cicely7.cicely.de>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Wed, 3 Nov 1999, Bernd Walter wrote:
> >
> > I am under the impression that you can only enlarge a vinum volume if it
> > is in a RAID 0 configuration (concatenation). Obviously, it would be
> > very difficult to enlarge a RAID 1 or RAID 5 configuration as it would
> > require restriping the data across all disks; I'm not familiar with any
> > product, hardware or software, that can do this.
> > In case of Striping which is valid for Raid5 and concatenated Raid0 configrations > it is not simply possible to do. > But think of a Raid5 volume which is extended with concatenating another Raid5 set. > This is not doable with vinum - but I'm shure that this won't happen before anyone > is using such a feature feature. That sounds more like a RAID 5/0 config. While I've never seen a hardware vendor advertise support for such a creature, it should theoretically be possible. However, vinum volumes can only provide mirroring between plexes so it is impossible for vinum to extend a volume composed of RAID 5 plexes via concatenation. On the other hand, I see that Greg has "Extending striped and RAID-5 plexes" on his TODO list for vinum, presumably by [shudder] restriping everything. > > > Besides the fact that this would be an issue for any RAID controller > > No. > Most Controllers I have seen increases the size of a disk - not a volume. Sorry, I was thinking about the software in RAID controllers in the same terms as vinum. You are correct, though, that to the OS it appears as a single disk which has been enlarged. The same thing, though, is true with vinum; it should appear simply as though the disk were enlarged (albeit a "virtual disk"). No file system should care whether a disk is a "real" disk or a "virtual" disk or else a "virtual" disk isn't very virtual. > > > also. Anyone with a RAID controller can add a new disk to their RAID 0 and > > enlarge the virtual disk. Those controllers aren't going to tell you about > > the increased disk size any more than vinum does. Beyond that, who is to > > They don't need, because the partition the fs is on won't increases if the > virtual disk is getting bigger. I need to clarify terminology here just for myself, because otherwise we're getting into confusing territory... partition: UNIX-style partitions of which there can be 8 (lettered a-h); exist in the disklabel of a slice. 
slice: PC-style partitioning of disk space, of which there can be 4;
       these exist in the master boot record.

vinum doesn't support partitions; I don't know whether it supports slices.

Now, if vinum supports slices, then vinum doesn't care what filesystem one puts on it (i.e. how it is sliced up). In which case, one could use vinum to manage a virtual disk with NTFS on one slice and FFS on another. However, if it does not support slices, which I suspect it doesn't, then the entire volume must be dedicated to a single file system. So arguably, yes, if someone were to extend the size of the virtual disk (presumably by adding physical disks to the plex), it would be reasonable to assume that any existing filesystem should be extended to fill the new space.

What I can't figure out is why Greg doesn't support slicing / partitioning the virtual disk (this is really the only thing that prevents it from being 100% transparent in my estimation). With an MBR, vinum could be used to hold any filesystem (e.g. NTFS, ext2, or FAT32) or any combination thereof; with a disklabel, vinum wouldn't require kludges like newfs -v.

> > say that the entire size of the new, enlarged, virtual disk is supposed
> > to be dedicated to FFS. Is it not possible, however unlikely, for a
> > sysadmin to add disk space to a RAID array and partition it as say
> > FAT32?
>
> That's why it may be interesting to add such hooks to disklabel.

You are saying that when someone updates the disklabel to specify a larger partition, the hooks would be used to notify the filesystem, which could then do the dirty work? You haven't happened to visit the Pacific Northwest recently, perhaps near the town of Redmond, WA? :) Seriously, such hooks would have to be in the kernel, not the disklabel program, on the off chance someone uses a tool other than disklabel to edit the partition table.

> > I think what Greg was getting at is that, as far as the file system is
> > concerned, vinum just looks like a disk.
> > Whatever else vinum may be, to the file system it just looks like a
> > disk.
> >
> > > I have some ideas about how to get FFS resizeable without needing to
> > > freeze or umount it first and without losing inodes.
> >
> > This is great, but I think that "vinum hooks" are no more needed than
> > "ccd hooks" or "DPT hooks". User-land tools should allow the
> > administrator to resize the file system at the administrator's
> > discretion. Beyond the technical issues of providing hooks to
> > automatically extend file systems, there is the social implication of
> > whether that is what the user wanted. User-land tools solve both
> > problems.
>
> DPT hooks would be pointless because those controllers don't change the
> size of a partition. ccd volumes should be partitioned too, and ccd is
> not that useful any more compared to vinum.
> vinum and disklabel are the places for such hooks, but I think vinum is
> the more useful one.
> Greg is already about to implement spare disk support.
> What about a kind of spare disk which is scheduled to enlarge an FS
> automatically when it runs out of space?
> Features like this need interaction between the fs and the volume
> manager.
> Of course hardware RAIDs are a consideration too - but that's more
> difficult.

Basically what we need is a filesystem-specific resize function: userland tools could use a syscall to request that a filesystem be resized, and the filesystem itself would do the implementation. Assuming vinum remains the special case of only allowing one file system on it, it would be safe for it to call the filesystem resize routine when it brings the spare on-line. However, personally I would like to see vinum become a true virtual disk, allowing multiple file systems. In that case, I don't see where anything other than userland tools would access this interface.

> > No (see above). Forget about vinum, just worry about disks. Vinum will
> > play nice and pretend to be a disk. In the end you will have a cleaner
> > solution that plays nice with others too.
> > Everyone will love the fact that they can extend any disk, on command,
> > either by adding drives to their vinum config, their hardware RAID
> > array, or finally wiping Windows off their home system.
>
> I don't want vinum or anything else like it to know how to resize an fs,
> but I want it to be able to call the needed tools automatically.
> Think of shrinking: first you have to find out how big the new partition
> will become, then you need to shrink the fs, and finally you have to
> shrink the volume.
> Three steps, each with the possibility to shoot yourself in the foot.
> If vinum calls the tool and says "the user wants this volume shrunk by
> 134 MB; do what is needed so I can do what the user wants", it is easier
> and less likely to get you into trouble.

This is nice in theory. The tools should still be there to access the functionality, though. My only question is: how does vinum *know* what you want to do? Clearly, in its current state, it is easy to determine when to enlarge a filesystem (basically whenever more space is available); but you can't *know* when the user wants to shrink the filesystem. Userland tools are the only way for the user to tell you.
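The ordering constraint both sides of this exchange circle around can be pinned down: growing must enlarge the volume before the filesystem, while shrinking must shrink the filesystem first and reduce the volume only if that succeeded. A minimal sketch of that sequencing (hypothetical names, not a real vinum or FFS interface):

```c
#include <assert.h>

/* The two-step plans an orchestrating tool (or vinum itself) would
 * follow.  Hypothetical sketch; no such enum exists in FreeBSD. */
enum resize_step { GROW_VOLUME, GROW_FS, SHRINK_FS, SHRINK_VOLUME };

/*
 * Fill in the two ordered steps for resizing from cur to want blocks.
 * Growing makes room first; shrinking vacates blocks first, so the
 * volume is never cut out from under live filesystem data.
 * Returns 0 on success, -1 if the request is invalid or a no-op.
 */
static int plan_resize(long cur, long want, enum resize_step out[2])
{
    if (cur <= 0 || want <= 0 || cur == want)
        return -1;
    if (want > cur) {
        out[0] = GROW_VOLUME;   /* enlarge the container... */
        out[1] = GROW_FS;       /* ...then let the fs claim it */
    } else {
        out[0] = SHRINK_FS;     /* evacuate blocks first... */
        out[1] = SHRINK_VOLUME; /* ...only then cut the volume */
    }
    return 0;
}
```

Whether the caller of `plan_resize` is a userland tool (Kelly's position) or the volume manager itself (Bernd's), the step order is the same; the disagreement is only about who triggers it.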
Kelly -- Kelly Yancey - kbyanc@posi.net - Richmond, VA Director of Technical Services, ALC Communications http://www.alcnet.com/ Maintainer, BSD Driver Database http://www.posi.net/freebsd/drivers/ Coordinator, Team FreeBSD http://www.posi.net/freebsd/Team-FreeBSD/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 3 9:29:50 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.du.gtn.com (mail.du.gtn.com [194.77.9.57]) by hub.freebsd.org (Postfix) with ESMTP id A0C68154D7 for ; Wed, 3 Nov 1999 09:29:32 -0800 (PST) (envelope-from ticso@mail.cicely.de) Received: from mail.cicely.de (cicely.de [194.231.9.142]) by mail.du.gtn.com (8.9.3/8.9.3) with ESMTP id SAA10842; Wed, 3 Nov 1999 18:22:38 +0100 (MET) Received: (from ticso@localhost) by mail.cicely.de (8.9.0/8.9.0) id SAA92054; Wed, 3 Nov 1999 18:29:13 +0100 (CET) Date: Wed, 3 Nov 1999 18:29:13 +0100 From: Bernd Walter To: Kelly Yancey Cc: Bernd Walter , freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs Message-ID: <19991103182912.A92011@cicely7.cicely.de> References: <19991103105333.A89617@cicely7.cicely.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Wed, Nov 03, 1999 at 11:40:24AM -0500, Kelly Yancey wrote: > On Wed, 3 Nov 1999, Bernd Walter wrote: > > That sounds more like a RAID 5/0 config. While I've never seen a > hardware vendor advertise support for such a creature, it should > theoretically be possible. That's what I mean. With the Metadisk software on Solaris it is possible to do. > However, vinum volumes can only provide mirroring between plexes so > it is impossible for vinum to extend a volume composed of RAID 5 plexes > via concatenation. 
> On the other hand, I see that Greg has "Extending striped and RAID-5
> plexes" on his TODO list for vinum, presumably by [shudder] restriping
> everything.

I assume Greg will do the right thing, so everyone should be happy.

> Sorry, I was thinking about the software in RAID controllers in the same
> terms as vinum. You are correct, though, that to the OS it appears as a
> single disk which has been enlarged. The same thing, though, is true with
> vinum; it should appear simply as though the disk were enlarged (albeit a
> "virtual disk").
> No file system should care whether a disk is a "real" disk or a
> "virtual" disk or else a "virtual" disk isn't very virtual.

To be exact, vinum does not create a disk in the usual way. The volume it creates doesn't get partitioned like the ccd ones do.

> vinum doesn't support partitions; I don't know whether it supports
> slices.
>
> Now, if vinum supports slices, then vinum doesn't care what filesystem
> one puts on it (i.e. how it is sliced up). In which case, one could use
> vinum to manage a virtual disk with NTFS on one slice and FFS on another.
> However, if it does not support slices, which I suspect it doesn't, then
> the entire volume must be dedicated to a single file system. So
> arguably, yes, if someone were to extend the size of the virtual disk
> (presumably by adding physical disks to the plex), it would be reasonable
> to assume that any existing filesystem should be extended to fill the new
> space.

I don't see the need for doing that. If you want to have - say - a RAID 5 volume partitioned, you can just as well create two volumes, each with one RAID 5 plex. The layout on the disk should be the same.

> What I can't figure out is why Greg doesn't support slicing
> / partitioning the virtual disk (this is really the only thing that
> prevents it from being 100% transparent in my estimation). With an
> MBR, vinum could be used to hold any filesystem (e.g. NTFS, ext2, or
> FAT32) or any combination thereof; with a disklabel, vinum wouldn't
> require kludges like newfs -v.

That's a drive-naming thing, not the label. Vinum creates an artificial label, and you only need to use newfs -v in some cases. I usually name my volumes d0, d1, d2, ... and then I don't need the -v switch. If I remember this right, the volume name needs to end in 0-9 or a-h. Vinum volumes are usable only with an operating system supporting vinum anyway; it is more an fs issue that limits their further use. The partition name may be a point - I never thought about it.

> > > say that the entire size of the new, enlarged, virtual disk is
> > > supposed to be dedicated to FFS. Is it not possible, however
> > > unlikely, for a sysadmin to add disk space to a RAID array and
> > > partition it as say FAT32?
> >
> > That's why it may be interesting to add such hooks to disklabel.
>
> You are saying that when someone updates the disklabel to specify a
> larger partition, the hooks would be used to notify the filesystem, which
> could then do the dirty work?
> You haven't happened to visit the Pacific Northwest recently, perhaps
> near the town of Redmond, WA? :) Seriously, such hooks would have to be
> in the kernel, not the disklabel program, on the off chance someone uses
> a tool other than disklabel to edit the partition table.

That's an option too - but of course anyone should always be able to do everything manually.

> Basically what we need is a filesystem-specific resize function: userland
> tools could use a syscall to request that a filesystem be resized, and
> the filesystem itself would do the implementation. Assuming vinum remains
> the special case of only allowing one file system on it, it would be safe
> for it to call the filesystem resize routine when it brings the spare
> on-line. However, personally I would like to see vinum become a true
> virtual disk, allowing multiple file systems.
> In which case, I don't see where anything other than userland tools would
> access this interface.

In my opinion vinum should not remain a special case but become the usual one. Vinum brings the toolset to manage and handle volumes - why not implement hooks for the dependencies?

> > > No (see above). Forget about vinum, just worry about disks. Vinum
> > > will play nice and pretend to be a disk. In the end you will have a
> > > cleaner solution that plays nice with others too. Everyone will love
> > > the fact that they can extend any disk, on command, either by adding
> > > drives to their vinum config, their hardware RAID array, or finally
> > > wiping Windows off their home system.
> >
> > I don't want vinum or anything else like it to know how to resize an
> > fs, but I want it to be able to call the needed tools automatically.
> > Think of shrinking: first you have to find out how big the new
> > partition will become, then you need to shrink the fs, and finally you
> > have to shrink the volume.
> > Three steps, each with the possibility to shoot yourself in the foot.
> > If vinum calls the tool and says "the user wants this volume shrunk by
> > 134 MB; do what is needed so I can do what the user wants", it is
> > easier and less likely to get you into trouble.
>
> This is nice in theory. The tools should still be there to access the
> functionality, though. My only question is: how does vinum *know* what
> you want to do? Clearly, in its current state, it is easy to determine
> when to enlarge a filesystem (basically whenever more space is
> available); but you can't *know* when the user wants to shrink the
> filesystem. Userland tools are the only way for the user to tell you.

You tell vinum to shrink the volume. Vinum tells the fs tool that the volume is about to become smaller, and by how much. The fs tool asks the user whether he wants to do this, after some sanity checks.
If the user declines, nothing happens; if the fs shrinks successfully, vinum shrinks the volume; and if shrinking the fs fails, vinum should refuse to proceed. The missing key is how to determine which kind of fs has to be handled, but that's mostly a matter of definition.

--
B.Walter              COSMO-Project         http://www.cosmo-project.de
ticso@cicely.de       Usergroup             info@cosmo-project.de

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message

From owner-freebsd-fs Wed Nov 3 11:43:58 1999
Delivered-To: freebsd-fs@freebsd.org
Received: from mojave.sitaranetworks.com (mojave.sitaranetworks.com [199.103.141.157]) by hub.freebsd.org (Postfix) with ESMTP id 31B4F1502E for ; Wed, 3 Nov 1999 11:43:49 -0800 (PST) (envelope-from grog@mojave.sitaranetworks.com)
Message-ID: <19991103144037.41321@mojave.sitaranetworks.com>
Date: Wed, 3 Nov 1999 14:40:37 -0500
From: Greg Lehey
To: Kelly Yancey , Bernd Walter
Cc: freebsd-fs@FreeBSD.ORG
Subject: Re: feature list journalled fs
Reply-To: Greg Lehey
References: <19991103105333.A89617@cicely7.cicely.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: ; from Kelly Yancey on Wed, Nov 03, 1999 at 11:40:24AM -0500
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Wednesday, 3 November 1999 at 11:40:24 -0500, Kelly Yancey wrote:
> On Wed, 3 Nov 1999, Bernd Walter wrote:
>>>
>>> I am under the impression that you can only enlarge a vinum volume if
>>> it is in a RAID 0 configuration (concatenation). Obviously, it would be
>>> very difficult to enlarge a RAID 1 or RAID 5 configuration as it would
>>> require restriping the data across all disks; I'm not familiar with any
>>> product, hardware or software, that can do this.
>>
>> In the case of striping, which applies to RAID 5 and striped RAID 0
>> configurations, it is not simple to do.  But think of a
>> RAID 5 volume which is extended by concatenating another RAID 5
>> set.
This is not doable with vinum - but I'm sure that this won't >> happen before anyone is using such a feature. > > That sounds more like a RAID 5/0 config. While I've never seen a > hardware vendor advertise support for such a creature, it should > theoretically be possible. > However, vinum volumes can only provide mirroring between plexes, so > it is impossible for vinum to extend a volume composed of RAID 5 plexes > via concatenation. On the other hand, I see that Greg has "Extending > striped and RAID-5 plexes" on his TODO list for vinum, presumably by > [shudder] restriping everything. That's what I'm thinking of. Yes, it's slow and ugly, but it's a function that people want. The obvious current alternative is to back up the entire volume to tape, rebuild the volume (including reinitializing in the case of RAID-5) and restore the data. By comparison, restriping looks pretty :-) There is another way to do this now, on line, if you have enough disk: create another plex, start it, remove the original plex, remodel it nearer to the heart's desire, and start it. It's slow, but not as slow as backing up to tape, and you can continue to access the volume while you're doing it. >>> Besides the fact that this would be an issue for any RAID controller >> >> No. >> Most controllers I have seen increase the size of a disk - not a volume. > > Sorry, I was thinking about the software in RAID controllers in the same > terms as vinum. You are correct, though, that to the OS it appears as a > single disk which has been enlarged. The same thing, though, is true with > vinum; it should appear simply as though the disk were enlarged (albeit a > "virtual disk"). Correct. I don't really see a difference here, except maybe in terminology. Note that many operating systems refer to disks as volumes, however. > No file system should care whether a disk is a "real" disk or a > "virtual" disk or else a "virtual" disk isn't very virtual. Almost correct.
It's useful to understand the geometry of a stripe set when setting up ufs; it's very easy to end up with all cylinder groups on the same spindle. >>> also. Anyone with a RAID controller can add a new disk to their RAID 0 and >>> enlarge the virtual disk. Those controllers aren't going to tell you about >>> the increased disk size any more than vinum does. Beyond that, who is to >> >> They don't need to, because the partition the fs is on won't increase if the >> virtual disk gets bigger. > > I need to clarify terminology here just for myself, because otherwise > we're getting into confusing territory... > > partition: UNIX-style partitions of which there can be 8 (lettered a-h); > exist in the disklabel of a slice. > slice: PC-style partitioning of disk space of which there can be 4; > exist in the master boot record. > > vinum doesn't support partitions; I don't know whether it supports > slices. Vinum does support partitions, because there's nothing you can do to stop it doing so. They just don't make sense in a Vinum context. > Now, if vinum supports slices, then vinum doesn't care what filesystem > one puts on it (ie how it is sliced up). In which case, one could use > vinum to manage a virtual disk with NTFS on one slice and FFS on another. > However, if it does not support slices, which I suspect it doesn't, then > the entire volume must be dedicated to a single file system. So > arguably, yes, if someone were to extend the size of the virtual disk > (presumably by adding physical disks to the plex), it would be reasonable > to assume that any existing filesystem should be extended to fill the new > space. Slices are supported too, at least as far as the underlying disk code is fooled by a Vinum volume. But they don't make sense. > What I can't figure out is why Greg doesn't support slicing > / partitioning the virtual disk (this is really the only thing that > prevents it from being 100% transparent in my estimation).
As I said, they are supported, but they don't make sense. Vinum has its own, more flexible method for subdividing disks. > With an MBR, vinum could be used to hold any filesystem (ie. NTFS, > ext2, or FAT32) or any combination thereof; It can now. You don't need an MBR, since the bootstrap doesn't understand Vinum. And the usefulness of ext2 or NTFS file systems is limited, since Linux and NT don't understand Vinum. > with a disklabel vinum wouldn't require kludges like newfs -v. newfs -v is needed because newfs *without* -v is a kludge. It shouldn't assume anything from the name of a partition. >>> say that the entire size of the new, enlarged, virtual disk is supposed to be >>> dedicated to FFS. Is it not possible, however unlikely, for a sysadmin to >>> add disk space to a RAID array and partition it as say FAT32? >> >> That's why it may be interesting to add such hooks to disklabel. > > You are saying so that when someone updates the disklabel to specify a > larger partition, the hooks would be used to notify the filesystem which > could then do the dirty work? > You haven't happened to visit the Pacific Northwest recently, perhaps near > the town of Redmond, WA? :) Seriously, such hooks would have to be in the > kernel, not the disklabel program, on the off chance someone uses a tool > other than disklabel to edit the partition table. I suppose it's possible to get the Vinum daemon to do this. In principle the idea makes sense, but it would need to be done right. I can think of a lot of more important stuff to do first. >>> I think what Greg was getting at is that, as far as the file system is concerned, >>> vinum just looks like a disk. Whatever else vinum may be, to the file >>> system it just looks like a disk. >>> >>>> I have some ideas about how to get FFS resizeable without needing to freeze or >>>> umount it before and without losing inodes. >>> >>> This is great, but I think that "vinum hooks" are no more needed than >>> "ccd hooks" or "DPT hooks".
User-land tools should allow the administrator >>> to resize the file system at the administrator's discretion. Beyond the >>> technical issues of providing hooks to automatically extend file systems, >>> there is the social implication of whether that is what the user wanted. >>> User-land tools solve both problems. >> >> DPT should be obsolete because they don't change the size of a partition. >> ccds have to be partitioned too and are not that useful any more compared to >> vinum. >> vinum and disklabel are the hooks, but I think vinum is more useful. >> Greg is already about to implement spare disk support. >> What about a kind of spare disk which is scheduled to increase a FS >> automatically when it runs out of space? >> Features like this need interaction between the fs and the volume manager. >> Of course hardware RAIDs are a point too - but that's more difficult. > > Basically what we need is a filesystem-specific resize function: > userland tools could use a syscall to request that a filesystem be resized, and > the filesystem itself would do the implementation. Resizing a file system is not a thing you can do in a system call. Much needs to be done in user context. > Assuming vinum remains the special case of only allowing one file > system on it, I'd rather hope that this should become the norm. > it would be safe for it to call the filesystem resize routine when it > brings the spare on-line. However, personally I would like to see > vinum become a true virtual disk, It is :-) > allowing multiple file systems. It doesn't make any sense to do this. > In which case, I don't see where anything other than userland tools > would access this interface. That's the case at the moment.
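The division of labour argued for in this exchange (the kernel or vinum reports a size change; a userland tool decides whether and how to resize) can be sketched as a small decision function. This is an editorial illustration in Python with made-up names, not FreeBSD or vinum code:

```python
def plan_resize(fs_blocks, dev_blocks, user_confirmed_shrink=False):
    """Decide what a userland resize tool should do, given the current
    filesystem size and the size of the underlying (possibly virtual)
    disk. Growing is safe to do automatically; shrinking is destructive,
    so it only happens when the user explicitly asked for it."""
    if dev_blocks > fs_blocks:
        return ("grow", dev_blocks)       # new space holds no live data
    if dev_blocks < fs_blocks:
        if not user_confirmed_shrink:
            return ("refuse", fs_blocks)  # never shrink behind the user's back
        return ("shrink", dev_blocks)
    return ("noop", fs_blocks)
```

This mirrors the point made above: a volume manager can report the new device size, but only a userland tool can supply the user's intent to shrink.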
Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 3 13: 7:29 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.du.gtn.com (mail.du.gtn.com [194.77.9.57]) by hub.freebsd.org (Postfix) with ESMTP id E641114DC6 for ; Wed, 3 Nov 1999 13:07:13 -0800 (PST) (envelope-from ticso@mail.cicely.de) Received: from mail.cicely.de (cicely.de [194.231.9.142]) by mail.du.gtn.com (8.9.3/8.9.3) with ESMTP id WAA24867; Wed, 3 Nov 1999 22:00:13 +0100 (MET) Received: (from ticso@localhost) by mail.cicely.de (8.9.0/8.9.0) id WAA92866; Wed, 3 Nov 1999 22:06:49 +0100 (CET) Date: Wed, 3 Nov 1999 22:06:48 +0100 From: Bernd Walter To: Greg Lehey Cc: Kelly Yancey , Bernd Walter , freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs Message-ID: <19991103220648.A92524@cicely7.cicely.de> References: <19991103105333.A89617@cicely7.cicely.de> <19991103144037.41321@mojave.sitaranetworks.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: <19991103144037.41321@mojave.sitaranetworks.com> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Wed, Nov 03, 1999 at 02:40:37PM -0500, Greg Lehey wrote: > > > > You are saying so that when someone updates the disklabel to specify a > > larger partition, the hooks would be used to notify the filesystem which > > could then do the dirty work? > > You haven't happened to visit the Pacific Northwest recent, perhaps near > > the town of Redmond, WA? :) Seriously, such hooks would have to be in the > > kernel, not the disklabel program, in the off chance someone uses a tool > > other than disklabel to edit the partition table. > > I suppose it's possible to get the Vinum daemon to do this. In > principle the idea makes sense, but it would need to be done right. 
I > can think of a lot of more important stuff to do first. At least resizing should work before that. > > > > Basically what we need is a filesystem-specific resize function which > > userland tools could use a syscall to request a filesystem be resized, and > > the filesystem itself would do the implementation. > > Resizing a file system is not a thing you can do in a system call. > Much needs to be done in user context. > I have to agree. Everything possible should be done in user mode, because this keeps the code nonresident and the kernel small. But several things need to be done in sync with the in-core information of the fs - at least if you don't want to freeze the fs and resync the in-core state. -- B.Walter COSMO-Project http://www.cosmo-project.de ticso@cicely.de Usergroup info@cosmo-project.de To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 3 13:22:21 1999 Delivered-To: freebsd-fs@freebsd.org Received: from kronos.alcnet.com (kronos.alcnet.com [63.69.28.22]) by hub.freebsd.org (Postfix) with ESMTP id 436E014CE0 for ; Wed, 3 Nov 1999 13:22:17 -0800 (PST) (envelope-from kbyanc@posi.net) X-Provider: ALC Communications, Inc. http://www.alcnet.com/ Received: from localhost (kbyanc@localhost) by kronos.alcnet.com (8.9.3/8.9.3/antispam) with ESMTP id QAA51846; Wed, 3 Nov 1999 16:21:09 -0500 (EST) Date: Wed, 3 Nov 1999 16:21:09 -0500 (EST) From: Kelly Yancey X-Sender: kbyanc@kronos.alcnet.com To: Greg Lehey Cc: freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs In-Reply-To: <19991103144037.41321@mojave.sitaranetworks.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Wed, 3 Nov 1999, Greg Lehey wrote: > > > No file system should care whether a disk is a "real" disk or a > > "virtual" disk or else a "virtual" disk isn't very virtual. > > Almost correct.
It's useful to understand the geometry of a stripe > set when setting up ufs; it's very easy to end up with all cylinder > groups on the same spindle. But wouldn't that apply to any RAID configuration, not just vinum? Albeit difficult to extract such information from any hardware vendor. > > Vinum does support partitions, because there's nothing you can do to > stop it doing so. They just don't make sense in a Vinum context. > I was misled by http://www.lemis.com/vinum/Object-naming.html into believing vinum didn't support partitions: "Volumes appear to the system to be identical to disks, with one exception. Unlike UNIX drives, Vinum does not partition volumes, which thus do not contain a partition table. This has required modification to some disk utilities, notably newfs, which previously tried to interpret the last letter of a Vinum volume name as a partition identifier." > Slices are supported too, at least as far as the underlying disk code > is fooled by a Vinum volume. But they don't make sense. Well, I would say that they make sense in the sense that vinum creates a virtual disk which should appear and behave exactly like any physical disk. You can slice up a physical disk and put FFS and NTFS on separate slices. The "don't make sense" part seems to stem from the fact that 99.9% of people don't have any reason to do so: the OSs associated with the other file systems can't use vinum to access the virtual disk. I've been looking at it as creating something that is indistinguishable from a physical disk drive. In which case, anything a physical disk can do, a vinum disk should do too. The theory being that other tools won't have to adapt to handle a special case for vinum. And you have done that. Very well. I considered newfs -v a special case, but now I Think Different(tm). :) > It can now. You don't need an MBR, since the bootstrap doesn't > understand Vinum.
And the usefulness of ext2 or NTFS file systems is > limited, since Linux and NT don't understand Vinum. The point wasn't so much the "boot" part of the master boot record, but rather the PC-compatible partition table that is stored in the first sector with the MBR. But otherwise, I think we are on the same wavelength. > newfs -v is needed because newfs *without* -v is a kludge. It > shouldn't assume anything from the name of a partition. I see the light. I was thinking about how things are and trying to figure out why vinum didn't emulate the "standard" behaviour. Now I understand that it is simply because the "standard" behaviour is misguided. This makes very good sense. > > Basically what we need is a filesystem-specific resize function which > > userland tools could use a syscall to request a filesystem be resized, and > > the filesystem itself would do the implementation. > > Resizing a file system is not a thing you can do in a system call. > Much needs to be done in user context. Good point. I was thinking along the lines that the filesystem code would be best suited to understand the disk layout, so ideally one would inform the file system that it needed to do some resizing and it would take care of it. Now that I think about it some more, this was ill-conceived: not only would it be unreasonable to put the functionality in the kernel, but newfs gives a good precedent for userland tools to implement file-system-specific functionality such as resizing. > > > Assuming vinum remains the special case of only allowing one file > > system on it, > > I'd rather hope that this should become the norm. I meant the virtual disk that vinum presents to the world. I guess things for physical disks are like that now, if you regard each wd0s1e as a virtual disk encompassing a subset of the physical disk. In that line of thinking, the slice table is merely a header written to the disk which represents a first level of virtualization; disklabels provide a second level.
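This layered-virtualization picture (slice table as a first level, disklabel as a second) amounts to composing offset translations. A minimal editorial sketch, with made-up block numbers:

```python
def make_window(start, length):
    """One level of virtualization: a (start, length) window onto the
    device below, e.g. a slice within a disk, or a partition within a
    slice. A vinum volume is the same idea with a fancier mapping."""
    def translate(block):
        if not 0 <= block < length:
            raise ValueError("block outside window")
        return start + block
    return translate

# Hypothetical layout: slice wd0s1 starts at block 63 of disk wd0,
# and partition 'e' starts at block 1024 within that slice.
slice_map = make_window(63, 2_000_000)   # wd0s1 within wd0
part_map = make_window(1024, 500_000)    # wd0s1e within wd0s1

# Block 10 of wd0s1e -> block 1034 of wd0s1 -> block 1097 of wd0.
physical = slice_map(part_map(10))
```

Each layer is just another window, which is why wd0s1e and a vinum volume look alike to the file system above them.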
Looking at things this way, vinum volumes are equivalent to wd0s1e-style volumes. They are both virtual disks (although vinum vastly more configurable :) ), neither necessarily specifies a 1-1 mapping with physical disks, and both can only contain a single file system. I feel enlightened :) Thank you master. > > Greg > -- -- Kelly Yancey - kbyanc@posi.net - Richmond, VA Director of Technical Services, ALC Communications http://www.alcnet.com/ Maintainer, BSD Driver Database http://www.posi.net/freebsd/drivers/ Coordinator, Team FreeBSD http://www.posi.net/freebsd/Team-FreeBSD/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 3 16:20:46 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.enteract.com (mail.enteract.com [207.229.143.33]) by hub.freebsd.org (Postfix) with ESMTP id 94F6A155C2 for ; Wed, 3 Nov 1999 16:20:37 -0800 (PST) (envelope-from dscheidt@enteract.com) Received: from shell-2.enteract.com (dscheidt@shell-2.enteract.com [207.229.143.41]) by mail.enteract.com (8.9.3/8.9.3) with SMTP id SAA17504 for ; Wed, 3 Nov 1999 18:18:42 -0600 (CST) (envelope-from dscheidt@enteract.com) Date: Wed, 3 Nov 1999 18:18:42 -0600 (CST) From: David Scheidt To: freebsd-fs@FreeBSD.org Subject: Filesystems reading list? Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Does anyone have a list of readings in modern filesystem design? I understand the basics, at some high-level. What technical stuff do I need to read to get up to speed? 
Thanks, David Scheidt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 3 16:21:39 1999 Delivered-To: freebsd-fs@freebsd.org Received: from zed.ludd.luth.se (zed.ludd.luth.se [130.240.16.33]) by hub.freebsd.org (Postfix) with ESMTP id C24331511E for ; Wed, 3 Nov 1999 16:21:23 -0800 (PST) (envelope-from pantzer@speedy.ludd.luth.se) Received: from speedy.ludd.luth.se (pantzer@speedy.ludd.luth.se [130.240.16.164]) by zed.ludd.luth.se (8.8.5/8.8.5) with ESMTP id BAA29232; Thu, 4 Nov 1999 01:20:37 +0100 Message-Id: <199911040020.BAA29232@zed.ludd.luth.se> X-Mailer: exmh version 2.0.1 12/23/97 To: Kelly Yancey Cc: freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs In-Reply-To: Message from Kelly Yancey of "Tue, 02 Nov 1999 19:16:55 EST." Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 04 Nov 1999 01:20:36 +0100 From: Mattias Pantzare Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > I am under the impression that you can only enlarge a vinum volume if it > is in a RAID 0 configuration (concatenation). Obviously, it would be very > difficult to enlarge a RAID 1 or RAID 5 configuration as it would require > restriping the data across all disks; I'm not familiar with any product, > hardware or software, that can do this. Solaris DiskSuite almost extends RAID 5 configurations. You can add disks to a RAID 5 set, but the extra disks will only hold data, no parity. I think that it is a strange mix of RAID 5 and concatenation. All data is still parity protected. It might not be as fast as a true RAID 5, but it can be very useful.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 3 17:18:47 1999 Delivered-To: freebsd-fs@freebsd.org Received: from login-2.eunet.no (login-2.eunet.no [193.71.71.239]) by hub.freebsd.org (Postfix) with ESMTP id 8441A14EF4 for ; Wed, 3 Nov 1999 17:18:39 -0800 (PST) (envelope-from mbendiks@eunet.no) Received: from login-1.eunet.no (mbendiks@login-1.eunet.no [193.71.71.238]) by login-2.eunet.no (8.9.3/8.9.3/GN) with ESMTP id CAA35867; Thu, 4 Nov 1999 02:18:32 +0100 (CET) (envelope-from mbendiks@eunet.no) Received: from localhost (mbendiks@localhost) by login-1.eunet.no (8.9.3/8.8.8) with ESMTP id CAA81477; Thu, 4 Nov 1999 02:18:31 +0100 (CET) (envelope-from mbendiks@eunet.no) X-Authentication-Warning: login-1.eunet.no: mbendiks owned process doing -bs Date: Thu, 4 Nov 1999 02:18:31 +0100 (CET) From: Marius Bendiksen To: Robert Watson Cc: freebsd-fs@FreeBSD.ORG Subject: Re: stupidfs - easily extensible test file systems? In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org I believe V9fs covers this. --- Marius Bendiksen, ScanCall AS On Thu, 28 Oct 1999, Robert Watson wrote: > > I'm in the process of hacking up a stupidfs -- i.e., a minimal file system > module that provides simplistic (i.e., stupid) implementations of all the > relevant vnops and vfsops based on in-kernel memory. The purpose of > stupidfs is to allow file system extension developers (like myself) to be > able to add new vnops and implement them in a simple file system without > having to deal initially with the issue of permenant storage in the file > stores, distributed file systems, etc. 
It would be a poor-man's MFS > (although perhaps more useful than MFS because it doesn't have the weight > of UFS/FFS tangled up in it, which is what has stopped me from using MFS > to do the same kind of testing), with it only really being useful for this > testing purpose. > > However, as this will take a little bit to write, I thought I'd ask if > anyone else has done this already? :-) > > Right now I pretty much have it to the point where I can see the directory > structure, create files of up to 1k, etc, etc, but there's a fair amount > more to do before it's useful. Those people working on ACLs and MACs for > POSIX.1e have needed a test framework that doesn't involve seriously > hurting themselves on the sharp edges of FFS and MFS, but that still > allows them to actually see the results in a file system. Layering would > be another option [if only it worked]. And even with layering, there are > still complications in implementation -- more complicated, than saying > "gee, let's extend the inode to have *this* structure in it" and just > having it work as it backs to nothing and isn't tangled up in the idea of > backing to something (e.g., MFS). 
> > Robert N M Watson > > robert@fledge.watson.org http://www.watson.org/~robert/ > PGP key fingerprint: AF B5 5F FF A6 4A 79 37 ED 5F 55 E9 58 04 6A B1 > TIS Labs at Network Associates, Safeport Network Services > > > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-fs" in the body of the message > > > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 3 18:13:30 1999 Delivered-To: freebsd-fs@freebsd.org Received: from macalpine.cornfed.com (macalpine.cornfed.com [208.58.42.162]) by hub.freebsd.org (Postfix) with ESMTP id 2691014A27 for ; Wed, 3 Nov 1999 18:13:26 -0800 (PST) (envelope-from fwmiller@macalpine.cornfed.com) Received: (from fwmiller@localhost) by macalpine.cornfed.com (8.8.8/8.8.8) id VAA03158; Wed, 3 Nov 1999 21:11:49 -0500 (EST) (envelope-from fwmiller) From: "Frank W. Miller" Message-Id: <199911040211.VAA03158@macalpine.cornfed.com> Subject: Re: Filesystems reading list? In-Reply-To: from David Scheidt at "Nov 3, 99 06:18:42 pm" To: freebsd-fs@FreeBSD.ORG Date: Wed, 3 Nov 1999 21:11:49 -0500 (EST) Cc: fwmiller@macalpine.cornfed.com (Frank W. Miller) X-Mailer: ELM [version 2.4ME+ PL38 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > Does anyone have a list of readings in modern filesystem design? I > understand the basics, at some high-level. What technical stuff do I need > to read to get up to speed? > I would recommend the following papers: McKusick, M. K., Joy, W. N., Leffler, S. J., and Fabry, R. S., ``A Fast File System for UNIX'', ACM TOCS 2, 3 (Aug. 1984) pp. 181-197. Kleinman, S., ``Vnodes: An Architecture for Multiple File System Types in Sun UNIX'', Proc. of the Summer 1986 Conference, USENIX, 1986. Rosenthal, D., ``Evolving the Vnode Interface'', Proc. 
of the Summer 1990 Conference, USENIX, 1990. Skinner, G. and Wong, T., ``Stacking Vnodes: A Progress Report'', Proc. of the Summer 1993 Conference, USENIX, 1993. Heidemann, J. and Popek, G, ``File-System Development with Stackable Layers'', ACM TOCS, 12, 1, 1994. and the book: McKusick, M., Bostic, K., Karels, M., and Quarterman, J., The Design and Implementation of the 4.4BSD Operating System, Addison-Wesley, 1996. Later, FM -- Frank W. Miller Cornfed Systems Inc www.cornfed.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Nov 3 18:51:41 1999 Delivered-To: freebsd-fs@freebsd.org Received: from cs.columbia.edu (cs.columbia.edu [128.59.16.20]) by hub.freebsd.org (Postfix) with ESMTP id DE8BC15615 for ; Wed, 3 Nov 1999 18:51:37 -0800 (PST) (envelope-from ezk@shekel.mcl.cs.columbia.edu) Received: from shekel.mcl.cs.columbia.edu (shekel.mcl.cs.columbia.edu [128.59.18.15]) by cs.columbia.edu (8.9.1/8.9.1) with ESMTP id VAA10939; Wed, 3 Nov 1999 21:49:25 -0500 (EST) Received: (from ezk@localhost) by shekel.mcl.cs.columbia.edu (8.9.1/8.9.1) id VAA26232; Wed, 3 Nov 1999 21:49:24 -0500 (EST) Date: Wed, 3 Nov 1999 21:49:24 -0500 (EST) Message-Id: <199911040249.VAA26232@shekel.mcl.cs.columbia.edu> X-Authentication-Warning: shekel.mcl.cs.columbia.edu: ezk set sender to ezk@shekel.mcl.cs.columbia.edu using -f From: Erez Zadok To: "Frank W. Miller" Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Filesystems reading list? In-reply-to: Your message of "Wed, 03 Nov 1999 21:11:49 EST." <199911040211.VAA03158@macalpine.cornfed.com> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org In message <199911040211.VAA03158@macalpine.cornfed.com>, "Frank W. Miller" writes: > > > > Does anyone have a list of readings in modern filesystem design? I > > understand the basics, at some high-level. What technical stuff do I need > > to read to get up to speed? 
> > > > I would recommend the following papers: [...] All good papers. It depends what area or field you'd like to get into wrt filesystems, David. The list Frank supplied leans more towards stackable f/s. There are other papers if you're interested in distributed/network file systems (e.g., nfs, coda), high-performance file systems (xfs, reiserfs), automounter file systems (amd, automounter/autofs, hlfsd, Blaze's CFS), extent-like file systems, journaling file systems, numerous special-purpose file systems, and even more numerous tweaks to existing file systems. I have an extensive library of f/s papers I've collected over the past decade, and I can probably give you pointers to many. > Later, > FM > > -- > Frank W. Miller > Cornfed Systems Inc > www.cornfed.com > > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-fs" in the body of the message Erez. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 4 10: 2:23 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mojave.sitaranetworks.com (mojave.sitaranetworks.com [199.103.141.157]) by hub.freebsd.org (Postfix) with ESMTP id C524C14C3E for ; Thu, 4 Nov 1999 10:02:11 -0800 (PST) (envelope-from grog@mojave.sitaranetworks.com) Message-ID: <19991104130052.13342@mojave.sitaranetworks.com> Date: Thu, 4 Nov 1999 13:00:52 -0500 From: Greg Lehey To: Bernd Walter , Kelly Yancey Cc: Rodney , freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs Reply-To: Greg Lehey References: <19991103005415.A88044@cicely7.cicely.de> <19991103105333.A89617@cicely7.cicely.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <19991103105333.A89617@cicely7.cicely.de>; from Bernd Walter on Wed, Nov 03, 1999 at 10:53:33AM +0100 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Wednesday, 3 November 1999 at 10:53:33 +0100, Bernd Walter wrote: > On Tue, Nov 02, 1999 at
07:16:55PM -0500, Kelly Yancey wrote: >> On Wed, 3 Nov 1999, Bernd Walter wrote: >> >>> On Tue, Nov 02, 1999 at 12:35:53PM -0500, Greg Lehey wrote: >>>> On Sunday, 31 October 1999 at 12:05:14 +0100, Rodney wrote: >>>>> >>>>> >>>>> hi, >>>>> >>>>> here's my list of features I'd like to see in a >>>>> journalled fs. Have to admit this list is heavily >>>>> inspired ( ok , copied ) from the VxFS features; >>>>> apart from the buzz words, >>>>> some of them make sense, some of them don't, >>>>> but it should give us some stuff to discuss: >>>>> [snip] >>>>> 6) vinum integration (vague) >>>> >>>> Vinum is just a virtual disk. As such, any file system should work on >>>> it. >>>> >>> It is more than that - it is a volume manager. >>> Maybe you are not clear how far you got beyond the virtual disk. >>> It manages disks and can find its drives properly if they changed devices - >>> that's working really well; I was able to remove nearly all wire >>> configurations for drives, and I even run a volume with only a single >>> drive plex - just to get this feature. >>> It can (or should be able to) resize a volume and should inform the system >>> about it. >> >> I am under the impression that you can only enlarge a vinum volume if it >> is in a RAID 0 configuration (concatenation). Obviously, it would be very >> difficult to enlarge a RAID 1 or RAID 5 configuration as it would require >> restriping the data across all disks; I'm not familiar with any product, >> hardware or software, that can do this. > > In the case of striping, which is valid for RAID 5 and concatenated RAID 0 configurations, > it is not simply possible to do. > But think of a RAID 5 volume which is extended by concatenating another RAID 5 set. > This is not doable with vinum - but I'm sure that this won't happen before anyone > is using such a feature. Well, I'm sure that nobody will use this feature until it's available :-) Yes, I remember you asking for this feature.
I suppose I should add it to the wish list (I just forgot to do it). Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 4 13:28: 7 1999 Delivered-To: freebsd-fs@freebsd.org Received: from kronos.alcnet.com (kronos.alcnet.com [63.69.28.22]) by hub.freebsd.org (Postfix) with ESMTP id 8B8201513D for ; Thu, 4 Nov 1999 13:27:59 -0800 (PST) (envelope-from kbyanc@posi.net) X-Provider: ALC Communications, Inc.
http://www.alcnet.com/ Received: from localhost (kbyanc@localhost) by kronos.alcnet.com (8.9.3/8.9.3/antispam) with ESMTP id QAA19515; Thu, 4 Nov 1999 16:26:58 -0500 (EST) Date: Thu, 4 Nov 1999 16:26:58 -0500 (EST) From: Kelly Yancey X-Sender: kbyanc@kronos.alcnet.com To: Greg Lehey Cc: Mattias Pantzare , freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs In-Reply-To: <19991104161317.49512@mojave.sitaranetworks.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > Solaris DiskSuite almost extends RAID 5 configurations. You can add disks to a > > RAID 5 set, but the extra disks will only hold data, no parity. > > That's normal for RAID-5. Only one disk in any stripe contains the > parity information. > > Greg Except that, if I understand correctly, the new disks don't have parity for them stored anywhere. They aren't really in on the RAID 5 game, but tag along and just pretend to be. All this talk about extending RAID 5 plexes has got me thinking about the oft-overlooked RAID 4. I realize this isn't currently implemented in vinum, but I understand it has similar (although slightly different, not worse, just different) performance characteristics to RAID 5. But I would think that RAID 4 would be much simpler to extend because only 1 disk contains the parity; rather than restriping, one only needs to recalculate parity.
But then again, RAID 4 is one of the black sheep of the RAID family :) Kelly -- Kelly Yancey - kbyanc@posi.net - Richmond, VA Director of Technical Services, ALC Communications http://www.alcnet.com/ Maintainer, BSD Driver Database http://www.posi.net/freebsd/drivers/ Coordinator, Team FreeBSD http://www.posi.net/freebsd/Team-FreeBSD/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 4 14: 0:54 1999 Delivered-To: freebsd-fs@freebsd.org Received: from kronos.alcnet.com (kronos.alcnet.com [63.69.28.22]) by hub.freebsd.org (Postfix) with ESMTP id 2317614C2B for ; Thu, 4 Nov 1999 14:00:51 -0800 (PST) (envelope-from kbyanc@posi.net) X-Provider: ALC Communications, Inc. http://www.alcnet.com/ Received: from localhost (kbyanc@localhost) by kronos.alcnet.com (8.9.3/8.9.3/antispam) with ESMTP id QAA20219; Thu, 4 Nov 1999 16:58:21 -0500 (EST) Date: Thu, 4 Nov 1999 16:58:21 -0500 (EST) From: Kelly Yancey X-Sender: kbyanc@kronos.alcnet.com To: Greg Lehey Cc: freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs In-Reply-To: <19991104163941.53462@mojave.sitaranetworks.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Thu, 4 Nov 1999, Greg Lehey wrote: > On Thursday, 4 November 1999 at 16:26:58 -0500, Kelly Yancey wrote: > >>> Solaris DiskSuite almost extends RAID 5 configurations. You can add disks to a > >>> RAID 5 set, but the extra disks will only hold data, no parity. > >> > >> That's normal for RAID-5. Only one disk in any stripe contains the > >> parity information. > >> > >> Greg > > > > Except that, if I understand correctly, the new disks don't have parity > > for them stored anywhere. They aren't really in on the RAID 5 game, but > > tag along and just pretend to be. > > Ah. What reason do you have to assume that that's the case?
I just thought that was what he meant by "but the extra disks will only hold data, no parity". In RAID 5, all disks will hold a portion of the parity information. > > > All this talk about extending RAID 5 plexes has got me thinking about > > the oft-overlooked RAID 4. I realize this isn't currently implemented in > > vinum, but I understand it has similar (although slightly different, not > > worse, just different) performance characteristics to RAID 5. > > It has worse performance characteristics than RAID-5. It also has no > redeeming virtues, except possibly code simplicity. I've never had hardware which supported it (always just 0, 1, 0/1, and 5), so I don't have any practical experience (read "arm waving"), but I have read (mainly in Adaptec paraphernalia) that RAID 4 is supposed to have slightly better read characteristics. > > > But I would think that RAID 4 would be much simpler to extend > > because only 1 disk contains the parity; rather than > > restriping, one only needs to recalculate parity. > > No, the effort is the same. It's not recalculating parity that's the > killer, it's moving all the data around. Consider the first stripe in > the plex (which looks identical for RAID-4 and RAID-5): > > Disk 1 2 3 4 5 6 7 8 9 > --------------------------------------------------------- > | | | | | | | | P | > --------------------------------------------------------- > > You have a storage of 7 blocks, each of stripe size (say 7 MB for a 1 > MB stripe size). The first stripe contains the data for 0 to 6 MB, > the second stripe contains the data for 7 to 13 MB, the third for 14 > to 20, and so on. > > Add a disk and you get: > > ------------------------------------------------------------------ > | | | | | | | | | P | > ------------------------------------------------------------------ > > Now the first stripe must contain the data for 0 to 7 MB, the second > stripe for 8 to 15 MB, the third for 16 to 23, and so on. See the > problem?
Recalculating parity is only part of it, and deciding where > it ends up (stays on disk 8 for RAID-4, moves to a possibly different > place for RAID-5) is trivial. > > Greg I was thinking that with RAID 4 specifying a single disk to hold all parity information, the volume manager would record which disk held the parity information (disk 8 in your example above) so adding another disk would result in: ------------------------------------------------------------------ | | | | | | | | P | | ------------------------------------------------------------------ Which looks odd, but would work, right? Then only parity would need to be recalculated. It only doesn't work with RAID 5 because the data is supposed to be distributed uniformly across the disks, so restriping is required. Kelly -- Kelly Yancey - kbyanc@posi.net - Richmond, VA Director of Technical Services, ALC Communications http://www.alcnet.com/ Maintainer, BSD Driver Database http://www.posi.net/freebsd/drivers/ Coordinator, Team FreeBSD http://www.posi.net/freebsd/Team-FreeBSD/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 4 14:32: 2 1999 Delivered-To: freebsd-fs@freebsd.org Received: from kronos.alcnet.com (kronos.alcnet.com [63.69.28.22]) by hub.freebsd.org (Postfix) with ESMTP id 5E29D1568B for ; Thu, 4 Nov 1999 14:31:59 -0800 (PST) (envelope-from kbyanc@posi.net) X-Provider: ALC Communications, Inc. 
http://www.alcnet.com/ Received: from localhost (kbyanc@localhost) by kronos.alcnet.com (8.9.3/8.9.3/antispam) with ESMTP id RAA20925; Thu, 4 Nov 1999 17:31:17 -0500 (EST) Date: Thu, 4 Nov 1999 17:31:17 -0500 (EST) From: Kelly Yancey X-Sender: kbyanc@kronos.alcnet.com To: Greg Lehey Cc: freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs In-Reply-To: <19991104170819.58641@mojave.sitaranetworks.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > Then only parity would need to be recalculated. It only doesn't work > > with RAID 5 because the data is supposed to be distributed uniformly > > across the disks, so restriping is required. > > Well, no, you've missed the point: with the exception of the first > stripe, *all* the data in the plex needs to be reshuffled, whether > you're doing RAID-4 or RAID-5. > > Greg Hmm. Yes, I suppose maintaining the location of data on the disk might be important :) [ looking for dunce cap ]. 
The new space would have to appear at the end of the volume, not scattered throughout its internals :) Kelly -- Kelly Yancey - kbyanc@posi.net - Richmond, VA Director of Technical Services, ALC Communications http://www.alcnet.com/ Maintainer, BSD Driver Database http://www.posi.net/freebsd/drivers/ Coordinator, Team FreeBSD http://www.posi.net/freebsd/Team-FreeBSD/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 4 14:32: 7 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.du.gtn.com (mail.du.gtn.com [194.77.9.57]) by hub.freebsd.org (Postfix) with ESMTP id 8F292156C4 for ; Thu, 4 Nov 1999 14:31:59 -0800 (PST) (envelope-from ticso@mail.cicely.de) Received: from mail.cicely.de (cicely.de [194.231.9.142]) by mail.du.gtn.com (8.9.3/8.9.3) with ESMTP id XAA15254; Thu, 4 Nov 1999 23:24:42 +0100 (MET) Received: (from ticso@localhost) by mail.cicely.de (8.9.0/8.9.0) id XAA97857; Thu, 4 Nov 1999 23:31:20 +0100 (CET) Date: Thu, 4 Nov 1999 23:31:20 +0100 From: Bernd Walter To: Kelly Yancey Cc: Greg Lehey , Mattias Pantzare , freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs Message-ID: <19991104233119.A97812@cicely7.cicely.de> References: <19991104161317.49512@mojave.sitaranetworks.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Thu, Nov 04, 1999 at 04:26:58PM -0500, Kelly Yancey wrote: > > > Solaris DiskSuite almost extends RAID 5 configurations. You can add disks to a > > > RAID 5 set, but the extra disks will only hold data, no parity. > > > > That's normal for RAID-5. Only one disk in any stripe contains the > > parity information. > > > > Greg > > Except that, if I understand correctly, the new disks don't have parity > for them stored anywhere. They aren't really in on the RAID 5 game, but > tag along and just pretend to be.
If you concatenate a single disk to a raid5 set you will have only the raid5 range redundant. You need two or more R1/R5 sets to remain redundant. > > All this talk about extending RAID 5 plexes has got me thinking about > the oft-overlooked RAID 4. I realize this isn't currently implemented in > vinum, but I understand it has similar (although slightly different, not > worse, just different) performance characteristics to RAID 5. But I would > think that RAID 4 would be much simpler to extend because of the fact The simplification shouldn't be that much, but R4 is usually slower because the biggest load is on the parity, and with R4 that's not balanced between the disks. > only 1 disk contains the parity; rather than restriping, one only needs to > recalculate parity. No - that's pretty much the same if the basis is striping. Besides, it should be possible to get a concatenated Raid4 layout with vinum if you create a R5 plex with a stripe size equal to the subdisk size. Nevertheless, vinum's parity locking is not optimal for this case. R4 is only interesting because you can convert from R0 to R4 and back without the need to copy any data blocks. It shouldn't be much work to implement R4, but who really needs it?
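Bernd's last point, that you can convert R0 to R4 and back without copying any data blocks, follows from how dedicated parity works: the parity disk is just the bytewise XOR of the data disks at each offset, so it can be computed (or discarded) while every data block stays exactly where it is. A minimal sketch in Python, illustrative only (the tiny 4-byte "blocks" and the disk layout are invented for the example, this is not vinum code):

```python
# Sketch: turning a 3-disk RAID-0 stripe set into RAID-4 by computing one
# dedicated parity disk. No data block moves; only the new disk is written.
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def make_parity_disk(data_disks):
    """Compute the dedicated RAID-4 parity disk, stripe by stripe."""
    return [xor_blocks(stripe) for stripe in zip(*data_disks)]

# Three data disks, two stripes each (hypothetical 4-byte blocks).
disks = [
    [b"\x01\x01\x01\x01", b"\x0a\x0a\x0a\x0a"],
    [b"\x02\x02\x02\x02", b"\x0b\x0b\x0b\x0b"],
    [b"\x04\x04\x04\x04", b"\x0c\x0c\x0c\x0c"],
]
parity = make_parity_disk(disks)

# Any single data disk can now be rebuilt from the others plus parity.
lost = 1
rebuilt = [xor_blocks([disks[i][s] for i in range(3) if i != lost] + [parity[s]])
           for s in range(2)]
assert rebuilt == disks[lost]
```

Converting back to R0 is the degenerate case: drop the parity disk and the data layout is untouched.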
-- B.Walter COSMO-Project http://www.cosmo-project.de ticso@cicely.de Usergroup info@cosmo-project.de To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 4 15:25: 8 1999 Delivered-To: freebsd-fs@freebsd.org Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133]) by hub.freebsd.org (Postfix) with ESMTP id 0E06014DFD for ; Thu, 4 Nov 1999 15:25:01 -0800 (PST) (envelope-from tlambert@usr07.primenet.com) Received: (from daemon@localhost) by smtp03.primenet.com (8.9.3/8.9.3) id QAA05335; Thu, 4 Nov 1999 16:24:23 -0700 (MST) Received: from usr07.primenet.com(206.165.6.207) via SMTP by smtp03.primenet.com, id smtpdAAA4faWAj; Thu Nov 4 16:23:50 1999 Received: (from tlambert@localhost) by usr07.primenet.com (8.8.5/8.8.5) id QAA20462; Thu, 4 Nov 1999 16:23:04 -0700 (MST) From: Terry Lambert Message-Id: <199911042323.QAA20462@usr07.primenet.com> Subject: Re: journaling UFS and LFS To: dg@root.com Date: Thu, 4 Nov 1999 23:23:04 +0000 (GMT) Cc: tlambert@primenet.com, Stephen.Byan@quantum.com, freebsd-fs@FreeBSD.ORG In-Reply-To: <199911012338.PAA07714@implode.root.com> from "David Greenman" at Nov 1, 99 03:38:21 pm X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > >> >> Softupdates is definitely a viable solution, however it does not address > >> >> several issues and the license is not a BSD license, so it makes me > >> >> uncomfortable. > > > >The license issue is a Whistle thing. Talk to Julian and get him > >to pound on Doug Brent, preferably before December 31st of this year. > > How is the softupdates license a Whistle thing? It seems to me that it is > a Kirk McKusick and Sun MicroSystems thing. Whistle requested the license so that Whistle could maintain an edge over the competition in the same product space.
The duration that it is under the license in the source tree was negotiated between Whistle and Kirk for that reason. The purpose of the Whistle financial support for the implementation was technically to get rid of the UPS in the InterJet. I was one of the main evangelists of this approach within Whistle, having worked on an FFS with Soft Updates implementation at the company I worked at prior to coming to work for Whistle. As I said, talk to Julian. I believe we (Whistle) can (and always intended to) release the code under the UCB license after recouping R&D costs, and there was in fact a contractually specified date for this happening. I don't currently have access to the contract. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 4 15:27:48 1999 Delivered-To: freebsd-fs@freebsd.org Received: from zed.ludd.luth.se (zed.ludd.luth.se [130.240.16.33]) by hub.freebsd.org (Postfix) with ESMTP id 8CAC115022 for ; Thu, 4 Nov 1999 15:27:39 -0800 (PST) (envelope-from pantzer@speedy.ludd.luth.se) Received: from speedy.ludd.luth.se (pantzer@speedy.ludd.luth.se [130.240.16.164]) by zed.ludd.luth.se (8.8.5/8.8.5) with ESMTP id AAA02859; Fri, 5 Nov 1999 00:26:14 +0100 Message-Id: <199911042326.AAA02859@zed.ludd.luth.se> X-Mailer: exmh version 2.0.1 12/23/97 To: Greg Lehey Cc: freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs In-Reply-To: Message from Greg Lehey of "Thu, 04 Nov 1999 16:13:17 EST."
<19991104161317.49512@mojave.sitaranetworks.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable Date: Fri, 05 Nov 1999 00:26:14 +0100 From: Mattias Pantzare Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > On Thursday, 4 November 1999 at 1:20:36 +0100, Mattias Pantzare wrote: > >> I am under the impression that you can only enlarge a vinum volume if it is > >> in a RAID 0 configuration (concatenation). Obviously, it would be very > >> difficult to enlarge a RAID 1 or RAID 5 configuration as it would require > >> restriping the data across all disks; I'm not familiar with any product, > >> hardware or software, that can do this. > > > > Solaris DiskSuite almost extends RAID 5 configurations. You can add disks to a > > RAID 5 set, but the extra disks will only hold data, no parity. > > That's normal for RAID-5. Only one disk in any stripe contains the > parity information. Disk, not stripe. > > > I think that it is a strange mix of RAID 5 and concatenation. All > > data is still parity protected. It might not be as fast as a true > > RAID 5, but it can be very useful. > > What's the difference? If you have 3 disks and 3 stripes and number sectors from 1 to 6: Disk 1 Disk 2 Disk 3 1 2 P P 3 4 5 P 6 Then add a new disk: Disk 1 Disk 2 Disk 3 New Disk 1 2 P 7 P 3 4 8 5 P 6 9 All you have to do is recalculate the new parity data when you write new data, if you zero the new disk before using it. Disk accesses will not be spread out as in normal RAID5, but you still get parity protection.
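Mattias's zeroing trick works because of a basic XOR identity: x XOR 0 == x, so a zero-filled column can join an existing parity set without invalidating any stored parity; only later writes to the new column need the usual read-modify-write of the parity block. A toy sketch in Python (the block values are hypothetical, this is not DiskSuite code):

```python
# Extending a parity-protected set with a zeroed disk: the old parity
# stays valid, because XORing in an all-zero block changes nothing.
def xor_blocks(*blocks):
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

BLK = 4
d1, d2 = b"\x11" * BLK, b"\x22" * BLK
parity = xor_blocks(d1, d2)                 # parity before the new disk exists

new = b"\x00" * BLK                         # freshly zeroed new column
assert xor_blocks(d1, d2, new) == parity    # old parity is still correct

# Writing to the new column: parity ^= old_value ^ new_value
data = b"\x77" * BLK
parity = xor_blocks(parity, new, data)
new = data
assert xor_blocks(d1, d2, new) == parity    # parity consistent after the write
```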
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 4 15:29: 8 1999 Delivered-To: freebsd-fs@freebsd.org Received: from smtp01.primenet.com (smtp01.primenet.com [206.165.6.131]) by hub.freebsd.org (Postfix) with ESMTP id D3C2515022 for ; Thu, 4 Nov 1999 15:29:01 -0800 (PST) (envelope-from tlambert@usr07.primenet.com) Received: (from daemon@localhost) by smtp01.primenet.com (8.9.3/8.9.3) id QAA04567; Thu, 4 Nov 1999 16:28:05 -0700 (MST) Received: from usr07.primenet.com(206.165.6.207) via SMTP by smtp01.primenet.com, id smtpdAAAYyayYh; Thu Nov 4 16:27:48 1999 Received: (from tlambert@localhost) by usr07.primenet.com (8.8.5/8.8.5) id QAA20534; Thu, 4 Nov 1999 16:26:32 -0700 (MST) From: Terry Lambert Message-Id: <199911042326.QAA20534@usr07.primenet.com> Subject: Re: Journaling To: bouyer@antioche.lip6.fr (Manuel Bouyer) Date: Thu, 4 Nov 1999 23:26:32 +0000 (GMT) Cc: tlambert@primenet.com, ken@kdm.org, don@calis.blacksun.org, ticso@cicely.de, grog@lemis.com, bright@wintelcom.net, freebsd-fs@FreeBSD.ORG In-Reply-To: <19991102134152.A18969@antioche.lip6.fr> from "Manuel Bouyer" at Nov 2, 99 01:41:52 pm X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > NetBSD currently supports 16. > > > > Yes, it breaks backward compatibility. > > No, NetBSD supports 16 only on ports that started with 16. > Others are still 8. There are discussions about how to move to a higher > number (not 16, but at least 64 or more) without breaking backward > compatibility ... You can't cross mount media between OSs with the same byte ordering. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 4 15:34:54 1999 Delivered-To: freebsd-fs@freebsd.org Received: from kronos.alcnet.com (kronos.alcnet.com [63.69.28.22]) by hub.freebsd.org (Postfix) with ESMTP id 07E661518A for ; Thu, 4 Nov 1999 15:34:49 -0800 (PST) (envelope-from kbyanc@posi.net) X-Provider: ALC Communications, Inc. http://www.alcnet.com/ Received: from localhost (kbyanc@localhost) by kronos.alcnet.com (8.9.3/8.9.3/antispam) with ESMTP id SAA22104; Thu, 4 Nov 1999 18:33:24 -0500 (EST) Date: Thu, 4 Nov 1999 18:33:24 -0500 (EST) From: Kelly Yancey X-Sender: kbyanc@kronos.alcnet.com To: Greg Lehey Cc: Bernd Walter , Mattias Pantzare , freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs In-Reply-To: <19991104174750.37515@mojave.sitaranetworks.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Thu, 4 Nov 1999, Greg Lehey wrote: > > That's for writing. When throughput becomes the limit, the write > throughput of RAID-4 is limited to about 2 / n of the write throughput > of RAID-5. On reading (randomly), it's (n - 1) / n. > I think that it has been significantly proven that RAID 4 is not very useful, and I regret bringing it up...sometimes the mind wanders :).
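For concreteness, Greg's figures can be evaluated for a few array sizes. This just plugs numbers into the 2/n and (n - 1)/n ratios he quotes; it is not a benchmark and makes no claims beyond that arithmetic:

```python
# Relative throughput of RAID-4 versus RAID-5 per Greg's approximations:
# every small write funnels through the one parity disk (~2/n of RAID-5's
# write rate); random reads merely lose the parity disk ((n - 1)/n).
def raid4_relative(n):
    return {"write_vs_raid5": 2 / n, "read_vs_raid5": (n - 1) / n}

for n in (4, 8, 16):
    r = raid4_relative(n)
    print(f"{n} disks: writes {r['write_vs_raid5']:.0%}, "
          f"reads {r['read_vs_raid5']:.0%} of RAID-5")
```

With 8 disks, for example, these ratios put RAID-4 writes at roughly a quarter of RAID-5's rate while random reads stay close to parity, which is why the write path dominates the comparison.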
Kelly -- Kelly Yancey - kbyanc@posi.net - Richmond, VA Director of Technical Services, ALC Communications http://www.alcnet.com/ Maintainer, BSD Driver Database http://www.posi.net/freebsd/drivers/ Coordinator, Team FreeBSD http://www.posi.net/freebsd/Team-FreeBSD/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 4 15:41: 0 1999 Delivered-To: freebsd-fs@freebsd.org Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133]) by hub.freebsd.org (Postfix) with ESMTP id B95C31518B for ; Thu, 4 Nov 1999 15:40:53 -0800 (PST) (envelope-from tlambert@usr07.primenet.com) Received: (from daemon@localhost) by smtp03.primenet.com (8.9.3/8.9.3) id QAA11596; Thu, 4 Nov 1999 16:39:15 -0700 (MST) Received: from usr07.primenet.com(206.165.6.207) via SMTP by smtp03.primenet.com, id smtpdAAAEyaGGw; Thu Nov 4 16:39:08 1999 Received: (from tlambert@localhost) by usr07.primenet.com (8.8.5/8.8.5) id QAA21054; Thu, 4 Nov 1999 16:39:22 -0700 (MST) From: Terry Lambert Message-Id: <199911042339.QAA21054@usr07.primenet.com> Subject: Re: Features of a journaled file system To: grog@lemis.com Date: Thu, 4 Nov 1999 23:39:22 +0000 (GMT) Cc: don@calis.blacksun.org, freebsd-fs@FreeBSD.ORG In-Reply-To: <19991102102601.54815@mojave.sitaranetworks.com> from "Greg Lehey" at Nov 2, 99 10:26:01 am X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > On Saturday, 30 October 1999 at 18:56:24 -0400, Don wrote: > > What are the features people would like to see in a new FreeBSD file > > system? Some of the ones I have heard listed are: > > 1. Ability to grow a FS > > 2. Ability to shrink a FS > > 3. Acess control lists on files and file systems > > 4. Extensibility. 
(The ability to easily add new features to the > > filesystem without having to rewrite utilities such as fsck) > > None of these are specific features of a journalling file system. > They're probably all desirable. ACLs, in particular, should not be a feature of an FS that manages block allocation policy, but should instead be a semantic access stacking layer. It is trivial to write an ACL (or quota) stacking layer, given working stacking layers. I think that requests like ACLs, extended attributes, user and group disk quotas, NT security policy management, etc., etc., should all go on the "make stacking layers work" list, _not_ the "write a journalled FS" list. That said, a journalled FS would be a useful thing to have, and not just for the marketing bullet item. I am thinking in particular about how very easy it would be to implement a userland-accessible transactioning system and record-based file layout semantics with such a beast... 8-). I also like the idea of re-separating the UFS and FFS layers, so that you could initially work on just the journalling issues, and we could tackle b-tree based directory management (for example) in a separate stacking layer... just like the UFS stacking layer does by overlaying an alphabetic name, link, and symlink supporting semantics on top of the FFS namespace, which is basically a flat numeric namespace that knows how to do block management in numerically (inode number) named objects. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
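The stacking idea Terry describes can be shown with a deliberately simplified sketch: a layer that forwards every operation to the file system below it and interposes only where it adds semantics, here an ACL check on open. The classes and the tiny "fd" strings are hypothetical stand-ins invented for this example, not the FreeBSD VFS interface:

```python
class LowerFS:
    """Stand-in for the underlying file system layer."""
    def open(self, path, mode):
        return f"fd:{path}"
    def read(self, fd, nbytes):
        return b"data"

class ACLLayer:
    """Stacks on any lower layer: adds ACL checks, passes everything else through."""
    def __init__(self, lower, acl):
        self._lower = lower
        self._acl = acl                   # path -> set of users allowed

    def open(self, path, mode, user):
        allowed = self._acl.get(path)     # no entry -> no ACL -> allow
        if allowed is not None and user not in allowed:
            raise PermissionError(path)
        return self._lower.open(path, mode)

    def __getattr__(self, name):          # every other operation: pure pass-through
        return getattr(self._lower, name)

fs = ACLLayer(LowerFS(), {"/secret": {"alice"}})
assert fs.open("/tmp/x", "r", user="bob") == "fd:/tmp/x"  # no ACL: allowed
assert fs.read("fd:/tmp/x", 4) == b"data"                 # forwarded untouched
try:
    fs.open("/secret", "r", user="bob")
except PermissionError:
    pass
else:
    raise AssertionError("ACL check should have denied bob")
```

The point of the design is the one Terry makes: the lower layer needs to know nothing about ACLs, and the same wrapper could hold quotas or extended attributes instead, without the FS that manages block allocation ever changing.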
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 4 15:46:40 1999 Delivered-To: freebsd-fs@freebsd.org Received: from caspian.plutotech.com (caspian.plutotech.com [206.168.67.80]) by hub.freebsd.org (Postfix) with ESMTP id 325DD1518D for ; Thu, 4 Nov 1999 15:46:36 -0800 (PST) (envelope-from gibbs@caspian.plutotech.com) Received: from caspian.plutotech.com (localhost [127.0.0.1]) by caspian.plutotech.com (8.9.3/8.9.1) with ESMTP id PAA05113; Thu, 4 Nov 1999 15:45:26 -0700 (MST) (envelope-from gibbs@caspian.plutotech.com) Message-Id: <199911042245.PAA05113@caspian.plutotech.com> X-Mailer: exmh version 2.1.0 09/18/1999 To: Kelly Yancey Cc: Greg Lehey , Bernd Walter , Mattias Pantzare , freebsd-fs@FreeBSD.ORG Subject: Re: feature list journalled fs In-reply-to: Your message of "Thu, 04 Nov 1999 18:33:24 EST." Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 04 Nov 1999 15:45:26 -0700 From: "Justin T. Gibbs" Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org >On Thu, 4 Nov 1999, Greg Lehey wrote: > >> >> That's for writing. When throughput becomes the limit, the write >> throughput of RAID-4 is limited to about 2 / n of the write throughput >> of RAID-5. On reading (randomly), it's (n - 1) / n. >> > > I think that it has been significantly proven that RAID 4 is not very >userful, and I regret bringing it up...sometimes the mind wonders :). It all depends on your application. If you are dealing with a data set composed of large, fixed sized entries, RAID 3 or 4 (they are almost identical) will always outperform RAID5. 
-- Justin To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message
From owner-freebsd-fs Thu Nov 4 15:53:30 1999 Delivered-To: freebsd-fs@freebsd.org Received: from alpo.whistle.com (alpo.whistle.com [207.76.204.38]) by hub.freebsd.org (Postfix) with ESMTP id 903E51518D for ; Thu, 4 Nov 1999 15:53:15 -0800 (PST) (envelope-from julian@whistle.com) Received: from current1.whiste.com (current1.whistle.com [207.76.205.22]) by alpo.whistle.com (8.9.1a/8.9.1) with ESMTP id PAA87407; Thu, 4 Nov 1999 15:45:31 -0800 (PST) Date: Thu, 4 Nov 1999 15:45:30 -0800 (PST) From: Julian Elischer To: Terry Lambert Cc: dg@root.com, Stephen.Byan@quantum.com, freebsd-fs@FreeBSD.ORG Subject: Re: journaling UFS and LFS In-Reply-To: <199911042323.QAA20462@usr07.primenet.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Thu, 4 Nov 1999, Terry Lambert wrote: > > >> >> Softupdates is definitely a viable solution however it does not address > > >> >> several issues and the license is not a BSD license so it makes me > > >> >> uncomfortable. > > > > > >The license issue is a Whistle thing. Talk to Julian and get him > > >to pound on Doug Brent, preferably before December 31st of this year. > > > > How is the softupdates license a Whistle thing? It seems to me that it is > > a Kirk McKusick and Sun MicroSystems thing. > > Whistle requested the license so that Whistle could maintain an > edge over the competition in the same product space. The duration > that it is under the license in the source tree was negotiated > between Whistle and Kirk for that reason. > > The purpose of the Whistle financial support for the implementation > was technically to get rid of the UPS in the InterJet.
I was one > of the main evangelists of this approach within Whistle, having > worked on an FFS with Soft Updates implementation at the company > I worked at prior to coming to work for Whistle. > > As I said, talk to Julian. I believe we (Whistle) can (and always > intended to) release the code under the UCB license after recouping R&D > costs, and there was in fact a contractually specified date > for this happening. I don't currently have access to the contract. Terry is slightly mis-stating the situation. Whistle basically asked Kirk what his plans were and offered to support his development if he agreed that he would not license it to a few specified competitors (not my idea, but the number is countable on one hand). Obviously this only holds for as long as he is generally licensing it. When he releases it, our agreement becomes void (or so I believe). I vaguely remember that we had a request that it not be released in less than N months or something. Since N was less than or equal to M, which was Kirk's own needs, this was a non-issue. Basically Whistle didn't want to be subsidising some particular competitors. On the other hand, Whistle wanted the technology in FreeBSD and generally usable. The agreement had an end-of-life clause and I believe that it's actually run out, or close to it. Part of this is that it had to be explainable to the investors as not being a gift to the opposition.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Nov 4 18:26: 0 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.du.gtn.com (mail.du.gtn.com [194.77.9.57]) by hub.freebsd.org (Postfix) with ESMTP id B92571568B for ; Thu, 4 Nov 1999 18:25:58 -0800 (PST) (envelope-from ticso@mail.cicely.de) Received: from mail.cicely.de (cicely.de [194.231.9.142]) by mail.du.gtn.com (8.9.3/8.9.3) with ESMTP id DAA29104; Fri, 5 Nov 1999 03:19:01 +0100 (MET) Received: (from ticso@localhost) by mail.cicely.de (8.9.0/8.9.0) id DAA99081; Fri, 5 Nov 1999 03:25:39 +0100 (CET) Date: Fri, 5 Nov 1999 03:25:38 +0100 From: Bernd Walter To: Terry Lambert Cc: Manuel Bouyer , ken@kdm.org, don@calis.blacksun.org, ticso@cicely.de, grog@lemis.com, bright@wintelcom.net, freebsd-fs@FreeBSD.ORG Subject: Re: Journaling Message-ID: <19991105032538.A98956@cicely7.cicely.de> References: <19991102134152.A18969@antioche.lip6.fr> <199911042326.QAA20534@usr07.primenet.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: <199911042326.QAA20534@usr07.primenet.com> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Thu, Nov 04, 1999 at 11:26:32PM +0000, Terry Lambert wrote: > > > NetBSD currently supports 16. > > > > > > Yes, it breaks backward compatibility. > > > > No, NetBSD supports 16 only on ports that started with 16. > > Others are still 8. There are discussions about how to move to a higher > > number (not 16, but at least 64 or more) without breaking backward > > compatibility ... > > You can't cross mount media between OSs with the same byte ordering. > The difference between FreeBSD-i386 and alpha produces the same kind of frustration.
-- B.Walter COSMO-Project http://www.cosmo-project.de ticso@cicely.de Usergroup info@cosmo-project.de To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Nov 5 1:14:25 1999 Delivered-To: freebsd-fs@freebsd.org Received: from antioche.lip6.fr (antioche.lip6.fr [132.227.74.11]) by hub.freebsd.org (Postfix) with ESMTP id 60ED415280 for ; Fri, 5 Nov 1999 01:14:21 -0800 (PST) (envelope-from bouyer@antioche.lip6.fr) Received: from antifer.ipv6.lip6.fr (antifer.ipv6.lip6.fr [132.227.72.132]) by antioche.lip6.fr (8.9.3/8.9.3) with ESMTP id KAA08520; Fri, 5 Nov 1999 10:11:29 +0100 (MET) Received: (bouyer@localhost) by antifer.ipv6.lip6.fr (8.8.8/8.6.4) id KAA00614; Fri, 5 Nov 1999 10:10:54 +0100 (MET) Date: Fri, 5 Nov 1999 10:10:54 +0100 From: Manuel Bouyer To: Terry Lambert Cc: ken@kdm.org, don@calis.blacksun.org, ticso@cicely.de, grog@lemis.com, bright@wintelcom.net, freebsd-fs@FreeBSD.ORG Subject: Re: Journaling Message-ID: <19991105101054.B584@antioche.lip6.fr> References: <19991102134152.A18969@antioche.lip6.fr> <199911042326.QAA20534@usr07.primenet.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.95.6us In-Reply-To: <199911042326.QAA20534@usr07.primenet.com>; from Terry Lambert on Thu, Nov 04, 1999 at 11:26:32PM +0000 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Thu, Nov 04, 1999 at 11:26:32PM +0000, Terry Lambert wrote: > You can't cross mount media between OSs with the same byte ordering. Why? I surely did miss something here ... Or maybe you meant 'port' instead of 'OS'? In which case this is true, but it's because of differences in the on-disk disklabel format (dependent on firmware). If your media doesn't have a disklabel, no problem (I do this between my i386 and sparc; NetBSD supports byte-swapped FFS).
If your media is partitioned then you have to put an in-core disklabel matching its partitioning before mounting it. -- Manuel Bouyer, LIP6, Universite Paris VI. Manuel.Bouyer@lip6.fr -- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Nov 5 6:26:12 1999 Delivered-To: freebsd-fs@freebsd.org Received: from akat.civ.cvut.cz (akat.civ.cvut.cz [147.32.235.105]) by hub.freebsd.org (Postfix) with SMTP id BB26A14D32 for ; Fri, 5 Nov 1999 06:26:01 -0800 (PST) (envelope-from pechy@hp735.cvut.cz) Received: from localhost (pechy@localhost) by akat.civ.cvut.cz (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id PAA10512; Fri, 5 Nov 1999 15:23:23 +0100 Date: Fri, 5 Nov 1999 15:23:22 +0100 From: Jan Pechanec X-Sender: pechy@akat.civ.cvut.cz To: Erez Zadok Cc: Robert Watson , freebsd-fs@FreeBSD.ORG Subject: Re: stupidfs - easily extensible test file systems? In-Reply-To: <199910282122.RAA07811@shekel.mcl.cs.columbia.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Thu, 28 Oct 1999, Erez Zadok wrote: Hi, I think that it is a bit different. What Robert is hacking is a filesystem where a programmer with no VFS experience can see how the VFS works. I have just read some of your papers, Erez, and I think that wrapfs is meant to spare me from dealing with the VFS itself (just encode and decode routines). I think that Robert's effort is very useful; I wanted to write something like this myself (purpose: to learn and _touch_ the VFS interface). Robert, do you carry on with it? BTW, do you know why deadfs was written? There is no doc in FreeBSD. From what I saw in the source code, operations just fail. Thank you, Jan. >Robert, it's been done. To some degree that's nullfs (if nullfs had been >working; the VFS is broken). 
I've written stackable f/s templates exactly >for the purpose of developers using them to build other f/s w/o having the >many hassles of writing a full f/s. My wrapper templates, called wrapfs, >work on freebsd, linux, and solaris. You can build all kinds of f/s using >them, including f/s that do not require persistent storage. > >See > http://www.cs.columbia.edu/~ezk/research >for papers, and > http://www.cs.columbia.edu/~ezk/research/software >for tarballs. > >Let me know if you have any questions. > >Erez Zadok. >Columbia University Department of Computer Science. >EMail: ezk@cs.columbia.edu Web: http://www.cs.columbia.edu/~ezk > > >To Unsubscribe: send mail to majordomo@FreeBSD.org >with "unsubscribe freebsd-fs" in the body of the message > -- Jan PECHANEC (mailto:pechy@hp735.cvut.cz) Computing Center CTU (Zikova 4, Praha 6, 166 35, Czech Republic) http://www.civ.cvut.cz, tel: +420 2 2435 2969, http://pechy.civ.cvut.cz To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Nov 5 7: 2:14 1999 Delivered-To: freebsd-fs@freebsd.org Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by hub.freebsd.org (Postfix) with ESMTP id E05991505E for ; Fri, 5 Nov 1999 07:02:04 -0800 (PST) (envelope-from robert@cyrus.watson.org) Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3]) by fledge.watson.org (8.9.3/8.9.3) with SMTP id JAA51639; Fri, 5 Nov 1999 09:59:56 -0500 (EST) (envelope-from robert@cyrus.watson.org) Date: Fri, 5 Nov 1999 09:59:56 -0500 (EST) From: Robert Watson X-Sender: robert@fledge.watson.org Reply-To: Robert Watson To: Jan Pechanec Cc: Erez Zadok , freebsd-fs@FreeBSD.ORG Subject: Re: stupidfs - easily extensible test file systems? 
In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Fri, 5 Nov 1999, Jan Pechanec wrote: > On Thu, 28 Oct 1999, Erez Zadok wrote: > > Hi, > > I think that it is a bit different. What Robert is hacking is > a filesystem where a programmer with no VFS experience can see how > the VFS works. I have just read some of your papers, Erez, and I think > that wrapfs is meant to spare me from dealing with the VFS itself (just > encode and decode routines). > > I think that Robert's effort is very useful; I wanted > to write something like this myself (purpose: to learn and _touch_ the VFS > interface). Robert, do you carry on with it? > > BTW, do you know why deadfs was written? There is no doc in FreeBSD. > From what I saw in the source code, operations just fail. Because wrapfs doesn't work in 3.3-RELEASE yet, and because of the reasons you mention, I decided to keep working on a stupidfs :-). That is, I don't want to add functionality to an existing file system by stacking, but rather to have a new simple file system whose semantics I can modify in ways not encouraged by the stacking of file systems. I am currently traveling (IETF next week, Active Network conference in Albuquerque the week after) so won't get back to my development machines for about two weeks. After that time, I hope to get a stupidfs implementation to the point where it might be useful for others to see, so I'll put it online. As I mentioned before, the goal is to have a really simple file system with no backing store, appropriate for use when experimenting with new VOPs, etc. It won't be fully functioning (for example, I probably won't even bother to implement symlinks) but it will be *simple*, meaning it can be modified easily. 
It will also be separable into an entirely separate module, unlike UFS, which has fingers everywhere, so it can easily be loaded and unloaded on demand during development. I wouldn't encourage anyone to use it in production--it will make a fair amount of use of kernel memory, as it has no backing store--but for development it should be useful. Robert N M Watson robert@fledge.watson.org http://www.watson.org/~robert/ PGP key fingerprint: AF B5 5F FF A6 4A 79 37 ED 5F 55 E9 58 04 6A B1 TIS Labs at Network Associates, Safeport Network Services To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Nov 5 9:19:37 1999 Delivered-To: freebsd-fs@freebsd.org Received: from parker.yahoo.com (parker.yahoo.com [205.216.162.204]) by hub.freebsd.org (Postfix) with ESMTP id A4C2F1522B for ; Fri, 5 Nov 1999 09:19:25 -0800 (PST) (envelope-from jh@parker.yahoo.com) Received: from parker.yahoo.com (localhost.yahoo.com [127.0.0.1]) by parker.yahoo.com (8.8.8/8.6.12) with ESMTP id JAA23410; Fri, 5 Nov 1999 09:15:46 -0800 (PST) Message-Id: <199911051715.JAA23410@parker.yahoo.com> To: Jan Pechanec Cc: Erez Zadok , Robert Watson , freebsd-fs@FreeBSD.ORG Subject: deadfs, Re: stupidfs - easily extensible test file systems? In-reply-to: Your message of "Fri, 05 Nov 1999 15:23:22 +0100." Date: Fri, 05 Nov 1999 09:15:46 -0800 From: John Hanley Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > BTW, do you know why deadfs was written? There is no doc in FreeBSD. > From what I saw in the source code, operations just fail. Deadfs exists so you can V_BAD a vnode to revoke access to a tty or pty. (Or revoke access to a filesystem, upon forcible umount.) 
There used to be an ugly security problem: someone would log in, start background jobs that could read/write the tty or pty, and their login shell would exit; some hapless person would then log in on that tty or pty and be abused by the background jobs that still held an open file descriptor. Nowadays, upon logout we V_BAD those file descriptors so the background jobs can do no harm, but they are allowed to finish their computations and write their results to disk. Cheers, JH To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Nov 5 10:43:53 1999 Delivered-To: freebsd-fs@freebsd.org Received: from cs.columbia.edu (cs.columbia.edu [128.59.16.20]) by hub.freebsd.org (Postfix) with ESMTP id 684B214C84 for ; Fri, 5 Nov 1999 10:43:49 -0800 (PST) (envelope-from ezk@shekel.mcl.cs.columbia.edu) Received: from shekel.mcl.cs.columbia.edu (shekel.mcl.cs.columbia.edu [128.59.18.15]) by cs.columbia.edu (8.9.1/8.9.1) with ESMTP id NAA22581; Fri, 5 Nov 1999 13:43:10 -0500 (EST) Received: (from ezk@localhost) by shekel.mcl.cs.columbia.edu (8.9.1/8.9.1) id NAA21856; Fri, 5 Nov 1999 13:43:09 -0500 (EST) Date: Fri, 5 Nov 1999 13:43:09 -0500 (EST) Message-Id: <199911051843.NAA21856@shekel.mcl.cs.columbia.edu> X-Authentication-Warning: shekel.mcl.cs.columbia.edu: ezk set sender to ezk@shekel.mcl.cs.columbia.edu using -f From: Erez Zadok To: Jan Pechanec Cc: Erez Zadok , Robert Watson , freebsd-fs@FreeBSD.ORG Subject: Re: stupidfs - easily extensible test file systems? In-reply-to: Your message of "Fri, 05 Nov 1999 15:23:22 +0100." Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org In message , Jan Pechanec writes: > On Thu, 28 Oct 1999, Erez Zadok wrote: > > Hi, > > I think that it is a bit different. What Robert is hacking is > a filesystem where a programmer with no VFS experience can see how > the VFS works. 
I have just read some of your papers, Erez, and I think > that wrapfs is meant to spare me from dealing with the VFS itself (just > encode and decode routines). The encode and decode routines that wrapfs exports are an API that greatly simplifies two difficult tasks: (1) modifying file names (e.g., translating b/t unix and 8.3 names) (2) modifying file data (e.g., encryption) Every other task you want to accomplish in wrapfs, you do right in the actual f/s routines, right in the code itself. For example, if you wanted to add acl support (as I've done w/ a trivial aclfs based on wrapfs), you add the right code in lookup(). If you want to create an unrmfs (another prototype I've got), you add it in unlink(). If you wish, you can also touch the read/write/getpage/putpage routines directly and not use the encode/decode API functions. But you'll find that there's a substantial amount of support code needed to deal with data pages, locking, and a lot more stuff around it. All of this is detailed in my Usenix 99 paper. > I think that Robert's effort is very useful; I wanted > to write something like this myself (purpose: to learn and _touch_ the VFS > interface). Robert, do you carry on with it? I commend you, but don't be surprised if what you produce in the end is almost identical to wrapfs in functionality. In many ways, wrapfs is "stupid", b/c it only provides a thin layer that passes all VOPs to the layer below it, while maintaining semantics. Wrapfs does not do much more than that. That's why I'm telling you now that your stupidfs may wind up being very similar to wrapfs. You cannot get stacking functionality with much less than wrapfs does. If you actually intend to modify the VFS, and add new VOPs, that'll be neat too. But I think you'll find it a bit difficult to get VFS changes merged into the main source tree... :-) And if you do change the VFS, you'll find that your stupidfs does more than, and is "smarter" than, wrapfs. 
You cannot introduce new VOPs w/o changing the VFS, and if you change the VFS, you must make sure that other (native) file systems do something reasonable with these new VOPs. > -- > Jan PECHANEC (mailto:pechy@hp735.cvut.cz) Computing Center CTU (Zikova 4, > Praha 6, 166 35, Czech Republic) http://www.civ.cvut.cz, tel: +420 2 2435 > 2969, http://pechy.civ.cvut.cz Jan et al. I'm not trying to "hawk my merchandise" on you, but rather to save you a great deal of effort repeating that which has been done before. I've created and released my wrapfs templates so that others could build on them, and create hopefully really useful (even commercial) file systems. It may sound corny, but I hope that my work will revitalize the stagnating field of stackable file systems research. You may save a lot of time, and still be able to learn much, by starting with my wrapfs code, and modifying it to your needs. I will be happy to help you in any way I can. Cheers, Erez. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Nov 5 11:15:54 1999 Delivered-To: freebsd-fs@freebsd.org Received: from cs.columbia.edu (cs.columbia.edu [128.59.16.20]) by hub.freebsd.org (Postfix) with ESMTP id 0435D14BE9 for ; Fri, 5 Nov 1999 11:15:49 -0800 (PST) (envelope-from ezk@shekel.mcl.cs.columbia.edu) Received: from shekel.mcl.cs.columbia.edu (shekel.mcl.cs.columbia.edu [128.59.18.15]) by cs.columbia.edu (8.9.1/8.9.1) with ESMTP id OAA24974; Fri, 5 Nov 1999 14:14:50 -0500 (EST) Received: (from ezk@localhost) by shekel.mcl.cs.columbia.edu (8.9.1/8.9.1) id OAA23367; Fri, 5 Nov 1999 14:14:49 -0500 (EST) Date: Fri, 5 Nov 1999 14:14:49 -0500 (EST) Message-Id: <199911051914.OAA23367@shekel.mcl.cs.columbia.edu> X-Authentication-Warning: shekel.mcl.cs.columbia.edu: ezk set sender to ezk@shekel.mcl.cs.columbia.edu using -f From: Erez Zadok To: Robert Watson Cc: Jan Pechanec , Erez Zadok , freebsd-fs@FreeBSD.ORG Subject: Re: stupidfs 
- easily extensible test file systems? In-reply-to: Your message of "Fri, 05 Nov 1999 09:59:56 EST." Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org In message , Robert Watson writes: [...] > Because wrapfs doesn't work in 3.3-RELEASE yet, and because of the reasons > you mention, I decided to keep working on a stupidfs :-). I'll be updating wrapfs for 3.3 once I return from LISA. With luck, it'll work again before you return from Albuquerque. > That is, I > don't want to add functionality to an existing file system by stacking, > but rather to have a new simple file system that I can modify the > semantics of in ways not encouraged by the stacking of file systems. If I understand you right (maybe I didn't), there are two ways to do that: (1) Create a simple *native* (disk based?) file system template from which you can possibly create new file systems that put data on disks and floppies, right? In theory, one should be able to create msdosfs and ffs from such a template. In practice, there are so many details to work out that getting something barely working for a "stupid" template will require substantial effort. Such a template would be very useful if it has these two characteristics: (a) be small and simple, and (b) require little modification to create file systems such as msdosfs and ffs. I believe that with current OS technology, it is impossible to get both 'a' and 'b' done. (2) If what you want is a file system that can work with other file systems, then you're essentially asking for stacking. Yes, stacking file systems usually have to maintain VFS semantics, so that a layer is kept independent from other layers, either above or below it. It is possible, however, for a stackable f/s to violate this principle; for example, you can muck with direct disk blocks and inode blocks from a stackable f/s. It's not something I'd recommend, but it is possible. 
> I am > currently traveling (IETF next week, Active Network conference in > Albuquerque the week after) so won't get back to my development machines > for about two weeks. After that time, I hope to get a stupidfs > implementation to the point where it might be useful for others to see, so > I'll put it online. As I mentioned before, the goal is to have a really > simple file system with no backing store, appropriate for use when > experimenting with new VOPs, etc, etc. I'd be very interested in seeing this. I would also suggest that before you dive into coding, you write out a detailed design, and post it to this list, so we could all comment on it. Note that extensible VFSs have been an expressed desire of stackable file systems research from the very early days. In order to support file system extensibility without changing the OS or other file systems, I had to give up the idea of creating new VOPs. That is, you cannot add new VOPs using wrapfs; you could create new ioctls, however, which are the poor man's extensible model. IOW, if you created an infrastructure that can extend the VFS, you'd have something that wrapfs cannot do --- something that people have been requesting for some time. (So don't call it "stupid" :-) If you haven't already, you should read up on all of the classic stacking papers first, from Rosenthal, Skinner & Wong, Heidemann, Popek, etc. Then you might look into papers on Spring, BSD's Unionfs, and the HURD. All of these talk about mechanisms for VFS extensibility that would be useful for you. > It won't be fully functioning (for > example, I probably won't even bother to implement symlinks) but it will > be *simple*, meaning it can be modified easily. 
> > I wouldn't encourage anyone to use it in production--it will make a fair > amount of use of kernel memory, as it won't back to a process--but for > development it should be useful. I think you have to be very careful about your implementation. You cannot encourage people to use something in PRODUCTION that has not been thoroughly tested, and esp. if it's missing functionality. If you want your f/s to be useful, make sure it works with existing VFSs and existing file systems. At the very least, make sure it won't damage people's installations. It would be nice if "all" it did was _add_ new VOPs, while keeping existing ones unchanged. I'm speaking from experience here. I've developed wrapfs on solaris, freebsd, and linux. In the early days, I've dealt with bugs that easily corrupted active memory and resulted in total corruption of system and boot partitions, to a point where a reinstallation was required. After a few frustrating reinstallations, I wound up setting up automatic OS installation systems (network-based booting, installing off of an auxiliary disk, even using identical disks and dd'ing a good copy onto a trashed one). > Robert N M Watson > > robert@fledge.watson.org http://www.watson.org/~robert/ > PGP key fingerprint: AF B5 5F FF A6 4A 79 37 ED 5F 55 E9 58 04 6A B1 > TIS Labs at Network Associates, Safeport Network Services Good luck. Let me know if I can help. Erez. 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Nov 5 13:43:46 1999 Delivered-To: freebsd-fs@freebsd.org Received: from zed.ludd.luth.se (zed.ludd.luth.se [130.240.16.33]) by hub.freebsd.org (Postfix) with ESMTP id F3E1415369 for ; Fri, 5 Nov 1999 13:43:40 -0800 (PST) (envelope-from pantzer@speedy.ludd.luth.se) Received: from speedy.ludd.luth.se (pantzer@speedy.ludd.luth.se [130.240.16.164]) by zed.ludd.luth.se (8.8.5/8.8.5) with ESMTP id WAA02426; Fri, 5 Nov 1999 22:42:49 +0100 Message-Id: <199911052142.WAA02426@zed.ludd.luth.se> X-Mailer: exmh version 2.0.1 12/23/97 To: Greg Lehey Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Extending RAID-5 plexes (was: feature list journalled fs) In-Reply-To: Message from Greg Lehey of "Thu, 04 Nov 1999 18:37:37 EST." <19991104183737.04186@mojave.sitaranetworks.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Fri, 05 Nov 1999 22:42:49 +0100 From: Mattias Pantzare Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > Take a look at /usr/src/sys/dev/vinum/vinumraid5.c and tell me how to > modify the code to make that work in a general case. Will do. But don't hold your breath :-) To get to know vinum I tried to use it on devices made with vnconfig. Should I debug the panic I got or will it just not work? 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Nov 5 13:53:12 1999 Delivered-To: freebsd-fs@freebsd.org Received: from yana.lemis.com (yana.lemis.com [192.109.197.140]) by hub.freebsd.org (Postfix) with ESMTP id B924A14C8F for ; Fri, 5 Nov 1999 13:53:07 -0800 (PST) (envelope-from grog@mojave.sitaranetworks.com) Received: from mojave.sitaranetworks.com ([199.103.141.157]) by yana.lemis.com (8.8.8/8.8.8) with ESMTP id IAA07050; Sat, 6 Nov 1999 08:21:21 +1030 (CST) (envelope-from grog@mojave.sitaranetworks.com) Message-ID: <19991105165042.50293@mojave.sitaranetworks.com> Date: Fri, 5 Nov 1999 16:50:42 -0500 From: Greg Lehey To: Mattias Pantzare Cc: freebsd-fs@FreeBSD.ORG Subject: Re: Extending RAID-5 plexes (was: feature list journalled fs) Reply-To: Greg Lehey References: <199911052142.WAA02426@zed.ludd.luth.se> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <199911052142.WAA02426@zed.ludd.luth.se>; from Mattias Pantzare on Fri, Nov 05, 1999 at 10:42:49PM +0100 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Friday, 5 November 1999 at 22:42:49 +0100, Mattias Pantzare wrote: >> >> Take a look at /usr/src/sys/dev/vinum/vinumraid5.c and tell me how to >> modify the code to make that work in a general case. > > Will do. But don't hold your breath :-) I won't. > To get to know vinum I tried to use it on devices made with > vnconfig. Should I debug the panic I got or will it just not work? I don't know of any a priori reason why it shouldn't work. If you send me a stack trace, I should be able to help. Note the instructions in vinum(4). 
Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Nov 5 14:13:45 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.du.gtn.com (mail.du.gtn.com [194.77.9.57]) by hub.freebsd.org (Postfix) with ESMTP id BB55014F74 for ; Fri, 5 Nov 1999 14:13:39 -0800 (PST) (envelope-from ticso@mail.cicely.de) Received: from mail.cicely.de (cicely.de [194.231.9.142]) by mail.du.gtn.com (8.9.3/8.9.3) with ESMTP id XAA04385; Fri, 5 Nov 1999 23:06:23 +0100 (MET) Received: (from ticso@localhost) by mail.cicely.de (8.9.0/8.9.0) id XAA02793; Fri, 5 Nov 1999 23:13:02 +0100 (CET) Date: Fri, 5 Nov 1999 23:13:02 +0100 From: Bernd Walter To: Greg Lehey Cc: Mattias Pantzare , freebsd-fs@FreeBSD.ORG Subject: Re: Extending RAID-5 plexes (was: feature list journalled fs) Message-ID: <19991105231302.A2771@cicely7.cicely.de> References: <199911052142.WAA02426@zed.ludd.luth.se> <19991105165042.50293@mojave.sitaranetworks.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: <19991105165042.50293@mojave.sitaranetworks.com> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Fri, Nov 05, 1999 at 04:50:42PM -0500, Greg Lehey wrote: > On Friday, 5 November 1999 at 22:42:49 +0100, Mattias Pantzare wrote: > > > To get to know vinum I tried to use it on devices made with > > vnconfig. Should I debug the panic I got or will it just not work? > > I don't know of any a priori reason why it shouldn't work. If you > send me a stack trace, I should be able to help. Note the > instructions in vinum(4). > vn devices are file based. I'm pretty sure that its strategy function can't be called in interrupt context, as happens in RAID-5 cases. vn calls VOP_READ and VOP_WRITE directly from strategy, without a queue like real disk drivers use. 
-- B.Walter COSMO-Project http://www.cosmo-project.de ticso@cicely.de Usergroup info@cosmo-project.de To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Nov 5 14:32:37 1999 Delivered-To: freebsd-fs@freebsd.org Received: from yana.lemis.com (yana.lemis.com [192.109.197.140]) by hub.freebsd.org (Postfix) with ESMTP id B791D14D3D for ; Fri, 5 Nov 1999 14:32:30 -0800 (PST) (envelope-from grog@mojave.sitaranetworks.com) Received: from mojave.sitaranetworks.com ([199.103.141.157]) by yana.lemis.com (8.8.8/8.8.8) with ESMTP id JAA07085; Sat, 6 Nov 1999 09:01:44 +1030 (CST) (envelope-from grog@mojave.sitaranetworks.com) Message-ID: <19991105173107.38019@mojave.sitaranetworks.com> Date: Fri, 5 Nov 1999 17:31:07 -0500 From: Greg Lehey To: Bernd Walter Cc: Mattias Pantzare , freebsd-fs@FreeBSD.ORG Subject: Re: Extending RAID-5 plexes (was: feature list journalled fs) Reply-To: Greg Lehey References: <199911052142.WAA02426@zed.ludd.luth.se> <19991105165042.50293@mojave.sitaranetworks.com> <19991105231302.A2771@cicely7.cicely.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <19991105231302.A2771@cicely7.cicely.de>; from Bernd Walter on Fri, Nov 05, 1999 at 11:13:02PM +0100 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Friday, 5 November 1999 at 23:13:02 +0100, Bernd Walter wrote: > On Fri, Nov 05, 1999 at 04:50:42PM -0500, Greg Lehey wrote: >> On Friday, 5 November 1999 at 22:42:49 +0100, Mattias Pantzare wrote: >> >>> To get to know vinum I tried to use it on devices made with >>> vnconfig. Should I debug the panic I got or will it just not work? >> >> I don't know of any a priori reason why it shouldn't work. If you >> send me a stack trace, I should be able to help. Note the >> instructions in vinum(4). > > vn devices are file based. 
> I'm pretty sure that its strategy function can't be called in interrupt context > as happens in RAID-5 cases. > vn calls VOP_READ and VOP_WRITE directly from strategy, without a queue like > real disk drivers use. Yes, that's reasonable. I'd still like to see a trace. We could accommodate vnodes by getting the daemon to complete things. That would make access still slower, and I can't really see a good reason for it, but it's possible. Greg -- Finger grog@lemis.com for PGP public key See complete headers for address and phone numbers To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Nov 6 2:49: 2 1999 Delivered-To: freebsd-fs@freebsd.org Received: from alpo.whistle.com (alpo.whistle.com [207.76.204.38]) by hub.freebsd.org (Postfix) with ESMTP id 5D98914C40 for ; Sat, 6 Nov 1999 02:49:01 -0800 (PST) (envelope-from julian@whistle.com) Received: from current1.whiste.com (current1.whistle.com [207.76.205.22]) by alpo.whistle.com (8.9.1a/8.9.1) with ESMTP id CAA38753; Sat, 6 Nov 1999 02:48:43 -0800 (PST) Date: Sat, 6 Nov 1999 02:48:42 -0800 (PST) From: Julian Elischer To: Jan Pechanec Cc: Erez Zadok , Robert Watson , freebsd-fs@FreeBSD.ORG Subject: Re: stupidfs - easily extensible test file systems? In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Fri, 5 Nov 1999, Jan Pechanec wrote: > > BTW, do you know why deadfs was written? There is no doc in FreeBSD. > From what I saw in the source code, operations just fail. 
When you have a vnode open, and for some reason the filesystem the vnode points to disappears (e.g. the disk is removed, or the PC-CARD is removed, or many other possibilities), then you cannot track down all the users of that vnode very easily, so instead you 'fiddle' with it to make it reference the DEADFS (use VGONE) and when the users try to use it again they will safely get an error, but at least the system will not core-dump when they access a nonexistent filesystem/device. julian To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Nov 6 7:59: 6 1999 Delivered-To: freebsd-fs@freebsd.org Received: from zed.ludd.luth.se (zed.ludd.luth.se [130.240.16.33]) by hub.freebsd.org (Postfix) with ESMTP id 7C97D14D6F for ; Sat, 6 Nov 1999 07:59:04 -0800 (PST) (envelope-from pantzer@speedy.ludd.luth.se) Received: from speedy.ludd.luth.se (pantzer@speedy.ludd.luth.se [130.240.16.164]) by zed.ludd.luth.se (8.8.5/8.8.5) with ESMTP id QAA19311; Sat, 6 Nov 1999 16:58:56 +0100 Message-Id: <199911061558.QAA19311@zed.ludd.luth.se> X-Mailer: exmh version 2.0.1 12/23/97 To: grog@lemis.com Cc: freebsd-fs@freebsd.org Subject: RAID-5 and failure Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Sat, 06 Nov 1999 16:58:55 +0100 From: Mattias Pantzare Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org What happens if the data part of a write to a RAID-5 plex completes but not the parity part (or the other way)? 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Nov 6 8: 7:11 1999 Delivered-To: freebsd-fs@freebsd.org Received: from gw.nectar.com (gw.nectar.com [209.98.143.44]) by hub.freebsd.org (Postfix) with ESMTP id 082D314CD5 for ; Sat, 6 Nov 1999 08:07:09 -0800 (PST) (envelope-from nectar@nectar.com) Received: from bone.nectar.com (bone.nectar.com [10.0.0.105]) by gw.nectar.com (Postfix) with ESMTP id 76BB951723; Sat, 6 Nov 1999 10:05:38 -0600 (CST) Received: from bone.nectar.com (localhost [127.0.0.1]) by bone.nectar.com (Postfix) with ESMTP id C77771D7A; Sat, 6 Nov 1999 10:07:02 -0600 (CST) X-Mailer: exmh version 2.1.0 09/18/1999 X-Exmh-Isig-CompType: repl X-Exmh-Isig-Folder: mlist/freebsd/fs X-PGP-RSAfprint: 00 F9 E6 A2 C5 4D 0A 76 26 8B 8B 57 73 D0 DE EE X-PGP-RSAkey: http://www.nectar.com/nectar-rsa.txt X-PGP-DSSfprint: AB2F 8D71 A4F4 467D 352E 8A41 5D79 22E4 71A2 8C73 X-PGP-DHfprint: 2D50 12E5 AB38 60BA AF4B 0778 7242 4460 1C32 F6B1 X-PGP-DH-DSSkey: http://www.nectar.com/nectar-dh-dss.txt From: Jacques Vidrine To: Julian Elischer Cc: Jan Pechanec , Erez Zadok , Robert Watson , freebsd-fs@FreeBSD.ORG In-reply-to: References: Subject: Re: stupidfs - easily extensible test file systems? Mime-Version: 1.0 Content-Type: text/plain Date: Sat, 06 Nov 1999 10:07:02 -0600 Message-Id: <19991106160702.C77771D7A@bone.nectar.com> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On 6 November 1999 at 2:48, Julian Elischer wrote: > When you have a vnode open, and for some reason the filesystem the vnode > points to disappears (e.g. the disk is removed, or the PC-CARD is removed, > or many other possibilities), [snip] The most common case in most systems is probably revoke(2). 
-- Jacques Vidrine / n@nectar.com / nectar@FreeBSD.org To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Nov 6 8:34:53 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.du.gtn.com (mail.du.gtn.com [194.77.9.57]) by hub.freebsd.org (Postfix) with ESMTP id 1C8F014C3B for ; Sat, 6 Nov 1999 08:34:44 -0800 (PST) (envelope-from ticso@mail.cicely.de) Received: from mail.cicely.de (cicely.de [194.231.9.142]) by mail.du.gtn.com (8.9.3/8.9.3) with ESMTP id RAA24945; Sat, 6 Nov 1999 17:27:53 +0100 (MET) Received: (from ticso@localhost) by mail.cicely.de (8.9.0/8.9.0) id RAA09170; Sat, 6 Nov 1999 17:34:34 +0100 (CET) Date: Sat, 6 Nov 1999 17:34:34 +0100 From: Bernd Walter To: Mattias Pantzare Cc: grog@lemis.com, freebsd-fs@FreeBSD.ORG Subject: Re: RAID-5 and failure Message-ID: <19991106173434.A9143@cicely7.cicely.de> References: <199911061558.QAA19311@zed.ludd.luth.se> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: <199911061558.QAA19311@zed.ludd.luth.se> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Sat, Nov 06, 1999 at 04:58:55PM +0100, Mattias Pantzare wrote: > What happens if the data part of a write to a RAID-5 plex completes but not the > parity part (or the other way)? > The parity is not in sync - what else?
-- B.Walter COSMO-Project http://www.cosmo-project.de ticso@cicely.de Usergroup info@cosmo-project.de To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Nov 6 9:16:55 1999 Delivered-To: freebsd-fs@freebsd.org Received: from zed.ludd.luth.se (zed.ludd.luth.se [130.240.16.33]) by hub.freebsd.org (Postfix) with ESMTP id D237314C92 for ; Sat, 6 Nov 1999 09:16:53 -0800 (PST) (envelope-from pantzer@speedy.ludd.luth.se) Received: from speedy.ludd.luth.se (pantzer@speedy.ludd.luth.se [130.240.16.164]) by zed.ludd.luth.se (8.8.5/8.8.5) with ESMTP id SAA20783; Sat, 6 Nov 1999 18:16:49 +0100 Message-Id: <199911061716.SAA20783@zed.ludd.luth.se> X-Mailer: exmh version 2.0.1 12/23/97 To: Bernd Walter Cc: freebsd-fs@FreeBSD.ORG Subject: Re: RAID-5 and failure In-Reply-To: Message from Bernd Walter of "Sat, 06 Nov 1999 17:34:34 +0100." <19991106173434.A9143@cicely7.cicely.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Sat, 06 Nov 1999 18:16:47 +0100 From: Mattias Pantzare Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > On Sat, Nov 06, 1999 at 04:58:55PM +0100, Mattias Pantzare wrote: > > What happens if the data part of a write to a RAID-5 plex completes but not the > > parity part (or the other way)? > > > The parity is not in sync - what else? The system could detect it and recalculate the parity. Or give a warning to the user so the user knows that the data is not safe.
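The situation Pantzare describes - the classic RAID-5 "write hole" - can be simulated in a few lines of Python (a hypothetical sketch, not vinum code): after a crash between the data write and the parity write, nothing in the stripe records which half landed, so without a log the only way to notice is to recompute the parity and compare.

```python
from functools import reduce

def xor_parity(blocks):
    """Bytewise XOR of the data blocks in a stripe."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# A consistent stripe: parity matches the data blocks.
data = [bytearray(b"\x01\x02"), bytearray(b"\x10\x20")]
par = bytearray(xor_parity(data))

# Simulated crash: the new data block lands on disk, but the matching
# parity write never happens.
data[0] = bytearray(b"\xaa\xbb")

# Nothing in the stripe marks which half went out, so the only log-free
# detection is a full recompute-and-compare pass over the plex.
stale = bytes(par) != xor_parity(data)
assert stale
```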
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Nov 6 9:33:28 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.du.gtn.com (mail.du.gtn.com [194.77.9.57]) by hub.freebsd.org (Postfix) with ESMTP id 548BE14E97 for ; Sat, 6 Nov 1999 09:33:25 -0800 (PST) (envelope-from ticso@mail.cicely.de) Received: from mail.cicely.de (cicely.de [194.231.9.142]) by mail.du.gtn.com (8.9.3/8.9.3) with ESMTP id SAA28192; Sat, 6 Nov 1999 18:26:34 +0100 (MET) Received: (from ticso@localhost) by mail.cicely.de (8.9.0/8.9.0) id SAA09438; Sat, 6 Nov 1999 18:33:16 +0100 (CET) Date: Sat, 6 Nov 1999 18:33:16 +0100 From: Bernd Walter To: Mattias Pantzare Cc: Bernd Walter , freebsd-fs@FreeBSD.ORG Subject: Re: RAID-5 and failure Message-ID: <19991106183316.A9420@cicely7.cicely.de> References: <199911061716.SAA20783@zed.ludd.luth.se> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: <199911061716.SAA20783@zed.ludd.luth.se> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Sat, Nov 06, 1999 at 06:16:47PM +0100, Mattias Pantzare wrote: > > On Sat, Nov 06, 1999 at 04:58:55PM +0100, Mattias Pantzare wrote: > > > What happens if the data part of a write to a RAID-5 plex completes but not the > > > parity part (or the other way)? > > > > > The parity is not in sync - what else? > > The system could detect it and recalculate the parity. Or give a warning to > the user so the user knows that the data is not safe. That's not possible, because you need to write more than a single sector to keep parity in sync, and that is not atomic. In case one of the writes fails, vinum will do everything needed to work with it and to inform the user. Vinum will take the subdisk down: such drives should run with write reallocation enabled, so a disk that returns a write error is badly broken.
If the system panics or power fails between such writes, there is no way to find out whether the parity is broken besides verifying the complete plex after reboot - the problem should be the same for all the usual hardware and software solutions - Greg has already begun or finished recalculating and checking the parity. I assume that's the reason why some systems use 520-byte sectors - maybe they write timestamps or generation numbers in a single write within the sector. -- B.Walter COSMO-Project http://www.cosmo-project.de ticso@cicely.de Usergroup info@cosmo-project.de To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Nov 6 10:27:27 1999 Delivered-To: freebsd-fs@freebsd.org Received: from zed.ludd.luth.se (zed.ludd.luth.se [130.240.16.33]) by hub.freebsd.org (Postfix) with ESMTP id 61E5514EFE for ; Sat, 6 Nov 1999 10:27:24 -0800 (PST) (envelope-from pantzer@speedy.ludd.luth.se) Received: from speedy.ludd.luth.se (pantzer@speedy.ludd.luth.se [130.240.16.164]) by zed.ludd.luth.se (8.8.5/8.8.5) with ESMTP id TAA22113; Sat, 6 Nov 1999 19:27:21 +0100 Message-Id: <199911061827.TAA22113@zed.ludd.luth.se> X-Mailer: exmh version 2.0.1 12/23/97 To: Bernd Walter Cc: freebsd-fs@FreeBSD.ORG Subject: Re: RAID-5 and failure In-Reply-To: Message from Bernd Walter of "Sat, 06 Nov 1999 18:33:16 +0100." <19991106183316.A9420@cicely7.cicely.de> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable Date: Sat, 06 Nov 1999 19:27:20 +0100 From: Mattias Pantzare Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > If the system panics or power fails between such writes, there is no way to > find out whether the parity is broken besides verifying the complete plex after > reboot - the problem should be the same for all the usual hardware and software > solutions - Greg has already begun or finished recalculating and checking the > parity.
This is really an optimisation issue: if you just write without using two-phase commit, then you have to recalculate parity after a power failure. (One might keep track of the regions of the disk that have had writes lately and only recalculate those.) Or you do as it says under Two-phase commitment in http://www.sunworld.com/sunworldonline/swol-09-1995/swol-09-raid5-2.html. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Nov 6 10:33:55 1999 Delivered-To: freebsd-fs@freebsd.org Received: from hromeo.algonet.se (hromeo.algonet.se [194.213.74.10]) by hub.freebsd.org (Postfix) with SMTP id 7BCDA14E82 for ; Sat, 6 Nov 1999 10:33:48 -0800 (PST) (envelope-from mal@algonet.se) Received: (qmail 22267 invoked from network); 6 Nov 1999 19:33:47 +0100 Received: from enok.algonet.se (194.213.74.88) by hromeo.algonet.se with SMTP; 6 Nov 1999 19:33:47 +0100 Received: from kairos.algonet.se ([194.213.74.18]) by algonet.se (BLUETAIL Mail Robustifier1.0.4) with ESMTP ; Sat, 06 Nov 1999 18:33:47 GMT Received: (mal@localhost) by kairos.algonet.se (8.8.8+Sun/8.6.12) id TAA04881; Sat, 6 Nov 1999 19:33:46 +0100 (MET) To: freebsd-fs@freebsd.org Subject: Re: stupidfs - easily extensible test file systems? References: <80113h$n8e$1@FreeBSD.csie.NCTU.edu.tw> From: Mats Lofkvist Date: 06 Nov 1999 19:33:46 +0100 In-Reply-To: julian@whistle.com's message of "6 Nov 1999 18:49:21 +0800" Message-ID: Lines: 25 X-Mailer: Gnus v5.6.45/Emacs 20.3 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org julian@whistle.com (Julian Elischer) writes: > On Fri, 5 Nov 1999, Jan Pechanec wrote: > > > > BTW, don't you know why deadfs was written? No doc in FreeBSD. > > From what I saw in the source code, operations just fail. > > > When you have a vnode open, and for some reason the filesystem the vnode > points to disappears (e.g.
the disk is removed, or the PC-CARD is removed, > or many other possibilities), then you cannot track down all the users of > that vnode very easily, so instead you 'fiddle' with it to make it > reference the DEADFS (use VGONE) and when the users try to use it again they > will safely get an error, but at least the system will > not core-dump when they access a non-existent filesystem/device. I guess deadfs is also what makes the -f (force) flag to umount work, and that is a truly great feature in FreeBSD missing in many other unixen (e.g. Solaris {and Linux, I believe}). Having to track down all processes with open descriptors on e.g. an nfs mount before being able to umount it is a real pain in the *; most times I give up on it and reboot the machine instead. _ Mats Lofkvist mal@algonet.se To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Nov 6 11: 8: 5 1999 Delivered-To: freebsd-fs@freebsd.org Received: from mail.du.gtn.com (mail.du.gtn.com [194.77.9.57]) by hub.freebsd.org (Postfix) with ESMTP id CF03C14BDC for ; Sat, 6 Nov 1999 11:08:02 -0800 (PST) (envelope-from ticso@mail.cicely.de) Received: from mail.cicely.de (cicely.de [194.231.9.142]) by mail.du.gtn.com (8.9.3/8.9.3) with ESMTP id UAA03422; Sat, 6 Nov 1999 20:01:12 +0100 (MET) Received: (from ticso@localhost) by mail.cicely.de (8.9.0/8.9.0) id UAA09809; Sat, 6 Nov 1999 20:07:54 +0100 (CET) Date: Sat, 6 Nov 1999 20:07:54 +0100 From: Bernd Walter To: Mattias Pantzare Cc: Bernd Walter , freebsd-fs@FreeBSD.ORG Subject: Re: RAID-5 and failure Message-ID: <19991106200754.A9682@cicely7.cicely.de> References: <199911061827.TAA22113@zed.ludd.luth.se> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre3i In-Reply-To: <199911061827.TAA22113@zed.ludd.luth.se> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Sat, Nov 06, 1999 at 07:27:20PM +0100, Mattias Pantzare wrote: > > If
the system panics or power fails between such writes, there is no way to > > find out whether the parity is broken besides verifying the complete plex after > > reboot - the problem should be the same for all the usual hardware and software > > solutions - Greg has already begun or finished recalculating and checking the > > parity. > > This is really an optimisation issue: if you just write without using > two-phase commit, then you have to recalculate parity after a power failure. > (One might keep track of the regions of the disk that have had writes lately > and only recalculate those.) > > Or you do as it says under Two-phase commitment in > http://www.sunworld.com/sunworldonline/swol-09-1995/swol-09-raid5-2.html. > That's exactly what vinum does at this moment, but without the log. You need persistent memory for this, such as nv-memory or a log area on some disk. nv-memory on PCs is usually too small and maybe too slow for such purposes. I assume that a log area on a participating disk is not a good idea. On a different disk it would be an option, but that still needs implementation. -- B.Walter COSMO-Project http://www.cosmo-project.de ticso@cicely.de Usergroup info@cosmo-project.de To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message
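The log Bernd describes can be modeled as a toy two-phase scheme (a hypothetical Python sketch; the class and names are invented for illustration): record the stripe in persistent storage before touching data or parity, clear the record afterwards, and after a crash rebuild parity only for the stripes still named in the log instead of verifying the whole plex.

```python
from functools import reduce

def xor_parity(blocks):
    """Bytewise XOR of the data blocks in a stripe."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

class LoggedRaid5:
    """Toy model of the intent-log idea: note a stripe in a persistent log
    before writing data or parity, clear the note once both are on disk."""

    def __init__(self, stripes):
        self.stripes = stripes  # stripe number -> list of data blocks
        self.parity = {s: xor_parity(d) for s, d in stripes.items()}
        self.log = set()        # stands in for nv-memory or a log disk

    def write(self, stripe, idx, block, crash_after_data=False):
        self.log.add(stripe)                 # phase 1: record the intent
        self.stripes[stripe][idx] = block    # data write
        if crash_after_data:
            return                           # power fails: parity never written
        self.parity[stripe] = xor_parity(self.stripes[stripe])
        self.log.discard(stripe)             # phase 2: commit complete

    def recover(self):
        """Rebuild parity only for stripes the log says were in flight."""
        for s in list(self.log):
            self.parity[s] = xor_parity(self.stripes[s])
            self.log.discard(s)

r = LoggedRaid5({0: [b"\x01", b"\x02"], 1: [b"\x0f", b"\xf0"]})
r.write(0, 0, b"\xaa", crash_after_data=True)  # simulated power failure
assert r.log == {0}                            # recovery knows where to look
r.recover()
assert r.parity[0] == xor_parity(r.stripes[0])
```

The trade-off Bernd raises is visible here: every write costs two extra log updates, which is why the log wants fast persistent memory rather than a seek on a participating data disk.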