From owner-freebsd-fs Sun Jan 12 7:47:58 2003
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D5E9537B401 for ; Sun, 12 Jan 2003 07:47:56 -0800 (PST)
Received: from HAL9000.homeunix.com (12-233-57-224.client.attbi.com [12.233.57.224]) by mx1.FreeBSD.org (Postfix) with ESMTP id 4AD3A43F3F for ; Sun, 12 Jan 2003 07:47:55 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU)
Received: from HAL9000.homeunix.com (localhost [127.0.0.1]) by HAL9000.homeunix.com (8.12.6/8.12.5) with ESMTP id h0CFlsB4003521; Sun, 12 Jan 2003 07:47:54 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU)
Received: (from das@localhost) by HAL9000.homeunix.com (8.12.6/8.12.5/Submit) id h0CFlrW4003520; Sun, 12 Jan 2003 07:47:53 -0800 (PST) (envelope-from dschultz@uclink.Berkeley.EDU)
Date: Sun, 12 Jan 2003 07:47:53 -0800
From: David Schultz
To: Tomas Pluskal
Cc: Bruce Evans , Terry Lambert , freebsd-fs@FreeBSD.ORG
Subject: Re: seeking help to rewrite the msdos filesystem
Message-ID: <20030112154753.GA3284@HAL9000.homeunix.com>
Mail-Followup-To: Tomas Pluskal , Bruce Evans , Terry Lambert , freebsd-fs@FreeBSD.ORG
References: <20021114020947.O6495-100000@gamplex.bde.org> <20030111191832.B18312-200000@localhost.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20030111191832.B18312-200000@localhost.localdomain>
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.org

Thus spake Tomas Pluskal :
> I have made a simple patch to enable clustering in msdosfs.
> It is against 4-STABLE.
>
> With this patch I get speed on my ZIP drive about 700KB/s (while before it
> was about 80KB/s).
[...]
>       if (ap->a_runp) {
> -             /*
> -              * Sequential clusters should be counted here.
> -              */
> -             *ap->a_runp = 0;
> +             int nblk;
> +
> +             nblk = (dep->de_FileSize >> bshift) - (lblkno + 1);
> +             if (nblk <= 0)
> +                     *ap->a_runp = 0;
> +             else if (nblk >= (MAXBSIZE >> bshift))
> +                     *ap->a_runp = (MAXBSIZE >> bshift) - 1;
> +             else
> +                     *ap->a_runp = nblk;
>       }

I'm not sure I understand what you're trying to do here.  Does
this work with files that are fragmented?  You appear to be
assuming that they are not.  Maybe you copied the code from the
cd9660 filesystem, which does not permit external fragmentation.
I think you need to use the cluster number returned by pcbmap() to
index into the FAT and extract the next cluster number, repeating
until you find that the next cluster is not contiguous, or until
you hit MAXBSIZE or the end of the file.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message

From owner-freebsd-fs Sun Jan 12 13: 0:30 2003
Date: Mon, 13 Jan 2003 08:00:48 +1100 (EST)
From: Bruce Evans
To: David Schultz
Cc: Tomas Pluskal , Terry Lambert ,
Subject: Re: seeking help to rewrite the msdos filesystem
In-Reply-To: <20030112154753.GA3284@HAL9000.homeunix.com>
Message-ID: <20030113075546.R8938-100000@gamplex.bde.org>
On Sun, 12 Jan 2003, David Schultz wrote:

> Thus spake Tomas Pluskal :
> > I have made a simple patch to enable clustering in msdosfs.
> > It is against 4-STABLE.
> [...]
> >       if (ap->a_runp) {
> > -             /*
> > -              * Sequential clusters should be counted here.
> > -              */
> > -             *ap->a_runp = 0;
> > +             int nblk;
> > +
> > +             nblk = (dep->de_FileSize >> bshift) - (lblkno + 1);
>
> I'm not sure I understand what you're trying to do here.  Does
> this work with files that are fragmented?  You appear to be
> assuming that they are not.  Maybe you copied the code from the
> cd9660 filesystem, which does not permit external fragmentation.

ISTR suggesting looking at cd9660 for examples of how to do clustering.
Unfortunately, it is too simple here.  The corresponding code in ufs
and ext2fs is quite complicated.

Bruce

From owner-freebsd-fs Sun Jan 12 13:19:55 2003
Message-ID: <3E21D9F0.A2AA9F0@mindspring.com>
Date: Sun, 12 Jan 2003 13:11:12 -0800
From: Terry Lambert
To: David Schultz
Cc: Tomas Pluskal , Bruce Evans , freebsd-fs@FreeBSD.ORG
Subject: Re: seeking help to rewrite the msdos filesystem
References: <20021114020947.O6495-100000@gamplex.bde.org> <20030111191832.B18312-200000@localhost.localdomain> <20030112154753.GA3284@HAL9000.homeunix.com>

David Schultz wrote:
[ ... clustering patch ... ]
> I'm not sure I understand what you're trying to do here.  Does
> this work with files that are fragmented?  You appear to be
> assuming that they are not.  Maybe you copied the code from the
> cd9660 filesystem, which does not permit external fragmentation.
> I think you need to use the cluster number returned by pcbmap() to
> index into the FAT and extract the next cluster number, repeating
> until you find that the next cluster is not contiguous, or until
> you hit MAXBSIZE or the end of the file.

FWIW, I had the same question, but I haven't had time to really
stare at some FS instances from a working Windows box from the
FreeBSD side of things, to know how bad this really is, so I
thought that this might be on purpose.

I don't expect he'd ever see it at all, given his intended usage.

I think that it's not that bad (really), but will lose about 50%
of the performance improvement on a file that's partially fragged,
but still contains contiguous blocks in it.  On a generally fragged
file, you're not going to trigger the code at all.  I'm not really
sure a cluster can start at a non-boundary, anyway (this is what I
need to look at examples to see), so it may be a total non-issue.
-- Terry

From owner-freebsd-fs Sun Jan 12 13:52: 8 2003
Date: Sun, 12 Jan 2003 22:52:02 +0100 (CET)
From: Tomas Pluskal
To: Terry Lambert
Cc: David Schultz , Bruce Evans ,
Subject: Re: seeking help to rewrite the msdos filesystem
In-Reply-To: <3E21D9F0.A2AA9F0@mindspring.com>
Message-ID: <20030112222759.R23717-100000@localhost.localdomain>

Thank you all for your comments.

I would like to state here, that I have no experience with
filesystem development at all (well, now I have a bit :)
You told me to look at the cd9660 code, so I did my best to do it like
it's in cd9660...
I only partially understand what the code really does.

By the way, is there any documentation for these things anywhere?  I mean
it is quite hard to understand what pcbmap(), bread(), cluster_read() etc.
and their parameters really mean, just by reading the code...

If I understand it right, when I assume the file is not fragmented, it is
just a performance issue - it would make the FS slow on fragmented files,
but should not break anything.  Is this correct?

If any of you could suggest a better solution (in a way that I could
understand it :), I can work on it.

Tomas

On Sun, 12 Jan 2003, Terry Lambert wrote:

>
> FWIW, I had the same question, but I haven't had time to really
> stare at some FS instances from a working Windows box from the
> FreeBSD side of things, to know how bad this really is, so I
> thought that this might be on purpose.
>
> I don't expect he'd ever see it at all, given his intended usage.
>
> I think that it's not that bad (really), but will lose about 50%
> of the performance improvement on a file that's partially fragged,
> but still contains contiguous blocks in it.  On a generally fragged
> file, you're not going to trigger the code at all.  I'm not really
> sure a cluster can start at a non-boundary, anyway (this is what I
> need to look at examples to see), so it may be a total non-issue.
>
> -- Terry
>

From owner-freebsd-fs Mon Jan 13 8:22:22 2003
Date: Mon, 13 Jan 2003 08:22:11 -0800
From: David Schultz
To: Tomas Pluskal
Cc: Terry Lambert , Bruce Evans , freebsd-fs@FreeBSD.ORG
Subject: Re: seeking help to rewrite the msdos filesystem
Message-ID: <20030113162211.GA7279@HAL9000.homeunix.com>
References: <3E21D9F0.A2AA9F0@mindspring.com> <20030112222759.R23717-100000@localhost.localdomain>
In-Reply-To: <20030112222759.R23717-100000@localhost.localdomain>

Thus spake Tomas Pluskal :
> Thank you all for your comments.
> I would like to state here, that I have no experience with
> filesystem development at all (well, now I have a bit :)
> You told me to look at the cd9660 code, so I did my best to do it like
> it's in cd9660... I only partially understand what the code really does.
>
> By the way, is there any documentation for these things anywhere ? I mean
> it is quite hard to understand what pcbmap(), bread(), cluster_read() etc.
> and their parameters really mean, just by reading the code...
>
> If I understand it right, when I assume the file is not fragmented, it is
> just a performance issue - it would make the FS slow on fragmented files,
> but should not break anything.  Is this correct?

That's a good point, and I don't know the answer.  I used to know
the FAT filesystem very well back in my days with an XT clone, so
I can probably answer any questions you have on that, but I have
little experience with the VFS interface.

The routine I would probably want to start looking at is
cluster_read().  (Maybe I will do that when I get back from
skiing.)  Basically, you want to know whether it trusts VOP_BMAP
about runs of contiguous blocks, or whether it verifies what
vnode/lbn each block corresponds to.  Since ffs_bmap() goes out of
its way to ensure that it returns accurate information, I would
guess the former.

Alternatively, you might try testing your code by writing a
one-cluster file, then creating another non-empty file, then
appending to the first file.  If something gets overwritten, then
you have a bug.  For a more extensive test, you could try
untarring two tarballs to different places on an msdosfs volume
simultaneously.
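[Editor's sketch] The run-counting approach suggested earlier in this thread (start from the cluster that pcbmap() returns, follow the FAT, and stop when the next cluster is not contiguous, when the run limit is hit, or at end of file) can be illustrated in user space roughly as below. This is only a sketch under stated assumptions: the in-memory `fat[]` array, the `FAT_EOF` marker, and the name `count_contig_run()` are hypothetical and are not msdosfs kernel code. Real code would read FAT entries through the buffer cache and deal with the FAT12/16/32 encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical end-of-chain marker for this sketch (not the msdosfs constant). */
#define FAT_EOF 0xFFFFFFFFu

/*
 * Count how many clusters that follow `start` in the file's chain are
 * physically contiguous with it.  fat[c] holds the cluster that follows
 * cluster c in the chain, or FAT_EOF at the end of the file.  The result
 * is capped at max_run; 0 means the next cluster is not adjacent.
 */
static unsigned
count_contig_run(const uint32_t *fat, size_t nclusters, uint32_t start,
    unsigned max_run)
{
	unsigned run = 0;
	uint32_t cur = start;

	assert(fat != NULL && start < nclusters);
	while (run < max_run) {
		uint32_t next = fat[cur];

		/* Stop at end of file, at an out-of-range entry, or at a gap. */
		if (next == FAT_EOF || next >= nclusters || next != cur + 1)
			break;
		run++;
		cur = next;
	}
	return (run);
}
```

With two interleaved files, such as the layout the append test above would tend to produce, the function reports a run of 0 at the cluster just before the gap. A size-only estimate cannot see that gap, which is the concern raised about the patch.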
From owner-freebsd-fs Mon Jan 13 11: 1:14 2003
Date: Mon, 13 Jan 2003 11:01:02 -0800 (PST)
Message-Id: <200301131901.h0DJ12bb050798@freefall.freebsd.org>
From: FreeBSD bugmaster
To: fs@FreeBSD.org
Subject: Current problem reports assigned to you

Current FreeBSD problem reports

Critical problems
Serious problems
Non-critical problems

S  Submitted   Tracker     Resp. Description
-------------------------------------------------------------------------------
a [2000/10/06] kern/21807  fs    [patches] Make System attribute correspon

1 problem total.
From owner-freebsd-fs Tue Jan 14 11:26:35 2003
Message-ID: <20030114192634.75751.qmail@web13505.mail.yahoo.com>
Date: Tue, 14 Jan 2003 11:26:34 -0800 (PST)
From: Jason Schoonover
Subject: large filesystem, journaling filesystem support
To: freebsd-fs@freebsd.org

Hi guys,

Not quite sure where to send this to, so I'll send it here, to the
filesystem mailing list.

I have two questions really regarding an NFS server with a large
filesystem.  Right now I'm using FreeBSD 4.7-RELEASE as an NFS server.
I used the ccd tool to create a software RAID, but need more disk
space, so I ordered a hardware RAID unit that will have about 1TB for
disk storage.

My question is, are there any file systems that FreeBSD supports that
are stable and can support over a TB of data?  Also, I'm wondering if
there are any journaling filesystems out there for FreeBSD.  I know
Linux has a few, and I'm wondering if FreeBSD will support any of
those (ReiserFS, ext3, or JFS)?  I don't want to switch to Linux
because NFS under Linux doesn't seem to be near as good as it is with
FreeBSD.
And if it wasn't a journaling file system, it seems that, in the
event of a crash, fscking it would take forever.

Thanks,
Jason

From owner-freebsd-fs Thu Jan 16 23:51:42 2003
Date: Thu, 16 Jan 2003 23:51:18 -0800
From: David Schultz
To: Jason Schoonover
Cc: freebsd-fs@FreeBSD.ORG
Subject: Re: large filesystem, journaling filesystem support
Message-ID: <20030117075118.GA3493@HAL9000.homeunix.com>
References: <20030114192634.75751.qmail@web13505.mail.yahoo.com>
In-Reply-To: <20030114192634.75751.qmail@web13505.mail.yahoo.com>

Thus spake Jason Schoonover :
> I have two questions really
regarding an NFS server
> with a large filesystem.  Right now I'm using FreeBSD
> 4.7-RELEASE as an NFS server.  I used the ccd tool to
> create a software RAID, but need more disk space, so I
> ordered a hardware RAID unit that will have about 1TB
> for disk storage.
>
> My question is, are there any file systems that
> FreeBSD supports that are stable and can support over a
> TB of data?  Also, I'm wondering if there are any
> journaling filesystems out there for FreeBSD.  I know
> Linux has a few, and I'm wondering if FreeBSD will
> support any of those (ReiserFS, ext3, or JFS)?  I
> don't want to switch to Linux because NFS under Linux
> doesn't seem to be near as good as it is with FreeBSD.
> And if it wasn't a journaling file system, it seems
> that, in the event of a crash, fscking it would
> take forever.

FreeBSD uses softupdates, which achieves similar efficiency and
reliability goals to journaling.  With softupdates, you don't need
to fsck at all at boot time following a power failure or crash
because the worst case scenario (hardware failure aside) is that
some disk space that is really free is marked as allocated.  In
FreeBSD 5.0, you can actually run fsck in the background at any
time to reclaim this space.  That said, there is some limited
interest in porting a journaling filesystem to FreeBSD.  Several
people have started, but I don't know if anyone has finished.

A plain old UFS filesystem can be 1 TB in size.  Sizes up to 4 TB
could work, but you might have trouble with anything bigger than 1
TB in FreeBSD 4.X.  UFS2 (supported by FreeBSD 5.0) will allow you
to create filesystems much larger than 1 TB.  If you're
conservative, however, you might want to wait and observe others'
experiences with 5.0 before you use it on an important machine.
From owner-freebsd-fs Fri Jan 17 2:31:20 2003
Message-ID: <3E27DA7F.D5DBEFB@mindspring.com>
Date: Fri, 17 Jan 2003 02:27:11 -0800
From: Terry Lambert
To: David Schultz
Cc: Jason Schoonover , freebsd-fs@FreeBSD.ORG
Subject: JFS vs. Soft Updates (again) (was: Re: large filesystem, journaling filesystem support)
References: <20030114192634.75751.qmail@web13505.mail.yahoo.com> <20030117075118.GA3493@HAL9000.homeunix.com>

This posting is in favor of a JFS.  It gives detailed technical
arguments about why some of the soft updates claims some people
are making are actually incorrect.

For the record, Kirk McKusick has stated on FreeBSD -arch that
background fsck has the problems I note, in passing, below.
> FreeBSD uses softupdates, which achieves similar efficiency and
> reliability goals to journaling.  With softupdates, you don't need
> to fsck at all at boot time following a power failure or crash
> because the worst case scenario (hardware failure aside) is that
> some disk space that is really free is marked as allocated.

No, the worst case following a power failure is a screwed disk
track.

Modern disk drives read and write a track at a time; this is to
avoid rotational latency that would happen if you waited for a hard
"sector start" marker to come around, and it avoids the need for
"low level formatting".

For a very small window of time in the late 1990's, two manufacturers,
IBM and Quantum, created disk drives which were capable of using
rotational energy as a power source (regenerative braking) to
complete a write in progress, following a DC failure (this provided
a small post-failure hold-up time).

Modern disk drives no longer do this, because disk manufacturers
are morons (or one was a moron, and the others had to compete on
price, which amounts to the same thing).

The net result is that a DC failure can result in an entire track
getting trashed, if it happens at the right time.

So why is this important?

Soft updates optimizes for sector writing, not track writing,
while journalling can journal on the basis of track-sized
extents.

If it is written correctly (there are a number of technical
challenges to writing this correctly, and SGI, IBM, and Linux
haven't done it, but it's theoretically possible, though very
hard on IDE -- much easier on SCSI because the physical geometry
can be accessed via mode page 2).

The upshot of this is that a journalled FS can recover any damage
from a power failure, if needs be, whereas if this were to happen
on a disk protected by soft updates, you are screwed.

Journalling and soft updates are orthogonal technologies; they do
not solve the same problem space, although there is some minor
overlap.
> In FreeBSD 5.0, you can actually run fsck in the background at any
> time to reclaim this space.

In fact, this is not true.  You can only run a fsck in the background
in the case that you know that the failure mode was a power failure,
and that no data was corrupted.  This is not something you can know
for certain without CMOS.

A panic failure situation may result in corrupted disk buffers that
are flushed to the disk, prior to the panic.  A hardware failure can
result in a similar failure.  And a power failure can result in a
corrupt track.  In all three of these cases, a background fsck is
unable to recover the system appropriately.

Neither is it possible to mount the FS read-only, and make it
read/write on a cylinder group basis, following a fsck, until all of
the areas of the disks have been checked, else it's possible to load
and run corrupt code that then corrupts a previously OK area of the
disk.

The only reasonable fix is a CMOS area that contains a failure
condition code.  Unfortunately, one of FreeBSD's failure modes is a
spontaneous reboot; this is because this is the normal failure mode
for PC hardware on a triple fault, which may occur as a result of a
condition that should result in a panic (corruption of kernel memory),
if the memory so corrupted is the GDT, or certain other types of
failures occur.

Thus the only safe way of dealing with this in the soft updates case
is a DC holdup circuit whose sole job is to write a "power fail" code
into NVRAM, which can be read out by the OS.  This means that the
first thing an OS should do following successful recovery after
reading that value is to write a non-"power fail" code into the CMOS,
so that it can differentiate a power failure from a soft failure.

PC hardware has no such assistance for OS's, despite Microsoft and
Intel attempting to accelerate the recovery process (maybe they
simply didn't think about the problem in sufficient detail to realize
hardware help is needed).
A journaling FS has the same vulnerability to corrupt kernel buffers
that were written out, but not the same vulnerability on recovery, as
it does not need to distinguish reboots due to power failure from
reboots due to other causes (because it can be insensitive to the
difference, by being insensitive to single track failures from write
in progress).

The upshot of this is that a journalling FS can recover using an
abbreviated process, with only software CMOS cause notification,
without needing special hardware additions for "power fail"
differentiation.

> That said, there is some limited
> interest in porting a journaling filesystem to FreeBSD.  Several
> people have started, but I don't know if anyone has finished.

Part of the disincentive here is that people keep saying that Soft
Updates is "just as good as journalling" or "solves the same problem
space journalling solves", etc., when it doesn't, and the technologies
are actually complementary.

People should stop claiming this, when it isn't true.  If you want to
talk about the overlap, fine; but don't claim that soft updates or
background fsck adequately solves the loss of power problem, unless
you happen to have an IBM drive from 1997.

Personally, I would welcome a journalling FS on FreeBSD.  It would
have saved us the cost of a custom power supply that provided DC
holdup and AC fail notification.  While it was significantly cheaper
than a UPS, and we were able to make the change because of soft
updates, it would have been even cheaper if we could have avoided the
problem entirely.
The biggest problem, to my mind, that adoption of a journalling
filesystem by FreeBSD keeps hitting its head on, is that people keep
wanting to port GPL'ed JFS code to FreeBSD, not understanding that
it's impossible for a GPL'ed FS to ever be the default for FreeBSD,
because the GPL specifically prohibits use of other licenses in
statically linked code, and the boot file system must be statically
linked into the code in order to mount root, and to load kernel
modules.

If you want to write a JFS for FreeBSD: fine; but if you are going
to start with third party code, be sure that code is under the BSD
license, so that your FS can ship in a binary and usable form on
the CDROM.

-- Terry

From owner-freebsd-fs Fri Jan 17 8:37:47 2003
Date: Fri, 17 Jan 2003 17:37:43 +0100 (CET)
From: Magnus B{ckstr|m
To: Terry Lambert ,
Cc: David Schultz , Jason Schoonover
Subject: Re: JFS vs.
Soft Updates (again) (was: Re: large filesystem, journaling filesystem support)
In-Reply-To: <3E27DA7F.D5DBEFB@mindspring.com>

I'd like to point anyone who is thinking JFS at www.opendce.org.
It would be really nice to see DCE and DFS on FreeBSD, most preferably
together with the log-structured filesystem that was designed to
underlie it.

One would not necessarily want it as a boot filesystem or shipped with
the base system; rather, there are things you can do with DFS+"Episode"
in distributed environments that turn NFS+anything an embarrassed
shade of pink.

Still it's going to be interesting to see what kind of a license the
OpenDCE project settles on.  The default assumption is LGPL-like, but
there's been discussion on the mailing list of Apache- or BSD-style
licenses.  It's up to the Open Group, and they're still in a huddle
with their legal people.

Magnus

On Fri, 17 Jan 2003, Terry Lambert wrote:

>
>[fzzt]
> If you want to write a JFS for FreeBSD: fine; but if you are going
> to start with third party code, be sure that code is under the BSD
> license, so that your FS can ship in a binary and usable form on
> the CDROM.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Jan 17 14:24:16 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id D6F0237B401 for ; Fri, 17 Jan 2003 14:24:13 -0800 (PST) Received: from HAL9000.homeunix.com (12-233-57-224.client.attbi.com [12.233.57.224]) by mx1.FreeBSD.org (Postfix) with ESMTP id 0545143F5F for ; Fri, 17 Jan 2003 14:24:13 -0800 (PST) (envelope-from dschultz@uclink.berkeley.edu) Received: from HAL9000.homeunix.com (localhost [127.0.0.1]) by HAL9000.homeunix.com (8.12.6/8.12.5) with ESMTP id h0HMOBbZ005662; Fri, 17 Jan 2003 14:24:11 -0800 (PST) (envelope-from dschultz@uclink.berkeley.edu) Received: (from das@localhost) by HAL9000.homeunix.com (8.12.6/8.12.5/Submit) id h0HMOAJQ005661; Fri, 17 Jan 2003 14:24:10 -0800 (PST) (envelope-from dschultz@uclink.berkeley.edu) Date: Fri, 17 Jan 2003 14:24:10 -0800 From: David Schultz To: Terry Lambert Cc: Jason Schoonover , freebsd-fs@FreeBSD.ORG Subject: Re: JFS vs. Soft Updates (again) (was: Re: large filesystem, journaling filesystem support) Message-ID: <20030117222410.GA5449@HAL9000.homeunix.com> Mail-Followup-To: Terry Lambert , Jason Schoonover , freebsd-fs@FreeBSD.ORG References: <20030114192634.75751.qmail@web13505.mail.yahoo.com> <20030117075118.GA3493@HAL9000.homeunix.com> <3E27DA7F.D5DBEFB@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3E27DA7F.D5DBEFB@mindspring.com> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Thus spake Terry Lambert : > > FreeBSD uses softupdates, which achieves similar efficiency and > > reliability goals to journaling. 
> > With softupdates, you don't need
> > to fsck at all at boot time following a power failure or crash
> > because the worst case scenario (hardware failure aside) is that
> > some disk space that is really free is marked as allocated.
>
> No, the worst case following a power failure is a screwed disk
> track.

Yes, I'm familiar with this failure mode; it has been discussed on the lists before. I was grouping it in the ``hardware failure'' category I mentioned so I could make my post concise, so as to not fall asleep in the middle of it. ;-)

> Soft updates optimizes for sector writing, not track writing,
> while journalling can journal on the basis of track-sized
> extents.
>
> If it is written correctly (there are a number of technical
> challenges to writing this correctly, and SGI, IBM, and Linux
> haven't done it, but it's theoretically possible, though very
> hard on IDE -- much easier on SCSI because the physical geometry
> can be accessed via mode page 2).

Even if you know the size of each physical track and manage to write a journalling filesystem that takes that into account, I would think that you'd wind up wasting memory or paying for read-modify-write cycles to commit entire tracks. Nevertheless, I suppose it could be done. The LFS was very nearly a solution to this problem, but it didn't take the disk geometry into account.

If you are going to assume that the hardware is going to do something stupid (a good assumption), then the problem is actually much worse than you imply. RAID controllers and disk firmware, like operating systems, have race conditions and other bugs. Neither softupdates nor journalling alone will save you from a misdirected or phantom write, a misdirected read, or an interface error. Hardware checksums will not fix the problem either. In the cases of misdirected reads and writes, the checksums match. For an interface error, there isn't even a checksum to verify, because it's already been verified and discarded by the disk.
You need far more than just DC holdup if you want to detect and possibly correct these problems. In light of that, I do group softupdates and journalling in the same category, since neither provides filesystem integrity in the face of hardware errors. I agree with you that journalling could solve one particular problem associated with full track writes, but as you mentioned, nobody actually does journalling that way. But the idea that you can take a UFS-like filesystem and fix all of its metadata integrity problems by adding journalling to it is nonsense. There is some ongoing work on a commercial filesystem that can verify metadata integrity and usually recover from errors on the fly. Think of an LFS structured around a Merkle tree. The ultimate goal is to be able to swap to it for a while, and still be able to mount it afterwards without running a filesystem checker. People who really need that kind of reliability should be using that kind of filesystem, and paying the associated performance penalty. The rest of us can use softupdates or journalling and have protection against what is by far the most common case: filesystem corruption as a result of unordered metadata updates. 
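
As a rough illustration of the point about misdirected writes, the sketch below models a toy "disk" in which every block carries its own checksum, plus a parent table that stores a hash of each block bound to its block number. The tree is collapsed to a single parent level, FNV-1a stands in for a real cryptographic hash, and every name here (`toy_fs`, `fs_write_at`, `self_check`, `parent_check`) is hypothetical; it is not taken from any filesystem mentioned in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NBLOCKS 4
#define BLKSIZE 16

/* 64-bit FNV-1a; a stand-in for a cryptographic hash, chosen only for brevity. */
static uint64_t fnv1a(const unsigned char *buf, size_t len, uint64_t seed)
{
    uint64_t h = 14695981039346656037ULL ^ seed;

    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 1099511628211ULL;
    }
    return h;
}

struct block {
    unsigned char data[BLKSIZE];
    uint64_t self_sum;              /* travels with the data, like a sector checksum */
};

struct toy_fs {
    struct block disk[NBLOCKS];
    uint64_t parent_hash[NBLOCKS];  /* kept in the parent, bound to the block number */
};

/* Fill a block-sized buffer from a string, zero padded. */
static void fill(unsigned char *dst, const char *s)
{
    size_t n = strlen(s);

    assert(n <= BLKSIZE);
    memset(dst, 0, BLKSIZE);
    memcpy(dst, s, n);
}

/* The data lands on block `actual`; the parent records it for block `intended`. */
static void fs_write_at(struct toy_fs *fs, int intended, int actual, const char *s)
{
    struct block *b = &fs->disk[actual];

    fill(b->data, s);
    b->self_sum = fnv1a(b->data, BLKSIZE, 0);
    fs->parent_hash[intended] = fnv1a(b->data, BLKSIZE, (uint64_t)intended + 1);
}

/* A correctly directed write. */
static void fs_write(struct toy_fs *fs, int blkno, const char *s)
{
    fs_write_at(fs, blkno, blkno, s);
}

static void fs_init(struct toy_fs *fs)
{
    for (int i = 0; i < NBLOCKS; i++)
        fs_write(fs, i, "");
}

/* Does the block agree with the checksum stored alongside it? */
static int self_check(const struct toy_fs *fs, int blkno)
{
    const struct block *b = &fs->disk[blkno];

    return b->self_sum == fnv1a(b->data, BLKSIZE, 0);
}

/* Does the block agree with the hash the parent holds for this block number? */
static int parent_check(const struct toy_fs *fs, int blkno)
{
    return fs->parent_hash[blkno] ==
        fnv1a(fs->disk[blkno].data, BLKSIZE, (uint64_t)blkno + 1);
}
```

Calling `fs_write_at(&fs, 1, 2, ...)` simulates a write meant for block 1 that lands on block 2. Both blocks still pass `self_check()`, because each checksum matches the data next to it, while `parent_check()` fails for both the stale block and the overwritten one. In a real Merkle tree the parent hashes would themselves be hashed upward to a root.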
From owner-freebsd-fs Fri Jan 17 17:49:10 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 41BA037B401 for ; Fri, 17 Jan 2003 17:49:08 -0800 (PST) Received: from mail.synology.com (dns1.synology.com [210.58.106.131]) by mx1.FreeBSD.org (Postfix) with ESMTP id CFB9443ED8 for ; Fri, 17 Jan 2003 17:49:06 -0800 (PST) (envelope-from cheen@synology.com) Received: (from root@localhost) by mail.synology.com (8.12.5/8.12.5) id h0I1mq33074809; Sat, 18 Jan 2003 09:48:52 +0800 (CST) Received: from homexp (61-223-26-104.HINET-IP.hinet.net [61.223.26.104]) (authenticated bits=0) by mail.synology.com (8.12.5/8.12.5av) with ESMTP id h0I1mleG074796; Sat, 18 Jan 2003 09:48:48 +0800 (CST) Message-ID: <001401c2be93$c36c7490$681adf3d@homexp> From: "Cheen Liao" To: References: <20030114192634.75751.qmail@web13505.mail.yahoo.com> <20030117075118.GA3493@HAL9000.homeunix.com> <3E27DA7F.D5DBEFB@mindspring.com> <20030117222410.GA5449@HAL9000.homeunix.com> Subject: Transaction File System - a replacement of JFS Date: Sat, 18 Jan 2003 09:48:55 +0800 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2720.3000 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 X-Virus-Scanned: by AMaViS perl-11 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org

There have recently been discussions about JFS on FreeBSD. I think my company's development plan may meet that demand.

My company is planning to build a Transactional File System (TFS) on FreeBSD, which will have journaling (logging) capability and database capability.
The basic idea is to build a file system on a database engine. When it is done, it should supersede JFS through its database functionality.

TFS also has some advantages over traditional UNIX file systems. If we put all "inodes" in a btree table, path lookup becomes faster, because the paired inode and directory block I/Os are replaced with a single btree search for the inode. If designed properly, a btree search takes a little more than one I/O on average, depending on how many internal pages are cached. Also, a small file's contents can be stored in a regular variable-length field and be part of the "inode". This greatly improves performance and space usage for small files.

The TFS project is a long-term project and is now in the early planning stage. Here is the rough plan, with no schedule :)

. develop a prototype on FreeBSD 4.x.
. use PostgreSQL as the internal database engine.
. define the database schema for
  a) storing directory and file "inodes".
  b) storing large objects (i.e. storing the block numbers for large files)
. write all VFS functions using the PostgreSQL library in user mode.
. write a file system which will "call back (or pop up)" to the user-mode functions described above.

At the end of this stage, we will have a running prototype of TFS, and it will obviously have serious performance problems. Also, some of the database functions are not good enough for the file system functions and need to be strengthened. With the database engine inside, we can easily add extended attributes for each directory or file object and search on them.

So in the next stage, we will

. move the core database engine into the kernel. This has to be the FreeBSD 5.0 kernel, because some of the database functions can take a long time to run. Pre-5.0 kernel processes are non-preemptive, so the system could hang in the kernel because of the long-running functions. Obviously we will need a lot of help from the FreeBSD community to make the move smooth. In particular, merging the database buffers with the system cache will be a big challenge.
. strengthen the database functions:
  a) add new free space management that is suitable for database extension, so it can run on a raw block device.
  b) improve the btree: store records in the btree (clustered btree) and add a btree deletion function.
  c) improve large object storage, including the clustering policy and recovery policy.
  d) make logging robust, so that it handles "torn writes".
  e) expose the database functions through new system calls or other creative methods.

At this stage, we should have all of basic TFS working, and we will need a lot of fine tuning and tools, such as

. performance tuning - a task that never ends.
. fsck for TFS - in case logs are lost, it will restore TFS to a consistent state.
. snapshot capability - it will be a piece of cake with logging support.
. replication, by shipping the log to another system and replaying it there.

If you are still reading by now, you probably know what we are trying to achieve. Suggestions and discussion on TFS are extremely welcome. Are there any suggestions on how to merge our efforts with the BSD community's, if any, and speed up the development?
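
The idea of keeping all "inodes" in one btree table, with small file contents stored inline, can be sketched roughly as below. This is only an illustration under loud assumptions: a sorted in-memory array searched with `bsearch()` stands in for the btree, the key is assumed to be (parent inode number, name), and `inode_rec`, `lookup_child` and `tfs_namei` are hypothetical names. Nothing here uses PostgreSQL or the real VFS interfaces.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ROOT_INO 1

/* One row of the hypothetical "inode" table, keyed by (parent, name). */
struct inode_rec {
    uint32_t parent;          /* inode number of the containing directory */
    const char *name;         /* entry name within that directory */
    uint32_t ino;             /* this object's inode number */
    const char *inline_data;  /* small-file contents kept in the record, NULL for directories */
};

/* Sorted by (parent, name); the sorted array stands in for the btree. */
static const struct inode_rec table[] = {
    { 1, "etc",  2, NULL },
    { 1, "usr",  3, NULL },
    { 2, "motd", 4, "hello\n" },
    { 3, "bin",  5, NULL },
};

static int rec_cmp(const void *a, const void *b)
{
    const struct inode_rec *x = a, *y = b;

    if (x->parent != y->parent)
        return x->parent < y->parent ? -1 : 1;
    return strcmp(x->name, y->name);
}

/* One keyed search replaces "read directory block, then read inode block". */
static const struct inode_rec *lookup_child(uint32_t parent, const char *name)
{
    struct inode_rec key = { parent, name, 0, NULL };

    assert(name != NULL);
    return bsearch(&key, table, sizeof(table) / sizeof(table[0]),
        sizeof(table[0]), rec_cmp);
}

/*
 * Resolve an absolute path one component at a time.
 * Returns NULL if any component is missing; "/" itself has no record here.
 */
static const struct inode_rec *tfs_namei(const char *path)
{
    const struct inode_rec *rec = NULL;
    uint32_t dir = ROOT_INO;
    char comp[64];

    while (*path != '\0') {
        while (*path == '/')
            path++;
        if (*path == '\0')
            break;
        size_t len = strcspn(path, "/");
        if (len >= sizeof(comp))
            return NULL;
        memcpy(comp, path, len);
        comp[len] = '\0';
        rec = lookup_child(dir, comp);
        if (rec == NULL)
            return NULL;
        dir = rec->ino;
        path += len;
    }
    return rec;
}
```

Resolving `/etc/motd` costs two keyed searches, and the second one already returns the file's contents through `inline_data`, which is the small-file saving described in the plan. A real implementation would still need large-object storage, locking and transactions, none of which is modeled here.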
Thanks, Cheen To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Jan 17 18:21:39 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9E3D737B401 for ; Fri, 17 Jan 2003 18:21:38 -0800 (PST) Received: from mail.eecs.harvard.edu (bowser.eecs.harvard.edu [140.247.60.24]) by mx1.FreeBSD.org (Postfix) with ESMTP id EA04243F18 for ; Fri, 17 Jan 2003 18:21:34 -0800 (PST) (envelope-from ellard@eecs.harvard.edu) Received: by mail.eecs.harvard.edu (Postfix, from userid 465) id EBE3E54C441; Fri, 17 Jan 2003 21:21:28 -0500 (EST) Received: from localhost (localhost [127.0.0.1]) by mail.eecs.harvard.edu (Postfix) with ESMTP id E9D6054C440; Fri, 17 Jan 2003 21:21:28 -0500 (EST) Date: Fri, 17 Jan 2003 21:21:28 -0500 (EST) From: Dan Ellard To: Cheen Liao Cc: freebsd-fs@freebsd.org Subject: Re: Transaction File System - a replacement of JFS In-Reply-To: <001401c2be93$c36c7490$681adf3d@homexp> Message-ID: References: <20030114192634.75751.qmail@web13505.mail.yahoo.com> <20030117075118.GA3493@HAL9000.homeunix.com> <3E27DA7F.D5DBEFB@mindspring.com> <20030117222410.GA5449@HAL9000.homeunix.com> <001401c2be93$c36c7490$681adf3d@homexp> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Sat, 18 Jan 2003, Cheen Liao wrote: > Recently there are discussions on JFS on FreeBSD. I think my company's > development plan may meet the demands. > > My company is planning to build a Transactional File System (TFS) on > FreeBSD, which has journaling (logging) capability and database capability. > The basic idea is to build a file system on a database engine. When it is > done, it should supersede JFS with its database functionality. > ... 
You should get in contact with Lex Stein (stein@eecs.harvard.edu) and Mike Tucker (mtucker@eecs.harvard.edu). They have built a file system on top of Berkeley DB, and it's completely transaction-oriented. It's open source and available to download now. The basic idea sounds like almost exactly what you're planning to do, except that it's based on Berkeley DB instead of Postgres, and its interface is a user-level NFSv3 server instead of VFS. (I don't know whether they've thought about the niftier features like snapshots/replication, beyond what is already provided by BDB) Even if you don't like exactly what they've done, and really want to use VFS, I think you'll find it much easier to cram BDB into the kernel than Postgres! If you're determined to stick with Postgres, however, you should check out Michael Olson's work on the "Inversion" file system, which used Postgres as the basis for a file system that did some of the things you are thinking about, circa 1993. (But note that following in Michael Olson's footsteps will also lead you back to Berkeley DB...) 
-Dan

From owner-freebsd-fs Fri Jan 17 19: 9:29 2003 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 91FF037B401 for ; Fri, 17 Jan 2003 19:09:27 -0800 (PST) Received: from mail.synology.com (dns1.synology.com [210.58.106.131]) by mx1.FreeBSD.org (Postfix) with ESMTP id A306F43F1E for ; Fri, 17 Jan 2003 19:09:26 -0800 (PST) (envelope-from cheen@synology.com) Received: (from root@localhost) by mail.synology.com (8.12.5/8.12.5) id h0I39IEF075613; Sat, 18 Jan 2003 11:09:18 +0800 (CST) Received: from homexp (61-223-26-104.HINET-IP.hinet.net [61.223.26.104]) (authenticated bits=0) by mail.synology.com (8.12.5/8.12.5av) with ESMTP id h0I39DeG075602; Sat, 18 Jan 2003 11:09:14 +0800 (CST) Message-ID: <004201c2be9f$004059d0$681adf3d@homexp> From: "Cheen Liao" To: "Dan Ellard" Cc: References: <20030114192634.75751.qmail@web13505.mail.yahoo.com> <20030117075118.GA3493@HAL9000.homeunix.com> <3E27DA7F.D5DBEFB@mindspring.com> <20030117222410.GA5449@HAL9000.homeunix.com> <001401c2be93$c36c7490$681adf3d@homexp> Subject: Re: Transaction File System - a replacement of JFS Date: Sat, 18 Jan 2003 11:09:21 +0800 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2720.3000 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 X-Virus-Scanned: by AMaViS perl-11 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org

This is great information. I will check it out. Let me try to quickly explain the rationale behind some of our decisions:

We chose PostgreSQL because PostgreSQL has a true BSD license.
It does not matter whether it is used for commercial redistribution or not. BDB is not under a true BSD license. PostgreSQL also has great query support and migration support. Users can migrate their commercial database applications to PostgreSQL or, in the future, to TFS.

We chose the VFS approach because there are a lot of functions, from both the open source community and my company, built on the VFS layer. Note that it is cleaner to run a database engine in the kernel, with VFS as just one way to view the data in the database. NFS can certainly be another way. I expect the main challenge of the project to lie in merging the resources managed by the database engine into the kernel. Adding more interfaces for accessing the data can be done at a later stage.

Again, thank you for the information and your interest in the project,

Cheen