From owner-freebsd-fs Sun Mar 10 8:20:51 2002
Delivered-To: freebsd-fs@freebsd.org
Received: from tara.freenix.org (keltia.freenix.org [62.4.20.87])
    by hub.freebsd.org (Postfix) with ESMTP id 4350737B404
    for ; Sun, 10 Mar 2002 08:20:48 -0800 (PST)
Received: by tara.freenix.org (Postfix/TLS, from userid 101)
    id E72492AA3; Sun, 10 Mar 2002 17:20:46 +0100 (CET)
Date: Sun, 10 Mar 2002 17:20:46 +0100
From: Ollivier Robert
To: freebsd-fs@freebsd.org
Subject: Re: [reiserfs-list] Re: Reiserfs on Freebsd
Message-ID: <20020310162046.GA8717@tara.freenix.org>
Mail-Followup-To: freebsd-fs@freebsd.org
References: <20020301224616.A12630@deathsgate.demon.co.uk> <20020302070305.A15982@deathsgate.demon.co.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20020302070305.A15982@deathsgate.demon.co.uk>
User-Agent: Mutt/1.3.26i
X-Operating-System: FreeBSD 5.0-CURRENT K6-3D/266 & 2x PIII/800 SMP
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.org

According to Bradley Kite:
> Ack, I probably am approaching the problem with a little naivety,
> but this is one way of learning, and your feedback is much appreciated!!

Another way to approach this is to talk to Kirk about the journalling FFS Margo Seltzer wrote (Kirk submitted a paper about softupdates vs journalling at BSDcon in 2000) and get the source code.

--
Ollivier ROBERT -=- FreeBSD: The Power to Serve!
-=- roberto@keltia.freenix.fr
FreeBSD keltia.freenix.fr 5.0-CURRENT #80: Sun Jun 4 22:44:19 CEST 2000

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message

From owner-freebsd-fs Tue Mar 12 10:58:38 2002
Date: Tue, 12 Mar 2002 10:57:47 -0800 (PST)
From: AQUAMAN
To: freebsd-fs@FreeBSD.org
Subject: filesystems compatibility
Message-ID: <20020312185747.98993.qmail@web13305.mail.yahoo.com>

Hello

My question is the following:

I want to install debian, mandrake, redhat and freebsd at home, and a /home partition. All four operating systems should be able to modify the latter, so that I don't have to install a /home partition for each one of them.

I know that I have to install a filesystem that is compatible with all of them. Could you suggest an appropriate one?

Hewi Yoatl

=====
Triathletes do it 3 times!!!

From owner-freebsd-fs Tue Mar 12 11:25:16 2002
Date: Tue, 12 Mar 2002 11:25:03 -0800 (PST)
From: Hiten Pandya
Reply-To: hiten@uk.FreeBSD.org
To: AQUAMAN , freebsd-fs@FreeBSD.org
Subject: Re: filesystems compatibility
Message-ID: <20020312192503.2810.qmail@web21109.mail.yahoo.com>
In-Reply-To: <20020312185747.98993.qmail@web13305.mail.yahoo.com>

--- AQUAMAN wrote:
> Hello
>
> My question is the following:
>
> I want to install debian, mandrake, redhat and freebsd at home,
> and a /home partition. All four operating systems should be able
> to modify the latter, so that I don't have to install a /home
> partition for each one of them.
>
> I know that I have to install a filesystem that is
> compatible with all of them.
> Could you suggest an appropriate one?
>
> Hewi Yoatl

Hello Hewi,

This list is only for technical discussions. I would suggest that you ask this at freebsd-questions@FreeBSD.org, which will yield better responses.

Sorry, I can't answer your question, but a rough guess would be to use EXT2FS for your home partition.

Regards,

  -- Hiten Pandya
  --

From owner-freebsd-fs Tue Mar 12 13:28:15 2002
Date: Tue, 12 Mar 2002 13:26:59 -0800
From: Terry Lambert
To: AQUAMAN
Cc: freebsd-fs@FreeBSD.org
Subject: Re: filesystems compatibility
Message-ID: <3C8E72A3.6E9CBC6F@mindspring.com>
References: <20020312185747.98993.qmail@web13305.mail.yahoo.com>

AQUAMAN wrote:
> I want to install debian, mandrake, redhat and freebsd at home,
> and a /home partition. All four operating systems should be able
> to modify the latter, so that I don't have to install a /home
> partition for each one of them.
>
> I know that I have to install a filesystem that is
> compatible with all of them.
> Could you suggest an appropriate one?

You probably wanted to ask this in questions.

--

It's really hard to answer these kinds of questions exhaustively, since Linux has the bad habit of changing things about the on-disk layout of FS data, and not changing the name of the FS; there are at least six incompatible hacks on EXT2FS since the first EXT2FS, and knowing which one you have is an exercise in detective work.

The limiting factor is going to be the FSs that are read/write, that all the Linux distributions have in common, and that are also supported by FreeBSD.

I think the only one in common for all three Linux distributions, that doesn't have local hacks, will be EXT2FS. FreeBSD can read and write EXT2FS, as long as you aren't using local hacks (last time I checked this, a long time ago, I admit, FreeBSD did not support the RedHat hack for sparse superblocks, and neither did Debian).

-- Terry

From owner-freebsd-fs Wed Mar 13 4:08:18 2002
Date: Wed, 13 Mar 2002 15:08:13 +0300
From: "Parity Error"
Reply-To: "Parity Error"
To: freebsd-fs@FreeBSD.org
Subject: metadata update durability ordering/soft updates

With soft updates, metadata updates are delayed writes. I am wondering if, say, there are two independent structural changes, one after another, and then a crash happens.
Is there a possibility that the latter structural change got written to disk before the former due to some memory replacement policy?

Could this affect the correctness of some applications?

From owner-freebsd-fs Wed Mar 13 9:07:37 2002
Date: Wed, 13 Mar 2002 09:07:28 -0800
From: Alfred Perlstein
To: Parity Error
Cc: freebsd-fs@FreeBSD.org
Subject: Re: metadata update durability ordering/soft updates
Message-ID: <20020313170728.GM32410@elvis.mu.org>

* Parity Error [020313 04:08] wrote:
> With soft updates, metadata updates are delayed writes. I am wondering if, say,
> there
> are two independent structural changes, one after another, and then a crash
> happens.
> Is there a possibility that the latter structural change got written to disk
> before the
> former due to some memory replacement policy?
>
> Could this affect the correctness of some applications?

Of course! This happens with almost any filesystem. This is why you have fsync(2).

-Alfred

From owner-freebsd-fs Wed Mar 13 9:59:24 2002
Date: Wed, 13 Mar 2002 11:59:06 -0600 (CST)
From: Chris Dillon
Subject: CD-MRW a.k.a Mt. Rainier support

CC'd to freebsd-fs since this is somewhat fs-related...

Is anyone working on implementing support for CD-MRW (apparently included in MMC-3) into either the SCSI cd driver or the ATAPI cd driver? Where/how would be the best place to implement this so that it will work with either ATAPI or SCSI drives? Would implementing it in the SCSI cd driver be best, since we now have the option of using ATAPI drives with CAM?

In case anyone is wondering what CD-MRW (Mt. Rainier Re-Writable) is, it is a new standard (currently only available in the Yamaha CRW3200 series, that I know of), that allows on-the-fly transparent formatting, hardware defect management, and 2K-block logical addressing of CD-RW discs and specifies a specialized UDF filesystem to be used along with these hardware abilities. This will make drives supporting this standard act like a more traditional magnetic-media removable drive, thus greatly simplifying reading/writing to CD-RW discs.
Since MRW uses a new format it is not backwards compatible with any existing CD-RW formats, though it is possible to _read_ an MRW formatted disc in a regular drive with the proper software support. MRW uses UDF as its standard filesystem, which we do not yet support, though I envision using the hardware MRW support of the drive to put just about anything you want onto it, including FAT or UFS, to use it as a "regular" drive.

I'd love to take a shot at implementing this if someone isn't already, though I'll need to find the specs for the hardware side of Mt. Rainier. Apparently it is implemented in the new MMC-3 command set. Anyone have any pointers?

-- Chris Dillon - cdillon@wolves.k12.mo.us - cdillon@inter-linc.net
   FreeBSD: The fastest and most stable server OS on the planet
   - Available for IA32 (Intel x86) and Alpha architectures
   - IA64, PowerPC, UltraSPARC, and ARM architectures under development
   - http://www.freebsd.org

From owner-freebsd-fs Wed Mar 13 10:07:41 2002
Date: Wed, 13 Mar 2002 19:07:18 +0100
From: Thomas Quinot
Reply-To: thomas@cuivre.fr.eu.org
To: Chris Dillon
Cc: freebsd-scsi@freebsd.org, freebsd-fs@freebsd.org
Subject: Re: CD-MRW a.k.a Mt. Rainier support
Message-ID: <20020313190718.A3239@melusine.cuivre.fr.eu.org>

On 2002-03-13, Chris Dillon wrote:

> it will work with either ATAPI or SCSI drives? Would implementing it
> in the SCSI cd driver be best, since we now have the option of using
> ATAPI drives with CAM?

I'd say implement it in the SCSI cd driver, because this option allows you to support both proper SCSI devices and ATAPI units without duplicated code :).

Thomas.

-- Thomas.Quinot@Cuivre.FR.EU.ORG

From owner-freebsd-fs Wed Mar 13 11:01:18 2002
Date: Wed, 13 Mar 2002 11:00:52 -0800
From: Terry Lambert
To: Parity Error
Cc: freebsd-fs@FreeBSD.org
Subject: Re: metadata update durability ordering/soft updates
Message-ID: <3C8FA1E4.A89F52FF@mindspring.com>

Parity Error wrote:
> With soft updates, metadata updates are delayed writes. I am
> wondering if, say, there are two independent structural changes,
> one after another, and then a crash happens.
>
> Is there a possibility that the latter structural change got
> written to disk before the former due to some memory replacement
> policy?

Independent writes are independent, by definition. They are permitted to occur in either order.

Metadata updates are only ordered by soft updates insofar as necessary to satisfy dependencies. Thus independent writes can occur in any order, but will *usually* occur in order, due to the way that a scheduled write can not be reordered once it is given to the disk controller. This is due to a locking issue on the disk operations queue in the driver, and is arguably a bug. It's likely that some work currently in progress will forceed to the point that the "likely ordering" of independent operations will go away in the future, so you can't even safely depend on it being likely.

This is normally an issue only for updates that do things like update both an index and a record file, and imply a dependency order in the operation. In other words, there is implied metadata between the two files, and therefore an implied dependency.

It's the application's responsibility to signal the dependency to the OS, so that the updates are ordered. The normal way to do this is to use a two stage commit operation (per standard database theoury, Circa IBM, 1965). In UNIX this is done by requesting that the first operation be committed, before making the request to begin the second operation (e.g. a software barrier instruction).

To find out more about this, you should use "man fsync" and "man open" (in the "open" page, look for "O_FSYNC").
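
[Editorial sketch, not part of the original message.] The "commit the first operation before requesting the second" pattern described above for an index file and a record file might look roughly like this in Python on a POSIX system. The function name append_record, the helper _write_all, and the "offset length" index format are hypothetical and chosen only for illustration; os.fsync() is the Python wrapper around fsync(2).

```python
import os


def _write_all(fd, data):
    """os.write may write fewer bytes than requested; loop until done."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def append_record(data_path, index_path, payload):
    """Append payload to the record file, commit it, then index it.

    The index entry is only written after the record itself has been
    fsync'ed, so the index never points at data that was not committed.
    """
    # Stage 1: write the record and wait until it is committed.
    fd = os.open(data_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        offset = os.lseek(fd, 0, os.SEEK_END)
        _write_all(fd, payload)
        os.fsync(fd)  # barrier: do not start stage 2 before this returns
    finally:
        os.close(fd)

    # Stage 2: record where the payload lives (hypothetical text format).
    entry = f"{offset} {len(payload)}\n".encode("ascii")
    fd = os.open(index_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        _write_all(fd, entry)
        os.fsync(fd)
    finally:
        os.close(fd)
    return offset
```

This assumes a single writer. As the message goes on to say, the barrier is only as good as the drive's honesty about its write cache.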

As to misordering of dependent writes, even if you use synchronous I/O properly...

Yes, this can happen due to the memory replacement policy on many IDE hard drives, which lie about data having been committed to stable storage when in fact it has only been written to the disk write cache. That cache is far from stable storage: it is not battery backed, and it is not guaranteed to be written to the disk after a power failure, except on some IBM and Quantum drives which are no longer manufactured.

You can ensure this doesn't happen to you by using only disks which can correctly support cache flush primitives and tagged command queues, or by disabling write caching on the device. SCSI devices don't have this problem.

Another potential problem is that some IDE disks will acknowledge disabling write caching, but will in fact not disable it, no matter what commands you spit at them. For some of these disks, there are firmware updates available, but if you are unlucky enough to own one of these disks, then there is usually no option but to buy a good disk instead. May I recommend SCSI?

> Could this affect the correctness of some applications?

The disk caching issue could. The implied metadata could not.

If you have an application that uses implied metadata, but does not take the necessary steps for UNIX to ensure that the OS is signalled about the implied ordering dependency, then by definition, your application can't have its correctness affected... since it has no correctness to lose. 8-).

-- Terry

From owner-freebsd-fs Wed Mar 13 11:05:17 2002
Date: Wed, 13 Mar 2002 11:04:48 -0800
From: Terry Lambert
To: Parity Error , freebsd-fs@FreeBSD.org
Subject: Re: metadata update durability ordering/soft updates
Message-ID: <3C8FA2D0.4542C198@mindspring.com>
References: <3C8FA1E4.A89F52FF@mindspring.com>

Ugh. Being dyslexic sucks.

Terry Lambert wrote:
[ ... ]
> work currently in progress will forceed to the point that the

*proceed*

[ ... ]
> database theoury, Circa IBM, 1965). In UNIX this is done by

*theory*

-- Terry

From owner-freebsd-fs Thu Mar 14 1:36:13 2002
Date: Thu, 14 Mar 2002 12:35:52 +0300
From: "Parity Error"
Reply-To: "Parity Error"
To: "Terry Lambert"
Cc: freebsd-fs@FreeBSD.org
Subject: Re[2]: metadata update durability ordering/soft updates
In-Reply-To: <3C8FA1E4.A89F52FF@mindspring.com>

I am referring not to file data, but filesystem metadata, which is now _delayed_ write.

When we did synchronous writes to sequence multiple metadata updates belonging to one operation for ensuring recoverability of that one operation, we also got inter-operation ordering for free (and apps/users could have started depending on it).

Unix provides no guarantees regarding the order in which file data will become stable, and apps should use fsync/O_SYNC or logging or whatever to ensure the consistency of their data stores.

But the ordering in which different metadata operations become stable, if not enforced, could result in the following scenario.

    md a
    touch a/file{0,1}{0,1}{0,1}{0,1}
    md a/b
    touch a/b/file{0,1}{0,1}{0,1}{0,1}
    < a crash happens sometime later >

After recovery, it could turn out that all of a/b/file* are there, but only a few of a/file* are there (possibly those in the first dir block).

These kinds of things would not occur when we did synchronous writes of metadata (disk scheduling would not affect this). unlink could possibly produce even more dramatic effects. Now the question is whether this kind of behaviour from the filesystem is acceptable and whether some applications can actually fail badly due to this.

From owner-freebsd-fs Thu Mar 14 9:32:32 2002
Date: Thu, 14 Mar 2002 18:32:19 +0100
From: Christoph Hellwig
To: Terry Lambert
Cc: AQUAMAN , freebsd-fs@FreeBSD.ORG
Subject: Re: filesystems compatibility
Message-ID: <20020314183219.A28415@caldera.de>
References: <20020312185747.98993.qmail@web13305.mail.yahoo.com> <3C8E72A3.6E9CBC6F@mindspring.com>
In-Reply-To: <3C8E72A3.6E9CBC6F@mindspring.com>; from tlambert2@mindspring.com on Tue, Mar 12, 2002 at 01:26:59PM -0800

On Tue, Mar 12, 2002 at 01:26:59PM -0800, Terry Lambert wrote:
> It's really hard to answer these kinds of questions exhaustively,
> since Linux has the bad habit of changing things about the on
> disk layout of FS data, and not changing the name of the FS;
> there are at least six incompatible hacks on EXT2FS since the
> first EXT2FS, and knowing which one you have is an exercise in
> detective work.
>
> [snip]

Terry, could you _please_ check the facts before telling the world fs myths over and over?

Unlike FFS/UFS, which has at least a dozen incompatible derivatives, ext2 was designed with extensibility in mind. If you would care to actually look at the ext2 superblock definition you would notice two things:

 1) a revision level (s_rev_level)
 2) three sets of feature flags (s_feature_compat, s_feature_incompat, s_feature_ro_compat)

The first is used for global filesystem revisioning and so far has only two allowed values: EXT2_GOOD_OLD_REV for very, very old filesystems from the Linux 0.x days, and EXT2_DYNAMIC_REV, which is used for any current filesystem.

The feature flags (which did not exist in EXT2_GOOD_OLD_REV) allow fine-grained and backwards-compatible extension of the filesystem layout without messing up other implementations, as has happened with UFS.

The first set of flags, called 'compat', covers extensions that can be ignored by implementations that do not know about them; they have a meaning only for fsck. Examples are directory preallocation or the presence of a journal inode for the Linux 'ext3' driver.

The second set, called 'ro_compat', is for layout changes that can still be mounted r/o by old drivers. An example is the sparse superblocks introduced in Linux 2.2's ext2 driver.

The third set ('incompat') is for layout changes that need support from the driver for both reading and writing. Examples are the 4.4BSD-style dirent layout ext2 can optionally use, or a filesystem with an unrecovered log written by the Linux ext3 driver.

> I think the only one in common for all three Linux distributions,
> that doesn't have local hacks, will be EXT2FS. FreeBSD can read
> and write EXT2FS, as long as you aren't using local hacks (last
> time I checked this, a long time ago, I admit, FreeBSD did not
> support the RedHat hack for sparse superblocks, and neither did
> Debian).

Sparse superblocks is a feature introduced in Linux 2.2 and thus supported by all Linux distributions having 2.2 or newer kernels (including Debian potato/woody!). It was also backported to 2.0 and included in 2.0.39.
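
[Editorial sketch, not part of the original message.] The revision level and the three feature sets described above can be read straight out of the superblock. The following Python sketch is an illustration under stated assumptions: the field offsets (magic at byte 56, s_rev_level at 76, the three feature words at 92) and the bit values follow the Linux ext2/ext3 headers as the editor recalls them and should be checked against those headers. The function names and the "rw"/"ro"/"refuse" return values are hypothetical.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # primary superblock starts 1024 bytes into the device
SUPERBLOCK_SIZE = 1024
EXT2_MAGIC = 0xEF53
EXT2_GOOD_OLD_REV = 0
EXT2_DYNAMIC_REV = 1

# Only the features named in the message; values assumed from the Linux headers.
COMPAT_DIR_PREALLOC = 0x0001
COMPAT_HAS_JOURNAL = 0x0004
RO_COMPAT_SPARSE_SUPER = 0x0001
INCOMPAT_FILETYPE = 0x0002    # 4.4BSD-style dirent with a file type byte
INCOMPAT_RECOVER = 0x0004     # journal needs recovery


def read_superblock(path):
    """Read the raw primary superblock from a device or image file."""
    with open(path, "rb") as dev:
        dev.seek(SUPERBLOCK_OFFSET)
        return dev.read(SUPERBLOCK_SIZE)


def decode_superblock(sb):
    """Extract the revision level and the three feature words."""
    (magic,) = struct.unpack_from("<H", sb, 56)
    if magic != EXT2_MAGIC:
        raise ValueError("no ext2 magic number found")
    (rev,) = struct.unpack_from("<I", sb, 76)
    if rev == EXT2_GOOD_OLD_REV:
        # The feature fields did not exist in the old revision.
        compat = incompat = ro_compat = 0
    else:
        compat, incompat, ro_compat = struct.unpack_from("<III", sb, 92)
    return {"rev": rev, "compat": compat,
            "incompat": incompat, "ro_compat": ro_compat}


def mount_mode(info, supported_incompat, supported_ro_compat):
    """Decide how a driver that knows only the given bits may mount."""
    if info["incompat"] & ~supported_incompat:
        return "refuse"      # unknown incompat feature: cannot even read
    if info["ro_compat"] & ~supported_ro_compat:
        return "ro"          # unknown ro_compat feature: read-only is safe
    return "rw"              # unknown compat bits may simply be ignored
```

Under these assumptions, a driver that does not know the sparse_super bit would get "ro" for a filesystem that uses it, which is the behaviour the message describes for old drivers.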
Christoph -- Of course it doesn't work. We've performed a software upgrade. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Mar 14 12:48: 0 2002 Delivered-To: freebsd-fs@freebsd.org Received: from gull.prod.itd.earthlink.net (gull.mail.pas.earthlink.net [207.217.120.84]) by hub.freebsd.org (Postfix) with ESMTP id 0C83F37B402 for ; Thu, 14 Mar 2002 12:47:50 -0800 (PST) Received: from pool0226.cvx22-bradley.dialup.earthlink.net ([209.179.198.226] helo=mindspring.com) by gull.prod.itd.earthlink.net with esmtp (Exim 3.33 #1) id 16lc8O-0001xa-00; Thu, 14 Mar 2002 12:47:36 -0800 Message-ID: <3C910C57.71C2D823@mindspring.com> Date: Thu, 14 Mar 2002 12:47:19 -0800 From: Terry Lambert X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Parity Error Cc: freebsd-fs@FreeBSD.org Subject: Re: metadata update durability ordering/soft updates References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Parity Error wrote: > i am referring not to file data, but filesystem metadata, which > is now _delayed_ write. I understand this. Do you understand that delaying the metatadata writes in soft updates does not affect the dependency ordering, but may affect the time ordering? If I have two dependent lists of operations, A-B-C and D-B-E, then I am ony guaranteed that A and D will occur before B, and C andc E will occur after B, but there is no guarantee on the order of [A,D] vs. [D,A] or [C,E] vs. [E,C]. If I have to OTHER dependent lists of operations, Q-R and S-T, then I am only guaranteed that Q will occur before R, and S will occur before T, but there is no guarantee on the order of [ [Q,S], [Q,T], [R,S], [R,T] ] vs. 
[ [S,Q], [T,Q], [S,R], [T,R] ]; Q-R-S-T is a valid order, as is S-T-Q-R, as is [Q-S-T-R], as is [Q-S-R-T], etc.. > When we did synch write to sequence multiple metadata updates > belonging to one operation for ensuring recoverability of that > one operation, we also got inter-operation ordering for free Yes. > (and apps/users could have started depending on it) . No. Only misinformed users. The system *never* made *any* guarantees with regard to implied metadata. Your statement "multiple metadata updates belonging to one operation" is bogus. There is no such thing as "one operation" in this context. Multiple metadata updates are multiple operations, and the filesystem guarantees are only that the operations will not return to the user until they have completed in the guaranteed order, not that they have completed in any time relative order compared to each other. > Unix provides no guarantess reg the order in which file data > will become stable, and apps should use fsync/O_SYNC or logging > or whatever to ensure the consistency of their data stores. That's nice, but it's irrelevant to this discussion, since file data was never guaranteed for write anyway. THe reason the fsync/O_SYNC work to serialize the metadata operations is that the operations are guaranteed to occur using synchronous I/O, before they return. In other words, they are stall barriers instituted by the application programmer in order to get the behaviour the users ..."could have started depending on"... on purpose, rather than getting it as a result of an accident of the implementation of the underlying primitives. > But, the ordering in which different metadata operations becomes > stables, if not enforced could result in the following scenario. [ ... demonstration of failure of bogus assumptions ... ] Yes. Bogus assumptions are bogus. That's a circular argument. One must not make bogus assumptions, if one wants one's code to operate reliably. 
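The "stall barrier" use of fsync described above can be sketched as follows. This is a minimal illustration, not a statement of what any particular filesystem guarantees: create_stable() and create_in_order() are hypothetical helpers, and syncing the containing directory is an extra precaution assumed here for systems where fsync() on the file alone may not cover the directory entry.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Create `path` inside directory `dir` and do not return until both the
 * file and the directory have been passed through fsync().
 * Returns 0 on success, -1 on error.
 */
int create_stable(const char *dir, const char *path)
{
    assert(dir && path);

    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    if (close(fd) < 0)
        return -1;

    int dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    /* Some systems reject fsync() on a directory; treat that as benign. */
    if (rc < 0 && errno == EINVAL)
        rc = 0;
    close(dfd);
    return rc;
}

/*
 * Two otherwise independent creations, ordered on purpose: the second
 * is not even requested until the first has been synced.
 */
int create_in_order(const char *dir, const char *first, const char *second)
{
    if (create_stable(dir, first) < 0)
        return -1;
    return create_stable(dir, second);
}
```

The ordering here comes from the application waiting at each barrier, not from any promise about how the filesystem schedules independent metadata updates.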
Your example is poor, as well, unless you intended the "touch" operations to occur concurrently.

> These kind of things would not occur when we did synch write of
> metadata (disk scheduling would not affect this). unlink could
> possibly produce even more dramatic effects. Now the question is
> whether this kind of behaviour from the filesystem is acceptable
> and whether some applications can actually fail badly due to this.

A1: The behaviour is acceptable, since the behaviour guarantees for metadata stability are mandated by operational guarantees.

To boil this down to layman's language: the OS provides a set of services upon which reliable services can be built, if they are correctly engineered. It is up to the people building the layers of services on top of the OS services to provide those facilities that do not exist within the OS proper, such that they are reliable.

In other words, the purpose of the OS is to provide an unconstrained foundation. So long as you don't mount the FS in such a way that the metadata updates are not carried out in the correct order (e.g. async), you can create a system in which the ordering guarantees are maintained from end-to-end. Following a crash, you can then reliably know the state that you would have been in had you not crashed, and can recover by rolling the operation forward, if all necessary data is available, or backward, if it is not.

A2: Applications which expect behaviour other than that guaranteed by the API definitions can be expected to fail badly when their assumptions are proven to be unfounded in reality.

STANDARDS COMPLIANCE AND METADATA UPDATES, WITH A SURVEY OF OS/FS's

Certain metadata updates, such as those to ctime, mtime, and atime, are guaranteed by the POSIX standard. These, in turn, imply that the containers for these objects are similarly guaranteed, to the root operation, such that the guaranteed operations are always reliable.
Any OS which fails to make these guarantees is, by its definition, non-compliant with POSIX.

You can intentionally choose to operate certain filesystems in a POSIX-non-compliant mode; for example, you can use an MFS, or you can mount a filesystem async, such that metadata update guarantees required for conformance to the standard are not observed. But you knowingly give up standards compliance when you do this.

For example, Linux running EXT2FS mounted asynchronously fails to comply with the POSIX standard with regard to ctime, atime, and mtime updates, both because of the direct failure of such updates to be committed to stable storage, and because of the indirect failure of the updates to be committed, since the containers are not committed, thus making the containers in which the commits are taking place fail to comply with the definition of "stable storage".

Another example would be FreeBSD running FFS, if you went out of the way to mount it async, rather than sync (or with more recent installations, with soft updates). Similarly, mounting it noatime also fails this test.

If you were to mount a System V UFS in SVR4.2 by default, without specifying "sync" or "async", then you get a behaviour called DOW (Delayed Ordered Writes), in which an intentional stall point is inserted between dependency convergences. This is similar to soft updates, in that the stall point requires synchronization of the stable storage at the point where the intersection would occur, but it provides only non-commutability on non-commutable operations in a given edge, and does not permit reordering of associativity, even though operations are associative, and efficiency might be gained thereby. Thus the original A-B-C, D-B-E operation actually *must* occur in A-B B-E ordering, with a stall between the "B" and the "B".
This only coincidentally makes a *partial* ordering guarantee on the order of independent metadata updates -- so even here, you cannot rely on the system ordering independent updates, only on it being standards compliant in the API guarantees.

If you want this behaviour on Linux, ReiserFS uses the USL patented DOW technology without a license. If you are outside the US, and don't plan on selling into the US until at least 2018, you could use ReiserFS to get metadata update ordering within standards guaranteed operations, and it will only stall out as often as the SVR4.2 UFS with DOW. But you will have the same problem with your software that assumes -- incorrectly -- that serially requested independent metadata updates will take place serially... when, in fact, there is no such guarantee.

PS: FWIW, it's *possible* to generalize the soft updates mechanism to export a transactioning interface -- actually, a dependency edge that can be used to implement transactioning -- to user space. The effect of doing this would be to also export an edge of the dependency graph upward. For two independent graphs, implying an edge between the top nodes establishes a precedence order on completion, and therefore guarantees ordering of operations within a transaction.
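The effect of adding one edge between two independent dependency chains can be checked by brute force. The sketch below reuses the Q-R and S-T chains from earlier in this message and counts the completion orders that respect a given set of edges; the struct, the function names and the choice of R-before-S as the added edge are assumptions made purely for illustration.

```c
#include <assert.h>

enum { Q, R, S, T, NOPS };

/* "before must complete ahead of after". */
struct edge { int before, after; };

/* Returns 1 if the permutation order[] respects every edge. */
static int respects(const int order[NOPS], const struct edge *edges, int nedges)
{
    int pos[NOPS];

    for (int i = 0; i < NOPS; i++)
        pos[order[i]] = i;
    for (int e = 0; e < nedges; e++)
        if (pos[edges[e].before] > pos[edges[e].after])
            return 0;
    return 1;
}

/* Generate every permutation by swapping, counting the valid ones. */
static int count_from(int order[NOPS], int k,
                      const struct edge *edges, int nedges)
{
    if (k == NOPS)
        return respects(order, edges, nedges);

    int total = 0;
    for (int i = k; i < NOPS; i++) {
        int tmp = order[k]; order[k] = order[i]; order[i] = tmp;
        total += count_from(order, k + 1, edges, nedges);
        tmp = order[k]; order[k] = order[i]; order[i] = tmp;
    }
    return total;
}

/* Number of completion orders of Q, R, S, T allowed by the edges. */
int count_valid_orders(const struct edge *edges, int nedges)
{
    int order[NOPS] = { Q, R, S, T };

    assert(nedges >= 0);
    return count_from(order, 0, edges, nedges);
}
```

With only the edges Q-before-R and S-before-T, six of the 24 permutations are valid; adding the single edge R-before-S leaves exactly one, Q-R-S-T.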
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Mar 14 13:24: 8 2002 Delivered-To: freebsd-fs@freebsd.org Received: from avocet.prod.itd.earthlink.net (avocet.mail.pas.earthlink.net [207.217.120.50]) by hub.freebsd.org (Postfix) with ESMTP id 33E2437B416 for ; Thu, 14 Mar 2002 13:24:04 -0800 (PST) Received: from pool0226.cvx22-bradley.dialup.earthlink.net ([209.179.198.226] helo=mindspring.com) by avocet.prod.itd.earthlink.net with esmtp (Exim 3.33 #1) id 16lchY-0006th-00; Thu, 14 Mar 2002 13:23:56 -0800 Message-ID: <3C9114DA.5A2D0591@mindspring.com> Date: Thu, 14 Mar 2002 13:23:38 -0800 From: Terry Lambert X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Christoph Hellwig Cc: AQUAMAN , freebsd-fs@FreeBSD.ORG Subject: Re: filesystems compatibility References: <20020312185747.98993.qmail@web13305.mail.yahoo.com> <3C8E72A3.6E9CBC6F@mindspring.com> <20020314183219.A28415@caldera.de> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Christoph Hellwig wrote: > On Tue, Mar 12, 2002 at 01:26:59PM -0800, Terry Lambert wrote: > > It's really hard to answer these kinds of questions exhaustively, > > since Linux has the bad habit of changing things about the on > > disk layout of FS data, and not changing the name of the FS; > > there are at least six incompatible hacks on EXT2FS since the > > first EXT2FS, and knowing which one you have is an exercise in > > detective work. > > [snip] > > Terry, > > could you _please_ check the facts before you are going to tell the > world fs myth over and over? > > Unlike FFS/UFS which has at least a dozen incompatible derivates ext2 > was designed with extensibility in mind. 
If you would care to actually
> look at the ext2 superblock definition you would notice two things:
>
> 1) a revision level (s_rev_level)
> 2) 3 feature flags (s_feature_compat, s_feature_incompat,
> s_feature_ro_compat)

I am aware of this.

Perhaps, since you are knowledgeable in the Linux EXT2FS area, you can answer the rest of the original question, now that I've narrowed the answer to "some version of EXT2FS"?

--

What is the highest revision level, and what are the maximum feature flags that one can use interoperably between versions of RedHat, FreeBSD, Debian, and Mandrake?

> Sparse superblocks is a feature introduced in Linux 2.2 and thus
> supported by all Linux distributions having 2.2 or newer kernels
> (including Debian potatoe/woody!), which was also backported to 2.0
> and included in 2.0.39.

Sorry; he did not specify the version of Debian he was using (or the version of RedHat or Mandrake, or even FreeBSD for that matter). I'm guessing it will have to be "lowest revision" and "no feature flags".

PS: Different versions of FFS have different magic numbers; the original number was Kirk's birthday.

PPS: I'm more of an FFS maven than an EXT2FS maven; so I would be more likely to be able to tell you about FFS interoperability between systems.
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Mar 15 0:34:43 2002 Delivered-To: freebsd-fs@freebsd.org Received: from mail.fidnet.com (two.fidnet.com [216.229.64.72]) by hub.freebsd.org (Postfix) with SMTP id 68F7C37B419 for ; Fri, 15 Mar 2002 00:34:35 -0800 (PST) Received: (qmail 10788 invoked from network); 15 Mar 2002 08:34:34 -0000 Received: from beast.hexaneinc.com (HELO beast) (216.229.82.132) by two.fidnet.com with SMTP; 15 Mar 2002 08:34:34 -0000 From: "Matthew Rezny" To: "freebsd-fs@freebsd.org" Date: Fri, 15 Mar 2002 02:35:31 -0600 Reply-To: "Matthew Rezny" X-Mailer: PMMail 2000 Professional (2.10.2010) For Windows 2000 (5.0.2195;2) MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Subject: disks > 1TB Message-Id: <20020315083435.68F7C37B419@hub.freebsd.org> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org I just bought a 3ware 7810 controller and 8 160GB drives, which in RAID5 yields 1.04TB (real TB). Having previously seen statements that FFS limit is 64TB, I expected this to work. Unfortunately I found that the number of sectors becomes an issue. Looking through the mailing list history I see this has come up before and it will take a lot to solve, more than the spare time I have this weekend. The quick solution is make a 1TB filesystem and let the extra .04TB go to waste rather than try to patch the whole system. However, there is a slight problem with this, which is limits in the disklabel tool. The disklabel structure which is stored on disk uses u_int32_t for the number of sectors in the device. The disklabel tool uses int when interpretting all numbers in the getasciilabel() function. This limits disklabel to 1TB devices. 
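The 1TB figure follows directly from the sector arithmetic. A small sketch, assuming 512-byte sectors and a 32-bit int (both assumptions, as are the helper names, which are hypothetical and not taken from disklabel.c):

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u  /* assumption: 512-byte sectors */

/* Largest device size, in bytes, addressable with max_sectors sectors. */
uint64_t max_label_bytes(uint64_t max_sectors, uint32_t sector_size)
{
    assert(sector_size > 0);
    return max_sectors * sector_size;
}

/* Does a sector count survive being parsed into a signed 32-bit int? */
int fits_signed(uint64_t sectors)
{
    return sectors <= (uint64_t)INT32_MAX;
}

/* Does it fit the unsigned 32-bit field stored on disk? */
int fits_unsigned(uint64_t sectors)
{
    return sectors <= (uint64_t)UINT32_MAX;
}
```

max_label_bytes(INT32_MAX, SECTOR_SIZE) is 512 bytes short of 2^40 (1TB), while max_label_bytes(UINT32_MAX, SECTOR_SIZE) is 512 bytes short of 2^41 (2TB), so a device slightly larger than 1TB overflows the signed parse but still fits the on-disk field.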
If the declaration on line 964 of disklabel.c is changed from "int v" to "u_int32_t v" then this limit is lifted. This change is safe since the actual value on disk is unsigned. Using unsigned in the input allow disklabel to work with devices up to 2TB. This allows creation of 1TB slices on devices >1TB so that at least part can be used in the meantime while we wait for the limit to be lifted elsewhere in the system. Also, I've seen one mention of 4TB systems in the mailing list archives. How was this done? Kernel patches, other trickery? To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Mar 15 0:58:37 2002 Delivered-To: freebsd-fs@freebsd.org Received: from swan.prod.itd.earthlink.net (swan.mail.pas.earthlink.net [207.217.120.123]) by hub.freebsd.org (Postfix) with ESMTP id 0811B37B400 for ; Fri, 15 Mar 2002 00:58:34 -0800 (PST) Received: from pool0072.cvx40-bradley.dialup.earthlink.net ([216.244.42.72] helo=mindspring.com) by swan.prod.itd.earthlink.net with esmtp (Exim 3.33 #1) id 16lnXk-0000T2-00; Fri, 15 Mar 2002 00:58:33 -0800 Message-ID: <3C91B78A.686279D0@mindspring.com> Date: Fri, 15 Mar 2002 00:57:46 -0800 From: Terry Lambert X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Matthew Rezny Cc: "freebsd-fs@freebsd.org" Subject: Re: disks > 1TB References: <20020315083435.68F7C37B419@hub.freebsd.org> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Matthew Rezny wrote: > Also, I've seen one mention of 4TB systems in the mailing list > archives. How was this done? Kernel patches, other trickery? You can just put the FS on a raw device, without using a disklabel. 
Thus the disklabel limits don't come into play, though it does limit you to one FS per device. -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Mar 15 1:51: 3 2002 Delivered-To: freebsd-fs@freebsd.org Received: from pop3.psconsult.nl (ps226.psconsult.nl [193.67.147.226]) by hub.freebsd.org (Postfix) with ESMTP id 67B9737B400 for ; Fri, 15 Mar 2002 01:50:58 -0800 (PST) Received: (from paul@localhost) by pop3.psconsult.nl (8.9.2/8.9.2) id KAA79898; Fri, 15 Mar 2002 10:48:16 +0100 (CET) (envelope-from paul) Date: Fri, 15 Mar 2002 10:48:16 +0100 From: Paul Schenkeveld To: Matthew Rezny Cc: "freebsd-fs@freebsd.org" Subject: Re: disks > 1TB Message-ID: <20020315104815.A79816@psconsult.nl> References: <20020315083435.68F7C37B419@hub.freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0i In-Reply-To: <20020315083435.68F7C37B419@hub.freebsd.org>; from mrezny@umr.edu on Fri, Mar 15, 2002 at 02:35:31AM -0600 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Fri, Mar 15, 2002 at 02:35:31AM -0600, Matthew Rezny wrote: > I just bought a 3ware 7810 controller and 8 160GB drives, which in > RAID5 yields 1.04TB (real TB). Having previously seen statements that > FFS limit is 64TB, I expected this to work. Unfortunately I found that > the number of sectors becomes an issue. Looking through the mailing > list history I see this has come up before and it will take a lot to > solve, more than the spare time I have this weekend. The quick solution > is make a 1TB filesystem and let the extra .04TB go to waste rather > than try to patch the whole system. However, there is a slight problem > with this, which is limits in the disklabel tool. 
The disklabel > structure which is stored on disk uses u_int32_t for the number of > sectors in the device. The disklabel tool uses int when interpretting > all numbers in the getasciilabel() function. This limits disklabel to > 1TB devices. If the declaration on line 964 of disklabel.c is changed > from "int v" to "u_int32_t v" then this limit is lifted. This change is > safe since the actual value on disk is unsigned. Using unsigned in the > input allow disklabel to work with devices up to 2TB. This allows > creation of 1TB slices on devices >1TB so that at least part can be > used in the meantime while we wait for the limit to be lifted elsewhere > in the system. Did you try to divide the disk in two FreeBSD slices using fdisk? The numbers in disklabel are relative to the fdisk slice so your xx0s1c partition is the same size as the fdisk slice. > Also, I've seen one mention of 4TB systems in the mailing list > archives. How was this done? Kernel patches, other trickery? -- Paul Schenkeveld, Consultant PSconsult ICT Services BV To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Mar 15 2: 1:14 2002 Delivered-To: freebsd-fs@freebsd.org Received: from ns.caldera.de (ns.caldera.de [212.34.180.1]) by hub.freebsd.org (Postfix) with ESMTP id 2E18D37B404 for ; Fri, 15 Mar 2002 02:01:09 -0800 (PST) Received: (from hch@localhost) by ns.caldera.de (8.11.6/8.11.6) id g2FA0xS32700; Fri, 15 Mar 2002 11:00:59 +0100 Date: Fri, 15 Mar 2002 11:00:59 +0100 From: Christoph Hellwig To: Terry Lambert Cc: AQUAMAN , freebsd-fs@FreeBSD.ORG Subject: Re: filesystems compatibility Message-ID: <20020315110059.A32509@caldera.de> References: <20020312185747.98993.qmail@web13305.mail.yahoo.com> <3C8E72A3.6E9CBC6F@mindspring.com> <20020314183219.A28415@caldera.de> <3C9114DA.5A2D0591@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i 
In-Reply-To: <3C9114DA.5A2D0591@mindspring.com>; from tlambert2@mindspring.com on Thu, Mar 14, 2002 at 01:23:38PM -0800 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org

On Thu, Mar 14, 2002 at 01:23:38PM -0800, Terry Lambert wrote:
> Perhaps, since you are knowledgeable in the Linux EXT2FS area,
> you can answer the rest of the original question, now that I've
> narrowed the answer to "some version of EXT2FS"?
>
> --
>
> What is the highest revision level, and what are the maximum
> feature flags that one can use interoperably between versions
> of RedHat, FreeBSD, Debian, and Mandrake?

I don't have all those Linux distributions handy, but as the feature flags didn't change between the Linux 2.2 and 2.4 releases I'll just use generic 2.2/2.4 kernels.

Linux 2.4.18 [ext3 driver] (include/linux/ext3_fs.h):

#define EXT3_FEATURE_COMPAT_SUPP	0
#define EXT3_FEATURE_INCOMPAT_SUPP	(EXT3_FEATURE_INCOMPAT_FILETYPE| \
					 EXT3_FEATURE_INCOMPAT_RECOVER)
#define EXT3_FEATURE_RO_COMPAT_SUPP	(EXT3_FEATURE_RO_COMPAT_SPARSE_SUPER| \
					 EXT3_FEATURE_RO_COMPAT_LARGE_FILE| \
					 EXT3_FEATURE_RO_COMPAT_BTREE_DIR)

Linux 2.4.18 (include/linux/ext2_fs.h):

#define EXT2_FEATURE_COMPAT_SUPP	0
#define EXT2_FEATURE_INCOMPAT_SUPP	EXT2_FEATURE_INCOMPAT_FILETYPE
#define EXT2_FEATURE_RO_COMPAT_SUPP	(EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER| \
					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE| \
					 EXT2_FEATURE_RO_COMPAT_BTREE_DIR)

Linux 2.2.18 (include/linux/ext2_fs.h):

#define EXT2_FEATURE_COMPAT_SUPP	0
#define EXT2_FEATURE_INCOMPAT_SUPP	EXT2_FEATURE_INCOMPAT_FILETYPE
#define EXT2_FEATURE_RO_COMPAT_SUPP	(EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER| \
					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE| \
					 EXT2_FEATURE_RO_COMPAT_BTREE_DIR)

FreeBSD-stable (sys/gnu/ext2fs/ext2_fs.h):

#define EXT2_FEATURE_COMPAT_SUPP	0
#define EXT2_FEATURE_INCOMPAT_SUPP	EXT2_FEATURE_INCOMPAT_FILETYPE
#ifdef notyet
#define EXT2_FEATURE_RO_COMPAT_SUPP	(EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER| \
					 EXT2_FEATURE_RO_COMPAT_LARGE_FILE| \
					 EXT2_FEATURE_RO_COMPAT_BTREE_DIR)
#else
#define EXT2_FEATURE_RO_COMPAT_SUPP	EXT2_FEATURE_RO_COMPAT_SPARSE_SUPER
#endif

So all of them support revision 1 filesystems, no compat flag, the incompatible 4.4BSD-style dirent and sparse superblocks. Linux 2.2/2.4 additionally support large files and compatibility for the never released (!) btree directory support. The Linux ext3 driver also supports filesystems that need a log replay - for other drivers this will already have been cleared by a fsck run.

> PS: Different versions of FFS have different magic numbers;
> the original number was Kirk's birthday.

Only very few FFS derivatives have different major numbers, in fact I only know of SVR4.2MP SFS and various HP versions. On the other hand Solaris/Solaris-i386/4.4BSD/OpenStep seem to have the same one and are _very_ incompatible.

> PPS: I'm more of an FFS maven than an EXT2FS maven; so I would
> be more likely to be able to tell you about FFS interoperability
> between systems.

Thanks, I have enough of it after implementing SVR4.2MP UFS and SFS support for Linux..

Christoph

-- 
Of course it doesn't work. We've performed a software upgrade.
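The "lowest common denominator" conclusion above is just a bitwise AND over the supported masks. A minimal C sketch, assuming the bit values below (recalled from the Linux headers, so worth checking) and using a hypothetical feature_support struct and common_features() helper:

```c
#include <assert.h>
#include <stdint.h>

/* Feature bits (values recalled from ext2_fs.h / ext3_fs.h). */
#define FEATURE_INCOMPAT_FILETYPE       0x0002
#define FEATURE_INCOMPAT_RECOVER        0x0004
#define FEATURE_RO_COMPAT_SPARSE_SUPER  0x0001
#define FEATURE_RO_COMPAT_LARGE_FILE    0x0002
#define FEATURE_RO_COMPAT_BTREE_DIR     0x0004

/* Hypothetical record of what one implementation supports. */
struct feature_support {
    const char *name;
    uint32_t compat;
    uint32_t incompat;
    uint32_t ro_compat;
};

/* The four *_SUPP sets listed in the message above. */
static const struct feature_support impls[] = {
    { "Linux 2.4.18 ext3", 0,
      FEATURE_INCOMPAT_FILETYPE | FEATURE_INCOMPAT_RECOVER,
      FEATURE_RO_COMPAT_SPARSE_SUPER | FEATURE_RO_COMPAT_LARGE_FILE |
      FEATURE_RO_COMPAT_BTREE_DIR },
    { "Linux 2.4.18 ext2", 0, FEATURE_INCOMPAT_FILETYPE,
      FEATURE_RO_COMPAT_SPARSE_SUPER | FEATURE_RO_COMPAT_LARGE_FILE |
      FEATURE_RO_COMPAT_BTREE_DIR },
    { "Linux 2.2.18 ext2", 0, FEATURE_INCOMPAT_FILETYPE,
      FEATURE_RO_COMPAT_SPARSE_SUPER | FEATURE_RO_COMPAT_LARGE_FILE |
      FEATURE_RO_COMPAT_BTREE_DIR },
    { "FreeBSD-stable ext2fs", 0, FEATURE_INCOMPAT_FILETYPE,
      FEATURE_RO_COMPAT_SPARSE_SUPER },
};

/* Features that every one of the n implementations supports. */
struct feature_support common_features(const struct feature_support *impl,
                                       int n)
{
    struct feature_support common = { "common", ~0u, ~0u, ~0u };

    assert(impl && n > 0);
    for (int i = 0; i < n; i++) {
        common.compat    &= impl[i].compat;
        common.incompat  &= impl[i].incompat;
        common.ro_compat &= impl[i].ro_compat;
    }
    return common;
}
```

Calling common_features(impls, 4) yields no compat bits, the filetype dirent as the only incompat bit and sparse superblocks as the only ro_compat bit, matching the summary in the message.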
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Mar 15 6:59:13 2002 Delivered-To: freebsd-fs@freebsd.org Received: from helen.CS.Berkeley.EDU (helen.CS.Berkeley.EDU [128.32.131.251]) by hub.freebsd.org (Postfix) with ESMTP id 0202B37B41B for ; Fri, 15 Mar 2002 06:56:59 -0800 (PST) Received: (from jmacd@localhost) by helen.CS.Berkeley.EDU (8.9.1a/8.9.1) id GAA10649; Fri, 15 Mar 2002 06:56:51 -0800 (PST) Message-ID: <20020315065651.02637@helen.CS.Berkeley.EDU> Date: Fri, 15 Mar 2002 06:56:51 -0800 From: Josh MacDonald To: Terry Lambert , Parity Error Cc: freebsd-fs@FreeBSD.ORG, reiserfs-dev@namesys.com Subject: Re: metadata update durability ordering/soft updates References: <3C910C57.71C2D823@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.89.1 In-Reply-To: <3C910C57.71C2D823@mindspring.com>; from Terry Lambert on Thu, Mar 14, 2002 at 12:47:19PM -0800 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Quoting Terry Lambert (tlambert2@mindspring.com): > Parity Error wrote: > > i am referring not to file data, but filesystem metadata, which > > is now _delayed_ write. > > I understand this. Do you understand that delaying the metatadata > writes in soft updates does not affect the dependency ordering, but > may affect the time ordering? > > If I have two dependent lists of operations, A-B-C and D-B-E, > then I am ony guaranteed that A and D will occur before B, > and C andc E will occur after B, but there is no guarantee on > the order of [A,D] vs. [D,A] or [C,E] vs. [E,C]. > > If I have to OTHER dependent lists of operations, Q-R and S-T, > then I am only guaranteed that Q will occur before R, and S > will occur before T, but there is no guarantee on the order of > [ [Q,S], [Q,T], [R,S], [R,T] ] vs. 
[ [S,Q], [T,Q], [S,R], [T,R] ]; > Q-R-S-T is a valid order, as is S-T-Q-R, as is [Q-S-T-R], as is > [Q-S-R-T], etc.. > > > When we did synch write to sequence multiple metadata updates > > belonging to one operation for ensuring recoverability of that > > one operation, we also got inter-operation ordering for free > > Yes. > > > (and apps/users could have started depending on it) . > > No. Only misinformed users. The system *never* made *any* > guarantees with regard to implied metadata. Your statement > "multiple metadata updates belonging to one operation" is > bogus. There is no such thing as "one operation" in this > context. Multiple metadata updates are multiple operations, > and the filesystem guarantees are only that the operations > will not return to the user until they have completed in > the guaranteed order, not that they have completed in any > time relative order compared to each other. > > > > Unix provides no guarantess reg the order in which file data > > will become stable, and apps should use fsync/O_SYNC or logging > > or whatever to ensure the consistency of their data stores. > > That's nice, but it's irrelevant to this discussion, since > file data was never guaranteed for write anyway. > > THe reason the fsync/O_SYNC work to serialize the metadata > operations is that the operations are guaranteed to occur > using synchronous I/O, before they return. > > In other words, they are stall barriers instituted by the > application programmer in order to get the behaviour the > users ..."could have started depending on"... on purpose, > rather than getting it as a result of an accident of the > implementation of the underlying primitives. > > > But, the ordering in which different metadata operations becomes > > stables, if not enforced could result in the following scenario. > > [ ... demonstration of failure of bogus assumptions ... ] > > Yes. Bogus assumptions are bogus. That's a circular argument. 
> One must not make bogus assumptions, if one wants one's code > to operate reliably. > > Your example is poor, as well, unless you intended the "touch" > operations to occur concurrently. > > > > These kind of things would not occur when we did synch write of > > metadata (disk scheduling would not affect this). unlink could > > possibly produce even more dramatic effects. Now the question is > > whether this kind of behaviour from the filesystem is acceptable > > and whether some applications can actually fail badly due to this. > > A1: The behaviour is acceptable, since the behaviour guarantees > for metadata stability are mandated by operational guarantees. > > To boils this down to laymans language: the OS provides a set of > services upon which reliable services can be built, if they are > correctly engineered. It is up to the people building the layers > of services on top of the OS services to provide those facilities > that do not exist within the OS proper, such that they are reliable. > > In other words, the purpose of the OS is to provide an unconstrained > foundation. So long as you don't mount the FS in such a way that > the metadata updates are not carried out in the correct order, (e.g. > async), then you can create a system in which the ordering guarantees > are maintained from end-to-end, and you can reliably know the state > that you would have been in had you not crashed, following a crash, > and can recover by rolling the operation forward, if all necessary > data is available, or backward, if it is not. > > > A2: Applications which expect behaviour other than that guaranteed > by the API definitions can be expected to fail badly when their > assumptions are proven to be unfounded in reality. > > > STANDARDS COMPLIANCE AND METADATA UPDATES, WITH A SURVEY OF OS/FS's > > Certaint metadata updates, such as those to ctime, mtime, and > atime, are guaranteed by the POSIX standard. 
These, in turn, imply > that the containers for these objects are similarly guaranteed, to > the root operation, such that the guaranteed operations are always > reliable. Any OS which fails to make these guarantees is, by its > definition, non-compliant with POSIX. > > You can intentionally choose to operate certain filesystems in a > POSIX-non-compliant mode; for example, you can use an MFS, or you > can mount a filesystem async, such that metatadata update guarantees > required for conformance to the standard are not observed. But you > knowingly give up standards compliance when you do this. > > For example, Linux running EXT2FS mounted asynchronously fails > to comply with the POSIX standard with regard to update of ctime, > atime, and mtime updates, both because of the direct failure for > such updates to be committed to stable storage, and because of the > indirect failure of the updates to be committed, since the containers > are not committed, thus making the containers in which the commits > are taking place fail to comply with the definition of "stable > storage". > > Another example would be FreeBSD running FFS, if you went out of the > way to mount it async, rather than sync (or with more recent > installations, with soft updates). Similarly, mounting it noatime > also fails this test. > > If you were to mount a System V UFS in SVR4.2 by default, without > specifying "sync" or "async", then you get a behaviour called DOW > (Delayed Ordered Writes), in which an intentionally stall point is > inserted between dependeny convergences. THis is similar to soft > updates, in that the stall point requires synchronization of the > stable storage at the point where the intersection would occur, but > it provides only non-commutability on non-commutable operations in > a given edge, and does not permit reordering of associativity, even > though operations are associative, and effeciency might be gained, > thereby. 
Thus the original A-B-C, D-B-E operation actually *must* > occur in A-B B-E ordering, with a stall between the "B" and the "B". > This only coincidently makes a *partial* ordering guarantee on the > order of independent metadata updates -- so even here, you can not > rely on the system ordering independent updates, only on it being > standards compliant in the API guarantees. > > If you want this behaviour on Linux, ReiserFS uses the USL patented > DOW technology without a license. If you are outside the US, and > don't plan on selling into the US until at least 2018, you could > use ReiserFS to get metadata update ordering withing standards > guaranteed operations, and it will only stall out as often as the > SVR4.2 UFS with DOW. But you will have the same problem with your > software that assumes -- incorrectly -- that serially requested > independent metadata updates will take place serially... when, in > fact, there is no such guarantee. Terry, I'm not sure what you're talking about with regards to DOW and ReiserFS. It doesn't sound right, and I'm pretty sure we're not using anything like the patented DOW technique as you've described it. We are developing a transaction facility for many of the reasons suggested at by the original post in this thread. To summarize: - The file system has never made any guarantees. - You can use fsync() to stabilize a single file and its metadata dependencies. - You can use two-phase commit above and beyond that. - If you're not doing the right thing, "then by definition, your application can't have it's correctness effected... since it has no correctness to lose." - And, "the OS provides a set of services upon which reliable services can be built, if they are correctly engineered." All of these statements are true. 
Your attitude seems to be that this is a fine state of affairs, that anyone who writes an application should be fully informed of all these "transactional" issues, and that anyone who is not fully informed of all these issues is a complete moron if they expect to write reliable applications.

The problem is that you're asking way too much of the average programmer, who doesn't understand transactions and isn't aware of how little the operating system actually guarantees in this regard. The other problem is that fsync() and two-phase-commit can seriously limit application performance, unless you use highly sophisticated techniques, which again rules out the average programmer.

The fact is, it is very difficult to write "reliable services" on top of the standard primitives, and it is not good enough to call people morons if they don't understand this.

There is a document describing our transactions design for ReiserFS version 4, which is currently under development:

http://namesys.com/txn-doc.html

And somewhat off topic, I have demonstrated that using fsync() and rename() as a means for reliable, atomic file updates can seriously limit application performance and that having file system transactions solves the problem. My point is that applications will perform better, not worse, if the operating system helps construct reliable services instead of this do-it-yourself approach.

Master's thesis:

http://prdownloads.sourceforge.net/xdelta/xdfs.pdf

and the graph that shows it all:

http://www.cs.berkeley.edu/~jmacd/xdfs-vs-rcs.eps

Regards,

-josh

-- 
PRCS version control system http://sourceforge.net/projects/prcs
Xdelta storage & transport http://sourceforge.net/projects/xdelta
Need a concurrent skip list?
http://sourceforge.net/projects/skiplist To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Mar 15 7:53:45 2002 Delivered-To: freebsd-fs@freebsd.org Received: from ns.caldera.de (ns.caldera.de [212.34.180.1]) by hub.freebsd.org (Postfix) with ESMTP id 2645D37B41F for ; Fri, 15 Mar 2002 07:53:35 -0800 (PST) Received: (from hch@localhost) by ns.caldera.de (8.11.6/8.11.6) id g2FFrOL17729; Fri, 15 Mar 2002 16:53:24 +0100 Date: Fri, 15 Mar 2002 16:53:24 +0100 From: Christoph Hellwig To: Terry Lambert Cc: Parity Error , freebsd-fs@FreeBSD.ORG Subject: Re: metadata update durability ordering/soft updates Message-ID: <20020315165324.A17467@caldera.de> References: <3C910C57.71C2D823@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3C910C57.71C2D823@mindspring.com>; from tlambert2@mindspring.com on Thu, Mar 14, 2002 at 12:47:19PM -0800 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Thu, Mar 14, 2002 at 12:47:19PM -0800, Terry Lambert wrote: > If you want this behaviour on Linux, ReiserFS uses the USL patented > DOW technology without a license. Reiserfs is a typical journaling filesystem in that it writes logical log records to either an inline log or (in recent versions) an extern log device. Ext3 uses physical block based journaling and allows additional tracking of data blocks in a way only remotely similar to DOW (the data=ordered mode). Christoph -- Of course it doesn't work. We've performed a software upgrade. 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Mar 15 7:58:32 2002 Delivered-To: freebsd-fs@freebsd.org Received: from snipe.prod.itd.earthlink.net (snipe.mail.pas.earthlink.net [207.217.120.62]) by hub.freebsd.org (Postfix) with ESMTP id 55E4637B402 for ; Fri, 15 Mar 2002 07:58:28 -0800 (PST) Received: from pool0389.cvx22-bradley.dialup.earthlink.net ([209.179.199.134] helo=mindspring.com) by snipe.prod.itd.earthlink.net with esmtp (Exim 3.33 #1) id 16lu5u-0002zE-00; Fri, 15 Mar 2002 07:58:14 -0800 Message-ID: <3C921A04.CFCADA9D@mindspring.com> Date: Fri, 15 Mar 2002 07:57:56 -0800 From: Terry Lambert X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Christoph Hellwig Cc: AQUAMAN , freebsd-fs@FreeBSD.ORG Subject: Re: filesystems compatibility References: <20020312185747.98993.qmail@web13305.mail.yahoo.com> <3C8E72A3.6E9CBC6F@mindspring.com> <20020314183219.A28415@caldera.de> <3C9114DA.5A2D0591@mindspring.com> <20020315110059.A32509@caldera.de> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Christoph Hellwig wrote: > > PS: Different versions of FFS have different magic numbers; > > the original number was Kirk's birthday. > > Only very few FFS derivates have different major numbers, infact I only > know of SVR4.2MP SFS and various HP versions. On the other hand > Solaris/Solaris-i386/4.4BSD/OpenStep seem to have the same one and are > _very_ incompatible. 8-). Common mistake. They have opposite word order, so the version number is different. They also have different VTOC and disklabel order, so they're easy to differentiate anyway. 
> > PPS: I'm more of an FFS maven than an EXT2FS maven; so I would > > be more likely to be able to tell you about FFS interoperability > > between systems. > > Thanks, I have enough of it after implementing SVR4.2MP UFS and SFS > support for Linux.. Heh. I did some work on UFS for SVR4.2MP on SVR4.2MP, and a did everything for a derivative called NXFS (the magic number on that one is _my_ birthday). ;^). -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Mar 15 8: 4: 8 2002 Delivered-To: freebsd-fs@freebsd.org Received: from snipe.prod.itd.earthlink.net (snipe.mail.pas.earthlink.net [207.217.120.62]) by hub.freebsd.org (Postfix) with ESMTP id 0CECB37B402 for ; Fri, 15 Mar 2002 08:04:05 -0800 (PST) Received: from pool0389.cvx22-bradley.dialup.earthlink.net ([209.179.199.134] helo=mindspring.com) by snipe.prod.itd.earthlink.net with esmtp (Exim 3.33 #1) id 16luBU-0003GI-00; Fri, 15 Mar 2002 08:04:01 -0800 Message-ID: <3C921B5F.E19B89CD@mindspring.com> Date: Fri, 15 Mar 2002 08:03:43 -0800 From: Terry Lambert X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Christoph Hellwig Cc: AQUAMAN , freebsd-fs@FreeBSD.ORG Subject: Re: filesystems compatibility References: <20020312185747.98993.qmail@web13305.mail.yahoo.com> <3C8E72A3.6E9CBC6F@mindspring.com> <20020314183219.A28415@caldera.de> <3C9114DA.5A2D0591@mindspring.com> <20020315110059.A32509@caldera.de> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Christoph Hellwig wrote: > So all support revision 1 filesystems, no compat flag, the incompatible > 4.4BSD-style dirent and sparse superblocks. Linux 2.2/2.4 support > large files and compatiblity for the never released (!) 
btree directory > support. The Linux ext3 driver also supports filesystems that need a > log replay - for other drivers this will already be cleared by a fsck run. By the way, in case it wasn't implicitly obvious: thanks for the research. I was pretty sure that the gating factor would be either the Mandrake or the FreeBSD EXT2FS features. I guess the answer (which we already knew) is that he's going to have to use the most downrev of the three to implement the EXT2FS support, though I'm still not clear if that's FreeBSD or one of the Linux versions he's running. Is it possible to create an EXT2FS with the lowest common denominator on a modern Linux by specifying the right command line arguments to the FS creation tool under Linux? It's been quite a while since I've done other than look over Linux kernel code (I used to contribute fixes for things like memory leaks in the path component lookup failure case, via a friend of mine with more influence over there, and that was about 3 years ago; I still read the kernel, but have stopped reading the userland). 
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Mar 15 8:12:19 2002 Delivered-To: freebsd-fs@freebsd.org Received: from ns.caldera.de (ns.caldera.de [212.34.180.1]) by hub.freebsd.org (Postfix) with ESMTP id C807137B42A for ; Fri, 15 Mar 2002 08:11:55 -0800 (PST) Received: (from hch@localhost) by ns.caldera.de (8.11.6/8.11.6) id g2FGBpI18746; Fri, 15 Mar 2002 17:11:51 +0100 Date: Fri, 15 Mar 2002 17:11:51 +0100 From: Christoph Hellwig To: Terry Lambert Cc: AQUAMAN , freebsd-fs@FreeBSD.ORG Subject: Re: filesystems compatibility Message-ID: <20020315171151.A18291@caldera.de> References: <20020312185747.98993.qmail@web13305.mail.yahoo.com> <3C8E72A3.6E9CBC6F@mindspring.com> <20020314183219.A28415@caldera.de> <3C9114DA.5A2D0591@mindspring.com> <20020315110059.A32509@caldera.de> <3C921B5F.E19B89CD@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3C921B5F.E19B89CD@mindspring.com>; from tlambert2@mindspring.com on Fri, Mar 15, 2002 at 08:03:43AM -0800 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Fri, Mar 15, 2002 at 08:03:43AM -0800, Terry Lambert wrote: > Is it possible to create an EXT2FS with the lowest common > denominator on a modern Linux by specifying the right command > line arguments to the FS creation tool under Linux? mke2fs -O none. 
mke2fs(8) is your friend :) To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Mar 15 8:14:20 2002 Delivered-To: freebsd-fs@freebsd.org Received: from ns.caldera.de (ns.caldera.de [212.34.180.1]) by hub.freebsd.org (Postfix) with ESMTP id 5637B37B400 for ; Fri, 15 Mar 2002 08:14:16 -0800 (PST) Received: (from hch@localhost) by ns.caldera.de (8.11.6/8.11.6) id g2FGECQ19100; Fri, 15 Mar 2002 17:14:12 +0100 Date: Fri, 15 Mar 2002 17:14:12 +0100 From: Christoph Hellwig To: Terry Lambert Cc: AQUAMAN , freebsd-fs@FreeBSD.ORG Subject: Re: filesystems compatibility Message-ID: <20020315171412.A18753@caldera.de> References: <20020312185747.98993.qmail@web13305.mail.yahoo.com> <3C8E72A3.6E9CBC6F@mindspring.com> <20020314183219.A28415@caldera.de> <3C9114DA.5A2D0591@mindspring.com> <20020315110059.A32509@caldera.de> <3C921A04.CFCADA9D@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3C921A04.CFCADA9D@mindspring.com>; from tlambert2@mindspring.com on Fri, Mar 15, 2002 at 07:57:56AM -0800 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Fri, Mar 15, 2002 at 07:57:56AM -0800, Terry Lambert wrote: > 8-). Common mistake. They have opposite word order, so the > version number is different. They also have different VTOC > and disklabel order, so they're easy to differentiate anyway. At least SVR4.2MP and 4.4BSD run on LE and BE hardware, and the Linux UFS driver supports both endianesses and about 10 different derivates.. VTOC handling is done by Linux between the block drivers and the filesystem which has advantages by e.g. sharing SysV VTOC support for sysvfs, ufs and vxfs and cannot easily accessed by the filesystem due to layering constraints. > Heh. 
I did some work on UFS for SVR4.2MP on SVR4.2MP, and a > did everything for a derivative called NXFS (the magic number > on that one is _my_ birthday). ;^). That NetWare-Attributes thingy? *shrug* Christoph -- Of course it doesn't work. We've performed a software upgrade. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Mar 15 10:26: 2 2002 Delivered-To: freebsd-fs@freebsd.org Received: from gull.prod.itd.earthlink.net (gull.mail.pas.earthlink.net [207.217.120.84]) by hub.freebsd.org (Postfix) with ESMTP id 7C54A37B404 for ; Fri, 15 Mar 2002 10:25:46 -0800 (PST) Received: from pool0371.cvx22-bradley.dialup.earthlink.net ([209.179.199.116] helo=mindspring.com) by gull.prod.itd.earthlink.net with esmtp (Exim 3.33 #1) id 16lwOa-0004ld-00; Fri, 15 Mar 2002 10:25:40 -0800 Message-ID: <3C923C91.454D7710@mindspring.com> Date: Fri, 15 Mar 2002 10:25:21 -0800 From: Terry Lambert X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Josh MacDonald Cc: Parity Error , freebsd-fs@FreeBSD.ORG, reiserfs-dev@namesys.com Subject: Re: metadata update durability ordering/soft updates References: <3C910C57.71C2D823@mindspring.com> <20020315065651.02637@helen.CS.Berkeley.EDU> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Josh MacDonald wrote: > Terry, > > I'm not sure what you're talking about with regards to DOW and > ReiserFS. It doesn't sound right, and I'm pretty sure we're not using > anything like the patented DOW technique as you've described it. 
As usual, the patent claims are general enough to cover things;
see:

	US: 5666532
	US: 5642501

Here is the USPTO patent number search engine:

	http://164.195.100.11/netahtml/srchnum.htm

> We are developing a transaction facility for many of the reasons
> suggested at by the original post in this thread.

Yes, I understand.

> To summarize:
>
> - The file system has never made any guarantees.

Yes it has.  If you look at the atime/mtime/ctime update
requirements for the OS, they are pretty blatant.  They
just aren't enough to be able to blindly use them.

> - You can use fsync() to stabilize a single file and its metadata
>   dependencies.

Metadata stabilization should be automatic.  What an fsync
there does is really enforce ordering on metadata writes,
by acting as a barrier.

> - You can use two-phase commit above and beyond that.

Yes, by implementing on top of fsync.

> - If you're not doing the right thing, "then by definition, your
>   application can't have it's correctness effected... since it has no
>   correctness to lose."

Yes.

> - And, "the OS provides a set of services upon which reliable services
>   can be built, if they are correctly engineered."

Yes.

> All of these statements are true.  Your attitude seems to be that this
> is a fine state of affairs, that anyone who writes an application
> should be fully informed of all these "transactional" issues, and that
> anyone who is not fully informed of all these issues is a complete
> moron if they expect to write reliable applications.

No, I merely expect that a person who claims to be a craftsman
should know his tools.

> The problem is that you're asking way too much of the average
> programmer, who doesn't understand transactions and isn't aware of how
> little the operating system actually guarantees in this regard.

It's not a problem with what I'm asking, or a problem with
what the OS guarantees, it's a problem with "average programmers".
BTW, I would disagree; I don't think that average programmers
are that badly informed.  If they are, then a CS degree is
meaningless.

> The other problem is that fsync() and two-phase-commit can seriously
> limit application performance, unless you use highly sophisticated
> techniques, which again rules out the average programmer.

"Correct, fast, cheap.  Pick two."

> The fact is, it is very difficult to write "reliable services" on top
> of the standard primitives, and it is not good enough to call people
> morons if they don't understand this.

8-).  My gut reaction was to write:

	You're right.  We must also be compassionate, and
	train them how to properly ask ``Would you like
	fries with that?''.

Frankly, I don't think it's possible to child-proof any
career choice to the point that anyone can come in with
zero assumptions or talent, and be productive.

I personally have very little tolerance for people who get
into any career field because of the money, rather than
genuine interest in the field.  It's my considered opinion
that these people will not last out the next downturn,
whenever that happens, and the world will not be a poorer
place for it when they go off chasing the (then) more
lucrative rewards in another field.

Frankly, rewards are something that comes because of the
work you do, not because of where you do it.  It's like
searching for the contact lens that you lost in the alley
under the street-lamp "because the light is better".

The whole dot-bomb thing happened because people wanted to
be rewarded commensurate with their job titles, rather than
with their actual contributions to society (at large, or in
the small of the company in which they were operating).

If you think I regret the people with cardboard "will program
for food" signs, think again.
I might as well regret "Winter"
for the effect it has on species survivability of tropical
plants foolish people attempt to grow outdoors, in Ontario,
Canada, or regret the effect that "Afternoon" has on Morning
Glories.

> There is a document describing our transactions design for ReiserFS
> version 4, which is currently under development:
>
>   http://namesys.com/txn-doc.html

I've read it.  I don't disagree, for that application domain,
which is certainly a subset of all possible application
domains (e.g. I'd never use transactions on a Usenet server).

And just having it there doesn't mean that unclued people
will automatically use it, if it requires explicit invocation.

> And somewhat off topic, I have demonstrated that using fsync() and
> rename() as a means for reliable, atomic file updates can seriously
> limit application performance and that having file system transactions
> solves the problem.  My point is that applications will perform
> better, not worse, if the operating system helps construct reliable
> services instead of this do-it-yourself approach.

That's true as well, at least for applications that require
that.

I think your "average programmer too uninformed to know about
building reliability from primitives" will be using those
primitives, though, so long as they are optional.  I also
think making them non-optional is an error, in that it would
perpetuate assumptions in that environment which were not
valid to make in all environments.  A scientist, even a
computer scientist, needs to learn to think, and that involves
coming at problems from first principles, among other things.

FWIW: I pointed out that Soft Updates can be generalized to
export a transaction interface, with a trivial amount of
work, precisely because of the performance issues with
barriers.  That's the same reason I pointed out that DOW is
inferior to Soft Updates: it introduces a draining barrier
that interferes with concurrency.
Given the patent status of DOW, it's really amazing to me that anyone would not opt for Soft Updates in any contest between the two for the ecological niche they fill, particularly in new work. > Master's thesis: > > http://prdownloads.sourceforge.net/xdelta/xdfs.pdf > > and the graph that shows it all: > > http://www.cs.berkeley.edu/~jmacd/xdfs-vs-rcs.eps Thanks for these references; I'll download them now, and read them later today. -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Mar 15 10:39:45 2002 Delivered-To: freebsd-fs@freebsd.org Received: from ns.caldera.de (ns.caldera.de [212.34.180.1]) by hub.freebsd.org (Postfix) with ESMTP id 71B0337B404 for ; Fri, 15 Mar 2002 10:39:38 -0800 (PST) Received: (from hch@localhost) by ns.caldera.de (8.11.6/8.11.6) id g2FIciS26567; Fri, 15 Mar 2002 19:38:44 +0100 Date: Fri, 15 Mar 2002 19:38:44 +0100 From: Christoph Hellwig To: Terry Lambert Cc: Josh MacDonald , Parity Error , freebsd-fs@FreeBSD.ORG, reiserfs-dev@namesys.com Subject: Re: metadata update durability ordering/soft updates Message-ID: <20020315193844.A26441@caldera.de> References: <3C910C57.71C2D823@mindspring.com> <20020315065651.02637@helen.CS.Berkeley.EDU> <3C923C91.454D7710@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3C923C91.454D7710@mindspring.com>; from tlambert2@mindspring.com on Fri, Mar 15, 2002 at 10:25:21AM -0800 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Fri, Mar 15, 2002 at 10:25:21AM -0800, Terry Lambert wrote: > > - The file system has never made any guarantees. > > Yes it has. If you look at the atime/mtime/ctime update > requirements for the OS, they are pretty blatant. 
THey > just aren't enough to be able to blindly use them. These requirements are only there for O_SYNC. > > - You can use fsync() to stabilize a single file and its metadata > > dependencies. > > Metadata stabilization should be automatic. What an fsync > there does is really enforce ordering on metadata writes, > by acting as a barrier. Why do you think there is fdatasync() (and O_DSYNC)? To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Mar 15 12: 3:58 2002 Delivered-To: freebsd-fs@freebsd.org Received: from avocet.prod.itd.earthlink.net (avocet.mail.pas.earthlink.net [207.217.120.50]) by hub.freebsd.org (Postfix) with ESMTP id 3C0DF37B402 for ; Fri, 15 Mar 2002 12:03:52 -0800 (PST) Received: from pool0434.cvx21-bradley.dialup.earthlink.net ([209.179.193.179] helo=mindspring.com) by avocet.prod.itd.earthlink.net with esmtp (Exim 3.33 #1) id 16lxvR-000170-00; Fri, 15 Mar 2002 12:03:42 -0800 Message-ID: <3C925387.2DC4F2C0@mindspring.com> Date: Fri, 15 Mar 2002 12:03:19 -0800 From: Terry Lambert X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Christoph Hellwig Cc: Josh MacDonald , Parity Error , freebsd-fs@FreeBSD.ORG, reiserfs-dev@namesys.com Subject: Re: metadata update durability ordering/soft updates References: <3C910C57.71C2D823@mindspring.com> <20020315065651.02637@helen.CS.Berkeley.EDU> <3C923C91.454D7710@mindspring.com> <20020315193844.A26441@caldera.de> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Christoph Hellwig wrote: > On Fri, Mar 15, 2002 at 10:25:21AM -0800, Terry Lambert wrote: > > > - The file system has never made any guarantees. > > > > Yes it has. 
If you look at the atime/mtime/ctime update
> > requirements for the OS, they are pretty blatant.  They
> > just aren't enough to be able to blindly use them.
>
> These requirements are only there for O_SYNC.

POSIX 1003.1, clauses 2.3.5 and 5.6.6.2 distinguish between
"SHALL be marked for update" and "SHALL be updated" with
regard to the ctime, mtime, and atime values for a file,
which are FS metadata.  See also 5.5.3.2.  The relevant
phrases are:

	2.3.5	[ ... ] All fields that are marked for update
		SHALL be updated when the file is no longer
		open by any process, or when a stat() or
		fstat() is performed on the file.  Other times
		at which updates are done are unspecified.

	5.6.6.2	[ ... ] The utime() function sets the access
		and modification times of the named file.

	5.5.3.2	[ ... ] Upon successful completion, the
		rename() function SHALL mark for update the
		st_ctime and st_mtime fields of the parent
		directory of each file.

The getdirentries update semantics (SHALL update) and the
metadata modifications (SHALL update) are pretty unambiguous,
as well.

The Single UNIX Specification has similar controls on the
marking for update in write, mmap, and other cases.

The POSIX requirements are stiffer because of VMS, where
directories were not implemented as files.  I used to dislike
it, but way back then, I was just starting out as a student,
and didn't realize the transactional implications.

The Single UNIX Specification also fails to specify things
like the underlying system call(s) used to implement
directory traversal.  POSIX, however, specifies that the atime
"SHALL be updated" (as opposed to merely marked for update).
We got around this requirement on one project I was on by not
using the behaviour specified system call interface to read
the directory contents, and declaring that directories were
not regular files for the FS in question.

> > > - You can use fsync() to stabilize a single file and its metadata
> > >   dependencies.
> >
> > Metadata stabilization should be automatic.
What an fsync
> > there does is really enforce ordering on metadata writes,
> > by acting as a barrier.
>
> Why do you think there is fdatasync() (and O_DSYNC)?

Linux?

It used to be called "O_WRITESYNC" back in the mid 1980's.
The idea that an FS would not order your metadata for you,
yet you would still have integrity requirements in such an
environment, was simply unthinkable.

The O_DSYNC came about because people invented the concept
of unsynchronized metadata, which led to the idea that it
should be possible to separately cause data and metadata
synchronization.

IMO, there's really no excuse for unsynchronized metadata,
and synchronous data writes exist only to avoid the system
call overhead of separately calling fsync(), and the OS
overhead of having to synchronize all dirty pages instead
of a region, based on the descriptor being used for the
operation.

You can make the same argument in FreeBSD actually: msync()
doesn't limit itself to the range specified for the backing
object, because it can't tell (there are no reverse maps);
last time I looked at msync() in Linux and Solaris, it was
true in those places, too.
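For readers who want the fsync()/rename() update pattern from this thread in concrete form, here is a minimal sketch in Python. It is an illustration under stated assumptions, not anyone's actual implementation: `atomic_replace` and its `data_only` flag are hypothetical names, the temporary file is assumed to live in the same directory as the target so that rename() stays within one filesystem, and the final directory fsync() is an assumption about what a given filesystem needs before the rename itself is durable.

```python
import os
import tempfile


def atomic_replace(path, data, data_only=False):
    """Replace `path` with `data` so a reader sees the old or the new file.

    Hypothetical helper: write a temporary file, force it to stable
    storage, then rename() it over the target.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()  # drain Python's userspace buffer first
            if data_only and hasattr(os, "fdatasync"):
                # Data plus only the metadata needed to read it back.
                os.fdatasync(f.fileno())
            else:
                # Data and file metadata; this is the "barrier" step.
                os.fsync(f.fileno())
        os.rename(tmp, path)  # atomic replacement on POSIX systems
    except BaseException:
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    # Assumption: syncing the parent directory makes the rename durable.
    dfd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

Each call pays for at least two synchronous flushes, which is the kind of per-update cost the performance complaint earlier in the thread is about.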
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Mar 15 12: 8:15 2002 Delivered-To: freebsd-fs@freebsd.org Received: from mail.fidnet.com (one.fidnet.com [216.229.64.71]) by hub.freebsd.org (Postfix) with SMTP id A945C37B400 for ; Fri, 15 Mar 2002 12:08:12 -0800 (PST) Received: (qmail 29310 invoked from network); 15 Mar 2002 20:08:05 -0000 Received: from beast.hexaneinc.com (HELO beast) (216.229.82.132) by one.fidnet.com with SMTP; 15 Mar 2002 20:08:04 -0000 From: "Matthew Rezny" To: "Paul Schenkeveld" Cc: "freebsd-fs@freebsd.org" Date: Fri, 15 Mar 2002 14:09:06 -0600 Reply-To: "Matthew Rezny" X-Mailer: PMMail 2000 Professional (2.10.2010) For Windows 2000 (5.0.2195;2) In-Reply-To: <20020315104815.A79816@psconsult.nl> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Subject: Re: disks > 1TB Message-Id: <20020315200812.A945C37B400@hub.freebsd.org> Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org I haven't tried that. I think I need to clarify my points. 1) disklabel has a limit on how large a volume it can handle because it uses a signed variable to temporarily store a value that ultimately goes into an unsigned storage location. I see this as something that should be fixed in the source tree since its a change to one line of code to remove an annoyance that more people will soon run into given the increases in cheap storage. 2) The rest of the OS has similar problems using signed where unneeded which limits addressable disk space to 1TB. This is something that will begin to be a problem and need to be worked on. The quick fix is switch all to unsigned and raise the limit to 2TB. 
The long term solution is to change it all to 64-bit, but that
changes the size of everything stored, and so that change would take
a lot more work to ensure that it doesn't cause problems with
alignment and storage.

On Fri, 15 Mar 2002 10:48:16 +0100, Paul Schenkeveld wrote:

>On Fri, Mar 15, 2002 at 02:35:31AM -0600, Matthew Rezny wrote:
>> I just bought a 3ware 7810 controller and 8 160GB drives, which in
>> RAID5 yields 1.04TB (real TB). Having previously seen statements that
>> FFS limit is 64TB, I expected this to work. Unfortunately I found that
>> the number of sectors becomes an issue. Looking through the mailing
>> list history I see this has come up before and it will take a lot to
>> solve, more than the spare time I have this weekend. The quick solution
>> is make a 1TB filesystem and let the extra .04TB go to waste rather
>> than try to patch the whole system. However, there is a slight problem
>> with this, which is limits in the disklabel tool. The disklabel
>> structure which is stored on disk uses u_int32_t for the number of
>> sectors in the device. The disklabel tool uses int when interpreting
>> all numbers in the getasciilabel() function. This limits disklabel to
>> 1TB devices. If the declaration on line 964 of disklabel.c is changed
>> from "int v" to "u_int32_t v" then this limit is lifted. This change is
>> safe since the actual value on disk is unsigned. Using unsigned in the
>> input allows disklabel to work with devices up to 2TB. This allows
>> creation of 1TB slices on devices >1TB so that at least part can be
>> used in the meantime while we wait for the limit to be lifted elsewhere
>> in the system.
>
>Did you try to divide the disk in two FreeBSD slices using fdisk?
>The numbers in disklabel are relative to the fdisk slice so your
>xx0s1c partition is the same size as the fdisk slice.
>
>> Also, I've seen one mention of 4TB systems in the mailing list
>> archives. How was this done? Kernel patches, other trickery?
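The 1TB and 2TB figures in this message follow from simple arithmetic on 32-bit sector counts. The sketch below works them out in Python; it assumes 512-byte sectors, which the message does not state, and `as_int32` is a hypothetical helper that only models two's-complement wraparound. How disklabel actually mishandles an oversized value is not shown in the thread.

```python
SECTOR_BYTES = 512          # assumption: 512-byte sectors
INT32_MAX = 2**31 - 1       # largest count a signed 32-bit int can hold
UINT32_MAX = 2**32 - 1      # largest count a u_int32_t can hold
TIB = 2**40                 # one binary terabyte ("real TB")


def as_int32(n):
    """Reinterpret the low 32 bits of n as a signed value (model only)."""
    return (n + 2**31) % 2**32 - 2**31


# Largest device, in bytes, addressable with each counter type.
signed_limit = INT32_MAX * SECTOR_BYTES      # just under 1 TiB
unsigned_limit = UINT32_MAX * SECTOR_BYTES   # just under 2 TiB

# Hypothetical volume slightly larger than 1 TiB.
sectors = (TIB + 40 * 2**30) // SECTOR_BYTES

print(signed_limit / TIB, unsigned_limit / TIB)
print(sectors <= UINT32_MAX, as_int32(sectors) < 0)
```

In this model the sector count of a volume just over 1 TiB still fits the unsigned on-disk field but turns negative when viewed as a signed int, which is consistent with the claim that changing the one declaration lifts the tool's limit to 2TB.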
> >-- >Paul Schenkeveld, Consultant >PSconsult ICT Services BV To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Mar 15 12:56:38 2002 Delivered-To: freebsd-fs@freebsd.org Received: from roc-24-169-102-121.rochester.rr.com (216-42-72-146.ppp.netsville.net [216.42.72.146]) by hub.freebsd.org (Postfix) with ESMTP id CC20437B41B for ; Fri, 15 Mar 2002 12:56:32 -0800 (PST) Received: from localhost ([127.0.0.1] helo=tiny) by roc-24-169-102-121.rochester.rr.com with esmtp (Exim 3.16 #4) id 16lyUC-0001tB-00; Fri, 15 Mar 2002 15:39:36 -0500 Date: Fri, 15 Mar 2002 15:39:36 -0500 From: Chris Mason To: Terry Lambert , Josh MacDonald Cc: Parity Error , freebsd-fs@FreeBSD.ORG, reiserfs-dev@namesys.com Subject: Re: [reiserfs-dev] Re: metadata update durability ordering/soft updates Message-ID: <1562810000.1016224776@tiny> In-Reply-To: <3C923C91.454D7710@mindspring.com> References: <3C910C57.71C2D823@mindspring.com> <20020315065651.02637@helen.CS.Berkeley.EDU> <3C923C91.454D7710@mindspring.com> X-Mailer: Mulberry/2.1.0 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Friday, March 15, 2002 10:25:21 AM -0800 Terry Lambert wrote: > Josh MacDonald wrote: >> Terry, >> >> I'm not sure what you're talking about with regards to DOW and >> ReiserFS. It doesn't sound right, and I'm pretty sure we're not using >> anything like the patented DOW technique as you've described it. 
> > As usual, the patent claims are general enough to cover things; > see: > > US: 5666532 > US: 5642501 > > Here is the USPTO patent number search engine: > > http://164.195.100.11/netahtml/srchnum.htm > I haven't read the entire patent, but maybe you can point me to the paragraphs where it covers write-ahead logging in the description. Durning any operation, no attempt at all is made to order the writing of the bitmap, the inode, the directory entries, or any other part of the metadata. It simply makes sure that after a crash the operations are either completed or not. If you mkdir foo and then mkdir foo2, it is entirely possible the blocks for foo2 go to disk first. The reiserfs log is also not a generic system module loosely coupled from with the rest of the filesystem. -chris To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Mar 15 16: 9:51 2002 Delivered-To: freebsd-fs@freebsd.org Received: from albatross.prod.itd.earthlink.net (albatross.mail.pas.earthlink.net [207.217.120.120]) by hub.freebsd.org (Postfix) with ESMTP id 2587837B400 for ; Fri, 15 Mar 2002 16:09:48 -0800 (PST) Received: from pool0278.cvx21-bradley.dialup.earthlink.net ([209.179.193.23] helo=mindspring.com) by albatross.prod.itd.earthlink.net with esmtp (Exim 3.33 #1) id 16m1lO-0005n4-00; Fri, 15 Mar 2002 16:09:34 -0800 Message-ID: <3C928D21.404EA11D@mindspring.com> Date: Fri, 15 Mar 2002 16:09:05 -0800 From: Terry Lambert X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Chris Mason Cc: Josh MacDonald , Parity Error , freebsd-fs@FreeBSD.ORG, reiserfs-dev@namesys.com Subject: Re: [reiserfs-dev] Re: metadata update durability ordering/soft updates References: <3C910C57.71C2D823@mindspring.com> <20020315065651.02637@helen.CS.Berkeley.EDU> <3C923C91.454D7710@mindspring.com> <1562810000.1016224776@tiny> Content-Type: text/plain; charset=us-ascii 
Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Chris Mason wrote: > I haven't read the entire patent, but maybe you can point me to the > paragraphs where it covers write-ahead logging in the description. A subset of writes to secondary storage are performed using a Delayed Ordered Write (DOW) subsystem, which makes it possible for any file system to control the order in which modifications are propagated to disk. The DOW subsystem consists of two parts. The first part is a specification interface, which a file system implementation or any other kernel subsystem can use to indicate sequential ordering between a modification and some other modification of file system structural data. This is the write-ahead log. The only difference is where it's stored: in memory or on disk. The second part of DOW subsystem is a mechanism that ensures that the disk write operations are indeed performed in accordance with the order store. DOW improves computer system performance by reducing disk traffic as well as the number of context switches that would be generated if synchronous writes were used for ordering. See also claims 1, 6, 23, and 44. > During any operation, no attempt at all is made to order the writing > of the bitmap, the inode, the directory entries, or any other part of > the metadata. It simply makes sure that after a crash the operations > are either completed or not. If you mkdir foo and then mkdir foo2, > it is entirely possible the blocks for foo2 go to disk first. I didn't say it infringed Soft Updates (which does this), I said it infringed DOW (which doesn't). Soft Updates aren't infringible, in any case, since they are not patented. > The reiserfs log is also not a generic system module loosely coupled > with the rest of the filesystem. 
The patent claims are generic enough that they could cover either case. Software patents are process patents, not performance patents. -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Mar 15 16:17: 0 2002 Delivered-To: freebsd-fs@freebsd.org Received: from albatross.prod.itd.earthlink.net (albatross.mail.pas.earthlink.net [207.217.120.120]) by hub.freebsd.org (Postfix) with ESMTP id E60F937B41F for ; Fri, 15 Mar 2002 16:16:41 -0800 (PST) Received: from pool0278.cvx21-bradley.dialup.earthlink.net ([209.179.193.23] helo=mindspring.com) by albatross.prod.itd.earthlink.net with esmtp (Exim 3.33 #1) id 16m1sB-0006rf-00; Fri, 15 Mar 2002 16:16:36 -0800 Message-ID: <3C928EC6.14363297@mindspring.com> Date: Fri, 15 Mar 2002 16:16:06 -0800 From: Terry Lambert X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Chris Mason , Josh MacDonald , Parity Error , freebsd-fs@FreeBSD.ORG, reiserfs-dev@namesys.com Subject: Re: [reiserfs-dev] Re: metadata update durability ordering/soft updates References: <3C910C57.71C2D823@mindspring.com> <20020315065651.02637@helen.CS.Berkeley.EDU> <3C923C91.454D7710@mindspring.com> <1562810000.1016224776@tiny> <3C928D21.404EA11D@mindspring.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Terry Lambert wrote: > This is the write-ahead log. The only difference is where it's > stored: in memory or on disk. [ ... ] > See also claims 1, 6, 23, and 44. If this is still confusing, consider whether or not you would have to cite this patent if you were filing a patent for what ReiserFS does (I think the answer is "yes"). 
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Mar 15 18:26:12 2002 Delivered-To: freebsd-fs@freebsd.org Received: from magic.adaptec.com (magic.adaptec.com [208.236.45.80]) by hub.freebsd.org (Postfix) with ESMTP id 0000137B430; Fri, 15 Mar 2002 18:25:50 -0800 (PST) Received: from redfish.adaptec.com (redfish.adaptec.com [162.62.50.11]) by magic.adaptec.com (8.10.2+Sun/8.10.2) with ESMTP id g2G2Poj27257; Fri, 15 Mar 2002 18:25:50 -0800 (PST) Received: from btc.btc.adaptec.com (btc.btc.adaptec.com [162.62.64.10]) by redfish.adaptec.com (8.8.8+Sun/8.8.8) with ESMTP id SAA22321; Fri, 15 Mar 2002 18:25:49 -0800 (PST) Received: from hollin.btc.adaptec.com (hollin [162.62.149.56]) by btc.btc.adaptec.com (8.8.8+Sun/8.8.8) with ESMTP id TAA17335; Fri, 15 Mar 2002 19:25:47 -0700 (MST) Received: (from scottl@localhost) by hollin.btc.adaptec.com (8.11.6/8.11.6) id g2G2NZM00263; Fri, 15 Mar 2002 19:23:35 -0700 (MST) (envelope-from scottl) Date: Fri, 15 Mar 2002 19:02:26 -0700 From: Scott Long To: Chris Dillon Cc: freebsd-scsi@freebsd.org, freebsd-fs@freebsd.org Subject: Re: CD-MRW a.k.a Mt. Rainier support Message-ID: <20020316020226.GA12097@bunsenhoneydew.btc.adaptec.com> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.3.28i Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Wed, Mar 13, 2002 at 11:59:06AM -0600, Chris Dillon wrote: > > CC'd to freebsd-fs since this is somewhat fs-related... > > Is anyone working on implementing support for CD-MRW (apparently > included in MMC-3) into either the SCSI cd driver or the ATAPI cd > driver? Where/how would be the best place to implement this so that > it will work with either ATAPI or SCSI drives? 
Would implementing it > in the SCSI cd driver be best, since we now have the option of using > ATAPI drives with CAM? > > In case anyone is wondering what CD-MRW (Mt. Rainier Re-Writable) is, > it is a new standard (currently only available in the Yamaha CRW3200 > series, that I know of), that allows on-the-fly transparent > formatting, hardware defect management, and 2K-block logical > addressing of CD-RW discs and specifies a specialized UDF filesystem > to be used along with these hardware abilities. This will make drives > supporting this standard act like a more traditional magnetic-media > removable drive, thus greatly simplifying reading/writing to CD-RW > discs. Since MRW uses a new format it is not backwards compatible > with any existing CD-RW formats, though it is possible to _read_ a MRW > formatted disc in a regular drive with the proper software support. > MRW uses UDF as its standard filesystem, which we do not yet support, > though I envision using the hardware MRW support of the drive to put > just about anything you want onto it, including FAT or UFS, to use it > as a "regular" drive. > > I'd love to take a shot at implementing this if someone isn't already, > though I'll need to find the specs for the hardware side of Mt. > Rainier. Apparently it is implemented in the new MMC-3 command set. > Anyone have any pointers? This drive sounds very interesting. Unfortunately, until the standard becomes ubiquitous, any UDF implementation will still need to understand read-modify-write and sparing tables. UDF is the natural format for it since with removable media you want interchangeability with other systems, but there should be nothing stopping you from putting UFS on it too. I've already started a UDF implementation for FreeBSD. Patches for 5.0-CURRENT can be found at http://people.freebsd.org/~scottl, along with a link for slightly older -STABLE patches. 
The current status is that CD-RWs and DVD-ROMs can be read (though Sparing Tables are still missing for CD-RW), and once I've cleaned up and filled in the code some more, I intend for it to go into 5.0-RELEASE. I'd welcome any help on the project, especially if someone wants to tackle the writing support. Scott To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Mar 16 9:17:56 2002 Delivered-To: freebsd-fs@freebsd.org Received: from roc-24-169-102-121.rochester.rr.com (216-42-72-146.ppp.netsville.net [216.42.72.146]) by hub.freebsd.org (Postfix) with ESMTP id 9C76637B402 for ; Sat, 16 Mar 2002 09:17:47 -0800 (PST) Received: from localhost ([127.0.0.1] helo=tiny) by roc-24-169-102-121.rochester.rr.com with esmtp (Exim 3.16 #4) id 16mHn8-0003VC-00; Sat, 16 Mar 2002 12:16:26 -0500 Date: Sat, 16 Mar 2002 12:16:26 -0500 From: Chris Mason To: Terry Lambert Cc: Josh MacDonald , Parity Error , freebsd-fs@FreeBSD.ORG, reiserfs-dev@namesys.com Subject: Re: [reiserfs-dev] Re: metadata update durability ordering/soft updates Message-ID: <1714680000.1016298986@tiny> In-Reply-To: <3C928D21.404EA11D@mindspring.com> References: <3C910C57.71C2D823@mindspring.com> <20020315065651.02637@helen.CS.Berkeley.EDU> <3C923C91.454D7710@mindspring.com> <1562810000.1016224776@tiny> <3C928D21.404EA11D@mindspring.com> X-Mailer: Mulberry/2.1.0 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org On Friday, March 15, 2002 04:09:05 PM -0800 Terry Lambert wrote: > Chris Mason wrote: >> I haven't read the entire patent, but maybe you can point me to the >> paragraphs where it covers write-ahead logging in the description. 
> > A subset of writes to secondary storage are performed > using a Delayed Ordered Write (DOW) subsystem, which > makes it possible for any file system to control the > order in which modifications are propagated to disk. > The DOW subsystem consists of two parts. The first > part is a specification interface, which a file system > implementation or any other kernel subsystem can use > to indicate sequential ordering between a modification > and some other modification of file system structural > data. > > This is the write-ahead log. The only difference is where it's > stored: in memory or on disk. Well, I'm certainly not a patent lawyer, but the way they define this interface seems very different from the way reiserfs works. The interface a) is not available to the kernel or any other subsystem and b) does not define ordering. It defines atomic units consisting of multiple operations. > The second part of DOW subsystem is a mechanism that > ensures that the disk write operations are indeed > performed in accordance with the order store. DOW > improves computer system performance by reducing disk > traffic as well as the number of context switches > that would be generated if synchronous writes were > used for ordering. > > See also claims 1, 6, 23, and 44. Claim 44 is probably the most difficult, although I think this: "where said common writes and said function calls have common order dependencies CD1, CD2, . . . , CDcd that preserve the update order dependencies D1, D2, . . . , Dd between the operations in the requests, where cd is an integer, " Restricts it to systems that preserve the ordering of the requests inside the combined common write. In other words, if I batch mkdir foo ; mkdir foo2 into a common write, I think it says that mkdir foo will be done first. > > >> During any operation, no attempt at all is made to order the writing >> of the bitmap, the inode, the directory entries, or any other part of >> the metadata. 
It simply makes sure that after a crash the operations >> are either completed or not. If you mkdir foo and then mkdir foo2, >> it is entirely possible the blocks for foo2 go to disk first. > > I didn't say it infringed Soft Updates (which does this), I > said it infringed DOW (which doesn't). I got that idea from this paragraph: "DOW includes two parts. The first part is an interface by which file system implementations, or any kernel subsystem, specify the sequences in which modifications of file system data blocks can be recorded on disks. These sequences translate into ordering dependencies among disk blocks themselves, which are collectively represented by an ordering graph (entries in an ordering store), prepared by DOW in response to the specification." If this has been discussed in detail already, please drop me a link to the mailing list archive. -chris To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sat Mar 16 13:41:40 2002 Delivered-To: freebsd-fs@freebsd.org Received: from avocet.prod.itd.earthlink.net (avocet.mail.pas.earthlink.net [207.217.120.50]) by hub.freebsd.org (Postfix) with ESMTP id 6E4E537B43B for ; Sat, 16 Mar 2002 13:41:36 -0800 (PST) Received: from dialup-209.245.143.72.dial1.sanjose1.level3.net ([209.245.143.72] helo=mindspring.com) by avocet.prod.itd.earthlink.net with esmtp (Exim 3.33 #1) id 16mLvY-0007DT-00; Sat, 16 Mar 2002 13:41:25 -0800 Message-ID: <3C93BBF1.7E8801DF@mindspring.com> Date: Sat, 16 Mar 2002 13:41:05 -0800 From: Terry Lambert X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Chris Mason Cc: Josh MacDonald , Parity Error , freebsd-fs@FreeBSD.ORG, reiserfs-dev@namesys.com Subject: Re: [reiserfs-dev] Re: metadata update durability ordering/soft updates References: <3C910C57.71C2D823@mindspring.com> <20020315065651.02637@helen.CS.Berkeley.EDU> <3C923C91.454D7710@mindspring.com> 
<1562810000.1016224776@tiny> <3C928D21.404EA11D@mindspring.com> <1714680000.1016298986@tiny> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.org Chris Mason wrote: > Claim 44 is probably the most difficult, although I think this: > > "where said common writes and said function calls have common order > dependencies CD1, CD2, . . . , CDcd that preserve the update order > dependencies D1, D2, . . . , Dd between the operations in the requests, > where cd is an integer, " > > Restricts it to systems that preserve the ordering of the requests > inside the combined common write. In other words, if I batch > mkdir foo ; mkdir foo2 into a common write, I think it says that > mkdir foo will be done first. I can tell you from my experience with the source code that this is not true, unless both updates occur in the same directory entry block of the same directory. > If this has been discussed in detail already, please drop me a link > to the mailing list archive. It has come up on a number of mailing lists in the past; the FreeBSD mailing lists generally get a snapshot of it whenever anyone suggests porting ReiserFS to FreeBSD. Do a search for "ReiserFS" in the FreeBSD mailing list archives, and you should be able to find it. Personally, I'd prefer not to discuss it in the level of detail required for a legal defense against patent claims, since I believe that ReiserFS would lose, and I'd rather not be the person manufacturing the bullets for the gun that shoots it. Realize that Novell holds the patents that were executed (such that they could then be assigned) during the time that USL was owned by Novell. So SCO buying USL and Caldera buying SCO doesn't give those patents a "get out of jail free" card. 8-(. 
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message
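[Archive annotation] The thread above turns on the difference between two mechanisms: an ordering graph that dictates which block must reach disk before which (as the quoted DOW description puts it), and a log that groups several block updates into one atomic unit without ordering the units' blocks relative to each other (as Chris Mason describes the reiserfs log). The toy sketch below only illustrates that distinction. It is not the patented DOW subsystem, not ReiserFS code, and not Soft Updates; the block names, the Journal class, and the crash_before_commit flag are all hypothetical and exist only for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def ordered_flush(dependencies):
    """Return a write order that respects 'must reach disk first' edges.

    dependencies maps a block name to the set of blocks that must be
    written before it. Names are illustrative only.
    """
    return list(TopologicalSorter(dependencies).static_order())


class Journal:
    """Toy write-ahead log: the blocks of a transaction are all-or-nothing."""

    def __init__(self):
        # Append-only list of (txid, kind, payload) records.
        self.records = []

    def log_transaction(self, txid, blocks, crash_before_commit=False):
        # Log every block of the transaction, in no particular order.
        for name, data in blocks.items():
            self.records.append((txid, "block", (name, data)))
        # Only the commit record makes the transaction visible on replay.
        if not crash_before_commit:
            self.records.append((txid, "commit", None))

    def replay(self):
        # After a crash, apply only transactions that have a commit record.
        committed = {txid for txid, kind, _ in self.records if kind == "commit"}
        disk = {}
        for txid, kind, payload in self.records:
            if kind == "block" and txid in committed:
                name, data = payload
                disk[name] = data
        return disk
```

In the first model, a chain such as bitmap, then inode, then directory entry is flushed in exactly that order. In the second, "mkdir foo" and "mkdir foo2" are each either fully present or fully absent after replay, and nothing is said about the order in which their individual blocks hit the disk.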