From owner-freebsd-fs Mon Sep 18 8:56: 2 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from peach.ocn.ne.jp (peach.ocn.ne.jp [210.145.254.87])
    by hub.freebsd.org (Postfix) with ESMTP
    id ACFBA37B422; Mon, 18 Sep 2000 08:55:53 -0700 (PDT)
Received: from newsguy.com (p03-dn03kiryunisiki.gunma.ocn.ne.jp [210.232.224.132])
    by peach.ocn.ne.jp (8.9.1a/OCN/) with ESMTP id AAA11656;
    Tue, 19 Sep 2000 00:55:43 +0900 (JST)
Message-ID: <39C63ACD.441658CC@newsguy.com>
Date: Tue, 19 Sep 2000 00:54:53 +0900
From: "Daniel C. Sobral"
X-Mailer: Mozilla 4.7 [en] (Win98; I)
X-Accept-Language: en,pt-BR
MIME-Version: 1.0
To: Marc Tardif
Cc: freebsd-hackers@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG
Subject: Re: device naming convention
References:
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Marc Tardif wrote:
>
> What is the FreeBSD naming convention for devices of disk slices and
> labels? Considering my system is installed on the first partition of
> /dev/wd0 (non-dedicated), these are the block-device interfaces I
> have to my disk:
>
>   wd0   wd0c  wd0f  wd0s1   wd0s1c  wd0s1f  wd0s2
>   wd0a  wd0d  wd0g  wd0s1a  wd0s1d  wd0s1g  wd0s3
>   wd0b  wd0e  wd0h  wd0s1b  wd0s1e  wd0s1h  wd0s4

That's for up to 3.x. From 4.x, the device name is ad.

> Questions:
> 1. What are wd0[a-h] used for?

1) Dangerously Dedicated Disks (no slices).
2) Compatibility mode (an ugly hack) alias for the first FreeBSD slice
   on that disk.

> 2. If wd0s1 is my first slice, why isn't it named wd0s0?

Because a slice is what DOS calls a "partition". They are numbered from
1 on, so we decided to keep the numbering to make things less confusing
(which, if you think of it, is pretty silly with the partition/slice
confusion).

> 3. If I format wd0s2 as any type (Xenix for example),
>    will /dev now contain wd0s2[a-h]?

No.

1. /dev is not a "magical" directory. It contains only what you put in
   there.
2.
   If you happened to have devfs, which _is_ "magical", partitions still
   require that a partition table (a disklabel) exists in the slice.

> Assuming /dev/wd0s2 contains a few blocks, ie /dev/wd0s1
> doesn't span to the end of disk:
> 4. If I want to use /dev/wd0s2 as a raw slice for reading
>    and writing, what are the steps to follow?

None. You just use it.

> 4a. Do I need to format the partition as any type? If so
>     is there a recommended type (perhaps one which won't
>     be recognised by the bootloader would be preferable)?

No, you don't need to format it, nor do you need to worry about its
type. Just make sure the slice does exist.

> 4b. Should I then be using /dev/rwd0s2 or /dev/rwd0s2a
>     for reading and writing (of course, this is assuming
>     block i/o of multiples of 512 bytes)?

Nope, using raw devices is almost always wrong, and we even got rid of
raw devices in later versions of FreeBSD. A "raw" device is an
_unbuffered_ device. It has nothing to do with formats or types.

Anyway, you should be using /dev/wd0s2. Unless you partition the slice,
and want to use the "a" partition.

> Lastly, where else could I have found this information other
> than asking on the FreeBSD mailing list?

Beats me, but it _should_ be in the handbook.

--
Daniel C. Sobral (8-DCS)
dcs@newsguy.com
dcs@freebsd.org
capo@the.secret.bsdconspiracy.net

"I demand that my picture show a handsome face, even if it doesn't
look like me."
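The naming scheme described in this message can be illustrated with a small parser. This is only a sketch under assumptions taken from the thread: the driver names wd (up to 3.x) and ad (from 4.x), an optional r prefix for the raw device, slices numbered 1 to 4, and partitions a to h. The function name parse_device and the layout of the returned dictionary are made up for the example and are not part of any FreeBSD tool.

```python
import re

# Names such as "wd0", "wd0a", "wd0s1", "wd0s1a" or "rwd0s2".
# Only the drivers mentioned in the thread are accepted (assumption).
_DEVICE_RE = re.compile(
    r"^(?P<raw>r?)(?P<driver>wd|ad)(?P<unit>\d+)"
    r"(?:s(?P<slice>[1-4]))?(?P<part>[a-h])?$"
)


def parse_device(name):
    """Split a disk device name into its components (hypothetical helper)."""
    m = _DEVICE_RE.match(name)
    if m is None:
        raise ValueError(f"not a recognised disk device name: {name!r}")
    slice_no = int(m.group("slice")) if m.group("slice") else None
    part = m.group("part")
    return {
        "raw": bool(m.group("raw")),       # "r" prefix: unbuffered device
        "driver": m.group("driver"),
        "unit": int(m.group("unit")),
        "slice": slice_no,                 # 1..4, or None if not given
        "partition": part,                 # "a".."h", or None if not given
        # wd0a..wd0h without a slice number: the compatibility alias
        # (or a dangerously dedicated disk) described in the message.
        "compat_alias": slice_no is None and part is not None,
    }


print(parse_device("wd0s1a"))
print(parse_device("wd0a")["compat_alias"])
```

A name such as wd0s0 is rejected by this sketch, which matches the statement that slices are numbered from 1.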
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message

From owner-freebsd-fs Mon Sep 18 9: 1:46 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from peach.ocn.ne.jp (peach.ocn.ne.jp [210.145.254.87])
    by hub.freebsd.org (Postfix) with ESMTP
    id 9793337B423; Mon, 18 Sep 2000 09:01:37 -0700 (PDT)
Received: from newsguy.com (p03-dn03kiryunisiki.gunma.ocn.ne.jp [210.232.224.132])
    by peach.ocn.ne.jp (8.9.1a/OCN/) with ESMTP id BAA13070;
    Tue, 19 Sep 2000 01:00:52 +0900 (JST)
Message-ID: <39C63C03.2C4C26F8@newsguy.com>
Date: Tue, 19 Sep 2000 01:00:03 +0900
From: "Daniel C. Sobral"
X-Mailer: Mozilla 4.7 [en] (Win98; I)
X-Accept-Language: en,pt-BR
MIME-Version: 1.0
To: Marc Tardif
Cc: "Aleksandr A.Babaylov", freebsd-hackers@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG
Subject: Re: device naming convention
References:
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Marc Tardif wrote:
>
> If I understand correctly, wd0[a-h] will be the same as wd0s3[a-h] in a
> situation where DOS is on first slice, Linux on second and FreeBSD on
> third, right? But what if the fourth slice is also FreeBSD? In such a

Right.

> case, I'll assume you meant "booted slice" instead of "first slice", where
> the slice selected when booting will be referred to by the OS as wd0[a-h]
> which would translate to "current slice". Confirmation of my assumption
> would be appreciated.

Nope, he means "first slice", not "booted slice". I think, at some
point, it might have changed to "active partition". But the fact is that
the wd0[a-h] hack is very gross. Stay away from it, and always use full
device specification.

> > > 2. If wd0s1 is my first slice, why isn't it named wd0s0?
> > wd0s0 == wd0
> > wd0s0a == wd0a
> >
> I somehow doubt that. Considering wd0s* goes from 1 to 4 inclusively, I
> would tend to believe the first slice is wd0s1.

The above is incorrect; he misunderstood your question.

> > > 4. If I want to use /dev/wd0s2 as a raw slice for reading
> > >    and writing, what are the steps to follow?
> > You can't write several blocks near /dev/wd0s2 beginning.
> > Use /dev/wd0 with proper address
> >
> That is rather risky. Wouldn't it be safer to have a device name I could
> dedicate to some purpose. In such a case, I could chown the device to an
> appropriate username and group. Furthermore, I could avoid the unfortunate
> mistake of overwriting my current FreeBSD fs in case I get the addresses
> wrong.

He is incorrect. You can use /dev/wd0s2 any way you want, as long as you
have nothing of value there.

--
Daniel C. Sobral (8-DCS)
dcs@newsguy.com
dcs@freebsd.org
capo@the.secret.bsdconspiracy.net

"I demand that my picture show a handsome face, even if it doesn't
look like me."

From owner-freebsd-fs Mon Sep 18 11:27:10 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from ns1.sunesi.net (ns1.sunesi.net [196.15.192.194])
    by hub.freebsd.org (Postfix) with ESMTP
    id 0F83637B422; Mon, 18 Sep 2000 11:27:05 -0700 (PDT)
Received: from nbm by ns1.sunesi.net with local (Exim 3.03 #1)
    id 13b5cq-000K0n-00; Mon, 18 Sep 2000 20:26:44 +0200
Date: Mon, 18 Sep 2000 20:26:44 +0200
From: Neil Blakey-Milner
To: "Daniel C.
Sobral"
Cc: Marc Tardif, freebsd-hackers@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG
Subject: Re: device naming convention
Message-ID: <20000918202644.A76911@mithrandr.moria.org>
References: <39C63ACD.441658CC@newsguy.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0.1i
In-Reply-To: <39C63ACD.441658CC@newsguy.com>; from dcs@newsguy.com on Tue, Sep 19, 2000 at 12:54:53AM +0900
Organization: Sunesi Clinical Systems
X-Operating-System: FreeBSD 3.3-RELEASE i386
X-URL: http://rucus.ru.ac.za/~nbm/
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Tue 2000-09-19 (00:54), Daniel C. Sobral wrote:
> > Lastly, where else could I have found this information other
> > than asking on the FreeBSD mailing list?
>
> Beats me, but it _should_ be in the handbook.

A basic device naming overview, as well as some simple disk layout
information, is available at

    http://www.FreeBSD.org/handbook/disks.html

Of course, no one reads documentation, so I don't know why I bother. (:

Neil
--
Neil Blakey-Milner
Sunesi Clinical Systems
nbm@mithrandr.moria.org

From owner-freebsd-fs Mon Sep 18 11:39:22 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from Gloria.CAM.ORG (Gloria.CAM.ORG [205.151.116.34])
    by hub.freebsd.org (Postfix) with ESMTP
    id 2E06737B423; Mon, 18 Sep 2000 11:39:17 -0700 (PDT)
Received: from localhost (intmktg@localhost)
    by Gloria.CAM.ORG (8.9.3/8.9.3) with ESMTP id OAA03654;
    Mon, 18 Sep 2000 14:42:28 -0400
Date: Mon, 18 Sep 2000 14:42:28 -0400 (EDT)
From: Marc Tardif
To: "Daniel C.
Sobral"
Cc: freebsd-hackers@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG
Subject: Re: device naming convention
In-Reply-To: <39C63ACD.441658CC@newsguy.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

[ snip ]

> > Assuming /dev/wd0s2 contains a few blocks, ie /dev/wd0s1
> > doesn't span to the end of disk:
> > 4. If I want to use /dev/wd0s2 as a raw slice for reading
> >    and writing, what are the steps to follow?
>
> None. You just use it.
>
This is what I have in fdisk (from /stand/sysinstall):

    Offset       Size        End     Name  PType       Desc  Subtype    Flags
         0         63         62        -      6     unused        0
        63    1937565    1937627    wd0s1      3    freebsd      165    C
   1937628     191268    2128895        -      6     unused        0

At this point, the second slice does not exist yet so I can't use it. For
problems in defining a slice, see next question.

> > 4a. Do I need to format the partition as any type? If so
> >     is there a recommended type (perhaps one which won't
> >     be recognised by the bootloader would be preferable)?
>
> No, you don't need to format it, nor do you need to worry about its
> type. Just make sure the slice does exist.
>
When I define a slice, I need to specify what fdisk (from sysinstall)
calls a "partition type". In the case of my FreeBSD slice, I selected
"165". In the case of a slice I will use for raw io, is there any reason I
should use one partition type rather than another?

> > 4b. Should I then be using /dev/rwd0s2 or /dev/rwd0s2a
> >     for reading and writing (of course, this is assuming
> >     block i/o of multiples of 512 bytes)?
>
> Nope, using raw devices is almost always wrong, and we even got rid of
> raw devices in later versions of FreeBSD. A "raw" device is an
> _unbuffered_ device. It has nothing to do with formats or types.
>
Got rid of raw devices in later versions of FreeBSD? What if I purposely
want unbuffered io? There are instances, such as with databases, where the
buffer cache is useless.

I understand that in many cases, databases using the raw device
practically reinvent the wheel by programming what is effectively another
filesystem (which, by the way, is most likely slower than bsd's ffs). Even
Oracle, which used to be one of the "you gotta use a raw partition if you
want any speed at all" type, has moved into the "use a normal partition or
regular file unless you do things like sharing a RAID between two hosts"
camp.

Yet, there are still isolated cases where raw io can be beneficial. What
should I do for raw io in later versions of FreeBSD?

> Anyway, you should be using /dev/wd0s2. Unless you partition the slice,
> and want to use the "a" partition.
>
If I will be storing a few tables in /dev/wd0s2 of a predefined block
aligned size, would it be advisable to use the 165 partition type for
/dev/wd0s2 and create labels which will effectively become my tables? If
this actually makes sense (fat chance), is there any reason I should be
creating mount points? Or, if it would be better to define the labels as
swap (assuming I already have a swap label in /dev/wd0s1), could FreeBSD
inadvertently use those swap partitions and overwrite my data?

Marc

From owner-freebsd-fs Mon Sep 18 12:10:42 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from ns1.sunesi.net (ns1.sunesi.net [196.15.192.194])
    by hub.freebsd.org (Postfix) with ESMTP
    id D9C4037B424; Mon, 18 Sep 2000 12:10:27 -0700 (PDT)
Received: from nbm by ns1.sunesi.net with local (Exim 3.03 #1)
    id 13b6Ik-000KAS-00; Mon, 18 Sep 2000 21:10:02 +0200
Date: Mon, 18 Sep 2000 21:10:02 +0200
From: Neil Blakey-Milner
To: Marc Tardif
Cc: "Daniel C.
Sobral", freebsd-hackers@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG
Subject: Re: device naming convention
Message-ID: <20000918211002.A77486@mithrandr.moria.org>
References: <39C63ACD.441658CC@newsguy.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0.1i
In-Reply-To: ; from intmktg@CAM.ORG on Mon, Sep 18, 2000 at 02:42:28PM -0400
Organization: Sunesi Clinical Systems
X-Operating-System: FreeBSD 3.3-RELEASE i386
X-URL: http://rucus.ru.ac.za/~nbm/
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Mon 2000-09-18 (14:42), Marc Tardif wrote:
> > > 4b. Should I then be using /dev/rwd0s2 or /dev/rwd0s2a
> > >     for reading and writing (of course, this is assuming
> > >     block i/o of multiples of 512 bytes)?
> >
> > Nope, using raw devices is almost always wrong, and we even got rid of
> > raw devices in later versions of FreeBSD. A "raw" device is an
> > _unbuffered_ device. It has nothing to do with formats or types.
> >
> Got rid of raw devices in later versions of FreeBSD? What if I purposely
> want unbuffered io? There are instances, such as with databases, where the
> buffer cache is useless.
>
> I understand that in many cases, databases using the raw device
> practically reinvent the wheel by programming what is effectively another
> filesystem (which, by the way, is most likely slower than bsd's ffs). Even
> Oracle, which used to be one of the "you gotta use a raw partition if you
> want any speed at all" type, has moved into the "use a normal partition or
> regular file unless you do things like sharing a RAID between two hosts"
> camp.
>
> Yet, there are still isolated cases where raw io can be beneficial. What
> should I do for raw io in later versions of FreeBSD?

We didn't get rid of raw devices. We got rid of block devices, and kept
character devices.
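
Whether a given /dev node is a character or a block device can be checked from its file mode. The following is a minimal sketch using Python's standard os and stat modules; the helper names node_kind and device_kind are invented for this example, and a path such as /dev/ad0s2 is only an illustration that may not exist on a given system.

```python
import os
import stat


def node_kind(mode):
    """Classify an st_mode value as 'character', 'block' or 'other'."""
    if stat.S_ISCHR(mode):
        return "character"
    if stat.S_ISBLK(mode):
        return "block"
    return "other"


def device_kind(path):
    """Return the node kind of an existing path, e.g. an entry in /dev."""
    return node_kind(os.stat(path).st_mode)


# Modes built by hand, so the example runs without touching /dev.
print(node_kind(stat.S_IFCHR | 0o640))
print(node_kind(stat.S_IFBLK | 0o640))
```

On a system with only character disk devices, device_kind() applied to a disk node would be expected to report "character"; that is an assumption about the system at hand, not something the sketch verifies.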

Neil
--
Neil Blakey-Milner
Sunesi Clinical Systems
nbm@mithrandr.moria.org

From owner-freebsd-fs Mon Sep 18 12:23:19 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from peach.ocn.ne.jp (peach.ocn.ne.jp [210.145.254.87])
    by hub.freebsd.org (Postfix) with ESMTP
    id 8243437B424; Mon, 18 Sep 2000 12:23:13 -0700 (PDT)
Received: from newsguy.com (p44-dn02kiryunisiki.gunma.ocn.ne.jp [211.0.245.109])
    by peach.ocn.ne.jp (8.9.1a/OCN/) with ESMTP id EAA04552;
    Tue, 19 Sep 2000 04:23:07 +0900 (JST)
Message-ID: <39C66B69.96E57728@newsguy.com>
Date: Tue, 19 Sep 2000 04:22:17 +0900
From: "Daniel C. Sobral"
X-Mailer: Mozilla 4.7 [en] (Win98; I)
X-Accept-Language: en,pt-BR
MIME-Version: 1.0
To: Marc Tardif
Cc: freebsd-hackers@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG
Subject: Re: device naming convention
References:
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Marc Tardif wrote:
>
> This is what I have in fdisk (from /stand/sysinstall):
>     Offset       Size        End     Name  PType       Desc  Subtype    Flags
>          0         63         62        -      6     unused        0
>         63    1937565    1937627    wd0s1      3    freebsd      165    C
>    1937628     191268    2128895        -      6     unused        0
>
> At this point, the second slice does not exist yet so I can't use it. For
> problems in defining a slice, see next question.

Really? I wouldn't expect FreeBSD to worry about the type being
"unused".

> When I define a slice, I need to specify what fdisk (from sysinstall)
> calls a "partition type". In the case of my FreeBSD slice, I selected
> "165". In the case of a slice I will use for raw io, is there any reason I
> should use one partition type rather than another?

A type serves to, well, check the type. :-) If you don't care about the
type, nothing else will... in theory. I certainly can't see
open("/dev/ad0s2", flags) checking the type for anything.
Perhaps, it will, indeed, fail for "unused". But nothing more than that.

Other things do worry about type, of course. boot0 will identify slices
to be booted by their type. loader will list a slice's type, and try to
find the disklabel for FreeBSD slices.

> Got rid of raw devices in later versions of FreeBSD? What if I purposely
> want unbuffered io? There are instances, such as with databases, where the
> buffer cache is useless.

Oh, sorry, I got things confused. You should check the hour at which a
reply was written to judge its reliability... ;-)

Anyway, raw devices are character devices, unbuffered. Then, there were
the "block" devices, which were buffered. We got rid of the _block_
devices, not the raw devices. But, as we no longer have two types, we no
longer prefix them with "r".

> I understand that in many cases, databases using the raw device
> practically reinvent the wheel by programming what is effectively another
> filesystem (which, by the way, is most likely slower than bsd's ffs). Even
> Oracle, which used to be one of the "you gotta use a raw partition if you
> want any speed at all" type, has moved into the "use a normal partition or
> regular file unless you do things like sharing a RAID between two hosts"
> camp.
>
> Yet, there are still isolated cases where raw io can be beneficial. What
> should I do for raw io in later versions of FreeBSD?

Actually, there is little benefit in buffered device access. Buffering
is better handled elsewhere and by other means.

> > Anyway, you should be using /dev/wd0s2. Unless you partition the slice,
> > and want to use the "a" partition.
> >
> If I will be storing a few tables in /dev/wd0s2 of a predefined block
> aligned size, would it be advisable to use the 165 partition type for
> /dev/wd0s2 and create labels which will effectively become my tables? If
> this actually makes sense (fat chance), is there any reason I should be
> creating mount points?
> Or, if it would be better to define the labels as
> swap (assuming I already have a swap label in /dev/wd0s1), could FreeBSD
> inadvertently use those swap partitions and overwrite my data?

Well, you could, indeed, use slice type 165, and partition it. We are
limited, though, to 6 partitions. c must always be the whole slice
(minus the disklabel :), and d is better left unused for historical
reasons.

OTOH, using a partition (try to avoid using c -- if you want the whole
slice, create a partition with the same data as c) would be cleaner,
from the point of view of various utilities, than using a slice. You do
lose a few sectors.

--
Daniel C. Sobral (8-DCS)
dcs@newsguy.com
dcs@freebsd.org
capo@the.secret.bsdconspiracy.net

"I demand that my picture show a handsome face, even if it doesn't
look like me."

From owner-freebsd-fs Mon Sep 18 13:29: 3 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from Gloria.CAM.ORG (Gloria.CAM.ORG [205.151.116.34])
    by hub.freebsd.org (Postfix) with ESMTP
    id ED7D637B43E; Mon, 18 Sep 2000 13:28:57 -0700 (PDT)
Received: from localhost (intmktg@localhost)
    by Gloria.CAM.ORG (8.9.3/8.9.3) with ESMTP id QAA04450;
    Mon, 18 Sep 2000 16:31:03 -0400
Date: Mon, 18 Sep 2000 16:31:03 -0400 (EDT)
From: Marc Tardif
To: "Daniel C. Sobral"
Cc: freebsd-hackers@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG
Subject: Re: device naming convention
In-Reply-To: <39C66B69.96E57728@newsguy.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> > This is what I have in fdisk (from /stand/sysinstall):
> >     Offset       Size        End     Name  PType       Desc  Subtype    Flags
> >          0         63         62        -      6     unused        0
> >         63    1937565    1937627    wd0s1      3    freebsd      165    C
> >    1937628     191268    2128895        -      6     unused        0
> >
> > At this point, the second slice does not exist yet so I can't use it.
> > For problems in defining a slice, see next question.
>
> Really? I wouldn't expect FreeBSD to worry about the type being
> "unused".
>
If I try the following command as root, nothing is output:
# hd /dev/rwd0s2 | head

Also, I tried writing a little C program to mmap(2) and, if that fails,
read(2) the device. Unfortunately, that didn't work either. It seems I do
actually need to define the slice as some type. The reason is maybe to
define the limits of the device. Therefore, the actual type is of little
importance but knowing where the device starts and stops could be
important for some reason. To make the system happy, I then defined the
slice as "partition type" 0, but fdisk still displayed "unused". Maybe
some obscure type which doesn't appear in the bootloader would be
preferable, if I find one... If this kind of information is relevant to
the mailing list, I'll post what I find.

From owner-freebsd-fs Mon Sep 18 14:26:55 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from aaz.links.ru (aaz.links.ru [193.125.152.37])
    by hub.freebsd.org (Postfix) with ESMTP
    id C0B7D37B422; Mon, 18 Sep 2000 14:26:50 -0700 (PDT)
Received: (from babolo@localhost)
    by aaz.links.ru (8.9.3/8.9.3) id BAA27523;
    Tue, 19 Sep 2000 01:26:42 +0400 (MSD)
Message-Id: <200009182126.BAA27523@aaz.links.ru>
Subject: Re: device naming convention
In-Reply-To: from "Marc Tardif" at "Sep 18, 0 04:31:03 pm"
To: intmktg@CAM.ORG (Marc Tardif)
Date: Tue, 19 Sep 2000 01:26:41 +0400 (MSD)
Cc: dcs@newsguy.com, freebsd-hackers@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG
From: "Aleksandr A.Babaylov"
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Marc Tardif writes:
> > > This is what I have in fdisk (from /stand/sysinstall):
> > >     Offset       Size        End     Name  PType       Desc  Subtype    Flags
> > >          0         63         62
                                            -      6     unused        0
> > >         63    1937565    1937627    wd0s1      3    freebsd      165    C
> > >    1937628     191268    2128895        -      6     unused        0
> > >
> > > At this point, the second slice does not exist yet so I can't use it. For
> > > problems in defining a slice, see next question.
> >
> > Really? I wouldn't expect FreeBSD to worry about the type being
> > "unused".
> >
> If I try the following command as root, nothing is output:
> # hd /dev/rwd0s2 | head
>
> Also, I tried writing a little c program to mmap(2) and, if that fails,
> read(2) the device. Unfortunately, that didn't work either. It seems I do
> actually need to define the slice as some type. The reason is maybe to
> define the limits of the device. Therefore, the actual type is of little
> importance but knowing where the device starts and stops could be
> important for some reason. To make the system happy, I then defined the
> slice as "partition type" 0, but fdisk still displayed "unused". Maybe
> some obscur type which doesn't appear in the bootloader would be
> preferable, if I find one... If this kind of information is relevant to
> the mailing list, I'll post what I find.

Use fdisk.
Look at the start and size of each slice; nothing else has any effect
unless you boot from the disk or have another OS on your computer.
In sysinstall, the last line means that the entries for wd0s2, wd0s3
and wd0s4 all have size 0. So all you can read is EOF.
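
The start and size of each slice can also be read straight from the first sector of the disk. The sketch below assumes the standard PC (MBR) layout: four 16-byte entries starting at byte 446, the type (sysid) in byte 4 of each entry, the start sector and sector count as little-endian 32-bit values at bytes 8 and 12, and the 0x55 0xAA signature at byte 510. The function name read_slices and the sample sector are constructed for illustration; the sample reuses the wd0s1 figures from the quoted fdisk table, with the other three entries empty.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446      # first of the four 16-byte slice entries
ENTRY_SIZE = 16


def read_slices(mbr):
    """Decode the four slice entries of an MBR sector (hypothetical helper)."""
    if len(mbr) < SECTOR_SIZE or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    slices = []
    for i in range(4):
        off = TABLE_OFFSET + i * ENTRY_SIZE
        entry = mbr[off:off + ENTRY_SIZE]
        flag, sysid = entry[0], entry[4]
        start, size = struct.unpack_from("<II", entry, 8)
        slices.append({
            "slice": i + 1,            # s1..s4
            "sysid": sysid,            # 165 = FreeBSD, 0 = unused
            "start": start,
            "size": size,
            "active": flag == 0x80,
            "usable": size > 0,        # a size-0 entry reads as EOF only
        })
    return slices


# Sample sector: one FreeBSD slice (start 63, size 1937565), three empty entries.
entry1 = struct.pack("<B3sB3sII", 0x80, b"\0" * 3, 165, b"\0" * 3, 63, 1937565)
sample = bytes(TABLE_OFFSET) + entry1 + bytes(3 * ENTRY_SIZE) + b"\x55\xaa"

for s in read_slices(sample):
    print(s)
```

On a real system the 512 bytes would come from reading the whole-disk device as root; that step is left out here so the example has no side effects.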

--
@BABOLO http://links.ru/

From owner-freebsd-fs Mon Sep 18 16:40:42 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from pike.osd.bsdi.com (pike.osd.bsdi.com [204.216.28.222])
    by hub.freebsd.org (Postfix) with ESMTP
    id E4BAE37B42C; Mon, 18 Sep 2000 16:40:36 -0700 (PDT)
Received: from foo.osd.bsdi.com (root@foo.osd.bsdi.com [204.216.28.137])
    by pike.osd.bsdi.com (8.11.0/8.9.3) with ESMTP id e8INe0i09558;
    Mon, 18 Sep 2000 16:40:00 -0700 (PDT)
    (envelope-from jhb@foo.osd.bsdi.com)
Received: (from jhb@localhost)
    by foo.osd.bsdi.com (8.11.0/8.11.0) id e8INbh074640;
    Mon, 18 Sep 2000 16:37:43 -0700 (PDT)
    (envelope-from jhb)
Message-ID:
X-Mailer: XFMail 1.4.0 on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To:
Date: Mon, 18 Sep 2000 16:37:42 -0700 (PDT)
Organization: BSD, Inc.
From: John Baldwin
To: Marc Tardif
Subject: Re: device naming convention
Cc: freebsd-fs@FreeBSD.ORG, freebsd-hackers@FreeBSD.ORG, "Daniel C. Sobral"
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On 18-Sep-00 Marc Tardif wrote:
>> > This is what I have in fdisk (from /stand/sysinstall):
>> >     Offset       Size        End     Name  PType       Desc  Subtype    Flags
>> >          0         63         62        -      6     unused        0
>> >         63    1937565    1937627    wd0s1      3    freebsd      165    C
>> >    1937628     191268    2128895        -      6     unused        0
>> >
>> > At this point, the second slice does not exist yet so I can't use it. For
>> > problems in defining a slice, see next question.
>>
>> Really? I wouldn't expect FreeBSD to worry about the type being
>> "unused".
>>
> If I try the following command as root, nothing is output:
># hd /dev/rwd0s2 | head

Look at the output you pasted above. There is no entry with the name
'wd0s2'. There is just an entry for wd0s1.
Note that those unused areas are not allocated into any existing slice
at the moment, and are thus not the same as a slice with a subtype of
'0' indicating that the slice itself is unused.

--
John Baldwin -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.cslab.vt.edu/~jobaldwi/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/

From owner-freebsd-fs Mon Sep 18 18:32:34 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from aaz.links.ru (aaz.links.ru [193.125.152.37])
    by hub.freebsd.org (Postfix) with ESMTP
    id 776AE37B422; Mon, 18 Sep 2000 18:32:19 -0700 (PDT)
Received: (from babolo@localhost)
    by aaz.links.ru (8.9.3/8.9.3) id FAA01937;
    Tue, 19 Sep 2000 05:32:10 +0400 (MSD)
Message-Id: <200009190132.FAA01937@aaz.links.ru>
Subject: Re: device naming convention
In-Reply-To: <39C63C03.2C4C26F8@newsguy.com> from "Daniel C. Sobral" at "Sep 19, 0 01:00:03 am"
To: dcs@newsguy.com (Daniel C. Sobral)
Date: Tue, 19 Sep 2000 05:32:09 +0400 (MSD)
Cc: intmktg@CAM.ORG, babolo@links.ru, freebsd-hackers@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG
From: "Aleksandr A.Babaylov"
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Daniel C. Sobral writes:
> Marc Tardif wrote:
> > > > 4. If I want to use /dev/wd0s2 as a raw slice for reading
> > > >    and writing, what are the steps to follow?
> > > You can't write several blocks near /dev/wd0s2 beginning.
> > > Use /dev/wd0 with proper address
> > >
> > That is rather risky. Wouldn't it be safer to have a device name I could
> > dedicate to some purpose. In such a case, I could chown the device to an
> > appropriate username and group. Furthermore, I could avoid the unfortunate
> > mistake of overwriting my current FreeBSD fs in case I get the addresses
> > wrong.
> He is incorrect.
> You can use /dev/wd0s2 any way you want, as long as you
> have nothing of value there.

My English is poor. Labeling, partitioning and so on I know MUCH
better. So I took my time with a dictionary, and because this is a long
letter I begin with the conclusion.

It is risky to use any partition or slice with FreeBSD for arbitrary
purposes. Possibly any file system other than ffs (msdos, ntfs, hpfs,
9660) can work improperly. Things are worse still: ffs is not stable
either, but that is far from the subject of this letter.

OK, let's verify.
"cicuta/home/babolo(N)#" is the prompt; the number before it is the exit
code. Look at the test disk:

0cicuta/home/babolo(1)#fdisk wd0
******* Working on device /dev/rwd0 *******
parameters extracted from in-core disklabel are:
cylinders=1011 heads=15 sectors/track=44 (660 blks/cyl)

parameters to be used for BIOS calculations are:
cylinders=1011 heads=15 sectors/track=44 (660 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165,(FreeBSD/NetBSD/386BSD)
    start 0, size 50160 (24 Meg), flag 0
        beg: cyl 0/ sector 1/ head 0;
        end: cyl 75/ sector 44/ head 14
The data for partition 2 is:
sysid 0,(unused)
    start 50160, size 50160 (24 Meg), flag 0
        beg: cyl 76/ sector 1/ head 0;
        end: cyl 151/ sector 44/ head 14
The data for partition 3 is:
sysid 165,(FreeBSD/NetBSD/386BSD)
    start 100320, size 50160 (24 Meg), flag 0
        beg: cyl 152/ sector 1/ head 0;
        end: cyl 227/ sector 44/ head 14
The data for partition 4 is:
sysid 0,(unused)
    start 0, size 0 (0 Meg), flag 80 (active)
        beg: cyl 0/ sector 0/ head 0;
        end: cyl 0/ sector 0/ head 0

Now read the slices:

0cicuta/home/babolo(2)#dd if=/dev/wd0s1 of=/dev/null bs=660b
76+0 records in
76+0 records out
25681920 bytes transferred in 17.175867 secs (1495233 bytes/sec)
0cicuta/home/babolo(3)#dd if=/dev/wd0s2 of=/dev/null bs=660b
1+1 records in
1+1 records out
368640 bytes transferred in 0.258831 secs (1424250 bytes/sec)
0cicuta/home/babolo(4)#dd if=/dev/wd0s3
of=/dev/null bs=660b
76+0 records in
76+0 records out
25681920 bytes transferred in 22.601690 secs (1136283 bytes/sec)
0cicuta/home/babolo(5)#dd if=/dev/wd0s4 of=/dev/null bs=660b
0+0 records in
0+0 records out
0 bytes transferred in 0.000036 secs (0 bytes/sec)

Three equal slices; test whether wd0s2 has some error in it:

0cicuta/home/babolo(6)#dd if=/dev/wd0 of=/dev/null bs=660b
1011+0 records in
1011+0 records out
341637120 bytes transferred in 283.316634 secs (1205849 bytes/sec)

The whole disk was read successfully.
What is the test system?

0cicuta/home/babolo(7)#uname -a
FreeBSD cicuta.babolo.ru 2.2.7-RELEASE FreeBSD 2.2.7-RELEASE #0: Tue Dec 29 04:10:35 MSK 1998 babolo@cicuta.babolo.ru:/usr/src/sys/compile/cicuta i386
0cicuta/home/babolo(8)#dmesg
[skip]
wdc0 at 0x1f0-0x1f7 irq 14 flags 0x80ff80ff on isa
wdc0: unit 0 (wd0): , 32-bit, multi-block-8
wd0: 325MB (667260 sectors), 1011 cyls, 15 heads, 44 S/T, 512 B/S
wdc0: unit 1 (atapi): , removable, accel, dma, iordis
wcd0: 171/1367Kb/sec, 128Kb cache, audio play, 255 volume levels, ejectable tray
wcd0: no disc inside, unlocked, lock protected
[skip]
wd0s2: raw partition size != slice size
wd0s2: start 50160, end 100319, size 50160
wd0s2c: start 50160, end 50879, size 720
wd0s3: cannot find label (no disk label)
wd0s2: raw partition size != slice size
wd0s2: start 50160, end 100319, size 50160
wd0s2c: start 50160, end 50879, size 720
wd0s3: cannot find label (no disk label)
wd0s2: raw partition size != slice size
wd0s2: start 50160, end 100319, size 50160
wd0s2c: start 50160, end 50879, size 720
wd0s3: cannot find label (no disk label)
wd0s2: raw partition size != slice size
wd0s2: start 50160, end 100319, size 50160
wd0s2c: start 50160, end 50879, size 720
wd0s3: cannot find label (no disk label)
wd0s2: raw partition size != slice size
wd0s2: start 50160, end 100319, size 50160
wd0s2c: start 50160, end 50879, size 720
wd0s3: cannot find label (no disk label)
wd0s2: raw partition size != slice size
wd0s2: start 50160, end 100319,
size 50160
wd0s2c: start 50160, end 50879, size 720
wd0s3: cannot find label (no disk label)
wd0s2: raw partition size != slice size
wd0s2: start 50160, end 100319, size 50160
wd0s2c: start 50160, end 50879, size 720
wd0s3: cannot find label (no disk label)
wd0s2: raw partition size != slice size
wd0s2: start 50160, end 100319, size 50160
wd0s2c: start 50160, end 50879, size 720
wd0s3: cannot find label (no disk label)

The end of the dmesg output gives the idea why the whole wd0s2 slice
cannot be read.
Now about the write-protected area in a slice that is NOT marked
165 (FreeBSD):

0cicuta/home/babolo(9)#dd of=/dev/wd0s2 if=/dev/zero bs=660b
dd: /dev/wd0s2: Invalid argument
3+0 records in
2+0 records out
675840 bytes transferred in 0.224398 secs (3011793 bytes/sec)
1cicuta/home/babolo(11)#od -b /dev/wd0s2
0000000 353 020 220 220 121 120 006 123 122 350 300 000 132 133 007 131
0000020 131 313 374 061 311 216 301 216 331 216 321 274 000 174 211 346
0000040 277 000 007 376 305 363 245 276 356 175 200 372 200 162 054 266
0000060 001 350 147 000 271 001 000 276 276 215 266 001 200 174 004 245
0000100 165 007 343 031 366 004 200 165 024 203 306 020 376 306 200 376
0000120 005 162 351 111 343 341 276 114 175 353 122 061 322 211 026 000
[skipped]
0017060 070 066 040 102 117 117 124 012 104 145 146 141 165 154 164 072
0017100 040 045 165 072 045 163 050 045 165 054 045 143 051 045 163 012
0017120 142 157 157 164 072 040 000 116 157 040 045 163 012 000 146 157
0017140 162 155 141 164 000 111 156 166 141 154 151 144 040 045 163 012
0017160 000 171 145 163 000 156 157 000 113 145 171 142 157 141 162 144
0017200 072 040 045 163 012 000 045 163 040 000 116 157 164 040 165 146
0017220 163 012 000 163 154 151 143 145 000 154 141 142 145 154 000 160
0017240 141 162 164 151 164 151 157 156 000 060 061 062 063 064 065 066
0017260 067 070 071 141 142 143 144 145 146 045 143 010 000 104 151 163
0017300 153 040 145 162 162 157 162 040 060 170 045 170 040 050 154 142
0017320 141 075 060 170 045 170 051 012 000 000
000 000 001 000 000 000
0017340 057 174 134 055 000 000 000 000 000 000 000 000 000 000 000 000
0017360 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000
*
1320000

Why do I use 2.2.7 for the test?
Because my lovely 4.1-STABLE is extremely unstable with the content of
ad0s2 (wd0s2) shown above and silently reboots after the first dd in the test above.

What are the slices' contents?
s1 - an almost correct FreeBSD label
s2 - not a correct FreeBSD label, but similar enough to one.
s3 - no label or anything similar at all.

How do you create content that screws up the system?
This is what I did for this test:
- shorten s2 to 3 cylinders.
- disklabel -w -r wd0s2 fd360
- restore the s2 size.

How can you guarantee that some bits in a slice used for arbitrary
data will not occasionally fool FreeBSD?
Do not use the beginning of the slice at all.

Does 4.1 behave similarly?
Yes, I know that it does. But it takes some time to choose the bits at the
beginning of the slice in such a way that 4.1 does not reboot so frequently.
You have the idea and can test it yourself.
Remember - 4.1 is HIGHLY unstable during pseudolabel tests.

--
@BABOLO http://links.ru/

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message

From owner-freebsd-fs Mon Sep 18 20:56:15 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from Gloria.CAM.ORG (Gloria.CAM.ORG [205.151.116.34])
    by hub.freebsd.org (Postfix) with ESMTP
    id 6665C37B423; Mon, 18 Sep 2000 20:56:10 -0700 (PDT)
Received: from localhost (intmktg@localhost)
    by Gloria.CAM.ORG (8.9.3/8.9.3) with ESMTP id XAA06712;
    Mon, 18 Sep 2000 23:59:14 -0400
Date: Mon, 18 Sep 2000 23:59:14 -0400 (EDT)
From: Marc Tardif
To: "Aleksandr A.Babaylov"
Cc: "Daniel C.
Sobral" , freebsd-hackers@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG Subject: Re: device naming convention In-Reply-To: <200009190132.FAA01937@aaz.links.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > 0cicuta/home/babolo(9)#dd of=/dev/wd0s2 if=/dev/zero bs=660b > 1cicuta/home/babolo(11)#od -b /dev/wd0s2 [ snip ] > Why I use 2.2.7 for test? > Because of my lovely 4.1-STABLE is extremly unstable with content of > ad0s2 (wd0s2) above and silently reboot after the first dd in the test above. > Assuming my wd0s2 is still unused and of size 0, 3.5-STABLE also crashes in the test above (no disk activity, ctrl-c doesn't work, alt-f# doesn't work either). Perhaps it eventually reboots, but I wasn't patient enough to wait that long. One solution to this problem is to specify the count blocks after which dd returns properly but still no bytes are copied. > What is slices content? > s1 - almost right FreeBSD label > s2 - not a right FreeBSD label but similar enough to label. > s3 - no label or similar at all. > How to do such a content that screw the system? > This is my way for this test: > - shorten s2 to 3 cilinder. > - disklabel -w -r wd0s2 fd360 > - restore s2 size. > I don't understand this last part, probably because I don't have much experience with labelling and partitioning. Please excuse my questions if they seem basic, but I am fairly new to disks: - how can s2 be "similar enough to label" if it is recognised as "sysid 0,(unused)" by fdisk? - how did you create s2 exactly, in order to make it "similar enough to label" yet remain unused? - how did you create s3 and s4 exactly? - why is s3 not similar at all if it is recognised as a FreeBSD slice by fdisk? - what do you mean by shortening s2 to 3 cylinders? Do you mean s2 should start at the third cylinder? - is there any reason you chose to label wd0s2 as fd360? - how should s2 size be restored? 
maybe: dd of=/dev/wd0s2 if=/dev/null bs=660b? > How can you guarantee that occasionally some > bits in slice do not fraud FreeBSD > if used for arbitrary bits? > Do not use slice begin at all. > I also didn't quite understand what is wrong with using the slice begin. Your octal dump showed how the first 017343 bytes were not nulls, but why? Is there a fixed number of bytes that should be skipped, or should this number be system dependent and tested manually? To avoid using the slice begin, could the first label be defined at a proper offset to skip the slice begin? Marc To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Sep 18 23:15:38 2000 Delivered-To: freebsd-fs@freebsd.org Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132]) by hub.freebsd.org (Postfix) with ESMTP id 7D87C37B424 for ; Mon, 18 Sep 2000 23:15:36 -0700 (PDT) Received: (from daemon@localhost) by smtp02.primenet.com (8.9.3/8.9.3) id XAA19970; Mon, 18 Sep 2000 23:12:49 -0700 (MST) Received: from usr02.primenet.com(206.165.6.202) via SMTP by smtp02.primenet.com, id smtpdAAAQeaG.M; Mon Sep 18 23:12:43 2000 Received: (from tlambert@localhost) by usr02.primenet.com (8.8.5/8.8.5) id XAA13756; Mon, 18 Sep 2000 23:15:21 -0700 (MST) From: Terry Lambert Message-Id: <200009190615.XAA13756@usr02.primenet.com> Subject: Re: how mmap buffer writes handled? 
To: mbendiks@eunet.no (Marius Bendiksen) Date: Tue, 19 Sep 2000 06:15:20 +0000 (GMT) Cc: stein@eecs.harvard.edu (Christopher Stein), freebsd-fs@FreeBSD.ORG In-Reply-To: from "Marius Bendiksen" at Sep 13, 2000 03:27:25 AM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > Most architectures that have an MMU, such as the x86, have a bit in their > page tables or equivalent that will indicate whether a page has been > modified since the last time that bit was cleared. This can be sampled and > cleared in one go. > > On architectures lacking an MMU, I think the logical approach would be to > use some of the protection facilities or such to force an exception to be > raised when accessing the page for write, and updating the statistics > based on that. An interesting exception to this is the 386, which will not cause a write fault on a protected page when in protected mode. That means if the kernel craps on the page, it is not correctly seen as being dirty. If you look at the VM code, you will see that the page is actually mapped elsewhere, and the page not present fault is trapped, and then looked up in a translation table, 8-p. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. 
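
As a rough illustration of the two dirty-page tracking schemes discussed in this thread (a hardware "modified" bit that is sampled and cleared, versus forcing a fault on the first write), here is a toy model in Python. It is only a sketch: the class and method names are hypothetical, it is not FreeBSD VM code, and it ignores the 386 kernel-write case described above.

```python
class Page:
    """Toy page-table entry; not a real MMU structure."""

    def __init__(self):
        self.modified = False   # hardware-style "modified" bit
        self.writable = True    # protection bit used by the fault scheme


class HardwareBitTracker:
    """Scheme 1: the MMU sets a modified bit on every write."""

    def __init__(self, npages):
        self.pages = [Page() for _ in range(npages)]

    def write(self, n):
        # On a real MMU the hardware sets this bit as a side effect.
        self.pages[n].modified = True

    def sample_and_clear(self):
        """Return the dirty page numbers and clear their bits in one pass."""
        dirty = [i for i, p in enumerate(self.pages) if p.modified]
        for i in dirty:
            self.pages[i].modified = False
        return dirty


class FaultTracker:
    """Scheme 2: pages start write-protected; the first write faults."""

    def __init__(self, npages):
        self.pages = [Page() for _ in range(npages)]
        for p in self.pages:
            p.writable = False
        self.dirty = set()
        self.faults = 0

    def write(self, n):
        page = self.pages[n]
        if not page.writable:
            # Simulated fault handler: record the page, then allow writes.
            self.faults += 1
            self.dirty.add(n)
            page.writable = True

    def sample_and_clear(self):
        """Return the dirty page numbers and re-arm the write fault."""
        dirty = sorted(self.dirty)
        for i in dirty:
            self.pages[i].writable = False
        self.dirty.clear()
        return dirty
```

In the fault scheme only the first write to a page after each sampling pass costs a fault; later writes to the same page go straight through. The scheme depends on every write to a protected page actually faulting, which is the assumption the 386 exception above breaks.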
From owner-freebsd-fs Tue Sep 19 7:30:51 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from aaz.links.ru (aaz.links.ru [193.125.152.37])
    by hub.freebsd.org (Postfix) with ESMTP
    id E2C0D37B422; Tue, 19 Sep 2000 07:30:43 -0700 (PDT)
Received: (from babolo@localhost)
    by aaz.links.ru (8.9.3/8.9.3) id SAA26861;
    Tue, 19 Sep 2000 18:30:36 +0400 (MSD)
Message-Id: <200009191430.SAA26861@aaz.links.ru>
Subject: Re: device naming convention
In-Reply-To: from "Marc Tardif" at "Sep 18, 0 11:59:14 pm"
To: intmktg@CAM.ORG (Marc Tardif)
Date: Tue, 19 Sep 2000 18:30:36 +0400 (MSD)
Cc: babolo@links.ru, dcs@newsguy.com, freebsd-hackers@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG
From: "Aleksandr A.Babaylov"
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Marc Tardif writes:
> > What is slices content?
> > s1 - almost right FreeBSD label
> > s2 - not a right FreeBSD label but similar enough to label.
> > s3 - no label or similar at all.
> > How to do such a content that screw the system?
> > This is my way for this test:
> > - shorten s2 to 3 cylinder.
> > - disklabel -w -r wd0s2 fd360
> > - restore s2 size.
> I don't understand this last part, probably because I don't have much
> experience with labelling and partitioning. Please excuse my questions if
> they seem basic, but I am fairly new to disks:
> - how can s2 be "similar enough to label" if it is recognised
> as "sysid 0,(unused)" by fdisk?
Sorry, I meant the content of s2...
No, s2 has some bits at the beginning that FreeBSD interprets as a label.
"sysid 0,(unused)" means nothing here - no sysid can stop a slice
from being checked for a label on it.

> - how did you create s2 exactly, in order to make it "similar
> enough to label" yet remain unused?
In the case I wrote about, the steps were:
- fdisk -u wd0
  create 3 slices of equal length, 76 cylinders each:
  s1 - sysid 165, s2 - sysid 0, s3 - sysid 165.
- reboot
- label s1 - I don't remember the exact way, nothing special I believe.
- fdisk -u wd0
  change slice s2 (the one with sysid 0) to 3 cylinders (3*660 blocks) in size
- disklabel -w -r wd0s2 fd360
- disklabel -e wd0s2
  delete b:, mark a: unused and mark c: 4.2BSD
- fdisk -u wd0
  change the size of s2 back to 76 cylinders.
- reboot
Now s2 has an invalid (broken) label (or some bits that are similar to a label)

> - how did you create s3 and s4 exactly?
s3 as above; s4 is sysid 0, start 0, size 0.

> - why is s3 not similar at all if it is recognised as a
> FreeBSD slice by fdisk?
s3 has some junk that FreeBSD does not recognize as a "label".
Again, the sysid means nothing unless it is used by the boot process or by
another system; FreeBSD looks for a label in every slice, independently of the sysid.

> - what do you mean by shortening s2 to 3 cylinders? Do you
> mean s2 should start at the third cylinder?
After the first fdisk I changed only the size of s2, not any other s2 parameter.

> - is there any reason you chose to label wd0s2 as fd360?
It is the easiest way to write something to s2 that is similar to a label.
fd360 is the first type in my /etc/disktab.

> - how should s2 size be restored? maybe:
> dd of=/dev/wd0s2 if=/dev/null bs=660b?
No. Change the size via fdisk.

> > How can you guarantee that occasionally some
> > bits in slice do not fraud FreeBSD
> > if used for arbitrary bits?
> > Do not use slice begin at all.
> I also didn't quite understand what is wrong with using the slice begin.
> Your octal dump showed how the first 017343 bytes were not nulls, but why?
> Is there a fixed number of bytes that should be skipped, or should this
> number be system dependent and tested manually?
If you use the slice in such a way that something appears in the label area
that the OS can treat as a FreeBSD label, then the label and boot areas
become write-protected.
The label area is, IMHO, 1K; the boot area in any case ends before
block 32 (the first superblock copy in ufs).
As far as I understand (but I am not certain of this), just keeping
4 bytes (addresses 0376, 00377, 00776, 00777) is sufficient.

> To avoid using the slice begin, could the first label be defined at a
> proper offset to skip the slice begin?
If you do NOT use a FreeBSD label? How?
If you do use a FreeBSD label, and just use FreeBSD partitions inside the
slice (M$ partition)? Maybe. But I have an example of such a construction
misbehaving in 2.2.X. Not tested in 4.1. Are you interested?
3.X is not interesting at all.

--
@BABOLO http://links.ru/

From owner-freebsd-fs Tue Sep 19 8:42:30 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from Gloria.CAM.ORG (Gloria.CAM.ORG [205.151.116.34])
    by hub.freebsd.org (Postfix) with ESMTP
    id 22D4137B423; Tue, 19 Sep 2000 08:42:25 -0700 (PDT)
Received: from localhost (intmktg@localhost)
    by Gloria.CAM.ORG (8.9.3/8.9.3) with ESMTP id LAA08606;
    Tue, 19 Sep 2000 11:45:40 -0400
Date: Tue, 19 Sep 2000 11:45:40 -0400 (EDT)
From: Marc Tardif
To: freebsd-fs@freebsd.org, freebsd-hackers@freebsd.org
Subject: device timings
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Considering the following disk configuration:
******* Working on device /dev/rwd0 *******
parameters extracted from in-core disklabel are:
cylinders=256 heads=132 sectors/track=63 (8316 blks/cyl)

parameters to be used for BIOS calculations are:
cylinders=256 heads=132 sectors/track=63 (8316 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165,(FreeBSD/NetBSD/386BSD)
    start 63, size 1937565 (946 Meg), flag 80 (active)
        beg: cyl 0/ sector 1/ head 1;
        end: cyl 232/ sector 63/ head 131
The data for partition 2 is:
sysid
0,(unused) start 1937628, size 58212 (28 Meg), flag 0 beg: cyl 233/ sector 1/ head 0; end: cyl 239/ sector 63/ head 131 ... Now considering the following timings done with dd, how come I get such different transfer rates (bytes/sec) for s1 and s2? I understand there should be a difference between the block and character interface, as shown in the first two timings, but why isn't the same difference shown for the last two timings? # dd if=/dev/wd0s1 of=/dev/null bs=8316b count=5 5+0 records in 5+0 records out 21288960 bytes transferred in 8.580486 secs (2481090 bytes/sec) # dd if=/dev/rwd0s1 of=/dev/null bs=8316b count=5 5+0 records in 5+0 records out 21288960 bytes transferred in 4.058639 secs (5245344 bytes/sec) # dd if=/dev/wd0s2 of=/dev/null bs=8316b count=5 5+0 records in 5+0 records out 21288960 bytes transferred in 6.066568 secs (3509226 bytes/sec) # dd if=/dev/rwd0s2 of=/dev/null bs=8316b count=5 5+0 records in 5+0 records out 21288960 bytes transferred in 6.015735 secs (3538879 bytes/sec) To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Sep 19 8:50:34 2000 Delivered-To: freebsd-fs@freebsd.org Received: from critter.freebsd.dk (flutter.freebsd.dk [212.242.40.147]) by hub.freebsd.org (Postfix) with ESMTP id 7056337B422; Tue, 19 Sep 2000 08:50:30 -0700 (PDT) Received: from critter (localhost [127.0.0.1]) by critter.freebsd.dk (8.11.0/8.9.3) with ESMTP id e8JFoQN91118; Tue, 19 Sep 2000 17:50:26 +0200 (CEST) (envelope-from phk@critter.freebsd.dk) To: Marc Tardif Cc: freebsd-fs@FreeBSD.ORG, freebsd-hackers@FreeBSD.ORG Subject: Re: device timings In-Reply-To: Your message of "Tue, 19 Sep 2000 11:45:40 EDT." 
Date: Tue, 19 Sep 2000 17:50:26 +0200
Message-ID: <91116.969378626@critter>
From: Poul-Henning Kamp
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

In message , Marc Tardif writes:

>Now considering the following timings done with dd, how come I get such
>different transfer rates (bytes/sec) for s1 and s2? I understand there
>should be a difference between the block and character interface, as shown
>in the first two timings, but why isn't the same difference shown for the
>last two timings?

Because all modern disks use "zone-layout" where there are typically
50% more sectors in the outer cylinders compared to the inner cylinders.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD coreteam member | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

From owner-freebsd-fs Tue Sep 19 8:58:35 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from urban.iinet.net.au (urban.iinet.net.au [203.59.24.231])
    by hub.freebsd.org (Postfix) with ESMTP
    id 8A9AD37B43C; Tue, 19 Sep 2000 08:58:26 -0700 (PDT)
Received: from popserver-02.iinet.net.au (popserver-02.iinet.net.au [203.59.24.148])
    by urban.iinet.net.au (8.8.7/8.8.7) with ESMTP id XAA10960;
    Tue, 19 Sep 2000 23:58:21 +0800
Received: from jules.elischer.org (reggae-38-16.nv.iinet.net.au [203.59.172.16])
    by popserver-02.iinet.net.au (8.9.3/8.9.3) with SMTP id XAA01515;
    Tue, 19 Sep 2000 23:58:17 +0800
Message-ID: <39C78D0E.7DE14518@elischer.org>
Date: Tue, 19 Sep 2000 08:58:06 -0700
From: Julian Elischer
X-Mailer: Mozilla 3.04Gold (X11; I; FreeBSD 5.0-CURRENT i386)
MIME-Version: 1.0
To: Marc Tardif
Cc: freebsd-fs@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: device timings
References:
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Modern disks pack different amounts of data on different tracks
(the outside tracks are longer, right?), so at a constant speed (rpm)
the outside tracks have more data passing below the head per unit of time
than the inside tracks do. This seems pretty normal to me.

Marc Tardif wrote:
>
> Considering the following disk configuration:
> ******* Working on device /dev/rwd0 *******
> parameters extracted from in-core disklabel are:
> cylinders=256 heads=132 sectors/track=63 (8316 blks/cyl)
>
> parameters to be used for BIOS calculations are:
> cylinders=256 heads=132 sectors/track=63 (8316 blks/cyl)
>
> Media sector size is 512
> Warning: BIOS sector numbering starts with sector 1
> Information from DOS bootblock is:
> The data for partition 1 is:
> sysid 165,(FreeBSD/NetBSD/386BSD)
>     start 63, size 1937565 (946 Meg), flag 80 (active)
>         beg: cyl 0/ sector 1/ head 1;
>         end: cyl 232/ sector 63/ head 131
> The data for partition 2 is:
> sysid 0,(unused)
>     start 1937628, size 58212 (28 Meg), flag 0
>         beg: cyl 233/ sector 1/ head 0;
>         end: cyl 239/ sector 63/ head 131
> ...
>
> Now considering the following timings done with dd, how come I get such
> different transfer rates (bytes/sec) for s1 and s2? I understand there
> should be a difference between the block and character interface, as shown
> in the first two timings, but why isn't the same difference shown for the
> last two timings?
> > # dd if=/dev/wd0s1 of=/dev/null bs=8316b count=5 > 5+0 records in > 5+0 records out > 21288960 bytes transferred in 8.580486 secs (2481090 bytes/sec) > > # dd if=/dev/rwd0s1 of=/dev/null bs=8316b count=5 > 5+0 records in > 5+0 records out > 21288960 bytes transferred in 4.058639 secs (5245344 bytes/sec) > > # dd if=/dev/wd0s2 of=/dev/null bs=8316b count=5 > 5+0 records in > 5+0 records out > 21288960 bytes transferred in 6.066568 secs (3509226 bytes/sec) > > # dd if=/dev/rwd0s2 of=/dev/null bs=8316b count=5 > 5+0 records in > 5+0 records out > 21288960 bytes transferred in 6.015735 secs (3538879 bytes/sec) > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-fs" in the body of the message -- __--_|\ Julian Elischer / \ julian@elischer.org ( OZ ) World tour 2000 ---> X_.---._/ presently in: Perth v To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Tue Sep 19 17:30:35 2000 Delivered-To: freebsd-fs@freebsd.org Received: from whistle.com (s205m131.whistle.com [207.76.205.131]) by hub.freebsd.org (Postfix) with ESMTP id 709B337B422; Tue, 19 Sep 2000 17:30:33 -0700 (PDT) Received: (from smap@localhost) by whistle.com (8.10.0/8.10.0) id e8K0UWX20934; Tue, 19 Sep 2000 17:30:32 -0700 (PDT) Received: from bubba.whistle.com( 207.76.205.7) by whistle.com via smap (V2.0) id xma020932; Tue, 19 Sep 2000 17:30:06 -0700 Received: (from archie@localhost) by bubba.whistle.com (8.9.3/8.9.3) id RAA08442; Tue, 19 Sep 2000 17:30:06 -0700 (PDT) (envelope-from archie) From: Archie Cobbs Message-Id: <200009200030.RAA08442@bubba.whistle.com> Subject: disable write caching with softupdates? 
To: fs@freebsd.org Date: Tue, 19 Sep 2000 17:30:06 -0700 (PDT) Cc: sos@freebsd.org X-Mailer: ELM [version 2.4ME+ PL82 (25)] MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Isn't it safer (in the face of a power failure) to disable write caching on a hard disk when softupdates is in use? The ata driver currenly always enables write caching. Perhaps there should be a sysctl knob to turn it on/off? -Archie ___________________________________________________________________________ Archie Cobbs * Whistle Communications, Inc. * http://www.whistle.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Sep 20 0: 4:12 2000 Delivered-To: freebsd-fs@freebsd.org Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133]) by hub.freebsd.org (Postfix) with ESMTP id AA4E637B424; Wed, 20 Sep 2000 00:04:10 -0700 (PDT) Received: (from daemon@localhost) by smtp03.primenet.com (8.9.3/8.9.3) id AAA14533; Wed, 20 Sep 2000 00:02:47 -0700 (MST) Received: from usr05.primenet.com(206.165.6.205) via SMTP by smtp03.primenet.com, id smtpdAAA5laqvC; Wed Sep 20 00:02:39 2000 Received: (from tlambert@localhost) by usr05.primenet.com (8.8.5/8.8.5) id AAA00154; Wed, 20 Sep 2000 00:03:59 -0700 (MST) From: Terry Lambert Message-Id: <200009200703.AAA00154@usr05.primenet.com> Subject: Re: disable write caching with softupdates? 
To: archie@whistle.com (Archie Cobbs)
Date: Wed, 20 Sep 2000 07:03:59 +0000 (GMT)
Cc: fs@FreeBSD.ORG, sos@FreeBSD.ORG
In-Reply-To: <200009200030.RAA08442@bubba.whistle.com> from "Archie Cobbs" at Sep 19, 2000 05:30:06 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> Isn't it safer (in the face of a power failure) to disable write
> caching on a hard disk when softupdates is in use?

Yes. You _must_ guarantee that the drive does not complete
writes out of sequence that it reports having completed in
sequence. Hardware which lies is evil.

> The ata driver currently always enables write caching. Perhaps
> there should be a sysctl knob to turn it on/off?

Write caching should _never_ be enabled, unless you don't
care about the data, or the drive reports the operation
queueing and completion separately, so that the OS knows
the completion order; even then, the OS will have to be
prepared to stall writing new data until completion has
occurred at any given synchronization point, so that it is
impossible for the drive to complete the requests out of
the order permitted by the OS.

With regard to "_never_": even a sync mounted FS will not
be recoverable to a deterministic state if write caching
does not guarantee completion in FIFO order, for obvious
reasons -- it doesn't matter if you go async in the kernel,
or async in the drive, either way your data gets screwed.

The only exception would be if, like NetApp boxes, PrestoServ,
and similar systems, your writes were intention logged to
NVRAM before being scheduled.

Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
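
To make the ordering argument concrete, here is a toy Python model. It is a sketch under stated assumptions, not driver or softupdates code: the three write names and all function names are hypothetical, and the model assumes that each write depends on every write issued before it and that the drive acknowledges writes before committing them to the media.

```python
from itertools import permutations

# Writes in the order the OS issued them (illustrative names only);
# each later write assumes the earlier ones are already on the media.
ISSUED = ["init inode", "add dir entry", "update free map"]


def is_consistent(on_disk):
    """True if the committed writes form a prefix of the issue order."""
    return set(on_disk) == set(ISSUED[:len(on_disk)])


def crash_states(reorder):
    """All media states possible if power fails after k writes commit.

    reorder=False models a drive that commits in FIFO order;
    reorder=True models a write cache free to commit in any order.
    """
    orders = permutations(ISSUED) if reorder else [tuple(ISSUED)]
    states = set()
    for order in orders:
        for k in range(len(order) + 1):
            states.add(frozenset(order[:k]))
    return states


def unsafe_states(reorder):
    """Crash states that the OS's ordering rules never allowed."""
    bad = [s for s in crash_states(reorder) if not is_consistent(s)]
    return sorted(tuple(sorted(s)) for s in bad)


if __name__ == "__main__":
    print("FIFO drive, unsafe crash states:", unsafe_states(False))
    print("Reordering cache, unsafe crash states:")
    for state in unsafe_states(True):
        print("  ", state)
```

With FIFO completion every crash state is a prefix of what the OS issued, so recovery sees a state the filesystem code planned for. Once the cache may reorder, states such as a directory entry on the media without its initialized inode become possible, which is the case the message above warns about.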
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Sep 20 0: 4:39 2000 Delivered-To: freebsd-fs@freebsd.org Received: from mail-relay.eunet.no (mail-relay.eunet.no [193.71.71.242]) by hub.freebsd.org (Postfix) with ESMTP id 4400437B422 for ; Wed, 20 Sep 2000 00:04:37 -0700 (PDT) Received: from login-1.eunet.no (login-1.eunet.no [193.75.110.2]) by mail-relay.eunet.no (8.9.3/8.9.3/GN) with ESMTP id JAA32523; Wed, 20 Sep 2000 09:04:32 +0200 (CEST) (envelope-from mbendiks@eunet.no) Received: from localhost (mbendiks@localhost) by login-1.eunet.no (8.9.3/8.8.8) with ESMTP id JAA30216; Wed, 20 Sep 2000 09:04:32 +0200 (CEST) (envelope-from mbendiks@eunet.no) X-Authentication-Warning: login-1.eunet.no: mbendiks owned process doing -bs Date: Wed, 20 Sep 2000 09:04:32 +0200 (CEST) From: Marius Bendiksen To: Terry Lambert Cc: Christopher Stein , freebsd-fs@FreeBSD.ORG Subject: Re: how mmap buffer writes handled? In-Reply-To: <200009190615.XAA13756@usr02.primenet.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > use some of the protection facilities or such to force an exception to be > > raised when accessing the page for write, and updating the statistics > > based on that. > as being dirty. If you look at the VM code, you will see that > the page is actually mapped elsewhere, and the page not present > fault is trapped, and then looked up in a translation table, 8-p. Indeed, so it does. But why would we have code to handle this on i386? 
Marius (PS: The method you mentioned would still qualify as "force an exception to be raised when accessing the page for write" ;) To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Sep 20 0:10:37 2000 Delivered-To: freebsd-fs@freebsd.org Received: from mail-relay.eunet.no (mail-relay.eunet.no [193.71.71.242]) by hub.freebsd.org (Postfix) with ESMTP id BA47F37B422 for ; Wed, 20 Sep 2000 00:10:34 -0700 (PDT) Received: from login-1.eunet.no (login-1.eunet.no [193.75.110.2]) by mail-relay.eunet.no (8.9.3/8.9.3/GN) with ESMTP id JAA33746; Wed, 20 Sep 2000 09:10:32 +0200 (CEST) (envelope-from mbendiks@eunet.no) Received: from localhost (mbendiks@localhost) by login-1.eunet.no (8.9.3/8.8.8) with ESMTP id JAA30253; Wed, 20 Sep 2000 09:10:32 +0200 (CEST) (envelope-from mbendiks@eunet.no) X-Authentication-Warning: login-1.eunet.no: mbendiks owned process doing -bs Date: Wed, 20 Sep 2000 09:10:32 +0200 (CEST) From: Marius Bendiksen To: Archie Cobbs Cc: fs@FreeBSD.ORG Subject: Re: disable write caching with softupdates? In-Reply-To: <200009200030.RAA08442@bubba.whistle.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > Isn't it safer (in the face of a power failure) to disable write > caching on a hard disk when softupdates is in use? Not necessarily safer, I'd think, if you refer to consistency. But certainly if you refer to the amount of data lost; but if that is a concern, you might want to decrease the syncer interval. > The ata driver currenly always enables write caching. Perhaps > there should be a sysctl knob to turn it on/off? *nod* This sounds like a good idea. 
Marius To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Sep 20 0:12:59 2000 Delivered-To: freebsd-fs@freebsd.org Received: from smtp01.primenet.com (smtp01.primenet.com [206.165.6.131]) by hub.freebsd.org (Postfix) with ESMTP id 81DAA37B423 for ; Wed, 20 Sep 2000 00:12:57 -0700 (PDT) Received: (from daemon@localhost) by smtp01.primenet.com (8.9.3/8.9.3) id AAA13826; Wed, 20 Sep 2000 00:12:13 -0700 (MST) Received: from usr05.primenet.com(206.165.6.205) via SMTP by smtp01.primenet.com, id smtpdAAA_dai5A; Wed Sep 20 00:12:03 2000 Received: (from tlambert@localhost) by usr05.primenet.com (8.8.5/8.8.5) id AAA00357; Wed, 20 Sep 2000 00:12:38 -0700 (MST) From: Terry Lambert Message-Id: <200009200712.AAA00357@usr05.primenet.com> Subject: Re: how mmap buffer writes handled? To: mbendiks@eunet.no (Marius Bendiksen) Date: Wed, 20 Sep 2000 07:12:37 +0000 (GMT) Cc: tlambert@primenet.com (Terry Lambert), stein@eecs.harvard.edu (Christopher Stein), freebsd-fs@FreeBSD.ORG In-Reply-To: from "Marius Bendiksen" at Sep 20, 2000 09:04:32 AM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > > use some of the protection facilities or such to force an exception to be > > > raised when accessing the page for write, and updating the statistics > > > based on that. > > as being dirty. If you look at the VM code, you will see that > > the page is actually mapped elsewhere, and the page not present > > fault is trapped, and then looked up in a translation table, 8-p. > > Indeed, so it does. But why would we have code to handle this on i386? 
Per my original post, the reason is that a kernel write fault, such as might happen as the result of a file read into an address in an mmap()'ed area, will fail to cause the page in question to be marked dirty, without this hack, and VM coherency will fail to be maintained. Failure of VM coherency is bad. > (PS: The method you mentioned would still qualify as "force an exception > to be raised when accessing the page for write" ;) Actually, not. It's on the order of the F00F bug fix, which is a gross kludge of the worst sort. The page being written doesn't exist. It's a non-existance exception, not an access exception, since if the page mapping existed, it wouldn't result in the exception in the first place. 8-). Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Sep 20 0:26:14 2000 Delivered-To: freebsd-fs@freebsd.org Received: from mail-relay.eunet.no (mail-relay.eunet.no [193.71.71.242]) by hub.freebsd.org (Postfix) with ESMTP id A3EA337B424 for ; Wed, 20 Sep 2000 00:26:12 -0700 (PDT) Received: from login-1.eunet.no (login-1.eunet.no [193.75.110.2]) by mail-relay.eunet.no (8.9.3/8.9.3/GN) with ESMTP id JAA37040; Wed, 20 Sep 2000 09:26:10 +0200 (CEST) (envelope-from mbendiks@eunet.no) Received: from localhost (mbendiks@localhost) by login-1.eunet.no (8.9.3/8.8.8) with ESMTP id JAA30318; Wed, 20 Sep 2000 09:26:10 +0200 (CEST) (envelope-from mbendiks@eunet.no) X-Authentication-Warning: login-1.eunet.no: mbendiks owned process doing -bs Date: Wed, 20 Sep 2000 09:26:10 +0200 (CEST) From: Marius Bendiksen To: Terry Lambert Cc: freebsd-fs@FreeBSD.ORG Subject: Re: how mmap buffer writes handled? 
In-Reply-To: <200009200712.AAA00357@usr05.primenet.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > Per my original post, the reason is that a kernel write fault, Oops. I misunderstood your post. Apologies. > > (PS: The method you mentioned would still qualify as "force an exception > > to be raised when accessing the page for write" ;) > > Actually, not. It's on the order of the F00F bug fix, which is > a gross kludge of the worst sort. The page being written doesn't Agree. Might there not be some cleaner way of resolving the F00F bug ? ISTR having looked into this some time back. > exist. It's a non-existance exception, not an access exception, > since if the page mapping existed, it wouldn't result in the > exception in the first place. 8-). I did not specify "access" as a qualifier to "exception". Marius To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Sep 20 0:41:43 2000 Delivered-To: freebsd-fs@freebsd.org Received: from freebsd.dk (freebsd.dk [212.242.42.178]) by hub.freebsd.org (Postfix) with ESMTP id CCA9337B422; Wed, 20 Sep 2000 00:41:39 -0700 (PDT) Received: (from sos@localhost) by freebsd.dk (8.9.3/8.9.1) id JAA44730; Wed, 20 Sep 2000 09:45:33 +0200 (CEST) (envelope-from sos) From: Soren Schmidt Message-Id: <200009200745.JAA44730@freebsd.dk> Subject: Re: disable write caching with softupdates? 
In-Reply-To: <200009200703.AAA00154@usr05.primenet.com> from Terry Lambert at "Sep 20, 2000 07:03:59 am"
To: tlambert@primenet.com (Terry Lambert)
Date: Wed, 20 Sep 2000 09:45:33 +0200 (CEST)
Cc: archie@whistle.com (Archie Cobbs), fs@FreeBSD.ORG, sos@FreeBSD.ORG
X-Mailer: ELM [version 2.4ME+ PL54 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

It seems Terry Lambert wrote:
> > Isn't it safer (in the face of a power failure) to disable write
> > caching on a hard disk when softupdates is in use?
>
> Yes. You _must_ guarantee that the drive does not complete
> writes out of sequence that it reports having completed in
> sequence. Hardware which lies is evil.

Hmm, the write caching on ATA drives (if they support it at all,
very few actually do) is guaranteed to be able to write the data
to disk on power failure, or at least so they say, and I've not
been able to prove otherwise.

> > The ata driver currently always enables write caching. Perhaps
> > there should be a sysctl knob to turn it on/off?
>
> Write caching should _never_ be enabled, unless you don't
> care about the data, or the drive reports the operation
> queueing and completion separately, so that the OS knows
> the completion order; even then, the OS will have to be
> prepared to stall writing new data until completion has
> occurred at any given synchronization point, so that it is
> impossible for the drive to complete the requests out of
> the order permitted by the OS.

Hmm, the way this (should) work in ATA drives, there should be
no such problem; I've never seen it, and believe me, I've tried
hard to provoke problems this way...
-Søren To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Sep 20 1: 8:12 2000 Delivered-To: freebsd-fs@freebsd.org Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132]) by hub.freebsd.org (Postfix) with ESMTP id EC8EF37B422 for ; Wed, 20 Sep 2000 01:08:09 -0700 (PDT) Received: (from daemon@localhost) by smtp02.primenet.com (8.9.3/8.9.3) id BAA16389; Wed, 20 Sep 2000 01:05:24 -0700 (MST) Received: from usr01.primenet.com(206.165.6.201) via SMTP by smtp02.primenet.com, id smtpdAAAqBaG_F; Wed Sep 20 01:05:16 2000 Received: (from tlambert@localhost) by usr01.primenet.com (8.8.5/8.8.5) id BAA27583; Wed, 20 Sep 2000 01:07:58 -0700 (MST) From: Terry Lambert Message-Id: <200009200807.BAA27583@usr01.primenet.com> Subject: Re: how mmap buffer writes handled? To: mbendiks@eunet.no (Marius Bendiksen) Date: Wed, 20 Sep 2000 08:07:58 +0000 (GMT) Cc: tlambert@primenet.com (Terry Lambert), freebsd-fs@FreeBSD.ORG In-Reply-To: from "Marius Bendiksen" at Sep 20, 2000 09:26:10 AM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > Actually, not. It's on the order of the F00F bug fix, which is > > a gross kludge of the worst sort. The page being written doesn't > > Agree. Might there not be some cleaner way of resolving the F00F bug ? > ISTR having looked into this some time back. Yeah, there's actually a great fix: replace the faulty chip. > > exist. It's a non-existance exception, not an access exception, > > since if the page mapping existed, it wouldn't result in the > > exception in the first place. 8-). > > I did not specify "access" as a qualifier to "exception". But you did say "[...]when accessing the page for write". You can't access a non-existant page, only an existant one. 8-). 
Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Sep 20 1:19: 5 2000 Delivered-To: freebsd-fs@freebsd.org Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id B8A2D37B424; Wed, 20 Sep 2000 01:19:02 -0700 (PDT) Received: (from daemon@localhost) by smtp05.primenet.com (8.9.3/8.9.3) id BAA11803; Wed, 20 Sep 2000 01:19:17 -0700 (MST) Received: from usr01.primenet.com(206.165.6.201) via SMTP by smtp05.primenet.com, id smtpdAAAYVaO_w; Wed Sep 20 01:19:08 2000 Received: (from tlambert@localhost) by usr01.primenet.com (8.8.5/8.8.5) id BAA27874; Wed, 20 Sep 2000 01:18:51 -0700 (MST) From: Terry Lambert Message-Id: <200009200818.BAA27874@usr01.primenet.com> Subject: Re: disable write caching with softupdates? To: sos@freebsd.dk (Soren Schmidt) Date: Wed, 20 Sep 2000 08:18:51 +0000 (GMT) Cc: tlambert@primenet.com (Terry Lambert), archie@whistle.com (Archie Cobbs), fs@FreeBSD.ORG, sos@FreeBSD.ORG In-Reply-To: <200009200745.JAA44730@freebsd.dk> from "Soren Schmidt" at Sep 20, 2000 09:45:33 AM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > > Isn't it safer (in the face of a power failure) to disable write > > > caching on a hard disk when softupdates is in use? > > > > Yes. You _must_ guarantee that the drive does not complete > > writes out of sequence that it reports having completed in > > sequence. Hardware which lies is evil. 
> > Hmm, the write caching on ATA drives (if they support it at all, > very few actually does), is guarantied to be able to write the > data to disk on power failure, or at least so they say, and I've > not been able to prove otherwise. The ATA drives Whistle is using, which is what I'm assuming Archie is on about, do _not_ support this facility. As far as I can tell, there wre some SCSI drives manufactured by IBM at one time which could do this, and some lab drives at Quantum (also SCSI). The InterJet II _specifically_ uses a non-standard power supply to obtain an AC fail notification in sufficient time so as to not schedule additional writes over a DC failure event. In fact, both the Quantum and now IBM drives which are used in the InterJet II (both ATA drives) fail catastrophically on a power loss during a sector write, to the point of you potentially needing to reformat the sector, if you were so unwise as to be writing when DC to the drive dropped. The only way to get rid of this requirement is either to use a Journalled FS (you might remember me being upset about the IBM announcement of JFS being released under GPL, before we found out that it was the OS/2 JFS, and not the good one), or to do intention write logging to NVRAM (also expensive in terms of hardware). The only bonus is that the new power supply costs a lot less than the UPS in the InterJet I. > > > The ata driver currenly always enables write caching. Perhaps > > > there should be a sysctl knob to turn it on/off? > > > > Write caching should _never_ be enabled, unless you don't > > care about the data, or the drive reports the operation > > queueing and completion seperately, so that the OS knows > > the completion order; even then, the OS will have to be > > prepared to stall writing new data until completion has > > occurred at any given synchronizatin point, so that it is > > impossible for the drive to complete the requests out of > > the order permitted by the OS. 
> > Hmm, they way this (should) work in ATA drives there should > be no such problem, and I've never seen it, and belive me I've > treid hard to provoke problems this way... I don't think this is enough to ship 100,000 units to customers in the field; absence of evidence is not evidence of absence, and the drive manufacturers specifically state that a 1 sector corruption is possible if a write is occuring during DC failure. It sucks, but it's true. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Sep 20 2:16:58 2000 Delivered-To: freebsd-fs@freebsd.org Received: from freebsd.dk (freebsd.dk [212.242.42.178]) by hub.freebsd.org (Postfix) with ESMTP id D231337B422; Wed, 20 Sep 2000 02:16:55 -0700 (PDT) Received: (from sos@localhost) by freebsd.dk (8.9.3/8.9.1) id LAA66651; Wed, 20 Sep 2000 11:20:46 +0200 (CEST) (envelope-from sos) From: Soren Schmidt Message-Id: <200009200920.LAA66651@freebsd.dk> Subject: Re: disable write caching with softupdates? In-Reply-To: <200009200818.BAA27874@usr01.primenet.com> from Terry Lambert at "Sep 20, 2000 08:18:51 am" To: tlambert@primenet.com (Terry Lambert) Date: Wed, 20 Sep 2000 11:20:46 +0200 (CEST) Cc: archie@whistle.com (Archie Cobbs), fs@FreeBSD.ORG, sos@FreeBSD.ORG X-Mailer: ELM [version 2.4ME+ PL54 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org It seems Terry Lambert wrote: > > > > Isn't it safer (in the face of a power failure) to disable write > > > > caching on a hard disk when softupdates is in use? > > > > > > Yes. You _must_ guarantee that the drive does not complete > > > writes out of sequence that it reports having completed in > > > sequence. Hardware which lies is evil. 
> > > > Hmm, the write caching on ATA drives (if they support it at all, > > very few actually does), is guarantied to be able to write the > > data to disk on power failure, or at least so they say, and I've > > not been able to prove otherwise. > > The ATA drives Whistle is using, which is what I'm assuming > Archie is on about, do _not_ support this facility. As far > as I can tell, there wre some SCSI drives manufactured by > IBM at one time which could do this, and some lab drives at > Quantum (also SCSI). Hmm, well, lets disable this then, there is no need to complicate things :) -Søren To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Sep 20 7:24:55 2000 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.quantum.com (mx1.quantum.com [204.212.103.34]) by hub.freebsd.org (Postfix) with ESMTP id 2E3F437B422; Wed, 20 Sep 2000 07:24:51 -0700 (PDT) Received: from milcmima.qntm.com (milcmima.qntm.com [146.174.18.61]) by mx1.quantum.com (8.9.3 (PHNE_18979)/8.9.3) with ESMTP id HAA11704; Wed, 20 Sep 2000 07:24:49 -0700 (PDT) Received: by milcmima.qntm.com with Internet Mail Service (5.5.2650.21) id ; Wed, 20 Sep 2000 07:24:46 -0700 Message-ID: <8133266FE373D11190CD00805FA768BF055BD1C9@shrcmsg1.tdh.qntm.com> From: Stephen Byan To: fs@FreeBSD.ORG Cc: sos@FreeBSD.ORG, "'freeBSD-scsi@freeBSD.org'" Subject: RE: disable write caching with softupdates? Date: Wed, 20 Sep 2000 07:24:44 -0700 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2650.21) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Terry Lambert [mailto:tlambert@primenet.com] wrote: > > Isn't it safer (in the face of a power failure) to disable write > > caching on a hard disk when softupdates is in use? > > Yes. You _must_ guarantee that the drive does not complete > writes out of sequence that it reports having completed in > sequence. 
Hardware which lies is evil. > > > > The ata driver currently always enables write caching. Perhaps > > there should be a sysctl knob to turn it on/off? > > Write caching should _never_ be enabled, unless you don't > care about the data, or the drive reports the operation > queueing and completion separately, so that the OS knows > the completion order; even then, the OS will have to be > prepared to stall writing new data until completion has > occurred at any given synchronization point, so that it is > impossible for the drive to complete the requests out of > the order permitted by the OS. > > With regard to "_never_": even a sync mounted FS will not > be recoverable to a deterministic state if write caching > does not guarantee completion in FIFO order, for obvious > reasons -- it doesn't matter if you go async in the kernel, > or async in the drive, either way your data gets screwed. Wouldn't it be acceptable to mark the meta-data writes as non-cacheable (i.e. write through to the media before signalling completion), and let the remaining writes (user data writes) be cacheable? I think this would improve the performance of the file system. SCSI has supported this for years, in the form of the FUA (Force Unit Access) bit in the CDB for the write command. Somewhat similar behavior can be had in the newer flavors of ATA by issuing a "flush cache" command after each meta-data write, and waiting until the flush command completes before signalling the completion of the non-cacheable write.
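The write-through idea above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyDrive`, its methods, and the block names are hypothetical and are not the FreeBSD ata driver or any real SCSI/ATA command interface. It shows a FUA-style write reaching the media before completion, and the ATA-style substitute of a cached write followed by a cache flush.

```python
class ToyDrive:
    """Toy model of a disk with a volatile write cache (hypothetical, illustration only)."""

    def __init__(self):
        self.media = {}   # blocks that have reached the platters
        self.cache = {}   # blocks acknowledged but still only in volatile cache

    def write(self, block, data, fua=False):
        if fua:
            # FUA-style write-through: data is on media before completion is signalled.
            self.cache.pop(block, None)
            self.media[block] = data
        else:
            # Cached write: completion is signalled while the data is still volatile.
            self.cache[block] = data

    def flush_cache(self):
        # Push every cached block to the media, then empty the cache.
        self.media.update(self.cache)
        self.cache.clear()

    def power_fail(self):
        # Anything still in the volatile cache is lost.
        self.cache.clear()


def write_metadata_via_flush(drive, block, data):
    """ATA-style substitute for FUA: cached write, then wait for a flush."""
    drive.write(block, data)
    drive.flush_cache()


def run_demo():
    drive = ToyDrive()
    drive.write("inode-7", "size=4096", fua=True)  # meta-data: non-cacheable
    drive.write("data-100", "user bytes")          # user data: cacheable
    drive.power_fail()
    return drive.media
```

In this model `run_demo()` leaves only the meta-data block on the media after the simulated power failure. Note that the flush-based variant also forces out any cached user data, which is one reason it can cost more than a true per-write FUA.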
Regards, -Steve Steve Byan Design Engineer MS 1-3/E23 333 South Street Shrewsbury, MA 01545 (508)770-3414 fax: (508)770-2604 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Wed Sep 20 8:19:34 2000 Delivered-To: freebsd-fs@freebsd.org Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by hub.freebsd.org (Postfix) with ESMTP id 96A3737B422; Wed, 20 Sep 2000 08:19:29 -0700 (PDT) Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.8.7/8.8.7) with ESMTP id CAA09412; Thu, 21 Sep 2000 02:18:52 +1100 Date: Thu, 21 Sep 2000 02:18:48 +1100 (EST) From: Bruce Evans X-Sender: bde@besplex.bde.org To: Marc Tardif Cc: "Aleksandr A.Babaylov" , "Daniel C. Sobral" , freebsd-hackers@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG Subject: Re: device naming convention In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Mon, 18 Sep 2000, Marc Tardif wrote: > > 0cicuta/home/babolo(9)#dd of=/dev/wd0s2 if=/dev/zero bs=660b > > 1cicuta/home/babolo(11)#od -b /dev/wd0s2 > [ snip ] > > Why I use 2.2.7 for test? > > Because my lovely 4.1-STABLE is extremely unstable with the content of > > ad0s2 (wd0s2) above and silently reboots after the first dd in the test above. > > > Assuming my wd0s2 is still unused and of size 0, 3.5-STABLE also crashes in the test above (no disk activity, ctrl-c doesn't work, alt-f# doesn't work either). Perhaps it eventually reboots, but I wasn't patient enough to wait that long. One solution to this problem is to specify a block count, after which dd returns properly, but still no bytes are copied. [Please use lines somewhat shorter than 360 characters.] This is a completely different problem. wd0s2 was buffered in 3.5, and buffered devices are very broken in 3.1 and later versions of 3.x (write errors are retried endlessly.
Among other bugs, writing beyond EOF hangs the system when it allocates all buffers for writing unwritable data). Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 1:24:13 2000 Delivered-To: freebsd-fs@freebsd.org Received: from freebsd.dk (freebsd.dk [212.242.42.178]) by hub.freebsd.org (Postfix) with ESMTP id B16BE37B50D; Thu, 21 Sep 2000 01:24:07 -0700 (PDT) Received: (from sos@localhost) by freebsd.dk (8.9.3/8.9.1) id KAA98478; Thu, 21 Sep 2000 10:25:00 +0200 (CEST) (envelope-from sos) From: Soren Schmidt Message-Id: <200009210825.KAA98478@freebsd.dk> Subject: Re: disable write caching with softupdates? In-Reply-To: <8133266FE373D11190CD00805FA768BF055BD1C9@shrcmsg1.tdh.qntm.com> from Stephen Byan at "Sep 20, 2000 07:24:44 am" To: Stephen.Byan@quantum.com (Stephen Byan) Date: Thu, 21 Sep 2000 10:25:00 +0200 (CEST) Cc: fs@FreeBSD.ORG, sos@FreeBSD.ORG, freeBSD-scsi@FreeBSD.ORG ('freeBSD-scsi@freeBSD.org') X-Mailer: ELM [version 2.4ME+ PL54 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org It seems Stephen Byan wrote: > > > > Write caching should _never_ be enabled, unless you don't > > care about the data, or the drive reports the operation > > queueing and completion seperately, so that the OS knows > > the completion order; even then, the OS will have to be > > prepared to stall writing new data until completion has > > occurred at any given synchronizatin point, so that it is > > impossible for the drive to complete the requests out of > > the order permitted by the OS. 
> > > > With regard to "_never_": even a sync mounted FS will not > > be recoverable to a deterministic state if write caching > > does not guarantee completion in FIFO order, for obvious > > reasons -- it doesn't matter if you go async in the kernel, > > or async in the drive, either way your data gets screwed. > > Wouldn't it be acceptable to mark the meta-data writes as non-cacheable > (i.e. write though to the media before signalling completion), and let the > remaining writes (user data writes) be cacheable? I think this would improve > the performance of the file system. > > SCSI has supported this for years, in the form of the FUA bit in the CDB for > the write command. Somewhat similar behavior can be had in the newer flavors > of ATA by issuing a "flush cache" command after each meta-data write, and > waiting until the flush command completes before signalling the completion > of the non-cacheable write. OK, I played a bit with that, the only info I can see I get from the higher levels is the BIO_ORDERED bit, so I tried to flush the cache each time I get one of those, _bad_ idea, 10% performance loss... 
Suggestions are welcome :) -Søren To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 1:26:25 2000 Delivered-To: freebsd-fs@freebsd.org Received: from ncmail.netcentralen.dk (ncmail.netcentralen.dk [195.24.7.103]) by hub.freebsd.org (Postfix) with ESMTP id 9888837B422 for ; Thu, 21 Sep 2000 01:26:22 -0700 (PDT) Received: from mother.netcentralen.dk (mother.netcentralen.dk [195.24.7.107]) by ncmail.netcentralen.dk (8.9.3/8.9.3) with ESMTP id KAA56560 for ; Thu, 21 Sep 2000 10:40:21 +0200 (CEST) (envelope-from mar@netcentralen.dk) Received: by mother.netcentralen.dk with Internet Mail Service (5.5.2650.21) id ; Thu, 21 Sep 2000 10:30:21 +0200 Message-ID: <9164771DDCABD3118333005004E9446E204774@mother.netcentralen.dk> From: Michael Aronsen To: "'fs@freebsd.org'" Subject: Journaling Filesystems in bsd? Date: Thu, 21 Sep 2000 10:30:19 +0200 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2650.21) Content-Type: text/plain Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Hello Just wanted to know if there are any projects to get something like reiserfs to FreeBSD? 
Michael To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 4:41:33 2000 Delivered-To: freebsd-fs@freebsd.org Received: from mail-relay.eunet.no (mail-relay.eunet.no [193.71.71.242]) by hub.freebsd.org (Postfix) with ESMTP id C257037B423 for ; Thu, 21 Sep 2000 04:41:30 -0700 (PDT) Received: from login-1.eunet.no (login-1.eunet.no [193.75.110.2]) by mail-relay.eunet.no (8.9.3/8.9.3/GN) with ESMTP id NAA56578; Thu, 21 Sep 2000 13:41:27 +0200 (CEST) (envelope-from mbendiks@eunet.no) Received: from localhost (mbendiks@localhost) by login-1.eunet.no (8.9.3/8.8.8) with ESMTP id NAA38115; Thu, 21 Sep 2000 13:41:27 +0200 (CEST) (envelope-from mbendiks@eunet.no) X-Authentication-Warning: login-1.eunet.no: mbendiks owned process doing -bs Date: Thu, 21 Sep 2000 13:41:27 +0200 (CEST) From: Marius Bendiksen To: Terry Lambert Cc: freebsd-fs@FreeBSD.ORG Subject: Re: how mmap buffer writes handled? In-Reply-To: <200009200807.BAA27583@usr01.primenet.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > Agree. Might there not be some cleaner way of resolving the F00F bug ? > > ISTR having looked into this some time back. > Yeah, there's actually a great fix: replace the faulty chip. Software, I mean. Personally, I use the K6 chip instead. > But you did say "[...]when accessing the page for write". You > can't access a non-existant page, only an existant one. 8-). I was speaking from a program's point of view. The program does not know the page is not present. To get the system's point of view, you would amend this to "attempting to access". 
Marius To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 4:43:40 2000 Delivered-To: freebsd-fs@freebsd.org Received: from mail-relay.eunet.no (mail-relay.eunet.no [193.71.71.242]) by hub.freebsd.org (Postfix) with ESMTP id 597CE37B423; Thu, 21 Sep 2000 04:43:38 -0700 (PDT) Received: from login-1.eunet.no (login-1.eunet.no [193.75.110.2]) by mail-relay.eunet.no (8.9.3/8.9.3/GN) with ESMTP id NAA56995; Thu, 21 Sep 2000 13:43:36 +0200 (CEST) (envelope-from mbendiks@eunet.no) Received: from localhost (mbendiks@localhost) by login-1.eunet.no (8.9.3/8.8.8) with ESMTP id NAA38123; Thu, 21 Sep 2000 13:43:36 +0200 (CEST) (envelope-from mbendiks@eunet.no) X-Authentication-Warning: login-1.eunet.no: mbendiks owned process doing -bs Date: Thu, 21 Sep 2000 13:43:36 +0200 (CEST) From: Marius Bendiksen To: Soren Schmidt Cc: Terry Lambert , Archie Cobbs , fs@FreeBSD.ORG, sos@FreeBSD.ORG Subject: Re: disable write caching with softupdates? In-Reply-To: <200009200920.LAA66651@freebsd.dk> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > The ATA drives Whistle is using, which is what I'm assuming > > Archie is on about, do _not_ support this facility. As far > > as I can tell, there wre some SCSI drives manufactured by > > IBM at one time which could do this, and some lab drives at > > Quantum (also SCSI). > Hmm, well, lets disable this then, there is no need to complicate > things :) Please make this conditional, as people with non-crippled hardware might want to employ the write cache. A sysctl or build option would be best. 
Marius To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 4:48:57 2000 Delivered-To: freebsd-fs@freebsd.org Received: from mail-relay.eunet.no (mail-relay.eunet.no [193.71.71.242]) by hub.freebsd.org (Postfix) with ESMTP id 7C34F37B42C; Thu, 21 Sep 2000 04:48:51 -0700 (PDT) Received: from login-1.eunet.no (login-1.eunet.no [193.75.110.2]) by mail-relay.eunet.no (8.9.3/8.9.3/GN) with ESMTP id NAA58037; Thu, 21 Sep 2000 13:48:49 +0200 (CEST) (envelope-from mbendiks@eunet.no) Received: from localhost (mbendiks@localhost) by login-1.eunet.no (8.9.3/8.8.8) with ESMTP id NAA38177; Thu, 21 Sep 2000 13:48:49 +0200 (CEST) (envelope-from mbendiks@eunet.no) X-Authentication-Warning: login-1.eunet.no: mbendiks owned process doing -bs Date: Thu, 21 Sep 2000 13:48:49 +0200 (CEST) From: Marius Bendiksen To: Stephen Byan Cc: fs@FreeBSD.ORG, sos@FreeBSD.ORG, "'freeBSD-scsi@freeBSD.org'" Subject: RE: disable write caching with softupdates? In-Reply-To: <8133266FE373D11190CD00805FA768BF055BD1C9@shrcmsg1.tdh.qntm.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > Wouldn't it be acceptable to mark the meta-data writes as non-cacheable > (i.e. write though to the media before signalling completion), and let the > remaining writes (user data writes) be cacheable? I think this would improve > the performance of the file system. Actually, performance-wise, you'd probably want to know the real geometry, given all the stuff FFS does to exploit it. > SCSI has supported this for years, in the form of the FUA bit in the CDB for > the write command. Somewhat similar behavior can be had in the newer flavors As I recall, and from what Eivind noted, this bit is routinely ignored in about 90% of all drives out there. 
> of ATA by issuing a "flush cache" command after each meta-data write, and > waiting until the flush command completes before signalling the completion > of the non-cacheable write. This has the potential for degrading performance even further. I think you would prefer to disable cache over this. Marius To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 6:11:15 2000 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.quantum.com (mx1.quantum.com [204.212.103.34]) by hub.freebsd.org (Postfix) with ESMTP id B727837B422; Thu, 21 Sep 2000 06:11:11 -0700 (PDT) Received: from milcmima.qntm.com (milcmima.qntm.com [146.174.18.61]) by mx1.quantum.com (8.9.3 (PHNE_18979)/8.9.3) with ESMTP id GAA16241; Thu, 21 Sep 2000 06:09:52 -0700 (PDT) Received: by milcmima.qntm.com with Internet Mail Service (5.5.2650.21) id ; Thu, 21 Sep 2000 06:09:49 -0700 Message-ID: <8133266FE373D11190CD00805FA768BF055BD1D3@shrcmsg1.tdh.qntm.com> From: Stephen Byan To: "'Soren Schmidt'" , Stephen Byan Cc: fs@FreeBSD.ORG, sos@FreeBSD.ORG, freeBSD-scsi@FreeBSD.ORG Subject: RE: disable write caching with softupdates? Date: Thu, 21 Sep 2000 06:09:50 -0700 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2650.21) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Soren Schmidt [mailto:sos@freebsd.dk] wrote: > > Wouldn't it be acceptable to mark the meta-data writes as > non-cacheable > > (i.e. write though to the media before signalling > completion), and let the > > remaining writes (user data writes) be cacheable? I think > this would improve > > the performance of the file system. > > > > SCSI has supported this for years, in the form of the FUA > bit in the CDB for > > the write command. 
Somewhat similar behavior can be had in > the newer flavors > > of ATA by issuing a "flush cache" command after each > meta-data write, and > > waiting until the flush command completes before signalling > the completion > > of the non-cacheable write. > > OK, I played a bit with that, the only info I can see I get from the > higher levels is the BIO_ORDERED bit, so I tried to flush the cache > each time I get one of those, _bad_ idea, 10% performance loss... That's the price of having a recoverable file system. See Seltzer, Ganger, McKusick, Smith, Soules, and Stein, "Journaling Versus Soft Updates: Asynchronous Meta-data Protection in File Systems", 2000 USENIX Annual Technical Conference, June 2000, San Diego. Contrast this 10% performance hit versus what you get when you disable caching entirely. Regards, -Steve Steve Byan Design Engineer MS 1-3/E23 333 South Street Shrewsbury, MA 01545 (508)770-3414 fax: (508)770-2604 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 6:22:47 2000 Delivered-To: freebsd-fs@freebsd.org Received: from roaming.cacheboy.net (gate.interxion.com [194.153.74.13]) by hub.freebsd.org (Postfix) with ESMTP id 86EA137B423 for ; Thu, 21 Sep 2000 06:22:44 -0700 (PDT) Received: (from adrian@localhost) by roaming.cacheboy.net (8.11.0/8.11.0) id eBMEPWx97966; Fri, 22 Dec 2000 15:25:32 +0100 (CET) (envelope-from adrian) Date: Fri, 22 Dec 2000 15:25:32 +0100 From: Adrian Chadd To: Michael Aronsen Cc: freebsd-fs@freebsd.org Subject: Re: Journaling Filesystems in bsd? 
Message-ID: <20001222152532.B97883@roaming.cacheboy.net> References: <9164771DDCABD3118333005004E9446E204774@mother.netcentralen.dk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.4i In-Reply-To: <9164771DDCABD3118333005004E9446E204774@mother.netcentralen.dk>; from mar@netcentralen.dk on Thu, Sep 21, 2000 at 10:30:19AM +0200 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Thu, Sep 21, 2000, Michael Aronsen wrote: > Hello > Just wanted to know if there are any projects to get something like reiserfs > to FreeBSD? There are a few ideas floating around, but no one has provided anything concrete. You're more than welcome to start something more concrete. :-) Adrian -- Adrian Chadd "The main reason Santa is so jolly is because he knows where all the bad girls live." -- Random IRC quote To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 6:38:42 2000 Delivered-To: freebsd-fs@freebsd.org Received: from mail-relay.eunet.no (mail-relay.eunet.no [193.71.71.242]) by hub.freebsd.org (Postfix) with ESMTP id DE10837B423; Thu, 21 Sep 2000 06:38:38 -0700 (PDT) Received: from login-1.eunet.no (login-1.eunet.no [193.75.110.2]) by mail-relay.eunet.no (8.9.3/8.9.3/GN) with ESMTP id PAA89799; Thu, 21 Sep 2000 15:38:37 +0200 (CEST) (envelope-from mbendiks@eunet.no) Received: from localhost (mbendiks@localhost) by login-1.eunet.no (8.9.3/8.8.8) with ESMTP id PAA38975; Thu, 21 Sep 2000 15:38:36 +0200 (CEST) (envelope-from mbendiks@eunet.no) X-Authentication-Warning: login-1.eunet.no: mbendiks owned process doing -bs Date: Thu, 21 Sep 2000 15:38:36 +0200 (CEST) From: Marius Bendiksen To: Stephen Byan Cc: "'Soren Schmidt'" , fs@FreeBSD.ORG, sos@FreeBSD.ORG, freeBSD-scsi@FreeBSD.ORG Subject: RE: disable write caching with softupdates?
In-Reply-To: <8133266FE373D11190CD00805FA768BF055BD1D3@shrcmsg1.tdh.qntm.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > OK, I played a bit with that, the only info I can see I get from the > > higher levels is the BIO_ORDERED bit, so I tried to flush the cache > > each time I get one of those, _bad_ idea, 10% performance loss... > That's the price of having a recoverable file system. See Seltzer, Ganger, Not necessarily. > Contrast this 10% performance hit versus what you get when you disable > caching entirely. I think you will see that on some drives, this may have a greater performance impact than not caching at all. Marius To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 6:40:41 2000 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.quantum.com (mx1.quantum.com [204.212.103.34]) by hub.freebsd.org (Postfix) with ESMTP id 1BBED37B422; Thu, 21 Sep 2000 06:40:34 -0700 (PDT) Received: from milcmima.qntm.com (milcmima.qntm.com [146.174.18.61]) by mx1.quantum.com (8.9.3 (PHNE_18979)/8.9.3) with ESMTP id GAA20537; Thu, 21 Sep 2000 06:40:27 -0700 (PDT) Received: by milcmima.qntm.com with Internet Mail Service (5.5.2650.21) id ; Thu, 21 Sep 2000 06:40:25 -0700 Message-ID: <8133266FE373D11190CD00805FA768BF055BD1D4@shrcmsg1.tdh.qntm.com> From: Stephen Byan To: "'Marius Bendiksen'" , Stephen Byan Cc: fs@FreeBSD.ORG, sos@FreeBSD.ORG, "'freeBSD-scsi@freeBSD.org'" Subject: RE: disable write caching with softupdates? Date: Thu, 21 Sep 2000 06:40:24 -0700 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2650.21) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Marius Bendiksen [mailto:mbendiks@eunet.no] wrote: > > Wouldn't it be acceptable to mark the meta-data writes as > non-cacheable > > (i.e. 
write though to the media before signalling > completion), and let the > > remaining writes (user data writes) be cacheable? I think > this would improve > > the performance of the file system. > > Actually, performance-wise, you'd probably want to know the > real geometry, > given all the stuff FFS does to exploit it. I think this is a separate issue, but: The problem with exposing the disk geometry is that FFS makes assumptions about the geometry that are false. Disks are zoned, so there aren't a constant number of sectors per track. Due to defects, the number of sectors per zone varies from sample to sample. It's possible that each surface in the drive has a different number of cylinders. In future disk generations, the geometry may get warped in unpredictable ways. Moreover, to take advantage of the geometry, the file system needs an accurate access time model. The constants in this model may vary from sample to sample of the same type of drive, and may vary due to environment conditions like temperature and power supply voltage. (Many of the access time optimization algorithms in the drives do in fact adapt to these variations.) The characteristics of the model vary widely between different designs of drives. So it's hard to envision a standard way of expressing the actual drive geometry and access time model to the file system. > > SCSI has supported this for years, in the form of the FUA > bit in the CDB for > > the write command. Somewhat similar behavior can be had in > the newer flavors > > As I recall, and from what Eivind noted, this bit is > routinely ignored in > about 90% of all drives out there. If you are referring to the SCSI FUA bit, this is absolutely untrue. All Quantum SCSI drives obey this bit. All currently-manufactured drives obey this bit. I believe 99% of the drives that claim compliance with the SCSI SBC spec do in fact obey the FUA bit on writes. 
There was a recent case where one manufacturer appears to have cheated and ignored this bit, and caught quite a bit of abuse for it, including lost business from major OEMs. If you are referring to the flush cache command for ATA drives, you may have a point. For ATA drives, earlier versions of the ATA spec did not specify a way to flush the cache. The ATA driver in Windows NT appears to have implemented one vendor's vendor-unique command to flush the cache, which is not widely supported and which has been superseded by a standard "flush cache" command in the newer versions of the ATA specification. > > of ATA by issuing a "flush cache" command after each meta-data write, and > > waiting until the flush command completes before signalling the completion > > of the non-cacheable write. > > This has the potential for degrading performance even further. I think you > would prefer to disable cache over this. I agree, flushing the write cache could be painful. However, I don't see how it could be much worse than disabling the cache, since the disk's write cache is not lazily written, and so does not usually have much dirty data. Without write caching, you pay one disk rotation for each sequential write. In workloads with a moderate to high sequential write component, this is an extreme penalty. Also, with caching enabled, the disk does a fair amount of reordering to optimize the total seek and rotational cost of the writes. You give this up when disabling the write cache.
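As a rough worked example of the one-rotation-per-write cost described above: the figures used here (a 7200 RPM spindle and 64 KiB writes) are assumptions chosen for illustration and do not come from this thread. The sketch takes the stated premise at face value, that each sequential write waits one full rotation, and ignores media transfer time and command overlap.

```python
def rotation_time_ms(rpm):
    """Time for one full platter rotation, in milliseconds."""
    return 60_000.0 / rpm


def max_sequential_write_rate(rpm, write_size_bytes):
    """Upper bound in bytes/s if every write waits one full rotation.

    Assumes one write completes per rotation; transfer time is ignored.
    """
    writes_per_second = rpm / 60.0
    return writes_per_second * write_size_bytes


if __name__ == "__main__":
    rpm = 7200                   # assumed spindle speed
    size = 64 * 1024             # assumed write size in bytes
    print(rotation_time_ms(rpm))                 # about 8.33 ms per rotation
    print(max_sequential_write_rate(rpm, size))  # 120 writes/s * 64 KiB
```

Under these assumptions the drive completes at most 120 writes per second, so 64 KiB sequential writes top out near 7.5 MiB/s regardless of how fast the media could otherwise stream.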
Regards,
-Steve

Steve Byan
Design Engineer
MS 1-3/E23
333 South Street
Shrewsbury, MA 01545
(508)770-3414
fax: (508)770-2604

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message

From owner-freebsd-fs Thu Sep 21 7:10: 7 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.quantum.com (mx1.quantum.com [204.212.103.34]) by hub.freebsd.org (Postfix) with ESMTP id B8ADE37B423; Thu, 21 Sep 2000 07:10:01 -0700 (PDT)
Received: from milcmima.qntm.com (milcmima.qntm.com [146.174.18.61]) by mx1.quantum.com (8.9.3 (PHNE_18979)/8.9.3) with ESMTP id HAA25113; Thu, 21 Sep 2000 07:08:41 -0700 (PDT)
Received: by milcmima.qntm.com with Internet Mail Service (5.5.2650.21) id ; Thu, 21 Sep 2000 07:08:38 -0700
Message-ID: <8133266FE373D11190CD00805FA768BF055BD1D6@shrcmsg1.tdh.qntm.com>
From: Stephen Byan
To: "'Marius Bendiksen'" , Stephen Byan
Cc: "'Soren Schmidt'" , fs@FreeBSD.ORG, sos@FreeBSD.ORG, freeBSD-scsi@FreeBSD.ORG
Subject: RE: disable write caching with softupdates?
Date: Thu, 21 Sep 2000 07:08:37 -0700
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Marius Bendiksen [mailto:mbendiks@eunet.no] wrote:
> > Contrast this 10% performance hit versus what you get when you disable
> > caching entirely.
>
> I think you will see that on some drives, this may have a greater
> performance impact than not caching at all.

Perhaps Søren will be kind enough to run the experiment? I'd be interested in analyzing cases in ATA drives where flushing delivers worse performance than disabling cache.
Regards, -Steve To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 7:37:16 2000 Delivered-To: freebsd-fs@freebsd.org Received: from pluto.plutotech.com (mail.plutotech.com [206.168.67.137]) by hub.freebsd.org (Postfix) with ESMTP id A827E37B424; Thu, 21 Sep 2000 07:37:09 -0700 (PDT) Received: (from gibbs@localhost) by pluto.plutotech.com (8.9.2/8.9.1) id IAA36298; Thu, 21 Sep 2000 08:35:48 -0600 (MDT) (envelope-from gibbs) From: Justin Gibbs Message-Id: <200009211435.IAA36298@pluto.plutotech.com> Subject: Re: disable write caching with softupdates? In-Reply-To: <8133266FE373D11190CD00805FA768BF055BD1D4@shrcmsg1.tdh.qntm.com> from Stephen Byan at "Sep 21, 2000 6:40:24 am" To: Stephen.Byan@quantum.com (Stephen Byan) Date: Thu, 21 Sep 2000 08:35:48 -0600 (MDT) Cc: mbendiks@eunet.no, Stephen.Byan@quantum.com, fs@FreeBSD.ORG, sos@FreeBSD.ORG, freeBSD-scsi@FreeBSD.ORG X-Mailer: ELM [version 2.4ME+ PL43 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > Without write caching, you pay one disk rotation for each sequential write. This should not be the case if you are allowed to overlap commands. The only penalty should be increased latency in seeing a write complete. Because ATA and now even some SCSI drives only support the "basic queuing" feature set (cannot specify an ordered write barrier to the device), we'll have to find some way to give ordered semantics on these devices or just abandon the use of the B_ORDERED buffer flag. Softupdates does not use it, but FFS does in a few places. Too bad... back when I added it all SCSI devices that supported tagged queuing made this easy to do and I expected to see ATA follow SCSI's lead and implement the same primitive. 
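One way to get ordered semantics on a device that only offers basic queuing is for the host to drain its outstanding commands before issuing the ordered write, and to hold back later writes until that write completes. The following is a toy host-side model of that idea, not FreeBSD driver code; the class and method names are hypothetical.

```python
from collections import deque


class DrainingQueue:
    """Toy model: emulate an ordered write on a device that may
    complete queued commands in any order (assumed behaviour)."""

    def __init__(self):
        self.pending = deque()   # (name, ordered) not yet sent to the device
        self.in_flight = set()   # names sent but not yet completed
        self.barrier = None      # name of the ordered write in flight, if any

    def submit(self, name, ordered=False):
        """Queue a write; returns the names issued to the device now."""
        self.pending.append((name, ordered))
        return self._dispatch()

    def complete(self, name):
        """Device reports completion; returns the names issued as a result."""
        self.in_flight.remove(name)
        if name == self.barrier:
            self.barrier = None
        return self._dispatch()

    def _dispatch(self):
        issued = []
        while self.pending:
            name, ordered = self.pending[0]
            if self.barrier is not None:
                break            # nothing may pass an ordered write in flight
            if ordered and self.in_flight:
                break            # drain earlier writes first
            self.pending.popleft()
            self.in_flight.add(name)
            if ordered:
                self.barrier = name
            issued.append(name)
        return issued


if __name__ == "__main__":
    q = DrainingQueue()
    print(q.submit("w1"), q.submit("w2"))
    print(q.submit("meta", ordered=True), q.submit("w3"))
    print(q.complete("w2"), q.complete("w1"), q.complete("meta"))
```

The price of this emulation is that the device queue runs empty around every ordered write, which is exactly the overlap that a device-side ordered tag would preserve.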
--
Justin

From owner-freebsd-fs Thu Sep 21 7:57:39 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from fledge.watson.org (fledge.watson.org [204.156.12.50]) by hub.freebsd.org (Postfix) with ESMTP id 9344037B422 for ; Thu, 21 Sep 2000 07:57:37 -0700 (PDT)
Received: from fledge.watson.org (robert@fledge.pr.watson.org [192.0.2.3]) by fledge.watson.org (8.9.3/8.9.3) with SMTP id KAA78308; Thu, 21 Sep 2000 10:57:34 -0400 (EDT) (envelope-from robert@fledge.watson.org)
Date: Thu, 21 Sep 2000 10:57:34 -0400 (EDT)
From: Robert Watson
X-Sender: robert@fledge.watson.org
To: Michael Aronsen
Cc: "'fs@freebsd.org'"
Subject: Re: Journaling Filesystems in bsd?
In-Reply-To: <9164771DDCABD3118333005004E9446E204774@mother.netcentralen.dk>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Thu, 21 Sep 2000, Michael Aronsen wrote:

> Just wanted to know if there are any projects to get something
> like reiserfs to FreeBSD?

At the June 2000 USENIX technical conference, a journalled implementation of the FFS file system for FreeBSD was described and used in a performance comparison with Softupdates. At the time, it was stated that this journalled implementation would be made available to the FreeBSD community, although a specific date was not set. If you're interested in the paper, it was by Margo Seltzer, Greg Ganger, Craig Soules, and Christopher Stein, and was entitled ``Journaling Versus Soft Updates: Asynchronous Meta-data Protection in File Systems.'' It did a comprehensive analysis of the performance and safety implications of selecting various forms of file system meta-update protection (sync, async, softupdates, async journalling, sync journalling).
As presumably Kirk would be the vehicle by which the journalling code would be incorporated in the base system, and he's currently on vacation, the practical answer is probably to sit tight until BSDCon when he surfaces again :-). Robert N M Watson robert@fledge.watson.org http://www.watson.org/~robert/ PGP key fingerprint: AF B5 5F FF A6 4A 79 37 ED 5F 55 E9 58 04 6A B1 TIS Labs at Network Associates, Safeport Network Services To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 8: 0:12 2000 Delivered-To: freebsd-fs@freebsd.org Received: from freebsd.dk (freebsd.dk [212.242.42.178]) by hub.freebsd.org (Postfix) with ESMTP id 6F4AB37B424; Thu, 21 Sep 2000 08:00:06 -0700 (PDT) Received: (from sos@localhost) by freebsd.dk (8.9.3/8.9.1) id RAA92422; Thu, 21 Sep 2000 17:02:58 +0200 (CEST) (envelope-from sos) From: Soren Schmidt Message-Id: <200009211502.RAA92422@freebsd.dk> Subject: Re: disable write caching with softupdates? In-Reply-To: <8133266FE373D11190CD00805FA768BF055BD1D6@shrcmsg1.tdh.qntm.com> from Stephen Byan at "Sep 21, 2000 07:08:37 am" To: Stephen.Byan@quantum.com (Stephen Byan) Date: Thu, 21 Sep 2000 17:02:57 +0200 (CEST) Cc: mbendiks@eunet.no ('Marius Bendiksen'), fs@FreeBSD.ORG, sos@FreeBSD.ORG, freeBSD-scsi@FreeBSD.ORG X-Mailer: ELM [version 2.4ME+ PL54 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org It seems Stephen Byan wrote: > Marius Bendiksen [mailto:mbendiks@eunet.no] wrote: > > > > Contrast this 10% performance hit versus what you get when > > you disable > > > caching entirely. > > > > I think you will see that on some drives, this may have a greater > > performance impact than not caching at all. > > Perhaps Søren will be kind enough to run the experiment? 
I'd be interested
> in analyzing cases in ATA drives where flushing delivers worse performance
> than disabling cache.

Well, I have been toying a bit with this. So far the results are just timings of a make -j16 buildworld on two IBM DJNA drives (i.e. no tags) with various setups:

ATA driver "as is":                            3602.63 real   0.00 user   2865.62 sys
ATA driver with flush cache on "BIO_ORDERED":  3964.18 real   0.00 user   2870.09 sys
ATA driver with write cache disabled:          4423.30 real   0.00 user   2871.87 sys

So, having the write cache there definitely is a win. I'll try this on two IBM DTLA drives with tags enabled and see what gives.

Anything else you want me to mess with now we are at it?

-Søren

From owner-freebsd-fs Thu Sep 21 8:16: 4 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.quantum.com (mx1.quantum.com [204.212.103.34]) by hub.freebsd.org (Postfix) with ESMTP id 750C437B423; Thu, 21 Sep 2000 08:16:01 -0700 (PDT)
Received: from milcmima.qntm.com (milcmima.qntm.com [146.174.18.61]) by mx1.quantum.com (8.9.3 (PHNE_18979)/8.9.3) with ESMTP id IAA06180; Thu, 21 Sep 2000 08:15:52 -0700 (PDT)
Received: by milcmima.qntm.com with Internet Mail Service (5.5.2650.21) id ; Thu, 21 Sep 2000 08:15:50 -0700
Message-ID: <8133266FE373D11190CD00805FA768BF055BD1D7@shrcmsg1.tdh.qntm.com>
From: Stephen Byan
To: "'Justin Gibbs'" , Stephen Byan
Cc: mbendiks@eunet.no, Stephen Byan , fs@FreeBSD.ORG, sos@FreeBSD.ORG, freeBSD-scsi@FreeBSD.ORG
Subject: RE: disable write caching with softupdates?
Date: Thu, 21 Sep 2000 08:15:50 -0700
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
Content-Type: text/plain; charset="iso-8859-1"
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Justin Gibbs [mailto:gibbs@plutotech.com] wrote:

> > Without write caching, you pay one disk rotation for each sequential write.
>
> This should not be the case if you are allowed to overlap commands. The
> only penalty should be increased latency in seeing a write complete.

You're correct. I was writing with respect to ATA drives, of which I believe only IBM's support write queuing, so I overlooked the case where queuing is available.

I'm not that familiar with ATA in practice; the spec for ATA queuing looked sufficiently convoluted (i.e. a kludge) that it wasn't obvious to me that it would be a performance win to implement it.

Regards,
-Steve

From owner-freebsd-fs Thu Sep 21 8:17:47 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from orvieto.eecs.harvard.edu (orvieto.eecs.harvard.edu [140.247.60.201]) by hub.freebsd.org (Postfix) with ESMTP id 49BB537B42C; Thu, 21 Sep 2000 08:17:44 -0700 (PDT)
Received: from localhost (stein@localhost) by orvieto.eecs.harvard.edu (8.9.3/8.9.3) with ESMTP id HAA11370; Thu, 21 Sep 2000 07:32:02 -0400 (EDT) (envelope-from stein@eecs.harvard.edu)
X-Authentication-Warning: orvieto.eecs.harvard.edu: stein owned process doing -bs
Date: Thu, 21 Sep 2000 07:32:02 -0400 (EDT)
From: Christopher Stein
To: Robert Watson
Cc: Michael Aronsen , "'fs@freebsd.org'"
Subject: Re: Journaling Filesystems in bsd?
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Actually, we described two implementations of journaling within UFS. CMU built one and Harvard built one.

The CMU journaling UFS uses a system file within the UFS partition for the meta-data logging.

The Harvard journaling UFS uses an auxiliary file system known as the write-ahead file system (WAFS). "Write-ahead" refers to the write-ahead logging protocol that maintains the dependencies between in-place meta-data updates and log writes.
This has the advantage of allowing for the placement of the log on a separate device - possibly a small, fast disk or even an NVRAM device - and consequently eliminating data and log head contention. The file systems were implemented in 4.0-CURRENT. Craig and I are the main implementors. We are both pretty busy with grad school stuff (courses, teaching, etc.). However, we are both keen on getting this out into the community and this is on the schedule for the near future. -Chris On Thu, 21 Sep 2000, Robert Watson wrote: > > On Thu, 21 Sep 2000, Michael Aronsen wrote: > > > Just wanted to know if there are any projects to get something > > like reiserfs to FreeBSD? > > At the June, 2000 USENIX technical conference, a journalled implementation > of the FFS file system for FreeBSD was described and used in a performance > comparison with Softupdates. At the time, it was stated that this > journalled implementation would be made available to the FreeBSD > community, although a specific date was not set. If you're interested in > the paper, it was by Margo Seltzer, Greg Ganger, Craig Soules, and > Christopher Stein, and was entitled, ``Journaling Verses Soft Updates: > Asynchronous Meta-data Protection in File Systems,'' and did a > comprehensive analysis of the performance and safety implications of > selecting various forms of file system meta-update protection (sync, > async, softupdates, async journalling, sync journalling). > > As presumably Kirk would be the vehicle by which the journalling code > would be incorporated in the base system, and he's currently on vacation, > the practical answer is probably to sit tight until BSDCon when he > surfaces again :-). 
> > Robert N M Watson > > robert@fledge.watson.org http://www.watson.org/~robert/ > PGP key fingerprint: AF B5 5F FF A6 4A 79 37 ED 5F 55 E9 58 04 6A B1 > TIS Labs at Network Associates, Safeport Network Services > > > > > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-fs" in the body of the message > To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 8:19: 4 2000 Delivered-To: freebsd-fs@freebsd.org Received: from mail.clarkson.edu (mail.clarkson.edu [128.153.4.10]) by hub.freebsd.org (Postfix) with SMTP id D743037B43C for ; Thu, 21 Sep 2000 08:19:01 -0700 (PDT) Received: (qmail 1112 invoked by uid 0); 21 Sep 2000 15:18:56 -0000 Received: from sc-1-459.sc.clarkson.edu (HELO clarkson.edu) (128.153.23.68) by mail.clarkson.edu with SMTP; 21 Sep 2000 15:18:56 -0000 Message-ID: <39CA26D7.51D314BF@clarkson.edu> Date: Thu, 21 Sep 2000 11:18:47 -0400 From: Dwight Tuinstra X-Mailer: Mozilla 4.75 [en] (X11; U; FreeBSD 4.1-RELEASE i386) X-Accept-Language: en MIME-Version: 1.0 To: freebsd-fs Subject: Re: Journaling Filesystems in bsd? (LFS, anyone?) References: <9164771DDCABD3118333005004E9446E204774@mother.netcentralen.dk> <20001222152532.B97883@roaming.cacheboy.net> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Adrian Chadd wrote: > > On Thu, Sep 21, 2000, Michael Aronsen wrote: > > Hello > > Just wanted to know if there are any projects to get something like reiserfs > > to FreeBSD? > > There's a few ideas floating around but noone has provided anything > concrete. > > You're more than welcome to start something more concrete. :-) I'm interested in porting an alternative file system to FreeBSD. As a long-term graduate research project, I've been looking into the code for LFS (Log-structured File System) on NetBSD. 
Such a system is optimized for many small writes, and given the amounts of RAM available for read caches nowadays, should deliver read performance comparable (or not much worse) than FFS. Additionally, LFS should provide better and faster crash recovery than either FFS or journaling file systems. In a nutshell, the difference between journaling file systems and LFSes is that jFSes keep a log of metadata, whereas an LFS keeps metadata AND data in a unified log structure (and yes, this means playing lots of tricks with how inodes are located and found). At present, the only working, available LFS system (that I'm aware of) for a freeNIX is the one in NetBSD, though there are at least two efforts underway for Linux. There is an order of magnitude improvement in the write times for their "pkgsrc" directory (equivalent to the FreeBSD "ports" directory) when comparing FFS to LFS. The NetBSD LFS code is still beta, but in pretty good working order. There has been controversy regarding the performance claims of LFS under various disk activity patterns. The research I'll be doing aims to investigate the claims, especially after implementing the optimizations proposed by Matthews et al (Improving the Performance of Log-Structured File Systems With Adaptive Methods; Proc. Sixteenth ACM Symposium on Operating System Principles). Dr. Matthews is my advisor on this research. Is there any interest in porting/redesigning LFS for FreeBSD? 
--Dwight Tuinstra
  tuinstra@clarkson.edu

From owner-freebsd-fs Thu Sep 21 8:23:37 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from freebsd.dk (freebsd.dk [212.242.42.178]) by hub.freebsd.org (Postfix) with ESMTP id 3DF6F37B424; Thu, 21 Sep 2000 08:23:32 -0700 (PDT)
Received: (from sos@localhost) by freebsd.dk (8.9.3/8.9.1) id RAA98589; Thu, 21 Sep 2000 17:26:17 +0200 (CEST) (envelope-from sos)
From: Soren Schmidt
Message-Id: <200009211526.RAA98589@freebsd.dk>
Subject: Re: disable write caching with softupdates?
In-Reply-To: <8133266FE373D11190CD00805FA768BF055BD1D7@shrcmsg1.tdh.qntm.com> from Stephen Byan at "Sep 21, 2000 08:15:50 am"
To: Stephen.Byan@quantum.com (Stephen Byan)
Date: Thu, 21 Sep 2000 17:26:17 +0200 (CEST)
Cc: gibbs@plutotech.com ('Justin Gibbs'), mbendiks@eunet.no, fs@FreeBSD.ORG, sos@FreeBSD.ORG, freeBSD-scsi@FreeBSD.ORG
X-Mailer: ELM [version 2.4ME+ PL54 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

It seems Stephen Byan wrote:
> Justin Gibbs [mailto:gibbs@plutotech.com] wrote:
>
> > > Without write caching, you pay one disk rotation for each sequential write.
> >
> > This should not be the case if you are allowed to overlap commands. The
> > only penalty should be increased latency in seeing a write complete.
>
> You're correct. I was writing with respect to ATA drives, of which I believe
> only IBM's support write queuing, so I overlooked the case where queuing is
> available.
>
> I'm not that familiar with ATA in practice; the spec for ATA queuing looked
> sufficiently convoluted (i.e. a kludge) that it wasn't obvious to me that it
> would be a performance win to implement it.

Well, but it is: I see a 5-10% performance gain using tagged queuing, so it's definitely worth the trouble.
I don't think it is that bad; it was easy enough to implement support for it in the ATA driver, but so far I think we are alone in actually using it :)

-Søren

From owner-freebsd-fs Thu Sep 21 8:38:17 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from orvieto.eecs.harvard.edu (orvieto.eecs.harvard.edu [140.247.60.201]) by hub.freebsd.org (Postfix) with ESMTP id 6ED2C37B422 for ; Thu, 21 Sep 2000 08:38:13 -0700 (PDT)
Received: from localhost (stein@localhost) by orvieto.eecs.harvard.edu (8.9.3/8.9.3) with ESMTP id HAA11396; Thu, 21 Sep 2000 07:52:34 -0400 (EDT) (envelope-from stein@eecs.harvard.edu)
X-Authentication-Warning: orvieto.eecs.harvard.edu: stein owned process doing -bs
Date: Thu, 21 Sep 2000 07:52:34 -0400 (EDT)
From: Christopher Stein
To: Dwight Tuinstra
Cc: freebsd-fs
Subject: Re: Journaling Filesystems in bsd? (LFS, anyone?)
In-Reply-To: <39CA26D7.51D314BF@clarkson.edu>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> I'm interested in porting an alternative file system to FreeBSD.
>
> As a long-term graduate research project, I've been looking into
> the code for LFS (Log-structured File System) on NetBSD. Such a
> system is optimized for many small writes, and given the amounts
> of RAM available for read caches nowadays, should deliver read
> performance comparable (or not much worse) than FFS. Additionally,
> LFS should provide better and faster crash recovery than either FFS
> or journaling file systems.

What is "better" crash recovery? I would expect journaling recovery to be faster than LFS recovery.
> In a nutshell, the difference between journaling file systems
> and LFSes is that jFSes keep a log of metadata, whereas an LFS
> keeps metadata AND data in a unified log structure (and yes, this
> means playing lots of tricks with how inodes are located and found).

Log-structured file systems offer different semantics than synchronous journaling file systems. Synchronous journaling can offer the traditional durability of create. Nothing is durable in LFS until the segment reaches stable storage. Dependencies are maintained, as they are with synchronous UFS, journaling, and soft updates, but without providing durability semantics. Async journaling and soft updates sacrifice this as LFS does. Interestingly, sync journaling is able to maintain it while improving performance over sync UFS across a large set of workloads (every one I've ever run).

Also, the fsync behaviour of transactional workloads and programs like Sendmail is highly problematic for LFS. These require durability and will force the segment, potentially when it is not very full.

I suggest you look at the work by Seltzer and Smith comparing UFS clustering and LFS.

>
> At present, the only working, available LFS system (that I'm aware
> of) for a freeNIX is the one in NetBSD, though there are at least two
> efforts underway for Linux. There is an order of magnitude
> improvement in the write times for their "pkgsrc" directory
> (equivalent to the FreeBSD "ports" directory) when comparing
> FFS to LFS. The NetBSD LFS code is still beta, but in pretty good
> working order.

Does the cleaner work? (always the first question to ask when people talk about their LFS implementations).

You may want to contact Dan Ellard (ellard@eecs.harvard.edu). He is currently doing some BSD LFS work.
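The cost of fsync forcing a segment early can be illustrated with a toy model. It assumes, for simplicity, that a forced segment is written out as it stands and a fresh one is started; real LFS implementations differ in how they handle partial segments, and the names below are hypothetical.

```python
def segment_fills(ops, segment_blocks):
    """Return the fill level (in blocks) of each segment written.

    ops is a sequence of block counts appended to the log, with the
    string "fsync" marking a point where the current segment must be
    forced to disk (assumed behaviour for this sketch).
    """
    fills = []
    current = 0
    for op in ops:
        if op == "fsync":
            if current:
                fills.append(current)   # forced out, possibly nearly empty
                current = 0
        else:
            current += op
            while current >= segment_blocks:
                fills.append(segment_blocks)   # a full segment goes out
                current -= segment_blocks
    return fills


def utilization(fills, segment_blocks):
    """Fraction of written segment space that holds useful blocks."""
    if not fills:
        return 0.0
    return sum(fills) / (len(fills) * segment_blocks)


if __name__ == "__main__":
    seg = 8  # blocks per segment, illustrative
    batched = segment_fills([1] * 8, seg)
    synced = segment_fills([1, "fsync"] * 8, seg)
    print(batched, utilization(batched, seg))
    print(synced, utilization(synced, seg))
```

In this model the same eight blocks cost one full segment when batched and eight mostly empty ones when each is followed by fsync, which is the behaviour the message describes as problematic.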
-Chris To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 9: 5:38 2000 Delivered-To: freebsd-fs@freebsd.org Received: from mail-relay.eunet.no (mail-relay.eunet.no [193.71.71.242]) by hub.freebsd.org (Postfix) with ESMTP id 488DF37B422; Thu, 21 Sep 2000 09:05:33 -0700 (PDT) Received: from login-1.eunet.no (login-1.eunet.no [193.75.110.2]) by mail-relay.eunet.no (8.9.3/8.9.3/GN) with ESMTP id SAA25878; Thu, 21 Sep 2000 18:05:31 +0200 (CEST) (envelope-from mbendiks@eunet.no) Received: from localhost (mbendiks@localhost) by login-1.eunet.no (8.9.3/8.8.8) with ESMTP id SAA39435; Thu, 21 Sep 2000 18:05:31 +0200 (CEST) (envelope-from mbendiks@eunet.no) X-Authentication-Warning: login-1.eunet.no: mbendiks owned process doing -bs Date: Thu, 21 Sep 2000 18:05:30 +0200 (CEST) From: Marius Bendiksen To: Stephen Byan Cc: fs@FreeBSD.ORG, sos@FreeBSD.ORG, "'freeBSD-scsi@freeBSD.org'" Subject: RE: disable write caching with softupdates? In-Reply-To: <8133266FE373D11190CD00805FA768BF055BD1D4@shrcmsg1.tdh.qntm.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > Disks are zoned, so there aren't a constant number of sectors per track. Due > to defects, the number of sectors per zone varies from sample to sample. > It's possible that each surface in the drive has a different number of > cylinders. In future disk generations, the geometry may get warped in > unpredictable ways. I agree on this. But I think people would step forth to fix these assumptions in FFS, in time, if disks started reporting real geometry. In either case, I think you would still be likely to get _some_ boost. Layout logic should either be entirely in the FS or entirely in the disk. > Moreover, to take advantage of the geometry, the file system needs an > accurate access time model. 
The constants in this model may vary from sample > to sample of the same type of drive, and may vary due to environment > conditions like temperature and power supply voltage. (Many of the access > time optimization algorithms in the drives do in fact adapt to these > variations.) The characteristics of the model vary widely between different > designs of drives. Many of these parameters can be adapted to in software, if the firmware will expose the required data. > If you are referring to the SCSI FUA bit, this is absolutely untrue. All > Quantum SCSI drives obey this bit. All currently-manufactured drives obey > this bit. I believe 99% of the drives that claim compliance with the SCSI > SBC spec do in fact obey the FUA bit on writes. There was a recent case > where one manufacturer appears to have cheated and ignored this bit, and > caught quite a bit of abuse for it. Like lost business from major OEMs. Okay. In this case, my information has been incorrect, and I apologize. > Without write caching, you pay one disk rotation for each sequential write. > In workloads with a moderate to high sequential write component, this is an > extreme penalty. Also, with caching enabled, the disk does a fair amount of > reordering to optimize the total seek and rotational cost of the writes. You > give this up when disabling the write cache. I agree that minimizing rotational cost by caching a single track is good, if the drive can guarantee the integrity of the data, possibly by providing an NVRAM buffer for the track. 
Marius To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 9:22:34 2000 Delivered-To: freebsd-fs@freebsd.org Received: from mail-relay.eunet.no (mail-relay.eunet.no [193.71.71.242]) by hub.freebsd.org (Postfix) with ESMTP id 4389437B422 for ; Thu, 21 Sep 2000 09:22:30 -0700 (PDT) Received: from login-1.eunet.no (login-1.eunet.no [193.75.110.2]) by mail-relay.eunet.no (8.9.3/8.9.3/GN) with ESMTP id SAA30063; Thu, 21 Sep 2000 18:22:28 +0200 (CEST) (envelope-from mbendiks@eunet.no) Received: from localhost (mbendiks@localhost) by login-1.eunet.no (8.9.3/8.8.8) with ESMTP id SAA39516; Thu, 21 Sep 2000 18:22:28 +0200 (CEST) (envelope-from mbendiks@eunet.no) X-Authentication-Warning: login-1.eunet.no: mbendiks owned process doing -bs Date: Thu, 21 Sep 2000 18:22:28 +0200 (CEST) From: Marius Bendiksen To: Dwight Tuinstra Cc: freebsd-fs Subject: Re: Journaling Filesystems in bsd? (LFS, anyone?) In-Reply-To: <39CA26D7.51D314BF@clarkson.edu> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > As a long-term graduate research project, I've been looking into > the code for LFS (Log-structured File System) on NetBSD. Such a > system is optimized for many small writes, and given the amounts > of RAM available for read caches nowadays, should deliver read > performance comparable (or not much worse) than FFS. Additionally, > LFS should provide better and faster crash recovery than either FFS > or journaling file systems. Research (IIRC, Seltzer and Matthews) has shown that FFS outperforms LFS when the FFS clustering code has been activated. The crash recovery times can supposedly be alleviated by soft updates, I've not looked at that yet. Journalling crash recovery vs LFS crash recovery is more complex than a mere comparison of speed, as these can be tuned in both cases. 
> In a nutshell, the difference between journaling file systems
> and LFSes is that jFSes keep a log of metadata, whereas an LFS
> keeps metadata AND data in a unified log structure (and yes, this
> means playing lots of tricks with how inodes are located and found).

Tricks that should be avoided. The idea of unifying the log is good, IMO. However, locating an inode through the reading of indirect blocks, as will be the case, is not the best approach for large file systems. Also, the problem remains that you may lose locality of reference in time, as inodes will always follow their inode block, not their data blocks. I have a solution for this problem, but am prohibited from providing it due to using it in connection with a commercial venture.

> At present, the only working, available LFS system (that I'm aware
> of) for a freeNIX is the one in NetBSD, though there are at least two
> efforts underway for Linux. There is an order of magnitude
> improvement in the write times for their "pkgsrc" directory
> (equivalent to the FreeBSD "ports" directory) when comparing
> FFS to LFS. The NetBSD LFS code is still beta, but in pretty good
> working order.

This is simply because the directory structure is written contiguously to the log, rather than being scattered. You would see the same benefit on the predecessor to FFS, which didn't do round-robin scattering of the directories. My preferred solution to this (among the simple ones) is maintaining small directories in a cluster at the centre of the disk.

> There has been controversy regarding the performance claims
> of LFS under various disk activity patterns. The research I'll be
> doing aims to investigate the claims, especially after implementing
> the optimizations proposed by Matthews et al (Improving the
> Performance of Log-Structured File Systems With Adaptive Methods;
> Proc. Sixteenth ACM Symposium on Operating System Principles).
> Dr. Matthews is my advisor on this research.
Please look into his comparisons with FFS clustering code. > Is there any interest in porting/redesigning LFS for FreeBSD? I am working on a bsdl effort of my own which does not employ logging, and thus will not be able to contribute in the near future. But I would certainly like to be kept up to date, as I am certainly interested. Marius To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 9:24:45 2000 Delivered-To: freebsd-fs@freebsd.org Received: from andrew.cmu.edu (ANDREW.CMU.EDU [128.2.10.101]) by hub.freebsd.org (Postfix) with ESMTP id BC9FB37B42C for ; Thu, 21 Sep 2000 09:24:42 -0700 (PDT) Received: (from postman@localhost) by andrew.cmu.edu (8.9.3/8.9.3) id MAA19534 for freebsd-fs@freebsd.org; Thu, 21 Sep 2000 12:24:36 -0400 (EDT) Received: via switchmail; Thu, 21 Sep 2000 12:24:35 -0400 (EDT) Received: from unix10.andrew.cmu.edu via qmail ID ; Thu, 21 Sep 2000 12:24:22 -0400 (EDT) Received: from unix10.andrew.cmu.edu via qmail ID ; Thu, 21 Sep 2000 12:24:20 -0400 (EDT) Received: from mms.4.60.May..8.2000.10.35.10.sun4.57.EzMail.2.0.CUILIB.3.45.SNAP.NOT.LINKED.unix10.andrew.cmu.edu.sun4x.57 via MS.5.6.unix10.andrew.cmu.edu.sun4_57; Thu, 21 Sep 2000 12:24:20 -0400 (EDT) Message-ID: Date: Thu, 21 Sep 2000 12:24:20 -0400 (EDT) From: Craig A Soules To: freebsd-fs@freebsd.org Subject: Re: Journaling Filesystems in bsd? (LFS, anyone?) Cc: freebsd-fs In-Reply-To: References: Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Excerpts from internet.computing.freebsd.fs: 21-Sep-100 Re: Journaling > Log-structured file systems offer different semantics than > synchronous journaling file systems. Synchronous journaling can > offer the traditional durability of create. Nothing is durable Wouldn't it be possible to offer the same semantics as FFS in an LFS implementation if the segment was (over)written after each operation? 
I'm not sure what sort of performance penalty you would incur for this behavior, although it would probably depend heavily upon the segment size. With a 64 KB segment size, though, the penalty would probably be small, since its overhead isn't much higher than the 8 KB block write seen by a synchronous journaling system. In fact, this might be more efficient than journaling in the long run, since you wouldn't have dirty data building up in the cache.

Of course, there is that read thing...

> Does the cleaner work? (always the first question to ask when people
> talk about their LFS implementations).

This may be wrong, so don't flame me, but... I had heard that there was no fully working implementation of LFS in the *BSDs for this reason (and in fact there was an LFS branch in FreeBSD as well, but it was removed since the cleaner was in such shoddy shape).

craig

From owner-freebsd-fs Thu Sep 21 9:42:42 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from implode.root.com (root.com [209.102.106.178]) by hub.freebsd.org (Postfix) with ESMTP id 61D2937B424 for ; Thu, 21 Sep 2000 09:42:40 -0700 (PDT)
Received: from implode.root.com (localhost [127.0.0.1]) by implode.root.com (8.8.8/8.8.5) with ESMTP id JAA09518; Thu, 21 Sep 2000 09:38:12 -0700 (PDT)
Message-Id: <200009211638.JAA09518@implode.root.com>
To: Dwight Tuinstra
Cc: freebsd-fs
Subject: Re: Journaling Filesystems in bsd? (LFS, anyone?)
In-reply-to: Your message of "Thu, 21 Sep 2000 11:18:47 EDT." <39CA26D7.51D314BF@clarkson.edu>
From: David Greenman
Reply-To: dg@root.com
Date: Thu, 21 Sep 2000 09:38:12 -0700
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

>There has been controversy regarding the performance claims
>of LFS under various disk activity patterns.
The research I'll be >doing aims to investigate the claims, especially after implementing >the optimizations proposed by Matthews et al (Improving the >Performance of Log-Structured File Systems With Adaptive Methods; >Proc. Sixteenth ACM Symposium on Operating System Principles). >Dr. Matthews is my advisor on this research. > >Is there any interest in porting/redesigning LFS for FreeBSD? Have you done any comparisons with FFS+softupdates? The goal of softupdates was to be as fast or faster than LFS for everything, not require a cleanerd, and along with "snapshots" eliminate requiring fsck before system startup. -DG David Greenman Co-founder, The FreeBSD Project - http://www.freebsd.org President, TeraSolutions, Inc. - http://www.terasolutions.com Pave the road of life with opportunities. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 10: 3: 2 2000 Delivered-To: freebsd-fs@freebsd.org Received: from whistle.com (s205m131.whistle.com [207.76.205.131]) by hub.freebsd.org (Postfix) with ESMTP id D218137B422; Thu, 21 Sep 2000 10:02:59 -0700 (PDT) Received: (from smap@localhost) by whistle.com (8.10.0/8.10.0) id e8LH2oY11321; Thu, 21 Sep 2000 10:02:50 -0700 (PDT) Received: from bubba.whistle.com( 207.76.205.7) by whistle.com via smap (V2.0) id xma011309; Thu, 21 Sep 2000 10:02:25 -0700 Received: (from archie@localhost) by bubba.whistle.com (8.9.3/8.9.3) id KAA13662; Thu, 21 Sep 2000 10:02:16 -0700 (PDT) (envelope-from archie) From: Archie Cobbs Message-Id: <200009211702.KAA13662@bubba.whistle.com> Subject: Re: disable write caching with softupdates? 
In-Reply-To: "from Marius Bendiksen at Sep 21, 2000 01:43:36 pm" To: Marius Bendiksen Date: Thu, 21 Sep 2000 10:02:16 -0700 (PDT) Cc: sos@freebsd.org, fs@freebsd.org, terry@lambert.org X-Mailer: ELM [version 2.4ME+ PL82 (25)] MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Marius Bendiksen writes: > > > The ATA drives Whistle is using, which is what I'm assuming > > > Archie is on about, do _not_ support this facility. As far > > > as I can tell, there wre some SCSI drives manufactured by > > > IBM at one time which could do this, and some lab drives at > > > Quantum (also SCSI). > > Hmm, well, lets disable this then, there is no need to complicate > > things :) > > Please make this conditional, as people with non-crippled hardware might > want to employ the write cache. A sysctl or build option would be best. Yep, that was what I was originally suggesting. Soren do you want me to try to come up with a patch? I don't claim to understand IDE technology.. can you just send the disable command at any time or is it more complicated than that? Thanks, -Archie ___________________________________________________________________________ Archie Cobbs * Whistle Communications, Inc. 
* http://www.whistle.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 10:10:21 2000 Delivered-To: freebsd-fs@freebsd.org Received: from roaming.cacheboy.net (roaming.cacheboy.net [203.56.168.69]) by hub.freebsd.org (Postfix) with ESMTP id 8179F37B422; Thu, 21 Sep 2000 10:10:16 -0700 (PDT) Received: (from adrian@localhost) by roaming.cacheboy.net (8.11.0/8.11.0) id eBMIDHi07548; Fri, 22 Dec 2000 19:13:17 +0100 (CET) (envelope-from adrian) Date: Fri, 22 Dec 2000 19:13:17 +0100 From: Adrian Chadd To: freebsd-fs@freebsd.org Cc: freebsd-current@freebsd.org Subject: Fsck wrappers, revisited Message-ID: <20001222191317.A7529@roaming.cacheboy.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.4i Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org I've updated my fsck wrappers patchset to the latest netbsd and freebsd fsck patches. I'd appreciate some feedback on them before I run off and commit them (with my mentor, of course.) For those who aren't in the know, the general idea is that a single wrapper program spawns a FS-specific fsck process a la mount and mount_*, making multiple-FS support a lot easier. (Think about having fsck_ext2fs, fsck_msdos and fsck_ffs doing your FSes on bootup..) They can be found at http://www.freebsd.org/~adrian/fsck/ . PLEASE read the README before you use them, as there are a few gotchas. Thanks! Adrian -- Adrian Chadd "The main reason Santa is so jolly is because he knows where all the bad girls live." 
-- Random IRC quote To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 11: 3:53 2000 Delivered-To: freebsd-fs@freebsd.org Received: from freebsd.dk (freebsd.dk [212.242.42.178]) by hub.freebsd.org (Postfix) with ESMTP id 1FC5B37B43E; Thu, 21 Sep 2000 11:03:49 -0700 (PDT) Received: (from sos@localhost) by freebsd.dk (8.9.3/8.9.1) id UAA37597; Thu, 21 Sep 2000 20:07:56 +0200 (CEST) (envelope-from sos) From: Soren Schmidt Message-Id: <200009211807.UAA37597@freebsd.dk> Subject: Re: disable write caching with softupdates? In-Reply-To: <200009211702.KAA13662@bubba.whistle.com> from Archie Cobbs at "Sep 21, 2000 10:02:16 am" To: archie@whistle.com (Archie Cobbs) Date: Thu, 21 Sep 2000 20:07:55 +0200 (CEST) Cc: mbendiks@eunet.no (Marius Bendiksen), sos@freebsd.org, fs@freebsd.org, terry@lambert.org X-Mailer: ELM [version 2.4ME+ PL54 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org It seems Archie Cobbs wrote: > Marius Bendiksen writes: > > > > The ATA drives Whistle is using, which is what I'm assuming > > > > Archie is on about, do _not_ support this facility. As far > > > > as I can tell, there wre some SCSI drives manufactured by > > > > IBM at one time which could do this, and some lab drives at > > > > Quantum (also SCSI). > > > Hmm, well, lets disable this then, there is no need to complicate > > > things :) > > > > Please make this conditional, as people with non-crippled hardware might > > want to employ the write cache. A sysctl or build option would be best. > > Yep, that was what I was originally suggesting. > > Soren do you want me to try to come up with a patch? > I don't claim to understand IDE technology.. can you > just send the disable command at any time or is it > more complicated than that? 
This will do it, or rather leave it as the disk default which
should be disabled...

cvs diff: Diffing .
Index: ata-disk.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/ata/ata-disk.c,v
retrieving revision 1.77
diff -u -r1.77 ata-disk.c
--- ata-disk.c  2000/09/20 07:00:24     1.77
+++ ata-disk.c  2000/09/21 18:00:15
@@ -131,11 +131,11 @@
     if (ata_command(adp->controller, adp->unit, ATA_C_SETFEATURES,
                     0, 0, 0, 0, ATA_C_F_ENAB_RCACHE, ATA_WAIT_INTR))
         printf("ad%d: enabling readahead cache failed\n", adp->lun);
-
+#if NOTYET
     if (ata_command(adp->controller, adp->unit, ATA_C_SETFEATURES,
                     0, 0, 0, 0, ATA_C_F_ENAB_WCACHE, ATA_WAIT_INTR))
         printf("ad%d: enabling write cache failed\n", adp->lun);
-
+#endif
     /* use DMA if drive & controller supports it */
     ata_dmainit(adp->controller, adp->unit,
                 ata_pmode(AD_PARAM), ata_wmode(AD_PARAM), ata_umode(AD_PARAM));

-Søren

From owner-freebsd-fs Thu Sep 21 11: 8:57 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from whistle.com (s205m131.whistle.com [207.76.205.131]) by hub.freebsd.org (Postfix) with ESMTP id 4601A37B422 for ; Thu, 21 Sep 2000 11:08:55 -0700 (PDT)
Received: (from smap@localhost) by whistle.com (8.10.0/8.10.0) id e8LI7PI12398; Thu, 21 Sep 2000 11:07:25 -0700 (PDT)
Received: from bubba.whistle.com( 207.76.205.7) by whistle.com via smap (V2.0) id xma012395; Thu, 21 Sep 2000 11:07:10 -0700
Received: (from archie@localhost) by bubba.whistle.com (8.9.3/8.9.3) id LAA17698; Thu, 21 Sep 2000 11:07:09 -0700 (PDT) (envelope-from archie)
From: Archie Cobbs
Message-Id: <200009211807.LAA17698@bubba.whistle.com>
Subject: Re: disable write caching with softupdates?
In-Reply-To: <200009211807.UAA37597@freebsd.dk> "from Soren Schmidt at Sep 21, 2000 08:07:55 pm" To: Soren Schmidt Date: Thu, 21 Sep 2000 11:07:09 -0700 (PDT) Cc: mbendiks@eunet.no, terry@lambert.org, fs@freebsd.org X-Mailer: ELM [version 2.4ME+ PL82 (25)] MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Soren Schmidt writes: > > > > > The ATA drives Whistle is using, which is what I'm assuming > > > > > Archie is on about, do _not_ support this facility. As far > > > > > as I can tell, there wre some SCSI drives manufactured by > > > > > IBM at one time which could do this, and some lab drives at > > > > > Quantum (also SCSI). > > > > Hmm, well, lets disable this then, there is no need to complicate > > > > things :) > > > > > > Please make this conditional, as people with non-crippled hardware might > > > want to employ the write cache. A sysctl or build option would be best. > > > > Yep, that was what I was originally suggesting. > > > > Soren do you want me to try to come up with a patch? > > I don't claim to understand IDE technology.. can you > > just send the disable command at any time or is it > > more complicated than that? > > This will do it, or rather leave it as the disk default which > should be disabled... Thanks.. I was talking about a sysctl patch, where you could turn write caching on or off at any time while the system is running via sysctl. Is it even possible to do that? -Archie ___________________________________________________________________________ Archie Cobbs * Whistle Communications, Inc. 
* http://www.whistle.com

From owner-freebsd-fs Thu Sep 21 11:16: 7 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from freebsd.dk (freebsd.dk [212.242.42.178]) by hub.freebsd.org (Postfix) with ESMTP id 36A4437B42C for ; Thu, 21 Sep 2000 11:16:03 -0700 (PDT)
Received: (from sos@localhost) by freebsd.dk (8.9.3/8.9.1) id UAA40830; Thu, 21 Sep 2000 20:20:07 +0200 (CEST) (envelope-from sos)
From: Soren Schmidt
Message-Id: <200009211820.UAA40830@freebsd.dk>
Subject: Re: disable write caching with softupdates?
In-Reply-To: <200009211807.LAA17698@bubba.whistle.com> from Archie Cobbs at "Sep 21, 2000 11:07:09 am"
To: archie@whistle.com (Archie Cobbs)
Date: Thu, 21 Sep 2000 20:20:07 +0200 (CEST)
Cc: mbendiks@eunet.no, terry@lambert.org, fs@freebsd.org
X-Mailer: ELM [version 2.4ME+ PL54 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

It seems Archie Cobbs wrote:
> > > Soren do you want me to try to come up with a patch?
> > > I don't claim to understand IDE technology.. can you
> > > just send the disable command at any time or is it
> > > more complicated than that?
> >
> > This will do it, or rather leave it as the disk default which
> > should be disabled...
>
> Thanks.. I was talking about a sysctl patch, where you could
> turn write caching on or off at any time while the system is
> running via sysctl. Is it even possible to do that?

Why on earth would you want this as a sysctl knob?
Either you want to play it safe, or you don't care; this is not
something you change your mind about now and then....

I'm planning to make it a compile time option, the question
is what the default should be...
The _right_ solution of course is to have the FS code pass down
a flag that says only write this when all preceding bufs are
on the media, now _that_ would be nice and work for both ATA
and SCSI devices....

-Søren

From owner-freebsd-fs Thu Sep 21 11:32:48 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from ns.yogotech.com (ns.yogotech.com [206.127.123.66]) by hub.freebsd.org (Postfix) with ESMTP id 1500C37B440 for ; Thu, 21 Sep 2000 11:32:46 -0700 (PDT)
Received: from nomad.yogotech.com (nomad.yogotech.com [206.127.123.131]) by ns.yogotech.com (8.9.3/8.9.3) with ESMTP id MAA23414; Thu, 21 Sep 2000 12:32:25 -0600 (MDT) (envelope-from nate@nomad.yogotech.com)
Received: (from nate@localhost) by nomad.yogotech.com (8.8.8/8.8.8) id MAA01789; Thu, 21 Sep 2000 12:32:25 -0600 (MDT) (envelope-from nate)
Date: Thu, 21 Sep 2000 12:32:25 -0600 (MDT)
Message-Id: <200009211832.MAA01789@nomad.yogotech.com>
From: Nate Williams
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
To: Soren Schmidt
Cc: archie@whistle.com (Archie Cobbs), mbendiks@eunet.no, terry@lambert.org, fs@FreeBSD.ORG
Subject: Re: disable write caching with softupdates?
In-Reply-To: <200009211820.UAA40830@freebsd.dk>
References: <200009211807.LAA17698@bubba.whistle.com> <200009211820.UAA40830@freebsd.dk>
X-Mailer: VM 6.34 under 19.16 "Lille" XEmacs Lucid
Reply-To: nate@yogotech.com (Nate Williams)
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> > > > Soren do you want me to try to come up with a patch?
> > > > I don't claim to understand IDE technology.. can you
> > > > just send the disable command at any time or is it
> > > > more complicated than that?
> > >
> > > This will do it, or rather leave it as the disk default which
> > > should be disabled...
> >
> > Thanks..
I was talking about a sysctl patch, where you could > > turn write caching on or off at any time while the system is > > running via sysctl. Is it even possible to do that? > > Why on earth would you want this as a sysctl knob ? So a user who wants to play it safe and/or play it non-safe doesn't have to recompile the kernel from scratch just to change the behavior. Sysctl's allow us to support more folks w/out recompiling their kernels. Nate To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 12: 2:31 2000 Delivered-To: freebsd-fs@freebsd.org Received: from whistle.com (s205m131.whistle.com [207.76.205.131]) by hub.freebsd.org (Postfix) with ESMTP id 1483537B43C for ; Thu, 21 Sep 2000 12:02:29 -0700 (PDT) Received: (from smap@localhost) by whistle.com (8.10.0/8.10.0) id e8LJ10f13054; Thu, 21 Sep 2000 12:01:00 -0700 (PDT) Received: from bubba.whistle.com( 207.76.205.7) by whistle.com via smap (V2.0) id xma013052; Thu, 21 Sep 2000 12:00:32 -0700 Received: (from archie@localhost) by bubba.whistle.com (8.9.3/8.9.3) id MAA18117; Thu, 21 Sep 2000 12:00:32 -0700 (PDT) (envelope-from archie) From: Archie Cobbs Message-Id: <200009211900.MAA18117@bubba.whistle.com> Subject: Re: disable write caching with softupdates? In-Reply-To: <200009211820.UAA40830@freebsd.dk> "from Soren Schmidt at Sep 21, 2000 08:20:07 pm" To: Soren Schmidt Date: Thu, 21 Sep 2000 12:00:32 -0700 (PDT) Cc: mbendiks@eunet.no, terry@lambert.org, fs@freebsd.org X-Mailer: ELM [version 2.4ME+ PL82 (25)] MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Soren Schmidt writes: > > Thanks.. I was talking about a sysctl patch, where you could > > turn write caching on or off at any time while the system is > > running via sysctl. Is it even possible to do that? 
> > Why on earth would you want this as a sysctl knob ? > Either you want to play it safe, or you dont care, this is not > something you change your mind about now and then.... OK, here's an example. You have an /etc/rc script that checks at startup what kind of disk you have (eg "dmesg | grep -q QUANTUM") and for certain disk types that are known to lie about their write guarantees enables or disables write caching as appropriate. FYI I can confirm what Terry said that few disks we've encountered guarantee atomic sector writes in the face of power failure (i.e., leading to a corrupted/lost sector), much less flushing the entire write cache, so IMHO the answer to.. > I'm planning to make it a compile time option, the question > is what the default should be... ..is clearly "disabled" -- and then "enable at your own risk if you know that your disk doesn't lie (or don't care)". > The _right_ solution of cause is to have the FS code pass down > a flag that says only write this when all preceeding bufs are > on the media, now _that_ would be nice and work for both ATA > and SCSI devices.... Agreed, this is the best solution. I think it is necessary in order to make the promise of softupdates really rigorous. What we do on the InterJet right now is detect a power failure and, if detected, freeze the system immediately. This guarantees that the disk won't write a corrupted sector (because our power supply has 80ms or so of residual power -- enough for the disk to finish writing). Disabling write caching guarantees that the sectors believed to be written by soft-updates actually are when the system freezes. BOTH guarantees are necessary in order to guarantee a valid disk. -Archie ___________________________________________________________________________ Archie Cobbs * Whistle Communications, Inc. 
* http://www.whistle.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 12:35: 3 2000 Delivered-To: freebsd-fs@freebsd.org Received: from orvieto.eecs.harvard.edu (orvieto.eecs.harvard.edu [140.247.60.201]) by hub.freebsd.org (Postfix) with ESMTP id 449EB37B424 for ; Thu, 21 Sep 2000 12:35:00 -0700 (PDT) Received: from localhost (stein@localhost) by orvieto.eecs.harvard.edu (8.9.3/8.9.3) with ESMTP id LAA11600; Thu, 21 Sep 2000 11:49:24 -0400 (EDT) (envelope-from stein@eecs.harvard.edu) X-Authentication-Warning: orvieto.eecs.harvard.edu: stein owned process doing -bs Date: Thu, 21 Sep 2000 11:49:23 -0400 (EDT) From: Christopher Stein To: Craig A Soules Cc: freebsd-fs Subject: Re: Journaling Filesystems in bsd? (LFS, anyone?) In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Thu, 21 Sep 2000, Craig A Soules wrote: > Excerpts from internet.computing.freebsd.fs: 21-Sep-100 Re: Journaling > > Log-structured file systems offer different semantics than > > synchronous journaling file systems. Synchronous journaling can > > offer the traditional durability of create. Nothing is durable > > Wouldn't it be possible to offer the same semantics as FFS in an LFS > implementation if the segment was (over)written after each operation? Partial segment writes? A partial segment write solution as was done in de Jonge & Kaashoek's logical disk. This would solve the internal fragmentation problem and make the cleaner's life easier, while allowing the system to provide traditional UFS create semantics. However, forgetting about the cleaner for a moment, I think performance would be just about the same as the application doing an explicit fsync() to force the full segment. As you said, write times are not dominated by bandwidth so 64KB and 8KB disk writes are probably pretty close. 
If we write just a portion of the segment the cost will be similar to writing the full segment. So fsyncing full (with lots of internal free space) segments and partial segment writes will be basically the same -- with the important difference being on-disk internal fragmentation. Now bringing the cleaner back into the picture (as it always should be).. the higher level of on-disk fragmentation would drop into run-time performance. The cleaner would be more busy copying and packing segments - generally consuming resources and getting in the way. So I agree that partial segment writes make sense. For the reason that it can offer durability without internal fragmentation - making the cleaner's life easier. -Chris To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 12:35:56 2000 Delivered-To: freebsd-fs@freebsd.org Received: from mail.m.iinet.net.au (opera3.iinet.net.au [203.59.24.51]) by hub.freebsd.org (Postfix) with SMTP id DB55437B43E for ; Thu, 21 Sep 2000 12:35:51 -0700 (PDT) Received: (qmail 10609 invoked by uid 666); 21 Sep 2000 19:35:46 -0000 Received: from unknown (HELO jules.elischer.org) (203.59.169.108) by mail.m.iinet.net.au with SMTP; 21 Sep 2000 19:35:46 -0000 Message-ID: <39CA6304.2781E494@elischer.org> Date: Thu, 21 Sep 2000 12:35:32 -0700 From: Julian Elischer X-Mailer: Mozilla 3.04Gold (X11; I; FreeBSD 5.0-CURRENT i386) MIME-Version: 1.0 To: Archie Cobbs Cc: Soren Schmidt , mbendiks@eunet.no, terry@lambert.org, fs@freebsd.org Subject: Re: disable write caching with softupdates? References: <200009211900.MAA18117@bubba.whistle.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Archie Cobbs wrote: > > What we do on the InterJet right now is detect a power failure and, > if detected, freeze the system immediately. 
This guarantees that
> the disk won't write a corrupted sector (because our power supply
> has 80ms or so of residual power -- enough for the disk to finish
> writing). Disabling write caching guarantees that the sectors
> believed to be written by soft-updates actually are when the system
> freezes. BOTH guarantees are necessary in order to guarantee a
> valid disk.

For those who claim that (actually 60) ms is too long for a single
write: I calculated that I needed time to allow for a seek, a write of
the first few sectors of a write, another seek to an alternate sector
if there is a bad block in the set, a further write, followed by a
return seek to the original sequence of blocks. Including rotational
latencies I decided that 60 ms would cover us for "enough" cases.

If you have 2 separate bad blocks in a single logical write you are
probably OK but it is getting tight, especially in Japan where the
(sometimes 90V) mains voltage means that the power supply is REALLY
going to give you 60 ms. In the USA you get 80 ms and in AUS (250V)
you get about 200 ms :-)

Incidentally, if the mains returns before the system dies (within
about 80 ms) we continue on and sync the disks....

Archie's comment about MOST drives not supporting safe writing of
accepted work is an understatement. When I was testing and selecting
drives for the InterJet, I found NO drives that would guarantee that
data accepted for writing would be written in the case of power
failure.
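The hold-up budget described above can be checked with a few lines of arithmetic. What follows is only an illustrative sketch: the structure, the function names and every latency figure are assumptions invented for the example, not measurements of any drive discussed in this thread. It charges one positioning (seek plus rotational latency) and one transfer for the initial write, and two more of each for every remapped bad block (out to the alternate sector and back again).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-drive timings in microseconds; values are illustrative. */
struct drive_timing {
    unsigned seek_us;        /* one seek */
    unsigned rot_latency_us; /* rotational latency paid after each seek */
    unsigned xfer_us;        /* transfer of one contiguous run of sectors */
};

/*
 * Estimated time for one logical write that runs into `remaps` remapped
 * bad blocks. The write is split into 1 + 2 * remaps runs: the initial
 * run, plus for each remap one run at the alternate sector and one run
 * after seeking back to the original sequence of blocks.
 */
unsigned write_time_us(const struct drive_timing *t, unsigned remaps)
{
    unsigned runs;

    assert(t != NULL);
    runs = 1 + 2 * remaps;
    return runs * (t->seek_us + t->rot_latency_us + t->xfer_us);
}

/* True if the write completes within the power supply's hold-up time. */
bool fits_holdup(const struct drive_timing *t, unsigned remaps,
                 unsigned holdup_us)
{
    return write_time_us(t, remaps) <= holdup_us;
}
```

With assumed timings of 8 ms per seek, 4 ms rotational latency and 0.5 ms per transfer, one remap costs 37.5 ms and fits a 60 ms window, while two remaps cost 62.5 ms and only fit a window nearer 80 ms. Real drives will differ; the point is only how quickly each extra remap eats into the budget.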
> -- __--_|\ Julian Elischer / \ julian@elischer.org ( OZ ) World tour 2000 ---> X_.---._/ presently in: Perth v To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 12:58:59 2000 Delivered-To: freebsd-fs@freebsd.org Received: from fw.wintelcom.net (ns1.wintelcom.net [209.1.153.20]) by hub.freebsd.org (Postfix) with ESMTP id 8A3BA37B43C for ; Thu, 21 Sep 2000 12:58:52 -0700 (PDT) Received: (from bright@localhost) by fw.wintelcom.net (8.10.0/8.10.0) id e8LJwJB21017; Thu, 21 Sep 2000 12:58:19 -0700 (PDT) Date: Thu, 21 Sep 2000 12:58:19 -0700 From: Alfred Perlstein To: Christopher Stein Cc: Craig A Soules , freebsd-fs Subject: Re: Journaling Filesystems in bsd? (LFS, anyone?) Message-ID: <20000921125819.X9141@fw.wintelcom.net> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.4i In-Reply-To: ; from stein@eecs.harvard.edu on Thu, Sep 21, 2000 at 11:49:23AM -0400 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org * Christopher Stein [000921 12:35] wrote: > > On Thu, 21 Sep 2000, Craig A Soules wrote: > > > Excerpts from internet.computing.freebsd.fs: 21-Sep-100 Re: Journaling > > > Log-structured file systems offer different semantics than > > > synchronous journaling file systems. Synchronous journaling can > > > offer the traditional durability of create. Nothing is durable > > > > Wouldn't it be possible to offer the same semantics as FFS in an LFS > > implementation if the segment was (over)written after each operation? > > Partial segment writes? > A partial segment write solution as was done in de Jonge & Kaashoek's > logical disk. This would solve the internal fragmentation problem and > make the cleaner's life easier, while allowing the system to > provide traditional UFS create semantics. 
> > However, forgetting about the cleaner for a moment, I think performance > would be just about the same as the application doing an explicit > fsync() to force the full segment. As you said, write times are not > dominated by bandwidth so 64KB and 8KB disk writes are probably pretty > close. If we write just a portion of the segment the cost will be similar > to writing the full segment. So fsyncing full (with lots of internal free > space) segments and partial segment writes will be basically the same -- > with the important difference being on-disk internal fragmentation. > > Now bringing the cleaner back into the picture (as it always should > be).. the higher level of on-disk fragmentation would drop into run-time > performance. The cleaner would be more busy copying and packing segments - > generally consuming resources and getting in the way. > So I agree that partial segment writes make sense. For the reason that > it can offer durability without internal fragmentation - making the > cleaner's life easier. One trick that can be done is to detect high fsync traffic and rewrite the blocks several times. most simplistic case: application creates a file and writes to the first block and then fsync() the log is then sync'd to backing store application appends another block and fsyncs again the log is then sync'd to backing store Right there is a major fragmentation problem. Now consider what you can do for this case: application creates a file and writes to the first block and then fsync() the log is then sync'd to backing store application appends another block and fsyncs again the log is then sync'd to backing store along with the first block. Although the log grows much faster, the contents of the previous segment most likely can be discarded rather than needing compaction plus you avoid fragmentation at the expense of additional (but non-seek requiring) data transfer. 
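The trade-off between the two cases above can be put in numbers with a toy model. This is a sketch under loud assumptions: one file, one block appended per fsync, one partial segment written per fsync, and made-up names throughout. It is not how any real LFS accounts for segments; it only counts how much gets written and how many partial segments the cleaner would still have to care about.

```c
#include <assert.h>

/* Outcome of a run of append-then-fsync cycles on one file (toy model). */
struct log_stats {
    unsigned blocks_written; /* total blocks appended to the log */
    unsigned live_partials;  /* partial segments still holding live data */
    unsigned dead_partials;  /* partial segments holding only stale data */
};

/* First case: each fsync logs only the newly appended block. */
struct log_stats fsync_append_only(unsigned n_fsyncs)
{
    struct log_stats s = { 0, 0, 0 };
    unsigned i;

    for (i = 0; i < n_fsyncs; i++) {
        s.blocks_written += 1; /* just the new block */
        s.live_partials += 1;  /* which stays live where it landed */
    }
    /* every fsync produced exactly one partial segment */
    assert(s.live_partials + s.dead_partials == n_fsyncs);
    return s;
}

/* Second case: each fsync re-logs all of the file's blocks so far. */
struct log_stats fsync_rewrite(unsigned n_fsyncs)
{
    struct log_stats s = { 0, 0, 0 };
    unsigned i;

    for (i = 0; i < n_fsyncs; i++) {
        s.blocks_written += i + 1;          /* old blocks plus the new one */
        s.dead_partials += s.live_partials; /* previous copy is now stale */
        s.live_partials = 1;                /* only the newest copy is live */
    }
    assert(s.live_partials + s.dead_partials == n_fsyncs);
    return s;
}
```

In this model four fsyncs cost 4 block writes and leave 4 live partial segments in the first case, against 10 block writes and a single live partial segment (plus 3 that can simply be discarded) in the second. The quadratic growth in writes is why this would presumably only be worth doing for short bursts of fsync traffic.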
Another option is to just rewrite the entire previously written partial segment and informing the cleaner that the previous is junk. This is sort of like adapting ffs_doreallocblks (sp?) to LFS and would most likely be a gain, especially if it only happens when the previous data is still cached (ffs_doreallocblks can make an IO happen if I recall what Kirk explained to me correctly). -- -Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org] "I have the heart of a child; I keep it in a jar on my desk." To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 13:35:35 2000 Delivered-To: freebsd-fs@freebsd.org Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id EE5A537B446; Thu, 21 Sep 2000 13:35:28 -0700 (PDT) Received: (from daemon@localhost) by smtp05.primenet.com (8.9.3/8.9.3) id NAA18138; Thu, 21 Sep 2000 13:35:42 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp05.primenet.com, id smtpdAAAh.aaJG; Thu Sep 21 13:33:11 2000 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id NAA15839; Thu, 21 Sep 2000 13:32:42 -0700 (MST) From: Terry Lambert Message-Id: <200009212032.NAA15839@usr08.primenet.com> Subject: Re: disable write caching with softupdates? 
To: mbendiks@eunet.no (Marius Bendiksen) Date: Thu, 21 Sep 2000 20:32:41 +0000 (GMT) Cc: Stephen.Byan@quantum.com (Stephen Byan), sos@freebsd.dk ('Soren Schmidt'), fs@FreeBSD.ORG, sos@FreeBSD.ORG, freeBSD-scsi@FreeBSD.ORG In-Reply-To: from "Marius Bendiksen" at Sep 21, 2000 03:38:36 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > > OK, I played a bit with that, the only info I can see I get from the > > > higher levels is the BIO_ORDERED bit, so I tried to flush the cache > > > each time I get one of those, _bad_ idea, 10% performance loss... > > > That's the price of having a recoverable file system. See Seltzer, Ganger, > > Not necessarily. > > > Contrast this 10% performance hit versus what you get when you disable > > caching entirely. > > I think you will see that on some drives, this may have a greater > performance impact than not caching at all. There will always be a performance impact, since this will, of necessity, stall the write pipeline for the synchronization, unless there are a lot of graphically unrelated I/O's pending. At least in this way, soft updates is better than delayed ordered writes (DOW -- patented by USL, and used without permission in ReiserFS), in that DOW will stall all I/O when hitting a synchronization point, whereas SU will only stall dependent I/O. That said, the question is whether the drive will flush the cache and mark it invalid, or will merely flush the cache to disk, and leave the cache contents intact. If it does the former, then there could be additional overhead for subsequent reads. 
Really, the OS needs to know the cache strategy of the drive, and
follow the same strategy itself, to reduce the number of drive to
OS transactions, and to remove the problem of the drive having to
go back to the well for a subsequent read, if the cache contents
are effectively discarded.

Frankly, I find it hard to believe that a cache flush would result
in other than a mere write, e.g. that any drive would be so dumb as
to discard. But there might be other consequences, since the cache
on the drive may all get marked clean, which would result in a
natural disordering of the reuse of cache buffers. This may be more
or less optimal: it depends on usage patterns by the OS.

So minimally, some experimentation should be done with the drive
and OS in terms of the OS using mode page 2 to obtain the drive
geometry for variable geometry drives, and apply the standard seek
optimizations that are currently disabled in FFS, as well as placing
the OS caching on a track granularity to match the cache
characteristics on most modern drives.

---

On a semi-related note, I have done some experimentation with some
(admittedly older) code that gave ownership of the vnode to the FS,
per SunOS and USL approaches, e.g., instead of two separate
allocations:

        ,-------. <-.       ,-->,-------.
        | inode |   |       |   | vnode |
        |       |   |       |   |       |
        |       |   `-----------|       |
        |       |-----------'   |       |
        |       |               |       |
        `-------'               `-------'

Having a single allocation:

        ,-------.<---.
        | vnode |    |
        |       |    |
        |       |--. |
        |       |  | |
        |-------|<-' |
        | inode |    |
        |       |    |
        |       |----'
        |       |
        `-------'

Which avoids the ability of the vclean() to disassociate valid cached
data from ihash objects by reclaiming the vnode out from under the
inode, without the inode also being reclaimed, making the operation
idempotent in both directions, and then totally removing the ihash(),
since the vnode is allocated as part of allocating the in core inode
(this avoids the SVR3 inode size limitation problem, which the
Berkeley people resolved via a divorce).
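For readers who find the single-allocation layout easier to follow as code, here is a minimal userland sketch of the idea. It is not the 3.2-era kernel code referred to above: the toy_ types, the vinode_alloc(), vtoi(), itov() and vinode_free() helpers and the field names are all hypothetical, chosen only to echo the usual v_data / i_vnode back-pointers. The point it illustrates is that when both objects live in one allocation, each can be found from the other by pointer arithmetic and neither can be reclaimed without the other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for the kernel structures; fields are illustrative. */
struct toy_vnode {
    void *v_data;              /* FS-private data, here the inode */
    int   v_usecount;
};

struct toy_inode {
    struct toy_vnode *i_vnode; /* back pointer to the owning vnode */
    unsigned long     i_number;
};

/* Single allocation: the vnode comes first, the inode follows it. */
struct toy_vinode {
    struct toy_vnode vn;
    struct toy_inode in;
};

/* Allocate both objects at once and wire up the two pointers. */
struct toy_vnode *vinode_alloc(unsigned long ino)
{
    struct toy_vinode *vi = calloc(1, sizeof(*vi));

    if (vi == NULL)
        return NULL;
    vi->vn.v_data = &vi->in;
    vi->in.i_vnode = &vi->vn;
    vi->in.i_number = ino;
    return &vi->vn;
}

/* vnode -> inode without any hash lookup: same allocation, fixed offset. */
struct toy_inode *vtoi(struct toy_vnode *vp)
{
    assert(vp != NULL);
    return &((struct toy_vinode *)vp)->in;
}

/* inode -> vnode by stepping back to the start of the allocation. */
struct toy_vnode *itov(struct toy_inode *ip)
{
    assert(ip != NULL);
    return (struct toy_vnode *)((char *)ip - offsetof(struct toy_vinode, in));
}

/* Releasing the vnode necessarily releases the inode with it. */
void vinode_free(struct toy_vnode *vp)
{
    free(vp); /* vn is the first member, so vp is the allocation itself */
}
```

A caller would do something like `struct toy_vnode *vp = vinode_alloc(42);` and then reach the inode with `vtoi(vp)`. Because there is exactly one free, there is no state in which a live inode has lost its vnode, which is the property the two-allocation scheme needs a hash table to recover from.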
I measured a better than 30% performance increase on heavily loaded systems by doing this (this was 3.2 code, so take that for what it's worth, which is, I think, a lot, since things have not changed _that_ much). Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 13:44:10 2000 Delivered-To: freebsd-fs@freebsd.org Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id 870F437B440; Thu, 21 Sep 2000 13:44:04 -0700 (PDT) Received: (from daemon@localhost) by smtp05.primenet.com (8.9.3/8.9.3) id NAA22534; Thu, 21 Sep 2000 13:44:20 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp05.primenet.com, id smtpdAAAtuaW8R; Thu Sep 21 13:44:11 2000 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id NAA16157; Thu, 21 Sep 2000 13:43:52 -0700 (MST) From: Terry Lambert Message-Id: <200009212043.NAA16157@usr08.primenet.com> Subject: Re: disable write caching with softupdates? To: sos@freebsd.dk (Soren Schmidt) Date: Thu, 21 Sep 2000 20:43:49 +0000 (GMT) Cc: Stephen.Byan@quantum.com (Stephen Byan), mbendiks@eunet.no ('Marius Bendiksen'), fs@FreeBSD.ORG, sos@FreeBSD.ORG, freeBSD-scsi@FreeBSD.ORG In-Reply-To: <200009211502.RAA92422@freebsd.dk> from "Soren Schmidt" at Sep 21, 2000 05:02:57 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > ATA driver with flush cache on "BIO_ORDERED": > 3964.18 real 0.00 user 2870.09 sys > > ATA driver with write cache disabled: > 4423.30 real 0.00 user 2871.87 sys > > So, having the write cache there definitly is a win. 
> > I'll try this on TWO IBM DTLA drives with tags enabled and see what gives.. > > Anything else you want me to mess with now we are at it ? I suspect that much of this difference is unnecessary I/O bus transactions to recover "lost" cached data resulting from the inode/vnode disassociation. Try significantly increasing the number of vnodes in your system, and the ihash pool size, and see if it has any effect on the second set of numbers, by reducing the bus overhead through unnecessary reclaims. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 13:47:42 2000 Delivered-To: freebsd-fs@freebsd.org Received: from feral.com (feral.com [192.67.166.1]) by hub.freebsd.org (Postfix) with ESMTP id 1AFD737B423; Thu, 21 Sep 2000 13:47:37 -0700 (PDT) Received: from zeppo.feral.com (IDENT:mjacob@zeppo [192.67.166.71]) by feral.com (8.9.3/8.9.3) with ESMTP id NAA09886; Thu, 21 Sep 2000 13:41:02 -0700 Date: Thu, 21 Sep 2000 13:37:46 -0700 (PDT) From: Matthew Jacob Reply-To: mjacob@feral.com To: Terry Lambert Cc: Marius Bendiksen , Stephen Byan , "'Soren Schmidt'" , fs@FreeBSD.ORG, sos@FreeBSD.ORG, freeBSD-scsi@FreeBSD.ORG Subject: Re: disable write caching with softupdates? In-Reply-To: <200009212032.NAA15839@usr08.primenet.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > Really, the OS needs to know the cache strategy of the drive, and > follow the same strategy itself, to reduce the number of drive to > OS transactions, and to remove the problem of the drive having to > go back to the well for a subsequent read, if the cache contents > are effectively discarded. It's simple enough- at least with SCSI. 
If you enable write caching, you should then allow for the filesystem to issue an ioctl that can use the SYNCHRONIZE CACHE command to provide the commit point. There are also flavors of reads and writes that can affect cache retention policies for certain blocks. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 13:50:45 2000 Delivered-To: freebsd-fs@freebsd.org Received: from mx1.quantum.com (mx1.quantum.com [204.212.103.34]) by hub.freebsd.org (Postfix) with ESMTP id 83DC337B424; Thu, 21 Sep 2000 13:50:40 -0700 (PDT) Received: from milcmima.qntm.com (milcmima.qntm.com [146.174.18.61]) by mx1.quantum.com (8.9.3 (PHNE_18979)/8.9.3) with ESMTP id NAA08676; Thu, 21 Sep 2000 13:50:35 -0700 (PDT) Received: by milcmima.qntm.com with Internet Mail Service (5.5.2650.21) id ; Thu, 21 Sep 2000 13:50:34 -0700 Message-ID: <8133266FE373D11190CD00805FA768BF055BD1DB@shrcmsg1.tdh.qntm.com> From: Stephen Byan To: "'Marius Bendiksen'" , Mike Smith Cc: freeBSD-scsi@FreeBSD.ORG, "'fs@freeBSD.org'" Subject: RE: disable write caching with softupdates? Date: Thu, 21 Sep 2000 13:39:08 -0700 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2650.21) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Marius Bendiksen [mailto:mbendiks@eunet.no] wrote: > Actually, you can do certain optimizations based on the > knowledge of how > the disk behaves, if you have the ability to use this knowledge in a > realtime system. I unfortunately misphrased so as to state > that FFS did > accurately use this information, which it does not. It would, however, > be possible to design a system which exploits this, thus improving > performance by, amongst other things, removing rotational latency, and > being aware of bad sector remapping, seek timings, zone layout, et al. > This would require realtime operation and fine-grained timing, though.
> There are certainly other things one would benefit from first. > > And this still relies on having the drive provide the correct > information, > including such things as temperature, air moisture levels, or > whatever. > This, as you pointed out, is not reported by disk drives. I think this is one of the more interesting things about the NASD and object-based disk proposals. Since they push the file-store layer down to the disk, the block allocation policy can take all of these things into account, without having to specify an intermediate abstraction layer to convey the necessary information, since the form of that information changes frequently as a reflection of the volatility of the underlying technology in the disk drive. Regards, -Steve Steve Byan Design Engineer MS 1-3/E23 333 South Street Shrewsbury, MA 01545 (508)770-3414 fax: (508)770-2604 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 13:50:50 2000 Delivered-To: freebsd-fs@freebsd.org Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133]) by hub.freebsd.org (Postfix) with ESMTP id E131737B42C; Thu, 21 Sep 2000 13:50:41 -0700 (PDT) Received: (from daemon@localhost) by smtp03.primenet.com (8.9.3/8.9.3) id NAA14670; Thu, 21 Sep 2000 13:49:16 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp03.primenet.com, id smtpdAAA04aayC; Thu Sep 21 13:49:00 2000 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id NAA16305; Thu, 21 Sep 2000 13:50:21 -0700 (MST) From: Terry Lambert Message-Id: <200009212050.NAA16305@usr08.primenet.com> Subject: Re: disable write caching with softupdates? 
To: mbendiks@eunet.no (Marius Bendiksen) Date: Thu, 21 Sep 2000 20:50:21 +0000 (GMT) Cc: Stephen.Byan@quantum.com (Stephen Byan), fs@FreeBSD.ORG, sos@FreeBSD.ORG, freeBSD-scsi@FreeBSD.ORG ('freeBSD-scsi@freeBSD.org') In-Reply-To: from "Marius Bendiksen" at Sep 21, 2000 06:05:30 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > Disks are zoned, so there aren't a constant number of sectors per track. Due > > to defects, the number of sectors per zone varies from sample to sample. > > It's possible that each surface in the drive has a different number of > > cylinders. In future disk generations, the geometry may get warped in > > unpredictable ways. > > I agree on this. But I think people would step forth to fix these > assumptions in FFS, in time, if disks started reporting real geometry. > In either case, I think you would still be likely to get _some_ boost. > Layout logic should either be entirely in the FS or entirely in the disk. This is available in the SCSI 2 standard, on all conforming drives. > I agree that minimizing rotational cost by caching a single track is good, > if the drive can guarantee the integrity of the data, possibly by > providing an NVRAM buffer for the track. Most modern disks will always track-cache, since they write sectors in reverse order. For read operations, immediately following a seek, they start reading into cache at wherever the head is on the disk, and buffer that until they hit the requested data. The result is that sequential data is already loaded into the cache by the time that the next read goes to the drive. Right about now, someone should nudge Rod Grimes to get into this discussion, since he has played with some of this type of disk optimization.
Now that it's under a microscope again, it might also be time to consider his spindle-sync work, as well as other stuff that's more hardware dependent. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 13:52:24 2000 Delivered-To: freebsd-fs@freebsd.org Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134]) by hub.freebsd.org (Postfix) with ESMTP id C526F37B424; Thu, 21 Sep 2000 13:52:17 -0700 (PDT) Received: (from daemon@localhost) by smtp04.primenet.com (8.9.3/8.9.3) id NAA18515; Thu, 21 Sep 2000 13:37:41 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp04.primenet.com, id smtpdAAAubaWBI; Thu Sep 21 13:36:19 2000 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id NAA15997; Thu, 21 Sep 2000 13:38:37 -0700 (MST) From: Terry Lambert Message-Id: <200009212038.NAA15997@usr08.primenet.com> Subject: Re: disable write caching with softupdates? To: Stephen.Byan@quantum.com (Stephen Byan) Date: Thu, 21 Sep 2000 20:38:37 +0000 (GMT) Cc: mbendiks@eunet.no ('Marius Bendiksen'), Stephen.Byan@quantum.com (Stephen Byan), fs@FreeBSD.ORG, sos@FreeBSD.ORG, freeBSD-scsi@FreeBSD.ORG ('freeBSD-scsi@freeBSD.org') In-Reply-To: <8133266FE373D11190CD00805FA768BF055BD1D4@shrcmsg1.tdh.qntm.com> from "Stephen Byan" at Sep 21, 2000 06:40:24 AM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > The problem with exposing the disk geometry is that FFS makes assumptions > about the geometry that are false. FFS is wrong, here. > Disks are zoned, so there aren't a constant number of sectors per track. 
Due > to defects, the number of sectors per zone varies from sample to sample. > It's possible that each surface in the drive has a different number of > cylinders. In future disk generations, the geometry may get warped in > unpredictable ways. Mode page 2. Yes, this will leave IDE drives out in the cold, but let them grow a mode page 2, or suffer the consequences. > Moreover, to take advantage of the geometry, the file system needs an > accurate access time model. The constants in this model may vary from sample > to sample of the same type of drive, and may vary due to environment > conditions like temperature and power supply voltage. (Many of the access > time optimization algorithms in the drives do in fact adapt to these > variations.) The characteristics of the model vary widely between different > designs of drives. Mostly, all it needs to know is whether rotational latency is significantly lower than seek latency, which it is. > So it's hard to envision a standard way of expressing the actual drive > geometry and access time model to the file system. You don't need to be accurate, as much as you need to be correct, and in the ballpark on rotational vs. seek latencies. > > As I recall, and from what Eivind noted, this bit is > > routinely ignored in > > about 90% of all drives out there. > > If you are referring to the SCSI FUA bit, this is absolutely untrue. All > Quantum SCSI drives obey this bit. All currently-manufactured drives obey > this bit. I believe 99% of the drives that claim compliance with the SCSI > SBC spec do in fact obey the FUA bit on writes. There was a recent case > where one manufacturer appears to have cheated and ignored this bit, and > caught quite a bit of abuse for it. Like lost business from major OEMs. > > If you are referring to the flush cache command for ATA drives, you may have > a point. For ATA drives, earlier versions of the ATA spec did not specify a > way to flush the cache.
The ATA driver in Windows NT appears to have > implemented one vendor's vendor-unique command to flush the cache, which is > not widely-supported and which has been superceded by a standard "flush > cache" command in the newer versions of the ATA specification. These guys were talking about ATA drives, which ignore the bit. Most SCSI drives, as you point out, do not. And this is a heck of a lot better approach than flushing the cache via an explicit instruction. The SCSI stuff can be handled with a "known rogues" list, but the ATA stuff is pretty much unhandleable at this point. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 14: 9: 9 2000 Delivered-To: freebsd-fs@freebsd.org Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id 6DC9537B43E for ; Thu, 21 Sep 2000 14:09:02 -0700 (PDT) Received: (from daemon@localhost) by smtp05.primenet.com (8.9.3/8.9.3) id OAA01454; Thu, 21 Sep 2000 14:09:18 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp05.primenet.com, id smtpdAAABAaWWc; Thu Sep 21 14:09:11 2000 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id OAA16992; Thu, 21 Sep 2000 14:08:52 -0700 (MST) From: Terry Lambert Message-Id: <200009212108.OAA16992@usr08.primenet.com> Subject: Crash recovery: SU vs. LFS vs. 
JFS To: mbendiks@eunet.no (Marius Bendiksen) Date: Thu, 21 Sep 2000 21:08:52 +0000 (GMT) Cc: tuinstra@clarkson.edu (Dwight Tuinstra), freebsd-fs@FreeBSD.ORG (freebsd-fs) In-Reply-To: from "Marius Bendiksen" at Sep 21, 2000 06:22:28 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > As a long-term graduate research project, I've been looking into > > the code for LFS (Log-structured File System) on NetBSD. Such a > > system is optimized for many small writes, and given the amounts > > of RAM available for read caches nowadays, should deliver read > > performance comparable (or not much worse) than FFS. Additionally, > > LFS should provide better and faster crash recovery than either FFS > > or journaling file systems. > > Research (IIRC, Seltzer and Matthews) has shown that FFS outperforms > LFS when the FFS clustering code has been activated. The crash recovery > times can supposedly be alleviated by soft updates, I've not looked at > that yet. Journalling crash recovery vs LFS crash recovery is more complex > than a mere comparison of speed, as these can be tuned in both cases. The crash recovery of soft updates can be sped up considerably, in theory. In practice, you can't tell if the reason for the crash was an FS fault, or whether it was some other fault. Further, for most drives, if you had a DC failure to the drive in the middle of an actual write, you can have single sector format corruption (my personal opinion is that if this is possible, so is multiple sector corruption, if timed just right). This means that you need NVRAM for at least sector logging, and at most, track logging, in order to ensure replay based de-corruption on in progress writes. Further, consider that the drive may have a track cache that will result in out of range (for the OS) data being written, and the corruption occurring there.
The _only_ software approach for soft updates is really soft readonly, and this only works if the system is quiescent at the time of the crash (soft readonly in force, and the FS marked clean). This ignores the possibility of a software failure (e.g. the failure is assumed to be hardware or power). In the software failure case, there is no telling what data was corrupted, or whether some of it was written to disk prior to the corruption hitting something which the system noticed sufficiently to actually fail. Thus, soft updates are not a good strategy for fast failure recovery. --- LFS is probably fastest, but does not support implied relationships between data. For example, if I have a data file and an index file, and I two-stage commit it by writing new data and then a new index, if the failure occurs between these operations, I potentially lose my transaction. If this were a bank transaction, I would be really hurting in the wallet. If this were a different transaction, it may have less financial risk, but the correctness risk is the same. NB: The database above obviously records the new record to a different record entry, so that the old one is still a valid record in case of failure; the operation is thus non-atomic, but idempotent. The way LFS normally recovers is to go back to the oldest time stamped and valid marked log as "the correct state of the FS", and discard any partial logs (and with them, implied metadata). --- JFS is slower. JFS recovers nearly the same as LFS, but must look at outstanding transactions, and actually back them out if they are incomplete, or roll them forward, if it can. The difference between these is whether the journal contains only a journal of events that have transpired, or a journal of both events that have transpired, and those intended to transpire, but not yet committed to the disk.
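The back-out versus roll-forward distinction can be sketched in C as follows. This is a toy model under stated assumptions, not any real JFS: the record layout, `journal_recover()`, and the constants are hypothetical; each update record carries both an old image (what has transpired) and a new image (what was intended); and transactions are assumed not to overlap on the same block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 8      /* size of the toy "disk" */
#define MAX_TXN 16     /* transaction ids must be in [0, MAX_TXN) */

enum rec_type { REC_UPDATE, REC_COMMIT };

/* Hypothetical journal record holding both undo and redo images. */
struct log_rec {
    enum rec_type type;
    int txn;        /* transaction id */
    int block;      /* target block (REC_UPDATE only) */
    int old_val;    /* undo image: contents before the update */
    int new_val;    /* redo image: contents the update intended */
};

/*
 * Recover a toy disk from its journal: transactions that reached their
 * commit record are rolled forward, all others are backed out.
 */
void journal_recover(int disk[NBLOCKS], const struct log_rec *log, size_t n)
{
    bool committed[MAX_TXN] = { false };
    size_t i;

    /* Pass 1: note which transactions have a commit record. */
    for (i = 0; i < n; i++) {
        assert(log[i].txn >= 0 && log[i].txn < MAX_TXN);
        if (log[i].type == REC_COMMIT)
            committed[log[i].txn] = true;
    }

    /* Pass 2: roll committed transactions forward, in log order. */
    for (i = 0; i < n; i++)
        if (log[i].type == REC_UPDATE && committed[log[i].txn])
            disk[log[i].block] = log[i].new_val;

    /* Pass 3: back incomplete transactions out, newest record first. */
    for (i = n; i-- > 0; )
        if (log[i].type == REC_UPDATE && !committed[log[i].txn])
            disk[log[i].block] = log[i].old_val;
}
```

With the data-file and index-file example from earlier, a crash after the data block was written but before the index block reached the disk would be repaired by pass 2, provided the commit record made it into the journal; without the commit record, pass 3 restores the old data block instead.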
This means that a JFS will keep implied relationships between data intact, so long as it is signalled before and after a transaction involving an implied relationship takes place. So long as the transaction completion is not signalled to the client application until after the transaction has been committed (this requires another hook to user space), then we have no problem with bank transactions. --- Someone said that LFS logs data and metadata, but JFS logs only metadata, and that's the difference. Obviously, this is untrue; a JFS logs transactions. Whether these transactions include only metadata, or include both metadata and data, is really a JFS implementation detail, not an attribute of JFSs themselves. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 14:12:58 2000 Delivered-To: freebsd-fs@freebsd.org Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133]) by hub.freebsd.org (Postfix) with ESMTP id 41DBB37B43F for ; Thu, 21 Sep 2000 14:12:56 -0700 (PDT) Received: (from daemon@localhost) by smtp03.primenet.com (8.9.3/8.9.3) id OAA23714; Thu, 21 Sep 2000 14:11:32 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp03.primenet.com, id smtpdAAAbQay_T; Thu Sep 21 14:11:11 2000 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id OAA17067; Thu, 21 Sep 2000 14:12:30 -0700 (MST) From: Terry Lambert Message-Id: <200009212112.OAA17067@usr08.primenet.com> Subject: Re: Journaling Filesystems in bsd? (LFS, anyone?)
To: soules+@andrew.cmu.edu (Craig A Soules) Date: Thu, 21 Sep 2000 21:12:29 +0000 (GMT) Cc: freebsd-fs@FreeBSD.ORG In-Reply-To: from "Craig A Soules" at Sep 21, 2000 12:24:20 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > Wouldn't it be possible to offer the same semantics as FFS in an LFS > implementation if the segment was (over)written after each operation? No, unless you used NVRAM to do the job, and did it in an idempotent way (e.g. you did intention logging of the changes, and the NVRAM version was not overwritten, only the disk version was). > > Does the cleaner work? (always the first question to ask when people > > talk about their LFS implementations). > > This may be wrong, so don't flame me, but... > I had heard that there was no fully working implementation of LFS in the > *BSDs for this reason (and in fact there was an LFS branch in FreeBSD as > well, but it was removed since the cleaner was in such shoddy shape). If you have a full source repository, the LFS in FreeBSD, which is Margo Seltzer's, and the same as that in NetBSD (except the FreeBSD one has not been maintained as regards the VM and buffer cache unification, so there is work to be done there), can be recovered from the attic, by checking the code out by the date before it was sent to the attic. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 14:15:18 2000 Delivered-To: freebsd-fs@freebsd.org Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id 2EACF37B42C for ; Thu, 21 Sep 2000 14:15:17 -0700 (PDT) Received: (from daemon@localhost) by smtp05.primenet.com (8.9.3/8.9.3) id OAA03918; Thu, 21 Sep 2000 14:15:29 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp05.primenet.com, id smtpdAAATeaWFh; Thu Sep 21 14:15:18 2000 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id OAA17206; Thu, 21 Sep 2000 14:14:59 -0700 (MST) From: Terry Lambert Message-Id: <200009212114.OAA17206@usr08.primenet.com> Subject: Re: Journaling Filesystems in bsd? (LFS, anyone?) To: dg@root.com Date: Thu, 21 Sep 2000 21:14:59 +0000 (GMT) Cc: tuinstra@clarkson.edu (Dwight Tuinstra), freebsd-fs@FreeBSD.ORG (freebsd-fs) In-Reply-To: <200009211638.JAA09518@implode.root.com> from "David Greenman" at Sep 21, 2000 09:38:12 AM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > Have you done any comparisons with FFS+softupdates? The goal of softupdates > was to be as fast or faster than LFS for everything, not require a cleanerd, > and along with "snapshots" eliminate requiring fsck before system startup. Soft updates can not get around the full fsck problem. See my other posting under the title "Crash recovery", wherein I compare the crash recovery mechanisms, with special attention to the soft updates problem with abbreviated crash recovery. Soft updates does some good things, but it also does some bad things (at least relative to an LFS or JFS, and crash recovery). 
Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 14:20:37 2000 Delivered-To: freebsd-fs@freebsd.org Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134]) by hub.freebsd.org (Postfix) with ESMTP id 7AADE37B42C; Thu, 21 Sep 2000 14:20:34 -0700 (PDT) Received: (from daemon@localhost) by smtp04.primenet.com (8.9.3/8.9.3) id OAA07354; Thu, 21 Sep 2000 14:18:06 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp04.primenet.com, id smtpdAAA_qai7n; Thu Sep 21 14:17:46 2000 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id OAA17349; Thu, 21 Sep 2000 14:20:00 -0700 (MST) From: Terry Lambert Message-Id: <200009212120.OAA17349@usr08.primenet.com> Subject: Re: disable write caching with softupdates? To: archie@whistle.com (Archie Cobbs) Date: Thu, 21 Sep 2000 21:20:00 +0000 (GMT) Cc: mbendiks@eunet.no (Marius Bendiksen), sos@freebsd.org, fs@freebsd.org, terry@lambert.org In-Reply-To: <200009211702.KAA13662@bubba.whistle.com> from "Archie Cobbs" at Sep 21, 2000 10:02:16 AM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > Please make this conditional, as people with non-crippled hardware might > > want to employ the write cache. A sysctl or build option would be best. > > Yep, that was what I was originally suggesting. > > Soren do you want me to try to come up with a patch? > I don't claim to understand IDE technology.. can you > just send the disable command at any time or is it > more complicated than that? I would also send a flush command. I would be skeptical of an IDE drive perhaps disabling the cache, and leaving the thing dirty. 
Not knowing the exact details of when the cache control may be changed, I can't say for sure (it may have to be done by the BIOS at boot time, for all I know). If you go over to my cube, there is a full technical specification of the ATA stuff for the Quantum drives we used to use in the InterJet II, prior to the IBM acquisition, and it has this information in it, in detail. Yung gave me the documentation when Julian and I were trying to figure out how much DC holdup was required after an AC failure for us to be able to ensure that there were no sector corruptions, and that the outstanding operations on the drive were committed, after throwing a pin into the soft updates clock to ensure that no more writes were scheduled. FWIW... Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 14:50:43 2000 Delivered-To: freebsd-fs@freebsd.org Received: from implode.root.com (root.com [209.102.106.178]) by hub.freebsd.org (Postfix) with ESMTP id D92DD37B424 for ; Thu, 21 Sep 2000 14:50:41 -0700 (PDT) Received: from implode.root.com (localhost [127.0.0.1]) by implode.root.com (8.8.8/8.8.5) with ESMTP id OAA10470; Thu, 21 Sep 2000 14:46:10 -0700 (PDT) Message-Id: <200009212146.OAA10470@implode.root.com> To: Terry Lambert Cc: tuinstra@clarkson.edu (Dwight Tuinstra), freebsd-fs@FreeBSD.ORG (freebsd-fs) Subject: Re: Journaling Filesystems in bsd? (LFS, anyone?) In-reply-to: Your message of "Thu, 21 Sep 2000 21:14:59 -0000." <200009212114.OAA17206@usr08.primenet.com> From: David Greenman Reply-To: dg@root.com Date: Thu, 21 Sep 2000 14:46:10 -0700 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org >> Have you done any comparisons with FFS+softupdates?
The goal of softupdates >> was to be as fast or faster than LFS for everything, not require a cleanerd, >> and along with "snapshots" eliminate requiring fsck before system startup. > >Soft updates can not get around the full fsck problem. See my other >posting under the title "Crash recovery", wherein I compare the crash >recovery mechanisms, with special attention to the soft updates >problem with abbreviated crash recovery. > >Soft updates does some good things, but it also does some bad things >(at least relative to an LFS or JFS, and crash recovery). I didn't say it could. What I said is that it didn't need to be run before system startup occured. -DG David Greenman Co-founder, The FreeBSD Project - http://www.freebsd.org President, TeraSolutions, Inc. - http://www.terasolutions.com Pave the road of life with opportunities. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 14:58: 3 2000 Delivered-To: freebsd-fs@freebsd.org Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134]) by hub.freebsd.org (Postfix) with ESMTP id 9F5EC37B424 for ; Thu, 21 Sep 2000 14:58:00 -0700 (PDT) Received: (from daemon@localhost) by smtp04.primenet.com (8.9.3/8.9.3) id OAA14512; Thu, 21 Sep 2000 14:33:34 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp04.primenet.com, id smtpdAAAPbaiZB; Thu Sep 21 14:33:11 2000 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id OAA17807; Thu, 21 Sep 2000 14:35:23 -0700 (MST) From: Terry Lambert Message-Id: <200009212135.OAA17807@usr08.primenet.com> Subject: Re: disable write caching with softupdates? 
To: archie@whistle.com (Archie Cobbs) Date: Thu, 21 Sep 2000 21:35:22 +0000 (GMT) Cc: sos@freebsd.dk (Soren Schmidt), mbendiks@eunet.no, terry@lambert.org, fs@freebsd.org In-Reply-To: <200009211900.MAA18117@bubba.whistle.com> from "Archie Cobbs" at Sep 21, 2000 12:00:32 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > > Thanks.. I was talking about a sysctl patch, where you could > > > turn write caching on or off at any time while the system is > > > running via sysctl. Is it even possible to do that? > > > > Why on earth would you want this as a sysctl knob ? > > Either you want to play it safe, or you dont care, this is not > > something you change your mind about now and then.... > > OK, here's an example. You have an /etc/rc script that checks > at startup what kind of disk you have (eg "dmesg | grep -q QUANTUM") > and for certain disk types that are known to lie about their write > guarantees enables or disables write caching as appropriate. > > FYI I can confirm what Terry said that few disks we've encountered > guarantee atomic sector writes in the face of power failure (i.e., > leading to a corrupted/lost sector), much less flushing the entire > write cache, so IMHO the answer to.. A better reason would be "the same reason I can arbitrarily mount an FS as either sync or async". 8-). > > The _right_ solution of cause is to have the FS code pass down > > a flag that says only write this when all preceeding bufs are > > on the media, now _that_ would be nice and work for both ATA > > and SCSI devices.... > > Agreed, this is the best solution. I think it is necessary > in order to make the promise of softupdates really rigorous. Actually, this is implicit in the design of soft updates; it's what soft updates is trying to buy you.
Right now, soft updates will stall (not schedule) additional writes, until it knows that preceding writes (in an ordered dependency list) have been completed. Passing down a flag would still be useful, I think, since it would erase latencies, and permit other non-soft updates type experimentation. But for soft updates, it wouldn't help, since as soon as the buffer is scheduled, it is owned by the system, and there is an implied stall on the write pipeline, since you can't modify a dependency in the "scheduled in the buffer cache, but not scheduled by the disk driver" state. To get around this, you would have to have the concept of multiple reader/single writer locks on stuff in the buffer cache queue. You should talk to Matt Dillon and Kirk McKusick about the stall problem which occurs after the stuff gets into this state, since it's what screwed up the SAMBA concurrency benchmarks at Ziff Davis (the ones which were published for Linux and NT, but not published for FreeBSD). Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 15: 9:26 2000 Delivered-To: freebsd-fs@freebsd.org Received: from smtp04.primenet.com (smtp04.primenet.com [206.165.6.134]) by hub.freebsd.org (Postfix) with ESMTP id 83CB537B43E for ; Thu, 21 Sep 2000 15:09:23 -0700 (PDT) Received: (from daemon@localhost) by smtp04.primenet.com (8.9.3/8.9.3) id PAA29454; Thu, 21 Sep 2000 15:06:56 -0700 (MST) Received: from usr08.primenet.com(206.165.6.208) via SMTP by smtp04.primenet.com, id smtpdAAAxNaWm5; Thu Sep 21 15:06:41 2000 Received: (from tlambert@localhost) by usr08.primenet.com (8.8.5/8.8.5) id PAA18776; Thu, 21 Sep 2000 15:08:53 -0700 (MST) From: Terry Lambert Message-Id: <200009212208.PAA18776@usr08.primenet.com> Subject: Re: Journaling Filesystems in bsd? (LFS, anyone?)
To: dg@root.com Date: Thu, 21 Sep 2000 22:08:53 +0000 (GMT) Cc: tlambert@primenet.com (Terry Lambert), tuinstra@clarkson.edu (Dwight Tuinstra), freebsd-fs@FreeBSD.ORG (freebsd-fs) In-Reply-To: <200009212146.OAA10470@implode.root.com> from "David Greenman" at Sep 21, 2000 02:46:10 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > >Soft updates can not get around the full fsck problem. See my other > >posting under the title "Crash recovery", wherein I compare the crash > >recovery mechanisms, with special attention to the soft updates > >problem with abbreviated crash recovery. > > I didn't say it could. What I said is that it didn't need to be run before > system startup occured. The problem is that you could have disk corruption above and beyond that which can occur during normal operation, when there is a failure resulting in a reboot with an unclean FS. This may be a corrupt sector containing metadata (maybe even for the "/" directory or "/kernel", if you were writing a new kernel at the time of the crash), or it may be other corrupt data which became corrupted in a cascade failure that resulted in the crash after one or more corrupted blocks were written to disk. Soft updates simply can't recover from this. If, on the other hand, it were a kernel panic that didn't result in corrupt data being written to disk, then there's no danger of a corrupt sector from a DC failure, and there is no danger of other corrupt data needing fsck'ing, so you would be in the situation where the only thing that would be out of date is the cylinder group bitmaps; you could clean this in the background by "locking" access on a cylinder group by cylinder group basis for a short period of time, to clear bits in the bitmap that said an unallocated sector was allocated. 
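The per-cylinder-group cleanup described above can be sketched in C as follows. This is a toy model, not FFS code: `struct cyl_group`, `cg_reclaim()`, the 32-block group size, the boolean standing in for a real lock, and the flat list of referenced block numbers are all assumptions made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCKS_PER_CG 32   /* toy size: one 32-bit word per group */

/* Hypothetical cylinder group: a lock flag and an allocation bitmap. */
struct cyl_group {
    bool     locked;       /* stand-in for a real per-group lock */
    uint32_t alloc_map;    /* bit set = block is marked allocated */
};

/*
 * Clear bits in one group's bitmap that claim a block is allocated
 * although no live inode references it.  refs[] holds the global block
 * numbers found by walking the inodes.  Returns the number of blocks
 * reclaimed.  Only this one group is held locked while it is fixed.
 */
size_t cg_reclaim(struct cyl_group *cg, size_t cgno,
                  const uint32_t *refs, size_t nrefs)
{
    uint32_t in_use = 0;
    uint32_t stale;
    size_t freed = 0;
    size_t i;

    assert(cg != NULL);
    cg->locked = true;                    /* brief stall for this group */

    /* Build the set of blocks in this group that are really in use. */
    for (i = 0; i < nrefs; i++)
        if (refs[i] / BLOCKS_PER_CG == cgno)
            in_use |= UINT32_C(1) << (refs[i] % BLOCKS_PER_CG);

    /* Marked allocated but unreferenced: safe to clear. */
    stale = cg->alloc_map & ~in_use;
    cg->alloc_map &= ~stale;

    cg->locked = false;

    /* Count the reclaimed blocks. */
    for (; stale != 0; stale >>= 1)
        freed += stale & 1u;
    return freed;
}
```

The sketch only ever clears bits, which matches the case described above where the bitmaps are the only thing out of date; a referenced block that the bitmap shows as free would be a different kind of damage and is deliberately left alone here.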
This might be seen as brief stalls by an especially observant user or program (say someone is doing profiling of code at the time), but could be accomplished in the background. The problem is that you can not know the reason for the crash, until after the recovery. If there were available CMOS, you could write a "power failure" value into it at boot time, and then write a "safe panic" or an "unsafe panic" code into it at crash time (a power failure would leave the "power failure" code there). The only valid background case would be for a "safe panic", if you could really guarantee such a thing. The worst possible failure resulting in a reboot is a hardware failure of the disk; I would really be loath to try cleaning in the background after a track failure or even a sector failure (sector failures are identical to sector format corruption after a DC failure during a write, FWIW). Look, soft updates are a good thing, but they aren't a panacea for all problems. Let's laud them for what they do right, but not misrepresent them as doing something they can't. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers.
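The CMOS scheme described in the message above can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not kernel code: every name here (crash_code, fake_cmos, boot_read_and_arm, record_panic, background_clean_allowed, crash_code_demo) is hypothetical, and a plain variable stands in for the CMOS byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-byte codes kept in non-volatile (CMOS) memory. */
enum crash_code {
    CRASH_UNKNOWN       = 0, /* nothing recorded yet */
    CRASH_POWER_FAILURE = 1, /* armed at boot; still present after a power loss */
    CRASH_SAFE_PANIC    = 2, /* panic that wrote no corrupt data to disk */
    CRASH_UNSAFE_PANIC  = 3  /* panic that may have written corrupt data */
};

/* Stand-in for the CMOS byte (assumption); real code would use NVRAM access. */
static uint8_t fake_cmos = CRASH_UNKNOWN;

/* At boot: fetch what the previous run left behind, then arm the default. */
static enum crash_code boot_read_and_arm(void)
{
    enum crash_code previous = (enum crash_code)fake_cmos;

    fake_cmos = CRASH_POWER_FAILURE;
    return previous;
}

/* At crash time: overwrite the default with the kind of panic. */
static void record_panic(int disk_data_trusted)
{
    fake_cmos = disk_data_trusted ? CRASH_SAFE_PANIC : CRASH_UNSAFE_PANIC;
}

/* Background cleaning is only valid after a "safe panic". */
static int background_clean_allowed(enum crash_code previous)
{
    return previous == CRASH_SAFE_PANIC;
}

/* Walk through the three reboot causes discussed in the message. */
static void crash_code_demo(void)
{
    enum crash_code seen;

    fake_cmos = CRASH_UNKNOWN;
    (void)boot_read_and_arm();      /* first boot arms the default */

    seen = boot_read_and_arm();     /* power was lost: the default survived */
    assert(seen == CRASH_POWER_FAILURE);
    assert(!background_clean_allowed(seen));

    record_panic(1);                /* safe panic, then reboot */
    seen = boot_read_and_arm();
    assert(background_clean_allowed(seen));

    record_panic(0);                /* unsafe panic, then reboot */
    seen = boot_read_and_arm();
    assert(!background_clean_allowed(seen));
}
```

The sketch only covers the bookkeeping. It does not answer the harder question raised in the message, namely how a panic path could really guarantee that a panic was "safe".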
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 15:43:54 2000 Delivered-To: freebsd-fs@freebsd.org Received: from po6.andrew.cmu.edu (PO6.ANDREW.CMU.EDU [128.2.10.106]) by hub.freebsd.org (Postfix) with ESMTP id D9BD037B43C for ; Thu, 21 Sep 2000 15:43:48 -0700 (PDT) Received: (from postman@localhost) by po6.andrew.cmu.edu (8.9.3/8.9.3) id SAA09642 for freebsd-fs@freebsd.org; Thu, 21 Sep 2000 18:43:38 -0400 (EDT) Received: via switchmail; Thu, 21 Sep 2000 18:43:38 -0400 (EDT) Received: from unix10.andrew.cmu.edu via qmail ID ; Thu, 21 Sep 2000 18:43:06 -0400 (EDT) Received: from unix10.andrew.cmu.edu via qmail ID ; Thu, 21 Sep 2000 18:43:06 -0400 (EDT) Received: from mms.4.60.May..8.2000.10.35.10.sun4.57.EzMail.2.0.CUILIB.3.45.SNAP.NOT.LINKED.unix10.andrew.cmu.edu.sun4x.57 via MS.5.6.unix10.andrew.cmu.edu.sun4_57; Thu, 21 Sep 2000 18:43:05 -0400 (EDT) Message-ID: Date: Thu, 21 Sep 2000 18:43:05 -0400 (EDT) From: Craig A Soules To: freebsd-fs@freebsd.org Subject: Re: Journaling Filesystems in bsd? (LFS, anyone?) Cc: freebsd-fs In-Reply-To: References: Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Excerpts from internet.computing.freebsd.fs: 21-Sep-100 Re: Journaling Filesystems .. by Christopher Stein@eecs.h > On Thu, 21 Sep 2000, Craig A Soules wrote: > > > Excerpts from internet.computing.freebsd.fs: 21-Sep-100 Re: Journaling > > > Log-structured file systems offer different semantics than > > > synchronous journaling file systems. Synchronous journaling can > > > offer the traditional durability of create. Nothing is durable > > > > Wouldn't it be possible to offer the same semantics as FFS in an LFS > > implementation if the segment was (over)written after each operation? > > Partial segment writes? well, that's not quite what I had in mind... 
I was rather thinking, imagine an async LFS implementation, where you wait to write out each segment until it is full. Instead, do the same thing in memory, but after each operation, also commit that segment to disk. You could end up writing the segment an undefined # of times in a row, but it would give you the same semantics as a sync journaling system, as well as all of the same foreground characteristics of LFS (no additional fragmentation as you hinted at). It seems as though that should be feasible. That way the cleaner wouldn't have additional work (although still all the regular overhead of LFS). craig To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Thu Sep 21 19:35: 1 2000 Delivered-To: freebsd-fs@freebsd.org Received: from mail-out.visi.com (kauket.visi.com [209.98.98.22]) by hub.freebsd.org (Postfix) with ESMTP id 03BE737B43E for ; Thu, 21 Sep 2000 19:34:56 -0700 (PDT) Received: from isis.visi.com (isis.visi.com [209.98.98.8]) by mail-out.visi.com (Postfix) with ESMTP id CDC8C38CA for ; Thu, 21 Sep 2000 21:34:39 -0500 (CDT) Received: (from jfb@localhost) by isis.visi.com (8.8.8/8.8.8) id VAA26517 for freebsd-fs@freebsd.org; Thu, 21 Sep 2000 21:34:39 -0500 (CDT) From: Date: Thu, 21 Sep 2000 21:34:39 -0500 To: freebsd-fs@freebsd.org Subject: New kernel and weird FAT problems Message-ID: <20000921213439.E26139@visi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii User-Agent: Mutt/0.96.1i Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org (cc-d to -questions) Hello, all, I've just built a new kernel (4.1-STABLE, config file included at the end of this post), and upon rebooting, my FAT32 filesystem, which I had been mounting as a regular user, was now mounted as root, with null permissions all the way down the directory tree. This is very bizarre -- I unmounted and remounted the filesystem to no avail. Is this a known problem?
Did I, in my zeal to delete unnecessary devices in my kernel, accidentally knock my MSDOS filesystem support tits up? Thanks in advance, (jfb)
------------------------------------------------------------------------
#
# GENERIC -- Generic kernel configuration file for FreeBSD/i386
#
# For more information on this file, please read the handbook section on
# Kernel Configuration Files:
#
#    http://www.FreeBSD.org/handbook/kernelconfig-config.html
#
# The handbook is also available locally in /usr/share/doc/handbook
# if you've installed the doc distribution, otherwise always see the
# FreeBSD World Wide Web server (http://www.FreeBSD.org/) for the
# latest information.
#
# An exhaustive list of options and more detailed explanations of the
# device lines is also present in the ./LINT configuration file. If you are
# in doubt as to the purpose or necessity of a line, check first in LINT.
#
# $FreeBSD: src/sys/i386/conf/GENERIC,v 1.246.2.8 2000/07/20 02:51:02 msmith Exp $

machine         i386
cpu             I686_CPU
ident           jfb
maxusers        32

#makeoptions    DEBUG=-g                #Build kernel with gdb(1) debug symbols

options         MATH_EMULATE            #Support for x87 emulation
options         INET                    #InterNETworking
options         FFS                     #Berkeley Fast Filesystem
options         FFS_ROOT                #FFS usable as root device [keep this!]
options         SOFTUPDATES             #Enable FFS soft updates support
options         MFS                     #Memory Filesystem
options         MD_ROOT                 #MD is a potential root device
options         NFS                     #Network Filesystem
options         MSDOSFS                 #MSDOS Filesystem
options         CD9660                  #ISO 9660 Filesystem
options         CD9660_ROOT             #CD-ROM usable as root, CD9660 required
options         PROCFS                  #Process filesystem
options         COMPAT_43               #Compatible with BSD 4.3 [KEEP THIS!]
options         SCSI_DELAY=15000        #Delay (in ms) before probing SCSI
options         UCONSOLE                #Allow users to grab the console
options         USERCONFIG              #boot -c editor
options         VISUAL_USERCONFIG       #visual boot -c editor
options         KTRACE                  #ktrace(1) support
options         SYSVSHM                 #SYSV-style shared memory
options         SYSVMSG                 #SYSV-style message queues
options         SYSVSEM                 #SYSV-style semaphores
options         P1003_1B                #Posix P1003_1B real-time extensions
options         _KPOSIX_PRIORITY_SCHEDULING
options         ICMP_BANDLIM            #Rate limit bad replies
options         KBD_INSTALL_CDEV        # install a CDEV entry in /dev

device          isa
device          pci

# Floppy drives
device          fdc0    at isa? port IO_FD1 irq 6 drq 2
device          fd0     at fdc0 drive 0
device          fd1     at fdc0 drive 1

# ATA and ATAPI devices
device          ata0    at isa? port IO_WD1 irq 14
device          ata1    at isa? port IO_WD2 irq 15
device          ata
device          atadisk                 # ATA disk drives
device          atapicd                 # ATAPI CDROM drives
device          atapifd                 # ATAPI floppy drives
options         ATA_STATIC_ID           #Static device numbering

# atkbdc0 controls both the keyboard and the PS/2 mouse
device          atkbdc0 at isa? port IO_KBD
device          atkbd0  at atkbdc? irq 1 flags 0x1
device          psm0    at atkbdc? irq 12

device          vga0    at isa?

# splash screen/screen saver
pseudo-device   splash

# syscons is the default console driver, resembling an SCO console
device          sc0     at isa? flags 0x100

# Floating point support - do not disable.
device          npx0    at nexus? port IO_NPX irq 13

# Power management support (see LINT for more options)
device          apm0    at nexus? disable flags 0x20 # Advanced Power Management

# Serial (COM) ports
device          sio0    at isa? port IO_COM1 flags 0x10 irq 4

# PCI Ethernet NICs.
device          miibus          # MII bus support
device          xl              # 3Com 3c90x (``Boomerang'', ``Cyclone'')

# Pseudo devices - the number indicates how many units to allocated.
pseudo-device   loop            # Network loopback
pseudo-device   ether           # Ethernet support
pseudo-device   tun             # Packet tunnel.
pseudo-device   pty             # Pseudo-ttys (telnet etc)
pseudo-device   md              # Memory "disks"

# The `bpf' pseudo-device enables the Berkeley Packet Filter.
# Be aware of the administrative consequences of enabling this!
pseudo-device   bpf             #Berkeley packet filter

# USB support
device          uhci            # UHCI PCI->USB interface
device          ohci            # OHCI PCI->USB interface
device          usb             # USB Bus (required)
device          ugen            # Generic
device          uhid            # "Human Interface Devices"
device          ulpt            # Printer
device          ums             # Mouse

device          pcm             # sound driver
------------------------------------------------------------------------
-- When C++ is your hammer, everything looks like a thumb. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Sep 22 1: 0:43 2000 Delivered-To: freebsd-fs@freebsd.org Received: from netplex.com.au (adsl-63-207-30-186.dsl.snfc21.pacbell.net [63.207.30.186]) by hub.freebsd.org (Postfix) with ESMTP id 6312737B422; Fri, 22 Sep 2000 01:00:38 -0700 (PDT) Received: from netplex.com.au (peter@localhost [127.0.0.1]) by netplex.com.au (8.11.0/8.9.3) with ESMTP id e8M7vtG46023; Fri, 22 Sep 2000 00:57:55 -0700 (PDT) (envelope-from peter@netplex.com.au) Message-Id: <200009220757.e8M7vtG46023@netplex.com.au> X-Mailer: exmh version 2.1.1 10/15/1999 To: Soren Schmidt Cc: Stephen.Byan@quantum.com (Stephen Byan), mbendiks@eunet.no ('Marius Bendiksen'), fs@FreeBSD.ORG, sos@FreeBSD.ORG, freeBSD-scsi@FreeBSD.ORG Subject: Re: disable write caching with softupdates? In-Reply-To: <200009211502.RAA92422@freebsd.dk> Mime-Version: 1.0 Content-Transfer-Encoding: 8bit Date: Fri, 22 Sep 2000 00:57:55 -0700 From: Peter Wemm Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org Soren Schmidt wrote: > It seems Stephen Byan wrote: > > Marius Bendiksen [mailto:mbendiks@eunet.no] wrote: > > > > > > Contrast this 10% performance hit versus what you get when > > > you disable > > > > caching entirely. > > > > > > I think you will see that on some drives, this may have a greater > > > performance impact than not caching at all.
> > > > Perhaps Søren will be kind enough to run the experiment? I'd be interested > > in analyzing cases in ATA drives where flushing delivers worse performance > > than disabling cache. > > Well, I have been toying a bit with this, so far results are just > timing of a make -j16 buildworld on two IBM DJNA drives (ie no tags) > with various setups. > > ATA driver "as is": > 3602.63 real 0.00 user 2865.62 sys > > ATA driver with flush cache on "BIO_ORDERED": > 3964.18 real 0.00 user 2870.09 sys > > ATA driver with write cache disabled: > 4423.30 real 0.00 user 2871.87 sys > > So, having the write cache there definitely is a win. It is a win only if you do not value your data. I would gladly turn it off. How do we do this right now? (ie: completely off) > I'll try this on TWO IBM DTLA drives with tags enabled and see what gives.. I'm curious to know if tagged queueing compensates for the loss incurred by disabling write caching. Cheers, -Peter -- Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au "All of this is for nothing if we don't go to the stars" - JMS/B5 To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Sep 22 1:10:29 2000 Delivered-To: freebsd-fs@freebsd.org Received: from freebsd.dk (freebsd.dk [212.242.42.178]) by hub.freebsd.org (Postfix) with ESMTP id B5F0F37B423; Fri, 22 Sep 2000 01:10:24 -0700 (PDT) Received: (from sos@localhost) by freebsd.dk (8.9.3/8.9.1) id KAA38726; Fri, 22 Sep 2000 10:13:13 +0200 (CEST) (envelope-from sos) From: Soren Schmidt Message-Id: <200009220813.KAA38726@freebsd.dk> Subject: Re: disable write caching with softupdates?
In-Reply-To: <200009220757.e8M7vtG46023@netplex.com.au> from Peter Wemm at "Sep 22, 2000 00:57:55 am" To: peter@netplex.com.au (Peter Wemm) Date: Fri, 22 Sep 2000 10:13:13 +0200 (CEST) Cc: Stephen.Byan@quantum.com (Stephen Byan), mbendiks@eunet.no ('Marius Bendiksen'), fs@FreeBSD.ORG, sos@FreeBSD.ORG, freeBSD-scsi@FreeBSD.ORG X-Mailer: ELM [version 2.4ME+ PL54 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org It seems Peter Wemm wrote: > > Well, I have been toying a bit with this, so far results are just > > timing of a make -j16 buildworld on two IBM DJNA drives (ie no tags) > > with various setups. > > > > ATA driver "as is": > > 3602.63 real 0.00 user 2865.62 sys > > > > ATA driver with flush cache on "BIO_ORDERED": > > 3964.18 real 0.00 user 2870.09 sys > > > > ATA driver with write cache disabled: > > 4423.30 real 0.00 user 2871.87 sys > > > > So, having the write cache there definitely is a win. > > It is a win only if you do not value your data. I would gladly turn it off. > How do we do this right now? (ie: completely off) See the patch from yesterday, but I'll come up with a better one (sysctl) as soon as I get a few hours... > > I'll try this on TWO IBM DTLA drives with tags enabled and see what gives.. > > I'm curious to know if tagged queueing compensates for the loss incurred by > disabling write caching. I'll know later today and will post my findings...
-Søren To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Sep 22 1:32:17 2000 Delivered-To: freebsd-fs@freebsd.org Received: from smtp05.primenet.com (smtp05.primenet.com [206.165.6.135]) by hub.freebsd.org (Postfix) with ESMTP id B16DC37B42C; Fri, 22 Sep 2000 01:32:13 -0700 (PDT) Received: (from daemon@localhost) by smtp05.primenet.com (8.9.3/8.9.3) id BAA23297; Fri, 22 Sep 2000 01:32:30 -0700 (MST) Received: from usr05.primenet.com(206.165.6.205) via SMTP by smtp05.primenet.com, id smtpdAAAmJaGDT; Fri Sep 22 01:32:24 2000 Received: (from tlambert@localhost) by usr05.primenet.com (8.8.5/8.8.5) id BAA09658; Fri, 22 Sep 2000 01:32:02 -0700 (MST) From: Terry Lambert Message-Id: <200009220832.BAA09658@usr05.primenet.com> Subject: Re: disable write caching with softupdates? To: peter@netplex.com.au (Peter Wemm) Date: Fri, 22 Sep 2000 08:32:02 +0000 (GMT) Cc: sos@freebsd.dk (Soren Schmidt), Stephen.Byan@quantum.com (Stephen Byan), mbendiks@eunet.no ('Marius Bendiksen'), fs@FreeBSD.ORG, sos@FreeBSD.ORG, freeBSD-scsi@FreeBSD.ORG In-Reply-To: <200009220757.e8M7vtG46023@netplex.com.au> from "Peter Wemm" at Sep 22, 2000 12:57:55 AM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > So, having the write cache there definitely is a win. > > It is a win only if you do not value your data. I would gladly turn it off. > How do we do this right now? (ie: completely off) > > > I'll try this on TWO IBM DTLA drives with tags enabled and see what gives.. > > I'm curious to know if tagged queueing compensates for the loss incurred by > disabling write caching.
For a multiple application system, the answer is "yes", so long as there are enough applications that the average stall time for a context switch equals or exceeds the latency induced by waiting for the write to complete (standard queueing theory 8-)). For a single user workstation, where there are generally only one or two applications running, the answer would be "no". For you to be able to see this, the appropriate test is probably a "make world" (or a similar large multielement compile) with an argument between "-j 8" and "-j 12". Note that you should probably _not_ use a memfs /tmp for this, at the same time, since what you want to stress is the impact on concurrency under a multiprogram load. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Sep 22 2:17:39 2000 Delivered-To: freebsd-fs@freebsd.org Received: from roaming.cacheboy.net (gate.interxion.com [194.153.74.13]) by hub.freebsd.org (Postfix) with ESMTP id DD6AB37B422; Fri, 22 Sep 2000 02:17:34 -0700 (PDT) Received: (from adrian@localhost) by roaming.cacheboy.net (8.11.0/8.11.0) id eBNAKcT37595; Sat, 23 Dec 2000 11:20:38 +0100 (CET) (envelope-from adrian) Date: Sat, 23 Dec 2000 11:20:38 +0100 From: Adrian Chadd To: freebsd-fs@freebsd.org Cc: freebsd-current@freebsd.org Subject: Re: Fsck wrappers, revisited Message-ID: <20001223112038.A37548@roaming.cacheboy.net> References: <20001222191317.A7529@roaming.cacheboy.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.4i In-Reply-To: <20001222191317.A7529@roaming.cacheboy.net>; from adrian@FreeBSD.ORG on Fri, Dec 22, 2000 at 07:13:17PM +0100 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Fri, Dec 22, 2000, Adrian Chadd wrote: > > > I've updated my fsck wrappers
patchset to the latest netbsd and freebsd > fsck patches. I'd appreciate some feedback on them before I run off > and commit them (with my mentor, of course.) > > For those who aren't in the know, the general idea is that a single wrapper > program spawns a FS-specific fsck process a la mount and mount_*, making > multiple-FS support a lot easier. (Think about having fsck_ext2fs, fsck_msdos > and fsck_ffs doing your FSes on bootup..) > > They can be found at http://www.freebsd.org/~adrian/fsck/ . PLEASE read the > README before you use them, as there are a few gotchas. Thanks to some feedback from bp, I found a stupid mistake in my porting. Here's the patch: --- fsck.c.orig Sat Dec 23 11:13:30 2000 +++ fsck.c Sat Dec 23 11:13:34 2000 @@ -501,7 +501,7 @@ errx(1, "partition `%s' is not of a legal vfstype", str); - if ((vfstype = dktypenames[t]) == NULL) + if ((vfstype = fstypenames[t]) == NULL) errx(1, "vfstype `%s' on partition `%s' is not supported", fstypenames[t], str); So now is a problem which I'm sure the NetBSD people came up against. The fstypenames are names like 4.2BSD, vinum, ISO9660, etc. NetBSD fixed this by creating a new list 'mountnames[]', which maps the fs type to a string. http://cvsweb.netbsd.org/bsdweb.cgi/syssrc/sys/sys/disklabel.h.diff?r1=1.60&r2=1.61 What do people think about doing this as well? It would certainly make things a little tidier, but every time a new fs comes in the magic autodetection code will need to be updated (if appropriate, of course.) Adrian -- Adrian Chadd "The main reason Santa is so jolly is because he knows where all the bad girls live." 
-- Random IRC quote To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Sep 22 2:28:59 2000 Delivered-To: freebsd-fs@freebsd.org Received: from relay.butya.kz (butya-gw.butya.kz [212.154.129.94]) by hub.freebsd.org (Postfix) with ESMTP id 070BA37B42C; Fri, 22 Sep 2000 02:28:53 -0700 (PDT) Received: by relay.butya.kz (Postfix, from userid 1000) id 2A6E128805; Fri, 22 Sep 2000 16:27:58 +0700 (ALMST) Received: from localhost (localhost [127.0.0.1]) by relay.butya.kz (Postfix) with ESMTP id 204CF28803; Fri, 22 Sep 2000 16:27:58 +0700 (ALMST) Date: Fri, 22 Sep 2000 16:27:58 +0700 (ALMST) From: Boris Popov To: Adrian Chadd Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org Subject: Re: Fsck wrappers, revisited In-Reply-To: <20001223112038.A37548@roaming.cacheboy.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=koi8-r Content-Transfer-Encoding: 8BIT Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Sat, 23 Dec 2000, Adrian Chadd wrote: > So now is a problem which I'm sure the NetBSD people came up against. > The fstypenames are names like 4.2BSD, vinum, ISO9660, etc. NetBSD fixed > this by creating a new list 'mountnames[]', which maps the fs type to > a string. 
Probably a hard link to fsck_ffs will do the job fine and makes it clear to see which fs'es are supported:
# ls -ail fsck*
6338 -r-xr-xr-x 1 root wheel 66032 22 sen 16:24 fsck
6334 -r-xr-xr-x 3 root wheel 290896 22 sen 15:41 fsck_4.2BSD
6334 -r-xr-xr-x 3 root wheel 290896 22 sen 15:41 fsck_ffs
6334 -r-xr-xr-x 3 root wheel 290896 22 sen 15:41 fsck_ufs
-- Boris Popov http://www.butya.kz/~bp/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Sep 22 2:39: 8 2000 Delivered-To: freebsd-fs@freebsd.org Received: from roaming.cacheboy.net (gate.interxion.com [194.153.74.13]) by hub.freebsd.org (Postfix) with ESMTP id 58A9137B423; Fri, 22 Sep 2000 02:39:04 -0700 (PDT) Received: (from adrian@localhost) by roaming.cacheboy.net (8.11.0/8.11.0) id eBNAfo338064; Sat, 23 Dec 2000 11:41:50 +0100 (CET) (envelope-from adrian) Date: Sat, 23 Dec 2000 11:41:50 +0100 From: Adrian Chadd To: Boris Popov Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org Subject: Re: Fsck wrappers, revisited Message-ID: <20001223114150.A38052@roaming.cacheboy.net> References: <20001223112038.A37548@roaming.cacheboy.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.4i In-Reply-To: ; from bp@butya.kz on Fri, Sep 22, 2000 at 04:27:58PM +0700 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Fri, Sep 22, 2000, Boris Popov wrote: > On Sat, 23 Dec 2000, Adrian Chadd wrote: > > > So now is a problem which I'm sure the NetBSD people came up against. > > The fstypenames are names like 4.2BSD, vinum, ISO9660, etc. NetBSD fixed > > this by creating a new list 'mountnames[]', which maps the fs type to > > a string.
> > Probably a hard link to fsck_ffs will do the job fine and makes it > clear to see which fs'es are supported: > > # ls -ail fsck* > 6338 -r-xr-xr-x 1 root wheel 66032 22 sen 16:24 fsck > 6334 -r-xr-xr-x 3 root wheel 290896 22 sen 15:41 fsck_4.2BSD > 6334 -r-xr-xr-x 3 root wheel 290896 22 sen 15:41 fsck_ffs > 6334 -r-xr-xr-x 3 root wheel 290896 22 sen 15:41 fsck_ufs The trouble is that some of the FS strings have spaces in their filenames. This might confuse a few people. Adrian -- Adrian Chadd "The main reason Santa is so jolly is because he knows where all the bad girls live." -- Random IRC quote To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Sep 22 2:50:14 2000 Delivered-To: freebsd-fs@freebsd.org Received: from roaming.cacheboy.net (gate.interxion.com [194.153.74.13]) by hub.freebsd.org (Postfix) with ESMTP id 98F4137B424; Fri, 22 Sep 2000 02:50:09 -0700 (PDT) Received: (from adrian@localhost) by roaming.cacheboy.net (8.11.0/8.11.0) id e8M9o7m38193; Fri, 22 Sep 2000 11:50:07 +0200 (CEST) (envelope-from adrian) Date: Fri, 22 Sep 2000 11:50:07 +0200 From: Adrian Chadd To: freebsd-fs@FreeBSD.ORG Cc: freebsd-current@FreeBSD.ORG Subject: Re: Fsck wrappers, revisited Message-ID: <20000922115007.A38174@roaming.cacheboy.net> References: <20001222191317.A7529@roaming.cacheboy.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.4i In-Reply-To: <20001222191317.A7529@roaming.cacheboy.net>; from adrian@FreeBSD.ORG on Fri, Dec 22, 2000 at 07:13:17PM +0100 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Fri, Dec 22, 2000, Adrian Chadd wrote: > > > I've updated my fsck wrappers patchset to the latest netbsd and freebsd > fsck patches. I'd appreciate some feedback on them before I run off > and commit them (with my mentor, of course.) 
> > For those who aren't in the know, the general idea is that a single wrapper > program spawns a FS-specific fsck process a la mount and mount_*, making > multiple-FS support a lot easier. (Think about having fsck_ext2fs, fsck_msdos > and fsck_ffs doing your FSes on bootup..) > > They can be found at http://www.freebsd.org/~adrian/fsck/ . PLEASE read the > README before you use them, as there are a few gotchas. .. and I've just redone them again, with more bp comments. I've killed fsck_ffs/preen.c and moved the only function the fsck_ffs code now uses to a new util.c . This makes fsck_ffs a tiny bit smaller, and pretty much stomps on the shared code problem. Anyone else up for testing ? Adrian -- Adrian Chadd "The main reason Santa is so jolly is because he knows where all the bad girls live." -- Random IRC quote To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Sep 22 4: 0:14 2000 Delivered-To: freebsd-fs@freebsd.org Received: from relay.butya.kz (butya-gw.butya.kz [212.154.129.94]) by hub.freebsd.org (Postfix) with ESMTP id DDD4A37B423; Fri, 22 Sep 2000 04:00:08 -0700 (PDT) Received: by relay.butya.kz (Postfix, from userid 1000) id 0626E28805; Fri, 22 Sep 2000 18:00:04 +0700 (ALMST) Received: from localhost (localhost [127.0.0.1]) by relay.butya.kz (Postfix) with ESMTP id E740828803; Fri, 22 Sep 2000 18:00:04 +0700 (ALMST) Date: Fri, 22 Sep 2000 18:00:04 +0700 (ALMST) From: Boris Popov To: Adrian Chadd Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org Subject: Re: Fsck wrappers, revisited In-Reply-To: <20001223114150.A38052@roaming.cacheboy.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Sat, 23 Dec 2000, Adrian Chadd wrote: > On Fri, Sep 22, 2000, Boris Popov wrote: > > On Sat, 23 Dec 2000, Adrian Chadd wrote: > > > > > So now is a problem which I'm sure the NetBSD 
people came up against. > > > The fstypenames are names like 4.2BSD, vinum, ISO9660, etc. NetBSD fixed > > > this by creating a new list 'mountnames[]', which maps the fs type to > > > a string. > > > > Probably a hard link to fsck_ffs will do the job fine and makes it > > clear to see which fs'es are supported: > > > > # ls -ail fsck* > > 6338 -r-xr-xr-x 1 root wheel 66032 22 sen 16:24 fsck > > 6334 -r-xr-xr-x 3 root wheel 290896 22 sen 15:41 fsck_4.2BSD > > 6334 -r-xr-xr-x 3 root wheel 290896 22 sen 15:41 fsck_ffs > > 6334 -r-xr-xr-x 3 root wheel 290896 22 sen 15:41 fsck_ufs > > The trouble is that some of the FS strings have spaces in their filenames. > This might confuse a few people. These (and probably other confusing) characters can be replaced with underscores without much harm. -- Boris Popov http://www.butya.kz/~bp/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Sep 22 4:37:13 2000 Delivered-To: freebsd-fs@freebsd.org Received: from roaming.cacheboy.net (gate.interxion.com [194.153.74.13]) by hub.freebsd.org (Postfix) with ESMTP id B192A37B422; Fri, 22 Sep 2000 04:37:08 -0700 (PDT) Received: (from adrian@localhost) by roaming.cacheboy.net (8.11.0/8.11.0) id e8MBar711458; Fri, 22 Sep 2000 13:36:53 +0200 (CEST) (envelope-from adrian) Date: Fri, 22 Sep 2000 13:36:52 +0200 From: Adrian Chadd To: Boris Popov Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org Subject: Re: Fsck wrappers, revisited Message-ID: <20000922133652.A10844@roaming.cacheboy.net> References: <20001223114150.A38052@roaming.cacheboy.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.4i In-Reply-To: ; from bp@butya.kz on Fri, Sep 22, 2000 at 06:00:04PM +0700 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Fri, Sep 22, 2000, Boris Popov wrote: > On Sat, 23 Dec 2000, Adrian Chadd wrote: > > > On Fri, Sep 22, 
2000, Boris Popov wrote: > > > On Sat, 23 Dec 2000, Adrian Chadd wrote: > > > > > > > So now is a problem which I'm sure the NetBSD people came up against. > > > > The fstypenames are names like 4.2BSD, vinum, ISO9660, etc. NetBSD fixed > > > > this by creating a new list 'mountnames[]', which maps the fs type to > > > > a string. > > > > > > Probably a hard link to fsck_ffs will do the job fine and makes it > > > clear to see which fs'es are supported: > > > > > > # ls -ail fsck* > > > 6338 -r-xr-xr-x 1 root wheel 66032 22 sen 16:24 fsck > > > 6334 -r-xr-xr-x 3 root wheel 290896 22 sen 15:41 fsck_4.2BSD > > > 6334 -r-xr-xr-x 3 root wheel 290896 22 sen 15:41 fsck_ffs > > > 6334 -r-xr-xr-x 3 root wheel 290896 22 sen 15:41 fsck_ufs > > > > The trouble is that some of the FS strings have spaces in their filenames. > > This might confuse a few people. > > These (and probably other confusing) characters can be replaced > with underscores without much harm. That shouldn't be that hard to do. What do others think? Adrian -- Adrian Chadd "The main reason Santa is so jolly is because he knows where all the bad girls live." -- Random IRC quote To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Sep 22 8:51:14 2000 Delivered-To: freebsd-fs@freebsd.org Received: from Gloria.CAM.ORG (Gloria.CAM.ORG [205.151.116.34]) by hub.freebsd.org (Postfix) with ESMTP id CB20337B422; Fri, 22 Sep 2000 08:51:10 -0700 (PDT) Received: from localhost (intmktg@localhost) by Gloria.CAM.ORG (8.9.3/8.9.3) with ESMTP id LAA32027; Fri, 22 Sep 2000 11:54:10 -0400 Date: Fri, 22 Sep 2000 11:54:10 -0400 (EDT) From: Marc Tardif To: freebsd-fs@freebsd.org, freebsd-scsi@freebsd.org Subject: ccd with other filesystems Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org At which level does ccd concatenate or mirror disks? 
Is the operation performed on slices or partitions? If slices, does it have to be set to sysid 165? If partitions, does it have to be in the FFS format? I ask because I'm not sure how the word "partition" is used in the manpage: is it supposed to mean a slice (as in DOS partition) or the partition of a slice? Also, I'm intrigued by the following passage: Note that the `raw' partitions of the disks should not be combined. The kernel will only allow component partitions of type FS_BSDFFS. Does this mean ccd will only accept FFS partitions? Could msdos partitions be concatenated or mirrored, for example? Also, why does it say "the kernel will only allow", isn't it ccd which allows? Lastly, if ccd doesn't act on raw partitions, does that apply to vinum also? To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Fri Sep 22 9: 0:28 2000 Delivered-To: freebsd-fs@freebsd.org Received: from gidora.zeta.org.au (gidora.zeta.org.au [203.26.10.25]) by hub.freebsd.org (Postfix) with SMTP id A61D837B43C for ; Fri, 22 Sep 2000 09:00:20 -0700 (PDT) Received: (qmail 16136 invoked from network); 22 Sep 2000 16:00:16 -0000 Received: from unknown (HELO bde.zeta.org.au) (203.2.228.102) by gidora.zeta.org.au with SMTP; 22 Sep 2000 16:00:16 -0000 Date: Sat, 23 Sep 2000 03:00:11 +1100 (EST) From: Bruce Evans X-Sender: bde@besplex.bde.org To: Adrian Chadd Cc: freebsd-fs@FreeBSD.ORG, freebsd-current@FreeBSD.ORG Subject: Re: Fsck wrappers, revisited In-Reply-To: <20001223112038.A37548@roaming.cacheboy.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On Sat, 23 Dec 2000, Adrian Chadd wrote: > Here's the patch: > > --- fsck.c.orig Sat Dec 23 11:13:30 2000 > +++ fsck.c Sat Dec 23 11:13:34 2000 > @@ -501,7 +501,7 @@ > errx(1, "partition `%s' is not of a legal vfstype", > str); > - if ((vfstype = dktypenames[t]) == NULL)
> + if ((vfstype = fstypenames[t]) == NULL)
> errx(1, "vfstype `%s' on partition `%s' is not supported",
> fstypenames[t], str);
>
>
> So now is a problem which I'm sure the NetBSD people came up against.
> The fstypenames are names like 4.2BSD, vinum, ISO9660, etc. NetBSD fixed
> this by creating a new list 'mountnames[]', which maps the fs type to
> a string.

fs typenames are already strings in FreeBSD (the kernel's vfc_index is an
implementation detail which should not be visible in applications).

> http://cvsweb.netbsd.org/bsdweb.cgi/syssrc/sys/sys/disklabel.h.diff?r1=1.60&r2=1.61
>
> What do people think about doing this as well? It would certainly make things
> a little tidier, but every time a new fs comes in the magic autodetection code
> will need to be updated (if appropriate, of course.)

This would be a bug.

Bruce

From owner-freebsd-fs Fri Sep 22 11:47:59 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from InterJet.elischer.org (c421509-a.pinol1.sfba.home.com [24.7.86.9])
    by hub.freebsd.org (Postfix) with ESMTP id 46AC937B42C;
    Fri, 22 Sep 2000 11:47:57 -0700 (PDT)
Received: from InterJet.elischer.org (InterJet.elischer.org [192.168.1.1])
    by InterJet.elischer.org (8.9.1a/8.9.1) with ESMTP id KAA24087;
    Thu, 21 Sep 2000 10:39:44 -0700 (PDT)
Date: Thu, 21 Sep 2000 10:39:43 -0700 (PDT)
From: Julian Elischer
To: Marius Bendiksen
Cc: Soren Schmidt, Terry Lambert, Archie Cobbs, fs@FreeBSD.ORG, sos@FreeBSD.ORG
Subject: Re: disable write caching with softupdates?
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

I know of no non-crippled hardware in this case, i.e. I have yet to see
a drive that will complete a write correctly in the case of a power
failure, after having told the system that it did it correctly.
The problem is that if two writes are re-ordered by the drive, soft
updates may fail: the second write may depend on the first having
succeeded.

On Thu, 21 Sep 2000, Marius Bendiksen wrote:

> > > The ATA drives Whistle is using, which is what I'm assuming
> > > Archie is on about, do _not_ support this facility. As far
> > > as I can tell, there were some SCSI drives manufactured by
> > > IBM at one time which could do this, and some lab drives at
> > > Quantum (also SCSI).
> > Hmm, well, let's disable this then, there is no need to complicate
> > things :)
>
> Please make this conditional, as people with non-crippled hardware might
> want to employ the write cache. A sysctl or build option would be best.
>
> Marius

From owner-freebsd-fs Fri Sep 22 12:56:22 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from roaming.cacheboy.net (roaming.cacheboy.net [203.56.168.69])
    by hub.freebsd.org (Postfix) with ESMTP id 02F5C37B424;
    Fri, 22 Sep 2000 12:56:17 -0700 (PDT)
Received: (from adrian@localhost)
    by roaming.cacheboy.net (8.11.0/8.11.0) id e8MJtu600558;
    Fri, 22 Sep 2000 21:55:56 +0200 (CEST) (envelope-from adrian)
Date: Fri, 22 Sep 2000 21:55:55 +0200
From: Adrian Chadd
To: Bruce Evans
Cc: freebsd-fs@FreeBSD.ORG, freebsd-current@FreeBSD.ORG
Subject: Re: Fsck wrappers, revisited
Message-ID: <20000922215555.A449@roaming.cacheboy.net>
References: <20001223112038.A37548@roaming.cacheboy.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.4i
In-Reply-To: ; from bde@zeta.org.au on Sat, Sep 23, 2000 at 03:00:11AM +1100
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Sat, Sep 23, 2000, Bruce Evans wrote:

> On Sat, 23 Dec 2000, Adrian Chadd wrote:
>
> >
> > Here's the patch:
> >
> > --- fsck.c.orig Sat Dec 23 11:13:30 2000
> > +++ fsck.c Sat Dec 23 11:13:34 2000
> > @@ -501,7 +501,7 @@
> > errx(1, "partition `%s' is not of a legal vfstype",
> > str);
> > - if ((vfstype = dktypenames[t]) == NULL)
> > + if ((vfstype = fstypenames[t]) == NULL)
> > errx(1, "vfstype `%s' on partition `%s' is not supported",
> > fstypenames[t], str);
> >
> >
> > So now is a problem which I'm sure the NetBSD people came up against.
> > The fstypenames are names like 4.2BSD, vinum, ISO9660, etc. NetBSD fixed
> > this by creating a new list 'mountnames[]', which maps the fs type to
> > a string.
>
> fs typenames are already strings in FreeBSD (the kernel's vfc_index is an
> implementation detail which should not be visible in applications).
>
> > http://cvsweb.netbsd.org/bsdweb.cgi/syssrc/sys/sys/disklabel.h.diff?r1=1.60&r2=1.61
> >
> > What do people think about doing this as well? It would certainly make things
> > a little tidier, but every time a new fs comes in the magic autodetection code
> > will need to be updated (if appropriate, of course.)
>
> This would be a bug.

So what would your suggestion here be? This is only used if a -t isn't
given or you don't have an entry in /etc/fstab, so I personally don't
think it's a big issue.

Adrian

--
Adrian Chadd
"The main reason Santa is so jolly is because he knows
where all the bad girls live."
-- Random IRC quote

From owner-freebsd-fs Fri Sep 22 21: 8:27 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80])
    by hub.freebsd.org (Postfix) with ESMTP id D3C1C37B423;
    Fri, 22 Sep 2000 21:08:16 -0700 (PDT)
Received: (from grog@localhost)
    by wantadilla.lemis.com (8.11.0/8.9.3) id e8N486S89558;
    Sat, 23 Sep 2000 13:38:06 +0930 (CST) (envelope-from grog)
Date: Sat, 23 Sep 2000 13:38:06 +0930
From: Greg Lehey
To: Marc Tardif
Cc: freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG
Subject: Re: ccd with other filesystems
Message-ID: <20000923133806.B78943@wantadilla.lemis.com>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0i
In-Reply-To: ; from intmktg@CAM.ORG on Fri, Sep 22, 2000 at 11:54:10AM -0400
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-418-838-708
WWW-Home-Page: http://www.lemis.com/~grog
X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF 13 24 52 F8 6D A4 95 EF
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Friday, 22 September 2000 at 11:54:10 -0400, Marc Tardif wrote:

> At which level does ccd concatenate or mirror disks? Is the
> operation performed on slices or partitions?

Partitions.

> If slices, does it have to be set to sysid 165? If partitions, does
> it have to be in the FFS format?

ccd requires a ufs partition (4.2BSD in the disk label). This is
stupid, since what it puts there is not a ufs file system, and it
encourages you to shoot yourself in the foot and overwrite a ufs file
system.

> I ask because I'm not sure how the word "partition" is used
> in the manpage, is it supposed to mean a slice (as in DOS
> partition) or the partition of a slice?
> Also, I'm intrigued by the following passage:
>   Note that the `raw' partitions of the disks should not be
>   combined.

I really don't understand what this is supposed to mean.

>   The kernel will only allow component partitions of type FS_BSDFFS.

This is basically saying the same thing as I said above: you need a
partition of type 4.2BSD.

> Does this mean ccd will only accept FFS partitions?

Yes.

> Could msdos partitions be concatenated or mirrored, for example?

Don't confuse the partition type that ccd wants with the data you put
on it. You can put MS-DOS file systems on ccd, though I can't imagine
why you'd want to.

> Also, why does it say "the kernel will only allow", isn't it ccd
> which allows?

Yes. But it's in the kernel.

> Lastly, if ccd doesn't act on raw partitions, does that apply to
> vinum also?

Vinum requires partitions of type (wait for it) Vinum. We now only
have raw partitions, so both work with them. But a raw partition isn't
the same thing as a non-4.2BSD partition.

On the whole, I'd recommend using Vinum, which is more actively
maintained and gives you more flexibility. But I suspect you still
have a question out there which I haven't been able to guess.
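
To make the difference between the partition type and the data stored
in the partition concrete, here is a minimal sketch in C. It is not the
ccd source. The SKETCH_FS_* constants are assumed to mirror the FS_*
type tags in <sys/disklabel.h>, and struct part, ccd_accepts() and the
device names used with them are hypothetical, made up for this
illustration only.

```c
/*
 * Assumed to mirror the FS_* partition type tags in <sys/disklabel.h>.
 * They are redefined here under a SKETCH_ prefix so the example does
 * not depend on any FreeBSD header.
 */
enum {
    SKETCH_FS_UNUSED = 0,
    SKETCH_FS_BSDFFS = 7,   /* shown as "4.2BSD" in the disk label */
    SKETCH_FS_MSDOS  = 8,
    SKETCH_FS_VINUM  = 14
};

/* Hypothetical stand-in for one partition entry of a disk label. */
struct part {
    const char *name;       /* device name, for messages only */
    int fstype;             /* type tag recorded in the label */
};

/*
 * Returns 1 if the label tag is the one ccd insists on, 0 otherwise.
 * Only the tag is examined; the data actually stored in the partition
 * is never looked at, which is why the tag says nothing about what
 * file system ends up on top of the ccd.
 */
int
ccd_accepts(const struct part *p)
{
    return p->fstype == SKETCH_FS_BSDFFS;
}
```

With these definitions a partition tagged 4.2BSD passes the check
whatever it holds, while one tagged MS-DOS or Vinum is refused even if
its contents would have been fine.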

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers

From owner-freebsd-fs Sat Sep 23 2:45: 5 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from roaming.cacheboy.net (roaming.cacheboy.net [203.56.168.69])
    by hub.freebsd.org (Postfix) with ESMTP id 8976937B424;
    Sat, 23 Sep 2000 02:44:55 -0700 (PDT)
Received: (from adrian@localhost)
    by roaming.cacheboy.net (8.11.0/8.11.0) id e8N9iYW04433;
    Sat, 23 Sep 2000 11:44:34 +0200 (CEST) (envelope-from adrian)
Date: Sat, 23 Sep 2000 11:44:34 +0200
From: Adrian Chadd
To: Bruce Evans
Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org
Subject: Re: Fsck wrappers, revisited
Message-ID: <20000923114434.A4419@roaming.cacheboy.net>
References: <20001223112038.A37548@roaming.cacheboy.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.4i
In-Reply-To: ; from bde@zeta.org.au on Sat, Sep 23, 2000 at 03:00:11AM +1100
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Sat, Sep 23, 2000, Bruce Evans wrote:

> On Sat, 23 Dec 2000, Adrian Chadd wrote:
>
> > Here's the patch:
> >
> > --- fsck.c.orig Sat Dec 23 11:13:30 2000
> > +++ fsck.c Sat Dec 23 11:13:34 2000
> > @@ -501,7 +501,7 @@
> > errx(1, "partition `%s' is not of a legal vfstype",
> > str);
> > - if ((vfstype = dktypenames[t]) == NULL)
> > + if ((vfstype = fstypenames[t]) == NULL)
> > errx(1, "vfstype `%s' on partition `%s' is not supported",
> > fstypenames[t], str);
> >
> >
> > So now is a problem which I'm sure the NetBSD people came up against.
> > The fstypenames are names like 4.2BSD, vinum, ISO9660, etc. NetBSD fixed
> > this by creating a new list 'mountnames[]', which maps the fs type to
> > a string.
>
> fs typenames are already strings in FreeBSD (the kernel's vfc_index is an
> implementation detail which should not be visible in applications).

Oh, wait. I understand what you're talking about now. There isn't any
mapping from partition type (p_fstype) to fs typename string.

> > http://cvsweb.netbsd.org/bsdweb.cgi/syssrc/sys/sys/disklabel.h.diff?r1=1.60&r2=1.61
> >
> > What do people think about doing this as well? It would certainly make things
> > a little tidier, but every time a new fs comes in the magic autodetection code
> > will need to be updated (if appropriate, of course.)
>
> This would be a bug.

Well, if you have any suggestions, I'm all for it. :-)

Adrian

--
Adrian Chadd
"The main reason Santa is so jolly is because he knows
where all the bad girls live."
-- Random IRC quote

From owner-freebsd-fs Sat Sep 23 6:58:48 2000
Delivered-To: freebsd-fs@freebsd.org
Received: from cs4.cs.ait.ac.th (cs4.cs.ait.ac.th [192.41.170.15])
    by hub.freebsd.org (Postfix) with ESMTP id CD27A37B424;
    Sat, 23 Sep 2000 06:58:42 -0700 (PDT)
Received: from bazooka.cs.ait.ac.th (on@bazooka.cs.ait.ac.th [192.41.170.2])
    by cs4.cs.ait.ac.th (8.9.3/8.9.3) with ESMTP id IAA16331;
    Sat, 23 Sep 2000 08:25:22 +0700 (GMT+0700)
From: Olivier Nicole
Received: (from on@localhost)
    by bazooka.cs.ait.ac.th (8.8.5/8.8.5) id IAA05688;
    Sat, 23 Sep 2000 08:25:21 +0700 (ICT)
Date: Sat, 23 Sep 2000 08:25:21 +0700 (ICT)
Message-Id: <200009230125.IAA05688@bazooka.cs.ait.ac.th>
To: intmktg@CAM.ORG
Cc: freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG
In-reply-to: (message from Marc Tardif on Fri, 22 Sep 2000 11:54:10 -0400 (EDT))
Subject: Re: ccd with other filesystems
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

As far as I know (FreeBSD 2.2.7, yes!), at partition level (Unix
partition).

Well, I used one partition per slice anyway.

Olivier
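
On the "Fsck wrappers, revisited" thread above: the piece found to be
missing there is a mapping from the label's partition type (p_fstype)
to the name of a file system that can be checked or mounted. A rough
sketch of such a table in C follows. It is neither the NetBSD
mountnames[] list nor FreeBSD code. The SKETCH_FS_* values are assumed
to match the FS_* tags in <sys/disklabel.h>, the file system names in
the table are assumptions, and sketch_mountnames and
fstype_to_vfsname() are hypothetical names.

```c
#include <stddef.h>

/* Assumed to mirror the FS_* partition type tags in <sys/disklabel.h>. */
enum {
    SKETCH_FS_BSDFFS  = 7,
    SKETCH_FS_MSDOS   = 8,
    SKETCH_FS_ISO9660 = 12,
    SKETCH_FS_VINUM   = 14
};

/* One row of the hypothetical mapping table. */
struct fstype_map {
    int fstype;             /* partition type tag from the label */
    const char *vfsname;    /* assumed file system name for that tag */
};

/*
 * Only types that hold a file system get a row. A type such as Vinum
 * marks a container, not something fsck can check directly, so it is
 * left out on purpose.
 */
static const struct fstype_map sketch_mountnames[] = {
    { SKETCH_FS_BSDFFS,  "ufs" },
    { SKETCH_FS_MSDOS,   "msdos" },
    { SKETCH_FS_ISO9660, "cd9660" },
};

/* Returns the file system name for a type tag, or NULL if unmapped. */
const char *
fstype_to_vfsname(int fstype)
{
    size_t i;

    for (i = 0; i < sizeof(sketch_mountnames) / sizeof(sketch_mountnames[0]); i++)
        if (sketch_mountnames[i].fstype == fstype)
            return sketch_mountnames[i].vfsname;
    return NULL;
}
```

A caller would fall back to this lookup only when no -t option was
given and /etc/fstab has no entry for the device, and would report an
unsupported type when NULL comes back. The table still has to grow
with every new file system, which is the maintenance cost objected to
in the thread.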