From owner-freebsd-fs Sun Oct 18 04:00:00 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id EAA05445 for freebsd-fs-outgoing; Sun, 18 Oct 1998 04:00:00 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG) Received: from uni4nn.gn.iaf.nl (osmium.gn.iaf.nl [193.67.144.12]) by hub.freebsd.org (8.8.8/8.8.8) with SMTP id DAA05429; Sun, 18 Oct 1998 03:59:57 -0700 (PDT) (envelope-from wilko@yedi.iaf.nl) Received: by uni4nn.gn.iaf.nl with UUCP id AA00540 (5.67b/IDA-1.5); Sun, 18 Oct 1998 12:47:37 +0200 Received: (from wilko@localhost) by yedi.iaf.nl (8.8.8/8.6.12) id XAA08150; Sat, 17 Oct 1998 23:31:18 +0200 (CEST) From: Wilko Bulte Message-Id: <199810172131.XAA08150@yedi.iaf.nl> Subject: Re: filesystem safety and SCSI disk write caching In-Reply-To: <19981017191758.A13174@gvr.org> from Guido van Rooij at "Oct 17, 98 07:17:58 pm" To: guido@gvr.org (Guido van Rooij) Date: Sat, 17 Oct 1998 23:31:18 +0200 (CEST) Cc: tlambert@primenet.com, dkelly@hiwaay.net, freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG X-Organisation: Private FreeBSD site - Arnhem, The Netherlands X-Pgp-Info: PGP public key at 'finger wilko@freefall.freebsd.org' X-Mailer: ELM [version 2.4ME+ PL38 (25)] Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org As Guido van Rooij wrote... > On Fri, Oct 16, 1998 at 07:58:17PM +0000, Terry Lambert wrote: > > > > The errors seen are a result of uncommitted data in the drive cache, > > not power spikes and gremlins. The interaction is well understood, > > and on firm footing unrelated to Stephan King novels. > > I always thought a drive will always be able to flush its write cache > to disk, even when power fails. You'd say so, but it is not always the case. I have discussed with the disk experts in our company and they have told me enough to always keep WB caching disabled. 
Wilko _ ______________________________________________________________________ | / o / / _ Bulte email: wilko@yedi.iaf.nl |/|/ / / /( (_) Arnhem, The Netherlands WWW : http://www.tcja.nl ______________________________________________ Powered by FreeBSD __________ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sun Oct 18 12:25:32 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id MAA26817 for freebsd-fs-outgoing; Sun, 18 Oct 1998 12:25:32 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG) Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id MAA26811; Sun, 18 Oct 1998 12:25:29 -0700 (PDT) (envelope-from tlambert@usr06.primenet.com) Received: (from daemon@localhost) by smtp03.primenet.com (8.8.8/8.8.8) id MAA24141; Sun, 18 Oct 1998 12:25:05 -0700 (MST) Received: from usr06.primenet.com(206.165.6.206) via SMTP by smtp03.primenet.com, id smtpd024134; Sun Oct 18 12:24:59 1998 Received: (from tlambert@localhost) by usr06.primenet.com (8.8.5/8.8.5) id MAA19638; Sun, 18 Oct 1998 12:24:58 -0700 (MST) From: Terry Lambert Message-Id: <199810181924.MAA19638@usr06.primenet.com> Subject: Re: filesystem safety and SCSI disk write caching To: gibbs@plutotech.com (Justin T. Gibbs) Date: Sun, 18 Oct 1998 19:24:58 +0000 (GMT) Cc: tlambert@primenet.com, dkelly@hiwaay.net, freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG In-Reply-To: <199810162349.RAA11679@pluto.plutotech.com> from "Justin T. Gibbs" at Oct 16, 98 05:42:34 pm X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > >The errors seen are a result of uncommitted data in the drive cache, > >not power spikes and gremlins. 
The interaction is well understood, > >and on firm footing unrelated to Stephan King novels. > > And why do you think the drive didn't bother to commit the data even > though power was constantly supplied to the drive and only a few, > recent transactions were lost? Most likely because hitting the > reset switch caused a power glitch that reverted the drive to its > power on state. I think it's because the PCI bus POSTed the controller, resulting in a SCSI reset, which lost the uncommitted data. It's a lot easier to believe that than to believe the other. I suppose we should test by making a loadable system call that directly calls the reset code, so as to put the "reset causes a reset reliably, but even though no hardware other than the disk write cache is affected, we believe it's because no one debounced the switch" theory to rest. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sun Oct 18 12:26:51 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id MAA27036 for freebsd-fs-outgoing; Sun, 18 Oct 1998 12:26:51 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG) Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id MAA27027; Sun, 18 Oct 1998 12:26:41 -0700 (PDT) (envelope-from tlambert@usr06.primenet.com) Received: (from daemon@localhost) by smtp03.primenet.com (8.8.8/8.8.8) id MAA24326; Sun, 18 Oct 1998 12:26:16 -0700 (MST) Received: from usr06.primenet.com(206.165.6.206) via SMTP by smtp03.primenet.com, id smtpd024301; Sun Oct 18 12:26:07 1998 Received: (from tlambert@localhost) by usr06.primenet.com (8.8.5/8.8.5) id MAA19706; Sun, 18 Oct 1998 12:26:05 -0700 (MST) From: Terry Lambert Message-Id: <199810181926.MAA19706@usr06.primenet.com> Subject: Re: 
filesystem safety and SCSI disk write caching To: guido@gvr.org (Guido van Rooij) Date: Sun, 18 Oct 1998 19:26:05 +0000 (GMT) Cc: tlambert@primenet.com, dkelly@hiwaay.net, freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG In-Reply-To: <19981017191758.A13174@gvr.org> from "Guido van Rooij" at Oct 17, 98 07:17:58 pm X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > > The errors seen are a result of uncommitted data in the drive cache, > > not power spikes and gremlins. The interaction is well understood, > > and on firm footing unrelated to Stephan King novels. > > I always thought a drive will always be able to flush its write cache > to disk, even when power fails. The disks which do this are no longer being manufactured by Quantum. I know, because we tried to buy them, in volume. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. 

From owner-freebsd-fs Sun Oct 18 12:29:30 1998
Return-Path:
Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id MAA27376 for freebsd-fs-outgoing; Sun, 18 Oct 1998 12:29:30 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG)
Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id MAA27365; Sun, 18 Oct 1998 12:29:28 -0700 (PDT) (envelope-from tlambert@usr06.primenet.com)
Received: (from daemon@localhost) by smtp03.primenet.com (8.8.8/8.8.8) id MAA24841; Sun, 18 Oct 1998 12:29:06 -0700 (MST)
Received: from usr06.primenet.com(206.165.6.206) via SMTP by smtp03.primenet.com, id smtpd024822; Sun Oct 18 12:28:59 1998
Received: (from tlambert@localhost) by usr06.primenet.com (8.8.5/8.8.5) id MAA19832; Sun, 18 Oct 1998 12:28:58 -0700 (MST)
From: Terry Lambert
Message-Id: <199810181928.MAA19832@usr06.primenet.com>
Subject: Re: filesystem safety and SCSI disk write caching
To: julian@whistle.com (Julian Elischer)
Date: Sun, 18 Oct 1998 19:28:58 +0000 (GMT)
Cc: guido@gvr.org, tlambert@primenet.com, dkelly@hiwaay.net, freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG
In-Reply-To: from "Julian Elischer" at Oct 17, 98 06:59:41 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> > I always thought a drive will always be able to flush its write cache
> > to disk, even when power fails.
>
> no,
> That's a myth.

Actually, a number of drives that used to be manufactured did this.
I don't think anyone is manufacturing them any more, but I know that
at least one 7200 RPM IBM SCSI drive did this, and I have the spec
sheets (somewhere) for a Quantum that would do this, so long as the
unwritten data was only a single track.

Perhaps they realized that the most likely event after a power
fluctuation was a bus reset, and figured "why bother?".

Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

From owner-freebsd-fs Sun Oct 18 16:08:55 1998
Return-Path:
Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id QAA18015 for freebsd-fs-outgoing; Sun, 18 Oct 1998 16:08:55 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG)
Received: from pluto.plutotech.com (mail.plutotech.com [206.168.67.137]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id QAA17999; Sun, 18 Oct 1998 16:08:53 -0700 (PDT) (envelope-from gibbs@plutotech.com)
Received: from narnia.plutotech.com (narnia.plutotech.com [206.168.67.130]) by pluto.plutotech.com (8.8.7/8.8.5) with ESMTP id RAA21386; Sun, 18 Oct 1998 17:08:21 -0600 (MDT)
Message-Id: <199810182308.RAA21386@pluto.plutotech.com>
X-Mailer: exmh version 2.0.2 2/24/98
To: Terry Lambert
cc: julian@whistle.com (Julian Elischer), guido@gvr.org, dkelly@hiwaay.net, freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG
Subject: Re: filesystem safety and SCSI disk write caching
In-reply-to: Your message of "Sun, 18 Oct 1998 19:28:58 -0000." <199810181928.MAA19832@usr06.primenet.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Sun, 18 Oct 1998 17:01:30 -0600
From: "Justin T. Gibbs"
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

>Perhaps they realized that the most likely event after a power
>fluctuation was a bus reset, and figured "why bother?".

And why is this relevant? The only reasons allowed for cache contents
never making it to the disk are power loss and hardware failure. A
bus reset (assuming the hard reset alternative is in effect) only clears
any transactions that have not been reported as completed to the host.
Perhaps you should add the SCSI II and SCSI II specs to your list of things to read. -- Justin To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Sun Oct 18 16:17:11 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id QAA18681 for freebsd-fs-outgoing; Sun, 18 Oct 1998 16:17:11 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG) Received: from pluto.plutotech.com (mail.plutotech.com [206.168.67.137]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id QAA18659; Sun, 18 Oct 1998 16:17:08 -0700 (PDT) (envelope-from gibbs@plutotech.com) Received: from narnia.plutotech.com (narnia.plutotech.com [206.168.67.130]) by pluto.plutotech.com (8.8.7/8.8.5) with ESMTP id RAA21632; Sun, 18 Oct 1998 17:16:43 -0600 (MDT) Message-Id: <199810182316.RAA21632@pluto.plutotech.com> X-Mailer: exmh version 2.0.2 2/24/98 To: Terry Lambert cc: gibbs@plutotech.com (Justin T. Gibbs), dkelly@hiwaay.net, freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG Subject: Re: filesystem safety and SCSI disk write caching In-reply-to: Your message of "Sun, 18 Oct 1998 19:24:58 -0000." <199810181924.MAA19638@usr06.primenet.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Sun, 18 Oct 1998 17:09:52 -0600 From: "Justin T. Gibbs" Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org >> And why do you think the drive didn't bother to commit the data even >> though power was constantly supplied to the drive and only a few, >> recent transactions were lost? Most likely because hitting the >> reset switch caused a power glitch that reverted the drive to its >> power on state. > >I think it's because the PCI bus POSTed the controller, resulting in >a SCSI reset, which lost the uncommitted data. On a Hawk? I've never seen one respond incorrectly to a bus reset condition. 
We're also talking about several seconds of delay before the reset occurs (more than the few ms wait before the hawk will flush its cache) since the aic7xxx chips will not assert the reset line until the BIOS has been installed (i.e. hitting the chip reset or POSTing the chip will not cause a spurious bus reset). >It's a lot easier to believe that than to believe the other. Not really. Hawks have had a very good firmware record and the only way that this could happen in your scenario would be because of a firmware bug. >I suppose we should test by making a loadable system call that directly >calls the reset code, so as to put the "reset causes a reset reliably, >but even though no hardware other than the disk write cache is >affected, we believe it's because no one debounced the switch" theory >to rest. Who needs a system call? The user can easily cause a bus reset to occur by opening the XPT device and sending a ccb with the XPT_RESET_BUS function code in it. The alternative is to put the drive on a separate power supply. 
-- Justin To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Oct 19 01:55:40 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id BAA07872 for freebsd-fs-outgoing; Mon, 19 Oct 1998 01:55:40 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG) Received: from baerenklau.de.freebsd.org (baerenklau.de.freebsd.org [195.185.195.14]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id BAA07855; Mon, 19 Oct 1998 01:55:36 -0700 (PDT) (envelope-from w@panke.de.freebsd.org) Received: (from uucp@localhost) by baerenklau.de.freebsd.org (8.8.8/8.8.8) with UUCP id KAA23903; Mon, 19 Oct 1998 10:55:07 +0200 (CEST) (envelope-from w@panke.de.freebsd.org) Received: (from w@localhost) by campa.panke.de.freebsd.org (8.8.8/8.8.8) id PAA00390; Sun, 18 Oct 1998 15:28:36 +0200 (MET DST) (envelope-from w) Message-ID: <19981018152834.A378@panke.de.freebsd.org> Date: Sun, 18 Oct 1998 15:28:34 +0200 From: Wolfram Schneider To: Andre Oppermann , Gary Palmer Cc: freebsd-fs@FreeBSD.ORG Subject: Re: filesystem safety and SCSI disk write caching References: <22657.907553262@gjp.erols.com> <3618DA15.46329AA1@pipeline.ch> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.93.1i In-Reply-To: <3618DA15.46329AA1@pipeline.ch>; from Andre Oppermann on Mon, Oct 05, 1998 at 04:39:17PM +0200 Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org On 1998-10-05 16:39:17 +0200, Andre Oppermann wrote: > > > > I can post (once again) the results of a Novell study on server usage > > > > patterns. The 30,000 foot view for a typical server breaks down to: > > > > > > > > 75% reads > > > > 15% writes > > > > 8% directory search operations > > > > 2% other > > > > I think that is very dependant on the server type. PC NetWare fileservers > > probably have very different access patterns to (say) a web server or a mail > > server. Let alone a news server. 
> > Is there a way to gather such statistics on FreeBSD? nfsstat(1) Wolfram > I'd like to run it on all my boxes (and others) to get representative > figures. After that we can discuss optimizations. > > -- > Andre > > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-fs" in the body of the message To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Oct 19 10:48:19 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id KAA28434 for freebsd-fs-outgoing; Mon, 19 Oct 1998 10:48:19 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG) Received: from smtp01.primenet.com (smtp01.primenet.com [206.165.6.131]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id KAA28421; Mon, 19 Oct 1998 10:48:12 -0700 (PDT) (envelope-from tlambert@usr02.primenet.com) Received: (from daemon@localhost) by smtp01.primenet.com (8.8.8/8.8.8) id KAA21084; Mon, 19 Oct 1998 10:47:47 -0700 (MST) Received: from usr02.primenet.com(206.165.6.202) via SMTP by smtp01.primenet.com, id smtpd021057; Mon Oct 19 10:47:40 1998 Received: (from tlambert@localhost) by usr02.primenet.com (8.8.5/8.8.5) id KAA28054; Mon, 19 Oct 1998 10:47:39 -0700 (MST) From: Terry Lambert Message-Id: <199810191747.KAA28054@usr02.primenet.com> Subject: Re: filesystem safety and SCSI disk write caching To: gibbs@plutotech.com (Justin T. Gibbs) Date: Mon, 19 Oct 1998 17:47:39 +0000 (GMT) Cc: tlambert@primenet.com, julian@whistle.com, guido@gvr.org, dkelly@hiwaay.net, freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG In-Reply-To: <199810182308.RAA21386@pluto.plutotech.com> from "Justin T. 
Gibbs" at Oct 18, 98 05:01:30 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> >Perhaps they realized that the most likely event after a power
> >fluctuation was a bus reset, and figured "why bother?".
>
> And why is this relevant? The only reasons allowed for cache contents
> never making it to the disk are power loss and hardware failure. A
> bus reset (assuming the hard reset alternative is in effect) only clears
> any transactions that have not been reported as completed to the host.
>
> Perhaps you should add the SCSI II and SCSI II specs to your list of things
> to read.

Feel free to engage in ad hominem attacks, *after* you explain why
Don Lewis is seeing the empirical behaviour he is seeing, in
contradiction to your claims of what's possible and not.

Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Oct 19 10:51:29 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id KAA28959 for freebsd-fs-outgoing; Mon, 19 Oct 1998 10:51:29 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG) Received: from smtp01.primenet.com (smtp01.primenet.com [206.165.6.131]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id KAA28952; Mon, 19 Oct 1998 10:51:27 -0700 (PDT) (envelope-from tlambert@usr02.primenet.com) Received: (from daemon@localhost) by smtp01.primenet.com (8.8.8/8.8.8) id KAA22386; Mon, 19 Oct 1998 10:51:02 -0700 (MST) Received: from usr02.primenet.com(206.165.6.202) via SMTP by smtp01.primenet.com, id smtpd022349; Mon Oct 19 10:50:55 1998 Received: (from tlambert@localhost) by usr02.primenet.com (8.8.5/8.8.5) id KAA28209; Mon, 19 Oct 1998 10:50:55 -0700 (MST) From: Terry Lambert Message-Id: <199810191750.KAA28209@usr02.primenet.com> Subject: Re: filesystem safety and SCSI disk write caching To: gibbs@plutotech.com (Justin T. Gibbs) Date: Mon, 19 Oct 1998 17:50:55 +0000 (GMT) Cc: tlambert@primenet.com, gibbs@plutotech.com, dkelly@hiwaay.net, freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG In-Reply-To: <199810182316.RAA21632@pluto.plutotech.com> from "Justin T. Gibbs" at Oct 18, 98 05:09:52 pm X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > >I suppose we should test by making a loadable system call that directly > >calls the reset code, so as to put the "reset causes a reset reliably, > >but even though no hardware other than the disk write cache is > >affected, we believe it's because no one debounced the switch" theory > >to rest. > > Who needs a system call? 
The user can easily cause a bus reset to occur > by opening the XPT device and sending a ccb with the XPT_RESET_BUS > function code in it. The alternative is to put the drive on a separate > power supply. You were claiming it was the machine reset. I was suggesting a software method of machine reset to take the undebounced reset switch out of the equation. I'm *not* claiming it *is* the SCSI reset, I'm merely claiming that I don't believe that an undebounced reset switch is any more likely than a SCSI reset, and that there's a method we can use to verify or impugn the undebounced reset switch theory. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Oct 19 11:11:40 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id LAA01866 for freebsd-fs-outgoing; Mon, 19 Oct 1998 11:11:40 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG) Received: from pluto.plutotech.com (mail.plutotech.com [206.168.67.137]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id LAA01845; Mon, 19 Oct 1998 11:11:33 -0700 (PDT) (envelope-from gibbs@plutotech.com) Received: from narnia.plutotech.com (narnia.plutotech.com [206.168.67.130]) by pluto.plutotech.com (8.8.7/8.8.5) with ESMTP id MAA06527; Mon, 19 Oct 1998 12:10:50 -0600 (MDT) Message-Id: <199810191810.MAA06527@pluto.plutotech.com> To: Terry Lambert cc: julian@whistle.com, guido@gvr.org, dkelly@hiwaay.net, freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG Subject: Re: filesystem safety and SCSI disk write caching In-reply-to: Your message of "Mon, 19 Oct 1998 17:47:39 -0000." <199810191747.KAA28054@usr02.primenet.com> Date: Mon, 19 Oct 1998 12:04:00 -0600 From: "Justin T. 
Gibbs"
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

>> >Perhaps they realized that the most likely event after a power
>> >fluctuation was a bus reset, and figured "why bother?".
>>
>> And why is this relevant? The only reasons allowed for cache contents
>> never making it to the disk are power loss and hardware failure. A
>> bus reset (assuming the hard reset alternative is in effect) only clears
>> any transactions that have not been reported as completed to the host.
>>
>> Perhaps you should add the SCSI II and SCSI II specs to your list of things
>> to read.
>
>Feel free to engage in ad hominem attacks...

If it was an attack, it was self-inflicted. You should know better
than to make statements on topics you do not fully comprehend. It
is blatantly obvious to anyone who has read the spec that you either
have not read the spec or did not comprehend it. It is only fair
to ensure that the less informed people on this list know that you
are anything but an expert on SCSI and they should take your comments
as uninformed supposition at best.

If you don't want to be 'attacked' stop throwing FUD around on our lists.

>, *after* you explain why
>Don Lewis is seeing the empirical behaviour he is seeing, in
>contradiction to your claims of what's possible and not.

I've already given my opinion on this. I believe the Hawk is seeing
a power glitch or temporary power loss when the reset switch is hit and
so the contents of the cache are lost. I have never said that the
behavior that Don Lewis is seeing is 'not possible', only that, for
the drive in question, the reset causing cache corruption is not likely.
Pluto has validated the Hawk for use in our RAID 3 systems, and part of
this validation includes spurious bus resets. This behavior was never
encountered in our tests.
-- Justin To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message From owner-freebsd-fs Mon Oct 19 12:12:09 1998 Return-Path: Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id MAA08209 for freebsd-fs-outgoing; Mon, 19 Oct 1998 12:12:09 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG) Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id MAA08204; Mon, 19 Oct 1998 12:12:07 -0700 (PDT) (envelope-from tlambert@usr02.primenet.com) Received: (from daemon@localhost) by smtp02.primenet.com (8.8.8/8.8.8) id MAA06015; Mon, 19 Oct 1998 12:11:43 -0700 (MST) Received: from usr02.primenet.com(206.165.6.202) via SMTP by smtp02.primenet.com, id smtpd005996; Mon Oct 19 12:11:40 1998 Received: (from tlambert@localhost) by usr02.primenet.com (8.8.5/8.8.5) id MAA02817; Mon, 19 Oct 1998 12:11:26 -0700 (MST) From: Terry Lambert Message-Id: <199810191911.MAA02817@usr02.primenet.com> Subject: Re: filesystem safety and SCSI disk write caching To: gibbs@plutotech.com (Justin T. Gibbs) Date: Mon, 19 Oct 1998 19:11:26 +0000 (GMT) Cc: tlambert@primenet.com, julian@whistle.com, guido@gvr.org, dkelly@hiwaay.net, freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG In-Reply-To: <199810191810.MAA06527@pluto.plutotech.com> from "Justin T. Gibbs" at Oct 19, 98 12:04:00 pm X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-freebsd-fs@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.org > >> Perhaps you should add the SCSI II and SCSI II specs to your list of things > >> to read. > > > >Feel free to engage in Ad Hominim attacks... > > If it was an attack, it was self-inflicted. You should know better > than to make statements on topics you do not fully comprehend. 
> It is blatantly obvious to anyone who has read the spec that you either
> have not read the spec or did not comprehend it. It is only fair
> to ensure that the less informed people on this list know that you
> are anything but an expert on SCSI and they should take your comments
> as uninformed supposition at best.
>
> If you don't want to be 'attacked' stop throwing FUD around on our lists.

I am not presuming to be an expert on SCSI. I *am* presuming to
tell you that Don's experiences, and my own, contradict your
interpretation of the spec.

This is not to say your interpretation is wrong, but it could easily
be the case that the drive or controller is not 100% compliant.

Jumping down my throat about interpretation of the spec. because
the empirically observed behaviour contradicts your knowledge
of the spec. (I do not question that your knowledge of the spec.
far exceeds mine) resolves nothing.

> >, *after* you explain why
> >Don Lewis is seeing the empirical behaviour he is seeing, in
> >contradiction to your claims of what's possible and not.
>
> I've already given my opinion on this. I believe the Hawk is seeing
> a power glitch or temporary power loss when the reset switch is hit and
> so the contents of the cache are lost. I have never said that the
> behavior that Don Lewis is seeing is 'not possible', only that, for
> the drive in question, the reset causing cache corruption is not likely.
> Pluto has validated the Hawk for use in our RAID 3 systems, and part of
> this validation includes spurious bus resets. This behavior was never
> encountered in our tests.

I personally still think it has something to do with what happens
when the controller POSTs.

I would like to see the following from Don before we simply accept
a "magic power glitch":

1)	Reset via software instead of via the reset switch
2)	Use of a separate power supply for the drive during reset

This will decisively localize it to one side or the other of the bus.
If the problem still occurs after 1 but doesn't after 2, then it's
time to put a scope on the supply line to the drive.

As a datapoint, I have an NCR controller, and the problem occurs
on my external SyJet 1.5G drive, which, by definition, has its own
power supply, which I would be hard pressed to believe was affected
by my hitting the front panel reset button on my separately supplied
computer.

Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

From owner-freebsd-fs Mon Oct 19 12:25:50 1998
Return-Path:
Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id MAA09722 for freebsd-fs-outgoing; Mon, 19 Oct 1998 12:25:50 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG)
Received: from pluto.plutotech.com (mail.plutotech.com [206.168.67.137]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id MAA09703; Mon, 19 Oct 1998 12:25:47 -0700 (PDT) (envelope-from gibbs@plutotech.com)
Received: from narnia.plutotech.com (narnia.plutotech.com [206.168.67.130]) by pluto.plutotech.com (8.8.7/8.8.5) with ESMTP id NAA10003; Mon, 19 Oct 1998 13:24:58 -0600 (MDT)
Message-Id: <199810191924.NAA10003@pluto.plutotech.com>
To: Terry Lambert
cc: gibbs@plutotech.com (Justin T. Gibbs), julian@whistle.com, guido@gvr.org, dkelly@hiwaay.net, freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG
Subject: Re: filesystem safety and SCSI disk write caching
In-reply-to: Your message of "Mon, 19 Oct 1998 19:11:26 -0000." <199810191911.MAA02817@usr02.primenet.com>
Date: Mon, 19 Oct 1998 13:18:07 -0600
From: "Justin T. Gibbs"
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

>I am not presuming to be an expert on SCSI. I *am* presuming to
>tell you that Don's experiences, and my own, contradict your
>interpretation of the spec.

No.
You are throwing FUD, unrelated to Don's 'experiences' onto the list.
One of the comments I was complaining about was this:

>Perhaps they realized that the most likely event after a power
>fluctuation was a bus reset, and figured "why bother?".

This has nothing to do with Don's issue at all. It is also completely
faulty logic.

>Jumping down my throat about interpretation of the spec. because
>the empirically observed behaviour contradicts your knowledge
>of the spec. (I do not question that your knowledge of the spec.
>far exceeds mine) resolves nothing.

The behavior does not contradict my 'interpretation of the spec'.
Devices violate the spec all the time, but that is a totally different
issue.

>I personally still think it has something to do with what happens
>when the controller POSTs.
>
>I would like to see the following from Don before we simply accept
>a "magic power glitch":

Unless you insist on being argumentative about this, it doesn't matter
why or how the cache is invalidated because Don has already decided to
turn off his cache. As far as I'm concerned there is nothing more to be
gained by this discussion, so I'm going to ignore anything else on this
thread.

>As a datapoint, I have an NCR controller, and the problem occurs
>on my external SyJet 1.5G drive, which, by definition, has its own
>power supply, which I would be hard pressed to believe was affected
>by my hitting the front panel reset button on my separately supplied
>computer.

I do not know what the NCR chips do on POST, nor do I have any experience
with the SyJet to know if it has reasonable firmware. This is not a
'datapoint' for Don's experience because it is a totally unrelated device.
--
Justin

From owner-freebsd-fs Thu Oct 22 17:13:43 1998
Return-Path:
Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id RAA26608 for freebsd-fs-outgoing; Thu, 22 Oct 1998 17:13:43 -0700 (PDT) (envelope-from owner-freebsd-fs@FreeBSD.ORG)
Received: from gatekeeper.tsc.tdk.com (gatekeeper.tsc.tdk.com [207.113.159.21]) by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id RAA26590; Thu, 22 Oct 1998 17:13:41 -0700 (PDT) (envelope-from gdonl@tsc.tdk.com)
Received: from sunrise.gv.tsc.tdk.com (root@sunrise.gv.tsc.tdk.com [192.168.241.191]) by gatekeeper.tsc.tdk.com (8.8.8/8.8.8) with ESMTP id RAA07804; Thu, 22 Oct 1998 17:13:12 -0700 (PDT) (envelope-from gdonl@tsc.tdk.com)
Received: from salsa.gv.tsc.tdk.com (salsa.gv.tsc.tdk.com [192.168.241.194]) by sunrise.gv.tsc.tdk.com (8.8.5/8.8.5) with ESMTP id RAA12923; Thu, 22 Oct 1998 17:13:11 -0700 (PDT)
Received: (from gdonl@localhost) by salsa.gv.tsc.tdk.com (8.8.5/8.8.5) id RAA19305; Thu, 22 Oct 1998 17:13:09 -0700 (PDT)
From: Don Lewis
Message-Id: <199810230013.RAA19305@salsa.gv.tsc.tdk.com>
Date: Thu, 22 Oct 1998 17:13:09 -0700
X-Mailer: Mail User's Shell (7.2.6 alpha(3) 7/19/95)
To: freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG
Subject: Re: filesystem safety and SCSI disk write caching
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Oct 14, 9:01am, Nate Williams wrote:
} Subject: Re: filesystem safety and SCSI disk write caching

} Bottom line is that by default FreeBSD w/SoftUpdates is more *unstable*
} now with CAM than it was w/out CAM for 99% of the users.

That's certainly not my experience. Once the initiate_write_filepage
and newdirrem panics were fixed, my system has been completely stable.
Since then, the only filesystem damage that has happened was when I had
write caching enabled and was playing with the reset button.
I'm very anxious to upgrade our pre-CAM systems here because of
stability problems I've been having with them.

On Oct 14, 9:02am, "Justin T. Gibbs" wrote:
} Subject: Re: filesystem safety and SCSI disk write caching

} Neither the old SCSI code nor the
} new CAM code has ever modified the caching behavior of the devices attached
} on the bus. My position on this is that we should *never* modify device
} mode parameters unless instructed to do so by the end user.

I agree.

} If the user
} community insists that we add a warning about the cache being enabled, now,
} after years of silently ignoring the effects of the cache on filesystem
} integrity, so be it. My opinion is that if this was a problem in practice,
} we'd have heard about it by now from our user community.

I think much of the user community is uneducated on some of the finer
points and also has low expectations.

On Oct 14, 9:18am, Nate Williams wrote:
} Subject: Re: filesystem safety and SCSI disk write caching

} Up till this point, we never had any FS code that attempted to deal with
} 'crashing' robustly.

Not true. Unless you were using async mounts, the filesystem code was
careful to order certain writes so that there would not be any
unrecoverable damage.

} I for one knew that if my machine crashed, an fsck
} was expected and it was going to repair my disk.

It still is necessary to recover lost resources, though in theory it
could be done after the filesystem is mounted. With softupdates, it
isn't strictly necessary to do it before mount time because blocks are
no longer marked free in the bitmaps before they have been deallocated.
The traditional code wasn't quite so careful in the interest of
performance, since fsck could easily fix the bitmaps.
} This is no longer the
} case with SoftUpdates, so what was previously attributed to 'the crash'
} can now be more fine-tuned to 'write caching screwed up' or 'I have a
} bad power supply which breaks my drive when I hit the reset button' or
} even simply 'when I lose power, write-caching spams my disk'.

These problems can damage the filesystem in ways that fsck can't
recover without data loss.

On May 13, 5:54pm, "Christopher R. Bowman" wrote:
} Subject: Re: filesystem safety and SCSI disk write caching

} At 11:08 AM 10/14/98 -0700, Julian Elischer wrote:
} >7/ to allow for this to be achieved easily, there should be an easy way to
} >ensure that the write cache is turned off. Possibly as simple as
} >a good example in camctl.8 .

... and a note in the handbook reminding folks of the importance of
doing this when building a system or adding new disks.

} Could we make this a mount time option, say if -wc to turn write caching on,
} -nowc to turn it off, and if neither flag is present use whatever the drive is
} already set for.

My initial request was for the opposite, a warning if write caching was
on, to keep folks from silently shooting themselves in the foot. This
avoids the problems that Justin mentioned in his reply. This could
even be tweaked to warn you if write caching was not in the state you
desired. However, I now think this isn't the way to go. See below.

On Oct 14, 4:54pm, "Justin T. Gibbs" wrote:
} Subject: Re: filesystem safety and SCSI disk write caching

} The moral of this story is that everyone should decide what kinds of
} performance/safety tradeoffs they are willing to make and design their
} systems accordingly.

Yes, this sounds like an issue that should be discussed in the handbook.

On Oct 14, 7:15pm, David Kelly wrote:
} Subject: Re: filesystem safety and SCSI disk write caching

} Before you fight it too much more, replace the power supply. I've cured
} a number of "impossible" problems with a new power supply.
} One spectacular example was a Power Mac 7200/120. Crash, crash, crash.
} Sometimes it would run for 30 minutes. Sometimes overnight. Technician
} replaced everything several times over a couple of weeks. Everything
} but the plastic case and the power supply. I insisted on a new PS the
} last time back. And it worked like a charm.

In normal operation, the system is absolutely stable. The only problem
occurs when I hit reset while the system is busy. If I turn off write
caching, I haven't gotten any filesystem damage even then.

} Power supply filter capacitors age with heat. And lose their ability to
} be good capacitors. No telling what kind of noise is on your DC power
} wires inside the case. Your PS could be generating a spike of its own
} on RESET when/if something suddenly demands a lot of current. Or if
} something suddenly quits demanding the current it was using.

The capacitors "should" be good since the system is fairly new and has
generally only been lightly used. If the problem was load regulation,
then I'd expect the problem to also occur during heavy use, but I haven't
seen any problems like that. One can never be sure though.

On Oct 14, 11:19pm, Dan Nelson wrote:
} Subject: Re: filesystem safety and SCSI disk write caching

} I humbly submit the following script, to be added to /etc/security, or
} periodic/weekly, /etc/rc, or wherever. It's dependent on the exact
} output of "camcontrol inquiry" and "camcontrol modepage", but does the
} job.

Ah, a positive contribution to this thread! I was coming to the
conclusion that a script to check this was probably the way to go when
I saw your message. You can also use something like this to check some
of the other SCSI control bits to make sure things like error recovery
are also properly configured. I'd also combine this with a scan for
new grown defects. I'd recommend running the script both at boot time
and from cron to detect any potential problems. Using a script also
avoids kernel bloat.
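
For illustration only, the core of such a check might look like the
sketch below. It is not the script Dan posted, which is not reproduced
in this part of the thread. It assumes the raw bytes of the SCSI caching
mode page (page code 0x08) have already been read from the drive by some
other means, and relies only on the standard layout of that page, where
WCE is bit 2 and RCD is bit 0 of byte 2. The function names and the
device name "da0" are hypothetical.

```python
CACHING_PAGE_CODE = 0x08  # SCSI caching mode page
WCE_MASK = 0x04           # byte 2, bit 2: write cache enable
RCD_MASK = 0x01           # byte 2, bit 0: read cache disable


def cache_flags(page: bytes) -> dict:
    """Decode WCE and RCD from the raw bytes of a caching mode page."""
    if len(page) < 3:
        raise ValueError("caching mode page needs at least 3 bytes")
    # The low six bits of byte 0 hold the page code; bit 7 is the PS flag.
    if page[0] & 0x3F != CACHING_PAGE_CODE:
        raise ValueError("not a caching mode page")
    return {
        "wce": bool(page[2] & WCE_MASK),
        "rcd": bool(page[2] & RCD_MASK),
    }


def write_cache_warning(device: str, page: bytes):
    """Return a warning string if WCE is set, otherwise None."""
    if cache_flags(page)["wce"]:
        return f"WARNING: {device}: write caching is enabled (WCE=1)"
    return None


if __name__ == "__main__":
    # Hypothetical sample page: PS bit set, page code 0x08, WCE set.
    sample = bytes([0x88, 0x12, 0x04])
    print(write_cache_warning("da0", sample))
```

Run at boot and from cron, a check like this would only report the
state of the cache; changing the drive's mode page is left to the
operator, in line with the position that device parameters should not
be modified silently.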
On Oct 16, 6:09am, David Kelly wrote:
} Subject: Re: filesystem safety and SCSI disk write caching

} Why not? It might be interesting to put a recording voltmeter such as a
} digital storage oscilloscope on the HD power leads when Don is punching
} the reset. No telling what kind of voltage surges are generated when
} the load on the power supply is altered.

Ok, so the chapter in the handbook about SCSI write caching will recommend
connecting a recording voltmeter to the power supply and monitoring it
under varying load (including when the reset button is hit) to make sure
there are no problems before enabling write caching? Repeat this procedure
after adding new hardware and periodically as the power supply capacitors
age.

And you still can't prove that you don't have a power supply problem
lurking, since you can't prove a negative. All you can state is that
the power looks clean under the conditions that you tested. Problems
still might occur when the machine is placed in service.

On Oct 16, 7:26pm, Ollivier Robert wrote:
} Subject: Re: filesystem safety and SCSI disk write caching

} I agree. HP-UX has a kernel option to enable write caching and it is off by
} default. Not that I'd advocate to do things like HP-SUX but I think it is
} better to be safe.

I think you are thinking of NFS write caching. An NFS server isn't
supposed to tell the NFS client that the write has completed until the
data is on stable storage (so the client can retry the write if the
server crashes and reboots before the write is done). This badly hurts
performance unless the client is able to handle a lot of outstanding
write requests.

} A lot if not all of the modern drives are shipped with WCE == 1 though.

That didn't use to be the case. I think this changed a few years ago.
Benchmarkitis no doubt. I didn't even think to check this since I haven't
been putting too many new drives in service lately.
On Oct 17, 1:48pm, Bill Vermillion wrote:
} Subject: Re: filesystem safety and SCSI disk write caching

} Guido van Rooij recently said:

} > I always thought a drive will always be able to flush its write cache
} > to disk, even when power fails.
}
} Not all do. The high-end IBM's do/did. They used the inertia of
} the spinning platters to generate enough current to flush the
} buffers to disk. There aren't a lot of drives that do it.

That could be pretty hard to do with the sizeable caches on drives
these days. Quite a few seeks could be necessary which are expensive
in terms of power. I also suspect that the necessary circuitry to
recover the energy from the platters to power the rest of the drive
adds a significant amount of cost to the drive, and the drive
manufacturers are under quite a lot of cost pressure these days. This
would be a disincentive to include a seldom used feature.

On Oct 19, 12:04pm, "Justin T. Gibbs" wrote:
} Subject: Re: filesystem safety and SCSI disk write caching

} >, *after* you explain why
} >Don Lewis is seeing the empirical behaviour he is seeing, in
} >contradiction to your claims of what's possible and not.
}
} I've already given my opinion on this. I believe the Hawk is seeing
} a power glitch or temporary power loss when the reset switch is hit and
} so the contents of the cache are lost. I have never said that the
} behavior that Don Lewis is seeing is 'not possible', only that, for
} the drive in question, the reset causing cache corruption is not likely.

If this problem is caused by a load related power glitch, then it is
possible to get silent filesystem corruption in normal operation if
write caching is enabled, since cached writes could get lost and the
driver might never notice. Without write caching, the driver would see
transactions timing out and it would have an opportunity to retry the
writes, preventing filesystem damage and data loss, and the driver
would no doubt complain verbosely.
This would alert the operator to the hardware problem.

From owner-freebsd-fs Thu Oct 22 20:40:27 1998
Message-Id: <199810230339.VAA25994@pluto.plutotech.com>
To: Don Lewis
cc: freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG
Subject: Re: filesystem safety and SCSI disk write caching
In-reply-to: Your message of "Thu, 22 Oct 1998 17:13:09 PDT." <199810230013.RAA19305@salsa.gv.tsc.tdk.com>
Date: Thu, 22 Oct 1998 21:33:03 -0600
From: "Justin T. Gibbs"

>} I've already given my opinion on this. I believe the Hawk is seeing
>} a power glitch or temporary power loss when the reset switch is hit and
>} so the contents of the cache are lost. I have never said that the
>} behavior that Don Lewis is seeing is 'not possible', only that, for
>} the drive in question, the reset causing cache corruption is not likely.
>
>If this problem is caused by a load related power glitch, then it is
>possible to get silent filesystem corruption in normal operation if
>write caching is enabled, since cached writes could get lost and the
>driver might never notice.
The driver will notice, as the drive will issue a Unit Attention
response the next time you touch it. Or is your point that you won't
necessarily access the device again and so never see that the device
saw a loss in power?

There has been quite a bit of debate on how UAs should be handled. The
original CAM driver was *very* conservative and returned all pending
I/O with EIO, marked the pack invalid, and refused to take any I/O unless
the device cycled through final close. This ensures that if the device
or pack is replaced that you don't spam different media or even if the
media is the same, make the problem worse by attempting to continue after
some transactions were irrevocably lost. This was considered too
disruptive and since a permanent solution could not be developed for
3.0R, the UA code was disabled. The correct solution likely requires
better communication to the FS or user layer so that pack validation
of some sort can occur.

--
Justin

From owner-freebsd-fs Thu Oct 22 22:10:41 1998
From: Don Lewis
Message-Id: <199810230510.WAA19810@salsa.gv.tsc.tdk.com>
Date: Thu, 22 Oct 1998 22:10:03 -0700
In-Reply-To: "Justin T. Gibbs" "Re: filesystem safety and SCSI disk write caching" (Oct 22, 9:33pm)
To: "Justin T. Gibbs", Don Lewis
Subject: Re: filesystem safety and SCSI disk write caching
Cc: freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG

On Oct 22, 9:33pm, "Justin T. Gibbs" wrote:
} Subject: Re: filesystem safety and SCSI disk write caching

} The driver will notice, as the drive will issue a Unit Attention
} response the next time you touch it. Or is your point that you won't
} necessarily access the device again and so never see that the device
} saw a loss in power?

I forgot about Unit Attention. At least it will be obvious that the
filesystem is potentially corrupted and there is a hardware problem
that needs fixing.

} There has been quite a bit of debate on how UAs should be handled. The
} original CAM driver was *very* conservative and returned all pending
} I/O with EIO, marked the pack invalid, and refused to take any I/O unless
} the device cycled through final close. This ensures that if the device
} or pack is replaced that you don't spam different media or even if the
} media is the same, make the problem worse by attempting to continue after
} some transactions were irrevocably lost. This was considered too
} disruptive and since a permanent solution could not be developed for
} 3.0R, the UA code was disabled. The correct solution likely requires
} better communication to the FS or user layer so that pack validation
} of some sort can occur.

If write caching is disabled and if it is possible to verify that the
media is the same, it should be safe to retry all the lost I/O
transactions.
If write caching is off and softupdates is in use, your original solution
still works ok even if someone yanks the media out of the drive. The
filesystem will still be intact, though it might not be totally up to date.

If write caching is on, you're basically SOL if you see a Unit Attention.

From owner-freebsd-fs Thu Oct 22 22:16:57 1998
Message-Id: <199810230516.XAA29829@pluto.plutotech.com>
To: Don Lewis
cc: "Justin T. Gibbs", freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG
Subject: Re: filesystem safety and SCSI disk write caching
In-reply-to: Your message of "Thu, 22 Oct 1998 22:10:03 PDT." <199810230510.WAA19810@salsa.gv.tsc.tdk.com>
Date: Thu, 22 Oct 1998 23:09:33 -0600
From: "Justin T. Gibbs"

>} There has been quite a bit of debate on how UAs should be handled. The
>} original CAM driver was *very* conservative and returned all pending
>} I/O with EIO, marked the pack invalid, and refused to take any I/O unless
>} the device cycled through final close.
>} This ensures that if the device
>} or pack is replaced that you don't spam different media or even if the
>} media is the same, make the problem worse by attempting to continue after
>} some transactions were irrevocably lost. This was considered too
>} disruptive and since a permanent solution could not be developed for
>} 3.0R, the UA code was disabled. The correct solution likely requires
>} better communication to the FS or user layer so that pack validation
>} of some sort can occur.
>
>If write caching is disabled and if it is possible to verify that the
>media is the same, it should be safe to retry all the lost I/O
>transactions.

You can't retry the transactions without first determining that the
media is the same as before. We can't do that right now without
manual intervention, which shouldn't be the case.

--
Justin

From owner-freebsd-fs Sat Oct 24 05:54:06 1998
Message-Id: <199810241224.HAA19254@nospam.hiwaay.net>
To: Don Lewis
cc: freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG
From: David Kelly
Subject: Re: filesystem safety and SCSI disk write caching
In-reply-to: Message from Don Lewis of "Thu, 22 Oct 1998 17:13:09 PDT." <199810230013.RAA19305@salsa.gv.tsc.tdk.com>
Date: Sat, 24 Oct 1998 07:24:19 -0500

Don Lewis writes:
> On Oct 16, 6:09am, David Kelly wrote:
> } Subject: Re: filesystem safety and SCSI disk write caching
>
> } Why not? It might be interesting to put a recording voltmeter such as a
> } digital storage oscilloscope on the HD power leads when Don is punching
> } the reset. No telling what kind of voltage surges are generated when
> } the load on the power supply is altered.
>
> Ok, so the chapter in the handbook about SCSI write caching will recommend
> connecting a recording voltmeter to the power supply and monitoring it
> under varying load (including when the reset button is hit) to make sure
> there are no problems before enabling write caching? Repeat this procedure
> after adding new hardware and periodically as the power supply capacitors
> age.
>
> And you still can't prove that you don't have a power supply problem
> lurking, since you can't prove a negative. All you can state is that
> the power looks clean under the conditions that you tested. Problems
> still might occur when the machine is placed in service.

I do not suggest use of a recording voltmeter or storage oscilloscope
be mentioned in the handbook. My point is that in this day of generic
PC parts the quality control aspect is getting skipped. While an
individual PS may be tested for UL compliance, and the MB for FCC
emissions, the system as a package gets skipped. The Mom & Pop PC Shop
doesn't have a clue other than, "We sold 100 systems this month and you
are the only one complaining." Testing for FCC emissions levels and UL
safety on entire systems is nil.

For data loss on reset you can prove a negative if you can reproduce
the failure during your measurements.
If you sample the voltage often enough thru the event then you can
prove there was not a slower voltage spike causing the problem. My
50 MHz DSO says it samples at 100 MHz.

It's much harder to monitor current as your power leads would have to
be cut or some other inline calibrated very low value resistor
inserted. But to do a complete job current should be monitored also.
Current measurements will tell you if the load on the PS is changing.
If current doesn't change significantly thru a RESET, then this is not
a boundary condition. Problems are usually found at the boundaries.

My suggestion of monitoring the power supply came from the nature of
this list and its participants where skills and tools are above
average. And the result is a product which is well above average and
suitable for use by those who never give it a second thought. A
handbook entry on SCSI caching is an attempt to cause such a second
thought in more people than would have had one in the first place.

--
David Kelly N4HHE, dkelly@nospam.hiwaay.net
=====================================================================
The human mind ordinarily operates at only ten percent of its capacity
-- the rest is overhead for the operating system.