Date:      Tue, 13 Oct 1998 18:00:52 -0600
From:      "Justin T. Gibbs" <gibbs@plutotech.com>
To:        Terry Lambert <tlambert@primenet.com>
Cc:        gibbs@plutotech.com (Justin T. Gibbs), Don.Lewis@tsc.tdk.com, julian@whistle.com, freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG
Subject:   Re: filesystem safety and SCSI disk write caching 
Message-ID:  <199810140007.SAA02391@pluto.plutotech.com>
In-Reply-To: Your message of "Tue, 13 Oct 1998 23:58:59 -0000." <199810132358.QAA18137@usr08.primenet.com> 

>> You're missing a large step here.  You can't prove that the 'anomaly'
>> is related to the drive firmware without a trace of all transactions
>> on the SCSI bus.  It could well be a missing dependency in the soft
>> update code.
>
>If I turn off write caching on the drive, and repeat the test, and
>the evaluation (E) results in a "# of anomalies == 0", where with
>write caching enabled, the number was > 0, then I can say with
>high confidence that it's the write caching.

I disagree.  You have demonstrably varied the rate at which write
completions occur, as seen by FreeBSD, which means that the test is
anything but conclusive.

>> I'd be more than happy to reproduce your failure scenario
>> while recording a SCSI bus trace so that the fault is easy to interpret.
>> Just send me any *modern* drive that you think fails.
>
>Sure; just define "modern" for me, since my personal definition is
>"not IDE".

A drive manufactured within the last 3 years.

>> You should also ensure that your reset button does not cause any power
>> spikes on the drive power lines.  That would be cheating.
>
>It doesn't, since "# of anomalies == 0" with write caching disabled.

This doesn't follow.  If the cache is disabled, it doesn't matter if
the drive loses power due to hitting the reset button.  We already
know that cutting power to a drive that holds cached, unwritten data
will not work.

>> I'm still unclear as to whether Don was turning off power or hitting what I
>> consider the reset button.  His comment about UPS use makes me think he
>> was testing power outage scenarios.
>
>Well, I know that this might sound insane, but we could ask Don, and
>I could get out of the middle of this whole thing... ;-).

Well, if you're offering, I'd be more than happy to take you up on
it.

>> Since you were able to test 4 drives so quickly, I'd love to see well
>> documented information on exactly how the file system was inconsistent
>> in the failure cases.
>
>There were directory dependencies which were committed out of order
>(the modified fsck reports these as soft dependency errors...).

Can you be more specific?  Are you positive that the transactions
were committed out of order or could it be that some transactions
were never committed at all?  What was the size of the directory?
Was the failure in directory creation or destruction?  Which portion
of the dependency graph was violated?

--
Justin





