Date:      Sat, 19 Sep 1998 17:19:51 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        eivind@yes.no (Eivind Eklund)
Cc:        tlambert@primenet.com, Don.Lewis@tsc.tdk.com, current@FreeBSD.ORG
Subject:   Re: softupdates & fsck
Message-ID:  <199809191719.KAA10028@usr09.primenet.com>
In-Reply-To: <19980919123143.36373@follo.net> from "Eivind Eklund" at Sep 19, 98 12:31:43 pm

> > That you are seeing these problems implies that the bwrite ordering
> > guarantees that the driver must provide (i.e., that the blocks will
> > be written in the order requested, and that the writes will not
> > return as completed until the data has been committed to the disk)
> > are not being honored.
> 
> Given that most drives don't honour these guarantees [1] it may happen
> even without a problem with the driver.
> 
> [1] This marks the point where somebody comes running, waving standards
> documents and becoming more and more red in the face, while I say
> "Yes, I know they say the drives are supposed to - but in fact, the
> drives don't actually *do* what they're supposed to."

They do if you set their options correctly and ensure a holdup
time after power failure, during which you do not schedule any
new writes.

The question is "what happens to the sector under the head during a
write in case of power failure, if you don't have a holdup time?".

For some drives, the answer is "it works".  For the drives that
idiots buy, the answer is "it gets corrupted and the data is not
recoverable at all".


In any case, we are talking about system resets, not power failures,
so this is somewhat a horse of a different wheelbase, and we can
ignore the case where you employ idiots to buy your hardware and/or
you use non-ATX power supplies, and then the power goes out
unexpectedly.


For a drive that isn't powered down, that has stated to the controller
that it has written a block that it has actually cached, it is the
responsibility of the drive to write what it said it did before
acting upon the reset signal.

I can tell you that Quantum and Seagate IDE drives honor this,
and that most (all?) SCSI drives honor this even better (not
returning that the queued command has completed until the data
is committed to disk).
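
To make the contract in the last two paragraphs concrete, here is a
toy model in Python.  It is only a sketch: ToyDrive, its methods, and
acknowledged_writes_survive() are hypothetical names invented for
illustration, not any real driver or firmware interface.  The
write-back case stands in for an IDE drive that acknowledges a write
it has only cached; the write-through case stands in for a SCSI drive
that does not report completion until the data is on the media.

```python
class ToyDrive:
    """Hypothetical model of a drive's write path (not a real driver API)."""

    def __init__(self, write_back=True):
        self.write_back = write_back
        self.cache = []   # acknowledged but not yet on the media, oldest first
        self.media = []   # (block, data) pairs in the order they were committed

    def write(self, block, data):
        """Accept a write; returning means it was acknowledged to the controller."""
        if self.write_back:
            # Acknowledged early: the data is so far only in the drive's cache.
            self.cache.append((block, data))
        else:
            # Acknowledged only after the data has been committed.
            self.media.append((block, data))
        return True

    def flush(self):
        """Commit cached writes in the order in which they were acknowledged."""
        self.media.extend(self.cache)
        self.cache.clear()

    def reset(self):
        """Honor every acknowledged write before acting on the reset signal."""
        self.flush()


def acknowledged_writes_survive(drive, requests):
    """Issue writes in order, reset the drive, and compare media to requests."""
    for block, data in requests:
        drive.write(block, data)
    drive.reset()
    return drive.media == list(requests)
```

In this model either kind of drive ends up with the same blocks, in
the requested order, on the media after a reset.  A drive (or driver)
that dropped or reordered the cached entries in reset() would fail
the check, which is the failure mode under discussion.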


Since (1) this problem doesn't occur without CAM, and (2) this
problem occurs with CAM, I expect that this problem is CAM
related.

Feel free to prove me wrong by duplicating the problem with a
pre-CAM kernel with Loqui's patch applied; I'd be happy to see the
stack traceback, as I'm sure Julian and Kirk would be as well.


For now, from the fsck failure, it looks like the CAM driver isn't
making the ordering guarantees it should.  Stating that "some
hardware won't make these guarantees, either" is more an argument
against purchasing "some hardware" than it is an argument for CAM
ignoring the intentional ordering of requests (if that's the root
cause of the problem, which didn't occur pre-CAM; obviously it
could be a different CAM bug causing the problem...).

A good question to ask at this point is "is anyone with an IDE
drive experiencing this problem?".


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
