Date:      Sun, 09 Nov 2008 15:29:08 -0800
From:      Joe Kelsey <joe@zircon.seattle.wa.us>
To:        Søren Schmidt <sos@FreeBSD.ORG>
Cc:        Jeremy Chadwick <koitsu@FreeBSD.ORG>, freebsd-stable@FreeBSD.ORG, votdev@gmx.de, Peter Wemm <peter@wemm.org>, freebsd-hardware@FreeBSD.ORG
Subject:   Re: Western Digital hard disks and ATA timeouts
Message-ID:  <49177244.9060802@zircon.seattle.wa.us>
In-Reply-To: <77C223A7-C5FC-45DE-BF1A-3BC7982FA582@FreeBSD.ORG>
References:  <20081107071752.GA5842@icarus.home.lan>	<e7db6d980811071112x315d2d94vb305245e799adfce@mail.gmail.com> <77C223A7-C5FC-45DE-BF1A-3BC7982FA582@FreeBSD.ORG>

Søren Schmidt wrote:
> On 7Nov, 2008, at 20:12 , Peter Wemm wrote:
>
>> On Thu, Nov 6, 2008 at 11:17 PM, Jeremy Chadwick <koitsu@freebsd.org> 
>> wrote:
>> [..]
>>> As stated, FreeBSD's ATA command timeout is hard-set to 5 seconds, and
>>> is not adjustable without editing the ATA code yourself and increasing
>>> the value.  The FreeNAS folks have made patches available to turn the
>>> timeout value into a sysctl.
>>>
>>> Soren and/or others, please increase this timeout value.  Five seconds
>>> has now been deemed too aggressive a default.  And please consider
>>> migrating the timeout value into a sysctl.
>>
>> The 5 second timeout has been a problem for quite a while actually.
>> I've had a number of instances where I've had to increase it to 20 or
>> 30 seconds when recovering from marginal drives.  The longest
>> "successful" recovery attempt I've seen was 26 seconds, I believe on a
>> Maxtor drive a few years ago.   ("successful" == the drive spent 26
>> seconds but eventually successfully read the sector).  Even the IBM
>> death star drives could take much longer than 5 seconds to do a
>> recovery 5 years ago.  5 seconds has never been a good default.
>>
>> I think the timeout should be increased to at least 30 seconds.  My
>> Windows box has a timeout that lasts several minutes.
>>
>> If there is concern about FreeBSD appearing to hang, I could imagine
>> a console warning message being printed after 5 seconds, one that
>> just says "drive has not yet responded".  But give the drive more time.
>>
>> In this day and age we're generally not playing games with UDMA33 vs.
>> UDMA66, notched cables, poor CRC support, etc.  SATA seems to have
>> eliminated all that.  Hmm, it might make sense to increase the timeout
>> on SATA connections to 2 or 3 minutes by default.
>
> Actually, I do have a patch around that logs the timeout on the console
> after the normal timeout (5 secs), then goes on to wait for double the
> timeout and log again, and so on; the final timeout was IIRC 60 secs,
> but it could be anything.
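
Søren's patch itself is not included in this thread. The following is only a minimal Python model of the escalation he describes, under stated assumptions: a 5-second base timeout, a wait that doubles after each console warning, and a 60-second final cap (his "IIRC" figure). The names `timeout_schedule` and `wait_for_command` and the log text are hypothetical and are not taken from FreeBSD's ata(4) code.

```python
# Hypothetical model of an escalating ATA command timeout: warn on the console
# at the normal timeout, keep doubling the wait, and give up at a final cap.
# BASE_TIMEOUT and MAX_TIMEOUT are assumptions, not values from the real patch.

BASE_TIMEOUT = 5   # seconds; the normal timeout mentioned in the thread
MAX_TIMEOUT = 60   # seconds; assumed final give-up point ("IIRC 60 secs")


def timeout_schedule(base=BASE_TIMEOUT, cap=MAX_TIMEOUT):
    """Return the elapsed times (seconds) at which the driver would check in.

    The first check fires at `base`; each later wait is double the previous
    one, and the last checkpoint is clamped to `cap`.
    """
    if base <= 0 or cap <= 0:
        raise ValueError("base and cap must be positive")
    checkpoints = []
    elapsed, wait = 0, base
    while elapsed < cap:
        elapsed = min(elapsed + wait, cap)
        checkpoints.append(elapsed)
        wait *= 2
    return checkpoints


def wait_for_command(completes_after, base=BASE_TIMEOUT, cap=MAX_TIMEOUT):
    """Simulate a command the drive finishes after `completes_after` seconds.

    Returns (status, number_of_console_warnings_logged).
    """
    schedule = timeout_schedule(base, cap)
    warnings = 0
    # Every checkpoint except the last only logs a warning and keeps waiting.
    for t in schedule[:-1]:
        if completes_after <= t:
            return "completed", warnings
        warnings += 1
        print(f"ata: command not yet completed after {t} s, still waiting")
    # The last checkpoint is the hard timeout.
    if completes_after <= schedule[-1]:
        return "completed", warnings
    return "timed out", warnings
```

Under these assumptions the checkpoints fall at 5, 15, 35 and 60 seconds, so a drive needing 26 seconds (like the Maxtor recovery Peter mentions) completes after two console warnings instead of failing at 5 seconds, while a drive that never answers is still abandoned after a bounded time.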

I have a disk, which I am finally getting rid of, that produces READ_DMA
and WRITE_DMA errors at a pretty high rate.  I enabled the extra ATA
error reporting, and it doesn't seem to indicate any actual errors, just
extra-long timeouts.

At one point I did change the system to extend the timeout, but I did
not see any real improvement at 30 seconds.  I suspect that an even
longer timeout would be necessary to solve the problem.

I am removing the disk this week.  Does anyone want a disk that produces
DMA timeouts at a regular rate?  Would it actually help solve this problem?

Please let me know if you want such a beast and I will ship it to you.

/Joe