Date:      Fri, 8 Aug 1997 11:42:31 -0500 (CDT)
From:      Dave Bodenstab <imdave@mcs.net>
To:        freebsd-questions@FreeBSD.ORG
Cc:        art@adn.edu.ph, dwhite@resnet.uoregon.edu
Subject:   Re: disk problem
Message-ID:  <199708081642.LAA03884@imdave.pr.mcs.net>

> On Tue, 5 Aug 1997, Arthur Alacar wrote:
>
> > > >   wd1: interrupt timeout:
> > > >   wd1: status 50<rdy,seekdone> error 0
> > > 
> > > Do you have IDE spindown enabled?
> > 
> > what's this spindown?
>
> Most modern motherboards support turning the hard disks off after a
> specified amount of time.  When the disk is accessed the disk is turned
> back on.  This takes too long for FreeBSD and will spit out that error
> message.
>
> > > It's harmless unless you get these frequently, with other more serious
> > > errors.
> > 
> > i often got such message..
>
> Perhaps one of your cables is going down or wd1 takes too long to respond.
>
I just finished replacing one of my disks (/ and /usr unfortunately) due
to errors that began with this message.  I had two identical WD AC31000
1.2G drives, and last weekend began getting these errors on one of them.
Of course, there are many things that can go wrong, and there's
no way of telling if Mr. Alacar's problem is the same as mine, but based
on my recent experience, these messages are cause to begin monitoring
the system more carefully for possible problems.

In my case, these messages always followed an unusual, audible "click" --
perhaps the drive reset itself?  Depending on what I/O was in progress, the
messages got more severe, and quite often resulted in panics and reboots.
Messages included (a very short excerpt from /var/log/messages):

Aug  4 08:48:10 base486 /kernel: wdc0: unit 0 (wd0): <WDC AC31000H>
Aug  4 08:48:10 base486 /kernel: wd0: 1033MB (2116800 sectors), 2100 cyls, 16 heads, 63 S/T, 512 B/S
Aug  4 15:37:01 base486 /kernel: wd0: interrupt timeout:
Aug  4 15:37:01 base486 /kernel: wd0: status 58<seekdone,drq> error 0
Aug  5 17:05:01 base486 /kernel: wd0a: wdstart: timeout waiting to give command writing fsbn 16 of 16-19 (wd0 bn 16; cn 0 tn 0 sn 16)wd0: status 80<busy> error 1<no_dam>
Aug  6 17:54:25 base486 /kernel: wd0a: hard error writing fsbn 144 of 144-159 (wd0 bn 144; cn 0 tn 2 sn 18)wd0: status 51<seekdone,err> error 4<abort>
Aug  6 17:54:25 base486 /kernel: wd0s3e: hard error reading fsbn 341216 of 341216-341231 (wd0s3 bn 515296; cn 511 tn 3 sn 19)wd0: status 59<seekdone,drq,err> error 4<abort>
Aug  6 20:52:59 base486 /kernel: wd0a: wdstart: timeout waiting to give command reading fsbn 54352 of 54352-54367 (wd0 bn 54352; cn 53 tn 14 sn 46)wd0: status 80<busy> error 4<abort>

I went to Western Digital's web site (http://www.wdc.com) and downloaded
their diagnostic program.  Typically, it ran perfectly from DOS and
continued to report no problems with the drive.  It seemed to just be doing
a surface scan, and had no random seek test, and I strongly suspect that it
was not enabling interrupts, but simply polling the drive.  Finally I got
lucky, I guess, and one of these "clicks" occurred and was detected as
an error.  WD is now sending me a new drive under their 3-year warranty.
(Unfortunately, I couldn't wait a week and had to go out and buy a new
one now -- the good news is that IDE drives are really cheap these days.)

Conclusion and suggestions:

1.  Make sure you have a current backup
2.  Be prepared for the possibility that the drive is going bad
3.  Monitor your logs and watch for any other symptoms, or an increasing
    frequency of errors
4.  Go to the manufacturer's web site and see if they have anything
    that can help diagnose a problem
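
For suggestion 3, here is a rough sketch of one way to watch for an
increasing frequency of errors.  It assumes the log lines look like the
excerpt above (syslog timestamp, host name, "/kernel:", then a wd device
name).  The function names count_disk_errors and report, and the list of
phrases treated as errors, are my own choices for illustration and are
not part of FreeBSD.

```python
import re
from collections import Counter

# Matches syslog lines such as:
#   Aug  4 15:37:01 base486 /kernel: wd0: interrupt timeout:
#   Aug  6 17:54:25 base486 /kernel: wd0a: hard error writing fsbn ...
# The device group captures only the drive (wd0), not the slice or
# partition suffix (the "s3e" in wd0s3e).
LINE_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+\d{2}:\d{2}:\d{2}\s+"
    r"\S+\s+/kernel:\s+(?P<dev>wd\d+)\w*:\s+(?P<msg>.*)$"
)

# Phrases taken from the excerpt above; an assumed, incomplete list.
ERROR_PHRASES = ("interrupt timeout", "hard error", "timeout waiting")


def count_disk_errors(lines):
    """Count error lines per (month, day, drive) in first-seen order."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # not a wd kernel message
        if any(phrase in m.group("msg") for phrase in ERROR_PHRASES):
            key = (m.group("month"), int(m.group("day")), m.group("dev"))
            counts[key] += 1
    return counts


def report(path="/var/log/messages"):
    """Print one line per day and drive, in the order they appear."""
    with open(path, errors="replace") as log:
        for (month, day, dev), n in count_disk_errors(log).items():
            print(f"{month} {day:2d} {dev}: {n} error line(s)")
```

Calling report() prints one total per day and drive.  The follow-up
"status ..." lines are deliberately not counted, so each incident is
tallied once.  A count that climbs from day to day is the sort of trend
I would treat as a warning.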

Good luck.

Dave Bodenstab
imdave@mcs.net



