Date: Fri, 8 Aug 1997 11:42:31 -0500 (CDT)
From: Dave Bodenstab <imdave@mcs.net>
To: freebsd-questions@FreeBSD.ORG
Cc: art@adn.edu.ph, dwhite@resnet.uoregon.edu
Subject: Re: disk problem
Message-ID: <199708081642.LAA03884@imdave.pr.mcs.net>

> On Tue, 5 Aug 1997, Arthur Alacar wrote:
>
> > > > wd1: interrupt timeout:
> > > > wd1: status 50<rdy,seekdone> error 0
> > >
> > > Do you have IDE spindown enabled?
> >
> > what's this spindown?
>
> Most modern motherboards support turning the hard disks off after a
> specified amount of time. When the disk is accessed the disk is turned
> back on. This takes too long for FreeBSD and will spit out that error
> message.
>
> > > It's harmless unless you get these frequently, with other more serious
> > > errors.
> >
> > i often got such message..
>
> Perhaps one of your cables is going down or wd1 takes too long to respond.
>

I just finished replacing one of my disks (/ and /usr, unfortunately) due
to errors that began with this message. I had two identical WD AC31000
1.2G drives, and last weekend began getting these errors on one of them.

Of course, there are many things that can go wrong, and there's no way of
telling whether Mr. Alacar's problem is the same as mine, but based on my
recent experience, these messages are cause to begin monitoring the system
more carefully for possible problems.

In my case, these messages always followed an unusual, audible "click" --
perhaps the drive reset itself? Depending on what I/O was in progress, the
messages got more severe, and quite often resulted in panics and reboots.

Messages included (a very short excerpt from /var/log/messages):

Aug 4 08:48:10 base486 /kernel: wdc0: unit 0 (wd0): <WDC AC31000H>
Aug 4 08:48:10 base486 /kernel: wd0: 1033MB (2116800 sectors), 2100 cyls, 16 heads, 63 S/T, 512 B/S
Aug 4 15:37:01 base486 /kernel: wd0: interrupt timeout:
Aug 4 15:37:01 base486 /kernel: wd0: status 58<seekdone,drq> error 0
Aug 5 17:05:01 base486 /kernel: wd0a: wdstart: timeout waiting to give command writing fsbn 16 of 16-19 (wd0 bn 16; cn 0 tn 0 sn 16)wd0: status 80<busy> error 1<no_dam>
Aug 6 17:54:25 base486 /kernel: wd0a: hard error writing fsbn 144 of 144-159 (wd0 bn 144; cn 0 tn 2 sn 18)wd0: status 51<seekdone,err> error 4<abort>
Aug 6 17:54:25 base486 /kernel: wd0s3e: hard error reading fsbn 341216 of 341216-341231 (wd0s3 bn 515296; cn 511 tn 3 sn 19)wd0: status 59<seekdone,drq,err> error 4<abort>
Aug 6 20:52:59 base486 /kernel: wd0a: wdstart: timeout waiting to give command reading fsbn 54352 of 54352-54367 (wd0 bn 54352; cn 53 tn 14 sn 46)wd0: status 80<busy> error 4<abort>

I went to Western Digital's web site (http://www.wdc.com) and downloaded
their diagnostic program. Typically, it ran perfectly from DOS and
continued to report no problems with the drive. It seemed to be doing only
a surface scan, with no random seek test, and I strongly suspect that it
was not enabling interrupts, but simply polling the drive. Finally I got
lucky, I guess, and one of these "clicks" occurred and was detected as an
error.

WD is now sending me a new drive under their 3-year warranty.
(Unfortunately, I couldn't wait a week and had to go out and buy a new one
now -- the good news is that IDE drives are really cheap these days.)

Conclusion and suggestions:

1. Make sure you have a current backup.
2. Be prepared for the possibility that the drive is going bad.
3. Monitor your logs and watch for any other symptoms, or an increasing
   frequency of errors.
4. Go to the manufacturer's web site and see if they have anything that
   can help diagnose the problem.

Good luck.

Dave Bodenstab
imdave@mcs.net
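
Suggestion 3 (watching for an increasing frequency of errors) can be
roughly automated. What follows is only a minimal sketch, assuming syslog
lines shaped like the /var/log/messages excerpt above. The function name
count_disk_errors and the keyword list are illustrative assumptions, not
part of any FreeBSD tool.

```python
import re
from collections import Counter

# Matches syslog lines shaped like the excerpt, for example:
#   Aug 4 15:37:01 base486 /kernel: wd0: interrupt timeout:
# The optional \w* after the drive name absorbs partition/slice suffixes
# such as the "a" in wd0a or the "s3e" in wd0s3e.
LINE_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+\d{2}:\d{2}:\d{2}\s+\S+\s+"
    r"/kernel:\s+(?P<dev>wd\d+)\w*:\s+(?P<msg>.*)$"
)

# Assumption: these phrases are taken from the messages quoted in the
# excerpt; other kernels or drivers may word their errors differently.
ERROR_KEYWORDS = (
    "interrupt timeout",
    "hard error",
    "timeout waiting to give command",
)


def count_disk_errors(lines):
    """Return a Counter mapping (month, day, drive) to the number of error lines."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and any(k in m.group("msg") for k in ERROR_KEYWORDS):
            counts[(m.group("month"), int(m.group("day")), m.group("dev"))] += 1
    return counts
```

To use it, pass an open log file, e.g. count_disk_errors(open("/var/log/messages")),
and compare the per-day totals; a count that keeps climbing for one drive is
the "increasing frequency" worth acting on.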