Date:      Mon, 5 Sep 2005 13:19:12 -0400
From:      Jason Morgan <jwm-freebsd@sentinelchicken.net>
To:        FreeBSD Questions <freebsd-questions@freebsd.org>
Subject:   Re: Hard disk woes
Message-ID:  <20050905171912.GF67960@sentinelchicken.net>
In-Reply-To: <20050905151332.P16924@saturn.araneidae.co.uk>
References:  <20050905151332.P16924@saturn.araneidae.co.uk>

On Mon, Sep 05, 2005 at 03:16:13PM +0000, Michael Abbott wrote:
> I'm having some very odd behaviour from one of my hard disks and I wonder
> what anybody makes of it.
> 
> In brief, the hard disk in question works just fine much of the time, but
> when high volume data transfers are requested I get the following in
> /var/log/messages:
> 
> Sep  3 15:21:02 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
> Sep  3 15:21:02 saturn /kernel: ata3: resetting devices .. done
> Sep  3 15:21:12 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
> Sep  3 15:21:12 saturn /kernel: ata3: resetting devices .. done
> Sep  3 15:21:23 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
> Sep  3 15:21:23 saturn /kernel: ata3: resetting devices .. done
> Sep  3 15:21:33 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
> Sep  3 15:21:33 saturn /kernel: ad6: trying fallback to PIO mode
> Sep  3 15:21:33 saturn /kernel: ata3: resetting devices .. done
> Sep  3 15:21:43 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
> Sep  3 15:21:43 saturn /kernel: ata3: resetting devices .. ata3-slave: ATA identify retries exceeded
> Sep  3 15:21:43 saturn /kernel: done
> 
> After this point the hard disk in question is frozen until I reboot, and
> any process that tries to touch it is similarly frozen (doesn't even
> respond to kill -9).  `shutdown -r` is enough to restore operation, and
> the rest of the system seemed happy enough.
> 
> Another interesting effect.  I placed a replacement hard disk on the same
> ATA bus (as a slave, device ad7) and tried copying files from ad6 to ad7.
> This time when ad6 froze and the kernel decided to give up on ata3 (and so
> decided to disable ad7 at the same time, naturally enough) the entire
> system froze!  No response from the console, stone cold dead, hard reset
> needed.
> 
> 
> So some questions seem to me to arise from this.
> 
> 1.  Why does FreeBSD handle this so ungracefully?  If restarting is
> sufficient to bring ata3 back then can't the ata driver do a proper
> restart?
> 
> 2.  Goodness me, FreeBSD froze!  I know it's a hardware failure, but
> still: it's on an auxiliary ATA controller with no system files attached.
> Is this problem of general interest?  It's certainly a massive hint to me
> not to consider (parallel) ATA for RAID!
> 
> 3.  Any thoughts on what is wrong with the hard disk in question?  I've
> changed ATA controllers, so it seems to be the disk, not the controller.
> The behaviour is very odd.  If I copy files off one at a time, eg using:
>  	find . -type f -exec cp {} "$TARGET/"{} \; -exec echo -n '.' \;
> the disk seems to hang in there, but if I just do
>  	cp -R . "$TARGET"
> then it freezes!  (This statement may not have been thoroughly tested:
> having to restart each time gets old quite quickly.)
> 
> 
> Ok, now for the boring bits.
> 
> $ uname -a
> FreeBSD saturn.araneidae.co.uk 4.11-RELEASE-p11 FreeBSD 4.11-RELEASE-p11 #6: Sat Aug 27 16:33:58 GMT 2005     root@saturn.araneidae.co.uk:/usr/obj/usr/src/sys/GENERIC  i386
> $ dmesg | grep ata
> atapci0: <HighPoint HPT370 ATA100 controller> port 0xa000-0xa0ff,0x9c00-0x9c03,0x9800-0x9807,0x9400-0x9403,0x9000-0x9007 irq 12 at device 11.0 on pci0
> ata2: at 0x9000 on atapci0
> ata3: at 0x9800 on atapci0
> atapci1: <VIA 8233 ATA133 controller> port 0xa800-0xa80f at device 17.1 on pci0
> ata0: at 0x1f0 irq 14 on atapci1
> ata1: at 0x170 irq 15 on atapci1
> atapci2: <HighPoint HPT372 ATA133 controller> port 0xc400-0xc4ff,0xc000-0xc003,0xbc00-0xbc07,0xb800-0xb803,0xb400-0xb407 irq 10 at device 19.0 on pci0
> ata4: at 0xb400 on atapci2
> ata5: at 0xbc00 on atapci2
> ad0: 39083MB <Maxtor 4D040H2> [79408/16/63] at ata0-master UDMA100
> ad1: 190782MB <SAMSUNG SP2014N> [387621/16/63] at ata0-slave UDMA133
> ad4: 76319MB <ST380021A> [155061/16/63] at ata2-master UDMA100
> ad6: 76319MB <ST380021A> [155061/16/63] at ata3-master UDMA100
> acd0: DVD-ROM <CREATIVEDVD-ROM DVD2240E 12/24/97> at ata1-master PIO4
> $ sudo atacontrol cap ata3 0
> ATA channel 3, Master, device ad6:
> 
> ATA/ATAPI revision    5
> device model          ST380021A
> serial number         3HV0MYL9
> firmware revision     3.10
> cylinders             16383
> heads                 16
> sectors/track         63
> lba supported         156301488 sectors
> lba48 not supported
> dma supported
> overlap not supported
> 
> Feature                      Support  Enable    Value   Vendor
> write cache                    yes      yes
> read ahead                     yes      yes
> dma queued                     no       no      0/00
> SMART                          yes      no
> microcode download             yes      yes
> security                       yes      no
> power management               yes      yes
> advanced power management      no       no      65278/FEFE
> automatic acoustic management  yes      yes     128/80  128/80
> $
> 
> That's everything I can think of.
> 

Just a general comment:

I had a very similar problem a while back. After replacing the drive in
question, then replacing the motherboard, I discovered it was a power
issue. The power supply was freaking out at medium to high loads, which
was causing the device to continually reset.
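
If it helps narrow things down, here is a rough sketch (not tested on 4.11) that tallies the "command timeout" lines per device from the log. It assumes the message format shown in your excerpt, with the device name two fields before the word "command". The function name count_ata_timeouts is just something I made up. If only one drive ever shows up, that points at the drive or its cable. Timeouts spread over several drives would hint at something shared, such as the power supply.

```shell
#!/bin/sh
# Count kernel "command timeout" messages per ATA device.
# Usage: count_ata_timeouts [logfile]   (defaults to /var/log/messages)
# Assumes lines shaped like the excerpt in this thread:
#   ... /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
count_ata_timeouts() {
    awk '
        {
            # Look for the adjacent words "command timeout" in the line.
            for (i = 3; i < NF; i++) {
                if ($i == "command" && $(i + 1) == "timeout") {
                    # Device name sits two fields earlier, e.g. "ad6:".
                    dev = $(i - 2)
                    sub(/:$/, "", dev)   # strip the trailing colon
                    count[dev]++
                    break
                }
            }
        }
        # Print one "device count" pair per line (order is unspecified).
        END { for (d in count) print d, count[d] }
    ' "${1:-/var/log/messages}"
}
```

Source the file and run count_ata_timeouts with no argument to read /var/log/messages, or pass the path of a rotated log that you have already decompressed.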

Jason


