Date:      Sun, 24 Jul 2011 01:29:12 +0300
From:      Alexander Motin <mav@FreeBSD.org>
To:        lev@FreeBSD.org
Cc:        freebsd-hardware@freebsd.org
Subject:   Re: ahci.ko / geom_mirror / zfs hangs up system when one of HDDs fauilts.
Message-ID:  <4E2B4B38.70207@FreeBSD.org>
In-Reply-To: <2710115660.20110723004620@serebryakov.spb.ru>
References:  <1981757790.20110720013856@serebryakov.spb.ru> <4E29A3D6.1080609@FreeBSD.org> <2710115660.20110723004620@serebryakov.spb.ru>

Lev Serebryakov wrote:
> Hello, Alexander.
> You wrote on 22 July 2011, 20:22:46:
> 
>>>  Screenshot of LARA console in such case is attached.
>> Kernel messages look as if the controller or device is stuck, unable to
>> complete some command, and can't recover from that condition even after
>> a device hard reset. I don't see what the driver can do about it, except be
>> more aggressive in dropping a faulty device after several consecutive
>> timeouts. If that is not the desired way out, start by updating the card BIOS
>> and device firmware.
>   It is very common hardware: ICH10 on MS-7522 (MSI X58 Platinum) motherboard:
> 
> ahci0: <Intel ICH10 AHCI SATA controller> port 0xb000-0xb007,0xac00-0xac03,0xa880-0xa887,0xa800-0xa803,0xa480-0xa49f mem 0xf9ffa000-0xf9ffa7ff irq 19 at device 31.2 on pci0
> ahci0: [ITHREAD]
> ahci0: AHCI v1.20 with 6 3Gbps ports, Port Multiplier not supported
> 
>   And Samsung F3 drive:
> 
> ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
> ada1: <SAMSUNG HD754JJ 1AJ10001> ATA-8 SATA 2.x device
> ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada1: Command Queueing enabled
> ada1: 715404MB (1465149168 512 byte sectors: 16H 63S/T 16383C)
> 
>   I'm not sure that it is possible to update the firmware on these
> drives. And the motherboard BIOS appears to be the latest one.

Then I have no idea what to do about the cause of the errors. As for the
consequences, I've tried to simulate a similar problem (device detected, but
not responding). Recovery (dropping the failed device) took a long time,
but eventually (after about 10 minutes) it succeeded and ZFS continued
operating without that drive. After that I committed one patch
to HEAD and sent another one to freebsd-scsi@ for review. Together, I
hope, they should significantly speed up that process (down to 1-2 minutes).
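
To illustrate the "drop the faulty device after several consecutive
timeouts" policy mentioned in the quoted text above, here is a minimal
sketch in Python. It is not the ahci(4) or CAM code and does not reflect
the patches mentioned; the class name, method name, and threshold of 3
are all hypothetical, chosen only for illustration.

```python
class TimeoutTracker:
    """Hypothetical policy: drop a device after N consecutive command timeouts."""

    def __init__(self, max_consecutive_timeouts=3):
        # The threshold is an assumption for illustration, not a driver value.
        self.max_consecutive_timeouts = max_consecutive_timeouts
        self.consecutive_timeouts = 0
        self.dropped = False

    def record(self, timed_out):
        """Record one command result; return True once the device is dropped."""
        if self.dropped:
            # A dropped device stays dropped regardless of later results.
            return True
        if timed_out:
            self.consecutive_timeouts += 1
            if self.consecutive_timeouts >= self.max_consecutive_timeouts:
                self.dropped = True
        else:
            # Any successful command resets the streak.
            self.consecutive_timeouts = 0
        return self.dropped
```

A lower threshold gives up on a stuck device sooner, at the cost of
dropping devices that would have recovered after a reset.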

How long did you wait before and after taking that screenshot?

-- 
Alexander Motin


