Date:      Fri, 22 Jul 2011 19:22:46 +0300
From:      Alexander Motin <mav@FreeBSD.org>
To:        lev@FreeBSD.org
Cc:        freebsd-hardware@freebsd.org
Subject:   Re: ahci.ko / geom_mirror / zfs hangs up system when one of HDDs fauilts.
Message-ID:  <4E29A3D6.1080609@FreeBSD.org>
In-Reply-To: <1981757790.20110720013856@serebryakov.spb.ru>
References:  <1981757790.20110720013856@serebryakov.spb.ru>

Lev Serebryakov wrote:
>   I have had two identical livelocks when an HDD fails on an
> 8.2-STABLE system with two SATA HDDs with gmirror and ZFS on them.
> 
>   It is a Hetzner-based server, so the only access I have is the LARA
> console, but the symptoms are identical in both cases: an HDD goes bad,
> ahci.ko complains about timeouts, and after that the server stops
> responding to high-level access attempts (ssh/HTTP/SMTP), but can still
> be pinged at both its IPv4 and IPv6 addresses.
> 
>  The HDDs are identical, and they are split into several (BSD) partitions.
> Some partitions are mirrored with geom_mirror and one pair of
> partitions is added to a (mirrored) ZFS pool like this (I provide output
> from the rebooted one-HDD-only system, but I think it is clear how it
> looks when both HDDs are OK):
> 
>  A screenshot of the LARA console in such a case is attached.

The kernel messages look as if the controller or the device is stuck,
unable to complete some command, and cannot recover from that condition
even after a device hard reset. I don't see what the driver can do about
it, except be more aggressive in dropping a faulty device after several
consecutive timeouts. If that is not the way out you want, start by
updating the card BIOS and the device firmware.
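
To make the "drop after several consecutive timeouts" idea concrete, here
is a minimal Python sketch of such a policy. It is only an illustration,
not code from the ahci driver: the class name DeviceHealth, its methods,
and the threshold of 3 are all assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass
class DeviceHealth:
    """Hypothetical per-device tracker of consecutive command timeouts."""

    # Assumed threshold for illustration; not a real driver tunable.
    max_consecutive_timeouts: int = 3
    consecutive_timeouts: int = 0
    dropped: bool = False

    def on_command_done(self) -> None:
        # A successful completion ends the current streak of timeouts.
        if not self.dropped:
            self.consecutive_timeouts = 0

    def on_command_timeout(self) -> bool:
        """Record one timeout; return True once the device should be dropped."""
        if self.dropped:
            return True
        self.consecutive_timeouts += 1
        if self.consecutive_timeouts >= self.max_consecutive_timeouts:
            # Stop resetting and retrying; let the upper layers (for
            # example a mirror) continue on the remaining device.
            self.dropped = True
        return self.dropped
```

Counting only consecutive timeouts means an occasional slow command does
not cost a healthy disk its place in the mirror, while a device that never
completes anything is given up on after a bounded number of attempts.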

-- 
Alexander Motin


