Date:      Wed, 05 May 2010 16:56:41 +0200
From:      Harald Schmalzbauer <h.schmalzbauer@omnilan.de>
To:        FreeBSD Stable <freebsd-stable@freebsd.org>
Subject:   Re: ZFS (zpool) doesn't detect failed drive
Message-ID:  <4BE18729.3050209@omnilan.de>
In-Reply-To: <4BE16784.8050400@omnilan.de>
References:  <4BE16784.8050400@omnilan.de>

Harald Schmalzbauer wrote on 05.05.2010 14:41 (localtime):
> Hello,
>
> one drive of my mirror failed today, but 'zpool status' shows it "online".
> Every process using a ZFS mount hangs. Also 'zpool offline /dev/ad1'
> hangs indefinitely.
...
Sorry, I made a mistake with zpool create. Somehow the little word
"mirror" must have been lost, so the pool wasn't a mirror but a stripe.
In a stripe, of course, I can't take a single vdev offline. Sorry for the noise.
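
To illustrate the mistake, here is a minimal sketch of how the layout could be checked right after creation. The pool name "tank" and the helper name pool_layout are hypothetical; the helper only looks for a vdev line beginning with "mirror" in the config table of 'zpool status' and does not distinguish a stripe from other non-mirror layouts.

```shell
#!/bin/sh
# Intended vs. actual creation commands (pool name "tank" is hypothetical):
#   zpool create tank mirror ad1 ad2   # two-way mirror
#   zpool create tank ad1 ad2          # no "mirror" keyword: plain stripe

# pool_layout: read `zpool status` output on stdin and print "mirror"
# if the config table contains a vdev whose name starts with "mirror",
# otherwise print "no-mirror".
pool_layout() {
    awk '
        $1 == "config:"            { in_cfg = 1; next }  # config table starts here
        in_cfg && $1 ~ /^mirror/   { found = 1 }         # mirror vdev line
        END { print (found ? "mirror" : "no-mirror") }
    '
}
```

On a live system this would be used as 'zpool status tank | pool_layout'.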
But I took the opportunity to run some tests with that failing drive and
created a _real_ mirror. Creating it works without errors, but using the
mirror again leads to:
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ad1: TIMEOUT - FLUSHCACHE48 retrying (1 retry left)
ata3: port is not ready (timeout 10000ms) tfd = 00000080
ata3: hardware reset timeout
ad1: FAILURE - device detached

Now zpool reports the vdev ad1 as still online, although it has been
detached and 'atacontrol list' doesn't show it anymore:

zpool status
   pool: URUBAmirrorP1
  state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
         attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
         using 'zpool clear' or replace the device with 'zpool replace'.
    see: http://www.sun.com/msg/ZFS-8000-9P
  scrub: none requested
config:

         NAME        STATE     READ WRITE CKSUM
         URUBAmirrorP1  ONLINE       0     0     0
           mirror    ONLINE       0     0     0
             ad1     ONLINE       3  302K     0
             ad2     ONLINE       0     0     0

errors: No known data errors

atacontrol list
ATA channel 2:
     Master:  ad0 <TRANSCEND/20090520> SATA revision 1.x
     Slave:       no device present
ATA channel 3:
     Master:      no device present
     Slave:       no device present
ATA channel 4:
     Master:  ad2 <SAMSUNG HD154UI/1AG01118> SATA revision 2.x
     Slave:       no device present
ATA channel 5:
     Master:  ad3 <ST3750640NS/3.AEG> SATA revision 1.x
     Slave:       no device present
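
The mismatch between the two outputs above can be detected mechanically. The following is only a rough sketch under some assumptions: the helper name missing_devices is made up, both outputs are first saved to files, and pool members are assumed to use the ata(4) naming "adN".

```shell
#!/bin/sh
# missing_devices ZPOOL_STATUS_FILE ATACONTROL_LIST_FILE
# Print every adN device that appears in the saved `zpool status` output
# but is absent from the saved `atacontrol list` output.
# Assumption: pool members are named adN, as in the output above.
missing_devices() {
    awk -v ata="$2" '
        BEGIN {
            # Collect attached devices from the Master:/Slave: lines.
            while ((getline line < ata) > 0) {
                split(line, f, " ")
                if ((f[1] == "Master:" || f[1] == "Slave:") && f[2] ~ /^ad[0-9]+$/)
                    present[f[2]] = 1
            }
            close(ata)
        }
        # Pool member lines: NAME STATE READ WRITE CKSUM
        $1 ~ /^ad[0-9]+$/ && !($1 in present) {
            print $1 ": " $2 " in pool, missing from atacontrol list"
        }
    ' "$1"
}
```

With the outputs shown above saved via 'zpool status > zs.txt' and 'atacontrol list > al.txt', 'missing_devices zs.txt al.txt' would flag ad1 only.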

How should such a failure be handled?
Do I have to manually mark the drive offline for zpool?

Thanks,

-Harry




