Date:      Sun, 4 Oct 2009 19:47:47 +0200
From:      Pawel Jakub Dawidek <pjd@FreeBSD.org>
To:        Solon Lutz <solon@pyro.de>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: Help needed! ZFS I/O error recovery?
Message-ID:  <20091004174746.GF1660@garage.freebsd.pl>
In-Reply-To: <683849754.20091001110503@pyro.de>
References:  <683849754.20091001110503@pyro.de>

On Thu, Oct 01, 2009 at 11:05:03AM +0200, Solon Lutz wrote:
> Hi everybody,
>
> I'm faced with a 10TB ZFS pool on a 12TB RAID6 Areca controller.
> And yes, I know, you shouldn't put a zpool on a RAID-device... =(

Just to be sure: you have no redundancy at the ZFS level at all? That's
a very, very bad idea for important data (you know that already, but this
is to warn others)...
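
For others reading along: ZFS-level redundancy means building the pool
from the individual disks rather than from a single RAID6 LUN, roughly
like this (just a sketch; the device names and pool name are made-up
examples):

    # Let ZFS handle redundancy itself by giving it the raw disks
    # (da1..da6 and the pool name "tank" are hypothetical).
    zpool create tank raidz2 da1 da2 da3 da4 da5 da6

    # With raidz2 the pool survives two failed disks, and a scrub can
    # repair bad blocks from redundant copies instead of only reporting them.
    zpool scrub tank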

> The cable was replaced, a parity check was run on the RAID-Volume and
> showed no errors; the ZFS scrub, however, showed some 'defective' files.
> After copying these files with 'dd conv=noerror ...' and comparing them
> to the originals, they were error-free.
>
> Yesterday, however, three more defective cables forced the controller
> to take the RAID6 volume offline. Now all cables were replaced and a parity
> check was run on the RAID-Volume -> data integrity OK.

This means absolutely nothing. It just means that the parity matches the
actual data; it doesn't mean the data is fine from a file system or
application perspective.
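
(For reference, the file-level rescue copy described above would look
roughly like this; the paths are made-up examples:)

    # Copy a file that throws read errors, padding unreadable blocks
    # with zeros instead of aborting, then compare against a known-good copy.
    dd if=/temp/space1/file.dat of=/tmp/file.copy bs=128k conv=noerror,sync
    cmp /tmp/file.copy /backup/file.dat && echo files match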

> But now ZFS refuses to mount all volumes:
>
> Solaris: WARNING: can't process intent log for temp/space1
> Solaris: WARNING: can't process intent log for temp/space2
> Solaris: WARNING: can't process intent log for temp/space3
> Solaris: WARNING: can't process intent log for temp/space4
>
> A scrub revealed the following:
>=20
> errors: Permanent errors have been detected in the following files:
>=20
>         temp:<0x0>
>         temp/space1:<0x0>
>         temp/space2:<0x0>
>         temp/space3:<0x0>
>         temp/space4:<0x0>
>
>
> I tried to switch off checksums for this pool, but that didn't help in any
> way. I also mounted the pool by hand and was faced with 'empty' volumes
> and 'I/O errors' when trying to list their contents...
>
> Any suggestions? I'm offering some self-made blackberry jam and raspberry brandy
> to the person who can help to restore or back up the data.
>
> Tech specs:
>
> FreeBSD 7.2-STABLE #21: Tue May  5 18:44:10 CEST 2009 (AMD64)
> da0 at arcmsr0 bus 0 target 0 lun 0
> da0: <Areca ARC-1280-VOL#00 R001> Fixed Direct Access SCSI-5 device
> da0: 166.666MB/s transfers (83.333MHz DT, offset 32, 16bit)
> da0: Command Queueing Enabled
> da0: 10490414MB (21484367872 512 byte sectors: 255H 63S/T 1337340C)
> ZFS filesystem version 6
> ZFS storage pool version 6

If you are able to back up your disks, do it before we go further. I have
some ideas, but they could mess up your data even further.
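
For example, a raw image of the whole LUN, written to separate storage
with enough room for ~10TB (the destination path here is just an example):

    # Image the entire Areca LUN to a file on *separate* storage before
    # attempting any recovery; noerror,sync pads unreadable blocks with zeros.
    dd if=/dev/da0 of=/backup/da0.img bs=1m conv=noerror,sync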

First of all, I'd start with upgrading the system to stable/8; its error
recovery may be better.

Do not write anything new to the pool; actually, do not even read from it,
as reading may trigger writes as well.
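
Until the backup exists, one way to make sure nothing touches the pool by
accident is to keep it exported (a sketch, assuming the pool is named
"temp" as in the scrub output above):

    # Export the pool so nothing mounts or writes to it by accident;
    # import it again only once the raw backup is safely in place.
    zpool export temp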

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
