Date:      Tue, 25 Aug 2009 00:32:47 +0200
From:      Roland Smith <rsmith@xs4all.nl>
To:        Kelly Martin <kellymartin@gmail.com>
Cc:        FreeBSD Questions <freebsd-questions@freebsd.org>
Subject:   Re: hard disk failure - now what?
Message-ID:  <20090824223247.GD43410@slackbox.xs4all.nl>
In-Reply-To: <1338880b0908241129p75b6845cg26d21804e118364@mail.gmail.com>
References:  <1338880b0908241129p75b6845cg26d21804e118364@mail.gmail.com>

On Mon, Aug 24, 2009 at 12:29:19PM -0600, Kelly Martin wrote:
> I just experienced a hard drive failure on one of my FreeBSD 7.2
> production servers with no backup! I am so mad at myself for not
> backing up!!

Welcome to the club. :-)

> Now it's a salvage operation. Here are the type of errors
> I was getting on the console, over-and-over:
>
> ad4: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=441633503
> ad4: WARNING - SETFEATURES ENABLE RCACHE taskqueue timeout -
> completing request directly
> ad4: WARNING - SETFEATURES ENABLE WCACHE taskqueue timeout -
> completing request directly
> ad4: WARNING - SET_MULTI taskqueue timeout - completing request directly
> ad4: FAILURE - WRITE_DMA48 timed out LBA=441633375
> g_vgs_done():ad4s1f[WRITE(offset=216338284544, length=16384)]error = 5

It _could_ just be a bad or improperly connected SATA cable. Try changing or
re-seating the cable.

Read errors cannot damage your data, but write errors can! Immediately stop
all writing to the disk. Re-mount the partitions on that disk as read-only, or
unmount them.
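
As a rough sketch of that step, assuming a UFS file system and a root shell:
the helper name make_readonly and the mount point /data are made up for
illustration, so substitute wherever ad4s1f is actually mounted.

```shell
# Remount a file system read-only in place; if that fails (for example
# because files are still open for writing), fall back to unmounting it.
make_readonly() {
    mount -u -o ro "$1" || umount "$1"
}

# Usage (as root); /data is a hypothetical mount point:
#   make_readonly /data
```

If both commands fail, something still has files open on that file system
and has to be stopped first.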

To see if a disk really is broken, install sysutils/smartmontools, and run
'smartctl -a' on the disk. If you see errors in its report (e.g. reallocated
sectors), the disk is dying and should be unplugged to prevent it from getting
worse.
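
A minimal sketch of that check, assuming the disk is /dev/ad4 as in the
console messages above. The helper name reallocated_sectors is made up; it
only pulls the raw value of the Reallocated_Sector_Ct attribute out of the
attribute table that 'smartctl -A' prints. Not every drive reports that
attribute under that name.

```shell
# Print the raw Reallocated_Sector_Ct value from 'smartctl -A' output
# read on stdin. RAW_VALUE is the tenth column of the attribute table.
reallocated_sectors() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Usage (as root), assuming the failing disk is /dev/ad4:
#   smartctl -a /dev/ad4 | less                  # full report
#   smartctl -A /dev/ad4 | reallocated_sectors   # just the raw count
```

A raw count above zero means the drive has already had to remap sectors.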

> My question: what kind of checks and/or repair tools should I run on
> the damaged drive after it's mounted?

As others have mentioned, first make a copy (with the disk unmounted) of the
partitions on that disk with dd, saving them to another drive. That way you
can experiment with the data without further deterioration of the
original. You can use this disk image e.g. as a vnode-backed memory disk, see
mdconfig(8). If you cannot get a good copy of the disk partitions it might be
a good idea to get a quote from a professional hard drive data recovery
company to do that for you. I've never had occasion to try this (hooray for
backups) but I've heard it can be quite expensive. :-/
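
The copy-and-attach procedure could look roughly like this. It is a sketch
under assumptions: the helper names, the /backup directory and the /mnt
mount point are all made up, and the 64k block size is just a reasonable
guess. With conv=noerror,sync dd keeps going after a read error and pads
the unreadable block with zeros, so everything after it stays at the right
offset in the image.

```shell
# Copy a partition (or any file) to an image, continuing past read errors.
image_partition() {
    dd if="$1" of="$2" bs=64k conv=noerror,sync
}

# Attach an image as a vnode-backed memory disk and mount it read-only.
# Without -u, mdconfig prints the name of the device it created (e.g. md0).
mount_image() {
    md=$(mdconfig -a -t vnode -f "$1") || return 1
    mount -o ro "/dev/$md" "$2"
}

# Usage (as root, with ad4s1f unmounted); paths are hypothetical:
#   image_partition /dev/ad4s1f /backup/ad4s1f.img
#   mount_image /backup/ad4s1f.img /mnt
```

A smaller block size loses less data around each bad spot, at the cost of
a slower copy.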

Try using fsck_ffs on (copies of) the disk image to see if that can repair
the damage. If the damage is beyond repair for fsck_ffs, you have a real
problem. Of course, if you have a good disk image, your data is still
there, but you might have to use a forensics program like sysutils/sleuthkit
or hexdump to try and piece files together. And even then you cannot be sure
that there is no corrupted data in the files themselves. Good luck with that. :-(
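
One way to keep the master image untouched while experimenting with
fsck_ffs, sketched under assumptions: check_image_copy and the file names
are made up, and there must be enough free space for a second copy. The -n
flag makes fsck_ffs report problems without changing anything.

```shell
# Make a scratch copy of an image, attach it as a memory disk, and run a
# read-only fsck_ffs pass on it. Returns the exit status of fsck_ffs.
check_image_copy() {
    cp "$1" "$2" || return 1
    md=$(mdconfig -a -t vnode -f "$2") || return 1
    fsck_ffs -n "/dev/$md"         # -n: answer "no" to every repair
    status=$?
    mdconfig -d -u "${md#md}"      # detach the memory disk again
    return $status
}

# Usage (as root); file names are hypothetical:
#   check_image_copy /backup/ad4s1f.img /backup/ad4s1f-work.img
```

If the report looks repairable, attach the scratch copy again and run
fsck_ffs -y on it; the master image stays as it was.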


Roland
-- 
R.F.Smith                                   http://www.xs4all.nl/~rsmith/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20090824223247.GD43410>