Date:      Fri, 19 Aug 2011 16:21:25 -0700
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        Dan Langille <dan@langille.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: bad sector in gmirror HDD
Message-ID:  <20110819232125.GA4965@icarus.home.lan>
In-Reply-To: <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>
References:  <1B4FC0D8-60E6-49DA-BC52-688052C4DA51@langille.org>

On Fri, Aug 19, 2011 at 04:50:01PM -0400, Dan Langille wrote:
> System in question: FreeBSD 8.2-STABLE #3: Thu Mar  3 04:52:04 GMT 2011
> 
> After a recent power failure, I'm seeing this in my logs:
> 
> Aug 19 20:36:34 bast smartd[1575]: Device: /dev/ad2, 2 Currently unreadable (pending) sectors

I doubt this is related to a power failure.

> Searching on that error message, I was led to believe that identifying the bad sector and
> running dd to read it would cause the HDD to reallocate that bad block.
> 
>   http://smartmontools.sourceforge.net/badblockhowto.html

This is incorrect (meaning you've misunderstood what's written there).

Unreadable LBAs can be a result of the LBA being actually bad (as in
uncorrectable), or the LBA being marked "suspect".  In either case the
LBA will return an I/O error when read.

If the LBAs are marked "suspect", the drive re-analyses each LBA when
you **write** to it.  Either the LBA turns out to be readable and its
data is remapped, or it is not and the LBA is marked uncorrectable.

The above smartd output doesn't tell me much.  Providing actual SMART
attribute data (smartctl -a) for the drive would help.  The brand of the
drive, the firmware version, and the model all matter -- every drive
behaves a little differently.
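
As a rough sketch of what to look for in that output: the function
below filters an attribute table (as printed by "smartctl -A") down to
the three attributes that usually matter for pending/bad sectors.  The
function name is hypothetical, and the attribute names are the ones
smartmontools commonly prints; some drives label them differently, so
treat the filter as an assumption.  It does not replace posting the
full "smartctl -a" output.

```shell
#!/bin/sh
# Hypothetical helper: reads a SMART attribute table on stdin and keeps
# the header line plus three sector-health attributes.
# Assumes the attribute name is the second whitespace-separated field.
smart_sector_attrs() {
    awk '$2 == "ATTRIBUTE_NAME" ||
         $2 == "Reallocated_Sector_Ct" ||
         $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable"'
}

# Usage (device name taken from the smartd log line quoted above):
#   smartctl -A /dev/ad2 | smart_sector_attrs
```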

Furthermore, if the LBA is re-analysed and determined to be
uncorrectable -- regardless of remapping -- this doesn't actually fix
I/O errors on a filesystem level.  The filesystem itself can still
contain references to the uncorrectable LBA, and reads of it will
continue to return I/O errors.  More often than not the LBA sits in the
data section of a file/inode, so things like fsck can't work around
this.  There has to be a way to tell the filesystem, when formatted,
"avoid use of this LBA".  How UFS/FFS handles this is unknown to me.  I
know of badsect(8) but I don't know if it works.  "Transparent"
remapping I have never seen work except on SSDs.

If you want me to step you through the procedure of re-testing the LBAs
(assuming they're suspect and not uncorrectable) I can do so, just ask.
Finding the suspect LBAs can be done using a dd loop (I wrote a shell
script for this), or by running "smartctl -t select,0-max /dev/XXX" and
letting the drive's internal selective test see if it can find them.
From there it's a matter of submitting a write request to the LBA and
seeing what happens (I do this via dd as well, but the parameters you
pass it are very specific, e.g. don't mix up/misunderstand seek vs.
skip).
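
To illustrate the seek vs. skip distinction, here is a minimal sketch.
It is not the script mentioned above; the function names are
hypothetical and the 512-byte sector size is an assumption (check the
drive's logical sector size first).  The two "_cmd" helpers only print
the dd command line so it can be reviewed before anything touches the
disk; scan_lbas really does read, one sector at a time, which is slow
on anything but a small range.

```shell
#!/bin/sh
# Sketch only. SECTOR_SIZE=512 is an assumption, not a property of every drive.
SECTOR_SIZE=512

# Print (do not run) the dd command that READS one LBA.
# skip= offsets the INPUT (if=), so it selects which LBA is read.
read_lba_cmd() {
    echo "dd if=$1 of=/dev/null bs=$SECTOR_SIZE skip=$2 count=1"
}

# Print (do not run) the dd command that WRITES zeros to one LBA.
# seek= offsets the OUTPUT (of=), so it selects which LBA is overwritten.
# Running the printed command destroys whatever that sector held.
write_lba_cmd() {
    echo "dd if=/dev/zero of=$1 bs=$SECTOR_SIZE seek=$2 count=1"
}

# Read LBAs start..end one at a time and print each LBA whose read fails.
scan_lbas() {
    dev=$1
    lba=$2
    end=$3
    while [ "$lba" -le "$end" ]; do
        dd if="$dev" of=/dev/null bs="$SECTOR_SIZE" skip="$lba" count=1 \
            2>/dev/null || echo "$lba"
        lba=$((lba + 1))
    done
}
```

For example, "write_lba_cmd /dev/ad2 12345" prints a command using
seek=12345; swapping in skip= there would offset the input (/dev/zero)
instead and overwrite LBA 0 of the disk.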

I've assisted with this time and time again for folks on forums with
varying success.

I've also found some models of drives which claim there are suspect
LBAs yet pass an internal surface scan with no issues.  I own some of
these drives myself; the only difference between my drives and the
affected individuals' drives is firmware, which leads me to believe
there is a firmware bug on some drives in the field.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |



