Date:      Mon, 8 Mar 2010 05:46:57 -0600 (CST)
From:      Wes Morgan <morganw@chemikals.org>
To:        Miroslav Lachman <000.fbsd@quip.cz>
Cc:        Eugeny N Dzhurinsky <bofh@redwerk.com>, freebsd-current@freebsd.org
Subject:   Re: A tool for remapping bad sectors in CURRENT?
Message-ID:  <alpine.BSF.2.00.1003080540130.1526@ibyngvyr>
In-Reply-To: <4B94DDC8.5080008@quip.cz>
References:  <20100308102918.GA5485@localhost> <4B94DDC8.5080008@quip.cz>

On Mon, 8 Mar 2010, Miroslav Lachman wrote:

> Eugeny N Dzhurinsky wrote:
> > Hello, all!
> >
> > Recently I've started to see the following logs in messages:
> >
> > Mar  8 12:00:24 localhost smartd[795]: Device: /dev/ad4, 2 Currently unreadable (pending) sectors
> > Mar  8 12:00:24 localhost smartd[795]: Device: /dev/ad4, 2 Offline uncorrectable sectors
> >
> > smartctl did really show that something is wrong with my HDD, but still no
> > remaps - just read errors.
> >
> > SMART Self-test log structure revision number 1
> > Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
> > # 1  Extended offline    Completed: read failure       60%      1198         222342559
> > # 2  Extended offline    Completed: read failure       60%      1187         222342557
> > # 3  Extended offline    Completed: read failure       60%      1180         222342559
> > # 4  Short offline       Completed without error       00%      1178         -
> > # 5  Extended offline    Aborted by host               90%      1178         -
> >
> > and
> >
> > ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
> > ...
> > Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
> > ...
> >
> > How can I find out which file owns LBAs 222342557 and 222342559? How do I
> > force remapping of these sectors? I assume that I have to write something
> > directly to the sectors?
>
> We have this problem from time to time on a bunch of machines. As we are using
> gmirror, the easiest way is to force re-synchronization (a rewrite) of the
> whole drive. The problem arises when there are pending unreadable sectors on
> both drives: the resync ends with a read error and some file(s) are corrupted,
> but there is no easy way (on FreeBSD) to find which file.

*cough* zfs *cough*

I believe this kind of silent corruption is precisely what zfs was
designed to prevent. Even though you do have a mirror, how do you know
which copy is the correct one? If one drive re-allocates the sector
silently, what is the recovery method? If gmirror synchronizes, how do you
make sure that the *good* copy is the one synchronized? You'll notice it
eventually if you see it in a garbled file, but how does the filesystem
handle it?
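
To make the "which copy is the correct one" point concrete, here is a toy
sketch in Python. It is not how ZFS is implemented; it only illustrates the
idea that a checksum kept apart from the data lets you tell a good mirror
copy from a silently corrupted one. SHA-256 is used as a stand-in checksum,
and the names checksum() and pick_good_copy() are made up for this example.

```python
import hashlib


def checksum(data: bytes) -> str:
    # Stand-in checksum for illustration only.
    return hashlib.sha256(data).hexdigest()


def pick_good_copy(copies, expected):
    """Return (index, data) of the first copy matching the stored checksum.

    Returns None when no copy matches, i.e. every copy is damaged.
    """
    for index, data in enumerate(copies):
        if checksum(data) == expected:
            return index, data
    return None


# The checksum is recorded when the block is written, apart from the data.
original = b"important data"
expected = checksum(original)

# Hypothetical mirror: copy 0 was silently corrupted, copy 1 is intact.
copies = [b"important dat\x00", original]
print(pick_good_copy(copies, expected))
```

A plain mirror without such a checksum has two differing copies and nothing
to decide between them.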

> I tried it in the past with fsdb / findblk, but it does not work as I expect,
> or I do not fully understand the needed calculations with slice + partition
> offsets / LBAs and the right meaning of the term "block". It seems to have
> several meanings in different contexts.
>
> It would be nice if somebody with enough FS / GEOM knowledge could write a
> HowTo or shell script to do the calculations and operations to find the file
> containing the bad sector(s) and put it in the FAQ, Handbook, or Wiki.


