Date:      Wed, 16 Oct 2013 14:22:25 +0200
From:      Maurizio Vairani <maurizio.vairani@cloverinformatica.it>
To:        Garrett Wollman <wollman@bimajority.org>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: How to unstick ZFS resilver?
Message-ID:  <525E8501.7080302@cloverinformatica.it>
In-Reply-To: <21084.48646.196295.776944@hergotha.csail.mit.edu>
References:  <21084.48646.196295.776944@hergotha.csail.mit.edu>

On 15/10/2013 06:01, Garrett Wollman wrote:
> I have a large (88-drive) zpool in which a drive was recently
> replaced.  (The pool has a bunch of duff Toshiba MK2001TRKB drives --
> never ever pay money for these! -- and I'm trying to replace them one
> by one before they fail completely.)  The resilver on the first drive
> replacement has been taking much much too long, and currently it's
> stuck in this state:
>
>    pool: export
>   state: DEGRADED
> status: One or more devices is currently being resilvered.  The pool will
>          continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>    scan: resilver in progress since Wed Oct  9 14:54:47 2013
>          86.5T scanned out of 86.8T at 1/s, (scan is slow, no estimated time)
>          982G resilvered, 99.62% done
>
> The overall progress hasn't changed in twelve hours, even across a
> reboot, and the server is fairly lightly loaded.  Searching the Web is
> no help; can anyone suggest a remedial action?  (This is on
> 9.1-RELEASE, with our local patches, and all the drives are SAS.)
>
> In exchange, I offer the following DTrace script which I used to
> identify the slow SAS drives:
>
> #!/usr/sbin/dtrace -s
>
> #pragma D option quiet
> #pragma D option dynvarsize=2m
>
> inline int TOO_SLOW = 100000000;	/* 100 ms */
>
> dtrace:::BEGIN
> {
>          printf("Tracing... Hit Ctrl-C to end.\n");
> }
>
> fbt::dastrategy:entry
> {
>          start_time[(struct buf *)arg0] = timestamp;
> }
>
> fbt::dadone:entry
> /(this->bp = (struct buf *)args[1]->ccb_h.periph_priv.entries[1].ptr) && start_time[this->bp] && (timestamp - start_time[this->bp]) > TOO_SLOW/
> {
>          @[strjoin("da", lltostr(args[0]->unit_number))] = count();
>          start_time[this->bp] = 0;
> }
>
> -GAWollman

Before replacing the drives, have you upgraded the MK2001TRKB firmware?

From newegg.com:
"Make sure the firmware is at 0106. The issue with the old firmware 
(0105) has to do with the drive going into and then coming out of Idle 
mode B. An over voltage condition can possibly occur which will damage 
the head on the drive. This damage can lead to an increased level of 
read errors.

0106 firmware fixes the issue in drives that have not yet failed but it 
does nothing for drives that have already failed and unfortunately there 
is no way to recover those drives."
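
To see which drives in a large pool still report the old firmware, something like the sketch below might help. It is only an illustration, not a tested tool. It assumes each line of `camcontrol devlist` output has the form `<VENDOR PRODUCT REVISION>  at scbusN target N lun N (passN,daN)`, and that revisions are four-digit zero-padded strings, so a plain string comparison orders them correctly. The function name `drives_needing_update` and the sample text are hypothetical.

```python
import re

# Assumed shape of a `camcontrol devlist` line (not verified on this pool):
#   <TOSHIBA MK2001TRKB 0105>   at scbus0 target 8 lun 0 (pass0,da0)
DEVLIST_LINE = re.compile(r"<(?P<inquiry>[^>]*)>\s+at\s+.*\((?P<devs>[^)]*)\)")

MODEL = "MK2001TRKB"
FIXED_FIRMWARE = "0106"  # revision the quoted review says fixes the issue


def drives_needing_update(devlist_text, model=MODEL, fixed=FIXED_FIRMWARE):
    """Return (device, revision) pairs for drives of `model` older than `fixed`."""
    stale = []
    for line in devlist_text.splitlines():
        match = DEVLIST_LINE.search(line)
        if not match:
            continue
        fields = match.group("inquiry").split()
        if len(fields) < 2 or model not in fields:
            continue
        revision = fields[-1]  # assumption: revision is the last inquiry field
        # Prefer the daN name over the passN name when both are listed.
        names = [d for d in match.group("devs").split(",") if d.startswith("da")]
        device = names[0] if names else match.group("devs")
        # Assumption: zero-padded revisions of equal length compare as strings.
        if revision < fixed:
            stale.append((device, revision))
    return stale


# Hypothetical sample, made up for illustration only.
SAMPLE = """\
<TOSHIBA MK2001TRKB 0105>          at scbus0 target 8 lun 0 (pass0,da0)
<TOSHIBA MK2001TRKB 0106>          at scbus0 target 9 lun 0 (da1,pass1)
<SEAGATE ST2000NM0001 0002>        at scbus0 target 10 lun 0 (pass2,da2)
"""

for device, revision in drives_needing_update(SAMPLE):
    print(device, revision)
```

On a real system the input text could come from `subprocess.run(["camcontrol", "devlist"], capture_output=True, text=True).stdout` instead of the sample string.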

Maurizio


