Date: Tue, 7 Jul 2009 13:54:01 -0700
From: Freddie Cash <fjwcash@gmail.com>
To: freebsd-stable@freebsd.org
Subject: Re: ZFS: drive replacement performance

On Tue, Jul 7, 2009 at 12:56 PM, Mahlon E. Smith wrote:

> I've got a 9 sata drive raidz1 array, started at version 6, upgraded to
> version 13.  I had an apparent drive failure, and then at some point, a
> kernel panic (unrelated to ZFS.)  The reboot caused the device numbers
> to shuffle, so I did an 'export/import' to re-read the metadata and get
> the array back up.
>
This is why we've started using glabel(8) to label our drives, and then add
the labels to the pool:

  # zpool create store raidz1 label/disk01 label/disk02 label/disk03

That way, it doesn't matter where the kernel detects the drives or what the
physical device node is called: GEOM picks up the label, and ZFS uses the
label.

> Once I swapped drives, I issued a 'zpool replace'.
>
See comment at the end: what's the replace command that you used?

> That was 4 days ago now.  The progress in a 'zpool status' looks like
> this, as of right now:
>
>   scrub: resilver in progress for 0h0m, 0.00% done, 2251h0m to go
>
> ... which is a little concerning, since a) it appears to have not moved
> since I started it, and b) I'm in a DEGRADED state until it finishes...
> if it finishes.
>
There's something wrong here; it definitely should be incrementing.  Even
when we did the foolish thing of creating a 24-drive raidz2 vdev and had to
replace a drive, the progress counter did change.  It never got above 39%
because the resilver kept restarting, but it did increment.
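As an aside, coming back to glabel(8): the labelling itself is only a couple
of commands.  A rough sketch, assuming three disks that show up as da0
through da2 (the label names and device nodes here are just examples, so
substitute your own):

  # glabel label -v disk01 da0
  # glabel label -v disk02 da1
  # glabel label -v disk03 da2
  # zpool create store raidz1 label/disk01 label/disk02 label/disk03

The labels appear under /dev/label/ and stay with the drive no matter which
daX node it attaches as after a reboot or a controller reshuffle.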
> So, I reach out to the list!
>
> - Is the resilver progress notification in a known weird state under
>   FreeBSD?
>
> - Anything I can do to kick this in the pants?  Tuning params?
>
I'd redo the replace command, then check the output of "zpool status" to
make sure it shows the proper device node, and not the random string of
numbers it's showing now.

> - This was my first drive failure under ZFS -- anything I should have
>   done differently?  Such as NOT doing the export/import?  (Not sure
>   what else I could have done there.)
>
If you knew which drive it was, I'd have shut down the server and replaced
it, so that the drives came back up numbered correctly.  This happened to
us once when I was playing around with simulating dead drives (pulling
drives) and rebooting.  That's when I moved over to using glabel labels.

> % zpool status store
>   pool: store
>  state: DEGRADED
> status: One or more devices is currently being resilvered.  The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>  scrub: resilver in progress for 0h0m, 0.00% done, 2251h0m to go
> config:
>
>         NAME                       STATE     READ WRITE CKSUM
>         store                      DEGRADED     0     0     0
>           raidz1                   DEGRADED     0     0     0
>             da0                    ONLINE       0     0     0  274K resilvered
>             da1                    ONLINE       0     0     0  282K resilvered
>             replacing              DEGRADED     0     0     0
>               2025342973333799752  UNAVAIL      3 4.11K     0  was /dev/da2
>               da8                  ONLINE       0     0     0  418K resilvered
>             da2                    ONLINE       0     0     0  280K resilvered
>             da3                    ONLINE       0     0     0  269K resilvered
>             da4                    ONLINE       0     0     0  266K resilvered
>             da5                    ONLINE       0     0     0  270K resilvered
>             da6                    ONLINE       0     0     0  270K resilvered
>             da7                    ONLINE       0     0     0  267K resilvered
>
> errors: No known data errors
>
> -----------------------------------------------------------------------
>
> % zpool iostat -v
>                               capacity     operations    bandwidth
> pool                        used  avail   read  write   read  write
> -------------------------  -----  -----  -----  -----  -----  -----
> store                      1.37T  2.72T     49    106   138K   543K
>   raidz1                   1.37T  2.72T     49    106   138K   543K
>     da0                        -      -     15     62  1017K  79.9K
>     da1                        -      -     15     62  1020K  80.3K
>     replacing                  -      -      0    103      0  88.3K
>       2025342973333799752      -      -      0      0  1.45K    261
>       da8                      -      -      0     79  1.45K  98.2K
>     da2                        -      -     14     62   948K  80.3K
>     da3                        -      -     13     62   894K  80.0K
>     da4                        -      -     14     63   942K  80.3K
>     da5                        -      -     15     62   992K  80.4K
>     da6                        -      -     15     62  1000K  80.1K
>     da7                        -      -     15     62  1022K  80.1K
> -------------------------  -----  -----  -----  -----  -----  -----
>
That definitely doesn't look right: it should be showing the device name in
the "replacing" section, not that string of numbers.  What's the exact
"zpool replace" command that you used?

--
Freddie Cash
fjwcash@gmail.com
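P.S.  For reference, the general form of the replace I'd issue here is
something like this (just a sketch; the GUID and da8 below are taken from
your "zpool status" output above, so double-check them against your own
system first):

  # zpool replace store 2025342973333799752 da8

i.e. 'zpool replace <pool> <old disk or its GUID> <new disk>'.  If the new
drive had been given a glabel first, the last argument would be the label
(for example, label/disk09) instead of the raw device node.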