From owner-freebsd-hackers@FreeBSD.ORG Fri Oct 24 15:33:25 2014
Date: Fri, 24 Oct 2014 09:33:22 -0600
Sender: asomers@gmail.com
Subject: Re: ZFS errors on the array but not the disk.
From: Alan Somers
To: Zaphod Beeblebrox
Cc: freebsd-fs, FreeBSD Hackers
List-Id: Technical Discussions relating to FreeBSD

On Thu, Oct 23, 2014 at 11:37 PM, Zaphod Beeblebrox wrote:
> What does it mean when checksum errors appear on the array (and the vdev)
> but not on any of the disks? See the paste below. One would think that
> there isn't some ephemeral data stored somewhere that is not one of the
> disks, yet "cksum" errors show only on the vdev and the array lines. Help?
>
> [2:17:316]root@virtual:/vr2/torrent/in> zpool status
>   pool: vr2
>  state: ONLINE
> status: One or more devices is currently being resilvered. The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>   scan: resilver in progress since Thu Oct 23 23:11:29 2014
>         1.53T scanned out of 22.6T at 62.4M/s, 98h23m to go
>         119G resilvered, 6.79% done
> config:
>
>         NAME               STATE     READ WRITE CKSUM
>         vr2                ONLINE       0     0    36
>           raidz1-0         ONLINE       0     0    72
>             label/vr2-d0   ONLINE       0     0     0
>             label/vr2-d1   ONLINE       0     0     0
>             gpt/vr2-d2c    ONLINE       0     0     0  block size: 512B configured, 4096B native  (resilvering)
>             gpt/vr2-d3b    ONLINE       0     0     0  block size: 512B configured, 4096B native
>             gpt/vr2-d4a    ONLINE       0     0     0  block size: 512B configured, 4096B native
>             ada14          ONLINE       0     0     0
>             label/vr2-d6   ONLINE       0     0     0
>             label/vr2-d7c  ONLINE       0     0     0
>             label/vr2-d8   ONLINE       0     0     0
>           raidz1-1         ONLINE       0     0     0
>             gpt/vr2-e0     ONLINE       0     0     0  block size: 512B configured, 4096B native
>             gpt/vr2-e1     ONLINE       0     0     0  block size: 512B configured, 4096B native
>             gpt/vr2-e2     ONLINE       0     0     0  block size: 512B configured, 4096B native
>             gpt/vr2-e3     ONLINE       0     0     0
>             gpt/vr2-e4     ONLINE       0     0     0  block size: 512B configured, 4096B native
>             gpt/vr2-e5     ONLINE       0     0     0  block size: 512B configured, 4096B native
>             gpt/vr2-e6     ONLINE       0     0     0  block size: 512B configured, 4096B native
>             gpt/vr2-e7     ONLINE       0     0     0  block size: 512B configured, 4096B native
>
> errors: 43 data errors, use '-v' for a list

The checksum errors will appear on the raidz vdev instead of a leaf if
vdev_raidz.c can't determine which leaf vdev was responsible. This could
happen if two or more leaf vdevs return bad data for the same block, which
would also lead to unrecoverable data errors. I see that you have some
unrecoverable data errors, so maybe that's what happened to you.

Subtle design bugs in ZFS can also lead to vdev_raidz.c being unable to
determine which child was responsible for a checksum error. However, I've
only seen that happen when a raidz vdev has a mirror child. That can only
happen if the child is a spare or replacing vdev. Did you activate any
spares, or did you manually replace a vdev?

-Alan
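To illustrate the attribution logic described above, here is a toy sketch in C of how a raidz1-style layer can decide which child to blame. It is NOT the actual vdev_raidz.c code: the real implementation works on sector columns with Fletcher/SHA checksums stored in the block pointer, while this model uses a byte-sum "checksum" and XOR parity (raidz1's single parity is XOR). All names (`attribute_error`, `toy_cksum`, `rebuild`) are invented for this example. The key idea is the same: assume each child in turn is the bad one, reconstruct from parity, and only charge that child if exactly one such reconstruction passes the checksum; otherwise the error is charged to the raidz vdev itself.

```c
#include <stdint.h>
#include <string.h>

#define NDATA 3   /* data columns (children) in this toy raidz1 */
#define COLSZ 4   /* bytes per column */

/* Toy "checksum": byte sum of the reassembled data.  Real ZFS
 * verifies a Fletcher or SHA-256 checksum from the block pointer. */
static unsigned toy_cksum(const uint8_t d[NDATA][COLSZ]) {
    unsigned s = 0;
    for (int c = 0; c < NDATA; c++)
        for (int i = 0; i < COLSZ; i++)
            s += d[c][i];
    return s;
}

/* Rebuild column `bad` from the others plus XOR parity `p`. */
static void rebuild(uint8_t d[NDATA][COLSZ], const uint8_t p[COLSZ], int bad) {
    for (int i = 0; i < COLSZ; i++) {
        uint8_t v = p[i];
        for (int c = 0; c < NDATA; c++)
            if (c != bad)
                v ^= d[c][i];
        d[bad][i] = v;
    }
}

/* Blame each child in turn.  Return the child's index if exactly one
 * single-child reconstruction matches the expected checksum; return -1
 * (charge the raidz vdev) if none does -- e.g. when two or more
 * children returned bad data for the same block. */
static int attribute_error(const uint8_t d[NDATA][COLSZ],
                           const uint8_t p[COLSZ], unsigned expected) {
    int culprit = -1, matches = 0;
    for (int bad = 0; bad < NDATA; bad++) {
        uint8_t trial[NDATA][COLSZ];
        memcpy(trial, d, sizeof(trial));
        rebuild(trial, p, bad);
        if (toy_cksum(trial) == expected) {
            culprit = bad;
            matches++;
        }
    }
    return (matches == 1) ? culprit : -1;
}
```

With one corrupted child, `attribute_error` returns that child's index (a CKSUM count on the leaf); with two corrupted children, no single reconstruction can pass the checksum, so it returns -1, matching the situation in the paste: the CKSUM count lands on the raidz vdev and the block becomes an unrecoverable data error.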