Date: Sat, 4 Sep 2021 11:56:44 +0200 (CEST)
From: Jimmy Olgeni
To: freebsd-questions@freebsd.org
Subject: Locating ZFS checksum errors

Hi,

Short version of the story: due to a bad RAM stick I managed to collect
some checksum errors on a ZFS pool; they are not reported by a scrub,
but show up when running "zdb -bcsvL".

They look like this:

                          capacity  operations   bandwidth  ---- errors ----
description             used avail  read write  read write  read write cksum
rpool                   469G  451G 1.65K     0  146M     0     0     0     0
  mirror                469G  451G 1.65K     0  146M     0     0     0     0
    /dev/gpt/pool1                   843     0 73.0M     0     0     0    98
    /dev/gpt/pool0                   842     0 73.1M     0     0     0    98

A few of them are logged during the scan:

zdb_blkptr_cb: Got error 97 reading <404, 0, 1, 17> DVA[0]=<0:e0956e000:6000> DVA[1]=<0:1200c25000:6000> [L1 DMU dnode] fletcher4 lz4 unencrypted LE contiguous unique double size=20000L/6000P birth=7322L/7322P fill=5419 cksum=743743d4a15:652404d7275bf1:349b01108bcc58b4:6eb5731a7332a4d1 -- skipping
zdb_blkptr_cb: Got error 97 reading <2271, 0, 5, 0> DVA[0]=<0:c2f30ab000:1000> DVA[1]=<0:c600436000:1000> [L5 DMU dnode] fletcher4 lz4 unencrypted LE contiguous unique double size=20000L/1000P birth=7289L/7289P fill=337300 cksum=85bc497d18:1eb5fc0b1421b:38938f1daa6522b:5ad4e58754321611 -- skipping
zdb_blkptr_cb: Got error 97 reading <3310, 4, 1, 0> DVA[0]=<0:e0956c000:2000> DVA[1]=<0:120086c000:2000> [L1 ZFS directory] fletcher4 lz4 unencrypted LE contiguous unique double size=20000L/2000P birth=7322L/7322P fill=129 cksum=288290d57d8:bc9ebda8906ed:200f1da7dabb56ec:4fcfb4af9ef377a4 -- skipping
zdb_blkptr_cb: Got error 97 reading <3722, 0, 0, 0> DVA[0]=<0:600a59000:1000> DVA[1]=<0:a000c5000:1000> [L0 DMU dnode] fletcher4 lz4 unencrypted LE contiguous unique double size=4000L/1000P birth=7302L/7302P fill=28 cksum=aa07fc9336:1ad004ec7af20:25a14bccd8cf7cf:61322f0ae33d86ad -- skipping
zdb_blkptr_cb: Got error 97 reading <3722, 0, 0, 2> DVA[0]=<0:601948000:1000> DVA[1]=<0:a05199000:1000> [L0 DMU dnode] fletcher4 lz4 unencrypted LE contiguous unique double size=4000L/1000P birth=7316L/7316P fill=20 cksum=67169bc899:13f93c35f3010:1ff78fe1b055272:31b6e7e44bb229c0 -- skipping
zdb_blkptr_cb: Got error 97 reading <3722, 139, 0, 0> DVA[0]=<0:ca000f6000:1000> [L0 ZFS plain file] fletcher4 uncompressed unencrypted LE contiguous unique single size=800L/800P birth=7298L/7298P fill=1 cksum=8af8a441c3:9476600aa3da:63e5ffe2b26478:3244cdb4fc8d9b34 -- skipping
zdb_blkptr_cb: Got error 97 reading <3722, 0, 0, 9> DVA[0]=<0:600881000:1000> DVA[1]=<0:a000ba000:1000> [L0 DMU dnode] fletcher4 lz4 unencrypted LE contiguous unique double size=4000L/1000P birth=7300L/7300P fill=11 cksum=4f4ecf4565:10844f3c60c42:1bdd551a4002c08:fb9b06c01f06226f -- skipping
zdb_blkptr_cb: Got error 97 reading <3722, 385, 0, 0> DVA[0]=<0:e09567000:2000> [L0 ZFS plain file] fletcher4 uncompressed unencrypted LE contiguous unique single size=1400L/1400P birth=7322L/7322P fill=1 cksum=d3768c188b:21a57dbb50d02:37b485d3c25dc20:5c04a9cd9a910a53 -- skipping
zdb_blkptr_cb: Got error 97 reading <3722, 760, 0, 0> DVA[0]=<0:ca00280000:1000> [L0 ZFS plain file] fletcher4 uncompressed unencrypted LE contiguous unique single size=400L/400P birth=7322L/7322P fill=1 cksum=97317b812:7d938af334d:364ea0c01c20e:1030d470306d731 -- skipping

Now, how do I find out which files (or whatever else) are affected, in
order to fix them? :)

I tried to get a detailed log from zdb with all the DVAs and checksums,
but I could not find any match.

--
jimmy
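
For anyone picking this up, the "Got error" lines can at least be reduced to a list of distinct objects, which may be a starting point for tracking down what is affected. The following is only a minimal sketch: it assumes the four numbers in angle brackets are <objset, object, level, blkid> and that the last one is printed in hexadecimal, which is my reading of the zdb output and not something stated in the log itself. The function names and the sample text are made up for illustration.

```python
import re
from collections import defaultdict

# Matches the tuple printed after "reading", e.g. "<3722, 139, 0, 0>".
# Assumption: the fields are <objset, object, level, blkid>.
BOOKMARK_RE = re.compile(
    r"Got error (\d+) reading <(\d+), (\d+), (-?\d+), ([0-9a-fA-F]+)>"
)


def parse_bookmarks(log_text):
    """Return a list of (errno, objset, object, level, blkid) tuples."""
    bookmarks = []
    for m in BOOKMARK_RE.finditer(log_text):
        err, objset, obj, level = (int(m.group(i)) for i in range(1, 5))
        # Assumption: blkid is printed in hex, so parse it base 16.
        blkid = int(m.group(5), 16)
        bookmarks.append((err, objset, obj, level, blkid))
    return bookmarks


def group_by_object(bookmarks):
    """Group (level, blkid) pairs under each (objset, object) key."""
    groups = defaultdict(list)
    for _err, objset, obj, level, blkid in bookmarks:
        groups[(objset, obj)].append((level, blkid))
    return dict(groups)


# Shortened copies of three of the logged lines, for illustration only.
SAMPLE_LOG = """\
zdb_blkptr_cb: Got error 97 reading <3722, 0, 0, 0> DVA[0]=<0:600a59000:1000> -- skipping
zdb_blkptr_cb: Got error 97 reading <3722, 0, 0, 2> DVA[0]=<0:601948000:1000> -- skipping
zdb_blkptr_cb: Got error 97 reading <3722, 139, 0, 0> DVA[0]=<0:ca000f6000:1000> -- skipping
"""

if __name__ == "__main__":
    for key, blocks in sorted(group_by_object(parse_bookmarks(SAMPLE_LOG)).items()):
        print(key, blocks)
```

Grouping by (objset, object) only shrinks the list; mapping each pair back to a dataset and a path is still the open question above.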