Date: Sun, 07 Jun 2009 01:39:26 -0500
From: "James R. Van Artsdalen" <james-freebsd-fs2@jrv.org>
To: freebsd-fs@freebsd.org
Subject: Reproducible ZFS checksum error, svn192136 (05/14/2009)
Message-ID: <4A2B609E.4060702@jrv.org>

I am able to reproduce a ZFS checksum error. I believe I have ruled out
the hard disks, controllers, cables, etc. - the usual suspects. I have not
ruled out the computer itself, as I don't have anything else similar to
test with. No other errors are seen on that computer.

Hardware:

  A Dell 435MT (Core i7) at 2.66 GHz with 12GB of RAM
  2 Silicon Image 3132 PCI-e cards with 2 eSATA ports each
  1 Addonics PCI-e card with 4 eSATA ports that identifies itself as a
    Silicon Image 3124
  Samsung 1TB disk

The error happens with any of the eSATA cards or the onboard Intel chipset
eSATA controller.
The error happens with any hard disk, enclosure or cabling.
There are no I/O errors in the logs, and when I use an external hardware
RAID it reports no errors from the disks and none reported to the host.

Software: svn 192136 (Thu, 14 May 2009), amd64, GENERIC config

The disk is partitioned like this, with a UFS work area at the end and the
area up front being Mac OS X compatible. The system boots from UFS, not
ZFS.

# gpart show
=>        34  1953525101  ad12  GPT  (932G)
          34           6        - free -  (3.0K)
          40      409600     1  efi  (200M)
      409640  1869229256     2  !6a898cc3-1dd2-11b2-99a6-080020736631  (891G)
  1869638896         128     3  freebsd-boot  (64K)
  1869639024     4194304     4  freebsd-ufs  (2.0G)
  1873833328    33554432     5  freebsd-swap  (16G)
  1907387760     4194304     6  freebsd-ufs  (2.0G)
  1911582064    33554432     7  freebsd-ufs  (16G)
  1945136496     8388608     8  freebsd-ufs  (4.0G)
  1953525104          31        - free -  (16K)

For ease of moving the disk between SATA ports, each UFS and swap
partition is labeled with gmirror:

# gmirror status
        Name    Status  Components
mirror/sroot  COMPLETE  ad12p4
mirror/sswap  COMPLETE  ad12p5
 mirror/stmp  COMPLETE  ad12p6
 mirror/susr  COMPLETE  ad12p7
 mirror/svar  COMPLETE  ad12p8

/boot/loader.conf contains

zfs_load="YES"
vm.kmem_size="1536M"
vm.kmem_size_min="1536M"
vfs.root.mountfrom="ufs:mirror/sroot"
kern.maxfiles="32K"
kern.ktrace.request_pool="512"
geom_mirror_load="YES"          # RAID1 disk driver (see gmirror(8))
vfs.zfs.debug=1
#vfs.zfs.prefetch_disable=1
loader_logo="beastie"           # Desired logo: fbsdbw, beastiebw, beastie, none
boot_verbose="YES"              # -v: Causes extra debugging information to be printed

To reproduce:

1. Start a buildworld loop on UFS:

    cd /usr/src
    while true
    do
        make clean
        make buildworld
        touch "done-`date`"
    done

2. Start writes to ZFS with rsync.

   Make a clean pool:

    zpool create pool ad12p2

   Start an rsync copying data to ZFS. I'm copying from a Mac mini over
   the network, which gets about 20 MB/s when the systems are not loaded.

3. Run "zpool scrub pool". As each scrub completes, start a new one.

At some point a scrub will report one or more checksum errors, usually
within the first 500GB of the rsync; sometimes it takes a few TB.

I'm wondering if anyone else is able to try something similar, with I/O to
UFS and ZFS, and scrubs, all on one disk, on a system with >> 4GB RAM.

PS. We need a debug sysctl to make ZFS return data from a block with a
checksum error so we can easily see what data is on disk.
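
For anyone trying to reproduce this, the repeated scrub in step 3 can be sketched as a small sh script. This is only a rough sketch, not what I actually ran: the function names (scrub_in_progress, pool_clean, scrub_loop) are made up for illustration, and the strings "scrub in progress" and "No known data errors" are assumptions about what zpool status prints, so check them against your release first.

```shell
#!/bin/sh
# Sketch only. The matched strings below are assumptions about the
# text printed by "zpool status"; verify them on your system.

# True while a scrub is still running on pool $1.
scrub_in_progress() {
    zpool status "$1" | grep -q "scrub in progress"
}

# True if "zpool status" reports no known data errors on pool $1.
pool_clean() {
    zpool status "$1" | grep -q "No known data errors"
}

# Scrub pool $1 over and over until an error is reported.
# $2 is the polling interval in seconds (default 60).
scrub_loop() {
    pool=$1
    interval=${2:-60}
    pass=0
    while pool_clean "$pool"; do
        pass=$((pass + 1))
        echo "scrub pass $pass started: $(date)"
        zpool scrub "$pool" || return 1
        # Wait for this scrub to finish before starting the next one.
        while scrub_in_progress "$pool"; do
            sleep "$interval"
        done
    done
    echo "error reported after $pass scrub pass(es): $(date)"
    zpool status -v "$pool"
}
```

Run it as "scrub_loop pool 60" while the buildworld loop and the rsync are going. The check is coarse: a scrub can repair a bad block and still print "No known data errors", so look at the CKSUM column in the zpool status output as well.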