Date: Tue, 06 Dec 2011 13:19:40 +0100
From: Peter Maloney <peter.maloney@brockmann-consult.de>
To: freebsd-fs@freebsd.org
Subject: Re: weird bug with ZFS and SLOG
Message-ID: <4EDE085C.4020406@brockmann-consult.de>
In-Reply-To: <20111205220715.GA36072@freebsdbox.adamsnet>
References: <20111205220715.GA36072@freebsdbox.adamsnet>

On 12/05/2011 11:07 PM, Adam Stylinski wrote:
> The worst case scenario happened to me where my dedicated SLOG decided to drop off the controller and thus prevent me from importing my pool. I quickly upgrade to FreeBSD 9.0-RC2 after testing this scenario in a VM. It has worked successfully in a VM, but it is not working on my hardware for whatever reason. I rollback the pool with zpool import -F share, seems ok, files are there, finishes scrub, very little corruption. I upgrade the pool to V28, and the fs's to v5. I then do a:
> zpool remove share 15752248745115926170
>
> It returns no errors and pretends like the operation worked, it even appends it to my zpool history. However, when I do a zpool status, this is what I get:
>
> [adam@nasbox ~]$ zpool status
>   pool: share
>  state: DEGRADED
> status: One or more devices has experienced an error resulting in data
>         corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>   scan: scrub repaired 0 in 8h57m with 0 errors on Mon Dec 5 12:48:28 2011
> config:
>
>         NAME                    STATE    READ WRITE CKSUM
>         share                   DEGRADED    0     0     0
>           raidz1-0              ONLINE      0     0     0
>             ada4                ONLINE      0     0     0
>             ada1                ONLINE      0     0     0
>             *ada2*              ONLINE      0     0     0
>             ada3                ONLINE      0     0     0
>           raidz1-1              ONLINE      0     0     0
>             da3                 ONLINE      0     0     0
>             da0                 ONLINE      0     0     0
>             da2                 ONLINE      0     0     0
>             da1                 ONLINE      0     0     0
>           raidz1-2              ONLINE      0     0     0
>             aacd0               ONLINE      0     0     0
>             aacd1               ONLINE      0     0     0
>             aacd2               ONLINE      0     0     0
>             aacd3               ONLINE      0     0     0
>           raidz1-4              ONLINE      0     0     0
>             aacd4               ONLINE      0     0     0
>             aacd5               ONLINE      0     0     0
>             aacd6               ONLINE      0     0     0
>             aacd7               ONLINE      0     0     0
>         logs
>           15752248745115926170  UNAVAIL     0     0     0  was /dev/*ada2*

This looks like another case of not using labels. (see that share has ada2 in the list, but the log "was /dev/ada2"; they must have switched... maybe they also resilvered and your log is overwritten) I did the same thing when I started on FreeBSD and ZFS... nobody warned me either.
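
The collision pointed out above (a live pool member and the missing log both claiming ada2) can also be spotted mechanically. Here is a rough sketch in Python, assuming `zpool status` output laid out like the quoted text; the function name `find_device_collisions` and the set of state names are my own choices for illustration, not part of any ZFS tool.

```python
import re

# vdev states expected in the STATE column (an assumed, not exhaustive, set)
STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}
WAS_RE = re.compile(r"\bwas /dev/(\S+)")


def find_device_collisions(status_text):
    """Return device names that a missing vdev 'was' on and that another
    vdev in the same listing currently uses."""
    active, stale = set(), set()
    for raw in status_text.splitlines():
        # Drop mail quoting ("> ") and emphasis asterisks, if present.
        line = raw.lstrip("> ").replace("*", "").strip()
        fields = line.split()
        # Config rows look like: NAME STATE READ WRITE CKSUM [note]
        if len(fields) < 5 or fields[1] not in STATES:
            continue
        was = WAS_RE.search(line)
        if was:
            stale.add(was.group(1))   # old device node of a missing vdev
        else:
            active.add(fields[0])     # name currently in use
    return sorted(stale & active)
```

Fed the status output quoted above, this should report ada2. It only shows that the name was reused; it cannot tell whether the old log disk was actually overwritten.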

When you reboot, sometimes the disks move around and change numbers. Maybe they are reliable with onboard SATA ports (from my experience), but with more I/O cards, removable media, expanders, etc. they don't seem to ever stay put for me. For me, only the first disk on the back expander and the first disk on the front expander ever seem to be the same, and if I add a new disk in the back, the front ones go up by 1.

When a data disk from my pool would switch places with another data disk from the same pool, ZFS would automatically handle it. But when a hot spare or something else switched places, it would look the same as what you see in your zpool status:

"some big number .... UNAVAIL 0 0 0 was /dev/da#"

Here, I wrote you a howto to explain how to convert to labels:
http://forums.freebsd.org/showthread.php?p=157004

> errors: 3 data errors, use '-v' for a list
>
> Here is the ending output of zpool history:
>
> 2011-12-05.03:38:50 zpool upgrade -V 28 -a
> 2011-12-05.03:39:09 zpool export share
> 2011-12-05.03:39:33 zpool import -m share
> 2011-12-05.03:40:05 zpool remove share 15752248745115926170
> 2011-12-05.03:41:04 zpool remove share 15752248745115926170
> 2011-12-05.03:41:18 zpool export share
> 2011-12-05.03:41:56 zpool import -m share
> 2011-12-05.03:43:47 zpool remove share 15752248745115926170
> 2011-12-05.03:47:54 zpool remove share 15752248745115926170
> 2011-12-05.03:51:20 zpool scrub share
> 2011-12-05.16:33:01 zfs create share/vardb2
> 2011-12-05.16:33:32 zfs set compression=gzip-9 share/vardb2
> 2011-12-05.16:33:38 zfs set atime=off share/vardb2
> 2011-12-05.16:39:37 zfs destroy share/vardb
> 2011-12-05.16:39:47 zfs rename share/vardb2 share/vardb
> 2011-12-05.16:39:53 zfs set mountpoint=/var/db share/vardb
> 2011-12-05.16:47:24 zpool clear share
> 2011-12-05.16:48:41 zpool remove share 15752248745115926170
> 2011-12-05.16:53:15 zpool export -f share
> 2011-12-05.16:55:21 zpool import -m share
> 2011-12-05.16:55:52 zpool remove share 15752248745115926170
> 2011-12-05.16:56:56 zpool remove share -f 15752248745115926170
> 2011-12-05.17:04:07 zpool remove share 15752248745115926170
>
> What is going on here and how do I fix it?
>

--

--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@brockmann-consult.de
Internet: http://www.brockmann-consult.de
--------------------------------------------