From: Peter Maloney <peter.maloney@brockmann-consult.de>
Date: Thu, 12 Jan 2012 09:09:42 +0100
To: freebsd-fs@freebsd.org
Subject: Re: Unplugging disk under ZFS yield panic

On 01/11/2012 09:40 PM, Jeremy Chadwick wrote:
> On Wed, Jan 11, 2012 at 09:07:08PM +0100, Fabian Keil wrote:
>> Gergely CZUCZY wrote:
>>
>>> I'd like to ask whether it is normal behaviour that, when we unplug a
>>> disk under a ZFS system, a kernel panic happens on the first write.
>> Sounds familiar. I currently have two PRs open for
>> reproducible kernel panics after a vdev gets lost:
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162010
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162036
>>
>> Note that the pool layouts are different, though.
> Is this problem truly ZFS-specific? I'd been tracking this problem for
> years, and was told it was fixed:
>
> http://wiki.freebsd.org/BugBusting/Commonly_reported_issues
>
>   * Panic occurs when a mounted device (USB, SATA, local image file,
>     etc.) is removed
>
>     Workaround: Be sure to umount all filesystems before removing the
>     physical device
>     Partial fix: Committed to CURRENT (8.0) on/prior to 2008/02/21
>
>     There is ongoing work to fully fix this problem, ETA 2009/02
>
> OP, please provide a kernel backtrace.
>
> Otherwise, if needed, I can go yank one of the two mirrored disks out of
> my FreeBSD box at home to try and reproduce the problem.

I have pulled root disks (gpt slices), logs (gpt slices), and other disks
(whole-disk labels) without unmounting, and without a panic, on 8-STABLE.
My whole system is pure ZFS with no gmirror, multipath, gnop, etc.
devices, using the mps or mpslsi driver. I also have an SSD (with bad
firmware?) that fails horribly when pulled (with SCSI / SMP timeouts
instead of "lost device"), but *probably* doesn't affect the rest of the
system until you run "gpart recover" or "camcontrol reset ..." on the
device, at which point you get a panic. (I think mpslsi handles the bad
SSD slightly better, sometimes recovering and never hanging unless all
root disks are gone, but I'm not too sure; there is no difference between
the two drivers in the panics caused by "gpart recover" or
"camcontrol reset ...".)

However, in my experience, when a log device with no redundancy is pulled
without first doing "zpool remove <pool> <log device>", the pool is marked
FAULTED and does not run (unlike when DEGRADED) until you either run
"zpool clear <pool>" (discarding the log data and possibly losing some
files) or put the disk back in. Since your root pool has the log, your
root pool would then be FAULTED. And any time your root disk is gone,
FreeBSD seems to panic quickly. Could that be related to what you did?
You said you pulled one of the multipath data disks, though, not the log.
I would try the same test with the log removed, or with the log in a ZFS
mirror of slices/disks (instead of gmirror devices / whatever it is now);
see the sketch below.
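Roughly like this, untested on your setup; the pool name "tank" and the
gpt/log0 and gpt/log1 labels are only placeholders for whatever your
layout actually uses:

  # Remove a non-redundant log device cleanly BEFORE pulling the disk
  # (pool and label names here are hypothetical):
  zpool remove tank gpt/log0

  # If the log was pulled while still attached, the pool goes FAULTED;
  # clearing brings it back at the cost of the outstanding log data:
  zpool clear tank

  # Or give the log redundancy, so a single pull only degrades the pool:
  zpool add tank log mirror gpt/log0 gpt/log1

  # Verify the layout and state afterwards:
  zpool status tank

With a mirrored log, pulling one of the two log disks should leave the
pool DEGRADED instead of FAULTED.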
>>> On a device removal we're expecting it to move to the spare disk, or
>>> to use the available redundant disks.
>> I agree that this behaviour would be preferable to a panic.

I agree as well, even if the root disk is lost. If the root is gone in
Linux (I didn't try with mirrors), it just remounts the filesystem
read-only (which can be disabled) and runs in an unpredictable state...
maybe ssh works, maybe the web server works, etc., and when the disk
comes back, it behaves like a read-only filesystem until you remount it.

-- 
--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@brockmann-consult.de
Internet: http://www.brockmann-consult.de
--------------------------------------------