Date:      Thu, 12 Jan 2012 09:09:42 +0100
From:      Peter Maloney <peter.maloney@brockmann-consult.de>
To:        freebsd-fs@freebsd.org
Subject:   Re: Unplugging disk under ZFS yield panic
Message-ID:  <4F0E9546.1030405@brockmann-consult.de>
In-Reply-To: <20120111204041.GA47175@icarus.home.lan>
References:  <20120111154722.000036e4@unknown>	<20120111210708.1168781e@fabiankeil.de> <20120111204041.GA47175@icarus.home.lan>

On 01/11/2012 09:40 PM, Jeremy Chadwick wrote:
> On Wed, Jan 11, 2012 at 09:07:08PM +0100, Fabian Keil wrote:
>> Gergely CZUCZY <phoemix@harmless.hu> wrote:
>>
>>> I'd like to ask, whether it is normal behaviour when we're unplugging a
>>> disk under a ZFS system then on the first write a kernel panic happened.
>> Sounds familiar. I currently have two PRs open for
>> reproducible kernel panics after a vdev gets lost:
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162010
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162036
>>
>> Note that the pool layouts are different, though.
> Is this problem truly ZFS-specific?  I'd been tracking this problem for
> years, and was told it was fixed:
>
> http://wiki.freebsd.org/BugBusting/Commonly_reported_issues
>
> * Panic occurs when a mounted device (USB, SATA, local image file,
>   etc.) is removed
>
>   Workaround: Be sure to umount all filesystems before removing the
>   physical device
>   Partial fix: Committed to CURRENT (8.0) on/prior to 2008/02/21
>
>   There is ongoing work to fully fix this problem, ETA 2009/02 
>
> OP, please provide a kernel backtrace.
>
> Otherwise, if needed, I can go yank one of the two mirrored disks out of
> my FreeBSD box at home to try and reproduce the problem.

On 8-STABLE I have pulled root disks (gpt slices), logs (gpt slices),
and other disks (whole-disk labels) without unmounting them first, and
never got a panic. My whole system is pure ZFS, with no gmirror,
multipath, gnop, or similar devices, and uses the mps or mpslsi driver.

I also have an SSD (with bad firmware?) that fails horribly when pulled:
it produces SCSI / SMP timeouts instead of "lost device". That
*probably* doesn't affect the rest of the system until you run
"gpart recover" or "camcontrol reset ..." on the device, at which point
you get a panic. I think mpslsi handles the bad SSD slightly better (it
sometimes recovers, and never hangs unless all root disks are gone), but
I'm not too sure. The panics caused by "gpart recover" or
"camcontrol reset ..." are the same with both drivers.

However, in my experience, when a log with no redundancy is pulled
without first doing "zpool remove <pool> <device>", the pool is marked
FAULTED. Unlike a DEGRADED pool, it does not run until you either run
"zpool clear <pool>" (discarding the log data, possibly losing some
files) or put the disk back in. Since your root pool has the log, your
root pool would then be FAULTED, and any time the root disk is gone,
FreeBSD seems to panic quickly.
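
To make the order of operations concrete, here is a rough sketch of
the sequence I mean. The pool name "tank" and the device "gpt/log0" are
made up for illustration and are not from this thread. The script only
prints the commands rather than running them, so it is safe to try
anywhere.

```shell
#!/bin/sh
# Sketch only: "tank" and "gpt/log0" are hypothetical names.
# The functions print commands instead of executing them.

# Steps to take BEFORE physically pulling a non-redundant log device.
log_pull_plan() {
    pool=$1
    logdev=$2
    echo "zpool status $pool"           # check which device is the log
    echo "zpool remove $pool $logdev"   # take the log out of the pool
    echo "zpool status $pool"           # confirm the log is no longer listed
}

# If the log was pulled without "zpool remove" and the pool is FAULTED,
# either put the disk back in, or clear the pool (discards the log data).
log_recover_plan() {
    pool=$1
    echo "zpool clear $pool"
}

log_pull_plan tank gpt/log0
```

Review the printed commands and run them by hand against your own pool
and device names.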

Could that be related to what you did? You said you pulled one of the
multipath data disks though, not the log. I would try the same test with
the log removed, or in a zfs mirror of slices/disks (instead of gmirror
devices / whatever it is now).


>>> On a device removal we're expecting it to moving to the spare disk, or
>>> using the available redundant disks.
>> I agree that this behaviour would be preferable to a panic.
I agree as well, even when it is the root disk that is lost. If the root
is gone in Linux (I didn't try with mirrors), it just remounts the
system read-only (which can be disabled) and runs in an unpredictable
state: maybe ssh works, maybe web works, etc. When the disk comes back,
it behaves just like a read-only file system until you remount it.




-- 

--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@brockmann-consult.de
Internet: http://www.brockmann-consult.de
--------------------------------------------



