From: Peter Maloney <peter.maloney@brockmann-consult.de>
Date: Thu, 12 Jan 2012 09:09:42 +0100
To: freebsd-fs@freebsd.org
Subject: Re: Unplugging disk under ZFS yield panic

On 01/11/2012 09:40 PM, Jeremy Chadwick wrote:
> On Wed, Jan 11, 2012 at 09:07:08PM +0100, Fabian Keil wrote:
>> Gergely CZUCZY wrote:
>>
>>> I'd like to ask whether it is normal behaviour that, when we unplug a
>>> disk under a ZFS system, a kernel panic happens on the first write.
>> Sounds familiar. I currently have two PRs open for
>> reproducible kernel panics after a vdev gets lost:
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162010
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/162036
>>
>> Note that the pool layouts are different, though.
> Is this problem truly ZFS-specific? I'd been tracking this problem for
> years, and was told it was fixed:
>
> http://wiki.freebsd.org/BugBusting/Commonly_reported_issues
>
>   * Panic occurs when a mounted device (USB, SATA, local image file,
>     etc.) is removed
>
>     Workaround: Be sure to umount all filesystems before removing the
>     physical device
>     Partial fix: Committed to CURRENT (8.0) on/prior to 2008/02/21
>
>     There is ongoing work to fully fix this problem, ETA 2009/02
>
> OP, please provide a kernel backtrace.
>
> Otherwise, if needed, I can go yank one of the two mirrored disks out of
> my FreeBSD box at home to try and reproduce the problem.

I have pulled root disks (gpt slices), logs (gpt slices), and other disks
(whole-disk labels) without unmounting, and without a panic, on 8-STABLE.
My whole system is pure ZFS with no gmirror, multipath, gnop, etc.
devices, using the mps or mpslsi driver. I also have an SSD (with bad
firmware?) that fails horribly when pulled (with SCSI / SMP timeouts
instead of "lost device"), but *probably* doesn't affect the rest of the
system until you run "gpart recover" or "camcontrol reset ..." on the
device, at which point you get a panic. (I think mpslsi handles the bad
SSD slightly better, sometimes recovering and never hanging unless all
root disks are gone, but I'm not too sure; there is no difference between
the two drivers in the panics caused by "gpart recover" or
"camcontrol reset ...".)

However, in my experience, when a log device with no redundancy is pulled
without first doing "zpool remove <pool> <log device>", the pool is marked
FAULTED and does not run (unlike when DEGRADED) until you either run
"zpool clear <pool>" (discarding the log data and possibly losing some
files) or put the disk back in. Since your root pool has the log, your
root pool would then be FAULTED. And any time your root disk is gone,
FreeBSD seems to panic quickly. Could that be related to what you did?
You said you pulled one of the multipath data disks, though, not the log.
I would try the same test with the log removed, or with the log in a ZFS
mirror of slices/disks (instead of gmirror devices / whatever it is now);
see the sketch below.
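Roughly like this, untested on your setup; the pool name "tank" and the
gpt/log0 and gpt/log1 labels are only placeholders for whatever your
layout actually uses:

  # Remove a non-redundant log device cleanly BEFORE pulling the disk
  # (pool and label names here are hypothetical):
  zpool remove tank gpt/log0

  # If the log was pulled while still attached, the pool goes FAULTED;
  # clearing brings it back at the cost of the outstanding log data:
  zpool clear tank

  # Or give the log redundancy, so a single pull only degrades the pool:
  zpool add tank log mirror gpt/log0 gpt/log1

  # Verify the layout and state afterwards:
  zpool status tank

With a mirrored log, pulling one of the two log disks should leave the
pool DEGRADED instead of FAULTED.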
>>> On a device removal we're expecting it to move to the spare disk, or
>>> to use the available redundant disks.
>> I agree that this behaviour would be preferable to a panic.

I agree as well, even if the root disk is lost. If the root is gone in
Linux (I didn't try with mirrors), it just remounts the filesystem
read-only (which can be disabled) and runs in an unpredictable state...
maybe ssh works, maybe the web server works, etc., and when the disk
comes back, it behaves like a read-only filesystem until you remount it.

-- 
--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@brockmann-consult.de
Internet: http://www.brockmann-consult.de
--------------------------------------------