From owner-freebsd-stable@FreeBSD.ORG  Thu Mar 17 04:30:35 2011
From: Luke Marsden <luke-lists@hybrid-logic.co.uk>
To: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org, 
	freebsd-current@freebsd.org
Content-Type: text/plain; charset="UTF-8"
Organization: Hybrid Web Cluster
Date: Thu, 17 Mar 2011 00:08:01 -0400
Message-ID: <1300334881.3837.126.camel@pow>
Mime-Version: 1.0
X-Mailer: Evolution 2.30.3 
Content-Transfer-Encoding: 7bit
Subject: Guaranteed kernel panic with ZFS + nullfs

Hi all,

The following script reliably triggers a kernel panic on 8.1-R, 8.2-R
and 8-STABLE as of today (2011-03-16), both with ZFS v14/v15 and with
v28 on 8.2-R using the mm@ patches from 2011-03. I suspect it may also
affect 9-CURRENT, but I have not tested this yet.

#!/usr/local/bin/bash
export POOL=hpool # change this to your pool name
sudo zfs destroy -r "$POOL/foo" # fails harmlessly on the first run
sudo zfs create "$POOL/foo"
sudo zfs set mountpoint=/foo "$POOL/foo"
sudo mkdir -p /bar # the nullfs mountpoint must exist
sudo mount -t nullfs /foo /bar
sudo touch /foo/baz
ls /bar # should see baz
sudo zfs umount -f "$POOL/foo" # seems okay (ls: /bar: Bad file descriptor)
sudo zfs mount "$POOL/foo" # PANIC!

Can anyone suggest a patch which fixes this? Preferably against
8-STABLE :-)

I also have a more subtle problem: after mounting and then quickly
force-unmounting a ZFS filesystem (call it A) which has two nullfs
filesystems and a devfs filesystem mounted within it, running "ls" on
the mountpoint of A's parent filesystem hangs.

I'm working on narrowing it down to a shell script like the one above;
as soon as I have one I'll post a followup.

This latter problem is actually more of an issue for me: I can avoid
the behaviour which triggers the panic ("if it hurts, don't do it"), but
I need to be able to perform the actions which trigger the deadlock
(mounting and unmounting filesystems).

The hang also affects 8.1-R, 8.2-R, 8-STABLE and 8.2-R+v28.

It seems to be the "zfs umount -f" process which hangs, and it causes
further accesses to the parent filesystem to hang as well. Note that I
have definitely unmounted the nullfs and devfs mounts inside the
filesystem before forcing the unmount. Unfortunately, the -f is
necessary in my application.

After the hang:

hybrid@dev3:/opt/HybridCluster$ sudo ps ax |grep zfs
   41  ??  DL     0:00.11 [zfskern]
 3751  ??  D      0:00.03 /sbin/zfs unmount -f hpool/hcfs/filesystem1

hybrid@dev3:/opt/HybridCluster$ sudo procstat -kk 3751
  PID    TID COMM             TDNAME           KSTACK
 3751 100264 zfs              -                mi_switch+0x16f sleepq_wait+0x42 _sleep+0x31c zfsvfs_teardown+0x269 zfs_umount+0x1a7 dounmount+0x28a unmount+0x3c8 syscall+0x1e7 Xfast_syscall+0xe1

hybrid@dev3:/opt/HybridCluster$ sudo procstat -kk 41
  PID    TID COMM             TDNAME           KSTACK
   41 100058 zfskern          arc_reclaim_thre mi_switch+0x16f sleepq_timedwait+0x42 _cv_timedwait+0x129 arc_reclaim_thread+0x2d1 fork_exit+0x118 fork_trampoline+0xe
   41 100062 zfskern          l2arc_feed_threa mi_switch+0x16f sleepq_timedwait+0x42 _cv_timedwait+0x129 l2arc_feed_thread+0x1be fork_exit+0x118 fork_trampoline+0xe
   41 100090 zfskern          txg_thread_enter mi_switch+0x16f sleepq_wait+0x42 _cv_wait+0x111 txg_thread_wait+0x79 txg_quiesce_thread+0xb5 fork_exit+0x118 fork_trampoline+0xe
   41 100091 zfskern          txg_thread_enter mi_switch+0x16f sleepq_timedwait+0x42 _cv_timedwait+0x129 txg_thread_wait+0x3c txg_sync_thread+0x355 fork_exit+0x118 fork_trampoline+0xe

I will continue to attempt to create a shell script which makes this
latter bug easily reproducible.
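
A first, untested sketch of such a script, following the sequence
described above: create and mount A, mount two nullfs filesystems and a
devfs inside it, unmount those again, force-unmount A, then "ls" the
parent's mountpoint. The dataset names, mountpoints and nullfs sources
(/usr/share, /usr/include) are hypothetical placeholders, and the sketch
has not been confirmed to trigger the hang. By default it only prints
the commands it would run; with DRY_RUN=0 it must be run as root, and it
backgrounds the forced unmount and the ls so that a hang can be detected
and the stuck processes inspected with procstat -kk.

```shell
#!/usr/bin/env bash
# Hypothetical, untested sketch of the hang sequence described above.
# All dataset names, mountpoints and nullfs sources are assumptions.
# Defaults to a dry run that only prints commands; DRY_RUN=0 executes
# them (run as root in that case).
set -u

POOL="${POOL:-hpool}"            # change this to your pool name
PARENT="$POOL/hangparent"        # hypothetical parent dataset of A
PARENT_MNT="/hangparent"         # hypothetical parent mountpoint
A="$PARENT/a"                    # filesystem "A"
A_MNT="$PARENT_MNT/a"            # A inherits its mountpoint from PARENT
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it directly.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce_hang() {
    run zfs create -o mountpoint="$PARENT_MNT" "$PARENT"
    run zfs create "$A"          # creating A also mounts it
    run mkdir -p "$A_MNT/null1" "$A_MNT/null2" "$A_MNT/dev"
    run mount -t nullfs /usr/share "$A_MNT/null1"
    run mount -t nullfs /usr/include "$A_MNT/null2"
    run mount -t devfs devfs "$A_MNT/dev"

    # Unmount the nested mounts first, as described in the report.
    run umount "$A_MNT/dev"
    run umount "$A_MNT/null2"
    run umount "$A_MNT/null1"

    if [ "$DRY_RUN" = 1 ]; then
        echo "+ zfs unmount -f $A"
        echo "+ ls $PARENT_MNT"
        return 0
    fi

    # Background both steps so a hang in either does not block the script.
    zfs unmount -f "$A" &
    local umount_pid=$!
    ls "$PARENT_MNT" >/dev/null &
    local ls_pid=$!

    sleep 5
    local pid
    for pid in "$umount_pid" "$ls_pid"; do
        # kill -0 only checks whether the process still exists.
        if kill -0 "$pid" 2>/dev/null; then
            echo "PID $pid still running after 5s; kernel stack:"
            procstat -kk "$pid"
        fi
    done
}

reproduce_hang
```

It leaves the test datasets behind, so it is best run against a scratch
pool, e.g. "DRY_RUN=0 POOL=testpool ./hangtest.sh" as root.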

In the meantime, what further information can I gather? I will build a
debug kernel in the morning.

If it helps accelerate finding a solution to this problem, Hybrid Logic
Ltd might be able to fund a small bounty for a fix. Contact me off-list
if you can help in this way.

-- 
Best Regards,
Luke Marsden
CTO, Hybrid Logic Ltd.

Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting

Phone: +441172232002 / +16179496062