From owner-freebsd-fs@FreeBSD.ORG  Tue Dec 24 20:29:01 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id CA68B3C7;
 Tue, 24 Dec 2013 20:29:01 +0000 (UTC)
Received: from krichy.tvnetwork.hu (unknown [IPv6:2a01:be00:0:2::10])
 by mx1.freebsd.org (Postfix) with ESMTP id 39C291DEA;
 Tue, 24 Dec 2013 20:29:00 +0000 (UTC)
Received: by krichy.tvnetwork.hu (Postfix, from userid 1000)
 id B01F2A894; Tue, 24 Dec 2013 21:28:39 +0100 (CET)
Received: from localhost (localhost [127.0.0.1])
 by krichy.tvnetwork.hu (Postfix) with ESMTP id AB25AA893;
 Tue, 24 Dec 2013 21:28:39 +0100 (CET)
Date: Tue, 24 Dec 2013 21:28:39 +0100 (CET)
From: krichy@tvnetwork.hu
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
In-Reply-To: <alpine.DEB.2.10.1312240706470.8128@krichy.tvnetwork.hu>
Message-ID: <alpine.DEB.2.10.1312242126250.19813@krichy.tvnetwork.hu>
References: <alpine.DEB.2.10.1312161647410.11599@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312171326520.7714@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312191629370.4344@krichy.tvnetwork.hu>
 <20131220134405.GE1658@garage.freebsd.pl>
 <alpine.DEB.2.10.1312231750030.31822@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312240706470.8128@krichy.tvnetwork.hu>
User-Agent: Alpine 2.10 (DEB 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Content-Filtered-By: Mailman/MimeDel 2.1.17
Cc: freebsd-fs@freebsd.org, avg@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 24 Dec 2013 20:29:01 -0000

Dear all,

Unfortunately, this seems to be serious, as a plain user can deadlock zfs 
using the attached script, and can make a DoS against the mounted 
dataset's .zfs/snapshot directory.

Regards,

Kojedzinszky Richard
Euronet Magyarorszag Informatikai Zrt.

On Tue, 24 Dec 2013, krichy@tvnetwork.hu wrote:

> Date: Tue, 24 Dec 2013 07:10:05 +0100 (CET)
> From: krichy@tvnetwork.hu
> To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
> Cc: freebsd-fs@freebsd.org, avg@freebsd.org
> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
> 
> Dear devs,
>
> A third one, again the snapshot dirs:
>
> # cd /tmp/test/.zfs/snapshot && yes .. | xargs stat > /dev/null &
> # yes /tmp/test/.zfs/snapshot/snap | xargs umount > /dev/null
>
> Will deadlock in a few seconds.
>
> The first holds a lock on .zfs/snapshot vnode, when looking up .. it tries to 
> lock .zfs, while the other just tries to do this in reversed order.
>
> In a vop_lookup, why is the passed in vnode, and the returned vnode must be 
> locked? Why is not enough to have it held?
>
> Thanks in advance,
> Kojedzinszky Richard
> Euronet Magyarorszag Informatikai Zrt.
>
> On Mon, 23 Dec 2013, krichy@tvnetwork.hu wrote:
>
>> Date: Mon, 23 Dec 2013 17:55:08 +0100 (CET)
>> From: krichy@tvnetwork.hu
>> To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
>> Cc: freebsd-fs@freebsd.org, avg@freebsd.org
>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>> 
>> Dear Pawel,
>> 
>> Thanks for your response. I hope someone will review my work.
>> 
>> Secondly, I think I;ve found another deadlock scenario:
>> 
>> When 'zfs send -R' finishes, in zfsdev_close() it has spa_namespace_lock 
>> held. Then it does an unmount of all snapshots, which in turn goes to 
>> zfsctl_snapshot_inactive(), in which sdp->sd_lock is being locked.
>> 
>> Concurrently, a zfsctl_snapdir_lookup(), holding the same sdp->sd_lock does 
>> a snapshot mount, which somewhere tries to acquire spa_namespace_lock. So 
>> it deadlocks. Currently I dont know how to workaround this.
>> 
>> Is spa_namespace_lock need to be held in zfsctl_snapshot_inactive()?
>> 
>> What if we release it function enter, and re-acquire upon exit?
>> 
>> Thanks in advance,
>> 
>> 
>> Kojedzinszky Richard
>> Euronet Magyarorszag Informatikai Zrt.
>> 
>> On Fri, 20 Dec 2013, Pawel Jakub Dawidek wrote:
>> 
>>> Date: Fri, 20 Dec 2013 14:44:05 +0100
>>> From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
>>> To: krichy@tvnetwork.hu
>>> Cc: freebsd-fs@freebsd.org
>>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>>> 
>>> On Thu, Dec 19, 2013 at 04:46:11PM +0100, krichy@tvnetwork.hu wrote:
>>>> Dear devs,
>>>> 
>>>> I am attaching a more clear patch/fix for my snapshot handling issues
>>>> (0002), but I would be happy if some ZFS expert would comment it. I am
>>>> trying to solve it at least for two weeks now, and an ACK or a NACK would
>>>> be nice from someone. Also a commit is reverted since it also caused
>>>> deadlocks. I've read its comments, which also eliminates deadlocks, but I
>>>> did not find any reference how to produce that deadlock. In my view
>>>> reverting that makes my issues disappear, but I dont know what new cases
>>>> will it rise.
>>> 
>>> Richard,
>>> 
>>> I won't be able to analyse it myself anytime soon, because of other
>>> obligations, but I forwarded you e-mail to the zfs-devel@ mailing list
>>> (it is closed, but gathers FreeBSD ZFS devs). Hopefully someone from
>>> there will be able to help you.
>>> 
>>>> I've rewritten traverse() to make more like upstream, added two extra
>>>> VN_HOLD()s to snapdir_lookup() when traverse returned the same vnode what
>>>> was passed to it (I dont know even that upon creating a snapshot vnode 
>>>> why
>>>> is that extra two holds needed for the vnode.) And unfortunately, due to
>>>> FreeBSD calls vop_inactive callbacks with vnodes locked, that could also
>>>> cause deadlocks, so zfsctl_snapshot_inactive() and
>>>> zfsctl_snapshot_vptocnp() has been rewritten to work that around.
>>>> 
>>>> After this, one may also get a deadlock, when a simple access would call
>>>> into zfsctl_snapshot_lookup(). The documentation says, that those vnodes
>>>> should always be covered, but after some stress test, sometimes we hit
>>>> that call, and that can cause again deadlocks. In our environment I've
>>>> just uncommented that callback, which returns ENODIR on some calls, but 
>>>> at
>>>> least not a deadlock.
>>>> 
>>>> The attached script can be used to reproduce my cases (would one confirm
>>>> that?), and after the patches applied, they disappear (confirm?).
>>>> 
>>>> Thanks,
>>>> 
>>>> 
>>>> Kojedzinszky Richard
>>>> Euronet Magyarorszag Informatikai Zrt.
>>>> 
>>>> On Tue, 17 Dec 2013, krichy@tvnetwork.hu wrote:
>>>> 
>>>>> Date: Tue, 17 Dec 2013 14:50:16 +0100 (CET)
>>>>> From: krichy@tvnetwork.hu
>>>>> To: pjd@freebsd.org
>>>>> Cc: freebsd-fs@freebsd.org
>>>>> Subject: Re: kern/184677 (fwd)
>>>>> 
>>>>> Dear devs,
>>>>> 
>>>>> I will sum up my experience regarding the issue:
>>>>> 
>>>>> The sympton is that a concurrent 'zfs send -R' and some activity on the
>>>>> snapshot dir (or in the snapshot) may cause a deadlock.
>>>>> 
>>>>> After investigating the problem, I found that zfs send umounts the 
>>>>> snapshots,
>>>>> and that causes the deadlock, so later I tested only with concurrent 
>>>>> umount
>>>>> and the "activity". More later I found that listing the snapshots in
>>>>> .zfs/snapshot/ and unounting them can cause the found deadlock, so I 
>>>>> used
>>>>> them for the tests. But for my surprise, instead of a deadlock, a 
>>>>> recursive
>>>>> lock panic has arised.
>>>>> 
>>>>> The vnode for the ".zfs/snapshot/" directory contains zfs's 
>>>>> zfsctl_snapdir_t
>>>>> structure (sdp). This contains a tree of mounted snapshots, and each 
>>>>> entry
>>>>> (sep) contains the vnode of entry on which the snapshot is mounted on 
>>>>> top
>>>>> (se_root). The strange is that the se_root member does not hold a 
>>>>> reference
>>>>> for the vnode, just a simple pointer to it.
>>>>> 
>>>>> Upon entry lookup (zfsctl_snapdir_lookup()) the "snapshot" vnode is 
>>>>> locked,
>>>>> the zfsctl_snapdir_t's tree is locked, and searched for the mount if it
>>>>> exists already. If it founds no entry, does the mount. In the case of an
>>>>> entry was found, the se_root member contains the vnode which the 
>>>>> snapshot is
>>>>> mounted on. Thus, a reference is taken for it, and the traverse() call 
>>>>> will
>>>>> resolve to the real root vnode of the mounted snapshot, returning it as
>>>>> locked. (Examining the traverse() code I've found that it did not follow
>>>>> FreeBSD's lock order recommendation described in sys/kern/vfs_subr.c.)
>>>>> 
>>>>> On the other way, when an umount is issued, the se_root vnode looses its 
>>>>> last
>>>>> reference (as only the mountpoint holds one for it), it goes through the
>>>>> vinactive() path, to zfsctl_snapshot_inactive(). In FreeBSD this is 
>>>>> called
>>>>> with a locked vnode, so this is a deadlock race condition. While
>>>>> zfsctl_snapdir_lookup() holds the mutex for the sdp tree, and traverse()
>>>>> tries to acquire the se_root, zfsctl_snapshot_inactive() holds the lock 
>>>>> on
>>>>> se_root while tries to access the sdp lock.
>>>>> 
>>>>> The zfsctl_snapshot_inactive() has an if statement checking the 
>>>>> v_usecount,
>>>>> which is incremented in zfsctl_snapdir_lookup(), but in that context it 
>>>>> is
>>>>> not covered by VI_LOCK. And it seems to me that FreeBSD's vinactive() 
>>>>> path
>>>>> assumes that the vnode remains inactive (as opposed to illumos, at least 
>>>>> how
>>>>> i read the code). So zfsctl_snapshot_inactive() must free resources 
>>>>> while in
>>>>> a locked state. I was a bit confused, and probably that is why the 
>>>>> previously
>>>>> posted patch is as is.
>>>>> 
>>>>> Maybe if I had some clues on the directions of this problem, I could 
>>>>> have
>>>>> worked more for a nicer, shorter solution.
>>>>> 
>>>>> Please someone comment on my post.
>>>>> 
>>>>> Regards,
>>>>> 
>>>>> 
>>>>> 
>>>>> Kojedzinszky Richard
>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>> 
>>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote:
>>>>> 
>>>>>> Date: Mon, 16 Dec 2013 16:52:16 +0100 (CET)
>>>>>> From: krichy@tvnetwork.hu
>>>>>> To: pjd@freebsd.org
>>>>>> Cc: freebsd-fs@freebsd.org
>>>>>> Subject: Re: kern/184677 (fwd)
>>>>>> 
>>>>>> Dear PJD,
>>>>>> 
>>>>>> I am a happy FreeBSD user, I am sure you've read my previous posts
>>>>>> regarding some issues in ZFS. Please give some advice for me, where to 
>>>>>> look
>>>>>> for solutions, or how could I help to resolve those issues.
>>>>>> 
>>>>>> Regards,
>>>>>> Kojedzinszky Richard
>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>> 
>>>>>> ---------- Forwarded message ----------
>>>>>> Date: Mon, 16 Dec 2013 15:23:06 +0100 (CET)
>>>>>> From: krichy@tvnetwork.hu
>>>>>> To: freebsd-fs@freebsd.org
>>>>>> Subject: Re: kern/184677
>>>>>> 
>>>>>> 
>>>>>> Seems that pjd did a change which eliminated the zfsdev_state_lock on 
>>>>>> Fri
>>>>>> Aug 12 07:04:16 2011 +0000, which might introduced a new deadlock
>>>>>> situation. Any comments on this?
>>>>>> 
>>>>>> 
>>>>>> Kojedzinszky Richard
>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>> 
>>>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote:
>>>>>> 
>>>>>>> Date: Mon, 16 Dec 2013 11:08:11 +0100 (CET)
>>>>>>> From: krichy@tvnetwork.hu
>>>>>>> To: freebsd-fs@freebsd.org
>>>>>>> Subject: kern/184677
>>>>>>> 
>>>>>>> Dear devs,
>>>>>>> 
>>>>>>> I've attached a patch, which makes the recursive lockmgr disappear, 
>>>>>>> and
>>>>>>> makes the reported bug to disappear. I dont know if I followed any
>>>>>>> guidelines well, or not, but at least it works for me. Please some
>>>>>>> ZFS/FreeBSD fs expert review it, and fix it where it needed.
>>>>>>> 
>>>>>>> But unfortunately, my original problem is still not solved, maybe the 
>>>>>>> same
>>>>>>> as Ryan's:
>>>>>>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-December/018707.html
>>>>>>> 
>>>>>>> Tracing the problem down is that zfsctl_snapdir_lookup() tries to 
>>>>>>> acquire
>>>>>>> spa_namespace_lock while when finishing a zfs send -R does a
>>>>>>> zfsdev_close(), and that also holds the same mutex. And this causes a
>>>>>>> deadlock scenario. I looked at illumos's code, and for some reason 
>>>>>>> they
>>>>>>> use another mutex on zfsdev_close(), which therefore may not deadlock 
>>>>>>> with
>>>>>>> zfsctl_snapdir_lookup(). But I am still investigating the problem.
>>>>>>> 
>>>>>>> I would like to help making ZFS more stable on freebsd also with its 
>>>>>>> whole
>>>>>>> functionality. I would be very thankful if some expert would give some
>>>>>>> advice, how to solve these bugs. PJD, Steven, Xin?
>>>>>>> 
>>>>>>> Thanks in advance,
>>>>>>> 
>>>>>>> 
>>>>>>> Kojedzinszky Richard
>>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>> _______________________________________________
>>>>>> freebsd-fs@freebsd.org mailing list
>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>>>>> 
>>>>> 
>>> 
>>> 
>>>> From 39298da838d006ad225e41529d7b7f240fccfe73 Mon Sep 17 00:00:00 2001
>>>> From: Richard Kojedzinszky <krichy@cflinux.hu>
>>>> Date: Mon, 16 Dec 2013 15:39:11 +0100
>>>> Subject: [PATCH 1/2] Revert "Eliminate the zfsdev_state_lock entirely and
>>>>  replace it with the"
>>>> 
>>>> This reverts commit 1d8972b3f353f986eb5b85bc108b1c0d946d3218.
>>>> 
>>>> Conflicts:
>>>> 	sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>>> ---
>>>>  .../opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h  |   1 +
>>>>  .../opensolaris/uts/common/fs/zfs/vdev_geom.c      |  14 ++-
>>>>  .../opensolaris/uts/common/fs/zfs/zfs_ioctl.c      |  16 +--
>>>>  .../contrib/opensolaris/uts/common/fs/zfs/zvol.c   | 119 
>>>> +++++++++------------
>>>>  4 files changed, 70 insertions(+), 80 deletions(-)
>>>> 
>>>> diff --git 
>>>> a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h 
>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h
>>>> index af2def2..8272c4d 100644
>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h
>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h
>>>> @@ -383,6 +383,7 @@ extern void *zfsdev_get_soft_state(minor_t minor,
>>>>  extern minor_t zfsdev_minor_alloc(void);
>>>>
>>>>  extern void *zfsdev_state;
>>>> +extern kmutex_t zfsdev_state_lock;
>>>>
>>>>  #endif	/* _KERNEL */
>>>> 
>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c 
>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>>> index 15685a5..5c3e9f3 100644
>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>>> @@ -581,7 +581,7 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t 
>>>> *max_psize,
>>>>  	struct g_provider *pp;
>>>>  	struct g_consumer *cp;
>>>>  	size_t bufsize;
>>>> -	int error;
>>>> +	int error, lock;
>>>>
>>>>  	/*
>>>>  	 * We must have a pathname, and it must be absolute.
>>>> @@ -593,6 +593,12 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t 
>>>> *max_psize,
>>>>
>>>>  	vd->vdev_tsd = NULL;
>>>> 
>>>> +	if (mutex_owned(&spa_namespace_lock)) {
>>>> +		mutex_exit(&spa_namespace_lock);
>>>> +		lock = 1;
>>>> +	} else {
>>>> +		lock = 0;
>>>> +	}
>>>>  	DROP_GIANT();
>>>>  	g_topology_lock();
>>>>  	error = 0;
>>>> @@ -624,7 +630,11 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t 
>>>> *max_psize,
>>>>  	    !ISP2(cp->provider->sectorsize)) {
>>>>  		ZFS_LOG(1, "Provider %s has unsupported sectorsize.",
>>>>  		    vd->vdev_path);
>>>> +
>>>> +		g_topology_lock();
>>>>  		vdev_geom_detach(cp, 0);
>>>> +		g_topology_unlock();
>>>> +
>>>>  		error = EINVAL;
>>>>  		cp = NULL;
>>>>  	} else if (cp->acw == 0 && (spa_mode(vd->vdev_spa) & FWRITE) != 0) {
>>>> @@ -647,6 +657,8 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t 
>>>> *max_psize,
>>>>  	}
>>>>  	g_topology_unlock();
>>>>  	PICKUP_GIANT();
>>>> +	if (lock)
>>>> +		mutex_enter(&spa_namespace_lock);
>>>>  	if (cp == NULL) {
>>>>  		vd->vdev_stat.vs_aux = VDEV_AUX_OPEN_FAILED;
>>>>  		return (error);
>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c 
>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>>> index e9fba26..91becde 100644
>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>>> @@ -5635,7 +5635,7 @@ zfsdev_minor_alloc(void)
>>>>  	static minor_t last_minor;
>>>>  	minor_t m;
>>>> 
>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>
>>>>  	for (m = last_minor + 1; m != last_minor; m++) {
>>>>  		if (m > ZFSDEV_MAX_MINOR)
>>>> @@ -5655,7 +5655,7 @@ zfs_ctldev_init(struct cdev *devp)
>>>>  	minor_t minor;
>>>>  	zfs_soft_state_t *zs;
>>>> 
>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>
>>>>  	minor = zfsdev_minor_alloc();
>>>>  	if (minor == 0)
>>>> @@ -5676,7 +5676,7 @@ zfs_ctldev_init(struct cdev *devp)
>>>>  static void
>>>>  zfs_ctldev_destroy(zfs_onexit_t *zo, minor_t minor)
>>>>  {
>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>
>>>>  	zfs_onexit_destroy(zo);
>>>>  	ddi_soft_state_free(zfsdev_state, minor);
>>>> @@ -5706,9 +5706,9 @@ zfsdev_open(struct cdev *devp, int flag, int mode, 
>>>> struct thread *td)
>>>>
>>>>  	/* This is the control device. Allocate a new minor if requested. */
>>>>  	if (flag & FEXCL) {
>>>> -		mutex_enter(&spa_namespace_lock);
>>>> +		mutex_enter(&zfsdev_state_lock);
>>>>  		error = zfs_ctldev_init(devp);
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  	}
>>>>
>>>>  	return (error);
>>>> @@ -5723,14 +5723,14 @@ zfsdev_close(void *data)
>>>>  	if (minor == 0)
>>>>  		return;
>>>> 
>>>> -	mutex_enter(&spa_namespace_lock);
>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>  	zo = zfsdev_get_soft_state(minor, ZSST_CTLDEV);
>>>>  	if (zo == NULL) {
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return;
>>>>  	}
>>>>  	zfs_ctldev_destroy(zo, minor);
>>>> -	mutex_exit(&spa_namespace_lock);
>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>  }
>>>>
>>>>  static int
>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c 
>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>>> index 72d4502..aec5219 100644
>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>>> @@ -104,11 +104,12 @@ static char *zvol_tag = "zvol_tag";
>>>>  #define	ZVOL_DUMPSIZE		"dumpsize"
>>>>
>>>>  /*
>>>> - * The spa_namespace_lock protects the zfsdev_state structure from being
>>>> - * modified while it's being used, e.g. an open that comes in before a
>>>> - * create finishes.  It also protects temporary opens of the dataset so 
>>>> that,
>>>> + * This lock protects the zfsdev_state structure from being modified
>>>> + * while it's being used, e.g. an open that comes in before a create
>>>> + * finishes.  It also protects temporary opens of the dataset so that,
>>>>   * e.g., an open doesn't get a spurious EBUSY.
>>>>   */
>>>> +kmutex_t zfsdev_state_lock;
>>>>  static uint32_t zvol_minors;
>>>>
>>>>  typedef struct zvol_extent {
>>>> @@ -249,7 +250,7 @@ zvol_minor_lookup(const char *name)
>>>>  	struct g_geom *gp;
>>>>  	zvol_state_t *zv = NULL;
>>>> 
>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>
>>>>  	g_topology_lock();
>>>>  	LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) {
>>>> @@ -465,11 +466,11 @@ zvol_name2minor(const char *name, minor_t *minor)
>>>>  {
>>>>  	zvol_state_t *zv;
>>>> 
>>>> -	mutex_enter(&spa_namespace_lock);
>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>  	zv = zvol_minor_lookup(name);
>>>>  	if (minor && zv)
>>>>  		*minor = zv->zv_minor;
>>>> -	mutex_exit(&spa_namespace_lock);
>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>  	return (zv ? 0 : -1);
>>>>  }
>>>>  #endif	/* sun */
>>>> @@ -489,10 +490,10 @@ zvol_create_minor(const char *name)
>>>>
>>>>  	ZFS_LOG(1, "Creating ZVOL %s...", name);
>>>> 
>>>> -	mutex_enter(&spa_namespace_lock);
>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>
>>>>  	if (zvol_minor_lookup(name) != NULL) {
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return (SET_ERROR(EEXIST));
>>>>  	}
>>>> 
>>>> @@ -500,20 +501,20 @@ zvol_create_minor(const char *name)
>>>>  	error = dmu_objset_own(name, DMU_OST_ZVOL, B_TRUE, FTAG, &os);
>>>>
>>>>  	if (error) {
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return (error);
>>>>  	}
>>>>
>>>>  #ifdef sun
>>>>  	if ((minor = zfsdev_minor_alloc()) == 0) {
>>>>  		dmu_objset_disown(os, FTAG);
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return (SET_ERROR(ENXIO));
>>>>  	}
>>>>
>>>>  	if (ddi_soft_state_zalloc(zfsdev_state, minor) != DDI_SUCCESS) {
>>>>  		dmu_objset_disown(os, FTAG);
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return (SET_ERROR(EAGAIN));
>>>>  	}
>>>>  	(void) ddi_prop_update_string(minor, zfs_dip, ZVOL_PROP_NAME,
>>>> @@ -525,7 +526,7 @@ zvol_create_minor(const char *name)
>>>>  	    minor, DDI_PSEUDO, 0) == DDI_FAILURE) {
>>>>  		ddi_soft_state_free(zfsdev_state, minor);
>>>>  		dmu_objset_disown(os, FTAG);
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return (SET_ERROR(EAGAIN));
>>>>  	}
>>>> 
>>>> @@ -536,7 +537,7 @@ zvol_create_minor(const char *name)
>>>>  		ddi_remove_minor_node(zfs_dip, chrbuf);
>>>>  		ddi_soft_state_free(zfsdev_state, minor);
>>>>  		dmu_objset_disown(os, FTAG);
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return (SET_ERROR(EAGAIN));
>>>>  	}
>>>> 
>>>> @@ -587,7 +588,7 @@ zvol_create_minor(const char *name)
>>>>
>>>>  	zvol_minors++;
>>>> 
>>>> -	mutex_exit(&spa_namespace_lock);
>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>
>>>>  	zvol_geom_run(zv);
>>>> 
>>>> @@ -609,7 +610,7 @@ zvol_remove_zv(zvol_state_t *zv)
>>>>  	minor_t minor = zv->zv_minor;
>>>>  #endif
>>>> 
>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>  	if (zv->zv_total_opens != 0)
>>>>  		return (SET_ERROR(EBUSY));
>>>> 
>>>> @@ -635,15 +636,15 @@ zvol_remove_minor(const char *name)
>>>>  	zvol_state_t *zv;
>>>>  	int rc;
>>>> 
>>>> -	mutex_enter(&spa_namespace_lock);
>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>  	if ((zv = zvol_minor_lookup(name)) == NULL) {
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return (SET_ERROR(ENXIO));
>>>>  	}
>>>>  	g_topology_lock();
>>>>  	rc = zvol_remove_zv(zv);
>>>>  	g_topology_unlock();
>>>> -	mutex_exit(&spa_namespace_lock);
>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>  	return (rc);
>>>>  }
>>>> 
>>>> @@ -755,7 +756,7 @@ zvol_update_volsize(objset_t *os, uint64_t volsize)
>>>>  	dmu_tx_t *tx;
>>>>  	int error;
>>>> 
>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>
>>>>  	tx = dmu_tx_create(os);
>>>>  	dmu_tx_hold_zap(tx, ZVOL_ZAP_OBJ, TRUE, NULL);
>>>> @@ -786,7 +787,7 @@ zvol_remove_minors(const char *name)
>>>>  	namelen = strlen(name);
>>>>
>>>>  	DROP_GIANT();
>>>> -	mutex_enter(&spa_namespace_lock);
>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>  	g_topology_lock();
>>>>
>>>>  	LIST_FOREACH_SAFE(gp, &zfs_zvol_class.geom, geom, gptmp) {
>>>> @@ -804,7 +805,7 @@ zvol_remove_minors(const char *name)
>>>>  	}
>>>>
>>>>  	g_topology_unlock();
>>>> -	mutex_exit(&spa_namespace_lock);
>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>  	PICKUP_GIANT();
>>>>  }
>>>> 
>>>> @@ -818,10 +819,10 @@ zvol_set_volsize(const char *name, major_t maj, 
>>>> uint64_t volsize)
>>>>  	uint64_t old_volsize = 0ULL;
>>>>  	uint64_t readonly;
>>>> 
>>>> -	mutex_enter(&spa_namespace_lock);
>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>  	zv = zvol_minor_lookup(name);
>>>>  	if ((error = dmu_objset_hold(name, FTAG, &os)) != 0) {
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return (error);
>>>>  	}
>>>> 
>>>> @@ -888,7 +889,7 @@ zvol_set_volsize(const char *name, major_t maj, 
>>>> uint64_t volsize)
>>>>  out:
>>>>  	dmu_objset_rele(os, FTAG);
>>>> 
>>>> -	mutex_exit(&spa_namespace_lock);
>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>
>>>>  	return (error);
>>>>  }
>>>> @@ -899,36 +900,19 @@ zvol_open(struct g_provider *pp, int flag, int 
>>>> count)
>>>>  {
>>>>  	zvol_state_t *zv;
>>>>  	int err = 0;
>>>> -	boolean_t locked = B_FALSE;
>>>> 
>>>> -	/*
>>>> -	 * Protect against recursively entering spa_namespace_lock
>>>> -	 * when spa_open() is used for a pool on a (local) ZVOL(s).
>>>> -	 * This is needed since we replaced upstream zfsdev_state_lock
>>>> -	 * with spa_namespace_lock in the ZVOL code.
>>>> -	 * We are using the same trick as spa_open().
>>>> -	 * Note that calls in zvol_first_open which need to resolve
>>>> -	 * pool name to a spa object will enter spa_open()
>>>> -	 * recursively, but that function already has all the
>>>> -	 * necessary protection.
>>>> -	 */
>>>> -	if (!MUTEX_HELD(&spa_namespace_lock)) {
>>>> -		mutex_enter(&spa_namespace_lock);
>>>> -		locked = B_TRUE;
>>>> -	}
>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>
>>>>  	zv = pp->private;
>>>>  	if (zv == NULL) {
>>>> -		if (locked)
>>>> -			mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return (SET_ERROR(ENXIO));
>>>>  	}
>>>>
>>>>  	if (zv->zv_total_opens == 0)
>>>>  		err = zvol_first_open(zv);
>>>>  	if (err) {
>>>> -		if (locked)
>>>> -			mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return (err);
>>>>  	}
>>>>  	if ((flag & FWRITE) && (zv->zv_flags & ZVOL_RDONLY)) {
>>>> @@ -950,15 +934,13 @@ zvol_open(struct g_provider *pp, int flag, int 
>>>> count)
>>>>  #endif
>>>>
>>>>  	zv->zv_total_opens += count;
>>>> -	if (locked)
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>
>>>>  	return (err);
>>>>  out:
>>>>  	if (zv->zv_total_opens == 0)
>>>>  		zvol_last_close(zv);
>>>> -	if (locked)
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>  	return (err);
>>>>  }
>>>> 
>>>> @@ -968,18 +950,12 @@ zvol_close(struct g_provider *pp, int flag, int 
>>>> count)
>>>>  {
>>>>  	zvol_state_t *zv;
>>>>  	int error = 0;
>>>> -	boolean_t locked = B_FALSE;
>>>> 
>>>> -	/* See comment in zvol_open(). */
>>>> -	if (!MUTEX_HELD(&spa_namespace_lock)) {
>>>> -		mutex_enter(&spa_namespace_lock);
>>>> -		locked = B_TRUE;
>>>> -	}
>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>
>>>>  	zv = pp->private;
>>>>  	if (zv == NULL) {
>>>> -		if (locked)
>>>> -			mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return (SET_ERROR(ENXIO));
>>>>  	}
>>>> 
>>>> @@ -1002,8 +978,7 @@ zvol_close(struct g_provider *pp, int flag, int 
>>>> count)
>>>>  	if (zv->zv_total_opens == 0)
>>>>  		zvol_last_close(zv);
>>>> 
>>>> -	if (locked)
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>  	return (error);
>>>>  }
>>>> 
>>>> @@ -1658,12 +1633,12 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>> flag, cred_t *cr, int *rvalp)
>>>>  	int error = 0;
>>>>  	rl_t *rl;
>>>> 
>>>> -	mutex_enter(&spa_namespace_lock);
>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>
>>>>  	zv = zfsdev_get_soft_state(getminor(dev), ZSST_ZVOL);
>>>>
>>>>  	if (zv == NULL) {
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return (SET_ERROR(ENXIO));
>>>>  	}
>>>>  	ASSERT(zv->zv_total_opens > 0);
>>>> @@ -1677,7 +1652,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>> flag, cred_t *cr, int *rvalp)
>>>>  		dki.dki_ctype = DKC_UNKNOWN;
>>>>  		dki.dki_unit = getminor(dev);
>>>>  		dki.dki_maxtransfer = 1 << (SPA_MAXBLOCKSHIFT - 
>>>> zv->zv_min_bs);
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		if (ddi_copyout(&dki, (void *)arg, sizeof (dki), flag))
>>>>  			error = SET_ERROR(EFAULT);
>>>>  		return (error);
>>>> @@ -1687,7 +1662,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>> flag, cred_t *cr, int *rvalp)
>>>>  		dkm.dki_lbsize = 1U << zv->zv_min_bs;
>>>>  		dkm.dki_capacity = zv->zv_volsize >> zv->zv_min_bs;
>>>>  		dkm.dki_media_type = DK_UNKNOWN;
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		if (ddi_copyout(&dkm, (void *)arg, sizeof (dkm), flag))
>>>>  			error = SET_ERROR(EFAULT);
>>>>  		return (error);
>>>> @@ -1697,14 +1672,14 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>> flag, cred_t *cr, int *rvalp)
>>>>  			uint64_t vs = zv->zv_volsize;
>>>>  			uint8_t bs = zv->zv_min_bs;
>>>> 
>>>> -			mutex_exit(&spa_namespace_lock);
>>>> +			mutex_exit(&zfsdev_state_lock);
>>>>  			error = zvol_getefi((void *)arg, flag, vs, bs);
>>>>  			return (error);
>>>>  		}
>>>>
>>>>  	case DKIOCFLUSHWRITECACHE:
>>>>  		dkc = (struct dk_callback *)arg;
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		zil_commit(zv->zv_zilog, ZVOL_OBJ);
>>>>  		if ((flag & FKIOCTL) && dkc != NULL && dkc->dkc_callback) {
>>>>  			(*dkc->dkc_callback)(dkc->dkc_cookie, error);
>>>> @@ -1730,10 +1705,10 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>> flag, cred_t *cr, int *rvalp)
>>>>  			}
>>>>  			if (wce) {
>>>>  				zv->zv_flags |= ZVOL_WCE;
>>>> -				mutex_exit(&spa_namespace_lock);
>>>> +				mutex_exit(&zfsdev_state_lock);
>>>>  			} else {
>>>>  				zv->zv_flags &= ~ZVOL_WCE;
>>>> -				mutex_exit(&spa_namespace_lock);
>>>> +				mutex_exit(&zfsdev_state_lock);
>>>>  				zil_commit(zv->zv_zilog, ZVOL_OBJ);
>>>>  			}
>>>>  			return (0);
>>>> @@ -1828,7 +1803,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>> flag, cred_t *cr, int *rvalp)
>>>>  		break;
>>>>
>>>>  	}
>>>> -	mutex_exit(&spa_namespace_lock);
>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>  	return (error);
>>>>  }
>>>>  #endif	/* sun */
>>>> @@ -1844,12 +1819,14 @@ zvol_init(void)
>>>>  {
>>>>  	VERIFY(ddi_soft_state_init(&zfsdev_state, sizeof (zfs_soft_state_t),
>>>>  	    1) == 0);
>>>> +	mutex_init(&zfsdev_state_lock, NULL, MUTEX_DEFAULT, NULL);
>>>>  	ZFS_LOG(1, "ZVOL Initialized.");
>>>>  }
>>>>
>>>>  void
>>>>  zvol_fini(void)
>>>>  {
>>>> +	mutex_destroy(&zfsdev_state_lock);
>>>>  	ddi_soft_state_fini(&zfsdev_state);
>>>>  	ZFS_LOG(1, "ZVOL Deinitialized.");
>>>>  }
>>>> @@ -1889,7 +1866,7 @@ zvol_dump_init(zvol_state_t *zv, boolean_t resize)
>>>>  	uint64_t version = spa_version(spa);
>>>>  	enum zio_checksum checksum;
>>>> 
>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>  	ASSERT(vd->vdev_ops == &vdev_root_ops);
>>>>
>>>>  	error = dmu_free_long_range(zv->zv_objset, ZVOL_OBJ, 0,
>>>> @@ -2437,7 +2414,7 @@ zvol_rename_minor(struct g_geom *gp, const char 
>>>> *newname)
>>>>  	struct g_provider *pp;
>>>>  	zvol_state_t *zv;
>>>> 
>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>  	g_topology_assert();
>>>>
>>>>  	pp = LIST_FIRST(&gp->provider);
>>>> @@ -2471,7 +2448,7 @@ zvol_rename_minors(const char *oldname, const char 
>>>> *newname)
>>>>  	newnamelen = strlen(newname);
>>>>
>>>>  	DROP_GIANT();
>>>> -	mutex_enter(&spa_namespace_lock);
>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>  	g_topology_lock();
>>>>
>>>>  	LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) {
>>>> @@ -2494,6 +2471,6 @@ zvol_rename_minors(const char *oldname, const char 
>>>> *newname)
>>>>  	}
>>>>
>>>>  	g_topology_unlock();
>>>> -	mutex_exit(&spa_namespace_lock);
>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>  	PICKUP_GIANT();
>>>>  }
>>>> --
>>>> 1.8.4.2
>>>> 
>>> 
>>>> From 57d5a383b585c32c77af54e8cdacaddf8ce7584f Mon Sep 17 00:00:00 2001
>>>> From: Richard Kojedzinszky <krichy@cflinux.hu>
>>>> Date: Wed, 18 Dec 2013 22:11:21 +0100
>>>> Subject: [PATCH 2/2] ZFS snapshot handling fix
>>>> 
>>>> ---
>>>>  .../compat/opensolaris/kern/opensolaris_lookup.c   | 13 +++---
>>>>  .../opensolaris/uts/common/fs/zfs/zfs_ctldir.c     | 53 
>>>> +++++++++++++++-------
>>>>  2 files changed, 42 insertions(+), 24 deletions(-)
>>>> 
>>>> diff --git a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c 
>>>> b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>> index 94383d6..4cac053 100644
>>>> --- a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>> +++ b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>> @@ -81,6 +81,8 @@ traverse(vnode_t **cvpp, int lktype)
>>>>  	 * progress on this vnode.
>>>>  	 */
>>>> 
>>>> +	vn_lock(cvp, lktype);
>>>> +
>>>>  	for (;;) {
>>>>  		/*
>>>>  		 * Reached the end of the mount chain?
>>>> @@ -89,13 +91,7 @@ traverse(vnode_t **cvpp, int lktype)
>>>>  		if (vfsp == NULL)
>>>>  			break;
>>>>  		error = vfs_busy(vfsp, 0);
>>>> -		/*
>>>> -		 * tvp is NULL for *cvpp vnode, which we can't unlock.
>>>> -		 */
>>>> -		if (tvp != NULL)
>>>> -			vput(cvp);
>>>> -		else
>>>> -			vrele(cvp);
>>>> +		VOP_UNLOCK(cvp, 0);
>>>>  		if (error)
>>>>  			return (error);
>>>> 
>>>> @@ -107,6 +103,9 @@ traverse(vnode_t **cvpp, int lktype)
>>>>  		vfs_unbusy(vfsp);
>>>>  		if (error != 0)
>>>>  			return (error);
>>>> +
>>>> +		VN_RELE(cvp);
>>>> +
>>>>  		cvp = tvp;
>>>>  	}
>>>> 
>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c 
>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>> index 28ab1fa..d3464b4 100644
>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>> @@ -112,6 +112,25 @@ snapentry_compare(const void *a, const void *b)
>>>>  		return (0);
>>>>  }
>>>> 
>>>> +/* Return the zfsctl_snapdir_t object from current vnode, following
>>>> + * the lock orders in zfsctl_snapdir_lookup() to avoid deadlocks.
>>>> + * On return the passed in vp is unlocked */
>>>> +static zfsctl_snapdir_t *
>>>> +zfsctl_snapshot_get_snapdir(vnode_t *vp, vnode_t **dvpp)
>>>> +{
>>>> +	gfs_dir_t *dp = vp->v_data;
>>>> +	*dvpp = dp->gfsd_file.gfs_parent;
>>>> +	zfsctl_snapdir_t *sdp;
>>>> +
>>>> +	VN_HOLD(*dvpp);
>>>> +	VOP_UNLOCK(vp, 0);
>>>> +	vn_lock(*dvpp, LK_SHARED | LK_RETRY | LK_CANRECURSE);
>>>> +	sdp = (*dvpp)->v_data;
>>>> +	VOP_UNLOCK(*dvpp, 0);
>>>> +
>>>> +	return (sdp);
>>>> +}
>>>> +
>>>>  #ifdef sun
>>>>  vnodeops_t *zfsctl_ops_root;
>>>>  vnodeops_t *zfsctl_ops_snapdir;
>>>> @@ -1013,6 +1032,8 @@ zfsctl_snapdir_lookup(ap)
>>>>  			 * The snapshot was unmounted behind our backs,
>>>>  			 * try to remount it.
>>>>  			 */
>>>> +			VOP_UNLOCK(*vpp, 0);
>>>> +			VN_HOLD(*vpp);
>>>>  			VERIFY(zfsctl_snapshot_zname(dvp, nm, MAXNAMELEN, 
>>>> snapname) == 0);
>>>>  			goto domount;
>>>>  		} else {
>>>> @@ -1064,7 +1085,6 @@ zfsctl_snapdir_lookup(ap)
>>>>  	sep->se_name = kmem_alloc(strlen(nm) + 1, KM_SLEEP);
>>>>  	(void) strcpy(sep->se_name, nm);
>>>>  	*vpp = sep->se_root = zfsctl_snapshot_mknode(dvp, 
>>>> dmu_objset_id(snap));
>>>> -	VN_HOLD(*vpp);
>>>>  	avl_insert(&sdp->sd_snaps, sep, where);
>>>>
>>>>  	dmu_objset_rele(snap, FTAG);
>>>> @@ -1075,6 +1095,7 @@ domount:
>>>>  	(void) snprintf(mountpoint, mountpoint_len,
>>>>  	    "%s/" ZFS_CTLDIR_NAME "/snapshot/%s",
>>>>  	    dvp->v_vfsp->mnt_stat.f_mntonname, nm);
>>>> +	VN_HOLD(*vpp);
>>>>  	err = mount_snapshot(curthread, vpp, "zfs", mountpoint, snapname, 0);
>>>>  	kmem_free(mountpoint, mountpoint_len);
>>>>  	if (err == 0) {
>>>> @@ -1464,16 +1485,18 @@ zfsctl_snapshot_inactive(ap)
>>>>  	int locked;
>>>>  	vnode_t *dvp;
>>>> 
>>>> -	if (vp->v_count > 0)
>>>> -		goto end;
>>>> -
>>>> -	VERIFY(gfs_dir_lookup(vp, "..", &dvp, cr, 0, NULL, NULL) == 0);
>>>> -	sdp = dvp->v_data;
>>>> -	VOP_UNLOCK(dvp, 0);
>>>> +	sdp = zfsctl_snapshot_get_snapdir(vp, &dvp);
>>>>
>>>>  	if (!(locked = MUTEX_HELD(&sdp->sd_lock)))
>>>>  		mutex_enter(&sdp->sd_lock);
>>>> 
>>>> +	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
>>>> +
>>>> +	if (vp->v_count > 0) {
>>>> +		mutex_exit(&sdp->sd_lock);
>>>> +		return (0);
>>>> +	}
>>>> +
>>>>  	ASSERT(!vn_ismntpt(vp));
>>>>
>>>>  	sep = avl_first(&sdp->sd_snaps);
>>>> @@ -1494,7 +1517,6 @@ zfsctl_snapshot_inactive(ap)
>>>>  		mutex_exit(&sdp->sd_lock);
>>>>  	VN_RELE(dvp);
>>>> 
>>>> -end:
>>>>  	/*
>>>>  	 * Dispose of the vnode for the snapshot mount point.
>>>>  	 * This is safe to do because once this entry has been removed
>>>> @@ -1595,20 +1617,17 @@ zfsctl_snapshot_lookup(ap)
>>>>  static int
>>>>  zfsctl_snapshot_vptocnp(struct vop_vptocnp_args *ap)
>>>>  {
>>>> -	zfsvfs_t *zfsvfs = ap->a_vp->v_vfsp->vfs_data;
>>>> -	vnode_t *dvp, *vp;
>>>> +	vnode_t *dvp, *vp = ap->a_vp;
>>>>  	zfsctl_snapdir_t *sdp;
>>>>  	zfs_snapentry_t *sep;
>>>> -	int error;
>>>> +	int error = 0;
>>>> 
>>>> -	ASSERT(zfsvfs->z_ctldir != NULL);
>>>> -	error = zfsctl_root_lookup(zfsvfs->z_ctldir, "snapshot", &dvp,
>>>> -	    NULL, 0, NULL, kcred, NULL, NULL, NULL);
>>>> -	if (error != 0)
>>>> -		return (error);
>>>> -	sdp = dvp->v_data;
>>>> +	sdp = zfsctl_snapshot_get_snapdir(vp, &dvp);
>>>>
>>>>  	mutex_enter(&sdp->sd_lock);
>>>> +
>>>> +	vn_lock(vp, LK_SHARED | LK_RETRY);
>>>> +
>>>>  	sep = avl_first(&sdp->sd_snaps);
>>>>  	while (sep != NULL) {
>>>>  		vp = sep->se_root;
>>>> --
>>>> 1.8.4.2
>>>> 
>>> 
>>> -- 
>>> Pawel Jakub Dawidek                       http://www.wheelsystems.com
>>> FreeBSD committer                         http://www.FreeBSD.org
>>> Am I Evil? Yes, I Am!                     http://mobter.com
>>> 
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>> 
>
From owner-freebsd-fs@FreeBSD.ORG  Tue Dec 24 20:35:37 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id B76355FD;
 Tue, 24 Dec 2013 20:35:37 +0000 (UTC)
Received: from krichy.tvnetwork.hu (unknown [IPv6:2a01:be00:0:2::10])
 by mx1.freebsd.org (Postfix) with ESMTP id 2838D1E6A;
 Tue, 24 Dec 2013 20:35:37 +0000 (UTC)
Received: by krichy.tvnetwork.hu (Postfix, from userid 1000)
 id 9B014A899; Tue, 24 Dec 2013 21:35:16 +0100 (CET)
Received: from localhost (localhost [127.0.0.1])
 by krichy.tvnetwork.hu (Postfix) with ESMTP id 97E3BA898;
 Tue, 24 Dec 2013 21:35:16 +0100 (CET)
Date: Tue, 24 Dec 2013 21:35:16 +0100 (CET)
From: krichy@tvnetwork.hu
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
In-Reply-To: <alpine.DEB.2.10.1312242126250.19813@krichy.tvnetwork.hu>
Message-ID: <alpine.DEB.2.10.1312242133240.20268@krichy.tvnetwork.hu>
References: <alpine.DEB.2.10.1312161647410.11599@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312171326520.7714@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312191629370.4344@krichy.tvnetwork.hu>
 <20131220134405.GE1658@garage.freebsd.pl>
 <alpine.DEB.2.10.1312231750030.31822@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312240706470.8128@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312242126250.19813@krichy.tvnetwork.hu>
User-Agent: Alpine 2.10 (DEB 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-fs@freebsd.org, avg@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 24 Dec 2013 20:35:37 -0000

Dear all,

Now a user caused deadlock simply caused buildkernel to be blocked, so I 
must correct myself in that this not only does a DoS for .zfs/snapshot 
directories, but may for the whole system.

Regards,

P.S.: Merry Christmas!

Kojedzinszky Richard
Euronet Magyarorszag Informatikai Zrt.

On Tue, 24 Dec 2013, krichy@tvnetwork.hu wrote:

> Date: Tue, 24 Dec 2013 21:28:39 +0100 (CET)
> From: krichy@tvnetwork.hu
> To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
> Cc: freebsd-fs@freebsd.org, avg@freebsd.org
> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
> 
> Dear all,
>
> Unfortunately, this seems to be serious, as a plain user can deadlock zfs 
> using the attached script, and can make a DoS against the mounted dataset's 
> .zfs/snapshot directory.
>
> Regards,
>
> Kojedzinszky Richard
> Euronet Magyarorszag Informatikai Zrt.
>
> On Tue, 24 Dec 2013, krichy@tvnetwork.hu wrote:
>
>> Date: Tue, 24 Dec 2013 07:10:05 +0100 (CET)
>> From: krichy@tvnetwork.hu
>> To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
>> Cc: freebsd-fs@freebsd.org, avg@freebsd.org
>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>> 
>> Dear devs,
>> 
>> A third one, again the snapshot dirs:
>> 
>> # cd /tmp/test/.zfs/snapshot && yes .. | xargs stat > /dev/null &
>> # yes /tmp/test/.zfs/snapshot/snap | xargs umount > /dev/null
>> 
>> Will deadlock in a few seconds.
>> 
>> The first holds a lock on .zfs/snapshot vnode, when looking up .. it tries 
>> to lock .zfs, while the other just tries to do this in reversed order.
>> 
>> In a vop_lookup, why is the passed in vnode, and the returned vnode must be 
>> locked? Why is not enough to have it held?
>> 
>> Thanks in advance,
>> Kojedzinszky Richard
>> Euronet Magyarorszag Informatikai Zrt.
>> 
>> On Mon, 23 Dec 2013, krichy@tvnetwork.hu wrote:
>> 
>>> Date: Mon, 23 Dec 2013 17:55:08 +0100 (CET)
>>> From: krichy@tvnetwork.hu
>>> To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
>>> Cc: freebsd-fs@freebsd.org, avg@freebsd.org
>>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>>> 
>>> Dear Pawel,
>>> 
>>> Thanks for your response. I hope someone will review my work.
>>> 
>>> Secondly, I think I;ve found another deadlock scenario:
>>> 
>>> When 'zfs send -R' finishes, in zfsdev_close() it has spa_namespace_lock 
>>> held. Then it does an unmount of all snapshots, which in turn goes to 
>>> zfsctl_snapshot_inactive(), in which sdp->sd_lock is being locked.
>>> 
>>> Concurrently, a zfsctl_snapdir_lookup(), holding the same sdp->sd_lock 
>>> does a snapshot mount, which somewhere tries to acquire 
>>> spa_namespace_lock. So it deadlocks. Currently I dont know how to 
>>> workaround this.
>>> 
>>> Is spa_namespace_lock need to be held in zfsctl_snapshot_inactive()?
>>> 
>>> What if we release it function enter, and re-acquire upon exit?
>>> 
>>> Thanks in advance,
>>> 
>>> 
>>> Kojedzinszky Richard
>>> Euronet Magyarorszag Informatikai Zrt.
>>> 
>>> On Fri, 20 Dec 2013, Pawel Jakub Dawidek wrote:
>>> 
>>>> Date: Fri, 20 Dec 2013 14:44:05 +0100
>>>> From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
>>>> To: krichy@tvnetwork.hu
>>>> Cc: freebsd-fs@freebsd.org
>>>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>>>> 
>>>> On Thu, Dec 19, 2013 at 04:46:11PM +0100, krichy@tvnetwork.hu wrote:
>>>>> Dear devs,
>>>>> 
>>>>> I am attaching a more clear patch/fix for my snapshot handling issues
>>>>> (0002), but I would be happy if some ZFS expert would comment it. I am
>>>>> trying to solve it at least for two weeks now, and an ACK or a NACK 
>>>>> would
>>>>> be nice from someone. Also a commit is reverted since it also caused
>>>>> deadlocks. I've read its comments, which also eliminates deadlocks, but 
>>>>> I
>>>>> did not find any reference how to produce that deadlock. In my view
>>>>> reverting that makes my issues disappear, but I dont know what new cases
>>>>> will it rise.
>>>> 
>>>> Richard,
>>>> 
>>>> I won't be able to analyse it myself anytime soon, because of other
>>>> obligations, but I forwarded you e-mail to the zfs-devel@ mailing list
>>>> (it is closed, but gathers FreeBSD ZFS devs). Hopefully someone from
>>>> there will be able to help you.
>>>> 
>>>>> I've rewritten traverse() to make more like upstream, added two extra
>>>>> VN_HOLD()s to snapdir_lookup() when traverse returned the same vnode 
>>>>> what
>>>>> was passed to it (I dont know even that upon creating a snapshot vnode 
>>>>> why
>>>>> is that extra two holds needed for the vnode.) And unfortunately, due to
>>>>> FreeBSD calls vop_inactive callbacks with vnodes locked, that could also
>>>>> cause deadlocks, so zfsctl_snapshot_inactive() and
>>>>> zfsctl_snapshot_vptocnp() has been rewritten to work that around.
>>>>> 
>>>>> After this, one may also get a deadlock, when a simple access would call
>>>>> into zfsctl_snapshot_lookup(). The documentation says, that those vnodes
>>>>> should always be covered, but after some stress test, sometimes we hit
>>>>> that call, and that can cause again deadlocks. In our environment I've
>>>>> just uncommented that callback, which returns ENODIR on some calls, but 
>>>>> at
>>>>> least not a deadlock.
>>>>> 
>>>>> The attached script can be used to reproduce my cases (would one confirm
>>>>> that?), and after the patches applied, they disappear (confirm?).
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> 
>>>>> Kojedzinszky Richard
>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>> 
>>>>> On Tue, 17 Dec 2013, krichy@tvnetwork.hu wrote:
>>>>> 
>>>>>> Date: Tue, 17 Dec 2013 14:50:16 +0100 (CET)
>>>>>> From: krichy@tvnetwork.hu
>>>>>> To: pjd@freebsd.org
>>>>>> Cc: freebsd-fs@freebsd.org
>>>>>> Subject: Re: kern/184677 (fwd)
>>>>>> 
>>>>>> Dear devs,
>>>>>> 
>>>>>> I will sum up my experience regarding the issue:
>>>>>> 
>>>>>> The sympton is that a concurrent 'zfs send -R' and some activity on the
>>>>>> snapshot dir (or in the snapshot) may cause a deadlock.
>>>>>> 
>>>>>> After investigating the problem, I found that zfs send umounts the 
>>>>>> snapshots,
>>>>>> and that causes the deadlock, so later I tested only with concurrent 
>>>>>> umount
>>>>>> and the "activity". More later I found that listing the snapshots in
>>>>>> .zfs/snapshot/ and unounting them can cause the found deadlock, so I 
>>>>>> used
>>>>>> them for the tests. But for my surprise, instead of a deadlock, a 
>>>>>> recursive
>>>>>> lock panic has arised.
>>>>>> 
>>>>>> The vnode for the ".zfs/snapshot/" directory contains zfs's 
>>>>>> zfsctl_snapdir_t
>>>>>> structure (sdp). This contains a tree of mounted snapshots, and each 
>>>>>> entry
>>>>>> (sep) contains the vnode of entry on which the snapshot is mounted on 
>>>>>> top
>>>>>> (se_root). The strange is that the se_root member does not hold a 
>>>>>> reference
>>>>>> for the vnode, just a simple pointer to it.
>>>>>> 
>>>>>> Upon entry lookup (zfsctl_snapdir_lookup()) the "snapshot" vnode is 
>>>>>> locked,
>>>>>> the zfsctl_snapdir_t's tree is locked, and searched for the mount if it
>>>>>> exists already. If it founds no entry, does the mount. In the case of 
>>>>>> an
>>>>>> entry was found, the se_root member contains the vnode which the 
>>>>>> snapshot is
>>>>>> mounted on. Thus, a reference is taken for it, and the traverse() call 
>>>>>> will
>>>>>> resolve to the real root vnode of the mounted snapshot, returning it as
>>>>>> locked. (Examining the traverse() code I've found that it did not 
>>>>>> follow
>>>>>> FreeBSD's lock order recommendation described in sys/kern/vfs_subr.c.)
>>>>>> 
>>>>>> On the other way, when an umount is issued, the se_root vnode looses 
>>>>>> its last
>>>>>> reference (as only the mountpoint holds one for it), it goes through 
>>>>>> the
>>>>>> vinactive() path, to zfsctl_snapshot_inactive(). In FreeBSD this is 
>>>>>> called
>>>>>> with a locked vnode, so this is a deadlock race condition. While
>>>>>> zfsctl_snapdir_lookup() holds the mutex for the sdp tree, and 
>>>>>> traverse()
>>>>>> tries to acquire the se_root, zfsctl_snapshot_inactive() holds the lock 
>>>>>> on
>>>>>> se_root while tries to access the sdp lock.
>>>>>> 
>>>>>> The zfsctl_snapshot_inactive() has an if statement checking the 
>>>>>> v_usecount,
>>>>>> which is incremented in zfsctl_snapdir_lookup(), but in that context it 
>>>>>> is
>>>>>> not covered by VI_LOCK. And it seems to me that FreeBSD's vinactive() 
>>>>>> path
>>>>>> assumes that the vnode remains inactive (as opposed to illumos, at 
>>>>>> least how
>>>>>> i read the code). So zfsctl_snapshot_inactive() must free resources 
>>>>>> while in
>>>>>> a locked state. I was a bit confused, and probably that is why the 
>>>>>> previously
>>>>>> posted patch is as is.
>>>>>> 
>>>>>> Maybe if I had some clues on the directions of this problem, I could 
>>>>>> have
>>>>>> worked more for a nicer, shorter solution.
>>>>>> 
>>>>>> Please someone comment on my post.
>>>>>> 
>>>>>> Regards,
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Kojedzinszky Richard
>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>> 
>>>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote:
>>>>>> 
>>>>>>> Date: Mon, 16 Dec 2013 16:52:16 +0100 (CET)
>>>>>>> From: krichy@tvnetwork.hu
>>>>>>> To: pjd@freebsd.org
>>>>>>> Cc: freebsd-fs@freebsd.org
>>>>>>> Subject: Re: kern/184677 (fwd)
>>>>>>> 
>>>>>>> Dear PJD,
>>>>>>> 
>>>>>>> I am a happy FreeBSD user, I am sure you've read my previous posts
>>>>>>> regarding some issues in ZFS. Please give some advice for me, where to 
>>>>>>> look
>>>>>>> for solutions, or how could I help to resolve those issues.
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Kojedzinszky Richard
>>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>> 
>>>>>>> ---------- Forwarded message ----------
>>>>>>> Date: Mon, 16 Dec 2013 15:23:06 +0100 (CET)
>>>>>>> From: krichy@tvnetwork.hu
>>>>>>> To: freebsd-fs@freebsd.org
>>>>>>> Subject: Re: kern/184677
>>>>>>> 
>>>>>>> 
>>>>>>> Seems that pjd did a change which eliminated the zfsdev_state_lock on 
>>>>>>> Fri
>>>>>>> Aug 12 07:04:16 2011 +0000, which might introduced a new deadlock
>>>>>>> situation. Any comments on this?
>>>>>>> 
>>>>>>> 
>>>>>>> Kojedzinszky Richard
>>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>> 
>>>>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote:
>>>>>>> 
>>>>>>>> Date: Mon, 16 Dec 2013 11:08:11 +0100 (CET)
>>>>>>>> From: krichy@tvnetwork.hu
>>>>>>>> To: freebsd-fs@freebsd.org
>>>>>>>> Subject: kern/184677
>>>>>>>> 
>>>>>>>> Dear devs,
>>>>>>>> 
>>>>>>>> I've attached a patch, which makes the recursive lockmgr disappear, 
>>>>>>>> and
>>>>>>>> makes the reported bug to disappear. I dont know if I followed any
>>>>>>>> guidelines well, or not, but at least it works for me. Please some
>>>>>>>> ZFS/FreeBSD fs expert review it, and fix it where it needed.
>>>>>>>> 
>>>>>>>> But unfortunately, my original problem is still not solved, maybe the 
>>>>>>>> same
>>>>>>>> as Ryan's:
>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-December/018707.html
>>>>>>>> 
>>>>>>>> Tracing the problem down is that zfsctl_snapdir_lookup() tries to 
>>>>>>>> acquire
>>>>>>>> spa_namespace_lock while when finishing a zfs send -R does a
>>>>>>>> zfsdev_close(), and that also holds the same mutex. And this causes a
>>>>>>>> deadlock scenario. I looked at illumos's code, and for some reason 
>>>>>>>> they
>>>>>>>> use another mutex on zfsdev_close(), which therefore may not deadlock 
>>>>>>>> with
>>>>>>>> zfsctl_snapdir_lookup(). But I am still investigating the problem.
>>>>>>>> 
>>>>>>>> I would like to help making ZFS more stable on freebsd also with its 
>>>>>>>> whole
>>>>>>>> functionality. I would be very thankful if some expert would give 
>>>>>>>> some
>>>>>>>> advice, how to solve these bugs. PJD, Steven, Xin?
>>>>>>>> 
>>>>>>>> Thanks in advance,
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Kojedzinszky Richard
>>>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>> _______________________________________________
>>>>>>> freebsd-fs@freebsd.org mailing list
>>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>>>> From 39298da838d006ad225e41529d7b7f240fccfe73 Mon Sep 17 00:00:00 2001
>>>>> From: Richard Kojedzinszky <krichy@cflinux.hu>
>>>>> Date: Mon, 16 Dec 2013 15:39:11 +0100
>>>>> Subject: [PATCH 1/2] Revert "Eliminate the zfsdev_state_lock entirely 
>>>>> and
>>>>>  replace it with the"
>>>>> 
>>>>> This reverts commit 1d8972b3f353f986eb5b85bc108b1c0d946d3218.
>>>>> 
>>>>> Conflicts:
>>>>> 	sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>>>> ---
>>>>>  .../opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h  |   1 +
>>>>>  .../opensolaris/uts/common/fs/zfs/vdev_geom.c      |  14 ++-
>>>>>  .../opensolaris/uts/common/fs/zfs/zfs_ioctl.c      |  16 +--
>>>>>  .../contrib/opensolaris/uts/common/fs/zfs/zvol.c   | 119 
>>>>> +++++++++------------
>>>>>  4 files changed, 70 insertions(+), 80 deletions(-)
>>>>> 
>>>>> diff --git 
>>>>> a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h 
>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h
>>>>> index af2def2..8272c4d 100644
>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h
>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h
>>>>> @@ -383,6 +383,7 @@ extern void *zfsdev_get_soft_state(minor_t minor,
>>>>>  extern minor_t zfsdev_minor_alloc(void);
>>>>>
>>>>>  extern void *zfsdev_state;
>>>>> +extern kmutex_t zfsdev_state_lock;
>>>>>
>>>>>  #endif	/* _KERNEL */
>>>>> 
>>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c 
>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>>>> index 15685a5..5c3e9f3 100644
>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>>>> @@ -581,7 +581,7 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t 
>>>>> *max_psize,
>>>>>  	struct g_provider *pp;
>>>>>  	struct g_consumer *cp;
>>>>>  	size_t bufsize;
>>>>> -	int error;
>>>>> +	int error, lock;
>>>>>
>>>>>  	/*
>>>>>  	 * We must have a pathname, and it must be absolute.
>>>>> @@ -593,6 +593,12 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, 
>>>>> uint64_t *max_psize,
>>>>>
>>>>>  	vd->vdev_tsd = NULL;
>>>>> 
>>>>> +	if (mutex_owned(&spa_namespace_lock)) {
>>>>> +		mutex_exit(&spa_namespace_lock);
>>>>> +		lock = 1;
>>>>> +	} else {
>>>>> +		lock = 0;
>>>>> +	}
>>>>>  	DROP_GIANT();
>>>>>  	g_topology_lock();
>>>>>  	error = 0;
>>>>> @@ -624,7 +630,11 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, 
>>>>> uint64_t *max_psize,
>>>>>  	    !ISP2(cp->provider->sectorsize)) {
>>>>>  		ZFS_LOG(1, "Provider %s has unsupported sectorsize.",
>>>>>  		    vd->vdev_path);
>>>>> +
>>>>> +		g_topology_lock();
>>>>>  		vdev_geom_detach(cp, 0);
>>>>> +		g_topology_unlock();
>>>>> +
>>>>>  		error = EINVAL;
>>>>>  		cp = NULL;
>>>>>  	} else if (cp->acw == 0 && (spa_mode(vd->vdev_spa) & FWRITE) != 0) {
>>>>> @@ -647,6 +657,8 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t 
>>>>> *max_psize,
>>>>>  	}
>>>>>  	g_topology_unlock();
>>>>>  	PICKUP_GIANT();
>>>>> +	if (lock)
>>>>> +		mutex_enter(&spa_namespace_lock);
>>>>>  	if (cp == NULL) {
>>>>>  		vd->vdev_stat.vs_aux = VDEV_AUX_OPEN_FAILED;
>>>>>  		return (error);
>>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c 
>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>>>> index e9fba26..91becde 100644
>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>>>> @@ -5635,7 +5635,7 @@ zfsdev_minor_alloc(void)
>>>>>  	static minor_t last_minor;
>>>>>  	minor_t m;
>>>>> 
>>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>
>>>>>  	for (m = last_minor + 1; m != last_minor; m++) {
>>>>>  		if (m > ZFSDEV_MAX_MINOR)
>>>>> @@ -5655,7 +5655,7 @@ zfs_ctldev_init(struct cdev *devp)
>>>>>  	minor_t minor;
>>>>>  	zfs_soft_state_t *zs;
>>>>> 
>>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>
>>>>>  	minor = zfsdev_minor_alloc();
>>>>>  	if (minor == 0)
>>>>> @@ -5676,7 +5676,7 @@ zfs_ctldev_init(struct cdev *devp)
>>>>>  static void
>>>>>  zfs_ctldev_destroy(zfs_onexit_t *zo, minor_t minor)
>>>>>  {
>>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>
>>>>>  	zfs_onexit_destroy(zo);
>>>>>  	ddi_soft_state_free(zfsdev_state, minor);
>>>>> @@ -5706,9 +5706,9 @@ zfsdev_open(struct cdev *devp, int flag, int mode, 
>>>>> struct thread *td)
>>>>>
>>>>>  	/* This is the control device. Allocate a new minor if requested. */
>>>>>  	if (flag & FEXCL) {
>>>>> -		mutex_enter(&spa_namespace_lock);
>>>>> +		mutex_enter(&zfsdev_state_lock);
>>>>>  		error = zfs_ctldev_init(devp);
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  	}
>>>>>
>>>>>  	return (error);
>>>>> @@ -5723,14 +5723,14 @@ zfsdev_close(void *data)
>>>>>  	if (minor == 0)
>>>>>  		return;
>>>>> 
>>>>> -	mutex_enter(&spa_namespace_lock);
>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>  	zo = zfsdev_get_soft_state(minor, ZSST_CTLDEV);
>>>>>  	if (zo == NULL) {
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return;
>>>>>  	}
>>>>>  	zfs_ctldev_destroy(zo, minor);
>>>>> -	mutex_exit(&spa_namespace_lock);
>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>  }
>>>>>
>>>>>  static int
>>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c 
>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>>>> index 72d4502..aec5219 100644
>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>>>> @@ -104,11 +104,12 @@ static char *zvol_tag = "zvol_tag";
>>>>>  #define	ZVOL_DUMPSIZE		"dumpsize"
>>>>>
>>>>>  /*
>>>>> - * The spa_namespace_lock protects the zfsdev_state structure from 
>>>>> being
>>>>> - * modified while it's being used, e.g. an open that comes in before a
>>>>> - * create finishes.  It also protects temporary opens of the dataset so 
>>>>> that,
>>>>> + * This lock protects the zfsdev_state structure from being modified
>>>>> + * while it's being used, e.g. an open that comes in before a create
>>>>> + * finishes.  It also protects temporary opens of the dataset so that,
>>>>>   * e.g., an open doesn't get a spurious EBUSY.
>>>>>   */
>>>>> +kmutex_t zfsdev_state_lock;
>>>>>  static uint32_t zvol_minors;
>>>>>
>>>>>  typedef struct zvol_extent {
>>>>> @@ -249,7 +250,7 @@ zvol_minor_lookup(const char *name)
>>>>>  	struct g_geom *gp;
>>>>>  	zvol_state_t *zv = NULL;
>>>>> 
>>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>
>>>>>  	g_topology_lock();
>>>>>  	LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) {
>>>>> @@ -465,11 +466,11 @@ zvol_name2minor(const char *name, minor_t *minor)
>>>>>  {
>>>>>  	zvol_state_t *zv;
>>>>> 
>>>>> -	mutex_enter(&spa_namespace_lock);
>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>  	zv = zvol_minor_lookup(name);
>>>>>  	if (minor && zv)
>>>>>  		*minor = zv->zv_minor;
>>>>> -	mutex_exit(&spa_namespace_lock);
>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>  	return (zv ? 0 : -1);
>>>>>  }
>>>>>  #endif	/* sun */
>>>>> @@ -489,10 +490,10 @@ zvol_create_minor(const char *name)
>>>>>
>>>>>  	ZFS_LOG(1, "Creating ZVOL %s...", name);
>>>>> 
>>>>> -	mutex_enter(&spa_namespace_lock);
>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>
>>>>>  	if (zvol_minor_lookup(name) != NULL) {
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return (SET_ERROR(EEXIST));
>>>>>  	}
>>>>> 
>>>>> @@ -500,20 +501,20 @@ zvol_create_minor(const char *name)
>>>>>  	error = dmu_objset_own(name, DMU_OST_ZVOL, B_TRUE, FTAG, &os);
>>>>>
>>>>>  	if (error) {
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return (error);
>>>>>  	}
>>>>>
>>>>>  #ifdef sun
>>>>>  	if ((minor = zfsdev_minor_alloc()) == 0) {
>>>>>  		dmu_objset_disown(os, FTAG);
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return (SET_ERROR(ENXIO));
>>>>>  	}
>>>>>
>>>>>  	if (ddi_soft_state_zalloc(zfsdev_state, minor) != DDI_SUCCESS) {
>>>>>  		dmu_objset_disown(os, FTAG);
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return (SET_ERROR(EAGAIN));
>>>>>  	}
>>>>>  	(void) ddi_prop_update_string(minor, zfs_dip, ZVOL_PROP_NAME,
>>>>> @@ -525,7 +526,7 @@ zvol_create_minor(const char *name)
>>>>>  	    minor, DDI_PSEUDO, 0) == DDI_FAILURE) {
>>>>>  		ddi_soft_state_free(zfsdev_state, minor);
>>>>>  		dmu_objset_disown(os, FTAG);
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return (SET_ERROR(EAGAIN));
>>>>>  	}
>>>>> 
>>>>> @@ -536,7 +537,7 @@ zvol_create_minor(const char *name)
>>>>>  		ddi_remove_minor_node(zfs_dip, chrbuf);
>>>>>  		ddi_soft_state_free(zfsdev_state, minor);
>>>>>  		dmu_objset_disown(os, FTAG);
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return (SET_ERROR(EAGAIN));
>>>>>  	}
>>>>> 
>>>>> @@ -587,7 +588,7 @@ zvol_create_minor(const char *name)
>>>>>
>>>>>  	zvol_minors++;
>>>>> 
>>>>> -	mutex_exit(&spa_namespace_lock);
>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>
>>>>>  	zvol_geom_run(zv);
>>>>> 
>>>>> @@ -609,7 +610,7 @@ zvol_remove_zv(zvol_state_t *zv)
>>>>>  	minor_t minor = zv->zv_minor;
>>>>>  #endif
>>>>> 
>>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>  	if (zv->zv_total_opens != 0)
>>>>>  		return (SET_ERROR(EBUSY));
>>>>> 
>>>>> @@ -635,15 +636,15 @@ zvol_remove_minor(const char *name)
>>>>>  	zvol_state_t *zv;
>>>>>  	int rc;
>>>>> 
>>>>> -	mutex_enter(&spa_namespace_lock);
>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>  	if ((zv = zvol_minor_lookup(name)) == NULL) {
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return (SET_ERROR(ENXIO));
>>>>>  	}
>>>>>  	g_topology_lock();
>>>>>  	rc = zvol_remove_zv(zv);
>>>>>  	g_topology_unlock();
>>>>> -	mutex_exit(&spa_namespace_lock);
>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>  	return (rc);
>>>>>  }
>>>>> 
>>>>> @@ -755,7 +756,7 @@ zvol_update_volsize(objset_t *os, uint64_t volsize)
>>>>>  	dmu_tx_t *tx;
>>>>>  	int error;
>>>>> 
>>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>
>>>>>  	tx = dmu_tx_create(os);
>>>>>  	dmu_tx_hold_zap(tx, ZVOL_ZAP_OBJ, TRUE, NULL);
>>>>> @@ -786,7 +787,7 @@ zvol_remove_minors(const char *name)
>>>>>  	namelen = strlen(name);
>>>>>
>>>>>  	DROP_GIANT();
>>>>> -	mutex_enter(&spa_namespace_lock);
>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>  	g_topology_lock();
>>>>>
>>>>>  	LIST_FOREACH_SAFE(gp, &zfs_zvol_class.geom, geom, gptmp) {
>>>>> @@ -804,7 +805,7 @@ zvol_remove_minors(const char *name)
>>>>>  	}
>>>>>
>>>>>  	g_topology_unlock();
>>>>> -	mutex_exit(&spa_namespace_lock);
>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>  	PICKUP_GIANT();
>>>>>  }
>>>>> 
>>>>> @@ -818,10 +819,10 @@ zvol_set_volsize(const char *name, major_t maj, 
>>>>> uint64_t volsize)
>>>>>  	uint64_t old_volsize = 0ULL;
>>>>>  	uint64_t readonly;
>>>>> 
>>>>> -	mutex_enter(&spa_namespace_lock);
>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>  	zv = zvol_minor_lookup(name);
>>>>>  	if ((error = dmu_objset_hold(name, FTAG, &os)) != 0) {
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return (error);
>>>>>  	}
>>>>> 
>>>>> @@ -888,7 +889,7 @@ zvol_set_volsize(const char *name, major_t maj, 
>>>>> uint64_t volsize)
>>>>>  out:
>>>>>  	dmu_objset_rele(os, FTAG);
>>>>> 
>>>>> -	mutex_exit(&spa_namespace_lock);
>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>
>>>>>  	return (error);
>>>>>  }
>>>>> @@ -899,36 +900,19 @@ zvol_open(struct g_provider *pp, int flag, int 
>>>>> count)
>>>>>  {
>>>>>  	zvol_state_t *zv;
>>>>>  	int err = 0;
>>>>> -	boolean_t locked = B_FALSE;
>>>>> 
>>>>> -	/*
>>>>> -	 * Protect against recursively entering spa_namespace_lock
>>>>> -	 * when spa_open() is used for a pool on a (local) ZVOL(s).
>>>>> -	 * This is needed since we replaced upstream zfsdev_state_lock
>>>>> -	 * with spa_namespace_lock in the ZVOL code.
>>>>> -	 * We are using the same trick as spa_open().
>>>>> -	 * Note that calls in zvol_first_open which need to resolve
>>>>> -	 * pool name to a spa object will enter spa_open()
>>>>> -	 * recursively, but that function already has all the
>>>>> -	 * necessary protection.
>>>>> -	 */
>>>>> -	if (!MUTEX_HELD(&spa_namespace_lock)) {
>>>>> -		mutex_enter(&spa_namespace_lock);
>>>>> -		locked = B_TRUE;
>>>>> -	}
>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>
>>>>>  	zv = pp->private;
>>>>>  	if (zv == NULL) {
>>>>> -		if (locked)
>>>>> -			mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return (SET_ERROR(ENXIO));
>>>>>  	}
>>>>>
>>>>>  	if (zv->zv_total_opens == 0)
>>>>>  		err = zvol_first_open(zv);
>>>>>  	if (err) {
>>>>> -		if (locked)
>>>>> -			mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return (err);
>>>>>  	}
>>>>>  	if ((flag & FWRITE) && (zv->zv_flags & ZVOL_RDONLY)) {
>>>>> @@ -950,15 +934,13 @@ zvol_open(struct g_provider *pp, int flag, int 
>>>>> count)
>>>>>  #endif
>>>>>
>>>>>  	zv->zv_total_opens += count;
>>>>> -	if (locked)
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>
>>>>>  	return (err);
>>>>>  out:
>>>>>  	if (zv->zv_total_opens == 0)
>>>>>  		zvol_last_close(zv);
>>>>> -	if (locked)
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>  	return (err);
>>>>>  }
>>>>> 
>>>>> @@ -968,18 +950,12 @@ zvol_close(struct g_provider *pp, int flag, int 
>>>>> count)
>>>>>  {
>>>>>  	zvol_state_t *zv;
>>>>>  	int error = 0;
>>>>> -	boolean_t locked = B_FALSE;
>>>>> 
>>>>> -	/* See comment in zvol_open(). */
>>>>> -	if (!MUTEX_HELD(&spa_namespace_lock)) {
>>>>> -		mutex_enter(&spa_namespace_lock);
>>>>> -		locked = B_TRUE;
>>>>> -	}
>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>
>>>>>  	zv = pp->private;
>>>>>  	if (zv == NULL) {
>>>>> -		if (locked)
>>>>> -			mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return (SET_ERROR(ENXIO));
>>>>>  	}
>>>>> 
>>>>> @@ -1002,8 +978,7 @@ zvol_close(struct g_provider *pp, int flag, int 
>>>>> count)
>>>>>  	if (zv->zv_total_opens == 0)
>>>>>  		zvol_last_close(zv);
>>>>> 
>>>>> -	if (locked)
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>  	return (error);
>>>>>  }
>>>>> 
>>>>> @@ -1658,12 +1633,12 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>>> flag, cred_t *cr, int *rvalp)
>>>>>  	int error = 0;
>>>>>  	rl_t *rl;
>>>>> 
>>>>> -	mutex_enter(&spa_namespace_lock);
>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>
>>>>>  	zv = zfsdev_get_soft_state(getminor(dev), ZSST_ZVOL);
>>>>>
>>>>>  	if (zv == NULL) {
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return (SET_ERROR(ENXIO));
>>>>>  	}
>>>>>  	ASSERT(zv->zv_total_opens > 0);
>>>>> @@ -1677,7 +1652,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>>> flag, cred_t *cr, int *rvalp)
>>>>>  		dki.dki_ctype = DKC_UNKNOWN;
>>>>>  		dki.dki_unit = getminor(dev);
>>>>>  		dki.dki_maxtransfer = 1 << (SPA_MAXBLOCKSHIFT - 
>>>>> zv->zv_min_bs);
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		if (ddi_copyout(&dki, (void *)arg, sizeof (dki), flag))
>>>>>  			error = SET_ERROR(EFAULT);
>>>>>  		return (error);
>>>>> @@ -1687,7 +1662,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>>> flag, cred_t *cr, int *rvalp)
>>>>>  		dkm.dki_lbsize = 1U << zv->zv_min_bs;
>>>>>  		dkm.dki_capacity = zv->zv_volsize >> zv->zv_min_bs;
>>>>>  		dkm.dki_media_type = DK_UNKNOWN;
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		if (ddi_copyout(&dkm, (void *)arg, sizeof (dkm), flag))
>>>>>  			error = SET_ERROR(EFAULT);
>>>>>  		return (error);
>>>>> @@ -1697,14 +1672,14 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>>> flag, cred_t *cr, int *rvalp)
>>>>>  			uint64_t vs = zv->zv_volsize;
>>>>>  			uint8_t bs = zv->zv_min_bs;
>>>>> 
>>>>> -			mutex_exit(&spa_namespace_lock);
>>>>> +			mutex_exit(&zfsdev_state_lock);
>>>>>  			error = zvol_getefi((void *)arg, flag, vs, bs);
>>>>>  			return (error);
>>>>>  		}
>>>>>
>>>>>  	case DKIOCFLUSHWRITECACHE:
>>>>>  		dkc = (struct dk_callback *)arg;
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		zil_commit(zv->zv_zilog, ZVOL_OBJ);
>>>>>  		if ((flag & FKIOCTL) && dkc != NULL && dkc->dkc_callback) {
>>>>>  			(*dkc->dkc_callback)(dkc->dkc_cookie, error);
>>>>> @@ -1730,10 +1705,10 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>>> flag, cred_t *cr, int *rvalp)
>>>>>  			}
>>>>>  			if (wce) {
>>>>>  				zv->zv_flags |= ZVOL_WCE;
>>>>> -				mutex_exit(&spa_namespace_lock);
>>>>> +				mutex_exit(&zfsdev_state_lock);
>>>>>  			} else {
>>>>>  				zv->zv_flags &= ~ZVOL_WCE;
>>>>> -				mutex_exit(&spa_namespace_lock);
>>>>> +				mutex_exit(&zfsdev_state_lock);
>>>>>  				zil_commit(zv->zv_zilog, ZVOL_OBJ);
>>>>>  			}
>>>>>  			return (0);
>>>>> @@ -1828,7 +1803,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>>> flag, cred_t *cr, int *rvalp)
>>>>>  		break;
>>>>>
>>>>>  	}
>>>>> -	mutex_exit(&spa_namespace_lock);
>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>  	return (error);
>>>>>  }
>>>>>  #endif	/* sun */
>>>>> @@ -1844,12 +1819,14 @@ zvol_init(void)
>>>>>  {
>>>>>  	VERIFY(ddi_soft_state_init(&zfsdev_state, sizeof (zfs_soft_state_t),
>>>>>  	    1) == 0);
>>>>> +	mutex_init(&zfsdev_state_lock, NULL, MUTEX_DEFAULT, NULL);
>>>>>  	ZFS_LOG(1, "ZVOL Initialized.");
>>>>>  }
>>>>>
>>>>>  void
>>>>>  zvol_fini(void)
>>>>>  {
>>>>> +	mutex_destroy(&zfsdev_state_lock);
>>>>>  	ddi_soft_state_fini(&zfsdev_state);
>>>>>  	ZFS_LOG(1, "ZVOL Deinitialized.");
>>>>>  }
>>>>> @@ -1889,7 +1866,7 @@ zvol_dump_init(zvol_state_t *zv, boolean_t resize)
>>>>>  	uint64_t version = spa_version(spa);
>>>>>  	enum zio_checksum checksum;
>>>>> 
>>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>  	ASSERT(vd->vdev_ops == &vdev_root_ops);
>>>>>
>>>>>  	error = dmu_free_long_range(zv->zv_objset, ZVOL_OBJ, 0,
>>>>> @@ -2437,7 +2414,7 @@ zvol_rename_minor(struct g_geom *gp, const char 
>>>>> *newname)
>>>>>  	struct g_provider *pp;
>>>>>  	zvol_state_t *zv;
>>>>> 
>>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>  	g_topology_assert();
>>>>>
>>>>>  	pp = LIST_FIRST(&gp->provider);
>>>>> @@ -2471,7 +2448,7 @@ zvol_rename_minors(const char *oldname, const char 
>>>>> *newname)
>>>>>  	newnamelen = strlen(newname);
>>>>>
>>>>>  	DROP_GIANT();
>>>>> -	mutex_enter(&spa_namespace_lock);
>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>  	g_topology_lock();
>>>>>
>>>>>  	LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) {
>>>>> @@ -2494,6 +2471,6 @@ zvol_rename_minors(const char *oldname, const char 
>>>>> *newname)
>>>>>  	}
>>>>>
>>>>>  	g_topology_unlock();
>>>>> -	mutex_exit(&spa_namespace_lock);
>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>  	PICKUP_GIANT();
>>>>>  }
>>>>> --
>>>>> 1.8.4.2
>>>>> 
>>>> 
>>>>> From 57d5a383b585c32c77af54e8cdacaddf8ce7584f Mon Sep 17 00:00:00 2001
>>>>> From: Richard Kojedzinszky <krichy@cflinux.hu>
>>>>> Date: Wed, 18 Dec 2013 22:11:21 +0100
>>>>> Subject: [PATCH 2/2] ZFS snapshot handling fix
>>>>> 
>>>>> ---
>>>>>  .../compat/opensolaris/kern/opensolaris_lookup.c   | 13 +++---
>>>>>  .../opensolaris/uts/common/fs/zfs/zfs_ctldir.c     | 53 
>>>>> +++++++++++++++-------
>>>>>  2 files changed, 42 insertions(+), 24 deletions(-)
>>>>> 
>>>>> diff --git a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c 
>>>>> b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>>> index 94383d6..4cac053 100644
>>>>> --- a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>>> +++ b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>>> @@ -81,6 +81,8 @@ traverse(vnode_t **cvpp, int lktype)
>>>>>  	 * progress on this vnode.
>>>>>  	 */
>>>>> 
>>>>> +	vn_lock(cvp, lktype);
>>>>> +
>>>>>  	for (;;) {
>>>>>  		/*
>>>>>  		 * Reached the end of the mount chain?
>>>>> @@ -89,13 +91,7 @@ traverse(vnode_t **cvpp, int lktype)
>>>>>  		if (vfsp == NULL)
>>>>>  			break;
>>>>>  		error = vfs_busy(vfsp, 0);
>>>>> -		/*
>>>>> -		 * tvp is NULL for *cvpp vnode, which we can't unlock.
>>>>> -		 */
>>>>> -		if (tvp != NULL)
>>>>> -			vput(cvp);
>>>>> -		else
>>>>> -			vrele(cvp);
>>>>> +		VOP_UNLOCK(cvp, 0);
>>>>>  		if (error)
>>>>>  			return (error);
>>>>> 
>>>>> @@ -107,6 +103,9 @@ traverse(vnode_t **cvpp, int lktype)
>>>>>  		vfs_unbusy(vfsp);
>>>>>  		if (error != 0)
>>>>>  			return (error);
>>>>> +
>>>>> +		VN_RELE(cvp);
>>>>> +
>>>>>  		cvp = tvp;
>>>>>  	}
>>>>> 
>>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c 
>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>>> index 28ab1fa..d3464b4 100644
>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>>> @@ -112,6 +112,25 @@ snapentry_compare(const void *a, const void *b)
>>>>>  		return (0);
>>>>>  }
>>>>> 
>>>>> +/* Return the zfsctl_snapdir_t object from current vnode, following
>>>>> + * the lock orders in zfsctl_snapdir_lookup() to avoid deadlocks.
>>>>> + * On return the passed in vp is unlocked */
>>>>> +static zfsctl_snapdir_t *
>>>>> +zfsctl_snapshot_get_snapdir(vnode_t *vp, vnode_t **dvpp)
>>>>> +{
>>>>> +	gfs_dir_t *dp = vp->v_data;
>>>>> +	*dvpp = dp->gfsd_file.gfs_parent;
>>>>> +	zfsctl_snapdir_t *sdp;
>>>>> +
>>>>> +	VN_HOLD(*dvpp);
>>>>> +	VOP_UNLOCK(vp, 0);
>>>>> +	vn_lock(*dvpp, LK_SHARED | LK_RETRY | LK_CANRECURSE);
>>>>> +	sdp = (*dvpp)->v_data;
>>>>> +	VOP_UNLOCK(*dvpp, 0);
>>>>> +
>>>>> +	return (sdp);
>>>>> +}
>>>>> +
>>>>>  #ifdef sun
>>>>>  vnodeops_t *zfsctl_ops_root;
>>>>>  vnodeops_t *zfsctl_ops_snapdir;
>>>>> @@ -1013,6 +1032,8 @@ zfsctl_snapdir_lookup(ap)
>>>>>  			 * The snapshot was unmounted behind our backs,
>>>>>  			 * try to remount it.
>>>>>  			 */
>>>>> +			VOP_UNLOCK(*vpp, 0);
>>>>> +			VN_HOLD(*vpp);
>>>>>  			VERIFY(zfsctl_snapshot_zname(dvp, nm, MAXNAMELEN, 
>>>>> snapname) == 0);
>>>>>  			goto domount;
>>>>>  		} else {
>>>>> @@ -1064,7 +1085,6 @@ zfsctl_snapdir_lookup(ap)
>>>>>  	sep->se_name = kmem_alloc(strlen(nm) + 1, KM_SLEEP);
>>>>>  	(void) strcpy(sep->se_name, nm);
>>>>>  	*vpp = sep->se_root = zfsctl_snapshot_mknode(dvp, 
>>>>> dmu_objset_id(snap));
>>>>> -	VN_HOLD(*vpp);
>>>>>  	avl_insert(&sdp->sd_snaps, sep, where);
>>>>>
>>>>>  	dmu_objset_rele(snap, FTAG);
>>>>> @@ -1075,6 +1095,7 @@ domount:
>>>>>  	(void) snprintf(mountpoint, mountpoint_len,
>>>>>  	    "%s/" ZFS_CTLDIR_NAME "/snapshot/%s",
>>>>>  	    dvp->v_vfsp->mnt_stat.f_mntonname, nm);
>>>>> +	VN_HOLD(*vpp);
>>>>>  	err = mount_snapshot(curthread, vpp, "zfs", mountpoint, snapname, 0);
>>>>>  	kmem_free(mountpoint, mountpoint_len);
>>>>>  	if (err == 0) {
>>>>> @@ -1464,16 +1485,18 @@ zfsctl_snapshot_inactive(ap)
>>>>>  	int locked;
>>>>>  	vnode_t *dvp;
>>>>> 
>>>>> -	if (vp->v_count > 0)
>>>>> -		goto end;
>>>>> -
>>>>> -	VERIFY(gfs_dir_lookup(vp, "..", &dvp, cr, 0, NULL, NULL) == 0);
>>>>> -	sdp = dvp->v_data;
>>>>> -	VOP_UNLOCK(dvp, 0);
>>>>> +	sdp = zfsctl_snapshot_get_snapdir(vp, &dvp);
>>>>>
>>>>>  	if (!(locked = MUTEX_HELD(&sdp->sd_lock)))
>>>>>  		mutex_enter(&sdp->sd_lock);
>>>>> 
>>>>> +	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
>>>>> +
>>>>> +	if (vp->v_count > 0) {
>>>>> +		mutex_exit(&sdp->sd_lock);
>>>>> +		return (0);
>>>>> +	}
>>>>> +
>>>>>  	ASSERT(!vn_ismntpt(vp));
>>>>>
>>>>>  	sep = avl_first(&sdp->sd_snaps);
>>>>> @@ -1494,7 +1517,6 @@ zfsctl_snapshot_inactive(ap)
>>>>>  		mutex_exit(&sdp->sd_lock);
>>>>>  	VN_RELE(dvp);
>>>>> 
>>>>> -end:
>>>>>  	/*
>>>>>  	 * Dispose of the vnode for the snapshot mount point.
>>>>>  	 * This is safe to do because once this entry has been removed
>>>>> @@ -1595,20 +1617,17 @@ zfsctl_snapshot_lookup(ap)
>>>>>  static int
>>>>>  zfsctl_snapshot_vptocnp(struct vop_vptocnp_args *ap)
>>>>>  {
>>>>> -	zfsvfs_t *zfsvfs = ap->a_vp->v_vfsp->vfs_data;
>>>>> -	vnode_t *dvp, *vp;
>>>>> +	vnode_t *dvp, *vp = ap->a_vp;
>>>>>  	zfsctl_snapdir_t *sdp;
>>>>>  	zfs_snapentry_t *sep;
>>>>> -	int error;
>>>>> +	int error = 0;
>>>>> 
>>>>> -	ASSERT(zfsvfs->z_ctldir != NULL);
>>>>> -	error = zfsctl_root_lookup(zfsvfs->z_ctldir, "snapshot", &dvp,
>>>>> -	    NULL, 0, NULL, kcred, NULL, NULL, NULL);
>>>>> -	if (error != 0)
>>>>> -		return (error);
>>>>> -	sdp = dvp->v_data;
>>>>> +	sdp = zfsctl_snapshot_get_snapdir(vp, &dvp);
>>>>>
>>>>>  	mutex_enter(&sdp->sd_lock);
>>>>> +
>>>>> +	vn_lock(vp, LK_SHARED | LK_RETRY);
>>>>> +
>>>>>  	sep = avl_first(&sdp->sd_snaps);
>>>>>  	while (sep != NULL) {
>>>>>  		vp = sep->se_root;
>>>>> --
>>>>> 1.8.4.2
>>>>> 
>>>> 
>>>> -- 
>>>> Pawel Jakub Dawidek                       http://www.wheelsystems.com
>>>> FreeBSD committer                         http://www.FreeBSD.org
>>>> Am I Evil? Yes, I Am!                     http://mobter.com
>>>> 
>>> _______________________________________________
>>> freebsd-fs@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>> 
>