From owner-freebsd-fs@FreeBSD.ORG Tue Dec 24 20:29:01 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id CA68B3C7; Tue, 24 Dec 2013 20:29:01 +0000 (UTC) Received: from krichy.tvnetwork.hu (unknown [IPv6:2a01:be00:0:2::10]) by mx1.freebsd.org (Postfix) with ESMTP id 39C291DEA; Tue, 24 Dec 2013 20:29:00 +0000 (UTC) Received: by krichy.tvnetwork.hu (Postfix, from userid 1000) id B01F2A894; Tue, 24 Dec 2013 21:28:39 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by krichy.tvnetwork.hu (Postfix) with ESMTP id AB25AA893; Tue, 24 Dec 2013 21:28:39 +0100 (CET) Date: Tue, 24 Dec 2013 21:28:39 +0100 (CET) From: krichy@tvnetwork.hu To: Pawel Jakub Dawidek Subject: Re: kern/184677 / ZFS snapshot handling deadlocks In-Reply-To: Message-ID: References: <20131220134405.GE1658@garage.freebsd.pl> User-Agent: Alpine 2.10 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Content-Filtered-By: Mailman/MimeDel 2.1.17 Cc: freebsd-fs@freebsd.org, avg@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 24 Dec 2013 20:29:01 -0000 Dear all, Unfortunately, this seems to be serious, as a plain user can deadlock zfs using the attached script, and can make a DoS against the mounted dataset's .zfs/snapshot directory. Regards, Kojedzinszky Richard Euronet Magyarorszag Informatikai Zrt. On Tue, 24 Dec 2013, krichy@tvnetwork.hu wrote: > Date: Tue, 24 Dec 2013 07:10:05 +0100 (CET) > From: krichy@tvnetwork.hu > To: Pawel Jakub Dawidek > Cc: freebsd-fs@freebsd.org, avg@freebsd.org > Subject: Re: kern/184677 / ZFS snapshot handling deadlocks > > Dear devs, > > A third one, again the snapshot dirs: > > # cd /tmp/test/.zfs/snapshot && yes .. | xargs stat > /dev/null & > # yes /tmp/test/.zfs/snapshot/snap | xargs umount > /dev/null > > Will deadlock in a few seconds. > > The first holds a lock on .zfs/snapshot vnode, when looking up .. it tries to > lock .zfs, while the other just tries to do this in reversed order. > > In a vop_lookup, why is the passed in vnode, and the returned vnode must be > locked? Why is not enough to have it held? > > Thanks in advance, > Kojedzinszky Richard > Euronet Magyarorszag Informatikai Zrt. > > On Mon, 23 Dec 2013, krichy@tvnetwork.hu wrote: > >> Date: Mon, 23 Dec 2013 17:55:08 +0100 (CET) >> From: krichy@tvnetwork.hu >> To: Pawel Jakub Dawidek >> Cc: freebsd-fs@freebsd.org, avg@freebsd.org >> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks >> >> Dear Pawel, >> >> Thanks for your response. I hope someone will review my work. >> >> Secondly, I think I;ve found another deadlock scenario: >> >> When 'zfs send -R' finishes, in zfsdev_close() it has spa_namespace_lock >> held. Then it does an unmount of all snapshots, which in turn goes to >> zfsctl_snapshot_inactive(), in which sdp->sd_lock is being locked. >> >> Concurrently, a zfsctl_snapdir_lookup(), holding the same sdp->sd_lock does >> a snapshot mount, which somewhere tries to acquire spa_namespace_lock. So >> it deadlocks. Currently I dont know how to workaround this. >> >> Is spa_namespace_lock need to be held in zfsctl_snapshot_inactive()? >> >> What if we release it function enter, and re-acquire upon exit? 
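As a sketch of that release-on-entry / re-acquire-on-exit idea (my assumption of what it could look like, not code from the attached patches), the same save/drop/restore dance that the vdev_geom_open() hunk in PATCH 1/2 below already does around spa_namespace_lock could be applied here:

/*
 * Sketch only, not one of the attached patches: the "release on function
 * entry, re-acquire on exit" idea for zfsctl_snapshot_inactive(), using the
 * same save/drop/restore pattern as the vdev_geom_open() hunk in PATCH 1/2
 * below.  Whether dropping spa_namespace_lock across this path is safe is
 * exactly the open question here.
 */
static void
snapshot_inactive_idea(zfsctl_snapdir_t *sdp)
{
	int had_ns_lock = 0;

	if (mutex_owned(&spa_namespace_lock)) {
		/* e.g. zfsdev_close() after 'zfs send -R' calls in with it held */
		mutex_exit(&spa_namespace_lock);
		had_ns_lock = 1;
	}

	mutex_enter(&sdp->sd_lock);
	/* ... do the unmount bookkeeping that only needs sd_lock ... */
	mutex_exit(&sdp->sd_lock);

	if (had_ns_lock)
		mutex_enter(&spa_namespace_lock);	/* restore the caller's state */
}

The open question, of course, is whether anything in the inactive path still relies on spa_namespace_lock being held across it.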
>> >> Thanks in advance, >> >> >> Kojedzinszky Richard >> Euronet Magyarorszag Informatikai Zrt. >> >> On Fri, 20 Dec 2013, Pawel Jakub Dawidek wrote: >> >>> Date: Fri, 20 Dec 2013 14:44:05 +0100 >>> From: Pawel Jakub Dawidek >>> To: krichy@tvnetwork.hu >>> Cc: freebsd-fs@freebsd.org >>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks >>> >>> On Thu, Dec 19, 2013 at 04:46:11PM +0100, krichy@tvnetwork.hu wrote: >>>> Dear devs, >>>> >>>> I am attaching a more clear patch/fix for my snapshot handling issues >>>> (0002), but I would be happy if some ZFS expert would comment it. I am >>>> trying to solve it at least for two weeks now, and an ACK or a NACK would >>>> be nice from someone. Also a commit is reverted since it also caused >>>> deadlocks. I've read its comments, which also eliminates deadlocks, but I >>>> did not find any reference how to produce that deadlock. In my view >>>> reverting that makes my issues disappear, but I dont know what new cases >>>> will it rise. >>> >>> Richard, >>> >>> I won't be able to analyse it myself anytime soon, because of other >>> obligations, but I forwarded you e-mail to the zfs-devel@ mailing list >>> (it is closed, but gathers FreeBSD ZFS devs). Hopefully someone from >>> there will be able to help you. >>> >>>> I've rewritten traverse() to make more like upstream, added two extra >>>> VN_HOLD()s to snapdir_lookup() when traverse returned the same vnode what >>>> was passed to it (I dont know even that upon creating a snapshot vnode >>>> why >>>> is that extra two holds needed for the vnode.) And unfortunately, due to >>>> FreeBSD calls vop_inactive callbacks with vnodes locked, that could also >>>> cause deadlocks, so zfsctl_snapshot_inactive() and >>>> zfsctl_snapshot_vptocnp() has been rewritten to work that around. >>>> >>>> After this, one may also get a deadlock, when a simple access would call >>>> into zfsctl_snapshot_lookup(). The documentation says, that those vnodes >>>> should always be covered, but after some stress test, sometimes we hit >>>> that call, and that can cause again deadlocks. In our environment I've >>>> just uncommented that callback, which returns ENODIR on some calls, but >>>> at >>>> least not a deadlock. >>>> >>>> The attached script can be used to reproduce my cases (would one confirm >>>> that?), and after the patches applied, they disappear (confirm?). >>>> >>>> Thanks, >>>> >>>> >>>> Kojedzinszky Richard >>>> Euronet Magyarorszag Informatikai Zrt. >>>> >>>> On Tue, 17 Dec 2013, krichy@tvnetwork.hu wrote: >>>> >>>>> Date: Tue, 17 Dec 2013 14:50:16 +0100 (CET) >>>>> From: krichy@tvnetwork.hu >>>>> To: pjd@freebsd.org >>>>> Cc: freebsd-fs@freebsd.org >>>>> Subject: Re: kern/184677 (fwd) >>>>> >>>>> Dear devs, >>>>> >>>>> I will sum up my experience regarding the issue: >>>>> >>>>> The sympton is that a concurrent 'zfs send -R' and some activity on the >>>>> snapshot dir (or in the snapshot) may cause a deadlock. >>>>> >>>>> After investigating the problem, I found that zfs send umounts the >>>>> snapshots, >>>>> and that causes the deadlock, so later I tested only with concurrent >>>>> umount >>>>> and the "activity". More later I found that listing the snapshots in >>>>> .zfs/snapshot/ and unounting them can cause the found deadlock, so I >>>>> used >>>>> them for the tests. But for my surprise, instead of a deadlock, a >>>>> recursive >>>>> lock panic has arised. >>>>> >>>>> The vnode for the ".zfs/snapshot/" directory contains zfs's >>>>> zfsctl_snapdir_t >>>>> structure (sdp). 
This contains a tree of mounted snapshots, and each
>>>>> entry (sep) contains the vnode that the snapshot is mounted on top of
>>>>> (se_root). The strange thing is that the se_root member does not hold a
>>>>> reference to the vnode, just a simple pointer to it.
>>>>>
>>>>> Upon entry lookup (zfsctl_snapdir_lookup()) the "snapshot" vnode is locked,
>>>>> the zfsctl_snapdir_t's tree is locked, and it is searched for an already
>>>>> existing mount. If it finds no entry, it does the mount. If an entry was
>>>>> found, the se_root member contains the vnode which the snapshot is mounted
>>>>> on. Thus, a reference is taken for it, and the traverse() call will resolve
>>>>> to the real root vnode of the mounted snapshot, returning it locked.
>>>>> (Examining the traverse() code I found that it did not follow FreeBSD's
>>>>> lock order recommendation described in sys/kern/vfs_subr.c.)
>>>>>
>>>>> Going the other way, when an umount is issued, the se_root vnode loses its
>>>>> last reference (as only the mountpoint holds one for it) and goes through
>>>>> the vinactive() path to zfsctl_snapshot_inactive(). In FreeBSD this is
>>>>> called with a locked vnode, so this is a deadlock race condition: while
>>>>> zfsctl_snapdir_lookup() holds the mutex for the sdp tree and traverse()
>>>>> tries to acquire the se_root lock, zfsctl_snapshot_inactive() holds the
>>>>> lock on se_root while it tries to acquire the sdp lock.
>>>>>
>>>>> zfsctl_snapshot_inactive() has an if statement checking v_usecount, which
>>>>> is incremented in zfsctl_snapdir_lookup(), but in that context it is not
>>>>> covered by VI_LOCK. And it seems to me that FreeBSD's vinactive() path
>>>>> assumes that the vnode remains inactive (as opposed to illumos, at least
>>>>> as I read the code). So zfsctl_snapshot_inactive() must free resources
>>>>> while in a locked state. I was a bit confused, and that is probably why
>>>>> the previously posted patch is the way it is.
>>>>>
>>>>> Maybe if I had some clues about the direction to take with this problem,
>>>>> I could have worked toward a nicer, shorter solution.
>>>>>
>>>>> Please someone comment on my post.
>>>>>
>>>>> Regards,
>>>>>
>>>>>
>>>>> Kojedzinszky Richard
>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>
>>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote:
>>>>>
>>>>>> Date: Mon, 16 Dec 2013 16:52:16 +0100 (CET)
>>>>>> From: krichy@tvnetwork.hu
>>>>>> To: pjd@freebsd.org
>>>>>> Cc: freebsd-fs@freebsd.org
>>>>>> Subject: Re: kern/184677 (fwd)
>>>>>>
>>>>>> Dear PJD,
>>>>>>
>>>>>> I am a happy FreeBSD user, and I am sure you've read my previous posts
>>>>>> regarding some issues in ZFS. Please give me some advice on where to look
>>>>>> for solutions, or on how I could help to resolve those issues.
>>>>>>
>>>>>> Regards,
>>>>>> Kojedzinszky Richard
>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>
>>>>>> ---------- Forwarded message ----------
>>>>>> Date: Mon, 16 Dec 2013 15:23:06 +0100 (CET)
>>>>>> From: krichy@tvnetwork.hu
>>>>>> To: freebsd-fs@freebsd.org
>>>>>> Subject: Re: kern/184677
>>>>>>
>>>>>>
>>>>>> It seems that pjd made a change which eliminated the zfsdev_state_lock on
>>>>>> Fri Aug 12 07:04:16 2011 +0000, which might have introduced a new deadlock
>>>>>> situation. Any comments on this?
>>>>>>
>>>>>>
>>>>>> Kojedzinszky Richard
>>>>>> Euronet Magyarorszag Informatikai Zrt.
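For context, the heart of what that 2011 change removed, and what the revert in PATCH 1/2 below puts back, is a dedicated leaf mutex for the zfsdev soft state, so that zfsdev_close() never has to sit on spa_namespace_lock while the snapshot mount/unmount paths want it. A condensed sketch, not the full patch; the helpers are the ones visible in the diff below:

/*
 * Condensed sketch of the idea the revert below restores: zfsdev_close()
 * takes a dedicated leaf lock instead of spa_namespace_lock, so the
 * snapshot mount/unmount paths that do need spa_namespace_lock cannot
 * deadlock against a closing /dev/zfs descriptor.
 */
kmutex_t zfsdev_state_lock;			/* protects zfsdev_state only */

static void
zfsdev_close_sketch(minor_t minor)
{
	zfs_onexit_t *zo;

	mutex_enter(&zfsdev_state_lock);	/* NOT spa_namespace_lock */
	zo = zfsdev_get_soft_state(minor, ZSST_CTLDEV);
	if (zo != NULL)
		zfs_ctldev_destroy(zo, minor);	/* on-exit cleanup; per this
						   thread it can unmount snapshots */
	mutex_exit(&zfsdev_state_lock);
}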
>>>>>>
>>>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote:
>>>>>>
>>>>>>> Date: Mon, 16 Dec 2013 11:08:11 +0100 (CET)
>>>>>>> From: krichy@tvnetwork.hu
>>>>>>> To: freebsd-fs@freebsd.org
>>>>>>> Subject: kern/184677
>>>>>>>
>>>>>>> Dear devs,
>>>>>>>
>>>>>>> I've attached a patch which makes the recursive lockmgr panic disappear
>>>>>>> and makes the reported bug disappear. I don't know whether I followed any
>>>>>>> guidelines well or not, but at least it works for me. Could some
>>>>>>> ZFS/FreeBSD fs expert please review it, and fix it where needed?
>>>>>>>
>>>>>>> But unfortunately, my original problem is still not solved; it is maybe
>>>>>>> the same as Ryan's:
>>>>>>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-December/018707.html
>>>>>>>
>>>>>>> Tracing the problem down shows that zfsctl_snapdir_lookup() tries to
>>>>>>> acquire spa_namespace_lock while a finishing 'zfs send -R' does a
>>>>>>> zfsdev_close(), which also holds the same mutex. And this causes a
>>>>>>> deadlock scenario. I looked at illumos's code, and for some reason they
>>>>>>> use another mutex in zfsdev_close(), which therefore may not deadlock
>>>>>>> with zfsctl_snapdir_lookup(). But I am still investigating the problem.
>>>>>>>
>>>>>>> I would like to help make ZFS on FreeBSD more stable across its whole
>>>>>>> functionality. I would be very thankful if some expert would give some
>>>>>>> advice on how to solve these bugs. PJD, Steven, Xin?
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>>
>>>>>>>
>>>>>>> Kojedzinszky Richard
>>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>> _______________________________________________
>>>>>> freebsd-fs@freebsd.org mailing list
>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>>>>>
>>>>>
>>>
>>>
>>>> From 39298da838d006ad225e41529d7b7f240fccfe73 Mon Sep 17 00:00:00 2001
>>>> From: Richard Kojedzinszky
>>>> Date: Mon, 16 Dec 2013 15:39:11 +0100
>>>> Subject: [PATCH 1/2] Revert "Eliminate the zfsdev_state_lock entirely and replace it with the"
>>>>
>>>> This reverts commit 1d8972b3f353f986eb5b85bc108b1c0d946d3218.
>>>> >>>> Conflicts: >>>> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>> --- >>>> .../opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h | 1 + >>>> .../opensolaris/uts/common/fs/zfs/vdev_geom.c | 14 ++- >>>> .../opensolaris/uts/common/fs/zfs/zfs_ioctl.c | 16 +-- >>>> .../contrib/opensolaris/uts/common/fs/zfs/zvol.c | 119 >>>> +++++++++------------ >>>> 4 files changed, 70 insertions(+), 80 deletions(-) >>>> >>>> diff --git >>>> a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>>> index af2def2..8272c4d 100644 >>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>>> @@ -383,6 +383,7 @@ extern void *zfsdev_get_soft_state(minor_t minor, >>>> extern minor_t zfsdev_minor_alloc(void); >>>> >>>> extern void *zfsdev_state; >>>> +extern kmutex_t zfsdev_state_lock; >>>> >>>> #endif /* _KERNEL */ >>>> >>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>>> index 15685a5..5c3e9f3 100644 >>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>>> @@ -581,7 +581,7 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t >>>> *max_psize, >>>> struct g_provider *pp; >>>> struct g_consumer *cp; >>>> size_t bufsize; >>>> - int error; >>>> + int error, lock; >>>> >>>> /* >>>> * We must have a pathname, and it must be absolute. >>>> @@ -593,6 +593,12 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t >>>> *max_psize, >>>> >>>> vd->vdev_tsd = NULL; >>>> >>>> + if (mutex_owned(&spa_namespace_lock)) { >>>> + mutex_exit(&spa_namespace_lock); >>>> + lock = 1; >>>> + } else { >>>> + lock = 0; >>>> + } >>>> DROP_GIANT(); >>>> g_topology_lock(); >>>> error = 0; >>>> @@ -624,7 +630,11 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t >>>> *max_psize, >>>> !ISP2(cp->provider->sectorsize)) { >>>> ZFS_LOG(1, "Provider %s has unsupported sectorsize.", >>>> vd->vdev_path); >>>> + >>>> + g_topology_lock(); >>>> vdev_geom_detach(cp, 0); >>>> + g_topology_unlock(); >>>> + >>>> error = EINVAL; >>>> cp = NULL; >>>> } else if (cp->acw == 0 && (spa_mode(vd->vdev_spa) & FWRITE) != 0) { >>>> @@ -647,6 +657,8 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t >>>> *max_psize, >>>> } >>>> g_topology_unlock(); >>>> PICKUP_GIANT(); >>>> + if (lock) >>>> + mutex_enter(&spa_namespace_lock); >>>> if (cp == NULL) { >>>> vd->vdev_stat.vs_aux = VDEV_AUX_OPEN_FAILED; >>>> return (error); >>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>>> index e9fba26..91becde 100644 >>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>>> @@ -5635,7 +5635,7 @@ zfsdev_minor_alloc(void) >>>> static minor_t last_minor; >>>> minor_t m; >>>> >>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>> >>>> for (m = last_minor + 1; m != last_minor; m++) { >>>> if (m > ZFSDEV_MAX_MINOR) >>>> @@ -5655,7 +5655,7 @@ zfs_ctldev_init(struct cdev *devp) >>>> minor_t minor; >>>> zfs_soft_state_t *zs; >>>> >>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>> >>>> minor = zfsdev_minor_alloc(); >>>> if (minor == 0) >>>> @@ -5676,7 
+5676,7 @@ zfs_ctldev_init(struct cdev *devp) >>>> static void >>>> zfs_ctldev_destroy(zfs_onexit_t *zo, minor_t minor) >>>> { >>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>> >>>> zfs_onexit_destroy(zo); >>>> ddi_soft_state_free(zfsdev_state, minor); >>>> @@ -5706,9 +5706,9 @@ zfsdev_open(struct cdev *devp, int flag, int mode, >>>> struct thread *td) >>>> >>>> /* This is the control device. Allocate a new minor if requested. */ >>>> if (flag & FEXCL) { >>>> - mutex_enter(&spa_namespace_lock); >>>> + mutex_enter(&zfsdev_state_lock); >>>> error = zfs_ctldev_init(devp); >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> } >>>> >>>> return (error); >>>> @@ -5723,14 +5723,14 @@ zfsdev_close(void *data) >>>> if (minor == 0) >>>> return; >>>> >>>> - mutex_enter(&spa_namespace_lock); >>>> + mutex_enter(&zfsdev_state_lock); >>>> zo = zfsdev_get_soft_state(minor, ZSST_CTLDEV); >>>> if (zo == NULL) { >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return; >>>> } >>>> zfs_ctldev_destroy(zo, minor); >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> } >>>> >>>> static int >>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>> index 72d4502..aec5219 100644 >>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>> @@ -104,11 +104,12 @@ static char *zvol_tag = "zvol_tag"; >>>> #define ZVOL_DUMPSIZE "dumpsize" >>>> >>>> /* >>>> - * The spa_namespace_lock protects the zfsdev_state structure from being >>>> - * modified while it's being used, e.g. an open that comes in before a >>>> - * create finishes. It also protects temporary opens of the dataset so >>>> that, >>>> + * This lock protects the zfsdev_state structure from being modified >>>> + * while it's being used, e.g. an open that comes in before a create >>>> + * finishes. It also protects temporary opens of the dataset so that, >>>> * e.g., an open doesn't get a spurious EBUSY. >>>> */ >>>> +kmutex_t zfsdev_state_lock; >>>> static uint32_t zvol_minors; >>>> >>>> typedef struct zvol_extent { >>>> @@ -249,7 +250,7 @@ zvol_minor_lookup(const char *name) >>>> struct g_geom *gp; >>>> zvol_state_t *zv = NULL; >>>> >>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>> >>>> g_topology_lock(); >>>> LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) { >>>> @@ -465,11 +466,11 @@ zvol_name2minor(const char *name, minor_t *minor) >>>> { >>>> zvol_state_t *zv; >>>> >>>> - mutex_enter(&spa_namespace_lock); >>>> + mutex_enter(&zfsdev_state_lock); >>>> zv = zvol_minor_lookup(name); >>>> if (minor && zv) >>>> *minor = zv->zv_minor; >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (zv ? 
0 : -1); >>>> } >>>> #endif /* sun */ >>>> @@ -489,10 +490,10 @@ zvol_create_minor(const char *name) >>>> >>>> ZFS_LOG(1, "Creating ZVOL %s...", name); >>>> >>>> - mutex_enter(&spa_namespace_lock); >>>> + mutex_enter(&zfsdev_state_lock); >>>> >>>> if (zvol_minor_lookup(name) != NULL) { >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (SET_ERROR(EEXIST)); >>>> } >>>> >>>> @@ -500,20 +501,20 @@ zvol_create_minor(const char *name) >>>> error = dmu_objset_own(name, DMU_OST_ZVOL, B_TRUE, FTAG, &os); >>>> >>>> if (error) { >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (error); >>>> } >>>> >>>> #ifdef sun >>>> if ((minor = zfsdev_minor_alloc()) == 0) { >>>> dmu_objset_disown(os, FTAG); >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (SET_ERROR(ENXIO)); >>>> } >>>> >>>> if (ddi_soft_state_zalloc(zfsdev_state, minor) != DDI_SUCCESS) { >>>> dmu_objset_disown(os, FTAG); >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (SET_ERROR(EAGAIN)); >>>> } >>>> (void) ddi_prop_update_string(minor, zfs_dip, ZVOL_PROP_NAME, >>>> @@ -525,7 +526,7 @@ zvol_create_minor(const char *name) >>>> minor, DDI_PSEUDO, 0) == DDI_FAILURE) { >>>> ddi_soft_state_free(zfsdev_state, minor); >>>> dmu_objset_disown(os, FTAG); >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (SET_ERROR(EAGAIN)); >>>> } >>>> >>>> @@ -536,7 +537,7 @@ zvol_create_minor(const char *name) >>>> ddi_remove_minor_node(zfs_dip, chrbuf); >>>> ddi_soft_state_free(zfsdev_state, minor); >>>> dmu_objset_disown(os, FTAG); >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (SET_ERROR(EAGAIN)); >>>> } >>>> >>>> @@ -587,7 +588,7 @@ zvol_create_minor(const char *name) >>>> >>>> zvol_minors++; >>>> >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> >>>> zvol_geom_run(zv); >>>> >>>> @@ -609,7 +610,7 @@ zvol_remove_zv(zvol_state_t *zv) >>>> minor_t minor = zv->zv_minor; >>>> #endif >>>> >>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>> if (zv->zv_total_opens != 0) >>>> return (SET_ERROR(EBUSY)); >>>> >>>> @@ -635,15 +636,15 @@ zvol_remove_minor(const char *name) >>>> zvol_state_t *zv; >>>> int rc; >>>> >>>> - mutex_enter(&spa_namespace_lock); >>>> + mutex_enter(&zfsdev_state_lock); >>>> if ((zv = zvol_minor_lookup(name)) == NULL) { >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (SET_ERROR(ENXIO)); >>>> } >>>> g_topology_lock(); >>>> rc = zvol_remove_zv(zv); >>>> g_topology_unlock(); >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (rc); >>>> } >>>> >>>> @@ -755,7 +756,7 @@ zvol_update_volsize(objset_t *os, uint64_t volsize) >>>> dmu_tx_t *tx; >>>> int error; >>>> >>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>> >>>> tx = dmu_tx_create(os); >>>> dmu_tx_hold_zap(tx, ZVOL_ZAP_OBJ, TRUE, NULL); >>>> @@ -786,7 +787,7 @@ zvol_remove_minors(const char *name) >>>> namelen = strlen(name); >>>> >>>> DROP_GIANT(); >>>> - mutex_enter(&spa_namespace_lock); >>>> + mutex_enter(&zfsdev_state_lock); >>>> g_topology_lock(); >>>> >>>> LIST_FOREACH_SAFE(gp, &zfs_zvol_class.geom, geom, gptmp) { >>>> @@ -804,7 +805,7 @@ zvol_remove_minors(const char *name) >>>> } >>>> >>>> g_topology_unlock(); >>>> - 
mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> PICKUP_GIANT(); >>>> } >>>> >>>> @@ -818,10 +819,10 @@ zvol_set_volsize(const char *name, major_t maj, >>>> uint64_t volsize) >>>> uint64_t old_volsize = 0ULL; >>>> uint64_t readonly; >>>> >>>> - mutex_enter(&spa_namespace_lock); >>>> + mutex_enter(&zfsdev_state_lock); >>>> zv = zvol_minor_lookup(name); >>>> if ((error = dmu_objset_hold(name, FTAG, &os)) != 0) { >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (error); >>>> } >>>> >>>> @@ -888,7 +889,7 @@ zvol_set_volsize(const char *name, major_t maj, >>>> uint64_t volsize) >>>> out: >>>> dmu_objset_rele(os, FTAG); >>>> >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> >>>> return (error); >>>> } >>>> @@ -899,36 +900,19 @@ zvol_open(struct g_provider *pp, int flag, int >>>> count) >>>> { >>>> zvol_state_t *zv; >>>> int err = 0; >>>> - boolean_t locked = B_FALSE; >>>> >>>> - /* >>>> - * Protect against recursively entering spa_namespace_lock >>>> - * when spa_open() is used for a pool on a (local) ZVOL(s). >>>> - * This is needed since we replaced upstream zfsdev_state_lock >>>> - * with spa_namespace_lock in the ZVOL code. >>>> - * We are using the same trick as spa_open(). >>>> - * Note that calls in zvol_first_open which need to resolve >>>> - * pool name to a spa object will enter spa_open() >>>> - * recursively, but that function already has all the >>>> - * necessary protection. >>>> - */ >>>> - if (!MUTEX_HELD(&spa_namespace_lock)) { >>>> - mutex_enter(&spa_namespace_lock); >>>> - locked = B_TRUE; >>>> - } >>>> + mutex_enter(&zfsdev_state_lock); >>>> >>>> zv = pp->private; >>>> if (zv == NULL) { >>>> - if (locked) >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (SET_ERROR(ENXIO)); >>>> } >>>> >>>> if (zv->zv_total_opens == 0) >>>> err = zvol_first_open(zv); >>>> if (err) { >>>> - if (locked) >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (err); >>>> } >>>> if ((flag & FWRITE) && (zv->zv_flags & ZVOL_RDONLY)) { >>>> @@ -950,15 +934,13 @@ zvol_open(struct g_provider *pp, int flag, int >>>> count) >>>> #endif >>>> >>>> zv->zv_total_opens += count; >>>> - if (locked) >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> >>>> return (err); >>>> out: >>>> if (zv->zv_total_opens == 0) >>>> zvol_last_close(zv); >>>> - if (locked) >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (err); >>>> } >>>> >>>> @@ -968,18 +950,12 @@ zvol_close(struct g_provider *pp, int flag, int >>>> count) >>>> { >>>> zvol_state_t *zv; >>>> int error = 0; >>>> - boolean_t locked = B_FALSE; >>>> >>>> - /* See comment in zvol_open(). 
*/ >>>> - if (!MUTEX_HELD(&spa_namespace_lock)) { >>>> - mutex_enter(&spa_namespace_lock); >>>> - locked = B_TRUE; >>>> - } >>>> + mutex_enter(&zfsdev_state_lock); >>>> >>>> zv = pp->private; >>>> if (zv == NULL) { >>>> - if (locked) >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (SET_ERROR(ENXIO)); >>>> } >>>> >>>> @@ -1002,8 +978,7 @@ zvol_close(struct g_provider *pp, int flag, int >>>> count) >>>> if (zv->zv_total_opens == 0) >>>> zvol_last_close(zv); >>>> >>>> - if (locked) >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (error); >>>> } >>>> >>>> @@ -1658,12 +1633,12 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>>> flag, cred_t *cr, int *rvalp) >>>> int error = 0; >>>> rl_t *rl; >>>> >>>> - mutex_enter(&spa_namespace_lock); >>>> + mutex_enter(&zfsdev_state_lock); >>>> >>>> zv = zfsdev_get_soft_state(getminor(dev), ZSST_ZVOL); >>>> >>>> if (zv == NULL) { >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (SET_ERROR(ENXIO)); >>>> } >>>> ASSERT(zv->zv_total_opens > 0); >>>> @@ -1677,7 +1652,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>>> flag, cred_t *cr, int *rvalp) >>>> dki.dki_ctype = DKC_UNKNOWN; >>>> dki.dki_unit = getminor(dev); >>>> dki.dki_maxtransfer = 1 << (SPA_MAXBLOCKSHIFT - >>>> zv->zv_min_bs); >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> if (ddi_copyout(&dki, (void *)arg, sizeof (dki), flag)) >>>> error = SET_ERROR(EFAULT); >>>> return (error); >>>> @@ -1687,7 +1662,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>>> flag, cred_t *cr, int *rvalp) >>>> dkm.dki_lbsize = 1U << zv->zv_min_bs; >>>> dkm.dki_capacity = zv->zv_volsize >> zv->zv_min_bs; >>>> dkm.dki_media_type = DK_UNKNOWN; >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> if (ddi_copyout(&dkm, (void *)arg, sizeof (dkm), flag)) >>>> error = SET_ERROR(EFAULT); >>>> return (error); >>>> @@ -1697,14 +1672,14 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>>> flag, cred_t *cr, int *rvalp) >>>> uint64_t vs = zv->zv_volsize; >>>> uint8_t bs = zv->zv_min_bs; >>>> >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> error = zvol_getefi((void *)arg, flag, vs, bs); >>>> return (error); >>>> } >>>> >>>> case DKIOCFLUSHWRITECACHE: >>>> dkc = (struct dk_callback *)arg; >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> zil_commit(zv->zv_zilog, ZVOL_OBJ); >>>> if ((flag & FKIOCTL) && dkc != NULL && dkc->dkc_callback) { >>>> (*dkc->dkc_callback)(dkc->dkc_cookie, error); >>>> @@ -1730,10 +1705,10 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>>> flag, cred_t *cr, int *rvalp) >>>> } >>>> if (wce) { >>>> zv->zv_flags |= ZVOL_WCE; >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> } else { >>>> zv->zv_flags &= ~ZVOL_WCE; >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> zil_commit(zv->zv_zilog, ZVOL_OBJ); >>>> } >>>> return (0); >>>> @@ -1828,7 +1803,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>>> flag, cred_t *cr, int *rvalp) >>>> break; >>>> >>>> } >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (error); >>>> } >>>> #endif /* sun */ >>>> @@ -1844,12 +1819,14 @@ zvol_init(void) >>>> { >>>> VERIFY(ddi_soft_state_init(&zfsdev_state, sizeof (zfs_soft_state_t), >>>> 1) == 0); >>>> + mutex_init(&zfsdev_state_lock, NULL, 
MUTEX_DEFAULT, NULL); >>>> ZFS_LOG(1, "ZVOL Initialized."); >>>> } >>>> >>>> void >>>> zvol_fini(void) >>>> { >>>> + mutex_destroy(&zfsdev_state_lock); >>>> ddi_soft_state_fini(&zfsdev_state); >>>> ZFS_LOG(1, "ZVOL Deinitialized."); >>>> } >>>> @@ -1889,7 +1866,7 @@ zvol_dump_init(zvol_state_t *zv, boolean_t resize) >>>> uint64_t version = spa_version(spa); >>>> enum zio_checksum checksum; >>>> >>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>> ASSERT(vd->vdev_ops == &vdev_root_ops); >>>> >>>> error = dmu_free_long_range(zv->zv_objset, ZVOL_OBJ, 0, >>>> @@ -2437,7 +2414,7 @@ zvol_rename_minor(struct g_geom *gp, const char >>>> *newname) >>>> struct g_provider *pp; >>>> zvol_state_t *zv; >>>> >>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>> g_topology_assert(); >>>> >>>> pp = LIST_FIRST(&gp->provider); >>>> @@ -2471,7 +2448,7 @@ zvol_rename_minors(const char *oldname, const char >>>> *newname) >>>> newnamelen = strlen(newname); >>>> >>>> DROP_GIANT(); >>>> - mutex_enter(&spa_namespace_lock); >>>> + mutex_enter(&zfsdev_state_lock); >>>> g_topology_lock(); >>>> >>>> LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) { >>>> @@ -2494,6 +2471,6 @@ zvol_rename_minors(const char *oldname, const char >>>> *newname) >>>> } >>>> >>>> g_topology_unlock(); >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> PICKUP_GIANT(); >>>> } >>>> -- >>>> 1.8.4.2 >>>> >>> >>>> From 57d5a383b585c32c77af54e8cdacaddf8ce7584f Mon Sep 17 00:00:00 2001 >>>> From: Richard Kojedzinszky >>>> Date: Wed, 18 Dec 2013 22:11:21 +0100 >>>> Subject: [PATCH 2/2] ZFS snapshot handling fix >>>> >>>> --- >>>> .../compat/opensolaris/kern/opensolaris_lookup.c | 13 +++--- >>>> .../opensolaris/uts/common/fs/zfs/zfs_ctldir.c | 53 >>>> +++++++++++++++------- >>>> 2 files changed, 42 insertions(+), 24 deletions(-) >>>> >>>> diff --git a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >>>> b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >>>> index 94383d6..4cac053 100644 >>>> --- a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >>>> +++ b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >>>> @@ -81,6 +81,8 @@ traverse(vnode_t **cvpp, int lktype) >>>> * progress on this vnode. >>>> */ >>>> >>>> + vn_lock(cvp, lktype); >>>> + >>>> for (;;) { >>>> /* >>>> * Reached the end of the mount chain? >>>> @@ -89,13 +91,7 @@ traverse(vnode_t **cvpp, int lktype) >>>> if (vfsp == NULL) >>>> break; >>>> error = vfs_busy(vfsp, 0); >>>> - /* >>>> - * tvp is NULL for *cvpp vnode, which we can't unlock. >>>> - */ >>>> - if (tvp != NULL) >>>> - vput(cvp); >>>> - else >>>> - vrele(cvp); >>>> + VOP_UNLOCK(cvp, 0); >>>> if (error) >>>> return (error); >>>> >>>> @@ -107,6 +103,9 @@ traverse(vnode_t **cvpp, int lktype) >>>> vfs_unbusy(vfsp); >>>> if (error != 0) >>>> return (error); >>>> + >>>> + VN_RELE(cvp); >>>> + >>>> cvp = tvp; >>>> } >>>> >>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >>>> index 28ab1fa..d3464b4 100644 >>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >>>> @@ -112,6 +112,25 @@ snapentry_compare(const void *a, const void *b) >>>> return (0); >>>> } >>>> >>>> +/* Return the zfsctl_snapdir_t object from current vnode, following >>>> + * the lock orders in zfsctl_snapdir_lookup() to avoid deadlocks. 
>>>> + * On return the passed in vp is unlocked */ >>>> +static zfsctl_snapdir_t * >>>> +zfsctl_snapshot_get_snapdir(vnode_t *vp, vnode_t **dvpp) >>>> +{ >>>> + gfs_dir_t *dp = vp->v_data; >>>> + *dvpp = dp->gfsd_file.gfs_parent; >>>> + zfsctl_snapdir_t *sdp; >>>> + >>>> + VN_HOLD(*dvpp); >>>> + VOP_UNLOCK(vp, 0); >>>> + vn_lock(*dvpp, LK_SHARED | LK_RETRY | LK_CANRECURSE); >>>> + sdp = (*dvpp)->v_data; >>>> + VOP_UNLOCK(*dvpp, 0); >>>> + >>>> + return (sdp); >>>> +} >>>> + >>>> #ifdef sun >>>> vnodeops_t *zfsctl_ops_root; >>>> vnodeops_t *zfsctl_ops_snapdir; >>>> @@ -1013,6 +1032,8 @@ zfsctl_snapdir_lookup(ap) >>>> * The snapshot was unmounted behind our backs, >>>> * try to remount it. >>>> */ >>>> + VOP_UNLOCK(*vpp, 0); >>>> + VN_HOLD(*vpp); >>>> VERIFY(zfsctl_snapshot_zname(dvp, nm, MAXNAMELEN, >>>> snapname) == 0); >>>> goto domount; >>>> } else { >>>> @@ -1064,7 +1085,6 @@ zfsctl_snapdir_lookup(ap) >>>> sep->se_name = kmem_alloc(strlen(nm) + 1, KM_SLEEP); >>>> (void) strcpy(sep->se_name, nm); >>>> *vpp = sep->se_root = zfsctl_snapshot_mknode(dvp, >>>> dmu_objset_id(snap)); >>>> - VN_HOLD(*vpp); >>>> avl_insert(&sdp->sd_snaps, sep, where); >>>> >>>> dmu_objset_rele(snap, FTAG); >>>> @@ -1075,6 +1095,7 @@ domount: >>>> (void) snprintf(mountpoint, mountpoint_len, >>>> "%s/" ZFS_CTLDIR_NAME "/snapshot/%s", >>>> dvp->v_vfsp->mnt_stat.f_mntonname, nm); >>>> + VN_HOLD(*vpp); >>>> err = mount_snapshot(curthread, vpp, "zfs", mountpoint, snapname, 0); >>>> kmem_free(mountpoint, mountpoint_len); >>>> if (err == 0) { >>>> @@ -1464,16 +1485,18 @@ zfsctl_snapshot_inactive(ap) >>>> int locked; >>>> vnode_t *dvp; >>>> >>>> - if (vp->v_count > 0) >>>> - goto end; >>>> - >>>> - VERIFY(gfs_dir_lookup(vp, "..", &dvp, cr, 0, NULL, NULL) == 0); >>>> - sdp = dvp->v_data; >>>> - VOP_UNLOCK(dvp, 0); >>>> + sdp = zfsctl_snapshot_get_snapdir(vp, &dvp); >>>> >>>> if (!(locked = MUTEX_HELD(&sdp->sd_lock))) >>>> mutex_enter(&sdp->sd_lock); >>>> >>>> + vn_lock(vp, LK_EXCLUSIVE | LK_RETRY); >>>> + >>>> + if (vp->v_count > 0) { >>>> + mutex_exit(&sdp->sd_lock); >>>> + return (0); >>>> + } >>>> + >>>> ASSERT(!vn_ismntpt(vp)); >>>> >>>> sep = avl_first(&sdp->sd_snaps); >>>> @@ -1494,7 +1517,6 @@ zfsctl_snapshot_inactive(ap) >>>> mutex_exit(&sdp->sd_lock); >>>> VN_RELE(dvp); >>>> >>>> -end: >>>> /* >>>> * Dispose of the vnode for the snapshot mount point. >>>> * This is safe to do because once this entry has been removed >>>> @@ -1595,20 +1617,17 @@ zfsctl_snapshot_lookup(ap) >>>> static int >>>> zfsctl_snapshot_vptocnp(struct vop_vptocnp_args *ap) >>>> { >>>> - zfsvfs_t *zfsvfs = ap->a_vp->v_vfsp->vfs_data; >>>> - vnode_t *dvp, *vp; >>>> + vnode_t *dvp, *vp = ap->a_vp; >>>> zfsctl_snapdir_t *sdp; >>>> zfs_snapentry_t *sep; >>>> - int error; >>>> + int error = 0; >>>> >>>> - ASSERT(zfsvfs->z_ctldir != NULL); >>>> - error = zfsctl_root_lookup(zfsvfs->z_ctldir, "snapshot", &dvp, >>>> - NULL, 0, NULL, kcred, NULL, NULL, NULL); >>>> - if (error != 0) >>>> - return (error); >>>> - sdp = dvp->v_data; >>>> + sdp = zfsctl_snapshot_get_snapdir(vp, &dvp); >>>> >>>> mutex_enter(&sdp->sd_lock); >>>> + >>>> + vn_lock(vp, LK_SHARED | LK_RETRY); >>>> + >>>> sep = avl_first(&sdp->sd_snaps); >>>> while (sep != NULL) { >>>> vp = sep->se_root; >>>> -- >>>> 1.8.4.2 >>>> >>> >>> -- >>> Pawel Jakub Dawidek http://www.wheelsystems.com >>> FreeBSD committer http://www.FreeBSD.org >>> Am I Evil? Yes, I Am! 
http://mobter.com >>> >> _______________________________________________ >> freebsd-fs@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >> > From owner-freebsd-fs@FreeBSD.ORG Tue Dec 24 20:35:37 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B76355FD; Tue, 24 Dec 2013 20:35:37 +0000 (UTC) Received: from krichy.tvnetwork.hu (unknown [IPv6:2a01:be00:0:2::10]) by mx1.freebsd.org (Postfix) with ESMTP id 2838D1E6A; Tue, 24 Dec 2013 20:35:37 +0000 (UTC) Received: by krichy.tvnetwork.hu (Postfix, from userid 1000) id 9B014A899; Tue, 24 Dec 2013 21:35:16 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by krichy.tvnetwork.hu (Postfix) with ESMTP id 97E3BA898; Tue, 24 Dec 2013 21:35:16 +0100 (CET) Date: Tue, 24 Dec 2013 21:35:16 +0100 (CET) From: krichy@tvnetwork.hu To: Pawel Jakub Dawidek Subject: Re: kern/184677 / ZFS snapshot handling deadlocks In-Reply-To: Message-ID: References: <20131220134405.GE1658@garage.freebsd.pl> User-Agent: Alpine 2.10 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org, avg@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 24 Dec 2013 20:35:37 -0000 Dear all, Now a user caused deadlock simply caused buildkernel to be blocked, so I must correct myself in that this not only does a DoS for .zfs/snapshot directories, but may for the whole system. Regards, P.S.: Merry Christmas! Kojedzinszky Richard Euronet Magyarorszag Informatikai Zrt. On Tue, 24 Dec 2013, krichy@tvnetwork.hu wrote: > Date: Tue, 24 Dec 2013 21:28:39 +0100 (CET) > From: krichy@tvnetwork.hu > To: Pawel Jakub Dawidek > Cc: freebsd-fs@freebsd.org, avg@freebsd.org > Subject: Re: kern/184677 / ZFS snapshot handling deadlocks > > Dear all, > > Unfortunately, this seems to be serious, as a plain user can deadlock zfs > using the attached script, and can make a DoS against the mounted dataset's > .zfs/snapshot directory. > > Regards, > > Kojedzinszky Richard > Euronet Magyarorszag Informatikai Zrt. > > On Tue, 24 Dec 2013, krichy@tvnetwork.hu wrote: > >> Date: Tue, 24 Dec 2013 07:10:05 +0100 (CET) >> From: krichy@tvnetwork.hu >> To: Pawel Jakub Dawidek >> Cc: freebsd-fs@freebsd.org, avg@freebsd.org >> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks >> >> Dear devs, >> >> A third one, again the snapshot dirs: >> >> # cd /tmp/test/.zfs/snapshot && yes .. | xargs stat > /dev/null & >> # yes /tmp/test/.zfs/snapshot/snap | xargs umount > /dev/null >> >> Will deadlock in a few seconds. >> >> The first holds a lock on .zfs/snapshot vnode, when looking up .. it tries >> to lock .zfs, while the other just tries to do this in reversed order. >> >> In a vop_lookup, why is the passed in vnode, and the returned vnode must be >> locked? Why is not enough to have it held? >> >> Thanks in advance, >> Kojedzinszky Richard >> Euronet Magyarorszag Informatikai Zrt. 
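One conventional answer to the vop_lookup question above (an assumption about the usual FreeBSD approach, not something taken from the attached patches) is that lookups keep a single parent-before-child lock order, so a '..' lookup drops the child's lock before taking the parent's, roughly:

/*
 * Sketch of the usual FreeBSD "dotdot" convention (an assumption, not code
 * from the attached patches): never lock child-then-parent; give up the
 * child's lock first, so both racing paths above end up taking .zfs before
 * .zfs/snapshot.
 */
static int
dotdot_lookup_sketch(vnode_t *dvp, vnode_t *pvp, vnode_t **vpp)
{
	int error;

	VN_HOLD(pvp);				/* keep ".." alive across the unlock */
	VOP_UNLOCK(dvp, 0);			/* drop ".zfs/snapshot" ... */
	error = vn_lock(pvp, LK_EXCLUSIVE | LK_RETRY);	/* ... then lock ".zfs" */
	vn_lock(dvp, LK_EXCLUSIVE | LK_RETRY);	/* caller expects dvp locked again */
	if (error != 0) {
		VN_RELE(pvp);
		return (error);
	}
	*vpp = pvp;				/* returned held and locked */
	return (0);
}

The price of the drop/re-lock is that the caller must allow for the directory having changed while it was unlocked; that is also why a hold alone is not enough: the hold only keeps the vnode from being recycled, while the lock is what keeps its identity and contents stable for the lookup.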
>> >> On Mon, 23 Dec 2013, krichy@tvnetwork.hu wrote: >> >>> Date: Mon, 23 Dec 2013 17:55:08 +0100 (CET) >>> From: krichy@tvnetwork.hu >>> To: Pawel Jakub Dawidek >>> Cc: freebsd-fs@freebsd.org, avg@freebsd.org >>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks >>> >>> Dear Pawel, >>> >>> Thanks for your response. I hope someone will review my work. >>> >>> Secondly, I think I;ve found another deadlock scenario: >>> >>> When 'zfs send -R' finishes, in zfsdev_close() it has spa_namespace_lock >>> held. Then it does an unmount of all snapshots, which in turn goes to >>> zfsctl_snapshot_inactive(), in which sdp->sd_lock is being locked. >>> >>> Concurrently, a zfsctl_snapdir_lookup(), holding the same sdp->sd_lock >>> does a snapshot mount, which somewhere tries to acquire >>> spa_namespace_lock. So it deadlocks. Currently I dont know how to >>> workaround this. >>> >>> Is spa_namespace_lock need to be held in zfsctl_snapshot_inactive()? >>> >>> What if we release it function enter, and re-acquire upon exit? >>> >>> Thanks in advance, >>> >>> >>> Kojedzinszky Richard >>> Euronet Magyarorszag Informatikai Zrt. >>> >>> On Fri, 20 Dec 2013, Pawel Jakub Dawidek wrote: >>> >>>> Date: Fri, 20 Dec 2013 14:44:05 +0100 >>>> From: Pawel Jakub Dawidek >>>> To: krichy@tvnetwork.hu >>>> Cc: freebsd-fs@freebsd.org >>>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks >>>> >>>> On Thu, Dec 19, 2013 at 04:46:11PM +0100, krichy@tvnetwork.hu wrote: >>>>> Dear devs, >>>>> >>>>> I am attaching a more clear patch/fix for my snapshot handling issues >>>>> (0002), but I would be happy if some ZFS expert would comment it. I am >>>>> trying to solve it at least for two weeks now, and an ACK or a NACK >>>>> would >>>>> be nice from someone. Also a commit is reverted since it also caused >>>>> deadlocks. I've read its comments, which also eliminates deadlocks, but >>>>> I >>>>> did not find any reference how to produce that deadlock. In my view >>>>> reverting that makes my issues disappear, but I dont know what new cases >>>>> will it rise. >>>> >>>> Richard, >>>> >>>> I won't be able to analyse it myself anytime soon, because of other >>>> obligations, but I forwarded you e-mail to the zfs-devel@ mailing list >>>> (it is closed, but gathers FreeBSD ZFS devs). Hopefully someone from >>>> there will be able to help you. >>>> >>>>> I've rewritten traverse() to make more like upstream, added two extra >>>>> VN_HOLD()s to snapdir_lookup() when traverse returned the same vnode >>>>> what >>>>> was passed to it (I dont know even that upon creating a snapshot vnode >>>>> why >>>>> is that extra two holds needed for the vnode.) And unfortunately, due to >>>>> FreeBSD calls vop_inactive callbacks with vnodes locked, that could also >>>>> cause deadlocks, so zfsctl_snapshot_inactive() and >>>>> zfsctl_snapshot_vptocnp() has been rewritten to work that around. >>>>> >>>>> After this, one may also get a deadlock, when a simple access would call >>>>> into zfsctl_snapshot_lookup(). The documentation says, that those vnodes >>>>> should always be covered, but after some stress test, sometimes we hit >>>>> that call, and that can cause again deadlocks. In our environment I've >>>>> just uncommented that callback, which returns ENODIR on some calls, but >>>>> at >>>>> least not a deadlock. >>>>> >>>>> The attached script can be used to reproduce my cases (would one confirm >>>>> that?), and after the patches applied, they disappear (confirm?). 
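For readers skimming the thread, the reworked traverse() mentioned above boils down to roughly the following shape (condensed from PATCH 2/2 quoted earlier in this thread; the vn_mountedvfs()/VFS_ROOT() details are my reading of the compat layer, so treat them as an assumption):

/*
 * Rough shape of the reworked traverse() (condensed from PATCH 2/2).  The
 * point is the ordering: busy the covering mount, drop the covered vnode's
 * lock, get the mounted root locked, and only then release the old vnode,
 * so no two vnode locks are ever held at once while crossing a mount point.
 */
static int
traverse_sketch(vnode_t **cvpp, int lktype)
{
	vnode_t *cvp = *cvpp, *tvp;
	vfs_t *vfsp;
	int error;

	vn_lock(cvp, lktype);
	for (;;) {
		vfsp = vn_mountedvfs(cvp);
		if (vfsp == NULL)
			break;				/* end of the mount chain */
		error = vfs_busy(vfsp, 0);
		VOP_UNLOCK(cvp, 0);			/* never hold cvp across VFS_ROOT() */
		if (error != 0)
			return (error);
		error = VFS_ROOT(vfsp, lktype, &tvp);	/* tvp comes back held and locked */
		vfs_unbusy(vfsp);
		if (error != 0)
			return (error);
		VN_RELE(cvp);				/* old vnode: still unlocked here */
		cvp = tvp;
	}
	*cvpp = cvp;					/* held and locked for the caller */
	return (0);
}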
>>>>> >>>>> Thanks, >>>>> >>>>> >>>>> Kojedzinszky Richard >>>>> Euronet Magyarorszag Informatikai Zrt. >>>>> >>>>> On Tue, 17 Dec 2013, krichy@tvnetwork.hu wrote: >>>>> >>>>>> Date: Tue, 17 Dec 2013 14:50:16 +0100 (CET) >>>>>> From: krichy@tvnetwork.hu >>>>>> To: pjd@freebsd.org >>>>>> Cc: freebsd-fs@freebsd.org >>>>>> Subject: Re: kern/184677 (fwd) >>>>>> >>>>>> Dear devs, >>>>>> >>>>>> I will sum up my experience regarding the issue: >>>>>> >>>>>> The sympton is that a concurrent 'zfs send -R' and some activity on the >>>>>> snapshot dir (or in the snapshot) may cause a deadlock. >>>>>> >>>>>> After investigating the problem, I found that zfs send umounts the >>>>>> snapshots, >>>>>> and that causes the deadlock, so later I tested only with concurrent >>>>>> umount >>>>>> and the "activity". More later I found that listing the snapshots in >>>>>> .zfs/snapshot/ and unounting them can cause the found deadlock, so I >>>>>> used >>>>>> them for the tests. But for my surprise, instead of a deadlock, a >>>>>> recursive >>>>>> lock panic has arised. >>>>>> >>>>>> The vnode for the ".zfs/snapshot/" directory contains zfs's >>>>>> zfsctl_snapdir_t >>>>>> structure (sdp). This contains a tree of mounted snapshots, and each >>>>>> entry >>>>>> (sep) contains the vnode of entry on which the snapshot is mounted on >>>>>> top >>>>>> (se_root). The strange is that the se_root member does not hold a >>>>>> reference >>>>>> for the vnode, just a simple pointer to it. >>>>>> >>>>>> Upon entry lookup (zfsctl_snapdir_lookup()) the "snapshot" vnode is >>>>>> locked, >>>>>> the zfsctl_snapdir_t's tree is locked, and searched for the mount if it >>>>>> exists already. If it founds no entry, does the mount. In the case of >>>>>> an >>>>>> entry was found, the se_root member contains the vnode which the >>>>>> snapshot is >>>>>> mounted on. Thus, a reference is taken for it, and the traverse() call >>>>>> will >>>>>> resolve to the real root vnode of the mounted snapshot, returning it as >>>>>> locked. (Examining the traverse() code I've found that it did not >>>>>> follow >>>>>> FreeBSD's lock order recommendation described in sys/kern/vfs_subr.c.) >>>>>> >>>>>> On the other way, when an umount is issued, the se_root vnode looses >>>>>> its last >>>>>> reference (as only the mountpoint holds one for it), it goes through >>>>>> the >>>>>> vinactive() path, to zfsctl_snapshot_inactive(). In FreeBSD this is >>>>>> called >>>>>> with a locked vnode, so this is a deadlock race condition. While >>>>>> zfsctl_snapdir_lookup() holds the mutex for the sdp tree, and >>>>>> traverse() >>>>>> tries to acquire the se_root, zfsctl_snapshot_inactive() holds the lock >>>>>> on >>>>>> se_root while tries to access the sdp lock. >>>>>> >>>>>> The zfsctl_snapshot_inactive() has an if statement checking the >>>>>> v_usecount, >>>>>> which is incremented in zfsctl_snapdir_lookup(), but in that context it >>>>>> is >>>>>> not covered by VI_LOCK. And it seems to me that FreeBSD's vinactive() >>>>>> path >>>>>> assumes that the vnode remains inactive (as opposed to illumos, at >>>>>> least how >>>>>> i read the code). So zfsctl_snapshot_inactive() must free resources >>>>>> while in >>>>>> a locked state. I was a bit confused, and probably that is why the >>>>>> previously >>>>>> posted patch is as is. >>>>>> >>>>>> Maybe if I had some clues on the directions of this problem, I could >>>>>> have >>>>>> worked more for a nicer, shorter solution. >>>>>> >>>>>> Please someone comment on my post. 
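To make the locking that the patch aims for concrete, here is a condensed sketch of the zfsctl_snapshot_inactive() rework from PATCH 2/2 (cleanup details trimmed, so treat it as an illustration rather than the exact code):

/*
 * Condensed sketch of the ordering PATCH 2/2 goes for: shed the vnode lock
 * that vinactive() handed us, take sdp->sd_lock first, re-lock the vnode
 * second, and only then re-check v_count, so inactive() and
 * zfsctl_snapdir_lookup() agree on a single lock order.
 */
static void
snapshot_inactive_order_sketch(vnode_t *vp)
{
	zfsctl_snapdir_t *sdp;
	vnode_t *dvp;

	/* unlocks vp; returns the ".zfs/snapshot" vnode held in dvp */
	sdp = zfsctl_snapshot_get_snapdir(vp, &dvp);

	mutex_enter(&sdp->sd_lock);			/* 1st: snapdir tree lock */
	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);		/* 2nd: snapshot vnode lock */

	if (vp->v_count > 0) {
		/* somebody re-looked it up meanwhile; nothing to tear down */
		mutex_exit(&sdp->sd_lock);
		VN_RELE(dvp);
		return;
	}

	/* ... drop the avl entry for vp under sd_lock, then clean up ... */
	mutex_exit(&sdp->sd_lock);
	VN_RELE(dvp);
}

The key point is that both zfsctl_snapdir_lookup() and the inactive path now take sd_lock before the snapshot vnode's lock, instead of in opposite orders.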
>>>>>> >>>>>> Regards, >>>>>> >>>>>> >>>>>> >>>>>> Kojedzinszky Richard >>>>>> Euronet Magyarorszag Informatikai Zrt. >>>>>> >>>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote: >>>>>> >>>>>>> Date: Mon, 16 Dec 2013 16:52:16 +0100 (CET) >>>>>>> From: krichy@tvnetwork.hu >>>>>>> To: pjd@freebsd.org >>>>>>> Cc: freebsd-fs@freebsd.org >>>>>>> Subject: Re: kern/184677 (fwd) >>>>>>> >>>>>>> Dear PJD, >>>>>>> >>>>>>> I am a happy FreeBSD user, I am sure you've read my previous posts >>>>>>> regarding some issues in ZFS. Please give some advice for me, where to >>>>>>> look >>>>>>> for solutions, or how could I help to resolve those issues. >>>>>>> >>>>>>> Regards, >>>>>>> Kojedzinszky Richard >>>>>>> Euronet Magyarorszag Informatikai Zrt. >>>>>>> >>>>>>> ---------- Forwarded message ---------- >>>>>>> Date: Mon, 16 Dec 2013 15:23:06 +0100 (CET) >>>>>>> From: krichy@tvnetwork.hu >>>>>>> To: freebsd-fs@freebsd.org >>>>>>> Subject: Re: kern/184677 >>>>>>> >>>>>>> >>>>>>> Seems that pjd did a change which eliminated the zfsdev_state_lock on >>>>>>> Fri >>>>>>> Aug 12 07:04:16 2011 +0000, which might introduced a new deadlock >>>>>>> situation. Any comments on this? >>>>>>> >>>>>>> >>>>>>> Kojedzinszky Richard >>>>>>> Euronet Magyarorszag Informatikai Zrt. >>>>>>> >>>>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote: >>>>>>> >>>>>>>> Date: Mon, 16 Dec 2013 11:08:11 +0100 (CET) >>>>>>>> From: krichy@tvnetwork.hu >>>>>>>> To: freebsd-fs@freebsd.org >>>>>>>> Subject: kern/184677 >>>>>>>> >>>>>>>> Dear devs, >>>>>>>> >>>>>>>> I've attached a patch, which makes the recursive lockmgr disappear, >>>>>>>> and >>>>>>>> makes the reported bug to disappear. I dont know if I followed any >>>>>>>> guidelines well, or not, but at least it works for me. Please some >>>>>>>> ZFS/FreeBSD fs expert review it, and fix it where it needed. >>>>>>>> >>>>>>>> But unfortunately, my original problem is still not solved, maybe the >>>>>>>> same >>>>>>>> as Ryan's: >>>>>>>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-December/018707.html >>>>>>>> >>>>>>>> Tracing the problem down is that zfsctl_snapdir_lookup() tries to >>>>>>>> acquire >>>>>>>> spa_namespace_lock while when finishing a zfs send -R does a >>>>>>>> zfsdev_close(), and that also holds the same mutex. And this causes a >>>>>>>> deadlock scenario. I looked at illumos's code, and for some reason >>>>>>>> they >>>>>>>> use another mutex on zfsdev_close(), which therefore may not deadlock >>>>>>>> with >>>>>>>> zfsctl_snapdir_lookup(). But I am still investigating the problem. >>>>>>>> >>>>>>>> I would like to help making ZFS more stable on freebsd also with its >>>>>>>> whole >>>>>>>> functionality. I would be very thankful if some expert would give >>>>>>>> some >>>>>>>> advice, how to solve these bugs. PJD, Steven, Xin? >>>>>>>> >>>>>>>> Thanks in advance, >>>>>>>> >>>>>>>> >>>>>>>> Kojedzinszky Richard >>>>>>>> Euronet Magyarorszag Informatikai Zrt. >>>>>>> _______________________________________________ >>>>>>> freebsd-fs@freebsd.org mailing list >>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>>>>>> >>>>>> >>>> >>>> >>>>> From 39298da838d006ad225e41529d7b7f240fccfe73 Mon Sep 17 00:00:00 2001 >>>>> From: Richard Kojedzinszky >>>>> Date: Mon, 16 Dec 2013 15:39:11 +0100 >>>>> Subject: [PATCH 1/2] Revert "Eliminate the zfsdev_state_lock entirely >>>>> and >>>>> replace it with the" >>>>> >>>>> This reverts commit 1d8972b3f353f986eb5b85bc108b1c0d946d3218. 
>>>>> >>>>> Conflicts: >>>>> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>>> --- >>>>> .../opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h | 1 + >>>>> .../opensolaris/uts/common/fs/zfs/vdev_geom.c | 14 ++- >>>>> .../opensolaris/uts/common/fs/zfs/zfs_ioctl.c | 16 +-- >>>>> .../contrib/opensolaris/uts/common/fs/zfs/zvol.c | 119 >>>>> +++++++++------------ >>>>> 4 files changed, 70 insertions(+), 80 deletions(-) >>>>> >>>>> diff --git >>>>> a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>>>> index af2def2..8272c4d 100644 >>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>>>> @@ -383,6 +383,7 @@ extern void *zfsdev_get_soft_state(minor_t minor, >>>>> extern minor_t zfsdev_minor_alloc(void); >>>>> >>>>> extern void *zfsdev_state; >>>>> +extern kmutex_t zfsdev_state_lock; >>>>> >>>>> #endif /* _KERNEL */ >>>>> >>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>>>> index 15685a5..5c3e9f3 100644 >>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>>>> @@ -581,7 +581,7 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t >>>>> *max_psize, >>>>> struct g_provider *pp; >>>>> struct g_consumer *cp; >>>>> size_t bufsize; >>>>> - int error; >>>>> + int error, lock; >>>>> >>>>> /* >>>>> * We must have a pathname, and it must be absolute. >>>>> @@ -593,6 +593,12 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, >>>>> uint64_t *max_psize, >>>>> >>>>> vd->vdev_tsd = NULL; >>>>> >>>>> + if (mutex_owned(&spa_namespace_lock)) { >>>>> + mutex_exit(&spa_namespace_lock); >>>>> + lock = 1; >>>>> + } else { >>>>> + lock = 0; >>>>> + } >>>>> DROP_GIANT(); >>>>> g_topology_lock(); >>>>> error = 0; >>>>> @@ -624,7 +630,11 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, >>>>> uint64_t *max_psize, >>>>> !ISP2(cp->provider->sectorsize)) { >>>>> ZFS_LOG(1, "Provider %s has unsupported sectorsize.", >>>>> vd->vdev_path); >>>>> + >>>>> + g_topology_lock(); >>>>> vdev_geom_detach(cp, 0); >>>>> + g_topology_unlock(); >>>>> + >>>>> error = EINVAL; >>>>> cp = NULL; >>>>> } else if (cp->acw == 0 && (spa_mode(vd->vdev_spa) & FWRITE) != 0) { >>>>> @@ -647,6 +657,8 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t >>>>> *max_psize, >>>>> } >>>>> g_topology_unlock(); >>>>> PICKUP_GIANT(); >>>>> + if (lock) >>>>> + mutex_enter(&spa_namespace_lock); >>>>> if (cp == NULL) { >>>>> vd->vdev_stat.vs_aux = VDEV_AUX_OPEN_FAILED; >>>>> return (error); >>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>>>> index e9fba26..91becde 100644 >>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>>>> @@ -5635,7 +5635,7 @@ zfsdev_minor_alloc(void) >>>>> static minor_t last_minor; >>>>> minor_t m; >>>>> >>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>>> >>>>> for (m = last_minor + 1; m != last_minor; m++) { >>>>> if (m > ZFSDEV_MAX_MINOR) >>>>> @@ -5655,7 +5655,7 @@ zfs_ctldev_init(struct cdev *devp) >>>>> minor_t minor; >>>>> zfs_soft_state_t *zs; >>>>> >>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>>> + 
ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>>> >>>>> minor = zfsdev_minor_alloc(); >>>>> if (minor == 0) >>>>> @@ -5676,7 +5676,7 @@ zfs_ctldev_init(struct cdev *devp) >>>>> static void >>>>> zfs_ctldev_destroy(zfs_onexit_t *zo, minor_t minor) >>>>> { >>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>>> >>>>> zfs_onexit_destroy(zo); >>>>> ddi_soft_state_free(zfsdev_state, minor); >>>>> @@ -5706,9 +5706,9 @@ zfsdev_open(struct cdev *devp, int flag, int mode, >>>>> struct thread *td) >>>>> >>>>> /* This is the control device. Allocate a new minor if requested. */ >>>>> if (flag & FEXCL) { >>>>> - mutex_enter(&spa_namespace_lock); >>>>> + mutex_enter(&zfsdev_state_lock); >>>>> error = zfs_ctldev_init(devp); >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> } >>>>> >>>>> return (error); >>>>> @@ -5723,14 +5723,14 @@ zfsdev_close(void *data) >>>>> if (minor == 0) >>>>> return; >>>>> >>>>> - mutex_enter(&spa_namespace_lock); >>>>> + mutex_enter(&zfsdev_state_lock); >>>>> zo = zfsdev_get_soft_state(minor, ZSST_CTLDEV); >>>>> if (zo == NULL) { >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> return; >>>>> } >>>>> zfs_ctldev_destroy(zo, minor); >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> } >>>>> >>>>> static int >>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>>> index 72d4502..aec5219 100644 >>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>>> @@ -104,11 +104,12 @@ static char *zvol_tag = "zvol_tag"; >>>>> #define ZVOL_DUMPSIZE "dumpsize" >>>>> >>>>> /* >>>>> - * The spa_namespace_lock protects the zfsdev_state structure from >>>>> being >>>>> - * modified while it's being used, e.g. an open that comes in before a >>>>> - * create finishes. It also protects temporary opens of the dataset so >>>>> that, >>>>> + * This lock protects the zfsdev_state structure from being modified >>>>> + * while it's being used, e.g. an open that comes in before a create >>>>> + * finishes. It also protects temporary opens of the dataset so that, >>>>> * e.g., an open doesn't get a spurious EBUSY. >>>>> */ >>>>> +kmutex_t zfsdev_state_lock; >>>>> static uint32_t zvol_minors; >>>>> >>>>> typedef struct zvol_extent { >>>>> @@ -249,7 +250,7 @@ zvol_minor_lookup(const char *name) >>>>> struct g_geom *gp; >>>>> zvol_state_t *zv = NULL; >>>>> >>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>>> >>>>> g_topology_lock(); >>>>> LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) { >>>>> @@ -465,11 +466,11 @@ zvol_name2minor(const char *name, minor_t *minor) >>>>> { >>>>> zvol_state_t *zv; >>>>> >>>>> - mutex_enter(&spa_namespace_lock); >>>>> + mutex_enter(&zfsdev_state_lock); >>>>> zv = zvol_minor_lookup(name); >>>>> if (minor && zv) >>>>> *minor = zv->zv_minor; >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> return (zv ? 
0 : -1);
>>>>> }
>>>>> #endif /* sun */
>>>>> @@ -489,10 +490,10 @@ zvol_create_minor(const char *name)
>>>>>
>>>>> ZFS_LOG(1, "Creating ZVOL %s...", name);
>>>>>
>>>>> - mutex_enter(&spa_namespace_lock);
>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>>
>>>>> if (zvol_minor_lookup(name) != NULL) {
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>> return (SET_ERROR(EEXIST));
>>>>> }
>>>>>
>>>>> @@ -500,20 +501,20 @@ zvol_create_minor(const char *name)
>>>>> error = dmu_objset_own(name, DMU_OST_ZVOL, B_TRUE, FTAG, &os);
>>>>>
>>>>> if (error) {
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>> return (error);
>>>>> }
>>>>>
>>>>> #ifdef sun
>>>>> if ((minor = zfsdev_minor_alloc()) == 0) {
>>>>> dmu_objset_disown(os, FTAG);
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>> return (SET_ERROR(ENXIO));
>>>>> }
>>>>>
>>>>> if (ddi_soft_state_zalloc(zfsdev_state, minor) != DDI_SUCCESS) {
>>>>> dmu_objset_disown(os, FTAG);
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>> return (SET_ERROR(EAGAIN));
>>>>> }
>>>>> (void) ddi_prop_update_string(minor, zfs_dip, ZVOL_PROP_NAME,
>>>>> @@ -525,7 +526,7 @@ zvol_create_minor(const char *name)
>>>>> minor, DDI_PSEUDO, 0) == DDI_FAILURE) {
>>>>> ddi_soft_state_free(zfsdev_state, minor);
>>>>> dmu_objset_disown(os, FTAG);
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>> return (SET_ERROR(EAGAIN));
>>>>> }
>>>>>
>>>>> @@ -536,7 +537,7 @@ zvol_create_minor(const char *name)
>>>>> ddi_remove_minor_node(zfs_dip, chrbuf);
>>>>> ddi_soft_state_free(zfsdev_state, minor);
>>>>> dmu_objset_disown(os, FTAG);
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>> return (SET_ERROR(EAGAIN));
>>>>> }
>>>>>
>>>>> @@ -587,7 +588,7 @@ zvol_create_minor(const char *name)
>>>>>
>>>>> zvol_minors++;
>>>>>
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>
>>>>> zvol_geom_run(zv);
>>>>>
>>>>> @@ -609,7 +610,7 @@ zvol_remove_zv(zvol_state_t *zv)
>>>>> minor_t minor = zv->zv_minor;
>>>>> #endif
>>>>>
>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>> if (zv->zv_total_opens != 0)
>>>>> return (SET_ERROR(EBUSY));
>>>>>
>>>>> @@ -635,15 +636,15 @@ zvol_remove_minor(const char *name)
>>>>> zvol_state_t *zv;
>>>>> int rc;
>>>>>
>>>>> - mutex_enter(&spa_namespace_lock);
>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>> if ((zv = zvol_minor_lookup(name)) == NULL) {
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>> return (SET_ERROR(ENXIO));
>>>>> }
>>>>> g_topology_lock();
>>>>> rc = zvol_remove_zv(zv);
>>>>> g_topology_unlock();
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>> return (rc);
>>>>> }
>>>>>
>>>>> @@ -755,7 +756,7 @@ zvol_update_volsize(objset_t *os, uint64_t volsize)
>>>>> dmu_tx_t *tx;
>>>>> int error;
>>>>>
>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>
>>>>> tx = dmu_tx_create(os);
>>>>> dmu_tx_hold_zap(tx, ZVOL_ZAP_OBJ, TRUE, NULL);
>>>>> @@ -786,7 +787,7 @@ zvol_remove_minors(const char *name)
>>>>> namelen = strlen(name);
>>>>>
>>>>> DROP_GIANT();
>>>>> - mutex_enter(&spa_namespace_lock);
>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>> g_topology_lock();
>>>>>
>>>>> LIST_FOREACH_SAFE(gp, &zfs_zvol_class.geom, geom, gptmp) {
>>>>> @@ -804,7 +805,7 @@ zvol_remove_minors(const char *name)
>>>>> }
>>>>>
>>>>> g_topology_unlock();
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>> PICKUP_GIANT();
>>>>> }
>>>>>
>>>>> @@ -818,10 +819,10 @@ zvol_set_volsize(const char *name, major_t maj,
>>>>> uint64_t volsize)
>>>>> uint64_t old_volsize = 0ULL;
>>>>> uint64_t readonly;
>>>>>
>>>>> - mutex_enter(&spa_namespace_lock);
>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>> zv = zvol_minor_lookup(name);
>>>>> if ((error = dmu_objset_hold(name, FTAG, &os)) != 0) {
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>> return (error);
>>>>> }
>>>>>
>>>>> @@ -888,7 +889,7 @@ zvol_set_volsize(const char *name, major_t maj,
>>>>> uint64_t volsize)
>>>>> out:
>>>>> dmu_objset_rele(os, FTAG);
>>>>>
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>
>>>>> return (error);
>>>>> }
>>>>> @@ -899,36 +900,19 @@ zvol_open(struct g_provider *pp, int flag, int
>>>>> count)
>>>>> {
>>>>> zvol_state_t *zv;
>>>>> int err = 0;
>>>>> - boolean_t locked = B_FALSE;
>>>>>
>>>>> - /*
>>>>> - * Protect against recursively entering spa_namespace_lock
>>>>> - * when spa_open() is used for a pool on a (local) ZVOL(s).
>>>>> - * This is needed since we replaced upstream zfsdev_state_lock
>>>>> - * with spa_namespace_lock in the ZVOL code.
>>>>> - * We are using the same trick as spa_open().
>>>>> - * Note that calls in zvol_first_open which need to resolve
>>>>> - * pool name to a spa object will enter spa_open()
>>>>> - * recursively, but that function already has all the
>>>>> - * necessary protection.
>>>>> - */
>>>>> - if (!MUTEX_HELD(&spa_namespace_lock)) {
>>>>> - mutex_enter(&spa_namespace_lock);
>>>>> - locked = B_TRUE;
>>>>> - }
>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>>
>>>>> zv = pp->private;
>>>>> if (zv == NULL) {
>>>>> - if (locked)
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>> return (SET_ERROR(ENXIO));
>>>>> }
>>>>>
>>>>> if (zv->zv_total_opens == 0)
>>>>> err = zvol_first_open(zv);
>>>>> if (err) {
>>>>> - if (locked)
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>> return (err);
>>>>> }
>>>>> if ((flag & FWRITE) && (zv->zv_flags & ZVOL_RDONLY)) {
>>>>> @@ -950,15 +934,13 @@ zvol_open(struct g_provider *pp, int flag, int
>>>>> count)
>>>>> #endif
>>>>>
>>>>> zv->zv_total_opens += count;
>>>>> - if (locked)
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>
>>>>> return (err);
>>>>> out:
>>>>> if (zv->zv_total_opens == 0)
>>>>> zvol_last_close(zv);
>>>>> - if (locked)
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>> return (err);
>>>>> }
>>>>>
>>>>> @@ -968,18 +950,12 @@ zvol_close(struct g_provider *pp, int flag, int
>>>>> count)
>>>>> {
>>>>> zvol_state_t *zv;
>>>>> int error = 0;
>>>>> - boolean_t locked = B_FALSE;
>>>>>
>>>>> - /* See comment in zvol_open(). */
>>>>> - if (!MUTEX_HELD(&spa_namespace_lock)) {
>>>>> - mutex_enter(&spa_namespace_lock);
>>>>> - locked = B_TRUE;
>>>>> - }
>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>>
>>>>> zv = pp->private;
>>>>> if (zv == NULL) {
>>>>> - if (locked)
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>> return (SET_ERROR(ENXIO));
>>>>> }
>>>>>
>>>>> @@ -1002,8 +978,7 @@ zvol_close(struct g_provider *pp, int flag, int
>>>>> count)
>>>>> if (zv->zv_total_opens == 0)
>>>>> zvol_last_close(zv);
>>>>>
>>>>> - if (locked)
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>> return (error);
>>>>> }
>>>>>
>>>>> @@ -1658,12 +1633,12 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int
>>>>> flag, cred_t *cr, int *rvalp)
>>>>> int error = 0;
>>>>> rl_t *rl;
>>>>>
>>>>> - mutex_enter(&spa_namespace_lock);
>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>>
>>>>> zv = zfsdev_get_soft_state(getminor(dev), ZSST_ZVOL);
>>>>>
>>>>> if (zv == NULL) {
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>> return (SET_ERROR(ENXIO));
>>>>> }
>>>>> ASSERT(zv->zv_total_opens > 0);
>>>>> @@ -1677,7 +1652,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int
>>>>> flag, cred_t *cr, int *rvalp)
>>>>> dki.dki_ctype = DKC_UNKNOWN;
>>>>> dki.dki_unit = getminor(dev);
>>>>> dki.dki_maxtransfer = 1 << (SPA_MAXBLOCKSHIFT -
>>>>> zv->zv_min_bs);
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>> if (ddi_copyout(&dki, (void *)arg, sizeof (dki), flag))
>>>>> error = SET_ERROR(EFAULT);
>>>>> return (error);
>>>>> @@ -1687,7 +1662,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int
>>>>> flag, cred_t *cr, int *rvalp)
>>>>> dkm.dki_lbsize = 1U << zv->zv_min_bs;
>>>>> dkm.dki_capacity = zv->zv_volsize >> zv->zv_min_bs;
>>>>> dkm.dki_media_type = DK_UNKNOWN;
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>> if (ddi_copyout(&dkm, (void *)arg, sizeof (dkm), flag))
>>>>> error = SET_ERROR(EFAULT);
>>>>> return (error);
>>>>> @@ -1697,14 +1672,14 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int
>>>>> flag, cred_t *cr, int *rvalp)
>>>>> uint64_t vs = zv->zv_volsize;
>>>>> uint8_t bs = zv->zv_min_bs;
>>>>>
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>> error = zvol_getefi((void *)arg, flag, vs, bs);
>>>>> return (error);
>>>>> }
>>>>>
>>>>> case DKIOCFLUSHWRITECACHE:
>>>>> dkc = (struct dk_callback *)arg;
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>> zil_commit(zv->zv_zilog, ZVOL_OBJ);
>>>>> if ((flag & FKIOCTL) && dkc != NULL && dkc->dkc_callback) {
>>>>> (*dkc->dkc_callback)(dkc->dkc_cookie, error);
>>>>> @@ -1730,10 +1705,10 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int
>>>>> flag, cred_t *cr, int *rvalp)
>>>>> }
>>>>> if (wce) {
>>>>> zv->zv_flags |= ZVOL_WCE;
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>> } else {
>>>>> zv->zv_flags &= ~ZVOL_WCE;
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>> zil_commit(zv->zv_zilog, ZVOL_OBJ);
>>>>> }
>>>>> return (0);
>>>>> @@ -1828,7 +1803,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int
>>>>> flag, cred_t *cr, int *rvalp)
>>>>> break;
>>>>>
>>>>> }
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>> return (error);
>>>>> }
>>>>> #endif /* sun */
>>>>> @@ -1844,12 +1819,14 @@ zvol_init(void)
>>>>> {
>>>>> VERIFY(ddi_soft_state_init(&zfsdev_state, sizeof (zfs_soft_state_t),
>>>>> 1) == 0);
>>>>> + mutex_init(&zfsdev_state_lock, NULL, MUTEX_DEFAULT, NULL);
>>>>> ZFS_LOG(1, "ZVOL Initialized.");
>>>>> }
>>>>>
>>>>> void
>>>>> zvol_fini(void)
>>>>> {
>>>>> + mutex_destroy(&zfsdev_state_lock);
>>>>> ddi_soft_state_fini(&zfsdev_state);
>>>>> ZFS_LOG(1, "ZVOL Deinitialized.");
>>>>> }
>>>>> @@ -1889,7 +1866,7 @@ zvol_dump_init(zvol_state_t *zv, boolean_t resize)
>>>>> uint64_t version = spa_version(spa);
>>>>> enum zio_checksum checksum;
>>>>>
>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>> ASSERT(vd->vdev_ops == &vdev_root_ops);
>>>>>
>>>>> error = dmu_free_long_range(zv->zv_objset, ZVOL_OBJ, 0,
>>>>> @@ -2437,7 +2414,7 @@ zvol_rename_minor(struct g_geom *gp, const char
>>>>> *newname)
>>>>> struct g_provider *pp;
>>>>> zvol_state_t *zv;
>>>>>
>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>> g_topology_assert();
>>>>>
>>>>> pp = LIST_FIRST(&gp->provider);
>>>>> @@ -2471,7 +2448,7 @@ zvol_rename_minors(const char *oldname, const char
>>>>> *newname)
>>>>> newnamelen = strlen(newname);
>>>>>
>>>>> DROP_GIANT();
>>>>> - mutex_enter(&spa_namespace_lock);
>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>> g_topology_lock();
>>>>>
>>>>> LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) {
>>>>> @@ -2494,6 +2471,6 @@ zvol_rename_minors(const char *oldname, const char
>>>>> *newname)
>>>>> }
>>>>>
>>>>> g_topology_unlock();
>>>>> - mutex_exit(&spa_namespace_lock);
>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>> PICKUP_GIANT();
>>>>> }
>>>>> --
>>>>> 1.8.4.2
>>>>>
>>>>
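As I read PATCH 1/2 above, its point is that the zvol minor/state bookkeeping gets a private mutex (zfsdev_state_lock) instead of reusing the pool-wide spa_namespace_lock, which is why the "if (!MUTEX_HELD(...))" recursion guard in zvol_open()/zvol_close() can simply be deleted. A minimal userland sketch of that idea follows; pthreads stand in for kernel mutexes, and the names namespace_lock, state_lock, open_pool_by_name and volume_open are invented for illustration only, not part of the patch:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t namespace_lock = PTHREAD_MUTEX_INITIALIZER; /* ~ spa_namespace_lock */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;     /* ~ zfsdev_state_lock */

/* Resolving a pool by name takes the namespace lock, as spa_open() would. */
static void
open_pool_by_name(const char *pool)
{
	pthread_mutex_lock(&namespace_lock);
	printf("opened pool %s\n", pool);
	pthread_mutex_unlock(&namespace_lock);
}

/*
 * With one shared lock, the nested acquisition below would be recursion on
 * a non-recursive mutex and hang; that is what the removed MUTEX_HELD()
 * guard worked around.  With a dedicated state lock the two acquisitions
 * hit different mutexes, always in the same order (state, then namespace),
 * so no guard is needed.
 */
static void
volume_open(const char *pool)
{
	pthread_mutex_lock(&state_lock);   /* protects the minor/state table */
	open_pool_by_name(pool);           /* may take namespace_lock inside */
	pthread_mutex_unlock(&state_lock);
}

int
main(void)
{
	volume_open("tank");
	return (0);
}

With two distinct locks the nested acquisition is no longer re-entry on one mutex but an ordinary, consistently ordered pair of locks, which is the property the patch seems to rely on.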
>>>>> From 57d5a383b585c32c77af54e8cdacaddf8ce7584f Mon Sep 17 00:00:00 2001
>>>>> From: Richard Kojedzinszky
>>>>> Date: Wed, 18 Dec 2013 22:11:21 +0100
>>>>> Subject: [PATCH 2/2] ZFS snapshot handling fix
>>>>>
>>>>> ---
>>>>> .../compat/opensolaris/kern/opensolaris_lookup.c | 13 +++---
>>>>> .../opensolaris/uts/common/fs/zfs/zfs_ctldir.c | 53
>>>>> +++++++++++++++-------
>>>>> 2 files changed, 42 insertions(+), 24 deletions(-)
>>>>>
>>>>> diff --git a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>>> b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>>> index 94383d6..4cac053 100644
>>>>> --- a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>>> +++ b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>>> @@ -81,6 +81,8 @@ traverse(vnode_t **cvpp, int lktype)
>>>>> * progress on this vnode.
>>>>> */
>>>>>
>>>>> + vn_lock(cvp, lktype);
>>>>> +
>>>>> for (;;) {
>>>>> /*
>>>>> * Reached the end of the mount chain?
>>>>> @@ -89,13 +91,7 @@ traverse(vnode_t **cvpp, int lktype)
>>>>> if (vfsp == NULL)
>>>>> break;
>>>>> error = vfs_busy(vfsp, 0);
>>>>> - /*
>>>>> - * tvp is NULL for *cvpp vnode, which we can't unlock.
>>>>> - */
>>>>> - if (tvp != NULL)
>>>>> - vput(cvp);
>>>>> - else
>>>>> - vrele(cvp);
>>>>> + VOP_UNLOCK(cvp, 0);
>>>>> if (error)
>>>>> return (error);
>>>>>
>>>>> @@ -107,6 +103,9 @@ traverse(vnode_t **cvpp, int lktype)
>>>>> vfs_unbusy(vfsp);
>>>>> if (error != 0)
>>>>> return (error);
>>>>> +
>>>>> + VN_RELE(cvp);
>>>>> +
>>>>> cvp = tvp;
>>>>> }
>>>>>
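The reworked traverse() above keeps only a reference (hold) on the current vnode while it crosses mount points, and holds the vnode lock just long enough to find the covering filesystem, dropping it before the busy/root-lookup step and moving the reference to the root vnode it gets back. The following is a rough, runnable userland analogy of that hold-vs-lock discipline, not the kernel code itself; struct node, walk_chain and the helpers are invented names for this sketch:

#include <pthread.h>
#include <stdatomic.h>

struct node {
	pthread_mutex_t	lock;
	atomic_int	refs;		/* the "hold" keeping the node alive */
	struct node	*covered_by;	/* next node in the mount chain */
};

static void node_hold(struct node *n) { atomic_fetch_add(&n->refs, 1); }
static void node_rele(struct node *n) { atomic_fetch_sub(&n->refs, 1); }

/*
 * Walk to the last node of the chain.  The caller passes in a held node;
 * the walk never holds two node locks at once, drops the lock before the
 * "blocking" step of moving to the next node, and only releases the old
 * reference once the new one is in hand.  Returns the last node, held and
 * locked.
 */
static struct node *
walk_chain(struct node *n)
{
	pthread_mutex_lock(&n->lock);
	for (;;) {
		struct node *next = n->covered_by;

		if (next == NULL)
			break;
		node_hold(next);		/* take the new reference first */
		pthread_mutex_unlock(&n->lock);	/* never block with it locked */
		node_rele(n);
		n = next;
		pthread_mutex_lock(&n->lock);
	}
	return (n);
}

int
main(void)
{
	struct node b = { PTHREAD_MUTEX_INITIALIZER, 0, NULL };
	struct node a = { PTHREAD_MUTEX_INITIALIZER, 1, &b };
	struct node *last = walk_chain(&a);	/* == &b, held and locked */

	pthread_mutex_unlock(&last->lock);
	node_rele(last);
	return (0);
}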
>>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>>> index 28ab1fa..d3464b4 100644
>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>>> @@ -112,6 +112,25 @@ snapentry_compare(const void *a, const void *b)
>>>>> return (0);
>>>>> }
>>>>>
>>>>> +/* Return the zfsctl_snapdir_t object from current vnode, following
>>>>> + * the lock orders in zfsctl_snapdir_lookup() to avoid deadlocks.
>>>>> + * On return the passed in vp is unlocked */
>>>>> +static zfsctl_snapdir_t *
>>>>> +zfsctl_snapshot_get_snapdir(vnode_t *vp, vnode_t **dvpp)
>>>>> +{
>>>>> + gfs_dir_t *dp = vp->v_data;
>>>>> + *dvpp = dp->gfsd_file.gfs_parent;
>>>>> + zfsctl_snapdir_t *sdp;
>>>>> +
>>>>> + VN_HOLD(*dvpp);
>>>>> + VOP_UNLOCK(vp, 0);
>>>>> + vn_lock(*dvpp, LK_SHARED | LK_RETRY | LK_CANRECURSE);
>>>>> + sdp = (*dvpp)->v_data;
>>>>> + VOP_UNLOCK(*dvpp, 0);
>>>>> +
>>>>> + return (sdp);
>>>>> +}
>>>>> +
>>>>> #ifdef sun
>>>>> vnodeops_t *zfsctl_ops_root;
>>>>> vnodeops_t *zfsctl_ops_snapdir;
>>>>> @@ -1013,6 +1032,8 @@ zfsctl_snapdir_lookup(ap)
>>>>> * The snapshot was unmounted behind our backs,
>>>>> * try to remount it.
>>>>> */
>>>>> + VOP_UNLOCK(*vpp, 0);
>>>>> + VN_HOLD(*vpp);
>>>>> VERIFY(zfsctl_snapshot_zname(dvp, nm, MAXNAMELEN,
>>>>> snapname) == 0);
>>>>> goto domount;
>>>>> } else {
>>>>> @@ -1064,7 +1085,6 @@ zfsctl_snapdir_lookup(ap)
>>>>> sep->se_name = kmem_alloc(strlen(nm) + 1, KM_SLEEP);
>>>>> (void) strcpy(sep->se_name, nm);
>>>>> *vpp = sep->se_root = zfsctl_snapshot_mknode(dvp,
>>>>> dmu_objset_id(snap));
>>>>> - VN_HOLD(*vpp);
>>>>> avl_insert(&sdp->sd_snaps, sep, where);
>>>>>
>>>>> dmu_objset_rele(snap, FTAG);
>>>>> @@ -1075,6 +1095,7 @@ domount:
>>>>> (void) snprintf(mountpoint, mountpoint_len,
>>>>> "%s/" ZFS_CTLDIR_NAME "/snapshot/%s",
>>>>> dvp->v_vfsp->mnt_stat.f_mntonname, nm);
>>>>> + VN_HOLD(*vpp);
>>>>> err = mount_snapshot(curthread, vpp, "zfs", mountpoint, snapname, 0);
>>>>> kmem_free(mountpoint, mountpoint_len);
>>>>> if (err == 0) {
>>>>> @@ -1464,16 +1485,18 @@ zfsctl_snapshot_inactive(ap)
>>>>> int locked;
>>>>> vnode_t *dvp;
>>>>>
>>>>> - if (vp->v_count > 0)
>>>>> - goto end;
>>>>> -
>>>>> - VERIFY(gfs_dir_lookup(vp, "..", &dvp, cr, 0, NULL, NULL) == 0);
>>>>> - sdp = dvp->v_data;
>>>>> - VOP_UNLOCK(dvp, 0);
>>>>> + sdp = zfsctl_snapshot_get_snapdir(vp, &dvp);
>>>>>
>>>>> if (!(locked = MUTEX_HELD(&sdp->sd_lock)))
>>>>> mutex_enter(&sdp->sd_lock);
>>>>>
>>>>> + vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
>>>>> +
>>>>> + if (vp->v_count > 0) {
>>>>> + mutex_exit(&sdp->sd_lock);
>>>>> + return (0);
>>>>> + }
>>>>> +
>>>>> ASSERT(!vn_ismntpt(vp));
>>>>>
>>>>> sep = avl_first(&sdp->sd_snaps);
>>>>> @@ -1494,7 +1517,6 @@ zfsctl_snapshot_inactive(ap)
>>>>> mutex_exit(&sdp->sd_lock);
>>>>> VN_RELE(dvp);
>>>>>
>>>>> -end:
>>>>> /*
>>>>> * Dispose of the vnode for the snapshot mount point.
>>>>> * This is safe to do because once this entry has been removed
>>>>> @@ -1595,20 +1617,17 @@ zfsctl_snapshot_lookup(ap)
>>>>> static int
>>>>> zfsctl_snapshot_vptocnp(struct vop_vptocnp_args *ap)
>>>>> {
>>>>> - zfsvfs_t *zfsvfs = ap->a_vp->v_vfsp->vfs_data;
>>>>> - vnode_t *dvp, *vp;
>>>>> + vnode_t *dvp, *vp = ap->a_vp;
>>>>> zfsctl_snapdir_t *sdp;
>>>>> zfs_snapentry_t *sep;
>>>>> - int error;
>>>>> + int error = 0;
>>>>>
>>>>> - ASSERT(zfsvfs->z_ctldir != NULL);
>>>>> - error = zfsctl_root_lookup(zfsvfs->z_ctldir, "snapshot", &dvp,
>>>>> - NULL, 0, NULL, kcred, NULL, NULL, NULL);
>>>>> - if (error != 0)
>>>>> - return (error);
>>>>> - sdp = dvp->v_data;
>>>>> + sdp = zfsctl_snapshot_get_snapdir(vp, &dvp);
>>>>>
>>>>> mutex_enter(&sdp->sd_lock);
>>>>> +
>>>>> + vn_lock(vp, LK_SHARED | LK_RETRY);
>>>>> +
>>>>> sep = avl_first(&sdp->sd_snaps);
>>>>> while (sep != NULL) {
>>>>> vp = sep->se_root;
>>>>> --
>>>>> 1.8.4.2
>>>>>
>>>>
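The new zfsctl_snapshot_get_snapdir() helper in PATCH 2/2 encodes a single lock order: the snapshot directory (parent) is locked before the snapshot vnode (child), so a path that enters with only the child locked drops that lock, takes the parent, and relocks the child afterwards. Below is a small plain-C illustration of that convention, again with pthreads; dir, snap and the two functions are made-up names, and object lifetime is ignored where the kernel code would use VN_HOLD:

#include <pthread.h>

struct dir {				/* ~ the .zfs/snapshot directory */
	pthread_mutex_t	lock;
	int		nsnaps;
};

struct snap {				/* ~ one snapshot vnode under it */
	pthread_mutex_t	lock;
	struct dir	*parent;
};

/* Path 1 (lookup/mount): naturally ordered parent -> child. */
static void
lookup_snapshot(struct dir *d, struct snap *s)
{
	pthread_mutex_lock(&d->lock);
	pthread_mutex_lock(&s->lock);
	/* ... mount or examine the snapshot ... */
	pthread_mutex_unlock(&s->lock);
	pthread_mutex_unlock(&d->lock);
}

/*
 * Path 2 (inactive/vptocnp): enters with only the child locked.  Taking
 * the parent lock here directly would invert the order used by path 1 and
 * set up an AB/BA deadlock, so the child lock is dropped first and
 * restored afterwards.
 */
static int
snapshot_parent_count(struct snap *s)
{
	struct dir *d = s->parent;
	int n;

	pthread_mutex_unlock(&s->lock);	/* caller entered with s locked */
	pthread_mutex_lock(&d->lock);	/* parent first, as in path 1 */
	n = d->nsnaps;
	pthread_mutex_unlock(&d->lock);
	pthread_mutex_lock(&s->lock);	/* give the caller back its lock */
	return (n);
}

int
main(void)
{
	struct dir d = { PTHREAD_MUTEX_INITIALIZER, 1 };
	struct snap s = { PTHREAD_MUTEX_INITIALIZER, &d };

	lookup_snapshot(&d, &s);
	pthread_mutex_lock(&s.lock);
	(void)snapshot_parent_count(&s);
	pthread_mutex_unlock(&s.lock);
	return (0);
}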
>>>> --
>>>> Pawel Jakub Dawidek http://www.wheelsystems.com
>>>> FreeBSD committer http://www.FreeBSD.org
>>>> Am I Evil? Yes, I Am! http://mobter.com
>>>>
>>> _______________________________________________
>>> freebsd-fs@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>>
>