From owner-freebsd-fs@FreeBSD.ORG  Mon Dec 23 11:06:46 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 773AB589
 for <freebsd-fs@FreeBSD.org>; Mon, 23 Dec 2013 11:06:46 +0000 (UTC)
Received: from freefall.freebsd.org (freefall.freebsd.org
 [IPv6:2001:1900:2254:206c::16:87])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 59A2811B4
 for <freebsd-fs@FreeBSD.org>; Mon, 23 Dec 2013 11:06:46 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
 by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id rBNB6kdi029973
 for <freebsd-fs@FreeBSD.org>; Mon, 23 Dec 2013 11:06:46 GMT
 (envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
 by freefall.freebsd.org (8.14.7/8.14.7/Submit) id rBNB6jFw029971
 for freebsd-fs@FreeBSD.org; Mon, 23 Dec 2013 11:06:45 GMT
 (envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 23 Dec 2013 11:06:45 GMT
Message-Id: <201312231106.rBNB6jFw029971@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to
 owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster <bugmaster@freebsd.org>
To: freebsd-fs@FreeBSD.org
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 23 Dec 2013 11:06:46 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.


S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/184478  fs         [smbfs] mount_smbfs cannot read/write files
o kern/182570  fs         [zfs] [patch] ZFS panic in receive
o kern/182536  fs         [zfs] zfs deadlock
o kern/181966  fs         [zfs] Kernel panic in ZFS I/O: solaris assert: BP_EQUA
o kern/181834  fs         [nfs] amd mounting NFS directories can drive a dead-lo
o kern/181565  fs         [swap] Problem with vnode-backed swap space.
o kern/181377  fs         [zfs] zfs recv causes an inconsistant pool
o kern/181281  fs         [msdosfs] stack trace after successfull 'umount /mnt' 
o kern/181082  fs         [fuse] [ntfs] Write to mounted NTFS filesystem using F
o kern/180979  fs         [netsmb][patch]: Fix large files handling
o kern/180876  fs         [zfs] [hast] ZFS with trim,bio_flush or bio_delete loc
o kern/180678  fs         [NFS] succesfully exported filesystems being reported 
o kern/180438  fs         [smbfs] [patch] mount_smbfs fails on arm because of wr
p kern/180236  fs         [zfs] [nullfs] Leakage free space using ZFS with nullf
o kern/178854  fs         [ufs] FreeBSD kernel crash in UFS
s kern/178467  fs         [zfs] [request] Optimized Checksum Code for ZFS
o kern/178412  fs         [smbfs] Coredump when smbfs mounted
o kern/178388  fs         [zfs] [patch] allow up to 8MB recordsize
o kern/178387  fs         [zfs] [patch] sparse files performance improvements
o kern/178349  fs         [zfs] zfs scrub on deduped data could be much less see
o kern/178329  fs         [zfs] extended attributes leak
o kern/178238  fs         [nullfs] nullfs don't release i-nodes on unlink.
f kern/178231  fs         [nfs] 8.3 nfsv4 client reports "nfsv4 client/server pr
o kern/177985  fs         [zfs] disk usage problem when copying from one zfs dat
o kern/177971  fs         [nfs] FreeBSD 9.1 nfs client dirlist problem w/ nfsv3,
o kern/177966  fs         [zfs] resilver completes but subsequent scrub reports 
o kern/177658  fs         [ufs] FreeBSD panics after get full filesystem with uf
o kern/177536  fs         [zfs] zfs livelock (deadlock) with high write-to-disk 
o kern/177445  fs         [hast] HAST panic
o kern/177240  fs         [zfs] zpool import failed with state UNAVAIL but all d
o kern/176978  fs         [zfs] [panic] zfs send -D causes "panic: System call i
o kern/176857  fs         [softupdates] [panic] 9.1-RELEASE/amd64/GENERIC panic 
o bin/176253   fs         zpool(8): zfs pool indentation is misleading/wrong
o kern/176141  fs         [zfs] sharesmb=on makes errors for sharenfs, and still
o kern/175950  fs         [zfs] Possible deadlock in zfs after long uptime
o kern/175897  fs         [zfs] operations on readonly zpool hang
o kern/175449  fs         [unionfs] unionfs and devfs misbehaviour
o kern/175179  fs         [zfs] ZFS may attach wrong device on move
o kern/175071  fs         [ufs] [panic] softdep_deallocate_dependencies: unrecov
o kern/174372  fs         [zfs] Pagefault appears to be related to ZFS
o kern/174315  fs         [zfs] chflags uchg not supported
o kern/174310  fs         [zfs] root point mounting broken on CURRENT with multi
o kern/174279  fs         [ufs] UFS2-SU+J journal and filesystem corruption
o kern/173830  fs         [zfs] Brain-dead simple change to ZFS error descriptio
o kern/173718  fs         [zfs] phantom directory in zraid2 pool
f kern/173657  fs         [nfs] strange UID map with nfsuserd
o kern/173363  fs         [zfs] [panic] Panic on 'zpool replace' on readonly poo
o kern/173136  fs         [unionfs] mounting above the NFS read-only share panic
o kern/172942  fs         [smbfs] Unmounting a smb mount when the server became 
o kern/172348  fs         [unionfs] umount -f of filesystem in use with readonly
o kern/172334  fs         [unionfs] unionfs permits recursive union mounts; caus
o kern/171626  fs         [tmpfs] tmpfs should be noisier when the requested siz
o kern/171415  fs         [zfs] zfs recv fails with "cannot receive incremental 
o kern/170945  fs         [gpt] disk layout not portable between direct connect 
o bin/170778   fs         [zfs] [panic] FreeBSD panics randomly
o kern/170680  fs         [nfs] Multiple NFS Client bug in the FreeBSD 7.4-RELEA
o kern/170497  fs         [xfs][panic] kernel will panic whenever I ls a mounted
o kern/169945  fs         [zfs] [panic] Kernel panic while importing zpool (afte
o kern/169480  fs         [zfs] ZFS stalls on heavy I/O
o kern/169398  fs         [zfs] Can't remove file with permanent error
o kern/169339  fs         panic while " : > /etc/123"
o kern/169319  fs         [zfs] zfs resilver can't complete
o kern/168947  fs         [nfs] [zfs] .zfs/snapshot directory is messed up when 
o kern/168942  fs         [nfs] [hang] nfsd hangs after being restarted (not -HU
o kern/168158  fs         [zfs] incorrect parsing of sharenfs options in zfs (fs
o kern/167979  fs         [ufs] DIOCGDINFO ioctl does not work on 8.2 file syste
o kern/167977  fs         [smbfs] mount_smbfs results are differ when utf-8 or U
o kern/167688  fs         [fusefs] Incorrect signal handling with direct_io
o kern/167685  fs         [zfs] ZFS on USB drive prevents shutdown / reboot
o kern/167612  fs         [portalfs] The portal file system gets stuck inside po
o kern/167272  fs         [zfs] ZFS Disks reordering causes ZFS to pick the wron
o kern/167260  fs         [msdosfs] msdosfs disk was mounted the second time whe
o kern/167109  fs         [zfs] [panic] zfs diff kernel panic Fatal trap 9: gene
o kern/167105  fs         [nfs] mount_nfs can not handle source exports wiht mor
o kern/167067  fs         [zfs] [panic] ZFS panics the server
o kern/167065  fs         [zfs] boot fails when a spare is the boot disk
o kern/167048  fs         [nfs] [patch] RELEASE-9 crash when using ZFS+NULLFS+NF
o kern/166912  fs         [ufs] [panic] Panic after converting Softupdates to jo
o kern/166851  fs         [zfs] [hang] Copying directory from the mounted UFS di
o kern/166477  fs         [nfs] NFS data corruption.
o kern/165950  fs         [ffs] SU+J and fsck problem
o kern/165521  fs         [zfs] [hang] livelock on 1 Gig of RAM with zfs when 31
o kern/165392  fs         Multiple mkdir/rmdir fails with errno 31
o kern/165087  fs         [unionfs] lock violation in unionfs
o kern/164472  fs         [ufs] fsck -B panics on particular data inconsistency
o kern/164370  fs         [zfs] zfs destroy for snapshot fails on i386 and sparc
o kern/164261  fs         [nullfs] [patch] fix panic with NFS served from NULLFS
o kern/164256  fs         [zfs] device entry for volume is not created after zfs
o kern/164184  fs         [ufs] [panic] Kernel panic with ufs_makeinode
o kern/163801  fs         [md] [request] allow mfsBSD legacy installed in 'swap'
o kern/163770  fs         [zfs] [hang] LOR between zfs&syncer + vnlru leading to
o kern/163501  fs         [nfs] NFS exporting a dir and a subdir in that dir to 
o kern/162944  fs         [coda] Coda file system module looks broken in 9.0
o kern/162860  fs         [zfs] Cannot share ZFS filesystem to hosts with a hyph
o kern/162751  fs         [zfs] [panic] kernel panics during file operations
o kern/162591  fs         [nullfs] cross-filesystem nullfs does not work as expe
o kern/162519  fs         [zfs] "zpool import" relies on buggy realpath() behavi
o kern/161968  fs         [zfs] [hang] renaming snapshot with -r including a zvo
o kern/161864  fs         [ufs] removing journaling from UFS partition fails on 
o kern/161579  fs         [smbfs] FreeBSD sometimes panics when an smb share is 
o kern/161533  fs         [zfs] [panic] zfs receive panic: system ioctl returnin
o kern/161438  fs         [zfs] [panic] recursed on non-recursive spa_namespace_
o kern/161424  fs         [nullfs] __getcwd() calls fail when used on nullfs mou
o kern/161280  fs         [zfs] Stack overflow in gptzfsboot
o kern/161205  fs         [nfs] [pfsync] [regression] [build] Bug report freebsd
o kern/161169  fs         [zfs] [panic] ZFS causes kernel panic in dbuf_dirty
o kern/161112  fs         [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3
o kern/160893  fs         [zfs] [panic] 9.0-BETA2 kernel panic
f kern/160860  fs         [ufs] Random UFS root filesystem corruption with SU+J 
o kern/160801  fs         [zfs] zfsboot on 8.2-RELEASE fails to boot from root-o
o kern/160790  fs         [fusefs] [panic] VPUTX: negative ref count with FUSE
o kern/160777  fs         [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo
o kern/160706  fs         [zfs] zfs bootloader fails when a non-root vdev exists
o kern/160591  fs         [zfs] Fail to boot on zfs root with degraded raidz2 [r
o kern/160410  fs         [smbfs] [hang] smbfs hangs when transferring large fil
o kern/160283  fs         [zfs] [patch] 'zfs list' does abort in make_dataset_ha
o kern/159930  fs         [ufs] [panic] kernel core
o kern/159402  fs         [zfs][loader] symlinks cause I/O errors
o kern/159357  fs         [zfs] ZFS MAXNAMELEN macro has confusing name (off-by-
o kern/159356  fs         [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s
o kern/159351  fs         [nfs] [patch] - divide by zero in mountnfs()
o kern/159251  fs         [zfs] [request]: add FLETCHER4 as DEDUP hash option
o kern/159077  fs         [zfs] Can't cd .. with latest zfs version
o kern/159048  fs         [smbfs] smb mount corrupts large files
o kern/159045  fs         [zfs] [hang] ZFS scrub freezes system
o kern/158839  fs         [zfs] ZFS Bootloader Fails if there is a Dead Disk
o kern/158802  fs         amd(8) ICMP storm and unkillable process.
o kern/158231  fs         [nullfs] panic on unmounting nullfs mounted over ufs o
f kern/157929  fs         [nfs] NFS slow read
o kern/157399  fs         [zfs] trouble with: mdconfig force delete && zfs strip
o kern/157179  fs         [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov
o kern/156797  fs         [zfs] [panic] Double panic with FreeBSD 9-CURRENT and 
o kern/156781  fs         [zfs] zfs is losing the snapshot directory,
p kern/156545  fs         [ufs] mv could break UFS on SMP systems
o kern/156193  fs         [ufs] [hang] UFS snapshot hangs && deadlocks processes
o kern/156039  fs         [nullfs] [unionfs] nullfs + unionfs do not compose, re
o kern/155615  fs         [zfs] zfs v28 broken on sparc64 -current
o kern/155587  fs         [zfs] [panic] kernel panic with zfs
p kern/155411  fs         [regression] [8.2-release] [tmpfs]: mount: tmpfs : No 
o kern/155199  fs         [ext2fs] ext3fs mounted as ext2fs gives I/O errors
o bin/155104   fs         [zfs][patch] use /dev prefix by default when importing
o kern/154930  fs         [zfs] cannot delete/unlink file from full volume -> EN
o kern/154828  fs         [msdosfs] Unable to create directories on external USB
o kern/154491  fs         [smbfs] smb_co_lock: recursive lock for object 1
p kern/154228  fs         [md] md getting stuck in wdrain state
o kern/153996  fs         [zfs] zfs root mount error while kernel is not located
o kern/153753  fs         [zfs] ZFS v15 - grammatical error when attempting to u
o kern/153716  fs         [zfs] zpool scrub time remaining is incorrect
o kern/153695  fs         [patch] [zfs] Booting from zpool created on 4k-sector 
o kern/153680  fs         [xfs] 8.1 failing to mount XFS partitions
o kern/153418  fs         [zfs] [panic] Kernel Panic occurred writing to zfs vol
o kern/153351  fs         [zfs] locking directories/files in ZFS
o bin/153258   fs         [patch][zfs] creating ZVOLs requires `refreservation' 
s kern/153173  fs         [zfs] booting from a gzip-compressed dataset doesn't w
o bin/153142   fs         [zfs] ls -l outputs `ls: ./.zfs: Operation not support
o kern/153126  fs         [zfs] vdev failure, zpool=peegel type=vdev.too_small
o kern/152022  fs         [nfs] nfs service hangs with linux client [regression]
o kern/151942  fs         [zfs] panic during ls(1) zfs snapshot directory
o kern/151905  fs         [zfs] page fault under load in /sbin/zfs
o bin/151713   fs         [patch] Bug in growfs(8) with respect to 32-bit overfl
o kern/151648  fs         [zfs] disk wait bug
o kern/151629  fs         [fs] [patch] Skip empty directory entries during name 
o kern/151330  fs         [zfs] will unshare all zfs filesystem after execute a 
o kern/151326  fs         [nfs] nfs exports fail if netgroups contain duplicate 
o kern/151251  fs         [ufs] Can not create files on filesystem with heavy us
o kern/151226  fs         [zfs] can't delete zfs snapshot
o kern/150503  fs         [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501  fs         [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390  fs         [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336  fs         [nfs] mountd/nfsd became confused; refused to reload n
o kern/149208  fs         mksnap_ffs(8) hang/deadlock
o kern/149173  fs         [patch] [zfs] make OpenSolaris <sys/nvpair.h> installa
o kern/149015  fs         [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014  fs         [zfs] [patch] declarations in ZFS libraries/utilities 
o kern/149013  fs         [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504  fs         [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490  fs         [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368  fs         [zfs] ZFS hanging forever on 8.1-PRERELEASE
o kern/148138  fs         [zfs] zfs raidz pool commands freeze
o kern/147903  fs         [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881  fs         [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147420  fs         [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt 
o kern/146941  fs         [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786  fs         [zfs] zpool import hangs with checksum errors
o kern/146708  fs         [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528  fs         [zfs] Severe memory leak in ZFS on i386
o kern/146502  fs         [nfs] FreeBSD 8 NFS Client Connection to Server
o kern/145750  fs         [unionfs] [hang] unionfs locks the machine
s kern/145712  fs         [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411  fs         [xfs] [panic] Kernel panics shortly after mounting an 
f bin/145309   fs         bsdlabel: Editing disk label invalidates the whole dev
o kern/145272  fs         [zfs] [panic] Panic during boot when accessing zfs on 
o kern/145246  fs         [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238  fs         [zfs] [panic] kernel panic on zpool clear tank
o kern/145229  fs         [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189  fs         [nfs] nfsd performs abysmally under load
o kern/144929  fs         [ufs] [lor] vfs_bio.c + ufs_dirhash.c
p kern/144447  fs         [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416  fs         [panic] Kernel panic on online filesystem optimization
s kern/144415  fs         [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234  fs         [zfs] Cannot boot machine with recent gptzfsboot code 
o kern/143825  fs         [nfs] [panic] Kernel panic on NFS client
o bin/143572   fs         [zfs] zpool(1): [patch] The verbose output from iostat
o kern/143212  fs         [nfs] NFSv4 client strange work ...
o kern/143184  fs         [zfs] [lor] zfs/bufwait LOR
o kern/142878  fs         [zfs] [vfs] lock order reversal
o kern/142597  fs         [ext2fs] ext2fs does not work on filesystems with real
o kern/142489  fs         [zfs] [lor] allproc/zfs LOR
o kern/142466  fs         Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142306  fs         [zfs] [panic] ZFS drive (from OSX Leopard) causes two 
o kern/142068  fs         [ufs] BSD labels are got deleted spontaneously
o kern/141950  fs         [unionfs] [lor] ufs/unionfs/ufs Lock order reversal
o kern/141897  fs         [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463  fs         [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141091  fs         [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086  fs         [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010  fs         [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888  fs         [zfs] boot fail from zfs root while the pool resilveri
o kern/140661  fs         [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640  fs         [zfs] snapshot crash
o kern/140068  fs         [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725  fs         [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715  fs         [zfs] vfs.numvnodes leak on busy zfs
p bin/139651   fs         [nfs] mount(8): read-only remount of NFS volume does n
o kern/139407  fs         [smbfs] [panic] smb mount causes system crash if remot
o kern/138662  fs         [panic] ffs_blkfree: freeing free block
o kern/138421  fs         [ufs] [patch] remove UFS label limitations
o kern/138202  fs         mount_msdosfs(1) see only 2Gb
o kern/137588  fs         [unionfs] [lor] LOR nfs/ufs/nfs
o kern/136968  fs         [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945  fs         [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944  fs         [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873  fs         [ntfs] Missing directories/files on NTFS volume
p kern/136470  fs         [nfs] Cannot mount / in read-only, over NFS
o kern/135546  fs         [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469  fs         [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050  fs         [zfs] ZFS clears/hides disk errors on reboot
o kern/134491  fs         [zfs] Hot spares are rather cold...
o kern/133676  fs         [smbfs] [panic] umount -f'ing a vnode-based memory dis
p kern/133174  fs         [msdosfs] [patch] msdosfs must support multibyte inter
o kern/132960  fs         [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397  fs         reboot causes filesystem corruption (failure to sync b
o kern/132331  fs         [ufs] [lor] LOR ufs and syncer
o kern/132237  fs         [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145  fs         [panic] File System Hard Crashes
o kern/131441  fs         [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360  fs         [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs         [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs         makefs: error "Bad file descriptor"  on the mount poin
o kern/130920  fs         [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130210  fs         [nullfs] Error by check nullfs
o kern/129760  fs         [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488  fs         [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: 
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs         [panic] non-userfriendly panic when trying to mount(8)
o kern/127787  fs         [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs
o bin/127270   fs         fsck_msdosfs(8) may crash if BytesPerSec is zero
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
o kern/126973  fs         [unionfs] [hang] System hang with unionfs and init chr
o kern/126553  fs         [unionfs] unionfs move directory problem 2 (files appe
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125895  fs         [ffs] [panic] kernel: panic: ffs_blkfree: freeing free
s kern/125738  fs         [zfs] [request] SHA256 acceleration in ZFS
o kern/123939  fs         [msdosfs] corrupts new files
o bin/123574   fs         [unionfs] df(1) -t option destroys info for unionfs (a
o kern/122380  fs         [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121898   fs         [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o kern/121385  fs         [unionfs] unionfs cross mount -> kernel panic
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
o kern/120483  fs         [ntfs] [patch] NTFS filesystem locking changes
o kern/120482  fs         [ntfs] [patch] Sync style changes between NetBSD and F
o kern/118912  fs         [2tb] disk sizing/geometry problem with large array
o kern/118713  fs         [minidump] [patch] Display media size required for a k
o kern/118318  fs         [nfs] NFS server hangs under special circumstances
o bin/118249   fs         [ufs] mv(1): moving a directory changes its mtime
o kern/118126  fs         [nfs] [patch] Poor NFS server write performance
o kern/118107  fs         [ntfs] [panic] Kernel panic when accessing a file at N
o kern/117954  fs         [ufs] dirhash on very large directories blocks the mac
o bin/117315   fs         [smbfs] mount_smbfs(8) and related options can't mount
o kern/117158  fs         [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980   fs         [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o conf/116931  fs         lack of fsck_cd9660 prevents mounting iso images with 
o kern/116583  fs         [ffs] [hang] System freezes for short time when using 
o bin/115361   fs         [zfs] mount(8) gets into a state where it won't set/un
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala 
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o kern/113852  fs         [smbfs] smbfs does not properly implement DFS referral
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show 
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843  fs         [msdosfs] Long Names of files are incorrectly created 
o kern/111782  fs         [ufs] dump(8) fails horribly for large filesystems
s bin/111146   fs         [2tb] fsck(8) fails on 6T filesystem
o bin/107829   fs         [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106107  fs         [ufs] left-over fsck_snapshot after unfinished backgro
o kern/104406  fs         [ufs] Processes get stuck in "ufs" state under persist
o kern/104133  fs         [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035  fs         [ntfs] Directories in NTFS mounted disc images appear 
o kern/101324  fs         [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290   fs         [ntfs] mount_ntfs ignorant of cluster sizes
s bin/97498    fs         [request] newfs(8) has no option to clear the first 12
o kern/97377   fs         [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222   fs         [cd9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849   fs         [ufs] rename on UFS filesystem is not atomic
o bin/94810    fs         fsck(8) incorrectly reports 'file system marked clean'
o kern/94769   fs         [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733   fs         [smbfs] smbfs may cause double unlock
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272   fs         [ffs] [hang] Filling a filesystem while creating a sna
o kern/91134   fs         [smbfs] [patch] Preserve access and modification time 
a kern/90815   fs         [smbfs] [patch] SMBFS with character conversions somet
o kern/88657   fs         [smbfs] windows client hang when browsing a samba shar
o kern/88555   fs         [panic] ffs_blkfree: freeing free frag on AMD 64
o bin/87966    fs         [patch] newfs(8): introduce -A flag for newfs to enabl
o kern/87859   fs         [smbfs] System reboot while umount smbfs.
o kern/86587   fs         [msdosfs] rm -r /PATH fails with lots of small files
o bin/85494    fs         fsck_ffs: unchecked use of cg_inosused macro etc.
o kern/80088   fs         [smbfs] Incorrect file time setting on NTFS mounted vi
o bin/74779    fs         Background-fsck checks one filesystem twice and omits 
o kern/73484   fs         [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019    fs         [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774   fs         [ntfs] NTFS cannot "see" files on a WinXP filesystem
o bin/70600    fs         fsck(8) throws files away when it can't grow lost+foun
o kern/68978   fs         [panic] [ufs] crashes with failing hard disk, loose po
o kern/67326   fs         [msdosfs] crash after attempt to mount write protected
o kern/65920   fs         [nwfs] Mounted Netware filesystem behaves strange
o kern/65901   fs         [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503   fs         [smbfs] mount_smbfs does not work as non-root
o kern/55617   fs         [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685   fs         [hang] Unbounded inode allocation causes kernel to loc
o kern/36566   fs         [smbfs] System reboot with dead smb mount and umount
o bin/27687    fs         fsck(8) wrapper is not properly passing options to fsc
o kern/18874   fs         [2TB] 32bit NFS servers export wrong negative values t
o kern/9619    fs         [nfs] Restarting mountd kills existing mounts

337 problems total.


From owner-freebsd-fs@FreeBSD.ORG  Mon Dec 23 16:55:36 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id B317B155;
 Mon, 23 Dec 2013 16:55:36 +0000 (UTC)
Received: from krichy.tvnetwork.hu (krichy.tvnetwork.hu [109.61.101.194])
 by mx1.freebsd.org (Postfix) with ESMTP id 172871B6C;
 Mon, 23 Dec 2013 16:55:35 +0000 (UTC)
Received: by krichy.tvnetwork.hu (Postfix, from userid 1000)
 id 7EFF4A6D3; Mon, 23 Dec 2013 17:55:08 +0100 (CET)
Received: from localhost (localhost [127.0.0.1])
 by krichy.tvnetwork.hu (Postfix) with ESMTP id 7C792A6D2;
 Mon, 23 Dec 2013 17:55:08 +0100 (CET)
Date: Mon, 23 Dec 2013 17:55:08 +0100 (CET)
From: krichy@tvnetwork.hu
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
In-Reply-To: <20131220134405.GE1658@garage.freebsd.pl>
Message-ID: <alpine.DEB.2.10.1312231750030.31822@krichy.tvnetwork.hu>
References: <alpine.DEB.2.10.1312161647410.11599@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312171326520.7714@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312191629370.4344@krichy.tvnetwork.hu>
 <20131220134405.GE1658@garage.freebsd.pl>
User-Agent: Alpine 2.10 (DEB 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-fs@freebsd.org, avg@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 23 Dec 2013 16:55:36 -0000

Dear Pawel,

Thanks for your response. I hope someone will review my work.

Secondly, I think I;ve found another deadlock scenario:

When 'zfs send -R' finishes, in zfsdev_close() it has spa_namespace_lock 
held. Then it does an unmount of all snapshots, which in turn goes to 
zfsctl_snapshot_inactive(), in which sdp->sd_lock is being locked.

Concurrently, a zfsctl_snapdir_lookup(), holding the same sdp->sd_lock 
does a snapshot mount, which somewhere tries to acquire 
spa_namespace_lock. So it deadlocks. Currently I dont know how to 
workaround this.

Is spa_namespace_lock need to be held in zfsctl_snapshot_inactive()?

What if we release it function enter, and re-acquire upon exit?

Thanks in advance,


Kojedzinszky Richard
Euronet Magyarorszag Informatikai Zrt.

On Fri, 20 Dec 2013, Pawel Jakub Dawidek wrote:

> Date: Fri, 20 Dec 2013 14:44:05 +0100
> From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
> To: krichy@tvnetwork.hu
> Cc: freebsd-fs@freebsd.org
> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
> 
> On Thu, Dec 19, 2013 at 04:46:11PM +0100, krichy@tvnetwork.hu wrote:
>> Dear devs,
>>
>> I am attaching a more clear patch/fix for my snapshot handling issues
>> (0002), but I would be happy if some ZFS expert would comment it. I am
>> trying to solve it at least for two weeks now, and an ACK or a NACK would
>> be nice from someone. Also a commit is reverted since it also caused
>> deadlocks. I've read its comments, which also eliminates deadlocks, but I
>> did not find any reference how to produce that deadlock. In my view
>> reverting that makes my issues disappear, but I dont know what new cases
>> will it rise.
>
> Richard,
>
> I won't be able to analyse it myself anytime soon, because of other
> obligations, but I forwarded you e-mail to the zfs-devel@ mailing list
> (it is closed, but gathers FreeBSD ZFS devs). Hopefully someone from
> there will be able to help you.
>
>> I've rewritten traverse() to make more like upstream, added two extra
>> VN_HOLD()s to snapdir_lookup() when traverse returned the same vnode what
>> was passed to it (I dont know even that upon creating a snapshot vnode why
>> is that extra two holds needed for the vnode.) And unfortunately, due to
>> FreeBSD calls vop_inactive callbacks with vnodes locked, that could also
>> cause deadlocks, so zfsctl_snapshot_inactive() and
>> zfsctl_snapshot_vptocnp() has been rewritten to work that around.
>>
>> After this, one may also get a deadlock, when a simple access would call
>> into zfsctl_snapshot_lookup(). The documentation says, that those vnodes
>> should always be covered, but after some stress test, sometimes we hit
>> that call, and that can cause again deadlocks. In our environment I've
>> just uncommented that callback, which returns ENODIR on some calls, but at
>> least not a deadlock.
>>
>> The attached script can be used to reproduce my cases (would one confirm
>> that?), and after the patches applied, they disappear (confirm?).
>>
>> Thanks,
>>
>>
>> Kojedzinszky Richard
>> Euronet Magyarorszag Informatikai Zrt.
>>
>> On Tue, 17 Dec 2013, krichy@tvnetwork.hu wrote:
>>
>>> Date: Tue, 17 Dec 2013 14:50:16 +0100 (CET)
>>> From: krichy@tvnetwork.hu
>>> To: pjd@freebsd.org
>>> Cc: freebsd-fs@freebsd.org
>>> Subject: Re: kern/184677 (fwd)
>>>
>>> Dear devs,
>>>
>>> I will sum up my experience regarding the issue:
>>>
>>> The sympton is that a concurrent 'zfs send -R' and some activity on the
>>> snapshot dir (or in the snapshot) may cause a deadlock.
>>>
>>> After investigating the problem, I found that zfs send umounts the snapshots,
>>> and that causes the deadlock, so later I tested only with concurrent umount
>>> and the "activity". More later I found that listing the snapshots in
>>> .zfs/snapshot/ and unounting them can cause the found deadlock, so I used
>>> them for the tests. But for my surprise, instead of a deadlock, a recursive
>>> lock panic has arised.
>>>
>>> The vnode for the ".zfs/snapshot/" directory contains zfs's zfsctl_snapdir_t
>>> structure (sdp). This contains a tree of mounted snapshots, and each entry
>>> (sep) contains the vnode of entry on which the snapshot is mounted on top
>>> (se_root). The strange is that the se_root member does not hold a reference
>>> for the vnode, just a simple pointer to it.
>>>
>>> Upon entry lookup (zfsctl_snapdir_lookup()) the "snapshot" vnode is locked,
>>> the zfsctl_snapdir_t's tree is locked, and searched for the mount if it
>>> exists already. If it founds no entry, does the mount. In the case of an
>>> entry was found, the se_root member contains the vnode which the snapshot is
>>> mounted on. Thus, a reference is taken for it, and the traverse() call will
>>> resolve to the real root vnode of the mounted snapshot, returning it as
>>> locked. (Examining the traverse() code I've found that it did not follow
>>> FreeBSD's lock order recommendation described in sys/kern/vfs_subr.c.)
>>>
>>> On the other way, when an umount is issued, the se_root vnode looses its last
>>> reference (as only the mountpoint holds one for it), it goes through the
>>> vinactive() path, to zfsctl_snapshot_inactive(). In FreeBSD this is called
>>> with a locked vnode, so this is a deadlock race condition. While
>>> zfsctl_snapdir_lookup() holds the mutex for the sdp tree, and traverse()
>>> tries to acquire the se_root, zfsctl_snapshot_inactive() holds the lock on
>>> se_root while tries to access the sdp lock.
>>>
>>> The zfsctl_snapshot_inactive() has an if statement checking the v_usecount,
>>> which is incremented in zfsctl_snapdir_lookup(), but in that context it is
>>> not covered by VI_LOCK. And it seems to me that FreeBSD's vinactive() path
>>> assumes that the vnode remains inactive (as opposed to illumos, at least how
>>> i read the code). So zfsctl_snapshot_inactive() must free resources while in
>>> a locked state. I was a bit confused, and probably that is why the previously
>>> posted patch is as is.
>>>
>>> Maybe if I had some clues on the directions of this problem, I could have
>>> worked more for a nicer, shorter solution.
>>>
>>> Please someone comment on my post.
>>>
>>> Regards,
>>>
>>>
>>>
>>> Kojedzinszky Richard
>>> Euronet Magyarorszag Informatikai Zrt.
>>>
>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote:
>>>
>>>> Date: Mon, 16 Dec 2013 16:52:16 +0100 (CET)
>>>> From: krichy@tvnetwork.hu
>>>> To: pjd@freebsd.org
>>>> Cc: freebsd-fs@freebsd.org
>>>> Subject: Re: kern/184677 (fwd)
>>>>
>>>> Dear PJD,
>>>>
>>>> I am a happy FreeBSD user, I am sure you've read my previous posts
>>>> regarding some issues in ZFS. Please give some advice for me, where to look
>>>> for solutions, or how could I help to resolve those issues.
>>>>
>>>> Regards,
>>>> Kojedzinszky Richard
>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>
>>>> ---------- Forwarded message ----------
>>>> Date: Mon, 16 Dec 2013 15:23:06 +0100 (CET)
>>>> From: krichy@tvnetwork.hu
>>>> To: freebsd-fs@freebsd.org
>>>> Subject: Re: kern/184677
>>>>
>>>>
>>>> Seems that pjd did a change which eliminated the zfsdev_state_lock on Fri
>>>> Aug 12 07:04:16 2011 +0000, which might introduced a new deadlock
>>>> situation. Any comments on this?
>>>>
>>>>
>>>> Kojedzinszky Richard
>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>
>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote:
>>>>
>>>>> Date: Mon, 16 Dec 2013 11:08:11 +0100 (CET)
>>>>> From: krichy@tvnetwork.hu
>>>>> To: freebsd-fs@freebsd.org
>>>>> Subject: kern/184677
>>>>>
>>>>> Dear devs,
>>>>>
>>>>> I've attached a patch, which makes the recursive lockmgr disappear, and
>>>>> makes the reported bug to disappear. I dont know if I followed any
>>>>> guidelines well, or not, but at least it works for me. Please some
>>>>> ZFS/FreeBSD fs expert review it, and fix it where it needed.
>>>>>
>>>>> But unfortunately, my original problem is still not solved, maybe the same
>>>>> as Ryan's:
>>>>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-December/018707.html
>>>>>
>>>>> Tracing the problem down is that zfsctl_snapdir_lookup() tries to acquire
>>>>> spa_namespace_lock while when finishing a zfs send -R does a
>>>>> zfsdev_close(), and that also holds the same mutex. And this causes a
>>>>> deadlock scenario. I looked at illumos's code, and for some reason they
>>>>> use another mutex on zfsdev_close(), which therefore may not deadlock with
>>>>> zfsctl_snapdir_lookup(). But I am still investigating the problem.
>>>>>
>>>>> I would like to help making ZFS more stable on freebsd also with its whole
>>>>> functionality. I would be very thankful if some expert would give some
>>>>> advice, how to solve these bugs. PJD, Steven, Xin?
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>>
>>>>> Kojedzinszky Richard
>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>> _______________________________________________
>>>> freebsd-fs@freebsd.org mailing list
>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>>>
>>>
>
>
>> From 39298da838d006ad225e41529d7b7f240fccfe73 Mon Sep 17 00:00:00 2001
>> From: Richard Kojedzinszky <krichy@cflinux.hu>
>> Date: Mon, 16 Dec 2013 15:39:11 +0100
>> Subject: [PATCH 1/2] Revert "Eliminate the zfsdev_state_lock entirely and
>>  replace it with the"
>>
>> This reverts commit 1d8972b3f353f986eb5b85bc108b1c0d946d3218.
>>
>> Conflicts:
>> 	sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>> ---
>>  .../opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h  |   1 +
>>  .../opensolaris/uts/common/fs/zfs/vdev_geom.c      |  14 ++-
>>  .../opensolaris/uts/common/fs/zfs/zfs_ioctl.c      |  16 +--
>>  .../contrib/opensolaris/uts/common/fs/zfs/zvol.c   | 119 +++++++++------------
>>  4 files changed, 70 insertions(+), 80 deletions(-)
>>
>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h
>> index af2def2..8272c4d 100644
>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h
>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h
>> @@ -383,6 +383,7 @@ extern void *zfsdev_get_soft_state(minor_t minor,
>>  extern minor_t zfsdev_minor_alloc(void);
>>
>>  extern void *zfsdev_state;
>> +extern kmutex_t zfsdev_state_lock;
>>
>>  #endif	/* _KERNEL */
>>
>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>> index 15685a5..5c3e9f3 100644
>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>> @@ -581,7 +581,7 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t *max_psize,
>>  	struct g_provider *pp;
>>  	struct g_consumer *cp;
>>  	size_t bufsize;
>> -	int error;
>> +	int error, lock;
>>
>>  	/*
>>  	 * We must have a pathname, and it must be absolute.
>> @@ -593,6 +593,12 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t *max_psize,
>>
>>  	vd->vdev_tsd = NULL;
>>
>> +	if (mutex_owned(&spa_namespace_lock)) {
>> +		mutex_exit(&spa_namespace_lock);
>> +		lock = 1;
>> +	} else {
>> +		lock = 0;
>> +	}
>>  	DROP_GIANT();
>>  	g_topology_lock();
>>  	error = 0;
>> @@ -624,7 +630,11 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t *max_psize,
>>  	    !ISP2(cp->provider->sectorsize)) {
>>  		ZFS_LOG(1, "Provider %s has unsupported sectorsize.",
>>  		    vd->vdev_path);
>> +
>> +		g_topology_lock();
>>  		vdev_geom_detach(cp, 0);
>> +		g_topology_unlock();
>> +
>>  		error = EINVAL;
>>  		cp = NULL;
>>  	} else if (cp->acw == 0 && (spa_mode(vd->vdev_spa) & FWRITE) != 0) {
>> @@ -647,6 +657,8 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t *max_psize,
>>  	}
>>  	g_topology_unlock();
>>  	PICKUP_GIANT();
>> +	if (lock)
>> +		mutex_enter(&spa_namespace_lock);
>>  	if (cp == NULL) {
>>  		vd->vdev_stat.vs_aux = VDEV_AUX_OPEN_FAILED;
>>  		return (error);
>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>> index e9fba26..91becde 100644
>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>> @@ -5635,7 +5635,7 @@ zfsdev_minor_alloc(void)
>>  	static minor_t last_minor;
>>  	minor_t m;
>>
>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>
>>  	for (m = last_minor + 1; m != last_minor; m++) {
>>  		if (m > ZFSDEV_MAX_MINOR)
>> @@ -5655,7 +5655,7 @@ zfs_ctldev_init(struct cdev *devp)
>>  	minor_t minor;
>>  	zfs_soft_state_t *zs;
>>
>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>
>>  	minor = zfsdev_minor_alloc();
>>  	if (minor == 0)
>> @@ -5676,7 +5676,7 @@ zfs_ctldev_init(struct cdev *devp)
>>  static void
>>  zfs_ctldev_destroy(zfs_onexit_t *zo, minor_t minor)
>>  {
>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>
>>  	zfs_onexit_destroy(zo);
>>  	ddi_soft_state_free(zfsdev_state, minor);
>> @@ -5706,9 +5706,9 @@ zfsdev_open(struct cdev *devp, int flag, int mode, struct thread *td)
>>
>>  	/* This is the control device. Allocate a new minor if requested. */
>>  	if (flag & FEXCL) {
>> -		mutex_enter(&spa_namespace_lock);
>> +		mutex_enter(&zfsdev_state_lock);
>>  		error = zfs_ctldev_init(devp);
>> -		mutex_exit(&spa_namespace_lock);
>> +		mutex_exit(&zfsdev_state_lock);
>>  	}
>>
>>  	return (error);
>> @@ -5723,14 +5723,14 @@ zfsdev_close(void *data)
>>  	if (minor == 0)
>>  		return;
>>
>> -	mutex_enter(&spa_namespace_lock);
>> +	mutex_enter(&zfsdev_state_lock);
>>  	zo = zfsdev_get_soft_state(minor, ZSST_CTLDEV);
>>  	if (zo == NULL) {
>> -		mutex_exit(&spa_namespace_lock);
>> +		mutex_exit(&zfsdev_state_lock);
>>  		return;
>>  	}
>>  	zfs_ctldev_destroy(zo, minor);
>> -	mutex_exit(&spa_namespace_lock);
>> +	mutex_exit(&zfsdev_state_lock);
>>  }
>>
>>  static int
>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>> index 72d4502..aec5219 100644
>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>> @@ -104,11 +104,12 @@ static char *zvol_tag = "zvol_tag";
>>  #define	ZVOL_DUMPSIZE		"dumpsize"
>>
>>  /*
>> - * The spa_namespace_lock protects the zfsdev_state structure from being
>> - * modified while it's being used, e.g. an open that comes in before a
>> - * create finishes.  It also protects temporary opens of the dataset so that,
>> + * This lock protects the zfsdev_state structure from being modified
>> + * while it's being used, e.g. an open that comes in before a create
>> + * finishes.  It also protects temporary opens of the dataset so that,
>>   * e.g., an open doesn't get a spurious EBUSY.
>>   */
>> +kmutex_t zfsdev_state_lock;
>>  static uint32_t zvol_minors;
>>
>>  typedef struct zvol_extent {
>> @@ -249,7 +250,7 @@ zvol_minor_lookup(const char *name)
>>  	struct g_geom *gp;
>>  	zvol_state_t *zv = NULL;
>>
>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>
>>  	g_topology_lock();
>>  	LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) {
>> @@ -465,11 +466,11 @@ zvol_name2minor(const char *name, minor_t *minor)
>>  {
>>  	zvol_state_t *zv;
>>
>> -	mutex_enter(&spa_namespace_lock);
>> +	mutex_enter(&zfsdev_state_lock);
>>  	zv = zvol_minor_lookup(name);
>>  	if (minor && zv)
>>  		*minor = zv->zv_minor;
>> -	mutex_exit(&spa_namespace_lock);
>> +	mutex_exit(&zfsdev_state_lock);
>>  	return (zv ? 0 : -1);
>>  }
>>  #endif	/* sun */
>> @@ -489,10 +490,10 @@ zvol_create_minor(const char *name)
>>
>>  	ZFS_LOG(1, "Creating ZVOL %s...", name);
>>
>> -	mutex_enter(&spa_namespace_lock);
>> +	mutex_enter(&zfsdev_state_lock);
>>
>>  	if (zvol_minor_lookup(name) != NULL) {
>> -		mutex_exit(&spa_namespace_lock);
>> +		mutex_exit(&zfsdev_state_lock);
>>  		return (SET_ERROR(EEXIST));
>>  	}
>>
>> @@ -500,20 +501,20 @@ zvol_create_minor(const char *name)
>>  	error = dmu_objset_own(name, DMU_OST_ZVOL, B_TRUE, FTAG, &os);
>>
>>  	if (error) {
>> -		mutex_exit(&spa_namespace_lock);
>> +		mutex_exit(&zfsdev_state_lock);
>>  		return (error);
>>  	}
>>
>>  #ifdef sun
>>  	if ((minor = zfsdev_minor_alloc()) == 0) {
>>  		dmu_objset_disown(os, FTAG);
>> -		mutex_exit(&spa_namespace_lock);
>> +		mutex_exit(&zfsdev_state_lock);
>>  		return (SET_ERROR(ENXIO));
>>  	}
>>
>>  	if (ddi_soft_state_zalloc(zfsdev_state, minor) != DDI_SUCCESS) {
>>  		dmu_objset_disown(os, FTAG);
>> -		mutex_exit(&spa_namespace_lock);
>> +		mutex_exit(&zfsdev_state_lock);
>>  		return (SET_ERROR(EAGAIN));
>>  	}
>>  	(void) ddi_prop_update_string(minor, zfs_dip, ZVOL_PROP_NAME,
>> @@ -525,7 +526,7 @@ zvol_create_minor(const char *name)
>>  	    minor, DDI_PSEUDO, 0) == DDI_FAILURE) {
>>  		ddi_soft_state_free(zfsdev_state, minor);
>>  		dmu_objset_disown(os, FTAG);
>> -		mutex_exit(&spa_namespace_lock);
>> +		mutex_exit(&zfsdev_state_lock);
>>  		return (SET_ERROR(EAGAIN));
>>  	}
>>
>> @@ -536,7 +537,7 @@ zvol_create_minor(const char *name)
>>  		ddi_remove_minor_node(zfs_dip, chrbuf);
>>  		ddi_soft_state_free(zfsdev_state, minor);
>>  		dmu_objset_disown(os, FTAG);
>> -		mutex_exit(&spa_namespace_lock);
>> +		mutex_exit(&zfsdev_state_lock);
>>  		return (SET_ERROR(EAGAIN));
>>  	}
>>
>> @@ -587,7 +588,7 @@ zvol_create_minor(const char *name)
>>
>>  	zvol_minors++;
>>
>> -	mutex_exit(&spa_namespace_lock);
>> +	mutex_exit(&zfsdev_state_lock);
>>
>>  	zvol_geom_run(zv);
>>
>> @@ -609,7 +610,7 @@ zvol_remove_zv(zvol_state_t *zv)
>>  	minor_t minor = zv->zv_minor;
>>  #endif
>>
>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>  	if (zv->zv_total_opens != 0)
>>  		return (SET_ERROR(EBUSY));
>>
>> @@ -635,15 +636,15 @@ zvol_remove_minor(const char *name)
>>  	zvol_state_t *zv;
>>  	int rc;
>>
>> -	mutex_enter(&spa_namespace_lock);
>> +	mutex_enter(&zfsdev_state_lock);
>>  	if ((zv = zvol_minor_lookup(name)) == NULL) {
>> -		mutex_exit(&spa_namespace_lock);
>> +		mutex_exit(&zfsdev_state_lock);
>>  		return (SET_ERROR(ENXIO));
>>  	}
>>  	g_topology_lock();
>>  	rc = zvol_remove_zv(zv);
>>  	g_topology_unlock();
>> -	mutex_exit(&spa_namespace_lock);
>> +	mutex_exit(&zfsdev_state_lock);
>>  	return (rc);
>>  }
>>
>> @@ -755,7 +756,7 @@ zvol_update_volsize(objset_t *os, uint64_t volsize)
>>  	dmu_tx_t *tx;
>>  	int error;
>>
>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>
>>  	tx = dmu_tx_create(os);
>>  	dmu_tx_hold_zap(tx, ZVOL_ZAP_OBJ, TRUE, NULL);
>> @@ -786,7 +787,7 @@ zvol_remove_minors(const char *name)
>>  	namelen = strlen(name);
>>
>>  	DROP_GIANT();
>> -	mutex_enter(&spa_namespace_lock);
>> +	mutex_enter(&zfsdev_state_lock);
>>  	g_topology_lock();
>>
>>  	LIST_FOREACH_SAFE(gp, &zfs_zvol_class.geom, geom, gptmp) {
>> @@ -804,7 +805,7 @@ zvol_remove_minors(const char *name)
>>  	}
>>
>>  	g_topology_unlock();
>> -	mutex_exit(&spa_namespace_lock);
>> +	mutex_exit(&zfsdev_state_lock);
>>  	PICKUP_GIANT();
>>  }
>>
>> @@ -818,10 +819,10 @@ zvol_set_volsize(const char *name, major_t maj, uint64_t volsize)
>>  	uint64_t old_volsize = 0ULL;
>>  	uint64_t readonly;
>>
>> -	mutex_enter(&spa_namespace_lock);
>> +	mutex_enter(&zfsdev_state_lock);
>>  	zv = zvol_minor_lookup(name);
>>  	if ((error = dmu_objset_hold(name, FTAG, &os)) != 0) {
>> -		mutex_exit(&spa_namespace_lock);
>> +		mutex_exit(&zfsdev_state_lock);
>>  		return (error);
>>  	}
>>
>> @@ -888,7 +889,7 @@ zvol_set_volsize(const char *name, major_t maj, uint64_t volsize)
>>  out:
>>  	dmu_objset_rele(os, FTAG);
>>
>> -	mutex_exit(&spa_namespace_lock);
>> +	mutex_exit(&zfsdev_state_lock);
>>
>>  	return (error);
>>  }
>> @@ -899,36 +900,19 @@ zvol_open(struct g_provider *pp, int flag, int count)
>>  {
>>  	zvol_state_t *zv;
>>  	int err = 0;
>> -	boolean_t locked = B_FALSE;
>>
>> -	/*
>> -	 * Protect against recursively entering spa_namespace_lock
>> -	 * when spa_open() is used for a pool on a (local) ZVOL(s).
>> -	 * This is needed since we replaced upstream zfsdev_state_lock
>> -	 * with spa_namespace_lock in the ZVOL code.
>> -	 * We are using the same trick as spa_open().
>> -	 * Note that calls in zvol_first_open which need to resolve
>> -	 * pool name to a spa object will enter spa_open()
>> -	 * recursively, but that function already has all the
>> -	 * necessary protection.
>> -	 */
>> -	if (!MUTEX_HELD(&spa_namespace_lock)) {
>> -		mutex_enter(&spa_namespace_lock);
>> -		locked = B_TRUE;
>> -	}
>> +	mutex_enter(&zfsdev_state_lock);
>>
>>  	zv = pp->private;
>>  	if (zv == NULL) {
>> -		if (locked)
>> -			mutex_exit(&spa_namespace_lock);
>> +		mutex_exit(&zfsdev_state_lock);
>>  		return (SET_ERROR(ENXIO));
>>  	}
>>
>>  	if (zv->zv_total_opens == 0)
>>  		err = zvol_first_open(zv);
>>  	if (err) {
>> -		if (locked)
>> -			mutex_exit(&spa_namespace_lock);
>> +		mutex_exit(&zfsdev_state_lock);
>>  		return (err);
>>  	}
>>  	if ((flag & FWRITE) && (zv->zv_flags & ZVOL_RDONLY)) {
>> @@ -950,15 +934,13 @@ zvol_open(struct g_provider *pp, int flag, int count)
>>  #endif
>>
>>  	zv->zv_total_opens += count;
>> -	if (locked)
>> -		mutex_exit(&spa_namespace_lock);
>> +	mutex_exit(&zfsdev_state_lock);
>>
>>  	return (err);
>>  out:
>>  	if (zv->zv_total_opens == 0)
>>  		zvol_last_close(zv);
>> -	if (locked)
>> -		mutex_exit(&spa_namespace_lock);
>> +	mutex_exit(&zfsdev_state_lock);
>>  	return (err);
>>  }
>>
>> @@ -968,18 +950,12 @@ zvol_close(struct g_provider *pp, int flag, int count)
>>  {
>>  	zvol_state_t *zv;
>>  	int error = 0;
>> -	boolean_t locked = B_FALSE;
>>
>> -	/* See comment in zvol_open(). */
>> -	if (!MUTEX_HELD(&spa_namespace_lock)) {
>> -		mutex_enter(&spa_namespace_lock);
>> -		locked = B_TRUE;
>> -	}
>> +	mutex_enter(&zfsdev_state_lock);
>>
>>  	zv = pp->private;
>>  	if (zv == NULL) {
>> -		if (locked)
>> -			mutex_exit(&spa_namespace_lock);
>> +		mutex_exit(&zfsdev_state_lock);
>>  		return (SET_ERROR(ENXIO));
>>  	}
>>
>> @@ -1002,8 +978,7 @@ zvol_close(struct g_provider *pp, int flag, int count)
>>  	if (zv->zv_total_opens == 0)
>>  		zvol_last_close(zv);
>>
>> -	if (locked)
>> -		mutex_exit(&spa_namespace_lock);
>> +	mutex_exit(&zfsdev_state_lock);
>>  	return (error);
>>  }
>>
>> @@ -1658,12 +1633,12 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int flag, cred_t *cr, int *rvalp)
>>  	int error = 0;
>>  	rl_t *rl;
>>
>> -	mutex_enter(&spa_namespace_lock);
>> +	mutex_enter(&zfsdev_state_lock);
>>
>>  	zv = zfsdev_get_soft_state(getminor(dev), ZSST_ZVOL);
>>
>>  	if (zv == NULL) {
>> -		mutex_exit(&spa_namespace_lock);
>> +		mutex_exit(&zfsdev_state_lock);
>>  		return (SET_ERROR(ENXIO));
>>  	}
>>  	ASSERT(zv->zv_total_opens > 0);
>> @@ -1677,7 +1652,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int flag, cred_t *cr, int *rvalp)
>>  		dki.dki_ctype = DKC_UNKNOWN;
>>  		dki.dki_unit = getminor(dev);
>>  		dki.dki_maxtransfer = 1 << (SPA_MAXBLOCKSHIFT - zv->zv_min_bs);
>> -		mutex_exit(&spa_namespace_lock);
>> +		mutex_exit(&zfsdev_state_lock);
>>  		if (ddi_copyout(&dki, (void *)arg, sizeof (dki), flag))
>>  			error = SET_ERROR(EFAULT);
>>  		return (error);
>> @@ -1687,7 +1662,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int flag, cred_t *cr, int *rvalp)
>>  		dkm.dki_lbsize = 1U << zv->zv_min_bs;
>>  		dkm.dki_capacity = zv->zv_volsize >> zv->zv_min_bs;
>>  		dkm.dki_media_type = DK_UNKNOWN;
>> -		mutex_exit(&spa_namespace_lock);
>> +		mutex_exit(&zfsdev_state_lock);
>>  		if (ddi_copyout(&dkm, (void *)arg, sizeof (dkm), flag))
>>  			error = SET_ERROR(EFAULT);
>>  		return (error);
>> @@ -1697,14 +1672,14 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int flag, cred_t *cr, int *rvalp)
>>  			uint64_t vs = zv->zv_volsize;
>>  			uint8_t bs = zv->zv_min_bs;
>>
>> -			mutex_exit(&spa_namespace_lock);
>> +			mutex_exit(&zfsdev_state_lock);
>>  			error = zvol_getefi((void *)arg, flag, vs, bs);
>>  			return (error);
>>  		}
>>
>>  	case DKIOCFLUSHWRITECACHE:
>>  		dkc = (struct dk_callback *)arg;
>> -		mutex_exit(&spa_namespace_lock);
>> +		mutex_exit(&zfsdev_state_lock);
>>  		zil_commit(zv->zv_zilog, ZVOL_OBJ);
>>  		if ((flag & FKIOCTL) && dkc != NULL && dkc->dkc_callback) {
>>  			(*dkc->dkc_callback)(dkc->dkc_cookie, error);
>> @@ -1730,10 +1705,10 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int flag, cred_t *cr, int *rvalp)
>>  			}
>>  			if (wce) {
>>  				zv->zv_flags |= ZVOL_WCE;
>> -				mutex_exit(&spa_namespace_lock);
>> +				mutex_exit(&zfsdev_state_lock);
>>  			} else {
>>  				zv->zv_flags &= ~ZVOL_WCE;
>> -				mutex_exit(&spa_namespace_lock);
>> +				mutex_exit(&zfsdev_state_lock);
>>  				zil_commit(zv->zv_zilog, ZVOL_OBJ);
>>  			}
>>  			return (0);
>> @@ -1828,7 +1803,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int flag, cred_t *cr, int *rvalp)
>>  		break;
>>
>>  	}
>> -	mutex_exit(&spa_namespace_lock);
>> +	mutex_exit(&zfsdev_state_lock);
>>  	return (error);
>>  }
>>  #endif	/* sun */
>> @@ -1844,12 +1819,14 @@ zvol_init(void)
>>  {
>>  	VERIFY(ddi_soft_state_init(&zfsdev_state, sizeof (zfs_soft_state_t),
>>  	    1) == 0);
>> +	mutex_init(&zfsdev_state_lock, NULL, MUTEX_DEFAULT, NULL);
>>  	ZFS_LOG(1, "ZVOL Initialized.");
>>  }
>>
>>  void
>>  zvol_fini(void)
>>  {
>> +	mutex_destroy(&zfsdev_state_lock);
>>  	ddi_soft_state_fini(&zfsdev_state);
>>  	ZFS_LOG(1, "ZVOL Deinitialized.");
>>  }
>> @@ -1889,7 +1866,7 @@ zvol_dump_init(zvol_state_t *zv, boolean_t resize)
>>  	uint64_t version = spa_version(spa);
>>  	enum zio_checksum checksum;
>>
>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>  	ASSERT(vd->vdev_ops == &vdev_root_ops);
>>
>>  	error = dmu_free_long_range(zv->zv_objset, ZVOL_OBJ, 0,
>> @@ -2437,7 +2414,7 @@ zvol_rename_minor(struct g_geom *gp, const char *newname)
>>  	struct g_provider *pp;
>>  	zvol_state_t *zv;
>>
>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>  	g_topology_assert();
>>
>>  	pp = LIST_FIRST(&gp->provider);
>> @@ -2471,7 +2448,7 @@ zvol_rename_minors(const char *oldname, const char *newname)
>>  	newnamelen = strlen(newname);
>>
>>  	DROP_GIANT();
>> -	mutex_enter(&spa_namespace_lock);
>> +	mutex_enter(&zfsdev_state_lock);
>>  	g_topology_lock();
>>
>>  	LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) {
>> @@ -2494,6 +2471,6 @@ zvol_rename_minors(const char *oldname, const char *newname)
>>  	}
>>
>>  	g_topology_unlock();
>> -	mutex_exit(&spa_namespace_lock);
>> +	mutex_exit(&zfsdev_state_lock);
>>  	PICKUP_GIANT();
>>  }
>> --
>> 1.8.4.2
>>
>
>> From 57d5a383b585c32c77af54e8cdacaddf8ce7584f Mon Sep 17 00:00:00 2001
>> From: Richard Kojedzinszky <krichy@cflinux.hu>
>> Date: Wed, 18 Dec 2013 22:11:21 +0100
>> Subject: [PATCH 2/2] ZFS snapshot handling fix
>>
>> ---
>>  .../compat/opensolaris/kern/opensolaris_lookup.c   | 13 +++---
>>  .../opensolaris/uts/common/fs/zfs/zfs_ctldir.c     | 53 +++++++++++++++-------
>>  2 files changed, 42 insertions(+), 24 deletions(-)
>>
>> diff --git a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>> index 94383d6..4cac053 100644
>> --- a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>> +++ b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>> @@ -81,6 +81,8 @@ traverse(vnode_t **cvpp, int lktype)
>>  	 * progress on this vnode.
>>  	 */
>>
>> +	vn_lock(cvp, lktype);
>> +
>>  	for (;;) {
>>  		/*
>>  		 * Reached the end of the mount chain?
>> @@ -89,13 +91,7 @@ traverse(vnode_t **cvpp, int lktype)
>>  		if (vfsp == NULL)
>>  			break;
>>  		error = vfs_busy(vfsp, 0);
>> -		/*
>> -		 * tvp is NULL for *cvpp vnode, which we can't unlock.
>> -		 */
>> -		if (tvp != NULL)
>> -			vput(cvp);
>> -		else
>> -			vrele(cvp);
>> +		VOP_UNLOCK(cvp, 0);
>>  		if (error)
>>  			return (error);
>>
>> @@ -107,6 +103,9 @@ traverse(vnode_t **cvpp, int lktype)
>>  		vfs_unbusy(vfsp);
>>  		if (error != 0)
>>  			return (error);
>> +
>> +		VN_RELE(cvp);
>> +
>>  		cvp = tvp;
>>  	}
>>
>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>> index 28ab1fa..d3464b4 100644
>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>> @@ -112,6 +112,25 @@ snapentry_compare(const void *a, const void *b)
>>  		return (0);
>>  }
>>
>> +/* Return the zfsctl_snapdir_t object from current vnode, following
>> + * the lock orders in zfsctl_snapdir_lookup() to avoid deadlocks.
>> + * On return the passed in vp is unlocked */
>> +static zfsctl_snapdir_t *
>> +zfsctl_snapshot_get_snapdir(vnode_t *vp, vnode_t **dvpp)
>> +{
>> +	gfs_dir_t *dp = vp->v_data;
>> +	*dvpp = dp->gfsd_file.gfs_parent;
>> +	zfsctl_snapdir_t *sdp;
>> +
>> +	VN_HOLD(*dvpp);
>> +	VOP_UNLOCK(vp, 0);
>> +	vn_lock(*dvpp, LK_SHARED | LK_RETRY | LK_CANRECURSE);
>> +	sdp = (*dvpp)->v_data;
>> +	VOP_UNLOCK(*dvpp, 0);
>> +
>> +	return (sdp);
>> +}
>> +
>>  #ifdef sun
>>  vnodeops_t *zfsctl_ops_root;
>>  vnodeops_t *zfsctl_ops_snapdir;
>> @@ -1013,6 +1032,8 @@ zfsctl_snapdir_lookup(ap)
>>  			 * The snapshot was unmounted behind our backs,
>>  			 * try to remount it.
>>  			 */
>> +			VOP_UNLOCK(*vpp, 0);
>> +			VN_HOLD(*vpp);
>>  			VERIFY(zfsctl_snapshot_zname(dvp, nm, MAXNAMELEN, snapname) == 0);
>>  			goto domount;
>>  		} else {
>> @@ -1064,7 +1085,6 @@ zfsctl_snapdir_lookup(ap)
>>  	sep->se_name = kmem_alloc(strlen(nm) + 1, KM_SLEEP);
>>  	(void) strcpy(sep->se_name, nm);
>>  	*vpp = sep->se_root = zfsctl_snapshot_mknode(dvp, dmu_objset_id(snap));
>> -	VN_HOLD(*vpp);
>>  	avl_insert(&sdp->sd_snaps, sep, where);
>>
>>  	dmu_objset_rele(snap, FTAG);
>> @@ -1075,6 +1095,7 @@ domount:
>>  	(void) snprintf(mountpoint, mountpoint_len,
>>  	    "%s/" ZFS_CTLDIR_NAME "/snapshot/%s",
>>  	    dvp->v_vfsp->mnt_stat.f_mntonname, nm);
>> +	VN_HOLD(*vpp);
>>  	err = mount_snapshot(curthread, vpp, "zfs", mountpoint, snapname, 0);
>>  	kmem_free(mountpoint, mountpoint_len);
>>  	if (err == 0) {
>> @@ -1464,16 +1485,18 @@ zfsctl_snapshot_inactive(ap)
>>  	int locked;
>>  	vnode_t *dvp;
>>
>> -	if (vp->v_count > 0)
>> -		goto end;
>> -
>> -	VERIFY(gfs_dir_lookup(vp, "..", &dvp, cr, 0, NULL, NULL) == 0);
>> -	sdp = dvp->v_data;
>> -	VOP_UNLOCK(dvp, 0);
>> +	sdp = zfsctl_snapshot_get_snapdir(vp, &dvp);
>>
>>  	if (!(locked = MUTEX_HELD(&sdp->sd_lock)))
>>  		mutex_enter(&sdp->sd_lock);
>>
>> +	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
>> +
>> +	if (vp->v_count > 0) {
>> +		mutex_exit(&sdp->sd_lock);
>> +		return (0);
>> +	}
>> +
>>  	ASSERT(!vn_ismntpt(vp));
>>
>>  	sep = avl_first(&sdp->sd_snaps);
>> @@ -1494,7 +1517,6 @@ zfsctl_snapshot_inactive(ap)
>>  		mutex_exit(&sdp->sd_lock);
>>  	VN_RELE(dvp);
>>
>> -end:
>>  	/*
>>  	 * Dispose of the vnode for the snapshot mount point.
>>  	 * This is safe to do because once this entry has been removed
>> @@ -1595,20 +1617,17 @@ zfsctl_snapshot_lookup(ap)
>>  static int
>>  zfsctl_snapshot_vptocnp(struct vop_vptocnp_args *ap)
>>  {
>> -	zfsvfs_t *zfsvfs = ap->a_vp->v_vfsp->vfs_data;
>> -	vnode_t *dvp, *vp;
>> +	vnode_t *dvp, *vp = ap->a_vp;
>>  	zfsctl_snapdir_t *sdp;
>>  	zfs_snapentry_t *sep;
>> -	int error;
>> +	int error = 0;
>>
>> -	ASSERT(zfsvfs->z_ctldir != NULL);
>> -	error = zfsctl_root_lookup(zfsvfs->z_ctldir, "snapshot", &dvp,
>> -	    NULL, 0, NULL, kcred, NULL, NULL, NULL);
>> -	if (error != 0)
>> -		return (error);
>> -	sdp = dvp->v_data;
>> +	sdp = zfsctl_snapshot_get_snapdir(vp, &dvp);
>>
>>  	mutex_enter(&sdp->sd_lock);
>> +
>> +	vn_lock(vp, LK_SHARED | LK_RETRY);
>> +
>>  	sep = avl_first(&sdp->sd_snaps);
>>  	while (sep != NULL) {
>>  		vp = sep->se_root;
>> --
>> 1.8.4.2
>>
>
> -- 
> Pawel Jakub Dawidek                       http://www.wheelsystems.com
> FreeBSD committer                         http://www.FreeBSD.org
> Am I Evil? Yes, I Am!                     http://mobter.com
>

From owner-freebsd-fs@FreeBSD.ORG  Tue Dec 24 06:10:28 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 675076EB;
 Tue, 24 Dec 2013 06:10:28 +0000 (UTC)
Received: from krichy.tvnetwork.hu (krichy.tvnetwork.hu [109.61.101.194])
 by mx1.freebsd.org (Postfix) with ESMTP id BD56E1238;
 Tue, 24 Dec 2013 06:10:27 +0000 (UTC)
Received: by krichy.tvnetwork.hu (Postfix, from userid 1000)
 id 56ACAA7C7; Tue, 24 Dec 2013 07:10:05 +0100 (CET)
Received: from localhost (localhost [127.0.0.1])
 by krichy.tvnetwork.hu (Postfix) with ESMTP id 53A43A7C6;
 Tue, 24 Dec 2013 07:10:05 +0100 (CET)
Date: Tue, 24 Dec 2013 07:10:05 +0100 (CET)
From: krichy@tvnetwork.hu
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
In-Reply-To: <alpine.DEB.2.10.1312231750030.31822@krichy.tvnetwork.hu>
Message-ID: <alpine.DEB.2.10.1312240706470.8128@krichy.tvnetwork.hu>
References: <alpine.DEB.2.10.1312161647410.11599@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312171326520.7714@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312191629370.4344@krichy.tvnetwork.hu>
 <20131220134405.GE1658@garage.freebsd.pl>
 <alpine.DEB.2.10.1312231750030.31822@krichy.tvnetwork.hu>
User-Agent: Alpine 2.10 (DEB 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-fs@freebsd.org, avg@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 24 Dec 2013 06:10:28 -0000

Dear devs,

A third one, again the snapshot dirs:

# cd /tmp/test/.zfs/snapshot && yes .. | xargs stat > /dev/null &
# yes /tmp/test/.zfs/snapshot/snap | xargs umount > /dev/null

Will deadlock in a few seconds.

The first holds a lock on .zfs/snapshot vnode, when looking up .. it tries 
to lock .zfs, while the other just tries to do this in reversed order.

In a vop_lookup, why is the passed in vnode, and the returned vnode must 
be locked? Why is not enough to have it held?

Thanks in advance,
Kojedzinszky Richard
Euronet Magyarorszag Informatikai Zrt.

On Mon, 23 Dec 2013, krichy@tvnetwork.hu wrote:

> Date: Mon, 23 Dec 2013 17:55:08 +0100 (CET)
> From: krichy@tvnetwork.hu
> To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
> Cc: freebsd-fs@freebsd.org, avg@freebsd.org
> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
> 
> Dear Pawel,
>
> Thanks for your response. I hope someone will review my work.
>
> Secondly, I think I;ve found another deadlock scenario:
>
> When 'zfs send -R' finishes, in zfsdev_close() it has spa_namespace_lock 
> held. Then it does an unmount of all snapshots, which in turn goes to 
> zfsctl_snapshot_inactive(), in which sdp->sd_lock is being locked.
>
> Concurrently, a zfsctl_snapdir_lookup(), holding the same sdp->sd_lock does a 
> snapshot mount, which somewhere tries to acquire spa_namespace_lock. So it 
> deadlocks. Currently I dont know how to workaround this.
>
> Is spa_namespace_lock need to be held in zfsctl_snapshot_inactive()?
>
> What if we release it function enter, and re-acquire upon exit?
>
> Thanks in advance,
>
>
> Kojedzinszky Richard
> Euronet Magyarorszag Informatikai Zrt.
>
> On Fri, 20 Dec 2013, Pawel Jakub Dawidek wrote:
>
>> Date: Fri, 20 Dec 2013 14:44:05 +0100
>> From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
>> To: krichy@tvnetwork.hu
>> Cc: freebsd-fs@freebsd.org
>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>> 
>> On Thu, Dec 19, 2013 at 04:46:11PM +0100, krichy@tvnetwork.hu wrote:
>>> Dear devs,
>>> 
>>> I am attaching a more clear patch/fix for my snapshot handling issues
>>> (0002), but I would be happy if some ZFS expert would comment it. I am
>>> trying to solve it at least for two weeks now, and an ACK or a NACK would
>>> be nice from someone. Also a commit is reverted since it also caused
>>> deadlocks. I've read its comments, which also eliminates deadlocks, but I
>>> did not find any reference how to produce that deadlock. In my view
>>> reverting that makes my issues disappear, but I dont know what new cases
>>> will it rise.
>> 
>> Richard,
>> 
>> I won't be able to analyse it myself anytime soon, because of other
>> obligations, but I forwarded you e-mail to the zfs-devel@ mailing list
>> (it is closed, but gathers FreeBSD ZFS devs). Hopefully someone from
>> there will be able to help you.
>> 
>>> I've rewritten traverse() to make more like upstream, added two extra
>>> VN_HOLD()s to snapdir_lookup() when traverse returned the same vnode what
>>> was passed to it (I dont know even that upon creating a snapshot vnode why
>>> is that extra two holds needed for the vnode.) And unfortunately, due to
>>> FreeBSD calls vop_inactive callbacks with vnodes locked, that could also
>>> cause deadlocks, so zfsctl_snapshot_inactive() and
>>> zfsctl_snapshot_vptocnp() has been rewritten to work that around.
>>> 
>>> After this, one may also get a deadlock, when a simple access would call
>>> into zfsctl_snapshot_lookup(). The documentation says, that those vnodes
>>> should always be covered, but after some stress test, sometimes we hit
>>> that call, and that can cause again deadlocks. In our environment I've
>>> just uncommented that callback, which returns ENODIR on some calls, but at
>>> least not a deadlock.
>>> 
>>> The attached script can be used to reproduce my cases (would one confirm
>>> that?), and after the patches applied, they disappear (confirm?).
>>> 
>>> Thanks,
>>> 
>>> 
>>> Kojedzinszky Richard
>>> Euronet Magyarorszag Informatikai Zrt.
>>> 
>>> On Tue, 17 Dec 2013, krichy@tvnetwork.hu wrote:
>>> 
>>>> Date: Tue, 17 Dec 2013 14:50:16 +0100 (CET)
>>>> From: krichy@tvnetwork.hu
>>>> To: pjd@freebsd.org
>>>> Cc: freebsd-fs@freebsd.org
>>>> Subject: Re: kern/184677 (fwd)
>>>> 
>>>> Dear devs,
>>>> 
>>>> I will sum up my experience regarding the issue:
>>>> 
>>>> The sympton is that a concurrent 'zfs send -R' and some activity on the
>>>> snapshot dir (or in the snapshot) may cause a deadlock.
>>>> 
>>>> After investigating the problem, I found that zfs send umounts the 
>>>> snapshots,
>>>> and that causes the deadlock, so later I tested only with concurrent 
>>>> umount
>>>> and the "activity". More later I found that listing the snapshots in
>>>> .zfs/snapshot/ and unounting them can cause the found deadlock, so I used
>>>> them for the tests. But for my surprise, instead of a deadlock, a 
>>>> recursive
>>>> lock panic has arised.
>>>> 
>>>> The vnode for the ".zfs/snapshot/" directory contains zfs's 
>>>> zfsctl_snapdir_t
>>>> structure (sdp). This contains a tree of mounted snapshots, and each 
>>>> entry
>>>> (sep) contains the vnode of entry on which the snapshot is mounted on top
>>>> (se_root). The strange is that the se_root member does not hold a 
>>>> reference
>>>> for the vnode, just a simple pointer to it.
>>>> 
>>>> Upon entry lookup (zfsctl_snapdir_lookup()) the "snapshot" vnode is 
>>>> locked,
>>>> the zfsctl_snapdir_t's tree is locked, and searched for the mount if it
>>>> exists already. If it founds no entry, does the mount. In the case of an
>>>> entry was found, the se_root member contains the vnode which the snapshot 
>>>> is
>>>> mounted on. Thus, a reference is taken for it, and the traverse() call 
>>>> will
>>>> resolve to the real root vnode of the mounted snapshot, returning it as
>>>> locked. (Examining the traverse() code I've found that it did not follow
>>>> FreeBSD's lock order recommendation described in sys/kern/vfs_subr.c.)
>>>> 
>>>> On the other way, when an umount is issued, the se_root vnode looses its 
>>>> last
>>>> reference (as only the mountpoint holds one for it), it goes through the
>>>> vinactive() path, to zfsctl_snapshot_inactive(). In FreeBSD this is 
>>>> called
>>>> with a locked vnode, so this is a deadlock race condition. While
>>>> zfsctl_snapdir_lookup() holds the mutex for the sdp tree, and traverse()
>>>> tries to acquire the se_root, zfsctl_snapshot_inactive() holds the lock 
>>>> on
>>>> se_root while tries to access the sdp lock.
>>>> 
>>>> The zfsctl_snapshot_inactive() has an if statement checking the 
>>>> v_usecount,
>>>> which is incremented in zfsctl_snapdir_lookup(), but in that context it 
>>>> is
>>>> not covered by VI_LOCK. And it seems to me that FreeBSD's vinactive() 
>>>> path
>>>> assumes that the vnode remains inactive (as opposed to illumos, at least 
>>>> how
>>>> i read the code). So zfsctl_snapshot_inactive() must free resources while 
>>>> in
>>>> a locked state. I was a bit confused, and probably that is why the 
>>>> previously
>>>> posted patch is as is.
>>>> 
>>>> Maybe if I had some clues on the directions of this problem, I could have
>>>> worked more for a nicer, shorter solution.
>>>> 
>>>> Please someone comment on my post.
>>>> 
>>>> Regards,
>>>> 
>>>> 
>>>> 
>>>> Kojedzinszky Richard
>>>> Euronet Magyarorszag Informatikai Zrt.
>>>> 
>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote:
>>>> 
>>>>> Date: Mon, 16 Dec 2013 16:52:16 +0100 (CET)
>>>>> From: krichy@tvnetwork.hu
>>>>> To: pjd@freebsd.org
>>>>> Cc: freebsd-fs@freebsd.org
>>>>> Subject: Re: kern/184677 (fwd)
>>>>> 
>>>>> Dear PJD,
>>>>> 
>>>>> I am a happy FreeBSD user, I am sure you've read my previous posts
>>>>> regarding some issues in ZFS. Please give some advice for me, where to 
>>>>> look
>>>>> for solutions, or how could I help to resolve those issues.
>>>>> 
>>>>> Regards,
>>>>> Kojedzinszky Richard
>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>> 
>>>>> ---------- Forwarded message ----------
>>>>> Date: Mon, 16 Dec 2013 15:23:06 +0100 (CET)
>>>>> From: krichy@tvnetwork.hu
>>>>> To: freebsd-fs@freebsd.org
>>>>> Subject: Re: kern/184677
>>>>> 
>>>>> 
>>>>> Seems that pjd did a change which eliminated the zfsdev_state_lock on 
>>>>> Fri
>>>>> Aug 12 07:04:16 2011 +0000, which might introduced a new deadlock
>>>>> situation. Any comments on this?
>>>>> 
>>>>> 
>>>>> Kojedzinszky Richard
>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>> 
>>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote:
>>>>> 
>>>>>> Date: Mon, 16 Dec 2013 11:08:11 +0100 (CET)
>>>>>> From: krichy@tvnetwork.hu
>>>>>> To: freebsd-fs@freebsd.org
>>>>>> Subject: kern/184677
>>>>>> 
>>>>>> Dear devs,
>>>>>> 
>>>>>> I've attached a patch, which makes the recursive lockmgr disappear, and
>>>>>> makes the reported bug to disappear. I dont know if I followed any
>>>>>> guidelines well, or not, but at least it works for me. Please some
>>>>>> ZFS/FreeBSD fs expert review it, and fix it where it needed.
>>>>>> 
>>>>>> But unfortunately, my original problem is still not solved, maybe the 
>>>>>> same
>>>>>> as Ryan's:
>>>>>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-December/018707.html
>>>>>> 
>>>>>> Tracing the problem down is that zfsctl_snapdir_lookup() tries to 
>>>>>> acquire
>>>>>> spa_namespace_lock while when finishing a zfs send -R does a
>>>>>> zfsdev_close(), and that also holds the same mutex. And this causes a
>>>>>> deadlock scenario. I looked at illumos's code, and for some reason they
>>>>>> use another mutex on zfsdev_close(), which therefore may not deadlock 
>>>>>> with
>>>>>> zfsctl_snapdir_lookup(). But I am still investigating the problem.
>>>>>> 
>>>>>> I would like to help making ZFS more stable on freebsd also with its 
>>>>>> whole
>>>>>> functionality. I would be very thankful if some expert would give some
>>>>>> advice, how to solve these bugs. PJD, Steven, Xin?
>>>>>> 
>>>>>> Thanks in advance,
>>>>>> 
>>>>>> 
>>>>>> Kojedzinszky Richard
>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>> _______________________________________________
>>>>> freebsd-fs@freebsd.org mailing list
>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>>>> 
>>>> 
>> 
>> 
>>> From 39298da838d006ad225e41529d7b7f240fccfe73 Mon Sep 17 00:00:00 2001
>>> From: Richard Kojedzinszky <krichy@cflinux.hu>
>>> Date: Mon, 16 Dec 2013 15:39:11 +0100
>>> Subject: [PATCH 1/2] Revert "Eliminate the zfsdev_state_lock entirely and
>>>  replace it with the"
>>> 
>>> This reverts commit 1d8972b3f353f986eb5b85bc108b1c0d946d3218.
>>> 
>>> Conflicts:
>>> 	sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>> ---
>>>  .../opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h  |   1 +
>>>  .../opensolaris/uts/common/fs/zfs/vdev_geom.c      |  14 ++-
>>>  .../opensolaris/uts/common/fs/zfs/zfs_ioctl.c      |  16 +--
>>>  .../contrib/opensolaris/uts/common/fs/zfs/zvol.c   | 119 
>>> +++++++++------------
>>>  4 files changed, 70 insertions(+), 80 deletions(-)
>>> 
>>> diff --git 
>>> a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h 
>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h
>>> index af2def2..8272c4d 100644
>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h
>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h
>>> @@ -383,6 +383,7 @@ extern void *zfsdev_get_soft_state(minor_t minor,
>>>  extern minor_t zfsdev_minor_alloc(void);
>>>
>>>  extern void *zfsdev_state;
>>> +extern kmutex_t zfsdev_state_lock;
>>>
>>>  #endif	/* _KERNEL */
>>> 
>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c 
>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>> index 15685a5..5c3e9f3 100644
>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>> @@ -581,7 +581,7 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t 
>>> *max_psize,
>>>  	struct g_provider *pp;
>>>  	struct g_consumer *cp;
>>>  	size_t bufsize;
>>> -	int error;
>>> +	int error, lock;
>>>
>>>  	/*
>>>  	 * We must have a pathname, and it must be absolute.
>>> @@ -593,6 +593,12 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t 
>>> *max_psize,
>>>
>>>  	vd->vdev_tsd = NULL;
>>> 
>>> +	if (mutex_owned(&spa_namespace_lock)) {
>>> +		mutex_exit(&spa_namespace_lock);
>>> +		lock = 1;
>>> +	} else {
>>> +		lock = 0;
>>> +	}
>>>  	DROP_GIANT();
>>>  	g_topology_lock();
>>>  	error = 0;
>>> @@ -624,7 +630,11 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t 
>>> *max_psize,
>>>  	    !ISP2(cp->provider->sectorsize)) {
>>>  		ZFS_LOG(1, "Provider %s has unsupported sectorsize.",
>>>  		    vd->vdev_path);
>>> +
>>> +		g_topology_lock();
>>>  		vdev_geom_detach(cp, 0);
>>> +		g_topology_unlock();
>>> +
>>>  		error = EINVAL;
>>>  		cp = NULL;
>>>  	} else if (cp->acw == 0 && (spa_mode(vd->vdev_spa) & FWRITE) != 0) {
>>> @@ -647,6 +657,8 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t 
>>> *max_psize,
>>>  	}
>>>  	g_topology_unlock();
>>>  	PICKUP_GIANT();
>>> +	if (lock)
>>> +		mutex_enter(&spa_namespace_lock);
>>>  	if (cp == NULL) {
>>>  		vd->vdev_stat.vs_aux = VDEV_AUX_OPEN_FAILED;
>>>  		return (error);
>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c 
>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>> index e9fba26..91becde 100644
>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>> @@ -5635,7 +5635,7 @@ zfsdev_minor_alloc(void)
>>>  	static minor_t last_minor;
>>>  	minor_t m;
>>> 
>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>
>>>  	for (m = last_minor + 1; m != last_minor; m++) {
>>>  		if (m > ZFSDEV_MAX_MINOR)
>>> @@ -5655,7 +5655,7 @@ zfs_ctldev_init(struct cdev *devp)
>>>  	minor_t minor;
>>>  	zfs_soft_state_t *zs;
>>> 
>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>
>>>  	minor = zfsdev_minor_alloc();
>>>  	if (minor == 0)
>>> @@ -5676,7 +5676,7 @@ zfs_ctldev_init(struct cdev *devp)
>>>  static void
>>>  zfs_ctldev_destroy(zfs_onexit_t *zo, minor_t minor)
>>>  {
>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>
>>>  	zfs_onexit_destroy(zo);
>>>  	ddi_soft_state_free(zfsdev_state, minor);
>>> @@ -5706,9 +5706,9 @@ zfsdev_open(struct cdev *devp, int flag, int mode, 
>>> struct thread *td)
>>>
>>>  	/* This is the control device. Allocate a new minor if requested. */
>>>  	if (flag & FEXCL) {
>>> -		mutex_enter(&spa_namespace_lock);
>>> +		mutex_enter(&zfsdev_state_lock);
>>>  		error = zfs_ctldev_init(devp);
>>> -		mutex_exit(&spa_namespace_lock);
>>> +		mutex_exit(&zfsdev_state_lock);
>>>  	}
>>>
>>>  	return (error);
>>> @@ -5723,14 +5723,14 @@ zfsdev_close(void *data)
>>>  	if (minor == 0)
>>>  		return;
>>> 
>>> -	mutex_enter(&spa_namespace_lock);
>>> +	mutex_enter(&zfsdev_state_lock);
>>>  	zo = zfsdev_get_soft_state(minor, ZSST_CTLDEV);
>>>  	if (zo == NULL) {
>>> -		mutex_exit(&spa_namespace_lock);
>>> +		mutex_exit(&zfsdev_state_lock);
>>>  		return;
>>>  	}
>>>  	zfs_ctldev_destroy(zo, minor);
>>> -	mutex_exit(&spa_namespace_lock);
>>> +	mutex_exit(&zfsdev_state_lock);
>>>  }
>>>
>>>  static int
>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c 
>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>> index 72d4502..aec5219 100644
>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>> @@ -104,11 +104,12 @@ static char *zvol_tag = "zvol_tag";
>>>  #define	ZVOL_DUMPSIZE		"dumpsize"
>>>
>>>  /*
>>> - * The spa_namespace_lock protects the zfsdev_state structure from being
>>> - * modified while it's being used, e.g. an open that comes in before a
>>> - * create finishes.  It also protects temporary opens of the dataset so 
>>> that,
>>> + * This lock protects the zfsdev_state structure from being modified
>>> + * while it's being used, e.g. an open that comes in before a create
>>> + * finishes.  It also protects temporary opens of the dataset so that,
>>>   * e.g., an open doesn't get a spurious EBUSY.
>>>   */
>>> +kmutex_t zfsdev_state_lock;
>>>  static uint32_t zvol_minors;
>>>
>>>  typedef struct zvol_extent {
>>> @@ -249,7 +250,7 @@ zvol_minor_lookup(const char *name)
>>>  	struct g_geom *gp;
>>>  	zvol_state_t *zv = NULL;
>>> 
>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>
>>>  	g_topology_lock();
>>>  	LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) {
>>> @@ -465,11 +466,11 @@ zvol_name2minor(const char *name, minor_t *minor)
>>>  {
>>>  	zvol_state_t *zv;
>>> 
>>> -	mutex_enter(&spa_namespace_lock);
>>> +	mutex_enter(&zfsdev_state_lock);
>>>  	zv = zvol_minor_lookup(name);
>>>  	if (minor && zv)
>>>  		*minor = zv->zv_minor;
>>> -	mutex_exit(&spa_namespace_lock);
>>> +	mutex_exit(&zfsdev_state_lock);
>>>  	return (zv ? 0 : -1);
>>>  }
>>>  #endif	/* sun */
>>> @@ -489,10 +490,10 @@ zvol_create_minor(const char *name)
>>>
>>>  	ZFS_LOG(1, "Creating ZVOL %s...", name);
>>> 
>>> -	mutex_enter(&spa_namespace_lock);
>>> +	mutex_enter(&zfsdev_state_lock);
>>>
>>>  	if (zvol_minor_lookup(name) != NULL) {
>>> -		mutex_exit(&spa_namespace_lock);
>>> +		mutex_exit(&zfsdev_state_lock);
>>>  		return (SET_ERROR(EEXIST));
>>>  	}
>>> 
>>> @@ -500,20 +501,20 @@ zvol_create_minor(const char *name)
>>>  	error = dmu_objset_own(name, DMU_OST_ZVOL, B_TRUE, FTAG, &os);
>>>
>>>  	if (error) {
>>> -		mutex_exit(&spa_namespace_lock);
>>> +		mutex_exit(&zfsdev_state_lock);
>>>  		return (error);
>>>  	}
>>>
>>>  #ifdef sun
>>>  	if ((minor = zfsdev_minor_alloc()) == 0) {
>>>  		dmu_objset_disown(os, FTAG);
>>> -		mutex_exit(&spa_namespace_lock);
>>> +		mutex_exit(&zfsdev_state_lock);
>>>  		return (SET_ERROR(ENXIO));
>>>  	}
>>>
>>>  	if (ddi_soft_state_zalloc(zfsdev_state, minor) != DDI_SUCCESS) {
>>>  		dmu_objset_disown(os, FTAG);
>>> -		mutex_exit(&spa_namespace_lock);
>>> +		mutex_exit(&zfsdev_state_lock);
>>>  		return (SET_ERROR(EAGAIN));
>>>  	}
>>>  	(void) ddi_prop_update_string(minor, zfs_dip, ZVOL_PROP_NAME,
>>> @@ -525,7 +526,7 @@ zvol_create_minor(const char *name)
>>>  	    minor, DDI_PSEUDO, 0) == DDI_FAILURE) {
>>>  		ddi_soft_state_free(zfsdev_state, minor);
>>>  		dmu_objset_disown(os, FTAG);
>>> -		mutex_exit(&spa_namespace_lock);
>>> +		mutex_exit(&zfsdev_state_lock);
>>>  		return (SET_ERROR(EAGAIN));
>>>  	}
>>> 
>>> @@ -536,7 +537,7 @@ zvol_create_minor(const char *name)
>>>  		ddi_remove_minor_node(zfs_dip, chrbuf);
>>>  		ddi_soft_state_free(zfsdev_state, minor);
>>>  		dmu_objset_disown(os, FTAG);
>>> -		mutex_exit(&spa_namespace_lock);
>>> +		mutex_exit(&zfsdev_state_lock);
>>>  		return (SET_ERROR(EAGAIN));
>>>  	}
>>> 
>>> @@ -587,7 +588,7 @@ zvol_create_minor(const char *name)
>>>
>>>  	zvol_minors++;
>>> 
>>> -	mutex_exit(&spa_namespace_lock);
>>> +	mutex_exit(&zfsdev_state_lock);
>>>
>>>  	zvol_geom_run(zv);
>>> 
>>> @@ -609,7 +610,7 @@ zvol_remove_zv(zvol_state_t *zv)
>>>  	minor_t minor = zv->zv_minor;
>>>  #endif
>>> 
>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>  	if (zv->zv_total_opens != 0)
>>>  		return (SET_ERROR(EBUSY));
>>> 
>>> @@ -635,15 +636,15 @@ zvol_remove_minor(const char *name)
>>>  	zvol_state_t *zv;
>>>  	int rc;
>>> 
>>> -	mutex_enter(&spa_namespace_lock);
>>> +	mutex_enter(&zfsdev_state_lock);
>>>  	if ((zv = zvol_minor_lookup(name)) == NULL) {
>>> -		mutex_exit(&spa_namespace_lock);
>>> +		mutex_exit(&zfsdev_state_lock);
>>>  		return (SET_ERROR(ENXIO));
>>>  	}
>>>  	g_topology_lock();
>>>  	rc = zvol_remove_zv(zv);
>>>  	g_topology_unlock();
>>> -	mutex_exit(&spa_namespace_lock);
>>> +	mutex_exit(&zfsdev_state_lock);
>>>  	return (rc);
>>>  }
>>> 
>>> @@ -755,7 +756,7 @@ zvol_update_volsize(objset_t *os, uint64_t volsize)
>>>  	dmu_tx_t *tx;
>>>  	int error;
>>> 
>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>
>>>  	tx = dmu_tx_create(os);
>>>  	dmu_tx_hold_zap(tx, ZVOL_ZAP_OBJ, TRUE, NULL);
>>> @@ -786,7 +787,7 @@ zvol_remove_minors(const char *name)
>>>  	namelen = strlen(name);
>>>
>>>  	DROP_GIANT();
>>> -	mutex_enter(&spa_namespace_lock);
>>> +	mutex_enter(&zfsdev_state_lock);
>>>  	g_topology_lock();
>>>
>>>  	LIST_FOREACH_SAFE(gp, &zfs_zvol_class.geom, geom, gptmp) {
>>> @@ -804,7 +805,7 @@ zvol_remove_minors(const char *name)
>>>  	}
>>>
>>>  	g_topology_unlock();
>>> -	mutex_exit(&spa_namespace_lock);
>>> +	mutex_exit(&zfsdev_state_lock);
>>>  	PICKUP_GIANT();
>>>  }
>>> 
>>> @@ -818,10 +819,10 @@ zvol_set_volsize(const char *name, major_t maj, 
>>> uint64_t volsize)
>>>  	uint64_t old_volsize = 0ULL;
>>>  	uint64_t readonly;
>>> 
>>> -	mutex_enter(&spa_namespace_lock);
>>> +	mutex_enter(&zfsdev_state_lock);
>>>  	zv = zvol_minor_lookup(name);
>>>  	if ((error = dmu_objset_hold(name, FTAG, &os)) != 0) {
>>> -		mutex_exit(&spa_namespace_lock);
>>> +		mutex_exit(&zfsdev_state_lock);
>>>  		return (error);
>>>  	}
>>> 
>>> @@ -888,7 +889,7 @@ zvol_set_volsize(const char *name, major_t maj, 
>>> uint64_t volsize)
>>>  out:
>>>  	dmu_objset_rele(os, FTAG);
>>> 
>>> -	mutex_exit(&spa_namespace_lock);
>>> +	mutex_exit(&zfsdev_state_lock);
>>>
>>>  	return (error);
>>>  }
>>> @@ -899,36 +900,19 @@ zvol_open(struct g_provider *pp, int flag, int 
>>> count)
>>>  {
>>>  	zvol_state_t *zv;
>>>  	int err = 0;
>>> -	boolean_t locked = B_FALSE;
>>> 
>>> -	/*
>>> -	 * Protect against recursively entering spa_namespace_lock
>>> -	 * when spa_open() is used for a pool on a (local) ZVOL(s).
>>> -	 * This is needed since we replaced upstream zfsdev_state_lock
>>> -	 * with spa_namespace_lock in the ZVOL code.
>>> -	 * We are using the same trick as spa_open().
>>> -	 * Note that calls in zvol_first_open which need to resolve
>>> -	 * pool name to a spa object will enter spa_open()
>>> -	 * recursively, but that function already has all the
>>> -	 * necessary protection.
>>> -	 */
>>> -	if (!MUTEX_HELD(&spa_namespace_lock)) {
>>> -		mutex_enter(&spa_namespace_lock);
>>> -		locked = B_TRUE;
>>> -	}
>>> +	mutex_enter(&zfsdev_state_lock);
>>>
>>>  	zv = pp->private;
>>>  	if (zv == NULL) {
>>> -		if (locked)
>>> -			mutex_exit(&spa_namespace_lock);
>>> +		mutex_exit(&zfsdev_state_lock);
>>>  		return (SET_ERROR(ENXIO));
>>>  	}
>>>
>>>  	if (zv->zv_total_opens == 0)
>>>  		err = zvol_first_open(zv);
>>>  	if (err) {
>>> -		if (locked)
>>> -			mutex_exit(&spa_namespace_lock);
>>> +		mutex_exit(&zfsdev_state_lock);
>>>  		return (err);
>>>  	}
>>>  	if ((flag & FWRITE) && (zv->zv_flags & ZVOL_RDONLY)) {
>>> @@ -950,15 +934,13 @@ zvol_open(struct g_provider *pp, int flag, int 
>>> count)
>>>  #endif
>>>
>>>  	zv->zv_total_opens += count;
>>> -	if (locked)
>>> -		mutex_exit(&spa_namespace_lock);
>>> +	mutex_exit(&zfsdev_state_lock);
>>>
>>>  	return (err);
>>>  out:
>>>  	if (zv->zv_total_opens == 0)
>>>  		zvol_last_close(zv);
>>> -	if (locked)
>>> -		mutex_exit(&spa_namespace_lock);
>>> +	mutex_exit(&zfsdev_state_lock);
>>>  	return (err);
>>>  }
>>> 
>>> @@ -968,18 +950,12 @@ zvol_close(struct g_provider *pp, int flag, int 
>>> count)
>>>  {
>>>  	zvol_state_t *zv;
>>>  	int error = 0;
>>> -	boolean_t locked = B_FALSE;
>>> 
>>> -	/* See comment in zvol_open(). */
>>> -	if (!MUTEX_HELD(&spa_namespace_lock)) {
>>> -		mutex_enter(&spa_namespace_lock);
>>> -		locked = B_TRUE;
>>> -	}
>>> +	mutex_enter(&zfsdev_state_lock);
>>>
>>>  	zv = pp->private;
>>>  	if (zv == NULL) {
>>> -		if (locked)
>>> -			mutex_exit(&spa_namespace_lock);
>>> +		mutex_exit(&zfsdev_state_lock);
>>>  		return (SET_ERROR(ENXIO));
>>>  	}
>>> 
>>> @@ -1002,8 +978,7 @@ zvol_close(struct g_provider *pp, int flag, int 
>>> count)
>>>  	if (zv->zv_total_opens == 0)
>>>  		zvol_last_close(zv);
>>> 
>>> -	if (locked)
>>> -		mutex_exit(&spa_namespace_lock);
>>> +	mutex_exit(&zfsdev_state_lock);
>>>  	return (error);
>>>  }
>>> 
>>> @@ -1658,12 +1633,12 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>> flag, cred_t *cr, int *rvalp)
>>>  	int error = 0;
>>>  	rl_t *rl;
>>> 
>>> -	mutex_enter(&spa_namespace_lock);
>>> +	mutex_enter(&zfsdev_state_lock);
>>>
>>>  	zv = zfsdev_get_soft_state(getminor(dev), ZSST_ZVOL);
>>>
>>>  	if (zv == NULL) {
>>> -		mutex_exit(&spa_namespace_lock);
>>> +		mutex_exit(&zfsdev_state_lock);
>>>  		return (SET_ERROR(ENXIO));
>>>  	}
>>>  	ASSERT(zv->zv_total_opens > 0);
>>> @@ -1677,7 +1652,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>> flag, cred_t *cr, int *rvalp)
>>>  		dki.dki_ctype = DKC_UNKNOWN;
>>>  		dki.dki_unit = getminor(dev);
>>>  		dki.dki_maxtransfer = 1 << (SPA_MAXBLOCKSHIFT - 
>>> zv->zv_min_bs);
>>> -		mutex_exit(&spa_namespace_lock);
>>> +		mutex_exit(&zfsdev_state_lock);
>>>  		if (ddi_copyout(&dki, (void *)arg, sizeof (dki), flag))
>>>  			error = SET_ERROR(EFAULT);
>>>  		return (error);
>>> @@ -1687,7 +1662,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>> flag, cred_t *cr, int *rvalp)
>>>  		dkm.dki_lbsize = 1U << zv->zv_min_bs;
>>>  		dkm.dki_capacity = zv->zv_volsize >> zv->zv_min_bs;
>>>  		dkm.dki_media_type = DK_UNKNOWN;
>>> -		mutex_exit(&spa_namespace_lock);
>>> +		mutex_exit(&zfsdev_state_lock);
>>>  		if (ddi_copyout(&dkm, (void *)arg, sizeof (dkm), flag))
>>>  			error = SET_ERROR(EFAULT);
>>>  		return (error);
>>> @@ -1697,14 +1672,14 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>> flag, cred_t *cr, int *rvalp)
>>>  			uint64_t vs = zv->zv_volsize;
>>>  			uint8_t bs = zv->zv_min_bs;
>>> 
>>> -			mutex_exit(&spa_namespace_lock);
>>> +			mutex_exit(&zfsdev_state_lock);
>>>  			error = zvol_getefi((void *)arg, flag, vs, bs);
>>>  			return (error);
>>>  		}
>>>
>>>  	case DKIOCFLUSHWRITECACHE:
>>>  		dkc = (struct dk_callback *)arg;
>>> -		mutex_exit(&spa_namespace_lock);
>>> +		mutex_exit(&zfsdev_state_lock);
>>>  		zil_commit(zv->zv_zilog, ZVOL_OBJ);
>>>  		if ((flag & FKIOCTL) && dkc != NULL && dkc->dkc_callback) {
>>>  			(*dkc->dkc_callback)(dkc->dkc_cookie, error);
>>> @@ -1730,10 +1705,10 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>> flag, cred_t *cr, int *rvalp)
>>>  			}
>>>  			if (wce) {
>>>  				zv->zv_flags |= ZVOL_WCE;
>>> -				mutex_exit(&spa_namespace_lock);
>>> +				mutex_exit(&zfsdev_state_lock);
>>>  			} else {
>>>  				zv->zv_flags &= ~ZVOL_WCE;
>>> -				mutex_exit(&spa_namespace_lock);
>>> +				mutex_exit(&zfsdev_state_lock);
>>>  				zil_commit(zv->zv_zilog, ZVOL_OBJ);
>>>  			}
>>>  			return (0);
>>> @@ -1828,7 +1803,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>> flag, cred_t *cr, int *rvalp)
>>>  		break;
>>>
>>>  	}
>>> -	mutex_exit(&spa_namespace_lock);
>>> +	mutex_exit(&zfsdev_state_lock);
>>>  	return (error);
>>>  }
>>>  #endif	/* sun */
>>> @@ -1844,12 +1819,14 @@ zvol_init(void)
>>>  {
>>>  	VERIFY(ddi_soft_state_init(&zfsdev_state, sizeof (zfs_soft_state_t),
>>>  	    1) == 0);
>>> +	mutex_init(&zfsdev_state_lock, NULL, MUTEX_DEFAULT, NULL);
>>>  	ZFS_LOG(1, "ZVOL Initialized.");
>>>  }
>>>
>>>  void
>>>  zvol_fini(void)
>>>  {
>>> +	mutex_destroy(&zfsdev_state_lock);
>>>  	ddi_soft_state_fini(&zfsdev_state);
>>>  	ZFS_LOG(1, "ZVOL Deinitialized.");
>>>  }
>>> @@ -1889,7 +1866,7 @@ zvol_dump_init(zvol_state_t *zv, boolean_t resize)
>>>  	uint64_t version = spa_version(spa);
>>>  	enum zio_checksum checksum;
>>> 
>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>  	ASSERT(vd->vdev_ops == &vdev_root_ops);
>>>
>>>  	error = dmu_free_long_range(zv->zv_objset, ZVOL_OBJ, 0,
>>> @@ -2437,7 +2414,7 @@ zvol_rename_minor(struct g_geom *gp, const char 
>>> *newname)
>>>  	struct g_provider *pp;
>>>  	zvol_state_t *zv;
>>> 
>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>  	g_topology_assert();
>>>
>>>  	pp = LIST_FIRST(&gp->provider);
>>> @@ -2471,7 +2448,7 @@ zvol_rename_minors(const char *oldname, const char 
>>> *newname)
>>>  	newnamelen = strlen(newname);
>>>
>>>  	DROP_GIANT();
>>> -	mutex_enter(&spa_namespace_lock);
>>> +	mutex_enter(&zfsdev_state_lock);
>>>  	g_topology_lock();
>>>
>>>  	LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) {
>>> @@ -2494,6 +2471,6 @@ zvol_rename_minors(const char *oldname, const char 
>>> *newname)
>>>  	}
>>>
>>>  	g_topology_unlock();
>>> -	mutex_exit(&spa_namespace_lock);
>>> +	mutex_exit(&zfsdev_state_lock);
>>>  	PICKUP_GIANT();
>>>  }
>>> --
>>> 1.8.4.2
>>> 
>> 
>>> From 57d5a383b585c32c77af54e8cdacaddf8ce7584f Mon Sep 17 00:00:00 2001
>>> From: Richard Kojedzinszky <krichy@cflinux.hu>
>>> Date: Wed, 18 Dec 2013 22:11:21 +0100
>>> Subject: [PATCH 2/2] ZFS snapshot handling fix
>>> 
>>> ---
>>>  .../compat/opensolaris/kern/opensolaris_lookup.c   | 13 +++---
>>>  .../opensolaris/uts/common/fs/zfs/zfs_ctldir.c     | 53 
>>> +++++++++++++++-------
>>>  2 files changed, 42 insertions(+), 24 deletions(-)
>>> 
>>> diff --git a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c 
>>> b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>> index 94383d6..4cac053 100644
>>> --- a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>> +++ b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>> @@ -81,6 +81,8 @@ traverse(vnode_t **cvpp, int lktype)
>>>  	 * progress on this vnode.
>>>  	 */
>>> 
>>> +	vn_lock(cvp, lktype);
>>> +
>>>  	for (;;) {
>>>  		/*
>>>  		 * Reached the end of the mount chain?
>>> @@ -89,13 +91,7 @@ traverse(vnode_t **cvpp, int lktype)
>>>  		if (vfsp == NULL)
>>>  			break;
>>>  		error = vfs_busy(vfsp, 0);
>>> -		/*
>>> -		 * tvp is NULL for *cvpp vnode, which we can't unlock.
>>> -		 */
>>> -		if (tvp != NULL)
>>> -			vput(cvp);
>>> -		else
>>> -			vrele(cvp);
>>> +		VOP_UNLOCK(cvp, 0);
>>>  		if (error)
>>>  			return (error);
>>> 
>>> @@ -107,6 +103,9 @@ traverse(vnode_t **cvpp, int lktype)
>>>  		vfs_unbusy(vfsp);
>>>  		if (error != 0)
>>>  			return (error);
>>> +
>>> +		VN_RELE(cvp);
>>> +
>>>  		cvp = tvp;
>>>  	}
>>> 
>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c 
>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>> index 28ab1fa..d3464b4 100644
>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>> @@ -112,6 +112,25 @@ snapentry_compare(const void *a, const void *b)
>>>  		return (0);
>>>  }
>>> 
>>> +/* Return the zfsctl_snapdir_t object from current vnode, following
>>> + * the lock orders in zfsctl_snapdir_lookup() to avoid deadlocks.
>>> + * On return the passed in vp is unlocked */
>>> +static zfsctl_snapdir_t *
>>> +zfsctl_snapshot_get_snapdir(vnode_t *vp, vnode_t **dvpp)
>>> +{
>>> +	gfs_dir_t *dp = vp->v_data;
>>> +	*dvpp = dp->gfsd_file.gfs_parent;
>>> +	zfsctl_snapdir_t *sdp;
>>> +
>>> +	VN_HOLD(*dvpp);
>>> +	VOP_UNLOCK(vp, 0);
>>> +	vn_lock(*dvpp, LK_SHARED | LK_RETRY | LK_CANRECURSE);
>>> +	sdp = (*dvpp)->v_data;
>>> +	VOP_UNLOCK(*dvpp, 0);
>>> +
>>> +	return (sdp);
>>> +}
>>> +
>>>  #ifdef sun
>>>  vnodeops_t *zfsctl_ops_root;
>>>  vnodeops_t *zfsctl_ops_snapdir;
>>> @@ -1013,6 +1032,8 @@ zfsctl_snapdir_lookup(ap)
>>>  			 * The snapshot was unmounted behind our backs,
>>>  			 * try to remount it.
>>>  			 */
>>> +			VOP_UNLOCK(*vpp, 0);
>>> +			VN_HOLD(*vpp);
>>>  			VERIFY(zfsctl_snapshot_zname(dvp, nm, MAXNAMELEN, 
>>> snapname) == 0);
>>>  			goto domount;
>>>  		} else {
>>> @@ -1064,7 +1085,6 @@ zfsctl_snapdir_lookup(ap)
>>>  	sep->se_name = kmem_alloc(strlen(nm) + 1, KM_SLEEP);
>>>  	(void) strcpy(sep->se_name, nm);
>>>  	*vpp = sep->se_root = zfsctl_snapshot_mknode(dvp, 
>>> dmu_objset_id(snap));
>>> -	VN_HOLD(*vpp);
>>>  	avl_insert(&sdp->sd_snaps, sep, where);
>>>
>>>  	dmu_objset_rele(snap, FTAG);
>>> @@ -1075,6 +1095,7 @@ domount:
>>>  	(void) snprintf(mountpoint, mountpoint_len,
>>>  	    "%s/" ZFS_CTLDIR_NAME "/snapshot/%s",
>>>  	    dvp->v_vfsp->mnt_stat.f_mntonname, nm);
>>> +	VN_HOLD(*vpp);
>>>  	err = mount_snapshot(curthread, vpp, "zfs", mountpoint, snapname, 0);
>>>  	kmem_free(mountpoint, mountpoint_len);
>>>  	if (err == 0) {
>>> @@ -1464,16 +1485,18 @@ zfsctl_snapshot_inactive(ap)
>>>  	int locked;
>>>  	vnode_t *dvp;
>>> 
>>> -	if (vp->v_count > 0)
>>> -		goto end;
>>> -
>>> -	VERIFY(gfs_dir_lookup(vp, "..", &dvp, cr, 0, NULL, NULL) == 0);
>>> -	sdp = dvp->v_data;
>>> -	VOP_UNLOCK(dvp, 0);
>>> +	sdp = zfsctl_snapshot_get_snapdir(vp, &dvp);
>>>
>>>  	if (!(locked = MUTEX_HELD(&sdp->sd_lock)))
>>>  		mutex_enter(&sdp->sd_lock);
>>> 
>>> +	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
>>> +
>>> +	if (vp->v_count > 0) {
>>> +		mutex_exit(&sdp->sd_lock);
>>> +		return (0);
>>> +	}
>>> +
>>>  	ASSERT(!vn_ismntpt(vp));
>>>
>>>  	sep = avl_first(&sdp->sd_snaps);
>>> @@ -1494,7 +1517,6 @@ zfsctl_snapshot_inactive(ap)
>>>  		mutex_exit(&sdp->sd_lock);
>>>  	VN_RELE(dvp);
>>> 
>>> -end:
>>>  	/*
>>>  	 * Dispose of the vnode for the snapshot mount point.
>>>  	 * This is safe to do because once this entry has been removed
>>> @@ -1595,20 +1617,17 @@ zfsctl_snapshot_lookup(ap)
>>>  static int
>>>  zfsctl_snapshot_vptocnp(struct vop_vptocnp_args *ap)
>>>  {
>>> -	zfsvfs_t *zfsvfs = ap->a_vp->v_vfsp->vfs_data;
>>> -	vnode_t *dvp, *vp;
>>> +	vnode_t *dvp, *vp = ap->a_vp;
>>>  	zfsctl_snapdir_t *sdp;
>>>  	zfs_snapentry_t *sep;
>>> -	int error;
>>> +	int error = 0;
>>> 
>>> -	ASSERT(zfsvfs->z_ctldir != NULL);
>>> -	error = zfsctl_root_lookup(zfsvfs->z_ctldir, "snapshot", &dvp,
>>> -	    NULL, 0, NULL, kcred, NULL, NULL, NULL);
>>> -	if (error != 0)
>>> -		return (error);
>>> -	sdp = dvp->v_data;
>>> +	sdp = zfsctl_snapshot_get_snapdir(vp, &dvp);
>>>
>>>  	mutex_enter(&sdp->sd_lock);
>>> +
>>> +	vn_lock(vp, LK_SHARED | LK_RETRY);
>>> +
>>>  	sep = avl_first(&sdp->sd_snaps);
>>>  	while (sep != NULL) {
>>>  		vp = sep->se_root;
>>> --
>>> 1.8.4.2
>>> 
>> 
>> -- 
>> Pawel Jakub Dawidek                       http://www.wheelsystems.com
>> FreeBSD committer                         http://www.FreeBSD.org
>> Am I Evil? Yes, I Am!                     http://mobter.com
>> 
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>

From owner-freebsd-fs@FreeBSD.ORG  Tue Dec 24 07:25:48 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 4EB142F3;
 Tue, 24 Dec 2013 07:25:48 +0000 (UTC)
Received: from freefall.freebsd.org (freefall.freebsd.org
 [IPv6:2001:1900:2254:206c::16:87])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 239241636;
 Tue, 24 Dec 2013 07:25:48 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
 by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id rBO7PlWI012584;
 Tue, 24 Dec 2013 07:25:47 GMT
 (envelope-from delphij@freefall.freebsd.org)
Received: (from delphij@localhost)
 by freefall.freebsd.org (8.14.7/8.14.7/Submit) id rBO7Pl9c012567;
 Tue, 24 Dec 2013 07:25:47 GMT (envelope-from delphij)
Date: Tue, 24 Dec 2013 07:25:47 GMT
Message-Id: <201312240725.rBO7Pl9c012567@freefall.freebsd.org>
To: kwhite@site.uottawa.ca, delphij@FreeBSD.org, freebsd-fs@FreeBSD.org,
 avg@FreeBSD.org
From: delphij@FreeBSD.org
Subject: Re: kern/182570: [zfs] [patch] ZFS panic in receive
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 24 Dec 2013 07:25:48 -0000

Synopsis: [zfs] [patch] ZFS panic in receive

State-Changed-From-To: open->closed
State-Changed-By: delphij
State-Changed-When: Tue Dec 24 07:25:24 UTC 2013
State-Changed-Why: 
Mark as closed -- fix was committed by avg@.


Responsible-Changed-From-To: freebsd-fs->avg
Responsible-Changed-By: delphij
Responsible-Changed-When: Tue Dec 24 07:25:24 UTC 2013
Responsible-Changed-Why: 
Assign.

http://www.freebsd.org/cgi/query-pr.cgi?pr=182570

From owner-freebsd-fs@FreeBSD.ORG  Tue Dec 24 20:29:01 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id CA68B3C7;
 Tue, 24 Dec 2013 20:29:01 +0000 (UTC)
Received: from krichy.tvnetwork.hu (unknown [IPv6:2a01:be00:0:2::10])
 by mx1.freebsd.org (Postfix) with ESMTP id 39C291DEA;
 Tue, 24 Dec 2013 20:29:00 +0000 (UTC)
Received: by krichy.tvnetwork.hu (Postfix, from userid 1000)
 id B01F2A894; Tue, 24 Dec 2013 21:28:39 +0100 (CET)
Received: from localhost (localhost [127.0.0.1])
 by krichy.tvnetwork.hu (Postfix) with ESMTP id AB25AA893;
 Tue, 24 Dec 2013 21:28:39 +0100 (CET)
Date: Tue, 24 Dec 2013 21:28:39 +0100 (CET)
From: krichy@tvnetwork.hu
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
In-Reply-To: <alpine.DEB.2.10.1312240706470.8128@krichy.tvnetwork.hu>
Message-ID: <alpine.DEB.2.10.1312242126250.19813@krichy.tvnetwork.hu>
References: <alpine.DEB.2.10.1312161647410.11599@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312171326520.7714@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312191629370.4344@krichy.tvnetwork.hu>
 <20131220134405.GE1658@garage.freebsd.pl>
 <alpine.DEB.2.10.1312231750030.31822@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312240706470.8128@krichy.tvnetwork.hu>
User-Agent: Alpine 2.10 (DEB 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Content-Filtered-By: Mailman/MimeDel 2.1.17
Cc: freebsd-fs@freebsd.org, avg@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 24 Dec 2013 20:29:01 -0000

Dear all,

Unfortunately, this seems to be serious, as a plain user can deadlock zfs 
using the attached script, and can make a DoS against the mounted 
dataset's .zfs/snapshot directory.

Regards,

Kojedzinszky Richard
Euronet Magyarorszag Informatikai Zrt.

On Tue, 24 Dec 2013, krichy@tvnetwork.hu wrote:

> Date: Tue, 24 Dec 2013 07:10:05 +0100 (CET)
> From: krichy@tvnetwork.hu
> To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
> Cc: freebsd-fs@freebsd.org, avg@freebsd.org
> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
> 
> Dear devs,
>
> A third one, again the snapshot dirs:
>
> # cd /tmp/test/.zfs/snapshot && yes .. | xargs stat > /dev/null &
> # yes /tmp/test/.zfs/snapshot/snap | xargs umount > /dev/null
>
> Will deadlock in a few seconds.
>
> The first holds a lock on .zfs/snapshot vnode, when looking up .. it tries to 
> lock .zfs, while the other just tries to do this in reversed order.
>
> In a vop_lookup, why is the passed in vnode, and the returned vnode must be 
> locked? Why is not enough to have it held?
>
> Thanks in advance,
> Kojedzinszky Richard
> Euronet Magyarorszag Informatikai Zrt.
>
> On Mon, 23 Dec 2013, krichy@tvnetwork.hu wrote:
>
>> Date: Mon, 23 Dec 2013 17:55:08 +0100 (CET)
>> From: krichy@tvnetwork.hu
>> To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
>> Cc: freebsd-fs@freebsd.org, avg@freebsd.org
>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>> 
>> Dear Pawel,
>> 
>> Thanks for your response. I hope someone will review my work.
>> 
>> Secondly, I think I;ve found another deadlock scenario:
>> 
>> When 'zfs send -R' finishes, in zfsdev_close() it has spa_namespace_lock 
>> held. Then it does an unmount of all snapshots, which in turn goes to 
>> zfsctl_snapshot_inactive(), in which sdp->sd_lock is being locked.
>> 
>> Concurrently, a zfsctl_snapdir_lookup(), holding the same sdp->sd_lock does 
>> a snapshot mount, which somewhere tries to acquire spa_namespace_lock. So 
>> it deadlocks. Currently I dont know how to workaround this.
>> 
>> Is spa_namespace_lock need to be held in zfsctl_snapshot_inactive()?
>> 
>> What if we release it function enter, and re-acquire upon exit?
>> 
>> Thanks in advance,
>> 
>> 
>> Kojedzinszky Richard
>> Euronet Magyarorszag Informatikai Zrt.
>> 
>> On Fri, 20 Dec 2013, Pawel Jakub Dawidek wrote:
>> 
>>> Date: Fri, 20 Dec 2013 14:44:05 +0100
>>> From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
>>> To: krichy@tvnetwork.hu
>>> Cc: freebsd-fs@freebsd.org
>>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>>> 
>>> On Thu, Dec 19, 2013 at 04:46:11PM +0100, krichy@tvnetwork.hu wrote:
>>>> Dear devs,
>>>> 
>>>> I am attaching a more clear patch/fix for my snapshot handling issues
>>>> (0002), but I would be happy if some ZFS expert would comment it. I am
>>>> trying to solve it at least for two weeks now, and an ACK or a NACK would
>>>> be nice from someone. Also a commit is reverted since it also caused
>>>> deadlocks. I've read its comments, which also eliminates deadlocks, but I
>>>> did not find any reference how to produce that deadlock. In my view
>>>> reverting that makes my issues disappear, but I dont know what new cases
>>>> will it rise.
>>> 
>>> Richard,
>>> 
>>> I won't be able to analyse it myself anytime soon, because of other
>>> obligations, but I forwarded you e-mail to the zfs-devel@ mailing list
>>> (it is closed, but gathers FreeBSD ZFS devs). Hopefully someone from
>>> there will be able to help you.
>>> 
>>>> I've rewritten traverse() to make more like upstream, added two extra
>>>> VN_HOLD()s to snapdir_lookup() when traverse returned the same vnode what
>>>> was passed to it (I dont know even that upon creating a snapshot vnode 
>>>> why
>>>> is that extra two holds needed for the vnode.) And unfortunately, due to
>>>> FreeBSD calls vop_inactive callbacks with vnodes locked, that could also
>>>> cause deadlocks, so zfsctl_snapshot_inactive() and
>>>> zfsctl_snapshot_vptocnp() has been rewritten to work that around.
>>>> 
>>>> After this, one may also get a deadlock, when a simple access would call
>>>> into zfsctl_snapshot_lookup(). The documentation says, that those vnodes
>>>> should always be covered, but after some stress test, sometimes we hit
>>>> that call, and that can cause again deadlocks. In our environment I've
>>>> just uncommented that callback, which returns ENODIR on some calls, but 
>>>> at
>>>> least not a deadlock.
>>>> 
>>>> The attached script can be used to reproduce my cases (would one confirm
>>>> that?), and after the patches applied, they disappear (confirm?).
>>>> 
>>>> Thanks,
>>>> 
>>>> 
>>>> Kojedzinszky Richard
>>>> Euronet Magyarorszag Informatikai Zrt.
>>>> 
>>>> On Tue, 17 Dec 2013, krichy@tvnetwork.hu wrote:
>>>> 
>>>>> Date: Tue, 17 Dec 2013 14:50:16 +0100 (CET)
>>>>> From: krichy@tvnetwork.hu
>>>>> To: pjd@freebsd.org
>>>>> Cc: freebsd-fs@freebsd.org
>>>>> Subject: Re: kern/184677 (fwd)
>>>>> 
>>>>> Dear devs,
>>>>> 
>>>>> I will sum up my experience regarding the issue:
>>>>> 
>>>>> The sympton is that a concurrent 'zfs send -R' and some activity on the
>>>>> snapshot dir (or in the snapshot) may cause a deadlock.
>>>>> 
>>>>> After investigating the problem, I found that zfs send umounts the 
>>>>> snapshots,
>>>>> and that causes the deadlock, so later I tested only with concurrent 
>>>>> umount
>>>>> and the "activity". More later I found that listing the snapshots in
>>>>> .zfs/snapshot/ and unounting them can cause the found deadlock, so I 
>>>>> used
>>>>> them for the tests. But for my surprise, instead of a deadlock, a 
>>>>> recursive
>>>>> lock panic has arised.
>>>>> 
>>>>> The vnode for the ".zfs/snapshot/" directory contains zfs's 
>>>>> zfsctl_snapdir_t
>>>>> structure (sdp). This contains a tree of mounted snapshots, and each 
>>>>> entry
>>>>> (sep) contains the vnode of entry on which the snapshot is mounted on 
>>>>> top
>>>>> (se_root). The strange is that the se_root member does not hold a 
>>>>> reference
>>>>> for the vnode, just a simple pointer to it.
>>>>> 
>>>>> Upon entry lookup (zfsctl_snapdir_lookup()) the "snapshot" vnode is 
>>>>> locked,
>>>>> the zfsctl_snapdir_t's tree is locked, and searched for the mount if it
>>>>> exists already. If it founds no entry, does the mount. In the case of an
>>>>> entry was found, the se_root member contains the vnode which the 
>>>>> snapshot is
>>>>> mounted on. Thus, a reference is taken for it, and the traverse() call 
>>>>> will
>>>>> resolve to the real root vnode of the mounted snapshot, returning it as
>>>>> locked. (Examining the traverse() code I've found that it did not follow
>>>>> FreeBSD's lock order recommendation described in sys/kern/vfs_subr.c.)
>>>>> 
>>>>> On the other way, when an umount is issued, the se_root vnode looses its 
>>>>> last
>>>>> reference (as only the mountpoint holds one for it), it goes through the
>>>>> vinactive() path, to zfsctl_snapshot_inactive(). In FreeBSD this is 
>>>>> called
>>>>> with a locked vnode, so this is a deadlock race condition. While
>>>>> zfsctl_snapdir_lookup() holds the mutex for the sdp tree, and traverse()
>>>>> tries to acquire the se_root, zfsctl_snapshot_inactive() holds the lock 
>>>>> on
>>>>> se_root while tries to access the sdp lock.
>>>>> 
>>>>> The zfsctl_snapshot_inactive() has an if statement checking the 
>>>>> v_usecount,
>>>>> which is incremented in zfsctl_snapdir_lookup(), but in that context it 
>>>>> is
>>>>> not covered by VI_LOCK. And it seems to me that FreeBSD's vinactive() 
>>>>> path
>>>>> assumes that the vnode remains inactive (as opposed to illumos, at least 
>>>>> how
>>>>> i read the code). So zfsctl_snapshot_inactive() must free resources 
>>>>> while in
>>>>> a locked state. I was a bit confused, and probably that is why the 
>>>>> previously
>>>>> posted patch is as is.
>>>>> 
>>>>> Maybe if I had some clues on the directions of this problem, I could 
>>>>> have
>>>>> worked more for a nicer, shorter solution.
>>>>> 
>>>>> Please someone comment on my post.
>>>>> 
>>>>> Regards,
>>>>> 
>>>>> 
>>>>> 
>>>>> Kojedzinszky Richard
>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>> 
>>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote:
>>>>> 
>>>>>> Date: Mon, 16 Dec 2013 16:52:16 +0100 (CET)
>>>>>> From: krichy@tvnetwork.hu
>>>>>> To: pjd@freebsd.org
>>>>>> Cc: freebsd-fs@freebsd.org
>>>>>> Subject: Re: kern/184677 (fwd)
>>>>>> 
>>>>>> Dear PJD,
>>>>>> 
>>>>>> I am a happy FreeBSD user, I am sure you've read my previous posts
>>>>>> regarding some issues in ZFS. Please give some advice for me, where to 
>>>>>> look
>>>>>> for solutions, or how could I help to resolve those issues.
>>>>>> 
>>>>>> Regards,
>>>>>> Kojedzinszky Richard
>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>> 
>>>>>> ---------- Forwarded message ----------
>>>>>> Date: Mon, 16 Dec 2013 15:23:06 +0100 (CET)
>>>>>> From: krichy@tvnetwork.hu
>>>>>> To: freebsd-fs@freebsd.org
>>>>>> Subject: Re: kern/184677
>>>>>> 
>>>>>> 
>>>>>> Seems that pjd did a change which eliminated the zfsdev_state_lock on 
>>>>>> Fri
>>>>>> Aug 12 07:04:16 2011 +0000, which might introduced a new deadlock
>>>>>> situation. Any comments on this?
>>>>>> 
>>>>>> 
>>>>>> Kojedzinszky Richard
>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>> 
>>>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote:
>>>>>> 
>>>>>>> Date: Mon, 16 Dec 2013 11:08:11 +0100 (CET)
>>>>>>> From: krichy@tvnetwork.hu
>>>>>>> To: freebsd-fs@freebsd.org
>>>>>>> Subject: kern/184677
>>>>>>> 
>>>>>>> Dear devs,
>>>>>>> 
>>>>>>> I've attached a patch, which makes the recursive lockmgr disappear, 
>>>>>>> and
>>>>>>> makes the reported bug to disappear. I dont know if I followed any
>>>>>>> guidelines well, or not, but at least it works for me. Please some
>>>>>>> ZFS/FreeBSD fs expert review it, and fix it where it needed.
>>>>>>> 
>>>>>>> But unfortunately, my original problem is still not solved, maybe the 
>>>>>>> same
>>>>>>> as Ryan's:
>>>>>>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-December/018707.html
>>>>>>> 
>>>>>>> Tracing the problem down is that zfsctl_snapdir_lookup() tries to 
>>>>>>> acquire
>>>>>>> spa_namespace_lock while when finishing a zfs send -R does a
>>>>>>> zfsdev_close(), and that also holds the same mutex. And this causes a
>>>>>>> deadlock scenario. I looked at illumos's code, and for some reason 
>>>>>>> they
>>>>>>> use another mutex on zfsdev_close(), which therefore may not deadlock 
>>>>>>> with
>>>>>>> zfsctl_snapdir_lookup(). But I am still investigating the problem.
>>>>>>> 
>>>>>>> I would like to help making ZFS more stable on freebsd also with its 
>>>>>>> whole
>>>>>>> functionality. I would be very thankful if some expert would give some
>>>>>>> advice, how to solve these bugs. PJD, Steven, Xin?
>>>>>>> 
>>>>>>> Thanks in advance,
>>>>>>> 
>>>>>>> 
>>>>>>> Kojedzinszky Richard
>>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>> _______________________________________________
>>>>>> freebsd-fs@freebsd.org mailing list
>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>>>>> 
>>>>> 
>>> 
>>> 
>>>> From 39298da838d006ad225e41529d7b7f240fccfe73 Mon Sep 17 00:00:00 2001
>>>> From: Richard Kojedzinszky <krichy@cflinux.hu>
>>>> Date: Mon, 16 Dec 2013 15:39:11 +0100
>>>> Subject: [PATCH 1/2] Revert "Eliminate the zfsdev_state_lock entirely and
>>>>  replace it with the"
>>>> 
>>>> This reverts commit 1d8972b3f353f986eb5b85bc108b1c0d946d3218.
>>>> 
>>>> Conflicts:
>>>> 	sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>>> ---
>>>>  .../opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h  |   1 +
>>>>  .../opensolaris/uts/common/fs/zfs/vdev_geom.c      |  14 ++-
>>>>  .../opensolaris/uts/common/fs/zfs/zfs_ioctl.c      |  16 +--
>>>>  .../contrib/opensolaris/uts/common/fs/zfs/zvol.c   | 119 
>>>> +++++++++------------
>>>>  4 files changed, 70 insertions(+), 80 deletions(-)
>>>> 
>>>> diff --git 
>>>> a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h 
>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h
>>>> index af2def2..8272c4d 100644
>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h
>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h
>>>> @@ -383,6 +383,7 @@ extern void *zfsdev_get_soft_state(minor_t minor,
>>>>  extern minor_t zfsdev_minor_alloc(void);
>>>>
>>>>  extern void *zfsdev_state;
>>>> +extern kmutex_t zfsdev_state_lock;
>>>>
>>>>  #endif	/* _KERNEL */
>>>> 
>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c 
>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>>> index 15685a5..5c3e9f3 100644
>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>>> @@ -581,7 +581,7 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t 
>>>> *max_psize,
>>>>  	struct g_provider *pp;
>>>>  	struct g_consumer *cp;
>>>>  	size_t bufsize;
>>>> -	int error;
>>>> +	int error, lock;
>>>>
>>>>  	/*
>>>>  	 * We must have a pathname, and it must be absolute.
>>>> @@ -593,6 +593,12 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t 
>>>> *max_psize,
>>>>
>>>>  	vd->vdev_tsd = NULL;
>>>> 
>>>> +	if (mutex_owned(&spa_namespace_lock)) {
>>>> +		mutex_exit(&spa_namespace_lock);
>>>> +		lock = 1;
>>>> +	} else {
>>>> +		lock = 0;
>>>> +	}
>>>>  	DROP_GIANT();
>>>>  	g_topology_lock();
>>>>  	error = 0;
>>>> @@ -624,7 +630,11 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t 
>>>> *max_psize,
>>>>  	    !ISP2(cp->provider->sectorsize)) {
>>>>  		ZFS_LOG(1, "Provider %s has unsupported sectorsize.",
>>>>  		    vd->vdev_path);
>>>> +
>>>> +		g_topology_lock();
>>>>  		vdev_geom_detach(cp, 0);
>>>> +		g_topology_unlock();
>>>> +
>>>>  		error = EINVAL;
>>>>  		cp = NULL;
>>>>  	} else if (cp->acw == 0 && (spa_mode(vd->vdev_spa) & FWRITE) != 0) {
>>>> @@ -647,6 +657,8 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t 
>>>> *max_psize,
>>>>  	}
>>>>  	g_topology_unlock();
>>>>  	PICKUP_GIANT();
>>>> +	if (lock)
>>>> +		mutex_enter(&spa_namespace_lock);
>>>>  	if (cp == NULL) {
>>>>  		vd->vdev_stat.vs_aux = VDEV_AUX_OPEN_FAILED;
>>>>  		return (error);
>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c 
>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>>> index e9fba26..91becde 100644
>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>>> @@ -5635,7 +5635,7 @@ zfsdev_minor_alloc(void)
>>>>  	static minor_t last_minor;
>>>>  	minor_t m;
>>>> 
>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>
>>>>  	for (m = last_minor + 1; m != last_minor; m++) {
>>>>  		if (m > ZFSDEV_MAX_MINOR)
>>>> @@ -5655,7 +5655,7 @@ zfs_ctldev_init(struct cdev *devp)
>>>>  	minor_t minor;
>>>>  	zfs_soft_state_t *zs;
>>>> 
>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>
>>>>  	minor = zfsdev_minor_alloc();
>>>>  	if (minor == 0)
>>>> @@ -5676,7 +5676,7 @@ zfs_ctldev_init(struct cdev *devp)
>>>>  static void
>>>>  zfs_ctldev_destroy(zfs_onexit_t *zo, minor_t minor)
>>>>  {
>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>
>>>>  	zfs_onexit_destroy(zo);
>>>>  	ddi_soft_state_free(zfsdev_state, minor);
>>>> @@ -5706,9 +5706,9 @@ zfsdev_open(struct cdev *devp, int flag, int mode, 
>>>> struct thread *td)
>>>>
>>>>  	/* This is the control device. Allocate a new minor if requested. */
>>>>  	if (flag & FEXCL) {
>>>> -		mutex_enter(&spa_namespace_lock);
>>>> +		mutex_enter(&zfsdev_state_lock);
>>>>  		error = zfs_ctldev_init(devp);
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  	}
>>>>
>>>>  	return (error);
>>>> @@ -5723,14 +5723,14 @@ zfsdev_close(void *data)
>>>>  	if (minor == 0)
>>>>  		return;
>>>> 
>>>> -	mutex_enter(&spa_namespace_lock);
>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>  	zo = zfsdev_get_soft_state(minor, ZSST_CTLDEV);
>>>>  	if (zo == NULL) {
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return;
>>>>  	}
>>>>  	zfs_ctldev_destroy(zo, minor);
>>>> -	mutex_exit(&spa_namespace_lock);
>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>  }
>>>>
>>>>  static int
>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c 
>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>>> index 72d4502..aec5219 100644
>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>>> @@ -104,11 +104,12 @@ static char *zvol_tag = "zvol_tag";
>>>>  #define	ZVOL_DUMPSIZE		"dumpsize"
>>>>
>>>>  /*
>>>> - * The spa_namespace_lock protects the zfsdev_state structure from being
>>>> - * modified while it's being used, e.g. an open that comes in before a
>>>> - * create finishes.  It also protects temporary opens of the dataset so 
>>>> that,
>>>> + * This lock protects the zfsdev_state structure from being modified
>>>> + * while it's being used, e.g. an open that comes in before a create
>>>> + * finishes.  It also protects temporary opens of the dataset so that,
>>>>   * e.g., an open doesn't get a spurious EBUSY.
>>>>   */
>>>> +kmutex_t zfsdev_state_lock;
>>>>  static uint32_t zvol_minors;
>>>>
>>>>  typedef struct zvol_extent {
>>>> @@ -249,7 +250,7 @@ zvol_minor_lookup(const char *name)
>>>>  	struct g_geom *gp;
>>>>  	zvol_state_t *zv = NULL;
>>>> 
>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>
>>>>  	g_topology_lock();
>>>>  	LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) {
>>>> @@ -465,11 +466,11 @@ zvol_name2minor(const char *name, minor_t *minor)
>>>>  {
>>>>  	zvol_state_t *zv;
>>>> 
>>>> -	mutex_enter(&spa_namespace_lock);
>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>  	zv = zvol_minor_lookup(name);
>>>>  	if (minor && zv)
>>>>  		*minor = zv->zv_minor;
>>>> -	mutex_exit(&spa_namespace_lock);
>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>  	return (zv ? 0 : -1);
>>>>  }
>>>>  #endif	/* sun */
>>>> @@ -489,10 +490,10 @@ zvol_create_minor(const char *name)
>>>>
>>>>  	ZFS_LOG(1, "Creating ZVOL %s...", name);
>>>> 
>>>> -	mutex_enter(&spa_namespace_lock);
>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>
>>>>  	if (zvol_minor_lookup(name) != NULL) {
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return (SET_ERROR(EEXIST));
>>>>  	}
>>>> 
>>>> @@ -500,20 +501,20 @@ zvol_create_minor(const char *name)
>>>>  	error = dmu_objset_own(name, DMU_OST_ZVOL, B_TRUE, FTAG, &os);
>>>>
>>>>  	if (error) {
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return (error);
>>>>  	}
>>>>
>>>>  #ifdef sun
>>>>  	if ((minor = zfsdev_minor_alloc()) == 0) {
>>>>  		dmu_objset_disown(os, FTAG);
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return (SET_ERROR(ENXIO));
>>>>  	}
>>>>
>>>>  	if (ddi_soft_state_zalloc(zfsdev_state, minor) != DDI_SUCCESS) {
>>>>  		dmu_objset_disown(os, FTAG);
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return (SET_ERROR(EAGAIN));
>>>>  	}
>>>>  	(void) ddi_prop_update_string(minor, zfs_dip, ZVOL_PROP_NAME,
>>>> @@ -525,7 +526,7 @@ zvol_create_minor(const char *name)
>>>>  	    minor, DDI_PSEUDO, 0) == DDI_FAILURE) {
>>>>  		ddi_soft_state_free(zfsdev_state, minor);
>>>>  		dmu_objset_disown(os, FTAG);
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return (SET_ERROR(EAGAIN));
>>>>  	}
>>>> 
>>>> @@ -536,7 +537,7 @@ zvol_create_minor(const char *name)
>>>>  		ddi_remove_minor_node(zfs_dip, chrbuf);
>>>>  		ddi_soft_state_free(zfsdev_state, minor);
>>>>  		dmu_objset_disown(os, FTAG);
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return (SET_ERROR(EAGAIN));
>>>>  	}
>>>> 
>>>> @@ -587,7 +588,7 @@ zvol_create_minor(const char *name)
>>>>
>>>>  	zvol_minors++;
>>>> 
>>>> -	mutex_exit(&spa_namespace_lock);
>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>
>>>>  	zvol_geom_run(zv);
>>>> 
>>>> @@ -609,7 +610,7 @@ zvol_remove_zv(zvol_state_t *zv)
>>>>  	minor_t minor = zv->zv_minor;
>>>>  #endif
>>>> 
>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>  	if (zv->zv_total_opens != 0)
>>>>  		return (SET_ERROR(EBUSY));
>>>> 
>>>> @@ -635,15 +636,15 @@ zvol_remove_minor(const char *name)
>>>>  	zvol_state_t *zv;
>>>>  	int rc;
>>>> 
>>>> -	mutex_enter(&spa_namespace_lock);
>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>  	if ((zv = zvol_minor_lookup(name)) == NULL) {
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return (SET_ERROR(ENXIO));
>>>>  	}
>>>>  	g_topology_lock();
>>>>  	rc = zvol_remove_zv(zv);
>>>>  	g_topology_unlock();
>>>> -	mutex_exit(&spa_namespace_lock);
>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>  	return (rc);
>>>>  }
>>>> 
>>>> @@ -755,7 +756,7 @@ zvol_update_volsize(objset_t *os, uint64_t volsize)
>>>>  	dmu_tx_t *tx;
>>>>  	int error;
>>>> 
>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>
>>>>  	tx = dmu_tx_create(os);
>>>>  	dmu_tx_hold_zap(tx, ZVOL_ZAP_OBJ, TRUE, NULL);
>>>> @@ -786,7 +787,7 @@ zvol_remove_minors(const char *name)
>>>>  	namelen = strlen(name);
>>>>
>>>>  	DROP_GIANT();
>>>> -	mutex_enter(&spa_namespace_lock);
>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>  	g_topology_lock();
>>>>
>>>>  	LIST_FOREACH_SAFE(gp, &zfs_zvol_class.geom, geom, gptmp) {
>>>> @@ -804,7 +805,7 @@ zvol_remove_minors(const char *name)
>>>>  	}
>>>>
>>>>  	g_topology_unlock();
>>>> -	mutex_exit(&spa_namespace_lock);
>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>  	PICKUP_GIANT();
>>>>  }
>>>> 
>>>> @@ -818,10 +819,10 @@ zvol_set_volsize(const char *name, major_t maj, 
>>>> uint64_t volsize)
>>>>  	uint64_t old_volsize = 0ULL;
>>>>  	uint64_t readonly;
>>>> 
>>>> -	mutex_enter(&spa_namespace_lock);
>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>  	zv = zvol_minor_lookup(name);
>>>>  	if ((error = dmu_objset_hold(name, FTAG, &os)) != 0) {
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return (error);
>>>>  	}
>>>> 
>>>> @@ -888,7 +889,7 @@ zvol_set_volsize(const char *name, major_t maj, 
>>>> uint64_t volsize)
>>>>  out:
>>>>  	dmu_objset_rele(os, FTAG);
>>>> 
>>>> -	mutex_exit(&spa_namespace_lock);
>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>
>>>>  	return (error);
>>>>  }
>>>> @@ -899,36 +900,19 @@ zvol_open(struct g_provider *pp, int flag, int 
>>>> count)
>>>>  {
>>>>  	zvol_state_t *zv;
>>>>  	int err = 0;
>>>> -	boolean_t locked = B_FALSE;
>>>> 
>>>> -	/*
>>>> -	 * Protect against recursively entering spa_namespace_lock
>>>> -	 * when spa_open() is used for a pool on a (local) ZVOL(s).
>>>> -	 * This is needed since we replaced upstream zfsdev_state_lock
>>>> -	 * with spa_namespace_lock in the ZVOL code.
>>>> -	 * We are using the same trick as spa_open().
>>>> -	 * Note that calls in zvol_first_open which need to resolve
>>>> -	 * pool name to a spa object will enter spa_open()
>>>> -	 * recursively, but that function already has all the
>>>> -	 * necessary protection.
>>>> -	 */
>>>> -	if (!MUTEX_HELD(&spa_namespace_lock)) {
>>>> -		mutex_enter(&spa_namespace_lock);
>>>> -		locked = B_TRUE;
>>>> -	}
>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>
>>>>  	zv = pp->private;
>>>>  	if (zv == NULL) {
>>>> -		if (locked)
>>>> -			mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return (SET_ERROR(ENXIO));
>>>>  	}
>>>>
>>>>  	if (zv->zv_total_opens == 0)
>>>>  		err = zvol_first_open(zv);
>>>>  	if (err) {
>>>> -		if (locked)
>>>> -			mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return (err);
>>>>  	}
>>>>  	if ((flag & FWRITE) && (zv->zv_flags & ZVOL_RDONLY)) {
>>>> @@ -950,15 +934,13 @@ zvol_open(struct g_provider *pp, int flag, int 
>>>> count)
>>>>  #endif
>>>>
>>>>  	zv->zv_total_opens += count;
>>>> -	if (locked)
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>
>>>>  	return (err);
>>>>  out:
>>>>  	if (zv->zv_total_opens == 0)
>>>>  		zvol_last_close(zv);
>>>> -	if (locked)
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>  	return (err);
>>>>  }
>>>> 
>>>> @@ -968,18 +950,12 @@ zvol_close(struct g_provider *pp, int flag, int 
>>>> count)
>>>>  {
>>>>  	zvol_state_t *zv;
>>>>  	int error = 0;
>>>> -	boolean_t locked = B_FALSE;
>>>> 
>>>> -	/* See comment in zvol_open(). */
>>>> -	if (!MUTEX_HELD(&spa_namespace_lock)) {
>>>> -		mutex_enter(&spa_namespace_lock);
>>>> -		locked = B_TRUE;
>>>> -	}
>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>
>>>>  	zv = pp->private;
>>>>  	if (zv == NULL) {
>>>> -		if (locked)
>>>> -			mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return (SET_ERROR(ENXIO));
>>>>  	}
>>>> 
>>>> @@ -1002,8 +978,7 @@ zvol_close(struct g_provider *pp, int flag, int 
>>>> count)
>>>>  	if (zv->zv_total_opens == 0)
>>>>  		zvol_last_close(zv);
>>>> 
>>>> -	if (locked)
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>  	return (error);
>>>>  }
>>>> 
>>>> @@ -1658,12 +1633,12 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>> flag, cred_t *cr, int *rvalp)
>>>>  	int error = 0;
>>>>  	rl_t *rl;
>>>> 
>>>> -	mutex_enter(&spa_namespace_lock);
>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>
>>>>  	zv = zfsdev_get_soft_state(getminor(dev), ZSST_ZVOL);
>>>>
>>>>  	if (zv == NULL) {
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		return (SET_ERROR(ENXIO));
>>>>  	}
>>>>  	ASSERT(zv->zv_total_opens > 0);
>>>> @@ -1677,7 +1652,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>> flag, cred_t *cr, int *rvalp)
>>>>  		dki.dki_ctype = DKC_UNKNOWN;
>>>>  		dki.dki_unit = getminor(dev);
>>>>  		dki.dki_maxtransfer = 1 << (SPA_MAXBLOCKSHIFT - 
>>>> zv->zv_min_bs);
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		if (ddi_copyout(&dki, (void *)arg, sizeof (dki), flag))
>>>>  			error = SET_ERROR(EFAULT);
>>>>  		return (error);
>>>> @@ -1687,7 +1662,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>> flag, cred_t *cr, int *rvalp)
>>>>  		dkm.dki_lbsize = 1U << zv->zv_min_bs;
>>>>  		dkm.dki_capacity = zv->zv_volsize >> zv->zv_min_bs;
>>>>  		dkm.dki_media_type = DK_UNKNOWN;
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		if (ddi_copyout(&dkm, (void *)arg, sizeof (dkm), flag))
>>>>  			error = SET_ERROR(EFAULT);
>>>>  		return (error);
>>>> @@ -1697,14 +1672,14 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>> flag, cred_t *cr, int *rvalp)
>>>>  			uint64_t vs = zv->zv_volsize;
>>>>  			uint8_t bs = zv->zv_min_bs;
>>>> 
>>>> -			mutex_exit(&spa_namespace_lock);
>>>> +			mutex_exit(&zfsdev_state_lock);
>>>>  			error = zvol_getefi((void *)arg, flag, vs, bs);
>>>>  			return (error);
>>>>  		}
>>>>
>>>>  	case DKIOCFLUSHWRITECACHE:
>>>>  		dkc = (struct dk_callback *)arg;
>>>> -		mutex_exit(&spa_namespace_lock);
>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>  		zil_commit(zv->zv_zilog, ZVOL_OBJ);
>>>>  		if ((flag & FKIOCTL) && dkc != NULL && dkc->dkc_callback) {
>>>>  			(*dkc->dkc_callback)(dkc->dkc_cookie, error);
>>>> @@ -1730,10 +1705,10 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>> flag, cred_t *cr, int *rvalp)
>>>>  			}
>>>>  			if (wce) {
>>>>  				zv->zv_flags |= ZVOL_WCE;
>>>> -				mutex_exit(&spa_namespace_lock);
>>>> +				mutex_exit(&zfsdev_state_lock);
>>>>  			} else {
>>>>  				zv->zv_flags &= ~ZVOL_WCE;
>>>> -				mutex_exit(&spa_namespace_lock);
>>>> +				mutex_exit(&zfsdev_state_lock);
>>>>  				zil_commit(zv->zv_zilog, ZVOL_OBJ);
>>>>  			}
>>>>  			return (0);
>>>> @@ -1828,7 +1803,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>> flag, cred_t *cr, int *rvalp)
>>>>  		break;
>>>>
>>>>  	}
>>>> -	mutex_exit(&spa_namespace_lock);
>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>  	return (error);
>>>>  }
>>>>  #endif	/* sun */
>>>> @@ -1844,12 +1819,14 @@ zvol_init(void)
>>>>  {
>>>>  	VERIFY(ddi_soft_state_init(&zfsdev_state, sizeof (zfs_soft_state_t),
>>>>  	    1) == 0);
>>>> +	mutex_init(&zfsdev_state_lock, NULL, MUTEX_DEFAULT, NULL);
>>>>  	ZFS_LOG(1, "ZVOL Initialized.");
>>>>  }
>>>>
>>>>  void
>>>>  zvol_fini(void)
>>>>  {
>>>> +	mutex_destroy(&zfsdev_state_lock);
>>>>  	ddi_soft_state_fini(&zfsdev_state);
>>>>  	ZFS_LOG(1, "ZVOL Deinitialized.");
>>>>  }
>>>> @@ -1889,7 +1866,7 @@ zvol_dump_init(zvol_state_t *zv, boolean_t resize)
>>>>  	uint64_t version = spa_version(spa);
>>>>  	enum zio_checksum checksum;
>>>> 
>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>  	ASSERT(vd->vdev_ops == &vdev_root_ops);
>>>>
>>>>  	error = dmu_free_long_range(zv->zv_objset, ZVOL_OBJ, 0,
>>>> @@ -2437,7 +2414,7 @@ zvol_rename_minor(struct g_geom *gp, const char 
>>>> *newname)
>>>>  	struct g_provider *pp;
>>>>  	zvol_state_t *zv;
>>>> 
>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>  	g_topology_assert();
>>>>
>>>>  	pp = LIST_FIRST(&gp->provider);
>>>> @@ -2471,7 +2448,7 @@ zvol_rename_minors(const char *oldname, const char 
>>>> *newname)
>>>>  	newnamelen = strlen(newname);
>>>>
>>>>  	DROP_GIANT();
>>>> -	mutex_enter(&spa_namespace_lock);
>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>  	g_topology_lock();
>>>>
>>>>  	LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) {
>>>> @@ -2494,6 +2471,6 @@ zvol_rename_minors(const char *oldname, const char 
>>>> *newname)
>>>>  	}
>>>>
>>>>  	g_topology_unlock();
>>>> -	mutex_exit(&spa_namespace_lock);
>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>  	PICKUP_GIANT();
>>>>  }
>>>> --
>>>> 1.8.4.2
>>>> 
>>> 
>>>> From 57d5a383b585c32c77af54e8cdacaddf8ce7584f Mon Sep 17 00:00:00 2001
>>>> From: Richard Kojedzinszky <krichy@cflinux.hu>
>>>> Date: Wed, 18 Dec 2013 22:11:21 +0100
>>>> Subject: [PATCH 2/2] ZFS snapshot handling fix
>>>> 
>>>> ---
>>>>  .../compat/opensolaris/kern/opensolaris_lookup.c   | 13 +++---
>>>>  .../opensolaris/uts/common/fs/zfs/zfs_ctldir.c     | 53 
>>>> +++++++++++++++-------
>>>>  2 files changed, 42 insertions(+), 24 deletions(-)
>>>> 
>>>> diff --git a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c 
>>>> b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>> index 94383d6..4cac053 100644
>>>> --- a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>> +++ b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>> @@ -81,6 +81,8 @@ traverse(vnode_t **cvpp, int lktype)
>>>>  	 * progress on this vnode.
>>>>  	 */
>>>> 
>>>> +	vn_lock(cvp, lktype);
>>>> +
>>>>  	for (;;) {
>>>>  		/*
>>>>  		 * Reached the end of the mount chain?
>>>> @@ -89,13 +91,7 @@ traverse(vnode_t **cvpp, int lktype)
>>>>  		if (vfsp == NULL)
>>>>  			break;
>>>>  		error = vfs_busy(vfsp, 0);
>>>> -		/*
>>>> -		 * tvp is NULL for *cvpp vnode, which we can't unlock.
>>>> -		 */
>>>> -		if (tvp != NULL)
>>>> -			vput(cvp);
>>>> -		else
>>>> -			vrele(cvp);
>>>> +		VOP_UNLOCK(cvp, 0);
>>>>  		if (error)
>>>>  			return (error);
>>>> 
>>>> @@ -107,6 +103,9 @@ traverse(vnode_t **cvpp, int lktype)
>>>>  		vfs_unbusy(vfsp);
>>>>  		if (error != 0)
>>>>  			return (error);
>>>> +
>>>> +		VN_RELE(cvp);
>>>> +
>>>>  		cvp = tvp;
>>>>  	}
>>>> 
>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c 
>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>> index 28ab1fa..d3464b4 100644
>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>> @@ -112,6 +112,25 @@ snapentry_compare(const void *a, const void *b)
>>>>  		return (0);
>>>>  }
>>>> 
>>>> +/* Return the zfsctl_snapdir_t object from current vnode, following
>>>> + * the lock orders in zfsctl_snapdir_lookup() to avoid deadlocks.
>>>> + * On return the passed in vp is unlocked */
>>>> +static zfsctl_snapdir_t *
>>>> +zfsctl_snapshot_get_snapdir(vnode_t *vp, vnode_t **dvpp)
>>>> +{
>>>> +	gfs_dir_t *dp = vp->v_data;
>>>> +	*dvpp = dp->gfsd_file.gfs_parent;
>>>> +	zfsctl_snapdir_t *sdp;
>>>> +
>>>> +	VN_HOLD(*dvpp);
>>>> +	VOP_UNLOCK(vp, 0);
>>>> +	vn_lock(*dvpp, LK_SHARED | LK_RETRY | LK_CANRECURSE);
>>>> +	sdp = (*dvpp)->v_data;
>>>> +	VOP_UNLOCK(*dvpp, 0);
>>>> +
>>>> +	return (sdp);
>>>> +}
>>>> +
>>>>  #ifdef sun
>>>>  vnodeops_t *zfsctl_ops_root;
>>>>  vnodeops_t *zfsctl_ops_snapdir;
>>>> @@ -1013,6 +1032,8 @@ zfsctl_snapdir_lookup(ap)
>>>>  			 * The snapshot was unmounted behind our backs,
>>>>  			 * try to remount it.
>>>>  			 */
>>>> +			VOP_UNLOCK(*vpp, 0);
>>>> +			VN_HOLD(*vpp);
>>>>  			VERIFY(zfsctl_snapshot_zname(dvp, nm, MAXNAMELEN, 
>>>> snapname) == 0);
>>>>  			goto domount;
>>>>  		} else {
>>>> @@ -1064,7 +1085,6 @@ zfsctl_snapdir_lookup(ap)
>>>>  	sep->se_name = kmem_alloc(strlen(nm) + 1, KM_SLEEP);
>>>>  	(void) strcpy(sep->se_name, nm);
>>>>  	*vpp = sep->se_root = zfsctl_snapshot_mknode(dvp, 
>>>> dmu_objset_id(snap));
>>>> -	VN_HOLD(*vpp);
>>>>  	avl_insert(&sdp->sd_snaps, sep, where);
>>>>
>>>>  	dmu_objset_rele(snap, FTAG);
>>>> @@ -1075,6 +1095,7 @@ domount:
>>>>  	(void) snprintf(mountpoint, mountpoint_len,
>>>>  	    "%s/" ZFS_CTLDIR_NAME "/snapshot/%s",
>>>>  	    dvp->v_vfsp->mnt_stat.f_mntonname, nm);
>>>> +	VN_HOLD(*vpp);
>>>>  	err = mount_snapshot(curthread, vpp, "zfs", mountpoint, snapname, 0);
>>>>  	kmem_free(mountpoint, mountpoint_len);
>>>>  	if (err == 0) {
>>>> @@ -1464,16 +1485,18 @@ zfsctl_snapshot_inactive(ap)
>>>>  	int locked;
>>>>  	vnode_t *dvp;
>>>> 
>>>> -	if (vp->v_count > 0)
>>>> -		goto end;
>>>> -
>>>> -	VERIFY(gfs_dir_lookup(vp, "..", &dvp, cr, 0, NULL, NULL) == 0);
>>>> -	sdp = dvp->v_data;
>>>> -	VOP_UNLOCK(dvp, 0);
>>>> +	sdp = zfsctl_snapshot_get_snapdir(vp, &dvp);
>>>>
>>>>  	if (!(locked = MUTEX_HELD(&sdp->sd_lock)))
>>>>  		mutex_enter(&sdp->sd_lock);
>>>> 
>>>> +	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
>>>> +
>>>> +	if (vp->v_count > 0) {
>>>> +		mutex_exit(&sdp->sd_lock);
>>>> +		return (0);
>>>> +	}
>>>> +
>>>>  	ASSERT(!vn_ismntpt(vp));
>>>>
>>>>  	sep = avl_first(&sdp->sd_snaps);
>>>> @@ -1494,7 +1517,6 @@ zfsctl_snapshot_inactive(ap)
>>>>  		mutex_exit(&sdp->sd_lock);
>>>>  	VN_RELE(dvp);
>>>> 
>>>> -end:
>>>>  	/*
>>>>  	 * Dispose of the vnode for the snapshot mount point.
>>>>  	 * This is safe to do because once this entry has been removed
>>>> @@ -1595,20 +1617,17 @@ zfsctl_snapshot_lookup(ap)
>>>>  static int
>>>>  zfsctl_snapshot_vptocnp(struct vop_vptocnp_args *ap)
>>>>  {
>>>> -	zfsvfs_t *zfsvfs = ap->a_vp->v_vfsp->vfs_data;
>>>> -	vnode_t *dvp, *vp;
>>>> +	vnode_t *dvp, *vp = ap->a_vp;
>>>>  	zfsctl_snapdir_t *sdp;
>>>>  	zfs_snapentry_t *sep;
>>>> -	int error;
>>>> +	int error = 0;
>>>> 
>>>> -	ASSERT(zfsvfs->z_ctldir != NULL);
>>>> -	error = zfsctl_root_lookup(zfsvfs->z_ctldir, "snapshot", &dvp,
>>>> -	    NULL, 0, NULL, kcred, NULL, NULL, NULL);
>>>> -	if (error != 0)
>>>> -		return (error);
>>>> -	sdp = dvp->v_data;
>>>> +	sdp = zfsctl_snapshot_get_snapdir(vp, &dvp);
>>>>
>>>>  	mutex_enter(&sdp->sd_lock);
>>>> +
>>>> +	vn_lock(vp, LK_SHARED | LK_RETRY);
>>>> +
>>>>  	sep = avl_first(&sdp->sd_snaps);
>>>>  	while (sep != NULL) {
>>>>  		vp = sep->se_root;
>>>> --
>>>> 1.8.4.2
>>>> 
>>> 
>>> -- 
>>> Pawel Jakub Dawidek                       http://www.wheelsystems.com
>>> FreeBSD committer                         http://www.FreeBSD.org
>>> Am I Evil? Yes, I Am!                     http://mobter.com
>>> 
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>> 
>
From owner-freebsd-fs@FreeBSD.ORG  Tue Dec 24 20:35:37 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id B76355FD;
 Tue, 24 Dec 2013 20:35:37 +0000 (UTC)
Received: from krichy.tvnetwork.hu (unknown [IPv6:2a01:be00:0:2::10])
 by mx1.freebsd.org (Postfix) with ESMTP id 2838D1E6A;
 Tue, 24 Dec 2013 20:35:37 +0000 (UTC)
Received: by krichy.tvnetwork.hu (Postfix, from userid 1000)
 id 9B014A899; Tue, 24 Dec 2013 21:35:16 +0100 (CET)
Received: from localhost (localhost [127.0.0.1])
 by krichy.tvnetwork.hu (Postfix) with ESMTP id 97E3BA898;
 Tue, 24 Dec 2013 21:35:16 +0100 (CET)
Date: Tue, 24 Dec 2013 21:35:16 +0100 (CET)
From: krichy@tvnetwork.hu
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
In-Reply-To: <alpine.DEB.2.10.1312242126250.19813@krichy.tvnetwork.hu>
Message-ID: <alpine.DEB.2.10.1312242133240.20268@krichy.tvnetwork.hu>
References: <alpine.DEB.2.10.1312161647410.11599@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312171326520.7714@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312191629370.4344@krichy.tvnetwork.hu>
 <20131220134405.GE1658@garage.freebsd.pl>
 <alpine.DEB.2.10.1312231750030.31822@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312240706470.8128@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312242126250.19813@krichy.tvnetwork.hu>
User-Agent: Alpine 2.10 (DEB 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-fs@freebsd.org, avg@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 24 Dec 2013 20:35:37 -0000

Dear all,

Now a user caused deadlock simply caused buildkernel to be blocked, so I 
must correct myself in that this not only does a DoS for .zfs/snapshot 
directories, but may for the whole system.

Regards,

P.S.: Merry Christmas!

Kojedzinszky Richard
Euronet Magyarorszag Informatikai Zrt.

On Tue, 24 Dec 2013, krichy@tvnetwork.hu wrote:

> Date: Tue, 24 Dec 2013 21:28:39 +0100 (CET)
> From: krichy@tvnetwork.hu
> To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
> Cc: freebsd-fs@freebsd.org, avg@freebsd.org
> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
> 
> Dear all,
>
> Unfortunately, this seems to be serious, as a plain user can deadlock zfs 
> using the attached script, and can make a DoS against the mounted dataset's 
> .zfs/snapshot directory.
>
> Regards,
>
> Kojedzinszky Richard
> Euronet Magyarorszag Informatikai Zrt.
>
> On Tue, 24 Dec 2013, krichy@tvnetwork.hu wrote:
>
>> Date: Tue, 24 Dec 2013 07:10:05 +0100 (CET)
>> From: krichy@tvnetwork.hu
>> To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
>> Cc: freebsd-fs@freebsd.org, avg@freebsd.org
>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>> 
>> Dear devs,
>> 
>> A third one, again the snapshot dirs:
>> 
>> # cd /tmp/test/.zfs/snapshot && yes .. | xargs stat > /dev/null &
>> # yes /tmp/test/.zfs/snapshot/snap | xargs umount > /dev/null
>> 
>> Will deadlock in a few seconds.
>> 
>> The first holds a lock on .zfs/snapshot vnode, when looking up .. it tries 
>> to lock .zfs, while the other just tries to do this in reversed order.
>> 
>> In a vop_lookup, why is the passed in vnode, and the returned vnode must be 
>> locked? Why is not enough to have it held?
>> 
>> Thanks in advance,
>> Kojedzinszky Richard
>> Euronet Magyarorszag Informatikai Zrt.
>> 
>> On Mon, 23 Dec 2013, krichy@tvnetwork.hu wrote:
>> 
>>> Date: Mon, 23 Dec 2013 17:55:08 +0100 (CET)
>>> From: krichy@tvnetwork.hu
>>> To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
>>> Cc: freebsd-fs@freebsd.org, avg@freebsd.org
>>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>>> 
>>> Dear Pawel,
>>> 
>>> Thanks for your response. I hope someone will review my work.
>>> 
>>> Secondly, I think I;ve found another deadlock scenario:
>>> 
>>> When 'zfs send -R' finishes, in zfsdev_close() it has spa_namespace_lock 
>>> held. Then it does an unmount of all snapshots, which in turn goes to 
>>> zfsctl_snapshot_inactive(), in which sdp->sd_lock is being locked.
>>> 
>>> Concurrently, a zfsctl_snapdir_lookup(), holding the same sdp->sd_lock 
>>> does a snapshot mount, which somewhere tries to acquire 
>>> spa_namespace_lock. So it deadlocks. Currently I dont know how to 
>>> workaround this.
>>> 
>>> Is spa_namespace_lock need to be held in zfsctl_snapshot_inactive()?
>>> 
>>> What if we release it function enter, and re-acquire upon exit?
>>> 
>>> Thanks in advance,
>>> 
>>> 
>>> Kojedzinszky Richard
>>> Euronet Magyarorszag Informatikai Zrt.
>>> 
>>> On Fri, 20 Dec 2013, Pawel Jakub Dawidek wrote:
>>> 
>>>> Date: Fri, 20 Dec 2013 14:44:05 +0100
>>>> From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
>>>> To: krichy@tvnetwork.hu
>>>> Cc: freebsd-fs@freebsd.org
>>>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>>>> 
>>>> On Thu, Dec 19, 2013 at 04:46:11PM +0100, krichy@tvnetwork.hu wrote:
>>>>> Dear devs,
>>>>> 
>>>>> I am attaching a more clear patch/fix for my snapshot handling issues
>>>>> (0002), but I would be happy if some ZFS expert would comment it. I am
>>>>> trying to solve it at least for two weeks now, and an ACK or a NACK 
>>>>> would
>>>>> be nice from someone. Also a commit is reverted since it also caused
>>>>> deadlocks. I've read its comments, which also eliminates deadlocks, but 
>>>>> I
>>>>> did not find any reference how to produce that deadlock. In my view
>>>>> reverting that makes my issues disappear, but I dont know what new cases
>>>>> will it rise.
>>>> 
>>>> Richard,
>>>> 
>>>> I won't be able to analyse it myself anytime soon, because of other
>>>> obligations, but I forwarded you e-mail to the zfs-devel@ mailing list
>>>> (it is closed, but gathers FreeBSD ZFS devs). Hopefully someone from
>>>> there will be able to help you.
>>>> 
>>>>> I've rewritten traverse() to make more like upstream, added two extra
>>>>> VN_HOLD()s to snapdir_lookup() when traverse returned the same vnode 
>>>>> what
>>>>> was passed to it (I dont know even that upon creating a snapshot vnode 
>>>>> why
>>>>> is that extra two holds needed for the vnode.) And unfortunately, due to
>>>>> FreeBSD calls vop_inactive callbacks with vnodes locked, that could also
>>>>> cause deadlocks, so zfsctl_snapshot_inactive() and
>>>>> zfsctl_snapshot_vptocnp() has been rewritten to work that around.
>>>>> 
>>>>> After this, one may also get a deadlock, when a simple access would call
>>>>> into zfsctl_snapshot_lookup(). The documentation says, that those vnodes
>>>>> should always be covered, but after some stress test, sometimes we hit
>>>>> that call, and that can cause again deadlocks. In our environment I've
>>>>> just uncommented that callback, which returns ENODIR on some calls, but 
>>>>> at
>>>>> least not a deadlock.
>>>>> 
>>>>> The attached script can be used to reproduce my cases (would one confirm
>>>>> that?), and after the patches applied, they disappear (confirm?).
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> 
>>>>> Kojedzinszky Richard
>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>> 
>>>>> On Tue, 17 Dec 2013, krichy@tvnetwork.hu wrote:
>>>>> 
>>>>>> Date: Tue, 17 Dec 2013 14:50:16 +0100 (CET)
>>>>>> From: krichy@tvnetwork.hu
>>>>>> To: pjd@freebsd.org
>>>>>> Cc: freebsd-fs@freebsd.org
>>>>>> Subject: Re: kern/184677 (fwd)
>>>>>> 
>>>>>> Dear devs,
>>>>>> 
>>>>>> I will sum up my experience regarding the issue:
>>>>>> 
>>>>>> The sympton is that a concurrent 'zfs send -R' and some activity on the
>>>>>> snapshot dir (or in the snapshot) may cause a deadlock.
>>>>>> 
>>>>>> After investigating the problem, I found that zfs send umounts the 
>>>>>> snapshots,
>>>>>> and that causes the deadlock, so later I tested only with concurrent 
>>>>>> umount
>>>>>> and the "activity". More later I found that listing the snapshots in
>>>>>> .zfs/snapshot/ and unounting them can cause the found deadlock, so I 
>>>>>> used
>>>>>> them for the tests. But for my surprise, instead of a deadlock, a 
>>>>>> recursive
>>>>>> lock panic has arised.
>>>>>> 
>>>>>> The vnode for the ".zfs/snapshot/" directory contains zfs's 
>>>>>> zfsctl_snapdir_t
>>>>>> structure (sdp). This contains a tree of mounted snapshots, and each 
>>>>>> entry
>>>>>> (sep) contains the vnode of entry on which the snapshot is mounted on 
>>>>>> top
>>>>>> (se_root). The strange is that the se_root member does not hold a 
>>>>>> reference
>>>>>> for the vnode, just a simple pointer to it.
>>>>>> 
>>>>>> Upon entry lookup (zfsctl_snapdir_lookup()) the "snapshot" vnode is 
>>>>>> locked,
>>>>>> the zfsctl_snapdir_t's tree is locked, and searched for the mount if it
>>>>>> exists already. If it founds no entry, does the mount. In the case of 
>>>>>> an
>>>>>> entry was found, the se_root member contains the vnode which the 
>>>>>> snapshot is
>>>>>> mounted on. Thus, a reference is taken for it, and the traverse() call 
>>>>>> will
>>>>>> resolve to the real root vnode of the mounted snapshot, returning it as
>>>>>> locked. (Examining the traverse() code I've found that it did not 
>>>>>> follow
>>>>>> FreeBSD's lock order recommendation described in sys/kern/vfs_subr.c.)
>>>>>> 
>>>>>> On the other way, when an umount is issued, the se_root vnode looses 
>>>>>> its last
>>>>>> reference (as only the mountpoint holds one for it), it goes through 
>>>>>> the
>>>>>> vinactive() path, to zfsctl_snapshot_inactive(). In FreeBSD this is 
>>>>>> called
>>>>>> with a locked vnode, so this is a deadlock race condition. While
>>>>>> zfsctl_snapdir_lookup() holds the mutex for the sdp tree, and 
>>>>>> traverse()
>>>>>> tries to acquire the se_root, zfsctl_snapshot_inactive() holds the lock 
>>>>>> on
>>>>>> se_root while tries to access the sdp lock.
>>>>>> 
>>>>>> The zfsctl_snapshot_inactive() has an if statement checking the 
>>>>>> v_usecount,
>>>>>> which is incremented in zfsctl_snapdir_lookup(), but in that context it 
>>>>>> is
>>>>>> not covered by VI_LOCK. And it seems to me that FreeBSD's vinactive() 
>>>>>> path
>>>>>> assumes that the vnode remains inactive (as opposed to illumos, at 
>>>>>> least how
>>>>>> i read the code). So zfsctl_snapshot_inactive() must free resources 
>>>>>> while in
>>>>>> a locked state. I was a bit confused, and probably that is why the 
>>>>>> previously
>>>>>> posted patch is as is.
>>>>>> 
>>>>>> Maybe if I had some clues on the directions of this problem, I could 
>>>>>> have
>>>>>> worked more for a nicer, shorter solution.
>>>>>> 
>>>>>> Please someone comment on my post.
>>>>>> 
>>>>>> Regards,
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Kojedzinszky Richard
>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>> 
>>>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote:
>>>>>> 
>>>>>>> Date: Mon, 16 Dec 2013 16:52:16 +0100 (CET)
>>>>>>> From: krichy@tvnetwork.hu
>>>>>>> To: pjd@freebsd.org
>>>>>>> Cc: freebsd-fs@freebsd.org
>>>>>>> Subject: Re: kern/184677 (fwd)
>>>>>>> 
>>>>>>> Dear PJD,
>>>>>>> 
>>>>>>> I am a happy FreeBSD user, I am sure you've read my previous posts
>>>>>>> regarding some issues in ZFS. Please give some advice for me, where to 
>>>>>>> look
>>>>>>> for solutions, or how could I help to resolve those issues.
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Kojedzinszky Richard
>>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>> 
>>>>>>> ---------- Forwarded message ----------
>>>>>>> Date: Mon, 16 Dec 2013 15:23:06 +0100 (CET)
>>>>>>> From: krichy@tvnetwork.hu
>>>>>>> To: freebsd-fs@freebsd.org
>>>>>>> Subject: Re: kern/184677
>>>>>>> 
>>>>>>> 
>>>>>>> Seems that pjd did a change which eliminated the zfsdev_state_lock on 
>>>>>>> Fri
>>>>>>> Aug 12 07:04:16 2011 +0000, which might introduced a new deadlock
>>>>>>> situation. Any comments on this?
>>>>>>> 
>>>>>>> 
>>>>>>> Kojedzinszky Richard
>>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>> 
>>>>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote:
>>>>>>> 
>>>>>>>> Date: Mon, 16 Dec 2013 11:08:11 +0100 (CET)
>>>>>>>> From: krichy@tvnetwork.hu
>>>>>>>> To: freebsd-fs@freebsd.org
>>>>>>>> Subject: kern/184677
>>>>>>>> 
>>>>>>>> Dear devs,
>>>>>>>> 
>>>>>>>> I've attached a patch, which makes the recursive lockmgr disappear, 
>>>>>>>> and
>>>>>>>> makes the reported bug to disappear. I dont know if I followed any
>>>>>>>> guidelines well, or not, but at least it works for me. Please some
>>>>>>>> ZFS/FreeBSD fs expert review it, and fix it where it needed.
>>>>>>>> 
>>>>>>>> But unfortunately, my original problem is still not solved, maybe the 
>>>>>>>> same
>>>>>>>> as Ryan's:
>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-December/018707.html
>>>>>>>> 
>>>>>>>> Tracing the problem down is that zfsctl_snapdir_lookup() tries to 
>>>>>>>> acquire
>>>>>>>> spa_namespace_lock while when finishing a zfs send -R does a
>>>>>>>> zfsdev_close(), and that also holds the same mutex. And this causes a
>>>>>>>> deadlock scenario. I looked at illumos's code, and for some reason 
>>>>>>>> they
>>>>>>>> use another mutex on zfsdev_close(), which therefore may not deadlock 
>>>>>>>> with
>>>>>>>> zfsctl_snapdir_lookup(). But I am still investigating the problem.
>>>>>>>> 
>>>>>>>> I would like to help making ZFS more stable on freebsd also with its 
>>>>>>>> whole
>>>>>>>> functionality. I would be very thankful if some expert would give 
>>>>>>>> some
>>>>>>>> advice, how to solve these bugs. PJD, Steven, Xin?
>>>>>>>> 
>>>>>>>> Thanks in advance,
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Kojedzinszky Richard
>>>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>> _______________________________________________
>>>>>>> freebsd-fs@freebsd.org mailing list
>>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>>>> From 39298da838d006ad225e41529d7b7f240fccfe73 Mon Sep 17 00:00:00 2001
>>>>> From: Richard Kojedzinszky <krichy@cflinux.hu>
>>>>> Date: Mon, 16 Dec 2013 15:39:11 +0100
>>>>> Subject: [PATCH 1/2] Revert "Eliminate the zfsdev_state_lock entirely 
>>>>> and
>>>>>  replace it with the"
>>>>> 
>>>>> This reverts commit 1d8972b3f353f986eb5b85bc108b1c0d946d3218.
>>>>> 
>>>>> Conflicts:
>>>>> 	sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>>>> ---
>>>>>  .../opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h  |   1 +
>>>>>  .../opensolaris/uts/common/fs/zfs/vdev_geom.c      |  14 ++-
>>>>>  .../opensolaris/uts/common/fs/zfs/zfs_ioctl.c      |  16 +--
>>>>>  .../contrib/opensolaris/uts/common/fs/zfs/zvol.c   | 119 
>>>>> +++++++++------------
>>>>>  4 files changed, 70 insertions(+), 80 deletions(-)
>>>>> 
>>>>> diff --git 
>>>>> a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h 
>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h
>>>>> index af2def2..8272c4d 100644
>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h
>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h
>>>>> @@ -383,6 +383,7 @@ extern void *zfsdev_get_soft_state(minor_t minor,
>>>>>  extern minor_t zfsdev_minor_alloc(void);
>>>>>
>>>>>  extern void *zfsdev_state;
>>>>> +extern kmutex_t zfsdev_state_lock;
>>>>>
>>>>>  #endif	/* _KERNEL */
>>>>> 
>>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c 
>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>>>> index 15685a5..5c3e9f3 100644
>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>>>> @@ -581,7 +581,7 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t 
>>>>> *max_psize,
>>>>>  	struct g_provider *pp;
>>>>>  	struct g_consumer *cp;
>>>>>  	size_t bufsize;
>>>>> -	int error;
>>>>> +	int error, lock;
>>>>>
>>>>>  	/*
>>>>>  	 * We must have a pathname, and it must be absolute.
>>>>> @@ -593,6 +593,12 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, 
>>>>> uint64_t *max_psize,
>>>>>
>>>>>  	vd->vdev_tsd = NULL;
>>>>> 
>>>>> +	if (mutex_owned(&spa_namespace_lock)) {
>>>>> +		mutex_exit(&spa_namespace_lock);
>>>>> +		lock = 1;
>>>>> +	} else {
>>>>> +		lock = 0;
>>>>> +	}
>>>>>  	DROP_GIANT();
>>>>>  	g_topology_lock();
>>>>>  	error = 0;
>>>>> @@ -624,7 +630,11 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, 
>>>>> uint64_t *max_psize,
>>>>>  	    !ISP2(cp->provider->sectorsize)) {
>>>>>  		ZFS_LOG(1, "Provider %s has unsupported sectorsize.",
>>>>>  		    vd->vdev_path);
>>>>> +
>>>>> +		g_topology_lock();
>>>>>  		vdev_geom_detach(cp, 0);
>>>>> +		g_topology_unlock();
>>>>> +
>>>>>  		error = EINVAL;
>>>>>  		cp = NULL;
>>>>>  	} else if (cp->acw == 0 && (spa_mode(vd->vdev_spa) & FWRITE) != 0) {
>>>>> @@ -647,6 +657,8 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t 
>>>>> *max_psize,
>>>>>  	}
>>>>>  	g_topology_unlock();
>>>>>  	PICKUP_GIANT();
>>>>> +	if (lock)
>>>>> +		mutex_enter(&spa_namespace_lock);
>>>>>  	if (cp == NULL) {
>>>>>  		vd->vdev_stat.vs_aux = VDEV_AUX_OPEN_FAILED;
>>>>>  		return (error);
>>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c 
>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>>>> index e9fba26..91becde 100644
>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>>>> @@ -5635,7 +5635,7 @@ zfsdev_minor_alloc(void)
>>>>>  	static minor_t last_minor;
>>>>>  	minor_t m;
>>>>> 
>>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>
>>>>>  	for (m = last_minor + 1; m != last_minor; m++) {
>>>>>  		if (m > ZFSDEV_MAX_MINOR)
>>>>> @@ -5655,7 +5655,7 @@ zfs_ctldev_init(struct cdev *devp)
>>>>>  	minor_t minor;
>>>>>  	zfs_soft_state_t *zs;
>>>>> 
>>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>
>>>>>  	minor = zfsdev_minor_alloc();
>>>>>  	if (minor == 0)
>>>>> @@ -5676,7 +5676,7 @@ zfs_ctldev_init(struct cdev *devp)
>>>>>  static void
>>>>>  zfs_ctldev_destroy(zfs_onexit_t *zo, minor_t minor)
>>>>>  {
>>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>
>>>>>  	zfs_onexit_destroy(zo);
>>>>>  	ddi_soft_state_free(zfsdev_state, minor);
>>>>> @@ -5706,9 +5706,9 @@ zfsdev_open(struct cdev *devp, int flag, int mode, 
>>>>> struct thread *td)
>>>>>
>>>>>  	/* This is the control device. Allocate a new minor if requested. */
>>>>>  	if (flag & FEXCL) {
>>>>> -		mutex_enter(&spa_namespace_lock);
>>>>> +		mutex_enter(&zfsdev_state_lock);
>>>>>  		error = zfs_ctldev_init(devp);
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  	}
>>>>>
>>>>>  	return (error);
>>>>> @@ -5723,14 +5723,14 @@ zfsdev_close(void *data)
>>>>>  	if (minor == 0)
>>>>>  		return;
>>>>> 
>>>>> -	mutex_enter(&spa_namespace_lock);
>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>  	zo = zfsdev_get_soft_state(minor, ZSST_CTLDEV);
>>>>>  	if (zo == NULL) {
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return;
>>>>>  	}
>>>>>  	zfs_ctldev_destroy(zo, minor);
>>>>> -	mutex_exit(&spa_namespace_lock);
>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>  }
>>>>>
>>>>>  static int
>>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c 
>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>>>> index 72d4502..aec5219 100644
>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>>>> @@ -104,11 +104,12 @@ static char *zvol_tag = "zvol_tag";
>>>>>  #define	ZVOL_DUMPSIZE		"dumpsize"
>>>>>
>>>>>  /*
>>>>> - * The spa_namespace_lock protects the zfsdev_state structure from 
>>>>> being
>>>>> - * modified while it's being used, e.g. an open that comes in before a
>>>>> - * create finishes.  It also protects temporary opens of the dataset so 
>>>>> that,
>>>>> + * This lock protects the zfsdev_state structure from being modified
>>>>> + * while it's being used, e.g. an open that comes in before a create
>>>>> + * finishes.  It also protects temporary opens of the dataset so that,
>>>>>   * e.g., an open doesn't get a spurious EBUSY.
>>>>>   */
>>>>> +kmutex_t zfsdev_state_lock;
>>>>>  static uint32_t zvol_minors;
>>>>>
>>>>>  typedef struct zvol_extent {
>>>>> @@ -249,7 +250,7 @@ zvol_minor_lookup(const char *name)
>>>>>  	struct g_geom *gp;
>>>>>  	zvol_state_t *zv = NULL;
>>>>> 
>>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>
>>>>>  	g_topology_lock();
>>>>>  	LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) {
>>>>> @@ -465,11 +466,11 @@ zvol_name2minor(const char *name, minor_t *minor)
>>>>>  {
>>>>>  	zvol_state_t *zv;
>>>>> 
>>>>> -	mutex_enter(&spa_namespace_lock);
>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>  	zv = zvol_minor_lookup(name);
>>>>>  	if (minor && zv)
>>>>>  		*minor = zv->zv_minor;
>>>>> -	mutex_exit(&spa_namespace_lock);
>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>  	return (zv ? 0 : -1);
>>>>>  }
>>>>>  #endif	/* sun */
>>>>> @@ -489,10 +490,10 @@ zvol_create_minor(const char *name)
>>>>>
>>>>>  	ZFS_LOG(1, "Creating ZVOL %s...", name);
>>>>> 
>>>>> -	mutex_enter(&spa_namespace_lock);
>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>
>>>>>  	if (zvol_minor_lookup(name) != NULL) {
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return (SET_ERROR(EEXIST));
>>>>>  	}
>>>>> 
>>>>> @@ -500,20 +501,20 @@ zvol_create_minor(const char *name)
>>>>>  	error = dmu_objset_own(name, DMU_OST_ZVOL, B_TRUE, FTAG, &os);
>>>>>
>>>>>  	if (error) {
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return (error);
>>>>>  	}
>>>>>
>>>>>  #ifdef sun
>>>>>  	if ((minor = zfsdev_minor_alloc()) == 0) {
>>>>>  		dmu_objset_disown(os, FTAG);
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return (SET_ERROR(ENXIO));
>>>>>  	}
>>>>>
>>>>>  	if (ddi_soft_state_zalloc(zfsdev_state, minor) != DDI_SUCCESS) {
>>>>>  		dmu_objset_disown(os, FTAG);
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return (SET_ERROR(EAGAIN));
>>>>>  	}
>>>>>  	(void) ddi_prop_update_string(minor, zfs_dip, ZVOL_PROP_NAME,
>>>>> @@ -525,7 +526,7 @@ zvol_create_minor(const char *name)
>>>>>  	    minor, DDI_PSEUDO, 0) == DDI_FAILURE) {
>>>>>  		ddi_soft_state_free(zfsdev_state, minor);
>>>>>  		dmu_objset_disown(os, FTAG);
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return (SET_ERROR(EAGAIN));
>>>>>  	}
>>>>> 
>>>>> @@ -536,7 +537,7 @@ zvol_create_minor(const char *name)
>>>>>  		ddi_remove_minor_node(zfs_dip, chrbuf);
>>>>>  		ddi_soft_state_free(zfsdev_state, minor);
>>>>>  		dmu_objset_disown(os, FTAG);
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return (SET_ERROR(EAGAIN));
>>>>>  	}
>>>>> 
>>>>> @@ -587,7 +588,7 @@ zvol_create_minor(const char *name)
>>>>>
>>>>>  	zvol_minors++;
>>>>> 
>>>>> -	mutex_exit(&spa_namespace_lock);
>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>
>>>>>  	zvol_geom_run(zv);
>>>>> 
>>>>> @@ -609,7 +610,7 @@ zvol_remove_zv(zvol_state_t *zv)
>>>>>  	minor_t minor = zv->zv_minor;
>>>>>  #endif
>>>>> 
>>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>  	if (zv->zv_total_opens != 0)
>>>>>  		return (SET_ERROR(EBUSY));
>>>>> 
>>>>> @@ -635,15 +636,15 @@ zvol_remove_minor(const char *name)
>>>>>  	zvol_state_t *zv;
>>>>>  	int rc;
>>>>> 
>>>>> -	mutex_enter(&spa_namespace_lock);
>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>  	if ((zv = zvol_minor_lookup(name)) == NULL) {
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return (SET_ERROR(ENXIO));
>>>>>  	}
>>>>>  	g_topology_lock();
>>>>>  	rc = zvol_remove_zv(zv);
>>>>>  	g_topology_unlock();
>>>>> -	mutex_exit(&spa_namespace_lock);
>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>  	return (rc);
>>>>>  }
>>>>> 
>>>>> @@ -755,7 +756,7 @@ zvol_update_volsize(objset_t *os, uint64_t volsize)
>>>>>  	dmu_tx_t *tx;
>>>>>  	int error;
>>>>> 
>>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>
>>>>>  	tx = dmu_tx_create(os);
>>>>>  	dmu_tx_hold_zap(tx, ZVOL_ZAP_OBJ, TRUE, NULL);
>>>>> @@ -786,7 +787,7 @@ zvol_remove_minors(const char *name)
>>>>>  	namelen = strlen(name);
>>>>>
>>>>>  	DROP_GIANT();
>>>>> -	mutex_enter(&spa_namespace_lock);
>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>  	g_topology_lock();
>>>>>
>>>>>  	LIST_FOREACH_SAFE(gp, &zfs_zvol_class.geom, geom, gptmp) {
>>>>> @@ -804,7 +805,7 @@ zvol_remove_minors(const char *name)
>>>>>  	}
>>>>>
>>>>>  	g_topology_unlock();
>>>>> -	mutex_exit(&spa_namespace_lock);
>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>  	PICKUP_GIANT();
>>>>>  }
>>>>> 
>>>>> @@ -818,10 +819,10 @@ zvol_set_volsize(const char *name, major_t maj, 
>>>>> uint64_t volsize)
>>>>>  	uint64_t old_volsize = 0ULL;
>>>>>  	uint64_t readonly;
>>>>> 
>>>>> -	mutex_enter(&spa_namespace_lock);
>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>  	zv = zvol_minor_lookup(name);
>>>>>  	if ((error = dmu_objset_hold(name, FTAG, &os)) != 0) {
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return (error);
>>>>>  	}
>>>>> 
>>>>> @@ -888,7 +889,7 @@ zvol_set_volsize(const char *name, major_t maj, 
>>>>> uint64_t volsize)
>>>>>  out:
>>>>>  	dmu_objset_rele(os, FTAG);
>>>>> 
>>>>> -	mutex_exit(&spa_namespace_lock);
>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>
>>>>>  	return (error);
>>>>>  }
>>>>> @@ -899,36 +900,19 @@ zvol_open(struct g_provider *pp, int flag, int 
>>>>> count)
>>>>>  {
>>>>>  	zvol_state_t *zv;
>>>>>  	int err = 0;
>>>>> -	boolean_t locked = B_FALSE;
>>>>> 
>>>>> -	/*
>>>>> -	 * Protect against recursively entering spa_namespace_lock
>>>>> -	 * when spa_open() is used for a pool on a (local) ZVOL(s).
>>>>> -	 * This is needed since we replaced upstream zfsdev_state_lock
>>>>> -	 * with spa_namespace_lock in the ZVOL code.
>>>>> -	 * We are using the same trick as spa_open().
>>>>> -	 * Note that calls in zvol_first_open which need to resolve
>>>>> -	 * pool name to a spa object will enter spa_open()
>>>>> -	 * recursively, but that function already has all the
>>>>> -	 * necessary protection.
>>>>> -	 */
>>>>> -	if (!MUTEX_HELD(&spa_namespace_lock)) {
>>>>> -		mutex_enter(&spa_namespace_lock);
>>>>> -		locked = B_TRUE;
>>>>> -	}
>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>
>>>>>  	zv = pp->private;
>>>>>  	if (zv == NULL) {
>>>>> -		if (locked)
>>>>> -			mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return (SET_ERROR(ENXIO));
>>>>>  	}
>>>>>
>>>>>  	if (zv->zv_total_opens == 0)
>>>>>  		err = zvol_first_open(zv);
>>>>>  	if (err) {
>>>>> -		if (locked)
>>>>> -			mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return (err);
>>>>>  	}
>>>>>  	if ((flag & FWRITE) && (zv->zv_flags & ZVOL_RDONLY)) {
>>>>> @@ -950,15 +934,13 @@ zvol_open(struct g_provider *pp, int flag, int 
>>>>> count)
>>>>>  #endif
>>>>>
>>>>>  	zv->zv_total_opens += count;
>>>>> -	if (locked)
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>
>>>>>  	return (err);
>>>>>  out:
>>>>>  	if (zv->zv_total_opens == 0)
>>>>>  		zvol_last_close(zv);
>>>>> -	if (locked)
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>  	return (err);
>>>>>  }
>>>>> 
>>>>> @@ -968,18 +950,12 @@ zvol_close(struct g_provider *pp, int flag, int 
>>>>> count)
>>>>>  {
>>>>>  	zvol_state_t *zv;
>>>>>  	int error = 0;
>>>>> -	boolean_t locked = B_FALSE;
>>>>> 
>>>>> -	/* See comment in zvol_open(). */
>>>>> -	if (!MUTEX_HELD(&spa_namespace_lock)) {
>>>>> -		mutex_enter(&spa_namespace_lock);
>>>>> -		locked = B_TRUE;
>>>>> -	}
>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>
>>>>>  	zv = pp->private;
>>>>>  	if (zv == NULL) {
>>>>> -		if (locked)
>>>>> -			mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return (SET_ERROR(ENXIO));
>>>>>  	}
>>>>> 
>>>>> @@ -1002,8 +978,7 @@ zvol_close(struct g_provider *pp, int flag, int 
>>>>> count)
>>>>>  	if (zv->zv_total_opens == 0)
>>>>>  		zvol_last_close(zv);
>>>>> 
>>>>> -	if (locked)
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>  	return (error);
>>>>>  }
>>>>> 
>>>>> @@ -1658,12 +1633,12 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>>> flag, cred_t *cr, int *rvalp)
>>>>>  	int error = 0;
>>>>>  	rl_t *rl;
>>>>> 
>>>>> -	mutex_enter(&spa_namespace_lock);
>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>
>>>>>  	zv = zfsdev_get_soft_state(getminor(dev), ZSST_ZVOL);
>>>>>
>>>>>  	if (zv == NULL) {
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		return (SET_ERROR(ENXIO));
>>>>>  	}
>>>>>  	ASSERT(zv->zv_total_opens > 0);
>>>>> @@ -1677,7 +1652,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>>> flag, cred_t *cr, int *rvalp)
>>>>>  		dki.dki_ctype = DKC_UNKNOWN;
>>>>>  		dki.dki_unit = getminor(dev);
>>>>>  		dki.dki_maxtransfer = 1 << (SPA_MAXBLOCKSHIFT - 
>>>>> zv->zv_min_bs);
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		if (ddi_copyout(&dki, (void *)arg, sizeof (dki), flag))
>>>>>  			error = SET_ERROR(EFAULT);
>>>>>  		return (error);
>>>>> @@ -1687,7 +1662,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>>> flag, cred_t *cr, int *rvalp)
>>>>>  		dkm.dki_lbsize = 1U << zv->zv_min_bs;
>>>>>  		dkm.dki_capacity = zv->zv_volsize >> zv->zv_min_bs;
>>>>>  		dkm.dki_media_type = DK_UNKNOWN;
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		if (ddi_copyout(&dkm, (void *)arg, sizeof (dkm), flag))
>>>>>  			error = SET_ERROR(EFAULT);
>>>>>  		return (error);
>>>>> @@ -1697,14 +1672,14 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>>> flag, cred_t *cr, int *rvalp)
>>>>>  			uint64_t vs = zv->zv_volsize;
>>>>>  			uint8_t bs = zv->zv_min_bs;
>>>>> 
>>>>> -			mutex_exit(&spa_namespace_lock);
>>>>> +			mutex_exit(&zfsdev_state_lock);
>>>>>  			error = zvol_getefi((void *)arg, flag, vs, bs);
>>>>>  			return (error);
>>>>>  		}
>>>>>
>>>>>  	case DKIOCFLUSHWRITECACHE:
>>>>>  		dkc = (struct dk_callback *)arg;
>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>  		zil_commit(zv->zv_zilog, ZVOL_OBJ);
>>>>>  		if ((flag & FKIOCTL) && dkc != NULL && dkc->dkc_callback) {
>>>>>  			(*dkc->dkc_callback)(dkc->dkc_cookie, error);
>>>>> @@ -1730,10 +1705,10 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>>> flag, cred_t *cr, int *rvalp)
>>>>>  			}
>>>>>  			if (wce) {
>>>>>  				zv->zv_flags |= ZVOL_WCE;
>>>>> -				mutex_exit(&spa_namespace_lock);
>>>>> +				mutex_exit(&zfsdev_state_lock);
>>>>>  			} else {
>>>>>  				zv->zv_flags &= ~ZVOL_WCE;
>>>>> -				mutex_exit(&spa_namespace_lock);
>>>>> +				mutex_exit(&zfsdev_state_lock);
>>>>>  				zil_commit(zv->zv_zilog, ZVOL_OBJ);
>>>>>  			}
>>>>>  			return (0);
>>>>> @@ -1828,7 +1803,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>>> flag, cred_t *cr, int *rvalp)
>>>>>  		break;
>>>>>
>>>>>  	}
>>>>> -	mutex_exit(&spa_namespace_lock);
>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>  	return (error);
>>>>>  }
>>>>>  #endif	/* sun */
>>>>> @@ -1844,12 +1819,14 @@ zvol_init(void)
>>>>>  {
>>>>>  	VERIFY(ddi_soft_state_init(&zfsdev_state, sizeof (zfs_soft_state_t),
>>>>>  	    1) == 0);
>>>>> +	mutex_init(&zfsdev_state_lock, NULL, MUTEX_DEFAULT, NULL);
>>>>>  	ZFS_LOG(1, "ZVOL Initialized.");
>>>>>  }
>>>>>
>>>>>  void
>>>>>  zvol_fini(void)
>>>>>  {
>>>>> +	mutex_destroy(&zfsdev_state_lock);
>>>>>  	ddi_soft_state_fini(&zfsdev_state);
>>>>>  	ZFS_LOG(1, "ZVOL Deinitialized.");
>>>>>  }
>>>>> @@ -1889,7 +1866,7 @@ zvol_dump_init(zvol_state_t *zv, boolean_t resize)
>>>>>  	uint64_t version = spa_version(spa);
>>>>>  	enum zio_checksum checksum;
>>>>> 
>>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>  	ASSERT(vd->vdev_ops == &vdev_root_ops);
>>>>>
>>>>>  	error = dmu_free_long_range(zv->zv_objset, ZVOL_OBJ, 0,
>>>>> @@ -2437,7 +2414,7 @@ zvol_rename_minor(struct g_geom *gp, const char 
>>>>> *newname)
>>>>>  	struct g_provider *pp;
>>>>>  	zvol_state_t *zv;
>>>>> 
>>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>  	g_topology_assert();
>>>>>
>>>>>  	pp = LIST_FIRST(&gp->provider);
>>>>> @@ -2471,7 +2448,7 @@ zvol_rename_minors(const char *oldname, const char 
>>>>> *newname)
>>>>>  	newnamelen = strlen(newname);
>>>>>
>>>>>  	DROP_GIANT();
>>>>> -	mutex_enter(&spa_namespace_lock);
>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>  	g_topology_lock();
>>>>>
>>>>>  	LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) {
>>>>> @@ -2494,6 +2471,6 @@ zvol_rename_minors(const char *oldname, const char 
>>>>> *newname)
>>>>>  	}
>>>>>
>>>>>  	g_topology_unlock();
>>>>> -	mutex_exit(&spa_namespace_lock);
>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>  	PICKUP_GIANT();
>>>>>  }
>>>>> --
>>>>> 1.8.4.2
>>>>> 
>>>> 
>>>>> From 57d5a383b585c32c77af54e8cdacaddf8ce7584f Mon Sep 17 00:00:00 2001
>>>>> From: Richard Kojedzinszky <krichy@cflinux.hu>
>>>>> Date: Wed, 18 Dec 2013 22:11:21 +0100
>>>>> Subject: [PATCH 2/2] ZFS snapshot handling fix
>>>>> 
>>>>> ---
>>>>>  .../compat/opensolaris/kern/opensolaris_lookup.c   | 13 +++---
>>>>>  .../opensolaris/uts/common/fs/zfs/zfs_ctldir.c     | 53 
>>>>> +++++++++++++++-------
>>>>>  2 files changed, 42 insertions(+), 24 deletions(-)
>>>>> 
>>>>> diff --git a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c 
>>>>> b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>>> index 94383d6..4cac053 100644
>>>>> --- a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>>> +++ b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>>> @@ -81,6 +81,8 @@ traverse(vnode_t **cvpp, int lktype)
>>>>>  	 * progress on this vnode.
>>>>>  	 */
>>>>> 
>>>>> +	vn_lock(cvp, lktype);
>>>>> +
>>>>>  	for (;;) {
>>>>>  		/*
>>>>>  		 * Reached the end of the mount chain?
>>>>> @@ -89,13 +91,7 @@ traverse(vnode_t **cvpp, int lktype)
>>>>>  		if (vfsp == NULL)
>>>>>  			break;
>>>>>  		error = vfs_busy(vfsp, 0);
>>>>> -		/*
>>>>> -		 * tvp is NULL for *cvpp vnode, which we can't unlock.
>>>>> -		 */
>>>>> -		if (tvp != NULL)
>>>>> -			vput(cvp);
>>>>> -		else
>>>>> -			vrele(cvp);
>>>>> +		VOP_UNLOCK(cvp, 0);
>>>>>  		if (error)
>>>>>  			return (error);
>>>>> 
>>>>> @@ -107,6 +103,9 @@ traverse(vnode_t **cvpp, int lktype)
>>>>>  		vfs_unbusy(vfsp);
>>>>>  		if (error != 0)
>>>>>  			return (error);
>>>>> +
>>>>> +		VN_RELE(cvp);
>>>>> +
>>>>>  		cvp = tvp;
>>>>>  	}
>>>>> 
>>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c 
>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>>> index 28ab1fa..d3464b4 100644
>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>>> @@ -112,6 +112,25 @@ snapentry_compare(const void *a, const void *b)
>>>>>  		return (0);
>>>>>  }
>>>>> 
>>>>> +/* Return the zfsctl_snapdir_t object from current vnode, following
>>>>> + * the lock orders in zfsctl_snapdir_lookup() to avoid deadlocks.
>>>>> + * On return the passed in vp is unlocked */
>>>>> +static zfsctl_snapdir_t *
>>>>> +zfsctl_snapshot_get_snapdir(vnode_t *vp, vnode_t **dvpp)
>>>>> +{
>>>>> +	gfs_dir_t *dp = vp->v_data;
>>>>> +	*dvpp = dp->gfsd_file.gfs_parent;
>>>>> +	zfsctl_snapdir_t *sdp;
>>>>> +
>>>>> +	VN_HOLD(*dvpp);
>>>>> +	VOP_UNLOCK(vp, 0);
>>>>> +	vn_lock(*dvpp, LK_SHARED | LK_RETRY | LK_CANRECURSE);
>>>>> +	sdp = (*dvpp)->v_data;
>>>>> +	VOP_UNLOCK(*dvpp, 0);
>>>>> +
>>>>> +	return (sdp);
>>>>> +}
>>>>> +
>>>>>  #ifdef sun
>>>>>  vnodeops_t *zfsctl_ops_root;
>>>>>  vnodeops_t *zfsctl_ops_snapdir;
>>>>> @@ -1013,6 +1032,8 @@ zfsctl_snapdir_lookup(ap)
>>>>>  			 * The snapshot was unmounted behind our backs,
>>>>>  			 * try to remount it.
>>>>>  			 */
>>>>> +			VOP_UNLOCK(*vpp, 0);
>>>>> +			VN_HOLD(*vpp);
>>>>>  			VERIFY(zfsctl_snapshot_zname(dvp, nm, MAXNAMELEN, 
>>>>> snapname) == 0);
>>>>>  			goto domount;
>>>>>  		} else {
>>>>> @@ -1064,7 +1085,6 @@ zfsctl_snapdir_lookup(ap)
>>>>>  	sep->se_name = kmem_alloc(strlen(nm) + 1, KM_SLEEP);
>>>>>  	(void) strcpy(sep->se_name, nm);
>>>>>  	*vpp = sep->se_root = zfsctl_snapshot_mknode(dvp, 
>>>>> dmu_objset_id(snap));
>>>>> -	VN_HOLD(*vpp);
>>>>>  	avl_insert(&sdp->sd_snaps, sep, where);
>>>>>
>>>>>  	dmu_objset_rele(snap, FTAG);
>>>>> @@ -1075,6 +1095,7 @@ domount:
>>>>>  	(void) snprintf(mountpoint, mountpoint_len,
>>>>>  	    "%s/" ZFS_CTLDIR_NAME "/snapshot/%s",
>>>>>  	    dvp->v_vfsp->mnt_stat.f_mntonname, nm);
>>>>> +	VN_HOLD(*vpp);
>>>>>  	err = mount_snapshot(curthread, vpp, "zfs", mountpoint, snapname, 0);
>>>>>  	kmem_free(mountpoint, mountpoint_len);
>>>>>  	if (err == 0) {
>>>>> @@ -1464,16 +1485,18 @@ zfsctl_snapshot_inactive(ap)
>>>>>  	int locked;
>>>>>  	vnode_t *dvp;
>>>>> 
>>>>> -	if (vp->v_count > 0)
>>>>> -		goto end;
>>>>> -
>>>>> -	VERIFY(gfs_dir_lookup(vp, "..", &dvp, cr, 0, NULL, NULL) == 0);
>>>>> -	sdp = dvp->v_data;
>>>>> -	VOP_UNLOCK(dvp, 0);
>>>>> +	sdp = zfsctl_snapshot_get_snapdir(vp, &dvp);
>>>>>
>>>>>  	if (!(locked = MUTEX_HELD(&sdp->sd_lock)))
>>>>>  		mutex_enter(&sdp->sd_lock);
>>>>> 
>>>>> +	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
>>>>> +
>>>>> +	if (vp->v_count > 0) {
>>>>> +		mutex_exit(&sdp->sd_lock);
>>>>> +		return (0);
>>>>> +	}
>>>>> +
>>>>>  	ASSERT(!vn_ismntpt(vp));
>>>>>
>>>>>  	sep = avl_first(&sdp->sd_snaps);
>>>>> @@ -1494,7 +1517,6 @@ zfsctl_snapshot_inactive(ap)
>>>>>  		mutex_exit(&sdp->sd_lock);
>>>>>  	VN_RELE(dvp);
>>>>> 
>>>>> -end:
>>>>>  	/*
>>>>>  	 * Dispose of the vnode for the snapshot mount point.
>>>>>  	 * This is safe to do because once this entry has been removed
>>>>> @@ -1595,20 +1617,17 @@ zfsctl_snapshot_lookup(ap)
>>>>>  static int
>>>>>  zfsctl_snapshot_vptocnp(struct vop_vptocnp_args *ap)
>>>>>  {
>>>>> -	zfsvfs_t *zfsvfs = ap->a_vp->v_vfsp->vfs_data;
>>>>> -	vnode_t *dvp, *vp;
>>>>> +	vnode_t *dvp, *vp = ap->a_vp;
>>>>>  	zfsctl_snapdir_t *sdp;
>>>>>  	zfs_snapentry_t *sep;
>>>>> -	int error;
>>>>> +	int error = 0;
>>>>> 
>>>>> -	ASSERT(zfsvfs->z_ctldir != NULL);
>>>>> -	error = zfsctl_root_lookup(zfsvfs->z_ctldir, "snapshot", &dvp,
>>>>> -	    NULL, 0, NULL, kcred, NULL, NULL, NULL);
>>>>> -	if (error != 0)
>>>>> -		return (error);
>>>>> -	sdp = dvp->v_data;
>>>>> +	sdp = zfsctl_snapshot_get_snapdir(vp, &dvp);
>>>>>
>>>>>  	mutex_enter(&sdp->sd_lock);
>>>>> +
>>>>> +	vn_lock(vp, LK_SHARED | LK_RETRY);
>>>>> +
>>>>>  	sep = avl_first(&sdp->sd_snaps);
>>>>>  	while (sep != NULL) {
>>>>>  		vp = sep->se_root;
>>>>> --
>>>>> 1.8.4.2
>>>>> 
>>>> 
>>>> -- 
>>>> Pawel Jakub Dawidek                       http://www.wheelsystems.com
>>>> FreeBSD committer                         http://www.FreeBSD.org
>>>> Am I Evil? Yes, I Am!                     http://mobter.com
>>>> 
>>> _______________________________________________
>>> freebsd-fs@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>> 
>

From owner-freebsd-fs@FreeBSD.ORG  Tue Dec 24 22:39:34 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id D1A4777B
 for <freebsd-fs@freebsd.org>; Tue, 24 Dec 2013 22:39:34 +0000 (UTC)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
 [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 824211633
 for <freebsd-fs@freebsd.org>; Tue, 24 Dec 2013 22:39:34 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: 
X-IronPort-AV: E=Sophos;i="4.95,545,1384318800"; d="scan'208";a="83072814"
Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
 ([131.104.91.222])
 by esa-jnhn.mail.uoguelph.ca with ESMTP; 24 Dec 2013 17:39:33 -0500
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
 by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id ED422B4055;
 Tue, 24 Dec 2013 17:39:32 -0500 (EST)
Date: Tue, 24 Dec 2013 17:39:32 -0500 (EST)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Jason Keltz <jas@cse.yorku.ca>
Message-ID: <1566997692.1186754.1387924772956.JavaMail.root@uoguelph.ca>
In-Reply-To: <52A7E53D.8000002@cse.yorku.ca>
Subject: Re: mount ZFS snapshot on Linux system
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.203]
X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790)
Cc: FreeBSD Filesystems <freebsd-fs@freebsd.org>,
 Steve Dickson <SteveD@redhat.com>
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 24 Dec 2013 22:39:34 -0000

Jason Keltz wrote:
> On 10/12/2013 7:21 PM, Rick Macklem wrote:
> > Jason Keltz wrote:
> >> I'm running FreeBSD 9.2 with various ZFS datasets.
> >> I export a dataset to a Linux system (RHEL64), and mount it.  It
> >> works
> >> fine...
> >> When I try to access the ZFS snapshot directory on the Linux NFS
> >> client,
> >> things go weird.
> >>
Thanks to Jason's testing, I have just committed the following patch
  http://people.freebsd.org/~rmacklem/nfsv4-zfs-snapshot.patch
to head as r259845, which I think fixes the problem. If things go
smoothly, I'll MFC it to 9/stable and 10/stable in 1 week.

Hopefully this is now resolved, rick

> >> With NFSv4:
> >>
> >> [jas@archive /]# cd /mnt/.zfs/snapshot
> >> [jas@archive snapshot]# ls
> >> 20131203  20131205  20131206  20131207  20131208  20131209
> >>  20131210
> >> [jas@archive snapshot]# cd 20131210
> >> 20131210: Not a directory.
> >>
> >> huh?
> >>
> >> [jas@archive snapshot]# ls -al
> >> total 77
> >> dr-xr-xr-x   9 root root   9 Dec 10 11:20 .
> >> dr-xr-xr-x   4 root root   4 Nov 28 15:42 ..
> >> drwxr-xr-x 380 root root 380 Dec  2 15:56 20131203
> >> drwxr-xr-x 381 root root 381 Dec  3 11:24 20131205
> >> drwxr-xr-x 381 root root 381 Dec  3 11:24 20131206
> >> drwxr-xr-x 381 root root 381 Dec  3 11:24 20131207
> >> drwxr-xr-x 381 root root 381 Dec  3 11:24 20131208
> >> drwxr-xr-x 381 root root 381 Dec  3 11:24 20131209
> >> drwxr-xr-x 381 root root 381 Dec  3 11:24 20131210
> >> [jas@archive snapshot]# stat *
> >> [jas@archive snapshot]# ls -al
> >> total 292
> >> dr-xr-xr-x 9 root      root         9 Dec 10 11:20 .
> >> dr-xr-xr-x 4 root      root         4 Nov 28 15:42 ..
> >> -rw-r--r-- 1 uax    guest   137647 Mar 17  2010 20131203
> >> -rw-r--r-- 1 uax    guest         865 Jul 31  2009 20131205
> >> -rw-r--r-- 1 uax    guest   137647 Mar 17  2010 20131206
> >> -rw-r--r-- 1 uax    guest         771 Jul 31  2009 20131207
> >> -rw-r--r-- 1 uax    guest         778 Jul 31  2009 20131208
> >> -rw-r--r-- 1 uax     guest       5281 Jul 31  2009 20131209
> >> -rw------- 1 btx      faculty      893 Jul 13 20:21 20131210
> >>
> >> But it gets even more fun..
> >>
> >> # ls -ali
> >> total 205
> >>     2 dr-xr-xr-x   9 root      root       9 Dec 10 11:20 .
> >>     1 dr-xr-xr-x   4 root      root       4 Nov 28 15:42 ..
> >> 863 -rw-r--r--   1 uax     guest 137647 Mar 17  2010 20131203
> >>     4 drwxr-xr-x 381 root      root     381 Dec  3 11:24 20131205
> >>     4 drwxr-xr-x 381 root      root     381 Dec  3 11:24 20131206
> >>     4 drwxr-xr-x 381 root      root     381 Dec  3 11:24 20131207
> >>     4 drwxr-xr-x 381 root      root     381 Dec  3 11:24 20131208
> >>     4 drwxr-xr-x 381 root      root     381 Dec  3 11:24 20131209
> >>     4 drwxr-xr-x 381 root      root     381 Dec  3 11:24 20131210
> >>
> >> This is not a user id mapping issue because all the files in /mnt
> >> have
> >> the proper owner/groups, and I can access them there fine.
> >>
> >> I also tried explicitly exporting .zfs/snapshot.  The result isn't
> >> any
> >> different.
> >>
> >> If I use nfs v3 it "works", but I'm seeing a whole lot of errors
> >> like
> >> these in syslog:
> >>
> >> Dec 10 12:32:28 jungle mountd[49579]: can't delete exports for
> >> /local/backup/home9/.zfs/snapshot/20131203: Invalid argument
> >> Dec 10 12:32:28 jungle mountd[49579]: can't delete exports for
> >> /local/backup/home9/.zfs/snapshot/20131209: Invalid argument
> >> Dec 10 12:32:28 jungle mountd[49579]: can't delete exports for
> >> /local/backup/home9/.zfs/snapshot/20131210: Invalid argument
> >> Dec 10 12:32:28 jungle mountd[49579]: can't delete exports for
> >> /local/backup/home9/.zfs/snapshot/20131207: Invalid argument
> >>
> >> It's not clear to me why this doesn't just "work".
> >>
> >> Can anyone provide any advice on debugging this?
> >>
> > As I think you already know, I know nothing about ZFS and never
> > use it.
> Yup! :)
> > Having said that, I suspect that there are filenos (i-node #s)
> > that are the same in the snapshot as in the parent file system
> > tree.
> >
> > The basic assumptions are:
> > - within a file system, all i-node# are unique (represent one file
> >    object only) and all file objects have the same fsid
> > - when the fsid changes, that indicates a file system boundary and
> >    fileno (i-node#s) can be reused in the subtree with a different
> >    fsid
> >
> > For NFSv3, the server should export single volumes only (all
> > objects
> > have the same fsid and the filenos are unique). This is indicated
> > to
> > the VFS by the use of the NOCROSSMOUNT flag on VOP_LOOKUP() and
> > friends.
> >
> > For NFSv4, the server does export multiple volumes and the boundary
> > is indicated by a change in fsid value.
> >
> > I suspect ZFS snaphots don't obey the above in some way, but that
> > is
> > just a hunch.
> >
> > Now, how to narrow this down...
> > - Do the above tests (both NFSv4 and NFSv3) and capture the
> > packets,
> >    then look at them in wireshark. In particular, look at the
> >    fileid numbers
> >    and fsid values for the various directories under .zfs.
> 
> I gave this a shot, but I haven't used wireshark to capture NFS
> traffic
> before, so if I need to provide additional details, let me know..
> 
> NFSv4:
> 
> For /mnt/.zfs/snapshot/20131203:
> fileid=4
> fsid4.major=1446349656
> fsid4.minor=222
> 
> For /mnt/.zfs/snapshot/20131205:
> fileid=4
> fsid4.major=1845998066
> fsid4.minor=222
> 
> For /mnt/jas:
> fileid=144
> fsid4.major=597946950
> fsid4.minor=222
> 
> For /mnt/jas1:
> fileid=338
> fsid4.major=597946950
> fsid4.minor=222
> 
> So fsid is the same for all the different "data" directories, which
> is
> what I would expect given what you said.  I  guess each snapshot is
> seen
> as a unique filesystem...  but then a repeating inode in different
> filesystems shouldn't be a problem...
> 
> NFSv3:
> 
> For /mnt/.zfs/snapshot/20131203:
> fileid=4
> fsid=0x0000000056358b58
> 
> For /mnt/.zfs/snapshot/20131205:
> fileid=4
> fsid=0x000000006e07b1f2
> 
> For /mnt/jas
> fileid=144
> fsid=0x0000000023a3f246
> 
> For /mnt/jas1:
> fileid=338
> fsid=0x0000000023a3f246
> 
> Here, it seems it's the same, even though it's NFSv3... hmm.
> 
> 
> > - Try mounting the individual snapshot directory, like
> >     .zfs/snapshot/20131209 and see if that works (for both NFSv3
> >     and NFSv4).
> 
> Hmm .. I tried this:
> 
> /local/backup/home9/.zfs/snapshot/20131203  -ro
> archive-mrpriv.cs.yorku.ca
> V4: /
> 
> ... but syslog reports:
> 
> Dec 10 22:28:22 jungle mountd[85405]: can't export
> /local/backup/home9/.zfs/snapshot/20131203
> 
> ... and of course I can't mount from either v3/v4.
> 
> On the other hand, I kept it as:
> 
> /local/backup/home9 -ro archive-mrpriv.cs.yorku.ca
> V4:/
> 
> ... and was able to NFSv4 mount
> /local/backup/home9/.zfs/snapshot/20131203, and this does indeed
> work.
> 
> > - Try doing the mounts with a FreeBSD client and see if you get the
> > same
> >    behaviour?
> I found this:
> http://forums.freenas.org/threads/mounting-snapshot-directory-using-nfs-from-linux-broken.6060/
> .. implies it will work from FreeBSD/Nexenta, just not Linux.
> Found this as well:
> https://groups.google.com/a/zfsonlinux.org/forum/#!topic/zfs-discuss/lKyfYsjPMNM
> 
> Jason.
> 
> 

From owner-freebsd-fs@FreeBSD.ORG  Wed Dec 25 05:23:01 2013
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 0D852433;
 Wed, 25 Dec 2013 05:23:01 +0000 (UTC)
Received: from krichy.tvnetwork.hu (unknown [IPv6:2a01:be00:0:2::10])
 by mx1.freebsd.org (Postfix) with ESMTP id 5C0D01233;
 Wed, 25 Dec 2013 05:22:59 +0000 (UTC)
Received: by krichy.tvnetwork.hu (Postfix, from userid 1000)
 id 8880EA943; Wed, 25 Dec 2013 06:22:37 +0100 (CET)
Received: from localhost (localhost [127.0.0.1])
 by krichy.tvnetwork.hu (Postfix) with ESMTP id 82BD7A942;
 Wed, 25 Dec 2013 06:22:37 +0100 (CET)
Date: Wed, 25 Dec 2013 06:22:37 +0100 (CET)
From: krichy@tvnetwork.hu
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
In-Reply-To: <alpine.DEB.2.10.1312242133240.20268@krichy.tvnetwork.hu>
Message-ID: <alpine.DEB.2.10.1312250614001.2785@krichy.tvnetwork.hu>
References: <alpine.DEB.2.10.1312161647410.11599@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312171326520.7714@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312191629370.4344@krichy.tvnetwork.hu>
 <20131220134405.GE1658@garage.freebsd.pl>
 <alpine.DEB.2.10.1312231750030.31822@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312240706470.8128@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312242126250.19813@krichy.tvnetwork.hu>
 <alpine.DEB.2.10.1312242133240.20268@krichy.tvnetwork.hu>
User-Agent: Alpine 2.10 (DEB 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY="1030603365-641758273-1387948957=:2785"
Cc: freebsd-fs@freebsd.org, avg@freebsd.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs/>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
 <mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 25 Dec 2013 05:23:01 -0000

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

--1030603365-641758273-1387948957=:2785
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

Dear all,

I've made a new patch again, which fixes most of my earlier issues. 
Mainly, I've enabled shared vnode locks for GFS vnodes - is it acceptable? 
And that way, deadlock cases reduced much, and also the patch is more 
clear (at least to me). One thing still remains, the spa_namespace_lock 
race I mentioned before, I dont know how to handle that.

Waiting for comments on this.

Regards,


Kojedzinszky Richard
Euronet Magyarorszag Informatikai Zrt.

On Tue, 24 Dec 2013, krichy@tvnetwork.hu wrote:

> Date: Tue, 24 Dec 2013 21:35:16 +0100 (CET)
> From: krichy@tvnetwork.hu
> To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
> Cc: freebsd-fs@freebsd.org, avg@freebsd.org
> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
> 
> Dear all,
>
> Now a user caused deadlock simply caused buildkernel to be blocked, so I must 
> correct myself in that this not only does a DoS for .zfs/snapshot 
> directories, but may for the whole system.
>
> Regards,
>
> P.S.: Merry Christmas!
>
> Kojedzinszky Richard
> Euronet Magyarorszag Informatikai Zrt.
>
> On Tue, 24 Dec 2013, krichy@tvnetwork.hu wrote:
>
>> Date: Tue, 24 Dec 2013 21:28:39 +0100 (CET)
>> From: krichy@tvnetwork.hu
>> To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
>> Cc: freebsd-fs@freebsd.org, avg@freebsd.org
>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>> 
>> Dear all,
>> 
>> Unfortunately, this seems to be serious, as a plain user can deadlock zfs 
>> using the attached script, and can make a DoS against the mounted dataset's 
>> .zfs/snapshot directory.
>> 
>> Regards,
>> 
>> Kojedzinszky Richard
>> Euronet Magyarorszag Informatikai Zrt.
>> 
>> On Tue, 24 Dec 2013, krichy@tvnetwork.hu wrote:
>> 
>>> Date: Tue, 24 Dec 2013 07:10:05 +0100 (CET)
>>> From: krichy@tvnetwork.hu
>>> To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
>>> Cc: freebsd-fs@freebsd.org, avg@freebsd.org
>>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>>> 
>>> Dear devs,
>>> 
>>> A third one, again the snapshot dirs:
>>> 
>>> # cd /tmp/test/.zfs/snapshot && yes .. | xargs stat > /dev/null &
>>> # yes /tmp/test/.zfs/snapshot/snap | xargs umount > /dev/null
>>> 
>>> Will deadlock in a few seconds.
>>> 
>>> The first holds a lock on .zfs/snapshot vnode, when looking up .. it tries 
>>> to lock .zfs, while the other just tries to do this in reversed order.
>>> 
>>> In a vop_lookup, why is the passed in vnode, and the returned vnode must 
>>> be locked? Why is not enough to have it held?
>>> 
>>> Thanks in advance,
>>> Kojedzinszky Richard
>>> Euronet Magyarorszag Informatikai Zrt.
>>> 
>>> On Mon, 23 Dec 2013, krichy@tvnetwork.hu wrote:
>>> 
>>>> Date: Mon, 23 Dec 2013 17:55:08 +0100 (CET)
>>>> From: krichy@tvnetwork.hu
>>>> To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
>>>> Cc: freebsd-fs@freebsd.org, avg@freebsd.org
>>>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>>>> 
>>>> Dear Pawel,
>>>> 
>>>> Thanks for your response. I hope someone will review my work.
>>>> 
>>>> Secondly, I think I;ve found another deadlock scenario:
>>>> 
>>>> When 'zfs send -R' finishes, in zfsdev_close() it has spa_namespace_lock 
>>>> held. Then it does an unmount of all snapshots, which in turn goes to 
>>>> zfsctl_snapshot_inactive(), in which sdp->sd_lock is being locked.
>>>> 
>>>> Concurrently, a zfsctl_snapdir_lookup(), holding the same sdp->sd_lock 
>>>> does a snapshot mount, which somewhere tries to acquire 
>>>> spa_namespace_lock. So it deadlocks. Currently I dont know how to 
>>>> workaround this.
>>>> 
>>>> Is spa_namespace_lock need to be held in zfsctl_snapshot_inactive()?
>>>> 
>>>> What if we release it function enter, and re-acquire upon exit?
>>>> 
>>>> Thanks in advance,
>>>> 
>>>> 
>>>> Kojedzinszky Richard
>>>> Euronet Magyarorszag Informatikai Zrt.
>>>> 
>>>> On Fri, 20 Dec 2013, Pawel Jakub Dawidek wrote:
>>>> 
>>>>> Date: Fri, 20 Dec 2013 14:44:05 +0100
>>>>> From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
>>>>> To: krichy@tvnetwork.hu
>>>>> Cc: freebsd-fs@freebsd.org
>>>>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>>>>> 
>>>>> On Thu, Dec 19, 2013 at 04:46:11PM +0100, krichy@tvnetwork.hu wrote:
>>>>>> Dear devs,
>>>>>> 
>>>>>> I am attaching a more clear patch/fix for my snapshot handling issues
>>>>>> (0002), but I would be happy if some ZFS expert would comment it. I am
>>>>>> trying to solve it at least for two weeks now, and an ACK or a NACK 
>>>>>> would
>>>>>> be nice from someone. Also a commit is reverted since it also caused
>>>>>> deadlocks. I've read its comments, which also eliminates deadlocks, but 
>>>>>> I
>>>>>> did not find any reference how to produce that deadlock. In my view
>>>>>> reverting that makes my issues disappear, but I dont know what new 
>>>>>> cases
>>>>>> will it rise.
>>>>> 
>>>>> Richard,
>>>>> 
>>>>> I won't be able to analyse it myself anytime soon, because of other
>>>>> obligations, but I forwarded you e-mail to the zfs-devel@ mailing list
>>>>> (it is closed, but gathers FreeBSD ZFS devs). Hopefully someone from
>>>>> there will be able to help you.
>>>>> 
>>>>>> I've rewritten traverse() to make more like upstream, added two extra
>>>>>> VN_HOLD()s to snapdir_lookup() when traverse returned the same vnode 
>>>>>> what
>>>>>> was passed to it (I dont know even that upon creating a snapshot vnode 
>>>>>> why
>>>>>> is that extra two holds needed for the vnode.) And unfortunately, due 
>>>>>> to
>>>>>> FreeBSD calls vop_inactive callbacks with vnodes locked, that could 
>>>>>> also
>>>>>> cause deadlocks, so zfsctl_snapshot_inactive() and
>>>>>> zfsctl_snapshot_vptocnp() has been rewritten to work that around.
>>>>>> 
>>>>>> After this, one may also get a deadlock, when a simple access would 
>>>>>> call
>>>>>> into zfsctl_snapshot_lookup(). The documentation says, that those 
>>>>>> vnodes
>>>>>> should always be covered, but after some stress test, sometimes we hit
>>>>>> that call, and that can cause again deadlocks. In our environment I've
>>>>>> just uncommented that callback, which returns ENODIR on some calls, but 
>>>>>> at
>>>>>> least not a deadlock.
>>>>>> 
>>>>>> The attached script can be used to reproduce my cases (would one 
>>>>>> confirm
>>>>>> that?), and after the patches applied, they disappear (confirm?).
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> 
>>>>>> Kojedzinszky Richard
>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>> 
>>>>>> On Tue, 17 Dec 2013, krichy@tvnetwork.hu wrote:
>>>>>> 
>>>>>>> Date: Tue, 17 Dec 2013 14:50:16 +0100 (CET)
>>>>>>> From: krichy@tvnetwork.hu
>>>>>>> To: pjd@freebsd.org
>>>>>>> Cc: freebsd-fs@freebsd.org
>>>>>>> Subject: Re: kern/184677 (fwd)
>>>>>>> 
>>>>>>> Dear devs,
>>>>>>> 
>>>>>>> I will sum up my experience regarding the issue:
>>>>>>> 
>>>>>>> The sympton is that a concurrent 'zfs send -R' and some activity on 
>>>>>>> the
>>>>>>> snapshot dir (or in the snapshot) may cause a deadlock.
>>>>>>> 
>>>>>>> After investigating the problem, I found that zfs send umounts the 
>>>>>>> snapshots,
>>>>>>> and that causes the deadlock, so later I tested only with concurrent 
>>>>>>> umount
>>>>>>> and the "activity". More later I found that listing the snapshots in
>>>>>>> .zfs/snapshot/ and unounting them can cause the found deadlock, so I 
>>>>>>> used
>>>>>>> them for the tests. But for my surprise, instead of a deadlock, a 
>>>>>>> recursive
>>>>>>> lock panic has arised.
>>>>>>> 
>>>>>>> The vnode for the ".zfs/snapshot/" directory contains zfs's 
>>>>>>> zfsctl_snapdir_t
>>>>>>> structure (sdp). This contains a tree of mounted snapshots, and each 
>>>>>>> entry
>>>>>>> (sep) contains the vnode of entry on which the snapshot is mounted on 
>>>>>>> top
>>>>>>> (se_root). The strange is that the se_root member does not hold a 
>>>>>>> reference
>>>>>>> for the vnode, just a simple pointer to it.
>>>>>>> 
>>>>>>> Upon entry lookup (zfsctl_snapdir_lookup()) the "snapshot" vnode is 
>>>>>>> locked,
>>>>>>> the zfsctl_snapdir_t's tree is locked, and searched for the mount if 
>>>>>>> it
>>>>>>> exists already. If it founds no entry, does the mount. In the case of 
>>>>>>> an
>>>>>>> entry was found, the se_root member contains the vnode which the 
>>>>>>> snapshot is
>>>>>>> mounted on. Thus, a reference is taken for it, and the traverse() call 
>>>>>>> will
>>>>>>> resolve to the real root vnode of the mounted snapshot, returning it 
>>>>>>> as
>>>>>>> locked. (Examining the traverse() code I've found that it did not 
>>>>>>> follow
>>>>>>> FreeBSD's lock order recommendation described in sys/kern/vfs_subr.c.)
>>>>>>> 
>>>>>>> On the other way, when an umount is issued, the se_root vnode looses 
>>>>>>> its last
>>>>>>> reference (as only the mountpoint holds one for it), it goes through 
>>>>>>> the
>>>>>>> vinactive() path, to zfsctl_snapshot_inactive(). In FreeBSD this is 
>>>>>>> called
>>>>>>> with a locked vnode, so this is a deadlock race condition. While
>>>>>>> zfsctl_snapdir_lookup() holds the mutex for the sdp tree, and 
>>>>>>> traverse()
>>>>>>> tries to acquire the se_root, zfsctl_snapshot_inactive() holds the 
>>>>>>> lock on
>>>>>>> se_root while tries to access the sdp lock.
>>>>>>> 
>>>>>>> The zfsctl_snapshot_inactive() has an if statement checking the 
>>>>>>> v_usecount,
>>>>>>> which is incremented in zfsctl_snapdir_lookup(), but in that context 
>>>>>>> it is
>>>>>>> not covered by VI_LOCK. And it seems to me that FreeBSD's vinactive() 
>>>>>>> path
>>>>>>> assumes that the vnode remains inactive (as opposed to illumos, at 
>>>>>>> least how
>>>>>>> i read the code). So zfsctl_snapshot_inactive() must free resources 
>>>>>>> while in
>>>>>>> a locked state. I was a bit confused, and probably that is why the 
>>>>>>> previously
>>>>>>> posted patch is as is.
>>>>>>> 
>>>>>>> Maybe if I had some clues on the directions of this problem, I could 
>>>>>>> have
>>>>>>> worked more for a nicer, shorter solution.
>>>>>>> 
>>>>>>> Please someone comment on my post.
>>>>>>> 
>>>>>>> Regards,
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Kojedzinszky Richard
>>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>> 
>>>>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote:
>>>>>>> 
>>>>>>>> Date: Mon, 16 Dec 2013 16:52:16 +0100 (CET)
>>>>>>>> From: krichy@tvnetwork.hu
>>>>>>>> To: pjd@freebsd.org
>>>>>>>> Cc: freebsd-fs@freebsd.org
>>>>>>>> Subject: Re: kern/184677 (fwd)
>>>>>>>> 
>>>>>>>> Dear PJD,
>>>>>>>> 
>>>>>>>> I am a happy FreeBSD user, I am sure you've read my previous posts
>>>>>>>> regarding some issues in ZFS. Please give some advice for me, where 
>>>>>>>> to look
>>>>>>>> for solutions, or how could I help to resolve those issues.
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Kojedzinszky Richard
>>>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>>> 
>>>>>>>> ---------- Forwarded message ----------
>>>>>>>> Date: Mon, 16 Dec 2013 15:23:06 +0100 (CET)
>>>>>>>> From: krichy@tvnetwork.hu
>>>>>>>> To: freebsd-fs@freebsd.org
>>>>>>>> Subject: Re: kern/184677
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Seems that pjd did a change which eliminated the zfsdev_state_lock on 
>>>>>>>> Fri
>>>>>>>> Aug 12 07:04:16 2011 +0000, which might introduced a new deadlock
>>>>>>>> situation. Any comments on this?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Kojedzinszky Richard
>>>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>>> 
>>>>>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote:
>>>>>>>> 
>>>>>>>>> Date: Mon, 16 Dec 2013 11:08:11 +0100 (CET)
>>>>>>>>> From: krichy@tvnetwork.hu
>>>>>>>>> To: freebsd-fs@freebsd.org
>>>>>>>>> Subject: kern/184677
>>>>>>>>> 
>>>>>>>>> Dear devs,
>>>>>>>>> 
>>>>>>>>> I've attached a patch, which makes the recursive lockmgr disappear, 
>>>>>>>>> and
>>>>>>>>> makes the reported bug to disappear. I dont know if I followed any
>>>>>>>>> guidelines well, or not, but at least it works for me. Please some
>>>>>>>>> ZFS/FreeBSD fs expert review it, and fix it where it needed.
>>>>>>>>> 
>>>>>>>>> But unfortunately, my original problem is still not solved, maybe 
>>>>>>>>> the same
>>>>>>>>> as Ryan's:
>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-December/018707.html
>>>>>>>>> 
>>>>>>>>> Tracing the problem down is that zfsctl_snapdir_lookup() tries to 
>>>>>>>>> acquire
>>>>>>>>> spa_namespace_lock while when finishing a zfs send -R does a
>>>>>>>>> zfsdev_close(), and that also holds the same mutex. And this causes 
>>>>>>>>> a
>>>>>>>>> deadlock scenario. I looked at illumos's code, and for some reason 
>>>>>>>>> they
>>>>>>>>> use another mutex on zfsdev_close(), which therefore may not 
>>>>>>>>> deadlock with
>>>>>>>>> zfsctl_snapdir_lookup(). But I am still investigating the problem.
>>>>>>>>> 
>>>>>>>>> I would like to help making ZFS more stable on freebsd also with its 
>>>>>>>>> whole
>>>>>>>>> functionality. I would be very thankful if some expert would give 
>>>>>>>>> some
>>>>>>>>> advice, how to solve these bugs. PJD, Steven, Xin?
>>>>>>>>> 
>>>>>>>>> Thanks in advance,
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Kojedzinszky Richard
>>>>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>>> _______________________________________________
>>>>>>>> freebsd-fs@freebsd.org mailing list
>>>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>>>>> From 39298da838d006ad225e41529d7b7f240fccfe73 Mon Sep 17 00:00:00 2001
>>>>>> From: Richard Kojedzinszky <krichy@cflinux.hu>
>>>>>> Date: Mon, 16 Dec 2013 15:39:11 +0100
>>>>>> Subject: [PATCH 1/2] Revert "Eliminate the zfsdev_state_lock entirely 
>>>>>> and
>>>>>>  replace it with the"
>>>>>> 
>>>>>> This reverts commit 1d8972b3f353f986eb5b85bc108b1c0d946d3218.
>>>>>> 
>>>>>> Conflicts:
>>>>>> 	sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>>>>> ---
>>>>>>  .../opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h  |   1 +
>>>>>>  .../opensolaris/uts/common/fs/zfs/vdev_geom.c      |  14 ++-
>>>>>>  .../opensolaris/uts/common/fs/zfs/zfs_ioctl.c      |  16 +--
>>>>>>  .../contrib/opensolaris/uts/common/fs/zfs/zvol.c   | 119 
>>>>>> +++++++++------------
>>>>>>  4 files changed, 70 insertions(+), 80 deletions(-)
>>>>>> 
>>>>>> diff --git 
>>>>>> a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h 
>>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h
>>>>>> index af2def2..8272c4d 100644
>>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h
>>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h
>>>>>> @@ -383,6 +383,7 @@ extern void *zfsdev_get_soft_state(minor_t minor,
>>>>>>  extern minor_t zfsdev_minor_alloc(void);
>>>>>>
>>>>>>  extern void *zfsdev_state;
>>>>>> +extern kmutex_t zfsdev_state_lock;
>>>>>>
>>>>>>  #endif	/* _KERNEL */
>>>>>> 
>>>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c 
>>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>>>>> index 15685a5..5c3e9f3 100644
>>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
>>>>>> @@ -581,7 +581,7 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, 
>>>>>> uint64_t *max_psize,
>>>>>>  	struct g_provider *pp;
>>>>>>  	struct g_consumer *cp;
>>>>>>  	size_t bufsize;
>>>>>> -	int error;
>>>>>> +	int error, lock;
>>>>>>
>>>>>>  	/*
>>>>>>  	 * We must have a pathname, and it must be absolute.
>>>>>> @@ -593,6 +593,12 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, 
>>>>>> uint64_t *max_psize,
>>>>>>
>>>>>>  	vd->vdev_tsd = NULL;
>>>>>> 
>>>>>> +	if (mutex_owned(&spa_namespace_lock)) {
>>>>>> +		mutex_exit(&spa_namespace_lock);
>>>>>> +		lock = 1;
>>>>>> +	} else {
>>>>>> +		lock = 0;
>>>>>> +	}
>>>>>>  	DROP_GIANT();
>>>>>>  	g_topology_lock();
>>>>>>  	error = 0;
>>>>>> @@ -624,7 +630,11 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, 
>>>>>> uint64_t *max_psize,
>>>>>>  	    !ISP2(cp->provider->sectorsize)) {
>>>>>>  		ZFS_LOG(1, "Provider %s has unsupported sectorsize.",
>>>>>>  		    vd->vdev_path);
>>>>>> +
>>>>>> +		g_topology_lock();
>>>>>>  		vdev_geom_detach(cp, 0);
>>>>>> +		g_topology_unlock();
>>>>>> +
>>>>>>  		error = EINVAL;
>>>>>>  		cp = NULL;
>>>>>>  	} else if (cp->acw == 0 && (spa_mode(vd->vdev_spa) & FWRITE) != 0) {
>>>>>> @@ -647,6 +657,8 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, 
>>>>>> uint64_t *max_psize,
>>>>>>  	}
>>>>>>  	g_topology_unlock();
>>>>>>  	PICKUP_GIANT();
>>>>>> +	if (lock)
>>>>>> +		mutex_enter(&spa_namespace_lock);
>>>>>>  	if (cp == NULL) {
>>>>>>  		vd->vdev_stat.vs_aux = VDEV_AUX_OPEN_FAILED;
>>>>>>  		return (error);
>>>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c 
>>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>>>>> index e9fba26..91becde 100644
>>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
>>>>>> @@ -5635,7 +5635,7 @@ zfsdev_minor_alloc(void)
>>>>>>  	static minor_t last_minor;
>>>>>>  	minor_t m;
>>>>>> 
>>>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>>
>>>>>>  	for (m = last_minor + 1; m != last_minor; m++) {
>>>>>>  		if (m > ZFSDEV_MAX_MINOR)
>>>>>> @@ -5655,7 +5655,7 @@ zfs_ctldev_init(struct cdev *devp)
>>>>>>  	minor_t minor;
>>>>>>  	zfs_soft_state_t *zs;
>>>>>> 
>>>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>>
>>>>>>  	minor = zfsdev_minor_alloc();
>>>>>>  	if (minor == 0)
>>>>>> @@ -5676,7 +5676,7 @@ zfs_ctldev_init(struct cdev *devp)
>>>>>>  static void
>>>>>>  zfs_ctldev_destroy(zfs_onexit_t *zo, minor_t minor)
>>>>>>  {
>>>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>>
>>>>>>  	zfs_onexit_destroy(zo);
>>>>>>  	ddi_soft_state_free(zfsdev_state, minor);
>>>>>> @@ -5706,9 +5706,9 @@ zfsdev_open(struct cdev *devp, int flag, int 
>>>>>> mode, struct thread *td)
>>>>>>
>>>>>>  	/* This is the control device. Allocate a new minor if requested. */
>>>>>>  	if (flag & FEXCL) {
>>>>>> -		mutex_enter(&spa_namespace_lock);
>>>>>> +		mutex_enter(&zfsdev_state_lock);
>>>>>>  		error = zfs_ctldev_init(devp);
>>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>>  	}
>>>>>>
>>>>>>  	return (error);
>>>>>> @@ -5723,14 +5723,14 @@ zfsdev_close(void *data)
>>>>>>  	if (minor == 0)
>>>>>>  		return;
>>>>>> 
>>>>>> -	mutex_enter(&spa_namespace_lock);
>>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>>  	zo = zfsdev_get_soft_state(minor, ZSST_CTLDEV);
>>>>>>  	if (zo == NULL) {
>>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>>  		return;
>>>>>>  	}
>>>>>>  	zfs_ctldev_destroy(zo, minor);
>>>>>> -	mutex_exit(&spa_namespace_lock);
>>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>>  }
>>>>>>
>>>>>>  static int
>>>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c 
>>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>>>>> index 72d4502..aec5219 100644
>>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c
>>>>>> @@ -104,11 +104,12 @@ static char *zvol_tag = "zvol_tag";
>>>>>>  #define	ZVOL_DUMPSIZE		"dumpsize"
>>>>>>
>>>>>>  /*
>>>>>> - * The spa_namespace_lock protects the zfsdev_state structure from 
>>>>>> being
>>>>>> - * modified while it's being used, e.g. an open that comes in before a
>>>>>> - * create finishes.  It also protects temporary opens of the dataset 
>>>>>> so that,
>>>>>> + * This lock protects the zfsdev_state structure from being modified
>>>>>> + * while it's being used, e.g. an open that comes in before a create
>>>>>> + * finishes.  It also protects temporary opens of the dataset so that,
>>>>>>   * e.g., an open doesn't get a spurious EBUSY.
>>>>>>   */
>>>>>> +kmutex_t zfsdev_state_lock;
>>>>>>  static uint32_t zvol_minors;
>>>>>>
>>>>>>  typedef struct zvol_extent {
>>>>>> @@ -249,7 +250,7 @@ zvol_minor_lookup(const char *name)
>>>>>>  	struct g_geom *gp;
>>>>>>  	zvol_state_t *zv = NULL;
>>>>>> 
>>>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>>
>>>>>>  	g_topology_lock();
>>>>>>  	LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) {
>>>>>> @@ -465,11 +466,11 @@ zvol_name2minor(const char *name, minor_t *minor)
>>>>>>  {
>>>>>>  	zvol_state_t *zv;
>>>>>> 
>>>>>> -	mutex_enter(&spa_namespace_lock);
>>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>>  	zv = zvol_minor_lookup(name);
>>>>>>  	if (minor && zv)
>>>>>>  		*minor = zv->zv_minor;
>>>>>> -	mutex_exit(&spa_namespace_lock);
>>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>>  	return (zv ? 0 : -1);
>>>>>>  }
>>>>>>  #endif	/* sun */
>>>>>> @@ -489,10 +490,10 @@ zvol_create_minor(const char *name)
>>>>>>
>>>>>>  	ZFS_LOG(1, "Creating ZVOL %s...", name);
>>>>>> 
>>>>>> -	mutex_enter(&spa_namespace_lock);
>>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>>
>>>>>>  	if (zvol_minor_lookup(name) != NULL) {
>>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>>  		return (SET_ERROR(EEXIST));
>>>>>>  	}
>>>>>> 
>>>>>> @@ -500,20 +501,20 @@ zvol_create_minor(const char *name)
>>>>>>  	error = dmu_objset_own(name, DMU_OST_ZVOL, B_TRUE, FTAG, &os);
>>>>>>
>>>>>>  	if (error) {
>>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>>  		return (error);
>>>>>>  	}
>>>>>>
>>>>>>  #ifdef sun
>>>>>>  	if ((minor = zfsdev_minor_alloc()) == 0) {
>>>>>>  		dmu_objset_disown(os, FTAG);
>>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>>  		return (SET_ERROR(ENXIO));
>>>>>>  	}
>>>>>>
>>>>>>  	if (ddi_soft_state_zalloc(zfsdev_state, minor) != DDI_SUCCESS) {
>>>>>>  		dmu_objset_disown(os, FTAG);
>>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>>  		return (SET_ERROR(EAGAIN));
>>>>>>  	}
>>>>>>  	(void) ddi_prop_update_string(minor, zfs_dip, ZVOL_PROP_NAME,
>>>>>> @@ -525,7 +526,7 @@ zvol_create_minor(const char *name)
>>>>>>  	    minor, DDI_PSEUDO, 0) == DDI_FAILURE) {
>>>>>>  		ddi_soft_state_free(zfsdev_state, minor);
>>>>>>  		dmu_objset_disown(os, FTAG);
>>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>>  		return (SET_ERROR(EAGAIN));
>>>>>>  	}
>>>>>> 
>>>>>> @@ -536,7 +537,7 @@ zvol_create_minor(const char *name)
>>>>>>  		ddi_remove_minor_node(zfs_dip, chrbuf);
>>>>>>  		ddi_soft_state_free(zfsdev_state, minor);
>>>>>>  		dmu_objset_disown(os, FTAG);
>>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>>  		return (SET_ERROR(EAGAIN));
>>>>>>  	}
>>>>>> 
>>>>>> @@ -587,7 +588,7 @@ zvol_create_minor(const char *name)
>>>>>>
>>>>>>  	zvol_minors++;
>>>>>> 
>>>>>> -	mutex_exit(&spa_namespace_lock);
>>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>>
>>>>>>  	zvol_geom_run(zv);
>>>>>> 
>>>>>> @@ -609,7 +610,7 @@ zvol_remove_zv(zvol_state_t *zv)
>>>>>>  	minor_t minor = zv->zv_minor;
>>>>>>  #endif
>>>>>> 
>>>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>>  	if (zv->zv_total_opens != 0)
>>>>>>  		return (SET_ERROR(EBUSY));
>>>>>> 
>>>>>> @@ -635,15 +636,15 @@ zvol_remove_minor(const char *name)
>>>>>>  	zvol_state_t *zv;
>>>>>>  	int rc;
>>>>>> 
>>>>>> -	mutex_enter(&spa_namespace_lock);
>>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>>  	if ((zv = zvol_minor_lookup(name)) == NULL) {
>>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>>  		return (SET_ERROR(ENXIO));
>>>>>>  	}
>>>>>>  	g_topology_lock();
>>>>>>  	rc = zvol_remove_zv(zv);
>>>>>>  	g_topology_unlock();
>>>>>> -	mutex_exit(&spa_namespace_lock);
>>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>>  	return (rc);
>>>>>>  }
>>>>>> 
>>>>>> @@ -755,7 +756,7 @@ zvol_update_volsize(objset_t *os, uint64_t volsize)
>>>>>>  	dmu_tx_t *tx;
>>>>>>  	int error;
>>>>>> 
>>>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>>
>>>>>>  	tx = dmu_tx_create(os);
>>>>>>  	dmu_tx_hold_zap(tx, ZVOL_ZAP_OBJ, TRUE, NULL);
>>>>>> @@ -786,7 +787,7 @@ zvol_remove_minors(const char *name)
>>>>>>  	namelen = strlen(name);
>>>>>>
>>>>>>  	DROP_GIANT();
>>>>>> -	mutex_enter(&spa_namespace_lock);
>>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>>  	g_topology_lock();
>>>>>>
>>>>>>  	LIST_FOREACH_SAFE(gp, &zfs_zvol_class.geom, geom, gptmp) {
>>>>>> @@ -804,7 +805,7 @@ zvol_remove_minors(const char *name)
>>>>>>  	}
>>>>>>
>>>>>>  	g_topology_unlock();
>>>>>> -	mutex_exit(&spa_namespace_lock);
>>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>>  	PICKUP_GIANT();
>>>>>>  }
>>>>>> 
>>>>>> @@ -818,10 +819,10 @@ zvol_set_volsize(const char *name, major_t maj, 
>>>>>> uint64_t volsize)
>>>>>>  	uint64_t old_volsize = 0ULL;
>>>>>>  	uint64_t readonly;
>>>>>> 
>>>>>> -	mutex_enter(&spa_namespace_lock);
>>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>>  	zv = zvol_minor_lookup(name);
>>>>>>  	if ((error = dmu_objset_hold(name, FTAG, &os)) != 0) {
>>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>>  		return (error);
>>>>>>  	}
>>>>>> 
>>>>>> @@ -888,7 +889,7 @@ zvol_set_volsize(const char *name, major_t maj, 
>>>>>> uint64_t volsize)
>>>>>>  out:
>>>>>>  	dmu_objset_rele(os, FTAG);
>>>>>> 
>>>>>> -	mutex_exit(&spa_namespace_lock);
>>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>>
>>>>>>  	return (error);
>>>>>>  }
>>>>>> @@ -899,36 +900,19 @@ zvol_open(struct g_provider *pp, int flag, int 
>>>>>> count)
>>>>>>  {
>>>>>>  	zvol_state_t *zv;
>>>>>>  	int err = 0;
>>>>>> -	boolean_t locked = B_FALSE;
>>>>>> 
>>>>>> -	/*
>>>>>> -	 * Protect against recursively entering spa_namespace_lock
>>>>>> -	 * when spa_open() is used for a pool on a (local) ZVOL(s).
>>>>>> -	 * This is needed since we replaced upstream zfsdev_state_lock
>>>>>> -	 * with spa_namespace_lock in the ZVOL code.
>>>>>> -	 * We are using the same trick as spa_open().
>>>>>> -	 * Note that calls in zvol_first_open which need to resolve
>>>>>> -	 * pool name to a spa object will enter spa_open()
>>>>>> -	 * recursively, but that function already has all the
>>>>>> -	 * necessary protection.
>>>>>> -	 */
>>>>>> -	if (!MUTEX_HELD(&spa_namespace_lock)) {
>>>>>> -		mutex_enter(&spa_namespace_lock);
>>>>>> -		locked = B_TRUE;
>>>>>> -	}
>>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>>
>>>>>>  	zv = pp->private;
>>>>>>  	if (zv == NULL) {
>>>>>> -		if (locked)
>>>>>> -			mutex_exit(&spa_namespace_lock);
>>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>>  		return (SET_ERROR(ENXIO));
>>>>>>  	}
>>>>>>
>>>>>>  	if (zv->zv_total_opens == 0)
>>>>>>  		err = zvol_first_open(zv);
>>>>>>  	if (err) {
>>>>>> -		if (locked)
>>>>>> -			mutex_exit(&spa_namespace_lock);
>>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>>  		return (err);
>>>>>>  	}
>>>>>>  	if ((flag & FWRITE) && (zv->zv_flags & ZVOL_RDONLY)) {
>>>>>> @@ -950,15 +934,13 @@ zvol_open(struct g_provider *pp, int flag, int 
>>>>>> count)
>>>>>>  #endif
>>>>>>
>>>>>>  	zv->zv_total_opens += count;
>>>>>> -	if (locked)
>>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>>
>>>>>>  	return (err);
>>>>>>  out:
>>>>>>  	if (zv->zv_total_opens == 0)
>>>>>>  		zvol_last_close(zv);
>>>>>> -	if (locked)
>>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>>  	return (err);
>>>>>>  }
>>>>>> 
>>>>>> @@ -968,18 +950,12 @@ zvol_close(struct g_provider *pp, int flag, int 
>>>>>> count)
>>>>>>  {
>>>>>>  	zvol_state_t *zv;
>>>>>>  	int error = 0;
>>>>>> -	boolean_t locked = B_FALSE;
>>>>>> 
>>>>>> -	/* See comment in zvol_open(). */
>>>>>> -	if (!MUTEX_HELD(&spa_namespace_lock)) {
>>>>>> -		mutex_enter(&spa_namespace_lock);
>>>>>> -		locked = B_TRUE;
>>>>>> -	}
>>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>>
>>>>>>  	zv = pp->private;
>>>>>>  	if (zv == NULL) {
>>>>>> -		if (locked)
>>>>>> -			mutex_exit(&spa_namespace_lock);
>>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>>  		return (SET_ERROR(ENXIO));
>>>>>>  	}
>>>>>> 
>>>>>> @@ -1002,8 +978,7 @@ zvol_close(struct g_provider *pp, int flag, int 
>>>>>> count)
>>>>>>  	if (zv->zv_total_opens == 0)
>>>>>>  		zvol_last_close(zv);
>>>>>> 
>>>>>> -	if (locked)
>>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>>  	return (error);
>>>>>>  }
>>>>>> 
>>>>>> @@ -1658,12 +1633,12 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, 
>>>>>> int flag, cred_t *cr, int *rvalp)
>>>>>>  	int error = 0;
>>>>>>  	rl_t *rl;
>>>>>> 
>>>>>> -	mutex_enter(&spa_namespace_lock);
>>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>>
>>>>>>  	zv = zfsdev_get_soft_state(getminor(dev), ZSST_ZVOL);
>>>>>>
>>>>>>  	if (zv == NULL) {
>>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>>  		return (SET_ERROR(ENXIO));
>>>>>>  	}
>>>>>>  	ASSERT(zv->zv_total_opens > 0);
>>>>>> @@ -1677,7 +1652,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>>>> flag, cred_t *cr, int *rvalp)
>>>>>>  		dki.dki_ctype = DKC_UNKNOWN;
>>>>>>  		dki.dki_unit = getminor(dev);
>>>>>>  		dki.dki_maxtransfer = 1 << (SPA_MAXBLOCKSHIFT - 
>>>>>> zv->zv_min_bs);
>>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>>  		if (ddi_copyout(&dki, (void *)arg, sizeof (dki), flag))
>>>>>>  			error = SET_ERROR(EFAULT);
>>>>>>  		return (error);
>>>>>> @@ -1687,7 +1662,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>>>> flag, cred_t *cr, int *rvalp)
>>>>>>  		dkm.dki_lbsize = 1U << zv->zv_min_bs;
>>>>>>  		dkm.dki_capacity = zv->zv_volsize >> zv->zv_min_bs;
>>>>>>  		dkm.dki_media_type = DK_UNKNOWN;
>>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>>  		if (ddi_copyout(&dkm, (void *)arg, sizeof (dkm), flag))
>>>>>>  			error = SET_ERROR(EFAULT);
>>>>>>  		return (error);
>>>>>> @@ -1697,14 +1672,14 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, 
>>>>>> int flag, cred_t *cr, int *rvalp)
>>>>>>  			uint64_t vs = zv->zv_volsize;
>>>>>>  			uint8_t bs = zv->zv_min_bs;
>>>>>> 
>>>>>> -			mutex_exit(&spa_namespace_lock);
>>>>>> +			mutex_exit(&zfsdev_state_lock);
>>>>>>  			error = zvol_getefi((void *)arg, flag, vs, bs);
>>>>>>  			return (error);
>>>>>>  		}
>>>>>>
>>>>>>  	case DKIOCFLUSHWRITECACHE:
>>>>>>  		dkc = (struct dk_callback *)arg;
>>>>>> -		mutex_exit(&spa_namespace_lock);
>>>>>> +		mutex_exit(&zfsdev_state_lock);
>>>>>>  		zil_commit(zv->zv_zilog, ZVOL_OBJ);
>>>>>>  		if ((flag & FKIOCTL) && dkc != NULL && dkc->dkc_callback) {
>>>>>>  			(*dkc->dkc_callback)(dkc->dkc_cookie, error);
>>>>>> @@ -1730,10 +1705,10 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, 
>>>>>> int flag, cred_t *cr, int *rvalp)
>>>>>>  			}
>>>>>>  			if (wce) {
>>>>>>  				zv->zv_flags |= ZVOL_WCE;
>>>>>> -				mutex_exit(&spa_namespace_lock);
>>>>>> +				mutex_exit(&zfsdev_state_lock);
>>>>>>  			} else {
>>>>>>  				zv->zv_flags &= ~ZVOL_WCE;
>>>>>> -				mutex_exit(&spa_namespace_lock);
>>>>>> +				mutex_exit(&zfsdev_state_lock);
>>>>>>  				zil_commit(zv->zv_zilog, ZVOL_OBJ);
>>>>>>  			}
>>>>>>  			return (0);
>>>>>> @@ -1828,7 +1803,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int 
>>>>>> flag, cred_t *cr, int *rvalp)
>>>>>>  		break;
>>>>>>
>>>>>>  	}
>>>>>> -	mutex_exit(&spa_namespace_lock);
>>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>>  	return (error);
>>>>>>  }
>>>>>>  #endif	/* sun */
>>>>>> @@ -1844,12 +1819,14 @@ zvol_init(void)
>>>>>>  {
>>>>>>  	VERIFY(ddi_soft_state_init(&zfsdev_state, sizeof (zfs_soft_state_t),
>>>>>>  	    1) == 0);
>>>>>> +	mutex_init(&zfsdev_state_lock, NULL, MUTEX_DEFAULT, NULL);
>>>>>>  	ZFS_LOG(1, "ZVOL Initialized.");
>>>>>>  }
>>>>>>
>>>>>>  void
>>>>>>  zvol_fini(void)
>>>>>>  {
>>>>>> +	mutex_destroy(&zfsdev_state_lock);
>>>>>>  	ddi_soft_state_fini(&zfsdev_state);
>>>>>>  	ZFS_LOG(1, "ZVOL Deinitialized.");
>>>>>>  }
>>>>>> @@ -1889,7 +1866,7 @@ zvol_dump_init(zvol_state_t *zv, boolean_t 
>>>>>> resize)
>>>>>>  	uint64_t version = spa_version(spa);
>>>>>>  	enum zio_checksum checksum;
>>>>>> 
>>>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>>  	ASSERT(vd->vdev_ops == &vdev_root_ops);
>>>>>>
>>>>>>  	error = dmu_free_long_range(zv->zv_objset, ZVOL_OBJ, 0,
>>>>>> @@ -2437,7 +2414,7 @@ zvol_rename_minor(struct g_geom *gp, const char 
>>>>>> *newname)
>>>>>>  	struct g_provider *pp;
>>>>>>  	zvol_state_t *zv;
>>>>>> 
>>>>>> -	ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>>> +	ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>>  	g_topology_assert();
>>>>>>
>>>>>>  	pp = LIST_FIRST(&gp->provider);
>>>>>> @@ -2471,7 +2448,7 @@ zvol_rename_minors(const char *oldname, const 
>>>>>> char *newname)
>>>>>>  	newnamelen = strlen(newname);
>>>>>>
>>>>>>  	DROP_GIANT();
>>>>>> -	mutex_enter(&spa_namespace_lock);
>>>>>> +	mutex_enter(&zfsdev_state_lock);
>>>>>>  	g_topology_lock();
>>>>>>
>>>>>>  	LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) {
>>>>>> @@ -2494,6 +2471,6 @@ zvol_rename_minors(const char *oldname, const 
>>>>>> char *newname)
>>>>>>  	}
>>>>>>
>>>>>>  	g_topology_unlock();
>>>>>> -	mutex_exit(&spa_namespace_lock);
>>>>>> +	mutex_exit(&zfsdev_state_lock);
>>>>>>  	PICKUP_GIANT();
>>>>>>  }
>>>>>> --
>>>>>> 1.8.4.2
>>>>>> 
>>>>> 
>>>>>> From 57d5a383b585c32c77af54e8cdacaddf8ce7584f Mon Sep 17 00:00:00 2001
>>>>>> From: Richard Kojedzinszky <krichy@cflinux.hu>
>>>>>> Date: Wed, 18 Dec 2013 22:11:21 +0100
>>>>>> Subject: [PATCH 2/2] ZFS snapshot handling fix
>>>>>> 
>>>>>> ---
>>>>>>  .../compat/opensolaris/kern/opensolaris_lookup.c   | 13 +++---
>>>>>>  .../opensolaris/uts/common/fs/zfs/zfs_ctldir.c     | 53 
>>>>>> +++++++++++++++-------
>>>>>>  2 files changed, 42 insertions(+), 24 deletions(-)
>>>>>> 
>>>>>> diff --git a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c 
>>>>>> b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>>>> index 94383d6..4cac053 100644
>>>>>> --- a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>>>> +++ b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>>>> @@ -81,6 +81,8 @@ traverse(vnode_t **cvpp, int lktype)
>>>>>>  	 * progress on this vnode.
>>>>>>  	 */
>>>>>> 
>>>>>> +	vn_lock(cvp, lktype);
>>>>>> +
>>>>>>  	for (;;) {
>>>>>>  		/*
>>>>>>  		 * Reached the end of the mount chain?
>>>>>> @@ -89,13 +91,7 @@ traverse(vnode_t **cvpp, int lktype)
>>>>>>  		if (vfsp == NULL)
>>>>>>  			break;
>>>>>>  		error = vfs_busy(vfsp, 0);
>>>>>> -		/*
>>>>>> -		 * tvp is NULL for *cvpp vnode, which we can't unlock.
>>>>>> -		 */
>>>>>> -		if (tvp != NULL)
>>>>>> -			vput(cvp);
>>>>>> -		else
>>>>>> -			vrele(cvp);
>>>>>> +		VOP_UNLOCK(cvp, 0);
>>>>>>  		if (error)
>>>>>>  			return (error);
>>>>>> 
>>>>>> @@ -107,6 +103,9 @@ traverse(vnode_t **cvpp, int lktype)
>>>>>>  		vfs_unbusy(vfsp);
>>>>>>  		if (error != 0)
>>>>>>  			return (error);
>>>>>> +
>>>>>> +		VN_RELE(cvp);
>>>>>> +
>>>>>>  		cvp = tvp;
>>>>>>  	}
>>>>>> 
>>>>>> diff --git 
>>>>>> a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c 
>>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>>>> index 28ab1fa..d3464b4 100644
>>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>>>> @@ -112,6 +112,25 @@ snapentry_compare(const void *a, const void *b)
>>>>>>  		return (0);
>>>>>>  }
>>>>>> 
>>>>>> +/* Return the zfsctl_snapdir_t object from current vnode, following
>>>>>> + * the lock orders in zfsctl_snapdir_lookup() to avoid deadlocks.
>>>>>> + * On return the passed in vp is unlocked */
>>>>>> +static zfsctl_snapdir_t *
>>>>>> +zfsctl_snapshot_get_snapdir(vnode_t *vp, vnode_t **dvpp)
>>>>>> +{
>>>>>> +	gfs_dir_t *dp = vp->v_data;
>>>>>> +	*dvpp = dp->gfsd_file.gfs_parent;
>>>>>> +	zfsctl_snapdir_t *sdp;
>>>>>> +
>>>>>> +	VN_HOLD(*dvpp);
>>>>>> +	VOP_UNLOCK(vp, 0);
>>>>>> +	vn_lock(*dvpp, LK_SHARED | LK_RETRY | LK_CANRECURSE);
>>>>>> +	sdp = (*dvpp)->v_data;
>>>>>> +	VOP_UNLOCK(*dvpp, 0);
>>>>>> +
>>>>>> +	return (sdp);
>>>>>> +}
>>>>>> +
>>>>>>  #ifdef sun
>>>>>>  vnodeops_t *zfsctl_ops_root;
>>>>>>  vnodeops_t *zfsctl_ops_snapdir;
>>>>>> @@ -1013,6 +1032,8 @@ zfsctl_snapdir_lookup(ap)
>>>>>>  			 * The snapshot was unmounted behind our backs,
>>>>>>  			 * try to remount it.
>>>>>>  			 */
>>>>>> +			VOP_UNLOCK(*vpp, 0);
>>>>>> +			VN_HOLD(*vpp);
>>>>>>  			VERIFY(zfsctl_snapshot_zname(dvp, nm, MAXNAMELEN, 
>>>>>> snapname) == 0);
>>>>>>  			goto domount;
>>>>>>  		} else {
>>>>>> @@ -1064,7 +1085,6 @@ zfsctl_snapdir_lookup(ap)
>>>>>>  	sep->se_name = kmem_alloc(strlen(nm) + 1, KM_SLEEP);
>>>>>>  	(void) strcpy(sep->se_name, nm);
>>>>>>  	*vpp = sep->se_root = zfsctl_snapshot_mknode(dvp, 
>>>>>> dmu_objset_id(snap));
>>>>>> -	VN_HOLD(*vpp);
>>>>>>  	avl_insert(&sdp->sd_snaps, sep, where);
>>>>>>
>>>>>>  	dmu_objset_rele(snap, FTAG);
>>>>>> @@ -1075,6 +1095,7 @@ domount:
>>>>>>  	(void) snprintf(mountpoint, mountpoint_len,
>>>>>>  	    "%s/" ZFS_CTLDIR_NAME "/snapshot/%s",
>>>>>>  	    dvp->v_vfsp->mnt_stat.f_mntonname, nm);
>>>>>> +	VN_HOLD(*vpp);
>>>>>>  	err = mount_snapshot(curthread, vpp, "zfs", mountpoint, snapname, 0);
>>>>>>  	kmem_free(mountpoint, mountpoint_len);
>>>>>>  	if (err == 0) {
>>>>>> @@ -1464,16 +1485,18 @@ zfsctl_snapshot_inactive(ap)
>>>>>>  	int locked;
>>>>>>  	vnode_t *dvp;
>>>>>> 
>>>>>> -	if (vp->v_count > 0)
>>>>>> -		goto end;
>>>>>> -
>>>>>> -	VERIFY(gfs_dir_lookup(vp, "..", &dvp, cr, 0, NULL, NULL) == 0);
>>>>>> -	sdp = dvp->v_data;
>>>>>> -	VOP_UNLOCK(dvp, 0);
>>>>>> +	sdp = zfsctl_snapshot_get_snapdir(vp, &dvp);
>>>>>>
>>>>>>  	if (!(locked = MUTEX_HELD(&sdp->sd_lock)))
>>>>>>  		mutex_enter(&sdp->sd_lock);
>>>>>> 
>>>>>> +	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
>>>>>> +
>>>>>> +	if (vp->v_count > 0) {
>>>>>> +		mutex_exit(&sdp->sd_lock);
>>>>>> +		return (0);
>>>>>> +	}
>>>>>> +
>>>>>>  	ASSERT(!vn_ismntpt(vp));
>>>>>>
>>>>>>  	sep = avl_first(&sdp->sd_snaps);
>>>>>> @@ -1494,7 +1517,6 @@ zfsctl_snapshot_inactive(ap)
>>>>>>  		mutex_exit(&sdp->sd_lock);
>>>>>>  	VN_RELE(dvp);
>>>>>> 
>>>>>> -end:
>>>>>>  	/*
>>>>>>  	 * Dispose of the vnode for the snapshot mount point.
>>>>>>  	 * This is safe to do because once this entry has been removed
>>>>>> @@ -1595,20 +1617,17 @@ zfsctl_snapshot_lookup(ap)
>>>>>>  static int
>>>>>>  zfsctl_snapshot_vptocnp(struct vop_vptocnp_args *ap)
>>>>>>  {
>>>>>> -	zfsvfs_t *zfsvfs = ap->a_vp->v_vfsp->vfs_data;
>>>>>> -	vnode_t *dvp, *vp;
>>>>>> +	vnode_t *dvp, *vp = ap->a_vp;
>>>>>>  	zfsctl_snapdir_t *sdp;
>>>>>>  	zfs_snapentry_t *sep;
>>>>>> -	int error;
>>>>>> +	int error = 0;
>>>>>> 
>>>>>> -	ASSERT(zfsvfs->z_ctldir != NULL);
>>>>>> -	error = zfsctl_root_lookup(zfsvfs->z_ctldir, "snapshot", &dvp,
>>>>>> -	    NULL, 0, NULL, kcred, NULL, NULL, NULL);
>>>>>> -	if (error != 0)
>>>>>> -		return (error);
>>>>>> -	sdp = dvp->v_data;
>>>>>> +	sdp = zfsctl_snapshot_get_snapdir(vp, &dvp);
>>>>>>
>>>>>>  	mutex_enter(&sdp->sd_lock);
>>>>>> +
>>>>>> +	vn_lock(vp, LK_SHARED | LK_RETRY);
>>>>>> +
>>>>>>  	sep = avl_first(&sdp->sd_snaps);
>>>>>>  	while (sep != NULL) {
>>>>>>  		vp = sep->se_root;
>>>>>> --
>>>>>> 1.8.4.2
>>>>>> 
>>>>> 
>>>>> -- 
>>>>> Pawel Jakub Dawidek                       http://www.wheelsystems.com
>>>>> FreeBSD committer                         http://www.FreeBSD.org
>>>>> Am I Evil? Yes, I Am!                     http://mobter.com
>>>>> 
>>>> _______________________________________________
>>>> freebsd-fs@freebsd.org mailing list
>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>>> 
>> 
>
--1030603365-641758273-1387948957=:2785
Content-Type: TEXT/x-diff; name=zfs-gfs.patch
Content-Transfer-Encoding: BASE64
Content-ID: <alpine.DEB.2.10.1312250622370.2785@krichy.tvnetwork.hu>
Content-Description: 
Content-Disposition: attachment; filename=zfs-gfs.patch

Y29tbWl0IDQ3MmVkNGIyM2Y5ZDAxZmEwOGUwNmFhYmNiNjA0MDYyZmZlODY4
MWMNCkF1dGhvcjogUmljaGFyZCBLb2plZHppbnN6a3kgPGtyaWNoeUBjZmxp
bnV4Lmh1Pg0KRGF0ZTogICBXZWQgRGVjIDI1IDA2OjA3OjQxIDIwMTMgKzAx
MDANCg0KICAgIFpGUy9HRlMgbG9ja2luZyBmaXhlcw0KDQpkaWZmIC0tZ2l0
IGEvc3lzL2NkZGwvY29tcGF0L29wZW5zb2xhcmlzL2tlcm4vb3BlbnNvbGFy
aXNfbG9va3VwLmMgYi9zeXMvY2RkbC9jb21wYXQvb3BlbnNvbGFyaXMva2Vy
bi9vcGVuc29sYXJpc19sb29rdXAuYw0KaW5kZXggNjU2NTgxMC4uNzRlZDE3
MyAxMDA2NDQNCi0tLSBhL3N5cy9jZGRsL2NvbXBhdC9vcGVuc29sYXJpcy9r
ZXJuL29wZW5zb2xhcmlzX2xvb2t1cC5jDQorKysgYi9zeXMvY2RkbC9jb21w
YXQvb3BlbnNvbGFyaXMva2Vybi9vcGVuc29sYXJpc19sb29rdXAuYw0KQEAg
LTgxLDYgKzgxLDggQEAgdHJhdmVyc2Uodm5vZGVfdCAqKmN2cHAsIGludCBs
a3R5cGUpDQogCSAqIHByb2dyZXNzIG9uIHRoaXMgdm5vZGUuDQogCSAqLw0K
IA0KKwl2bl9sb2NrKGN2cCwgbGt0eXBlKTsNCisNCiAJZm9yICg7Oykgew0K
IAkJLyoNCiAJCSAqIFJlYWNoZWQgdGhlIGVuZCBvZiB0aGUgbW91bnQgY2hh
aW4/DQpAQCAtODksMTMgKzkxLDcgQEAgdHJhdmVyc2Uodm5vZGVfdCAqKmN2
cHAsIGludCBsa3R5cGUpDQogCQlpZiAodmZzcCA9PSBOVUxMKQ0KIAkJCWJy
ZWFrOw0KIAkJZXJyb3IgPSB2ZnNfYnVzeSh2ZnNwLCAwKTsNCi0JCS8qDQot
CQkgKiB0dnAgaXMgTlVMTCBmb3IgKmN2cHAgdm5vZGUsIHdoaWNoIHdlIGNh
bid0IHVubG9jay4NCi0JCSAqLw0KLQkJaWYgKHR2cCAhPSBOVUxMKQ0KLQkJ
CXZwdXQoY3ZwKTsNCi0JCWVsc2UNCi0JCQl2cmVsZShjdnApOw0KKwkJVk9Q
X1VOTE9DSyhjdnAsIDApOw0KIAkJaWYgKGVycm9yKQ0KIAkJCXJldHVybiAo
ZXJyb3IpOw0KIA0KQEAgLTEwNyw2ICsxMDMsOSBAQCB0cmF2ZXJzZSh2bm9k
ZV90ICoqY3ZwcCwgaW50IGxrdHlwZSkNCiAJCXZmc191bmJ1c3kodmZzcCk7
DQogCQlpZiAoZXJyb3IgIT0gMCkNCiAJCQlyZXR1cm4gKGVycm9yKTsNCisN
CisJCVZOX1JFTEUoY3ZwKTsNCisNCiAJCWN2cCA9IHR2cDsNCiAJfQ0KIA0K
ZGlmZiAtLWdpdCBhL3N5cy9jZGRsL2NvbnRyaWIvb3BlbnNvbGFyaXMvdXRz
L2NvbW1vbi9mcy9nZnMuYyBiL3N5cy9jZGRsL2NvbnRyaWIvb3BlbnNvbGFy
aXMvdXRzL2NvbW1vbi9mcy9nZnMuYw0KaW5kZXggNTk5NDRhMS4uMGY4ZjE1
NyAxMDA2NDQNCi0tLSBhL3N5cy9jZGRsL2NvbnRyaWIvb3BlbnNvbGFyaXMv
dXRzL2NvbW1vbi9mcy9nZnMuYw0KKysrIGIvc3lzL2NkZGwvY29udHJpYi9v
cGVuc29sYXJpcy91dHMvY29tbW9uL2ZzL2dmcy5jDQpAQCAtNDQ4LDcgKzQ0
OCw2IEBAIGdmc19sb29rdXBfZG90KHZub2RlX3QgKip2cHAsIHZub2RlX3Qg
KmR2cCwgdm5vZGVfdCAqcHZwLCBjb25zdCBjaGFyICpubSkNCiAJCQlWTl9I
T0xEKHB2cCk7DQogCQkJKnZwcCA9IHB2cDsNCiAJCX0NCi0JCXZuX2xvY2so
KnZwcCwgTEtfRVhDTFVTSVZFIHwgTEtfUkVUUlkpOw0KIAkJcmV0dXJuICgw
KTsNCiAJfQ0KIA0KQEAgLTQ4NSw2ICs0ODQsNyBAQCBnZnNfZmlsZV9jcmVh
dGUoc2l6ZV90IHNpemUsIHZub2RlX3QgKnB2cCwgdmZzX3QgKnZmc3AsIHZu
b2Rlb3BzX3QgKm9wcykNCiAJZnAgPSBrbWVtX3phbGxvYyhzaXplLCBLTV9T
TEVFUCk7DQogCWVycm9yID0gZ2V0bmV3dm5vZGUoInpmcyIsIHZmc3AsIG9w
cywgJnZwKTsNCiAJQVNTRVJUKGVycm9yID09IDApOw0KKwlWTl9MT0NLX0FT
SEFSRSh2cCk7DQogCXZuX2xvY2sodnAsIExLX0VYQ0xVU0lWRSB8IExLX1JF
VFJZKTsNCiAJdnAtPnZfZGF0YSA9IChjYWRkcl90KWZwOw0KIA0KZGlmZiAt
LWdpdCBhL3N5cy9jZGRsL2NvbnRyaWIvb3BlbnNvbGFyaXMvdXRzL2NvbW1v
bi9mcy96ZnMvemZzX2N0bGRpci5jIGIvc3lzL2NkZGwvY29udHJpYi9vcGVu
c29sYXJpcy91dHMvY29tbW9uL2ZzL3pmcy96ZnNfY3RsZGlyLmMNCmluZGV4
IDI4YWIxZmEuLjY4MDY4NTggMTAwNjQ0DQotLS0gYS9zeXMvY2RkbC9jb250
cmliL29wZW5zb2xhcmlzL3V0cy9jb21tb24vZnMvemZzL3pmc19jdGxkaXIu
Yw0KKysrIGIvc3lzL2NkZGwvY29udHJpYi9vcGVuc29sYXJpcy91dHMvY29t
bW9uL2ZzL3pmcy96ZnNfY3RsZGlyLmMNCkBAIC02MTIsNyArNjEyLDcgQEAg
emZzY3RsX2ZyZWVic2Rfcm9vdF9sb29rdXAoYXApDQogDQogCWVyciA9IHpm
c2N0bF9yb290X2xvb2t1cChkdnAsIG5tLCB2cHAsIE5VTEwsIDAsIE5VTEws
IGNyLCBOVUxMLCBOVUxMLCBOVUxMKTsNCiAJaWYgKGVyciA9PSAwICYmIChu
bVswXSAhPSAnLicgfHwgbm1bMV0gIT0gJ1wwJykpDQotCQl2bl9sb2NrKCp2
cHAsIExLX0VYQ0xVU0lWRSB8IExLX1JFVFJZKTsNCisJCXZuX2xvY2soKnZw
cCwgYXAtPmFfY25wLT5jbl9sa2ZsYWdzIHwgTEtfUkVUUlkpOw0KIAlyZXR1
cm4gKGVycik7DQogfQ0KIA0KQEAgLTk3NSw2ICs5NzUsOCBAQCB6ZnNjdGxf
c25hcGRpcl9sb29rdXAoYXApDQogCVpGU19FTlRFUih6ZnN2ZnMpOw0KIA0K
IAlpZiAoZ2ZzX2xvb2t1cF9kb3QodnBwLCBkdnAsIHpmc3Zmcy0+el9jdGxk
aXIsIG5tKSA9PSAwKSB7DQorCQlpZiAobm1bMF0gIT0gJy4nIHx8IG5tWzFd
ICE9ICdcMCcpDQorCQkJdm5fbG9jaygqdnBwLCBhcC0+YV9jbnAtPmNuX2xr
ZmxhZ3MgfCBMS19SRVRSWSk7DQogCQlaRlNfRVhJVCh6ZnN2ZnMpOw0KIAkJ
cmV0dXJuICgwKTsNCiAJfQ0KQEAgLTEwMDQsNyArMTAwNiw3IEBAIHpmc2N0
bF9zbmFwZGlyX2xvb2t1cChhcCkNCiAJaWYgKChzZXAgPSBhdmxfZmluZCgm
c2RwLT5zZF9zbmFwcywgJnNlYXJjaCwgJndoZXJlKSkgIT0gTlVMTCkgew0K
IAkJKnZwcCA9IHNlcC0+c2Vfcm9vdDsNCiAJCVZOX0hPTEQoKnZwcCk7DQot
CQllcnIgPSB0cmF2ZXJzZSh2cHAsIExLX0VYQ0xVU0lWRSB8IExLX1JFVFJZ
KTsNCisJCWVyciA9IHRyYXZlcnNlKHZwcCwgYXAtPmFfY25wLT5jbl9sa2Zs
YWdzIHwgTEtfUkVUUlkpOw0KIAkJaWYgKGVyciAhPSAwKSB7DQogCQkJVk5f
UkVMRSgqdnBwKTsNCiAJCQkqdnBwID0gTlVMTDsNCkBAIC0xMDEzLDYgKzEw
MTUsOCBAQCB6ZnNjdGxfc25hcGRpcl9sb29rdXAoYXApDQogCQkJICogVGhl
IHNuYXBzaG90IHdhcyB1bm1vdW50ZWQgYmVoaW5kIG91ciBiYWNrcywNCiAJ
CQkgKiB0cnkgdG8gcmVtb3VudCBpdC4NCiAJCQkgKi8NCisJCQlWT1BfVU5M
T0NLKCp2cHAsIDApOw0KKwkJCVZOX0hPTEQoKnZwcCk7DQogCQkJVkVSSUZZ
KHpmc2N0bF9zbmFwc2hvdF96bmFtZShkdnAsIG5tLCBNQVhOQU1FTEVOLCBz
bmFwbmFtZSkgPT0gMCk7DQogCQkJZ290byBkb21vdW50Ow0KIAkJfSBlbHNl
IHsNCkBAIC0xMDY0LDcgKzEwNjgsNiBAQCB6ZnNjdGxfc25hcGRpcl9sb29r
dXAoYXApDQogCXNlcC0+c2VfbmFtZSA9IGttZW1fYWxsb2Moc3RybGVuKG5t
KSArIDEsIEtNX1NMRUVQKTsNCiAJKHZvaWQpIHN0cmNweShzZXAtPnNlX25h
bWUsIG5tKTsNCiAJKnZwcCA9IHNlcC0+c2Vfcm9vdCA9IHpmc2N0bF9zbmFw
c2hvdF9ta25vZGUoZHZwLCBkbXVfb2Jqc2V0X2lkKHNuYXApKTsNCi0JVk5f
SE9MRCgqdnBwKTsNCiAJYXZsX2luc2VydCgmc2RwLT5zZF9zbmFwcywgc2Vw
LCB3aGVyZSk7DQogDQogCWRtdV9vYmpzZXRfcmVsZShzbmFwLCBGVEFHKTsN
CkBAIC0xMDc1LDYgKzEwNzgsNyBAQCBkb21vdW50Og0KIAkodm9pZCkgc25w
cmludGYobW91bnRwb2ludCwgbW91bnRwb2ludF9sZW4sDQogCSAgICAiJXMv
IiBaRlNfQ1RMRElSX05BTUUgIi9zbmFwc2hvdC8lcyIsDQogCSAgICBkdnAt
PnZfdmZzcC0+bW50X3N0YXQuZl9tbnRvbm5hbWUsIG5tKTsNCisJVk5fSE9M
RCgqdnBwKTsNCiAJZXJyID0gbW91bnRfc25hcHNob3QoY3VydGhyZWFkLCB2
cHAsICJ6ZnMiLCBtb3VudHBvaW50LCBzbmFwbmFtZSwgMCk7DQogCWttZW1f
ZnJlZShtb3VudHBvaW50LCBtb3VudHBvaW50X2xlbik7DQogCWlmIChlcnIg
PT0gMCkgew0KQEAgLTExMzAsNiArMTEzNCw4IEBAIHpmc2N0bF9zaGFyZXNf
bG9va3VwKGFwKQ0KIAlzdHJsY3B5KG5tLCBjbnAtPmNuX25hbWVwdHIsIGNu
cC0+Y25fbmFtZWxlbiArIDEpOw0KIA0KIAlpZiAoZ2ZzX2xvb2t1cF9kb3Qo
dnBwLCBkdnAsIHpmc3Zmcy0+el9jdGxkaXIsIG5tKSA9PSAwKSB7DQorCQlp
ZiAobm1bMF0gIT0gJy4nIHx8IG5tWzFdICE9ICdcMCcpDQorCQkJdm5fbG9j
aygqdnBwLCBhcC0+YV9jbnAtPmNuX2xrZmxhZ3MgfCBMS19SRVRSWSk7DQog
CQlaRlNfRVhJVCh6ZnN2ZnMpOw0KIAkJcmV0dXJuICgwKTsNCiAJfQ0KQEAg
LTE0NjIsMTggKzE0NjgsMjUgQEAgemZzY3RsX3NuYXBzaG90X2luYWN0aXZl
KGFwKQ0KIAl6ZnNjdGxfc25hcGRpcl90ICpzZHA7DQogCXpmc19zbmFwZW50
cnlfdCAqc2VwLCAqbmV4dDsNCiAJaW50IGxvY2tlZDsNCi0Jdm5vZGVfdCAq
ZHZwOw0KLQ0KLQlpZiAodnAtPnZfY291bnQgPiAwKQ0KLQkJZ290byBlbmQ7
DQorCWdmc19kaXJfdCAqZHAgPSB2cC0+dl9kYXRhOw0KKwl2bm9kZV90ICpk
dnAgPSBkcC0+Z2ZzZF9maWxlLmdmc19wYXJlbnQ7DQogDQotCVZFUklGWShn
ZnNfZGlyX2xvb2t1cCh2cCwgIi4uIiwgJmR2cCwgY3IsIDAsIE5VTEwsIE5V
TEwpID09IDApOw0KKwlWTl9IT0xEKGR2cCk7DQorCVZPUF9VTkxPQ0sodnAs
IDApOw0KIAlzZHAgPSBkdnAtPnZfZGF0YTsNCi0JVk9QX1VOTE9DSyhkdnAs
IDApOw0KIA0KIAlpZiAoIShsb2NrZWQgPSBNVVRFWF9IRUxEKCZzZHAtPnNk
X2xvY2spKSkNCiAJCW11dGV4X2VudGVyKCZzZHAtPnNkX2xvY2spOw0KIA0K
Kwl2bl9sb2NrKHZwLCBMS19FWENMVVNJVkUgfCBMS19SRVRSWSk7DQorDQor
CWlmICh2cC0+dl9jb3VudCA+IDApIHsNCisJCWlmICghbG9ja2VkKQ0KKwkJ
CW11dGV4X2V4aXQoJnNkcC0+c2RfbG9jayk7DQorCQlWTl9SRUxFKGR2cCk7
DQorCQlyZXR1cm4oMCk7DQorCX0NCisNCiAJQVNTRVJUKCF2bl9pc21udHB0
KHZwKSk7DQogDQogCXNlcCA9IGF2bF9maXJzdCgmc2RwLT5zZF9zbmFwcyk7
DQpAQCAtMTQ5NCw3ICsxNTA3LDYgQEAgemZzY3RsX3NuYXBzaG90X2luYWN0
aXZlKGFwKQ0KIAkJbXV0ZXhfZXhpdCgmc2RwLT5zZF9sb2NrKTsNCiAJVk5f
UkVMRShkdnApOw0KIA0KLWVuZDoNCiAJLyoNCiAJICogRGlzcG9zZSBvZiB0
aGUgdm5vZGUgZm9yIHRoZSBzbmFwc2hvdCBtb3VudCBwb2ludC4NCiAJICog
VGhpcyBpcyBzYWZlIHRvIGRvIGJlY2F1c2Ugb25jZSB0aGlzIGVudHJ5IGhh
cyBiZWVuIHJlbW92ZWQNCkBAIC0xNTg4LDcgKzE2MDAsNyBAQCB6ZnNjdGxf
c25hcHNob3RfbG9va3VwKGFwKQ0KIAllcnJvciA9IHpmc2N0bF9yb290X2xv
b2t1cCh6ZnN2ZnMtPnpfY3RsZGlyLCAic25hcHNob3QiLCB2cHAsDQogCSAg
ICBOVUxMLCAwLCBOVUxMLCBjciwgTlVMTCwgTlVMTCwgTlVMTCk7DQogCWlm
IChlcnJvciA9PSAwKQ0KLQkJdm5fbG9jaygqdnBwLCBMS19FWENMVVNJVkUg
fCBMS19SRVRSWSk7DQorCQl2bl9sb2NrKCp2cHAsIGFwLT5hX2NucC0+Y25f
bGtmbGFncyB8IExLX1JFVFJZKTsNCiAJcmV0dXJuIChlcnJvcik7DQogfQ0K
IA0K

--1030603365-641758273-1387948957=:2785--