From owner-freebsd-fs@FreeBSD.ORG Mon Dec 23 11:06:46 2013
Date: Mon, 23 Dec 2013 11:06:45 GMT
Message-Id: <201312231106.rBNB6jFw029971@freefall.freebsd.org>
From: FreeBSD bugmaster
To: freebsd-fs@FreeBSD.org
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including experimental
development code and obsolete releases.

S Tracker      Resp. Description
--------------------------------------------------------------------------------
o kern/184478  fs    [smbfs] mount_smbfs cannot read/write files
o kern/182570  fs    [zfs] [patch] ZFS panic in receive
o kern/182536  fs    [zfs] zfs deadlock
o kern/181966  fs    [zfs] Kernel panic in ZFS I/O: solaris assert: BP_EQUA
o kern/181834  fs    [nfs] amd mounting NFS directories can drive a dead-lo
o kern/181565  fs    [swap] Problem with vnode-backed swap space.
o kern/181377  fs    [zfs] zfs recv causes an inconsistant pool
o kern/181281  fs    [msdosfs] stack trace after successfull 'umount /mnt'
o kern/181082  fs    [fuse] [ntfs] Write to mounted NTFS filesystem using F
o kern/180979  fs    [netsmb][patch]: Fix large files handling
o kern/180876  fs    [zfs] [hast] ZFS with trim,bio_flush or bio_delete loc
o kern/180678  fs    [NFS] succesfully exported filesystems being reported
o kern/180438  fs    [smbfs] [patch] mount_smbfs fails on arm because of wr
p kern/180236  fs    [zfs] [nullfs] Leakage free space using ZFS with nullf
o kern/178854  fs    [ufs] FreeBSD kernel crash in UFS
s kern/178467  fs    [zfs] [request] Optimized Checksum Code for ZFS
o kern/178412  fs    [smbfs] Coredump when smbfs mounted
o kern/178388  fs    [zfs] [patch] allow up to 8MB recordsize
o kern/178387  fs    [zfs] [patch] sparse files performance improvements
o kern/178349  fs    [zfs] zfs scrub on deduped data could be much less see
o kern/178329  fs    [zfs] extended attributes leak
o kern/178238  fs    [nullfs] nullfs don't release i-nodes on unlink.
f kern/178231  fs    [nfs] 8.3 nfsv4 client reports "nfsv4 client/server pr
o kern/177985  fs    [zfs] disk usage problem when copying from one zfs dat
o kern/177971  fs    [nfs] FreeBSD 9.1 nfs client dirlist problem w/ nfsv3,
o kern/177966  fs    [zfs] resilver completes but subsequent scrub reports
o kern/177658  fs    [ufs] FreeBSD panics after get full filesystem with uf
o kern/177536  fs    [zfs] zfs livelock (deadlock) with high write-to-disk
o kern/177445  fs    [hast] HAST panic
o kern/177240  fs    [zfs] zpool import failed with state UNAVAIL but all d
o kern/176978  fs    [zfs] [panic] zfs send -D causes "panic: System call i
o kern/176857  fs    [softupdates] [panic] 9.1-RELEASE/amd64/GENERIC panic
o bin/176253   fs    zpool(8): zfs pool indentation is misleading/wrong
o kern/176141  fs    [zfs] sharesmb=on makes errors for sharenfs, and still
o kern/175950  fs    [zfs] Possible deadlock in zfs after long uptime
o kern/175897  fs    [zfs] operations on readonly zpool hang
o kern/175449  fs    [unionfs] unionfs and devfs misbehaviour
o kern/175179  fs    [zfs] ZFS may attach wrong device on move
o kern/175071  fs    [ufs] [panic] softdep_deallocate_dependencies: unrecov
o kern/174372  fs    [zfs] Pagefault appears to be related to ZFS
o kern/174315  fs    [zfs] chflags uchg not supported
o kern/174310  fs    [zfs] root point mounting broken on CURRENT with multi
o kern/174279  fs    [ufs] UFS2-SU+J journal and filesystem corruption
o kern/173830  fs    [zfs] Brain-dead simple change to ZFS error descriptio
o kern/173718  fs    [zfs] phantom directory in zraid2 pool
f kern/173657  fs    [nfs] strange UID map with nfsuserd
o kern/173363  fs    [zfs] [panic] Panic on 'zpool replace' on readonly poo
o kern/173136  fs    [unionfs] mounting above the NFS read-only share panic
o kern/172942  fs    [smbfs] Unmounting a smb mount when the server became
o kern/172348  fs    [unionfs] umount -f of filesystem in use with readonly
o kern/172334  fs    [unionfs] unionfs permits recursive union mounts; caus
o kern/171626  fs    [tmpfs] tmpfs should be noisier when the requested siz
o kern/171415  fs    [zfs] zfs recv fails with "cannot receive incremental
o kern/170945  fs    [gpt] disk layout not portable between direct connect
o bin/170778   fs    [zfs] [panic] FreeBSD panics randomly
o kern/170680  fs    [nfs] Multiple NFS Client bug in the FreeBSD 7.4-RELEA
o kern/170497  fs    [xfs][panic] kernel will panic whenever I ls a mounted
o kern/169945  fs    [zfs] [panic] Kernel panic while importing zpool (afte
o kern/169480  fs    [zfs] ZFS stalls on heavy I/O
o kern/169398  fs    [zfs] Can't remove file with permanent error
o kern/169339  fs    panic while " : > /etc/123"
o kern/169319  fs    [zfs] zfs resilver can't complete
o kern/168947  fs    [nfs] [zfs] .zfs/snapshot directory is messed up when
o kern/168942  fs    [nfs] [hang] nfsd hangs after being restarted (not -HU
o kern/168158  fs    [zfs] incorrect parsing of sharenfs options in zfs (fs
o kern/167979  fs    [ufs] DIOCGDINFO ioctl does not work on 8.2 file syste
o kern/167977  fs    [smbfs] mount_smbfs results are differ when utf-8 or U
o kern/167688  fs    [fusefs] Incorrect signal handling with direct_io
o kern/167685  fs    [zfs] ZFS on USB drive prevents shutdown / reboot
o kern/167612  fs    [portalfs] The portal file system gets stuck inside po
o kern/167272  fs    [zfs] ZFS Disks reordering causes ZFS to pick the wron
o kern/167260  fs    [msdosfs] msdosfs disk was mounted the second time whe
o kern/167109  fs    [zfs] [panic] zfs diff kernel panic Fatal trap 9: gene
o kern/167105  fs    [nfs] mount_nfs can not handle source exports wiht mor
o kern/167067  fs    [zfs] [panic] ZFS panics the server
o kern/167065  fs    [zfs] boot fails when a spare is the boot disk
o kern/167048  fs    [nfs] [patch] RELEASE-9 crash when using ZFS+NULLFS+NF
o kern/166912  fs    [ufs] [panic] Panic after converting Softupdates to jo
o kern/166851  fs    [zfs] [hang] Copying directory from the mounted UFS di
o kern/166477  fs    [nfs] NFS data corruption.
o kern/165950  fs    [ffs] SU+J and fsck problem
o kern/165521  fs    [zfs] [hang] livelock on 1 Gig of RAM with zfs when 31
o kern/165392  fs    Multiple mkdir/rmdir fails with errno 31
o kern/165087  fs    [unionfs] lock violation in unionfs
o kern/164472  fs    [ufs] fsck -B panics on particular data inconsistency
o kern/164370  fs    [zfs] zfs destroy for snapshot fails on i386 and sparc
o kern/164261  fs    [nullfs] [patch] fix panic with NFS served from NULLFS
o kern/164256  fs    [zfs] device entry for volume is not created after zfs
o kern/164184  fs    [ufs] [panic] Kernel panic with ufs_makeinode
o kern/163801  fs    [md] [request] allow mfsBSD legacy installed in 'swap'
o kern/163770  fs    [zfs] [hang] LOR between zfs&syncer + vnlru leading to
o kern/163501  fs    [nfs] NFS exporting a dir and a subdir in that dir to
o kern/162944  fs    [coda] Coda file system module looks broken in 9.0
o kern/162860  fs    [zfs] Cannot share ZFS filesystem to hosts with a hyph
o kern/162751  fs    [zfs] [panic] kernel panics during file operations
o kern/162591  fs    [nullfs] cross-filesystem nullfs does not work as expe
o kern/162519  fs    [zfs] "zpool import" relies on buggy realpath() behavi
o kern/161968  fs    [zfs] [hang] renaming snapshot with -r including a zvo
o kern/161864  fs    [ufs] removing journaling from UFS partition fails on
o kern/161579  fs    [smbfs] FreeBSD sometimes panics when an smb share is
o kern/161533  fs    [zfs] [panic] zfs receive panic: system ioctl returnin
o kern/161438  fs    [zfs] [panic] recursed on non-recursive spa_namespace_
o kern/161424  fs    [nullfs] __getcwd() calls fail when used on nullfs mou
o kern/161280  fs    [zfs] Stack overflow in gptzfsboot
o kern/161205  fs    [nfs] [pfsync] [regression] [build] Bug report freebsd
o kern/161169  fs    [zfs] [panic] ZFS causes kernel panic in dbuf_dirty
o kern/161112  fs    [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3
o kern/160893  fs    [zfs] [panic] 9.0-BETA2 kernel panic
f kern/160860  fs    [ufs] Random UFS root filesystem corruption with SU+J
o kern/160801  fs    [zfs] zfsboot on 8.2-RELEASE fails to boot from root-o
o kern/160790  fs    [fusefs] [panic] VPUTX: negative ref count with FUSE
o kern/160777  fs    [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo
o kern/160706  fs    [zfs] zfs bootloader fails when a non-root vdev exists
o kern/160591  fs    [zfs] Fail to boot on zfs root with degraded raidz2 [r
o kern/160410  fs    [smbfs] [hang] smbfs hangs when transferring large fil
o kern/160283  fs    [zfs] [patch] 'zfs list' does abort in make_dataset_ha
o kern/159930  fs    [ufs] [panic] kernel core
o kern/159402  fs    [zfs][loader] symlinks cause I/O errors
o kern/159357  fs    [zfs] ZFS MAXNAMELEN macro has confusing name (off-by-
o kern/159356  fs    [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s
o kern/159351  fs    [nfs] [patch] - divide by zero in mountnfs()
o kern/159251  fs    [zfs] [request]: add FLETCHER4 as DEDUP hash option
o kern/159077  fs    [zfs] Can't cd .. with latest zfs version
o kern/159048  fs    [smbfs] smb mount corrupts large files
o kern/159045  fs    [zfs] [hang] ZFS scrub freezes system
o kern/158839  fs    [zfs] ZFS Bootloader Fails if there is a Dead Disk
o kern/158802  fs    amd(8) ICMP storm and unkillable process.
o kern/158231  fs    [nullfs] panic on unmounting nullfs mounted over ufs o
f kern/157929  fs    [nfs] NFS slow read
o kern/157399  fs    [zfs] trouble with: mdconfig force delete && zfs strip
o kern/157179  fs    [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov
o kern/156797  fs    [zfs] [panic] Double panic with FreeBSD 9-CURRENT and
o kern/156781  fs    [zfs] zfs is losing the snapshot directory,
p kern/156545  fs    [ufs] mv could break UFS on SMP systems
o kern/156193  fs    [ufs] [hang] UFS snapshot hangs && deadlocks processes
o kern/156039  fs    [nullfs] [unionfs] nullfs + unionfs do not compose, re
o kern/155615  fs    [zfs] zfs v28 broken on sparc64 -current
o kern/155587  fs    [zfs] [panic] kernel panic with zfs
p kern/155411  fs    [regression] [8.2-release] [tmpfs]: mount: tmpfs : No
o kern/155199  fs    [ext2fs] ext3fs mounted as ext2fs gives I/O errors
o bin/155104   fs    [zfs][patch] use /dev prefix by default when importing
o kern/154930  fs    [zfs] cannot delete/unlink file from full volume -> EN
o kern/154828  fs    [msdosfs] Unable to create directories on external USB
o kern/154491  fs    [smbfs] smb_co_lock: recursive lock for object 1
p kern/154228  fs    [md] md getting stuck in wdrain state
o kern/153996  fs    [zfs] zfs root mount error while kernel is not located
o kern/153753  fs    [zfs] ZFS v15 - grammatical error when attempting to u
o kern/153716  fs    [zfs] zpool scrub time remaining is incorrect
o kern/153695  fs    [patch] [zfs] Booting from zpool created on 4k-sector
o kern/153680  fs    [xfs] 8.1 failing to mount XFS partitions
o kern/153418  fs    [zfs] [panic] Kernel Panic occurred writing to zfs vol
o kern/153351  fs    [zfs] locking directories/files in ZFS
o bin/153258   fs    [patch][zfs] creating ZVOLs requires `refreservation'
s kern/153173  fs    [zfs] booting from a gzip-compressed dataset doesn't w
o bin/153142   fs    [zfs] ls -l outputs `ls: ./.zfs: Operation not support
o kern/153126  fs    [zfs] vdev failure, zpool=peegel type=vdev.too_small
o kern/152022  fs    [nfs] nfs service hangs with linux client [regression]
o kern/151942  fs    [zfs] panic during ls(1) zfs snapshot directory
o kern/151905  fs    [zfs] page fault under load in /sbin/zfs
o bin/151713   fs    [patch] Bug in growfs(8) with respect to 32-bit overfl
o kern/151648  fs    [zfs] disk wait bug
o kern/151629  fs    [fs] [patch] Skip empty directory entries during name
o kern/151330  fs    [zfs] will unshare all zfs filesystem after execute a
o kern/151326  fs    [nfs] nfs exports fail if netgroups contain duplicate
o kern/151251  fs    [ufs] Can not create files on filesystem with heavy us
o kern/151226  fs    [zfs] can't delete zfs snapshot
o kern/150503  fs    [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501  fs    [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390  fs    [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336  fs    [nfs] mountd/nfsd became confused; refused to reload n
o kern/149208  fs    mksnap_ffs(8) hang/deadlock
o kern/149173  fs    [patch] [zfs] make OpenSolaris installa
o kern/149015  fs    [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014  fs    [zfs] [patch] declarations in ZFS libraries/utilities
o kern/149013  fs    [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504  fs    [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490  fs    [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368  fs    [zfs] ZFS hanging forever on 8.1-PRERELEASE
o kern/148138  fs    [zfs] zfs raidz pool commands freeze
o kern/147903  fs    [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881  fs    [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147420  fs    [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt
o kern/146941  fs    [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786  fs    [zfs] zpool import hangs with checksum errors
o kern/146708  fs    [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528  fs    [zfs] Severe memory leak in ZFS on i386
o kern/146502  fs    [nfs] FreeBSD 8 NFS Client Connection to Server
o kern/145750  fs    [unionfs] [hang] unionfs locks the machine
s kern/145712  fs    [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411  fs    [xfs] [panic] Kernel panics shortly after mounting an
f bin/145309   fs    bsdlabel: Editing disk label invalidates the whole dev
o kern/145272  fs    [zfs] [panic] Panic during boot when accessing zfs on
o kern/145246  fs    [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238  fs    [zfs] [panic] kernel panic on zpool clear tank
o kern/145229  fs    [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189  fs    [nfs] nfsd performs abysmally under load
o kern/144929  fs    [ufs] [lor] vfs_bio.c + ufs_dirhash.c
p kern/144447  fs    [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416  fs    [panic] Kernel panic on online filesystem optimization
s kern/144415  fs    [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234  fs    [zfs] Cannot boot machine with recent gptzfsboot code
o kern/143825  fs    [nfs] [panic] Kernel panic on NFS client
o bin/143572   fs    [zfs] zpool(1): [patch] The verbose output from iostat
o kern/143212  fs    [nfs] NFSv4 client strange work ...
o kern/143184  fs    [zfs] [lor] zfs/bufwait LOR
o kern/142878  fs    [zfs] [vfs] lock order reversal
o kern/142597  fs    [ext2fs] ext2fs does not work on filesystems with real
o kern/142489  fs    [zfs] [lor] allproc/zfs LOR
o kern/142466  fs    Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142306  fs    [zfs] [panic] ZFS drive (from OSX Leopard) causes two
o kern/142068  fs    [ufs] BSD labels are got deleted spontaneously
o kern/141950  fs    [unionfs] [lor] ufs/unionfs/ufs Lock order reversal
o kern/141897  fs    [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463  fs    [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141091  fs    [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086  fs    [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010  fs    [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888  fs    [zfs] boot fail from zfs root while the pool resilveri
o kern/140661  fs    [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640  fs    [zfs] snapshot crash
o kern/140068  fs    [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725  fs    [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715  fs    [zfs] vfs.numvnodes leak on busy zfs
p bin/139651   fs    [nfs] mount(8): read-only remount of NFS volume does n
o kern/139407  fs    [smbfs] [panic] smb mount causes system crash if remot
o kern/138662  fs    [panic] ffs_blkfree: freeing free block
o kern/138421  fs    [ufs] [patch] remove UFS label limitations
o kern/138202  fs    mount_msdosfs(1) see only 2Gb
o kern/137588  fs    [unionfs] [lor] LOR nfs/ufs/nfs
o kern/136968  fs    [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945  fs    [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944  fs    [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873  fs    [ntfs] Missing directories/files on NTFS volume
p kern/136470  fs    [nfs] Cannot mount / in read-only, over NFS
o kern/135546  fs    [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469  fs    [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050  fs    [zfs] ZFS clears/hides disk errors on reboot
o kern/134491  fs    [zfs] Hot spares are rather cold...
o kern/133676  fs    [smbfs] [panic] umount -f'ing a vnode-based memory dis
p kern/133174  fs    [msdosfs] [patch] msdosfs must support multibyte inter
o kern/132960  fs    [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397  fs    reboot causes filesystem corruption (failure to sync b
o kern/132331  fs    [ufs] [lor] LOR ufs and syncer
o kern/132237  fs    [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145  fs    [panic] File System Hard Crashes
o kern/131441  fs    [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360  fs    [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs    [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs    makefs: error "Bad file descriptor" on the mount poin
o kern/130920  fs    [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130210  fs    [nullfs] Error by check nullfs
o kern/129760  fs    [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488  fs    [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c:
o kern/129231  fs    [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs    [panic] non-userfriendly panic when trying to mount(8)
o kern/127787  fs    [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs
o bin/127270   fs    fsck_msdosfs(8) may crash if BytesPerSec is zero
o kern/127029  fs    [panic] mount(8): trying to mount a write protected zi
o kern/126973  fs    [unionfs] [hang] System hang with unionfs and init chr
o kern/126553  fs    [unionfs] unionfs move directory problem 2 (files appe
o kern/126287  fs    [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125895  fs    [ffs] [panic] kernel: panic: ffs_blkfree: freeing free
s kern/125738  fs    [zfs] [request] SHA256 acceleration in ZFS
o kern/123939  fs    [msdosfs] corrupts new files
o bin/123574   fs    [unionfs] df(1) -t option destroys info for unionfs (a
o kern/122380  fs    [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o bin/122172   fs    [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121898   fs    [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o kern/121385  fs    [unionfs] unionfs cross mount -> kernel panic
o bin/121072   fs    [smbfs] mount_smbfs(8) cannot normally convert the cha
o kern/120483  fs    [ntfs] [patch] NTFS filesystem locking changes
o kern/120482  fs    [ntfs] [patch] Sync style changes between NetBSD and F
o kern/118912  fs    [2tb] disk sizing/geometry problem with large array
o kern/118713  fs    [minidump] [patch] Display media size required for a k
o kern/118318  fs    [nfs] NFS server hangs under special circumstances
o bin/118249   fs    [ufs] mv(1): moving a directory changes its mtime
o kern/118126  fs    [nfs] [patch] Poor NFS server write performance
o kern/118107  fs    [ntfs] [panic] Kernel panic when accessing a file at N
o kern/117954  fs    [ufs] dirhash on very large directories blocks the mac
o bin/117315   fs    [smbfs] mount_smbfs(8) and related options can't mount
o kern/117158  fs    [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980   fs    [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o conf/116931  fs    lack of fsck_cd9660 prevents mounting iso images with
o kern/116583  fs    [ffs] [hang] System freezes for short time when using
o bin/115361   fs    [zfs] mount(8) gets into a state where it won't set/un
o kern/114955  fs    [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs    [ntfs] [patch] [request] dirmask support for NTFS ala
o kern/114676  fs    [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs    [patch] [request] add -d option to umount(8) to detach
o kern/113852  fs    [smbfs] smbfs does not properly implement DFS referral
o bin/113838   fs    [patch] [request] mount(8): add support for relative p
o bin/113049   fs    [patch] [request] make quot(8) use getopt(3) and show
o kern/112658  fs    [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843  fs    [msdosfs] Long Names of files are incorrectly created
o kern/111782  fs    [ufs] dump(8) fails horribly for large filesystems
s bin/111146   fs    [2tb] fsck(8) fails on 6T filesystem
o bin/107829   fs    [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106107  fs    [ufs] left-over fsck_snapshot after unfinished backgro
o kern/104406  fs    [ufs] Processes get stuck in "ufs" state under persist
o kern/104133  fs    [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035  fs    [ntfs] Directories in NTFS mounted disc images appear
o kern/101324  fs    [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290   fs    [ntfs] mount_ntfs ignorant of cluster sizes
s bin/97498    fs    [request] newfs(8) has no option to clear the first 12
o kern/97377   fs    [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222   fs    [cd9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849   fs    [ufs] rename on UFS filesystem is not atomic
o bin/94810    fs    fsck(8) incorrectly reports 'file system marked clean'
o kern/94769   fs    [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733   fs    [smbfs] smbfs may cause double unlock
o kern/93942   fs    [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272   fs    [ffs] [hang] Filling a filesystem while creating a sna
o kern/91134   fs    [smbfs] [patch] Preserve access and modification time
a kern/90815   fs    [smbfs] [patch] SMBFS with character conversions somet
o kern/88657   fs    [smbfs] windows client hang when browsing a samba shar
o kern/88555   fs    [panic] ffs_blkfree: freeing free frag on AMD 64
o bin/87966    fs    [patch] newfs(8): introduce -A flag for newfs to enabl
o kern/87859   fs    [smbfs] System reboot while umount smbfs.
o kern/86587   fs    [msdosfs] rm -r /PATH fails with lots of small files
o bin/85494    fs    fsck_ffs: unchecked use of cg_inosused macro etc.
o kern/80088   fs    [smbfs] Incorrect file time setting on NTFS mounted vi
o bin/74779    fs    Background-fsck checks one filesystem twice and omits
o kern/73484   fs    [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019    fs    [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774   fs    [ntfs] NTFS cannot "see" files on a WinXP filesystem
o bin/70600    fs    fsck(8) throws files away when it can't grow lost+foun
o kern/68978   fs    [panic] [ufs] crashes with failing hard disk, loose po
o kern/67326   fs    [msdosfs] crash after attempt to mount write protected
o kern/65920   fs    [nwfs] Mounted Netware filesystem behaves strange
o kern/65901   fs    [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503   fs    [smbfs] mount_smbfs does not work as non-root
o kern/55617   fs    [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685   fs    [hang] Unbounded inode allocation causes kernel to loc
o kern/36566   fs    [smbfs] System reboot with dead smb mount and umount
o bin/27687    fs    fsck(8) wrapper is not properly passing options to fsc
o kern/18874   fs    [2TB] 32bit NFS servers export wrong negative values t
o kern/9619    fs    [nfs] Restarting mountd kills existing mounts

337 problems total.
From owner-freebsd-fs@FreeBSD.ORG Mon Dec 23 16:55:36 2013
Date: Mon, 23 Dec 2013 17:55:08 +0100 (CET)
From: krichy@tvnetwork.hu
To: Pawel Jakub Dawidek
Cc: freebsd-fs@freebsd.org, avg@freebsd.org
Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
In-Reply-To: <20131220134405.GE1658@garage.freebsd.pl>
Message-ID: <unavailable>
References: <20131220134405.GE1658@garage.freebsd.pl>

Dear Pawel,

Thanks for your response. I hope someone will review my work.

Secondly, I think I've found another deadlock scenario: when 'zfs send -R'
finishes, zfsdev_close() holds spa_namespace_lock. It then unmounts all the
snapshots, which in turn goes through zfsctl_snapshot_inactive(), where
sdp->sd_lock is taken. Concurrently, a zfsctl_snapdir_lookup(), holding that
same sdp->sd_lock, does a snapshot mount, which somewhere tries to acquire
spa_namespace_lock. So it deadlocks.
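[Editor's illustration] The scenario above is a classic ABBA lock-order inversion: one path takes spa_namespace_lock then sd_lock, the other takes them in the opposite order. A minimal sketch in plain Python threads (ordinary mutexes standing in for the kernel locks; none of this is the actual FreeBSD code) shows the inversion and the conventional cure, a single global acquisition order:

```python
import threading

# Stand-ins for the two kernel locks in the report; the names are
# borrowed for readability only, these are plain Python mutexes.
spa_namespace_lock = threading.Lock()
sd_lock = threading.Lock()

done = []

def close_path():
    # Like zfsdev_close(): holds spa_namespace_lock, then the snapshot
    # unmount path needs sd_lock.
    with spa_namespace_lock:
        with sd_lock:
            done.append("close")

def lookup_path_inverted():
    # Like zfsctl_snapdir_lookup(): holds sd_lock, then the snapshot
    # mount needs spa_namespace_lock -- the reverse (ABBA) order.
    # Shown for contrast; running it concurrently with close_path()
    # can deadlock, so it is never started below.
    with sd_lock:
        with spa_namespace_lock:
            done.append("lookup")

def lookup_path_ordered():
    # The conventional fix: every thread takes the locks in one global
    # order, here spa_namespace_lock strictly before sd_lock.
    with spa_namespace_lock:
        with sd_lock:
            done.append("lookup")

t1 = threading.Thread(target=close_path)
t2 = threading.Thread(target=lookup_path_ordered)
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(done))  # ['close', 'lookup'] -- both paths complete
```

With a consistent order, whichever thread wins spa_namespace_lock finishes first and the other simply waits; no cycle can form.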
Currently I don't know how to work around this. Does spa_namespace_lock need
to be held in zfsctl_snapshot_inactive()? What if we released it on function
entry and re-acquired it on exit?

Thanks in advance,

Kojedzinszky Richard
Euronet Magyarorszag Informatikai Zrt.

On Fri, 20 Dec 2013, Pawel Jakub Dawidek wrote:

> Date: Fri, 20 Dec 2013 14:44:05 +0100
> From: Pawel Jakub Dawidek
> To: krichy@tvnetwork.hu
> Cc: freebsd-fs@freebsd.org
> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>
> On Thu, Dec 19, 2013 at 04:46:11PM +0100, krichy@tvnetwork.hu wrote:
>> Dear devs,
>>
>> I am attaching a clearer patch/fix for my snapshot handling issues
>> (0002), but I would be happy if some ZFS expert would comment on it. I
>> have been trying to solve this for at least two weeks now, and an ACK
>> or a NACK from someone would be nice. A commit is also reverted, since
>> it caused deadlocks as well. I've read its comments, which also claim
>> to eliminate deadlocks, but I did not find any reference to how that
>> deadlock could be reproduced. In my view, reverting it makes my issues
>> disappear, but I don't know what new cases that will raise.
>
> Richard,
>
> I won't be able to analyse it myself anytime soon, because of other
> obligations, but I forwarded your e-mail to the zfs-devel@ mailing list
> (it is closed, but gathers FreeBSD ZFS devs). Hopefully someone from
> there will be able to help you.
>
>> I've rewritten traverse() to make it more like upstream, and added two
>> extra VN_HOLD()s to snapdir_lookup() for when traverse() returns the
>> same vnode that was passed to it (I don't even know why those two
>> extra holds are needed for the vnode when creating a snapshot vnode).
>> Unfortunately, because FreeBSD calls vop_inactive callbacks with
>> vnodes locked, that could also cause deadlocks, so
>> zfsctl_snapshot_inactive() and zfsctl_snapshot_vptocnp() have been
>> rewritten to work around that.
>>
>> After this, one may also get a deadlock when a simple access calls
>> into zfsctl_snapshot_lookup().
>> The documentation says that those vnodes should always be covered,
>> but after some stress testing we sometimes hit that call, and that can
>> again cause deadlocks. In our environment I've just uncommented that
>> callback, which returns ENOTDIR on some calls, but at least it does
>> not deadlock.
>>
>> The attached script can be used to reproduce my cases (would someone
>> confirm that?), and after the patches are applied they disappear
>> (confirm?).
>>
>> Thanks,
>>
>> Kojedzinszky Richard
>> Euronet Magyarorszag Informatikai Zrt.
>>
>> On Tue, 17 Dec 2013, krichy@tvnetwork.hu wrote:
>>
>>> Date: Tue, 17 Dec 2013 14:50:16 +0100 (CET)
>>> From: krichy@tvnetwork.hu
>>> To: pjd@freebsd.org
>>> Cc: freebsd-fs@freebsd.org
>>> Subject: Re: kern/184677 (fwd)
>>>
>>> Dear devs,
>>>
>>> I will sum up my experience regarding the issue:
>>>
>>> The symptom is that a concurrent 'zfs send -R' and some activity on
>>> the snapshot dir (or in the snapshot) may cause a deadlock.
>>>
>>> After investigating the problem, I found that zfs send unmounts the
>>> snapshots, and that causes the deadlock, so later I tested only with
>>> a concurrent umount and the "activity". Later still I found that
>>> listing the snapshots in .zfs/snapshot/ and unmounting them can cause
>>> the deadlock, so I used those for the tests. But to my surprise,
>>> instead of a deadlock, a recursive lock panic arose.
>>>
>>> The vnode for the ".zfs/snapshot/" directory holds zfs's
>>> zfsctl_snapdir_t structure (sdp). This contains a tree of mounted
>>> snapshots, and each entry (sep) contains the vnode on which the
>>> snapshot is mounted (se_root). The strange thing is that the se_root
>>> member does not hold a reference to the vnode, just a plain pointer
>>> to it.
>>>
>>> Upon entry lookup (zfsctl_snapdir_lookup()) the "snapshot" vnode is
>>> locked, and the zfsctl_snapdir_t's tree is locked and searched for
>>> the mount, in case it exists already. If it finds no entry, it does
>>> the mount.
>>> In the case where an entry was found, the se_root member contains
>>> the vnode on which the snapshot is mounted. Thus, a reference is
>>> taken for it, and the traverse() call will resolve to the real root
>>> vnode of the mounted snapshot, returning it locked. (Examining the
>>> traverse() code I've found that it did not follow FreeBSD's lock
>>> order recommendation described in sys/kern/vfs_subr.c.)
>>>
>>> On the other hand, when an umount is issued, the se_root vnode loses
>>> its last reference (as only the mountpoint holds one for it), and it
>>> goes through the vinactive() path to zfsctl_snapshot_inactive(). In
>>> FreeBSD this is called with a locked vnode, so this is a deadlock
>>> race condition: while zfsctl_snapdir_lookup() holds the mutex for
>>> the sdp tree and traverse() tries to acquire se_root,
>>> zfsctl_snapshot_inactive() holds the lock on se_root while it tries
>>> to access the sdp lock.
>>>
>>> zfsctl_snapshot_inactive() has an if statement checking v_usecount,
>>> which is incremented in zfsctl_snapdir_lookup(), but in that context
>>> it is not covered by VI_LOCK. And it seems to me that FreeBSD's
>>> vinactive() path assumes that the vnode remains inactive (as opposed
>>> to illumos, at least as I read the code). So
>>> zfsctl_snapshot_inactive() must free resources while in a locked
>>> state. I was a bit confused, and that is probably why the previously
>>> posted patch is as it is.
>>>
>>> Maybe if I had some clues on the direction to take with this
>>> problem, I could have worked out a nicer, shorter solution.
>>>
>>> Please someone comment on my post.
>>>
>>> Regards,
>>>
>>> Kojedzinszky Richard
>>> Euronet Magyarorszag Informatikai Zrt.
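[Editor's illustration] The crux of the race described above is that se_root is a bare pointer with no reference of its own, so the lookup path and the inactive path can collide. A common pattern is to take a hold on the object while the tree lock is held, drop the tree lock, and only then do anything that may sleep. A toy Python sketch (the Vnode class, hold()/rele() names, and lock names are all illustrative, not the FreeBSD vnode API):

```python
import threading

class Vnode:
    """Toy vnode with a use count; illustrative only, not the real
    FreeBSD struct vnode."""
    def __init__(self):
        self.lock = threading.Lock()
        self.usecount = 0

    def hold(self):
        # Like VN_HOLD(): take a reference so the node cannot be
        # reclaimed out from under us while we drop other locks.
        with self.lock:
            self.usecount += 1

    def rele(self):
        # Drop the reference; report whether it was the last one
        # (where the inactive path would run).
        with self.lock:
            self.usecount -= 1
            return self.usecount == 0

sd_tree_lock = threading.Lock()  # stand-in for sdp->sd_lock
se_root = Vnode()                # stand-in for sep->se_root

def snapdir_lookup():
    # Take the hold while the tree lock is held, then drop the tree
    # lock before any operation that may sleep: the inactive path can
    # now run without us holding the tree lock across it.
    with sd_tree_lock:
        se_root.hold()
    try:
        pass  # traverse() to the mounted snapshot root would go here
    finally:
        se_root.rele()

snapdir_lookup()
print(se_root.usecount)  # 0 -- reference balanced after the lookup
```

The point of the sketch is only the ordering: the reference is what makes it safe to release sd_tree_lock before blocking, which removes one edge of the lock cycle the report describes.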
>>>
>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote:
>>>
>>>> Date: Mon, 16 Dec 2013 16:52:16 +0100 (CET)
>>>> From: krichy@tvnetwork.hu
>>>> To: pjd@freebsd.org
>>>> Cc: freebsd-fs@freebsd.org
>>>> Subject: Re: kern/184677 (fwd)
>>>>
>>>> Dear PJD,
>>>>
>>>> I am a happy FreeBSD user, and I am sure you've read my previous posts
>>>> regarding some issues in ZFS. Please give me some advice on where to look
>>>> for solutions, or on how I could help to resolve those issues.
>>>>
>>>> Regards,
>>>> Kojedzinszky Richard
>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>
>>>> ---------- Forwarded message ----------
>>>> Date: Mon, 16 Dec 2013 15:23:06 +0100 (CET)
>>>> From: krichy@tvnetwork.hu
>>>> To: freebsd-fs@freebsd.org
>>>> Subject: Re: kern/184677
>>>>
>>>>
>>>> It seems that pjd made a change which eliminated the zfsdev_state_lock on
>>>> Fri Aug 12 07:04:16 2011 +0000, which might have introduced a new
>>>> deadlock situation. Any comments on this?
>>>>
>>>>
>>>> Kojedzinszky Richard
>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>
>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote:
>>>>
>>>>> Date: Mon, 16 Dec 2013 11:08:11 +0100 (CET)
>>>>> From: krichy@tvnetwork.hu
>>>>> To: freebsd-fs@freebsd.org
>>>>> Subject: kern/184677
>>>>>
>>>>> Dear devs,
>>>>>
>>>>> I've attached a patch which makes the recursive lockmgr panic disappear,
>>>>> and makes the reported bug disappear. I don't know whether I followed
>>>>> the guidelines well, but at least it works for me. Would some
>>>>> ZFS/FreeBSD fs expert please review it, and fix it where needed.
>>>>>
>>>>> Unfortunately, my original problem is still not solved; it is maybe the
>>>>> same as Ryan's:
>>>>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-December/018707.html
>>>>>
>>>>> Tracing the problem down: zfsctl_snapdir_lookup() tries to acquire
>>>>> spa_namespace_lock while a finishing 'zfs send -R' does a
>>>>> zfsdev_close(), which also holds the same mutex. And this causes a
>>>>> deadlock scenario. I looked at illumos's code, and for some reason they
>>>>> use another mutex in zfsdev_close(), which therefore cannot deadlock
>>>>> with zfsctl_snapdir_lookup(). But I am still investigating the problem.
>>>>>
>>>>> I would like to help make ZFS, with its whole functionality, more
>>>>> stable on FreeBSD. I would be very thankful if some expert could give
>>>>> advice on how to solve these bugs. PJD, Steven, Xin?
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>>
>>>>> Kojedzinszky Richard
>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>> _______________________________________________
>>>> freebsd-fs@freebsd.org mailing list
>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>>>
>>>
>
>
>> From 39298da838d006ad225e41529d7b7f240fccfe73 Mon Sep 17 00:00:00 2001
>> From: Richard Kojedzinszky
>> Date: Mon, 16 Dec 2013 15:39:11 +0100
>> Subject: [PATCH 1/2] Revert "Eliminate the zfsdev_state_lock entirely and
>> replace it with the"
>>
>> This reverts commit 1d8972b3f353f986eb5b85bc108b1c0d946d3218.
>> >> Conflicts: >> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >> --- >> .../opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h | 1 + >> .../opensolaris/uts/common/fs/zfs/vdev_geom.c | 14 ++- >> .../opensolaris/uts/common/fs/zfs/zfs_ioctl.c | 16 +-- >> .../contrib/opensolaris/uts/common/fs/zfs/zvol.c | 119 +++++++++------------ >> 4 files changed, 70 insertions(+), 80 deletions(-) >> >> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >> index af2def2..8272c4d 100644 >> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >> @@ -383,6 +383,7 @@ extern void *zfsdev_get_soft_state(minor_t minor, >> extern minor_t zfsdev_minor_alloc(void); >> >> extern void *zfsdev_state; >> +extern kmutex_t zfsdev_state_lock; >> >> #endif /* _KERNEL */ >> >> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >> index 15685a5..5c3e9f3 100644 >> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >> @@ -581,7 +581,7 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t *max_psize, >> struct g_provider *pp; >> struct g_consumer *cp; >> size_t bufsize; >> - int error; >> + int error, lock; >> >> /* >> * We must have a pathname, and it must be absolute. 
>> @@ -593,6 +593,12 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t *max_psize, >> >> vd->vdev_tsd = NULL; >> >> + if (mutex_owned(&spa_namespace_lock)) { >> + mutex_exit(&spa_namespace_lock); >> + lock = 1; >> + } else { >> + lock = 0; >> + } >> DROP_GIANT(); >> g_topology_lock(); >> error = 0; >> @@ -624,7 +630,11 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t *max_psize, >> !ISP2(cp->provider->sectorsize)) { >> ZFS_LOG(1, "Provider %s has unsupported sectorsize.", >> vd->vdev_path); >> + >> + g_topology_lock(); >> vdev_geom_detach(cp, 0); >> + g_topology_unlock(); >> + >> error = EINVAL; >> cp = NULL; >> } else if (cp->acw == 0 && (spa_mode(vd->vdev_spa) & FWRITE) != 0) { >> @@ -647,6 +657,8 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t *max_psize, >> } >> g_topology_unlock(); >> PICKUP_GIANT(); >> + if (lock) >> + mutex_enter(&spa_namespace_lock); >> if (cp == NULL) { >> vd->vdev_stat.vs_aux = VDEV_AUX_OPEN_FAILED; >> return (error); >> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >> index e9fba26..91becde 100644 >> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >> @@ -5635,7 +5635,7 @@ zfsdev_minor_alloc(void) >> static minor_t last_minor; >> minor_t m; >> >> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >> >> for (m = last_minor + 1; m != last_minor; m++) { >> if (m > ZFSDEV_MAX_MINOR) >> @@ -5655,7 +5655,7 @@ zfs_ctldev_init(struct cdev *devp) >> minor_t minor; >> zfs_soft_state_t *zs; >> >> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >> >> minor = zfsdev_minor_alloc(); >> if (minor == 0) >> @@ -5676,7 +5676,7 @@ zfs_ctldev_init(struct cdev *devp) >> static void >> zfs_ctldev_destroy(zfs_onexit_t *zo, minor_t minor) >> { >> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >> + 
ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >> >> zfs_onexit_destroy(zo); >> ddi_soft_state_free(zfsdev_state, minor); >> @@ -5706,9 +5706,9 @@ zfsdev_open(struct cdev *devp, int flag, int mode, struct thread *td) >> >> /* This is the control device. Allocate a new minor if requested. */ >> if (flag & FEXCL) { >> - mutex_enter(&spa_namespace_lock); >> + mutex_enter(&zfsdev_state_lock); >> error = zfs_ctldev_init(devp); >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> } >> >> return (error); >> @@ -5723,14 +5723,14 @@ zfsdev_close(void *data) >> if (minor == 0) >> return; >> >> - mutex_enter(&spa_namespace_lock); >> + mutex_enter(&zfsdev_state_lock); >> zo = zfsdev_get_soft_state(minor, ZSST_CTLDEV); >> if (zo == NULL) { >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return; >> } >> zfs_ctldev_destroy(zo, minor); >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> } >> >> static int >> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >> index 72d4502..aec5219 100644 >> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >> @@ -104,11 +104,12 @@ static char *zvol_tag = "zvol_tag"; >> #define ZVOL_DUMPSIZE "dumpsize" >> >> /* >> - * The spa_namespace_lock protects the zfsdev_state structure from being >> - * modified while it's being used, e.g. an open that comes in before a >> - * create finishes. It also protects temporary opens of the dataset so that, >> + * This lock protects the zfsdev_state structure from being modified >> + * while it's being used, e.g. an open that comes in before a create >> + * finishes. It also protects temporary opens of the dataset so that, >> * e.g., an open doesn't get a spurious EBUSY. 
>> */ >> +kmutex_t zfsdev_state_lock; >> static uint32_t zvol_minors; >> >> typedef struct zvol_extent { >> @@ -249,7 +250,7 @@ zvol_minor_lookup(const char *name) >> struct g_geom *gp; >> zvol_state_t *zv = NULL; >> >> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >> >> g_topology_lock(); >> LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) { >> @@ -465,11 +466,11 @@ zvol_name2minor(const char *name, minor_t *minor) >> { >> zvol_state_t *zv; >> >> - mutex_enter(&spa_namespace_lock); >> + mutex_enter(&zfsdev_state_lock); >> zv = zvol_minor_lookup(name); >> if (minor && zv) >> *minor = zv->zv_minor; >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (zv ? 0 : -1); >> } >> #endif /* sun */ >> @@ -489,10 +490,10 @@ zvol_create_minor(const char *name) >> >> ZFS_LOG(1, "Creating ZVOL %s...", name); >> >> - mutex_enter(&spa_namespace_lock); >> + mutex_enter(&zfsdev_state_lock); >> >> if (zvol_minor_lookup(name) != NULL) { >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (SET_ERROR(EEXIST)); >> } >> >> @@ -500,20 +501,20 @@ zvol_create_minor(const char *name) >> error = dmu_objset_own(name, DMU_OST_ZVOL, B_TRUE, FTAG, &os); >> >> if (error) { >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (error); >> } >> >> #ifdef sun >> if ((minor = zfsdev_minor_alloc()) == 0) { >> dmu_objset_disown(os, FTAG); >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (SET_ERROR(ENXIO)); >> } >> >> if (ddi_soft_state_zalloc(zfsdev_state, minor) != DDI_SUCCESS) { >> dmu_objset_disown(os, FTAG); >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (SET_ERROR(EAGAIN)); >> } >> (void) ddi_prop_update_string(minor, zfs_dip, ZVOL_PROP_NAME, >> @@ -525,7 +526,7 @@ zvol_create_minor(const char *name) >> minor, DDI_PSEUDO, 0) == DDI_FAILURE) { >> ddi_soft_state_free(zfsdev_state, minor); >> 
dmu_objset_disown(os, FTAG); >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (SET_ERROR(EAGAIN)); >> } >> >> @@ -536,7 +537,7 @@ zvol_create_minor(const char *name) >> ddi_remove_minor_node(zfs_dip, chrbuf); >> ddi_soft_state_free(zfsdev_state, minor); >> dmu_objset_disown(os, FTAG); >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (SET_ERROR(EAGAIN)); >> } >> >> @@ -587,7 +588,7 @@ zvol_create_minor(const char *name) >> >> zvol_minors++; >> >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> >> zvol_geom_run(zv); >> >> @@ -609,7 +610,7 @@ zvol_remove_zv(zvol_state_t *zv) >> minor_t minor = zv->zv_minor; >> #endif >> >> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >> if (zv->zv_total_opens != 0) >> return (SET_ERROR(EBUSY)); >> >> @@ -635,15 +636,15 @@ zvol_remove_minor(const char *name) >> zvol_state_t *zv; >> int rc; >> >> - mutex_enter(&spa_namespace_lock); >> + mutex_enter(&zfsdev_state_lock); >> if ((zv = zvol_minor_lookup(name)) == NULL) { >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (SET_ERROR(ENXIO)); >> } >> g_topology_lock(); >> rc = zvol_remove_zv(zv); >> g_topology_unlock(); >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (rc); >> } >> >> @@ -755,7 +756,7 @@ zvol_update_volsize(objset_t *os, uint64_t volsize) >> dmu_tx_t *tx; >> int error; >> >> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >> >> tx = dmu_tx_create(os); >> dmu_tx_hold_zap(tx, ZVOL_ZAP_OBJ, TRUE, NULL); >> @@ -786,7 +787,7 @@ zvol_remove_minors(const char *name) >> namelen = strlen(name); >> >> DROP_GIANT(); >> - mutex_enter(&spa_namespace_lock); >> + mutex_enter(&zfsdev_state_lock); >> g_topology_lock(); >> >> LIST_FOREACH_SAFE(gp, &zfs_zvol_class.geom, geom, gptmp) { >> @@ -804,7 +805,7 @@ zvol_remove_minors(const char *name) >> 
} >> >> g_topology_unlock(); >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> PICKUP_GIANT(); >> } >> >> @@ -818,10 +819,10 @@ zvol_set_volsize(const char *name, major_t maj, uint64_t volsize) >> uint64_t old_volsize = 0ULL; >> uint64_t readonly; >> >> - mutex_enter(&spa_namespace_lock); >> + mutex_enter(&zfsdev_state_lock); >> zv = zvol_minor_lookup(name); >> if ((error = dmu_objset_hold(name, FTAG, &os)) != 0) { >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (error); >> } >> >> @@ -888,7 +889,7 @@ zvol_set_volsize(const char *name, major_t maj, uint64_t volsize) >> out: >> dmu_objset_rele(os, FTAG); >> >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> >> return (error); >> } >> @@ -899,36 +900,19 @@ zvol_open(struct g_provider *pp, int flag, int count) >> { >> zvol_state_t *zv; >> int err = 0; >> - boolean_t locked = B_FALSE; >> >> - /* >> - * Protect against recursively entering spa_namespace_lock >> - * when spa_open() is used for a pool on a (local) ZVOL(s). >> - * This is needed since we replaced upstream zfsdev_state_lock >> - * with spa_namespace_lock in the ZVOL code. >> - * We are using the same trick as spa_open(). >> - * Note that calls in zvol_first_open which need to resolve >> - * pool name to a spa object will enter spa_open() >> - * recursively, but that function already has all the >> - * necessary protection. 
>> - */ >> - if (!MUTEX_HELD(&spa_namespace_lock)) { >> - mutex_enter(&spa_namespace_lock); >> - locked = B_TRUE; >> - } >> + mutex_enter(&zfsdev_state_lock); >> >> zv = pp->private; >> if (zv == NULL) { >> - if (locked) >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (SET_ERROR(ENXIO)); >> } >> >> if (zv->zv_total_opens == 0) >> err = zvol_first_open(zv); >> if (err) { >> - if (locked) >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (err); >> } >> if ((flag & FWRITE) && (zv->zv_flags & ZVOL_RDONLY)) { >> @@ -950,15 +934,13 @@ zvol_open(struct g_provider *pp, int flag, int count) >> #endif >> >> zv->zv_total_opens += count; >> - if (locked) >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> >> return (err); >> out: >> if (zv->zv_total_opens == 0) >> zvol_last_close(zv); >> - if (locked) >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (err); >> } >> >> @@ -968,18 +950,12 @@ zvol_close(struct g_provider *pp, int flag, int count) >> { >> zvol_state_t *zv; >> int error = 0; >> - boolean_t locked = B_FALSE; >> >> - /* See comment in zvol_open(). 
*/ >> - if (!MUTEX_HELD(&spa_namespace_lock)) { >> - mutex_enter(&spa_namespace_lock); >> - locked = B_TRUE; >> - } >> + mutex_enter(&zfsdev_state_lock); >> >> zv = pp->private; >> if (zv == NULL) { >> - if (locked) >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (SET_ERROR(ENXIO)); >> } >> >> @@ -1002,8 +978,7 @@ zvol_close(struct g_provider *pp, int flag, int count) >> if (zv->zv_total_opens == 0) >> zvol_last_close(zv); >> >> - if (locked) >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (error); >> } >> >> @@ -1658,12 +1633,12 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int flag, cred_t *cr, int *rvalp) >> int error = 0; >> rl_t *rl; >> >> - mutex_enter(&spa_namespace_lock); >> + mutex_enter(&zfsdev_state_lock); >> >> zv = zfsdev_get_soft_state(getminor(dev), ZSST_ZVOL); >> >> if (zv == NULL) { >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (SET_ERROR(ENXIO)); >> } >> ASSERT(zv->zv_total_opens > 0); >> @@ -1677,7 +1652,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int flag, cred_t *cr, int *rvalp) >> dki.dki_ctype = DKC_UNKNOWN; >> dki.dki_unit = getminor(dev); >> dki.dki_maxtransfer = 1 << (SPA_MAXBLOCKSHIFT - zv->zv_min_bs); >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> if (ddi_copyout(&dki, (void *)arg, sizeof (dki), flag)) >> error = SET_ERROR(EFAULT); >> return (error); >> @@ -1687,7 +1662,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int flag, cred_t *cr, int *rvalp) >> dkm.dki_lbsize = 1U << zv->zv_min_bs; >> dkm.dki_capacity = zv->zv_volsize >> zv->zv_min_bs; >> dkm.dki_media_type = DK_UNKNOWN; >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> if (ddi_copyout(&dkm, (void *)arg, sizeof (dkm), flag)) >> error = SET_ERROR(EFAULT); >> return (error); >> @@ -1697,14 +1672,14 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int flag, cred_t *cr, int *rvalp) >> uint64_t vs = 
zv->zv_volsize; >> uint8_t bs = zv->zv_min_bs; >> >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> error = zvol_getefi((void *)arg, flag, vs, bs); >> return (error); >> } >> >> case DKIOCFLUSHWRITECACHE: >> dkc = (struct dk_callback *)arg; >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> zil_commit(zv->zv_zilog, ZVOL_OBJ); >> if ((flag & FKIOCTL) && dkc != NULL && dkc->dkc_callback) { >> (*dkc->dkc_callback)(dkc->dkc_cookie, error); >> @@ -1730,10 +1705,10 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int flag, cred_t *cr, int *rvalp) >> } >> if (wce) { >> zv->zv_flags |= ZVOL_WCE; >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> } else { >> zv->zv_flags &= ~ZVOL_WCE; >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> zil_commit(zv->zv_zilog, ZVOL_OBJ); >> } >> return (0); >> @@ -1828,7 +1803,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int flag, cred_t *cr, int *rvalp) >> break; >> >> } >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> return (error); >> } >> #endif /* sun */ >> @@ -1844,12 +1819,14 @@ zvol_init(void) >> { >> VERIFY(ddi_soft_state_init(&zfsdev_state, sizeof (zfs_soft_state_t), >> 1) == 0); >> + mutex_init(&zfsdev_state_lock, NULL, MUTEX_DEFAULT, NULL); >> ZFS_LOG(1, "ZVOL Initialized."); >> } >> >> void >> zvol_fini(void) >> { >> + mutex_destroy(&zfsdev_state_lock); >> ddi_soft_state_fini(&zfsdev_state); >> ZFS_LOG(1, "ZVOL Deinitialized."); >> } >> @@ -1889,7 +1866,7 @@ zvol_dump_init(zvol_state_t *zv, boolean_t resize) >> uint64_t version = spa_version(spa); >> enum zio_checksum checksum; >> >> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >> ASSERT(vd->vdev_ops == &vdev_root_ops); >> >> error = dmu_free_long_range(zv->zv_objset, ZVOL_OBJ, 0, >> @@ -2437,7 +2414,7 @@ zvol_rename_minor(struct g_geom *gp, const char *newname) >> struct g_provider *pp; >> 
zvol_state_t *zv; >> >> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >> g_topology_assert(); >> >> pp = LIST_FIRST(&gp->provider); >> @@ -2471,7 +2448,7 @@ zvol_rename_minors(const char *oldname, const char *newname) >> newnamelen = strlen(newname); >> >> DROP_GIANT(); >> - mutex_enter(&spa_namespace_lock); >> + mutex_enter(&zfsdev_state_lock); >> g_topology_lock(); >> >> LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) { >> @@ -2494,6 +2471,6 @@ zvol_rename_minors(const char *oldname, const char *newname) >> } >> >> g_topology_unlock(); >> - mutex_exit(&spa_namespace_lock); >> + mutex_exit(&zfsdev_state_lock); >> PICKUP_GIANT(); >> } >> -- >> 1.8.4.2 >> > >> From 57d5a383b585c32c77af54e8cdacaddf8ce7584f Mon Sep 17 00:00:00 2001 >> From: Richard Kojedzinszky >> Date: Wed, 18 Dec 2013 22:11:21 +0100 >> Subject: [PATCH 2/2] ZFS snapshot handling fix >> >> --- >> .../compat/opensolaris/kern/opensolaris_lookup.c | 13 +++--- >> .../opensolaris/uts/common/fs/zfs/zfs_ctldir.c | 53 +++++++++++++++------- >> 2 files changed, 42 insertions(+), 24 deletions(-) >> >> diff --git a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >> index 94383d6..4cac053 100644 >> --- a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >> +++ b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >> @@ -81,6 +81,8 @@ traverse(vnode_t **cvpp, int lktype) >> * progress on this vnode. >> */ >> >> + vn_lock(cvp, lktype); >> + >> for (;;) { >> /* >> * Reached the end of the mount chain? >> @@ -89,13 +91,7 @@ traverse(vnode_t **cvpp, int lktype) >> if (vfsp == NULL) >> break; >> error = vfs_busy(vfsp, 0); >> - /* >> - * tvp is NULL for *cvpp vnode, which we can't unlock. 
>> - */ >> - if (tvp != NULL) >> - vput(cvp); >> - else >> - vrele(cvp); >> + VOP_UNLOCK(cvp, 0); >> if (error) >> return (error); >> >> @@ -107,6 +103,9 @@ traverse(vnode_t **cvpp, int lktype) >> vfs_unbusy(vfsp); >> if (error != 0) >> return (error); >> + >> + VN_RELE(cvp); >> + >> cvp = tvp; >> } >> >> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >> index 28ab1fa..d3464b4 100644 >> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >> @@ -112,6 +112,25 @@ snapentry_compare(const void *a, const void *b) >> return (0); >> } >> >> +/* Return the zfsctl_snapdir_t object from current vnode, following >> + * the lock orders in zfsctl_snapdir_lookup() to avoid deadlocks. >> + * On return the passed in vp is unlocked */ >> +static zfsctl_snapdir_t * >> +zfsctl_snapshot_get_snapdir(vnode_t *vp, vnode_t **dvpp) >> +{ >> + gfs_dir_t *dp = vp->v_data; >> + *dvpp = dp->gfsd_file.gfs_parent; >> + zfsctl_snapdir_t *sdp; >> + >> + VN_HOLD(*dvpp); >> + VOP_UNLOCK(vp, 0); >> + vn_lock(*dvpp, LK_SHARED | LK_RETRY | LK_CANRECURSE); >> + sdp = (*dvpp)->v_data; >> + VOP_UNLOCK(*dvpp, 0); >> + >> + return (sdp); >> +} >> + >> #ifdef sun >> vnodeops_t *zfsctl_ops_root; >> vnodeops_t *zfsctl_ops_snapdir; >> @@ -1013,6 +1032,8 @@ zfsctl_snapdir_lookup(ap) >> * The snapshot was unmounted behind our backs, >> * try to remount it. 
>> */ >> + VOP_UNLOCK(*vpp, 0); >> + VN_HOLD(*vpp); >> VERIFY(zfsctl_snapshot_zname(dvp, nm, MAXNAMELEN, snapname) == 0); >> goto domount; >> } else { >> @@ -1064,7 +1085,6 @@ zfsctl_snapdir_lookup(ap) >> sep->se_name = kmem_alloc(strlen(nm) + 1, KM_SLEEP); >> (void) strcpy(sep->se_name, nm); >> *vpp = sep->se_root = zfsctl_snapshot_mknode(dvp, dmu_objset_id(snap)); >> - VN_HOLD(*vpp); >> avl_insert(&sdp->sd_snaps, sep, where); >> >> dmu_objset_rele(snap, FTAG); >> @@ -1075,6 +1095,7 @@ domount: >> (void) snprintf(mountpoint, mountpoint_len, >> "%s/" ZFS_CTLDIR_NAME "/snapshot/%s", >> dvp->v_vfsp->mnt_stat.f_mntonname, nm); >> + VN_HOLD(*vpp); >> err = mount_snapshot(curthread, vpp, "zfs", mountpoint, snapname, 0); >> kmem_free(mountpoint, mountpoint_len); >> if (err == 0) { >> @@ -1464,16 +1485,18 @@ zfsctl_snapshot_inactive(ap) >> int locked; >> vnode_t *dvp; >> >> - if (vp->v_count > 0) >> - goto end; >> - >> - VERIFY(gfs_dir_lookup(vp, "..", &dvp, cr, 0, NULL, NULL) == 0); >> - sdp = dvp->v_data; >> - VOP_UNLOCK(dvp, 0); >> + sdp = zfsctl_snapshot_get_snapdir(vp, &dvp); >> >> if (!(locked = MUTEX_HELD(&sdp->sd_lock))) >> mutex_enter(&sdp->sd_lock); >> >> + vn_lock(vp, LK_EXCLUSIVE | LK_RETRY); >> + >> + if (vp->v_count > 0) { >> + mutex_exit(&sdp->sd_lock); >> + return (0); >> + } >> + >> ASSERT(!vn_ismntpt(vp)); >> >> sep = avl_first(&sdp->sd_snaps); >> @@ -1494,7 +1517,6 @@ zfsctl_snapshot_inactive(ap) >> mutex_exit(&sdp->sd_lock); >> VN_RELE(dvp); >> >> -end: >> /* >> * Dispose of the vnode for the snapshot mount point. 
>> * This is safe to do because once this entry has been removed >> @@ -1595,20 +1617,17 @@ zfsctl_snapshot_lookup(ap) >> static int >> zfsctl_snapshot_vptocnp(struct vop_vptocnp_args *ap) >> { >> - zfsvfs_t *zfsvfs = ap->a_vp->v_vfsp->vfs_data; >> - vnode_t *dvp, *vp; >> + vnode_t *dvp, *vp = ap->a_vp; >> zfsctl_snapdir_t *sdp; >> zfs_snapentry_t *sep; >> - int error; >> + int error = 0; >> >> - ASSERT(zfsvfs->z_ctldir != NULL); >> - error = zfsctl_root_lookup(zfsvfs->z_ctldir, "snapshot", &dvp, >> - NULL, 0, NULL, kcred, NULL, NULL, NULL); >> - if (error != 0) >> - return (error); >> - sdp = dvp->v_data; >> + sdp = zfsctl_snapshot_get_snapdir(vp, &dvp); >> >> mutex_enter(&sdp->sd_lock); >> + >> + vn_lock(vp, LK_SHARED | LK_RETRY); >> + >> sep = avl_first(&sdp->sd_snaps); >> while (sep != NULL) { >> vp = sep->se_root; >> -- >> 1.8.4.2 >> > > -- > Pawel Jakub Dawidek http://www.wheelsystems.com > FreeBSD committer http://www.FreeBSD.org > Am I Evil? Yes, I Am! http://mobter.com > From owner-freebsd-fs@FreeBSD.ORG Tue Dec 24 06:10:28 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 675076EB; Tue, 24 Dec 2013 06:10:28 +0000 (UTC) Received: from krichy.tvnetwork.hu (krichy.tvnetwork.hu [109.61.101.194]) by mx1.freebsd.org (Postfix) with ESMTP id BD56E1238; Tue, 24 Dec 2013 06:10:27 +0000 (UTC) Received: by krichy.tvnetwork.hu (Postfix, from userid 1000) id 56ACAA7C7; Tue, 24 Dec 2013 07:10:05 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by krichy.tvnetwork.hu (Postfix) with ESMTP id 53A43A7C6; Tue, 24 Dec 2013 07:10:05 +0100 (CET) Date: Tue, 24 Dec 2013 07:10:05 +0100 (CET) From: krichy@tvnetwork.hu To: Pawel Jakub Dawidek Subject: Re: kern/184677 / ZFS snapshot handling deadlocks In-Reply-To: Message-ID: References: 
<20131220134405.GE1658@garage.freebsd.pl> User-Agent: Alpine 2.10 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org, avg@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 24 Dec 2013 06:10:28 -0000

Dear devs,

A third one, again involving the snapshot dirs:

# cd /tmp/test/.zfs/snapshot && yes .. | xargs stat > /dev/null &
# yes /tmp/test/.zfs/snapshot/snap | xargs umount > /dev/null

This will deadlock in a few seconds. The first command holds a lock on the
.zfs/snapshot vnode and, when looking up '..', tries to lock .zfs, while the
other does the same in the reverse order.

In a vop_lookup, why must the passed-in vnode and the returned vnode be
locked? Why is it not enough to have them held?

Thanks in advance,

Kojedzinszky Richard
Euronet Magyarorszag Informatikai Zrt.

On Mon, 23 Dec 2013, krichy@tvnetwork.hu wrote:

> Date: Mon, 23 Dec 2013 17:55:08 +0100 (CET)
> From: krichy@tvnetwork.hu
> To: Pawel Jakub Dawidek
> Cc: freebsd-fs@freebsd.org, avg@freebsd.org
> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>
> Dear Pawel,
>
> Thanks for your response. I hope someone will review my work.
>
> Secondly, I think I've found another deadlock scenario:
>
> When 'zfs send -R' finishes, zfsdev_close() holds spa_namespace_lock. It
> then unmounts all snapshots, which in turn goes to
> zfsctl_snapshot_inactive(), in which sdp->sd_lock is locked.
>
> Concurrently, a zfsctl_snapdir_lookup(), holding the same sdp->sd_lock,
> does a snapshot mount, which somewhere tries to acquire
> spa_namespace_lock. So it deadlocks. Currently I don't know how to work
> around this.
>
> Does spa_namespace_lock need to be held in zfsctl_snapshot_inactive()?
>
> What if we released it on function entry, and re-acquired it upon exit?
>
> Thanks in advance,
>
>
> Kojedzinszky Richard
> Euronet Magyarorszag Informatikai Zrt.
>
> On Fri, 20 Dec 2013, Pawel Jakub Dawidek wrote:
>
>> Date: Fri, 20 Dec 2013 14:44:05 +0100
>> From: Pawel Jakub Dawidek
>> To: krichy@tvnetwork.hu
>> Cc: freebsd-fs@freebsd.org
>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks
>>
>> On Thu, Dec 19, 2013 at 04:46:11PM +0100, krichy@tvnetwork.hu wrote:
>>> Dear devs,
>>>
>>> I am attaching a clearer patch/fix for my snapshot handling issues
>>> (0002), but I would be happy if some ZFS expert would comment on it. I
>>> have been trying to solve this for at least two weeks now, and an ACK or
>>> a NACK would be nice from someone. A commit is also reverted, since it
>>> caused deadlocks too. I've read its comments, which say it also
>>> eliminates deadlocks, but I did not find any reference to how to
>>> reproduce that deadlock. In my view reverting it makes my issues
>>> disappear, but I don't know what new cases it will raise.
>>
>> Richard,
>>
>> I won't be able to analyse it myself anytime soon, because of other
>> obligations, but I forwarded your e-mail to the zfs-devel@ mailing list
>> (it is closed, but gathers the FreeBSD ZFS devs). Hopefully someone from
>> there will be able to help you.
>>
>>> I've rewritten traverse() to be more like upstream, and added two extra
>>> VN_HOLD()s to snapdir_lookup() for when traverse() returns the same
>>> vnode that was passed to it (I don't even know why those two extra holds
>>> are needed when creating a snapshot vnode). Unfortunately, because
>>> FreeBSD calls vop_inactive callbacks with the vnode locked, that could
>>> also cause deadlocks, so zfsctl_snapshot_inactive() and
>>> zfsctl_snapshot_vptocnp() have been rewritten to work around that.
>>>
>>> After this, one may also get a deadlock, when a simple access calls
>>> into zfsctl_snapshot_lookup().
The documentation says, that those vnodes >>> should always be covered, but after some stress test, sometimes we hit >>> that call, and that can cause again deadlocks. In our environment I've >>> just uncommented that callback, which returns ENODIR on some calls, but at >>> least not a deadlock. >>> >>> The attached script can be used to reproduce my cases (would one confirm >>> that?), and after the patches applied, they disappear (confirm?). >>> >>> Thanks, >>> >>> >>> Kojedzinszky Richard >>> Euronet Magyarorszag Informatikai Zrt. >>> >>> On Tue, 17 Dec 2013, krichy@tvnetwork.hu wrote: >>> >>>> Date: Tue, 17 Dec 2013 14:50:16 +0100 (CET) >>>> From: krichy@tvnetwork.hu >>>> To: pjd@freebsd.org >>>> Cc: freebsd-fs@freebsd.org >>>> Subject: Re: kern/184677 (fwd) >>>> >>>> Dear devs, >>>> >>>> I will sum up my experience regarding the issue: >>>> >>>> The sympton is that a concurrent 'zfs send -R' and some activity on the >>>> snapshot dir (or in the snapshot) may cause a deadlock. >>>> >>>> After investigating the problem, I found that zfs send umounts the >>>> snapshots, >>>> and that causes the deadlock, so later I tested only with concurrent >>>> umount >>>> and the "activity". More later I found that listing the snapshots in >>>> .zfs/snapshot/ and unounting them can cause the found deadlock, so I used >>>> them for the tests. But for my surprise, instead of a deadlock, a >>>> recursive >>>> lock panic has arised. >>>> >>>> The vnode for the ".zfs/snapshot/" directory contains zfs's >>>> zfsctl_snapdir_t >>>> structure (sdp). This contains a tree of mounted snapshots, and each >>>> entry >>>> (sep) contains the vnode of entry on which the snapshot is mounted on top >>>> (se_root). The strange is that the se_root member does not hold a >>>> reference >>>> for the vnode, just a simple pointer to it. 
>>>> >>>> Upon entry lookup (zfsctl_snapdir_lookup()) the "snapshot" vnode is >>>> locked, >>>> the zfsctl_snapdir_t's tree is locked, and searched for the mount if it >>>> exists already. If it founds no entry, does the mount. In the case of an >>>> entry was found, the se_root member contains the vnode which the snapshot >>>> is >>>> mounted on. Thus, a reference is taken for it, and the traverse() call >>>> will >>>> resolve to the real root vnode of the mounted snapshot, returning it as >>>> locked. (Examining the traverse() code I've found that it did not follow >>>> FreeBSD's lock order recommendation described in sys/kern/vfs_subr.c.) >>>> >>>> On the other way, when an umount is issued, the se_root vnode looses its >>>> last >>>> reference (as only the mountpoint holds one for it), it goes through the >>>> vinactive() path, to zfsctl_snapshot_inactive(). In FreeBSD this is >>>> called >>>> with a locked vnode, so this is a deadlock race condition. While >>>> zfsctl_snapdir_lookup() holds the mutex for the sdp tree, and traverse() >>>> tries to acquire the se_root, zfsctl_snapshot_inactive() holds the lock >>>> on >>>> se_root while tries to access the sdp lock. >>>> >>>> The zfsctl_snapshot_inactive() has an if statement checking the >>>> v_usecount, >>>> which is incremented in zfsctl_snapdir_lookup(), but in that context it >>>> is >>>> not covered by VI_LOCK. And it seems to me that FreeBSD's vinactive() >>>> path >>>> assumes that the vnode remains inactive (as opposed to illumos, at least >>>> how >>>> i read the code). So zfsctl_snapshot_inactive() must free resources while >>>> in >>>> a locked state. I was a bit confused, and probably that is why the >>>> previously >>>> posted patch is as is. >>>> >>>> Maybe if I had some clues on the directions of this problem, I could have >>>> worked more for a nicer, shorter solution. >>>> >>>> Please someone comment on my post. 
>>>> >>>> Regards, >>>> >>>> >>>> >>>> Kojedzinszky Richard >>>> Euronet Magyarorszag Informatikai Zrt. >>>> >>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote: >>>> >>>>> Date: Mon, 16 Dec 2013 16:52:16 +0100 (CET) >>>>> From: krichy@tvnetwork.hu >>>>> To: pjd@freebsd.org >>>>> Cc: freebsd-fs@freebsd.org >>>>> Subject: Re: kern/184677 (fwd) >>>>> >>>>> Dear PJD, >>>>> >>>>> I am a happy FreeBSD user, and I am sure you've read my previous posts >>>>> regarding some issues in ZFS. Please give me some advice on where to >>>>> look >>>>> for solutions, or on how I could help resolve those issues. >>>>> >>>>> Regards, >>>>> Kojedzinszky Richard >>>>> Euronet Magyarorszag Informatikai Zrt. >>>>> >>>>> ---------- Forwarded message ---------- >>>>> Date: Mon, 16 Dec 2013 15:23:06 +0100 (CET) >>>>> From: krichy@tvnetwork.hu >>>>> To: freebsd-fs@freebsd.org >>>>> Subject: Re: kern/184677 >>>>> >>>>> >>>>> It seems that pjd made a change which eliminated the zfsdev_state_lock on >>>>> Fri >>>>> Aug 12 07:04:16 2011 +0000, which might have introduced a new deadlock >>>>> situation. Any comments on this? >>>>> >>>>> >>>>> Kojedzinszky Richard >>>>> Euronet Magyarorszag Informatikai Zrt. >>>>> >>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote: >>>>> >>>>>> Date: Mon, 16 Dec 2013 11:08:11 +0100 (CET) >>>>>> From: krichy@tvnetwork.hu >>>>>> To: freebsd-fs@freebsd.org >>>>>> Subject: kern/184677 >>>>>> >>>>>> Dear devs, >>>>>> >>>>>> I've attached a patch which makes the recursive lockmgr panic disappear, and >>>>>> makes the reported bug disappear as well. I don't know whether I followed any >>>>>> guidelines well or not, but at least it works for me. Would some >>>>>> ZFS/FreeBSD fs expert please review it, and fix it where needed. 
>>>>>> >>>>>> But unfortunately, my original problem is still not solved, maybe the >>>>>> same >>>>>> as Ryan's: >>>>>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-December/018707.html >>>>>> >>>>>> Tracing the problem down shows that zfsctl_snapdir_lookup() tries to >>>>>> acquire >>>>>> spa_namespace_lock, while a finishing zfs send -R does a >>>>>> zfsdev_close(), which also holds the same mutex. And this causes a >>>>>> deadlock scenario. I looked at illumos's code, and for some reason they >>>>>> use another mutex in zfsdev_close(), which therefore may not deadlock >>>>>> with >>>>>> zfsctl_snapdir_lookup(). But I am still investigating the problem. >>>>>> >>>>>> I would like to help make ZFS on FreeBSD more stable, with its >>>>>> whole >>>>>> functionality. I would be very thankful if some expert would give some >>>>>> advice on how to solve these bugs. PJD, Steven, Xin? >>>>>> >>>>>> Thanks in advance, >>>>>> >>>>>> >>>>>> Kojedzinszky Richard >>>>>> Euronet Magyarorszag Informatikai Zrt. >>>>> _______________________________________________ >>>>> freebsd-fs@freebsd.org mailing list >>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>>>> >>>> >> >> >>> From 39298da838d006ad225e41529d7b7f240fccfe73 Mon Sep 17 00:00:00 2001 >>> From: Richard Kojedzinszky >>> Date: Mon, 16 Dec 2013 15:39:11 +0100 >>> Subject: [PATCH 1/2] Revert "Eliminate the zfsdev_state_lock entirely and >>> replace it with the" >>> >>> This reverts commit 1d8972b3f353f986eb5b85bc108b1c0d946d3218. 
>>> >>> Conflicts: >>> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>> --- >>> .../opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h | 1 + >>> .../opensolaris/uts/common/fs/zfs/vdev_geom.c | 14 ++- >>> .../opensolaris/uts/common/fs/zfs/zfs_ioctl.c | 16 +-- >>> .../contrib/opensolaris/uts/common/fs/zfs/zvol.c | 119 >>> +++++++++------------ >>> 4 files changed, 70 insertions(+), 80 deletions(-) >>> >>> diff --git >>> a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>> index af2def2..8272c4d 100644 >>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>> @@ -383,6 +383,7 @@ extern void *zfsdev_get_soft_state(minor_t minor, >>> extern minor_t zfsdev_minor_alloc(void); >>> >>> extern void *zfsdev_state; >>> +extern kmutex_t zfsdev_state_lock; >>> >>> #endif /* _KERNEL */ >>> >>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>> index 15685a5..5c3e9f3 100644 >>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>> @@ -581,7 +581,7 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t >>> *max_psize, >>> struct g_provider *pp; >>> struct g_consumer *cp; >>> size_t bufsize; >>> - int error; >>> + int error, lock; >>> >>> /* >>> * We must have a pathname, and it must be absolute. 
>>> @@ -593,6 +593,12 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t >>> *max_psize, >>> >>> vd->vdev_tsd = NULL; >>> >>> + if (mutex_owned(&spa_namespace_lock)) { >>> + mutex_exit(&spa_namespace_lock); >>> + lock = 1; >>> + } else { >>> + lock = 0; >>> + } >>> DROP_GIANT(); >>> g_topology_lock(); >>> error = 0; >>> @@ -624,7 +630,11 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t >>> *max_psize, >>> !ISP2(cp->provider->sectorsize)) { >>> ZFS_LOG(1, "Provider %s has unsupported sectorsize.", >>> vd->vdev_path); >>> + >>> + g_topology_lock(); >>> vdev_geom_detach(cp, 0); >>> + g_topology_unlock(); >>> + >>> error = EINVAL; >>> cp = NULL; >>> } else if (cp->acw == 0 && (spa_mode(vd->vdev_spa) & FWRITE) != 0) { >>> @@ -647,6 +657,8 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t >>> *max_psize, >>> } >>> g_topology_unlock(); >>> PICKUP_GIANT(); >>> + if (lock) >>> + mutex_enter(&spa_namespace_lock); >>> if (cp == NULL) { >>> vd->vdev_stat.vs_aux = VDEV_AUX_OPEN_FAILED; >>> return (error); >>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>> index e9fba26..91becde 100644 >>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>> @@ -5635,7 +5635,7 @@ zfsdev_minor_alloc(void) >>> static minor_t last_minor; >>> minor_t m; >>> >>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>> >>> for (m = last_minor + 1; m != last_minor; m++) { >>> if (m > ZFSDEV_MAX_MINOR) >>> @@ -5655,7 +5655,7 @@ zfs_ctldev_init(struct cdev *devp) >>> minor_t minor; >>> zfs_soft_state_t *zs; >>> >>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>> >>> minor = zfsdev_minor_alloc(); >>> if (minor == 0) >>> @@ -5676,7 +5676,7 @@ zfs_ctldev_init(struct cdev *devp) >>> static void >>> zfs_ctldev_destroy(zfs_onexit_t *zo, 
minor_t minor) >>> { >>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>> >>> zfs_onexit_destroy(zo); >>> ddi_soft_state_free(zfsdev_state, minor); >>> @@ -5706,9 +5706,9 @@ zfsdev_open(struct cdev *devp, int flag, int mode, >>> struct thread *td) >>> >>> /* This is the control device. Allocate a new minor if requested. */ >>> if (flag & FEXCL) { >>> - mutex_enter(&spa_namespace_lock); >>> + mutex_enter(&zfsdev_state_lock); >>> error = zfs_ctldev_init(devp); >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> } >>> >>> return (error); >>> @@ -5723,14 +5723,14 @@ zfsdev_close(void *data) >>> if (minor == 0) >>> return; >>> >>> - mutex_enter(&spa_namespace_lock); >>> + mutex_enter(&zfsdev_state_lock); >>> zo = zfsdev_get_soft_state(minor, ZSST_CTLDEV); >>> if (zo == NULL) { >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> return; >>> } >>> zfs_ctldev_destroy(zo, minor); >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> } >>> >>> static int >>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>> index 72d4502..aec5219 100644 >>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>> @@ -104,11 +104,12 @@ static char *zvol_tag = "zvol_tag"; >>> #define ZVOL_DUMPSIZE "dumpsize" >>> >>> /* >>> - * The spa_namespace_lock protects the zfsdev_state structure from being >>> - * modified while it's being used, e.g. an open that comes in before a >>> - * create finishes. It also protects temporary opens of the dataset so >>> that, >>> + * This lock protects the zfsdev_state structure from being modified >>> + * while it's being used, e.g. an open that comes in before a create >>> + * finishes. 
It also protects temporary opens of the dataset so that, >>> * e.g., an open doesn't get a spurious EBUSY. >>> */ >>> +kmutex_t zfsdev_state_lock; >>> static uint32_t zvol_minors; >>> >>> typedef struct zvol_extent { >>> @@ -249,7 +250,7 @@ zvol_minor_lookup(const char *name) >>> struct g_geom *gp; >>> zvol_state_t *zv = NULL; >>> >>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>> >>> g_topology_lock(); >>> LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) { >>> @@ -465,11 +466,11 @@ zvol_name2minor(const char *name, minor_t *minor) >>> { >>> zvol_state_t *zv; >>> >>> - mutex_enter(&spa_namespace_lock); >>> + mutex_enter(&zfsdev_state_lock); >>> zv = zvol_minor_lookup(name); >>> if (minor && zv) >>> *minor = zv->zv_minor; >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> return (zv ? 0 : -1); >>> } >>> #endif /* sun */ >>> @@ -489,10 +490,10 @@ zvol_create_minor(const char *name) >>> >>> ZFS_LOG(1, "Creating ZVOL %s...", name); >>> >>> - mutex_enter(&spa_namespace_lock); >>> + mutex_enter(&zfsdev_state_lock); >>> >>> if (zvol_minor_lookup(name) != NULL) { >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> return (SET_ERROR(EEXIST)); >>> } >>> >>> @@ -500,20 +501,20 @@ zvol_create_minor(const char *name) >>> error = dmu_objset_own(name, DMU_OST_ZVOL, B_TRUE, FTAG, &os); >>> >>> if (error) { >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> return (error); >>> } >>> >>> #ifdef sun >>> if ((minor = zfsdev_minor_alloc()) == 0) { >>> dmu_objset_disown(os, FTAG); >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> return (SET_ERROR(ENXIO)); >>> } >>> >>> if (ddi_soft_state_zalloc(zfsdev_state, minor) != DDI_SUCCESS) { >>> dmu_objset_disown(os, FTAG); >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> return (SET_ERROR(EAGAIN)); >>> } >>> (void) ddi_prop_update_string(minor, zfs_dip, 
ZVOL_PROP_NAME, >>> @@ -525,7 +526,7 @@ zvol_create_minor(const char *name) >>> minor, DDI_PSEUDO, 0) == DDI_FAILURE) { >>> ddi_soft_state_free(zfsdev_state, minor); >>> dmu_objset_disown(os, FTAG); >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> return (SET_ERROR(EAGAIN)); >>> } >>> >>> @@ -536,7 +537,7 @@ zvol_create_minor(const char *name) >>> ddi_remove_minor_node(zfs_dip, chrbuf); >>> ddi_soft_state_free(zfsdev_state, minor); >>> dmu_objset_disown(os, FTAG); >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> return (SET_ERROR(EAGAIN)); >>> } >>> >>> @@ -587,7 +588,7 @@ zvol_create_minor(const char *name) >>> >>> zvol_minors++; >>> >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> >>> zvol_geom_run(zv); >>> >>> @@ -609,7 +610,7 @@ zvol_remove_zv(zvol_state_t *zv) >>> minor_t minor = zv->zv_minor; >>> #endif >>> >>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>> if (zv->zv_total_opens != 0) >>> return (SET_ERROR(EBUSY)); >>> >>> @@ -635,15 +636,15 @@ zvol_remove_minor(const char *name) >>> zvol_state_t *zv; >>> int rc; >>> >>> - mutex_enter(&spa_namespace_lock); >>> + mutex_enter(&zfsdev_state_lock); >>> if ((zv = zvol_minor_lookup(name)) == NULL) { >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> return (SET_ERROR(ENXIO)); >>> } >>> g_topology_lock(); >>> rc = zvol_remove_zv(zv); >>> g_topology_unlock(); >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> return (rc); >>> } >>> >>> @@ -755,7 +756,7 @@ zvol_update_volsize(objset_t *os, uint64_t volsize) >>> dmu_tx_t *tx; >>> int error; >>> >>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>> >>> tx = dmu_tx_create(os); >>> dmu_tx_hold_zap(tx, ZVOL_ZAP_OBJ, TRUE, NULL); >>> @@ -786,7 +787,7 @@ zvol_remove_minors(const char *name) >>> namelen = strlen(name); >>> >>> 
DROP_GIANT(); >>> - mutex_enter(&spa_namespace_lock); >>> + mutex_enter(&zfsdev_state_lock); >>> g_topology_lock(); >>> >>> LIST_FOREACH_SAFE(gp, &zfs_zvol_class.geom, geom, gptmp) { >>> @@ -804,7 +805,7 @@ zvol_remove_minors(const char *name) >>> } >>> >>> g_topology_unlock(); >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> PICKUP_GIANT(); >>> } >>> >>> @@ -818,10 +819,10 @@ zvol_set_volsize(const char *name, major_t maj, >>> uint64_t volsize) >>> uint64_t old_volsize = 0ULL; >>> uint64_t readonly; >>> >>> - mutex_enter(&spa_namespace_lock); >>> + mutex_enter(&zfsdev_state_lock); >>> zv = zvol_minor_lookup(name); >>> if ((error = dmu_objset_hold(name, FTAG, &os)) != 0) { >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> return (error); >>> } >>> >>> @@ -888,7 +889,7 @@ zvol_set_volsize(const char *name, major_t maj, >>> uint64_t volsize) >>> out: >>> dmu_objset_rele(os, FTAG); >>> >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> >>> return (error); >>> } >>> @@ -899,36 +900,19 @@ zvol_open(struct g_provider *pp, int flag, int >>> count) >>> { >>> zvol_state_t *zv; >>> int err = 0; >>> - boolean_t locked = B_FALSE; >>> >>> - /* >>> - * Protect against recursively entering spa_namespace_lock >>> - * when spa_open() is used for a pool on a (local) ZVOL(s). >>> - * This is needed since we replaced upstream zfsdev_state_lock >>> - * with spa_namespace_lock in the ZVOL code. >>> - * We are using the same trick as spa_open(). >>> - * Note that calls in zvol_first_open which need to resolve >>> - * pool name to a spa object will enter spa_open() >>> - * recursively, but that function already has all the >>> - * necessary protection. 
>>> - */ >>> - if (!MUTEX_HELD(&spa_namespace_lock)) { >>> - mutex_enter(&spa_namespace_lock); >>> - locked = B_TRUE; >>> - } >>> + mutex_enter(&zfsdev_state_lock); >>> >>> zv = pp->private; >>> if (zv == NULL) { >>> - if (locked) >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> return (SET_ERROR(ENXIO)); >>> } >>> >>> if (zv->zv_total_opens == 0) >>> err = zvol_first_open(zv); >>> if (err) { >>> - if (locked) >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> return (err); >>> } >>> if ((flag & FWRITE) && (zv->zv_flags & ZVOL_RDONLY)) { >>> @@ -950,15 +934,13 @@ zvol_open(struct g_provider *pp, int flag, int >>> count) >>> #endif >>> >>> zv->zv_total_opens += count; >>> - if (locked) >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> >>> return (err); >>> out: >>> if (zv->zv_total_opens == 0) >>> zvol_last_close(zv); >>> - if (locked) >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> return (err); >>> } >>> >>> @@ -968,18 +950,12 @@ zvol_close(struct g_provider *pp, int flag, int >>> count) >>> { >>> zvol_state_t *zv; >>> int error = 0; >>> - boolean_t locked = B_FALSE; >>> >>> - /* See comment in zvol_open(). 
*/ >>> - if (!MUTEX_HELD(&spa_namespace_lock)) { >>> - mutex_enter(&spa_namespace_lock); >>> - locked = B_TRUE; >>> - } >>> + mutex_enter(&zfsdev_state_lock); >>> >>> zv = pp->private; >>> if (zv == NULL) { >>> - if (locked) >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> return (SET_ERROR(ENXIO)); >>> } >>> >>> @@ -1002,8 +978,7 @@ zvol_close(struct g_provider *pp, int flag, int >>> count) >>> if (zv->zv_total_opens == 0) >>> zvol_last_close(zv); >>> >>> - if (locked) >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> return (error); >>> } >>> >>> @@ -1658,12 +1633,12 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>> flag, cred_t *cr, int *rvalp) >>> int error = 0; >>> rl_t *rl; >>> >>> - mutex_enter(&spa_namespace_lock); >>> + mutex_enter(&zfsdev_state_lock); >>> >>> zv = zfsdev_get_soft_state(getminor(dev), ZSST_ZVOL); >>> >>> if (zv == NULL) { >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> return (SET_ERROR(ENXIO)); >>> } >>> ASSERT(zv->zv_total_opens > 0); >>> @@ -1677,7 +1652,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>> flag, cred_t *cr, int *rvalp) >>> dki.dki_ctype = DKC_UNKNOWN; >>> dki.dki_unit = getminor(dev); >>> dki.dki_maxtransfer = 1 << (SPA_MAXBLOCKSHIFT - >>> zv->zv_min_bs); >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> if (ddi_copyout(&dki, (void *)arg, sizeof (dki), flag)) >>> error = SET_ERROR(EFAULT); >>> return (error); >>> @@ -1687,7 +1662,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>> flag, cred_t *cr, int *rvalp) >>> dkm.dki_lbsize = 1U << zv->zv_min_bs; >>> dkm.dki_capacity = zv->zv_volsize >> zv->zv_min_bs; >>> dkm.dki_media_type = DK_UNKNOWN; >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> if (ddi_copyout(&dkm, (void *)arg, sizeof (dkm), flag)) >>> error = SET_ERROR(EFAULT); >>> return (error); >>> @@ -1697,14 +1672,14 @@ zvol_ioctl(dev_t dev, 
int cmd, intptr_t arg, int >>> flag, cred_t *cr, int *rvalp) >>> uint64_t vs = zv->zv_volsize; >>> uint8_t bs = zv->zv_min_bs; >>> >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> error = zvol_getefi((void *)arg, flag, vs, bs); >>> return (error); >>> } >>> >>> case DKIOCFLUSHWRITECACHE: >>> dkc = (struct dk_callback *)arg; >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> zil_commit(zv->zv_zilog, ZVOL_OBJ); >>> if ((flag & FKIOCTL) && dkc != NULL && dkc->dkc_callback) { >>> (*dkc->dkc_callback)(dkc->dkc_cookie, error); >>> @@ -1730,10 +1705,10 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>> flag, cred_t *cr, int *rvalp) >>> } >>> if (wce) { >>> zv->zv_flags |= ZVOL_WCE; >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> } else { >>> zv->zv_flags &= ~ZVOL_WCE; >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> zil_commit(zv->zv_zilog, ZVOL_OBJ); >>> } >>> return (0); >>> @@ -1828,7 +1803,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>> flag, cred_t *cr, int *rvalp) >>> break; >>> >>> } >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> return (error); >>> } >>> #endif /* sun */ >>> @@ -1844,12 +1819,14 @@ zvol_init(void) >>> { >>> VERIFY(ddi_soft_state_init(&zfsdev_state, sizeof (zfs_soft_state_t), >>> 1) == 0); >>> + mutex_init(&zfsdev_state_lock, NULL, MUTEX_DEFAULT, NULL); >>> ZFS_LOG(1, "ZVOL Initialized."); >>> } >>> >>> void >>> zvol_fini(void) >>> { >>> + mutex_destroy(&zfsdev_state_lock); >>> ddi_soft_state_fini(&zfsdev_state); >>> ZFS_LOG(1, "ZVOL Deinitialized."); >>> } >>> @@ -1889,7 +1866,7 @@ zvol_dump_init(zvol_state_t *zv, boolean_t resize) >>> uint64_t version = spa_version(spa); >>> enum zio_checksum checksum; >>> >>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>> ASSERT(vd->vdev_ops == &vdev_root_ops); >>> >>> error = 
dmu_free_long_range(zv->zv_objset, ZVOL_OBJ, 0, >>> @@ -2437,7 +2414,7 @@ zvol_rename_minor(struct g_geom *gp, const char >>> *newname) >>> struct g_provider *pp; >>> zvol_state_t *zv; >>> >>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>> g_topology_assert(); >>> >>> pp = LIST_FIRST(&gp->provider); >>> @@ -2471,7 +2448,7 @@ zvol_rename_minors(const char *oldname, const char >>> *newname) >>> newnamelen = strlen(newname); >>> >>> DROP_GIANT(); >>> - mutex_enter(&spa_namespace_lock); >>> + mutex_enter(&zfsdev_state_lock); >>> g_topology_lock(); >>> >>> LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) { >>> @@ -2494,6 +2471,6 @@ zvol_rename_minors(const char *oldname, const char >>> *newname) >>> } >>> >>> g_topology_unlock(); >>> - mutex_exit(&spa_namespace_lock); >>> + mutex_exit(&zfsdev_state_lock); >>> PICKUP_GIANT(); >>> } >>> -- >>> 1.8.4.2 >>> >> >>> From 57d5a383b585c32c77af54e8cdacaddf8ce7584f Mon Sep 17 00:00:00 2001 >>> From: Richard Kojedzinszky >>> Date: Wed, 18 Dec 2013 22:11:21 +0100 >>> Subject: [PATCH 2/2] ZFS snapshot handling fix >>> >>> --- >>> .../compat/opensolaris/kern/opensolaris_lookup.c | 13 +++--- >>> .../opensolaris/uts/common/fs/zfs/zfs_ctldir.c | 53 >>> +++++++++++++++------- >>> 2 files changed, 42 insertions(+), 24 deletions(-) >>> >>> diff --git a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >>> b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >>> index 94383d6..4cac053 100644 >>> --- a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >>> +++ b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >>> @@ -81,6 +81,8 @@ traverse(vnode_t **cvpp, int lktype) >>> * progress on this vnode. >>> */ >>> >>> + vn_lock(cvp, lktype); >>> + >>> for (;;) { >>> /* >>> * Reached the end of the mount chain? 
>>> @@ -89,13 +91,7 @@ traverse(vnode_t **cvpp, int lktype) >>> if (vfsp == NULL) >>> break; >>> error = vfs_busy(vfsp, 0); >>> - /* >>> - * tvp is NULL for *cvpp vnode, which we can't unlock. >>> - */ >>> - if (tvp != NULL) >>> - vput(cvp); >>> - else >>> - vrele(cvp); >>> + VOP_UNLOCK(cvp, 0); >>> if (error) >>> return (error); >>> >>> @@ -107,6 +103,9 @@ traverse(vnode_t **cvpp, int lktype) >>> vfs_unbusy(vfsp); >>> if (error != 0) >>> return (error); >>> + >>> + VN_RELE(cvp); >>> + >>> cvp = tvp; >>> } >>> >>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >>> index 28ab1fa..d3464b4 100644 >>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >>> @@ -112,6 +112,25 @@ snapentry_compare(const void *a, const void *b) >>> return (0); >>> } >>> >>> +/* Return the zfsctl_snapdir_t object from current vnode, following >>> + * the lock orders in zfsctl_snapdir_lookup() to avoid deadlocks. >>> + * On return the passed in vp is unlocked */ >>> +static zfsctl_snapdir_t * >>> +zfsctl_snapshot_get_snapdir(vnode_t *vp, vnode_t **dvpp) >>> +{ >>> + gfs_dir_t *dp = vp->v_data; >>> + *dvpp = dp->gfsd_file.gfs_parent; >>> + zfsctl_snapdir_t *sdp; >>> + >>> + VN_HOLD(*dvpp); >>> + VOP_UNLOCK(vp, 0); >>> + vn_lock(*dvpp, LK_SHARED | LK_RETRY | LK_CANRECURSE); >>> + sdp = (*dvpp)->v_data; >>> + VOP_UNLOCK(*dvpp, 0); >>> + >>> + return (sdp); >>> +} >>> + >>> #ifdef sun >>> vnodeops_t *zfsctl_ops_root; >>> vnodeops_t *zfsctl_ops_snapdir; >>> @@ -1013,6 +1032,8 @@ zfsctl_snapdir_lookup(ap) >>> * The snapshot was unmounted behind our backs, >>> * try to remount it. 
>>> */ >>> + VOP_UNLOCK(*vpp, 0); >>> + VN_HOLD(*vpp); >>> VERIFY(zfsctl_snapshot_zname(dvp, nm, MAXNAMELEN, >>> snapname) == 0); >>> goto domount; >>> } else { >>> @@ -1064,7 +1085,6 @@ zfsctl_snapdir_lookup(ap) >>> sep->se_name = kmem_alloc(strlen(nm) + 1, KM_SLEEP); >>> (void) strcpy(sep->se_name, nm); >>> *vpp = sep->se_root = zfsctl_snapshot_mknode(dvp, >>> dmu_objset_id(snap)); >>> - VN_HOLD(*vpp); >>> avl_insert(&sdp->sd_snaps, sep, where); >>> >>> dmu_objset_rele(snap, FTAG); >>> @@ -1075,6 +1095,7 @@ domount: >>> (void) snprintf(mountpoint, mountpoint_len, >>> "%s/" ZFS_CTLDIR_NAME "/snapshot/%s", >>> dvp->v_vfsp->mnt_stat.f_mntonname, nm); >>> + VN_HOLD(*vpp); >>> err = mount_snapshot(curthread, vpp, "zfs", mountpoint, snapname, 0); >>> kmem_free(mountpoint, mountpoint_len); >>> if (err == 0) { >>> @@ -1464,16 +1485,18 @@ zfsctl_snapshot_inactive(ap) >>> int locked; >>> vnode_t *dvp; >>> >>> - if (vp->v_count > 0) >>> - goto end; >>> - >>> - VERIFY(gfs_dir_lookup(vp, "..", &dvp, cr, 0, NULL, NULL) == 0); >>> - sdp = dvp->v_data; >>> - VOP_UNLOCK(dvp, 0); >>> + sdp = zfsctl_snapshot_get_snapdir(vp, &dvp); >>> >>> if (!(locked = MUTEX_HELD(&sdp->sd_lock))) >>> mutex_enter(&sdp->sd_lock); >>> >>> + vn_lock(vp, LK_EXCLUSIVE | LK_RETRY); >>> + >>> + if (vp->v_count > 0) { >>> + mutex_exit(&sdp->sd_lock); >>> + return (0); >>> + } >>> + >>> ASSERT(!vn_ismntpt(vp)); >>> >>> sep = avl_first(&sdp->sd_snaps); >>> @@ -1494,7 +1517,6 @@ zfsctl_snapshot_inactive(ap) >>> mutex_exit(&sdp->sd_lock); >>> VN_RELE(dvp); >>> >>> -end: >>> /* >>> * Dispose of the vnode for the snapshot mount point. 
>>> * This is safe to do because once this entry has been removed >>> @@ -1595,20 +1617,17 @@ zfsctl_snapshot_lookup(ap) >>> static int >>> zfsctl_snapshot_vptocnp(struct vop_vptocnp_args *ap) >>> { >>> - zfsvfs_t *zfsvfs = ap->a_vp->v_vfsp->vfs_data; >>> - vnode_t *dvp, *vp; >>> + vnode_t *dvp, *vp = ap->a_vp; >>> zfsctl_snapdir_t *sdp; >>> zfs_snapentry_t *sep; >>> - int error; >>> + int error = 0; >>> >>> - ASSERT(zfsvfs->z_ctldir != NULL); >>> - error = zfsctl_root_lookup(zfsvfs->z_ctldir, "snapshot", &dvp, >>> - NULL, 0, NULL, kcred, NULL, NULL, NULL); >>> - if (error != 0) >>> - return (error); >>> - sdp = dvp->v_data; >>> + sdp = zfsctl_snapshot_get_snapdir(vp, &dvp); >>> >>> mutex_enter(&sdp->sd_lock); >>> + >>> + vn_lock(vp, LK_SHARED | LK_RETRY); >>> + >>> sep = avl_first(&sdp->sd_snaps); >>> while (sep != NULL) { >>> vp = sep->se_root; >>> -- >>> 1.8.4.2 >>> >> >> -- >> Pawel Jakub Dawidek http://www.wheelsystems.com >> FreeBSD committer http://www.FreeBSD.org >> Am I Evil? Yes, I Am! 
http://mobter.com >> > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@FreeBSD.ORG Tue Dec 24 07:25:48 2013 Return-Path: Delivered-To: freebsd-fs@smarthost.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4EB142F3; Tue, 24 Dec 2013 07:25:48 +0000 (UTC) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:1900:2254:206c::16:87]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 239241636; Tue, 24 Dec 2013 07:25:48 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.7/8.14.7) with ESMTP id rBO7PlWI012584; Tue, 24 Dec 2013 07:25:47 GMT (envelope-from delphij@freefall.freebsd.org) Received: (from delphij@localhost) by freefall.freebsd.org (8.14.7/8.14.7/Submit) id rBO7Pl9c012567; Tue, 24 Dec 2013 07:25:47 GMT (envelope-from delphij) Date: Tue, 24 Dec 2013 07:25:47 GMT Message-Id: <201312240725.rBO7Pl9c012567@freefall.freebsd.org> To: kwhite@site.uottawa.ca, delphij@FreeBSD.org, freebsd-fs@FreeBSD.org, avg@FreeBSD.org From: delphij@FreeBSD.org Subject: Re: kern/182570: [zfs] [patch] ZFS panic in receive X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 24 Dec 2013 07:25:48 -0000 Synopsis: [zfs] [patch] ZFS panic in receive State-Changed-From-To: open->closed State-Changed-By: delphij State-Changed-When: Tue Dec 24 07:25:24 UTC 2013 State-Changed-Why: Mark as closed -- fix was committed by avg@. 
Responsible-Changed-From-To: freebsd-fs->avg Responsible-Changed-By: delphij Responsible-Changed-When: Tue Dec 24 07:25:24 UTC 2013 Responsible-Changed-Why: Assign. http://www.freebsd.org/cgi/query-pr.cgi?pr=182570 From owner-freebsd-fs@FreeBSD.ORG Tue Dec 24 20:29:01 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id CA68B3C7; Tue, 24 Dec 2013 20:29:01 +0000 (UTC) Received: from krichy.tvnetwork.hu (unknown [IPv6:2a01:be00:0:2::10]) by mx1.freebsd.org (Postfix) with ESMTP id 39C291DEA; Tue, 24 Dec 2013 20:29:00 +0000 (UTC) Received: by krichy.tvnetwork.hu (Postfix, from userid 1000) id B01F2A894; Tue, 24 Dec 2013 21:28:39 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by krichy.tvnetwork.hu (Postfix) with ESMTP id AB25AA893; Tue, 24 Dec 2013 21:28:39 +0100 (CET) Date: Tue, 24 Dec 2013 21:28:39 +0100 (CET) From: krichy@tvnetwork.hu To: Pawel Jakub Dawidek Subject: Re: kern/184677 / ZFS snapshot handling deadlocks In-Reply-To: Message-ID: References: <20131220134405.GE1658@garage.freebsd.pl> User-Agent: Alpine 2.10 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Content-Filtered-By: Mailman/MimeDel 2.1.17 Cc: freebsd-fs@freebsd.org, avg@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 24 Dec 2013 20:29:01 -0000 Dear all, Unfortunately, this seems to be serious, as a plain user can deadlock zfs using the attached script, and can make a DoS against the mounted dataset's .zfs/snapshot directory. Regards, Kojedzinszky Richard Euronet Magyarorszag Informatikai Zrt. 
On Tue, 24 Dec 2013, krichy@tvnetwork.hu wrote: > Date: Tue, 24 Dec 2013 07:10:05 +0100 (CET) > From: krichy@tvnetwork.hu > To: Pawel Jakub Dawidek > Cc: freebsd-fs@freebsd.org, avg@freebsd.org > Subject: Re: kern/184677 / ZFS snapshot handling deadlocks > > Dear devs, > > A third one, again the snapshot dirs: > > # cd /tmp/test/.zfs/snapshot && yes .. | xargs stat > /dev/null & > # yes /tmp/test/.zfs/snapshot/snap | xargs umount > /dev/null > > Will deadlock in a few seconds. > > The first holds a lock on the .zfs/snapshot vnode and, when looking up '..', tries to > lock .zfs, while the other tries to do the same in the reverse order. > > In a vop_lookup, why must the passed-in vnode and the returned vnode be > locked? Why is it not enough to have them held? > > Thanks in advance, > Kojedzinszky Richard > Euronet Magyarorszag Informatikai Zrt. > > On Mon, 23 Dec 2013, krichy@tvnetwork.hu wrote: > >> Date: Mon, 23 Dec 2013 17:55:08 +0100 (CET) >> From: krichy@tvnetwork.hu >> To: Pawel Jakub Dawidek >> Cc: freebsd-fs@freebsd.org, avg@freebsd.org >> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks >> >> Dear Pawel, >> >> Thanks for your response. I hope someone will review my work. >> >> Secondly, I think I've found another deadlock scenario: >> >> When 'zfs send -R' finishes, in zfsdev_close() it has spa_namespace_lock >> held. Then it does an unmount of all snapshots, which in turn goes to >> zfsctl_snapshot_inactive(), in which sdp->sd_lock is being locked. >> >> Concurrently, a zfsctl_snapdir_lookup(), holding the same sdp->sd_lock, does >> a snapshot mount, which somewhere tries to acquire spa_namespace_lock. So >> it deadlocks. Currently I don't know how to work around this. >> >> Does spa_namespace_lock need to be held in zfsctl_snapshot_inactive()? >> >> What if we release it on function entry, and re-acquire it upon exit? >> >> Thanks in advance, >> >> >> Kojedzinszky Richard >> Euronet Magyarorszag Informatikai Zrt. 
>> >> On Fri, 20 Dec 2013, Pawel Jakub Dawidek wrote: >> >>> Date: Fri, 20 Dec 2013 14:44:05 +0100 >>> From: Pawel Jakub Dawidek >>> To: krichy@tvnetwork.hu >>> Cc: freebsd-fs@freebsd.org >>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks >>> >>> On Thu, Dec 19, 2013 at 04:46:11PM +0100, krichy@tvnetwork.hu wrote: >>>> Dear devs, >>>> >>>> I am attaching a clearer patch/fix for my snapshot handling issues >>>> (0002), but I would be happy if some ZFS expert would comment on it. I have been >>>> trying to solve it for at least two weeks now, and an ACK or a NACK would >>>> be nice from someone. A commit is also reverted, since it caused >>>> deadlocks as well. I've read its comments, which say it also eliminates deadlocks, but I >>>> did not find any reference on how to reproduce that deadlock. In my view >>>> reverting it makes my issues disappear, but I don't know what new cases >>>> it will raise. >>> >>> Richard, >>> >>> I won't be able to analyse it myself anytime soon, because of other >>> obligations, but I forwarded your e-mail to the zfs-devel@ mailing list >>> (it is closed, but gathers FreeBSD ZFS devs). Hopefully someone from >>> there will be able to help you. >>> >>>> I've rewritten traverse() to make it more like upstream, and added two extra >>>> VN_HOLD()s to snapdir_lookup() for when traverse returns the same vnode that >>>> was passed to it (I don't even know why those two extra holds are needed >>>> for the vnode when creating a snapshot vnode). And unfortunately, because >>>> FreeBSD calls vop_inactive callbacks with vnodes locked, that could also >>>> cause deadlocks, so zfsctl_snapshot_inactive() and >>>> zfsctl_snapshot_vptocnp() have been rewritten to work around that. >>>> >>>> After this, one may still get a deadlock, when a simple access would call >>>> into zfsctl_snapshot_lookup(). The documentation says that those vnodes >>>> should always be covered, but after some stress testing we sometimes hit >>>> that call, and that can again cause deadlocks. 
>>>> In our environment I've just uncommented that callback, which returns
>>>> ENODIR on some calls, but at least it does not deadlock.
>>>>
>>>> The attached script can be used to reproduce my cases (would someone
>>>> confirm that?), and with the patches applied they disappear (confirm?).
>>>>
>>>> Thanks,
>>>>
>>>>
>>>> Kojedzinszky Richard
>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>
>>>> On Tue, 17 Dec 2013, krichy@tvnetwork.hu wrote:
>>>>
>>>>> Date: Tue, 17 Dec 2013 14:50:16 +0100 (CET)
>>>>> From: krichy@tvnetwork.hu
>>>>> To: pjd@freebsd.org
>>>>> Cc: freebsd-fs@freebsd.org
>>>>> Subject: Re: kern/184677 (fwd)
>>>>>
>>>>> Dear devs,
>>>>>
>>>>> I will sum up my experience regarding the issue:
>>>>>
>>>>> The symptom is that a concurrent 'zfs send -R' and some activity on
>>>>> the snapshot dir (or in the snapshot) may cause a deadlock.
>>>>>
>>>>> After investigating the problem, I found that zfs send unmounts the
>>>>> snapshots, and that causes the deadlock, so later I tested only with
>>>>> a concurrent umount and the "activity". Later I found that listing
>>>>> the snapshots in .zfs/snapshot/ and unmounting them can cause the
>>>>> same deadlock, so I used those for the tests. But to my surprise,
>>>>> instead of a deadlock, a recursive lock panic arose.
>>>>>
>>>>> The vnode for the ".zfs/snapshot/" directory contains zfs's
>>>>> zfsctl_snapdir_t structure (sdp). This contains a tree of mounted
>>>>> snapshots, and each entry (sep) contains the vnode on top of which
>>>>> the snapshot is mounted (se_root). The strange thing is that the
>>>>> se_root member does not hold a reference to the vnode, just a simple
>>>>> pointer to it.
>>>>>
>>>>> Upon entry lookup (zfsctl_snapdir_lookup()) the "snapshot" vnode is
>>>>> locked, and the zfsctl_snapdir_t's tree is locked and searched for
>>>>> the mount, in case it already exists. If it finds no entry, it does
>>>>> the mount.
>>>>> If an entry was found, the se_root member contains the vnode on
>>>>> which the snapshot is mounted. Thus, a reference is taken for it,
>>>>> and the traverse() call will resolve to the real root vnode of the
>>>>> mounted snapshot, returning it locked. (Examining the traverse()
>>>>> code I found that it did not follow FreeBSD's lock order
>>>>> recommendation described in sys/kern/vfs_subr.c.)
>>>>>
>>>>> Going the other way, when an umount is issued, the se_root vnode
>>>>> loses its last reference (as only the mountpoint holds one for it),
>>>>> and it goes through the vinactive() path to
>>>>> zfsctl_snapshot_inactive(). In FreeBSD this is called with a locked
>>>>> vnode, so this is a deadlock race condition: while
>>>>> zfsctl_snapdir_lookup() holds the mutex for the sdp tree and
>>>>> traverse() tries to acquire se_root, zfsctl_snapshot_inactive()
>>>>> holds the lock on se_root while it tries to acquire the sdp lock.
>>>>>
>>>>> zfsctl_snapshot_inactive() has an if statement checking v_usecount,
>>>>> which is incremented in zfsctl_snapdir_lookup(), but in that context
>>>>> it is not covered by VI_LOCK. And it seems to me that FreeBSD's
>>>>> vinactive() path assumes that the vnode remains inactive (as opposed
>>>>> to illumos, at least as I read the code). So
>>>>> zfsctl_snapshot_inactive() must free resources while in a locked
>>>>> state. I was a bit confused, and probably that is why the previously
>>>>> posted patch is as it is.
>>>>>
>>>>> Maybe if I had some clues on the direction to take with this
>>>>> problem, I could have worked toward a nicer, shorter solution.
>>>>>
>>>>> Would someone please comment on my post.
>>>>>
>>>>> Regards,
>>>>>
>>>>>
>>>>>
>>>>> Kojedzinszky Richard
>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>
>>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote:
>>>>>
>>>>>> Date: Mon, 16 Dec 2013 16:52:16 +0100 (CET)
>>>>>> From: krichy@tvnetwork.hu
>>>>>> To: pjd@freebsd.org
>>>>>> Cc: freebsd-fs@freebsd.org
>>>>>> Subject: Re: kern/184677 (fwd)
>>>>>>
>>>>>> Dear PJD,
>>>>>>
>>>>>> I am a happy FreeBSD user; I am sure you've read my previous posts
>>>>>> regarding some issues in ZFS. Please give me some advice on where to
>>>>>> look for solutions, or on how I could help to resolve those issues.
>>>>>>
>>>>>> Regards,
>>>>>> Kojedzinszky Richard
>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>
>>>>>> ---------- Forwarded message ----------
>>>>>> Date: Mon, 16 Dec 2013 15:23:06 +0100 (CET)
>>>>>> From: krichy@tvnetwork.hu
>>>>>> To: freebsd-fs@freebsd.org
>>>>>> Subject: Re: kern/184677
>>>>>>
>>>>>>
>>>>>> It seems that pjd made a change which eliminated the
>>>>>> zfsdev_state_lock on Fri Aug 12 07:04:16 2011 +0000, and which might
>>>>>> have introduced a new deadlock situation. Any comments on this?
>>>>>>
>>>>>>
>>>>>> Kojedzinszky Richard
>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>
>>>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote:
>>>>>>
>>>>>>> Date: Mon, 16 Dec 2013 11:08:11 +0100 (CET)
>>>>>>> From: krichy@tvnetwork.hu
>>>>>>> To: freebsd-fs@freebsd.org
>>>>>>> Subject: kern/184677
>>>>>>>
>>>>>>> Dear devs,
>>>>>>>
>>>>>>> I've attached a patch which makes the recursive lockmgr panic
>>>>>>> disappear, and makes the reported bug disappear as well. I don't
>>>>>>> know whether I followed the guidelines well or not, but at least it
>>>>>>> works for me. Would some ZFS/FreeBSD fs expert please review it,
>>>>>>> and fix it where needed.
>>>>>>>
>>>>>>> But unfortunately, my original problem is still not solved; it is
>>>>>>> maybe the same as Ryan's:
>>>>>>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-December/018707.html
>>>>>>>
>>>>>>> Tracing the problem down, zfsctl_snapdir_lookup() tries to acquire
>>>>>>> spa_namespace_lock, while a finishing 'zfs send -R' does a
>>>>>>> zfsdev_close(), which also holds the same mutex. And this causes a
>>>>>>> deadlock scenario. I looked at illumos's code, and for some reason
>>>>>>> they use another mutex in zfsdev_close(), which therefore may not
>>>>>>> deadlock with zfsctl_snapdir_lookup(). But I am still investigating
>>>>>>> the problem.
>>>>>>>
>>>>>>> I would like to help make ZFS on FreeBSD more stable, with its
>>>>>>> whole functionality. I would be very thankful if some expert would
>>>>>>> give some advice on how to solve these bugs. PJD, Steven, Xin?
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>>
>>>>>>>
>>>>>>> Kojedzinszky Richard
>>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>> _______________________________________________
>>>>>> freebsd-fs@freebsd.org mailing list
>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>>>>>
>>>>>
>>>
>>>
>>>> From 39298da838d006ad225e41529d7b7f240fccfe73 Mon Sep 17 00:00:00 2001
>>>> From: Richard Kojedzinszky
>>>> Date: Mon, 16 Dec 2013 15:39:11 +0100
>>>> Subject: [PATCH 1/2] Revert "Eliminate the zfsdev_state_lock entirely and
>>>> replace it with the"
>>>>
>>>> This reverts commit 1d8972b3f353f986eb5b85bc108b1c0d946d3218.
>>>> >>>> Conflicts: >>>> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>> --- >>>> .../opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h | 1 + >>>> .../opensolaris/uts/common/fs/zfs/vdev_geom.c | 14 ++- >>>> .../opensolaris/uts/common/fs/zfs/zfs_ioctl.c | 16 +-- >>>> .../contrib/opensolaris/uts/common/fs/zfs/zvol.c | 119 >>>> +++++++++------------ >>>> 4 files changed, 70 insertions(+), 80 deletions(-) >>>> >>>> diff --git >>>> a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>>> index af2def2..8272c4d 100644 >>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>>> @@ -383,6 +383,7 @@ extern void *zfsdev_get_soft_state(minor_t minor, >>>> extern minor_t zfsdev_minor_alloc(void); >>>> >>>> extern void *zfsdev_state; >>>> +extern kmutex_t zfsdev_state_lock; >>>> >>>> #endif /* _KERNEL */ >>>> >>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>>> index 15685a5..5c3e9f3 100644 >>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>>> @@ -581,7 +581,7 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t >>>> *max_psize, >>>> struct g_provider *pp; >>>> struct g_consumer *cp; >>>> size_t bufsize; >>>> - int error; >>>> + int error, lock; >>>> >>>> /* >>>> * We must have a pathname, and it must be absolute. 
>>>> @@ -593,6 +593,12 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t >>>> *max_psize, >>>> >>>> vd->vdev_tsd = NULL; >>>> >>>> + if (mutex_owned(&spa_namespace_lock)) { >>>> + mutex_exit(&spa_namespace_lock); >>>> + lock = 1; >>>> + } else { >>>> + lock = 0; >>>> + } >>>> DROP_GIANT(); >>>> g_topology_lock(); >>>> error = 0; >>>> @@ -624,7 +630,11 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t >>>> *max_psize, >>>> !ISP2(cp->provider->sectorsize)) { >>>> ZFS_LOG(1, "Provider %s has unsupported sectorsize.", >>>> vd->vdev_path); >>>> + >>>> + g_topology_lock(); >>>> vdev_geom_detach(cp, 0); >>>> + g_topology_unlock(); >>>> + >>>> error = EINVAL; >>>> cp = NULL; >>>> } else if (cp->acw == 0 && (spa_mode(vd->vdev_spa) & FWRITE) != 0) { >>>> @@ -647,6 +657,8 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t >>>> *max_psize, >>>> } >>>> g_topology_unlock(); >>>> PICKUP_GIANT(); >>>> + if (lock) >>>> + mutex_enter(&spa_namespace_lock); >>>> if (cp == NULL) { >>>> vd->vdev_stat.vs_aux = VDEV_AUX_OPEN_FAILED; >>>> return (error); >>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>>> index e9fba26..91becde 100644 >>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>>> @@ -5635,7 +5635,7 @@ zfsdev_minor_alloc(void) >>>> static minor_t last_minor; >>>> minor_t m; >>>> >>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>> >>>> for (m = last_minor + 1; m != last_minor; m++) { >>>> if (m > ZFSDEV_MAX_MINOR) >>>> @@ -5655,7 +5655,7 @@ zfs_ctldev_init(struct cdev *devp) >>>> minor_t minor; >>>> zfs_soft_state_t *zs; >>>> >>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>> >>>> minor = zfsdev_minor_alloc(); >>>> if (minor == 0) >>>> @@ -5676,7 +5676,7 @@ zfs_ctldev_init(struct cdev 
*devp) >>>> static void >>>> zfs_ctldev_destroy(zfs_onexit_t *zo, minor_t minor) >>>> { >>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>> >>>> zfs_onexit_destroy(zo); >>>> ddi_soft_state_free(zfsdev_state, minor); >>>> @@ -5706,9 +5706,9 @@ zfsdev_open(struct cdev *devp, int flag, int mode, >>>> struct thread *td) >>>> >>>> /* This is the control device. Allocate a new minor if requested. */ >>>> if (flag & FEXCL) { >>>> - mutex_enter(&spa_namespace_lock); >>>> + mutex_enter(&zfsdev_state_lock); >>>> error = zfs_ctldev_init(devp); >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> } >>>> >>>> return (error); >>>> @@ -5723,14 +5723,14 @@ zfsdev_close(void *data) >>>> if (minor == 0) >>>> return; >>>> >>>> - mutex_enter(&spa_namespace_lock); >>>> + mutex_enter(&zfsdev_state_lock); >>>> zo = zfsdev_get_soft_state(minor, ZSST_CTLDEV); >>>> if (zo == NULL) { >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return; >>>> } >>>> zfs_ctldev_destroy(zo, minor); >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> } >>>> >>>> static int >>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>> index 72d4502..aec5219 100644 >>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>> @@ -104,11 +104,12 @@ static char *zvol_tag = "zvol_tag"; >>>> #define ZVOL_DUMPSIZE "dumpsize" >>>> >>>> /* >>>> - * The spa_namespace_lock protects the zfsdev_state structure from being >>>> - * modified while it's being used, e.g. an open that comes in before a >>>> - * create finishes. It also protects temporary opens of the dataset so >>>> that, >>>> + * This lock protects the zfsdev_state structure from being modified >>>> + * while it's being used, e.g. 
an open that comes in before a create >>>> + * finishes. It also protects temporary opens of the dataset so that, >>>> * e.g., an open doesn't get a spurious EBUSY. >>>> */ >>>> +kmutex_t zfsdev_state_lock; >>>> static uint32_t zvol_minors; >>>> >>>> typedef struct zvol_extent { >>>> @@ -249,7 +250,7 @@ zvol_minor_lookup(const char *name) >>>> struct g_geom *gp; >>>> zvol_state_t *zv = NULL; >>>> >>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>> >>>> g_topology_lock(); >>>> LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) { >>>> @@ -465,11 +466,11 @@ zvol_name2minor(const char *name, minor_t *minor) >>>> { >>>> zvol_state_t *zv; >>>> >>>> - mutex_enter(&spa_namespace_lock); >>>> + mutex_enter(&zfsdev_state_lock); >>>> zv = zvol_minor_lookup(name); >>>> if (minor && zv) >>>> *minor = zv->zv_minor; >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (zv ? 0 : -1); >>>> } >>>> #endif /* sun */ >>>> @@ -489,10 +490,10 @@ zvol_create_minor(const char *name) >>>> >>>> ZFS_LOG(1, "Creating ZVOL %s...", name); >>>> >>>> - mutex_enter(&spa_namespace_lock); >>>> + mutex_enter(&zfsdev_state_lock); >>>> >>>> if (zvol_minor_lookup(name) != NULL) { >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (SET_ERROR(EEXIST)); >>>> } >>>> >>>> @@ -500,20 +501,20 @@ zvol_create_minor(const char *name) >>>> error = dmu_objset_own(name, DMU_OST_ZVOL, B_TRUE, FTAG, &os); >>>> >>>> if (error) { >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (error); >>>> } >>>> >>>> #ifdef sun >>>> if ((minor = zfsdev_minor_alloc()) == 0) { >>>> dmu_objset_disown(os, FTAG); >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (SET_ERROR(ENXIO)); >>>> } >>>> >>>> if (ddi_soft_state_zalloc(zfsdev_state, minor) != DDI_SUCCESS) { >>>> dmu_objset_disown(os, FTAG); >>>> - mutex_exit(&spa_namespace_lock); >>>> + 
mutex_exit(&zfsdev_state_lock); >>>> return (SET_ERROR(EAGAIN)); >>>> } >>>> (void) ddi_prop_update_string(minor, zfs_dip, ZVOL_PROP_NAME, >>>> @@ -525,7 +526,7 @@ zvol_create_minor(const char *name) >>>> minor, DDI_PSEUDO, 0) == DDI_FAILURE) { >>>> ddi_soft_state_free(zfsdev_state, minor); >>>> dmu_objset_disown(os, FTAG); >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (SET_ERROR(EAGAIN)); >>>> } >>>> >>>> @@ -536,7 +537,7 @@ zvol_create_minor(const char *name) >>>> ddi_remove_minor_node(zfs_dip, chrbuf); >>>> ddi_soft_state_free(zfsdev_state, minor); >>>> dmu_objset_disown(os, FTAG); >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (SET_ERROR(EAGAIN)); >>>> } >>>> >>>> @@ -587,7 +588,7 @@ zvol_create_minor(const char *name) >>>> >>>> zvol_minors++; >>>> >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> >>>> zvol_geom_run(zv); >>>> >>>> @@ -609,7 +610,7 @@ zvol_remove_zv(zvol_state_t *zv) >>>> minor_t minor = zv->zv_minor; >>>> #endif >>>> >>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>> if (zv->zv_total_opens != 0) >>>> return (SET_ERROR(EBUSY)); >>>> >>>> @@ -635,15 +636,15 @@ zvol_remove_minor(const char *name) >>>> zvol_state_t *zv; >>>> int rc; >>>> >>>> - mutex_enter(&spa_namespace_lock); >>>> + mutex_enter(&zfsdev_state_lock); >>>> if ((zv = zvol_minor_lookup(name)) == NULL) { >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (SET_ERROR(ENXIO)); >>>> } >>>> g_topology_lock(); >>>> rc = zvol_remove_zv(zv); >>>> g_topology_unlock(); >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (rc); >>>> } >>>> >>>> @@ -755,7 +756,7 @@ zvol_update_volsize(objset_t *os, uint64_t volsize) >>>> dmu_tx_t *tx; >>>> int error; >>>> >>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); 
>>>> >>>> tx = dmu_tx_create(os); >>>> dmu_tx_hold_zap(tx, ZVOL_ZAP_OBJ, TRUE, NULL); >>>> @@ -786,7 +787,7 @@ zvol_remove_minors(const char *name) >>>> namelen = strlen(name); >>>> >>>> DROP_GIANT(); >>>> - mutex_enter(&spa_namespace_lock); >>>> + mutex_enter(&zfsdev_state_lock); >>>> g_topology_lock(); >>>> >>>> LIST_FOREACH_SAFE(gp, &zfs_zvol_class.geom, geom, gptmp) { >>>> @@ -804,7 +805,7 @@ zvol_remove_minors(const char *name) >>>> } >>>> >>>> g_topology_unlock(); >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> PICKUP_GIANT(); >>>> } >>>> >>>> @@ -818,10 +819,10 @@ zvol_set_volsize(const char *name, major_t maj, >>>> uint64_t volsize) >>>> uint64_t old_volsize = 0ULL; >>>> uint64_t readonly; >>>> >>>> - mutex_enter(&spa_namespace_lock); >>>> + mutex_enter(&zfsdev_state_lock); >>>> zv = zvol_minor_lookup(name); >>>> if ((error = dmu_objset_hold(name, FTAG, &os)) != 0) { >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (error); >>>> } >>>> >>>> @@ -888,7 +889,7 @@ zvol_set_volsize(const char *name, major_t maj, >>>> uint64_t volsize) >>>> out: >>>> dmu_objset_rele(os, FTAG); >>>> >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> >>>> return (error); >>>> } >>>> @@ -899,36 +900,19 @@ zvol_open(struct g_provider *pp, int flag, int >>>> count) >>>> { >>>> zvol_state_t *zv; >>>> int err = 0; >>>> - boolean_t locked = B_FALSE; >>>> >>>> - /* >>>> - * Protect against recursively entering spa_namespace_lock >>>> - * when spa_open() is used for a pool on a (local) ZVOL(s). >>>> - * This is needed since we replaced upstream zfsdev_state_lock >>>> - * with spa_namespace_lock in the ZVOL code. >>>> - * We are using the same trick as spa_open(). >>>> - * Note that calls in zvol_first_open which need to resolve >>>> - * pool name to a spa object will enter spa_open() >>>> - * recursively, but that function already has all the >>>> - * necessary protection. 
>>>> - */ >>>> - if (!MUTEX_HELD(&spa_namespace_lock)) { >>>> - mutex_enter(&spa_namespace_lock); >>>> - locked = B_TRUE; >>>> - } >>>> + mutex_enter(&zfsdev_state_lock); >>>> >>>> zv = pp->private; >>>> if (zv == NULL) { >>>> - if (locked) >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (SET_ERROR(ENXIO)); >>>> } >>>> >>>> if (zv->zv_total_opens == 0) >>>> err = zvol_first_open(zv); >>>> if (err) { >>>> - if (locked) >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (err); >>>> } >>>> if ((flag & FWRITE) && (zv->zv_flags & ZVOL_RDONLY)) { >>>> @@ -950,15 +934,13 @@ zvol_open(struct g_provider *pp, int flag, int >>>> count) >>>> #endif >>>> >>>> zv->zv_total_opens += count; >>>> - if (locked) >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> >>>> return (err); >>>> out: >>>> if (zv->zv_total_opens == 0) >>>> zvol_last_close(zv); >>>> - if (locked) >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (err); >>>> } >>>> >>>> @@ -968,18 +950,12 @@ zvol_close(struct g_provider *pp, int flag, int >>>> count) >>>> { >>>> zvol_state_t *zv; >>>> int error = 0; >>>> - boolean_t locked = B_FALSE; >>>> >>>> - /* See comment in zvol_open(). 
*/ >>>> - if (!MUTEX_HELD(&spa_namespace_lock)) { >>>> - mutex_enter(&spa_namespace_lock); >>>> - locked = B_TRUE; >>>> - } >>>> + mutex_enter(&zfsdev_state_lock); >>>> >>>> zv = pp->private; >>>> if (zv == NULL) { >>>> - if (locked) >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (SET_ERROR(ENXIO)); >>>> } >>>> >>>> @@ -1002,8 +978,7 @@ zvol_close(struct g_provider *pp, int flag, int >>>> count) >>>> if (zv->zv_total_opens == 0) >>>> zvol_last_close(zv); >>>> >>>> - if (locked) >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (error); >>>> } >>>> >>>> @@ -1658,12 +1633,12 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>>> flag, cred_t *cr, int *rvalp) >>>> int error = 0; >>>> rl_t *rl; >>>> >>>> - mutex_enter(&spa_namespace_lock); >>>> + mutex_enter(&zfsdev_state_lock); >>>> >>>> zv = zfsdev_get_soft_state(getminor(dev), ZSST_ZVOL); >>>> >>>> if (zv == NULL) { >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (SET_ERROR(ENXIO)); >>>> } >>>> ASSERT(zv->zv_total_opens > 0); >>>> @@ -1677,7 +1652,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>>> flag, cred_t *cr, int *rvalp) >>>> dki.dki_ctype = DKC_UNKNOWN; >>>> dki.dki_unit = getminor(dev); >>>> dki.dki_maxtransfer = 1 << (SPA_MAXBLOCKSHIFT - >>>> zv->zv_min_bs); >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> if (ddi_copyout(&dki, (void *)arg, sizeof (dki), flag)) >>>> error = SET_ERROR(EFAULT); >>>> return (error); >>>> @@ -1687,7 +1662,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>>> flag, cred_t *cr, int *rvalp) >>>> dkm.dki_lbsize = 1U << zv->zv_min_bs; >>>> dkm.dki_capacity = zv->zv_volsize >> zv->zv_min_bs; >>>> dkm.dki_media_type = DK_UNKNOWN; >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> if (ddi_copyout(&dkm, (void *)arg, sizeof (dkm), flag)) >>>> error = SET_ERROR(EFAULT); >>>> return 
(error); >>>> @@ -1697,14 +1672,14 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>>> flag, cred_t *cr, int *rvalp) >>>> uint64_t vs = zv->zv_volsize; >>>> uint8_t bs = zv->zv_min_bs; >>>> >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> error = zvol_getefi((void *)arg, flag, vs, bs); >>>> return (error); >>>> } >>>> >>>> case DKIOCFLUSHWRITECACHE: >>>> dkc = (struct dk_callback *)arg; >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> zil_commit(zv->zv_zilog, ZVOL_OBJ); >>>> if ((flag & FKIOCTL) && dkc != NULL && dkc->dkc_callback) { >>>> (*dkc->dkc_callback)(dkc->dkc_cookie, error); >>>> @@ -1730,10 +1705,10 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>>> flag, cred_t *cr, int *rvalp) >>>> } >>>> if (wce) { >>>> zv->zv_flags |= ZVOL_WCE; >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> } else { >>>> zv->zv_flags &= ~ZVOL_WCE; >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> zil_commit(zv->zv_zilog, ZVOL_OBJ); >>>> } >>>> return (0); >>>> @@ -1828,7 +1803,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>>> flag, cred_t *cr, int *rvalp) >>>> break; >>>> >>>> } >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> return (error); >>>> } >>>> #endif /* sun */ >>>> @@ -1844,12 +1819,14 @@ zvol_init(void) >>>> { >>>> VERIFY(ddi_soft_state_init(&zfsdev_state, sizeof (zfs_soft_state_t), >>>> 1) == 0); >>>> + mutex_init(&zfsdev_state_lock, NULL, MUTEX_DEFAULT, NULL); >>>> ZFS_LOG(1, "ZVOL Initialized."); >>>> } >>>> >>>> void >>>> zvol_fini(void) >>>> { >>>> + mutex_destroy(&zfsdev_state_lock); >>>> ddi_soft_state_fini(&zfsdev_state); >>>> ZFS_LOG(1, "ZVOL Deinitialized."); >>>> } >>>> @@ -1889,7 +1866,7 @@ zvol_dump_init(zvol_state_t *zv, boolean_t resize) >>>> uint64_t version = spa_version(spa); >>>> enum zio_checksum checksum; >>>> >>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); 
>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>> ASSERT(vd->vdev_ops == &vdev_root_ops); >>>> >>>> error = dmu_free_long_range(zv->zv_objset, ZVOL_OBJ, 0, >>>> @@ -2437,7 +2414,7 @@ zvol_rename_minor(struct g_geom *gp, const char >>>> *newname) >>>> struct g_provider *pp; >>>> zvol_state_t *zv; >>>> >>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>> g_topology_assert(); >>>> >>>> pp = LIST_FIRST(&gp->provider); >>>> @@ -2471,7 +2448,7 @@ zvol_rename_minors(const char *oldname, const char >>>> *newname) >>>> newnamelen = strlen(newname); >>>> >>>> DROP_GIANT(); >>>> - mutex_enter(&spa_namespace_lock); >>>> + mutex_enter(&zfsdev_state_lock); >>>> g_topology_lock(); >>>> >>>> LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) { >>>> @@ -2494,6 +2471,6 @@ zvol_rename_minors(const char *oldname, const char >>>> *newname) >>>> } >>>> >>>> g_topology_unlock(); >>>> - mutex_exit(&spa_namespace_lock); >>>> + mutex_exit(&zfsdev_state_lock); >>>> PICKUP_GIANT(); >>>> } >>>> -- >>>> 1.8.4.2 >>>> >>> >>>> From 57d5a383b585c32c77af54e8cdacaddf8ce7584f Mon Sep 17 00:00:00 2001 >>>> From: Richard Kojedzinszky >>>> Date: Wed, 18 Dec 2013 22:11:21 +0100 >>>> Subject: [PATCH 2/2] ZFS snapshot handling fix >>>> >>>> --- >>>> .../compat/opensolaris/kern/opensolaris_lookup.c | 13 +++--- >>>> .../opensolaris/uts/common/fs/zfs/zfs_ctldir.c | 53 >>>> +++++++++++++++------- >>>> 2 files changed, 42 insertions(+), 24 deletions(-) >>>> >>>> diff --git a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >>>> b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >>>> index 94383d6..4cac053 100644 >>>> --- a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >>>> +++ b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >>>> @@ -81,6 +81,8 @@ traverse(vnode_t **cvpp, int lktype) >>>> * progress on this vnode. >>>> */ >>>> >>>> + vn_lock(cvp, lktype); >>>> + >>>> for (;;) { >>>> /* >>>> * Reached the end of the mount chain? 
>>>> @@ -89,13 +91,7 @@ traverse(vnode_t **cvpp, int lktype) >>>> if (vfsp == NULL) >>>> break; >>>> error = vfs_busy(vfsp, 0); >>>> - /* >>>> - * tvp is NULL for *cvpp vnode, which we can't unlock. >>>> - */ >>>> - if (tvp != NULL) >>>> - vput(cvp); >>>> - else >>>> - vrele(cvp); >>>> + VOP_UNLOCK(cvp, 0); >>>> if (error) >>>> return (error); >>>> >>>> @@ -107,6 +103,9 @@ traverse(vnode_t **cvpp, int lktype) >>>> vfs_unbusy(vfsp); >>>> if (error != 0) >>>> return (error); >>>> + >>>> + VN_RELE(cvp); >>>> + >>>> cvp = tvp; >>>> } >>>> >>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >>>> index 28ab1fa..d3464b4 100644 >>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >>>> @@ -112,6 +112,25 @@ snapentry_compare(const void *a, const void *b) >>>> return (0); >>>> } >>>> >>>> +/* Return the zfsctl_snapdir_t object from current vnode, following >>>> + * the lock orders in zfsctl_snapdir_lookup() to avoid deadlocks. >>>> + * On return the passed in vp is unlocked */ >>>> +static zfsctl_snapdir_t * >>>> +zfsctl_snapshot_get_snapdir(vnode_t *vp, vnode_t **dvpp) >>>> +{ >>>> + gfs_dir_t *dp = vp->v_data; >>>> + *dvpp = dp->gfsd_file.gfs_parent; >>>> + zfsctl_snapdir_t *sdp; >>>> + >>>> + VN_HOLD(*dvpp); >>>> + VOP_UNLOCK(vp, 0); >>>> + vn_lock(*dvpp, LK_SHARED | LK_RETRY | LK_CANRECURSE); >>>> + sdp = (*dvpp)->v_data; >>>> + VOP_UNLOCK(*dvpp, 0); >>>> + >>>> + return (sdp); >>>> +} >>>> + >>>> #ifdef sun >>>> vnodeops_t *zfsctl_ops_root; >>>> vnodeops_t *zfsctl_ops_snapdir; >>>> @@ -1013,6 +1032,8 @@ zfsctl_snapdir_lookup(ap) >>>> * The snapshot was unmounted behind our backs, >>>> * try to remount it. 
>>>> */ >>>> + VOP_UNLOCK(*vpp, 0); >>>> + VN_HOLD(*vpp); >>>> VERIFY(zfsctl_snapshot_zname(dvp, nm, MAXNAMELEN, >>>> snapname) == 0); >>>> goto domount; >>>> } else { >>>> @@ -1064,7 +1085,6 @@ zfsctl_snapdir_lookup(ap) >>>> sep->se_name = kmem_alloc(strlen(nm) + 1, KM_SLEEP); >>>> (void) strcpy(sep->se_name, nm); >>>> *vpp = sep->se_root = zfsctl_snapshot_mknode(dvp, >>>> dmu_objset_id(snap)); >>>> - VN_HOLD(*vpp); >>>> avl_insert(&sdp->sd_snaps, sep, where); >>>> >>>> dmu_objset_rele(snap, FTAG); >>>> @@ -1075,6 +1095,7 @@ domount: >>>> (void) snprintf(mountpoint, mountpoint_len, >>>> "%s/" ZFS_CTLDIR_NAME "/snapshot/%s", >>>> dvp->v_vfsp->mnt_stat.f_mntonname, nm); >>>> + VN_HOLD(*vpp); >>>> err = mount_snapshot(curthread, vpp, "zfs", mountpoint, snapname, 0); >>>> kmem_free(mountpoint, mountpoint_len); >>>> if (err == 0) { >>>> @@ -1464,16 +1485,18 @@ zfsctl_snapshot_inactive(ap) >>>> int locked; >>>> vnode_t *dvp; >>>> >>>> - if (vp->v_count > 0) >>>> - goto end; >>>> - >>>> - VERIFY(gfs_dir_lookup(vp, "..", &dvp, cr, 0, NULL, NULL) == 0); >>>> - sdp = dvp->v_data; >>>> - VOP_UNLOCK(dvp, 0); >>>> + sdp = zfsctl_snapshot_get_snapdir(vp, &dvp); >>>> >>>> if (!(locked = MUTEX_HELD(&sdp->sd_lock))) >>>> mutex_enter(&sdp->sd_lock); >>>> >>>> + vn_lock(vp, LK_EXCLUSIVE | LK_RETRY); >>>> + >>>> + if (vp->v_count > 0) { >>>> + mutex_exit(&sdp->sd_lock); >>>> + return (0); >>>> + } >>>> + >>>> ASSERT(!vn_ismntpt(vp)); >>>> >>>> sep = avl_first(&sdp->sd_snaps); >>>> @@ -1494,7 +1517,6 @@ zfsctl_snapshot_inactive(ap) >>>> mutex_exit(&sdp->sd_lock); >>>> VN_RELE(dvp); >>>> >>>> -end: >>>> /* >>>> * Dispose of the vnode for the snapshot mount point. 
>>>> * This is safe to do because once this entry has been removed >>>> @@ -1595,20 +1617,17 @@ zfsctl_snapshot_lookup(ap) >>>> static int >>>> zfsctl_snapshot_vptocnp(struct vop_vptocnp_args *ap) >>>> { >>>> - zfsvfs_t *zfsvfs = ap->a_vp->v_vfsp->vfs_data; >>>> - vnode_t *dvp, *vp; >>>> + vnode_t *dvp, *vp = ap->a_vp; >>>> zfsctl_snapdir_t *sdp; >>>> zfs_snapentry_t *sep; >>>> - int error; >>>> + int error = 0; >>>> >>>> - ASSERT(zfsvfs->z_ctldir != NULL); >>>> - error = zfsctl_root_lookup(zfsvfs->z_ctldir, "snapshot", &dvp, >>>> - NULL, 0, NULL, kcred, NULL, NULL, NULL); >>>> - if (error != 0) >>>> - return (error); >>>> - sdp = dvp->v_data; >>>> + sdp = zfsctl_snapshot_get_snapdir(vp, &dvp); >>>> >>>> mutex_enter(&sdp->sd_lock); >>>> + >>>> + vn_lock(vp, LK_SHARED | LK_RETRY); >>>> + >>>> sep = avl_first(&sdp->sd_snaps); >>>> while (sep != NULL) { >>>> vp = sep->se_root; >>>> -- >>>> 1.8.4.2 >>>> >>> >>> -- >>> Pawel Jakub Dawidek http://www.wheelsystems.com >>> FreeBSD committer http://www.FreeBSD.org >>> Am I Evil? Yes, I Am! 
http://mobter.com
>>>
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>
>
From owner-freebsd-fs@FreeBSD.ORG Tue Dec 24 20:35:37 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B76355FD; Tue, 24 Dec 2013 20:35:37 +0000 (UTC) Received: from krichy.tvnetwork.hu (unknown [IPv6:2a01:be00:0:2::10]) by mx1.freebsd.org (Postfix) with ESMTP id 2838D1E6A; Tue, 24 Dec 2013 20:35:37 +0000 (UTC) Received: by krichy.tvnetwork.hu (Postfix, from userid 1000) id 9B014A899; Tue, 24 Dec 2013 21:35:16 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by krichy.tvnetwork.hu (Postfix) with ESMTP id 97E3BA898; Tue, 24 Dec 2013 21:35:16 +0100 (CET) Date: Tue, 24 Dec 2013 21:35:16 +0100 (CET) From: krichy@tvnetwork.hu To: Pawel Jakub Dawidek Subject: Re: kern/184677 / ZFS snapshot handling deadlocks In-Reply-To: Message-ID: References: <20131220134405.GE1658@garage.freebsd.pl> User-Agent: Alpine 2.10 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org, avg@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 24 Dec 2013 20:35:37 -0000

Dear all,

Now a user-caused deadlock has simply blocked a buildkernel run, so I must
correct myself: this is a DoS not only against .zfs/snapshot directories,
but potentially against the whole system.

Regards,

P.S.: Merry Christmas!

Kojedzinszky Richard
Euronet Magyarorszag Informatikai Zrt.
On Tue, 24 Dec 2013, krichy@tvnetwork.hu wrote: > Date: Tue, 24 Dec 2013 21:28:39 +0100 (CET) > From: krichy@tvnetwork.hu > To: Pawel Jakub Dawidek > Cc: freebsd-fs@freebsd.org, avg@freebsd.org > Subject: Re: kern/184677 / ZFS snapshot handling deadlocks > > Dear all, > > Unfortunately, this seems to be serious, as a plain user can deadlock zfs > using the attached script, and can make a DoS against the mounted dataset's > .zfs/snapshot directory. > > Regards, > > Kojedzinszky Richard > Euronet Magyarorszag Informatikai Zrt. > > On Tue, 24 Dec 2013, krichy@tvnetwork.hu wrote: > >> Date: Tue, 24 Dec 2013 07:10:05 +0100 (CET) >> From: krichy@tvnetwork.hu >> To: Pawel Jakub Dawidek >> Cc: freebsd-fs@freebsd.org, avg@freebsd.org >> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks >> >> Dear devs, >> >> A third one, again the snapshot dirs: >> >> # cd /tmp/test/.zfs/snapshot && yes .. | xargs stat > /dev/null & >> # yes /tmp/test/.zfs/snapshot/snap | xargs umount > /dev/null >> >> Will deadlock in a few seconds. >> >> The first holds a lock on the .zfs/snapshot vnode; when looking up .. it tries >> to lock .zfs, while the other tries to do the same in the reverse order. >> >> In a vop_lookup, why must the passed-in vnode and the returned vnode be >> locked? Why is it not enough to have them held? >> >> Thanks in advance, >> Kojedzinszky Richard >> Euronet Magyarorszag Informatikai Zrt. >> >> On Mon, 23 Dec 2013, krichy@tvnetwork.hu wrote: >> >>> Date: Mon, 23 Dec 2013 17:55:08 +0100 (CET) >>> From: krichy@tvnetwork.hu >>> To: Pawel Jakub Dawidek >>> Cc: freebsd-fs@freebsd.org, avg@freebsd.org >>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks >>> >>> Dear Pawel, >>> >>> Thanks for your response. I hope someone will review my work. >>> >>> Secondly, I think I've found another deadlock scenario: >>> >>> When 'zfs send -R' finishes, in zfsdev_close() it has spa_namespace_lock >>> held.
Then it does an unmount of all snapshots, which in turn goes to >>> zfsctl_snapshot_inactive(), in which sdp->sd_lock is being locked. >>> >>> Concurrently, a zfsctl_snapdir_lookup(), holding the same sdp->sd_lock, >>> does a snapshot mount, which somewhere tries to acquire >>> spa_namespace_lock. So it deadlocks. Currently I don't know how to >>> work around this. >>> >>> Does spa_namespace_lock need to be held in zfsctl_snapshot_inactive()? >>> >>> What if we release it on function entry, and re-acquire it upon exit? >>> >>> Thanks in advance, >>> >>> >>> Kojedzinszky Richard >>> Euronet Magyarorszag Informatikai Zrt. >>> >>> On Fri, 20 Dec 2013, Pawel Jakub Dawidek wrote: >>> >>>> Date: Fri, 20 Dec 2013 14:44:05 +0100 >>>> From: Pawel Jakub Dawidek >>>> To: krichy@tvnetwork.hu >>>> Cc: freebsd-fs@freebsd.org >>>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks >>>> >>>> On Thu, Dec 19, 2013 at 04:46:11PM +0100, krichy@tvnetwork.hu wrote: >>>>> Dear devs, >>>>> >>>>> I am attaching a cleaner patch/fix for my snapshot handling issues >>>>> (0002), but I would be happy if some ZFS expert would comment on it. I >>>>> have been trying to solve it for at least two weeks now, and an ACK or a >>>>> NACK from someone would be nice. Also, a commit is reverted since it >>>>> caused deadlocks as well. I've read its comments, which say it also >>>>> eliminates deadlocks, but I did not find any reference on how to produce >>>>> that deadlock. In my view reverting it makes my issues disappear, but I >>>>> don't know what new cases it will raise. >>>> >>>> Richard, >>>> >>>> I won't be able to analyse it myself anytime soon, because of other >>>> obligations, but I forwarded your e-mail to the zfs-devel@ mailing list >>>> (it is closed, but gathers FreeBSD ZFS devs). Hopefully someone from >>>> there will be able to help you.
>>>> >>>>> I've rewritten traverse() to make it more like upstream, and added two >>>>> extra VN_HOLD()s to snapdir_lookup() for when traverse returns the same >>>>> vnode that was passed to it (I don't even know why those two extra holds >>>>> are needed when creating a snapshot vnode.) And unfortunately, because >>>>> FreeBSD calls vop_inactive callbacks with vnodes locked, that could also >>>>> cause deadlocks, so zfsctl_snapshot_inactive() and >>>>> zfsctl_snapshot_vptocnp() have been rewritten to work around that. >>>>> >>>>> After this, one may still get a deadlock when a simple access calls >>>>> into zfsctl_snapshot_lookup(). The documentation says that those vnodes >>>>> should always be covered, but after some stress tests we sometimes hit >>>>> that call, and that can again cause deadlocks. In our environment I've >>>>> just uncommented that callback, which returns ENODIR on some calls, but >>>>> at least it does not deadlock. >>>>> >>>>> The attached script can be used to reproduce my cases (would someone >>>>> confirm that?), and after the patches are applied, they disappear >>>>> (confirm?). >>>>> >>>>> Thanks, >>>>> >>>>> >>>>> Kojedzinszky Richard >>>>> Euronet Magyarorszag Informatikai Zrt. >>>>> >>>>> On Tue, 17 Dec 2013, krichy@tvnetwork.hu wrote: >>>>> >>>>>> Date: Tue, 17 Dec 2013 14:50:16 +0100 (CET) >>>>>> From: krichy@tvnetwork.hu >>>>>> To: pjd@freebsd.org >>>>>> Cc: freebsd-fs@freebsd.org >>>>>> Subject: Re: kern/184677 (fwd) >>>>>> >>>>>> Dear devs, >>>>>> >>>>>> I will sum up my experience regarding the issue: >>>>>> >>>>>> The symptom is that a concurrent 'zfs send -R' and some activity on the >>>>>> snapshot dir (or in the snapshot) may cause a deadlock. >>>>>> >>>>>> After investigating the problem, I found that zfs send unmounts the >>>>>> snapshots, and that causes the deadlock, so later I tested only with >>>>>> concurrent umount and the "activity".
Later I found that listing the snapshots in >>>>>> .zfs/snapshot/ and unmounting them can cause the same deadlock, so I >>>>>> used them for the tests. But to my surprise, instead of a deadlock, a >>>>>> recursive lock panic arose. >>>>>> >>>>>> The vnode for the ".zfs/snapshot/" directory contains zfs's >>>>>> zfsctl_snapdir_t structure (sdp). This contains a tree of mounted >>>>>> snapshots, and each entry (sep) contains the vnode on top of which the >>>>>> snapshot is mounted (se_root). The strange thing is that the se_root >>>>>> member does not hold a reference to the vnode, just a simple pointer to >>>>>> it. >>>>>> >>>>>> Upon entry lookup (zfsctl_snapdir_lookup()) the "snapshot" vnode is >>>>>> locked, and the zfsctl_snapdir_t's tree is locked and searched to see >>>>>> whether the mount already exists. If it finds no entry, it does the >>>>>> mount. If an entry was found, the se_root member contains the vnode on >>>>>> which the snapshot is mounted. Thus, a reference is taken for it, and >>>>>> the traverse() call will resolve to the real root vnode of the mounted >>>>>> snapshot, returning it locked. (Examining the traverse() code I found >>>>>> that it did not follow FreeBSD's lock order recommendation described in >>>>>> sys/kern/vfs_subr.c.) >>>>>> >>>>>> On the other side, when an umount is issued, the se_root vnode loses >>>>>> its last reference (as only the mountpoint holds one for it), and it >>>>>> goes through the vinactive() path to zfsctl_snapshot_inactive(). In >>>>>> FreeBSD this is called with a locked vnode, so this is a deadlock race >>>>>> condition: while zfsctl_snapdir_lookup() holds the mutex for the sdp >>>>>> tree and traverse() tries to acquire se_root, >>>>>> zfsctl_snapshot_inactive() holds the lock on se_root while it tries to >>>>>> acquire the sdp lock.
>>>>>> >>>>>> zfsctl_snapshot_inactive() has an if statement checking v_usecount, >>>>>> which is incremented in zfsctl_snapdir_lookup(), but in that context it >>>>>> is not covered by VI_LOCK. And it seems to me that FreeBSD's >>>>>> vinactive() path assumes that the vnode remains inactive (as opposed to >>>>>> illumos, at least as I read the code). So zfsctl_snapshot_inactive() >>>>>> must free resources while in a locked state. I was a bit confused, and >>>>>> probably that is why the previously posted patch is as it is. >>>>>> >>>>>> Maybe if I had some clues on the direction of this problem, I could >>>>>> have worked toward a nicer, shorter solution. >>>>>> >>>>>> Would someone please comment on my post? >>>>>> >>>>>> Regards, >>>>>> >>>>>> >>>>>> >>>>>> Kojedzinszky Richard >>>>>> Euronet Magyarorszag Informatikai Zrt. >>>>>> >>>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote: >>>>>> >>>>>>> Date: Mon, 16 Dec 2013 16:52:16 +0100 (CET) >>>>>>> From: krichy@tvnetwork.hu >>>>>>> To: pjd@freebsd.org >>>>>>> Cc: freebsd-fs@freebsd.org >>>>>>> Subject: Re: kern/184677 (fwd) >>>>>>> >>>>>>> Dear PJD, >>>>>>> >>>>>>> I am a happy FreeBSD user, and I am sure you've read my previous posts >>>>>>> regarding some issues in ZFS. Please give me some advice on where to >>>>>>> look for solutions, or on how I could help to resolve those issues. >>>>>>> >>>>>>> Regards, >>>>>>> Kojedzinszky Richard >>>>>>> Euronet Magyarorszag Informatikai Zrt. >>>>>>> >>>>>>> ---------- Forwarded message ---------- >>>>>>> Date: Mon, 16 Dec 2013 15:23:06 +0100 (CET) >>>>>>> From: krichy@tvnetwork.hu >>>>>>> To: freebsd-fs@freebsd.org >>>>>>> Subject: Re: kern/184677 >>>>>>> >>>>>>> >>>>>>> It seems that pjd made a change which eliminated the >>>>>>> zfsdev_state_lock on Fri Aug 12 07:04:16 2011 +0000, which might have >>>>>>> introduced a new deadlock situation. Any comments on this?
>>>>>>> >>>>>>> >>>>>>> Kojedzinszky Richard >>>>>>> Euronet Magyarorszag Informatikai Zrt. >>>>>>> >>>>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote: >>>>>>> >>>>>>>> Date: Mon, 16 Dec 2013 11:08:11 +0100 (CET) >>>>>>>> From: krichy@tvnetwork.hu >>>>>>>> To: freebsd-fs@freebsd.org >>>>>>>> Subject: kern/184677 >>>>>>>> >>>>>>>> Dear devs, >>>>>>>> >>>>>>>> I've attached a patch which makes the recursive lockmgr panic >>>>>>>> disappear, and makes the reported bug disappear. I don't know whether >>>>>>>> I followed the guidelines well or not, but at least it works for me. >>>>>>>> Please, some ZFS/FreeBSD fs expert, review it and fix it where >>>>>>>> needed. >>>>>>>> >>>>>>>> But unfortunately, my original problem is still not solved; maybe it >>>>>>>> is the same as Ryan's: >>>>>>>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-December/018707.html >>>>>>>> >>>>>>>> Tracing the problem down, I found that zfsctl_snapdir_lookup() tries >>>>>>>> to acquire spa_namespace_lock, while a finishing 'zfs send -R' does a >>>>>>>> zfsdev_close(), which also holds the same mutex. And this causes a >>>>>>>> deadlock scenario. I looked at illumos's code, and for some reason >>>>>>>> they use another mutex in zfsdev_close(), which therefore cannot >>>>>>>> deadlock with zfsctl_snapdir_lookup(). But I am still investigating >>>>>>>> the problem. >>>>>>>> >>>>>>>> I would like to help make ZFS more stable on FreeBSD, with its whole >>>>>>>> functionality. I would be very thankful if some expert would give >>>>>>>> some advice on how to solve these bugs. PJD, Steven, Xin? >>>>>>>> >>>>>>>> Thanks in advance, >>>>>>>> >>>>>>>> >>>>>>>> Kojedzinszky Richard >>>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>> _______________________________________________ >>>>>>> freebsd-fs@freebsd.org mailing list >>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>>>>>> >>>>>> >>>> >>>> >>>>> From 39298da838d006ad225e41529d7b7f240fccfe73 Mon Sep 17 00:00:00 2001 >>>>> From: Richard Kojedzinszky >>>>> Date: Mon, 16 Dec 2013 15:39:11 +0100 >>>>> Subject: [PATCH 1/2] Revert "Eliminate the zfsdev_state_lock entirely >>>>> and >>>>> replace it with the" >>>>> >>>>> This reverts commit 1d8972b3f353f986eb5b85bc108b1c0d946d3218. >>>>> >>>>> Conflicts: >>>>> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>>> --- >>>>> .../opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h | 1 + >>>>> .../opensolaris/uts/common/fs/zfs/vdev_geom.c | 14 ++- >>>>> .../opensolaris/uts/common/fs/zfs/zfs_ioctl.c | 16 +-- >>>>> .../contrib/opensolaris/uts/common/fs/zfs/zvol.c | 119 >>>>> +++++++++------------ >>>>> 4 files changed, 70 insertions(+), 80 deletions(-) >>>>> >>>>> diff --git >>>>> a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>>>> index af2def2..8272c4d 100644 >>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>>>> @@ -383,6 +383,7 @@ extern void *zfsdev_get_soft_state(minor_t minor, >>>>> extern minor_t zfsdev_minor_alloc(void); >>>>> >>>>> extern void *zfsdev_state; >>>>> +extern kmutex_t zfsdev_state_lock; >>>>> >>>>> #endif /* _KERNEL */ >>>>> >>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>>>> index 15685a5..5c3e9f3 100644 >>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>>>> @@ -581,7 +581,7 @@ vdev_geom_open(vdev_t *vd, 
uint64_t *psize, uint64_t >>>>> *max_psize, >>>>> struct g_provider *pp; >>>>> struct g_consumer *cp; >>>>> size_t bufsize; >>>>> - int error; >>>>> + int error, lock; >>>>> >>>>> /* >>>>> * We must have a pathname, and it must be absolute. >>>>> @@ -593,6 +593,12 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, >>>>> uint64_t *max_psize, >>>>> >>>>> vd->vdev_tsd = NULL; >>>>> >>>>> + if (mutex_owned(&spa_namespace_lock)) { >>>>> + mutex_exit(&spa_namespace_lock); >>>>> + lock = 1; >>>>> + } else { >>>>> + lock = 0; >>>>> + } >>>>> DROP_GIANT(); >>>>> g_topology_lock(); >>>>> error = 0; >>>>> @@ -624,7 +630,11 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, >>>>> uint64_t *max_psize, >>>>> !ISP2(cp->provider->sectorsize)) { >>>>> ZFS_LOG(1, "Provider %s has unsupported sectorsize.", >>>>> vd->vdev_path); >>>>> + >>>>> + g_topology_lock(); >>>>> vdev_geom_detach(cp, 0); >>>>> + g_topology_unlock(); >>>>> + >>>>> error = EINVAL; >>>>> cp = NULL; >>>>> } else if (cp->acw == 0 && (spa_mode(vd->vdev_spa) & FWRITE) != 0) { >>>>> @@ -647,6 +657,8 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, uint64_t >>>>> *max_psize, >>>>> } >>>>> g_topology_unlock(); >>>>> PICKUP_GIANT(); >>>>> + if (lock) >>>>> + mutex_enter(&spa_namespace_lock); >>>>> if (cp == NULL) { >>>>> vd->vdev_stat.vs_aux = VDEV_AUX_OPEN_FAILED; >>>>> return (error); >>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>>>> index e9fba26..91becde 100644 >>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>>>> @@ -5635,7 +5635,7 @@ zfsdev_minor_alloc(void) >>>>> static minor_t last_minor; >>>>> minor_t m; >>>>> >>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>>> >>>>> for (m = last_minor + 1; m != last_minor; m++) { >>>>> if (m > ZFSDEV_MAX_MINOR) >>>>> @@ -5655,7 +5655,7 @@ 
zfs_ctldev_init(struct cdev *devp) >>>>> minor_t minor; >>>>> zfs_soft_state_t *zs; >>>>> >>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>>> >>>>> minor = zfsdev_minor_alloc(); >>>>> if (minor == 0) >>>>> @@ -5676,7 +5676,7 @@ zfs_ctldev_init(struct cdev *devp) >>>>> static void >>>>> zfs_ctldev_destroy(zfs_onexit_t *zo, minor_t minor) >>>>> { >>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>>> >>>>> zfs_onexit_destroy(zo); >>>>> ddi_soft_state_free(zfsdev_state, minor); >>>>> @@ -5706,9 +5706,9 @@ zfsdev_open(struct cdev *devp, int flag, int mode, >>>>> struct thread *td) >>>>> >>>>> /* This is the control device. Allocate a new minor if requested. */ >>>>> if (flag & FEXCL) { >>>>> - mutex_enter(&spa_namespace_lock); >>>>> + mutex_enter(&zfsdev_state_lock); >>>>> error = zfs_ctldev_init(devp); >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> } >>>>> >>>>> return (error); >>>>> @@ -5723,14 +5723,14 @@ zfsdev_close(void *data) >>>>> if (minor == 0) >>>>> return; >>>>> >>>>> - mutex_enter(&spa_namespace_lock); >>>>> + mutex_enter(&zfsdev_state_lock); >>>>> zo = zfsdev_get_soft_state(minor, ZSST_CTLDEV); >>>>> if (zo == NULL) { >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> return; >>>>> } >>>>> zfs_ctldev_destroy(zo, minor); >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> } >>>>> >>>>> static int >>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>>> index 72d4502..aec5219 100644 >>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>>> @@ -104,11 +104,12 @@ static char *zvol_tag = "zvol_tag"; >>>>> #define ZVOL_DUMPSIZE "dumpsize" >>>>> >>>>> /* >>>>> - * The spa_namespace_lock protects 
the zfsdev_state structure from >>>>> being >>>>> - * modified while it's being used, e.g. an open that comes in before a >>>>> - * create finishes. It also protects temporary opens of the dataset so >>>>> that, >>>>> + * This lock protects the zfsdev_state structure from being modified >>>>> + * while it's being used, e.g. an open that comes in before a create >>>>> + * finishes. It also protects temporary opens of the dataset so that, >>>>> * e.g., an open doesn't get a spurious EBUSY. >>>>> */ >>>>> +kmutex_t zfsdev_state_lock; >>>>> static uint32_t zvol_minors; >>>>> >>>>> typedef struct zvol_extent { >>>>> @@ -249,7 +250,7 @@ zvol_minor_lookup(const char *name) >>>>> struct g_geom *gp; >>>>> zvol_state_t *zv = NULL; >>>>> >>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>>> >>>>> g_topology_lock(); >>>>> LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) { >>>>> @@ -465,11 +466,11 @@ zvol_name2minor(const char *name, minor_t *minor) >>>>> { >>>>> zvol_state_t *zv; >>>>> >>>>> - mutex_enter(&spa_namespace_lock); >>>>> + mutex_enter(&zfsdev_state_lock); >>>>> zv = zvol_minor_lookup(name); >>>>> if (minor && zv) >>>>> *minor = zv->zv_minor; >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> return (zv ? 
0 : -1); >>>>> } >>>>> #endif /* sun */ >>>>> @@ -489,10 +490,10 @@ zvol_create_minor(const char *name) >>>>> >>>>> ZFS_LOG(1, "Creating ZVOL %s...", name); >>>>> >>>>> - mutex_enter(&spa_namespace_lock); >>>>> + mutex_enter(&zfsdev_state_lock); >>>>> >>>>> if (zvol_minor_lookup(name) != NULL) { >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> return (SET_ERROR(EEXIST)); >>>>> } >>>>> >>>>> @@ -500,20 +501,20 @@ zvol_create_minor(const char *name) >>>>> error = dmu_objset_own(name, DMU_OST_ZVOL, B_TRUE, FTAG, &os); >>>>> >>>>> if (error) { >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> return (error); >>>>> } >>>>> >>>>> #ifdef sun >>>>> if ((minor = zfsdev_minor_alloc()) == 0) { >>>>> dmu_objset_disown(os, FTAG); >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> return (SET_ERROR(ENXIO)); >>>>> } >>>>> >>>>> if (ddi_soft_state_zalloc(zfsdev_state, minor) != DDI_SUCCESS) { >>>>> dmu_objset_disown(os, FTAG); >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> return (SET_ERROR(EAGAIN)); >>>>> } >>>>> (void) ddi_prop_update_string(minor, zfs_dip, ZVOL_PROP_NAME, >>>>> @@ -525,7 +526,7 @@ zvol_create_minor(const char *name) >>>>> minor, DDI_PSEUDO, 0) == DDI_FAILURE) { >>>>> ddi_soft_state_free(zfsdev_state, minor); >>>>> dmu_objset_disown(os, FTAG); >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> return (SET_ERROR(EAGAIN)); >>>>> } >>>>> >>>>> @@ -536,7 +537,7 @@ zvol_create_minor(const char *name) >>>>> ddi_remove_minor_node(zfs_dip, chrbuf); >>>>> ddi_soft_state_free(zfsdev_state, minor); >>>>> dmu_objset_disown(os, FTAG); >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> return (SET_ERROR(EAGAIN)); >>>>> } >>>>> >>>>> @@ -587,7 +588,7 @@ zvol_create_minor(const char *name) >>>>> >>>>> zvol_minors++; >>>>> >>>>> - 
mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> >>>>> zvol_geom_run(zv); >>>>> >>>>> @@ -609,7 +610,7 @@ zvol_remove_zv(zvol_state_t *zv) >>>>> minor_t minor = zv->zv_minor; >>>>> #endif >>>>> >>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>>> if (zv->zv_total_opens != 0) >>>>> return (SET_ERROR(EBUSY)); >>>>> >>>>> @@ -635,15 +636,15 @@ zvol_remove_minor(const char *name) >>>>> zvol_state_t *zv; >>>>> int rc; >>>>> >>>>> - mutex_enter(&spa_namespace_lock); >>>>> + mutex_enter(&zfsdev_state_lock); >>>>> if ((zv = zvol_minor_lookup(name)) == NULL) { >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> return (SET_ERROR(ENXIO)); >>>>> } >>>>> g_topology_lock(); >>>>> rc = zvol_remove_zv(zv); >>>>> g_topology_unlock(); >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> return (rc); >>>>> } >>>>> >>>>> @@ -755,7 +756,7 @@ zvol_update_volsize(objset_t *os, uint64_t volsize) >>>>> dmu_tx_t *tx; >>>>> int error; >>>>> >>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>>> >>>>> tx = dmu_tx_create(os); >>>>> dmu_tx_hold_zap(tx, ZVOL_ZAP_OBJ, TRUE, NULL); >>>>> @@ -786,7 +787,7 @@ zvol_remove_minors(const char *name) >>>>> namelen = strlen(name); >>>>> >>>>> DROP_GIANT(); >>>>> - mutex_enter(&spa_namespace_lock); >>>>> + mutex_enter(&zfsdev_state_lock); >>>>> g_topology_lock(); >>>>> >>>>> LIST_FOREACH_SAFE(gp, &zfs_zvol_class.geom, geom, gptmp) { >>>>> @@ -804,7 +805,7 @@ zvol_remove_minors(const char *name) >>>>> } >>>>> >>>>> g_topology_unlock(); >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> PICKUP_GIANT(); >>>>> } >>>>> >>>>> @@ -818,10 +819,10 @@ zvol_set_volsize(const char *name, major_t maj, >>>>> uint64_t volsize) >>>>> uint64_t old_volsize = 0ULL; >>>>> uint64_t readonly; >>>>> >>>>> - mutex_enter(&spa_namespace_lock); >>>>> + 
mutex_enter(&zfsdev_state_lock); >>>>> zv = zvol_minor_lookup(name); >>>>> if ((error = dmu_objset_hold(name, FTAG, &os)) != 0) { >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> return (error); >>>>> } >>>>> >>>>> @@ -888,7 +889,7 @@ zvol_set_volsize(const char *name, major_t maj, >>>>> uint64_t volsize) >>>>> out: >>>>> dmu_objset_rele(os, FTAG); >>>>> >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> >>>>> return (error); >>>>> } >>>>> @@ -899,36 +900,19 @@ zvol_open(struct g_provider *pp, int flag, int >>>>> count) >>>>> { >>>>> zvol_state_t *zv; >>>>> int err = 0; >>>>> - boolean_t locked = B_FALSE; >>>>> >>>>> - /* >>>>> - * Protect against recursively entering spa_namespace_lock >>>>> - * when spa_open() is used for a pool on a (local) ZVOL(s). >>>>> - * This is needed since we replaced upstream zfsdev_state_lock >>>>> - * with spa_namespace_lock in the ZVOL code. >>>>> - * We are using the same trick as spa_open(). >>>>> - * Note that calls in zvol_first_open which need to resolve >>>>> - * pool name to a spa object will enter spa_open() >>>>> - * recursively, but that function already has all the >>>>> - * necessary protection. 
>>>>> - */ >>>>> - if (!MUTEX_HELD(&spa_namespace_lock)) { >>>>> - mutex_enter(&spa_namespace_lock); >>>>> - locked = B_TRUE; >>>>> - } >>>>> + mutex_enter(&zfsdev_state_lock); >>>>> >>>>> zv = pp->private; >>>>> if (zv == NULL) { >>>>> - if (locked) >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> return (SET_ERROR(ENXIO)); >>>>> } >>>>> >>>>> if (zv->zv_total_opens == 0) >>>>> err = zvol_first_open(zv); >>>>> if (err) { >>>>> - if (locked) >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> return (err); >>>>> } >>>>> if ((flag & FWRITE) && (zv->zv_flags & ZVOL_RDONLY)) { >>>>> @@ -950,15 +934,13 @@ zvol_open(struct g_provider *pp, int flag, int >>>>> count) >>>>> #endif >>>>> >>>>> zv->zv_total_opens += count; >>>>> - if (locked) >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> >>>>> return (err); >>>>> out: >>>>> if (zv->zv_total_opens == 0) >>>>> zvol_last_close(zv); >>>>> - if (locked) >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> return (err); >>>>> } >>>>> >>>>> @@ -968,18 +950,12 @@ zvol_close(struct g_provider *pp, int flag, int >>>>> count) >>>>> { >>>>> zvol_state_t *zv; >>>>> int error = 0; >>>>> - boolean_t locked = B_FALSE; >>>>> >>>>> - /* See comment in zvol_open(). 
*/ >>>>> - if (!MUTEX_HELD(&spa_namespace_lock)) { >>>>> - mutex_enter(&spa_namespace_lock); >>>>> - locked = B_TRUE; >>>>> - } >>>>> + mutex_enter(&zfsdev_state_lock); >>>>> >>>>> zv = pp->private; >>>>> if (zv == NULL) { >>>>> - if (locked) >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> return (SET_ERROR(ENXIO)); >>>>> } >>>>> >>>>> @@ -1002,8 +978,7 @@ zvol_close(struct g_provider *pp, int flag, int >>>>> count) >>>>> if (zv->zv_total_opens == 0) >>>>> zvol_last_close(zv); >>>>> >>>>> - if (locked) >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> return (error); >>>>> } >>>>> >>>>> @@ -1658,12 +1633,12 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>>>> flag, cred_t *cr, int *rvalp) >>>>> int error = 0; >>>>> rl_t *rl; >>>>> >>>>> - mutex_enter(&spa_namespace_lock); >>>>> + mutex_enter(&zfsdev_state_lock); >>>>> >>>>> zv = zfsdev_get_soft_state(getminor(dev), ZSST_ZVOL); >>>>> >>>>> if (zv == NULL) { >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> return (SET_ERROR(ENXIO)); >>>>> } >>>>> ASSERT(zv->zv_total_opens > 0); >>>>> @@ -1677,7 +1652,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>>>> flag, cred_t *cr, int *rvalp) >>>>> dki.dki_ctype = DKC_UNKNOWN; >>>>> dki.dki_unit = getminor(dev); >>>>> dki.dki_maxtransfer = 1 << (SPA_MAXBLOCKSHIFT - >>>>> zv->zv_min_bs); >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> if (ddi_copyout(&dki, (void *)arg, sizeof (dki), flag)) >>>>> error = SET_ERROR(EFAULT); >>>>> return (error); >>>>> @@ -1687,7 +1662,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>>>> flag, cred_t *cr, int *rvalp) >>>>> dkm.dki_lbsize = 1U << zv->zv_min_bs; >>>>> dkm.dki_capacity = zv->zv_volsize >> zv->zv_min_bs; >>>>> dkm.dki_media_type = DK_UNKNOWN; >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> if (ddi_copyout(&dkm, (void *)arg, 
sizeof (dkm), flag)) >>>>> error = SET_ERROR(EFAULT); >>>>> return (error); >>>>> @@ -1697,14 +1672,14 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>>>> flag, cred_t *cr, int *rvalp) >>>>> uint64_t vs = zv->zv_volsize; >>>>> uint8_t bs = zv->zv_min_bs; >>>>> >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> error = zvol_getefi((void *)arg, flag, vs, bs); >>>>> return (error); >>>>> } >>>>> >>>>> case DKIOCFLUSHWRITECACHE: >>>>> dkc = (struct dk_callback *)arg; >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> zil_commit(zv->zv_zilog, ZVOL_OBJ); >>>>> if ((flag & FKIOCTL) && dkc != NULL && dkc->dkc_callback) { >>>>> (*dkc->dkc_callback)(dkc->dkc_cookie, error); >>>>> @@ -1730,10 +1705,10 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>>>> flag, cred_t *cr, int *rvalp) >>>>> } >>>>> if (wce) { >>>>> zv->zv_flags |= ZVOL_WCE; >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> } else { >>>>> zv->zv_flags &= ~ZVOL_WCE; >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> zil_commit(zv->zv_zilog, ZVOL_OBJ); >>>>> } >>>>> return (0); >>>>> @@ -1828,7 +1803,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int >>>>> flag, cred_t *cr, int *rvalp) >>>>> break; >>>>> >>>>> } >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> return (error); >>>>> } >>>>> #endif /* sun */ >>>>> @@ -1844,12 +1819,14 @@ zvol_init(void) >>>>> { >>>>> VERIFY(ddi_soft_state_init(&zfsdev_state, sizeof (zfs_soft_state_t), >>>>> 1) == 0); >>>>> + mutex_init(&zfsdev_state_lock, NULL, MUTEX_DEFAULT, NULL); >>>>> ZFS_LOG(1, "ZVOL Initialized."); >>>>> } >>>>> >>>>> void >>>>> zvol_fini(void) >>>>> { >>>>> + mutex_destroy(&zfsdev_state_lock); >>>>> ddi_soft_state_fini(&zfsdev_state); >>>>> ZFS_LOG(1, "ZVOL Deinitialized."); >>>>> } >>>>> @@ -1889,7 +1866,7 @@ zvol_dump_init(zvol_state_t *zv, boolean_t resize) >>>>> 
uint64_t version = spa_version(spa); >>>>> enum zio_checksum checksum; >>>>> >>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>>> ASSERT(vd->vdev_ops == &vdev_root_ops); >>>>> >>>>> error = dmu_free_long_range(zv->zv_objset, ZVOL_OBJ, 0, >>>>> @@ -2437,7 +2414,7 @@ zvol_rename_minor(struct g_geom *gp, const char >>>>> *newname) >>>>> struct g_provider *pp; >>>>> zvol_state_t *zv; >>>>> >>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>>> g_topology_assert(); >>>>> >>>>> pp = LIST_FIRST(&gp->provider); >>>>> @@ -2471,7 +2448,7 @@ zvol_rename_minors(const char *oldname, const char >>>>> *newname) >>>>> newnamelen = strlen(newname); >>>>> >>>>> DROP_GIANT(); >>>>> - mutex_enter(&spa_namespace_lock); >>>>> + mutex_enter(&zfsdev_state_lock); >>>>> g_topology_lock(); >>>>> >>>>> LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) { >>>>> @@ -2494,6 +2471,6 @@ zvol_rename_minors(const char *oldname, const char >>>>> *newname) >>>>> } >>>>> >>>>> g_topology_unlock(); >>>>> - mutex_exit(&spa_namespace_lock); >>>>> + mutex_exit(&zfsdev_state_lock); >>>>> PICKUP_GIANT(); >>>>> } >>>>> -- >>>>> 1.8.4.2 >>>>> >>>> >>>>> From 57d5a383b585c32c77af54e8cdacaddf8ce7584f Mon Sep 17 00:00:00 2001 >>>>> From: Richard Kojedzinszky >>>>> Date: Wed, 18 Dec 2013 22:11:21 +0100 >>>>> Subject: [PATCH 2/2] ZFS snapshot handling fix >>>>> >>>>> --- >>>>> .../compat/opensolaris/kern/opensolaris_lookup.c | 13 +++--- >>>>> .../opensolaris/uts/common/fs/zfs/zfs_ctldir.c | 53 >>>>> +++++++++++++++------- >>>>> 2 files changed, 42 insertions(+), 24 deletions(-) >>>>> >>>>> diff --git a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >>>>> b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >>>>> index 94383d6..4cac053 100644 >>>>> --- a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >>>>> +++ b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c >>>>> @@ -81,6 +81,8 @@ traverse(vnode_t 
**cvpp, int lktype) >>>>> * progress on this vnode. >>>>> */ >>>>> >>>>> + vn_lock(cvp, lktype); >>>>> + >>>>> for (;;) { >>>>> /* >>>>> * Reached the end of the mount chain? >>>>> @@ -89,13 +91,7 @@ traverse(vnode_t **cvpp, int lktype) >>>>> if (vfsp == NULL) >>>>> break; >>>>> error = vfs_busy(vfsp, 0); >>>>> - /* >>>>> - * tvp is NULL for *cvpp vnode, which we can't unlock. >>>>> - */ >>>>> - if (tvp != NULL) >>>>> - vput(cvp); >>>>> - else >>>>> - vrele(cvp); >>>>> + VOP_UNLOCK(cvp, 0); >>>>> if (error) >>>>> return (error); >>>>> >>>>> @@ -107,6 +103,9 @@ traverse(vnode_t **cvpp, int lktype) >>>>> vfs_unbusy(vfsp); >>>>> if (error != 0) >>>>> return (error); >>>>> + >>>>> + VN_RELE(cvp); >>>>> + >>>>> cvp = tvp; >>>>> } >>>>> >>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >>>>> index 28ab1fa..d3464b4 100644 >>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c >>>>> @@ -112,6 +112,25 @@ snapentry_compare(const void *a, const void *b) >>>>> return (0); >>>>> } >>>>> >>>>> +/* Return the zfsctl_snapdir_t object from current vnode, following >>>>> + * the lock orders in zfsctl_snapdir_lookup() to avoid deadlocks. 
>>>>> + * On return the passed in vp is unlocked */ >>>>> +static zfsctl_snapdir_t * >>>>> +zfsctl_snapshot_get_snapdir(vnode_t *vp, vnode_t **dvpp) >>>>> +{ >>>>> + gfs_dir_t *dp = vp->v_data; >>>>> + *dvpp = dp->gfsd_file.gfs_parent; >>>>> + zfsctl_snapdir_t *sdp; >>>>> + >>>>> + VN_HOLD(*dvpp); >>>>> + VOP_UNLOCK(vp, 0); >>>>> + vn_lock(*dvpp, LK_SHARED | LK_RETRY | LK_CANRECURSE); >>>>> + sdp = (*dvpp)->v_data; >>>>> + VOP_UNLOCK(*dvpp, 0); >>>>> + >>>>> + return (sdp); >>>>> +} >>>>> + >>>>> #ifdef sun >>>>> vnodeops_t *zfsctl_ops_root; >>>>> vnodeops_t *zfsctl_ops_snapdir; >>>>> @@ -1013,6 +1032,8 @@ zfsctl_snapdir_lookup(ap) >>>>> * The snapshot was unmounted behind our backs, >>>>> * try to remount it. >>>>> */ >>>>> + VOP_UNLOCK(*vpp, 0); >>>>> + VN_HOLD(*vpp); >>>>> VERIFY(zfsctl_snapshot_zname(dvp, nm, MAXNAMELEN, >>>>> snapname) == 0); >>>>> goto domount; >>>>> } else { >>>>> @@ -1064,7 +1085,6 @@ zfsctl_snapdir_lookup(ap) >>>>> sep->se_name = kmem_alloc(strlen(nm) + 1, KM_SLEEP); >>>>> (void) strcpy(sep->se_name, nm); >>>>> *vpp = sep->se_root = zfsctl_snapshot_mknode(dvp, >>>>> dmu_objset_id(snap)); >>>>> - VN_HOLD(*vpp); >>>>> avl_insert(&sdp->sd_snaps, sep, where); >>>>> >>>>> dmu_objset_rele(snap, FTAG); >>>>> @@ -1075,6 +1095,7 @@ domount: >>>>> (void) snprintf(mountpoint, mountpoint_len, >>>>> "%s/" ZFS_CTLDIR_NAME "/snapshot/%s", >>>>> dvp->v_vfsp->mnt_stat.f_mntonname, nm); >>>>> + VN_HOLD(*vpp); >>>>> err = mount_snapshot(curthread, vpp, "zfs", mountpoint, snapname, 0); >>>>> kmem_free(mountpoint, mountpoint_len); >>>>> if (err == 0) { >>>>> @@ -1464,16 +1485,18 @@ zfsctl_snapshot_inactive(ap) >>>>> int locked; >>>>> vnode_t *dvp; >>>>> >>>>> - if (vp->v_count > 0) >>>>> - goto end; >>>>> - >>>>> - VERIFY(gfs_dir_lookup(vp, "..", &dvp, cr, 0, NULL, NULL) == 0); >>>>> - sdp = dvp->v_data; >>>>> - VOP_UNLOCK(dvp, 0); >>>>> + sdp = zfsctl_snapshot_get_snapdir(vp, &dvp); >>>>> >>>>> if (!(locked = MUTEX_HELD(&sdp->sd_lock))) >>>>> 
mutex_enter(&sdp->sd_lock); >>>>> >>>>> + vn_lock(vp, LK_EXCLUSIVE | LK_RETRY); >>>>> + >>>>> + if (vp->v_count > 0) { >>>>> + mutex_exit(&sdp->sd_lock); >>>>> + return (0); >>>>> + } >>>>> + >>>>> ASSERT(!vn_ismntpt(vp)); >>>>> >>>>> sep = avl_first(&sdp->sd_snaps); >>>>> @@ -1494,7 +1517,6 @@ zfsctl_snapshot_inactive(ap) >>>>> mutex_exit(&sdp->sd_lock); >>>>> VN_RELE(dvp); >>>>> >>>>> -end: >>>>> /* >>>>> * Dispose of the vnode for the snapshot mount point. >>>>> * This is safe to do because once this entry has been removed >>>>> @@ -1595,20 +1617,17 @@ zfsctl_snapshot_lookup(ap) >>>>> static int >>>>> zfsctl_snapshot_vptocnp(struct vop_vptocnp_args *ap) >>>>> { >>>>> - zfsvfs_t *zfsvfs = ap->a_vp->v_vfsp->vfs_data; >>>>> - vnode_t *dvp, *vp; >>>>> + vnode_t *dvp, *vp = ap->a_vp; >>>>> zfsctl_snapdir_t *sdp; >>>>> zfs_snapentry_t *sep; >>>>> - int error; >>>>> + int error = 0; >>>>> >>>>> - ASSERT(zfsvfs->z_ctldir != NULL); >>>>> - error = zfsctl_root_lookup(zfsvfs->z_ctldir, "snapshot", &dvp, >>>>> - NULL, 0, NULL, kcred, NULL, NULL, NULL); >>>>> - if (error != 0) >>>>> - return (error); >>>>> - sdp = dvp->v_data; >>>>> + sdp = zfsctl_snapshot_get_snapdir(vp, &dvp); >>>>> >>>>> mutex_enter(&sdp->sd_lock); >>>>> + >>>>> + vn_lock(vp, LK_SHARED | LK_RETRY); >>>>> + >>>>> sep = avl_first(&sdp->sd_snaps); >>>>> while (sep != NULL) { >>>>> vp = sep->se_root; >>>>> -- >>>>> 1.8.4.2 >>>>> >>>> >>>> -- >>>> Pawel Jakub Dawidek http://www.wheelsystems.com >>>> FreeBSD committer http://www.FreeBSD.org >>>> Am I Evil? Yes, I Am! 
http://mobter.com >>>> >>> _______________________________________________ >>> freebsd-fs@freebsd.org mailing list >>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>> > From owner-freebsd-fs@FreeBSD.ORG Tue Dec 24 22:39:34 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id D1A4777B for ; Tue, 24 Dec 2013 22:39:34 +0000 (UTC) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 824211633 for ; Tue, 24 Dec 2013 22:39:34 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: X-IronPort-AV: E=Sophos;i="4.95,545,1384318800"; d="scan'208";a="83072814" Received: from muskoka.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.222]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 24 Dec 2013 17:39:33 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id ED422B4055; Tue, 24 Dec 2013 17:39:32 -0500 (EST) Date: Tue, 24 Dec 2013 17:39:32 -0500 (EST) From: Rick Macklem To: Jason Keltz Message-ID: <1566997692.1186754.1387924772956.JavaMail.root@uoguelph.ca> In-Reply-To: <52A7E53D.8000002@cse.yorku.ca> Subject: Re: mount ZFS snapshot on Linux system MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 7.2.1_GA_2790 (ZimbraWebClient - FF3.0 (Win)/7.2.1_GA_2790) Cc: FreeBSD Filesystems , Steve Dickson X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 24 Dec 2013 22:39:34 -0000 Jason Keltz wrote: > On 
10/12/2013 7:21 PM, Rick Macklem wrote: > > Jason Keltz wrote: > >> I'm running FreeBSD 9.2 with various ZFS datasets. > >> I export a dataset to a Linux system (RHEL64), and mount it. It > >> works > >> fine... > >> When I try to access the ZFS snapshot directory on the Linux NFS > >> client, > >> things go weird. > >> Thanks to Jason's testing, I have just committed the following patch http://people.freebsd.org/~rmacklem/nfsv4-zfs-snapshot.patch to head as r259845, which I think fixes the problem. If things go smoothly, I'll MFC it to 9/stable and 10/stable in 1 week. Hopefully this is now resolved, rick > >> With NFSv4: > >> > >> [jas@archive /]# cd /mnt/.zfs/snapshot > >> [jas@archive snapshot]# ls > >> 20131203 20131205 20131206 20131207 20131208 20131209 > >> 20131210 > >> [jas@archive snapshot]# cd 20131210 > >> 20131210: Not a directory. > >> > >> huh? > >> > >> [jas@archive snapshot]# ls -al > >> total 77 > >> dr-xr-xr-x 9 root root 9 Dec 10 11:20 . > >> dr-xr-xr-x 4 root root 4 Nov 28 15:42 .. > >> drwxr-xr-x 380 root root 380 Dec 2 15:56 20131203 > >> drwxr-xr-x 381 root root 381 Dec 3 11:24 20131205 > >> drwxr-xr-x 381 root root 381 Dec 3 11:24 20131206 > >> drwxr-xr-x 381 root root 381 Dec 3 11:24 20131207 > >> drwxr-xr-x 381 root root 381 Dec 3 11:24 20131208 > >> drwxr-xr-x 381 root root 381 Dec 3 11:24 20131209 > >> drwxr-xr-x 381 root root 381 Dec 3 11:24 20131210 > >> [jas@archive snapshot]# stat * > >> [jas@archive snapshot]# ls -al > >> total 292 > >> dr-xr-xr-x 9 root root 9 Dec 10 11:20 . > >> dr-xr-xr-x 4 root root 4 Nov 28 15:42 .. 
> >> -rw-r--r-- 1 uax guest 137647 Mar 17 2010 20131203 > >> -rw-r--r-- 1 uax guest 865 Jul 31 2009 20131205 > >> -rw-r--r-- 1 uax guest 137647 Mar 17 2010 20131206 > >> -rw-r--r-- 1 uax guest 771 Jul 31 2009 20131207 > >> -rw-r--r-- 1 uax guest 778 Jul 31 2009 20131208 > >> -rw-r--r-- 1 uax guest 5281 Jul 31 2009 20131209 > >> -rw------- 1 btx faculty 893 Jul 13 20:21 20131210 > >> > >> But it gets even more fun.. > >> > >> # ls -ali > >> total 205 > >> 2 dr-xr-xr-x 9 root root 9 Dec 10 11:20 . > >> 1 dr-xr-xr-x 4 root root 4 Nov 28 15:42 .. > >> 863 -rw-r--r-- 1 uax guest 137647 Mar 17 2010 20131203 > >> 4 drwxr-xr-x 381 root root 381 Dec 3 11:24 20131205 > >> 4 drwxr-xr-x 381 root root 381 Dec 3 11:24 20131206 > >> 4 drwxr-xr-x 381 root root 381 Dec 3 11:24 20131207 > >> 4 drwxr-xr-x 381 root root 381 Dec 3 11:24 20131208 > >> 4 drwxr-xr-x 381 root root 381 Dec 3 11:24 20131209 > >> 4 drwxr-xr-x 381 root root 381 Dec 3 11:24 20131210 > >> > >> This is not a user id mapping issue because all the files in /mnt > >> have > >> the proper owner/groups, and I can access them there fine. > >> > >> I also tried explicitly exporting .zfs/snapshot. The result isn't > >> any > >> different. > >> > >> If I use nfs v3 it "works", but I'm seeing a whole lot of errors > >> like > >> these in syslog: > >> > >> Dec 10 12:32:28 jungle mountd[49579]: can't delete exports for > >> /local/backup/home9/.zfs/snapshot/20131203: Invalid argument > >> Dec 10 12:32:28 jungle mountd[49579]: can't delete exports for > >> /local/backup/home9/.zfs/snapshot/20131209: Invalid argument > >> Dec 10 12:32:28 jungle mountd[49579]: can't delete exports for > >> /local/backup/home9/.zfs/snapshot/20131210: Invalid argument > >> Dec 10 12:32:28 jungle mountd[49579]: can't delete exports for > >> /local/backup/home9/.zfs/snapshot/20131207: Invalid argument > >> > >> It's not clear to me why this doesn't just "work". > >> > >> Can anyone provide any advice on debugging this? 
> >> > > As I think you already know, I know nothing about ZFS and never > > use it. > Yup! :) > > Having said that, I suspect that there are filenos (i-node #s) > > that are the same in the snapshot as in the parent file system > > tree. > > > > The basic assumptions are: > > - within a file system, all i-node# are unique (represent one file > > object only) and all file objects have the same fsid > > - when the fsid changes, that indicates a file system boundary and > > fileno (i-node#s) can be reused in the subtree with a different > > fsid > > > > For NFSv3, the server should export single volumes only (all > > objects > > have the same fsid and the filenos are unique). This is indicated > > to > > the VFS by the use of the NOCROSSMOUNT flag on VOP_LOOKUP() and > > friends. > > > > For NFSv4, the server does export multiple volumes and the boundary > > is indicated by a change in fsid value. > > > > I suspect ZFS snaphots don't obey the above in some way, but that > > is > > just a hunch. > > > > Now, how to narrow this down... > > - Do the above tests (both NFSv4 and NFSv3) and capture the > > packets, > > then look at them in wireshark. In particular, look at the > > fileid numbers > > and fsid values for the various directories under .zfs. > > I gave this a shot, but I haven't used wireshark to capture NFS > traffic > before, so if I need to provide additional details, let me know.. > > NFSv4: > > For /mnt/.zfs/snapshot/20131203: > fileid=4 > fsid4.major=1446349656 > fsid4.minor=222 > > For /mnt/.zfs/snapshot/20131205: > fileid=4 > fsid4.major=1845998066 > fsid4.minor=222 > > For /mnt/jas: > fileid=144 > fsid4.major=597946950 > fsid4.minor=222 > > For /mnt/jas1: > fileid=338 > fsid4.major=597946950 > fsid4.minor=222 > > So fsid is the same for all the different "data" directories, which > is > what I would expect given what you said. I guess each snapshot is > seen > as a unique filesystem... 
but then a repeating inode in different > filesystems shouldn't be a problem... > > NFSv3: > > For /mnt/.zfs/snapshot/20131203: > fileid=4 > fsid=0x0000000056358b58 > > For /mnt/.zfs/snapshot/20131205: > fileid=4 > fsid=0x000000006e07b1f2 > > For /mnt/jas > fileid=144 > fsid=0x0000000023a3f246 > > For /mnt/jas1: > fileid=338 > fsid=0x0000000023a3f246 > > Here, it seems it's the same, even though it's NFSv3... hmm. > > > > - Try mounting the individual snapshot directory, like > > .zfs/snapshot/20131209 and see if that works (for both NFSv3 > > and NFSv4). > > Hmm .. I tried this: > > /local/backup/home9/.zfs/snapshot/20131203 -ro > archive-mrpriv.cs.yorku.ca > V4: / > > ... but syslog reports: > > Dec 10 22:28:22 jungle mountd[85405]: can't export > /local/backup/home9/.zfs/snapshot/20131203 > > ... and of course I can't mount from either v3/v4. > > On the other hand, I kept it as: > > /local/backup/home9 -ro archive-mrpriv.cs.yorku.ca > V4:/ > > ... and was able to NFSv4 mount > /local/backup/home9/.zfs/snapshot/20131203, and this does indeed > work. > > > - Try doing the mounts with a FreeBSD client and see if you get the > > same > > behaviour? > I found this: > http://forums.freenas.org/threads/mounting-snapshot-directory-using-nfs-from-linux-broken.6060/ > .. implies it will work from FreeBSD/Nexenta, just not Linux. > Found this as well: > https://groups.google.com/a/zfsonlinux.org/forum/#!topic/zfs-discuss/lKyfYsjPMNM > > Jason. 
> > From owner-freebsd-fs@FreeBSD.ORG Wed Dec 25 05:23:01 2013 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 0D852433; Wed, 25 Dec 2013 05:23:01 +0000 (UTC) Received: from krichy.tvnetwork.hu (unknown [IPv6:2a01:be00:0:2::10]) by mx1.freebsd.org (Postfix) with ESMTP id 5C0D01233; Wed, 25 Dec 2013 05:22:59 +0000 (UTC) Received: by krichy.tvnetwork.hu (Postfix, from userid 1000) id 8880EA943; Wed, 25 Dec 2013 06:22:37 +0100 (CET) Received: from localhost (localhost [127.0.0.1]) by krichy.tvnetwork.hu (Postfix) with ESMTP id 82BD7A942; Wed, 25 Dec 2013 06:22:37 +0100 (CET) Date: Wed, 25 Dec 2013 06:22:37 +0100 (CET) From: krichy@tvnetwork.hu To: Pawel Jakub Dawidek Subject: Re: kern/184677 / ZFS snapshot handling deadlocks In-Reply-To: Message-ID: References: <20131220134405.GE1658@garage.freebsd.pl> User-Agent: Alpine 2.10 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="1030603365-641758273-1387948957=:2785" Cc: freebsd-fs@freebsd.org, avg@freebsd.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Dec 2013 05:23:01 -0000 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --1030603365-641758273-1387948957=:2785 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Dear all, I've made a new patch again, which fixes most of my earlier issues. Mainly, I've enabled shared vnode locks for GFS vnodes - is it acceptable? And that way, deadlock cases reduced much, and also the patch is more clear (at least to me). 
One thing still remains: the spa_namespace_lock race I mentioned before; I don't know how to handle that. Waiting for comments on this. Regards, Kojedzinszky Richard Euronet Magyarorszag Informatikai Zrt. On Tue, 24 Dec 2013, krichy@tvnetwork.hu wrote: > Date: Tue, 24 Dec 2013 21:35:16 +0100 (CET) > From: krichy@tvnetwork.hu > To: Pawel Jakub Dawidek > Cc: freebsd-fs@freebsd.org, avg@freebsd.org > Subject: Re: kern/184677 / ZFS snapshot handling deadlocks > > Dear all, > > Now a user-caused deadlock simply blocked a buildkernel run, so I must > correct myself: this is not only a DoS against .zfs/snapshot > directories, but potentially against the whole system. > > Regards, > > P.S.: Merry Christmas! > > Kojedzinszky Richard > Euronet Magyarorszag Informatikai Zrt. > > On Tue, 24 Dec 2013, krichy@tvnetwork.hu wrote: > >> Date: Tue, 24 Dec 2013 21:28:39 +0100 (CET) >> From: krichy@tvnetwork.hu >> To: Pawel Jakub Dawidek >> Cc: freebsd-fs@freebsd.org, avg@freebsd.org >> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks >> >> Dear all, >> >> Unfortunately, this seems to be serious, as a plain user can deadlock ZFS >> using the attached script, and can mount a DoS against the mounted dataset's >> .zfs/snapshot directory. >> >> Regards, >> >> Kojedzinszky Richard >> Euronet Magyarorszag Informatikai Zrt. >> >> On Tue, 24 Dec 2013, krichy@tvnetwork.hu wrote: >> >>> Date: Tue, 24 Dec 2013 07:10:05 +0100 (CET) >>> From: krichy@tvnetwork.hu >>> To: Pawel Jakub Dawidek >>> Cc: freebsd-fs@freebsd.org, avg@freebsd.org >>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks >>> >>> Dear devs, >>> >>> A third one, again the snapshot dirs: >>> >>> # cd /tmp/test/.zfs/snapshot && yes .. | xargs stat > /dev/null & >>> # yes /tmp/test/.zfs/snapshot/snap | xargs umount > /dev/null >>> >>> This will deadlock in a few seconds. >>> >>> The first holds a lock on the .zfs/snapshot vnode; when looking up ..
it tries >>> to lock .zfs, while the other tries to do the same in the reverse order. >>> >>> In a vop_lookup, why must the passed-in vnode and the returned vnode >>> be locked? Why is it not enough to have them held? >>> >>> Thanks in advance, >>> Kojedzinszky Richard >>> Euronet Magyarorszag Informatikai Zrt. >>> >>> On Mon, 23 Dec 2013, krichy@tvnetwork.hu wrote: >>> >>>> Date: Mon, 23 Dec 2013 17:55:08 +0100 (CET) >>>> From: krichy@tvnetwork.hu >>>> To: Pawel Jakub Dawidek >>>> Cc: freebsd-fs@freebsd.org, avg@freebsd.org >>>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks >>>> >>>> Dear Pawel, >>>> >>>> Thanks for your response. I hope someone will review my work. >>>> >>>> Secondly, I think I've found another deadlock scenario: >>>> >>>> When 'zfs send -R' finishes, zfsdev_close() holds spa_namespace_lock. >>>> It then unmounts all snapshots, which in turn goes to >>>> zfsctl_snapshot_inactive(), in which sdp->sd_lock is locked. >>>> >>>> Concurrently, a zfsctl_snapdir_lookup(), holding the same sdp->sd_lock, >>>> does a snapshot mount, which somewhere tries to acquire >>>> spa_namespace_lock. So it deadlocks. Currently I don't know how to >>>> work around this. >>>> >>>> Does spa_namespace_lock need to be held in zfsctl_snapshot_inactive()? >>>> >>>> What if we release it on function entry, and re-acquire it upon exit? >>>> >>>> Thanks in advance, >>>> >>>> >>>> Kojedzinszky Richard >>>> Euronet Magyarorszag Informatikai Zrt.
>>>> >>>> On Fri, 20 Dec 2013, Pawel Jakub Dawidek wrote: >>>> >>>>> Date: Fri, 20 Dec 2013 14:44:05 +0100 >>>>> From: Pawel Jakub Dawidek >>>>> To: krichy@tvnetwork.hu >>>>> Cc: freebsd-fs@freebsd.org >>>>> Subject: Re: kern/184677 / ZFS snapshot handling deadlocks >>>>> >>>>> On Thu, Dec 19, 2013 at 04:46:11PM +0100, krichy@tvnetwork.hu wrote: >>>>>> Dear devs, >>>>>> >>>>>> I am attaching a clearer patch/fix for my snapshot handling issues >>>>>> (0002), but I would be happy if some ZFS expert would comment on it. I >>>>>> have been trying to solve it for at least two weeks now, and an ACK or a NACK >>>>>> would >>>>>> be nice from someone. Also, a commit is reverted since it also caused >>>>>> deadlocks. I've read its comments, which say it also eliminates deadlocks, but >>>>>> I >>>>>> did not find any reference on how to produce that deadlock. In my view >>>>>> reverting it makes my issues disappear, but I don't know what new >>>>>> cases >>>>>> it will raise. >>>>> >>>>> Richard, >>>>> >>>>> I won't be able to analyse it myself anytime soon, because of other >>>>> obligations, but I forwarded your e-mail to the zfs-devel@ mailing list >>>>> (it is closed, but gathers FreeBSD ZFS devs). Hopefully someone from >>>>> there will be able to help you. >>>>> >>>>>> I've rewritten traverse() to make it more like upstream's, and added two extra >>>>>> VN_HOLD()s to snapdir_lookup() for when traverse() returned the same vnode >>>>>> that >>>>>> was passed to it (I don't even know why those two extra holds are needed >>>>>> when creating a snapshot vnode.) Unfortunately, because >>>>>> FreeBSD calls vop_inactive callbacks with vnodes locked, that could >>>>>> also >>>>>> cause deadlocks, so zfsctl_snapshot_inactive() and >>>>>> zfsctl_snapshot_vptocnp() have been rewritten to work around that. >>>>>> >>>>>> After this, one may also get a deadlock when a simple access would >>>>>> call
The documentation says, that those >>>>>> vnodes >>>>>> should always be covered, but after some stress test, sometimes we hit >>>>>> that call, and that can cause again deadlocks. In our environment I've >>>>>> just uncommented that callback, which returns ENODIR on some calls, but >>>>>> at >>>>>> least not a deadlock. >>>>>> >>>>>> The attached script can be used to reproduce my cases (would one >>>>>> confirm >>>>>> that?), and after the patches applied, they disappear (confirm?). >>>>>> >>>>>> Thanks, >>>>>> >>>>>> >>>>>> Kojedzinszky Richard >>>>>> Euronet Magyarorszag Informatikai Zrt. >>>>>> >>>>>> On Tue, 17 Dec 2013, krichy@tvnetwork.hu wrote: >>>>>> >>>>>>> Date: Tue, 17 Dec 2013 14:50:16 +0100 (CET) >>>>>>> From: krichy@tvnetwork.hu >>>>>>> To: pjd@freebsd.org >>>>>>> Cc: freebsd-fs@freebsd.org >>>>>>> Subject: Re: kern/184677 (fwd) >>>>>>> >>>>>>> Dear devs, >>>>>>> >>>>>>> I will sum up my experience regarding the issue: >>>>>>> >>>>>>> The sympton is that a concurrent 'zfs send -R' and some activity on >>>>>>> the >>>>>>> snapshot dir (or in the snapshot) may cause a deadlock. >>>>>>> >>>>>>> After investigating the problem, I found that zfs send umounts the >>>>>>> snapshots, >>>>>>> and that causes the deadlock, so later I tested only with concurrent >>>>>>> umount >>>>>>> and the "activity". More later I found that listing the snapshots in >>>>>>> .zfs/snapshot/ and unounting them can cause the found deadlock, so I >>>>>>> used >>>>>>> them for the tests. But for my surprise, instead of a deadlock, a >>>>>>> recursive >>>>>>> lock panic has arised. >>>>>>> >>>>>>> The vnode for the ".zfs/snapshot/" directory contains zfs's >>>>>>> zfsctl_snapdir_t >>>>>>> structure (sdp). This contains a tree of mounted snapshots, and each >>>>>>> entry >>>>>>> (sep) contains the vnode of entry on which the snapshot is mounted on >>>>>>> top >>>>>>> (se_root). 
The strange thing is that the se_root member does not hold a >>>>>>> reference >>>>>>> to the vnode, just a simple pointer to it. >>>>>>> >>>>>>> Upon entry lookup (zfsctl_snapdir_lookup()) the "snapshot" vnode is >>>>>>> locked, >>>>>>> the zfsctl_snapdir_t's tree is locked and searched for the mount, in case >>>>>>> it >>>>>>> exists already. If it finds no entry, it does the mount. In case >>>>>>> an >>>>>>> entry was found, the se_root member contains the vnode which the >>>>>>> snapshot is >>>>>>> mounted on. Thus, a reference is taken for it, and the traverse() call >>>>>>> will >>>>>>> resolve to the real root vnode of the mounted snapshot, returning it >>>>>>> locked. (Examining the traverse() code I found that it did not >>>>>>> follow >>>>>>> FreeBSD's lock order recommendation described in sys/kern/vfs_subr.c.) >>>>>>> >>>>>>> The other way around, when an umount is issued, the se_root vnode loses >>>>>>> its last >>>>>>> reference (as only the mountpoint holds one for it) and goes through >>>>>>> the >>>>>>> vinactive() path to zfsctl_snapshot_inactive(). In FreeBSD this is >>>>>>> called >>>>>>> with a locked vnode, so this is a deadlock race condition: while >>>>>>> zfsctl_snapdir_lookup() holds the mutex for the sdp tree and >>>>>>> traverse() >>>>>>> tries to acquire se_root, zfsctl_snapshot_inactive() holds the >>>>>>> lock on >>>>>>> se_root while trying to acquire the sdp lock. >>>>>>> >>>>>>> The zfsctl_snapshot_inactive() has an if statement checking the >>>>>>> v_usecount, >>>>>>> which is incremented in zfsctl_snapdir_lookup(), but in that context >>>>>>> it is >>>>>>> not covered by VI_LOCK. And it seems to me that FreeBSD's vinactive() >>>>>>> path >>>>>>> assumes that the vnode remains inactive (as opposed to illumos, at >>>>>>> least as >>>>>>> I read the code). So zfsctl_snapshot_inactive() must free resources >>>>>>> while in >>>>>>> a locked state.
I was a bit confused, and that is probably why the >>>>>>> previously >>>>>>> posted patch is as it is. >>>>>>> >>>>>>> With some clues on the direction to take with this problem, I could >>>>>>> have >>>>>>> worked toward a nicer, shorter solution. >>>>>>> >>>>>>> Would someone please comment on my post? >>>>>>> >>>>>>> Regards, >>>>>>> >>>>>>> >>>>>>> >>>>>>> Kojedzinszky Richard >>>>>>> Euronet Magyarorszag Informatikai Zrt. >>>>>>> >>>>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote: >>>>>>> >>>>>>>> Date: Mon, 16 Dec 2013 16:52:16 +0100 (CET) >>>>>>>> From: krichy@tvnetwork.hu >>>>>>>> To: pjd@freebsd.org >>>>>>>> Cc: freebsd-fs@freebsd.org >>>>>>>> Subject: Re: kern/184677 (fwd) >>>>>>>> >>>>>>>> Dear PJD, >>>>>>>> >>>>>>>> I am a happy FreeBSD user, and I am sure you've read my previous posts >>>>>>>> regarding some issues in ZFS. Please give me some advice on where >>>>>>>> to look >>>>>>>> for solutions, or how I could help to resolve those issues. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Kojedzinszky Richard >>>>>>>> Euronet Magyarorszag Informatikai Zrt. >>>>>>>> >>>>>>>> ---------- Forwarded message ---------- >>>>>>>> Date: Mon, 16 Dec 2013 15:23:06 +0100 (CET) >>>>>>>> From: krichy@tvnetwork.hu >>>>>>>> To: freebsd-fs@freebsd.org >>>>>>>> Subject: Re: kern/184677 >>>>>>>> >>>>>>>> >>>>>>>> It seems that pjd made a change which eliminated the zfsdev_state_lock on >>>>>>>> Fri >>>>>>>> Aug 12 07:04:16 2011 +0000, which might have introduced a new deadlock >>>>>>>> situation. Any comments on this? >>>>>>>> >>>>>>>> >>>>>>>> Kojedzinszky Richard >>>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>>> >>>>>>>> On Mon, 16 Dec 2013, krichy@tvnetwork.hu wrote: >>>>>>>> >>>>>>>>> Date: Mon, 16 Dec 2013 11:08:11 +0100 (CET) >>>>>>>>> From: krichy@tvnetwork.hu >>>>>>>>> To: freebsd-fs@freebsd.org >>>>>>>>> Subject: kern/184677 >>>>>>>>> >>>>>>>>> Dear devs, >>>>>>>>> >>>>>>>>> I've attached a patch which makes the recursive lockmgr panic disappear, >>>>>>>>> and >>>>>>>>> makes the reported bug disappear. I don't know whether I followed the >>>>>>>>> guidelines well or not, but at least it works for me. Would some >>>>>>>>> ZFS/FreeBSD fs expert please review it, and fix it where needed? >>>>>>>>> >>>>>>>>> But unfortunately, my original problem is still not solved; it is maybe >>>>>>>>> the same >>>>>>>>> as Ryan's: >>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-fs/2013-December/018707.html >>>>>>>>> >>>>>>>>> Tracing the problem down shows that zfsctl_snapdir_lookup() tries to >>>>>>>>> acquire >>>>>>>>> spa_namespace_lock while a finishing 'zfs send -R' does a >>>>>>>>> zfsdev_close(), which also holds the same mutex. And this causes >>>>>>>>> a >>>>>>>>> deadlock scenario. I looked at illumos's code, and for some reason >>>>>>>>> they >>>>>>>>> use another mutex in zfsdev_close(), which therefore cannot >>>>>>>>> deadlock with >>>>>>>>> zfsctl_snapdir_lookup(). But I am still investigating the problem. >>>>>>>>> >>>>>>>>> I would like to help make ZFS on FreeBSD more stable, with its >>>>>>>>> whole >>>>>>>>> functionality intact. I would be very thankful if some expert would give >>>>>>>>> some >>>>>>>>> advice on how to solve these bugs. PJD, Steven, Xin? >>>>>>>>> >>>>>>>>> Thanks in advance, >>>>>>>>> >>>>>>>>> >>>>>>>>> Kojedzinszky Richard >>>>>>>>> Euronet Magyarorszag Informatikai Zrt.
>>>>>>>> _______________________________________________ >>>>>>>> freebsd-fs@freebsd.org mailing list >>>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>>>>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" >>>>>>>> >>>>>>> >>>>> >>>>> >>>>>> From 39298da838d006ad225e41529d7b7f240fccfe73 Mon Sep 17 00:00:00 2001 >>>>>> From: Richard Kojedzinszky >>>>>> Date: Mon, 16 Dec 2013 15:39:11 +0100 >>>>>> Subject: [PATCH 1/2] Revert "Eliminate the zfsdev_state_lock entirely >>>>>> and >>>>>> replace it with the" >>>>>> >>>>>> This reverts commit 1d8972b3f353f986eb5b85bc108b1c0d946d3218. >>>>>> >>>>>> Conflicts: >>>>>> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>>>> --- >>>>>> .../opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h | 1 + >>>>>> .../opensolaris/uts/common/fs/zfs/vdev_geom.c | 14 ++- >>>>>> .../opensolaris/uts/common/fs/zfs/zfs_ioctl.c | 16 +-- >>>>>> .../contrib/opensolaris/uts/common/fs/zfs/zvol.c | 119 >>>>>> +++++++++------------ >>>>>> 4 files changed, 70 insertions(+), 80 deletions(-) >>>>>> >>>>>> diff --git >>>>>> a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>>>>> index af2def2..8272c4d 100644 >>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_ioctl.h >>>>>> @@ -383,6 +383,7 @@ extern void *zfsdev_get_soft_state(minor_t minor, >>>>>> extern minor_t zfsdev_minor_alloc(void); >>>>>> >>>>>> extern void *zfsdev_state; >>>>>> +extern kmutex_t zfsdev_state_lock; >>>>>> >>>>>> #endif /* _KERNEL */ >>>>>> >>>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>>>>> index 15685a5..5c3e9f3 100644 >>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c >>>>>> 
@@ -581,7 +581,7 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, >>>>>> uint64_t *max_psize, >>>>>> struct g_provider *pp; >>>>>> struct g_consumer *cp; >>>>>> size_t bufsize; >>>>>> - int error; >>>>>> + int error, lock; >>>>>> >>>>>> /* >>>>>> * We must have a pathname, and it must be absolute. >>>>>> @@ -593,6 +593,12 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, >>>>>> uint64_t *max_psize, >>>>>> >>>>>> vd->vdev_tsd = NULL; >>>>>> >>>>>> + if (mutex_owned(&spa_namespace_lock)) { >>>>>> + mutex_exit(&spa_namespace_lock); >>>>>> + lock = 1; >>>>>> + } else { >>>>>> + lock = 0; >>>>>> + } >>>>>> DROP_GIANT(); >>>>>> g_topology_lock(); >>>>>> error = 0; >>>>>> @@ -624,7 +630,11 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, >>>>>> uint64_t *max_psize, >>>>>> !ISP2(cp->provider->sectorsize)) { >>>>>> ZFS_LOG(1, "Provider %s has unsupported sectorsize.", >>>>>> vd->vdev_path); >>>>>> + >>>>>> + g_topology_lock(); >>>>>> vdev_geom_detach(cp, 0); >>>>>> + g_topology_unlock(); >>>>>> + >>>>>> error = EINVAL; >>>>>> cp = NULL; >>>>>> } else if (cp->acw == 0 && (spa_mode(vd->vdev_spa) & FWRITE) != 0) { >>>>>> @@ -647,6 +657,8 @@ vdev_geom_open(vdev_t *vd, uint64_t *psize, >>>>>> uint64_t *max_psize, >>>>>> } >>>>>> g_topology_unlock(); >>>>>> PICKUP_GIANT(); >>>>>> + if (lock) >>>>>> + mutex_enter(&spa_namespace_lock); >>>>>> if (cp == NULL) { >>>>>> vd->vdev_stat.vs_aux = VDEV_AUX_OPEN_FAILED; >>>>>> return (error); >>>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>>>>> index e9fba26..91becde 100644 >>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c >>>>>> @@ -5635,7 +5635,7 @@ zfsdev_minor_alloc(void) >>>>>> static minor_t last_minor; >>>>>> minor_t m; >>>>>> >>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>>>> >>>>>> for (m = 
last_minor + 1; m != last_minor; m++) { >>>>>> if (m > ZFSDEV_MAX_MINOR) >>>>>> @@ -5655,7 +5655,7 @@ zfs_ctldev_init(struct cdev *devp) >>>>>> minor_t minor; >>>>>> zfs_soft_state_t *zs; >>>>>> >>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>>>> >>>>>> minor = zfsdev_minor_alloc(); >>>>>> if (minor == 0) >>>>>> @@ -5676,7 +5676,7 @@ zfs_ctldev_init(struct cdev *devp) >>>>>> static void >>>>>> zfs_ctldev_destroy(zfs_onexit_t *zo, minor_t minor) >>>>>> { >>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock)); >>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock)); >>>>>> >>>>>> zfs_onexit_destroy(zo); >>>>>> ddi_soft_state_free(zfsdev_state, minor); >>>>>> @@ -5706,9 +5706,9 @@ zfsdev_open(struct cdev *devp, int flag, int >>>>>> mode, struct thread *td) >>>>>> >>>>>> /* This is the control device. Allocate a new minor if requested. */ >>>>>> if (flag & FEXCL) { >>>>>> - mutex_enter(&spa_namespace_lock); >>>>>> + mutex_enter(&zfsdev_state_lock); >>>>>> error = zfs_ctldev_init(devp); >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> } >>>>>> >>>>>> return (error); >>>>>> @@ -5723,14 +5723,14 @@ zfsdev_close(void *data) >>>>>> if (minor == 0) >>>>>> return; >>>>>> >>>>>> - mutex_enter(&spa_namespace_lock); >>>>>> + mutex_enter(&zfsdev_state_lock); >>>>>> zo = zfsdev_get_soft_state(minor, ZSST_CTLDEV); >>>>>> if (zo == NULL) { >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> return; >>>>>> } >>>>>> zfs_ctldev_destroy(zo, minor); >>>>>> - mutex_exit(&spa_namespace_lock); >>>>>> + mutex_exit(&zfsdev_state_lock); >>>>>> } >>>>>> >>>>>> static int >>>>>> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>>>> index 72d4502..aec5219 100644 >>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c >>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c 
>>>>>> @@ -104,11 +104,12 @@ static char *zvol_tag = "zvol_tag";
>>>>>> #define ZVOL_DUMPSIZE "dumpsize"
>>>>>>
>>>>>> /*
>>>>>> - * The spa_namespace_lock protects the zfsdev_state structure from
>>>>>> being
>>>>>> - * modified while it's being used, e.g. an open that comes in before a
>>>>>> - * create finishes. It also protects temporary opens of the dataset
>>>>>> so that,
>>>>>> + * This lock protects the zfsdev_state structure from being modified
>>>>>> + * while it's being used, e.g. an open that comes in before a create
>>>>>> + * finishes. It also protects temporary opens of the dataset so that,
>>>>>> * e.g., an open doesn't get a spurious EBUSY.
>>>>>> */
>>>>>> +kmutex_t zfsdev_state_lock;
>>>>>> static uint32_t zvol_minors;
>>>>>>
>>>>>> typedef struct zvol_extent {
>>>>>> @@ -249,7 +250,7 @@ zvol_minor_lookup(const char *name)
>>>>>> struct g_geom *gp;
>>>>>> zvol_state_t *zv = NULL;
>>>>>>
>>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>>
>>>>>> g_topology_lock();
>>>>>> LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) {
>>>>>> @@ -465,11 +466,11 @@ zvol_name2minor(const char *name, minor_t *minor)
>>>>>> {
>>>>>> zvol_state_t *zv;
>>>>>>
>>>>>> - mutex_enter(&spa_namespace_lock);
>>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>>> zv = zvol_minor_lookup(name);
>>>>>> if (minor && zv)
>>>>>> *minor = zv->zv_minor;
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (zv ? 0 : -1);
>>>>>> }
>>>>>> #endif /* sun */
>>>>>> @@ -489,10 +490,10 @@ zvol_create_minor(const char *name)
>>>>>>
>>>>>> ZFS_LOG(1, "Creating ZVOL %s...", name);
>>>>>>
>>>>>> - mutex_enter(&spa_namespace_lock);
>>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>>>
>>>>>> if (zvol_minor_lookup(name) != NULL) {
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (SET_ERROR(EEXIST));
>>>>>> }
>>>>>>
>>>>>> @@ -500,20 +501,20 @@ zvol_create_minor(const char *name)
>>>>>> error = dmu_objset_own(name, DMU_OST_ZVOL, B_TRUE, FTAG, &os);
>>>>>>
>>>>>> if (error) {
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (error);
>>>>>> }
>>>>>>
>>>>>> #ifdef sun
>>>>>> if ((minor = zfsdev_minor_alloc()) == 0) {
>>>>>> dmu_objset_disown(os, FTAG);
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (SET_ERROR(ENXIO));
>>>>>> }
>>>>>>
>>>>>> if (ddi_soft_state_zalloc(zfsdev_state, minor) != DDI_SUCCESS) {
>>>>>> dmu_objset_disown(os, FTAG);
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (SET_ERROR(EAGAIN));
>>>>>> }
>>>>>> (void) ddi_prop_update_string(minor, zfs_dip, ZVOL_PROP_NAME,
>>>>>> @@ -525,7 +526,7 @@ zvol_create_minor(const char *name)
>>>>>> minor, DDI_PSEUDO, 0) == DDI_FAILURE) {
>>>>>> ddi_soft_state_free(zfsdev_state, minor);
>>>>>> dmu_objset_disown(os, FTAG);
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (SET_ERROR(EAGAIN));
>>>>>> }
>>>>>>
>>>>>> @@ -536,7 +537,7 @@ zvol_create_minor(const char *name)
>>>>>> ddi_remove_minor_node(zfs_dip, chrbuf);
>>>>>> ddi_soft_state_free(zfsdev_state, minor);
>>>>>> dmu_objset_disown(os, FTAG);
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (SET_ERROR(EAGAIN));
>>>>>> }
>>>>>>
>>>>>> @@ -587,7 +588,7 @@ zvol_create_minor(const char *name)
>>>>>>
>>>>>> zvol_minors++;
>>>>>>
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>>
>>>>>> zvol_geom_run(zv);
>>>>>>
>>>>>> @@ -609,7 +610,7 @@ zvol_remove_zv(zvol_state_t *zv)
>>>>>> minor_t minor = zv->zv_minor;
>>>>>> #endif
>>>>>>
>>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>> if (zv->zv_total_opens != 0)
>>>>>> return (SET_ERROR(EBUSY));
>>>>>>
>>>>>> @@ -635,15 +636,15 @@ zvol_remove_minor(const char *name)
>>>>>> zvol_state_t *zv;
>>>>>> int rc;
>>>>>>
>>>>>> - mutex_enter(&spa_namespace_lock);
>>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>>> if ((zv = zvol_minor_lookup(name)) == NULL) {
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (SET_ERROR(ENXIO));
>>>>>> }
>>>>>> g_topology_lock();
>>>>>> rc = zvol_remove_zv(zv);
>>>>>> g_topology_unlock();
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (rc);
>>>>>> }
>>>>>>
>>>>>> @@ -755,7 +756,7 @@ zvol_update_volsize(objset_t *os, uint64_t volsize)
>>>>>> dmu_tx_t *tx;
>>>>>> int error;
>>>>>>
>>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>>
>>>>>> tx = dmu_tx_create(os);
>>>>>> dmu_tx_hold_zap(tx, ZVOL_ZAP_OBJ, TRUE, NULL);
>>>>>> @@ -786,7 +787,7 @@ zvol_remove_minors(const char *name)
>>>>>> namelen = strlen(name);
>>>>>>
>>>>>> DROP_GIANT();
>>>>>> - mutex_enter(&spa_namespace_lock);
>>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>>> g_topology_lock();
>>>>>>
>>>>>> LIST_FOREACH_SAFE(gp, &zfs_zvol_class.geom, geom, gptmp) {
>>>>>> @@ -804,7 +805,7 @@ zvol_remove_minors(const char *name)
>>>>>> }
>>>>>>
>>>>>> g_topology_unlock();
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> PICKUP_GIANT();
>>>>>> }
>>>>>>
>>>>>> @@ -818,10 +819,10 @@ zvol_set_volsize(const char *name, major_t maj,
>>>>>> uint64_t volsize)
>>>>>> uint64_t old_volsize = 0ULL;
>>>>>> uint64_t readonly;
>>>>>>
>>>>>> - mutex_enter(&spa_namespace_lock);
>>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>>> zv = zvol_minor_lookup(name);
>>>>>> if ((error = dmu_objset_hold(name, FTAG, &os)) != 0) {
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (error);
>>>>>> }
>>>>>>
>>>>>> @@ -888,7 +889,7 @@ zvol_set_volsize(const char *name, major_t maj,
>>>>>> uint64_t volsize)
>>>>>> out:
>>>>>> dmu_objset_rele(os, FTAG);
>>>>>>
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>>
>>>>>> return (error);
>>>>>> }
>>>>>> @@ -899,36 +900,19 @@ zvol_open(struct g_provider *pp, int flag, int
>>>>>> count)
>>>>>> {
>>>>>> zvol_state_t *zv;
>>>>>> int err = 0;
>>>>>> - boolean_t locked = B_FALSE;
>>>>>>
>>>>>> - /*
>>>>>> - * Protect against recursively entering spa_namespace_lock
>>>>>> - * when spa_open() is used for a pool on a (local) ZVOL(s).
>>>>>> - * This is needed since we replaced upstream zfsdev_state_lock
>>>>>> - * with spa_namespace_lock in the ZVOL code.
>>>>>> - * We are using the same trick as spa_open().
>>>>>> - * Note that calls in zvol_first_open which need to resolve
>>>>>> - * pool name to a spa object will enter spa_open()
>>>>>> - * recursively, but that function already has all the
>>>>>> - * necessary protection.
>>>>>> - */
>>>>>> - if (!MUTEX_HELD(&spa_namespace_lock)) {
>>>>>> - mutex_enter(&spa_namespace_lock);
>>>>>> - locked = B_TRUE;
>>>>>> - }
>>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>>>
>>>>>> zv = pp->private;
>>>>>> if (zv == NULL) {
>>>>>> - if (locked)
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (SET_ERROR(ENXIO));
>>>>>> }
>>>>>>
>>>>>> if (zv->zv_total_opens == 0)
>>>>>> err = zvol_first_open(zv);
>>>>>> if (err) {
>>>>>> - if (locked)
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (err);
>>>>>> }
>>>>>> if ((flag & FWRITE) && (zv->zv_flags & ZVOL_RDONLY)) {
>>>>>> @@ -950,15 +934,13 @@ zvol_open(struct g_provider *pp, int flag, int
>>>>>> count)
>>>>>> #endif
>>>>>>
>>>>>> zv->zv_total_opens += count;
>>>>>> - if (locked)
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>>
>>>>>> return (err);
>>>>>> out:
>>>>>> if (zv->zv_total_opens == 0)
>>>>>> zvol_last_close(zv);
>>>>>> - if (locked)
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (err);
>>>>>> }
>>>>>>
>>>>>> @@ -968,18 +950,12 @@ zvol_close(struct g_provider *pp, int flag, int
>>>>>> count)
>>>>>> {
>>>>>> zvol_state_t *zv;
>>>>>> int error = 0;
>>>>>> - boolean_t locked = B_FALSE;
>>>>>>
>>>>>> - /* See comment in zvol_open(). */
>>>>>> - if (!MUTEX_HELD(&spa_namespace_lock)) {
>>>>>> - mutex_enter(&spa_namespace_lock);
>>>>>> - locked = B_TRUE;
>>>>>> - }
>>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>>>
>>>>>> zv = pp->private;
>>>>>> if (zv == NULL) {
>>>>>> - if (locked)
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (SET_ERROR(ENXIO));
>>>>>> }
>>>>>>
>>>>>> @@ -1002,8 +978,7 @@ zvol_close(struct g_provider *pp, int flag, int
>>>>>> count)
>>>>>> if (zv->zv_total_opens == 0)
>>>>>> zvol_last_close(zv);
>>>>>>
>>>>>> - if (locked)
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (error);
>>>>>> }
>>>>>>
>>>>>> @@ -1658,12 +1633,12 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg,
>>>>>> int flag, cred_t *cr, int *rvalp)
>>>>>> int error = 0;
>>>>>> rl_t *rl;
>>>>>>
>>>>>> - mutex_enter(&spa_namespace_lock);
>>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>>>
>>>>>> zv = zfsdev_get_soft_state(getminor(dev), ZSST_ZVOL);
>>>>>>
>>>>>> if (zv == NULL) {
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (SET_ERROR(ENXIO));
>>>>>> }
>>>>>> ASSERT(zv->zv_total_opens > 0);
>>>>>> @@ -1677,7 +1652,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int
>>>>>> flag, cred_t *cr, int *rvalp)
>>>>>> dki.dki_ctype = DKC_UNKNOWN;
>>>>>> dki.dki_unit = getminor(dev);
>>>>>> dki.dki_maxtransfer = 1 << (SPA_MAXBLOCKSHIFT -
>>>>>> zv->zv_min_bs);
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> if (ddi_copyout(&dki, (void *)arg, sizeof (dki), flag))
>>>>>> error = SET_ERROR(EFAULT);
>>>>>> return (error);
>>>>>> @@ -1687,7 +1662,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int
>>>>>> flag, cred_t *cr, int *rvalp)
>>>>>> dkm.dki_lbsize = 1U << zv->zv_min_bs;
>>>>>> dkm.dki_capacity = zv->zv_volsize >> zv->zv_min_bs;
>>>>>> dkm.dki_media_type = DK_UNKNOWN;
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> if (ddi_copyout(&dkm, (void *)arg, sizeof (dkm), flag))
>>>>>> error = SET_ERROR(EFAULT);
>>>>>> return (error);
>>>>>> @@ -1697,14 +1672,14 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg,
>>>>>> int flag, cred_t *cr, int *rvalp)
>>>>>> uint64_t vs = zv->zv_volsize;
>>>>>> uint8_t bs = zv->zv_min_bs;
>>>>>>
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> error = zvol_getefi((void *)arg, flag, vs, bs);
>>>>>> return (error);
>>>>>> }
>>>>>>
>>>>>> case DKIOCFLUSHWRITECACHE:
>>>>>> dkc = (struct dk_callback *)arg;
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> zil_commit(zv->zv_zilog, ZVOL_OBJ);
>>>>>> if ((flag & FKIOCTL) && dkc != NULL && dkc->dkc_callback) {
>>>>>> (*dkc->dkc_callback)(dkc->dkc_cookie, error);
>>>>>> @@ -1730,10 +1705,10 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg,
>>>>>> int flag, cred_t *cr, int *rvalp)
>>>>>> }
>>>>>> if (wce) {
>>>>>> zv->zv_flags |= ZVOL_WCE;
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> } else {
>>>>>> zv->zv_flags &= ~ZVOL_WCE;
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> zil_commit(zv->zv_zilog, ZVOL_OBJ);
>>>>>> }
>>>>>> return (0);
>>>>>> @@ -1828,7 +1803,7 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t arg, int
>>>>>> flag, cred_t *cr, int *rvalp)
>>>>>> break;
>>>>>>
>>>>>> }
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> return (error);
>>>>>> }
>>>>>> #endif /* sun */
>>>>>> @@ -1844,12 +1819,14 @@ zvol_init(void)
>>>>>> {
>>>>>> VERIFY(ddi_soft_state_init(&zfsdev_state, sizeof (zfs_soft_state_t),
>>>>>> 1) == 0);
>>>>>> + mutex_init(&zfsdev_state_lock, NULL, MUTEX_DEFAULT, NULL);
>>>>>> ZFS_LOG(1, "ZVOL Initialized.");
>>>>>> }
>>>>>>
>>>>>> void
>>>>>> zvol_fini(void)
>>>>>> {
>>>>>> + mutex_destroy(&zfsdev_state_lock);
>>>>>> ddi_soft_state_fini(&zfsdev_state);
>>>>>> ZFS_LOG(1, "ZVOL Deinitialized.");
>>>>>> }
>>>>>> @@ -1889,7 +1866,7 @@ zvol_dump_init(zvol_state_t *zv, boolean_t
>>>>>> resize)
>>>>>> uint64_t version = spa_version(spa);
>>>>>> enum zio_checksum checksum;
>>>>>>
>>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>> ASSERT(vd->vdev_ops == &vdev_root_ops);
>>>>>>
>>>>>> error = dmu_free_long_range(zv->zv_objset, ZVOL_OBJ, 0,
>>>>>> @@ -2437,7 +2414,7 @@ zvol_rename_minor(struct g_geom *gp, const char
>>>>>> *newname)
>>>>>> struct g_provider *pp;
>>>>>> zvol_state_t *zv;
>>>>>>
>>>>>> - ASSERT(MUTEX_HELD(&spa_namespace_lock));
>>>>>> + ASSERT(MUTEX_HELD(&zfsdev_state_lock));
>>>>>> g_topology_assert();
>>>>>>
>>>>>> pp = LIST_FIRST(&gp->provider);
>>>>>> @@ -2471,7 +2448,7 @@ zvol_rename_minors(const char *oldname, const
>>>>>> char *newname)
>>>>>> newnamelen = strlen(newname);
>>>>>>
>>>>>> DROP_GIANT();
>>>>>> - mutex_enter(&spa_namespace_lock);
>>>>>> + mutex_enter(&zfsdev_state_lock);
>>>>>> g_topology_lock();
>>>>>>
>>>>>> LIST_FOREACH(gp, &zfs_zvol_class.geom, geom) {
>>>>>> @@ -2494,6 +2471,6 @@ zvol_rename_minors(const char *oldname, const
>>>>>> char *newname)
>>>>>> }
>>>>>>
>>>>>> g_topology_unlock();
>>>>>> - mutex_exit(&spa_namespace_lock);
>>>>>> + mutex_exit(&zfsdev_state_lock);
>>>>>> PICKUP_GIANT();
>>>>>> }
>>>>>> --
>>>>>> 1.8.4.2
>>>>>>
>>>>>
>>>>>> From 57d5a383b585c32c77af54e8cdacaddf8ce7584f Mon Sep 17 00:00:00 2001
>>>>>> From: Richard Kojedzinszky
>>>>>> Date: Wed, 18 Dec 2013 22:11:21 +0100
>>>>>> Subject: [PATCH 2/2] ZFS snapshot handling fix
>>>>>>
>>>>>> ---
>>>>>> .../compat/opensolaris/kern/opensolaris_lookup.c | 13 +++---
>>>>>> .../opensolaris/uts/common/fs/zfs/zfs_ctldir.c | 53
>>>>>> +++++++++++++++-------
>>>>>> 2 files changed, 42 insertions(+), 24 deletions(-)
>>>>>>
>>>>>> diff --git a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>>>> b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>>>> index 94383d6..4cac053 100644
>>>>>> --- a/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>>>> +++ b/sys/cddl/compat/opensolaris/kern/opensolaris_lookup.c
>>>>>> @@ -81,6 +81,8 @@ traverse(vnode_t **cvpp, int lktype)
>>>>>> * progress on this vnode.
>>>>>> */
>>>>>>
>>>>>> + vn_lock(cvp, lktype);
>>>>>> +
>>>>>> for (;;) {
>>>>>> /*
>>>>>> * Reached the end of the mount chain?
>>>>>> @@ -89,13 +91,7 @@ traverse(vnode_t **cvpp, int lktype)
>>>>>> if (vfsp == NULL)
>>>>>> break;
>>>>>> error = vfs_busy(vfsp, 0);
>>>>>> - /*
>>>>>> - * tvp is NULL for *cvpp vnode, which we can't unlock.
>>>>>> - */
>>>>>> - if (tvp != NULL)
>>>>>> - vput(cvp);
>>>>>> - else
>>>>>> - vrele(cvp);
>>>>>> + VOP_UNLOCK(cvp, 0);
>>>>>> if (error)
>>>>>> return (error);
>>>>>>
>>>>>> @@ -107,6 +103,9 @@ traverse(vnode_t **cvpp, int lktype)
>>>>>> vfs_unbusy(vfsp);
>>>>>> if (error != 0)
>>>>>> return (error);
>>>>>> +
>>>>>> + VN_RELE(cvp);
>>>>>> +
>>>>>> cvp = tvp;
>>>>>> }
>>>>>>
>>>>>> diff --git
>>>>>> a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>>>> b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>>>> index 28ab1fa..d3464b4 100644
>>>>>> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>>>> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c
>>>>>> @@ -112,6 +112,25 @@ snapentry_compare(const void *a, const void *b)
>>>>>> return (0);
>>>>>> }
>>>>>>
>>>>>> +/* Return the zfsctl_snapdir_t object from current vnode, following
>>>>>> + * the lock orders in zfsctl_snapdir_lookup() to avoid deadlocks.
>>>>>> + * On return the passed in vp is unlocked */
>>>>>> +static zfsctl_snapdir_t *
>>>>>> +zfsctl_snapshot_get_snapdir(vnode_t *vp, vnode_t **dvpp)
>>>>>> +{
>>>>>> + gfs_dir_t *dp = vp->v_data;
>>>>>> + *dvpp = dp->gfsd_file.gfs_parent;
>>>>>> + zfsctl_snapdir_t *sdp;
>>>>>> +
>>>>>> + VN_HOLD(*dvpp);
>>>>>> + VOP_UNLOCK(vp, 0);
>>>>>> + vn_lock(*dvpp, LK_SHARED | LK_RETRY | LK_CANRECURSE);
>>>>>> + sdp = (*dvpp)->v_data;
>>>>>> + VOP_UNLOCK(*dvpp, 0);
>>>>>> +
>>>>>> + return (sdp);
>>>>>> +}
>>>>>> +
>>>>>> #ifdef sun
>>>>>> vnodeops_t *zfsctl_ops_root;
>>>>>> vnodeops_t *zfsctl_ops_snapdir;
>>>>>> @@ -1013,6 +1032,8 @@ zfsctl_snapdir_lookup(ap)
>>>>>> * The snapshot was unmounted behind our backs,
>>>>>> * try to remount it.
>>>>>> */
>>>>>> + VOP_UNLOCK(*vpp, 0);
>>>>>> + VN_HOLD(*vpp);
>>>>>> VERIFY(zfsctl_snapshot_zname(dvp, nm, MAXNAMELEN,
>>>>>> snapname) == 0);
>>>>>> goto domount;
>>>>>> } else {
>>>>>> @@ -1064,7 +1085,6 @@ zfsctl_snapdir_lookup(ap)
>>>>>> sep->se_name = kmem_alloc(strlen(nm) + 1, KM_SLEEP);
>>>>>> (void) strcpy(sep->se_name, nm);
>>>>>> *vpp = sep->se_root = zfsctl_snapshot_mknode(dvp,
>>>>>> dmu_objset_id(snap));
>>>>>> - VN_HOLD(*vpp);
>>>>>> avl_insert(&sdp->sd_snaps, sep, where);
>>>>>>
>>>>>> dmu_objset_rele(snap, FTAG);
>>>>>> @@ -1075,6 +1095,7 @@ domount:
>>>>>> (void) snprintf(mountpoint, mountpoint_len,
>>>>>> "%s/" ZFS_CTLDIR_NAME "/snapshot/%s",
>>>>>> dvp->v_vfsp->mnt_stat.f_mntonname, nm);
>>>>>> + VN_HOLD(*vpp);
>>>>>> err = mount_snapshot(curthread, vpp, "zfs", mountpoint, snapname, 0);
>>>>>> kmem_free(mountpoint, mountpoint_len);
>>>>>> if (err == 0) {
>>>>>> @@ -1464,16 +1485,18 @@ zfsctl_snapshot_inactive(ap)
>>>>>> int locked;
>>>>>> vnode_t *dvp;
>>>>>>
>>>>>> - if (vp->v_count > 0)
>>>>>> - goto end;
>>>>>> -
>>>>>> - VERIFY(gfs_dir_lookup(vp, "..", &dvp, cr, 0, NULL, NULL) == 0);
>>>>>> - sdp = dvp->v_data;
>>>>>> - VOP_UNLOCK(dvp, 0);
>>>>>> + sdp = zfsctl_snapshot_get_snapdir(vp, &dvp);
>>>>>>
>>>>>> if (!(locked = MUTEX_HELD(&sdp->sd_lock)))
>>>>>> mutex_enter(&sdp->sd_lock);
>>>>>>
>>>>>> + vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
>>>>>> +
>>>>>> + if (vp->v_count > 0) {
>>>>>> + mutex_exit(&sdp->sd_lock);
>>>>>> + return (0);
>>>>>> + }
>>>>>> +
>>>>>> ASSERT(!vn_ismntpt(vp));
>>>>>>
>>>>>> sep = avl_first(&sdp->sd_snaps);
>>>>>> @@ -1494,7 +1517,6 @@ zfsctl_snapshot_inactive(ap)
>>>>>> mutex_exit(&sdp->sd_lock);
>>>>>> VN_RELE(dvp);
>>>>>>
>>>>>> -end:
>>>>>> /*
>>>>>> * Dispose of the vnode for the snapshot mount point.
>>>>>> * This is safe to do because once this entry has been removed
>>>>>> @@ -1595,20 +1617,17 @@ zfsctl_snapshot_lookup(ap)
>>>>>> static int
>>>>>> zfsctl_snapshot_vptocnp(struct vop_vptocnp_args *ap)
>>>>>> {
>>>>>> - zfsvfs_t *zfsvfs = ap->a_vp->v_vfsp->vfs_data;
>>>>>> - vnode_t *dvp, *vp;
>>>>>> + vnode_t *dvp, *vp = ap->a_vp;
>>>>>> zfsctl_snapdir_t *sdp;
>>>>>> zfs_snapentry_t *sep;
>>>>>> - int error;
>>>>>> + int error = 0;
>>>>>>
>>>>>> - ASSERT(zfsvfs->z_ctldir != NULL);
>>>>>> - error = zfsctl_root_lookup(zfsvfs->z_ctldir, "snapshot", &dvp,
>>>>>> - NULL, 0, NULL, kcred, NULL, NULL, NULL);
>>>>>> - if (error != 0)
>>>>>> - return (error);
>>>>>> - sdp = dvp->v_data;
>>>>>> + sdp = zfsctl_snapshot_get_snapdir(vp, &dvp);
>>>>>>
>>>>>> mutex_enter(&sdp->sd_lock);
>>>>>> +
>>>>>> + vn_lock(vp, LK_SHARED | LK_RETRY);
>>>>>> +
>>>>>> sep = avl_first(&sdp->sd_snaps);
>>>>>> while (sep != NULL) {
>>>>>> vp = sep->se_root;
>>>>>> --
>>>>>> 1.8.4.2
>>>>>>
>>>>>
>>>>> --
>>>>> Pawel Jakub Dawidek http://www.wheelsystems.com
>>>>> FreeBSD committer http://www.FreeBSD.org
>>>>> Am I Evil? Yes, I Am!
>>>>> http://mobter.com
>>>>>
>>>> _______________________________________________
>>>> freebsd-fs@freebsd.org mailing list
>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>>>>
>>
>
--1030603365-641758273-1387948957=:2785
Content-Type: TEXT/x-diff; name=zfs-gfs.patch
Content-Transfer-Encoding: BASE64
Content-ID:
Content-Description:
Content-Disposition: attachment; filename=zfs-gfs.patch

--1030603365-641758273-1387948957=:2785--