From owner-freebsd-fs@FreeBSD.ORG Sun Aug 28 07:36:33 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 273801065673 for ; Sun, 28 Aug 2011 07:36:33 +0000 (UTC) (envelope-from trent@snakebite.org) Received: from exchange.liveoffice.com (exchla3.liveoffice.com [64.70.67.188]) by mx1.freebsd.org (Postfix) with ESMTP id 0D9788FC13 for ; Sun, 28 Aug 2011 07:36:32 +0000 (UTC) Received: from EXCASUM07.exchhosting.com (192.168.11.194) by exhub08.exchhosting.com (192.168.11.106) with Microsoft SMTP Server (TLS) id 8.2.213.0; Sun, 28 Aug 2011 00:26:26 -0700 Received: from EXMBX10.exchhosting.com ([fe80:0000:0000:0000:8133:164f:44.75.166.49]) by EXCASUM07.exchhosting.com ([192.168.11.194]) with mapi; Sun, 28 Aug 2011 00:26:26 -0700 From: Trent Nelson To: "fs@freebsd.org" Date: Sun, 28 Aug 2011 00:26:26 -0700 Thread-Topic: How do you zfs rename a dataset that has children? Thread-Index: AcxlU8tu8ntNhIgBSISPvhSrpgKwfg== Message-ID: <5B89610A-4D16-422A-9E52-F182CAF76D68@snakebite.org> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: acceptlanguage: en-US Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Cc: Subject: How do you zfs rename a dataset that has children? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 28 Aug 2011 07:36:33 -0000 This has me scratching my head: [root@usbkey/ttypts/0(~)#] zfs create tank/host/foo [root@usbkey/ttypts/0(~)#] zfs create tank/host/foo/bar =20 [root@usbkey/ttypts/0(~)#] zfs rename tank/host/foo tank/host/test Assertion failed: (!clp->cl_alldependents), file /usr/src/cddl/lib/libzfs/.= ./../../cddl/contrib/opensolaris/lib/libzfs/common/libzfs_changelist.c, lin= e 470. zsh: abort (core dumped) zfs rename tank/host/foo tank/host/test Say wha'? Renaming the child dataset first works, but it's not what I want, obviously= : [root@usbkey/ttypts/0(~)#] zfs rename tank/host/foo/bar tank/host/test/bar cannot create 'tank/host/test/bar': parent does not exist [root@usbkey/ttypts/0(~)#] zfs rename -p tank/host/foo/bar tank/host/test/b= ar [root@usbkey/ttypts/0(~)#] zfs rename tank/host/foo tank/host/test/bar =20 cannot rename 'tank/host/foo': dataset already exists [root@usbkey/ttypts/0(~)#] uname -a FreeBSD usbkey.home.trent.me 8.2-STABLE FreeBSD 8.2-STABLE #2 r224667M: Sat= Aug 6 04:11:46 EDT 2011 root@home.trent.me:/usr/obj/usr/src/sys/GENER= IC amd64 (The 'M' in r224667M is due to a device ID I changed in e1000.h; unrelated = to this issue.) What am I doing wrong? (My actual use case is more complicated than the test case above; I built a= new system, named 'fulcrum', from an existing build, named 'flanker'. I wa= nt to rename tank/host/flanker -> tank/host/fulcrum (hence booting from the= usbkey).) Regards, Trent. From owner-freebsd-fs@FreeBSD.ORG Mon Aug 29 11:07:07 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C45851065678 for ; Mon, 29 Aug 2011 11:07:07 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id B2FED8FC16 for ; Mon, 29 Aug 2011 11:07:07 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p7TB77n5089262 for ; Mon, 29 Aug 2011 11:07:07 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p7TB77jW089260 for freebsd-fs@FreeBSD.org; Mon, 29 Aug 2011 11:07:07 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 29 Aug 2011 11:07:07 GMT Message-Id: <201108291107.p7TB77jW089260@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-fs@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Aug 2011 11:07:07 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. Description -------------------------------------------------------------------------------- o kern/160035 fs [zfs] zfs rollback does not invalidate mmapped cache o kern/159971 fs [ffs] [panic] panic with soft updates journaling durin o kern/159930 fs [ufs] [panic] kernel core o kern/159418 fs [tmpfs] [panic] tmpfs kernel panic: recursing on non r o kern/159402 fs [zfs][loader] symlinks cause I/O errors o kern/159357 fs [zfs] ZFS MAXNAMELEN macro has confusing name (off-by- o kern/159356 fs [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s o kern/159351 fs [nfs] [patch] - divide by zero in mountnfs() o kern/159251 fs [zfs] [request]: add FLETCHER4 as DEDUP hash option o kern/159233 fs [ext2fs] [patch] fs/ext2fs: finish reallocblk implemen o kern/159232 fs [ext2fs] [patch] fs/ext2fs: merge ext2_readwrite into o kern/159077 fs [zfs] Can't cd .. with latest zfs version o kern/159048 fs [smbfs] smb mount corrupts large files o kern/159045 fs [zfs] [hang] ZFS scrub freezes system o kern/158839 fs [zfs] ZFS Bootloader Fails if there is a Dead Disk o kern/158802 fs [amd] amd(8) ICMP storm and unkillable process. o kern/158711 fs [ffs] [panic] panic in ffs_blkfree and ffs_valloc o kern/158231 fs [nullfs] panic on unmounting nullfs mounted over ufs o f kern/157929 fs [nfs] NFS slow read o kern/157722 fs [geli] unable to newfs a geli encrypted partition o kern/157399 fs [zfs] trouble with: mdconfig force delete && zfs strip o kern/157179 fs [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov o kern/156933 fs [zfs] ZFS receive after read on readonly=on filesystem o kern/156797 fs [zfs] [panic] Double panic with FreeBSD 9-CURRENT and o kern/156781 fs [zfs] zfs is losing the snapshot directory, p kern/156545 fs [ufs] mv could break UFS on SMP systems o kern/156193 fs [ufs] [hang] UFS snapshot hangs && deadlocks processes o kern/156168 fs [nfs] [panic] Kernel panic under concurrent access ove o kern/156039 fs [nullfs] [unionfs] nullfs + unionfs do not compose, re o kern/155615 fs [zfs] zfs v28 broken on sparc64 -current o kern/155587 fs [zfs] [panic] kernel panic with zfs o kern/155411 fs [regression] [8.2-release] [tmpfs]: mount: tmpfs : No o kern/155199 fs [ext2fs] ext3fs mounted as ext2fs gives I/O errors o bin/155104 fs [zfs][patch] use /dev prefix by default when importing o kern/154930 fs [zfs] cannot delete/unlink file from full volume -> EN o kern/154828 fs [msdosfs] Unable to create directories on external USB o kern/154491 fs [smbfs] smb_co_lock: recursive lock for object 1 o kern/154447 fs [zfs] [panic] Occasional panics - solaris assert somew p kern/154228 fs [md] md getting stuck in wdrain state o kern/153996 fs [zfs] zfs root mount error while kernel is not located o kern/153847 fs [nfs] [panic] Kernel panic from incorrect m_free in nf o kern/153753 fs [zfs] ZFS v15 - grammatical error when attempting to u o kern/153716 fs [zfs] zpool scrub time remaining is incorrect o kern/153695 fs [patch] [zfs] Booting from zpool created on 4k-sector o kern/153680 fs [xfs] 8.1 failing to mount XFS partitions o kern/153520 fs [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable o kern/153418 fs [zfs] [panic] Kernel Panic occurred writing to zfs vol o kern/153351 fs [zfs] locking directories/files in ZFS o bin/153258 fs [patch][zfs] creating ZVOLs requires `refreservation' s kern/153173 fs [zfs] booting from a gzip-compressed dataset doesn't w o kern/153126 fs [zfs] vdev failure, zpool=peegel type=vdev.too_small p kern/152488 fs [tmpfs] [patch] mtime of file updated when only inode o kern/152022 fs [nfs] nfs service hangs with linux client [regression] o kern/151942 fs [zfs] panic during ls(1) zfs snapshot directory o kern/151905 fs [zfs] page fault under load in /sbin/zfs o kern/151845 fs [smbfs] [patch] smbfs should be upgraded to support Un o bin/151713 fs [patch] Bug in growfs(8) with respect to 32-bit overfl o kern/151648 fs [zfs] disk wait bug o kern/151629 fs [fs] [patch] Skip empty directory entries during name o kern/151330 fs [zfs] will unshare all zfs filesystem after execute a o kern/151326 fs [nfs] nfs exports fail if netgroups contain duplicate o kern/151251 fs [ufs] Can not create files on filesystem with heavy us o kern/151226 fs [zfs] can't delete zfs snapshot o kern/151111 fs [zfs] vnodes leakage during zfs unmount o kern/150503 fs [zfs] ZFS disks are UNAVAIL and corrupted after reboot o kern/150501 fs [zfs] ZFS vdev failure vdev.bad_label on amd64 o kern/150390 fs [zfs] zfs deadlock when arcmsr reports drive faulted o kern/150336 fs [nfs] mountd/nfsd became confused; refused to reload n o kern/150207 fs zpool(1): zpool import -d /dev tries to open weird dev o kern/149208 fs mksnap_ffs(8) hang/deadlock o kern/149173 fs [patch] [zfs] make OpenSolaris installa o kern/149015 fs [zfs] [patch] misc fixes for ZFS code to build on Glib o kern/149014 fs [zfs] [patch] declarations in ZFS libraries/utilities o kern/149013 fs [zfs] [patch] make ZFS makefiles use the libraries fro o kern/148504 fs [zfs] ZFS' zpool does not allow replacing drives to be o kern/148490 fs [zfs]: zpool attach - resilver bidirectionally, and re o kern/148368 fs [zfs] ZFS hanging forever on 8.1-PRERELEASE o bin/148296 fs [zfs] [loader] [patch] Very slow probe in /usr/src/sys o kern/148204 fs [nfs] UDP NFS causes overload o kern/148138 fs [zfs] zfs raidz pool commands freeze o kern/147903 fs [zfs] [panic] Kernel panics on faulty zfs device o kern/147881 fs [zfs] [patch] ZFS "sharenfs" doesn't allow different " o kern/147790 fs [zfs] zfs set acl(mode|inherit) fails on existing zfs o kern/147560 fs [zfs] [boot] Booting 8.1-PRERELEASE raidz system take o kern/147420 fs [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt o kern/146941 fs [zfs] [panic] Kernel Double Fault - Happens constantly o kern/146786 fs [zfs] zpool import hangs with checksum errors o kern/146708 fs [ufs] [panic] Kernel panic in softdep_disk_write_compl o kern/146528 fs [zfs] Severe memory leak in ZFS on i386 o kern/146502 fs [nfs] FreeBSD 8 NFS Client Connection to Server s kern/145712 fs [zfs] cannot offline two drives in a raidz2 configurat o kern/145411 fs [xfs] [panic] Kernel panics shortly after mounting an o bin/145309 fs bsdlabel: Editing disk label invalidates the whole dev o kern/145272 fs [zfs] [panic] Panic during boot when accessing zfs on o kern/145246 fs [ufs] dirhash in 7.3 gratuitously frees hashes when it o kern/145238 fs [zfs] [panic] kernel panic on zpool clear tank o kern/145229 fs [zfs] Vast differences in ZFS ARC behavior between 8.0 o kern/145189 fs [nfs] nfsd performs abysmally under load o kern/144929 fs [ufs] [lor] vfs_bio.c + ufs_dirhash.c p kern/144447 fs [zfs] sharenfs fsunshare() & fsshare_main() non functi o kern/144416 fs [panic] Kernel panic on online filesystem optimization s kern/144415 fs [zfs] [panic] kernel panics on boot after zfs crash o kern/144234 fs [zfs] Cannot boot machine with recent gptzfsboot code o kern/143825 fs [nfs] [panic] Kernel panic on NFS client o bin/143572 fs [zfs] zpool(1): [patch] The verbose output from iostat o kern/143212 fs [nfs] NFSv4 client strange work ... o kern/143184 fs [zfs] [lor] zfs/bufwait LOR o kern/142878 fs [zfs] [vfs] lock order reversal o kern/142597 fs [ext2fs] ext2fs does not work on filesystems with real o kern/142489 fs [zfs] [lor] allproc/zfs LOR o kern/142466 fs Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re o kern/142306 fs [zfs] [panic] ZFS drive (from OSX Leopard) causes two o kern/142068 fs [ufs] BSD labels are got deleted spontaneously o kern/141897 fs [msdosfs] [panic] Kernel panic. msdofs: file name leng o kern/141463 fs [nfs] [panic] Frequent kernel panics after upgrade fro o kern/141305 fs [zfs] FreeBSD ZFS+sendfile severe performance issues ( o kern/141091 fs [patch] [nullfs] fix panics with DIAGNOSTIC enabled o kern/141086 fs [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS o kern/141010 fs [zfs] "zfs scrub" fails when backed by files in UFS2 o kern/140888 fs [zfs] boot fail from zfs root while the pool resilveri o kern/140661 fs [zfs] [patch] /boot/loader fails to work on a GPT/ZFS- o kern/140640 fs [zfs] snapshot crash o kern/140068 fs [smbfs] [patch] smbfs does not allow semicolon in file o kern/139725 fs [zfs] zdb(1) dumps core on i386 when examining zpool c o kern/139715 fs [zfs] vfs.numvnodes leak on busy zfs p bin/139651 fs [nfs] mount(8): read-only remount of NFS volume does n o kern/139597 fs [patch] [tmpfs] tmpfs initializes va_gen but doesn't u o kern/139564 fs [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo o kern/139407 fs [smbfs] [panic] smb mount causes system crash if remot o kern/138662 fs [panic] ffs_blkfree: freeing free block o kern/138421 fs [ufs] [patch] remove UFS label limitations o kern/138202 fs mount_msdosfs(1) see only 2Gb o kern/136968 fs [ufs] [lor] ufs/bufwait/ufs (open) o kern/136945 fs [ufs] [lor] filedesc structure/ufs (poll) o kern/136944 fs [ffs] [lor] bufwait/snaplk (fsync) o kern/136873 fs [ntfs] Missing directories/files on NTFS volume o kern/136865 fs [nfs] [patch] NFS exports atomic and on-the-fly atomic p kern/136470 fs [nfs] Cannot mount / in read-only, over NFS o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot o kern/134491 fs [zfs] Hot spares are rather cold... o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis o kern/133174 fs [msdosfs] [patch] msdosfs must support multibyte inter o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy o kern/132145 fs [panic] File System Hard Crashes o kern/131441 fs [unionfs] [nullfs] unionfs and/or nullfs not combineab o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130210 fs [nullfs] Error by check nullfs f kern/130133 fs [panic] [zfs] 'kmem_map too small' caused by make clea o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/127787 fs [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs f kern/127375 fs [zfs] If vm.kmem_size_max>"1073741823" then write spee o bin/127270 fs fsck_msdosfs(8) may crash if BytesPerSec is zero o kern/127029 fs [panic] mount(8): trying to mount a write protected zi f kern/126703 fs [panic] [zfs] _mtx_lock_sleep: recursed on non-recursi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file o kern/125895 fs [ffs] [panic] kernel: panic: ffs_blkfree: freeing free s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS o kern/123939 fs [msdosfs] corrupts new files f sparc/123566 fs [zfs] zpool import issue: EOVERFLOW o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied o bin/121366 fs [zfs] [patch] Automatic disk scrubbing from periodic(8 o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F f kern/120210 fs [zfs] [panic] reboot after panic: solaris assert: arc_ o kern/118912 fs [2tb] disk sizing/geometry problem with large array o kern/118713 fs [minidump] [patch] Display media size required for a k o bin/118249 fs [ufs] mv(1): moving a directory changes its mtime o kern/118126 fs [nfs] [patch] Poor NFS server write performance o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o kern/117954 fs [ufs] dirhash on very large directories blocks the mac o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117314 fs [ntfs] Long-filename only NTFS fs'es cause kernel pani o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with o kern/116583 fs [ffs] [hang] System freezes for short time when using o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs [msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o kern/109024 fs [msdosfs] [iconv] mount_msdosfs: msdosfs_iconv: Operat o kern/109010 fs [msdosfs] can't mv directory within fat32 file system o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes s bin/97498 fs [request] newfs(8) has no option to clear the first 12 o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean' o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/88657 fs [smbfs] windows client hang when browsing a samba shar o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64 o kern/88266 fs [smbfs] smbfs does not implement UIO_NOCOPY and sendfi o bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to enabl o kern/87859 fs [smbfs] System reboot while umount smbfs. o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc. o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o bin/74779 fs Background-fsck checks one filesystem twice and omits o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/51583 fs [nullfs] [patch] allow to work with devices and socket o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o kern/33464 fs [ufs] soft update inconsistencies after system crash o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t 246 problems total. From owner-freebsd-fs@FreeBSD.ORG Mon Aug 29 17:32:48 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 347981065670; Mon, 29 Aug 2011 17:32:48 +0000 (UTC) (envelope-from lev@FreeBSD.org) Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru [IPv6:2a01:4f8:131:60a2::2]) by mx1.freebsd.org (Postfix) with ESMTP id C1F278FC0A; Mon, 29 Aug 2011 17:32:47 +0000 (UTC) Received: from lion.home.serebryakov.spb.ru (unknown [IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c]) (Authenticated sender: lev@serebryakov.spb.ru) by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id E6A994AC31; Mon, 29 Aug 2011 21:32:45 +0400 (MSD) Date: Mon, 29 Aug 2011 21:32:44 +0400 From: Lev Serebryakov Organization: FreeBSD X-Priority: 3 (Normal) Message-ID: <1742839983.20110829213244@serebryakov.spb.ru> To: Ivan Voras In-Reply-To: References: <1963980291.20110826232758@serebryakov.spb.ru> <201108262052.p7QKqpen039191@chez.mckusick.com> <758608837.20110827112116@serebryakov.spb.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=windows-1251 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: Strange behaviour of UFS2+SU FS on FreeBSD 8-Stable: dreadful perofrmance for old data, excellent for new. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: lev@FreeBSD.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Aug 2011 17:32:48 -0000 Hello, Ivan. You wrote 27 =E0=E2=E3=F3=F1=F2=E0 2011 =E3., 21:02:44: >> I'm going to investigate alter, why it is ony ~180MiB/s, when >> theoretically it should be about (90*4) 360MiB/s linear read, and whom >> to blame: UFS or geom_raid5 or both :) > Try this: http://ivoras.net/blog/tree/2010-11-19.ufs-read-ahead.html > (or it could be a hardware issue - controller bottleneck or something > like that). It is more strange and more complex, than simple "180MiB/s" read. I have: (1) software RAID5 (geom_raid5) on 5xWD Green HDDs (yes, I know, that seek is not very fast on these disks, but I'm discussung only linear access now). Stripe size is 128KiB. Theoretical maximum performance is about 4*90 =3D 360MiB/s. (2) FS with 32K blocks (unfortunately, here is (WAS?) old bug, when system lock up when here are 16KiB/s and 64KiB/-sized FSes in one syste= m). (3) vfs.read_max=3D32, it means 32*32 =3D 1024KiB =3D 8 RAID stripes. Enough for parallel requests. And, in such conditions, good placed (not legacy ones, which are very fragmented, as were written on almost full FS) large (more than 1GiB) files fives from 120MiB/s up to 350MiB/s. Some files tend to read more fast, some not so fast, but it seems that speed could vary for one file from run to run (yes, I clean memory cache by reading big files between "benhc" euns). And, yes, 350MiB/s is not typical. 120-180MiB/s encounters much, much often than larger speeds. Do you have any ideas, how to debug this situation and make sure, that geom_raid5 does it best and is not bottleneck? Maybe some other UFS2 tunings or diagnostics? I've tried such configuration with software (ICH9R, which is not hardware implementation for sure) RAID5 on Windows and it was much more consistent (and almost always shows speed near theoretical maximum). Another question is how to measure and diagnose writing... --=20 // Black Lion AKA Lev Serebryakov From owner-freebsd-fs@FreeBSD.ORG Mon Aug 29 19:06:15 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C28C31065676 for ; Mon, 29 Aug 2011 19:06:15 +0000 (UTC) (envelope-from hans@beastielabs.net) Received: from mail.beastielabs.net (beasties.demon.nl [82.161.3.114]) by mx1.freebsd.org (Postfix) with ESMTP id 172328FC16 for ; Mon, 29 Aug 2011 19:06:14 +0000 (UTC) Received: from testsoekris.hotsoft.nl (localhost [127.0.0.1]) by mail.beastielabs.net (8.14.4/8.14.4) with ESMTP id p7TIUwJc058058; Mon, 29 Aug 2011 20:30:58 +0200 (CEST) (envelope-from hans@testsoekris.hotsoft.nl) Received: (from hans@localhost) by testsoekris.hotsoft.nl (8.14.4/8.14.4/Submit) id p7TIUwLx058057; Mon, 29 Aug 2011 20:30:58 +0200 (CEST) (envelope-from hans) Date: Mon, 29 Aug 2011 20:30:58 +0200 From: Hans Ottevanger To: freebsd-current@freebsd.org Message-ID: <20110829183058.GA57564@testsoekris.hotsoft.nl> References: <4E4F71B5.3010606@barafranca.com> <20110821100426.GA28260@testsoekris.hotsoft.nl> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110821100426.GA28260@testsoekris.hotsoft.nl> User-Agent: Mutt/1.4.2.3i Cc: freebsd-fs@freebsd.org Subject: Snapshots fail with UFS+J (was: Re: Fwd: Re: Can *you* UFS snapshot a filesystem with 9.0-BETA1?) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Aug 2011 19:06:15 -0000 On Sun, Aug 21, 2011 at 12:04:26PM +0200, Hans Ottevanger wrote: > On Sat, Aug 20, 2011 at 09:35:01AM +0100, Hugo Silva wrote: > > > > > > Le Thu, 18 Aug 2011 10:22:31 +0100, > > Hugo Silva a ?crit : > > > > Hello, > > > > > I'm wondering. On a virtual machine (amd64 HVM+PV), it's crashing > > > every time. Not sure if this is SNAFU, as I had never used ufs > > > snapshots on freebsd before. > > > > > > After running mksnap_ffs, ssh stops working (a telnet session doesn't > > > show the sshd banner). The ssh session where the command was run from > > > stops responding, the webserver dies and xm console'ing from the dom0 > > > works, but the VM is unresponsive (ie no login prompt on ENTER). > > > > > > Anyone else seeing the same? > > > > I've tried in a FreeBSD guest (9.0-beta1/i386) into VirtualBox and > > I see a LOR (or looks like a LOR), then the system is freezed. > > This is 100% reproductible. > > > > Unfortunatly, I'm not able to dump a panic or to break into the > > debugger, so a screenshot : > > http://user.lamaiziere.net/patrick/public/lormksnap.png > > > > You should ask on freebsd-current@ > > > > Hi, > > I can confirm that this happens on "real iron" too. > > I use an i386 test installation (P4 2.4 GHz, 2GB RAM, 500GB PATA disk), > running 9.0-BETA1 as distributed (with a kernel effectively being GENERIC > with devices removed that I don't have). When I try to make a snapshot > using > > cd /usr; mksnap_ffs /usr/.snap/testsnap > > the system is still responsive for a few seconds, with lots of disk > activity, but then it prints the following output on the console (using > firewire and dcons to ease capturing): > > lock order reversal: > 1st 0xc5a289e8 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:425 > 2nd 0xdeb3c078 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2658 > 3rd 0xc5663af8 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:546 > KDB: stack backtrace: > db_trace_self_wrapper(c09ec6ba,616e735f,6f687370,3a632e74,a363435,...) at db_trace_self_wrapper+0x26 > kdb_backtrace(c07099eb,c09efe14,c5035308,c5039408,c4fda440,...) at kdb_backtrace+0x2a > _witness_debugger(c09efe14,c5663af8,c09df984,c5039408,c0a10ba2,...) at _witness_debugger+0x25 > witness_checkorder(c5663af8,9,c0a10ba2,222,0,...) at witness_checkorder+0x839 > __lockmgr_args(c5663af8,80100,c5663b18,0,0,...) at __lockmgr_args+0x804 > ffs_lock(c4fda568,c0bf1250,c59b9c30,80100,c5663aa0,...) at ffs_lock+0x8a > VOP_LOCK1_APV(c0a7fb80,c4fda568,c4fda588,c0a8df20,c5663aa0,...) at VOP_LOCK1_APV+0xb5 > _vn_lock(c5663aa0,80100,c0a10ba2,222,c5011e80,...) at _vn_lock+0x5e > ffs_snapshot(c54f9798,c52dda60,c0a13fb0,1a2,0,...) at ffs_snapshot+0x14cb > ffs_mount(c54f9798,c59b0300,ff,394,3,...) at ffs_mount+0x1c13 > vfs_donmount(c59b9b80,11100,c50c7c80,c50c7c80,c59ae580,...) at vfs_donmount+0x11e7 > nmount(c59b9b80,c4fdacec,c4fdad28,c09ee6dd,0,...) at nmount+0x84 > syscallenter(c59b9b80,c4fdace4,c4fdace4,0,c0ab5690,...) at syscallenter+0x263 > syscall(c4fdad28) at syscall+0x34 > Xint0x80_syscall() at Xint0x80_syscall+0x21 > --- syscall (378, FreeBSD ELF32, nmount), eip = 0x280db52b, esp = 0xbfbfe59c, ebp = 0xbfbfed18 --- > lock order reversal: > 1st 0xdeb3c078 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2658 > 2nd 0xc51a72dc snaplk (snaplk) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:818 > KDB: stack backtrace: > db_trace_self_wrapper(c09ec6ba,662f7366,735f7366,7370616e,2e746f68,...) at db_trace_self_wrapper+0x26 > kdb_backtrace(c07099eb,c09efdfb,c5035308,c5039b58,c4fda440,...) at kdb_backtrace+0x2a > _witness_debugger(c09efdfb,c51a72dc,c0a10c04,c5039b58,c0a10ba2,...) at _witness_debugger+0x25 > witness_checkorder(c51a72dc,9,c0a10ba2,332,c5a28a08,...) at witness_checkorder+0x839 > __lockmgr_args(c51a72dc,80400,c5a28a08,0,0,...) at __lockmgr_args+0x804 > ffs_lock(c4fda568,deb2434c,100000,80400,c5a28990,...) at ffs_lock+0x8a > VOP_LOCK1_APV(c0a7fb80,c4fda568,deb243a8,c0a8df20,c5a28990,...) at VOP_LOCK1_APV+0xb5 > _vn_lock(c5a28990,80400,c0a10ba2,332,0,...) at _vn_lock+0x5e > ffs_snapshot(c54f9798,c52dda60,c0a13fb0,1a2,0,...) at ffs_snapshot+0x295e > ffs_mount(c54f9798,c59b0300,ff,394,3,...) at ffs_mount+0x1c13 > vfs_donmount(c59b9b80,11100,c50c7c80,c50c7c80,c59ae580,...) at vfs_donmount+0x11e7 > nmount(c59b9b80,c4fdacec,c4fdad28,c09ee6dd,0,...) at nmount+0x84 > syscallenter(c59b9b80,c4fdace4,c4fdace4,0,c0ab5690,...) at syscallenter+0x263 > syscall(c4fdad28) at syscall+0x34 > Xint0x80_syscall() at Xint0x80_syscall+0x21 > --- syscall (378, FreeBSD ELF32, nmount), eip = 0x280db52b, esp = 0xbfbfe59c, ebp = 0xbfbfed18 --- > > After this the system is fully unresponsive and requires a hard reset. > > Once rebooted, the snapshot file appears to exist, but is unusable. > > When reverting to just softupdates, i.e. disabling journaling on /usr, > everything goes well, except that the same LOR's still do occur, though > the addresses differ. > > My amd64 9.0-CURRENT system, just updated to r225055, has the same issue, > but since I do not have WITNESS in the kernel config there, the console > output is missing. > > BTW, this issue also makes dump(9) hang the system when the -L option > is used. > > Kind regards, > > Hans Ottevanger > _______________________________________________ > freebsd-current@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-current > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" Since I did not see any response to these messages and I cannot imagine that Hugo and I are the only ones with this issue, I will follow up to my own post. I have tried just yesterday to make a snapshot of the /usr filesystem (about 16 GB) of my amd64 test system (Q6600, 8GB RAM, 500GB SATA disk) running 9.0-BETA1 (r225228) and the problem still occurs. After these LOR's: lock order reversal: 1st 0xfffffe00073ab278 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:425 2nd 0xffffff81eb243498 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2658 3rd 0xfffffe00073629f8 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:546 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a kdb_backtrace() at kdb_backtrace+0x37 _witness_debugger() at _witness_debugger+0x2e witness_checkorder() at witness_checkorder+0x807 __lockmgr_args() at __lockmgr_args+0xdc6 ffs_lock() at ffs_lock+0x8c VOP_LOCK1_APV() at VOP_LOCK1_APV+0x9b _vn_lock() at _vn_lock+0x47 ffs_snapshot() at ffs_snapshot+0x1c27 ffs_mount() at ffs_mount+0xa23 vfs_donmount() at vfs_donmount+0xddc nmount() at nmount+0x63 syscallenter() at syscallenter+0x1aa syscall() at syscall+0x4c Xfast_syscall() at Xfast_syscall+0xdd --- syscall (378, FreeBSD ELF64, nmount), rip = 0x8008a118c, rsp = 0x7fffffffd428, rbp = 0x7fffffffde4b --- lock order reversal: 1st 0xffffff81eb243498 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2658 2nd 0xfffffe0007404a30 snaplk (snaplk) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:818 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a kdb_backtrace() at kdb_backtrace+0x37 _witness_debugger() at _witness_debugger+0x2e witness_checkorder() at witness_checkorder+0x807 __lockmgr_args() at __lockmgr_args+0xdc6 ffs_lock() at ffs_lock+0x8c VOP_LOCK1_APV() at VOP_LOCK1_APV+0x9b _vn_lock() at _vn_lock+0x47 ffs_snapshot() at ffs_snapshot+0x1b02 ffs_mount() at ffs_mount+0xa23 vfs_donmount() at vfs_donmount+0xddc nmount() at nmount+0x63 syscallenter() at syscallenter+0x1aa syscall() at syscall+0x4c Xfast_syscall() at Xfast_syscall+0xdd --- syscall (378, FreeBSD ELF64, nmount), rip = 0x8008a118c, rsp = 0x7fffffffd428, rbp = 0x7fffffffde4b --- the system is completely unresponsive after a few seconds and can only be revived by pushing the reset button. When making a snapshot of a larger filesystem it takes a bit longer, but the system will finally lock up. Mark that this is not the usual extreme slowdown due to the snapshot taking all the disk bandwidth: the system locks up tightly and does not recover. Is anybody else seeing this? Is it a known problem? How to proceed? Copied to freebsd-fs@ to elicit more response. Kind regards, Hans Ottevanger From owner-freebsd-fs@FreeBSD.ORG Mon Aug 29 19:38:53 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B940F106566C for ; Mon, 29 Aug 2011 19:38:53 +0000 (UTC) (envelope-from luke@digital-crocus.com) Received: from mail.digital-crocus.com (node2.digital-crocus.com [91.209.244.128]) by mx1.freebsd.org (Postfix) with ESMTP id 7626E8FC08 for ; Mon, 29 Aug 2011 19:38:52 +0000 (UTC) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=dkselector; d=hybrid-logic.co.uk; h=Received:Received:Subject:From:Reply-To:To:Cc:Content-Type:Organization:Date:Message-ID:Mime-Version:X-Mailer:Content-Transfer-Encoding:X-Spam-Score:X-Digital-Crocus-Maillimit:X-Authenticated-Sender:X-Complaints:X-Admin:X-Abuse; b=eJOA8OnBZEpzw45/VhNZ6yvAJAFntsbLQkQMUnjKnXKttuuTu9tLzqTWHdcD8kQqzDlbmCfiipk0juQfnxAuidYBKS3c9AqQrB+dQQzoHW37IivCmQh6d1U0ruWy3EwT; Received: from luke by mail.digital-crocus.com with local (Exim 4.69 (FreeBSD)) (envelope-from ) id 1Qy7ei-000Hh1-Kb for freebsd-fs@freebsd.org; Mon, 29 Aug 2011 20:38:00 +0100 Received: from 127cr.net ([78.105.122.99] helo=[192.168.1.23]) by mail.digital-crocus.com with esmtpa (Exim 4.69 (FreeBSD)) (envelope-from ) id 1Qy7ei-000Hgf-7y; Mon, 29 Aug 2011 20:38:00 +0100 From: Luke Marsden To: freebsd-fs@freebsd.org Content-Type: text/plain; charset="UTF-8" Organization: Hybrid Web Cluster Date: Mon, 29 Aug 2011 20:38:48 +0100 Message-ID: <1314646728.7898.44.camel@pow> Mime-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit X-Spam-Score: -1.0 X-Digital-Crocus-Maillimit: done X-Authenticated-Sender: luke X-Complaints: abuse@digital-crocus.com X-Admin: admin@digital-crocus.com X-Abuse: abuse@digital-crocus.com (Please include full headers in abuse reports) Cc: tech@hybrid-logic.co.uk Subject: ZFS hang in production on 8.2-RELEASE X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: luke@hybrid-logic.co.uk List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Aug 2011 19:38:53 -0000 Hi all, I've just noticed a "partial" ZFS deadlock in production on 8.2-RELEASE. FreeBSD XXX 8.2-RELEASE FreeBSD 8.2-RELEASE #0 r219081M: Wed Mar 2 08:29:52 CET 2011 root@www4:/usr/obj/usr/src/sys/GENERIC amd64 There are 9 'zfs rename' processes and 1 'zfs umount -f' processes hung. Here is the procstat for the 'zfs umount -f': 13451 104337 zfs - mi_switch+0x176 sleepq_wait+0x42 _sleep+0x317 zfsvfs_teardown+0x269 zfs_umount+0x1c4 dounmount+0x32a unmount+0x38b syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 And the 'zfs rename's all look the same: 20361 101049 zfs - mi_switch+0x176 sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV +0x46 _vn_lock+0x47 lookup+0x6e1 namei+0x53a kern_rmdirat+0xa4 syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 An 'ls' on a directory which contains most of the system's ZFS mount-points (/hcfs) also hangs: 30073 101466 gnuls - mi_switch+0x176 sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV +0x46 _vn_lock+0x47 zfs_root+0x85 lookup+0x9b8 namei+0x53a vn_open_cred +0x3ac kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall +0xe2 If I truss the 'ls' it hangs on the stat syscall: stat("/hcfs",{ mode=drwxr-xr-x ,inode=3,size=2012,blksize=16384 }) = 0 (0x0) There is also a 'find -s / ! ( -fstype zfs ) -prune -or -path /tmp -prune -or -path /usr/tmp -prune -or -path /var/tmp -prune -or -path /var/db/portsnap -prune -or -print' running which is also hung: 2650 101674 find - mi_switch+0x176 sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV +0x46 _vn_lock+0x47 zfs_root+0x85 lookup+0x9b8 namei+0x53a vn_open_cred +0x3ac kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall +0xe2 However I/O to the presently mounted filesystems continues to work (even on parts of filesystems which are unlikely to be cached), and 'zfs list' showing all the filesystems (3,500 filesystems with ~100 snapshots per filesystem) also works. Any activity on the structure of the ZFS hierarchy *under the hcfs filesystem* crashes, such as a 'zfs create hpool/hcfs/test': 70868 101874 zfs - mi_switch+0x176 sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV +0x46 _vn_lock+0x47 lookup+0x6e1 namei+0x53a kern_mkdirat+0xce syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 BUT "zfs create hpool/system/opt/hello" (a ZFS filesystem in the same pool, but not rooted on hpool/hcfs) does not hang, and succeeds normally. procstat -kk on the zfskern process gives: PID TID COMM TDNAME KSTACK 5 100045 zfskern arc_reclaim_thre mi_switch+0x176 sleepq_timedwait+0x42 _cv_timedwait+0x134 arc_reclaim_thread+0x2a9 fork_exit+0x118 fork_trampoline+0xe 5 100046 zfskern l2arc_feed_threa mi_switch+0x176 sleepq_timedwait+0x42 _cv_timedwait+0x134 l2arc_feed_thread+0x1ce fork_exit+0x118 fork_trampoline+0xe 5 100098 zfskern txg_thread_enter mi_switch+0x176 sleepq_wait+0x42 _cv_wait+0x129 txg_thread_wait+0x79 txg_quiesce_thread +0xb5 fork_exit+0x118 fork_trampoline+0xe 5 100099 zfskern txg_thread_enter mi_switch+0x176 sleepq_timedwait+0x42 _cv_timedwait+0x134 txg_thread_wait+0x3c txg_sync_thread+0x365 fork_exit+0x118 fork_trampoline+0xe Any ideas on what might be causing this? Thank you for supporting ZFS on FreeBSD! -- Best Regards, Luke Marsden CTO, Hybrid Logic Ltd. Web: http://www.hybrid-cluster.com/ Hybrid Web Cluster - cloud web hosting From owner-freebsd-fs@FreeBSD.ORG Mon Aug 29 20:04:06 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 951C41065670 for ; Mon, 29 Aug 2011 20:04:06 +0000 (UTC) (envelope-from luke@digital-crocus.com) Received: from mail.digital-crocus.com (node2.digital-crocus.com [91.209.244.128]) by mx1.freebsd.org (Postfix) with ESMTP id 436DC8FC0A for ; Mon, 29 Aug 2011 20:04:05 +0000 (UTC) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=dkselector; d=hybrid-logic.co.uk; h=Received:Received:Subject:From:To:Cc:In-Reply-To:References:Content-Type:Organization:Date:Message-ID:Mime-Version:X-Mailer:Content-Transfer-Encoding:X-Spam-Score:X-Digital-Crocus-Maillimit:X-Authenticated-Sender:X-Complaints:X-Admin:X-Abuse; b=oLQW1BRvxgQB3q6snaqh2w69KcdbVZJxrppJiWsvSwfPJs/s45W0AWEKIjTmdRU9H6Bveshmccs85WQj4Peos2IW8N/fD27+Fh0yzfg7wwV8a5Snrlnl15DBbb6F+/Ot; Received: from luke by mail.digital-crocus.com with local (Exim 4.69 (FreeBSD)) (envelope-from ) id 1Qy838-000LL6-0k for freebsd-fs@freebsd.org; Mon, 29 Aug 2011 21:03:14 +0100 Received: from 127cr.net ([78.105.122.99] helo=[192.168.1.23]) by mail.digital-crocus.com with esmtpa (Exim 4.69 (FreeBSD)) (envelope-from ) id 1Qy837-000LKu-Ko; Mon, 29 Aug 2011 21:03:13 +0100 From: Luke Marsden To: freebsd-fs@freebsd.org In-Reply-To: References: <1314646728.7898.44.camel@pow> Content-Type: text/plain; charset="UTF-8" Organization: Hybrid Logic Date: Mon, 29 Aug 2011 21:04:01 +0100 Message-ID: <1314648241.7898.51.camel@pow> Mime-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit X-Spam-Score: -1.0 X-Digital-Crocus-Maillimit: done X-Authenticated-Sender: luke X-Complaints: abuse@digital-crocus.com X-Admin: admin@digital-crocus.com X-Abuse: abuse@digital-crocus.com (Please include full headers in abuse reports) Cc: tech@hybrid-logic.co.uk Subject: Re: ZFS hang in production on 8.2-RELEASE X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Aug 2011 20:04:06 -0000 > On Mon, Aug 29, 2011 at 12:38 PM, Luke Marsden > wrote: > > Hi all, > > > > I've just noticed a "partial" ZFS deadlock in production on 8.2-RELEASE. > > > > FreeBSD XXX 8.2-RELEASE FreeBSD 8.2-RELEASE #0 r219081M: Wed Mar 2 > > 08:29:52 CET 2011 root@www4:/usr/obj/usr/src/sys/GENERIC amd64 > > > > There are 9 'zfs rename' processes and 1 'zfs umount -f' processes hung. > > Here is the procstat for the 'zfs umount -f': > > > > 13451 104337 zfs - mi_switch+0x176 > > sleepq_wait+0x42 _sleep+0x317 zfsvfs_teardown+0x269 zfs_umount+0x1c4 > > dounmount+0x32a unmount+0x38b syscallenter+0x1e5 syscall+0x4b > > Xfast_syscall+0xe2 > > > > And the 'zfs rename's all look the same: > > > > 20361 101049 zfs - mi_switch+0x176 > > sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV > > +0x46 _vn_lock+0x47 lookup+0x6e1 namei+0x53a kern_rmdirat+0xa4 > > syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 > > > > An 'ls' on a directory which contains most of the system's ZFS > > mount-points (/hcfs) also hangs: > > > > 30073 101466 gnuls - mi_switch+0x176 > > sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV > > +0x46 _vn_lock+0x47 zfs_root+0x85 lookup+0x9b8 namei+0x53a vn_open_cred > > +0x3ac kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall > > +0xe2 > > > > If I truss the 'ls' it hangs on the stat syscall: > > stat("/hcfs",{ mode=drwxr-xr-x ,inode=3,size=2012,blksize=16384 }) = 0 > > (0x0) > > > > There is also a 'find -s / ! ( -fstype zfs ) -prune -or -path /tmp > > -prune -or -path /usr/tmp -prune -or -path /var/tmp -prune -or > > -path /var/db/portsnap -prune -or -print' running which is also hung: > > > > 2650 101674 find - mi_switch+0x176 > > sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV > > +0x46 _vn_lock+0x47 zfs_root+0x85 lookup+0x9b8 namei+0x53a vn_open_cred > > +0x3ac kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall > > +0xe2 > > > > However I/O to the presently mounted filesystems continues to work (even > > on parts of filesystems which are unlikely to be cached), and 'zfs list' > > showing all the filesystems (3,500 filesystems with ~100 snapshots per > > filesystem) also works. > > > > Any activity on the structure of the ZFS hierarchy *under the hcfs > > filesystem* crashes, such as a 'zfs create hpool/hcfs/test': > > > > 70868 101874 zfs - mi_switch+0x176 > > sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV > > +0x46 _vn_lock+0x47 lookup+0x6e1 namei+0x53a kern_mkdirat+0xce > > syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 > > > > BUT "zfs create hpool/system/opt/hello" (a ZFS filesystem in the same > > pool, but not rooted on hpool/hcfs) does not hang, and succeeds > > normally. > > > > procstat -kk on the zfskern process gives: > > > > PID TID COMM TDNAME > > KSTACK > > 5 100045 zfskern arc_reclaim_thre mi_switch+0x176 > > sleepq_timedwait+0x42 _cv_timedwait+0x134 arc_reclaim_thread+0x2a9 > > fork_exit+0x118 fork_trampoline+0xe > > 5 100046 zfskern l2arc_feed_threa mi_switch+0x176 > > sleepq_timedwait+0x42 _cv_timedwait+0x134 l2arc_feed_thread+0x1ce > > fork_exit+0x118 fork_trampoline+0xe > > 5 100098 zfskern txg_thread_enter mi_switch+0x176 > > sleepq_wait+0x42 _cv_wait+0x129 txg_thread_wait+0x79 txg_quiesce_thread > > +0xb5 fork_exit+0x118 fork_trampoline+0xe > > 5 100099 zfskern txg_thread_enter mi_switch+0x176 > > sleepq_timedwait+0x42 _cv_timedwait+0x134 txg_thread_wait+0x3c > > txg_sync_thread+0x365 fork_exit+0x118 fork_trampoline+0xe > > > > Any ideas on what might be causing this? > > It sounds like the bug Martin Matuska has recently fixed in FreeBSD > and reported upstream to Illumos: > https://www.illumos.org/issues/1313 > > The fix has been MFC'ed to 8-STABLE r224647 on Aug 4th. Thank you for such a quick response, but I'm not sure it's the right solution. The uptime on this server is 25 days, which is less than 28. Also, I would expect the solution you described to cause issues globally for the zpool, but the hangs only happens localised to one filesystem. Also the bug report refers to slowing down ZFS writes, but this is a hang (as if caused by a lock that ought to have been freed) on reads. hybrid@ns382210:~$ sudo sysctl -a|grep tick kern.clockrate: { hz = 1000, tick = 1000, profhz = 2000, stathz = 133 } kern.timecounter.tick: 1 debug.tickdelay: 2 By the way, the server has 16GB of RAM with 5.8GB free, so I don't suspect memory pressure causing the issue. The zpool is 3.19T striped over two disks. -- Best Regards, Luke Marsden CTO, Hybrid Logic Ltd. Web: http://www.hybrid-cluster.com/ Hybrid Web Cluster - cloud web hosting Mobile: +1-415-449-1165 (US) / +447791750420 (UK) From owner-freebsd-fs@FreeBSD.ORG Mon Aug 29 20:17:17 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 48D08106566C for ; Mon, 29 Aug 2011 20:17:17 +0000 (UTC) (envelope-from geo.liaskos@gmail.com) Received: from mail-qw0-f54.google.com (mail-qw0-f54.google.com [209.85.216.54]) by mx1.freebsd.org (Postfix) with ESMTP id 0D2308FC08 for ; Mon, 29 Aug 2011 20:17:16 +0000 (UTC) Received: by qwc9 with SMTP id 9so4396676qwc.13 for ; Mon, 29 Aug 2011 13:17:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:date:message-id:subject:from:to:content-type; bh=gL7XjojeDOJDxQwS1rIseqH57jOLDsOh6415zLCg3ZY=; b=sjowSAYN5tOFY+LFet/vxV+KVoy3Fb3m8zTKSH3pYYhK043T51ZhRCeYB5OdzniM80 mROtk38iPewXcRjmvTRZf1GkaTzOZzCCD+ch18J5uMkH7qUuYvKDBwDffb+BOHXOBP99 fcSX0cwHyuduUbaWtNEY2U+fxdEu1YviHfEsE= MIME-Version: 1.0 Received: by 10.229.64.80 with SMTP id d16mr6060950qci.169.1314647409710; Mon, 29 Aug 2011 12:50:09 -0700 (PDT) Received: by 10.229.89.138 with HTTP; Mon, 29 Aug 2011 12:50:09 -0700 (PDT) Date: Mon, 29 Aug 2011 22:50:09 +0300 Message-ID: From: George Liaskos To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=UTF-8 Subject: NFSv4: After upgrade to 9 users can no longer list files. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Aug 2011 20:17:17 -0000 Hello, I upgraded my home server the past weekend from 8.2-STABLE to 9, after the upgrade users can no longer list the files / directories of a mount from a client machine. I am using nfsv4 exports for almost a year now, never had an issue, i did not change the configuration during / after the upgrade. My kernel config was using NFSD already. Some server config info; i am exporting ZFS file systems: [/etc/exports] V4: /usr/local/data -sec=sys -network 192.168.0.0/24 /usr/local/data/downloads -network 192.168.0.0/24 -maproot=root /usr/local/data/software -network 192.168.0.0/24 -maproot=root [/etc/rc.conf] rpcbind_enable="YES" nfs_server_enable="YES" nfsv4_server_enable="YES" nfsuserd_enable="YES" mountd_flags="-r -l" mountd_enable="YES" I am able to mount from the clients, root can list everything but other users can't either from console or from a file browser. I can still blindly cat / touch files, everything works except list. The same goes with local mounts on the server. Thank you in advance for your help. Regards, George From owner-freebsd-fs@FreeBSD.ORG Mon Aug 29 20:25:43 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 48B381065673 for ; Mon, 29 Aug 2011 20:25:43 +0000 (UTC) (envelope-from artemb@gmail.com) Received: from mail-gy0-f182.google.com (mail-gy0-f182.google.com [209.85.160.182]) by mx1.freebsd.org (Postfix) with ESMTP id 0B6878FC0C for ; Mon, 29 Aug 2011 20:25:42 +0000 (UTC) Received: by gyd10 with SMTP id 10so6147247gyd.13 for ; Mon, 29 Aug 2011 13:25:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=xY8wnhmw5VA5uBNBWZo4kd8BrqQdiyWP5JlnV3ndPRk=; b=lEvXLEclnSZXsJ820rIK76gZ+WBdd6Fx1v4nfC4e9QvhiVSS3LmYnfsQ1Z6PphnlCu dfIong+gk28X45yV7gbA1EbWk/6jSQrDYB1BqMAKpDIfjk3ssfnA9yx+To5NTQerER3F iVnMTsqPXzwNxY4QIFn3Wp92e6MI5MlxwQBng= MIME-Version: 1.0 Received: by 10.236.173.131 with SMTP id v3mr27597149yhl.112.1314647726058; Mon, 29 Aug 2011 12:55:26 -0700 (PDT) Sender: artemb@gmail.com Received: by 10.236.102.147 with HTTP; Mon, 29 Aug 2011 12:55:26 -0700 (PDT) In-Reply-To: <1314646728.7898.44.camel@pow> References: <1314646728.7898.44.camel@pow> Date: Mon, 29 Aug 2011 12:55:26 -0700 X-Google-Sender-Auth: NRZVv_XlzhZS7ZFKKebPwgBY6o0 Message-ID: From: Artem Belevich To: luke@hybrid-logic.co.uk Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, tech@hybrid-logic.co.uk Subject: Re: ZFS hang in production on 8.2-RELEASE X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Aug 2011 20:25:43 -0000 On Mon, Aug 29, 2011 at 12:38 PM, Luke Marsden wrote: > Hi all, > > I've just noticed a "partial" ZFS deadlock in production on 8.2-RELEASE. > > FreeBSD XXX 8.2-RELEASE FreeBSD 8.2-RELEASE #0 r219081M: Wed Mar =A02 > 08:29:52 CET 2011 =A0 =A0 root@www4:/usr/obj/usr/src/sys/GENERIC =A0amd64 > > There are 9 'zfs rename' processes and 1 'zfs umount -f' processes hung. > Here is the procstat for the 'zfs umount -f': > > 13451 104337 zfs =A0 =A0 =A0 =A0 =A0 =A0 =A0- =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0mi_switch+0x176 > sleepq_wait+0x42 _sleep+0x317 zfsvfs_teardown+0x269 zfs_umount+0x1c4 > dounmount+0x32a unmount+0x38b syscallenter+0x1e5 syscall+0x4b > Xfast_syscall+0xe2 > > And the 'zfs rename's all look the same: > > 20361 101049 zfs =A0 =A0 =A0 =A0 =A0 =A0 =A0- =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0mi_switch+0x176 > sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV > +0x46 _vn_lock+0x47 lookup+0x6e1 namei+0x53a kern_rmdirat+0xa4 > syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 > > An 'ls' on a directory which contains most of the system's ZFS > mount-points (/hcfs) also hangs: > > 30073 101466 gnuls =A0 =A0 =A0 =A0 =A0 =A0- =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0mi_switch+0x176 > sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV > +0x46 _vn_lock+0x47 zfs_root+0x85 lookup+0x9b8 namei+0x53a vn_open_cred > +0x3ac kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall > +0xe2 > > If I truss the 'ls' it hangs on the stat syscall: > stat("/hcfs",{ mode=3Ddrwxr-xr-x ,inode=3D3,size=3D2012,blksize=3D16384 }= ) =3D 0 > (0x0) > > There is also a 'find -s / ! ( -fstype zfs ) -prune -or -path /tmp > -prune -or -path /usr/tmp -prune -or -path /var/tmp -prune -or > -path /var/db/portsnap -prune -or -print' running which is also hung: > > =A02650 101674 find =A0 =A0 =A0 =A0 =A0 =A0 - =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0mi_switch+0x176 > sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV > +0x46 _vn_lock+0x47 zfs_root+0x85 lookup+0x9b8 namei+0x53a vn_open_cred > +0x3ac kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall > +0xe2 > > However I/O to the presently mounted filesystems continues to work (even > on parts of filesystems which are unlikely to be cached), and 'zfs list' > showing all the filesystems (3,500 filesystems with ~100 snapshots per > filesystem) also works. > > Any activity on the structure of the ZFS hierarchy *under the hcfs > filesystem* crashes, such as a 'zfs create hpool/hcfs/test': > > 70868 101874 zfs =A0 =A0 =A0 =A0 =A0 =A0 =A0- =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0mi_switch+0x176 > sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV > +0x46 _vn_lock+0x47 lookup+0x6e1 namei+0x53a kern_mkdirat+0xce > syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 > > BUT "zfs create hpool/system/opt/hello" (a ZFS filesystem in the same > pool, but not rooted on hpool/hcfs) does not hang, and succeeds > normally. > > procstat -kk on the zfskern process gives: > > =A0PID =A0 =A0TID COMM =A0 =A0 =A0 =A0 =A0 =A0 TDNAME > KSTACK > =A0 =A05 100045 zfskern =A0 =A0 =A0 =A0 =A0arc_reclaim_thre mi_switch+0x1= 76 > sleepq_timedwait+0x42 _cv_timedwait+0x134 arc_reclaim_thread+0x2a9 > fork_exit+0x118 fork_trampoline+0xe > =A0 =A05 100046 zfskern =A0 =A0 =A0 =A0 =A0l2arc_feed_threa mi_switch+0x1= 76 > sleepq_timedwait+0x42 _cv_timedwait+0x134 l2arc_feed_thread+0x1ce > fork_exit+0x118 fork_trampoline+0xe > =A0 =A05 100098 zfskern =A0 =A0 =A0 =A0 =A0txg_thread_enter mi_switch+0x1= 76 > sleepq_wait+0x42 _cv_wait+0x129 txg_thread_wait+0x79 txg_quiesce_thread > +0xb5 fork_exit+0x118 fork_trampoline+0xe > =A0 =A05 100099 zfskern =A0 =A0 =A0 =A0 =A0txg_thread_enter mi_switch+0x1= 76 > sleepq_timedwait+0x42 _cv_timedwait+0x134 txg_thread_wait+0x3c > txg_sync_thread+0x365 fork_exit+0x118 fork_trampoline+0xe > > Any ideas on what might be causing this? It sounds like the bug Martin Matuska has recently fixed in FreeBSD and reported upstream to Illumos: https://www.illumos.org/issues/1313 The fix has been MFC'ed to 8-STABLE r224647 on Aug 4th. --Artem > > Thank you for supporting ZFS on FreeBSD! > > -- > Best Regards, > Luke Marsden > CTO, Hybrid Logic Ltd. > > Web: http://www.hybrid-cluster.com/ > Hybrid Web Cluster - cloud web hosting > > > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@FreeBSD.ORG Mon Aug 29 20:28:25 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E8F411065678 for ; Mon, 29 Aug 2011 20:28:25 +0000 (UTC) (envelope-from ee@athyriogames.com) Received: from madonna.sslcatacombnetworking.com (madonna.sslcatacombnetworking.com [174.133.19.130]) by mx1.freebsd.org (Postfix) with ESMTP id BA4318FC18 for ; Mon, 29 Aug 2011 20:28:25 +0000 (UTC) Received: from c-98-206-215-156.hsd1.in.comcast.net ([98.206.215.156] helo=laptopv) by madonna.sslcatacombnetworking.com with esmtpa (Exim 4.69) (envelope-from ) id 1Qy797-0000Yz-A0 for freebsd-fs@freebsd.org; Mon, 29 Aug 2011 14:05:21 -0500 From: "Engineering" To: Date: Mon, 29 Aug 2011 14:15:08 -0500 Message-ID: <01c801cc667f$f99eb7b0$ecdc2710$@com> MIME-Version: 1.0 X-Mailer: Microsoft Office Outlook 12.0 Thread-Index: Acxmf/bkxg/WNgMwTaWJDIhaLpbyEA== Content-Language: en-us X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - madonna.sslcatacombnetworking.com X-AntiAbuse: Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - athyriogames.com Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Read-only disk problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Aug 2011 20:28:26 -0000 Hello all. Please let me know if this is the wrong place to ask. I am working on an embedded system using FreeBSD 7.2, bootinf and running off of flash memory. In order to not burn out the flash, I use the 'diskless' scripts and mount the flash read-only. I have used this configuration successfully in the past. I've recently added a utility to check for disk corruption, basically checksumming the / and /usr partitions. Since they are both read-only, I thought this would work. What I have discovered is that something in the partition is changing between boots. I dd'd the flash over a couple of boots, and compared the binaries to see what was changing. It is a small amount of data, spread across the disk, in an interval that looks very similar to the interval of the 'superblocks' Is there any data that is written to the disk at boot or mount time, and if so, is there a way to prevent it? Thanks Sam From owner-freebsd-fs@FreeBSD.ORG Mon Aug 29 20:54:10 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 777F61065673; Mon, 29 Aug 2011 20:54:10 +0000 (UTC) (envelope-from mm@FreeBSD.org) Received: from mail.vx.sk (mail.vx.sk [IPv6:2a01:4f8:100:1043::3]) by mx1.freebsd.org (Postfix) with ESMTP id EE04F8FC0A; Mon, 29 Aug 2011 20:54:09 +0000 (UTC) Received: from core.vx.sk (localhost [127.0.0.1]) by mail.vx.sk (Postfix) with ESMTP id 4D939190586; Mon, 29 Aug 2011 22:54:09 +0200 (CEST) X-Virus-Scanned: amavisd-new at mail.vx.sk Received: from mail.vx.sk ([127.0.0.1]) by core.vx.sk (mail.vx.sk [127.0.0.1]) (amavisd-new, port 10024) with LMTP id CqopX2gWjr6A; Mon, 29 Aug 2011 22:54:06 +0200 (CEST) Received: from [10.9.8.1] (188-167-78-15.dynamic.chello.sk [188.167.78.15]) by mail.vx.sk (Postfix) with ESMTPSA id 99B71190578; Mon, 29 Aug 2011 22:54:06 +0200 (CEST) Message-ID: <4E5BFC6F.5080507@FreeBSD.org> Date: Mon, 29 Aug 2011 22:54:07 +0200 From: Martin Matuska User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0) Gecko/20110812 Thunderbird/6.0 MIME-Version: 1.0 To: Artem Belevich References: <1314646728.7898.44.camel@pow> In-Reply-To: X-Enigmail-Version: 1.3.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, tech@hybrid-logic.co.uk, luke@hybrid-logic.co.uk Subject: Re: ZFS hang in production on 8.2-RELEASE X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Aug 2011 20:54:10 -0000 On 29. 8. 2011 21:55, Artem Belevich wrote: > On Mon, Aug 29, 2011 at 12:38 PM, Luke Marsden > wrote: >> Hi all, >> >> I've just noticed a "partial" ZFS deadlock in production on 8.2-RELEASE. >> >> FreeBSD XXX 8.2-RELEASE FreeBSD 8.2-RELEASE #0 r219081M: Wed Mar 2 >> 08:29:52 CET 2011 root@www4:/usr/obj/usr/src/sys/GENERIC amd64 >> >> There are 9 'zfs rename' processes and 1 'zfs umount -f' processes hung. >> Here is the procstat for the 'zfs umount -f': >> >> 13451 104337 zfs - mi_switch+0x176 >> sleepq_wait+0x42 _sleep+0x317 zfsvfs_teardown+0x269 zfs_umount+0x1c4 >> dounmount+0x32a unmount+0x38b syscallenter+0x1e5 syscall+0x4b >> Xfast_syscall+0xe2 >> >> And the 'zfs rename's all look the same: >> >> 20361 101049 zfs - mi_switch+0x176 >> sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV >> +0x46 _vn_lock+0x47 lookup+0x6e1 namei+0x53a kern_rmdirat+0xa4 >> syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 >> >> An 'ls' on a directory which contains most of the system's ZFS >> mount-points (/hcfs) also hangs: >> >> 30073 101466 gnuls - mi_switch+0x176 >> sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV >> +0x46 _vn_lock+0x47 zfs_root+0x85 lookup+0x9b8 namei+0x53a vn_open_cred >> +0x3ac kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall >> +0xe2 >> >> If I truss the 'ls' it hangs on the stat syscall: >> stat("/hcfs",{ mode=drwxr-xr-x ,inode=3,size=2012,blksize=16384 }) = 0 >> (0x0) >> >> There is also a 'find -s / ! ( -fstype zfs ) -prune -or -path /tmp >> -prune -or -path /usr/tmp -prune -or -path /var/tmp -prune -or >> -path /var/db/portsnap -prune -or -print' running which is also hung: >> >> 2650 101674 find - mi_switch+0x176 >> sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV >> +0x46 _vn_lock+0x47 zfs_root+0x85 lookup+0x9b8 namei+0x53a vn_open_cred >> +0x3ac kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall >> +0xe2 >> >> However I/O to the presently mounted filesystems continues to work (even >> on parts of filesystems which are unlikely to be cached), and 'zfs list' >> showing all the filesystems (3,500 filesystems with ~100 snapshots per >> filesystem) also works. >> >> Any activity on the structure of the ZFS hierarchy *under the hcfs >> filesystem* crashes, such as a 'zfs create hpool/hcfs/test': >> >> 70868 101874 zfs - mi_switch+0x176 >> sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV >> +0x46 _vn_lock+0x47 lookup+0x6e1 namei+0x53a kern_mkdirat+0xce >> syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 >> >> BUT "zfs create hpool/system/opt/hello" (a ZFS filesystem in the same >> pool, but not rooted on hpool/hcfs) does not hang, and succeeds >> normally. >> >> procstat -kk on the zfskern process gives: >> >> PID TID COMM TDNAME >> KSTACK >> 5 100045 zfskern arc_reclaim_thre mi_switch+0x176 >> sleepq_timedwait+0x42 _cv_timedwait+0x134 arc_reclaim_thread+0x2a9 >> fork_exit+0x118 fork_trampoline+0xe >> 5 100046 zfskern l2arc_feed_threa mi_switch+0x176 >> sleepq_timedwait+0x42 _cv_timedwait+0x134 l2arc_feed_thread+0x1ce >> fork_exit+0x118 fork_trampoline+0xe >> 5 100098 zfskern txg_thread_enter mi_switch+0x176 >> sleepq_wait+0x42 _cv_wait+0x129 txg_thread_wait+0x79 txg_quiesce_thread >> +0xb5 fork_exit+0x118 fork_trampoline+0xe >> 5 100099 zfskern txg_thread_enter mi_switch+0x176 >> sleepq_timedwait+0x42 _cv_timedwait+0x134 txg_thread_wait+0x3c >> txg_sync_thread+0x365 fork_exit+0x118 fork_trampoline+0xe >> >> Any ideas on what might be causing this? > It sounds like the bug Martin Matuska has recently fixed in FreeBSD > and reported upstream to Illumos: > https://www.illumos.org/issues/1313 > > The fix has been MFC'ed to 8-STABLE r224647 on Aug 4th. > > --Artem No, I think this is more likely fixed by pjd's bugfix in r224791 (MFC'ed to stable/8 as r225100). The corresponding patch is: http://people.freebsd.org/~pjd/patches/zfsdev_state_lock.patch -- Martin Matuska FreeBSD committer http://blog.vx.sk From owner-freebsd-fs@FreeBSD.ORG Mon Aug 29 22:02:34 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A54C0106564A for ; Mon, 29 Aug 2011 22:02:34 +0000 (UTC) (envelope-from luke@digital-crocus.com) Received: from mail.digital-crocus.com (node2.digital-crocus.com [91.209.244.128]) by mx1.freebsd.org (Postfix) with ESMTP id 5CA2E8FC16 for ; Mon, 29 Aug 2011 22:02:33 +0000 (UTC) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=dkselector; d=hybrid-logic.co.uk; h=Received:Received:Subject:From:To:Cc:In-Reply-To:References:Content-Type:Organization:Date:Message-ID:Mime-Version:X-Mailer:Content-Transfer-Encoding:X-Spam-Score:X-Digital-Crocus-Maillimit:X-Authenticated-Sender:X-Complaints:X-Admin:X-Abuse; b=IA1dEe16vsCBPNPFuyUogNAa+3uMzCWdxpHOxNfWzqA0rJYqr+xc0G5QnpKWV8ivCzcJIoOSCgpF2dbzP591ar2e9en6cefnedej11ABZ/j79fpXR3Vi1y5NcarLrSOH; Received: from luke by mail.digital-crocus.com with local (Exim 4.69 (FreeBSD)) (envelope-from ) id 1Qy9tm-000AMo-63 for freebsd-fs@freebsd.org; Mon, 29 Aug 2011 23:01:42 +0100 Received: from 127cr.net ([78.105.122.99] helo=[192.168.1.23]) by mail.digital-crocus.com with esmtpa (Exim 4.69 (FreeBSD)) (envelope-from ) id 1Qy9tl-000AMW-Oz; Mon, 29 Aug 2011 23:01:42 +0100 From: Luke Marsden To: Martin Matuska In-Reply-To: <4E5BFC6F.5080507@FreeBSD.org> References: <1314646728.7898.44.camel@pow> <4E5BFC6F.5080507@FreeBSD.org> Content-Type: text/plain; charset="UTF-8" Organization: Hybrid Logic Date: Mon, 29 Aug 2011 23:02:29 +0100 Message-ID: <1314655349.7898.53.camel@pow> Mime-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit X-Spam-Score: -1.0 X-Digital-Crocus-Maillimit: done X-Authenticated-Sender: luke X-Complaints: abuse@digital-crocus.com X-Admin: admin@digital-crocus.com X-Abuse: abuse@digital-crocus.com (Please include full headers in abuse reports) Cc: freebsd-fs@freebsd.org, tech@hybrid-logic.co.uk Subject: Re: ZFS hang in production on 8.2-RELEASE X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Aug 2011 22:02:34 -0000 On Mon, 2011-08-29 at 22:54 +0200, Martin Matuska wrote: > >> procstat -kk on the zfskern process gives: > >> > >> PID TID COMM TDNAME > >> KSTACK > >> 5 100045 zfskern arc_reclaim_thre mi_switch+0x176 > >> sleepq_timedwait+0x42 _cv_timedwait+0x134 arc_reclaim_thread+0x2a9 > >> fork_exit+0x118 fork_trampoline+0xe > >> 5 100046 zfskern l2arc_feed_threa mi_switch+0x176 > >> sleepq_timedwait+0x42 _cv_timedwait+0x134 l2arc_feed_thread+0x1ce > >> fork_exit+0x118 fork_trampoline+0xe > >> 5 100098 zfskern txg_thread_enter mi_switch+0x176 > >> sleepq_wait+0x42 _cv_wait+0x129 txg_thread_wait+0x79 txg_quiesce_thread > >> +0xb5 fork_exit+0x118 fork_trampoline+0xe > >> 5 100099 zfskern txg_thread_enter mi_switch+0x176 > >> sleepq_timedwait+0x42 _cv_timedwait+0x134 txg_thread_wait+0x3c > >> txg_sync_thread+0x365 fork_exit+0x118 fork_trampoline+0xe > >> > >> Any ideas on what might be causing this? > > It sounds like the bug Martin Matuska has recently fixed in FreeBSD > > and reported upstream to Illumos: > > https://www.illumos.org/issues/1313 > > > > The fix has been MFC'ed to 8-STABLE r224647 on Aug 4th. > > > > --Artem > No, I think this is more likely fixed by pjd's bugfix in r224791 (MFC'ed > to stable/8 as r225100). > > The corresponding patch is: > http://people.freebsd.org/~pjd/patches/zfsdev_state_lock.patch > Great, thanks! Will this patch apply to ZFS v15? We can't upgrade to v28 yet. -- Best Regards, Luke Marsden CTO, Hybrid Logic Ltd. Web: http://www.hybrid-cluster.com/ Hybrid Web Cluster - cloud web hosting Mobile: +1-415-449-1165 (US) / +447791750420 (UK) From owner-freebsd-fs@FreeBSD.ORG Tue Aug 30 01:20:48 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9356C106566B; Tue, 30 Aug 2011 01:20:48 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 27EEB8FC0C; Tue, 30 Aug 2011 01:20:48 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap8EAHI6XE6DaFvO/2dsb2JhbABChEykMYFAAQEBAQMBAQEgKyALGw4KAgINGQIpAQkmBggHBAEcBIdVpw+RdoEshA+BEQSRDoIRkSA X-IronPort-AV: E=Sophos;i="4.68,299,1312171200"; d="scan'208";a="132621741" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 29 Aug 2011 21:20:47 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 7EFC4B3F06; Mon, 29 Aug 2011 21:20:47 -0400 (EDT) Date: Mon, 29 Aug 2011 21:20:47 -0400 (EDT) From: Rick Macklem To: George Liaskos Message-ID: <1173512509.517816.1314667247490.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek Subject: Re: NFSv4: After upgrade to 9 users can no longer list files. (sounds like a ZFS issue?) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Aug 2011 01:20:48 -0000 George Liaskos wrote: > Hello, > > I upgraded my home server the past weekend from 8.2-STABLE to 9, > after the upgrade users can no longer list the files / directories of > a mount > from a client machine. > > I am using nfsv4 exports for almost a year now, never had an issue, i > did not > change the configuration during / after the upgrade. My kernel config > was > using NFSD already. > > Some server config info; i am exporting ZFS file systems: > > [/etc/exports] > V4: /usr/local/data -sec=sys -network 192.168.0.0/24 > /usr/local/data/downloads -network 192.168.0.0/24 -maproot=root > /usr/local/data/software -network 192.168.0.0/24 -maproot=root > > [/etc/rc.conf] > rpcbind_enable="YES" > nfs_server_enable="YES" > nfsv4_server_enable="YES" > nfsuserd_enable="YES" > mountd_flags="-r -l" > mountd_enable="YES" > > I am able to mount from the clients, root can list everything but > other users can't > either from console or from a file browser. I can still blindly cat / > touch files, > everything works except list. The same goes with local mounts on the > server. > Well, if non-root users can't "ls" locally on the server, this sounds more like a ZFS issue than an NFS one. (I don't see this w.r.t. NFS when exporting a UFS volume.) I don't know anything about ZFS. I've added a couple of the ZFS guys to the cc list, in case they don't read posts with NFS in the subject line. rick > Thank you in advance for your help. > > Regards, > George > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Tue Aug 30 02:10:28 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9BD0E106566B; Tue, 30 Aug 2011 02:10:28 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 73CAD8FC12; Tue, 30 Aug 2011 02:10:28 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p7U2AS1d026805; Tue, 30 Aug 2011 02:10:28 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p7U2ASDY026795; Tue, 30 Aug 2011 02:10:28 GMT (envelope-from linimon) Date: Tue, 30 Aug 2011 02:10:28 GMT Message-Id: <201108300210.p7U2ASDY026795@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/160283: [zfs] [patch] 'zfs list' does abort in make_dataset_handle X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Aug 2011 02:10:28 -0000 Old Synopsis: 'zfs list' does abort in make_dataset_handle New Synopsis: [zfs] [patch] 'zfs list' does abort in make_dataset_handle Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Tue Aug 30 02:09:45 UTC 2011 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=160283 From owner-freebsd-fs@FreeBSD.ORG Tue Aug 30 08:37:05 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 979C6106564A; Tue, 30 Aug 2011 08:37:05 +0000 (UTC) (envelope-from geo.liaskos@gmail.com) Received: from mail-qy0-f175.google.com (mail-qy0-f175.google.com [209.85.216.175]) by mx1.freebsd.org (Postfix) with ESMTP id DD8DD8FC0A; Tue, 30 Aug 2011 08:37:04 +0000 (UTC) Received: by qyk4 with SMTP id 4so2454570qyk.13 for ; Tue, 30 Aug 2011 01:37:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=v5Hdo+B71fSU0VG+l2LC1ElJC7S61EGGP08atC0Qw7g=; b=jKVx+7Smr2TUoVQ2uVaxsgirxLCZQCzXmxhdwU7MTIXKGLYzNz7kNYz9MSivdP2Lyb 1wzsQUVMv+7f0umMXvGvmZebifBYYa0Y2e0ZbSTJZy85mMYDnWFrn7IFub8QC/IhENJJ tPdTsPcuLyPn1oXLGOD2LCgm3pJGc14Z/Khfs= MIME-Version: 1.0 Received: by 10.224.27.68 with SMTP id h4mr2492217qac.335.1314693423932; Tue, 30 Aug 2011 01:37:03 -0700 (PDT) Received: by 10.229.89.138 with HTTP; Tue, 30 Aug 2011 01:37:03 -0700 (PDT) In-Reply-To: <1173512509.517816.1314667247490.JavaMail.root@erie.cs.uoguelph.ca> References: <1173512509.517816.1314667247490.JavaMail.root@erie.cs.uoguelph.ca> Date: Tue, 30 Aug 2011 11:37:03 +0300 Message-ID: From: George Liaskos To: Rick Macklem Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek Subject: Re: NFSv4: After upgrade to 9 users can no longer list files. (sounds like a ZFS issue?) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Aug 2011 08:37:05 -0000 > Well, if non-root users can't "ls" locally on the server, this sounds more > like a ZFS issue than an NFS one. (I don't see this w.r.t. NFS when exporting > a UFS volume.) > > I don't know anything about ZFS. I've added a couple of the ZFS guys to the > cc list, in case they don't read posts with NFS in the subject line. > > rick Just to be clear, non root users can't ls mounted exports on the server. Using ls directly on the ZFS file system works. I exported a UFS directory, everything works... So this is either a ZFS or an ACL related issue. I will setup a clean VM to see if i can reproduce this. Thank you for your response. Regards, George From owner-freebsd-fs@FreeBSD.ORG Tue Aug 30 14:50:30 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D8D19106566B for ; Tue, 30 Aug 2011 14:50:30 +0000 (UTC) (envelope-from ee@athyriogames.com) Received: from madonna.sslcatacombnetworking.com (madonna.sslcatacombnetworking.com [174.133.19.130]) by mx1.freebsd.org (Postfix) with ESMTP id B713D8FC19 for ; Tue, 30 Aug 2011 14:50:30 +0000 (UTC) Received: from c-98-206-215-156.hsd1.in.comcast.net ([98.206.215.156] helo=laptopv) by madonna.sslcatacombnetworking.com with esmtpa (Exim 4.69) (envelope-from ) id 1QyPTr-0004c9-Mt for freebsd-fs@freebsd.org; Tue, 30 Aug 2011 09:39:59 -0500 From: "Engineering" To: References: <01c801cc667f$f99eb7b0$ecdc2710$@com> In-Reply-To: <01c801cc667f$f99eb7b0$ecdc2710$@com> Date: Tue, 30 Aug 2011 09:49:41 -0500 Message-ID: <020d01cc6724$0f0410b0$2d0c3210$@com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Office Outlook 12.0 Thread-Index: Acxmf/bkxg/WNgMwTaWJDIhaLpbyEAAo4bzQ Content-Language: en-us X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - madonna.sslcatacombnetworking.com X-AntiAbuse: Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - athyriogames.com Subject: RE: Read-only disk problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Aug 2011 14:50:30 -0000 Hi, I've attached some more info. Doing a fsdump shows the following changes over reboot magic 19540119 (UFS2) time Tue Aug 30 03:08:04 2011 ... cg 1: magic 90255 tell 4b1c000 time Tue Aug 30 03:08:04 2011 Changes to magic 19540119 (UFS2) time Tue Aug 30 03:13:14 2011 ... cg 1: magic 90255 tell 4b1c000 time Tue Aug 30 03:13:14 2011 Sam -----Original Message----- From: owner-freebsd-fs@freebsd.org [mailto:owner-freebsd-fs@freebsd.org] On Behalf Of Engineering Sent: Monday, August 29, 2011 2:15 PM To: freebsd-fs@freebsd.org Subject: Read-only disk problem Please let me know if this is the wrong place to ask. I am working on an embedded system using FreeBSD 7.2, bootinf and running off of flash memory. In order to not burn out the flash, I use the 'diskless' scripts and mount the flash read-only. I have used this configuration successfully in the past. I've recently added a utility to check for disk corruption, basically checksumming the / and /usr partitions. Since they are both read-only, I thought this would work. What I have discovered is that something in the partition is changing between boots. I dd'd the flash over a couple of boots, and compared the binaries to see what was changing. It is a small amount of data, spread across the disk, in an interval that looks very similar to the interval of the 'superblocks' Is there any data that is written to the disk at boot or mount time, and if so, is there a way to prevent it? Thanks Sam _______________________________________________ freebsd-fs@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-fs To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Tue Aug 30 15:10:27 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DD4B11065670; Tue, 30 Aug 2011 15:10:27 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 34F0C8FC23; Tue, 30 Aug 2011 15:10:26 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap4EACD9XE6DaFvO/2dsb2JhbABChEykSIFAAQEEASMEUgUWDgoCAg0ZAlkGiAWnW5IJgSyEEIERBJMkkSE X-IronPort-AV: E=Sophos;i="4.68,302,1312171200"; d="scan'208";a="135967478" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 30 Aug 2011 11:10:14 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 5CC87B3F05; Tue, 30 Aug 2011 11:10:14 -0400 (EDT) Date: Tue, 30 Aug 2011 11:10:14 -0400 (EDT) From: Rick Macklem To: George Liaskos Message-ID: <1005169645.540203.1314717014356.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek Subject: Re: NFSv4: After upgrade to 9 users can no longer list files. (sounds like a ZFS issue?) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Aug 2011 15:10:27 -0000 George Liaskos wrote: > > Well, if non-root users can't "ls" locally on the server, this > > sounds more > > like a ZFS issue than an NFS one. (I don't see this w.r.t. NFS when > > exporting > > a UFS volume.) > > > > I don't know anything about ZFS. I've added a couple of the ZFS guys > > to the > > cc list, in case they don't read posts with NFS in the subject line. > > > > rick > > Just to be clear, non root users can't ls mounted exports on the > server. > Using ls directly on the ZFS file system works. > > I exported a UFS directory, everything works... So this is either a > ZFS or > an ACL related issue. I will setup a clean VM to see if i can > reproduce this. > You could try this patch and see what effect it has (applied to the server). It just disables the access check for readdir. --- nfs_nfsdport.c.sav2 2011-08-30 10:35:58.000000000 -0400 +++ nfs_nfsdport.c 2011-08-30 10:36:54.000000000 -0400 @@ -1838,10 +1838,12 @@ nfsrvd_readdirplus(struct nfsrv_descript nd->nd_repstat = NFSERR_NOTDIR; if (!nd->nd_repstat && cnt == 0) nd->nd_repstat = NFSERR_TOOSMALL; +#ifdef notnow if (!nd->nd_repstat) nd->nd_repstat = nfsvno_accchk(vp, VEXEC, nd->nd_cred, exp, p, NFSACCCHK_NOOVERRIDE, NFSACCCHK_VPISLOCKED, NULL); +#endif if (nd->nd_repstat) { vput(vp); if (nd->nd_flag & ND_NFSV3) This wouldn't be suitable for a production system, but whether or not it "fixes" the problem would give us an indication of where the problem is. Also, if you could clarify when your 8/stable was downloaded, whether your 9.0 upgrade was to vanilla Beta1 or ??? and details w.r.t. your ZFS setup, that might help. And one more... If you could create a fresh ZFS pool/volume and export that to see if it exhibits the same problem, that information could help figure it out, too. Please let us know how it goes, rick > Thank you for your response. > > Regards, > George From owner-freebsd-fs@FreeBSD.ORG Tue Aug 30 19:10:28 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 30F3D1065670 for ; Tue, 30 Aug 2011 19:10:28 +0000 (UTC) (envelope-from lev@FreeBSD.org) Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru [IPv6:2a01:4f8:131:60a2::2]) by mx1.freebsd.org (Postfix) with ESMTP id C648B8FC1B for ; Tue, 30 Aug 2011 19:10:27 +0000 (UTC) Received: from lion.home.serebryakov.spb.ru (unknown [IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c]) (Authenticated sender: lev@serebryakov.spb.ru) by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id 14D604AC31 for ; Tue, 30 Aug 2011 23:10:26 +0400 (MSD) Date: Tue, 30 Aug 2011 23:10:24 +0400 From: Lev Serebryakov Organization: FreeBSD X-Priority: 3 (Normal) Message-ID: <1945418039.20110830231024@serebryakov.spb.ru> To: freebsd-fs@freebsd.org MIME-Version: 1.0 Content-Type: text/plain; charset=windows-1251 Content-Transfer-Encoding: quoted-printable Subject: Very inconsistent (read) speed on UFS2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: lev@FreeBSD.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Aug 2011 19:10:28 -0000 Hello, Freebsd-fs. Now, when I "defragmented" my large FS, I see very inconsistent read speeds on same files. Is it Ok? My setup is: (1) FreeBSD 8.2-STABLE/x64 (2) E4400 CPU, 2GiB RAM (3) 5xHDDs in RAID5 (software), controller is ICH9R. (4) UFS2 with 32KiB block, vfs.read_max=3D32 (1MiB read-ahead). (5) System and swap on another (6th) HDD, but swap is unused. (6) No periodic or background processes access FS in question at all. Simple program reads each of 12 files (420MiB each) 15 times in cycle like 01, 02, ..., 12, 01,... so, cache in memory should be thrashed, as pr= ograme returns to dame data every ~5.5GiB and here are only 2GiB physical memory in system. --=20 // Black Lion AKA Lev Serebryakov From owner-freebsd-fs@FreeBSD.ORG Tue Aug 30 19:18:19 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 103891065670 for ; Tue, 30 Aug 2011 19:18:19 +0000 (UTC) (envelope-from lev@FreeBSD.org) Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru [IPv6:2a01:4f8:131:60a2::2]) by mx1.freebsd.org (Postfix) with ESMTP id CA7188FC08 for ; Tue, 30 Aug 2011 19:18:18 +0000 (UTC) Received: from lion.home.serebryakov.spb.ru (unknown [IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c]) (Authenticated sender: lev@serebryakov.spb.ru) by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id B60674AC31 for ; Tue, 30 Aug 2011 23:18:17 +0400 (MSD) Date: Tue, 30 Aug 2011 23:18:15 +0400 From: Lev Serebryakov Organization: FreeBSD X-Priority: 3 (Normal) Message-ID: <317753422.20110830231815@serebryakov.spb.ru> To: freebsd-fs@freebsd.org In-Reply-To: <1945418039.20110830231024@serebryakov.spb.ru> References: <1945418039.20110830231024@serebryakov.spb.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=windows-1251 Content-Transfer-Encoding: quoted-printable Subject: Very inconsistent (read) speed on UFS2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: lev@FreeBSD.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Aug 2011 19:18:19 -0000 Hello, Freebsd-fs. You wrote 30 =E0=E2=E3=F3=F1=F2=E0 2011 =E3., 23:10:24: SORRY FOR SENDING INCOMPLETE MESSAGE! Now, when I "defragmented" my large FS, I see very inconsistent read speeds on same files. Is it Ok? My setup is: (1) FreeBSD 8.2-STABLE/x64 (2) E4400 CPU, 2GiB RAM (3) 5xHDDs in RAID5 (software), controller is ICH9R. (4) UFS2 with 32KiB block, vfs.read_max=3D32 (1MiB read-ahead). (5) System and swap on another (6th) HDD, but swap is unused. (6) No periodic or background processes access FS in question at all. Simple program reads each of 12 files (460MiB each) 15 times in cycle like 01, 02, ..., 12, 01,... so, cache in memory should be thrashed, as reading process returns to same data every ~5.5GiB and here are only 2GiB physical memory in system. And speed of these reads are VERY inconsistent. I've calculated min/average/max and standard deviation and results are like this: Name Min/Avg/Max StdDev r012f02.nef 120/235/413 MiB/s 83 r012f09.nef 154/248/393 MiB/s 80 r012f12.nef 106/212/293 MiB/s 63 r012f05.nef 86/206/280 MiB/s 62 r012f08.nef 128/223/332 MiB/s 60 r012f11.nef 155/257/327 MiB/s 56 r012f03.nef 121/213/279 MiB/s 52 r012f10.nef 120/226/284 MiB/s 45 r012f07.nef 121/199/249 MiB/s 41 r012f01.nef 135/199/242 MiB/s 33 It is results from 15 runs! One time file was read at sustained average speed 120MiB/s (~3.8 seconds) and next time it was 413MiB/s (only ~1.1 second!) And it is not case when first read is slowest. No. Sometimes last one is slowest, for example. Is it Ok? I'm very disappointed to see 120MiB/s when I know that hardware can give 415MiB/s, but something strange slows down the process. --=20 // Black Lion AKA Lev Serebryakov From owner-freebsd-fs@FreeBSD.ORG Tue Aug 30 20:09:09 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A570F106564A; Tue, 30 Aug 2011 20:09:09 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (chez.mckusick.com [70.36.157.235]) by mx1.freebsd.org (Postfix) with ESMTP id 8682F8FC0C; Tue, 30 Aug 2011 20:09:09 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id p7UK9CBQ085481; Tue, 30 Aug 2011 13:09:12 -0700 (PDT) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201108302009.p7UK9CBQ085481@chez.mckusick.com> To: lev@freebsd.org In-reply-to: <317753422.20110830231815@serebryakov.spb.ru> Date: Tue, 30 Aug 2011 13:09:12 -0700 From: Kirk McKusick X-Spam-Status: No, score=0.0 required=5.0 tests=MISSING_MID, UNPARSEABLE_RELAY autolearn=failed version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on chez.mckusick.com Cc: freebsd-fs@freebsd.org Subject: Re: Very inconsistent (read) speed on UFS2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Aug 2011 20:09:09 -0000 Now that you have defragmented your filesystem, we can factor that out of the equation. What is left is the management of the memory used for caching the files. I expect what is happening is that you are busily reading along and run out of free memory in which to read. This triggers a cleanup thread that churns through the memory pool to decide what should be thrown out to make room. Your reading process is demanding memory faster than the cleanup thread can produce it. The result is that your read idles (e.g., appears to run slowly). It is random because it depends on when you run out of memory. The cleanup is complex because it has to deal with all of memory and it wants to avoid a simple LRU which would cause the read of a large file to throw out a lot of things you would rather keep (like your window manager, browser, etc). Not sure what the best strategy is here. Kirk McKusick From owner-freebsd-fs@FreeBSD.ORG Tue Aug 30 22:14:09 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 20945106564A for ; Tue, 30 Aug 2011 22:14:09 +0000 (UTC) (envelope-from lev@FreeBSD.org) Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru [IPv6:2a01:4f8:131:60a2::2]) by mx1.freebsd.org (Postfix) with ESMTP id B21688FC12 for ; Tue, 30 Aug 2011 22:14:08 +0000 (UTC) Received: from lion.home.serebryakov.spb.ru (unknown [IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c]) (Authenticated sender: lev@serebryakov.spb.ru) by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id E209A4AC31; Wed, 31 Aug 2011 02:14:06 +0400 (MSD) Date: Wed, 31 Aug 2011 02:14:04 +0400 From: Lev Serebryakov Organization: FreeBSD Project X-Priority: 3 (Normal) Message-ID: <103666698.20110831021404@serebryakov.spb.ru> To: Kirk McKusick In-Reply-To: <201108302009.p7UK9CBQ085481@chez.mckusick.com> References: <317753422.20110830231815@serebryakov.spb.ru> <201108302009.p7UK9CBQ085481@chez.mckusick.com> MIME-Version: 1.0 Content-Type: text/plain; charset=windows-1251 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: Very inconsistent (read) speed on UFS2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: lev@FreeBSD.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Aug 2011 22:14:09 -0000 Hello, Kirk. You wrote 31 =E0=E2=E3=F3=F1=F2=E0 2011 =E3., 0:09:12: > Now that you have defragmented your filesystem, we can factor that out > of the equation. What is left is the management of the memory used for Yep. > caching the files. I expect what is happening is that you are busily > reading along and run out of free memory in which to read. This triggers > a cleanup thread that churns through the memory pool to decide what > should be thrown out to make room. Your reading process is demanding > memory faster than the cleanup thread can produce it. The result is > that your read idles (e.g., appears to run slowly). It is random because > it depends on when you run out of memory. It is interesting. But this box have two real cores (yes, I know, now it is ``only two cores'' :)) and nothing to do but this test program (single-threaded). Of course, this program read data into same buffer and didn't allocate anything in process of benchmark. Yes, I know, that kernel prefer not to throw away data even if it is not needed right now, but here decision looks very simple :) And, one more detail: I use O_DIRECT flag in open(2). Other interesting observing: this program consume about 20% of one core. It is very strange for I/O bound process, isn't it? > The cleanup is complex because it has to deal with all of memory and > it wants to avoid a simple LRU which would cause the read of a large=20 > file to throw out a lot of things you would rather keep (like your=20 > window manager, browser, etc). Not sure what the best strategy is here. window manager and browser look different from one small (128Kb) buffer which is ovewritten again and again. As far as I understand, FreeBSD's VM (according to "Design and Implementation of...) should move such buffers (data read recently, not changed and not needed) into "Inact" state and it is easy task to re-use these buffers -- they don't belong to any active process, they don't need to be paged or swapped out, etc. I'll try this experiment with mmap() and touching every 4096-th byte of mapped memory instead of read(2). --=20 // Black Lion AKA Lev Serebryakov From owner-freebsd-fs@FreeBSD.ORG Tue Aug 30 22:29:37 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 09E3C106564A for ; Tue, 30 Aug 2011 22:29:37 +0000 (UTC) (envelope-from lev@FreeBSD.org) Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru [IPv6:2a01:4f8:131:60a2::2]) by mx1.freebsd.org (Postfix) with ESMTP id C4F628FC12 for ; Tue, 30 Aug 2011 22:29:36 +0000 (UTC) Received: from lion.home.serebryakov.spb.ru (unknown [IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c]) (Authenticated sender: lev@serebryakov.spb.ru) by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id 0580D4AC58; Wed, 31 Aug 2011 02:29:35 +0400 (MSD) Date: Wed, 31 Aug 2011 02:29:33 +0400 From: Lev Serebryakov Organization: FreeBSD X-Priority: 3 (Normal) Message-ID: <1693072185.20110831022933@serebryakov.spb.ru> To: Kirk McKusick , freebsd-fs@freebsd.org In-Reply-To: <103666698.20110831021404@serebryakov.spb.ru> References: <317753422.20110830231815@serebryakov.spb.ru> <201108302009.p7UK9CBQ085481@chez.mckusick.com> <103666698.20110831021404@serebryakov.spb.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=windows-1251 Content-Transfer-Encoding: quoted-printable Cc: Subject: Re: Very inconsistent (read) speed on UFS2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: lev@FreeBSD.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Aug 2011 22:29:37 -0000 Hello, Kirk. You wrote 31 =E0=E2=E3=F3=F1=F2=E0 2011 =E3., 2:14:04: > I'll try this experiment with mmap() and touching every 4096-th byte of > mapped memory instead of read(2). Strange enough, it gives only 40-50MiB/s and results are very consistent. It really surprise me. I didn't think, that there will be so much difference, I was sure, that it will be almost equivalent speed. --=20 // Black Lion AKA Lev Serebryakov From owner-freebsd-fs@FreeBSD.ORG Tue Aug 30 22:31:59 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C74DE106564A for ; Tue, 30 Aug 2011 22:31:59 +0000 (UTC) (envelope-from lev@FreeBSD.org) Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru [IPv6:2a01:4f8:131:60a2::2]) by mx1.freebsd.org (Postfix) with ESMTP id 8E5088FC16 for ; Tue, 30 Aug 2011 22:31:59 +0000 (UTC) Received: from lion.home.serebryakov.spb.ru (unknown [IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c]) (Authenticated sender: lev@serebryakov.spb.ru) by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id 4DD334AC31; Wed, 31 Aug 2011 02:31:58 +0400 (MSD) Date: Wed, 31 Aug 2011 02:31:55 +0400 From: Lev Serebryakov Organization: FreeBSD Project X-Priority: 3 (Normal) Message-ID: <612137475.20110831023155@serebryakov.spb.ru> To: Kirk McKusick In-Reply-To: <201108302009.p7UK9CBQ085481@chez.mckusick.com> References: <317753422.20110830231815@serebryakov.spb.ru> <201108302009.p7UK9CBQ085481@chez.mckusick.com> MIME-Version: 1.0 Content-Type: text/plain; charset=windows-1251 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: Very inconsistent (read) speed on UFS2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: lev@FreeBSD.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Aug 2011 22:31:59 -0000 Hello, Kirk. You wrote 31 =E0=E2=E3=F3=F1=F2=E0 2011 =E3., 0:09:12: > memory faster than the cleanup thread can produce it. The result is > that your read idles (e.g., appears to run slowly). It is random because > it depends on when you run out of memory. BTW, it could explain why some runs are slower than other. But my situation looks like opposite: some runs much faster than others. And it could not be read-from-cache if VM is sane. It is hard to belive, that VM will store 0.5GiB of read-once data when another 5GiB was read after that only in 2GiB of physical memory --=20 // Black Lion AKA Lev Serebryakov From owner-freebsd-fs@FreeBSD.ORG Tue Aug 30 23:00:43 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A0597106566B; Tue, 30 Aug 2011 23:00:43 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (chez.mckusick.com [70.36.157.235]) by mx1.freebsd.org (Postfix) with ESMTP id 82F168FC14; Tue, 30 Aug 2011 23:00:43 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id p7UN0jJ6022811; Tue, 30 Aug 2011 16:00:45 -0700 (PDT) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201108302300.p7UN0jJ6022811@chez.mckusick.com> To: lev@FreeBSD.org In-reply-to: <1693072185.20110831022933@serebryakov.spb.ru> Date: Tue, 30 Aug 2011 16:00:45 -0700 From: Kirk McKusick X-Spam-Status: No, score=0.0 required=5.0 tests=MISSING_MID, UNPARSEABLE_RELAY autolearn=failed version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on chez.mckusick.com Cc: freebsd-fs@FreeBSD.org Subject: Re: Very inconsistent (read) speed on UFS2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Aug 2011 23:00:43 -0000 > Date: Wed, 31 Aug 2011 02:29:33 +0400 > From: Lev Serebryakov > To: Kirk McKusick , freebsd-fs@FreeBSD.org > Subject: Re: Very inconsistent (read) speed on UFS2 > > Hello, Kirk. > > > I'll try this experiment with mmap() and touching every 4096-th byte of > > mapped memory instead of read(2). > > Strange enough, it gives only 40-50MiB/s and results are very > consistent. > > It really surprise me. I didn't think, that there will be so much > difference, I was sure, that it will be almost equivalent speed. > > -- > // Black Lion AKA Lev Serebryakov I had not realized that you were using O_DIRECT. That would in fact avoid most of the caching / memory-recovery effects that I was blaming earlier. Your test above is definitely hitting them though. My guess is that the consistency is because you are measuring the rate at which free memory can be created. So, my new theory on why your O_DIRECT test is running slowly is due to the single threading in the GEOM layer. Pawel Jakub Dawidek (pjd@) gave a very interesting talk on this problem at this year's BSDCan. Kirk McKusick From owner-freebsd-fs@FreeBSD.ORG Tue Aug 30 23:03:13 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D2E111065672; Tue, 30 Aug 2011 23:03:13 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id 9AEA18FC0A; Tue, 30 Aug 2011 23:03:13 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id p7UMe7il013870; Tue, 30 Aug 2011 17:40:07 -0500 (CDT) Date: Tue, 30 Aug 2011 17:40:07 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Lev Serebryakov In-Reply-To: <1693072185.20110831022933@serebryakov.spb.ru> Message-ID: References: <317753422.20110830231815@serebryakov.spb.ru> <201108302009.p7UK9CBQ085481@chez.mckusick.com> <103666698.20110831021404@serebryakov.spb.ru> <1693072185.20110831022933@serebryakov.spb.ru> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Tue, 30 Aug 2011 17:40:07 -0500 (CDT) Cc: freebsd-fs@freebsd.org Subject: Re: Very inconsistent (read) speed on UFS2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Aug 2011 23:03:13 -0000 On Wed, 31 Aug 2011, Lev Serebryakov wrote: > >> I'll try this experiment with mmap() and touching every 4096-th byte of >> mapped memory instead of read(2). > Strange enough, it gives only 40-50MiB/s and results are very > consistent. > > It really surprise me. I didn't think, that there will be so much > difference, I was sure, that it will be almost equivalent speed. FreeBSD does not seem to default to sequential read-ahead when memory mapping is used with sequential page access. Try using madvise() with the MADV_SEQUENTIAL option and see if it helps. There are also MADV_WILLNEED, MADV_DONTNEED, and MADV_FREE. Careful use of these options can help performance quite a lot when data is large compared to memory. Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Wed Aug 31 00:17:58 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0A32B1065672; Wed, 31 Aug 2011 00:17:58 +0000 (UTC) (envelope-from geo.liaskos@gmail.com) Received: from mail-yi0-f54.google.com (mail-yi0-f54.google.com [209.85.218.54]) by mx1.freebsd.org (Postfix) with ESMTP id 4C1478FC1A; Wed, 31 Aug 2011 00:17:57 +0000 (UTC) Received: by yib19 with SMTP id 19so203312yib.13 for ; Tue, 30 Aug 2011 17:17:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=txEr8UwW/UN26zN9v8G1HaWBotqxGa+j0bIv/W1a3ec=; b=vbzkAg/o50XUter5QAinTAecKgFe9+y88Yifln/tTNHs/S/WqTbUIlhr7x8oXYWFUL jZ8woSynsuuqIVHnFNhlXsESRGPvkmrOEGZ4oZfiiBYSaD4tuVYQ2wOvZPUJWs3HAYKp HKJdHeiPRTYm+rVa4D7IxI2OKG6WoNW8ihDuY= MIME-Version: 1.0 Received: by 10.101.3.40 with SMTP id f40mr5713633ani.89.1314749876693; Tue, 30 Aug 2011 17:17:56 -0700 (PDT) Received: by 10.100.42.15 with HTTP; Tue, 30 Aug 2011 17:17:56 -0700 (PDT) In-Reply-To: <1005169645.540203.1314717014356.JavaMail.root@erie.cs.uoguelph.ca> References: <1005169645.540203.1314717014356.JavaMail.root@erie.cs.uoguelph.ca> Date: Wed, 31 Aug 2011 03:17:56 +0300 Message-ID: From: George Liaskos To: Rick Macklem Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek Subject: Re: NFSv4: After upgrade to 9 users can no longer list files. (sounds like a ZFS issue?) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Aug 2011 00:17:58 -0000 > You could try this patch and see what effect it has (applied to the > server). It just disables the access check for readdir. > --- nfs_nfsdport.c.sav2 2011-08-30 10:35:58.000000000 -0400 > +++ nfs_nfsdport.c =C2=A0 =C2=A0 =C2=A02011-08-30 10:36:54.000000000 -040= 0 > @@ -1838,10 +1838,12 @@ nfsrvd_readdirplus(struct nfsrv_descript > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0nd->nd_repstat =3D= NFSERR_NOTDIR; > =C2=A0 =C2=A0 =C2=A0 =C2=A0if (!nd->nd_repstat && cnt =3D=3D 0) > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0nd->nd_repstat =3D= NFSERR_TOOSMALL; > +#ifdef notnow > =C2=A0 =C2=A0 =C2=A0 =C2=A0if (!nd->nd_repstat) > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0nd->nd_repstat =3D= nfsvno_accchk(vp, VEXEC, > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0nd->= nd_cred, exp, p, NFSACCCHK_NOOVERRIDE, > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0NFSA= CCCHK_VPISLOCKED, NULL); > +#endif > =C2=A0 =C2=A0 =C2=A0 =C2=A0if (nd->nd_repstat) { > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0vput(vp); > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (nd->nd_flag & = ND_NFSV3) > > This wouldn't be suitable for a production system, but whether or > not it "fixes" the problem would give us an indication of where the > problem is. > > Also, if you could clarify when your 8/stable was downloaded, whether > your 9.0 upgrade was to vanilla Beta1 or ??? and details w.r.t. your > ZFS setup, that might help. I use svn, unfortunately i don't remember exactly when i moved from 8.2 to stable. I synced with CURRENT last week and this issue appeared, i did a second update to beta 2 [r225237] with the same results. The patch didn't make any difference. I downloaded an ISO with BETA-1 and made a VM installation, i was not able to reproduce this. Updated one of the clients to r225237, setup some nfs exports on top of ZFS and ls does not work for non root users. I created a new pool on top of a memory fs to test this. Next, i "downgraded" the server to BETA-1 [r224413] and everything is back to normal. So there's a bug which was introduced somewhere between BETA-1 && BETA-2 :p Thank you for your help! Regards, George From owner-freebsd-fs@FreeBSD.ORG Wed Aug 31 00:42:56 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8E91D106566B for ; Wed, 31 Aug 2011 00:42:56 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta07.westchester.pa.mail.comcast.net (qmta07.westchester.pa.mail.comcast.net [76.96.62.64]) by mx1.freebsd.org (Postfix) with ESMTP id 4E57D8FC0A for ; Wed, 31 Aug 2011 00:42:56 +0000 (UTC) Received: from omta20.westchester.pa.mail.comcast.net ([76.96.62.71]) by qmta07.westchester.pa.mail.comcast.net with comcast id Soiw1h0031YDfWL57oiwTt; Wed, 31 Aug 2011 00:42:56 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta20.westchester.pa.mail.comcast.net with comcast id Sois1h0111t3BNj3goitbq; Wed, 31 Aug 2011 00:42:54 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 4D12C102C36; Tue, 30 Aug 2011 17:42:51 -0700 (PDT) Date: Tue, 30 Aug 2011 17:42:51 -0700 From: Jeremy Chadwick To: Lev Serebryakov Message-ID: <20110831004251.GA89979@icarus.home.lan> References: <1945418039.20110830231024@serebryakov.spb.ru> <317753422.20110830231815@serebryakov.spb.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <317753422.20110830231815@serebryakov.spb.ru> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: Very inconsistent (read) speed on UFS2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Aug 2011 00:42:56 -0000 On Tue, Aug 30, 2011 at 11:18:15PM +0400, Lev Serebryakov wrote: > Now, when I "defragmented" my large FS, I see very inconsistent > read speeds on same files. Is it Ok? > > My setup is: > > (1) FreeBSD 8.2-STABLE/x64 > (2) E4400 CPU, 2GiB RAM > (3) 5xHDDs in RAID5 (software), controller is ICH9R. > (4) UFS2 with 32KiB block, vfs.read_max=32 (1MiB read-ahead). > (5) System and swap on another (6th) HDD, but swap is unused. > (6) No periodic or background processes access FS in question at all. > > Simple program reads each of 12 files (460MiB each) 15 times in cycle > like 01, 02, ..., 12, 01,... so, cache in memory should be thrashed, > as reading process returns to same data every ~5.5GiB and here are > only 2GiB physical memory in system. > > And speed of these reads are VERY inconsistent. I've calculated > min/average/max and standard deviation and results are like this: > > Name Min/Avg/Max StdDev > r012f02.nef 120/235/413 MiB/s 83 > r012f09.nef 154/248/393 MiB/s 80 > r012f12.nef 106/212/293 MiB/s 63 > r012f05.nef 86/206/280 MiB/s 62 > r012f08.nef 128/223/332 MiB/s 60 > r012f11.nef 155/257/327 MiB/s 56 > r012f03.nef 121/213/279 MiB/s 52 > r012f10.nef 120/226/284 MiB/s 45 > r012f07.nef 121/199/249 MiB/s 41 > r012f01.nef 135/199/242 MiB/s 33 > > It is results from 15 runs! One time file was read at sustained > average speed 120MiB/s (~3.8 seconds) and next time it was 413MiB/s > (only ~1.1 second!) > > And it is not case when first read is slowest. No. Sometimes last > one is slowest, for example. > > Is it Ok? I'm very disappointed to see 120MiB/s when I know that > hardware can give 415MiB/s, but something strange slows down the > process. What appears to have been missed here is that there are 5 drives in a RAID-5 fashion. Wait, RAID-5? FreeBSD has RAID-5 support? How? Oh, right... There's a port called sysutils/graid5 which is a "converted to work on FreeBSD 8.x" GEOM class for RAID-5. The original was written for earlier FreeBSD and was called geom_raid5. The original that Arne Worner introduced was written in 2006. A port was made for it only recently: http://www.freebsd.org/cgi/cvsweb.cgi/ports/sysutils/graid5/Makefile What scares me is the number of "variants" on this code: http://en.wikipedia.org/wiki/Geom_raid5 Some users have asked why this code hasn't ever been committed to the FreeBSD kernel (dated 2010, citing "why isn't this in HEAD?"): http://forums.freebsd.org/showthread.php?t=9040 There are admissions from Arne that "the code is absolutely horrible", which may be why it's never been committed to FreeBSD. There's also all sorts of other concerns: http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00437.html Here's one citing concerns over "aggressive caching", talking about writes and not reads, but my point still applies: http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00398.html http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00403.html The thread continues for quite some time. There's also a freebsd-current thread from 2007 asking if the code could be committed to HEAD, with some users stating they'd like to see that too -- with one noting that gvinum has support for RAID-5 so basically "which is better?" (I imagine that question is still unanswered) There were also concerns over testing, reliability, throughput, etc. and the answers (as of 2007) were really not that great: http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00351.html http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00361.html So can I ask what guarantee you have that geom_raid5 is not responsible for the intermittent I/O speeds you see? I would recommend you remove geom_raid5 from the picture entirely and replace it with either gstripe(8) or ccd(4) SOLELY FOR TESTING. Furthermore, why are these benchmarks not providing speed data per-device (e.g. gstat or iostat -x data)? There is a possibility that one of your drives could be performing at less-than-ideal rates (yes, intermittently) and therefore impacts (intermittently) your overall I/O throughput. The other posts in this mail thread so far are much more conclusive, but the above points/concerns I believe are still valid. They have never been thoroughly refuted or addressed. I guess you could say I'm very surprised someone is complaining about performance issues on FreeBSD when using a 3rd-party GEOM class that's been scrutinised in the past. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Wed Aug 31 07:38:38 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CB8761065674; Wed, 31 Aug 2011 07:38:38 +0000 (UTC) (envelope-from lev@serebryakov.spb.ru) Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru [IPv6:2a01:4f8:131:60a2::2]) by mx1.freebsd.org (Postfix) with ESMTP id 67B108FC12; Wed, 31 Aug 2011 07:38:38 +0000 (UTC) Received: from lion.home.serebryakov.spb.ru (unknown [IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c]) (Authenticated sender: lev@serebryakov.spb.ru) by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id 96D384AC58; Wed, 31 Aug 2011 11:38:36 +0400 (MSD) Date: Wed, 31 Aug 2011 11:38:33 +0400 From: Lev Serebryakov X-Priority: 3 (Normal) Message-ID: <485583919.20110831113833@serebryakov.spb.ru> To: Bob Friesenhahn In-Reply-To: References: <317753422.20110830231815@serebryakov.spb.ru> <201108302009.p7UK9CBQ085481@chez.mckusick.com> <103666698.20110831021404@serebryakov.spb.ru> <1693072185.20110831022933@serebryakov.spb.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=windows-1251 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: Very inconsistent (read) speed on UFS2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Aug 2011 07:38:38 -0000 Hello, Bob. You wrote 31 =E0=E2=E3=F3=F1=F2=E0 2011 =E3., 2:40:07: >>> I'll try this experiment with mmap() and touching every 4096-th byte = of >>> mapped memory instead of read(2). >> Strange enough, it gives only 40-50MiB/s and results are very >> consistent. >> >> It really surprise me. I didn't think, that there will be so much >> difference, I was sure, that it will be almost equivalent speed. > FreeBSD does not seem to default to sequential read-ahead when memory > mapping is used with sequential page access. Try using madvise()=20 > with the MADV_SEQUENTIAL option and see if it helps. It were results with MADV_SEQUENTIAL. Code looks like this: (error checking is skipped here, but not in real code, of course): fd =3D open(fileName, O_RDONLY | O_DIRECT); buf =3D mmap(NULL, fileSize, PROT_READ, 0, fd, 0); madvise(buf, fileSize, MADV_SEQUENTIAL); gettimeofday(&start, NULL); for (rd =3D 0; rd < fileSize; rd +=3D 4096) c =3D buf[rd]; gettimeofday(&end, NULL); munmap(buf, fileSize); close(fd); > There are also MADV_WILLNEED, MADV_DONTNEED, and MADV_FREE. Careful > use of these options can help performance quite a lot when data is=20 > large compared to memory. It is too complex for simple linear read test :) --=20 // Black Lion AKA Lev Serebryakov From owner-freebsd-fs@FreeBSD.ORG Wed Aug 31 07:44:17 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4724B106566B for ; Wed, 31 Aug 2011 07:44:17 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230]) by mx1.freebsd.org (Postfix) with ESMTP id A94418FC0C for ; Wed, 31 Aug 2011 07:44:16 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p7V7AVnr066398 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Wed, 31 Aug 2011 10:10:36 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <4E5DDE66.5000508@digsys.bg> Date: Wed, 31 Aug 2011 10:10:30 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:6.0) Gecko/20110822 Thunderbird/6.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <1945418039.20110830231024@serebryakov.spb.ru> <317753422.20110830231815@serebryakov.spb.ru> <20110831004251.GA89979@icarus.home.lan> In-Reply-To: <20110831004251.GA89979@icarus.home.lan> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: Very inconsistent (read) speed on UFS2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Aug 2011 07:44:17 -0000 On 31.08.11 03:42, Jeremy Chadwick wrote: > There is a possibility that one of your drives could be performing at > less-than-ideal rates (yes, intermittently) and therefore impacts > (intermittently) your overall I/O throughput. This is very probable, given RAID5, that needs to read stripes off every disc on every read. Probably some S.M.A.R.T. investigation might help (different measurements for different drives, if supported by the drives, that is). Daniel From owner-freebsd-fs@FreeBSD.ORG Wed Aug 31 08:03:04 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D887D106566C; Wed, 31 Aug 2011 08:03:04 +0000 (UTC) (envelope-from lev@serebryakov.spb.ru) Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru [IPv6:2a01:4f8:131:60a2::2]) by mx1.freebsd.org (Postfix) with ESMTP id 5FD128FC0A; Wed, 31 Aug 2011 08:03:04 +0000 (UTC) Received: from lion.home.serebryakov.spb.ru (unknown [IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c]) (Authenticated sender: lev@serebryakov.spb.ru) by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id E35954AC31; Wed, 31 Aug 2011 12:03:02 +0400 (MSD) Date: Wed, 31 Aug 2011 12:03:00 +0400 From: Lev Serebryakov X-Priority: 3 (Normal) Message-ID: <687356195.20110831120300@serebryakov.spb.ru> To: Jeremy Chadwick In-Reply-To: <20110831004251.GA89979@icarus.home.lan> References: <1945418039.20110830231024@serebryakov.spb.ru> <317753422.20110830231815@serebryakov.spb.ru> <20110831004251.GA89979@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=windows-1251 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, Lev Serebryakov Subject: Re: Very inconsistent (read) speed on UFS2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Aug 2011 08:03:04 -0000 Hello, Jeremy. You wrote 31 =E0=E2=E3=F3=F1=F2=E0 2011 =E3., 4:42:51: > What appears to have been missed here is that there are 5 drives in a > RAID-5 fashion. Wait, RAID-5? FreeBSD has RAID-5 support? How? Oh, > right... > There's a port called sysutils/graid5 which is a "converted to work on > FreeBSD 8.x" GEOM class for RAID-5. The original was written for > earlier FreeBSD and was called geom_raid5. The original that Arne > Worner introduced was written in 2006. A port was made for it only > recently: I'm author of this port. And I'm author of some improvements, approved by Arne Worner, which is included into this port :) And it seems, that I'm only user in whole world of this port, too. But it works for me for many years without any data-loss problems. It helps me not to lost data, when I had 3 dead HDDs in these years (not in simultaneously, of course) and upgrade my server from 5x500Gb to 5x2Tb configuration without stopping it (ok, with small stop for "growfs" run, but all HDDs were replaced one-by-one on live system, thaks to STAT hotplug). Now I'm trying to squeeze maximum speed from this software :) > What scares me is the number of "variants" on this code: > http://en.wikipedia.org/wiki/Geom_raid5 There are three wariants dumb proof-of-consept, stable and fast, but not ideal code and experimental one. Port uses second one. First one is way to slow and third one HAVE problems. What scares _me_ is the coding style of Arne. I've spent almost year to understand almost all details of this code, mostly due to two-letter variables, etc. > Some users have asked why this code hasn't ever been committed to the > FreeBSD kernel (dated 2010, citing "why isn't this in HEAD?"): > http://forums.freebsd.org/showthread.php?t=3D9040 Code style. And I mean real problems, not some nit-picking about "return 0;" vs "return (0);" or white spaces. I'm trying to clean up it in separate branch, without changing functionality, before I'll implement some new ideas, which should cleannup code even more. But it is not very fast process, as I don't have a lot of spare time now, and it is work which takes A LOT of concentration. > Here's one citing concerns over "aggressive caching", talking about > writes and not reads, but my point still applies: > http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00398.= html > http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00403.= html Yep, and this aggressive caching could be turned off. But it is GREAT help on write speed. Use good UPS and nut -- they really HELP. And, other note: without UPS and nut even without geom_raid5 here is BIG problem with large volumes and UFS2. Background ffsck for 2Tb volume takes about three hours, when system almost locked, and fails often. fsck of 8Tb volume? It is my worst nightmare. And it doesn't depend on RADI5 and it write cache. Use UPS. USE IT. > So can I ask what guarantee you have that geom_raid5 is not responsible > for the intermittent I/O speeds you see? I would recommend you remove I'm not sure here -- it is the point. I want to understand, is it geom_raid5 problem, UFS2 problem, VMM problem or some combination of ``glithces'' of these subsystems. I'm almost sure, it is not problem of something ``in vacuum,'' it is problem at border between subsystems. And, as I don't understand well how to "look inside" UFS2, I ask for help here. > geom_raid5 from the picture entirely and replace it with either > gstripe(8) or ccd(4) SOLELY FOR TESTING. It is impossible in this config: I have data which is valuable for me. Here is problem: I could do any tests, but speed one, on test server and VMs. I could run testsuite, switch off HDDs, re-create FSes, etc., to be sure that geom_raid5 is STABLE in terms of data safety. But only BIG system, on which I could perform valid speed benchmarks, is my home server with my data, which I could not lost. It is useless to run such benchmarks on array of old 9GiB (yes, you read= it right, 9 gigabytes) SCSI HDDs or in virtual machine with bunch of virtual HDDs. And I have not second server with modern fast and big disks. Sorry. > Furthermore, why are these benchmarks not providing speed data > per-device (e.g. gstat or iostat -x data)? There is a possibility that > one of your drives could be performing at less-than-ideal rates (yes, > intermittently) and therefore impacts (intermittently) your overall I/O > throughput. I'll look at this, but I've zeor-outed all HDDs before placing them into array, and speed were identical. > been thoroughly refuted or addressed. I guess you could say I'm very > surprised someone is complaining about performance issues on FreeBSD > when using a 3rd-party GEOM class that's been scrutinised in the past. It is not complain. It is request for help in profiling very old and complex subsystem :) Maybe, I was not very clear here in my first message. --=20 // Black Lion AKA Lev Serebryakov From owner-freebsd-fs@FreeBSD.ORG Wed Aug 31 08:11:06 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BCB40106566C; Wed, 31 Aug 2011 08:11:06 +0000 (UTC) (envelope-from lev@FreeBSD.org) Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru [IPv6:2a01:4f8:131:60a2::2]) by mx1.freebsd.org (Postfix) with ESMTP id 828D78FC15; Wed, 31 Aug 2011 08:11:06 +0000 (UTC) Received: from lion.home.serebryakov.spb.ru (unknown [IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c]) (Authenticated sender: lev@serebryakov.spb.ru) by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id 4DDB34AC31; Wed, 31 Aug 2011 12:11:05 +0400 (MSD) Date: Wed, 31 Aug 2011 12:11:02 +0400 From: Lev Serebryakov Organization: FreeBSD X-Priority: 3 (Normal) Message-ID: <170569583.20110831121102@serebryakov.spb.ru> To: Kirk McKusick In-Reply-To: <201108302300.p7UN0jJ6022811@chez.mckusick.com> References: <1693072185.20110831022933@serebryakov.spb.ru> <201108302300.p7UN0jJ6022811@chez.mckusick.com> MIME-Version: 1.0 Content-Type: text/plain; charset=windows-1251 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@FreeBSD.org, lev@FreeBSD.org Subject: Re: Very inconsistent (read) speed on UFS2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: lev@FreeBSD.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Aug 2011 08:11:06 -0000 Hello, Kirk. You wrote 31 =E0=E2=E3=F3=F1=F2=E0 2011 =E3., 3:00:45: > So, my new theory on why your O_DIRECT test is running slowly is due > to the single threading in the GEOM layer. Pawel Jakub Dawidek (pjd@) > gave a very interesting talk on this problem at this year's BSDCan. I want to stress my point: it is not low speed, what especially bother me. It is inconsistency. Bad drive, always-single-threaded-GEOM, etc., should give consistent slow speed. Bad code in geom_raid5 could give inconsistent write speed, du to caching, but reading path is as straight and simple as possible here. Bad drive or AST-GEOM is not what simple to fix (ok, bad drive IS simple to replace, of course). But I suspect, that here is some simple "misunderstanding" between geom_raid5 code and VFS/FFS2 layer, which I could fix. But I don't know where to look at. --=20 // Black Lion AKA Lev Serebryakov From owner-freebsd-fs@FreeBSD.ORG Wed Aug 31 08:19:24 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A81F11065C9D for ; Wed, 31 Aug 2011 08:19:24 +0000 (UTC) (envelope-from lev@FreeBSD.org) Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru [IPv6:2a01:4f8:131:60a2::2]) by mx1.freebsd.org (Postfix) with ESMTP id 6DB1B8FC1C for ; Wed, 31 Aug 2011 08:19:24 +0000 (UTC) Received: from lion.home.serebryakov.spb.ru (unknown [IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c]) (Authenticated sender: lev@serebryakov.spb.ru) by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id 4828D4AC31; Wed, 31 Aug 2011 12:19:23 +0400 (MSD) Date: Wed, 31 Aug 2011 12:19:20 +0400 From: Lev Serebryakov Organization: FreeBSD X-Priority: 3 (Normal) Message-ID: <10310173523.20110831121920@serebryakov.spb.ru> To: Daniel Kalchev In-Reply-To: <4E5DDE66.5000508@digsys.bg> References: <1945418039.20110830231024@serebryakov.spb.ru> <317753422.20110830231815@serebryakov.spb.ru> <20110831004251.GA89979@icarus.home.lan> <4E5DDE66.5000508@digsys.bg> MIME-Version: 1.0 Content-Type: text/plain; charset=windows-1251 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: Very inconsistent (read) speed on UFS2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: lev@FreeBSD.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Aug 2011 08:19:24 -0000 Hello, Daniel. You wrote 31 =E0=E2=E3=F3=F1=F2=E0 2011 =E3., 11:10:30: > This is very probable, given RAID5, that needs to read stripes off every > disc on every read. Again: how faulty drive could give inconsistency in one file? If it has relocated sectors (additional long seek in some place), it will give consistent performance degradation in that place. > Probably some S.M.A.R.T. investigation might help (different=20 > measurements for different drives, if supported by the drives, that is). SMARTs are almost identical and exclellent. No relocated sectors (at all!) not multizone read errors (at all!), etc. I'll try to synchronize iostat statistic and reading benchmark, but it is hard to do, as I can not think of the way how to know, that these 100 lines of iostat output is exactly this (slow) reading of this file, when there is 10+ files in question and all of them, are read 10+ times. --=20 // Black Lion AKA Lev Serebryakov From owner-freebsd-fs@FreeBSD.ORG Wed Aug 31 08:36:28 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 57544106564A for ; Wed, 31 Aug 2011 08:36:28 +0000 (UTC) (envelope-from lev@FreeBSD.org) Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru [IPv6:2a01:4f8:131:60a2::2]) by mx1.freebsd.org (Postfix) with ESMTP id 06D688FC13 for ; Wed, 31 Aug 2011 08:36:28 +0000 (UTC) Received: from lion.home.serebryakov.spb.ru (unknown [IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c]) (Authenticated sender: lev@serebryakov.spb.ru) by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id AC6074AC31; Wed, 31 Aug 2011 12:36:26 +0400 (MSD) Date: Wed, 31 Aug 2011 12:36:23 +0400 From: Lev Serebryakov Organization: FreeBSD X-Priority: 3 (Normal) Message-ID: <147623060.20110831123623@serebryakov.spb.ru> To: Jeremy Chadwick In-Reply-To: <20110831004251.GA89979@icarus.home.lan> References: <1945418039.20110830231024@serebryakov.spb.ru> <317753422.20110830231815@serebryakov.spb.ru> <20110831004251.GA89979@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=windows-1251 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: Very inconsistent (read) speed on UFS2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: lev@FreeBSD.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Aug 2011 08:36:28 -0000 Hello, Jeremy. You wrote 31 =E0=E2=E3=F3=F1=F2=E0 2011 =E3., 4:42:51: > Furthermore, why are these benchmarks not providing speed data > per-device (e.g. gstat or iostat -x data)? There is a possibility that > one of your drives could be performing at less-than-ideal rates (yes, > intermittently) and therefore impacts (intermittently) your overall I/O > throughput. Ok. I've run my benchamrk when `iostat -x -d -c 999999' is running. Results are like this: device r/s w/s kr/s kw/s wait svc_t %b ada1 340.9 292.9 43138.8 146.5 0 1.2 42 ada2 340.9 293.9 43138.8 147.0 0 1.9 63 ada3 340.9 292.9 43044.7 146.5 0 1.5 57 ada4 341.9 292.9 43232.9 146.5 0 1.3 42 ada5 341.9 292.0 43138.8 146.0 2 1.3 40 Yes, NUMBERS are different from sample to sample and oscillate from 16MB/s to 80Mb/s, but they VERY consistent among disks in question. Slow read? All disks work slowly. Fast read? All disks work fast. I don't like this low-level speed oscillation too. I understand, that something higher on stack cause it. And want to understand -- WHAT. What additionally surprise me: 1) benchmark induce some writing. atime modification? No, I've turned this one off, but it doesn't help. I afraid, that this read-write interleaving could be cause of "problems", but I don't understand, WHY here is some writing (1 writing per 2 reads in average) when read-only benchmark runs. It doesn't write any logs, etc. Yes, writing speed is very low, every write transaction is about 2Kb, but WHY they are here?! If I stop benchmark, here will be less than 1 write transaction per second. 2) without `-x' it shows, that typical read transaction size is about 50Kb. It is very strange, as geom_raid5 shows (I have diagnostics in it), that almost all file access is aligned and is 128Kb-sized... P.S. Several samples for example of consistency in ONE sample and inconsistency BETWEEN samples. Random pick from output, no editing, they were in exact this order: extended device statistics device r/s w/s kr/s kw/s wait svc_t %b ada1 165.3 87.0 10515.9 43.5 2 5.0 50 ada2 165.3 87.0 10547.2 43.5 2 7.7 61 ada3 167.2 87.0 10703.7 43.5 1 6.1 55 ada4 165.3 87.0 10484.6 43.5 3 4.9 44 ada5 160.4 87.0 10265.5 43.5 5 5.1 48 extended device statistics device r/s w/s kr/s kw/s wait svc_t %b ada1 884.1 350.9 56583.1 175.4 0 1.0 49 ada2 886.1 350.9 56677.2 175.4 0 1.3 58 ada3 882.2 349.9 56489.0 175.0 2 1.7 63 ada4 885.1 350.9 56614.5 175.4 0 1.4 64 ada5 887.1 350.9 56739.9 175.4 0 1.5 63 extended device statistics device r/s w/s kr/s kw/s wait svc_t %b ada1 640.6 261.5 41001.3 130.8 0 0.9 40 ada2 639.7 261.5 40969.9 130.8 0 0.9 35 ada3 637.7 262.5 40844.5 131.3 0 1.5 46 ada4 640.6 260.6 41001.3 130.3 1 1.3 65 ada5 638.7 261.5 40875.9 130.8 0 1.3 46 extended device statistics device r/s w/s kr/s kw/s wait svc_t %b ada1 243.7 102.8 15660.2 51.4 2 1.9 36 ada2 240.8 102.8 15503.6 51.4 3 1.9 43 ada3 242.7 103.7 15566.2 51.9 0 1.9 30 ada4 244.7 103.7 15785.5 51.9 2 2.4 56 ada5 243.7 102.8 15566.2 51.4 2 1.8 30 --=20 // Black Lion AKA Lev Serebryakov From owner-freebsd-fs@FreeBSD.ORG Wed Aug 31 08:48:42 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6FC71106564A for ; Wed, 31 Aug 2011 08:48:42 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230]) by mx1.freebsd.org (Postfix) with ESMTP id F09158FC14 for ; Wed, 31 Aug 2011 08:48:41 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p7V8mXtE067659 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Wed, 31 Aug 2011 11:48:38 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <4E5DF560.1050507@digsys.bg> Date: Wed, 31 Aug 2011 11:48:32 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:6.0) Gecko/20110822 Thunderbird/6.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <1945418039.20110830231024@serebryakov.spb.ru> <317753422.20110830231815@serebryakov.spb.ru> <20110831004251.GA89979@icarus.home.lan> <147623060.20110831123623@serebryakov.spb.ru> In-Reply-To: <147623060.20110831123623@serebryakov.spb.ru> Content-Type: text/plain; charset=windows-1251; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: Very inconsistent (read) speed on UFS2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Aug 2011 08:48:42 -0000 On 31.08.11 11:36, Lev Serebryakov wrote: > device r/s w/s kr/s kw/s wait svc_t %b > ada1 340.9 292.9 43138.8 146.5 0 1.2 42 > ada2 340.9 293.9 43138.8 147.0 0 1.9 63 > ada3 340.9 292.9 43044.7 146.5 0 1.5 57 > ada4 341.9 292.9 43232.9 146.5 0 1.3 42 > ada5 341.9 292.0 43138.8 146.0 2 1.3 40 > Very interesting, this writes. You need to find out what is causing these. Just some random thoughts: This flapping may have something to do with the drives' internal caches. What are the drives? SATA drives, unlike SAS have simplex communication with the host, that is, the drive cannot simultaneously read and write data and commands (from/to host). There might be some, perhaps locking contention in there? It is not contention for bandwidth obviously. Most consumer drives have rather low IOPS performance. This is especially pronounced when there are both reads and writes. Your IOPS rate here is relatively high for such drive, although the busy percentage is low -- but then, it may not be accurate. In any case, you cannot measure read performance as long as it intermixes with writes, especially as you noted that your RAID5 code has some non-obvious write characteristics/optimizations. Daniel From owner-freebsd-fs@FreeBSD.ORG Wed Aug 31 09:03:58 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CFA301065670 for ; Wed, 31 Aug 2011 09:03:58 +0000 (UTC) (envelope-from lev@FreeBSD.org) Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru [IPv6:2a01:4f8:131:60a2::2]) by mx1.freebsd.org (Postfix) with ESMTP id 944008FC19 for ; Wed, 31 Aug 2011 09:03:58 +0000 (UTC) Received: from lion.home.serebryakov.spb.ru (unknown [IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c]) (Authenticated sender: lev@serebryakov.spb.ru) by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id DDBFD4AC31; Wed, 31 Aug 2011 13:03:56 +0400 (MSD) Date: Wed, 31 Aug 2011 13:03:54 +0400 From: Lev Serebryakov Organization: FreeBSD X-Priority: 3 (Normal) Message-ID: <177519198.20110831130354@serebryakov.spb.ru> To: Daniel Kalchev In-Reply-To: <4E5DF560.1050507@digsys.bg> References: <1945418039.20110830231024@serebryakov.spb.ru> <317753422.20110830231815@serebryakov.spb.ru> <20110831004251.GA89979@icarus.home.lan> <147623060.20110831123623@serebryakov.spb.ru> <4E5DF560.1050507@digsys.bg> MIME-Version: 1.0 Content-Type: text/plain; charset=windows-1251 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: Very inconsistent (read) speed on UFS2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: lev@FreeBSD.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Aug 2011 09:03:58 -0000 Hello, Daniel. You wrote 31 =E0=E2=E3=F3=F1=F2=E0 2011 =E3., 12:48:32: > On 31.08.11 11:36, Lev Serebryakov wrote: >> device r/s w/s kr/s kw/s wait svc_t %b >> ada1 340.9 292.9 43138.8 146.5 0 1.2 42 >> ada2 340.9 293.9 43138.8 147.0 0 1.9 63 >> ada3 340.9 292.9 43044.7 146.5 0 1.5 57 >> ada4 341.9 292.9 43232.9 146.5 0 1.3 42 >> ada5 341.9 292.0 43138.8 146.0 2 1.3 40 >> > Very interesting, this writes. You need to find out what is causing these. Yep. I've been very surprised by them. > Just some random thoughts: > This flapping may have something to do with the drives' internal caches. > What are the drives? WD20EARS, it is WD Green 2Tb, advanced format. Yes, I know, that they are not best performers at all, when here are seeks. It is why I don't expect good performance in random or multi-threaded (multi-client) access patterns here. And, yes, I know about advanced format. Stripe size is 128Kb, and GEOM is built from raw drives, so all stripes are aligned. FS is created on raw GEOM, without any partitioning again, and block size is 32Kb, so everything should be aligned here too. Really, if all reading speeds were, say, 120MiB/s, but every time and consistent, I don't start this thread. In case I would blame HDDs and my parsimony, but not software :) > SATA drives, unlike SAS have simplex communication with the host, that > is, the drive cannot simultaneously read and write data and commands=20 > (from/to host). There might be some, perhaps locking contention in=20 > there? It is not contention for bandwidth obviously. Yep... > In any case, you cannot measure read performance as long as it > intermixes with writes, especially as you noted that your RAID5 code has > some non-obvious write characteristics/optimizations. I understand. Now I should understand how to pin down these writes. --=20 // Black Lion AKA Lev Serebryakov From owner-freebsd-fs@FreeBSD.ORG Wed Aug 31 10:12:14 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 24B311065672 for ; Wed, 31 Aug 2011 10:12:14 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta08.westchester.pa.mail.comcast.net (qmta08.westchester.pa.mail.comcast.net [76.96.62.80]) by mx1.freebsd.org (Postfix) with ESMTP id D849C8FC19 for ; Wed, 31 Aug 2011 10:12:13 +0000 (UTC) Received: from omta20.westchester.pa.mail.comcast.net ([76.96.62.71]) by qmta08.westchester.pa.mail.comcast.net with comcast id SyCE1h0011YDfWL58yCEkt; Wed, 31 Aug 2011 10:12:14 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta20.westchester.pa.mail.comcast.net with comcast id SyCC1h00K1t3BNj3gyCDSB; Wed, 31 Aug 2011 10:12:13 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 4AB70102C36; Wed, 31 Aug 2011 03:12:11 -0700 (PDT) Date: Wed, 31 Aug 2011 03:12:11 -0700 From: Jeremy Chadwick To: Lev Serebryakov Message-ID: <20110831101211.GA98865@icarus.home.lan> References: <1945418039.20110830231024@serebryakov.spb.ru> <317753422.20110830231815@serebryakov.spb.ru> <20110831004251.GA89979@icarus.home.lan> <147623060.20110831123623@serebryakov.spb.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <147623060.20110831123623@serebryakov.spb.ru> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: Very inconsistent (read) speed on UFS2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Aug 2011 10:12:14 -0000 On Wed, Aug 31, 2011 at 12:36:23PM +0400, Lev Serebryakov wrote: > Hello, Jeremy. > You wrote 31 ??????? 2011 ?., 4:42:51: > > > Furthermore, why are these benchmarks not providing speed data > > per-device (e.g. gstat or iostat -x data)? There is a possibility that > > one of your drives could be performing at less-than-ideal rates (yes, > > intermittently) and therefore impacts (intermittently) your overall I/O > > throughput. > Ok. I've run my benchamrk when `iostat -x -d -c 999999' is running. > Results are like this: > > device r/s w/s kr/s kw/s wait svc_t %b > ada1 340.9 292.9 43138.8 146.5 0 1.2 42 > ada2 340.9 293.9 43138.8 147.0 0 1.9 63 > ada3 340.9 292.9 43044.7 146.5 0 1.5 57 > ada4 341.9 292.9 43232.9 146.5 0 1.3 42 > ada5 341.9 292.0 43138.8 146.0 2 1.3 40 > > {snipping text, focusing on data} > > device r/s w/s kr/s kw/s wait svc_t %b > ada1 165.3 87.0 10515.9 43.5 2 5.0 50 > ada2 165.3 87.0 10547.2 43.5 2 7.7 61 > ada3 167.2 87.0 10703.7 43.5 1 6.1 55 > ada4 165.3 87.0 10484.6 43.5 3 4.9 44 > ada5 160.4 87.0 10265.5 43.5 5 5.1 48 > > device r/s w/s kr/s kw/s wait svc_t %b > ada1 884.1 350.9 56583.1 175.4 0 1.0 49 > ada2 886.1 350.9 56677.2 175.4 0 1.3 58 > ada3 882.2 349.9 56489.0 175.0 2 1.7 63 > ada4 885.1 350.9 56614.5 175.4 0 1.4 64 > ada5 887.1 350.9 56739.9 175.4 0 1.5 63 > > device r/s w/s kr/s kw/s wait svc_t %b > ada1 640.6 261.5 41001.3 130.8 0 0.9 40 > ada2 639.7 261.5 40969.9 130.8 0 0.9 35 > ada3 637.7 262.5 40844.5 131.3 0 1.5 46 > ada4 640.6 260.6 41001.3 130.3 1 1.3 65 > ada5 638.7 261.5 40875.9 130.8 0 1.3 46 > > device r/s w/s kr/s kw/s wait svc_t %b > ada1 243.7 102.8 15660.2 51.4 2 1.9 36 > ada2 240.8 102.8 15503.6 51.4 3 1.9 43 > ada3 242.7 103.7 15566.2 51.9 0 1.9 30 > ada4 244.7 103.7 15785.5 51.9 2 2.4 56 > ada5 243.7 102.8 15566.2 51.4 2 1.8 30 This benchmark data is more or less unhelpful due to the fact that there are writes occurring during the middle of your reads. There's another spun-off portion of this thread that is discussing how you're benchmarking these things (specifically some code you wrote?). I don't know what else to say in this regard. It would really help if you could use something like bonnie++ and make sure the filesystem is not being used by ANYTHING during your benchmarks. Anyway, the data is interesting because from an aggregate total perspective, you're hitting some arbitrary limit on all of your devices which almost indicates memory bus throttling or something along those lines; CPU time? I really don't know. Aggregate write speeds respectively: 43138.8 + 43138.8 + 43044.7 + 43232.9 + 43138.8 == 215694.0 KByte/sec 10515.9 + 10547.2 + 10703.7 + 10484.6 + 10265.5 == 52516.9 KByte/sec 56583.1 + 56677.2 + 56489.0 + 56614.5 + 56739.9 == 283103.7 KByte/sec 41001.3 + 40969.9 + 40844.5 + 41001.3 + 40875.9 == 204692.9 KByte/sec 15660.2 + 15503.6 + 15566.2 + 15785.5 + 15566.2 == 78081.7 KByte/sec The totals are "all over the place", but what interests me the most is that the total aggregate never exceeds an amount that's slightly under 300MBytes/sec.. That number has some relevance if, say, you're using a port multiplier (5 devices aggregated across one SATA300 port). Despite these being WD20EARS drives (4 platters, ugh!), these individual devices should be able to push 75-90MBytes/sec writes, and slightly higher reads. Like you, it also interests me that all the drives behave the same; meaning all speeds are roughly the same on all 5 devices simultaneously, regardless of speed/rate/throughput. Here's an idea: can you stop using the filesystem for a bit and instead do raw dd's from all of the /dev/adaX entries to /dev/null simultaneously (pick something like bs=64k or bs=256k), then run your iostats? I'm basically trying to figure out if the bad speeds are actually the devices themselves or if it's the geom_raid5 stuff. You get where I'm going with this. If 5 simultaneously dds reading from the drives is very fast (way faster than the above) and there aren't sporadic drops in performance which aren't caused by writes (hence my "stop using the filesystem" comment), then I think we've narrowed down where the issue lies -- not the drives. > 1) benchmark induce some writing. atime modification? No, I've turned > this one off, but it doesn't help. I afraid, that this read-write > interleaving could be cause of "problems", but I don't understand, > WHY here is some writing (1 writing per 2 reads in average) when > read-only benchmark runs. It doesn't write any logs, etc. Yes, > writing speed is very low, every write transaction is about 2Kb, > but WHY they are here?! If I stop benchmark, here will be less than > 1 write transaction per second. (Note: I'm going to assume by "Kb" you mean "kilobytes" and not "kilobits"; B = byte, b = bit. This is why I got into the habit of just writing out the unit in full, because too many people try to shorthand it and pick the wrong one. And it'll be a cold day in hell before I ever use "XXbi" (e.g. kibi, mebi, gibi, tebi)) The dd method I describe should absolutely not induce writes, hence my recommendation. If writes are seen during the dd's, then either the filesystem is mounted and FreeBSD is doing something "interesting" on a filesystem or vfs level, or your system is actually an izbushka..... Maybe softupdates are somehow responsible? Not sure. > 2) without `-x' it shows, that typical read transaction size is > about 50Kb. It is very strange, as geom_raid5 shows (I have > diagnostics in it), that almost all file access is aligned and is > 128Kb-sized... I'm not sure -- please take what I say here with a grain of salt -- but I believe there was a recent discussion on -stable or -fs about some sort of 64KByte "limit" within UFS/UFS2 somewhere? I think I'm thinking of "MAX_BSIZE". I'm having a lot of difficulty following all these storage-related threads. Everyone seems to show up "in bulk" on the mailing lists all at once and it's overwhelming at times. I'm getting old, in more ways than one. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, US | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Wed Aug 31 11:37:30 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 197D31065672 for ; Wed, 31 Aug 2011 11:37:30 +0000 (UTC) (envelope-from lev@serebryakov.spb.ru) Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru [IPv6:2a01:4f8:131:60a2::2]) by mx1.freebsd.org (Postfix) with ESMTP id 964438FC08 for ; Wed, 31 Aug 2011 11:37:29 +0000 (UTC) Received: from lion.home.serebryakov.spb.ru (unknown [IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c]) (Authenticated sender: lev@serebryakov.spb.ru) by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id 78F4A4AC31; Wed, 31 Aug 2011 15:37:27 +0400 (MSD) Date: Wed, 31 Aug 2011 15:37:24 +0400 From: Lev Serebryakov X-Priority: 3 (Normal) Message-ID: <981083303.20110831153724@serebryakov.spb.ru> To: Jeremy Chadwick In-Reply-To: <20110831101211.GA98865@icarus.home.lan> References: <1945418039.20110830231024@serebryakov.spb.ru> <317753422.20110830231815@serebryakov.spb.ru> <20110831004251.GA89979@icarus.home.lan> <147623060.20110831123623@serebryakov.spb.ru> <20110831101211.GA98865@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=windows-1251 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: Very inconsistent (read) speed on UFS2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Aug 2011 11:37:30 -0000 Hello, Jeremy. You wrote 31 =E0=E2=E3=F3=F1=F2=E0 2011 =E3., 14:12:11: > This benchmark data is more or less unhelpful due to the fact that there > are writes occurring during the middle of your reads. There's another Yep :( > spun-off portion of this thread that is discussing how you're > benchmarking these things (specifically some code you wrote?). I don't > know what else to say in this regard. It would really help if you could > use something like bonnie++ and make sure the filesystem is not being > used by ANYTHING during your benchmarks. I'll try bonnie++, Ok. My code is really as simple as it could be: fd =3D open(fileName, O_RDONLY | O_DIRECT); gettimeofday(&start, NULL); /* s_BufferSize is 128KiB */ while ((rd =3D read(fd, s_Buffer, s_BufferSize)) > 0) size +=3D rd; gettimeofday(&end, NULL); close(fd); > Anyway, the data is interesting because from an aggregate total > perspective, you're hitting some arbitrary limit on all of your devices > which almost indicates memory bus throttling or something along those > lines; CPU time? I really don't know. Aggregate write speeds > respectively: > 43138.8 + 43138.8 + 43044.7 + 43232.9 + 43138.8 =3D=3D 215694.0 KByte/sec > 10515.9 + 10547.2 + 10703.7 + 10484.6 + 10265.5 =3D=3D 52516.9 KByte/sec > 56583.1 + 56677.2 + 56489.0 + 56614.5 + 56739.9 =3D=3D 283103.7 KByte/sec > 41001.3 + 40969.9 + 40844.5 + 41001.3 + 40875.9 =3D=3D 204692.9 KByte/sec > 15660.2 + 15503.6 + 15566.2 + 15785.5 + 15566.2 =3D=3D 78081.7 KByte/sec > The totals are "all over the place", but what interests me the most is > that the total aggregate never exceeds an amount that's slightly under > 300MBytes/sec.. That number has some relevance if, say, you're using a > port multiplier (5 devices aggregated across one SATA300 port). No. All drives are on separate ports of ICH9R chipset controller. And, yes, sustained and constant 300MiB/s is my dream :) Keywords: sustained and constant. > Despite these being WD20EARS drives (4 platters, ugh!), these individual As ffar as I understand, 4 platters are slightly better in linear access than 3 platters, but worse in random access, as it read more data without heads movement. > devices should be able to push 75-90MBytes/sec writes, and slightly > higher reads. Read is about 110MiB/s at beginning of drive. > Here's an idea: can you stop using the filesystem for a bit and instead > do raw dd's from all of the /dev/adaX entries to /dev/null > simultaneously (pick something like bs=3D64k or bs=3D256k), then run your > iostats? I'm basically trying to figure out if the bad speeds are > actually the devices themselves or if it's the geom_raid5 stuff. You > get where I'm going with this. Not a problem! FS is unmounted, and after that: # for d in 1 2 3 4 5 ; do dd if=3D/dev/ada$d of=3D/dev/null bs=3D64k & done # iostat -c 999999 -dx ada1 ada2 ada3 ada4 ada5 device r/s w/s kr/s kw/s wait svc_t %b ada1 1849.1 0.0 118343.7 0.0 1 0.5 93 ada2 1920.3 0.0 122900.2 0.0 0 0.5 94 ada3 1874.5 0.0 119966.6 0.0 1 0.5 94 ada4 1794.5 0.0 114848.4 0.0 1 0.5 94 ada5 1893.0 0.0 121152.5 0.0 1 0.5 93 It is very typical data, speed slightly goes up and down for all HDDs without any visible fastest or slowest drive. > If 5 simultaneously dds reading from the drives is very fast (way faster > than the above) and there aren't sporadic drops in performance which > aren't caused by writes (hence my "stop using the filesystem" comment), > then I think we've narrowed down where the issue lies -- not the drives. Yep. It seems to be exactly like this. > The dd method I describe should absolutely not induce writes, hence my > recommendation. If writes are seen during the dd's, then either the > filesystem is mounted and FreeBSD is doing something "interesting" on a > filesystem or vfs level, or your system is actually an izbushka..... > Maybe softupdates are somehow responsible? Not sure. I have one ide about geom_raid5 writes... I need to check it. --=20 // Black Lion AKA Lev Serebryakov From owner-freebsd-fs@FreeBSD.ORG Wed Aug 31 11:47:26 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DB188106564A for ; Wed, 31 Aug 2011 11:47:26 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230]) by mx1.freebsd.org (Postfix) with ESMTP id 62B978FC16 for ; Wed, 31 Aug 2011 11:47:26 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p7VBlHUT070526 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Wed, 31 Aug 2011 14:47:22 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <4E5E1F44.8020603@digsys.bg> Date: Wed, 31 Aug 2011 14:47:16 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:6.0) Gecko/20110822 Thunderbird/6.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <1945418039.20110830231024@serebryakov.spb.ru> <317753422.20110830231815@serebryakov.spb.ru> <20110831004251.GA89979@icarus.home.lan> <147623060.20110831123623@serebryakov.spb.ru> <20110831101211.GA98865@icarus.home.lan> <981083303.20110831153724@serebryakov.spb.ru> In-Reply-To: <981083303.20110831153724@serebryakov.spb.ru> Content-Type: text/plain; charset=windows-1251; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: Very inconsistent (read) speed on UFS2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Aug 2011 11:47:26 -0000 On 31.08.11 14:37, Lev Serebryakov wrote: > >> If 5 simultaneously dds reading from the drives is very fast (way faster >> than the above) and there aren't sporadic drops in performance which >> aren't caused by writes (hence my "stop using the filesystem" comment), >> then I think we've narrowed down where the issue lies -- not the drives. > Yep. It seems to be exactly like this. > This test does not rule out drive IOPS limits. Or drive cache trashing. If you tell the drive to continuously read, or write mots of these IOs is served from/to drive cache, thus such large number of IOPS. More that the drive could handle if it has to move heads. Not saying this is the case, but things may be as simple as filling up the write cache and the drive deciding to flush it out to platters, thus reducing read rate. These are desktop drives, apparently designed for non-threaded applications. "raw" read/write speeds may be high, but higher-performing drives at much higher price points offer much more performance, even at lower "raw" read/write rates. Just spending more for smarter controller. Eliminate the writes and the drives might be worth their salt. Daniel From owner-freebsd-fs@FreeBSD.ORG Wed Aug 31 12:49:38 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6ED3F106566B; Wed, 31 Aug 2011 12:49:38 +0000 (UTC) (envelope-from lev@FreeBSD.org) Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru [IPv6:2a01:4f8:131:60a2::2]) by mx1.freebsd.org (Postfix) with ESMTP id 0B9378FC12; Wed, 31 Aug 2011 12:49:38 +0000 (UTC) Received: from lion.home.serebryakov.spb.ru (unknown [IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c]) (Authenticated sender: lev@serebryakov.spb.ru) by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id 6E1074AC31; Wed, 31 Aug 2011 16:49:36 +0400 (MSD) Date: Wed, 31 Aug 2011 16:49:33 +0400 From: Lev Serebryakov Organization: FreeBSD X-Priority: 3 (Normal) Message-ID: <809344970.20110831164933@serebryakov.spb.ru> To: Jeremy Chadwick In-Reply-To: <20110831101211.GA98865@icarus.home.lan> References: <1945418039.20110830231024@serebryakov.spb.ru> <317753422.20110830231815@serebryakov.spb.ru> <20110831004251.GA89979@icarus.home.lan> <147623060.20110831123623@serebryakov.spb.ru> <20110831101211.GA98865@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=windows-1251 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, Lev Serebryakov Subject: Re: Very inconsistent (read) speed on UFS2 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: lev@FreeBSD.org List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Aug 2011 12:49:38 -0000 Hello, Jeremy. You wrote 31 =E0=E2=E3=F3=F1=F2=E0 2011 =E3., 14:12:11: > The dd method I describe should absolutely not induce writes, hence my > recommendation. If writes are seen during the dd's, then either the > filesystem is mounted and FreeBSD is doing something "interesting" on a > filesystem or vfs level, or your system is actually an izbushka..... I've eliminate writes. It was RAID5 metadata updates due to very paranoid check for "new metadata". It doesn't hurt in terms of dtaa safety, but not in terms of speed. Ok, now results are MUCH MORE consistent, and is about 50% of theoretical maximum on average. Looks good. SLOWEST (by Average) files: Name Min/Avg/Max StdDev r007f05.nef 205/230/242 MiB/s 12 r008f06.nef 215/234/254 MiB/s 14 r018f10.nef 218/235/258 MiB/s 13 r013f09.nef 230/243/256 MiB/s 9 r013f11.nef 236/243/249 MiB/s 4 r008f10.nef 238/243/249 MiB/s 3 r015f04.nef 220/244/265 MiB/s 17 r011f04.nef 240/245/256 MiB/s 5 r015f05.nef 221/248/286 MiB/s 24 r008f09.nef 231/250/266 MiB/s 11 MOST UNSTABLE files: Name Min/Avg/Max StdDev r008f12.nef 291/327/377 MiB/s 38 r021f06.nef 307/382/404 MiB/s 37 r021f02.nef 253/295/346 MiB/s 34 r013f08.nef 264/329/352 MiB/s 33 r012f05.nef 298/354/398 MiB/s 32 r020f05.nef 305/357/388 MiB/s 30 r020f03.nef 292/316/376 MiB/s 30 r022f06.nef 284/319/371 MiB/s 30 r010f12.nef 303/346/377 MiB/s 29 r013f06.nef 285/329/365 MiB/s 29 --=20 // Black Lion AKA Lev Serebryakov From owner-freebsd-fs@FreeBSD.ORG Wed Aug 31 12:49:57 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E70591065672; Wed, 31 Aug 2011 12:49:56 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 6B39E8FC21; Wed, 31 Aug 2011 12:49:55 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap8EAKssXk6DaFvO/2dsb2JhbABDFoQ2pHOBQAEBBAEjBFIFFAIOCgICDRkCWQaIBQSnNpILgSyEGIERBJMlkSM X-IronPort-AV: E=Sophos;i="4.68,307,1312171200"; d="scan'208";a="136074187" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 31 Aug 2011 08:49:55 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 3BFC2B3F27; Wed, 31 Aug 2011 08:49:55 -0400 (EDT) Date: Wed, 31 Aug 2011 08:49:55 -0400 (EDT) From: Rick Macklem To: George Liaskos Message-ID: <382461010.589453.1314794995233.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek Subject: Re: NFSv4: After upgrade to 9 users can no longer list files. (sounds like a ZFS issue?) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Aug 2011 12:49:57 -0000 George Liaskos wrote: > > You could try this patch and see what effect it has (applied to the > > server). It just disables the access check for readdir. > > --- nfs_nfsdport.c.sav2 2011-08-30 10:35:58.000000000 -0400 > > +++ nfs_nfsdport.c 2011-08-30 10:36:54.000000000 -0400 > > @@ -1838,10 +1838,12 @@ nfsrvd_readdirplus(struct nfsrv_descript > > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0nd->nd_repstat = =3D NFSERR_NOTDIR; > > =C2=A0 =C2=A0 =C2=A0 =C2=A0if (!nd->nd_repstat && cnt =3D=3D 0) > > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0nd->nd_repstat = =3D NFSERR_TOOSMALL; > > +#ifdef notnow > > =C2=A0 =C2=A0 =C2=A0 =C2=A0if (!nd->nd_repstat) > > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0nd->nd_repstat = =3D nfsvno_accchk(vp, VEXEC, > > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0nd= ->nd_cred, exp, p, NFSACCCHK_NOOVERRIDE, > > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0NF= SACCCHK_VPISLOCKED, NULL); > > +#endif > > =C2=A0 =C2=A0 =C2=A0 =C2=A0if (nd->nd_repstat) { > > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0vput(vp); > > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (nd->nd_flag = & ND_NFSV3) > > > > This wouldn't be suitable for a production system, but whether or > > not it "fixes" the problem would give us an indication of where the > > problem is. > > > > Also, if you could clarify when your 8/stable was downloaded, > > whether > > your 9.0 upgrade was to vanilla Beta1 or ??? and details w.r.t. your > > ZFS setup, that might help. >=20 > I use svn, unfortunately i don't remember exactly when i moved from > 8.2 to stable. I synced with CURRENT last week and this issue > appeared, i did a second update to beta 2 [r225237] with the same > results. >=20 > The patch didn't make any difference. I downloaded an ISO with BETA-1 > and > made a VM installation, i was not able to reproduce this. >=20 > Updated one of the clients to r225237, setup some nfs exports on top > of ZFS > and ls does not work for non root users. I created a new pool on top > of a memory fs > to test this. >=20 > Next, i "downgraded" the server to BETA-1 [r224413] and everything is > back to normal. Ok, so it sounds like a post-Beta1 server issue. Did I get that correct? > So there's a bug which was introduced somewhere between BETA-1 && > BETA-2 :p >=20 Well, I can't imagine why this would matter, but you can try this patch, which fixes a problem introduced by r224810 where Lookup ".." no longer works. (It's at http://people.freebsd.org/~rmacklem/dotdot.patch, in case the white space gets munged.) Index: fs/nfsserver/nfs_nfsdport.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- fs/nfsserver/nfs_nfsdport.c=09(revision 225270) +++ fs/nfsserver/nfs_nfsdport.c=09(working copy) @@ -282,6 +282,7 @@ nfsvno_namei(struct nfsrv_descript *nd, struct nam =20 =09*retdirp =3D NULL; =09cnp->cn_nameptr =3D cnp->cn_pnbuf; +=09ndp->ni_strictrelative =3D 0; =09/* =09 * Extract and set starting directory. =09 */ Index: nfsserver/nfs_serv.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- nfsserver/nfs_serv.c=09(revision 225270) +++ nfsserver/nfs_serv.c=09(working copy) @@ -157,6 +157,7 @@ ndclear(struct nameidata *nd) =09nd->ni_vp =3D NULL; =09nd->ni_dvp =3D NULL; =09nd->ni_startdir =3D NULL; +=09nd->ni_strictrelative =3D 0; } =20 /* rick > Thank you for your help! >=20 > Regards, > George From owner-freebsd-fs@FreeBSD.ORG Wed Aug 31 14:57:02 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A91B8106566C for ; Wed, 31 Aug 2011 14:57:02 +0000 (UTC) (envelope-from egrosbein@rdtc.ru) Received: from eg.sd.rdtc.ru (unknown [IPv6:2a03:3100:c:13::5]) by mx1.freebsd.org (Postfix) with ESMTP id F14B28FC13 for ; Wed, 31 Aug 2011 14:57:01 +0000 (UTC) Received: from eg.sd.rdtc.ru (localhost [127.0.0.1]) by eg.sd.rdtc.ru (8.14.5/8.14.5) with ESMTP id p7VEv0Rs045837 for ; Wed, 31 Aug 2011 21:57:00 +0700 (NOVST) (envelope-from egrosbein@rdtc.ru) Message-ID: <4E5E4BB7.1030307@rdtc.ru> Date: Wed, 31 Aug 2011 21:56:55 +0700 From: Eugene Grosbein User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; ru-RU; rv:1.9.2.13) Gecko/20110112 Thunderbird/3.1.7 MIME-Version: 1.0 To: fs@FreeBSD.org Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: Subject: Unfixable UFS2 corruption X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Aug 2011 14:57:02 -0000 Hi! Please CC: me as I'm not in the list. Long story short: my /usr/local UFS2 filesystem somehow got corrupted and "fsck -y" in single user mode does not fix it. Explanation: # ls -al /usr/local/obj/usr/local/src/secure/lib/libssh ls: : No such file or directory total 8 drwxr-xr-x 2 root wheel 4608 Aug 30 01:28 . drwxr-xr-x 3 root wheel 512 Aug 30 01:28 .. # rm -rf /usr/local/obj/usr/local/src/secure/lib/libssh rm: /usr/local/obj/usr/local/src/secure/lib/libssh: Directory not empty As I've said, I cold booted this FreeBSD 8.2-STABLE system to single user mode where all file systems are not mounted (except root) and ran fsck -y /usr/local It found no errors and said it is CLEAN. The problem still persists. I've written small program and it said me this directory contains third file (besides <.> and <..> entries) having zero file length. I got contents of the directory to plain file with "cat /usr/local/obj/usr/local/src/secure/lib/libssh > /tmp/libssh" and put it online: http://www.grosbein.net/crash/corruption/libssh Please help. The program and its output follow: #include #include #include #include int main(int argc, char* argv[]) { DIR *dirp; struct dirent *dp; unsigned i; if (argc<2) return 1; if ( (dirp = opendir(argv[1])) == NULL ) err (1, "opendir"); i = 0; while ((dp = readdir(dirp)) != NULL) { i++; printf("Entry %u:\n" "d_fileno=%u\n" "d_reclen=%u\n" "d_type=%u\n" "d_namlen=%u\n" "d_name=<%s>\n\n", i, (unsigned) dp->d_fileno, (unsigned) dp->d_reclen, (unsigned) dp->d_type, (unsigned) dp->d_namlen, (char *) dp->d_name); } return closedir(dirp); } # # ./readdir /usr/local/obj/usr/local/src/secure/lib/libssh Entry 1: d_fileno=1531227 d_reclen=12 d_type=4 d_namlen=1 d_name=<.> Entry 2: d_fileno=1389650 d_reclen=500 d_type=4 d_namlen=2 d_name=<..> Entry 3: d_fileno=24 d_reclen=512 d_type=8 d_namlen=0 d_name=<> From owner-freebsd-fs@FreeBSD.ORG Wed Aug 31 15:21:02 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0BCB6106564A; Wed, 31 Aug 2011 15:21:02 +0000 (UTC) (envelope-from egrosbein@rdtc.ru) Received: from eg.sd.rdtc.ru (unknown [IPv6:2a03:3100:c:13::5]) by mx1.freebsd.org (Postfix) with ESMTP id 6E3C98FC08; Wed, 31 Aug 2011 15:21:01 +0000 (UTC) Received: from eg.sd.rdtc.ru (localhost [127.0.0.1]) by eg.sd.rdtc.ru (8.14.5/8.14.5) with ESMTP id p7VFL0Fq045895; Wed, 31 Aug 2011 22:21:00 +0700 (NOVST) (envelope-from egrosbein@rdtc.ru) Message-ID: <4E5E5157.7050706@rdtc.ru> Date: Wed, 31 Aug 2011 22:20:55 +0700 From: Eugene Grosbein User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; ru-RU; rv:1.9.2.13) Gecko/20110112 Thunderbird/3.1.7 MIME-Version: 1.0 To: FreeBSD Stable , fs@freebsd.org References: <4E5E46B1.4070408@rdtc.ru> In-Reply-To: <4E5E46B1.4070408@rdtc.ru> Content-Type: text/plain; charset=KOI8-R Content-Transfer-Encoding: 8bit Cc: Subject: Re: Unfixable UFS2 corruption X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Aug 2011 15:21:02 -0000 31.08.2011 21:35, Eugene Grosbein ÐÉÛÅÔ: > # ls -al /usr/local/obj/usr/local/src/secure/lib/libssh > ls: : No such file or directory > total 8 > drwxr-xr-x 2 root wheel 4608 Aug 30 01:28 . > drwxr-xr-x 3 root wheel 512 Aug 30 01:28 .. > > # rm -rf /usr/local/obj/usr/local/src/secure/lib/libssh > rm: /usr/local/obj/usr/local/src/secure/lib/libssh: Directory not empty > > As I've said, I cold booted this FreeBSD 8.2-STABLE system to single user mode > where all file systems are not mounted (except root) and ran fsck -y /usr/local > It found no errors and said it is CLEAN. The problem still persists. > > I've written small program and it said me this directory contains third file > (besides <.> and <..> entries) having zero file length. Not file but file name length is zero. I've just found that dircheck() function in src/sbin/fsck_ffs/dir.c simply does not check if d_namlen is zero as it should, shouldn't it? Eugene Grosbein From owner-freebsd-fs@FreeBSD.ORG Wed Aug 31 16:09:17 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3DEB91065672 for ; Wed, 31 Aug 2011 16:09:17 +0000 (UTC) (envelope-from giffunip@tutopia.com) Received: from nm29-vm0.bullet.mail.sp2.yahoo.com (nm29-vm0.bullet.mail.sp2.yahoo.com [98.139.91.236]) by mx1.freebsd.org (Postfix) with SMTP id 18B938FC14 for ; Wed, 31 Aug 2011 16:09:17 +0000 (UTC) Received: from [98.139.91.61] by nm29.bullet.mail.sp2.yahoo.com with NNFMP; 31 Aug 2011 15:56:41 -0000 Received: from [98.139.91.21] by tm1.bullet.mail.sp2.yahoo.com with NNFMP; 31 Aug 2011 15:56:41 -0000 Received: from [127.0.0.1] by omp1021.mail.sp2.yahoo.com with NNFMP; 31 Aug 2011 15:56:41 -0000 X-Yahoo-Newman-Property: ymail-3 X-Yahoo-Newman-Id: 410340.39070.bm@omp1021.mail.sp2.yahoo.com Received: (qmail 15207 invoked by uid 60001); 31 Aug 2011 15:56:40 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1314806200; bh=fJYezCtiQWl6cGA+BETvdebccU6dyRIMVJRbdFL+cpk=; h=X-YMail-OSG:Received:X-RocketYMMF:X-Mailer:Message-ID:Date:From:Reply-To:Subject:To:MIME-Version:Content-Type; b=esQL7I59vm+TtB2Ie3BOWkY1JcBmzpsXOzMosHuAmGdLZDupTr353AQnr0q8g91yoXfjoxMuxy1mpDH0RvDSoxYkG8mH/vXa8mx+jm8zqL+gCC2T1YTL6jagdtHv5GCnrXE89rM5mwO27tez/OtaTWZNkWeI9kEe6UCuXnCRx1Y= X-YMail-OSG: p3BmoNQVM1kzsYp2jIJjzbiD4A4xCcIDVOAFZr9daKiupEy lEmAvyvNB_vXel2Isi2xncYvst7X3yXhzkEsAdD9pFryQIyDfjPEiQam2Da6 sJnCZ.bV5Rfkm20Si1SDTQ.xxoq681.UMMHBHLWgLvw5L5QNpwockUUBIj3g Z6rehx4Xci3qlIHZn1K.d_MhGOD_hLGidc50GmeM1QTJHGy1GLaYwA4ustPn Nh5Is81YIax4w.H3LIkei..fhQqb5SGKnzN4nU0lsXFYePPvmGcBADSVhgE4 MGWByTUdIxUM0BhMZR3MBh9Fs2dmlWGvIAxjJd4C06ivFDZyHN_MX8HCLXqR P4r7WwIAljwcvHNWujOx3rnekes8QXb4YxEB7d3LsMUe1gD.WqtuJ5PrIiE7 Tw5BHyWYVeYlKLnzA26AXHr.X0jIjZb9tD_AJbKdwbeD7mmtNK7NR3xZu7i4 FhPPX8V._65JJW30fJOvEY39nNM3_WZP87HV6mCo9BRG0mfXLB7QNsivS2yO 3Fk9Owxv9tD9xtaEz.reoQxgx5oM45eF0Q812F41cNh6i4OEPv70frhFJDX4 kHFNtoTObEe7ia0P6tyeF3IZeiQtsVWCAihJkRKS4PSPGTA-- Received: from [200.118.157.7] by web113507.mail.gq1.yahoo.com via HTTP; Wed, 31 Aug 2011 08:56:40 PDT X-RocketYMMF: giffunip X-Mailer: YahooMailClassic/14.0.5 YahooMailWebService/0.8.113.315625 Message-ID: <1314806200.14687.YahooMailClassic@web113507.mail.gq1.yahoo.com> Date: Wed, 31 Aug 2011 08:56:40 -0700 (PDT) From: "Pedro F. Giffuni" To: freebsd-fs@FreeBSD.org MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Subject: SEEK_DATA/SEEK_HOLE on UFS/EXT2FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: giffunip@tutopia.com List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Aug 2011 16:09:17 -0000 Hi; Just FYI, after reconsidering their position wrt NIH, the linux guys now think SEEK_DATA/SEEK_HOLE is wonderful: http://lwn.net/Articles/440255/ and NetBSD is known to be working on it too (latest patch): http://mail-index.netbsd.org/tech-kern/2011/08/17/msg011231.html I hope our own developers haven't forgotten that this is indeed a desired feature and that we get it for 10.0 or, if possible, 9.1. cheers, Pedro. From owner-freebsd-fs@FreeBSD.ORG Wed Aug 31 16:13:37 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 065D4106564A; Wed, 31 Aug 2011 16:13:37 +0000 (UTC) (envelope-from egrosbein@rdtc.ru) Received: from eg.sd.rdtc.ru (unknown [IPv6:2a03:3100:c:13::5]) by mx1.freebsd.org (Postfix) with ESMTP id 666268FC16; Wed, 31 Aug 2011 16:13:36 +0000 (UTC) Received: from eg.sd.rdtc.ru (localhost [127.0.0.1]) by eg.sd.rdtc.ru (8.14.5/8.14.5) with ESMTP id p7VGDWKZ046077; Wed, 31 Aug 2011 23:13:32 +0700 (NOVST) (envelope-from egrosbein@rdtc.ru) Message-ID: <4E5E5DA7.1010802@rdtc.ru> Date: Wed, 31 Aug 2011 23:13:27 +0700 From: Eugene Grosbein User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; ru-RU; rv:1.9.2.13) Gecko/20110112 Thunderbird/3.1.7 MIME-Version: 1.0 To: Adam Vande More References: <4E5E46B1.4070408@rdtc.ru> In-Reply-To: Content-Type: text/plain; charset=KOI8-R Content-Transfer-Encoding: 8bit Cc: stable@FreeBSD.org, fs@FreeBSD.org Subject: Re: Unfixable UFS2 corruption X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Aug 2011 16:13:37 -0000 31.08.2011 23:02, Adam Vande More ÐÉÛÅÔ: > Long story short: my /usr/local UFS2 filesystem somehow got corrupted > and "fsck -y" in single user mode does not fix it. > > Not sure if this helps or not but on rare occasion I've had to run fsck twice consecutively to fix a FS. Not this time - fsck does NOT find any problems in this file system. Now I think fsck_ffs needs a patch: --- sbin/fsck_ffs/dir.c.orig 2011-08-31 22:54:23.000000000 +0700 +++ sbin/fsck_ffs/dir.c 2011-08-31 22:54:48.000000000 +0700 @@ -225,7 +225,7 @@ type = dp->d_type; if (dp->d_reclen < size || idesc->id_filesize < size || - namlen > MAXNAMLEN || + namlen == 0 || namlen > MAXNAMLEN || type > 15) goto bad; for (cp = dp->d_name, size = 0; size < namlen; size++) Comments? Eugene Grosbein From owner-freebsd-fs@FreeBSD.ORG Wed Aug 31 16:24:11 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4A4A11065670; Wed, 31 Aug 2011 16:24:11 +0000 (UTC) (envelope-from egrosbein@rdtc.ru) Received: from eg.sd.rdtc.ru (unknown [IPv6:2a03:3100:c:13::5]) by mx1.freebsd.org (Postfix) with ESMTP id 920448FC1A; Wed, 31 Aug 2011 16:24:10 +0000 (UTC) Received: from eg.sd.rdtc.ru (localhost [127.0.0.1]) by eg.sd.rdtc.ru (8.14.5/8.14.5) with ESMTP id p7VGO9gM046130; Wed, 31 Aug 2011 23:24:09 +0700 (NOVST) (envelope-from egrosbein@rdtc.ru) Message-ID: <4E5E6024.3030708@rdtc.ru> Date: Wed, 31 Aug 2011 23:24:04 +0700 From: Eugene Grosbein User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; ru-RU; rv:1.9.2.13) Gecko/20110112 Thunderbird/3.1.7 MIME-Version: 1.0 References: <4E5E46B1.4070408@rdtc.ru> <4E5E5DA7.1010802@rdtc.ru> In-Reply-To: <4E5E5DA7.1010802@rdtc.ru> Content-Type: text/plain; charset=KOI8-R Content-Transfer-Encoding: 8bit Cc: stable@freebsd.org, fs@freebsd.org Subject: Re: Unfixable UFS2 corruption X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Aug 2011 16:24:11 -0000 31.08.2011 23:13, Eugene Grosbein ÐÉÛÅÔ: > 31.08.2011 23:02, Adam Vande More ÐÉÛÅÔ: > >> Long story short: my /usr/local UFS2 filesystem somehow got corrupted >> and "fsck -y" in single user mode does not fix it. >> >> Not sure if this helps or not but on rare occasion I've had to run fsck twice consecutively to fix a FS. > > Not this time - fsck does NOT find any problems in this file system. > > Now I think fsck_ffs needs a patch: > > --- sbin/fsck_ffs/dir.c.orig 2011-08-31 22:54:23.000000000 +0700 > +++ sbin/fsck_ffs/dir.c 2011-08-31 22:54:48.000000000 +0700 > @@ -225,7 +225,7 @@ > type = dp->d_type; > if (dp->d_reclen < size || > idesc->id_filesize < size || > - namlen > MAXNAMLEN || > + namlen == 0 || namlen > MAXNAMLEN || > type > 15) > goto bad; > for (cp = dp->d_name, size = 0; size < namlen; size++) > > > Comments? With this patch applied, my FS has finally been fixed by fsck: ** Last Mounted on /usr/local ** Phase 1 - Check Blocks and Sizes ** Phase 2 - Check Pathnames DIRECTORY CORRUPTED I=1531227 OWNER=root MODE=40755 SIZE=4608 MTIME=Aug 30 01:28 2011 DIR=/obj/usr/local/src/secure/lib/libssh SALVAGE? [yn] ** Phase 3 - Check Connectivity ** Phase 4 - Check Reference Counts LINK COUNT FILE I=24 OWNER=root MODE=100644 SIZE=892 MTIME=Sep 17 11:10 2010 COUNT 2 SHOULD BE 1 ADJUST? [yn] ** Phase 5 - Check Cyl groups 459580 files, 7411823 used, 7819495 free (105503 frags, 964249 blocks, 0.7% fragmentation) ***** FILE SYSTEM IS CLEAN ***** ***** FILE SYSTEM WAS MODIFIED ***** Should I fill PR? Eugene Grosbein From owner-freebsd-fs@FreeBSD.ORG Wed Aug 31 16:53:16 2011 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3311B106564A; Wed, 31 Aug 2011 16:53:16 +0000 (UTC) (envelope-from egrosbein@rdtc.ru) Received: from eg.sd.rdtc.ru (unknown [IPv6:2a03:3100:c:13::5]) by mx1.freebsd.org (Postfix) with ESMTP id 78FE78FC13; Wed, 31 Aug 2011 16:53:15 +0000 (UTC) Received: from eg.sd.rdtc.ru (localhost [127.0.0.1]) by eg.sd.rdtc.ru (8.14.5/8.14.5) with ESMTP id p7VGrEU3046226; Wed, 31 Aug 2011 23:53:14 +0700 (NOVST) (envelope-from egrosbein@rdtc.ru) Message-ID: <4E5E66F5.6090401@rdtc.ru> Date: Wed, 31 Aug 2011 23:53:09 +0700 From: Eugene Grosbein User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; ru-RU; rv:1.9.2.13) Gecko/20110112 Thunderbird/3.1.7 MIME-Version: 1.0 To: Adrian Chadd References: <4E5E46B1.4070408@rdtc.ru> <4E5E5DA7.1010802@rdtc.ru> In-Reply-To: Content-Type: text/plain; charset=KOI8-R Content-Transfer-Encoding: 8bit Cc: stable@freebsd.org, fs@freebsd.org Subject: Re: Unfixable UFS2 corruption X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Aug 2011 16:53:16 -0000 31.08.2011 23:34, Adrian Chadd ÐÉÛÅÔ: > Have you created a PR for this? http://www.freebsd.org/cgi/query-pr.cgi?pr=160339 Eugene Grosbein From owner-freebsd-fs@FreeBSD.ORG Wed Aug 31 19:57:25 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A411E1065675; Wed, 31 Aug 2011 19:57:25 +0000 (UTC) (envelope-from geo.liaskos@gmail.com) Received: from mail-qy0-f182.google.com (mail-qy0-f182.google.com [209.85.216.182]) by mx1.freebsd.org (Postfix) with ESMTP id F34608FC15; Wed, 31 Aug 2011 19:57:24 +0000 (UTC) Received: by qyk9 with SMTP id 9so848296qyk.13 for ; Wed, 31 Aug 2011 12:57:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=I4ceq4uVTQCxf9i2HBL1+ebwXawXGw2W/6KELVylrsI=; b=KYqg9gD3+ZJsPA7N751QnAgBH4K8fbJOnYgwou8rMDwbCPSMaNpXy0TER8d0T4uUde gpoy+IJ/Hd3s+ABz1SF9GDwYnlrP8OfiR5zBFQnr/AGf2V+GAH7YjNaF9Hm1hXsXISee ZBHEBw/A2tOzFjK7nZ/qpI/dY+7hKlAtWhYBg= MIME-Version: 1.0 Received: by 10.229.89.66 with SMTP id d2mr672833qcm.93.1314820643950; Wed, 31 Aug 2011 12:57:23 -0700 (PDT) Received: by 10.229.89.138 with HTTP; Wed, 31 Aug 2011 12:57:23 -0700 (PDT) In-Reply-To: <382461010.589453.1314794995233.JavaMail.root@erie.cs.uoguelph.ca> References: <382461010.589453.1314794995233.JavaMail.root@erie.cs.uoguelph.ca> Date: Wed, 31 Aug 2011 22:57:23 +0300 Message-ID: From: George Liaskos To: Rick Macklem Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek Subject: Re: NFSv4: After upgrade to 9 users can no longer list files. (sounds like a ZFS issue?) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Aug 2011 19:57:25 -0000 On Wed, Aug 31, 2011 at 3:49 PM, Rick Macklem wrote: > Well, I can't imagine why this would matter, but you can try this patch, > which fixes a problem introduced by r224810 where Lookup ".." no longer > works. (It's at http://people.freebsd.org/~rmacklem/dotdot.patch, in case > the white space gets munged.) > Index: fs/nfsserver/nfs_nfsdport.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- fs/nfsserver/nfs_nfsdport.c (revision 225270) > +++ fs/nfsserver/nfs_nfsdport.c (working copy) > @@ -282,6 +282,7 @@ nfsvno_namei(struct nfsrv_descript *nd, struct nam > > =C2=A0 =C2=A0 =C2=A0 =C2=A0*retdirp =3D NULL; > =C2=A0 =C2=A0 =C2=A0 =C2=A0cnp->cn_nameptr =3D cnp->cn_pnbuf; > + =C2=A0 =C2=A0 =C2=A0 ndp->ni_strictrelative =3D 0; > =C2=A0 =C2=A0 =C2=A0 =C2=A0/* > =C2=A0 =C2=A0 =C2=A0 =C2=A0 * Extract and set starting directory. > =C2=A0 =C2=A0 =C2=A0 =C2=A0 */ > Index: nfsserver/nfs_serv.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- nfsserver/nfs_serv.c =C2=A0 =C2=A0 =C2=A0 =C2=A0(revision 225270) > +++ nfsserver/nfs_serv.c =C2=A0 =C2=A0 =C2=A0 =C2=A0(working copy) > @@ -157,6 +157,7 @@ ndclear(struct nameidata *nd) > =C2=A0 =C2=A0 =C2=A0 =C2=A0nd->ni_vp =3D NULL; > =C2=A0 =C2=A0 =C2=A0 =C2=A0nd->ni_dvp =3D NULL; > =C2=A0 =C2=A0 =C2=A0 =C2=A0nd->ni_startdir =3D NULL; > + =C2=A0 =C2=A0 =C2=A0 nd->ni_strictrelative =3D 0; > =C2=A0} > > =C2=A0/* > > rick This patch works for me. :) Regards, George From owner-freebsd-fs@FreeBSD.ORG Wed Aug 31 22:08:46 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 68D6B106566B; Wed, 31 Aug 2011 22:08:46 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id E7E078FC0A; Wed, 31 Aug 2011 22:08:45 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ap8EAOivXk6DaFvO/2dsb2JhbABCFoQ2pHmBQAEBBAEjBFIFFAIOCgICDRkCWQYTh3IEqQOSF4EshBiBEQSTJZEl X-IronPort-AV: E=Sophos;i="4.68,309,1312171200"; d="scan'208";a="132870357" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 31 Aug 2011 18:08:44 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id C43D5B3F80; Wed, 31 Aug 2011 18:08:44 -0400 (EDT) Date: Wed, 31 Aug 2011 18:08:44 -0400 (EDT) From: Rick Macklem To: George Liaskos Message-ID: <1463532532.632117.1314828524787.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Originating-IP: [172.17.91.203] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek Subject: Re: NFSv4: After upgrade to 9 users can no longer list files. (sounds like a ZFS issue?) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Aug 2011 22:08:46 -0000 George Liaskos wrote: > On Wed, Aug 31, 2011 at 3:49 PM, Rick Macklem > wrote: > > Well, I can't imagine why this would matter, but you can try this > > patch, > > which fixes a problem introduced by r224810 where Lookup ".." no > > longer > > works. (It's at http://people.freebsd.org/~rmacklem/dotdot.patch, in > > case > > the white space gets munged.) > > Index: fs/nfsserver/nfs_nfsdport.c > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > --- fs/nfsserver/nfs_nfsdport.c (revision 225270) > > +++ fs/nfsserver/nfs_nfsdport.c (working copy) > > @@ -282,6 +282,7 @@ nfsvno_namei(struct nfsrv_descript *nd, struct > > nam > > > > =C2=A0 =C2=A0 =C2=A0 =C2=A0*retdirp =3D NULL; > > =C2=A0 =C2=A0 =C2=A0 =C2=A0cnp->cn_nameptr =3D cnp->cn_pnbuf; > > + ndp->ni_strictrelative =3D 0; > > =C2=A0 =C2=A0 =C2=A0 =C2=A0/* > > =C2=A0 =C2=A0 =C2=A0 =C2=A0 * Extract and set starting directory. > > =C2=A0 =C2=A0 =C2=A0 =C2=A0 */ > > Index: nfsserver/nfs_serv.c > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > --- nfsserver/nfs_serv.c (revision 225270) > > +++ nfsserver/nfs_serv.c (working copy) > > @@ -157,6 +157,7 @@ ndclear(struct nameidata *nd) > > =C2=A0 =C2=A0 =C2=A0 =C2=A0nd->ni_vp =3D NULL; > > =C2=A0 =C2=A0 =C2=A0 =C2=A0nd->ni_dvp =3D NULL; > > =C2=A0 =C2=A0 =C2=A0 =C2=A0nd->ni_startdir =3D NULL; > > + nd->ni_strictrelative =3D 0; > > =C2=A0} > > > > =C2=A0/* > > > > rick >=20 > This patch works for me. :) >=20 Ah, good. (I can't think of why root vs non-root would have mattered, but if it fixed the problem. Maybe just a side effect, since without being initialized, it would be whatever happened to be on the stack.) Thanks for doing the legwork on this and letting us know. This patch is in the re@ queue, rick. > Regards, > George From owner-freebsd-fs@FreeBSD.ORG Wed Aug 31 22:34:00 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4C7B1106566B for ; Wed, 31 Aug 2011 22:34:00 +0000 (UTC) (envelope-from ee@athyriogames.com) Received: from madonna.sslcatacombnetworking.com (madonna.sslcatacombnetworking.com [174.133.19.130]) by mx1.freebsd.org (Postfix) with ESMTP id 28BAB8FC15 for ; Wed, 31 Aug 2011 22:33:59 +0000 (UTC) Received: from c-98-206-215-156.hsd1.in.comcast.net ([98.206.215.156] helo=laptopv) by madonna.sslcatacombnetworking.com with esmtpa (Exim 4.69) (envelope-from ) id 1QytBt-0007b2-NM; Wed, 31 Aug 2011 17:23:26 -0500 From: "Engineering" To: "'Peter Jeremy'" References: <01c801cc667f$f99eb7b0$ecdc2710$@com> <020d01cc6724$0f0410b0$2d0c3210$@com> <20110831210623.GB25698@server.vk2pj.dyndns.org> In-Reply-To: <20110831210623.GB25698@server.vk2pj.dyndns.org> Date: Wed, 31 Aug 2011 17:33:25 -0500 Message-ID: <029c01cc682e$02b82a70$08287f50$@com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Office Outlook 12.0 Thread-Index: AcxoIGIKeoShtOcgQomZZrWDXA027QADXhPg Content-Language: en-us X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - madonna.sslcatacombnetworking.com X-AntiAbuse: Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - athyriogames.com Cc: freebsd-fs@freebsd.org Subject: RE: Read-only disk problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Aug 2011 22:34:00 -0000 Thank you very much! That is what I needed. I have / moutned read only in fstab, but I needed 'root_rw_mount="NO" To seal the deal Thanks again! Sam -----Original Message----- From: Peter Jeremy [mailto:peterjeremy@acm.org] Sent: Wednesday, August 31, 2011 4:06 PM To: Engineering Cc: freebsd-fs@freebsd.org Subject: Re: Read-only disk problem On 2011-Aug-30 09:49:41 -0500, Engineering wrote: >Hi, I've attached some more info. Doing a fsdump shows the following >changes over reboot > >magic 19540119 (UFS2) time Tue Aug 30 03:08:04 2011 >... >cg 1: >magic 90255 tell 4b1c000 time Tue Aug 30 03:08:04 2011 > >Changes to > >magic 19540119 (UFS2) time Tue Aug 30 03:13:14 2011 >... >cg 1: >magic 90255 tell 4b1c000 time Tue Aug 30 03:13:14 2011 It's normal for CG's and superblocks to be updated when there's any activity on a read-write UFS. (By default, the inode atime field will be lazily updated when the inode is accessed, so just reading from a UFS mounted RW is enough to cause writes). >Is there any data that is written to the disk at boot or mount time, >and if so, is there a way to prevent it? Are you sure that the FS is mounted read-only? / is automatically mounted read-write unless 'root_rw_mount="NO"' is specified in /etc/rc.conf. If a filesystem is mounted read-only, it will not be updated at all. -- Peter Jeremy From owner-freebsd-fs@FreeBSD.ORG Wed Aug 31 23:24:03 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EF137106566C for ; Wed, 31 Aug 2011 23:24:03 +0000 (UTC) (envelope-from peterjeremy@acm.org) Received: from fallbackmx09.syd.optusnet.com.au (fallbackmx09.syd.optusnet.com.au [211.29.132.242]) by mx1.freebsd.org (Postfix) with ESMTP id 80A828FC0A for ; Wed, 31 Aug 2011 23:24:03 +0000 (UTC) Received: from mail27.syd.optusnet.com.au (mail27.syd.optusnet.com.au [211.29.133.168]) by fallbackmx09.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p7VL6XFb016293 for ; Thu, 1 Sep 2011 07:06:33 +1000 Received: from server.vk2pj.dyndns.org (c220-239-116-103.belrs4.nsw.optusnet.com.au [220.239.116.103]) by mail27.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p7VL6OFx015513 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 1 Sep 2011 07:06:25 +1000 X-Bogosity: Ham, spamicity=0.000000 Received: from server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1]) by server.vk2pj.dyndns.org (8.14.4/8.14.4) with ESMTP id p7VL6Ofj025834; Thu, 1 Sep 2011 07:06:24 +1000 (EST) (envelope-from peter@server.vk2pj.dyndns.org) Received: (from peter@localhost) by server.vk2pj.dyndns.org (8.14.4/8.14.4/Submit) id p7VL6NVp025833; Thu, 1 Sep 2011 07:06:23 +1000 (EST) (envelope-from peter) Date: Thu, 1 Sep 2011 07:06:23 +1000 From: Peter Jeremy To: Engineering Message-ID: <20110831210623.GB25698@server.vk2pj.dyndns.org> References: <01c801cc667f$f99eb7b0$ecdc2710$@com> <020d01cc6724$0f0410b0$2d0c3210$@com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="cmJC7u66zC7hs+87" Content-Disposition: inline In-Reply-To: <020d01cc6724$0f0410b0$2d0c3210$@com> X-PGP-Key: http://members.optusnet.com.au/peterjeremy/pubkey.asc User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: Read-only disk problem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Aug 2011 23:24:04 -0000 --cmJC7u66zC7hs+87 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On 2011-Aug-30 09:49:41 -0500, Engineering wrote: >Hi, I've attached some more info. Doing a fsdump shows the following chang= es >over reboot > >magic 19540119 (UFS2) time Tue Aug 30 03:08:04 2011 >... >cg 1: >magic 90255 tell 4b1c000 time Tue Aug 30 03:08:04 2011 > >Changes to > >magic 19540119 (UFS2) time Tue Aug 30 03:13:14 2011 >... >cg 1: >magic 90255 tell 4b1c000 time Tue Aug 30 03:13:14 2011 It's normal for CG's and superblocks to be updated when there's any activity on a read-write UFS. (By default, the inode atime field will be lazily updated when the inode is accessed, so just reading =66rom a UFS mounted RW is enough to cause writes). >Is there any data that is written to the disk at boot or mount time, and if >so, is there a way to prevent it? Are you sure that the FS is mounted read-only? / is automatically mounted read-write unless 'root_rw_mount=3D"NO"' is specified in /etc/rc.conf. If a filesystem is mounted read-only, it will not be updated at all. --=20 Peter Jeremy --cmJC7u66zC7hs+87 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (FreeBSD) iEYEARECAAYFAk5eok8ACgkQ/opHv/APuIfzTwCdGUahmlNAJX9lErJpUdSxn3kM jCcAnj/EO/eFuzcPcnhyVnbZ5s0mPB2F =kWYR -----END PGP SIGNATURE----- --cmJC7u66zC7hs+87-- From owner-freebsd-fs@FreeBSD.ORG Thu Sep 1 04:28:09 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ED1FC106566C for ; Thu, 1 Sep 2011 04:28:09 +0000 (UTC) (envelope-from gnehzuil@gmail.com) Received: from mail-pz0-f45.google.com (mail-pz0-f45.google.com [209.85.210.45]) by mx1.freebsd.org (Postfix) with ESMTP id C4EED8FC0C for ; Thu, 1 Sep 2011 04:28:09 +0000 (UTC) Received: by pzk33 with SMTP id 33so4155511pzk.18 for ; Wed, 31 Aug 2011 21:28:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=message-id:date:from:user-agent:mime-version:to:subject:references :in-reply-to:content-type:content-transfer-encoding; bh=nITCqT0/1uMSUgYaLxJYIdC0+9gSuDfbU4Wu3IYFysU=; b=L8Ckmk8tYRRpV6Tc0BBOWTwxiw4wtREttpnGNBvczMo1k9UaSKOP2HyF+4SL2q4p52 ZV3w9w6CoominaI4ZUSSMddsvDR5aSa5SKlC1wNbOKMOU+vB7GyMNZ4XbCPwZFzH4Tw+ EyxoDUDy1WqMmdWvSNDACTdMAzqD6aLAJeI4o= Received: by 10.68.64.103 with SMTP id n7mr1513907pbs.303.1314849861994; Wed, 31 Aug 2011 21:04:21 -0700 (PDT) Received: from [10.32.101.195] ([182.92.247.2]) by mx.google.com with ESMTPS id m16sm310248wfd.0.2011.08.31.21.04.19 (version=SSLv3 cipher=OTHER); Wed, 31 Aug 2011 21:04:20 -0700 (PDT) Message-ID: <4E5F043F.2010303@gmail.com> Date: Thu, 01 Sep 2011 12:04:15 +0800 From: gnehzuil User-Agent: Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/20110627 Thunderbird/5.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <1314806200.14687.YahooMailClassic@web113507.mail.gq1.yahoo.com> In-Reply-To: <1314806200.14687.YahooMailClassic@web113507.mail.gq1.yahoo.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: SEEK_DATA/SEEK_HOLE on UFS/EXT2FS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Sep 2011 04:28:10 -0000 Hi Pedro, Actually, in linux, it doesn't really support SEEK_DATA/SEEK_HOLE. The patches related don't be merged into mainline. At present, when lseek(2) is called with SEEK_DATA, the entire file is as data, as long as offset is smaller than the end of the file. Meanwhile, a virtual hole is at the end of the file. So lseek(2) is called with SEEK_HOLE, i_size in linux is returned. Best regards, lz On 08/31/2011 11:56 PM, Pedro F. Giffuni wrote: > Hi; > > Just FYI, after reconsidering their position wrt NIH, the > linux guys now think SEEK_DATA/SEEK_HOLE is wonderful: > > http://lwn.net/Articles/440255/ > > and NetBSD is known to be working on it too (latest patch): > > http://mail-index.netbsd.org/tech-kern/2011/08/17/msg011231.html > > I hope our own developers haven't forgotten that this > is indeed a desired feature and that we get it for 10.0 > or, if possible, 9.1. > > cheers, > > Pedro. > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Thu Sep 1 06:37:32 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 85BFA106564A for ; Thu, 1 Sep 2011 06:37:32 +0000 (UTC) (envelope-from dan@3geeks.org) Received: from mail-yx0-f182.google.com (mail-yx0-f182.google.com [209.85.213.182]) by mx1.freebsd.org (Postfix) with ESMTP id 4A6B48FC0C for ; Thu, 1 Sep 2011 06:37:31 +0000 (UTC) Received: by yxn22 with SMTP id 22so195226yxn.13 for ; Wed, 31 Aug 2011 23:37:31 -0700 (PDT) Received: by 10.236.155.198 with SMTP id j46mr7014576yhk.23.1314857488130; Wed, 31 Aug 2011 23:11:28 -0700 (PDT) Received: from [172.16.1.35] (99-126-192-237.lightspeed.austtx.sbcglobal.net [99.126.192.237]) by mx.google.com with ESMTPS id a29sm453578yhj.45.2011.08.31.23.11.26 (version=TLSv1/SSLv3 cipher=OTHER); Wed, 31 Aug 2011 23:11:27 -0700 (PDT) From: Daniel Mayfield Date: Thu, 1 Sep 2011 01:11:25 -0500 Message-Id: To: freebsd-fs@freebsd.org Mime-Version: 1.0 (Apple Message framework v1084) X-Mailer: Apple Mail (2.1084) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: gptzfsboot and 4k sector raidz X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Sep 2011 06:37:32 -0000 I just set this up on an Athlon64 machine I have w/ 4 WD EARS 2TB disks. = I followed the instructions here: = http://www.leidinger.net/blog/2011/05/03/another-root-on-zfs-howto-optimiz= ed-for-4k-sector-drives/, but just building a single pool so three = partitions per disk (boot, swap and zfs). I'm using the mfsBSD image to = do the boot code. When I reboot to actually come up from ZFS, the = loader spins for half a second and then the machine reboots. I've seen = a number of bug reports on gptzfsboot and 4k sector pools, but I never = saw one fail so early. What data would the ZFS people need to help fix = this? daniel= From owner-freebsd-fs@FreeBSD.ORG Thu Sep 1 08:08:23 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DB5C91065672 for ; Thu, 1 Sep 2011 08:08:22 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: from mail-gy0-f182.google.com (mail-gy0-f182.google.com [209.85.160.182]) by mx1.freebsd.org (Postfix) with ESMTP id 9D9CB8FC12 for ; Thu, 1 Sep 2011 08:08:22 +0000 (UTC) Received: by gyd10 with SMTP id 10so1501499gyd.13 for ; Thu, 01 Sep 2011 01:08:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=/Hem36pZxNM2hs3WRMKsruZuTdzostbLszBu6DHc76c=; b=Egi+vnfYEtVnBaXMXTXleLYzugGTplNg6CuABtk7WofqRgfqj4226Yd//6V2ZqU/vB dMxYv4KqO+CtzpTgCvn26Qb2qJXly+B9v93VPQIx03WblTh+0U2yc0Lnub/am5hnDV4G pAtcdpyOb6ElLznSzOTJQdEri6QCnCSHi+/tY= MIME-Version: 1.0 Received: by 10.236.116.199 with SMTP id g47mr7192038yhh.44.1314864502024; Thu, 01 Sep 2011 01:08:22 -0700 (PDT) Received: by 10.236.103.19 with HTTP; Thu, 1 Sep 2011 01:08:22 -0700 (PDT) In-Reply-To: References: Date: Thu, 1 Sep 2011 09:08:22 +0100 Message-ID: From: krad To: Daniel Mayfield Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org Subject: Re: gptzfsboot and 4k sector raidz X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Sep 2011 08:08:23 -0000 On 1 September 2011 07:11, Daniel Mayfield wrote: > I just set this up on an Athlon64 machine I have w/ 4 WD EARS 2TB disks. I > followed the instructions here: > http://www.leidinger.net/blog/2011/05/03/another-root-on-zfs-howto-optimized-for-4k-sector-drives/, > but just building a single pool so three partitions per disk (boot, swap and > zfs). I'm using the mfsBSD image to do the boot code. When I reboot to > actually come up from ZFS, the loader spins for half a second and then the > machine reboots. I've seen a number of bug reports on gptzfsboot and 4k > sector pools, but I never saw one fail so early. What data would the ZFS > people need to help fix this? > > daniel_______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > try these boot bits they always work for me http://people.freebsd.org/~pjd/zfsboot/ From owner-freebsd-fs@FreeBSD.ORG Thu Sep 1 13:07:09 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 738A6106564A for ; Thu, 1 Sep 2011 13:07:09 +0000 (UTC) (envelope-from trent@snakebite.org) Received: from exchange.liveoffice.com (exchla3.liveoffice.com [64.70.67.188]) by mx1.freebsd.org (Postfix) with ESMTP id 514B58FC12 for ; Thu, 1 Sep 2011 13:07:09 +0000 (UTC) Received: from EXCASUM03.exchhosting.com (192.168.11.203) by exhub05.exchhosting.com (192.168.11.101) with Microsoft SMTP Server (TLS) id 8.2.213.0; Thu, 1 Sep 2011 05:57:00 -0700 Received: from [10.211.55.3] (35.11.55.172) by exchange.liveoffice.com (192.168.11.203) with Microsoft SMTP Server (TLS) id 8.2.213.0; Thu, 1 Sep 2011 05:57:00 -0700 Message-ID: <4E5F811A.2040307@snakebite.org> Date: Thu, 1 Sep 2011 08:56:58 -0400 From: Trent Nelson User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20110624 Thunderbird/5.0 MIME-Version: 1.0 To: Daniel Mayfield References: In-Reply-To: Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit Cc: "freebsd-fs@freebsd.org" Subject: Re: gptzfsboot and 4k sector raidz X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Sep 2011 13:07:09 -0000 On 01-Sep-11 2:11 AM, Daniel Mayfield wrote: > I just set this up on an Athlon64 machine I have w/ 4 WD EARS 2TB > disks. I followed the instructions here: > http://www.leidinger.net/blog/2011/05/03/another-root-on-zfs-howto-optimized-for-4k-sector-drives/, > but just building a single pool so three partitions per disk (boot, > swap and zfs). I'm using the mfsBSD image to do the boot code. When > I reboot to actually come up from ZFS, the loader spins for half a > second and then the machine reboots. I've seen a number of bug > reports on gptzfsboot and 4k sector pools, but I never saw one fail > so early. What data would the ZFS people need to help fix this? FWIW, I experienced the exact same issue about a week ago with four new WD EARS 2TB disks. I contemplated looking into fixing it, until I noticed the crazy disk usage with 4K sectors. On my old box, my /usr/src dataset was ~450MB (mirrored 512-byte drives), on the new box with the 2TB 4k sector drives, /usr/src was 1.5-something GB. Exact same settings. This appeared to be the case for *everything*; every file system/zfs dataset seemed to be consuming 2-3 times more space on the 4K-sector box. So, combine that with the fact that I couldn't boot into it anyway, and I ditched the 4k-sector effort and just re-built with raidz as per normal (i.e. with 512-byte sectors). One week later? Disk usage is sensible, as expected, but performance (especially writing) is pretty horrid. As much as I'd like to blame raidz overhead, I'm not sure it's the problem; I've got a gstripe of 4x16GB partitions at the start of each 2TB as /scratch; dd'ing /dev/zero to that doesn't yield write speeds faster than ~20-30MB/s if I'm lucky. Writing to the raidz partition nets about 15-20MB/s in very bursty peaks. NFS and Samba performance are even worse; 2-3MB/s sustained if I'm lucky, with the odd burst of 20MB/s every so often. (The box is a lowly dual-core Athlon 1800 w/ 8GB RAM, 8-stable from yesterday.) So, uh, no solution from my end, but perhaps some more problems for you to run into if you get it to boot ;-) Trent. From owner-freebsd-fs@FreeBSD.ORG Thu Sep 1 16:30:27 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 645FD1065673 for ; Thu, 1 Sep 2011 16:30:27 +0000 (UTC) (envelope-from dan@3geeks.org) Received: from mail-gw0-f54.google.com (mail-gw0-f54.google.com [74.125.83.54]) by mx1.freebsd.org (Postfix) with ESMTP id 299CA8FC14 for ; Thu, 1 Sep 2011 16:30:26 +0000 (UTC) Received: by gwb15 with SMTP id 15so1315042gwb.13 for ; Thu, 01 Sep 2011 09:30:26 -0700 (PDT) Received: by 10.150.254.1 with SMTP id b1mr226355ybi.323.1314894626396; Thu, 01 Sep 2011 09:30:26 -0700 (PDT) Received: from [172.16.1.35] (99-126-192-237.lightspeed.austtx.sbcglobal.net [99.126.192.237]) by mx.google.com with ESMTPS id l18sm102295ybg.7.2011.09.01.09.30.24 (version=TLSv1/SSLv3 cipher=OTHER); Thu, 01 Sep 2011 09:30:24 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Apple Message framework v1084) From: Daniel Mayfield In-Reply-To: <4E5F811A.2040307@snakebite.org> Date: Thu, 1 Sep 2011 11:30:23 -0500 Content-Transfer-Encoding: quoted-printable Message-Id: <7FAD4A4D-2465-4A80-A445-1D34424F8BB6@3geeks.org> References: <4E5F811A.2040307@snakebite.org> To: freebsd-fs@freebsd.org X-Mailer: Apple Mail (2.1084) Subject: Re: gptzfsboot and 4k sector raidz X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Sep 2011 16:30:27 -0000 On Sep 1, 2011, at 7:56 AM, Trent Nelson wrote: > On 01-Sep-11 2:11 AM, Daniel Mayfield wrote: >> I just set this up on an Athlon64 machine I have w/ 4 WD EARS 2TB >> disks. I followed the instructions here: >> = http://www.leidinger.net/blog/2011/05/03/another-root-on-zfs-howto-optimiz= ed-for-4k-sector-drives/, >> but just building a single pool so three partitions per disk (boot, >> swap and zfs). I'm using the mfsBSD image to do the boot code. When >> I reboot to actually come up from ZFS, the loader spins for half a >> second and then the machine reboots. I've seen a number of bug >> reports on gptzfsboot and 4k sector pools, but I never saw one fail >> so early. What data would the ZFS people need to help fix this? >=20 > FWIW, I experienced the exact same issue about a week ago with four = new WD EARS 2TB disks. I contemplated looking into fixing it, until I = noticed the crazy disk usage with 4K sectors. On my old box, my = /usr/src dataset was ~450MB (mirrored 512-byte drives), on the new box = with the 2TB 4k sector drives, /usr/src was 1.5-something GB. Exact = same settings. I noticed that the free data space was also bigger. I tried it with = raidz on the 512B sectors and it claimed to have only 5.3T of space. = With 4KB sectors, it claimed to have 7.25T of space. Seems like = something is wonky in the space calculations? daniel= From owner-freebsd-fs@FreeBSD.ORG Thu Sep 1 17:18:05 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 910CD106564A for ; Thu, 1 Sep 2011 17:18:05 +0000 (UTC) (envelope-from trent@snakebite.org) Received: from exchange.liveoffice.com (exchla3.liveoffice.com [64.70.67.188]) by mx1.freebsd.org (Postfix) with ESMTP id 7222F8FC0C for ; Thu, 1 Sep 2011 17:18:05 +0000 (UTC) Received: from EXCASUM03.exchhosting.com (192.168.11.203) by exhub03.exchhosting.com (192.168.11.104) with Microsoft SMTP Server (TLS) id 8.2.213.0; Thu, 1 Sep 2011 10:17:52 -0700 Received: from [10.211.55.3] (35.11.55.172) by exchange.liveoffice.com (192.168.11.203) with Microsoft SMTP Server (TLS) id 8.2.213.0; Thu, 1 Sep 2011 10:17:52 -0700 Message-ID: <4E5FBE3E.7020706@snakebite.org> Date: Thu, 1 Sep 2011 13:17:50 -0400 From: Trent Nelson User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20110624 Thunderbird/5.0 MIME-Version: 1.0 To: Daniel Mayfield , "freebsd-fs@freebsd.org" References: <4E5F811A.2040307@snakebite.org> <7FAD4A4D-2465-4A80-A445-1D34424F8BB6@3geeks.org> In-Reply-To: <7FAD4A4D-2465-4A80-A445-1D34424F8BB6@3geeks.org> Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit Cc: Subject: Re: gptzfsboot and 4k sector raidz X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Sep 2011 17:18:05 -0000 On 01-Sep-11 12:30 PM, Daniel Mayfield wrote: > > On Sep 1, 2011, at 7:56 AM, Trent Nelson wrote: > >> On 01-Sep-11 2:11 AM, Daniel Mayfield wrote: >>> I just set this up on an Athlon64 machine I have w/ 4 WD EARS >>> 2TB disks. I followed the instructions here: >>> http://www.leidinger.net/blog/2011/05/03/another-root-on-zfs-howto-optimized-for-4k-sector-drives/, >>>but just building a single pool so three partitions per disk (boot, >>> swap and zfs). I'm using the mfsBSD image to do the boot code. >>> When I reboot to actually come up from ZFS, the loader spins for >>> half a second and then the machine reboots. I've seen a number >>> of bug reports on gptzfsboot and 4k sector pools, but I never saw >>> one fail so early. What data would the ZFS people need to help >>> fix this? >> >> FWIW, I experienced the exact same issue about a week ago with four >> new WD EARS 2TB disks. I contemplated looking into fixing it, >> until I noticed the crazy disk usage with 4K sectors. On my old >> box, my /usr/src dataset was ~450MB (mirrored 512-byte drives), on >> the new box with the 2TB 4k sector drives, /usr/src was >> 1.5-something GB. Exact same settings. > > I noticed that the free data space was also bigger. I tried it with > raidz on the 512B sectors and it claimed to have only 5.3T of space. > With 4KB sectors, it claimed to have 7.25T of space. Seems like > something is wonky in the space calculations? Hmmmm. It didn't occur to me that the space calculations might be wonky. That could explain why I was seeing disk usage much higher on 4K than 512-bytes for all my zfs datasets. Here's my zpool/zfs output w/ 512-byte sectors (4-disk raidz): [root@flanker/ttypts/0(~)#] zpool list tank NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT tank 7.12T 698G 6.44T 9% 1.16x ONLINE - [root@flanker/ttypts/0(~)#] zfs list tank NAME USED AVAIL REFER MOUNTPOINT tank 604G 4.74T 46.4K legacy It's a raidz1-0 of four 2TB disks, so the space available should be (4-1=3)*2TB=6TB? Although I presume that's 6-marketing-terabtyes, which translates to ... 6000000000000/(1024^4)=5. And I've got 64k boot, 8G swap, 16G scratch on each drive *before* the tank, so eh, I guess 4.74T sounds about right. The 7.12T reported by zpool doesn't seem to be taking into account the reduced space from the raidz parity. *shrug* Enough about sizes; what's your read/write performance like between 512-byte/4K? I didn't think to test performance in the 4K configuration; I really wish I had, now. Trent. From owner-freebsd-fs@FreeBSD.ORG Thu Sep 1 17:46:16 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B94A71065670 for ; Thu, 1 Sep 2011 17:46:16 +0000 (UTC) (envelope-from dan@3geeks.org) Received: from mail-yi0-f54.google.com (mail-yi0-f54.google.com [209.85.218.54]) by mx1.freebsd.org (Postfix) with ESMTP id 7DD4B8FC08 for ; Thu, 1 Sep 2011 17:46:16 +0000 (UTC) Received: by yib19 with SMTP id 19so2019759yib.13 for ; Thu, 01 Sep 2011 10:46:15 -0700 (PDT) Received: by 10.236.136.65 with SMTP id v41mr874025yhi.29.1314899175618; Thu, 01 Sep 2011 10:46:15 -0700 (PDT) Received: from [172.16.1.35] (99-126-192-237.lightspeed.austtx.sbcglobal.net [99.126.192.237]) by mx.google.com with ESMTPS id o48sm227019yhl.4.2011.09.01.10.46.13 (version=TLSv1/SSLv3 cipher=OTHER); Thu, 01 Sep 2011 10:46:14 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Apple Message framework v1084) From: Daniel Mayfield In-Reply-To: <4E5FBE3E.7020706@snakebite.org> Date: Thu, 1 Sep 2011 12:46:12 -0500 Content-Transfer-Encoding: quoted-printable Message-Id: <553883C7-B97D-429F-AF4A-E208B6051B62@3geeks.org> References: <4E5F811A.2040307@snakebite.org> <7FAD4A4D-2465-4A80-A445-1D34424F8BB6@3geeks.org> <4E5FBE3E.7020706@snakebite.org> To: freebsd-fs@freebsd.org X-Mailer: Apple Mail (2.1084) Subject: Re: gptzfsboot and 4k sector raidz X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 01 Sep 2011 17:46:16 -0000 >> I noticed that the free data space was also bigger. I tried it with >> raidz on the 512B sectors and it claimed to have only 5.3T of space. >> With 4KB sectors, it claimed to have 7.25T of space. Seems like >> something is wonky in the space calculations? >=20 > Hmmmm. It didn't occur to me that the space calculations might be = wonky. That could explain why I was seeing disk usage much higher on 4K = than 512-bytes for all my zfs datasets. Here's my zpool/zfs output w/ = 512-byte sectors (4-disk raidz): >=20 > [root@flanker/ttypts/0(~)#] zpool list tank > NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT > tank 7.12T 698G 6.44T 9% 1.16x ONLINE - > [root@flanker/ttypts/0(~)#] zfs list tank > NAME USED AVAIL REFER MOUNTPOINT > tank 604G 4.74T 46.4K legacy >=20 > It's a raidz1-0 of four 2TB disks, so the space available should be = (4-1=3D3)*2TB=3D6TB? Although I presume that's 6-marketing-terabtyes, = which translates to ... 6000000000000/(1024^4)=3D5. And I've got 64k = boot, 8G swap, 16G scratch on each drive *before* the tank, so eh, I = guess 4.74T sounds about right. >=20 > The 7.12T reported by zpool doesn't seem to be taking into account the = reduced space from the raidz parity. *shrug* >=20 > Enough about sizes; what's your read/write performance like between = 512-byte/4K? I didn't think to test performance in the 4K = configuration; I really wish I had, now. I didn't test performance. I'm doing all the work running from the = mfsBSD boot disc. I'm not sure a simple 'dd' is a good test, but if you = have suggestions, I'm open. daniel From owner-freebsd-fs@FreeBSD.ORG Fri Sep 2 07:15:46 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DC9AF106566B; Fri, 2 Sep 2011 07:15:46 +0000 (UTC) (envelope-from pawel@dawidek.net) Received: from mail.dawidek.net (60.wheelsystems.com [83.12.187.60]) by mx1.freebsd.org (Postfix) with ESMTP id 5303D8FC14; Fri, 2 Sep 2011 07:15:46 +0000 (UTC) Received: from localhost (58.wheelsystems.com [83.12.187.58]) by mail.dawidek.net (Postfix) with ESMTPSA id 81235371; Fri, 2 Sep 2011 09:15:44 +0200 (CEST) Date: Fri, 2 Sep 2011 09:15:23 +0200 From: Pawel Jakub Dawidek To: Martin Matuska Message-ID: <20110902071523.GB1660@garage.freebsd.pl> References: <1314646728.7898.44.camel@pow> <4E5BFC6F.5080507@FreeBSD.org> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="VrqPEDrXMn8OVzN4" Content-Disposition: inline In-Reply-To: <4E5BFC6F.5080507@FreeBSD.org> X-OS: FreeBSD 9.0-CURRENT amd64 User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org, tech@hybrid-logic.co.uk, luke@hybrid-logic.co.uk Subject: Re: ZFS hang in production on 8.2-RELEASE X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Sep 2011 07:15:46 -0000 --VrqPEDrXMn8OVzN4 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Aug 29, 2011 at 10:54:07PM +0200, Martin Matuska wrote: > On 29. 8. 2011 21:55, Artem Belevich wrote: > > It sounds like the bug Martin Matuska has recently fixed in FreeBSD > > and reported upstream to Illumos: > > https://www.illumos.org/issues/1313 > > > > The fix has been MFC'ed to 8-STABLE r224647 on Aug 4th. > > > > --Artem > No, I think this is more likely fixed by pjd's bugfix in r224791 (MFC'ed > to stable/8 as r225100). >=20 > The corresponding patch is: > http://people.freebsd.org/~pjd/patches/zfsdev_state_lock.patch My patch fixes deadlock when there is some activity in vdevs handlings (like removal of disk from the pool or something like that). The bug reported is definiately related to force unmount while file system is loaded. I've spend a lot of time trying to get forcible unmounts right, which is not an easy task, believe me, but it is possible the deadlock is already fixed in v28. --=20 Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com --VrqPEDrXMn8OVzN4 Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (FreeBSD) iEYEARECAAYFAk5ggosACgkQForvXbEpPzTKeQCdFik3mew907gBOnvRpULE2u1r WgkAoLma6L6SccDqqo4r9EMHjl4lbY9O =QJSl -----END PGP SIGNATURE----- --VrqPEDrXMn8OVzN4-- From owner-freebsd-fs@FreeBSD.ORG Fri Sep 2 08:20:14 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9487E1065670 for ; Fri, 2 Sep 2011 08:20:14 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 839868FC12 for ; Fri, 2 Sep 2011 08:20:14 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p828KEUe008796 for ; Fri, 2 Sep 2011 08:20:14 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p828KEIQ008795; Fri, 2 Sep 2011 08:20:14 GMT (envelope-from gnats) Date: Fri, 2 Sep 2011 08:20:14 GMT Message-Id: <201109020820.p828KEIQ008795@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: dfilter@FreeBSD.ORG (dfilter service) Cc: Subject: Re: kern/160035: commit references a PR X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: dfilter service List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Sep 2011 08:20:14 -0000 The following reply was made to PR kern/160035; it has been noted by GNATS. From: dfilter@FreeBSD.ORG (dfilter service) To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/160035: commit references a PR Date: Fri, 2 Sep 2011 08:19:40 +0000 (UTC) Author: mm Date: Fri Sep 2 08:19:19 2011 New Revision: 225326 URL: http://svn.freebsd.org/changeset/base/225326 Log: MFC r226155: Generalize ffs_pages_remove() into vn_pages_remove(). Remove mapped pages for all dataset vnodes in zfs_rezget() using new vn_pages_remove() to fix mmapped files changed by zfs rollback or zfs receive -F. PR: kern/160035, kern/156933 Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c stable/8/sys/kern/vfs_vnops.c stable/8/sys/sys/vnode.h stable/8/sys/ufs/ffs/ffs_inode.c Directory Properties: stable/8/sys/ (props changed) stable/8/sys/amd64/include/xen/ (props changed) stable/8/sys/cddl/contrib/opensolaris/ (props changed) stable/8/sys/contrib/dev/acpica/ (props changed) stable/8/sys/contrib/pf/ (props changed) Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c Fri Sep 2 08:15:48 2011 (r225325) +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c Fri Sep 2 08:19:19 2011 (r225326) @@ -1259,6 +1259,7 @@ zfs_rezget(znode_t *zp) zfsvfs_t *zfsvfs = zp->z_zfsvfs; dmu_object_info_t doi; dmu_buf_t *db; + vnode_t *vp; uint64_t obj_num = zp->z_id; uint64_t mode, size; sa_bulk_attr_t bulk[8]; @@ -1334,8 +1335,9 @@ zfs_rezget(znode_t *zp) * that for example regular file was replaced with directory * which has the same object number. */ - if (ZTOV(zp) != NULL && - ZTOV(zp)->v_type != IFTOVT((mode_t)zp->z_mode)) { + vp = ZTOV(zp); + if (vp != NULL && + vp->v_type != IFTOVT((mode_t)zp->z_mode)) { zfs_znode_dmu_fini(zp); ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num); return (EIO); @@ -1343,8 +1345,11 @@ zfs_rezget(znode_t *zp) zp->z_unlinked = (zp->z_links == 0); zp->z_blksz = doi.doi_data_block_size; - if (zp->z_size != size && ZTOV(zp) != NULL) - vnode_pager_setsize(ZTOV(zp), zp->z_size); + if (vp != NULL) { + vn_pages_remove(vp, 0, 0); + if (zp->z_size != size) + vnode_pager_setsize(vp, zp->z_size); + } ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num); Modified: stable/8/sys/kern/vfs_vnops.c ============================================================================== --- stable/8/sys/kern/vfs_vnops.c Fri Sep 2 08:15:48 2011 (r225325) +++ stable/8/sys/kern/vfs_vnops.c Fri Sep 2 08:19:19 2011 (r225326) @@ -63,6 +63,9 @@ __FBSDID("$FreeBSD$"); #include +#include +#include + static fo_rdwr_t vn_read; static fo_rdwr_t vn_write; static fo_truncate_t vn_truncate; @@ -1353,3 +1356,15 @@ vn_rlimit_fsize(const struct vnode *vp, return (0); } + +void +vn_pages_remove(struct vnode *vp, vm_pindex_t start, vm_pindex_t end) +{ + vm_object_t object; + + if ((object = vp->v_object) == NULL) + return; + VM_OBJECT_LOCK(object); + vm_object_page_remove(object, start, end, 0); + VM_OBJECT_UNLOCK(object); +} Modified: stable/8/sys/sys/vnode.h ============================================================================== --- stable/8/sys/sys/vnode.h Fri Sep 2 08:15:48 2011 (r225325) +++ stable/8/sys/sys/vnode.h Fri Sep 2 08:19:19 2011 (r225326) @@ -644,6 +644,7 @@ int _vn_lock(struct vnode *vp, int flags int vn_open(struct nameidata *ndp, int *flagp, int cmode, struct file *fp); int vn_open_cred(struct nameidata *ndp, int *flagp, int cmode, u_int vn_open_flags, struct ucred *cred, struct file *fp); +void vn_pages_remove(struct vnode *vp, vm_pindex_t start, vm_pindex_t end); int vn_pollrecord(struct vnode *vp, struct thread *p, int events); int vn_rdwr(enum uio_rw rw, struct vnode *vp, void *base, int len, off_t offset, enum uio_seg segflg, int ioflg, Modified: stable/8/sys/ufs/ffs/ffs_inode.c ============================================================================== --- stable/8/sys/ufs/ffs/ffs_inode.c Fri Sep 2 08:15:48 2011 (r225325) +++ stable/8/sys/ufs/ffs/ffs_inode.c Fri Sep 2 08:19:19 2011 (r225326) @@ -129,18 +129,6 @@ ffs_update(vp, waitfor) } } -static void -ffs_pages_remove(struct vnode *vp, vm_pindex_t start, vm_pindex_t end) -{ - vm_object_t object; - - if ((object = vp->v_object) == NULL) - return; - VM_OBJECT_LOCK(object); - vm_object_page_remove(object, start, end, FALSE); - VM_OBJECT_UNLOCK(object); -} - #define SINGLE 0 /* index of single indirect block */ #define DOUBLE 1 /* index of double indirect block */ #define TRIPLE 2 /* index of triple indirect block */ @@ -218,7 +206,7 @@ ffs_truncate(vp, length, flags, cred, td (void) chkdq(ip, -extblocks, NOCRED, 0); #endif vinvalbuf(vp, V_ALT, 0, 0); - ffs_pages_remove(vp, + vn_pages_remove(vp, OFF_TO_IDX(lblktosize(fs, -extblocks)), 0); ip->i_din2->di_extsize = 0; for (i = 0; i < NXADDR; i++) { @@ -297,7 +285,7 @@ ffs_truncate(vp, length, flags, cred, td ASSERT_VOP_LOCKED(vp, "ffs_truncate1"); vinvalbuf(vp, needextclean ? 0 : V_NORMAL, 0, 0); if (!needextclean) - ffs_pages_remove(vp, 0, + vn_pages_remove(vp, 0, OFF_TO_IDX(lblktosize(fs, -extblocks))); vnode_pager_setsize(vp, 0); ip->i_flag |= IN_CHANGE | IN_UPDATE; _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Fri Sep 2 08:20:17 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8DF82106567A for ; Fri, 2 Sep 2011 08:20:17 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 7D4B88FC1B for ; Fri, 2 Sep 2011 08:20:17 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p828KHpP008816 for ; Fri, 2 Sep 2011 08:20:17 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p828KH03008815; Fri, 2 Sep 2011 08:20:17 GMT (envelope-from gnats) Date: Fri, 2 Sep 2011 08:20:17 GMT Message-Id: <201109020820.p828KH03008815@freefall.freebsd.org> To: freebsd-fs@FreeBSD.org From: dfilter@FreeBSD.ORG (dfilter service) Cc: Subject: Re: kern/156933: commit references a PR X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: dfilter service List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Sep 2011 08:20:17 -0000 The following reply was made to PR kern/156933; it has been noted by GNATS. From: dfilter@FreeBSD.ORG (dfilter service) To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/156933: commit references a PR Date: Fri, 2 Sep 2011 08:19:40 +0000 (UTC) Author: mm Date: Fri Sep 2 08:19:19 2011 New Revision: 225326 URL: http://svn.freebsd.org/changeset/base/225326 Log: MFC r226155: Generalize ffs_pages_remove() into vn_pages_remove(). Remove mapped pages for all dataset vnodes in zfs_rezget() using new vn_pages_remove() to fix mmapped files changed by zfs rollback or zfs receive -F. PR: kern/160035, kern/156933 Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c stable/8/sys/kern/vfs_vnops.c stable/8/sys/sys/vnode.h stable/8/sys/ufs/ffs/ffs_inode.c Directory Properties: stable/8/sys/ (props changed) stable/8/sys/amd64/include/xen/ (props changed) stable/8/sys/cddl/contrib/opensolaris/ (props changed) stable/8/sys/contrib/dev/acpica/ (props changed) stable/8/sys/contrib/pf/ (props changed) Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c ============================================================================== --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c Fri Sep 2 08:15:48 2011 (r225325) +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c Fri Sep 2 08:19:19 2011 (r225326) @@ -1259,6 +1259,7 @@ zfs_rezget(znode_t *zp) zfsvfs_t *zfsvfs = zp->z_zfsvfs; dmu_object_info_t doi; dmu_buf_t *db; + vnode_t *vp; uint64_t obj_num = zp->z_id; uint64_t mode, size; sa_bulk_attr_t bulk[8]; @@ -1334,8 +1335,9 @@ zfs_rezget(znode_t *zp) * that for example regular file was replaced with directory * which has the same object number. */ - if (ZTOV(zp) != NULL && - ZTOV(zp)->v_type != IFTOVT((mode_t)zp->z_mode)) { + vp = ZTOV(zp); + if (vp != NULL && + vp->v_type != IFTOVT((mode_t)zp->z_mode)) { zfs_znode_dmu_fini(zp); ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num); return (EIO); @@ -1343,8 +1345,11 @@ zfs_rezget(znode_t *zp) zp->z_unlinked = (zp->z_links == 0); zp->z_blksz = doi.doi_data_block_size; - if (zp->z_size != size && ZTOV(zp) != NULL) - vnode_pager_setsize(ZTOV(zp), zp->z_size); + if (vp != NULL) { + vn_pages_remove(vp, 0, 0); + if (zp->z_size != size) + vnode_pager_setsize(vp, zp->z_size); + } ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num); Modified: stable/8/sys/kern/vfs_vnops.c ============================================================================== --- stable/8/sys/kern/vfs_vnops.c Fri Sep 2 08:15:48 2011 (r225325) +++ stable/8/sys/kern/vfs_vnops.c Fri Sep 2 08:19:19 2011 (r225326) @@ -63,6 +63,9 @@ __FBSDID("$FreeBSD$"); #include +#include +#include + static fo_rdwr_t vn_read; static fo_rdwr_t vn_write; static fo_truncate_t vn_truncate; @@ -1353,3 +1356,15 @@ vn_rlimit_fsize(const struct vnode *vp, return (0); } + +void +vn_pages_remove(struct vnode *vp, vm_pindex_t start, vm_pindex_t end) +{ + vm_object_t object; + + if ((object = vp->v_object) == NULL) + return; + VM_OBJECT_LOCK(object); + vm_object_page_remove(object, start, end, 0); + VM_OBJECT_UNLOCK(object); +} Modified: stable/8/sys/sys/vnode.h ============================================================================== --- stable/8/sys/sys/vnode.h Fri Sep 2 08:15:48 2011 (r225325) +++ stable/8/sys/sys/vnode.h Fri Sep 2 08:19:19 2011 (r225326) @@ -644,6 +644,7 @@ int _vn_lock(struct vnode *vp, int flags int vn_open(struct nameidata *ndp, int *flagp, int cmode, struct file *fp); int vn_open_cred(struct nameidata *ndp, int *flagp, int cmode, u_int vn_open_flags, struct ucred *cred, struct file *fp); +void vn_pages_remove(struct vnode *vp, vm_pindex_t start, vm_pindex_t end); int vn_pollrecord(struct vnode *vp, struct thread *p, int events); int vn_rdwr(enum uio_rw rw, struct vnode *vp, void *base, int len, off_t offset, enum uio_seg segflg, int ioflg, Modified: stable/8/sys/ufs/ffs/ffs_inode.c ============================================================================== --- stable/8/sys/ufs/ffs/ffs_inode.c Fri Sep 2 08:15:48 2011 (r225325) +++ stable/8/sys/ufs/ffs/ffs_inode.c Fri Sep 2 08:19:19 2011 (r225326) @@ -129,18 +129,6 @@ ffs_update(vp, waitfor) } } -static void -ffs_pages_remove(struct vnode *vp, vm_pindex_t start, vm_pindex_t end) -{ - vm_object_t object; - - if ((object = vp->v_object) == NULL) - return; - VM_OBJECT_LOCK(object); - vm_object_page_remove(object, start, end, FALSE); - VM_OBJECT_UNLOCK(object); -} - #define SINGLE 0 /* index of single indirect block */ #define DOUBLE 1 /* index of double indirect block */ #define TRIPLE 2 /* index of triple indirect block */ @@ -218,7 +206,7 @@ ffs_truncate(vp, length, flags, cred, td (void) chkdq(ip, -extblocks, NOCRED, 0); #endif vinvalbuf(vp, V_ALT, 0, 0); - ffs_pages_remove(vp, + vn_pages_remove(vp, OFF_TO_IDX(lblktosize(fs, -extblocks)), 0); ip->i_din2->di_extsize = 0; for (i = 0; i < NXADDR; i++) { @@ -297,7 +285,7 @@ ffs_truncate(vp, length, flags, cred, td ASSERT_VOP_LOCKED(vp, "ffs_truncate1"); vinvalbuf(vp, needextclean ? 0 : V_NORMAL, 0, 0); if (!needextclean) - ffs_pages_remove(vp, 0, + vn_pages_remove(vp, 0, OFF_TO_IDX(lblktosize(fs, -extblocks))); vnode_pager_setsize(vp, 0); ip->i_flag |= IN_CHANGE | IN_UPDATE; _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Fri Sep 2 08:23:59 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 01895106566C; Fri, 2 Sep 2011 08:23:59 +0000 (UTC) (envelope-from mm@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id CD4208FC15; Fri, 2 Sep 2011 08:23:58 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p828Nwxj017853; Fri, 2 Sep 2011 08:23:58 GMT (envelope-from mm@freefall.freebsd.org) Received: (from mm@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p828NvEc017849; Fri, 2 Sep 2011 08:23:57 GMT (envelope-from mm) Date: Fri, 2 Sep 2011 08:23:57 GMT Message-Id: <201109020823.p828NvEc017849@freefall.freebsd.org> To: org_freebsd@L93.com, mm@FreeBSD.org, freebsd-fs@FreeBSD.org From: mm@FreeBSD.org Cc: Subject: Re: kern/156933: [zfs] ZFS receive after read on readonly=on filesystem is corrupted without warning X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Sep 2011 08:23:59 -0000 Synopsis: [zfs] ZFS receive after read on readonly=on filesystem is corrupted without warning State-Changed-From-To: open->closed State-Changed-By: mm State-Changed-When: Fri Sep 2 08:23:57 UTC 2011 State-Changed-Why: Resolved. Thanks! http://www.freebsd.org/cgi/query-pr.cgi?pr=156933 From owner-freebsd-fs@FreeBSD.ORG Fri Sep 2 08:24:09 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 815FB10656D1; Fri, 2 Sep 2011 08:24:09 +0000 (UTC) (envelope-from mm@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 59F328FC13; Fri, 2 Sep 2011 08:24:09 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p828O9qV017954; Fri, 2 Sep 2011 08:24:09 GMT (envelope-from mm@freefall.freebsd.org) Received: (from mm@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p828O9JT017950; Fri, 2 Sep 2011 08:24:09 GMT (envelope-from mm) Date: Fri, 2 Sep 2011 08:24:09 GMT Message-Id: <201109020824.p828O9JT017950@freefall.freebsd.org> To: mm@FreeBSD.org, mm@FreeBSD.org, freebsd-fs@FreeBSD.org From: mm@FreeBSD.org Cc: Subject: Re: kern/160035: [zfs] zfs rollback does not invalidate mmapped cache X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Sep 2011 08:24:09 -0000 Synopsis: [zfs] zfs rollback does not invalidate mmapped cache State-Changed-From-To: open->closed State-Changed-By: mm State-Changed-When: Fri Sep 2 08:24:08 UTC 2011 State-Changed-Why: Resolved. Thanks! http://www.freebsd.org/cgi/query-pr.cgi?pr=160035 From owner-freebsd-fs@FreeBSD.ORG Fri Sep 2 13:48:33 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EBAB1106566C for ; Fri, 2 Sep 2011 13:48:32 +0000 (UTC) (envelope-from joh.hendriks@gmail.com) Received: from mail-ew0-f54.google.com (mail-ew0-f54.google.com [209.85.215.54]) by mx1.freebsd.org (Postfix) with ESMTP id 819A38FC13 for ; Fri, 2 Sep 2011 13:48:32 +0000 (UTC) Received: by ewy1 with SMTP id 1so1841882ewy.13 for ; Fri, 02 Sep 2011 06:48:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=message-id:date:from:user-agent:mime-version:to:subject :content-type:content-transfer-encoding; bh=W/81gqZWQREkbFXIodYcCf1CCe2v3ifpq6HPYclGRB8=; b=UMVphNl3ZB850m03NDAF/dEau034mu4PW5LW4JMurbSZSi2SAmKLJTP6vcqSM8dCET XxUzYkCYceXAYWEh4orEc07tf3MNMpBg2WK5QFneorYnKxV/FNKdwskIE3vY2//p1iqQ qCDnt7dgARVSS3Xv9tpxX7P4jqgjwX8cwYIc4= Received: by 10.213.31.75 with SMTP id x11mr170922ebc.6.1314970004457; Fri, 02 Sep 2011 06:26:44 -0700 (PDT) Received: from [192.168.50.106] (double-l.xs4all.nl [80.126.205.144]) by mx.google.com with ESMTPS id i6sm2111025eeb.11.2011.09.02.06.26.43 (version=SSLv3 cipher=OTHER); Fri, 02 Sep 2011 06:26:43 -0700 (PDT) Message-ID: <4E60D992.3030802@gmail.com> Date: Fri, 02 Sep 2011 15:26:42 +0200 From: Johan Hendriks User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0.1) Gecko/20110830 Thunderbird/6.0.1 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: ZFS on HAST and reboot. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Sep 2011 13:48:33 -0000 Hello all. I just started using ZFS on top of HAST. What i did was first glabel my disks like disk1 to disk3 Then I created my hast devices in /etc/hast.conf /etc/hast.conf looks like this. i resource disk1 { on srv1 { local /dev/label/disk1 remote 192.168.5.41 } on srv2 { local /dev/label/disk1 remote 192.168.5.40 } } resource disk2 { on srv1 { local /dev/label/disk2 remote 192.168.5.41 } on srv2 { local /dev/label/disk2 remote 192.168.5.40 } } resource disk3 { on srv1 { local /dev/label/disk3 remote 192.168.5.41 } on srv2 { local /dev/label/disk3 remote 192.168.5.40 } } This works. I can set srv 1 to primary and srv 2 to secondary and visa versa. hastctl role primary all and hastctl role secondary all. Then i created the raidz on the master srv1 zpool create storage raidz1 hast/disk1 hast/disk2 hast/disk3 all looks good. zpool status pool: storage state: ONLINE scan: scrub repaired 0 in 0h0m with 0 errors on Wed Aug 31 20:49:19 2011 config: NAME STATE READ WRITE CKSUM storage ONLINE 0 0 0 raidz1-0 ONLINE 0 0 0 hast/disk1 ONLINE 0 0 0 hast/disk2 ONLINE 0 0 0 hast/disk3 ONLINE 0 0 0 errors: No known data errors then i created the mountpoint and created zfs on it # mkdir /usr/local/virtual # zfs create storage/virtual # zfs list # zfs set mountpoint=/usr/local/virtual storage/virtual # /etc/rc.d/zfs start and whooop there is my /usr/local/virtual zfs filesystem. # mount /dev/ada0p2 on / (ufs, local, journaled soft-updates) devfs on /dev (devfs, local, multilabel) storage on /storage (zfs, local, nfsv4acls) storage/virtual on /usr/local/virtual (zfs, local, nfsv4acls) if i do a zfs export -f storage on srv1 change the hast role to secondary and then set the hast role on srv2 to primary and do zfs import -f storage, i can see the files on srv2. I am a happy camper :D So it works like advertised. Now i rebooted both machines. all is working fine. But if i reboot the server srv1 again, i can not import the pool anymore, it tells me the pool is already imported. I do load the carp-hast-switch master file with ifstated. This does set the hast role to primary. But can not import the pool. Now this can be true because i did not export it. if i do a /etc/rc.d/zfs start, than it gets mounted and the pool is again available. Is there a way i can do this automaticly. In my understanding after a reboot zfs try's to start, but fails because my hast providers are not yet ready. Or am i doing something wrong and should i not do it this way. Can i tell zfs to start after the hast providers are primary at reboot. I hope i explained it correctly. Thanks for your time. regards Johan Hendriks From owner-freebsd-fs@FreeBSD.ORG Fri Sep 2 15:07:41 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A8CE2106564A for ; Fri, 2 Sep 2011 15:07:41 +0000 (UTC) (envelope-from trent@snakebite.org) Received: from exchange.liveoffice.com (exchla3.liveoffice.com [64.70.67.188]) by mx1.freebsd.org (Postfix) with ESMTP id 8D2D08FC08 for ; Fri, 2 Sep 2011 15:07:41 +0000 (UTC) Received: from EXCASUM02.exchhosting.com (192.168.11.116) by exhub04.exchhosting.com (192.168.11.100) with Microsoft SMTP Server (TLS) id 8.2.213.0; Fri, 2 Sep 2011 08:07:38 -0700 Received: from [10.211.55.3] (35.11.55.172) by exchange.liveoffice.com (192.168.11.116) with Microsoft SMTP Server (TLS) id 8.2.213.0; Fri, 2 Sep 2011 08:07:38 -0700 Message-ID: <4E60F138.1000705@snakebite.org> Date: Fri, 2 Sep 2011 11:07:36 -0400 From: Trent Nelson User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20110624 Thunderbird/5.0 MIME-Version: 1.0 To: References: <4E5F811A.2040307@snakebite.org> <7FAD4A4D-2465-4A80-A445-1D34424F8BB6@3geeks.org> <4E5FBE3E.7020706@snakebite.org> <553883C7-B97D-429F-AF4A-E208B6051B62@3geeks.org> In-Reply-To: <553883C7-B97D-429F-AF4A-E208B6051B62@3geeks.org> Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: gptzfsboot and 4k sector raidz X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Sep 2011 15:07:41 -0000 On 01-Sep-11 1:46 PM, Daniel Mayfield wrote: >> Enough about sizes; what's your read/write performance like between >> 512-byte/4K? I didn't think to test performance in the 4K >> configuration; I really wish I had, now. > > I didn't test performance. I'm doing all the work running from the > mfsBSD boot disc. I'm not sure a simple 'dd' is a good test, but if > you have suggestions, I'm open. It's a good test when it shows you can't get more than 20-30MB/sec in bursts for each disk ;-) I didn't think to try just dd'ing directly to the disk versus through a zfs pool; I think the results of that would be pretty conclusive with regards to whether or not using the new 4K-sector drives with 512-byte sectors is as bad as I'm seeing, or if it's something else. Trent. From owner-freebsd-fs@FreeBSD.ORG Fri Sep 2 16:34:33 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BE0CC106564A for ; Fri, 2 Sep 2011 16:34:33 +0000 (UTC) (envelope-from radiomlodychbandytow@o2.pl) Received: from tur.go2.pl (tur.go2.pl [193.17.41.50]) by mx1.freebsd.org (Postfix) with ESMTP id 7BDCC8FC15 for ; Fri, 2 Sep 2011 16:34:33 +0000 (UTC) Received: from moh2-ve2.go2.pl (moh2-ve2.go2.pl [193.17.41.200]) by tur.go2.pl (Postfix) with ESMTP id 66319230C14 for ; Fri, 2 Sep 2011 18:16:17 +0200 (CEST) Received: from moh2-ve2.go2.pl (unknown [10.0.0.200]) by moh2-ve2.go2.pl (Postfix) with ESMTP id 355D6B00169 for ; Fri, 2 Sep 2011 18:16:15 +0200 (CEST) Received: from unknown (unknown [10.0.0.42]) by moh2-ve2.go2.pl (Postfix) with SMTP for ; Fri, 2 Sep 2011 18:16:14 +0200 (CEST) Received: from host892524678.com-promis.3s.pl [89.25.246.78] by poczta.o2.pl with ESMTP id bfXQjd; Fri, 02 Sep 2011 18:16:14 +0200 Message-ID: <4E61014B.7080100@o2.pl> Date: Fri, 02 Sep 2011 18:16:11 +0200 From: =?UTF-8?B?UmFkaW8gbcWCb2R5Y2ggYmFuZHl0w7N3?= User-Agent: Mozilla/5.0 (Windows NT 5.2; WOW64; rv:6.0.1) Gecko/20110830 Thunderbird/6.0.1 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-O2-Trust: 2, 61 X-O2-SPF: neutral Subject: [ZFS] lzjb_uncompress possible access violation? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Sep 2011 16:34:33 -0000 As far as I can see, when checksumming is turned off or there's a collision, it is possible that lzjb_uncompress is fed with corrupted data. Source length is entirely ignored and since source has to be shorter than dest, it is broken. -- Twoje radio From owner-freebsd-fs@FreeBSD.ORG Fri Sep 2 16:54:04 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D2DBF106564A for ; Fri, 2 Sep 2011 16:54:04 +0000 (UTC) (envelope-from brodbd@uw.edu) Received: from mail-ew0-f54.google.com (mail-ew0-f54.google.com [209.85.215.54]) by mx1.freebsd.org (Postfix) with ESMTP id 72DC68FC0C for ; Fri, 2 Sep 2011 16:54:04 +0000 (UTC) Received: by ewy1 with SMTP id 1so1935806ewy.13 for ; Fri, 02 Sep 2011 09:54:03 -0700 (PDT) MIME-Version: 1.0 Received: by 10.213.17.140 with SMTP id s12mr492906eba.111.1314980973922; Fri, 02 Sep 2011 09:29:33 -0700 (PDT) Received: by 10.213.22.210 with HTTP; Fri, 2 Sep 2011 09:29:33 -0700 (PDT) Date: Fri, 2 Sep 2011 09:29:33 -0700 Message-ID: From: David Brodbeck To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: ZFSv28+NFSv4 poor file creation performance, "sync=disabled" has no effect X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Sep 2011 16:54:05 -0000 I originally posted this on FreeBSD-questions, but it was suggested that I bring it here. I'm testing FreeBSD 9.0-BETA with an eye toward eventually using FreeBSD 9.0 to replace some existing OpenSolaris 2008.11 installations. I've found NFS file creation performance (as measured by Bonnie++) is equally slow for both with default settings. However, on OpenSolaris I disable the ZIL to improve file creation performance. This tuning parameter was removed from FreeBSD 9.0; its replacement is supposed to be the per-filesystem flag "sync", but setting this flag seems to have no effect. I did recompile the FreeBSD kernel without debugging features before doing the tests, so I don't think this is a case of debugging code slowing things down. Here's the relevant data; these are all from bonnie++'s "sequential create" benchmark. The NFS client was RedHat Enterprise Linux 5.6. OpenSolaris 2008.11, default settings: 58/second OpenSolaris 2008.11, with "zil_disable=1": 1258/second FreeBSD 9.0-BETA, default settings: 107/second FreeBSD 9.0-BETA, with "sync=disabled": 106/second So it appears the "sync" ZFS parameter has no effect in FreeBSD. Has anyone else seen this? Is there a way to improve NFS file creation performance now that zil_disable has been removed? -- David Brodbeck System Administrator, Linguistics University of Washington From owner-freebsd-fs@FreeBSD.ORG Fri Sep 2 23:40:31 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E78071065674; Fri, 2 Sep 2011 23:40:31 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id BEF5B8FC13; Fri, 2 Sep 2011 23:40:31 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p82NeV7v061962; Fri, 2 Sep 2011 23:40:31 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p82NeVm8061952; Fri, 2 Sep 2011 23:40:31 GMT (envelope-from linimon) Date: Fri, 2 Sep 2011 23:40:31 GMT Message-Id: <201109022340.p82NeVm8061952@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/160410: [smbfs] [hang] smbfs hangs when transferring large files X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 02 Sep 2011 23:40:32 -0000 Old Synopsis: smbfs hangs when transferring large files New Synopsis: [smbfs] [hang] smbfs hangs when transferring large files Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Fri Sep 2 23:40:19 UTC 2011 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=160410 From owner-freebsd-fs@FreeBSD.ORG Sat Sep 3 01:36:11 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 64E75106566C for ; Sat, 3 Sep 2011 01:36:11 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 219438FC12 for ; Sat, 3 Sep 2011 01:36:10 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEACWEYU6DaFvO/2dsb2JhbABCDoQ/pRaBRgEBAQECAQEBASArIAsFFg4KAgINGQIpAQkmBggHBAEcBIdSBKVGkWeBLIQtgREEkRqCEpAiM1Q X-IronPort-AV: E=Sophos;i="4.68,322,1312171200"; d="scan'208";a="136405582" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 02 Sep 2011 21:36:10 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 43455B3F80; Fri, 2 Sep 2011 21:36:10 -0400 (EDT) Date: Fri, 2 Sep 2011 21:36:10 -0400 (EDT) From: Rick Macklem To: David Brodbeck Message-ID: <14220705.747900.1315013770239.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org Subject: Re: ZFSv28+NFSv4 poor file creation performance, "sync=disabled" has no effect X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 03 Sep 2011 01:36:11 -0000 David Brodbeck wrote: > I originally posted this on FreeBSD-questions, but it was suggested > that I > bring it here. > I'm testing FreeBSD 9.0-BETA with an eye toward eventually using > FreeBSD 9.0 to replace some existing OpenSolaris 2008.11 > installations. I've found NFS file creation performance (as measured > by Bonnie++) is equally slow for both with default settings. However, > on OpenSolaris I disable the ZIL to improve file creation performance. I know nothing about ZFS, so all I can do is pass along what others have said in previous posts. (I'd suggest you look through the freebsd-fs@ archives.) One post explained how disabling the ZIL can result in up to 5seconds worth of changes being lost if/when the server crashes. (A lot can change on a file system in 5sec. The NFS protocol assumes all fs changes related to a file creation are done before the server replies to the RPC. As such, disabling the ZIL does violate the protocol specs and means you are living dangerously.) > This tuning parameter was removed from FreeBSD 9.0; its replacement > is supposed to be the per-filesystem flag "sync", but setting this > flag seems to have no effect. > > I did recompile the FreeBSD kernel without debugging features before > doing the tests, so I don't think this is a case of debugging code > slowing things down. > > Here's the relevant data; these are all from bonnie++'s "sequential > create" benchmark. The NFS client was RedHat Enterprise Linux 5.6. > > OpenSolaris 2008.11, default settings: 58/second > OpenSolaris 2008.11, with "zil_disable=1": 1258/second > > FreeBSD 9.0-BETA, default settings: 107/second > FreeBSD 9.0-BETA, with "sync=disabled": 106/second > > > So it appears the "sync" ZFS parameter has no effect in FreeBSD. Has > anyone else seen this? Is there a way to improve NFS file creation > performance now that zil_disable has been removed? > Some have reported good results from putting the ZIL on a dedicated device. Some use an SSD, but there are write bandwidth issues related to this. If I understood them correctly, you need to make sure that the SSD is designed to provide good write performance and should be configured to a much larger size than what the ZIL would use. Again, there were good threads discussing this on freebsd-fs@. I have never used either ZFS nor an SSD, so the above is just paraphrasing my understanding from these threads and could be way off base. (I do know that NFS clients expect the changes related to file creation to be stored on non-volatile storage before replying to the creation RPC.) > -- > David Brodbeck > System Administrator, Linguistics > University of Washington > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"