From owner-freebsd-fs@FreeBSD.ORG  Sun Aug 28 07:36:33 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 273801065673
	for <fs@freebsd.org>; Sun, 28 Aug 2011 07:36:33 +0000 (UTC)
	(envelope-from trent@snakebite.org)
Received: from exchange.liveoffice.com (exchla3.liveoffice.com [64.70.67.188])
	by mx1.freebsd.org (Postfix) with ESMTP id 0D9788FC13
	for <fs@freebsd.org>; Sun, 28 Aug 2011 07:36:32 +0000 (UTC)
Received: from EXCASUM07.exchhosting.com (192.168.11.194) by
	exhub08.exchhosting.com (192.168.11.106) with Microsoft SMTP Server
	(TLS) id 8.2.213.0; Sun, 28 Aug 2011 00:26:26 -0700
Received: from EXMBX10.exchhosting.com
	([fe80:0000:0000:0000:8133:164f:44.75.166.49]) by
	EXCASUM07.exchhosting.com
	([192.168.11.194]) with mapi; Sun, 28 Aug 2011 00:26:26 -0700
From: Trent Nelson <trent@snakebite.org>
To: "fs@freebsd.org" <fs@freebsd.org>
Date: Sun, 28 Aug 2011 00:26:26 -0700
Thread-Topic: How do you zfs rename a dataset that has children?
Thread-Index: AcxlU8tu8ntNhIgBSISPvhSrpgKwfg==
Message-ID: <5B89610A-4D16-422A-9E52-F182CAF76D68@snakebite.org>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
acceptlanguage: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Cc: 
Subject: How do you zfs rename a dataset that has children?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 28 Aug 2011 07:36:33 -0000

This has me scratching my head:

[root@usbkey/ttypts/0(~)#] zfs create tank/host/foo
[root@usbkey/ttypts/0(~)#] zfs create tank/host/foo/bar
[root@usbkey/ttypts/0(~)#] zfs rename tank/host/foo tank/host/test
Assertion failed: (!clp->cl_alldependents), file /usr/src/cddl/lib/libzfs/../../../cddl/contrib/opensolaris/lib/libzfs/common/libzfs_changelist.c, line 470.
zsh: abort (core dumped)  zfs rename tank/host/foo tank/host/test

Say wha'?

Renaming the child dataset first works, but it's not what I want, obviously:

[root@usbkey/ttypts/0(~)#] zfs rename tank/host/foo/bar tank/host/test/bar
cannot create 'tank/host/test/bar': parent does not exist
[root@usbkey/ttypts/0(~)#] zfs rename -p tank/host/foo/bar tank/host/test/bar
[root@usbkey/ttypts/0(~)#] zfs rename tank/host/foo tank/host/test/bar
cannot rename 'tank/host/foo': dataset already exists

[root@usbkey/ttypts/0(~)#] uname -a
FreeBSD usbkey.home.trent.me 8.2-STABLE FreeBSD 8.2-STABLE #2 r224667M: Sat Aug  6 04:11:46 EDT 2011     root@home.trent.me:/usr/obj/usr/src/sys/GENERIC  amd64

(The 'M' in r224667M is due to a device ID I changed in e1000.h; unrelated
to this issue.)

What am I doing wrong?

(My actual use case is more complicated than the test case above; I built a
new system, named 'fulcrum', from an existing build, named 'flanker'. I want
to rename tank/host/flanker -> tank/host/fulcrum (hence booting from the
usbkey).)
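
For reference, the outcome being asked for can be written down as a plain
name mapping: renaming a parent dataset is expected to carry its descendants
along under the new prefix. The Python sketch below only manipulates dataset
names as strings and does not call zfs at all; the function name
`renamed_datasets` and the extra `tank/host/foobar` entry are hypothetical,
added only to show that a sibling sharing the prefix is left alone.

```python
def renamed_datasets(names, old, new):
    """Map each dataset name under `old` to its expected name under `new`."""
    result = {}
    for name in names:
        if name == old:
            result[name] = new
        elif name.startswith(old + "/"):
            # Descendant: keep the relative part, swap the parent prefix.
            result[name] = new + name[len(old):]
        else:
            # Unrelated dataset (even if it shares a string prefix): unchanged.
            result[name] = name
    return result


mapping = renamed_datasets(
    ["tank/host/foo", "tank/host/foo/bar", "tank/host/foobar"],
    "tank/host/foo",
    "tank/host/test",
)
for before, after in mapping.items():
    print(before, "->", after)
```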


Regards,

	Trent.


From owner-freebsd-fs@FreeBSD.ORG  Mon Aug 29 11:07:07 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C45851065678
	for <freebsd-fs@FreeBSD.org>; Mon, 29 Aug 2011 11:07:07 +0000 (UTC)
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id B2FED8FC16
	for <freebsd-fs@FreeBSD.org>; Mon, 29 Aug 2011 11:07:07 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p7TB77n5089262
	for <freebsd-fs@FreeBSD.org>; Mon, 29 Aug 2011 11:07:07 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p7TB77jW089260
	for freebsd-fs@FreeBSD.org; Mon, 29 Aug 2011 11:07:07 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 29 Aug 2011 11:07:07 GMT
Message-Id: <201108291107.p7TB77jW089260@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to
	owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster <bugmaster@FreeBSD.org>
To: freebsd-fs@FreeBSD.org
Cc: 
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
X-List-Received-Date: Mon, 29 Aug 2011 11:07:07 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.


S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/160035  fs         [zfs] zfs rollback does not invalidate mmapped cache
o kern/159971  fs         [ffs] [panic] panic with soft updates journaling durin
o kern/159930  fs         [ufs] [panic] kernel core
o kern/159418  fs         [tmpfs] [panic] tmpfs kernel panic: recursing on non r
o kern/159402  fs         [zfs][loader] symlinks cause I/O errors
o kern/159357  fs         [zfs] ZFS MAXNAMELEN macro has confusing name (off-by-
o kern/159356  fs         [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s
o kern/159351  fs         [nfs] [patch] - divide by zero in mountnfs()
o kern/159251  fs         [zfs] [request]: add FLETCHER4 as DEDUP hash option
o kern/159233  fs         [ext2fs] [patch] fs/ext2fs: finish reallocblk implemen
o kern/159232  fs         [ext2fs] [patch] fs/ext2fs: merge ext2_readwrite into 
o kern/159077  fs         [zfs] Can't cd .. with latest zfs version
o kern/159048  fs         [smbfs] smb mount corrupts large files
o kern/159045  fs         [zfs] [hang] ZFS scrub freezes system
o kern/158839  fs         [zfs] ZFS Bootloader Fails if there is a Dead Disk
o kern/158802  fs         [amd] amd(8) ICMP storm and unkillable process.
o kern/158711  fs         [ffs] [panic] panic in ffs_blkfree and ffs_valloc
o kern/158231  fs         [nullfs] panic on unmounting nullfs mounted over ufs o
f kern/157929  fs         [nfs] NFS slow read
o kern/157722  fs         [geli] unable to newfs a geli encrypted partition
o kern/157399  fs         [zfs] trouble with: mdconfig force delete && zfs strip
o kern/157179  fs         [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov
o kern/156933  fs         [zfs] ZFS receive after read on readonly=on filesystem
o kern/156797  fs         [zfs] [panic] Double panic with FreeBSD 9-CURRENT and 
o kern/156781  fs         [zfs] zfs is losing the snapshot directory,
p kern/156545  fs         [ufs] mv could break UFS on SMP systems
o kern/156193  fs         [ufs] [hang] UFS snapshot hangs && deadlocks processes
o kern/156168  fs         [nfs] [panic] Kernel panic under concurrent access ove
o kern/156039  fs         [nullfs] [unionfs] nullfs + unionfs do not compose, re
o kern/155615  fs         [zfs] zfs v28 broken on sparc64 -current
o kern/155587  fs         [zfs] [panic] kernel panic with zfs
o kern/155411  fs         [regression] [8.2-release] [tmpfs]: mount: tmpfs : No 
o kern/155199  fs         [ext2fs] ext3fs mounted as ext2fs gives I/O errors
o bin/155104   fs         [zfs][patch] use /dev prefix by default when importing
o kern/154930  fs         [zfs] cannot delete/unlink file from full volume -> EN
o kern/154828  fs         [msdosfs] Unable to create directories on external USB
o kern/154491  fs         [smbfs] smb_co_lock: recursive lock for object 1
o kern/154447  fs         [zfs] [panic] Occasional panics - solaris assert somew
p kern/154228  fs         [md] md getting stuck in wdrain state
o kern/153996  fs         [zfs] zfs root mount error while kernel is not located
o kern/153847  fs         [nfs] [panic] Kernel panic from incorrect m_free in nf
o kern/153753  fs         [zfs] ZFS v15 - grammatical error when attempting to u
o kern/153716  fs         [zfs] zpool scrub time remaining is incorrect
o kern/153695  fs         [patch] [zfs] Booting from zpool created on 4k-sector 
o kern/153680  fs         [xfs] 8.1 failing to mount XFS partitions
o kern/153520  fs         [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable
o kern/153418  fs         [zfs] [panic] Kernel Panic occurred writing to zfs vol
o kern/153351  fs         [zfs] locking directories/files in ZFS
o bin/153258   fs         [patch][zfs] creating ZVOLs requires `refreservation' 
s kern/153173  fs         [zfs] booting from a gzip-compressed dataset doesn't w
o kern/153126  fs         [zfs] vdev failure, zpool=peegel type=vdev.too_small
p kern/152488  fs         [tmpfs] [patch] mtime of file updated when only inode 
o kern/152022  fs         [nfs] nfs service hangs with linux client [regression]
o kern/151942  fs         [zfs] panic during ls(1) zfs snapshot directory
o kern/151905  fs         [zfs] page fault under load in /sbin/zfs
o kern/151845  fs         [smbfs] [patch] smbfs should be upgraded to support Un
o bin/151713   fs         [patch] Bug in growfs(8) with respect to 32-bit overfl
o kern/151648  fs         [zfs] disk wait bug
o kern/151629  fs         [fs] [patch] Skip empty directory entries during name 
o kern/151330  fs         [zfs] will unshare all zfs filesystem after execute a 
o kern/151326  fs         [nfs] nfs exports fail if netgroups contain duplicate 
o kern/151251  fs         [ufs] Can not create files on filesystem with heavy us
o kern/151226  fs         [zfs] can't delete zfs snapshot
o kern/151111  fs         [zfs] vnodes leakage during zfs unmount
o kern/150503  fs         [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501  fs         [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390  fs         [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336  fs         [nfs] mountd/nfsd became confused; refused to reload n
o kern/150207  fs         zpool(1): zpool import -d /dev tries to open weird dev
o kern/149208  fs         mksnap_ffs(8) hang/deadlock
o kern/149173  fs         [patch] [zfs] make OpenSolaris <sys/nvpair.h> installa
o kern/149015  fs         [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014  fs         [zfs] [patch] declarations in ZFS libraries/utilities 
o kern/149013  fs         [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504  fs         [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490  fs         [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368  fs         [zfs] ZFS hanging forever on 8.1-PRERELEASE
o bin/148296   fs         [zfs] [loader] [patch] Very slow probe in /usr/src/sys
o kern/148204  fs         [nfs] UDP NFS causes overload
o kern/148138  fs         [zfs] zfs raidz pool commands freeze
o kern/147903  fs         [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881  fs         [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147790  fs         [zfs] zfs set acl(mode|inherit) fails on existing zfs
o kern/147560  fs         [zfs] [boot] Booting 8.1-PRERELEASE raidz system take 
o kern/147420  fs         [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt 
o kern/146941  fs         [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786  fs         [zfs] zpool import hangs with checksum errors
o kern/146708  fs         [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528  fs         [zfs] Severe memory leak in ZFS on i386
o kern/146502  fs         [nfs] FreeBSD 8 NFS Client Connection to Server
s kern/145712  fs         [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411  fs         [xfs] [panic] Kernel panics shortly after mounting an 
o bin/145309   fs         bsdlabel: Editing disk label invalidates the whole dev
o kern/145272  fs         [zfs] [panic] Panic during boot when accessing zfs on 
o kern/145246  fs         [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238  fs         [zfs] [panic] kernel panic on zpool clear tank
o kern/145229  fs         [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189  fs         [nfs] nfsd performs abysmally under load
o kern/144929  fs         [ufs] [lor] vfs_bio.c + ufs_dirhash.c
p kern/144447  fs         [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416  fs         [panic] Kernel panic on online filesystem optimization
s kern/144415  fs         [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234  fs         [zfs] Cannot boot machine with recent gptzfsboot code 
o kern/143825  fs         [nfs] [panic] Kernel panic on NFS client
o bin/143572   fs         [zfs] zpool(1): [patch] The verbose output from iostat
o kern/143212  fs         [nfs] NFSv4 client strange work ...
o kern/143184  fs         [zfs] [lor] zfs/bufwait LOR
o kern/142878  fs         [zfs] [vfs] lock order reversal
o kern/142597  fs         [ext2fs] ext2fs does not work on filesystems with real
o kern/142489  fs         [zfs] [lor] allproc/zfs LOR
o kern/142466  fs         Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142306  fs         [zfs] [panic] ZFS drive (from OSX Leopard) causes two 
o kern/142068  fs         [ufs] BSD labels are got deleted spontaneously
o kern/141897  fs         [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463  fs         [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305  fs         [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091  fs         [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086  fs         [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010  fs         [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888  fs         [zfs] boot fail from zfs root while the pool resilveri
o kern/140661  fs         [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640  fs         [zfs] snapshot crash
o kern/140068  fs         [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725  fs         [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715  fs         [zfs] vfs.numvnodes leak on busy zfs
p bin/139651   fs         [nfs] mount(8): read-only remount of NFS volume does n
o kern/139597  fs         [patch] [tmpfs] tmpfs initializes va_gen but doesn't u
o kern/139564  fs         [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo
o kern/139407  fs         [smbfs] [panic] smb mount causes system crash if remot
o kern/138662  fs         [panic] ffs_blkfree: freeing free block
o kern/138421  fs         [ufs] [patch] remove UFS label limitations
o kern/138202  fs         mount_msdosfs(1) see only 2Gb
o kern/136968  fs         [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945  fs         [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944  fs         [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873  fs         [ntfs] Missing directories/files on NTFS volume
o kern/136865  fs         [nfs] [patch] NFS exports atomic and on-the-fly atomic
p kern/136470  fs         [nfs] Cannot mount / in read-only, over NFS
o kern/135546  fs         [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469  fs         [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050  fs         [zfs] ZFS clears/hides disk errors on reboot
o kern/134491  fs         [zfs] Hot spares are rather cold...
o kern/133676  fs         [smbfs] [panic] umount -f'ing a vnode-based memory dis
o kern/133174  fs         [msdosfs] [patch] msdosfs must support multibyte inter
o kern/132960  fs         [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397  fs         reboot causes filesystem corruption (failure to sync b
o kern/132331  fs         [ufs] [lor] LOR ufs and syncer
o kern/132237  fs         [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145  fs         [panic] File System Hard Crashes
o kern/131441  fs         [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360  fs         [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs         [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs         makefs: error "Bad file descriptor"  on the mount poin
o kern/130920  fs         [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130210  fs         [nullfs] Error by check nullfs
f kern/130133  fs         [panic] [zfs] 'kmem_map too small' caused by make clea
o kern/129760  fs         [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488  fs         [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: 
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs         [panic] non-userfriendly panic when trying to mount(8)
o kern/127787  fs         [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs
f kern/127375  fs         [zfs] If vm.kmem_size_max>"1073741823" then write spee
o bin/127270   fs         fsck_msdosfs(8) may crash if BytesPerSec is zero
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
f kern/126703  fs         [panic] [zfs] _mtx_lock_sleep: recursed on non-recursi
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125895  fs         [ffs] [panic] kernel: panic: ffs_blkfree: freeing free
s kern/125738  fs         [zfs] [request] SHA256 acceleration in ZFS
o kern/123939  fs         [msdosfs] corrupts new files
f sparc/123566 fs         [zfs] zpool import issue: EOVERFLOW
o kern/122380  fs         [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121898   fs         [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121366   fs         [zfs] [patch] Automatic disk scrubbing from periodic(8
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
o kern/120483  fs         [ntfs] [patch] NTFS filesystem locking changes
o kern/120482  fs         [ntfs] [patch] Sync style changes between NetBSD and F
f kern/120210  fs         [zfs] [panic] reboot after panic: solaris assert: arc_
o kern/118912  fs         [2tb] disk sizing/geometry problem with large array
o kern/118713  fs         [minidump] [patch] Display media size required for a k
o bin/118249   fs         [ufs] mv(1): moving a directory changes its mtime
o kern/118126  fs         [nfs] [patch] Poor NFS server write performance
o kern/118107  fs         [ntfs] [panic] Kernel panic when accessing a file at N
o kern/117954  fs         [ufs] dirhash on very large directories blocks the mac
o bin/117315   fs         [smbfs] mount_smbfs(8) and related options can't mount
o kern/117314  fs         [ntfs] Long-filename only NTFS fs'es cause kernel pani
o kern/117158  fs         [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980   fs         [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o conf/116931  fs         lack of fsck_cd9660 prevents mounting iso images with 
o kern/116583  fs         [ffs] [hang] System freezes for short time when using 
o bin/115361   fs         [zfs] mount(8) gets into a state where it won't set/un
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala 
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o kern/113852  fs         [smbfs] smbfs does not properly implement DFS referral
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show 
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843  fs         [msdosfs] Long Names of files are incorrectly created 
o kern/111782  fs         [ufs] dump(8) fails horribly for large filesystems
s bin/111146   fs         [2tb] fsck(8) fails on 6T filesystem
o kern/109024  fs         [msdosfs] [iconv] mount_msdosfs: msdosfs_iconv: Operat
o kern/109010  fs         [msdosfs] can't mv directory within fat32 file system
o bin/107829   fs         [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106107  fs         [ufs] left-over fsck_snapshot after unfinished backgro
o kern/104406  fs         [ufs] Processes get stuck in "ufs" state under persist
o kern/104133  fs         [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035  fs         [ntfs] Directories in NTFS mounted disc images appear 
o kern/101324  fs         [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290   fs         [ntfs] mount_ntfs ignorant of cluster sizes
s bin/97498    fs         [request] newfs(8) has no option to clear the first 12
o kern/97377   fs         [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222   fs         [cd9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849   fs         [ufs] rename on UFS filesystem is not atomic
o bin/94810    fs         fsck(8) incorrectly reports 'file system marked clean'
o kern/94769   fs         [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733   fs         [smbfs] smbfs may cause double unlock
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272   fs         [ffs] [hang] Filling a filesystem while creating a sna
o kern/91134   fs         [smbfs] [patch] Preserve access and modification time 
a kern/90815   fs         [smbfs] [patch] SMBFS with character conversions somet
o kern/88657   fs         [smbfs] windows client hang when browsing a samba shar
o kern/88555   fs         [panic] ffs_blkfree: freeing free frag on AMD 64
o kern/88266   fs         [smbfs] smbfs does not implement UIO_NOCOPY and sendfi
o bin/87966    fs         [patch] newfs(8): introduce -A flag for newfs to enabl
o kern/87859   fs         [smbfs] System reboot while umount smbfs.
o kern/86587   fs         [msdosfs] rm -r /PATH fails with lots of small files
o bin/85494    fs         fsck_ffs: unchecked use of cg_inosused macro etc.
o kern/80088   fs         [smbfs] Incorrect file time setting on NTFS mounted vi
o bin/74779    fs         Background-fsck checks one filesystem twice and omits 
o kern/73484   fs         [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019    fs         [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774   fs         [ntfs] NTFS cannot "see" files on a WinXP filesystem
o bin/70600    fs         fsck(8) throws files away when it can't grow lost+foun
o kern/68978   fs         [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920   fs         [nwfs] Mounted Netware filesystem behaves strange
o kern/65901   fs         [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503   fs         [smbfs] mount_smbfs does not work as non-root
o kern/55617   fs         [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685   fs         [hang] Unbounded inode allocation causes kernel to loc
o kern/51583   fs         [nullfs] [patch] allow to work with devices and socket
o kern/36566   fs         [smbfs] System reboot with dead smb mount and umount
o kern/33464   fs         [ufs] soft update inconsistencies after system crash
o bin/27687    fs         fsck(8) wrapper is not properly passing options to fsc
o kern/18874   fs         [2TB] 32bit NFS servers export wrong negative values t

246 problems total.


From owner-freebsd-fs@FreeBSD.ORG  Mon Aug 29 17:32:48 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 347981065670;
	Mon, 29 Aug 2011 17:32:48 +0000 (UTC) (envelope-from lev@FreeBSD.org)
Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru
	[IPv6:2a01:4f8:131:60a2::2])
	by mx1.freebsd.org (Postfix) with ESMTP id C1F278FC0A;
	Mon, 29 Aug 2011 17:32:47 +0000 (UTC)
Received: from lion.home.serebryakov.spb.ru (unknown
	[IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c])
	(Authenticated sender: lev@serebryakov.spb.ru)
	by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id E6A994AC31; 
	Mon, 29 Aug 2011 21:32:45 +0400 (MSD)
Date: Mon, 29 Aug 2011 21:32:44 +0400
From: Lev Serebryakov <lev@FreeBSD.org>
Organization: FreeBSD
X-Priority: 3 (Normal)
Message-ID: <1742839983.20110829213244@serebryakov.spb.ru>
To: Ivan Voras <ivoras@freebsd.org>
In-Reply-To: <j3b7vs$tkd$1@dough.gmane.org>
References: <1963980291.20110826232758@serebryakov.spb.ru>
	<201108262052.p7QKqpen039191@chez.mckusick.com>
	<758608837.20110827112116@serebryakov.spb.ru>
	<j3b7vs$tkd$1@dough.gmane.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1251
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org
Subject: Re: Strange behaviour of UFS2+SU FS on FreeBSD 8-Stable: dreadful
	perofrmance for old data, excellent for new.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: lev@FreeBSD.org
X-List-Received-Date: Mon, 29 Aug 2011 17:32:48 -0000

Hello, Ivan.
You wrote on 27 August 2011, 21:02:44:

>>    I'm going to investigate later why it is only ~180MiB/s, when
>> theoretically it should be about (90*4) 360MiB/s linear read, and whom
>> to blame: UFS or geom_raid5 or both :)
> Try this: http://ivoras.net/blog/tree/2010-11-19.ufs-read-ahead.html
> (or it could be a hardware issue - controller bottleneck or something
> like that).
 It is stranger and more complex than a simple "180MiB/s" read.

 I have:

(1) software RAID5 (geom_raid5) on 5xWD Green HDDs (yes, I know that
    seeks are not very fast on these disks, but I'm discussing only
    linear access now). Stripe size is 128KiB. Theoretical maximum
    performance is about 4*90 = 360MiB/s.

(2) FS with 32K blocks (unfortunately, there is (WAS?) an old bug where
    the system locks up when there are 16KiB- and 64KiB-block FSes in
    one system).

(3) vfs.read_max=32, which means 32*32 = 1024KiB = 8 RAID stripes. Enough
    for parallel requests.
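
The arithmetic in the three points above can be checked with a short Python
sketch. The 90MiB/s per-disk rate, the stripe size, the block size and the
vfs.read_max value are the figures from this message; treating four of the
five disks as carrying data (one disk's worth of capacity holds parity) is
the usual RAID5 assumption behind the "4*90" figure. The variable names are
illustrative only.

```python
# Figures quoted in the message; names are illustrative only.
DISKS = 5                # WD Green HDDs in the geom_raid5 array
PER_DISK_MIB_S = 90      # assumed linear read rate of a single disk
STRIPE_KIB = 128         # RAID stripe size
FS_BLOCK_KIB = 32        # UFS2 block size
READ_MAX_BLOCKS = 32     # vfs.read_max, counted in filesystem blocks

# RAID5 keeps one disk's worth of parity, so 4 of the 5 disks carry data.
data_disks = DISKS - 1
theoretical_read_mib_s = data_disks * PER_DISK_MIB_S

# Read-ahead window implied by vfs.read_max, and the stripes it spans.
read_ahead_kib = READ_MAX_BLOCKS * FS_BLOCK_KIB
stripes_covered = read_ahead_kib // STRIPE_KIB

print(theoretical_read_mib_s, read_ahead_kib, stripes_covered)
```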

 And, in such conditions, well-placed large files (more than 1GiB; not
legacy ones, which are very fragmented because they were written on an
almost full FS) give from 120MiB/s up to 350MiB/s. Some files
tend to read faster, some not so fast, but it seems that the speed
can vary for one file from run to run (yes, I clear the memory cache by
reading big files between "bench" runs). And, yes, 350MiB/s is not
typical: 120-180MiB/s occurs much, much more often than higher speeds.

  Do you have any ideas how to debug this situation and make sure
that geom_raid5 does its best and is not the bottleneck?
  Maybe some other UFS2 tunings or diagnostics?
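
One way to put numbers on the run-to-run variation described above is to
time several sequential reads of the same file and compare the samples. The
Python sketch below is only an illustration of that idea, not a tool
mentioned in this thread: the function names and the 1MiB read size are
assumptions, and it does nothing to flush the OS cache between runs, so
that step still has to be done separately (for example by reading other
large files, as described above).

```python
import sys
import time


def read_throughput_mib_s(path, block_size=1024 * 1024):
    """Read `path` sequentially and return the observed rate in MiB/s."""
    total = 0
    start = time.perf_counter()
    # buffering=0 bypasses Python's own buffer; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    mib = total / (1024 * 1024)
    return mib / elapsed if elapsed > 0 else float("inf")


def summarize(rates):
    """Return (min, mean, max) of a list of MiB/s samples."""
    return min(rates), sum(rates) / len(rates), max(rates)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python script.py FILE  (FILE is supplied by the user)
    samples = [read_throughput_mib_s(sys.argv[1]) for _ in range(5)]
    print("min/mean/max MiB/s:", summarize(samples))
```

Because this reads through the filesystem, a low or unstable result does
not by itself say whether UFS or geom_raid5 is responsible; repeating the
same measurement against the raw RAID device would be needed to separate
the two.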

  I've tried such a configuration with software RAID5 (ICH9R, which is
certainly not a hardware implementation) on Windows, and it was much
more consistent (and almost always showed speeds near the theoretical
maximum).

  Another question is how to measure and diagnose writing...

-- 
// Black Lion AKA Lev Serebryakov <lev@FreeBSD.org>


From owner-freebsd-fs@FreeBSD.ORG  Mon Aug 29 19:06:15 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C28C31065676
	for <freebsd-fs@freebsd.org>; Mon, 29 Aug 2011 19:06:15 +0000 (UTC)
	(envelope-from hans@beastielabs.net)
Received: from mail.beastielabs.net (beasties.demon.nl [82.161.3.114])
	by mx1.freebsd.org (Postfix) with ESMTP id 172328FC16
	for <freebsd-fs@freebsd.org>; Mon, 29 Aug 2011 19:06:14 +0000 (UTC)
Received: from testsoekris.hotsoft.nl (localhost [127.0.0.1])
	by mail.beastielabs.net (8.14.4/8.14.4) with ESMTP id p7TIUwJc058058;
	Mon, 29 Aug 2011 20:30:58 +0200 (CEST)
	(envelope-from hans@testsoekris.hotsoft.nl)
Received: (from hans@localhost)
	by testsoekris.hotsoft.nl (8.14.4/8.14.4/Submit) id p7TIUwLx058057;
	Mon, 29 Aug 2011 20:30:58 +0200 (CEST) (envelope-from hans)
Date: Mon, 29 Aug 2011 20:30:58 +0200
From: Hans Ottevanger <hans@beastielabs.net>
To: freebsd-current@freebsd.org
Message-ID: <20110829183058.GA57564@testsoekris.hotsoft.nl>
References: <4E4F71B5.3010606@barafranca.com>
	<20110821100426.GA28260@testsoekris.hotsoft.nl>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110821100426.GA28260@testsoekris.hotsoft.nl>
User-Agent: Mutt/1.4.2.3i
Cc: freebsd-fs@freebsd.org
Subject: Snapshots fail with UFS+J (was: Re: Fwd: Re: Can *you* UFS snapshot
	a filesystem with 9.0-BETA1?)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
X-List-Received-Date: Mon, 29 Aug 2011 19:06:15 -0000

On Sun, Aug 21, 2011 at 12:04:26PM +0200, Hans Ottevanger wrote:
> On Sat, Aug 20, 2011 at 09:35:01AM +0100, Hugo Silva wrote:
> > 
> > 
> > On Thu, 18 Aug 2011 10:22:31 +0100,
> > Hugo Silva <hugo@barafranca.com> wrote:
> > 
> > Hello,
> > 
> > > I'm wondering. On a virtual machine (amd64 HVM+PV), it's crashing
> > > every time. Not sure if this is SNAFU, as I had never used ufs
> > > snapshots on freebsd before.
> > > 
> > > After running mksnap_ffs, ssh stops working (a telnet session doesn't
> > > show the sshd banner). The ssh session where the command was run from
> > > stops responding, the webserver dies and xm console'ing from the dom0
> > > works, but the VM is unresponsive (ie no login prompt on ENTER).
> > > 
> > > Anyone else seeing the same?
> > 
> > I've tried in a FreeBSD guest (9.0-beta1/i386) in VirtualBox and
> > I see a LOR (or what looks like a LOR), then the system freezes.
> > This is 100% reproducible.
> > 
> > Unfortunately, I'm not able to dump a panic or to break into the
> > debugger, so here is a screenshot:
> > http://user.lamaiziere.net/patrick/public/lormksnap.png
> > 
> > You should ask on freebsd-current@
> > 
> 
> Hi,
> 
> I can confirm that this happens on "real iron" too.
> 
> I use an i386 test installation (P4 2.4 GHz, 2GB RAM, 500GB PATA disk),
> running 9.0-BETA1 as distributed (with a kernel effectively being GENERIC
> with devices removed that I don't have). When I try to make a snapshot
> using
> 
> cd /usr; mksnap_ffs /usr/.snap/testsnap
> 
> the system is still responsive for a few seconds, with lots of disk
> activity, but then it prints the following output on the console (using
> firewire and dcons to ease capturing):
> 
> lock order reversal:
>  1st 0xc5a289e8 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:425
>  2nd 0xdeb3c078 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2658
>  3rd 0xc5663af8 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:546
> KDB: stack backtrace:
> db_trace_self_wrapper(c09ec6ba,616e735f,6f687370,3a632e74,a363435,...) at db_trace_self_wrapper+0x26
> kdb_backtrace(c07099eb,c09efe14,c5035308,c5039408,c4fda440,...) at kdb_backtrace+0x2a
> _witness_debugger(c09efe14,c5663af8,c09df984,c5039408,c0a10ba2,...) at _witness_debugger+0x25
> witness_checkorder(c5663af8,9,c0a10ba2,222,0,...) at witness_checkorder+0x839
> __lockmgr_args(c5663af8,80100,c5663b18,0,0,...) at __lockmgr_args+0x804
> ffs_lock(c4fda568,c0bf1250,c59b9c30,80100,c5663aa0,...) at ffs_lock+0x8a
> VOP_LOCK1_APV(c0a7fb80,c4fda568,c4fda588,c0a8df20,c5663aa0,...) at VOP_LOCK1_APV+0xb5
> _vn_lock(c5663aa0,80100,c0a10ba2,222,c5011e80,...) at _vn_lock+0x5e
> ffs_snapshot(c54f9798,c52dda60,c0a13fb0,1a2,0,...) at ffs_snapshot+0x14cb
> ffs_mount(c54f9798,c59b0300,ff,394,3,...) at ffs_mount+0x1c13
> vfs_donmount(c59b9b80,11100,c50c7c80,c50c7c80,c59ae580,...) at vfs_donmount+0x11e7
> nmount(c59b9b80,c4fdacec,c4fdad28,c09ee6dd,0,...) at nmount+0x84
> syscallenter(c59b9b80,c4fdace4,c4fdace4,0,c0ab5690,...) at syscallenter+0x263
> syscall(c4fdad28) at syscall+0x34
> Xint0x80_syscall() at Xint0x80_syscall+0x21
> --- syscall (378, FreeBSD ELF32, nmount), eip = 0x280db52b, esp = 0xbfbfe59c, ebp = 0xbfbfed18 ---
> lock order reversal:
>  1st 0xdeb3c078 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2658
>  2nd 0xc51a72dc snaplk (snaplk) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:818
> KDB: stack backtrace:
> db_trace_self_wrapper(c09ec6ba,662f7366,735f7366,7370616e,2e746f68,...) at db_trace_self_wrapper+0x26
> kdb_backtrace(c07099eb,c09efdfb,c5035308,c5039b58,c4fda440,...) at kdb_backtrace+0x2a
> _witness_debugger(c09efdfb,c51a72dc,c0a10c04,c5039b58,c0a10ba2,...) at _witness_debugger+0x25
> witness_checkorder(c51a72dc,9,c0a10ba2,332,c5a28a08,...) at witness_checkorder+0x839
> __lockmgr_args(c51a72dc,80400,c5a28a08,0,0,...) at __lockmgr_args+0x804
> ffs_lock(c4fda568,deb2434c,100000,80400,c5a28990,...) at ffs_lock+0x8a
> VOP_LOCK1_APV(c0a7fb80,c4fda568,deb243a8,c0a8df20,c5a28990,...) at VOP_LOCK1_APV+0xb5
> _vn_lock(c5a28990,80400,c0a10ba2,332,0,...) at _vn_lock+0x5e
> ffs_snapshot(c54f9798,c52dda60,c0a13fb0,1a2,0,...) at ffs_snapshot+0x295e
> ffs_mount(c54f9798,c59b0300,ff,394,3,...) at ffs_mount+0x1c13
> vfs_donmount(c59b9b80,11100,c50c7c80,c50c7c80,c59ae580,...) at vfs_donmount+0x11e7
> nmount(c59b9b80,c4fdacec,c4fdad28,c09ee6dd,0,...) at nmount+0x84
> syscallenter(c59b9b80,c4fdace4,c4fdace4,0,c0ab5690,...) at syscallenter+0x263
> syscall(c4fdad28) at syscall+0x34
> Xint0x80_syscall() at Xint0x80_syscall+0x21
> --- syscall (378, FreeBSD ELF32, nmount), eip = 0x280db52b, esp = 0xbfbfe59c, ebp = 0xbfbfed18 ---
> 
> After this the system is fully unresponsive and requires a hard reset.
> 
> Once rebooted, the snapshot file appears to exist, but is unusable.
> 
> When reverting to just softupdates, i.e. disabling journaling on /usr,
> everything goes well, except that the same LOR's still do occur, though
> the addresses differ.
> 
> My amd64 9.0-CURRENT system, just updated to r225055, has the same issue,
> but since I do not have WITNESS in the kernel config there, the console
> output is missing.
> 
> BTW, this issue also makes dump(8) hang the system when the -L option
> is used.
> 
> Kind regards,
> 
> Hans Ottevanger
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"

Since I did not see any response to these messages and I cannot imagine that
Hugo and I are the only ones with this issue, I will follow up to my own post.

Just yesterday I tried to make a snapshot of the /usr filesystem (about
16 GB) on my amd64 test system (Q6600, 8GB RAM, 500GB SATA disk) running
9.0-BETA1 (r225228), and the problem still occurs. After these LORs:

lock order reversal:
 1st 0xfffffe00073ab278 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:425
 2nd 0xffffff81eb243498 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2658
 3rd 0xfffffe00073629f8 ufs (ufs) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:546
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
_witness_debugger() at _witness_debugger+0x2e
witness_checkorder() at witness_checkorder+0x807
__lockmgr_args() at __lockmgr_args+0xdc6
ffs_lock() at ffs_lock+0x8c
VOP_LOCK1_APV() at VOP_LOCK1_APV+0x9b
_vn_lock() at _vn_lock+0x47
ffs_snapshot() at ffs_snapshot+0x1c27
ffs_mount() at ffs_mount+0xa23
vfs_donmount() at vfs_donmount+0xddc
nmount() at nmount+0x63
syscallenter() at syscallenter+0x1aa
syscall() at syscall+0x4c
Xfast_syscall() at Xfast_syscall+0xdd
--- syscall (378, FreeBSD ELF64, nmount), rip = 0x8008a118c, rsp = 0x7fffffffd428, rbp = 0x7fffffffde4b ---
lock order reversal:
 1st 0xffffff81eb243498 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2658
 2nd 0xfffffe0007404a30 snaplk (snaplk) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:818
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
_witness_debugger() at _witness_debugger+0x2e
witness_checkorder() at witness_checkorder+0x807
__lockmgr_args() at __lockmgr_args+0xdc6
ffs_lock() at ffs_lock+0x8c
VOP_LOCK1_APV() at VOP_LOCK1_APV+0x9b
_vn_lock() at _vn_lock+0x47
ffs_snapshot() at ffs_snapshot+0x1b02
ffs_mount() at ffs_mount+0xa23
vfs_donmount() at vfs_donmount+0xddc
nmount() at nmount+0x63
syscallenter() at syscallenter+0x1aa
syscall() at syscall+0x4c
Xfast_syscall() at Xfast_syscall+0xdd
--- syscall (378, FreeBSD ELF64, nmount), rip = 0x8008a118c, rsp = 0x7fffffffd428, rbp = 0x7fffffffde4b ---
  
the system is completely unresponsive after a few seconds and can only
be revived by pushing the reset button.
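
For readers unfamiliar with these reports: a lock order reversal means
that a pair of locks was taken in one order on one code path and in the
opposite order on another. That can deadlock, but a report by itself does
not prove that it did. The sketch below is a toy checker in Python, only
loosely modelled on what WITNESS reports. The class and method names are
hypothetical, and which order was recorded first on the real system is an
assumption made for illustration.

```python
# Toy lock-order checker. Not kernel code; all names are illustrative.

class OrderChecker:
    def __init__(self):
        self.seen = set()       # (first, second) orders observed so far
        self.held = []          # locks currently held, in acquisition order
        self.reversals = []     # orders that contradict an earlier one

    def acquire(self, name):
        for h in self.held:
            if h == name:
                continue
            # Taking `name` while holding `h`: a reversal if the opposite
            # order (name before h) was recorded earlier.
            if (name, h) in self.seen and (h, name) not in self.seen:
                self.reversals.append((h, name))
            self.seen.add((h, name))
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)


checker = OrderChecker()

# Assumed first path: snaplk, then bufwait.
checker.acquire("snaplk")
checker.acquire("bufwait")
checker.release("bufwait")
checker.release("snaplk")

# Second path: bufwait, then snaplk (the order shown in the second report).
checker.acquire("bufwait")
checker.acquire("snaplk")

print(checker.reversals)    # [('bufwait', 'snaplk')]
```

The checker only remembers orders it has actually seen, so a reversal is
flagged the first time the conflicting order is attempted, whether or not
the two paths ever run at the same time.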

When making a snapshot of a larger filesystem it takes a bit longer, but
the system eventually locks up.

Note that this is not the usual extreme slowdown due to the snapshot
taking all the disk bandwidth: the system locks up tightly and does not
recover.

Is anybody else seeing this? Is it a known problem?

How to proceed?

Copied to freebsd-fs@ to elicit more response.

Kind regards,

Hans Ottevanger


From owner-freebsd-fs@FreeBSD.ORG  Mon Aug 29 19:38:53 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B940F106566C
	for <freebsd-fs@freebsd.org>; Mon, 29 Aug 2011 19:38:53 +0000 (UTC)
	(envelope-from luke@digital-crocus.com)
Received: from mail.digital-crocus.com (node2.digital-crocus.com
	[91.209.244.128])
	by mx1.freebsd.org (Postfix) with ESMTP id 7626E8FC08
	for <freebsd-fs@freebsd.org>; Mon, 29 Aug 2011 19:38:52 +0000 (UTC)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=dkselector;
	d=hybrid-logic.co.uk; 
	h=Received:Received:Subject:From:Reply-To:To:Cc:Content-Type:Organization:Date:Message-ID:Mime-Version:X-Mailer:Content-Transfer-Encoding:X-Spam-Score:X-Digital-Crocus-Maillimit:X-Authenticated-Sender:X-Complaints:X-Admin:X-Abuse;
	b=eJOA8OnBZEpzw45/VhNZ6yvAJAFntsbLQkQMUnjKnXKttuuTu9tLzqTWHdcD8kQqzDlbmCfiipk0juQfnxAuidYBKS3c9AqQrB+dQQzoHW37IivCmQh6d1U0ruWy3EwT;
Received: from luke by mail.digital-crocus.com with local (Exim 4.69 (FreeBSD))
	(envelope-from <luke@digital-crocus.com>) id 1Qy7ei-000Hh1-Kb
	for freebsd-fs@freebsd.org; Mon, 29 Aug 2011 20:38:00 +0100
Received: from 127cr.net ([78.105.122.99] helo=[192.168.1.23])
	by mail.digital-crocus.com with esmtpa (Exim 4.69 (FreeBSD))
	(envelope-from <luke-lists@hybrid-logic.co.uk>)
	id 1Qy7ei-000Hgf-7y; Mon, 29 Aug 2011 20:38:00 +0100
From: Luke Marsden <luke-lists@hybrid-logic.co.uk>
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset="UTF-8"
Organization: Hybrid Web Cluster
Date: Mon, 29 Aug 2011 20:38:48 +0100
Message-ID: <1314646728.7898.44.camel@pow>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.2 
Content-Transfer-Encoding: 7bit
X-Spam-Score: -1.0
X-Digital-Crocus-Maillimit: done
X-Authenticated-Sender: luke
X-Complaints: abuse@digital-crocus.com
X-Admin: admin@digital-crocus.com
X-Abuse: abuse@digital-crocus.com (Please include full headers in abuse
	reports)
Cc: tech@hybrid-logic.co.uk
Subject: ZFS hang in production on 8.2-RELEASE
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: luke@hybrid-logic.co.uk
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 29 Aug 2011 19:38:53 -0000

Hi all,

I've just noticed a "partial" ZFS deadlock in production on 8.2-RELEASE.

FreeBSD XXX 8.2-RELEASE FreeBSD 8.2-RELEASE #0 r219081M: Wed Mar  2
08:29:52 CET 2011     root@www4:/usr/obj/usr/src/sys/GENERIC  amd64

There are 9 'zfs rename' processes and 1 'zfs umount -f' process hung.
Here is the procstat for the 'zfs umount -f':

13451 104337 zfs              -                mi_switch+0x176
sleepq_wait+0x42 _sleep+0x317 zfsvfs_teardown+0x269 zfs_umount+0x1c4
dounmount+0x32a unmount+0x38b syscallenter+0x1e5 syscall+0x4b
Xfast_syscall+0xe2 

And the 'zfs rename's all look the same:

20361 101049 zfs              -                mi_switch+0x176
sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV
+0x46 _vn_lock+0x47 lookup+0x6e1 namei+0x53a kern_rmdirat+0xa4
syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 

An 'ls' on a directory which contains most of the system's ZFS
mount-points (/hcfs) also hangs:

30073 101466 gnuls            -                mi_switch+0x176
sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV
+0x46 _vn_lock+0x47 zfs_root+0x85 lookup+0x9b8 namei+0x53a vn_open_cred
+0x3ac kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall
+0xe2 

If I truss the 'ls' it hangs on the stat syscall:
stat("/hcfs",{ mode=drwxr-xr-x ,inode=3,size=2012,blksize=16384 }) = 0
(0x0)

There is also a 'find -s / ! ( -fstype zfs ) -prune -or -path /tmp
-prune -or -path /usr/tmp -prune -or -path /var/tmp -prune -or
-path /var/db/portsnap -prune -or -print' running which is also hung:

 2650 101674 find             -                mi_switch+0x176
sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV
+0x46 _vn_lock+0x47 zfs_root+0x85 lookup+0x9b8 namei+0x53a vn_open_cred
+0x3ac kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall
+0xe2 

However I/O to the presently mounted filesystems continues to work (even
on parts of filesystems which are unlikely to be cached), and 'zfs list'
showing all the filesystems (3,500 filesystems with ~100 snapshots per
filesystem) also works.

Any activity on the structure of the ZFS hierarchy *under the hcfs
filesystem* hangs, such as a 'zfs create hpool/hcfs/test':

70868 101874 zfs              -                mi_switch+0x176
sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV
+0x46 _vn_lock+0x47 lookup+0x6e1 namei+0x53a kern_mkdirat+0xce
syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2 

BUT "zfs create hpool/system/opt/hello" (a ZFS filesystem in the same
pool, but not rooted on hpool/hcfs) does not hang, and succeeds
normally.

procstat -kk on the zfskern process gives:

  PID    TID COMM             TDNAME
KSTACK                       
    5 100045 zfskern          arc_reclaim_thre mi_switch+0x176
sleepq_timedwait+0x42 _cv_timedwait+0x134 arc_reclaim_thread+0x2a9
fork_exit+0x118 fork_trampoline+0xe 
    5 100046 zfskern          l2arc_feed_threa mi_switch+0x176
sleepq_timedwait+0x42 _cv_timedwait+0x134 l2arc_feed_thread+0x1ce
fork_exit+0x118 fork_trampoline+0xe 
    5 100098 zfskern          txg_thread_enter mi_switch+0x176
sleepq_wait+0x42 _cv_wait+0x129 txg_thread_wait+0x79 txg_quiesce_thread
+0xb5 fork_exit+0x118 fork_trampoline+0xe 
    5 100099 zfskern          txg_thread_enter mi_switch+0x176
sleepq_timedwait+0x42 _cv_timedwait+0x134 txg_thread_wait+0x3c
txg_sync_thread+0x365 fork_exit+0x118 fork_trampoline+0xe 
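
With this many hung processes it can help to group the stacks by the
first frame that is not generic sleep or lock plumbing. The following is
a minimal Python sketch of that idea, not an existing tool. It assumes
each thread's `procstat -kk` output has been rejoined onto one line (the
mail client wrapped the lines above), and the set of frames to skip is a
guess based only on the stacks quoted in this message.

```python
from collections import defaultdict

# Frames that only say "this thread is asleep or waiting for a lock".
# Assumption: chosen from the stacks quoted in this message.
SLEEP_FRAMES = {
    "mi_switch", "sleepq_wait", "sleepq_timedwait", "_sleep",
    "_cv_wait", "_cv_timedwait", "__lockmgr_args", "vop_stdlock",
    "VOP_LOCK1_APV", "_vn_lock",
}


def parse_line(line):
    """Split one unwrapped procstat -kk line into (pid, tid, comm, frames)."""
    fields = line.split()
    pid, tid, comm = int(fields[0]), int(fields[1]), fields[2]
    # Frames look like name+0xoffset; the thread-name column does not.
    frames = [f.split("+")[0] for f in fields[3:] if "+0x" in f]
    return pid, tid, comm, frames


def blocked_in(frames):
    """Return the first frame that is not sleep/lock plumbing, or None."""
    for frame in frames:
        if frame not in SLEEP_FRAMES:
            return frame
    return None


def group_by_wait(lines):
    """Map each blocking function to the (pid, comm) pairs waiting in it."""
    groups = defaultdict(list)
    for line in lines:
        pid, _tid, comm, frames = parse_line(line)
        groups[blocked_in(frames)].append((pid, comm))
    return dict(groups)


# Stacks from this message, rejoined and shortened for the example.
sample = [
    "13451 104337 zfs - mi_switch+0x176 sleepq_wait+0x42 _sleep+0x317 "
    "zfsvfs_teardown+0x269 zfs_umount+0x1c4 dounmount+0x32a",
    "20361 101049 zfs - mi_switch+0x176 sleepq_wait+0x42 "
    "__lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV+0x46 "
    "_vn_lock+0x47 lookup+0x6e1 namei+0x53a",
    "30073 101466 gnuls - mi_switch+0x176 sleepq_wait+0x42 "
    "__lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV+0x46 "
    "_vn_lock+0x47 zfs_root+0x85 lookup+0x9b8",
]
groups = group_by_wait(sample)
for func, waiters in groups.items():
    print(func, waiters)
```

On the sample this separates the unmount (in zfsvfs_teardown) from the
threads waiting on a vnode lock in lookup and zfs_root, which is the same
split described in the text above.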

Any ideas on what might be causing this?

Thank you for supporting ZFS on FreeBSD!

-- 
Best Regards,
Luke Marsden
CTO, Hybrid Logic Ltd.

Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting


From owner-freebsd-fs@FreeBSD.ORG  Mon Aug 29 20:04:06 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 951C41065670
	for <freebsd-fs@freebsd.org>; Mon, 29 Aug 2011 20:04:06 +0000 (UTC)
	(envelope-from luke@digital-crocus.com)
Received: from mail.digital-crocus.com (node2.digital-crocus.com
	[91.209.244.128])
	by mx1.freebsd.org (Postfix) with ESMTP id 436DC8FC0A
	for <freebsd-fs@freebsd.org>; Mon, 29 Aug 2011 20:04:05 +0000 (UTC)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=dkselector;
	d=hybrid-logic.co.uk; 
	h=Received:Received:Subject:From:To:Cc:In-Reply-To:References:Content-Type:Organization:Date:Message-ID:Mime-Version:X-Mailer:Content-Transfer-Encoding:X-Spam-Score:X-Digital-Crocus-Maillimit:X-Authenticated-Sender:X-Complaints:X-Admin:X-Abuse;
	b=oLQW1BRvxgQB3q6snaqh2w69KcdbVZJxrppJiWsvSwfPJs/s45W0AWEKIjTmdRU9H6Bveshmccs85WQj4Peos2IW8N/fD27+Fh0yzfg7wwV8a5Snrlnl15DBbb6F+/Ot;
Received: from luke by mail.digital-crocus.com with local (Exim 4.69 (FreeBSD))
	(envelope-from <luke@digital-crocus.com>) id 1Qy838-000LL6-0k
	for freebsd-fs@freebsd.org; Mon, 29 Aug 2011 21:03:14 +0100
Received: from 127cr.net ([78.105.122.99] helo=[192.168.1.23])
	by mail.digital-crocus.com with esmtpa (Exim 4.69 (FreeBSD))
	(envelope-from <luke@hybrid-logic.co.uk>)
	id 1Qy837-000LKu-Ko; Mon, 29 Aug 2011 21:03:13 +0100
From: Luke Marsden <luke@hybrid-logic.co.uk>
To: freebsd-fs@freebsd.org
In-Reply-To: <CAFqOu6gHvwxiOkFZ0Enh3VRHcs3aD=gH4u_6=XuhfYXg5NnkpQ@mail.gmail.com>
References: <1314646728.7898.44.camel@pow>
	<CAFqOu6gHvwxiOkFZ0Enh3VRHcs3aD=gH4u_6=XuhfYXg5NnkpQ@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"
Organization: Hybrid Logic
Date: Mon, 29 Aug 2011 21:04:01 +0100
Message-ID: <1314648241.7898.51.camel@pow>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.2 
Content-Transfer-Encoding: 7bit
X-Spam-Score: -1.0
X-Digital-Crocus-Maillimit: done
X-Authenticated-Sender: luke
X-Complaints: abuse@digital-crocus.com
X-Admin: admin@digital-crocus.com
X-Abuse: abuse@digital-crocus.com (Please include full headers in abuse
	reports)
Cc: tech@hybrid-logic.co.uk
Subject: Re: ZFS hang in production on 8.2-RELEASE
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 29 Aug 2011 20:04:06 -0000

> On Mon, Aug 29, 2011 at 12:38 PM, Luke Marsden
> <luke-lists@hybrid-logic.co.uk> wrote:
> > Hi all,
> >
> > I've just noticed a "partial" ZFS deadlock in production on 8.2-RELEASE.
> >
> > FreeBSD XXX 8.2-RELEASE FreeBSD 8.2-RELEASE #0 r219081M: Wed Mar  2
> > 08:29:52 CET 2011     root@www4:/usr/obj/usr/src/sys/GENERIC  amd64
> >
> > There are 9 'zfs rename' processes and 1 'zfs umount -f' process hung.
> > Here is the procstat for the 'zfs umount -f':
> >
> > 13451 104337 zfs              -                mi_switch+0x176
> > sleepq_wait+0x42 _sleep+0x317 zfsvfs_teardown+0x269 zfs_umount+0x1c4
> > dounmount+0x32a unmount+0x38b syscallenter+0x1e5 syscall+0x4b
> > Xfast_syscall+0xe2
> >
> > And the 'zfs rename's all look the same:
> >
> > 20361 101049 zfs              -                mi_switch+0x176
> > sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV
> > +0x46 _vn_lock+0x47 lookup+0x6e1 namei+0x53a kern_rmdirat+0xa4
> > syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2
> >
> > An 'ls' on a directory which contains most of the system's ZFS
> > mount-points (/hcfs) also hangs:
> >
> > 30073 101466 gnuls            -                mi_switch+0x176
> > sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV
> > +0x46 _vn_lock+0x47 zfs_root+0x85 lookup+0x9b8 namei+0x53a vn_open_cred
> > +0x3ac kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall
> > +0xe2
> >
> > If I truss the 'ls' it hangs on the stat syscall:
> > stat("/hcfs",{ mode=drwxr-xr-x ,inode=3,size=2012,blksize=16384 }) = 0
> > (0x0)
> >
> > There is also a 'find -s / ! ( -fstype zfs ) -prune -or -path /tmp
> > -prune -or -path /usr/tmp -prune -or -path /var/tmp -prune -or
> > -path /var/db/portsnap -prune -or -print' running which is also hung:
> >
> >  2650 101674 find             -                mi_switch+0x176
> > sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV
> > +0x46 _vn_lock+0x47 zfs_root+0x85 lookup+0x9b8 namei+0x53a vn_open_cred
> > +0x3ac kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall
> > +0xe2
> >
> > However I/O to the presently mounted filesystems continues to work (even
> > on parts of filesystems which are unlikely to be cached), and 'zfs list'
> > showing all the filesystems (3,500 filesystems with ~100 snapshots per
> > filesystem) also works.
> >
> > Any activity on the structure of the ZFS hierarchy *under the hcfs
> > filesystem* hangs, such as a 'zfs create hpool/hcfs/test':
> >
> > 70868 101874 zfs              -                mi_switch+0x176
> > sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV
> > +0x46 _vn_lock+0x47 lookup+0x6e1 namei+0x53a kern_mkdirat+0xce
> > syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2
> >
> > BUT "zfs create hpool/system/opt/hello" (a ZFS filesystem in the same
> > pool, but not rooted on hpool/hcfs) does not hang, and succeeds
> > normally.
> >
> > procstat -kk on the zfskern process gives:
> >
> >  PID    TID COMM             TDNAME
> > KSTACK
> >    5 100045 zfskern          arc_reclaim_thre mi_switch+0x176
> > sleepq_timedwait+0x42 _cv_timedwait+0x134 arc_reclaim_thread+0x2a9
> > fork_exit+0x118 fork_trampoline+0xe
> >    5 100046 zfskern          l2arc_feed_threa mi_switch+0x176
> > sleepq_timedwait+0x42 _cv_timedwait+0x134 l2arc_feed_thread+0x1ce
> > fork_exit+0x118 fork_trampoline+0xe
> >    5 100098 zfskern          txg_thread_enter mi_switch+0x176
> > sleepq_wait+0x42 _cv_wait+0x129 txg_thread_wait+0x79 txg_quiesce_thread
> > +0xb5 fork_exit+0x118 fork_trampoline+0xe
> >    5 100099 zfskern          txg_thread_enter mi_switch+0x176
> > sleepq_timedwait+0x42 _cv_timedwait+0x134 txg_thread_wait+0x3c
> > txg_sync_thread+0x365 fork_exit+0x118 fork_trampoline+0xe
> >
> > Any ideas on what might be causing this?
> 
> It sounds like the bug Martin Matuska has recently fixed in FreeBSD
> and reported upstream to Illumos:
> https://www.illumos.org/issues/1313
> 
> The fix has been MFC'ed to 8-STABLE r224647 on Aug 4th.

Thank you for such a quick response, but I'm not sure it's the right
solution.  The uptime on this server is 25 days, which is less than 28.
Also, I would expect the bug you described to cause issues globally
for the zpool, but the hangs are localised to one filesystem.
Finally, the bug report refers to slowing down ZFS writes, but this is a
hang (as if caused by a lock that ought to have been freed) on reads.

hybrid@ns382210:~$ sudo sysctl -a|grep tick
kern.clockrate: { hz = 1000, tick = 1000, profhz = 2000, stathz = 133 }
kern.timecounter.tick: 1
debug.tickdelay: 2
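
Uptime thresholds for tick-related bugs depend on hz and on the width of
the counter involved. This thread does not say which counter the illumos
issue depends on, nor where the 28-day figure comes from, so the Python
sketch below is only arithmetic under a stated assumption: a signed
32-bit counter that advances hz times per second. The function name is
hypothetical.

```python
SECONDS_PER_DAY = 86400


def days_until_wrap(hz, bits=32, signed=True):
    """Days of uptime before a tick counter of the given width wraps."""
    max_ticks = 2 ** (bits - 1 if signed else bits)
    return max_ticks / hz / SECONDS_PER_DAY


for hz in (100, 1000):
    print(hz, round(days_until_wrap(hz), 2))
# 100 248.55
# 1000 24.86
```

Under that assumption, hz = 1000 (as reported by kern.clockrate above)
would put the wrap at just under 25 days. Whether that applies to this
particular bug would have to be checked against the actual fix.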

By the way, the server has 16GB of RAM with 5.8GB free, so I don't
suspect memory pressure is causing the issue.  The zpool is 3.19T striped
over two disks.

-- 
Best Regards,
Luke Marsden
CTO, Hybrid Logic Ltd.

Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting

Mobile: +1-415-449-1165 (US) / +447791750420 (UK)


From owner-freebsd-fs@FreeBSD.ORG  Mon Aug 29 20:17:17 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 48D08106566C
	for <freebsd-fs@freebsd.org>; Mon, 29 Aug 2011 20:17:17 +0000 (UTC)
	(envelope-from geo.liaskos@gmail.com)
Received: from mail-qw0-f54.google.com (mail-qw0-f54.google.com
	[209.85.216.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 0D2308FC08
	for <freebsd-fs@freebsd.org>; Mon, 29 Aug 2011 20:17:16 +0000 (UTC)
Received: by qwc9 with SMTP id 9so4396676qwc.13
	for <freebsd-fs@freebsd.org>; Mon, 29 Aug 2011 13:17:16 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:date:message-id:subject:from:to:content-type;
	bh=gL7XjojeDOJDxQwS1rIseqH57jOLDsOh6415zLCg3ZY=;
	b=sjowSAYN5tOFY+LFet/vxV+KVoy3Fb3m8zTKSH3pYYhK043T51ZhRCeYB5OdzniM80
	mROtk38iPewXcRjmvTRZf1GkaTzOZzCCD+ch18J5uMkH7qUuYvKDBwDffb+BOHXOBP99
	fcSX0cwHyuduUbaWtNEY2U+fxdEu1YviHfEsE=
MIME-Version: 1.0
Received: by 10.229.64.80 with SMTP id d16mr6060950qci.169.1314647409710; Mon,
	29 Aug 2011 12:50:09 -0700 (PDT)
Received: by 10.229.89.138 with HTTP; Mon, 29 Aug 2011 12:50:09 -0700 (PDT)
Date: Mon, 29 Aug 2011 22:50:09 +0300
Message-ID: <CANcjpOAGDCnBrHLpYWUm2ydEzdfmKDa4DFCO77=mP-Oznoxwbg@mail.gmail.com>
From: George Liaskos <geo.liaskos@gmail.com>
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=UTF-8
Subject: NFSv4: After upgrade to 9 users can no longer list files.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 29 Aug 2011 20:17:17 -0000

Hello,

I upgraded my home server this past weekend from 8.2-STABLE to 9.
After the upgrade, users can no longer list the files / directories of a
mount from a client machine.

I have been using NFSv4 exports for almost a year now and never had an
issue. I did not change the configuration during / after the upgrade. My
kernel config was already using NFSD.

Some server config info; i am exporting ZFS file systems:

[/etc/exports]
V4: /usr/local/data -sec=sys -network 192.168.0.0/24
/usr/local/data/downloads     -network 192.168.0.0/24 -maproot=root
/usr/local/data/software        -network 192.168.0.0/24 -maproot=root

[/etc/rc.conf]
rpcbind_enable="YES"
nfs_server_enable="YES"
nfsv4_server_enable="YES"
nfsuserd_enable="YES"
mountd_flags="-r -l"
mountd_enable="YES"

I am able to mount from the clients. Root can list everything, but
other users can't, either from the console or from a file browser. I can
still blindly cat / touch files; everything works except listing. The same
goes for local mounts on the server.
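
As background, and not as a diagnosis: with classic Unix permissions,
listing a directory needs the read bit on it, while opening a file whose
name is already known needs only the search (execute) bit. A directory
that appears to non-root users with search but not read permission would
behave as described above. The Python sketch below shows only that rule.
It ignores NFSv4 ACLs, ZFS ACLs and any user mapping done by nfsuserd,
any of which may be what actually changed, and the function name is
hypothetical.

```python
def dir_access(mode, who="other"):
    """Report what one class of user may do with a directory of this mode.

    who is "owner", "group" or "other". Only the classic permission bits
    are considered: no ACLs and no root override.
    """
    shift = {"owner": 6, "group": 3, "other": 0}[who]
    bits = (mode >> shift) & 0o7
    return {
        "list": bool(bits & 0o4),       # read bit: readdir(), i.e. ls
        "traverse": bool(bits & 0o1),   # search bit: open a known name
    }


print(dir_access(0o711))    # {'list': False, 'traverse': True}
print(dir_access(0o755))    # {'list': True, 'traverse': True}
```

Comparing the mode and ownership that ls -ld shows for the directory on
the server with what a client shows for the same directory is one way to
see whether this rule is relevant here.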

Thank you in advance for your help.

Regards,
George

From owner-freebsd-fs@FreeBSD.ORG  Mon Aug 29 20:25:43 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 48B381065673
	for <freebsd-fs@freebsd.org>; Mon, 29 Aug 2011 20:25:43 +0000 (UTC)
	(envelope-from artemb@gmail.com)
Received: from mail-gy0-f182.google.com (mail-gy0-f182.google.com
	[209.85.160.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 0B6878FC0C
	for <freebsd-fs@freebsd.org>; Mon, 29 Aug 2011 20:25:42 +0000 (UTC)
Received: by gyd10 with SMTP id 10so6147247gyd.13
	for <freebsd-fs@freebsd.org>; Mon, 29 Aug 2011 13:25:42 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:sender:in-reply-to:references:date
	:x-google-sender-auth:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=xY8wnhmw5VA5uBNBWZo4kd8BrqQdiyWP5JlnV3ndPRk=;
	b=lEvXLEclnSZXsJ820rIK76gZ+WBdd6Fx1v4nfC4e9QvhiVSS3LmYnfsQ1Z6PphnlCu
	dfIong+gk28X45yV7gbA1EbWk/6jSQrDYB1BqMAKpDIfjk3ssfnA9yx+To5NTQerER3F
	iVnMTsqPXzwNxY4QIFn3Wp92e6MI5MlxwQBng=
MIME-Version: 1.0
Received: by 10.236.173.131 with SMTP id v3mr27597149yhl.112.1314647726058;
	Mon, 29 Aug 2011 12:55:26 -0700 (PDT)
Sender: artemb@gmail.com
Received: by 10.236.102.147 with HTTP; Mon, 29 Aug 2011 12:55:26 -0700 (PDT)
In-Reply-To: <1314646728.7898.44.camel@pow>
References: <1314646728.7898.44.camel@pow>
Date: Mon, 29 Aug 2011 12:55:26 -0700
X-Google-Sender-Auth: NRZVv_XlzhZS7ZFKKebPwgBY6o0
Message-ID: <CAFqOu6gHvwxiOkFZ0Enh3VRHcs3aD=gH4u_6=XuhfYXg5NnkpQ@mail.gmail.com>
From: Artem Belevich <art@freebsd.org>
To: luke@hybrid-logic.co.uk
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org, tech@hybrid-logic.co.uk
Subject: Re: ZFS hang in production on 8.2-RELEASE
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 29 Aug 2011 20:25:43 -0000

On Mon, Aug 29, 2011 at 12:38 PM, Luke Marsden
<luke-lists@hybrid-logic.co.uk> wrote:
> Hi all,
>
> I've just noticed a "partial" ZFS deadlock in production on 8.2-RELEASE.
>
> FreeBSD XXX 8.2-RELEASE FreeBSD 8.2-RELEASE #0 r219081M: Wed Mar  2
> 08:29:52 CET 2011     root@www4:/usr/obj/usr/src/sys/GENERIC  amd64
>
> There are 9 'zfs rename' processes and 1 'zfs umount -f' process hung.
> Here is the procstat for the 'zfs umount -f':
>
> 13451 104337 zfs              -                mi_switch+0x176
> sleepq_wait+0x42 _sleep+0x317 zfsvfs_teardown+0x269 zfs_umount+0x1c4
> dounmount+0x32a unmount+0x38b syscallenter+0x1e5 syscall+0x4b
> Xfast_syscall+0xe2
>
> And the 'zfs rename's all look the same:
>
> 20361 101049 zfs              -                mi_switch+0x176
> sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV
> +0x46 _vn_lock+0x47 lookup+0x6e1 namei+0x53a kern_rmdirat+0xa4
> syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2
>
> An 'ls' on a directory which contains most of the system's ZFS
> mount-points (/hcfs) also hangs:
>
> 30073 101466 gnuls            -                mi_switch+0x176
> sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV
> +0x46 _vn_lock+0x47 zfs_root+0x85 lookup+0x9b8 namei+0x53a vn_open_cred
> +0x3ac kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall
> +0xe2
>
> If I truss the 'ls' it hangs on the stat syscall:
> stat("/hcfs",{ mode=drwxr-xr-x ,inode=3,size=2012,blksize=16384 }) = 0
> (0x0)
>
> There is also a 'find -s / ! ( -fstype zfs ) -prune -or -path /tmp
> -prune -or -path /usr/tmp -prune -or -path /var/tmp -prune -or
> -path /var/db/portsnap -prune -or -print' running which is also hung:
>
>  2650 101674 find             -                mi_switch+0x176
> sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV
> +0x46 _vn_lock+0x47 zfs_root+0x85 lookup+0x9b8 namei+0x53a vn_open_cred
> +0x3ac kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall
> +0xe2
>
> However I/O to the presently mounted filesystems continues to work (even
> on parts of filesystems which are unlikely to be cached), and 'zfs list'
> showing all the filesystems (3,500 filesystems with ~100 snapshots per
> filesystem) also works.
>
> Any activity on the structure of the ZFS hierarchy *under the hcfs
> filesystem* hangs, such as a 'zfs create hpool/hcfs/test':
>
> 70868 101874 zfs              -                mi_switch+0x176
> sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV
> +0x46 _vn_lock+0x47 lookup+0x6e1 namei+0x53a kern_mkdirat+0xce
> syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2
>
> BUT "zfs create hpool/system/opt/hello" (a ZFS filesystem in the same
> pool, but not rooted on hpool/hcfs) does not hang, and succeeds
> normally.
>
> procstat -kk on the zfskern process gives:
>
>  PID    TID COMM             TDNAME
> KSTACK
>    5 100045 zfskern          arc_reclaim_thre mi_switch+0x176
> sleepq_timedwait+0x42 _cv_timedwait+0x134 arc_reclaim_thread+0x2a9
> fork_exit+0x118 fork_trampoline+0xe
>    5 100046 zfskern          l2arc_feed_threa mi_switch+0x176
> sleepq_timedwait+0x42 _cv_timedwait+0x134 l2arc_feed_thread+0x1ce
> fork_exit+0x118 fork_trampoline+0xe
>    5 100098 zfskern          txg_thread_enter mi_switch+0x176
> sleepq_wait+0x42 _cv_wait+0x129 txg_thread_wait+0x79 txg_quiesce_thread
> +0xb5 fork_exit+0x118 fork_trampoline+0xe
>    5 100099 zfskern          txg_thread_enter mi_switch+0x176
> sleepq_timedwait+0x42 _cv_timedwait+0x134 txg_thread_wait+0x3c
> txg_sync_thread+0x365 fork_exit+0x118 fork_trampoline+0xe
>
> Any ideas on what might be causing this?

It sounds like the bug Martin Matuska has recently fixed in FreeBSD
and reported upstream to Illumos:
https://www.illumos.org/issues/1313

The fix has been MFC'ed to 8-STABLE r224647 on Aug 4th.

--Artem

>
> Thank you for supporting ZFS on FreeBSD!
>
> --
> Best Regards,
> Luke Marsden
> CTO, Hybrid Logic Ltd.
>
> Web: http://www.hybrid-cluster.com/
> Hybrid Web Cluster - cloud web hosting
>
>
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>

From owner-freebsd-fs@FreeBSD.ORG  Mon Aug 29 20:28:25 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E8F411065678
	for <freebsd-fs@freebsd.org>; Mon, 29 Aug 2011 20:28:25 +0000 (UTC)
	(envelope-from ee@athyriogames.com)
Received: from madonna.sslcatacombnetworking.com
	(madonna.sslcatacombnetworking.com [174.133.19.130])
	by mx1.freebsd.org (Postfix) with ESMTP id BA4318FC18
	for <freebsd-fs@freebsd.org>; Mon, 29 Aug 2011 20:28:25 +0000 (UTC)
Received: from c-98-206-215-156.hsd1.in.comcast.net ([98.206.215.156]
	helo=laptopv)
	by madonna.sslcatacombnetworking.com with esmtpa (Exim 4.69)
	(envelope-from <ee@athyriogames.com>) id 1Qy797-0000Yz-A0
	for freebsd-fs@freebsd.org; Mon, 29 Aug 2011 14:05:21 -0500
From: "Engineering" <ee@athyriogames.com>
To: <freebsd-fs@freebsd.org>
Date: Mon, 29 Aug 2011 14:15:08 -0500
Message-ID: <01c801cc667f$f99eb7b0$ecdc2710$@com>
MIME-Version: 1.0
X-Mailer: Microsoft Office Outlook 12.0
Thread-Index: Acxmf/bkxg/WNgMwTaWJDIhaLpbyEA==
Content-Language: en-us
X-AntiAbuse: This header was added to track abuse,
	please include it with any abuse report
X-AntiAbuse: Primary Hostname - madonna.sslcatacombnetworking.com
X-AntiAbuse: Original Domain - freebsd.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - athyriogames.com
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Subject: Read-only disk problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 29 Aug 2011 20:28:26 -0000

Hello all.

 
Please let me know if this is the wrong place to ask. I am working on an
embedded system using FreeBSD 7.2, booting and running off of flash memory.
In order to not burn out the flash, I use the 'diskless' scripts and mount
the flash read-only. I have used this configuration successfully in the
past.

 
I've recently added a utility to check for disk corruption, basically
checksumming the / and /usr partitions. Since they are both read-only, I
thought this would work. What I have discovered is that something in the
partition is changing between boots.

 
I dd'd the flash over a couple of boots, and compared the binaries to see
what was changing. It is a small amount of data, spread across the disk, in
an interval that looks very similar to the interval of the 'superblocks'.
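
For anyone wanting to repeat this kind of comparison, here is a minimal
sketch in Python. It assumes two raw images taken with dd on successive
boots; the file names boot1.img and boot2.img and the 512-byte block size
are placeholders for illustration, not details from the setup above. It
reports the byte offsets of the blocks that differ and the spacing between
them, so a regular interval is easy to spot.

```python
from typing import List


def changed_blocks(path_a: str, path_b: str, block_size: int = 512) -> List[int]:
    """Return the byte offsets of blocks that differ between two image files."""
    offsets = []
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        offset = 0
        while True:
            block_a = a.read(block_size)
            block_b = b.read(block_size)
            # Stop once both images are exhausted.
            if not block_a and not block_b:
                break
            if block_a != block_b:
                offsets.append(offset)
            offset += block_size
    return offsets


def gaps(offsets: List[int]) -> List[int]:
    """Return the distances between consecutive changed offsets."""
    return [later - earlier for earlier, later in zip(offsets, offsets[1:])]


# Example use (hypothetical file names):
#   offsets = changed_blocks("boot1.img", "boot2.img")
#   print(offsets)
#   print(gaps(offsets))
```

If the gaps come out roughly constant, that would be consistent with
per-group metadata being rewritten rather than file data.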

 
Is there any data that is written to the disk at boot or mount time, and if
so, is there a way to prevent it?

 
Thanks

Sam


From owner-freebsd-fs@FreeBSD.ORG  Mon Aug 29 20:54:10 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 777F61065673;
	Mon, 29 Aug 2011 20:54:10 +0000 (UTC) (envelope-from mm@FreeBSD.org)
Received: from mail.vx.sk (mail.vx.sk [IPv6:2a01:4f8:100:1043::3])
	by mx1.freebsd.org (Postfix) with ESMTP id EE04F8FC0A;
	Mon, 29 Aug 2011 20:54:09 +0000 (UTC)
Received: from core.vx.sk (localhost [127.0.0.1])
	by mail.vx.sk (Postfix) with ESMTP id 4D939190586;
	Mon, 29 Aug 2011 22:54:09 +0200 (CEST)
X-Virus-Scanned: amavisd-new at mail.vx.sk
Received: from mail.vx.sk ([127.0.0.1])
	by core.vx.sk (mail.vx.sk [127.0.0.1]) (amavisd-new, port 10024)
	with LMTP id CqopX2gWjr6A; Mon, 29 Aug 2011 22:54:06 +0200 (CEST)
Received: from [10.9.8.1] (188-167-78-15.dynamic.chello.sk [188.167.78.15])
	by mail.vx.sk (Postfix) with ESMTPSA id 99B71190578;
	Mon, 29 Aug 2011 22:54:06 +0200 (CEST)
Message-ID: <4E5BFC6F.5080507@FreeBSD.org>
Date: Mon, 29 Aug 2011 22:54:07 +0200
From: Martin Matuska <mm@FreeBSD.org>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
	rv:6.0) Gecko/20110812 Thunderbird/6.0
MIME-Version: 1.0
To: Artem Belevich <art@freebsd.org>
References: <1314646728.7898.44.camel@pow>
	<CAFqOu6gHvwxiOkFZ0Enh3VRHcs3aD=gH4u_6=XuhfYXg5NnkpQ@mail.gmail.com>
In-Reply-To: <CAFqOu6gHvwxiOkFZ0Enh3VRHcs3aD=gH4u_6=XuhfYXg5NnkpQ@mail.gmail.com>
X-Enigmail-Version: 1.3.1
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org, tech@hybrid-logic.co.uk, luke@hybrid-logic.co.uk
Subject: Re: ZFS hang in production on 8.2-RELEASE
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 29 Aug 2011 20:54:10 -0000

On 29. 8. 2011 21:55, Artem Belevich wrote:
> On Mon, Aug 29, 2011 at 12:38 PM, Luke Marsden
> <luke-lists@hybrid-logic.co.uk> wrote:
>> Hi all,
>>
>> I've just noticed a "partial" ZFS deadlock in production on 8.2-RELEASE.
>>
>> FreeBSD XXX 8.2-RELEASE FreeBSD 8.2-RELEASE #0 r219081M: Wed Mar  2
>> 08:29:52 CET 2011     root@www4:/usr/obj/usr/src/sys/GENERIC  amd64
>>
>> There are 9 'zfs rename' processes and 1 'zfs umount -f' processes hung.
>> Here is the procstat for the 'zfs umount -f':
>>
>> 13451 104337 zfs              -                mi_switch+0x176
>> sleepq_wait+0x42 _sleep+0x317 zfsvfs_teardown+0x269 zfs_umount+0x1c4
>> dounmount+0x32a unmount+0x38b syscallenter+0x1e5 syscall+0x4b
>> Xfast_syscall+0xe2
>>
>> And the 'zfs rename's all look the same:
>>
>> 20361 101049 zfs              -                mi_switch+0x176
>> sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV
>> +0x46 _vn_lock+0x47 lookup+0x6e1 namei+0x53a kern_rmdirat+0xa4
>> syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2
>>
>> An 'ls' on a directory which contains most of the system's ZFS
>> mount-points (/hcfs) also hangs:
>>
>> 30073 101466 gnuls            -                mi_switch+0x176
>> sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV
>> +0x46 _vn_lock+0x47 zfs_root+0x85 lookup+0x9b8 namei+0x53a vn_open_cred
>> +0x3ac kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall
>> +0xe2
>>
>> If I truss the 'ls' it hangs on the stat syscall:
>> stat("/hcfs",{ mode=drwxr-xr-x ,inode=3,size=2012,blksize=16384 }) = 0
>> (0x0)
>>
>> There is also a 'find -s / ! ( -fstype zfs ) -prune -or -path /tmp
>> -prune -or -path /usr/tmp -prune -or -path /var/tmp -prune -or
>> -path /var/db/portsnap -prune -or -print' running which is also hung:
>>
>>  2650 101674 find             -                mi_switch+0x176
>> sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV
>> +0x46 _vn_lock+0x47 zfs_root+0x85 lookup+0x9b8 namei+0x53a vn_open_cred
>> +0x3ac kern_openat+0x181 syscallenter+0x1e5 syscall+0x4b Xfast_syscall
>> +0xe2
>>
>> However I/O to the presently mounted filesystems continues to work (even
>> on parts of filesystems which are unlikely to be cached), and 'zfs list'
>> showing all the filesystems (3,500 filesystems with ~100 snapshots per
>> filesystem) also works.
>>
>> Any activity on the structure of the ZFS hierarchy *under the hcfs
>> filesystem* crashes, such as a 'zfs create hpool/hcfs/test':
>>
>> 70868 101874 zfs              -                mi_switch+0x176
>> sleepq_wait+0x42 __lockmgr_args+0x743 vop_stdlock+0x39 VOP_LOCK1_APV
>> +0x46 _vn_lock+0x47 lookup+0x6e1 namei+0x53a kern_mkdirat+0xce
>> syscallenter+0x1e5 syscall+0x4b Xfast_syscall+0xe2
>>
>> BUT "zfs create hpool/system/opt/hello" (a ZFS filesystem in the same
>> pool, but not rooted on hpool/hcfs) does not hang, and succeeds
>> normally.
>>
>> procstat -kk on the zfskern process gives:
>>
>>  PID    TID COMM             TDNAME
>> KSTACK
>>    5 100045 zfskern          arc_reclaim_thre mi_switch+0x176
>> sleepq_timedwait+0x42 _cv_timedwait+0x134 arc_reclaim_thread+0x2a9
>> fork_exit+0x118 fork_trampoline+0xe
>>    5 100046 zfskern          l2arc_feed_threa mi_switch+0x176
>> sleepq_timedwait+0x42 _cv_timedwait+0x134 l2arc_feed_thread+0x1ce
>> fork_exit+0x118 fork_trampoline+0xe
>>    5 100098 zfskern          txg_thread_enter mi_switch+0x176
>> sleepq_wait+0x42 _cv_wait+0x129 txg_thread_wait+0x79 txg_quiesce_thread
>> +0xb5 fork_exit+0x118 fork_trampoline+0xe
>>    5 100099 zfskern          txg_thread_enter mi_switch+0x176
>> sleepq_timedwait+0x42 _cv_timedwait+0x134 txg_thread_wait+0x3c
>> txg_sync_thread+0x365 fork_exit+0x118 fork_trampoline+0xe
>>
>> Any ideas on what might be causing this?
> It sounds like the bug Martin Matuska has recently fixed in FreeBSD
> and reported upstream to Illumos:
> https://www.illumos.org/issues/1313
>
> The fix has been MFC'ed to 8-STABLE r224647 on Aug 4th.
>
> --Artem
No, I think this is more likely fixed by pjd's bugfix in r224791 (MFC'ed
to stable/8 as r225100).

The corresponding patch is:
http://people.freebsd.org/~pjd/patches/zfsdev_state_lock.patch

-- 
Martin Matuska
FreeBSD committer
http://blog.vx.sk


From owner-freebsd-fs@FreeBSD.ORG  Mon Aug 29 22:02:34 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A54C0106564A
	for <freebsd-fs@freebsd.org>; Mon, 29 Aug 2011 22:02:34 +0000 (UTC)
	(envelope-from luke@digital-crocus.com)
Received: from mail.digital-crocus.com (node2.digital-crocus.com
	[91.209.244.128])
	by mx1.freebsd.org (Postfix) with ESMTP id 5CA2E8FC16
	for <freebsd-fs@freebsd.org>; Mon, 29 Aug 2011 22:02:33 +0000 (UTC)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=dkselector;
	d=hybrid-logic.co.uk; 
	h=Received:Received:Subject:From:To:Cc:In-Reply-To:References:Content-Type:Organization:Date:Message-ID:Mime-Version:X-Mailer:Content-Transfer-Encoding:X-Spam-Score:X-Digital-Crocus-Maillimit:X-Authenticated-Sender:X-Complaints:X-Admin:X-Abuse;
	b=IA1dEe16vsCBPNPFuyUogNAa+3uMzCWdxpHOxNfWzqA0rJYqr+xc0G5QnpKWV8ivCzcJIoOSCgpF2dbzP591ar2e9en6cefnedej11ABZ/j79fpXR3Vi1y5NcarLrSOH;
Received: from luke by mail.digital-crocus.com with local (Exim 4.69 (FreeBSD))
	(envelope-from <luke@digital-crocus.com>) id 1Qy9tm-000AMo-63
	for freebsd-fs@freebsd.org; Mon, 29 Aug 2011 23:01:42 +0100
Received: from 127cr.net ([78.105.122.99] helo=[192.168.1.23])
	by mail.digital-crocus.com with esmtpa (Exim 4.69 (FreeBSD))
	(envelope-from <luke@hybrid-logic.co.uk>)
	id 1Qy9tl-000AMW-Oz; Mon, 29 Aug 2011 23:01:42 +0100
From: Luke Marsden <luke@hybrid-logic.co.uk>
To: Martin Matuska <mm@FreeBSD.org>
In-Reply-To: <4E5BFC6F.5080507@FreeBSD.org>
References: <1314646728.7898.44.camel@pow>
	<CAFqOu6gHvwxiOkFZ0Enh3VRHcs3aD=gH4u_6=XuhfYXg5NnkpQ@mail.gmail.com>
	<4E5BFC6F.5080507@FreeBSD.org>
Content-Type: text/plain; charset="UTF-8"
Organization: Hybrid Logic
Date: Mon, 29 Aug 2011 23:02:29 +0100
Message-ID: <1314655349.7898.53.camel@pow>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.2 
Content-Transfer-Encoding: 7bit
X-Spam-Score: -1.0
X-Digital-Crocus-Maillimit: done
X-Authenticated-Sender: luke
X-Complaints: abuse@digital-crocus.com
X-Admin: admin@digital-crocus.com
X-Abuse: abuse@digital-crocus.com (Please include full headers in abuse
	reports)
Cc: freebsd-fs@freebsd.org, tech@hybrid-logic.co.uk
Subject: Re: ZFS hang in production on 8.2-RELEASE
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 29 Aug 2011 22:02:34 -0000

On Mon, 2011-08-29 at 22:54 +0200, Martin Matuska wrote:
> >> procstat -kk on the zfskern process gives:
> >>
> >>  PID    TID COMM             TDNAME
> >> KSTACK
> >>    5 100045 zfskern          arc_reclaim_thre mi_switch+0x176
> >> sleepq_timedwait+0x42 _cv_timedwait+0x134 arc_reclaim_thread+0x2a9
> >> fork_exit+0x118 fork_trampoline+0xe
> >>    5 100046 zfskern          l2arc_feed_threa mi_switch+0x176
> >> sleepq_timedwait+0x42 _cv_timedwait+0x134 l2arc_feed_thread+0x1ce
> >> fork_exit+0x118 fork_trampoline+0xe
> >>    5 100098 zfskern          txg_thread_enter mi_switch+0x176
> >> sleepq_wait+0x42 _cv_wait+0x129 txg_thread_wait+0x79 txg_quiesce_thread
> >> +0xb5 fork_exit+0x118 fork_trampoline+0xe
> >>    5 100099 zfskern          txg_thread_enter mi_switch+0x176
> >> sleepq_timedwait+0x42 _cv_timedwait+0x134 txg_thread_wait+0x3c
> >> txg_sync_thread+0x365 fork_exit+0x118 fork_trampoline+0xe
> >>
> >> Any ideas on what might be causing this?
> > It sounds like the bug Martin Matuska has recently fixed in FreeBSD
> > and reported upstream to Illumos:
> > https://www.illumos.org/issues/1313
> >
> > The fix has been MFC'ed to 8-STABLE r224647 on Aug 4th.
> >
> > --Artem
> No, I think this is more likely fixed by pjd's bugfix in r224791 (MFC'ed
> to stable/8 as r225100).
> 
> The corresponding patch is:
> http://people.freebsd.org/~pjd/patches/zfsdev_state_lock.patch
> 

Great, thanks!  Will this patch apply to ZFS v15?  We can't upgrade to
v28 yet.

-- 
Best Regards,
Luke Marsden
CTO, Hybrid Logic Ltd.

Web: http://www.hybrid-cluster.com/
Hybrid Web Cluster - cloud web hosting

Mobile: +1-415-449-1165 (US) / +447791750420 (UK)


From owner-freebsd-fs@FreeBSD.ORG  Tue Aug 30 01:20:48 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9356C106566B;
	Tue, 30 Aug 2011 01:20:48 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca
	[131.104.91.36])
	by mx1.freebsd.org (Postfix) with ESMTP id 27EEB8FC0C;
	Tue, 30 Aug 2011 01:20:48 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Ap8EAHI6XE6DaFvO/2dsb2JhbABChEykMYFAAQEBAQMBAQEgKyALGw4KAgINGQIpAQkmBggHBAEcBIdVpw+RdoEshA+BEQSRDoIRkSA
X-IronPort-AV: E=Sophos;i="4.68,299,1312171200"; d="scan'208";a="132621741"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-annu-pri.mail.uoguelph.ca with ESMTP; 29 Aug 2011 21:20:47 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 7EFC4B3F06;
	Mon, 29 Aug 2011 21:20:47 -0400 (EDT)
Date: Mon, 29 Aug 2011 21:20:47 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: George Liaskos <geo.liaskos@gmail.com>
Message-ID: <1173512509.517816.1314667247490.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <CANcjpOAGDCnBrHLpYWUm2ydEzdfmKDa4DFCO77=mP-Oznoxwbg@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.203]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek <pjd@freebsd.org>
Subject: Re: NFSv4: After upgrade to 9 users can no longer list files.
 (sounds like a ZFS issue?)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 30 Aug 2011 01:20:48 -0000

George Liaskos wrote:
> Hello,
> 
> I upgraded my home server the past weekend from 8.2-STABLE to 9,
> after the upgrade users can no longer list the files / directories of
> a mount
> from a client machine.
> 
> I am using nfsv4 exports for almost a year now, never had an issue, i
> did not
> change the configuration during / after the upgrade. My kernel config
> was
> using NFSD already.
> 
> Some server config info; i am exporting ZFS file systems:
> 
> [/etc/exports]
> V4: /usr/local/data -sec=sys -network 192.168.0.0/24
> /usr/local/data/downloads -network 192.168.0.0/24 -maproot=root
> /usr/local/data/software -network 192.168.0.0/24 -maproot=root
> 
> [/etc/rc.conf]
> rpcbind_enable="YES"
> nfs_server_enable="YES"
> nfsv4_server_enable="YES"
> nfsuserd_enable="YES"
> mountd_flags="-r -l"
> mountd_enable="YES"
> 
> I am able to mount from the clients, root can list everything but
> other users can't
> either from console or from a file browser. I can still blindly cat /
> touch files,
> everything works except list. The same goes with local mounts on the
> server.
> 
Well, if non-root users can't "ls" locally on the server, this sounds more
like a ZFS issue than an NFS one. (I don't see this w.r.t. NFS when exporting
a UFS volume.)

I don't know anything about ZFS. I've added a couple of the ZFS guys to the
cc list, in case they don't read posts with NFS in the subject line.

rick
> Thank you in advance for your help.
> 
> Regards,
> George

From owner-freebsd-fs@FreeBSD.ORG  Tue Aug 30 02:10:28 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9BD0E106566B;
	Tue, 30 Aug 2011 02:10:28 +0000 (UTC)
	(envelope-from linimon@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 73CAD8FC12;
	Tue, 30 Aug 2011 02:10:28 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p7U2AS1d026805;
	Tue, 30 Aug 2011 02:10:28 GMT
	(envelope-from linimon@freefall.freebsd.org)
Received: (from linimon@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p7U2ASDY026795;
	Tue, 30 Aug 2011 02:10:28 GMT (envelope-from linimon)
Date: Tue, 30 Aug 2011 02:10:28 GMT
Message-Id: <201108300210.p7U2ASDY026795@freefall.freebsd.org>
To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org
From: linimon@FreeBSD.org
Cc: 
Subject: Re: kern/160283: [zfs] [patch] 'zfs list' does abort in
	make_dataset_handle
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 30 Aug 2011 02:10:28 -0000

Old Synopsis: 'zfs list' does abort in make_dataset_handle
New Synopsis: [zfs] [patch] 'zfs list' does abort in make_dataset_handle

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Tue Aug 30 02:09:45 UTC 2011
Responsible-Changed-Why: 
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=160283

From owner-freebsd-fs@FreeBSD.ORG  Tue Aug 30 08:37:05 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 979C6106564A;
	Tue, 30 Aug 2011 08:37:05 +0000 (UTC)
	(envelope-from geo.liaskos@gmail.com)
Received: from mail-qy0-f175.google.com (mail-qy0-f175.google.com
	[209.85.216.175])
	by mx1.freebsd.org (Postfix) with ESMTP id DD8DD8FC0A;
	Tue, 30 Aug 2011 08:37:04 +0000 (UTC)
Received: by qyk4 with SMTP id 4so2454570qyk.13
	for <multiple recipients>; Tue, 30 Aug 2011 01:37:04 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	bh=v5Hdo+B71fSU0VG+l2LC1ElJC7S61EGGP08atC0Qw7g=;
	b=jKVx+7Smr2TUoVQ2uVaxsgirxLCZQCzXmxhdwU7MTIXKGLYzNz7kNYz9MSivdP2Lyb
	1wzsQUVMv+7f0umMXvGvmZebifBYYa0Y2e0ZbSTJZy85mMYDnWFrn7IFub8QC/IhENJJ
	tPdTsPcuLyPn1oXLGOD2LCgm3pJGc14Z/Khfs=
MIME-Version: 1.0
Received: by 10.224.27.68 with SMTP id h4mr2492217qac.335.1314693423932; Tue,
	30 Aug 2011 01:37:03 -0700 (PDT)
Received: by 10.229.89.138 with HTTP; Tue, 30 Aug 2011 01:37:03 -0700 (PDT)
In-Reply-To: <1173512509.517816.1314667247490.JavaMail.root@erie.cs.uoguelph.ca>
References: <CANcjpOAGDCnBrHLpYWUm2ydEzdfmKDa4DFCO77=mP-Oznoxwbg@mail.gmail.com>
	<1173512509.517816.1314667247490.JavaMail.root@erie.cs.uoguelph.ca>
Date: Tue, 30 Aug 2011 11:37:03 +0300
Message-ID: <CANcjpOAsOWRRL0BVk_dX22gOQ72KvrJL6hRRJMvMshATHq8-Tw@mail.gmail.com>
From: George Liaskos <geo.liaskos@gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Content-Type: text/plain; charset=UTF-8
Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek <pjd@freebsd.org>
Subject: Re: NFSv4: After upgrade to 9 users can no longer list files.
 (sounds like a ZFS issue?)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 30 Aug 2011 08:37:05 -0000

> Well, if non-root users can't "ls" locally on the server, this sounds more
> like a ZFS issue than an NFS one. (I don't see this w.r.t. NFS when exporting
> a UFS volume.)
>
> I don't know anything about ZFS. I've added a couple of the ZFS guys to the
> cc list, in case they don't read posts with NFS in the subject line.
>
> rick

Just to be clear, non-root users can't ls mounted exports on the server.
Using ls directly on the ZFS file system works.

I exported a UFS directory and everything works, so this is either a ZFS or
an ACL-related issue. I will set up a clean VM to see if I can reproduce this.

Thank you for your response.

Regards,
George

From owner-freebsd-fs@FreeBSD.ORG  Tue Aug 30 14:50:30 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D8D19106566B
	for <freebsd-fs@freebsd.org>; Tue, 30 Aug 2011 14:50:30 +0000 (UTC)
	(envelope-from ee@athyriogames.com)
Received: from madonna.sslcatacombnetworking.com
	(madonna.sslcatacombnetworking.com [174.133.19.130])
	by mx1.freebsd.org (Postfix) with ESMTP id B713D8FC19
	for <freebsd-fs@freebsd.org>; Tue, 30 Aug 2011 14:50:30 +0000 (UTC)
Received: from c-98-206-215-156.hsd1.in.comcast.net ([98.206.215.156]
	helo=laptopv)
	by madonna.sslcatacombnetworking.com with esmtpa (Exim 4.69)
	(envelope-from <ee@athyriogames.com>) id 1QyPTr-0004c9-Mt
	for freebsd-fs@freebsd.org; Tue, 30 Aug 2011 09:39:59 -0500
From: "Engineering" <ee@athyriogames.com>
To: <freebsd-fs@freebsd.org>
References: <01c801cc667f$f99eb7b0$ecdc2710$@com>
In-Reply-To: <01c801cc667f$f99eb7b0$ecdc2710$@com>
Date: Tue, 30 Aug 2011 09:49:41 -0500
Message-ID: <020d01cc6724$0f0410b0$2d0c3210$@com>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Mailer: Microsoft Office Outlook 12.0
Thread-Index: Acxmf/bkxg/WNgMwTaWJDIhaLpbyEAAo4bzQ
Content-Language: en-us
X-AntiAbuse: This header was added to track abuse,
	please include it with any abuse report
X-AntiAbuse: Primary Hostname - madonna.sslcatacombnetworking.com
X-AntiAbuse: Original Domain - freebsd.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - athyriogames.com
Subject: RE: Read-only disk problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 30 Aug 2011 14:50:30 -0000

Hi, I've attached some more info. Doing an fsdump shows the following changes
across a reboot:

magic	19540119 (UFS2)	time	Tue Aug 30 03:08:04 2011
...
cg 1:
magic	90255	tell	4b1c000	time	Tue Aug 30 03:08:04 2011

Changes to

magic	19540119 (UFS2)	time	Tue Aug 30 03:13:14 2011
...
cg 1:
magic	90255	tell	4b1c000	time	Tue Aug 30 03:13:14 2011

Sam

-----Original Message-----
From: owner-freebsd-fs@freebsd.org [mailto:owner-freebsd-fs@freebsd.org] On
Behalf Of Engineering
Sent: Monday, August 29, 2011 2:15 PM
To: freebsd-fs@freebsd.org
Subject: Read-only disk problem

Please let me know if this is the wrong place to ask. I am working on an
embedded system using FreeBSD 7.2, booting and running off of flash memory.
In order to not burn out the flash, I use the 'diskless' scripts and mount
the flash read-only. I have used this configuration successfully in the
past.

I've recently added a utility to check for disk corruption, basically
checksumming the / and /usr partitions. Since they are both read-only, I
thought this would work. What I have discovered is that something in the
partition is changing between boots.

I dd'd the flash over a couple of boots, and compared the binaries to see
what was changing. It is a small amount of data, spread across the disk, in
an interval that looks very similar to the interval of the 'superblocks'.

Is there any data that is written to the disk at boot or mount time, and if
so, is there a way to prevent it?

Thanks
Sam



From owner-freebsd-fs@FreeBSD.ORG  Tue Aug 30 15:10:27 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DD4B11065670;
	Tue, 30 Aug 2011 15:10:27 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
	[131.104.91.44])
	by mx1.freebsd.org (Postfix) with ESMTP id 34F0C8FC23;
	Tue, 30 Aug 2011 15:10:26 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Ap4EACD9XE6DaFvO/2dsb2JhbABChEykSIFAAQEEASMEUgUWDgoCAg0ZAlkGiAWnW5IJgSyEEIERBJMkkSE
X-IronPort-AV: E=Sophos;i="4.68,302,1312171200"; d="scan'208";a="135967478"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 30 Aug 2011 11:10:14 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 5CC87B3F05;
	Tue, 30 Aug 2011 11:10:14 -0400 (EDT)
Date: Tue, 30 Aug 2011 11:10:14 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: George Liaskos <geo.liaskos@gmail.com>
Message-ID: <1005169645.540203.1314717014356.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <CANcjpOAsOWRRL0BVk_dX22gOQ72KvrJL6hRRJMvMshATHq8-Tw@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.201]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek <pjd@freebsd.org>
Subject: Re: NFSv4: After upgrade to 9 users can no longer list files.
 (sounds like a ZFS issue?)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 30 Aug 2011 15:10:27 -0000

George Liaskos wrote:
> > Well, if non-root users can't "ls" locally on the server, this
> > sounds more
> > like a ZFS issue than an NFS one. (I don't see this w.r.t. NFS when
> > exporting
> > a UFS volume.)
> >
> > I don't know anything about ZFS. I've added a couple of the ZFS guys
> > to the
> > cc list, in case they don't read posts with NFS in the subject line.
> >
> > rick
> 
> Just to be clear, non root users can't ls mounted exports on the
> server.
> Using ls directly on the ZFS file system works.
> 
> I exported a UFS directory, everything works... So this is either a
> ZFS or
> an ACL related issue. I will setup a clean VM to see if i can
> reproduce this.
> 
You could try this patch and see what effect it has (applied to the
server). It just disables the access check for readdir.
--- nfs_nfsdport.c.sav2	2011-08-30 10:35:58.000000000 -0400
+++ nfs_nfsdport.c	2011-08-30 10:36:54.000000000 -0400
@@ -1838,10 +1838,12 @@ nfsrvd_readdirplus(struct nfsrv_descript
 		nd->nd_repstat = NFSERR_NOTDIR;
 	if (!nd->nd_repstat && cnt == 0)
 		nd->nd_repstat = NFSERR_TOOSMALL;
+#ifdef notnow
 	if (!nd->nd_repstat)
 		nd->nd_repstat = nfsvno_accchk(vp, VEXEC,
 		    nd->nd_cred, exp, p, NFSACCCHK_NOOVERRIDE,
 		    NFSACCCHK_VPISLOCKED, NULL);
+#endif
 	if (nd->nd_repstat) {
 		vput(vp);
 		if (nd->nd_flag & ND_NFSV3)

This wouldn't be suitable for a production system, but whether or
not it "fixes" the problem would give us an indication of where the
problem is.

Also, if you could clarify when your 8/stable was downloaded, whether
your 9.0 upgrade was to vanilla Beta1 or ??? and details w.r.t. your
ZFS setup, that might help.

And one more... If you could create a fresh ZFS pool/volume and export
that to see if it exhibits the same problem, that information could
help figure it out, too.

Please let us know how it goes, rick

> Thank you for your response.
> 
> Regards,
> George

From owner-freebsd-fs@FreeBSD.ORG  Tue Aug 30 19:10:28 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 30F3D1065670
	for <freebsd-fs@freebsd.org>; Tue, 30 Aug 2011 19:10:28 +0000 (UTC)
	(envelope-from lev@FreeBSD.org)
Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru
	[IPv6:2a01:4f8:131:60a2::2])
	by mx1.freebsd.org (Postfix) with ESMTP id C648B8FC1B
	for <freebsd-fs@freebsd.org>; Tue, 30 Aug 2011 19:10:27 +0000 (UTC)
Received: from lion.home.serebryakov.spb.ru (unknown
	[IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c])
	(Authenticated sender: lev@serebryakov.spb.ru)
	by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id 14D604AC31
	for <freebsd-fs@freebsd.org>; Tue, 30 Aug 2011 23:10:26 +0400 (MSD)
Date: Tue, 30 Aug 2011 23:10:24 +0400
From: Lev Serebryakov <lev@FreeBSD.org>
Organization: FreeBSD
X-Priority: 3 (Normal)
Message-ID: <1945418039.20110830231024@serebryakov.spb.ru>
To: freebsd-fs@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1251
Content-Transfer-Encoding: quoted-printable
Subject: Very inconsistent (read) speed on UFS2
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: lev@FreeBSD.org
X-List-Received-Date: Tue, 30 Aug 2011 19:10:28 -0000

Hello, Freebsd-fs.

  Now, when I "defragmented" my large FS, I see very inconsistent read
speeds on same files. Is it Ok?

  My setup is:

 (1) FreeBSD 8.2-STABLE/x64
 (2) E4400 CPU, 2GiB RAM
 (3) 5xHDDs in RAID5 (software), controller is ICH9R.
 (4) UFS2 with 32KiB block, vfs.read_max=32 (1MiB read-ahead).
 (5) System and swap on another (6th) HDD, but swap is unused.
 (6) No periodic or background processes access FS in question at all.

 Simple program reads each of 12 files (460MiB each) 15 times in cycle
 like 01, 02, ..., 12, 01,... so, cache in memory should be thrashed, as
 program returns to same data every ~5.5GiB and there is only 2GiB of
 physical memory in system.

-- 
// Black Lion AKA Lev Serebryakov <lev@FreeBSD.org>


From owner-freebsd-fs@FreeBSD.ORG  Tue Aug 30 19:18:19 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 103891065670
	for <freebsd-fs@freebsd.org>; Tue, 30 Aug 2011 19:18:19 +0000 (UTC)
	(envelope-from lev@FreeBSD.org)
Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru
	[IPv6:2a01:4f8:131:60a2::2])
	by mx1.freebsd.org (Postfix) with ESMTP id CA7188FC08
	for <freebsd-fs@freebsd.org>; Tue, 30 Aug 2011 19:18:18 +0000 (UTC)
Received: from lion.home.serebryakov.spb.ru (unknown
	[IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c])
	(Authenticated sender: lev@serebryakov.spb.ru)
	by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id B60674AC31
	for <freebsd-fs@freebsd.org>; Tue, 30 Aug 2011 23:18:17 +0400 (MSD)
Date: Tue, 30 Aug 2011 23:18:15 +0400
From: Lev Serebryakov <lev@FreeBSD.org>
Organization: FreeBSD
X-Priority: 3 (Normal)
Message-ID: <317753422.20110830231815@serebryakov.spb.ru>
To: freebsd-fs@freebsd.org
In-Reply-To: <1945418039.20110830231024@serebryakov.spb.ru>
References: <1945418039.20110830231024@serebryakov.spb.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1251
Content-Transfer-Encoding: quoted-printable
Subject: Very inconsistent (read) speed on UFS2
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: lev@FreeBSD.org
X-List-Received-Date: Tue, 30 Aug 2011 19:18:19 -0000

Hello, Freebsd-fs.
You wrote 30 August 2011, 23:10:24:

 SORRY FOR SENDING INCOMPLETE MESSAGE!

   Now that I have "defragmented" my large FS, I see very inconsistent
 read speeds on the same files. Is this OK?

  My setup is:

 (1) FreeBSD 8.2-STABLE/x64
 (2) E4400 CPU, 2GiB RAM
 (3) 5xHDDs in RAID5 (software), controller is ICH9R.
 (4) UFS2 with 32KiB block, vfs.read_max=32 (1MiB read-ahead).
 (5) System and swap on another (6th) HDD, but swap is unused.
 (6) No periodic or background processes access FS in question at all.

 A simple program reads each of 12 files (460MiB each) 15 times in a
 cycle like 01, 02, ..., 12, 01,... so the in-memory cache should be
 thrashed, as the reading process returns to the same data only every
 ~5.5GiB and there is only 2GiB of physical memory in the system.

 And the speed of these reads is VERY inconsistent. I've calculated
min/average/max and standard deviation, and the results look like this:

Name        Min/Avg/Max       StdDev
r012f02.nef 120/235/413 MiB/s     83
r012f09.nef 154/248/393 MiB/s     80
r012f12.nef 106/212/293 MiB/s     63
r012f05.nef  86/206/280 MiB/s     62
r012f08.nef 128/223/332 MiB/s     60
r012f11.nef 155/257/327 MiB/s     56
r012f03.nef 121/213/279 MiB/s     52
r012f10.nef 120/226/284 MiB/s     45
r012f07.nef 121/199/249 MiB/s     41
r012f01.nef 135/199/242 MiB/s     33

  These are results from 15 runs! One time a file was read at a
sustained average speed of 120MiB/s (~3.8 seconds) and the next time at
413MiB/s (only ~1.1 seconds!)

  And it is not the case that the first read is the slowest. No.
Sometimes the last one is the slowest, for example.

  Is this OK? I'm very disappointed to see 120MiB/s when I know that
 the hardware can give 415MiB/s, but something strange slows down the
 process.
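
  For anyone who wants to reproduce the measurement, here is a minimal
 sketch in C of one timed sequential pass over a file. It is not the
 actual test program (which was not posted to the list). The name
 timed_read_mibps, the 4096-byte buffer alignment and the "-1.0 on
 error" convention are assumptions made for illustration only. Extra
 open(2) flags are passed in by the caller, so the same function can be
 tried with and without flags such as O_DIRECT.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/*
 * Read the whole file once, sequentially, into one reused buffer and
 * return the average throughput in MiB/s, or -1.0 on any error.
 * extra_flags is OR-ed into the open(2) flags (pass 0 for a plain read).
 */
double
timed_read_mibps(const char *path, size_t bufsize, int extra_flags)
{
	struct timespec t0, t1;
	void *buf = NULL;
	long long total = 0;
	double secs;
	ssize_t n;
	int fd;

	assert(bufsize > 0);

	/* Aligned buffer: direct I/O commonly requires aligned memory. */
	if (posix_memalign(&buf, 4096, bufsize) != 0)
		return (-1.0);

	fd = open(path, O_RDONLY | extra_flags);
	if (fd < 0) {
		free(buf);
		return (-1.0);
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	while ((n = read(fd, buf, bufsize)) > 0)
		total += n;		/* same buffer is overwritten each time */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	close(fd);
	free(buf);
	if (n < 0)
		return (-1.0);

	secs = (double)(t1.tv_sec - t0.tv_sec) +
	    (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
	if (secs <= 0.0)
		return (-1.0);
	return ((double)total / (1024.0 * 1024.0) / secs);
}
```

  Calling it in a loop over the 12 files, 15 times each, and recording
 the returned value per pass would give the raw samples from which
 min/avg/max figures like those in the table can be derived.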

-- 
// Black Lion AKA Lev Serebryakov <lev@FreeBSD.org>


From owner-freebsd-fs@FreeBSD.ORG  Tue Aug 30 20:09:09 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A570F106564A;
	Tue, 30 Aug 2011 20:09:09 +0000 (UTC)
	(envelope-from mckusick@mckusick.com)
Received: from chez.mckusick.com (chez.mckusick.com [70.36.157.235])
	by mx1.freebsd.org (Postfix) with ESMTP id 8682F8FC0C;
	Tue, 30 Aug 2011 20:09:09 +0000 (UTC)
Received: from chez.mckusick.com (localhost [127.0.0.1])
	by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id p7UK9CBQ085481;
	Tue, 30 Aug 2011 13:09:12 -0700 (PDT)
	(envelope-from mckusick@chez.mckusick.com)
Message-Id: <201108302009.p7UK9CBQ085481@chez.mckusick.com>
To: lev@freebsd.org
In-reply-to: <317753422.20110830231815@serebryakov.spb.ru> 
Date: Tue, 30 Aug 2011 13:09:12 -0700
From: Kirk McKusick <mckusick@mckusick.com>
X-Spam-Status: No, score=0.0 required=5.0 tests=MISSING_MID, UNPARSEABLE_RELAY
	autolearn=failed version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on chez.mckusick.com
Cc: freebsd-fs@freebsd.org
Subject: Re: Very inconsistent (read) speed on UFS2 
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
X-List-Received-Date: Tue, 30 Aug 2011 20:09:09 -0000

Now that you have defragmented your filesystem, we can factor that out
of the equation. What is left is the management of the memory used for
caching the files. I expect what is happening is that you are busily
reading along and run out of free memory in which to read. This triggers
a cleanup thread that churns through the memory pool to decide what
should be thrown out to make room. Your reading process is demanding
memory faster than the cleanup thread can produce it. The result is
that your read idles (i.e., appears to run slowly). It is random because
it depends on when you run out of memory.

The cleanup is complex because it has to deal with all of memory and
it wants to avoid a simple LRU which would cause the read of a large 
file to throw out a lot of things you would rather keep (like your 
window manager, browser, etc). Not sure what the best strategy is here.

	Kirk McKusick

From owner-freebsd-fs@FreeBSD.ORG  Tue Aug 30 22:14:09 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 20945106564A
	for <freebsd-fs@freebsd.org>; Tue, 30 Aug 2011 22:14:09 +0000 (UTC)
	(envelope-from lev@FreeBSD.org)
Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru
	[IPv6:2a01:4f8:131:60a2::2])
	by mx1.freebsd.org (Postfix) with ESMTP id B21688FC12
	for <freebsd-fs@freebsd.org>; Tue, 30 Aug 2011 22:14:08 +0000 (UTC)
Received: from lion.home.serebryakov.spb.ru (unknown
	[IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c])
	(Authenticated sender: lev@serebryakov.spb.ru)
	by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id E209A4AC31; 
	Wed, 31 Aug 2011 02:14:06 +0400 (MSD)
Date: Wed, 31 Aug 2011 02:14:04 +0400
From: Lev Serebryakov <lev@FreeBSD.org>
Organization: FreeBSD Project
X-Priority: 3 (Normal)
Message-ID: <103666698.20110831021404@serebryakov.spb.ru>
To: Kirk McKusick <mckusick@mckusick.com>
In-Reply-To: <201108302009.p7UK9CBQ085481@chez.mckusick.com>
References: <317753422.20110830231815@serebryakov.spb.ru>
	<201108302009.p7UK9CBQ085481@chez.mckusick.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1251
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org
Subject: Re: Very inconsistent (read) speed on UFS2
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: lev@FreeBSD.org
X-List-Received-Date: Tue, 30 Aug 2011 22:14:09 -0000

Hello, Kirk.
You wrote 31 August 2011, 0:09:12:

> Now that you have defragmented your filesystem, we can factor that out
> of the equation. What is left is the management of the memory used for
  Yep.

> caching the files. I expect what is happening is that you are busily
> reading along and run out of free memory in which to read. This triggers
> a cleanup thread that churns through the memory pool to decide what
> should be thrown out to make room. Your reading process is demanding
> memory faster than the cleanup thread can produce it. The result is
> that your read idles (e.g., appears to run slowly). It is random because
> it depends on when you run out of memory.
  That is interesting. But this box has two real cores (yes, I know,
 nowadays that is ``only two cores'' :)) and nothing to do but run this
 test program (single-threaded). Of course, this program reads data into
 the same buffer and doesn't allocate anything during the benchmark.

  Yes, I know that the kernel prefers not to throw away data even if it
is not needed right now, but here the decision looks very simple :)

 And, one more detail: I use O_DIRECT flag in open(2).

  Another interesting observation: this program consumes about 20% of
one core. That is very strange for an I/O-bound process, isn't it?

> The cleanup is complex because it has to deal with all of memory and
> it wants to avoid a simple LRU which would cause the read of a large
> file to throw out a lot of things you would rather keep (like your
> window manager, browser, etc). Not sure what the best strategy is here.
  A window manager and a browser look different from one small (128KiB)
 buffer which is overwritten again and again. As far as I understand,
 FreeBSD's VM (according to "Design and Implementation of...") should
 move such buffers (data read recently, not changed and not needed)
 into "Inact" state, and it is an easy task to re-use these buffers:
 they don't belong to any active process, they don't need to be paged
 or swapped out, etc.

  I'll try this experiment with mmap() and touching every 4096-th byte of
mapped memory instead of read(2).
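
  A minimal sketch of what such an mmap() pass could look like,
 assuming the whole file is mapped read-only in one piece. The function
 name touch_every_page and the stride parameter are hypothetical, chosen
 only for this illustration; a stride of 4096 corresponds to "every
 4096-th byte".

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map the file read-only and read one byte every `stride` bytes, so
 * that the pages holding those bytes get faulted in. Returns the number
 * of bytes touched, or -1 on error (including an empty file).
 */
long
touch_every_page(const char *path, size_t stride)
{
	struct stat st;
	volatile unsigned char sink = 0;	/* keeps the loads from being elided */
	unsigned char *p;
	long touched = 0;
	size_t len, off;
	int fd;

	assert(stride > 0);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return (-1);
	if (fstat(fd, &st) != 0 || st.st_size <= 0) {
		close(fd);
		return (-1);
	}
	len = (size_t)st.st_size;

	p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);			/* the mapping stays valid after close */
	if (p == MAP_FAILED)
		return (-1);

	for (off = 0; off < len; off += stride) {
		sink ^= p[off];		/* page fault happens here on first access */
		touched++;
	}

	munmap(p, len);
	return (touched);
}
```

  Timing a call to this function the same way as the read(2) loop gives
 a directly comparable throughput figure.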

-- 
// Black Lion AKA Lev Serebryakov <lev@FreeBSD.org>


From owner-freebsd-fs@FreeBSD.ORG  Tue Aug 30 22:29:37 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 09E3C106564A
	for <freebsd-fs@freebsd.org>; Tue, 30 Aug 2011 22:29:37 +0000 (UTC)
	(envelope-from lev@FreeBSD.org)
Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru
	[IPv6:2a01:4f8:131:60a2::2])
	by mx1.freebsd.org (Postfix) with ESMTP id C4F628FC12
	for <freebsd-fs@freebsd.org>; Tue, 30 Aug 2011 22:29:36 +0000 (UTC)
Received: from lion.home.serebryakov.spb.ru (unknown
	[IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c])
	(Authenticated sender: lev@serebryakov.spb.ru)
	by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id 0580D4AC58; 
	Wed, 31 Aug 2011 02:29:35 +0400 (MSD)
Date: Wed, 31 Aug 2011 02:29:33 +0400
From: Lev Serebryakov <lev@FreeBSD.org>
Organization: FreeBSD
X-Priority: 3 (Normal)
Message-ID: <1693072185.20110831022933@serebryakov.spb.ru>
To: Kirk McKusick <mckusick@mckusick.com>, freebsd-fs@freebsd.org
In-Reply-To: <103666698.20110831021404@serebryakov.spb.ru>
References: <317753422.20110830231815@serebryakov.spb.ru>
	<201108302009.p7UK9CBQ085481@chez.mckusick.com>
	<103666698.20110831021404@serebryakov.spb.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1251
Content-Transfer-Encoding: quoted-printable
Cc: 
Subject: Re: Very inconsistent (read) speed on UFS2
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: lev@FreeBSD.org
X-List-Received-Date: Tue, 30 Aug 2011 22:29:37 -0000

Hello, Kirk.
You wrote 31 August 2011, 2:14:04:

>   I'll try this experiment with mmap() and touching every 4096-th byte of
> mapped memory instead of read(2).
  Strangely enough, it gives only 40-50MiB/s and the results are very
 consistent.

  It really surprises me. I didn't think there would be so much
difference; I was sure the speed would be almost equivalent.

-- 
// Black Lion AKA Lev Serebryakov <lev@FreeBSD.org>


From owner-freebsd-fs@FreeBSD.ORG  Tue Aug 30 22:31:59 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C74DE106564A
	for <freebsd-fs@freebsd.org>; Tue, 30 Aug 2011 22:31:59 +0000 (UTC)
	(envelope-from lev@FreeBSD.org)
Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru
	[IPv6:2a01:4f8:131:60a2::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 8E5088FC16
	for <freebsd-fs@freebsd.org>; Tue, 30 Aug 2011 22:31:59 +0000 (UTC)
Received: from lion.home.serebryakov.spb.ru (unknown
	[IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c])
	(Authenticated sender: lev@serebryakov.spb.ru)
	by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id 4DD334AC31; 
	Wed, 31 Aug 2011 02:31:58 +0400 (MSD)
Date: Wed, 31 Aug 2011 02:31:55 +0400
From: Lev Serebryakov <lev@FreeBSD.org>
Organization: FreeBSD Project
X-Priority: 3 (Normal)
Message-ID: <612137475.20110831023155@serebryakov.spb.ru>
To: Kirk McKusick <mckusick@mckusick.com>
In-Reply-To: <201108302009.p7UK9CBQ085481@chez.mckusick.com>
References: <317753422.20110830231815@serebryakov.spb.ru>
	<201108302009.p7UK9CBQ085481@chez.mckusick.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1251
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org
Subject: Re: Very inconsistent (read) speed on UFS2
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: lev@FreeBSD.org
X-List-Received-Date: Tue, 30 Aug 2011 22:31:59 -0000

Hello, Kirk.
You wrote 31 August 2011, 0:09:12:

> memory faster than the cleanup thread can produce it. The result is
> that your read idles (e.g., appears to run slowly). It is random because
> it depends on when you run out of memory.
   BTW, it could explain why some runs are slower than others. But my
 situation looks like the opposite: some runs are much faster than
 others. And it could not be read-from-cache if the VM is sane. It is
 hard to believe that the VM will keep 0.5GiB of read-once data when
 another 5GiB was read after it with only 2GiB of physical memory.

-- 
// Black Lion AKA Lev Serebryakov <lev@FreeBSD.org>


From owner-freebsd-fs@FreeBSD.ORG  Tue Aug 30 23:00:43 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A0597106566B;
	Tue, 30 Aug 2011 23:00:43 +0000 (UTC)
	(envelope-from mckusick@mckusick.com)
Received: from chez.mckusick.com (chez.mckusick.com [70.36.157.235])
	by mx1.freebsd.org (Postfix) with ESMTP id 82F168FC14;
	Tue, 30 Aug 2011 23:00:43 +0000 (UTC)
Received: from chez.mckusick.com (localhost [127.0.0.1])
	by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id p7UN0jJ6022811;
	Tue, 30 Aug 2011 16:00:45 -0700 (PDT)
	(envelope-from mckusick@chez.mckusick.com)
Message-Id: <201108302300.p7UN0jJ6022811@chez.mckusick.com>
To: lev@FreeBSD.org
In-reply-to: <1693072185.20110831022933@serebryakov.spb.ru> 
Date: Tue, 30 Aug 2011 16:00:45 -0700
From: Kirk McKusick <mckusick@mckusick.com>
X-Spam-Status: No, score=0.0 required=5.0 tests=MISSING_MID, UNPARSEABLE_RELAY
	autolearn=failed version=3.2.5
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on chez.mckusick.com
Cc: freebsd-fs@FreeBSD.org
Subject: Re: Very inconsistent (read) speed on UFS2 
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
X-List-Received-Date: Tue, 30 Aug 2011 23:00:43 -0000

> Date: Wed, 31 Aug 2011 02:29:33 +0400
> From: Lev Serebryakov <lev@FreeBSD.org>
> To: Kirk McKusick <mckusick@mckusick.com>, freebsd-fs@FreeBSD.org
> Subject: Re: Very inconsistent (read) speed on UFS2
> 
> Hello, Kirk.
> 
> >   I'll try this experiment with mmap() and touching every 4096-th byte of
> > mapped memory instead of read(2).
> 
>   Strange enough, it gives only 40-50MiB/s and results are very
>  consistent.
> 
>   It really surprise me. I didn't think, that there will be so much
> difference, I was sure, that it will be almost equivalent speed.
> 
> --
> // Black Lion AKA Lev Serebryakov <lev@FreeBSD.org>

I had not realized that you were using O_DIRECT. That would in fact
avoid most of the caching / memory-recovery effects that I was blaming
earlier. Your test above is definitely hitting them though. My guess
is that the consistency is because you are measuring the rate at which
free memory can be created.

So, my new theory is that your O_DIRECT test is running slowly because
of the single threading in the GEOM layer. Pawel Jakub Dawidek (pjd@)
gave a very interesting talk on this problem at this year's BSDCan.

	Kirk McKusick

From owner-freebsd-fs@FreeBSD.ORG  Tue Aug 30 23:03:13 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D2E111065672;
	Tue, 30 Aug 2011 23:03:13 +0000 (UTC)
	(envelope-from bfriesen@simple.dallas.tx.us)
Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74])
	by mx1.freebsd.org (Postfix) with ESMTP id 9AEA18FC0A;
	Tue, 30 Aug 2011 23:03:13 +0000 (UTC)
Received: from freddy.simplesystems.org (freddy.simplesystems.org
	[65.66.246.65])
	by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id
	p7UMe7il013870; Tue, 30 Aug 2011 17:40:07 -0500 (CDT)
Date: Tue, 30 Aug 2011 17:40:07 -0500 (CDT)
From: Bob Friesenhahn <bfriesen@simple.dallas.tx.us>
X-X-Sender: bfriesen@freddy.simplesystems.org
To: Lev Serebryakov <lev@freebsd.org>
In-Reply-To: <1693072185.20110831022933@serebryakov.spb.ru>
Message-ID: <alpine.GSO.2.01.1108301735030.3028@freddy.simplesystems.org>
References: <317753422.20110830231815@serebryakov.spb.ru>
	<201108302009.p7UK9CBQ085481@chez.mckusick.com>
	<103666698.20110831021404@serebryakov.spb.ru>
	<1693072185.20110831022933@serebryakov.spb.ru>
User-Agent: Alpine 2.01 (GSO 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2
	(blade.simplesystems.org [65.66.246.90]);
	Tue, 30 Aug 2011 17:40:07 -0500 (CDT)
Cc: freebsd-fs@freebsd.org
Subject: Re: Very inconsistent (read) speed on UFS2
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
X-List-Received-Date: Tue, 30 Aug 2011 23:03:13 -0000

On Wed, 31 Aug 2011, Lev Serebryakov wrote:
>
>>   I'll try this experiment with mmap() and touching every 4096-th byte of
>> mapped memory instead of read(2).
>  Strange enough, it gives only 40-50MiB/s and results are very
> consistent.
>
>  It really surprise me. I didn't think, that there will be so much
> difference, I was sure, that it will be almost equivalent speed.

FreeBSD does not seem to default to sequential read-ahead when memory 
mapping is used with sequential page access.  Try using madvise() 
with the MADV_SEQUENTIAL option and see if it helps.

There are also MADV_WILLNEED, MADV_DONTNEED, and MADV_FREE.  Careful 
use of these options can help performance quite a lot when data is 
large compared to memory.
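
As a rough sketch of how that might look (not tested against the 
setup in this thread): map the file, hint sequential access, and 
release ranges once they have been consumed.  The helper names 
map_sequential and release_range are made up for this example; only 
mmap(), madvise() and the MADV_* constants are real interfaces.  The 
advice is only a hint, so its failure is ignored here.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map the whole file read-only and advise the kernel that it will be
 * accessed sequentially. Returns the mapping (length in *len_out), or
 * NULL on error. The caller unmaps it with munmap().
 */
void *
map_sequential(const char *path, size_t *len_out)
{
	struct stat st;
	void *p;
	int fd;

	assert(len_out != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return (NULL);
	if (fstat(fd, &st) != 0 || st.st_size <= 0) {
		close(fd);
		return (NULL);
	}

	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);
	if (p == MAP_FAILED)
		return (NULL);

	/* Advisory only: a failure here does not invalidate the mapping. */
	(void)madvise(p, (size_t)st.st_size, MADV_SEQUENTIAL);

	*len_out = (size_t)st.st_size;
	return (p);
}

/*
 * Tell the kernel that [base, base + len) has been consumed and is not
 * expected to be needed again soon. base should be page-aligned.
 * Returns the madvise() result (0 on success).
 */
int
release_range(void *base, size_t len)
{
	return (madvise(base, len, MADV_DONTNEED));
}
```

A reader loop would call release_range() on each chunk after it has 
been processed, so that already-read pages do not compete for memory 
with the pages still to come.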

Bob
-- 
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From owner-freebsd-fs@FreeBSD.ORG  Wed Aug 31 00:17:58 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0A32B1065672;
	Wed, 31 Aug 2011 00:17:58 +0000 (UTC)
	(envelope-from geo.liaskos@gmail.com)
Received: from mail-yi0-f54.google.com (mail-yi0-f54.google.com
	[209.85.218.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 4C1478FC1A;
	Wed, 31 Aug 2011 00:17:57 +0000 (UTC)
Received: by yib19 with SMTP id 19so203312yib.13
	for <multiple recipients>; Tue, 30 Aug 2011 17:17:56 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:content-transfer-encoding;
	bh=txEr8UwW/UN26zN9v8G1HaWBotqxGa+j0bIv/W1a3ec=;
	b=vbzkAg/o50XUter5QAinTAecKgFe9+y88Yifln/tTNHs/S/WqTbUIlhr7x8oXYWFUL
	jZ8woSynsuuqIVHnFNhlXsESRGPvkmrOEGZ4oZfiiBYSaD4tuVYQ2wOvZPUJWs3HAYKp
	HKJdHeiPRTYm+rVa4D7IxI2OKG6WoNW8ihDuY=
MIME-Version: 1.0
Received: by 10.101.3.40 with SMTP id f40mr5713633ani.89.1314749876693; Tue,
	30 Aug 2011 17:17:56 -0700 (PDT)
Received: by 10.100.42.15 with HTTP; Tue, 30 Aug 2011 17:17:56 -0700 (PDT)
In-Reply-To: <1005169645.540203.1314717014356.JavaMail.root@erie.cs.uoguelph.ca>
References: <CANcjpOAsOWRRL0BVk_dX22gOQ72KvrJL6hRRJMvMshATHq8-Tw@mail.gmail.com>
	<1005169645.540203.1314717014356.JavaMail.root@erie.cs.uoguelph.ca>
Date: Wed, 31 Aug 2011 03:17:56 +0300
Message-ID: <CANcjpOByga-_DPnrm69731q6CvkGV7hHSHRVsnakzkVjzTQOHw@mail.gmail.com>
From: George Liaskos <geo.liaskos@gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek <pjd@freebsd.org>
Subject: Re: NFSv4: After upgrade to 9 users can no longer list files.
 (sounds like a ZFS issue?)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
X-List-Received-Date: Wed, 31 Aug 2011 00:17:58 -0000

> You could try this patch and see what effect it has (applied to the
> server). It just disables the access check for readdir.
> --- nfs_nfsdport.c.sav2 2011-08-30 10:35:58.000000000 -0400
> +++ nfs_nfsdport.c      2011-08-30 10:36:54.000000000 -0400
> @@ -1838,10 +1838,12 @@ nfsrvd_readdirplus(struct nfsrv_descript
>                 nd->nd_repstat = NFSERR_NOTDIR;
>         if (!nd->nd_repstat && cnt == 0)
>                 nd->nd_repstat = NFSERR_TOOSMALL;
> +#ifdef notnow
>         if (!nd->nd_repstat)
>                 nd->nd_repstat = nfsvno_accchk(vp, VEXEC,
>                     nd->nd_cred, exp, p, NFSACCCHK_NOOVERRIDE,
>                     NFSACCCHK_VPISLOCKED, NULL);
> +#endif
>         if (nd->nd_repstat) {
>                 vput(vp);
>                 if (nd->nd_flag & ND_NFSV3)
>
> This wouldn't be suitable for a production system, but whether or
> not it "fixes" the problem would give us an indication of where the
> problem is.
>
> Also, if you could clarify when your 8/stable was downloaded, whether
> your 9.0 upgrade was to vanilla Beta1 or ??? and details w.r.t. your
> ZFS setup, that might help.

I use svn; unfortunately I don't remember exactly when I moved from
8.2 to stable. I synced with CURRENT last week and this issue
appeared. I did a second update to BETA-2 [r225237] with the same
results.

The patch didn't make any difference. I downloaded an ISO with BETA-1
and made a VM installation; I was not able to reproduce this.

I updated one of the clients to r225237 and set up some NFS exports on
top of ZFS, and ls does not work for non-root users. I created a new
pool on top of a memory fs to test this.

Next, I "downgraded" the server to BETA-1 [r224413] and everything is
back to normal.
So there's a bug which was introduced somewhere between BETA-1 and
BETA-2 :p

Thank you for your help!

Regards,
George

From owner-freebsd-fs@FreeBSD.ORG  Wed Aug 31 00:42:56 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8E91D106566B
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 00:42:56 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta07.westchester.pa.mail.comcast.net
	(qmta07.westchester.pa.mail.comcast.net [76.96.62.64])
	by mx1.freebsd.org (Postfix) with ESMTP id 4E57D8FC0A
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 00:42:56 +0000 (UTC)
Received: from omta20.westchester.pa.mail.comcast.net ([76.96.62.71])
	by qmta07.westchester.pa.mail.comcast.net with comcast
	id Soiw1h0031YDfWL57oiwTt; Wed, 31 Aug 2011 00:42:56 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta20.westchester.pa.mail.comcast.net with comcast
	id Sois1h0111t3BNj3goitbq; Wed, 31 Aug 2011 00:42:54 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 4D12C102C36; Tue, 30 Aug 2011 17:42:51 -0700 (PDT)
Date: Tue, 30 Aug 2011 17:42:51 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Lev Serebryakov <lev@FreeBSD.org>
Message-ID: <20110831004251.GA89979@icarus.home.lan>
References: <1945418039.20110830231024@serebryakov.spb.ru>
	<317753422.20110830231815@serebryakov.spb.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <317753422.20110830231815@serebryakov.spb.ru>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: Very inconsistent (read) speed on UFS2
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
X-List-Received-Date: Wed, 31 Aug 2011 00:42:56 -0000

On Tue, Aug 30, 2011 at 11:18:15PM +0400, Lev Serebryakov wrote:
>    Now, when I "defragmented" my large FS, I see very inconsistent
>  read speeds on same files. Is it Ok?
> 
>   My setup is:
> 
>  (1) FreeBSD 8.2-STABLE/x64
>  (2) E4400 CPU, 2GiB RAM
>  (3) 5xHDDs in RAID5 (software), controller is ICH9R.
>  (4) UFS2 with 32KiB block, vfs.read_max=32 (1MiB read-ahead).
>  (5) System and swap on another (6th) HDD, but swap is unused.
>  (6) No periodic or background processes access FS in question at all.
> 
>  A simple program reads each of 12 files (460MiB each) 15 times in a
>  cycle like 01, 02, ..., 12, 01,... so the in-memory cache should be
>  thrashed, as the reading process returns to the same data every
>  ~5.5GiB and there are only 2GiB of physical memory in the system.
> 
>  And the speed of these reads is VERY inconsistent. I've calculated
> min/average/max and standard deviation, and the results look like this:
> 
> Name        Min/Avg/Max       StdDev
> r012f02.nef 120/235/413 MiB/s     83
> r012f09.nef 154/248/393 MiB/s     80
> r012f12.nef 106/212/293 MiB/s     63
> r012f05.nef  86/206/280 MiB/s     62
> r012f08.nef 128/223/332 MiB/s     60
> r012f11.nef 155/257/327 MiB/s     56
> r012f03.nef 121/213/279 MiB/s     52
> r012f10.nef 120/226/284 MiB/s     45
> r012f07.nef 121/199/249 MiB/s     41
> r012f01.nef 135/199/242 MiB/s     33
> 
>   These are the results from 15 runs! One time a file was read at a
> sustained average speed of 120MiB/s (~3.8 seconds) and the next time
> at 413MiB/s (only ~1.1 seconds!)
> 
>   And it is not the case that the first read is the slowest. No.
> Sometimes the last one is the slowest, for example.
> 
>   Is that OK? I'm very disappointed to see 120MiB/s when I know that
>  the hardware can give 415MiB/s, but something strange slows down the
>  process.

What appears to have been missed here is that there are 5 drives in a
RAID-5 fashion.  Wait, RAID-5?  FreeBSD has RAID-5 support?  How?  Oh,
right...

There's a port called sysutils/graid5 which is a "converted to work on
FreeBSD 8.x" GEOM class for RAID-5.  The original was written for
earlier FreeBSD and was called geom_raid5.  The original that Arne
Worner introduced was written in 2006.  A port was made for it only
recently:

http://www.freebsd.org/cgi/cvsweb.cgi/ports/sysutils/graid5/Makefile

What scares me is the number of "variants" on this code:

http://en.wikipedia.org/wiki/Geom_raid5

Some users have asked why this code hasn't ever been committed to the
FreeBSD kernel (dated 2010, citing "why isn't this in HEAD?"):

http://forums.freebsd.org/showthread.php?t=9040

There are admissions from Arne that "the code is absolutely horrible",
which may be why it's never been committed to FreeBSD.  There are also
all sorts of other concerns:

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00437.html

Here's one citing concerns over "aggressive caching", talking about
writes and not reads, but my point still applies:

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00398.html
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00403.html

The thread continues for quite some time.

There's also a freebsd-current thread from 2007 asking if the code could
be committed to HEAD, with some users stating they'd like to see that
too -- with one noting that gvinum has support for RAID-5 so basically
"which is better?"  (I imagine that question is still unanswered)

There were also concerns over testing, reliability, throughput, etc. and
the answers (as of 2007) were really not that great:

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00351.html
http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00361.html

So can I ask what guarantee you have that geom_raid5 is not responsible
for the intermittent I/O speeds you see?  I would recommend you remove
geom_raid5 from the picture entirely and replace it with either
gstripe(8) or ccd(4) SOLELY FOR TESTING.

Furthermore, why are these benchmarks not providing speed data
per-device (e.g. gstat or iostat -x data)?  There is a possibility that
one of your drives could be performing at less-than-ideal rates (yes,
intermittently) and therefore impacts (intermittently) your overall I/O
throughput.

The other posts in this mail thread so far are much more conclusive, but
the above points/concerns I believe are still valid.  They have never
been thoroughly refuted or addressed.  I guess you could say I'm very
surprised someone is complaining about performance issues on FreeBSD
when using a 3rd-party GEOM class that's been scrutinised in the past.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Wed Aug 31 07:38:38 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CB8761065674;
	Wed, 31 Aug 2011 07:38:38 +0000 (UTC)
	(envelope-from lev@serebryakov.spb.ru)
Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru
	[IPv6:2a01:4f8:131:60a2::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 67B108FC12;
	Wed, 31 Aug 2011 07:38:38 +0000 (UTC)
Received: from lion.home.serebryakov.spb.ru (unknown
	[IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c])
	(Authenticated sender: lev@serebryakov.spb.ru)
	by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id 96D384AC58; 
	Wed, 31 Aug 2011 11:38:36 +0400 (MSD)
Date: Wed, 31 Aug 2011 11:38:33 +0400
From: Lev Serebryakov <lev@serebryakov.spb.ru>
X-Priority: 3 (Normal)
Message-ID: <485583919.20110831113833@serebryakov.spb.ru>
To: Bob Friesenhahn <bfriesen@simple.dallas.tx.us>
In-Reply-To: <alpine.GSO.2.01.1108301735030.3028@freddy.simplesystems.org>
References: <317753422.20110830231815@serebryakov.spb.ru>
	<201108302009.p7UK9CBQ085481@chez.mckusick.com>
	<103666698.20110831021404@serebryakov.spb.ru>
	<1693072185.20110831022933@serebryakov.spb.ru>
	<alpine.GSO.2.01.1108301735030.3028@freddy.simplesystems.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1251
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org
Subject: Re: Very inconsistent (read) speed on UFS2
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 31 Aug 2011 07:38:38 -0000

Hello, Bob.
You wrote on 31 August 2011, 2:40:07:

>>>   I'll try this experiment with mmap() and touching every 4096-th byte
>>> of mapped memory instead of read(2).
>>  Strangely enough, it gives only 40-50MiB/s and the results are very
>> consistent.
>>
>>  It really surprises me. I didn't think there would be so much
>> difference; I was sure the speed would be almost equivalent.
> FreeBSD does not seem to default to sequential read-ahead when memory
> mapping is used with sequential page access.  Try using madvise()
> with the MADV_SEQUENTIAL option and see if it helps.
  Those were the results with MADV_SEQUENTIAL. The code looks like this
(error checking is skipped here, but not in the real code, of course):

fd = open(fileName, O_RDONLY | O_DIRECT);
buf = mmap(NULL, fileSize, PROT_READ, 0, fd, 0);
madvise(buf, fileSize, MADV_SEQUENTIAL);
gettimeofday(&start, NULL);
for (rd = 0; rd < fileSize; rd += 4096)
    c = buf[rd];
gettimeofday(&end, NULL);
munmap(buf, fileSize);
close(fd);

> There are also MADV_WILLNEED, MADV_DONTNEED, and MADV_FREE.  Careful
> use of these options can help performance quite a lot when data is
> large compared to memory.
  That is too complex for a simple linear read test :)

-- 
// Black Lion AKA Lev Serebryakov <lev@serebryakov.spb.ru>


From owner-freebsd-fs@FreeBSD.ORG  Wed Aug 31 07:44:17 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4724B106566B
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 07:44:17 +0000 (UTC)
	(envelope-from daniel@digsys.bg)
Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230])
	by mx1.freebsd.org (Postfix) with ESMTP id A94418FC0C
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 07:44:16 +0000 (UTC)
Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5])
	(authenticated bits=0)
	by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p7V7AVnr066398
	(version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO)
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 10:10:36 +0300 (EEST)
	(envelope-from daniel@digsys.bg)
Message-ID: <4E5DDE66.5000508@digsys.bg>
Date: Wed, 31 Aug 2011 10:10:30 +0300
From: Daniel Kalchev <daniel@digsys.bg>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:6.0) Gecko/20110822 Thunderbird/6.0
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
References: <1945418039.20110830231024@serebryakov.spb.ru>
	<317753422.20110830231815@serebryakov.spb.ru>
	<20110831004251.GA89979@icarus.home.lan>
In-Reply-To: <20110831004251.GA89979@icarus.home.lan>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: Very inconsistent (read) speed on UFS2
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 31 Aug 2011 07:44:17 -0000

On 31.08.11 03:42, Jeremy Chadwick wrote:
> There is a possibility that one of your drives could be performing at 
> less-than-ideal rates (yes, intermittently) and therefore impacts 
> (intermittently) your overall I/O throughput.

This is very probable given RAID5, which needs to read stripes off
every disc on every read.

Probably some S.M.A.R.T. investigation might help (different 
measurements for different drives, if supported by the drives, that is).

Daniel

From owner-freebsd-fs@FreeBSD.ORG  Wed Aug 31 08:03:04 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D887D106566C;
	Wed, 31 Aug 2011 08:03:04 +0000 (UTC)
	(envelope-from lev@serebryakov.spb.ru)
Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru
	[IPv6:2a01:4f8:131:60a2::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 5FD128FC0A;
	Wed, 31 Aug 2011 08:03:04 +0000 (UTC)
Received: from lion.home.serebryakov.spb.ru (unknown
	[IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c])
	(Authenticated sender: lev@serebryakov.spb.ru)
	by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id E35954AC31; 
	Wed, 31 Aug 2011 12:03:02 +0400 (MSD)
Date: Wed, 31 Aug 2011 12:03:00 +0400
From: Lev Serebryakov <lev@serebryakov.spb.ru>
X-Priority: 3 (Normal)
Message-ID: <687356195.20110831120300@serebryakov.spb.ru>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
In-Reply-To: <20110831004251.GA89979@icarus.home.lan>
References: <1945418039.20110830231024@serebryakov.spb.ru>
	<317753422.20110830231815@serebryakov.spb.ru>
	<20110831004251.GA89979@icarus.home.lan>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1251
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org, Lev Serebryakov <lev@FreeBSD.org>
Subject: Re: Very inconsistent (read) speed on UFS2
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 31 Aug 2011 08:03:04 -0000

Hello, Jeremy.
You wrote on 31 August 2011, 4:42:51:

> What appears to have been missed here is that there are 5 drives in a
> RAID-5 fashion.  Wait, RAID-5?  FreeBSD has RAID-5 support?  How?  Oh,
> right...

> There's a port called sysutils/graid5 which is a "converted to work on
> FreeBSD 8.x" GEOM class for RAID-5.  The original was written for
> earlier FreeBSD and was called geom_raid5.  The original that Arne
> Worner introduced was written in 2006.  A port was made for it only
> recently:
  I'm the author of this port. And I'm the author of some improvements,
approved by Arne Worner, which are included in this port :) And it
seems that I'm the only user of this port in the whole world, too. But
it has worked for me for many years without any data-loss problems. It
helped me not to lose data when I had 3 dead HDDs over these years (not
simultaneously, of course) and to upgrade my server from a 5x500GB to a
5x2TB configuration without stopping it (OK, with a short stop for the
"growfs" run, but all HDDs were replaced one-by-one on a live system,
thanks to SATA hotplug).
  Now I'm trying to squeeze the maximum speed out of this software :)

> What scares me is the number of "variants" on this code:
> http://en.wikipedia.org/wiki/Geom_raid5
   There are three variants: a dumb proof-of-concept; stable and fast,
 but not ideal, code; and an experimental one. The port uses the second
 one. The first one is way too slow and the third one HAS problems.

   What scares _me_ is Arne's coding style. I've spent almost a year
 understanding almost all the details of this code, mostly due to
 two-letter variables, etc.

> Some users have asked why this code hasn't ever been committed to the
> FreeBSD kernel (dated 2010, citing "why isn't this in HEAD?"):
> http://forums.freebsd.org/showthread.php?t=9040
   Code style. And I mean real problems, not nit-picking about
 "return 0;" vs "return (0);" or white space. I'm trying to clean it up
in a separate branch, without changing functionality, before I
implement some new ideas, which should clean up the code even more. But
it is not a very fast process, as I don't have a lot of spare time now,
and it is work which takes A LOT of concentration.

> Here's one citing concerns over "aggressive caching", talking about
> writes and not reads, but my point still applies:
> http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00398.html
> http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-11/msg00403.html
  Yep, and this aggressive caching can be turned off. But it is a
GREAT help for write speed. Use a good UPS and nut -- they really HELP.
And another note: without a UPS and nut, even without geom_raid5, there
is a BIG problem with large volumes and UFS2. Background fsck for a 2TB
volume takes about three hours, during which the system is almost
locked, and it often fails. fsck of an 8TB volume? It is my worst
nightmare. And it doesn't depend on RAID5 and its write cache. Use a
UPS. USE IT.

> So can I ask what guarantee you have that geom_raid5 is not responsible
> for the intermittent I/O speeds you see?  I would recommend you remove
  I'm not sure here -- that is the point. I want to understand whether
it is a geom_raid5 problem, a UFS2 problem, a VMM problem, or some
combination of "glitches" in these subsystems. I'm almost sure it is
not a problem of something "in a vacuum"; it is a problem at the border
between subsystems. And, as I don't understand well how to "look
inside" UFS2, I am asking for help here.

> geom_raid5 from the picture entirely and replace it with either
> gstripe(8) or ccd(4) SOLELY FOR TESTING.
  It is impossible in this config: I have data which is valuable to
 me. Here is the problem: I can do any tests, except speed tests, on a
 test server and VMs. I can run a test suite, switch off HDDs,
 re-create FSes, etc., to be sure that geom_raid5 is STABLE in terms of
 data safety.
   But the only BIG system on which I can perform valid speed
 benchmarks is my home server with my data, which I cannot lose.

   It is useless to run such benchmarks on an array of old 9GiB (yes,
 you read that right, 9 gigabytes) SCSI HDDs or in a virtual machine
 with a bunch of virtual HDDs. And I do not have a second server with
 modern, fast, and big disks. Sorry.

> Furthermore, why are these benchmarks not providing speed data
> per-device (e.g. gstat or iostat -x data)?  There is a possibility that
> one of your drives could be performing at less-than-ideal rates (yes,
> intermittently) and therefore impacts (intermittently) your overall I/O
> throughput.
   I'll look at this, but I zeroed out all the HDDs before placing them
 into the array, and the speeds were identical.

> been thoroughly refuted or addressed.  I guess you could say I'm very
> surprised someone is complaining about performance issues on FreeBSD
> when using a 3rd-party GEOM class that's been scrutinised in the past.
  It is not a complaint. It is a request for help in profiling a very
 old and complex subsystem :) Maybe I was not very clear in my first
 message.

-- 
// Black Lion AKA Lev Serebryakov <lev@serebryakov.spb.ru>


From owner-freebsd-fs@FreeBSD.ORG  Wed Aug 31 08:11:06 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BCB40106566C;
	Wed, 31 Aug 2011 08:11:06 +0000 (UTC) (envelope-from lev@FreeBSD.org)
Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru
	[IPv6:2a01:4f8:131:60a2::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 828D78FC15;
	Wed, 31 Aug 2011 08:11:06 +0000 (UTC)
Received: from lion.home.serebryakov.spb.ru (unknown
	[IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c])
	(Authenticated sender: lev@serebryakov.spb.ru)
	by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id 4DDB34AC31; 
	Wed, 31 Aug 2011 12:11:05 +0400 (MSD)
Date: Wed, 31 Aug 2011 12:11:02 +0400
From: Lev Serebryakov <lev@FreeBSD.org>
Organization: FreeBSD
X-Priority: 3 (Normal)
Message-ID: <170569583.20110831121102@serebryakov.spb.ru>
To: Kirk McKusick <mckusick@mckusick.com>
In-Reply-To: <201108302300.p7UN0jJ6022811@chez.mckusick.com>
References: <1693072185.20110831022933@serebryakov.spb.ru>
	<201108302300.p7UN0jJ6022811@chez.mckusick.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1251
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@FreeBSD.org, lev@FreeBSD.org
Subject: Re: Very inconsistent (read) speed on UFS2
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: lev@FreeBSD.org
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 31 Aug 2011 08:11:06 -0000

Hello, Kirk.
You wrote on 31 August 2011, 3:00:45:

> So, my new theory on why your O_DIRECT test is running slowly is due
> to the single threading in the GEOM layer. Pawel Jakub Dawidek (pjd@)
> gave a very interesting talk on this problem at this year's BSDCan.
  I want to stress my point: it is not the low speed that especially
 bothers me. It is the inconsistency.

  A bad drive, always-single-threaded GEOM, etc., should give
 consistently slow speed.

  Bad code in geom_raid5 could give inconsistent write speed, due to
caching, but the read path is as straight and simple as possible here.

  A bad drive or AST-GEOM is not something that is simple to fix (OK, a
bad drive IS simple to replace, of course). But I suspect that there is
some simple "misunderstanding" between the geom_raid5 code and the
VFS/FFS2 layer, which I could fix. But I don't know where to look.

-- 
// Black Lion AKA Lev Serebryakov <lev@FreeBSD.org>


From owner-freebsd-fs@FreeBSD.ORG  Wed Aug 31 08:19:24 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A81F11065C9D
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 08:19:24 +0000 (UTC)
	(envelope-from lev@FreeBSD.org)
Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru
	[IPv6:2a01:4f8:131:60a2::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 6DB1B8FC1C
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 08:19:24 +0000 (UTC)
Received: from lion.home.serebryakov.spb.ru (unknown
	[IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c])
	(Authenticated sender: lev@serebryakov.spb.ru)
	by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id 4828D4AC31; 
	Wed, 31 Aug 2011 12:19:23 +0400 (MSD)
Date: Wed, 31 Aug 2011 12:19:20 +0400
From: Lev Serebryakov <lev@FreeBSD.org>
Organization: FreeBSD
X-Priority: 3 (Normal)
Message-ID: <10310173523.20110831121920@serebryakov.spb.ru>
To: Daniel Kalchev <daniel@digsys.bg>
In-Reply-To: <4E5DDE66.5000508@digsys.bg>
References: <1945418039.20110830231024@serebryakov.spb.ru>
	<317753422.20110830231815@serebryakov.spb.ru>
	<20110831004251.GA89979@icarus.home.lan>
	<4E5DDE66.5000508@digsys.bg>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1251
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org
Subject: Re: Very inconsistent (read) speed on UFS2
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: lev@FreeBSD.org
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 31 Aug 2011 08:19:24 -0000

Hello, Daniel.
You wrote on 31 August 2011, 11:10:30:

> This is very probable, given RAID5, that needs to read stripes off every
> disc on every read.
  Again: how could a faulty drive cause inconsistency within one file?
 If it has relocated sectors (an additional long seek in some place),
 it will give a consistent performance degradation in that place.

> Probably some S.M.A.R.T. investigation might help (different
> measurements for different drives, if supported by the drives, that is).
 The SMART data is almost identical and excellent. No relocated sectors
 (at all!), no multizone read errors (at all!), etc.

 I'll try to synchronize the iostat statistics with the reading
benchmark, but it is hard to do, as I cannot think of a way to know
that these 100 lines of iostat output correspond exactly to this (slow)
read of this file, when there are 10+ files in question and all of them
are read 10+ times.

-- 
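
One possible way to make that correlation tractable is to have the
benchmark log the wall-clock start and end time of every pass, so each
pass can later be matched against iostat samples by time. This is only
a sketch: format_read_record() and its log format are hypothetical, and
it assumes the iostat output is timestamped by some separate means.

```c
#include <stdio.h>
#include <sys/time.h>

/* Convert a struct timeval to seconds since the epoch. */
static double
tv_to_sec(const struct timeval *tv)
{
	return ((double)tv->tv_sec + (double)tv->tv_usec / 1e6);
}

/*
 * Format one log line for a single pass over a single file: name,
 * pass number, start/end wall-clock time and the resulting MiB/s.
 * Returns what snprintf() returns.
 */
int
format_read_record(char *out, size_t outsz, const char *name, int pass,
    const struct timeval *start, const struct timeval *end, double mib)
{
	double s = tv_to_sec(start);
	double e = tv_to_sec(end);
	double rate = (e > s) ? mib / (e - s) : 0.0;

	return (snprintf(out, outsz,
	    "%s pass=%d start=%.3f end=%.3f MiB/s=%.1f",
	    name, pass, s, e, rate));
}
```

The benchmark would call gettimeofday() before and after each pass, as
it already does, and write one such line per pass; a slow pass can then
be looked up by its start/end interval.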
// Black Lion AKA Lev Serebryakov <lev@FreeBSD.org>


From owner-freebsd-fs@FreeBSD.ORG  Wed Aug 31 08:36:28 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 57544106564A
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 08:36:28 +0000 (UTC)
	(envelope-from lev@FreeBSD.org)
Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru
	[IPv6:2a01:4f8:131:60a2::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 06D688FC13
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 08:36:28 +0000 (UTC)
Received: from lion.home.serebryakov.spb.ru (unknown
	[IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c])
	(Authenticated sender: lev@serebryakov.spb.ru)
	by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id AC6074AC31; 
	Wed, 31 Aug 2011 12:36:26 +0400 (MSD)
Date: Wed, 31 Aug 2011 12:36:23 +0400
From: Lev Serebryakov <lev@FreeBSD.org>
Organization: FreeBSD
X-Priority: 3 (Normal)
Message-ID: <147623060.20110831123623@serebryakov.spb.ru>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
In-Reply-To: <20110831004251.GA89979@icarus.home.lan>
References: <1945418039.20110830231024@serebryakov.spb.ru>
	<317753422.20110830231815@serebryakov.spb.ru>
	<20110831004251.GA89979@icarus.home.lan>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1251
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org
Subject: Re: Very inconsistent (read) speed on UFS2
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: lev@FreeBSD.org
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 31 Aug 2011 08:36:28 -0000

Hello, Jeremy.
You wrote on 31 August 2011, 4:42:51:

> Furthermore, why are these benchmarks not providing speed data
> per-device (e.g. gstat or iostat -x data)?  There is a possibility that
> one of your drives could be performing at less-than-ideal rates (yes,
> intermittently) and therefore impacts (intermittently) your overall I/O
> throughput.
  OK. I've run my benchmark while `iostat -x -d -c 999999' was running.
  The results look like this:

device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada1     340.9 292.9 43138.8   146.5    0   1.2  42
ada2     340.9 293.9 43138.8   147.0    0   1.9  63
ada3     340.9 292.9 43044.7   146.5    0   1.5  57
ada4     341.9 292.9 43232.9   146.5    0   1.3  42
ada5     341.9 292.0 43138.8   146.0    2   1.3  40

  Yes, the NUMBERS differ from sample to sample and oscillate from
16MB/s to 80MB/s, but they are VERY consistent among the disks in
question. Slow read? All disks work slowly. Fast read? All disks work
fast.
  I don't like this low-level speed oscillation either. I understand
that something higher in the stack causes it. And I want to understand
WHAT.

  What additionally surprises me:

1) The benchmark induces some writing. atime modification? No, I've
   turned that off, but it doesn't help. I'm afraid that this
   read-write interleaving could be the cause of the "problems", but I
   don't understand WHY there is some writing (1 write per 2 reads on
   average) when a read-only benchmark runs. It doesn't write any
   logs, etc. Yes, the write throughput is very low, every write
   transaction is about 2KB, but WHY are they there?! If I stop the
   benchmark, there is less than 1 write transaction per second.

2) Without `-x' it shows that the typical read transaction size is
   about 50KB. That is very strange, as geom_raid5 shows (I have
   diagnostics in it) that almost all file accesses are aligned and
   128KB-sized...
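
The per-transaction sizes can also be derived from the `-x' columns
directly, by dividing kr/s by r/s and kw/s by w/s. Below is a minimal
sketch of that arithmetic; the struct and function names are
hypothetical. It also computes a combined figure over reads and writes
together, on the assumption (not verified here) that the size shown
without `-x' is averaged over both kinds of transfer.

```c
/* One row of `iostat -x' output, using the column names shown above. */
struct iostat_row {
	double r_s;	/* reads per second */
	double w_s;	/* writes per second */
	double kr_s;	/* kilobytes read per second */
	double kw_s;	/* kilobytes written per second */
};

/* Average kilobytes per read transaction. */
double
kb_per_read(const struct iostat_row *r)
{
	return (r->r_s > 0.0 ? r->kr_s / r->r_s : 0.0);
}

/* Average kilobytes per write transaction. */
double
kb_per_write(const struct iostat_row *r)
{
	return (r->w_s > 0.0 ? r->kw_s / r->w_s : 0.0);
}

/* Average kilobytes per transaction, reads and writes together. */
double
kb_per_transfer(const struct iostat_row *r)
{
	double n = r->r_s + r->w_s;

	return (n > 0.0 ? (r->kr_s + r->kw_s) / n : 0.0);
}
```

For the ada1 row above (340.9 r/s, 292.9 w/s, 43138.8 kr/s, 146.5 kw/s)
this gives about 126.5KB per read, about 0.5KB per write and about 68KB
per transaction overall, so many small writes would pull a combined
average well below the read size.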

P.S. Several samples as an example of consistency WITHIN one sample and
inconsistency BETWEEN samples. A random pick from the output, no
editing; they appeared in exactly this order:

                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada1     165.3  87.0 10515.9    43.5    2   5.0  50
ada2     165.3  87.0 10547.2    43.5    2   7.7  61
ada3     167.2  87.0 10703.7    43.5    1   6.1  55
ada4     165.3  87.0 10484.6    43.5    3   4.9  44
ada5     160.4  87.0 10265.5    43.5    5   5.1  48
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada1     884.1 350.9 56583.1   175.4    0   1.0  49
ada2     886.1 350.9 56677.2   175.4    0   1.3  58
ada3     882.2 349.9 56489.0   175.0    2   1.7  63
ada4     885.1 350.9 56614.5   175.4    0   1.4  64
ada5     887.1 350.9 56739.9   175.4    0   1.5  63
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada1     640.6 261.5 41001.3   130.8    0   0.9  40
ada2     639.7 261.5 40969.9   130.8    0   0.9  35
ada3     637.7 262.5 40844.5   131.3    0   1.5  46
ada4     640.6 260.6 41001.3   130.3    1   1.3  65
ada5     638.7 261.5 40875.9   130.8    0   1.3  46
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada1     243.7 102.8 15660.2    51.4    2   1.9  36
ada2     240.8 102.8 15503.6    51.4    3   1.9  43
ada3     242.7 103.7 15566.2    51.9    0   1.9  30
ada4     244.7 103.7 15785.5    51.9    2   2.4  56
ada5     243.7 102.8 15566.2    51.4    2   1.8  30
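
To put a number on "consistent within a sample, inconsistent between
samples", one can compare the relative spread, (max - min) / mean, of
kr/s across the five disks in one sample with the spread of one disk
across the samples. A minimal sketch; relative_spread() is a
hypothetical helper, not part of the benchmark:

```c
#include <stddef.h>

/*
 * Relative spread of n values: (max - min) / mean.
 * Returns 0.0 for an empty array or a zero mean.
 */
double
relative_spread(const double *v, size_t n)
{
	double min, max, sum;
	size_t i;

	if (n == 0)
		return (0.0);
	min = max = sum = v[0];
	for (i = 1; i < n; i++) {
		if (v[i] < min)
			min = v[i];
		if (v[i] > max)
			max = v[i];
		sum += v[i];
	}
	if (sum == 0.0)
		return (0.0);
	return ((max - min) / (sum / (double)n));
}
```

For the first sample above, kr/s across ada1..ada5 has a spread of
about 4%, while the ada1 kr/s values across the four samples have a
spread of about 149%.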

-- 
// Black Lion AKA Lev Serebryakov <lev@FreeBSD.org>


From owner-freebsd-fs@FreeBSD.ORG  Wed Aug 31 08:48:42 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6FC71106564A
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 08:48:42 +0000 (UTC)
	(envelope-from daniel@digsys.bg)
Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230])
	by mx1.freebsd.org (Postfix) with ESMTP id F09158FC14
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 08:48:41 +0000 (UTC)
Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5])
	(authenticated bits=0)
	by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p7V8mXtE067659
	(version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO)
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 11:48:38 +0300 (EEST)
	(envelope-from daniel@digsys.bg)
Message-ID: <4E5DF560.1050507@digsys.bg>
Date: Wed, 31 Aug 2011 11:48:32 +0300
From: Daniel Kalchev <daniel@digsys.bg>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:6.0) Gecko/20110822 Thunderbird/6.0
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
References: <1945418039.20110830231024@serebryakov.spb.ru>
	<317753422.20110830231815@serebryakov.spb.ru>
	<20110831004251.GA89979@icarus.home.lan>
	<147623060.20110831123623@serebryakov.spb.ru>
In-Reply-To: <147623060.20110831123623@serebryakov.spb.ru>
Content-Type: text/plain; charset=windows-1251; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: Very inconsistent (read) speed on UFS2
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 31 Aug 2011 08:48:42 -0000


On 31.08.11 11:36, Lev Serebryakov wrote:
> device     r/s   w/s    kr/s    kw/s wait svc_t  %b
> ada1     340.9 292.9 43138.8   146.5    0   1.2  42
> ada2     340.9 293.9 43138.8   147.0    0   1.9  63
> ada3     340.9 292.9 43044.7   146.5    0   1.5  57
> ada4     341.9 292.9 43232.9   146.5    0   1.3  42
> ada5     341.9 292.0 43138.8   146.0    2   1.3  40
>
Very interesting, these writes. You need to find out what is causing them.

Just some random thoughts:

This flapping may have something to do with the drives' internal caches. 
What are the drives?

SATA drives, unlike SAS, have half-duplex communication with the host; that
is, the drive cannot simultaneously read and write data and commands
(from/to host). There might be some contention in there, perhaps locking?
It is obviously not contention for bandwidth.

Most consumer drives have rather low IOPS performance. This is
especially pronounced when there are both reads and writes. Your IOPS
rate here is relatively high for such a drive, although the busy
percentage is low -- but then, it may not be accurate.

In any case, you cannot measure read performance as long as it 
intermixes with writes, especially as you noted that your RAID5 code has 
some non-obvious write characteristics/optimizations.

Daniel

From owner-freebsd-fs@FreeBSD.ORG  Wed Aug 31 09:03:58 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CFA301065670
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 09:03:58 +0000 (UTC)
	(envelope-from lev@FreeBSD.org)
Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru
	[IPv6:2a01:4f8:131:60a2::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 944008FC19
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 09:03:58 +0000 (UTC)
Received: from lion.home.serebryakov.spb.ru (unknown
	[IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c])
	(Authenticated sender: lev@serebryakov.spb.ru)
	by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id DDBFD4AC31; 
	Wed, 31 Aug 2011 13:03:56 +0400 (MSD)
Date: Wed, 31 Aug 2011 13:03:54 +0400
From: Lev Serebryakov <lev@FreeBSD.org>
Organization: FreeBSD
X-Priority: 3 (Normal)
Message-ID: <177519198.20110831130354@serebryakov.spb.ru>
To: Daniel Kalchev <daniel@digsys.bg>
In-Reply-To: <4E5DF560.1050507@digsys.bg>
References: <1945418039.20110830231024@serebryakov.spb.ru>
	<317753422.20110830231815@serebryakov.spb.ru>
	<20110831004251.GA89979@icarus.home.lan>
	<147623060.20110831123623@serebryakov.spb.ru>
	<4E5DF560.1050507@digsys.bg>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1251
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org
Subject: Re: Very inconsistent (read) speed on UFS2
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: lev@FreeBSD.org
X-List-Received-Date: Wed, 31 Aug 2011 09:03:58 -0000

Hello, Daniel.
You wrote on 31 August 2011, 12:48:32:

> On 31.08.11 11:36, Lev Serebryakov wrote:
>> device     r/s   w/s    kr/s    kw/s wait svc_t  %b
>> ada1     340.9 292.9 43138.8   146.5    0   1.2  42
>> ada2     340.9 293.9 43138.8   147.0    0   1.9  63
>> ada3     340.9 292.9 43044.7   146.5    0   1.5  57
>> ada4     341.9 292.9 43232.9   146.5    0   1.3  42
>> ada5     341.9 292.0 43138.8   146.0    2   1.3  40
>>
> Very interesting, these writes. You need to find out what is causing them.
  Yep. I've been very surprised by them.

> Just some random thoughts:

> This flapping may have something to do with the drives' internal caches.
> What are the drives?
  WD20EARS: WD Green 2TB, Advanced Format. Yes, I know that they are
 not the best performers at all when there are seeks. That is why I
 don't expect good performance with random or multi-threaded
 (multi-client) access patterns here.

   And, yes, I know about Advanced Format. The stripe size is 128KiB, and
 the GEOM is built from raw drives, so all stripes are aligned. The FS is
 created on the raw GEOM, again without any partitioning, and the block
 size is 32KiB, so everything should be aligned here too.
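
 The alignment reasoning above comes down to simple arithmetic. A minimal
 sketch, assuming a 4096-byte physical sector (the usual Advanced Format
 size, not stated in this thread); the helper name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed physical sector size of an Advanced Format drive (4 KiB). */
#define PHYS_SECTOR 4096u

/*
 * True when a layout that starts at byte offset `start` on the raw disk
 * and is carved into `unit`-byte pieces (stripes, FS blocks) keeps every
 * piece on a physical-sector boundary.
 */
bool
unit_is_sector_aligned(uint64_t start, uint64_t unit)
{
	assert(unit > 0);
	return (start % PHYS_SECTOR == 0 && unit % PHYS_SECTOR == 0);
}
```

 A 128KiB stripe and a 32KiB block both pass when they start at offset 0
 of the raw device, whereas a layout starting at the traditional sector 63
 (63 * 512 bytes) would not.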

   Really, if all read speeds were, say, 120MiB/s, but consistently so
 every time, I would not have started this thread. In that case I would
 blame the HDDs and my parsimony, not the software :)

> SATA drives, unlike SAS, have half-duplex communication with the host; that
> is, the drive cannot simultaneously read and write data and commands
> (from/to host). There might be some contention in there, perhaps locking?
> It is obviously not contention for bandwidth.
    Yep...

> In any case, you cannot measure read performance as long as it
> intermixes with writes, especially as you noted that your RAID5 code has
> some non-obvious write characteristics/optimizations.
  I understand. Now I need to figure out how to pin down these writes.

-- 
// Black Lion AKA Lev Serebryakov <lev@FreeBSD.org>


From owner-freebsd-fs@FreeBSD.ORG  Wed Aug 31 10:12:14 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 24B311065672
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 10:12:14 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta08.westchester.pa.mail.comcast.net
	(qmta08.westchester.pa.mail.comcast.net [76.96.62.80])
	by mx1.freebsd.org (Postfix) with ESMTP id D849C8FC19
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 10:12:13 +0000 (UTC)
Received: from omta20.westchester.pa.mail.comcast.net ([76.96.62.71])
	by qmta08.westchester.pa.mail.comcast.net with comcast
	id SyCE1h0011YDfWL58yCEkt; Wed, 31 Aug 2011 10:12:14 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta20.westchester.pa.mail.comcast.net with comcast
	id SyCC1h00K1t3BNj3gyCDSB; Wed, 31 Aug 2011 10:12:13 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 4AB70102C36; Wed, 31 Aug 2011 03:12:11 -0700 (PDT)
Date: Wed, 31 Aug 2011 03:12:11 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Lev Serebryakov <lev@FreeBSD.org>
Message-ID: <20110831101211.GA98865@icarus.home.lan>
References: <1945418039.20110830231024@serebryakov.spb.ru>
	<317753422.20110830231815@serebryakov.spb.ru>
	<20110831004251.GA89979@icarus.home.lan>
	<147623060.20110831123623@serebryakov.spb.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <147623060.20110831123623@serebryakov.spb.ru>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: Very inconsistent (read) speed on UFS2
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
X-List-Received-Date: Wed, 31 Aug 2011 10:12:14 -0000

On Wed, Aug 31, 2011 at 12:36:23PM +0400, Lev Serebryakov wrote:
> Hello, Jeremy.
> You wrote on 31 August 2011, 4:42:51:
> 
> > Furthermore, why are these benchmarks not providing speed data
> > per-device (e.g. gstat or iostat -x data)?  There is a possibility that
> > one of your drives could be performing at less-than-ideal rates (yes,
> > intermittently) and therefore impacts (intermittently) your overall I/O
> > throughput.
>   Ok. I've run my benchmark while `iostat -x -d -c 999999' was running.
>   Results are like this:
> 
> device     r/s   w/s    kr/s    kw/s wait svc_t  %b
> ada1     340.9 292.9 43138.8   146.5    0   1.2  42
> ada2     340.9 293.9 43138.8   147.0    0   1.9  63
> ada3     340.9 292.9 43044.7   146.5    0   1.5  57
> ada4     341.9 292.9 43232.9   146.5    0   1.3  42
> ada5     341.9 292.0 43138.8   146.0    2   1.3  40
>
> {snipping text, focusing on data}
>
> device     r/s   w/s    kr/s    kw/s wait svc_t  %b
> ada1     165.3  87.0 10515.9    43.5    2   5.0  50
> ada2     165.3  87.0 10547.2    43.5    2   7.7  61
> ada3     167.2  87.0 10703.7    43.5    1   6.1  55
> ada4     165.3  87.0 10484.6    43.5    3   4.9  44
> ada5     160.4  87.0 10265.5    43.5    5   5.1  48
>
> device     r/s   w/s    kr/s    kw/s wait svc_t  %b
> ada1     884.1 350.9 56583.1   175.4    0   1.0  49
> ada2     886.1 350.9 56677.2   175.4    0   1.3  58
> ada3     882.2 349.9 56489.0   175.0    2   1.7  63
> ada4     885.1 350.9 56614.5   175.4    0   1.4  64
> ada5     887.1 350.9 56739.9   175.4    0   1.5  63
>
> device     r/s   w/s    kr/s    kw/s wait svc_t  %b
> ada1     640.6 261.5 41001.3   130.8    0   0.9  40
> ada2     639.7 261.5 40969.9   130.8    0   0.9  35
> ada3     637.7 262.5 40844.5   131.3    0   1.5  46
> ada4     640.6 260.6 41001.3   130.3    1   1.3  65
> ada5     638.7 261.5 40875.9   130.8    0   1.3  46
>
> device     r/s   w/s    kr/s    kw/s wait svc_t  %b
> ada1     243.7 102.8 15660.2    51.4    2   1.9  36
> ada2     240.8 102.8 15503.6    51.4    3   1.9  43
> ada3     242.7 103.7 15566.2    51.9    0   1.9  30
> ada4     244.7 103.7 15785.5    51.9    2   2.4  56
> ada5     243.7 102.8 15566.2    51.4    2   1.8  30

This benchmark data is more or less unhelpful due to the fact that there
are writes occurring during the middle of your reads.  There's another
spun-off portion of this thread that is discussing how you're
benchmarking these things (specifically some code you wrote?).  I don't
know what else to say in this regard.  It would really help if you could
use something like bonnie++ and make sure the filesystem is not being
used by ANYTHING during your benchmarks.

Anyway, the data is interesting because from an aggregate total
perspective, you're hitting some arbitrary limit on all of your devices
which almost indicates memory bus throttling or something along those
lines; CPU time?  I really don't know.  Aggregate read speeds (the kr/s
column) respectively:

43138.8 + 43138.8 + 43044.7 + 43232.9 + 43138.8 == 215694.0 KByte/sec
10515.9 + 10547.2 + 10703.7 + 10484.6 + 10265.5 ==  52516.9 KByte/sec
56583.1 + 56677.2 + 56489.0 + 56614.5 + 56739.9 == 283103.7 KByte/sec
41001.3 + 40969.9 + 40844.5 + 41001.3 + 40875.9 == 204692.9 KByte/sec
15660.2 + 15503.6 + 15566.2 + 15785.5 + 15566.2 ==  78081.7 KByte/sec

The totals are "all over the place", but what interests me the most is
that the total aggregate never exceeds an amount that's slightly under
300MBytes/sec.  That number has some relevance if, say, you're using a
port multiplier (5 devices aggregated across one SATA300 port).
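
The sums above are just the per-device kr/s column of each iostat sample
added together.  A minimal sketch of that arithmetic, for anyone wanting to
repeat it on their own samples; the function name is hypothetical and the
values are typed in by hand rather than parsed from iostat output:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sum one column (kr/s for reads, kw/s for writes) of a single
 * iostat -x sample across all devices.  The result is in KByte/sec.
 */
double
aggregate_kbytes_per_sec(const double *per_device, size_t ndev)
{
	double total = 0.0;

	assert(per_device != NULL || ndev == 0);
	for (size_t i = 0; i < ndev; i++)
		total += per_device[i];
	return (total);
}
```

Feeding it the five kr/s values of the first sample (ada1 through ada5)
reproduces the first total in the list above.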

Despite these being WD20EARS drives (4 platters, ugh!), these individual
devices should be able to push 75-90MBytes/sec writes, and slightly
higher reads.

Like you, it also interests me that all the drives behave the same;
meaning all speeds are roughly the same on all 5 devices simultaneously,
regardless of speed/rate/throughput.

Here's an idea: can you stop using the filesystem for a bit and instead
do raw dd's from all of the /dev/adaX entries to /dev/null
simultaneously (pick something like bs=64k or bs=256k), then run your
iostats?  I'm basically trying to figure out if the bad speeds are
actually the devices themselves or if it's the geom_raid5 stuff.  You
get where I'm going with this.

If 5 simultaneous dds reading from the drives are very fast (way faster
than the above) and there aren't sporadic drops in performance which
aren't caused by writes (hence my "stop using the filesystem" comment),
then I think we've narrowed down where the issue lies -- not the drives.

> 1) The benchmark induces some writing. atime modification? No, I've turned
>    this off, but it doesn't help. I'm afraid that this read-write
>    interleaving could be the cause of the "problems", but I don't understand
>    WHY there is any writing (1 write per 2 reads on average) when a
>    read-only benchmark runs. It doesn't write any logs, etc. Yes, the
>    write rate is very low, every write transaction is about 2Kb,
>    but WHY are they here?! If I stop the benchmark, there is less than
>    1 write transaction per second.

(Note: I'm going to assume by "Kb" you mean "kilobytes" and not
"kilobits"; B = byte, b = bit.  This is why I got into the habit of just
writing out the unit in full, because too many people try to shorthand
it and pick the wrong one.  And it'll be a cold day in hell before I
ever use "XXbi" (e.g. kibi, mebi, gibi, tebi))

The dd method I describe should absolutely not induce writes, hence my
recommendation.  If writes are seen during the dd's, then either the
filesystem is mounted and FreeBSD is doing something "interesting" on a
filesystem or vfs level, or your system is actually an izbushka.....

Maybe softupdates are somehow responsible?  Not sure.

> 2) without `-x' it shows, that typical read transaction size is
>    about 50Kb. It is very strange, as geom_raid5 shows (I have
>    diagnostics in it), that almost all file access is aligned and is
>    128Kb-sized...

I'm not sure -- please take what I say here with a grain of salt -- but
I believe there was a recent discussion on -stable or -fs about some
sort of 64KByte "limit" within UFS/UFS2 somewhere?  I think I'm thinking
of "MAXBSIZE".  I'm having a lot of difficulty following all these
storage-related threads.  Everyone seems to show up "in bulk" on the
mailing lists all at once and it's overwhelming at times.  I'm getting
old, in more ways than one.
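
One way to look at the per-request size question is to divide the iostat -x
throughput column by the matching transfer-rate column.  A small sketch of
that division; the function name is hypothetical, and this only gives an
average over the sampling interval, not the size of any single request:

```c
#include <assert.h>

/*
 * Average size of one transfer in KBytes, computed from the iostat -x
 * columns kr/s and r/s (or kw/s and w/s).  Returns 0 when no transfers
 * happened during the interval.
 */
double
avg_transfer_kbytes(double kbytes_per_sec, double transfers_per_sec)
{
	assert(kbytes_per_sec >= 0.0 && transfers_per_sec >= 0.0);
	if (transfers_per_sec == 0.0)
		return (0.0);
	return (kbytes_per_sec / transfers_per_sec);
}
```

For ada1 this works out to roughly 126.5 KBytes per read in the first sample
(43138.8 / 340.9) and roughly 64 KBytes per read in the third
(56583.1 / 884.1).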

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Wed Aug 31 11:37:30 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 197D31065672
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 11:37:30 +0000 (UTC)
	(envelope-from lev@serebryakov.spb.ru)
Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru
	[IPv6:2a01:4f8:131:60a2::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 964438FC08
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 11:37:29 +0000 (UTC)
Received: from lion.home.serebryakov.spb.ru (unknown
	[IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c])
	(Authenticated sender: lev@serebryakov.spb.ru)
	by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id 78F4A4AC31; 
	Wed, 31 Aug 2011 15:37:27 +0400 (MSD)
Date: Wed, 31 Aug 2011 15:37:24 +0400
From: Lev Serebryakov <lev@serebryakov.spb.ru>
X-Priority: 3 (Normal)
Message-ID: <981083303.20110831153724@serebryakov.spb.ru>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
In-Reply-To: <20110831101211.GA98865@icarus.home.lan>
References: <1945418039.20110830231024@serebryakov.spb.ru>
	<317753422.20110830231815@serebryakov.spb.ru>
	<20110831004251.GA89979@icarus.home.lan>
	<147623060.20110831123623@serebryakov.spb.ru>
	<20110831101211.GA98865@icarus.home.lan>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1251
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org
Subject: Re: Very inconsistent (read) speed on UFS2
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
X-List-Received-Date: Wed, 31 Aug 2011 11:37:30 -0000

Hello, Jeremy.
You wrote on 31 August 2011, 14:12:11:

> This benchmark data is more or less unhelpful due to the fact that there
> are writes occurring during the middle of your reads.  There's another
  Yep :(

> spun-off portion of this thread that is discussing how you're
> benchmarking these things (specifically some code you wrote?).  I don't
> know what else to say in this regard.  It would really help if you could
> use something like bonnie++ and make sure the filesystem is not being
> used by ANYTHING during your benchmarks.
  I'll try bonnie++, Ok. My code is really as simple as it could be:

fd = open(fileName, O_RDONLY | O_DIRECT);
gettimeofday(&start, NULL);
/* s_BufferSize is 128KiB */
while ((rd = read(fd, s_Buffer, s_BufferSize)) > 0)
   size += rd;
gettimeofday(&end, NULL);
close(fd);
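
For anyone who wants to reproduce the measurement, here is a self-contained
sketch of the same loop. It is not the actual benchmark source: the function
names, the error handling, the optional O_DIRECT switch and the 4096-byte
buffer alignment are all assumptions added for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* exposes O_DIRECT on Linux; harmless on FreeBSD */
#endif
#include <sys/time.h>
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Read `path` sequentially in `bufsize`-byte requests.  On success store
 * the byte count and the elapsed wall-clock seconds and return 0;
 * return -1 on any error.
 */
int
timed_read(const char *path, size_t bufsize, int direct,
    long long *bytes, double *seconds)
{
	struct timeval start, end;
	void *buf;
	ssize_t rd;
	int fd, flags = O_RDONLY;

	assert(path != NULL && bytes != NULL && seconds != NULL);
#ifdef O_DIRECT
	if (direct)
		flags |= O_DIRECT;
#else
	(void)direct;
#endif
	/* O_DIRECT usually wants an aligned buffer; 4096 is an assumption. */
	if (posix_memalign(&buf, 4096, bufsize) != 0)
		return (-1);
	if ((fd = open(path, flags)) == -1) {
		free(buf);
		return (-1);
	}
	*bytes = 0;
	gettimeofday(&start, NULL);
	while ((rd = read(fd, buf, bufsize)) > 0)
		*bytes += rd;
	gettimeofday(&end, NULL);
	close(fd);
	free(buf);
	if (rd < 0)
		return (-1);
	*seconds = (double)(end.tv_sec - start.tv_sec) +
	    (end.tv_usec - start.tv_usec) / 1e6;
	return (0);
}

/* Convert a byte count and a duration into MiB/s (0 for a zero duration). */
double
mib_per_sec(long long bytes, double seconds)
{
	return (seconds > 0.0 ? bytes / (1024.0 * 1024.0) / seconds : 0.0);
}
```

Calling timed_read(fileName, 128 * 1024, 1, &n, &s) and then
mib_per_sec(n, s) corresponds to one run of the loop above.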


> Anyway, the data is interesting because from an aggregate total
> perspective, you're hitting some arbitrary limit on all of your devices
> which almost indicates memory bus throttling or something along those
> lines; CPU time?  I really don't know.  Aggregate read speeds
> respectively:

> 43138.8 + 43138.8 + 43044.7 + 43232.9 + 43138.8 == 215694.0 KByte/sec
> 10515.9 + 10547.2 + 10703.7 + 10484.6 + 10265.5 ==  52516.9 KByte/sec
> 56583.1 + 56677.2 + 56489.0 + 56614.5 + 56739.9 == 283103.7 KByte/sec
> 41001.3 + 40969.9 + 40844.5 + 41001.3 + 40875.9 == 204692.9 KByte/sec
> 15660.2 + 15503.6 + 15566.2 + 15785.5 + 15566.2 ==  78081.7 KByte/sec

> The totals are "all over the place", but what interests me the most is
> that the total aggregate never exceeds an amount that's slightly under
> 300MBytes/sec.  That number has some relevance if, say, you're using a
> port multiplier (5 devices aggregated across one SATA300 port).
  No. All drives are on separate ports of the ICH9R chipset controller.
And, yes, a sustained and constant 300MiB/s is my dream :) Keywords:
sustained and constant.

> Despite these being WD20EARS drives (4 platters, ugh!), these individual
  As far as I understand, 4 platters are slightly better than 3 platters
in linear access, but worse in random access, because the drive reads
more data without head movement.

> devices should be able to push 75-90MBytes/sec writes, and slightly
> higher reads.
   Read speed is about 110MiB/s at the beginning of the drive.

> Here's an idea: can you stop using the filesystem for a bit and instead
> do raw dd's from all of the /dev/adaX entries to /dev/null
> simultaneously (pick something like bs=64k or bs=256k), then run your
> iostats?  I'm basically trying to figure out if the bad speeds are
> actually the devices themselves or if it's the geom_raid5 stuff.  You
> get where I'm going with this.
  Not a problem! FS is unmounted, and after that:

# for d in 1 2 3 4 5 ; do dd if=/dev/ada$d of=/dev/null bs=64k & done
# iostat -c 999999 -dx ada1 ada2 ada3 ada4 ada5
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ada1     1849.1   0.0 118343.7     0.0    1   0.5  93
ada2     1920.3   0.0 122900.2     0.0    0   0.5  94
ada3     1874.5   0.0 119966.6     0.0    1   0.5  94
ada4     1794.5   0.0 114848.4     0.0    1   0.5  94
ada5     1893.0   0.0 121152.5     0.0    1   0.5  93

  This is very typical data: the speed goes slightly up and down on all
 HDDs, without any visibly fastest or slowest drive.

> If 5 simultaneous dds reading from the drives are very fast (way faster
> than the above) and there aren't sporadic drops in performance which
> aren't caused by writes (hence my "stop using the filesystem" comment),
> then I think we've narrowed down where the issue lies -- not the drives.
   Yep. It seems to be exactly like this.

> The dd method I describe should absolutely not induce writes, hence my
> recommendation.  If writes are seen during the dd's, then either the
> filesystem is mounted and FreeBSD is doing something "interesting" on a
> filesystem or vfs level, or your system is actually an izbushka.....

> Maybe softupdates are somehow responsible?  Not sure.
  I have one idea about the geom_raid5 writes... I need to check it.

-- 
// Black Lion AKA Lev Serebryakov <lev@serebryakov.spb.ru>


From owner-freebsd-fs@FreeBSD.ORG  Wed Aug 31 11:47:26 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DB188106564A
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 11:47:26 +0000 (UTC)
	(envelope-from daniel@digsys.bg)
Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230])
	by mx1.freebsd.org (Postfix) with ESMTP id 62B978FC16
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 11:47:26 +0000 (UTC)
Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5])
	(authenticated bits=0)
	by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p7VBlHUT070526
	(version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO)
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 14:47:22 +0300 (EEST)
	(envelope-from daniel@digsys.bg)
Message-ID: <4E5E1F44.8020603@digsys.bg>
Date: Wed, 31 Aug 2011 14:47:16 +0300
From: Daniel Kalchev <daniel@digsys.bg>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
	rv:6.0) Gecko/20110822 Thunderbird/6.0
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
References: <1945418039.20110830231024@serebryakov.spb.ru>
	<317753422.20110830231815@serebryakov.spb.ru>
	<20110831004251.GA89979@icarus.home.lan>
	<147623060.20110831123623@serebryakov.spb.ru>
	<20110831101211.GA98865@icarus.home.lan>
	<981083303.20110831153724@serebryakov.spb.ru>
In-Reply-To: <981083303.20110831153724@serebryakov.spb.ru>
Content-Type: text/plain; charset=windows-1251; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: Very inconsistent (read) speed on UFS2
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
X-List-Received-Date: Wed, 31 Aug 2011 11:47:26 -0000


On 31.08.11 14:37, Lev Serebryakov wrote:
>
>> If 5 simultaneously dds reading from the drives is very fast (way faster
>> than the above) and there aren't sporadic drops in performance which
>> aren't caused by writes (hence my "stop using the filesystem" comment),
>> then I think we've narrowed down where the issue lies -- not the drives.
>     Yep. It seems to be exactly like this.
>
This test does not rule out drive IOPS limits. Or drive cache thrashing.

If you tell the drive to read or write continuously, most of these IOs
are served from/to the drive cache, hence such a large number of IOPS:
more than the drive could handle if it had to move its heads.

Not saying this is the case, but things may be as simple as filling up
the write cache and the drive deciding to flush it out to the platters,
thus reducing the read rate. These are desktop drives, apparently designed
for non-threaded applications. "Raw" read/write speeds may be high, but
higher-performing drives at much higher price points offer much more
performance, even at lower "raw" read/write rates. You are simply paying
more for a smarter controller.

Eliminate the writes and the drives might be worth their salt.

Daniel

From owner-freebsd-fs@FreeBSD.ORG  Wed Aug 31 12:49:38 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6ED3F106566B;
	Wed, 31 Aug 2011 12:49:38 +0000 (UTC) (envelope-from lev@FreeBSD.org)
Received: from onlyone.friendlyhosting.spb.ru (onlyone.friendlyhosting.spb.ru
	[IPv6:2a01:4f8:131:60a2::2])
	by mx1.freebsd.org (Postfix) with ESMTP id 0B9378FC12;
	Wed, 31 Aug 2011 12:49:38 +0000 (UTC)
Received: from lion.home.serebryakov.spb.ru (unknown
	[IPv6:2001:470:923f:1:6407:f3f9:7d93:d34c])
	(Authenticated sender: lev@serebryakov.spb.ru)
	by onlyone.friendlyhosting.spb.ru (Postfix) with ESMTPA id 6E1074AC31; 
	Wed, 31 Aug 2011 16:49:36 +0400 (MSD)
Date: Wed, 31 Aug 2011 16:49:33 +0400
From: Lev Serebryakov <lev@FreeBSD.org>
Organization: FreeBSD
X-Priority: 3 (Normal)
Message-ID: <809344970.20110831164933@serebryakov.spb.ru>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
In-Reply-To: <20110831101211.GA98865@icarus.home.lan>
References: <1945418039.20110830231024@serebryakov.spb.ru>
	<317753422.20110830231815@serebryakov.spb.ru>
	<20110831004251.GA89979@icarus.home.lan>
	<147623060.20110831123623@serebryakov.spb.ru>
	<20110831101211.GA98865@icarus.home.lan>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1251
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org, Lev Serebryakov <lev@FreeBSD.org>
Subject: Re: Very inconsistent (read) speed on UFS2
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: lev@FreeBSD.org
X-List-Received-Date: Wed, 31 Aug 2011 12:49:38 -0000

Hello, Jeremy.
You wrote on 31 August 2011, 14:12:11:

> The dd method I describe should absolutely not induce writes, hence my
> recommendation.  If writes are seen during the dd's, then either the
> filesystem is mounted and FreeBSD is doing something "interesting" on a
> filesystem or vfs level, or your system is actually an izbushka.....
  I've eliminated the writes. They were RAID5 metadata updates caused by
 a very paranoid check for "new metadata". That check doesn't hurt in
 terms of data safety, but it does hurt speed. Ok, now the results are
 MUCH MORE consistent, at about 50% of the theoretical maximum on
 average. Looks good.

SLOWEST (by Average) files:
Name        Min/Avg/Max       StdDev
r007f05.nef 205/230/242 MiB/s     12
r008f06.nef 215/234/254 MiB/s     14
r018f10.nef 218/235/258 MiB/s     13
r013f09.nef 230/243/256 MiB/s      9
r013f11.nef 236/243/249 MiB/s      4
r008f10.nef 238/243/249 MiB/s      3
r015f04.nef 220/244/265 MiB/s     17
r011f04.nef 240/245/256 MiB/s      5
r015f05.nef 221/248/286 MiB/s     24
r008f09.nef 231/250/266 MiB/s     11

MOST UNSTABLE files:
Name        Min/Avg/Max       StdDev
r008f12.nef 291/327/377 MiB/s     38
r021f06.nef 307/382/404 MiB/s     37
r021f02.nef 253/295/346 MiB/s     34
r013f08.nef 264/329/352 MiB/s     33
r012f05.nef 298/354/398 MiB/s     32
r020f05.nef 305/357/388 MiB/s     30
r020f03.nef 292/316/376 MiB/s     30
r022f06.nef 284/319/371 MiB/s     30
r010f12.nef 303/346/377 MiB/s     29
r013f06.nef 285/329/365 MiB/s     29
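
The Min/Avg/Max/StdDev columns above can be derived from the per-run
throughput samples of each file. A minimal sketch follows; the struct and
function names are hypothetical, and the choice of the population standard
deviation (divide by n) is an assumption, since the thread does not say
which variant the benchmark uses. A small Newton iteration stands in for
sqrt() so the example builds without libm; sqrt() from <math.h> would do
equally well.

```c
#include <assert.h>
#include <stddef.h>

struct run_stats {
	double min, avg, max, stddev;
};

/* Square root by Newton's method; returns 0 for non-positive input. */
double
sqrt_newton(double x)
{
	double r = x > 1.0 ? x : 1.0;

	if (x <= 0.0)
		return (0.0);
	for (int i = 0; i < 60; i++)
		r = 0.5 * (r + x / r);
	return (r);
}

/* Summarize n >= 1 throughput samples (e.g. MiB/s of repeated reads). */
struct run_stats
summarize_runs(const double *samples, size_t n)
{
	struct run_stats s;
	double sum = 0.0, sq = 0.0;

	assert(samples != NULL && n > 0);
	s.min = s.max = samples[0];
	for (size_t i = 0; i < n; i++) {
		if (samples[i] < s.min)
			s.min = samples[i];
		if (samples[i] > s.max)
			s.max = samples[i];
		sum += samples[i];
	}
	s.avg = sum / (double)n;
	/* Second pass: squared deviations from the mean. */
	for (size_t i = 0; i < n; i++)
		sq += (samples[i] - s.avg) * (samples[i] - s.avg);
	s.stddev = sqrt_newton(sq / (double)n);
	return (s);
}
```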


-- 
// Black Lion AKA Lev Serebryakov <lev@FreeBSD.org>


From owner-freebsd-fs@FreeBSD.ORG  Wed Aug 31 12:49:57 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E70591065672;
	Wed, 31 Aug 2011 12:49:56 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
	[131.104.91.44])
	by mx1.freebsd.org (Postfix) with ESMTP id 6B39E8FC21;
	Wed, 31 Aug 2011 12:49:55 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Ap8EAKssXk6DaFvO/2dsb2JhbABDFoQ2pHOBQAEBBAEjBFIFFAIOCgICDRkCWQaIBQSnNpILgSyEGIERBJMlkSM
X-IronPort-AV: E=Sophos;i="4.68,307,1312171200"; d="scan'208";a="136074187"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 31 Aug 2011 08:49:55 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 3BFC2B3F27;
	Wed, 31 Aug 2011 08:49:55 -0400 (EDT)
Date: Wed, 31 Aug 2011 08:49:55 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: George Liaskos <geo.liaskos@gmail.com>
Message-ID: <382461010.589453.1314794995233.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <CANcjpOByga-_DPnrm69731q6CvkGV7hHSHRVsnakzkVjzTQOHw@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Originating-IP: [172.17.91.202]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek <pjd@freebsd.org>
Subject: Re: NFSv4: After upgrade to 9 users can no longer list files.
 (sounds like a ZFS issue?)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
X-List-Received-Date: Wed, 31 Aug 2011 12:49:57 -0000

George Liaskos wrote:
> > You could try this patch and see what effect it has (applied to the
> > server). It just disables the access check for readdir.
> > --- nfs_nfsdport.c.sav2 2011-08-30 10:35:58.000000000 -0400
> > +++ nfs_nfsdport.c 2011-08-30 10:36:54.000000000 -0400
> > @@ -1838,10 +1838,12 @@ nfsrvd_readdirplus(struct nfsrv_descript
> >                nd->nd_repstat = NFSERR_NOTDIR;
> >        if (!nd->nd_repstat && cnt == 0)
> >                nd->nd_repstat = NFSERR_TOOSMALL;
> > +#ifdef notnow
> >        if (!nd->nd_repstat)
> >                nd->nd_repstat = nfsvno_accchk(vp, VEXEC,
> >                    nd->nd_cred, exp, p, NFSACCCHK_NOOVERRIDE,
> >                    NFSACCCHK_VPISLOCKED, NULL);
> > +#endif
> >        if (nd->nd_repstat) {
> >                vput(vp);
> >                if (nd->nd_flag & ND_NFSV3)
> >
> > This wouldn't be suitable for a production system, but whether or
> > not it "fixes" the problem would give us an indication of where the
> > problem is.
> >
> > Also, if you could clarify when your 8/stable was downloaded,
> > whether
> > your 9.0 upgrade was to vanilla Beta1 or ??? and details w.r.t. your
> > ZFS setup, that might help.
>
> I use svn; unfortunately I don't remember exactly when I moved from
> 8.2 to stable. I synced with CURRENT last week and this issue
> appeared. I did a second update to beta 2 [r225237] with the same
> results.
>
> The patch didn't make any difference. I downloaded an ISO with BETA-1
> and made a VM installation; I was not able to reproduce this.
>
> I updated one of the clients to r225237, set up some nfs exports on top
> of ZFS, and ls does not work for non-root users. I created a new pool
> on top of a memory fs to test this.
>
> Next, I "downgraded" the server to BETA-1 [r224413] and everything is
> back to normal.
Ok, so it sounds like a post-Beta1 server issue. Did I get that correct?

> So there's a bug which was introduced somewhere between BETA-1 &&
> BETA-2 :p
>
Well, I can't imagine why this would matter, but you can try this patch,
which fixes a problem introduced by r224810 where Lookup ".." no longer
works. (It's at http://people.freebsd.org/~rmacklem/dotdot.patch, in case
the white space gets munged.)
Index: fs/nfsserver/nfs_nfsdport.c
===================================================================
--- fs/nfsserver/nfs_nfsdport.c	(revision 225270)
+++ fs/nfsserver/nfs_nfsdport.c	(working copy)
@@ -282,6 +282,7 @@ nfsvno_namei(struct nfsrv_descript *nd, struct nam
 
 	*retdirp = NULL;
 	cnp->cn_nameptr = cnp->cn_pnbuf;
+	ndp->ni_strictrelative = 0;
 	/*
 	 * Extract and set starting directory.
 	 */
Index: nfsserver/nfs_serv.c
===================================================================
--- nfsserver/nfs_serv.c	(revision 225270)
+++ nfsserver/nfs_serv.c	(working copy)
@@ -157,6 +157,7 @@ ndclear(struct nameidata *nd)
 	nd->ni_vp = NULL;
 	nd->ni_dvp = NULL;
 	nd->ni_startdir = NULL;
+	nd->ni_strictrelative = 0;
 }
 
 /*

rick
> Thank you for your help!
>
> Regards,
> George

From owner-freebsd-fs@FreeBSD.ORG  Wed Aug 31 14:57:02 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A91B8106566C
	for <fs@FreeBSD.org>; Wed, 31 Aug 2011 14:57:02 +0000 (UTC)
	(envelope-from egrosbein@rdtc.ru)
Received: from eg.sd.rdtc.ru (unknown [IPv6:2a03:3100:c:13::5])
	by mx1.freebsd.org (Postfix) with ESMTP id F14B28FC13
	for <fs@FreeBSD.org>; Wed, 31 Aug 2011 14:57:01 +0000 (UTC)
Received: from eg.sd.rdtc.ru (localhost [127.0.0.1])
	by eg.sd.rdtc.ru (8.14.5/8.14.5) with ESMTP id p7VEv0Rs045837
	for <fs@FreeBSD.org>; Wed, 31 Aug 2011 21:57:00 +0700 (NOVST)
	(envelope-from egrosbein@rdtc.ru)
Message-ID: <4E5E4BB7.1030307@rdtc.ru>
Date: Wed, 31 Aug 2011 21:56:55 +0700
From: Eugene Grosbein <egrosbein@rdtc.ru>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; ru-RU;
	rv:1.9.2.13) Gecko/20110112 Thunderbird/3.1.7
MIME-Version: 1.0
To: fs@FreeBSD.org
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Cc: 
Subject: Unfixable UFS2 corruption
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 31 Aug 2011 14:57:02 -0000

Hi!

Please CC: me as I'm not in the list.

Long story short: my /usr/local UFS2 filesystem somehow got corrupted
and "fsck -y" in single user mode does not fix it.

Explanation:

# ls -al /usr/local/obj/usr/local/src/secure/lib/libssh
ls: : No such file or directory
total 8
drwxr-xr-x  2 root  wheel  4608 Aug 30 01:28 .
drwxr-xr-x  3 root  wheel   512 Aug 30 01:28 ..

# rm -rf /usr/local/obj/usr/local/src/secure/lib/libssh
rm: /usr/local/obj/usr/local/src/secure/lib/libssh: Directory not empty

As I've said, I cold booted this FreeBSD 8.2-STABLE system to single user mode,
where no file systems are mounted (except root), and ran fsck -y /usr/local.
It found no errors and said the file system is CLEAN. The problem still persists.

I've written a small program, and it tells me this directory contains a third file
(besides the <.> and <..> entries) having zero file length.

I dumped the contents of the directory to a plain file with
"cat /usr/local/obj/usr/local/src/secure/lib/libssh > /tmp/libssh" and put it online:
http://www.grosbein.net/crash/corruption/libssh

Please help. The program and its output follow:

#include <sys/types.h>
#include <dirent.h>
#include <err.h>
#include <stdio.h>

int main(int argc, char* argv[])
{

  DIR		*dirp;
  struct dirent *dp;
  unsigned	i;

  if (argc<2)
	return 1;

  if ( (dirp = opendir(argv[1])) == NULL )
	err (1, "opendir");

  i = 0;
  while ((dp = readdir(dirp)) != NULL) {
    i++;
    printf("Entry %u:\n"
	   "d_fileno=%u\n"
           "d_reclen=%u\n"
	   "d_type=%u\n"
	   "d_namlen=%u\n"
	   "d_name=<%s>\n\n",
	   i, (unsigned) dp->d_fileno, (unsigned) dp->d_reclen,
	   (unsigned) dp->d_type, (unsigned) dp->d_namlen,
	   (char *) dp->d_name);
  }
  return closedir(dirp);
}

# ./readdir /usr/local/obj/usr/local/src/secure/lib/libssh
Entry 1:
d_fileno=1531227
d_reclen=12
d_type=4
d_namlen=1
d_name=<.>

Entry 2:
d_fileno=1389650
d_reclen=500
d_type=4
d_namlen=2
d_name=<..>

Entry 3:
d_fileno=24
d_reclen=512
d_type=8
d_namlen=0
d_name=<>
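
To look for the same symptom in other directories, a minimal sketch along these lines could count entries whose name is empty. It is not part of the program above: the function name count_empty_names is made up for illustration, and strlen(d_name) is used in place of d_namlen on the assumption that the code may also be built on systems without that BSD field.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/*
 * Count directory entries whose name is empty.
 * Returns the count, or -1 if the directory cannot be opened.
 * (Hypothetical helper, for illustration only.)
 */
int count_empty_names(const char *path)
{
	DIR *dirp;
	struct dirent *dp;
	int empty = 0;

	assert(path != NULL);
	if ((dirp = opendir(path)) == NULL)
		return -1;
	while ((dp = readdir(dirp)) != NULL) {
		/* A healthy entry always has at least one character. */
		if (strlen(dp->d_name) == 0)
			empty++;
	}
	closedir(dirp);
	return empty;
}
```

On a healthy directory the function should return 0; a nonzero result points at an entry like "Entry 3" above.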

From owner-freebsd-fs@FreeBSD.ORG  Wed Aug 31 15:21:02 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0BCB6106564A;
	Wed, 31 Aug 2011 15:21:02 +0000 (UTC)
	(envelope-from egrosbein@rdtc.ru)
Received: from eg.sd.rdtc.ru (unknown [IPv6:2a03:3100:c:13::5])
	by mx1.freebsd.org (Postfix) with ESMTP id 6E3C98FC08;
	Wed, 31 Aug 2011 15:21:01 +0000 (UTC)
Received: from eg.sd.rdtc.ru (localhost [127.0.0.1])
	by eg.sd.rdtc.ru (8.14.5/8.14.5) with ESMTP id p7VFL0Fq045895;
	Wed, 31 Aug 2011 22:21:00 +0700 (NOVST)
	(envelope-from egrosbein@rdtc.ru)
Message-ID: <4E5E5157.7050706@rdtc.ru>
Date: Wed, 31 Aug 2011 22:20:55 +0700
From: Eugene Grosbein <egrosbein@rdtc.ru>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; ru-RU;
	rv:1.9.2.13) Gecko/20110112 Thunderbird/3.1.7
MIME-Version: 1.0
To: FreeBSD Stable <freebsd-stable@freebsd.org>, fs@freebsd.org
References: <4E5E46B1.4070408@rdtc.ru>
In-Reply-To: <4E5E46B1.4070408@rdtc.ru>
Content-Type: text/plain; charset=KOI8-R
Content-Transfer-Encoding: 8bit
Cc: 
Subject: Re: Unfixable UFS2 corruption
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 31 Aug 2011 15:21:02 -0000

31.08.2011 21:35, Eugene Grosbein wrote:

> # ls -al /usr/local/obj/usr/local/src/secure/lib/libssh
> ls: : No such file or directory
> total 8
> drwxr-xr-x  2 root  wheel  4608 Aug 30 01:28 .
> drwxr-xr-x  3 root  wheel   512 Aug 30 01:28 ..
> 
> # rm -rf /usr/local/obj/usr/local/src/secure/lib/libssh
> rm: /usr/local/obj/usr/local/src/secure/lib/libssh: Directory not empty
> 
> As I've said, I cold booted this FreeBSD 8.2-STABLE system to single user mode
> where all file systems are not mounted (except root) and ran fsck -y /usr/local
> It found no errors and said it is CLEAN. The problem still persists.
> 
> I've written small program and it said me this directory contains third file
> (besides <.> and <..> entries) having zero file length.

Correction: it is not the file length but the file name length that is zero.
I've just found that the dircheck() function in src/sbin/fsck_ffs/dir.c simply
does not check whether d_namlen is zero. It should, shouldn't it?

Eugene Grosbein

From owner-freebsd-fs@FreeBSD.ORG  Wed Aug 31 16:09:17 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3DEB91065672
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 16:09:17 +0000 (UTC)
	(envelope-from giffunip@tutopia.com)
Received: from nm29-vm0.bullet.mail.sp2.yahoo.com
	(nm29-vm0.bullet.mail.sp2.yahoo.com [98.139.91.236])
	by mx1.freebsd.org (Postfix) with SMTP id 18B938FC14
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 16:09:17 +0000 (UTC)
Received: from [98.139.91.61] by nm29.bullet.mail.sp2.yahoo.com with NNFMP;
	31 Aug 2011 15:56:41 -0000
Received: from [98.139.91.21] by tm1.bullet.mail.sp2.yahoo.com with NNFMP;
	31 Aug 2011 15:56:41 -0000
Received: from [127.0.0.1] by omp1021.mail.sp2.yahoo.com with NNFMP;
	31 Aug 2011 15:56:41 -0000
X-Yahoo-Newman-Property: ymail-3
X-Yahoo-Newman-Id: 410340.39070.bm@omp1021.mail.sp2.yahoo.com
Received: (qmail 15207 invoked by uid 60001); 31 Aug 2011 15:56:40 -0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024;
	t=1314806200; bh=fJYezCtiQWl6cGA+BETvdebccU6dyRIMVJRbdFL+cpk=;
	h=X-YMail-OSG:Received:X-RocketYMMF:X-Mailer:Message-ID:Date:From:Reply-To:Subject:To:MIME-Version:Content-Type;
	b=esQL7I59vm+TtB2Ie3BOWkY1JcBmzpsXOzMosHuAmGdLZDupTr353AQnr0q8g91yoXfjoxMuxy1mpDH0RvDSoxYkG8mH/vXa8mx+jm8zqL+gCC2T1YTL6jagdtHv5GCnrXE89rM5mwO27tez/OtaTWZNkWeI9kEe6UCuXnCRx1Y=
X-YMail-OSG: p3BmoNQVM1kzsYp2jIJjzbiD4A4xCcIDVOAFZr9daKiupEy
	lEmAvyvNB_vXel2Isi2xncYvst7X3yXhzkEsAdD9pFryQIyDfjPEiQam2Da6
	sJnCZ.bV5Rfkm20Si1SDTQ.xxoq681.UMMHBHLWgLvw5L5QNpwockUUBIj3g
	Z6rehx4Xci3qlIHZn1K.d_MhGOD_hLGidc50GmeM1QTJHGy1GLaYwA4ustPn
	Nh5Is81YIax4w.H3LIkei..fhQqb5SGKnzN4nU0lsXFYePPvmGcBADSVhgE4
	MGWByTUdIxUM0BhMZR3MBh9Fs2dmlWGvIAxjJd4C06ivFDZyHN_MX8HCLXqR
	P4r7WwIAljwcvHNWujOx3rnekes8QXb4YxEB7d3LsMUe1gD.WqtuJ5PrIiE7
	Tw5BHyWYVeYlKLnzA26AXHr.X0jIjZb9tD_AJbKdwbeD7mmtNK7NR3xZu7i4
	FhPPX8V._65JJW30fJOvEY39nNM3_WZP87HV6mCo9BRG0mfXLB7QNsivS2yO
	3Fk9Owxv9tD9xtaEz.reoQxgx5oM45eF0Q812F41cNh6i4OEPv70frhFJDX4
	kHFNtoTObEe7ia0P6tyeF3IZeiQtsVWCAihJkRKS4PSPGTA--
Received: from [200.118.157.7] by web113507.mail.gq1.yahoo.com via HTTP;
	Wed, 31 Aug 2011 08:56:40 PDT
X-RocketYMMF: giffunip
X-Mailer: YahooMailClassic/14.0.5 YahooMailWebService/0.8.113.315625
Message-ID: <1314806200.14687.YahooMailClassic@web113507.mail.gq1.yahoo.com>
Date: Wed, 31 Aug 2011 08:56:40 -0700 (PDT)
From: "Pedro F. Giffuni" <giffunip@tutopia.com>
To: freebsd-fs@FreeBSD.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: 
Subject: SEEK_DATA/SEEK_HOLE on UFS/EXT2FS
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: giffunip@tutopia.com
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 31 Aug 2011 16:09:17 -0000

Hi;

Just FYI, after reconsidering their position wrt NIH, the
linux guys now think SEEK_DATA/SEEK_HOLE is wonderful:

http://lwn.net/Articles/440255/

and NetBSD is known to be working on it too (latest patch):

http://mail-index.netbsd.org/tech-kern/2011/08/17/msg011231.html

I hope our own developers haven't forgotten that this
is indeed a desired feature and that we get it for 10.0
or, if possible, 9.1.
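
For readers unfamiliar with the interface, here is a minimal sketch of how an application might walk a sparse file with it. The function name count_data_extents is made up for illustration, and the code assumes a platform whose lseek() accepts SEEK_DATA and SEEK_HOLE; where the constants are missing it simply returns -1.

```c
#define _GNU_SOURCE	/* exposes SEEK_DATA/SEEK_HOLE on glibc */
#include <assert.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Count the data extents of a file by alternating SEEK_DATA and
 * SEEK_HOLE. Returns the number of extents, or -1 if the file cannot
 * be opened or the platform lacks SEEK_DATA/SEEK_HOLE.
 * (Hypothetical helper, for illustration only.)
 */
int count_data_extents(const char *path)
{
#if defined(SEEK_DATA) && defined(SEEK_HOLE)
	off_t data, hole = 0;
	int fd, extents = 0;

	assert(path != NULL);
	if ((fd = open(path, O_RDONLY)) == -1)
		return -1;
	/* lseek() fails once there is no more data past the offset. */
	while ((data = lseek(fd, hole, SEEK_DATA)) != -1) {
		/* Find where this run of data ends. */
		if ((hole = lseek(fd, data, SEEK_HOLE)) == -1)
			break;
		extents++;
	}
	close(fd);
	return extents;
#else
	(void)path;
	return -1;
#endif
}
```

A backup or copy tool could use the same loop to read only the data ranges and skip the holes.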

cheers,

Pedro.

From owner-freebsd-fs@FreeBSD.ORG  Wed Aug 31 16:13:37 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 065D4106564A;
	Wed, 31 Aug 2011 16:13:37 +0000 (UTC)
	(envelope-from egrosbein@rdtc.ru)
Received: from eg.sd.rdtc.ru (unknown [IPv6:2a03:3100:c:13::5])
	by mx1.freebsd.org (Postfix) with ESMTP id 666268FC16;
	Wed, 31 Aug 2011 16:13:36 +0000 (UTC)
Received: from eg.sd.rdtc.ru (localhost [127.0.0.1])
	by eg.sd.rdtc.ru (8.14.5/8.14.5) with ESMTP id p7VGDWKZ046077;
	Wed, 31 Aug 2011 23:13:32 +0700 (NOVST)
	(envelope-from egrosbein@rdtc.ru)
Message-ID: <4E5E5DA7.1010802@rdtc.ru>
Date: Wed, 31 Aug 2011 23:13:27 +0700
From: Eugene Grosbein <egrosbein@rdtc.ru>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; ru-RU;
	rv:1.9.2.13) Gecko/20110112 Thunderbird/3.1.7
MIME-Version: 1.0
To: Adam Vande More <amvandemore@gmail.com>
References: <4E5E46B1.4070408@rdtc.ru>
	<CA+tpaK33zRqnzXG23f9ODN2QFFa1o-zr5_jw4Kj+kknGj5Wb7w@mail.gmail.com>
In-Reply-To: <CA+tpaK33zRqnzXG23f9ODN2QFFa1o-zr5_jw4Kj+kknGj5Wb7w@mail.gmail.com>
Content-Type: text/plain; charset=KOI8-R
Content-Transfer-Encoding: 8bit
Cc: stable@FreeBSD.org, fs@FreeBSD.org
Subject: Re: Unfixable UFS2 corruption
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 31 Aug 2011 16:13:37 -0000

31.08.2011 23:02, Adam Vande More wrote:

>     Long story short: my /usr/local UFS2 filesystem somehow got corrupted
>     and "fsck -y" in single user mode does not fix it.
> 
> Not sure if this helps or not but on rare occasion I've had to run fsck twice consecutively to fix a FS.

Not this time - fsck does NOT find any problems in this file system.

Now I think fsck_ffs needs a patch:

--- sbin/fsck_ffs/dir.c.orig	2011-08-31 22:54:23.000000000 +0700
+++ sbin/fsck_ffs/dir.c	2011-08-31 22:54:48.000000000 +0700
@@ -225,7 +225,7 @@
 	type = dp->d_type;
 	if (dp->d_reclen < size ||
 	    idesc->id_filesize < size ||
-	    namlen > MAXNAMLEN ||
+	    namlen == 0 || namlen > MAXNAMLEN ||
 	    type > 15)
 		goto bad;
 	for (cp = dp->d_name, size = 0; size < namlen; size++)
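
The effect of the added condition can be tried in isolation with a minimal sketch like the one below. It is not the real dircheck(): struct sketch_dirent, SKETCH_MAXNAMLEN and sketch_entry_ok are hypothetical stand-ins, and only the comparisons visible in the hunk above are reproduced.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAXNAMLEN 255	/* assumed limit, standing in for MAXNAMLEN */

/* Hypothetical, simplified stand-in for an on-disk directory entry. */
struct sketch_dirent {
	unsigned reclen;	/* record length */
	unsigned type;		/* entry type */
	unsigned namlen;	/* name length */
};

/*
 * Return 1 if the entry passes the sanity checks, 0 otherwise.
 * min_size is the smallest record that could hold the entry;
 * filesize is the number of bytes left in the directory.
 */
int sketch_entry_ok(const struct sketch_dirent *dp, unsigned min_size,
    unsigned filesize)
{
	assert(dp != NULL);
	if (dp->reclen < min_size ||
	    filesize < min_size ||
	    dp->namlen == 0 ||		/* the check the patch adds */
	    dp->namlen > SKETCH_MAXNAMLEN ||
	    dp->type > 15)
		return 0;
	return 1;
}
```

With values like those of "Entry 3" (reclen 512, type 8, namlen 0) the sketch rejects the entry, while an entry like <.> (reclen 12, type 4, namlen 1) is accepted.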


Comments?

Eugene Grosbein

From owner-freebsd-fs@FreeBSD.ORG  Wed Aug 31 16:24:11 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4A4A11065670;
	Wed, 31 Aug 2011 16:24:11 +0000 (UTC)
	(envelope-from egrosbein@rdtc.ru)
Received: from eg.sd.rdtc.ru (unknown [IPv6:2a03:3100:c:13::5])
	by mx1.freebsd.org (Postfix) with ESMTP id 920448FC1A;
	Wed, 31 Aug 2011 16:24:10 +0000 (UTC)
Received: from eg.sd.rdtc.ru (localhost [127.0.0.1])
	by eg.sd.rdtc.ru (8.14.5/8.14.5) with ESMTP id p7VGO9gM046130;
	Wed, 31 Aug 2011 23:24:09 +0700 (NOVST)
	(envelope-from egrosbein@rdtc.ru)
Message-ID: <4E5E6024.3030708@rdtc.ru>
Date: Wed, 31 Aug 2011 23:24:04 +0700
From: Eugene Grosbein <egrosbein@rdtc.ru>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; ru-RU;
	rv:1.9.2.13) Gecko/20110112 Thunderbird/3.1.7
MIME-Version: 1.0
References: <4E5E46B1.4070408@rdtc.ru>	<CA+tpaK33zRqnzXG23f9ODN2QFFa1o-zr5_jw4Kj+kknGj5Wb7w@mail.gmail.com>
	<4E5E5DA7.1010802@rdtc.ru>
In-Reply-To: <4E5E5DA7.1010802@rdtc.ru>
Content-Type: text/plain; charset=KOI8-R
Content-Transfer-Encoding: 8bit
Cc: stable@freebsd.org, fs@freebsd.org
Subject: Re: Unfixable UFS2 corruption
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 31 Aug 2011 16:24:11 -0000

31.08.2011 23:13, Eugene Grosbein wrote:
> 31.08.2011 23:02, Adam Vande More wrote:
> 
>>     Long story short: my /usr/local UFS2 filesystem somehow got corrupted
>>     and "fsck -y" in single user mode does not fix it.
>>
>> Not sure if this helps or not but on rare occasion I've had to run fsck twice consecutively to fix a FS.
> 
> Not this time - fsck does NOT find any problems in this file system.
> 
> Now I think fsck_ffs needs a patch:
> 
> --- sbin/fsck_ffs/dir.c.orig	2011-08-31 22:54:23.000000000 +0700
> +++ sbin/fsck_ffs/dir.c	2011-08-31 22:54:48.000000000 +0700
> @@ -225,7 +225,7 @@
>  	type = dp->d_type;
>  	if (dp->d_reclen < size ||
>  	    idesc->id_filesize < size ||
> -	    namlen > MAXNAMLEN ||
> +	    namlen == 0 || namlen > MAXNAMLEN ||
>  	    type > 15)
>  		goto bad;
>  	for (cp = dp->d_name, size = 0; size < namlen; size++)
> 
> 
> Comments?

With this patch applied, my FS has finally been fixed by fsck:

** Last Mounted on /usr/local
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
DIRECTORY CORRUPTED  I=1531227  OWNER=root MODE=40755
SIZE=4608 MTIME=Aug 30 01:28 2011 
DIR=/obj/usr/local/src/secure/lib/libssh

SALVAGE? [yn] 

** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
LINK COUNT FILE I=24  OWNER=root MODE=100644
SIZE=892 MTIME=Sep 17 11:10 2010  COUNT 2 SHOULD BE 1
ADJUST? [yn] 

** Phase 5 - Check Cyl groups
459580 files, 7411823 used, 7819495 free (105503 frags, 964249 blocks, 0.7% fragmentation)

***** FILE SYSTEM IS CLEAN *****

***** FILE SYSTEM WAS MODIFIED *****

Should I file a PR?

Eugene Grosbein

From owner-freebsd-fs@FreeBSD.ORG  Wed Aug 31 16:53:16 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3311B106564A;
	Wed, 31 Aug 2011 16:53:16 +0000 (UTC)
	(envelope-from egrosbein@rdtc.ru)
Received: from eg.sd.rdtc.ru (unknown [IPv6:2a03:3100:c:13::5])
	by mx1.freebsd.org (Postfix) with ESMTP id 78FE78FC13;
	Wed, 31 Aug 2011 16:53:15 +0000 (UTC)
Received: from eg.sd.rdtc.ru (localhost [127.0.0.1])
	by eg.sd.rdtc.ru (8.14.5/8.14.5) with ESMTP id p7VGrEU3046226;
	Wed, 31 Aug 2011 23:53:14 +0700 (NOVST)
	(envelope-from egrosbein@rdtc.ru)
Message-ID: <4E5E66F5.6090401@rdtc.ru>
Date: Wed, 31 Aug 2011 23:53:09 +0700
From: Eugene Grosbein <egrosbein@rdtc.ru>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; ru-RU;
	rv:1.9.2.13) Gecko/20110112 Thunderbird/3.1.7
MIME-Version: 1.0
To: Adrian Chadd <adrian@freebsd.org>
References: <4E5E46B1.4070408@rdtc.ru>	<CA+tpaK33zRqnzXG23f9ODN2QFFa1o-zr5_jw4Kj+kknGj5Wb7w@mail.gmail.com>	<4E5E5DA7.1010802@rdtc.ru>
	<CAJ-Vmon8e7ankSoji-YJDUbO+weSdZc4Y5uht-fvROAhit_QOA@mail.gmail.com>
In-Reply-To: <CAJ-Vmon8e7ankSoji-YJDUbO+weSdZc4Y5uht-fvROAhit_QOA@mail.gmail.com>
Content-Type: text/plain; charset=KOI8-R
Content-Transfer-Encoding: 8bit
Cc: stable@freebsd.org, fs@freebsd.org
Subject: Re: Unfixable UFS2 corruption
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 31 Aug 2011 16:53:16 -0000

31.08.2011 23:34, Adrian Chadd wrote:
> Have you created a PR for this?

http://www.freebsd.org/cgi/query-pr.cgi?pr=160339

Eugene Grosbein

From owner-freebsd-fs@FreeBSD.ORG  Wed Aug 31 19:57:25 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A411E1065675;
	Wed, 31 Aug 2011 19:57:25 +0000 (UTC)
	(envelope-from geo.liaskos@gmail.com)
Received: from mail-qy0-f182.google.com (mail-qy0-f182.google.com
	[209.85.216.182])
	by mx1.freebsd.org (Postfix) with ESMTP id F34608FC15;
	Wed, 31 Aug 2011 19:57:24 +0000 (UTC)
Received: by qyk9 with SMTP id 9so848296qyk.13
	for <multiple recipients>; Wed, 31 Aug 2011 12:57:24 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:content-transfer-encoding;
	bh=I4ceq4uVTQCxf9i2HBL1+ebwXawXGw2W/6KELVylrsI=;
	b=KYqg9gD3+ZJsPA7N751QnAgBH4K8fbJOnYgwou8rMDwbCPSMaNpXy0TER8d0T4uUde
	gpoy+IJ/Hd3s+ABz1SF9GDwYnlrP8OfiR5zBFQnr/AGf2V+GAH7YjNaF9Hm1hXsXISee
	ZBHEBw/A2tOzFjK7nZ/qpI/dY+7hKlAtWhYBg=
MIME-Version: 1.0
Received: by 10.229.89.66 with SMTP id d2mr672833qcm.93.1314820643950; Wed, 31
	Aug 2011 12:57:23 -0700 (PDT)
Received: by 10.229.89.138 with HTTP; Wed, 31 Aug 2011 12:57:23 -0700 (PDT)
In-Reply-To: <382461010.589453.1314794995233.JavaMail.root@erie.cs.uoguelph.ca>
References: <CANcjpOByga-_DPnrm69731q6CvkGV7hHSHRVsnakzkVjzTQOHw@mail.gmail.com>
	<382461010.589453.1314794995233.JavaMail.root@erie.cs.uoguelph.ca>
Date: Wed, 31 Aug 2011 22:57:23 +0300
Message-ID: <CANcjpOB3ri-zN+1LEka-VRk+9vuV27eTX=VSSVpSU0VQnZEA2A@mail.gmail.com>
From: George Liaskos <geo.liaskos@gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek <pjd@freebsd.org>
Subject: Re: NFSv4: After upgrade to 9 users can no longer list files.
 (sounds like a ZFS issue?)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 31 Aug 2011 19:57:25 -0000

On Wed, Aug 31, 2011 at 3:49 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> Well, I can't imagine why this would matter, but you can try this patch,
> which fixes a problem introduced by r224810 where Lookup ".." no longer
> works. (It's at http://people.freebsd.org/~rmacklem/dotdot.patch, in case
> the white space gets munged.)
> Index: fs/nfsserver/nfs_nfsdport.c
> ===================================================================
> --- fs/nfsserver/nfs_nfsdport.c (revision 225270)
> +++ fs/nfsserver/nfs_nfsdport.c (working copy)
> @@ -282,6 +282,7 @@ nfsvno_namei(struct nfsrv_descript *nd, struct nam
>
>         *retdirp = NULL;
>         cnp->cn_nameptr = cnp->cn_pnbuf;
> +       ndp->ni_strictrelative = 0;
>         /*
>          * Extract and set starting directory.
>          */
> Index: nfsserver/nfs_serv.c
> ===================================================================
> --- nfsserver/nfs_serv.c        (revision 225270)
> +++ nfsserver/nfs_serv.c        (working copy)
> @@ -157,6 +157,7 @@ ndclear(struct nameidata *nd)
>         nd->ni_vp = NULL;
>         nd->ni_dvp = NULL;
>         nd->ni_startdir = NULL;
> +       nd->ni_strictrelative = 0;
>  }
>
>  /*
>
> rick

This patch works for me. :)

Regards,
George

From owner-freebsd-fs@FreeBSD.ORG  Wed Aug 31 22:08:46 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 68D6B106566B;
	Wed, 31 Aug 2011 22:08:46 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca
	[131.104.91.36])
	by mx1.freebsd.org (Postfix) with ESMTP id E7E078FC0A;
	Wed, 31 Aug 2011 22:08:45 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Ap8EAOivXk6DaFvO/2dsb2JhbABCFoQ2pHmBQAEBBAEjBFIFFAIOCgICDRkCWQYTh3IEqQOSF4EshBiBEQSTJZEl
X-IronPort-AV: E=Sophos;i="4.68,309,1312171200"; d="scan'208";a="132870357"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-annu-pri.mail.uoguelph.ca with ESMTP; 31 Aug 2011 18:08:44 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id C43D5B3F80;
	Wed, 31 Aug 2011 18:08:44 -0400 (EDT)
Date: Wed, 31 Aug 2011 18:08:44 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: George Liaskos <geo.liaskos@gmail.com>
Message-ID: <1463532532.632117.1314828524787.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <CANcjpOB3ri-zN+1LEka-VRk+9vuV27eTX=VSSVpSU0VQnZEA2A@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Originating-IP: [172.17.91.203]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: freebsd-fs@freebsd.org, Pawel Jakub Dawidek <pjd@freebsd.org>
Subject: Re: NFSv4: After upgrade to 9 users can no longer list files.
 (sounds like a ZFS issue?)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 31 Aug 2011 22:08:46 -0000

George Liaskos wrote:
> On Wed, Aug 31, 2011 at 3:49 PM, Rick Macklem <rmacklem@uoguelph.ca>
> wrote:
> > Well, I can't imagine why this would matter, but you can try this
> > patch,
> > which fixes a problem introduced by r224810 where Lookup ".." no
> > longer
> > works. (It's at http://people.freebsd.org/~rmacklem/dotdot.patch, in
> > case
> > the white space gets munged.)
> > Index: fs/nfsserver/nfs_nfsdport.c
> > ===================================================================
> > --- fs/nfsserver/nfs_nfsdport.c (revision 225270)
> > +++ fs/nfsserver/nfs_nfsdport.c (working copy)
> > @@ -282,6 +282,7 @@ nfsvno_namei(struct nfsrv_descript *nd, struct nam
> >
> >         *retdirp = NULL;
> >         cnp->cn_nameptr = cnp->cn_pnbuf;
> > +       ndp->ni_strictrelative = 0;
> >         /*
> >          * Extract and set starting directory.
> >          */
> > Index: nfsserver/nfs_serv.c
> > ===================================================================
> > --- nfsserver/nfs_serv.c (revision 225270)
> > +++ nfsserver/nfs_serv.c (working copy)
> > @@ -157,6 +157,7 @@ ndclear(struct nameidata *nd)
> >         nd->ni_vp = NULL;
> >         nd->ni_dvp = NULL;
> >         nd->ni_startdir = NULL;
> > +       nd->ni_strictrelative = 0;
> >  }
> >
> >  /*
> >
> > rick
>
> This patch works for me. :)
>
Ah, good. (I can't think of why root vs. non-root would have mattered, but
it fixed the problem. Maybe it's just a side effect: without being
initialized, the field would be whatever happened to be on the stack.)
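
The general pattern behind the fix can be shown with a minimal sketch. It is not the kernel code: struct sketch_lookup and sketch_lookup_clear are hypothetical stand-ins for the lookup structure and its clearing routine, kept only to show that every field read later gets an explicit value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a lookup-state structure. */
struct sketch_lookup {
	void *vp;
	void *dvp;
	void *startdir;
	int strictrelative;
};

/*
 * Give every field a known value. A structure declared on the stack
 * holds indeterminate contents until it is initialized, so a field
 * skipped here could differ from one call to the next.
 */
void sketch_lookup_clear(struct sketch_lookup *nd)
{
	assert(nd != NULL);
	nd->vp = NULL;
	nd->dvp = NULL;
	nd->startdir = NULL;
	nd->strictrelative = 0;	/* the kind of assignment the patch adds */
}
```

Whether a stale value made the lookup fail for only some users is a guess; the sketch shows only why clearing the field removes the dependence on stack contents.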

Thanks for doing the legwork on this and letting us know. This patch
is in the re@ queue, rick.

> Regards,
> George

From owner-freebsd-fs@FreeBSD.ORG  Wed Aug 31 22:34:00 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4C7B1106566B
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 22:34:00 +0000 (UTC)
	(envelope-from ee@athyriogames.com)
Received: from madonna.sslcatacombnetworking.com
	(madonna.sslcatacombnetworking.com [174.133.19.130])
	by mx1.freebsd.org (Postfix) with ESMTP id 28BAB8FC15
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 22:33:59 +0000 (UTC)
Received: from c-98-206-215-156.hsd1.in.comcast.net ([98.206.215.156]
	helo=laptopv)
	by madonna.sslcatacombnetworking.com with esmtpa (Exim 4.69)
	(envelope-from <ee@athyriogames.com>)
	id 1QytBt-0007b2-NM; Wed, 31 Aug 2011 17:23:26 -0500
From: "Engineering" <ee@athyriogames.com>
To: "'Peter Jeremy'" <peterjeremy@acm.org>
References: <01c801cc667f$f99eb7b0$ecdc2710$@com>
	<020d01cc6724$0f0410b0$2d0c3210$@com>
	<20110831210623.GB25698@server.vk2pj.dyndns.org>
In-Reply-To: <20110831210623.GB25698@server.vk2pj.dyndns.org>
Date: Wed, 31 Aug 2011 17:33:25 -0500
Message-ID: <029c01cc682e$02b82a70$08287f50$@com>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Mailer: Microsoft Office Outlook 12.0
Thread-Index: AcxoIGIKeoShtOcgQomZZrWDXA027QADXhPg
Content-Language: en-us
X-AntiAbuse: This header was added to track abuse,
	please include it with any abuse report
X-AntiAbuse: Primary Hostname - madonna.sslcatacombnetworking.com
X-AntiAbuse: Original Domain - freebsd.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - athyriogames.com
Cc: freebsd-fs@freebsd.org
Subject: RE: Read-only disk problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 31 Aug 2011 22:34:00 -0000

Thank you very much!

That is what I needed. I had / mounted read-only in fstab, but I also needed
'root_rw_mount="NO"' in /etc/rc.conf to seal the deal.

Thanks again!
Sam

-----Original Message-----
From: Peter Jeremy [mailto:peterjeremy@acm.org] 
Sent: Wednesday, August 31, 2011 4:06 PM
To: Engineering
Cc: freebsd-fs@freebsd.org
Subject: Re: Read-only disk problem

On 2011-Aug-30 09:49:41 -0500, Engineering <ee@athyriogames.com> wrote:
>Hi, I've attached some more info. Doing a fsdump shows the following 
>changes over reboot
>
>magic	19540119 (UFS2)	time	Tue Aug 30 03:08:04 2011
>...
>cg 1:
>magic	90255	tell	4b1c000	time	Tue Aug 30 03:08:04 2011
>
>Changes to
>
>magic	19540119 (UFS2)	time	Tue Aug 30 03:13:14 2011
>...
>cg 1:
>magic	90255	tell	4b1c000	time	Tue Aug 30 03:13:14 2011

It's normal for CG's and superblocks to be updated when there's any activity
on a read-write UFS.  (By default, the inode atime field will be lazily
updated when the inode is accessed, so just reading from a UFS mounted RW is
enough to cause writes).

>Is there any data that is written to the disk at boot or mount time, 
>and if so, is there a way to prevent it?

Are you sure that the FS is mounted read-only?  / is automatically mounted
read-write unless 'root_rw_mount="NO"' is specified in /etc/rc.conf.  If a
filesystem is mounted read-only, it will not be updated at all.

--
Peter Jeremy
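
The "is it really mounted read-only?" question can also be answered on
a running system by asking the kernel for the mount flags. A minimal
Python sketch, assuming a Unix-like system where os.statvfs() and
os.ST_RDONLY are available; the helper name is_read_only is made up
for illustration:

```python
import os


def is_read_only(path="/"):
    """Return True if the filesystem holding `path` is mounted read-only."""
    flags = os.statvfs(path).f_flag   # mount flags as reported by the kernel
    return bool(flags & os.ST_RDONLY)


if __name__ == "__main__":
    state = "read-only" if is_read_only("/") else "read-write"
    print("/ is mounted", state)
```

If this reports read-write for / although fstab says "ro", the
root_rw_mount setting in /etc/rc.conf is the first thing to check.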


From owner-freebsd-fs@FreeBSD.ORG  Wed Aug 31 23:24:03 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id EF137106566C
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 23:24:03 +0000 (UTC)
	(envelope-from peterjeremy@acm.org)
Received: from fallbackmx09.syd.optusnet.com.au
	(fallbackmx09.syd.optusnet.com.au [211.29.132.242])
	by mx1.freebsd.org (Postfix) with ESMTP id 80A828FC0A
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 23:24:03 +0000 (UTC)
Received: from mail27.syd.optusnet.com.au (mail27.syd.optusnet.com.au
	[211.29.133.168])
	by fallbackmx09.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p7VL6XFb016293
	for <freebsd-fs@freebsd.org>; Thu, 1 Sep 2011 07:06:33 +1000
Received: from server.vk2pj.dyndns.org
	(c220-239-116-103.belrs4.nsw.optusnet.com.au [220.239.116.103])
	by mail27.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p7VL6OFx015513
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Thu, 1 Sep 2011 07:06:25 +1000
X-Bogosity: Ham, spamicity=0.000000
Received: from server.vk2pj.dyndns.org (localhost.vk2pj.dyndns.org [127.0.0.1])
	by server.vk2pj.dyndns.org (8.14.4/8.14.4) with ESMTP id p7VL6Ofj025834;
	Thu, 1 Sep 2011 07:06:24 +1000 (EST)
	(envelope-from peter@server.vk2pj.dyndns.org)
Received: (from peter@localhost)
	by server.vk2pj.dyndns.org (8.14.4/8.14.4/Submit) id p7VL6NVp025833;
	Thu, 1 Sep 2011 07:06:23 +1000 (EST) (envelope-from peter)
Date: Thu, 1 Sep 2011 07:06:23 +1000
From: Peter Jeremy <peterjeremy@acm.org>
To: Engineering <ee@athyriogames.com>
Message-ID: <20110831210623.GB25698@server.vk2pj.dyndns.org>
References: <01c801cc667f$f99eb7b0$ecdc2710$@com>
	<020d01cc6724$0f0410b0$2d0c3210$@com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="cmJC7u66zC7hs+87"
Content-Disposition: inline
In-Reply-To: <020d01cc6724$0f0410b0$2d0c3210$@com>
X-PGP-Key: http://members.optusnet.com.au/peterjeremy/pubkey.asc
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: Read-only disk problem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 31 Aug 2011 23:24:04 -0000


--cmJC7u66zC7hs+87
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On 2011-Aug-30 09:49:41 -0500, Engineering <ee@athyriogames.com> wrote:
>Hi, I've attached some more info. Doing a dumpfs shows the following changes
>over reboot
>
>magic	19540119 (UFS2)	time	Tue Aug 30 03:08:04 2011
>...
>cg 1:
>magic	90255	tell	4b1c000	time	Tue Aug 30 03:08:04 2011
>
>Changes to
>
>magic	19540119 (UFS2)	time	Tue Aug 30 03:13:14 2011
>...
>cg 1:
>magic	90255	tell	4b1c000	time	Tue Aug 30 03:13:14 2011

It's normal for CG's and superblocks to be updated when there's any
activity on a read-write UFS.  (By default, the inode atime field
will be lazily updated when the inode is accessed, so just reading
from a UFS mounted RW is enough to cause writes).

>Is there any data that is written to the disk at boot or mount time, and if
>so, is there a way to prevent it?

Are you sure that the FS is mounted read-only?  / is automatically
mounted read-write unless 'root_rw_mount="NO"' is specified in
/etc/rc.conf.  If a filesystem is mounted read-only, it will not
be updated at all.

--
Peter Jeremy

--cmJC7u66zC7hs+87
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (FreeBSD)

iEYEARECAAYFAk5eok8ACgkQ/opHv/APuIfzTwCdGUahmlNAJX9lErJpUdSxn3kM
jCcAnj/EO/eFuzcPcnhyVnbZ5s0mPB2F
=kWYR
-----END PGP SIGNATURE-----

--cmJC7u66zC7hs+87--

From owner-freebsd-fs@FreeBSD.ORG  Thu Sep  1 04:28:09 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id ED1FC106566C
	for <freebsd-fs@freebsd.org>; Thu,  1 Sep 2011 04:28:09 +0000 (UTC)
	(envelope-from gnehzuil@gmail.com)
Received: from mail-pz0-f45.google.com (mail-pz0-f45.google.com
	[209.85.210.45])
	by mx1.freebsd.org (Postfix) with ESMTP id C4EED8FC0C
	for <freebsd-fs@freebsd.org>; Thu,  1 Sep 2011 04:28:09 +0000 (UTC)
Received: by pzk33 with SMTP id 33so4155511pzk.18
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 21:28:09 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=message-id:date:from:user-agent:mime-version:to:subject:references
	:in-reply-to:content-type:content-transfer-encoding;
	bh=nITCqT0/1uMSUgYaLxJYIdC0+9gSuDfbU4Wu3IYFysU=;
	b=L8Ckmk8tYRRpV6Tc0BBOWTwxiw4wtREttpnGNBvczMo1k9UaSKOP2HyF+4SL2q4p52
	ZV3w9w6CoominaI4ZUSSMddsvDR5aSa5SKlC1wNbOKMOU+vB7GyMNZ4XbCPwZFzH4Tw+
	EyxoDUDy1WqMmdWvSNDACTdMAzqD6aLAJeI4o=
Received: by 10.68.64.103 with SMTP id n7mr1513907pbs.303.1314849861994;
	Wed, 31 Aug 2011 21:04:21 -0700 (PDT)
Received: from [10.32.101.195] ([182.92.247.2])
	by mx.google.com with ESMTPS id m16sm310248wfd.0.2011.08.31.21.04.19
	(version=SSLv3 cipher=OTHER); Wed, 31 Aug 2011 21:04:20 -0700 (PDT)
Message-ID: <4E5F043F.2010303@gmail.com>
Date: Thu, 01 Sep 2011 12:04:15 +0800
From: gnehzuil <gnehzuil@gmail.com>
User-Agent: Mozilla/5.0 (X11; Linux i686;
	rv:5.0) Gecko/20110627 Thunderbird/5.0
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
References: <1314806200.14687.YahooMailClassic@web113507.mail.gq1.yahoo.com>
In-Reply-To: <1314806200.14687.YahooMailClassic@web113507.mail.gq1.yahoo.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: SEEK_DATA/SEEK_HOLE on UFS/EXT2FS
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 01 Sep 2011 04:28:10 -0000

Hi Pedro,

Actually, Linux doesn't really support SEEK_DATA/SEEK_HOLE yet. The
related patches haven't been merged into mainline.
At present, when lseek(2) is called with SEEK_DATA, the entire file is
treated as data, as long as the offset is smaller than the end of the
file. Meanwhile, there is a virtual hole at the end of the file, so when
lseek(2) is called with SEEK_HOLE, i_size (the file size in Linux) is
returned.

Best regards,
lz

On 08/31/2011 11:56 PM, Pedro F. Giffuni wrote:
> Hi;
>
> Just FYI, after reconsidering their position wrt NIH, the
> linux guys now think SEEK_DATA/SEEK_HOLE is wonderful:
>
> http://lwn.net/Articles/440255/
>
> and NetBSD is known to be working on it too (latest patch):
>
> http://mail-index.netbsd.org/tech-kern/2011/08/17/msg011231.html
>
> I hope our own developers haven't forgotten that this
> is indeed a desired feature and that we get it for 10.0
> or, if possible, 9.1.
>
> cheers,
>
> Pedro.
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
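
The SEEK_DATA/SEEK_HOLE behaviour discussed above can be tried from
user space. A minimal Python sketch, assuming a platform where Python
exposes os.SEEK_DATA and os.SEEK_HOLE; the helper name data_extents is
made up for illustration. On a filesystem with only the fallback
behaviour described above it would report one extent covering the
whole file:

```python
import errno
import os


def data_extents(fd):
    """Yield (start, end) byte ranges that the filesystem reports as data."""
    size = os.fstat(fd).st_size
    offset = 0
    while offset < size:
        try:
            # First byte at or after `offset` that is reported as data.
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:   # only a hole remains up to EOF
                break
            raise
        # First hole after `start`; EOF counts as a virtual hole.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        yield (start, end)
        offset = end
```

A copy tool could walk these extents and skip the gaps between them
instead of reading zeros.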


From owner-freebsd-fs@FreeBSD.ORG  Thu Sep  1 06:37:32 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 85BFA106564A
	for <freebsd-fs@freebsd.org>; Thu,  1 Sep 2011 06:37:32 +0000 (UTC)
	(envelope-from dan@3geeks.org)
Received: from mail-yx0-f182.google.com (mail-yx0-f182.google.com
	[209.85.213.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 4A6B48FC0C
	for <freebsd-fs@freebsd.org>; Thu,  1 Sep 2011 06:37:31 +0000 (UTC)
Received: by yxn22 with SMTP id 22so195226yxn.13
	for <freebsd-fs@freebsd.org>; Wed, 31 Aug 2011 23:37:31 -0700 (PDT)
Received: by 10.236.155.198 with SMTP id j46mr7014576yhk.23.1314857488130;
	Wed, 31 Aug 2011 23:11:28 -0700 (PDT)
Received: from [172.16.1.35] (99-126-192-237.lightspeed.austtx.sbcglobal.net
	[99.126.192.237])
	by mx.google.com with ESMTPS id a29sm453578yhj.45.2011.08.31.23.11.26
	(version=TLSv1/SSLv3 cipher=OTHER);
	Wed, 31 Aug 2011 23:11:27 -0700 (PDT)
From: Daniel Mayfield <dan@3geeks.org>
Date: Thu, 1 Sep 2011 01:11:25 -0500
Message-Id: <F335600A-0364-455F-A276-43E23B0E597E@3geeks.org>
To: freebsd-fs@freebsd.org
Mime-Version: 1.0 (Apple Message framework v1084)
X-Mailer: Apple Mail (2.1084)
Content-Type: text/plain;
	charset=us-ascii
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Subject: gptzfsboot and 4k sector raidz
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 01 Sep 2011 06:37:32 -0000

I just set this up on an Athlon64 machine I have w/ 4 WD EARS 2TB disks.
I followed the instructions here:
http://www.leidinger.net/blog/2011/05/03/another-root-on-zfs-howto-optimized-for-4k-sector-drives/,
but just building a single pool so three partitions per disk (boot, swap
and zfs).  I'm using the mfsBSD image to do the boot code.  When I reboot
to actually come up from ZFS, the loader spins for half a second and then
the machine reboots.  I've seen a number of bug reports on gptzfsboot and
4k sector pools, but I never saw one fail so early.  What data would the
ZFS people need to help fix this?

daniel

From owner-freebsd-fs@FreeBSD.ORG  Thu Sep  1 08:08:23 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DB5C91065672
	for <freebsd-fs@freebsd.org>; Thu,  1 Sep 2011 08:08:22 +0000 (UTC)
	(envelope-from kraduk@gmail.com)
Received: from mail-gy0-f182.google.com (mail-gy0-f182.google.com
	[209.85.160.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 9D9CB8FC12
	for <freebsd-fs@freebsd.org>; Thu,  1 Sep 2011 08:08:22 +0000 (UTC)
Received: by gyd10 with SMTP id 10so1501499gyd.13
	for <freebsd-fs@freebsd.org>; Thu, 01 Sep 2011 01:08:22 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	bh=/Hem36pZxNM2hs3WRMKsruZuTdzostbLszBu6DHc76c=;
	b=Egi+vnfYEtVnBaXMXTXleLYzugGTplNg6CuABtk7WofqRgfqj4226Yd//6V2ZqU/vB
	dMxYv4KqO+CtzpTgCvn26Qb2qJXly+B9v93VPQIx03WblTh+0U2yc0Lnub/am5hnDV4G
	pAtcdpyOb6ElLznSzOTJQdEri6QCnCSHi+/tY=
MIME-Version: 1.0
Received: by 10.236.116.199 with SMTP id g47mr7192038yhh.44.1314864502024;
	Thu, 01 Sep 2011 01:08:22 -0700 (PDT)
Received: by 10.236.103.19 with HTTP; Thu, 1 Sep 2011 01:08:22 -0700 (PDT)
In-Reply-To: <F335600A-0364-455F-A276-43E23B0E597E@3geeks.org>
References: <F335600A-0364-455F-A276-43E23B0E597E@3geeks.org>
Date: Thu, 1 Sep 2011 09:08:22 +0100
Message-ID: <CALfReydQHEXoPnZZL=mN3H8mNG=E5XbTqH+AS5XTsqAoUCmYbQ@mail.gmail.com>
From: krad <kraduk@gmail.com>
To: Daniel Mayfield <dan@3geeks.org>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-fs@freebsd.org
Subject: Re: gptzfsboot and 4k sector raidz
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 01 Sep 2011 08:08:23 -0000

On 1 September 2011 07:11, Daniel Mayfield <dan@3geeks.org> wrote:

> I just set this up on an Athlon64 machine I have w/ 4 WD EARS 2TB disks.  I
> followed the instructions here:
> http://www.leidinger.net/blog/2011/05/03/another-root-on-zfs-howto-optimized-for-4k-sector-drives/,
> but just building a single pool so three partitions per disk (boot, swap and
> zfs).  I'm using the mfsBSD image to do the boot code.  When I reboot to
> actually come up from ZFS, the loader spins for half a second and then the
> machine reboots.  I've seen a number of bug reports on gptzfsboot and 4k
> sector pools, but I never saw one fail so early.  What data would the ZFS
> people need to help fix this?
>
> daniel
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>


Try these boot bits; they always work for me:

http://people.freebsd.org/~pjd/zfsboot/

From owner-freebsd-fs@FreeBSD.ORG  Thu Sep  1 13:07:09 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 738A6106564A
	for <freebsd-fs@freebsd.org>; Thu,  1 Sep 2011 13:07:09 +0000 (UTC)
	(envelope-from trent@snakebite.org)
Received: from exchange.liveoffice.com (exchla3.liveoffice.com [64.70.67.188])
	by mx1.freebsd.org (Postfix) with ESMTP id 514B58FC12
	for <freebsd-fs@freebsd.org>; Thu,  1 Sep 2011 13:07:09 +0000 (UTC)
Received: from EXCASUM03.exchhosting.com (192.168.11.203) by
	exhub05.exchhosting.com (192.168.11.101) with Microsoft SMTP Server
	(TLS) id 8.2.213.0; Thu, 1 Sep 2011 05:57:00 -0700
Received: from [10.211.55.3] (35.11.55.172) by exchange.liveoffice.com
	(192.168.11.203) with Microsoft SMTP Server (TLS) id 8.2.213.0;
	Thu, 1 Sep 2011 05:57:00 -0700
Message-ID: <4E5F811A.2040307@snakebite.org>
Date: Thu, 1 Sep 2011 08:56:58 -0400
From: Trent Nelson <trent@snakebite.org>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
	rv:5.0) Gecko/20110624 Thunderbird/5.0
MIME-Version: 1.0
To: Daniel Mayfield <dan@3geeks.org>
References: <F335600A-0364-455F-A276-43E23B0E597E@3geeks.org>
In-Reply-To: <F335600A-0364-455F-A276-43E23B0E597E@3geeks.org>
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
Cc: "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: gptzfsboot and 4k sector raidz
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 01 Sep 2011 13:07:09 -0000

On 01-Sep-11 2:11 AM, Daniel Mayfield wrote:
> I just set this up on an Athlon64 machine I have w/ 4 WD EARS 2TB
> disks.  I followed the instructions here:
> http://www.leidinger.net/blog/2011/05/03/another-root-on-zfs-howto-optimized-for-4k-sector-drives/,
> but just building a single pool so three partitions per disk (boot,
> swap and zfs).  I'm using the mfsBSD image to do the boot code.  When
> I reboot to actually come up from ZFS, the loader spins for half a
> second and then the machine reboots.  I've seen a number of bug
> reports on gptzfsboot and 4k sector pools, but I never saw one fail
> so early.  What data would the ZFS people need to help fix this?

FWIW, I experienced the exact same issue about a week ago with four new 
WD EARS 2TB disks.  I contemplated looking into fixing it, until I 
noticed the crazy disk usage with 4K sectors.  On my old box, my 
/usr/src dataset was ~450MB (mirrored 512-byte drives), on the new box 
with the 2TB 4k sector drives, /usr/src was 1.5-something GB.  Exact 
same settings.

This appeared to be the case for *everything*; every file system/zfs 
dataset seemed to be consuming 2-3 times more space on the 4K-sector box.

So, combine that with the fact that I couldn't boot into it anyway, and 
I ditched the 4k-sector effort and just re-built with raidz as per 
normal (i.e. with 512-byte sectors).

One week later?  Disk usage is sensible, as expected, but performance 
(especially writing) is pretty horrid.  As much as I'd like to blame 
raidz overhead, I'm not sure it's the problem; I've got a gstripe of 
4x16GB partitions at the start of each 2TB as /scratch; dd'ing /dev/zero 
to that doesn't yield write speeds faster than ~20-30MB/s if I'm lucky. 
  Writing to the raidz partition nets about 15-20MB/s in very bursty 
peaks.  NFS and Samba performance are even worse; 2-3MB/s sustained if 
I'm lucky, with the odd burst of 20MB/s every so often.

(The box is a lowly dual-core Athlon 1800 w/ 8GB RAM, 8-stable from 
yesterday.)

So, uh, no solution from my end, but perhaps some more problems for you 
to run into if you get it to boot ;-)


	Trent.
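
The thread does not establish why the 4K-sector pool used 2-3 times
more space. One plausible contributor is the way raidz rounds every
block up to whole sectors, adds parity, and pads the result. The
Python sketch below follows the allocation-size rule as it is
generally understood from the ZFS raidz code; the function name
raidz_asize, the 4-disk raidz1 layout and the sample block sizes are
assumptions for illustration, not measurements from either pool:

```python
def raidz_asize(psize, ashift, ndisks=4, nparity=1):
    """Approximate bytes allocated on raidz for one block of psize bytes."""
    sectors = ((psize - 1) >> ashift) + 1         # data sectors, rounded up
    rows = -(-sectors // (ndisks - nparity))      # ceil: stripe rows touched
    sectors += nparity * rows                     # parity sectors per row
    pad = nparity + 1
    sectors = -(-sectors // pad) * pad            # round up to multiple of pad
    return sectors << ashift


for psize in (1536, 4096, 131072):
    # ashift=9 means 512-byte sectors, ashift=12 means 4K sectors
    print(psize, raidz_asize(psize, 9), raidz_asize(psize, 12))
```

Under these assumptions a 1.5 KB file costs 2 KB with 512-byte sectors
but 8 KB with 4K sectors, while a full 128 KB record costs nearly the
same either way. A tree of many small files such as /usr/src would
therefore grow severalfold.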

From owner-freebsd-fs@FreeBSD.ORG  Thu Sep  1 16:30:27 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 645FD1065673
	for <freebsd-fs@freebsd.org>; Thu,  1 Sep 2011 16:30:27 +0000 (UTC)
	(envelope-from dan@3geeks.org)
Received: from mail-gw0-f54.google.com (mail-gw0-f54.google.com [74.125.83.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 299CA8FC14
	for <freebsd-fs@freebsd.org>; Thu,  1 Sep 2011 16:30:26 +0000 (UTC)
Received: by gwb15 with SMTP id 15so1315042gwb.13
	for <freebsd-fs@freebsd.org>; Thu, 01 Sep 2011 09:30:26 -0700 (PDT)
Received: by 10.150.254.1 with SMTP id b1mr226355ybi.323.1314894626396;
	Thu, 01 Sep 2011 09:30:26 -0700 (PDT)
Received: from [172.16.1.35] (99-126-192-237.lightspeed.austtx.sbcglobal.net
	[99.126.192.237])
	by mx.google.com with ESMTPS id l18sm102295ybg.7.2011.09.01.09.30.24
	(version=TLSv1/SSLv3 cipher=OTHER);
	Thu, 01 Sep 2011 09:30:24 -0700 (PDT)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Apple Message framework v1084)
From: Daniel Mayfield <dan@3geeks.org>
In-Reply-To: <4E5F811A.2040307@snakebite.org>
Date: Thu, 1 Sep 2011 11:30:23 -0500
Content-Transfer-Encoding: quoted-printable
Message-Id: <7FAD4A4D-2465-4A80-A445-1D34424F8BB6@3geeks.org>
References: <F335600A-0364-455F-A276-43E23B0E597E@3geeks.org>
	<4E5F811A.2040307@snakebite.org>
To: freebsd-fs@freebsd.org
X-Mailer: Apple Mail (2.1084)
Subject: Re: gptzfsboot and 4k sector raidz
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 01 Sep 2011 16:30:27 -0000


On Sep 1, 2011, at 7:56 AM, Trent Nelson wrote:

> On 01-Sep-11 2:11 AM, Daniel Mayfield wrote:
>> I just set this up on an Athlon64 machine I have w/ 4 WD EARS 2TB
>> disks.  I followed the instructions here:
>> http://www.leidinger.net/blog/2011/05/03/another-root-on-zfs-howto-optimized-for-4k-sector-drives/,
>> but just building a single pool so three partitions per disk (boot,
>> swap and zfs).  I'm using the mfsBSD image to do the boot code.  When
>> I reboot to actually come up from ZFS, the loader spins for half a
>> second and then the machine reboots.  I've seen a number of bug
>> reports on gptzfsboot and 4k sector pools, but I never saw one fail
>> so early.  What data would the ZFS people need to help fix this?
>
> FWIW, I experienced the exact same issue about a week ago with four
> new WD EARS 2TB disks.  I contemplated looking into fixing it, until I
> noticed the crazy disk usage with 4K sectors.  On my old box, my
> /usr/src dataset was ~450MB (mirrored 512-byte drives), on the new box
> with the 2TB 4k sector drives, /usr/src was 1.5-something GB.  Exact
> same settings.

I noticed that the free data space was also bigger.  I tried it with
raidz on the 512B sectors and it claimed to have only 5.3T of space.
With 4KB sectors, it claimed to have 7.25T of space.  Seems like
something is wonky in the space calculations?

daniel

From owner-freebsd-fs@FreeBSD.ORG  Thu Sep  1 17:18:05 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 910CD106564A
	for <freebsd-fs@freebsd.org>; Thu,  1 Sep 2011 17:18:05 +0000 (UTC)
	(envelope-from trent@snakebite.org)
Received: from exchange.liveoffice.com (exchla3.liveoffice.com [64.70.67.188])
	by mx1.freebsd.org (Postfix) with ESMTP id 7222F8FC0C
	for <freebsd-fs@freebsd.org>; Thu,  1 Sep 2011 17:18:05 +0000 (UTC)
Received: from EXCASUM03.exchhosting.com (192.168.11.203) by
	exhub03.exchhosting.com (192.168.11.104) with Microsoft SMTP Server
	(TLS) id 8.2.213.0; Thu, 1 Sep 2011 10:17:52 -0700
Received: from [10.211.55.3] (35.11.55.172) by exchange.liveoffice.com
	(192.168.11.203) with Microsoft SMTP Server (TLS) id 8.2.213.0;
	Thu, 1 Sep 2011 10:17:52 -0700
Message-ID: <4E5FBE3E.7020706@snakebite.org>
Date: Thu, 1 Sep 2011 13:17:50 -0400
From: Trent Nelson <trent@snakebite.org>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
	rv:5.0) Gecko/20110624 Thunderbird/5.0
MIME-Version: 1.0
To: Daniel Mayfield <dan@3geeks.org>, "freebsd-fs@freebsd.org"
	<freebsd-fs@freebsd.org>
References: <F335600A-0364-455F-A276-43E23B0E597E@3geeks.org>
	<4E5F811A.2040307@snakebite.org>
	<7FAD4A4D-2465-4A80-A445-1D34424F8BB6@3geeks.org>
In-Reply-To: <7FAD4A4D-2465-4A80-A445-1D34424F8BB6@3geeks.org>
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
Cc: 
Subject: Re: gptzfsboot and 4k sector raidz
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 01 Sep 2011 17:18:05 -0000

On 01-Sep-11 12:30 PM, Daniel Mayfield wrote:
>
> On Sep 1, 2011, at 7:56 AM, Trent Nelson wrote:
>
>> On 01-Sep-11 2:11 AM, Daniel Mayfield wrote:
>>> I just set this up on an Athlon64 machine I have w/ 4 WD EARS
>>> 2TB disks.  I followed the instructions here:
>>> http://www.leidinger.net/blog/2011/05/03/another-root-on-zfs-howto-optimized-for-4k-sector-drives/,
>>> but just building a single pool so three partitions per disk (boot,
>>> swap and zfs).  I'm using the mfsBSD image to do the boot code.
>>> When I reboot to actually come up from ZFS, the loader spins for
>>> half a second and then the machine reboots.  I've seen a number
>>> of bug reports on gptzfsboot and 4k sector pools, but I never saw
>>> one fail so early.  What data would the ZFS people need to help
>>> fix this?
>>
>> FWIW, I experienced the exact same issue about a week ago with four
>> new WD EARS 2TB disks.  I contemplated looking into fixing it,
>> until I noticed the crazy disk usage with 4K sectors.  On my old
>> box, my /usr/src dataset was ~450MB (mirrored 512-byte drives), on
>> the new box with the 2TB 4k sector drives, /usr/src was
>> 1.5-something GB.  Exact same settings.
>
> I noticed that the free data space was also bigger.  I tried it with
> raidz on the 512B sectors and it claimed to have only 5.3T of space.
> With 4KB sectors, it claimed to have 7.25T of space.  Seems like
> something is wonky in the space calculations?

Hmmmm.  It didn't occur to me that the space calculations might be 
wonky.  That could explain why I was seeing disk usage much higher on 4K 
than 512-bytes for all my zfs datasets.  Here's my zpool/zfs output w/ 
512-byte sectors (4-disk raidz):

[root@flanker/ttypts/0(~)#] zpool list tank
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
tank  7.12T   698G  6.44T     9%  1.16x  ONLINE  -
[root@flanker/ttypts/0(~)#] zfs list tank
NAME   USED  AVAIL  REFER  MOUNTPOINT
tank   604G  4.74T  46.4K  legacy

It's a raidz1-0 of four 2TB disks, so the space available should be
(4-1=3)*2TB=6TB?  Although I presume that's 6 marketing terabytes, which
translates to ... 6000000000000/(1024^4)=~5.46.  And I've got 64k boot, 8G
swap, 16G scratch on each drive *before* the tank, so eh, I guess 4.74T
sounds about right.

The 7.12T reported by zpool doesn't seem to be taking into account the 
reduced space from the raidz parity.  *shrug*

Enough about sizes; what's your read/write performance like between 
512-byte/4K?  I didn't think to test performance in the 4K 
configuration; I really wish I had, now.

	Trent.
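
The size arithmetic above can be written out as a short Python sketch.
It assumes each disk holds exactly 2*10^12 bytes, reads "8G" and "16G"
as GiB, and ignores ZFS labels and metadata; pool_sizes is a made-up
helper name:

```python
TIB = 1024 ** 4
GIB = 1024 ** 3


def pool_sizes(ndisks=4, nparity=1, disk_bytes=2 * 10**12,
               reserved_bytes=64 * 1024 + 8 * GIB + 16 * GIB):
    """Return (raw_TiB, usable_TiB) for raidz built from equal partitions."""
    part = disk_bytes - reserved_bytes          # bytes left for ZFS per disk
    raw = ndisks * part / TIB                   # all disks, parity included
    usable = (ndisks - nparity) * part / TIB    # parity share removed
    return raw, usable


raw, usable = pool_sizes()
print("raw %.2f TiB, usable %.2f TiB" % (raw, usable))
print("6e12 bytes = %.2f TiB" % (6 * 10**12 / TIB))
```

With these inputs the raw figure lands close to the 7.12T SIZE from
zpool list, and the usable figure close to USED plus AVAIL from zfs
list (604G + 4.74T, roughly 5.3T). That fits zpool list counting
parity space and zfs list not counting it.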


From owner-freebsd-fs@FreeBSD.ORG  Thu Sep  1 17:46:16 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B94A71065670
	for <freebsd-fs@freebsd.org>; Thu,  1 Sep 2011 17:46:16 +0000 (UTC)
	(envelope-from dan@3geeks.org)
Received: from mail-yi0-f54.google.com (mail-yi0-f54.google.com
	[209.85.218.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 7DD4B8FC08
	for <freebsd-fs@freebsd.org>; Thu,  1 Sep 2011 17:46:16 +0000 (UTC)
Received: by yib19 with SMTP id 19so2019759yib.13
	for <freebsd-fs@freebsd.org>; Thu, 01 Sep 2011 10:46:15 -0700 (PDT)
Received: by 10.236.136.65 with SMTP id v41mr874025yhi.29.1314899175618;
	Thu, 01 Sep 2011 10:46:15 -0700 (PDT)
Received: from [172.16.1.35] (99-126-192-237.lightspeed.austtx.sbcglobal.net
	[99.126.192.237])
	by mx.google.com with ESMTPS id o48sm227019yhl.4.2011.09.01.10.46.13
	(version=TLSv1/SSLv3 cipher=OTHER);
	Thu, 01 Sep 2011 10:46:14 -0700 (PDT)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Apple Message framework v1084)
From: Daniel Mayfield <dan@3geeks.org>
In-Reply-To: <4E5FBE3E.7020706@snakebite.org>
Date: Thu, 1 Sep 2011 12:46:12 -0500
Content-Transfer-Encoding: quoted-printable
Message-Id: <553883C7-B97D-429F-AF4A-E208B6051B62@3geeks.org>
References: <F335600A-0364-455F-A276-43E23B0E597E@3geeks.org>
	<4E5F811A.2040307@snakebite.org>
	<7FAD4A4D-2465-4A80-A445-1D34424F8BB6@3geeks.org>
	<4E5FBE3E.7020706@snakebite.org>
To: freebsd-fs@freebsd.org
X-Mailer: Apple Mail (2.1084)
Subject: Re: gptzfsboot and 4k sector raidz
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 01 Sep 2011 17:46:16 -0000

>> I noticed that the free data space was also bigger.  I tried it with
>> raidz on the 512B sectors and it claimed to have only 5.3T of space.
>> With 4KB sectors, it claimed to have 7.25T of space.  Seems like
>> something is wonky in the space calculations?
>
> Hmmmm.  It didn't occur to me that the space calculations might be
> wonky.  That could explain why I was seeing disk usage much higher on 4K
> than 512-bytes for all my zfs datasets.  Here's my zpool/zfs output w/
> 512-byte sectors (4-disk raidz):
>
> [root@flanker/ttypts/0(~)#] zpool list tank
> NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
> tank  7.12T   698G  6.44T     9%  1.16x  ONLINE  -
> [root@flanker/ttypts/0(~)#] zfs list tank
> NAME   USED  AVAIL  REFER  MOUNTPOINT
> tank   604G  4.74T  46.4K  legacy
>
> It's a raidz1-0 of four 2TB disks, so the space available should be
> (4-1=3)*2TB=6TB?  Although I presume that's 6 marketing terabytes, which
> translates to ... 6000000000000/(1024^4)=~5.46.  And I've got 64k boot, 8G
> swap, 16G scratch on each drive *before* the tank, so eh, I guess 4.74T
> sounds about right.
>
> The 7.12T reported by zpool doesn't seem to be taking into account the
> reduced space from the raidz parity.  *shrug*
>
> Enough about sizes; what's your read/write performance like between
> 512-byte/4K?  I didn't think to test performance in the 4K
> configuration; I really wish I had, now.

I didn't test performance.  I'm doing all the work running from the
mfsBSD boot disc.  I'm not sure a simple 'dd' is a good test, but if you
have suggestions, I'm open.

daniel
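
One alternative to dd from /dev/zero is a small script that writes
random data and includes the final fsync in the timing, so compression
and caching flatter the result less. A rough Python sketch, not a
proper benchmark; the function name write_throughput and the default
sizes are assumptions:

```python
import os
import tempfile
import time


def write_throughput(directory, total_bytes=256 * 1024 * 1024,
                     block_size=1024 * 1024):
    """Write total_bytes into a temp file under `directory` and return
    the sustained rate in MB/s (10**6 bytes), fsync included."""
    block = os.urandom(block_size)    # incompressible, unlike /dev/zero
    fd, path = tempfile.mkstemp(dir=directory)
    written = 0
    try:
        start = time.monotonic()
        while written < total_bytes:
            chunk = block[:min(block_size, total_bytes - written)]
            written += os.write(fd, chunk)   # may be a partial write
        os.fsync(fd)                  # wait until the data is on disk
        elapsed = max(time.monotonic() - start, 1e-9)
    finally:
        os.close(fd)
        os.unlink(path)               # remove the test file again
    return written / elapsed / 1e6
```

Calling write_throughput("/scratch") and then the same on a directory
in the raidz pool would give two comparable numbers. The same random
block is reused for every write, so on a pool with dedup enabled the
result would still look better than real data; generating a fresh
block per write avoids that at some CPU cost.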


From owner-freebsd-fs@FreeBSD.ORG  Fri Sep  2 07:15:46 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id DC9AF106566B;
	Fri,  2 Sep 2011 07:15:46 +0000 (UTC)
	(envelope-from pawel@dawidek.net)
Received: from mail.dawidek.net (60.wheelsystems.com [83.12.187.60])
	by mx1.freebsd.org (Postfix) with ESMTP id 5303D8FC14;
	Fri,  2 Sep 2011 07:15:46 +0000 (UTC)
Received: from localhost (58.wheelsystems.com [83.12.187.58])
	by mail.dawidek.net (Postfix) with ESMTPSA id 81235371;
	Fri,  2 Sep 2011 09:15:44 +0200 (CEST)
Date: Fri, 2 Sep 2011 09:15:23 +0200
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: Martin Matuska <mm@FreeBSD.org>
Message-ID: <20110902071523.GB1660@garage.freebsd.pl>
References: <1314646728.7898.44.camel@pow>
	<CAFqOu6gHvwxiOkFZ0Enh3VRHcs3aD=gH4u_6=XuhfYXg5NnkpQ@mail.gmail.com>
	<4E5BFC6F.5080507@FreeBSD.org>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="VrqPEDrXMn8OVzN4"
Content-Disposition: inline
In-Reply-To: <4E5BFC6F.5080507@FreeBSD.org>
X-OS: FreeBSD 9.0-CURRENT amd64
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org, tech@hybrid-logic.co.uk, luke@hybrid-logic.co.uk
Subject: Re: ZFS hang in production on 8.2-RELEASE
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 02 Sep 2011 07:15:46 -0000


--VrqPEDrXMn8OVzN4
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Aug 29, 2011 at 10:54:07PM +0200, Martin Matuska wrote:
> On 29. 8. 2011 21:55, Artem Belevich wrote:
> > It sounds like the bug Martin Matuska has recently fixed in FreeBSD
> > and reported upstream to Illumos:
> > https://www.illumos.org/issues/1313
> >
> > The fix has been MFC'ed to 8-STABLE r224647 on Aug 4th.
> >
> > --Artem
> No, I think this is more likely fixed by pjd's bugfix in r224791 (MFC'ed
> to stable/8 as r225100).
>=20
> The corresponding patch is:
> http://people.freebsd.org/~pjd/patches/zfsdev_state_lock.patch

My patch fixes a deadlock when there is some activity in vdev handling
(like removal of a disk from the pool or something like that).

The bug reported is definitely related to a forced unmount while the file
system is under load. I've spent a lot of time trying to get forcible
unmounts right, which is not an easy task, believe me, but it is
possible the deadlock is already fixed in v28.

--=20
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://yomoli.com

--VrqPEDrXMn8OVzN4
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (FreeBSD)

iEYEARECAAYFAk5ggosACgkQForvXbEpPzTKeQCdFik3mew907gBOnvRpULE2u1r
WgkAoLma6L6SccDqqo4r9EMHjl4lbY9O
=QJSl
-----END PGP SIGNATURE-----

--VrqPEDrXMn8OVzN4--

From owner-freebsd-fs@FreeBSD.ORG  Fri Sep  2 08:20:14 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9487E1065670
	for <freebsd-fs@hub.freebsd.org>; Fri,  2 Sep 2011 08:20:14 +0000 (UTC)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 839868FC12
	for <freebsd-fs@hub.freebsd.org>; Fri,  2 Sep 2011 08:20:14 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p828KEUe008796
	for <freebsd-fs@freefall.freebsd.org>; Fri, 2 Sep 2011 08:20:14 GMT
	(envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p828KEIQ008795;
	Fri, 2 Sep 2011 08:20:14 GMT (envelope-from gnats)
Date: Fri, 2 Sep 2011 08:20:14 GMT
Message-Id: <201109020820.p828KEIQ008795@freefall.freebsd.org>
To: freebsd-fs@FreeBSD.org
From: dfilter@FreeBSD.ORG (dfilter service)
Cc: 
Subject: Re: kern/160035: commit references a PR
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: dfilter service <dfilter@FreeBSD.ORG>
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 02 Sep 2011 08:20:14 -0000

The following reply was made to PR kern/160035; it has been noted by GNATS.

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/160035: commit references a PR
Date: Fri,  2 Sep 2011 08:19:40 +0000 (UTC)

 Author: mm
 Date: Fri Sep  2 08:19:19 2011
 New Revision: 225326
 URL: http://svn.freebsd.org/changeset/base/225326
 
 Log:
   MFC r226155:
   
   Generalize ffs_pages_remove() into vn_pages_remove().
   
   Remove mapped pages for all dataset vnodes in zfs_rezget() using
   new vn_pages_remove() to fix mmapped files changed by
   zfs rollback or zfs receive -F.
   
   PR:		kern/160035, kern/156933
 
 Modified:
   stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c
   stable/8/sys/kern/vfs_vnops.c
   stable/8/sys/sys/vnode.h
   stable/8/sys/ufs/ffs/ffs_inode.c
 Directory Properties:
   stable/8/sys/   (props changed)
   stable/8/sys/amd64/include/xen/   (props changed)
   stable/8/sys/cddl/contrib/opensolaris/   (props changed)
   stable/8/sys/contrib/dev/acpica/   (props changed)
   stable/8/sys/contrib/pf/   (props changed)
 
 Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c
 ==============================================================================
 --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c	Fri Sep  2 08:15:48 2011	(r225325)
 +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c	Fri Sep  2 08:19:19 2011	(r225326)
 @@ -1259,6 +1259,7 @@ zfs_rezget(znode_t *zp)
  	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
  	dmu_object_info_t doi;
  	dmu_buf_t *db;
 +	vnode_t *vp;
  	uint64_t obj_num = zp->z_id;
  	uint64_t mode, size;
  	sa_bulk_attr_t bulk[8];
 @@ -1334,8 +1335,9 @@ zfs_rezget(znode_t *zp)
  	 * that for example regular file was replaced with directory
  	 * which has the same object number.
  	 */
 -	if (ZTOV(zp) != NULL &&
 -	    ZTOV(zp)->v_type != IFTOVT((mode_t)zp->z_mode)) {
 +	vp = ZTOV(zp);
 +	if (vp != NULL &&
 +	    vp->v_type != IFTOVT((mode_t)zp->z_mode)) {
  		zfs_znode_dmu_fini(zp);
  		ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num);
  		return (EIO);
 @@ -1343,8 +1345,11 @@ zfs_rezget(znode_t *zp)
  
  	zp->z_unlinked = (zp->z_links == 0);
  	zp->z_blksz = doi.doi_data_block_size;
 -	if (zp->z_size != size && ZTOV(zp) != NULL)
 -		vnode_pager_setsize(ZTOV(zp), zp->z_size);
 +	if (vp != NULL) {
 +		vn_pages_remove(vp, 0, 0);
 +		if (zp->z_size != size)
 +			vnode_pager_setsize(vp, zp->z_size);
 +	}
  
  	ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num);
  
 
 Modified: stable/8/sys/kern/vfs_vnops.c
 ==============================================================================
 --- stable/8/sys/kern/vfs_vnops.c	Fri Sep  2 08:15:48 2011	(r225325)
 +++ stable/8/sys/kern/vfs_vnops.c	Fri Sep  2 08:19:19 2011	(r225326)
 @@ -63,6 +63,9 @@ __FBSDID("$FreeBSD$");
  
  #include <security/mac/mac_framework.h>
  
 +#include <vm/vm.h>
 +#include <vm/vm_object.h>
 +
  static fo_rdwr_t	vn_read;
  static fo_rdwr_t	vn_write;
  static fo_truncate_t	vn_truncate;
 @@ -1353,3 +1356,15 @@ vn_rlimit_fsize(const struct vnode *vp, 
  
  	return (0);
  }
 +
 +void
 +vn_pages_remove(struct vnode *vp, vm_pindex_t start, vm_pindex_t end)
 +{
 +	vm_object_t object;
 +
 +	if ((object = vp->v_object) == NULL)
 +		return;
 +	VM_OBJECT_LOCK(object);
 +	vm_object_page_remove(object, start, end, 0);
 +	VM_OBJECT_UNLOCK(object);
 +}
 
 Modified: stable/8/sys/sys/vnode.h
 ==============================================================================
 --- stable/8/sys/sys/vnode.h	Fri Sep  2 08:15:48 2011	(r225325)
 +++ stable/8/sys/sys/vnode.h	Fri Sep  2 08:19:19 2011	(r225326)
 @@ -644,6 +644,7 @@ int	_vn_lock(struct vnode *vp, int flags
  int	vn_open(struct nameidata *ndp, int *flagp, int cmode, struct file *fp);
  int	vn_open_cred(struct nameidata *ndp, int *flagp, int cmode,
  	    u_int vn_open_flags, struct ucred *cred, struct file *fp);
 +void	vn_pages_remove(struct vnode *vp, vm_pindex_t start, vm_pindex_t end);
  int	vn_pollrecord(struct vnode *vp, struct thread *p, int events);
  int	vn_rdwr(enum uio_rw rw, struct vnode *vp, void *base,
  	    int len, off_t offset, enum uio_seg segflg, int ioflg,
 
 Modified: stable/8/sys/ufs/ffs/ffs_inode.c
 ==============================================================================
 --- stable/8/sys/ufs/ffs/ffs_inode.c	Fri Sep  2 08:15:48 2011	(r225325)
 +++ stable/8/sys/ufs/ffs/ffs_inode.c	Fri Sep  2 08:19:19 2011	(r225326)
 @@ -129,18 +129,6 @@ ffs_update(vp, waitfor)
  	}
  }
  
 -static void
 -ffs_pages_remove(struct vnode *vp, vm_pindex_t start, vm_pindex_t end)
 -{
 -	vm_object_t object;
 -
 -	if ((object = vp->v_object) == NULL)
 -		return;
 -	VM_OBJECT_LOCK(object);
 -	vm_object_page_remove(object, start, end, FALSE);
 -	VM_OBJECT_UNLOCK(object);
 -}
 -
  #define	SINGLE	0	/* index of single indirect block */
  #define	DOUBLE	1	/* index of double indirect block */
  #define	TRIPLE	2	/* index of triple indirect block */
 @@ -218,7 +206,7 @@ ffs_truncate(vp, length, flags, cred, td
  			(void) chkdq(ip, -extblocks, NOCRED, 0);
  #endif
  			vinvalbuf(vp, V_ALT, 0, 0);
 -			ffs_pages_remove(vp,
 +			vn_pages_remove(vp,
  			    OFF_TO_IDX(lblktosize(fs, -extblocks)), 0);
  			ip->i_din2->di_extsize = 0;
  			for (i = 0; i < NXADDR; i++) {
 @@ -297,7 +285,7 @@ ffs_truncate(vp, length, flags, cred, td
  			ASSERT_VOP_LOCKED(vp, "ffs_truncate1");
  			vinvalbuf(vp, needextclean ? 0 : V_NORMAL, 0, 0);
  			if (!needextclean)
 -				ffs_pages_remove(vp, 0,
 +				vn_pages_remove(vp, 0,
  				    OFF_TO_IDX(lblktosize(fs, -extblocks)));
  			vnode_pager_setsize(vp, 0);
  			ip->i_flag |= IN_CHANGE | IN_UPDATE;
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 

From owner-freebsd-fs@FreeBSD.ORG  Fri Sep  2 08:20:17 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8DF82106567A
	for <freebsd-fs@hub.freebsd.org>; Fri,  2 Sep 2011 08:20:17 +0000 (UTC)
	(envelope-from gnats@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 7D4B88FC1B
	for <freebsd-fs@hub.freebsd.org>; Fri,  2 Sep 2011 08:20:17 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p828KHpP008816
	for <freebsd-fs@freefall.freebsd.org>; Fri, 2 Sep 2011 08:20:17 GMT
	(envelope-from gnats@freefall.freebsd.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p828KH03008815;
	Fri, 2 Sep 2011 08:20:17 GMT (envelope-from gnats)
Date: Fri, 2 Sep 2011 08:20:17 GMT
Message-Id: <201109020820.p828KH03008815@freefall.freebsd.org>
To: freebsd-fs@FreeBSD.org
From: dfilter@FreeBSD.ORG (dfilter service)
Cc: 
Subject: Re: kern/156933: commit references a PR
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: dfilter service <dfilter@FreeBSD.ORG>
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 02 Sep 2011 08:20:17 -0000

The following reply was made to PR kern/156933; it has been noted by GNATS.

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:  
Subject: Re: kern/156933: commit references a PR
Date: Fri,  2 Sep 2011 08:19:40 +0000 (UTC)

 Author: mm
 Date: Fri Sep  2 08:19:19 2011
 New Revision: 225326
 URL: http://svn.freebsd.org/changeset/base/225326
 
 Log:
   MFC r226155:
   
   Generalize ffs_pages_remove() into vn_pages_remove().
   
   Remove mapped pages for all dataset vnodes in zfs_rezget() using
   new vn_pages_remove() to fix mmapped files changed by
   zfs rollback or zfs receive -F.
   
   PR:		kern/160035, kern/156933
 
 Modified:
   stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c
   stable/8/sys/kern/vfs_vnops.c
   stable/8/sys/sys/vnode.h
   stable/8/sys/ufs/ffs/ffs_inode.c
 Directory Properties:
   stable/8/sys/   (props changed)
   stable/8/sys/amd64/include/xen/   (props changed)
   stable/8/sys/cddl/contrib/opensolaris/   (props changed)
   stable/8/sys/contrib/dev/acpica/   (props changed)
   stable/8/sys/contrib/pf/   (props changed)
 
 Modified: stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c
 ==============================================================================
 --- stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c	Fri Sep  2 08:15:48 2011	(r225325)
 +++ stable/8/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c	Fri Sep  2 08:19:19 2011	(r225326)
 @@ -1259,6 +1259,7 @@ zfs_rezget(znode_t *zp)
  	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
  	dmu_object_info_t doi;
  	dmu_buf_t *db;
 +	vnode_t *vp;
  	uint64_t obj_num = zp->z_id;
  	uint64_t mode, size;
  	sa_bulk_attr_t bulk[8];
 @@ -1334,8 +1335,9 @@ zfs_rezget(znode_t *zp)
  	 * that for example regular file was replaced with directory
  	 * which has the same object number.
  	 */
 -	if (ZTOV(zp) != NULL &&
 -	    ZTOV(zp)->v_type != IFTOVT((mode_t)zp->z_mode)) {
 +	vp = ZTOV(zp);
 +	if (vp != NULL &&
 +	    vp->v_type != IFTOVT((mode_t)zp->z_mode)) {
  		zfs_znode_dmu_fini(zp);
  		ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num);
  		return (EIO);
 @@ -1343,8 +1345,11 @@ zfs_rezget(znode_t *zp)
  
  	zp->z_unlinked = (zp->z_links == 0);
  	zp->z_blksz = doi.doi_data_block_size;
 -	if (zp->z_size != size && ZTOV(zp) != NULL)
 -		vnode_pager_setsize(ZTOV(zp), zp->z_size);
 +	if (vp != NULL) {
 +		vn_pages_remove(vp, 0, 0);
 +		if (zp->z_size != size)
 +			vnode_pager_setsize(vp, zp->z_size);
 +	}
  
  	ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num);
  
 
 Modified: stable/8/sys/kern/vfs_vnops.c
 ==============================================================================
 --- stable/8/sys/kern/vfs_vnops.c	Fri Sep  2 08:15:48 2011	(r225325)
 +++ stable/8/sys/kern/vfs_vnops.c	Fri Sep  2 08:19:19 2011	(r225326)
 @@ -63,6 +63,9 @@ __FBSDID("$FreeBSD$");
  
  #include <security/mac/mac_framework.h>
  
 +#include <vm/vm.h>
 +#include <vm/vm_object.h>
 +
  static fo_rdwr_t	vn_read;
  static fo_rdwr_t	vn_write;
  static fo_truncate_t	vn_truncate;
 @@ -1353,3 +1356,15 @@ vn_rlimit_fsize(const struct vnode *vp, 
  
  	return (0);
  }
 +
 +void
 +vn_pages_remove(struct vnode *vp, vm_pindex_t start, vm_pindex_t end)
 +{
 +	vm_object_t object;
 +
 +	if ((object = vp->v_object) == NULL)
 +		return;
 +	VM_OBJECT_LOCK(object);
 +	vm_object_page_remove(object, start, end, 0);
 +	VM_OBJECT_UNLOCK(object);
 +}
 
 Modified: stable/8/sys/sys/vnode.h
 ==============================================================================
 --- stable/8/sys/sys/vnode.h	Fri Sep  2 08:15:48 2011	(r225325)
 +++ stable/8/sys/sys/vnode.h	Fri Sep  2 08:19:19 2011	(r225326)
 @@ -644,6 +644,7 @@ int	_vn_lock(struct vnode *vp, int flags
  int	vn_open(struct nameidata *ndp, int *flagp, int cmode, struct file *fp);
  int	vn_open_cred(struct nameidata *ndp, int *flagp, int cmode,
  	    u_int vn_open_flags, struct ucred *cred, struct file *fp);
 +void	vn_pages_remove(struct vnode *vp, vm_pindex_t start, vm_pindex_t end);
  int	vn_pollrecord(struct vnode *vp, struct thread *p, int events);
  int	vn_rdwr(enum uio_rw rw, struct vnode *vp, void *base,
  	    int len, off_t offset, enum uio_seg segflg, int ioflg,
 
 Modified: stable/8/sys/ufs/ffs/ffs_inode.c
 ==============================================================================
 --- stable/8/sys/ufs/ffs/ffs_inode.c	Fri Sep  2 08:15:48 2011	(r225325)
 +++ stable/8/sys/ufs/ffs/ffs_inode.c	Fri Sep  2 08:19:19 2011	(r225326)
 @@ -129,18 +129,6 @@ ffs_update(vp, waitfor)
  	}
  }
  
 -static void
 -ffs_pages_remove(struct vnode *vp, vm_pindex_t start, vm_pindex_t end)
 -{
 -	vm_object_t object;
 -
 -	if ((object = vp->v_object) == NULL)
 -		return;
 -	VM_OBJECT_LOCK(object);
 -	vm_object_page_remove(object, start, end, FALSE);
 -	VM_OBJECT_UNLOCK(object);
 -}
 -
  #define	SINGLE	0	/* index of single indirect block */
  #define	DOUBLE	1	/* index of double indirect block */
  #define	TRIPLE	2	/* index of triple indirect block */
 @@ -218,7 +206,7 @@ ffs_truncate(vp, length, flags, cred, td
  			(void) chkdq(ip, -extblocks, NOCRED, 0);
  #endif
  			vinvalbuf(vp, V_ALT, 0, 0);
 -			ffs_pages_remove(vp,
 +			vn_pages_remove(vp,
  			    OFF_TO_IDX(lblktosize(fs, -extblocks)), 0);
  			ip->i_din2->di_extsize = 0;
  			for (i = 0; i < NXADDR; i++) {
 @@ -297,7 +285,7 @@ ffs_truncate(vp, length, flags, cred, td
  			ASSERT_VOP_LOCKED(vp, "ffs_truncate1");
  			vinvalbuf(vp, needextclean ? 0 : V_NORMAL, 0, 0);
  			if (!needextclean)
 -				ffs_pages_remove(vp, 0,
 +				vn_pages_remove(vp, 0,
  				    OFF_TO_IDX(lblktosize(fs, -extblocks)));
  			vnode_pager_setsize(vp, 0);
  			ip->i_flag |= IN_CHANGE | IN_UPDATE;
 _______________________________________________
 svn-src-all@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/svn-src-all
 To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
 

From owner-freebsd-fs@FreeBSD.ORG  Fri Sep  2 08:23:59 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 01895106566C;
	Fri,  2 Sep 2011 08:23:59 +0000 (UTC) (envelope-from mm@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id CD4208FC15;
	Fri,  2 Sep 2011 08:23:58 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p828Nwxj017853;
	Fri, 2 Sep 2011 08:23:58 GMT (envelope-from mm@freefall.freebsd.org)
Received: (from mm@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p828NvEc017849;
	Fri, 2 Sep 2011 08:23:57 GMT (envelope-from mm)
Date: Fri, 2 Sep 2011 08:23:57 GMT
Message-Id: <201109020823.p828NvEc017849@freefall.freebsd.org>
To: org_freebsd@L93.com, mm@FreeBSD.org, freebsd-fs@FreeBSD.org
From: mm@FreeBSD.org
Cc: 
Subject: Re: kern/156933: [zfs] ZFS receive after read on readonly=on
	filesystem is corrupted without warning
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 02 Sep 2011 08:23:59 -0000

Synopsis: [zfs] ZFS receive after read on readonly=on filesystem is corrupted without warning

State-Changed-From-To: open->closed
State-Changed-By: mm
State-Changed-When: Fri Sep 2 08:23:57 UTC 2011
State-Changed-Why: 
Resolved. Thanks!

http://www.freebsd.org/cgi/query-pr.cgi?pr=156933

From owner-freebsd-fs@FreeBSD.ORG  Fri Sep  2 08:24:09 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 815FB10656D1;
	Fri,  2 Sep 2011 08:24:09 +0000 (UTC) (envelope-from mm@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 59F328FC13;
	Fri,  2 Sep 2011 08:24:09 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p828O9qV017954;
	Fri, 2 Sep 2011 08:24:09 GMT (envelope-from mm@freefall.freebsd.org)
Received: (from mm@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p828O9JT017950;
	Fri, 2 Sep 2011 08:24:09 GMT (envelope-from mm)
Date: Fri, 2 Sep 2011 08:24:09 GMT
Message-Id: <201109020824.p828O9JT017950@freefall.freebsd.org>
To: mm@FreeBSD.org, mm@FreeBSD.org, freebsd-fs@FreeBSD.org
From: mm@FreeBSD.org
Cc: 
Subject: Re: kern/160035: [zfs] zfs rollback does not invalidate mmapped
	cache
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 02 Sep 2011 08:24:09 -0000

Synopsis: [zfs] zfs rollback does not invalidate mmapped cache

State-Changed-From-To: open->closed
State-Changed-By: mm
State-Changed-When: Fri Sep 2 08:24:08 UTC 2011
State-Changed-Why: 
Resolved. Thanks!

http://www.freebsd.org/cgi/query-pr.cgi?pr=160035
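
One way to check for the symptom named in the synopsis is to compare the
mmap(2) view of a file with the contents it is expected to have after a
rollback.  A minimal userland sketch; mmap_content_equals() is a
hypothetical helper, and the workflow (snapshot, modify the file, map it,
roll back, check again) is an assumption drawn from the PR title rather
than a test shipped with the fix:

```c
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the mmap(2) view of `path` is exactly
 * the `len` bytes at `expected`, 0 if size or content differ, -1 on error.
 */
int
mmap_content_equals(const char *path, const void *expected, size_t len)
{
	struct stat sb;
	void *map;
	int fd, rv;

	assert(path != NULL && expected != NULL);
	if ((fd = open(path, O_RDONLY)) == -1)
		return (-1);
	if (fstat(fd, &sb) == -1) {
		close(fd);
		return (-1);
	}
	if (sb.st_size < 0 || (size_t)sb.st_size != len) {
		close(fd);
		return (0);
	}
	if (len == 0) {
		close(fd);
		return (1);
	}
	map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);		/* the mapping stays valid after close */
	if (map == MAP_FAILED)
		return (-1);
	rv = (memcmp(map, expected, len) == 0);
	munmap(map, len);
	return (rv);
}
```

On an affected system, calling this after the rollback with the
snapshot's contents as `expected` would be expected to return 0, because
the stale pages are still cached; with the fix it should return 1.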

From owner-freebsd-fs@FreeBSD.ORG  Fri Sep  2 13:48:33 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id EBAB1106566C
	for <freebsd-fs@freebsd.org>; Fri,  2 Sep 2011 13:48:32 +0000 (UTC)
	(envelope-from joh.hendriks@gmail.com)
Received: from mail-ew0-f54.google.com (mail-ew0-f54.google.com
	[209.85.215.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 819A38FC13
	for <freebsd-fs@freebsd.org>; Fri,  2 Sep 2011 13:48:32 +0000 (UTC)
Received: by ewy1 with SMTP id 1so1841882ewy.13
	for <freebsd-fs@freebsd.org>; Fri, 02 Sep 2011 06:48:31 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=message-id:date:from:user-agent:mime-version:to:subject
	:content-type:content-transfer-encoding;
	bh=W/81gqZWQREkbFXIodYcCf1CCe2v3ifpq6HPYclGRB8=;
	b=UMVphNl3ZB850m03NDAF/dEau034mu4PW5LW4JMurbSZSi2SAmKLJTP6vcqSM8dCET
	XxUzYkCYceXAYWEh4orEc07tf3MNMpBg2WK5QFneorYnKxV/FNKdwskIE3vY2//p1iqQ
	qCDnt7dgARVSS3Xv9tpxX7P4jqgjwX8cwYIc4=
Received: by 10.213.31.75 with SMTP id x11mr170922ebc.6.1314970004457;
	Fri, 02 Sep 2011 06:26:44 -0700 (PDT)
Received: from [192.168.50.106] (double-l.xs4all.nl [80.126.205.144])
	by mx.google.com with ESMTPS id i6sm2111025eeb.11.2011.09.02.06.26.43
	(version=SSLv3 cipher=OTHER); Fri, 02 Sep 2011 06:26:43 -0700 (PDT)
Message-ID: <4E60D992.3030802@gmail.com>
Date: Fri, 02 Sep 2011 15:26:42 +0200
From: Johan Hendriks <joh.hendriks@gmail.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
	rv:6.0.1) Gecko/20110830 Thunderbird/6.0.1
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: ZFS on HAST and reboot.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 02 Sep 2011 13:48:33 -0000

Hello all.

I just started using ZFS on top of HAST.

What I did was first glabel my disks as disk1 to disk3.
Then I created my HAST resources in /etc/hast.conf.

/etc/hast.conf looks like this:

resource disk1 {
    on srv1 {
        local /dev/label/disk1
        remote 192.168.5.41
    }
    on srv2 {
        local /dev/label/disk1
        remote 192.168.5.40
    }
}
resource disk2 {
    on srv1 {
        local /dev/label/disk2
        remote 192.168.5.41
    }
    on srv2 {
        local /dev/label/disk2
        remote 192.168.5.40
    }
}
resource disk3 {
    on srv1 {
        local /dev/label/disk3
        remote 192.168.5.41
    }
    on srv2 {
        local /dev/label/disk3
        remote 192.168.5.40
    }
}

This works.
I can set srv1 to primary and srv2 to secondary and vice versa with
hastctl role primary all and hastctl role secondary all.

Then I created the raidz on the master, srv1:
zpool create storage raidz1 hast/disk1 hast/disk2 hast/disk3

All looks good.
zpool status
   pool: storage
  state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Wed Aug 31 20:49:19 2011
config:

         NAME            STATE     READ WRITE CKSUM
         storage         ONLINE       0     0     0
           raidz1-0      ONLINE       0     0     0
             hast/disk1  ONLINE       0     0     0
             hast/disk2  ONLINE       0     0     0
             hast/disk3  ONLINE       0     0     0

errors: No known data errors

Then I created the mountpoint and created a ZFS filesystem on it:
# mkdir /usr/local/virtual
# zfs create storage/virtual
# zfs list
# zfs set mountpoint=/usr/local/virtual storage/virtual

# /etc/rc.d/zfs start
And whoop, there is my /usr/local/virtual ZFS filesystem.
# mount
/dev/ada0p2 on / (ufs, local, journaled soft-updates)
devfs on /dev (devfs, local, multilabel)
storage on /storage (zfs, local, nfsv4acls)
storage/virtual on /usr/local/virtual (zfs, local, nfsv4acls)

If I do a zpool export -f storage on srv1, change the HAST role to 
secondary, then set the HAST role on srv2 to primary and do zpool 
import -f storage, I can see the files on srv2.

I am a happy camper :D

So it works as advertised.
Now I rebooted both machines.
All is working fine.

But if I reboot srv1 again, I cannot import the pool 
anymore; it tells me the pool is already imported.
I do load the carp-hast-switch master file with ifstated.
This does set the HAST role to primary,
but it cannot import the pool.
That may well be true, because I did not export it.
If I do a /etc/rc.d/zfs start, then it gets mounted and the pool is 
available again.

Is there a way I can do this automatically?
As I understand it, after a reboot ZFS tries to start but fails because 
my HAST providers are not yet ready.
Or am I doing something wrong, and should I not do it this way?
Can I tell ZFS to start after the HAST providers are primary at reboot?
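
The ordering problem described here comes down to waiting until the
/dev/hast/* device nodes exist before touching the pool.  A minimal
sketch of that wait loop, assuming the providers appear as
/dev/hast/disk1 to /dev/hast/disk3 once the node is primary;
wait_for_providers() is a hypothetical helper, not part of HAST or ZFS:

```c
#include <sys/stat.h>
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: poll once per second until every path in paths[]
 * exists, or until timeout_sec seconds have passed.
 * Returns 0 when all paths are present, -1 on timeout.
 */
int
wait_for_providers(const char *const *paths, int npaths, int timeout_sec)
{
	struct stat sb;
	int i, waited;

	assert(paths != NULL || npaths == 0);
	for (waited = 0; ; waited++) {
		for (i = 0; i < npaths; i++)
			if (stat(paths[i], &sb) == -1)
				break;		/* this provider is not there yet */
		if (i == npaths)
			return (0);
		if (waited >= timeout_sec)
			return (-1);
		sleep(1);
	}
}
```

A failover hook could run hastctl role primary all, call this with the
three /dev/hast paths, and only run the pool import and
/etc/rc.d/zfs start once it returns 0.  The same loop could just as well
live in a shell script started by ifstated.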

I hope I explained it correctly.
Thanks for your time.

regards
Johan Hendriks


From owner-freebsd-fs@FreeBSD.ORG  Fri Sep  2 15:07:41 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A8CE2106564A
	for <freebsd-fs@freebsd.org>; Fri,  2 Sep 2011 15:07:41 +0000 (UTC)
	(envelope-from trent@snakebite.org)
Received: from exchange.liveoffice.com (exchla3.liveoffice.com [64.70.67.188])
	by mx1.freebsd.org (Postfix) with ESMTP id 8D2D08FC08
	for <freebsd-fs@freebsd.org>; Fri,  2 Sep 2011 15:07:41 +0000 (UTC)
Received: from EXCASUM02.exchhosting.com (192.168.11.116) by
	exhub04.exchhosting.com (192.168.11.100) with Microsoft SMTP Server
	(TLS) id 8.2.213.0; Fri, 2 Sep 2011 08:07:38 -0700
Received: from [10.211.55.3] (35.11.55.172) by exchange.liveoffice.com
	(192.168.11.116) with Microsoft SMTP Server (TLS) id 8.2.213.0;
	Fri, 2 Sep 2011 08:07:38 -0700
Message-ID: <4E60F138.1000705@snakebite.org>
Date: Fri, 2 Sep 2011 11:07:36 -0400
From: Trent Nelson <trent@snakebite.org>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
	rv:5.0) Gecko/20110624 Thunderbird/5.0
MIME-Version: 1.0
To: <freebsd-fs@freebsd.org>
References: <F335600A-0364-455F-A276-43E23B0E597E@3geeks.org>
	<4E5F811A.2040307@snakebite.org>
	<7FAD4A4D-2465-4A80-A445-1D34424F8BB6@3geeks.org>
	<4E5FBE3E.7020706@snakebite.org>
	<553883C7-B97D-429F-AF4A-E208B6051B62@3geeks.org>
In-Reply-To: <553883C7-B97D-429F-AF4A-E208B6051B62@3geeks.org>
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: gptzfsboot and 4k sector raidz
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 02 Sep 2011 15:07:41 -0000

On 01-Sep-11 1:46 PM, Daniel Mayfield wrote:

>> Enough about sizes; what's your read/write performance like between
>> 512-byte/4K?  I didn't think to test performance in the 4K
>> configuration; I really wish I had, now.
>
> I didn't test performance.  I'm doing all the work running from the
> mfsBSD boot disc.  I'm not sure a simple 'dd' is a good test, but if
> you have suggestions, I'm open.

It's a good test when it shows you can't get more than 20-30MB/sec in 
bursts for each disk ;-)  I didn't think to try just dd'ing directly to 
the disk versus through a zfs pool; I think the results of that would be 
pretty conclusive with regard to whether or not using the new 4K-sector 
drives with 512-byte sectors is as bad as I'm seeing, or if it's 
something else.

     Trent.


From owner-freebsd-fs@FreeBSD.ORG  Fri Sep  2 16:34:33 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BE0CC106564A
	for <freebsd-fs@freebsd.org>; Fri,  2 Sep 2011 16:34:33 +0000 (UTC)
	(envelope-from radiomlodychbandytow@o2.pl)
Received: from tur.go2.pl (tur.go2.pl [193.17.41.50])
	by mx1.freebsd.org (Postfix) with ESMTP id 7BDCC8FC15
	for <freebsd-fs@freebsd.org>; Fri,  2 Sep 2011 16:34:33 +0000 (UTC)
Received: from moh2-ve2.go2.pl (moh2-ve2.go2.pl [193.17.41.200])
	by tur.go2.pl (Postfix) with ESMTP id 66319230C14
	for <freebsd-fs@freebsd.org>; Fri,  2 Sep 2011 18:16:17 +0200 (CEST)
Received: from moh2-ve2.go2.pl (unknown [10.0.0.200])
	by moh2-ve2.go2.pl (Postfix) with ESMTP id 355D6B00169
	for <freebsd-fs@freebsd.org>; Fri,  2 Sep 2011 18:16:15 +0200 (CEST)
Received: from unknown (unknown [10.0.0.42])
	by moh2-ve2.go2.pl (Postfix) with SMTP
	for <freebsd-fs@freebsd.org>; Fri,  2 Sep 2011 18:16:14 +0200 (CEST)
Received: from host892524678.com-promis.3s.pl [89.25.246.78]
	by poczta.o2.pl with ESMTP id bfXQjd; Fri, 02 Sep 2011 18:16:14 +0200
Message-ID: <4E61014B.7080100@o2.pl>
Date: Fri, 02 Sep 2011 18:16:11 +0200
From: =?UTF-8?B?UmFkaW8gbcWCb2R5Y2ggYmFuZHl0w7N3?= <radiomlodychbandytow@o2.pl>
User-Agent: Mozilla/5.0 (Windows NT 5.2; WOW64;
	rv:6.0.1) Gecko/20110830 Thunderbird/6.0.1
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-O2-Trust: 2, 61
X-O2-SPF: neutral
Subject: [ZFS] lzjb_uncompress possible access violation?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 02 Sep 2011 16:34:33 -0000

As far as I can see, when checksumming is turned off or there's a 
checksum collision, lzjb_uncompress can be fed corrupted data. The 
source length is ignored entirely, and since the source has to be 
shorter than the destination, a corrupted stream can make the 
decompressor read past the end of the source buffer.
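
To illustrate the kind of check I mean, here is a minimal Python sketch of
an LZJB-style decompressor that validates the source length on every read.
It is based on my reading of the format (a copymap byte followed by eight
items, each either a literal byte or a two-byte match); the constants and
the function name lzjb_decompress_checked are assumptions for illustration
and this is not the code in the tree.

```python
NBBY = 8
MATCH_BITS = 6
MATCH_MIN = 3
OFFSET_MASK = (1 << (16 - MATCH_BITS)) - 1

def lzjb_decompress_checked(src, d_len):
    """Decompress src into exactly d_len bytes, never reading past src."""
    dst = bytearray()
    pos = 0
    copymask = 1 << (NBBY - 1)
    copymap = 0
    while len(dst) < d_len:
        copymask <<= 1
        if copymask == (1 << NBBY):
            # A new copymap byte is needed: check it is really there.
            if pos >= len(src):
                raise ValueError("source exhausted while reading copymap")
            copymask = 1
            copymap = src[pos]
            pos += 1
        if copymap & copymask:
            # Match item: two bytes holding length and back-offset.
            if pos + 2 > len(src):
                raise ValueError("source exhausted while reading match")
            mlen = (src[pos] >> (NBBY - MATCH_BITS)) + MATCH_MIN
            offset = ((src[pos] << NBBY) | src[pos + 1]) & OFFSET_MASK
            pos += 2
            # Rejecting offset 0 is an extra check added in this sketch.
            if offset == 0 or offset > len(dst):
                raise ValueError("match offset outside decompressed data")
            cpy = len(dst) - offset
            while mlen > 0 and len(dst) < d_len:
                dst.append(dst[cpy])
                cpy += 1
                mlen -= 1
        else:
            # Literal item: one byte copied as-is.
            if pos >= len(src):
                raise ValueError("source exhausted while reading literal")
            dst.append(src[pos])
            pos += 1
    return bytes(dst)
```

With a hand-built stream of copymap 0x08, the literals "abc" and the match
bytes 0x00 0x03, this yields "abcabc"; dropping the last source byte makes
it raise instead of reading beyond the buffer.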

-- 
Twoje radio


From owner-freebsd-fs@FreeBSD.ORG  Fri Sep  2 16:54:04 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D2DBF106564A
	for <freebsd-fs@freebsd.org>; Fri,  2 Sep 2011 16:54:04 +0000 (UTC)
	(envelope-from brodbd@uw.edu)
Received: from mail-ew0-f54.google.com (mail-ew0-f54.google.com
	[209.85.215.54])
	by mx1.freebsd.org (Postfix) with ESMTP id 72DC68FC0C
	for <freebsd-fs@freebsd.org>; Fri,  2 Sep 2011 16:54:04 +0000 (UTC)
Received: by ewy1 with SMTP id 1so1935806ewy.13
	for <freebsd-fs@freebsd.org>; Fri, 02 Sep 2011 09:54:03 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.213.17.140 with SMTP id s12mr492906eba.111.1314980973922; Fri,
	02 Sep 2011 09:29:33 -0700 (PDT)
Received: by 10.213.22.210 with HTTP; Fri, 2 Sep 2011 09:29:33 -0700 (PDT)
Date: Fri, 2 Sep 2011 09:29:33 -0700
Message-ID: <CAHHaOuY=BEMrhYuzXtD5AtXG7niLXEO1yhO5P4EimcsLuTrLXw@mail.gmail.com>
From: David Brodbeck <brodbd@uw.edu>
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Subject: ZFSv28+NFSv4 poor file creation performance,
	"sync=disabled" has no effect
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 02 Sep 2011 16:54:05 -0000

I originally posted this on FreeBSD-questions, but it was suggested that I
bring it here.
I'm testing FreeBSD 9.0-BETA with an eye toward eventually using
FreeBSD 9.0 to replace some existing OpenSolaris 2008.11
installations.  I've found NFS file creation performance (as measured
by Bonnie++) is equally slow for both with default settings.  However,
on OpenSolaris I disable the ZIL to improve file creation performance.
This tuning parameter was removed from FreeBSD 9.0; its replacement
is supposed to be the per-filesystem flag "sync", but setting this
flag seems to have no effect.

I did recompile the FreeBSD kernel without debugging features before
doing the tests, so I don't think this is a case of debugging code
slowing things down.

Here's the relevant data; these are all from bonnie++'s "sequential
create" benchmark.  The NFS client was Red Hat Enterprise Linux 5.6.

OpenSolaris 2008.11, default settings: 58/second
OpenSolaris 2008.11, with "zil_disable=1": 1258/second

FreeBSD 9.0-BETA, default settings: 107/second
FreeBSD 9.0-BETA, with "sync=disabled": 106/second


So it appears the "sync" ZFS parameter has no effect in FreeBSD.  Has
anyone else seen this?  Is there a way to improve NFS file creation
performance now that zil_disable has been removed?
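
To put the figures above in proportion, the short Python sketch below
computes the ratio of tuned to default creation rate for each system. The
numbers are the ones quoted above; the helper name ratio is just for
illustration.

```python
def ratio(default_rate, tuned_rate):
    """Return how many times faster the tuned run was than the default."""
    return tuned_rate / default_rate

# (default creates/second, tuned creates/second) as quoted above
results = {
    "OpenSolaris 2008.11, zil_disable=1": (58, 1258),
    "FreeBSD 9.0-BETA, sync=disabled": (107, 106),
}

for label, (default_rate, tuned_rate) in results.items():
    print(f"{label}: {ratio(default_rate, tuned_rate):.2f}x")
```

That is roughly a 21.7x improvement on OpenSolaris against essentially no
change (about 0.99x) on FreeBSD.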

-- 
David Brodbeck
System Administrator, Linguistics
University of Washington

From owner-freebsd-fs@FreeBSD.ORG  Fri Sep  2 23:40:31 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E78071065674;
	Fri,  2 Sep 2011 23:40:31 +0000 (UTC)
	(envelope-from linimon@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id BEF5B8FC13;
	Fri,  2 Sep 2011 23:40:31 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p82NeV7v061962;
	Fri, 2 Sep 2011 23:40:31 GMT
	(envelope-from linimon@freefall.freebsd.org)
Received: (from linimon@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p82NeVm8061952;
	Fri, 2 Sep 2011 23:40:31 GMT (envelope-from linimon)
Date: Fri, 2 Sep 2011 23:40:31 GMT
Message-Id: <201109022340.p82NeVm8061952@freefall.freebsd.org>
To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org
From: linimon@FreeBSD.org
Cc: 
Subject: Re: kern/160410: [smbfs] [hang] smbfs hangs when transferring large
	files
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 02 Sep 2011 23:40:32 -0000

Old Synopsis: smbfs hangs when transferring large files
New Synopsis: [smbfs] [hang] smbfs hangs when transferring large files

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Fri Sep 2 23:40:19 UTC 2011
Responsible-Changed-Why: 
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=160410

From owner-freebsd-fs@FreeBSD.ORG  Sat Sep  3 01:36:11 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 64E75106566C
	for <freebsd-fs@freebsd.org>; Sat,  3 Sep 2011 01:36:11 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
	[131.104.91.44])
	by mx1.freebsd.org (Postfix) with ESMTP id 219438FC12
	for <freebsd-fs@freebsd.org>; Sat,  3 Sep 2011 01:36:10 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AqAEACWEYU6DaFvO/2dsb2JhbABCDoQ/pRaBRgEBAQECAQEBASArIAsFFg4KAgINGQIpAQkmBggHBAEcBIdSBKVGkWeBLIQtgREEkRqCEpAiM1Q
X-IronPort-AV: E=Sophos;i="4.68,322,1312171200"; d="scan'208";a="136405582"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 02 Sep 2011 21:36:10 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 43455B3F80;
	Fri,  2 Sep 2011 21:36:10 -0400 (EDT)
Date: Fri, 2 Sep 2011 21:36:10 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: David Brodbeck <brodbd@uw.edu>
Message-ID: <14220705.747900.1315013770239.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <CAHHaOuY=BEMrhYuzXtD5AtXG7niLXEO1yhO5P4EimcsLuTrLXw@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.202]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692)
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFSv28+NFSv4 poor file creation performance,	"sync=disabled"
 has no effect
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 03 Sep 2011 01:36:11 -0000

David Brodbeck wrote:
> I originally posted this on FreeBSD-questions, but it was suggested that I
> bring it here.
> I'm testing FreeBSD 9.0-BETA with an eye toward eventually using
> FreeBSD 9.0 to replace some existing OpenSolaris 2008.11
> installations. I've found NFS file creation performance (as measured
> by Bonnie++) is equally slow for both with default settings. However,
> on OpenSolaris I disable the ZIL to improve file creation performance.

I know nothing about ZFS, so all I can do is pass along what others
have said in previous posts. (I'd suggest you look through the freebsd-fs@
archives.)

One post explained how disabling the ZIL can result in up to 5 seconds' worth
of changes being lost if/when the server crashes. (A lot can change on a file
system in 5 seconds. The NFS protocol assumes all fs changes related to a file
creation are done before the server replies to the RPC. As such, disabling the
ZIL does violate the protocol specs and means you are living dangerously.)
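
As a rough illustration of what "done before the server replies" means, here
is a Python sketch of a create that only returns once both the file and its
directory entry have been flushed. It assumes a POSIX-like system where
fsync() on a directory descriptor is allowed; the function name
durable_create is hypothetical and this is not how the NFS server itself is
written.

```python
import os
import tempfile

def durable_create(directory, name, data=b""):
    """Create directory/name with data; return only after it is synced."""
    path = os.path.join(directory, name)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            # os.write may accept only part of the buffer.
            written = os.write(fd, view)
            view = view[written:]
        # Flush the file's data and metadata.
        os.fsync(fd)
    finally:
        os.close(fd)
    # Flush the directory so the new name itself survives a crash.
    dirfd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)
    return path

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(durable_create(scratch, "example.txt", b"hello\n"))
```

Only after both fsync() calls return would a server in this model be
entitled to acknowledge the create; waiting for them is where the per-file
cost comes from.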

> This tuning parameter was removed from FreeBSD 9.0; its replacement
> is supposed to be the per-filesystem flag "sync", but setting this
> flag seems to have no effect.
> 
> I did recompile the FreeBSD kernel without debugging features before
> doing the tests, so I don't think this is a case of debugging code
> slowing things down.
> 
> Here's the relevant data; these are all from bonnie++'s "sequential
> create" benchmark. The NFS client was Red Hat Enterprise Linux 5.6.
> 
> OpenSolaris 2008.11, default settings: 58/second
> OpenSolaris 2008.11, with "zil_disable=1": 1258/second
> 
> FreeBSD 9.0-BETA, default settings: 107/second
> FreeBSD 9.0-BETA, with "sync=disabled": 106/second
> 
> 
> So it appears the "sync" ZFS parameter has no effect in FreeBSD. Has
> anyone else seen this? Is there a way to improve NFS file creation
> performance now that zil_disable has been removed?
> 
Some have reported good results from putting the ZIL on a dedicated
device. Some use an SSD, but there are write bandwidth issues related
to this. If I understood them correctly, you need to make sure that the
SSD is designed to provide good write performance and should be configured
to a much larger size than what the ZIL would use. Again, there were
good threads discussing this on freebsd-fs@. I have never used either
ZFS or an SSD, so the above is just paraphrasing my understanding from
these threads and could be way off base. (I do know that NFS clients
expect the changes related to file creation to be stored on non-volatile
storage before the server replies to the creation RPC.)

> --
> David Brodbeck
> System Administrator, Linguistics
> University of Washington