From owner-freebsd-fs@FreeBSD.ORG Sun May 8 07:26:45 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 667A81065674 for ; Sun, 8 May 2011 07:26:45 +0000 (UTC) (envelope-from freebsd@psconsult.nl) Received: from mx1.psconsult.nl (unknown [IPv6:2001:7b8:30f:e0::5059:ee8a]) by mx1.freebsd.org (Postfix) with ESMTP id 1A3A28FC1A for ; Sun, 8 May 2011 07:26:44 +0000 (UTC) Received: from mx1.psconsult.nl (psc11.adsl.iaf.nl [80.89.238.138]) by mx1.psconsult.nl (8.14.4/8.14.4) with ESMTP id p487Qbpk005463 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Sun, 8 May 2011 09:26:43 +0200 (CEST) (envelope-from freebsd@psconsult.nl) Received: (from paul@localhost) by mx1.psconsult.nl (8.14.4/8.14.4/Submit) id p487Qb1a005462 for freebsd-fs@freebsd.org; Sun, 8 May 2011 09:26:37 +0200 (CEST) (envelope-from freebsd@psconsult.nl) X-Authentication-Warning: mx1.psconsult.nl: paul set sender to freebsd@psconsult.nl using -f Date: Sun, 8 May 2011 09:26:37 +0200 From: Paul Schenkeveld To: freebsd-fs@freebsd.org Message-ID: <20110508072637.GA5123@psconsult.nl> References: <210021304745658@web53.yandex.ru> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <210021304745658@web53.yandex.ru> User-Agent: Mutt/1.5.21 (2010-09-15) Subject: Re: ZFS can't mount filesystem X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 08 May 2011 07:26:45 -0000 On Sat, May 07, 2011 at 09:20:57AM +0400, Igor Zabelin wrote: > Hi, > > I have trouble with ZFS. One of the filesystems can't be mounted. > zpool scrub is not doing anything > ZFS reports an error when getting the properties. > SMART extended offline test for each disk completed without error. > Is it possible to recover the data? Or mount it ignoring errors? > > FreeBSD 8.2-RELEASE > > ZFS reports an error when getting the properties. > > # zfs get all tank/var > > [skip normal output] > internal error: unable to get version property > internal error: unable to get utf8only property > internal error: unable to get normalization property > internal error: unable to get casesensitivity property > [skip normal output] > > # zpool status -v tank > pool: tank > state: ONLINE > status: One or more devices has experienced an error resulting in data > corruption. Applications may be affected. > action: Restore the file in question if possible. Otherwise restore the > entire pool from backup. > see: http://www.sun.com/msg/ZFS-8000-8A > scrub: scrub stopped after 0h0m with 0 errors on Sat May 7 08:09:35 2011 > config: > > NAME STATE READ WRITE CKSUM > tank ONLINE 0 0 36 > raidz1 ONLINE 0 0 144 > gpt/disk5 ONLINE 0 0 0 > gpt/disk6 ONLINE 0 0 0 > gpt/disk7 ONLINE 0 0 0 > > errors: Permanent errors have been detected in the following files: > > tank/var:<0x0> The 'status:' line indicates that (previous) problems with the pool left tank/var unusable because ZFS encountered checksum errors that could not be repaired, i.e. because not enough replicas were available. The story doesn't tell the cause of this problem. If the problem was with the drives, cables or controller, the evidence is in the messages file, but that is probably on the affected dataset, unless you have backups or send syslog to a syslog server. 
Disk problems are not likely the cause because you're using raidz1, so at least two disks must have had problems to cause this, and SMART apparently did not report drive problems either. Other causes for your problems include: - blocks on disk were overwritten for whatever reason - power failure + write-back cache but no battery backup - problems with the mobo/processor/memory I think your chances of recovering tank/var are very slim, but if you 'zfs destroy tank/var' (using -r if there are snapshots) you can probably save the rest of your pool. If you have sub-datasets under tank/var you could first *try* to move them to another parent like 'zfs rename tank/var/log tank/rescued_var_log', but the damage to tank/var will probably prevent that. As always, sysadmin rule #1 applies here: backups, backups, backups. HTH Paul Schenkeveld From owner-freebsd-fs@FreeBSD.ORG Mon May 9 10:42:45 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4E4EE106564A for ; Mon, 9 May 2011 10:42:45 +0000 (UTC) (envelope-from rs@bytecamp.net) Received: from mail.bytecamp.net (mail.bytecamp.net [212.204.60.9]) by mx1.freebsd.org (Postfix) with ESMTP id 8B6F78FC14 for ; Mon, 9 May 2011 10:42:44 +0000 (UTC) Received: (qmail 94912 invoked by uid 89); 9 May 2011 12:42:42 +0200 Received: from stella.bytecamp.net (HELO ?212.204.60.37?) (rs%bytecamp.net@212.204.60.37) by mail.bytecamp.net with CAMELLIA256-SHA encrypted SMTP; 9 May 2011 12:42:42 +0200 Message-ID: <4DC7C522.3070601@bytecamp.net> Date: Mon, 09 May 2011 12:42:42 +0200 From: Robert Schulze Organization: bytecamp GmbH User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.17) Gecko/20110424 Thunderbird/3.1.10 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <4DC13260.4020905@bytecamp.net> <20110504115540.GA88625@icarus.home.lan> In-Reply-To: <20110504115540.GA88625@icarus.home.lan> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: zfs/zpool upgrade required? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 May 2011 10:42:45 -0000 Hi, Am 04.05.2011 13:55, schrieb Jeremy Chadwick: >> Is it _required_ to upgrade existing pools and filesystems or can >> that be done anytime later? > > - It can be done later, though by not upgrading you lose the ability to > use newer features. well, the features are not the reason to upgrade the kernel. One more question: can the pools and filesystems be upgraded while there is load on them, or should this procedure be done "offline", i.e. without any clients accessing the machine? 
with kind regards, Robert Schulze From owner-freebsd-fs@FreeBSD.ORG Mon May 9 10:52:58 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5AA12106564A for ; Mon, 9 May 2011 10:52:58 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta10.westchester.pa.mail.comcast.net (qmta10.westchester.pa.mail.comcast.net [76.96.62.17]) by mx1.freebsd.org (Postfix) with ESMTP id 08E9E8FC13 for ; Mon, 9 May 2011 10:52:57 +0000 (UTC) Received: from omta16.westchester.pa.mail.comcast.net ([76.96.62.88]) by qmta10.westchester.pa.mail.comcast.net with comcast id hNki1g0011uE5Es5ANsysu; Mon, 09 May 2011 10:52:58 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta16.westchester.pa.mail.comcast.net with comcast id hNsr1g00R1t3BNj3cNsvGs; Mon, 09 May 2011 10:52:57 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id B9BA7102C19; Mon, 9 May 2011 03:52:49 -0700 (PDT) Date: Mon, 9 May 2011 03:52:49 -0700 From: Jeremy Chadwick To: Robert Schulze Message-ID: <20110509105249.GA58361@icarus.home.lan> References: <4DC13260.4020905@bytecamp.net> <20110504115540.GA88625@icarus.home.lan> <4DC7C522.3070601@bytecamp.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4DC7C522.3070601@bytecamp.net> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: zfs/zpool upgrade required? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 May 2011 10:52:58 -0000 On Mon, May 09, 2011 at 12:42:42PM +0200, Robert Schulze wrote: > Am 04.05.2011 13:55, schrieb Jeremy Chadwick: > >>Is it _required_ to upgrade existing pools and filesystems or can > >>that be done anytime later? > > > >- It can be done later, though by not upgrading you lose the ability to > >use newer features. > > well, the features are not the reason to upgrade the kernel. One > more question: could the pools and filesystems be upgraded while > there is pressure on them, or shall this procedure be made > "offline", i.e. without any clients accessing the machine? It can be done while I/O requests are being handled. Depending on pool size, amount of data, etc. the "upgrade" commands may take some time, though. If you're worried about customer/client impact, you'd best schedule downtime/a maintenance window. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Mon May 9 11:07:05 2011 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3ED941065672 for ; Mon, 9 May 2011 11:07:05 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 254F68FC22 for ; Mon, 9 May 2011 11:07:05 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p49B75LP070609 for ; Mon, 9 May 2011 11:07:05 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p49B74r6070607 for freebsd-fs@FreeBSD.org; Mon, 9 May 2011 11:07:04 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 9 May 2011 11:07:04 GMT Message-Id: <201105091107.p49B74r6070607@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-fs@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 09 May 2011 11:07:05 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. 
Description -------------------------------------------------------------------------------- o kern/156797 fs [zfs] [panic] Double panic with FreeBSD 9-CURRENT and o kern/156781 fs [zfs] zfs is losing the snapshot directory, p kern/156545 fs [ufs] mv could break UFS on SMP systems o kern/156193 fs [ufs] [hang] UFS snapshot hangs && deadlocks processes o kern/156168 fs [nfs] [panic] Kernel panic under concurrent access ove o kern/156039 fs [nullfs] [unionfs] nullfs + unionfs do not compose, re o kern/155615 fs [zfs] zfs v28 broken on sparc64 -current o kern/155587 fs [zfs] [panic] kernel panic with zfs o kern/155484 fs [ufs] GPT + UFS boot don't work well together o kern/155411 fs [regression] [8.2-release] [tmpfs]: mount: tmpfs : No o kern/155199 fs [ext2fs] ext3fs mounted as ext2fs gives I/O errors o bin/155104 fs [zfs][patch] use /dev prefix by default when importing o kern/154930 fs [zfs] cannot delete/unlink file from full volume -> EN o kern/154828 fs [msdosfs] Unable to create directories on external USB o kern/154491 fs [smbfs] smb_co_lock: recursive lock for object 1 o kern/154447 fs [zfs] [panic] Occasional panics - solaris assert somew p kern/154228 fs [md] md getting stuck in wdrain state o kern/153996 fs [zfs] zfs root mount error while kernel is not located o kern/153847 fs [nfs] [panic] Kernel panic from incorrect m_free in nf o kern/153753 fs [zfs] ZFS v15 - grammatical error when attempting to u o kern/153716 fs [zfs] zpool scrub time remaining is incorrect o kern/153695 fs [patch] [zfs] Booting from zpool created on 4k-sector o kern/153680 fs [xfs] 8.1 failing to mount XFS partitions o kern/153520 fs [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable o kern/153418 fs [zfs] [panic] Kernel Panic occurred writing to zfs vol o kern/153351 fs [zfs] locking directories/files in ZFS o bin/153258 fs [patch][zfs] creating ZVOLs requires `refreservation' s kern/153173 fs [zfs] booting from a gzip-compressed dataset doesn't w o kern/153126 fs [zfs] vdev failure, zpool=peegel type=vdev.too_small p kern/152488 fs [tmpfs] [patch] mtime of file updated when only inode o kern/152022 fs [nfs] nfs service hangs with linux client [regression] o kern/151942 fs [zfs] panic during ls(1) zfs snapshot directory o kern/151905 fs [zfs] page fault under load in /sbin/zfs o kern/151845 fs [smbfs] [patch] smbfs should be upgraded to support Un o bin/151713 fs [patch] Bug in growfs(8) with respect to 32-bit overfl o kern/151648 fs [zfs] disk wait bug o kern/151629 fs [fs] [patch] Skip empty directory entries during name o kern/151330 fs [zfs] will unshare all zfs filesystem after execute a o kern/151326 fs [nfs] nfs exports fail if netgroups contain duplicate o kern/151251 fs [ufs] Can not create files on filesystem with heavy us o kern/151226 fs [zfs] can't delete zfs snapshot o kern/151111 fs [zfs] vnodes leakage during zfs unmount o kern/150503 fs [zfs] ZFS disks are UNAVAIL and corrupted after reboot o kern/150501 fs [zfs] ZFS vdev failure vdev.bad_label on amd64 o kern/150390 fs [zfs] zfs deadlock when arcmsr reports drive faulted o kern/150336 fs [nfs] mountd/nfsd became confused; refused to reload n o kern/150207 fs zpool(1): zpool import -d /dev tries to open weird dev o kern/149208 fs mksnap_ffs(8) hang/deadlock o kern/149173 fs [patch] [zfs] make OpenSolaris installa o kern/149015 fs [zfs] [patch] misc fixes for ZFS code to build on Glib o kern/149014 fs [zfs] [patch] declarations in ZFS libraries/utilities o kern/149013 fs [zfs] [patch] make ZFS makefiles use the libraries fro o 
kern/148504 fs [zfs] ZFS' zpool does not allow replacing drives to be o kern/148490 fs [zfs]: zpool attach - resilver bidirectionally, and re o kern/148368 fs [zfs] ZFS hanging forever on 8.1-PRERELEASE o bin/148296 fs [zfs] [loader] [patch] Very slow probe in /usr/src/sys o kern/148204 fs [nfs] UDP NFS causes overload o kern/148138 fs [zfs] zfs raidz pool commands freeze o kern/147903 fs [zfs] [panic] Kernel panics on faulty zfs device o kern/147881 fs [zfs] [patch] ZFS "sharenfs" doesn't allow different " o kern/147790 fs [zfs] zfs set acl(mode|inherit) fails on existing zfs o kern/147560 fs [zfs] [boot] Booting 8.1-PRERELEASE raidz system take o kern/147420 fs [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt o kern/146941 fs [zfs] [panic] Kernel Double Fault - Happens constantly o kern/146786 fs [zfs] zpool import hangs with checksum errors o kern/146708 fs [ufs] [panic] Kernel panic in softdep_disk_write_compl o kern/146528 fs [zfs] Severe memory leak in ZFS on i386 o kern/146502 fs [nfs] FreeBSD 8 NFS Client Connection to Server s kern/145712 fs [zfs] cannot offline two drives in a raidz2 configurat o kern/145411 fs [xfs] [panic] Kernel panics shortly after mounting an o bin/145309 fs bsdlabel: Editing disk label invalidates the whole dev o kern/145272 fs [zfs] [panic] Panic during boot when accessing zfs on o kern/145246 fs [ufs] dirhash in 7.3 gratuitously frees hashes when it o kern/145238 fs [zfs] [panic] kernel panic on zpool clear tank o kern/145229 fs [zfs] Vast differences in ZFS ARC behavior between 8.0 o kern/145189 fs [nfs] nfsd performs abysmally under load o kern/144929 fs [ufs] [lor] vfs_bio.c + ufs_dirhash.c p kern/144447 fs [zfs] sharenfs fsunshare() & fsshare_main() non functi o kern/144416 fs [panic] Kernel panic on online filesystem optimization s kern/144415 fs [zfs] [panic] kernel panics on boot after zfs crash o kern/144234 fs [zfs] Cannot boot machine with recent gptzfsboot code o kern/143825 fs [nfs] [panic] Kernel panic on NFS client o bin/143572 fs [zfs] zpool(1): [patch] The verbose output from iostat o kern/143212 fs [nfs] NFSv4 client strange work ... o kern/143184 fs [zfs] [lor] zfs/bufwait LOR o kern/142914 fs [zfs] ZFS performance degradation over time o kern/142878 fs [zfs] [vfs] lock order reversal o kern/142597 fs [ext2fs] ext2fs does not work on filesystems with real o kern/142489 fs [zfs] [lor] allproc/zfs LOR o kern/142466 fs Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re o kern/142306 fs [zfs] [panic] ZFS drive (from OSX Leopard) causes two o kern/142068 fs [ufs] BSD labels are got deleted spontaneously o kern/141897 fs [msdosfs] [panic] Kernel panic. 
msdofs: file name leng o kern/141463 fs [nfs] [panic] Frequent kernel panics after upgrade fro o kern/141305 fs [zfs] FreeBSD ZFS+sendfile severe performance issues ( o kern/141091 fs [patch] [nullfs] fix panics with DIAGNOSTIC enabled o kern/141086 fs [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS o kern/141010 fs [zfs] "zfs scrub" fails when backed by files in UFS2 o kern/140888 fs [zfs] boot fail from zfs root while the pool resilveri o kern/140661 fs [zfs] [patch] /boot/loader fails to work on a GPT/ZFS- o kern/140640 fs [zfs] snapshot crash o kern/140134 fs [msdosfs] write and fsck destroy filesystem integrity o kern/140068 fs [smbfs] [patch] smbfs does not allow semicolon in file o kern/139725 fs [zfs] zdb(1) dumps core on i386 when examining zpool c o kern/139715 fs [zfs] vfs.numvnodes leak on busy zfs p bin/139651 fs [nfs] mount(8): read-only remount of NFS volume does n o kern/139597 fs [patch] [tmpfs] tmpfs initializes va_gen but doesn't u o kern/139564 fs [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo o kern/139407 fs [smbfs] [panic] smb mount causes system crash if remot o kern/138662 fs [panic] ffs_blkfree: freeing free block o kern/138421 fs [ufs] [patch] remove UFS label limitations o kern/138202 fs mount_msdosfs(1) see only 2Gb o kern/136968 fs [ufs] [lor] ufs/bufwait/ufs (open) o kern/136945 fs [ufs] [lor] filedesc structure/ufs (poll) o kern/136944 fs [ffs] [lor] bufwait/snaplk (fsync) o kern/136873 fs [ntfs] Missing directories/files on NTFS volume o kern/136865 fs [nfs] [patch] NFS exports atomic and on-the-fly atomic p kern/136470 fs [nfs] Cannot mount / in read-only, over NFS o kern/135546 fs [zfs] zfs.ko module doesn't ignore zpool.cache filenam o kern/135469 fs [ufs] [panic] kernel crash on md operation in ufs_dirb o kern/135050 fs [zfs] ZFS clears/hides disk errors on reboot o kern/134491 fs [zfs] Hot spares are rather cold... 
o kern/133676 fs [smbfs] [panic] umount -f'ing a vnode-based memory dis o kern/133174 fs [msdosfs] [patch] msdosfs must support utf-encoded int o kern/132960 fs [ufs] [panic] panic:ffs_blkfree: freeing free frag o kern/132397 fs reboot causes filesystem corruption (failure to sync b o kern/132331 fs [ufs] [lor] LOR ufs and syncer o kern/132237 fs [msdosfs] msdosfs has problems to read MSDOS Floppy o kern/132145 fs [panic] File System Hard Crashes o kern/131441 fs [unionfs] [nullfs] unionfs and/or nullfs not combineab o kern/131360 fs [nfs] poor scaling behavior of the NFS server under lo o kern/131342 fs [nfs] mounting/unmounting of disks causes NFS to fail o bin/131341 fs makefs: error "Bad file descriptor" on the mount poin o kern/130920 fs [msdosfs] cp(1) takes 100% CPU time while copying file o kern/130210 fs [nullfs] Error by check nullfs o kern/129760 fs [nfs] after 'umount -f' of a stale NFS share FreeBSD l o kern/129488 fs [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: o kern/129231 fs [ufs] [patch] New UFS mount (norandom) option - mostly o kern/129152 fs [panic] non-userfriendly panic when trying to mount(8) o kern/127787 fs [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs o bin/127270 fs fsck_msdosfs(8) may crash if BytesPerSec is zero o kern/127029 fs [panic] mount(8): trying to mount a write protected zi o kern/126287 fs [ufs] [panic] Kernel panics while mounting an UFS file o kern/125895 fs [ffs] [panic] kernel: panic: ffs_blkfree: freeing free s kern/125738 fs [zfs] [request] SHA256 acceleration in ZFS o kern/123939 fs [msdosfs] corrupts new files o kern/122380 fs [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash o bin/122172 fs [fs]: amd(8) automount daemon dies on 6.3-STABLE i386, o bin/121898 fs [nullfs] pwd(1)/getcwd(2) fails with Permission denied o bin/121366 fs [zfs] [patch] Automatic disk scrubbing from periodic(8 o bin/121072 fs [smbfs] mount_smbfs(8) cannot normally convert the cha f kern/120991 fs [panic] [ffs] [snapshot] System crashes when manipulat o kern/120483 fs [ntfs] [patch] NTFS filesystem locking changes o kern/120482 fs [ntfs] [patch] Sync style changes between NetBSD and F o kern/118912 fs [2tb] disk sizing/geometry problem with large array o kern/118713 fs [minidump] [patch] Display media size required for a k o bin/118249 fs [ufs] mv(1): moving a directory changes its mtime o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o kern/117954 fs [ufs] dirhash on very large directories blocks the mac o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117314 fs [ntfs] Long-filename only NTFS fs'es cause kernel pani o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with o kern/116583 fs [ffs] [hang] System freezes for short time when using f kern/116170 fs [panic] Kernel panic when mounting /tmp o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use 
getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs [msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o kern/109024 fs [msdosfs] [iconv] mount_msdosfs: msdosfs_iconv: Operat o kern/109010 fs [msdosfs] can't mv directory within fat32 file system o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro o kern/106030 fs [ufs] [panic] panic in ufs from geom when a dead disk o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist o kern/104133 fs [ext2fs] EXT2FS module corrupts EXT2/3 filesystems o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes s bin/97498 fs [request] newfs(8) has no option to clear the first 12 o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean' o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/88657 fs [smbfs] windows client hang when browsing a samba shar o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64 o kern/88266 fs [smbfs] smbfs does not implement UIO_NOCOPY and sendfi o bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to enabl o kern/87859 fs [smbfs] System reboot while umount smbfs. o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc. o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o bin/74779 fs Background-fsck checks one filesystem twice and omits o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/51583 fs [nullfs] [patch] allow to work with devices and socket o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o kern/33464 fs [ufs] soft update inconsistencies after system crash o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t 223 problems total. 
From owner-freebsd-fs@FreeBSD.ORG Tue May 10 03:48:17 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6BFF2106566C for ; Tue, 10 May 2011 03:48:17 +0000 (UTC) (envelope-from licquia@linuxfoundation.org) Received: from rimu.licquia.org (rimu.licquia.org [72.249.37.24]) by mx1.freebsd.org (Postfix) with ESMTP id 473238FC0C for ; Tue, 10 May 2011 03:48:17 +0000 (UTC) Received: from server1.internal.licquia.org (c-98-220-117-231.hsd1.in.comcast.net [98.220.117.231]) by rimu.licquia.org (Postfix) with ESMTPS id 0708142558 for ; Mon, 9 May 2011 22:29:57 -0500 (CDT) Received: from server1.internal.licquia.org (localhost.localdomain [127.0.0.1]) by server1.internal.licquia.org (Postfix) with ESMTP id 2552998066 for ; Mon, 9 May 2011 23:30:23 -0400 (EDT) Received: from [192.168.50.14] (unknown [192.168.50.14]) by server1.internal.licquia.org (Postfix) with ESMTP id 0900898063 for ; Mon, 9 May 2011 23:30:23 -0400 (EDT) Message-ID: <4DC8B14E.4050400@linuxfoundation.org> Date: Mon, 09 May 2011 23:30:22 -0400 From: Jeff Licquia User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110424 Thunderbird/3.1.10 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV using ClamSMTP Subject: Filesystem Hierarchy Standard (FHS) and FreeBSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 May 2011 03:48:17 -0000 (Sorry if this isn't the proper list for this discussion. If not, please point me in the right direction.) The Linux Foundation's LSB workgroup has taken over maintenance of the Filesystem Hierarchy Standard, and is working on a number of updates needed since its last release in 2004. Despite all the "Linux" in the names above, we're wanting to make sure that the FHS remains independent of any particular UNIX implementation, and continues to be useful to non-Linux UNIXes. My question to you is: do you consider the FHS to be relevant to current and future development of FreeBSD? If not, is this simply due to lack of maintenance; would your interest in the FHS be greater with more consistent updates? If you are interested, consider this an invitation to participate. We've set up a mailing list, Web site, etc., and are reviving the old bug tracker. 
More details can be found here: http://www.linuxfoundation.org/collaborate/workgroups/lsb/fhs From owner-freebsd-fs@FreeBSD.ORG Tue May 10 09:11:33 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A9793106566B for ; Tue, 10 May 2011 09:11:33 +0000 (UTC) (envelope-from numisemis@gmail.com) Received: from mail-ww0-f50.google.com (mail-ww0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id 39CD68FC18 for ; Tue, 10 May 2011 09:11:32 +0000 (UTC) Received: by wwc33 with SMTP id 33so6264567wwc.31 for ; Tue, 10 May 2011 02:11:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:subject:mime-version:content-type:from :in-reply-to:date:cc:content-transfer-encoding:message-id:references :to:x-mailer; bh=1nMLxYaio3WOyP7PedInUJFtQEwAVVd/fJcupwoV48w=; b=BI5SjqBqzPvzrLo1pHc9sty76jON6kVPK9zZcp+b6TkJkOjb01Mx1gUKIJmjLroeFC yAHMNoYW7o6FS8k1xWNRPAbUI1OPSN7qCizTlftwZpbCnpfcOrpdOYb0rIUC+y0/15A9 qhfnwG6RTp2ves6GdBKPmgj1nZkLcuPfTEO1c= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=subject:mime-version:content-type:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to:x-mailer; b=IL0PskV2CqdhSe5tZyFnMwaHN282RRDD/wYpoN5E7u4WdLGg/5n1yr1zu/QKSnBoZy nbtPgPAYuYO4KHAQTEvQ/sCXYKzo9bMrxsRYpEqnup4sgNlX/ttwjNZybVyi2ZR8o24V 4+MndUTB79hnqCA5JGAtLXQOKZB/R3S1rrLUU= Received: by 10.216.239.71 with SMTP id b49mr2081624wer.107.1305017252755; Tue, 10 May 2011 01:47:32 -0700 (PDT) Received: from sime-imac.logos.hr ([213.147.110.159]) by mx.google.com with ESMTPS id y35sm130685weq.15.2011.05.10.01.47.31 (version=TLSv1/SSLv3 cipher=OTHER); Tue, 10 May 2011 01:47:32 -0700 (PDT) Mime-Version: 1.0 (Apple Message framework v1084) Content-Type: text/plain; charset=us-ascii From: Šimun Mikecin In-Reply-To: <4DC8B14E.4050400@linuxfoundation.org> Date: Tue, 10 May 2011 10:47:24 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: References: <4DC8B14E.4050400@linuxfoundation.org> To: Jeff Licquia X-Mailer: Apple Mail (2.1084) Cc: freebsd-fs@freebsd.org Subject: Re: Filesystem Hierarchy Standard (FHS) and FreeBSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 May 2011 09:11:33 -0000 On 10. svi. 2011., at 05:30, Jeff Licquia wrote: > (Sorry if this isn't the proper list for this discussion. If not, please point me in the right direction.) > > The Linux Foundation's LSB workgroup has taken over maintenance of the Filesystem Hierarchy Standard, and is working on a number of updates needed since its last release in 2004. > > Despite all the "Linux" in the names above, we're wanting to make sure that the FHS remains independent of any particular UNIX implementation, and continues to be useful to non-Linux UNIXes. > > My question to you is: do you consider the FHS to be relevant to current and future development of FreeBSD? If not, is this simply due to lack of maintenance; would your interest in the FHS be greater with more consistent updates? > > If you are interested, consider this an invitation to participate. We've set up a mailing list, Web site, etc., and are reviving the old bug tracker. 
More details can be found here: > > http://www.linuxfoundation.org/collaborate/workgroups/lsb/fhs FHS is Linux centric instead of trying to be useful to other non-Linux UNIXes (like OpenGroup does). We already have hier(7): http://www.FreeBSD.org/cgi/man.cgi?query=hier&apropos=0&sektion=0&manpath=FreeBSD+8.2-RELEASE&format=html From owner-freebsd-fs@FreeBSD.ORG Tue May 10 13:56:11 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 274CE1065673; Tue, 10 May 2011 13:56:11 +0000 (UTC) (envelope-from daichi@ongs.co.jp) Received: from natial.ongs.co.jp (natial.ongs.co.jp [202.216.246.90]) by mx1.freebsd.org (Postfix) with ESMTP id BFA268FC0C; Tue, 10 May 2011 13:56:10 +0000 (UTC) Received: from [192.168.15.190] (unknown [24.114.252.244]) by natial.ongs.co.jp (Postfix) with ESMTPSA id 4CC9412543B; Tue, 10 May 2011 22:39:29 +0900 (JST) From: Daichi GOTO Date: Tue, 10 May 2011 09:39:25 -0400 Message-Id: <39BCA797-BCE2-4A2A-AA7F-AD8A87014CD4@ongs.co.jp> To: freebsd-current@freebsd.org, freebsd-fs@freebsd.org Mime-Version: 1.0 (Apple Message framework v1084) X-Mailer: Apple Mail (2.1084) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: Subject: [Call for Test] unionfs intermediate umount feature X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 May 2011 13:56:11 -0000 Hi unionfs users ;) We have developed a new unionfs feature, "intermediate umount". You can do it like this: # mount_unionfs /test2 /test1 # mount_unionfs /test3 /test1 # df :/test2 xxxxx xxxxx xxxxx xx% /test1 :/test3 xxxxx xxxxx xxxxx xx% /test1 # umount ':/test2' # df :/test3 xxxxx xxxxx xxxxx xx% /test1 # patch for current: http://people.freebsd.org/~daichi/unionfs/experiments/unionfs-intermediate-umount.diff First, I want to know your opinion. 
Thanks :) ----- Daichi GOTO From owner-freebsd-fs@FreeBSD.ORG Tue May 10 14:55:55 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 32FE11065672 for ; Tue, 10 May 2011 14:55:55 +0000 (UTC) (envelope-from feld@feld.me) Received: from mail-iy0-f182.google.com (mail-iy0-f182.google.com [209.85.210.182]) by mx1.freebsd.org (Postfix) with ESMTP id 0462F8FC16 for ; Tue, 10 May 2011 14:55:54 +0000 (UTC) Received: by iyj12 with SMTP id 12so7517323iyj.13 for ; Tue, 10 May 2011 07:55:54 -0700 (PDT) Received: by 10.42.29.195 with SMTP id s3mr3636476icc.30.1305037886305; Tue, 10 May 2011 07:31:26 -0700 (PDT) Received: from tech304 (supranet-tech.secure-on.net [66.170.8.18]) by mx.google.com with ESMTPS id a1sm2867199ics.4.2011.05.10.07.31.24 (version=TLSv1/SSLv3 cipher=OTHER); Tue, 10 May 2011 07:31:25 -0700 (PDT) Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes To: freebsd-fs@freebsd.org References: <4DC8B14E.4050400@linuxfoundation.org> Date: Tue, 10 May 2011 09:31:23 -0500 MIME-Version: 1.0 Content-Transfer-Encoding: Quoted-Printable From: "Mark Felder" Message-ID: In-Reply-To: User-Agent: Opera Mail/11.50 (FreeBSD) Subject: Re: Filesystem Hierarchy Standard (FHS) and FreeBSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 May 2011 14:55:55 -0000 On Tue, 10 May 2011 03:47:24 -0500, Šimun Mikecin wrote: > FHS is Linux centric instead of trying to be useful to other non-Linux > UNIXes (like OpenGroup does). > We already have hier(7): Jeff, This might seem rather blunt/rude but honestly I don't foresee the FreeBSD project integrating with the FHS. The hier(7) layout has been a lot of work and it's something the project really strives to standardize on. As a FreeBSD user, we've been used to "porting" applications to not only run on FreeBSD but also adhere properly to the hier(7) structure. But if the FHS wants to adopt hier(7) we won't complain.... 
Regards, Mark From owner-freebsd-fs@FreeBSD.ORG Tue May 10 15:44:56 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B7B50106564A for ; Tue, 10 May 2011 15:44:56 +0000 (UTC) (envelope-from josef.karthauser@unitedlane.com) Received: from k2smtpout03-01.prod.mesa1.secureserver.net (k2smtpout03-01.prod.mesa1.secureserver.net [64.202.189.171]) by mx1.freebsd.org (Postfix) with SMTP id 89A058FC1E for ; Tue, 10 May 2011 15:44:56 +0000 (UTC) Received: (qmail 29426 invoked from network); 10 May 2011 15:18:15 -0000 Received: from unknown (HELO ip-72.167.34.38.ip.secureserver.net) (72.167.34.38) by k2smtpout03-01.prod.mesa1.secureserver.net (64.202.189.171) with ESMTP; 10 May 2011 15:18:14 -0000 Received: (qmail 15827 invoked from network); 10 May 2011 11:16:50 -0400 Received: from p4.dhcp.tao.org.uk (90.155.77.83) by o3dh.com with (AES128-SHA encrypted) SMTP; 10 May 2011 11:16:50 -0400 Mime-Version: 1.0 (Apple Message framework v1082) Content-Type: text/plain; charset=us-ascii From: Dr Josef Karthauser In-Reply-To: <20110328105726.1928377ryc8ppkis@webmail.leidinger.net> Date: Tue, 10 May 2011 16:19:35 +0100 Content-Transfer-Encoding: quoted-printable Message-Id: <7D77C2D3-9CBC-4F14-A938-5EA0241043B2@unitedlane.com> References: <9CF23177-92D6-40C5-8C68-B7E2F88236E6@unitedlane.com> <20110326225430.00006a76@unknown> <3BBB1E36-8E09-4D07-B49E-ACA8548B0B44@unitedlane.com> <20110327075814.GA71131@icarus.home.lan> <20110327084355.GA71864@icarus.home.lan> <094E71D9-B28B-46DB-8EA9-B11F17D5A32A@unitedlane.com> <20110327094121.GA72701@icarus.home.lan> <980F394D-36FC-42F2-9F3F-A3C44A385600@unitedlane.com> <20110328105726.1928377ryc8ppkis@webmail.leidinger.net> To: Alexander Leidinger X-Mailer: Apple Mail (2.1082) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS Problem - full disk, can't recover space :(. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 May 2011 15:44:56 -0000 On 28 Mar 2011, at 09:57, Alexander Leidinger wrote: > Quoting Dr Josef Karthauser (from Sun, 27 Mar 2011 11:01:04 +0100): > >> I'd really like my disk space back though please! I suspect that I'm going to have to wait for 28 to have that happen though :(. > > As an intermediate action you could export the pool, boot a 9-current live-image, import the pool there and export it again. I do not know whether you need to do a scrub to recover the free space or not. AFAIK you do not need to update to v28, the new code should take care of the issue without an update. > > This will not prevent losing space again, but at least it should give back the lost space for the moment. > Ok, I've finally got around to doing this. It hasn't recovered the space, but it at least is billing it to the right file system now: infinity# zfs list void/j/legacy-alpha NAME USED AVAIL REFER MOUNTPOINT void/j/legacy-alpha 58.9G 4.11G 56.9G /j/legacy-alpha # du -hs /j/legacy-alpha 34G /j/legacy-alpha Hmm. It's a pain that I've currently only got 24GB free on this pool, otherwise I could do a copy and destroy to try and free it up. 
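(For anyone wanting to try the copy-and-destroy route, a rough sketch of the usual sequence -- the snapshot and destination names below are made up, and it assumes the pool has room for a full second copy and that nothing writes to the dataset in the meantime:

# zfs snapshot void/j/legacy-alpha@copy
# zfs send void/j/legacy-alpha@copy | zfs receive void/j/legacy-alpha.new
# zfs destroy -r void/j/legacy-alpha
# zfs rename void/j/legacy-alpha.new void/j/legacy-alpha
# zfs destroy void/j/legacy-alpha@copy

Properties such as the mountpoint are not carried over by a plain send/receive, so they may have to be set again on the new dataset.)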
Joe From owner-freebsd-fs@FreeBSD.ORG Tue May 10 16:28:36 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 22DEC106566C for ; Tue, 10 May 2011 16:28:36 +0000 (UTC) (envelope-from licquia@linuxfoundation.org) Received: from rimu.licquia.org (rimu.licquia.org [72.249.37.24]) by mx1.freebsd.org (Postfix) with ESMTP id EF7C08FC08 for ; Tue, 10 May 2011 16:28:34 +0000 (UTC) Received: from server1.internal.licquia.org (c-98-220-117-231.hsd1.in.comcast.net [98.220.117.231]) by rimu.licquia.org (Postfix) with ESMTPS id 6C24740334 for ; Tue, 10 May 2011 11:28:07 -0500 (CDT) Received: from server1.internal.licquia.org (localhost.localdomain [127.0.0.1]) by server1.internal.licquia.org (Postfix) with ESMTP id BFEE198066 for ; Tue, 10 May 2011 12:28:33 -0400 (EDT) Received: from [192.168.50.14] (unknown [192.168.50.14]) by server1.internal.licquia.org (Postfix) with ESMTP id A784098063 for ; Tue, 10 May 2011 12:28:33 -0400 (EDT) Message-ID: <4DC967B1.70102@linuxfoundation.org> Date: Tue, 10 May 2011 12:28:33 -0400 From: Jeff Licquia User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110424 Thunderbird/3.1.10 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <4DC8B14E.4050400@linuxfoundation.org> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV using ClamSMTP Subject: Re: Filesystem Hierarchy Standard (FHS) and FreeBSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 May 2011 16:28:36 -0000 On 05/10/2011 10:31 AM, Mark Felder wrote: > This might seem rather blunt/rude but honestly I don't foresee the > FreeBSD project integrating with the FHS. The hier(7) layout has been a > lot of work and it's something the project really strives to standardize > on. As a FreeBSD user, we've been used to "porting" applications to not > only run on FreeBSD but also adhere properly to the hier(7) structure. > But if the FHS wants to adopt hier(7) we won't complain.... Not at all; that's precisely the kind of feedback I'm looking for. 
From owner-freebsd-fs@FreeBSD.ORG Tue May 10 19:30:31 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 50BC01065676; Tue, 10 May 2011 19:30:31 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 27C808FC22; Tue, 10 May 2011 19:30:31 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p4AJUVuT080424; Tue, 10 May 2011 19:30:31 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p4AJUUhg080412; Tue, 10 May 2011 19:30:31 GMT (envelope-from linimon) Date: Tue, 10 May 2011 19:30:31 GMT Message-Id: <201105101930.p4AJUUhg080412@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/156933: [zfs] ZFS receive after read on readonly=on filesystem is corrupted without warning X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 May 2011 19:30:31 -0000 Old Synopsis: ZFS receive after read on readonly=on filesystem is corrupted without warning New Synopsis: [zfs] ZFS receive after read on readonly=on filesystem is corrupted without warning Responsible-Changed-From-To: freebsd-bugs->freebsd-fs Responsible-Changed-By: linimon Responsible-Changed-When: Tue May 10 19:30:17 UTC 2011 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=156933 From owner-freebsd-fs@FreeBSD.ORG Tue May 10 20:17:32 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 200CF1065673 for ; Tue, 10 May 2011 20:17:32 +0000 (UTC) (envelope-from gpalmer@freebsd.org) Received: from noop.in-addr.com (mail.in-addr.com [IPv6:2001:470:8:162::1]) by mx1.freebsd.org (Postfix) with ESMTP id 924358FC0A for ; Tue, 10 May 2011 20:17:30 +0000 (UTC) Received: from gjp by noop.in-addr.com with local (Exim 4.74 (FreeBSD)) (envelope-from ) id 1QJtMr-0002VQ-Jz; Tue, 10 May 2011 16:17:17 -0400 Date: Tue, 10 May 2011 16:17:17 -0400 From: Gary Palmer To: Jeff Licquia Message-ID: <20110510201717.GA37035@in-addr.com> References: <4DC8B14E.4050400@linuxfoundation.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4DC8B14E.4050400@linuxfoundation.org> X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: gpalmer@freebsd.org X-SA-Exim-Scanned: No (on noop.in-addr.com); SAEximRunCond expanded to false Cc: freebsd-fs@freebsd.org Subject: Re: Filesystem Hierarchy Standard (FHS) and FreeBSD X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 10 May 2011 20:17:32 -0000 On Mon, May 09, 2011 at 11:30:22PM -0400, Jeff Licquia wrote: > (Sorry if this isn't the proper list for this discussion. If not, > please point me in the right direction.) 
You may wish to query freebsd-arch@freebsd.org - freebsd-fs is more aimed at filesystem implementation rather than how the directory hierarchy is organized on top of the filesystem. Moving FreeBSD to a Linux Foundation FHS standard is something that strikes me as being more an architectural discussion, and perhaps a CC to freebsd-standards@freebsd.org. However, I think the answers referring you to hier(7) is certainly a starting point. A glance at the FHS standard seems to also place requirements on which files and/or programs are in certain locations and also require certain config files are called certain things, which goes beyond hier(7). Regards, Gary From owner-freebsd-fs@FreeBSD.ORG Wed May 11 09:48:47 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5B301106564A for ; Wed, 11 May 2011 09:48:47 +0000 (UTC) (envelope-from fbsd@dannysplace.net) Received: from mailgw.dannysplace.net (mailgw.dannysplace.net [204.109.56.184]) by mx1.freebsd.org (Postfix) with ESMTP id E1E8F8FC15 for ; Wed, 11 May 2011 09:48:46 +0000 (UTC) Received: from [203.206.171.212] (helo=[192.168.10.10]) by mailgw.dannysplace.net with esmtpsa (TLSv1:CAMELLIA256-SHA:256) (Exim 4.72 (FreeBSD)) (envelope-from ) id 1QK5ju-0006WU-78 for freebsd-fs@freebsd.org; Wed, 11 May 2011 19:29:55 +1000 Message-ID: <4DCA5620.1030203@dannysplace.net> Date: Wed, 11 May 2011 19:25:52 +1000 From: Danny Carroll User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.2.17) Gecko/20110414 Lightning/1.0b2 Thunderbird/3.1.10 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Authenticated-User: danny X-Authenticator: plain X-Exim-Version: 4.72 (build at 12-Jul-2010 18:31:29) X-Date: 2011-05-11 19:29:54 X-Connected-IP: 203.206.171.212:58231 X-Message-Linecount: 35 X-Body-Linecount: 24 X-Message-Size: 1393 X-Body-Size: 944 X-Received-Count: 1 X-Recipient-Count: 1 X-Local-Recipient-Count: 1 X-Local-Recipient-Defer-Count: 0 X-Local-Recipient-Fail-Count: 0 X-SA-Exim-Connect-IP: 203.206.171.212 X-SA-Exim-Mail-From: fbsd@dannysplace.net X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on damka.dannysplace.net X-Spam-Level: X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham version=3.3.1 X-SA-Exim-Version: 4.2 X-SA-Exim-Scanned: Yes (on mailgw.dannysplace.net) Subject: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: fbsd@dannysplace.net List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 May 2011 09:48:47 -0000 Hello all. I've been using ZFS for some time now and have never had an issued (except perhaps the issue of speed...) When v28 is taken into -STABLE I will most likely upgrade to v28 at that point. Currently I am running v15 with v4 on disk. When I move to v28 I will probably wish to enable a L2Arc and also perhaps dedicated log devices. I'm curious about a few things however. 1. Can I remove either the L2 ARC or the log devices if things don't go as planned or if I need to free up some resources? 2. What are the best practices for setting up these? Would a geom mirror for the log device be the way to go. Or can you just let ZFS mirror the log itself? 3. What happens when one or both of the log devices fail. 
Does ZFS come to a crashing halt and kill all the data? Or does it simply complain that the ZIL is no longer active and continue on it's merry way? In short, what is the best way to set up these two features? -D From owner-freebsd-fs@FreeBSD.ORG Wed May 11 10:06:58 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7E905106564A for ; Wed, 11 May 2011 10:06:58 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta04.emeryville.ca.mail.comcast.net (qmta04.emeryville.ca.mail.comcast.net [76.96.30.40]) by mx1.freebsd.org (Postfix) with ESMTP id 673DB8FC0A for ; Wed, 11 May 2011 10:06:58 +0000 (UTC) Received: from omta06.emeryville.ca.mail.comcast.net ([76.96.30.51]) by qmta04.emeryville.ca.mail.comcast.net with comcast id iA0Q1g00216AWCUA4A6yVt; Wed, 11 May 2011 10:06:58 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta06.emeryville.ca.mail.comcast.net with comcast id iA6w1g0041t3BNj8SA6wjS; Wed, 11 May 2011 10:06:57 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 0B189102C36; Wed, 11 May 2011 03:06:56 -0700 (PDT) Date: Wed, 11 May 2011 03:06:56 -0700 From: Jeremy Chadwick To: Danny Carroll Message-ID: <20110511100655.GA35129@icarus.home.lan> References: <4DCA5620.1030203@dannysplace.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4DCA5620.1030203@dannysplace.net> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 May 2011 10:06:58 -0000 On Wed, May 11, 2011 at 07:25:52PM +1000, Danny Carroll wrote: > I've been using ZFS for some time now and have never had an issued > (except perhaps the issue of speed...) > When v28 is taken into -STABLE I will most likely upgrade to v28 at that > point. Currently I am running v15 with v4 on disk. > > When I move to v28 I will probably wish to enable a L2Arc and also > perhaps dedicated log devices. > > I'm curious about a few things however. > > 1. Can I remove either the L2 ARC or the log devices if things don't go > as planned or if I need to free up some resources? You can remove L2ARC ("cache") devices without impact, but you cannot remove all log devices without the pool needing to be destroyed (recreated). Please keep reading for details of log devices. L2ARC devices should primarily be something with extremely fast read rates (e.g. SSDs). USB1.x and 2.x memory sticks do not work well for this purpose given protocol and bus speed limits + overhead. (I only mention them because people often think "Oh, USB flash would work great for this!" I disagree.) Furthermore, something I found out on my own: the L2ARC is completely lost in the case the system is cleanly rebooted. This sometimes surprises people (myself included) since L2ARC uses actual storage devices; one might think the data is "restored" on reboot, but it isn't (because the ARC ("layer 1") itself is lost on reboot, obviously). The only way to see how much disk space a cache device is using -- to my knowledge -- is via "zpool iostat -v". > 2. What are the best practices for setting up these? Would a geom > mirror for the log device be the way to go. Or can you just let ZFS > mirror the log itself? 
Let ZFS handle it. There is no purpose (in my opinion) to added complexity when ZFS can handle it itself. The KISS concept applies greatly here. In the case of ZFS intent logs, you definitely want a mirror. If you have a single log device, loss of that device can/will result in full data loss of the pool which makes use of the log device. Furthermore, a log device is limited to a single pool; e.g. you cannot use the same log device (e.g. ada6) on pool "foo" and pool "bar". It's one or the other. You should read **all** of the data points listed below and pay close attention to the details: http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Separate_Log_Devices > 3. What happens when one or both of the log devices fail. Does ZFS > come to a crashing halt and kill all the data? Or does it simply > complain that the ZIL is no longer active and continue on it's merry way? See above. > In short, what is the best way to set up these two features? See the zpool(1) man page for details on how to make use of log devices. Examples are provided, including mirroring of such devices. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Wed May 11 10:37:14 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A016E1065672 for ; Wed, 11 May 2011 10:37:14 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230]) by mx1.freebsd.org (Postfix) with ESMTP id 29FD48FC14 for ; Wed, 11 May 2011 10:37:13 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p4BAb3P6048487 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Wed, 11 May 2011 13:37:08 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <4DCA66CF.7070608@digsys.bg> Date: Wed, 11 May 2011 13:37:03 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.15) Gecko/20110307 Thunderbird/3.1.9 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <4DCA5620.1030203@dannysplace.net> <20110511100655.GA35129@icarus.home.lan> In-Reply-To: <20110511100655.GA35129@icarus.home.lan> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 May 2011 10:37:14 -0000 On 11.05.11 13:06, Jeremy Chadwick wrote: > On Wed, May 11, 2011 at 07:25:52PM +1000, Danny Carroll wrote: >> When I move to v28 I will probably wish to enable a L2Arc and also >> perhaps dedicated log devices. >> > In the case of ZFS intent logs, you definitely want a mirror. If you > have a single log device, loss of that device can/will result in full > data loss of the pool which makes use of the log device. This is true for v15 pools, not true for v28 pools. In ZFS v28 you can remove log devices and in the case of sudden loss of log device (or whatever) roll back the pool to a 'good' state. Therefore, for most installations single log device might be sufficient. 
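(To make that concrete, here is a sketch of the commands involved -- the pool and device names are only placeholders, log device removal needs pool v19 or later, and the rollback-on-import option is the v28 behaviour being discussed:

# zpool add tank log mirror ada1 ada2   (attach a mirrored SLOG)
# zpool add tank cache ada3             (attach an L2ARC device)
# zpool remove tank ada3                (cache devices can be removed at any time)
# zpool remove tank mirror-1            (remove the log mirror by its vdev name, v19+)
# zpool import -F tank                  (recovery import/rollback after an unmirrored log is lost, v28)

Check the zpool(1) man page on your system for the exact syntax your pool version supports.)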
If you value your data, you will of course use mirrored log devices, possibly in hot-swap configuration and .. have a backup :) By the way, the SLOG (separate LOG) does not have to be SSD at all. Separate rotating disk(s) will also suffice -- it all depends on the type of workload. SSDs are better, for the higher end, because of the low latency (but not all SSDs are low latency when writing!). The idea of the SLOG is to separate the ZIL records from the main data pool. ZIL records are small, even smaller in v28, but will cause unnecessary head movements if kept in the main pool. The SLOG is "write once, read on failure" media and is written sequentially. Almost all current HDDs offer reasonable sequential write performance for small to medium pools. The L2ARC needs to be fast reading SSD. It is populated slowly, few MB/sec so there is no point to have fast and high-bandwidth write-optimized SSD. The benefit from L2ARC is the low latency. Sort of slower RAM. It is bad idea to use the same SSD for both SLOG and L2ARC, because most SSDs behave poorly if you present them with high read and high write loads. More expensive units might behave, but then... if you pay few k$ for a SSD, you know what you need :) Daniel From owner-freebsd-fs@FreeBSD.ORG Wed May 11 10:51:20 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D9B1A106564A for ; Wed, 11 May 2011 10:51:20 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta10.westchester.pa.mail.comcast.net (qmta10.westchester.pa.mail.comcast.net [76.96.62.17]) by mx1.freebsd.org (Postfix) with ESMTP id 85A1C8FC0A for ; Wed, 11 May 2011 10:51:20 +0000 (UTC) Received: from omta17.westchester.pa.mail.comcast.net ([76.96.62.89]) by qmta10.westchester.pa.mail.comcast.net with comcast id i9fk1g0031vXlb85AArLwh; Wed, 11 May 2011 10:51:20 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta17.westchester.pa.mail.comcast.net with comcast id iArJ1g0251t3BNj3dArK5e; Wed, 11 May 2011 10:51:20 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 6E596102C36; Wed, 11 May 2011 03:51:17 -0700 (PDT) Date: Wed, 11 May 2011 03:51:17 -0700 From: Jeremy Chadwick To: Daniel Kalchev Message-ID: <20110511105117.GA36571@icarus.home.lan> References: <4DCA5620.1030203@dannysplace.net> <20110511100655.GA35129@icarus.home.lan> <4DCA66CF.7070608@digsys.bg> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4DCA66CF.7070608@digsys.bg> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 May 2011 10:51:21 -0000 On Wed, May 11, 2011 at 01:37:03PM +0300, Daniel Kalchev wrote: > On 11.05.11 13:06, Jeremy Chadwick wrote: > >On Wed, May 11, 2011 at 07:25:52PM +1000, Danny Carroll wrote: > >>When I move to v28 I will probably wish to enable a L2Arc and also > >>perhaps dedicated log devices. > >> > >In the case of ZFS intent logs, you definitely want a mirror. If you > >have a single log device, loss of that device can/will result in full > >data loss of the pool which makes use of the log device. > > This is true for v15 pools, not true for v28 pools. 
In ZFS v28 you > can remove log devices and in the case of sudden loss of log device > (or whatever) roll back the pool to a 'good' state. Therefore, for > most installations single log device might be sufficient. If you > value your data, you will of course use mirrored log devices, > possibly in hot-swap configuration and .. have a backup :) Has anyone actually *tested* this on FreeBSD? Set up a single log device on classic (non-CAM/non-ahci.ko) ATA, then literally yank the disk out to induce a very bad/rude failure? Does the kernel panic or anything weird happen? I fully acknowledge that in ZFS pool v19 and higher the issue is fixed (at least on Solaris/OpenSolaris), but at this point in time the RELEASE and STABLE branches are running pool version 15. There are numerous ongoing discussions about the ZFS v28 patches right now with regards to STABLE specifically. Recent threads: - Patch did not apply correctly (errors/rejections) - Patch applied correctly but build failed (use "patch -E" I believe?) - Discussion about when v28 is *truly* coming to RELENG_8 and if it's truly ready for RELENG_8 And finally, there's the one thing that people often forget/miss: if you upgrade your pool from v15 to v28 (needed to address the log removal stuff you mention), you cannot roll back without recreating all of your pools. Folks considering v28 need to take that into consideration. > By the way, the SLOG (separate LOG) does not have to be SSD at all. > Separate rotating disk(s) will also suffice -- it all depends on the > type of workload. SSDs are better, for the higher end, because of > the low latency (but not all SSDs are low latency when writing!). I didn't state log devices should be SSDs. I stated cache devices (L2ARC) should be SSDs. :-) A non-high-end SSD for a log device is probably a very bad idea given the sub-par write speeds, agreed. A FusionIO card/setup on the other hand would probably work wonderfully, but that's much more expensive (you cover that below). > The idea of the SLOG is to separate the ZIL records from the main > data pool. ZIL records are small, even smaller in v28, but will > cause unnecessary head movements if kept in the main pool. The SLOG > is "write once, read on failure" media and is written sequentially. > Almost all current HDDs offer reasonable sequential write > performance for small to medium pools. > > The L2ARC needs to be fast reading SSD. It is populated slowly, few > MB/sec so there is no point to have fast and high-bandwidth > write-optimized SSD. The benefit from L2ARC is the low latency. Sort > of slower RAM. Agreed, and the overall point to L2ARC is to help with improved random reads, if I remember right. The concept is that it's a 2nd layer of caching that shouldn't hurt or hinder performance when used/put in place, but can greatly help when the "layer 1" ARC lacks an entry. > It is bad idea to use the same SSD for both SLOG and L2ARC, because > most SSDs behave poorly if you present them with high read and high > write loads. More expensive units might behave, but then... if you > pay few k$ for a SSD, you know what you need :) Again, agreed. Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks should also keep that in mind when putting an SSD into use in this fashion. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Wed May 11 11:17:53 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4C5FB1065675 for ; Wed, 11 May 2011 11:17:53 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230]) by mx1.freebsd.org (Postfix) with ESMTP id C83CD8FC1F for ; Wed, 11 May 2011 11:17:52 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p4BBHgIC048656 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Wed, 11 May 2011 14:17:47 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <4DCA7056.20200@digsys.bg> Date: Wed, 11 May 2011 14:17:42 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.15) Gecko/20110307 Thunderbird/3.1.9 MIME-Version: 1.0 To: Jeremy Chadwick References: <4DCA5620.1030203@dannysplace.net> <20110511100655.GA35129@icarus.home.lan> <4DCA66CF.7070608@digsys.bg> <20110511105117.GA36571@icarus.home.lan> In-Reply-To: <20110511105117.GA36571@icarus.home.lan> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 May 2011 11:17:53 -0000 On 11.05.11 13:51, Jeremy Chadwick wrote: > > Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks > should also keep that in mind when putting an SSD into use in this > fashion. > By the way, what would be the use of TRIM for SLOG and L2ARC devices? I see absolutely no benefit from TRIM for the L2ARC, because it is written slowly (on purpose). Any current, or 1-2 generations back SSD would handle that write load without TRIM and without any performance degradation. Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC SSD for the SLOG, for many reasons. The write regions on the SLC NAND should be smaller (my wild guess, current practice may differ) and the need for rewriting will be small. If you don't need to rewrite already written data, TRIM does not help. Also, as far as I understand, most "serious" SSDs (typical for SLC I guess) would have twice or more the advertised size and always write to fresh cells, scheduling an background erase of the 'overwritten' cell. Does Solaris have TRIM for ZFS? Where? How does it help? I can imagine TRIM for the data pool, that would be good fit for ZFS, but SSD-only pool.. are we already there? 
Daniel From owner-freebsd-fs@FreeBSD.ORG Wed May 11 12:08:34 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0CF36106566C for ; Wed, 11 May 2011 12:08:34 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta13.westchester.pa.mail.comcast.net (qmta13.westchester.pa.mail.comcast.net [76.96.59.243]) by mx1.freebsd.org (Postfix) with ESMTP id AA4F28FC0A for ; Wed, 11 May 2011 12:08:33 +0000 (UTC) Received: from omta17.westchester.pa.mail.comcast.net ([76.96.62.89]) by qmta13.westchester.pa.mail.comcast.net with comcast id iAtb1g0051vXlb85DC8ZhW; Wed, 11 May 2011 12:08:33 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta17.westchester.pa.mail.comcast.net with comcast id iC8X1g01X1t3BNj3dC8YDG; Wed, 11 May 2011 12:08:33 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 7BB6B102C36; Wed, 11 May 2011 05:08:30 -0700 (PDT) Date: Wed, 11 May 2011 05:08:30 -0700 From: Jeremy Chadwick To: Daniel Kalchev Message-ID: <20110511120830.GA37515@icarus.home.lan> References: <4DCA5620.1030203@dannysplace.net> <20110511100655.GA35129@icarus.home.lan> <4DCA66CF.7070608@digsys.bg> <20110511105117.GA36571@icarus.home.lan> <4DCA7056.20200@digsys.bg> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4DCA7056.20200@digsys.bg> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 May 2011 12:08:34 -0000 On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote: > On 11.05.11 13:51, Jeremy Chadwick wrote: > >Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks > >should also keep that in mind when putting an SSD into use in this > >fashion. > > By the way, what would be the use of TRIM for SLOG and L2ARC devices? > I see absolutely no benefit from TRIM for the L2ARC, because it is > written slowly (on purpose). Any current, or 1-2 generations back SSD > would handle that write load without TRIM and without any performance > degradation. > > Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC > SSD for the SLOG, for many reasons. The write regions on the SLC > NAND should be smaller (my wild guess, current practice may differ) > and the need for rewriting will be small. If you don't need to > rewrite already written data, TRIM does not help. Also, as far as I > understand, most "serious" SSDs (typical for SLC I guess) would have > twice or more the advertised size and always write to fresh cells, > scheduling an background erase of the 'overwritten' cell. AFAIK, drive manufacturers do not disclose just how much reallocation space they keep available on an SSD. I'd rather not speculate as to how much, as I'm certain it varies per vendor. I can talk a bit about SSD drive performance from a consumer level (that is to say: low-end consumer Intel SSDs such as the X25-V, X25-M, and latest 320 and 510 series -- I use them all over the place), both before and after TRIM operations. I don't use any of these SSDs on ZFS however, only UFS (and I have seen the results both before and after TRIM support was added to UFS; and yeah, all the drives are running the latest firmware). 
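As a rough sketch of how one can check the same things on FreeBSD -- the device name ada0 and the partition ada0p2 are hypothetical, and the exact output and the availability of these knobs depend on the FreeBSD version in use (the UFS TRIM bits are newer than 8.2-RELEASE):

# camcontrol identify ada0        (the feature table shows whether the drive advertises DSM/TRIM support)
# tunefs -p /dev/ada0p2           (prints the filesystem's current tuneables, including the TRIM flag on versions that have it)
# tunefs -t enable /dev/ada0p2    (enables TRIM for that UFS filesystem, where supported; run it on an unmounted filesystem)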
What's confusing to me is why someone would say TRIM doesn't really matter in the case of an intent log device or a cache device; these devices both implement some degree of write operations, correct? The drive has to erase the NAND flash block (well, page really) before the block can be re-used (written to once again), so by not doing TRIM effectively you're relying 100% on drives' garbage collection mechanisms, which isn't that great (at least WRT the above drives). There are some sites that go over Intel SSD performance out-of-the-box as well as once its been used for a bit, and the performance difference is pretty substantial (50% drop in performance for reads, ~60-70% drop in performance for writes). Something to keep in mind. Furthermore, most people aren't buying SLC given the cost. Right now the absolute #1 or #2 focus of any operation is to save money; one cannot argue with the current economic condition. I think this is also why many SSD companies are focusing primarily on MLC right now; they know the majority of their client base isn't going to spend the money for SLC. > Does Solaris have TRIM for ZFS? Where? How does it help? I can > imagine TRIM for the data pool, that would be good fit for ZFS, but > SSD-only pool.. are we already there? The following blog post, and mailing list thread, provides answers to all of the above questions, including why TRIM is useful on ZFS (see the comments section; not referring to slog or cache however). But it doesn't look like it's actually made use of in ZFS as of January 2011. There is a long discussion about ZFS, TRIM, and slog/cache in the 2nd thread. There's also a reply from pjd@ in there WRT FreeBSD. http://www.c0t0d0s0.org/archives/6792-SATA-TRIM-support-in-Opensolaris.html http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/44855 There's a recommendation to permit TRIM in ZFS, but limit the number of txgs based on a sysctl, since TRIM is a slow/expensive operation. SSDs are neat, but man, NAND-based flash sure makes me a sad panda. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Wed May 11 13:16:27 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 498841065670 for ; Wed, 11 May 2011 13:16:27 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id F01448FC08 for ; Wed, 11 May 2011 13:16:26 +0000 (UTC) Received: from outgoing.leidinger.net (p5B155D6E.dip.t-dialin.net [91.21.93.110]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id E1D4F844015; Wed, 11 May 2011 15:16:12 +0200 (CEST) Received: from webmail.leidinger.net (webmail.Leidinger.net [IPv6:fd73:10c7:2053:1::2:102]) by outgoing.leidinger.net (Postfix) with ESMTP id D4A1E1279; Wed, 11 May 2011 15:16:09 +0200 (CEST) Received: (from www@localhost) by webmail.leidinger.net (8.14.4/8.13.8/Submit) id p4BDG78o064021; Wed, 11 May 2011 15:16:07 +0200 (CEST) (envelope-from Alexander@Leidinger.net) Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by webmail.leidinger.net (Horde Framework) with HTTP; Wed, 11 May 2011 15:16:07 +0200 Message-ID: <20110511151607.105949eypk3ed3c4@webmail.leidinger.net> Date: Wed, 11 May 2011 15:16:07 +0200 From: Alexander Leidinger To: Jeremy Chadwick References: <4DCA5620.1030203@dannysplace.net> <20110511100655.GA35129@icarus.home.lan> In-Reply-To: <20110511100655.GA35129@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: 7bit User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.6) X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: E1D4F844015.A079B X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=2.3, required 6, autolearn=disabled, MANGLED_LOAN 2.30) X-EBL-MailScanner-SpamScore: ss X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1305724573.53846@wn90bcMmycaD21chBPo8UQ X-EBL-Spam-Status: No Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 May 2011 13:16:27 -0000 Quoting Jeremy Chadwick (from Wed, 11 May 2011 03:06:56 -0700): > L2ARC devices should primarily be something with extremely fast read > rates (e.g. SSDs). USB1.x and 2.x memory sticks do not work well for > this purpose given protocol and bus speed limits + overhead. (I only > mention them because people often think "Oh, USB flash would work great > for this!" I disagree.) Using USB flash may work acceptable. It depends upon the rest of the system. If you have very fast harddisks (or only USB 1 hardware), USB flash will not give you a faster FS. If you have slow (and low-power) desktop disks, a fast USB flash (attention, there are also slow ones) connected via USB 2 (or 3) will give you a speed improvement you notice. As a matter of fact, I have this: - Pentium 4 - 1 GB RAM - 1 Western Digital Caviar Blue - 2 Seagate Barracuda 7200.10 - an ICH5 controller (no NCQ) - no name cheap give-away 1 GB USB flash (so not a very fast one) The disks are used in a RAIDZ, with the USB flash as a cache device. 
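For the curious, wiring up such a cache device is a one-liner. The pool name 'tank' and the device name da0 below are hypothetical, and unlike a v15 log device, a cache device can be removed again at any time without endangering the pool:

# zpool add tank cache da0      (add the USB stick as an L2ARC device)
# zpool iostat -v tank 5        (per-vdev statistics; shows how much read traffic the cache actually absorbs)
# zpool remove tank da0         (take it out again whenever you like)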
My use case was connecting to a webmail system over a slow line (ADSL 224 kilobit/s). I noticed directly when the cache was in use or not. I also have another system, ICH 10 with NCQ, 5 disks (WD RE4 RAID) in RAIDZ2, Intel Xeon 4-core, 12 GB RAM. There USB flash does not make sense at all (and the SSD makes sense if you compare the price of the entire system with the price of a small or medium SSD). For the first system, it does not make sense to spend 200 units of money for a SSD, the system itself is not worth much more now. Spending 5-10 units of money for this system is ok, and gives a speed improvement. Bye, Alexander. -- Even God cannot change the past. -- Joseph Stalin http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-fs@FreeBSD.ORG Wed May 11 13:57:19 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8A88C106566B for ; Wed, 11 May 2011 13:57:19 +0000 (UTC) (envelope-from rs@bytecamp.net) Received: from mail.bytecamp.net (mail.bytecamp.net [212.204.60.9]) by mx1.freebsd.org (Postfix) with ESMTP id 1689F8FC0C for ; Wed, 11 May 2011 13:57:18 +0000 (UTC) Received: (qmail 50429 invoked by uid 89); 11 May 2011 15:57:17 +0200 Received: from stella.bytecamp.net (HELO ?212.204.60.37?) (rs%bytecamp.net@212.204.60.37) by mail.bytecamp.net with CAMELLIA256-SHA encrypted SMTP; 11 May 2011 15:57:17 +0200 Message-ID: <4DCA95BD.6000401@bytecamp.net> Date: Wed, 11 May 2011 15:57:17 +0200 From: Robert Schulze Organization: bytecamp GmbH User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.17) Gecko/20110424 Thunderbird/3.1.10 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit Subject: regarding vfs.zfs.scrub_limit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 May 2011 13:57:19 -0000 Hi, this value defaults to 10 on 8-STABLE. Is it possible/reasonable to raise that value in order to get a scrub finishing in less time? If so, which values could be recommended? with kind regards, Robert Schulze From owner-freebsd-fs@FreeBSD.ORG Wed May 11 13:59:00 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3721E1065674 for ; Wed, 11 May 2011 13:59:00 +0000 (UTC) (envelope-from james@jlauser.net) Received: from mail-vx0-f182.google.com (mail-vx0-f182.google.com [209.85.220.182]) by mx1.freebsd.org (Postfix) with ESMTP id E84BA8FC1F for ; Wed, 11 May 2011 13:58:59 +0000 (UTC) Received: by vxc34 with SMTP id 34so477487vxc.13 for ; Wed, 11 May 2011 06:58:59 -0700 (PDT) MIME-Version: 1.0 Received: by 10.52.112.130 with SMTP id iq2mr7597018vdb.216.1305122339087; Wed, 11 May 2011 06:58:59 -0700 (PDT) Received: by 10.220.177.199 with HTTP; Wed, 11 May 2011 06:58:59 -0700 (PDT) X-Originating-IP: [13.13.16.2] In-Reply-To: <4DCA66CF.7070608@digsys.bg> References: <4DCA5620.1030203@dannysplace.net> <20110511100655.GA35129@icarus.home.lan> <4DCA66CF.7070608@digsys.bg> Date: Wed, 11 May 2011 09:58:59 -0400 Message-ID: From: "James L. 
Lauser" To: freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 May 2011 13:59:00 -0000 On Wed, May 11, 2011 at 6:37 AM, Daniel Kalchev wrote: > > > On 11.05.11 13:06, Jeremy Chadwick wrote: > >> On Wed, May 11, 2011 at 07:25:52PM +1000, Danny Carroll wrote: >> >>> When I move to v28 I will probably wish to enable a L2Arc and also >>> perhaps dedicated log devices. >>> >>> In the case of ZFS intent logs, you definitely want a mirror. If you >> have a single log device, loss of that device can/will result in full >> data loss of the pool which makes use of the log device. >> > > This is true for v15 pools, not true for v28 pools. In ZFS v28 you can > remove log devices and in the case of sudden loss of log device (or > whatever) roll back the pool to a 'good' state. Therefore, for most > installations single log device might be sufficient. If you value your data, > you will of course use mirrored log devices, possibly in hot-swap > configuration and .. have a backup :) > > By the way, the SLOG (separate LOG) does not have to be SSD at all. > Separate rotating disk(s) will also suffice -- it all depends on the type of > workload. SSDs are better, for the higher end, because of the low latency > (but not all SSDs are low latency when writing!). > > The idea of the SLOG is to separate the ZIL records from the main data > pool. ZIL records are small, even smaller in v28, but will cause unnecessary > head movements if kept in the main pool. The SLOG is "write once, read on > failure" media and is written sequentially. Almost all current HDDs offer > reasonable sequential write performance for small to medium pools. > > The L2ARC needs to be fast reading SSD. It is populated slowly, few MB/sec > so there is no point to have fast and high-bandwidth write-optimized SSD. > The benefit from L2ARC is the low latency. Sort of slower RAM. > > It is bad idea to use the same SSD for both SLOG and L2ARC, because most > SSDs behave poorly if you present them with high read and high write loads. > More expensive units might behave, but then... if you pay few k$ for a SSD, > you know what you need :) > > Daniel > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > I recently learned the hard way that you need to be very careful what you choose as your ZIL. On my personal file server, my pool is comprised of 4x 500 GB disks in a RAID-Z and 2x 1.5 TB disks in a mirror. I also had a 1 GB Compact Flash card plugged into an IDE adapter, running as the ZIL. For the longest time, my write performance was capped at about 5 MB/sec. In an attempt to figure out why, I ran gstat, to see that the CF device was pegged at 100%. Having recently upgraded to ZFSv28, I decided to try removing the log device. Write performance instantly jumped to 45 MB/sec. Lesson learned... If you're going to have a dedicated ZIL, make sure its write performance exceeds the performance of the pool itself. On the other hand, again having upgrading to v28, I attempted to use deduplication on my pool. Write performance dropped to an abysmal 1 MB/sec. Why? 
Because, as I found out, my system doesn't have enough memory to keep the dedupe table in memory, nor can it be upgraded to. But with the application of a sufficiently large cache device, performance goes right back up to where it's supposed to be. -- James L. Lauser james@jlauser.net http://jlauser.net/ From owner-freebsd-fs@FreeBSD.ORG Wed May 11 18:26:34 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 12394106564A for ; Wed, 11 May 2011 18:26:34 +0000 (UTC) (envelope-from roberto@keltia.freenix.fr) Received: from keltia.net (centre.keltia.net [IPv6:2a01:240:fe5c::41]) by mx1.freebsd.org (Postfix) with ESMTP id C000F8FC15 for ; Wed, 11 May 2011 18:26:33 +0000 (UTC) Received: from rron-2.local (unknown [192.75.139.250]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: roberto) by keltia.net (Postfix/TLS) with ESMTPSA id 8DFBCE11E; Wed, 11 May 2011 20:26:30 +0200 (CEST) Date: Wed, 11 May 2011 20:26:33 +0200 From: Ollivier Robert To: freebsd-fs@freebsd.org, Robert Schulze Message-ID: <20110511182632.GA14921@rron-2.local> References: <4DC25DA6.3060009@bytecamp.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Operating-System: MacOS X / Macbook Pro - FreeBSD 7.2 / Dell D820 SMP User-Agent: Mutt/1.5.20 (2009-06-14) Cc: Subject: Re: zfs l2arc issue X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 May 2011 18:26:34 -0000 According to Artem Belevich: > There was an issue with clock_t type overflow . It was fixed in > r218429 on Feb 8th in 8-stable. I can confirm that it does indeed fix the issue. -- Ollivier ROBERT -=- FreeBSD: The Power to Serve! 
-=- roberto@keltia.net In memoriam to Ondine, our 2nd child: http://ondine.keltia.net/ From owner-freebsd-fs@FreeBSD.ORG Wed May 11 22:38:57 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 71081106564A for ; Wed, 11 May 2011 22:38:57 +0000 (UTC) (envelope-from jhellenthal@gmail.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id 2467A8FC14 for ; Wed, 11 May 2011 22:38:57 +0000 (UTC) Received: by iwn33 with SMTP id 33so1288320iwn.13 for ; Wed, 11 May 2011 15:38:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:sender:date:from:to:cc:subject:message-id :references:mime-version:content-type:content-disposition :in-reply-to:x-openpgp-key-id:x-openpgp-key-fingerprint :x-openpgp-key-url; bh=XOWoUv0xjw8dEXQmU00hfRE0ZfN/M9+Lje7UDJImJuE=; b=qFoCI9G6fG44/tl8DKXFxtYbxl7tOTY6mfh2l72iSP+EVGJubABGBCreLX5nL9601a H0A0psHO4VFpfkgcBPjeED46CfhziMB4SV4/Dv3nkIsfsE/LFxA8kGCaSep9r164ggCB UorEOa6tTkcu2BuHg1kKiQGSdtVxP7VjHwdQc= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:x-openpgp-key-id :x-openpgp-key-fingerprint:x-openpgp-key-url; b=tZB/DP1UQnjB1A74N800ltnqZ/rxbFDn2p5Is6fv/eGSNUUmhAGM9X4WuGaaiz9uLc hv4TPs85cFM70W5zqznHi+fS6O0UebEV2HzC4ELWnijlzQxrliAhL70xmdbNYxiHNY2a cjc27Y1Urch7BtinFQndWVKJw8nULmsFD4ITg= Received: by 10.42.170.3 with SMTP id d3mr2102112icz.438.1305153536037; Wed, 11 May 2011 15:38:56 -0700 (PDT) Received: from DataIX.net (adsl-99-190-84-116.dsl.klmzmi.sbcglobal.net [99.190.84.116]) by mx.google.com with ESMTPS id d9sm205968ibb.2.2011.05.11.15.38.53 (version=TLSv1/SSLv3 cipher=OTHER); Wed, 11 May 2011 15:38:54 -0700 (PDT) Sender: "J. Hellenthal" Received: from DataIX.net (localhost [127.0.0.1]) by DataIX.net (8.14.4/8.14.4) with ESMTP id p4BMconB074933 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 11 May 2011 18:38:51 -0400 (EDT) (envelope-from jhell@DataIX.net) Received: (from jhell@localhost) by DataIX.net (8.14.4/8.14.4/Submit) id p4BMcnFu074932; Wed, 11 May 2011 18:38:49 -0400 (EDT) (envelope-from jhell@DataIX.net) Date: Wed, 11 May 2011 18:38:49 -0400 From: Jason Hellenthal To: Jeremy Chadwick Message-ID: <20110511223849.GA65193@DataIX.net> References: <4DCA5620.1030203@dannysplace.net> <20110511100655.GA35129@icarus.home.lan> <4DCA66CF.7070608@digsys.bg> <20110511105117.GA36571@icarus.home.lan> <4DCA7056.20200@digsys.bg> <20110511120830.GA37515@icarus.home.lan> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="jRHKVT23PllUwdXP" Content-Disposition: inline In-Reply-To: <20110511120830.GA37515@icarus.home.lan> X-OpenPGP-Key-Id: 0x89D8547E X-OpenPGP-Key-Fingerprint: 85EF E26B 07BB 3777 76BE B12A 9057 8789 89D8 547E X-OpenPGP-Key-URL: http://bit.ly/0x89D8547E Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: How to enable cache and logs. 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 11 May 2011 22:38:57 -0000 --jRHKVT23PllUwdXP Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Jeremy, On Wed, May 11, 2011 at 05:08:30AM -0700, Jeremy Chadwick wrote: > On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote: > > On 11.05.11 13:51, Jeremy Chadwick wrote: > > >Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks > > >should also keep that in mind when putting an SSD into use in this > > >fashion. > > > > By the way, what would be the use of TRIM for SLOG and L2ARC devices? > > I see absolutely no benefit from TRIM for the L2ARC, because it is > > written slowly (on purpose). Any current, or 1-2 generations back SSD > > would handle that write load without TRIM and without any performance > > degradation. > > > > Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC > > SSD for the SLOG, for many reasons. The write regions on the SLC > > NAND should be smaller (my wild guess, current practice may differ) > > and the need for rewriting will be small. If you don't need to > > rewrite already written data, TRIM does not help. Also, as far as I > > understand, most "serious" SSDs (typical for SLC I guess) would have > > twice or more the advertised size and always write to fresh cells, > > scheduling an background erase of the 'overwritten' cell. > AFAIK, drive manufacturers do not disclose just how much reallocation > space they keep available on an SSD. I'd rather not speculate as to how > much, as I'm certain it varies per vendor. Let's not forget here: the size of the separate log device may be quite small. A rule of thumb is that you should size the separate log to be able to handle 10 seconds of your expected synchronous write workload. It would be rare to need more than 100 MB in a separate log device, but the separate log must be at least 64 MB. http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide So in other words, how much is TRIM really even effective given the above? Even with a high database write load on the disks at full capacity of the incoming link, I would find it hard to believe that anyone could get the ZIL to even come close to 512MB. Given most SSDs come at a size greater than 32GB, I hope this comes as an early reminder that the ZIL you are buying that disk for is only going to be using a small percentage of that disk, and I hope you justify the cost over its actual use. If you do happen to justify creating a ZIL for your pool then I hope that you partition it wisely to make use of the rest of the space that is untouched.
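A sketch of what such partitioning might look like -- the device name ada6, the labels, the 4G slice size and the pool name 'tank' are all invented for the example, and whether one SSD should carry both a log and a cache at all was questioned earlier in this thread:

# gpart create -s gpt ada6
# gpart add -t freebsd-zfs -s 4G -l slog0 ada6     (small slice for the log)
# gpart add -t freebsd-zfs -l l2arc0 ada6          (the rest of the device, e.g. for L2ARC)
# zpool add tank log gpt/slog0
# zpool add tank cache gpt/l2arc0

Using the GPT labels rather than raw device names keeps the pool importable even if the disk is renumbered later.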
For all other cases I would reccomend if you still want to have a ZIL that= =20 you take some sort of PCI->SD CARD or USB stick into account with=20 mirroring.=20 --=20 Regards, (jhell) Jason Hellenthal --jRHKVT23PllUwdXP Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (FreeBSD) Comment: http://bit.ly/0x89D8547E iQEcBAEBAgAGBQJNyw/5AAoJEJBXh4mJ2FR+besH/39USB9nnfhl5wL/rH+i7lpY 7lWVW48D0V8kbb2IAOSyGkIrUsvBqdHWmS6FJ5aYPzcrQVJg/ipiuY9c4n/SB9yy k7wF4PgU3uFFyluEKofsRLFtccCd+a5+U5QEdgoT2HXtcI6SNC0tk6dwUJL1M0uu Rzc3g7RQWF1hauDna7Mle13G43iQQThOTnpzWFVQFISQv3Nve/pYUVVXKKwS5e+n g+pS6NkImO6pb070BrAEwv4H4Xm0VBaFRIi2qV1Uc0J350vXjNIfWMBEO6Q4JNWV vBATQh7xR/OyttVXfAVnaohxdKsYhr34VqDdHjfSCsRlPZaH0ifSq6C0QLQeFhk= =o/7q -----END PGP SIGNATURE----- --jRHKVT23PllUwdXP-- From owner-freebsd-fs@FreeBSD.ORG Thu May 12 01:04:41 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E139E106564A for ; Thu, 12 May 2011 01:04:41 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta03.emeryville.ca.mail.comcast.net (qmta03.emeryville.ca.mail.comcast.net [76.96.30.32]) by mx1.freebsd.org (Postfix) with ESMTP id C23408FC12 for ; Thu, 12 May 2011 01:04:41 +0000 (UTC) Received: from omta15.emeryville.ca.mail.comcast.net ([76.96.30.71]) by qmta03.emeryville.ca.mail.comcast.net with comcast id iR241g0031Y3wxoA3R4hBV; Thu, 12 May 2011 01:04:41 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta15.emeryville.ca.mail.comcast.net with comcast id iR4Z1g00t1t3BNj8bR4bDo; Thu, 12 May 2011 01:04:36 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 92A1E102C36; Wed, 11 May 2011 18:04:33 -0700 (PDT) Date: Wed, 11 May 2011 18:04:33 -0700 From: Jeremy Chadwick To: Jason Hellenthal Message-ID: <20110512010433.GA48863@icarus.home.lan> References: <4DCA5620.1030203@dannysplace.net> <20110511100655.GA35129@icarus.home.lan> <4DCA66CF.7070608@digsys.bg> <20110511105117.GA36571@icarus.home.lan> <4DCA7056.20200@digsys.bg> <20110511120830.GA37515@icarus.home.lan> <20110511223849.GA65193@DataIX.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110511223849.GA65193@DataIX.net> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 01:04:42 -0000 On Wed, May 11, 2011 at 06:38:49PM -0400, Jason Hellenthal wrote: > > Jeremy, > > On Wed, May 11, 2011 at 05:08:30AM -0700, Jeremy Chadwick wrote: > > On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote: > > > On 11.05.11 13:51, Jeremy Chadwick wrote: > > > >Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks > > > >should also keep that in mind when putting an SSD into use in this > > > >fashion. > > > > > > By the way, what would be the use of TRIM for SLOG and L2ARC devices? > > > I see absolutely no benefit from TRIM for the L2ARC, because it is > > > written slowly (on purpose). Any current, or 1-2 generations back SSD > > > would handle that write load without TRIM and without any performance > > > degradation. > > > > > > Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC > > > SSD for the SLOG, for many reasons. 
The write regions on the SLC > > > NAND should be smaller (my wild guess, current practice may differ) > > > and the need for rewriting will be small. If you don't need to > > > rewrite already written data, TRIM does not help. Also, as far as I > > > understand, most "serious" SSDs (typical for SLC I guess) would have > > > twice or more the advertised size and always write to fresh cells, > > > scheduling an background erase of the 'overwritten' cell. > > > > AFAIK, drive manufacturers do not disclose just how much reallocation > > space they keep available on an SSD. I'd rather not speculate as to how > > much, as I'm certain it varies per vendor. > > > > Lets not forget here: The size of the separate log device may be quite > small. A rule of thumb is that you should size the separate log to be able > to handle 10 seconds of your expected synchronous write workload. It would > be rare to need more than 100 MB in a separate log device, but the > separate log must be at least 64 MB. > > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide > > So in other words how much is TRIM really even effective give the above ? > > Even with a high database write load on the disks at full compacity of the > incoming link I would find it hard to believe that anyone could get the > ZIL to even come close to 512MB. In the case of an SSD being used as a log device (ZIL), I imagine it would only matter the longer the drive was kept in use. I do not use log devices anywhere with ZFS, so I can't really comment. In the case of an SSD being used as a cache device (L2ARC), I imagine it would matter much more. In the case of an SSD being used as a pool device, it matters greatly. Why it matters: there's two methods of "reclaiming" blocks which were used: internal SSD "garbage collection" and TRIM. For a NAND block to be reclaimed, it has to be erased -- SSDs erase things in pages rather than individual LBAs. With TRIM, you submit the data management command via ATA with a list of LBAs you wish to inform the drive are no longer used. The drive aggregates the LBA ranges, determines if an entire flash page can be erased, and does it. If it can't, it makes some sort of mental note that the individual LBA (in some particular page) shouldn't be used. The "garbage collection" works when the SSD is idle. I have no idea what "idle" actually means operationally, because again, vendors don't disclose what the idle intervals are. 5 minutes? 24 hours? It matters, but they don't tell us. (What confuses me about the "idle GC" method is how it determines what it can erase -- if the OS didn't tell it what it's using, how does it know it can erase the page?) Anyway, how all this manifests itself performance-wise is intriguing. It's not speculation: there's hard evidence that not using TRIM results in SSD performance, bluntly put, sucking badly on some SSDs. There's this mentality that wear levelling completely solves all of the **performance** concerns -- that isn't the case at all. In fact, I'm under the impression it probably hurts performance, but it depends on how it's implemented within the drive firmware. bit-tech did an experiment using Windows 7 -- which supports and uses TRIM assuming the device advertises the capability -- with different models of SSDs. 
The testing procedure is documented here, but I'll document it as well: http://www.bit-tech.net/hardware/storage/2010/02/04/windows-7-ssd-performance-and-trim/4 Again, remember, this is done on a Windows 7 system which does support TRIM if the device supports it. The testing steps, in this order: 1) SSD without TRIM support -- all LBAs are zeroed. 2) Took read/write benchmark readings. 3) SSD without TRIM support -- partitioned and formatted as NTFS (cluster size unknown), copied 100GB of data to the drive, deleted all the data, and repeated this method 10 times. 4) Step #2 repeated. 5) Upgraded SSD firmware to a version that supports TRIM. 6) SSD with TRIM support -- step #1 repeated. 7) Step #2 repeated. 8) SSD with TRIM support -- step #3 repeated. 9) Step #2 repeated. Without TRIM, some drives drop their read performance by more than 50%, and write performance by almost 70%. I'm focusing on Intel SSDs here, by the way. I do not care for OCZ or Corsair products. So because ZFS on FreeBSD (and Solaris/OpenSolaris) doesn't support TRIM, effectively the benchmarks shown pre-firmware-upgrade are what ZFS on FreeBSD will mimic (to some degree). Therefore, simply put, users should be concerned when using ZFS on FreeBSD with SSDs. It doesn't matter to me if you're only using 64MBytes of a 40GB drive or if you're using the entire thing; no TRIM means degraded performance over time. Can you refute any of this evidence? > Given most SSD's come at a size greater than 32GB I hope this comes as a > early reminder that the ZIL you are buying that disk for is only going to > be using a small percent of that disk and I hope you justify cost over its > actual use. If you do happen to justify creating a ZIL for your pool then > I hope that you partition it wisely to make use of the rest of the space > that is untouched. > > For all other cases I would reccomend if you still want to have a ZIL that > you take some sort of PCI->SD CARD or USB stick into account with > mirroring. Others have pointed out this isn't effective (re: USB sticks). The read and write speeds are too slow, and limit the overall performance of ZFS in a very bad way. I can absolutely confirm this claim (I've tested it myself, using a high-end USB flash drive as a cache device (L2ARC)). Alexander Leidinger pointed out that using a USB stick for cache/L2ARC *does* improve performance on older systems which have slower disk I/O (e.g. ICH5-based systems). -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu May 12 01:48:56 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8683B106566B for ; Thu, 12 May 2011 01:48:56 +0000 (UTC) (envelope-from jhellenthal@gmail.com) Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com [209.85.214.182]) by mx1.freebsd.org (Postfix) with ESMTP id 37E9D8FC0C for ; Thu, 12 May 2011 01:48:56 +0000 (UTC) Received: by iwn33 with SMTP id 33so1411860iwn.13 for ; Wed, 11 May 2011 18:48:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:sender:date:from:to:cc:subject:message-id :references:mime-version:content-type:content-disposition :in-reply-to:x-openpgp-key-id:x-openpgp-key-fingerprint :x-openpgp-key-url; bh=0juTFZwYiqAOITQrJKnlajsg9UTRtRcaAc1MlkxS2N4=; b=VzCiLex67NiTiEFFoKZMX1IsPnQ8RyIOfTJhqwIGVr738FxtDIOudRQd66HmYGG9dS relruV04Qbc2EPDSEg9GVmDetVOiB9O3FEbTyCcwzrHK3skoeYVtKlS6w+58rw185GCZ 4Mw2rwqFFppE/uB32aA4jWIwdvMfTsCYKH0lo= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:x-openpgp-key-id :x-openpgp-key-fingerprint:x-openpgp-key-url; b=pXXYMPkDnUx7RxrrKiGNDIFsLVwmuYdmxS0X09GKIEfjh1RsUv0J+hLDYLQnlQkhYp yw5InLVWZ+J9wMFHf8ocXW1KzdVIJCXWsdR/vfC27k6WkaBVgsScge6oflVd+wojDvZe jcxrZ+kYPZRhkpf8Un2/LAsdBOO8WLsbOJ/YY= Received: by 10.43.62.134 with SMTP id xa6mr4315019icb.369.1305164934349; Wed, 11 May 2011 18:48:54 -0700 (PDT) Received: from DataIX.net (adsl-99-190-84-116.dsl.klmzmi.sbcglobal.net [99.190.84.116]) by mx.google.com with ESMTPS id hc41sm268704ibb.47.2011.05.11.18.48.52 (version=TLSv1/SSLv3 cipher=OTHER); Wed, 11 May 2011 18:48:53 -0700 (PDT) Sender: "J. Hellenthal" Received: from DataIX.net (localhost [127.0.0.1]) by DataIX.net (8.14.4/8.14.4) with ESMTP id p4C1mok4085883 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 11 May 2011 21:48:50 -0400 (EDT) (envelope-from jhell@DataIX.net) Received: (from jhell@localhost) by DataIX.net (8.14.4/8.14.4/Submit) id p4C1mnE5085882; Wed, 11 May 2011 21:48:49 -0400 (EDT) (envelope-from jhell@DataIX.net) Date: Wed, 11 May 2011 21:48:48 -0400 From: Jason Hellenthal To: Jeremy Chadwick Message-ID: <20110512014848.GA35736@DataIX.net> References: <4DCA5620.1030203@dannysplace.net> <20110511100655.GA35129@icarus.home.lan> <4DCA66CF.7070608@digsys.bg> <20110511105117.GA36571@icarus.home.lan> <4DCA7056.20200@digsys.bg> <20110511120830.GA37515@icarus.home.lan> <20110511223849.GA65193@DataIX.net> <20110512010433.GA48863@icarus.home.lan> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="17pEHd4RhPHOinZp" Content-Disposition: inline In-Reply-To: <20110512010433.GA48863@icarus.home.lan> X-OpenPGP-Key-Id: 0x89D8547E X-OpenPGP-Key-Fingerprint: 85EF E26B 07BB 3777 76BE B12A 9057 8789 89D8 547E X-OpenPGP-Key-URL: http://bit.ly/0x89D8547E Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: How to enable cache and logs. 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 01:48:56 -0000 --17pEHd4RhPHOinZp Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Jeremy, As always the qaulity of your messages are 101% spot on and I=20 always find some new new information that becomes handy more often than I= =20 could say, and there is always something to be learned.=20 Thanks. On Wed, May 11, 2011 at 06:04:33PM -0700, Jeremy Chadwick wrote: > On Wed, May 11, 2011 at 06:38:49PM -0400, Jason Hellenthal wrote: > >=20 > > Jeremy, > >=20 > > On Wed, May 11, 2011 at 05:08:30AM -0700, Jeremy Chadwick wrote: > > > On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote: > > > > On 11.05.11 13:51, Jeremy Chadwick wrote: > > > > >Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so fo= lks > > > > >should also keep that in mind when putting an SSD into use in this > > > > >fashion. > > > > > > > > By the way, what would be the use of TRIM for SLOG and L2ARC device= s? > > > > I see absolutely no benefit from TRIM for the L2ARC, because it is > > > > written slowly (on purpose). Any current, or 1-2 generations back = SSD > > > > would handle that write load without TRIM and without any performan= ce > > > > degradation. > > > > > > > > Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC > > > > SSD for the SLOG, for many reasons. The write regions on the SLC > > > > NAND should be smaller (my wild guess, current practice may differ) > > > > and the need for rewriting will be small. If you don't need to > > > > rewrite already written data, TRIM does not help. Also, as far as I > > > > understand, most "serious" SSDs (typical for SLC I guess) would have > > > > twice or more the advertised size and always write to fresh cells, > > > > scheduling an background erase of the 'overwritten' cell. > > >=20 > > > AFAIK, drive manufacturers do not disclose just how much reallocation > > > space they keep available on an SSD. I'd rather not speculate as to = how > > > much, as I'm certain it varies per vendor. > > >=20 > >=20 > > Lets not forget here: The size of the separate log device may be quite= =20 > > small. A rule of thumb is that you should size the separate log to be a= ble=20 > > to handle 10 seconds of your expected synchronous write workload. It wo= uld=20 > > be rare to need more than 100 MB in a separate log device, but the=20 > > separate log must be at least 64 MB. > >=20 > > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide > >=20 > > So in other words how much is TRIM really even effective give the above= ? > >=20 > > Even with a high database write load on the disks at full compacity of = the=20 > > incoming link I would find it hard to believe that anyone could get the= =20 > > ZIL to even come close to 512MB. >=20 > In the case of an SSD being used as a log device (ZIL), I imagine it > would only matter the longer the drive was kept in use. I do not use > log devices anywhere with ZFS, so I can't really comment. >=20 > In the case of an SSD being used as a cache device (L2ARC), I imagine it > would matter much more. >=20 > In the case of an SSD being used as a pool device, it matters greatly. >=20 > Why it matters: there's two methods of "reclaiming" blocks which were > used: internal SSD "garbage collection" and TRIM. 
For a NAND block to be > reclaimed, it has to be erased -- SSDs erase things in pages rather > than individual LBAs. With TRIM, you submit the data management command > via ATA with a list of LBAs you wish to inform the drive are no longer > used. The drive aggregates the LBA ranges, determines if an entire > flash page can be erased, and does it. If it can't, it makes some sort > of mental note that the individual LBA (in some particular page) > shouldn't be used. >=20 > The "garbage collection" works when the SSD is idle. I have no idea > what "idle" actually means operationally, because again, vendors don't > disclose what the idle intervals are. 5 minutes? 24 hours? It > matters, but they don't tell us. (What confuses me about the "idle GC" > method is how it determines what it can erase -- if the OS didn't tell > it what it's using, how does it know it can erase the page?) >=20 > Anyway, how all this manifests itself performance-wise is intriguing. > It's not speculation: there's hard evidence that not using TRIM results > in SSD performance, bluntly put, sucking badly on some SSDs. >=20 > There's this mentality that wear levelling completely solves all of the > **performance** concerns -- that isn't the case at all. In fact, I'm > under the impression it probably hurts performance, but it depends on > how it's implemented within the drive firmware. >=20 > bit-tech did an experiment using Windows 7 -- which supports and uses > TRIM assuming the device advertises the capability -- with different > models of SSDs. The testing procedure is documented here, but I'll > document it as well: >=20 > http://www.bit-tech.net/hardware/storage/2010/02/04/windows-7-ssd-perform= ance-and-trim/4 >=20 > Again, remember, this is done on a Windows 7 system which does support > TRIM if the device supports it. The testing steps, in this order: >=20 > 1) SSD without TRIM support -- all LBAs are zeroed. > 2) Took read/write benchmark readings. > 3) SSD without TRIM support -- partitioned and formatted as NTFS > (cluster size unknown), copied 100GB of data to the drive, deleted all > the data, and repeated this method 10 times. > 4) Step #2 repeated. > 5) Upgraded SSD firmware to a version that supports TRIM. > 6) SSD with TRIM support -- step #1 repeated. > 7) Step #2 repeated. > 8) SSD with TRIM support -- step #3 repeated. > 9) Step #2 repeated. >=20 > Without TRIM, some drives drop their read performance by more than 50%, > and write performance by almost 70%. I'm focusing on Intel SSDs here, > by the way. I do not care for OCZ or Corsair products. >=20 > So because ZFS on FreeBSD (and Solaris/OpenSolaris) doesn't support > TRIM, effectively the benchmarks shown pre-firmware-upgrade are what ZFS > on FreeBSD will mimic (to some degree). >=20 > Therefore, simply put, users should be concerned when using ZFS on > FreeBSD with SSDs. It doesn't matter to me if you're only using > 64MBytes of a 40GB drive or if you're using the entire thing; no TRIM > means degraded performance over time. >=20 > Can you refute any of this evidence? >=20 At least now at the moment NO. But I can say depending on how large of a=20 use of SSDs with OpenSolaris users from before the Oracle reaping that I=20 didnt recall seeing any relative bug reports on degradation. But like I=20 said... I havent seen them but thats not to say there wasnt a lack of use= =20 either. Definately more to look into, test, benchmark & test again. 
> > Given most SSD's come at a size greater than 32GB I hope this comes as = a=20 > > early reminder that the ZIL you are buying that disk for is only going = to=20 > > be using a small percent of that disk and I hope you justify cost over = its=20 > > actual use. If you do happen to justify creating a ZIL for your pool th= en=20 > > I hope that you partition it wisely to make use of the rest of the spac= e=20 > > that is untouched. > >=20 > > For all other cases I would reccomend if you still want to have a ZIL t= hat=20 > > you take some sort of PCI->SD CARD or USB stick into account with=20 > > mirroring. >=20 > Others have pointed out this isn't effective (re: USB sticks). The read > and write speeds are too slow, and limit the overall performance of ZFS > in a very bad way. I can absolutely confirm this claim (I've tested it > myself, using a high-end USB flash drive as a cache device (L2ARC)). >=20 > Alexander Leidinger pointed out that using a USB stick for cache/L2ARC > *does* improve performance on older systems which have slower disk I/O > (e.g. ICH5-based systems). >=20 Agreed. Soon as the bus speed, write speeds are greater than the speeds=20 that USB 2.0 can handle, then any USB based solution is useless. ICH5 and= =20 up would be right about that time you would see this starting to happen. sdcards/cfcards mileage may vary depending on the transfer rates. But=20 still the same situation applies like you said once your main pool=20 throughput outweighs the throughput on your ZIL then its probably not=20 worth even having a ZIL or a Cache device. Emphasis on Cache moreso than ZIL. Anyway all good information for those to make the judgement whether they=20 need a cache or a zil. Thanks again Jeremy. Always appreciated. --=20 Regards, (jhell) Jason Hellenthal --17pEHd4RhPHOinZp Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (FreeBSD) Comment: http://bit.ly/0x89D8547E iQEcBAEBAgAGBQJNyzyAAAoJEJBXh4mJ2FR+2qAH/3A09ZwqGiIjuz25r5FVqwk6 iJuTHR1rlOTV0IqaUh6a2FaFGnWKDu/KpQLOj+ZGDPB6DH70fOon90QvU3/hTjoN RhguCVxHfbQLJbqaXKHZkj+JC6RhMV1H899/VAx29XlVMfvarUXw47vF7Pjcq3G1 tK5pZyK66yldkUzPwQHufIHtcebWu7EVzGWF4Hl25apkRDTyDRHL45rIM/vdDE94 SB81i9bFD+BuMV2KKUwwG/JfKborFxtYID4Vy8nIVDGjq9fE9zh4FTClnyj3wmNE Y5UYsjB1JkZX2q195FkMk3YxLIyS3xSTahHUqwsVZA+bm+Rc2G9DTU1jhlHSudU= =QtJ+ -----END PGP SIGNATURE----- --17pEHd4RhPHOinZp-- From owner-freebsd-fs@FreeBSD.ORG Thu May 12 02:08:09 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 29539106564A for ; Thu, 12 May 2011 02:08:09 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta04.westchester.pa.mail.comcast.net (qmta04.westchester.pa.mail.comcast.net [76.96.62.40]) by mx1.freebsd.org (Postfix) with ESMTP id D9E558FC14 for ; Thu, 12 May 2011 02:08:08 +0000 (UTC) Received: from omta21.westchester.pa.mail.comcast.net ([76.96.62.72]) by qmta04.westchester.pa.mail.comcast.net with comcast id iS891g0021ZXKqc54S895N; Thu, 12 May 2011 02:08:09 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta21.westchester.pa.mail.comcast.net with comcast id iS861g0071t3BNj3hS86gF; Thu, 12 May 2011 02:08:07 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id A8313102C36; Wed, 11 May 2011 19:08:04 -0700 (PDT) Date: Wed, 11 May 2011 19:08:04 -0700 From: Jeremy Chadwick To: Jason Hellenthal Message-ID: <20110512020804.GA50560@icarus.home.lan> References: 
<4DCA5620.1030203@dannysplace.net> <20110511100655.GA35129@icarus.home.lan> <4DCA66CF.7070608@digsys.bg> <20110511105117.GA36571@icarus.home.lan> <4DCA7056.20200@digsys.bg> <20110511120830.GA37515@icarus.home.lan> <20110511223849.GA65193@DataIX.net> <20110512010433.GA48863@icarus.home.lan> <20110512014848.GA35736@DataIX.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110512014848.GA35736@DataIX.net> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 02:08:09 -0000 On Wed, May 11, 2011 at 09:48:48PM -0400, Jason Hellenthal wrote: > Jeremy, As always the qaulity of your messages are 101% spot on and I > always find some new new information that becomes handy more often than I > could say, and there is always something to be learned. > > Thanks. > > On Wed, May 11, 2011 at 06:04:33PM -0700, Jeremy Chadwick wrote: > > On Wed, May 11, 2011 at 06:38:49PM -0400, Jason Hellenthal wrote: > > > > > > Jeremy, > > > > > > On Wed, May 11, 2011 at 05:08:30AM -0700, Jeremy Chadwick wrote: > > > > On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote: > > > > > On 11.05.11 13:51, Jeremy Chadwick wrote: > > > > > >Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks > > > > > >should also keep that in mind when putting an SSD into use in this > > > > > >fashion. > > > > > > > > > > By the way, what would be the use of TRIM for SLOG and L2ARC devices? > > > > > I see absolutely no benefit from TRIM for the L2ARC, because it is > > > > > written slowly (on purpose). Any current, or 1-2 generations back SSD > > > > > would handle that write load without TRIM and without any performance > > > > > degradation. > > > > > > > > > > Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC > > > > > SSD for the SLOG, for many reasons. The write regions on the SLC > > > > > NAND should be smaller (my wild guess, current practice may differ) > > > > > and the need for rewriting will be small. If you don't need to > > > > > rewrite already written data, TRIM does not help. Also, as far as I > > > > > understand, most "serious" SSDs (typical for SLC I guess) would have > > > > > twice or more the advertised size and always write to fresh cells, > > > > > scheduling an background erase of the 'overwritten' cell. > > > > > > > > AFAIK, drive manufacturers do not disclose just how much reallocation > > > > space they keep available on an SSD. I'd rather not speculate as to how > > > > much, as I'm certain it varies per vendor. > > > > > > > > > > Lets not forget here: The size of the separate log device may be quite > > > small. A rule of thumb is that you should size the separate log to be able > > > to handle 10 seconds of your expected synchronous write workload. It would > > > be rare to need more than 100 MB in a separate log device, but the > > > separate log must be at least 64 MB. > > > > > > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide > > > > > > So in other words how much is TRIM really even effective give the above ? > > > > > > Even with a high database write load on the disks at full compacity of the > > > incoming link I would find it hard to believe that anyone could get the > > > ZIL to even come close to 512MB. 
> > > > In the case of an SSD being used as a log device (ZIL), I imagine it > > would only matter the longer the drive was kept in use. I do not use > > log devices anywhere with ZFS, so I can't really comment. > > > > In the case of an SSD being used as a cache device (L2ARC), I imagine it > > would matter much more. > > > > In the case of an SSD being used as a pool device, it matters greatly. > > > > Why it matters: there's two methods of "reclaiming" blocks which were > > used: internal SSD "garbage collection" and TRIM. For a NAND block to be > > reclaimed, it has to be erased -- SSDs erase things in pages rather > > than individual LBAs. With TRIM, you submit the data management command > > via ATA with a list of LBAs you wish to inform the drive are no longer > > used. The drive aggregates the LBA ranges, determines if an entire > > flash page can be erased, and does it. If it can't, it makes some sort > > of mental note that the individual LBA (in some particular page) > > shouldn't be used. > > > > The "garbage collection" works when the SSD is idle. I have no idea > > what "idle" actually means operationally, because again, vendors don't > > disclose what the idle intervals are. 5 minutes? 24 hours? It > > matters, but they don't tell us. (What confuses me about the "idle GC" > > method is how it determines what it can erase -- if the OS didn't tell > > it what it's using, how does it know it can erase the page?) > > > > Anyway, how all this manifests itself performance-wise is intriguing. > > It's not speculation: there's hard evidence that not using TRIM results > > in SSD performance, bluntly put, sucking badly on some SSDs. > > > > There's this mentality that wear levelling completely solves all of the > > **performance** concerns -- that isn't the case at all. In fact, I'm > > under the impression it probably hurts performance, but it depends on > > how it's implemented within the drive firmware. > > > > bit-tech did an experiment using Windows 7 -- which supports and uses > > TRIM assuming the device advertises the capability -- with different > > models of SSDs. The testing procedure is documented here, but I'll > > document it as well: > > > > http://www.bit-tech.net/hardware/storage/2010/02/04/windows-7-ssd-performance-and-trim/4 > > > > Again, remember, this is done on a Windows 7 system which does support > > TRIM if the device supports it. The testing steps, in this order: > > > > 1) SSD without TRIM support -- all LBAs are zeroed. > > 2) Took read/write benchmark readings. > > 3) SSD without TRIM support -- partitioned and formatted as NTFS > > (cluster size unknown), copied 100GB of data to the drive, deleted all > > the data, and repeated this method 10 times. > > 4) Step #2 repeated. > > 5) Upgraded SSD firmware to a version that supports TRIM. > > 6) SSD with TRIM support -- step #1 repeated. > > 7) Step #2 repeated. > > 8) SSD with TRIM support -- step #3 repeated. > > 9) Step #2 repeated. > > > > Without TRIM, some drives drop their read performance by more than 50%, > > and write performance by almost 70%. I'm focusing on Intel SSDs here, > > by the way. I do not care for OCZ or Corsair products. > > > > So because ZFS on FreeBSD (and Solaris/OpenSolaris) doesn't support > > TRIM, effectively the benchmarks shown pre-firmware-upgrade are what ZFS > > on FreeBSD will mimic (to some degree). > > > > Therefore, simply put, users should be concerned when using ZFS on > > FreeBSD with SSDs. 
It doesn't matter to me if you're only using > > 64MBytes of a 40GB drive or if you're using the entire thing; no TRIM > > means degraded performance over time. > > > > Can you refute any of this evidence? > > > > At least now at the moment NO. But I can say depending on how large of a > use of SSDs with OpenSolaris users from before the Oracle reaping that I > didnt recall seeing any relative bug reports on degradation. But like I > said... I havent seen them but thats not to say there wasnt a lack of use > either. Definately more to look into, test, benchmark & test again. > > > > Given most SSD's come at a size greater than 32GB I hope this comes as a > > > early reminder that the ZIL you are buying that disk for is only going to > > > be using a small percent of that disk and I hope you justify cost over its > > > actual use. If you do happen to justify creating a ZIL for your pool then > > > I hope that you partition it wisely to make use of the rest of the space > > > that is untouched. > > > > > > For all other cases I would reccomend if you still want to have a ZIL that > > > you take some sort of PCI->SD CARD or USB stick into account with > > > mirroring. > > > > Others have pointed out this isn't effective (re: USB sticks). The read > > and write speeds are too slow, and limit the overall performance of ZFS > > in a very bad way. I can absolutely confirm this claim (I've tested it > > myself, using a high-end USB flash drive as a cache device (L2ARC)). > > > > Alexander Leidinger pointed out that using a USB stick for cache/L2ARC > > *does* improve performance on older systems which have slower disk I/O > > (e.g. ICH5-based systems). > > > > Agreed. Soon as the bus speed, write speeds are greater than the speeds > that USB 2.0 can handle, then any USB based solution is useless. ICH5 and > up would be right about that time you would see this starting to happen. > > sdcards/cfcards mileage may vary depending on the transfer rates. But > still the same situation applies like you said once your main pool > throughput outweighs the throughput on your ZIL then its probably not > worth even having a ZIL or a Cache device. Emphasis on Cache moreso than > ZIL. > > > Anyway all good information for those to make the judgement whether they > need a cache or a zil. > > > Thanks again Jeremy. Always appreciated. You're welcome. It's important to note that much of what I say is stuff I've learned and read (technical documentation usually) on my own -- which means I almost certainly misunderstand certain pieces of technology. There are a *lot* of people here who understand it much better than I do. (I'm looking at you, jhb@ ;-) ) As such, I probably should have CC'd pjd@ on this thread, since he's talked a bit about how to get ZFS on FreeBSD to work with TRIM, and when to issue the erasing of said blocks. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu May 12 02:26:43 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C53E61065688 for ; Thu, 12 May 2011 02:26:43 +0000 (UTC) (envelope-from fbsd@dannysplace.net) Received: from mailgw.dannysplace.net (mailgw.dannysplace.net [204.109.56.184]) by mx1.freebsd.org (Postfix) with ESMTP id 937FF8FC14 for ; Thu, 12 May 2011 02:26:43 +0000 (UTC) Received: from [203.206.171.212] (helo=[192.168.10.10]) by mailgw.dannysplace.net with esmtpsa (TLSv1:CAMELLIA256-SHA:256) (Exim 4.72 (FreeBSD)) (envelope-from ) id 1QKLfi-000Eyk-I0 for freebsd-fs@freebsd.org; Thu, 12 May 2011 12:30:40 +1000 Message-ID: <4DCB455C.4020805@dannysplace.net> Date: Thu, 12 May 2011 12:26:36 +1000 From: Danny Carroll User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.2.17) Gecko/20110414 Lightning/1.0b2 Thunderbird/3.1.10 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <4DCA5620.1030203@dannysplace.net> In-Reply-To: <4DCA5620.1030203@dannysplace.net> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Authenticated-User: danny X-Authenticator: plain X-Exim-Version: 4.72 (build at 12-Jul-2010 18:31:29) X-Date: 2011-05-12 12:30:39 X-Connected-IP: 203.206.171.212:50545 X-Message-Linecount: 108 X-Body-Linecount: 95 X-Message-Size: 4204 X-Body-Size: 3656 X-Received-Count: 1 X-Recipient-Count: 1 X-Local-Recipient-Count: 1 X-Local-Recipient-Defer-Count: 0 X-Local-Recipient-Fail-Count: 0 X-SA-Exim-Connect-IP: 203.206.171.212 X-SA-Exim-Mail-From: fbsd@dannysplace.net X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on damka.dannysplace.net X-Spam-Level: X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00 autolearn=ham version=3.3.1 X-SA-Exim-Version: 4.2 X-SA-Exim-Scanned: Yes (on mailgw.dannysplace.net) Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: fbsd@dannysplace.net List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 02:26:43 -0000 On 11/05/2011 7:25 PM, Danny Carroll wrote: > Hello all. > > I've been using ZFS for some time now and have never had an issued > (except perhaps the issue of speed...) > When v28 is taken into -STABLE I will most likely upgrade to v28 at that > point. Currently I am running v15 with v4 on disk. > > When I move to v28 I will probably wish to enable a L2Arc and also > perhaps dedicated log devices. > > I'm curious about a few things however. > > 1. Can I remove either the L2 ARC or the log devices if things don't go > as planned or if I need to free up some resources? > 2. What are the best practices for setting up these? Would a geom > mirror for the log device be the way to go. Or can you just let ZFS > mirror the log itself? > 3. What happens when one or both of the log devices fail. Does ZFS > come to a crashing halt and kill all the data? Or does it simply > complain that the ZIL is no longer active and continue on it's merry way? > > In short, what is the best way to set up these two features? > Replying to myself in order to summarise the recommendations (when using v28): - Don't use SSD for the Log device. Write speed tends to be a problem. - SSD ok for cache if the sizing is right, but without TRIM, don't expect to take full advantage of the SSD. 
- Do use two devices for log and mirror them with ZFS. Bad things *can* happen if*all* the log devices die. - Don't colocate L2ARC and Log devices. - Log devices can be small, ZFS Best practices guide specifies about 50% of RAM as max. Minimum should be Throughput * 10 (1Gb for 100MB/sec of writes). let me know if I got anything wrong or missed something important. Remaining questions. - Is there any advantage to using a spare partition on a SCSI or SATA drive as L2Arc? Assuming it was in the machine already but doing nothing? - If I have 2 pools like this: # zpool status pool: tank state: ONLINE scrub: scrub completed after 11h7m with 0 errors on Sun May 8 14:17:07 2011 config: NAME STATE READ WRITE CKSUM tank ONLINE 0 0 0 raidz1 ONLINE 0 0 0 gpt/data0 ONLINE 0 0 0 gpt/data1 ONLINE 0 0 0 gpt/data2 ONLINE 0 0 0 gpt/data3 ONLINE 0 0 0 gpt/data4 ONLINE 0 0 0 gpt/data5 ONLINE 0 0 0 raidz1 ONLINE 0 0 0 gpt/data6 ONLINE 0 0 0 gpt/data7 ONLINE 0 0 0 gpt/data8 ONLINE 0 0 0 gpt/data9 ONLINE 0 0 0 gpt/data10 ONLINE 0 0 0 gpt/data11 ONLINE 0 0 0 errors: No known data errors pool: system state: ONLINE scrub: scrub completed after 1h1m with 0 errors on Sun May 8 15:18:23 2011 config: NAME STATE READ WRITE CKSUM system ONLINE 0 0 0 mirror ONLINE 0 0 0 gpt/system0 ONLINE 0 0 0 gpt/system1 ONLINE 0 0 0 And I have free space on the "system" disks. I could give two new partitions on the system disks to ZFS for the log devices of the "tank" pool? If I were worried about performance of my "system" pool, I could also use spare partitions on (a couple of) the "tank" disks in a similar way. But it would be silly to use the same disk for ZIL and pool data. In that case, why would I bother to alter the default. Thanks for the info! -D From owner-freebsd-fs@FreeBSD.ORG Thu May 12 02:44:50 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4C14E106564A for ; Thu, 12 May 2011 02:44:50 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id DE8278FC12 for ; Thu, 12 May 2011 02:44:49 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id p4C2ij70026830; Wed, 11 May 2011 21:44:45 -0500 (CDT) Date: Wed, 11 May 2011 21:44:45 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Jeremy Chadwick In-Reply-To: <20110512010433.GA48863@icarus.home.lan> Message-ID: References: <4DCA5620.1030203@dannysplace.net> <20110511100655.GA35129@icarus.home.lan> <4DCA66CF.7070608@digsys.bg> <20110511105117.GA36571@icarus.home.lan> <4DCA7056.20200@digsys.bg> <20110511120830.GA37515@icarus.home.lan> <20110511223849.GA65193@DataIX.net> <20110512010433.GA48863@icarus.home.lan> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Wed, 11 May 2011 21:44:45 -0500 (CDT) Cc: freebsd-fs@freebsd.org, Jason Hellenthal Subject: Re: ZFS: How to enable cache and logs. 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 02:44:50 -0000 On Wed, 11 May 2011, Jeremy Chadwick wrote: > > The "garbage collection" works when the SSD is idle. I have no idea > what "idle" actually means operationally, because again, vendors don't > disclose what the idle intervals are. 5 minutes? 24 hours? It > matters, but they don't tell us. (What confuses me about the "idle GC" > method is how it determines what it can erase -- if the OS didn't tell > it what it's using, how does it know it can erase the page?) Garbage collection is not necessarily just when the drive is idle. Regardless, if one "overwrites" a page (or part of a page), the drive can implement that by reading any non-overlapped existing content (which it already has to do), allocating a fresh (already erased) page, and then writing the composite to that new page. The "overwritten" page is then scheduled for erasure. This sort of garbage collector works by over-provisioning the actual amount of flash in the drive, which should be done anyway in a quality product. This simple recirculating/COW algorithm is a reason why TRIM is not really needed given sufficiently intelligent SSD design. > Therefore, simply put, users should be concerned when using ZFS on > FreeBSD with SSDs. It doesn't matter to me if you're only using > 64MBytes of a 40GB drive or if you're using the entire thing; no TRIM > means degraded performance over time. This seems unduely harsh. Even with TRIM, SSDs will suffer in continually write-heavy (e.g. server) environments. The reason is that the blocks still need to be erased and the erasure performance is limited. It is not uncommon for servers to be run close to their limits most of the time. One should not be ashamed with purchasing a larger SSD than the space consumption appears to warrant. Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Thu May 12 02:52:08 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 519071065672 for ; Thu, 12 May 2011 02:52:08 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id 1948D8FC08 for ; Thu, 12 May 2011 02:52:07 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id p4C2pwMu029542; Wed, 11 May 2011 21:51:58 -0500 (CDT) Date: Wed, 11 May 2011 21:51:58 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Danny Carroll In-Reply-To: <4DCB455C.4020805@dannysplace.net> Message-ID: References: <4DCA5620.1030203@dannysplace.net> <4DCB455C.4020805@dannysplace.net> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Wed, 11 May 2011 21:51:58 -0500 (CDT) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: How to enable cache and logs. 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 02:52:08 -0000 On Thu, 12 May 2011, Danny Carroll wrote: > > Replying to myself in order to summarise the recommendations (when using > v28): > - Don't use SSD for the Log device. Write speed tends to be a problem. DO use SSD for the log device. The log device is only used for synchronous writes. Except for certain usages (E.g. database and NFS server) most writes will be asynchronous and never be written to the log. Huge synchronous writes will also bypass the SSD log device. The log device is for reducing latency on small synchronous writes. > - Is there any advantage to using a spare partition on a SCSI or SATA > drive as L2Arc? Assuming it was in the machine already but doing nothing? The L2ARC is intended to reduce read latency and is random accessed. It is unlikely that rotating media will work well for that. Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Thu May 12 03:36:31 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0CD9F106564A for ; Thu, 12 May 2011 03:36:31 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta08.westchester.pa.mail.comcast.net (qmta08.westchester.pa.mail.comcast.net [76.96.62.80]) by mx1.freebsd.org (Postfix) with ESMTP id AE00D8FC08 for ; Thu, 12 May 2011 03:36:30 +0000 (UTC) Received: from omta17.westchester.pa.mail.comcast.net ([76.96.62.89]) by qmta08.westchester.pa.mail.comcast.net with comcast id iTCV1g0011vXlb858TcWEr; Thu, 12 May 2011 03:36:30 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta17.westchester.pa.mail.comcast.net with comcast id iTcT1g01s1t3BNj3dTcUMy; Thu, 12 May 2011 03:36:30 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id A9A11102C36; Wed, 11 May 2011 20:36:26 -0700 (PDT) Date: Wed, 11 May 2011 20:36:26 -0700 From: Jeremy Chadwick To: Bob Friesenhahn Message-ID: <20110512033626.GA52047@icarus.home.lan> References: <4DCA5620.1030203@dannysplace.net> <4DCB455C.4020805@dannysplace.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 03:36:31 -0000 On Wed, May 11, 2011 at 09:51:58PM -0500, Bob Friesenhahn wrote: > On Thu, 12 May 2011, Danny Carroll wrote: > > > >Replying to myself in order to summarise the recommendations (when using > >v28): > >- Don't use SSD for the Log device. Write speed tends to be a problem. > > DO use SSD for the log device. The log device is only used for > synchronous writes. Except for certain usages (E.g. database and > NFS server) most writes will be asynchronous and never be written to > the log. Huge synchronous writes will also bypass the SSD log > device. The log device is for reducing latency on small synchronous > writes. 
Bob, please correct me if I'm wrong, but as I understand it a log device (ZIL) effectively limits the overall write speed of the pool itself. Consumer-level SSDs do not have extremely high write performance (and it gets worse without TRIM; again a 70% decrease in write speed in some cases). I imagine a very high-end SSD (FusionIO, etc. -- the things that cost $900 and higher) would have extremely high write performance and would work perfectly for this role. Or a battery-backed DDR RAM device. What's amusing (to me anyway) is that when ZFS was originally presented, engineers from Sun folks kept focusing on how "you can buy cheap, generic disks and accomplish goals!" yet if the above statement of mine is accurate, that goes against the original principle. Danny might also find this URL useful: http://constantin.glez.de/blog/2011/02/frequently-asked-questions-about-flash-memory-ssds-and-zfs > >- Is there any advantage to using a spare partition on a SCSI or SATA > >drive as L2Arc? Assuming it was in the machine already but doing nothing? > > The L2ARC is intended to reduce read latency and is random accessed. > It is unlikely that rotating media will work well for that. Agreed -- this is why I tell folks that an SSD would work very well for L2ARC, but my opinion is just to buy more RAM for the ARC ("layer 1"). -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu May 12 03:56:00 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 06A5E106566C for ; Thu, 12 May 2011 03:56:00 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-gy0-f182.google.com (mail-gy0-f182.google.com [209.85.160.182]) by mx1.freebsd.org (Postfix) with ESMTP id AF51F8FC16 for ; Thu, 12 May 2011 03:55:59 +0000 (UTC) Received: by gyg13 with SMTP id 13so505401gyg.13 for ; Wed, 11 May 2011 20:55:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=irQ/V/RYJGBt7AcbvtwFEO75vMGvANZdp5IjoQCCFFY=; b=XUk/aRexC8GcoTrlj44WS/xAfZ1ghXowqRxcckmFO8Zx1TFR0SppydcyD9VOyHK1ok 3bS0EE0fqJyp4CMHXoq1vnf7HGu1Aio7MCOx66c3vnfkCKHFe4HNNUNQw6bfNwAYdGlj vnoOSw+E+W1BulqRpWPOBSGR/hniQ0aE6shUo= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=Zshe1bSaA2CvuVE8VZYZMMhBo7ewvxD/h1/CeE5M33uAZJ/l8xAEl9KmEm9JHMUM/q hVebspe62GVpKjP16OPWDnLAaJ9GlbtWTTCGfxxkmXuq3rhjqbrswb7i7vZb6eIofXnk uN/uyWaPjhIatlAAYmMm+sOW1tvwttdGvtwEM= MIME-Version: 1.0 Received: by 10.90.135.8 with SMTP id i8mr131088agd.113.1305172556994; Wed, 11 May 2011 20:55:56 -0700 (PDT) Received: by 10.90.52.15 with HTTP; Wed, 11 May 2011 20:55:56 -0700 (PDT) In-Reply-To: <20110512033626.GA52047@icarus.home.lan> References: <4DCA5620.1030203@dannysplace.net> <4DCB455C.4020805@dannysplace.net> <20110512033626.GA52047@icarus.home.lan> Date: Wed, 11 May 2011 20:55:56 -0700 Message-ID: From: Freddie Cash To: Jeremy Chadwick Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: How to enable cache and logs. 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 03:56:00 -0000
On Wed, May 11, 2011 at 8:36 PM, Jeremy Chadwick wrote:
> On Wed, May 11, 2011 at 09:51:58PM -0500, Bob Friesenhahn wrote:
>> On Thu, 12 May 2011, Danny Carroll wrote:
>> >
>> >Replying to myself in order to summarise the recommendations (when using
>> >v28):
>> >- Don't use SSD for the Log device. Write speed tends to be a problem.
>>
>> DO use SSD for the log device. The log device is only used for
>> synchronous writes. Except for certain usages (E.g. database and
>> NFS server) most writes will be asynchronous and never be written to
>> the log. Huge synchronous writes will also bypass the SSD log
>> device. The log device is for reducing latency on small synchronous
>> writes.
>
> Bob, please correct me if I'm wrong, but as I understand it a log device
> (ZIL) effectively limits the overall write speed of the pool itself.

Nope. Using a separate log device removes sync writes from the I/O path of the rest of the pool, thus increasing the total write throughput for the pool.

> Danny might also find this URL useful:
>
> http://constantin.glez.de/blog/2011/02/frequently-asked-questions-about-flash-memory-ssds-and-zfs

Read the linked articles. For example:
http://constantin.glez.de/blog/2010/07/solaris-zfs-synchronous-writes-and-zil-explained

Most sync writes go to the ZIL. If the ZIL is part of the pool, then the pool has to issue two separate writes (once to the ZIL, then later to the pool as part of the normal async txg). If the ZIL is a separate device, then there's no write contention with the rest of the pool.

Not every sync write goes to the ZIL. Only writes under a certain size (64 KB or something like that).

Every OpenSolaris, Oracle Solaris, Nexenta admin will recommend getting an enterprise-grade, write-optimised, SLC-based SSD (preferably with a supercap) for use as the SLOG device. Especially if you're using ZFS for anything database-related, or serving files over NFS, everyone says the same: get an SSD for SLOG usage. Why would it be any different for ZFS on FreeBSD?

There are plenty of benchmarks online and in the zfs-discuss mailing list that show the benefits of using an SSD-based SLOG.
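For anyone who wants to try it, the mechanics are simple; the gpt labels below are only placeholder names, so substitute your own devices:

# zpool add tank log mirror gpt/slog0 gpt/slog1    (mirrored separate log)
# zpool add tank cache gpt/l2arc0                  (L2ARC cache device)
# zpool iostat -v tank 5                           (watch how much I/O actually hits them)

On a v28 pool both can be taken out again with "zpool remove" if they turn out not to help, so experimenting is cheap.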
--=20 Freddie Cash fjwcash@gmail.com From owner-freebsd-fs@FreeBSD.ORG Thu May 12 06:33:18 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BFB701065670 for ; Thu, 12 May 2011 06:33:18 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230]) by mx1.freebsd.org (Postfix) with ESMTP id 337A78FC15 for ; Thu, 12 May 2011 06:33:17 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p4C6X77D058558 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Thu, 12 May 2011 09:33:12 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <4DCB7F22.4060008@digsys.bg> Date: Thu, 12 May 2011 09:33:06 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.15) Gecko/20110307 Thunderbird/3.1.9 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <4DCA5620.1030203@dannysplace.net> <4DCB455C.4020805@dannysplace.net> <20110512033626.GA52047@icarus.home.lan> In-Reply-To: <20110512033626.GA52047@icarus.home.lan> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 06:33:18 -0000 On 12.05.11 06:36, Jeremy Chadwick wrote: > On Wed, May 11, 2011 at 09:51:58PM -0500, Bob Friesenhahn wrote: >> On Thu, 12 May 2011, Danny Carroll wrote: >>> Replying to myself in order to summarise the recommendations (when using >>> v28): >>> - Don't use SSD for the Log device. Write speed tends to be a problem. >> DO use SSD for the log device. The log device is only used for >> synchronous writes. Except for certain usages (E.g. database and >> NFS server) most writes will be asynchronous and never be written to >> the log. Huge synchronous writes will also bypass the SSD log >> device. The log device is for reducing latency on small synchronous >> writes. > Bob, please correct me if I'm wrong, but as I understand it a log device > (ZIL) effectively limits the overall write speed of the pool itself. > Perhaps I misstated it in my first post, but there is nothing wrong with using SSD for the SLOG. You can of course create usage/benchmark scenario, where an (cheap) SSD based SLOG will be worse than an (fast) HDD based SLOG, especially if you are not concerned about latency. The SLOG resolves two issues, it increases the pool throughput (primary storage) by removing small synchronous writes from it, that will unnecessarily introduce head movement and more IOPS and it provided low latency for small synchronous writes. The later is only valid if the SSD is sufficiently write-optimized. Most consumer SSDs end up saturated by writes. Sequential write IOPS is what matters here. About TRIM. As it was already mentioned, you will use only small portion of an (for example) 32GB SSD for the SLOG. If you do not allocate the entire SSD, then wear leveling will be able to play well and it is very likely you will not suffer any performance degradation. By the way, I do not believe Windows benchmark has any significance in our ZFS usage for the SSDs. How is TRIM implemented in Windows? 
How does it relate to SSD usage as SLOG and L2ARC? How can ever TRIM support influence reading from the drive?! TRIM is an slow operation. How often are these issued? What is the impact of issuing TRIM to an otherwise loaded SSD? Daniel From owner-freebsd-fs@FreeBSD.ORG Thu May 12 06:44:19 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A35EF106564A for ; Thu, 12 May 2011 06:44:19 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230]) by mx1.freebsd.org (Postfix) with ESMTP id 3422E8FC15 for ; Thu, 12 May 2011 06:44:18 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p4C6i9d6058598 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Thu, 12 May 2011 09:44:14 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <4DCB81B8.6070301@digsys.bg> Date: Thu, 12 May 2011 09:44:08 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.15) Gecko/20110307 Thunderbird/3.1.9 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <4DCA5620.1030203@dannysplace.net> <4DCB455C.4020805@dannysplace.net> In-Reply-To: <4DCB455C.4020805@dannysplace.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 06:44:19 -0000 On 12.05.11 05:26, Danny Carroll wrote: > > - Don't use SSD for the Log device. Write speed tends to be a problem. It all depends on your usage. You need to experiment, unfortunately. > - SSD ok for cache if the sizing is right, but without TRIM, don't > expect to take full advantage of the SSD. I do not believe TRIM has any effect on L2ARC. Why? - TRIM is a technique to optimize future writes; - L2ARC is written at controlled, very low rate, I believe something like 8MB/sec. There is no SSD currently on the market, with or without TRIM that has any trouble sustaining that rate. - TRIM might introduce delays, it is very 'expensive' command. But that will surely wary by drive/manufacturer. - There is no way TRIM can influence reading from the flash media. Reading from L2ARC with low latency and high speed is it's main purpose anyway. > Remaining questions. > - Is there any advantage to using a spare partition on a SCSI or SATA > drive as L2Arc? Assuming it was in the machine already but doing nothing? Absolutely no advantage. You want L2ARC to be very low latency and high-bandwidth for random reading. Especially low-latency. This does not apply to rotating disks. 
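The L2ARC fill rate is exposed as a tunable on FreeBSD, so rather than guessing you can check the box itself (sysctl names quoted from memory, values are the defaults as I recall them; verify with sysctl -d):

# sysctl vfs.zfs.l2arc_write_max vfs.zfs.l2arc_write_boost
vfs.zfs.l2arc_write_max: 8388608
vfs.zfs.l2arc_write_boost: 8388608

That is roughly 8 MB/s of steady-state feed, with the same amount again of "boost" allowed while the cache device is still empty.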
Daniel From owner-freebsd-fs@FreeBSD.ORG Thu May 12 08:02:49 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7EE7C106566C for ; Thu, 12 May 2011 08:02:49 +0000 (UTC) (envelope-from numisemis@gmail.com) Received: from mail-ww0-f50.google.com (mail-ww0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id 0BE818FC16 for ; Thu, 12 May 2011 08:02:48 +0000 (UTC) Received: by wwc33 with SMTP id 33so1417854wwc.31 for ; Thu, 12 May 2011 01:02:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:content-type:mime-version:subject:from :in-reply-to:date:content-transfer-encoding:message-id:references:to :x-mailer; bh=y5Bfw6bWyC1G9CFze8Vs2J3W+YtO5U4YJGjskubuCFI=; b=r0CDomCnXi0ERwWbds0jmDZTh/iu6P44pXOg+ksMoM00g2Hf9iuSka/wS6qUCc+tl8 uMJUuhRC6gXakY0CDg00ij50LcvpTZs4ufuH3JG8nst5qbXA4xMSMNZygpwfBUY6drEp 4oM06UR09SABIxAgfcKodHottIfK6Uy8P+w9A= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=content-type:mime-version:subject:from:in-reply-to:date :content-transfer-encoding:message-id:references:to:x-mailer; b=R9OR9gWK+GkuAd4z0kFKi0R4Q89mcDk/klMt5QcK5y/3CoZDV6bt7m+GlfshxGeHOp 2FprGMBQ+hW9nqitMU70rWohNIbvWCYWO88wLRgO3Yu0OrEHULRUUF5w3cIuQgL8WFQc yFcmhELfq693JAPJfNDhUqsHRPcvamuE57BOQ= Received: by 10.216.79.11 with SMTP id h11mr4701200wee.77.1305187367760; Thu, 12 May 2011 01:02:47 -0700 (PDT) Received: from sime-imac.logos.hr ([213.147.110.159]) by mx.google.com with ESMTPS id t5sm473548wes.9.2011.05.12.01.02.46 (version=TLSv1/SSLv3 cipher=OTHER); Thu, 12 May 2011 01:02:46 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Apple Message framework v1084) From: =?iso-8859-2?Q?=A9imun_Mikecin?= In-Reply-To: <4DCB81B8.6070301@digsys.bg> Date: Thu, 12 May 2011 10:02:42 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: References: <4DCA5620.1030203@dannysplace.net> <4DCB455C.4020805@dannysplace.net> <4DCB81B8.6070301@digsys.bg> To: freebsd-fs X-Mailer: Apple Mail (2.1084) Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 08:02:49 -0000 On 12. svi. 2011., at 08:44, Daniel Kalchev wrote: > On 12.05.11 05:26, Danny Carroll wrote: >>=20 >> - Don't use SSD for the Log device. Write speed tends to be a = problem. > It all depends on your usage. You need to experiment, unfortunately. What is the alternative for log devices if you are not using SSD? Rotating hard drives? AFAIK, two factors define the speed of log device: write transfer rate = and write latency. You will not find a rotating hard drive that has a write latency = anything near the write latency of even a slowest SSD you can find on = the market. On the other hand, only a very few rotating hard drives have a write = transfer rate that can be compared to SSD's. 
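If you do put the log on an SSD, one way to act on the earlier advice about leaving most of the flash untouched is simply to create one small labelled partition and stop there (device name, size and label below are only an example):

# gpart create -s gpt ada4
# gpart add -t freebsd-zfs -s 4G -l slog0 ada4

The rest of the device stays unpartitioned, which leaves the drive's wear levelling plenty of spare cells to work with, and gpt/slog0 can then be handed to "zpool add ... log".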
From owner-freebsd-fs@FreeBSD.ORG Thu May 12 08:33:37 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5634C106566C for ; Thu, 12 May 2011 08:33:37 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230]) by mx1.freebsd.org (Postfix) with ESMTP id D807B8FC0C for ; Thu, 12 May 2011 08:33:36 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p4C8XR3l058999 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO) for ; Thu, 12 May 2011 11:33:32 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <4DCB9B57.8090507@digsys.bg> Date: Thu, 12 May 2011 11:33:27 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.15) Gecko/20110307 Thunderbird/3.1.9 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <4DCA5620.1030203@dannysplace.net> <4DCB455C.4020805@dannysplace.net> <4DCB81B8.6070301@digsys.bg> In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: quoted-printable Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 08:33:37 -0000 On 12.05.11 11:02, =8Aimun Mikecin wrote: > > You will not find a rotating hard drive that has a write latency anythi= ng near the write latency of even a slowest SSD you can find on the marke= t. > Cheap SSDs do not have acceptable latency, when saturated with writes.=20 Not to speak of throughput. Truth is, SLOG should be on write-optimized SLC SSD. Any tricks, such as = compression, manufacturers make with consumer products do influence=20 Windows benchmarks, but are unlikely to change laws of physics. For most small installs, using rotating magnetic media (cheap!) 
as SLOG, = can have dramatic performance improvement compared to not using any SLOG.= Daniel From owner-freebsd-fs@FreeBSD.ORG Thu May 12 08:34:31 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F11C31065672 for ; Thu, 12 May 2011 08:34:31 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta09.emeryville.ca.mail.comcast.net (qmta09.emeryville.ca.mail.comcast.net [76.96.30.96]) by mx1.freebsd.org (Postfix) with ESMTP id D59A88FC1C for ; Thu, 12 May 2011 08:34:31 +0000 (UTC) Received: from omta10.emeryville.ca.mail.comcast.net ([76.96.30.28]) by qmta09.emeryville.ca.mail.comcast.net with comcast id iYa41g0010cQ2SLA9YaX1b; Thu, 12 May 2011 08:34:31 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta10.emeryville.ca.mail.comcast.net with comcast id iYaW1g0021t3BNj8WYaWHr; Thu, 12 May 2011 08:34:31 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id D7B06102C36; Thu, 12 May 2011 01:34:29 -0700 (PDT) Date: Thu, 12 May 2011 01:34:29 -0700 From: Jeremy Chadwick To: Daniel Kalchev Message-ID: <20110512083429.GA58841@icarus.home.lan> References: <4DCA5620.1030203@dannysplace.net> <4DCB455C.4020805@dannysplace.net> <20110512033626.GA52047@icarus.home.lan> <4DCB7F22.4060008@digsys.bg> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4DCB7F22.4060008@digsys.bg> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 08:34:32 -0000 On Thu, May 12, 2011 at 09:33:06AM +0300, Daniel Kalchev wrote: > On 12.05.11 06:36, Jeremy Chadwick wrote: > >On Wed, May 11, 2011 at 09:51:58PM -0500, Bob Friesenhahn wrote: > >>On Thu, 12 May 2011, Danny Carroll wrote: > >>>Replying to myself in order to summarise the recommendations (when using > >>>v28): > >>>- Don't use SSD for the Log device. Write speed tends to be a problem. > >>DO use SSD for the log device. The log device is only used for > >>synchronous writes. Except for certain usages (E.g. database and > >>NFS server) most writes will be asynchronous and never be written to > >>the log. Huge synchronous writes will also bypass the SSD log > >>device. The log device is for reducing latency on small synchronous > >>writes. > >Bob, please correct me if I'm wrong, but as I understand it a log device > >(ZIL) effectively limits the overall write speed of the pool itself. > > > Perhaps I misstated it in my first post, but there is nothing wrong > with using SSD for the SLOG. > > You can of course create usage/benchmark scenario, where an (cheap) > SSD based SLOG will be worse than an (fast) HDD based SLOG, > especially if you are not concerned about latency. The SLOG resolves > two issues, it increases the pool throughput (primary storage) by > removing small synchronous writes from it, that will unnecessarily > introduce head movement and more IOPS and it provided low latency > for small synchronous writes. I've been reading about this in detail here: http://constantin.glez.de/blog/2010/07/solaris-zfs-synchronous-writes-and-zil-explained I had no idea the primary point of a SLOG was to deal with applications that make use of O_SYNC. 
I thought it was supposed to improve write performance for both asynchronous and synchronous writes. Obviously I'm wrong here. The author's description (at that URL) of an example scenario makes little sense to me; there's a story he tells referring to a bank and a financial transaction of US$699 performed which got cached in RAM and then the system lost power -- and how the intent log on a filesystem would be replayed during reboot. What guarantee is there that the intent log -- which is written to the disk -- actually got written to the disk in the middle of a power failure? There's a lot of focus there on the idea that "the intent log will fix everything, but may lose writes", but what guarantee do I have that the intent log isn't corrupt or botched during a power failure? I guess this is why others have mentioned the importance of BBUs and supercaps, but I don't know what guarantee there is that during a power failure there won't be some degree of filesystem corruption or lost data. There's a lot about ensuring/guaranteeing filesystem integrity I've to learn. > The later is only valid if the SSD is sufficiently write-optimized. > Most consumer SSDs end up saturated by writes. Sequential write IOPS > is what matters here. Oh, I absolutely agree on this point. So basically consumer-level SSDs that don't provide extreme write speed benefits (compared to a classic MHDD) -- not discussing seek times here, we all know SSDs win there -- probably aren't good candidates for SLOGs. What's interesting about the focus on IOPS is that Intel SSDs, in the consumer class, still trump their competitors. But given that your above statement focuses on sequential writes, and the site I provided is quite clear about what happens to sequential writes on Intel SSD that doesn't have TRIM..... Yeah, you get where I'm going with this. :-) > About TRIM. As it was already mentioned, you will use only small > portion of an (for example) 32GB SSD for the SLOG. If you do not > allocate the entire SSD, then wear leveling will be able to play > well and it is very likely you will not suffer any performance > degradation. That sounds ideal, though I'm not sure about the "won't suffer ANY performance degradation" part. I think degradation is just less likely to be witnessed. I should clarify on what "allocate" in the above paragraph means (for readers, not for you Daniel :-) ): it means disk space actually used (LBAs actually written to). Wear levelling works better when there's more available (unused) flash. The more full the disk (filesystem(s)) is, the worse the wear levelling algorithm performs. > By the way, I do not believe Windows benchmark has any significance > in our ZFS usage for the SSDs. How is TRIM implemented in Windows? > How does it relate to SSD usage as SLOG and L2ARC? Yeah, I knew someone would go down this road. Sigh. I strongly believe it does have relevance. The relevance is in the fact that the non-TRIM benchmarks (read: an OS that has TRIM support but the SSD itself does not, therefore TRIM cannot be used) are strong indicators that the performance of the SSD -- sequential reads and writes both -- greatly degrade without TRIM over time. This is also why you'll find people (who cannot use TRIM) regularly advocating an entire format (writing zeros to all LBAs on the disk) after prolonged use without TRIM. I don't know how TRIM is implemented with NTFS in Windows. > How can ever TRIM support influence reading from the drive?! I guess you want more proof, so here you go. 
Again, the authors wrote a bunch of data to the filesystem, took a sequential read benchmark, then induced TRIM and took another sequential read benchmark. The difference is obvious. This is an X25-V, however, which is the "low-end" of the consumer series, so the numbers are much worse -- but this is a drive that runs for around US$100, making it appealing to people: http://www.anandtech.com/show/3756/2010-value-ssd-100-roundup-kingston-and-ocz-take-on-intel/5 I imagine the reason this happens is similar to why memory performance degrades under fragmentation or when there's a lot of "middle-man stuff" going on. "Middle-man stuff" in this case means the FTL inside of the SSD which is used to correlate LBAs with physical NAND flash pages (and the physically separate chips; it's not just one big flash chip you know). NAND flash pages tend to be something like 256KByte or 512KByte in size, so erasing one means no part of it should be in use by the OS or underlying filesystem. How does the SSD know what's used by the OS? It has to literally keep track of all the LBAs written to. I imagine that list is extremely large and takes time to iterate over. TRIM allows the OS to tell the underlying SSD "LBAs x-y aren't in use any more", which probably removes an entry from the FTL flash<->LBA map, and even does things like move data around between flash pages so that it can erase a NAND flash page. It can do the latter given the role of the FTL acting as a "middle-man" as noted above. > TRIM is an slow operation. How often are these issued? Good questions, for which I have no answer. The same could be asked of any OS however, not just Windows. And I've asked the same question about SSDs internal "garbage collection" too. I have no answers, so you and I are both wondering the same question. And yes, I am aware TRIM is a costly operation. There's a description I found of the process that makes a lot of sense, so rather than re-word it I'll just include it here: http://www.enterprisestorageforum.com/technology/article.php/11182_3910451_2/Fixing-SSD-Performance-Degradation-Part-1.htm See the paragraph starting with "Another long-awaited technique". > What is the impact of issuing TRIM to an otherwise loaded SSD? I'm not sure if "loaded" means "heavy I/O load" or "heavily used" (space-wise). If you meant "heavy I/O load": as I understand it -- following forums, user experiences, etc. -- a heavily-used drive which hasn't had TRIM issued tends to perform worse as time goes on. Most people with OSes that don't have TRIM (OS X, Windows XP, etc.) tend to resort to a full format of the SSD (every LBA written zero, e.g. the -E flag to newfs) every so often. The interval TRIM should be performed is almost certainly up for discussion, but I can't provide any advice because no OS I run or use seems to implement it (aside from FreeBSD UFS, and that seems to issue TRIM on BIO_DELETE via GEOM). (Inline EDIT: Holy crap, I just realised TRIM support has to be enabled via tunefs on UFS filesystems. I started digging through the code and I found the FS_TRIM bit; gee, maybe I should use tunefs -t. I wish I had known this; I thought it just did this automatically if the underlying storage device provided TRIM support. Sigh) Here's some data which probably won't mean much to you since it's from a Windows machine, but the important part is that it's from a Windows XP SP3 machine -- XP has no TRIM support. 
Disk: Intel 320-series SSD; model SSDSA2CW080G3; 80GB, MLC SB: Intel ICH9, in "Enhanced" mode (non-AHCI, non-RAID) OS: Windows XP SP3 FS: NTFS, 4KB cluster size, NTFS atime turned off, NTFS partition properly 4KB-aligned Space: Approximately 6GB of 80GB used. This disk is very new (only 436 power-on hours). Here are details of the disk: http://jdc.parodius.com/freebsd/i320ssd/ssdsa2cw080g3_01.png And a screen shot of a sequential read benchmark which should speak for itself. Block read size is 64KBytes. This is a raw device read and not a filesystem-level read, meaning NTFS isn't in the picture here. What's interesting is the degradation in performance around the 16GB region: http://jdc.parodius.com/freebsd/i320ssd/ssdsa2cw080g3_02.png Next, a screen shot of a filesystem-based benchmark. This is writing and reading a 256MByte file (to the NTFS filesystem) using different block sizes. Horizontal access is block size, vertical axis is speed. Reads are the blue bar, writes are the orange: http://jdc.parodius.com/freebsd/i320ssd/ssdsa2cw080g3_03.png And finally, the same device-level sequential read benchmark performed again to show what effect the write benchmarks may have had on the disk: http://jdc.parodius.com/freebsd/i320ssd/ssdsa2cw080g3_04.png Sadly I can't test sequential writes because it's an OS disk. So, my findings more or less mimic that of what other people are seeing as well. Given that the read benchmarks are device-level and not filesystem-level, one shouldn't be pondering Windows -- one should be pondering the implications of lack of TRIM and what's going on within the drive itself. I also have an Intel 320-series SSD in my home FreeBSD box as an OS disk (UFS2 / UFS2+SU filesystems). The amount of space used there is lower (~4GB). Do you know of some benchmarking utilities which do device-level reads and can plot or provide metrics for LBA offsets or equivalent? I could compare that to the Windows benchmarks, but still, I think we're barking up the wrong tree. I'm really not comparing ZFS to NTFS here; I'm saying that TRIM addresses performance problems (to some degree) regardless of filesystem type. Anyway, I think that's enough from me for now. I've written this over the course of almost 2 hours. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. 
PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu May 12 08:42:49 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 324A8106564A for ; Thu, 12 May 2011 08:42:49 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com [74.125.82.182]) by mx1.freebsd.org (Postfix) with ESMTP id AC9788FC1B for ; Thu, 12 May 2011 08:42:47 +0000 (UTC) Received: by wyf23 with SMTP id 23so1318541wyf.13 for ; Thu, 12 May 2011 01:42:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=5l4bzEdxUZ/R8DtBDQb0sLziqCdB/UrDGDhJdcjONQ4=; b=jdlFa59LRxFvnKHrhqlB7C3ttX6u9ULrPKVNYz3Yqm+KSrAzN2NSqyNJ4AodndsC0e qD1kc3bIBTxZ1ufifWMwAayEJYiodRJq1NoYBdGULbKgo0A6jcAj5eBtv93JhmQkwWl5 HXYym3vZ4SVYr/jcGydUD5mzNtIoV6zJ9cZ/E= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=TANSlEFNsR6pPYmSfIi/dV/NStks5KlNMaOxE32iG7yQEp+HtniZQjf/6e2w+Byj2q SClNtxWGaiU/G9aDtdyJywTmsSuLXCODRvuFd1JaC2secG/Zpz5TS3+pulQRINbHrTPM LZMJ2N7FitD8/oLd1xpXbigv4huTFTLUWPE9Y= MIME-Version: 1.0 Received: by 10.216.63.130 with SMTP id a2mr1398863wed.61.1305189765647; Thu, 12 May 2011 01:42:45 -0700 (PDT) Received: by 10.216.93.70 with HTTP; Thu, 12 May 2011 01:42:45 -0700 (PDT) In-Reply-To: <20110511223849.GA65193@DataIX.net> References: <4DCA5620.1030203@dannysplace.net> <20110511100655.GA35129@icarus.home.lan> <4DCA66CF.7070608@digsys.bg> <20110511105117.GA36571@icarus.home.lan> <4DCA7056.20200@digsys.bg> <20110511120830.GA37515@icarus.home.lan> <20110511223849.GA65193@DataIX.net> Date: Thu, 12 May 2011 09:42:45 +0100 Message-ID: From: krad To: Jason Hellenthal Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 08:42:49 -0000 On 11 May 2011 23:38, Jason Hellenthal wrote: > > Jeremy, > > On Wed, May 11, 2011 at 05:08:30AM -0700, Jeremy Chadwick wrote: > > On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote: > > > On 11.05.11 13:51, Jeremy Chadwick wrote: > > > >Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folk= s > > > >should also keep that in mind when putting an SSD into use in this > > > >fashion. > > > > > > By the way, what would be the use of TRIM for SLOG and L2ARC devices? > > > I see absolutely no benefit from TRIM for the L2ARC, because it is > > > written slowly (on purpose). Any current, or 1-2 generations back SS= D > > > would handle that write load without TRIM and without any performance > > > degradation. > > > > > > Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC > > > SSD for the SLOG, for many reasons. The write regions on the SLC > > > NAND should be smaller (my wild guess, current practice may differ) > > > and the need for rewriting will be small. If you don't need to > > > rewrite already written data, TRIM does not help. 
Also, as far as I > > > understand, most "serious" SSDs (typical for SLC I guess) would have > > > twice or more the advertised size and always write to fresh cells, > > > scheduling a background erase of the 'overwritten' cell. > > > > AFAIK, drive manufacturers do not disclose just how much reallocation > > space they keep available on an SSD. I'd rather not speculate as to how > > much, as I'm certain it varies per vendor. > > > > Let's not forget here: The size of the separate log device may be quite > small. A rule of thumb is that you should size the separate log to be able > to handle 10 seconds of your expected synchronous write workload. It would > be rare to need more than 100 MB in a separate log device, but the > separate log must be at least 64 MB. > > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide > > So in other words, how much is TRIM really even effective given the above? > > Even with a high database write load on the disks at full capacity of the > incoming link I would find it hard to believe that anyone could get the > ZIL to even come close to 512MB. > > > Given most SSD's come at a size greater than 32GB I hope this comes as an > early reminder that the ZIL you are buying that disk for is only going to > be using a small percent of that disk and I hope you justify cost over its > actual use. If you do happen to justify creating a ZIL for your pool then > I hope that you partition it wisely to make use of the rest of the space > that is untouched. > > For all other cases I would recommend if you still want to have a ZIL that > you take some sort of PCI->SD CARD or USB stick into account with > mirroring. > > -- > > Regards, (jhell) > Jason Hellenthal > > > You have just spotted a gap in the market I suspect. Maybe SSD manufacturers need to produce a SATA-based SSD of 1 or 2 GB using the fastest-writing flash available on the market.
Produce it for < =A350 and you should have a big market From owner-freebsd-fs@FreeBSD.ORG Thu May 12 08:42:52 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 45BBE106566B for ; Thu, 12 May 2011 08:42:52 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id C998F8FC1C for ; Thu, 12 May 2011 08:42:51 +0000 (UTC) Received: from outgoing.leidinger.net (p5B155ED8.dip.t-dialin.net [91.21.94.216]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id E4805844015; Thu, 12 May 2011 10:42:34 +0200 (CEST) Received: from webmail.leidinger.net (webmail.Leidinger.net [IPv6:fd73:10c7:2053:1::2:102]) by outgoing.leidinger.net (Postfix) with ESMTP id AD4211289; Thu, 12 May 2011 10:42:31 +0200 (CEST) Received: (from www@localhost) by webmail.leidinger.net (8.14.4/8.13.8/Submit) id p4C8gUV1019205; Thu, 12 May 2011 10:42:30 +0200 (CEST) (envelope-from Alexander@Leidinger.net) Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by webmail.leidinger.net (Horde Framework) with HTTP; Thu, 12 May 2011 10:42:30 +0200 Message-ID: <20110512104230.588214snqsg1gkn4@webmail.leidinger.net> Date: Thu, 12 May 2011 10:42:30 +0200 From: Alexander Leidinger To: fbsd@dannysplace.net References: <4DCA5620.1030203@dannysplace.net> <4DCB455C.4020805@dannysplace.net> In-Reply-To: <4DCB455C.4020805@dannysplace.net> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: 7bit User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.6) X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: E4805844015.AF78F X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=0.6, required 6, autolearn=disabled, J_CHICKENPOX_56 0.60) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1305794557.33711@BBEaED1K/shx8xOIeQ2VMw X-EBL-Spam-Status: No Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 08:42:52 -0000 Quoting Danny Carroll (from Thu, 12 May 2011 12:26:36 +1000): > Replying to myself in order to summarise the recommendations (when using > v28): > - Don't use SSD for the Log device. Write speed tends to be a problem. It depends. You could buy a lot of large and low-power (and sort of slow) disks for the raw storage space, and 2 fast but small disks for the log. Even if they are not SSDs, this could improve the write throughput of the system (also depends upon the bus bandwith of the system, the disk-controller (SATA/SCSI) and your workload). The important part is that normally the log devices should have a lower latency and faster transfer speed than the main pool to be effective (you may get an improvement even if the devices have the same specs, as the main pool does not see the same workload then, but it depends upon your workload). > - SSD ok for cache if the sizing is right, but without TRIM, don't > expect to take full advantage of the SSD. 
As long as we do not have TRIM to play with, we cannot give a proper answer here. For sure we can tell that an SSD increases the max performance a pool can deliver by a good amount. I would expect that TRIM can give some improvement for a cache device, but I do not expect that it is much. If it is more than 10% I would be very surprised. I expect the improvement more in the <=5% range for the cache (which may make a difference in read cases where you are hitting the limit).

> - Do use two devices for log and mirror them with ZFS. Bad things > *can* happen if *all* the log devices die.

s/can/will/, as you will lose data in this case. The difference between v15 and v28 is the amount of data you lose (the entire pool vs. only what is still on the log devices).

> - Don't colocate L2ARC and Log devices.

This depends upon the devices and your workload. If you do not have a lot of throughput, but your application has some hard requirements regarding latency, it may make sense. Without measuring the outcome on your own workload, there is not really a way to answer this, but if your workload is both read and write limited, I would add a separate log device first. This way the pool is freed of the sync-writes; the read performance should increase, and the write performance too, as the data goes to the log device first without interfering with reads (in this case it matters more that the device is a separate device than that the device is significantly faster). Only when this is done and there is more demand regarding reads would I add a significantly faster cache device (or more RAM, depending on the specs of the machine and the workload).

Another note on disk-tuning: if you are doing something like this on your workstation / development system where you just want the comfort of not waiting for the disks (and the workload does not demand a lot of reads+writes at the same time), you could give a shared cache/log device a try (but the device needs to be significantly faster to feel a difference).

> Remaining questions. > - Is there any advantage to using a spare partition on a SCSI or SATA > drive as L2Arc? Assuming it was in the machine already but doing nothing?

It depends. In general: if the cache is faster (by an amount which matters to you) than the pool, it helps. So for your particular case: if the other partitions on such a drive are only used occasionally and the drive is faster than your pool, you could get a little bit more out of it by adding the unused partition as a cache. As for all RAIDs, more spindles (disks) give better performance, so it could also help if the cache merely has the same characteristics as the rest of the pool, but this depends upon your workload. In such a case it is probably better to add the disk to the pool instead of using it as a cache. A definitive answer can only be obtained by running your workload on both setups and comparing the results (zpool iostat / gstat).

Bye, Alexander.

-- There are three kinds of people: men, women, and unix.
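For concreteness, the "spare partition as L2ARC" experiment discussed above boils down to something like the following sketch. The pool name tank and the GPT label spare-l2arc are invented placeholders; a cache device can also be removed again if the experiment does not pay off:

# zpool add tank cache gpt/spare-l2arc    (put the otherwise idle partition to work as L2ARC)
# zpool iostat -v tank 5                  (compare per-vdev throughput with and without it)
# zpool remove tank gpt/spare-l2arc       (back out of the experiment at any time)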
http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-fs@FreeBSD.ORG Thu May 12 08:45:05 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8353C1065687 for ; Thu, 12 May 2011 08:45:05 +0000 (UTC) (envelope-from numisemis@gmail.com) Received: from mail-ww0-f50.google.com (mail-ww0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id 061F38FC20 for ; Thu, 12 May 2011 08:45:04 +0000 (UTC) Received: by wwc33 with SMTP id 33so1449722wwc.31 for ; Thu, 12 May 2011 01:45:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:subject:mime-version:content-type:from :in-reply-to:date:cc:message-id:references:to:x-mailer; bh=plLgz8RQpu1MLLWCuOQcV4ARzhPOFTu9RwDrXTs5wgw=; b=FrakORjqE3gRQfauCysMMMTNxZHu4vR9CTnKbXQfPiJtBogU8YAfkuN/+yCa3x60Sk jRdNsKlkMg2JW+Ic5cx97ZnUrKFsXBduYhxYd56iCt+WymLPpj/9jkfk3X0pQo/9SxWp mg/GZNpMt0lj/Av2WZewBbfLzaF/EMgxby428= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=subject:mime-version:content-type:from:in-reply-to:date:cc :message-id:references:to:x-mailer; b=AUmwpc5ZjQz/Enf680Jl7SiWUBrLkU5W9K2rYvOSOxCdrVr2kVf8OxLozXOJ56Rq2v BbhNofEA/ziFWboG5wHhtBqVPojor7JwMQKzeFfXbdzB7zMdo+nWMFaLDYp4J3rZnlxW LENcru6LLwRjrSeoSxTRw6zL5jmuuIRJymu7s= Received: by 10.216.69.140 with SMTP id n12mr7610262wed.32.1305189903478; Thu, 12 May 2011 01:45:03 -0700 (PDT) Received: from sime-imac.logos.hr ([213.147.110.159]) by mx.google.com with ESMTPS id f52sm487017wes.35.2011.05.12.01.45.02 (version=TLSv1/SSLv3 cipher=OTHER); Thu, 12 May 2011 01:45:03 -0700 (PDT) Mime-Version: 1.0 (Apple Message framework v1084) From: =?iso-8859-2?Q?=A9imun_Mikecin?= In-Reply-To: <20110512083429.GA58841@icarus.home.lan> Date: Thu, 12 May 2011 10:45:01 +0200 Message-Id: References: <4DCA5620.1030203@dannysplace.net> <4DCB455C.4020805@dannysplace.net> <20110512033626.GA52047@icarus.home.lan> <4DCB7F22.4060008@digsys.bg> <20110512083429.GA58841@icarus.home.lan> To: Jeremy Chadwick X-Mailer: Apple Mail (2.1084) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 08:45:05 -0000 On 12. svi. 2011., at 10:34, Jeremy Chadwick wrote: >=20 > I had no idea the primary point of a SLOG was to deal with = applications > that make use of O_SYNC. I thought it was supposed to improve write > performance for both asynchronous and synchronous writes. Obviously = I'm > wrong here. If the application is not using O_SYNC, write operation returns to the = app before the data is actually written. > What guarantee is there that the intent log -- which is written to the > disk -- actually got written to the disk in the middle of a power > failure? There's a lot of focus there on the idea that "the intent = log > will fix everything, but may lose writes", but what guarantee do I = have > that the intent log isn't corrupt or botched during a power failure? I expect that checksumming also works for ZIL (anybody knows?). 
If that = is the case, corruption would be detected, but you will have lost data = unless you are using mirrored slog devices. From owner-freebsd-fs@FreeBSD.ORG Thu May 12 08:58:05 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B088B106567A for ; Thu, 12 May 2011 08:58:05 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: from mail-ww0-f50.google.com (mail-ww0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id 3EC328FC1E for ; Thu, 12 May 2011 08:58:05 +0000 (UTC) Received: by wwc33 with SMTP id 33so1460113wwc.31 for ; Thu, 12 May 2011 01:58:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=zismQCfy0mUCIfSmEvlGeSMHB/NeG1a9TKl3IfhZrz8=; b=WQcE56jw8sq7oJ7uk3Y0WfboQ2b67Y7pq6vZv88Wq02MfKzWcUgHG5wJUvl3pO8sUB F0y24dqoAFuisfPbk5GvadprsSpMWXU9mmVjTv9SM22Yh6YeYARDeIvrAizjJPrNlf4z EIEOoPsQ2HNoUyhypOrppr57B99J3f63qjzVA= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=TL8ZzzQv0RBQoh5nrQ96Yx9PwiFcPOjrope0+jb+3JjMKanwprw2M5t6QwcRAAeIQc F+KN0vaHC4p7ujg+0QWJnL5x2YP2qF5O/n7Q2O8feAvq8CU9OTZwLeXf85YzLXqvDJ60 6+Ejq3KFDHJ5oPCIL74zhNUbDJx1UlQvp8g7o= MIME-Version: 1.0 Received: by 10.216.14.212 with SMTP id d62mr1464049wed.91.1305190684314; Thu, 12 May 2011 01:58:04 -0700 (PDT) Received: by 10.216.93.70 with HTTP; Thu, 12 May 2011 01:58:04 -0700 (PDT) In-Reply-To: References: <4DCA5620.1030203@dannysplace.net> <4DCB455C.4020805@dannysplace.net> <4DCB81B8.6070301@digsys.bg> Date: Thu, 12 May 2011 09:58:04 +0100 Message-ID: From: krad To: =?UTF-8?Q?=C5=A0imun_Mikecin?= Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-fs Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 08:58:05 -0000 2011/5/12 =C5=A0imun Mikecin > > On 12. svi. 2011., at 08:44, Daniel Kalchev wrote: > > > On 12.05.11 05:26, Danny Carroll wrote: > >> > >> - Don't use SSD for the Log device. Write speed tends to be a proble= m. > > It all depends on your usage. You need to experiment, unfortunately. > > What is the alternative for log devices if you are not using SSD? > Rotating hard drives? > > AFAIK, two factors define the speed of log device: write transfer rate an= d > write latency. > You will not find a rotating hard drive that has a write latency anything > near the write latency of even a slowest SSD you can find on the market. > On the other hand, only a very few rotating hard drives have a write > transfer rate that can be compared to SSD's. > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > ive seen an interesting article using 2x ram disks on two separate computer= s (UPS backed) shared out via iscsi over a high speed network. It had very successful results. 
Sounds a bit brave in a production environment to me though From owner-freebsd-fs@FreeBSD.ORG Thu May 12 08:59:49 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 321D4106564A for ; Thu, 12 May 2011 08:59:49 +0000 (UTC) (envelope-from fullermd@over-yonder.net) Received: from thyme.infocus-llc.com (server.infocus-llc.com [206.156.254.44]) by mx1.freebsd.org (Postfix) with ESMTP id CDB0B8FC0A for ; Thu, 12 May 2011 08:59:48 +0000 (UTC) Received: from draco.over-yonder.net (c-75-64-226-141.hsd1.ms.comcast.net [75.64.226.141]) (using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by thyme.infocus-llc.com (Postfix) with ESMTPSA id 8CFF837B454; Thu, 12 May 2011 03:40:59 -0500 (CDT) Received: by draco.over-yonder.net (Postfix, from userid 100) id 0456661C42; Thu, 12 May 2011 03:40:59 -0500 (CDT) Date: Thu, 12 May 2011 03:40:59 -0500 From: "Matthew D. Fuller" To: Jeremy Chadwick Message-ID: <20110512084058.GP90856@over-yonder.net> References: <4DCA5620.1030203@dannysplace.net> <20110511100655.GA35129@icarus.home.lan> <4DCA66CF.7070608@digsys.bg> <20110511105117.GA36571@icarus.home.lan> <4DCA7056.20200@digsys.bg> <20110511120830.GA37515@icarus.home.lan> <20110511223849.GA65193@DataIX.net> <20110512010433.GA48863@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110512010433.GA48863@icarus.home.lan> X-Editor: vi X-OS: FreeBSD User-Agent: Mutt/1.5.21-fullermd.4 (2010-09-15) X-Virus-Scanned: clamav-milter 0.97 at thyme.infocus-llc.com X-Virus-Status: Clean Cc: freebsd-fs@freebsd.org, Jason Hellenthal Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 08:59:49 -0000 On Wed, May 11, 2011 at 06:04:33PM -0700 I heard the voice of Jeremy Chadwick, and lo! it spake thus: > > (What confuses me about the "idle GC" method is how it determines > what it can erase -- if the OS didn't tell it what it's using, how > does it know it can erase the page?) I'm no expert either, but the following is my understanding... Remember that SSD's (like ZFS, a layer higher up) don't overwrite blocks, they write new data to a new block and update the pointers the level above them (the disk LBA in this case) to point at the new location. So when you overwrite LBA 12345 on the disk with new data, what actually happens is that the SDD writes that data to currently empy flash $SOMEWHERE, and updates its internal table so that LBA 12345 request go there. The bit of flash that was previously considered LBA 12345 still contains the old data, but is now "free" as far as the drive is concerned (though not immediately writable, as it needs to be erased first). Sorta like rm'ing a file doesn't actually delete its contents, just the name pointing to it. Where GC comes in is that the size you can write/address is smaller than the size flash has to be erased in. To pick numbers that are in the right ballpark (it will vary per drive), you have 512 byte blocks that you can read/write (like any other drive), but you can only erase a page of 8k at a time. So let's suppose you write 16 kB of data to a fresh drive. You've written 32 512-byte blocks, which completely fill up 2 8k pages. All nice and compact. 
Now let's suppose you overwrite from 4k-8k and 12k-16k. Now we have 8k of remaining useful data, but it's spread out over 2 8k pages (4k in each). We can't write new stuff to those two now-"empty" 4k sections, because we have to erase before we can write, and we can only erase the whole 8k page. This is where the GC kicks in; it knows (because those two LBA ranges have been overwritten) that they're no longer needed, and can notice that all the remaining important data in those two pages can actually fit in a single page. So, it can read 0k-4k and 8k-12k, and write them into a new empty page. Update its LBA map to point those logical addresses over to the new in-flash location, and now the entirety of those two original 8k pages is unused. So now it can go ahead and erase them both, and put them on the "ready for reuse" list.

Now, as for TRIM. There are two ways that a block (or set of blocks) can become "no longer needed". One is that they're overwritten with new data; the drive knows that and can mark them as unused like above. The other is that they contain data for a file that's deleted. But the drive has no idea what deleting a file means. All that happens from the drive's perspective is an overwrite of some LBA's that, to the OS, contain directory info. It has no way of knowing that impacts these other LBA's that held a file. TRIM allows the OS to say "OK, these LBA's? Yeah, you can trash 'em now." And so they end up on the dead list, ready for the GC to collapse them away like above.

So neither TRIM nor GC is a replacement for the other. GC is about collapsing away reapable space (and also serves a purpose in wear-levelling, but that's unimportant in this discussion). The drive automatically knows about space that's reapable because it was rewritten. TRIM lets it know about space that's reapable because of deletion. Without that, you could delete a file (so LBA 54321 no longer contains useful info, and doesn't need to be preserved), but since the drive doesn't know that, not only can the GC not compact away that space, it has to go ahead and re-copy that block as if it held good data when it shuffles stuff around, so you're creating extra wear. GC can't make TRIM "unnecessary", any more than a book can make a flashlight unnecessary. TRIM is one of the ways you provide info for the GC to use.

One thing that CAN make TRIM less important is writing in a "compact" manner (e.g., always write new data to the lowest available LBA). Assuming you oscillate around a steady disk usage (or slowly increase), that means you'll tend to overwrite space for deleted files relatively soon, so the drive gets to know about the reapable space that way. With more random or other LBA allocation, or if you shrink the used space significantly, a deleted block may hang around unwritten-to for much longer, and so have more chance for the GC to unnecessarily recopy and recopy it.

This leaves entirely to one side annoying implementation issues. I'm given to understand that due to some combination of "dumb firmware implementation" and "dumb standardized requirements", TRIM can be an unbelievably expensive command, so doing it as part of e.g. 'rm' may damage performance outrageously. That may point to a better implementation being "rack up a list of LBA's and flush periodically", or "scan the filesystem weekly and send TRIM's for all empty LBA's" or the like. But again, that's implementation.
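For those who want to poke at the OS side of this on FreeBSD: ZFS had no TRIM support at the time of this thread, but the UFS TRIM code mentioned elsewhere in the thread can be toggled per filesystem on versions that ship it (9-CURRENT at the time). A rough sketch, assuming an SSD partition at /dev/ada1p1:

# newfs -t -U /dev/ada1p1         (create a new UFS filesystem with TRIM enabled)
# tunefs -t enable /dev/ada1p1    (or enable it later on an existing, unmounted filesystem)
# tunefs -p /dev/ada1p1           (print the current flags to verify)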
-- Matthew Fuller (MF4839) | fullermd@over-yonder.net Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ On the Internet, nobody can hear you scream. From owner-freebsd-fs@FreeBSD.ORG Thu May 12 09:05:51 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 50CF7106566C for ; Thu, 12 May 2011 09:05:51 +0000 (UTC) (envelope-from jdc@koitsu.dyndns.org) Received: from qmta05.emeryville.ca.mail.comcast.net (qmta05.emeryville.ca.mail.comcast.net [76.96.30.48]) by mx1.freebsd.org (Postfix) with ESMTP id 34FF38FC0A for ; Thu, 12 May 2011 09:05:50 +0000 (UTC) Received: from omta01.emeryville.ca.mail.comcast.net ([76.96.30.11]) by qmta05.emeryville.ca.mail.comcast.net with comcast id iZ5q1g0040EPchoA5Z5qF5; Thu, 12 May 2011 09:05:50 +0000 Received: from koitsu.dyndns.org ([67.180.84.87]) by omta01.emeryville.ca.mail.comcast.net with comcast id iZ5Q1g00W1t3BNj8MZ5WCz; Thu, 12 May 2011 09:05:45 +0000 Received: by icarus.home.lan (Postfix, from userid 1000) id 74C9E102C19; Thu, 12 May 2011 02:05:24 -0700 (PDT) Date: Thu, 12 May 2011 02:05:24 -0700 From: Jeremy Chadwick To: ?imun Mikecin Message-ID: <20110512090524.GA2106@icarus.home.lan> References: <4DCA5620.1030203@dannysplace.net> <4DCB455C.4020805@dannysplace.net> <20110512033626.GA52047@icarus.home.lan> <4DCB7F22.4060008@digsys.bg> <20110512083429.GA58841@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 09:05:51 -0000 On Thu, May 12, 2011 at 10:45:01AM +0200, ?imun Mikecin wrote: > On 12. svi. 2011., at 10:34, Jeremy Chadwick wrote: > > > > I had no idea the primary point of a SLOG was to deal with applications > > that make use of O_SYNC. I thought it was supposed to improve write > > performance for both asynchronous and synchronous writes. Obviously I'm > > wrong here. > > If the application is not using O_SYNC, write operation returns to the > app before the data is actually written. Yes, I understand that -- O_SYNC is effectively the same as issuing fsync(2) after every write(2) call. I just thought that the ZIL improved both synchronous and asynchronous writes, but my understanding of the ZIL is obviously very limited. > > What guarantee is there that the intent log -- which is written to the > > disk -- actually got written to the disk in the middle of a power > > failure? There's a lot of focus there on the idea that "the intent log > > will fix everything, but may lose writes", but what guarantee do I have > > that the intent log isn't corrupt or botched during a power failure? > > I expect that checksumming also works for ZIL (anybody knows?). If > that is the case, corruption would be detected, but you will have lost > data unless you are using mirrored slog devices. I can't believe that statement either (the last line). I guess that's also what I'm asking here -- what guarantee do you have that even with a mirrored 2-disk SLOG (or heck, 3 or 4!) that *no data* will be *lost* during a power outage? It seems to me the proper phrase would be "the likelihood of losing an entire pool during a power outage is lessened". 
Alexander indirectly hinted at this in another post of his tonight, specifically regarding zpool v15 versus v28: "The difference between v15 and v28 is the amount of data you lose (the entire pool vs. only what is still on the log devices)". This makes much more sense to me. It seems that in a power outage, there will always be some form of data loss. I imagine even systems that have hardware RAM/cache with BBUs on everything; there's always some form of caching going on *somewhere* within a system, from CPU all the way up, that guarantees some degree of data loss). I guess I'm OCD'ing over the terminology here. Sorry. -- | Jeremy Chadwick jdc@parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP 4BD6C0CB | From owner-freebsd-fs@FreeBSD.ORG Thu May 12 09:12:20 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AE855106566B for ; Thu, 12 May 2011 09:12:20 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id 33A4C8FC14 for ; Thu, 12 May 2011 09:12:20 +0000 (UTC) Received: from outgoing.leidinger.net (p5B155ED8.dip.t-dialin.net [91.21.94.216]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id 11BA3844015; Thu, 12 May 2011 11:12:03 +0200 (CEST) Received: from webmail.leidinger.net (webmail.Leidinger.net [IPv6:fd73:10c7:2053:1::2:102]) by outgoing.leidinger.net (Postfix) with ESMTP id 150D4128A; Thu, 12 May 2011 11:12:00 +0200 (CEST) Received: (from www@localhost) by webmail.leidinger.net (8.14.4/8.13.8/Submit) id p4C9BwCY026199; Thu, 12 May 2011 11:11:58 +0200 (CEST) (envelope-from Alexander@Leidinger.net) Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by webmail.leidinger.net (Horde Framework) with HTTP; Thu, 12 May 2011 11:11:58 +0200 Message-ID: <20110512111158.16451mu57sv0f8f4@webmail.leidinger.net> Date: Thu, 12 May 2011 11:11:58 +0200 From: Alexander Leidinger To: Jeremy Chadwick References: <4DCA5620.1030203@dannysplace.net> <4DCB455C.4020805@dannysplace.net> <20110512033626.GA52047@icarus.home.lan> <4DCB7F22.4060008@digsys.bg> <20110512083429.GA58841@icarus.home.lan> In-Reply-To: <20110512083429.GA58841@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: 7bit User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.6) X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: 11BA3844015.A258E X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=0.077, required 6, autolearn=disabled, TW_ZF 0.08) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1305796326.56093@9cEHblS5BSxK1HZIana+XA X-EBL-Spam-Status: No Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: How to enable cache and logs. 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 09:12:20 -0000 Quoting Jeremy Chadwick (from Thu, 12 May 2011 01:34:29 -0700): > On Thu, May 12, 2011 at 09:33:06AM +0300, Daniel Kalchev wrote: >> On 12.05.11 06:36, Jeremy Chadwick wrote: >> >On Wed, May 11, 2011 at 09:51:58PM -0500, Bob Friesenhahn wrote: >> >>On Thu, 12 May 2011, Danny Carroll wrote: >> >>>Replying to myself in order to summarise the recommendations (when using >> >>>v28): >> >>>- Don't use SSD for the Log device. Write speed tends to be a problem. >> >>DO use SSD for the log device. The log device is only used for >> >>synchronous writes. Except for certain usages (E.g. database and >> >>NFS server) most writes will be asynchronous and never be written to >> >>the log. Huge synchronous writes will also bypass the SSD log >> >>device. The log device is for reducing latency on small synchronous >> >>writes. >> >Bob, please correct me if I'm wrong, but as I understand it a log device >> >(ZIL) effectively limits the overall write speed of the pool itself. >> > >> Perhaps I misstated it in my first post, but there is nothing wrong >> with using SSD for the SLOG. >> >> You can of course create usage/benchmark scenario, where an (cheap) >> SSD based SLOG will be worse than an (fast) HDD based SLOG, >> especially if you are not concerned about latency. The SLOG resolves >> two issues, it increases the pool throughput (primary storage) by >> removing small synchronous writes from it, that will unnecessarily >> introduce head movement and more IOPS and it provided low latency >> for small synchronous writes. > > I've been reading about this in detail here: > > http://constantin.glez.de/blog/2010/07/solaris-zfs-synchronous-writes-and-zil-explained > > I had no idea the primary point of a SLOG was to deal with applications > that make use of O_SYNC. I thought it was supposed to improve write > performance for both asynchronous and synchronous writes. Obviously I'm > wrong here. > > The author's description (at that URL) of an example scenario makes > little sense to me; there's a story he tells referring to a bank and a > financial transaction of US$699 performed which got cached in RAM and > then the system lost power -- and how the intent log on a filesystem > would be replayed during reboot. > > What guarantee is there that the intent log -- which is written to the > disk -- actually got written to the disk in the middle of a power > failure? There's a lot of focus there on the idea that "the intent log > will fix everything, but may lose writes", but what guarantee do I have > that the intent log isn't corrupt or botched during a power failure? The request comes in, the data is written to stable storage (as it is a sync-write to the SLOG), the application knows that the data has hit stable storage when the write-call returns (as it is a sync-write), the app ACKs to the other party. Without the SLOG you can have the same, just at a lower speed (if done correctly). So the SLOG is not about the guarantee (you should have it already with a normal pool and disks which tell the truth regarding a cache flush), the SLOG is about a higher amount of transactions with such a guarantee. >> The later is only valid if the SSD is sufficiently write-optimized. >> Most consumer SSDs end up saturated by writes. Sequential write IOPS >> is what matters here. 
> > Oh, I absolutely agree on this point. So basically consumer-level SSDs > that don't provide extreme write speed benefits (compared to a classic > MHDD) -- not discussing seek times here, we all know SSDs win there -- > probably aren't good candidates for SLOGs. > > What's interesting about the focus on IOPS is that Intel SSDs, in the > consumer class, still trump their competitors. But given that your > above statement focuses on sequential writes, and the site I provided is > quite clear about what happens to sequential writes on Intel SSD that > doesn't have TRIM..... Yeah, you get where I'm going with this. :-) TRIM for SLOG is IMO more important than TRIM for the cache. For the SLOG the write-latency matters, for the cache normally it does not _that much_. Remember, if you are in the case that something is moved from RAM to L2ARC, the data you move is not needed ATM. Data is moved from RAM to L2ARC because the OS decides that either the ARC is at some kind of high-watermark (predicting the future and make sure there is some free space for future data, respectively some kind of garbage collection), or because the OS really needs some free RAM _now_ (either some free area in the ARC, or because an application needs memory). In the first case the write latency does not matter much, in the second case it matters (but in this case you can evaluate if adding more RAM is an option here). >> About TRIM. As it was already mentioned, you will use only small >> portion of an (for example) 32GB SSD for the SLOG. If you do not >> allocate the entire SSD, then wear leveling will be able to play >> well and it is very likely you will not suffer any performance >> degradation. > > That sounds ideal, though I'm not sure about the "won't suffer ANY > performance degradation" part. I think degradation is just less likely > to be witnessed. IMO TRIM support for ZFS can improve the performance. IMO the most bang for the bucks would be to add TRIM support first (if it can not be added to everything at the same time) for SLOGs, then for the pool, and then for the cache. My rationale here is, that if you use a SLOG you have very high requirements for sync-writes, and consumer SSDs could give you a lot of ROI if the SSD is not used completely and TRIM is used. I do not expect that TRIM for the cache gives a lot of ROI (less than TRIM support for the pool). FYI: it also depends upon how TRIM is implemented. If you TRIM one LBA after another, this adds a huge amount of latency just for the TRIM. I do not know if TRIMming a range of LBAs is a lot cheaper, but I would expect it is. TRIMming in FreeBSD (in UFS) is AFAIK one LBA after another. Bye, Alexander. -- "Sonny, what is it?" "They shot the old man. Don't worry, he's not dead." 
-- Sandra and Santino Corleone, "Chapter 2", page 83 http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-fs@FreeBSD.ORG Thu May 12 09:50:30 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2C72E1065670 for ; Thu, 12 May 2011 09:50:30 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id CE0168FC22 for ; Thu, 12 May 2011 09:50:29 +0000 (UTC) Received: from outgoing.leidinger.net (p5B155ED8.dip.t-dialin.net [91.21.94.216]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id 31118844015; Thu, 12 May 2011 11:50:15 +0200 (CEST) Received: from webmail.leidinger.net (webmail.Leidinger.net [IPv6:fd73:10c7:2053:1::2:102]) by outgoing.leidinger.net (Postfix) with ESMTP id 44FEF128B; Thu, 12 May 2011 11:50:12 +0200 (CEST) Received: (from www@localhost) by webmail.leidinger.net (8.14.4/8.13.8/Submit) id p4C9oBqD035205; Thu, 12 May 2011 11:50:11 +0200 (CEST) (envelope-from Alexander@Leidinger.net) Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by webmail.leidinger.net (Horde Framework) with HTTP; Thu, 12 May 2011 11:50:11 +0200 Message-ID: <20110512115011.17724x18akn60oao@webmail.leidinger.net> Date: Thu, 12 May 2011 11:50:11 +0200 From: Alexander Leidinger To: Jeremy Chadwick References: <4DCA5620.1030203@dannysplace.net> <4DCB455C.4020805@dannysplace.net> <20110512033626.GA52047@icarus.home.lan> <4DCB7F22.4060008@digsys.bg> <20110512083429.GA58841@icarus.home.lan> <20110512090524.GA2106@icarus.home.lan> In-Reply-To: <20110512090524.GA2106@icarus.home.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: 7bit User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.6) X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: 31118844015.AEC84 X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=0.6, required 6, autolearn=disabled, J_CHICKENPOX_33 0.60) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1305798616.47302@T9cO4aU4OGvqHqwn9kk7mQ X-EBL-Spam-Status: No Cc: freebsd-fs Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 09:50:30 -0000 Quoting Jeremy Chadwick (from Thu, 12 May 2011 02:05:24 -0700): >> > What guarantee is there that the intent log -- which is written to the >> > disk -- actually got written to the disk in the middle of a power >> > failure? There's a lot of focus there on the idea that "the intent log >> > will fix everything, but may lose writes", but what guarantee do I have >> > that the intent log isn't corrupt or botched during a power failure? >> >> I expect that checksumming also works for ZIL (anybody knows?). If It would be a damn big design flaw if it wouldn't checksum the ZIL. >> that is the case, corruption would be detected, but you will have lost >> data unless you are using mirrored slog devices. > > I can't believe that statement either (the last line). 
> > I guess that's also what I'm asking here -- what guarantee do you have > that even with a mirrored 2-disk SLOG (or heck, 3 or 4!) that *no data* > will be *lost* during a power outage? > > It seems to me the proper phrase would be "the likelihood of losing an > entire pool during a power outage is lessened". Alexander indirectly > hinted at this in another post of his tonight, specifically regarding > zpool v15 versus v28: > > "The difference between v15 and v28 is the amount of data you lose (the > entire pool vs. only what is still on the log devices)".

To recover the context: this was for losing the SLOG completely.

> This makes much more sense to me. > > It seems that in a power outage, there will always be some form of data > loss. I imagine even systems that have hardware RAM/cache with BBUs on > everything; there's always some form of caching going on *somewhere* > within a system, from CPU all the way up, that guarantees some degree of > data loss. I guess I'm OCD'ing over the terminology here. Sorry.

A simple power-loss should not destroy the SLOG (or the pool). For easy comprehension, let us just assume that the log can only be destroyed by a hardware problem (broken disk -> the reason why it should be mirrored -> if all devices are broken, you have the same case as a pool without a SLOG that lost more drives than its redundancy allows).

As written in my other mail (which I sent before I saw this mail, but which probably arrived after you wrote this one), the SLOG is not about an enhanced guarantee (you had the guarantee before), it is about performance.

You need to handle the data-loss problem at several layers. If you have a power-loss during the write to the SLOG, you will lose the last SLOG entry (but there is no corruption). At this point in time the write has not returned to the application, so the application should not have ACKed the reception of the data. If it did, you will lose data. If it didn't, the application will just pick this transaction again from the queue of outstanding transactions and redo it. Detecting the case of a write that succeeded, followed by a power-loss before the ACK to the sender, also has to be handled in the application (e.g. calculate an ID based upon the incoming data and write the ID together with the rest of the transaction; if the ID is in, say, the DB, together with a corresponding state flag (if the processing is split up into several DB-transactions) written in the corresponding transaction, then you know that the write before the power-loss was done correctly and the app just needs to ACK to the sender).

Was this clear enough, or shall I try to draw a better picture (in this case please try to specify your concerns, maybe with an example)?

Bye, Alexander.

-- Do YOU have redeeming social value?
http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID = 72077137 From owner-freebsd-fs@FreeBSD.ORG Thu May 12 10:03:36 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5CDD7106564A for ; Thu, 12 May 2011 10:03:36 +0000 (UTC) (envelope-from alexander@leidinger.net) Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de [217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id CCF488FC16 for ; Thu, 12 May 2011 10:03:35 +0000 (UTC) Received: from outgoing.leidinger.net (p5B155ED8.dip.t-dialin.net [91.21.94.216]) by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id E2840844015; Thu, 12 May 2011 12:03:20 +0200 (CEST) Received: from webmail.leidinger.net (webmail.Leidinger.net [IPv6:fd73:10c7:2053:1::2:102]) by outgoing.leidinger.net (Postfix) with ESMTP id 0421A128C; Thu, 12 May 2011 12:03:18 +0200 (CEST) Received: (from www@localhost) by webmail.leidinger.net (8.14.4/8.13.8/Submit) id p4CA3HOg038348; Thu, 12 May 2011 12:03:17 +0200 (CEST) (envelope-from Alexander@Leidinger.net) Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by webmail.leidinger.net (Horde Framework) with HTTP; Thu, 12 May 2011 12:03:17 +0200 Message-ID: <20110512120317.12543g51m4im15k4@webmail.leidinger.net> Date: Thu, 12 May 2011 12:03:17 +0200 From: Alexander Leidinger To: =?utf-8?b?wqlpbXVu?= Mikecin References: <4DCA5620.1030203@dannysplace.net> <4DCB455C.4020805@dannysplace.net> <4DCB81B8.6070301@digsys.bg> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; DelSp="Yes"; format="flowed" Content-Disposition: inline Content-Transfer-Encoding: quoted-printable User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.6) X-EBL-MailScanner-Information: Please contact the ISP for more information X-EBL-MailScanner-ID: E2840844015.ADE71 X-EBL-MailScanner: Found to be clean X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN, SpamAssassin (not cached, score=0, required 6, autolearn=disabled) X-EBL-MailScanner-From: alexander@leidinger.net X-EBL-MailScanner-Watermark: 1305799403.3855@Z7UYgEbaHiH2ZjllWstZjA X-EBL-Spam-Status: No Cc: freebsd-fs Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 10:03:36 -0000 Quoting =C2=A9imun Mikecin (from Thu, 12 May 2011 =20 10:02:42 +0200): > > On 12. svi. 2011., at 08:44, Daniel Kalchev wrote: > >> On 12.05.11 05:26, Danny Carroll wrote: >>> >>> - Don't use SSD for the Log device. Write speed tends to be a problem= . >> It all depends on your usage. You need to experiment, unfortunately. > > What is the alternative for log devices if you are not using SSD? > Rotating hard drives? > > AFAIK, two factors define the speed of log device: write transfer =20 > rate and write latency. There is also bus contention (either on the SCSI bus, or in the SATA =20 channel/controller, or on the PCI-whatever (e/X/y) bus). > You will not find a rotating hard drive that has a write latency =20 > anything near the write latency of even a slowest SSD you can find =20 > on the market. > On the other hand, only a very few rotating hard drives have a write =20 > transfer rate that can be compared to SSD's. 
And if your PCI-something bus is not saturated but your SCSI/SATA =20 controller struggles with the work which is thrown at it, a separate =20 log device (normal HD) on another controller could free up the =20 pool-controller(s) up to a situation where it can handle all requests =20 at full speed and the log-controller can provide the additional =20 throughput at full speed which the pool-controller was not able to =20 satisfy. What you do in this case is that you add more spindles (disks) =20 dedicated to sync-write operations. The normal RAID-common-knowledge =20 of "adding more spindles for more performance" applies here, just that =20 it is specially for sync-write operations. The generic hint to have =20 them faster than the pool-disks is an answer for the worst case. As =20 always, the worst case for one person may not be the worst case for =20 another persons workload. Bye, Alexander. --=20 If love is the answer, could you rephrase the question? =09=09-- Lily Tomlin http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID =3D B0063FE7 http://www.FreeBSD.org netchild @ FreeBSD.org : PGP ID =3D 72077137 From owner-freebsd-fs@FreeBSD.ORG Thu May 12 10:16:56 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 63E48106566B for ; Thu, 12 May 2011 10:16:56 +0000 (UTC) (envelope-from daniel@digsys.bg) Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230]) by mx1.freebsd.org (Postfix) with ESMTP id C87F78FC1A for ; Thu, 12 May 2011 10:16:55 +0000 (UTC) Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5]) (authenticated bits=0) by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p4CAGh52059565 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 12 May 2011 13:16:49 +0300 (EEST) (envelope-from daniel@digsys.bg) Message-ID: <4DCBB38B.3090806@digsys.bg> Date: Thu, 12 May 2011 13:16:43 +0300 From: Daniel Kalchev User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US; rv:1.9.2.15) Gecko/20110307 Thunderbird/3.1.9 MIME-Version: 1.0 To: Jeremy Chadwick References: <4DCA5620.1030203@dannysplace.net> <4DCB455C.4020805@dannysplace.net> <20110512033626.GA52047@icarus.home.lan> <4DCB7F22.4060008@digsys.bg> <20110512083429.GA58841@icarus.home.lan> In-Reply-To: <20110512083429.GA58841@icarus.home.lan> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 10:16:56 -0000 On 12.05.11 11:34, Jeremy Chadwick wrote: > > I guess this is why others have mentioned the importance of BBUs and > supercaps, but I don't know what guarantee there is that during a power > failure there won't be some degree of filesystem corruption or lost > data. You can think of the SLOG as the BBU of ZFS. The best SLOG of course is battery backed RAM. Just what the BBUs are. Any battery backed RAM device used for SLOG will beat (by a large margin) any however expensive SSD. Fears of corruption is, besides performance, what makes people use SLC flash for SLOG devices. The MLC flash is much more prone to errors, than SLC flash. This includes situations like power loss. This is also the reason people talk so much about super-capacitors. 
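Whichever device ends up holding the separate log, the earlier advice to mirror it is a single command against a live pool; the pool name and GPT labels below are invented placeholders:

# zpool add tank log mirror gpt/slog0 gpt/slog1    (attach a mirrored slog to an existing pool)
# zpool status tank                                (the mirrored log shows up as its own vdev)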
> >> How can ever TRIM support influence reading from the drive?! > I guess you want more proof, so here you go. Of course :) > I imagine the reason this happens is similar to why memory performance > degrades under fragmentation or when there's a lot of "middle-man stuff" > going on. TRIM does not change fragmentation. All TRIM does is erase the flash cells in background, so that when the new write request arrives, data can just be written, instead of erased-written. The erase operation is slow in flash memory. Think of TRIM as OS-assisted garbage collection. It is nothing else -- no matter what advertising says :) Also, please note that there is no "fragmentation" in either SLOG or L2ARC to be concerned with. There are no "files" there - just raw blocks that can sit anywhere. >> TRIM is an slow operation. How often are these issued? > Good questions, for which I have no answer. The same could be asked of > any OS however, not just Windows. And I've asked the same question > about SSDs internal "garbage collection" too. I have no answers, so you > and I are both wondering the same question. And yes, I am aware TRIM is > a costly operation. Well, at least we know some commodity SSDs on the market have "lazy" garbage collection, some do it right away. The "lazy" drives give good performance initially Jeremy, thanks for the detailed data. So much about theory :) Just a quick "(slow) HDD as SLOG" test, not very scientific :) Hardware: Supermicro X8DTH-6F (integrated LSI2008) 2xE5620 Xeons 24 GB RAM 6x Hitachi HDS72303 drives All disks are labeled with GPT, first partition on 1GB. First, create ashift=12 raidz2 zpool with all drives # gnop create -S 4096 gpt/disk00 # zpool create storage gpt/disk00.nop gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04 gpt/disk05 $ bonnie++ Version 1.96 ------Sequential Output------ --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP a1.register.bg 48G 126 99 293971 93 177423 52 357 99 502710 86 234.2 8 Latency 68881us 2817ms 5388ms 37301us 1266ms 471ms Version 1.96 ------Sequential Create------ --------Random Create-------- a1.register.bg -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 25801 90 +++++ +++ 23915 94 25869 98 +++++ +++ 24858 97 Latency 12098us 117us 141us 24121us 29us 66us 1.96,1.96,a1.register.bg,1,1305158675,48G,,126,99,293971,93,177423,52,357,99,502710,86,234. 
2,8,16,,,,,25801,90,+++++,+++,23915,94,25869,98,+++++,+++,24858,97,68881us,2817ms,5388ms,37 301us,1266ms,471ms,12098us,117us,141us,24121us,29us,66us Recreate the pool with 5 drives + one drive as SLOG # zpool destroy storage # zpool create storage gpt/disk00.nop gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04 log gpt/disk05 $ bonnie++ Version 1.96 ------Sequential Output------ --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP a1.register.bg 48G 110 99 306932 68 223853 46 354 99 664034 65 501.8 11 Latency 172ms 11571ms 4217ms 50414us 1895ms 245ms Version 1.96 ------Sequential Create------ --------Random Create-------- a1.register.bg -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 24673 97 +++++ +++ 24262 98 19108 97 +++++ +++ 23821 97 Latency 12051us 132us 143us 23392us 47us 79us 1.96,1.96,a1.register.bg,1,1305171999,48G,,110,99,306932,68,223853,46,354,99,664034,65,501.8,11,16,,,,,24673,97,+++++,+++,24262,98,19108,97,+++++,+++,23821,97,172ms,11571ms,4217ms,50414us,1895ms,245ms,12051us,132us,143us,23392us,47us,79us Interesting to note that zpool iostat -v 1 never showed more than 128K of usage on the SLOG drive, although from time to time it was hitting over 1200 IOPS and over 150 MB/s write. Also, the second pool is with one disk less. For comparison, here is the same pool with 5 disks and no SLOG # zpool create storage gpt/disk00.nop gpt/disk01 gpt/disk02 gpt/disk03 gpt/disk04 $ bonnie++ Version 1.96 ------Sequential Output------ --Sequential Input- --Random- Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP a1.register.bg 48G 118 99 287361 92 152566 40 345 98 398392 51 242.4 24 Latency 56962us 2619ms 4308ms 57304us 1214ms 350ms Version 1.96 ------Sequential Create------ --------Random Create-------- a1.register.bg -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 27438 95 +++++ +++ 19374 90 25259 97 +++++ +++ 6876 99 Latency 8913us 200us 295us 27249us 30us 238us 1.96,1.96,a1.register.bg,1,1305165435,48G,,118,99,287361,92,152566,40,345,98,398392,51,242. 4,24,16,,,,,27438,95,+++++,+++,19374,90,25259,97,+++++,+++,6876,99,56962us,2619ms,4308ms,57 304us,1214ms,350ms,8913us,200us,295us,27249us,30us,238us One side effect I forgot to mention from using a SLOG is less fragmentation in the pool. When the ZIL is in the main pool, it is frequently written and erased and the ZIL is variable size, leaving undesired gaps. Hope this helps. 
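For anyone repeating the test: the slog-usage observation above comes from watching the pool from a second terminal while bonnie++ runs, roughly like this (using the same device names as in the setup above):

# zpool iostat -v storage 1     (per-vdev bandwidth/IOPS, including the log device)
# gstat -f 'gpt/disk05'         (raw latency and IOPS of the log disk itself)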
Daniel From owner-freebsd-fs@FreeBSD.ORG Thu May 12 13:57:30 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CD1C3106564A for ; Thu, 12 May 2011 13:57:30 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id 90A348FC0A for ; Thu, 12 May 2011 13:57:29 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id p4CDvHqT015705; Thu, 12 May 2011 08:57:17 -0500 (CDT) Date: Thu, 12 May 2011 08:57:17 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Jeremy Chadwick In-Reply-To: <20110512033626.GA52047@icarus.home.lan> Message-ID: References: <4DCA5620.1030203@dannysplace.net> <4DCB455C.4020805@dannysplace.net> <20110512033626.GA52047@icarus.home.lan> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Thu, 12 May 2011 08:57:17 -0500 (CDT) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 13:57:30 -0000 On Wed, 11 May 2011, Jeremy Chadwick wrote: > > Bob, please correct me if I'm wrong, but as I understand it a log device > (ZIL) effectively limits the overall write speed of the pool itself. > Consumer-level SSDs do not have extremely high write performance (and it > gets worse without TRIM; again a 70% decrease in write speed in some > cases). It is certainly a factor. However, large block writes (something like 128K, I don't remember exactly) bypass the dedicated log device and instead are written to the main store (with only a reference being added to the dedicated device). The reason this is done is for the exact reason you point out. The SSD has a very fast seek and zero rotational latency but being a singular resource it suffers from bandwidth limitations. The main store usually suffers from multi-millisecond seeks and rotational latency but offers linearly scalable and substantial write performance for larger writes. Matt Ahrens has described this a few times on the zfs-discuss list and there is mention of it on slide 15 of the presentation found at "http://www.slideshare.net/edigit/zfs-presentation". The large write feature of the ZIL is a reason why we should appreciate modern NFS's large-write capability and avoid anchient NFS. It is worth mentioning that the ZIL is a write-only device which is only read when the system boots or a pool is imported. The writes are usually "write and forget" since zfs uses them to improve its ability to cache larger transaction groups. 
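As a side note, the v28 code discussed earlier in the thread should also expose this "bypass the slog for bulk writes" behaviour per dataset through the logbias property (worth verifying on your particular ZFS version); the dataset name here is a placeholder:

# zfs get logbias tank/bulk               (default "latency": small sync writes go to the slog)
# zfs set logbias=throughput tank/bulk    (bulk streaming writers: skip the slog, write to the main pool)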
Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Thu May 12 14:08:06 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B069B106564A for ; Thu, 12 May 2011 14:08:06 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id 770D18FC13 for ; Thu, 12 May 2011 14:08:06 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id p4CE85uW015763; Thu, 12 May 2011 09:08:05 -0500 (CDT) Date: Thu, 12 May 2011 09:08:05 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Jeremy Chadwick In-Reply-To: <20110512083429.GA58841@icarus.home.lan> Message-ID: References: <4DCA5620.1030203@dannysplace.net> <4DCB455C.4020805@dannysplace.net> <20110512033626.GA52047@icarus.home.lan> <4DCB7F22.4060008@digsys.bg> <20110512083429.GA58841@icarus.home.lan> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Thu, 12 May 2011 09:08:05 -0500 (CDT) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 14:08:06 -0000 On Thu, 12 May 2011, Jeremy Chadwick wrote: > > What guarantee is there that the intent log -- which is written to the > disk -- actually got written to the disk in the middle of a power > failure? There's a lot of focus there on the idea that "the intent log > will fix everything, but may lose writes", but what guarantee do I have > that the intent log isn't corrupt or botched during a power failure? This is pretty easy. Zfs requests that the disk containing the intent log commit its update (and waits for completion) before it returns from the "write" request. As long as the disk does not lie, the data will be present after the reboot. Note that many SSDs do lie about cache commit requests and these are best avoided for anything to do with zfs. 
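On FreeBSD the drive's volatile write cache can at least be inspected, and disabled if you do not trust the firmware; whether a flush is then honoured is still up to the drive. A sketch assuming an ada(4)-attached disk ada0:

# camcontrol identify ada0 | grep -i 'write cache'    (is the volatile write cache enabled?)
# sysctl kern.cam.ada.write_cache                     (knob to disable it; may need a re-probe or reboot to take effect)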
Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Thu May 12 18:05:59 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 852B3106566C for ; Thu, 12 May 2011 18:05:59 +0000 (UTC) (envelope-from gpalmer@freebsd.org) Received: from noop.in-addr.com (mail.in-addr.com [IPv6:2001:470:8:162::1]) by mx1.freebsd.org (Postfix) with ESMTP id 53A5A8FC13 for ; Thu, 12 May 2011 18:05:59 +0000 (UTC) Received: from gjp by noop.in-addr.com with local (Exim 4.74 (FreeBSD)) (envelope-from ) id 1QKaGr-000BCB-RL; Thu, 12 May 2011 14:05:57 -0400 Date: Thu, 12 May 2011 14:05:57 -0400 From: Gary Palmer To: Jeremy Chadwick Message-ID: <20110512180557.GB37035@in-addr.com> References: <4DCA5620.1030203@dannysplace.net> <4DCB455C.4020805@dannysplace.net> <20110512033626.GA52047@icarus.home.lan> <4DCB7F22.4060008@digsys.bg> <20110512083429.GA58841@icarus.home.lan> <20110512090524.GA2106@icarus.home.lan> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110512090524.GA2106@icarus.home.lan> X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: gpalmer@freebsd.org X-SA-Exim-Scanned: No (on noop.in-addr.com); SAEximRunCond expanded to false Cc: freebsd-fs Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 18:05:59 -0000 On Thu, May 12, 2011 at 02:05:24AM -0700, Jeremy Chadwick wrote: > I guess that's also what I'm asking here -- what guarantee do you have > that even with a mirrored 2-disk SLOG (or heck, 3 or 4!) that *no data* > will be *lost* during a power outage? > > It seems to me the proper phrase would be "the likelihood of losing an > entire pool during a power outage is lessened". Alexander indirectly > hinted at this in another post of his tonight, specifically regarding > zpool v15 versus v28: > > "The difference between v15 and v28 is the amount of data you lose (the > entire pool vs. only what is still on the log devices)". > > This makes much more sense to me. > > It seems that in a power outage, there will always be some form of data > loss. I imagine even systems that have hardware RAM/cache with BBUs on > everything; there's always some form of caching going on *somewhere* > within a system, from CPU all the way up, that guarantees some degree of > data loss). I guess I'm OCD'ing over the terminology here. Sorry. At one level, nothing you can do in hardware can protect you from data loss or corruption due to a power outage. This is why applications and protocols must be designed with that in mind. E.g. RFC 821/2821/5321 explicitly state that a MTA cannot acknowledge the . at the end of the DATA segment until the message is committed to permanent storage. That can lead to message duplication, but thats better than the alternative - the message is always queued *somewhere*. (And yes, there are/ were vendors who "accidentally" overlook that requirement in the name of increased throughput) Trying to solve this entirely in hardware is pointless. You need to look at the entire system end-to-end to eliminate data loss problems. 
Regards, Gary From owner-freebsd-fs@FreeBSD.ORG Thu May 12 23:02:53 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 087D3106566B for ; Thu, 12 May 2011 23:02:53 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id BC2FF8FC14 for ; Thu, 12 May 2011 23:02:52 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApwEAI9mzE2DaFvO/2dsb2JhbACEVqIaiHCuNpEYgSuDY4EHBI98jwU X-IronPort-AV: E=Sophos;i="4.64,361,1301889600"; d="scan'208";a="124562715" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 12 May 2011 19:02:51 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id C4EDE793A7; Thu, 12 May 2011 19:02:51 -0400 (EDT) Date: Thu, 12 May 2011 19:02:51 -0400 (EDT) From: Rick Macklem To: Bob Friesenhahn Message-ID: <1700693186.266759.1305241371736.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.201] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 23:02:53 -0000 > On Wed, 11 May 2011, Jeremy Chadwick wrote: > > > > Bob, please correct me if I'm wrong, but as I understand it a log > > device > > (ZIL) effectively limits the overall write speed of the pool itself. > > Consumer-level SSDs do not have extremely high write performance > > (and it > > gets worse without TRIM; again a 70% decrease in write speed in some > > cases). > > It is certainly a factor. However, large block writes (something like > 128K, I don't remember exactly) bypass the dedicated log device and > instead are written to the main store (with only a reference being > added to the dedicated device). The reason this is done is for the > exact reason you point out. The SSD has a very fast seek and zero > rotational latency but being a singular resource it suffers from > bandwidth limitations. The main store usually suffers from > multi-millisecond seeks and rotational latency but offers linearly > scalable and substantial write performance for larger writes. > > Matt Ahrens has described this a few times on the zfs-discuss list and > there is mention of it on slide 15 of the presentation found at > "http://www.slideshare.net/edigit/zfs-presentation". > > The large write feature of the ZIL is a reason why we should > appreciate modern NFS's large-write capability and avoid anchient NFS. > The size of a write for the new FreeBSD NFS server is limited to MAX_BSIZE. It is currently 64K, but I would like to see it much larger. I am going to try increasing MAX_BSIZE soon, to see what happens. This sounds like another good reason to increase it. However, a client chooses what size to use, up to the server`s limit (and, again, MAX_BSIZE for the FreeBSD client). 
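As a rough sketch of how a client chooses its transfer size within the server's limit (the server name and export path are hypothetical; 65536 reflects the 64K MAX_BSIZE ceiling mentioned above):

# Ask for 64k reads and writes; the client negotiates down if the server
# advertises a smaller maximum.
mount -t nfs -o nfsv3,rsize=65536,wsize=65536 server:/export /mnt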
rick From owner-freebsd-fs@FreeBSD.ORG Thu May 12 23:19:51 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9DB19106566B for ; Thu, 12 May 2011 23:19:51 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id 6522B8FC19 for ; Thu, 12 May 2011 23:19:51 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id p4CNJo53017906; Thu, 12 May 2011 18:19:50 -0500 (CDT) Date: Thu, 12 May 2011 18:19:50 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Rick Macklem In-Reply-To: <1700693186.266759.1305241371736.JavaMail.root@erie.cs.uoguelph.ca> Message-ID: References: <1700693186.266759.1305241371736.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Thu, 12 May 2011 18:19:50 -0500 (CDT) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 12 May 2011 23:19:51 -0000 On Thu, 12 May 2011, Rick Macklem wrote: >> The large write feature of the ZIL is a reason why we should >> appreciate modern NFS's large-write capability and avoid anchient NFS. >> > The size of a write for the new FreeBSD NFS server is limited to > MAX_BSIZE. It is currently 64K, but I would like to see it much larger. > I am going to try increasing MAX_BSIZE soon, to see what happens. Zfs would certainly appreciate 128K since that is its default block size. When existing file content is overwritten, writing in properly aligned 128K blocks is much faster due to ZFS's COW algorithm and not needing to read the existing block. With a partial "overwrite", if the existing block is not already cached in the ARC, then it would need to be read from underlying store before the replacement block can be written. This effect becomes readily apparent in benchmarks. In my own benchmarking I have found that 128K is sufficient and using larger multiples of 128K does not obtain much more performance. When creating a file from scratch, zfs performs well for async writes if a process writes data smaller than 128K. That might not be the case for sync writes. 
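A small sketch of the aligned full-record case described above (the dataset and file names are hypothetical):

# Full, aligned 128k writes let ZFS copy-on-write whole records without
# reading the old contents first; smaller or unaligned overwrites force a
# read-modify-write when the block is not already cached in the ARC.
zfs get recordsize tank/data
dd if=/dev/zero of=/tank/data/bench.dat bs=128k count=8192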
Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Fri May 13 00:03:40 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 85066106566B for ; Fri, 13 May 2011 00:03:40 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id 448F18FC13 for ; Fri, 13 May 2011 00:03:40 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApwEAK50zE2DaFvO/2dsb2JhbACEVqIaiHCte5ETgSuDY4EHBI98jwU X-IronPort-AV: E=Sophos;i="4.64,361,1301889600"; d="scan'208";a="120545285" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 12 May 2011 20:03:39 -0400 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 49C93B3F5B; Thu, 12 May 2011 20:03:39 -0400 (EDT) Date: Thu, 12 May 2011 20:03:39 -0400 (EDT) From: Rick Macklem To: Bob Friesenhahn Message-ID: <921935873.267812.1305245019197.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 May 2011 00:03:40 -0000 > On Thu, 12 May 2011, Rick Macklem wrote: > >> The large write feature of the ZIL is a reason why we should > >> appreciate modern NFS's large-write capability and avoid anchient > >> NFS. > >> > > The size of a write for the new FreeBSD NFS server is limited to > > MAX_BSIZE. It is currently 64K, but I would like to see it much > > larger. > > I am going to try increasing MAX_BSIZE soon, to see what happens. > > Zfs would certainly appreciate 128K since that is its default block > size. When existing file content is overwritten, writing in properly > aligned 128K blocks is much faster due to ZFS's COW algorithm and not > needing to read the existing block. With a partial "overwrite", if > the existing block is not already cached in the ARC, then it would > need to be read from underlying store before the replacement block can > be written. This effect becomes readily apparent in benchmarks. In > my own benchmarking I have found that 128K is sufficient and using > larger multiples of 128K does not obtain much more performance. > > When creating a file from scratch, zfs performs well for async writes > if a process writes data smaller than 128K. That might not be the > case for sync writes. > Yep, I think sizes greater than 128K might only benefit WAN connections with a larger bandwidth * delay product. It also helps to find "not so great" network interfaces/drivers. When I used 128K on the Mac OS X port, it worked great for some Macs and horribly for others. 
Some Macs would drop packets when they would see a burst of read traffic (the Mac was a client and the server was Solaris10, which handles NFS read/write sizes up to 1Mbyte) and wouldn't perform well above 32Kbytes (for a now rather old port to Leopard). rick From owner-freebsd-fs@FreeBSD.ORG Fri May 13 00:46:06 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6B02C106566B for ; Fri, 13 May 2011 00:46:06 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-yx0-f182.google.com (mail-yx0-f182.google.com [209.85.213.182]) by mx1.freebsd.org (Postfix) with ESMTP id 272FD8FC0A for ; Fri, 13 May 2011 00:46:06 +0000 (UTC) Received: by yxl31 with SMTP id 31so927967yxl.13 for ; Thu, 12 May 2011 17:46:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=/Z9GaqqcyNhje0gNUL7KOwYgFeYbd6c6Fr8+MdnbI+o=; b=hqC3Xf4xnwOsbS8GzegE30o4Z76U64Dwf2ctuXqc+rsIUcIumjZPXYFYOnn41QFxdt c4eCgaZNPPfDgx1cGMiZE0E8FZ2RB0ujisFqiP3GDqsKy88FVX2D6zEJXN1dB2LbyFyl ycvY4SjPaWTnjbVTzfDWJiHpCwso5ikyVz02U= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; b=P0/15Yg8rbeRHALIQT6S05gheSxZMUSZJJudKQ5vv+PNKMjLdMpvxi4J6Z7EEjIg7O QJfEWYSfguqweYUG0zJEaIPgKdrrREqyrNzYi9sUnRucYrSM74Vyx6402+5j6iMEJLW7 1XR0wd6erGx7OSXHTa/FDYOBgtaF/tT7pS1fE= MIME-Version: 1.0 Received: by 10.90.194.2 with SMTP id r2mr841599agf.86.1305247565592; Thu, 12 May 2011 17:46:05 -0700 (PDT) Received: by 10.90.52.15 with HTTP; Thu, 12 May 2011 17:46:05 -0700 (PDT) In-Reply-To: References: <1700693186.266759.1305241371736.JavaMail.root@erie.cs.uoguelph.ca> Date: Thu, 12 May 2011 17:46:05 -0700 Message-ID: From: Freddie Cash To: Bob Friesenhahn Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 May 2011 00:46:06 -0000 On Thu, May 12, 2011 at 4:19 PM, Bob Friesenhahn wrote: > On Thu, 12 May 2011, Rick Macklem wrote: >>> >>> The large write feature of the ZIL is a reason why we should >>> appreciate modern NFS's large-write capability and avoid anchient NFS. >>> >> The size of a write for the new FreeBSD NFS server is limited to >> MAX_BSIZE. It is currently 64K, but I would like to see it much larger. >> I am going to try increasing MAX_BSIZE soon, to see what happens. > > Zfs would certainly appreciate 128K since that is its default block size. Note: the "default block size" is a max block size, not an "every block written is this size" setting. A ZFS filesystem will use any power-of-2 size under the block size setting for that filesystem. Only zvols have an "every block written will be this size" setting. 
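A short sketch of the distinction (the dataset and zvol names are hypothetical):

# recordsize is an upper bound for a filesystem; small files still get
# smaller blocks.  volblocksize on a zvol is fixed at creation time.
zfs set recordsize=128K tank/data
zfs create -V 10G -o volblocksize=8K tank/vol0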
-- Freddie Cash fjwcash@gmail.com From owner-freebsd-fs@FreeBSD.ORG Fri May 13 12:12:45 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 71A54106566C for ; Fri, 13 May 2011 12:12:45 +0000 (UTC) (envelope-from szumo@szumo.net) Received: from v000054.home.net.pl (people.pl [212.85.96.54]) by mx1.freebsd.org (Postfix) with SMTP id CCA798FC12 for ; Fri, 13 May 2011 12:12:44 +0000 (UTC) Received: from vmy2.home.net.pl [79.96.240.52] (HELO vmy2.home.net.pl) by people.home.pl [212.85.96.54] with SMTP (IdeaSmtpServer v0.70) id 5ad367fe1196bb13; Fri, 13 May 2011 13:46:03 +0200 Received: from 80.53.66.10 (80.53.66.10) user szumo.people via webmail From: "Maciej Szumocki" To: freebsd-fs@freebsd.org Date: Fri, 13 May 2011 13:46:03 +0200 Content-Type: text/plain; charset=UTF-8 User-Agent: home.pl my.webmail/2.0 MIME-Version: 1.0 X-Mailer: home.pl my.webmail/2.0 X-Priority: 3 Message-ID: <1975530a6ee695ac3a1fea7648612c50.qmail@home.pl> Content-Transfer-Encoding: base64 Subject: Kernel panic on zfs pool import (8.2-RELEASE #0 amd64) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 May 2011 12:12:45 -0000
Hi all,

I get a kernel panic when trying to import a zfs pool on boot:
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x28
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80defb93
stack pointer           = 0x28:0xffffff8121139550
frame pointer           = 0x28:0xffffff8121139580
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1110 (zpool)
trap number             = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff805f4e0e at kdb_backtrace+0x5e
#1 0xffffffff805c2d07 at panic+0x187
#2 0xffffffff808ac600 at trap_fatal+0x290
#3 0xffffffff808ac9df at trap_pfault+0x28f
#4 0xffffffff808acebf at trap+0x3df
#5 0xffffffff80894fb4 at calltrap+0x8
#6 0xffffffff80df6b57 at vdev_mirror_child_select+0x67
#7 0xffffffff80df70fe at vdev_mirror_io_start+0x23e
#8 0xffffffff80e08287 at zio_execute+0x77
#9 0xffffffff80e0832d at zio_wait+0x2d
#10 0xffffffff80dba69a at arc_read_nolock+0x6ba
#11 0xffffffff80dc6c20 at dmu_objset_open_impl+0xd0
#12 0xffffffff80dd740a at dsl_pool_open+0x5a
#13 0xffffffff80de5822 at spa_load+0x352
#14 0xffffffff80de63f3 at spa_open_common+0x133
#15 0xffffffff80e17b73 at zfs_log_history+0x33
#16 0xffffffff80e17ecd at zfsdev_ioctl+0xbd
I've used Solaris Express 11 USB live image to import -Ff the pool (which
discarded 58 transactions) then export it, after which FreeBSD boots
and zpool import shows:
sentry# zpool import
   pool: zfiles
     id: 7626325216149300662
  state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
         devices and try again.
    see: http://www.sun.com/msg/ZFS-8000-6X
config:
         zfiles      UNAVAIL  missing device
           raidz1    ONLINE
             ad14    ONLINE
             ad6     ONLINE
             ad10    ONLINE
             ad12    ONLINE
         Additional devices are known to be part of this pool, though their
         exact configuration cannot be determined.
zdb output for that pool is:
sentry# zdb
zfiles
     version=15
     txg=0
     pool_guid=7626325216149300662
     vdev_tree
         type='root'
         id=0
         guid=7626325216149300662
bad config type 16 for stats
         children[0]
                 type='raidz'
                 id=0
                 guid=6439517435125662437
                 nparity=1
                 metaslab_array=23
                 metaslab_shift=35
                 ashift=9
                 asize=4000795590656
                 is_log=0
bad config type 16 for stats
                 children[0]
                         type='disk'
                         id=0
                         guid=16320455101718439450
                         path='/dev/ad14'
                         phys_path='/pci@0,0/pci10de,563@c/pci1095,7132@0/disk@1,0:q'
                         whole_disk=0
                         DTL=146
bad config type 16 for stats
                 children[1]
                         type='disk'
                         id=1
                         guid=9430290414588001708
                         path='/dev/ad6'
                         phys_path='/pci@0,0/pci1043,8308@9/disk@1,0:q'
                         whole_disk=0
                         DTL=145
bad config type 16 for stats
                 children[2]
                         type='disk'
                         id=2
                         guid=9807797910141155672
                         path='/dev/ad10'
                         phys_path='/pci@0,0/pci1043,8308@9/disk@3,0:q'
                         whole_disk=0
                         DTL=144
bad config type 16 for stats
                 children[3]
                         type='disk'
                         id=3
                         guid=16777389692675440869
                         path='/dev/ad12'
                         phys_path='/pci@0,0/pci10de,563@c/pci1095,7132@0/disk@0,0:c'
                         whole_disk=0
                         DTL=143
bad config type 16 for stats
     name='zfiles'
     state=1
     timestamp=1304363844
     hostid=619000
     hostname='solaris'
Any recommendations?
Maciej Szumocki
From owner-freebsd-fs@FreeBSD.ORG Fri May 13 12:56:30 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A8CE3106564A for ; Fri, 13 May 2011 12:56:30 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by mx1.freebsd.org (Postfix) with ESMTP id 686AA8FC1B for ; Fri, 13 May 2011 12:56:30 +0000 (UTC) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1QKrut-0003bl-Uu for freebsd-fs@freebsd.org; Fri, 13 May 2011 14:56:27 +0200 Received: from ib-jtotz.ib.ic.ac.uk ([155.198.110.220]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 13 May 2011 14:56:27 +0200 Received: from jtotz by ib-jtotz.ib.ic.ac.uk with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 13 May 2011 14:56:27 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Johannes Totz Date: Fri, 13 May 2011 13:56:15 +0100 Lines: 73 Message-ID: Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: ib-jtotz.ib.ic.ac.uk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 Subject: fusefs broken on 8-stable? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 May 2011 12:56:30 -0000 Heya! Using encfs (built on top of fuse) gives me panics in combination with rsync. Dump didn't succeed. The info below is transcribbled from a photograph. This is repeatable. Without dump this is probably not very helpful.... # uname -a FreeBSD XXX 8.2-STABLE FreeBSD 8.2-STABLE #1: Thu Mar 10 23:30:08 GMT 2011 root@XXX:/usr/obj/usr/src/sys/GENERIC amd64 First panic (top bits scrolled off screen): trap number = 12 panic: page fault cpuid = 0 KDB: stack backtrace #0 ... kbd_backtrace+0x5c #1 ... panic+0x1b4 #2 ... trap_fatal+0x394 #3 ... trap_pfault+0x252 #4 ... trap+0x3f4 #5 ... calltrap+0x8 #6 ... fdisp_make+0xe4 #7 ... fuse_lookup+0x1dc #8 ... VOP_LOOKUP_APV+0x4c #9 ... at lookup+0x61e #10 ... at namei+0x592 #11 ... at vn_open_cred+0x339 #12 ... at vn_open+0x1c #13 ... at kern_openat+0x152 #14 ... at kern_open+0x19 #15 ... at open+0x18 #16 ... at syscallenter+0x2d9 #17 ... at syscall+0x38 Second panic: code segment = base 0x0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 17 (vnlru) trap number = 12 panic: page fault cpuid = 0 KDB: stack backtrace #0 ... at kdb_backtrace+0x5c #1 ... at panic+0x1b4 #2 ... at trap_fatal+0x394 #3 ... at trap_pfault0x252 #4 ... at trap+0x3f4 #5 ... at calltrap+0x8 #6 ... at fdisp_make_pid+0xc7 #7 ... at fuse_send_forget+0x44 #8 ... at fuse_recyc_backend+0xb2 #9 ... at VOP_RECLAIM_APV+0x49 #10 ... at vgonel+0x1b7 #11 ... at vnlru_proc+0x591 #12 ... at fork_exit+0x121 #13 ... at fork_trampoline+0xe Any idea what could be going on?
Johannes From owner-freebsd-fs@FreeBSD.ORG Fri May 13 13:42:12 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B8C57106566B for ; Fri, 13 May 2011 13:42:12 +0000 (UTC) (envelope-from gleb.kurtsou@gmail.com) Received: from mail-ww0-f50.google.com (mail-ww0-f50.google.com [74.125.82.50]) by mx1.freebsd.org (Postfix) with ESMTP id 4AA5B8FC08 for ; Fri, 13 May 2011 13:42:11 +0000 (UTC) Received: by wwc33 with SMTP id 33so2826136wwc.31 for ; Fri, 13 May 2011 06:42:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:date:from:to:cc:subject:message-id:references :mime-version:content-type:content-disposition:in-reply-to :user-agent; bh=Di4youyES9DGM0AoklWBIrzZZ3dXki4gM+jH5b0IjNQ=; b=Eketko0B4eKgByUG+CYk1ti9dUckbMRBK/JcSVHMNI4krA3wZC8hZtJ9URz+aVnDYx Igc8+gghyraqhHDJSBa6khDVGbsDVD1uK+gO4fJhCJzVDdtP9oYmz2L/jUpxZbT9Srnb WiCZoXvQ3zuXPeFoTi+4mlD3jY+HxWq1BVgwA= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; b=vCes9HUCQrkSnLbQ+ezk320PTNs2d9nqzAhyo7oXp4b6z9qRjS8nQG6he2BBXCg6dw LZ7wpCa9lFJi8uHnEtEBhLHjCpkC5B3bDFSM5cMuJOdj5L3zfnJ3quWUP4K0XseMFA8d 48xlLWxeOitIuxXdXEKfhlp9ibyz1Vx9Qjsdk= Received: by 10.227.12.1 with SMTP id v1mr174331wbv.83.1305292793320; Fri, 13 May 2011 06:19:53 -0700 (PDT) Received: from localhost (lan-78-157-92-5.vln.skynet.lt [78.157.92.5]) by mx.google.com with ESMTPS id h11sm1388596wbc.9.2011.05.13.06.19.52 (version=SSLv3 cipher=OTHER); Fri, 13 May 2011 06:19:52 -0700 (PDT) Date: Fri, 13 May 2011 16:19:03 +0300 From: Gleb Kurtsou To: Johannes Totz Message-ID: <20110513131902.GA34738@tops> References: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Cc: freebsd-fs@freebsd.org Subject: Re: fusefs broken on 8-stable? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 May 2011 13:42:12 -0000 On (13/05/2011 13:56), Johannes Totz wrote: > Heya! > > Using encfs (built on top of fuse) gives me panics in combination with > rsync. Dump didn't succeed. The info below is transcribbled from a > photograph. This is repeatable. > Without dump this is probably not very helpful.... As far as I know there is memory corruption. But this particular case looks like VFS bug in fuse. I'd appreciate if you give native FreeBSD kernel level cryptographic filesystem PEFS a try: http://www.freebsd.org/cgi/query-pr.cgi?pr=ports/156002 -- port http://wiki.freebsd.org/PEFS https://github.com/glk/pefs > > > # uname -a > FreeBSD XXX 8.2-STABLE FreeBSD 8.2-STABLE #1: Thu Mar 10 23:30:08 GMT > 2011 root@XXX:/usr/obj/usr/src/sys/GENERIC amd64 > > > > First panic (top bits scrolled off screen): > > trap number = 12 > panic: page fault > cpuid = 0 > KDB: stack backtrace > #0 ... kbd_backtrace+0x5c > #1 ... panic+0x1b4 > #2 ... trap_fatal+0x394 > #3 ... trap_pfault+0x252 > #4 ... trap+0x3f4 > #5 ... calltrap+0x8 > #6 ... fdisp_make+0xe4 > #7 ... fuse_lookup+0x1dc > #8 ... VOP_LOOKUP_APV+0x4c > #9 ... at lookup+0x61e > #10 ... at namei+0x592 > #11 ... at vn_open_cred+0x339 > #12 ... at vn_open+0x1c > #13 ... at kern_openat+0x152 > #14 ... at kern_open+0x19 > #15 ... 
at open+0x18 > #16 ... at syscallenter+0x2d9 > #17 ... at syscall+0x38 > > > > Second panic: > > > code segment = base 0x0, limit 0xfffff, type 0x1b > = DPL 0, pres 1, long 1, def32 0, gran 1 > processor eflags = interrupt enabled, resume, IOPL = 0 > current process = 17 (vnlru) > trap number = 12 > panic: page fault > cpuid = 0 > KDB: stack backtrace > #0 ... at kdb_backtrace+0x5c > #1 ... at panic+0x1b4 > #2 ... at trap_fatal+0x394 > #3 ... at trap_pfault0x252 > #4 ... at trap+0x3f4 > #5 ... at calltrap+0x8 > #6 ... at fdisp_make_pid+0xc7 > #7 ... at fuse_send_forget+0x44 > #8 ... at fuse_recyc_backend+0xb2 > #9 ... at VOP_RECLAIM_APV+0x49 > #10 ... at vgonel+0x1b7 > #11 ... at vnlru_proc+0x591 > #12 ... at fork_exit+0x121 > #13 ... at fork_trampoline+0xe > > > > Any idea what could be going on? > > > Johannes > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Fri May 13 14:13:52 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 67DCF1065673 for ; Fri, 13 May 2011 14:13:52 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id 0EA6A8FC15 for ; Fri, 13 May 2011 14:13:51 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id p4DEDoDQ022145; Fri, 13 May 2011 09:13:50 -0500 (CDT) Date: Fri, 13 May 2011 09:13:50 -0500 (CDT) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Freddie Cash In-Reply-To: Message-ID: References: <1700693186.266759.1305241371736.JavaMail.root@erie.cs.uoguelph.ca> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Fri, 13 May 2011 09:13:50 -0500 (CDT) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS: How to enable cache and logs. X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 May 2011 14:13:52 -0000 On Thu, 12 May 2011, Freddie Cash wrote: >> >> Zfs would certainly appreciate 128K since that is its default block size. > > Note: the "default block size" is a max block size, not an "every > block written is this size" setting. A ZFS filesystem will use any > power-of-2 size under the block size setting for that filesystem. Except for file tail blocks, or when compression/encrpytion is used, zfs will write full blocks as is configured for the filesystem being written to (the current setting when the file was originally created). Even with compression/encrpytion enabled, the input (uncompressed) data size is the configured block size. The block needs to be read, and (possibly) decompressed, and (possibly) decrypted so that it can be checksummed, and any changes made. The checksum is based on the decoded block in order to capture as many potential error cases as possible, and so that the zfs "send" stream can use the same checksums. 
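A short sketch of the per-dataset properties involved (the dataset name is hypothetical):

# checksum and compression are per-dataset properties and apply to newly
# written blocks.
zfs get checksum,compression tank/data
zfs set compression=lzjb tank/data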
Zfs writes data in large transaction groups ("TXG") which allows it to buffer quite a lot of update data (up to 5 seconds worth) before anything is actually written. Even if the application should write 16kb at a time, zfs is likely to have buffered many times 128kb by the time the next TXG is written. If zfs goes to write a block and the user has supplied less than the block size, and the file data has not been accessed for a long time, or the system is under memory pressure so the file data is no longer cached, then zfs needs to read (which includes checksum validation, and possibly decompression and deencryption) the existing block content so that it can fill in the gaps since it always writes full blocks. The blocks are written using a Copy On Write ("COW") algorithm so that the block is written to a new block location. If the NFS client conveniently sent the data 128K at a time for sequential writes then there is a better chance that zfs will be able to avoid some heavy lifting. Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Fri May 13 21:11:06 2011 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 16D621065670 for ; Fri, 13 May 2011 21:11:06 +0000 (UTC) (envelope-from freebsd-fs@m.gmane.org) Received: from lo.gmane.org (lo.gmane.org [80.91.229.12]) by mx1.freebsd.org (Postfix) with ESMTP id 9EE518FC08 for ; Fri, 13 May 2011 21:11:05 +0000 (UTC) Received: from list by lo.gmane.org with local (Exim 4.69) (envelope-from ) id 1QKzdY-0006ZN-Kr for freebsd-fs@freebsd.org; Fri, 13 May 2011 23:11:04 +0200 Received: from ib-jtotz.ib.ic.ac.uk ([155.198.110.220]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 13 May 2011 23:11:04 +0200 Received: from jtotz by ib-jtotz.ib.ic.ac.uk with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Fri, 13 May 2011 23:11:04 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-fs@freebsd.org From: Johannes Totz Date: Fri, 13 May 2011 22:10:52 +0100 Lines: 89 Message-ID: References: <20110513131902.GA34738@tops> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Complaints-To: usenet@dough.gmane.org X-Gmane-NNTP-Posting-Host: ib-jtotz.ib.ic.ac.uk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10 In-Reply-To: <20110513131902.GA34738@tops> Subject: Re: fusefs broken on 8-stable? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 May 2011 21:11:06 -0000 On 13/05/2011 14:19, Gleb Kurtsou wrote: > On (13/05/2011 13:56), Johannes Totz wrote: >> Heya! >> >> Using encfs (built on top of fuse) gives me panics in combination with >> rsync. Dump didn't succeed. The info below is transcribbled from a >> photograph. This is repeatable. >> Without dump this is probably not very helpful.... > > As far as I know there is memory corruption. But this particular case > looks like VFS bug in fuse. 
> > I'd appreciate if you give native FreeBSD kernel level cryptographic > filesystem PEFS a try: > http://www.freebsd.org/cgi/query-pr.cgi?pr=ports/156002 -- port > http://wiki.freebsd.org/PEFS > https://github.com/glk/pefs Looks interesting... I was relying on encfs's reverse-mode though: given a plaintext directory, it provides an encrypted view on-the-fly which I was rsync'ing to other file servers. >> # uname -a >> FreeBSD XXX 8.2-STABLE FreeBSD 8.2-STABLE #1: Thu Mar 10 23:30:08 GMT >> 2011 root@XXX:/usr/obj/usr/src/sys/GENERIC amd64 >> >> >> >> First panic (top bits scrolled off screen): >> >> trap number = 12 >> panic: page fault >> cpuid = 0 >> KDB: stack backtrace >> #0 ... kbd_backtrace+0x5c >> #1 ... panic+0x1b4 >> #2 ... trap_fatal+0x394 >> #3 ... trap_pfault+0x252 >> #4 ... trap+0x3f4 >> #5 ... calltrap+0x8 >> #6 ... fdisp_make+0xe4 >> #7 ... fuse_lookup+0x1dc >> #8 ... VOP_LOOKUP_APV+0x4c >> #9 ... at lookup+0x61e >> #10 ... at namei+0x592 >> #11 ... at vn_open_cred+0x339 >> #12 ... at vn_open+0x1c >> #13 ... at kern_openat+0x152 >> #14 ... at kern_open+0x19 >> #15 ... at open+0x18 >> #16 ... at syscallenter+0x2d9 >> #17 ... at syscall+0x38 >> >> >> >> Second panic: >> >> >> code segment = base 0x0, limit 0xfffff, type 0x1b >> = DPL 0, pres 1, long 1, def32 0, gran 1 >> processor eflags = interrupt enabled, resume, IOPL = 0 >> current process = 17 (vnlru) >> trap number = 12 >> panic: page fault >> cpuid = 0 >> KDB: stack backtrace >> #0 ... at kdb_backtrace+0x5c >> #1 ... at panic+0x1b4 >> #2 ... at trap_fatal+0x394 >> #3 ... at trap_pfault0x252 >> #4 ... at trap+0x3f4 >> #5 ... at calltrap+0x8 >> #6 ... at fdisp_make_pid+0xc7 >> #7 ... at fuse_send_forget+0x44 >> #8 ... at fuse_recyc_backend+0xb2 >> #9 ... at VOP_RECLAIM_APV+0x49 >> #10 ... at vgonel+0x1b7 >> #11 ... at vnlru_proc+0x591 >> #12 ... at fork_exit+0x121 >> #13 ... at fork_trampoline+0xe >> >> >> >> Any idea what could be going on? >> >> >> Johannes From owner-freebsd-fs@FreeBSD.ORG Sat May 14 16:18:48 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AFECA1065673; Sat, 14 May 2011 16:18:48 +0000 (UTC) (envelope-from jh@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 873778FC13; Sat, 14 May 2011 16:18:48 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p4EGImXq022117; Sat, 14 May 2011 16:18:48 GMT (envelope-from jh@freefall.freebsd.org) Received: (from jh@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p4EGIlte022113; Sat, 14 May 2011 16:18:47 GMT (envelope-from jh) Date: Sat, 14 May 2011 16:18:47 GMT Message-Id: <201105141618.p4EGIlte022113@freefall.freebsd.org> To: bas@kompasmedia.nl, jh@FreeBSD.org, freebsd-fs@FreeBSD.org From: jh@FreeBSD.org Cc: Subject: Re: kern/120991: [panic] [ffs] [snapshot] System crashes when manipulating fs snapshots X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 14 May 2011 16:18:48 -0000 Synopsis: [panic] [ffs] [snapshot] System crashes when manipulating fs snapshots State-Changed-From-To: feedback->open State-Changed-By: jh State-Changed-When: Sat May 14 16:18:47 UTC 2011 State-Changed-Why: Feedback received. 
http://www.freebsd.org/cgi/query-pr.cgi?pr=120991 From owner-freebsd-fs@FreeBSD.ORG Sat May 14 16:53:03 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E432E1065784; Sat, 14 May 2011 16:53:03 +0000 (UTC) (envelope-from jh@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id BB36E8FC0A; Sat, 14 May 2011 16:53:03 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p4EGr3u6057761; Sat, 14 May 2011 16:53:03 GMT (envelope-from jh@freefall.freebsd.org) Received: (from jh@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p4EGr3iP057757; Sat, 14 May 2011 16:53:03 GMT (envelope-from jh) Date: Sat, 14 May 2011 16:53:03 GMT Message-Id: <201105141653.p4EGr3iP057757@freefall.freebsd.org> To: bas@kompasmedia.nl, jh@FreeBSD.org, freebsd-fs@FreeBSD.org From: jh@FreeBSD.org Cc: Subject: Re: kern/120991: [panic] [ffs] [snapshot] System crashes when manipulating fs snapshots X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 14 May 2011 16:53:04 -0000 Synopsis: [panic] [ffs] [snapshot] System crashes when manipulating fs snapshots State-Changed-From-To: open->feedback State-Changed-By: jh State-Changed-When: Sat May 14 16:51:46 UTC 2011 State-Changed-Why: http://www.freebsd.org/cgi/query-pr.cgi?pr=120991 From owner-freebsd-fs@FreeBSD.ORG Sat May 14 16:55:13 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id CF8391065670; Sat, 14 May 2011 16:55:13 +0000 (UTC) (envelope-from jh@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 9DC9F8FC14; Sat, 14 May 2011 16:55:13 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p4EGtDEF057957; Sat, 14 May 2011 16:55:13 GMT (envelope-from jh@freefall.freebsd.org) Received: (from jh@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p4EGtDbs057953; Sat, 14 May 2011 16:55:13 GMT (envelope-from jh) Date: Sat, 14 May 2011 16:55:13 GMT Message-Id: <201105141655.p4EGtDbs057953@freefall.freebsd.org> To: jh@FreeBSD.org, freebsd-fs@FreeBSD.org, jh@FreeBSD.org From: jh@FreeBSD.org Cc: Subject: Re: kern/120991: [panic] [ffs] [snapshot] System crashes when manipulating fs snapshots X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 14 May 2011 16:55:13 -0000 Synopsis: [panic] [ffs] [snapshot] System crashes when manipulating fs snapshots Responsible-Changed-From-To: freebsd-fs->jh Responsible-Changed-By: jh Responsible-Changed-When: Sat May 14 16:55:13 UTC 2011 Responsible-Changed-Why: Can you still reproduce this on a supported release? 
http://www.freebsd.org/cgi/query-pr.cgi?pr=120991 From owner-freebsd-fs@FreeBSD.ORG Sat May 14 17:10:55 2011 Return-Path: Delivered-To: freebsd-fs@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A5B36106564A; Sat, 14 May 2011 17:10:55 +0000 (UTC) (envelope-from jh@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 7CEEC8FC0C; Sat, 14 May 2011 17:10:55 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p4EHAtQ9072923; Sat, 14 May 2011 17:10:55 GMT (envelope-from jh@freefall.freebsd.org) Received: (from jh@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p4EHAtaD072910; Sat, 14 May 2011 17:10:55 GMT (envelope-from jh) Date: Sat, 14 May 2011 17:10:55 GMT Message-Id: <201105141710.p4EHAtaD072910@freefall.freebsd.org> To: mjacob@freebsd.org, jh@FreeBSD.org, freebsd-fs@FreeBSD.org From: jh@FreeBSD.org Cc: Subject: Re: kern/106030: [ufs] [panic] panic in ufs from geom when a dead disk is invalidated X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 14 May 2011 17:10:55 -0000 Synopsis: [ufs] [panic] panic in ufs from geom when a dead disk is invalidated State-Changed-From-To: open->feedback State-Changed-By: jh State-Changed-When: Sat May 14 17:10:55 UTC 2011 State-Changed-Why: Can you still reproduce this on a supported release? http://www.freebsd.org/cgi/query-pr.cgi?pr=106030