From owner-freebsd-fs@FreeBSD.ORG  Sun May  8 07:26:45 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 667A81065674
	for <freebsd-fs@freebsd.org>; Sun,  8 May 2011 07:26:45 +0000 (UTC)
	(envelope-from freebsd@psconsult.nl)
Received: from mx1.psconsult.nl (unknown [IPv6:2001:7b8:30f:e0::5059:ee8a])
	by mx1.freebsd.org (Postfix) with ESMTP id 1A3A28FC1A
	for <freebsd-fs@freebsd.org>; Sun,  8 May 2011 07:26:44 +0000 (UTC)
Received: from mx1.psconsult.nl (psc11.adsl.iaf.nl [80.89.238.138])
	by mx1.psconsult.nl (8.14.4/8.14.4) with ESMTP id p487Qbpk005463
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <freebsd-fs@freebsd.org>; Sun, 8 May 2011 09:26:43 +0200 (CEST)
	(envelope-from freebsd@psconsult.nl)
Received: (from paul@localhost)
	by mx1.psconsult.nl (8.14.4/8.14.4/Submit) id p487Qb1a005462
	for freebsd-fs@freebsd.org; Sun, 8 May 2011 09:26:37 +0200 (CEST)
	(envelope-from freebsd@psconsult.nl)
X-Authentication-Warning: mx1.psconsult.nl: paul set sender to
	freebsd@psconsult.nl using -f
Date: Sun, 8 May 2011 09:26:37 +0200
From: Paul Schenkeveld <freebsd@psconsult.nl>
To: freebsd-fs@freebsd.org
Message-ID: <20110508072637.GA5123@psconsult.nl>
References: <210021304745658@web53.yandex.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <210021304745658@web53.yandex.ru>
User-Agent: Mutt/1.5.21 (2010-09-15)
Subject: Re: ZFS can't mount filesystem
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 08 May 2011 07:26:45 -0000

On Sat, May 07, 2011 at 09:20:57AM +0400, Igor Zabelin wrote:
> Hi,
> 
> I have trouble with ZFS. One of set filesystems can't mount.
> zpool scrub is not doing anything
> ZFS reports an error when geting the properties.
> SMART extended offline test for each disk completed without error.
> It's possible to recover data? Mount ignoring errors?
> 
> FreeBSD 8.2-RELEASE 
> 
> ZFS reports an error when geting the properties.
> 
> # zfs get all tank/var
> 
> [skip normal output]
> internal error: unable to get version property
> internal error: unable to get utf8only property
> internal error: unable to get normalization property
> internal error: unable to get casesensitivity property
> [skip normal output]
> 
> # zpool status -v tank
>   pool: tank
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: scrub stopped after 0h0m with 0 errors on Sat May  7 08:09:35 2011
> config:
> 
>         NAME           STATE     READ WRITE CKSUM
>         tank           ONLINE       0     0    36
>           raidz1       ONLINE       0     0   144
>             gpt/disk5  ONLINE       0     0     0
>             gpt/disk6  ONLINE       0     0     0
>             gpt/disk7  ONLINE       0     0     0
> 
> errors: Permanent errors have been detected in the following files:
> 
>         tank/var:<0x0>

The 'status:' line indicates that (previous) problems with the pool left
tank/var unusable because zfs encountered checksum errors that could
not be repaired, i.e. because not enough replicas were available.  The
story doesn't tell the cause of this problem, if the problem was with
the drives, cables or controller, the evidence is in the messages file
but that is probably on the affected dataset, unless you have backups or
send syslog to a syslog server.  Disk problems are not likely the cause
because you're using raidz1 so at least two disks must have had problems
to cause this and SMART apparently did not report drive problems too.

Other causes for your problems include:
 - blocks on disk were overwritten for whatever reason
 - powerfailure + write-back cache but no battery-backup
 - problems with the mobo/processor/memory

I think your chances of recovering tank/var are very slim but when you
'zfs destroy tank/var' (using -r if there are snapshots) you probably
can save the rest of your pool.  If you have sub-datasers under tank/var
youd could first *try* to move them to another parent like
'zfs rename tank/var/log tank/rescued_var_log' but probably the damage
to tank/var will prevent that.

As always, sysadmin rule #1 applies here: backups, backups, backups.

HTH

Paul Schenkeveld

From owner-freebsd-fs@FreeBSD.ORG  Mon May  9 10:42:45 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4E4EE106564A
	for <freebsd-fs@freebsd.org>; Mon,  9 May 2011 10:42:45 +0000 (UTC)
	(envelope-from rs@bytecamp.net)
Received: from mail.bytecamp.net (mail.bytecamp.net [212.204.60.9])
	by mx1.freebsd.org (Postfix) with ESMTP id 8B6F78FC14
	for <freebsd-fs@freebsd.org>; Mon,  9 May 2011 10:42:44 +0000 (UTC)
Received: (qmail 94912 invoked by uid 89); 9 May 2011 12:42:42 +0200
Received: from stella.bytecamp.net (HELO ?212.204.60.37?)
	(rs%bytecamp.net@212.204.60.37)
	by mail.bytecamp.net with CAMELLIA256-SHA encrypted SMTP;
	9 May 2011 12:42:42 +0200
Message-ID: <4DC7C522.3070601@bytecamp.net>
Date: Mon, 09 May 2011 12:42:42 +0200
From: Robert Schulze <rs@bytecamp.net>
Organization: bytecamp GmbH
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US;
	rv:1.9.2.17) Gecko/20110424 Thunderbird/3.1.10
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
References: <4DC13260.4020905@bytecamp.net>
	<20110504115540.GA88625@icarus.home.lan>
In-Reply-To: <20110504115540.GA88625@icarus.home.lan>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: zfs/zpool upgrade required?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 09 May 2011 10:42:45 -0000

Hi,

Am 04.05.2011 13:55, schrieb Jeremy Chadwick:
>> Is it _required_ to upgrade existing pools and filesystems or can
>> that be done anytime later?
>
> - It can be done later, though by not upgrading you lose the ability to
> use newer features.

well, the features are not the reason to upgrade the kernel. One more 
question: could the pools and filesystems be upgraded while there is 
pressure on them, or shall this procedure be made "offline", i.e. 
without any clients accessing the machine?

with kind regards,
Robert Schulze

From owner-freebsd-fs@FreeBSD.ORG  Mon May  9 10:52:58 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5AA12106564A
	for <freebsd-fs@freebsd.org>; Mon,  9 May 2011 10:52:58 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta10.westchester.pa.mail.comcast.net
	(qmta10.westchester.pa.mail.comcast.net [76.96.62.17])
	by mx1.freebsd.org (Postfix) with ESMTP id 08E9E8FC13
	for <freebsd-fs@freebsd.org>; Mon,  9 May 2011 10:52:57 +0000 (UTC)
Received: from omta16.westchester.pa.mail.comcast.net ([76.96.62.88])
	by qmta10.westchester.pa.mail.comcast.net with comcast
	id hNki1g0011uE5Es5ANsysu; Mon, 09 May 2011 10:52:58 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta16.westchester.pa.mail.comcast.net with comcast
	id hNsr1g00R1t3BNj3cNsvGs; Mon, 09 May 2011 10:52:57 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id B9BA7102C19; Mon,  9 May 2011 03:52:49 -0700 (PDT)
Date: Mon, 9 May 2011 03:52:49 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Robert Schulze <rs@bytecamp.net>
Message-ID: <20110509105249.GA58361@icarus.home.lan>
References: <4DC13260.4020905@bytecamp.net>
	<20110504115540.GA88625@icarus.home.lan>
	<4DC7C522.3070601@bytecamp.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4DC7C522.3070601@bytecamp.net>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs/zpool upgrade required?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 09 May 2011 10:52:58 -0000

On Mon, May 09, 2011 at 12:42:42PM +0200, Robert Schulze wrote:
> Am 04.05.2011 13:55, schrieb Jeremy Chadwick:
> >>Is it _required_ to upgrade existing pools and filesystems or can
> >>that be done anytime later?
> >
> >- It can be done later, though by not upgrading you lose the ability to
> >use newer features.
> 
> well, the features are not the reason to upgrade the kernel. One
> more question: could the pools and filesystems be upgraded while
> there is pressure on them, or shall this procedure be made
> "offline", i.e. without any clients accessing the machine?

It can be done while I/O requests are being handled.  Depending on pool
size, amount of data, etc. the "upgrade" commands may take some time,
though.  If you're worried about customer/client impact, you'd best
schedule downtime/a maintenance window.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Mon May  9 11:07:05 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3ED941065672
	for <freebsd-fs@FreeBSD.org>; Mon,  9 May 2011 11:07:05 +0000 (UTC)
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 254F68FC22
	for <freebsd-fs@FreeBSD.org>; Mon,  9 May 2011 11:07:05 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p49B75LP070609
	for <freebsd-fs@FreeBSD.org>; Mon, 9 May 2011 11:07:05 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p49B74r6070607
	for freebsd-fs@FreeBSD.org; Mon, 9 May 2011 11:07:04 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 9 May 2011 11:07:04 GMT
Message-Id: <201105091107.p49B74r6070607@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to
	owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster <bugmaster@FreeBSD.org>
To: freebsd-fs@FreeBSD.org
Cc: 
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 09 May 2011 11:07:05 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.


S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/156797  fs         [zfs] [panic] Double panic with FreeBSD 9-CURRENT and 
o kern/156781  fs         [zfs] zfs is losing the snapshot directory,
p kern/156545  fs         [ufs] mv could break UFS on SMP systems
o kern/156193  fs         [ufs] [hang] UFS snapshot hangs && deadlocks processes
o kern/156168  fs         [nfs] [panic] Kernel panic under concurrent access ove
o kern/156039  fs         [nullfs] [unionfs] nullfs + unionfs do not compose, re
o kern/155615  fs         [zfs] zfs v28 broken on sparc64 -current
o kern/155587  fs         [zfs] [panic] kernel panic with zfs
o kern/155484  fs         [ufs] GPT + UFS boot don't work well together
o kern/155411  fs         [regression] [8.2-release] [tmpfs]: mount: tmpfs : No 
o kern/155199  fs         [ext2fs] ext3fs mounted as ext2fs gives I/O errors
o bin/155104   fs         [zfs][patch] use /dev prefix by default when importing
o kern/154930  fs         [zfs] cannot delete/unlink file from full volume -> EN
o kern/154828  fs         [msdosfs] Unable to create directories on external USB
o kern/154491  fs         [smbfs] smb_co_lock: recursive lock for object 1
o kern/154447  fs         [zfs] [panic] Occasional panics - solaris assert somew
p kern/154228  fs         [md] md getting stuck in wdrain state
o kern/153996  fs         [zfs] zfs root mount error while kernel is not located
o kern/153847  fs         [nfs] [panic] Kernel panic from incorrect m_free in nf
o kern/153753  fs         [zfs] ZFS v15 - grammatical error when attempting to u
o kern/153716  fs         [zfs] zpool scrub time remaining is incorrect
o kern/153695  fs         [patch] [zfs] Booting from zpool created on 4k-sector 
o kern/153680  fs         [xfs] 8.1 failing to mount XFS partitions
o kern/153520  fs         [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable
o kern/153418  fs         [zfs] [panic] Kernel Panic occurred writing to zfs vol
o kern/153351  fs         [zfs] locking directories/files in ZFS
o bin/153258   fs         [patch][zfs] creating ZVOLs requires `refreservation' 
s kern/153173  fs         [zfs] booting from a gzip-compressed dataset doesn't w
o kern/153126  fs         [zfs] vdev failure, zpool=peegel type=vdev.too_small
p kern/152488  fs         [tmpfs] [patch] mtime of file updated when only inode 
o kern/152022  fs         [nfs] nfs service hangs with linux client [regression]
o kern/151942  fs         [zfs] panic during ls(1) zfs snapshot directory
o kern/151905  fs         [zfs] page fault under load in /sbin/zfs
o kern/151845  fs         [smbfs] [patch] smbfs should be upgraded to support Un
o bin/151713   fs         [patch] Bug in growfs(8) with respect to 32-bit overfl
o kern/151648  fs         [zfs] disk wait bug
o kern/151629  fs         [fs] [patch] Skip empty directory entries during name 
o kern/151330  fs         [zfs] will unshare all zfs filesystem after execute a 
o kern/151326  fs         [nfs] nfs exports fail if netgroups contain duplicate 
o kern/151251  fs         [ufs] Can not create files on filesystem with heavy us
o kern/151226  fs         [zfs] can't delete zfs snapshot
o kern/151111  fs         [zfs] vnodes leakage during zfs unmount
o kern/150503  fs         [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501  fs         [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390  fs         [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336  fs         [nfs] mountd/nfsd became confused; refused to reload n
o kern/150207  fs         zpool(1): zpool import -d /dev tries to open weird dev
o kern/149208  fs         mksnap_ffs(8) hang/deadlock
o kern/149173  fs         [patch] [zfs] make OpenSolaris <sys/nvpair.h> installa
o kern/149015  fs         [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014  fs         [zfs] [patch] declarations in ZFS libraries/utilities 
o kern/149013  fs         [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504  fs         [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490  fs         [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368  fs         [zfs] ZFS hanging forever on 8.1-PRERELEASE
o bin/148296   fs         [zfs] [loader] [patch] Very slow probe in /usr/src/sys
o kern/148204  fs         [nfs] UDP NFS causes overload
o kern/148138  fs         [zfs] zfs raidz pool commands freeze
o kern/147903  fs         [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881  fs         [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147790  fs         [zfs] zfs set acl(mode|inherit) fails on existing zfs
o kern/147560  fs         [zfs] [boot] Booting 8.1-PRERELEASE raidz system take 
o kern/147420  fs         [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt 
o kern/146941  fs         [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786  fs         [zfs] zpool import hangs with checksum errors
o kern/146708  fs         [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528  fs         [zfs] Severe memory leak in ZFS on i386
o kern/146502  fs         [nfs] FreeBSD 8 NFS Client Connection to Server
s kern/145712  fs         [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411  fs         [xfs] [panic] Kernel panics shortly after mounting an 
o bin/145309   fs         bsdlabel: Editing disk label invalidates the whole dev
o kern/145272  fs         [zfs] [panic] Panic during boot when accessing zfs on 
o kern/145246  fs         [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238  fs         [zfs] [panic] kernel panic on zpool clear tank
o kern/145229  fs         [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189  fs         [nfs] nfsd performs abysmally under load
o kern/144929  fs         [ufs] [lor] vfs_bio.c + ufs_dirhash.c
p kern/144447  fs         [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416  fs         [panic] Kernel panic on online filesystem optimization
s kern/144415  fs         [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234  fs         [zfs] Cannot boot machine with recent gptzfsboot code 
o kern/143825  fs         [nfs] [panic] Kernel panic on NFS client
o bin/143572   fs         [zfs] zpool(1): [patch] The verbose output from iostat
o kern/143212  fs         [nfs] NFSv4 client strange work ...
o kern/143184  fs         [zfs] [lor] zfs/bufwait LOR
o kern/142914  fs         [zfs] ZFS performance degradation over time
o kern/142878  fs         [zfs] [vfs] lock order reversal
o kern/142597  fs         [ext2fs] ext2fs does not work on filesystems with real
o kern/142489  fs         [zfs] [lor] allproc/zfs LOR
o kern/142466  fs         Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142306  fs         [zfs] [panic] ZFS drive (from OSX Leopard) causes two 
o kern/142068  fs         [ufs] BSD labels are got deleted spontaneously
o kern/141897  fs         [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463  fs         [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305  fs         [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091  fs         [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086  fs         [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010  fs         [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888  fs         [zfs] boot fail from zfs root while the pool resilveri
o kern/140661  fs         [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640  fs         [zfs] snapshot crash
o kern/140134  fs         [msdosfs] write and fsck destroy filesystem integrity
o kern/140068  fs         [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725  fs         [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715  fs         [zfs] vfs.numvnodes leak on busy zfs
p bin/139651   fs         [nfs] mount(8): read-only remount of NFS volume does n
o kern/139597  fs         [patch] [tmpfs] tmpfs initializes va_gen but doesn't u
o kern/139564  fs         [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo
o kern/139407  fs         [smbfs] [panic] smb mount causes system crash if remot
o kern/138662  fs         [panic] ffs_blkfree: freeing free block
o kern/138421  fs         [ufs] [patch] remove UFS label limitations
o kern/138202  fs         mount_msdosfs(1) see only 2Gb
o kern/136968  fs         [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945  fs         [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944  fs         [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873  fs         [ntfs] Missing directories/files on NTFS volume
o kern/136865  fs         [nfs] [patch] NFS exports atomic and on-the-fly atomic
p kern/136470  fs         [nfs] Cannot mount / in read-only, over NFS
o kern/135546  fs         [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469  fs         [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050  fs         [zfs] ZFS clears/hides disk errors on reboot
o kern/134491  fs         [zfs] Hot spares are rather cold...
o kern/133676  fs         [smbfs] [panic] umount -f'ing a vnode-based memory dis
o kern/133174  fs         [msdosfs] [patch] msdosfs must support utf-encoded int
o kern/132960  fs         [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397  fs         reboot causes filesystem corruption (failure to sync b
o kern/132331  fs         [ufs] [lor] LOR ufs and syncer
o kern/132237  fs         [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145  fs         [panic] File System Hard Crashes
o kern/131441  fs         [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360  fs         [nfs] poor scaling behavior of the NFS server under lo
o kern/131342  fs         [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341   fs         makefs: error "Bad file descriptor"  on the mount poin
o kern/130920  fs         [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130210  fs         [nullfs] Error by check nullfs
o kern/129760  fs         [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488  fs         [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c: 
o kern/129231  fs         [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152  fs         [panic] non-userfriendly panic when trying to mount(8)
o kern/127787  fs         [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs
o bin/127270   fs         fsck_msdosfs(8) may crash if BytesPerSec is zero
o kern/127029  fs         [panic] mount(8): trying to mount a write protected zi
o kern/126287  fs         [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125895  fs         [ffs] [panic] kernel: panic: ffs_blkfree: freeing free
s kern/125738  fs         [zfs] [request] SHA256 acceleration in ZFS
o kern/123939  fs         [msdosfs] corrupts new files
o kern/122380  fs         [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o bin/122172   fs         [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121898   fs         [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121366   fs         [zfs] [patch] Automatic disk scrubbing from periodic(8
o bin/121072   fs         [smbfs] mount_smbfs(8) cannot normally convert the cha
f kern/120991  fs         [panic] [ffs] [snapshot] System crashes when manipulat
o kern/120483  fs         [ntfs] [patch] NTFS filesystem locking changes
o kern/120482  fs         [ntfs] [patch] Sync style changes between NetBSD and F
o kern/118912  fs         [2tb] disk sizing/geometry problem with large array
o kern/118713  fs         [minidump] [patch] Display media size required for a k
o bin/118249   fs         [ufs] mv(1): moving a directory changes its mtime
o kern/118107  fs         [ntfs] [panic] Kernel panic when accessing a file at N
o kern/117954  fs         [ufs] dirhash on very large directories blocks the mac
o bin/117315   fs         [smbfs] mount_smbfs(8) and related options can't mount
o kern/117314  fs         [ntfs] Long-filename only NTFS fs'es cause kernel pani
o kern/117158  fs         [zfs] zpool scrub causes panic if geli vdevs detach on
o bin/116980   fs         [msdosfs] [patch] mount_msdosfs(8) resets some flags f
o conf/116931  fs         lack of fsck_cd9660 prevents mounting iso images with 
o kern/116583  fs         [ffs] [hang] System freezes for short time when using 
f kern/116170  fs         [panic] Kernel panic when mounting /tmp
o bin/115361   fs         [zfs] mount(8) gets into a state where it won't set/un
o kern/114955  fs         [cd9660] [patch] [request] support for mask,dirmask,ui
o kern/114847  fs         [ntfs] [patch] [request] dirmask support for NTFS ala 
o kern/114676  fs         [ufs] snapshot creation panics: snapacct_ufs2: bad blo
o bin/114468   fs         [patch] [request] add -d option to umount(8) to detach
o kern/113852  fs         [smbfs] smbfs does not properly implement DFS referral
o bin/113838   fs         [patch] [request] mount(8): add support for relative p
o bin/113049   fs         [patch] [request] make quot(8) use getopt(3) and show 
o kern/112658  fs         [smbfs] [patch] smbfs and caching problems (resolves b
o kern/111843  fs         [msdosfs] Long Names of files are incorrectly created 
o kern/111782  fs         [ufs] dump(8) fails horribly for large filesystems
s bin/111146   fs         [2tb] fsck(8) fails on 6T filesystem
o kern/109024  fs         [msdosfs] [iconv] mount_msdosfs: msdosfs_iconv: Operat
o kern/109010  fs         [msdosfs] can't mv directory within fat32 file system
o bin/107829   fs         [2TB] fdisk(8): invalid boundary checking in fdisk / w
o kern/106107  fs         [ufs] left-over fsck_snapshot after unfinished backgro
o kern/106030  fs         [ufs] [panic] panic in ufs from geom when a dead disk 
o kern/104406  fs         [ufs] Processes get stuck in "ufs" state under persist
o kern/104133  fs         [ext2fs] EXT2FS module corrupts EXT2/3 filesystems
o kern/103035  fs         [ntfs] Directories in NTFS mounted disc images appear 
o kern/101324  fs         [smbfs] smbfs sometimes not case sensitive when it's s
o kern/99290   fs         [ntfs] mount_ntfs ignorant of cluster sizes
s bin/97498    fs         [request] newfs(8) has no option to clear the first 12
o kern/97377   fs         [ntfs] [patch] syntax cleanup for ntfs_ihash.c
o kern/95222   fs         [cd9660] File sections on ISO9660 level 3 CDs ignored
o kern/94849   fs         [ufs] rename on UFS filesystem is not atomic
o bin/94810    fs         fsck(8) incorrectly reports 'file system marked clean'
o kern/94769   fs         [ufs] Multiple file deletions on multi-snapshotted fil
o kern/94733   fs         [smbfs] smbfs may cause double unlock
o kern/93942   fs         [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D
o kern/92272   fs         [ffs] [hang] Filling a filesystem while creating a sna
o kern/91134   fs         [smbfs] [patch] Preserve access and modification time 
a kern/90815   fs         [smbfs] [patch] SMBFS with character conversions somet
o kern/88657   fs         [smbfs] windows client hang when browsing a samba shar
o kern/88555   fs         [panic] ffs_blkfree: freeing free frag on AMD 64
o kern/88266   fs         [smbfs] smbfs does not implement UIO_NOCOPY and sendfi
o bin/87966    fs         [patch] newfs(8): introduce -A flag for newfs to enabl
o kern/87859   fs         [smbfs] System reboot while umount smbfs.
o kern/86587   fs         [msdosfs] rm -r /PATH fails with lots of small files
o bin/85494    fs         fsck_ffs: unchecked use of cg_inosused macro etc.
o kern/80088   fs         [smbfs] Incorrect file time setting on NTFS mounted vi
o bin/74779    fs         Background-fsck checks one filesystem twice and omits 
o kern/73484   fs         [ntfs] Kernel panic when doing `ls` from the client si
o bin/73019    fs         [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino
o kern/71774   fs         [ntfs] NTFS cannot "see" files on a WinXP filesystem
o bin/70600    fs         fsck(8) throws files away when it can't grow lost+foun
o kern/68978   fs         [panic] [ufs] crashes with failing hard disk, loose po
o kern/65920   fs         [nwfs] Mounted Netware filesystem behaves strange
o kern/65901   fs         [smbfs] [patch] smbfs fails fsx write/truncate-down/tr
o kern/61503   fs         [smbfs] mount_smbfs does not work as non-root
o kern/55617   fs         [smbfs] Accessing an nsmb-mounted drive via a smb expo
o kern/51685   fs         [hang] Unbounded inode allocation causes kernel to loc
o kern/51583   fs         [nullfs] [patch] allow to work with devices and socket
o kern/36566   fs         [smbfs] System reboot with dead smb mount and umount
o kern/33464   fs         [ufs] soft update inconsistencies after system crash
o bin/27687    fs         fsck(8) wrapper is not properly passing options to fsc
o kern/18874   fs         [2TB] 32bit NFS servers export wrong negative values t

223 problems total.


From owner-freebsd-fs@FreeBSD.ORG  Tue May 10 03:48:17 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6BFF2106566C
	for <freebsd-fs@freebsd.org>; Tue, 10 May 2011 03:48:17 +0000 (UTC)
	(envelope-from licquia@linuxfoundation.org)
Received: from rimu.licquia.org (rimu.licquia.org [72.249.37.24])
	by mx1.freebsd.org (Postfix) with ESMTP id 473238FC0C
	for <freebsd-fs@freebsd.org>; Tue, 10 May 2011 03:48:17 +0000 (UTC)
Received: from server1.internal.licquia.org
	(c-98-220-117-231.hsd1.in.comcast.net [98.220.117.231])
	by rimu.licquia.org (Postfix) with ESMTPS id 0708142558
	for <freebsd-fs@freebsd.org>; Mon,  9 May 2011 22:29:57 -0500 (CDT)
Received: from server1.internal.licquia.org (localhost.localdomain [127.0.0.1])
	by server1.internal.licquia.org (Postfix) with ESMTP id 2552998066
	for <freebsd-fs@freebsd.org>; Mon,  9 May 2011 23:30:23 -0400 (EDT)
Received: from [192.168.50.14] (unknown [192.168.50.14])
	by server1.internal.licquia.org (Postfix) with ESMTP id 0900898063
	for <freebsd-fs@freebsd.org>; Mon,  9 May 2011 23:30:23 -0400 (EDT)
Message-ID: <4DC8B14E.4050400@linuxfoundation.org>
Date: Mon, 09 May 2011 23:30:22 -0400
From: Jeff Licquia <licquia@linuxfoundation.org>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US;
	rv:1.9.2.17) Gecko/20110424 Thunderbird/3.1.10
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV using ClamSMTP
Subject: Filesystem Hierarchy Standard (FHS) and FreeBSD
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 10 May 2011 03:48:17 -0000

(Sorry if this isn't the proper list for this discussion.  If not, 
please point me in the right direction.)

The Linux Foundation's LSB workgroup has taken over maintenance of the 
Filesystem Hierarchy Standard, and is working on a number of updates 
needed since its last release in 2004.

Despite all the "Linux" in the names above, we're wanting to make sure 
that the FHS remains independent of any particular UNIX implementation, 
and continues to be useful to non-Linux UNIXes.

My question to you is: do you consider the FHS to be relevant to current 
and future development of FreeBSD?  If not, is this simply due to lack 
of maintenance; would your interest in the FHS be greater with more 
consistent updates?

If you are interested, consider this an invitation to participate. 
We've set up a mailing list, Web site, etc., and are reviving the old 
bug tracker.  More details can be found here:

http://www.linuxfoundation.org/collaborate/workgroups/lsb/fhs


From owner-freebsd-fs@FreeBSD.ORG  Tue May 10 09:11:33 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A9793106566B
	for <freebsd-fs@freebsd.org>; Tue, 10 May 2011 09:11:33 +0000 (UTC)
	(envelope-from numisemis@gmail.com)
Received: from mail-ww0-f50.google.com (mail-ww0-f50.google.com [74.125.82.50])
	by mx1.freebsd.org (Postfix) with ESMTP id 39CD68FC18
	for <freebsd-fs@freebsd.org>; Tue, 10 May 2011 09:11:32 +0000 (UTC)
Received: by wwc33 with SMTP id 33so6264567wwc.31
	for <freebsd-fs@freebsd.org>; Tue, 10 May 2011 02:11:32 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:subject:mime-version:content-type:from
	:in-reply-to:date:cc:content-transfer-encoding:message-id:references
	:to:x-mailer; bh=1nMLxYaio3WOyP7PedInUJFtQEwAVVd/fJcupwoV48w=;
	b=BI5SjqBqzPvzrLo1pHc9sty76jON6kVPK9zZcp+b6TkJkOjb01Mx1gUKIJmjLroeFC
	yAHMNoYW7o6FS8k1xWNRPAbUI1OPSN7qCizTlftwZpbCnpfcOrpdOYb0rIUC+y0/15A9
	qhfnwG6RTp2ves6GdBKPmgj1nZkLcuPfTEO1c=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=subject:mime-version:content-type:from:in-reply-to:date:cc
	:content-transfer-encoding:message-id:references:to:x-mailer;
	b=IL0PskV2CqdhSe5tZyFnMwaHN282RRDD/wYpoN5E7u4WdLGg/5n1yr1zu/QKSnBoZy
	nbtPgPAYuYO4KHAQTEvQ/sCXYKzo9bMrxsRYpEqnup4sgNlX/ttwjNZybVyi2ZR8o24V
	4+MndUTB79hnqCA5JGAtLXQOKZB/R3S1rrLUU=
Received: by 10.216.239.71 with SMTP id b49mr2081624wer.107.1305017252755;
	Tue, 10 May 2011 01:47:32 -0700 (PDT)
Received: from sime-imac.logos.hr ([213.147.110.159])
	by mx.google.com with ESMTPS id y35sm130685weq.15.2011.05.10.01.47.31
	(version=TLSv1/SSLv3 cipher=OTHER);
	Tue, 10 May 2011 01:47:32 -0700 (PDT)
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: text/plain; charset=us-ascii
From: =?iso-8859-2?Q?=A9imun_Mikecin?= <numisemis@gmail.com>
In-Reply-To: <4DC8B14E.4050400@linuxfoundation.org>
Date: Tue, 10 May 2011 10:47:24 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <B3553DE8-8910-4895-994F-55A323433EE2@gmail.com>
References: <4DC8B14E.4050400@linuxfoundation.org>
To: Jeff Licquia <licquia@linuxfoundation.org>
X-Mailer: Apple Mail (2.1084)
Cc: freebsd-fs@freebsd.org
Subject: Re: Filesystem Hierarchy Standard (FHS) and FreeBSD
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 10 May 2011 09:11:33 -0000


On 10. svi. 2011., at 05:30, Jeff Licquia wrote:

> (Sorry if this isn't the proper list for this discussion.  If not, =
please point me in the right direction.)
>=20
> The Linux Foundation's LSB workgroup has taken over maintenance of the =
Filesystem Hierarchy Standard, and is working on a number of updates =
needed since its last release in 2004.
>=20
> Despite all the "Linux" in the names above, we're wanting to make sure =
that the FHS remains independent of any particular UNIX implementation, =
and continues to be useful to non-Linux UNIXes.
>=20
> My question to you is: do you consider the FHS to be relevant to =
current and future development of FreeBSD?  If not, is this simply due =
to lack of maintenance; would your interest in the FHS be greater with =
more consistent updates?
>=20
> If you are interested, consider this an invitation to participate. =
We've set up a mailing list, Web site, etc., and are reviving the old =
bug tracker.  More details can be found here:
>=20
> http://www.linuxfoundation.org/collaborate/workgroups/lsb/fhs

FHS is Linux centric instead of trying to be useful to other non-Linux =
UNIXes (like OpenGroup does).
We already have hier(7):

=
http://www.FreeBSD.org/cgi/man.cgi?query=3Dhier&apropos=3D0&sektion=3D0&ma=
npath=3DFreeBSD+8.2-RELEASE&format=3Dhtml


From owner-freebsd-fs@FreeBSD.ORG  Tue May 10 13:56:11 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 274CE1065673;
	Tue, 10 May 2011 13:56:11 +0000 (UTC)
	(envelope-from daichi@ongs.co.jp)
Received: from natial.ongs.co.jp (natial.ongs.co.jp [202.216.246.90])
	by mx1.freebsd.org (Postfix) with ESMTP id BFA268FC0C;
	Tue, 10 May 2011 13:56:10 +0000 (UTC)
Received: from [192.168.15.190] (unknown [24.114.252.244])
	by natial.ongs.co.jp (Postfix) with ESMTPSA id 4CC9412543B;
	Tue, 10 May 2011 22:39:29 +0900 (JST)
From: Daichi GOTO <daichi@ongs.co.jp>
Date: Tue, 10 May 2011 09:39:25 -0400
Message-Id: <39BCA797-BCE2-4A2A-AA7F-AD8A87014CD4@ongs.co.jp>
To: freebsd-current@freebsd.org,
 freebsd-fs@freebsd.org
Mime-Version: 1.0 (Apple Message framework v1084)
X-Mailer: Apple Mail (2.1084)
Content-Type: text/plain;
	charset=us-ascii
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: 
Subject: [Call for Test] unionfs intermediate umount feature
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 10 May 2011 13:56:11 -0000

Hi unionfs users ;)

We have developed new unionfs feature, "intermediate umount".
You can do like this:

  # mount_unionfs /test2 /test1
  # mount_unionfs /test3 /test1
  # df
  <above>:/test2  xxxxx xxxxx xxxxx  xx%  /test1
  <above>:/test3  xxxxx xxxxx xxxxx  xx%  /test1
  # umount '<above>:/test2'
  # df
  <above>:/test3  xxxxx xxxxx xxxxx  xx%  /test1
  #

patch for current:
    =
http://people.freebsd.org/~daichi/unionfs/experiments/unionfs-intermediate=
-umount.diff

First, I want to know your opinion. Thanks :)

-----
Daichi GOTO=

From owner-freebsd-fs@FreeBSD.ORG  Tue May 10 14:55:55 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 32FE11065672
	for <freebsd-fs@freebsd.org>; Tue, 10 May 2011 14:55:55 +0000 (UTC)
	(envelope-from feld@feld.me)
Received: from mail-iy0-f182.google.com (mail-iy0-f182.google.com
	[209.85.210.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 0462F8FC16
	for <freebsd-fs@freebsd.org>; Tue, 10 May 2011 14:55:54 +0000 (UTC)
Received: by iyj12 with SMTP id 12so7517323iyj.13
	for <freebsd-fs@freebsd.org>; Tue, 10 May 2011 07:55:54 -0700 (PDT)
Received: by 10.42.29.195 with SMTP id s3mr3636476icc.30.1305037886305;
	Tue, 10 May 2011 07:31:26 -0700 (PDT)
Received: from tech304 (supranet-tech.secure-on.net [66.170.8.18])
	by mx.google.com with ESMTPS id a1sm2867199ics.4.2011.05.10.07.31.24
	(version=TLSv1/SSLv3 cipher=OTHER);
	Tue, 10 May 2011 07:31:25 -0700 (PDT)
Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes
To: freebsd-fs@freebsd.org
References: <4DC8B14E.4050400@linuxfoundation.org>
	<B3553DE8-8910-4895-994F-55A323433EE2@gmail.com>
Date: Tue, 10 May 2011 09:31:23 -0500
MIME-Version: 1.0
Content-Transfer-Encoding: Quoted-Printable
From: "Mark Felder" <feld@feld.me>
Message-ID: <op.vu9rylj434t2sn@tech304>
In-Reply-To: <B3553DE8-8910-4895-994F-55A323433EE2@gmail.com>
User-Agent: Opera Mail/11.50 (FreeBSD)
Subject: Re: Filesystem Hierarchy Standard (FHS) and FreeBSD
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 10 May 2011 14:55:55 -0000

On Tue, 10 May 2011 03:47:24 -0500, =C5=A0imun Mikecin <numisemis@gmail.=
com>  =

wrote:

> FHS is Linux centric instead of trying to be useful to other non-Linux=
  =

> UNIXes (like OpenGroup does).
> We already have hier(7):

Jeff,

This might seem rather blunt/rude but honestly I don't foresee the FreeB=
SD  =

project integrating with the FHS. The hier(7) layout has been a lot of  =

work and it's something the project really strives to standardize on. As=
 a  =

FreeBSD user, we've been used to "porting" applications to not only run =
on  =

FreeBSD but also adhere properly to the hier(7) structure. But if the FH=
S  =

wants to adopt hier(7) we won't complain....


Regards,


Mark

From owner-freebsd-fs@FreeBSD.ORG  Tue May 10 15:44:56 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B7B50106564A
	for <freebsd-fs@freebsd.org>; Tue, 10 May 2011 15:44:56 +0000 (UTC)
	(envelope-from josef.karthauser@unitedlane.com)
Received: from k2smtpout03-01.prod.mesa1.secureserver.net
	(k2smtpout03-01.prod.mesa1.secureserver.net [64.202.189.171])
	by mx1.freebsd.org (Postfix) with SMTP id 89A058FC1E
	for <freebsd-fs@freebsd.org>; Tue, 10 May 2011 15:44:56 +0000 (UTC)
Received: (qmail 29426 invoked from network); 10 May 2011 15:18:15 -0000
Received: from unknown (HELO ip-72.167.34.38.ip.secureserver.net)
	(72.167.34.38)
	by k2smtpout03-01.prod.mesa1.secureserver.net (64.202.189.171) with
	ESMTP; 10 May 2011 15:18:14 -0000
Received: (qmail 15827 invoked from network); 10 May 2011 11:16:50 -0400
Received: from p4.dhcp.tao.org.uk (90.155.77.83)
	by o3dh.com with (AES128-SHA encrypted) SMTP; 10 May 2011 11:16:50 -0400
Mime-Version: 1.0 (Apple Message framework v1082)
Content-Type: text/plain; charset=us-ascii
From: Dr Josef Karthauser <josef.karthauser@unitedlane.com>
In-Reply-To: <20110328105726.1928377ryc8ppkis@webmail.leidinger.net>
Date: Tue, 10 May 2011 16:19:35 +0100
Content-Transfer-Encoding: quoted-printable
Message-Id: <7D77C2D3-9CBC-4F14-A938-5EA0241043B2@unitedlane.com>
References: <9CF23177-92D6-40C5-8C68-B7E2F88236E6@unitedlane.com>
	<20110326225430.00006a76@unknown>
	<3BBB1E36-8E09-4D07-B49E-ACA8548B0B44@unitedlane.com>
	<20110327075814.GA71131@icarus.home.lan>
	<E70F2E76-5253-4DB9-B05B-AEF3C6F4237E@unitedlane.com>
	<20110327084355.GA71864@icarus.home.lan>
	<094E71D9-B28B-46DB-8EA9-B11F17D5A32A@unitedlane.com>
	<20110327094121.GA72701@icarus.home.lan>
	<980F394D-36FC-42F2-9F3F-A3C44A385600@unitedlane.com>
	<20110328105726.1928377ryc8ppkis@webmail.leidinger.net>
To: Alexander Leidinger <Alexander@Leidinger.net>
X-Mailer: Apple Mail (2.1082)
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS Problem - full disk, can't recover space :(.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 10 May 2011 15:44:56 -0000


On 28 Mar 2011, at 09:57, Alexander Leidinger wrote:

> Quoting Dr Josef Karthauser <josef.karthauser@unitedlane.com> (from =
Sun, 27 Mar 2011 11:01:04 +0100):
>=20
>> I'd really like my disk space back though please! I suspect that I'm =
going to have to wait for 28 to have that happen though :(.
>=20
> As an intermediate action you could export the pool, boot a 9-current =
live-image, import the pool there and export it again. I do not know if =
you need to do a scrub or not to recover the free space or not. AFAIK =
you do not need to update to v28, the new code should take care about =
the issue without an update.
>=20
> This will not prevent loosing space again, but at least it should give =
back the lost space for the moment.
>=20

Ok, I've finally got around to do this. It hasn't recovered the space, =
but it at least is billing it to the right file system now:

	infinity# zfs list void/j/legacy-alpha=20
	NAME                  USED  AVAIL  REFER  MOUNTPOINT
	void/j/legacy-alpha  58.9G  4.11G  56.9G  /j/legacy-alpha

	# du -hs /j/legacy-alpha
	 34G	/j/legacy-alpha

Hmm. It's a pain that I've currently only got 24gb free on this pool, =
otherwise I could do a copy and destroy to try and free it up.

Joe


From owner-freebsd-fs@FreeBSD.ORG  Tue May 10 16:28:36 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 22DEC106566C
	for <freebsd-fs@freebsd.org>; Tue, 10 May 2011 16:28:36 +0000 (UTC)
	(envelope-from licquia@linuxfoundation.org)
Received: from rimu.licquia.org (rimu.licquia.org [72.249.37.24])
	by mx1.freebsd.org (Postfix) with ESMTP id EF7C08FC08
	for <freebsd-fs@freebsd.org>; Tue, 10 May 2011 16:28:34 +0000 (UTC)
Received: from server1.internal.licquia.org
	(c-98-220-117-231.hsd1.in.comcast.net [98.220.117.231])
	by rimu.licquia.org (Postfix) with ESMTPS id 6C24740334
	for <freebsd-fs@freebsd.org>; Tue, 10 May 2011 11:28:07 -0500 (CDT)
Received: from server1.internal.licquia.org (localhost.localdomain [127.0.0.1])
	by server1.internal.licquia.org (Postfix) with ESMTP id BFEE198066
	for <freebsd-fs@freebsd.org>; Tue, 10 May 2011 12:28:33 -0400 (EDT)
Received: from [192.168.50.14] (unknown [192.168.50.14])
	by server1.internal.licquia.org (Postfix) with ESMTP id A784098063
	for <freebsd-fs@freebsd.org>; Tue, 10 May 2011 12:28:33 -0400 (EDT)
Message-ID: <4DC967B1.70102@linuxfoundation.org>
Date: Tue, 10 May 2011 12:28:33 -0400
From: Jeff Licquia <licquia@linuxfoundation.org>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US;
	rv:1.9.2.17) Gecko/20110424 Thunderbird/3.1.10
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
References: <4DC8B14E.4050400@linuxfoundation.org>	<B3553DE8-8910-4895-994F-55A323433EE2@gmail.com>
	<op.vu9rylj434t2sn@tech304>
In-Reply-To: <op.vu9rylj434t2sn@tech304>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: ClamAV using ClamSMTP
Subject: Re: Filesystem Hierarchy Standard (FHS) and FreeBSD
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 10 May 2011 16:28:36 -0000

On 05/10/2011 10:31 AM, Mark Felder wrote:
> This might seem rather blunt/rude but honestly I don't foresee the
> FreeBSD project integrating with the FHS. The hier(7) layout has been a
> lot of work and it's something the project really strives to standardize
> on. As a FreeBSD user, we've been used to "porting" applications to not
> only run on FreeBSD but also adhere properly to the hier(7) structure.
> But if the FHS wants to adopt hier(7) we won't complain....

Not at all; that's precisely the kind of feedback I'm looking for.

From owner-freebsd-fs@FreeBSD.ORG  Tue May 10 19:30:31 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 50BC01065676;
	Tue, 10 May 2011 19:30:31 +0000 (UTC)
	(envelope-from linimon@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 27C808FC22;
	Tue, 10 May 2011 19:30:31 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p4AJUVuT080424;
	Tue, 10 May 2011 19:30:31 GMT
	(envelope-from linimon@freefall.freebsd.org)
Received: (from linimon@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p4AJUUhg080412;
	Tue, 10 May 2011 19:30:31 GMT (envelope-from linimon)
Date: Tue, 10 May 2011 19:30:31 GMT
Message-Id: <201105101930.p4AJUUhg080412@freefall.freebsd.org>
To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-fs@FreeBSD.org
From: linimon@FreeBSD.org
Cc: 
Subject: Re: kern/156933: [zfs] ZFS receive after read on readonly=on
	filesystem is corrupted without warning
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 10 May 2011 19:30:31 -0000

Old Synopsis: ZFS receive after read on readonly=on filesystem is corrupted without warning
New Synopsis: [zfs] ZFS receive after read on readonly=on filesystem is corrupted without warning

Responsible-Changed-From-To: freebsd-bugs->freebsd-fs
Responsible-Changed-By: linimon
Responsible-Changed-When: Tue May 10 19:30:17 UTC 2011
Responsible-Changed-Why: 
Over to maintainer(s).

http://www.freebsd.org/cgi/query-pr.cgi?pr=156933

From owner-freebsd-fs@FreeBSD.ORG  Tue May 10 20:17:32 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 200CF1065673
	for <freebsd-fs@freebsd.org>; Tue, 10 May 2011 20:17:32 +0000 (UTC)
	(envelope-from gpalmer@freebsd.org)
Received: from noop.in-addr.com (mail.in-addr.com [IPv6:2001:470:8:162::1])
	by mx1.freebsd.org (Postfix) with ESMTP id 924358FC0A
	for <freebsd-fs@freebsd.org>; Tue, 10 May 2011 20:17:30 +0000 (UTC)
Received: from gjp by noop.in-addr.com with local (Exim 4.74 (FreeBSD))
	(envelope-from <gpalmer@freebsd.org>)
	id 1QJtMr-0002VQ-Jz; Tue, 10 May 2011 16:17:17 -0400
Date: Tue, 10 May 2011 16:17:17 -0400
From: Gary Palmer <gpalmer@freebsd.org>
To: Jeff Licquia <licquia@linuxfoundation.org>
Message-ID: <20110510201717.GA37035@in-addr.com>
References: <4DC8B14E.4050400@linuxfoundation.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4DC8B14E.4050400@linuxfoundation.org>
X-SA-Exim-Connect-IP: <locally generated>
X-SA-Exim-Mail-From: gpalmer@freebsd.org
X-SA-Exim-Scanned: No (on noop.in-addr.com); SAEximRunCond expanded to false
Cc: freebsd-fs@freebsd.org
Subject: Re: Filesystem Hierarchy Standard (FHS) and FreeBSD
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 10 May 2011 20:17:32 -0000

On Mon, May 09, 2011 at 11:30:22PM -0400, Jeff Licquia wrote:
> (Sorry if this isn't the proper list for this discussion.  If not, 
> please point me in the right direction.)

You may wish to query freebsd-arch@freebsd.org - freebsd-fs is more
aimed at filesystem implementation rather than how the directory
hierarchy is organized on top of the filesystem.  Moving FreeBSD to
a Linux Foundation FHS standard is something that strikes me as being
more an architectural discussion, and perhaps a CC to
freebsd-standards@freebsd.org.

However, I think the answers referring you to hier(7) is certainly
a starting point.  A glance at the FHS standard seems to
also place requirements on which files and/or programs are
in certain locations and also require certain config files
are called certain things, which goes beyond hier(7).

Regards,

Gary


From owner-freebsd-fs@FreeBSD.ORG  Wed May 11 09:48:47 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5B301106564A
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 09:48:47 +0000 (UTC)
	(envelope-from fbsd@dannysplace.net)
Received: from mailgw.dannysplace.net (mailgw.dannysplace.net [204.109.56.184])
	by mx1.freebsd.org (Postfix) with ESMTP id E1E8F8FC15
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 09:48:46 +0000 (UTC)
Received: from [203.206.171.212] (helo=[192.168.10.10])
	by mailgw.dannysplace.net with esmtpsa (TLSv1:CAMELLIA256-SHA:256)
	(Exim 4.72 (FreeBSD)) (envelope-from <fbsd@dannysplace.net>)
	id 1QK5ju-0006WU-78
	for freebsd-fs@freebsd.org; Wed, 11 May 2011 19:29:55 +1000
Message-ID: <4DCA5620.1030203@dannysplace.net>
Date: Wed, 11 May 2011 19:25:52 +1000
From: Danny Carroll <fbsd@dannysplace.net>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB;
	rv:1.9.2.17) Gecko/20110414 Lightning/1.0b2 Thunderbird/3.1.10
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Authenticated-User: danny
X-Authenticator: plain
X-Exim-Version: 4.72 (build at 12-Jul-2010 18:31:29)
X-Date: 2011-05-11 19:29:54
X-Connected-IP: 203.206.171.212:58231
X-Message-Linecount: 35
X-Body-Linecount: 24
X-Message-Size: 1393
X-Body-Size: 944
X-Received-Count: 1
X-Recipient-Count: 1
X-Local-Recipient-Count: 1
X-Local-Recipient-Defer-Count: 0
X-Local-Recipient-Fail-Count: 0
X-SA-Exim-Connect-IP: 203.206.171.212
X-SA-Exim-Mail-From: fbsd@dannysplace.net
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	damka.dannysplace.net
X-Spam-Level: 
X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00
	autolearn=ham version=3.3.1
X-SA-Exim-Version: 4.2
X-SA-Exim-Scanned: Yes (on mailgw.dannysplace.net)
Subject: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: fbsd@dannysplace.net
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 11 May 2011 09:48:47 -0000

Hello all.

I've been using ZFS for some time now and have never had an issued
(except perhaps the issue of speed...)
When v28 is taken into -STABLE I will most likely upgrade to v28 at that
point.   Currently I am running v15 with v4 on disk.

When I move to v28 I will probably wish to enable a L2Arc and also
perhaps dedicated log devices.

I'm curious about a few things however.

1. Can I remove either the L2 ARC or the log devices if things don't go
as planned or if I need to free up some resources?
2. What are the best practices for setting up these?   Would a geom
mirror for the log device be the way to go.  Or can you just let ZFS
mirror the log itself?
3. What happens when one or both of the log devices fail.   Does ZFS
come to a crashing halt and kill all the data?   Or does it simply
complain that the ZIL is no longer active and continue on it's merry way?

In short, what is the best way to set up these two features?

-D

From owner-freebsd-fs@FreeBSD.ORG  Wed May 11 10:06:58 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7E905106564A
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 10:06:58 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta04.emeryville.ca.mail.comcast.net
	(qmta04.emeryville.ca.mail.comcast.net [76.96.30.40])
	by mx1.freebsd.org (Postfix) with ESMTP id 673DB8FC0A
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 10:06:58 +0000 (UTC)
Received: from omta06.emeryville.ca.mail.comcast.net ([76.96.30.51])
	by qmta04.emeryville.ca.mail.comcast.net with comcast
	id iA0Q1g00216AWCUA4A6yVt; Wed, 11 May 2011 10:06:58 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta06.emeryville.ca.mail.comcast.net with comcast
	id iA6w1g0041t3BNj8SA6wjS; Wed, 11 May 2011 10:06:57 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 0B189102C36; Wed, 11 May 2011 03:06:56 -0700 (PDT)
Date: Wed, 11 May 2011 03:06:56 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Danny Carroll <fbsd@dannysplace.net>
Message-ID: <20110511100655.GA35129@icarus.home.lan>
References: <4DCA5620.1030203@dannysplace.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4DCA5620.1030203@dannysplace.net>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 11 May 2011 10:06:58 -0000

On Wed, May 11, 2011 at 07:25:52PM +1000, Danny Carroll wrote:
> I've been using ZFS for some time now and have never had an issued
> (except perhaps the issue of speed...)
> When v28 is taken into -STABLE I will most likely upgrade to v28 at that
> point.   Currently I am running v15 with v4 on disk.
> 
> When I move to v28 I will probably wish to enable a L2Arc and also
> perhaps dedicated log devices.
>
> I'm curious about a few things however.
> 
> 1. Can I remove either the L2 ARC or the log devices if things don't go
> as planned or if I need to free up some resources?

You can remove L2ARC ("cache") devices without impact, but you cannot
remove all log devices without the pool needing to be destroyed
(recreated).  Please keep reading for details of log devices.

L2ARC devices should primarily be something with extremely fast read
rates (e.g. SSDs).  USB1.x and 2.x memory sticks do not work well for
this purpose given protocol and bus speed limits + overhead.  (I only
mention them because people often think "Oh, USB flash would work great
for this!"  I disagree.)

Furthermore, something I found out on my own: the L2ARC is completely
lost in the case the system is cleanly rebooted.  This sometimes
surprises people (myself included) since L2ARC uses actual storage
devices; one might think the data is "restored" on reboot, but it isn't
(because the ARC ("layer 1") itself is lost on reboot, obviously).

The only way to see how much disk space a cache device is using -- to my
knowledge -- is via "zpool iostat -v".

> 2. What are the best practices for setting up these?   Would a geom
> mirror for the log device be the way to go.  Or can you just let ZFS
> mirror the log itself?

Let ZFS handle it.  There is no purpose (in my opinion) to added
complexity when ZFS can handle it itself.  The KISS concept applies
greatly here.

In the case of ZFS intent logs, you definitely want a mirror.  If you
have a single log device, loss of that device can/will result in full
data loss of the pool which makes use of the log device.

Furthermore, a log device is limited to a single pool; e.g. you cannot
use the same log device (e.g. ada6) on pool "foo" and pool "bar".  It's
one or the other.

You should read **all** of the data points listed below and pay close
attention to the details:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Separate_Log_Devices

> 3. What happens when one or both of the log devices fail.   Does ZFS
> come to a crashing halt and kill all the data?   Or does it simply
> complain that the ZIL is no longer active and continue on it's merry way?

See above.

> In short, what is the best way to set up these two features?

See the zpool(1) man page for details on how to make use of log devices.
Examples are provided, including mirroring of such devices.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Wed May 11 10:37:14 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A016E1065672
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 10:37:14 +0000 (UTC)
	(envelope-from daniel@digsys.bg)
Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230])
	by mx1.freebsd.org (Postfix) with ESMTP id 29FD48FC14
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 10:37:13 +0000 (UTC)
Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5])
	(authenticated bits=0)
	by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p4BAb3P6048487
	(version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO)
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 13:37:08 +0300 (EEST)
	(envelope-from daniel@digsys.bg)
Message-ID: <4DCA66CF.7070608@digsys.bg>
Date: Wed, 11 May 2011 13:37:03 +0300
From: Daniel Kalchev <daniel@digsys.bg>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.15) Gecko/20110307 Thunderbird/3.1.9
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
References: <4DCA5620.1030203@dannysplace.net>
	<20110511100655.GA35129@icarus.home.lan>
In-Reply-To: <20110511100655.GA35129@icarus.home.lan>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 11 May 2011 10:37:14 -0000


On 11.05.11 13:06, Jeremy Chadwick wrote:
> On Wed, May 11, 2011 at 07:25:52PM +1000, Danny Carroll wrote:
>> When I move to v28 I will probably wish to enable a L2Arc and also
>> perhaps dedicated log devices.
>>
> In the case of ZFS intent logs, you definitely want a mirror.  If you
> have a single log device, loss of that device can/will result in full
> data loss of the pool which makes use of the log device.

This is true for v15 pools, not true for v28 pools. In ZFS v28 you can 
remove log devices and in the case of sudden loss of log device (or 
whatever) roll back the pool to a 'good' state. Therefore, for most 
installations single log device might be sufficient. If you value your 
data, you will of course use mirrored log devices, possibly in hot-swap 
configuration and .. have a backup :)

By the way, the SLOG (separate LOG) does not have to be SSD at all. 
Separate rotating disk(s) will also suffice -- it all depends on the 
type of workload. SSDs are better, for the higher end, because of the 
low latency (but not all SSDs are low latency when writing!).

The idea of the SLOG is to separate the ZIL records from the main data 
pool. ZIL records are small, even smaller in v28, but will cause 
unnecessary head movements if kept in the main pool. The SLOG is "write 
once, read on failure" media and is written sequentially. Almost all 
current HDDs offer reasonable sequential write performance for small to 
medium pools.

The L2ARC needs to be fast reading SSD. It is populated slowly, few 
MB/sec so there is no point to have fast and high-bandwidth 
write-optimized SSD. The benefit from L2ARC is the low latency. Sort of 
slower RAM.

It is bad idea to use the same SSD for both SLOG and L2ARC, because most 
SSDs behave poorly if you present them with high read and high write 
loads. More expensive units might behave, but then... if you pay few k$ 
for a SSD, you know what you need :)

Daniel

From owner-freebsd-fs@FreeBSD.ORG  Wed May 11 10:51:20 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id D9B1A106564A
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 10:51:20 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta10.westchester.pa.mail.comcast.net
	(qmta10.westchester.pa.mail.comcast.net [76.96.62.17])
	by mx1.freebsd.org (Postfix) with ESMTP id 85A1C8FC0A
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 10:51:20 +0000 (UTC)
Received: from omta17.westchester.pa.mail.comcast.net ([76.96.62.89])
	by qmta10.westchester.pa.mail.comcast.net with comcast
	id i9fk1g0031vXlb85AArLwh; Wed, 11 May 2011 10:51:20 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta17.westchester.pa.mail.comcast.net with comcast
	id iArJ1g0251t3BNj3dArK5e; Wed, 11 May 2011 10:51:20 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 6E596102C36; Wed, 11 May 2011 03:51:17 -0700 (PDT)
Date: Wed, 11 May 2011 03:51:17 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Daniel Kalchev <daniel@digsys.bg>
Message-ID: <20110511105117.GA36571@icarus.home.lan>
References: <4DCA5620.1030203@dannysplace.net>
	<20110511100655.GA35129@icarus.home.lan>
	<4DCA66CF.7070608@digsys.bg>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4DCA66CF.7070608@digsys.bg>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 11 May 2011 10:51:21 -0000

On Wed, May 11, 2011 at 01:37:03PM +0300, Daniel Kalchev wrote:
> On 11.05.11 13:06, Jeremy Chadwick wrote:
> >On Wed, May 11, 2011 at 07:25:52PM +1000, Danny Carroll wrote:
> >>When I move to v28 I will probably wish to enable a L2Arc and also
> >>perhaps dedicated log devices.
> >>
> >In the case of ZFS intent logs, you definitely want a mirror.  If you
> >have a single log device, loss of that device can/will result in full
> >data loss of the pool which makes use of the log device.
> 
> This is true for v15 pools, not true for v28 pools. In ZFS v28 you
> can remove log devices and in the case of sudden loss of log device
> (or whatever) roll back the pool to a 'good' state. Therefore, for
> most installations single log device might be sufficient. If you
> value your data, you will of course use mirrored log devices,
> possibly in hot-swap configuration and .. have a backup :)

Has anyone actually *tested* this on FreeBSD?  Set up a single log
device on classic (non-CAM/non-ahci.ko) ATA, then literally yank the
disk out to induce a very bad/rude failure?  Does the kernel panic or
anything weird happen?

I fully acknowledge that in ZFS pool v19 and higher the issue is fixed
(at least on Solaris/OpenSolaris), but at this point in time the RELEASE
and STABLE branches are running pool version 15.

There are numerous ongoing discussions about the ZFS v28 patches right
now with regards to STABLE specifically.  Recent threads:

- Patch did not apply correctly (errors/rejections)
- Patch applied correctly but build failed (use "patch -E" I believe?)
- Discussion about when v28 is *truly* coming to RELENG_8 and if it's
  truly ready for RELENG_8

And finally, there's the one thing that people often forget/miss: if you
upgrade your pool from v15 to v28 (needed to address the log removal
stuff you mention), you cannot roll back without recreating all of your
pools.  Folks considering v28 need to take that into consideration.

> By the way, the SLOG (separate LOG) does not have to be SSD at all.
> Separate rotating disk(s) will also suffice -- it all depends on the
> type of workload. SSDs are better, for the higher end, because of
> the low latency (but not all SSDs are low latency when writing!).

I didn't state log devices should be SSDs.  I stated cache devices
(L2ARC) should be SSDs.  :-)  A non-high-end SSD for a log device is
probably a very bad idea given the sub-par write speeds, agreed.  A
FusionIO card/setup on the other hand would probably work wonderfully,
but that's much more expensive (you cover that below).

> The idea of the SLOG is to separate the ZIL records from the main
> data pool. ZIL records are small, even smaller in v28, but will
> cause unnecessary head movements if kept in the main pool. The SLOG
> is "write once, read on failure" media and is written sequentially.
> Almost all current HDDs offer reasonable sequential write
> performance for small to medium pools.
> 
> The L2ARC needs to be fast reading SSD. It is populated slowly, few
> MB/sec so there is no point to have fast and high-bandwidth
> write-optimized SSD. The benefit from L2ARC is the low latency. Sort
> of slower RAM.

Agreed, and the overall point to L2ARC is to help with improved random
reads, if I remember right.  The concept is that it's a 2nd layer
of caching that shouldn't hurt or hinder performance when used/put in
place, but can greatly help when the "layer 1" ARC lacks an entry.

> It is bad idea to use the same SSD for both SLOG and L2ARC, because
> most SSDs behave poorly if you present them with high read and high
> write loads. More expensive units might behave, but then... if you
> pay few k$ for a SSD, you know what you need :)

Again, agreed.

Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks
should also keep that in mind when putting an SSD into use in this
fashion.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Wed May 11 11:17:53 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4C5FB1065675
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 11:17:53 +0000 (UTC)
	(envelope-from daniel@digsys.bg)
Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230])
	by mx1.freebsd.org (Postfix) with ESMTP id C83CD8FC1F
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 11:17:52 +0000 (UTC)
Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5])
	(authenticated bits=0)
	by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p4BBHgIC048656
	(version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO);
	Wed, 11 May 2011 14:17:47 +0300 (EEST)
	(envelope-from daniel@digsys.bg)
Message-ID: <4DCA7056.20200@digsys.bg>
Date: Wed, 11 May 2011 14:17:42 +0300
From: Daniel Kalchev <daniel@digsys.bg>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.15) Gecko/20110307 Thunderbird/3.1.9
MIME-Version: 1.0
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
References: <4DCA5620.1030203@dannysplace.net>
	<20110511100655.GA35129@icarus.home.lan>
	<4DCA66CF.7070608@digsys.bg>
	<20110511105117.GA36571@icarus.home.lan>
In-Reply-To: <20110511105117.GA36571@icarus.home.lan>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 11 May 2011 11:17:53 -0000


On 11.05.11 13:51, Jeremy Chadwick wrote:
>
> Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks
> should also keep that in mind when putting an SSD into use in this
> fashion.
>
By the way, what would be the use of TRIM for SLOG and L2ARC devices?
I see absolutely no benefit from TRIM for the L2ARC, because it is 
written slowly (on purpose). Any current, or 1-2 generations back SSD 
would handle that write load without TRIM and without any performance 
degradation.

Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC SSD 
for the SLOG, for many reasons. The write regions on the SLC NAND should 
be smaller (my wild guess, current practice may differ) and the need for 
rewriting will be small. If you don't need to rewrite already written 
data, TRIM does not help. Also, as far as I understand, most "serious" 
SSDs (typical for SLC I guess) would have twice or more the advertised 
size and always write to fresh cells, scheduling an background erase of 
the 'overwritten' cell.

Does Solaris have TRIM for ZFS? Where? How does it help? I can imagine 
TRIM for the data pool, that would be good fit for ZFS, but SSD-only 
pool.. are we already there?

Daniel

From owner-freebsd-fs@FreeBSD.ORG  Wed May 11 12:08:34 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0CF36106566C
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 12:08:34 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta13.westchester.pa.mail.comcast.net
	(qmta13.westchester.pa.mail.comcast.net [76.96.59.243])
	by mx1.freebsd.org (Postfix) with ESMTP id AA4F28FC0A
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 12:08:33 +0000 (UTC)
Received: from omta17.westchester.pa.mail.comcast.net ([76.96.62.89])
	by qmta13.westchester.pa.mail.comcast.net with comcast
	id iAtb1g0051vXlb85DC8ZhW; Wed, 11 May 2011 12:08:33 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta17.westchester.pa.mail.comcast.net with comcast
	id iC8X1g01X1t3BNj3dC8YDG; Wed, 11 May 2011 12:08:33 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 7BB6B102C36; Wed, 11 May 2011 05:08:30 -0700 (PDT)
Date: Wed, 11 May 2011 05:08:30 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Daniel Kalchev <daniel@digsys.bg>
Message-ID: <20110511120830.GA37515@icarus.home.lan>
References: <4DCA5620.1030203@dannysplace.net>
	<20110511100655.GA35129@icarus.home.lan>
	<4DCA66CF.7070608@digsys.bg>
	<20110511105117.GA36571@icarus.home.lan> <4DCA7056.20200@digsys.bg>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4DCA7056.20200@digsys.bg>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 11 May 2011 12:08:34 -0000

On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote:
> On 11.05.11 13:51, Jeremy Chadwick wrote:
> >Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks
> >should also keep that in mind when putting an SSD into use in this
> >fashion.
>
> By the way, what would be the use of TRIM for SLOG and L2ARC devices?
> I see absolutely no benefit from TRIM for the L2ARC, because it is
> written slowly (on purpose).  Any current, or 1-2 generations back SSD
> would handle that write load without TRIM and without any performance
> degradation.
>
> Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC
> SSD for the SLOG, for many reasons. The write regions on the SLC
> NAND should be smaller (my wild guess, current practice may differ)
> and the need for rewriting will be small. If you don't need to
> rewrite already written data, TRIM does not help. Also, as far as I
> understand, most "serious" SSDs (typical for SLC I guess) would have
> twice or more the advertised size and always write to fresh cells,
> scheduling an background erase of the 'overwritten' cell.

AFAIK, drive manufacturers do not disclose just how much reallocation
space they keep available on an SSD.  I'd rather not speculate as to how
much, as I'm certain it varies per vendor.

I can talk a bit about SSD drive performance from a consumer level (that
is to say: low-end consumer Intel SSDs such as the X25-V, X25-M, and
latest 320 and 510 series -- I use them all over the place), both before
and after TRIM operations.  I don't use any of these SSDs on ZFS
however, only UFS (and I have seen the results both before and after
TRIM support was added to UFS; and yeah, all the drives are running the
latest firmware).

What's confusing to me is why someone would say TRIM doesn't really
matter in the case of an intent log device or a cache device; these
devices both implement some degree of write operations, correct?  The
drive has to erase the NAND flash block (well, page really) before the
block can be re-used (written to once again), so by not doing TRIM
effectively you're relying 100% on drives' garbage collection
mechanisms, which isn't that great (at least WRT the above drives).
There are some sites that go over Intel SSD performance out-of-the-box
as well as once its been used for a bit, and the performance difference
is pretty substantial (50% drop in performance for reads, ~60-70% drop
in performance for writes).  Something to keep in mind.

Furthermore, most people aren't buying SLC given the cost.  Right now
the absolute #1 or #2 focus of any operation is to save money; one
cannot argue with the current economic condition.  I think this is also
why many SSD companies are focusing primarily on MLC right now; they
know the majority of their client base isn't going to spend the money
for SLC.

> Does Solaris have TRIM for ZFS? Where? How does it help? I can
> imagine TRIM for the data pool, that would be good fit for ZFS, but
> SSD-only pool.. are we already there?

The following blog post, and mailing list thread, provides answers to
all of the above questions, including why TRIM is useful on ZFS (see the
comments section; not referring to slog or cache however).  But it
doesn't look like it's actually made use of in ZFS as of January 2011.
There is a long discussion about ZFS, TRIM, and slog/cache in the 2nd
thread.  There's also a reply from pjd@ in there WRT FreeBSD.

http://www.c0t0d0s0.org/archives/6792-SATA-TRIM-support-in-Opensolaris.html
http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/44855

There's a recommendation to permit TRIM in ZFS, but limit the number of
txgs based on a sysctl, since TRIM is a slow/expensive operation.
SSDs are neat, but man, NAND-based flash sure makes me a sad panda.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Wed May 11 13:16:27 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 498841065670
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 13:16:27 +0000 (UTC)
	(envelope-from alexander@leidinger.net)
Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de
	[217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id F01448FC08
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 13:16:26 +0000 (UTC)
Received: from outgoing.leidinger.net (p5B155D6E.dip.t-dialin.net
	[91.21.93.110])
	by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id E1D4F844015;
	Wed, 11 May 2011 15:16:12 +0200 (CEST)
Received: from webmail.leidinger.net (webmail.Leidinger.net
	[IPv6:fd73:10c7:2053:1::2:102])
	by outgoing.leidinger.net (Postfix) with ESMTP id D4A1E1279;
	Wed, 11 May 2011 15:16:09 +0200 (CEST)
Received: (from www@localhost)
	by webmail.leidinger.net (8.14.4/8.13.8/Submit) id p4BDG78o064021;
	Wed, 11 May 2011 15:16:07 +0200 (CEST)
	(envelope-from Alexander@Leidinger.net)
Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by
	webmail.leidinger.net (Horde Framework) with HTTP; Wed, 11 May 2011
	15:16:07 +0200
Message-ID: <20110511151607.105949eypk3ed3c4@webmail.leidinger.net>
Date: Wed, 11 May 2011 15:16:07 +0200
From: Alexander Leidinger <Alexander@Leidinger.net>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
References: <4DCA5620.1030203@dannysplace.net>
	<20110511100655.GA35129@icarus.home.lan>
In-Reply-To: <20110511100655.GA35129@icarus.home.lan>
MIME-Version: 1.0
Content-Type: text/plain;
 charset=UTF-8;
 DelSp="Yes";
 format="flowed"
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.6)
X-EBL-MailScanner-Information: Please contact the ISP for more information
X-EBL-MailScanner-ID: E1D4F844015.A079B
X-EBL-MailScanner: Found to be clean
X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN,
	SpamAssassin (not cached, score=2.3, required 6, autolearn=disabled,
	MANGLED_LOAN 2.30)
X-EBL-MailScanner-SpamScore: ss
X-EBL-MailScanner-From: alexander@leidinger.net
X-EBL-MailScanner-Watermark: 1305724573.53846@wn90bcMmycaD21chBPo8UQ
X-EBL-Spam-Status: No
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 11 May 2011 13:16:27 -0000

Quoting Jeremy Chadwick <freebsd@jdc.parodius.com> (from Wed, 11 May  
2011 03:06:56 -0700):

> L2ARC devices should primarily be something with extremely fast read
> rates (e.g. SSDs).  USB1.x and 2.x memory sticks do not work well for
> this purpose given protocol and bus speed limits + overhead.  (I only
> mention them because people often think "Oh, USB flash would work great
> for this!"  I disagree.)

Using USB flash may work acceptable. It depends upon the rest of the  
system. If you have very fast harddisks (or only USB 1 hardware), USB  
flash will not give you a faster FS. If you have slow (and low-power)  
desktop disks, a fast USB flash (attention, there are also slow ones)  
connected via USB 2 (or 3) will give you a speed improvement you notice.

As a matter of fact, I have this:
  - Pentium 4
  - 1 GB RAM
  - 1 Western Digital Caviar Blue
  - 2 Seagate Barracuda 7200.10
  - an ICH5 controller (no NCQ)
  - no name cheap give-away 1 GB USB flash (so not a very fast one)

The disks are used in a RAIDZ, with the USB flash as a cache device.

My use case was connecting to a webmail system over a slow line (ADSL  
224 kilobit/s). I noticed directly when the cache was in use or not.

I also have another system, ICH 10 with NCQ, 5 disks (WD RE4 RAID) in  
RAIDZ2, Intel Xeon 4-core, 12 GB RAM. There USB flash does not make  
sense at all (and the SSD makes sense if you compare the price of the  
entire system with the price of a small or medium SSD).

For the first system, it does not make sense to spend 200 units of  
money for a SSD, the system itself is not worth much more now.  
Spending 5-10 units of money for this system is ok, and gives a speed  
improvement.

Bye,
Alexander.

-- 
Even God cannot change the past.
		-- Joseph Stalin

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137

From owner-freebsd-fs@FreeBSD.ORG  Wed May 11 13:57:19 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8A88C106566B
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 13:57:19 +0000 (UTC)
	(envelope-from rs@bytecamp.net)
Received: from mail.bytecamp.net (mail.bytecamp.net [212.204.60.9])
	by mx1.freebsd.org (Postfix) with ESMTP id 1689F8FC0C
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 13:57:18 +0000 (UTC)
Received: (qmail 50429 invoked by uid 89); 11 May 2011 15:57:17 +0200
Received: from stella.bytecamp.net (HELO ?212.204.60.37?)
	(rs%bytecamp.net@212.204.60.37)
	by mail.bytecamp.net with CAMELLIA256-SHA encrypted SMTP;
	11 May 2011 15:57:17 +0200
Message-ID: <4DCA95BD.6000401@bytecamp.net>
Date: Wed, 11 May 2011 15:57:17 +0200
From: Robert Schulze <rs@bytecamp.net>
Organization: bytecamp GmbH
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US;
	rv:1.9.2.17) Gecko/20110424 Thunderbird/3.1.10
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
Subject: regarding vfs.zfs.scrub_limit
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 11 May 2011 13:57:19 -0000

Hi,

this value defaults to 10 on 8-STABLE.
Is it possible/reasonable to raise that value in order to get a scrub 
finishing in less time? If so, which values could be recommended?

with kind regards,
Robert Schulze

From owner-freebsd-fs@FreeBSD.ORG  Wed May 11 13:59:00 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3721E1065674
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 13:59:00 +0000 (UTC)
	(envelope-from james@jlauser.net)
Received: from mail-vx0-f182.google.com (mail-vx0-f182.google.com
	[209.85.220.182])
	by mx1.freebsd.org (Postfix) with ESMTP id E84BA8FC1F
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 13:58:59 +0000 (UTC)
Received: by vxc34 with SMTP id 34so477487vxc.13
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 06:58:59 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.52.112.130 with SMTP id iq2mr7597018vdb.216.1305122339087;
	Wed, 11 May 2011 06:58:59 -0700 (PDT)
Received: by 10.220.177.199 with HTTP; Wed, 11 May 2011 06:58:59 -0700 (PDT)
X-Originating-IP: [13.13.16.2]
In-Reply-To: <4DCA66CF.7070608@digsys.bg>
References: <4DCA5620.1030203@dannysplace.net>
	<20110511100655.GA35129@icarus.home.lan>
	<4DCA66CF.7070608@digsys.bg>
Date: Wed, 11 May 2011 09:58:59 -0400
Message-ID: <BANLkTikVNzjKdfJhjZ11oRQmCyUkJ=94Hg@mail.gmail.com>
From: "James L. Lauser" <james@jlauser.net>
To: freebsd-fs@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 11 May 2011 13:59:00 -0000

On Wed, May 11, 2011 at 6:37 AM, Daniel Kalchev <daniel@digsys.bg> wrote:

>
>
> On 11.05.11 13:06, Jeremy Chadwick wrote:
>
>> On Wed, May 11, 2011 at 07:25:52PM +1000, Danny Carroll wrote:
>>
>>> When I move to v28 I will probably wish to enable a L2Arc and also
>>> perhaps dedicated log devices.
>>>
>>>  In the case of ZFS intent logs, you definitely want a mirror.  If you
>> have a single log device, loss of that device can/will result in full
>> data loss of the pool which makes use of the log device.
>>
>
> This is true for v15 pools, not true for v28 pools. In ZFS v28 you can
> remove log devices and in the case of sudden loss of log device (or
> whatever) roll back the pool to a 'good' state. Therefore, for most
> installations single log device might be sufficient. If you value your data,
> you will of course use mirrored log devices, possibly in hot-swap
> configuration and .. have a backup :)
>
> By the way, the SLOG (separate LOG) does not have to be SSD at all.
> Separate rotating disk(s) will also suffice -- it all depends on the type of
> workload. SSDs are better, for the higher end, because of the low latency
> (but not all SSDs are low latency when writing!).
>
> The idea of the SLOG is to separate the ZIL records from the main data
> pool. ZIL records are small, even smaller in v28, but will cause unnecessary
> head movements if kept in the main pool. The SLOG is "write once, read on
> failure" media and is written sequentially. Almost all current HDDs offer
> reasonable sequential write performance for small to medium pools.
>
> The L2ARC needs to be fast reading SSD. It is populated slowly, few MB/sec
> so there is no point to have fast and high-bandwidth write-optimized SSD.
> The benefit from L2ARC is the low latency. Sort of slower RAM.
>
> It is bad idea to use the same SSD for both SLOG and L2ARC, because most
> SSDs behave poorly if you present them with high read and high write loads.
> More expensive units might behave, but then... if you pay few k$ for a SSD,
> you know what you need :)
>
> Daniel
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>


I recently learned the hard way that you need to be very careful what you
choose as your ZIL.  On my personal file server, my pool is comprised of 4x
500 GB disks in a RAID-Z and 2x 1.5 TB disks in a mirror.  I also had a 1 GB
Compact Flash card plugged into an IDE adapter, running as the ZIL.  For the
longest time, my write performance was capped at about 5 MB/sec.  In an
attempt to figure out why, I ran gstat, to see that the CF device was pegged
at 100%.

Having recently upgraded to ZFSv28, I decided to try removing the log
device.  Write performance instantly jumped to 45 MB/sec.  Lesson
learned...  If you're going to have a dedicated ZIL, make sure its write
performance exceeds the performance of the pool itself.

On the other hand, again having upgrading to v28, I attempted to use
deduplication on my pool.  Write performance dropped to an abysmal 1
MB/sec.  Why?  Because, as I found out, my system doesn't have enough memory
to keep the dedupe table in memory, nor can it be upgraded to.  But with the
application of a sufficiently large cache device, performance goes right
back up to where it's supposed to be.

--  James L. Lauser
    james@jlauser.net
    http://jlauser.net/

From owner-freebsd-fs@FreeBSD.ORG  Wed May 11 18:26:34 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 12394106564A
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 18:26:34 +0000 (UTC)
	(envelope-from roberto@keltia.freenix.fr)
Received: from keltia.net (centre.keltia.net [IPv6:2a01:240:fe5c::41])
	by mx1.freebsd.org (Postfix) with ESMTP id C000F8FC15
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 18:26:33 +0000 (UTC)
Received: from rron-2.local (unknown [192.75.139.250])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(No client certificate requested) (Authenticated sender: roberto)
	by keltia.net (Postfix/TLS) with ESMTPSA id 8DFBCE11E;
	Wed, 11 May 2011 20:26:30 +0200 (CEST)
Date: Wed, 11 May 2011 20:26:33 +0200
From: Ollivier Robert <roberto@keltia.freenix.fr>
To: freebsd-fs@freebsd.org, Robert Schulze <rs@bytecamp.net>
Message-ID: <20110511182632.GA14921@rron-2.local>
References: <4DC25DA6.3060009@bytecamp.net>
	<BANLkTimbyUEK=1TsND0z8y6QL5-EnSqnzA@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <BANLkTimbyUEK=1TsND0z8y6QL5-EnSqnzA@mail.gmail.com>
X-Operating-System: MacOS X / Macbook Pro - FreeBSD 7.2 / Dell D820 SMP
User-Agent: Mutt/1.5.20 (2009-06-14)
Cc: 
Subject: Re: zfs l2arc issue
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 11 May 2011 18:26:34 -0000

According to Artem Belevich:
> There was an issue with clock_t type overflow . It was fixed in
> r218429 on Feb 8th in 8-stable.

I can confirm that it does indeed fix the issue.

-- 
Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- roberto@keltia.net
In memoriam to Ondine, our 2nd child: http://ondine.keltia.net/


From owner-freebsd-fs@FreeBSD.ORG  Wed May 11 22:38:57 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 71081106564A
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 22:38:57 +0000 (UTC)
	(envelope-from jhellenthal@gmail.com)
Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com
	[209.85.214.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 2467A8FC14
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 22:38:57 +0000 (UTC)
Received: by iwn33 with SMTP id 33so1288320iwn.13
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 15:38:56 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:sender:date:from:to:cc:subject:message-id
	:references:mime-version:content-type:content-disposition
	:in-reply-to:x-openpgp-key-id:x-openpgp-key-fingerprint
	:x-openpgp-key-url;
	bh=XOWoUv0xjw8dEXQmU00hfRE0ZfN/M9+Lje7UDJImJuE=;
	b=qFoCI9G6fG44/tl8DKXFxtYbxl7tOTY6mfh2l72iSP+EVGJubABGBCreLX5nL9601a
	H0A0psHO4VFpfkgcBPjeED46CfhziMB4SV4/Dv3nkIsfsE/LFxA8kGCaSep9r164ggCB
	UorEOa6tTkcu2BuHg1kKiQGSdtVxP7VjHwdQc=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=sender:date:from:to:cc:subject:message-id:references:mime-version
	:content-type:content-disposition:in-reply-to:x-openpgp-key-id
	:x-openpgp-key-fingerprint:x-openpgp-key-url;
	b=tZB/DP1UQnjB1A74N800ltnqZ/rxbFDn2p5Is6fv/eGSNUUmhAGM9X4WuGaaiz9uLc
	hv4TPs85cFM70W5zqznHi+fS6O0UebEV2HzC4ELWnijlzQxrliAhL70xmdbNYxiHNY2a
	cjc27Y1Urch7BtinFQndWVKJw8nULmsFD4ITg=
Received: by 10.42.170.3 with SMTP id d3mr2102112icz.438.1305153536037;
	Wed, 11 May 2011 15:38:56 -0700 (PDT)
Received: from DataIX.net (adsl-99-190-84-116.dsl.klmzmi.sbcglobal.net
	[99.190.84.116])
	by mx.google.com with ESMTPS id d9sm205968ibb.2.2011.05.11.15.38.53
	(version=TLSv1/SSLv3 cipher=OTHER);
	Wed, 11 May 2011 15:38:54 -0700 (PDT)
Sender: "J. Hellenthal" <jhellenthal@gmail.com>
Received: from DataIX.net (localhost [127.0.0.1])
	by DataIX.net (8.14.4/8.14.4) with ESMTP id p4BMconB074933
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 11 May 2011 18:38:51 -0400 (EDT)
	(envelope-from jhell@DataIX.net)
Received: (from jhell@localhost)
	by DataIX.net (8.14.4/8.14.4/Submit) id p4BMcnFu074932;
	Wed, 11 May 2011 18:38:49 -0400 (EDT)
	(envelope-from jhell@DataIX.net)
Date: Wed, 11 May 2011 18:38:49 -0400
From: Jason Hellenthal <jhell@DataIX.net>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Message-ID: <20110511223849.GA65193@DataIX.net>
References: <4DCA5620.1030203@dannysplace.net>
	<20110511100655.GA35129@icarus.home.lan>
	<4DCA66CF.7070608@digsys.bg>
	<20110511105117.GA36571@icarus.home.lan> <4DCA7056.20200@digsys.bg>
	<20110511120830.GA37515@icarus.home.lan>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="jRHKVT23PllUwdXP"
Content-Disposition: inline
In-Reply-To: <20110511120830.GA37515@icarus.home.lan>
X-OpenPGP-Key-Id: 0x89D8547E
X-OpenPGP-Key-Fingerprint: 85EF E26B 07BB 3777 76BE  B12A 9057 8789 89D8 547E
X-OpenPGP-Key-URL: http://bit.ly/0x89D8547E
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 11 May 2011 22:38:57 -0000


--jRHKVT23PllUwdXP
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable


Jeremy,

On Wed, May 11, 2011 at 05:08:30AM -0700, Jeremy Chadwick wrote:
> On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote:
> > On 11.05.11 13:51, Jeremy Chadwick wrote:
> > >Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks
> > >should also keep that in mind when putting an SSD into use in this
> > >fashion.
> >
> > By the way, what would be the use of TRIM for SLOG and L2ARC devices?
> > I see absolutely no benefit from TRIM for the L2ARC, because it is
> > written slowly (on purpose).  Any current, or 1-2 generations back SSD
> > would handle that write load without TRIM and without any performance
> > degradation.
> >
> > Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC
> > SSD for the SLOG, for many reasons. The write regions on the SLC
> > NAND should be smaller (my wild guess, current practice may differ)
> > and the need for rewriting will be small. If you don't need to
> > rewrite already written data, TRIM does not help. Also, as far as I
> > understand, most "serious" SSDs (typical for SLC I guess) would have
> > twice or more the advertised size and always write to fresh cells,
> > scheduling an background erase of the 'overwritten' cell.
>=20
> AFAIK, drive manufacturers do not disclose just how much reallocation
> space they keep available on an SSD.  I'd rather not speculate as to how
> much, as I'm certain it varies per vendor.
>=20

Lets not forget here: The size of the separate log device may be quite=20
small. A rule of thumb is that you should size the separate log to be able=
=20
to handle 10 seconds of your expected synchronous write workload. It would=
=20
be rare to need more than 100 MB in a separate log device, but the=20
separate log must be at least 64 MB.

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide


So in other words how much is TRIM really even effective give the above ?

Even with a high database write load on the disks at full compacity of the=
=20
incoming link I would find it hard to believe that anyone could get the=20
ZIL to even come close to 512MB.


Given most SSD's come at a size greater than 32GB I hope this comes as a=20
early reminder that the ZIL you are buying that disk for is only going to=
=20
be using a small percent of that disk and I hope you justify cost over its=
=20
actual use. If you do happen to justify creating a ZIL for your pool then=
=20
I hope that you partition it wisely to make use of the rest of the space=20
that is untouched.

For all other cases I would reccomend if you still want to have a ZIL that=
=20
you take some sort of PCI->SD CARD or USB stick into account with=20
mirroring.=20

--=20

 Regards, (jhell)
 Jason Hellenthal


--jRHKVT23PllUwdXP
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (FreeBSD)
Comment: http://bit.ly/0x89D8547E

iQEcBAEBAgAGBQJNyw/5AAoJEJBXh4mJ2FR+besH/39USB9nnfhl5wL/rH+i7lpY
7lWVW48D0V8kbb2IAOSyGkIrUsvBqdHWmS6FJ5aYPzcrQVJg/ipiuY9c4n/SB9yy
k7wF4PgU3uFFyluEKofsRLFtccCd+a5+U5QEdgoT2HXtcI6SNC0tk6dwUJL1M0uu
Rzc3g7RQWF1hauDna7Mle13G43iQQThOTnpzWFVQFISQv3Nve/pYUVVXKKwS5e+n
g+pS6NkImO6pb070BrAEwv4H4Xm0VBaFRIi2qV1Uc0J350vXjNIfWMBEO6Q4JNWV
vBATQh7xR/OyttVXfAVnaohxdKsYhr34VqDdHjfSCsRlPZaH0ifSq6C0QLQeFhk=
=o/7q
-----END PGP SIGNATURE-----

--jRHKVT23PllUwdXP--

From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 01:04:41 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E139E106564A
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 01:04:41 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta03.emeryville.ca.mail.comcast.net
	(qmta03.emeryville.ca.mail.comcast.net [76.96.30.32])
	by mx1.freebsd.org (Postfix) with ESMTP id C23408FC12
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 01:04:41 +0000 (UTC)
Received: from omta15.emeryville.ca.mail.comcast.net ([76.96.30.71])
	by qmta03.emeryville.ca.mail.comcast.net with comcast
	id iR241g0031Y3wxoA3R4hBV; Thu, 12 May 2011 01:04:41 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta15.emeryville.ca.mail.comcast.net with comcast
	id iR4Z1g00t1t3BNj8bR4bDo; Thu, 12 May 2011 01:04:36 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 92A1E102C36; Wed, 11 May 2011 18:04:33 -0700 (PDT)
Date: Wed, 11 May 2011 18:04:33 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Jason Hellenthal <jhell@DataIX.net>
Message-ID: <20110512010433.GA48863@icarus.home.lan>
References: <4DCA5620.1030203@dannysplace.net>
	<20110511100655.GA35129@icarus.home.lan>
	<4DCA66CF.7070608@digsys.bg>
	<20110511105117.GA36571@icarus.home.lan> <4DCA7056.20200@digsys.bg>
	<20110511120830.GA37515@icarus.home.lan>
	<20110511223849.GA65193@DataIX.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110511223849.GA65193@DataIX.net>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 01:04:42 -0000

On Wed, May 11, 2011 at 06:38:49PM -0400, Jason Hellenthal wrote:
> 
> Jeremy,
> 
> On Wed, May 11, 2011 at 05:08:30AM -0700, Jeremy Chadwick wrote:
> > On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote:
> > > On 11.05.11 13:51, Jeremy Chadwick wrote:
> > > >Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks
> > > >should also keep that in mind when putting an SSD into use in this
> > > >fashion.
> > >
> > > By the way, what would be the use of TRIM for SLOG and L2ARC devices?
> > > I see absolutely no benefit from TRIM for the L2ARC, because it is
> > > written slowly (on purpose).  Any current, or 1-2 generations back SSD
> > > would handle that write load without TRIM and without any performance
> > > degradation.
> > >
> > > Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC
> > > SSD for the SLOG, for many reasons. The write regions on the SLC
> > > NAND should be smaller (my wild guess, current practice may differ)
> > > and the need for rewriting will be small. If you don't need to
> > > rewrite already written data, TRIM does not help. Also, as far as I
> > > understand, most "serious" SSDs (typical for SLC I guess) would have
> > > twice or more the advertised size and always write to fresh cells,
> > > scheduling an background erase of the 'overwritten' cell.
> > 
> > AFAIK, drive manufacturers do not disclose just how much reallocation
> > space they keep available on an SSD.  I'd rather not speculate as to how
> > much, as I'm certain it varies per vendor.
> > 
> 
> Lets not forget here: The size of the separate log device may be quite 
> small. A rule of thumb is that you should size the separate log to be able 
> to handle 10 seconds of your expected synchronous write workload. It would 
> be rare to need more than 100 MB in a separate log device, but the 
> separate log must be at least 64 MB.
> 
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
> 
> So in other words how much is TRIM really even effective give the above ?
> 
> Even with a high database write load on the disks at full compacity of the 
> incoming link I would find it hard to believe that anyone could get the 
> ZIL to even come close to 512MB.

In the case of an SSD being used as a log device (ZIL), I imagine it
would only matter the longer the drive was kept in use.  I do not use
log devices anywhere with ZFS, so I can't really comment.

In the case of an SSD being used as a cache device (L2ARC), I imagine it
would matter much more.

In the case of an SSD being used as a pool device, it matters greatly.

Why it matters: there's two methods of "reclaiming" blocks which were
used: internal SSD "garbage collection" and TRIM.  For a NAND block to be
reclaimed, it has to be erased -- SSDs erase things in pages rather
than individual LBAs.  With TRIM, you submit the data management command
via ATA with a list of LBAs you wish to inform the drive are no longer
used.  The drive aggregates the LBA ranges, determines if an entire
flash page can be erased, and does it.  If it can't, it makes some sort
of mental note that the individual LBA (in some particular page)
shouldn't be used.

The "garbage collection" works when the SSD is idle.  I have no idea
what "idle" actually means operationally, because again, vendors don't
disclose what the idle intervals are.  5 minutes?  24 hours?  It
matters, but they don't tell us.  (What confuses me about the "idle GC"
method is how it determines what it can erase -- if the OS didn't tell
it what it's using, how does it know it can erase the page?)

Anyway, how all this manifests itself performance-wise is intriguing.
It's not speculation: there's hard evidence that not using TRIM results
in SSD performance, bluntly put, sucking badly on some SSDs.

There's this mentality that wear levelling completely solves all of the
**performance** concerns -- that isn't the case at all.  In fact, I'm
under the impression it probably hurts performance, but it depends on
how it's implemented within the drive firmware.

bit-tech did an experiment using Windows 7 -- which supports and uses
TRIM assuming the device advertises the capability -- with different
models of SSDs.  The testing procedure is documented here, but I'll
document it as well:

http://www.bit-tech.net/hardware/storage/2010/02/04/windows-7-ssd-performance-and-trim/4

Again, remember, this is done on a Windows 7 system which does support
TRIM if the device supports it.  The testing steps, in this order:

1) SSD without TRIM support -- all LBAs are zeroed.
2) Took read/write benchmark readings.
3) SSD without TRIM support -- partitioned and formatted as NTFS
   (cluster size unknown), copied 100GB of data to the drive, deleted all
   the data, and repeated this method 10 times.
4) Step #2 repeated.
5) Upgraded SSD firmware to a version that supports TRIM.
6) SSD with TRIM support -- step #1 repeated.
7) Step #2 repeated.
8) SSD with TRIM support -- step #3 repeated.
9) Step #2 repeated.

Without TRIM, some drives drop their read performance by more than 50%,
and write performance by almost 70%.  I'm focusing on Intel SSDs here,
by the way.  I do not care for OCZ or Corsair products.

So because ZFS on FreeBSD (and Solaris/OpenSolaris) doesn't support
TRIM, effectively the benchmarks shown pre-firmware-upgrade are what ZFS
on FreeBSD will mimic (to some degree).

Therefore, simply put, users should be concerned when using ZFS on
FreeBSD with SSDs.  It doesn't matter to me if you're only using
64MBytes of a 40GB drive or if you're using the entire thing; no TRIM
means degraded performance over time.

Can you refute any of this evidence?

> Given most SSD's come at a size greater than 32GB I hope this comes as a 
> early reminder that the ZIL you are buying that disk for is only going to 
> be using a small percent of that disk and I hope you justify cost over its 
> actual use. If you do happen to justify creating a ZIL for your pool then 
> I hope that you partition it wisely to make use of the rest of the space 
> that is untouched.
> 
> For all other cases I would reccomend if you still want to have a ZIL that 
> you take some sort of PCI->SD CARD or USB stick into account with 
> mirroring.

Others have pointed out this isn't effective (re: USB sticks).  The read
and write speeds are too slow, and limit the overall performance of ZFS
in a very bad way.  I can absolutely confirm this claim (I've tested it
myself, using a high-end USB flash drive as a cache device (L2ARC)).

Alexander Leidinger pointed out that using a USB stick for cache/L2ARC
*does* improve performance on older systems which have slower disk I/O
(e.g. ICH5-based systems).

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 01:48:56 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8683B106566B
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 01:48:56 +0000 (UTC)
	(envelope-from jhellenthal@gmail.com)
Received: from mail-iw0-f182.google.com (mail-iw0-f182.google.com
	[209.85.214.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 37E9D8FC0C
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 01:48:56 +0000 (UTC)
Received: by iwn33 with SMTP id 33so1411860iwn.13
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 18:48:55 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:sender:date:from:to:cc:subject:message-id
	:references:mime-version:content-type:content-disposition
	:in-reply-to:x-openpgp-key-id:x-openpgp-key-fingerprint
	:x-openpgp-key-url;
	bh=0juTFZwYiqAOITQrJKnlajsg9UTRtRcaAc1MlkxS2N4=;
	b=VzCiLex67NiTiEFFoKZMX1IsPnQ8RyIOfTJhqwIGVr738FxtDIOudRQd66HmYGG9dS
	relruV04Qbc2EPDSEg9GVmDetVOiB9O3FEbTyCcwzrHK3skoeYVtKlS6w+58rw185GCZ
	4Mw2rwqFFppE/uB32aA4jWIwdvMfTsCYKH0lo=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=sender:date:from:to:cc:subject:message-id:references:mime-version
	:content-type:content-disposition:in-reply-to:x-openpgp-key-id
	:x-openpgp-key-fingerprint:x-openpgp-key-url;
	b=pXXYMPkDnUx7RxrrKiGNDIFsLVwmuYdmxS0X09GKIEfjh1RsUv0J+hLDYLQnlQkhYp
	yw5InLVWZ+J9wMFHf8ocXW1KzdVIJCXWsdR/vfC27k6WkaBVgsScge6oflVd+wojDvZe
	jcxrZ+kYPZRhkpf8Un2/LAsdBOO8WLsbOJ/YY=
Received: by 10.43.62.134 with SMTP id xa6mr4315019icb.369.1305164934349;
	Wed, 11 May 2011 18:48:54 -0700 (PDT)
Received: from DataIX.net (adsl-99-190-84-116.dsl.klmzmi.sbcglobal.net
	[99.190.84.116])
	by mx.google.com with ESMTPS id hc41sm268704ibb.47.2011.05.11.18.48.52
	(version=TLSv1/SSLv3 cipher=OTHER);
	Wed, 11 May 2011 18:48:53 -0700 (PDT)
Sender: "J. Hellenthal" <jhellenthal@gmail.com>
Received: from DataIX.net (localhost [127.0.0.1])
	by DataIX.net (8.14.4/8.14.4) with ESMTP id p4C1mok4085883
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 11 May 2011 21:48:50 -0400 (EDT)
	(envelope-from jhell@DataIX.net)
Received: (from jhell@localhost)
	by DataIX.net (8.14.4/8.14.4/Submit) id p4C1mnE5085882;
	Wed, 11 May 2011 21:48:49 -0400 (EDT)
	(envelope-from jhell@DataIX.net)
Date: Wed, 11 May 2011 21:48:48 -0400
From: Jason Hellenthal <jhell@DataIX.net>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Message-ID: <20110512014848.GA35736@DataIX.net>
References: <4DCA5620.1030203@dannysplace.net>
	<20110511100655.GA35129@icarus.home.lan>
	<4DCA66CF.7070608@digsys.bg>
	<20110511105117.GA36571@icarus.home.lan> <4DCA7056.20200@digsys.bg>
	<20110511120830.GA37515@icarus.home.lan>
	<20110511223849.GA65193@DataIX.net>
	<20110512010433.GA48863@icarus.home.lan>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="17pEHd4RhPHOinZp"
Content-Disposition: inline
In-Reply-To: <20110512010433.GA48863@icarus.home.lan>
X-OpenPGP-Key-Id: 0x89D8547E
X-OpenPGP-Key-Fingerprint: 85EF E26B 07BB 3777 76BE  B12A 9057 8789 89D8 547E
X-OpenPGP-Key-URL: http://bit.ly/0x89D8547E
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 01:48:56 -0000


--17pEHd4RhPHOinZp
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable


Jeremy, As always the qaulity of your messages are 101% spot on and I=20
always find some new new information that becomes handy more often than I=
=20
could say, and there is always something to be learned.=20

Thanks.

On Wed, May 11, 2011 at 06:04:33PM -0700, Jeremy Chadwick wrote:
> On Wed, May 11, 2011 at 06:38:49PM -0400, Jason Hellenthal wrote:
> >=20
> > Jeremy,
> >=20
> > On Wed, May 11, 2011 at 05:08:30AM -0700, Jeremy Chadwick wrote:
> > > On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote:
> > > > On 11.05.11 13:51, Jeremy Chadwick wrote:
> > > > >Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so fo=
lks
> > > > >should also keep that in mind when putting an SSD into use in this
> > > > >fashion.
> > > >
> > > > By the way, what would be the use of TRIM for SLOG and L2ARC device=
s?
> > > > I see absolutely no benefit from TRIM for the L2ARC, because it is
> > > > written slowly (on purpose).  Any current, or 1-2 generations back =
SSD
> > > > would handle that write load without TRIM and without any performan=
ce
> > > > degradation.
> > > >
> > > > Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC
> > > > SSD for the SLOG, for many reasons. The write regions on the SLC
> > > > NAND should be smaller (my wild guess, current practice may differ)
> > > > and the need for rewriting will be small. If you don't need to
> > > > rewrite already written data, TRIM does not help. Also, as far as I
> > > > understand, most "serious" SSDs (typical for SLC I guess) would have
> > > > twice or more the advertised size and always write to fresh cells,
> > > > scheduling an background erase of the 'overwritten' cell.
> > >=20
> > > AFAIK, drive manufacturers do not disclose just how much reallocation
> > > space they keep available on an SSD.  I'd rather not speculate as to =
how
> > > much, as I'm certain it varies per vendor.
> > >=20
> >=20
> > Lets not forget here: The size of the separate log device may be quite=
=20
> > small. A rule of thumb is that you should size the separate log to be a=
ble=20
> > to handle 10 seconds of your expected synchronous write workload. It wo=
uld=20
> > be rare to need more than 100 MB in a separate log device, but the=20
> > separate log must be at least 64 MB.
> >=20
> > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
> >=20
> > So in other words how much is TRIM really even effective give the above=
 ?
> >=20
> > Even with a high database write load on the disks at full compacity of =
the=20
> > incoming link I would find it hard to believe that anyone could get the=
=20
> > ZIL to even come close to 512MB.
>=20
> In the case of an SSD being used as a log device (ZIL), I imagine it
> would only matter the longer the drive was kept in use.  I do not use
> log devices anywhere with ZFS, so I can't really comment.
>=20
> In the case of an SSD being used as a cache device (L2ARC), I imagine it
> would matter much more.
>=20
> In the case of an SSD being used as a pool device, it matters greatly.
>=20
> Why it matters: there's two methods of "reclaiming" blocks which were
> used: internal SSD "garbage collection" and TRIM.  For a NAND block to be
> reclaimed, it has to be erased -- SSDs erase things in pages rather
> than individual LBAs.  With TRIM, you submit the data management command
> via ATA with a list of LBAs you wish to inform the drive are no longer
> used.  The drive aggregates the LBA ranges, determines if an entire
> flash page can be erased, and does it.  If it can't, it makes some sort
> of mental note that the individual LBA (in some particular page)
> shouldn't be used.
>=20
> The "garbage collection" works when the SSD is idle.  I have no idea
> what "idle" actually means operationally, because again, vendors don't
> disclose what the idle intervals are.  5 minutes?  24 hours?  It
> matters, but they don't tell us.  (What confuses me about the "idle GC"
> method is how it determines what it can erase -- if the OS didn't tell
> it what it's using, how does it know it can erase the page?)
>=20
> Anyway, how all this manifests itself performance-wise is intriguing.
> It's not speculation: there's hard evidence that not using TRIM results
> in SSD performance, bluntly put, sucking badly on some SSDs.
>=20
> There's this mentality that wear levelling completely solves all of the
> **performance** concerns -- that isn't the case at all.  In fact, I'm
> under the impression it probably hurts performance, but it depends on
> how it's implemented within the drive firmware.
>=20
> bit-tech did an experiment using Windows 7 -- which supports and uses
> TRIM assuming the device advertises the capability -- with different
> models of SSDs.  The testing procedure is documented here, but I'll
> document it as well:
>=20
> http://www.bit-tech.net/hardware/storage/2010/02/04/windows-7-ssd-perform=
ance-and-trim/4
>=20
> Again, remember, this is done on a Windows 7 system which does support
> TRIM if the device supports it.  The testing steps, in this order:
>=20
> 1) SSD without TRIM support -- all LBAs are zeroed.
> 2) Took read/write benchmark readings.
> 3) SSD without TRIM support -- partitioned and formatted as NTFS
>    (cluster size unknown), copied 100GB of data to the drive, deleted all
>    the data, and repeated this method 10 times.
> 4) Step #2 repeated.
> 5) Upgraded SSD firmware to a version that supports TRIM.
> 6) SSD with TRIM support -- step #1 repeated.
> 7) Step #2 repeated.
> 8) SSD with TRIM support -- step #3 repeated.
> 9) Step #2 repeated.
>=20
> Without TRIM, some drives drop their read performance by more than 50%,
> and write performance by almost 70%.  I'm focusing on Intel SSDs here,
> by the way.  I do not care for OCZ or Corsair products.
>=20
> So because ZFS on FreeBSD (and Solaris/OpenSolaris) doesn't support
> TRIM, effectively the benchmarks shown pre-firmware-upgrade are what ZFS
> on FreeBSD will mimic (to some degree).
>=20
> Therefore, simply put, users should be concerned when using ZFS on
> FreeBSD with SSDs.  It doesn't matter to me if you're only using
> 64MBytes of a 40GB drive or if you're using the entire thing; no TRIM
> means degraded performance over time.
>=20
> Can you refute any of this evidence?
>=20

At least now at the moment NO. But I can say depending on how large of a=20
use of SSDs with OpenSolaris users from before the Oracle reaping that I=20
didnt recall seeing any relative bug reports on degradation. But like I=20
said... I havent seen them but thats not to say there wasnt a lack of use=
=20
either. Definately more to look into, test, benchmark & test again.

> > Given most SSD's come at a size greater than 32GB I hope this comes as =
a=20
> > early reminder that the ZIL you are buying that disk for is only going =
to=20
> > be using a small percent of that disk and I hope you justify cost over =
its=20
> > actual use. If you do happen to justify creating a ZIL for your pool th=
en=20
> > I hope that you partition it wisely to make use of the rest of the spac=
e=20
> > that is untouched.
> >=20
> > For all other cases I would reccomend if you still want to have a ZIL t=
hat=20
> > you take some sort of PCI->SD CARD or USB stick into account with=20
> > mirroring.
>=20
> Others have pointed out this isn't effective (re: USB sticks).  The read
> and write speeds are too slow, and limit the overall performance of ZFS
> in a very bad way.  I can absolutely confirm this claim (I've tested it
> myself, using a high-end USB flash drive as a cache device (L2ARC)).
>=20
> Alexander Leidinger pointed out that using a USB stick for cache/L2ARC
> *does* improve performance on older systems which have slower disk I/O
> (e.g. ICH5-based systems).
>=20

Agreed. Soon as the bus speed, write speeds are greater than the speeds=20
that USB 2.0 can handle, then any USB based solution is useless. ICH5 and=
=20
up would be right about that time you would see this starting to happen.

sdcards/cfcards mileage may vary depending on the transfer rates. But=20
still the same situation applies like you said once your main pool=20
throughput outweighs the throughput on your ZIL then its probably not=20
worth even having a ZIL or a Cache device. Emphasis on Cache moreso than
ZIL.


Anyway all good information for those to make the judgement whether they=20
need a cache or a zil.


Thanks again Jeremy. Always appreciated.

--=20

 Regards, (jhell)
 Jason Hellenthal


--17pEHd4RhPHOinZp
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (FreeBSD)
Comment: http://bit.ly/0x89D8547E

iQEcBAEBAgAGBQJNyzyAAAoJEJBXh4mJ2FR+2qAH/3A09ZwqGiIjuz25r5FVqwk6
iJuTHR1rlOTV0IqaUh6a2FaFGnWKDu/KpQLOj+ZGDPB6DH70fOon90QvU3/hTjoN
RhguCVxHfbQLJbqaXKHZkj+JC6RhMV1H899/VAx29XlVMfvarUXw47vF7Pjcq3G1
tK5pZyK66yldkUzPwQHufIHtcebWu7EVzGWF4Hl25apkRDTyDRHL45rIM/vdDE94
SB81i9bFD+BuMV2KKUwwG/JfKborFxtYID4Vy8nIVDGjq9fE9zh4FTClnyj3wmNE
Y5UYsjB1JkZX2q195FkMk3YxLIyS3xSTahHUqwsVZA+bm+Rc2G9DTU1jhlHSudU=
=QtJ+
-----END PGP SIGNATURE-----

--17pEHd4RhPHOinZp--

From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 02:08:09 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 29539106564A
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 02:08:09 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta04.westchester.pa.mail.comcast.net
	(qmta04.westchester.pa.mail.comcast.net [76.96.62.40])
	by mx1.freebsd.org (Postfix) with ESMTP id D9E558FC14
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 02:08:08 +0000 (UTC)
Received: from omta21.westchester.pa.mail.comcast.net ([76.96.62.72])
	by qmta04.westchester.pa.mail.comcast.net with comcast
	id iS891g0021ZXKqc54S895N; Thu, 12 May 2011 02:08:09 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta21.westchester.pa.mail.comcast.net with comcast
	id iS861g0071t3BNj3hS86gF; Thu, 12 May 2011 02:08:07 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id A8313102C36; Wed, 11 May 2011 19:08:04 -0700 (PDT)
Date: Wed, 11 May 2011 19:08:04 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Jason Hellenthal <jhell@DataIX.net>
Message-ID: <20110512020804.GA50560@icarus.home.lan>
References: <4DCA5620.1030203@dannysplace.net>
	<20110511100655.GA35129@icarus.home.lan>
	<4DCA66CF.7070608@digsys.bg>
	<20110511105117.GA36571@icarus.home.lan> <4DCA7056.20200@digsys.bg>
	<20110511120830.GA37515@icarus.home.lan>
	<20110511223849.GA65193@DataIX.net>
	<20110512010433.GA48863@icarus.home.lan>
	<20110512014848.GA35736@DataIX.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110512014848.GA35736@DataIX.net>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 02:08:09 -0000

On Wed, May 11, 2011 at 09:48:48PM -0400, Jason Hellenthal wrote:
> Jeremy, As always the qaulity of your messages are 101% spot on and I 
> always find some new new information that becomes handy more often than I 
> could say, and there is always something to be learned. 
>
> Thanks.
>
> On Wed, May 11, 2011 at 06:04:33PM -0700, Jeremy Chadwick wrote:
> > On Wed, May 11, 2011 at 06:38:49PM -0400, Jason Hellenthal wrote:
> > > 
> > > Jeremy,
> > > 
> > > On Wed, May 11, 2011 at 05:08:30AM -0700, Jeremy Chadwick wrote:
> > > > On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote:
> > > > > On 11.05.11 13:51, Jeremy Chadwick wrote:
> > > > > >Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folks
> > > > > >should also keep that in mind when putting an SSD into use in this
> > > > > >fashion.
> > > > >
> > > > > By the way, what would be the use of TRIM for SLOG and L2ARC devices?
> > > > > I see absolutely no benefit from TRIM for the L2ARC, because it is
> > > > > written slowly (on purpose).  Any current, or 1-2 generations back SSD
> > > > > would handle that write load without TRIM and without any performance
> > > > > degradation.
> > > > >
> > > > > Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC
> > > > > SSD for the SLOG, for many reasons. The write regions on the SLC
> > > > > NAND should be smaller (my wild guess, current practice may differ)
> > > > > and the need for rewriting will be small. If you don't need to
> > > > > rewrite already written data, TRIM does not help. Also, as far as I
> > > > > understand, most "serious" SSDs (typical for SLC I guess) would have
> > > > > twice or more the advertised size and always write to fresh cells,
> > > > > scheduling an background erase of the 'overwritten' cell.
> > > > 
> > > > AFAIK, drive manufacturers do not disclose just how much reallocation
> > > > space they keep available on an SSD.  I'd rather not speculate as to how
> > > > much, as I'm certain it varies per vendor.
> > > > 
> > > 
> > > Lets not forget here: The size of the separate log device may be quite 
> > > small. A rule of thumb is that you should size the separate log to be able 
> > > to handle 10 seconds of your expected synchronous write workload. It would 
> > > be rare to need more than 100 MB in a separate log device, but the 
> > > separate log must be at least 64 MB.
> > > 
> > > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
> > > 
> > > So in other words how much is TRIM really even effective give the above ?
> > > 
> > > Even with a high database write load on the disks at full compacity of the 
> > > incoming link I would find it hard to believe that anyone could get the 
> > > ZIL to even come close to 512MB.
> > 
> > In the case of an SSD being used as a log device (ZIL), I imagine it
> > would only matter the longer the drive was kept in use.  I do not use
> > log devices anywhere with ZFS, so I can't really comment.
> > 
> > In the case of an SSD being used as a cache device (L2ARC), I imagine it
> > would matter much more.
> > 
> > In the case of an SSD being used as a pool device, it matters greatly.
> > 
> > Why it matters: there's two methods of "reclaiming" blocks which were
> > used: internal SSD "garbage collection" and TRIM.  For a NAND block to be
> > reclaimed, it has to be erased -- SSDs erase things in pages rather
> > than individual LBAs.  With TRIM, you submit the data management command
> > via ATA with a list of LBAs you wish to inform the drive are no longer
> > used.  The drive aggregates the LBA ranges, determines if an entire
> > flash page can be erased, and does it.  If it can't, it makes some sort
> > of mental note that the individual LBA (in some particular page)
> > shouldn't be used.
> > 
> > The "garbage collection" works when the SSD is idle.  I have no idea
> > what "idle" actually means operationally, because again, vendors don't
> > disclose what the idle intervals are.  5 minutes?  24 hours?  It
> > matters, but they don't tell us.  (What confuses me about the "idle GC"
> > method is how it determines what it can erase -- if the OS didn't tell
> > it what it's using, how does it know it can erase the page?)
> > 
> > Anyway, how all this manifests itself performance-wise is intriguing.
> > It's not speculation: there's hard evidence that not using TRIM results
> > in SSD performance, bluntly put, sucking badly on some SSDs.
> > 
> > There's this mentality that wear levelling completely solves all of the
> > **performance** concerns -- that isn't the case at all.  In fact, I'm
> > under the impression it probably hurts performance, but it depends on
> > how it's implemented within the drive firmware.
> > 
> > bit-tech did an experiment using Windows 7 -- which supports and uses
> > TRIM assuming the device advertises the capability -- with different
> > models of SSDs.  The testing procedure is documented here, but I'll
> > document it as well:
> > 
> > http://www.bit-tech.net/hardware/storage/2010/02/04/windows-7-ssd-performance-and-trim/4
> > 
> > Again, remember, this is done on a Windows 7 system which does support
> > TRIM if the device supports it.  The testing steps, in this order:
> > 
> > 1) SSD without TRIM support -- all LBAs are zeroed.
> > 2) Took read/write benchmark readings.
> > 3) SSD without TRIM support -- partitioned and formatted as NTFS
> >    (cluster size unknown), copied 100GB of data to the drive, deleted all
> >    the data, and repeated this method 10 times.
> > 4) Step #2 repeated.
> > 5) Upgraded SSD firmware to a version that supports TRIM.
> > 6) SSD with TRIM support -- step #1 repeated.
> > 7) Step #2 repeated.
> > 8) SSD with TRIM support -- step #3 repeated.
> > 9) Step #2 repeated.
> > 
> > Without TRIM, some drives drop their read performance by more than 50%,
> > and write performance by almost 70%.  I'm focusing on Intel SSDs here,
> > by the way.  I do not care for OCZ or Corsair products.
> > 
> > So because ZFS on FreeBSD (and Solaris/OpenSolaris) doesn't support
> > TRIM, effectively the benchmarks shown pre-firmware-upgrade are what ZFS
> > on FreeBSD will mimic (to some degree).
> > 
> > Therefore, simply put, users should be concerned when using ZFS on
> > FreeBSD with SSDs.  It doesn't matter to me if you're only using
> > 64MBytes of a 40GB drive or if you're using the entire thing; no TRIM
> > means degraded performance over time.
> > 
> > Can you refute any of this evidence?
> > 
> 
> At least now at the moment NO. But I can say depending on how large of a 
> use of SSDs with OpenSolaris users from before the Oracle reaping that I 
> didnt recall seeing any relative bug reports on degradation. But like I 
> said... I havent seen them but thats not to say there wasnt a lack of use 
> either. Definately more to look into, test, benchmark & test again.
> 
> > > Given most SSD's come at a size greater than 32GB I hope this comes as a 
> > > early reminder that the ZIL you are buying that disk for is only going to 
> > > be using a small percent of that disk and I hope you justify cost over its 
> > > actual use. If you do happen to justify creating a ZIL for your pool then 
> > > I hope that you partition it wisely to make use of the rest of the space 
> > > that is untouched.
> > > 
> > > For all other cases I would reccomend if you still want to have a ZIL that 
> > > you take some sort of PCI->SD CARD or USB stick into account with 
> > > mirroring.
> > 
> > Others have pointed out this isn't effective (re: USB sticks).  The read
> > and write speeds are too slow, and limit the overall performance of ZFS
> > in a very bad way.  I can absolutely confirm this claim (I've tested it
> > myself, using a high-end USB flash drive as a cache device (L2ARC)).
> > 
> > Alexander Leidinger pointed out that using a USB stick for cache/L2ARC
> > *does* improve performance on older systems which have slower disk I/O
> > (e.g. ICH5-based systems).
> > 
> 
> Agreed. Soon as the bus speed, write speeds are greater than the speeds 
> that USB 2.0 can handle, then any USB based solution is useless. ICH5 and 
> up would be right about that time you would see this starting to happen.
> 
> sdcards/cfcards mileage may vary depending on the transfer rates. But 
> still the same situation applies like you said once your main pool 
> throughput outweighs the throughput on your ZIL then its probably not 
> worth even having a ZIL or a Cache device. Emphasis on Cache moreso than
> ZIL.
> 
> 
> Anyway all good information for those to make the judgement whether they 
> need a cache or a zil.
> 
> 
> Thanks again Jeremy. Always appreciated.

You're welcome.

It's important to note that much of what I say is stuff I've learned and
read (technical documentation usually) on my own -- which means I almost
certainly misunderstand certain pieces of technology.  There are a *lot*
of people here who understand it much better than I do.  (I'm looking at
you, jhb@  ;-) )

As such, I probably should have CC'd pjd@ on this thread, since he's
talked a bit about how to get ZFS on FreeBSD to work with TRIM, and when
to issue the erasing of said blocks.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 02:26:43 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id C53E61065688
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 02:26:43 +0000 (UTC)
	(envelope-from fbsd@dannysplace.net)
Received: from mailgw.dannysplace.net (mailgw.dannysplace.net [204.109.56.184])
	by mx1.freebsd.org (Postfix) with ESMTP id 937FF8FC14
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 02:26:43 +0000 (UTC)
Received: from [203.206.171.212] (helo=[192.168.10.10])
	by mailgw.dannysplace.net with esmtpsa (TLSv1:CAMELLIA256-SHA:256)
	(Exim 4.72 (FreeBSD)) (envelope-from <fbsd@dannysplace.net>)
	id 1QKLfi-000Eyk-I0
	for freebsd-fs@freebsd.org; Thu, 12 May 2011 12:30:40 +1000
Message-ID: <4DCB455C.4020805@dannysplace.net>
Date: Thu, 12 May 2011 12:26:36 +1000
From: Danny Carroll <fbsd@dannysplace.net>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB;
	rv:1.9.2.17) Gecko/20110414 Lightning/1.0b2 Thunderbird/3.1.10
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
References: <4DCA5620.1030203@dannysplace.net>
In-Reply-To: <4DCA5620.1030203@dannysplace.net>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Authenticated-User: danny
X-Authenticator: plain
X-Exim-Version: 4.72 (build at 12-Jul-2010 18:31:29)
X-Date: 2011-05-12 12:30:39
X-Connected-IP: 203.206.171.212:50545
X-Message-Linecount: 108
X-Body-Linecount: 95
X-Message-Size: 4204
X-Body-Size: 3656
X-Received-Count: 1
X-Recipient-Count: 1
X-Local-Recipient-Count: 1
X-Local-Recipient-Defer-Count: 0
X-Local-Recipient-Fail-Count: 0
X-SA-Exim-Connect-IP: 203.206.171.212
X-SA-Exim-Mail-From: fbsd@dannysplace.net
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	damka.dannysplace.net
X-Spam-Level: 
X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00
	autolearn=ham version=3.3.1
X-SA-Exim-Version: 4.2
X-SA-Exim-Scanned: Yes (on mailgw.dannysplace.net)
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
Reply-To: fbsd@dannysplace.net
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 02:26:43 -0000

On 11/05/2011 7:25 PM, Danny Carroll wrote:
> Hello all.
>
> I've been using ZFS for some time now and have never had an issued
> (except perhaps the issue of speed...)
> When v28 is taken into -STABLE I will most likely upgrade to v28 at that
> point.   Currently I am running v15 with v4 on disk.
>
> When I move to v28 I will probably wish to enable a L2Arc and also
> perhaps dedicated log devices.
>
> I'm curious about a few things however.
>
> 1. Can I remove either the L2 ARC or the log devices if things don't go
> as planned or if I need to free up some resources?
> 2. What are the best practices for setting up these?   Would a geom
> mirror for the log device be the way to go.  Or can you just let ZFS
> mirror the log itself?
> 3. What happens when one or both of the log devices fail.   Does ZFS
> come to a crashing halt and kill all the data?   Or does it simply
> complain that the ZIL is no longer active and continue on it's merry way?
>
> In short, what is the best way to set up these two features?
>


Replying to myself in order to summarise the recommendations (when using
v28):
 - Don't use SSD for the Log device.  Write speed tends to be a problem.
 - SSD ok for cache if the sizing is right, but without TRIM, don't
expect to take full advantage of the SSD.
 - Do use two devices for log and mirror them with ZFS.  Bad things
*can* happen if*all* the log devices die.
 - Don't colocate L2ARC and Log devices.
 - Log devices can be small, ZFS Best practices guide specifies about
50% of RAM as max.  Minimum should be Throughput * 10 (1Gb for 100MB/sec
of writes).


let me know if I got anything wrong or missed something important.

Remaining questions.
- Is there any advantage to using a spare partition on a SCSI or SATA
drive as L2Arc?  Assuming it was in the machine already but doing nothing?
- If I have 2 pools like this:
# zpool status
  pool: tank
 state: ONLINE
 scrub: scrub completed after 11h7m with 0 errors on Sun May  8 14:17:07
2011
config:

        NAME            STATE     READ WRITE CKSUM
        tank            ONLINE       0     0     0
          raidz1        ONLINE       0     0     0
            gpt/data0   ONLINE       0     0     0
            gpt/data1   ONLINE       0     0     0
            gpt/data2   ONLINE       0     0     0
            gpt/data3   ONLINE       0     0     0
            gpt/data4   ONLINE       0     0     0
            gpt/data5   ONLINE       0     0     0
          raidz1        ONLINE       0     0     0
            gpt/data6   ONLINE       0     0     0
            gpt/data7   ONLINE       0     0     0
            gpt/data8   ONLINE       0     0     0
            gpt/data9   ONLINE       0     0     0
            gpt/data10  ONLINE       0     0     0
            gpt/data11  ONLINE       0     0     0

errors: No known data errors

  pool: system
 state: ONLINE
 scrub: scrub completed after 1h1m with 0 errors on Sun May  8 15:18:23 2011
config:

        NAME             STATE     READ WRITE CKSUM
        system           ONLINE       0     0     0
          mirror         ONLINE       0     0     0
            gpt/system0  ONLINE       0     0     0
            gpt/system1  ONLINE       0     0     0


And I have free space on the "system" disks.   I could give two new
partitions on the system disks to ZFS for the log devices of the "tank"
pool?
If I were worried about performance of my "system" pool, I could also
use spare partitions on (a couple of) the "tank" disks in a similar way. 
But it would be silly to use the same disk for ZIL and pool data.  In
that case, why would I bother to alter the default.

Thanks for the info!

-D

From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 02:44:50 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4C14E106564A
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 02:44:50 +0000 (UTC)
	(envelope-from bfriesen@simple.dallas.tx.us)
Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74])
	by mx1.freebsd.org (Postfix) with ESMTP id DE8278FC12
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 02:44:49 +0000 (UTC)
Received: from freddy.simplesystems.org (freddy.simplesystems.org
	[65.66.246.65])
	by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id
	p4C2ij70026830; Wed, 11 May 2011 21:44:45 -0500 (CDT)
Date: Wed, 11 May 2011 21:44:45 -0500 (CDT)
From: Bob Friesenhahn <bfriesen@simple.dallas.tx.us>
X-X-Sender: bfriesen@freddy.simplesystems.org
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
In-Reply-To: <20110512010433.GA48863@icarus.home.lan>
Message-ID: <alpine.GSO.2.01.1105112120220.20825@freddy.simplesystems.org>
References: <4DCA5620.1030203@dannysplace.net>
	<20110511100655.GA35129@icarus.home.lan>
	<4DCA66CF.7070608@digsys.bg>
	<20110511105117.GA36571@icarus.home.lan> <4DCA7056.20200@digsys.bg>
	<20110511120830.GA37515@icarus.home.lan>
	<20110511223849.GA65193@DataIX.net>
	<20110512010433.GA48863@icarus.home.lan>
User-Agent: Alpine 2.01 (GSO 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2
	(blade.simplesystems.org [65.66.246.90]);
	Wed, 11 May 2011 21:44:45 -0500 (CDT)
Cc: freebsd-fs@freebsd.org, Jason Hellenthal <jhell@DataIX.net>
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 02:44:50 -0000

On Wed, 11 May 2011, Jeremy Chadwick wrote:
>
> The "garbage collection" works when the SSD is idle.  I have no idea
> what "idle" actually means operationally, because again, vendors don't
> disclose what the idle intervals are.  5 minutes?  24 hours?  It
> matters, but they don't tell us.  (What confuses me about the "idle GC"
> method is how it determines what it can erase -- if the OS didn't tell
> it what it's using, how does it know it can erase the page?)

Garbage collection is not necessarily just when the drive is idle. 
Regardless, if one "overwrites" a page (or part of a page), the drive 
can implement that by reading any non-overlapped existing content 
(which it already has to do), allocating a fresh (already erased) 
page, and then writing the composite to that new page.  The 
"overwritten" page is then scheduled for erasure.  This sort of 
garbage collector works by over-provisioning the actual amount of 
flash in the drive, which should be done anyway in a quality product.

This simple recirculating/COW algorithm is a reason why TRIM is not 
really needed given sufficiently intelligent SSD design.

> Therefore, simply put, users should be concerned when using ZFS on
> FreeBSD with SSDs.  It doesn't matter to me if you're only using
> 64MBytes of a 40GB drive or if you're using the entire thing; no TRIM
> means degraded performance over time.

This seems unduely harsh.  Even with TRIM, SSDs will suffer in 
continually write-heavy (e.g. server) environments.  The reason is 
that the blocks still need to be erased and the erasure performance is 
limited.  It is not uncommon for servers to be run close to their 
limits most of the time.

One should not be ashamed with purchasing a larger SSD than the space 
consumption appears to warrant.

Bob
-- 
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 02:52:08 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 519071065672
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 02:52:08 +0000 (UTC)
	(envelope-from bfriesen@simple.dallas.tx.us)
Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74])
	by mx1.freebsd.org (Postfix) with ESMTP id 1948D8FC08
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 02:52:07 +0000 (UTC)
Received: from freddy.simplesystems.org (freddy.simplesystems.org
	[65.66.246.65])
	by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id
	p4C2pwMu029542; Wed, 11 May 2011 21:51:58 -0500 (CDT)
Date: Wed, 11 May 2011 21:51:58 -0500 (CDT)
From: Bob Friesenhahn <bfriesen@simple.dallas.tx.us>
X-X-Sender: bfriesen@freddy.simplesystems.org
To: Danny Carroll <fbsd@dannysplace.net>
In-Reply-To: <4DCB455C.4020805@dannysplace.net>
Message-ID: <alpine.GSO.2.01.1105112146500.20825@freddy.simplesystems.org>
References: <4DCA5620.1030203@dannysplace.net>
	<4DCB455C.4020805@dannysplace.net>
User-Agent: Alpine 2.01 (GSO 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2
	(blade.simplesystems.org [65.66.246.90]);
	Wed, 11 May 2011 21:51:58 -0500 (CDT)
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 02:52:08 -0000

On Thu, 12 May 2011, Danny Carroll wrote:
>
> Replying to myself in order to summarise the recommendations (when using
> v28):
> - Don't use SSD for the Log device.  Write speed tends to be a problem.

DO use SSD for the log device.  The log device is only used for 
synchronous writes.  Except for certain usages (E.g. database and NFS 
server) most writes will be asynchronous and never be written to the 
log.  Huge synchronous writes will also bypass the SSD log device. 
The log device is for reducing latency on small synchronous writes.

> - Is there any advantage to using a spare partition on a SCSI or SATA
> drive as L2Arc?  Assuming it was in the machine already but doing nothing?

The L2ARC is intended to reduce read latency and is random accessed. 
It is unlikely that rotating media will work well for that.

Bob
-- 
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 03:36:31 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0CD9F106564A
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 03:36:31 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta08.westchester.pa.mail.comcast.net
	(qmta08.westchester.pa.mail.comcast.net [76.96.62.80])
	by mx1.freebsd.org (Postfix) with ESMTP id AE00D8FC08
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 03:36:30 +0000 (UTC)
Received: from omta17.westchester.pa.mail.comcast.net ([76.96.62.89])
	by qmta08.westchester.pa.mail.comcast.net with comcast
	id iTCV1g0011vXlb858TcWEr; Thu, 12 May 2011 03:36:30 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta17.westchester.pa.mail.comcast.net with comcast
	id iTcT1g01s1t3BNj3dTcUMy; Thu, 12 May 2011 03:36:30 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id A9A11102C36; Wed, 11 May 2011 20:36:26 -0700 (PDT)
Date: Wed, 11 May 2011 20:36:26 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Bob Friesenhahn <bfriesen@simple.dallas.tx.us>
Message-ID: <20110512033626.GA52047@icarus.home.lan>
References: <4DCA5620.1030203@dannysplace.net>
	<4DCB455C.4020805@dannysplace.net>
	<alpine.GSO.2.01.1105112146500.20825@freddy.simplesystems.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <alpine.GSO.2.01.1105112146500.20825@freddy.simplesystems.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 03:36:31 -0000

On Wed, May 11, 2011 at 09:51:58PM -0500, Bob Friesenhahn wrote:
> On Thu, 12 May 2011, Danny Carroll wrote:
> >
> >Replying to myself in order to summarise the recommendations (when using
> >v28):
> >- Don't use SSD for the Log device.  Write speed tends to be a problem.
> 
> DO use SSD for the log device.  The log device is only used for
> synchronous writes.  Except for certain usages (E.g. database and
> NFS server) most writes will be asynchronous and never be written to
> the log.  Huge synchronous writes will also bypass the SSD log
> device. The log device is for reducing latency on small synchronous
> writes.

Bob, please correct me if I'm wrong, but as I understand it a log device
(ZIL) effectively limits the overall write speed of the pool itself.
Consumer-level SSDs do not have extremely high write performance (and it
gets worse without TRIM; again a 70% decrease in write speed in some
cases).

I imagine a very high-end SSD (FusionIO, etc. -- the things that cost
$900 and higher) would have extremely high write performance and would
work perfectly for this role.  Or a battery-backed DDR RAM device.

What's amusing (to me anyway) is that when ZFS was originally presented,
engineers from Sun folks kept focusing on how "you can buy cheap,
generic disks and accomplish goals!" yet if the above statement of mine
is accurate, that goes against the original principle.

Danny might also find this URL useful:

http://constantin.glez.de/blog/2011/02/frequently-asked-questions-about-flash-memory-ssds-and-zfs

> >- Is there any advantage to using a spare partition on a SCSI or SATA
> >drive as L2Arc?  Assuming it was in the machine already but doing nothing?
> 
> The L2ARC is intended to reduce read latency and is random accessed.
> It is unlikely that rotating media will work well for that.

Agreed -- this is why I tell folks that an SSD would work very well for
L2ARC, but my opinion is just to buy more RAM for the ARC ("layer 1").

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 03:56:00 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 06A5E106566C
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 03:56:00 +0000 (UTC)
	(envelope-from fjwcash@gmail.com)
Received: from mail-gy0-f182.google.com (mail-gy0-f182.google.com
	[209.85.160.182])
	by mx1.freebsd.org (Postfix) with ESMTP id AF51F8FC16
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 03:55:59 +0000 (UTC)
Received: by gyg13 with SMTP id 13so505401gyg.13
	for <freebsd-fs@freebsd.org>; Wed, 11 May 2011 20:55:59 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:in-reply-to:references:date
	:message-id:subject:from:to:cc:content-type
	:content-transfer-encoding;
	bh=irQ/V/RYJGBt7AcbvtwFEO75vMGvANZdp5IjoQCCFFY=;
	b=XUk/aRexC8GcoTrlj44WS/xAfZ1ghXowqRxcckmFO8Zx1TFR0SppydcyD9VOyHK1ok
	3bS0EE0fqJyp4CMHXoq1vnf7HGu1Aio7MCOx66c3vnfkCKHFe4HNNUNQw6bfNwAYdGlj
	vnoOSw+E+W1BulqRpWPOBSGR/hniQ0aE6shUo=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:content-transfer-encoding;
	b=Zshe1bSaA2CvuVE8VZYZMMhBo7ewvxD/h1/CeE5M33uAZJ/l8xAEl9KmEm9JHMUM/q
	hVebspe62GVpKjP16OPWDnLAaJ9GlbtWTTCGfxxkmXuq3rhjqbrswb7i7vZb6eIofXnk
	uN/uyWaPjhIatlAAYmMm+sOW1tvwttdGvtwEM=
MIME-Version: 1.0
Received: by 10.90.135.8 with SMTP id i8mr131088agd.113.1305172556994; Wed, 11
	May 2011 20:55:56 -0700 (PDT)
Received: by 10.90.52.15 with HTTP; Wed, 11 May 2011 20:55:56 -0700 (PDT)
In-Reply-To: <20110512033626.GA52047@icarus.home.lan>
References: <4DCA5620.1030203@dannysplace.net>
	<4DCB455C.4020805@dannysplace.net>
	<alpine.GSO.2.01.1105112146500.20825@freddy.simplesystems.org>
	<20110512033626.GA52047@icarus.home.lan>
Date: Wed, 11 May 2011 20:55:56 -0700
Message-ID: <BANLkTi=e1ZV0GkC7twCAUTu_jSMDdhBH0w@mail.gmail.com>
From: Freddie Cash <fjwcash@gmail.com>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 03:56:00 -0000

On Wed, May 11, 2011 at 8:36 PM, Jeremy Chadwick
<freebsd@jdc.parodius.com> wrote:
> On Wed, May 11, 2011 at 09:51:58PM -0500, Bob Friesenhahn wrote:
>> On Thu, 12 May 2011, Danny Carroll wrote:
>> >
>> >Replying to myself in order to summarise the recommendations (when usin=
g
>> >v28):
>> >- Don't use SSD for the Log device. =C2=A0Write speed tends to be a pro=
blem.
>>
>> DO use SSD for the log device. =C2=A0The log device is only used for
>> synchronous writes. =C2=A0Except for certain usages (E.g. database and
>> NFS server) most writes will be asynchronous and never be written to
>> the log. =C2=A0Huge synchronous writes will also bypass the SSD log
>> device. The log device is for reducing latency on small synchronous
>> writes.
>
> Bob, please correct me if I'm wrong, but as I understand it a log device
> (ZIL) effectively limits the overall write speed of the pool itself.

Nope.  Using a separate log device removes sync writes from the I/O
path of the rest of the pool, thus increasing the total write
throughput for the pool.

> Danny might also find this URL useful:
>
> http://constantin.glez.de/blog/2011/02/frequently-asked-questions-about-f=
lash-memory-ssds-and-zfs

Read the linked articles.  For example:
http://constantin.glez.de/blog/2010/07/solaris-zfs-synchronous-writes-and-z=
il-explained

Most sync writes go to the ZIL.  If the ZIL is part of the pool, then
the pool has to issue two separate writes (once to the ZIL, then later
to the pool as part of the normal async txg).  If the ZIL is a
separate device, then there's no write contention with the rest of the
pool.

Not every sync write goes to the ZIL.  Only writes under a certain
size (64 KB or something like that).

Every OpenSolaris, Oracle Solaris, Nexenta admin will recommend
getting an enterprise-grade, write-optimised, SLC-based SSD
(preferably with a supercap) for use as the SLOG device.  Especially
if you're using ZFS for anything database-related, or serving files
over NFS, everyone says the same:  get an SSD for SLOG usage.

Why would it be any different for ZFS on FreeBSD?

There are plenty of benchmarks online and in the zfs-discuss mailing
list that shows the benefits to using an SSD-based SLOG.

--=20
Freddie Cash
fjwcash@gmail.com

From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 06:33:18 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BFB701065670
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 06:33:18 +0000 (UTC)
	(envelope-from daniel@digsys.bg)
Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230])
	by mx1.freebsd.org (Postfix) with ESMTP id 337A78FC15
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 06:33:17 +0000 (UTC)
Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5])
	(authenticated bits=0)
	by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p4C6X77D058558
	(version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO)
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 09:33:12 +0300 (EEST)
	(envelope-from daniel@digsys.bg)
Message-ID: <4DCB7F22.4060008@digsys.bg>
Date: Thu, 12 May 2011 09:33:06 +0300
From: Daniel Kalchev <daniel@digsys.bg>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.15) Gecko/20110307 Thunderbird/3.1.9
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
References: <4DCA5620.1030203@dannysplace.net>	<4DCB455C.4020805@dannysplace.net>	<alpine.GSO.2.01.1105112146500.20825@freddy.simplesystems.org>
	<20110512033626.GA52047@icarus.home.lan>
In-Reply-To: <20110512033626.GA52047@icarus.home.lan>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 06:33:18 -0000


On 12.05.11 06:36, Jeremy Chadwick wrote:
> On Wed, May 11, 2011 at 09:51:58PM -0500, Bob Friesenhahn wrote:
>> On Thu, 12 May 2011, Danny Carroll wrote:
>>> Replying to myself in order to summarise the recommendations (when using
>>> v28):
>>> - Don't use SSD for the Log device.  Write speed tends to be a problem.
>> DO use SSD for the log device.  The log device is only used for
>> synchronous writes.  Except for certain usages (E.g. database and
>> NFS server) most writes will be asynchronous and never be written to
>> the log.  Huge synchronous writes will also bypass the SSD log
>> device. The log device is for reducing latency on small synchronous
>> writes.
> Bob, please correct me if I'm wrong, but as I understand it a log device
> (ZIL) effectively limits the overall write speed of the pool itself.
>
Perhaps I misstated it in my first post, but there is nothing wrong with 
using SSD for the SLOG.

You can of course create usage/benchmark scenario, where an (cheap) SSD 
based SLOG will be worse than an (fast) HDD based SLOG, especially if 
you are not concerned about latency. The SLOG resolves two issues, it 
increases the pool throughput (primary storage) by removing small 
synchronous writes from it, that will unnecessarily introduce head 
movement and more IOPS and it provided low latency for small synchronous 
writes.

The later is only valid if the SSD is sufficiently write-optimized. Most 
consumer SSDs end up saturated by writes. Sequential write IOPS is what 
matters here.

About TRIM. As it was already mentioned, you will use only small portion 
of an (for example) 32GB SSD for the SLOG. If you do not allocate the 
entire SSD, then wear leveling will be able to play well and it is very 
likely you will not suffer any performance degradation.

By the way, I do not believe Windows benchmark has any significance in 
our ZFS usage for the SSDs. How is TRIM implemented in Windows? How does 
it relate to SSD usage as SLOG and L2ARC?

How can ever TRIM support influence reading from the drive?!

TRIM is an slow operation. How often are these issued? What is the 
impact of issuing TRIM to an otherwise loaded SSD?

Daniel

From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 06:44:19 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A35EF106564A
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 06:44:19 +0000 (UTC)
	(envelope-from daniel@digsys.bg)
Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230])
	by mx1.freebsd.org (Postfix) with ESMTP id 3422E8FC15
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 06:44:18 +0000 (UTC)
Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5])
	(authenticated bits=0)
	by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p4C6i9d6058598
	(version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO)
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 09:44:14 +0300 (EEST)
	(envelope-from daniel@digsys.bg)
Message-ID: <4DCB81B8.6070301@digsys.bg>
Date: Thu, 12 May 2011 09:44:08 +0300
From: Daniel Kalchev <daniel@digsys.bg>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.15) Gecko/20110307 Thunderbird/3.1.9
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
References: <4DCA5620.1030203@dannysplace.net>
	<4DCB455C.4020805@dannysplace.net>
In-Reply-To: <4DCB455C.4020805@dannysplace.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 06:44:19 -0000

On 12.05.11 05:26, Danny Carroll wrote:
>
>   - Don't use SSD for the Log device.  Write speed tends to be a problem.
It all depends on your usage. You need to experiment, unfortunately.

>   - SSD ok for cache if the sizing is right, but without TRIM, don't
> expect to take full advantage of the SSD.
I do not believe TRIM has any effect on L2ARC.

Why?
- TRIM is a technique to optimize future writes;
- L2ARC is written at controlled, very low rate, I believe something 
like 8MB/sec. There is no SSD currently on the market, with or without 
TRIM that has any trouble sustaining that rate.
- TRIM might introduce delays, it is very 'expensive' command. But that 
will surely wary by drive/manufacturer.
- There is no way TRIM can influence reading from the flash media. 
Reading from L2ARC with low latency and high speed is it's main purpose 
anyway.

> Remaining questions.
> - Is there any advantage to using a spare partition on a SCSI or SATA
> drive as L2Arc?  Assuming it was in the machine already but doing nothing?
Absolutely no advantage. You want L2ARC to be very low latency and 
high-bandwidth for random reading. Especially low-latency. This does not 
apply to rotating disks.

Daniel

From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 08:02:49 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7EE7C106566C
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 08:02:49 +0000 (UTC)
	(envelope-from numisemis@gmail.com)
Received: from mail-ww0-f50.google.com (mail-ww0-f50.google.com [74.125.82.50])
	by mx1.freebsd.org (Postfix) with ESMTP id 0BE818FC16
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 08:02:48 +0000 (UTC)
Received: by wwc33 with SMTP id 33so1417854wwc.31
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 01:02:47 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:content-type:mime-version:subject:from
	:in-reply-to:date:content-transfer-encoding:message-id:references:to
	:x-mailer; bh=y5Bfw6bWyC1G9CFze8Vs2J3W+YtO5U4YJGjskubuCFI=;
	b=r0CDomCnXi0ERwWbds0jmDZTh/iu6P44pXOg+ksMoM00g2Hf9iuSka/wS6qUCc+tl8
	uMJUuhRC6gXakY0CDg00ij50LcvpTZs4ufuH3JG8nst5qbXA4xMSMNZygpwfBUY6drEp
	4oM06UR09SABIxAgfcKodHottIfK6Uy8P+w9A=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=content-type:mime-version:subject:from:in-reply-to:date
	:content-transfer-encoding:message-id:references:to:x-mailer;
	b=R9OR9gWK+GkuAd4z0kFKi0R4Q89mcDk/klMt5QcK5y/3CoZDV6bt7m+GlfshxGeHOp
	2FprGMBQ+hW9nqitMU70rWohNIbvWCYWO88wLRgO3Yu0OrEHULRUUF5w3cIuQgL8WFQc
	yFcmhELfq693JAPJfNDhUqsHRPcvamuE57BOQ=
Received: by 10.216.79.11 with SMTP id h11mr4701200wee.77.1305187367760;
	Thu, 12 May 2011 01:02:47 -0700 (PDT)
Received: from sime-imac.logos.hr ([213.147.110.159])
	by mx.google.com with ESMTPS id t5sm473548wes.9.2011.05.12.01.02.46
	(version=TLSv1/SSLv3 cipher=OTHER);
	Thu, 12 May 2011 01:02:46 -0700 (PDT)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Apple Message framework v1084)
From: =?iso-8859-2?Q?=A9imun_Mikecin?= <numisemis@gmail.com>
In-Reply-To: <4DCB81B8.6070301@digsys.bg>
Date: Thu, 12 May 2011 10:02:42 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <A0F4CB3E-1068-4E19-A50F-AA5620A9EB85@gmail.com>
References: <4DCA5620.1030203@dannysplace.net>
	<4DCB455C.4020805@dannysplace.net> <4DCB81B8.6070301@digsys.bg>
To: freebsd-fs <freebsd-fs@freebsd.org>
X-Mailer: Apple Mail (2.1084)
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 08:02:49 -0000


On 12. svi. 2011., at 08:44, Daniel Kalchev wrote:

> On 12.05.11 05:26, Danny Carroll wrote:
>>=20
>>  - Don't use SSD for the Log device.  Write speed tends to be a =
problem.
> It all depends on your usage. You need to experiment, unfortunately.

What is the alternative for log devices if you are not using SSD?
Rotating hard drives?

AFAIK, two factors define the speed of log device: write transfer rate =
and write latency.
You will not find a rotating hard drive that has a write latency =
anything near the write latency of even a slowest SSD you can find on =
the market.
On the other hand, only a very few rotating hard drives have a write =
transfer rate that can be compared to SSD's.


From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 08:33:37 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5634C106566C
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 08:33:37 +0000 (UTC)
	(envelope-from daniel@digsys.bg)
Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230])
	by mx1.freebsd.org (Postfix) with ESMTP id D807B8FC0C
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 08:33:36 +0000 (UTC)
Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5])
	(authenticated bits=0)
	by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p4C8XR3l058999
	(version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO)
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 11:33:32 +0300 (EEST)
	(envelope-from daniel@digsys.bg)
Message-ID: <4DCB9B57.8090507@digsys.bg>
Date: Thu, 12 May 2011 11:33:27 +0300
From: Daniel Kalchev <daniel@digsys.bg>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.15) Gecko/20110307 Thunderbird/3.1.9
MIME-Version: 1.0
To: freebsd-fs@freebsd.org
References: <4DCA5620.1030203@dannysplace.net>	<4DCB455C.4020805@dannysplace.net>
	<4DCB81B8.6070301@digsys.bg>
	<A0F4CB3E-1068-4E19-A50F-AA5620A9EB85@gmail.com>
In-Reply-To: <A0F4CB3E-1068-4E19-A50F-AA5620A9EB85@gmail.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: quoted-printable
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 08:33:37 -0000


On 12.05.11 11:02, =8Aimun Mikecin wrote:
>
> You will not find a rotating hard drive that has a write latency anythi=
ng near the write latency of even a slowest SSD you can find on the marke=
t.
>
Cheap SSDs do not have acceptable latency, when saturated with writes.=20
Not to speak of throughput.

Truth is, SLOG should be on write-optimized SLC SSD. Any tricks, such as =

compression, manufacturers make with consumer products do influence=20
Windows benchmarks, but are unlikely to change laws of physics.

For most small installs, using rotating magnetic media (cheap!) as SLOG, =

can have dramatic performance improvement compared to not using any SLOG.=


Daniel


From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 08:34:31 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id F11C31065672
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 08:34:31 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta09.emeryville.ca.mail.comcast.net
	(qmta09.emeryville.ca.mail.comcast.net [76.96.30.96])
	by mx1.freebsd.org (Postfix) with ESMTP id D59A88FC1C
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 08:34:31 +0000 (UTC)
Received: from omta10.emeryville.ca.mail.comcast.net ([76.96.30.28])
	by qmta09.emeryville.ca.mail.comcast.net with comcast
	id iYa41g0010cQ2SLA9YaX1b; Thu, 12 May 2011 08:34:31 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta10.emeryville.ca.mail.comcast.net with comcast
	id iYaW1g0021t3BNj8WYaWHr; Thu, 12 May 2011 08:34:31 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id D7B06102C36; Thu, 12 May 2011 01:34:29 -0700 (PDT)
Date: Thu, 12 May 2011 01:34:29 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Daniel Kalchev <daniel@digsys.bg>
Message-ID: <20110512083429.GA58841@icarus.home.lan>
References: <4DCA5620.1030203@dannysplace.net>
	<4DCB455C.4020805@dannysplace.net>
	<alpine.GSO.2.01.1105112146500.20825@freddy.simplesystems.org>
	<20110512033626.GA52047@icarus.home.lan>
	<4DCB7F22.4060008@digsys.bg>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4DCB7F22.4060008@digsys.bg>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 08:34:32 -0000

On Thu, May 12, 2011 at 09:33:06AM +0300, Daniel Kalchev wrote:
> On 12.05.11 06:36, Jeremy Chadwick wrote:
> >On Wed, May 11, 2011 at 09:51:58PM -0500, Bob Friesenhahn wrote:
> >>On Thu, 12 May 2011, Danny Carroll wrote:
> >>>Replying to myself in order to summarise the recommendations (when using
> >>>v28):
> >>>- Don't use SSD for the Log device.  Write speed tends to be a problem.
> >>DO use SSD for the log device.  The log device is only used for
> >>synchronous writes.  Except for certain usages (E.g. database and
> >>NFS server) most writes will be asynchronous and never be written to
> >>the log.  Huge synchronous writes will also bypass the SSD log
> >>device. The log device is for reducing latency on small synchronous
> >>writes.
> >Bob, please correct me if I'm wrong, but as I understand it a log device
> >(ZIL) effectively limits the overall write speed of the pool itself.
> >
> Perhaps I misstated it in my first post, but there is nothing wrong
> with using SSD for the SLOG.
> 
> You can of course create usage/benchmark scenario, where an (cheap)
> SSD based SLOG will be worse than an (fast) HDD based SLOG,
> especially if you are not concerned about latency. The SLOG resolves
> two issues, it increases the pool throughput (primary storage) by
> removing small synchronous writes from it, that will unnecessarily
> introduce head movement and more IOPS and it provided low latency
> for small synchronous writes.

I've been reading about this in detail here:

http://constantin.glez.de/blog/2010/07/solaris-zfs-synchronous-writes-and-zil-explained

I had no idea the primary point of a SLOG was to deal with applications
that make use of O_SYNC.  I thought it was supposed to improve write
performance for both asynchronous and synchronous writes.  Obviously I'm
wrong here.

The author's description (at that URL) of an example scenario makes
little sense to me; there's a story he tells referring to a bank and a
financial transaction of US$699 performed which got cached in RAM and
then the system lost power -- and how the intent log on a filesystem
would be replayed during reboot.

What guarantee is there that the intent log -- which is written to the
disk -- actually got written to the disk in the middle of a power
failure?  There's a lot of focus there on the idea that "the intent log
will fix everything, but may lose writes", but what guarantee do I have
that the intent log isn't corrupt or botched during a power failure?

I guess this is why others have mentioned the importance of BBUs and
supercaps, but I don't know what guarantee there is that during a power
failure there won't be some degree of filesystem corruption or lost
data.

There's a lot about ensuring/guaranteeing filesystem integrity I've to
learn.

> The later is only valid if the SSD is sufficiently write-optimized.
> Most consumer SSDs end up saturated by writes. Sequential write IOPS
> is what matters here.

Oh, I absolutely agree on this point.  So basically consumer-level SSDs
that don't provide extreme write speed benefits (compared to a classic
MHDD) -- not discussing seek times here, we all know SSDs win there --
probably aren't good candidates for SLOGs.

What's interesting about the focus on IOPS is that Intel SSDs, in the
consumer class, still trump their competitors.  But given that your
above statement focuses on sequential writes, and the site I provided is
quite clear about what happens to sequential writes on Intel SSD that
doesn't have TRIM..... Yeah, you get where I'm going with this.  :-)

> About TRIM. As it was already mentioned, you will use only small
> portion of an (for example) 32GB SSD for the SLOG. If you do not
> allocate the entire SSD, then wear leveling will be able to play
> well and it is very likely you will not suffer any performance
> degradation.

That sounds ideal, though I'm not sure about the "won't suffer ANY
performance degradation" part.  I think degradation is just less likely
to be witnessed.

I should clarify on what "allocate" in the above paragraph means (for
readers, not for you Daniel :-) ): it means disk space actually used
(LBAs actually written to).  Wear levelling works better when there's
more available (unused) flash.  The more full the disk (filesystem(s))
is, the worse the wear levelling algorithm performs.

> By the way, I do not believe Windows benchmark has any significance
> in our ZFS usage for the SSDs. How is TRIM implemented in Windows?
> How does it relate to SSD usage as SLOG and L2ARC?

Yeah, I knew someone would go down this road.  Sigh.  I strongly believe
it does have relevance.  The relevance is in the fact that the non-TRIM
benchmarks (read: an OS that has TRIM support but the SSD itself does
not, therefore TRIM cannot be used) are strong indicators that the
performance of the SSD -- sequential reads and writes both -- greatly
degrade without TRIM over time.  This is also why you'll find people
(who cannot use TRIM) regularly advocating an entire format (writing
zeros to all LBAs on the disk) after prolonged use without TRIM.

I don't know how TRIM is implemented with NTFS in Windows.

> How can ever TRIM support influence reading from the drive?!

I guess you want more proof, so here you go.  Again, the authors wrote a
bunch of data to the filesystem, took a sequential read benchmark, then
induced TRIM and took another sequential read benchmark.  The difference
is obvious.  This is an X25-V, however, which is the "low-end" of the
consumer series, so the numbers are much worse -- but this is a drive
that runs for around US$100, making it appealing to people:

http://www.anandtech.com/show/3756/2010-value-ssd-100-roundup-kingston-and-ocz-take-on-intel/5

I imagine the reason this happens is similar to why memory performance
degrades under fragmentation or when there's a lot of "middle-man stuff"
going on.

"Middle-man stuff" in this case means the FTL inside of the SSD which is
used to correlate LBAs with physical NAND flash pages (and the
physically separate chips; it's not just one big flash chip you know).
NAND flash pages tend to be something like 256KByte or 512KByte in size,
so erasing one means no part of it should be in use by the OS or
underlying filesystem.

How does the SSD know what's used by the OS?  It has to literally keep
track of all the LBAs written to.  I imagine that list is extremely
large and takes time to iterate over.

TRIM allows the OS to tell the underlying SSD "LBAs x-y aren't in use
any more", which probably removes an entry from the FTL flash<->LBA map,
and even does things like move data around between flash pages so that
it can erase a NAND flash page.  It can do the latter given the role of
the FTL acting as a "middle-man" as noted above.

> TRIM is an slow operation. How often are these issued?

Good questions, for which I have no answer.  The same could be asked of
any OS however, not just Windows.  And I've asked the same question
about SSDs internal "garbage collection" too.  I have no answers, so you
and I are both wondering the same question.  And yes, I am aware TRIM is
a costly operation.

There's a description I found of the process that makes a lot of sense,
so rather than re-word it I'll just include it here:

http://www.enterprisestorageforum.com/technology/article.php/11182_3910451_2/Fixing-SSD-Performance-Degradation-Part-1.htm

See the paragraph starting with "Another long-awaited technique".

> What is the impact of issuing TRIM to an otherwise loaded SSD?

I'm not sure if "loaded" means "heavy I/O load" or "heavily used"
(space-wise).  If you meant "heavy I/O load": as I understand it --
following forums, user experiences, etc. -- a heavily-used drive which
hasn't had TRIM issued tends to perform worse as time goes on.  Most
people with OSes that don't have TRIM (OS X, Windows XP, etc.) tend to
resort to a full format of the SSD (every LBA written zero, e.g. the -E
flag to newfs) every so often. 

The interval TRIM should be performed is almost certainly up for
discussion, but I can't provide any advice because no OS I run or use
seems to implement it (aside from FreeBSD UFS, and that seems to issue
TRIM on BIO_DELETE via GEOM).

(Inline EDIT: Holy crap, I just realised TRIM support has to be enabled
via tunefs on UFS filesystems.  I started digging through the code and I
found the FS_TRIM bit; gee, maybe I should use tunefs -t.  I wish I had
known this; I thought it just did this automatically if the underlying
storage device provided TRIM support.  Sigh)

Here's some data which probably won't mean much to you since it's from a
Windows machine, but the important part is that it's from a Windows XP
SP3 machine -- XP has no TRIM support.

 Disk: Intel 320-series SSD; model SSDSA2CW080G3; 80GB, MLC
   SB: Intel ICH9, in "Enhanced" mode (non-AHCI, non-RAID)
   OS: Windows XP SP3
   FS: NTFS, 4KB cluster size, NTFS atime turned off, NTFS partition
       properly 4KB-aligned
Space: Approximately 6GB of 80GB used.

This disk is very new (only 436 power-on hours).

Here are details of the disk:

http://jdc.parodius.com/freebsd/i320ssd/ssdsa2cw080g3_01.png

And a screen shot of a sequential read benchmark which should speak for
itself.  Block read size is 64KBytes.  This is a raw device read and not
a filesystem-level read, meaning NTFS isn't in the picture here.  What's
interesting is the degradation in performance around the 16GB region:

http://jdc.parodius.com/freebsd/i320ssd/ssdsa2cw080g3_02.png

Next, a screen shot of a filesystem-based benchmark.  This is writing
and reading a 256MByte file (to the NTFS filesystem) using different
block sizes.  Horizontal access is block size, vertical axis is speed.
Reads are the blue bar, writes are the orange:

http://jdc.parodius.com/freebsd/i320ssd/ssdsa2cw080g3_03.png

And finally, the same device-level sequential read benchmark performed
again to show what effect the write benchmarks may have had on the disk:

http://jdc.parodius.com/freebsd/i320ssd/ssdsa2cw080g3_04.png

Sadly I can't test sequential writes because it's an OS disk.

So, my findings more or less mimic that of what other people are seeing
as well.  Given that the read benchmarks are device-level and not
filesystem-level, one shouldn't be pondering Windows -- one should be
pondering the implications of lack of TRIM and what's going on within
the drive itself.

I also have an Intel 320-series SSD in my home FreeBSD box as an OS disk
(UFS2 / UFS2+SU filesystems).  The amount of space used there is lower
(~4GB).  Do you know of some benchmarking utilities which do
device-level reads and can plot or provide metrics for LBA offsets or
equivalent?  I could compare that to the Windows benchmarks, but still,
I think we're barking up the wrong tree.  I'm really not comparing ZFS
to NTFS here; I'm saying that TRIM addresses performance problems (to
some degree) regardless of filesystem type.

Anyway, I think that's enough from me for now.  I've written this over
the course of almost 2 hours.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 08:42:49 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 324A8106564A
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 08:42:49 +0000 (UTC)
	(envelope-from kraduk@gmail.com)
Received: from mail-wy0-f182.google.com (mail-wy0-f182.google.com
	[74.125.82.182])
	by mx1.freebsd.org (Postfix) with ESMTP id AC9788FC1B
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 08:42:47 +0000 (UTC)
Received: by wyf23 with SMTP id 23so1318541wyf.13
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 01:42:46 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:in-reply-to:references:date
	:message-id:subject:from:to:cc:content-type;
	bh=5l4bzEdxUZ/R8DtBDQb0sLziqCdB/UrDGDhJdcjONQ4=;
	b=jdlFa59LRxFvnKHrhqlB7C3ttX6u9ULrPKVNYz3Yqm+KSrAzN2NSqyNJ4AodndsC0e
	qD1kc3bIBTxZ1ufifWMwAayEJYiodRJq1NoYBdGULbKgo0A6jcAj5eBtv93JhmQkwWl5
	HXYym3vZ4SVYr/jcGydUD5mzNtIoV6zJ9cZ/E=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	b=TANSlEFNsR6pPYmSfIi/dV/NStks5KlNMaOxE32iG7yQEp+HtniZQjf/6e2w+Byj2q
	SClNtxWGaiU/G9aDtdyJywTmsSuLXCODRvuFd1JaC2secG/Zpz5TS3+pulQRINbHrTPM
	LZMJ2N7FitD8/oLd1xpXbigv4huTFTLUWPE9Y=
MIME-Version: 1.0
Received: by 10.216.63.130 with SMTP id a2mr1398863wed.61.1305189765647; Thu,
	12 May 2011 01:42:45 -0700 (PDT)
Received: by 10.216.93.70 with HTTP; Thu, 12 May 2011 01:42:45 -0700 (PDT)
In-Reply-To: <20110511223849.GA65193@DataIX.net>
References: <4DCA5620.1030203@dannysplace.net>
	<20110511100655.GA35129@icarus.home.lan>
	<4DCA66CF.7070608@digsys.bg>
	<20110511105117.GA36571@icarus.home.lan> <4DCA7056.20200@digsys.bg>
	<20110511120830.GA37515@icarus.home.lan>
	<20110511223849.GA65193@DataIX.net>
Date: Thu, 12 May 2011 09:42:45 +0100
Message-ID: <BANLkTi=vOD5EJSo14otdD+CxThJ7kE5txA@mail.gmail.com>
From: krad <kraduk@gmail.com>
To: Jason Hellenthal <jhell@dataix.net>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 08:42:49 -0000

On 11 May 2011 23:38, Jason Hellenthal <jhell@dataix.net> wrote:

>
> Jeremy,
>
> On Wed, May 11, 2011 at 05:08:30AM -0700, Jeremy Chadwick wrote:
> > On Wed, May 11, 2011 at 02:17:42PM +0300, Daniel Kalchev wrote:
> > > On 11.05.11 13:51, Jeremy Chadwick wrote:
> > > >Furthermore, TRIM support doesn't exist with ZFS on FreeBSD, so folk=
s
> > > >should also keep that in mind when putting an SSD into use in this
> > > >fashion.
> > >
> > > By the way, what would be the use of TRIM for SLOG and L2ARC devices?
> > > I see absolutely no benefit from TRIM for the L2ARC, because it is
> > > written slowly (on purpose).  Any current, or 1-2 generations back SS=
D
> > > would handle that write load without TRIM and without any performance
> > > degradation.
> > >
> > > Perhaps TRIM helps with the SLOG. But then, it is wise to use SLC
> > > SSD for the SLOG, for many reasons. The write regions on the SLC
> > > NAND should be smaller (my wild guess, current practice may differ)
> > > and the need for rewriting will be small. If you don't need to
> > > rewrite already written data, TRIM does not help. Also, as far as I
> > > understand, most "serious" SSDs (typical for SLC I guess) would have
> > > twice or more the advertised size and always write to fresh cells,
> > > scheduling an background erase of the 'overwritten' cell.
> >
> > AFAIK, drive manufacturers do not disclose just how much reallocation
> > space they keep available on an SSD.  I'd rather not speculate as to ho=
w
> > much, as I'm certain it varies per vendor.
> >
>
> Lets not forget here: The size of the separate log device may be quite
> small. A rule of thumb is that you should size the separate log to be abl=
e
> to handle 10 seconds of your expected synchronous write workload. It woul=
d
> be rare to need more than 100 MB in a separate log device, but the
> separate log must be at least 64 MB.
>
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
>
>
> So in other words how much is TRIM really even effective give the above ?
>
> Even with a high database write load on the disks at full compacity of th=
e
> incoming link I would find it hard to believe that anyone could get the
> ZIL to even come close to 512MB.
>
>
> Given most SSD's come at a size greater than 32GB I hope this comes as a
> early reminder that the ZIL you are buying that disk for is only going to
> be using a small percent of that disk and I hope you justify cost over it=
s
> actual use. If you do happen to justify creating a ZIL for your pool then
> I hope that you partition it wisely to make use of the rest of the space
> that is untouched.
>
> For all other cases I would reccomend if you still want to have a ZIL tha=
t
> you take some sort of PCI->SD CARD or USB stick into account with
> mirroring.
>
> --
>
>  Regards, (jhell)
>  Jason Hellenthal
>
>
> You have just spotted a gap in the market I suspect. Maybe SSD
manufacturers need to produce a sata based ssd of 1 or 2 gb of the fastest
write speed available flash on the market. Produce it for < =A350 and you
should have a big market

From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 08:42:52 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 45BBE106566B
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 08:42:52 +0000 (UTC)
	(envelope-from alexander@leidinger.net)
Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de
	[217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id C998F8FC1C
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 08:42:51 +0000 (UTC)
Received: from outgoing.leidinger.net (p5B155ED8.dip.t-dialin.net
	[91.21.94.216])
	by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id E4805844015;
	Thu, 12 May 2011 10:42:34 +0200 (CEST)
Received: from webmail.leidinger.net (webmail.Leidinger.net
	[IPv6:fd73:10c7:2053:1::2:102])
	by outgoing.leidinger.net (Postfix) with ESMTP id AD4211289;
	Thu, 12 May 2011 10:42:31 +0200 (CEST)
Received: (from www@localhost)
	by webmail.leidinger.net (8.14.4/8.13.8/Submit) id p4C8gUV1019205;
	Thu, 12 May 2011 10:42:30 +0200 (CEST)
	(envelope-from Alexander@Leidinger.net)
Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by
	webmail.leidinger.net (Horde Framework) with HTTP; Thu, 12 May 2011
	10:42:30 +0200
Message-ID: <20110512104230.588214snqsg1gkn4@webmail.leidinger.net>
Date: Thu, 12 May 2011 10:42:30 +0200
From: Alexander Leidinger <Alexander@Leidinger.net>
To: fbsd@dannysplace.net
References: <4DCA5620.1030203@dannysplace.net>
	<4DCB455C.4020805@dannysplace.net>
In-Reply-To: <4DCB455C.4020805@dannysplace.net>
MIME-Version: 1.0
Content-Type: text/plain;
 charset=UTF-8;
 DelSp="Yes";
 format="flowed"
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.6)
X-EBL-MailScanner-Information: Please contact the ISP for more information
X-EBL-MailScanner-ID: E4805844015.AF78F
X-EBL-MailScanner: Found to be clean
X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN,
	SpamAssassin (not cached, score=0.6, required 6, autolearn=disabled,
	J_CHICKENPOX_56 0.60)
X-EBL-MailScanner-From: alexander@leidinger.net
X-EBL-MailScanner-Watermark: 1305794557.33711@BBEaED1K/shx8xOIeQ2VMw
X-EBL-Spam-Status: No
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 08:42:52 -0000

Quoting Danny Carroll <fbsd@dannysplace.net> (from Thu, 12 May 2011  
12:26:36 +1000):

> Replying to myself in order to summarise the recommendations (when using
> v28):
>  - Don't use SSD for the Log device.  Write speed tends to be a problem.

It depends. You could buy a lot of large and low-power (and sort of  
slow) disks for the raw storage space, and 2 fast but small disks for  
the log. Even if they are not SSDs, this could improve the write  
throughput of the system (also depends upon the bus bandwith of the  
system, the disk-controller (SATA/SCSI) and your workload).

The important part is that normally the log devices should have a  
lower latency and faster transfer speed than the main pool to be  
effective (you may get an improvement even if the devices have the  
same specs, as the main pool does not see the same workload then, but  
it depends upon your workload).

>  - SSD ok for cache if the sizing is right, but without TRIM, don't
> expect to take full advantage of the SSD.

As long as we do not have TRIM to play with, we can not give a proper  
answer here. For sure we can tell that a SSD increases the max  
performance a pool can deliver by a good amount.

I would expect that TRIM can give some improvement for a cache device,  
but I do not expect that it is much. If it is more than 10% I would be  
very surprised. I expect the improvement more in the <=5% range for  
the cache (which may make a difference in read-cases where you are  
hitting the limit).

>  - Do use two devices for log and mirror them with ZFS.  Bad things
> *can* happen if*all* the log devices die.

s/can/will/ as you will lose data in this case. The difference between  
v15 and v28 is the amount of data you lose (the entire pool vs. only  
what is still on the log devices)

>  - Don't colocate L2ARC and Log devices.

This depends upon the devices and your workload. If you do not have a  
lot of throughput, but your applications has some hard requirements  
regarding the latency, it may make sense.

Without measuring the outcome on your own workload, there is not  
really a way to answer this, but if your workload is read and write  
limited, to add first a separate log device. This way the pool is  
freed of the sync-writes, the read performance should increase and the  
write performance too as the data goes to the log device first without  
interfering with reads (in this case it matters more that the device  
is a separate device than that the device is significantly faster).  
Only when this is done and there is more demand regarding reads, I  
would add a significantly faster cache device (or more RAM, depending  
on the specs of the machine and the workload).

Another of disk-tuning: if you are doing something like this on your  
workstation / development system where you just want to have the  
comfort of not waiting for the disks (and the workload does not demand  
a lot of reads+writes at the same time), you could give a shared  
cache/log device a try (but the device needs to be significantly  
faster to feel a difference).

> Remaining questions.
> - Is there any advantage to using a spare partition on a SCSI or SATA
> drive as L2Arc?  Assuming it was in the machine already but doing nothing?

It depends. In general: if the cache is faster (by an amount which  
matters to you) than the pool, it helps. So for your particular case:  
if the other partitions on such a drive are only used occasionally and  
the drive is faster than your pool, you could get a little bit more  
out of it if you add the unused partition as a cache.

As like for all RAIDs, more spindles (disks) give better performance,  
as such it could help if the cache has the same characteristics as the  
rest of the pool, but this depends upon your workload. In such a case  
it is probably better to add the disks to the pool instead of using it  
as a cache. A definitive answer to this can only be obtained by  
running your workload on both setups and compare the results (zpool  
iostat / gstat).

Bye,
Alexander.

-- 
There are three kinds of people: men, women, and unix.

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137

From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 08:45:05 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8353C1065687
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 08:45:05 +0000 (UTC)
	(envelope-from numisemis@gmail.com)
Received: from mail-ww0-f50.google.com (mail-ww0-f50.google.com [74.125.82.50])
	by mx1.freebsd.org (Postfix) with ESMTP id 061F38FC20
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 08:45:04 +0000 (UTC)
Received: by wwc33 with SMTP id 33so1449722wwc.31
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 01:45:03 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:subject:mime-version:content-type:from
	:in-reply-to:date:cc:message-id:references:to:x-mailer;
	bh=plLgz8RQpu1MLLWCuOQcV4ARzhPOFTu9RwDrXTs5wgw=;
	b=FrakORjqE3gRQfauCysMMMTNxZHu4vR9CTnKbXQfPiJtBogU8YAfkuN/+yCa3x60Sk
	jRdNsKlkMg2JW+Ic5cx97ZnUrKFsXBduYhxYd56iCt+WymLPpj/9jkfk3X0pQo/9SxWp
	mg/GZNpMt0lj/Av2WZewBbfLzaF/EMgxby428=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=subject:mime-version:content-type:from:in-reply-to:date:cc
	:message-id:references:to:x-mailer;
	b=AUmwpc5ZjQz/Enf680Jl7SiWUBrLkU5W9K2rYvOSOxCdrVr2kVf8OxLozXOJ56Rq2v
	BbhNofEA/ziFWboG5wHhtBqVPojor7JwMQKzeFfXbdzB7zMdo+nWMFaLDYp4J3rZnlxW
	LENcru6LLwRjrSeoSxTRw6zL5jmuuIRJymu7s=
Received: by 10.216.69.140 with SMTP id n12mr7610262wed.32.1305189903478;
	Thu, 12 May 2011 01:45:03 -0700 (PDT)
Received: from sime-imac.logos.hr ([213.147.110.159])
	by mx.google.com with ESMTPS id f52sm487017wes.35.2011.05.12.01.45.02
	(version=TLSv1/SSLv3 cipher=OTHER);
	Thu, 12 May 2011 01:45:03 -0700 (PDT)
Mime-Version: 1.0 (Apple Message framework v1084)
From: =?iso-8859-2?Q?=A9imun_Mikecin?= <numisemis@gmail.com>
In-Reply-To: <20110512083429.GA58841@icarus.home.lan>
Date: Thu, 12 May 2011 10:45:01 +0200
Message-Id: <A3B76BB6-49DC-4C2F-BD2B-9A0C62F4D24C@gmail.com>
References: <4DCA5620.1030203@dannysplace.net>
	<4DCB455C.4020805@dannysplace.net>
	<alpine.GSO.2.01.1105112146500.20825@freddy.simplesystems.org>
	<20110512033626.GA52047@icarus.home.lan>
	<4DCB7F22.4060008@digsys.bg>
	<20110512083429.GA58841@icarus.home.lan>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
X-Mailer: Apple Mail (2.1084)
Content-Type: text/plain;
	charset=us-ascii
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 08:45:05 -0000

On 12. svi. 2011., at 10:34, Jeremy Chadwick wrote:
>=20
> I had no idea the primary point of a SLOG was to deal with =
applications
> that make use of O_SYNC.  I thought it was supposed to improve write
> performance for both asynchronous and synchronous writes.  Obviously =
I'm
> wrong here.

If the application is not using O_SYNC, write operation returns to the =
app before the data is actually written.

> What guarantee is there that the intent log -- which is written to the
> disk -- actually got written to the disk in the middle of a power
> failure?  There's a lot of focus there on the idea that "the intent =
log
> will fix everything, but may lose writes", but what guarantee do I =
have
> that the intent log isn't corrupt or botched during a power failure?

I expect that checksumming also works for ZIL (anybody knows?). If that =
is the case, corruption would be detected, but you will have lost data =
unless you are using mirrored slog devices.


From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 08:58:05 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B088B106567A
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 08:58:05 +0000 (UTC)
	(envelope-from kraduk@gmail.com)
Received: from mail-ww0-f50.google.com (mail-ww0-f50.google.com [74.125.82.50])
	by mx1.freebsd.org (Postfix) with ESMTP id 3EC328FC1E
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 08:58:05 +0000 (UTC)
Received: by wwc33 with SMTP id 33so1460113wwc.31
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 01:58:04 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:in-reply-to:references:date
	:message-id:subject:from:to:cc:content-type;
	bh=zismQCfy0mUCIfSmEvlGeSMHB/NeG1a9TKl3IfhZrz8=;
	b=WQcE56jw8sq7oJ7uk3Y0WfboQ2b67Y7pq6vZv88Wq02MfKzWcUgHG5wJUvl3pO8sUB
	F0y24dqoAFuisfPbk5GvadprsSpMWXU9mmVjTv9SM22Yh6YeYARDeIvrAizjJPrNlf4z
	EIEOoPsQ2HNoUyhypOrppr57B99J3f63qjzVA=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	b=TL8ZzzQv0RBQoh5nrQ96Yx9PwiFcPOjrope0+jb+3JjMKanwprw2M5t6QwcRAAeIQc
	F+KN0vaHC4p7ujg+0QWJnL5x2YP2qF5O/n7Q2O8feAvq8CU9OTZwLeXf85YzLXqvDJ60
	6+Ejq3KFDHJ5oPCIL74zhNUbDJx1UlQvp8g7o=
MIME-Version: 1.0
Received: by 10.216.14.212 with SMTP id d62mr1464049wed.91.1305190684314; Thu,
	12 May 2011 01:58:04 -0700 (PDT)
Received: by 10.216.93.70 with HTTP; Thu, 12 May 2011 01:58:04 -0700 (PDT)
In-Reply-To: <A0F4CB3E-1068-4E19-A50F-AA5620A9EB85@gmail.com>
References: <4DCA5620.1030203@dannysplace.net>
	<4DCB455C.4020805@dannysplace.net> <4DCB81B8.6070301@digsys.bg>
	<A0F4CB3E-1068-4E19-A50F-AA5620A9EB85@gmail.com>
Date: Thu, 12 May 2011 09:58:04 +0100
Message-ID: <BANLkTikfAKYVm=EtVXVy8kPCmB=AOKjgLA@mail.gmail.com>
From: krad <kraduk@gmail.com>
To: =?UTF-8?Q?=C5=A0imun_Mikecin?= <numisemis@gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.5
Cc: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 08:58:05 -0000

2011/5/12 =C5=A0imun Mikecin <numisemis@gmail.com>

>
> On 12. svi. 2011., at 08:44, Daniel Kalchev wrote:
>
> > On 12.05.11 05:26, Danny Carroll wrote:
> >>
> >>  - Don't use SSD for the Log device.  Write speed tends to be a proble=
m.
> > It all depends on your usage. You need to experiment, unfortunately.
>
> What is the alternative for log devices if you are not using SSD?
> Rotating hard drives?
>
> AFAIK, two factors define the speed of log device: write transfer rate an=
d
> write latency.
> You will not find a rotating hard drive that has a write latency anything
> near the write latency of even a slowest SSD you can find on the market.
> On the other hand, only a very few rotating hard drives have a write
> transfer rate that can be compared to SSD's.
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"
>


ive seen an interesting article using 2x ram disks on two separate computer=
s
(UPS backed) shared out via iscsi over a high speed network. It had very
successful results. Sounds a bit brave in a production environment to me
though

From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 08:59:49 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 321D4106564A
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 08:59:49 +0000 (UTC)
	(envelope-from fullermd@over-yonder.net)
Received: from thyme.infocus-llc.com (server.infocus-llc.com [206.156.254.44])
	by mx1.freebsd.org (Postfix) with ESMTP id CDB0B8FC0A
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 08:59:48 +0000 (UTC)
Received: from draco.over-yonder.net (c-75-64-226-141.hsd1.ms.comcast.net
	[75.64.226.141])
	(using TLSv1 with cipher ADH-CAMELLIA256-SHA (256/256 bits))
	(No client certificate requested)
	by thyme.infocus-llc.com (Postfix) with ESMTPSA id 8CFF837B454;
	Thu, 12 May 2011 03:40:59 -0500 (CDT)
Received: by draco.over-yonder.net (Postfix, from userid 100)
	id 0456661C42; Thu, 12 May 2011 03:40:59 -0500 (CDT)
Date: Thu, 12 May 2011 03:40:59 -0500
From: "Matthew D. Fuller" <fullermd@over-yonder.net>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Message-ID: <20110512084058.GP90856@over-yonder.net>
References: <4DCA5620.1030203@dannysplace.net>
	<20110511100655.GA35129@icarus.home.lan>
	<4DCA66CF.7070608@digsys.bg>
	<20110511105117.GA36571@icarus.home.lan> <4DCA7056.20200@digsys.bg>
	<20110511120830.GA37515@icarus.home.lan>
	<20110511223849.GA65193@DataIX.net>
	<20110512010433.GA48863@icarus.home.lan>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110512010433.GA48863@icarus.home.lan>
X-Editor: vi
X-OS: FreeBSD <http://www.freebsd.org/>
User-Agent: Mutt/1.5.21-fullermd.4 (2010-09-15)
X-Virus-Scanned: clamav-milter 0.97 at thyme.infocus-llc.com
X-Virus-Status: Clean
Cc: freebsd-fs@freebsd.org, Jason Hellenthal <jhell@DataIX.net>
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 08:59:49 -0000

On Wed, May 11, 2011 at 06:04:33PM -0700 I heard the voice of
Jeremy Chadwick, and lo! it spake thus:
>
> (What confuses me about the "idle GC" method is how it determines
> what it can erase -- if the OS didn't tell it what it's using, how
> does it know it can erase the page?)

I'm no expert either, but the following is my understanding...


Remember that SSD's (like ZFS, a layer higher up) don't overwrite
blocks, they write new data to a new block and update the pointers the
level above them (the disk LBA in this case) to point at the new
location.

So when you overwrite LBA 12345 on the disk with new data, what
actually happens is that the SDD writes that data to currently empy
flash $SOMEWHERE, and updates its internal table so that LBA 12345
request go there.  The bit of flash that was previously considered LBA
12345 still contains the old data, but is now "free" as far as the
drive is concerned (though not immediately writable, as it needs to be
erased first).  Sorta like rm'ing a file doesn't actually delete its
contents, just the name pointing to it.


Where GC comes in is that the size you can write/address is smaller
than the size flash has to be erased in.  To pick numbers that are in
the right ballpark (it will vary per drive), you have 512 byte blocks
that you can read/write (like any other drive), but you can only erase
a page of 8k at a time.  So let's suppose you write 16 kB of data to a
fresh drive.  You've written 32 512-byte blocks, which completely fill
up 2 8k pages.  All nice and compact.

Now let's suppose you overwrite from 4k-8k and 12k-16k.  Now we have
8k of remaining useful data, but it's spread out over 2 8k pages (4k
in each).  We can't write new stuff those two now "empty" 4k sections,
because we have to erase before we can write, and we can only erase
the whole 8k page.  This is where the GC kicks in; it knows (because
those two LBA ranges have been overwritten) that they're no longer
needed, and can notice that all the remaining important data in those
two pages can actually fit in a single page.  So, it can read 0k-4k
and 8k-12k, and write them into a new empty page.  Update its LBA map
to point those logical addresses over to the new in-flash location,
and now the entirety of those two original 8k pages is unused.  So now
it can go ahead and erase them both, and put them on the "ready for
reuse" list.


Now, as for TRIM.  There are two ways that a block (or set of blocks)
can become "no longer needed".  One is that they're overwritten with
new data; the drive knows that and can mark them as unused like above.
The other is that they contain data for a file that's deleted.  But
the drive has no idea what files being deleted means.  All that
happens from the drive's perspective is an overwrite of some LBA's
that, to the OS, contain directory info.  It has no way of knowing
that impacts these other LBA's that held a file.  TRIM allows the OS
to say "OK, these LBA's?  Yeah, you can trash 'em now."  And so they
end up on the dead list, ready for the GC to collapse them away like
above.


So neither TRIM nor GC is a replacement for the other.  GC is about
collapsing away reapable space (and also serves a purpose in
wear-levelling, but that's unimportant in this discussion).  The drive
automatically knows about space that's reapable because it was
rewritten.  TRIM lets it know about space that's reapable because of
deletion.  Without that, you could delete a file (so LBA 54321 no
longer contains useful info, and doesn't need to be preserved), but
since the drive doesn't know that, not only can the GC not compact
away that space, it has to go ahead and re-copy that block as if it
held good data when it shuffles stuff around, so you're creating extra
wear.

GC can't make TRIM "unnecessary", any more than a book can make a
flashlight unnecessary.  TRIM is one of the ways you provide info for
the GC to use.  One thing that CAN make TRIM less important is writing
in a "compact" manner (e.g., always write new data to the lowest
available LBA).  Assuming you oscillate around a steady disk usage (or
slowly increase), that means that you'll tend to overwrite space for
deleted files relatively soon, so the drive gets to know about the
reapable space that way.  With more random or other LBA allocation, or
if you shrink the used space significantly, a deleted block may hang
around unwritten to for much longer, and so have more chance for the
GC to unnecessarily recopy and recopy it.


This leaves entirely to one side annoying implementational issues.
I'm given to understand that due to some combination of "dumb firmware
implementation" and "dumb standardized requirements", TRIM can be an
unbelievable expensive command, so doing it as part of e.g. 'rm' may
damage performance outrageously.  That may point to a better
implementation being "rack up a list of LBA's and flush periodically",
or "scan filesystem weekly and send TRIM's for all empty LBA's" or the
like.  But again, that's implementation.


-- 
Matthew Fuller     (MF4839)   |  fullermd@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
           On the Internet, nobody can hear you scream.

From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 09:05:51 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 50CF7106566C
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 09:05:51 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta05.emeryville.ca.mail.comcast.net
	(qmta05.emeryville.ca.mail.comcast.net [76.96.30.48])
	by mx1.freebsd.org (Postfix) with ESMTP id 34FF38FC0A
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 09:05:50 +0000 (UTC)
Received: from omta01.emeryville.ca.mail.comcast.net ([76.96.30.11])
	by qmta05.emeryville.ca.mail.comcast.net with comcast
	id iZ5q1g0040EPchoA5Z5qF5; Thu, 12 May 2011 09:05:50 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta01.emeryville.ca.mail.comcast.net with comcast
	id iZ5Q1g00W1t3BNj8MZ5WCz; Thu, 12 May 2011 09:05:45 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 74C9E102C19; Thu, 12 May 2011 02:05:24 -0700 (PDT)
Date: Thu, 12 May 2011 02:05:24 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: ?imun Mikecin <numisemis@gmail.com>
Message-ID: <20110512090524.GA2106@icarus.home.lan>
References: <4DCA5620.1030203@dannysplace.net>
	<4DCB455C.4020805@dannysplace.net>
	<alpine.GSO.2.01.1105112146500.20825@freddy.simplesystems.org>
	<20110512033626.GA52047@icarus.home.lan>
	<4DCB7F22.4060008@digsys.bg>
	<20110512083429.GA58841@icarus.home.lan>
	<A3B76BB6-49DC-4C2F-BD2B-9A0C62F4D24C@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <A3B76BB6-49DC-4C2F-BD2B-9A0C62F4D24C@gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 09:05:51 -0000

On Thu, May 12, 2011 at 10:45:01AM +0200, ?imun Mikecin wrote:
> On 12. svi. 2011., at 10:34, Jeremy Chadwick wrote:
> > 
> > I had no idea the primary point of a SLOG was to deal with applications
> > that make use of O_SYNC.  I thought it was supposed to improve write
> > performance for both asynchronous and synchronous writes.  Obviously I'm
> > wrong here.
> 
> If the application is not using O_SYNC, write operation returns to the
> app before the data is actually written.

Yes, I understand that -- O_SYNC is effectively the same as issuing
fsync(2) after every write(2) call.  I just thought that the ZIL
improved both synchronous and asynchronous writes, but my understanding
of the ZIL is obviously very limited.

> > What guarantee is there that the intent log -- which is written to the
> > disk -- actually got written to the disk in the middle of a power
> > failure?  There's a lot of focus there on the idea that "the intent log
> > will fix everything, but may lose writes", but what guarantee do I have
> > that the intent log isn't corrupt or botched during a power failure?
> 
> I expect that checksumming also works for ZIL (anybody knows?). If
> that is the case, corruption would be detected, but you will have lost
> data unless you are using mirrored slog devices.

I can't believe that statement either (the last line).

I guess that's also what I'm asking here -- what guarantee do you have
that even with a mirrored 2-disk SLOG (or heck, 3 or 4!) that *no data*
will be *lost* during a power outage?

It seems to me the proper phrase would be "the likelihood of losing an
entire pool during a power outage is lessened".  Alexander indirectly
hinted at this in another post of his tonight, specifically regarding
zpool v15 versus v28:

"The difference between v15 and v28 is the amount of data you lose (the
entire pool vs. only what is still on the log devices)".

This makes much more sense to me.

It seems that in a power outage, there will always be some form of data
loss.  I imagine even systems that have hardware RAM/cache with BBUs on
everything; there's always some form of caching going on *somewhere*
within a system, from CPU all the way up, that guarantees some degree of
data loss).  I guess I'm OCD'ing over the terminology here.  Sorry.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |


From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 09:12:20 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id AE855106566B
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 09:12:20 +0000 (UTC)
	(envelope-from alexander@leidinger.net)
Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de
	[217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id 33A4C8FC14
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 09:12:20 +0000 (UTC)
Received: from outgoing.leidinger.net (p5B155ED8.dip.t-dialin.net
	[91.21.94.216])
	by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id 11BA3844015;
	Thu, 12 May 2011 11:12:03 +0200 (CEST)
Received: from webmail.leidinger.net (webmail.Leidinger.net
	[IPv6:fd73:10c7:2053:1::2:102])
	by outgoing.leidinger.net (Postfix) with ESMTP id 150D4128A;
	Thu, 12 May 2011 11:12:00 +0200 (CEST)
Received: (from www@localhost)
	by webmail.leidinger.net (8.14.4/8.13.8/Submit) id p4C9BwCY026199;
	Thu, 12 May 2011 11:11:58 +0200 (CEST)
	(envelope-from Alexander@Leidinger.net)
Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by
	webmail.leidinger.net (Horde Framework) with HTTP; Thu, 12 May 2011
	11:11:58 +0200
Message-ID: <20110512111158.16451mu57sv0f8f4@webmail.leidinger.net>
Date: Thu, 12 May 2011 11:11:58 +0200
From: Alexander Leidinger <Alexander@Leidinger.net>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
References: <4DCA5620.1030203@dannysplace.net>
	<4DCB455C.4020805@dannysplace.net>
	<alpine.GSO.2.01.1105112146500.20825@freddy.simplesystems.org>
	<20110512033626.GA52047@icarus.home.lan> <4DCB7F22.4060008@digsys.bg>
	<20110512083429.GA58841@icarus.home.lan>
In-Reply-To: <20110512083429.GA58841@icarus.home.lan>
MIME-Version: 1.0
Content-Type: text/plain;
 charset=UTF-8;
 DelSp="Yes";
 format="flowed"
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.6)
X-EBL-MailScanner-Information: Please contact the ISP for more information
X-EBL-MailScanner-ID: 11BA3844015.A258E
X-EBL-MailScanner: Found to be clean
X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN,
	SpamAssassin (not cached, score=0.077, required 6,
	autolearn=disabled, TW_ZF 0.08)
X-EBL-MailScanner-From: alexander@leidinger.net
X-EBL-MailScanner-Watermark: 1305796326.56093@9cEHblS5BSxK1HZIana+XA
X-EBL-Spam-Status: No
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 09:12:20 -0000

Quoting Jeremy Chadwick <freebsd@jdc.parodius.com> (from Thu, 12 May  
2011 01:34:29 -0700):

> On Thu, May 12, 2011 at 09:33:06AM +0300, Daniel Kalchev wrote:
>> On 12.05.11 06:36, Jeremy Chadwick wrote:
>> >On Wed, May 11, 2011 at 09:51:58PM -0500, Bob Friesenhahn wrote:
>> >>On Thu, 12 May 2011, Danny Carroll wrote:
>> >>>Replying to myself in order to summarise the recommendations (when using
>> >>>v28):
>> >>>- Don't use SSD for the Log device.  Write speed tends to be a problem.
>> >>DO use SSD for the log device.  The log device is only used for
>> >>synchronous writes.  Except for certain usages (E.g. database and
>> >>NFS server) most writes will be asynchronous and never be written to
>> >>the log.  Huge synchronous writes will also bypass the SSD log
>> >>device. The log device is for reducing latency on small synchronous
>> >>writes.
>> >Bob, please correct me if I'm wrong, but as I understand it a log device
>> >(ZIL) effectively limits the overall write speed of the pool itself.
>> >
>> Perhaps I misstated it in my first post, but there is nothing wrong
>> with using SSD for the SLOG.
>>
>> You can of course create usage/benchmark scenario, where an (cheap)
>> SSD based SLOG will be worse than an (fast) HDD based SLOG,
>> especially if you are not concerned about latency. The SLOG resolves
>> two issues, it increases the pool throughput (primary storage) by
>> removing small synchronous writes from it, that will unnecessarily
>> introduce head movement and more IOPS and it provided low latency
>> for small synchronous writes.
>
> I've been reading about this in detail here:
>
> http://constantin.glez.de/blog/2010/07/solaris-zfs-synchronous-writes-and-zil-explained
>
> I had no idea the primary point of a SLOG was to deal with applications
> that make use of O_SYNC.  I thought it was supposed to improve write
> performance for both asynchronous and synchronous writes.  Obviously I'm
> wrong here.
>
> The author's description (at that URL) of an example scenario makes
> little sense to me; there's a story he tells referring to a bank and a
> financial transaction of US$699 performed which got cached in RAM and
> then the system lost power -- and how the intent log on a filesystem
> would be replayed during reboot.
>
> What guarantee is there that the intent log -- which is written to the
> disk -- actually got written to the disk in the middle of a power
> failure?  There's a lot of focus there on the idea that "the intent log
> will fix everything, but may lose writes", but what guarantee do I have
> that the intent log isn't corrupt or botched during a power failure?

The request comes in, the data is written to stable storage (as it is  
a sync-write to the SLOG), the application knows that the data has hit  
stable storage when the write-call returns (as it is a sync-write),  
the app ACKs to the other party.

Without the SLOG you can have the same, just at a lower speed (if done  
correctly). So the SLOG is not about the guarantee (you should have it  
already with a normal pool and disks which tell the truth regarding a  
cache flush), the SLOG is about a higher amount of transactions with  
such a guarantee.

>> The later is only valid if the SSD is sufficiently write-optimized.
>> Most consumer SSDs end up saturated by writes. Sequential write IOPS
>> is what matters here.
>
> Oh, I absolutely agree on this point.  So basically consumer-level SSDs
> that don't provide extreme write speed benefits (compared to a classic
> MHDD) -- not discussing seek times here, we all know SSDs win there --
> probably aren't good candidates for SLOGs.
>
> What's interesting about the focus on IOPS is that Intel SSDs, in the
> consumer class, still trump their competitors.  But given that your
> above statement focuses on sequential writes, and the site I provided is
> quite clear about what happens to sequential writes on Intel SSD that
> doesn't have TRIM..... Yeah, you get where I'm going with this.  :-)

TRIM for SLOG is IMO more important than TRIM for the cache. For the  
SLOG the write-latency matters, for the cache normally it does not  
_that much_. Remember, if you are in the case that something is moved  
from RAM to L2ARC, the data you move is not needed ATM. Data is moved  
from RAM to L2ARC because the OS decides that either the ARC is at  
some kind of high-watermark (predicting the future and make sure there  
is some free space for future data, respectively some kind of garbage  
collection), or because the OS really needs some free RAM _now_  
(either some free area in the ARC, or because an application needs  
memory). In the first case the write latency does not matter much, in  
the second case it matters (but in this case you can evaluate if  
adding more RAM is an option here).

>> About TRIM. As it was already mentioned, you will use only small
>> portion of an (for example) 32GB SSD for the SLOG. If you do not
>> allocate the entire SSD, then wear leveling will be able to play
>> well and it is very likely you will not suffer any performance
>> degradation.
>
> That sounds ideal, though I'm not sure about the "won't suffer ANY
> performance degradation" part.  I think degradation is just less likely
> to be witnessed.

IMO TRIM support for ZFS can improve the performance. IMO the most  
bang for the bucks would be to add TRIM support first (if it can not  
be added to everything at the same time) for SLOGs, then for the pool,  
and then for the cache.

My rationale here is, that if you use a SLOG you have very high  
requirements for sync-writes, and consumer SSDs could give you a lot  
of ROI if the SSD is not used completely and TRIM is used. I do not  
expect that TRIM for the cache gives a lot of ROI (less than TRIM  
support for the pool).

FYI: it also depends upon how TRIM is implemented. If you TRIM one LBA  
after another, this adds a huge amount of latency just for the TRIM. I  
do not know if TRIMming a range of LBAs is a lot cheaper, but I would  
expect it is. TRIMming in FreeBSD (in UFS) is AFAIK one LBA after  
another.

Bye,
Alexander.

-- 
"Sonny, what is it?"
"They shot the old man. Don't worry, he's not dead."
		-- Sandra and Santino Corleone, "Chapter 2", page 83

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137

From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 09:50:30 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2C72E1065670
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 09:50:30 +0000 (UTC)
	(envelope-from alexander@leidinger.net)
Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de
	[217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id CE0168FC22
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 09:50:29 +0000 (UTC)
Received: from outgoing.leidinger.net (p5B155ED8.dip.t-dialin.net
	[91.21.94.216])
	by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id 31118844015;
	Thu, 12 May 2011 11:50:15 +0200 (CEST)
Received: from webmail.leidinger.net (webmail.Leidinger.net
	[IPv6:fd73:10c7:2053:1::2:102])
	by outgoing.leidinger.net (Postfix) with ESMTP id 44FEF128B;
	Thu, 12 May 2011 11:50:12 +0200 (CEST)
Received: (from www@localhost)
	by webmail.leidinger.net (8.14.4/8.13.8/Submit) id p4C9oBqD035205;
	Thu, 12 May 2011 11:50:11 +0200 (CEST)
	(envelope-from Alexander@Leidinger.net)
Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by
	webmail.leidinger.net (Horde Framework) with HTTP; Thu, 12 May 2011
	11:50:11 +0200
Message-ID: <20110512115011.17724x18akn60oao@webmail.leidinger.net>
Date: Thu, 12 May 2011 11:50:11 +0200
From: Alexander Leidinger <Alexander@Leidinger.net>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
References: <4DCA5620.1030203@dannysplace.net>
	<4DCB455C.4020805@dannysplace.net>
	<alpine.GSO.2.01.1105112146500.20825@freddy.simplesystems.org>
	<20110512033626.GA52047@icarus.home.lan> <4DCB7F22.4060008@digsys.bg>
	<20110512083429.GA58841@icarus.home.lan>
	<A3B76BB6-49DC-4C2F-BD2B-9A0C62F4D24C@gmail.com>
	<20110512090524.GA2106@icarus.home.lan>
In-Reply-To: <20110512090524.GA2106@icarus.home.lan>
MIME-Version: 1.0
Content-Type: text/plain;
 charset=UTF-8;
 DelSp="Yes";
 format="flowed"
Content-Disposition: inline
Content-Transfer-Encoding: 7bit
User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.6)
X-EBL-MailScanner-Information: Please contact the ISP for more information
X-EBL-MailScanner-ID: 31118844015.AEC84
X-EBL-MailScanner: Found to be clean
X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN,
	SpamAssassin (not cached, score=0.6, required 6, autolearn=disabled,
	J_CHICKENPOX_33 0.60)
X-EBL-MailScanner-From: alexander@leidinger.net
X-EBL-MailScanner-Watermark: 1305798616.47302@T9cO4aU4OGvqHqwn9kk7mQ
X-EBL-Spam-Status: No
Cc: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 09:50:30 -0000

Quoting Jeremy Chadwick <freebsd@jdc.parodius.com> (from Thu, 12 May  
2011 02:05:24 -0700):

>> > What guarantee is there that the intent log -- which is written to the
>> > disk -- actually got written to the disk in the middle of a power
>> > failure?  There's a lot of focus there on the idea that "the intent log
>> > will fix everything, but may lose writes", but what guarantee do I have
>> > that the intent log isn't corrupt or botched during a power failure?
>>
>> I expect that checksumming also works for ZIL (anybody knows?). If

It would be a damn big design flaw if it wouldn't checksum the ZIL.

>> that is the case, corruption would be detected, but you will have lost
>> data unless you are using mirrored slog devices.
>
> I can't believe that statement either (the last line).
>
> I guess that's also what I'm asking here -- what guarantee do you have
> that even with a mirrored 2-disk SLOG (or heck, 3 or 4!) that *no data*
> will be *lost* during a power outage?
>
> It seems to me the proper phrase would be "the likelihood of losing an
> entire pool during a power outage is lessened".  Alexander indirectly
> hinted at this in another post of his tonight, specifically regarding
> zpool v15 versus v28:
>
> "The difference between v15 and v28 is the amount of data you lose (the
> entire pool vs. only what is still on the log devices)".

To recover the context: This was for losing the SLOG completely.

> This makes much more sense to me.
>
> It seems that in a power outage, there will always be some form of data
> loss.  I imagine even systems that have hardware RAM/cache with BBUs on
> everything; there's always some form of caching going on *somewhere*
> within a system, from CPU all the way up, that guarantees some degree of
> data loss).  I guess I'm OCD'ing over the terminology here.  Sorry.

A simple power-loss should not destroy the SLOG (or the pool). For  
easy comprehension just let us assume that the log can only be  
destroyed by a hardware problem (broken disk -> the reason why it  
should be mirrored -> if all devices are broken, you have the same  
case as if the pool without a SLOG lost more drives than the  
redundancy allows): As written in my other mail (which I've send  
before I've seen this mail but probably arrived after you wrote this  
mail), the SLOG is not about an enhanced guarantee (you had the  
guarantee before), it is about performance.

You need to handle the data-loss problem at several layers.

If you have a power-loss during the write of the SLOG, you will lose  
the last SLOG entry (but there is no corruption). At this point in  
time the write did not return to the application, so the application  
should not have ACKed the reception of the data. If it did, you will  
lose data. If it didn't the application will just pick this  
transaction again from the queue of outstanding transactions and redo  
it. Detecting the case of a succeeded write but a power-loss before  
the ACK to the sender is up to be handled in the application too (e.g.  
calculating an ID based upon the incoming data, writing the ID  
together with the rest of the transaction, if the ID is in e.g. the DB  
and a corresponding state flag in the DB (if the processing is split  
up into several DB-transactions) which is written in the corresponding  
transaction then you know that the write before the power-loss was  
done correctly and the app just needs to ACK to the sender).

Was this clear enough, or shall I try to draw a better picture (in  
this case please try to specify your concerns, maybe with an example)?

Bye,
Alexander.

-- 
Do YOU have redeeming social value?

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137

From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 10:03:36 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5CDD7106564A
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 10:03:36 +0000 (UTC)
	(envelope-from alexander@leidinger.net)
Received: from mail.ebusiness-leidinger.de (mail.ebusiness-leidinger.de
	[217.11.53.44]) by mx1.freebsd.org (Postfix) with ESMTP id CCF488FC16
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 10:03:35 +0000 (UTC)
Received: from outgoing.leidinger.net (p5B155ED8.dip.t-dialin.net
	[91.21.94.216])
	by mail.ebusiness-leidinger.de (Postfix) with ESMTPSA id E2840844015;
	Thu, 12 May 2011 12:03:20 +0200 (CEST)
Received: from webmail.leidinger.net (webmail.Leidinger.net
	[IPv6:fd73:10c7:2053:1::2:102])
	by outgoing.leidinger.net (Postfix) with ESMTP id 0421A128C;
	Thu, 12 May 2011 12:03:18 +0200 (CEST)
Received: (from www@localhost)
	by webmail.leidinger.net (8.14.4/8.13.8/Submit) id p4CA3HOg038348;
	Thu, 12 May 2011 12:03:17 +0200 (CEST)
	(envelope-from Alexander@Leidinger.net)
Received: from pslux.ec.europa.eu (pslux.ec.europa.eu [158.169.9.14]) by
	webmail.leidinger.net (Horde Framework) with HTTP; Thu, 12 May 2011
	12:03:17 +0200
Message-ID: <20110512120317.12543g51m4im15k4@webmail.leidinger.net>
Date: Thu, 12 May 2011 12:03:17 +0200
From: Alexander Leidinger <Alexander@Leidinger.net>
To: =?utf-8?b?wqlpbXVu?= Mikecin <numisemis@gmail.com>
References: <4DCA5620.1030203@dannysplace.net>
	<4DCB455C.4020805@dannysplace.net> <4DCB81B8.6070301@digsys.bg>
	<A0F4CB3E-1068-4E19-A50F-AA5620A9EB85@gmail.com>
In-Reply-To: <A0F4CB3E-1068-4E19-A50F-AA5620A9EB85@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain;
 charset=UTF-8;
 DelSp="Yes";
 format="flowed"
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
User-Agent: Dynamic Internet Messaging Program (DIMP) H3 (1.1.6)
X-EBL-MailScanner-Information: Please contact the ISP for more information
X-EBL-MailScanner-ID: E2840844015.ADE71
X-EBL-MailScanner: Found to be clean
X-EBL-MailScanner-SpamCheck: not spam, spamhaus-ZEN,
	SpamAssassin (not cached, score=0, required 6, autolearn=disabled)
X-EBL-MailScanner-From: alexander@leidinger.net
X-EBL-MailScanner-Watermark: 1305799403.3855@Z7UYgEbaHiH2ZjllWstZjA
X-EBL-Spam-Status: No
Cc: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 10:03:36 -0000

Quoting =C2=A9imun Mikecin <numisemis@gmail.com> (from Thu, 12 May 2011 =20
10:02:42 +0200):

>
> On 12. svi. 2011., at 08:44, Daniel Kalchev wrote:
>
>> On 12.05.11 05:26, Danny Carroll wrote:
>>>
>>>  - Don't use SSD for the Log device.  Write speed tends to be a problem=
.
>> It all depends on your usage. You need to experiment, unfortunately.
>
> What is the alternative for log devices if you are not using SSD?
> Rotating hard drives?
>
> AFAIK, two factors define the speed of log device: write transfer =20
> rate and write latency.

There is also bus contention (either on the SCSI bus, or in the SATA =20
channel/controller, or on the PCI-whatever (e/X/y) bus).

> You will not find a rotating hard drive that has a write latency =20
> anything near the write latency of even a slowest SSD you can find =20
> on the market.
> On the other hand, only a very few rotating hard drives have a write =20
> transfer rate that can be compared to SSD's.

And if your PCI-something bus is not saturated but your SCSI/SATA =20
controller struggles with the work which is thrown at it, a separate =20
log device (normal HD) on another controller could free up the =20
pool-controller(s) up to a situation where it can handle all requests =20
at full speed and the log-controller can provide the additional =20
throughput at full speed which the pool-controller was not able to =20
satisfy.

What you do in this case is that you add more spindles (disks) =20
dedicated to sync-write operations. The normal RAID-common-knowledge =20
of "adding more spindles for more performance" applies here, just that =20
it is specially for sync-write operations. The generic hint to have =20
them faster than the pool-disks is an answer for the worst case. As =20
always, the worst case for one person may not be the worst case for =20
another persons workload.

Bye,
Alexander.

--=20
If love is the answer, could you rephrase the question?
=09=09-- Lily Tomlin

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID =3D B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID =3D 72077137

From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 10:16:56 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 63E48106566B
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 10:16:56 +0000 (UTC)
	(envelope-from daniel@digsys.bg)
Received: from smtp-sofia.digsys.bg (smtp-sofia.digsys.bg [193.68.3.230])
	by mx1.freebsd.org (Postfix) with ESMTP id C87F78FC1A
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 10:16:55 +0000 (UTC)
Received: from dcave.digsys.bg (dcave.digsys.bg [192.92.129.5])
	(authenticated bits=0)
	by smtp-sofia.digsys.bg (8.14.4/8.14.4) with ESMTP id p4CAGh52059565
	(version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO);
	Thu, 12 May 2011 13:16:49 +0300 (EEST)
	(envelope-from daniel@digsys.bg)
Message-ID: <4DCBB38B.3090806@digsys.bg>
Date: Thu, 12 May 2011 13:16:43 +0300
From: Daniel Kalchev <daniel@digsys.bg>
User-Agent: Mozilla/5.0 (X11; U; FreeBSD amd64; en-US;
	rv:1.9.2.15) Gecko/20110307 Thunderbird/3.1.9
MIME-Version: 1.0
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
References: <4DCA5620.1030203@dannysplace.net>
	<4DCB455C.4020805@dannysplace.net>
	<alpine.GSO.2.01.1105112146500.20825@freddy.simplesystems.org>
	<20110512033626.GA52047@icarus.home.lan>
	<4DCB7F22.4060008@digsys.bg>
	<20110512083429.GA58841@icarus.home.lan>
In-Reply-To: <20110512083429.GA58841@icarus.home.lan>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 10:16:56 -0000


On 12.05.11 11:34, Jeremy Chadwick wrote:
>
> I guess this is why others have mentioned the importance of BBUs and
> supercaps, but I don't know what guarantee there is that during a power
> failure there won't be some degree of filesystem corruption or lost
> data.
You can think of the SLOG as the BBU of ZFS.

The best SLOG of course is battery backed RAM. Just what the BBUs are. 
Any battery backed RAM device used for SLOG will beat (by a large 
margin) any however expensive SSD.

Fears of corruption is, besides performance, what makes people use SLC 
flash for SLOG devices. The MLC flash is much more prone to errors, than 
SLC flash. This includes situations like power loss.
This is also the reason people talk so much about super-capacitors.

>
>> How can ever TRIM support influence reading from the drive?!
> I guess you want more proof, so here you go.
Of course :)
> I imagine the reason this happens is similar to why memory performance
> degrades under fragmentation or when there's a lot of "middle-man stuff"
> going on.
TRIM does not change fragmentation.
All TRIM does is erase the flash cells in background, so that when the 
new write request arrives, data can just be written, instead of 
erased-written. The erase operation is slow in flash memory.
Think of TRIM as OS-assisted garbage collection. It is nothing else -- 
no matter what advertising says :)

Also, please note that there is no "fragmentation" in either SLOG or 
L2ARC to be concerned with. There are no "files" there - just raw blocks 
that can sit anywhere.

>> TRIM is an slow operation. How often are these issued?
> Good questions, for which I have no answer.  The same could be asked of
> any OS however, not just Windows.  And I've asked the same question
> about SSDs internal "garbage collection" too.  I have no answers, so you
> and I are both wondering the same question.  And yes, I am aware TRIM is
> a costly operation.
Well, at least we know some commodity SSDs on the market have "lazy" 
garbage collection, some do it right away. The "lazy"  drives give good 
performance initially

Jeremy, thanks for the detailed data.

So much about theory :)

Just a quick "(slow) HDD as SLOG" test, not very scientific :)

Hardware: Supermicro X8DTH-6F (integrated LSI2008)
2xE5620 Xeons
24 GB RAM
6x Hitachi HDS72303 drives

All disks are labeled with GPT, first partition on 1GB.

First, create ashift=12 raidz2 zpool with all drives
# gnop create -S 4096 gpt/disk00
# zpool create storage gpt/disk00.nop gpt/disk01 gpt/disk02 gpt/disk03 
gpt/disk04 gpt/disk05

$ bonnie++
Version  1.96       ------Sequential Output------ --Sequential Input- 
--Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
--Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  
/sec %CP
a1.register.bg  48G   126  99 293971  93 177423  52   357  99 502710  86 
234.2   8
Latency             68881us    2817ms    5388ms   37301us    1266ms     
471ms
Version  1.96       ------Sequential Create------ --------Random 
Create--------
a1.register.bg      -Create-- --Read--- -Delete-- -Create-- --Read--- 
-Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  
/sec %CP
                  16 25801  90 +++++ +++ 23915  94 25869  98 +++++ +++ 
24858  97
Latency             12098us     117us     141us   24121us      29us      
66us
1.96,1.96,a1.register.bg,1,1305158675,48G,,126,99,293971,93,177423,52,357,99,502710,86,234.
2,8,16,,,,,25801,90,+++++,+++,23915,94,25869,98,+++++,+++,24858,97,68881us,2817ms,5388ms,37
301us,1266ms,471ms,12098us,117us,141us,24121us,29us,66us

Recreate the pool with 5 drives + one drive as SLOG

# zpool destroy storage
# zpool create storage gpt/disk00.nop gpt/disk01 gpt/disk02 gpt/disk03 
gpt/disk04 log gpt/disk05

$ bonnie++
Version  1.96       ------Sequential Output------ --Sequential Input- 
--Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
--Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  
/sec %CP
a1.register.bg  48G   110  99 306932  68 223853  46   354  99 664034  65 
501.8  11
Latency               172ms   11571ms    4217ms   50414us    1895ms     
245ms
Version  1.96       ------Sequential Create------ --------Random 
Create--------
a1.register.bg      -Create-- --Read--- -Delete-- -Create-- --Read--- 
-Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  
/sec %CP
                  16 24673  97 +++++ +++ 24262  98 19108  97 +++++ +++ 
23821  97
Latency             12051us     132us     143us   23392us      47us      
79us
1.96,1.96,a1.register.bg,1,1305171999,48G,,110,99,306932,68,223853,46,354,99,664034,65,501.8,11,16,,,,,24673,97,+++++,+++,24262,98,19108,97,+++++,+++,23821,97,172ms,11571ms,4217ms,50414us,1895ms,245ms,12051us,132us,143us,23392us,47us,79us


Interesting to note that

zpool iostat -v 1
never showed more than 128K of usage on the SLOG drive, although from 
time to time it was hitting over 1200 IOPS and over 150 MB/s write.

Also, the second pool is with one disk less. For comparison, here is the 
same pool with 5 disks and no SLOG

# zpool create storage gpt/disk00.nop gpt/disk01 gpt/disk02 gpt/disk03 
gpt/disk04

$ bonnie++
Version  1.96       ------Sequential Output------ --Sequential Input- 
--Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- 
--Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  
/sec %CP
a1.register.bg  48G   118  99 287361  92 152566  40   345  98 398392  51 
242.4  24
Latency             56962us    2619ms    4308ms   57304us    1214ms     
350ms
Version  1.96       ------Sequential Create------ --------Random 
Create--------
a1.register.bg      -Create-- --Read--- -Delete-- -Create-- --Read--- 
-Delete--
               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  
/sec %CP
                  16 27438  95 +++++ +++ 19374  90 25259  97 +++++ +++  
6876  99
Latency              8913us     200us     295us   27249us      30us     
238us
1.96,1.96,a1.register.bg,1,1305165435,48G,,118,99,287361,92,152566,40,345,98,398392,51,242.
4,24,16,,,,,27438,95,+++++,+++,19374,90,25259,97,+++++,+++,6876,99,56962us,2619ms,4308ms,57
304us,1214ms,350ms,8913us,200us,295us,27249us,30us,238us


One side effect I forgot to mention from using a SLOG is less 
fragmentation in the pool. When the ZIL is in the main pool, it is 
frequently written and erased and the ZIL is variable size, leaving 
undesired gaps.

Hope this helps.

Daniel

From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 13:57:30 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CD1C3106564A
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 13:57:30 +0000 (UTC)
	(envelope-from bfriesen@simple.dallas.tx.us)
Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74])
	by mx1.freebsd.org (Postfix) with ESMTP id 90A348FC0A
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 13:57:29 +0000 (UTC)
Received: from freddy.simplesystems.org (freddy.simplesystems.org
	[65.66.246.65])
	by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id
	p4CDvHqT015705; Thu, 12 May 2011 08:57:17 -0500 (CDT)
Date: Thu, 12 May 2011 08:57:17 -0500 (CDT)
From: Bob Friesenhahn <bfriesen@simple.dallas.tx.us>
X-X-Sender: bfriesen@freddy.simplesystems.org
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
In-Reply-To: <20110512033626.GA52047@icarus.home.lan>
Message-ID: <alpine.GSO.2.01.1105120842580.20825@freddy.simplesystems.org>
References: <4DCA5620.1030203@dannysplace.net>
	<4DCB455C.4020805@dannysplace.net>
	<alpine.GSO.2.01.1105112146500.20825@freddy.simplesystems.org>
	<20110512033626.GA52047@icarus.home.lan>
User-Agent: Alpine 2.01 (GSO 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2
	(blade.simplesystems.org [65.66.246.90]);
	Thu, 12 May 2011 08:57:17 -0500 (CDT)
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 13:57:30 -0000

On Wed, 11 May 2011, Jeremy Chadwick wrote:
>
> Bob, please correct me if I'm wrong, but as I understand it a log device
> (ZIL) effectively limits the overall write speed of the pool itself.
> Consumer-level SSDs do not have extremely high write performance (and it
> gets worse without TRIM; again a 70% decrease in write speed in some
> cases).

It is certainly a factor.  However, large block writes (something like 
128K, I don't remember exactly) bypass the dedicated log device and 
instead are written to the main store (with only a reference being 
added to the dedicated device).  The reason this is done is for the 
exact reason you point out.  The SSD has a very fast seek and zero 
rotational latency but being a singular resource it suffers from 
bandwidth limitations.  The main store usually suffers from 
multi-millisecond seeks and rotational latency but offers linearly 
scalable and substantial write performance for larger writes.

Matt Ahrens has described this a few times on the zfs-discuss list and 
there is mention of it on slide 15 of the presentation found at 
"http://www.slideshare.net/edigit/zfs-presentation".

The large write feature of the ZIL is a reason why we should 
appreciate modern NFS's large-write capability and avoid anchient NFS.

It is worth mentioning that the ZIL is a write-only device which is 
only read when the system boots or a pool is imported.  The writes are 
usually "write and forget" since zfs uses them to improve its ability 
to cache larger transaction groups.

Bob
-- 
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 14:08:06 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B069B106564A
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 14:08:06 +0000 (UTC)
	(envelope-from bfriesen@simple.dallas.tx.us)
Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74])
	by mx1.freebsd.org (Postfix) with ESMTP id 770D18FC13
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 14:08:06 +0000 (UTC)
Received: from freddy.simplesystems.org (freddy.simplesystems.org
	[65.66.246.65])
	by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id
	p4CE85uW015763; Thu, 12 May 2011 09:08:05 -0500 (CDT)
Date: Thu, 12 May 2011 09:08:05 -0500 (CDT)
From: Bob Friesenhahn <bfriesen@simple.dallas.tx.us>
X-X-Sender: bfriesen@freddy.simplesystems.org
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
In-Reply-To: <20110512083429.GA58841@icarus.home.lan>
Message-ID: <alpine.GSO.2.01.1105120902220.20825@freddy.simplesystems.org>
References: <4DCA5620.1030203@dannysplace.net>
	<4DCB455C.4020805@dannysplace.net>
	<alpine.GSO.2.01.1105112146500.20825@freddy.simplesystems.org>
	<20110512033626.GA52047@icarus.home.lan>
	<4DCB7F22.4060008@digsys.bg>
	<20110512083429.GA58841@icarus.home.lan>
User-Agent: Alpine 2.01 (GSO 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2
	(blade.simplesystems.org [65.66.246.90]);
	Thu, 12 May 2011 09:08:05 -0500 (CDT)
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 14:08:06 -0000

On Thu, 12 May 2011, Jeremy Chadwick wrote:
>
> What guarantee is there that the intent log -- which is written to the
> disk -- actually got written to the disk in the middle of a power
> failure?  There's a lot of focus there on the idea that "the intent log
> will fix everything, but may lose writes", but what guarantee do I have
> that the intent log isn't corrupt or botched during a power failure?

This is pretty easy.  Zfs requests that the disk containing the intent 
log commit its update (and waits for completion) before it returns 
from the "write" request.  As long as the disk does not lie, the data 
will be present after the reboot.

Note that many SSDs do lie about cache commit requests and these are 
best avoided for anything to do with zfs.

Bob
-- 
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 18:05:59 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 852B3106566C
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 18:05:59 +0000 (UTC)
	(envelope-from gpalmer@freebsd.org)
Received: from noop.in-addr.com (mail.in-addr.com [IPv6:2001:470:8:162::1])
	by mx1.freebsd.org (Postfix) with ESMTP id 53A5A8FC13
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 18:05:59 +0000 (UTC)
Received: from gjp by noop.in-addr.com with local (Exim 4.74 (FreeBSD))
	(envelope-from <gpalmer@freebsd.org>)
	id 1QKaGr-000BCB-RL; Thu, 12 May 2011 14:05:57 -0400
Date: Thu, 12 May 2011 14:05:57 -0400
From: Gary Palmer <gpalmer@freebsd.org>
To: Jeremy Chadwick <freebsd@jdc.parodius.com>
Message-ID: <20110512180557.GB37035@in-addr.com>
References: <4DCA5620.1030203@dannysplace.net>
	<4DCB455C.4020805@dannysplace.net>
	<alpine.GSO.2.01.1105112146500.20825@freddy.simplesystems.org>
	<20110512033626.GA52047@icarus.home.lan>
	<4DCB7F22.4060008@digsys.bg>
	<20110512083429.GA58841@icarus.home.lan>
	<A3B76BB6-49DC-4C2F-BD2B-9A0C62F4D24C@gmail.com>
	<20110512090524.GA2106@icarus.home.lan>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110512090524.GA2106@icarus.home.lan>
X-SA-Exim-Connect-IP: <locally generated>
X-SA-Exim-Mail-From: gpalmer@freebsd.org
X-SA-Exim-Scanned: No (on noop.in-addr.com); SAEximRunCond expanded to false
Cc: freebsd-fs <freebsd-fs@freebsd.org>
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 18:05:59 -0000

On Thu, May 12, 2011 at 02:05:24AM -0700, Jeremy Chadwick wrote:
> I guess that's also what I'm asking here -- what guarantee do you have
> that even with a mirrored 2-disk SLOG (or heck, 3 or 4!) that *no data*
> will be *lost* during a power outage?
> 
> It seems to me the proper phrase would be "the likelihood of losing an
> entire pool during a power outage is lessened".  Alexander indirectly
> hinted at this in another post of his tonight, specifically regarding
> zpool v15 versus v28:
> 
> "The difference between v15 and v28 is the amount of data you lose (the
> entire pool vs. only what is still on the log devices)".
> 
> This makes much more sense to me.
> 
> It seems that in a power outage, there will always be some form of data
> loss.  I imagine even systems that have hardware RAM/cache with BBUs on
> everything; there's always some form of caching going on *somewhere*
> within a system, from CPU all the way up, that guarantees some degree of
> data loss).  I guess I'm OCD'ing over the terminology here.  Sorry.

At one level, nothing you can do in hardware can protect you from data
loss or corruption due to a power outage.  This is why applications and
protocols must be designed with that in mind. E.g. RFC 821/2821/5321
explicitly state that a MTA cannot acknowledge the <CR><LF>.<CR><LF> at
the end of the DATA segment until the message is committed to permanent
storage.  That can lead to message duplication, but thats better than the
alternative - the message is always queued *somewhere*. (And yes, there are/
were vendors who "accidentally" overlook that requirement in the name of
increased throughput)

Trying to solve this entirely in hardware is pointless.  You need to look
at the entire system end-to-end to eliminate data loss problems.

Regards,

Gary

From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 23:02:53 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 087D3106566B
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 23:02:53 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca
	[131.104.91.44])
	by mx1.freebsd.org (Postfix) with ESMTP id BC2FF8FC14
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 23:02:52 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApwEAI9mzE2DaFvO/2dsb2JhbACEVqIaiHCuNpEYgSuDY4EHBI98jwU
X-IronPort-AV: E=Sophos;i="4.64,361,1301889600"; d="scan'208";a="124562715"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-jnhn-pri.mail.uoguelph.ca with ESMTP; 12 May 2011 19:02:51 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id C4EDE793A7;
	Thu, 12 May 2011 19:02:51 -0400 (EDT)
Date: Thu, 12 May 2011 19:02:51 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Bob Friesenhahn <bfriesen@simple.dallas.tx.us>
Message-ID: <1700693186.266759.1305241371736.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <alpine.GSO.2.01.1105120842580.20825@freddy.simplesystems.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.201]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692)
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 23:02:53 -0000

> On Wed, 11 May 2011, Jeremy Chadwick wrote:
> >
> > Bob, please correct me if I'm wrong, but as I understand it a log
> > device
> > (ZIL) effectively limits the overall write speed of the pool itself.
> > Consumer-level SSDs do not have extremely high write performance
> > (and it
> > gets worse without TRIM; again a 70% decrease in write speed in some
> > cases).
> 
> It is certainly a factor. However, large block writes (something like
> 128K, I don't remember exactly) bypass the dedicated log device and
> instead are written to the main store (with only a reference being
> added to the dedicated device). The reason this is done is for the
> exact reason you point out. The SSD has a very fast seek and zero
> rotational latency but being a singular resource it suffers from
> bandwidth limitations. The main store usually suffers from
> multi-millisecond seeks and rotational latency but offers linearly
> scalable and substantial write performance for larger writes.
> 
> Matt Ahrens has described this a few times on the zfs-discuss list and
> there is mention of it on slide 15 of the presentation found at
> "http://www.slideshare.net/edigit/zfs-presentation".
> 
> The large write feature of the ZIL is a reason why we should
> appreciate modern NFS's large-write capability and avoid anchient NFS.
> 
The size of a write for the new FreeBSD NFS server is limited to
MAX_BSIZE. It is currently 64K, but I would like to see it much larger.
I am going to try increasing MAX_BSIZE soon, to see what happens.

This sounds like another good reason to increase it.

However, a client chooses what size to use, up to the server`s
limit (and, again, MAX_BSIZE for the FreeBSD client).

rick

From owner-freebsd-fs@FreeBSD.ORG  Thu May 12 23:19:51 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9DB19106566B
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 23:19:51 +0000 (UTC)
	(envelope-from bfriesen@simple.dallas.tx.us)
Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74])
	by mx1.freebsd.org (Postfix) with ESMTP id 6522B8FC19
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 23:19:51 +0000 (UTC)
Received: from freddy.simplesystems.org (freddy.simplesystems.org
	[65.66.246.65])
	by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id
	p4CNJo53017906; Thu, 12 May 2011 18:19:50 -0500 (CDT)
Date: Thu, 12 May 2011 18:19:50 -0500 (CDT)
From: Bob Friesenhahn <bfriesen@simple.dallas.tx.us>
X-X-Sender: bfriesen@freddy.simplesystems.org
To: Rick Macklem <rmacklem@uoguelph.ca>
In-Reply-To: <1700693186.266759.1305241371736.JavaMail.root@erie.cs.uoguelph.ca>
Message-ID: <alpine.GSO.2.01.1105121805500.8019@freddy.simplesystems.org>
References: <1700693186.266759.1305241371736.JavaMail.root@erie.cs.uoguelph.ca>
User-Agent: Alpine 2.01 (GSO 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2
	(blade.simplesystems.org [65.66.246.90]);
	Thu, 12 May 2011 18:19:50 -0500 (CDT)
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 12 May 2011 23:19:51 -0000

On Thu, 12 May 2011, Rick Macklem wrote:
>> The large write feature of the ZIL is a reason why we should
>> appreciate modern NFS's large-write capability and avoid anchient NFS.
>>
> The size of a write for the new FreeBSD NFS server is limited to
> MAX_BSIZE. It is currently 64K, but I would like to see it much larger.
> I am going to try increasing MAX_BSIZE soon, to see what happens.

Zfs would certainly appreciate 128K since that is its default block 
size.  When existing file content is overwritten, writing in properly 
aligned 128K blocks is much faster due to ZFS's COW algorithm and not 
needing to read the existing block.  With a partial "overwrite", if 
the existing block is not already cached in the ARC, then it would 
need to be read from underlying store before the replacement block can 
be written.  This effect becomes readily apparent in benchmarks.  In 
my own benchmarking I have found that 128K is sufficient and using 
larger multiples of 128K does not obtain much more performance.

When creating a file from scratch, zfs performs well for async writes 
if a process writes data smaller than 128K.  That might not be the 
case for sync writes.

Bob
-- 
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From owner-freebsd-fs@FreeBSD.ORG  Fri May 13 00:03:40 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 85066106566B
	for <freebsd-fs@freebsd.org>; Fri, 13 May 2011 00:03:40 +0000 (UTC)
	(envelope-from rmacklem@uoguelph.ca)
Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca
	[131.104.91.36])
	by mx1.freebsd.org (Postfix) with ESMTP id 448F18FC13
	for <freebsd-fs@freebsd.org>; Fri, 13 May 2011 00:03:40 +0000 (UTC)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApwEAK50zE2DaFvO/2dsb2JhbACEVqIaiHCte5ETgSuDY4EHBI98jwU
X-IronPort-AV: E=Sophos;i="4.64,361,1301889600"; d="scan'208";a="120545285"
Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca)
	([131.104.91.206])
	by esa-annu-pri.mail.uoguelph.ca with ESMTP; 12 May 2011 20:03:39 -0400
Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1])
	by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 49C93B3F5B;
	Thu, 12 May 2011 20:03:39 -0400 (EDT)
Date: Thu, 12 May 2011 20:03:39 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Bob Friesenhahn <bfriesen@simple.dallas.tx.us>
Message-ID: <921935873.267812.1305245019197.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <alpine.GSO.2.01.1105121805500.8019@freddy.simplesystems.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Originating-IP: [172.17.91.202]
X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - IE7 (Win)/6.0.10_GA_2692)
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 13 May 2011 00:03:40 -0000

> On Thu, 12 May 2011, Rick Macklem wrote:
> >> The large write feature of the ZIL is a reason why we should
> >> appreciate modern NFS's large-write capability and avoid anchient
> >> NFS.
> >>
> > The size of a write for the new FreeBSD NFS server is limited to
> > MAX_BSIZE. It is currently 64K, but I would like to see it much
> > larger.
> > I am going to try increasing MAX_BSIZE soon, to see what happens.
> 
> Zfs would certainly appreciate 128K since that is its default block
> size. When existing file content is overwritten, writing in properly
> aligned 128K blocks is much faster due to ZFS's COW algorithm and not
> needing to read the existing block. With a partial "overwrite", if
> the existing block is not already cached in the ARC, then it would
> need to be read from underlying store before the replacement block can
> be written. This effect becomes readily apparent in benchmarks. In
> my own benchmarking I have found that 128K is sufficient and using
> larger multiples of 128K does not obtain much more performance.
> 
> When creating a file from scratch, zfs performs well for async writes
> if a process writes data smaller than 128K. That might not be the
> case for sync writes.
> 
Yep, I think sizes greater than 128K might only benefit WAN connections
with a larger bandwidth * delay product.

It also helps to find "not so great" network interfaces/drivers. When I
used 128K on the Mac OS X port, it worked great for some Macs and horribly
for others. Some Macs would drop packets when they would see a burst of
read traffic (the Mac was a client and the server was Solaris10, which
handles NFS read/write sizes up to 1Mbyte) and wouldn't perform well above
32Kbytes (for a now rather old port to Leopard).

rick

From owner-freebsd-fs@FreeBSD.ORG  Fri May 13 00:46:06 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 6B02C106566B
	for <freebsd-fs@freebsd.org>; Fri, 13 May 2011 00:46:06 +0000 (UTC)
	(envelope-from fjwcash@gmail.com)
Received: from mail-yx0-f182.google.com (mail-yx0-f182.google.com
	[209.85.213.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 272FD8FC0A
	for <freebsd-fs@freebsd.org>; Fri, 13 May 2011 00:46:06 +0000 (UTC)
Received: by yxl31 with SMTP id 31so927967yxl.13
	for <freebsd-fs@freebsd.org>; Thu, 12 May 2011 17:46:05 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:mime-version:in-reply-to:references:date
	:message-id:subject:from:to:cc:content-type;
	bh=/Z9GaqqcyNhje0gNUL7KOwYgFeYbd6c6Fr8+MdnbI+o=;
	b=hqC3Xf4xnwOsbS8GzegE30o4Z76U64Dwf2ctuXqc+rsIUcIumjZPXYFYOnn41QFxdt
	c4eCgaZNPPfDgx1cGMiZE0E8FZ2RB0ujisFqiP3GDqsKy88FVX2D6zEJXN1dB2LbyFyl
	ycvY4SjPaWTnjbVTzfDWJiHpCwso5ikyVz02U=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type;
	b=P0/15Yg8rbeRHALIQT6S05gheSxZMUSZJJudKQ5vv+PNKMjLdMpvxi4J6Z7EEjIg7O
	QJfEWYSfguqweYUG0zJEaIPgKdrrREqyrNzYi9sUnRucYrSM74Vyx6402+5j6iMEJLW7
	1XR0wd6erGx7OSXHTa/FDYOBgtaF/tT7pS1fE=
MIME-Version: 1.0
Received: by 10.90.194.2 with SMTP id r2mr841599agf.86.1305247565592; Thu, 12
	May 2011 17:46:05 -0700 (PDT)
Received: by 10.90.52.15 with HTTP; Thu, 12 May 2011 17:46:05 -0700 (PDT)
In-Reply-To: <alpine.GSO.2.01.1105121805500.8019@freddy.simplesystems.org>
References: <1700693186.266759.1305241371736.JavaMail.root@erie.cs.uoguelph.ca>
	<alpine.GSO.2.01.1105121805500.8019@freddy.simplesystems.org>
Date: Thu, 12 May 2011 17:46:05 -0700
Message-ID: <BANLkTi=b+wA-ADup9SQvykexJBZwjK9WZw@mail.gmail.com>
From: Freddie Cash <fjwcash@gmail.com>
To: Bob Friesenhahn <bfriesen@simple.dallas.tx.us>
Content-Type: text/plain; charset=UTF-8
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 13 May 2011 00:46:06 -0000

On Thu, May 12, 2011 at 4:19 PM, Bob Friesenhahn
<bfriesen@simple.dallas.tx.us> wrote:
> On Thu, 12 May 2011, Rick Macklem wrote:
>>>
>>> The large write feature of the ZIL is a reason why we should
>>> appreciate modern NFS's large-write capability and avoid anchient NFS.
>>>
>> The size of a write for the new FreeBSD NFS server is limited to
>> MAX_BSIZE. It is currently 64K, but I would like to see it much larger.
>> I am going to try increasing MAX_BSIZE soon, to see what happens.
>
> Zfs would certainly appreciate 128K since that is its default block size.

Note:  the "default block size" is a max block size, not an "every
block written is this size" setting.  A ZFS filesystem will use any
power-of-2 size under the block size setting for that filesystem.

Only zvols have an "every block written will be this size" setting.


-- 
Freddie Cash
fjwcash@gmail.com

From owner-freebsd-fs@FreeBSD.ORG  Fri May 13 12:12:45 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 71A54106566C
	for <freebsd-fs@freebsd.org>; Fri, 13 May 2011 12:12:45 +0000 (UTC)
	(envelope-from szumo@szumo.net)
Received: from v000054.home.net.pl (people.pl [212.85.96.54])
	by mx1.freebsd.org (Postfix) with SMTP id CCA798FC12
	for <freebsd-fs@freebsd.org>; Fri, 13 May 2011 12:12:44 +0000 (UTC)
Received: from vmy2.home.net.pl [79.96.240.52] (HELO vmy2.home.net.pl)
	by people.home.pl [212.85.96.54] with SMTP (IdeaSmtpServer v0.70)
	id 5ad367fe1196bb13; Fri, 13 May 2011 13:46:03 +0200
Received: from 80.53.66.10 (80.53.66.10) user szumo.people via webmail
From: "=?UTF-8?B?TWFjaWVqIFN6dW1vY2tp?=" <szumo@szumo.net>
To: freebsd-fs@freebsd.org
Date: Fri, 13 May 2011 13:46:03 +0200
Content-Type: text/plain; charset=UTF-8
User-Agent: home.pl my.webmail/2.0
MIME-Version: 1.0
X-Mailer: home.pl my.webmail/2.0
X-Priority: 3
Message-ID: <1975530a6ee695ac3a1fea7648612c50.qmail@home.pl>
Content-Transfer-Encoding: base64
Subject: Kernel panic on zfs pool import (8.2-RELEASE #0 amd64)
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 13 May 2011 12:12:45 -0000

77u/SGkgYWxsLAoKSSBnZXQgYSBrZXJuZWwgcGFuaWMgd2hlbiB0cnlpbmcgdG8gaW1wb3J0IGEg
emZzIHBvb2wgb24gYm9vdDoKRmF0YWwgdHJhcCAxMjogcGFnZSBmYXVsdCB3aGlsZSBpbiBrZXJu
ZWwgbW9kZQpjcHVpZCA9IDA7IGFwaWMgaWQgPSAwMApmYXVsdCB2aXJ0dWFsIGFkZHJlc3MgICA9
IDB4MjgKZmF1bHQgY29kZSAgICAgICAgICAgICAgPSBzdXBlcnZpc29yIHJlYWQgZGF0YSwgcGFn
ZSBub3QgcHJlc2VudAppbnN0cnVjdGlvbiBwb2ludGVyICAgICA9IDB4MjA6MHhmZmZmZmZmZjgw
ZGVmYjkzCnN0YWNrIHBvaW50ZXIgICAgICAgICAgID0gMHgyODoweGZmZmZmZjgxMjExMzk1NTAK
ZnJhbWUgcG9pbnRlciAgICAgICAgICAgPSAweDI4OjB4ZmZmZmZmODEyMTEzOTU4MApjb2RlIHNl
Z21lbnQgICAgICAgICAgICA9IGJhc2UgMHgwLCBsaW1pdCAweGZmZmZmLCB0eXBlIDB4MWIKICAg
ICAgICAgICAgICAgICAgICAgICAgID0gRFBMIDAsIHByZXMgMSwgbG9uZyAxLCBkZWYzMiAwLCBn
cmFuIDEKcHJvY2Vzc29yIGVmbGFncyAgICAgICAgPSBpbnRlcnJ1cHQgZW5hYmxlZCwgcmVzdW1l
LCBJT1BMID0gMApjdXJyZW50IHByb2Nlc3MgICAgICAgICA9IDExMTAgKHpwb29sKQp0cmFwIG51
bWJlciAgICAgICAgICAgICA9IDEyCnBhbmljOiBwYWdlIGZhdWx0CmNwdWlkID0gMApLREI6IHN0
YWNrIGJhY2t0cmFjZToKIzAgMHhmZmZmZmZmZjgwNWY0ZTBlIGF0IGtkYl9iYWNrdHJhY2UrMHg1
ZQojMSAweGZmZmZmZmZmODA1YzJkMDcgYXQgcGFuaWMrMHgxODcKIzIgMHhmZmZmZmZmZjgwOGFj
NjAwIGF0IHRyYXBfZmF0YWwrMHgyOTAKIzMgMHhmZmZmZmZmZjgwOGFjOWRmIGF0IHRyYXBfcGZh
dWx0KzB4MjhmCiM0IDB4ZmZmZmZmZmY4MDhhY2ViZiBhdCB0cmFwKzB4M2RmCiM1IDB4ZmZmZmZm
ZmY4MDg5NGZiNCBhdCBjYWxsdHJhcCsweDgKIzYgMHhmZmZmZmZmZjgwZGY2YjU3IGF0IHZkZXZf
bWlycm9yX2NoaWxkX3NlbGVjdCsweDY3CiM3IDB4ZmZmZmZmZmY4MGRmNzBmZSBhdCB2ZGV2X21p
cnJvcl9pb19zdGFydCsweDIzZQojOCAweGZmZmZmZmZmODBlMDgyODcgYXQgemlvX2V4ZWN1dGUr
MHg3NwojOSAweGZmZmZmZmZmODBlMDgzMmQgYXQgemlvX3dhaXQrMHgyZAojMTAgMHhmZmZmZmZm
ZjgwZGJhNjlhIGF0IGFyY19yZWFkX25vbG9jaysweDZiYQojMTEgMHhmZmZmZmZmZjgwZGM2YzIw
IGF0IGRtdV9vYmpzZXRfb3Blbl9pbXBsKzB4ZDAKIzEyIDB4ZmZmZmZmZmY4MGRkNzQwYSBhdCBk
c2xfcG9vbF9vcGVuKzB4NWEKIzEzIDB4ZmZmZmZmZmY4MGRlNTgyMiBhdCBzcGFfbG9hZCsweDM1
MgojMTQgMHhmZmZmZmZmZjgwZGU2M2YzIGF0IHNwYV9vcGVuX2NvbW1vbisweDEzMwojMTUgMHhm
ZmZmZmZmZjgwZTE3YjczIGF0IHpmc19sb2dfaGlzdG9yeSsweDMzCiMxNiAweGZmZmZmZmZmODBl
MTdlY2QgYXQgemZzZGV2X2lvY3RsKzB4YmQKSSd2ZSB1c2VkIFNvbGFyaXMgRXhwcmVzcyAxMSBV
U0IgbGl2ZSBpbWFnZSB0byBpbXBvcnQgLUZmIHRoZSBwb29sICh3aGljaApkaXNjYXJkZWQgNTgg
dHJhbnNhY3Rpb25zKSB0aGVuIGV4cG9ydCBpdCwgYWZ0ZXIgd2hpY2ggRnJlZUJTRCBib290cyAg
CmFuZCB6cG9vbCBpbXBvcnQgc2hvd3M6CnNlbnRyeSMgenBvb2wgaW1wb3J0CiAgIHBvb2w6IHpm
aWxlcwogICAgIGlkOiA3NjI2MzI1MjE2MTQ5MzAwNjYyCiAgc3RhdGU6IFVOQVZBSUwKc3RhdHVz
OiBPbmUgb3IgbW9yZSBkZXZpY2VzIGFyZSBtaXNzaW5nIGZyb20gdGhlIHN5c3RlbS4KYWN0aW9u
OiBUaGUgcG9vbCBjYW5ub3QgYmUgaW1wb3J0ZWQuIEF0dGFjaCB0aGUgbWlzc2luZwogICAgICAg
ICBkZXZpY2VzIGFuZCB0cnkgYWdhaW4uCiAgICBzZWU6IGh0dHA6Ly93d3cuc3VuLmNvbS9tc2cv
WkZTLTgwMDAtNlgKY29uZmlnOgogICAgICAgICB6ZmlsZXMgICAgICBVTkFWQUlMICBtaXNzaW5n
IGRldmljZQogICAgICAgICAgIHJhaWR6MSAgICBPTkxJTkUKICAgICAgICAgICAgIGFkMTQgICAg
T05MSU5FCiAgICAgICAgICAgICBhZDYgICAgIE9OTElORQogICAgICAgICAgICAgYWQxMCAgICBP
TkxJTkUKICAgICAgICAgICAgIGFkMTIgICAgT05MSU5FCiAgICAgICAgIEFkZGl0aW9uYWwgZGV2
aWNlcyBhcmUga25vd24gdG8gYmUgcGFydCBvZiB0aGlzIHBvb2wsIHRob3VnaCB0aGVpcgogICAg
ICAgICBleGFjdCBjb25maWd1cmF0aW9uIGNhbm5vdCBiZSBkZXRlcm1pbmVkLgp6ZGIgb3V0cHV0
IGZvciB0aGF0IHBvb2wgaXM6CnNlbnRyeSMgemRiCnpmaWxlcwogICAgIHZlcnNpb249MTUKICAg
ICB0eGc9MAogICAgIHBvb2xfZ3VpZD03NjI2MzI1MjE2MTQ5MzAwNjYyCiAgICAgdmRldl90cmVl
CiAgICAgICAgIHR5cGU9J3Jvb3QnCiAgICAgICAgIGlkPTAKICAgICAgICAgZ3VpZD03NjI2MzI1
MjE2MTQ5MzAwNjYyCmJhZCBjb25maWcgdHlwZSAxNiBmb3Igc3RhdHMKICAgICAgICAgY2hpbGRy
ZW5bMF0KICAgICAgICAgICAgICAgICB0eXBlPSdyYWlkeicKICAgICAgICAgICAgICAgICBpZD0w
CiAgICAgICAgICAgICAgICAgZ3VpZD02NDM5NTE3NDM1MTI1NjYyNDM3CiAgICAgICAgICAgICAg
ICAgbnBhcml0eT0xCiAgICAgICAgICAgICAgICAgbWV0YXNsYWJfYXJyYXk9MjMKICAgICAgICAg
ICAgICAgICBtZXRhc2xhYl9zaGlmdD0zNQogICAgICAgICAgICAgICAgIGFzaGlmdD05CiAgICAg
ICAgICAgICAgICAgYXNpemU9NDAwMDc5NTU5MDY1NgogICAgICAgICAgICAgICAgIGlzX2xvZz0w
CmJhZCBjb25maWcgdHlwZSAxNiBmb3Igc3RhdHMKICAgICAgICAgICAgICAgICBjaGlsZHJlblsw
XQogICAgICAgICAgICAgICAgICAgICAgICAgdHlwZT0nZGlzaycKICAgICAgICAgICAgICAgICAg
ICAgICAgIGlkPTAKICAgICAgICAgICAgICAgICAgICAgICAgIGd1aWQ9MTYzMjA0NTUxMDE3MTg0
Mzk0NTAKICAgICAgICAgICAgICAgICAgICAgICAgIHBhdGg9Jy9kZXYvYWQxNCcKICAgICAgICAg
ICAgICAgICAgICAgICAgICAKcGh5c19wYXRoPScvcGNpQDAsMC9wY2kxMGRlLDU2M0BjL3BjaTEw
OTUsNzEzMkAwL2Rpc2tAMSwwOnEnCiAgICAgICAgICAgICAgICAgICAgICAgICB3aG9sZV9kaXNr
PTAKICAgICAgICAgICAgICAgICAgICAgICAgIERUTD0xNDYKYmFkIGNvbmZpZyB0eXBlIDE2IGZv
ciBzdGF0cwogICAgICAgICAgICAgICAgIGNoaWxkcmVuWzFdCiAgICAgICAgICAgICAgICAgICAg
ICAgICB0eXBlPSdkaXNrJwogICAgICAgICAgICAgICAgICAgICAgICAgaWQ9MQogICAgICAgICAg
ICAgICAgICAgICAgICAgZ3VpZD05NDMwMjkwNDE0NTg4MDAxNzA4CiAgICAgICAgICAgICAgICAg
ICAgICAgICBwYXRoPScvZGV2L2FkNicKICAgICAgICAgICAgICAgICAgICAgICAgIHBoeXNfcGF0
aD0nL3BjaUAwLDAvcGNpMTA0Myw4MzA4QDkvZGlza0AxLDA6cScKICAgICAgICAgICAgICAgICAg
ICAgICAgIHdob2xlX2Rpc2s9MAogICAgICAgICAgICAgICAgICAgICAgICAgRFRMPTE0NQpiYWQg
Y29uZmlnIHR5cGUgMTYgZm9yIHN0YXRzCiAgICAgICAgICAgICAgICAgY2hpbGRyZW5bMl0KICAg
ICAgICAgICAgICAgICAgICAgICAgIHR5cGU9J2Rpc2snCiAgICAgICAgICAgICAgICAgICAgICAg
ICBpZD0yCiAgICAgICAgICAgICAgICAgICAgICAgICBndWlkPTk4MDc3OTc5MTAxNDExNTU2NzIK
ICAgICAgICAgICAgICAgICAgICAgICAgIHBhdGg9Jy9kZXYvYWQxMCcKICAgICAgICAgICAgICAg
ICAgICAgICAgIHBoeXNfcGF0aD0nL3BjaUAwLDAvcGNpMTA0Myw4MzA4QDkvZGlza0AzLDA6cScK
ICAgICAgICAgICAgICAgICAgICAgICAgIHdob2xlX2Rpc2s9MAogICAgICAgICAgICAgICAgICAg
ICAgICAgRFRMPTE0NApiYWQgY29uZmlnIHR5cGUgMTYgZm9yIHN0YXRzCiAgICAgICAgICAgICAg
ICAgY2hpbGRyZW5bM10KICAgICAgICAgICAgICAgICAgICAgICAgIHR5cGU9J2Rpc2snCiAgICAg
ICAgICAgICAgICAgICAgICAgICBpZD0zCiAgICAgICAgICAgICAgICAgICAgICAgICBndWlkPTE2
Nzc3Mzg5NjkyNjc1NDQwODY5CiAgICAgICAgICAgICAgICAgICAgICAgICBwYXRoPScvZGV2L2Fk
MTInCiAgICAgICAgICAgICAgICAgICAgICAgICAgCnBoeXNfcGF0aD0nL3BjaUAwLDAvcGNpMTBk
ZSw1NjNAYy9wY2kxMDk1LDcxMzJAMC9kaXNrQDAsMDpjJwogICAgICAgICAgICAgICAgICAgICAg
ICAgd2hvbGVfZGlzaz0wCiAgICAgICAgICAgICAgICAgICAgICAgICBEVEw9MTQzCmJhZCBjb25m
aWcgdHlwZSAxNiBmb3Igc3RhdHMKICAgICBuYW1lPSd6ZmlsZXMnCiAgICAgc3RhdGU9MQogICAg
IHRpbWVzdGFtcD0xMzA0MzYzODQ0CiAgICAgaG9zdGlkPTYxOTAwMAogICAgIGhvc3RuYW1lPSdz
b2xhcmlzJwpBbnkgcmVjb21tZW5kYXRpb25zPwpNYWNpZWogU3p1bW9ja2k=


From owner-freebsd-fs@FreeBSD.ORG  Fri May 13 12:56:30 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A8CE3106564A
	for <freebsd-fs@freebsd.org>; Fri, 13 May 2011 12:56:30 +0000 (UTC)
	(envelope-from freebsd-fs@m.gmane.org)
Received: from lo.gmane.org (lo.gmane.org [80.91.229.12])
	by mx1.freebsd.org (Postfix) with ESMTP id 686AA8FC1B
	for <freebsd-fs@freebsd.org>; Fri, 13 May 2011 12:56:30 +0000 (UTC)
Received: from list by lo.gmane.org with local (Exim 4.69)
	(envelope-from <freebsd-fs@m.gmane.org>) id 1QKrut-0003bl-Uu
	for freebsd-fs@freebsd.org; Fri, 13 May 2011 14:56:27 +0200
Received: from ib-jtotz.ib.ic.ac.uk ([155.198.110.220])
	by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-fs@freebsd.org>; Fri, 13 May 2011 14:56:27 +0200
Received: from jtotz by ib-jtotz.ib.ic.ac.uk with local (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-fs@freebsd.org>; Fri, 13 May 2011 14:56:27 +0200
X-Injected-Via-Gmane: http://gmane.org/
To: freebsd-fs@freebsd.org
From: Johannes Totz <jtotz@imperial.ac.uk>
Date: Fri, 13 May 2011 13:56:15 +0100
Lines: 73
Message-ID: <iqj9pf$mrn$1@dough.gmane.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Complaints-To: usenet@dough.gmane.org
X-Gmane-NNTP-Posting-Host: ib-jtotz.ib.ic.ac.uk
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB;
	rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10
Subject: fusefs broken on 8-stable?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 13 May 2011 12:56:30 -0000

Heya!

Using encfs (built on top of fuse) gives me panics in combination with
rsync. Dump didn't succeed. The info below is transcribbled from a
photograph. This is repeatable.
Without dump this is probably not very helpful....


# uname -a
FreeBSD XXX 8.2-STABLE FreeBSD 8.2-STABLE #1: Thu Mar 10 23:30:08 GMT
2011     root@XXX:/usr/obj/usr/src/sys/GENERIC  amd64


First panic (top bits scrolled off screen):

trap number = 12
panic: page fault
cpuid = 0
KDB: stack backtrace
#0 ... kbd_backtrace+0x5c
#1 ... panic+0x1b4
#2 ... trap_fatal+0x394
#3 ... trap_pfault+0x252
#4 ... trap+0x3f4
#5 ... calltrap+0x8
#6 ... fdisp_make+0xe4
#7 ... fuse_lookup+0x1dc
#8 ... VOP_LOOKUP_APV+0x4c
#9 ... at lookup+0x61e
#10 ... at namei+0x592
#11 ... at vn_open_cred+0x339
#12 ... at vn_open+0x1c
#13 ... at kern_openat+0x152
#14 ... at kern_open+0x19
#15 ... at open+0x18
#16 ... at syscallenter+0x2d9
#17 ... at syscall+0x38


Second panic:


code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 17 (vnlru)
trap number = 12
panic: page fault
cpuid = 0
KDB: stack backtrace
#0 ... at kdb_backtrace+0x5c
#1 ... at panic+0x1b4
#2 ... at trap_fatal+0x394
#3 ... at trap_pfault0x252
#4 ... at trap+0x3f4
#5 ... at calltrap+0x8
#6 ... at fdisp_make_pid+0xc7
#7 ... at fuse_send_forget+0x44
#8 ... at fuse_recyc_backend+0xb2
#9 ... at VOP_RECLAIM_APV+0x49
#10 ... at vgonel+0x1b7
#11 ... at vnlru_proc+0x591
#12 ... at fork_exit+0x121
#13 ... at fork_trampoline+0xe


Any idea what could be going on?


Johannes


From owner-freebsd-fs@FreeBSD.ORG  Fri May 13 13:42:12 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B8C57106566B
	for <freebsd-fs@freebsd.org>; Fri, 13 May 2011 13:42:12 +0000 (UTC)
	(envelope-from gleb.kurtsou@gmail.com)
Received: from mail-ww0-f50.google.com (mail-ww0-f50.google.com [74.125.82.50])
	by mx1.freebsd.org (Postfix) with ESMTP id 4AA5B8FC08
	for <freebsd-fs@freebsd.org>; Fri, 13 May 2011 13:42:11 +0000 (UTC)
Received: by wwc33 with SMTP id 33so2826136wwc.31
	for <freebsd-fs@freebsd.org>; Fri, 13 May 2011 06:42:11 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma;
	h=domainkey-signature:date:from:to:cc:subject:message-id:references
	:mime-version:content-type:content-disposition:in-reply-to
	:user-agent; bh=Di4youyES9DGM0AoklWBIrzZZ3dXki4gM+jH5b0IjNQ=;
	b=Eketko0B4eKgByUG+CYk1ti9dUckbMRBK/JcSVHMNI4krA3wZC8hZtJ9URz+aVnDYx
	Igc8+gghyraqhHDJSBa6khDVGbsDVD1uK+gO4fJhCJzVDdtP9oYmz2L/jUpxZbT9Srnb
	WiCZoXvQ3zuXPeFoTi+4mlD3jY+HxWq1BVgwA=
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
	h=date:from:to:cc:subject:message-id:references:mime-version
	:content-type:content-disposition:in-reply-to:user-agent;
	b=vCes9HUCQrkSnLbQ+ezk320PTNs2d9nqzAhyo7oXp4b6z9qRjS8nQG6he2BBXCg6dw
	LZ7wpCa9lFJi8uHnEtEBhLHjCpkC5B3bDFSM5cMuJOdj5L3zfnJ3quWUP4K0XseMFA8d
	48xlLWxeOitIuxXdXEKfhlp9ibyz1Vx9Qjsdk=
Received: by 10.227.12.1 with SMTP id v1mr174331wbv.83.1305292793320;
	Fri, 13 May 2011 06:19:53 -0700 (PDT)
Received: from localhost (lan-78-157-92-5.vln.skynet.lt [78.157.92.5])
	by mx.google.com with ESMTPS id h11sm1388596wbc.9.2011.05.13.06.19.52
	(version=SSLv3 cipher=OTHER); Fri, 13 May 2011 06:19:52 -0700 (PDT)
Date: Fri, 13 May 2011 16:19:03 +0300
From: Gleb Kurtsou <gleb.kurtsou@gmail.com>
To: Johannes Totz <jtotz@imperial.ac.uk>
Message-ID: <20110513131902.GA34738@tops>
References: <iqj9pf$mrn$1@dough.gmane.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <iqj9pf$mrn$1@dough.gmane.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-fs@freebsd.org
Subject: Re: fusefs broken on 8-stable?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 13 May 2011 13:42:12 -0000

On (13/05/2011 13:56), Johannes Totz wrote:
> Heya!
> 
> Using encfs (built on top of fuse) gives me panics in combination with
> rsync. Dump didn't succeed. The info below is transcribbled from a
> photograph. This is repeatable.
> Without dump this is probably not very helpful....

As far as I know there is memory corruption. But this particular case
looks like VFS bug in fuse.

I'd appreciate if you give native FreeBSD kernel level cryptographic
filesystem PEFS a try:
http://www.freebsd.org/cgi/query-pr.cgi?pr=ports/156002 -- port
http://wiki.freebsd.org/PEFS
https://github.com/glk/pefs


> 
> 
> # uname -a
> FreeBSD XXX 8.2-STABLE FreeBSD 8.2-STABLE #1: Thu Mar 10 23:30:08 GMT
> 2011     root@XXX:/usr/obj/usr/src/sys/GENERIC  amd64
> 
> 
> 
> First panic (top bits scrolled off screen):
> 
> trap number = 12
> panic: page fault
> cpuid = 0
> KDB: stack backtrace
> #0 ... kbd_backtrace+0x5c
> #1 ... panic+0x1b4
> #2 ... trap_fatal+0x394
> #3 ... trap_pfault+0x252
> #4 ... trap+0x3f4
> #5 ... calltrap+0x8
> #6 ... fdisp_make+0xe4
> #7 ... fuse_lookup+0x1dc
> #8 ... VOP_LOOKUP_APV+0x4c
> #9 ... at lookup+0x61e
> #10 ... at namei+0x592
> #11 ... at vn_open_cred+0x339
> #12 ... at vn_open+0x1c
> #13 ... at kern_openat+0x152
> #14 ... at kern_open+0x19
> #15 ... at open+0x18
> #16 ... at syscallenter+0x2d9
> #17 ... at syscall+0x38
> 
> 
> 
> Second panic:
> 
> 
> code segment = base 0x0, limit 0xfffff, type 0x1b
>              = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 17 (vnlru)
> trap number = 12
> panic: page fault
> cpuid = 0
> KDB: stack backtrace
> #0 ... at kdb_backtrace+0x5c
> #1 ... at panic+0x1b4
> #2 ... at trap_fatal+0x394
> #3 ... at trap_pfault0x252
> #4 ... at trap+0x3f4
> #5 ... at calltrap+0x8
> #6 ... at fdisp_make_pid+0xc7
> #7 ... at fuse_send_forget+0x44
> #8 ... at fuse_recyc_backend+0xb2
> #9 ... at VOP_RECLAIM_APV+0x49
> #10 ... at vgonel+0x1b7
> #11 ... at vnlru_proc+0x591
> #12 ... at fork_exit+0x121
> #13 ... at fork_trampoline+0xe
> 
> 
> 
> Any idea what could be going on?
> 
> 
> Johannes
> 
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG  Fri May 13 14:13:52 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 67DCF1065673
	for <freebsd-fs@freebsd.org>; Fri, 13 May 2011 14:13:52 +0000 (UTC)
	(envelope-from bfriesen@simple.dallas.tx.us)
Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74])
	by mx1.freebsd.org (Postfix) with ESMTP id 0EA6A8FC15
	for <freebsd-fs@freebsd.org>; Fri, 13 May 2011 14:13:51 +0000 (UTC)
Received: from freddy.simplesystems.org (freddy.simplesystems.org
	[65.66.246.65])
	by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id
	p4DEDoDQ022145; Fri, 13 May 2011 09:13:50 -0500 (CDT)
Date: Fri, 13 May 2011 09:13:50 -0500 (CDT)
From: Bob Friesenhahn <bfriesen@simple.dallas.tx.us>
X-X-Sender: bfriesen@freddy.simplesystems.org
To: Freddie Cash <fjwcash@gmail.com>
In-Reply-To: <BANLkTi=b+wA-ADup9SQvykexJBZwjK9WZw@mail.gmail.com>
Message-ID: <alpine.GSO.2.01.1105130848030.20825@freddy.simplesystems.org>
References: <1700693186.266759.1305241371736.JavaMail.root@erie.cs.uoguelph.ca>
	<alpine.GSO.2.01.1105121805500.8019@freddy.simplesystems.org>
	<BANLkTi=b+wA-ADup9SQvykexJBZwjK9WZw@mail.gmail.com>
User-Agent: Alpine 2.01 (GSO 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2
	(blade.simplesystems.org [65.66.246.90]);
	Fri, 13 May 2011 09:13:50 -0500 (CDT)
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS: How to enable cache and logs.
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 13 May 2011 14:13:52 -0000

On Thu, 12 May 2011, Freddie Cash wrote:
>>
>> Zfs would certainly appreciate 128K since that is its default block size.
>
> Note:  the "default block size" is a max block size, not an "every
> block written is this size" setting.  A ZFS filesystem will use any
> power-of-2 size under the block size setting for that filesystem.

Except for file tail blocks, or when compression/encrpytion is used, 
zfs will write full blocks as is configured for the filesystem being 
written to (the current setting when the file was originally created). 
Even with compression/encrpytion enabled, the input (uncompressed) 
data size is the configured block size.  The block needs to be read, 
and (possibly) decompressed, and (possibly) decrypted so that it can 
be checksummed, and any changes made.  The checksum is based on the 
decoded block in order to capture as many potential error cases as 
possible, and so that the zfs "send" stream can use the same 
checksums.

Zfs writes data in large transaction groups ("TXG") which allows it to 
buffer quite a lot of update data (up to 5 seconds worth) before 
anything is actually written.  Even if the application should write 
16kb at a time, zfs is likely to have buffered many times 128kb by the 
time the next TXG is written.

If zfs goes to write a block and the user has supplied less than the 
block size, and the file data has not been accessed for a long time, 
or the system is under memory pressure so the file data is no longer 
cached, then zfs needs to read (which includes checksum validation, 
and possibly decompression and deencryption) the existing block 
content so that it can fill in the gaps since it always writes full 
blocks.  The blocks are written using a Copy On Write ("COW") 
algorithm so that the block is written to a new block location.  If 
the NFS client conveniently sent the data 128K at a time for 
sequential writes then there is a better chance that zfs will be able 
to avoid some heavy lifting.

Bob
-- 
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From owner-freebsd-fs@FreeBSD.ORG  Fri May 13 21:11:06 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 16D621065670
	for <freebsd-fs@freebsd.org>; Fri, 13 May 2011 21:11:06 +0000 (UTC)
	(envelope-from freebsd-fs@m.gmane.org)
Received: from lo.gmane.org (lo.gmane.org [80.91.229.12])
	by mx1.freebsd.org (Postfix) with ESMTP id 9EE518FC08
	for <freebsd-fs@freebsd.org>; Fri, 13 May 2011 21:11:05 +0000 (UTC)
Received: from list by lo.gmane.org with local (Exim 4.69)
	(envelope-from <freebsd-fs@m.gmane.org>) id 1QKzdY-0006ZN-Kr
	for freebsd-fs@freebsd.org; Fri, 13 May 2011 23:11:04 +0200
Received: from ib-jtotz.ib.ic.ac.uk ([155.198.110.220])
	by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-fs@freebsd.org>; Fri, 13 May 2011 23:11:04 +0200
Received: from jtotz by ib-jtotz.ib.ic.ac.uk with local (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-fs@freebsd.org>; Fri, 13 May 2011 23:11:04 +0200
X-Injected-Via-Gmane: http://gmane.org/
To: freebsd-fs@freebsd.org
From: Johannes Totz <jtotz@imperial.ac.uk>
Date: Fri, 13 May 2011 22:10:52 +0100
Lines: 89
Message-ID: <iqk6os$3ui$1@dough.gmane.org>
References: <iqj9pf$mrn$1@dough.gmane.org> <20110513131902.GA34738@tops>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Complaints-To: usenet@dough.gmane.org
X-Gmane-NNTP-Posting-Host: ib-jtotz.ib.ic.ac.uk
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB;
	rv:1.9.2.17) Gecko/20110414 Thunderbird/3.1.10
In-Reply-To: <20110513131902.GA34738@tops>
Subject: Re: fusefs broken on 8-stable?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 13 May 2011 21:11:06 -0000

On 13/05/2011 14:19, Gleb Kurtsou wrote:
> On (13/05/2011 13:56), Johannes Totz wrote:
>> Heya!
>>
>> Using encfs (built on top of fuse) gives me panics in combination with
>> rsync. Dump didn't succeed. The info below is transcribbled from a
>> photograph. This is repeatable.
>> Without dump this is probably not very helpful....
> 
> As far as I know there is memory corruption. But this particular case
> looks like VFS bug in fuse.
> 
> I'd appreciate if you give native FreeBSD kernel level cryptographic
> filesystem PEFS a try:
> http://www.freebsd.org/cgi/query-pr.cgi?pr=ports/156002 -- port
> http://wiki.freebsd.org/PEFS
> https://github.com/glk/pefs

Looks interesting...

I was relying on encfs's reverse-mode though: given a plaintext
directory, it provides an encrypted view on-the-fly which I was
rsync'ing to other file servers.

>> # uname -a
>> FreeBSD XXX 8.2-STABLE FreeBSD 8.2-STABLE #1: Thu Mar 10 23:30:08 GMT
>> 2011     root@XXX:/usr/obj/usr/src/sys/GENERIC  amd64
>>
>>
>>
>> First panic (top bits scrolled off screen):
>>
>> trap number = 12
>> panic: page fault
>> cpuid = 0
>> KDB: stack backtrace
>> #0 ... kbd_backtrace+0x5c
>> #1 ... panic+0x1b4
>> #2 ... trap_fatal+0x394
>> #3 ... trap_pfault+0x252
>> #4 ... trap+0x3f4
>> #5 ... calltrap+0x8
>> #6 ... fdisp_make+0xe4
>> #7 ... fuse_lookup+0x1dc
>> #8 ... VOP_LOOKUP_APV+0x4c
>> #9 ... at lookup+0x61e
>> #10 ... at namei+0x592
>> #11 ... at vn_open_cred+0x339
>> #12 ... at vn_open+0x1c
>> #13 ... at kern_openat+0x152
>> #14 ... at kern_open+0x19
>> #15 ... at open+0x18
>> #16 ... at syscallenter+0x2d9
>> #17 ... at syscall+0x38
>>
>>
>>
>> Second panic:
>>
>>
>> code segment = base 0x0, limit 0xfffff, type 0x1b
>>              = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags = interrupt enabled, resume, IOPL = 0
>> current process = 17 (vnlru)
>> trap number = 12
>> panic: page fault
>> cpuid = 0
>> KDB: stack backtrace
>> #0 ... at kdb_backtrace+0x5c
>> #1 ... at panic+0x1b4
>> #2 ... at trap_fatal+0x394
>> #3 ... at trap_pfault0x252
>> #4 ... at trap+0x3f4
>> #5 ... at calltrap+0x8
>> #6 ... at fdisp_make_pid+0xc7
>> #7 ... at fuse_send_forget+0x44
>> #8 ... at fuse_recyc_backend+0xb2
>> #9 ... at VOP_RECLAIM_APV+0x49
>> #10 ... at vgonel+0x1b7
>> #11 ... at vnlru_proc+0x591
>> #12 ... at fork_exit+0x121
>> #13 ... at fork_trampoline+0xe
>>
>>
>>
>> Any idea what could be going on?
>>
>>
>> Johannes


From owner-freebsd-fs@FreeBSD.ORG  Sat May 14 16:18:48 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id AFECA1065673;
	Sat, 14 May 2011 16:18:48 +0000 (UTC) (envelope-from jh@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 873778FC13;
	Sat, 14 May 2011 16:18:48 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p4EGImXq022117;
	Sat, 14 May 2011 16:18:48 GMT (envelope-from jh@freefall.freebsd.org)
Received: (from jh@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p4EGIlte022113;
	Sat, 14 May 2011 16:18:47 GMT (envelope-from jh)
Date: Sat, 14 May 2011 16:18:47 GMT
Message-Id: <201105141618.p4EGIlte022113@freefall.freebsd.org>
To: bas@kompasmedia.nl, jh@FreeBSD.org, freebsd-fs@FreeBSD.org
From: jh@FreeBSD.org
Cc: 
Subject: Re: kern/120991: [panic] [ffs] [snapshot] System crashes when
	manipulating fs snapshots
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 14 May 2011 16:18:48 -0000

Synopsis: [panic] [ffs] [snapshot] System crashes when manipulating fs snapshots

State-Changed-From-To: feedback->open
State-Changed-By: jh
State-Changed-When: Sat May 14 16:18:47 UTC 2011
State-Changed-Why: 
Feedback received.

http://www.freebsd.org/cgi/query-pr.cgi?pr=120991

From owner-freebsd-fs@FreeBSD.ORG  Sat May 14 16:53:03 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E432E1065784;
	Sat, 14 May 2011 16:53:03 +0000 (UTC) (envelope-from jh@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id BB36E8FC0A;
	Sat, 14 May 2011 16:53:03 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p4EGr3u6057761;
	Sat, 14 May 2011 16:53:03 GMT (envelope-from jh@freefall.freebsd.org)
Received: (from jh@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p4EGr3iP057757;
	Sat, 14 May 2011 16:53:03 GMT (envelope-from jh)
Date: Sat, 14 May 2011 16:53:03 GMT
Message-Id: <201105141653.p4EGr3iP057757@freefall.freebsd.org>
To: bas@kompasmedia.nl, jh@FreeBSD.org, freebsd-fs@FreeBSD.org
From: jh@FreeBSD.org
Cc: 
Subject: Re: kern/120991: [panic] [ffs] [snapshot] System crashes when
	manipulating fs snapshots
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 14 May 2011 16:53:04 -0000

Synopsis: [panic] [ffs] [snapshot] System crashes when manipulating fs snapshots

State-Changed-From-To: open->feedback
State-Changed-By: jh
State-Changed-When: Sat May 14 16:51:46 UTC 2011
State-Changed-Why: 


http://www.freebsd.org/cgi/query-pr.cgi?pr=120991

From owner-freebsd-fs@FreeBSD.ORG  Sat May 14 16:55:13 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id CF8391065670;
	Sat, 14 May 2011 16:55:13 +0000 (UTC) (envelope-from jh@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 9DC9F8FC14;
	Sat, 14 May 2011 16:55:13 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p4EGtDEF057957;
	Sat, 14 May 2011 16:55:13 GMT (envelope-from jh@freefall.freebsd.org)
Received: (from jh@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p4EGtDbs057953;
	Sat, 14 May 2011 16:55:13 GMT (envelope-from jh)
Date: Sat, 14 May 2011 16:55:13 GMT
Message-Id: <201105141655.p4EGtDbs057953@freefall.freebsd.org>
To: jh@FreeBSD.org, freebsd-fs@FreeBSD.org, jh@FreeBSD.org
From: jh@FreeBSD.org
Cc: 
Subject: Re: kern/120991: [panic] [ffs] [snapshot] System crashes when
	manipulating fs snapshots
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 14 May 2011 16:55:13 -0000

Synopsis: [panic] [ffs] [snapshot] System crashes when manipulating fs snapshots

Responsible-Changed-From-To: freebsd-fs->jh
Responsible-Changed-By: jh
Responsible-Changed-When: Sat May 14 16:55:13 UTC 2011
Responsible-Changed-Why: 
Can you still reproduce this on a supported release?

http://www.freebsd.org/cgi/query-pr.cgi?pr=120991

From owner-freebsd-fs@FreeBSD.ORG  Sat May 14 17:10:55 2011
Return-Path: <owner-freebsd-fs@FreeBSD.ORG>
Delivered-To: freebsd-fs@hub.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id A5B36106564A;
	Sat, 14 May 2011 17:10:55 +0000 (UTC) (envelope-from jh@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 7CEEC8FC0C;
	Sat, 14 May 2011 17:10:55 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p4EHAtQ9072923;
	Sat, 14 May 2011 17:10:55 GMT (envelope-from jh@freefall.freebsd.org)
Received: (from jh@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p4EHAtaD072910;
	Sat, 14 May 2011 17:10:55 GMT (envelope-from jh)
Date: Sat, 14 May 2011 17:10:55 GMT
Message-Id: <201105141710.p4EHAtaD072910@freefall.freebsd.org>
To: mjacob@freebsd.org, jh@FreeBSD.org, freebsd-fs@FreeBSD.org
From: jh@FreeBSD.org
Cc: 
Subject: Re: kern/106030: [ufs] [panic] panic in ufs from geom when a dead
	disk is invalidated
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems <freebsd-fs.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-fs>
List-Post: <mailto:freebsd-fs@freebsd.org>
List-Help: <mailto:freebsd-fs-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-fs>,
	<mailto:freebsd-fs-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 14 May 2011 17:10:55 -0000

Synopsis: [ufs] [panic] panic in ufs from geom when a dead disk is invalidated

State-Changed-From-To: open->feedback
State-Changed-By: jh
State-Changed-When: Sat May 14 17:10:55 UTC 2011
State-Changed-Why: 
Can you still reproduce this on a supported release?

http://www.freebsd.org/cgi/query-pr.cgi?pr=106030