From owner-freebsd-fs@FreeBSD.ORG Fri Jan 1 05:26:04 2010
Message-ID: <4B3D875F.8070909@dannysplace.net>
Date: Fri, 01 Jan 2010 15:25:51 +1000
From: Danny Carroll
To: Solon Lutz
In-Reply-To: <957649379.20091216005253@pyro.de>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS RaidZ2 with 24 drives?

On 16/12/2009 9:52 AM, Solon Lutz wrote:
ost 10TB of data. (No backups - too expensive)
> Why do you use JBOD? You can configure a passthrough for all drives,
> explicitly degrading the Areca to a dumb sata controller...
>

Actually, with passthrough, you get JBOD plus cache plus battery
backup.... Quite a nice setup IMHO.

-D

From owner-freebsd-fs@FreeBSD.ORG Fri Jan 1 06:48:07 2010
Message-ID: <4B3D95AD.8050304@dannysplace.net>
Date: Fri, 01 Jan 2010 16:26:53 +1000
From: Danny Carroll
To: Matt Simerson
In-Reply-To: <26F8D203-A923-47D3-9935-BE4BC6DA09B7@corp.spry.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS RaidZ2 with 24 drives?

On 17/12/2009 6:43 AM, Matt Simerson wrote:
> Why would I bother? Both ways present each disk to FreeBSD. Based on
> my understanding (and an answer received from Areca support), the only
> reason I'd bother manually configuring some disks for passthrough is
> if I wanted to use some disks in a RAID array and others as raw disks.
> Configuring JBOD mode configures ALL the disks on the controller as
> passthrough devices.
>

It's my experience that the statement above is simply not true. I
confirmed this with throughput tests as well as talking to Areca (as I
understand you also did).

JBOD = Give the OS the drive.
PassThrough = Give the OS the drive but run it through the cache first.

I use passthrough specifically for this reason. If I lose power, then
the cache should fix up the writes at restart.
You do not have this protection when ZFS has access to the raw devices.
Even worse if the devices write cache is turned on.

-D

From owner-freebsd-fs@FreeBSD.ORG Fri Jan 1 15:12:41 2010
Message-ID: <884434.85467.qm@web51002.mail.re2.yahoo.com>
Date: Fri, 1 Jan 2010 07:12:37 -0800 (PST)
From: Richard Mahlerwein
To: Miroslav Lachman <000.fbsd@quip.cz>
In-Reply-To: <4B3C751A.9040909@quip.cz>
Cc: freebsd-fs@freebsd.org
Subject: Samba + Previous Versions

>From: Miroslav Lachman <000.fbsd@quip.cz>
>
>Do you see Previous Versions on Samba shares from FreeBSD? I think you
>need some tweaks in smb.conf:
>
>http://www.edplese.com/samba-with-zfs.html
>
>http://www.edplese.com/blog/2009/12/02/samba-shadow_copy2-enhancements/
>
>http://www.samba.org/samba/docs/man/Samba-HOWTO-Collection/VFS.html#id2651813
>
>Miroslav Lachman

I applied the patch to my 3.3.9 source and did a make deinstall && make
reinstall. It compiled OK, but it still isn't working properly overall.

Not sure this still belongs in freebsd-fs. I'd be happy to move or
repost the thread elsewhere if that should be the case...

Here's the relevant portion of my smb.conf

[test]
    comment = Testing ZFS and Snapshots
    path=/tank
    root preexec = /usr/bin/snapshot_date.sh tank
    vfs objects = shadow_copy2
    shadow_copy2: snapdir = tank
    shadow_copy2: sort = desc
    shadow_copy2: localtime = yes
    read only = no
    guest ok = yes

For snapdir, I have tried various things; .zfs, /tank, tank, .snap,
/tank/snap, /tank/.snap... and another 30 or 40 items. I have no idea
what this should be.

Anyway, to continue. The preexec is working, as shown by the snapshots
with GMT in them (BTW, it's a test VM and I never bothered to set the
time zone, so those are GMT tagged but are actually Eastern Time US. :)
Cool!

curie# zfs list
NAME                           USED  AVAIL  REFER  MOUNTPOINT
tank                           178K  2.94G    48K  /tank
tank@snap1                      16K      -    18K  -
tank@GMT-2009.12.31-15.49.36      0      -    48K  -
tank@GMT-2009.12.31-15.54.02      0      -    48K  -
tank@GMT-2010.01.01-09.50.29      0      -    48K  -

And a snip of my /var/log/samba/log.smbd says (you'll note an extra
non-standard log line, described below)

[2010/01/01 09:50:29, 1] smbd/service.c:make_connection_snum(1119)
  fcp-rich (192.168.1.100) connect to service test initially as user test (uid=1002, gid=1002) (pid 4835)
[2010/01/01 09:50:30, 0] modules/vfs_shadow_copy2.c:shadow_copy2_get_shadow_copy2_data(647)
  shadow:initializing with snapdir (null)
[2010/01/01 09:50:30, 0] modules/vfs_shadow_copy2.c:shadow_copy2_get_shadow_copy2_data(651)
  shadow:snapdir not found for /tank in get_shadow_copy_data
[2010/01/01 09:50:30, 0] smbd/nttrans.c:call_nt_transact_ioctl(1867)
  FSCTL_GET_SHADOW_COPY_DATA: connectpath /tank, failed.
[2010/01/01 09:50:39, 1] smbd/service.c:close_cnum(1331)
  fcp-rich (192.168.1.100) closed connection to service test

You'll notice I added one extra DEBUG item around line 647 in my source
to tell me what snapdir actually was when it fails.

635 static int shadow_copy2_get_shadow_copy2_data(vfs_handle_struct *handle,
636                                               files_struct *fsp,
637                                               SHADOW_COPY_DATA *shadow_copy2_data,
638                                               bool labels)
639 {
640         SMB_STRUCT_DIR *p;
641         const char *snapdir;
642         SMB_STRUCT_DIRENT *d;
643         TALLOC_CTX *tmp_ctx = talloc_new(handle->data);
644         char *snapshot;
645
646         snapdir = shadow_copy2_find_snapdir(tmp_ctx, handle);
647         DEBUG(0,("shadow:initializing with snapdir %s\n", snapdir));
648
649         if (snapdir == NULL) {
650                 DEBUG(0,("shadow:snapdir not found for %s in get_shadow_copy_data\n",
651                          handle->conn->connectpath));

So, question 1. What do I put for snapdir? What else am I missing? I'm
sure it's something simple I've overlooked.

Thanks!
Rich Mahlerwein

From owner-freebsd-fs@FreeBSD.ORG Fri Jan 1 16:56:47 2010
Date: Fri, 1 Jan 2010 10:56:21 -0600 (CST)
From: Bob Friesenhahn
To: Danny Carroll
In-Reply-To: <4B3D95AD.8050304@dannysplace.net>
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS RaidZ2 with 24 drives?

On Fri, 1 Jan 2010, Danny Carroll wrote:
>
> You do not have this protection when ZFS has access to the raw devices.
> Even worse if the devices write cache is turned on.

This statement does not appear to be true. ZFS will always request
that devices flush their cache. The only time there is no
"protection" is if the device ignores that flush request and the cache
is volatile. Controller battery-backed RAM is useful since the
controller can respond to the cache flush request once the data is in
battery-backed RAM, thereby dramatically improving write latencies for
small writes

Bob
--
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From owner-freebsd-fs@FreeBSD.ORG Fri Jan 1 19:52:38 2010
Date: Fri, 1 Jan 2010 20:51:56 +0100
From: Bernd Walter
To: Bob Friesenhahn
Message-ID: <20100101195155.GV43739@cicely7.cicely.de>
Cc: freebsd-fs@freebsd.org, Danny Carroll
Subject: Re: ZFS RaidZ2 with 24 drives?

On Fri, Jan 01, 2010 at 10:56:21AM -0600, Bob Friesenhahn wrote:
> On Fri, 1 Jan 2010, Danny Carroll wrote:
> >
> >You do not have this protection when ZFS has access to the raw devices.
> >Even worse if the devices write cache is turned on.
>
> This statement does not appear to be true. ZFS will always request
> that devices flush their cache. The only time there is no
> "protection" is if the device ignores that flush request and the cache
> is volatile. Controller battery-backed RAM is useful since the
> controller can respond to the cache flush request once the data is in
> battery-backed RAM, thereby dramatically improving write latencies for
> small writes

Which - if it is true for the controller - can be dangerous.
A battery backed cache is volatile if the system is going down for a
long time.
Or consider the system is going down to relocate the disks to a new
machine, or just to a newer controller?

--
B.Walter http://www.bwct.de
Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD computers and much more.

From owner-freebsd-fs@FreeBSD.ORG Fri Jan 1 20:33:47 2010
Message-ID: <55389.88569.qm@web112405.mail.gq1.yahoo.com>
Date: Fri, 1 Jan 2010 12:06:56 -0800 (PST)
From: Šimun Mikecin
To: "ticso@cicely.de"
Cc: "freebsd-fs@freebsd.org", Danny Carroll
Subject: Re: ZFS RaidZ2 with 24 drives?

On 1 Jan 2010, at 20:51, Bernd Walter wrote:

On Fri, Jan 01, 2010 at 10:56:21AM -0600, Bob Friesenhahn wrote:
On Fri, 1 Jan 2010, Danny Carroll wrote:

You do not have this protection when ZFS has access to the raw devices.
Even worse if the devices write cache is turned on.

This statement does not appear to be true. ZFS will always request
that devices flush their cache. The only time there is no
"protection" is if the device ignores that flush request and the cache
is volatile. Controller battery-backed RAM is useful since the
controller can respond to the cache flush request once the data is in
battery-backed RAM, thereby dramatically improving write latencies for
small writes

Which - if it is true for the controller - can be dangerous.
A battery backed cache is volatile if the system is going down for
a long time.
Or consider the system is going down to relocate the disks to a new
machine, or just to a newer controller?


If you are using the amr driver then FreeBSD will flush the cache during shutdown. I haven't tried other drivers myself, but I suppose they do the same.
You can have a problem only on unclean shutdown after which you didn't reboot (nobody does that willingly).

From owner-freebsd-fs@FreeBSD.ORG Fri Jan 1 20:48:03 2010
Date: Fri, 1 Jan 2010 21:47:53 +0100
From: Bernd Walter
To: Šimun Mikecin
Message-ID: <20100101204752.GW43739@cicely7.cicely.de>
In-Reply-To: <55389.88569.qm@web112405.mail.gq1.yahoo.com>
Cc: "freebsd-fs@freebsd.org", "ticso@cicely.de", Danny Carroll
Subject: Re: ZFS RaidZ2 with 24 drives?

On Fri, Jan 01, 2010 at 12:06:56PM -0800, Šimun Mikecin wrote:
>
> On 1 Jan 2010, at 20:51, Bernd Walter wrote:
>
> On Fri, Jan 01, 2010 at 10:56:21AM -0600, Bob Friesenhahn wrote:
> On Fri, 1 Jan 2010, Danny Carroll wrote:
>
> You do not have this protection when ZFS has access to the raw devices.
> Even worse if the devices write cache is turned on.
>
> This statement does not appear to be true. ZFS will always request
> that devices flush their cache. The only time there is no
> "protection" is if the device ignores that flush request and the cache
> is volatile. Controller battery-backed RAM is useful since the
> controller can respond to the cache flush request once the data is in
> battery-backed RAM, thereby dramatically improving write latencies for
> small writes
>
> Which - if it is true for the controller - can be dangerous.
> A battery backed cache is volatile if the system is going down for
> a long time.
> Or consider the system is going down to relocate the disks to a new
> machine, or just to a newer controller?
>
>
> If you are using the amr driver then FreeBSD will flush the cache during shutdown. I haven't tried other drivers myself, but I suppose they do the same.
> You can have a problem only on unclean shutdown after which you didn't reboot (nobody does that willingly).

Everyone does this if the board dies and needs replacement.
Not willingly, but it happens.
And what about zpool export - relocate disks to another machine - and
zpool import - without halt?
It is less safe if a cache flush won't flush its cache.
The real purpose to have buffered cache is to handle asynchronicity in
RAID systems after power failure, but RAIDZ won't have this problem
by design, at least if running with CRC enabled.

--
B.Walter http://www.bwct.de
Modbus/TCP Ethernet I/O modules, ARM-based FreeBSD computers and much more.

From owner-freebsd-fs@FreeBSD.ORG Fri Jan 1 21:45:28 2010
Date: Fri, 1 Jan 2010 15:45:19 -0600 (CST)
From: Bob Friesenhahn
To: "ticso@cicely.de"
In-Reply-To: <20100101204752.GW43739@cicely7.cicely.de>
Cc: "freebsd-fs@freebsd.org"
Subject: Re: ZFS RaidZ2 with 24 drives?

On Fri, 1 Jan 2010, Bernd Walter wrote:
>
> Everyone does this if the board dies and needs replacement.
> Not willingly, but it happens.
> And what about zpool export - relocate disks to another machine - and
> zpool import - without halt?
> It is less safe if a cache flush won't flush its cache.
> The real purpose to have buffered cache is to handle asynchronicity in
> RAID systems after power failure, but RAIDZ won't have this problem
> by design, at least if running with CRC enabled.

A proper write-through cache should automatically commit itself (in
order) to backing store within a second or two. Other than cache
designs which are not "proper" (which we should not use) the main
concern is if the system loses power or crashes while it is producing
a significant write load so that there is uncommitted data in cache.

ZFS is not particularly more likely to lose user data, but it is much
more likely to detect and report loss since most other filesystems
don't even check, or even have a way to check.

Bob
--
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

From owner-freebsd-fs@FreeBSD.ORG Fri Jan 1 22:22:57 2010
Message-ID: <4B3E75B4.6040505@dannysplace.net>
Date: Sat, 02 Jan 2010 08:22:44 +1000
From: Danny Carroll
To: Bob Friesenhahn
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS RaidZ2 with 24 drives?

On 2/01/2010 2:56 AM, Bob Friesenhahn wrote:
> On Fri, 1 Jan 2010, Danny Carroll wrote:
>>
>> You do not have this protection when ZFS has access to the raw devices.
>> Even worse if the devices write cache is turned on.
>
> This statement does not appear to be true. ZFS will always request
> that devices flush their cache. The only time there is no
> "protection" is if the device ignores that flush request and the cache
> is volatile. Controller battery-backed RAM is useful since the
> controller can respond to the cache flush request once the data is in
> battery-backed RAM, thereby dramatically improving write latencies for
> small writes
>

Yeah, I should have been more clear on that. When this happens on the
controller, it can be an issue if the controller decides not to commit
the data to disk (as others have pointed out). Last time I looked into
this I think I read that some controllers will flush to disk, some will
simply report "OK" once the write to cache has occurred. Given that the
BBU is optional on most array cards anyway, I never understood why a
vendor would choose *not* to flush to disk.

-D

From owner-freebsd-fs@FreeBSD.ORG Fri Jan 1 22:39:54 2010
Date: Fri, 1 Jan 2010 23:39:08 +0100
From: Bernd Walter
To: Bob Friesenhahn
Message-ID: <20100101223907.GX43739@cicely7.cicely.de>
spamd.cicely.de Cc: "freebsd-fs@freebsd.org" , "ticso@cicely.de" Subject: Re: ZFS RaidZ2 with 24 drives? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: ticso@cicely.de List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Jan 2010 22:39:54 -0000 On Fri, Jan 01, 2010 at 03:45:19PM -0600, Bob Friesenhahn wrote: > On Fri, 1 Jan 2010, Bernd Walter wrote: > > > >Everyone do this if the board dies and needs replacement. > >Not willingly, but it happens. > >And what about zfs export - relocate disks to another machine - and > >zfs import - without halt? > >It is less safe if a cache flush won't flush its cache. > >The real purpose to have buffered cache is to handle asyncronity in > >RAID systems after power failure, but RAIDZ won't have this problem > >by design, at least if running with CRC enabled. > > A proper write-through cache should automatically commit itself (in > order) to backing store within a second or two. Other than cache > designs which are not "proper" (which we should not use) the main > concern is if the system loses power or crashes while it is producing > a significant write load so that there is uncomitted data in cache. There are many possible reasons why this won't happen. One of them is a simple write failure, which can't be reported back to the filesystem, because not even a cache flush fails. Yes - the risk might be tolerable for many people and I don't think it is very high. The main problem I see is that such controllers won't tell about their strategy, so you are left in the dark. > ZFS is not particularly more likely to lose user data, but it is much > more likely to detect and report loss since most other filesystems > don't even check, or even have a way to check. Agreed. And ZFS can win a lot from fast flushes. -- B.Walter http://www.bwct.de Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm. 
From owner-freebsd-fs@FreeBSD.ORG Fri Jan 1 23:05:35 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 666C91065670 for ; Fri, 1 Jan 2010 23:05:35 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id 2C1C18FC19 for ; Fri, 1 Jan 2010 23:05:34 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.13.8+Sun/8.13.8) with ESMTP id o01N5PjY019933; Fri, 1 Jan 2010 17:05:26 -0600 (CST) Date: Fri, 1 Jan 2010 17:05:25 -0600 (CST) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: "ticso@cicely.de" In-Reply-To: <20100101223907.GX43739@cicely7.cicely.de> Message-ID: References: <55389.88569.qm@web112405.mail.gq1.yahoo.com> <20100101204752.GW43739@cicely7.cicely.de> <20100101223907.GX43739@cicely7.cicely.de> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Fri, 01 Jan 2010 17:05:26 -0600 (CST) Cc: "freebsd-fs@freebsd.org" Subject: Re: ZFS RaidZ2 with 24 drives? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Jan 2010 23:05:35 -0000 On Fri, 1 Jan 2010, Bernd Walter wrote: > > There are many possible reasons why this won't happen. > One of them is a simple write failure, which can't be reported back > to the filesystem, because not even a cache flush fails. For most RAID systems (and for ZFS) it is best if write failures are tossed because there should be a redundant copy somewhere else. 
Write failures usually indicate a completely failed disk since modern disks include their own bad-block management. The most important thing for ZFS is that transaction group writes are written in order, as demarcated by transaction group cache sync requests. Otherwise you get a scrambled pool which may require an expert human to untangle. > The main problem I see is that such controllers won't tell about > their strategy, so you are left in the dark. That is unfortunate. Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Fri Jan 1 23:17:38 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3C5A41065692 for ; Fri, 1 Jan 2010 23:17:38 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: from mail-yw0-f197.google.com (mail-yw0-f197.google.com [209.85.211.197]) by mx1.freebsd.org (Postfix) with ESMTP id F015C8FC15 for ; Fri, 1 Jan 2010 23:17:37 +0000 (UTC) Received: by ywh35 with SMTP id 35so5082735ywh.7 for ; Fri, 01 Jan 2010 15:17:29 -0800 (PST) Received: by 10.101.189.20 with SMTP id r20mr27615904anp.191.1262387849010; Fri, 01 Jan 2010 15:17:29 -0800 (PST) Received: from ?10.0.1.198? 
(udp022762uds.hawaiiantel.net [72.234.79.107]) by mx.google.com with ESMTPS id 8sm5427448ywg.19.2010.01.01.15.17.26 (version=SSLv3 cipher=RC4-MD5); Fri, 01 Jan 2010 15:17:27 -0800 (PST) Date: Fri, 1 Jan 2010 13:19:33 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Ben Schumacher In-Reply-To: <9859143f0912292118h44a33961mc8207d9b943a5f1f@mail.gmail.com> Message-ID: References: <32CA2B73-3412-49DD-9401-4773CC73BED0@patpro.net> <4B3283F2.7060804@barryp.org> <3ea87f5f62bb8ba30d798d4605a64c83@localhost> <9859143f0912292118h44a33961mc8207d9b943a5f1f@mail.gmail.com> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="2547152148-1490549838-1262387976=:1027" Cc: freebsd-fs@freebsd.org Subject: Re: snapshot implementation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Jan 2010 23:17:38 -0000 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --2547152148-1490549838-1262387976=:1027 Content-Type: TEXT/PLAIN; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8BIT On Tue, 29 Dec 2009, Ben Schumacher wrote: > On Sun, Dec 27, 2009 at 6:25 PM, Jeff Roberson wrote: >> It can take some time depending on fs activity on the machine.  There are >> ways to continue to optimize it within the existing infrastructure.  It only >> requires someone willing to expend the time. > > Any idea how complex of a task this is (and how much fruit it would > bear)? I've been interested in dipping my toes into some FreeBSD > kernel work, but I'm not exactly sure where to start. 
I honestly don't
> have tons of free time to work on it (job commitments and all that),
> but am curious if this is something that an experienced C programmer
> would have a shot at doing having very little experience with
> low-level kernel internals. (I'm used to dealing with POSIX interfaces
> and not the code that implements them...)
>
> I've recently picked up a copy of "The Design and Implementation of
> the FreeBSD OS", so I'm starting there, but I would love it if anybody
> could toss me a hint or two on what some of the low-hanging fruit in
> the arena might be. I've been playing with ZFS on a few boxes now, but
> I've had (even with FreeBSD 8) enough unusual crashes that I'm
> personally not ready to commit to using it on at least one "mission
> critical" project I'm working on. That being said I'd love to be able
> to do snapshots on the box without it hanging for over an hour due to
> the fact that the data drive is >400GB (frankly on the small side for
> some of the storage applications I've read about on this mailing
> list).
>
> Any hints, tips, pointers would be appreciated.

The daemon book is a good start. I'd say the snapshot problem might be a bit tough right out of the gate, but it could be possible if you have strong fundamentals and someone experienced mentors you.

Why don't you read a bit of the daemon book and see if you can follow the existing snapshot code to understand how it works. Once you feel like you have a good grasp of that, email me directly and we'll talk. Kirk and I discussed ways that we could speed it up dramatically by doing copy-on-write of the cgs (cylinder groups) in the allocation functions. This would take care of the considerable delay that is incurred when making a snapshot.

One good first project that would introduce you somewhat to the process of kernel programming would be to add timing instrumentation to the various stages of building a snapshot. This way you can prove where the delay happens.
Then you would be familiar with using timers, building custom kernels, and a little bit of the snapshot code. Cheers, Jeff > > Cheers, > Ben > --2547152148-1490549838-1262387976=:1027-- From owner-freebsd-fs@FreeBSD.ORG Fri Jan 1 23:21:49 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 925D9106566B for ; Fri, 1 Jan 2010 23:21:49 +0000 (UTC) (envelope-from jroberson@jroberson.net) Received: from mail-gx0-f218.google.com (mail-gx0-f218.google.com [209.85.217.218]) by mx1.freebsd.org (Postfix) with ESMTP id 5394C8FC19 for ; Fri, 1 Jan 2010 23:21:49 +0000 (UTC) Received: by gxk10 with SMTP id 10so13024146gxk.3 for ; Fri, 01 Jan 2010 15:21:37 -0800 (PST) Received: by 10.91.164.22 with SMTP id r22mr4101321ago.64.1262388090012; Fri, 01 Jan 2010 15:21:30 -0800 (PST) Received: from ?10.0.1.198? (udp022762uds.hawaiiantel.net [72.234.79.107]) by mx.google.com with ESMTPS id 14sm7839679gxk.6.2010.01.01.15.21.27 (version=SSLv3 cipher=RC4-MD5); Fri, 01 Jan 2010 15:21:29 -0800 (PST) Date: Fri, 1 Jan 2010 13:23:35 -1000 (HST) From: Jeff Roberson X-X-Sender: jroberson@desktop To: Ronald Klop In-Reply-To: Message-ID: References: <32CA2B73-3412-49DD-9401-4773CC73BED0@patpro.net> <4B3283F2.7060804@barryp.org> <3ea87f5f62bb8ba30d798d4605a64c83@localhost> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-fs@freebsd.org, patpro Subject: Re: snapshot implementation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Jan 2010 23:21:49 -0000 On Tue, 29 Dec 2009, Ronald Klop wrote: > On Fri, 25 Dec 2009 15:29:53 +0100, patpro wrote: > >> >> On Wed, 23 Dec 2009 14:56:18 -0600, Barry Pederson wrote: >>> "...there's virtually no overhead at 
all due to the copy-on-write
>>> architecture. In fact, sometimes it is faster to take a snapshot rather
>>> than free the blocks containing the old data!"
>>>
>>> That's certainly not the case with UFS snapshots, which can take a long
>>> time to complete (we're talking freezing your machine's disk activity
>>> for many minutes), and are limited to 20 total.
>>
>>
>> UFS uses copy on write. But you say many minutes to complete? Don't you
>> speak about dump(1), that uses snapshot as a basis to dump a live file
>> system?
>> I agree, UFS snapshot creation is not lightning-fast, but many minutes
>> seems a lot to me, and I never experienced such a long creation time.
>
> As far as I know UFS snapshots need to create a list of currently in use
> blocks. This is O(n) on the size of the FS and pauses the FS during the
> snapshot. On large FS's this can take a long time.
> ZFS always maintains this list so it only needs to mark this list as readonly
> to create a snapshot. This is O(1).
>

This is not quite right. It's the copying of the cg (cylinder group) blocks that takes so long. A cg block is limited in size to one filesystem block, which means that on very large drives there are quite a lot of them. When we create a snapshot we first make a copy of all cg blocks, then we suspend the filesystem and sync it, and then we copy all dirtied cg blocks and unsuspend. We seem to be copying an excessive number of cg blocks once suspended, so there may be a bug there.

A relatively straightforward improvement would be to also COW the cg blocks rather than copying them in a separate step. There's no reason the COW snapshot mechanism can't be fast in theory; it's just a matter of the practical implementation.

Thanks,
Jeff

> Ronald.
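To put a rough number on "quite a lot of them", here is a back-of-the-envelope sketch. It assumes the cg free map uses one bit per fragment and must fit in a single filesystem block, and it assumes a 16 KB block / 2 KB fragment layout; both the function names and those sizes are illustrative assumptions, not values taken from this thread. Because the cg block also holds a header and other maps, real cylinder groups are smaller than this bound, so the real count is higher.

```python
def max_cg_bytes(block_size, frag_size):
    """Upper bound on the data covered by one cylinder group.

    Assumption: one bit per fragment, and the whole map fits in one
    filesystem block. Real cgs are smaller, since the same block also
    carries a header and other maps.
    """
    return block_size * 8 * frag_size


def min_cg_count(fs_bytes, block_size=16384, frag_size=2048):
    """Lower bound on the number of cg blocks a snapshot has to copy."""
    cg = max_cg_bytes(block_size, frag_size)
    return -(-fs_bytes // cg)  # ceiling division


GIB = 2 ** 30
print(min_cg_count(400 * GIB))   # a 400 GB drive, as mentioned in the thread
print(min_cg_count(2048 * GIB))  # a 2 TB drive
```

Under these assumptions a 400 GB filesystem has at least 1600 cg blocks to copy, and the count grows linearly with filesystem size, which is consistent with snapshot creation time scaling with drive size.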
From owner-freebsd-fs@FreeBSD.ORG Fri Jan 1 23:36:55 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 477D4106568F for ; Fri, 1 Jan 2010 23:36:55 +0000 (UTC) (envelope-from marco@tols.org) Received: from tols.org (goofy.tols.org [83.163.60.200]) by mx1.freebsd.org (Postfix) with ESMTP id AFA978FC1E for ; Fri, 1 Jan 2010 23:36:54 +0000 (UTC) Received: from donald.home.tols.org (localhost [127.0.0.1]) by donald.home.tols.org (8.14.3/8.14.3) with ESMTP id o01NaqAA001302 for ; Fri, 1 Jan 2010 23:36:52 GMT (envelope-from marco@donald.home.tols.org) Received: (from marco@localhost) by donald.home.tols.org (8.14.3/8.14.3/Submit) id o01NaqlR001301 for freebsd-fs@freebsd.org; Sat, 2 Jan 2010 00:36:52 +0100 (CET) (envelope-from marco) Date: Sat, 2 Jan 2010 00:36:52 +0100 From: Marco van Tol To: freebsd-fs@freebsd.org Message-ID: <20100101233652.GA1111@donald.home.tols.org> Mail-Followup-To: freebsd-fs@freebsd.org References: <20091228225228.GA1114@donald.home.tols.org> <20091229012443.GF43739@cicely7.cicely.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20091229012443.GF43739@cicely7.cicely.de> User-Agent: Mutt/1.4.2.3i Subject: Re: zfs sharenfs to multiple subnets - found a dirty looking hack X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Jan 2010 23:36:55 -0000 On Tue, Dec 29, 2009 at 02:24:44AM +0100, Bernd Walter wrote: > On Mon, Dec 28, 2009 at 11:52:28PM +0100, Marco van Tol wrote: > > Hi there, > > > > I would like to refer 
to a thread in this list about zfs exporting to > > multiple subnets using sharenfs. The thread I mean is this one: > > http://lists.freebsd.org/pipermail/freebsd-fs/2008-September/005158.html > > I wasn't subscribed at the time, so I'm just referencing to the thread. > > > > I was testing and needed to also export a filesystem to multiple subnets, > > and found something out that may or may not be allowed. > > > > What happens is you start to type > > zfs set sharenfs=" > > and don't close the double quote. The result on the following lines will > > literally make it to /etc/zfs/exports, and make it work as desired. > > > > A full session would look like: > > (Bear with me for typo's, I didn't copy-paste) > > > > zfs set sharenfs="-maproot=root -network 10.0.0.0/24 > > > /path/to/mountpoint -maproot=root -network 192.168.0.0/24 > > > /path/to/mountpoint -maproot=root -network 172.16.0.0/24" pool0/space > > > > This translates to an /etc/zfs/exports like: > > ----< cut here >---- > > /path/to/mountpoint -maproot=root -network=10.0.0.0/24 > > /path/to/mountpoint -maproot=root -network=192.168.0.0/24 > > /path/to/mountpoint -maproot=root -network=172.16.0.0/24 > > ----< cut here >---- > > > > The resulting "zfs get sharenfs" looks like: > > ----< cut here >---- > > pool0/space sharenfs -maproot=root -network=10.0.0.0/24 > > /path/to/mountpoint -maproot=root -network=192.168.0.0/24 > > /path/to/mountpoint -maproot=root -network=172.16.0.0/24 local > > ----< cut here >---- > > > > This all makes it work so that it exports the pool to multiple subnets, > > possibly with their own properties. > > > > Question is however, how desirable is it that this works? ;-) > > The really cool thing about using zfs property instead of manual > exports line is the ability to inherit export options, but since > you need to write the path into the argument... True. 
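A minimal sketch of how the multi-line property value in the session above is put together. It assumes, as the quoted /etc/zfs/exports output suggests, that the exports writer simply prepends the mountpoint to the raw property value, so only the second and later lines need the path spelled out. The helper names are hypothetical, and nothing here invokes zfs itself.

```python
def build_sharenfs_value(mountpoint, option_sets):
    """Build the multi-line sharenfs string used by the hack.

    The first option set carries no path, because the mountpoint is
    assumed to be prepended when the exports file is written. Every
    later line must name the mountpoint explicitly.
    """
    if not option_sets:
        raise ValueError("need at least one option set")
    lines = [option_sets[0]]
    lines += [f"{mountpoint} {opts}" for opts in option_sets[1:]]
    return "\n".join(lines)


def expected_exports(mountpoint, sharenfs_value):
    """Model of the resulting exports text: mountpoint + raw value."""
    return f"{mountpoint} {sharenfs_value}"


value = build_sharenfs_value(
    "/path/to/mountpoint",
    [
        "-maproot=root -network 10.0.0.0/24",
        "-maproot=root -network 192.168.0.0/24",
        "-maproot=root -network 172.16.0.0/24",
    ],
)
print(expected_exports("/path/to/mountpoint", value))
```

Generating the value from a single mountpoint variable avoids retyping the path on each line by hand. The resulting string would then be passed as the value of `zfs set sharenfs=... pool0/space`, exactly as in the session above.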
> It would be nice however if the export-file creator would parse the > newline and automatically prepend the path to each line. Mm well, to be really honest, it would be nice to be able to export to multiple subnets through the use of zfs properties in a documented and supported way. :-) For all those experimenting with the hack I wrote above, make sure you write the /path/to/mountpoint without typos. If you get it wrong, for starters it won't have the desired effect, and second, if you change the property again your /etc/zfs/exports file is left in a broken state. Just thought I should mention it. ;-) Marco -- Nothing takes the past away like the future - Madonna in Nothing Really Matters From owner-freebsd-fs@FreeBSD.ORG Sat Jan 2 06:21:44 2010 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6FAA9106566B for ; Sat, 2 Jan 2010 06:21:44 +0000 (UTC) (envelope-from mckusick@mckusick.com) Received: from chez.mckusick.com (chez.mckusick.com [64.81.247.49]) by mx1.freebsd.org (Postfix) with ESMTP id 304AB8FC1B for ; Sat, 2 Jan 2010 06:21:44 +0000 (UTC) Received: from chez.mckusick.com (localhost [127.0.0.1]) by chez.mckusick.com (8.14.3/8.14.3) with ESMTP id o026LXac038508; Fri, 1 Jan 2010 22:21:33 -0800 (PST) (envelope-from mckusick@chez.mckusick.com) Message-Id: <201001020621.o026LXac038508@chez.mckusick.com> To: Ben Schumacher In-reply-to: Date: Fri, 01 Jan 2010 22:21:33 -0800 From: Kirk McKusick Cc: freebsd-fs@freebsd.org, Jeff Roberson Subject: Re: snapshot implementation X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 02 Jan 2010 06:21:44 -0000 > Date: Fri, 1 Jan 2010 13:19:33 -1000 (HST) > From: Jeff Roberson > To: Ben Schumacher > Cc: freebsd-fs@freebsd.org > Subject: Re: snapshot implementation > > On Tue, 29 Dec 2009, 
Ben Schumacher wrote: > > > I've recently picked up a copy of "The Design and Implementation of > > the FreeBSD OS", so I'm starting there, but I would love it if anybody > > could toss me a hint or two on what some of the low-hanging fruit in > > the arena might be. I've been playing with ZFS on a few boxes now, but > > I've had (even with FreeBSD 8) enough unusual crashes that I'm > > personally not ready to commit to using it on at least one "mission > > critical" project I'm working on. That being said I'd love to be able > > to do snapshots on the box without it hanging for over an hour due to > > the fact that the data drive is >400GB (frankly on the small side for > > some of the storage applications I've read about on this mailing > > list). > > > > Any hints, tips, pointers would be appreciated. > > > > Cheers, > > Ben > > The daemon book is a good start. I'd say the snapshot problem might be a > bit tough right out of the gate but it could be possible if you have > strong fundamentals and someone experienced mentors you. > > Why don't you read a bit of the daemon book and see if you can follow the > existing snapshot code to understand how it works. Once you feel like you > have a good graps of that email me directly and we'll talk. Kirk and I > discussed ways that we could speed it up dramatically by doing > copy-on-write of the cgs in the allocation functions. This would take > care of the considerable delay that is incurred when making a snapshot. > > One good first project that would introduce you somewhat to the process of > kernel programming would be to add timing instrumentation to the various > stages of building a snapshot. This way you can prove where the delay > happens. Then you would be familiar with using timers, building custom > kernels, and a little bit of the snapshot code. > > Cheers, > Jeff I would love to see the speedup work done on taking snapshots, and Jeff's suggested approach to getting started on that project is excellent. 
However, it is not a simple or easy project, so you might find yourself getting frustrated. But if you feel up to the task, please do try it out.

An alternative and somewhat simpler project, which I would also like to see done, would be to take a set of snapshots and, instead of presenting them as a separate tree, provide them in a .snapshot directory located in each directory of the live filesystem (as done by Network Appliance and, I believe, ZFS). The project is simpler because most of the code already exists in the union filesystem. It would start from the existing union filesystem code, using it to place each snapshot "under" the live filesystem, and then adjust the name lookup function so that each file appears in a .snapshot subdirectory under each live directory rather than in the live directory itself (where it would be obscured by the live file of the same name).

Kirk McKusick
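The name-lookup change described above can be illustrated with a small user-space sketch. This is not kernel or unionfs code. It only models the path translation, and it assumes a layout of `<dir>/.snapshot/<snapshot name>/<file>`, which the message does not spell out. The function name, the `snapshot_roots` mapping and the example paths are all hypothetical.

```python
from pathlib import PurePosixPath

SNAPDIR = ".snapshot"


def resolve(path, live_root, snapshot_roots):
    """Translate a path that may pass through a .snapshot directory.

    snapshot_roots maps a snapshot name to the directory where that
    snapshot's tree is reachable (a hypothetical arrangement).
    Paths without a .snapshot component refer to the live file.
    """
    rel = PurePosixPath(path).relative_to(live_root)
    parts = rel.parts
    if SNAPDIR not in parts:
        return PurePosixPath(path)  # ordinary live file
    i = parts.index(SNAPDIR)
    if i + 1 >= len(parts):
        raise FileNotFoundError("path names the .snapshot directory itself")
    name = parts[i + 1]
    if name not in snapshot_roots:
        raise FileNotFoundError(f"no such snapshot: {name}")
    # Same directory and file name, but looked up in the snapshot's tree.
    return PurePosixPath(snapshot_roots[name]).joinpath(*parts[:i], *parts[i + 2:])


roots = {"daily.0": "/snap/daily.0"}
print(resolve("/data/home/ben/.snapshot/daily.0/notes.txt", "/data", roots))
print(resolve("/data/home/ben/notes.txt", "/data", roots))
```

The point of the sketch is that the live file and its snapshot copy share the same relative path, so a lookup that sees `.snapshot` only needs to switch trees and continue with the remaining components.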