From: Bob Friesenhahn
Date: Tue, 31 May 2011 17:57:52 -0500 (CDT)
To: Per von Zweigbergk
Cc: freebsd-fs@freebsd.org
Subject: Re: Storing revisions of large files using ZFS snapshots

On Tue, 31 May 2011, Per von Zweigbergk wrote:
>
> Basically what would happen is that every night after the backup
> completes, rsync would be run, synchronizing over the differences
> between the synthetic full backup from the previous day. Historic
> copies of the full backup images as synchronized by rsync would be
> kept using ZFS snapshots. After our retention window closes, I'd
> just nuke the oldest snapshots from the server.

This is feasible using rsync's "--inplace --no-whole-file" options,
and is what I use as part of my daily backup strategy so that each
backup cycle only consumes disk space for the changed blocks. It
works fantastically.

There is of course some risk associated with these rsync options,
since if rsync fails, the backup file may be only partially updated.
You can retry the rsync a number of times during the backup interval
if it fails. If the rsync previously failed (or perhaps always), it
is necessary to also supply the --checksum option so that rsync
verifies that the file content is correct. This option is expensive
since it forces all blocks to be read on both sides.

> First of all, will ZFS do copy-on-write on a block level when
> it comes to snapshots, or is copy-on-write on ZFS snapshots done on
> a whole-file level? It would seem that block-level COW would be
> required for this to even have a chance of working. Please note that
> I'm not talking about deduplication in ZFS itself, but rather using
> snapshots as a means to perform a crude kind of deduplication.

ZFS does copy-on-write for file blocks when the file is updated, so
only the blocks that rsync rewrites consume new space. There is
surely some copy-on-write of ZFS metadata when a snapshot is taken,
but vastly less than the amount of data consumed by the file.
Remember that filesystem volumes look like files and they snapshot
nicely.

Bob
-- 
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
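
For anyone who wants to script the nightly cycle described above, here
is a minimal sketch in Python. It is only an illustration under stated
assumptions, not my actual backup script: the function names, the "-a"
flag, the retry count, and the date-based snapshot name are all made
up for the example. Only "--inplace", "--no-whole-file", "--checksum"
and "zfs snapshot" come from the discussion. It also only remembers a
failure within a single run; carrying that state over to the next
night is left out.

```python
import subprocess
from datetime import date


def rsync_command(src, dest, verify=False):
    """Build the rsync argv for an in-place, delta-only update of dest."""
    # "-a" is an assumption for this sketch; the other two flags are the
    # ones that keep unchanged blocks untouched on the ZFS side.
    cmd = ["rsync", "-a", "--inplace", "--no-whole-file"]
    if verify:
        # Expensive: forces all blocks to be read on both sides.
        cmd.append("--checksum")
    return cmd + [src, dest]


def snapshot_command(dataset, day):
    """Build the zfs argv that snapshots dataset under a date-based name."""
    # The naming scheme (dataset@YYYY-MM-DD) is hypothetical.
    return ["zfs", "snapshot", f"{dataset}@{day.isoformat()}"]


def nightly_backup(src, dest, dataset, attempts=3, run=subprocess.call):
    """Retry rsync up to `attempts` times; snapshot only after a success."""
    failed_before = False
    for _ in range(attempts):
        # After a failed run the destination may be partially updated,
        # so every retry adds --checksum to re-verify the content.
        if run(rsync_command(src, dest, verify=failed_before)) == 0:
            return run(snapshot_command(dataset, date.today())) == 0
        failed_before = True
    return False
```

Calling nightly_backup("/backups/full/", "/tank/backup/", "tank/backup")
would run rsync and then snapshot the dataset; those paths and the
dataset name are placeholders. The "run" parameter exists so the
command sequence can be checked without touching a real pool.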