From: Per von Zweigbergk
Date: Tue, 31 May 2011 22:42:37 +0200
To: freebsd-fs@freebsd.org
Subject: Storing revisions of large files using ZFS snapshots

I'm currently looking at the option of using a FreeBSD server using ZFS
to store offsite backups.

The primary backup product used (Veeam Backup & Replication) stores its
backups in what's called reverse-incremental mode.
Basically, this means storing backups as a huge VBK file (one for each
job) containing a deduplicated and compressed dump of all the virtual
machine files being backed up. The system will also store what are known
as "reverse incrementals", i.e. anything it overwrites on a backup pass
will be preserved in a file, similar to a traditional incremental
backup, except in the other direction.

Since this product does not have any real solution for offsite backup
replication, after considering a few different options, I'm seriously
considering using a combination of ZFS snapshots and rsync.

Basically what would happen is that every night after the backup
completes, rsync would be run, transferring only the differences between
the new synthetic full backup and the copy from the previous day.
Historic copies of the full backup images as synchronized by rsync would
be kept using ZFS snapshots. After our retention window closes, I'd just
nuke the oldest snapshots from the server.

We're talking about a file that's around 1 TB even after the backup
software does its own inline compression and deduplication (and it is
likely to grow as our environment grows), which is kind of impractical
to send in its entirety even over our current 100 Mbit/s leased line to
our datacenter.

First of all, will ZFS do copy-on-write on a block level when it comes
to snapshots, or is copy-on-write on ZFS snapshots done on a whole-file
level? It would seem that block-level COW would be required for this to
even have a chance of working. Please note that I'm not talking about
deduplication in ZFS itself, but rather using snapshots as a means to
perform a crude kind of deduplication.

Second, are there any other caveats that I'm likely to run into as I go
down this path for storing backups?

Obviously, I'd prefer just trucking over plain old incremental backups,
and doing a consolidation job off-site, but the backup software doesn't
have any image management software that could consolidate a full backup
plus its incrementals into a synthetic full backup. It'll only do it as
part of a backup job. Grmbl. But then I wouldn't get to play with the
idea of actually storing full backup images for every restore point
using filesystem level snapshots. :)
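
To make the nightly cycle described above concrete, here is a rough
sketch in Python. It only builds the commands (rsync, then zfs snapshot,
then zfs destroy for anything past retention) rather than running them.
The dataset name, source path, snapshot prefix and 30-day retention
window are all placeholders made up for illustration, and the use of
rsync's --inplace option is my assumption about what this scheme would
need, not something confirmed for this setup.

```python
from datetime import date, timedelta

# Hypothetical names, invented for this sketch only.
DATASET = "tank/veeam"
SOURCE = "backuphost:/backups/job1.vbk"
DEST_DIR = "/tank/veeam/"
PREFIX = "nightly-"


def rsync_command(source, dest_dir):
    """Build the rsync invocation for the nightly transfer.

    Assumption: --inplace is wanted so that rsync overwrites only the
    changed regions of the existing destination file. By default rsync
    writes a new copy and renames it over the old one, which would
    rewrite every block and leave nothing shared with older snapshots.
    """
    return ["rsync", "-a", "--inplace", source, dest_dir]


def snapshot_name(dataset, day):
    """Name a snapshot after the day it was taken, e.g. pool/fs@nightly-2011-05-31."""
    return f"{dataset}@{PREFIX}{day.isoformat()}"


def expired_snapshots(existing, dataset, today, retention_days):
    """Return nightly snapshots of `dataset` older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    head = f"{dataset}@{PREFIX}"
    expired = []
    for name in existing:
        if not name.startswith(head):
            continue  # not one of ours, leave it alone
        try:
            taken = date.fromisoformat(name[len(head):])
        except ValueError:
            continue  # suffix is not a date, leave it alone
        if taken < cutoff:
            expired.append(name)
    # ISO dates sort chronologically, so this is oldest first.
    return sorted(expired)


def nightly_plan(existing, today, retention_days=30):
    """List the commands for one night: sync, snapshot, prune."""
    plan = [
        rsync_command(SOURCE, DEST_DIR),
        ["zfs", "snapshot", snapshot_name(DATASET, today)],
    ]
    for name in expired_snapshots(existing, DATASET, today, retention_days):
        plan.append(["zfs", "destroy", name])
    return plan


if __name__ == "__main__":
    snapshots = ["tank/veeam@nightly-2011-04-01", "tank/veeam@nightly-2011-05-30"]
    for command in nightly_plan(snapshots, date(2011, 5, 31)):
        print(" ".join(command))
```

The snapshot is taken after rsync finishes, so each snapshot holds one
complete synthetic full image. Pruning goes by the date in the snapshot
name, so snapshots made by hand are never touched.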