From: vogelke+unix@pobox.com (Karl Vogel)
To: Kelly Jones
Cc: freebsd-questions@freebsd.org
Subject: Re: Backing up FreeBSD and other Unix systems securely
Date: Mon, 18 May 2009 14:38:27 -0400 (EDT)
Organization: Oasis Systems Inc.
Message-Id: <20090518183829.D0E7BBEBB@kev.msw.wpafb.af.mil>
In-reply-to: <26face530905170912m3ca8b762nd0cfadc7db34da6f@mail.gmail.com>
    (message from Kelly Jones on Sun, 17 May 2009 09:12:57 -0700)

>> On Sun, 17 May 2009 09:12:57 -0700,
>> Kelly Jones said:

K> I like this plan because it does versioned backups, and doesn't backup
K> identical files twice.
K> I dislike it because I lose Mozy's unlimited disk space.
K>
K> % Is there software that already does this?

I have a 3-Tbyte server running FreeBSD-6.1 that does something very
similar. I don't bother with encrypting the filenames or hashes because
we control the box, and if I'm not at work, other admins might need to
restore something quickly.

We have around 3.7 million files from 5 other servers backed up under
two 1.5-Tbyte filesystems, /mir01 and /mir02. My setup looks like this:

   +-----mir01
   |     +-----HASH
   |     |     +-----00
   |     |     |     +-----00
   |     |     |     +-----01
   ...
   |     |     +-----01
   ...
   |     |     +-----fe
   |     |     +-----ff
   |     +-----server1
   |     +-----server2
   +-----mir02
   |     +-----HASH
   |     +-----server3
   |     +-----server4
   |     +-----server5

The HASH directories have two levels of subdirectories 00-ff. That's
been more than sufficient to keep directories from getting too big; I
average around 25 files per directory.

I do hourly backups on the other fileservers using something like the
find and timestamp method you mentioned, but I ignore 0-length files
because they always hash to the same value. The backup directories for
the second fileserver look like this for 5 May 2009:

   +-----mir01
   |     +-----server2
   |     |     +-----2009
   |     |     |     +-----0505
   |     |     |     |     +-----070700
   |     |     |     |     |     +-----doc   (filesystem)
   |     |     |     |     |     +-----home
   |     |     |     |     +-----080700
   |     |     |     |     |     +-----doc
   |     |     |     |     |     +-----home
   ...
   |     |     |     |     +-----190700
   |     |     |     |     |     +-----home

After the backups are rsynced to the backup server, I find any regular
files with only one link, compute the RMD160 hash of the contents, and
make a hardlink to the appropriate filename under the HASH directory.
People love to make copies of copies of files, so this really cuts down
on the disk space used.

The hardlinks make it easy to avoid restoring things that aren't what
the user had in mind; if a file's been corrupted, I can tell when it
happened just by looking at the inode, so I don't restore an earlier
version that's also junk.
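
The hash-and-hardlink pass described above could be sketched roughly as
follows. This is an illustration under stated assumptions, not the
actual scripts: the function names (hash_path, file_digest, dedup_tree)
and the ".dedup-tmp" suffix are made up for the example, and SHA-256
from Python's hashlib stands in for RMD160, since RIPEMD-160 support in
hashlib depends on how the local OpenSSL was built. The hash store has
to live on the same filesystem as the backup tree, because hardlinks
cannot cross filesystems.

```python
import hashlib
import os
import stat


def hash_path(hash_root, digest):
    """Map a hex digest to HASH/xx/yy/digest (two levels of 00-ff)."""
    return os.path.join(hash_root, digest[:2], digest[2:4], digest)


def file_digest(path, chunk_size=1 << 20):
    """Hex digest of a file's contents (SHA-256 standing in for RMD160)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dedup_tree(backup_dir, hash_root):
    """Link single-link, non-empty regular files into the hash store.

    Returns how many files were replaced by a link to an existing copy.
    """
    replaced = 0
    for dirpath, _dirnames, filenames in os.walk(backup_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Only regular files with exactly one link; skip 0-length
            # files, which would all hash to the same value.
            if not stat.S_ISREG(st.st_mode):
                continue
            if st.st_nlink != 1 or st.st_size == 0:
                continue
            target = hash_path(hash_root, file_digest(path))
            if os.path.exists(target):
                # Contents already stored: swap this copy for a hardlink.
                tmp = path + ".dedup-tmp"  # hypothetical temp name
                os.link(target, tmp)
                os.replace(tmp, path)
                replaced += 1
            else:
                # First time these contents are seen: register them.
                os.makedirs(os.path.dirname(target), exist_ok=True)
                os.link(path, target)
    return replaced
```

Calling something like dedup_tree("/mir01/server2/2009/0505/070700",
"/mir01/HASH") after each rsync would leave every distinct file body
stored once, with each backup path and its HASH entry sharing one inode.
A second run over the same tree does nothing, because every file it
would touch already has more than one link.
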
I can also tell if there were duplicates anywhere on the fileserver at
the time the user lost the good version; it's a lot faster for them to
get a known good copy from somewhere else on the fileserver than it is
to restore over the network.

The software is just a few scripts to do things like find files with
just one link, compute hashes, do hardlinks, etc. I can put up a tarball
if anyone's interested.

-- 
Karl Vogel                      I don't speak for the USAF or my company

The best way for the Government to maintain its credit is to pay as it
goes--not by resorting to loans, but by keeping out of debt--through an
adequate income secured by a system of taxation, external or internal,
or both.   --Pres. William McKinley's First Inaugural Address, March 4, 1897