Date:      Mon, 18 May 2009 14:38:27 -0400 (EDT)
From:      vogelke+unix@pobox.com (Karl Vogel)
To:        Kelly Jones <kelly.terry.jones@gmail.com>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: Backing up FreeBSD and other Unix systems securely
Message-ID:  <20090518183829.D0E7BBEBB@kev.msw.wpafb.af.mil>
In-Reply-To: <26face530905170912m3ca8b762nd0cfadc7db34da6f@mail.gmail.com> (message from Kelly Jones on Sun, 17 May 2009 09:12:57 -0700)

>> On Sun, 17 May 2009 09:12:57 -0700, 
>> Kelly Jones <kelly.terry.jones@gmail.com> said:

K> I like this plan because it does versioned backups, and doesn't back up
K> identical files twice. I dislike it because I lose Mozy's unlimited disk
K> space.

K> Is there software that already does this?

   I have a 3-Tbyte server running FreeBSD-6.1 that does something very
   similar.  I don't bother with encrypting the filenames or hashes
   because we control the box, and if I'm not at work, other admins
   might need to restore something quickly.

   We have around 3.7 million files from 5 other servers backed up
   under two 1.5-Tbyte filesystems, /mir01 and /mir02.  My setup looks
   like this:

     +-----mir01
     |      +-----HASH
     |      |      +-----00
     |      |      |      +-----00
     |      |      |      +-----01
                          ...
     |      |      +-----01
                   ...
     |      |      +-----fe
     |      |      +-----ff
     |      +-----server1
     |      +-----server2
     +-----mir02
     |      +-----HASH
     |      +-----server3
     |      +-----server4
     |      +-----server5

   Each HASH directory has two levels of subdirectories, 00 through ff.
   That's been more than sufficient to keep directories from getting
   too big; I average around 25 files per directory.
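   A minimal sketch of how a digest could map onto that two-level tree.
   The message only says there are two levels of 00-ff subdirectories;
   using the full hex digest as the leaf filename and its first four hex
   digits as the directory names is an assumption, and `hash_path` is a
   hypothetical helper name.

```python
import os

# 256 first-level x 256 second-level directories (00-ff at each level).
LEAF_DIRS = 256 * 256


def hash_path(root: str, digest: str) -> str:
    """Return <root>/HASH/<d[0:2]>/<d[2:4]>/<digest> for a hex digest.

    Hypothetical naming scheme: first two hex digits pick the first-level
    directory, the next two pick the second level, and the leaf filename
    is the whole digest.
    """
    d = digest.lower()
    if len(d) < 4:
        raise ValueError("digest too short to pick two directory levels")
    return os.path.join(root, "HASH", d[0:2], d[2:4], d)
```

   For example, `hash_path("/mir01", "abcd1234")` gives
   `/mir01/HASH/ab/cd/abcd1234`. With 65,536 leaf directories per tree,
   even a million unique files would average only about 15 per directory.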

   I do hourly backups on the other fileservers using something like the
   find and timestamp method you mentioned, but I ignore 0-length files
   because they always hash to the same value.  The backup directories
   for the second fileserver look like this for 5 May 2009:

     +-----mir01
     |      +-----server2
     |      |      +-----2009
     |      |      |      +-----0505
     |      |      |      |      +-----070700
     |      |      |      |      |      +-----doc      (filesystem)
     |      |      |      |      |      +-----home
     |      |      |      |      +-----080700
     |      |      |      |      |      +-----doc
     |      |      |      |      |      +-----home
     ...
     |      |      |      |      +-----190700
     |      |      |      |      |      +-----home
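   A rough sketch of the two pieces above: building the
   `server/YYYY/MMDD/HHMMSS` snapshot path, and picking up regular files
   changed since the last run while skipping empty ones. The function names
   are hypothetical, and the walk only approximates
   `find TOP -type f -newer STAMP ! -empty`; the actual scripts aren't shown
   in the message.

```python
import os
import stat
from datetime import datetime
from typing import Iterator


def snapshot_dir(mirror: str, server: str, when: datetime) -> str:
    """Build <mirror>/<server>/YYYY/MMDD/HHMMSS, matching the tree above."""
    return os.path.join(mirror, server, when.strftime("%Y"),
                        when.strftime("%m%d"), when.strftime("%H%M%S"))


def changed_files(top: str, since: float) -> Iterator[str]:
    """Yield non-empty regular files under `top` with mtime after `since`.

    Zero-length files are skipped: every empty file produces the same
    digest, so hashing and linking them gains nothing.
    """
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: don't follow symlinks
            except FileNotFoundError:
                continue  # file vanished between listing and stat
            if stat.S_ISREG(st.st_mode) and st.st_size > 0 and st.st_mtime > since:
                yield path
```

   In practice `since` would be the mtime of a timestamp file touched at
   the end of the previous hourly run.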

   After the backups are rsynced to the backup server, I find any regular
   files with only one link, compute the RMD160 (RIPEMD-160) hash of the
   contents, and make a hardlink to the appropriate filename under the HASH
   directory. People love to make copies of copies of files, so this really
   cuts down on the disk space used.
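   A minimal sketch of that pass, not the author's scripts: it finds
   single-link, non-empty regular files, hashes them, and either links the
   file into HASH (first copy) or replaces it with a link to the existing
   HASH entry (duplicate). It uses RIPEMD-160 when the local
   Python/OpenSSL build offers it and falls back to SHA-256 otherwise; that
   fallback, the leaf naming, the owner/mode check, and all function names
   are assumptions for illustration.

```python
import hashlib
import os
import stat


def new_hasher():
    """RIPEMD-160 if available; otherwise SHA-256 (a substitution)."""
    try:
        return hashlib.new("ripemd160")
    except ValueError:  # OpenSSL 3 builds may lack ripemd160
        return hashlib.sha256()


def file_digest(path: str, bufsize: int = 1 << 20) -> str:
    """Hex digest of a file's contents, read in chunks."""
    h = new_hasher()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()


def hash_path(root: str, digest: str) -> str:
    """Hypothetical layout: <root>/HASH/xx/yy/<digest>."""
    return os.path.join(root, "HASH", digest[0:2], digest[2:4], digest)


def link_into_hash(root: str, top: str) -> int:
    """Register single-link files under `top` in <root>/HASH.

    Returns the number of files replaced by a link to an existing copy.
    `root` and `top` must be on one filesystem (hard links can't cross).
    """
    hash_dir = os.path.normpath(os.path.join(root, "HASH"))
    replaced = 0
    for dirpath, dirnames, filenames in os.walk(top):
        # Never descend into the HASH tree itself.
        dirnames[:] = [d for d in dirnames
                       if os.path.normpath(os.path.join(dirpath, d)) != hash_dir]
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Only regular, non-empty files that nothing else links to yet.
            if not stat.S_ISREG(st.st_mode) or st.st_nlink != 1 or st.st_size == 0:
                continue
            target = hash_path(root, file_digest(path))
            os.makedirs(os.path.dirname(target), exist_ok=True)
            if not os.path.exists(target):
                os.link(path, target)  # first copy becomes the HASH entry
                continue
            t = os.stat(target)
            if (t.st_uid, t.st_gid, t.st_mode) != (st.st_uid, st.st_gid, st.st_mode):
                continue  # same bytes but different owner/mode: keep separate
            tmp = path + ".dedup-tmp"
            os.link(target, tmp)
            os.replace(tmp, path)  # rename() swaps it in atomically on POSIX
            replaced += 1
    return replaced
```

   Trusting the digest alone means a collision would silently merge two
   different files; a more cautious version could call
   `filecmp.cmp(path, target, shallow=False)` before linking. Hard links
   also share mtime, so a replaced path takes on the HASH entry's timestamp.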

   The hardlinks make it easy to avoid restoring things that aren't what
   the user had in mind; if a file's been corrupted, I can tell when it
   happened just by looking at the inode, so I don't restore an earlier
   version that's also junk.  I can also tell if there were duplicates
   anywhere on the fileserver at the time the user lost the good version;
   it's a lot faster for them to get a known good copy from somewhere
   else on the fileserver than it is to restore over the network.
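   Every hourly snapshot that held one version of a file links to the same
   inode, so listing those links shows when that version existed. A
   minimal sketch, assuming the hypothetical `.../YYYY/MMDD/HHMMSS/...`
   layout shown earlier; `same_inode_paths` and `snapshot_times` are
   made-up names, and the walk does roughly what `find TOP -xdev -inum N`
   would.

```python
import os
import re
from typing import List

# Matches the /YYYY/MMDD/HHMMSS/ part of a snapshot path (assumed layout).
SNAP_RE = re.compile(r"/(\d{4})/(\d{4})/(\d{6})/")


def same_inode_paths(path: str, top: str) -> List[str]:
    """All paths under `top` that are hard links to the same inode as `path`."""
    ref = os.stat(path)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            p = os.path.join(dirpath, name)
            try:
                st = os.lstat(p)
            except FileNotFoundError:
                continue
            # Inode numbers are only unique per device, so compare both.
            if (st.st_dev, st.st_ino) == (ref.st_dev, ref.st_ino):
                hits.append(p)
    return sorted(hits)


def snapshot_times(paths: List[str]) -> List[str]:
    """Extract YYYY-MMDD-HHMMSS stamps from snapshot paths, in time order.

    Paths outside the snapshot layout (such as the HASH entry) are ignored.
    """
    stamps = set()
    for p in paths:
        m = SNAP_RE.search(p)
        if m:
            stamps.add("-".join(m.groups()))
    return sorted(stamps)
```

   Running this against the last good copy shows the range of hours it
   survived unchanged; the first snapshot after that range is where the
   damaged version appeared.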

   The software is just a few scripts to do things like find files with
   just one link, compute hashes, do hardlinks, etc.  I can put up a tarball
   if anyone's interested.

-- 
Karl Vogel                      I don't speak for the USAF or my company

The best way for the Government to maintain its credit is to pay as it
goes -- not by resorting to loans, but by keeping out of debt -- through an
adequate income secured by a system of taxation, external or internal,
or both.  --Pres. William McKinley's First Inaugural Address, March 4, 1897
