Date:      Tue, 15 Jan 2008 14:00:22 +0100
From:      Johan Ström <johan@stromnet.se>
To:        Jeremy Chadwick <koitsu@FreeBSD.org>
Cc:        emj@emj.se, freebsd-stable@freebsd.org
Subject:   Re: Backup solution suggestions
Message-ID:  <06DAF546-AB57-4B1D-89CE-3DCF66678A2C@stromnet.se>
In-Reply-To: <20080115123424.GA7259@eos.sc1.parodius.com>
References:  <E6BCC509-6CC8-44F1-98C2-416920A52218@stromnet.se> <20080115123424.GA7259@eos.sc1.parodius.com>

First of all, thanks for your extensive answer!

On Jan 15, 2008, at 13:34, Jeremy Chadwick wrote:

> On Tue, Jan 15, 2008 at 10:52:56AM +0100, Johan Ström wrote:
>> I'm looking to invest in some new hardware for backup, probably some
>> kind of NAS (a 4-disk 1U NAS or something in that size). The thing is
>> that I won't be the only one with access to this box, thus I would
>> like to secure my data.
>
> In my experience, your best bet when it comes to backups like what you
> want (1U box with 4 disks, or a 2U box with 8 or more) is to simply buy
> a server with the specifications you want, and run FreeBSD on it.  I
> cannot recommend commercial products for something of this "scale"
> (e.g. small/medium).
>
> I could list off all the reasons why [as a small hosting provider] I
> avoid proprietary backup solutions, but the list is quite long.  The
> two main reasons:
>
> 1) Proprietary solutions often use proprietary hardware.  How do you
> know what's inside of that mystery box?  What if it uses a SATA
> controller you know has h/w-level bugs in it?  What if something in the
> device fails; are you going to be charged an arm and a leg for a
> replacement part?  Does it even HAVE user-serviceable parts?  etc...
>
> I feel much more confident relying on hardware that I'm familiar with,
> e.g. I know what motherboard is in the server I buy or build, I know
> who makes it, I know if it's compatible with FreeBSD or Linux, I know
> the SATA controller works and isn't flaky, I know the SATA backplane
> actually works properly and supports hot-swapping, and I know if I need
> replacement parts I can get them promptly.  Also, if the h/w I buy
> turns out to have compatibility problems or performance issues, I can
> always return it, get my money back, and try other h/w; with a
> proprietary solution you're "stuck with it", and if something's broken
> about it which the vendor can't/won't fix, you're screwed.
>
> 2) Proprietary solutions also mean proprietary software.  This is
> pretty much guaranteed regardless of what h/w is used.  What if the
> volume manager used for your array has a bug and your data is corrupt?
> You have no way of really "knowing" this until it's too late, and you
> only have one person to turn to: the vendor.

All good points there, I cannot argue against that. Certainly something
to think about before making any purchases. The only thing against that
right now is size (we've got "cheap" access to a rack with limited
depth); I haven't really found any good 1U chassis that isn't too deep.
Admittedly I haven't spent very much time looking yet, but.. :)

>
> I prefer to have freedom of choice when it comes to backup methods.
> "Hmm, dump/restore isn't working out very well, so maybe I'll try ZFS,
> or bacula, or tar over NFS, or rsync, or...".
>
>> What I would like is encryption both for the transfer to the box, and
>> encryption on disk. The data on disk should not be readable by anyone
>> but me (i.e. the other user(s) of the box should not be able to read
>> it, at least not without a big effort).
>
> I'm curious what the reason is for on-disk encryption?  Is it necessary
> for something *only you* will have access to?  What's the concern here?

I think I wrote that I *won't* be the only one with access to the box.
Sorry if that wasn't clear.

It will be shared with a friend of mine (or rather his company). I do
trust him, but to keep some level of security I don't want him (or
rather, someone with access to his box) to be able to read my files
(and the other way around for his files).

>
>> So, I'm wondering what the best solution might be.. Tarballing all my
>> stuff, encrypting it with GPG or something, and just dumping it there
>> with NFS would be the easiest solution, but maybe not the best. I've
>> been thinking about running a GELI image on my box, and storing that
>> on the NAS over NFS.. would that be doable/secure/stable?
>
> I would recommend avoiding NFS unless the machine you're running
> nfsd/mountd/portmap on has no direct way to talk to the Internet.
> It's impossible to get NFS-related daemons to bind solely to one
> IP/interface on FreeBSD, which imposes a security risk.  If the machine
> is behind NAT, you're very likely safe (unless the public has some way
> of accessing another machine on that NAT network).  Thus, if you choose
> to go the NFS route, have it on a segregated network.

The box will be on a separate LAN, only accessible by our two boxes.
No internet connectivity. But the client boxes of course have internet
connectivity (though they would only be NFS clients, not servers).
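
Concretely, what I had in mind with the GELI-image-over-NFS idea is
roughly the following (host names, paths, sizes and the key file are
all made up; just an untested sketch):

# one-time setup on my box; the container file lives on the NAS via NFS
mount nas:/export/johan /mnt/nas
dd if=/dev/random of=/root/backup.img.key bs=64 count=1
truncate -s 50G /mnt/nas/backup.img
mdconfig -a -t vnode -f /mnt/nas/backup.img -u 0
geli init -P -K /root/backup.img.key /dev/md0
geli attach -p -k /root/backup.img.key /dev/md0
newfs -U /dev/md0.eli
mount /dev/md0.eli /mnt/backup

# ... rsync/tar the data into /mnt/backup ...

# teardown after each run
umount /mnt/backup
geli detach md0.eli
mdconfig -d -u 0

That way the key and the plaintext never leave my box; the NAS (and
anyone else with access to it) only ever sees the encrypted image, and
rsyncing into /mnt/backup means I don't have to push a full tarball
every time.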

>
> That said -- what we use in our production environment is dump/restore
> over SSH over a dedicated LAN.  I wrote a series of scripts that do
> this, using SSH keys for the SSH portion.  Incrementals are done 6 days
> a week, with fulls done once a week.

I use a similar scheme now, using BackupPC. However, that backs up to my
box at home, which is not a very good solution due to bandwidth
limitations (5 Mbit only).. The first copy takes ages, the incremental
ones not so much.. It's around 20-30 GB of data currently. The
NAS/backup box would be located on a 100 Mbit/1000 Mbit unmetered link.
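
For what it's worth, I'd guess the dump-over-SSH part of your scripts
boils down to something like this (host name, key and paths invented
here):

# level 0 on Sundays, levels 1-6 the rest of the week
LEVEL=0
dump -${LEVEL}uaL -f - /usr |
    ssh -i /root/.ssh/backup_key backup@backuphost \
        "cat > /backups/$(hostname)/usr.${LEVEL}.dump"
        # $(hostname) expands locally, naming the file after this client

with restore being more or less "ssh backup@backuphost cat
/backups/host/usr.0.dump | restore -rf -" into a freshly newfs'd
filesystem, which would explain the "install FreeBSD first, then
restore the user data" experience you describe.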

>
> Does it work?  Yes.  Have I had to restore from it?  Yes, twice.  Did
> it work OK?  Yes, but was not as simple as "restore the backup to this
> disk, throw the disk in the server, and voila FreeBSD is back up and
> running".  It's more of "replace the disk, install FreeBSD on it,
> configure the box like before, then restore the user data..."
>
> Once all of our systems are running RELENG_7, I plan on utilising ZFS
> heavily.  ZFS offers backup/restore capability, including over a
> network, and it's very fast.  Now if only installing FreeBSD onto ZFS
> was made simple, ditto with booting off of ZFS...
>
> Now, on a personal level -- I do backups at home too.  My home system
> has 4 disks in it -- one for the OS (UFS2), one for backups (UFS2), and
> two for a ZFS RAID-0-like volume.
>
> For the OS disk and filesystems (e.g. / /var /usr /tmp /home), I use
> rsync.  For the ZFS volume, I use ZFS snapshots in an incremental
> fashion (6 days of incrementals, 1 day of full) and do "zfs send
> {volume} > /backup_disk/volume.X" to do the backups.
>
> In case you're wondering about how long they all take and how much data
> is backed up, here's some times of full level 0 backups:
>
> ==> Backing up / to /backups/rootfs/ (method: rsync)
> ==> Start time: Sun Jan 13 02:45:01 PST 2008
> ==> End time:   Sun Jan 13 02:45:01 PST 2008
> ==> Backing up /var to /backups/var/ (method: rsync)
> ==> Start time: Sun Jan 13 02:45:01 PST 2008
> ==> End time:   Sun Jan 13 02:45:06 PST 2008
> ==> Backing up /usr to /backups/usr/ (method: rsync)
> ==> Start time: Sun Jan 13 02:45:06 PST 2008
> ==> End time:   Sun Jan 13 02:46:03 PST 2008
> ==> Backing up /home to /backups/home/ (method: rsync)
> ==> Start time: Sun Jan 13 02:46:03 PST 2008
> ==> End time:   Sun Jan 13 02:46:03 PST 2008
> ==> Backing up storage to /backups/storage.zfs.%%% (method: zfs)
> ==> Start time: Sun Jan 13 02:46:03 PST 2008
> ==> End time:   Sun Jan 13 03:29:33 PST 2008
>
> Filesystem   1024-blocks      Used     Avail Capacity  Mounted on
> /dev/ad8s1a       507630    211410    255610    45%    /
> /dev/ad8s1d      8122126    108502   7363854     1%    /var
> /dev/ad8s1e      4058062       420   3732998     0%    /tmp
> /dev/ad8s1f     32494668   2023282  27871814     7%    /usr
> /dev/ad8s1g    139955812     11640 128747708     0%    /home
> /dev/ad10s1d   473009638 146843210 288325658    34%    /backups
> storage        957526016 124001408 833524608    13%    /storage
>
> And here's what you see on /backups:
>
> total 144005480
> drwxr-xr-x    6 root      wheel              512 16 Oct 10:08 home/
> drwxr-xr-x   24 root      wheel              512 13 Jan 23:49 rootfs/
> -rw-r--r--    1 root      wheel     126996957624 13 Jan 03:29 storage.zfs.0
> -rw-r--r--    1 root      wheel           747136 14 Jan 02:46 storage.zfs.1
> -rw-r--r--    1 root      wheel        541937432 15 Jan 02:45 storage.zfs.2
> -rw-r--r--    1 root      wheel       4408684056  9 Jan 02:46 storage.zfs.3
> -rw-r--r--    1 root      wheel       4716827040 10 Jan 02:47 storage.zfs.4
> -rw-r--r--    1 root      wheel       5362108640 11 Jan 02:47 storage.zfs.5
> -rw-r--r--    1 root      wheel       5362108640 12 Jan 02:47 storage.zfs.6
> drwxr-xr-x   17 root      wheel              512  1 Dec 09:06 usr/
> drwxr-xr-x   23 root      wheel              512  6 Jan 01:36 var/
>
> For the ZFS incremental storage.zfs.2 (541MB of data), the time was
> very quick (9 seconds):
>
> ==> Backing up storage to /backups/storage.zfs.%%% (method: zfs)
> ==> Start time: Tue Jan 15 02:45:26 PST 2008
> ==> End time:   Tue Jan 15 02:45:35 PST 2008
>
> I have dump/restore on UFS2 via ssh times if you want them as well.
> They're not pretty.
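
(Just so I follow: I take it a full is a plain "zfs send" of a snapshot
redirected to a file, and an incremental uses "zfs send -i" between two
snapshots? Roughly, with guessed snapshot names:

zfs snapshot storage@mon
zfs send -i storage@sun storage@mon > /backups/storage.zfs.1

Correct me if I've misread zfs(8).)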


ZFS is indeed very nice, I'm running it at home for a not-so-important
server.. I love it! It has been working without a single hiccup since I
started using it (end of November).
We've been thinking of using a FreeBSD machine with ZFS, but the ZFS
backup/restore scheme wouldn't help us, since the machines being backed
up don't run ZFS (it didn't exist on FreeBSD/wasn't stable enough when
those were set up). So relying on ZFS backup/restore for the
backupee->backup box transfer is, I'm afraid, not an option. However,
the snapshots could of course be useful on the backup box, i.e. copying
the files the first time, creating a snapshot, rsyncing new versions,
new snapshot & new rsync and so on, if I've understood the snapshots
correctly (I haven't played with them very much yet).
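
On the backup box I imagine that would look something like this (the
dataset and host names are just made up):

# nightly, on the backup box
rsync -a --delete -e ssh client:/data/ /tank/backups/client/
zfs snapshot tank/backups/client@$(date "+%Y%m%d")

# prune old snapshots as they pile up, e.g.
zfs destroy tank/backups/client@20080101
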
However, this won't work either, or at least probably not very
effectively, since the data should be encrypted and not stored in
plaintext.

>
>> Another idea would be to go with some regular 1U box running FreeBSD,
>> doing scp to the box and running geli locally on that box, but that
>> would require me to have the encryption keys on that box (which would
>> be shared, and thus not a good idea).
>
> I would recommend going this route, at least in regards to the 1U box
> running FreeBSD.  See above comment about GELI.  scp to the box would
> be fine; why does this part worry you?

Well, as explained above, I *won't* be the only one with access to it.
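So if I go that route I'd rather encrypt locally before anything leaves
my box, something along these lines (the key id and paths are only
examples):

# encrypt with my own public key; the private key never leaves my box
F=important-$(date "+%Y%m%d").tar.gz.gpg
tar czf - /home/johan/important | gpg -e -r johan@stromnet.se -o /tmp/$F
scp /tmp/$F backup@nas:/backups/johan/ && rm /tmp/$F

But then I lose the nice rsync-style behaviour, which is why the GELI
image idea still looks more attractive to me.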

>
>> Any other ideas? Being able to rsync to the backup storage instead of
>> just sending big encrypted tarballs would be very nice (and I guess
>> that would be possible with the geli version).
>
> See above, re: why is encryption needed?
>

Above again.


Again, thank you very much for all your time and thoughts, very much
appreciated!

--
Johan


