Date:      Mon, 27 Oct 2003 08:42:25 -0800
From:      David Wolfskill <david@egation.com>
To:        freebsd-isp@freebsd.org
Subject:   Re: restoring dumps from crashed drive
Message-ID:  <20031027164225.GA361@frecnocpc2.noc.egation.com>
In-Reply-To: <DBEIKNMKGOBGNDHAAKGNCEDOGFAC.dave@nexusinternetsolutions.net>
References:  <DBEIKNMKGOBGNDHAAKGNCEDOGFAC.dave@nexusinternetsolutions.net>

On Mon, Oct 27, 2003 at 08:26:49AM -0500, Dave [Nexus] wrote:
> recently had an unfortunate incident where a hard drive crashed after only 8
> months of service. Total loss of data on the drive even after sending it in to a
> recovery company.

>...

> We had boot disks, and the thought was to build a base installation, mount the
> backup drive (secondary hard drive), then simply run restore over the various
> partitions.  Some of the problems we ran into were:
> - unable to copy various system files, kernel, etc...
> - restore being unable to find files and trees referred to by symbolic link
> (which at first I figured would be solved by simply running it twice once the
> files were there to be linked to)
> - and other peculiarities.

> Bottom line is we ended up ditching it, installing a 4.8, cvsup to 4.9, then
> rebuilding the server by hand, and copying user data over.  We are still trying
> to get database files restored, which is problematic because of the massive
> changes in MySQL and PostgreSQL since the previous versions.

The above list of failure modes strikes me as unexpected, at best.

> Aside from the nice dump/restore examples, does anyone have a real world
> situation where they could discuss the procedures they used to restore a server
> from backup, assuming total loss of the primary drive?

Certainly.

By its nature, dump requires a nearly incestuous relationship with the
type of file system it's reading.  On the other hand, if the file system
has capabilities that more general utilities (e.g. tar or cpio) may not
be aware of -- such as "flags" (cf. "man chflags") -- a more
file-system-specific tool is the appropriate one to use.

My backups at home are done with dump (transported via ssh); I have
recovered from failed boot drives on a couple of FreeBSD systems and a
Solaris (2.6) system via those backups.
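For illustration, a minimal sketch of a dump-over-ssh backup, assuming a remote host reachable by ssh that stores one file per dump. The host name, directory, and file naming (BACKUP_HOST, BACKUP_DIR, LABEL.LEVEL.dump) are hypothetical placeholders, not a description of the setup above; by default the script only prints the pipeline it would run.

```shell
#!/bin/sh
# Sketch only: stream a dump of one file system to another host over ssh.
# BACKUP_HOST and BACKUP_DIR are hypothetical placeholders.
# DRY_RUN=1 (the default) prints each command instead of running it.
BACKUP_HOST=${BACKUP_HOST:-backuphost}
BACKUP_DIR=${BACKUP_DIR:-/backup}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise hand it to sh(1).
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        sh -c "$*"
    fi
}

# backup_fs MOUNTPOINT LABEL LEVEL
#   -LEVEL  dump level (0 = full)
#   -u      record the dump in /etc/dumpdates so later levels are incremental
#   -a      bypass tape-length calculations (output is a pipe, not a tape)
#   -f -    write the dump to standard output
backup_fs() {
    run "dump -${3}uaf - $1 | ssh $BACKUP_HOST \"cat > $BACKUP_DIR/$2.$3.dump\""
}

# Example: full (level 0) dumps of / and /usr.
# backup_fs /    root 0
# backup_fs /usr usr  0
```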

For the FreeBSD systems, I set them up to boot from either slice 1 or
slice 2 (so I have both / and /usr on those slices, and /var and
"everything else" -- including swap -- on the 3rd slice).  In these
cases, I do a minimal install on slice 2, boot from slice 2, then
restore to slice 1.
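The "restore to slice 1" step might look roughly like the sketch below, run after booting the minimal install on slice 2. It assumes slice 1's partitions already exist on the replacement drive (e.g. created during the minimal install) and that level-0 dumps sit on another host; the device names, host, and paths are hypothetical. Only the level-0 dump is restored here, which is why restoresymtable is removed right away; keep it if incrementals are to be applied afterwards.

```shell
#!/bin/sh
# Sketch: rebuild slice 1's file systems from level-0 dumps on another host.
# Device names (ad0s1a, ad0s1e), BACKUP_HOST and BACKUP_DIR are hypothetical.
BACKUP_HOST=${BACKUP_HOST:-backuphost}
BACKUP_DIR=${BACKUP_DIR:-/backup}
MNT=${MNT:-/mnt}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise hand it to sh(1).
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        sh -c "$*"
    fi
}

# restore_fs DEVICE LABEL
restore_fs() {
    run "newfs $1"                 # fresh, empty file system
    run "mount $1 $MNT"
    # restore -r rebuilds the whole tree in the current directory.
    run "cd $MNT && ssh $BACKUP_HOST \"cat $BACKUP_DIR/$2.0.dump\" | restore -rf -"
    # Only needed for later incremental restores; this sketch does none.
    run "rm -f $MNT/restoresymtable"
    run "umount $MNT"
}

# restore_fs /dev/ad0s1a root
# restore_fs /dev/ad0s1e usr
```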

In the case of the Solaris system, I still had a flaky, but
marginally serviceable, disk drive from which I could boot, so I put
the new drive in the other position (this was on a SPARCstation 5),
partitioned the new drive, created the file systems, then restored the
data.

The reason for setting up the FreeBSD systems to boot from either of 2
slices, however, is not to facilitate such recovery (though it does do
that); rather, it is to support fairly frequent upgrades while preserving
an ability to fall back to a reasonably well-known system.  I use a
"dump | restore" pipeline to copy the file systems from the active slice
to the inactive one, then boot from the newly-written slice.  I then do
the "make installkernel && mergemaster -p && make installworld &&
mergemaster" sequence in-place on the (now-active) slice -- I use a
different (and faster) machine to do the builds, both for the world
(including the sendmail configs) and the kernels.
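A rough sketch of the slice-to-slice copy, assuming slice 1 is active and slice 2 is being refreshed; the device names are hypothetical. Note that the copied / carries an /etc/fstab that still names slice 1's devices, so it is worth checking that file on the new slice before booting from it. The install sequence quoted above would then be run after rebooting onto the copy.

```shell
#!/bin/sh
# Sketch: copy the active slice's file systems onto the inactive slice with a
# "dump | restore" pipeline. Device names below are hypothetical.
MNT=${MNT:-/mnt}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise hand it to sh(1).
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        sh -c "$*"
    fi
}

# clone_fs SOURCE_MOUNTPOINT TARGET_DEVICE
clone_fs() {
    run "newfs $2"
    run "mount $2 $MNT"
    # Level-0 dump to stdout, restored into the freshly created file system.
    # No -u: this copy should not be recorded in /etc/dumpdates.
    run "dump -0af - $1 | (cd $MNT && restore -rf -)"
    run "rm -f $MNT/restoresymtable"
    run "umount $MNT"
}

# Running from slice 1, refresh slice 2:
# clone_fs /    /dev/ad0s2a
# clone_fs /usr /dev/ad0s2e
```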

(I note, too, that I typically have /usr mounted read-only except during
upgrades.  I tried mounting / read-only a few years ago, but seem to
recall ssh having significant problems with that ... and since a couple
of the boxes I care about run headless, breaking the ability to use ssh
to access them wasn't exactly high on my list of "fun things to do."
Despite that, / doesn't tend to be a very active file system on boxes I
run -- except during upgrades, of course.)
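Mounting /usr read-only is just a mount option in /etc/fstab; the fragment below is illustrative only, with hypothetical device names matching a two-boot-slice layout. Around an upgrade, "mount -u" can switch it to read-write and back.

```
# /etc/fstab (fragment; device names are hypothetical)
# Device      Mountpoint  FStype  Options  Dump  Pass#
/dev/ad0s1a   /           ufs     rw       1     1
/dev/ad0s1e   /usr        ufs     ro       2     2
/dev/ad0s3e   /var        ufs     rw       2     2
#
# Around an upgrade:
#   mount -u -o rw /usr    (before)
#   mount -u -o ro /usr    (after)
```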

Since I track -STABLE on my laptop (thus getting a "feel" for just how
"stable" it is for my usage), I tend to do these upgrades -- at home --
about every couple of weeks or so.  (If there are circumstances that
justify a more frequent schedule, such as problems with SSL, I'll do
that; if it is my perception that -STABLE isn't suitably "stable" for my
use, I'll hold off for a week or so.)

I confess that I have yet to implement that (or a similar) scheme here
at work, though the new machines I've put into production do get set up
to support it.  But I just got started here....  :-}

So I'm sorry to read of your "tale of woe," but find myself puzzled as
to how that happened.

I cannot help but recommend, though, that anyone doing (or planning)
backups actually *test* the ability to use those backups from time to
time.
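One modest way to do such a test is a trial restore of a dump file into a scratch directory, sketched below under stated assumptions: SCRATCH and the dump path are hypothetical, SCRATCH needs room for the data, and a full restore onto a spare drive remains the more convincing exercise. Because restore -r reads the entire dump, a damaged or truncated file should surface as an error here rather than during a real recovery.

```shell
#!/bin/sh
# Sketch: trial restore of one dump file into a scratch directory.
# SCRATCH and the dump path are hypothetical placeholders.
SCRATCH=${SCRATCH:-/var/tmp/restore-test}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise hand it to sh(1).
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        sh -c "$*"
    fi
}

# trial_restore DUMPFILE
trial_restore() {
    run "rm -rf $SCRATCH && mkdir -p $SCRATCH"
    # restore -r reads the whole dump and recreates the tree here.
    run "cd $SCRATCH && restore -rf $1"
    # Rough sanity check: how much data actually came back (in KB).
    run "du -sk $SCRATCH"
}

# trial_restore /backup/usr.0.dump
```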

Peace,
david
-- 
David H. Wolfskill                                 david@egation.com


