Date: Mon, 27 Oct 2003 08:42:25 -0800
From: David Wolfskill
To: freebsd-isp@freebsd.org
Message-ID: <20031027164225.GA361@frecnocpc2.noc.egation.com>
User-Agent: Mutt/1.4.1i
Subject: Re: restoring dumps from crashed drive
List-Id: Internet Services Providers

On Mon, Oct 27, 2003 at 08:26:49AM -0500, Dave [Nexus] wrote:
> We recently had an unfortunate incident where a hard drive crashed after
> only 8 months of service -- a total loss of data on the drive, even after
> sending it in to a recovery company.
>...
> We had boot disks, and the thought was to build a base installation, mount
> the backup drive (a secondary hard drive), then simply run restore over the
> various partitions. Some of the problems we ran into were:
> - unable to copy various system files, kernel, etc.
> - restore being unable to find files and trees referred to by symbolic link
>   (which at first I figured would be solved by simply running it twice once
>   the files were there to be linked to)
> - and other peculiarities.
> Bottom line is we ended up ditching it, installing 4.8, cvsup'ing to 4.9,
> then rebuilding the server by hand and copying user data over. We are still
> trying to get the database files restored, which is problematic because of
> the massive changes in the various MySQL and PostgreSQL versions since.

The above list of failure modes strikes me as unexpected, at best.

> Aside from the nice dump/restore examples, does anyone have a real-world
> situation where they could discuss the procedures they used to restore a
> server from backup, assuming total loss of the primary drive?

Certainly.

By its nature, dump requires a nearly incestuous relationship with the type
of file system it's reading; on the other hand, if the file system has
capabilities that more general utilities (e.g. tar or cpio) may not be aware
of -- such as "flags" (cf. "man chflags") -- a more file system-specific
tool is the appropriate one to use.

My backups at home are done with dump (transported via ssh); I have
recovered from failed boot drives on a couple of FreeBSD systems and a
Solaris (2.6) system via those backups.
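(Purely as a sketch -- the host name "backuphost", the dump file name, and
the mount point below are hypothetical, not details of my actual setup:)

    # Level 0 dump of /usr, streamed over ssh to a file on the
    # backup host; -u records the dump in /etc/dumpdates, and -a
    # lets dump size the output itself instead of assuming a tape:
    dump -0ua -f - /usr | ssh backuphost dd of=/dumps/usr.0

    # Recovery runs the same pipe in reverse: newfs and mount the
    # replacement file system, cd to its root, and rebuild there:
    cd /mnt/usr && ssh backuphost cat /dumps/usr.0 | restore -rf -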
For the FreeBSD systems, I set them up to boot from either slice 1 or slice
2 (so I have both / and /usr on each of those slices, and /var and
"everything else" -- including swap -- on the 3rd slice). In these cases, I
do a minimal install on slice 2, boot from slice 2, then restore to slice 1.

In the case of the Solaris system, I still had a flaky but marginally
serviceable disk drive from which I could boot while I put the new drive in
the other position (this was on a SPARCstation 5); I then partitioned the
new drive, created the file systems, and restored the data.

The reason for setting up the FreeBSD systems to boot from either of 2
slices, however, is not to facilitate such recovery (though it does do
that); rather, it is to make fairly frequent upgrades practical (while
preserving an ability to fall back to a reasonably well-known system). I
use a "dump | restore" pipeline (sketched in the postscript below) to copy
the file systems from the active slice to the inactive one, then boot from
the newly-written slice. I then do the "make installkernel &&
mergemaster -p && make installworld && mergemaster" sequence in place on
the (now-active) slice -- I use a different (and faster) machine to do the
builds, both for the world (including the sendmail configs) and the
kernels.

(I note, too, that I typically have /usr mounted read-only except during
upgrades. I tried mounting / read-only a few years ago, but seem to recall
ssh having significant problems with that ... and since a couple of the
boxes I care about run headless, breaking the ability to use ssh to access
them wasn't exactly high on my list of "fun things to do." Despite that, /
doesn't tend to be a very active file system on boxes I run -- except
during upgrades, of course.)

Since I track -STABLE on my laptop (thus getting a "feel" for just how
"stable" it is for my usage), I tend to do these upgrades -- at home --
about every couple of weeks. (If there are circumstances that justify a
more frequent schedule, such as problems with SSL, I'll upgrade sooner; if
my perception is that -STABLE isn't suitably "stable" for my use, I'll
hold off for a week or so.)

I confess that I have yet to implement that (or a similar) scheme here at
work, though the new machines I've put into production do get set up to
support it. But I just got started here.... :-}

So I'm sorry to read of your "tale of woe," but find myself puzzled as to
how it happened. I cannot help but recommend, though, that anyone doing
(or planning) backups actually *test* the ability to use those backups
from time to time.

Peace,
david
-- 
David H. Wolfskill				david@egation.com
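P.S. For concreteness: the slice-to-slice copy mentioned above is just the
classic dump/restore pipeline. (The device name and mount point here are
hypothetical; adjust them for the disk and slice layout at hand:)

    # Prepare the inactive slice's root partition, then copy the
    # live root file system onto it:
    newfs /dev/ad0s2a
    mount /dev/ad0s2a /mnt
    cd /mnt && dump -0a -f - / | restore -rf -

And "testing" a backup needn't be elaborate: merely listing a dump's table
of contents (same hypothetical names as before) catches a fair number of
problems, though nothing beats actually restoring to scratch space once in
a while:

    ssh backuphost cat /dumps/usr.0 | restore -tf -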