Date: Mon, 27 Oct 2003 08:42:25 -0800
From: David Wolfskill
To: freebsd-isp@freebsd.org
Message-ID: <20031027164225.GA361@frecnocpc2.noc.egation.com>
User-Agent: Mutt/1.4.1i
Subject: Re: restoring dumps from crashed drive
List-Id: Internet Services Providers

On Mon, Oct 27, 2003 at 08:26:49AM -0500, Dave [Nexus] wrote:
> We recently had an unfortunate incident where a hard drive crashed after
> only 8 months of service -- a total loss of data on the drive, even after
> sending it in to a recovery company.
>...
> We had boot disks, and the thought was to build a base installation, mount
> the backup drive (a secondary hard drive), then simply run restore over the
> various partitions. Some of the problems we ran into were:
> - unable to copy various system files, kernel, etc.
> - restore being unable to find files and trees referred to by symbolic link
>   (which at first I figured would be solved by simply running it twice once
>   the files were there to be linked to)
> - and other peculiarities.
> Bottom line is we ended up ditching it, installing 4.8, cvsup'ing to 4.9,
> then rebuilding the server by hand and copying user data over. We are still
> trying to get the database files restored, which is problematic because of
> the massive changes in the various MySQL and PostgreSQL versions since.

The above list of failure modes strikes me as unexpected, at best.

> Aside from the nice dump/restore examples, does anyone have a real-world
> situation where they could discuss the procedures they used to restore a
> server from backup, assuming total loss of the primary drive?

Certainly.

By its nature, dump requires a nearly incestuous relationship with the type
of file system it's reading; on the other hand, if the file system has
capabilities that more general utilities (e.g. tar or cpio) may not be aware
of -- such as "flags" (cf. "man chflags") -- a more file system-specific
tool is the appropriate one to use.

My backups at home are done with dump (transported via ssh); I have
recovered from failed boot drives on a couple of FreeBSD systems and a
Solaris (2.6) system via those backups.
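(Purely as a sketch -- the host name "backuphost", the dump file name, and
the mount point below are hypothetical, not details of my actual setup:)

    # Level 0 dump of /usr, streamed over ssh to a file on the
    # backup host; -u records the dump in /etc/dumpdates, and -a
    # lets dump size the output itself instead of assuming a tape:
    dump -0ua -f - /usr | ssh backuphost dd of=/dumps/usr.0

    # Recovery runs the same pipe in reverse: newfs and mount the
    # replacement file system, cd to its root, and rebuild there:
    cd /mnt/usr && ssh backuphost cat /dumps/usr.0 | restore -rf -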
For the FreeBSD systems, I set them up to boot from either slice 1 or slice
2 (so I have both / and /usr on each of those slices, and /var and
"everything else" -- including swap -- on the 3rd slice). In these cases, I
do a minimal install on slice 2, boot from slice 2, then restore to slice 1.

In the case of the Solaris system, I still had a flaky but marginally
serviceable disk drive from which I could boot while I put the new drive in
the other position (this was on a SPARCstation 5); I then partitioned the
new drive, created the file systems, and restored the data.

The reason for setting up the FreeBSD systems to boot from either of 2
slices, however, is not to facilitate such recovery (though it does do
that); rather, it is to make fairly frequent upgrades practical (while
preserving an ability to fall back to a reasonably well-known system). I
use a "dump | restore" pipeline (sketched in the postscript below) to copy
the file systems from the active slice to the inactive one, then boot from
the newly-written slice. I then do the "make installkernel &&
mergemaster -p && make installworld && mergemaster" sequence in place on
the (now-active) slice -- I use a different (and faster) machine to do the
builds, both for the world (including the sendmail configs) and the
kernels.

(I note, too, that I typically have /usr mounted read-only except during
upgrades. I tried mounting / read-only a few years ago, but seem to recall
ssh having significant problems with that ... and since a couple of the
boxes I care about run headless, breaking the ability to use ssh to access
them wasn't exactly high on my list of "fun things to do." Despite that, /
doesn't tend to be a very active file system on boxes I run -- except
during upgrades, of course.)

Since I track -STABLE on my laptop (thus getting a "feel" for just how
"stable" it is for my usage), I tend to do these upgrades -- at home --
about every couple of weeks. (If there are circumstances that justify a
more frequent schedule, such as problems with SSL, I'll upgrade sooner; if
my perception is that -STABLE isn't suitably "stable" for my use, I'll
hold off for a week or so.)

I confess that I have yet to implement that (or a similar) scheme here at
work, though the new machines I've put into production do get set up to
support it. But I just got started here.... :-}

So I'm sorry to read of your "tale of woe," but find myself puzzled as to
how it happened. I cannot help but recommend, though, that anyone doing
(or planning) backups actually *test* the ability to use those backups
from time to time.

Peace,
david
-- 
David H. Wolfskill				david@egation.com
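P.S. For concreteness: the slice-to-slice copy mentioned above is just the
classic dump/restore pipeline. (The device name and mount point here are
hypothetical; adjust them for the disk and slice layout at hand:)

    # Prepare the inactive slice's root partition, then copy the
    # live root file system onto it:
    newfs /dev/ad0s2a
    mount /dev/ad0s2a /mnt
    cd /mnt && dump -0a -f - / | restore -rf -

And "testing" a backup needn't be elaborate: merely listing a dump's table
of contents (same hypothetical names as before) catches a fair number of
problems, though nothing beats actually restoring to scratch space once in
a while:

    ssh backuphost cat /dumps/usr.0 | restore -tf -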