From: Jan Grant
Date: Fri, 1 Oct 2004 21:35:06 +0100 (BST)
To: Brian McCann
Cc: freebsd-questions@freebsd.org
Subject: Re: Backup/Restore

On Fri, 1 Oct 2004, Brian McCann wrote:

> On Thu, 30 Sep 2004 13:59:05 -0700 (PDT), Richard Lynch wrote:
> >
> > Brian McCann wrote:
> > > Hi all...I'm having a conceptual problem I can't get around and
> > > was hoping someone can change my focus here. I've been backing up
> > > roughly 6-8 million small files (roughly 2-4k each) using dump, but
> > > restores take forever due to the huge number of files and directories.
> > > Luckily, I haven't had to restore for an emergency yet...but if I
> > > need to, I'm kinda stuck. I've looked at distributed file systems
> > > like CODA, but the number of files I have to deal with will make it
> > > choke. Can anyone offer any suggestions? I've pondered running
> > > rsync, but am very worried about how long that will take...
> >
> > Do the files change a lot, or is it more like a few files added/changed
> > every day, and the bulk don't change?
> >
> > If it's the latter, you could maybe get best performance from something
> > like Subversion (a CVS derivative).
> >
> > Though I suspect rsync would also do well in that case.
> >
> > If a ton of those files are changing all the time, try doing a test on
> > creating a tarball and then backing up the tarball. That may be a simple
> > manageable solution. There are probably other more complex solutions of
> > which I am ignorant :-)
>
> I have the case where a new file is created about every second or two,
> nothing gets changed, but files get deleted occasionally (it's a mail
> server). I thought of using tar, but it would be just as slow as dump
> I would think. I've thought of breaking it up into chunks, but that
> still doesn't solve my speed issue...I'm beginning to consider using
> dd since it reads the actual disk bits, and just hope that a) I don't
> ever need one file and b) the system I restore to has at least as much
> space as the original server. Any other thoughts anyone?

You might want to experiment with something like rsync to maintain a
"live" (i.e., on a filesystem) second copy. If you do this, don't be put
off by the initial rsync time (which may well take ages - tar or
dump/restore may be faster to get the second copy in place initially).
Rsync over such a large filesystem may take quite a while, but the best
bet is to actually try it to see if it meets your needs.

Obviously a restore of a mail repository is a pretty awful thing to have to do.
Amongst other things, users can find the "resurrection" of deleted mails
to be a real pain. You might want to see if your mail repo can generate
some kind of replay log - if so, this might be the best route for
minimising the amount of time needed to synchronise mailstores and to
get the closest fidelity out of the copy.

Breaking your mailstore into separate chunks may well help. Yes, the
total time for a dump/restore may be close to your current state of
play, but if you can split the partitions between machines then you have
the option to perform these in parallel.

-- 
jan grant, ILRT, University of Bristol. http://www.ilrt.bris.ac.uk/
Tel +44(0)117 9287088 Fax +44 (0)117 9287112 http://ioctl.org/jan/
"...perl has been dead for more than 4 years." - Abigail in the Monastery
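
As a rough illustration of the two ideas in the reply (an rsync-maintained
second copy, and splitting the mailstore into chunks that can be handled in
parallel), here is a minimal Python sketch. It is not anything the posters
ran: the paths, the function names and the chunk count are all hypothetical,
and it assumes the mailstore is laid out as a set of top-level directories
that can be mirrored independently. The rsync flags used are -a (archive
mode) and --delete (remove files on the copy that no longer exist on the
source, which keeps deleted mail from reappearing after a restore).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import PurePosixPath


def split_into_chunks(dirs, n_chunks):
    """Distribute directories round-robin into n_chunks lists."""
    chunks = [[] for _ in range(n_chunks)]
    for i, d in enumerate(sorted(dirs)):
        chunks[i % n_chunks].append(d)
    return chunks


def rsync_command(src_dir, dest_root):
    """Build an rsync command mirroring src_dir into dest_root/<name>/."""
    src = PurePosixPath(src_dir)
    dest = PurePosixPath(dest_root) / src.name
    # Trailing slashes make rsync copy the directory's contents.
    return ["rsync", "-a", "--delete", f"{src}/", f"{dest}/"]


def mirror_chunk(chunk, dest_root):
    """Mirror each directory of one chunk in turn; return rsync exit codes."""
    return [subprocess.run(rsync_command(d, dest_root)).returncode
            for d in chunk]


def mirror_in_parallel(dirs, dest_root, n_chunks=4):
    """Run one worker per chunk; n_chunks=4 is an arbitrary assumption."""
    chunks = split_into_chunks(dirs, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        return list(pool.map(lambda c: mirror_chunk(c, dest_root), chunks))
```

For example, mirror_in_parallel(["/var/mail/a", "/var/mail/b"],
"/backup/mail", n_chunks=2) would start two workers, each running rsync on
its own share of the directories (both paths are made up). Running the
chunks in parallel on a single disk may not gain much; as the reply notes,
the benefit is more likely when the chunks live on separate partitions or
machines, so it is worth timing on real data before relying on it.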