FreeBSD Mail Archives

Date:      Tue, 3 Mar 1998 20:06:52 -0600
From:      Karl Denninger  <karl@mcs.net>
To:        shimon@simon-shapiro.org
Cc:        grog@lemis.com, hackers@FreeBSD.ORG, blkirk@float.eli.net, jdn@acp.qiv.com, tlambert@primenet.com, sbabkin@dcn.att.com, Wilko Bulte <wilko@yedi.iaf.nl>
Subject:   Re: SCSI Bus redundancy...
Message-ID:  <19980303200652.07366@mcs.net>
In-Reply-To: <XFMail.980303170953.shimon@simon-shapiro.org>; from Simon Shapiro on Tue, Mar 03, 1998 at 05:09:53PM -0800
References:  <19980303183101.05201@mcs.net> <XFMail.980303170953.shimon@simon-shapiro.org>

On Tue, Mar 03, 1998 at 05:09:53PM -0800, Simon Shapiro wrote:
> > We still run tapes nightly for incrementals, and weekly for full dumps -
> > but they are more for the "aw shit" user-induced stupidity (like the
> > infamous "rm -rf *") rather than hardware coverage.  The pain of a restore
> > across disks of this size is just too darn big.
> 
> I wrote a white paper at Oracle some years ago, claiming that databases
> over a certain size simply cannot be backed up.  I became very UN-popular
> very quickly.  In your moderate setup, you already see the proof of
> correctness.

Correct.  I consider any "regular" filesystem with more than 4G of data on
it to be unrestorable, simply because there isn't enough time to do the
restore and not get skewered.  

If the filesystem has news on it?  Forget it.  The small files blast the
hell out of restore (or pax, or anything else) during the creates - even if
mounted async during that operation.  It simply takes forever.  I've tried
copying a 4G news spool disk before - get ready for a 12 hour wait.
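
A back-of-envelope sketch of what that 12-hour figure implies. It assumes
"4G" means 4 GiB, and the average article size is a made-up number purely
for illustration (it is not a measurement from this spool):

```python
GIB = 1024 ** 3


def effective_rate(total_bytes, seconds):
    """Average bytes per second achieved over the whole copy."""
    return total_bytes / seconds


def creates_per_second(total_bytes, avg_file_bytes, seconds):
    """Approximate file creates per second for a spool of small files."""
    return (total_bytes / avg_file_bytes) / seconds


spool_bytes = 4 * GIB          # the 4G news spool mentioned above
copy_seconds = 12 * 3600       # the 12 hour wait mentioned above
AVG_ARTICLE_BYTES = 3 * 1024   # hypothetical average article size

rate_kib = effective_rate(spool_bytes, copy_seconds) / 1024
files_s = creates_per_second(spool_bytes, AVG_ARTICLE_BYTES, copy_seconds)

print(f"effective rate: {rate_kib:.1f} KiB/s")
print(f"file creates:   {files_s:.1f} per second")
```

Under those assumptions the copy averages just under 100 KiB/s, far below
what the drive can stream, which is consistent with the creates (not the
data transfer) being the bottleneck.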

This, by the way, is why most ISPs just dump the spool if they get hit with
a disk failure.  By the time you restore it, it's out of date anyway :-)

That's also why we have a nice RAID 0+1 array serving that on our primary 
news machine. :-)

> This is why most MIS types shiver when they hear about databases on Unix
> filesystems.  All you need is a crash and fsck in a bad mood.  If you are
> lucky, the entire data base is gone.  If you are unlucky, a block will
> disappear from somewhere in the middle, and you will find out a week later.
> Now backup is literally useless.

Yep.  This is, by the way, why a proper rotation of tapes is ESSENTIAL.  If
not, you're *DEAD*.  I've been at this too long to get screwed by this
kind of problem.
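
To make the rotation point concrete, here is a minimal sketch.  The two
rotation schedules and the eight-day detection delay are hypothetical
values chosen for illustration, not anyone's actual tape schedule.  A tape
is only useful if it was written before the damage happened:

```python
def restorable_tapes(tape_ages_days, corruption_age_days):
    """Return the ages of tapes written before the corruption occurred."""
    return [age for age in tape_ages_days if age > corruption_age_days]


# Hypothetical rotations, expressed as the age in days of each tape on hand.
short_rotation = [1, 2, 3, 4, 5]                 # five tapes, reused weekly
long_rotation = [1, 2, 3, 4, 5, 7, 14, 21, 28]   # dailies plus four weeklies

CORRUPTION_AGE_DAYS = 8  # hypothetical: damage happened 8 days ago

print("short rotation:", restorable_tapes(short_rotation, CORRUPTION_AGE_DAYS))
print("long rotation: ", restorable_tapes(long_rotation, CORRUPTION_AGE_DAYS))
```

With the short rotation every tape already contains the bad block; only
the longer rotation still holds a clean copy.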

Databases on Unix filesystems aren't safe.  Neither are databases on raw
partitions.  I've seen both lost due to physical problems.  Ever see a disk
adapter decide to "translate" a block address?  I have - and guess what
happened to the database (this one was on a raw partition)?  It was 
over a week later before the problem was detected when the back-end crashed
for no ascertainable reason, and the validate failed.  That one wasn't my
responsibility, and the person who *WAS* the DBA wasn't doing the right
things with the TLOGs.  

The company lost a week's worth of work.

> > This is, by the way, one of the reasons I used to favor lots of 1G drives
> > and filesystems - they can be restored in an hour or so if one fails.
> > With a 9G drive, even the newest and fastest ones, and the best tape
> > devices, you're looking at a multi-hour outage.
> 
> True.  Your performance also goes up with the smaller drives.  You can
> stripe better.  I think I mentioned it before in this forum;  Most DBMS
> benchmarks only use 300MB of the disk.  This is sort of the ``sweet spot''
> between system cost and performance.

To a point this is true.  The problem is that the smaller disks rotate
slower, and have bit densities that are lower.

There is a tradeoff between seek latency and transfer time.  If there are
lots of small files, the huge number of small disks wins big.  If there are
a few large files, the small number of disks with speed on the physical I/O
wins, provided you can seek sequentially.
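
A rough model of that tradeoff, as a sketch only.  It assumes each file
lives whole on one disk (no striping), files are spread round-robin, each
file costs one seek plus a sequential transfer, and the disks work in
parallel.  All disk figures below are hypothetical, not measurements:

```python
import math
from dataclasses import dataclass


@dataclass
class DiskSet:
    count: int            # number of spindles
    seek_s: float         # average positioning time per file, in seconds
    rate_bytes_s: float   # sustained transfer rate per disk


def batch_time(disks, n_files, file_bytes):
    """Seconds to read n_files, limited by the busiest disk."""
    files_on_busiest = math.ceil(n_files / disks.count)
    per_file = disks.seek_s + file_bytes / disks.rate_bytes_s
    return files_on_busiest * per_file


# Hypothetical hardware: many small, slower disks vs. a few large, fast ones.
many_small = DiskSet(count=9, seek_s=0.012, rate_bytes_s=4e6)
few_fast = DiskSet(count=2, seek_s=0.009, rate_bytes_s=12e6)

for label, n_files, size in [("small files", 1_000_000, 4096),
                             ("large files", 2, 10**9)]:
    t_small = batch_time(many_small, n_files, size)
    t_fast = batch_time(few_fast, n_files, size)
    print(f"{label}: many small disks {t_small:.0f}s, few fast disks {t_fast:.0f}s")
```

In this model the small-file case is dominated by the seek term, so more
spindles win; the large-file case is dominated by transfer rate, so the
faster disks win.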

-- 
Karl Denninger (karl@MCS.Net)| MCSNet - Serving Chicagoland and Wisconsin
http://www.mcs.net/          | T1's from $600 monthly to FULL DS-3 Service
			     | NEW! K56Flex support on ALL modems
Voice: [+1 312 803-MCS1 x219]| EXCLUSIVE NEW FEATURE ON ALL PERSONAL ACCOUNTS
Fax:   [+1 312 803-4929]     | *SPAMBLOCK* Technology now included at no cost

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?19980303200652.07366>
