Date:      Tue, 3 Mar 1998 18:31:01 -0600
From:      Karl Denninger  <karl@mcs.net>
To:        shimon@simon-shapiro.org
Cc:        Wilko Bulte <wilko@yedi.iaf.nl>, sbabkin@dcn.att.com, tlambert@primenet.com, jdn@acp.qiv.com, blkirk@float.eli.net, hackers@FreeBSD.ORG, grog@lemis.com
Subject:   Re: SCSI Bus redundancy...
Message-ID:  <19980303183101.05201@mcs.net>
In-Reply-To: <XFMail.980303162324.shimon@simon-shapiro.org>; from Simon Shapiro on Tue, Mar 03, 1998 at 04:23:24PM -0800
References:  <199803032155.WAA04054@yedi.iaf.nl> <XFMail.980303162324.shimon@simon-shapiro.org>

On Tue, Mar 03, 1998 at 04:23:24PM -0800, Simon Shapiro wrote:
> I think the focus has to change:
> 
> *  We used to do RAID to protect from hardware failure disrupting service. 
>    In the face of O/S and firmware volatility and bugginess, this is absurd.
>    As I said, I am using DPT controllers for ALL my storage, and have yet to
>    lose a byte to disk failure (unless I use WD or certain Micropolis
>    models).
> 
> *  I think RAID is only important to protect us from the damage WHEN the
>    failure occurs.
> 
> I think the focus changed from operational feature to insurance policy. 
> Risk management is something not too many of us are any good at (count the
> number of times you/I/we delivered a project on time).
> 
> What does it all mean?  I dunno.  I leave it to the scientists to ponder.

My CMD RAID adapters have saved my nuts twice in the last month.

In both cases there was a non-recoverable, hard sector error on a 9G drive.
Without parity I would have lost something.  With the RAID5 in place I lost
nothing, other than the time to pull the pack, replace it, and set the new
disk to "warm spare" (the system had already started the rebuild onto the
existing spare).
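
For anyone who hasn't looked at why parity makes that possible, here is a
minimal sketch of the XOR idea behind RAID5. It is an illustration only, not
how the CMD firmware is written: the block contents and the xor_blocks helper
are made up for the example, and a real array rotates parity across the
drives rather than keeping it on one.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe spread over three data drives (hypothetical contents).
data = [b"mail", b"news", b"home"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data)

# Drive 1 develops an unreadable sector: XOR the survivors with parity
# to get the missing block back.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

The same XOR is what a rebuild onto a spare does, repeated for every stripe
on the failed drive.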

Lose 36GB all the way back to your last full + incremental dump (at least a
day's worth of revisions) across 10,000 customers and tell me what happens
to your head when they get done with you.

The problem isn't even necessarily the data loss - it's the restore time.  A
9G drive takes a shitload of time to reload from even the fastest DLT drive.

We still run tapes nightly for incrementals, and weekly for full dumps - but
they are more for the "aw shit" user-induced stupidity (like the infamous
"rm -rf *") rather than hardware coverage.  The pain of a restore across
disks of this size is just too darn big.

This is, by the way, one of the reasons I used to favor lots of 1G drives
and filesystems - they can be restored in an hour or so if one fails.  With
a 9G drive, even the newest and fastest ones, and the best tape devices,
you're looking at a multi-hour outage.
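
A rough back-of-the-envelope sketch of that scaling, assuming restore time
grows linearly with capacity. The only rate used is the one implied by "1G in
about an hour"; it is not a measured figure for any tape drive, and the
restore_hours helper and the 1G = 1024 MB convention are assumptions made for
the example. Faster hardware shortens the numbers but not the ratio.

```python
def restore_hours(capacity_gb, effective_mb_per_s):
    """Hours to reload capacity_gb at a sustained effective restore rate."""
    return capacity_gb * 1024 / effective_mb_per_s / 3600

# Assumed: effective end-to-end rate implied by restoring 1G in one hour.
implied_rate = 1 * 1024 / 3600  # MB/s, roughly 0.28

for size_gb in (1, 9, 36):
    hours = restore_hours(size_gb, implied_rate)
    print(f"{size_gb:>2}G at {implied_rate:.2f} MB/s: about {hours:.1f} hours")
```

Plug in whatever sustained rate your own restores actually achieve; the point
is that the outage grows with the size of the unit you have to reload.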

-- 
Karl Denninger (karl@MCS.Net)| MCSNet - Serving Chicagoland and Wisconsin
http://www.mcs.net/          | T1's from $600 monthly to FULL DS-3 Service
			     | NEW! K56Flex support on ALL modems
Voice: [+1 312 803-MCS1 x219]| EXCLUSIVE NEW FEATURE ON ALL PERSONAL ACCOUNTS
Fax:   [+1 312 803-4929]     | *SPAMBLOCK* Technology now included at no cost
