Date:      Tue, 03 Mar 1998 21:52:15 -0800 (PST)
From:      Simon Shapiro <shimon@simon-shapiro.org>
To:        Greg Lehey <grog@lemis.com>
Cc:        Wilko Bulte <wilko@yedi.iaf.nl>, sbabkin@dcn.att.com, tlambert@primenet.com, jdn@acp.qiv.com, blkirk@float.eli.net, hackers@FreeBSD.ORG
Subject:   Re: SCSI Bus redundancy...
Message-ID:  <XFMail.980303215215.shimon@simon-shapiro.org>
In-Reply-To: <19980304143448.27241@freebie.lemis.com>


On 04-Mar-98 Greg Lehey wrote:
 
...

>> The only problem I have here is the assumption that the O/S will do all
>> that.  Not only does it consume much CPU, I/O bus, memory bandwidth,
>> etc., but O/S crashes are the number one cause of failure in any modern
>> computer.
> 
> Sure.
> 
>> Putting all this logic there is asking for it to crash frequently,
>> and run under load all the time.
> 
> I don't think you can assume that.  The load depends on the input.
> The reliability depends on the quality of the code.

It does not matter at all how good your code is.  What matters is what the
sound driver decided to do, the PPP driver, the X.25 driver, X11, the
serial console, etc.  There is no compartmentalization in Unix.  This means
that any failure in totally unrelated code WILL crash your code.  In a RAID
controller, as long as power is applied and the hardware did not fail, the
firmware has to worry about only one thing: the firmware.  There are fewer
interrupts, fewer processes, a simpler O/S, no filesystem to panic with
"freeing free inode".  Just SCSI devices on one side and (in the case of
DPT, etc.) a PCI interface on the other.  I think the work you do is great
and important, but the filesystem and volume managers are more interesting
than the RAID stuff.
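
To make the point concrete, here is a contrived userland sketch (the
"drivers" and their layout are entirely made up; in a real kernel the
corruption happens between actual driver data structures): everything
lives in one address space, so a buffer overrun in one driver silently
trashes another's state.

    /*
     * Contrived sketch: two unrelated "drivers" sharing one address
     * space, as all kernel code does.  The struct forces the two
     * buffers to be adjacent; the overrun is deliberate.
     */
    #include <stdio.h>
    #include <string.h>

    static struct {
        char sound_buf[8];     /* the "sound driver" buffer         */
        char raid_state[8];    /* the "RAID driver" state, adjacent */
    } kernel = { "", "OK" };

    int
    main(void)
    {
        /* The sound driver writes 16 bytes into an 8-byte buffer... */
        memcpy(kernel.sound_buf, "0123456789abcde", 16);

        /* ...and the RAID code, which did nothing wrong, now runs
         * on garbage.  In a kernel this is a panic, or worse,
         * silent data corruption. */
        printf("raid_state = \"%s\"\n", kernel.raid_state);
        return 0;
    }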

When I joined Oracle, I thought just like you.  The attitude there was that
Oracle does not do ANY storage availability at all.  That belongs in the
platform.  If the platform has RAID, then Oracle has more reliable storage.
If not, not.  It assumes that if write() returns success, then the data can
be read back.  This is how it was.  I will be amazed if that changed.
I tend these days to think the same way.  If I can buy the reliability in
a black box, for the cost of a single drive, get a host of other features
with it for free, and (in most real-life cases) also get better
performance, why waste my time?
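
To spell out why that assumption is optimistic (a minimal sketch, not
Oracle's actual code; the file name is made up): a write() that "succeeds"
has only handed the bytes to the buffer cache, and even fsync() only moves
the promise one layer down, to the drive.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        char buf[512] = "redo log record";
        int fd = open("/tmp/redo.log", O_WRONLY | O_CREAT | O_APPEND, 0600);

        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* write() succeeding only means the kernel accepted the
         * bytes into the buffer cache; a crash can still lose them. */
        if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
            perror("write");
            return 1;
        }

        /* Only after fsync() returns 0 has the O/S claimed the data
         * is on the platter -- and even that trusts the drive cache. */
        if (fsync(fd) != 0) {
            perror("fsync");
            return 1;
        }

        close(fd);
        return 0;
    }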

Now, the Unix filesystem is a different matter altogether.  It is totally
volatile, explosive and self-igniting.  FreeBSD has a darn good one, but I
have lost more than one filesystem to crashes.  The best?  ALL the
executables in /usr/local/bin turned into binary data files.  Cool, eh?

 ...

> You've made the point: spend a few dollars.  It's a tradeoff between
> speed and cost.

Nope.  The tradeoff is between cost and reliability.  I will be hard to
convince that a PC, with its 1-2 processors, can run at load 5.0 and above
on user code and still have enough CPU cycles to do all the parity for a
RAID-5 array FASTER than a DPT (Mylex, whoever) that can throw a 160 MHz
i960 at the problem, in a private memory space, do the RAID in hardware,
and do exactly one thing: run the disk.

I do not have to be a scientist to know that computing five things takes
more CPU than computing four.
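
For the curious, here is a minimal sketch (the block size and function
name are mine, purely illustrative) of the XOR loop a host-based RAID-5
driver must run over every full stripe it writes; this is exactly the
work a DPT offloads to its i960:

    #include <stddef.h>
    #include <string.h>

    #define BLOCK_SIZE  4096   /* bytes per block, illustrative */

    /*
     * Compute the parity block over ndata data blocks.  Every one of
     * these XORs is a CPU cycle and a memory access stolen from user
     * code; a DPT-style controller runs the same loop privately.
     */
    void
    raid5_parity(const unsigned char *data[], size_t ndata,
                 unsigned char parity[BLOCK_SIZE])
    {
        size_t d, i;

        memset(parity, 0, BLOCK_SIZE);
        for (d = 0; d < ndata; d++)
            for (i = 0; i < BLOCK_SIZE; i++)
                parity[i] ^= data[d][i];
    }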

On an idle machine, doing nothing else, a P6-200 will run RAID-5 code
almost as fast as an i960.  If you want to have (for bandwidth, and
redundency) more than one SCSI bus, and have, say 20 drives (on a typical
full feed News server, etc.), you will still handle 20 times as many
interrupts as a DPT card will give you.  Sure, the DPT will take all these
20 interrupts, but it will not be serving an interrupt per keyboard
keystroke, an interrupt per 16 bytes on the PPP link, interrupt per packet
on the internet, etc.
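
To put illustrative numbers on it (the rates here are my assumptions, not
measurements): if each of those 20 drives completes 50 transfers a second,
a host doing its own RAID fields 20 x 50 = 1,000 disk interrupts a second,
on top of everything else it is servicing.  Behind a DPT, those 1,000
interrupts land on the i960, and the host sees roughly one interrupt per
logical request.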

Mainframe designers learned this trick in the early 1960s.  An IBM 370 had
about 4 MIPS per CPU, if I remember right, and about 4 MB of RAM, and
supported an OLTP load (airline reservations) with 10,000 terminals and
half as many disks.  The architecture of these things is fascinating; the
amount of intelligence and elegance is incredible.  We have a lot to learn
from them.  They were the original distributed processing systems.

>> BTW, Since 1984 or so, I NEVER lost data due to disk failure.  
> 
> I have.
> 
>> I lost a LOT of data due to O/S failures, and some data due to bugs
>> in RAID logic.
> 
> That, too, but *much* less.  Maybe you've been using different
> operating systems?

No, I have used FreeBSD for the last 18 months.  Seagate claims, in
writing, 1,000,000 hours mean time between failures on their drives.  Can
FreeBSD make that claim?  Can Tandem make that claim about their O/S?  No
operating system written today (or EVER) can honestly make it.  I think
Seagate is full of it with this million-hours nonsense, but 5 years of
continuous operation is reasonable.  I have Fujitsu drives and some
Micropolis drives with that much continuous operation on them.  Zero
failures.
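
Do the arithmetic: 1,000,000 hours divided by 8,760 hours per year is
about 114 years of continuous operation per failure.  The 5 years I am
willing to believe is only about 44,000 hours; that is the size of the gap
between Seagate's marketing and my experience.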

BTW, I know the difference between hardware and software.  In theory,
software does not fail.  In reality it does.  Go to a company like Oracle,
which makes money when their software does NOT demonstrate a bug; they make
almost as much money from support as from sales.  They have a huge army of
talented programmers and they keep excellent records of their support.
Ask them (they will probably say no :-) about the ratio of service loss due
to hardware failure, vs. RDBMS failure, vs. O/S failure (when I was there
we had 58 different ports), vs. operator errors, vs. application failures.
You come out of these presentations very sober.

>> Although I do not believe Seagate's claim of 1,000,000 hours MTBF, I
>> think the realized MTBF will far exceed any FreeBSD uptime.
> 
> Especially not when it comes from Seagate :-)

I think that, left in the original container, in a temperature-controlled
room, non-operational, it may survive for 500,000 hours.  Maybe.  Running
in a badly cooled PC?  5 years, tops.


----------


Sincerely Yours, 

Simon Shapiro
Shimon@Simon-Shapiro.ORG                      Voice:   503.799.2313



