Date: Tue, 03 Mar 1998 21:52:15 -0800 (PST)
From: Simon Shapiro <shimon@simon-shapiro.org>
To: Greg Lehey <grog@lemis.com>
Cc: Wilko Bulte <wilko@yedi.iaf.nl>, sbabkin@dcn.att.com, tlambert@primenet.com, jdn@acp.qiv.com, blkirk@float.eli.net, hackers@FreeBSD.ORG
Subject: Re: SCSI Bus redundancy...
Message-ID: <XFMail.980303215215.shimon@simon-shapiro.org>
In-Reply-To: <19980304143448.27241@freebie.lemis.com>
On 04-Mar-98 Greg Lehey wrote:

...

>> The only problem I have here is the assumption that the O/S will do all
>> that.  Not only does it consume much CPU, I/O bus, memory bandwidth,
>> etc., but O/S crashes are the number one cause of failure in any modern
>> computer.
>
> Sure.
>
>> Putting all this logic there is asking for it to crash frequently,
>> and run under load all the time.
>
> I don't think you can assume that.  The load depends on the input.
> The reliability depends on the quality of the code.

It does not matter at all how good your code is.  What matters is what the
sound driver decided to do, the PPP driver, the X.25 driver, X11, the
serial console, etc.  There is no compartmentalization in Unix.  This means
that any failure in totally unrelated code WILL crash your code.  In a RAID
controller, as long as power is applied and the hardware did not fail, the
firmware has to worry about only one thing: the firmware.  There are fewer
interrupts, fewer processes, a simpler O/S, no filesystem to panic with
"freeing free inode".  Just SCSI devices on one side and (in the case of
DPT, etc.) a PCI interface.

I think the work you do is great and important, but the filesystem and
volume managers are more interesting than the RAID stuff.

When I joined Oracle, I thought just like you.  The attitude there was that
Oracle does not do ANY storage availability at all.  This is in the
platform.  If the platform has RAID, then Oracle will have more reliable
storage.  If not, not.  It assumes that if (write(...) == 0) then the data
can be read.  This is how it was; I will be amazed if that has changed.  I
tend these days to think the same way.  If I can buy the reliability in a
black box, for the cost of a single drive, get a host of other features
with it for free, and (in most real-life cases) also get better
performance, why waste my time?

Now, the Unix filesystem is a different matter altogether.  It is totally
volatile, explosive and self-igniting.
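The "if write(...) succeeded, the data can be read" assumption mentioned
above glosses over durability: on Unix a successful write() normally only
reaches the buffer cache, and an O/S crash before writeback loses the data.
A minimal sketch of the distinction (the file path and helper name here are
purely illustrative):

```python
import os
import tempfile

def durable_write(path, data):
    """Write data and flush it past the buffer cache with fsync()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        written = os.write(fd, data)   # success only means "in the cache"
        os.fsync(fd)                   # push the cached blocks to the device
        return written
    finally:
        os.close(fd)

path = os.path.join(tempfile.gettempdir(), "durable_example.dat")
n = durable_write(path, b"hello")
assert n == 5
```

Even fsync() only gets the data as far as the drive, which may have its own
write cache; the point is simply that a zero-error write() by itself proves
nothing about what survives a crash.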
FreeBSD has a darn good one, but I have lost more than one filesystem to
crashes.  The best?  ALL the executables in /usr/local/bin turned into
binary data files.  Cool, eh?

...

> You've made the point: spend a few dollars.  It's a tradeoff between
> speed and cost.

Nope.  The tradeoff is between cost and reliability.  I will be hard to
convince that a PC, with its 1-2 processors, can run at load 5.0 and above
on user code and still have enough CPU cycles to do all the parity for a
RAID-5 array FASTER than a DPT (Mylex, whoever) that can throw a 160 MHz
i960 at the problem, in a private memory space, do the RAID in hardware,
and do exactly one thing: run the disk.  I do not have to be a scientist to
know that computing five things takes more CPU than computing four.  On an
idle machine, doing nothing else, a P6-200 will run RAID-5 code almost as
fast as an i960.

If you want (for bandwidth and redundancy) more than one SCSI bus, and
have, say, 20 drives (on a typical full-feed news server, etc.), you will
still handle 20 times as many interrupts as a DPT card will give you.
Sure, the DPT will take all these 20 interrupts, but it will not also be
serving an interrupt per keyboard keystroke, an interrupt per 16 bytes on
the PPP link, an interrupt per packet from the Internet, etc.

Mainframe designers learned this trick in the early 1960's.  An IBM 370 had
about 4 MIPS per CPU, if I remember right, and about 4 MB of RAM, and
supported an OLTP load (airline reservations) with 10,000 terminals and
half as many disks.  The architecture of these things is fascinating; the
amount of intelligence and elegance is incredible.  We have a lot to learn
from them.  They were the original distributed processing systems.

>> BTW, since 1984 or so, I NEVER lost data due to disk failure.
>
> I have.
>
>> I lost a LOT of data due to O/S failures, and some data due to bugs
>> in RAID logic.
>
> That, too, but *much* less.  Maybe you've been using different
> operating systems?
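For readers unfamiliar with what "doing the parity" for RAID-5 actually
costs the host CPU: every full-stripe write XORs all the data blocks
together to produce the parity block, and a rebuild XORs the survivors plus
the parity to regenerate a lost block.  A hedged sketch (function names are
illustrative, not from any real driver):

```python
def raid5_parity(data_blocks):
    """XOR all data blocks in a stripe to produce the parity block."""
    parity = bytearray(len(data_blocks[0]))
    for block in data_blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def rebuild_block(surviving_blocks, parity):
    """Recover one lost block: XOR the parity with all surviving blocks."""
    return raid5_parity(list(surviving_blocks) + [parity])

# A four-data-disk stripe: parity lets any single lost block be rebuilt.
stripe = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8]),
          bytes([9, 10, 11, 12]), bytes([13, 14, 15, 16])]
p = raid5_parity(stripe)
lost = stripe[2]
recovered = rebuild_block(stripe[:2] + stripe[3:], p)
assert recovered == lost
```

Software RAID does this byte-by-byte XOR in the host's memory bandwidth and
CPU budget for every stripe written; a controller like the DPT does the
same arithmetic on its own i960 and leaves the host out of it, which is
exactly the tradeoff being argued here.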
No, I have used FreeBSD for the last 18 months.  Seagate claims, in
writing, 1,000,000 hours mean time between failures on their drives.  Can
FreeBSD make that claim?  Can Tandem make this claim about their O/S?  No
operating system written today (or EVER) can honestly make this claim.

I think Seagate is full of it with this million-hours nonsense, but five
years of continuous operation is reasonable.  I have Fujitsu drives and
some Micropolis drives with that much continuous operation on them.  Zero
failures.

BTW, I know the difference between hardware and software.  In theory
software does not fail.  In reality it does.  Go to a company like Oracle,
which makes money when their software does NOT demonstrate a bug; they make
almost as much money from support as from sales.  They have a huge army of
talented programmers and they keep excellent records of their support.  Ask
them (they will probably say no :-) about the ratio of service loss due to
hardware failure vs. RDBMS failure vs. O/S failure (when I was there we had
58 different ports) vs. operator errors vs. application failures.  You come
out of these presentations very sober.

>> Although I do not believe Seagate's claim of 1,000,000 hours MTBF, I
>> think the realized MTBF will far exceed any FreeBSD uptime.
>
> Especially not when it comes from Seagate :-)

I think, left in the original container, in a temperature-controlled room,
non-operational, it may survive for 500,000 hours.  Maybe.  Running in a
badly cooled PC?  Five years, tops.

----------

Sincerely Yours,

Simon Shapiro
Shimon@Simon-Shapiro.ORG                      Voice: 503.799.2313

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
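The million-hour figure is easier to judge with a quick back-of-envelope
calculation.  A sketch, under the common (and rough) assumption of
independent drives with exponentially distributed lifetimes; the 20-drive
count matches the news-server example earlier in the thread:

```python
HOURS_PER_YEAR = 24 * 365  # 8760

def mtbf_years(mtbf_hours):
    """Convert an MTBF quoted in hours to years."""
    return mtbf_hours / HOURS_PER_YEAR

def array_mtbf_hours(drive_mtbf_hours, n_drives):
    # With n independent drives, the expected time to the *first*
    # failure anywhere in the array is the per-drive MTBF divided by n.
    return drive_mtbf_hours / n_drives

single = mtbf_years(1_000_000)                      # Seagate's per-drive claim
array = mtbf_years(array_mtbf_hours(1_000_000, 20)) # 20-drive array
print(f"one drive: {single:.0f} years")             # about 114 years
print(f"first failure in 20 drives: {array:.1f} years")  # about 5.7 years
```

So even taking the vendor's number at face value, a 20-drive array should
expect its first failure in well under six years, which is why redundancy
is worth arguing about at all.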