Date: Sat, 22 Jan 2005 02:13:29 -0800
From: "Ted Mittelstaedt" <tedm@toybox.placo.com>
To: "Stijn Hoop" <stijn@win.tue.nl>
Cc: FreeBSD Questions <freebsd-questions@freebsd.org>
Subject: RE: Hardware RAID
Message-ID: <LOBBIFDAGNMAMLGJJCKNOEBNFAAA.tedm@toybox.placo.com>
In-Reply-To: <20050122090052.GG35557@pcwin002.win.tue.nl>

> -----Original Message-----
> From: Stijn Hoop [mailto:stijn@win.tue.nl]
> Sent: Saturday, January 22, 2005 1:01 AM
> To: Ted Mittelstaedt
> Cc: Sandy Rutherford; FreeBSD Questions
> Subject: Re: Hardware RAID
>
>
> > I explicitly stated vinum is a great
> > thing if what your wanting to do is use a bunch of cheap disks and
> > cheap controller cards to either get a giant partition, or to
> > stripe them together and get faster access.
>
> Yes, but that's what I was refuting in part; I've used it for
> reliability purposes to great effect, as I stated. So IMHO it's also a
> great thing if you need reliability for a lower price.
>

Well that may be so, but RAID reliability is kind of like this: if
there are 10 people running it and 9 of them have no problems and one
of them does, then be very afraid! You might be that 10th person. The
desirable situation with RAID reliability is to have all 10 people with
no problems, and a series of vague rumors that someone heard that a
friend of a friend might have had a problem, then when you bother
chasing it down you find the person was smoking pipeweed.

Another way of saying it is that my kernel crashdump file of a
blown-up vinum install that blew my array - which is online for anyone
to download if they so choose as I post this - is worth 500 of your
testimonials about how reliable vinum is.

>
> It was not my intent to describe vinum as being 'better' than the
> hardware RAID. As I read it, you dismissed software RAID for
> reliability purposes.

I do. From a structural standpoint a lot more things can go wrong
with it.

> I was stating that it can be used for that
> purpose.
>

My crashdump file says RAID isn't a reliable means of getting out of
having to back up your data.

> > I didn't say these things couldn't happen on a hardware array. I
> > said that when these things do happen, it's worse for a software
> > array than a hardware array, and that they happen a lot more on a
> > software array.
>
> In my experience, when bad things happen, it was the same for the
> software RAID arrays as for the hardware RAID arrays.
>

How many hardware arrays vs software arrays do you deal with?

Over the last decade I think I've directly admined about 20-30
different makes and models of hardware array cards in different
servers. I've lost about 3 disks in those. Admittedly not a lot. But
so far I've never had one that lost a disk where replacing the disk
didn't recover the array. Oh sure, with some of them you had to do
some really stupid things like take the server down completely for
half the day to do it. But they all came back.

During this time I've admined exactly 3 servers on software arrays.
One was a news server using ccd which ran for years. The other two are
vinum servers, one of which is going strong; the other blew up due to
a bad SCSI cable which wrote garbage on 2 drives, making the array
unrecoverable.

In my experience, if the reliability was equal, none of the software
arrays should have given trouble and one or two of the hardware ones
should have blown.

Now granted, in my vinum case the SCSI cable is at fault. But the log
clearly shows vinum trying a write to one disk, getting a parity
error, trying a write to another, getting another parity error, then
the server freezing. The problem with vinum in this instance wasn't
the initial parity errors and freezing. In fact, THAT was exactly what
should have happened - shut the works down before you write garbage
over the entire disk. The problem was that after a very simple error
like that, only a few blocks of data on the disks would have been bad,
so the vinum manager should have been able to recover the array to the
point that it could be mounted again, so that fsck could have ripped
out a handful of files and got the disk clean.

Could the same have happened with a hardware array card? Probably.
But I would be betting that the recovery routines in any hardware RAID
could have got the array to the point that a higher level tool like
fsck could have got at least some data off it.

And in any case, regardless of whether you are using software or
hardware arrays, you should be backing up. I didn't with my software
array and data was lost (fortunately not my data, and I don't know if
the people who had data on it were backing their data up; they were
supposed to, but I don't trust anyone on that). So I was stupid. Don't
you or anyone else be stupid - learn from my mistake.

> Regular vinum does have a few warts (notably, online rebuilding is
> b0rked) but other than that it's the same procedure: remove bad drive,
> add new drive, rebuild.
>
> I agree that I've seen more failures with software RAID than hardware
> RAID. And certainly cost is a factor in that. It still comes down to
> cost vs downtime.
>

What? I don't think I understand what you're saying with that
statement. RAID is used for reliability because you cannot be backing
up continuously - for example, you have a database server that is
receiving writes throughout the day; you RAID it because you only back
up once a day, and if you lose a disk you're going to lose that day's
transactions. This isn't a cost vs downtime issue, this is a "do you
want to lose your data or not" issue.

RAID of any kind isn't a substitute for a nightly backup. My mistake
was in treating it as such. But even if I was going to do that, at the
very least I would have had a better chance at some reliability with a
hardware card.

> The only thing I 'objected' to in your post was the fact that you
> dismissed vinum as being useful in reliability situations.

Compared to hardware RAID, it isn't useful. RAID is like backup - it's
not enough just to be able to commit the data to whatever you are
running to provide the reliability - you have to be able to get at
least some of the data back if something goes wrong.
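
To illustrate why garbage on two drives is so much worse than on one,
here is a minimal sketch in Python. It assumes a RAID-5 style layout
with one XOR parity block per stripe; the message does not say which
vinum organization the failed array used, so treat that as an
assumption. The names make_stripe and rebuild are hypothetical and
have nothing to do with vinum's actual code.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks plus one parity block (assumed layout)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, damaged):
    """Rebuild the one damaged block in a stripe from the survivors.

    `damaged` is the set of block indexes whose contents are not trusted.
    """
    if len(damaged) != 1:
        raise ValueError("single parity can rebuild exactly one block per stripe")
    (index,) = damaged
    survivors = [blk for i, blk in enumerate(stripe) if i != index]
    return xor_blocks(survivors)


stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])
assert rebuild(stripe, {1}) == b"BBBB"  # one bad drive: block comes back

try:
    rebuild(stripe, {0, 1})  # two bad drives in the same stripe
except ValueError as err:
    print(err)
```

Under this assumption the parity can only tell you about stripes where
a single block is bad. Stripes that the bad cable never touched are
still intact, which is why a recovery tool could in principle bring
the array up far enough for fsck to throw away only the damaged files.
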

With the vinum development, I think the assumption made was that a
failure situation is always going to involve ONE drive. I think there
was no allowance for a minor failure that affects multiple drives and
needs to be recovered from. Admittedly that is a statistically less
likely scenario than the one drive failure scenario, but obviously, in
retrospect, this assumption reduces reliability. I suppose things
might have been different if it had been developed on ESDI drives, har
har.

But in any case, why is this so important? Even you already said that
running without backups was stupid. vinum does a lot of other things
that I consider a lot more useful anyhow. After all, how many times do
you see a bad disk anyway? Maybe once or twice during a server's
lifetime? Yet disk pack speed is an issue every second of the server's
life. vinum can make slow disks run faster when you stripe them
together; that alone is far more valuable than reliability. If your
disks aren't getting a lot of writes every minute then your data
reliability needs can likely be handled with tape during the nightly
backup. But a tape drive won't make disks run faster, nor will it
allow you to create giant partitions.

Ted