Date:      Fri, 25 Aug 2000 10:42:44 -0400 (EDT)
From:      David Gilbert <dgilbert@velocet.ca>
To:        Greg Lehey <grog@lemis.com>
Cc:        David Gilbert <dgilbert@velocet.ca>, freebsd-scsi@FreeBSD.ORG
Subject:   Re: Vinum 29160 detaches drives, invalidates RAID.
Message-ID:  <14758.34276.167320.197675@trooper.velocet.net>
In-Reply-To: <20000825113638.D39208@wantadilla.lemis.com>
References:  <14757.14569.732766.367692@trooper.velocet.net> <20000825113638.D39208@wantadilla.lemis.com>

>>>>> "Greg" == Greg Lehey <grog@lemis.com> writes:

>> First of all, I'm very pleased with the speed.  The system easily
>> beats the AMI MegaRAID 1500 (same drives) with a whopping 35Mbyte/s
>> in RAID-5 (vs. the 1500's 14Mbyte/s) for read.  (They both score a
>> dead heat of 4Mbyte/s write.)

Greg> Nice to hear :-)

In general, I'm an advocate of the vinum system.  I've been hammering
it for months now on the test RAID-5 system.  Besides this
disconnecting problem, the system is performing very well.

>> Now... if I reboot, and "vinum setstate up" all these drives,

Greg> They all go down, do they?

Not all... Sometimes 2, sometimes 4.  I should have said that I
setstate up only the drives that are down.

>> fsck completes without any complaint.  I then generally have to
>> "vinum rebuild parity" ... but I suppose that I'd expect that.

Greg> Hmm.  rebuildparity is a dangerous command.  Basically, a parity
Greg> error means that *one* (or more) of the drives has incorrect
Greg> data.  rebuildparity simply assumes that the error is in the
Greg> data block and "corrects" it.  It's a serious problem, one that
Greg> is very difficult to solve.

Well... at the point of failure, we're doing the nightly finds on the
disk.  I do the fsck (usually) before I do the rebuildparity.  I
suspect that the only information being written to the disk at this
point is the access time updates.  I would expect, then, that corrupt
data is likely limited to an update of this nature.
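
To make the parity point concrete, here is a minimal sketch in Python.
It is not vinum code: the helper names (xor_blocks, parity_ok,
rebuild_parity) and the two-byte blocks are assumptions made up for
illustration.  It shows only the general RAID-5 property that parity
is the XOR of the data blocks, that a mismatch says the stripe is
inconsistent without saying which block is wrong, and that
recomputing parity from the data makes the stripe consistent again
whether or not the data was the good copy.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_ok(data_blocks, parity):
    """True if the stored parity matches the XOR of the data blocks."""
    return xor_blocks(data_blocks) == parity


def rebuild_parity(data_blocks):
    """Recompute parity, trusting the data blocks exactly as they are."""
    return xor_blocks(data_blocks)


# Hypothetical stripe: three data blocks plus one parity block.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity = xor_blocks(data)

# Case 1: a data block is silently corrupted.
bad_data = [data[0], b"\x11\x20", data[2]]
data_case_mismatch = not parity_ok(bad_data, parity)

# Case 2: the parity block is corrupted instead.
bad_parity = b"\x00\x00"
parity_case_mismatch = not parity_ok(data, bad_parity)

# Both cases look the same to a checker: "parity does not match".
# Rebuilding in case 1 makes the stripe consistent but keeps the bad data.
rebuilt = rebuild_parity(bad_data)
consistent_but_wrong = parity_ok(bad_data, rebuilt) and bad_data != data
```

Both mismatch flags end up true and look identical to a checker, and
in the first case the rebuilt stripe passes the check while still
holding the bad data.  That is why running fsck first only helps as
far as fsck can see the damage.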

>> The problem I'm having here (and I've had it before) is that the
>> FreeBSD SCSI system seems to "give up" under conditions that others
>> would keep retrying or resetting/retrying.

>> It seems really, really, really important to me that we try harder
>> to get a drive back online.  This seems as if it could affect the
>> long-term viability of a vinum-based raid server... not because
>> vinum is bad, but because the SCSI subsystem is too fragile.

Greg> Hmm.  I can't really comment on that, but it would be nice if
Greg> the SCSI system could recover from these problems.

I think this is a critical thing.  I can accept that it may be hard to
discern whether the device has been yanked from the bus or has gone
into some other bad state, but that is definitely not the case here.
The FreeBSD SCSI subsystem as it stands is very fragile.  I realize
that cabling must be 100% for many different reasons;

... But by the same token, we need things to keep retrying and
resetting far longer before losing all hope.
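
As a rough illustration of the kind of policy I mean, here is a
sketch in Python.  It is not FreeBSD SCSI code and reflects nothing
about how the kernel is actually structured: retry_with_reset, the
issue and reset callbacks, and all the default numbers are
hypothetical.  The idea is only to keep retrying with growing delays
and to reset periodically before declaring the device gone.

```python
import time


def retry_with_reset(issue, reset, max_attempts=8, reset_every=3,
                     base_delay=0.1, sleep=time.sleep):
    """Call issue() until it succeeds or max_attempts is reached.

    issue -- hypothetical callable, returns True when the command succeeds
    reset -- hypothetical callable that resets the device or bus
    Returns the attempt number that succeeded, or None on giving up.
    """
    for attempt in range(1, max_attempts + 1):
        if issue():
            return attempt
        if attempt == max_attempts:
            break  # out of attempts: only now do we give up
        if attempt % reset_every == 0:
            reset()  # escalate: plain retries have not helped
        sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    return None


def make_flaky(failures):
    """Hypothetical device stand-in that fails a fixed number of times."""
    state = {"left": failures}

    def issue():
        if state["left"] > 0:
            state["left"] -= 1
            return False
        return True

    return issue
```

With a stand-in that fails four times, retry_with_reset(make_flaky(4),
reset) resets once after the third failure and succeeds on the fifth
attempt rather than detaching the drive at the first error.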

Dave.

-- 
============================================================================
|David Gilbert, Velocet Communications.       | Two things can only be     |
|Mail:       dgilbert@velocet.net             |  equal if and only if they |
|http://www.velocet.net/~dgilbert             |   are precisely opposite.  |
=========================================================GLO================

