Date:      Sun, 01 Jun 2003 11:36:02 -0600
From:      Scott Long <scott_long@btc.adaptec.com>
To:        "Marc G. Fournier" <scrappy@hub.org>
Cc:        freebsd-scsi@freebsd.org
Subject:   Re: Critical bug in Adaptec(aac) driver ...
Message-ID:  <3EDA3982.5040202@btc.adaptec.com>
In-Reply-To: <20030601131404.P6572@hub.org>
References:  <20030601131404.P6572@hub.org>

Marc G. Fournier wrote:
> As those on this list will have seen over the past few months, I have a
> server that had (past tense) an Adaptec 2120s controller in her that was
> giving a lot of grief ... about 3 weeks ago, the server it was in *really*
> blew up ... one drive was reported as down (in a RAID5 array), and when we
> tried to bring it back up, a second drive started to "fail" ... I got the
> techs to shut her down, and literally rushed to the remote location to see
> if there was anything that I could do to at least recover the data ...
> 
> When I got there to bring it back up, the server reported that a 3rd drive
> had failed ... and within a few hours, a 4th drive failed ... the result
> being that we lost all of the data on that server, which turned out to be
> quite painful to recover ...
> 
> While down there, we replaced the Adaptec controller with an Intel one,
> reformatted the exact same drives, in the exact same chassis, and she's
> been running fine since ...
> 
> On my trip back, I had a chat with a friend that does development work in
> the Linux world, and who had had that server previous to myself, and
> apparently there is a "known bug" in Linux that he says sounds exactly
> like what I experienced (they hit it right in the middle of developing on
> that box) and that there are apparently two Linux kernel patches that they
> had to apply (after rebuilding from scratch) to correct the problem ...
> 
> The way he explained the problem to me, he made it sound like the kernel
> driver was interacting with the BIOS and causing some corruption ... not
> sure at what level, but since trying to swap in a new controller didn't
> restore things, I'm suspecting at the hard drive level ... ?
> 
> Scott, while down there, I tried just about everything I could think to
> ... we replaced the SCSI cable, put the drives/controller into a second
> identical chassis, swapped the host controller cards themselves (I had brought
> spares) ... and that server, as I mentioned, is currently running quite
> happily with an Intel host controller in it :(  So, unless the same
> "failure" was hitting two host controllers, hardware failure doesn't seem
> to have been the cause ...
> 

I understand your frustration and wish there was more I could do to 
help.  Please send me whatever information that you have.

Scott
