Date: Sun, 01 Jun 2003 11:36:02 -0600
From: Scott Long <scott_long@btc.adaptec.com>
To: "Marc G. Fournier" <scrappy@hub.org>
Cc: freebsd-scsi@freebsd.org
Subject: Re: Critical bug in Adaptec(aac) driver ...
Message-ID: <3EDA3982.5040202@btc.adaptec.com>
In-Reply-To: <20030601131404.P6572@hub.org>
References: <20030601131404.P6572@hub.org>

Marc G. Fournier wrote:
> As those on this list will have seen over the past few months, I have a
> server that had (past tense) an Adaptec 2120s controller in her that was
> giving a lot of grief ... about 3 weeks ago, the server it was in *really*
> blew up ... one drive was reported as down (in a RAID5 array), and when we
> tried to bring it back up, a second drive started to "fail" ... I got the
> techs to shut her down, and literally rushed to the remote location to see
> if there was anything that I could do to at least recover the data ...
>
> When I got there to bring it back up, the server reported that a 3rd drive
> had failed ... and within a few hours, a 4th drive failed ... the result
> being that we lost all of the data on that server, which turned out to be
> quite painful to recover ...
>
> While down there, we replaced the Adaptec controller with an Intel one,
> reformatted the exact same drives, in the exact same chassis, and she's
> been running fine since ...
>
> On my trip back, I had a chat with a friend that does development work in
> the Linux world, and who had had that server previous to myself, and
> apparently there is a "known bug" in Linux that he says sounds exactly
> like what I experienced (they hit it right in the middle of developing on
> that box) and that there are apparently two Linux kernel patches that they
> had to apply (after rebuilding from scratch) to correct the problem ...
>
> The way he explained the problem to me, he made it sound like the kernel
> driver was interacting with the BIOS and causing some corruption ... not
> sure at what level, but since trying to swap in a new controller didn't
> restore things, I'm suspecting at the hard drive level ... ?
>
> Scott, while down there, I tried just about everything I could think of
> ... we replaced the SCSI cable, put the drives/controller into a second
> identical chassis, swapped the host controller cards themselves (I had
> brought spares) ... and that server, as I mentioned, is currently running
> quite happily with an Intel host controller in it :( So, unless the same
> "failure" was hitting two host controllers, hardware failure doesn't seem
> to have been the cause ...
>

I understand your frustration and wish there was more I could do to help.
Please send me whatever information you have.

Scott