From owner-freebsd-stable Sat Aug 7 5:53:44 1999
Delivered-To: freebsd-stable@freebsd.org
Received: from corwin.nall.com (corwin.nall.com [216.30.44.163]) by hub.freebsd.org (Postfix) with ESMTP id 4724214DC1 for ; Sat, 7 Aug 1999 05:53:41 -0700 (PDT) (envelope-from joe@nall.com)
Received: from nall.com (localhost [127.0.0.1]) by corwin.nall.com with ESMTP (8.7.1/8.7.1) id HAA08252; Sat, 7 Aug 1999 07:52:03 -0500 (CDT)
Message-ID: <37AC2BF2.C4C60F1E@nall.com>
Date: Sat, 07 Aug 1999 07:52:02 -0500
From: Joe Nall
Organization: Nall Design Works
X-Mailer: Mozilla 4.6 [en] (X11; I; HP-UX B.10.26 9000/770)
X-Accept-Language: en
MIME-Version: 1.0
To: lweb Lightningweb
Cc: freebsd-stable@FreeBSD.ORG
Subject: Re: continued crashes with 3.1-Stable
References: <19990807033241.17071.qmail@hotmail.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

lweb Lightningweb wrote:
>
> One suggestion was to fix the "pthreads library," which we did. The other
> was: "You may have hardware problems."
> ...
> We have replaced drives in the RAID array, we are now replacing drive
> caddies. Next step I think will be the RAID controller. I have a strong
> gut feeling that it is software however. There's nothing to substantiate
> this, except that more often than not, the crash happens during a
> MySQL query.
> ...
> (da0:dpt0:0:0:0): Invalidating pack
> biodone: buffer already done
> spec_getpages: I/O read failure: (error code=6)
> size: 32768, resid: 32768, a_count: 32768, valid: 0x0
> nread: 0, reqpage: 0, pindex: 0, pcount: 8
>
> Everyone please take a second look at this and help us brainstorm the
> problem?
> I am including a list of the hardware, the original message we
> sent to the list, and a recent dmesg:
>
> FreeBSD 3.1-STABLE #1
> Dual-Proc PII 450
> 512MB RAM
> DPT PM334UW RAID controller
> - 16MB RAM
> - dual bus Ultra Wide
> - Six 9.1GB Quantum VikingII SCSI3 U2W drives
> - Three drives per bus, RAID5, one drive is hot-spare
> Intel EtherExpress Pro 10/100B Ethernet
> TOSHIBA CD-ROM XM-6201TA

Don't discount the hardware problem response. We use big (200GB+)
Winchester Systems RAID arrays on production HP-UX servers at work.
These boxes have a custom, modified OS that we were blaming for random,
very painful crashes with occasional data corruption on a JFS
filesystem. One of the symptoms of these crashes was the lack of crash
dumps. Two weeks ago we found out that the firmware installed in the
dual redundant controllers (nothing but the best :) had known problems
with symptoms similar to ours and that we should upgrade. The lack of
crash dumps should have been an earlier clue that there were disk
problems.

The "Invalidating pack" error comes from the SCSI CAM driver in
"src/sys/cam/scsi/scsi_da.c" and occurs when there has been a
catastrophic error (quoting from the code). The error returned from the
driver is ENXIO. It appears that your DPT is dropping a SCSI LUN off
line.

So far my FreeBSD servers have been exactly as reliable as my hardware.

Good Luck,
Joe

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message
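
For reference, the "error code=6" in the quoted spec_getpages line can be
checked against the ENXIO mentioned above. This is only a small sketch,
assuming a Python interpreter on a BSD- or Linux-style system where ENXIO
is numbered 6; the variable names are illustrative and not taken from the
kernel source.

```python
import errno
import os

# Error code copied from the quoted log line:
#   spec_getpages: I/O read failure: (error code=6)
code = 6

# errno.errorcode maps numeric values to symbolic names on this platform.
name = errno.errorcode.get(code, "unknown")

# The descriptive text from os.strerror() varies by operating system.
print(f"errno {code} = {name}: {os.strerror(code)}")
```

If the printed name is ENXIO, the I/O read failure carries the same error
the da driver returns after invalidating the pack.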