From owner-freebsd-stable Fri Aug 6 21:20:33 1999
Delivered-To: freebsd-stable@freebsd.org
Received: from freya.circle.net (morrigu.circle.net [209.95.64.11])
	by hub.freebsd.org (Postfix) with ESMTP id BAFCC14CB9
	for ; Fri, 6 Aug 1999 21:20:30 -0700 (PDT)
	(envelope-from tcobb@staff.circle.net)
Received: by FREYA with Internet Mail Service (5.5.2448.0)
	id ; Sat, 7 Aug 1999 00:15:39 -0400
Message-ID: <307D63ED6749CF11AAE9005004461A5B3FB8@FREYA>
From: tcobb@staff.circle.net
To: lightningweb@hotmail.com, freebsd-stable@FreeBSD.ORG
Cc: greg@lightningweb.com, jeremy@lightningweb.com, keith@lightningweb.com, criter@lightningweb.com
Subject: RE: continued crashes with 3.1-Stable
Date: Sat, 7 Aug 1999 00:15:38 -0400
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2448.0)
Content-Type: text/plain; charset="windows-1252"
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

I think the problem is SMP. I've been having frequent freezes with SMP
under heavy web server load with 3.2-R and 3.2-S. Unfortunately, I'm led
to believe that FreeBSD SMP is just not ready for prime time. Too bad
about the $$ we blew on a dual PIII-550 box.

-Troy Cobb
Circle Net, Inc.
http://www.circle.net

> -----Original Message-----
> From: lweb Lightningweb [mailto:lightningweb@hotmail.com]
> Sent: Friday, August 06, 1999 11:33 PM
> To: freebsd-stable@FreeBSD.ORG
> Cc: greg@lightningweb.com; jeremy@lightningweb.com;
> keith@lightningweb.com; criter@lightningweb.com
> Subject: continued crashes with 3.1-Stable
>
>
> We have not resolved our problem with frequent freezes of
> our web server. We had two responses to our first mail to
> this list, but neither one was the solution. The problem is
> that the server stops responding to ANYTHING except pings:
> no telnet, no ssh, no web, no ftp, nothing. Open telnet
> sessions don't drop; there's just no response to keyboard
> activity.
>
> One suggestion was to fix the "pthreads library," which we
> did.
> The other was: "You may have hardware problems."
>
> This server is going down more frequently: three times so
> far today. There is no apparent pattern to the crashes. They
> seem to happen most often during a MySQL database query, but
> it has happened many times without any queries (a few times
> just by running "pine" with a large mailbox file).
>
> We cannot recreate the crash on demand; it just crashes at
> random times. We've tried hammering it with web traffic,
> database queries, and benchmarking programs that slam the
> RAID array, memory, and processors, but it chugs right along.
>
> We have replaced drives in the RAID array, and we are now
> replacing drive caddies. The next step, I think, will be the
> RAID controller. I have a strong gut feeling that it is
> software, however. There's nothing to substantiate this,
> except that more often than not, the crash happens during a
> MySQL query.
>
> Some (NOT ALL) of the suspect errors that we've recorded
> from the console during a crash are:
>
> (da0:dpt0:0:0:0): Invalidating pack
> biodone: buffer already done
> spec_getpages: I/O read failure: (error code=6)
> size: 32768, resid: 32768, a_count: 32768, valid: 0x0
> nread: 0, reqpage: 0, pindex: 0, pcount: 8
>
>
> Could everyone please take a second look at this and help us
> brainstorm the problem? I am including a list of the
> hardware, the original message we sent to the list, and a
> recent dmesg:
>
> FreeBSD 3.1-STABLE #1
> Dual-Proc PII 450
> 512MB RAM
> DPT PM334UW RAID controller
> - 16MB RAM
> - dual bus Ultra Wide
> - Six 9.1GB Quantum VikingII SCSI3 U2W drives
> - Three drives per bus, RAID5, one drive is hot-spare
> Intel EtherExpress Pro 10/100B Ethernet
> TOSHIBA CD-ROM XM-6201TA
>
>
> --------------
> I've recently had the job of system administration dumped
> in my lap. I'm looking forward to getting on top of it, but
> I'm a little behind the 8-ball right now.
> If my subject matter strays too far from the allowed context
> of this list, please don't flame me too badly.
>
> Background: We are running a dual PII 450 system with a
> 45 GB RAID array, controlled by a DPT PM334.
>
> The O/S: FreeBSD 3.1-STABLE #1
>
> For several months this has been rock solid. However, in
> the past three weeks, we've had a number of crashes, most of
> which seem to be related to MySQL queries. The system would
> be totally unresponsive to ssh/telnet and web, but would
> still return pings.
>
> The server is colocated at our ISP, so it's been tricky to
> track down the exact on-screen console errors. Today, shortly
> after we upgraded our MySQL version, I did see the error:
>
>
> (da0:dpt0:0:0:0): Invalidating Pack
> (da0:dpt0:0:0:0): Invalidating Pack
> devstat_end_transaction: HELP!! busy_count for da0 is < 0 (-1)!
> biodone: buffer already done
> (da0:dpt0:0:0:0): Read (10). CDB: 28 0 3 87 33 1f 0 0 80 0
> (da0:dpt0:0:0:0): ILLEGAL REQUEST asc:20,0
> (da0:dpt0:0:0:0): Invalid command operation code
> devstat_end_transaction: HELP!! busy_count for da0 is < 0 (-1)!
> biodone: buffer already done
>
>
> This was followed by a complete system freeze, including the
> console.
>
> Some hunting and searching has led us to believe that we
> are encountering a driver failure and that we should bring
> the OS back to -stable.
>
> As I said, I haven't done this before, so I'm a little
> anxious. Before I take that step, I would be very grateful
> to hear some input from those who surely know more about
> this than I do.
>
> Is bringing the system back to -stable likely to correct our
> problem? Am I missing some indicator in the error above? Has
> someone else encountered similar trouble (and found a fix)?
>
> I'll be happy to take replies in private e-mail if this is
> off topic.
>
> Any help would be great.
>
> Thanks,
> Jeremy
>
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-stable" in the body of the message