From owner-freebsd-stable Sat Aug 7 13:23:21 1999
Delivered-To: freebsd-stable@freebsd.org
Received: from freya.circle.net (morrigu.circle.net [209.95.64.11])
    by hub.freebsd.org (Postfix) with ESMTP id 697AC14CEE
    for ; Sat, 7 Aug 1999 13:23:16 -0700 (PDT)
    (envelope-from tcobb@staff.circle.net)
Received: by FREYA with Internet Mail Service (5.5.2448.0)
    id ; Sat, 7 Aug 1999 16:18:09 -0400
Message-ID: <307D63ED6749CF11AAE9005004461A5B3FC4@FREYA>
From: tcobb@staff.circle.net
To: seth@freebie.dp.ny.frb.org
Cc: lightningweb@hotmail.com, freebsd-stable@FreeBSD.ORG,
    greg@lightningweb.com, jeremy@lightningweb.com,
    keith@lightningweb.com, criter@lightningweb.com
Subject: RE: continued crashes with 3.1-Stable
Date: Sat, 7 Aug 1999 16:18:07 -0400
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2448.0)
Content-Type: text/plain; charset="windows-1252"
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

The machine I'm having trouble with is a dual-PIII 500
running FreeBSD 3.2-19990730-STABLE.

-Troy Cobb
 Circle Net, Inc.
 http://www.circle.net

> -----Original Message-----
> From: Seth [mailto:seth@freebie.dp.ny.frb.org]
> Sent: Saturday, August 07, 1999 2:35 PM
> To: tcobb@staff.circle.net
> Cc: lightningweb@hotmail.com; freebsd-stable@FreeBSD.ORG;
> greg@lightningweb.com; jeremy@lightningweb.com;
> keith@lightningweb.com; criter@lightningweb.com
> Subject: RE: continued crashes with 3.1-Stable
>
>
> Have to disagree. 3.1-R had some problems. 3.2-S works flawlessly, even
> under heavy load (big disk I/O and >24 system load averages). This is a
> dual Xeon 450, SCSI disks off an Adaptec 7890, 384 megs RAM.
>
> Some things you didn't mention -- what was the date of your last build,
> and why haven't you gone to 3.2-S yet?
>
> SB
>
> On Sat, 7 Aug 1999 tcobb@staff.circle.net wrote:
>
> > I think the problem is the SMP. I've been having frequent
> > freezes with SMP under heavy webserver load with 3.2-R
> > and 3.2-S. I'm unfortunately led to believe that FreeBSD
> > SMP is just not ready for primetime. Too bad the $$ we blew
> > on a dual PIII-550 box.
> >
> > -Troy Cobb
> > Circle Net, Inc.
> > http://www.circle.net
> >
> > > -----Original Message-----
> > > From: lweb Lightningweb [mailto:lightningweb@hotmail.com]
> > > Sent: Friday, August 06, 1999 11:33 PM
> > > To: freebsd-stable@FreeBSD.ORG
> > > Cc: greg@lightningweb.com; jeremy@lightningweb.com;
> > > keith@lightningweb.com; criter@lightningweb.com
> > > Subject: continued crashes with 3.1-Stable
> > >
> > >
> > > We have not resolved our problem with frequent freezes with our web
> > > server. We had two responses to our first mail to this list, but
> > > neither one was the solution. The problem is that the server will stop
> > > responding to ANYTHING except pings. No telnet, no ssh, no web, no
> > > ftp, nothing. Open telnet sessions don't drop, there's just no
> > > response to keyboard activity.
> > >
> > > One suggestion was to fix the "pthreads library," which we did. The
> > > other was: "You may have hardware problems."
> > >
> > > This server is going down more frequently. Three times so far today.
> > > There is no apparent pattern to the crashes. They seem to happen most
> > > often during a MySQL database query, but it's happened many times
> > > without any queries (a few times just by running "pine" with a large
> > > mailbox file).
> > >
> > > We cannot recreate the crash when we want, it just crashes at random
> > > times. We've tried hammering it with web, database queries, and
> > > benchmarking programs that slam the RAID array and memory and
> > > processors, but it chugs right along.
> > >
> > > We have replaced drives in the RAID array, we are now replacing drive
> > > caddies. Next step I think will be the RAID controller. I have a
> > > strong gut feeling that it is software however. There's nothing to
> > > substantiate this, except that more often than not, the crash happens
> > > during a MySQL query.
> > >
> > > Some (NOT ALL) of the suspect errors that we've recorded from the
> > > console during a crash are:
> > >
> > > (da0:dpt0:0:0:0): Invalidating pack
> > > biodone: buffer already done
> > > spec_getpages: I/O read failure: (error code=6)
> > > size: 32768, resid: 32768, a_count: 32768, valid: 0x0
> > > nread: 0, reqpage: 0, pindex: 0, pcount: 8
> > >
> > >
> > > Would everyone please take a second look at this and help us
> > > brainstorm the problem? I am including a list of the hardware, the
> > > original message we sent to the list, and a recent dmesg:
> > >
> > > FreeBSD 3.1-STABLE #1
> > > Dual-Proc PII 450
> > > 512MB RAM
> > > DPT PM334UW RAID controller
> > > - 16MB RAM
> > > - dual bus Ultra Wide
> > > - Six 9.1GB Quantum VikingII SCSI3 U2W drives
> > > - Three drives per bus, RAID5, one drive is hot-spare
> > > Intel EtherExpress Pro 10/100B Ethernet
> > > TOSHIBA CD-ROM XM-6201TA
> > >
> > >
> > > --------------
> > > I've recently had the job of system administration dumped in my lap.
> > > I'm looking forward to getting on top of it, but I'm a little behind
> > > the 8-ball right now. If my subject matter varies too far from the
> > > allowed context of this list, please don't flame me too badly.
> > >
> > > Background: We are running a dual PII 450 system with a 45 gig RAID
> > > array, controlled by a DPT PM334.
> > >
> > > The O/S: FreeBSD 3.1-STABLE #1
> > >
> > > For several months this has been rock solid. However, in the past
> > > three weeks, we've had a number of crashes, most of which seem to be
> > > related to MySQL queries. The system would be totally unresponsive to
> > > ssh/telnet and web, but would still return pings.
> > >
> > > The server is colocated at our ISP, so it's been tricky to track down
> > > the exact 'on screen' console errors. Today, shortly after we upgraded
> > > our MySQL version, I did see the error.
> > >
> > >
> > > (da0:dpt0:0:0:0): Invalidating Pack
> > > (da0:dpt0:0:0:0): Invalidating Pack
> > > devstat_end_transaction: HELP!! busy_count for da0 is < 0 (-1)!
> > > biodone: buffer already done
> > > (da0:dpt0:0:0:0): Read (10). CDB: 28 0 3 87 33 1f 0 0 80 0
> > > (da0:dpt0:0:0:0): ILLEGAL REQUEST asc:20,0
> > > (da0:dpt0:0:0:0): Invalid command operation code
> > > devstat_end_transaction: HELP!! busy_count for da0 is < 0 (-1)!
> > > biodone: buffer already done
> > >
> > >
> > > Followed by a complete system freeze, including the console.
> > >
> > > Some hunting and searching has led us to believe that we are
> > > encountering a driver failure and that we should bring the OS back
> > > to -stable.
> > >
> > > As I said, I haven't done this before, so I'm a little anxious. Before
> > > I take that step, I would be very grateful to hear some input from
> > > those who surely know more about this than I do.
> > >
> > > Is bringing the system back to -stable likely to correct our problem?
> > > Am I missing some indicator in the error above? Has someone else
> > > encountered similar trouble (and found a fix)?
> > >
> > > I'll be happy to take replies in private e-mail if this is off topic.
> > >
> > > Any help would be great.
> > >
> > > Thanks,
> > > Jeremy
> > >
> > > To Unsubscribe: send mail to majordomo@FreeBSD.org
> > > with "unsubscribe freebsd-stable" in the body of the message