From owner-freebsd-stable Sat Aug 7 11:36:15 1999
Delivered-To: freebsd-stable@freebsd.org
Received: from p1fed3.frb.org (p1fed3.frb.org [151.198.183.196])
    by hub.freebsd.org (Postfix) with ESMTP id 8E9CF14EA8
    for ; Sat, 7 Aug 1999 11:36:11 -0700 (PDT)
    (envelope-from seth@freebie.dp.ny.frb.org)
Received: by p1fed3.frb.org; id OAA01335; Sat, 7 Aug 1999 14:35:13 -0400 (EDT)
Received: from p1pmdf.p1fw.frb.org(192.168.11.8) by p1fed3.frb.org via smap (3.2)
    id xma001326; Sat, 7 Aug 99 14:34:52 -0400
Date: Sat, 07 Aug 1999 14:34:49 -0400 (EDT)
From: Seth
Subject: RE: continued crashes with 3.1-Stable
In-reply-to: <307D63ED6749CF11AAE9005004461A5B3FB8@FREYA>
To: tcobb@staff.circle.net
Cc: lightningweb@hotmail.com, freebsd-stable@FreeBSD.ORG,
    greg@lightningweb.com, jeremy@lightningweb.com,
    keith@lightningweb.com, criter@lightningweb.com
Message-id:
MIME-version: 1.0
Content-type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Have to disagree. 3.1-R had some problems. 3.2-S works flawlessly, even
under heavy load (big disk I/O and >24 system load averages). This is a
dual Xeon 450, scsi disks off an adaptec 7890, 384 megs RAM.

Some things you didn't mention -- what was the date of your last build,
and why haven't you gone to 3.2-S yet?

SB

On Sat, 7 Aug 1999 tcobb@staff.circle.net wrote:

> I think the problem is the SMP. I've been having frequent
> freezes with SMP under heavy webserver load with 3.2-R,
> and 3.2-S. I'm unfortunately led to believe that FreeBSD
> SMP is just not ready for primetime. Too bad the $$ we blew
> on a dual PIII-550 box.
>
> -Troy Cobb
> Circle Net, Inc.
> http://www.circle.net
>
> > -----Original Message-----
> > From: lweb Lightningweb [mailto:lightningweb@hotmail.com]
> > Sent: Friday, August 06, 1999 11:33 PM
> > To: freebsd-stable@FreeBSD.ORG
> > Cc: greg@lightningweb.com; jeremy@lightningweb.com;
> > keith@lightningweb.com; criter@lightningweb.com
> > Subject: continued crashes with 3.1-Stable
> >
> >
> > We have not resolved our problem with frequent freezes with our web
> > server. We had two responses to our first mail to this list, but
> > neither one was the solution. The problem is that the server will
> > stop responding to ANYTHING except pings. No telnet, no ssh, no web,
> > no ftp, nothing. Open telnet sessions don't drop; there's just no
> > response to keyboard activity.
> >
> > One suggestion was to fix the "pthreads library," which we did. The
> > other was: "You may have hardware problems."
> >
> > This server is going down more frequently. Three times so far today.
> > There is no apparent pattern to the crashes. They seem to happen most
> > often during a MySQL database query, but it's happened many times
> > without any queries (a few times just by running "pine" with a large
> > mailbox file).
> >
> > We cannot recreate the crash when we want; it just crashes at random
> > times. We've tried hammering it with web, database queries, and
> > benchmarking programs that slam the RAID array and memory and
> > processors, but it chugs right along.
> >
> > We have replaced drives in the RAID array, and we are now replacing
> > drive caddies. Next step I think will be the RAID controller. I have
> > a strong gut feeling that it is software, however. There's nothing to
> > substantiate this, except that more often than not, the crash happens
> > during a MySQL query.
> >
> > Some (NOT ALL) of the suspect errors that we've recorded from the
> > console during a crash are:
> >
> > (da0:dpt0:0:0:0): Invalidating pack
> > biodone: buffer already done
> > spec_getpages: I/O read failure: (error code=6)
> > size: 32768, resid: 32768, a_count: 32768, valid: 0x0
> > nread: 0, reqpage: 0, pindex: 0, pcount: 8
> >
> >
> > Everyone please take a second look at this and help us brainstorm the
> > problem? I am including a list of the hardware, the original message
> > we sent to the list, and a recent dmesg:
> >
> > FreeBSD 3.1-STABLE #1
> > Dual-Proc PII 450
> > 512MB RAM
> > DPT PM334UW RAID controller
> >   - 16MB RAM
> >   - dual bus Ultra Wide
> >   - Six 9.1GB Quantum VikingII SCSI3 U2W drives
> >   - Three drives per bus, RAID5, one drive is hot-spare
> > Intel EtherExpress Pro 10/100B Ethernet
> > TOSHIBA CD-ROM XM-6201TA
> >
> >
> > --------------
> > I've recently had the job of system administration dumped in my lap.
> > I'm looking forward to getting on top of it, but I'm a little behind
> > the 8-ball right now. If my subject matter varies too far from the
> > allowed context of this list, please don't flame me too badly.
> >
> > Background: We are running a dual PII 450 system with a 45 gig raid
> > array, controlled by a DPT PM334.
> >
> > The O/S: FreeBSD 3.1-STABLE #1
> >
> > For several months this has been rock solid. However, in the past
> > three weeks, we've had a number of crashes, most of which seem to be
> > related to mysql queries. The system would be totally unresponsive to
> > ssh/telnet and web, but would still return pings.
> >
> > The server is colocated at our ISP, so it's been tricky to track down
> > the exact 'on screen' console errors. Today, shortly after we
> > upgraded our mysql version, I did see the error.
> >
> >
> > (da0:dpt0:0:0:0): Invalidating Pack
> > (da0:dpt0:0:0:0): Invalidating Pack
> > devstat_end_transaction: HELP!! busy_count for da0 is < 0 (-1)!
> > biodone: buffer already done
> > (da0:dpt0:0:0:0): Read (10). CDB: 28 0 3 87 33 1f 0 0 80 0
> > (da0:dpt0:0:0:0): ILLEGAL REQUEST asc:20,0
> > (da0:dpt0:0:0:0): Invalid command operation code
> > devstat_end_transaction: HELP!! busy_count for da0 is < 0 (-1)!
> > biodone: buffer already done
> >
> >
> > Followed by a complete system freeze, including the console.
> >
> > Some hunting and searching has led us to believe that we are
> > encountering a driver failure and that we should bring the OS back to
> > -stable.
> >
> > As I said, I haven't done this before, so I'm a little anxious.
> > Before I take that step, I would be very grateful to hear some input
> > from those who surely know more about this than I do.
> >
> > Is bringing the system back to -stable likely to correct our problem?
> > Am I missing some indicator in the error above? Has someone else
> > encountered similar trouble (and found a fix)?
> >
> > I'll be happy to take replies in private e-mail if this is off topic.
> >
> > Any help would be great.
> >
> > Thanks,
> > Jeremy

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message