From owner-freebsd-stable@FreeBSD.ORG Tue Mar 9 12:59:27 2004
Message-Id: <200403092059.i29KxQPa001706@drtboi.rdsl.lmi.net>
From: Todd Meister
To: "FreeBSD Stable"
In-reply-to: Your message of "Mon, 01 Mar 2004 08:59:16 PST." <200403010859.16585.dsilver@urchin.com>
Date: Tue, 09 Mar 2004 12:59:26 -0800
Subject: Re: Unexplained reboots with 4.9
Reply-To: todd@lmi.net

Doug Silver writes:
>I recently brought up an old 700MHz P3 4.9 machine and added a 3ware IDE raid
>card 7506-8 with 4 120Gb drives in a raid 5 array. It will randomly reboot,
>about once-per-day and even though I'm running a debug kernel, it does not
>leave any crash information (which I assume just means that the kernel did
>not panic and dump core). After the first few times, I upgraded to a new
>400W power supply. The machine is not heavily loaded and its primary
>function is for NFS/samba sharing.

This is an old thread (I'm playing catch-up with this list), but I just had a similar problem.

We had a new 2U P4 2.66GHz machine, all-SCSI, with a gigabyte of RAM and an Adaptec 2100S RAID controller (using the asr driver) doing RAID 5 across four drives, plus one spare. This was our new mail server, and it got about 150k to 200k connections a day. We run MIMEDefang with ClamAV, and a lot of our users use SpamAssassin. So it was used pretty thoroughly, though it rarely hit a load greater than .6, and swap was nearly unused.

The first four days it ran, it was fine. Then it spontaneously rebooted one morning. Two days later, it did the same thing. Within a week, it would barely stay up a full 24 hours (we were going through a lot of troubleshooting during this time, BTW, not just standing around, picking our noses). We ended up taking the whole thing down and reinstating our old, barely-sufficient system while we tested the box.

I could go through a list of everything we tested, but won't bother racking my brain unless someone really wants to hear it. We ended up replacing nearly every piece of hardware but the case - NIC, M/B, RAID card, RAM - but nothing worked. I was always pretty sure it was hardware-related, as we could never capture a panic, and by the time it got really bad (the day we replaced the M/B), I could watch it reboot almost as soon as it finished booting.

In the end, the culprit was exactly what I had suspected from the beginning, but had been assured it couldn't be - the riser card in the 2U case. We don't have anything but circumstantial evidence pointing to that, but we're pretty sure. If we took the riser card out of the case and plugged everything directly into the M/B (which required leaving the top off the case, of course), we could bludgeon the system with SMTP connections while running a disk I/O benchmarker and FTPing large amounts of data in variously-sized files back and forth, and it would stay up. If we put the card back in, it'd reboot in about three hours.

We switched to a 4U case, upgraded the system the Friday before last, and haven't had a problem yet (fingers crossed, knocking on wood, etc.).

So I guess all this was just to say "beware the riser card."

-Todd