From: Garance A Drosihn
Date: Fri, 10 Oct 2003 14:05:40 -0400
To: current@FreeBSD.org
Subject: Seeing system-lockups on recent current
List-Id: Discussions about the use of FreeBSD-current

For the past week or so, I have been having a frustrating time with my freebsd-current/i386 system. It is a dual Athlon system. It has been running -current just fine since December, with me updating the OS every week or two. I did not update it for most of September, and then went to update it to pick up the recent round of security-related fixes.

My first update run picked up a change which caused system panics. Other people were also seeing that panic, and it wasn't long before updates were committed to current to fix that problem.

However, ever since then my -current system has very frequently locked up. Totally locked. The only way to get it back is a hardware reset.

I have rebuilt the system at least a dozen times since then.
I have built it with snapshots of /usr/src from Sept 12th to Oct 8th (which is what it's running at the moment). I have dropped back to a single-CPU kernel. I turned off X (in /etc/ttys) so that it doesn't start up at all. None of those attempts to get a reliable 5.x system have worked.

Sometimes the system will crash in the middle of a buildworld; other times it will crash while it's basically idle and the monitor is turned off. One time it crashed in the middle of an installworld -- right when it was replacing /lib files. Boy was that a headache to recover from!

On the same PC, in a different DOS partition, is a 4.x-stable system. If I boot into 4.x, I have no problems. I fire up all the servers that I run, start buildworlds, run cvsups, and even had all the 5.x partitions mounted while running an infinite loop that MD5'd every file in the 5.x system. I had all of that going on at the same time, and the system was fine.

While in the 4.x system, I've removed /usr/src on the 5.x system and recreated it, just in case some files in there were corrupted. And once the problems started, I made a point of always removing all of /usr/obj/usr/src before starting the buildworld, in case there were corrupted files in there.

I still have a few things I want to try. And I know it could still be a hardware problem (although it bugs me that it fails so consistently on 5.x and never fails on 4.x). Perhaps it is just some disk-corruption problem that occurred during the first few panics. But I thought I'd at least mention it, and see if anyone else has been having similar problems.

-- 
Garance Alistair Drosehn            =   gad@gilead.netel.rpi.edu
Senior Systems Programmer           or  gad@freebsd.org
Rensselaer Polytechnic Institute    or  drosih@rpi.edu