From: Dan Charrois
Date: Wed, 30 Nov 2005 18:26:29 -0700
To: Stephen Montgomery-Smith
Cc: freebsd-stable@freebsd.org
Subject: Re: FreeBSD unstable on Dell 1750 using SMP?

This is encouraging - it's the first I've heard of someone who has found a way to trigger the problem "on demand". The problems I was experiencing were on a dual Xeon with HTT enabled as well.

Perhaps someone out there who knows much more about the inner workings of FreeBSD may have an idea of why running top in "aggressive mode" like this might trigger the random rebooting. In particular, it would be nice to *know* that someone out there specifically fixed whatever is wrong in 5.4 when bringing it to 6.0. It's encouraging that you haven't had any problems since upgrading to 6.0, but I have to wonder if the bug's actually fixed, or the specific trigger of running top doesn't trigger the problem but the problem is still lurking in the background waiting to strike with the right combination of events.

In any case, I'm anxious to try it out myself on our server to see if "top -s0" brings it down "on command" with HTT enabled, and not with HTT disabled. But I'm going to have to wait until some time over the Christmas holidays to do that sort of experimentation at a time when it isn't affecting the end users of the machine. I may also upgrade to 6.0 at that time, since by then it will have been out for a couple of months, so most of the worst quirks should be worked out by then.

In the meantime, disabling HTT as I've done seems like a reasonable precaution to improve the stability.

Thanks for your help!

Dan

On Nov 29, 2005, at 10:50 PM, Stephen Montgomery-Smith wrote:

> Dan Charrois wrote:
>
>> It actually may be a comfort, since perhaps HTT is related to the
>> culprit. Since the last crash, about a month ago, I disabled
>> HTT, both in the kernel as well in the BIOS. So as far as I
>> know, it's completely been disabled (and the boot messages and
>> top only show 2 CPUs). And I haven't had the system go down for
>> nearly a month now.
>
> I don't know if it is related, but I used to have random reboots on
> a dual Xeon system with HTT enabled. It happened when I ran a CPU
> intensive threaded program at the same time as "top" - running "top
> -s0" (which you have to do as root) could usually kill the machine
> in seconds if not minutes.
>
> All I can tell you is that with FreeBSD 6.0 the problem disappeared.
>
> Well not totally - I still get a bunch of harmless calcru negative
> messages, although I don't know if it is actually related to the
> boot problems I used to have with FreeBSD 5.4, because I get the
> calcru backwards messages even with HTT disabled.
>
> Anyway, if you are in the mood to try it out, you might like to try
> re-enabling HTT, starting up whatever process you usually use (I'm
> guessing it is MySQL), and then run "top -s0". If you get a crash
> soon after that, you have the same problem I had.
>
> Let me also add that these crashes usually did not trigger a crash
> dump (I had dumpon set), and when it did the resulting dump looked
> rather corrupted.
>
> Stephen
>

--
Syzygy Research & Technology
Box 83, Legal, AB T0G 1L0 Canada
Phone: 780-961-2213
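
For anyone wanting to try the test described in this thread without a production workload such as MySQL, a stand-in for the "CPU intensive threaded program" might look like the sketch below. It is not the program Stephen ran. The names `burn` and `run_load`, the 1 MiB buffer, and the choice of SHA-256 are all assumptions made for illustration. Hashing is used because, as far as I know, CPython's hashlib releases the interpreter lock for large updates, so the threads can occupy several logical CPUs at once.

```python
import hashlib
import threading
import time


def burn(seconds, results, index):
    """Hash a 1 MiB buffer repeatedly for about `seconds`; record the pass count."""
    buf = bytes(1024 * 1024)  # hypothetical buffer size, zero-filled
    digest = hashlib.sha256()
    deadline = time.monotonic() + seconds
    passes = 0
    while time.monotonic() < deadline:
        # Large updates are expected to release the GIL in CPython,
        # which lets the worker threads run in parallel.
        digest.update(buf)
        passes += 1
    results[index] = passes


def run_load(workers, seconds):
    """Run `workers` burn() threads concurrently and return their pass counts."""
    results = [0] * workers
    threads = [
        threading.Thread(target=burn, args=(seconds, results, i))
        for i in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling something like `run_load(4, 600)` in one terminal while "top -s0" runs as root in another would approximate the combination described above. The thread count and duration are placeholders, and there is no guarantee that this particular load triggers the reboot.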