From owner-freebsd-current@FreeBSD.ORG Thu Nov 13 08:22:41 2003
Date: Thu, 13 Nov 2003 11:22:33 -0500 (EST)
From: John Baldwin
To: Benjamin Lewis
cc: sos@FreeBSD.org
cc: current@FreeBSD.org
cc: bhlewis@purdue.edu
Subject: RE: Recent -current hangs on Tyan S2460 before finishing boot
In-Reply-To: <3FB31128.1030709@wossname.net>
List-Id: Discussions about the use of FreeBSD-current

On 13-Nov-2003 Benjamin Lewis wrote:
> Hello,
>
> I'm having trouble getting recent (post- "device apic", pre- turnstile)
> kernels to boot on my Tyan S2460 (Tiger MP) system with dual AMD
> Athlons.
> What happens is that the machine seems to get "stuck" soon
> after the "Waiting for SCSI devices to settle" message is printed -- it
> appears to be willing to wait forever rather than the SCSI_DELAY time.
>
> Disabling ACPI in the BIOS has no apparent effect on the hang.  Using
> SCHED_4BSD or SCHED_ULE likewise makes no difference.  I've been
> following the current@ list hoping to see someone else report a problem
> similar to mine but haven't seen anything yet.
>
> I do have a serial console attached to the machine and DDB enabled so
> I'm able to provide some information and get more if needed.  I'm
> including a copy of the boot messages from my last attempt to boot
> "FreeBSD 5.1-CURRENT #2: Tue Nov 11 17:35:40 EST 2003" which was
> cvsup'ed shortly prior to the build date.  Included in the messages are
> the output of "ps" and "trace" once I broke into ddb.
>
> I'm also including output from "acpidump -t" and "mptable -verbose"
> since I've seen that information requested in the past.
>
> Some details about the system that may be pertinent:
>     1.  It has two 1GHz Athlon "Thunderbird" (Not MP) processors.
>         That hasn't been a problem so far.
>     2.  The BIOS is version 1.04 (latest is 1.05).  The last time I
>         tried updating to 1.05 (some time ago) I saw lots of error
>         messages complaining about undefined ACPI stuff so I
>         reverted.
>     3.  There is a Tekram 390F (I think that's the model -- it uses
>         the sym driver) and an Adaptec 3944 SCSI controller.  A
>         single internal SCSI drive is connected to the Tekram and 10
>         external drives are connected to the two ports on the 3944.
>         The external drives are configured as a Vinum RAID10 array.
>         There's also a single IDE drive connected to one of the
>         built-in IDE controllers.
>
> Please let me know if there is anything more you want to know.
>
> Thanks,

Can you do a 'show intrcnt' from the ddb prompt?  It sounds like you may be
getting an interrupt storm due to a mis-routed PCI interrupt.
Actually, I think the problem is in the ata driver.  There may well be bugs
in the interrupt code, in that interrupts that don't exist in the mptable
(IRQs 11 and 15) still get created.  But to me, the fact that the mptable
has no IRQ 15 means that there is no IRQ 15, and thus there should not be
an ata1.  Note that in your dmesg, ata1 does say that it doesn't do DMA
because it has been disabled.  Perhaps the ata driver needs to disable ata1
altogether on that chipset if it sees that condition.  My guess is that the
ata driver is waiting forever for an interrupt from ata1 which is never
going to arrive, hence the hang.  Do you have a boot -v dmesg from a working
kernel?

-- 

John Baldwin <>< http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
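
To make the suggested check concrete, here is a rough sketch of the decision logic: attach an ATA channel only if its IRQ has an entry in the MP table. This is not the ata driver's code. The function name, the channel map, and the example table are all hypothetical, and the only fact borrowed from the report is that IRQs 11 and 15 are absent from the mptable. The 14/15 assignment is the conventional legacy one for the primary and secondary channels.

```python
# Sketch only: every name here is hypothetical, not FreeBSD code.

# Conventional legacy IRQ assignments for the two ATA channels.
LEGACY_ATA_IRQS = {"ata0": 14, "ata1": 15}


def channels_to_attach(mptable_irqs, channels=LEGACY_ATA_IRQS):
    """Return the channels whose IRQ has an entry in the MP table.

    A channel whose IRQ is absent is skipped, so nothing ends up
    sleeping on an interrupt that can never be delivered.
    """
    present = set(mptable_irqs)
    return sorted(name for name, irq in channels.items() if irq in present)


# Illustrative table: IRQs 11 and 15 are missing, as in the report.
# The remaining entries are made up for the example.
example_irqs = [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14]
print(channels_to_attach(example_irqs))
```

With this table only ata0 is selected. ata1 is never attached, so there is nothing left to wait on for IRQ 15.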