From owner-freebsd-current@FreeBSD.ORG Thu Sep 16 18:36:42 2004
To: Scott Long
In-reply-to: Your message of "Wed, 15 Sep 2004 15:05:34 MDT." <4148AE9E.5050905@samsco.org>
Date: Thu, 16 Sep 2004 11:36:41 -0700
From: "Kevin Oberman"
Message-Id: <20040916183641.3A0FB5D04@ptavv.es.net>
cc: Mike Jakubik
cc: DanGer
cc: current@freebsd.org
cc: Søren Schmidt
Subject: Re: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=207594611
List-Id: Discussions about the use of FreeBSD-current

> Date: Wed, 15 Sep 2004 15:05:34 -0600
> From: Scott Long
> Sender: owner-freebsd-current@freebsd.org
>
> Søren Schmidt wrote:
> > Mike Jakubik wrote:
> >
> >> Søren Schmidt said:
> >>
> >>
> >>> You are having massive ICRC problems which are different and most likely
> >>> due to bad cables/connectors or cables that are turned around (blue
> >>> connector at controller, black/grey at devices), or it can be a
> >>> weak/overloaded PSU.
> >>>
> >> This is a different error message from what everyone else, including
> >> me is reporting. What about the errors we are getting?
> >
> >
> > I have no idea, I can't reproduce the problem at all. However I suspect
> > something else is blocking interrupt delivery but it's just a hunch...
> >
> > -Søren
> >
>
> I'm finding it hard to imagine a scenario where a timeout could fire but
> not a hardware interrupt. Nothing usually shares the interrupt vectors
> with ATA, so it's pretty unlikely that the ata ithread is being blocked
> by anything but itself.

This sounds reasonable, but I can make the problem start/stop by
starting/stopping the network card. No problems in single-user. Then I
'ifconfig xl0 192.116.1.1' and immediately start getting the errors. I
also get watchdog timeouts on xl0. 'ifconfig xl0 down' stops the errors.

xl0 is on IRQ10, ata1 is on IRQ15. I have a K6 processor in an ASUS P5A
with neither SMP nor APIC. (I am running ACPI, not that there is much to
it on this system.)

While I don't entirely discount the possibility that this is in ata, it
seems odd that I get no errors even doing a buildworld as long as the
network is off.

This started pretty recently, but changes have been made in the period
of suspicion to the scheduler, ACPI, and ata, so it's still fuzzy.

My system gets the errors consistently enough that I will try to narrow
down what patch caused the problem. (Wish it was a bit faster to build
kernels, though!)

I have a feeling in the pit of my stomach that it's going to show up
with a scheduler patch MT5, but I hope I'm wrong! I think I'd prefer an
ATA problem to a scheduler issue. (Of course, Søren probably has a
differing opinion on this.)
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net			Phone: +1 510 486-8634