From owner-freebsd-stable@FreeBSD.ORG Tue Jan 31 14:41:20 2006
Return-Path:
X-Original-To: freebsd-stable@FreeBSD.org
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id ADB7116A422;
	Tue, 31 Jan 2006 14:41:20 +0000 (GMT)
	(envelope-from imp@bsdimp.com)
Received: from harmony.bsdimp.com (vc4-2-0-87.dsl.netrack.net [199.45.160.85])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 3C94B43D79;
	Tue, 31 Jan 2006 14:41:20 +0000 (GMT)
	(envelope-from imp@bsdimp.com)
Received: from localhost (localhost.village.org [127.0.0.1] (may be forged))
	by harmony.bsdimp.com (8.13.3/8.13.3) with ESMTP id k0VEcbHA032506;
	Tue, 31 Jan 2006 07:38:38 -0700 (MST)
	(envelope-from imp@bsdimp.com)
Date: Tue, 31 Jan 2006 07:38:48 -0700 (MST)
Message-Id: <20060131.073848.39874110.imp@bsdimp.com>
To: roam@ringlet.net
From: "M. Warner Losh"
In-Reply-To: <20060131112447.GA1173@straylight.m.ringlet.net>
References: <20060131091027.CC43516A424@hub.freebsd.org>
	<20060131083002.GC93773@FreeBSD.org>
	<20060131112447.GA1173@straylight.m.ringlet.net>
X-Mailer: Mew version 3.3 on Emacs 21.3 / Mule 5.0 (SAKAKI)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-2.0
	(harmony.bsdimp.com [127.0.0.1]); Tue, 31 Jan 2006 07:38:38 -0700 (MST)
Cc: mistry.7@osu.edu, glebius@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject: Re: dc0: watchdog timeout and nve0: device timeout
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 31 Jan 2006 14:41:20 -0000

In message: <20060131112447.GA1173@straylight.m.ringlet.net>
            Peter Pentchev writes:
: On Tue, 31 Jan 2006 11:30:02 +0300, Gleb Smirnoff wrote:
: > On Tue, Jan 31, 2006 at 03:08:03AM -0500, Anish Mistry wrote:
: > A> After updating to STABLE today I'm getting the following message with
: > A> my dc and nve NICs every few seconds. UP, AMD64. A kernel from last
: > A> Thursday was fine.
: > A>
: > A> dc0: watchdog timeout
: > A> nve0: device timeout (4)
: >
: > Can you try to backout the code in sys/dev/pci to Thursday? If this
: > doesn't help, you probably need to do a binary search in this small
: > timeframe.
:
: I think I found the problem - the merge was not quite correct, and
: the PCI interrupt rerouting was disabled for some reason.
:
: Warner, is there a reason for hiding the "Try to re-route interrupts"
: code behind an apparently "ifdef 0" case? Well, okay, most probably
: there is a reason, since you've done it, but... it breaks my re0 card
: and it also seems to break Anish's hardware :)

I'm pretty sure that's the problem. I thought I'd specifically
checked to make sure that I didn't merge this :-(

: BTW, the commit message was not quite correct - rev. 1.302 was not
: really merged, it's included in my patch here. Also, rev. 1.305 of
: pci.c seems to have more than just adding the PCI_FIND_EXTCAP method -
: there are a couple of offset fixes that I also included in the patch
: while trying to come as close to the -CURRENT code as possible; could
: you check if they actually apply to -STABLE?

They do.

: Anyway, here's a patch that fixes it for me, although most probably
: the __PCI_REROUTE_INTERRUPT chunk should be sufficient. Warner, if
: you want more details, I could help with debugging this - on my
: system, the re0 card definitely needs this rerouting. I've posted
: some verbose boot output with explanations at
: http://people.FreeBSD.org/~roam/pcirouting/
: The patch itself is also there in case it gets munged by the mail
: swervers along the way.
:
: Index: src/sys/dev/pci/pci.c
: ===================================================================
: RCS file: /home/ncvs/src/sys/dev/pci/pci.c,v
: retrieving revision 1.292.2.6
: diff -u -r1.292.2.6 pci.c
: --- src/sys/dev/pci/pci.c	30 Jan 2006 18:42:10 -0000	1.292.2.6
: +++ src/sys/dev/pci/pci.c	31 Jan 2006 10:57:32 -0000
: @@ -428,7 +428,7 @@
:  		ptrptr = PCIR_CAP_PTR;
:  		break;
:  	case 2:
: -		ptrptr = 0x14;
: +		ptrptr = PCIR_CAP_PTR_2;
:  		break;
:  	default:
:  		return;		/* no extended capabilities support */
: @@ -447,10 +447,10 @@
:  		}
:  		/* Find the next entry */
:  		ptr = nextptr;
: -		nextptr = REG(ptr + 1, 1);
: +		nextptr = REG(ptr + PCICAP_NEXTPTR, 1);
:
:  		/* Process this entry */
: -		switch (REG(ptr, 1)) {
: +		switch (REG(ptr + PCICAP_ID, 1)) {
:  		case PCIY_PMG:		/* PCI power management */
:  			if (cfg->pp.pp_cap == 0) {
:  				cfg->pp.pp_cap = REG(ptr + PCIR_POWER_CAP, 2);
: @@ -1040,7 +1040,8 @@
:  	}
:
:  	if (cfg->intpin > 0 && PCI_INTERRUPT_VALID(cfg->intline)) {
: -#ifdef __PCI_REROUTE_INTERRUPT
: +#if defined(__ia64__) || defined(__i386__) || defined(__amd64__) || \
: +	defined(__arm__) || defined(__alpha__)
:  		/*
:  		 * Try to re-route interrupts. Sometimes the BIOS or
:  		 * firmware may leave bogus values in these registers.
:
: Hope this helps!

I'm pretty sure that the REROUTE thing is the only problem. That
shouldn't have been committed, and I thought I'd checked it
specifically before the commit, but I just checked what I committed
and it slipped by. This fits with the symptoms that I saw on my server
last night (the only difference between a stable boot and an older
stable boot was the IRQs). The last part of this patch seems to fix
things for me.

Warner
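
For readers following the second hunk of the patch, here is a minimal
userland sketch of the capability-list walk that the PCICAP_ID and
PCICAP_NEXTPTR offsets refer to. It is not the pci.c code: the config
space is simulated with a plain byte array, and cfg_read8() and
find_cap() are hypothetical names standing in for the kernel's REG()
macro and its surrounding loop. The constant values are the usual PCI
ones (capability pointer at 0x34, ID at byte 0 and next pointer at
byte 1 of each entry), written out here as assumptions so the block
stands alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, matching the standard PCI capability layout. */
#define PCIR_CAP_PTR	0x34	/* capability list head, header types 0/1 */
#define PCICAP_ID	0x0	/* offset of the capability ID in an entry */
#define PCICAP_NEXTPTR	0x1	/* offset of the next-entry pointer */
#define PCIY_PMG	0x01	/* power management capability ID */

#define CFG_SIZE	256	/* size of the simulated config space */

/* Hypothetical stand-in for REG(off, 1): read one config-space byte. */
static uint8_t
cfg_read8(const uint8_t *cfg, unsigned off)
{
	return (off < CFG_SIZE ? cfg[off] : 0);
}

/*
 * Walk the capability list and return the offset of the first entry
 * whose ID matches, or 0 if there is none.  The guard counter bounds
 * the walk so a malformed list that loops on itself cannot hang us.
 */
static unsigned
find_cap(const uint8_t *cfg, uint8_t id)
{
	unsigned ptr;
	int guard = 48;		/* at most 48 4-byte-aligned entries fit */

	assert(cfg != NULL);
	ptr = cfg_read8(cfg, PCIR_CAP_PTR);
	while (ptr != 0 && guard-- > 0) {
		if (cfg_read8(cfg, ptr + PCICAP_ID) == id)
			return (ptr);
		ptr = cfg_read8(cfg, ptr + PCICAP_NEXTPTR);
	}
	return (0);
}
```

With a simulated device whose list head points at 0x40, where an
entry with some other ID links to a PCIY_PMG entry at 0x50,
find_cap(cfg, PCIY_PMG) returns 0x50. Using the named offsets instead
of the bare "ptr" and "ptr + 1" makes no functional difference, which
is consistent with those hunks being cosmetic and the #if change being
the one that matters.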