Date: Sat, 11 Aug 2007 18:26:26 +0900
From: Pyun YongHyeon
To: Don Lewis
Cc: current@FreeBSD.org
Subject: Re: bizarre nfe(4) problem
Message-ID: <20070811092626.GB22569@cdnetworks.co.kr>
References: <200708110542.l7B5gJm9058171@gw.catspoiler.org> <20070811092222.GA22569@cdnetworks.co.kr>
In-Reply-To: <20070811092222.GA22569@cdnetworks.co.kr>
Reply-To: pyunyh@gmail.com
List-Id: Discussions about the use of FreeBSD-current

On Sat, Aug 11, 2007 at 06:22:22PM +0900, To Don Lewis wrote:
> On Fri, Aug 10, 2007 at 10:42:19PM -0700, Don Lewis wrote:
> > I've a rather strange nfe(4) problem that appears to be repeatable.  I
> > recently started running -CURRENT on an older socket 754 motherboard with
> > the nForce3 chipset.  Initially, I was running an SMP kernel, but I had
> > problems with sporadic "nfe0: watchdog timeout (missed Tx interrupts) --
> > recovering" problems that would intermittently cause the system to lose
> > network connectivity which it would recover from.  The kernel was very
> > similar to GENERIC, with just the addition of "options DEBUG_VFS_LOCKS"
> > and the replacement of atapicd with atapicam.
> >
> > The nfe0 problem totally went away when I removed "options SMP" and
> > "device apic" from the kernel configuration, except under the following
> > very specific circumstances:
> >
> >     A vncserver session using the GNOME desktop was started on the
> >     system.
> >
> >     There was no keyboard or mouse activity on the console for an
> >     extended period of time, allowing the GNOME screen saver to kick
> >     in and lock the screen.
> >
> > The system would run fine in this state for many hours, and would accept
> > incoming SMTP connections, etc.
> >
> >     A remote vncclient made a connection to the vncserver session
> >     and the password was entered on the client.
> >
> > At this point the nfe0 interface would appear to go deaf.  This might
> > happen before or slightly after the password dialog box appeared for the
> > vnc session.  For a short while, the system would be able to transmit
> > TCP packets, ntp queries, etc., but it would not respond to any incoming
> > packets (ping, TCP connection requests, etc.).  Eventually, the ARP cache
> > would time out and the only packets being transmitted would be ARP
> > requests and the occasional UDP broadcast from the samba server running
> > on the machine.
> >
> > Pressing any key on the (PS/2) keyboard would instantly bring the
> > network interface back to life.  Examination of /var/log/messages showed
> > lots of "nfe0: watchdog timeout" messages for the entire time that nfe0
> > was not listening to the network.
> >
> > I've had this problem happen twice.  Both times were after an extended
> > period of console inactivity.  An incoming vnc connection is not
> > sufficient to trigger the problem if the console was recently active,
> > and even waiting for the GNOME screensaver to put the monitor in DPMS
> > power save mode before initiating the vnc connection does not appear to
> > be sufficient to trigger the problem.
> >
> > I believe that nfe0 was sharing an interrupt with one of the USB ports
> > when the kernel was compiled with "device apic", but it is not sharing
> > an interrupt without "device apic".
> >
> > Any thoughts on how to debug this problem?
> >
> >
> > # vmstat -i
> > interrupt                          total       rate
> > irq0: clk                       41903449       1000
> > irq1: atkbd0                       39034          0
> > irq3: ohci0                            5          0
> > irq7: ppc0                             2          0
> > irq8: rtc                        5362802        127
> > irq9: ohci1 ahc0+                1963559         46
> > irq10: nfe0+                      225593          5
>             ^^
> You have nfe0+, which indicates vmstat ran out of room to
> display something. I'm not sure, but is it still sharing an interrupt
> with another device?

It seems the interrupt is shared with atapci1.

>
> > irq11: drm0                      2511908         59
> > irq12: psm0                       332931          7
> > irq14: ata0                           48          0
> > Total                           52339331       1249
> >
>
> --
> Regards,
> Pyun YongHyeon

-- 
Regards,
Pyun YongHyeon
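
For anyone who wants to check the same thing on their own box, here is a rough sketch (not part of the original exchange) that scans `vmstat -i` output for lines that may indicate a shared interrupt. It assumes the reading given in the reply above: a trailing "+" means the name column was truncated, so more handlers may be attached than are shown. The function names `parse_vmstat_i` and `possibly_shared` are made up for this illustration, and the sample text is the table quoted in this message.

```python
import re

# Sample copied from the `vmstat -i` output quoted in this message.
VMSTAT_OUTPUT = """\
interrupt                          total       rate
irq0: clk                       41903449       1000
irq1: atkbd0                       39034          0
irq3: ohci0                            5          0
irq7: ppc0                             2          0
irq8: rtc                        5362802        127
irq9: ohci1 ahc0+                1963559         46
irq10: nfe0+                      225593          5
irq11: drm0                      2511908         59
irq12: psm0                       332931          7
irq14: ata0                           48          0
Total                           52339331       1249
"""

# "irqN: <handler names> <total> <rate>"; the header and Total lines do not match.
LINE_RE = re.compile(r"^(irq\d+):\s+(.*?)\s+(\d+)\s+(\d+)$")


def parse_vmstat_i(text):
    """Return one dict per irq line found in `vmstat -i` style output."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip the column header and the Total line
        irq, names, total, rate = match.groups()
        rows.append({
            "irq": irq,
            # Assumption: a trailing "+" marks a truncated name list.
            "truncated": names.endswith("+"),
            "devices": names.rstrip("+").split(),
            "total": int(total),
            "rate": int(rate),
        })
    return rows


def possibly_shared(rows):
    """Rows that list several handlers or whose name list was truncated."""
    return [r for r in rows if r["truncated"] or len(r["devices"]) > 1]


if __name__ == "__main__":
    for row in possibly_shared(parse_vmstat_i(VMSTAT_OUTPUT)):
        print(row["irq"], row["devices"], "truncated" if row["truncated"] else "")
```

A flagged line only says the interrupt may be shared. Which device is actually on the same line (atapci1 in this case) still has to be confirmed from the boot messages or other system information.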