From: Pyun YongHyeon
Date: Fri, 15 Jan 2010 14:48:19 -0800
To: Floris Bos
Cc: freebsd-net@freebsd.org
Subject: Re: kern/92090: [bge] bge: watchdog timeout -- resetting
Message-ID: <20100115224819.GK1228@michelle.cdnetworks.com>
In-Reply-To: <201001152246.50315.info@je-eigen-domein.nl>
Reply-To: pyunyh@gmail.com
List-Id: Networking and TCP/IP with FreeBSD

On Fri, Jan 15, 2010 at 10:46:50PM +0100, Floris Bos wrote:
> On Friday 15 January 2010 07:54:24 pm Pyun YongHyeon wrote:
> > On Fri, Jan 15, 2010 at 03:33:58AM +0100, Floris Bos wrote:
> > > On Friday 15 January 2010 01:53:16 am Pyun YongHyeon wrote:
> > > > On Thu, Jan 14, 2010 at 09:48:56PM +0100, Floris Bos wrote:
> > > > > On Thursday 14 January 2010 09:11:44 pm Pyun YongHyeon wrote:
> > > > > > On Thu, Jan 14, 2010 at 09:08:02PM +0100, Floris Bos wrote:
> > > > > > > On Thursday 14 January 2010 06:56:03 pm Pyun YongHyeon wrote:
> > > > > > > > On Thu, Jan 14, 2010 at 04:33:19AM +0100, Floris Bos wrote:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > On Thursday 14 January 2010 03:54:52 am Pyun YongHyeon wrote:
> > > > > > > > > > > ==
> > > > > > > > > > > bge0: mem 0xdf900000-0xdf90ffff irq 16 at device 0.0 on pci32
> > > > > > > > > > > ==
> > > > > > > > > > >
> > > > > > > > > > > After boot, the network works for about 5 seconds, barely enough time to get an IP by DHCP and send a ping or two.
> > > > > > > > > > > Then network connectivity goes down, and after some time there is a "bge0: watchdog timeout -- resetting" message.
> > > > > > > > > > >
> > > > > > > > > > > Then network works again for 5 seconds, and goes down again. All the time, repeatedly.
> > > > > > > > > > >
> > > > > > > > > > > The system works fine under Ubuntu. So I assume the hardware is ok.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I'm not sure but it looks like you have a BCM5784 controller. What is
> > > > > > > > > > the output of "devinfo -rv | grep phy"?
> > > > > > > > > >
> > > > > > > > > ==
> > > > > > > > > ukphy0 pnpinfo oui=0x50ef model=0x3a rev=0x4 at phyno=1
> > > > > > > > > ukphy1 pnpinfo oui=0x50ef model=0x3a rev=0x4 at phyno=1
> > > > > > > > > ==
> > > > > > > > >
> > > > > > > > Support for the PHY was added in r202269.
> > > > > > > > Please try again after applying the change. Or you can download
> > > > > > > > sys/dev/mii/miidevs and sys/dev/mii/brgphy.c from HEAD and rebuild
> > > > > > > > the kernel.
> > > > > > > >
> > > > > > > Fetched the latest source using CVS on another computer, and transferred it to the system concerned by USB stick.
> > > > > > > Rebuilt the kernel, but the problem is still there.
> > > > > > >
> > > > > > Would you show me the full dmesg output including the "watchdog timeout"
> > > > > > messages?
> > > > > >
> > > > > ===
> > > > > Copyright (c) 1992-2010 The FreeBSD Project.
> > > > > Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> > > > > The Regents of the University of California. All rights reserved.
> > > >
> > > > [...]
> > > >
> > > > > bge0: mem 0xdf900000-0xdf90ffff irq 16 at device 0.0 on pci32
> > > > > miibus0: on bge0
> > > > > brgphy0: PHY 1 on miibus0
> > > > > brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
> > > > > bge0: Ethernet address: f4:ce:46:0f:2a:2c
> > > > > bge0: [FILTER]
> > > > > pcib4: irq 16 at device 28.5 on pci0
> > > > > pci34: on pcib4
> > > > > bge1: mem 0xdfa00000-0xdfa0ffff irq 17 at device 0.0 on pci34
> > > > > miibus1: on bge1
> > > > > brgphy1: PHY 1 on miibus1
> > > > > brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
> > > > > bge1: Ethernet address: f4:ce:46:0f:2a:2d
> > > > > bge1: [FILTER]
> > > >
> > > > [...]
> > > >
> > > > Would you give the attached patch a try? I don't know whether it will
> > > > help or not, though. I couldn't find any related information giving a
> > > > possible clue to the issue in the publicly available datasheet.
> > > >
> > > The patch did not make any difference.
> > >
> > >
> > > However I did notice something else odd.
> > > The problem only occurs on bge0, the second interface bge1 does work.
> > >
> > > I grabbed the U57DIAG diagnostic boot CD from the Broadcom site, and noticed that the first interface has ASF enabled, while the second one has not.
> > > I disabled ASF by doing:
> > >
> > > ==
> > > b57udiag -cmd
> > > setasf -d
> > > ==
> > >
> > > And now the first interface also works properly.
> > >
> >
> > Glad to hear you solved the issue. I totally forgot CURRENT enabled
> > ASF support by default (hw.bge.allow_asf).
> >
> > > So there is something with the ASF stuff that conflicts with FreeBSD.
> > > The IPMI card of the system is configured to use a dedicated 3rd LAN port, and is NOT sharing bge0.
> > > But perhaps the NIC is initialized differently nevertheless when ASF firmware is enabled, and that is causing issues?
> > >
> >
> > Yes, I remember there were a couple of issues related to ASF.
> > Linux seems to have very complex logic to coexist with ASF/IPMI
> > firmware, whose implications I still don't understand at this
> > time. bge(4) may need more robust code to handle that, but the datasheet
> > seems to show very limited information. The lack of an ASF/IPMI-capable
> > bge(4) controller also makes it hard for me to experiment with code.
> >
> I can understand the difficulty of debugging such things without having the hardware.
> So I did some more research myself, and found the bug.
>
> You said Linux was complicated, so I took a look at the OpenSolaris bge source instead, to see how they do ASF things, and I noticed the following comment ( http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/io/bge/bge_chip2.c )
>
> ==
> /*
>  * The driver is supposed to notify ASF that the OS is still running
>  * every three seconds, otherwise the management server may attempt
>  * to reboot the machine. If it hasn't actually failed, this is
>  * not a desirable result. However, this isn't running as a real-time
>  * thread, and even if it were, it might not be able to generate the
>  * heartbeat in a timely manner due to system load. As it isn't a
>  * significant strain on the machine, we will set the interval to half
>  * of the required value.
>  */
> ==
>
> What a coincidence: although the entire system is not rebooted, my network link went up & down every 3 seconds according to the switch.
>
> It seems FreeBSD only notifies ASF every 5 seconds. Attached is a patch that reduces it to 2 seconds, and it solves the problem for me, with ASF enabled.
>

Nice catch! Thanks a lot!
Actually I guess there is another bug in ASF handling. I'll send a
CFT to the list and see how other bge(4) controllers work.

>
> Yours sincerely,
>
> Floris Bos
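
For readers of the archive, the timing argument above can be illustrated with a small stand-alone C sketch. It is not the bge(4) code and not the attached patch: every name in it (ASF_TIMEOUT_SECS, struct asf_heartbeat, asf_heartbeat_tick, asf_max_gap) is hypothetical, and it assumes a callout that runs once per second. The 3 second requirement is taken from the OpenSolaris comment quoted above. The sketch only shows that a 5 second interval leaves gaps longer than 3 seconds between heartbeats, while a 2 second interval does not.

```c
#include <assert.h>
#include <stdbool.h>

/* All names here are hypothetical; this is not the bge(4) source. */
#define ASF_TIMEOUT_SECS 3	/* heartbeat deadline per the quoted comment */

struct asf_heartbeat {
	int interval;	/* seconds between heartbeats */
	int countdown;	/* ticks left until the next heartbeat */
	int sent;	/* heartbeats sent so far */
};

static void
asf_heartbeat_init(struct asf_heartbeat *hb, int interval)
{
	assert(interval > 0);
	hb->interval = interval;
	hb->countdown = 0;	/* first heartbeat goes out on the first tick */
	hb->sent = 0;
}

/* Assumed to be called once per second; true when a heartbeat is due. */
static bool
asf_heartbeat_tick(struct asf_heartbeat *hb)
{
	if (hb->countdown > 0) {
		hb->countdown--;
		return (false);
	}
	hb->countdown = hb->interval - 1;
	hb->sent++;
	return (true);
}

/* Longest gap, in seconds, between two heartbeats over `seconds` ticks. */
static int
asf_max_gap(int interval, int seconds)
{
	struct asf_heartbeat hb;
	int gap = 0, max_gap = 0;

	asf_heartbeat_init(&hb, interval);
	for (int t = 0; t < seconds; t++) {
		if (asf_heartbeat_tick(&hb)) {
			if (gap > max_gap)
				max_gap = gap;
			gap = 0;
		}
		gap++;
	}
	return (max_gap);
}
```

Under these assumptions, asf_max_gap(5, 10) returns 5, which exceeds ASF_TIMEOUT_SECS, and asf_max_gap(2, 10) returns 2, which stays below it. A real driver would also have to account for the callout running late under load, which is the reason the quoted comment gives for choosing an interval well below the deadline.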