From owner-freebsd-stable@FreeBSD.ORG Thu May 30 06:44:42 2013
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
	by hub.freebsd.org (Postfix) with ESMTP id 14BB2DEF;
	Thu, 30 May 2013 06:44:42 +0000 (UTC)
	(envelope-from danny@cs.huji.ac.il)
Received: from kabab.cs.huji.ac.il (kabab.cs.huji.ac.il [132.65.16.84])
	by mx1.freebsd.org (Postfix) with ESMTP id 811FC257;
	Thu, 30 May 2013 06:44:41 +0000 (UTC)
Received: from pampa.cs.huji.ac.il ([132.65.80.32])
	by kabab.cs.huji.ac.il with esmtp
	id 1UhwbD-000Ea3-SL; Thu, 30 May 2013 09:44:35 +0300
X-Mailer: exmh version 2.7.2 01/07/2005 with nmh-1.3
To: pyunyh@gmail.com
Subject: Re: SunFire X2200 ilo's bge1 DOWN/UP
In-reply-to: <20130529085544.GC3042@michelle.cdnetworks.com>
References: <20130528052953.GA1457@michelle.cdnetworks.com>
	<20130528064850.GB1457@michelle.cdnetworks.com>
	<20130529085544.GC3042@michelle.cdnetworks.com>
Comments: In-reply-to YongHyeon PYUN message dated "Wed, 29 May 2013 17:55:44 +0900."
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Thu, 30 May 2013 09:44:35 +0300
From: Daniel Braniss
Cc: freebsd-stable@freebsd.org
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Thu, 30 May 2013 06:44:42 -0000

>
> --/04w6evG8XlLl3ft
> Content-Type: text/plain; charset=us-ascii
> Content-Disposition: inline
>
> On Tue, May 28, 2013 at 09:55:24AM +0300, Daniel Braniss wrote:
> > > On Tue, May 28, 2013 at 09:28:00AM +0300, Daniel Braniss wrote:
> > > > > On Mon, May 27, 2013 at 10:59:28AM +0300, Daniel Braniss wrote:
> > > > > > > On Fri, May 24, 2013 at 05:31:13PM +0300, Daniel Braniss wrote:
> > > > > > > > hi, after upgrading to 9.1-stable, this particular hardware - SunFire X2200,
> > > > > > > >
> > > > > > > Show me dmesg(bge(4) and brgphy(4) only) and 'ifconfig bge1' output.
> > > > > > >
> > > > > >
> > > > > > bge0: mem
> > > > > > 0xfdff0000-0xfdffffff,0xfdfe0000-0xfdfeffff irq 17 at device 4.0 on pci6
> > > > > > bge0: CHIP ID 0x00009003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
> > > > > > miibus2: on bge0
> > > > > > brgphy0: PHY 1 on miibus2
> > > > > > brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
> > > > > > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> > > > > > bge0: Ethernet address: 00:1b:24:5d:5b:bd
> > > > > > bge1: mem
> > > > > > 0xfdfc0000-0xfdfcffff,0xfdfb0000-0xfdfbffff irq 18 at device 4.1 on pci6
> > > > > > bge1: CHIP ID 0x00009003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
> > > > > > miibus3: on bge1
> > > > > > brgphy1: PHY 1 on miibus3
> > > > > > brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
> > > > > > 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> > > > > > bge1: Ethernet address: 00:1b:24:5d:5b:be
> > > > > >
> > > > > > sf-10> ifconfig bge1
> > > > > > bge1: flags=8802 metric 0 mtu 1500
> > > > > >         options=8009b
> > > > > TE>
> > > > > >         ether 00:1b:24:5d:5b:be
> > > > > >         nd6 options=21
> > > > > >         media: Ethernet autoselect (100baseTX )
> > > > > >         status: active
> > > > > >
> > > > >
> > > > > Because bge1 is not UP, I wonder how you get link UP/DOWN events.
> > > > > Do you have some network script run by cron?
> > > >
> > > > no scripts.
> > > > this port is shared with the ILO/IPMI, and back in March you fixed a problem
> > > > that it was hanging soon after it was initialized by the driver,
> > > > (r248226 - but I'm not sure if it was ever MFC'ed).
> > > >
> > > It was MFCed.
> > >
> > > > Initialy I thought it could be caused by connections to it from other
> > > > hosts (either via the web, or ssh) so I killed them, but it didn't help.
> > > > without that patch the connection fails, and I don't see any DOWN/UP.
> > > >
> > > Could you check how many number of interrupts you get from bge1?
> > > Ideally you shouldn't get any interrupts for bge1.
> > >
> > it's not even mentioned :-)
> > sf-04> vmstat -i
> > interrupt                          total       rate
> > irq3: uart1                          964          0
> > irq4: uart0                            6          0
> > irq14: ata0                       227354          0
> > irq17: bge0                      1021981          2
> > irq21: ohci0                          28          0
> > irq22: ehci0                           2          0
> > irq23: atapci1                    293228          0
> > cpu0:timer                     383244076       1124
> > cpu1:timer                       2225144          6
> > cpu2:timer                       2056087          6
> > cpu3:timer                       2093943          6
> > Total                          391162813       1147
> >
>
> Then the only way link UP/DOWN event could be generated for DOWN
> interface would be invocation of media status query
> (i.e. ifconfig -a) triggered by an external application. Most
> drivers I touched check IFF_UP flag before poking media status
> register. However I'm not sure you're seeing this issue because you
> do not use any network script run by cron.
> Anyway, try attached patch and let me know whether it makes any
> difference.
>
> >
> > > >
> > > > > >
> > > > > > > > is toggeling bge1 DOWN/UP every few hours, this port is being used by the ILO.
> > > > > > > > To check, I upgraded another identical host, and the same problem appears.
> > > > > > > >
> > > > > > > What is the last known working revision?
> > > > > >
> > > > > > I have no idea, but I have older versions, and ill start from the oldets
> > > > > > (9.1-prerelease), but
> > > > > > it will take time, since it takes hours till it happens.
> > > > > >
> > > > >
> > > > > ok.
> > > > >
> > > >
> > >
>
> --/04w6evG8XlLl3ft
> Content-Type: text/x-diff; charset=us-ascii
> Content-Disposition: attachment; filename="bge.media_sts.diff"
>
> Index: sys/dev/bge/if_bge.c
> ===================================================================
> --- sys/dev/bge/if_bge.c	(revision 251021)
> +++ sys/dev/bge/if_bge.c	(working copy)
> @@ -5583,6 +5583,10 @@ bge_ifmedia_sts(struct ifnet *ifp, struct ifmediar
>  
>  	BGE_LOCK(sc);
>  
> +	if ((ifp->if_flags & IFF_UP) == 0) {
> +		BGE_UNLOCK(sc);
> +		return;
> +	}
>  	if (sc->bge_flags & BGE_FLAG_TBI) {
>  		ifmr->ifm_status = IFM_AVALID;
>  		ifmr->ifm_active = IFM_ETHER;
>
> --/04w6evG8XlLl3ft--

After 18 hours the logs are empty (no more bge1 DOWN/UP messages)!
It seems the patch fixes the problem.
Now maybe it's time to hunt for whoever is randomly calling
bge_ifmedia_sts ...

thanks,
	danny
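
For readers who want to see why the early return helps, here is a small
user-space sketch of the idea in the patch. It is not the bge(4) code: the
names sim_softc, sim_ifmedia_sts, sim_phy_read_link, sim_run and SIM_IFF_UP
are all hypothetical stand-ins, and the assumption (taken from the
explanation quoted above) is that each media status query on a shared port
can observe a different link state and log a change.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for IFF_UP; the value is illustrative only. */
#define SIM_IFF_UP 0x1

/* Hypothetical, heavily reduced model of a driver softc. */
struct sim_softc {
	int  if_flags;     /* interface flags, e.g. SIM_IFF_UP */
	int  phy_polls;    /* how many times the PHY was queried */
	int  link_events;  /* link state changes that would be logged */
	bool link_up;      /* last link state the driver reported */
};

/* Pretend PHY read: returns whatever the wire currently shows. */
static bool
sim_phy_read_link(struct sim_softc *sc, bool wire_state)
{
	sc->phy_polls++;
	return (wire_state);
}

/*
 * Media status query. With guard != 0 it mirrors the patch: if the
 * interface is administratively down, return before touching the PHY.
 */
static void
sim_ifmedia_sts(struct sim_softc *sc, bool wire_state, int guard)
{
	bool now;

	if (guard && (sc->if_flags & SIM_IFF_UP) == 0)
		return;

	now = sim_phy_read_link(sc, wire_state);
	if (now != sc->link_up) {
		sc->link_up = now;
		sc->link_events++;
	}
}

/* Issue four queries against a flapping wire and count link events. */
static int
sim_run(int if_flags, int guard)
{
	struct sim_softc sc = { .if_flags = if_flags };
	const bool wire[] = { true, false, true, false };
	size_t i;

	for (i = 0; i < sizeof(wire) / sizeof(wire[0]); i++)
		sim_ifmedia_sts(&sc, wire[i], guard);
	return (sc.link_events);
}
```

In this model sim_run(0, 0) counts a link event for every query on a down
interface, while sim_run(0, 1) counts none because the PHY is never polled.
An interface that is up (sim_run(SIM_IFF_UP, 1)) behaves as before.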