Date:      Wed, 29 May 2013 17:55:44 +0900
From:      YongHyeon PYUN <pyunyh@gmail.com>
To:        Daniel Braniss <danny@cs.huji.ac.il>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: SunFire X2200 ilo's bge1 DOWN/UP
Message-ID:  <20130529085544.GC3042@michelle.cdnetworks.com>
In-Reply-To: <E1UhDoa-000ElU-2U@kabab.cs.huji.ac.il>
References:  <E1UgsL2-000DBa-El@kabab.cs.huji.ac.il> <20130528052953.GA1457@michelle.cdnetworks.com> <E1UhDO4-000Dr7-PJ@kabab.cs.huji.ac.il> <20130528064850.GB1457@michelle.cdnetworks.com> <E1UhDoa-000ElU-2U@kabab.cs.huji.ac.il>


--/04w6evG8XlLl3ft
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Tue, May 28, 2013 at 09:55:24AM +0300, Daniel Braniss wrote:
> > On Tue, May 28, 2013 at 09:28:00AM +0300, Daniel Braniss wrote:
> > > > On Mon, May 27, 2013 at 10:59:28AM +0300, Daniel Braniss wrote:
> > > > > > On Fri, May 24, 2013 at 05:31:13PM +0300, Daniel Braniss wrote:
> > > > > > > hi, after upgrading to 9.1-stable, this particular hardware - SunFire X2200,
> > > > > > 
> > > > > > Show me dmesg(bge(4) and brgphy(4) only) and 'ifconfig bge1' output.
> > > > > > 
> > > > > 
> > > > > bge0: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x009003> mem 0xfdff0000-0xfdffffff,0xfdfe0000-0xfdfeffff irq 17 at device 4.0 on pci6
> > > > > bge0: CHIP ID 0x00009003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
> > > > > miibus2: <MII bus> on bge0
> > > > > brgphy0: <BCM5714 1000BASE-T media interface> PHY 1 on miibus2
> > > > > brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> > > > > bge0: Ethernet address: 00:1b:24:5d:5b:bd
> > > > > bge1: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x009003> mem 0xfdfc0000-0xfdfcffff,0xfdfb0000-0xfdfbffff irq 18 at device 4.1 on pci6
> > > > > bge1: CHIP ID 0x00009003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
> > > > > miibus3: <MII bus> on bge1
> > > > > brgphy1: <BCM5714 1000BASE-T media interface> PHY 1 on miibus3
> > > > > brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
> > > > > bge1: Ethernet address: 00:1b:24:5d:5b:be
> > > > > 
> > > > > sf-10> ifconfig bge1
> > > > > bge1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
> > > > >         options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
> > > > >         ether 00:1b:24:5d:5b:be
> > > > >         nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
> > > > >         media: Ethernet autoselect (100baseTX <full-duplex>)
> > > > >         status: active
> > > > > 
> > > > 
> > > > Because bge1 is not UP, I wonder how you get link UP/DOWN events.
> > > > Do you have some network script run by cron?
> > > 
> > > no scripts.
> > > this port is shared with the ILO/IPMI, and back in March you fixed a problem
> > > that it was hanging soon after it was initialized by the driver,
> > > (r248226 - but I'm not sure if it was ever MFC'ed).
> > 
> > It was MFCed.
> > 
> > > Initially I thought it could be caused by connections to it from other
> > > hosts (either via the web, or ssh) so I killed them, but it didn't help.
> > > without that patch the connection fails, and I don't see any DOWN/UP.
> > 
> > Could you check how many interrupts you get from bge1?
> > Ideally you shouldn't get any interrupts for bge1.
> 
> it's not even mentioned :-)
> sf-04> vmstat -i
> interrupt                          total       rate
> irq3: uart1                          964          0
> irq4: uart0                            6          0
> irq14: ata0                       227354          0
> irq17: bge0                      1021981          2
> irq21: ohci0                          28          0
> irq22: ehci0                           2          0
> irq23: atapci1                    293228          0
> cpu0:timer                     383244076       1124
> cpu1:timer                       2225144          6
> cpu2:timer                       2056087          6
> cpu3:timer                       2093943          6
> Total                          391162813       1147
> 

Then the only way a link UP/DOWN event could be generated for a DOWN
interface would be a media status query (e.g. ifconfig -a) triggered
by an external application.  Most drivers I have touched check the
IFF_UP flag before poking the media status register.  However, I'm
not sure this is what you're seeing, because you do not run any
network script from cron.
Anyway, try the attached patch and let me know whether it makes any
difference.
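
To illustrate the idea only: the guard amounts to returning from the
media status handler before the PHY is touched whenever the interface
is administratively down.  Below is a minimal user-space sketch of
that check.  It is not the driver code; mock_softc, mock_phy_poll,
mock_media_sts and MOCK_IFF_UP are hypothetical stand-ins invented
for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Mock "interface is up" flag; a stand-in, not the definition from <net/if.h>. */
#define MOCK_IFF_UP	0x1

/* Hypothetical stand-in for the driver softc: flags plus a poll counter. */
struct mock_softc {
	int if_flags;	/* interface flags, as set by ifconfig up/down */
	int phy_polls;	/* number of times the PHY has been polled */
};

/* Stand-in for the PHY poll that could surface a link state change. */
static void
mock_phy_poll(struct mock_softc *sc)
{
	sc->phy_polls++;
}

/*
 * Media status handler with the guard in place.
 * Returns 1 if the PHY was polled, 0 if the query was ignored
 * because the interface is down.
 */
static int
mock_media_sts(struct mock_softc *sc)
{
	assert(sc != NULL);
	if ((sc->if_flags & MOCK_IFF_UP) == 0)
		return (0);	/* interface down: leave the PHY alone */
	mock_phy_poll(sc);
	return (1);
}
```

With the flag clear, any number of status queries leaves phy_polls at
zero; once MOCK_IFF_UP is set, each query polls the PHY once.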

> > 
> > > 
> > > > 
> > > > > > > is toggling bge1 DOWN/UP every few hours, this port is being used by the ILO.
> > > > > > > To check, I upgraded another identical host, and the same problem appears.
> > > > > > 
> > > > > > What is the last known working revision?
> > > > > 
> > > > > I have no idea, but I have older versions, and I'll start from the
> > > > > oldest (9.1-prerelease), but it will take time, since it takes hours
> > > > > till it happens.
> > > > > 
> > > > 
> > > > ok.
> > > 
> > > 
> 
> 

--/04w6evG8XlLl3ft
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="bge.media_sts.diff"

Index: sys/dev/bge/if_bge.c
===================================================================
--- sys/dev/bge/if_bge.c	(revision 251021)
+++ sys/dev/bge/if_bge.c	(working copy)
@@ -5583,6 +5583,10 @@ bge_ifmedia_sts(struct ifnet *ifp, struct ifmediar
 
 	BGE_LOCK(sc);
 
+	if ((ifp->if_flags & IFF_UP) == 0) {
+		BGE_UNLOCK(sc);
+		return;
+	}
 	if (sc->bge_flags & BGE_FLAG_TBI) {
 		ifmr->ifm_status = IFM_AVALID;
 		ifmr->ifm_active = IFM_ETHER;

--/04w6evG8XlLl3ft--


