Date:      Tue, 03 Oct 2006 23:18:16 -0600
From:      Scott Long <scottl@samsco.org>
To:        John Marshall <John.Marshall@riverwillow.com.au>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Watchdog Timeout - bge devices
Message-ID:  <45234418.7000205@samsco.org>
In-Reply-To: <9F7B653A50CF3D45A92C05401046239B0E0C27@rwsrv06.rw2.riverwillow.net.au>
References:  <9F7B653A50CF3D45A92C05401046239B0E0C27@rwsrv06.rw2.riverwillow.net.au>

John Marshall wrote:
> $ dmesg | grep bge
> bge0: <Broadcom BCM5705K Gigabit Ethernet, ASIC rev. 0x3003> mem 0xe8200000-0xe820ffff irq 17 at device 4.0 on pci4
> miibus1: <MII bus> on bge0
> bge0: Ethernet address: 00:0b:cd:e7:51:ba
> bge0: watchdog timeout -- resetting
> bge0: link state changed to DOWN
> bge0: link state changed to UP
> 
> I initially pronounced the network cable dead and replaced it. Then I
> suspected the FastEthernet switch port and relocated to a different
> port. Watchdog timeouts persisted. I concluded that the bge hardware
> must be flaky, until I read a recent thread on em device watchdog
> timeouts, which led me to wonder about CPU scheduling.
> 
> The server experiencing the bge timeouts was using SCHED_ULE. I built
> 6.2-PRERELEASE on a spare disk and booted the problem server from that
> disk - bge problem persisted.
> 
> We have a second (identical) problem-free server configured with
> SCHED_4BSD. I reconfigured both machines so that the first machine (now
> 6.2-PRERELEASE) uses SCHED_4BSD and the second machine (6.1-RELEASE)
> uses SCHED_ULE. Both machines are configured with PREEMPTION.
> 
> +-----------------------------------------------+
> | THE PROBLEM FOLLOWS SCHED_ULE ACROSS MACHINES |
> +-----------------------------------------------+
> 
> The machines are HP ProLiant ML110 servers.
> 
> There is nothing sharing the interrupt with the bge device. No USB
> drivers are loaded.
> 
> 
> $ vmstat -i
> interrupt                          total       rate
> irq1: atkbd0                          70          0
> irq6: fdc0                             9          0
> irq14: ata0                      1234430          6
> irq15: ata1                           47          0
> irq17: bge0                     17543591         93
> irq26: fxp0                        70832          0
> cpu0: timer                    376381765       1999
> Total                          395230744       2099
> 
> 
> $ sysctl kern.version kern.sched kern.smp hw.machine hw.model dev.bge
> kern.version: FreeBSD 6.1-RELEASE-p10 #1: Mon Oct  2 08:36:56 AEST 2006
> 
> kern.sched.name: ule
> kern.sched.slice_min: 10
> kern.sched.slice_max: 142
> kern.sched.preemption: 1
> kern.smp.maxcpus: 1
> kern.smp.active: 0
> kern.smp.disabled: 0
> kern.smp.cpus: 1
> hw.machine: i386
> hw.model: Intel(R) Pentium(R) 4 CPU 2.80GHz
> dev.bge.0.%desc: Broadcom BCM5705K Gigabit Ethernet, ASIC rev. 0x3003
> dev.bge.0.%driver: bge
> dev.bge.0.%location: slot=4 function=0
> dev.bge.0.%pnpinfo: vendor=0x14e4 device=0x1654 subvendor=0x103c subdevice=0x1654 class=0x020000
> dev.bge.0.%parent: pci4
> 
> Is there any other information I ought to post to help with diagnosis -
> or is this a known problem? (I've only subscribed recently)
> 
> John Marshall.

Very interesting data point.  I wonder if this accounts for some of the
inconsistency in the reporting from others.  In any case, SCHED_ULE is
still considered to be highly experimental.  Hopefully it will get some
more attention in the near future to bring it closer to production
quality.
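
For anyone wanting to report a comparable data point, here is a minimal
sketch of one way to tally watchdog timeouts per device from dmesg-style
output. The function name count_watchdog is hypothetical, and the sketch
assumes the message format matches the "bge0: watchdog timeout --
resetting" line quoted above.

```shell
# Hypothetical helper: reads dmesg-style text on stdin and prints
# "<device> <count>" for each device that logged a watchdog timeout.
# Assumes lines look like "bge0: watchdog timeout -- resetting".
count_watchdog() {
  awk '/watchdog timeout/ {
         sub(/:$/, "", $1)   # strip the trailing colon from the device name
         n[$1]++
       }
       END { for (d in n) print d, n[d] }'
}
```

It could be run as "dmesg | count_watchdog" and reported alongside the
output of "sysctl kern.sched.name", so the timeout count can be compared
under each scheduler.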

Scott