From owner-freebsd-stable@FreeBSD.ORG Thu Nov 2 23:50:38 2006
Date: Fri, 3 Nov 2006 10:50:34 +1100
Message-ID: <9F7B653A50CF3D45A92C05401046239B0E0CBD@rwsrv06.rw2.riverwillow.net.au>
From: "John Marshall"
Subject: Watchdog Timeout - bge device - 6.2-PRERELEASE
List-Id: Production branch of FreeBSD source code

rwsrv05> dmesg | grep bge
bge0: mem 0xe8200000-0xe820ffff irq 17 at device 4.0 on pci4
miibus1: on bge0
bge0: Ethernet address: 00:0b:cd:e7:70:19
bge0: link state changed to UP
bge0: watchdog timeout -- resetting
bge0: link state changed to DOWN
bge0: link state changed to UP
bge0: watchdog timeout -- resetting
bge0: link state changed to DOWN
bge0: link state changed to UP
bge0: watchdog timeout -- resetting
bge0: link state changed to DOWN
bge0: link state changed to UP

This is happening, on average, once per day. It happens when the bge0
interface is under load. I cannot reproduce it at will.

I posted here about a month ago when I was seeing this problem under
SCHED_ULE.

http://lists.freebsd.org/pipermail/freebsd-stable/2006-October/029079.html

Having been duly castigated for using SCHED_ULE, I reverted to
SCHED_4BSD and kept quiet. The symptoms are back! (less frequently)
under SCHED_4BSD - but the kernel now has lots of extras.

In order to help with testing 6.2-PRERELEASE, I've been loading up
drivers for bits of the hardware which I don't even use. That has
brought to light a shared interrupt which may or may not have some
relevance. I'm also now running SMP. I've also compiled in INVARIANTS
on the understanding that it's supposed to provide helpful debugging
information for this issue (but I don't know how to use it - and I
haven't seen any extra clues).

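For anyone wanting to check a box for the kind of shared interrupt mentioned above, here is a rough sketch in Python. It is not part of the original session: the function name `shared_irqs` and the regular expression are illustrative assumptions, and it only understands probe lines of the form "devN: ... irq N at device ..." as they appear in this post's dmesg output.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the dmesg output quoted in this post:
#   "bge0: mem 0xe8200000-0xe820ffff irq 17 at device 4.0 on pci4"
IRQ_LINE = re.compile(r"^(?P<dev>\w+):.*\birq (?P<irq>\d+) at device")


def shared_irqs(dmesg_text):
    """Return {irq: [devices]} for every IRQ claimed by more than one device."""
    by_irq = defaultdict(list)
    for line in dmesg_text.splitlines():
        m = IRQ_LINE.match(line.strip())
        if not m:
            continue  # not a probe line carrying an IRQ assignment
        irq, dev = int(m.group("irq")), m.group("dev")
        if dev not in by_irq[irq]:
            by_irq[irq].append(dev)
    # Keep only the interrupt lines that have more than one claimant.
    return {irq: devs for irq, devs in by_irq.items() if len(devs) > 1}


# The two "irq 17" probe lines reported on rwsrv05.
SAMPLE = """\
bge0: mem 0xe8200000-0xe820ffff irq 17 at device 4.0 on pci4
ichsmb0: port 0x1440-0x145f irq 17 at device 31.3 on pci0
"""

print(shared_irqs(SAMPLE))  # {17: ['bge0', 'ichsmb0']}
```

Feeding it the full text of dmesg instead of SAMPLE should list every shared line, but it says nothing about whether the sharing is actually the cause of the watchdog timeouts.
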
Hardware: hp ProLiant ML110

rwsrv05> vmstat -i
interrupt                          total       rate
irq1: atkbd0                         546          0
irq6: fdc0                             9          0
irq14: ata0                       156756          2
irq15: ata1                           47          0
irq17: bge0+                    18518341        309
irq24: fxp0                        78098          1
irq26: mpt0                       851102         14
cpu0: timer                    119569853       2000
cpu1: timer                    119555276       1999
Total                          258730028       4327

rwsrv05> dmesg | grep 'irq 17'
bge0: mem 0xe8200000-0xe820ffff irq 17 at device 4.0 on pci4
ichsmb0: port 0x1440-0x145f irq 17 at device 31.3 on pci0

rwsrv05> sysctl kern.version kern.sched kern.smp hw.machine hw.model dev.bge
kern.version: FreeBSD 6.2-PRERELEASE #0: Tue Oct 31 21:30:38 AEDT 2006
    root@rwsrv05.mby.riverwillow.net.au:/spare/obj/usr/src/sys/RWSRV05
kern.sched.name: 4BSD
kern.sched.quantum: 100000
kern.sched.ipiwakeup.enabled: 1
kern.sched.ipiwakeup.requested: 2
kern.sched.ipiwakeup.delivered: 2
kern.sched.ipiwakeup.usemask: 1
kern.sched.ipiwakeup.useloop: 0
kern.sched.ipiwakeup.onecpu: 0
kern.sched.ipiwakeup.htt2: 0
kern.sched.followon: 0
kern.sched.pfollowons: 0
kern.sched.kgfollowons: 0
kern.sched.preemption: 1
kern.sched.runq_fuzz: 1
kern.smp.maxcpus: 16
kern.smp.active: 1
kern.smp.disabled: 0
kern.smp.cpus: 2
kern.smp.forward_signal_enabled: 1
kern.smp.forward_roundrobin_enabled: 1
hw.machine: i386
hw.model: Intel(R) Pentium(R) 4 CPU 2.80GHz
dev.bge.0.%desc: Broadcom BCM5705 A3, ASIC rev. 0x3003
dev.bge.0.%driver: bge
dev.bge.0.%location: slot=4 function=0
dev.bge.0.%pnpinfo: vendor=0x14e4 device=0x1654 subvendor=0x103c subdevice=0x1654 class=0x020000
dev.bge.0.%parent: pci4
rwsrv05>

Here's what I've added to the kernel config since 4th October...

rwsrv05> rcsdiff -u -r1.9 -r1.18 RWSRV05 | grep ^+
===================================================================
RCS file: RCS/RWSRV05,v
retrieving revision 1.9
retrieving revision 1.18
diff -u -r1.9 -r1.18
+++ RWSRV05 2006/10/31 10:24:01 1.18
+# $Id: RWSRV05,v 1.18 2006/10/31 10:24:01 john Exp $
+options INVARIANT_SUPPORT
+options INVARIANTS
+options SMP # Symmetric MultiProcessor Kernel
+#options SCHED_ULE # ULE scheduler
+options SCHED_4BSD # 4BSD scheduler
+
+options NFSSERVER # Network File System server
+options NFSCLIENT # Network File System client
+
+# USB support
+device usb # General USB code (mandatory for USB)
+device uhci # UHCI controller
+device ehci # EHCI controller
+
+# SMB bus
+device smbus # Bus support, required for smb below.
+# ichsmb Intel ICH SMBus controller chips (82801AA, 82801AB, 82801BA)
+device ichsmb
+device smb
+
+# AGP GART support
+device agp
+
+# Direct Rendering modules for 3D acceleration
+device drm # DRM core module required by DRM drivers
+device mach64drm # ATI Rage Pro, Rage Mobility P/M, Rage XL
+
+# ichwd: Intel ICH watchdog timer
+device ichwd
rwsrv05>

I'm not actually using this extra stuff. I just thought it might be
helpful (to FreeBSD) to find drivers for all my hardware to see if
anything was broken.

John Marshall.