Date:      Wed, 27 Sep 2006 14:42:30 +0200
From:      Philippe Pegon <Philippe.Pegon@crc.u-strasbg.fr>
To:        "Patrick M. Hausen" <hausen@punkt.de>
Cc:        freebsd-stable@freebsd.org, Oliver Brandmueller <ob@e-Gitt.NET>
Subject:   Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2
Message-ID:  <451A71B6.6040201@crc.u-strasbg.fr>
In-Reply-To: <20060927094509.GB75104@hugo10.ka.punkt.de>
References:  <451A1375.5080202@gneto.com> <20060927071538.GF22229@e-Gitt.NET>	<451A4189.5020906@samsco.org> <20060927094509.GB75104@hugo10.ka.punkt.de>

Hi,

This is just a "me too". On our FTP server (ftp8.fr.freebsd.org) we
sometimes see "watchdog timeout" messages in the log with a bge card,
though it may not be the same problem:

/var/log/messages:Sep 23 02:47:06 anubis kernel: bge1: watchdog timeout -- resetting
/var/log/messages:Sep 23 02:47:06 anubis kernel: bge1: link state changed to DOWN
/var/log/messages:Sep 23 02:47:11 anubis kernel: bge1: link state changed to UP
/var/log/messages.0.bz2:Sep 12 22:22:48 anubis kernel: bge1: watchdog timeout -- resetting
/var/log/messages.0.bz2:Sep 12 22:22:48 anubis kernel: bge1: link state changed to DOWN
/var/log/messages.0.bz2:Sep 12 22:22:51 anubis kernel: bge1: link state changed to UP
/var/log/messages.0.bz2:Sep 17 15:22:01 anubis kernel: bge1: watchdog timeout -- resetting
/var/log/messages.0.bz2:Sep 17 15:22:01 anubis kernel: bge1: link state changed to DOWN
/var/log/messages.0.bz2:Sep 17 15:22:06 anubis kernel: bge1: link state changed to UP
/var/log/messages.0.bz2:Sep 20 12:13:07 anubis kernel: bge1: watchdog timeout -- resetting
/var/log/messages.0.bz2:Sep 20 12:13:07 anubis kernel: bge1: link state changed to DOWN
/var/log/messages.0.bz2:Sep 20 12:13:11 anubis kernel: bge1: link state changed to UP
/var/log/messages.1.bz2:Sep  6 08:33:54 anubis kernel: bge1: watchdog timeout -- resetting
/var/log/messages.1.bz2:Sep  6 08:33:54 anubis kernel: bge1: link state changed to DOWN
/var/log/messages.1.bz2:Sep  6 08:33:59 anubis kernel: bge1: link state changed to UP
/var/log/messages.2.bz2:Sep  4 17:39:25 anubis kernel: bge1: link state changed to DOWN
/var/log/messages.2.bz2:Sep  4 17:39:28 anubis kernel: bge1: link state changed to UP
/var/log/messages.3.bz2:Aug 29 12:09:36 anubis kernel: bge0: watchdog timeout -- resetting
/var/log/messages.3.bz2:Aug 29 12:09:36 anubis kernel: bge0: link state changed to DOWN
/var/log/messages.3.bz2:Aug 29 12:09:41 anubis kernel: bge0: link state changed to UP
/var/log/messages.4.bz2:Aug 22 15:44:00 anubis kernel: bge0: watchdog timeout -- resetting
/var/log/messages.4.bz2:Aug 22 15:44:00 anubis kernel: bge0: link state changed to DOWN
/var/log/messages.4.bz2:Aug 22 15:44:03 anubis kernel: bge0: link state changed to UP
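
To compare reports like this across machines, a small script can tally the resets per interface and measure how long each link stayed down. The following is only a minimal sketch: it assumes the line layout shown above (file name prefix, syslog timestamp, host, "kernel:", interface name, message), and the names LINE_RE, to_seconds and summarize are made up for illustration. It only pairs a DOWN with an UP on the same day.

```python
import re
from collections import Counter

# Assumed line layout (taken from the excerpt above):
# /var/log/messages:Sep 23 02:47:06 anubis kernel: bge1: watchdog timeout -- resetting
LINE_RE = re.compile(
    r"(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d+)\s+(?P<time>\d\d:\d\d:\d\d)\s+\S+\s+"
    r"kernel:\s+(?P<ifname>[a-z]+\d+):\s+(?P<msg>.*)$"
)


def to_seconds(hms):
    """Convert an HH:MM:SS string to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def summarize(lines):
    """Return (resets per interface, list of (ifname, date, seconds down))."""
    resets = Counter()
    outages = []
    down_at = {}  # ifname -> (date string, seconds since midnight)
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a kernel interface message in the assumed layout
        ifname, msg = m.group("ifname"), m.group("msg")
        date = f"{m.group('mon')} {m.group('day')}"
        now = to_seconds(m.group("time"))
        if msg.startswith("watchdog timeout"):
            resets[ifname] += 1
        elif msg.endswith("changed to DOWN"):
            down_at[ifname] = (date, now)
        elif msg.endswith("changed to UP") and ifname in down_at:
            down_date, down_time = down_at.pop(ifname)
            if down_date == date:  # ignore pairs that span midnight
                outages.append((ifname, date, now - down_time))
    return resets, outages
```

Feeding it the matching log lines (for example, as a list of strings read from a file) gives a per-interface reset count and the DOWN-to-UP gap for each event.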

--
Philippe Pegon

Patrick M. Hausen wrote:
> Hello!
> 
>> Well, the best I can say at the moment is, "Wow."  =-(  I guess the 
>> thing to do here is to figure out if the problem lies with the em 
>> interrupt handler not getting run, or the taskqueue not getting run.
> 
> I helped Pyun with some debugging by providing ssh access to
> a machine showing the (seemingly) same problem.
> 
> At first he thought the interrupt handler of the em driver was
> the culprit, but we applied quite a few patches and tested
> afterwards - seems like the driver is not the cause.
> 
> On -stable, other people have occasionally complained about very
> similar-looking problems with bge and other drivers. I'm not a kernel
> developer, just an experienced admin, but my guess is that em stands
> out as problematic just by coincidence: certain onboard network
> components tend to come with certain chipsets and certain
> architectures.
> 
> So, Pyun suggested it was a problem with the taskqueue that was
> introduced some time between 6.0 and 6.1.
> 
> With my system (Tyan GT20 B5161G20) the problem shows up when there
> is heavy disk and CPU activity, like "make buildworld".
> I made sure that the em interface doesn't share an interrupt
> with the SATA controller. When the problem occurs, I get the
> well-known "watchdog timeout" messages and then the system's
> network activity over that interface freezes completely for
> a couple of minutes.
> Usually the system recovers after a while without a reboot or
> other measures.
> 
> What I can do: give ssh access to a system showing this behaviour
> including a network connection to another box, so one can transfer
> large amounts of data over a private LAN. I used FTP of a sparse
> big file.
> 
> Prerequisite: a fixed IP address for the machine that the developer
> wishes to use to connect to my system.
> 
> HTH,
> Patrick



