Date:      Tue, 29 Aug 2006 18:43:37 +0100
From:      Sam Eaton <sam@fqdn.net>
To:        freebsd-current@freebsd.org
Subject:   [sam@fqdn.net: bce0 watchdog timeout errors]
Message-ID:  <20060829174337.GA60234@host.fqdn.net>


--C7zPtVaVf+AK4Oqc
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Looking through the archives, I see that Julian Elischer has reported
something *vaguely* similar to the problems described in my attached
message to this list, so I thought I'd join in.

Julian reports a problem in which the bce interface, when disconnected
and then reconnected under load, reports watchdog timeout errors and
never comes back until all load is removed.

I see something rather like this (details are in the attached message;
hopefully the attachment will survive :)), but without physically
disconnecting the cable.

I am wondering if I'm actually seeing the same problem, but rather than
the cable being unplugged, something like the port on our (old,
100Mb/s) switch resetting is triggering it.

Thought it was worth offering another data point.  I'm running the most
recent version of the bce driver with the changes to fix the 'mbuf'
errors.

Thanks,

Sam.
-- 
"Fortified with Essential Bitterness and Sarcasm"
    Matt Groening, "Binky's Guide to Love".

--C7zPtVaVf+AK4Oqc
Content-Type: message/rfc822
Content-Disposition: inline

Date: Tue, 29 Aug 2006 17:52:34 +0100
From: Sam Eaton <sam@fqdn.net>
To: freebsd-stable@freebsd.org
Subject: bce0 watchdog timeout errors
Message-ID: <20060829165234.GA15988@host.fqdn.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4.2.1i

I'm still seeing an ongoing problem with the bce device on my Dell 1950.  

I'm running AMD64 6-STABLE, with the stock SMP kernel, and I'm running
the most recent version of the bce driver, which did cure the other
errors we were seeing (the mbuf related ones).

The card is currently connected at an auto-negotiated 100BaseTX full
duplex (rather than gigabit) as we don't currently have a gigabit switch
to test on (the machine is under test rather than deployed).

I can consistently cause the system to go into a 'Watchdog timeout
occurred, resetting!' loop by doing any reasonable amount of work over
an NFS-mounted filesystem.

An easy way for me to reproduce this is to try to build some reasonably
large port from our NFS-mounted copy of the ports tree.

I can also cause this by running bonnie++ against an NFS-mounted
filesystem.

So far I've failed to find a simpler, network-only test that triggers
the problem (I've tried sshing large amounts of data back and forth,
iperf, ping floods, etc.).  NFS seems to do the trick every time, though.

Once it's reported the watchdog timeout, the networking on the box never
recovers.

Is anyone else seeing anything similar?  And does anyone have any
suggestions as to what I can do to try and diagnose this further so we
can get to the bottom of it?

Thanks,

Sam.

--C7zPtVaVf+AK4Oqc--


