Date:      Wed, 08 Nov 2006 09:26:26 -0700
From:      Scott Long <scottl@samsco.org>
To:        Nikolay Pavlov <quetzal@zone3000.net>, Jack Vogel <jfvogel@gmail.com>, Adrian Chadd <adrian@freebsd.org>, freebsd-stable@freebsd.org
Subject:   Re: em driver testing
Message-ID:  <45520532.3000603@samsco.org>
In-Reply-To: <20061108154102.GA40238@icarus.home.lan>
References:  <68011C68-0962-4946-88E1-F36EE7C707DA@redstarling.com>	<20061106221219.GA66676@hugo10.ka.punkt.de>	<041201c701f9$37b2aed0$9603a8c0@claylaptop>	<2a41acea0611061614n478efe77y82c0ebc2e1b01e19@mail.gmail.com>	<d763ac660611062047n67058489jeca8d4c79e8c7490@mail.gmail.com>	<2a41acea0611062242h42b1bde6w711e9a5039ed1a90@mail.gmail.com>	<20061108144003.GA43734@zone3000.net> <20061108154102.GA40238@icarus.home.lan>

Jeremy Chadwick wrote:
> On Wed, Nov 08, 2006 at 04:40:03PM +0200, Nikolay Pavlov wrote:
>> Well i have 5.5 box with very similar symptomatic :)
>> I do not see watchdog timeouts on it, but a lot of UP/DOWN events.
> 
> Are you sure this is the same problem as what's being discussed
> here?  If you revert to a previous kernel or em driver, does the
> problem (link up/down) go away?  Are you sure you don't actually
> have a flaky cable or RJ45 connector?  What does the switch your
> NIC is connected to say? (does it show link going up and down)
> 
> I feel horrible for both Scott and Jack -- I think there's tons
> of people coming out of the woodwork with "ME TOO" comments who
> may in fact be suffering from other problems, and are looking for
> a scapegoat thread.
> 

The timeout/watchdog mechanism in the interface layer has been a problem
ever since the MPSAFE work was done on the network stack.  It's prone to
races, and as the OS has improved and gotten faster over the past 2
years, those races have gotten bigger.  In a way, it's actually a
positive indication of progress and improvement =-)

I don't doubt that there are users with other problems.  We spent some
time collecting as much user data as we could in order to find patterns
and weed out the uncommon cases.  But this timer/watchdog thing looks to
be a strong candidate for being the root cause of many of the problems.
We'll continue to investigate these problems and address other drivers.

Scott


