Date:      Wed, 8 Nov 2006 18:46:44 +0200
From:      Nikolay Pavlov <quetzal@zone3000.net>
To:        Jack Vogel <jfvogel@gmail.com>, Adrian Chadd <adrian@freebsd.org>, freebsd-stable@freebsd.org
Subject:   Re: em driver testing
Message-ID:  <20061108164644.GA50151@zone3000.net>
In-Reply-To: <20061108154102.GA40238@icarus.home.lan>
References:  <68011C68-0962-4946-88E1-F36EE7C707DA@redstarling.com> <20061106221219.GA66676@hugo10.ka.punkt.de> <041201c701f9$37b2aed0$9603a8c0@claylaptop> <2a41acea0611061614n478efe77y82c0ebc2e1b01e19@mail.gmail.com> <d763ac660611062047n67058489jeca8d4c79e8c7490@mail.gmail.com> <2a41acea0611062242h42b1bde6w711e9a5039ed1a90@mail.gmail.com> <20061108144003.GA43734@zone3000.net> <20061108154102.GA40238@icarus.home.lan>

On Wednesday,  8 November 2006 at  7:41:02 -0800, Jeremy Chadwick wrote:
> On Wed, Nov 08, 2006 at 04:40:03PM +0200, Nikolay Pavlov wrote:
> > Well i have 5.5 box with very similar symptomatic :)
> > I do not see watchdog timeouts on it, but a lot of UP/DOWN events.
> 
> Are you sure this is the same problem as what's being discussed
> here?  If you revert to a previous kernel or em driver, does the
> problem (link up/down) go away?  Are you sure you don't actually
> have a flaky cable or RJ45 connector?  What does the switch your
> NIC is connected to say? (does it show link going up and down)

I am pretty sure. All my servers use the same em chip, and on all my 6.1
boxes, either UP or SMP, I see watchdog timeouts. The average load on these
adapters is 5000 - 6000 interrupts per second. I have only one box with
5.5 (same task and same platform), and I am not claiming that this is
exactly the watchdog problem; the symptoms are just very similar to those
under discussion. In any case, the new 6.2 em patch works for me; at least
I do not see watchdog timeouts after 48 hours of uptime.

By the way, the box is connected to a 2950 switch, and I can't find any
problems with the cabling.

Here is what it looks like on 5.5:

Oct 18 05:38:45 ms6 kernel: em0: Link is up 1000 Mbps Full Duplex
Oct 18 05:38:50 ms6 kernel: nfs server 206.53.x.x:/usr/home/shared: not responding
Oct 18 05:39:21 ms6 kernel: nfs server 206.53.x.x:/usr/home/shared: not responding
Oct 18 05:39:32 ms6 kernel: nfs server 206.53.x.x:/usr/home/shared: is alive again
Oct 18 05:52:22 ms6 kernel: em0: Link is Down
Oct 18 05:55:13 ms6 kernel: em0: Link is up 1000 Mbps Full Duplex
Oct 18 05:55:13 ms6 kernel: nfs server 206.53.x.x:/usr/home/shared: not responding
Oct 18 05:55:44 ms6 kernel: nfs server 206.53.x.x:/usr/home/shared: not responding
Oct 18 05:55:46 ms6 kernel: nfs server 206.53.x.x:/usr/home/shared: is alive again
Oct 18 06:01:52 ms6 kernel: em0: Link is Down
Oct 18 06:03:54 ms6 kernel: em0: Link is up 1000 Mbps Full Duplex
Oct 18 06:03:54 ms6 kernel: em0: Link is Down
Oct 18 06:04:01 ms6 kernel: em0: Link is up 1000 Mbps Full Duplex
Oct 18 06:16:07 ms6 kernel: em0: Link is Down
Oct 18 06:18:16 ms6 kernel: em0: Link is up 1000 Mbps Full Duplex
Oct 18 06:21:55 ms6 kernel: em0: Link is Down
Oct 18 06:25:12 ms6 kernel: nfs server 206.53.x.x:/usr/home/shared: not responding
Oct 18 06:25:25 ms6 kernel: em0: Link is up 1000 Mbps Full Duplex
Oct 18 06:25:27 ms6 kernel: em0: Link is Down
Oct 18 06:25:33 ms6 kernel: em0: Link is up 1000 Mbps Full Duplex
Oct 18 06:25:43 ms6 kernel: nfs server 206.53.x.x:/usr/home/shared: not responding
Oct 18 06:26:10 ms6 kernel: nfs server 206.53.x.x:/usr/home/shared: is alive again
Oct 18 06:43:12 ms6 kernel: em0: Link is Down
Oct 18 06:45:13 ms6 kernel: nfs server 206.53.x.x:/usr/home/shared: not responding
Oct 18 06:45:44 ms6 kernel: nfs server 206.53.x.x:/usr/home/shared: not responding
Oct 18 06:46:15 ms6 kernel: nfs server 206.53.x.x:/usr/home/shared: not responding
Oct 18 06:46:27 ms6 kernel: em0: Link is up 1000 Mbps Full Duplex
Oct 18 06:46:28 ms6 kernel: em0: Link is Down
Oct 18 06:46:34 ms6 kernel: em0: Link is up 1000 Mbps Full Duplex
Oct 18 06:46:46 ms6 kernel: nfs server 206.53.x.x:/usr/home/shared: not responding
Oct 18 06:47:17 ms6 kernel: nfs server 206.53.x.x:/usr/home/shared: not responding
Oct 18 06:47:26 ms6 kernel: nfs server 206.53.x.x:/usr/home/shared: is alive again
Oct 18 07:02:51 ms6 kernel: em0: Link is Down
Oct 18 07:04:42 ms6 kernel: em0: Link is up 1000 Mbps Full Duplex
Oct 18 07:04:44 ms6 kernel: em0: Link is Down
Oct 18 07:04:50 ms6 kernel: em0: Link is up 1000 Mbps Full Duplex
Oct 18 07:05:13 ms6 kernel: nfs server 206.53.x.x:/usr/home/shared: not responding
Oct 18 07:05:25 ms6 kernel: nfs server 206.53.x.x:/usr/home/shared: is alive again
Oct 18 16:40:05 ms6 kernel: receive error 60 from nfs server 206.53.x.x:/usr/home/shared
Oct 19 03:55:13 ms6 kernel: nfs server 206.53.x.x:/usr/home/shared: not responding
Oct 19 03:55:15 ms6 kernel: nfs server 206.53.x.x:/usr/home/shared: is alive again
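
For anyone who wants to measure flapping like this rather than eyeball it, here is a minimal sketch that pairs each "Link is Down" line with the next "Link is up" line for the same interface and reports how long the link stayed down. The function name link_outages, the SAMPLE list, and the year argument are assumptions made for illustration only (syslog timestamps carry no year); the regular expression is written against the message format shown in the log above and may need adjusting for other driver versions.

```python
import re
from datetime import datetime

# Matches lines such as:
#   Oct 18 05:52:22 ms6 kernel: em0: Link is Down
#   Oct 18 05:55:13 ms6 kernel: em0: Link is up 1000 Mbps Full Duplex
LINK_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d) \S+ kernel: (\w+): Link is (up|down)",
    re.IGNORECASE,
)


def link_outages(lines, year=2006):
    """Return (iface, seconds_down) for every Down -> up pair.

    `year` is an assumption: syslog lines do not record it.
    """
    down_at = {}   # interface name -> time the link last went down
    outages = []
    for line in lines:
        m = LINK_RE.match(line)
        if not m:
            continue  # NFS messages and anything else are skipped
        stamp, iface, state = m.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if state.lower() == "down":
            down_at[iface] = when
        elif iface in down_at:
            started = down_at.pop(iface)
            outages.append((iface, int((when - started).total_seconds())))
    return outages


# A few lines copied from the log above.
SAMPLE = [
    "Oct 18 05:52:22 ms6 kernel: em0: Link is Down",
    "Oct 18 05:55:13 ms6 kernel: em0: Link is up 1000 Mbps Full Duplex",
    "Oct 18 05:55:13 ms6 kernel: nfs server 206.53.x.x:/usr/home/shared: not responding",
    "Oct 18 06:01:52 ms6 kernel: em0: Link is Down",
    "Oct 18 06:03:54 ms6 kernel: em0: Link is up 1000 Mbps Full Duplex",
    "Oct 18 06:03:54 ms6 kernel: em0: Link is Down",
    "Oct 18 06:04:01 ms6 kernel: em0: Link is up 1000 Mbps Full Duplex",
]

if __name__ == "__main__":
    for iface, seconds in link_outages(SAMPLE):
        print(f"{iface}: link down for {seconds} s")
```

Fed the whole of /var/log/messages instead of SAMPLE, the same function would give a per-interface list of outage lengths that can be compared before and after applying a driver patch.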

After that date the box was rebooted at least three times, and I don't
see such symptoms any more.

> 
> I feel horrible for both Scott and Jack -- I think there's tons
> of people coming out of the woodwork with "ME TOO" comments who
> may in fact be suffering from other problems, and are looking for
> a scapegoat thread.

Just ignore me. The patch works for me, and that is the end of it.

-- 
======================================================================  
- Best regards, Nikolay Pavlov. <<<-----------------------------------    
======================================================================  



