Date:      Mon, 17 Feb 2014 14:41:17 -0700
From:      Kevin Bowling <kevin.bowling@kev009.com>
To:        freebsd-net@freebsd.org
Subject:   Re: FreeBSD 10 network flapping, ix driver unreliable?
Message-ID:  <ldtvlk$kuc$1@ger.gmane.org>
In-Reply-To: <CE04609E-3C64-42A1-A2E7-BE7E0518AD32@neville-neil.com>
References:  <ldohqb$s2c$1@ger.gmane.org> <61748F81-A763-4504-BC81-132D394F0170@neville-neil.com> <ldp7vp$hf7$1@ger.gmane.org> <CE04609E-3C64-42A1-A2E7-BE7E0518AD32@neville-neil.com>

On 2/16/2014 9:04 PM, George Neville-Neil wrote:
>
> On Feb 15, 2014, at 21:32 , Kevin Bowling <kevin.bowling@kev009.com> wrote:
>
>> On 2/15/2014 4:43 PM, George Neville-Neil wrote:
>>>
>>> On Feb 15, 2014, at 15:14 , Kevin Bowling <kevin.bowling@kev009.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have FreeBSD 10.0-RELEASE installed on two Dell C6100 nodes.  Each node has an Intel X520-DA2 dual port 10gig card.  One of the ports on each go to a switch using direct attach coaxial cables.  The other port is directly connected between the two nodes (think crossover in twisted pair terminology) again using direct attach coaxial cables.
>>>>
>>>> On both machines, and on both ports (including the "crossover"), the links flap several times per day.
>>>>
>>>> I've pasted the output of lspci -vv and dmesg here:
>>>> https://gist.github.com/kev009/9024442
>>>>
>>>> There's nothing outstanding about the setup otherwise.  I suspected some interaction with the switch initially but the "crossover" has eliminated that suspicion.
>>>>
>>>> It seems the ix driver is not very reliable under common conditions, i.e. https://forums.freebsd.org/viewtopic.php?f=7&t=44570 and a search of this list.  Any recommendations or tests?
>>>>
>>>
>>> Can you post (to your gist link) the output of sysctl dev.ix ?
>>
>> Hi George,
>>
>> sysctl info added to gist link.  ix0 has been up for around 27 days. ix1 for about 24hrs.
>>
>
> I think this has something to do with it.
>
> dev.ix.0.mac_stats.local_faults: 314
> dev.ix.0.mac_stats.remote_faults: 41
>
> The device is seeing errors at the MAC layer, which I don’t think a driver bug would
> cause, though there is always the possibility of a misconfiguration causing flapping.
> Can you try different cables?
>
> When you hook it to the switch does the switch give better diagnostics?  Reading
> over the Intel 82599 chip manual is not, shall we say, illuminating,
> "Number of faults in the local MAC. This register is valid only when the link speed is 10 Gb/s."

I appreciate your help. This led me to find some new info, although it
doesn't entirely answer what local_faults are for me:
http://grouper.ieee.org/groups/802/3/ae/public/nov00/taborek_2_1100.pdf

I may have spoken too soon: the "crossover" ix1 seems to be holding
steady, so the local and remote faults there must have occurred during
negotiation and while I was bringing up the interfaces.

On the other system's ix0, the faults are almost all local and quite a 
bit more frequent:
dev.ix.0.mac_stats.local_faults: 10752
dev.ix.0.mac_stats.remote_faults: 2

I then noticed the switch had mandatory flow control on both send and 
receive for 10gig, but the FreeBSD box was only negotiating receive flow 
control.  I disabled both on the switch and rebooted but am still seeing 
some increments of local_faults.
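
To put numbers on those increments, two snapshots of the counters could
be diffed. The following is only a rough Python sketch under stated
assumptions: the helper names (parse_faults, fault_deltas, snapshot) are
made up for illustration, and it assumes the counters show up in
"sysctl dev.ix" output in the same form as the lines quoted above.

```python
import re
import subprocess

# Matches lines such as "dev.ix.0.mac_stats.local_faults: 314"
FAULT_RE = re.compile(
    r"^dev\.ix\.(\d+)\.mac_stats\.(local|remote)_faults:\s*(\d+)\s*$"
)


def parse_faults(sysctl_text):
    """Return {(unit, kind): count} parsed from sysctl-style text."""
    counts = {}
    for line in sysctl_text.splitlines():
        m = FAULT_RE.match(line.strip())
        if m:
            counts[(int(m.group(1)), m.group(2))] = int(m.group(3))
    return counts


def fault_deltas(before, after):
    """Counter increase between two snapshots, per (unit, kind)."""
    return {key: after[key] - before.get(key, 0) for key in after}


def snapshot():
    """Hypothetical helper: read the live counters (FreeBSD host with ix)."""
    out = subprocess.run(
        ["sysctl", "dev.ix"], capture_output=True, text=True, check=True
    ).stdout
    return parse_faults(out)
```

Calling snapshot() twice some minutes apart and passing both results to
fault_deltas() would show whether local_faults climbs steadily or only
jumps around link events.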

Could it be a switch STP problem?  The switch is a Cisco 4948-10ge.  The
config looks like the following, which is working well on some copper
gigabit interfaces:

spanning-tree mode pvst
spanning-tree portfast default
spanning-tree extend system-id
!
interface TenGigabitEthernet1/49
  switchport trunk encapsulation dot1q
  switchport mode trunk
  spanning-tree portfast trunk
!
interface TenGigabitEthernet1/50
  switchport trunk encapsulation dot1q
  switchport mode trunk
  flowcontrol receive desired
  flowcontrol send desired
  spanning-tree portfast trunk
!

It will be hard for me to source SFPs and fiber, but I can try to see if
it's a physical layer problem.  In the meantime I might try imaging one
of the systems with a different OS and seeing if the problem persists.

Regards,
Kevin Bowling




