Date:      Tue, 18 Feb 2014 09:16:32 -0500
From:      George Neville-Neil <gnn@neville-neil.com>
To:        Kevin Bowling <kevin.bowling@kev009.com>
Cc:        FreeBSD Net <freebsd-net@freebsd.org>
Subject:   Re: FreeBSD 10 network flapping, ix driver unreliable?
Message-ID:  <11F52C6F-1A9C-4D5B-8364-AFB62322CB91@neville-neil.com>
In-Reply-To: <ldtvlk$kuc$1@ger.gmane.org>
References:  <ldohqb$s2c$1@ger.gmane.org> <61748F81-A763-4504-BC81-132D394F0170@neville-neil.com> <ldp7vp$hf7$1@ger.gmane.org> <CE04609E-3C64-42A1-A2E7-BE7E0518AD32@neville-neil.com> <ldtvlk$kuc$1@ger.gmane.org>

On Feb 17, 2014, at 16:41 , Kevin Bowling <kevin.bowling@kev009.com> wrote:

> On 2/16/2014 9:04 PM, George Neville-Neil wrote:
>>
>> On Feb 15, 2014, at 21:32 , Kevin Bowling <kevin.bowling@kev009.com> wrote:
>>
>>> On 2/15/2014 4:43 PM, George Neville-Neil wrote:
>>>>
>>>> On Feb 15, 2014, at 15:14 , Kevin Bowling <kevin.bowling@kev009.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have FreeBSD 10.0-RELEASE installed on two Dell C6100 nodes.  Each node has an Intel X520-DA2 dual port 10gig card.  One of the ports on each goes to a switch using direct attach coaxial cables.  The other port is directly connected between the two nodes (think crossover in twisted pair terminology) again using direct attach coaxial cables.
>>>>>
>>>>> On both machines, and on both ports (including the "crossover"), the links flap several times per day.
>>>>>
>>>>> I've pasted the output of lspci -vv and dmesg here:
>>>>> https://gist.github.com/kev009/9024442
>>>>>
>>>>> There's nothing outstanding about the setup otherwise.  I suspected some interaction with the switch initially but the "crossover" has eliminated that suspicion.
>>>>>
>>>>> It seems the ix driver is not very reliable under common conditions, i.e. https://forums.freebsd.org/viewtopic.php?f=7&t=44570 and a search of this list.  Any recommendations or tests?
>>>>>
>>>>
>>>> Can you post (to your gist link) the output of sysctl dev.ix ?
>>>
>>> Hi George,
>>>
>>> sysctl info added to gist link.  ix0 has been up for around 27 days. ix1 for about 24hrs.
>>>
>>
>> I think this has something to do with it.
>>
>> dev.ix.0.mac_stats.local_faults: 314
>> dev.ix.0.mac_stats.remote_faults: 41
>>
>> The device is seeing errors at the MAC layer, which I don't think a driver bug would
>> cause, though there is always the possibility of a misconfiguration causing flapping.
>> Can you try different cables?
>>
>> When you hook it to the switch does the switch give better diagnostics?  Reading
>> over the Intel 82599 chip manual is not, shall we say, illuminating,
>> "Number of faults in the local MAC. This register is valid only when the link speed is 10 Gb/s."
>
> Appreciate your help, this led me to find some new info although it doesn't entirely answer what local_faults are for me: http://grouper.ieee.org/groups/802/3/ae/public/nov00/taborek_2_1100.pdf
>
> I may have spoken too soon, the "crossover" ix1 seems to be holding steady, so the local and remote faults must have been during negotiation and me bringing up the interfaces.
>
> On the other system's ix0, the faults are almost all local and quite a bit more frequent:
> dev.ix.0.mac_stats.local_faults: 10752
> dev.ix.0.mac_stats.remote_faults: 2
>
> I then noticed the switch had mandatory flow control on both send and receive for 10gig, but the FreeBSD box was only negotiating receive flow control.  I disabled both on the switch and rebooted but am still seeing some increments of local_faults.
>
> Could it be a switch STP problem?  Switch is a Cisco 4948-10ge.  Configs look like below, which is working well on some copper gigabit interfaces:
>
> spanning-tree mode pvst
> spanning-tree portfast default
> spanning-tree extend system-id
> !
> interface TenGigabitEthernet1/49
> switchport trunk encapsulation dot1q
> switchport mode trunk
> spanning-tree portfast trunk
> !
> interface TenGigabitEthernet1/50
> switchport trunk encapsulation dot1q
> switchport mode trunk
> flowcontrol receive desired
> flowcontrol send desired
> spanning-tree portfast trunk
> !
>
> It will be hard for me to source SFPs and fiber, but I can try to see if it's a physical layer problem.  In the meantime I might try imaging one of the systems with a different OS and seeing if the problem persists.
>

Another possibility is flow control.

Can you try this setting?

sysctl dev.ix.0.fc=0
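
One rough way to judge whether the setting makes a difference is to measure how fast the fault counters grow before and after changing it. The sketch below assumes the counter names from the sysctl output quoted earlier in this thread (dev.ix.N.mac_stats.local_faults and remote_faults); the function names and their arguments are made up for illustration and are not part of the driver or of sysctl.

```shell
#!/bin/sh
# Sketch only: report how much the ix MAC fault counters grow over an interval.
# ix_counter and ix_fault_delta are hypothetical helper names for this example.

# Print the current value of one mac_stats counter.
# $1 = ix unit number (0 for ix0), $2 = counter name under mac_stats
ix_counter() {
    sysctl -n "dev.ix.$1.mac_stats.$2"
}

# Print the growth of local_faults and remote_faults over an interval.
# $1 = ix unit number, $2 = interval in seconds
ix_fault_delta() {
    unit=$1
    interval=$2
    l0=$(ix_counter "$unit" local_faults)
    r0=$(ix_counter "$unit" remote_faults)
    sleep "$interval"
    l1=$(ix_counter "$unit" local_faults)
    r1=$(ix_counter "$unit" remote_faults)
    echo "ix$unit local_faults +$((l1 - l0)) remote_faults +$((r1 - r0))"
}
```

For example, running `ix_fault_delta 0 600` once with the current setting and once after changing it would give two ten-minute samples for ix0 to compare. A short sample can mislead if the faults arrive in bursts, so longer intervals are probably more telling.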

Best,
George



