Date:      Thu, 7 Jul 2011 21:51:05 -0400 (EDT)
From:      Charles Sprickman <spork@bway.net>
To:        YongHyeon PYUN <pyunyh@gmail.com>
Cc:        freebsd-net@freebsd.org, David Christensen <davidch@freebsd.org>
Subject:   Re: bce packet loss
Message-ID:  <alpine.OSX.2.00.1107072129310.2407@freemac>
In-Reply-To: <20110707174233.GB8702@michelle.cdnetworks.com>
References:  <alpine.OSX.2.00.1107042113000.2407@freemac> <20110706201509.GA5559@michelle.cdnetworks.com> <alpine.OSX.2.00.1107070121060.2407@freemac> <20110707174233.GB8702@michelle.cdnetworks.com>

On Thu, 7 Jul 2011, YongHyeon PYUN wrote:

> On Thu, Jul 07, 2011 at 02:00:26AM -0400, Charles Sprickman wrote:
>> More inline, including a bigger picture of what I'm seeing on some other
>> hosts, but I wanted to thank everyone for all the fascinating ethernet BER
>> info and the final explanation of what the "IfHCInBadOctets" counter
>> represents.  Interesting stuff.
>>
>> On Wed, 6 Jul 2011, YongHyeon PYUN wrote:
>>
>>> On Mon, Jul 04, 2011 at 09:32:11PM -0400, Charles Sprickman wrote:
>>>> Hello,
>>>>
>>>> We're running a few 8.1-R servers with Broadcom bce interfaces (Dell R510)
>>>> and I'm seeing occasional packet loss on them (enough that it trips nagios
>>>> now and then).  Cabling seems fine as neither the switch nor the sysctl
>>>> info for the device show any errors/collisions/etc, however there is one
>>>> odd one, which is "dev.bce.1.stat_IfHCInBadOctets: 539369".  See [1] below
>>>> for full sysctl output.  The switch shows no errors but for "Dropped
>>>> packets 683868".
>>>>
>>>> pciconf output is also below. [2]
>>>>
>>>> By default, the switch had flow control set to "on".  I also let it run
>>>> with "auto".  In both cases, the drops continued to increment.  I'm now
>>>> running with flow control off to see if that changes anything.
>>>>
>>>> I do see some correlation between cpu usage and drops - I have cpu usage
>>>> graphed in nagios and cacti is graphing the drops on the dell switch.
>>>> There's no signs of running out of mbufs or similar.
>>>>
>>>> So given that limited info, is there anything I can look at to track this
>>>> down?  Anything stand out in the stats sysctl exposes?  Two things are
>>>> standing out for me - the number of changes in bce regarding flow control
>>>> that are not in 8.1, and the correlation between cpu load and the drops.
>>>>
>>>> What other information can I provide?
>>>>
>>>
>>> You had 282 RX buffer shortages and these frames were dropped. This
>>> may explain why you see occasional packet loss. 'netstat -m' will
>>> show which size of cluster allocation failed.
>>
>> Nothing of note:
>>
>> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
>> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
>> 0/0/0 sfbufs in use (current/peak/max)
>> 0 requests for sfbufs denied
>> 0 requests for sfbufs delayed
>>
>
> Hmm... it's strange; I can't explain how you have a non-zero
> mbuf_alloc_failed_count.

Odd, but probably a red herring.  It's not really climbing much; it's at 
288 right now.
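
For what it's worth, here's roughly the sort of script I've been meaning to 
use to watch a couple of these counters over time, so I can line them up 
against the nagios alerts.  It's only a sketch -- the sysctl names are the 
ones from my earlier output (dev.bce.1.*), so adjust the unit number and the 
list if they differ:

#!/usr/bin/env python
# Sketch: poll a couple of bce(4) sysctl counters every 5 minutes and
# print the delta whenever one of them moves.  The names below are the
# ones from my earlier sysctl output; adjust for other units/hosts.
import subprocess
import time

COUNTERS = [
    "dev.bce.1.stat_IfHCInBadOctets",
    "dev.bce.1.com_no_buffers",
]

def read_counter(name):
    # 'sysctl -n' prints just the value
    out = subprocess.check_output(["sysctl", "-n", name]).decode()
    return int(out.strip())

prev = dict((name, read_counter(name)) for name in COUNTERS)
while True:
    time.sleep(300)
    for name in COUNTERS:
        cur = read_counter(name)
        if cur != prev[name]:
            print("%s %s +%d (now %d)" % (time.ctime(), name,
                                          cur - prev[name], cur))
        prev[name] = cur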

>>> However, it seems you have 0 com_no_buffers, which indicates the
>>> controller was able to receive all packets destined for this host.
>>> Your host may have lost some packets (i.e. the non-zero
>>> mbuf_alloc_failed_count), but the controller and system were still
>>> responsive to the network traffic.
>>
>> OK.  I recall seeing a thread in the -net archives where some folks had
>> the "com_no_buffers" incrementing, but I'm not seeing that at all.
>>
>>> The data sheet says IfHCInBadOctets indicates the number of octets
>>> received on the interface, including framing characters, for packets
>>> that were dropped in the MAC for any reason. I'm not sure whether this
>>> counter includes IfInFramesL2FilterDiscards, which indicates the
>>> number of good frames that have been dropped due to the L2 perfect
>>> match, broadcast, multicast or MAC control frame filters. If your
>>> switch runs STP it will periodically send BPDU packets to the STP
>>> multicast destination address 01:80:C2:00:00:00. Not sure this is the
>>> reason, though. Probably David can explain the IfHCInBadOctets
>>> counter in more detail (CCed).
>>
>> Again, thanks for that.
>>
>> If I could just ask for a bit more assistance, it would be greatly
>> appreciated.  I collected a fair bit of data and it's done nothing but
>> complicate the issue for me so far.
>>
>> -If I'm reading the switch stats correctly, most of my drops are
>> host->switch, although I'm not certain of that; these Dell 2848s have no
>
> IfHCInBadOctets is a counter for RX (i.e. switch->host), and the TX
> hardware MAC counters showed no errors at all.

Thanks.  Also of note is that if I look at the stats for the Dell via 
SNMP, the OID that the web interface labels as "dropped packets" is 
actually labelled "discards".  Don't know if that's an important 
distinction, but "discard" seems to suggest a drop with a purpose (like 
"oh some buffer is full, sorry!").

>> real cli interface to speak of.
>>
>> -I'm seeing similar drops, but not quite so bad, on other hosts.  They all
>> use the em interface, except for one other host with bge.  This particular host
>> (with the bce interface) just seems to get bad enough to trigger nagios
>> alerts (simple ping check from a host on the same switch/subnet).  All
>> these hosts are forced to 100/FD as is the switch.  The switch is our
>> external (internet facing) switch with a 100Mb connection to our upstream.
>> At *peak* our aggregate bandwidth on this switch is maybe 45Mb/s, most of
>> it outbound.  We are nowhere near saturating the switching fabric (I
>> hope).
>>
>> -There are three reasons I set the ports at 100baseTX - the old Cisco that
>> lost a few ports was a 10/100 switch and the hosts were already hard-coded
>> for 100/FD, I figured if the Dell craps out I can toss the Cisco back
>> without changing the speed/duplex on all the hosts, and lastly our uplink
>> is only 100/FD so why bother.  Also maybe some vague notion that I'd not
>> use up some kind of buffers in the switch by matching the speed on all
>> ports...
>>
>> -We have an identical switch (same model, same hardware rev, same
>> firmware) for our internal network (lots of log analysis over nfs mounts,
>> a ton of internal dns (upwards of 10K queries/sec at peak), and occasional
>> large file transfers).  On this host and all others, the dropped packet
>> count on the switch ports is at worst around 5000 packets.  The counters
>> have not been reset on it and it's been up for 460 days.
>>
>> -A bunch of legacy servers that have fxp interfaces on the external switch
>> and em on the internal switch show *no* significant drops nor do
>> the switch ports they are connected to.
>>
>> -To see if forcing the ports to 100/FD was causing a problem, I set the
>> host and switch to 1000/FD.  Over roughly 24 hours, the switch is
>> reporting 197346 dropped packets of 52166986 packets received.
>>
>> -Tonight's change was to turn off spanning tree.  This is a long shot
>> based on some Dell bug I saw discussed on their forums.  Given our simple
>> network layout, I don't really see spanning tree as being at all
>> necessary.
>>
>> One of the first replies I got to my original post was private and
>> amounted to "Dell is garbage".  That may be true, but the excellent
>> performance on the more heavily loaded internal network makes me doubt
>> there's a fundamental shortcoming in the switch.  It would have to be real
>> garbage to crap out with a combined load of 45Mb/s.  I am somewhat curious
>> if some weird buffering issue is possible with a mix of 100/FD and 1000/FD
>> ports.
>>
>> Any thoughts on that?  It's the only thing that differs between the two
>> switches.
>>
>
> This makes me think of the possibility of a duplex mismatch between
> bce(4) and the link partner. You should not use a forced media
> configuration on a 1000baseT link. If you used a manual media
> configuration on bce(4) and the link partner used auto-negotiation,
> the resolved duplex would be half-duplex. That is standard behavior,
> and a duplex mismatch can cause strange problems.
> I would check whether the link partner also agrees on the resolved
> speed/duplex of bce(4).

Many, many years ago I developed the habit of always locking speed and 
duplex at both ends.  This is the case here, and both ends agree on it 
whether it's at the original 100/FD or the current 1000/FD.
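
Just to document how I'm double-checking the host side of that, a trivial 
sketch (bce1 is assumed to be the interface facing this switch):

#!/usr/bin/env python
# Trivial sketch: print what the host thinks the media settings are.
# On FreeBSD, ifconfig reports lines like
#   media: Ethernet 1000baseT <full-duplex>
#   status: active
# "bce1" is an assumption -- substitute the interface facing the switch.
import subprocess

IFACE = "bce1"

out = subprocess.check_output(["ifconfig", IFACE]).decode()
for line in out.splitlines():
    line = line.strip()
    if line.startswith("media:") or line.startswith("status:"):
        print(line)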

>> Before replacing the switch I'm also going to cycle through turning off
>> TSO, rxcsum, and txcsum since it seems that has been a fix for some people
>> with otherwise unexplained network issues.  I assume those features all
>> depend on the firmware of the NIC being bug-free, and I'm not quite ready
>> to accept that.
>>
>
> It's worth trying, but I wonder how it would explain the ICMP ECHO
> request packet loss.

Ah, yes.  Thank you for that. :)
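
If I do end up cycling through the offload toggles anyway, it'll be 
something along these lines (just a sketch; bce1 is assumed again, and each 
flag gets re-enabled before trying the next):

#!/usr/bin/env python
# Sketch: disable one offload feature on bce1, then go watch the drop
# counters; re-enable (e.g. "ifconfig bce1 tso") before trying the next.
# "bce1" is an assumption -- substitute the interface in question.
import subprocess
import sys

IFACE = "bce1"
FLAGS = ("-tso", "-rxcsum", "-txcsum")

flag = sys.argv[1] if len(sys.argv) > 1 else FLAGS[0]
if flag not in FLAGS:
    sys.exit("expected one of: %s" % ", ".join(FLAGS))
subprocess.check_call(["ifconfig", IFACE, flag])
print("disabled %s on %s" % (flag.lstrip("-"), IFACE))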

I think this is turning into a Dell thread, though.  I had noted that the 
internal network, which is all 1000/FD hosts except for a few (dns 
resolvers) that only send sizable traffic to faster hosts, was running 
totally clean.

I was able to reproduce the drops in very large numbers on the internal 
network today.  I simply scp'd some large files from 1000/FD hosts to a 
100/FD host (i.e. scp bigfile.tgz oldhost.i:/dev/null).  Immediately the 
1000/FD hosts sending the files showed massive numbers of drops on the 
switch.  This makes me suspect that this switch might be garbage in that 
it doesn't have enough buffer space to handle forwarding large amounts of 
traffic from the GigE ports to the FE ports without randomly dropping 
packets.  Granted, I don't really understand how a "good" switch does this 
either; I would have thought TCP just took care of throttling itself.
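
Here's the back-of-the-envelope arithmetic that makes me suspicious (a 
sketch -- the 128 KB per-port buffer is a pure guess, I have no idea what 
the 2848 actually has):

#!/usr/bin/env python
# Back-of-the-envelope: how long can a switch port absorb a 1000 Mb/s
# sender feeding a 100 Mb/s receiver before its buffer overflows?
# The 128 KB per-port buffer is a guess purely for illustration.

ingress_bps = 1000e6          # GigE sender
egress_bps = 100e6            # FE receiver
buffer_bits = 128 * 1024 * 8  # assumed per-port buffer

excess_bps = ingress_bps - egress_bps      # 900 Mb/s piling up
fill_time = buffer_bits / excess_bps       # seconds until overflow
print("buffer fills in about %.1f ms" % (fill_time * 1000))
# ~1.2 ms, i.e. a single scp burst overruns it almost immediately, and
# whatever the switch can't hold has to be dropped (or pause frames go
# out, if flow control is on and honored).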

Bear in mind that on the external switch our port to our ISP, which is the 
destination of almost all the traffic, is 100/FD and not 1000/FD.

This of course does not explain why the original setup where I'd locked 
the switch ports and the host ports to 100/FD showed the same behavior.

I'm stumped.

We are running 8.1; am I correct that flow control is not implemented 
there?  We do have an 8.2-STABLE image from a month or so ago that we are 
testing with zfs v28; might that implement flow control?

Although, reading this:

http://en.wikipedia.org/wiki/Ethernet_flow_control

it sounds like flow control is not terribly optimal, since it forces the 
host to block all traffic.  Not sure if this means drops are eliminated, 
reduced, or just shuffled around.
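
From what I can tell, the PAUSE mechanism works in quanta of 512 bit times, 
so the most one PAUSE frame can buy depends on the link speed.  A quick 
sketch of the numbers (assuming I'm reading the 802.3x description 
correctly):

#!/usr/bin/env python
# Sketch: how long can one 802.3x PAUSE frame stop a sender?
# pause_time is expressed in quanta of 512 bit times, maximum 65535.
QUANTUM_BITS = 512
MAX_QUANTA = 65535

for name, bps in (("100baseTX", 100e6), ("1000baseT", 1000e6)):
    max_pause_s = MAX_QUANTA * QUANTUM_BITS / bps
    print("%s: one PAUSE frame can halt TX for at most %.1f ms"
          % (name, max_pause_s * 1000))
# ~335 ms at 100 Mb/s, ~34 ms at 1 Gb/s -- so the switch has to keep
# re-sending PAUSE frames, which mostly moves the queueing back onto
# the host rather than eliminating drops outright.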

That's the last thing (host-side) I can think of...

Oh, and to Mr. Swiger, my apologies for brushing off your early suggestion 
that this may all be due to the switch being something of a piece of junk.

Thanks,

Charles

>> Thanks,
>>
>> Charles
>


