Date:      Wed, 6 Jul 2011 13:15:09 -0700
From:      YongHyeon PYUN <pyunyh@gmail.com>
To:        Charles Sprickman <spork@bway.net>
Cc:        freebsd-net@freebsd.org, David Christensen <davidch@freebsd.org>
Subject:   Re: bce packet loss
Message-ID:  <20110706201509.GA5559@michelle.cdnetworks.com>
In-Reply-To: <alpine.OSX.2.00.1107042113000.2407@freemac>
References:  <alpine.OSX.2.00.1107042113000.2407@freemac>

On Mon, Jul 04, 2011 at 09:32:11PM -0400, Charles Sprickman wrote:
> Hello,
> 
> We're running a few 8.1-R servers with Broadcom bce interfaces (Dell R510) 
> and I'm seeing occasional packet loss on them (enough that it trips nagios 
> now and then).  Cabling seems fine as neither the switch nor the sysctl 
> info for the device show any errors/collisions/etc, however there is one 
> odd one, which is "dev.bce.1.stat_IfHCInBadOctets: 539369".  See [1] below 
> for full sysctl output.  The switch shows no errors but for "Dropped 
> packets 683868".
> 
> pciconf output is also below. [2]
> 
> By default, the switch had flow control set to "on".  I also let it run 
> with "auto".  In both cases, the drops continued to increment.  I'm now 
> running with flow control off to see if that changes anything.
> 
> I do see some correlation between cpu usage and drops - I have cpu usage 
> graphed in nagios and cacti is graphing the drops on the dell switch. 
> There's no signs of running out of mbufs or similar.
> 
> So given that limited info, is there anything I can look at to track this 
> down?  Anything stand out in the stats sysctl exposes?  Two things are 
> standing out for me - the number of changes in bce regarding flow control 
> that are not in 8.1, and the correlation between cpu load and the drops.
> 
> What other information can I provide?
> 

You had 282 RX buffer shortages, and those frames were dropped. This
may explain the occasional packet loss you see. 'netstat -m' will
show which cluster sizes had failed allocations.
However, com_no_buffers is 0, which indicates the controller was able
to receive all packets destined for this host. You may have lost some
packets (i.e. non-zero mbuf_alloc_failed_count), but your controller
and system were still responsive to network traffic.
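
As a rough illustration of how the two counters above can be read
together, here is a small Python sketch that parses saved
'sysctl dev.bce.1' output. The full sysctl names
(dev.bce.1.mbuf_alloc_failed_count, dev.bce.1.com_no_buffers) and the
helper names are assumptions built from the counter names in this
thread, and the sample assumes the 282 shortages are the
mbuf_alloc_failed_count value. Check them against your own sysctl
output.

```python
# Sketch only: the sysctl names and the sample text are assumptions based
# on the counter names mentioned in this thread, not verified driver output.

def parse_sysctl(text):
    """Parse 'name: value' lines into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def summarize(stats, unit=1):
    """Return short notes on where RX frames were (or were not) dropped."""
    prefix = "dev.bce.%d." % unit
    mbuf_failed = stats.get(prefix + "mbuf_alloc_failed_count", 0)
    no_buffers = stats.get(prefix + "com_no_buffers", 0)
    notes = []
    if mbuf_failed:
        # Host side: the driver could not get an mbuf/cluster for RX.
        notes.append("%d frames dropped on RX buffer shortage" % mbuf_failed)
    if no_buffers == 0:
        # Controller side: it was able to receive all packets for this host.
        notes.append("com_no_buffers is 0: controller kept up with traffic")
    else:
        notes.append("com_no_buffers is %d: controller ran short" % no_buffers)
    return notes


# Hypothetical sample using the values quoted in this thread.
SAMPLE = """\
dev.bce.1.mbuf_alloc_failed_count: 282
dev.bce.1.com_no_buffers: 0
dev.bce.1.stat_IfHCInBadOctets: 539369
"""

if __name__ == "__main__":
    for note in summarize(parse_sysctl(SAMPLE)):
        print(note)
```

To try it on a live host, replace SAMPLE with the text produced by
'sysctl dev.bce.1' and compare two runs taken some time apart to see
which counter is still incrementing.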

The data sheet says IfHCInBadOctets is the number of octets received
on the interface, including framing characters, for packets that
were dropped in the MAC for any reason. I'm not sure whether this
counter includes packets counted by IfInFramesL2FilterDiscards, which
is the number of good frames dropped by the L2 perfect match,
broadcast, multicast or MAC control frame filters. If your switch
runs STP, it periodically sends BPDU packets to the STP multicast
address 01:80:C2:00:00:00. I'm not sure that is the reason, though.
David (CCed) can probably explain the IfHCInBadOctets counter in
more detail.
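
To make the filter categories above concrete, here is a minimal
Python sketch that classifies a destination MAC address the way the
text describes them (perfect match, broadcast, multicast). It does
not model the controller's real filter logic. The function name and
the example host address are hypothetical. It only shows that the STP
address 01:80:C2:00:00:00 is a group (multicast) address, so a host
that has not joined that group would normally filter such frames.

```python
# Sketch only: a simplified classification, not the bce hardware filter.

STP_BRIDGE_GROUP = "01:80:c2:00:00:00"  # STP multicast address from the text
BROADCAST = "ff:ff:ff:ff:ff:ff"


def classify_dest(dest_mac, own_mac):
    """Classify a destination MAC relative to this host's own address."""
    dest = dest_mac.lower()
    if dest == own_mac.lower():
        return "perfect-match"
    if dest == BROADCAST:
        return "broadcast"
    if dest == STP_BRIDGE_GROUP:
        return "stp-bpdu"
    # The least significant bit of the first octet marks a group address.
    first_octet = int(dest.split(":")[0], 16)
    if first_octet & 0x01:
        return "multicast"
    return "other-unicast"


if __name__ == "__main__":
    own = "00:11:22:33:44:55"  # hypothetical host address
    for mac in (own, BROADCAST, "01:80:C2:00:00:00", "01:00:5e:00:00:01"):
        print(mac, classify_dest(mac, own))
```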


