Date:      Fri, 13 May 2011 13:16:42 +0200
From:      Vlad Galu <dudu@dudu.ro>
To:        freebsd-net@freebsd.org
Subject:   Re: bge(4) on RELENG_8 mbuf cluster starvation
Message-ID:  <BANLkTinrZbhc8p=OLCJdAfdi-d17s2nR_A@mail.gmail.com>
In-Reply-To: <AANLkTi=mO65OoDTcz2gxpsB075-+WdjKTFe9Chm_MY=Y@mail.gmail.com>
References:  <AANLkTimSs48ftRv8oh1wTwMEpgN1Ny3B1ahzfS=AbML_@mail.gmail.com> <AANLkTimfh3OdXOe1JFo5u6JypcLrcWKv2WpSu8Uv-tgv@mail.gmail.com> <AANLkTi=rWobA40UtCTSeOzEz65TMw8vfCcxtMWBBme+u@mail.gmail.com> <20110313011632.GA1621@michelle.cdnetworks.com> <AANLkTi=dci-cKVuvpXCs40u8u=5LGzey6s5-jYXEPM7s@mail.gmail.com> <20110330171023.GA8601@michelle.cdnetworks.com> <AANLkTi=mO65OoDTcz2gxpsB075-+WdjKTFe9Chm_MY=Y@mail.gmail.com>

On Wed, Mar 30, 2011 at 7:17 PM, Vlad Galu <dudu@dudu.ro> wrote:

>
>
> On Wed, Mar 30, 2011 at 7:10 PM, YongHyeon PYUN <pyunyh@gmail.com> wrote:
>
>> On Wed, Mar 30, 2011 at 05:55:47PM +0200, Vlad Galu wrote:
>> > On Sun, Mar 13, 2011 at 2:16 AM, YongHyeon PYUN <pyunyh@gmail.com>
>> wrote:
>> >
>> > > On Sat, Mar 12, 2011 at 09:17:28PM +0100, Vlad Galu wrote:
>> > > > On Sat, Mar 12, 2011 at 8:53 PM, Arnaud Lacombe <lacombar@gmail.com
>> >
>> > > wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > On Sat, Mar 12, 2011 at 4:03 AM, Vlad Galu <dudu@dudu.ro> wrote:
>> > > > > > Hi folks,
>> > > > > >
>> > > > > > On a fairly busy recent (r219010) RELENG_8 machine I keep
>> getting
>> > > > > > -- cut here --
>> > > > > > 1096/1454/2550 mbufs in use (current/cache/total)
>> > > > > > 1035/731/1766/262144 mbuf clusters in use
>> (current/cache/total/max)
>> > > > > > 1035/202 mbuf+clusters out of packet secondary zone in use
>> > > > > (current/cache)
>> > > > > > 0/117/117/12800 4k (page size) jumbo clusters in use
>> > > > > > (current/cache/total/max)
>> > > > > > 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
>> > > > > > 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
>> > > > > > 2344K/2293K/4637K bytes allocated to network
>> (current/cache/total)
>> > > > > > 0/70128196/37726935 requests for mbufs denied
>> > > > > (mbufs/clusters/mbuf+clusters)
>> > > > > > ^^^^^^^^^^^^^^^^^^^^^
>> > > > > > -- and here --
>> > > > > >
>> > > > > > kern.ipc.nmbclusters is set to 131072. Other settings:
>> > > > > no, netstat(8) says 262144.
>> > > > >
>> > > > >
>> > > > Heh, you're right, I forgot I'd doubled it a while ago. Wrote that
>> from
>> > > the
>> > > > top of my head.
>> > > >
>> > > >
>> > > > > Maybe can you include $(sysctl dev.bge) ? Might be useful.
>> > > > >
>> > > > >  - Arnaud
>> > > > >
>> > > >
>> > > > Sure:
>> > >
>> > > [...]
>> > >
>> > > > dev.bge.1.%desc: Broadcom NetXtreme Gigabit Ethernet Controller,
>> ASIC
>> > > rev.
>> > > > 0x004101
>> > > > dev.bge.1.%driver: bge
>> > > > dev.bge.1.%location: slot=0 function=0
>> > > > dev.bge.1.%pnpinfo: vendor=0x14e4 device=0x1659 subvendor=0x1014
>> > > > subdevice=0x02c6 class=0x020000
>> > > > dev.bge.1.%parent: pci5
>> > > > dev.bge.1.forced_collapse: 2
>> > > > dev.bge.1.forced_udpcsum: 0
>> > > > dev.bge.1.stats.FramesDroppedDueToFilters: 0
>> > > > dev.bge.1.stats.DmaWriteQueueFull: 0
>> > > > dev.bge.1.stats.DmaWriteHighPriQueueFull: 0
>> > > > dev.bge.1.stats.NoMoreRxBDs: 680050
>> > >   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> > > This indicates bge(4) encountered an RX buffer shortage. Perhaps
>> > > bge(4) couldn't refill RX buffers for incoming frames due to
>> > > other system activity.
>> > >
>> > > > dev.bge.1.stats.InputDiscards: 228755931
>> > >
>> > > This counter indicates the number of frames discarded due to RX
>> > > buffer shortage. bge(4) drops a received frame whenever it fails to
>> > > allocate a new RX buffer, so InputDiscards is normally higher than
>> > > NoMoreRxBDs.
>> > >
>> > > > dev.bge.1.stats.InputErrors: 49080818
>> > >   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> > > Something is wrong here. Too many frames were classified as error
>> > > frames. You may see poor RX performance.
>> > >
>> > > > dev.bge.1.stats.RecvThresholdHit: 0
>> > > > dev.bge.1.stats.rx.ifHCInOctets: 2095148839247
>> > > > dev.bge.1.stats.rx.Fragments: 47887706
>> > > > dev.bge.1.stats.rx.UnicastPkts: 32672557601
>> > > > dev.bge.1.stats.rx.MulticastPkts: 1218
>> > > > dev.bge.1.stats.rx.BroadcastPkts: 2
>> > > > dev.bge.1.stats.rx.FCSErrors: 2822217
>> > >   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> > > FCS errors are too high. Please check the cabling again (I'm
>> > > assuming the controller is not broken here). I think you can use
>> > > the vendor's diagnostic tools to verify this.
>> > >
>> > > > dev.bge.1.stats.rx.AlignmentErrors: 0
>> > > > dev.bge.1.stats.rx.xonPauseFramesReceived: 0
>> > > > dev.bge.1.stats.rx.xoffPauseFramesReceived: 0
>> > > > dev.bge.1.stats.rx.ControlFramesReceived: 0
>> > > > dev.bge.1.stats.rx.xoffStateEntered: 0
>> > > > dev.bge.1.stats.rx.FramesTooLong: 0
>> > > > dev.bge.1.stats.rx.Jabbers: 0
>> > > > dev.bge.1.stats.rx.UndersizePkts: 0
>> > > > dev.bge.1.stats.tx.ifHCOutOctets: 48751515826
>> > > > dev.bge.1.stats.tx.Collisions: 0
>> > > > dev.bge.1.stats.tx.XonSent: 0
>> > > > dev.bge.1.stats.tx.XoffSent: 0
>> > > > dev.bge.1.stats.tx.InternalMacTransmitErrors: 0
>> > > > dev.bge.1.stats.tx.SingleCollisionFrames: 0
>> > > > dev.bge.1.stats.tx.MultipleCollisionFrames: 0
>> > > > dev.bge.1.stats.tx.DeferredTransmissions: 0
>> > > > dev.bge.1.stats.tx.ExcessiveCollisions: 0
>> > > > dev.bge.1.stats.tx.LateCollisions: 0
>> > > > dev.bge.1.stats.tx.UnicastPkts: 281039183
>> > > > dev.bge.1.stats.tx.MulticastPkts: 0
>> > > > dev.bge.1.stats.tx.BroadcastPkts: 1153
>> > > > -- and here --
>> > > >
>> > > > And now, that I remembered about this as well:
>> > > > -- cut here --
>> > > > Name    Mtu Network       Address              Ipkts Ierrs Idrop
>>  Opkts
>> > > > Oerrs  Coll
>> > > > bge1   1500 <Link#2>      00:11:25:22:0d:ed 32321767025 278517070
>> > > 37726837
>> > > > 281068216     0     0
>> > > > -- and here --
>> > > > The colo provider changed my cable a couple of times so I'd not
>> blame it
>> > > on
>> > > > that. Unfortunately, I don't have access to the port statistics on
>> the
>> > > > switch. Running netstat with -w1 yields between 0 and 4
>> errors/second.
>> > > >
>> > >
>> > > Hardware MAC counters still show a high number of FCS errors. The
>> > > service provider should check for possible cabling issues on
>> > > the switch port.
>> > >
>> >
>> > After swapping cables and moving the NIC into another switch, there are
>> some
>> > improvements. However:
>> > -- cut here --
>> > dev.bge.1.%desc: Broadcom NetXtreme Gigabit Ethernet Controller, ASIC
>> rev.
>> > 0x004101
>> > dev.bge.1.%driver: bge
>> > dev.bge.1.%location: slot=0 function=0
>> > dev.bge.1.%pnpinfo: vendor=0x14e4 device=0x1659 subvendor=0x1014
>> > subdevice=0x02c6 class=0x020000
>> > dev.bge.1.%parent: pci5
>> > dev.bge.1.forced_collapse: 0
>> > dev.bge.1.forced_udpcsum: 0
>> > dev.bge.1.stats.FramesDroppedDueToFilters: 0
>> > dev.bge.1.stats.DmaWriteQueueFull: 0
>> > dev.bge.1.stats.DmaWriteHighPriQueueFull: 0
>> > dev.bge.1.stats.NoMoreRxBDs: 243248 <- this
>> > dev.bge.1.stats.InputDiscards: 9945500
>> > dev.bge.1.stats.InputErrors: 0
>>
>> There are still discarded frames, but I believe they are not related
>> to any cabling issues, since you don't have FCS or alignment
>> errors.
>>
>> > dev.bge.1.stats.RecvThresholdHit: 0
>> > dev.bge.1.stats.rx.ifHCInOctets: 36697296701
>> > dev.bge.1.stats.rx.Fragments: 0
>> > dev.bge.1.stats.rx.UnicastPkts: 549334370
>> > dev.bge.1.stats.rx.MulticastPkts: 113638
>> > dev.bge.1.stats.rx.BroadcastPkts: 0
>> > dev.bge.1.stats.rx.FCSErrors: 0
>> > dev.bge.1.stats.rx.AlignmentErrors: 0
>> > dev.bge.1.stats.rx.xonPauseFramesReceived: 0
>> > dev.bge.1.stats.rx.xoffPauseFramesReceived: 0
>> > dev.bge.1.stats.rx.ControlFramesReceived: 0
>> > dev.bge.1.stats.rx.xoffStateEntered: 0
>> > dev.bge.1.stats.rx.FramesTooLong: 0
>> > dev.bge.1.stats.rx.Jabbers: 0
>> > dev.bge.1.stats.rx.UndersizePkts: 0
>> > dev.bge.1.stats.tx.ifHCOutOctets: 10578000636
>> > dev.bge.1.stats.tx.Collisions: 0
>> > dev.bge.1.stats.tx.XonSent: 0
>> > dev.bge.1.stats.tx.XoffSent: 0
>> > dev.bge.1.stats.tx.InternalMacTransmitErrors: 0
>> > dev.bge.1.stats.tx.SingleCollisionFrames: 0
>> > dev.bge.1.stats.tx.MultipleCollisionFrames: 0
>> > dev.bge.1.stats.tx.DeferredTransmissions: 0
>> > dev.bge.1.stats.tx.ExcessiveCollisions: 0
>> > dev.bge.1.stats.tx.LateCollisions: 0
>> > dev.bge.1.stats.tx.UnicastPkts: 64545266
>> > dev.bge.1.stats.tx.MulticastPkts: 0
>> > dev.bge.1.stats.tx.BroadcastPkts: 313
>> >
>> > and
>> > 0/1710531/2006005 requests for mbufs denied
>> (mbufs/clusters/mbuf+clusters)
>> > -- and here --
>> >
>> > I'll start gathering some stats/charts on this host to see if I can
>> > correlate the starvation with other system events.
>> >
>>
>> Now the MAC statistics counters show nothing abnormal, which in turn
>> indicates the mbuf starvation comes from some other issue. The next
>> thing is to identify which process or kernel subsystem consumes a
>> lot of mbuf clusters.
>>
>>
> Thanks for the feedback. Oh, there is a BPF consumer listening on bge1.
> After noticing
> http://www.mail-archive.com/freebsd-net@freebsd.org/msg25685.html, I
> decided to shut it down for a while. It's pretty weird: my BPF buffer size
> is set to 4MB and traffic on that interface is nowhere near that high. I'll
> get back as soon as I have new data.
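
For reference, a minimal sketch of what such a consumer's setup looks like,
assuming a plain bpf(4) reader on /dev/bpf0 bound to bge1 with the 4MB store
buffer mentioned above (the device path, interface name and missing filter
program are illustrative, not the actual tool running on this box):

-- cut here --
/* Minimal bpf(4) reader setup sketch; names and sizes are illustrative. */
#include <sys/types.h>
#include <sys/ioctl.h>

#include <net/bpf.h>
#include <net/if.h>

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
    struct ifreq ifr;
    u_int buflen = 4 * 1024 * 1024;     /* 4MB store buffer */
    int fd;

    if ((fd = open("/dev/bpf0", O_RDWR)) == -1) {
        perror("open");
        return (1);
    }
    /* The buffer length must be set before attaching to the interface. */
    if (ioctl(fd, BIOCSBLEN, &buflen) == -1)
        perror("BIOCSBLEN");
    memset(&ifr, 0, sizeof(ifr));
    strlcpy(ifr.ifr_name, "bge1", sizeof(ifr.ifr_name));
    if (ioctl(fd, BIOCSETIF, &ifr) == -1)
        perror("BIOCSETIF");
    /* ... the read(2)/filter loop would go here ... */
    close(fd);
    return (0);
}
-- and here --

If I remember correctly, bpf(4) silently clamps the requested length to
net.bpf.maxbufsize, so it's worth double-checking that the 4MB actually
took effect.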
>
>
>> >
>> >
>> > > However, this does not explain why you have a large number of mbuf
>> > > cluster allocation failures. The only wild guess I have at the
>> > > moment is that some process or kernel subsystem is too slow to
>> > > release allocated mbuf clusters. Did you check other system
>> > > activity while seeing the issue?
>> > >
>>
>
>
>
> --
> Good, fast & cheap. Pick any two.
>


I've finally managed to see what triggers the symptom. It's a SYN flood.
Tweaking the syncache and disabling PF made no measurable difference. What
is odd is that the clock swi starts eating up more than 50% of the CPU. I
tried both the ACPI-fast and TSC timecounters. The machine is UP, so when the clock swi takes
50% of the CPU and the netisr swi takes another 50%, there isn't much CPU
time left for user processes.
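
As far as I understand it, the syncache retransmit timers run from softclock,
i.e. the clock swi, so forcing syncookies-only mode (which avoids populating
the syncache during the flood) might be worth ruling out. A minimal sketch for
reading the knobs with sysctlbyname(3); net.inet.tcp.syncookies is the stock
OID, while syncookies_only is treated as optional in case a branch lacks it:

-- cut here --
/* Sketch: read the syncookie knobs; syncookies_only may be absent. */
#include <sys/types.h>
#include <sys/sysctl.h>

#include <errno.h>
#include <stdio.h>

static void
show(const char *oid)
{
    int val;
    size_t len = sizeof(val);

    if (sysctlbyname(oid, &val, &len, NULL, 0) == -1) {
        if (errno == ENOENT)
            printf("%s: not present on this branch\n", oid);
        else
            perror(oid);
        return;
    }
    printf("%s = %d\n", oid, val);
}

int
main(void)
{
    show("net.inet.tcp.syncookies");
    show("net.inet.tcp.syncookies_only");
    return (0);
}
-- and here --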

-- 
Good, fast & cheap. Pick any two.


