Date:      Mon, 3 Dec 2012 09:38:00 +0000 (GMT)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Maxim Sobolev <sobomax@FreeBSD.org>
Cc:        Alfred Perlstein <bright@mu.org>, Andre Oppermann <andre@freebsd.org>, "src-committers@freebsd.org" <src-committers@freebsd.org>, "svn-src-user@freebsd.org" <svn-src-user@freebsd.org>
Subject:   Re: svn commit: r242910 - in user/andre/tcp_workqueue/sys: kern sys
Message-ID:  <alpine.BSF.2.00.1212030937160.18806@fledge.watson.org>
In-Reply-To: <50BC6EF9.4040706@FreeBSD.org>
References:  <201211120847.qAC8lEAM086331@svn.freebsd.org> <50A0D420.4030106@freebsd.org> <0039CD42-C909-41D0-B0A7-7DFBC5B8D839@mu.org> <50A1206B.1000200@freebsd.org> <3D373186-09E2-48BC-8451-E4439F99B29D@mu.org> <50BC4EF6.8040902@FreeBSD.org> <50BC61A1.9040604@freebsd.org> <50BC6EF9.4040706@FreeBSD.org>


On Mon, 3 Dec 2012, Maxim Sobolev wrote:

>>> We are also in quite an mbuf-hungry environment; it's not 10GigE, but we
>>> are dealing with forwarding voice traffic, which consists predominantly
>>> of very small packets (20-40 bytes). So we have a lot of small packets
>>> in flight, which uses a lot of mbufs.
>>>
>>> What happens, however, is that the network stack consistently locks up
>>> after we put more than 16-18 MB/sec onto it, which corresponds to about
>>> 350-400 Kpps.
>> 
>> Can you drop into kdb?  Do you have a backtrace to see where or how it
>> locks up?
>
> Unfortunately it's hardly an option in production, unless we can reproduce
> the issue on the test machine. It is not locking up per se, but all
> network-related activity ceases. We can still get in through the KVM
> console.
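
(If I am reading those numbers right, 16-18 MB/sec at 350-400 Kpps works out
to roughly 45 bytes per packet on average, which is consistent with 20-40
byte payloads plus headers - so the allocator pressure here tracks packet
rate rather than byte rate, since every packet needs at least one mbuf no
matter how small it is.)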

Could you share the results of vmstat -z and netstat -m for the box?

(FYI, if you do find yourself in DDB, "show uma" is essentially the same as 
"vmstat -z".)

Robert


>
>>> This is way lower than any nmbclusters/maxusers limits we have 
>>> (1.5m/1500).
>>> 
>>> At about half of that critical load, we currently see something along
>>> these lines:
>>> 
>>> 66365/71953/138318/1597440 mbuf clusters in use (current/cache/total/max)
>>> 149617K/187910K/337528K bytes allocated to network (current/cache/total)
>>> 
>>> The machine has 24GB of RAM.
>>> 
>>> vm.kmem_map_free: 24886267904
>>> vm.kmem_map_size: 70615040
>>> vm.kmem_size_scale: 1
>>> vm.kmem_size_max: 329853485875
>>> vm.kmem_size_min: 0
>>> vm.kmem_size: 24956903424
>>> 
>>> So my question is: are there other limits that can cause mbuf starvation
>>> if the number of allocated clusters grows to more than 200-250k? I am
>>> also curious how this works in a dynamic system - since no memory is
>>> pre-allocated for mbufs, what happens if the network load increases
>>> gradually while the system is running? Is it possible to eventually get
>>> to ENOMEM with all memory already taken by other pools?
>> 
>> Yes, mbuf allocation is not guaranteed and can fail before the limit is
>> reached.  What may happen is that an RX DMA ring refill fails and the
>> driver wedges.  This would be a driver bug.
>> 
>> Can you give more information on the NIC's and drivers you use?
>
> All of them use various incarnations of the Intel GigE chip, mostly
> igb(4), but we've seen the same behaviour with em(4) as well.
>
> Both 8.2 and 8.3 are affected. We have not been able to confirm if 9.1 has 
> the same issue.
>
> igb1: <Intel(R) PRO/1000 Network Connection version - 2.3.1> port 
> 0xec00-0xec1f mem 
> 0xfbee0000-0xfbefffff,0xfbec0000-0xfbedffff,0xfbe9c000-0xfbe9ffff irq 40 at 
> device 0.1 on pci10
> igb1: Using MSIX interrupts with 9 vectors
> igb1: Ethernet address: 00:30:48:cf:bb:1d
> igb1: [ITHREAD]
> igb1: Bound queue 0 to cpu 8
> igb1: [ITHREAD]
> igb1: Bound queue 1 to cpu 9
> igb1: [ITHREAD]
> igb1: Bound queue 2 to cpu 10
> igb1: [ITHREAD]
> igb1: Bound queue 3 to cpu 11
> igb1: [ITHREAD]
> igb1: Bound queue 4 to cpu 12
> igb1: [ITHREAD]
> igb1: Bound queue 5 to cpu 13
> igb1: [ITHREAD]
> igb1: Bound queue 6 to cpu 14
> igb1: [ITHREAD]
> igb1: Bound queue 7 to cpu 15
> igb1: [ITHREAD]
>
> igb1@pci0:10:0:1:       class=0x020000 card=0x10c915d9 chip=0x10c98086 
> rev=0x01 hdr=0x00
>    vendor     = 'Intel Corporation'
>    class      = network
>    subclass   = ethernet
>
> -Maxim
>
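
P.S. On the RX ring refill failure mentioned in the quoted exchange above:
the usual defensive pattern is for the driver to allocate the replacement
cluster before handing the received one up the stack, and to recycle the old
buffer (dropping that one frame) when the allocation fails, so the ring never
ends up with an unpopulated descriptor. A rough sketch of that pattern
follows; the rx_ring/rx_slot structures and field names are invented purely
for illustration and do not match what igb(4) actually does.

/*
 * Illustrative only: a "never leave the RX ring empty" refill pattern.
 * The types here are made up; a real driver keeps equivalent state in
 * its softc and descriptor rings.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/socket.h>
#include <sys/mbuf.h>
#include <net/if.h>
#include <net/if_var.h>

struct rx_slot {
	struct mbuf	*m;		/* cluster currently posted to the NIC */
};

struct rx_ring {
	struct ifnet	*ifp;
	struct rx_slot	*slots;
	uint64_t	 refill_failures;
};

static void
rx_input_one(struct rx_ring *rxr, int idx, int pktlen)
{
	struct rx_slot *slot = &rxr->slots[idx];
	struct mbuf *m, *mnew;

	/* Allocate the replacement cluster before touching the old one. */
	mnew = m_getcl(M_NOWAIT, MT_DATA, M_PKTHDR);
	if (mnew == NULL) {
		/*
		 * Allocation can fail well below nmbclusters (see above).
		 * Count it, drop this frame, and re-post the existing
		 * cluster to the hardware; leaving the descriptor without
		 * a buffer is what makes the interface appear to wedge.
		 */
		rxr->refill_failures++;
		/* ... re-arm descriptor 'idx' with slot->m here ... */
		return;
	}

	/* Hand the filled cluster up the stack ... */
	m = slot->m;
	m->m_pkthdr.len = m->m_len = pktlen;
	m->m_pkthdr.rcvif = rxr->ifp;

	/* ... and give the fresh one to the hardware. */
	slot->m = mnew;
	/* ... re-arm descriptor 'idx' with mnew here ... */

	(*rxr->ifp->if_input)(rxr->ifp, m);
}

A per-queue counter like refill_failures, exported via sysctl, would also
make this failure mode visible from userland instead of just looking like
a hang.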


