Date: Mon, 03 Dec 2012 01:20:57 -0800
From: Maxim Sobolev <sobomax@FreeBSD.org>
To: Andre Oppermann <andre@freebsd.org>
Cc: Alfred Perlstein <bright@mu.org>, "src-committers@freebsd.org" <src-committers@freebsd.org>, "svn-src-user@freebsd.org" <svn-src-user@freebsd.org>
Subject: Re: svn commit: r242910 - in user/andre/tcp_workqueue/sys: kern sys
Message-ID: <50BC6EF9.4040706@FreeBSD.org>
In-Reply-To: <50BC61A1.9040604@freebsd.org>
References: <201211120847.qAC8lEAM086331@svn.freebsd.org> <50A0D420.4030106@freebsd.org> <0039CD42-C909-41D0-B0A7-7DFBC5B8D839@mu.org> <50A1206B.1000200@freebsd.org> <3D373186-09E2-48BC-8451-E4439F99B29D@mu.org> <50BC4EF6.8040902@FreeBSD.org> <50BC61A1.9040604@freebsd.org>
>> We are also in quite mbufs hungry environment, it's not 10GigE, but we
>> are dealing with forwarding voice traffic, which consists of
>> predominantly very small packets (20-40 bytes). So we have a lot of
>> small packets in-flight, which uses a lot of MBUFS.
>>
>> What however happens, the network stack consistently lock up after we
>> put more than 16-18MB/sec onto it, which corresponds to about
>> 350-400 Kpps.
>
> Can you drop into kdb? Do you have any backtrace to see where or how it
> lock up?

Unfortunately it's hardly an option in production, unless we can
reproduce the issue on the test machine. It is not locking up per se,
but all network-related activity ceases. We can still get in through
the KVM console.

>> This is way lower than any nmbclusters/maxusers limits we have
>> (1.5m/1500).
>>
>> With half of that critical load right now we see something along those
>> lines:
>>
>> 66365/71953/138318/1597440 mbuf clusters in use (current/cache/total/max)
>> 149617K/187910K/337528K bytes allocated to network (current/cache/total)
>>
>> Machine has 24GB of ram.
>>
>> vm.kmem_map_free: 24886267904
>> vm.kmem_map_size: 70615040
>> vm.kmem_size_scale: 1
>> vm.kmem_size_max: 329853485875
>> vm.kmem_size_min: 0
>> vm.kmem_size: 24956903424
>>
>> So my question is whether there are some other limits that can cause
>> MBUFS starvation if the number of allocated clusters grows to more
>> than 200-250k? I am curious how it works in the dynamic system -
>> since no memory is pre-allocated for MBUFS, what happens if the
>> network load increases gradually while the system is running? Is it
>> possible to get to ENOMEM eventually with all memory already taken
>> for other pools?
>
> Yes, mbuf allocation is not guaranteed and can fail before the limit is
> reached. What may happen is that a RX DMA ring refill failed and the
> driver wedges. This would be a driver bug.
>
> Can you give more information on the NIC's and drivers you use?

All of them use various incarnations of Intel GigE chip, mostly igb(4),
but we've seen the same behaviour with em(4) as well. Both 8.2 and 8.3
are affected. We have not been able to confirm if 9.1 has the same
issue.

igb1: <Intel(R) PRO/1000 Network Connection version - 2.3.1> port 0xec00-0xec1f mem 0xfbee0000-0xfbefffff,0xfbec0000-0xfbedffff,0xfbe9c000-0xfbe9ffff irq 40 at device 0.1 on pci10
igb1: Using MSIX interrupts with 9 vectors
igb1: Ethernet address: 00:30:48:cf:bb:1d
igb1: [ITHREAD]
igb1: Bound queue 0 to cpu 8
igb1: [ITHREAD]
igb1: Bound queue 1 to cpu 9
igb1: [ITHREAD]
igb1: Bound queue 2 to cpu 10
igb1: [ITHREAD]
igb1: Bound queue 3 to cpu 11
igb1: [ITHREAD]
igb1: Bound queue 4 to cpu 12
igb1: [ITHREAD]
igb1: Bound queue 5 to cpu 13
igb1: [ITHREAD]
igb1: Bound queue 6 to cpu 14
igb1: [ITHREAD]
igb1: Bound queue 7 to cpu 15
igb1: [ITHREAD]

igb1@pci0:10:0:1: class=0x020000 card=0x10c915d9 chip=0x10c98086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = network
    subclass   = ethernet

-Maxim
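
For anyone wanting to sanity-check the figures quoted above, here is a minimal Python sketch. It parses the "mbuf clusters in use" line as it appears in this message and works out how much of the configured cluster limit is in use, plus the average bytes per packet implied by the reported rates. The helper names (parse_cluster_line, bytes_per_packet) are made up for illustration, and treating "MB" as 10^6 bytes is an assumption, not something stated in the thread.

```python
import re


def parse_cluster_line(line):
    """Parse 'a/b/c/d mbuf clusters in use (current/cache/total/max)'."""
    m = re.match(r"\s*(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use", line)
    if m is None:
        raise ValueError("not an mbuf cluster line: %r" % line)
    current, cache, total, limit = (int(g) for g in m.groups())
    return {"current": current, "cache": cache, "total": total, "max": limit}


def bytes_per_packet(bytes_per_sec, packets_per_sec):
    """Average on-wire bytes per packet implied by a byte rate and a packet rate."""
    return bytes_per_sec / packets_per_sec


# Line copied from the quoted text in this message.
line = "66365/71953/138318/1597440 mbuf clusters in use (current/cache/total/max)"
stats = parse_cluster_line(line)

# Fraction of the configured cluster limit in use (current + cache).
utilization = stats["total"] / stats["max"]
print("cluster utilization: %.1f%%" % (100 * utilization))

# Assumption: 1 MB = 10**6 bytes; rates are the upper figures reported above.
print("bytes/packet at 18 MB/s, 400 Kpps: %.1f" % bytes_per_packet(18e6, 400e3))
```

On these numbers the clusters are well under a tenth of the configured maximum at half the critical load, which fits the point made above that the stall starts far below the nmbclusters limit.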