Date: Mon, 03 Dec 2012 01:20:57 -0800
From: Maxim Sobolev (Sippy Software, Inc.)
To: Andre Oppermann
Cc: Alfred Perlstein, src-committers@freebsd.org, svn-src-user@freebsd.org
Subject: Re: svn commit: r242910 - in user/andre/tcp_workqueue/sys: kern sys

>> We are also in quite an mbuf-hungry environment; it's not 10GigE, but
>> we are dealing with forwarding voice traffic, which consists
>> predominantly of very small packets (20-40 bytes). So we have a lot of
>> small packets in flight, which uses a lot of mbufs.
>>
>> What happens, however, is that the network stack consistently locks up
>> after we put more than 16-18 MB/sec onto it, which corresponds to
>> about 350-400 Kpps.
>
> Can you drop into kdb? Do you have any backtrace to see where or how it
> locks up?

Unfortunately that's hardly an option in production, unless we can
reproduce the issue on a test machine. It is not locking up per se, but
all network-related activity ceases; we can still get in through the KVM
console.
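What we could do is leave a small poller running on one of the affected
boxes, so that we at least have the counter values from just before the
stack stops passing traffic. A minimal sketch, using plain sysctlbyname(3)
against the nmbclusters limit and the vm.kmem_* values quoted below
(error handling omitted):

/*
 * Print a few mbuf/kmem related sysctls once a second, so the last line
 * before the lockup shows where the counters were.
 */
#include <sys/types.h>
#include <sys/sysctl.h>

#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	int nmbclusters;
	unsigned long kmem_map_size, kmem_map_free;	/* 64-bit on amd64 */
	size_t len;

	for (;;) {
		len = sizeof(nmbclusters);
		sysctlbyname("kern.ipc.nmbclusters", &nmbclusters, &len, NULL, 0);
		len = sizeof(kmem_map_size);
		sysctlbyname("vm.kmem_map_size", &kmem_map_size, &len, NULL, 0);
		len = sizeof(kmem_map_free);
		sysctlbyname("vm.kmem_map_free", &kmem_map_free, &len, NULL, 0);
		printf("nmbclusters=%d kmem_map_size=%lu kmem_map_free=%lu\n",
		    nmbclusters, kmem_map_size, kmem_map_free);
		fflush(stdout);
		sleep(1);
	}
}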
>> This is way lower than any nmbclusters/maxusers limits we have
>> (1.5m/1500).
>>
>> With about half of that critical load we currently see something along
>> these lines:
>>
>> 66365/71953/138318/1597440 mbuf clusters in use (current/cache/total/max)
>> 149617K/187910K/337528K bytes allocated to network (current/cache/total)
>>
>> The machine has 24GB of RAM.
>>
>> vm.kmem_map_free: 24886267904
>> vm.kmem_map_size: 70615040
>> vm.kmem_size_scale: 1
>> vm.kmem_size_max: 329853485875
>> vm.kmem_size_min: 0
>> vm.kmem_size: 24956903424
>>
>> So my question is whether there are some other limits that can cause
>> mbuf starvation if the number of allocated clusters grows to more than
>> 200-250k. I am also curious how this works in a dynamic system: since
>> no memory is pre-allocated for mbufs, what happens if the network load
>> increases gradually while the system is running? Is it possible to get
>> to ENOMEM eventually, with all memory already taken by other pools?
>
> Yes, mbuf allocation is not guaranteed and can fail before the limit is
> reached. What may happen is that an RX DMA ring refill fails and the
> driver wedges. That would be a driver bug.
>
> Can you give more information on the NICs and drivers you use?

All of them use various incarnations of the Intel GigE chip, mostly
igb(4), but we have seen the same behaviour with em(4) as well. Both 8.2
and 8.3 are affected; we have not been able to confirm whether 9.1 has
the same issue.

igb1: port 0xec00-0xec1f mem 0xfbee0000-0xfbefffff,0xfbec0000-0xfbedffff,0xfbe9c000-0xfbe9ffff irq 40 at device 0.1 on pci10
igb1: Using MSIX interrupts with 9 vectors
igb1: Ethernet address: 00:30:48:cf:bb:1d
igb1: [ITHREAD]
igb1: Bound queue 0 to cpu 8
igb1: [ITHREAD]
igb1: Bound queue 1 to cpu 9
igb1: [ITHREAD]
igb1: Bound queue 2 to cpu 10
igb1: [ITHREAD]
igb1: Bound queue 3 to cpu 11
igb1: [ITHREAD]
igb1: Bound queue 4 to cpu 12
igb1: [ITHREAD]
igb1: Bound queue 5 to cpu 13
igb1: [ITHREAD]
igb1: Bound queue 6 to cpu 14
igb1: [ITHREAD]
igb1: Bound queue 7 to cpu 15
igb1: [ITHREAD]

igb1@pci0:10:0:1: class=0x020000 card=0x10c915d9 chip=0x10c98086 rev=0x01 hdr=0x00
    vendor   = 'Intel Corporation'
    class    = network
    subclass = ethernet

-Maxim
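P.S. Just to make sure I understand the failure mode you are describing:
if the RX refill path looks anything like the sketch below and nothing
ever re-arms a retry after an allocation failure, the ring simply drains
and the box ends up looking exactly like ours, with RX stopped while the
rest of the system stays healthy. This is only an illustration with
made-up names (rx_ring, alloc_cluster and refill_rx_ring are
hypothetical, and malloc stands in for the cluster allocator); it is not
the actual igb(4) code.

#include <stdio.h>
#include <stdlib.h>

#define RX_RING_SIZE	1024

struct rx_ring {
	void	*slots[RX_RING_SIZE];	/* one receive buffer per descriptor */
	int	 nfilled;		/* descriptors the NIC can DMA into */
};

/* Stand-in for the real cluster allocator; may return NULL under pressure. */
static void *
alloc_cluster(void)
{

	return (malloc(2048));
}

/*
 * Refill every empty descriptor.  Returns the number of slots that could
 * not be refilled.  If the caller ignores a non-zero return and nothing
 * (a timer, the next interrupt) ever retries, nfilled only goes down over
 * time; once it reaches zero the NIC has nowhere to put incoming packets
 * and RX stops for good, even if memory frees up later.
 */
static int
refill_rx_ring(struct rx_ring *r)
{
	int i, shortfall;

	shortfall = 0;
	for (i = 0; i < RX_RING_SIZE; i++) {
		if (r->slots[i] != NULL)
			continue;
		r->slots[i] = alloc_cluster();
		if (r->slots[i] == NULL) {
			shortfall++;		/* must be retried later */
			continue;
		}
		r->nfilled++;
	}
	return (shortfall);
}

int
main(void)
{
	static struct rx_ring r;	/* zeroed: all slots start empty */

	if (refill_rx_ring(&r) != 0)
		printf("refill incomplete, must be rescheduled\n");
	printf("%d of %d descriptors usable\n", r.nfilled, RX_RING_SIZE);
	return (0);
}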