Date: Mon, 03 Dec 2012 01:20:57 -0800
From: Maxim Sobolev (Sippy Software, Inc.)
To: Andre Oppermann
Cc: Alfred Perlstein, src-committers@freebsd.org, svn-src-user@freebsd.org
Subject: Re: svn commit: r242910 - in user/andre/tcp_workqueue/sys: kern sys

>> We are also in quite an mbuf-hungry environment; it's not 10GigE, but
>> we are dealing with forwarding voice traffic, which consists
>> predominantly of very small packets (20-40 bytes). So we have a lot of
>> small packets in flight, which uses a lot of mbufs.
>>
>> What happens, however, is that the network stack consistently locks up
>> after we put more than 16-18 MB/sec onto it, which corresponds to
>> about 350-400 Kpps.
>
> Can you drop into kdb? Do you have any backtrace to see where or how it
> locks up?

Unfortunately that's hardly an option in production, unless we can
reproduce the issue on a test machine. It is not locking up per se, but
all network-related activity ceases; we can still get in through the KVM
console.
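What we could do is leave a small poller running on one of the affected
boxes, so that we at least have the counter values from just before the
stack stops passing traffic. A minimal sketch, using plain sysctlbyname(3)
against the nmbclusters limit and the vm.kmem_* values quoted below
(error handling omitted):

/*
 * Print a few mbuf/kmem related sysctls once a second, so the last line
 * before the lockup shows where the counters were.
 */
#include <sys/types.h>
#include <sys/sysctl.h>

#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	int nmbclusters;
	unsigned long kmem_map_size, kmem_map_free;	/* 64-bit on amd64 */
	size_t len;

	for (;;) {
		len = sizeof(nmbclusters);
		sysctlbyname("kern.ipc.nmbclusters", &nmbclusters, &len, NULL, 0);
		len = sizeof(kmem_map_size);
		sysctlbyname("vm.kmem_map_size", &kmem_map_size, &len, NULL, 0);
		len = sizeof(kmem_map_free);
		sysctlbyname("vm.kmem_map_free", &kmem_map_free, &len, NULL, 0);
		printf("nmbclusters=%d kmem_map_size=%lu kmem_map_free=%lu\n",
		    nmbclusters, kmem_map_size, kmem_map_free);
		fflush(stdout);
		sleep(1);
	}
}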
>> This is way lower than any nmbclusters/maxusers limits we have
>> (1.5m/1500).
>>
>> With about half of that critical load we currently see something along
>> these lines:
>>
>> 66365/71953/138318/1597440 mbuf clusters in use (current/cache/total/max)
>> 149617K/187910K/337528K bytes allocated to network (current/cache/total)
>>
>> The machine has 24GB of RAM.
>>
>> vm.kmem_map_free: 24886267904
>> vm.kmem_map_size: 70615040
>> vm.kmem_size_scale: 1
>> vm.kmem_size_max: 329853485875
>> vm.kmem_size_min: 0
>> vm.kmem_size: 24956903424
>>
>> So my question is whether there are some other limits that can cause
>> mbuf starvation if the number of allocated clusters grows to more than
>> 200-250k. I am also curious how this works in a dynamic system: since
>> no memory is pre-allocated for mbufs, what happens if the network load
>> increases gradually while the system is running? Is it possible to get
>> to ENOMEM eventually, with all memory already taken by other pools?
>
> Yes, mbuf allocation is not guaranteed and can fail before the limit is
> reached. What may happen is that an RX DMA ring refill fails and the
> driver wedges. That would be a driver bug.
>
> Can you give more information on the NICs and drivers you use?

All of them use various incarnations of the Intel GigE chip, mostly
igb(4), but we have seen the same behaviour with em(4) as well. Both 8.2
and 8.3 are affected; we have not been able to confirm whether 9.1 has
the same issue.

igb1: port 0xec00-0xec1f mem 0xfbee0000-0xfbefffff,0xfbec0000-0xfbedffff,0xfbe9c000-0xfbe9ffff irq 40 at device 0.1 on pci10
igb1: Using MSIX interrupts with 9 vectors
igb1: Ethernet address: 00:30:48:cf:bb:1d
igb1: [ITHREAD]
igb1: Bound queue 0 to cpu 8
igb1: [ITHREAD]
igb1: Bound queue 1 to cpu 9
igb1: [ITHREAD]
igb1: Bound queue 2 to cpu 10
igb1: [ITHREAD]
igb1: Bound queue 3 to cpu 11
igb1: [ITHREAD]
igb1: Bound queue 4 to cpu 12
igb1: [ITHREAD]
igb1: Bound queue 5 to cpu 13
igb1: [ITHREAD]
igb1: Bound queue 6 to cpu 14
igb1: [ITHREAD]
igb1: Bound queue 7 to cpu 15
igb1: [ITHREAD]

igb1@pci0:10:0:1: class=0x020000 card=0x10c915d9 chip=0x10c98086 rev=0x01 hdr=0x00
    vendor   = 'Intel Corporation'
    class    = network
    subclass = ethernet

-Maxim
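P.S. Just to make sure I understand the failure mode you are describing:
if the RX refill path looks anything like the sketch below and nothing
ever re-arms a retry after an allocation failure, the ring simply drains
and the box ends up looking exactly like ours, with RX stopped while the
rest of the system stays healthy. This is only an illustration with
made-up names (rx_ring, alloc_cluster and refill_rx_ring are
hypothetical, and malloc stands in for the cluster allocator); it is not
the actual igb(4) code.

#include <stdio.h>
#include <stdlib.h>

#define RX_RING_SIZE	1024

struct rx_ring {
	void	*slots[RX_RING_SIZE];	/* one receive buffer per descriptor */
	int	 nfilled;		/* descriptors the NIC can DMA into */
};

/* Stand-in for the real cluster allocator; may return NULL under pressure. */
static void *
alloc_cluster(void)
{

	return (malloc(2048));
}

/*
 * Refill every empty descriptor.  Returns the number of slots that could
 * not be refilled.  If the caller ignores a non-zero return and nothing
 * (a timer, the next interrupt) ever retries, nfilled only goes down over
 * time; once it reaches zero the NIC has nowhere to put incoming packets
 * and RX stops for good, even if memory frees up later.
 */
static int
refill_rx_ring(struct rx_ring *r)
{
	int i, shortfall;

	shortfall = 0;
	for (i = 0; i < RX_RING_SIZE; i++) {
		if (r->slots[i] != NULL)
			continue;
		r->slots[i] = alloc_cluster();
		if (r->slots[i] == NULL) {
			shortfall++;		/* must be retried later */
			continue;
		}
		r->nfilled++;
	}
	return (shortfall);
}

int
main(void)
{
	static struct rx_ring r;	/* zeroed: all slots start empty */

	if (refill_rx_ring(&r) != 0)
		printf("refill incomplete, must be rescheduled\n");
	printf("%d of %d descriptors usable\n", r.nfilled, RX_RING_SIZE);
	return (0);
}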