From: Vlad Galu
Date: Mon, 4 Apr 2011 09:15:56 +0200
To: Eugene Grosbein
Cc: "net@freebsd.org"
Subject: Re: mbuf clusters exhaustion & keglimit
In-Reply-To: <4D9969A8.1060701@rdtc.ru>

On Mon, Apr 4, 2011 at 8:48 AM, Eugene Grosbein wrote:

> Hi!
>
> I'm running several loaded PPPoE access servers based on FreeBSD
> 8.2-STABLE/amd64 with em and igb network interfaces and 4GB RAM.
> No memory-intensive tasks other than routing about 2Gbit/s
> (1G "in" and a bit less "out").
>
> kern.ipc.nmbclusters is set to 100000 in /etc/sysctl.conf and for
> several months I had no problems with mbufs. Last week one of the
> routers stopped servicing users for several hours but responded to
> pings and console was alive. Outgoing ping worked fine too but any
> process trying to use TCP or UDP kernel service got stuck in
> "keglimit" state.
>
> I've dropped to KDB from console, ran "call doadump", got full crashdump,
> returned from KDB, saved crashdump and tried to reboot cleanly.
>
> mpd5 failed to stop within 30 seconds timeout but file systems
> were unmounted cleanly and system rebooted.
>
> "vmstat -z -M vmcore" says that system was out of mbuf clusters:
>
> ITEM            SIZE    LIMIT     USED   FREE   REQUESTS  FAILURES
> mbuf_cluster:   2048,  100000,  100000,     0,  18897242,   317691
>
> After that I've created graphs of mbuf cluster usage for all my routers
> and see no apparent leaks.
>
> The question is: how much kernel memory is it safe to dedicate to mbuf
> clusters?
> This system still runs with 100000 mbuf clusters maximum:
>
> Mem: 65M Active, 2759M Inact, 455M Wired, 31M Cache, 398M Buf, 435M Free
>
> It seems, 100000 mbuf clusters take only 207MB (2048+256 bytes for each),
> do they?
>
> Eugene Grosbein
>

I've been having the same kind of issues with another 8.2/amd64 box with
bge(4) NICs. Unfortunately I don't have console access to that machine and
haven't yet graphed anything, but the symptom happened to occur while I was
logged in a couple of days ago, when the machine wasn't busy handling
anything other than my SSH session. The ISP has checked their switch graphs
and told me there was no spike that would correlate with this event either.

My machine is UP and I tried both direct and queued (with various queue
lengths) ISR dispatch modes. I never had more than 250k mbuf clusters
allocated, but for this machine's workload even that is quite generous...

-- 
Good, fast & cheap. Pick any two.
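
As a rough check on the sizing question in the quoted message, the arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes the "2048+256 bytes for each" figure from the quoted message at face value (one 256-byte mbuf per 2048-byte cluster), ignores any allocator overhead, and the helper names are made up for this example. The input row is the mbuf_cluster line quoted from "vmstat -z -M vmcore".

```python
# Sketch only: estimate memory held by mbuf clusters from the quoted figures.
# Assumption (from the quoted message): each cluster costs 2048 + 256 bytes.
# Function names here are hypothetical, not part of any FreeBSD tool.

CLUSTER_SIZE = 2048  # bytes, the SIZE column of the mbuf_cluster row
MBUF_SIZE = 256      # bytes, per-cluster mbuf assumed in the quoted message


def parse_vmstat_z_row(line):
    """Parse one 'name: size, limit, used, free, requests, failures' row."""
    name, _, rest = line.partition(":")
    size, limit, used, free, requests, failures = (
        int(field) for field in rest.split(",")
    )
    return {
        "item": name.strip(),
        "size": size,
        "limit": limit,
        "used": used,
        "free": free,
        "requests": requests,
        "failures": failures,
    }


def cluster_memory_bytes(n_clusters, cluster_size=CLUSTER_SIZE, mbuf_size=MBUF_SIZE):
    """Bytes consumed if every cluster is paired with exactly one mbuf."""
    return n_clusters * (cluster_size + mbuf_size)


row = parse_vmstat_z_row("mbuf_cluster: 2048, 100000, 100000, 0, 18897242, 317691")
total = cluster_memory_bytes(row["limit"], cluster_size=row["size"])

print("zone exhausted:", row["used"] >= row["limit"])
print("bytes at limit:", total)
print("MB (10^6 bytes): %.1f" % (total / 1e6))
print("MiB (2^20 bytes): %.1f" % (total / 2**20))
```

Under those assumptions the limit of 100000 clusters works out to 230,400,000 bytes, about 230 MB (roughly 220 MiB), which is somewhat more than the 207MB estimated in the quoted message but still a small fraction of 4GB of RAM.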