Date:      Mon, 4 Apr 2011 09:15:56 +0200
From:      Vlad Galu <dudu@dudu.ro>
To:        Eugene Grosbein <egrosbein@rdtc.ru>
Cc:        "net@freebsd.org" <net@freebsd.org>
Subject:   Re: mbuf clusters exhaustion & keglimit
Message-ID:  <BANLkTikQc_N6RKE-Ov_nNqivDemefuCpdQ@mail.gmail.com>
In-Reply-To: <4D9969A8.1060701@rdtc.ru>
References:  <4D9969A8.1060701@rdtc.ru>

On Mon, Apr 4, 2011 at 8:48 AM, Eugene Grosbein <egrosbein@rdtc.ru> wrote:

> Hi!
>
> I'm running several loaded PPPoE access servers based on FreeBSD
> 8.2-STABLE/amd64 with em and igb network interfaces and 4GB RAM.
> No memory-intensive tasks other than routing about 2Gbit/s
> (1G "in" and a bit less "out").
>
> kern.ipc.nmbclusters is set to 100000 in /etc/sysctl.conf and for
> several months I had no problems with mbufs. Last week one of the
> routers stopped servicing users for several hours but responded to
> pings and the console was alive. Outgoing ping worked fine too, but
> any process trying to use a TCP or UDP kernel service got stuck in
> the "keglimit" state.
>
> I've dropped to KDB from console, ran "call doadump", got full crashdump,
> returned from KDB, saved crashdump and tried to reboot cleanly.
>
> mpd5 failed to stop within 30 seconds timeout but file systems
> were unmounted cleanly and system rebooted.
>
> "vmstat -z -M vmcore" says that system was out of mbuf clusters:
>
> ITEM            SIZE    LIMIT     USED  FREE  REQUESTS  FAILURES
> mbuf_cluster:   2048,  100000,  100000,    0, 18897242,   317691
>
> After that I've created graphs of mbuf cluster usage for all my routers
> and see no apparent leaks.
>
> The question is: how much kernel memory is it safe to dedicate to mbuf
> clusters? This system still runs with 100000 mbuf clusters maximum:
>
> Mem: 65M Active, 2759M Inact, 455M Wired, 31M Cache, 398M Buf, 435M Free
>
> It seems 100000 mbuf clusters take only 207MB (2048+256 bytes for
> each), don't they?
>
> Eugene Grosbein
>
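
As a rough check on the arithmetic in the quoted message, here is a minimal
Python sketch. It assumes, as the quoted estimate does, that every 2048-byte
cluster is paired with exactly one 256-byte mbuf, and it ignores any allocator
overhead. The function names are made up for illustration; the numbers are
the ones from the vmstat output above.

```python
# Sizes taken from the quoted message; the one-mbuf-per-cluster pairing
# is an assumption of the estimate, not something measured.
CLUSTER_BYTES = 2048
MBUF_BYTES = 256


def cluster_pool_bytes(nmbclusters: int) -> int:
    """Upper-bound memory for nmbclusters clusters plus one mbuf each."""
    return nmbclusters * (CLUSTER_BYTES + MBUF_BYTES)


def to_mib(nbytes: int) -> float:
    """Convert bytes to MiB (2**20 bytes)."""
    return nbytes / (1024 * 1024)


def failure_ratio(requests: int, failures: int) -> float:
    """Fraction of allocation requests that failed."""
    return failures / requests


if __name__ == "__main__":
    total = cluster_pool_bytes(100000)
    print(f"{total} bytes, about {to_mib(total):.1f} MiB")
    print(f"failed requests: {failure_ratio(18897242, 317691):.2%}")
```

Under these assumptions the pool works out to 230400000 bytes, so somewhat
more than the 207MB estimate, but still small next to 4GB of RAM.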

I've been having the same kind of issues with another 8.2/amd64 box with
bge(4) NICs. Unfortunately I don't have console access to that machine and
haven't graphed anything yet, but the symptom happened to occur while I was
logged in a couple of days ago, when the machine wasn't busy handling
anything other than my SSH session. The ISP has checked their switch graphs
and told me there was no spike that would correlate with this event either.
My machine is UP and I tried both direct and queued (with various queue
lengths) ISR dispatch modes. I never had more than 250k mbuf clusters
allocated, but for this machine's workload even that is quite generous...

-- 
Good, fast & cheap. Pick any two.


