Date:      Thu, 11 Sep 2008 03:56:31 -0700
From:      Jeremy Chadwick <koitsu@FreeBSD.org>
To:        Michael Grant <mgrant@grant.org>
Cc:        Kris Kennaway <kris@freebsd.org>, FreeBSD Stable List <freebsd-stable@freebsd.org>
Subject:   Re: Fresh 7.0 Install: Fatal Trap 12 panic when put under load
Message-ID:  <20080911105631.GB25493@icarus.home.lan>
In-Reply-To: <62b856460809110308sa44f057mc08189a97efa9d0c@mail.gmail.com>
References:  <BF6724CD748744908D602889CCF119F1@emea.hubersuhner.net> <487E0D1B.2060902@FreeBSD.org> <20080716203900.5jt4qce17gg0og0o@mail.basicnets.co.uk> <A403B8D27BE048E79A94B09C0C520854@emea.hubersuhner.net> <B4E29257-B805-4597-9024-E042F34243D1@mac.com> <62b856460807241309k3cea60dbh24eea677cd6751f7@mail.gmail.com> <4888E207.4020606@FreeBSD.org> <62b856460809110138o5fb10171h9832ac8b964fa3f6@mail.gmail.com> <20080911092047.GA24499@icarus.home.lan> <62b856460809110308sa44f057mc08189a97efa9d0c@mail.gmail.com>

On Thu, Sep 11, 2008 at 12:08:47PM +0200, Michael Grant wrote:
> On Thu, Sep 11, 2008 at 11:20 AM, Jeremy Chadwick <koitsu@freebsd.org> wrote:
> > On Thu, Sep 11, 2008 at 10:38:36AM +0200, Michael Grant wrote:
> >> My box crashed again:
> >>
> >> panic: kmem_malloc(4096): kmem_map too small: 1073741824 total allocated
> >> cpuid = 0
> >> Uptime: 33d11h12m58s
> >> Dumping 3327 MB (2 chunks)
> >>   chunk 0: 1MB (151 pages) ... ok
> >>   chunk 1: 3327MB (851568 pages)  <---hung here
> >>
> >> Still no valid dump.
> >>
> >> There is 4gig of physical memory in the machine.
> >>
> >> In /boot/loader.conf, I currently have the following:
> >>
> >> vm.kmem_size=1G
> >> vm.kmem_size_max=1G
> >> vm.kmem_size_scale=2
> >>
> >> and in my kernel conf file I have:
> >>
> >> options         KVA_PAGES=512
> >>
> >> It stayed up for 33 days this time.  Is there anything else I can do?
> >
> > First and foremost: are you using ZFS on this machine?  If so, there are
> > many tunables you can apply to try and limit this; I'm willing to bet
> > it's ARC which is doing it.  See below.
> >
> > In general, it appears that you need to increase the maximum range of
> > kmem.  The kernel attempted to utilise more than 1GB, and your limit is
> > 1G.  My machines running RELENG_7 on amd64, with only 2GB of RAM
> > installed, use the following tunables in loader.conf:
> >
> > vm.kmem_size="1536M"
> > vm.kmem_size_max="1536M"
> >
> > If ZFS is in use, I recommend these as well:
> >
> > vfs.zfs.arc_min="16M"
> > vfs.zfs.arc_max="64M"
> > vfs.zfs.prefetch_disable="1"
> >
> > Do not increase kmem_size beyond 1.5GB; on RELENG_7, the amount of RAM
> > installed in the machine will not help.  This is a known limitation
> > which has been fixed in HEAD/CURRENT (where the limit has been
> > increased to 512GB).  See the "Kernel" section of the page below for
> > the applicable item.
> >
> > http://wiki.freebsd.org/JeremyChadwick/Commonly_reported_issues
> >
> > Your only solution may be to run HEAD/CURRENT.
> 
> I am not running ZFS.  My file systems are ufs.
> 
> This feels like some sort of memory leak in the kernel.  Giving it
> more and more memory just seems to delay the crash.  Are you saying
> the crash is fixed in HEAD/CURRENT?

It's an intentional panic, not a "the program tried to access NULL and
crashed the machine" kind of crash.  The kernel needs more memory to
accomplish a certain task, and that memory isn't available.  kris@ can
explain this in better terms than I can.
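For reference, the arithmetic behind that: the 1073741824 bytes reported in the panic is exactly 2^30 bytes, i.e. the 1G set in vm.kmem_size, so kmem_map had been filled right up to its configured limit. A minimal Python sketch of that check follows; parse_size() and kmem_exhausted() are hypothetical helpers, and treating the K/M/G suffixes as powers of 1024 is an assumption about how the loader interprets them.

```python
import re

# Assumption: loader.conf size suffixes are binary units (1K = 1024 bytes).
_SUFFIXES = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value: str) -> int:
    """Convert a loader.conf-style size such as "1G" or "1536M" to bytes."""
    m = re.fullmatch(r'"?(\d+)([KMG]?)"?', value.strip(), re.IGNORECASE)
    if m is None:
        raise ValueError(f"unrecognised size: {value!r}")
    number, suffix = m.groups()
    return int(number) * _SUFFIXES[suffix.upper()]


def kmem_exhausted(panic_total: int, kmem_size: str) -> bool:
    """True if the panic's "total allocated" figure reached the configured cap."""
    return panic_total >= parse_size(kmem_size)


if __name__ == "__main__":
    # Figure taken from the panic message quoted earlier in this thread.
    total = 1073741824
    print(parse_size("1G"))          # the current vm.kmem_size setting
    print(parse_size('"1536M"'))     # the value suggested for RELENG_7
    print(kmem_exhausted(total, "1G"))
```

The same helper shows what the suggested 1536M setting amounts to in bytes (1610612736), i.e. half a gigabyte more headroom than the limit that was hit.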

First and foremost, it would be good to find out everything you are
running on this machine (process-wise).  A process could be tickling
something in the kernel that requires a large amount of memory.  I can
imagine something like MySQL doing this.

Ideally, you would debug the kernel or get a full map of kmem to find
out what's using what.  I believe the output of vmstat -m or vmstat -z
might help.
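As a starting point for reading vmstat -z output, here is a rough Python sketch that estimates how many bytes each UMA zone holds. It assumes the comma-separated "NAME: SIZE, LIMIT, USED, FREE, REQUESTS[, FAILURES]" layout and approximates usage as SIZE * USED; the SAMPLE text is made-up output in that layout, not data from this machine, and parse_vmstat_z()/top_zones() are hypothetical names.

```python
from typing import Dict, List, Tuple

# Made-up `vmstat -z` output in the assumed layout (not from the machine
# discussed in this thread).
SAMPLE = """\
ITEM                     SIZE     LIMIT      USED      FREE  REQUESTS  FAILURES

UMA Kegs:                 128,        0,       85,        5,       85,        0
mbuf:                     256,        0,     1000,      200,    50000,        0
VNODE:                    528,        0,    20000,      100,    90000,        0
"""


def parse_vmstat_z(text: str) -> Dict[str, int]:
    """Return {zone name: approximate bytes in use}, computed as SIZE * USED."""
    usage: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.rpartition(":")
        if not sep:
            continue  # header and blank lines contain no colon
        fields = [field.strip() for field in rest.split(",")]
        try:
            size, used = int(fields[0]), int(fields[2])
        except (IndexError, ValueError):
            continue  # anything that doesn't look like a zone line
        usage[name.strip()] = size * used
    return usage


def top_zones(usage: Dict[str, int], n: int = 5) -> List[Tuple[str, int]]:
    """The n zones holding the most memory, largest first."""
    return sorted(usage.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    for name, nbytes in top_zones(parse_vmstat_z(SAMPLE)):
        print(f"{name:<12} {nbytes:>12,d} bytes")
```

SIZE * USED ignores per-zone overhead and free items, so treat the totals as rough indicators rather than exact kmem accounting.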

Obviously, since the machine panics, you won't be able to run those
commands after the fact.  I would recommend setting up a cronjob that
runs every 1-2 minutes and logs the output of both commands to a file.
When the panic happens, restart the system and look through the logfile
to see whether anything suddenly starts taking up a large amount of
memory, or whether the growth is gradual (indicating a memory leak).
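A cron-driven logger along those lines might look like the Python sketch below. It is an illustration, not a tested tool: it assumes Python 3.7 or later (for capture_output), takes the log path as its first argument, and calls fsync() after each write so that entries are on disk before the next panic. The script name and the format_entry()/take_snapshot()/append_snapshot() helpers are hypothetical.

```python
import os
import subprocess
import sys
import time
from typing import Dict

# Commands whose output we want to preserve across a panic.
COMMANDS = (["vmstat", "-m"], ["vmstat", "-z"])


def format_entry(timestamp: float, outputs: Dict[str, str]) -> str:
    """Build one log record: a timestamped banner, then each command's output."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))
    parts = [f"===== {stamp} ====="]
    for command, output in outputs.items():
        parts.append(f"--- {command} ---")
        parts.append(output.rstrip("\n"))
    return "\n".join(parts) + "\n"


def take_snapshot() -> str:
    """Run each command and wrap the results in a single log record."""
    outputs: Dict[str, str] = {}
    for argv in COMMANDS:
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
        outputs[" ".join(argv)] = result.stdout
    return format_entry(time.time(), outputs)


def append_snapshot(log_path: str) -> None:
    """Append (never truncate) so earlier history survives a panic and reboot."""
    with open(log_path, "a") as log:
        log.write(take_snapshot())
        log.flush()
        # Force the data to disk; buffered writes are lost if the box panics.
        os.fsync(log.fileno())


if __name__ == "__main__":
    if len(sys.argv) > 1:
        append_snapshot(sys.argv[1])
    else:
        print("usage: kmem_snapshot.py LOGFILE", file=sys.stderr)
```

A root crontab entry such as the following would run it every two minutes (the interpreter and script paths are placeholders):

    */2 * * * * /usr/local/bin/python3 /root/kmem_snapshot.py /var/log/kmem-snapshots.log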

If you can figure out what might be tickling the problem, you can
ultimately determine whether increasing kmem is the right thing to do,
or whether there's a greater problem here.
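Once you have two vmstat -m snapshots taken some time apart, ranking malloc types by how much their MemUse grew can make the culprit stand out. The Python sketch below assumes the "Type InUse MemUse HighUse Requests Size(s)" column layout with MemUse printed in kilobytes (e.g. "9K"); BEFORE and AFTER are invented sample rows (type names and figures are illustrative only), and parse_vmstat_m()/growth() are hypothetical names.

```python
import re
from typing import Dict, List, Tuple

# One data row of `vmstat -m`; assumes MemUse is printed in kilobytes ("9K").
# Type names may contain spaces, so the name is matched non-greedily.
_ROW = re.compile(
    r"^\s*(?P<type>\S.*?)\s+(?P<inuse>\d+)\s+(?P<memuse>\d+)K\s+"
    r"(?P<highuse>\S+)\s+(?P<requests>\d+)(?:\s+(?P<sizes>[\d,]+))?\s*$"
)

# Invented before/after samples, for illustration only.
BEFORE = """\
         Type InUse MemUse HighUse Requests  Size(s)
         temp   120    30K       -     5000  16,32,64
CAM dev queue     2     1K       -        2  64
"""
AFTER = """\
         Type InUse MemUse HighUse Requests  Size(s)
         temp  9000  4096K       -    12000  16,32,64
CAM dev queue     2     1K       -        2  64
"""


def parse_vmstat_m(text: str) -> Dict[str, int]:
    """Map each malloc type name to its MemUse in KiB; other lines are skipped."""
    usage: Dict[str, int] = {}
    for line in text.splitlines():
        match = _ROW.match(line)
        if match:
            usage[match.group("type")] = int(match.group("memuse"))
    return usage


def growth(before: str, after: str, n: int = 5) -> List[Tuple[str, int]]:
    """Malloc types whose MemUse grew between two snapshots, biggest growth first."""
    old, new = parse_vmstat_m(before), parse_vmstat_m(after)
    deltas = [(name, kb - old.get(name, 0)) for name, kb in new.items()]
    grew = [d for d in deltas if d[1] > 0]
    grew.sort(key=lambda d: d[1], reverse=True)
    return grew[:n]


if __name__ == "__main__":
    for name, delta in growth(BEFORE, AFTER):
        print(f"{name:<14} +{delta}K")
```

A single type that keeps climbing across many snapshots points toward a leak; a sudden jump within one interval points toward a specific workload.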

> I'm running 6.3 by the way.
> 
> I have put your changes into my loader.conf, we'll see how long it
> goes this time.  I'm not quite in a position to update everything to 7.x
> at the moment.

Our production webservers run RELENG_6 and RELENG_7, and we don't
encounter this kind of problem.  I'm not saying what you're experiencing
is indicative of hardware issues or something like that -- I'm simply
saying I have loaded systems which don't ever hit that condition.  So
figuring out what's causing it in your case would be good.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



