Date:      Sat, 2 Nov 2013 12:54:25 -0700
From:      John-Mark Gurney <jmg@funkthat.com>
To:        Shawn Wallbridge <shawn@wallbridge.net>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: 9.2-RELEASE Kernel panic, mbuf underflow
Message-ID:  <20131102195425.GI73243@funkthat.com>
In-Reply-To: <E5E8DE91-F977-421E-95AD-EAD1A087BDEA@wallbridge.net>
References:  <E5E8DE91-F977-421E-95AD-EAD1A087BDEA@wallbridge.net>

Shawn Wallbridge wrote this message on Tue, Oct 29, 2013 at 21:37 -0700:
> I have a file server that keeps panic'ing with a mbuf cluster in the 17 Quadrillion range (2^64 - 2). I am pretty sure it's a buffer underflow.

Ok, after tracking some stuff down, I do not think it has anything to
do w/ mbufs, as the stats appear to be correct... The problem is that
the mbuf clusters line takes into account the fact that some clusters
might still be associated w/ packets (from usr.bin/netstat/mbuf.c):
        printf("%ju/%ju/%ju/%ju mbuf clusters in use "
            "(current/cache/total/max)\n",
            cluster_count - packet_free, cluster_free + packet_free,
            cluster_count + cluster_free, cluster_limit);

notice how current is cluster_count - packet_free instead of something
like cluster_count - cluster_free...  And I just printed your values
from vmcore.6, and apparently packet_count is 0, while packet_free is
5215...

cluster_count is 2049, cluster_free is 1997..

And because packet is a secondary zone of mbufs, things apparently get
confused...  So I wouldn't go down this road anymore...  This looks
like a simple race/accounting error in the status...
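
To make the arithmetic concrete, here is a minimal sketch of what that
unsigned subtraction does with the vmcore.6 values.  This is not the
netstat code itself; the names clusters_in_use and show_vmcore6_current
are made up for illustration, and the only things taken from above are
the expression, the %ju format and the two counter values:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of netstat: mirrors the "current"
 * expression from the printf above.  The operands are uintmax_t
 * (they are printed with %ju), so a result that would be negative
 * wraps around instead.
 */
uintmax_t
clusters_in_use(uintmax_t cluster_count, uintmax_t packet_free)
{
	return (cluster_count - packet_free);
}

/* Print the "current" value for the counters read from vmcore.6. */
void
show_vmcore6_current(void)
{
	uintmax_t cluster_count = 2049;	/* from vmcore.6 */
	uintmax_t packet_free = 5215;	/* from vmcore.6 */

	printf("%ju mbuf clusters in use (current)\n",
	    clusters_in_use(cluster_count, packet_free));
}
```

Since 2049 - 5215 would be -3166, the unsigned result wraps to just
under the maximum uintmax_t value (2^64 - 3166 when uintmax_t is 64
bits), which is the same kind of absurdly large number even though the
zone counters themselves look sane.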

> I have opened a PR, but I haven't had any movement on it. This happened while I was running 9.1-RELEASE as well.
> 
> Here is the PR..
> 
> http://www.freebsd.org/cgi/query-pr.cgi?pr=183424
> 
> And I have uploaded the crash dumps here..
> 
> http://www.wallbridge.net/crash/
> 
> If anyone has any ideas, I would be grateful as this is a production box and it's really impacting us.

Have you done a full fsck on the fs to make sure that there isn't any
corruption on the disk that keeps popping up?  I do realize that it
will take a LONG time to fsck...  Sadly, your last three cores
(all on 9.2-R) are for different inodes...

Could you tell me the path and filename of inodes: 3226539015,
3224134148 and 3343904256?  It could help us track down which app is
causing this and let us reproduce it...

To find the inode on the fs use find <fs> -inum <inum>, so:
find <fs> -inum 3226539015 -or -inum 3224134148 -or -inum 3343904256

will do it in one pass so it won't take so long...

Thanks.

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."


