Date:      Fri, 12 Dec 2003 10:32:03 -0800 (PST)
From:      Don Lewis <truckman@FreeBSD.org>
To:        shoesoft@gmx.net
Cc:        current@FreeBSD.org
Subject:   Re: kernel pointer polka, possibly by mount_nfs
Message-ID:  <200312121832.hBCIW3eF058641@gw.catspoiler.org>
In-Reply-To: <1071223849.1494.21.camel@shoeserv.freebsd>

On 12 Dec, Stefan Ehmann wrote:
> On Thu, 2003-12-11 at 07:49, Don Lewis wrote:

>> 
>> That sounds somewhat like the Heisenbug I've been on the hunt for in
>> the last few weeks.  This one liked to munch some file system's struct
>> mount, or whatever structure that mnt_data was pointing to.  The system
>> in question typically blew up when attempting to lock mnt_lock in
>> vfs_busy().  The trigger appeared to be the use of read-only ext2fs. The
>> user who reported this problem said that the system would panic after a
>> few hours.  After getting the user to sprinkle KASSERT()s around, I've
>> pretty much come to the conclusion that the bug is not in the code for the
>> vfs top half.  Another bit of data is that the struct mount getting
>> nuked doesn't appear to belong to ext2fs.  It's hard to tell whose it is
>> though because it gets zeroed.
>> 
>> I use NFS on my two -CURRENT boxes and haven't run into any problems,
>> and I also haven't been able to reproduce any panics with ext2fs, though
>> I haven't exercised that nearly as much.
> 
> I guess you are talking about my panics. Since we don't seem to make any
> progress - would it help to find out when the change that causes the
> problem was made?
> 
> I was running an end-of-September kernel for nearly two months without
> having panics 3 times a day. The kernel of Nov 23 had these problems. So
> the problem should be located somewhere in these two months.
> 
> Since this may take quite some time (and a lot of kernel and
> worldbuilds), I'll only take it into account if there is a good chance
> that this will reveal the source of the problem.

Unfortunately, that may be the fastest way to track down the culprit.
The only other way would be to write a more aggressive assertion checker
function that validates the integrity of all the mount structures and
sprinkle lots of calls to this function around the kernel.
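
To make the idea concrete, here is a rough userland sketch of such a
checker. It is not kernel code: struct mount_sketch, MNT_GUARD, and the
function names are all made up for illustration, and the guard words are
an assumption of mine, not fields that exist in the real struct mount.
The point is only that a zeroed structure fails the check, so the first
call that trips narrows down where the damage happened.

```c
#include <stddef.h>
#include <stdint.h>

/* Arbitrary sentinel value chosen for this sketch. */
#define MNT_GUARD 0x4d4e5447u

/* Hypothetical stand-in for struct mount; not the real kernel layout. */
struct mount_sketch {
    uint32_t guard_head;            /* sentinel written at creation */
    struct mount_sketch *next;      /* next entry in the mount list */
    void *data;                     /* plays the role of mnt_data */
    uint32_t guard_tail;            /* sentinel written at creation */
};

/* Set up one entry and link it in front of 'next'. */
static void
mount_sketch_init(struct mount_sketch *mp, void *data,
    struct mount_sketch *next)
{
    mp->guard_head = MNT_GUARD;
    mp->next = next;
    mp->data = data;
    mp->guard_tail = MNT_GUARD;
}

/* Return 1 if the entry looks intact, 0 if it has been overwritten. */
static int
mount_sketch_ok(const struct mount_sketch *mp)
{
    return (mp->guard_head == MNT_GUARD &&
        mp->guard_tail == MNT_GUARD &&
        mp->data != NULL);
}

/*
 * Walk the whole list and return the first damaged entry, or NULL if
 * every entry passes.  The walk stops at the first failure because the
 * next pointer of a damaged entry cannot be trusted.
 */
static const struct mount_sketch *
mountlist_check(const struct mount_sketch *head)
{
    const struct mount_sketch *mp;

    for (mp = head; mp != NULL; mp = mp->next) {
        if (!mount_sketch_ok(mp))
            return (mp);
    }
    return (NULL);
}
```

In the kernel the call sites would presumably wrap the result in a
KASSERT() so that the panic message identifies which call tripped first.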

I also diff'ed the 2003/09/23 and 2003/11/23 versions of the ext2fs code
and didn't see anything suspicious.  That means that either the culprit
change is something subtle in ext2fs, or it is elsewhere in the kernel.


