Date:      Tue, 25 Dec 2012 01:28:00 +0200
From:      Andriy Gapon <avg@FreeBSD.org>
To:        Derek Kulinski <takeda@takeda.tk>
Cc:        freebsd-fs@FreeBSD.org, freebsd-stable@FreeBSD.org
Subject:   Re: FreeBSD 9.1-RELEASE crashes almost daily; backtraces always list zfs routines
Message-ID:  <50D8E500.1070408@FreeBSD.org>
In-Reply-To: <331959998.20121224101719@takeda.tk>
References:  <1824023197.20121223142308@takeda.tk> <50D87C56.70709@FreeBSD.org> <331959998.20121224101719@takeda.tk>

on 24/12/2012 20:17 Derek Kulinski said the following:
> Hello Andriy,
> 
> Monday, December 24, 2012, 8:01:26 AM, you wrote:
> 
>> on 24/12/2012 00:23 Derek Kulinski said the following:
>>> Dumping 3701 out of 8072 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> 
>> So do you have the crash dump(s)?
> 
> Yes, but they are 3.5GB each. I attached text dump to GNATS but I can
> resend it to you (I don't know if it's ok to send attachments to the
> mailing list). If you would prefer I could give you access to the
> box.

Derek,

I've looked through the cores and it does look like in all cases some sort of
memory corruption is a precursor to a subsequent crash.

I can't say definitively whether the corruption is caused by the hardware, by
some code overwriting random memory locations (a "rogue" driver), or by a
"simpler" bug such as a use-after-free.

I am always inclined to suspect the hardware first.

You can try to reproduce the problem with some additional checks enabled in the
kernel.  Those should catch the problem earlier and thus make its source clearer.

I recommend the following:
options         INVARIANTS
options         INVARIANT_SUPPORT
options         WITNESS
options         DEBUG_MEMGUARD
makeoptions     DEBUG+="-DDEBUG"

The last one is really needed only for the ZFS and OpenSolaris compat code.  It
may result in some extra noise from unrelated subsystems.
Perhaps you could just add "#define DEBUG" to
sys/cddl/contrib/opensolaris/uts/common/sys/debug.h.  I haven't tested this
approach though.
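
For illustration, the options above could go into a custom kernel configuration
that pulls in GENERIC.  This is only a minimal sketch: the config name ZFSDEBUG
and the amd64 path are assumptions, not something taken from your setup.

```
# /usr/src/sys/amd64/conf/ZFSDEBUG  (hypothetical name; amd64 assumed)
include         GENERIC
ident           ZFSDEBUG

# Extra consistency checks and lock order verification
options         INVARIANTS
options         INVARIANT_SUPPORT
options         WITNESS
# Guard pages around allocations of one selected type
options         DEBUG_MEMGUARD
# Enables DEBUG-only assertions, mainly in ZFS / OpenSolaris compat code
makeoptions     DEBUG+="-DDEBUG"
```

Such a config would be built and installed the usual way from /usr/src, i.e.
"make buildkernel KERNCONF=ZFSDEBUG" followed by "make installkernel
KERNCONF=ZFSDEBUG".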

Also, please put vm.memguard.desc="arc_buf_hdr_t" into loader.conf.
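
As a sketch, assuming the default /boot/loader.conf location, the entry would
look like this.  The sysctl check mentioned in the comment is an assumption
that the setting is also readable at run time on this release.

```
# /boot/loader.conf
# Ask MemGuard to guard allocations named "arc_buf_hdr_t"
vm.memguard.desc="arc_buf_hdr_t"
# After a reboot, "sysctl vm.memguard.desc" can be used to check the value
# (assumption: the tunable is exported as a sysctl as well).
```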

Please note that these options will make your system significantly slower.

-- 
Andriy Gapon


