Date:      Tue, 10 Nov 2009 11:29:00 -0800
From:      Mark Atkinson <atkin901@yahoo.com>
To:        freebsd-current@freebsd.org
Subject:   Re: 8.0RC2 amd64 - kernel panic running make buildworld
Message-ID:  <hdcetu$4iq$1@ger.gmane.org>
In-Reply-To: <20091110184821.4f58a0bf@orwell.free.de>
References:  <1031257439203@webmail57.yandex.ru>	<20091105184925.16b55c43@ernst.jennejohn.org>	<31221257446063@webmail71.yandex.ru>	<20091106101943.5a763f43@ernst.jennejohn.org>	<41361257585651@webmail39.yandex.ru>	<20091107115256.3df62bc3@ernst.jennejohn.org>	<1257618758.1511.14.camel@RabbitsDen>	<6511257846119@webmail85.yandex.ru>	<20091110105856.1270038e@ernst.jennejohn.org>	<1257864452.46072.25.camel@RabbitsDen>	<20091110162205.48abcffe@ernst.jennejohn.org>	<4AF99D53.9030005@icyb.net.ua> <20091110184821.4f58a0bf@orwell.free.de>

Kai Gallasch wrote:
> On Tue, 10 Nov 2009 19:05:23 +0200
> Andriy Gapon <avg@icyb.net.ua> wrote:
> 
>> on 10/11/2009 17:22 gary.jennejohn@freenet.de said the following:
>>> Well, OK, I may have misinterpreted what you wrote or have chosen
>>> bad wording myself to convey the same message.  Nonetheless it
>>> looks like a hardware problem to me.
>> [Trying to make up for my previous mistake.]
>>
>> The symptom certainly looks like misbehaving hardware, but other
>> information from the reports seems to suggest that it is possible
>> that this misbehavior might be caused by software misconfiguring the
>> hardware.
> 
> Hi.
> 
> This thread was started by me. In the meantime I filed a PR:
> http://www.freebsd.org/cgi/query-pr.cgi?pr=140338 
>  
>> I would re-test vm.pmap.pg_ps_enabled=0 just to be sure that it was
>> done correctly the first time.
> 
> I toggled vm.pmap.pg_ps_enabled three times between reboots and the
> result is always the same. superpages enabled: reboot, superpages not
> enabled: server stable
> 
>> I would try to see how 8.0-RC1 kernel behaves and in general try to
>> find last working, first non-working version.
> 8.0RC1, 8.0BETA4 already showed the same behaviour
> 
>> It would be useful to know any (if any) non-default loader.conf and
>> rc.conf settings or kernel config (if not GENERIC).
> 
> loader.conf untouched, rc.conf had just settings for networking active
> when testing. In the end I enabled some other stuff to have it ready for
> 8.0 RELEASE, *after* I found out that disabling superpages helped
> against the crashes.
> 
> Ah yes. I also ran memtest86 on the server for about half a day - no
> problems.
> 
> But read for yourself in the PR.
> 
> I don't rule out that this behaviour with vm.pmap.pg_ps_enabled may be
> hardware related, but why then is the server running stable
> with RELENG_7 and memtest and server diagnostics don't report any
> problem? 

See the following, where I first noticed this problem a long time
ago on my HP DL385 G5.  It also passed memtest86 for days, and swapping
out memory modules gave the same result.

http://article.gmane.org/gmane.os.freebsd.current/111307

I suspect what you're seeing is actually a machine check exception,
which you'll notice if you set

hw.mca.enabled="1"

enable superpages, and then do a buildworld.  Using -j doesn't matter;
it just takes longer to throw an exception.
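
As a rough sketch of that setup, the two settings could be written as
boot-time tunables. Putting them in /boot/loader.conf is an assumption
here, since the thread only names the variables:

```
# /boot/loader.conf (assumed location; sketch only)
hw.mca.enabled="1"          # report machine check exceptions
vm.pmap.pg_ps_enabled="1"   # superpages on; "0" was the stable setting above
```

After a reboot, run make buildworld as usual and watch the console for a
machine check report.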

I'm hoping this is the rev E lfence problem, even though my chips are
not targeted.   When and if a patch goes into -current, I'll try it out
to see if the problem with superpages goes away.

-Mark




