Date:      Wed, 22 Jan 2020 16:36:52 +0000
From:      matthew@FreeBSD.org
To:        freebsd-questions@freebsd.org
Subject:   Re: 12.1 RELEASE General Protection Fault (Trap 9)
Message-ID:  <693acc2b-b573-9fba-ab73-91d28f27e8ac@infracaninophile.co.uk>
In-Reply-To: <22046a36-12d3-032a-6325-24e18b1a855b@lateapex.net>
References:  <22046a36-12d3-032a-6325-24e18b1a855b@lateapex.net>

On 22/01/2020 15:12, Jason Van Patten wrote:
> Since sometime before Christmas (as far as I know), my NAS has started 
> randomly crashing, reloading, and saving cores in /var/crash.  It was 
> doing this with 12.0 and now with 12.1.  My gut tells me it's hardware 
> related, but I'm not quite sure.  The various bits and pieces are:

Given the crashes do not appear to be associated with any particular 
activity, I think you're on the money with your diagnosis that it is 
hardware related.

Did you change any of the hardware on this system recently?  If you've 
added more disks or such, then you may have overloaded the PSU.  If the 
PSU can't produce voltages in spec, then you will see random crashes, 
although in that case I doubt you'd always see a 'General Protection
Fault'.  Unless this is a new machine or you've changed some of the
hardware, this is unlikely to be the diagnosis.

Otherwise, suspect hardware problems.  In rough order of expense, least 
to most:

    * Bad heatsink, failed case fan, CPU thermal paste not up to snuff
      or other cause that may lead to your system overheating

    * Bad memory

    * Bad CPU

The first of these is relatively cheap and easy to handle: make sure 
you're getting unimpeded airflow through the chassis -- clean any 
filters, make sure fans are spinning correctly and that heatsinks have 
good thermal contact, if necessary by renewing any thermal paste. 
Monitoring the CPU temperature will help here -- if you see the CPU 
temperature increasing just before everything goes kaput, that's a 
fairly solid diagnostic. For an i7, you should be able to use the
coretemp(4) kernel module and read off the temperature from the
dev.cpu.%d.temperature sysctls (dev.cpu.0.temperature,
dev.cpu.1.temperature and so on).
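As a rough illustration of that kind of monitoring, here is a minimal
Python sketch (not a FreeBSD tool, just an example) that polls the
temperature sysctls via sysctl(8) and appends timestamped readings to a
log file.  It assumes coretemp(4) is already loaded (kldload coretemp,
or coretemp_load="YES" in /boot/loader.conf), that
`sysctl -n dev.cpu.N.temperature` prints a value like "45.0C", and that
there is one such sysctl per CPU counted by os.cpu_count().  The script
name cputemp.py and the log path are hypothetical.

```python
#!/usr/bin/env python3
"""Log CPU temperatures from coretemp(4) sysctls, one line per sample."""
import os
import re
import subprocess
import sys
import time

# Assumed format of `sysctl -n dev.cpu.N.temperature` output, e.g. "45.0C".
TEMP_RE = re.compile(r"^\s*(-?\d+(?:\.\d+)?)C\s*$")


def parse_temp(text):
    """Convert sysctl output such as '45.0C' to degrees Celsius."""
    m = TEMP_RE.match(text)
    if m is None:
        raise ValueError(f"unexpected temperature format: {text!r}")
    return float(m.group(1))


def read_temps(ncpu):
    """Read dev.cpu.0..ncpu-1.temperature through sysctl(8)."""
    temps = []
    for cpu in range(ncpu):
        out = subprocess.run(
            ["sysctl", "-n", f"dev.cpu.{cpu}.temperature"],
            capture_output=True, text=True, check=True,
        ).stdout
        temps.append(parse_temp(out))
    return temps


def format_sample(timestamp, temps):
    """Build one log line: local timestamp, then each CPU's temperature."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))
    return stamp + " " + " ".join(f"{t:.1f}" for t in temps)


def main(logfile, interval=10):
    # Assumption: one dev.cpu.N node per logical CPU.
    ncpu = os.cpu_count() or 1
    with open(logfile, "a") as log:
        while True:
            log.write(format_sample(time.time(), read_temps(ncpu)) + "\n")
            # Flush and fsync so the last samples survive a sudden panic.
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: cputemp.py LOGFILE", file=sys.stderr)
```

Run it with something like `python3 cputemp.py /var/log/cputemp.log`
(as root if the log lives under /var/log).  After the next crash, the
last few lines of the log should show whether the temperature was
climbing beforehand.  The fsync after every sample costs a little disk
activity, but without it the readings taken just before a panic would
likely still be sitting in a buffer and be lost.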

Memory problems can frequently be diagnosed by use of a memory checker 
like sysutils/memtest86+ -- if this says you have a problem, then you do 
have a problem.  However, it may not catch every possible memory problem,
so it can wrongly give you an 'all clear'.  It's pretty accurate in
practice though.  A more definitive test is to swap out any suspect RAM 
modules and see if the problem goes away.

The worst case is a bad CPU.  memtest86+ will diagnose some CPU faults, 
but it is less effective on CPU problems.  If there is a CPU problem, it 
will be a pretty subtle one, as the typical symptoms of a CPU problem are
that the system won't boot and the BIOS makes horrible beeping noises
when you try.

Even so, this isn't a definitive list.  I've heard tales of someone
trying to diagnose this sort of problem by swapping out, bit by bit,
all of the components of a system except for the case, and the problem
still occurred.  It turned out the case was slightly bent, and that put
enough stress on the motherboard to cause intermittent electrical
contact problems.

	Cheers,

	Matthew
