Date:      Wed, 22 Jan 2020 18:48:20 +0200
From:      Christos Chatzaras <chris@cretaforce.gr>
To:        FreeBSD Mailing List <freebsd-questions@freebsd.org>
Subject:   Re: 12.1 RELEASE General Protection Fault (Trap 9)
Message-ID:  <5A315787-F2FA-48BC-81BC-6668C1C08493@cretaforce.gr>
In-Reply-To: <693acc2b-b573-9fba-ab73-91d28f27e8ac@infracaninophile.co.uk>
References:  <22046a36-12d3-032a-6325-24e18b1a855b@lateapex.net> <693acc2b-b573-9fba-ab73-91d28f27e8ac@infracaninophile.co.uk>



> On 22 Jan 2020, at 18:36, matthew@freebsd.org <matthew@FreeBSD.org> wrote:
>
> On 22/01/2020 15:12, Jason Van Patten wrote:
>> Since sometime before Christmas (as far as I know), my NAS has started
>> randomly crashing, reloading, and saving cores in /var/crash.  It was
>> doing this with 12.0 and now with 12.1.  My gut tells me it's hardware
>> related, but I'm not quite sure.  The various bits and pieces are:
>
> Given the crashes do not appear to be associated with any particular
> activity, I think you're on the money with your diagnosis that it is
> hardware related.
>
> Did you change any of the hardware on this system recently?  If you've
> added more disks or such, then you may have overloaded the PSU.  If the
> PSU can't produce voltages in spec, then you will see random crashes,
> although I doubt in that case you'd always see 'General Protection
> Fault'.  Unless this is a new machine, or you've changed some of the
> hardware, this is unlikely to be the diagnosis.
>
> Otherwise, suspect hardware problems.  In rough order of expense,
> least to most:
>
>   * Bad heatsink, failed case fan, CPU thermal paste not up to snuff
>     or other cause that may lead to your system overheating
>
>   * Bad memory
>
>   * Bad CPU
>
> The first of these is relatively cheap and easy to handle: make sure
> you're getting unimpeded airflow through the chassis -- clean any
> filters, make sure fans are spinning correctly and that heatsinks have
> good thermal contact, if necessary by renewing any thermal paste.
> Monitoring the CPU temperature will help here -- if you see the CPU
> temperature increasing just before everything goes kaput, that's a
> fairly solid diagnostic.  For an i7, you should be able to use the
> coretemp(4) kernel module and read off the temperature from the
> dev.cpu.%d.temperature sysctls.
>
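
To catch a temperature rise just before a crash, the readings have to be
logged somewhere that survives the reboot.  Below is a minimal sketch of
one way to do that, assuming coretemp(4) is already loaded (kldload
coretemp) and that sysctl prints lines of the form
"dev.cpu.0.temperature: 45.0C".  The function names, the log path and the
10 second interval are my own placeholders, not anything from this thread.

```python
import os
import re
import subprocess
import time

# Matches lines such as "dev.cpu.0.temperature: 45.0C" (assumed output format).
TEMP_RE = re.compile(r"^dev\.cpu\.(\d+)\.temperature:\s*(\d+(?:\.\d+)?)C[ \t]*$", re.M)


def parse_temps(sysctl_output):
    """Return {cpu_index: degrees_celsius} parsed from `sysctl dev.cpu` text."""
    return {int(cpu): float(temp) for cpu, temp in TEMP_RE.findall(sysctl_output)}


def read_temps():
    """Run sysctl(8) and parse the per-core temperatures it reports."""
    result = subprocess.run(
        ["sysctl", "dev.cpu"], capture_output=True, text=True, check=True
    )
    return parse_temps(result.stdout)


def log_temps(path="/var/log/cputemp.log", interval=10):
    """Append one timestamped line per reading until interrupted.

    `path` and `interval` are hypothetical defaults; adjust to taste.
    """
    with open(path, "a") as log:
        while True:
            temps = read_temps()
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            fields = " ".join(f"cpu{i}={t:.1f}C" for i, t in sorted(temps.items()))
            log.write(f"{stamp} {fields}\n")
            log.flush()
            # Ask the kernel to write the line out, so the last readings
            # are more likely to still be on disk after a panic.
            os.fsync(log.fileno())
            time.sleep(interval)
```

Calling log_temps() as root from a background session and then reading
the tail of the log after the next crash should show whether the cores
were heating up beforehand.
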
> Memory problems can frequently be diagnosed by use of a memory checker
> like sysutils/memtest86+ -- if this says you have a problem, then you do
> have a problem.  However, it may not catch every possible memory problem
> so it can wrongly give you an 'all clear'.  It's pretty accurate in
> practice though.  A more definitive test is to swap out any suspect RAM
> modules and see if the problem goes away.
>
> The worst case is a bad CPU.  memtest86+ will diagnose some CPU
> faults, but it is less effective on CPU problems.  If there is a CPU
> problem, it will be a pretty subtle one, as the typical symptoms of CPU
> problems are that the system won't boot and the BIOS makes horrible
> beeping noises when you try.
>
> Even so, this isn't a definitive list.  I've heard tales about trying
> to diagnose this sort of problem where someone had bit by bit swapped
> out all of the components of a system except for the case, and the
> problem still occurred.  Turned out the case was slightly bent and that
> put enough stress on the motherboard to cause some intermittent
> electrical connectivity.
>
> 	Cheers,
>
> 	Matthew

I had similar crashes and it was bad RAM.

I recommend checking the RAM with the userland memtester if downtime is
not an option.
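
For unattended runs it can help to wrap memtester and decode its exit
status.  This is only a sketch: the meaning of the status bits is as I
recall it from memtester(8), so check the man page for your version, and
the function names and the default size and loop count are placeholders
of mine.  memtester takes the amount of memory to test (a suffix such as
M or G is accepted) followed by an optional number of iterations.

```python
import subprocess

# Exit-status bits as recalled from memtester(8); treat these as an
# assumption and verify them against your installed man page.
EXIT_BITS = {
    0x01: "could not allocate or lock memory, or bad invocation",
    0x02: "stuck address test failed",
    0x04: "one of the other tests failed",
}


def describe_exit(status):
    """Translate a memtester exit status into a list of problem descriptions."""
    return [msg for bit, msg in EXIT_BITS.items() if status & bit]


def run_memtester(size="1G", loops=1):
    """Run `memtester <size> <loops>`; an empty list means every test passed."""
    result = subprocess.run(["memtester", size, str(loops)])
    return describe_exit(result.returncode)
```

Run it as root so memtester can lock the memory it allocates, and pick a
size that leaves enough free RAM for the services still running.
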

Keep in mind that it's better to use memtest86+ as it can check all RAM.


