Date: Wed, 22 Jan 2020 18:48:20 +0200
From: Christos Chatzaras <chris@cretaforce.gr>
To: FreeBSD Mailing List <freebsd-questions@freebsd.org>
Subject: Re: 12.1 RELEASE General Protection Fault (Trap 9)
Message-ID: <5A315787-F2FA-48BC-81BC-6668C1C08493@cretaforce.gr>
In-Reply-To: <693acc2b-b573-9fba-ab73-91d28f27e8ac@infracaninophile.co.uk>
References: <22046a36-12d3-032a-6325-24e18b1a855b@lateapex.net> <693acc2b-b573-9fba-ab73-91d28f27e8ac@infracaninophile.co.uk>

> On 22 Jan 2020, at 18:36, matthew@freebsd.org <matthew@FreeBSD.org> wrote:
>
> On 22/01/2020 15:12, Jason Van Patten wrote:
>> Since sometime before Christmas (as far as I know), my NAS has started randomly crashing, reloading, and saving cores in /var/crash. It was doing this with 12.0 and now with 12.1. My gut tells me it's hardware related, but I'm not quite sure. The various bits and pieces are:
>
> Given the crashes do not appear to be associated with any particular activity, I think you're on the money with your diagnosis that it is hardware related.
>
> Did you change any of the hardware on this system recently? If you've added more disks or such, then you may have overloaded the PSU. If the PSU can't produce voltages in spec, then you will see random crashes, although I doubt in that case you'd always see 'General Protection Fault'. Unless this is a new machine, or you've changed some of the hardware, this is unlikely to be the diagnosis.
>
> Otherwise, suspect hardware problems. In rough order of expense, least to most:
>
> * Bad heatsink, failed case fan, CPU thermal paste not up to snuff
>   or other cause that may lead to your system overheating
>
> * Bad memory
>
> * Bad CPU
>
> The first of these is relatively cheap and easy to handle: make sure you're getting unimpeded airflow through the chassis -- clean any filters, make sure fans are spinning correctly and that heatsinks have good thermal contact, if necessary by renewing any thermal paste. Monitoring the CPU temperature will help here -- if you see the CPU temperature increasing just before everything goes kaput, that's a fairly solid diagnostic. For an i7, you should be able to use the coretemp(4) kernel module and read off the temperature from the dev.cpu.%d.temperature sysctls.
>
> Memory problems can frequently be diagnosed by use of a memory checker like sysutils/memtest86+ -- if this says you have a problem, then you do have a problem. However, it may not catch every possible memory problem, so it can wrongly give you an 'all clear'. It's pretty accurate in practice though. A more definitive test is to swap out any suspect RAM modules and see if the problem goes away.
>
> The worst case is a bad CPU. memtest86+ will diagnose some CPU faults, but it is less effective on CPU problems. If there is a CPU problem, it will be a pretty subtle one, as the typical symptoms of CPU problems are that the system won't boot and the BIOS makes horrible beeping noises when you try.
>
> Even so, this isn't a definitive list. I've heard tales about trying to diagnose this sort of problem where someone had bit by bit swapped out all of the components of a system except for the case, and the problem still occurred. Turned out the case was slightly bent and that put enough stress on the motherboard to cause some intermittent electrical connectivity.
>
> Cheers,
>
> Matthew

I had similar crashes and it was bad RAM.

I recommend checking the RAM with the userland memtester if downtime is not an option. Keep in mind that memtest86+ is the better choice when you can afford a reboot, as it can check all of the RAM, whereas memtester can only test the memory the running system lets it allocate.
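
If it helps with the temperature monitoring Matthew mentions, here is a rough Python sketch (not something from the thread) that reads the dev.cpu.%d.temperature sysctls and flags hot cores. It assumes coretemp(4) is already loaded and that the sysctl output lines look like "dev.cpu.0.temperature: 45.0C". The function names and the 85 C limit are just placeholders I picked for illustration; check the proper limit for your CPU.

```python
import re
import subprocess

# Matches lines such as "dev.cpu.0.temperature: 45.0C"
# (assumed output format of the coretemp(4) sysctls).
_TEMP_RE = re.compile(r"^dev\.cpu\.(\d+)\.temperature:\s*(-?\d+(?:\.\d+)?)C\s*$")


def parse_core_temps(sysctl_output):
    """Return {core_number: temperature_in_C} parsed from sysctl text."""
    temps = {}
    for line in sysctl_output.splitlines():
        match = _TEMP_RE.match(line.strip())
        if match:
            temps[int(match.group(1))] = float(match.group(2))
    return temps


def hot_cores(temps, limit_c=85.0):
    """Return the sorted core numbers at or above limit_c (placeholder limit)."""
    return sorted(core for core, temp in temps.items() if temp >= limit_c)


def read_core_temps():
    """Run `sysctl dev.cpu` and parse its output (FreeBSD, coretemp loaded)."""
    result = subprocess.run(
        ["sysctl", "dev.cpu"], capture_output=True, text=True, check=True
    )
    return parse_core_temps(result.stdout)
```

Calling read_core_temps() from cron or a small loop and logging the result with a timestamp should show whether the temperature climbs shortly before a crash.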