Date:      Tue, 19 Apr 2016 10:27:06 +0200
From:      "lokadamus@gmx.de" <lokadamus@gmx.de>
To:        shahzaib mushtaq <shahzaib.cb@gmail.com>
Cc:        freebsd-questions@freebsd.org, galtsev@kicp.uchicago.edu
Subject:   Re: FreeBSD Crashes Intermittently !!
Message-ID:  <5715EBDA.10907@gmx.de>
In-Reply-To: <CAD3xhrMTv_tOXTWX7kCim-mptYkXRZ-n_Mx%2BFX80OQkd-WMsPw@mail.gmail.com>
References:  <CAD3xhrMfKO8hVdpzR1xNqV=vwTMedPeTHR7v2=5W6RwC3F4V7A@mail.gmail.com> <56E2E9AC.1040902@gmx.de> <33444.128.135.52.6.1457712900.squirrel@cosmo.uchicago.edu> <56E2F586.9000108@gmx.de> <CAD3xhrM_Q=OjZzbJO1jY5a8Qhqne50ziQjJKSQ3kLO73FnJ0ag@mail.gmail.com> <CAD3xhrP=KYjzOEusoOwnysTqp=ZgWgH1ofrnnmuqaK4Z=7r_pA@mail.gmail.com> <5715DF0F.4090808@gmx.de> <CAD3xhrMTv_tOXTWX7kCim-mptYkXRZ-n_Mx%2BFX80OQkd-WMsPw@mail.gmail.com>

Hi,

I am looking at these lines from the error log:
Hardware event. This is not a software error.
CPU 23 BANK 5
MISC 0 ADDR 805613c60
MCG status:MCIP
STATUS be00000000800400 MCGSTATUS 4
....
Hardware event. This is not a software error.
CPU 22 BANK 5

https://en.wikipedia.org/wiki/Machine-check_exception
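
To show what the STATUS and MCGSTATUS values above contain, here is a small Python sketch that pulls out the architectural flag bits. The bit positions are the ones I know from the Intel SDM description of IA32_MCi_STATUS and IA32_MCG_STATUS; the function and table names are made up for this example, and it does not interpret the model-specific part of the code.

```python
# Sketch only: decode architectural flag bits of a machine-check status word.
# Bit positions follow the Intel SDM layout of IA32_MCi_STATUS and
# IA32_MCG_STATUS; all names here are hypothetical, chosen for illustration.

STATUS_FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register is valid
    58: "ADDRV",  # ADDR register is valid
    57: "PCC",    # processor context may be corrupt
}

MCG_FLAGS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}


def decode_flags(value, table):
    """Return names of the flags in `table` that are set, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if (value >> bit) & 1]


def decode_mci_status(hex_text):
    """Split a hex STATUS string into flags and the two 16-bit error codes."""
    value = int(hex_text, 16)
    return {
        "flags": decode_flags(value, STATUS_FLAGS),
        "mca_error_code": value & 0xFFFF,               # bits 15:0
        "model_specific_code": (value >> 16) & 0xFFFF,  # bits 31:16
    }


status = decode_mci_status("be00000000800400")  # STATUS from the log above
mcg = decode_flags(int("4", 16), MCG_FLAGS)     # MCGSTATUS from the log above

print(status["flags"])
print(hex(status["mca_error_code"]), hex(status["model_specific_code"]))
print(mcg)
```

For the values in the log, UC and PCC are both set, so the error was not corrected and the processor state cannot be trusted, which fits a crash. MCGSTATUS 4 is just the MCIP bit, matching the "MCG status:MCIP" line.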

It looks like a hardware problem with the second CPU.
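
Why the second CPU: a rough sketch of the reasoning, assuming the logical CPUs are numbered contiguously per package (0-11 on the first socket, 12-23 on the second). That numbering is an assumption, not something I checked on this machine; kern.sched.topology_spec would show the real layout. The function name is made up for illustration.

```python
# Sketch only: map a logical CPU number to a socket, ASSUMING contiguous
# numbering per package. 2 packages and 24 threads come from the thread;
# package_of is a hypothetical helper.

def package_of(cpu_id, packages=2, logical_cpus=24):
    """Return the 0-based package index for a logical CPU number."""
    per_package = logical_cpus // packages
    return cpu_id // per_package


for cpu in (22, 23):  # the CPUs named in the error lines
    print("CPU", cpu, "-> package", package_of(cpu))
```

Under that assumption CPU 22 and CPU 23 both land on package 1, the second socket, and with HTT enabled they could well be the two threads of one core.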
Things that could be done:
- Can you read the CPU temperature from the BIOS?
- Disable HTT and see whether the error comes back
- Remove the second CPU and see whether ...
- Install microcode updates and hope they fix it

Intel offers microcode updates for many CPUs:
https://downloadcenter.intel.com/download/25512/Linux-Processor-Microcode-Data-File?v=t

Can you test a cpu in another system?
https://www.freebsd.org/cgi/ports.cgi?query=cpuburn&stype=all


Regards

On 04/19/16 09:35, shahzaib mushtaq wrote:
> Hi, sorry for the mistake, cpus are :
> 
> 2 x Intel(R) Xeon(R) CPU  L5640 @ 2.27GHz5640 (12 cores, 24 threads)
> 
> On Tue, Apr 19, 2016 at 12:32 PM, lokadamus@gmx.de <lokadamus@gmx.de> wrote:
> 
>> On 04/18/16 16:28, shahzaib mushtaq wrote:
>>> Hi again, got back after a long time. So yes, we've moved to new Dell R510
>>> hardware now. Here are the specs:
>>>
>>> DELL R510
>>> 2 x L5520
>>> 64GB RAM
>>> 12x3TB RAID striping+mirroring (HBA LSI-9211-fw version 19.00)
>>> FreeBSD cw009.tunefiles.com 10.2-RELEASE-p14 FreeBSD 10.2-RELEASE-p14 #0:
>>> Wed Mar 16 20:46:12 UTC 2016
>>> root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
>>> amd64
>>>
>>> After 9 days of uptime, the server crashed again with the following error
>>> in the crash log:
>>>
>>> http://pastebin.com/baShWuMP
>>>
>>> I am so much depressed now, there's much pressure on me from my company.
>>> Please help us resolve this crash issue. :(
>> Which CPU Model is installed? Is it one or more?
>>
>> There were microcode updates for some models.
>>
>> Greeting
>>
> 



