Date:      Wed, 24 Feb 2016 12:14:05 -0800
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-hardware@freebsd.org
Cc:        Ultima <ultima1252@gmail.com>
Subject:   Re: MCA error, possible causes?
Message-ID:  <1599604.5jmidy9vDx@ralph.baldwin.cx>
In-Reply-To: <CANJ8om7C2UreYEkm-=XxL222Gqmc9i5kQH2p=oc8ntgbkehn5A@mail.gmail.com>
References:  <CANJ8om7C2UreYEkm-=XxL222Gqmc9i5kQH2p=oc8ntgbkehn5A@mail.gmail.com>

On Friday, February 12, 2016 08:11:37 PM Ultima wrote:
>  Recently installed some cpus and received two MCA errors. Using mcelog, I
> found that the version in ports is about 5 years out of date and didn't
> support my cpu. Decided to update it to the newest version (Will post on
> bugzilla shortly) to pull some more info. Going to post orig and decoded
> mcelog.
> 
> 
> Raw:
> MCA: Bank 20, Status 0xc800084000310e0f
> MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
> MCA: Vendor "GenuineIntel", ID 0x306f1, APIC ID 0
> MCA: CPU 0 COR (33) OVER BUSLG ??? ERR Other
> MCA: Misc 0x1df87b000d9eff
> MCA: Bank 5, Status 0xc800008000310e0f
> MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
> MCA: Vendor "GenuineIntel", ID 0x306f1, APIC ID 42
> MCA: CPU 34 COR (2) OVER BUSLG ??? ERR Other
> MCA: Misc 0xdf87b008d9eff
> 
> mcelog v131:
> Hardware event. This is not a software error.
> CPU 0 BANK 20
> MISC 1df87b000d9eff
> MCG status:
> QPI: Rx detected CRC error - successful LLR wihout Phy re-init
> STATUS c800084000310e0f MCGSTATUS 0
> MCGCAP 7000c16 APICID 0 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 63
> Hardware event. This is not a software error.
> CPU 34 BANK 5
> MISC df87b008d9eff
> MCG status:
> QPI: Rx detected CRC error - successful LLR wihout Phy re-init
> STATUS c800008000310e0f MCGSTATUS 0
> MCGCAP 7000c16 APICID 2a SOCKETID 0
> CPUID Vendor Intel Family 6 Model 63
> 
>  After receiving this error, the system was in a frozen state. Any ideas
> what may cause this?

Well, hardware causes it.  QPI is the interconnect between your CPU
sockets (it also carries accesses to RAM attached to the other socket).
"Rx detected CRC error" implies that a CPU detected a corrupted message
on that link, but when it requested a resend the resent message was
ok.  Normally corrected errors shouldn't hang your machine, but perhaps
your machine had another hardware error after this that broke it too
badly to report and/or log the subsequent error.
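
For what it's worth, the "COR (33)" and "OVER" parts of the raw output
can be read straight out of the two Status values quoted above.  Below
is a minimal sketch in Python, assuming the architectural
IA32_MCi_STATUS bit layout from Intel's SDM (the corrected-count field
is only defined when the CPU advertises it in MCG_CAP).  The function
and key names are made up for illustration; this is not how the kernel
or mcelog does it.

```python
def decode_mci_status(status: int) -> dict:
    """Split a 64-bit IA32_MCi_STATUS value into its architectural fields.

    Assumed layout (Intel SDM): bit 63 VAL, 62 OVER, 61 UC, 60 EN,
    59 MISCV, 58 ADDRV, 57 PCC, bits 52:38 corrected error count,
    bits 31:16 model-specific code, bits 15:0 MCA error code.
    """
    mca_code = status & 0xFFFF
    return {
        "valid": bool((status >> 63) & 1),
        "overflow": bool((status >> 62) & 1),
        "uncorrected": bool((status >> 61) & 1),
        "enabled": bool((status >> 60) & 1),
        "misc_valid": bool((status >> 59) & 1),
        "addr_valid": bool((status >> 58) & 1),
        "context_corrupt": bool((status >> 57) & 1),
        "corrected_count": (status >> 38) & 0x7FFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": mca_code,
        # Bus/interconnect codes have the form 0000 1PPT RRRR IILL.
        "is_bus_interconnect": (mca_code & 0xF800) == 0x0800,
    }


if __name__ == "__main__":
    # The two Status values from the report (bank 20 and bank 5).
    for raw in (0xC800084000310E0F, 0xC800008000310E0F):
        fields = decode_mci_status(raw)
        print(hex(raw))
        for name, value in fields.items():
            print(f"  {name}: {value}")
```

Both values decode as corrected (UC clear) bus/interconnect errors
with the overflow bit set, which is why I read them as recovered link
errors rather than the direct cause of the hang.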

-- 
John Baldwin


