From: John Baldwin
To: freebsd-hardware@freebsd.org
Cc: Ultima
Subject: Re: MCA error, possible causes?
Date: Wed, 24 Feb 2016 12:14:05 -0800
Message-ID: <1599604.5jmidy9vDx@ralph.baldwin.cx>

On Friday, February 12, 2016 08:11:37 PM Ultima wrote:
> Recently installed some CPUs and received two MCA errors. Using mcelog, I
> found that the version in ports is about 5 years out of date and didn't
> support my CPU. Decided to update it to the newest version (will post on
> Bugzilla shortly) to pull some more info. Going to post the original and
> decoded mcelog output.
>
> Raw:
> MCA: Bank 20, Status 0xc800084000310e0f
> MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
> MCA: Vendor "GenuineIntel", ID 0x306f1, APIC ID 0
> MCA: CPU 0 COR (33) OVER BUSLG ??? ERR Other
> MCA: Misc 0x1df87b000d9eff
> MCA: Bank 5, Status 0xc800008000310e0f
> MCA: Global Cap 0x0000000007000c16, Status 0x0000000000000000
> MCA: Vendor "GenuineIntel", ID 0x306f1, APIC ID 42
> MCA: CPU 34 COR (2) OVER BUSLG ??? ERR Other
> MCA: Misc 0xdf87b008d9eff
>
> mcelog v131:
> Hardware event. This is not a software error.
> CPU 0 BANK 20
> MISC 1df87b000d9eff
> MCG status:
> QPI: Rx detected CRC error - successful LLR wihout Phy re-init
> STATUS c800084000310e0f MCGSTATUS 0
> MCGCAP 7000c16 APICID 0 SOCKETID 0
> CPUID Vendor Intel Family 6 Model 63
> Hardware event. This is not a software error.
> CPU 34 BANK 5
> MISC df87b008d9eff
> MCG status:
> QPI: Rx detected CRC error - successful LLR wihout Phy re-init
> STATUS c800008000310e0f MCGSTATUS 0
> MCGCAP 7000c16 APICID 2a SOCKETID 0
> CPUID Vendor Intel Family 6 Model 63
>
> After receiving this error, the system was in a frozen state. Any ideas
> what may cause this?

Well, hardware causes it. QPI (QuickPath Interconnect) is the point-to-point
link between the CPU sockets in your machine; accesses to memory attached to
the other socket travel over it as well. "Rx detected CRC error" means that a
CPU received a corrupted message on that link. "Successful LLR" (link-level
retry) means that when it requested a resend, the resent message arrived
intact, and "without Phy re-init" means the physical layer did not have to be
retrained to get there. In other words, the error was corrected.

Corrected errors normally shouldn't hang a machine, but perhaps another
hardware error followed this one and left the machine too broken to report
or log it.

-- 
John Baldwin