Date:      Fri, 31 Jan 2014 11:48:42 -0600
From:      Tim Daneliuk <tundra@tundraware.com>
To:        John Baldwin <jhb@freebsd.org>, freebsd-stable@freebsd.org
Cc:        FreeBSD Hardware Mailing List <freebsd-hardware@freebsd.org>
Subject:   Re: Need Help With MCA  Code
Message-ID:  <52EBE1FA.2040603@tundraware.com>
In-Reply-To: <201401311222.12136.jhb@freebsd.org>
References:  <52E73717.3000503@tundraware.com> <52E99381.5050803@tundraware.com> <201401311222.12136.jhb@freebsd.org>

On 01/31/2014 11:22 AM, John Baldwin wrote:
> On Wednesday, January 29, 2014 6:49:21 pm Tim Daneliuk wrote:
>> Resending in hopes that people on one of the other lists will have some insight here:
>>
>> On 01/27/2014 10:50 PM, Tim Daneliuk wrote:
>>> I am running 9.2 stable i386 r261207.  As noted earlier:
>>>
>>>> I just replaced mobo/CPU on FBSD server (Gigabyte Z-87-D3HP with
>>>> an Intel i3-4130).  I am not overclocking ...  but I continue to see this sort of thing:
>>>
>>>> MCA: CPU 0 COR (1) internal parity error
>>>
>>> Dmesg shows:
>>>
>>>> MCA: Vendor "GenuineIntel", ID 0x306c3, APIC ID 0
>>>> MCA: CPU 0 COR (1) internal parity error
>>>> MCA: Bank 0, Status 0x90000040000f0005
>>>> MCA: Global Cap 0x0000000000000c07, Status 0x0000000000000000
>>>
>>> I've swapped CPUs (i5). I've fiddled with an endless supply of
>>> mobo settings. I've switched power supplies.  I've moved mem
>>> sticks around ....   No joy.
>>>
>>> So, I dug through the sources and found this:
>>>
>>>
>>>
>>> mca_log(const struct mca_record *rec)
>>> {
>>>           uint16_t mca_error;
>>>
>>>           printf("MCA: Bank %d, Status 0x%016llx\n", rec->mr_bank,
>>>               (long long)rec->mr_status);
>>>           printf("MCA: Global Cap 0x%016llx, Status 0x%016llx\n",
>>>               (long long)rec->mr_mcg_cap, (long long)rec->mr_mcg_status);
>>>           printf("MCA: Vendor \"%s\", ID 0x%x, APIC ID %d\n", cpu_vendor,
>>>               rec->mr_cpu_id, rec->mr_apic_id);
>>>           printf("MCA: CPU %d ", rec->mr_cpu);
>>>           if (rec->mr_status & MC_STATUS_UC)
>>>                   printf("UNCOR ");
>>>           else {
>>>                   printf("COR ");
>>>                   if (rec->mr_mcg_cap & MCG_CAP_CMCI_P)
>>>                           printf("(%lld) ", ((long long)rec->mr_status &
>>>                               MC_STATUS_COR_COUNT) >> 38);
>>>           }
>>>
>>>
>>> It looks like the trailing else clause is kicking out the error but I am
>>> unclear what the error means, beyond the fact that it appears to be a parity
>>> error somewhere within the CPU's internal memory (cache?).  Is this error
>>> getting corrected?  Is this benign?  Should I get a different mobo?
>>>
>>> Um .... Haaaaalp :)
>>
>>
>> I have now tried different motherboards, CPUs, memory, and power supplies and
>> this error is still showing up now and then.
>>
>> This points strongly to either FreeBSD bogus reporting, or these errors being
>> benign.  It's hard to believe that the exact same error might occur with
>> completely different hardware ... unless it's being caused by the case.
>
> Are they all the same model CPU?  Since it is a corrected error you can
> probably ignore it, but it is not bogus reporting.  FreeBSD only reports
> these errors because they show up in registers on your CPU.
>

It's looking like this is an artifact of running 9.2-STABLE i386 on that hardware.
I just installed 10-STABLE x64 and have been beating the hardware to death, and I
have yet to see a single MCA error.

It *is* possible the 9.2 install is boogered up (I went to grad school to learn how
to say that), so I am pursuing a full rebuild of the server.  While painful, this
will also finally move this machine to x64 which is long overdue.



-- 
----------------------------------------------------------------------------
Tim Daneliuk     tundra@tundraware.com
PGP Key:         http://www.tundraware.com/PGP/



