Date:      Wed, 29 Jan 2014 17:49:21 -0600
From:      Tim Daneliuk <tundra@tundraware.com>
To:        freebsd-stable@freebsd.org, FreeBSD Hardware Mailing List <freebsd-hardware@freebsd.org>
Subject:   Need Help With MCA  Code
Message-ID:  <52E99381.5050803@tundraware.com>
In-Reply-To: <52E73717.3000503@tundraware.com>
References:  <52E73717.3000503@tundraware.com>

Resending in hopes that people on one of the other lists will have some insight here:

On 01/27/2014 10:50 PM, Tim Daneliuk wrote:
> I am running 9.2 stable i386 r261207.  As noted earlier:
>
>> I just replaced mobo/CPU on FBSD server (Gigabyte Z87-D3HP with
>> an Intel i3-4130).  I am not overclocking ...  but I continue to see this sort of thing:
>
>> MCA: CPU 0 COR (1) internal parity error
>
> Dmesg shows:
>
>> MCA: Vendor "GenuineIntel", ID 0x306c3, APIC ID 0
>> MCA: CPU 0 COR (1) internal parity error
>> MCA: Bank 0, Status 0x90000040000f0005
>> MCA: Global Cap 0x0000000000000c07, Status 0x0000000000000000
>
> I've swapped CPUs (tried an i5). I've fiddled with an endless supply of
> mobo settings. I've switched power supplies. I've moved mem
> sticks around... No joy.
>
> So, I dug through the sources and found this:
>
>
>
> mca_log(const struct mca_record *rec)
> {
>          uint16_t mca_error;
>
>          printf("MCA: Bank %d, Status 0x%016llx\n", rec->mr_bank,
>              (long long)rec->mr_status);
>          printf("MCA: Global Cap 0x%016llx, Status 0x%016llx\n",
>              (long long)rec->mr_mcg_cap, (long long)rec->mr_mcg_status);
>          printf("MCA: Vendor \"%s\", ID 0x%x, APIC ID %d\n", cpu_vendor,
>              rec->mr_cpu_id, rec->mr_apic_id);
>          printf("MCA: CPU %d ", rec->mr_cpu);
>          if (rec->mr_status & MC_STATUS_UC)
>                  printf("UNCOR ");
>          else {
>                  printf("COR ");
>                  if (rec->mr_mcg_cap & MCG_CAP_CMCI_P)
>                          printf("(%lld) ", ((long long)rec->mr_status &
>                              MC_STATUS_COR_COUNT) >> 38);
>          }
>
>
> It looks like the trailing else clause is kicking out the error but I am
> unclear what the error means, beyond the fact that it appears to be a parity
> error somewhere within the CPU's internal memory (cache?).  Is this error
> getting corrected?  Is this benign?  Should I get a different mobo?
>
> Um .... Haaaaalp :)


I have now tried different motherboards, CPUs, memory, and power supplies and
this error is still showing up now and then.

This points strongly to either bogus reporting by FreeBSD or these errors being
benign.  It's hard to believe that the exact same error would occur on
completely different hardware... unless it's being caused by the case.



-- 
----------------------------------------------------------------------------
Tim Daneliuk     tundra@tundraware.com
PGP Key:         http://www.tundraware.com/PGP/



