Date:      Fri, 31 Jan 2014 12:22:12 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-stable@freebsd.org
Cc:        Tim Daneliuk <tundra@tundraware.com>, FreeBSD Hardware Mailing List <freebsd-hardware@freebsd.org>
Subject:   Re: Need Help With MCA  Code
Message-ID:  <201401311222.12136.jhb@freebsd.org>
In-Reply-To: <52E99381.5050803@tundraware.com>
References:  <52E73717.3000503@tundraware.com> <52E99381.5050803@tundraware.com>

On Wednesday, January 29, 2014 6:49:21 pm Tim Daneliuk wrote:
> Resending in hopes that people on one of the other lists will have some insight here:
> 
> On 01/27/2014 10:50 PM, Tim Daneliuk wrote:
> > I am running 9.2 stable i386 r261207.  As noted earlier:
> >
> >> I just replaced mobo/CPU on FBSD server (Gigabyte Z-87-D3HP with
> >> an Intel i3-4130).  I am not overclocking ...  but I continue to see this sort of thing:
> >
> >> MCA: CPU 0 COR (1) internal parity error
> >
> > Dmesg shows:
> >
> >> MCA: Vendor "GenuineIntel", ID 0x306c3, APIC ID 0
> >> MCA: CPU 0 COR (1) internal parity error
> >> MCA: Bank 0, Status 0x90000040000f0005
> >> MCA: Global Cap 0x0000000000000c07, Status 0x0000000000000000
> >
> > I've swapped CPUs (i5). I've fiddled with an endless supply of
> > mobo settings. I've switched power supplies.  I've moved mem
> > sticks around ....   No joy.
> >
> > So, I dug through the sources and found this:
> >
> >
> >
> > mca_log(const struct mca_record *rec)
> > {
> >          uint16_t mca_error;
> >
> >          printf("MCA: Bank %d, Status 0x%016llx\n", rec->mr_bank,
> >              (long long)rec->mr_status);
> >          printf("MCA: Global Cap 0x%016llx, Status 0x%016llx\n",
> >              (long long)rec->mr_mcg_cap, (long long)rec->mr_mcg_status);
> >          printf("MCA: Vendor \"%s\", ID 0x%x, APIC ID %d\n", cpu_vendor,
> >              rec->mr_cpu_id, rec->mr_apic_id);
> >          printf("MCA: CPU %d ", rec->mr_cpu);
> >          if (rec->mr_status & MC_STATUS_UC)
> >                  printf("UNCOR ");
> >          else {
> >                  printf("COR ");
> >                  if (rec->mr_mcg_cap & MCG_CAP_CMCI_P)
> >                          printf("(%lld) ", ((long long)rec->mr_status &
> >                              MC_STATUS_COR_COUNT) >> 38);
> >          }
> >
> >
> > It looks like the trailing else clause is kicking out the error but I am
> > unclear what the error means, beyond the fact that it appears to be a parity
> > error somewhere within the CPU's internal memory (cache?).  Is this error
> > getting corrected?  Is this benign?  Should I get a different mobo?
> >
> > Um .... Haaaaalp :)
> 
> 
> I have now tried different motherboards, CPUs, memory, and power supplies and
> this error is still showing up now and then.
> 
> This points strongly to either FreeBSD bogus reporting, or these errors being
> benign.  It's hard to believe that the exact same error might occur with
> completely different hardware ... unless it's being caused by the case.

Are they all the same model CPU?  Since it is a corrected error you can
probably ignore it, but it is not bogus reporting.  FreeBSD only reports
these errors because they show up in registers on your CPU.
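
For anyone who wants to check the decoding by hand, here is a minimal
sketch of how the fields in the quoted status value can be pulled apart.
The macro names mirror the ones in the mca_log() excerpt above, with bit
positions taken from the IA32_MCi_STATUS layout in the Intel SDM. The
helper function names are hypothetical and are not part of the FreeBSD
sources.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Bit positions follow the IA32_MCi_STATUS / IA32_MCG_CAP layout. */
#define MC_STATUS_UC        0x2000000000000000ULL /* bit 61: uncorrected */
#define MC_STATUS_COR_COUNT 0x001fffc000000000ULL /* bits 38-52: corrected count */
#define MC_STATUS_MCA_ERROR 0x000000000000ffffULL /* bits 0-15: MCA error code */
#define MCG_CAP_CMCI_P      0x0000000000000400ULL /* bit 10: CMCI supported */

/* Hypothetical helper: nonzero if the hardware could not correct the error. */
int
mca_is_uncorrected(uint64_t status)
{
	return ((status & MC_STATUS_UC) != 0);
}

/* Hypothetical helper: corrected-error count (meaningful when CMCI_P is set). */
unsigned
mca_cor_count(uint64_t status)
{
	return ((unsigned)((status & MC_STATUS_COR_COUNT) >> 38));
}

/* Hypothetical helper: architectural MCA error code from the low 16 bits. */
unsigned
mca_error_code(uint64_t status)
{
	return ((unsigned)(status & MC_STATUS_MCA_ERROR));
}

/* Print a one-line summary in roughly the same shape as the dmesg output. */
void
mca_print_summary(uint64_t status, uint64_t mcg_cap)
{
	printf("status 0x%016" PRIx64 ": ", status);
	if (mca_is_uncorrected(status))
		printf("UNCOR ");
	else {
		printf("COR ");
		if (mcg_cap & MCG_CAP_CMCI_P)
			printf("(%u) ", mca_cor_count(status));
	}
	printf("error code 0x%04x\n", mca_error_code(status));
}
```

Calling mca_print_summary(0x90000040000f0005ULL, 0xc07ULL) with the values
from the dmesg output takes the corrected branch: the UC bit is clear, the
corrected count is 1, and the error code is 0x0005, which is the code that
mca_log() prints as "internal parity error".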

-- 
John Baldwin


