Date:      Thu, 30 Sep 2010 13:25:15 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-stable@freebsd.org
Cc:        Adam Vande More <amvandemore@gmail.com>
Subject:   Re: MCA messages in dmesg
Message-ID:  <201009301325.15113.jhb@freebsd.org>
In-Reply-To: <AANLkTinyBrF65LbjPfcBdEcHn1PE-=sHWaJhwnHibVvt@mail.gmail.com>
References:  <AANLkTine8Prmd-TOrHixJijHiR%2BNEMzwSKdcoTUsBJ_B@mail.gmail.com> <201009300940.43136.jhb@freebsd.org> <AANLkTinyBrF65LbjPfcBdEcHn1PE-=sHWaJhwnHibVvt@mail.gmail.com>

On Thursday, September 30, 2010 12:33:24 pm Adam Vande More wrote:
> On Thu, Sep 30, 2010 at 8:40 AM, John Baldwin <jhb@freebsd.org> wrote:
> 
> > On Thursday, September 30, 2010 2:49:24 am Adam Vande More wrote:
> > > For a while now, my home server has been acting up.  Actually it had a bad
> > > set of RAM long ago; I replaced it and it worked fine.  It's been weird
> > > again now, and I've found this in dmesg:
> > >
> > > MCA: Bank 0, Status 0xf200000000000800
> > > MCA: Global Cap 0x0000000000000806, Status 0x0000000000000000
> > > MCA: Vendor "GenuineIntel", ID 0x6fb, APIC ID 2
> > > MCA: CPU 2 UNCOR PCC OVER BUSL0 Source ERR Memory
> > > MCA: Bank 0, Status 0xf200000000000800
> > > MCA: Global Cap 0x0000000000000806, Status 0x0000000000000000
> > > MCA: Vendor "GenuineIntel", ID 0x6fb, APIC ID 3
> > > MCA: CPU 3 UNCOR PCC OVER BUSL0 Source ERR Memory
> >
> > Are you getting a panic when this happens?
> >
> 
> Its symptoms vary, but yes, I think so.  The box is headless, so I depend on
> logs after boot to see what happens.  Sometimes the box panics and powers
> off with no warning, and other times it just seems to hit a stall state
> where everything becomes unresponsive and I have to manually power off.

Ok, it is a memory error of some sort, but mcelog claims it is a transaction
timeout rather than an ECC error, per se:

HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 2 BANK 0 
MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Local-CPU-originated-request Generic Memory-access Request-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
STATUS f200000000000800 MCGSTATUS 0
MCGCAP 806 APICID 2 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 15
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 3 BANK 0 
MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Local-CPU-originated-request Generic Memory-access Request-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
STATUS f200000000000800 MCGSTATUS 0
MCGCAP 806 APICID 3 SOCKETID 0 
CPUID Vendor Intel Family 6 Model 15
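For anyone who wants to check the raw dmesg values against the mcelog text, here is a minimal sketch in Python that decodes them by hand. It assumes the IA32_MCi_STATUS and IA32_MCG_CAP bit layouts from the machine-check chapter of Intel's Software Developer's Manual; the function names are hypothetical and are not taken from FreeBSD's MCA code or from mcelog. For Status 0xf200000000000800 it yields the VAL, OVER, UC, EN and PCC flags (the "UNCOR PCC OVER" in dmesg) and splits error code 0x0800 into the bus/interconnect sub-fields that dmesg summarizes as "BUSL0 Source ERR Memory". For Global Cap 0x806 it reports six banks.

```python
# Sketch: decode the architectural fields of an Intel IA32_MCi_STATUS value
# and the bank count in IA32_MCG_CAP, assuming the bit layout from the Intel
# SDM machine-check chapter. Model-specific bits (31:16 and most of 56:32)
# are deliberately not interpreted. Function names are hypothetical.

STATUS_FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # overflow: an earlier error was still logged
    61: "UC",     # uncorrected error (dmesg prints "UNCOR")
    60: "EN",     # error reporting was enabled for this error
    59: "MISCV",  # IA32_MCi_MISC holds additional information
    58: "ADDRV",  # IA32_MCi_ADDR holds the error address
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(status: int) -> dict:
    """Return the set flag names and MCA error-code fields of an MCi_STATUS value."""
    flags = [name for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    code = status & 0xFFFF  # bits 15:0 hold the architectural MCA error code
    result = {"flags": flags, "mca_code": code}
    # Bus and interconnect errors use the compound encoding 0000 1PPT RRRR IILL,
    # i.e. bits 15:12 clear and bit 11 set.
    if (code & 0xF800) == 0x0800:
        result["bus"] = {
            "LL": code & 0x3,           # level; 0 shows up as "L0" / "Level-0"
            "II": (code >> 2) & 0x3,    # 0 = memory access
            "RRRR": (code >> 4) & 0xF,  # 0 = generic error ("ERR")
            "T": (code >> 8) & 0x1,     # timeout bit, reported raw here
            "PP": (code >> 9) & 0x3,    # 0 = local processor originated ("Source")
        }
    return result


def mcg_cap_bank_count(cap: int) -> int:
    """Bits 7:0 of IA32_MCG_CAP give the number of error-reporting banks."""
    return cap & 0xFF


if __name__ == "__main__":
    decoded = decode_mci_status(0xF200000000000800)
    print(decoded["flags"])           # ['VAL', 'OVER', 'UC', 'EN', 'PCC']
    print(decoded["bus"])
    print(mcg_cap_bank_count(0x806))  # 6
```

Only the architectural fields are decoded. The model-specific bits that tools like mcelog use for CPU-family-specific detail are left alone, since their meaning varies by processor model.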

I've no idea which specific piece of hardware is busted (memory, motherboard,
or CPU), but I suspect something is broken.

-- 
John Baldwin


