Date:      Mon, 16 May 2011 18:23:19 +0200
From:      John Hay <jhay@meraka.org.za>
To:        alc@freebsd.org
Cc:        freebsd-stable@freebsd.org
Subject:   Re: MCA: CPU 0 UNCOR PCC DTLB L1 error
Message-ID:  <20110516162319.GA58581@zibbi.meraka.csir.co.za>
In-Reply-To: <BANLkTik79gjQKsdrz_8mQdLc3e9KGiGzzQ@mail.gmail.com>
References:  <20110510125220.GA88338@zibbi.meraka.csir.co.za> <BANLkTik79gjQKsdrz_8mQdLc3e9KGiGzzQ@mail.gmail.com>

On Wed, May 11, 2011 at 05:26:50PM -0500, Alan Cox wrote:
> On Tue, May 10, 2011 at 7:52 AM, John Hay <jhay@meraka.org.za> wrote:
> 
> > Hi,
> >
> > I have seen this panic a few times on a Gigabyte E350N-USB3 running
> > 8-STABLE. I have only seen it while in X, but then the machine is always
> > in X. At first I just got these hangs, so I bought a PCI-express RS232
> > card and could see these at last. For some reason it does not go past
> > this, so I have not been able to get a dump yet.
> >
> > Does anybody have an idea of why this is or how to debug it further? I
> > searched the archives and found something similar about a year ago, but
> > it looks like it was solved with a fix that got committed.
> >
> > http://www.freebsd.org/cgi/query-pr.cgi?pr=140338
> >
> > I have now disabled mca in loader.conf with 'hw.mca.enabled="0"' and I have
> > not seen that panic again. I do occasionally see a panic in devfs_open(),
> > but I guess that should be handled in another thread.
> >
> > The kernel is basically a GENERIC kernel with puc uncommented and the
> > following in loader.conf
> >
> > vm.kmem_size="12G"
> > hw.mca.enabled="0"
> > zfs_load="YES"
> > ahci_load="YES"
> > xhci_load="YES"
> > amdtemp_load="YES"
> > ng_ubt_load="YES"
> > uplcom_load="YES"
> >
> > Here is the panic message and after that dmesg.
> >
> > John
> > --
> > John Hay -- jhay@meraka.csir.co.za / jhay@FreeBSD.org
> >
> > ####################################################
> > MCA: Bank 0, Status 0xb600000000010015
> > MCA: Global Cap 0x0000000000000106, Status 0x0000000000000004
> > MCA: Vendor "AuthenticAMD", ID 0x500f10, APIC ID 0
> > MCA: CPU 0 UNCOR PCC DTLB L1 error
> > MCA: Address 0x8016c4000
> >
> >
> > Fatal trap 28: machine check trap while in user mode
> > cpuid = 0; apic id = 00
> > instruction pointer     = 0x43:0x80156af85
> > stack pointer           = 0x3b:0x7fffffffcb18
> > frame pointer           = 0x3b:0x80fe87800
> > code segment            = base 0x0, limit 0xfffff, type 0x1b
> >                        = DPL 3, pres 1, long 1, def32 0, gran 1
> > processor eflags        = interrupt enabled, IOPL = 0
> > current process         = 2484 (initial thread)
> > trap number             = 28
> > panic: machine check trap
> > cpuid = 0
> > KDB: stack backtrace:
> > #0 0xffffffff80608d5e at kdb_backtrace+0x5e
> > #1 0xffffffff805d6707 at panic+0x187
> > #2 0xffffffff808bf4c0 at trap_fatal+0x290
> > #3 0xffffffff808bfaa9 at trap+0x109
> > #4 0xffffffff808a7d94 at calltrap+0x8
> > ####################################################
> >
> >
> Please try the following patch:
> 
> Index: x86/x86/mca.c
> ===================================================================
> --- x86/x86/mca.c       (revision 219060)
> +++ x86/x86/mca.c       (working copy)
> @@ -665,7 +665,8 @@ mca_setup(uint64_t mcg_cap)
>          * for Erratum 383.
>          */
>         if (cpu_vendor_id == CPU_VENDOR_AMD &&
> -           CPUID_TO_FAMILY(cpu_id) == 0x10 && amd10h_L1TP)
> +           (CPUID_TO_FAMILY(cpu_id) == 0x10 ||
> +           CPUID_TO_FAMILY(cpu_id) == 0x14) && amd10h_L1TP)
>                 workaround_erratum383 = 1;
> 
>         mtx_init(&mca_lock, "mca", NULL, MTX_SPIN);
> Index: i386/i386/pmap.c
> ===================================================================
> --- i386/i386/pmap.c    (revision 219060)
> +++ i386/i386/pmap.c    (working copy)
> @@ -758,7 +758,8 @@ pmap_init(void)
>          * machine monitor.
>          */
>         if (vm_guest == VM_GUEST_VM && cpu_vendor_id == CPU_VENDOR_AMD &&
> -           CPUID_TO_FAMILY(cpu_id) == 0x10)
> +           (CPUID_TO_FAMILY(cpu_id) == 0x10 ||
> +           CPUID_TO_FAMILY(cpu_id) == 0x14))
>                 workaround_erratum383 = 1;
> 
>         /*
> Index: amd64/amd64/pmap.c
> ===================================================================
> --- amd64/amd64/pmap.c  (revision 219060)
> +++ amd64/amd64/pmap.c  (working copy)
> @@ -727,7 +727,8 @@ pmap_init(void)
>          * machine monitor.
>          */
>         if (vm_guest == VM_GUEST_VM && cpu_vendor_id == CPU_VENDOR_AMD &&
> -           CPUID_TO_FAMILY(cpu_id) == 0x10)
> +           (CPUID_TO_FAMILY(cpu_id) == 0x10 ||
> +           CPUID_TO_FAMILY(cpu_id) == 0x14))
>                 workaround_erratum383 = 1;
> 
>         /*
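
For reference, the patch adds family 0x14 because that is what the ID
0x500f10 in the MCA output appears to decode to. A minimal sketch of that
arithmetic, assuming the usual x86 rule that the extended family field
(bits 20-27) is added to the base family field (bits 8-11) when the base
family is 0xf. The name cpuid_family() is made up for illustration; it is
not the kernel's CPUID_TO_FAMILY macro.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 * Assumption: standard CPUID signature layout, with the base family in
 * bits 8-11 and the extended family in bits 20-27.
 */
static unsigned int
cpuid_family(uint32_t id)
{
	unsigned int base = (id >> 8) & 0xf;	/* base family */
	unsigned int ext = (id >> 20) & 0xff;	/* extended family */

	/* The extended field only counts when the base family is 0xf. */
	if (base == 0xf)
		return (base + ext);
	return (base);
}
```

With id = 0x500f10 the base family is 0xf and the extended family is 0x05,
so the result is 0x14, the value the new comparison checks for.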

I have applied the patch, but got another panic today. I still do not get
a prompt or dump. :-( It just gets stuck right after #4. If there is anything
more that I can try, just ask.

#####################################################################
MCA: Bank 0, Status 0xb600000000010015
MCA: Global Cap 0x0000000000000106, Status 0x0000000000000004
MCA: Vendor "AuthenticAMD", ID 0x500f10, APIC ID 0
MCA: CPU 0 UNCOR PCC DTLB L1 error
MCA: Address 0x808ace000


Fatal trap 28: machine check trap while in user mode
cpuid = 1; apic id = 01
instruction pointer	= 0x43:0x80af206d5
stack pointer	        = 0x3b:0x7fffffffb8e8
frame pointer	        = 0x3b:0x809b92450
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 3, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, IOPL = 0
current process		= 22228 (initial thread)
trap number		= 28
panic: machine check trap
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff80608f6e at kdb_backtrace+0x5e
#1 0xffffffff805d6917 at panic+0x187
#2 0xffffffff808bf7c0 at trap_fatal+0x290
#3 0xffffffff808bfda9 at trap+0x109
#4 0xffffffff808a8084 at calltrap+0x8
#####################################################################
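
In case it helps anybody reading the status word by hand, here is a rough
sketch of how 0xb600000000010015 breaks down. It assumes the architectural
MCi_STATUS layout (flag bits 57-63, MCA error code in bits 0-15) and the
TLB error code pattern 0000 0000 0001 TTLL. The function names are made up
for this mail and are not the ones in mca.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Architectural MCi_STATUS flag bits (assumed layout, see above). */
#define MC_STATUS_VAL	(1ULL << 63)	/* register contents valid */
#define MC_STATUS_OVER	(1ULL << 62)	/* an earlier error was overwritten */
#define MC_STATUS_UC	(1ULL << 61)	/* uncorrected error */
#define MC_STATUS_EN	(1ULL << 60)	/* error reporting enabled */
#define MC_STATUS_MISCV	(1ULL << 59)	/* MISC register valid */
#define MC_STATUS_ADDRV	(1ULL << 58)	/* ADDR register valid */
#define MC_STATUS_PCC	(1ULL << 57)	/* processor context corrupt */

/* Non-zero if the low 16 bits match the TLB error pattern. */
static int
mca_is_tlb(uint64_t status)
{
	return ((status & 0xfff0) == 0x0010);
}

/* TT field: 0 = instruction, 1 = data, 2 = generic; -1 if not a TLB error. */
static int
mca_tlb_type(uint64_t status)
{
	return (mca_is_tlb(status) ? (int)((status >> 2) & 3) : -1);
}

/* LL field: 0 = L0, 1 = L1, 2 = L2, 3 = generic; -1 if not a TLB error. */
static int
mca_tlb_level(uint64_t status)
{
	return (mca_is_tlb(status) ? (int)(status & 3) : -1);
}

/* Format a one-line summary of the status word into buf and return buf. */
static const char *
mca_describe(uint64_t status, char *buf, size_t len)
{
	static const char *const tt[] = { "I", "D", "G", "?" };
	static const char *const ll[] = { "L0", "L1", "L2", "LG" };
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "%s%s%s",
	    (status & MC_STATUS_UC) ? "UNCOR " : "COR ",
	    (status & MC_STATUS_OVER) ? "OVER " : "",
	    (status & MC_STATUS_PCC) ? "PCC " : "");
	if (n < 0 || (size_t)n >= len)
		return (buf);
	if (mca_is_tlb(status))
		snprintf(buf + n, len - (size_t)n, "%sTLB %s error",
		    tt[mca_tlb_type(status)], ll[mca_tlb_level(status)]);
	else
		snprintf(buf + n, len - (size_t)n, "error code 0x%04x",
		    (unsigned int)(status & 0xffff));
	return (buf);
}
```

The top byte 0xb6 has VAL, UC, EN, ADDRV and PCC set, and the low 16 bits
0x0015 give TT = 1 (data) and LL = 1 (L1), so mca_describe() on this value
produces "UNCOR PCC DTLB L1 error", the same text as in the panic above.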

John
-- 
John Hay -- jhay@meraka.csir.co.za / jhay@FreeBSD.org


