Date:      Sun, 23 Jan 2011 11:32:55 +0100
From:      Alban Hertroys <dalroi@solfertje.student.utwente.nl>
To:        stable@freebsd.org
Subject:   Machine check errors
Message-ID:  <652E5569-2566-4D3C-BC8B-C8B00F3B61EA@solfertje.student.utwente.nl>

Ever since installing 7.4-PRERELEASE I've been seeing MCA (machine check) errors on my home server. They usually occur during my Sunday-night level 1 dump over ssh to a disk attached to a different machine, although that's probably not relevant.

Today I finally managed to catch one on the terminal; here's a hand-transcribed copy:

MCA: Bank 0, Status 0xb622000000000135
MCA: Global Cap 0x0000000000000104, Status 0x0000000000000004
MCA: Vendor "AuthenticAMD", ID 0x662, APIC ID 1
MCA: CPU 0 UNCOR PCC DCACHE L1 DRD error
MCA: Address 0x162933f0
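
The summary line above ("UNCOR PCC DCACHE L1 DRD") can be cross-checked against the raw Bank 0 status word by hand. What follows is only a rough sketch, assuming the architectural MCi_STATUS layout (flag bits 57 to 63, and a 16-bit error code in the memory-hierarchy form 0000 0001 RRRR TTLL). The function and table names are made up for illustration and are not taken from the FreeBSD kernel, and the model-specific bits in between are left undecoded.

```python
# Assumed architectural flag bits in the top of an MCi_STATUS value.
STATUS_FLAGS = [
    (63, "VAL"),    # register holds a valid error
    (62, "OVER"),   # an earlier error was overwritten
    (61, "UC"),     # uncorrected error (shown as UNCOR in the log)
    (60, "EN"),     # reporting was enabled for this error
    (59, "MISCV"),  # MISC register is valid
    (58, "ADDRV"),  # ADDR register is valid
    (57, "PCC"),    # processor context corrupt
]

# Field encodings of the memory-hierarchy error code (0000 0001 RRRR TTLL).
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TRANSACTION = {0: "I", 1: "D", 2: "G"}        # instruction, data, generic
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}  # cache level, LG = generic


def decode_status(status):
    """Split a 64-bit status word into flag names and error-code fields."""
    flags = [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]
    code = status & 0xFFFF
    result = {"flags": flags, "error_code": code}
    # Only the memory-hierarchy form is handled in this sketch.
    if code & 0xFF00 == 0x0100:
        result["request"] = REQUEST.get((code >> 4) & 0xF, "?")
        result["transaction"] = TRANSACTION.get((code >> 2) & 0x3, "?")
        result["level"] = LEVEL[code & 0x3]
    return result


if __name__ == "__main__":
    # Status word transcribed from the console output above.
    print(decode_status(0xB622000000000135))
```

For the transcribed value this yields the flags VAL, UC, EN, ADDRV and PCC, plus a data-read (DRD) request on the data side (D) of the L1 cache, which agrees with the kernel's own summary line. That only shows the report is internally consistent; it says nothing about whether the CPU or the reporting is at fault.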


Fatal trap 20: Machine check trap while in user mode
cpuid = 0; apic id = 01
instruction pointer	= 0x33:0x8086bd0
stack pointer		= 0x3b:0xbfbfd390
frame pointer		= 0x3b:0xbfbfd3e8
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 3, pres 1, def32 1, gran 1
processor eflags	= interrupt enabled, IOPL = 0
current process		= 18119 (postgres)
trap number		= 20
panic: machine check trap
cpuid = 0
GEOM_MIRROR: Device home: provider mirror/home destroyed.

Dmesg is also attached.


Copyright (c) 1992-2010 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.4-PRERELEASE #7: Mon Dec  6 19:30:23 CET 2010
    dalroi@solfertje.student.utwente.nl:/usr/obj/usr/src/sys/ERGOPROXY i386
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) XP 2000+ (1666.73-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0x662  Family = 6  Model = 6  Stepping = 2
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
  AMD Features=0xc0400800<SYSCALL,MMX+,3DNow!+,3DNow!>
real memory  = 1610088448 (1535 MB)
avail memory = 1568038912 (1495 MB)
ACPI APIC Table: <PTLTD  	 APIC  >
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 1.1> irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: <PTLTD   RSDT> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: Sleep Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, 5ff00000 (3) failed
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff,0x8000-0x807f,0x8080-0x80ff iomem 0xd8000-0xdbfff on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <AMD 762 host to AGP bridge> on hostb0
device_attach: agp0 attach returned 12
pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
isab0: <PCI-ISA bridge> at device 7.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <AMD 768 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 7.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
pci0: <bridge> at device 7.3 (no driver attached)
3ware device driver for 9000 series storage controllers, version: 3.70.05.010
twa0: <3ware 9000 series Storage Controller> port 0x1000-0x103f mem 0xfc000000-0xfdffffff,0xfa000000-0xfa000fff irq 21 at device 9.0 on pci0
twa0: [ITHREAD]
twa0: INFO: (0x15: 0x1300): Controller details:: Model 9550SXU-4LP, 4 ports, Firmware FE9X 3.08.02.005, BIOS BE9X 3.08.00.002
pcib2: <ACPI PCI-PCI bridge> at device 16.0 on pci0
pci2: <ACPI PCI bus> on pcib2
ohci0: <OHCI (generic) USB controller> mem 0xfa104000-0xfa104fff irq 19 at device 0.0 on pci2
ohci0: [GIANT-LOCKED]
ohci0: [ITHREAD]
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: <AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
uhub0: 4 ports with 4 removable, self powered
vgapci0: <VGA-compatible display> mem 0xfa100000-0xfa103fff,0xfa800000-0xfaffffff irq 18 at device 6.0 on pci2
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0x2000-0x207f mem 0xfa105000-0xfa10507f irq 19 at device 7.0 on pci2
miibus0: <MII bus> on xl0
xlphy0: <3c905C 10/100 internal PHY> PHY 24 on miibus0
xlphy0:  10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, auto, auto-flow
xl0: Ethernet address: 00:04:76:0f:59:7a
xl0: [ITHREAD]
xl1: <3Com 3c905C-TX Fast Etherlink XL> port 0x2080-0x20ff mem 0xfa105400-0xfa10547f irq 19 at device 8.0 on pci2
miibus1: <MII bus> on xl1
ukphy0: <Generic IEEE 802.3u media interface> PHY 24 on miibus1
ukphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto, auto-flow
xl1: Ethernet address: 00:e0:81:27:1b:4b
xl1: [ITHREAD]
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
cpu0: <ACPI CPU> on acpi0
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xc87ff,0xc8800-0xc8fff,0xc9000-0xc97ff,0xe0000-0xe3fff pnpid ORM0000 on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: parallel port not found.
uart0: <16550 or compatible> at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
uart0: [FILTER]
uart0: console (19200,n,8,1)
uart1: <16550 or compatible> at port 0x2f8-0x2ff irq 3 on isa0
uart1: [FILTER]
Timecounter "TSC" frequency 1666733145 Hz quality 800
Timecounters tick every 1.000 msec
ad0: 190782MB <Seagate ST3200822A 3.01> at ata0-master UDMA100
ad1: 190782MB <Seagate ST3200822A 3.01> at ata0-slave UDMA100
acd0: DVDROM <Pioneer DVD-ROM ATAPIModel DVD-116 0122/E1.22> at ata1-master UDMA66
GEOM_STRIPE: Device tmp created (id=1982480573).
GEOM_STRIPE: Disk ad0s1e attached to tmp.
GEOM_STRIPE: Device usr created (id=1752489598).
GEOM_STRIPE: Disk ad0s1f attached to usr.
GEOM_MIRROR: Device mirror/root launched (2/2).
GEOM_MIRROR: Device mirror/var launched (2/2).
GEOM_STRIPE: Disk ad1s1e attached to tmp.
GEOM_STRIPE: Device tmp activated.
GEOM_STRIPE: Disk ad1s1f attached to usr.
GEOM_STRIPE: Device usr activated.
GEOM_MIRROR: Device mirror/home launched (2/2).
WARNING: Expected rawoffset 0, found 63
WARNING: Expected rawoffset 0, found 63
da0 at twa0 bus 0 target 0 lun 0
da0: <AMCC 9550SXU-4L DISK 3.08> Fixed Direct Access SCSI-5 device 
da0: 100.000MB/s transfers
da0: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C)
da1 at twa0 bus 0 target 1 lun 0
da1: <AMCC 9550SXU-4L DISK 3.08> Fixed Direct Access SCSI-5 device 
da1: 100.000MB/s transfers
da1: 953664MB (1953103872 512 byte sectors: 255H 63S/T 121575C)
Trying to mount root from ufs:/dev/mirror/root
WARNING: / was not properly dismounted
WARNING: Expected rawoffset 0, found 63
WARNING: Expected rawoffset 0, found 63
twa0: INFO: (0x04: 0x0029): Verify started: unit=0
twa0: INFO: (0x04: 0x0029): Verify started: unit=1

From searching the archives I found claims that L1 cache errors should cause far more trouble than I'm seeing. However, the user in that case was using an Intel-based ThinkPad laptop, while I'm seeing them on an Athlon XP-based server (Tyan Tiger board, 3ware RAID controller, the works).

Now there is something unusual about my server that could be related to these MCA errors: it's a dual-CPU motherboard that would normally host two Athlon MPs, but instead hosts a single Athlon XP, so one of the CPU sockets is empty.

So, what's my situation? Do I need to go looking for a replacement CPU, or is something wrong with the machine check itself?

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.







