From owner-freebsd-stable@FreeBSD.ORG Sun Jan 23 10:50:38 2011
From: Alban Hertroys
Date: Sun, 23 Jan 2011 11:32:55 +0100
To: stable@freebsd.org
Subject: Machine check errors
Message-Id: <652E5569-2566-4D3C-BC8B-C8B00F3B61EA@solfertje.student.utwente.nl>
Mime-Version: 1.0 (Apple Message framework v1082)
Content-Type: multipart/mixed; boundary=Apple-Mail-11--356003971

--Apple-Mail-11--356003971
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=us-ascii

Ever since installing 7.4-PRERELEASE I'm seeing MCA machine check errors on my home-server. They usually occur during my Sunday-night level1 dump via ssh to a disk connected to a different machine, although that's probably not relevant.

Today I finally managed to catch it on the terminal, here's a hand-transcribed copy:

MCA: Bank 0, Status 0xb622000000000135
MCA: Global Cap 0x0000000000000104, Status 0x0000000000000004
MCA: Vendor "AuthenticAMD". ID 0x662, APIC ID 1
MCA: CPU 0 UNCOR PCC DCACHE L1 DRD error
MCA: Address 0x162933f0

Fatal trap 20: Machine check trap while in user mode
cpuid = 0; apic id = 01
instruction pointer     = 0x33:0x8086bd0
stack pointer           = 0x3b:0xbfbfd390
frame pointer           = 0x3b:0xbfbfd3e8
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 3, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, IOPL = 0
current process         = 18119 (postgres)
trap number             = 20
panic: machine check trap
cpuid = 0
GEOM_MIRROR: Device home: provider mirror/home destroyed.

Dmesg is also attached.

--Apple-Mail-11--356003971
Content-Disposition: attachment; filename=dmesg_20110123
Content-Type: application/octet-stream; name="dmesg_20110123"
Content-Transfer-Encoding: 7bit

Copyright (c) 1992-2010 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.4-PRERELEASE #7: Mon Dec 6 19:30:23 CET 2010
    dalroi@solfertje.student.utwente.nl:/usr/obj/usr/src/sys/ERGOPROXY i386
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) XP 2000+ (1666.73-MHz 686-class CPU)
  Origin = "AuthenticAMD" Id = 0x662 Family = 6 Model = 6 Stepping = 2
  Features=0x383fbff
  AMD Features=0xc0400800
real memory = 1610088448 (1535 MB)
avail memory = 1568038912 (1495 MB)
ACPI APIC Table:
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: Sleep Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, 5ff00000 (3) failed
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0
acpi_button0: on acpi0
pcib0: port 0xcf8-0xcff,0x8000-0x807f,0x8080-0x80ff iomem 0xd8000-0xdbfff on acpi0
pci0: on pcib0
agp0: on hostb0
device_attach: agp0 attach returned 12
pcib1: at device 1.0 on pci0
pci1: on pcib1
isab0: at device 7.0 on pci0
isa0: on isab0
atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 7.1 on pci0
ata0: on atapci0
ata0: [ITHREAD]
ata1: on atapci0
ata1: [ITHREAD]
pci0: at device 7.3 (no driver attached)
3ware device driver for 9000 series storage controllers, version: 3.70.05.010
twa0: <3ware 9000 series Storage Controller> port 0x1000-0x103f mem 0xfc000000-0xfdffffff,0xfa000000-0xfa000fff irq 21 at device 9.0 on pci0
twa0: [ITHREAD]
twa0: INFO: (0x15: 0x1300): Controller details:: Model 9550SXU-4LP, 4 ports, Firmware FE9X 3.08.02.005, BIOS BE9X 3.08.00.002
pcib2: at device 16.0 on pci0
pci2: on pcib2
ohci0: mem 0xfa104000-0xfa104fff irq 19 at device 0.0 on pci2
ohci0: [GIANT-LOCKED]
ohci0: [ITHREAD]
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: on ohci0
usb0: USB revision 1.0
uhub0: on usb0
uhub0: 4 ports with 4 removable, self powered
vgapci0: mem 0xfa100000-0xfa103fff,0xfa800000-0xfaffffff irq 18 at device 6.0 on pci2
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0x2000-0x207f mem 0xfa105000-0xfa10507f irq 19 at device 7.0 on pci2
miibus0: on xl0
xlphy0: <3c905C 10/100 internal PHY> PHY 24 on miibus0
xlphy0: 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow, auto, auto-flow
xl0: Ethernet address: 00:04:76:0f:59:7a
xl0: [ITHREAD]
xl1: <3Com 3c905C-TX Fast Etherlink XL> port 0x2080-0x20ff mem 0xfa105400-0xfa10547f irq 19 at device 8.0 on pci2
miibus1: on xl1
ukphy0: PHY 24 on miibus1
ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto, auto-flow
xl1: Ethernet address: 00:e0:81:27:1b:4b
xl1: [ITHREAD]
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
cpu0: on acpi0
fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
pmtimer0 on isa0
orm0: at iomem 0xc0000-0xc7fff,0xc8000-0xc87ff,0xc8800-0xc8fff,0xc9000-0xc97ff,0xe0000-0xe3fff pnpid ORM0000 on isa0
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: parallel port not found.
uart0: <16550 or compatible> at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
uart0: [FILTER]
uart0: console (19200,n,8,1)
uart1: <16550 or compatible> at port 0x2f8-0x2ff irq 3 on isa0
uart1: [FILTER]
Timecounter "TSC" frequency 1666733145 Hz quality 800
Timecounters tick every 1.000 msec
ad0: 190782MB at ata0-master UDMA100
ad1: 190782MB at ata0-slave UDMA100
acd0: DVDROM at ata1-master UDMA66
GEOM_STRIPE: Device tmp created (id=1982480573).
GEOM_STRIPE: Disk ad0s1e attached to tmp.
GEOM_STRIPE: Device usr created (id=1752489598).
GEOM_STRIPE: Disk ad0s1f attached to usr.
GEOM_MIRROR: Device mirror/root launched (2/2).
GEOM_MIRROR: Device mirror/var launched (2/2).
GEOM_STRIPE: Disk ad1s1e attached to tmp.
GEOM_STRIPE: Device tmp activated.
GEOM_STRIPE: Disk ad1s1f attached to usr.
GEOM_STRIPE: Device usr activated.
GEOM_MIRROR: Device mirror/home launched (2/2).
WARNING: Expected rawoffset 0, found 63
WARNING: Expected rawoffset 0, found 63
da0 at twa0 bus 0 target 0 lun 0
da0: Fixed Direct Access SCSI-5 device
da0: 100.000MB/s transfers
da0: 476827MB (976541696 512 byte sectors: 255H 63S/T 60786C)
da1 at twa0 bus 0 target 1 lun 0
da1: Fixed Direct Access SCSI-5 device
da1: 100.000MB/s transfers
da1: 953664MB (1953103872 512 byte sectors: 255H 63S/T 121575C)
Trying to mount root from ufs:/dev/mirror/root
WARNING: / was not properly dismounted
WARNING: Expected rawoffset 0, found 63
WARNING: Expected rawoffset 0, found 63
twa0: INFO: (0x04: 0x0029): Verify started: unit=0
twa0: INFO: (0x04: 0x0029): Verify started: unit=1

--Apple-Mail-11--356003971
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=us-ascii

From searching the archives I found claims that L1 cache errors would cause far more trouble than I'm seeing. The user in that case, however, was using an Intel-based ThinkPad laptop, while I'm seeing them on an AthlonXP-based server (Tyan Tiger board, 3ware RAID controller, the works).

Now there is something unusual about my server that could be related to these MCA errors: it's a dual-CPU motherboard that would normally host two AthlonMPs, but is instead hosting a single AthlonXP. So one of the CPU sockets has no CPU in it.

So, what's my situation? Do I need to go looking for a replacement CPU, or is something wrong with the machine check itself?

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

--Apple-Mail-11--356003971--
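
For reference, the "UNCOR PCC DCACHE L1 DRD" summary in the panic can be cross-checked against the raw Bank 0 status word. The following is a minimal sketch in Python, assuming the architectural x86 MCi_STATUS layout (flag bits 57-63, MCA error code in the low 16 bits, cache-hierarchy codes of the form 0000 0001 RRRR TTLL). The function name decode_mci_status and the table names are made up for this illustration and are not part of any FreeBSD tool.

```python
# Sketch only: decode an x86 MCi_STATUS value by hand.
# Assumes the architectural bit layout; all names here are hypothetical.

FLAG_BITS = {
    "VAL": 63,    # register contains a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # uncorrected error
    "EN": 60,     # error reporting was enabled
    "MISCV": 59,  # MISC register is valid
    "ADDRV": 58,  # ADDR register is valid
    "PCC": 57,    # processor context corrupt
}
REQUESTS = ["ERR", "RD", "WR", "DRD", "DWR", "IRD", "PREFETCH", "EVICT", "SNOOP"]
TYPES = ["I", "D", "G"]            # instruction, data, generic
LEVELS = ["L0", "L1", "L2", "LG"]  # cache level, LG = generic


def decode_mci_status(status):
    """Return the set flags and, for cache-hierarchy codes, (type, level, request)."""
    flags = [name for name, bit in FLAG_BITS.items() if (status >> bit) & 1]
    code = status & 0xFFFF
    result = {"flags": flags, "mca_error_code": code, "cache": None}
    # Cache-hierarchy compound code: 0000 0001 RRRR TTLL (bit 12 is ignored).
    if code & 0xEF00 == 0x0100:
        rrrr = (code >> 4) & 0xF
        tt = (code >> 2) & 0x3
        ll = code & 0x3
        result["cache"] = (
            TYPES[tt] if tt < len(TYPES) else "reserved",
            LEVELS[ll],
            REQUESTS[rrrr] if rrrr < len(REQUESTS) else "reserved",
        )
    return result


if __name__ == "__main__":
    info = decode_mci_status(0xB622000000000135)  # Bank 0 status from the panic
    print(info["flags"])
    print(hex(info["mca_error_code"]), info["cache"])
```

For the status word above this yields the UC and PCC flags and a data-cache, L1, data-read code, which agrees with the line the kernel printed. It only confirms that the report was decoded consistently; it says nothing about whether the CPU, the board, or the unusual socket configuration is at fault.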