Date: Wed, 27 Feb 2008 16:46:02 GMT
From: Jim Pingle <lists@pingle.org>
To: freebsd-gnats-submit@FreeBSD.org
Subject: i386/121148: Repeatable sysctl crash (Fatal Trap 12) with ACPI enabled
Message-ID: <200802271646.m1RGk2o3004379@www.freebsd.org>
Resent-Message-ID: <200802271650.m1RGo1aD033724@freefall.freebsd.org>

>Number:         121148
>Category:       i386
>Synopsis:       Repeatable sysctl crash (Fatal Trap 12) with ACPI enabled
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-i386
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Feb 27 16:50:00 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator:     Jim Pingle
>Release:        7.0-PRERELEASE (RELENG_7)
>Organization:
HPC Internet Services
>Environment:
FreeBSD test1.hpcisp.com 7.0-PRERELEASE FreeBSD 7.0-PRERELEASE #1: Thu Feb 14 14:08:02 EST 2008 root@test1.hpcisp.com:/usr/obj/usr/src/sys/TEST i386
>Description:
SuperMicro SuperServer 6022L-6 will not fully boot RELENG_7 unless I boot with ACPI disabled. RELENG_7_0 does not crash on the same hardware with the same config. Crash is as follows:

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0x2043455c
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc0742c86
stack pointer = 0x28:0xe8cada0c
frame pointer = 0x28:0xe8cada38
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 68 (sysctl)
trap number = 12
panic: page fault
cpuid = 3
Uptime: 6s
Physical memory: 2035 MB
Dumping 65 MB: 50 34 18 2

The crash happens just after the "Entropy harvesting..." line, before swap is started. As you can see in the crash output, the offending process is sysctl. I can boot to single user mode, but if I issue sysctl -a while there, it also crashes.
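
As a small aid to reading the trap output, the sketch below converts the fault virtual address to decimal and shows its four bytes in little-endian (i386) order. All four bytes happen to be printable ASCII, which is consistent with, but in no way proves, a pointer having been overwritten with string data. That reading is an assumption and not something the report establishes. The fault address may also be a bad pointer plus a structure offset, so the bytes need not match the corrupted value exactly. The variable names are illustrative only.

```python
import struct

# Fault address copied from the trap output in the report.
fault_va = 0x2043455C

# i386 is little-endian, so pack the 32-bit value in that byte order.
raw = struct.pack("<I", fault_va)

# Render each byte as hex and as a printable ASCII character ('.' otherwise).
hex_bytes = " ".join(f"{b:02x}" for b in raw)
text = "".join(chr(b) if 32 <= b < 127 else "." for b in raw)

print("decimal:", fault_va)
print("bytes:  ", hex_bytes)
print("ascii:  ", text)
```

The decimal form makes it easier to match the address against kgdb frames, which print the faulting address as a decimal `eva` argument.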

When sysctl -a is run in single user mode, the last three lines before the crash are (transcribed by hand, no serial console available):

dev.pcib.3.%location: handle=\_SB_.PCI3
dev.pcib.3.%pnpinfo: _HID=PNP0A03 UID=3
dev.pcib.3.%parent: acpi0

With a working RELENG_7_0 the lines immediately following this are:

dev.pcib.4.%desc: ACPI Host-PCI bridge
dev.pcib.4.%driver: pcib
dev.pcib.4.%location: handle=\_SB_.PCI4
dev.pcib.4.%pnpinfo: _HID=PNP0A03 _UID=4
dev.pcib.4.%parent: acpi0

I tried a binary search of the source tree to narrow down the crash. I found that one possible vector for the crash was introduced between 2007/12/19 20:00:00 (booted OK) and 2007/12/19 23:59:00 (crashed), which left me with only a handful of files to test. By process of elimination, I found that if I backed some changes out in src/sys/i386/i386/machdep.c, the crash stopped.

src/sys/i386/i386/machdep.c v1.658 2007/08/09 njl - Boots OK
src/sys/i386/i386/machdep.c v1.658.2.1 2007/12/19 rpaulo - Crashes

The confusing part (to me) is that my next step was to update all the way to RELENG_7 as of yesterday, then back out those same changes, but the crash still happened. So either I misidentified the cause of the crash -- which is quite possible -- or it was reintroduced in some other change (or both!).

kgdb output from vmcore.0:

Unread portion of the kernel message buffer:
Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-PRERELEASE #0: Mon Feb 25 15:22:54 EST 2008
    root@test1.hpcisp.com:/usr/obj/usr/src/sys/GENERIC
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) XEON(TM) CPU 2.00GHz (1999.94-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0xf24 Stepping = 4
  Features=0x3febfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM>
  Logical CPUs per core: 2
real memory = 2147418112 (2047 MB)
avail memory = 2091872256 (1994 MB)
ACPI APIC Table: <RCC GCHE >
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
 cpu2 (AP): APIC ID: 2
 cpu3 (AP): APIC ID: 3
ACPI Warning (tbfadt-0505): Optional field "Gpe1Block" has zero address or length: 0 0/8 [20070320]
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 1.1> irqs 0-15 on motherboard
ioapic1 <Version 1.1> irqs 16-31 on motherboard
ioapic2 <Version 1.1> irqs 32-47 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
hptrr: HPT RocketRAID controller driver v1.1 (Feb 25 2008 15:20:56)
acpi0: <RCC GCHE> on motherboard
ACPI Warning (dswload-0794): Type override - [DEB_] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [MLIB] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [IO__] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [DATA] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [SIO_] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [SB__] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [PM__] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [ICNT] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [ACPI] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [IORG] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [SB__] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [PM__] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [SIO_] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [PM__] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [BIOS] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [CMOS] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [KBC_] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [OEM_] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, 7ff00000 (3) failed
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x508-0x50b on acpi0
cpu0: <ACPI CPU> on acpi0
p4tcc0: <CPU Frequency Thermal Control> on cpu0
cpu1: <ACPI CPU> on acpi0
p4tcc1: <CPU Frequency Thermal Control> on cpu1
cpu2: <ACPI CPU> on acpi0
p4tcc2: <CPU Frequency Thermal Control> on cpu2
cpu3: <ACPI CPU> on acpi0
p4tcc3: <CPU Frequency Thermal Control> on cpu3
acpi_button0: <Sleep Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
vgapci0: <VGA-compatible display> port 0xa800-0xa8ff mem 0xfd000000-0xfdffffff,0xfe5ff000-0xfe5fffff irq 18 at device 2.0 on pci0
fxp0: <Intel 82550 Pro/100 Ethernet> port 0xae80-0xaebf mem 0xfe5fc000-0xfe5fcfff,0xfe580000-0xfe59ffff irq 17 at device 4.0 on pci0
miibus0: <MII bus> on fxp0
inphy0: <i82555 10/100 media interface> PHY 1 on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:30:48:20:a3:9e
fxp0: [ITHREAD]
fxp1: <Intel 82550 Pro/100 Ethernet> port 0xaf00-0xaf3f mem 0xfe5fd000-0xfe5fdfff,0xfe5a0000-0xfe5bffff irq 19 at device 5.0 on pci0
miibus1: <MII bus> on fxp1
inphy1: <i82555 10/100 media interface> PHY 1 on miibus1
inphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp1: Ethernet address: 00:30:48:20:a3:9f
fxp1: [ITHREAD]
isab0: <PCI-ISA bridge> at device 15.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <ServerWorks CSB5 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 15.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
ohci0: <OHCI (generic) USB controller> mem 0xfe5fe000-0xfe5fefff irq 10 at device 15.2 on pci0
ohci0: [GIANT-LOCKED]
ohci0: [ITHREAD]
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: <(0x1166) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
uhub0: 4 ports with 4 removable, self powered
pcib1: <ACPI Host-PCI bridge> on acpi0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI Host-PCI bridge> on acpi0
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI Host-PCI bridge> on acpi0
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI Host-PCI bridge> on acpi0
pci4: <ACPI PCI bus> on pcib4
asr0: <Adaptec Caching SCSI RAID> mem 0xfeb00000-0xfebfffff,0xfb000000-0xfbffffff,0xf8000000-0xf9ffffff irq 29 at device 3.0 on pci4
asr0: [GIANT-LOCKED]
asr0: [ITHREAD]
asr0: ADAPTEC 2005S FW Rev. 380E, 2 channel, 2000 CCBs, Protocol I2O
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: [ITHREAD]
psm0: model NetMouse/NetScroll Optical, device ID 0
fdc0: <floppy drive controller (FDE)> port 0x3f2-0x3f3,0x3f4-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FILTER]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio0: [FILTER]
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
sio1: [FILTER]
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcdfff,0xce000-0xcefff,0xcf000-0xcffff pnpid ORM0000 on isa0
ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
ppbus0: <Parallel port bus> on ppc0
ppbus0: [ITHREAD]
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
ppc0: [GIANT-LOCKED]
ppc0: [ITHREAD]
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
hptrr: no controller detected.
acd0: CDROM <MATSHITA CR-177/7T0D> at ata1-master UDMA33
da0 at asr0 bus 0 target 0 lun 0
da0: <ADAPTEC RAID-5 380E> Fixed Direct Access SCSI-2 device
ses0 at asr0 bus 0 target 6 lun 0
ses0: <SUPER GEM318 0> Fixed Processor SCSI-2 device
SMP: AP CPU #3 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/da0s1a
<118>Loading configuration files.
<118>kernel dumps on /dev/da0s1b
<118>Entropy harvesting:
<118> interrupts
<118> ethernet
<118> point_to_point

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0x2043455c
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc0742c86
stack pointer = 0x28:0xe8cada0c
frame pointer = 0x28:0xe8cada38
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 68 (sysctl)
trap number = 12
panic: page fault
cpuid = 3
Uptime: 6s
Physical memory: 2035 MB
Dumping 65 MB: 50 34 18 2

#0 doadump () at pcpu.h:195
195 pcpu.h: No such file or directory.
    in pcpu.h
(kgdb) bt
#0 doadump () at pcpu.h:195
#1 0xc073a688 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2 0xc073a941 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:563
#3 0xc0a19dc0 in trap_fatal (frame=0xe8cad9cc, eva=541279580) at /usr/src/sys/i386/i386/trap.c:899
#4 0xc0a1a030 in trap_pfault (frame=0xe8cad9cc, usermode=0, eva=541279580) at /usr/src/sys/i386/i386/trap.c:812
#5 0xc0a1a9ad in trap (frame=0xe8cad9cc) at /usr/src/sys/i386/i386/trap.c:490
#6 0xc0a01cab in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7 0xc0742c86 in sysctl_sysctl_next_ls (lsp=Variable "lsp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:630
#8 0xc0742d46 in sysctl_sysctl_next_ls (lsp=Variable "lsp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:618
#9 0xc0742d83 in sysctl_sysctl_next_ls (lsp=Variable "lsp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:630
#10 0xc0742d83 in sysctl_sysctl_next_ls (lsp=Variable "lsp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:630
#11 0xc0742de6 in sysctl_sysctl_next (oidp=0xc0b4c940, arg1=0xe8cadc1c, arg2=4, req=0xe8cadba4) at /usr/src/sys/kern/kern_sysctl.c:651
#12 0xc07436f2 in sysctl_root (oidp=Variable "oidp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:1306
#13 0xc074382e in userland_sysctl (td=0xc5574210, name=0xe8cadc14, namelen=6, old=0xbfbfe4e8, oldlenp=0xbfbfe598, inkernel=0, new=0x0, newlen=0, retval=0xe8cadc10, flags=0) at /usr/src/sys/kern/kern_sysctl.c:1401
#14 0xc0744462 in __sysctl (td=0xc5574210, uap=0xe8cadcfc) at /usr/src/sys/kern/kern_sysctl.c:1336
#15 0xc0a1a378 in syscall (frame=0xe8cadd38) at /usr/src/sys/i386/i386/trap.c:1035
#16 0xc0a01d10 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:196
#17 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)

This is a testing machine that is only being used to evaluate 7.0 for use on similar hardware. I can take whatever debugging steps are needed, just let me know what information is necessary to help resolve the issue. I tried posting this information to the -STABLE list, but received no replies.

System is running with the most current BIOS available from the OEM. RAM tested OK with memtest86+ left running for a day or so.
>How-To-Repeat:
Attempt to boot with a RELENG_7 world/kernel on a SuperMicro SuperServer 6022L-6 with ACPI enabled. Alternately, boot to single user mode and issue "sysctl -a". Crashes every time in the exact same place.
>Fix:
Workaround is to run with ACPI disabled, but that is not desired. One part of the crash was possibly introduced with rev v1.658.2.1 of src/sys/i386/i386/machdep.c, but I am unable to repeat that fix on recent RELENG_7 sources.
>Release-Note:
>Audit-Trail:
>Unformatted:
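
The date-based binary search described in the report can be sketched roughly as follows. This is only an illustration of the general technique, not the procedure actually used. `bisect_by_date` and `boots_ok` are hypothetical names, and the predicate stands in for the manual step of checking out the tree as of a given time, building it, and trying to boot. The cutoff time in the demo is invented purely so that the example runs. The sketch also assumes a single transition from "boots" to "crashes" inside the window, which the report itself suggests may not hold.

```python
from datetime import datetime, timedelta

def bisect_by_date(good, bad, boots_ok, resolution=timedelta(minutes=1)):
    """Narrow the (good, bad) window until it is no wider than `resolution`.

    `good` is a checkout time known to boot, `bad` one known to crash.
    `boots_ok(t)` is a hypothetical predicate: build the tree as of time t
    and report whether it boots.
    """
    while bad - good > resolution:
        mid = good + (bad - good) / 2
        if boots_ok(mid):
            good = mid   # breakage was introduced after mid
        else:
            bad = mid    # breakage was introduced at or before mid
    return good, bad

# Window taken from the report.
start = datetime(2007, 12, 19, 20, 0, 0)   # booted OK
end = datetime(2007, 12, 19, 23, 59, 0)    # crashed

# Invented cutoff for demonstration only; NOT a real commit time.
demo_cutoff = datetime(2007, 12, 19, 22, 0, 0)
lo, hi = bisect_by_date(start, end, lambda t: t < demo_cutoff)
print("last good:", lo)
print("first bad:", hi)
```

Each iteration halves the window, so the roughly four-hour range above needs about eight build-and-boot cycles to reach one-minute resolution.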