Date:      Wed, 27 Feb 2008 16:46:02 GMT
From:      Jim Pingle <lists@pingle.org>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   i386/121148: Repeatable sysctl crash (Fatal Trap 12) with ACPI enabled
Message-ID:  <200802271646.m1RGk2o3004379@www.freebsd.org>
Resent-Message-ID: <200802271650.m1RGo1aD033724@freefall.freebsd.org>


>Number:         121148
>Category:       i386
>Synopsis:       Repeatable sysctl crash (Fatal Trap 12) with ACPI enabled
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-i386
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Feb 27 16:50:00 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator:     Jim Pingle
>Release:        7.0-PRERELEASE (RELENG_7)
>Organization:
HPC Internet Services
>Environment:
FreeBSD test1.hpcisp.com 7.0-PRERELEASE FreeBSD 7.0-PRERELEASE #1: Thu Feb 14 14:08:02 EST 2008 root@test1.hpcisp.com:/usr/obj/usr/src/sys/TEST  i386 
>Description:
SuperMicro SuperServer 6022L-6 will not fully boot RELENG_7 unless I boot with ACPI disabled. RELENG_7_0 does not crash on the same hardware with the same config.

Crash is as follows:
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0x2043455c
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc0742c86
stack pointer           = 0x28:0xe8cada0c
frame pointer           = 0x28:0xe8cada38
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 68 (sysctl)
trap number             = 12
panic: page fault
cpuid = 3
Uptime: 6s
Physical memory: 2035 MB
Dumping 65 MB: 50 34 18 2

The crash happens just after the "Entropy harvesting..." line, before swap is started. As you can see in the crash output, the offending process is sysctl.

I can boot to single user mode, but if I issue sysctl -a while there, it also crashes. When sysctl -a is run in single user mode, the last three lines before the crash are (transcribed by hand, no serial console available):

dev.pcib.3.%location: handle=\_SB_.PCI3
dev.pcib.3.%pnpinfo: _HID=PNP0A03 UID=3
dev.pcib.3.%parent: acpi0

With a working RELENG_7_0 the lines immediately following this are:

dev.pcib.4.%desc: ACPI Host-PCI bridge
dev.pcib.4.%driver: pcib
dev.pcib.4.%location: handle=\_SB_.PCI4
dev.pcib.4.%pnpinfo: _HID=PNP0A03 _UID=4
dev.pcib.4.%parent: acpi0
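
The comparison above (last line printed before the crash versus the same spot in a working system's output) can be automated once both outputs are captured. The following is only a minimal sketch: the function name `lines_after` is hypothetical, and the sample text is limited to the lines quoted in this report.

```python
def lines_after(working_output, last_seen_oid, count=5):
    """Return up to `count` lines that follow the line for `last_seen_oid`.

    `working_output` is text saved from `sysctl -a` on a system that does
    not crash; `last_seen_oid` is the OID name of the last line printed
    before the crash on the failing system.
    """
    lines = working_output.splitlines()
    for i, line in enumerate(lines):
        # Each line looks like "name: value"; compare only the name part.
        if line.split(":", 1)[0] == last_seen_oid:
            return lines[i + 1 : i + 1 + count]
    return []


# Sample data: only the lines quoted in the report.
working = r"""dev.pcib.3.%parent: acpi0
dev.pcib.4.%desc: ACPI Host-PCI bridge
dev.pcib.4.%driver: pcib
dev.pcib.4.%location: handle=\_SB_.PCI4
dev.pcib.4.%pnpinfo: _HID=PNP0A03 _UID=4
dev.pcib.4.%parent: acpi0
"""

next_lines = lines_after(working, "dev.pcib.3.%parent")
for line in next_lines:
    print(line)
```

The first line returned names the OID the walk would have reached next, which is a reasonable place to start looking, although the fault may equally lie in whatever node the kernel was traversing at that moment.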

I tried a binary search of the source tree to narrow down the crash. I found that one possible vector for the crash was introduced between 2007/12/19 20:00:00 (booted OK) and 2007/12/19 23:59:00 (crashed), which left me with only a handful of files to test.
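
For anyone repeating this, the date-based binary search can be sketched as below. This is an illustration under stated assumptions, not the exact procedure used here: `boots_ok` is a hypothetical callback standing in for "check out the sources as of that time, build, and boot", and `SIMULATED_BREAK` is a made-up time used only so the example runs; the report does not state the actual commit time.

```python
from datetime import datetime, timedelta


def bisect_checkout_date(good, bad, boots_ok, resolution=timedelta(minutes=15)):
    """Narrow the (good, bad) window until it is no wider than `resolution`.

    `good` is a checkout time known to boot, `bad` one known to crash.
    `boots_ok(when)` is a hypothetical callback returning True if sources
    checked out as of `when` boot successfully.
    """
    while bad - good > resolution:
        mid = good + (bad - good) / 2
        if boots_ok(mid):
            good = mid      # regression was introduced after `mid`
        else:
            bad = mid       # regression was introduced at or before `mid`
    return good, bad


# Made-up regression time for the demonstration only (an assumption).
SIMULATED_BREAK = datetime(2007, 12, 19, 21, 30)


def simulated_boots_ok(when):
    # Stand-in for a real checkout/build/boot cycle.
    return when < SIMULATED_BREAK


# Window taken from the report: booted OK at 20:00, crashed at 23:59.
good, bad = bisect_checkout_date(
    datetime(2007, 12, 19, 20, 0),
    datetime(2007, 12, 19, 23, 59),
    simulated_boots_ok,
)
print("last good:", good, "first bad:", bad)
```

Each iteration costs a full build and boot, so halving the window keeps the number of rebuilds logarithmic in the size of the starting range.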

By process of elimination, I found that if I backed some changes out in src/sys/i386/i386/machdep.c, the crash stopped.

src/sys/i386/i386/machdep.c v1.658 2007/08/09 njl - Boots OK
src/sys/i386/i386/machdep.c v1.658.2.1 2007/12/19 rpaulo - Crashes

The confusing part (to me) is that my next step was to update all the way to RELENG_7 as of yesterday, then back out those same changes, but the crash still happened. So either I misidentified the cause of the crash -- which is quite possible -- or it was reintroduced in some other change (or both!). 

kgdb output from vmcore.0:
Unread portion of the kernel message buffer:
Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-PRERELEASE #0: Mon Feb 25 15:22:54 EST 2008
    root@test1.hpcisp.com:/usr/obj/usr/src/sys/GENERIC
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) XEON(TM) CPU 2.00GHz (1999.94-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf24  Stepping = 4
  Features=0x3febfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM>
  Logical CPUs per core: 2
real memory  = 2147418112 (2047 MB)
avail memory = 2091872256 (1994 MB)
ACPI APIC Table: <RCC    GCHE    >
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
ACPI Warning (tbfadt-0505): Optional field "Gpe1Block" has zero address or length:        0       0/8 [20070320]
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 1.1> irqs 0-15 on motherboard
ioapic1 <Version 1.1> irqs 16-31 on motherboard
ioapic2 <Version 1.1> irqs 32-47 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
hptrr: HPT RocketRAID controller driver v1.1 (Feb 25 2008 15:20:56)
acpi0: <RCC GCHE> on motherboard
ACPI Warning (dswload-0794): Type override - [DEB_] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [MLIB] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [IO__] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [DATA] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [SIO_] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [SB__] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [PM__] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [ICNT] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [ACPI] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [IORG] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [SB__] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [PM__] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [SIO_] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [PM__] had invalid type (String) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [BIOS] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [CMOS] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [KBC_] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
ACPI Warning (dswload-0794): Type override - [OEM_] had invalid type (Integer) for Scope operator, changed to (Scope) [20070320]
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, 7ff00000 (3) failed
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x508-0x50b on acpi0
cpu0: <ACPI CPU> on acpi0
p4tcc0: <CPU Frequency Thermal Control> on cpu0
cpu1: <ACPI CPU> on acpi0
p4tcc1: <CPU Frequency Thermal Control> on cpu1
cpu2: <ACPI CPU> on acpi0
p4tcc2: <CPU Frequency Thermal Control> on cpu2
cpu3: <ACPI CPU> on acpi0
p4tcc3: <CPU Frequency Thermal Control> on cpu3
acpi_button0: <Sleep Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
vgapci0: <VGA-compatible display> port 0xa800-0xa8ff mem 0xfd000000-0xfdffffff,0xfe5ff000-0xfe5fffff irq 18 at device 2.0 on pci0
fxp0: <Intel 82550 Pro/100 Ethernet> port 0xae80-0xaebf mem 0xfe5fc000-0xfe5fcfff,0xfe580000-0xfe59ffff irq 17 at device 4.0 on pci0
miibus0: <MII bus> on fxp0
inphy0: <i82555 10/100 media interface> PHY 1 on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:30:48:20:a3:9e
fxp0: [ITHREAD]
fxp1: <Intel 82550 Pro/100 Ethernet> port 0xaf00-0xaf3f mem 0xfe5fd000-0xfe5fdfff,0xfe5a0000-0xfe5bffff irq 19 at device 5.0 on pci0
miibus1: <MII bus> on fxp1
inphy1: <i82555 10/100 media interface> PHY 1 on miibus1
inphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp1: Ethernet address: 00:30:48:20:a3:9f
fxp1: [ITHREAD]
isab0: <PCI-ISA bridge> at device 15.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <ServerWorks CSB5 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 15.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
ohci0: <OHCI (generic) USB controller> mem 0xfe5fe000-0xfe5fefff irq 10 at device 15.2 on pci0
ohci0: [GIANT-LOCKED]
ohci0: [ITHREAD]
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: <(0x1166) OHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
uhub0: 4 ports with 4 removable, self powered
pcib1: <ACPI Host-PCI bridge> on acpi0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI Host-PCI bridge> on acpi0
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI Host-PCI bridge> on acpi0
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI Host-PCI bridge> on acpi0
pci4: <ACPI PCI bus> on pcib4
asr0: <Adaptec Caching SCSI RAID> mem 0xfeb00000-0xfebfffff,0xfb000000-0xfbffffff,0xf8000000-0xf9ffffff irq 29 at device 3.0 on pci4
asr0: [GIANT-LOCKED]
asr0: [ITHREAD]
asr0:   ADAPTEC 2005S FW Rev. 380E, 2 channel, 2000 CCBs, Protocol I2O
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: [ITHREAD]
psm0: model NetMouse/NetScroll Optical, device ID 0
fdc0: <floppy drive controller (FDE)> port 0x3f2-0x3f3,0x3f4-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FILTER]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio0: [FILTER]
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
sio1: [FILTER]
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcdfff,0xce000-0xcefff,0xcf000-0xcffff pnpid ORM0000 on isa0
ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
ppc0: Generic chipset (ECP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/8 bytes threshold
ppbus0: <Parallel port bus> on ppc0
ppbus0: [ITHREAD]
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
ppc0: [GIANT-LOCKED]
ppc0: [ITHREAD]
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
hptrr: no controller detected.
acd0: CDROM <MATSHITA CR-177/7T0D> at ata1-master UDMA33
da0 at asr0 bus 0 target 0 lun 0
da0: <ADAPTEC RAID-5 380E> Fixed Direct Access SCSI-2 device 
ses0 at asr0 bus 0 target 6 lun 0
ses0: <SUPER GEM318 0> Fixed Processor SCSI-2 device 
SMP: AP CPU #3 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/da0s1a
<118>Loading configuration files.
<118>kernel dumps on /dev/da0s1b
<118>Entropy harvesting:
<118> interrupts
<118> ethernet
<118> point_to_point


Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0x2043455c
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc0742c86
stack pointer           = 0x28:0xe8cada0c
frame pointer           = 0x28:0xe8cada38
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 68 (sysctl)
trap number             = 12
panic: page fault
cpuid = 3
Uptime: 6s
Physical memory: 2035 MB
Dumping 65 MB: 50 34 18 2

#0  doadump () at pcpu.h:195
195     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:195
#1  0xc073a688 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xc073a941 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:563
#3  0xc0a19dc0 in trap_fatal (frame=0xe8cad9cc, eva=541279580) at /usr/src/sys/i386/i386/trap.c:899
#4  0xc0a1a030 in trap_pfault (frame=0xe8cad9cc, usermode=0, eva=541279580) at /usr/src/sys/i386/i386/trap.c:812
#5  0xc0a1a9ad in trap (frame=0xe8cad9cc) at /usr/src/sys/i386/i386/trap.c:490
#6  0xc0a01cab in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#7  0xc0742c86 in sysctl_sysctl_next_ls (lsp=Variable "lsp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:630
#8  0xc0742d46 in sysctl_sysctl_next_ls (lsp=Variable "lsp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:618
#9  0xc0742d83 in sysctl_sysctl_next_ls (lsp=Variable "lsp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:630
#10 0xc0742d83 in sysctl_sysctl_next_ls (lsp=Variable "lsp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:630
#11 0xc0742de6 in sysctl_sysctl_next (oidp=0xc0b4c940, arg1=0xe8cadc1c, arg2=4, req=0xe8cadba4)
    at /usr/src/sys/kern/kern_sysctl.c:651
#12 0xc07436f2 in sysctl_root (oidp=Variable "oidp" is not available.
) at /usr/src/sys/kern/kern_sysctl.c:1306
#13 0xc074382e in userland_sysctl (td=0xc5574210, name=0xe8cadc14, namelen=6, old=0xbfbfe4e8, oldlenp=0xbfbfe598, 
    inkernel=0, new=0x0, newlen=0, retval=0xe8cadc10, flags=0) at /usr/src/sys/kern/kern_sysctl.c:1401
#14 0xc0744462 in __sysctl (td=0xc5574210, uap=0xe8cadcfc) at /usr/src/sys/kern/kern_sysctl.c:1336
#15 0xc0a1a378 in syscall (frame=0xe8cadd38) at /usr/src/sys/i386/i386/trap.c:1035
#16 0xc0a01d10 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:196
#17 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
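
As a cross-check on the trace, the decimal `eva` shown in frames #3 and #4 can be compared with the hexadecimal fault virtual address from the trap message, and the address can be viewed as the four bytes it would occupy in memory on little-endian i386. This is a small arithmetic sketch using only values from this report; the interpretation in the note that follows it is an assumption, not a diagnosis.

```python
eva = 541279580             # decimal value printed in frames #3 and #4
fault_address = 0x2043455C  # "fault virtual address" from the trap message

# Both numbers should describe the same faulting address.
same_value = eva == fault_address

# i386 is little-endian: lay the 32-bit value out as it would sit in memory.
as_bytes = fault_address.to_bytes(4, "little")

# Check whether every byte falls in the printable ASCII range.
all_printable = all(0x20 <= b < 0x7F for b in as_bytes)
as_text = as_bytes.decode("ascii") if all_printable else None

print(same_value)      # True
print(as_bytes.hex())  # 5c454320
print(repr(as_text))   # '\\EC '
```

All four bytes are printable ASCII (a backslash, "E", "C", and a space). An address that decodes to text can sometimes indicate that a pointer was overwritten with string data, but that is only a possibility to investigate.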

This is a testing machine that is only being used to evaluate 7.0 for use on similar hardware. I can take whatever debugging steps are needed; just let me know what information is necessary to help resolve the issue.

I tried posting this information to the -STABLE list, but received no replies.

System is running with the most current BIOS available from the OEM. RAM tested OK with memtest86+ left running for a day or so.
>How-To-Repeat:
Attempt to boot with a RELENG_7 world/kernel on a SuperMicro SuperServer 6022L-6 with ACPI enabled.

Alternatively, boot to single user mode and issue "sysctl -a". It crashes every time in the exact same place.
>Fix:
Workaround is to run with ACPI disabled, but that is not desired.

One part of the crash was possibly introduced with rev v1.658.2.1 of src/sys/i386/i386/machdep.c, but backing out that change does not stop the crash on recent RELENG_7 sources.

>Release-Note:
>Audit-Trail:
>Unformatted:


