Date:      Sun, 23 Aug 2009 23:48:28 GMT
From:      Terrence Koeman <root@mediamonks.net>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   i386/138117: spin lock held too long
Message-ID:  <200908232348.n7NNmSc1033699@www.freebsd.org>
Resent-Message-ID: <200908232350.n7NNo13E024802@freefall.freebsd.org>


>Number:         138117
>Category:       i386
>Synopsis:       spin lock held too long
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-i386
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Aug 23 23:50:01 UTC 2009
>Closed-Date:
>Last-Modified:
>Originator:     Terrence Koeman
>Release:        7.2-STABLE
>Organization:
>Environment:
FreeBSD persephone 7.2-STABLE FreeBSD 7.2-STABLE #15: Mon Aug 24 01:00:54 CEST 2009     terrence@persephone:/usr/obj/usr/src/sys/PERSEPHONE-SMP-PAE  i386
>Description:
When rebooting, the kernel panics with 'spin lock held too long' (full text below).

This happens on every reboot, not intermittently as I've seen it reported by others.

I tried the following:

-Removed PAE (reverted to GENERIC); this reduced the panics from every reboot to about 50% of reboots.
-Changed SCHED_ULE to SCHED_4BSD; this did not change anything.
-Added KDB_UNATTENDED, but the server hangs on the panic.

The server also hangs on boot: after the 10s countdown it stalls for about 2-3 minutes before either booting normally or (in about 1 out of 10 boots) hanging indefinitely (apparently; I've let it sit for hours).

I did notice that when the server is hung during boot, I can sometimes make it continue by pressing and holding Return on the (USB) keyboard. I've tried other keys, but only Return sometimes works. It's not a coincidence: I can let it sit for 10 minutes or an hour, and when I then hold Return it continues to boot within 3-5 seconds.

Apart from booting, the server is rock-solid. I had it running overnight with a concurrent 'make -j 20 buildkernel && make -j 20 buildworld', 'portupgrade -afr' and '/usr/libexec/locate.updatedb' in a continuous loop, while running apachebench with 1500 connections against it from another machine on the LAN.

Aside from an ACPI warning at boot (see the dmesg below), there are no errors whatsoever, no dumps, and no way to get into the debugger.

When ACPI is disabled the server never boots: it hangs at the same point after the 10s countdown, but never continues and does not respond to Return.

I'd be happy to get more info, but with no debugger or dump I don't know where to start :) Please advise.

Panic:

---
Rebooting...
cpu_reset: Stopping other CPUs
spin lock 0xc05a8c00 (sched lock 0) held by 0xd3087000 (tid 100011) too long
panic: spin lock held too long
cpuid = 0
---
(In about half of the panics, the 5 lines of text above the panic message are garbled with what looks like random printable ASCII data.)
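
For anyone comparing this panic with other reports, here is a minimal sketch of pulling the fields out of the 'spin lock ... held by ... too long' line. It assumes the line has exactly the layout shown above; the function name and the field names (lock_addr, lock_name, thread_addr, tid) are made up for illustration and are not kernel identifiers.

```python
import re

# Pattern for the "spin lock ... held by ... too long" console line.
# The group names are illustrative labels, not kernel identifiers.
SPIN_RE = re.compile(
    r"spin lock (?P<lock_addr>0x[0-9a-f]+) \((?P<lock_name>[^)]*)\) "
    r"held by (?P<thread_addr>0x[0-9a-f]+) \(tid (?P<tid>\d+)\) too long"
)


def parse_spin_lock_line(line):
    """Return the fields of a 'spin lock held too long' line, or None."""
    m = SPIN_RE.search(line)
    if m is None:
        return None
    info = m.groupdict()
    info["tid"] = int(info["tid"])  # thread id as an integer
    return info


# The line from the panic output in this report.
line = ("spin lock 0xc05a8c00 (sched lock 0) held by 0xd3087000 "
        "(tid 100011) too long")
info = parse_spin_lock_line(line)
print(info)
```

The extracted tid is the value one would look for in a thread listing or in the debugger, should either become reachable; on the garbled panics the pattern will simply not match and the function returns None.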

dmesg:
(Server is an IBM xSeries 445 with 4x 3 GHz Xeon and 12 GB memory.)

---
Copyright (c) 1992-2009 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.2-STABLE #15: Mon Aug 24 01:00:54 CEST 2009
    terrence@persephone:/usr/obj/usr/src/sys/PERSEPHONE-SMP-PAE
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 3.00GHz (2993.89-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf29  Stepping = 9
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x4400<CNXT-ID,xTPR>
  Logical CPUs per core: 2
real memory  = 13421772800 (12800 MB)
avail memory = 12619841536 (12035 MB)
ACPI APIC Table: <IBM    SERVIGIL>
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP/HT): APIC ID:  1
 cpu2 (AP): APIC ID: 18
 cpu3 (AP/HT): APIC ID: 19
 cpu4 (AP): APIC ID: 32
 cpu5 (AP/HT): APIC ID: 33
 cpu6 (AP): APIC ID: 50
 cpu7 (AP/HT): APIC ID: 51
ACPI Warning (tbfadt-0505): Optional field "Gpe1Block" has zero address or length:        0       0/4 [20070320]
MADT: Forcing active-low polarity and level trigger for SCI
ioapic1 <Version 1.1> irqs 44-87 on motherboard
ioapic0 <Version 1.1> irqs 0-43 on motherboard
kbd1 at kbdmux0
acpi0: <IBM SERVIGIL> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 400, 100 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x508-0x50b on acpi0
pcib0: <ACPI Host-PCI bridge> on acpi0
pci0: <ACPI PCI bus> on pcib0
vgapci0: <VGA-compatible display> port 0x1800-0x18ff mem 0xe0000000-0xe7ffffff,0xf0a20000-0xf0a2ffff irq 16 at device 4.0 on pci0
isab0: <PCI-ISA bridge> at device 5.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <VIA 82C686B UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x700-0x70f at device 5.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
uhci0: <VIA 83C572 USB controller> port 0x1900-0x191f irq 18 at device 5.2 on pci0
uhci0: [GIANT-LOCKED]
uhci0: [ITHREAD]
usb0: <VIA 83C572 USB controller> on uhci0
usb0: USB revision 1.0
uhub0: <VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
uhub0: 2 ports with 2 removable, self powered
uhci1: <VIA 83C572 USB controller> port 0x1920-0x193f irq 18 at device 5.3 on pci0
uhci1: [GIANT-LOCKED]
uhci1: [ITHREAD]
usb1: <VIA 83C572 USB controller> on uhci1
usb1: USB revision 1.0
uhub1: <VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb1
uhub1: 2 ports with 2 removable, self powered
pci0: <serial bus, SMBus> at device 5.4 (no driver attached)
pcib1: <ACPI Host-PCI bridge> on acpi0
pci1: <ACPI PCI bus> on pcib1
bge0: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x2002> mem 0xf0b00000-0xf0b0ffff,0xf0b10000-0xf0b1ffff irq 42 at device 4.0 on pci1
miibus0: <MII bus> on bge0
brgphy0: <BCM5704 10/100/1000baseTX PHY> PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bge0: Ethernet address: 00:09:6b:e6:39:7f
bge0: [ITHREAD]
bge1: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x2002> mem 0xf0b20000-0xf0b2ffff,0xf0b30000-0xf0b3ffff irq 11 at device 4.1 on pci1
miibus1: <MII bus> on bge1
brgphy1: <BCM5704 10/100/1000baseTX PHY> PHY 1 on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bge1: Ethernet address: 00:09:6b:16:39:7f
bge1: [ITHREAD]
pcib2: <ACPI Host-PCI bridge> on acpi0
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI Host-PCI bridge> on acpi0
pci5: <ACPI PCI bus> on pcib3
pcib4: <ACPI Host-PCI bridge> on acpi0
pci7: <ACPI PCI bus> on pcib4
ips0: <IBM ServeRAID Adapter> mem 0xf9c00000-0xf9c01fff irq 60 at device 3.0 on pci7
ips0: [ITHREAD]
pcib5: <ACPI Host-PCI bridge> on acpi0
pci9: <ACPI PCI bus> on pcib5
fdc0: <floppy drive controller> port 0x3f0-0x3f5 irq 6 drq 2 on acpi0
fdc0: [FILTER]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
cpu0: <ACPI CPU> on acpi0
p4tcc0: <CPU Frequency Thermal Control> on cpu0
cpu1: <ACPI CPU> on acpi0
p4tcc1: <CPU Frequency Thermal Control> on cpu1
cpu2: <ACPI CPU> on acpi0
p4tcc2: <CPU Frequency Thermal Control> on cpu2
cpu3: <ACPI CPU> on acpi0
p4tcc3: <CPU Frequency Thermal Control> on cpu3
cpu4: <ACPI CPU> on acpi0
p4tcc4: <CPU Frequency Thermal Control> on cpu4
cpu5: <ACPI CPU> on acpi0
p4tcc5: <CPU Frequency Thermal Control> on cpu5
cpu6: <ACPI CPU> on acpi0
p4tcc6: <CPU Frequency Thermal Control> on cpu6
cpu7: <ACPI CPU> on acpi0
p4tcc7: <CPU Frequency Thermal Control> on cpu7
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xcafff,0xcb000-0xce7ff pnpid ORM0000 on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ukbd0: <Microsoft Natural\M-. Ergonomic Keyboard 4000, class 0/0, rev 2.00/1.73, addr 2> on uhub1
kbd2 at ukbd0
Timecounters tick every 1.000 msec
ipfw2 (+ipv6) initialized, divert loadable, nat loadable, rule-based forwarding disabled, default to accept, logging limited to 2000 packets/entry by default
acd0: DVDROM <MATSHITADVD-ROM SR-8177/NB21> at ata0-master UDMA33
ips0: adapter type: ServeRAID 4Lx (neo lite)
ips0: logical drives: 1
ips0: Logical Drive 0: RAID1 sectors: 286748672, state OK
ipsd0: <Logical Drive> on ips0
ipsd0: Logical Drive  (140014MB)
SMP: AP CPU #7 Launched!
SMP: AP CPU #6 Launched!
SMP: AP CPU #5 Launched!
SMP: AP CPU #4 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #2 Launched!
Trying to mount root from ufs:/dev/ipsd0s1a
bge0: link state changed to UP
---

>How-To-Repeat:
reboot
>Fix:


>Release-Note:
>Audit-Trail:
>Unformatted:


