Date:      Mon, 17 Dec 2007 10:49:50 +0000 (GMT)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Elliot Finley <efinleywork@efinley.com>
Cc:        freebsd-stable@freebsd.org, User Questions <freebsd-questions@freebsd.org>
Subject:   Re: OS bug in taskq
Message-ID:  <20071217103625.S90185@fledge.watson.org>
In-Reply-To: <m4c8m3lpt31svrvlb5jcj04nsnhffv1e6r@4ax.com>
References:  <m4c8m3lpt31svrvlb5jcj04nsnhffv1e6r@4ax.com>


On Sat, 15 Dec 2007, Elliot Finley wrote:

> in the kernel and I'm still unable to obtain a crash dump.  Hopefully there 
> is enough info in this email for a hacker to point me in the right direction 
> to debug this.

If you're unable to obtain a crash dump, you should still be able to use 
interactive console-based debugging with DDB.  I find this is easiest to do 
with a serial console from an adjacent machine, so that I can copy-and-paste 
the results into an e-mail rather than hand-transcribe.  You can also use 
firewire consoles to the same effect, although I've never done that.

Once the system panics, it will drop into DDB.  I usually start debugging by 
getting a backtrace ("bt") and then showing the status of the current processor 
and of all processors ("show pcpu", "show allpcpu").  Depending on the type of 
bug, I also find the output of "ps", "alltrace", "show lockedvnods", "show 
alllocks", "show uma" and "show malloc" quite useful.  The panic below is a 
NULL pointer dereference in the taskqueue code, but it's likely triggered by a 
bug in a consumer of the task queue service rather than in the task queue code 
itself, so we'll need to identify which consumer that is.  That information 
should be visible in the arguments shown in the DDB stack trace.  If not, we 
may need to work a little harder to get a dump, or set up serial or firewire 
kgdb to inspect the live running system with a full debugger.

On the swap/dump question: in order to capture a saved kernel dump, you need 
sufficient room for the full dump on whatever partition /var/crash is on, and 
it must be writable.  Because dumps are normally written to swap partitions, 
running fsck before the dump is captured can lead to portions of the dump being 
overwritten if fsck uses a lot of memory (and hence overflows into swap).  As 
many systems have a separate /var, and /var is often small (so fscking it needs 
little memory), it could well be that you can successfully capture the dump by 
just booting to single-user, manually fscking /var, mounting /var, and running 
savecore in the /var/crash directory.  You can also configure additional 
partitions as purely dump partitions, rather than swap partitions.  One trick 
I've used previously is to add a disk temporarily just for the purposes of 
dumping to, and manually doing a dumpon for a partition on that disk (but not 
a swapon).
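The two preconditions above (enough free space for the whole dump, and a 
writable /var/crash) can be checked ahead of time on the running system.  A 
minimal Python sketch, assuming a full (non-mini) dump is roughly the size of 
physical memory; the function name can_hold_dump is hypothetical and not a 
FreeBSD tool.

```python
import os
import shutil


def can_hold_dump(crash_dir: str, dump_bytes: int) -> bool:
    """Return True if crash_dir is a writable directory whose filesystem
    has at least dump_bytes of free space.

    Assumption: a full kernel dump is roughly the size of physical memory.
    """
    # os.access() also reports False for W_OK on a read-only mount (EROFS),
    # e.g. when /var is still mounted read-only in single-user mode.
    if not os.path.isdir(crash_dir) or not os.access(crash_dir, os.W_OK):
        return False
    return shutil.disk_usage(crash_dir).free >= dump_bytes


if __name__ == "__main__":
    # 3220963328 bytes is the "real memory" figure in the dmesg quoted below.
    print(can_hold_dump("/var/crash", 3220963328))
```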

Robert N M Watson
Computer Laboratory
University of Cambridge

>
> dmesg:
>
> Copyright (c) 1992-2007 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>        The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 6.2-RELEASE-p5 #1: Mon Nov 19 11:16:44 MST 2007
>    root@postmaster.etv.net:/usr/obj/usr/src/sys/DDB-SMP
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2793.20-MHz 686-class CPU)
>  Origin = "GenuineIntel"  Id = 0xf4a  Stepping = 10
>  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
>  Features2=0x641d<SSE3,RSVD2,MON,DS_CPL,CNTX-ID,CX16,<b14>>
>  AMD Features=0x20100000<NX,LM>
>  AMD Features2=0x1<LAHF>
>  Logical CPUs per core: 2
> real memory  = 3220963328 (3071 MB)
> avail memory = 3150856192 (3004 MB)
> ACPI APIC Table: <DELL   PE BKC  >
> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
> cpu0 (BSP): APIC ID:  0
> cpu1 (AP): APIC ID:  1
> cpu2 (AP): APIC ID:  6
> cpu3 (AP): APIC ID:  7
> ioapic0: Changing APIC ID to 8
> ioapic1: Changing APIC ID to 9
> ioapic1: WARNING: intbase 32 != expected base 24
> ioapic2: Changing APIC ID to 10
> ioapic2: WARNING: intbase 64 != expected base 56
> ioapic0 <Version 2.0> irqs 0-23 on motherboard
> ioapic1 <Version 2.0> irqs 32-55 on motherboard
> ioapic2 <Version 2.0> irqs 64-87 on motherboard
> kbd1 at kbdmux0
> ath_hal: 0.9.17.2 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
> acpi0: <DELL PE BKC> on motherboard
> acpi0: Power Button (fixed)
> Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
> cpu0: <ACPI CPU> on acpi0
> cpu1: <ACPI CPU> on acpi0
> cpu2: <ACPI CPU> on acpi0
> cpu3: <ACPI CPU> on acpi0
> pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
> pci0: <ACPI PCI bus> on pcib0
> pcib1: <ACPI PCI-PCI bridge> at device 2.0 on pci0
> pci1: <ACPI PCI bus> on pcib1
> pcib2: <ACPI PCI-PCI bridge> at device 0.0 on pci1
> pci2: <ACPI PCI bus> on pcib2
> amr0: <LSILogic MegaRAID 1.53> mem 0xd80f0000-0xd80fffff,0xdfdc0000-0xdfdfffff irq 46 at device 14.0 on pci2
> amr0: delete logical drives supported by controller
> amr0: <LSILogic PERC 4e/Di> Firmware 522A, BIOS H430, 256MB RAM
> pcib3: <ACPI PCI-PCI bridge> at device 0.2 on pci1
> pci3: <ACPI PCI bus> on pcib3
> pcib4: <ACPI PCI-PCI bridge> at device 4.0 on pci0
> pci4: <ACPI PCI bus> on pcib4
> pcib5: <ACPI PCI-PCI bridge> at device 5.0 on pci0
> pci5: <ACPI PCI bus> on pcib5
> pcib6: <ACPI PCI-PCI bridge> at device 0.0 on pci5
> pci6: <ACPI PCI bus> on pcib6
> em0: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port 0xecc0-0xecff mem 0xdfae0000-0xdfafffff irq 64 at device 7.0 on pci6
> em0: Ethernet address: 00:18:8b:34:70:50
> pcib7: <ACPI PCI-PCI bridge> at device 0.2 on pci5
> pci7: <ACPI PCI bus> on pcib7
> em1: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port 0xdcc0-0xdcff mem 0xdf8e0000-0xdf8fffff irq 65 at device 8.0 on pci7
> em1: Ethernet address: 00:18:8b:34:70:51
> pcib8: <ACPI PCI-PCI bridge> at device 6.0 on pci0
> pci8: <ACPI PCI bus> on pcib8
> uhci0: <Intel 82801EB (ICH5) USB controller USB-A> port 0xbce0-0xbcff irq 16 at device 29.0 on pci0
> uhci0: [GIANT-LOCKED]
> usb0: <Intel 82801EB (ICH5) USB controller USB-A> on uhci0
> usb0: USB revision 1.0
> uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub0: 2 ports with 2 removable, self powered
> uhci1: <Intel 82801EB (ICH5) USB controller USB-B> port 0xbcc0-0xbcdf irq 19 at device 29.1 on pci0
> uhci1: [GIANT-LOCKED]
> usb1: <Intel 82801EB (ICH5) USB controller USB-B> on uhci1
> usb1: USB revision 1.0
> uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub1: 2 ports with 2 removable, self powered
> uhci2: <Intel 82801EB (ICH5) USB controller USB-C> port 0xbca0-0xbcbf irq 18 at device 29.2 on pci0
> uhci2: [GIANT-LOCKED]
> usb2: <Intel 82801EB (ICH5) USB controller USB-C> on uhci2
> usb2: USB revision 1.0
> uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
> uhub2: 2 ports with 2 removable, self powered
> ehci0: <Intel 82801EB/R (ICH5) USB 2.0 controller> mem 0xdff00000-0xdff003ff irq 23 at device 29.7 on pci0
> ehci0: [GIANT-LOCKED]
> usb3: EHCI version 1.0
> usb3: companion controllers, 2 ports each: usb0 usb1 usb2
> usb3: <Intel 82801EB/R (ICH5) USB 2.0 controller> on ehci0
> usb3: USB revision 2.0
> uhub3: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
> uhub3: 6 ports with 6 removable, self powered
> uhub4: vendor 0x413c product 0xa001, class 9/0, rev 2.00/0.00, addr 2
> uhub4: multiple transaction translators
> uhub4: 2 ports with 2 removable, self powered
> pcib9: <ACPI PCI-PCI bridge> at device 30.0 on pci0
> pci9: <ACPI PCI bus> on pcib9
> pci9: <unknown> at device 5.0 (no driver attached)
> pci9: <unknown> at device 5.1 (no driver attached)
> pci9: <unknown> at device 5.2 (no driver attached)
> atapci0: <SiI 0680 UDMA133 controller> port 0xccf0-0xccf7,0xcce4-0xcce7,0xccd8-0xccdf,0xccd0-0xccd3,0xcc70-0xcc7f mem 0xdf5fec00-0xdf5fecff irq 23 at device 6.0 on pci9
> ata2: <ATA channel 0> on atapci0
> ata3: <ATA channel 1> on atapci0
> pci9: <display, VGA> at device 13.0 (no driver attached)
> isab0: <PCI-ISA bridge> at device 31.0 on pci0
> isa0: <ISA bus> on isab0
> atapci1: <Intel ICH5 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfc00-0xfc0f at device 31.1 on pci0
> ata0: <ATA channel 0> on atapci1
> ata1: <ATA channel 1> on atapci1
> fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
> fdc0: [FAST]
> fd0: <1440-KB 3.5" drive> on fdc0 drive 0
> atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
> atkbd0: <AT Keyboard> irq 1 on atkbdc0
> kbd0 at atkbd0
> atkbd0: [GIANT-LOCKED]
> psm0: <PS/2 Mouse> irq 12 on atkbdc0
> psm0: [GIANT-LOCKED]
> psm0: model IntelliMouse, device ID 3
> sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
> sio0: type 16550A
> pmtimer0 on isa0
> orm0: <ISA Option ROMs> at iomem 0xc0000-0xcafff,0xcb000-0xcbfff,0xcc000-0xccfff,0xec000-0xeffff on isa0
> ppc0: parallel port not found.
> sc0: <System console> at flags 0x100 on isa0
> sc0: VGA <16 virtual consoles, flags=0x300>
> sio1: configured irq 3 not in bitmap of probed irqs 0
> sio1: port may not be enabled
> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> ukbd0: Dell DRAC4, rev 1.10/0.00, addr 2, iclass 3/1
> kbd2 at ukbd0
> ums0: Dell DRAC4, rev 1.10/0.00, addr 2, iclass 3/1
> ums0: X report 0x0002 not supported
> device_attach: ums0 attach returned 6
> Timecounters tick every 1.000 msec
> acd0: CDROM <TEAC CD-ROM CD-224E-N/3.AB> at ata0-master UDMA33
> device_attach: afd0 attach returned 6
> acd1: CDROM <VIRTUALCDROM DRIVE/> at ata2-slave PIO3
> amr0: delete logical drives supported by controller
> amrd0: <LSILogic MegaRAID logical drive> on amr0
> amrd0: 559600MB (1146060800 sectors) RAID 5 (optimal)
> SMP: AP CPU #2 Launched!
> SMP: AP CPU #1 Launched!
> SMP: AP CPU #3 Launched!
> Trying to mount root from ufs:/dev/amrd0s1a
> fire_saver: the console does not support M_VGA_CG320
> module_register_init: MOD_LOAD (fire_saver, 0xc8d50c10, 0) error 19
>
>


