From owner-freebsd-current@FreeBSD.ORG Tue May 22 12:21:16 2007
X-Original-To: freebsd-current@freebsd.org
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
    by hub.freebsd.org (Postfix) with ESMTP id 136B516A41F;
    Tue, 22 May 2007 12:21:16 +0000 (UTC)
    (envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
    by mx1.freebsd.org (Postfix) with ESMTP id 8D93D13C457;
    Tue, 22 May 2007 12:21:14 +0000 (UTC)
    (envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
    by cyrus.watson.org (Postfix) with ESMTP id 3D387474CD;
    Tue, 22 May 2007 08:21:13 -0400 (EDT)
Date: Tue, 22 May 2007 08:21:13 -0400 (EDT)
From: Robert Watson
X-X-Sender: robert@fledge.watson.org
To: "Steven G. Kargl"
In-Reply-To: <200705220015.l4M0F76e001014@troutmask.apl.washington.edu>
Message-ID: <20070522081827.V28780@fledge.watson.org>
References: <200705220015.l4M0F76e001014@troutmask.apl.washington.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-current@freebsd.org
Subject: Re: kernel panic in sbflush_internal
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
X-List-Received-Date: Tue, 22 May 2007 12:21:16 -0000

On Mon, 21 May 2007, Steven G. Kargl wrote:

> One of my colleagues brought down a node on my cluster while running a MPI
> job.
> The kernel coredump shows
>
> Script started on Mon May 21 17:02:53 2007
> node12:root[201] kgdb kernel.debug vmcore.0
> [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
>
> Unread portion of the kernel message buffer:
> panic: sbflush_internal: cc 4294965848 || mb 0 || mbcnt 0
> cpuid = 0
> Uptime: 7h6m34s
> Physical memory: 16119 MB
> Dumping 631 MB: 616 600 584 568 552 536 520 504 488 472 456 440 424 408 392 376 360 344 328 312 296 280 264 248 232 216 200 184 168 152 136 120 104 88 72 56 40 24 8

Does the kernel build date accurately reflect the version of the source code
in use?  Could you let me know which file revisions are in use for
uipc_socket.c, uipc_sockbuf.c, uipc_syscalls.c, tcp_usrreq.c, tcp_input.c,
and tcp_subr.c?

Could you print *sb in frame #4, *so in frame #7, *tp in frame #5, and *inp
in frame #5 (if defined) -- otherwise, (struct inpcb *)so->so_pcb, if
non-NULL, in frame #6.

Robert N M Watson
Computer Laboratory
University of Cambridge

>
> #0  doadump () at pcpu.h:171
> 171     pcpu.h: No such file or directory.
>         in pcpu.h
> (kgdb) bt
> #0  doadump () at pcpu.h:171
> #1  0xffffffff802a01eb in boot (howto=260)
>     at /usr/src/sys/kern/kern_shutdown.c:409
> #2  0xffffffff802a08cc in panic (fmt=0xffffff03157e0d20 "")
>     at /usr/src/sys/kern/kern_shutdown.c:563
> #3  0xffffffff802f4d23 in sbflush_internal (sb=0xffffff031243ab68)
>     at /usr/src/sys/kern/uipc_sockbuf.c:808
> #4  0xffffffff802f50cb in sbflush (sb=0xffffff031243ab68)
>     at /usr/src/sys/kern/uipc_sockbuf.c:825
> #5  0xffffffff803b7246 in tcp_disconnect (tp=0xffffff03101f73e0)
>     at /usr/src/sys/netinet/tcp_usrreq.c:1496
> #6  0xffffffff803b7539 in tcp_usr_disconnect (so=0xffffff0311a04690)
>     at /usr/src/sys/netinet/tcp_usrreq.c:584
> #7  0xffffffff802f67f2 in soclose (so=0xffffff031243aae0)
>     at /usr/src/sys/kern/uipc_socket.c:642
> #8  0xffffffff802de133 in soo_close (fp=0xffffff0312402258, td=0x0)
>     at /usr/src/sys/kern/sys_socket.c:296
> #9  0xffffffff8027479f in fdrop (fp=0xffffff0312402258, td=0xffffff03157e0d20)
>     at file.h:297
> #10 0xffffffff80274aaf in closef (fp=0xffffff0312402258, td=0xffffff03157e0d20)
>     at /usr/src/sys/kern/kern_descrip.c:1928
> #11 0xffffffff80275f54 in fdfree (td=0xffffff03157e0d20)
>     at /usr/src/sys/kern/kern_descrip.c:1638
> #12 0xffffffff8027f537 in exit1 (td=0xffffff03157e0d20, rv=9)
>     at /usr/src/sys/kern/kern_exit.c:271
> #13 0xffffffff802a578f in sigexit (td=0xffffff03157e0d20, sig=9)
>     at /usr/src/sys/kern/kern_sig.c:2862
> #14 0xffffffff802a63ac in postsig (sig=9) at /usr/src/sys/kern/kern_sig.c:2741
> #15 0xffffffff802d3547 in ast (framep=0xffffffffb0580c70)
>     at /usr/src/sys/kern/subr_trap.c:271
> #16 0xffffffff804787f0 in Xfast_syscall ()
> ---Type <return> to continue, or q <return> to quit---
>     at /usr/src/sys/amd64/amd64/exception.S:283
> #17 0x00000003c0c7294c in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> (kgdb) quit
>
> I have the debug kernel and vmcore file, and can make it available.
>
> The dmesg for the node that panic is
>
> Copyright (c) 1992-2007 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>         The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 7.0-CURRENT #6: Fri May 18 10:19:43 PDT 2007
>     kargl@node10.cimu.org:/usr/obj/usr/src/sys/HPC
> ACPI APIC Table:
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: Dual Core AMD Opteron(tm) Processor 280 (2391.55-MHz K8-class CPU)
>   Origin = "AuthenticAMD"  Id = 0x20f12  Stepping = 2
>   Features=0x178bfbff
>   Features2=0x1
>   AMD Features=0xe2500800
>   AMD Features2=0x3
>   Cores per package: 2
> usable memory = 16902705152 (16119 MB)
> avail memory  = 16387166208 (15628 MB)
> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
>  cpu0 (BSP): APIC ID:  0
>  cpu1 (AP): APIC ID:  1
>  cpu2 (AP): APIC ID:  2
>  cpu3 (AP): APIC ID:  3
> MADT: Forcing active-low polarity and level trigger for SCI
> ioapic0 irqs 0-23 on motherboard
> ioapic1 irqs 24-27 on motherboard
> ioapic2 irqs 28-31 on motherboard
> acpi0: on motherboard
> acpi0: [ITHREAD]
> acpi_hpet0: iomem 0xfec01000-0xfec013ff on acpi0
> Timecounter "HPET" frequency 14318180 Hz quality 2000
> acpi0: Power Button (fixed)
> acpi0: reservation of 0, a0000 (3) failed
> acpi0: reservation of 100000, eff00000 (3) failed
> Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
> cpu0: on acpi0
> acpi_throttle0: on cpu0
> cpu1: on acpi0
> cpu2: on acpi0
> cpu3: on acpi0
> pcib0: port 0xcf8-0xcff on acpi0
> pci0: on pcib0
> pcib1: at device 6.0 on pci0
> pci3: on pcib1
> ohci0: mem 0xfeafc000-0xfeafcfff irq 19 at device 0.0 on pci3
> ohci0: [GIANT-LOCKED]
> ohci0: [ITHREAD]
> usb0: OHCI version 1.0, legacy support
> usb0: SMM does not respond, resetting
> usb0: on ohci0
> usb0: USB revision 1.0
> uhub0: on usb0
> device_attach: uhub0 attach returned 6
> usb0: port 0, set config at addr 1 failed
> usb0: root hub problem, error=4
> ohci1: mem 0xfeafd000-0xfeafdfff irq 19 at device 0.1 on pci3
> ohci1: [GIANT-LOCKED]
> ohci1: [ITHREAD]
> usb1: OHCI version 1.0, legacy support
> usb1: SMM does not respond, resetting
> usb1: on ohci1
> usb1: USB revision 1.0
> uhub1: on usb1
> uhub1: 3 ports with 3 removable, self powered
> atapci0: port 0xbc00-0xbc07,0xb400-0xb403,0xb000-0xb007,0xac00-0xac03,0xa800-0xa80f mem 0xfeafec00-0xfeafefff irq 17 at device 5.0 on pci3
> atapci0: [ITHREAD]
> ata2: on atapci0
> ata2: [ITHREAD]
> ata3: on atapci0
> ata3: [ITHREAD]
> ata4: on atapci0
> ata4: [ITHREAD]
> ata5: on atapci0
> ata5: [ITHREAD]
> vgapci0: port 0xb800-0xb8ff mem 0xfd000000-0xfdffffff,0xfeaff000-0xfeafffff irq 18 at device 6.0 on pci3
> isab0: at device 7.0 on pci0
> isa0: on isab0
> atapci1: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 7.1 on pci0
> ata0: on atapci1
> ata0: [ITHREAD]
> ata1: on atapci1
> ata1: [ITHREAD]
> amdsmb0: port 0xcc00-0xcc1f irq 19 at device 7.2 on pci0
> smbus0: on amdsmb0
> smb0: on smbus0
> amdpm0: port 0x10e0-0x10ff at device 7.3 on pci0
> smbus1: on amdpm0
> smb1: on smbus1
> pcib2: at device 10.0 on pci0
> pci2: on pcib2
> pci2:9:0: bad VPD cksum, remain 72
> bge0: mem 0xfc8c0000-0xfc8cffff,0xfc8b0000-0xfc8bffff irq 24 at device 9.0 on pci2
> miibus0: on bge0
> brgphy0: PHY 1 on miibus0
> brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
> bge0: Ethernet address: 00:e0:81:34:e1:4c
> bge0: [ITHREAD]
> pci2:9:1: bad VPD cksum, remain 72
> bge1: mem 0xfc8f0000-0xfc8fffff,0xfc8e0000-0xfc8effff irq 25 at device 9.1 on pci2
> miibus1: on bge1
> brgphy1: PHY 1 on miibus1
> brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
> bge1: Ethernet address: 00:e0:81:34:e1:4d
> bge1: [ITHREAD]
> pcib3: at device 11.0 on pci0
> pci1: on pcib3
> acpi_button0: on acpi0
> atkbdc0: port 0x60,0x64 irq 1 on acpi0
> atkbd0: irq 1 on atkbdc0
> kbd0 at atkbd0
> atkbd0: [GIANT-LOCKED]
> atkbd0: [ITHREAD]
> sio0: configured irq 4 not in bitmap of probed irqs 0
> sio0: port may not be enabled
> sio0: configured irq 4 not in bitmap of probed irqs 0
> sio0: port may not be enabled
> sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
> sio0: type 16550A
> sio0: [FILTER]
> sio1: configured irq 3 not in bitmap of probed irqs 0
> sio1: port may not be enabled
> sio1: configured irq 3 not in bitmap of probed irqs 0
> sio1: port may not be enabled
> sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
> sio1: type 16550A
> sio1: [FILTER]
> fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
> fdc0: does not respond
> device_attach: fdc0 attach returned 6
> ppc0: port 0x378-0x37f irq 7 on acpi0
> ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
> ppbus0: on ppc0
> lpt0: on ppbus0
> lpt0: Interrupt-driven port
> ppi0: on ppbus0
> ppc0: [GIANT-LOCKED]
> ppc0: [ITHREAD]
> fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
> fdc0: does not respond
> device_attach: fdc0 attach returned 6
> orm0: at iomem 0xc0000-0xc7fff,0xc8000-0xcc7ff,0xcc800-0xcdfff,0xce000-0xcf7ff,0xcf800-0xd07ff on isa0
> sc0: at flags 0x100 on isa0
> sc0: VGA <8 virtual consoles, flags=0x300>
> vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> Timecounters tick every 1.000 msec
> ad4: 239372MB at ata2-master SATA150
> SMP: AP CPU #1 Launched!
> SMP: AP CPU #2 Launched!
> SMP: AP CPU #3 Launched!
> hwpmc: TSC/1/0x20 K8/4/0x1ff
> Trying to mount root from ufs:/dev/ad4s1a
> WARNING: / was not properly dismounted
>
> --
> Steve
> http://troutmask.apl.washington.edu/~kargl/
> _______________________________________________
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>
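
The cc value in the quoted panic message (4294965848) is easier to read as a
wrapped counter than as a real byte count. The sketch below is only an
illustration and assumes the counter is an unsigned 32-bit field that was
decremented past zero; that assumption is not confirmed anywhere in this
thread. The names as_signed32 and CC_FROM_PANIC are made up for the example.

```python
# Reinterpret the "cc" value from the panic message as a signed 32-bit integer.
# Assumption (not confirmed in the thread): the counter is an unsigned 32-bit
# field that wrapped around after being decremented below zero.

CC_FROM_PANIC = 4294965848  # value printed by "panic: sbflush_internal: cc ..."


def as_signed32(value: int) -> int:
    """Return the two's-complement signed reading of a 32-bit unsigned value."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    return value - (1 << 32) if value >= (1 << 31) else value


signed_cc = as_signed32(CC_FROM_PANIC)
print(f"cc as unsigned 32-bit: {CC_FROM_PANIC}")
print(f"cc as signed 32-bit:   {signed_cc}")
```

Read this way, the count is -1448, that is, 1448 bytes below zero. 1448 is
also the TCP payload size of a full segment on a 1500-byte MTU with the
timestamp option enabled (1500 - 20 - 20 - 12), so the value would be
consistent with the counter being off by one segment's worth of data. That
is an observation about the number only, not a diagnosis.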