From owner-freebsd-bugs@FreeBSD.ORG Fri Jan 14 06:10:23 2005 Return-Path: Delivered-To: freebsd-bugs@hub.freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9D7B916A4CE for ; Fri, 14 Jan 2005 06:10:23 +0000 (GMT) Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3C6F343D48 for ; Fri, 14 Jan 2005 06:10:23 +0000 (GMT) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.13.1/8.13.1) with ESMTP id j0E6ANu0006260 for ; Fri, 14 Jan 2005 06:10:23 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.13.1/8.13.1/Submit) id j0E6AN4h006259; Fri, 14 Jan 2005 06:10:23 GMT (envelope-from gnats) Resent-Date: Fri, 14 Jan 2005 06:10:23 GMT Resent-Message-Id: <200501140610.j0E6AN4h006259@freefall.freebsd.org> Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer) Resent-To: freebsd-bugs@FreeBSD.org Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Chris Jones Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9BF8D16A4CE for ; Fri, 14 Jan 2005 06:09:54 +0000 (GMT) Received: from www.freebsd.org (www.freebsd.org [216.136.204.117]) by mx1.FreeBSD.org (Postfix) with ESMTP id 3127843D4C for ; Fri, 14 Jan 2005 06:09:54 +0000 (GMT) (envelope-from nobody@FreeBSD.org) Received: from www.freebsd.org (localhost [127.0.0.1]) by www.freebsd.org (8.13.1/8.13.1) with ESMTP id j0E69rjf044251 for ; Fri, 14 Jan 2005 06:09:53 GMT (envelope-from nobody@www.freebsd.org) Received: (from nobody@localhost) by www.freebsd.org (8.13.1/8.13.1/Submit) id j0E69rHY044250; Fri, 14 Jan 2005 06:09:53 GMT (envelope-from nobody) Message-Id: <200501140609.j0E69rHY044250@www.freebsd.org> Date: Fri, 14 Jan 2005 06:09:53 GMT From: Chris Jones To: freebsd-gnats-submit@FreeBSD.org X-Send-Pr-Version: www-2.3 Subject: kern/76237: vinum panics on ufs_baddir, possible link to reviving stale subdisks X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 Jan 2005 06:10:23 -0000 >Number: 76237 >Category: kern >Synopsis: vinum panics on ufs_baddir, possible link to reviving stale subdisks >Confidential: no >Severity: non-critical >Priority: medium >Responsible: freebsd-bugs >State: open >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Fri Jan 14 06:10:22 GMT 2005 >Closed-Date: >Last-Modified: >Originator: Chris Jones >Release: FreeBSD RELENG_5_3 >Organization: Novus Ordo Systems >Environment: FreeBSD ishtar 5.3-RELEASE-p4 FreeBSD 5.3-RELEASE-p4 #1: Thu Jan 13 20:12:44 MST 2005 root@ishtar:/usr/obj/usr/src/sys/ISHTAR i386 >Description: I've got a FreeBSD box, ishtar, which has three SATA drives connected using Silicon Image 1312A cards, without RAID enabled. Most of the drives is devoted to vinum for future use as a file server. I have two RAID-5 volume with 16 GB on each drive, for a size of 32 GB per volume. ishtar is running the GENERIC kernel, with DDB, KDB, and BREAK_TO_DEBUGGER added. I've built & installed the new kernel, and rebuilt & installed world to ensure that everything's in sync. I was transferring files from my laptop to ishtar using rsync; one of the disks had a DMA error and I stopped the rsync transfer: Jan 13 21:00:30 ishtar kernel: ad6: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=38632958 Jan 13 21:00:31 ishtar kernel: ad6: FAILURE - WRITE_DMA timed out Jan 13 21:00:31 ishtar kernel: vinum: backups.p0.s2 is stale by force Jan 13 21:00:31 ishtar kernel: vinum: backups.p0 is degraded Jan 13 21:00:31 ishtar kernel: fatal :backups.p0.s2 write error, block 37060031 for 65536 bytes Jan 13 21:00:31 ishtar kernel: backups.p0.s2: user buffer block 7010656 for 65536 bytes I did a 'vinum start backups.p0.s2' to have vinum rebuild the parity information on that subdisk. The rebuild occurred uneventfully, and I rebooted ishtar to ensure that all was well: Jan 13 21:58:13 ishtar kernel: vinum: backups.p0.s2 is up by force Jan 13 21:58:13 ishtar kernel: vinum: backups.p0 is up Jan 13 21:58:13 ishtar kernel: vinum: backups.p0.s2 is up Jan 13 21:58:44 ishtar reboot: rebooted by root The system came up cleanly, at which point I started vinum ('vinum start'); vinum reported no errors ('vinum l'). I did not rebuild the parity data (that shouldn't be necessary when you have a stale disk that's been rebuilt, right?). I then fscked (no errors found) and mounted the volume, and restarted the rsync, which proceded uneventfully for a few minutes, until ishtar crashed with the following on the screen: /export/backups: bad dir ino 495250 at offset 7680: mangled entry panic: ufs_dirbad: bad dir KDB: enter: panic [thread 100084] Stopped at kdb_enter+0x2b: nop db> where kdb_enter(c0824119) at kdb_enter+0x2b panic(c08353ce,cbec8e00,dbd1bb08,c07550d2,c2454a64) at panic+0xbb ufs_dirbad(c2454a64,1e00,c0835388,c1bd7640,0) at ufs_dirbad+0x3d ufs_lookup(d8d1bb38,d8d1bb74,c0659f5a,d8d1bb38,d8d1bb50) at ufs_lookup+0x3d2 ufs_nvoperate(d8d1bb38) at ufs_vnoperate+0x13 vfs_cache_lookup(d8d1bbac,d8d1bbc8,c065eb9b,d8d1bbac,c1bd7640) at vfs_cache_lookup+0xe2 ufs_vnoperate(d8d1bbac) at ufs_vnoperate+0x13 lookup(d8d1bc30,c1bd7640,c1d88ca8,c1e4bd00,c2772200) at lookup+0x2cf namei(d8d1bc30,bfbfe810,0,0,c1becc60) at namei+0x1c1 lstat(c1bd7640,d8d1bd14,2a2,296) at lstat+0x4a syscall(2f,2f,2f,bfbfed5c,bfbfed3c) at syscall+0x27b Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (190, FreeBSD ELF32, lstat), eip = 0x280fb837, esp = 0xbfbfdddc, ebp = 0xbfbfddf8 --- db> call doadump Dumping 495 MB .. [ dumps, halt box, reboot etc. ] I have the crash dump saved and can make it available if required. I can also reduce the size of the kernel memory for a more manageable file. This kernel panic appears to be reproducible, although it does not occur after a specific period of time or on a specific file transfer; the LBA of the WRITE_DMA timeout is not constant. In addition, I can reproduce PR kern/73757 with this system and provide a dump for it. dmesg follows: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD 5.3-RELEASE-p4 #1: Thu Jan 13 20:12:44 MST 2005 root@ishtar:/usr/obj/usr/src/sys/ISHTAR Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: AMD Athlon(tm) XP 2600+ (1908.95-MHz 686-class CPU) Origin = "AuthenticAMD" Id = 0x6a0 Stepping = 0 Features=0x383fbff AMD Features=0xc0400000 real memory = 520028160 (495 MB) avail memory = 499208192 (476 MB) ACPI APIC Table: ioapic0 irqs 0-23 on motherboard npx0: [FAST] npx0: on motherboard npx0: INT 16 interface acpi0: on motherboard acpi0: Power Button (fixed) Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0 cpu0: on acpi0 acpi_tz0: on acpi0 acpi_button0: on acpi0 acpi_button1: on acpi0 pcib0: port 0xcf8-0xcff on acpi0 pci0: on pcib0 agp0: mem 0xe0000000-0xe3ffffff at device 0.0 on pci0 pcib1: at device 1.0 on pci0 pci1: on pcib1 pci1: at device 0.0 (no driver attached) atapci0: port 0xb000-0xb00f,0xac00-0xac03,0xa800-0xa807,0xa400-0xa403,0xa000-0xa007 mem 0xeb000000-0xeb0001ff irq 16 at device 8.0 on pci0 ata2: channel #0 on atapci0 ata3: channel #1 on atapci0 atapci1: port 0xc400-0xc40f,0xc000-0xc003,0xbc00-0xbc07,0xb800-0xb803,0xb400-0xb407 mem 0xeb001000-0xeb0011ff irq 18 at device 10.0 on pci0 ata4: channel #0 on atapci1 ata5: channel #1 on atapci1 uhci0: port 0xc800-0xc81f irq 21 at device 16.0 on pci0 uhci0: [GIANT-LOCKED] usb0: on uhci0 usb0: USB revision 1.0 uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub0: 2 ports with 2 removable, self powered uhci1: port 0xcc00-0xcc1f irq 21 at device 16.1 on pci0 uhci1: [GIANT-LOCKED] usb1: on uhci1 usb1: USB revision 1.0 uhub1: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub1: 2 ports with 2 removable, self powered uhci2: port 0xd000-0xd01f irq 21 at device 16.2 on pci0 uhci2: [GIANT-LOCKED] usb2: on uhci2 usb2: USB revision 1.0 uhub2: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub2: 2 ports with 2 removable, self powered pci0: at device 16.3 (no driver attached) isab0: at device 17.0 on pci0 isa0: on isab0 atapci2: port 0xd400-0xd40f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 17.1 on pci0 ata0: channel #0 on atapci2 ata1: channel #1 on atapci2 pci0: at device 17.5 (no driver attached) vr0: port 0xe000-0xe0ff mem 0xeb003000-0xeb0030ff irq 23 at device 18.0 on pci0 miibus0: on vr0 ukphy0: on miibus0 ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto vr0: Ethernet address: 00:0e:a6:57:2e:de fdc0: port 0x3f7,0x3f2-0x3f5 irq 6 drq 2 on acpi0 fdc0: [FAST] sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 sio0: type 16550A ppc0: port 0x778-0x77b,0x378-0x37f irq 7 drq 3 on acpi0 ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode ppc0: FIFO with 16/16/9 bytes threshold ppbus0: on ppc0 plip0: on ppbus0 lpt0: on ppbus0 lpt0: Interrupt-driven port ppi0: on ppbus0 atkbdc0: port 0x64,0x60 irq 1 on acpi0 atkbd0: irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] orm0: at iomem 0xd8000-0xdc7ff on isa0 pmtimer0 on isa0 sc0: at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> sio1: configured irq 3 not in bitmap of probed irqs 0 sio1: port may not be enabled vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 Timecounter "TSC" frequency 1908948716 Hz quality 800 Timecounters tick every 10.000 msec acd0: CDRW <8X4X32/5.AX> at ata0-master PIO4 ad4: 114473MB [232581/16/63] at ata2-master SATA150 ad6: 38146MB [77504/16/63] at ata3-master SATA150 ad8: 114473MB [232581/16/63] at ata4-master SATA150 Mounting root from ufs:/dev/ad6s1a >How-To-Repeat: >Fix: >Release-Note: >Audit-Trail: >Unformatted: