Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 14 Jan 2005 06:09:53 GMT
From:      Chris Jones <cdjones@novusordo.net>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   kern/76237: vinum panics on ufs_baddir, possible link to reviving stale subdisks
Message-ID:  <200501140609.j0E69rHY044250@www.freebsd.org>
Resent-Message-ID: <200501140610.j0E6AN4h006259@freefall.freebsd.org>

next in thread | raw e-mail | index | archive | help

>Number:         76237
>Category:       kern
>Synopsis:       vinum panics on ufs_baddir, possible link to reviving stale subdisks
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Jan 14 06:10:22 GMT 2005
>Closed-Date:
>Last-Modified:
>Originator:     Chris Jones
>Release:        FreeBSD RELENG_5_3
>Organization:
Novus Ordo Systems
>Environment:
FreeBSD ishtar 5.3-RELEASE-p4 FreeBSD 5.3-RELEASE-p4 #1: Thu Jan 13 20:12:44 MST 2005     root@ishtar:/usr/obj/usr/src/sys/ISHTAR  i386
>Description:
I've got a FreeBSD box, ishtar, which has three SATA drives connected using Silicon Image 1312A cards, without RAID enabled.  Most of the drives is devoted to vinum for future use as a file server.  I have two RAID-5 volume with 16 GB on each drive, for a size of 32 GB per volume.   

ishtar is running the GENERIC kernel, with DDB, KDB, and BREAK_TO_DEBUGGER added.  I've built & installed the new kernel, and rebuilt & installed world to ensure that everything's in sync.

I was transferring files from my laptop to ishtar using rsync; one of the disks had a DMA error and I stopped the rsync transfer:

Jan 13 21:00:30 ishtar kernel: ad6: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=38632958
Jan 13 21:00:31 ishtar kernel: ad6: FAILURE - WRITE_DMA timed out
Jan 13 21:00:31 ishtar kernel: vinum: backups.p0.s2 is stale by force
Jan 13 21:00:31 ishtar kernel: vinum: backups.p0 is degraded
Jan 13 21:00:31 ishtar kernel: fatal :backups.p0.s2 write error, block 37060031 for 65536 bytes
Jan 13 21:00:31 ishtar kernel: backups.p0.s2: user buffer block 7010656 for 65536 bytes

I did a 'vinum start backups.p0.s2' to have vinum rebuild the parity information on that subdisk.  The rebuild occurred uneventfully, and I rebooted ishtar to ensure that all was well:

Jan 13 21:58:13 ishtar kernel: vinum: backups.p0.s2 is up by force
Jan 13 21:58:13 ishtar kernel: vinum: backups.p0 is up
Jan 13 21:58:13 ishtar kernel: vinum: backups.p0.s2 is up
Jan 13 21:58:44 ishtar reboot: rebooted by root

The system came up cleanly, at which point I started vinum ('vinum start'); vinum reported no errors ('vinum l').  I did not rebuild the parity data (that shouldn't be necessary when you have a stale disk that's been rebuilt, right?).  I then fscked (no errors found) and mounted the volume, and restarted the rsync, which proceded uneventfully for a few minutes, until ishtar crashed with the following on the screen:

/export/backups: bad dir ino 495250 at offset 7680: mangled entry
panic: ufs_dirbad: bad dir
KDB: enter: panic
[thread 100084]
Stopped at 	kdb_enter+0x2b: nop
db> where
kdb_enter(c0824119) at kdb_enter+0x2b
panic(c08353ce,cbec8e00,dbd1bb08,c07550d2,c2454a64) at panic+0xbb
ufs_dirbad(c2454a64,1e00,c0835388,c1bd7640,0) at ufs_dirbad+0x3d
ufs_lookup(d8d1bb38,d8d1bb74,c0659f5a,d8d1bb38,d8d1bb50) at ufs_lookup+0x3d2
ufs_nvoperate(d8d1bb38) at ufs_vnoperate+0x13
vfs_cache_lookup(d8d1bbac,d8d1bbc8,c065eb9b,d8d1bbac,c1bd7640) at vfs_cache_lookup+0xe2
ufs_vnoperate(d8d1bbac) at ufs_vnoperate+0x13
lookup(d8d1bc30,c1bd7640,c1d88ca8,c1e4bd00,c2772200) at lookup+0x2cf
namei(d8d1bc30,bfbfe810,0,0,c1becc60) at namei+0x1c1
lstat(c1bd7640,d8d1bd14,2a2,296) at lstat+0x4a
syscall(2f,2f,2f,bfbfed5c,bfbfed3c) at syscall+0x27b
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (190, FreeBSD ELF32, lstat), eip = 0x280fb837, esp = 0xbfbfdddc, ebp = 0xbfbfddf8 ---
db> call doadump
Dumping 495 MB
.. [ dumps, halt box, reboot etc. ]

I have the crash dump saved and can make it available if required.  I can also reduce the size of the kernel memory for a more manageable file.  This kernel panic appears to be reproducible, although it does not occur after a specific period of time or on a specific file transfer; the LBA of the WRITE_DMA timeout is not constant.

In addition, I can reproduce PR kern/73757 with this system and provide a dump for it.

dmesg follows:

Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 5.3-RELEASE-p4 #1: Thu Jan 13 20:12:44 MST 2005
    root@ishtar:/usr/obj/usr/src/sys/ISHTAR
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) XP 2600+ (1908.95-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0x6a0  Stepping = 0
  Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
  AMD Features=0xc0400000<AMIE,DSP,3DNow!>
real memory  = 520028160 (495 MB)
avail memory = 499208192 (476 MB)
ACPI APIC Table: <KM400A AWRDACPI>
ioapic0 <Version 0.3> irqs 0-23 on motherboard
npx0: [FAST]
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <KM400A AWRDACPI> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
cpu0: <ACPI CPU (3 Cx states)> on acpi0
acpi_tz0: <Thermal Zone> on acpi0
acpi_button0: <Power Button> on acpi0
acpi_button1: <Sleep Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <VIA Generic host to PCI bridge> mem 0xe0000000-0xe3ffffff at device 0.0 on pci0
pcib1: <PCI-PCI bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pci1: <display, VGA> at device 0.0 (no driver attached)
atapci0: <SiI 3112 SATA150 controller> port 0xb000-0xb00f,0xac00-0xac03,0xa800-0xa807,0xa400-0xa403,0xa000-0xa007 mem 0xeb000000-0xeb0001ff irq 16 at device 8.0 on pci0
ata2: channel #0 on atapci0
ata3: channel #1 on atapci0
atapci1: <SiI 3112 SATA150 controller> port 0xc400-0xc40f,0xc000-0xc003,0xbc00-0xbc07,0xb800-0xb803,0xb400-0xb407 mem 0xeb001000-0xeb0011ff irq 18 at device 10.0 on pci0
ata4: channel #0 on atapci1
ata5: channel #1 on atapci1
uhci0: <VIA 83C572 USB controller> port 0xc800-0xc81f irq 21 at device 16.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: <VIA 83C572 USB controller> on uhci0
usb0: USB revision 1.0
uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <VIA 83C572 USB controller> port 0xcc00-0xcc1f irq 21 at device 16.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: <VIA 83C572 USB controller> on uhci1
usb1: USB revision 1.0
uhub1: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: <VIA 83C572 USB controller> port 0xd000-0xd01f irq 21 at device 16.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: <VIA 83C572 USB controller> on uhci2
usb2: USB revision 1.0
uhub2: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
pci0: <serial bus, USB> at device 16.3 (no driver attached)
isab0: <PCI-ISA bridge> at device 17.0 on pci0
isa0: <ISA bus> on isab0
atapci2: <VIA 8235 UDMA133 controller> port 0xd400-0xd40f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 17.1 on pci0
ata0: channel #0 on atapci2
ata1: channel #1 on atapci2
pci0: <multimedia, audio> at device 17.5 (no driver attached)
vr0: <VIA VT6102 Rhine II 10/100BaseTX> port 0xe000-0xe0ff mem 0xeb003000-0xeb0030ff irq 23 at device 18.0 on pci0
miibus0: <MII bus> on vr0
ukphy0: <Generic IEEE 802.3u media interface> on miibus0
ukphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
vr0: Ethernet address: 00:0e:a6:57:2e:de
fdc0: <floppy drive controller> port 0x3f7,0x3f2-0x3f5 irq 6 drq 2 on acpi0
fdc0: [FAST]
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
ppc0: <ECP parallel printer port> port 0x778-0x77b,0x378-0x37f irq 7 drq 3 on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/9 bytes threshold
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
orm0: <ISA Option ROM> at iomem 0xd8000-0xdc7ff on isa0
pmtimer0 on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 1908948716 Hz quality 800
Timecounters tick every 10.000 msec
acd0: CDRW <8X4X32/5.AX> at ata0-master PIO4
ad4: 114473MB <ST3120026AS/3.05> [232581/16/63] at ata2-master SATA150
ad6: 38146MB <ST340014AS/8.05> [77504/16/63] at ata3-master SATA150
ad8: 114473MB <ST3120026AS/3.05> [232581/16/63] at ata4-master SATA150
Mounting root from ufs:/dev/ad6s1a
>How-To-Repeat:

>Fix:

>Release-Note:
>Audit-Trail:
>Unformatted:



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200501140609.j0E69rHY044250>