Date:      Tue, 17 Jul 2001 10:53:25 -0700 (PDT)
From:      Bill Moran <wmoran@iowna.com>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   i386/29045: Heavy disk usage causes panic in ffs_blkfree
Message-ID:  <200107171753.f6HHrPd52978@freefall.freebsd.org>


>Number:         29045
>Category:       i386
>Synopsis:       Heavy disk usage causes panic in ffs_blkfree
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Jul 17 11:00:00 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator:     Bill Moran
>Release:        4.3
>Organization:
Independent Consultant
>Environment:
FreeBSD backup.prioritydesigns.com 4.3-STABLE FreeBSD 4.3-STABLE #0: Thu Jun 21 11:14:06 EDT 2001     root@backup.prioritydesigns.com:/usr/obj/usr/src/sys/BACKUP  i386

>Description:
Under heavy disk load, the kernel panics in the filesystem code. Each panic appears to leave some subtle corruption in the filesystem, which in turn makes further panics more likely.
The following is a typical debug session after such a panic.

IdlePTD 3174400
initial pcb at 286640
panicstr: ffs_blkfree: freeing free block
panic messages:
---
panic: ffs_blkfree: freeing free block

syncing disks... 128 123 110 92 66 34 2 2 2 2 2 2 2 2 2 2 2 2 2 2 
giving up on 2 buffers
Uptime: 24d9h59m3s

dumping to dev #ad/0x20021, offset 380616
dump ata2: resetting devices .. done
127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 
---
#0  dumpsys () at /usr/src/sys/kern/kern_shutdown.c:469
469		if (dumping++) {
(kgdb) bt
#0  dumpsys () at /usr/src/sys/kern/kern_shutdown.c:469
#1  0xc0152017 in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:309
#2  0xc0152394 in poweroff_wait (junk=0xc0252400, howto=-1071307808)
    at /usr/src/sys/kern/kern_shutdown.c:556
#3  0xc01dada2 in ffs_blkfree (ip=0xc1b57000, bno=0, size=8192)
    at /usr/src/sys/ufs/ffs/ffs_alloc.c:1349
#4  0xc01dd06d in ffs_indirtrunc (ip=0xc1b57000, lbn=-12, dbn=115240768, 
    lastbn=-1, level=0, countp=0xcc441d88)
    at /usr/src/sys/ufs/ffs/ffs_inode.c:498
#5  0xc01dcbbd in ffs_truncate (vp=0xcc78fac0, length=0, flags=0, cred=0x0, 
    p=0xcb0eba00) at /usr/src/sys/ufs/ffs/ffs_inode.c:314
#6  0xc01e719e in ufs_inactive (ap=0xcc441eb4)
    at /usr/src/sys/ufs/ufs/ufs_inode.c:84
#7  0xc01ec3ed in ufs_vnoperate (ap=0xcc441eb4)
    at /usr/src/sys/ufs/ufs/ufs_vnops.c:2373
#8  0xc017df9a in vput (vp=0xcc78fac0) at vnode_if.h:815
#9  0xc01811d9 in unlink (p=0xcb0eba00, uap=0xcc441f80)
    at /usr/src/sys/kern/vfs_syscalls.c:1471
#10 0xc0227ee2 in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, 
      tf_edi = 1, tf_esi = 134721536, tf_ebp = -1077936872, 
      tf_isp = -867950636, tf_ebx = 134768128, tf_edx = 134768200, tf_ecx = 0, 
      tf_eax = 10, tf_trapno = 7, tf_err = 2, tf_eip = 134533564, tf_cs = 31, 
      tf_eflags = 643, tf_esp = -1077936916, tf_ss = 47})
    at /usr/src/sys/i386/i386/trap.c:1150
#11 0xc021cda5 in Xint0x80_syscall ()
#12 0x804832b in ?? ()
#13 0x8048135 in ?? ()

Backtraces vary slightly from panic to panic, but the panic message is always the same and the panic is always raised from ffs_blkfree().

The hardware involved is as follows:

Copyright (c) 1992-2001 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 4.3-STABLE #0: Thu Jun 21 11:14:06 EDT 2001
    root@backup.prioritydesigns.com:/usr/obj/usr/src/sys/BACKUP
Timecounter "i8254"  frequency 1193182 Hz
Timecounter "TSC"  frequency 756743963 Hz
CPU: AMD Athlon(tm) Processor (756.74-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0x642  Stepping = 2
  Features=0x183f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
  AMD Features=0xc0440000<<b18>,AMIE,DSP,3DNow!>
real memory  = 134135808 (130992K bytes)
avail memory = 127270912 (124288K bytes)
Preloaded elf kernel "kernel" at 0xc02e8000.
Preloaded userconfig_script "/boot/kernel.conf" at 0xc02e809c.
Pentium Pro MTRR support enabled
md0: Malloc disk
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Host to PCI bridge> on motherboard
pci0: <PCI bus> on pcib0
pcib2: <PCI to PCI bridge (vendor=1106 device=8305)> at device 1.0 on pci0
pci1: <PCI bus> on pcib2
pci1: <ATI Mach64-GM graphics accelerator> at 0.0 irq 11
isab0: <VIA 82C686 PCI-ISA bridge> at device 4.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <VIA 82C686 ATA100 controller> port 0xb800-0xb80f at device 4.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
pci0: <VIA 83C572 USB controller> at 4.3 irq 5
fxp0: <Intel Pro 10/100B/100+ Ethernet> port 0x9400-0x943f mem 0xe0800000-0xe08fffff,0xe1000000-0xe1000fff irq 5 at device 9.0 on pci0
fxp0: Ethernet address 00:d0:b7:46:10:e9
inphy0: <i82555 10/100 media interface> on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
ahc0: <Adaptec 2930CU SCSI adapter> port 0x9000-0x90ff mem 0xe0000000-0xe0000fff irq 7 at device 10.0 on pci0
aic7860: Single Channel A, SCSI Id=7, 3/255 SCBs
atapci1: <Promise ATA100 controller> port 0x7400-0x743f,0x7800-0x7803,0x8000-0x8007,0x8400-0x8403,0x8800-0x8807 mem 0xdf800000-0xdf81ffff irq 10 at device 17.0 on pci0
ata2: at 0x8800 on atapci1
ata3: at 0x8000 on atapci1
pcib1: <Host to PCI bridge> on motherboard
pci2: <PCI bus> on pcib1
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: model IntelliMouse, device ID 3
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 8250
sio1: configured irq 3 not in bitmap of probed irqs 0
ppc0: parallel port not found.
ad4: 73308MB <IBM-DTLA-307075> [148945/16/63] at ata2-master UDMA100
acd0: CDROM <ATAPI-CD ROM-DRIVE-50MAX> at ata0-slave using PIO4
Waiting 5 seconds for SCSI devices to settle
sa0 at ahc0 bus 0 target 0 lun 0
sa0: <SONY SDT-9000 0400> Removable Sequential Access SCSI-2 device 
sa0: 10.000MB/s transfers (10.000MHz, offset 15)
Mounting root from ufs:/dev/ad4s1a
WARNING: / was not properly dismounted

>How-To-Repeat:
The HDD is partitioned into /, /var, /usr, swap and /data. The /data partition is ~65G and is where the problem appears to occur. There is also an NFS mount to a 65G partition on another FreeBSD computer. This computer contains ~30G of data (fluctuates +/-3G per day). On Sunday at 3:00 AM, a script runs 'rm -r /data/*' and then uses cp to duplicate the remote drive onto the /data partition. This operation causes the described panic approximately once a month.
On the other days of the week, rsync is used to maintain file versions. rsync has caused a panic only once in the four months this machine has been operating (except as described below).
Once a panic has occurred, any type of heavy HDD usage will cause a repeat panic ~90% of the time. Running newfs on the /data partition reduces the frequency of the panics to the rates described above.
The enabling/disabling of softupdates doesn't seem to make any difference to the frequency of the panics.
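
The weekly job described above can be sketched roughly as follows. The actual script was not included in the report, so this is only an illustration under stated assumptions: the function name weekly_refresh, the cp flags (-Rp), the -f flag on rm, and the argument checks are all hypothetical. In the report, the destination would be /data and the source would be the NFS mount.

```shell
#!/bin/sh
# Hypothetical sketch of the Sunday job: wipe the destination, then copy the
# source tree back onto it. Nothing here is taken from the original script.

# weekly_refresh SRC_DIR DST_DIR
#   SRC_DIR: directory to copy from (the NFS mount in the report)
#   DST_DIR: directory to wipe and refill (/data in the report)
weekly_refresh() {
    src=$1
    dst=$2

    # Refuse to run with empty or missing directories, so that a typo can
    # never expand to "rm -r /*".
    if [ -z "$src" ] || [ -z "$dst" ] || [ ! -d "$src" ] || [ ! -d "$dst" ]; then
        echo "usage: weekly_refresh SRC_DIR DST_DIR" >&2
        return 1
    fi

    # Step 1: delete everything the glob matches under the destination
    # (the report's 'rm -r /data/*'). -f is an assumption added here so an
    # already-empty destination is not an error. Top-level dotfiles are not
    # matched by the glob and are left in place.
    rm -rf -- "$dst"/*

    # Step 2: copy the contents of the source tree, including dotfiles, into
    # the destination. -R recurses, -p preserves modes and timestamps.
    cp -Rp "$src"/. "$dst"/
}
```

With the function defined, the run described in the report would correspond to something like `weekly_refresh /mnt/remote /data`, where /mnt/remote is a placeholder for the NFS mount point, which the report does not name.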
>Fix:
I don't have any suggestions at this point. My best guess so far is that while dealing with fragmentation the code is somehow losing track of whether or not a block is free.
I have 3 crash dumps so far (I'm starting a collection). If you need any more information on this, please don't hesitate to contact me.
>Release-Note:
>Audit-Trail:
>Unformatted:




