From owner-freebsd-bugs Tue Jul 17 11: 0:12 2001
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21]) by hub.freebsd.org (Postfix) with ESMTP id F0F6937B40E for ; Tue, 17 Jul 2001 11:00:00 -0700 (PDT) (envelope-from gnats@FreeBSD.org)
Received: (from gnats@localhost) by freefall.freebsd.org (8.11.4/8.11.4) id f6HI00C53640; Tue, 17 Jul 2001 11:00:00 -0700 (PDT) (envelope-from gnats)
Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21]) by hub.freebsd.org (Postfix) with ESMTP id 12A8B37B401 for ; Tue, 17 Jul 2001 10:53:25 -0700 (PDT) (envelope-from nobody@FreeBSD.org)
Received: (from nobody@localhost) by freefall.freebsd.org (8.11.4/8.11.4) id f6HHrPd52978; Tue, 17 Jul 2001 10:53:25 -0700 (PDT) (envelope-from nobody)
Message-Id: <200107171753.f6HHrPd52978@freefall.freebsd.org>
Date: Tue, 17 Jul 2001 10:53:25 -0700 (PDT)
From: Bill Moran
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-1.0
Subject: i386/29045: Heavy disk usage causes panic in ffs_blkfree
Sender: owner-freebsd-bugs@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.org

>Number: 29045
>Category: i386
>Synopsis: Heavy disk usage causes panic in ffs_blkfree
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Tue Jul 17 11:00:00 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator: Bill Moran
>Release: 4.3
>Organization:
Independent Consultant
>Environment:
FreeBSD backup.prioritydesigns.com 4.3-STABLE FreeBSD 4.3-STABLE #0: Thu Jun 21 11:14:06 EDT 2001 root@backup.prioritydesigns.com:/usr/obj/usr/src/sys/BACKUP i386
>Description:
Under heavy load, panics occur in the filesystem code.
The panics apparently leave some sort of subtle corruption in the filesystem, which then increases the chance of further panics. The following is a typical debug session after such a panic.

IdlePTD 3174400
initial pcb at 286640
panicstr: ffs_blkfree: freeing free block
panic messages:
---
panic: ffs_blkfree: freeing free block

syncing disks... 128 123 110 92 66 34 2 2 2 2 2 2 2 2 2 2 2 2 2 2
giving up on 2 buffers
Uptime: 24d9h59m3s

dumping to dev #ad/0x20021, offset 380616
dump ata2: resetting devices .. done
127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
---
#0 dumpsys () at /usr/src/sys/kern/kern_shutdown.c:469
469 if (dumping++) {
(kgdb) bt
#0 dumpsys () at /usr/src/sys/kern/kern_shutdown.c:469
#1 0xc0152017 in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:309
#2 0xc0152394 in poweroff_wait (junk=0xc0252400, howto=-1071307808) at /usr/src/sys/kern/kern_shutdown.c:556
#3 0xc01dada2 in ffs_blkfree (ip=0xc1b57000, bno=0, size=8192) at /usr/src/sys/ufs/ffs/ffs_alloc.c:1349
#4 0xc01dd06d in ffs_indirtrunc (ip=0xc1b57000, lbn=-12, dbn=115240768, lastbn=-1, level=0, countp=0xcc441d88) at /usr/src/sys/ufs/ffs/ffs_inode.c:498
#5 0xc01dcbbd in ffs_truncate (vp=0xcc78fac0, length=0, flags=0, cred=0x0, p=0xcb0eba00) at /usr/src/sys/ufs/ffs/ffs_inode.c:314
#6 0xc01e719e in ufs_inactive (ap=0xcc441eb4) at /usr/src/sys/ufs/ufs/ufs_inode.c:84
#7 0xc01ec3ed in ufs_vnoperate (ap=0xcc441eb4) at /usr/src/sys/ufs/ufs/ufs_vnops.c:2373
#8 0xc017df9a in vput (vp=0xcc78fac0) at vnode_if.h:815
#9 0xc01811d9 in unlink (p=0xcb0eba00, uap=0xcc441f80) at /usr/src/sys/kern/vfs_syscalls.c:1471
#10 0xc0227ee2 in syscall2
(frame={tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 1, tf_esi = 134721536, tf_ebp = -1077936872, tf_isp = -867950636, tf_ebx = 134768128, tf_edx = 134768200, tf_ecx = 0, tf_eax = 10, tf_trapno = 7, tf_err = 2, tf_eip = 134533564, tf_cs = 31, tf_eflags = 643, tf_esp = -1077936916, tf_ss = 47})
at /usr/src/sys/i386/i386/trap.c:1150
#11 0xc021cda5 in Xint0x80_syscall ()
#12 0x804832b in ?? ()
#13 0x8048135 in ?? ()

Backtraces vary slightly from panic to panic, but the panic message is always the same and is always raised from ffs_blkfree().

The hardware involved is as follows:

Copyright (c) 1992-2001 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved.
FreeBSD 4.3-STABLE #0: Thu Jun 21 11:14:06 EDT 2001
root@backup.prioritydesigns.com:/usr/obj/usr/src/sys/BACKUP
Timecounter "i8254" frequency 1193182 Hz
Timecounter "TSC" frequency 756743963 Hz
CPU: AMD Athlon(tm) Processor (756.74-MHz 686-class CPU)
Origin = "AuthenticAMD" Id = 0x642 Stepping = 2
Features=0x183f9ff
AMD Features=0xc0440000<,AMIE,DSP,3DNow!>
real memory = 134135808 (130992K bytes)
avail memory = 127270912 (124288K bytes)
Preloaded elf kernel "kernel" at 0xc02e8000.
Preloaded userconfig_script "/boot/kernel.conf" at 0xc02e809c.
Pentium Pro MTRR support enabled
md0: Malloc disk
npx0: on motherboard
npx0: INT 16 interface
pcib0: on motherboard
pci0: on pcib0
pcib2: at device 1.0 on pci0
pci1: on pcib2
pci1: at 0.0 irq 11
isab0: at device 4.0 on pci0
isa0: on isab0
atapci0: port 0xb800-0xb80f at device 4.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
pci0: at 4.3 irq 5
fxp0: port 0x9400-0x943f mem 0xe0800000-0xe08fffff,0xe1000000-0xe1000fff irq 5 at device 9.0 on pci0
fxp0: Ethernet address 00:d0:b7:46:10:e9
inphy0: on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
ahc0: port 0x9000-0x90ff mem 0xe0000000-0xe0000fff irq 7 at device 10.0 on pci0
aic7860: Single Channel A, SCSI Id=7, 3/255 SCBs
atapci1: port 0x7400-0x743f,0x7800-0x7803,0x8000-0x8007,0x8400-0x8403,0x8800-0x8807 mem 0xdf800000-0xdf81ffff irq 10 at device 17.0 on pci0
ata2: at 0x8800 on atapci1
ata3: at 0x8000 on atapci1
pcib1: on motherboard
pci2: on pcib1
fdc0: at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: at port 0x60,0x64 on isa0
atkbd0: flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: irq 12 on atkbdc0
psm0: model IntelliMouse, device ID 3
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 8250
sio1: configured irq 3 not in bitmap of probed irqs 0
ppc0: parallel port not found.
ad4: 73308MB [148945/16/63] at ata2-master UDMA100
acd0: CDROM at ata0-slave using PIO4
Waiting 5 seconds for SCSI devices to settle
sa0 at ahc0 bus 0 target 0 lun 0
sa0: Removable Sequential Access SCSI-2 device
sa0: 10.000MB/s transfers (10.000MHz, offset 15)
Mounting root from ufs:/dev/ad4s1a
WARNING: / was not properly dismounted
>How-To-Repeat:
The HDD is partitioned into /, /var, /usr, swap and /data.
The /data partition is ~65G and is where the problem appears to occur. There is also an NFS mount to a 65G partition on another FreeBSD computer. This computer contains ~30G of data (fluctuates +/-3G per day).

On Sunday at 3:00 AM, a script is run that does 'rm -r /data/*' and then does a cp to duplicate the remote drive onto the data partition. This operation causes the described panic approximately once a month. On the other days of the week, rsync is used to maintain file versions. rsync has caused a panic only once in the four months this machine has been operating (except as described below).

Once a panic has occurred, any type of heavy HDD usage will cause a repeat panic ~90% of the time. Running newfs on the /data partition reduces the frequency of the panics to the rates previously described. Enabling or disabling softupdates doesn't seem to make any difference to the frequency of the panics.
>Fix:
I don't have any suggestions at this point. My best guess so far is that while dealing with fragmentation the code is somehow losing track of whether or not a block is free. I have 3 crash dumps so far (I'm starting a collection). If you need any more information on this, please don't hesitate to contact me.
>Release-Note:
>Audit-Trail:
>Unformatted:
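
For readers unfamiliar with the panic message, the guess in the Fix section (the code losing track of whether a block is free) can be pictured with a toy free map. The sketch below is not the FreeBSD source: the names freemap, freemap_init, block_alloc, block_free and the NBLOCKS size are all made up for illustration. It only shows the kind of consistency check that would report "freeing free block" when a block is released while its bit already says free.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified free map: one bit per block, 1 = free.
 * NBLOCKS is an arbitrary illustrative size, not a real FFS parameter. */
#define NBLOCKS 64

struct freemap {
    unsigned char bits[NBLOCKS / CHAR_BIT];
};

enum free_result { FREE_OK = 0, FREE_ALREADY_FREE = -1 };

/* Mark every block free, as on a freshly created (toy) filesystem. */
static void freemap_init(struct freemap *fm)
{
    memset(fm->bits, 0xff, sizeof fm->bits);
}

/* Return nonzero if block bno is currently marked free. */
static int block_is_free(const struct freemap *fm, size_t bno)
{
    assert(bno < NBLOCKS);
    return (fm->bits[bno / CHAR_BIT] >> (bno % CHAR_BIT)) & 1;
}

/* Mark block bno as in use by clearing its bit. */
static void block_alloc(struct freemap *fm, size_t bno)
{
    assert(bno < NBLOCKS);
    fm->bits[bno / CHAR_BIT] &= (unsigned char)~(1u << (bno % CHAR_BIT));
}

/* Release block bno. If the map already says it is free, the map and the
 * caller disagree; this sketch reports that instead of panicking. */
static enum free_result block_free(struct freemap *fm, size_t bno)
{
    if (block_is_free(fm, bno))
        return FREE_ALREADY_FREE;
    fm->bits[bno / CHAR_BIT] |= (unsigned char)(1u << (bno % CHAR_BIT));
    return FREE_OK;
}
```

In this model, freeing the same block twice returns FREE_ALREADY_FREE on the second call. Whether the real panic comes from a double free, a stale block pointer in an inode, or on-disk corruption of the map cannot be told from the sketch; it only shows why an inconsistent map is detected at free time.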