From owner-freebsd-current@FreeBSD.ORG Tue Mar 21 21:03:49 2006
Message-ID: <44206A33.4000702@centtech.com>
Date: Tue, 21 Mar 2006 15:03:47 -0600
From: Eric Anderson
To: Kris Kennaway
References: <4415E8BB.1080602@centtech.com> <441B1C28.1020808@centtech.com>
 <441B2049.20507@centtech.com> <200603201528.49007.jhb@freebsd.org>
 <20060320220102.GA78361@xor.obsecurity.org>
In-Reply-To: <20060320220102.GA78361@xor.obsecurity.org>
Cc: freebsd-current@freebsd.org
Subject: Re: panic: ffs_valloc: dup alloc in 6.1-BETA4
List-Id: Discussions about the use of FreeBSD-current

Kris Kennaway wrote:
> On Mon, Mar 20, 2006 at 03:28:46PM -0500, John Baldwin wrote:
>
>> On Friday 17 March 2006 15:47, Eric Anderson wrote:
>>
>>> Eric Anderson wrote:
>>>
>>>> [moved to -current due to lack of response]
>>>>
>>>> Eric Anderson
wrote:
>>>>
>>>>> Mike Tancsa wrote:
>>>>>
>>>>>> At 04:48 PM 13/03/2006, Eric Anderson wrote:
>>>>>>
>>>>>>> I get the above panic after nfs clients attach to this nfs server
>>>>>>> and being
>>>>>>> I do have dumps from two crashes so far.
>>>>>>> This is FreeBSD-6.1-PRERELEASE from Friday-ish.
>>>>>>>
>>>>>> Dont know if it was fixed or not, but there were a lot of VM changes
>>>>>> committed last night that might help.
>>>>>>
>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2006-March/023526.html
>>>>>>
>>>>>>
>>>>> I just updated, and it still happens. More information for those
>>>>> interested:
>>>>>
>>>>> mode = 0100600, inum = 58456203, fs = /mnt
>>>>> panic: ffs_valloc: dup alloc
>>>>>
>>>>>
>>>>> #0 doadump () at pcpu.h:165
>>>>> 165 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
>>>>> (kgdb) backtrace
>>>>> #0 doadump () at pcpu.h:165
>>>>> #1 0xc064482f in boot (howto=260) at
>>>>> /usr/src/sys/kern/kern_shutdown.c:399
>>>>> #2 0xc0644b55 in panic (fmt=0xc0890967 "ffs_valloc: dup alloc") at
>>>>> /usr/src/sys/kern/kern_shutdown.c:555
>>>>> #3 0xc077ee3c in ffs_valloc (pvp=0xc8eab440, mode=33152,
>>>>> cred=0xc8a91d80, vpp=0xe83a5824) at /usr/src/sys/ufs/ffs/ffs_alloc.c:945
>>>>> #4 0xc07a5933 in ufs_makeinode (mode=33152, dvp=0xc8eab440,
>>>>> vpp=0xe83a5acc, cnp=0xe83a5ae0) at /usr/src/sys/ufs/ufs/ufs_vnops.c:2165
>>>>> #5 0xc07a2b0d in ufs_create (ap=0x0) at
>>>>> /usr/src/sys/ufs/ufs/ufs_vnops.c:171
>>>>> #6 0xc082dc98 in VOP_CREATE_APV (vop=0x0, a=0xe83a5a18) at
>>>>> vnode_if.c:204
>>>>> #7 0xc0737590 in nfsrv_create (nfsd=0xc8a91d00, slp=0xc8816700,
>>>>> td=0xc7d99780, mrq=0xe83a5c98) at vnode_if.h:111
>>>>> #8 0xc0744e95 in nfssvc_nfsd (td=0x0) at
>>>>> /usr/src/sys/nfsserver/nfs_syscalls.c:472
>>>>> #9 0xc0744688 in nfssvc (td=0xc7d99780, uap=0xe83a5d04) at
>>>>> /usr/src/sys/nfsserver/nfs_syscalls.c:181
>>>>> #10 0xc081cd7f in syscall (frame=
>>>>> {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 1, tf_esi = 0,
>>>>> tf_ebp =
-1077941448, tf_isp = -398828188, tf_ebx = 4, tf_edx =
>>>>> 672385208, tf_ecx = 25, tf_eax = 155, tf_trapno = 12, tf_err = 2,
>>>>> tf_eip = 671840155, tf_cs = 51, tf_eflags = 662, tf_esp =
>>>>> -1077941476, tf_ss = 59}) at /usr/src/sys/i386/i386/trap.c:981
>>>>> #11 0xc0809e8f in Xint0x80_syscall () at
>>>>> /usr/src/sys/i386/i386/exception.s:200
>>>>> #12 0x00000033 in ?? ()
>>>>> Previous frame inner to this frame (corrupt stack?)
>>>>> (kgdb)
>>>>>
>>>>> Maybe that helps somebody?
>>>>>
>>>>> Should I sent this to -current instead, since it appears this would
>>>>> happen under -current also, and possibly there is a larger base of
>>>>> people watching the list?
>>>>>
>>>> Also, here's a screenshot of the crash, and I have a good dump if
>>>> anyone wants me to get more debugging info.
>>>>
>>>> http://www.googlebit.com/freebsd/fbsd-6.1b4-nfscrash.png
>>>>
>>>>
>>> Oh yea, and I can reproduce at will, on two separate machines.
>>>
>> If you boot the machines in single user and run 'fsck -y' repeatedly
>> until fsck stops finding breakage does it work ok after that? It maybe
>> that you have corrupted disks that bgfsck just can't handle.
>>
>
> Basically it seems to me that bg fsck is always dangerous: there is an
> assumption that the only kinds of filesystem damage that exist are the
> "harmless" kinds (from power failure) it can later repair. But this
> is clearly false, because the filesystem may be in an arbitrarily
> damaged state (e.g. after a panic), and the kernel does not handle the
> possibility that filesystem data may not be completely trustable at
> runtime (this was the point of foreground fsck).
>

It turns out that this bug was caused by not having softupdates enabled
on the filesystem. Here's how to reproduce the problem; at least, this
sequence triggered the panic twice:

newfs /dev/device  (softupdates not enabled, I guess)
mount /dev/device /mnt
export the filesystem
mount the filesystem on a client
start lots of writes to the filesystem from the client over NFS
power cycle the server
fsck_ffs -y /dev/device

Once the filesystem is clean, mount and export it again; within a few
seconds, the server panics.

Running fsck and then enabling softupdates makes the problem disappear.

Eric

-- 
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------
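
The reproduction sequence in the message above can be sketched as a dry-run
shell script. This is only an illustration under stated assumptions: it
prints the server-side commands instead of running them, /dev/device and
/mnt are the placeholders from the message, and the function names are
hypothetical. The use of `tunefs -n enable` to turn on softupdates is an
assumption, since the message does not say which command was used. The
export and client-side steps stay as comments because the message gives no
details for them.

```shell
#!/bin/sh
# Dry-run sketch of the reproduction sequence: prints commands, runs nothing.
# DEV and MNT default to the placeholders used in the message.
DEV="${DEV:-/dev/device}"
MNT="${MNT:-/mnt}"

# Phase 1: create and mount the filesystem without softupdates,
# then load it over NFS and cut power (manual steps shown as comments).
setup_steps() {
    echo "newfs $DEV"
    echo "mount $DEV $MNT"
    echo "# export $MNT over NFS, mount it on a client, start heavy writes"
    echo "# power cycle the server while the writes are in progress"
}

# Phase 2: foreground fsck after the power cycle, then mount and export again.
recover_steps() {
    echo "fsck_ffs -y $DEV"
    echo "mount $DEV $MNT"
    echo "# re-export $MNT; the panic reportedly follows within seconds"
}

# Workaround: fsck, then enable softupdates while the filesystem is unmounted.
# ASSUMPTION: tunefs -n enable is used here; the message names no command.
workaround_steps() {
    echo "fsck_ffs -y $DEV"
    echo "tunefs -n enable $DEV"
    echo "mount $DEV $MNT"
}

setup_steps
recover_steps
workaround_steps
```

Printing the commands rather than executing them keeps the sketch safe to run
anywhere; on a disposable test machine the echoed lines could be reviewed and
run by hand, with DEV and MNT set in the environment.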