Date: Thu, 28 Feb 2013 21:11:32 -0800 (PST)
From: Don Lewis
To: lev@FreeBSD.org
Cc: freebsd-fs@FreeBSD.org, freebsd-current@FreeBSD.org
Subject: Re: Panic in ffs_valloc (Was: Unexpected SU+J inconsistency AGAIN -- please, don't shift topic to ZFS!)
Message-Id: <201303010511.r215BWoU092532@gw.catspoiler.org>
In-Reply-To: <1698593972.20130228164821@serebryakov.spb.ru>

On 28 Feb, Lev Serebryakov wrote:
> Hello, Lev.
> You wrote 28 February 2013, 14:13:23:
>
> LS>>> My server runs 9.1-STABLE and has an 8 TB UFS2 SU+J FS.
> LS>>> It crashed several minutes ago (I don't know the reason yet) and fsck
> LS>>> says "Unexpected SU+J inconsistency" (Inode mode/directory type
> LS>>> mismatch) and requested a full check (which will take more than an hour on
> LS>>> such an FS).
> LS>> Full fsck found "INTERNAL ERROR: DUPS WITH SOFTUPDATES" and keeps running...
> LS> Full fsck reconnected about 1000 files, which were being written at the time
> LS> of the crash.
> LS> Really, the server crashed while an SVN mirror seed was being unpacked on
> LS> this FS, so there was massive file creation at that time.
> OK, I've checked the memory, and now I have booted the system with a crashlog
> (!)
>
> Here it is (please note that panic() was called by ffs_valloc):
>
> #0  doadump (textdump=) at pcpu.h:229
> 229     pcpu.h: No such file or directory.
>         in pcpu.h
> (kgdb) #0  doadump (textdump=) at pcpu.h:229
> #1  0xffffffff80431494 in kern_reboot (howto=260)
>     at /usr/src/sys/kern/kern_shutdown.c:448
> #2  0xffffffff80431997 in panic (fmt=0x1)
>     at /usr/src/sys/kern/kern_shutdown.c:636
> #3  0xffffffff80573d8c in ffs_valloc (pvp=0xfffffe0024d68000, mode=33204,
>     cred=0xfffffe0023d52700, vpp=0xffffff81c35586b8)
>     at /usr/src/sys/ufs/ffs/ffs_alloc.c:995
> #4  0xffffffff805aa126 in ufs_makeinode (mode=33204, dvp=0xfffffe0024d68000,
>     vpp=0xffffff81c3558a10, cnp=0xffffff81c3558a38)
>     at /usr/src/sys/ufs/ufs/ufs_vnops.c:2614
> #5  0xffffffff80634391 in VOP_CREATE_APV (vop=,
>     a=0xffffff81c3558920) at vnode_if.c:252
> #6  0xffffffff804d389a in vn_open_cred (ndp=0xffffff81c35589d0,
>     flagp=0xffffff81c35589cc, cmode=,
>     vn_open_flags=, cred=0xfffffe0023d52700,
>     fp=0xfffffe00ae9cf370) at vnode_if.h:109
> #7  0xffffffff804cc0d9 in kern_openat (td=0xfffffe012d095000, fd=-100,
>     path=0x801c951e0,
>     pathseg=UIO_USERSPACE, flags=2562, mode=)
>     at /usr/src/sys/kern/vfs_syscalls.c:1132
> #8  0xffffffff805f1400 in amd64_syscall (td=0xfffffe012d095000, traced=0)
>     at subr_syscall.c:135
> #9  0xffffffff805dbfc7 in Xfast_syscall ()
>     at /usr/src/sys/amd64/amd64/exception.S:387
> #10 0x000000080177ce5c in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> (kgdb)
>
> Full textdump: http://lev.serebryakov.spb.ru/crashes/core-ffs-crash.txt.1
>
> Please note that the FS was loaded by a torrent client (40 Mbit/s outbound
> traffic) and by unpacking svnmirror-base-r238500.tar.xz from this FS
> onto itself. So it was a really high multi-stream load.
>
> I'll try to reproduce this on a SINGLE disk, without geom_raid5 :)

The fact that the filesystem code called panic() indicates that the
filesystem was already corrupt by that point. That corruption is a likely
reason for fsck complaining about the unexpected SU+J inconsistency.

Incorrect write ordering, which would let the filesystem become
inconsistent when pending writes were lost in the panic, isn't
necessarily needed to explain that complaint. However, such a bug might
have allowed an earlier crash, after which a full fsck was skipped, to
leave the filesystem in the corrupt state that triggered this panic.

This panic might also be a result of the bug fixed in r246877, but I
have my doubts about that.
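To make the "already corrupt" reasoning concrete, here is a toy model in C. It is not FreeBSD's ffs_valloc(): the struct, enum, and function names are hypothetical, and the model assumes a simplified inode allocator that trusts a free bitmap and then cross-checks the chosen inode's own mode field. If an earlier crash let one of those two pieces of metadata reach the disk without the other, the allocator finds a "free" inode that is already in use, and a kernel's only safe reaction is to stop. (The mode=33204 in frame #3 is 0100664 octal, i.e. a regular file with permissions 0664.)

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model of one cylinder group's inode metadata. */
#define TOY_NINODES 8

struct toy_cg {
    uint8_t  used[TOY_NINODES];  /* free bitmap: 1 = allocated       */
    uint16_t mode[TOY_NINODES];  /* inode contents: 0 = unused inode */
};

enum toy_alloc_result {
    ALLOC_OK = 0,   /* inode allocated and initialized           */
    ALLOC_NOSPACE,  /* bitmap shows no free inodes               */
    ALLOC_CORRUPT   /* bitmap says free, but the inode is in use */
};

/*
 * Allocate the first inode the bitmap reports as free, after checking
 * that its mode is zero. A nonzero mode means the bitmap and the inode
 * disagree, i.e. the metadata was already inconsistent before this call.
 * A kernel would panic at this point; the toy reports it and leaves the
 * metadata untouched so the caller can inspect it.
 */
static enum toy_alloc_result toy_valloc(struct toy_cg *cg, uint16_t mode,
                                        int *inum)
{
    assert(mode != 0);  /* callers must pass a real file mode */

    for (int i = 0; i < TOY_NINODES; i++) {
        if (cg->used[i])
            continue;              /* bitmap says in use: skip it */
        *inum = i;
        if (cg->mode[i] != 0)
            return ALLOC_CORRUPT;  /* "free" inode already has a mode */
        cg->used[i] = 1;
        cg->mode[i] = mode;
        return ALLOC_OK;
    }
    return ALLOC_NOSPACE;
}
```

For example, starting from a zeroed struct toy_cg, the first toy_valloc(&cg, 0100664, &inum) call returns ALLOC_OK with inum 0. If the model then simulates a crash that left inode 1's mode on disk but lost its bitmap bit (cg.mode[1] = 0100664 with cg.used[1] still 0), the next call returns ALLOC_CORRUPT with inum 1, which is the point where a kernel would have panicked.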