Date:      Mon, 11 Jul 2016 17:56:05 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 211013] Write error to UFS filesystem with softupdates panics machine
Message-ID:  <bug-211013-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211013

            Bug ID: 211013
           Summary: Write error to UFS filesystem with softupdates panics
                    machine
           Product: Base System
           Version: 11.0-BETA1
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: karl@denninger.net

The machine in question had a UFS filesystem mounted with softupdates
enabled (on an SD card; I was updating a system that runs FreeBSD on a
Raspberry Pi2 by plugging the card into a different machine) and the I/O
card took an unrecoverable write error.

The result was a kernel panic; this is apparently considered expected behavior
at present if softupdates are turned on for the filesystem because it's
possible that the filesystem has now been corrupted and there is no way to be
sure with the machine running.  Thus the choice to panic() when this situation
occurs.

But it appears that the choice to panic() is too broad and unnecessary, in that
in many cases a less-severe action is effective while not exposing the system
to the risk of unknown filesystem corruption.

Yes, if there are working-set pages on that volume and it is corrupt, the
system is no longer stable (this is especially true if the system is *running*
from that volume).  It is also true that in the case of a solid-state device of
some kind the impact of a write error may cross a filesystem boundary, so it's
insufficient to invalidate the filesystem (on an SSD or similar device the
read/erase/write cycle for a data re-write may involve many megabytes of data,
and that may not be entirely local to the mounted filesystem if there is more
than one on the physical volume).

HOWEVER, forcibly detaching the volume in question instead of calling panic()
*should* be effective in preventing the propagation of a corrupted
filesystem.  While this will lead to a panic in the event that executing RSS
(or consumed page file space) is present on that volume, in the case where the
device holds only data the detach will *not* panic the machine.
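
The expected outcome of a forced detach, as argued above, can be sketched in C.
This is only a model of the reasoning, not kernel code: `struct volume_state`,
`enum detach_outcome` and `outcome_of_forced_detach()` are hypothetical names
invented for illustration and do not exist in FreeBSD.

```c
#include <stdbool.h>

/* Hypothetical description of a volume; not a FreeBSD kernel structure. */
struct volume_state {
	bool holds_executing_pages;	/* pages of running programs are backed here */
	bool holds_page_file_space;	/* consumed paging space lives on this device */
};

/* What happens to the machine once the volume is forcibly detached. */
enum detach_outcome {
	SYSTEM_KEEPS_RUNNING,	/* data-only volume: only that volume is lost */
	SYSTEM_PANICS		/* the system itself depended on the volume */
};

/*
 * Model of the argument in the report: a forced detach is always attempted
 * in place of panic(); the machine goes down only if it was relying on the
 * volume for executing pages or for paging space.
 */
static enum detach_outcome
outcome_of_forced_detach(const struct volume_state *vol)
{
	if (vol->holds_executing_pages || vol->holds_page_file_space)
		return (SYSTEM_PANICS);
	return (SYSTEM_KEEPS_RUNNING);
}
```

Under this model, a card holding only data (both fields false) yields
SYSTEM_KEEPS_RUNNING, which is the case described in this report.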

This appears to be a situation where a less-severe "remedy" for a failed I/O is
certainly called for.

The following backtrace was captured from the panic itself:

root@Dbms2:/var/crash # kgdb /boot/kernel/kernel vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: initiate_write_inodeblock_ufs2: already started
cpuid = 14
KDB: stack backtrace:
#0 0xffffffff80b1f357 at kdb_backtrace+0x67
#1 0xffffffff80ad6ec2 at vpanic+0x182
#2 0xffffffff80ad6d33 at panic+0x43
#3 0xffffffff80dc16ad at softdep_disk_io_initiation+0x159d
#4 0xffffffff80de61eb at ffs_geom_strategy+0x13b
#5 0xffffffff80b872f7 at bufwrite+0x267
#6 0xffffffff80b8ac6a at vfs_bio_awrite+0x3ca
#7 0xffffffff80b96b77 at vop_stdfsync+0x277
#8 0xffffffff80983766 at devfs_fsync+0x26
#9 0xffffffff81101f7d at VOP_FSYNC_APV+0x8d
#10 0xffffffff80baf1ae at sched_sync+0x3be
#11 0xffffffff80a8dcb5 at fork_exit+0x85
#12 0xffffffff80f7f85e at fork_trampoline+0xe
Uptime: 27m9s


(kgdb) where
#0  doadump (textdump=<value optimized out>) at pcpu.h:221
#1  0xffffffff80ad6949 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff80ad6efb in vpanic (fmt=<value optimized out>,
    ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff80ad6d33 in panic (fmt=0x0)
    at /usr/src/sys/kern/kern_shutdown.c:690
#4  0xffffffff80dc16ad in softdep_disk_io_initiation (bp=<value optimized out>)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:10301
#5  0xffffffff80de61eb in ffs_geom_strategy (bo=<value optimized out>,
    bp=<value optimized out>) at buf.h:412
#6  0xffffffff80b872f7 in bufwrite (bp=0xfffffe02e8629b30) at buf.h:405
#7  0xffffffff80b8ac6a in vfs_bio_awrite (bp=<value optimized out>)
    at buf.h:393
#8  0xffffffff80b96b77 in vop_stdfsync (ap=0xfffffe034f481b68)
    at /usr/src/sys/kern/vfs_default.c:692
#9  0xffffffff80983766 in devfs_fsync (ap=0xfffffe034f481b68)
    at /usr/src/sys/fs/devfs/devfs_vnops.c:702
#10 0xffffffff81101f7d in VOP_FSYNC_APV (vop=<value optimized out>,
    a=<value optimized out>) at vnode_if.c:1331
#11 0xffffffff80baf1ae in sched_sync () at vnode_if.h:549
#12 0xffffffff80a8dcb5 in fork_exit (callout=0xffffffff80baedf0 <sched_sync>,
    arg=0x0, frame=0xfffffe034f481c00) at /usr/src/sys/kern/kern_fork.c:1038
#13 0xffffffff80f7f85e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:611
#14 0x0000000000000000 in ?? ()
(kgdb)

FreeBSD 11.0-BETA1 #0 r302439: Fri Jul  8 14:37:27 CDT 2016
karl@Dbms2.denninger.net:/usr/obj/usr/src/sys/GENERIC

The offending code line:

static void
initiate_write_inodeblock_ufs2(inodedep, bp)
        struct inodedep *inodedep;
        struct buf *bp;                 /* The inode block */
{
        struct allocdirect *adp, *lastadp;
        struct ufs2_dinode *dp;
        struct ufs2_dinode *sip;
        struct inoref *inoref;
        struct ufsmount *ump;
        struct fs *fs;
        ufs_lbn_t i;
#ifdef INVARIANTS
        ufs_lbn_t prevlbn = 0;
#endif
        int deplist;

        if (inodedep->id_state & IOSTARTED)
                panic("initiate_write_inodeblock_ufs2: already started");
        inodedep->id_state |= IOSTARTED;



-- End capture
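
The guard in the excerpt above can be modelled in user space to show what it
enforces: a write may not be initiated twice on the same dependency without
the flag being cleared in between.  This is a sketch only.  The flag value,
`struct model_inodedep` and both functions are hypothetical stand-ins, and
whether a failed write actually leaves IOSTARTED set in the kernel is an
assumption that this report does not establish.

```c
#include <stdbool.h>

/* Illustrative flag value; the kernel's own definition is not shown here. */
#define MODEL_IOSTARTED	0x1u

/* Hypothetical stand-in for the single inodedep field used by the guard. */
struct model_inodedep {
	unsigned int id_state;
};

/*
 * Mirrors the guard in the excerpt: returns false at the point where the
 * kernel would call panic(), otherwise marks the write as started.
 */
static bool
model_initiate_write(struct model_inodedep *dep)
{
	if (dep->id_state & MODEL_IOSTARTED)
		return (false);
	dep->id_state |= MODEL_IOSTARTED;
	return (true);
}

/* Hypothetical completion step: clears the flag so a new write may start. */
static void
model_complete_write(struct model_inodedep *dep)
{
	dep->id_state &= ~MODEL_IOSTARTED;
}
```

In this model, calling model_initiate_write() twice in a row fails on the
second call, while a call to model_complete_write() in between lets the
second initiation succeed.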

-- 
You are receiving this mail because:
You are the assignee for the bug.


