Date:      Mon, 11 Jul 2016 08:46:33 -0500
From:      Karl Denninger <karl@denninger.net>
To:        freebsd-stable@freebsd.org
Subject:   Re: Not-so stable if you take a CAM error....
Message-ID:  <877f5e8e-c1e7-6fb0-6ceb-031ce3e68582@denninger.net>
In-Reply-To: <1468243977.72182.118.camel@freebsd.org>
References:  <2b0c454b-c1a0-4b5b-e778-bf0939e90ae1@denninger.net> <op.ykfe1fvbkndu52@ronaldradial.radialsg.local> <6e9c07e1-12a6-a7cd-f775-6b0fe5a706bc@denninger.net> <1468243977.72182.118.camel@freebsd.org>

On 7/11/2016 08:32, Ian Lepore wrote:
> On Mon, 2016-07-11 at 06:30 -0500, Karl Denninger wrote:
>> On 7/11/2016 02:57, Ronald Klop wrote:
>>> On Mon, 11 Jul 2016 02:54:38 +0200, Karl Denninger
>>> <karl@denninger.net> wrote:
>>>
>>>> Got a (nasty) surprise this afternoon on my sandbox machine.
>>>>
>>>> I was updating some Raspberry Pi2 machines which involved taking
>>>> the sd
>>>> card out, sticking it in an adapter and plugging it into the
>>>> sandbox,
>>>> then mounting the partition and using rsync.
>>>>
>>>> Unfortunately one of the cards was, unknown to me, bad and
>>>> returned a
>>>> write error during the update.
>>>>
>>>> The machine panic'd immediately after the CAM write error popped
>>>> up.
>>>>
>>>> I was quite surprised by this, since (1) the SD card was (of
>>>> course)
>>>> mounted as a UFS filesystem; it shows up as a CAM device, (2) the
>>>> machine itself is running off a ZFS root on a normal host-adapter
>>>> and
>>>> thus there is no comingling of the buffer cache and (3) there
>>>> were no
>>>> images being run from (can't, wrong architecture!) nor any system
>>>> I/O
>>>> (e.g. pagefile) going to the SD card.
>>>>
>>>> I certainly understand that under some circumstances (maybe even
>>>> most
>>>> circumstances) taking a hard I/O error to a system device is
>>>> going to
>>>> hose you and a panic() is arguably "least astonishment" when the
>>>> price
>>>> of being wrong might be a corrupted system file or worse (e.g.
>>>> corrupted
>>>> paged-out RSS, etc.)  But I didn't expect a panic out of a failed
>>>> write to
>>>> a device that is mounted and being used purely for data.
>>>>
>>>> I don't have a crash dump but can almost-certainly reproduce this
>>>> if
>>>> it's something that shouldn't happen and thus merits
>>>> investigation.
>>>>
>>> Hi,
>>>
>>> I understand you are surprised by this. I don't think it is the way
>>> it
>>> should work.
>>> Is there _any_ debugging information for people to use and try to
>>> help
>>> you? Like which FreeBSD version are you running? Which FreeBSD
>>> version
>>> was used to create the UFS fs? Does it use softupdates (SU) or also
>>> journaling (SU+J)?
>>> Maybe some output of dmesg? Or type of SD-card and reader. Other
>>> people might have similar problems with similar hardware.
>>>
>>> Regards,
>>> Ronald.
>>>
>> FreeBSD 11.0-BETA1 #0 r302489: Sat Jul  9 10:15:24 CDT 2016
>> karl@NewFS.denninger.net:/usr/obj/usr/src/sys/KSD-SMP
>>
>> and
>>
>> FreeBSD 11.0-BETA1 #0 r302526: Sun Jul 10 10:39:31 CDT 2016
>> karl@NewFS.denninger.net:/pics/CrossBuild/obj/arm.armv6/pics/CrossBuild/src/sys/RPI2
>>
>> Both blew up in the same way when stimulated with same I/O error.
>>
>> The filesystem in question does have softupdates enabled (the RPI
>> images
>> have it turned on by default) but no journaling.  It's not
>> card/reader
>> dependent nor architecture dependent; when it occurred the first time
>> I
>> stuck the card and reader into one of my Pis and attempted to update
>> it
>> there (thinking that perhaps my sandbox machine's USB port was wonky)
>> and it blew up the Pi2 in the exact same way.
>>
>> This isn't (obviously, given both Intel-style and ARM machines being
>> involved) architecture dependent.
>>
>> It's been a good long while since I took an actual hard I/O error
>> that
>> was 'visible' at the OS level (I've had plenty of disks die on ZFS
>> over
>> the last few years but no "double failures" on a mirror or similar,
>> and
>> on my servers I haven't had a UFS-based system for a while).  This
>> definitely looks like some sort of regression in the code; I've run
>> FreeBSD for a hell of a long time and have had plenty of instances
>> where
>> disks have failed without having the machine go out from under me.
>>
> Unfortunately, this is "just the way it works".  A hard IO error while
> writing to a ufs filesystem with softupdates enabled will cause a
> panic, because the softupdates code doesn't handle that sort of
> failure, and the failure means that filesystem integrity is lost.  The
> code has no idea how important the data is to the functioning of the
> system, no basis on which to decide whether to panic or not.
>
> -- Ian
>

Here's the backtrace.  It sounds like this is expected behavior, which is
not so good, all in all, for a situation like this.  I guess the strategy
is to turn off softupdates on the card's filesystem before attempting such
an update, so as not to crash the host machine if there's a problem with
the card.

root@Dbms2:/var/crash # kgdb /boot/kernel/kernel vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: initiate_write_inodeblock_ufs2: already started
cpuid = 14
KDB: stack backtrace:
#0 0xffffffff80b1f357 at kdb_backtrace+0x67
#1 0xffffffff80ad6ec2 at vpanic+0x182
#2 0xffffffff80ad6d33 at panic+0x43
#3 0xffffffff80dc16ad at softdep_disk_io_initiation+0x159d
#4 0xffffffff80de61eb at ffs_geom_strategy+0x13b
#5 0xffffffff80b872f7 at bufwrite+0x267
#6 0xffffffff80b8ac6a at vfs_bio_awrite+0x3ca
#7 0xffffffff80b96b77 at vop_stdfsync+0x277
#8 0xffffffff80983766 at devfs_fsync+0x26
#9 0xffffffff81101f7d at VOP_FSYNC_APV+0x8d
#10 0xffffffff80baf1ae at sched_sync+0x3be
#11 0xffffffff80a8dcb5 at fork_exit+0x85
#12 0xffffffff80f7f85e at fork_trampoline+0xe
Uptime: 27m9s


(kgdb) where
#0  doadump (textdump=<value optimized out>) at pcpu.h:221
#1  0xffffffff80ad6949 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff80ad6efb in vpanic (fmt=<value optimized out>,
    ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff80ad6d33 in panic (fmt=0x0)
    at /usr/src/sys/kern/kern_shutdown.c:690
#4  0xffffffff80dc16ad in softdep_disk_io_initiation (bp=<value optimized out>)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:10301
#5  0xffffffff80de61eb in ffs_geom_strategy (bo=<value optimized out>,
    bp=<value optimized out>) at buf.h:412
#6  0xffffffff80b872f7 in bufwrite (bp=0xfffffe02e8629b30) at buf.h:405
#7  0xffffffff80b8ac6a in vfs_bio_awrite (bp=<value optimized out>)
    at buf.h:393
#8  0xffffffff80b96b77 in vop_stdfsync (ap=0xfffffe034f481b68)
    at /usr/src/sys/kern/vfs_default.c:692
#9  0xffffffff80983766 in devfs_fsync (ap=0xfffffe034f481b68)
    at /usr/src/sys/fs/devfs/devfs_vnops.c:702
#10 0xffffffff81101f7d in VOP_FSYNC_APV (vop=<value optimized out>,
    a=<value optimized out>) at vnode_if.c:1331
#11 0xffffffff80baf1ae in sched_sync () at vnode_if.h:549
#12 0xffffffff80a8dcb5 in fork_exit (callout=0xffffffff80baedf0 <sched_sync>,
    arg=0x0, frame=0xfffffe034f481c00) at /usr/src/sys/kern/kern_fork.c:1038
#13 0xffffffff80f7f85e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:611
#14 0x0000000000000000 in ?? ()
(kgdb)

FreeBSD 11.0-BETA1 #0 r302439: Fri Jul  8 14:37:27 CDT 2016
karl@Dbms2.denninger.net:/usr/obj/usr/src/sys/GENERIC

The offending code (the IOSTARTED check just before the flag is set is the
line that fires):

static void
initiate_write_inodeblock_ufs2(inodedep, bp)
        struct inodedep *inodedep;
        struct buf *bp;                 /* The inode block */
{
        struct allocdirect *adp, *lastadp;
        struct ufs2_dinode *dp;
        struct ufs2_dinode *sip;
        struct inoref *inoref;
        struct ufsmount *ump;
        struct fs *fs;
        ufs_lbn_t i;
#ifdef INVARIANTS
        ufs_lbn_t prevlbn = 0;
#endif
        int deplist;

        if (inodedep->id_state & IOSTARTED)
                panic("initiate_write_inodeblock_ufs2: already started");
        inodedep->id_state |= IOSTARTED;
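
A toy userspace model of that guard may help show why a retried write would
trip it.  This is only a sketch: every demo_* name is hypothetical, nothing
here is kernel code, and the idea that a failed write leaves IOSTARTED set
(so the syncer's next attempt finds it still set) is an assumption inferred
from the backtrace, not something confirmed from the softupdates source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the kernel's definitions. */
#define DEMO_IOSTARTED 0x01u

struct demo_inodedep {
        unsigned id_state;
};

/*
 * Model of the guard: returns false at the point where the kernel
 * would call panic("... already started").
 */
static bool
demo_initiate_write(struct demo_inodedep *dep)
{
        if (dep->id_state & DEMO_IOSTARTED)
                return false;
        dep->id_state |= DEMO_IOSTARTED;
        return true;
}

/*
 * Model of write completion.  ASSUMPTION of this sketch: on an I/O
 * error the cleanup is skipped, so the flag stays set.
 */
static void
demo_write_complete(struct demo_inodedep *dep, bool io_error)
{
        if (io_error)
                return;
        dep->id_state &= ~DEMO_IOSTARTED;
}

/*
 * Issue `attempts` writes of the same inode block.  `fail_at` is the
 * 1-based attempt that gets a hard I/O error (0 means none fail).
 * Returns the attempt at which the guard fires, or 0 if it never does.
 */
int
demo_run(int attempts, int fail_at)
{
        struct demo_inodedep dep = { 0 };

        assert(attempts >= 0 && fail_at >= 0);
        for (int i = 1; i <= attempts; i++) {
                if (!demo_initiate_write(&dep))
                        return i;
                demo_write_complete(&dep, i == fail_at);
        }
        return 0;
}
```

In this model demo_run(3, 0) returns 0 because the guard never fires, while
demo_run(3, 2) returns 3: the second write fails, the flag stays set, and
the third attempt hits the check.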


-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/
