Date:      Mon, 6 Feb 2017 15:23:03 -0600
From:      Karl Denninger <karl@denninger.net>
To:        freebsd-stable@freebsd.org
Subject:   Re: FreeBSD 10.2-RELEASE #0 r286666: Panic and crash
Message-ID:  <03a3a0f3-c1fb-9874-93dc-e04185893cc2@denninger.net>
In-Reply-To: <CY1PR14MB0520369956DADEE12E908812C4400@CY1PR14MB0520.namprd14.prod.outlook.com>
References:  <CY1PR14MB0520369956DADEE12E908812C4400@CY1PR14MB0520.namprd14.prod.outlook.com>

On 2/6/2017 15:01, Shawn Bakhtiar wrote:
> Hi all!
>
> http://pastebin.com/niXrjF0D
>
> Please refer to full output from crash above.
>
> This morning our IMAP server decided to go belly up. I could not remote
> in, and the machine would not respond to any pings.
>
> Checking the physical console I had the following worrisome messages on
> screen:
>
> • g_vfs_done():da1p1[READ(offset=7265561772032, length=32768)]error = 5
> • g_vfs_done():da1p1[WRITE(offset=7267957735424, length=131072)]error = 16
> • /mnt/USBBD: got error 16 while accessing filesystem
> • panic: softdep_deallocate_dependencies: unrecovered I/O error
> • cpuid = 5
>
> /mnt/USBBD is a MyBook USB 8TB drive that we use for daily backups of
> the IMAP data using rsync. Everything so far has worked without issue.
>
> I also noticed a bunch of:
>
> • fstat: can't read file 2 at 0x4000000001fffff
> • fstat: can't read file 4 at 0x780000ffff
> • fstat: can't read file 5 at 0x600000000
> • fstat: can't read file 1 at 0x200007fffffffff
> • fstat: can't read file 2 at 0x4000000001fffff
> • fstat: can't read file 4 at 0x780000ffff
> • fstat: can't read file 5 at 0x600000000
>
>
> but I have no idea what these are from.
>
> df -h output:
> /dev/da0p2    1.8T    226G    1.5T    13%    /
> devfs         1.0K    1.0K      0B   100%    /dev
> /dev/da1p1    7.0T    251G    6.2T     4%    /mnt/USBBD
>
>
> da0p2 is a RAID level 5 on an HP Smart Array
>
> Here is the output of dmesg after reboot:
> http://pastebin.com/rHVjgZ82
>
> Obviously both the RAID and USB drive did not walk away from the crash
> cleanly. Should I be running a fsck at this point on both from single
> user mode to verify and clean up? My concern is the:
> WARNING: /: mount pending error: blocks 0 files 26
> when mounting /dev/da0p2
>
> For some reason I was under the impression that fsck was run
> automatically on reboot.
>
> Any help in this matter would be greatly appreciated. I'm a little
> concerned that a backup strategy that has worked for us for many MANY
> years would so easily throw the OS into panic. If an I/O error occurred
> on the USB Drive I would frankly think it should just back out, without
> panic. Or am I missing something?
>
> Any recommendations / insights would be most welcome.
> Shawn
>
>
The "mount pending error" is normal on a disk that has softupdates
turned on; fsck runs in the background after boot, and this is
"safe" because of how the metadata and data writes are ordered.  In
other words, the filesystem in this situation is missing uncommitted
data, but its on-disk state is consistent.  As a result the system
can mount root read-write without having to fsck it first, and the
background cleanup does not risk a disk consistency problem.

The panic itself appears to have resulted from an I/O error that
caused an operation to fail.
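
For reference, the two numeric codes in the console messages quoted above
can be looked up by name.  This is only a small sketch, not anything taken
from the kernel: describe() is a made-up helper, and the lookup uses the
errno table of whatever machine runs it.  On FreeBSD, 5 is EIO and 16 is
EBUSY; the message text from os.strerror() varies between systems.

```python
import errno
import os


def describe(code):
    """Return a one-line description of a numeric errno value.

    Hypothetical helper for illustration; it consults the local
    platform's errno table, not the crashed machine's.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"error {code} = {name} ({os.strerror(code)})"


# The two codes reported by g_vfs_done() in the quoted console output.
for code in (5, 16):
    print(describe(code))
```

So the READ failed with a plain I/O error and the WRITE that followed
came back with "busy", which is the write failure the softupdates code
gave up on.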

I was part of a thread on this in 2016, which you can find here:
https://lists.freebsd.org/pipermail/freebsd-stable/2016-July/084944.html

The basic problem is that the softupdates code cannot deal with a hard
I/O error on write because it no longer can guarantee filesystem
integrity if it continues.  I argued in that thread that the superior
solution would be to forcibly detach the volume, which would leave you with
a "dirty" filesystem and a failed operation but not a panic.  The
file(s) involved in the write error might be lost, but the integrity of
the filesystem is recoverable (as it is in the panic case) -- at least
it is if the fsck doesn't require writing to a block that *also* errors
out.

The decision in the code is to panic rather than detach the volume,
however, so panic it is.  This one has bitten me with SD cards in small
embedded-style machines (where turning off softupdates makes things VERY
slow) and at some point I may look into developing a patch to
forcibly-detach the volume instead.  That obviously won't help you if
the system volume is the one the error happens on (now you just forcibly
detached the root filesystem which is going to get you an immediate
panic anyway) but in the event of a data disk it would prevent the
system from crashing.

-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/

