Date: Mon, 6 Feb 2017 15:23:03 -0600
From: Karl Denninger <karl@denninger.net>
To: freebsd-stable@freebsd.org
Subject: Re: FreeBSD 10.2-RELEASE #0 r286666: Panic and crash
Message-ID: <03a3a0f3-c1fb-9874-93dc-e04185893cc2@denninger.net>
In-Reply-To: <CY1PR14MB0520369956DADEE12E908812C4400@CY1PR14MB0520.namprd14.prod.outlook.com>
References: <CY1PR14MB0520369956DADEE12E908812C4400@CY1PR14MB0520.namprd14.prod.outlook.com>

On 2/6/2017 15:01, Shawn Bakhtiar wrote:
> Hi all!
>
> http://pastebin.com/niXrjF0D
>
> Please refer to full output from crash above.
>
> This morning our IMAP server decided to go belly up. I could not remote in, and the machine would not respond to any pings.
>
> Checking the physical console I had the following worrisome messages on screen:
>
> • g_vfs_done():da1p1[READ(offset=7265561772032, length=32768)]error = 5
> • g_vfs_done():da1p1[WRITE(offset=7267957735424, length=131072)]error = 16
> • /mnt/USBBD: got error 16 while accessing filesystem
> • panic: softdep_deallocate_dependencies: unrecovered I/O error
> • cpuid = 5
>
> /mnt/USBBD is a MyBook USB 8TB drive that we use for daily backups of the IMAP data using rsync. Everything so far has worked without issue.
>
> I also noticed a bunch of:
>
> • fstat: can't read file 2 at 0x4000000001fffff
> • fstat: can't read file 4 at 0x780000ffff
> • fstat: can't read file 5 at 0x600000000
> • fstat: can't read file 1 at 0x200007fffffffff
> • fstat: can't read file 2 at 0x4000000001fffff
> • fstat: can't read file 4 at 0x780000ffff
> • fstat: can't read file 5 at 0x600000000
>
> but I have no idea what these are from.
>
> df -h output:
> /dev/da0p2 1.8T 226G 1.5T 13% /
> devfs 1.0K 1.0K 0B 100% /dev
> /dev/da1p1 7.0T 251G 6.2T 4% /mnt/USBBD
>
> da0p2 is a RAID level 5 on an HP Smart Array
>
> Here is the output of dmesg after reboot:
> http://pastebin.com/rHVjgZ82
>
> Obviously both the RAID and USB drive did not walk away from the crash cleanly. Should I be running a fsck at this point on both from single user mode to verify and clean up?
> My concern is the:
> WARNING: /: mount pending error: blocks 0 files 26
> when mounting /dev/da0p2
>
> For some reason I was under the impression that fsck was run automatically on reboot.
>
> Any help in this matter would be greatly appreciated. I'm a little concerned that a backup strategy that has worked for us for many MANY years would so easily throw the OS into panic. If an I/O error occurred on the USB Drive I would frankly think it should just back out, without panic. Or am I missing something?
>
> Any recommendations / insights would be most welcome.
> Shawn

The "mount pending error" is normal on a disk that has softupdates turned on; fsck runs in the background after the boot, and this is "safe" because of how the metadata and data writes are ordered. In other words, the filesystem in this situation is missing uncommitted data, but the state of the system is consistent. As a result the system can mount root read-write without having to fsck it first, and the background cleanup is safe from a disk consistency problem.

The panic itself appears to have resulted from an I/O error that caused an operation to fail. I was part of a thread on this in 2016, which you can find here:

https://lists.freebsd.org/pipermail/freebsd-stable/2016-July/084944.html

The basic problem is that the softupdates code cannot deal with a hard I/O error on write, because it can no longer guarantee filesystem integrity if it continues. I argued in that thread that the superior solution would be to forcibly detach the volume, which would leave you with a "dirty" filesystem and a failed operation but not a panic. The file(s) involved in the write error might be lost, but the integrity of the filesystem is recoverable (as it is in the panic case) -- at least it is if the fsck doesn't require writing to a block that *also* errors out. The decision in the code is to panic rather than detach the volume, however, so panic it is.
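
As an aside, the numbers in the quoted g_vfs_done() console lines are ordinary errno values and byte offsets. The following is only a minimal sketch in Python for reading them, not anything from the kernel: the regular expression and the decode() helper are made up for illustration, and it assumes the local errno table matches FreeBSD's numbering for these two values (5 is EIO and 16 is EBUSY on FreeBSD and on Linux).

```python
import errno
import re

# Console lines copied from the quoted report.
LOG = """\
g_vfs_done():da1p1[READ(offset=7265561772032, length=32768)]error = 5
g_vfs_done():da1p1[WRITE(offset=7267957735424, length=131072)]error = 16
"""

# Hypothetical pattern written for this message format only.
PATTERN = re.compile(
    r"g_vfs_done\(\):(?P<dev>\w+)\[(?P<op>READ|WRITE)"
    r"\(offset=(?P<offset>\d+), length=(?P<length>\d+)\)\]"
    r"\s*error = (?P<err>\d+)"
)


def decode(log):
    """Return one dict per g_vfs_done() line found in the log text."""
    events = []
    for m in PATTERN.finditer(log):
        code = int(m.group("err"))
        events.append({
            "dev": m.group("dev"),
            "op": m.group("op"),
            # Byte offset into the partition, expressed in TiB.
            "offset_tib": int(m.group("offset")) / 2**40,
            "length": int(m.group("length")),
            # Assumption: local errno numbering matches FreeBSD here.
            "errno": errno.errorcode.get(code, "unknown"),
        })
    return events


for e in decode(LOG):
    print(f'{e["dev"]} {e["op"]} at {e["offset_tib"]:.2f} TiB, '
          f'{e["length"]} bytes: {e["errno"]}')
```

Read this way, the first message is a failed read (EIO) and the second a failed write (EBUSY), both roughly 6.6 TiB into da1p1.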

This one has bitten me with SD cards in small embedded-style machines (where turning off softupdates makes things VERY slow) and at some point I may look into developing a patch to forcibly detach the volume instead. That obviously won't help you if the system volume is the one the error happens on (now you have just forcibly detached the root filesystem, which is going to get you an immediate panic anyway), but in the event of a data disk it would prevent the system from crashing.

--
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/