To: freebsd-hackers@freebsd.org
Subject: Upperlimit for bwait()
Message-ID: <CAG6t_XAcUDK%2BpPHiUZ9Bwu2fE5wg6vwK_zcuEYe94sb15HnUPg@mail.gmail.com>

Hello,

There have been a few incidents reported on Juniper devices running FreeBSD in which buffer I/O operations sleep for more than 30 minutes. In theory, this can happen because of faulty hardware or, on virtual platforms, because of a faulty connection between guest and host, filesystem corruption, too many buffer I/O operations, or a host that is not responding for various reasons. When that happens, because these buffer I/O writes take a lock before going to sleep, the threads waiting for that lock starve for just as long. There is currently no upper limit on bwait(). If the wait exceeds 30 minutes for a sleeping thread, or 15 minutes for a thread blocked on a turnstile, deadlkres panics the kernel on the assumption of a possible deadlock.

We could perhaps handle such lengthy buffer I/O operations gracefully by adding a timeout to bwait(), say 10 minutes. If a buffer I/O has not completed within a few minutes, it will probably never complete, and/or it is slowing down the entire system. So it is better to stop such faulty I/O operations.

For now, since we have seen these instances only with BIO operations, I have a patch that sets this value only from bufwait(). Please find the patch attached. I am not sure whether 10 minutes is a good upper limit for every bwait() scenario. If it is, we could simply change the msleep() in bwait() to use a 10-minute upper limit by default.

Please let me know whether this approach works for all use cases. If not, is there a better alternative? And is 10 minutes okay for a timeout?

Thanks and Regards,
Kumara

Attachment: bwait_timeout.patch

diff --git a/sys/kern/vfs_bio.c b/sys/kern/vfs_bio.c
index b5466fb2cd53..788ab2e1b7f1 100644
--- a/sys/kern/vfs_bio.c
+++ b/sys/kern/vfs_bio.c
@@ -75,6 +75,7 @@
 #include <sys/smp.h>
 #include <sys/sysctl.h>
 #include <sys/syscallsubr.h>
+#include <sys/syslog.h>
 #include <sys/vmem.h>
 #include <sys/vmmeter.h>
 #include <sys/vnode.h>
@@ -390,6 +391,9 @@ static int bdirtywait;
 /* Maximum number of buffer domains. */
 #define	BUF_DOMAINS	8
 
+/* Timeout for buffer I/O: 10 mins */
+#define BTIMEOUT       600 * hz
+
 struct bufdomainset bdlodirty;		/* Domains > lodirty */
 struct bufdomainset bdhidirty;		/* Domains > hidirty */
 
@@ -4537,9 +4541,9 @@ int
 bufwait(struct buf *bp)
 {
 	if (bp->b_iocmd == BIO_READ)
-		bwait(bp, PRIBIO, "biord");
+		bwait(bp, PRIBIO, "biord", BTIMEOUT);
 	else
-		bwait(bp, PRIBIO, "biowr");
+		bwait(bp, PRIBIO, "biowr", BTIMEOUT);
 	if (bp->b_flags & B_EINTR) {
 		bp->b_flags &= ~B_EINTR;
 		return (EINTR);
@@ -5121,14 +5125,22 @@ bdone(struct buf *bp)
 }
 
 void
-bwait(struct buf *bp, u_char pri, const char *wchan)
+bwait(struct buf *bp, u_char pri, const char *wchan, int timo)
 {
 	struct mtx *mtxp;
+	int ret;
 
 	mtxp = mtx_pool_find(mtxpool_sleep, bp);
 	mtx_lock(mtxp);
-	while ((bp->b_flags & B_DONE) == 0)
-		msleep(bp, mtxp, pri, wchan, 0);
+	while ((bp->b_flags & B_DONE) == 0) {
+		ret = msleep(bp, mtxp, pri, wchan, timo);
+		if (ret == EWOULDBLOCK) {
+			log (LOG_ERR, "%s: Waited too long(%d) for a buffer IO to complete\n", __func__, timo);
+			bp->b_error = ETIMEDOUT;
+			bp->b_flags |= BIO_ERROR;
+			break;
+		}
+	}
 	mtx_unlock(mtxp);
 }
 
diff --git a/sys/sys/buf.h b/sys/sys/buf.h
index 70fb2812c3ba..ef4c96560a57 100644
--- a/sys/sys/buf.h
+++ b/sys/sys/buf.h
@@ -603,7 +603,7 @@ void	pbrelbo(struct buf *);
 void	pbrelvp(struct buf *);
 int	allocbuf(struct buf *bp, int size);
 void	reassignbuf(struct buf *);
-void	bwait(struct buf *, u_char, const char *);
+void	bwait(struct buf *, u_char, const char *, int);
 void	bdone(struct buf *);
 
 typedef daddr_t (vbg_get_lblkno_t)(struct vnode *, vm_ooffset_t);
diff --git a/sys/ufs/ffs/ffs_rawread.c b/sys/ufs/ffs/ffs_rawread.c
index 3a415d766303..db3b6c6f36bf 100644
--- a/sys/ufs/ffs/ffs_rawread.c
+++ b/sys/ufs/ffs/ffs_rawread.c
@@ -314,7 +314,7 @@ ffs_rawread_main(struct vnode *vp,
 			}
 		}
 		
-		bwait(bp, PRIBIO, "rawrd");
+		bwait(bp, PRIBIO, "rawrd", 0);
 		vunmapbuf(bp);
 		
 		iolen = bp->b_bcount - bp->b_resid;
@@ -381,7 +381,7 @@ ffs_rawread_main(struct vnode *vp,
 		uma_zfree(ffsraw_pbuf_zone, bp);
 	}
 	if (nbp != NULL) {			/* Run down readahead buffer */
-		bwait(nbp, PRIBIO, "rawrd");
+		bwait(nbp, PRIBIO, "rawrd", 0);
 		vunmapbuf(nbp);
 		pbrelvp(nbp);
 		uma_zfree(ffsraw_pbuf_zone, nbp);
diff --git a/sys/vm/swap_pager.c b/sys/vm/swap_pager.c
index ee236c7f3988..e912aa085fc7 100644
--- a/sys/vm/swap_pager.c
+++ b/sys/vm/swap_pager.c
@@ -1595,7 +1595,7 @@ swap_pager_putpages(vm_object_t object, vm_page_t *ma, int count,
 		/*
 		 * Wait for the sync I/O to complete.
 		 */
-		bwait(bp, PVM, "swwrt");
+		bwait(bp, PVM, "swwrt", 0);
 
 		/*
 		 * Now that we are through with the bp, we can call the
diff --git a/sys/vm/vnode_pager.c b/sys/vm/vnode_pager.c
index d32fec845043..a6fce6b345ed 100644
--- a/sys/vm/vnode_pager.c
+++ b/sys/vm/vnode_pager.c
@@ -707,7 +707,7 @@ vnode_pager_input_smlfs(vm_object_t object, vm_page_t m)
 			bp->b_iooffset = dbtob(bp->b_blkno);
 			bstrategy(bp);
 
-			bwait(bp, PVM, "vnsrd");
+			bwait(bp, PVM, "vnsrd", 0);
 
 			if ((bp->b_ioflags & BIO_ERROR) != 0) {
 				KASSERT(bp->b_error != 0,
@@ -1168,7 +1168,7 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_page_t *m, int count,
 	} else {
 		bp->b_iodone = bdone;
 		bstrategy(bp);
-		bwait(bp, PVM, "vnread");
+		bwait(bp, PVM, "vnread", 0);
 		error = vnode_pager_generic_getpages_done(bp);
 		for (i = 0; i < bp->b_npages; i++)
 			bp->b_pages[i] = NULL;

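A small side note on the BTIMEOUT definition in the patch: it expands to 600 * hz without parentheses. That is harmless where the patch uses it, as a plain function argument, but it can surprise anyone who later uses the macro inside a larger expression. The sketch below illustrates the difference in userspace. The variable hz here is only a stand-in for the kernel tick rate, and the two macro names and helper functions are made up for the illustration.

```c
#include <assert.h>

/* Stand-in for the kernel's tick rate; 1000 is an assumed value. */
static int hz = 1000;

/* Same shape as the patch's definition, and a parenthesized variant. */
#define BTIMEOUT_AS_POSTED	600 * hz
#define BTIMEOUT_PARENS		(600 * hz)

/* How many whole timeout periods fit into 'ticks'? */
static int
periods_as_posted(int ticks)
{
	/* Expands to ticks / 600 * hz, i.e. (ticks / 600) * hz. */
	return (ticks / BTIMEOUT_AS_POSTED);
}

static int
periods_parens(int ticks)
{
	/* Expands to ticks / (600 * hz), which is what one would expect. */
	return (ticks / BTIMEOUT_PARENS);
}
```

With hz set to 1000, periods_parens(1200000) yields 2, while periods_as_posted(1200000) yields 2000000, because the division binds to 600 before the multiplication by hz.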
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAG6t_XAcUDK%2BpPHiUZ9Bwu2fE5wg6vwK_zcuEYe94sb15HnUPg>