To:        freebsd-hackers@freebsd.org
Subject:   Upperlimit for bwait()
Message-ID:  <CAG6t_XAcUDK%2BpPHiUZ9Bwu2fE5wg6vwK_zcuEYe94sb15HnUPg@mail.gmail.com>

--00000000000098c36f0619a57dd5
Content-Type: multipart/alternative; boundary="00000000000098c36d0619a57dd3"

--00000000000098c36d0619a57dd3
Content-Type: text/plain; charset="UTF-8"

Hello,

There have been a few incidents reported on Juniper devices running FreeBSD
in which buffer I/O operations sleep for more than 30 minutes. In theory,
this can happen because of faulty hardware or, on virtual platforms, a faulty
connection between guest and host, filesystem corruption, too many
outstanding buffer I/O operations, and/or an unresponsive host. When it
happens, the thread doing the buffer I/O holds a lock while it sleeps, so
threads waiting for that lock starve for just as long. bwait() currently has
no upper limit on how long it waits. If the wait exceeds 30 minutes for a
sleeping thread, or 15 minutes for a thread blocked on a turnstile, deadlkres
panics the kernel on the assumption that there is a deadlock.
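
To make the numbers concrete, here is a small userland sketch of the
arithmetic. It assumes the two limits above correspond to the stock values of
the debug.deadlkres.slptime_threshold and debug.deadlkres.blktime_threshold
sysctls (1800 and 900 seconds); the helper names are hypothetical and hz is
passed in explicitly rather than read from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed deadlkres defaults, in seconds (30 and 15 minutes). */
enum {
	SLPTIME_THRESHOLD_S = 1800,	/* sleeping thread */
	BLKTIME_THRESHOLD_S = 900	/* thread blocked on a turnstile */
};

/* Hypothetical helper: convert seconds to ticks for a given tick rate. */
static int
secs_to_ticks(int secs, int hz)
{
	assert(secs >= 0 && hz > 0);
	return (secs * hz);
}

/*
 * Hypothetical helper: true if a wait bounded by timo_s seconds ends
 * before either deadlkres threshold.  A value of 0 means "no timeout"
 * and therefore never qualifies.
 */
static bool
timeout_beats_deadlkres(int timo_s)
{
	return (timo_s > 0 &&
	    timo_s < BLKTIME_THRESHOLD_S &&
	    timo_s < SLPTIME_THRESHOLD_S);
}
```

Any bound that is meant to pre-empt deadlkres has to sit below the smaller of
the two thresholds, so with these defaults it must be under 900 seconds.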

We could perhaps handle such lengthy buffer I/O operations gracefully by
adding a timeout to bwait(), say 10 minutes. If a buffer I/O has not
completed within a few minutes, it will probably never complete and/or is
slowing down the entire system, so it is better to fail such faulty I/O
operations.

For now, since we have seen these instances only with BIO operations, my
patch sets this timeout only from bufwait(). Please find the patch attached.
I am not sure whether 10 minutes is a good upper limit for every bwait()
scenario. If it is, we could simply change the msleep() call in bwait() to
use a 10-minute upper limit by default.

Please let me know whether this approach works for all use cases. If not, is
there a better alternative? And is 10 minutes a reasonable timeout?

Thanks and Regards,

Kumara

--00000000000098c36d0619a57dd3--
--00000000000098c36f0619a57dd5
Content-Type: application/octet-stream; name="bwait_timeout.patch"
Content-Disposition: attachment; filename="bwait_timeout.patch"
Content-Transfer-Encoding: base64
Content-ID: <f_lwsua22p0>
X-Attachment-Id: f_lwsua22p0

ZGlmZiAtLWdpdCBhL3N5cy9rZXJuL3Zmc19iaW8uYyBiL3N5cy9rZXJuL3Zmc19iaW8uYwppbmRl
eCBiNTQ2NmZiMmNkNTMuLjc4OGFiMmUxYjdmMSAxMDA2NDQKLS0tIGEvc3lzL2tlcm4vdmZzX2Jp
by5jCisrKyBiL3N5cy9rZXJuL3Zmc19iaW8uYwpAQCAtNzUsNiArNzUsNyBAQAogI2luY2x1ZGUg
PHN5cy9zbXAuaD4KICNpbmNsdWRlIDxzeXMvc3lzY3RsLmg+CiAjaW5jbHVkZSA8c3lzL3N5c2Nh
bGxzdWJyLmg+CisjaW5jbHVkZSA8c3lzL3N5c2xvZy5oPgogI2luY2x1ZGUgPHN5cy92bWVtLmg+
CiAjaW5jbHVkZSA8c3lzL3ZtbWV0ZXIuaD4KICNpbmNsdWRlIDxzeXMvdm5vZGUuaD4KQEAgLTM5
MCw2ICszOTEsOSBAQCBzdGF0aWMgaW50IGJkaXJ0eXdhaXQ7CiAvKiBNYXhpbXVtIG51bWJlciBv
ZiBidWZmZXIgZG9tYWlucy4gKi8KICNkZWZpbmUJQlVGX0RPTUFJTlMJOAogCisvKiBUaW1lb3V0
IGZvciBidWZmZXIgSS9POiAxMCBtaW5zICovCisjZGVmaW5lIEJUSU1FT1VUICAgICAgIDYwMCAq
IGh6CisKIHN0cnVjdCBidWZkb21haW5zZXQgYmRsb2RpcnR5OwkJLyogRG9tYWlucyA+IGxvZGly
dHkgKi8KIHN0cnVjdCBidWZkb21haW5zZXQgYmRoaWRpcnR5OwkJLyogRG9tYWlucyA+IGhpZGly
dHkgKi8KIApAQCAtNDUzNyw5ICs0NTQxLDkgQEAgaW50CiBidWZ3YWl0KHN0cnVjdCBidWYgKmJw
KQogewogCWlmIChicC0+Yl9pb2NtZCA9PSBCSU9fUkVBRCkKLQkJYndhaXQoYnAsIFBSSUJJTywg
ImJpb3JkIik7CisJCWJ3YWl0KGJwLCBQUklCSU8sICJiaW9yZCIsIEJUSU1FT1VUKTsKIAllbHNl
Ci0JCWJ3YWl0KGJwLCBQUklCSU8sICJiaW93ciIpOworCQlid2FpdChicCwgUFJJQklPLCAiYmlv
d3IiLCBCVElNRU9VVCk7CiAJaWYgKGJwLT5iX2ZsYWdzICYgQl9FSU5UUikgewogCQlicC0+Yl9m
bGFncyAmPSB+Ql9FSU5UUjsKIAkJcmV0dXJuIChFSU5UUik7CkBAIC01MTIxLDE0ICs1MTI1LDIy
IEBAIGJkb25lKHN0cnVjdCBidWYgKmJwKQogfQogCiB2b2lkCi1id2FpdChzdHJ1Y3QgYnVmICpi
cCwgdV9jaGFyIHByaSwgY29uc3QgY2hhciAqd2NoYW4pCitid2FpdChzdHJ1Y3QgYnVmICpicCwg
dV9jaGFyIHByaSwgY29uc3QgY2hhciAqd2NoYW4sIGludCB0aW1vKQogewogCXN0cnVjdCBtdHgg
Km10eHA7CisJaW50IHJldDsKIAogCW10eHAgPSBtdHhfcG9vbF9maW5kKG10eHBvb2xfc2xlZXAs
IGJwKTsKIAltdHhfbG9jayhtdHhwKTsKLQl3aGlsZSAoKGJwLT5iX2ZsYWdzICYgQl9ET05FKSA9
PSAwKQotCQltc2xlZXAoYnAsIG10eHAsIHByaSwgd2NoYW4sIDApOworCXdoaWxlICgoYnAtPmJf
ZmxhZ3MgJiBCX0RPTkUpID09IDApIHsKKwkJcmV0ID0gbXNsZWVwKGJwLCBtdHhwLCBwcmksIHdj
aGFuLCB0aW1vKTsKKwkJaWYgKHJldCA9PSBFV09VTERCTE9DSykgeworCQkJbG9nIChMT0dfRVJS
LCAiJXM6IFdhaXRlZCB0b28gbG9uZyglZCkgZm9yIGEgYnVmZmVyIElPIHRvIGNvbXBsZXRlXG4i
LCBfX2Z1bmNfXywgdGltbyk7CisJCQlicC0+Yl9lcnJvciA9IEVUSU1FRE9VVDsKKwkJCWJwLT5i
X2ZsYWdzIHw9IEJJT19FUlJPUjsKKwkJCWJyZWFrOworCQl9CisJfQogCW10eF91bmxvY2sobXR4
cCk7CiB9CiAKZGlmZiAtLWdpdCBhL3N5cy9zeXMvYnVmLmggYi9zeXMvc3lzL2J1Zi5oCmluZGV4
IDcwZmIyODEyYzNiYS4uZWY0Yzk2NTYwYTU3IDEwMDY0NAotLS0gYS9zeXMvc3lzL2J1Zi5oCisr
KyBiL3N5cy9zeXMvYnVmLmgKQEAgLTYwMyw3ICs2MDMsNyBAQCB2b2lkCXBicmVsYm8oc3RydWN0
IGJ1ZiAqKTsKIHZvaWQJcGJyZWx2cChzdHJ1Y3QgYnVmICopOwogaW50CWFsbG9jYnVmKHN0cnVj
dCBidWYgKmJwLCBpbnQgc2l6ZSk7CiB2b2lkCXJlYXNzaWduYnVmKHN0cnVjdCBidWYgKik7Ci12
b2lkCWJ3YWl0KHN0cnVjdCBidWYgKiwgdV9jaGFyLCBjb25zdCBjaGFyICopOwordm9pZAlid2Fp
dChzdHJ1Y3QgYnVmICosIHVfY2hhciwgY29uc3QgY2hhciAqLCBpbnQpOwogdm9pZAliZG9uZShz
dHJ1Y3QgYnVmICopOwogCiB0eXBlZGVmIGRhZGRyX3QgKHZiZ19nZXRfbGJsa25vX3QpKHN0cnVj
dCB2bm9kZSAqLCB2bV9vb2Zmc2V0X3QpOwpkaWZmIC0tZ2l0IGEvc3lzL3Vmcy9mZnMvZmZzX3Jh
d3JlYWQuYyBiL3N5cy91ZnMvZmZzL2Zmc19yYXdyZWFkLmMKaW5kZXggM2E0MTVkNzY2MzAzLi5k
YjNiNmM2ZjM2YmYgMTAwNjQ0Ci0tLSBhL3N5cy91ZnMvZmZzL2Zmc19yYXdyZWFkLmMKKysrIGIv
c3lzL3Vmcy9mZnMvZmZzX3Jhd3JlYWQuYwpAQCAtMzE0LDcgKzMxNCw3IEBAIGZmc19yYXdyZWFk
X21haW4oc3RydWN0IHZub2RlICp2cCwKIAkJCX0KIAkJfQogCQkKLQkJYndhaXQoYnAsIFBSSUJJ
TywgInJhd3JkIik7CisJCWJ3YWl0KGJwLCBQUklCSU8sICJyYXdyZCIsIDApOwogCQl2dW5tYXBi
dWYoYnApOwogCQkKIAkJaW9sZW4gPSBicC0+Yl9iY291bnQgLSBicC0+Yl9yZXNpZDsKQEAgLTM4
MSw3ICszODEsNyBAQCBmZnNfcmF3cmVhZF9tYWluKHN0cnVjdCB2bm9kZSAqdnAsCiAJCXVtYV96
ZnJlZShmZnNyYXdfcGJ1Zl96b25lLCBicCk7CiAJfQogCWlmIChuYnAgIT0gTlVMTCkgewkJCS8q
IFJ1biBkb3duIHJlYWRhaGVhZCBidWZmZXIgKi8KLQkJYndhaXQobmJwLCBQUklCSU8sICJyYXdy
ZCIpOworCQlid2FpdChuYnAsIFBSSUJJTywgInJhd3JkIiwgMCk7CiAJCXZ1bm1hcGJ1ZihuYnAp
OwogCQlwYnJlbHZwKG5icCk7CiAJCXVtYV96ZnJlZShmZnNyYXdfcGJ1Zl96b25lLCBuYnApOwpk
aWZmIC0tZ2l0IGEvc3lzL3ZtL3N3YXBfcGFnZXIuYyBiL3N5cy92bS9zd2FwX3BhZ2VyLmMKaW5k
ZXggZWUyMzZjN2YzOTg4Li5lOTEyYWEwODVmYzcgMTAwNjQ0Ci0tLSBhL3N5cy92bS9zd2FwX3Bh
Z2VyLmMKKysrIGIvc3lzL3ZtL3N3YXBfcGFnZXIuYwpAQCAtMTU5NSw3ICsxNTk1LDcgQEAgc3dh
cF9wYWdlcl9wdXRwYWdlcyh2bV9vYmplY3RfdCBvYmplY3QsIHZtX3BhZ2VfdCAqbWEsIGludCBj
b3VudCwKIAkJLyoKIAkJICogV2FpdCBmb3IgdGhlIHN5bmMgSS9PIHRvIGNvbXBsZXRlLgogCQkg
Ki8KLQkJYndhaXQoYnAsIFBWTSwgInN3d3J0Iik7CisJCWJ3YWl0KGJwLCBQVk0sICJzd3dydCIs
IDApOwogCiAJCS8qCiAJCSAqIE5vdyB0aGF0IHdlIGFyZSB0aHJvdWdoIHdpdGggdGhlIGJwLCB3
ZSBjYW4gY2FsbCB0aGUKZGlmZiAtLWdpdCBhL3N5cy92bS92bm9kZV9wYWdlci5jIGIvc3lzL3Zt
L3Zub2RlX3BhZ2VyLmMKaW5kZXggZDMyZmVjODQ1MDQzLi5hNmZjZTZiMzQ1ZWQgMTAwNjQ0Ci0t
LSBhL3N5cy92bS92bm9kZV9wYWdlci5jCisrKyBiL3N5cy92bS92bm9kZV9wYWdlci5jCkBAIC03
MDcsNyArNzA3LDcgQEAgdm5vZGVfcGFnZXJfaW5wdXRfc21sZnModm1fb2JqZWN0X3Qgb2JqZWN0
LCB2bV9wYWdlX3QgbSkKIAkJCWJwLT5iX2lvb2Zmc2V0ID0gZGJ0b2IoYnAtPmJfYmxrbm8pOwog
CQkJYnN0cmF0ZWd5KGJwKTsKIAotCQkJYndhaXQoYnAsIFBWTSwgInZuc3JkIik7CisJCQlid2Fp
dChicCwgUFZNLCAidm5zcmQiLCAwKTsKIAogCQkJaWYgKChicC0+Yl9pb2ZsYWdzICYgQklPX0VS
Uk9SKSAhPSAwKSB7CiAJCQkJS0FTU0VSVChicC0+Yl9lcnJvciAhPSAwLApAQCAtMTE2OCw3ICsx
MTY4LDcgQEAgdm5vZGVfcGFnZXJfZ2VuZXJpY19nZXRwYWdlcyhzdHJ1Y3Qgdm5vZGUgKnZwLCB2
bV9wYWdlX3QgKm0sIGludCBjb3VudCwKIAl9IGVsc2UgewogCQlicC0+Yl9pb2RvbmUgPSBiZG9u
ZTsKIAkJYnN0cmF0ZWd5KGJwKTsKLQkJYndhaXQoYnAsIFBWTSwgInZucmVhZCIpOworCQlid2Fp
dChicCwgUFZNLCAidm5yZWFkIiwgMCk7CiAJCWVycm9yID0gdm5vZGVfcGFnZXJfZ2VuZXJpY19n
ZXRwYWdlc19kb25lKGJwKTsKIAkJZm9yIChpID0gMDsgaSA8IGJwLT5iX25wYWdlczsgaSsrKQog
CQkJYnAtPmJfcGFnZXNbaV0gPSBOVUxMOwo=
--00000000000098c36f0619a57dd5--


