Date:      Thu, 3 Sep 2009 19:10:08 GMT
From:      Burt Rosenberg <burt@cs.miami.edu>
To:        freebsd-net@FreeBSD.org
Subject:   Re: kern/130628: [nfs] NFS / rpc.lockd deadlock on 7.1-R
Message-ID:  <200909031910.n83JA8QA018090@freefall.freebsd.org>

The following reply was made to PR kern/130628; it has been noted by GNATS.

From: Burt Rosenberg <burt@cs.miami.edu>
To: bug-followup@FreeBSD.org, Joe Marcus Clarke <marcus@marcuscom.com>
Cc: bvowk@math.ualberta.ca
Subject: Re: kern/130628: [nfs] NFS / rpc.lockd deadlock on 7.1-R
Date: Thu, 3 Sep 2009 14:40:24 -0400

 --000e0cd518fc7a0bcd0472b0b7ce
 Content-Type: multipart/alternative; boundary=000e0cd518fc7a0bbf0472b0b7cc
 
 --000e0cd518fc7a0bbf0472b0b7cc
 Content-Type: text/plain; charset=ISO-8859-1
 
 It seems that the problem reported in:
 
 http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/130628
 
 also appears in 7.2-R-p3. With this kernel, against Fedora 8 clients such as:
 
 Linux prism09.cs.miami.edu 2.6.26.8-57.fc8 #1 SMP Thu Dec 18 18:59:49 EST
 2008 x86_64 x86_64 x86_64 GNU/Linux
 
 which mount home directories from the FreeBSD server over NFS (TCP), the
 server becomes unresponsive on the network during a client's graphical
 login.
 
 Applying the patch given in the PR
 http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/130628 seems at present to
 fix the problem. Under a stock 7.2-R-p3 kernel we can reproduce the problem
 within a few minutes. Under the same kernel with the patches described in
 the PR, supplied as diffs against the current source, we have not yet seen
 the problem.
 
 When the problem appears, the server cannot be pinged, and other network
 connections are halted.
 
 On the server, for instance, top shows:
 
 Proc        state    pri
 ------------------------
 rpc.lockd   *tcpin   -68
 nfsd        -          4
 rpcbind     select    44
 ntpd        select    44
 nfsd        select    44
 ... etc...
 
 
 Also,
 
 ./lockd restart
 Stopping lockd.
 Waiting for PIDS: 1114, 1114, 1114, 1114,....
 
 kill -9 1114 is also ineffective.
 
 So it seems to be something spinning in lockd.
 
 I think this is a serious issue and would like to see it resolved. Our setup
 is available if you would like to send instrumented code. I attach diffs.
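
 For readers who do not decode the attachments, here is a minimal user-space
 sketch of the locking pattern that the attached svc.c.diff and svc_dg.c.diff
 appear to introduce: xprt_inactive() is split into a wrapper that takes the
 pool lock and an xprt_inactive_locked() variant for callers that already
 hold it, and the transport is marked inactive only after re-checking, under
 that lock, that the socket has nothing to read. This is an illustration
 only, not the kernel code. The pthread mutex, the struct layouts, the
 readable flag (standing in for soreadable()), and xprt_idle_check() are all
 assumptions made for the sketch.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for SVCPOOL and SVCXPRT (not the kernel layouts). */
struct pool {
    pthread_mutex_t lock;      /* plays the role of pool->sp_lock */
    int active_count;          /* number of transports marked active */
};

struct xprt {
    struct pool *pool;
    bool active;
    bool readable;             /* plays the role of soreadable(xp_socket) */
};

/* Mark the transport inactive. The caller must already hold pool->lock. */
static void xprt_inactive_locked(struct xprt *x)
{
    if (x->active) {
        x->active = false;
        x->pool->active_count--;
    }
}

/* Wrapper for callers that do not hold the pool lock. */
static void xprt_inactive(struct xprt *x)
{
    pthread_mutex_lock(&x->pool->lock);
    xprt_inactive_locked(x);
    pthread_mutex_unlock(&x->pool->lock);
}

/*
 * Re-check readiness while holding the pool lock, so that data arriving
 * between "nothing to read" and "mark inactive" cannot be missed.
 * Returns true if the transport was deactivated.
 */
static bool xprt_idle_check(struct xprt *x)
{
    bool deactivated = false;

    pthread_mutex_lock(&x->pool->lock);
    if (!x->readable) {
        xprt_inactive_locked(x);
        deactivated = true;
    }
    pthread_mutex_unlock(&x->pool->lock);
    return deactivated;
}
```

 In this sketch a receive path would call xprt_idle_check() after draining
 the socket, while code that holds no locks would call xprt_inactive().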
 
 
 --000e0cd518fc7a0bbf0472b0b7cc--
 --000e0cd518fc7a0bcd0472b0b7ce
 Content-Type: application/octet-stream; name="svc.c.diff"
 Content-Disposition: attachment; filename="svc.c.diff"
 Content-Transfer-Encoding: base64
 X-Attachment-Id: f_fz5u5alz0
 
 MTgxYzE4MQo8IHhwcnRfaW5hY3RpdmUoU1ZDWFBSVCAqeHBydCkKLS0tCj4geHBydF9pbmFjdGl2
 ZV9sb2NrZWQoU1ZDWFBSVCAqeHBydCkKMTg1LDE4NmQxODQKPCAJbXR4X2xvY2soJnBvb2wtPnNw
 X2xvY2spOwo8IAoxOTFjMTg5CjwgCXdha2V1cCgmcG9vbC0+c3BfYWN0aXZlKTsKLS0tCj4gfQox
 OTJhMTkxLDE5Nwo+IHZvaWQKPiB4cHJ0X2luYWN0aXZlKFNWQ1hQUlQgKnhwcnQpCj4gewo+IAlT
 VkNQT09MICpwb29sID0geHBydC0+eHBfcG9vbDsKPiAKPiAJbXR4X2xvY2soJnBvb2wtPnNwX2xv
 Y2spOwo+IAl4cHJ0X2luYWN0aXZlX2xvY2tlZCh4cHJ0KTsK
 --000e0cd518fc7a0bcd0472b0b7ce
 Content-Type: application/octet-stream; name="svc.h.diff"
 Content-Disposition: attachment; filename="svc.h.diff"
 Content-Transfer-Encoding: base64
 X-Attachment-Id: f_fz5u5am51
 
 NDlhNTAKPiAjaW5jbHVkZSA8c3lzL19zeC5oPgoxMzFjMTMyCjwgCXN0cnVjdCBtdHgJeHBfbG9j
 azsKLS0tCj4gCXN0cnVjdCBzeCAgICAgICB4cF9sb2NrOwozMzRhMzM2Cj4gZXh0ZXJuIHZvaWQg
 ICAgIHhwcnRfaW5hY3RpdmVfbG9ja2VkKFNWQ1hQUlQgKik7Cg==
 --000e0cd518fc7a0bcd0472b0b7ce
 Content-Type: application/octet-stream; name="svc_dg.c.diff"
 Content-Disposition: attachment; filename="svc_dg.c.diff"
 Content-Transfer-Encoding: base64
 X-Attachment-Id: f_fz5u5am92
 
 NTVhNTYKPiAjaW5jbHVkZSA8c3lzL3N4Lmg+CjEyMWMxMjIKPCAJbXR4X2luaXQoJnhwcnQtPnhw
 X2xvY2ssICJ4cHJ0LT54cF9sb2NrIiwgTlVMTCwgTVRYX0RFRik7Ci0tLQo+IAlzeF9pbml0KCZ4
 cHJ0LT54cF9sb2NrLCAieHBydC0+eHBfbG9jayIpOwoxNjNhMTY1LDE2Nwo+IAlpZiAoc29yZWFk
 YWJsZSh4cHJ0LT54cF9zb2NrZXQpKQo+IAkJZXR1cm4gKFhQUlRfTU9SRVJFUVMpOwo+IAoxNzRh
 MTc5LDE4Mwo+IAkvKiAKPiAJICogU2VyaWFsaXNlIGFjY2VzcyB0byB0aGUgc29ja2V0Lgo+IAkg
 Ki8KPiAJc3hfeGxvY2soJnhwcnQtPnhwX2xvY2spOwo+IAoxOTAsMTkxZDE5OAo8IAltdHhfbG9j
 aygmeHBydC0+eHBfbG9jayk7CjwgCjE5OSwyMDBjMjA2LDIxMAo8IAkJeHBydF9pbmFjdGl2ZSh4
 cHJ0KTsKPCAJCW10eF91bmxvY2soJnhwcnQtPnhwX2xvY2spOwotLS0KPiAJCW10eF9sb2NrKCZ4
 cHJ0LT54cF9wb29sLT5zcF9sb2NrKTsKPiAJCWlmICghc29yZWFkYWJsZSh4cHJ0LT54cF9zb2Nr
 ZXQpKQo+IAkJCXhwcnRfaW5hY3RpdmVfbG9ja2VkKHhwcnQpOwo+IAkJbXR4X3VubG9jaygmeHBy
 dC0+eHBfcG9vbC0+c3BfbG9jayk7Cj4gCQlzeF94dW5sb2NrKCZ4cHJ0LT54cF9sb2NrKTsKMjEx
 YzIyMQo8IAkJbXR4X3VubG9jaygmeHBydC0+eHBfbG9jayk7Ci0tLQo+IAkJc3hfeHVubG9jaygm
 eHBydC0+eHBfbG9jayk7CjIxNWMyMjUKPCAJbXR4X3VubG9jaygmeHBydC0+eHBfbG9jayk7Ci0t
 LQo+IAlzeF94dW5sb2NrKCZ4cHJ0LT54cF9sb2NrKTsKMzA0YzMxNAo8IAltdHhfZGVzdHJveSgm
 eHBydC0+eHBfbG9jayk7Ci0tLQo+IAlzeF9kZXN0cm95KCZ4cHJ0LT54cF9sb2NrKTsKMzMxZDM0
 MAo8IAltdHhfbG9jaygmeHBydC0+eHBfbG9jayk7CjMzM2QzNDEKPCAJbXR4X3VubG9jaygmeHBy
 dC0+eHBfbG9jayk7Cg==
 --000e0cd518fc7a0bcd0472b0b7ce
 Content-Type: application/octet-stream; name="svc_vc.c.diff"
 Content-Disposition: attachment; filename="svc_vc.c.diff"
 Content-Transfer-Encoding: base64
 X-Attachment-Id: f_fz5u5amc3
 
 NTZhNTcKPiAjaW5jbHVkZSA8c3lzL3N4Lmg+CjE0NWMxNDYKPCAJbXR4X2luaXQoJnhwcnQtPnhw
 X2xvY2ssICJ4cHJ0LT54cF9sb2NrIiwgTlVMTCwgTVRYX0RFRik7Ci0tLQo+IAlzeF9pbml0KCZ4
 cHJ0LT54cF9sb2NrLCAieHBydC0+eHBfbG9jayIpOwoyMjJjMjIzLDIyNAo8IAltdHhfaW5pdCgm
 eHBydC0+eHBfbG9jaywgInhwcnQtPnhwX2xvY2siLCBOVUxMLCBNVFhfREVGKTsKLS0tCj4gCXN4
 X2luaXQoJnhwcnQtPnhwX2xvY2ssICJ4cHJ0LT54cF9sb2NrIik7Cj4gCjI1OGMyNjAKPCAJbXR4
 X2xvY2soJnhwcnQtPnhwX2xvY2spOwotLS0KPiAJc3hfeGxvY2soJnhwcnQtPnhwX2xvY2spOwoy
 NjBjMjYyLDI2Mwo8IAltdHhfdW5sb2NrKCZ4cHJ0LT54cF9sb2NrKTsKLS0tCj4gCXN4X3h1bmxv
 Y2soJnhwcnQtPnhwX2xvY2spOwo+IAozNTljMzYyCjwgCW10eF9sb2NrKCZ4cHJ0LT54cF9sb2Nr
 KTsKLS0tCj4gCXN4X3hsb2NrKCZ4cHJ0LT54cF9sb2NrKTsKMzY0LDM2NWMzNjcsMzczCjwgCQl4
 cHJ0X2luYWN0aXZlKHhwcnQpOwo8IAkJbXR4X3VubG9jaygmeHBydC0+eHBfbG9jayk7Ci0tLQo+
 IAkJQUNDRVBUX0xPQ0soKTsKPiAJCW10eF9sb2NrKCZ4cHJ0LT54cF9wb29sLT5zcF9sb2NrKTsK
 PiAJCWlmIChUQUlMUV9FTVBUWSgmeHBydC0+eHBfc29ja2V0LT5zb19jb21wKSkKPiAJCQl4cHJ0
 X2luYWN0aXZlX2xvY2tlZCh4cHJ0KTsKPiAJCW10eF91bmxvY2soJnhwcnQtPnhwX3Bvb2wtPnNw
 X2xvY2spOwo+IAkJQUNDRVBUX1VOTE9DSygpOwo+IAkJc3hfeHVubG9jaygmeHBydC0+eHBfbG9j
 ayk7CjM3NmMzODQKPCAJCW10eF91bmxvY2soJnhwcnQtPnhwX2xvY2spOwotLS0KPiAJCXN4X3h1
 bmxvY2soJnhwcnQtPnhwX2xvY2spOwozODBjMzg4CjwgCW10eF91bmxvY2soJnhwcnQtPnhwX2xv
 Y2spOwotLS0KPiAJc3hfeHVubG9jaygmeHBydC0+eHBfbG9jayk7CjQyNWM0MzMKPCAJbXR4X2Rl
 c3Ryb3koJnhwcnQtPnhwX2xvY2spOwotLS0KPiAJc3hfZGVzdHJveSgmeHBydC0+eHBfbG9jayk7
 CjQ5MWE1MDAKPiAJCXN4X3hsb2NrKCZ4cHJ0LT54cF9sb2NrKTsKNDk2YTUwNgo+IAkJc3hfeHVu
 bG9jaygmeHBydC0+eHBfbG9jayk7CjUwMGE1MTEsNTEzCj4gCWlmIChzb3JlYWRhYmxlKHhwcnQt
 PnhwX3NvY2tldCkpCj4gCQlyZXR1cm4gKFhQUlRfTU9SRVJFUVMpOwo+IAo1MTFhNTI1LDUyNgo+
 IAlzeF94bG9jaygmeHBydC0+eHBfbG9jayk7Cj4gCjU4NmE2MDIKPiAJCQkJc3hfeHVubG9jaygm
 eHBydC0+eHBfbG9jayk7CjYxNGQ2MjkKPCAJCW10eF9sb2NrKCZ4cHJ0LT54cF9sb2NrKTsKNjI0
 LDYyNWM2MzksNjQzCjwgCQkJeHBydF9pbmFjdGl2ZSh4cHJ0KTsKPCAJCQltdHhfdW5sb2NrKCZ4
 cHJ0LT54cF9sb2NrKTsKLS0tCj4gCQkJbXR4X2xvY2soJnhwcnQtPnhwX3Bvb2wtPnNwX2xvY2sp
 Owo+IAkJCWlmICghc29yZWFkYWJsZSh4cHJ0LT54cF9zb2NrZXQpKQo+IAkJCQl4cHJ0X2luYWN0
 aXZlX2xvY2tlZCh4cHJ0KTsKPiAJCQltdHhfdW5sb2NrKCZ4cHJ0LT54cF9wb29sLT5zcF9sb2Nr
 KTsKPiAJCQlzeF94dW5sb2NrKCZ4cHJ0LT54cF9sb2NrKTsKNjM3YzY1NQo8IAkJCW10eF91bmxv
 Y2soJnhwcnQtPnhwX2xvY2spOwotLS0KPiAJCQlzeF94dW5sb2NrKCZ4cHJ0LT54cF9sb2NrKTsK
 NjQ0YTY2Mwo+IAkJCXhwcnRfaW5hY3RpdmUoeHBydCk7CjY0NmM2NjUKPCAJCQltdHhfdW5sb2Nr
 KCZ4cHJ0LT54cF9sb2NrKTsKLS0tCj4gCQkJc3hfeHVubG9jaygmeHBydC0+eHBfbG9jayk7CjY1
 NCw2NTVkNjcyCjwgCjwgCQltdHhfdW5sb2NrKCZ4cHJ0LT54cF9sb2NrKTsKNzQyZDc1OAo8IAlt
 dHhfbG9jaygmeHBydC0+eHBfbG9jayk7Cjc0NGQ3NTkKPCAJbXR4X3VubG9jaygmeHBydC0+eHBf
 bG9jayk7Cg==
 --000e0cd518fc7a0bcd0472b0b7ce--