Date: Wed, 21 Nov 2012 17:27:32 +0200
From: Nikolay Denev <ndenev@gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: "freebsd-fs@freebsd.org" <freebsd-fs@FreeBSD.ORG>
Subject: Re: nfsd hang in sosend_generic
Message-ID: <8C72CE97-6D19-4847-9A89-DF8A05B984DD@gmail.com>
In-Reply-To: <1183657468.630412.1353506493075.JavaMail.root@erie.cs.uoguelph.ca>
References: <1183657468.630412.1353506493075.JavaMail.root@erie.cs.uoguelph.ca>
On Nov 21, 2012, at 4:01 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Nikolay Denev wrote:
>> Hello,
>>
>> First of all, I'm not sure if this is actually an nfsd issue and not
>> a network stack issue.
>>
>> I've just had nfsd hang in an unkillable state while doing some I/O
>> from a Linux host running an Oracle DB using Oracle's Direct NFS.
>>
>> I had been watching for some time how the Direct NFS client loads
>> the NFS server differently: with the Linux kernel NFS client I see a
>> single TCP session to port 2049 and all traffic goes there, while
>> the Direct NFS client is much more aggressive and creates multiple
>> TCP sessions, and was often able to generate pretty big Send/Recv-Qs
>> on FreeBSD's side. I'm mentioning this as it is probably related.
>>
> I don't know anything about the Oracle client, but it might be
> creating new TCP connections to try to recover from a "hung" state.
> Your netstat for the client below shows that there are several
> ESTABLISHED TCP connections with large receive queues. I wouldn't
> expect to see this, and it suggests that the Oracle client isn't
> receiving/reading data off the TCP socket for some reason. Once it
> isn't receiving/reading an RPC reply off the TCP socket, it might
> create a new one to attempt a retry of the RPC. (NFSv4 requires that
> any retry of an RPC be done on a new TCP connection. Although that
> requirement doesn't exist for NFSv3, it would probably be considered
> "good practice" and will happen if NFSv3 and NFSv4 share the same
> RPC socket handling code.)
>
>> Here's the procstat -kk of the hung nfsd process:
>>
>> [... snipped huge procstat output ...]
>>
> It appears that all the nfsd threads are trying to send RPC replies
> back to the client and are stuck there. As you can see below, the
> send queues for the TCP sockets are big, so the data isn't getting
> through to the client. The large receive queues in the ESTABLISHED
> connections on the Linux client suggest that Oracle isn't taking
> data off the TCP socket for some reason, which would result in this
> once the send window is filled. At least that's my rusty old
> understanding of TCP. (That would hint at an Oracle client bug, but
> I don't know anything about the Oracle client.)
>
> Why? Well, I can't even guess, but a few things you might try are:
> - disabling TSO and rx/tx checksum offload on the FreeBSD server's
>   network interface(s).
> - try a different type of network card, if you have one handy.
>   I doubt these will make a difference, since the large receive
>   queues for the ESTABLISHED TCP connections on the Linux client
>   suggest that the data is getting through. Still, it might be worth
>   a try, since there might be one packet that isn't getting through
>   and that is causing issues for the Oracle client.
> - if you can do it, try switching the Oracle client mounts to UDP.
>   (For UDP, you want to start with rsize and wsize no bigger than
>   16384 and then be prepared to make them smaller if the "fragments
>   dropped due to timeout" counter becomes non-zero for UDP when you
>   do a "netstat -s".)
> - there might be an NFS over TCP bug in the Oracle client.
> - when it is stuck again, do a "vmstat -z" and "vmstat -m" to see if
>   there is a large "InUse" for anything.
>   - in particular, check mbuf clusters.
>
> Also, you could try capturing packets when it happens and look at
> them in Wireshark to see if/what related traffic is going on the
> wire. Focus on the TCP layer as well as NFS.
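
For anyone hitting this thread later, the checks suggested above would
look roughly like the following ("em0" stands in for whatever interface
actually carries the NFS traffic, and the mount line is an untested
sketch of the suggested UDP options, not something we were able to try):

    # FreeBSD server: disable TSO and rx/tx checksum offload on the NIC
    # ("em0" is just a placeholder for the real interface name).
    ifconfig em0 -tso -rxcsum -txcsum

    # While the hang is in progress, look for anything with a huge
    # "InUse" count, mbuf clusters in particular.
    vmstat -z | grep -i mbuf
    netstat -m

    # Linux client: a UDP mount with conservative rsize/wsize, to be
    # shrunk if "netstat -s" starts showing fragments dropped after
    # timeout.  server:/export and /mnt are placeholders.
    mount -t nfs -o udp,rsize=16384,wsize=16384 server:/export /mnt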
Looking at it again, it really looks like a bug in the Oracle client,
so for now we've decided to disable the Direct NFS client and switch
back to the standard Linux kernel NFS client.
Unfortunately, testing with UDP won't be possible, as I think Oracle's
NFS client only supports TCP.

What is curious is why the kernel NFS mount from the Linux host was
also stuck because of the misbehaving userspace client.
I should have tested mounting from another host to see if the NFS
server would still respond, as this looks like a DoS attack on the NFS
server :)

Anyway, I've started collecting and graphing the output of netstat -m
and vmstat -z in case something like this happens again (a rough
sketch of the collector is in the P.S. below).

>> Here is the netstat output for the NFS sessions from the FreeBSD
>> server side:
>>
>> Proto Recv-Q   Send-Q Local Address    Foreign Address   (state)
>> tcp4       0 37215456 10.101.0.1.2049  10.101.0.2.42856  ESTABLISHED
>> tcp4       0 14561020 10.101.0.1.2049  10.101.0.2.62854  FIN_WAIT_1
>> tcp4       0  3068132 10.100.0.1.2049  10.100.0.2.9712   FIN_WAIT_1
>>
>> The Linux host sees this:
>>
>> tcp      1      0 10.101.0.2:9270   10.101.0.1:2049   CLOSE_WAIT
>> tcp 477940      0 10.100.0.2:9712   10.100.0.1:2049   ESTABLISHED
> ** These hint that the Oracle client isn't reading the socket
> for some reason. I'd guess that the send window is now full,
> so the data is backing up in the send queue on the server.
>> tcp      1      0 10.101.0.2:10588  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:12254  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:12438  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:17583  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:20285  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:20678  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:22892  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:28850  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:33851  10.100.0.1:2049   CLOSE_WAIT
>> tcp    165      0 10.100.0.2:34190  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:35643  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:39498  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:39724  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:40742  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:41674  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:42942  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:42956  10.100.0.1:2049   CLOSE_WAIT
>> tcp 477976      0 10.101.0.2:42856  10.101.0.1:2049   ESTABLISHED
>> tcp      1      0 10.100.0.2:42045  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:42048  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:43063  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:44771  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:49568  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:50813  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:51418  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:54507  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:57201  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:58553  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:59638  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:62289  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:61848  10.101.0.1:2049   CLOSE_WAIT
>> tcp 476952      0 10.101.0.2:62854  10.101.0.1:2049   ESTABLISHED
>>
>> Then I used "tcpdrop" on the FreeBSD side to drop the sessions, and
>> nfsd was able to die and be restarted.
>> During the "hung" period, all NFS mounts from the Linux host were
>> inaccessible, and I/O hung.
>>
>> The nfsd is running with the drc2/drc3 and lkshared patches from
>> Rick Macklem.
>>
> These shouldn't have any effect on the above, unless you've exhausted
> your mbuf clusters.
> Once you are out of mbuf clusters, I'm not sure what might happen
> within the lower layers (TCP -> network interface).
>
> Good luck with it, rick

Thank you for the response!

Cheers,
Nikolay
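
P.S. For the record, dropping one of the stuck sessions on the FreeBSD
side was just a matter of, e.g.:

    tcpdrop 10.101.0.1 2049 10.101.0.2 42856

and the collector mentioned above is nothing fancy, just a shell
snippet run from cron, along these lines (the log path is only an
example):

    #!/bin/sh
    # Append timestamped mbuf/zone statistics for later graphing.
    TS=`date +%s`
    OUT=/var/log/nfs-stats.log
    {
        echo "=== ${TS} netstat -m ==="
        netstat -m
        echo "=== ${TS} vmstat -z ==="
        vmstat -z
    } >> ${OUT}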
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"