Date: Wed, 21 Nov 2012 17:27:32 +0200
From: Nikolay Denev <ndenev@gmail.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: "freebsd-fs@freebsd.org" <freebsd-fs@FreeBSD.ORG>
Subject: Re: nfsd hang in sosend_generic
Message-ID: <8C72CE97-6D19-4847-9A89-DF8A05B984DD@gmail.com>
In-Reply-To: <1183657468.630412.1353506493075.JavaMail.root@erie.cs.uoguelph.ca>
References: <1183657468.630412.1353506493075.JavaMail.root@erie.cs.uoguelph.ca>
On Nov 21, 2012, at 4:01 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:

> Nikolay Denev wrote:
>> Hello,
>>
>> First of all, I'm not sure if this is actually an nfsd issue and not
>> a network stack issue.
>>
>> I've just had nfsd hang in an unkillable state while doing some I/O
>> from a Linux host running an Oracle DB using Oracle's Direct NFS.
>>
>> I had been watching for some time how the Direct NFS client loads
>> the NFS server differently: with the Linux kernel NFS client I see a
>> single TCP session to port 2049 and all traffic goes there, while
>> the Direct NFS client is much more aggressive and creates multiple
>> TCP sessions, and was often able to generate pretty big Send/Recv-Qs
>> on FreeBSD's side. I'm mentioning this as it is probably related.
>>
> I don't know anything about the Oracle client, but it might be
> creating new TCP connections to try to recover from a "hung" state.
> Your netstat for the client below shows that there are several
> ESTABLISHED TCP connections with large receive queues. I wouldn't
> expect to see this, and it suggests that the Oracle client isn't
> receiving/reading data off the TCP socket for some reason. Once it
> isn't receiving/reading an RPC reply off the TCP socket, it might
> create a new one to attempt a retry of the RPC. (NFSv4 requires that
> any retry of an RPC be done on a new TCP connection. Although that
> requirement doesn't exist for NFSv3, it would probably be considered
> "good practice" and will happen if NFSv3 and NFSv4 share the same
> RPC socket handling code.)
>
>> Here's the procstat -kk of the hung nfsd process:
>>
>> [... snipped huge procstat output ...]
>>
> It appears that all the nfsd threads are trying to send RPC replies
> back to the client and are stuck there. As you can see below, the
> send queues for the TCP sockets are big, so the data isn't getting
> through to the client. The large receive queues in the ESTABLISHED
> connections on the Linux client suggest that Oracle isn't taking
> data off the TCP socket for some reason, which would result in this
> once the send window is filled. At least that's my rusty old
> understanding of TCP. (That would hint at an Oracle client bug, but
> I don't know anything about the Oracle client.)
>
> Why? Well, I can't even guess, but a few things you might try are:
> - disabling TSO and rx/tx checksum offload on the FreeBSD server's
>   network interface(s).
> - try a different type of network card, if you have one handy.
>   I doubt these will make a difference, since the large receive
>   queues for the ESTABLISHED TCP connections on the Linux client
>   suggest that the data is getting through. Still, it might be worth
>   a try, since there might be one packet that isn't getting through
>   and that is causing issues for the Oracle client.
> - if you can do it, try switching the Oracle client mounts to UDP.
>   (For UDP, you want to start with rsize and wsize no bigger than
>   16384 and then be prepared to make them smaller if the "fragments
>   dropped due to timeout" counter becomes non-zero for UDP when you
>   do a "netstat -s".)
> - there might be an NFS over TCP bug in the Oracle client.
> - when it is stuck again, do a "vmstat -z" and "vmstat -m" to see if
>   there is a large "InUse" for anything.
>   - in particular, check mbuf clusters.
>
> Also, you could try capturing packets when it happens and look at
> them in Wireshark to see if/what related traffic is going on the
> wire. Focus on the TCP layer as well as NFS.
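
For anyone hitting this thread later, the checks suggested above would
look roughly like the following ("em0" stands in for whatever interface
actually carries the NFS traffic, and the mount line is an untested
sketch of the suggested UDP options, not something we were able to try):

    # FreeBSD server: disable TSO and rx/tx checksum offload on the NIC
    # ("em0" is just a placeholder for the real interface name).
    ifconfig em0 -tso -rxcsum -txcsum

    # While the hang is in progress, look for anything with a huge
    # "InUse" count, mbuf clusters in particular.
    vmstat -z | grep -i mbuf
    netstat -m

    # Linux client: a UDP mount with conservative rsize/wsize, to be
    # shrunk if "netstat -s" starts showing fragments dropped after
    # timeout.  server:/export and /mnt are placeholders.
    mount -t nfs -o udp,rsize=16384,wsize=16384 server:/export /mnt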
Looking at it again, it really looks like a bug in the Oracle client,
so for now we've decided to disable the Direct NFS client and switch
back to the standard Linux kernel NFS client.
Unfortunately, testing with UDP won't be possible, as I think Oracle's
NFS client only supports TCP.

What is curious is why the kernel NFS mount from the Linux host was
also stuck because of the misbehaving userspace client.
I should have tested mounting from another host to see if the NFS
server would still respond, as this looks like a DoS attack on the NFS
server :)

Anyway, I've started collecting and graphing the output of netstat -m
and vmstat -z in case something like this happens again (a rough
sketch of the collector is in the P.S. below).

>> Here is the netstat output for the NFS sessions from the FreeBSD
>> server side:
>>
>> Proto Recv-Q   Send-Q Local Address    Foreign Address   (state)
>> tcp4       0 37215456 10.101.0.1.2049  10.101.0.2.42856  ESTABLISHED
>> tcp4       0 14561020 10.101.0.1.2049  10.101.0.2.62854  FIN_WAIT_1
>> tcp4       0  3068132 10.100.0.1.2049  10.100.0.2.9712   FIN_WAIT_1
>>
>> The Linux host sees this:
>>
>> tcp      1      0 10.101.0.2:9270   10.101.0.1:2049   CLOSE_WAIT
>> tcp 477940      0 10.100.0.2:9712   10.100.0.1:2049   ESTABLISHED
> ** These hint that the Oracle client isn't reading the socket
> for some reason. I'd guess that the send window is now full,
> so the data is backing up in the send queue on the server.
>> tcp      1      0 10.101.0.2:10588  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:12254  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:12438  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:17583  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:20285  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:20678  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:22892  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:28850  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:33851  10.100.0.1:2049   CLOSE_WAIT
>> tcp    165      0 10.100.0.2:34190  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:35643  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:39498  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:39724  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:40742  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:41674  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:42942  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:42956  10.100.0.1:2049   CLOSE_WAIT
>> tcp 477976      0 10.101.0.2:42856  10.101.0.1:2049   ESTABLISHED
>> tcp      1      0 10.100.0.2:42045  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:42048  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:43063  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:44771  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:49568  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:50813  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:51418  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:54507  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:57201  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:58553  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:59638  10.101.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.100.0.2:62289  10.100.0.1:2049   CLOSE_WAIT
>> tcp      1      0 10.101.0.2:61848  10.101.0.1:2049   CLOSE_WAIT
>> tcp 476952      0 10.101.0.2:62854  10.101.0.1:2049   ESTABLISHED
>>
>> Then I used "tcpdrop" on the FreeBSD side to drop the sessions, and
>> nfsd was able to die and be restarted.
>> During the "hung" period, all NFS mounts from the Linux host were
>> inaccessible, and I/O hung.
>>
>> The nfsd is running with the drc2/drc3 and lkshared patches from
>> Rick Macklem.
>>
> These shouldn't have any effect on the above, unless you've exhausted
> your mbuf clusters.
> Once you are out of mbuf clusters, I'm not sure what might happen
> within the lower layers (TCP -> network interface).
>
> Good luck with it, rick

Thank you for the response!

Cheers,
Nikolay
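
P.S. For the record, dropping one of the stuck sessions on the FreeBSD
side was just a matter of, e.g.:

    tcpdrop 10.101.0.1 2049 10.101.0.2 42856

and the collector mentioned above is nothing fancy, just a shell
snippet run from cron, along these lines (the log path is only an
example):

    #!/bin/sh
    # Append timestamped mbuf/zone statistics for later graphing.
    TS=`date +%s`
    OUT=/var/log/nfs-stats.log
    {
        echo "=== ${TS} netstat -m ==="
        netstat -m
        echo "=== ${TS} vmstat -z ==="
        vmstat -z
    } >> ${OUT}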
>> _______________________________________________
>> freebsd-fs@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"