Date:      Tue, 14 Apr 2015 20:36:36 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Adam Guimont <aguimont@tezzaron.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: NFSD high CPU usage
Message-ID:  <238081719.19055888.1429058196527.JavaMail.root@uoguelph.ca>
In-Reply-To: <551F072C.1000505@tezzaron.com>

Adam Guimont wrote:
> Rick Macklem wrote:
> > I can think of two explanations for this.
> > 1 - The server nfsd threads get confused when the TCP recv Q fills
> >      and start looping around.
> > OR
> > 2 - The client is sending massive #s of RPCs (or crap that is
> >      incomplete RPCs).
> >
> > To get a better idea w.r.t. what is going on, I'd suggest that
> > you capture packets (for a relatively short period) when the
> > server is 100% CPU busy.
> > # tcpdump -s 0 -w out.pcap host <nfs-client>
> > - run on the server should do it.
> > Then look at out.pcap in wireshark and see what the packets
> > look like. (wireshark understands NFS, whereas tcpdump doesn't)
> > If #1, I'd guess very little traffic (maybe TCP layer stuff),
> > if #2, I'd guess you'll see a lot of RPC requests or garbage
> > that isn't a valid request. (This latter case would suggest a
> > CentOS problem.)
> >
> > If you capture the packets but can't look at them in wireshark,
> > you could email me the packet capture as an attachment and I
> > can look at it after Apr. 10, when I get home.
> >
> > rick
> >
> 
> Thanks Rick,
> 
> I was able to capture this today while it was happening. The capture
> is for about 100 seconds. I took a look at it in wireshark and to me
> it appears like the #2 situation you were describing.
> 
> If you would like to confirm that, I've uploaded the pcap file here:
> 
> https://www.dropbox.com/s/pdhwj5z5tz7iwou/out.pcap.20150403
> 
Well, I took a look, but I'll admit I couldn't figure out much from it.
It appears that the TCP connection is in a pretty degraded state.
- FreeBSD is sending a whole bunch of TCP segments with 164 bytes of
  data (that appears to be the same for each one, but I didn't look at
  them closely). Each of them has a Window size == 0 (PUSH + ACK).
  --> Linux responds with an ACK and no data (which makes sense because
      of the 0 length Window)
- Eventually FreeBSD does open up the Window, after something like 1200
  of the above TCP segments.
  --> It is possible that all these segments are RPC replies to similar
      requests, but Wireshark just thinks they're all RPC continuations
      and doesn't recognize an RPC message. (I couldn't be bothered to
      try and decode one manually.)
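
For anyone who wants to put a number on the zero-window behaviour
described above without clicking through Wireshark, here is a minimal
sketch in Python (standard library only). It assumes a classic pcap
file (not pcapng, and not the nanosecond-timestamp variant), Ethernet
framing, and IPv4 with no VLAN tags. The function name
count_zero_window_segments is made up for illustration. It counts the
TCP segments from one sender and how many of them carry data while
advertising a window of 0.

```python
import struct


def count_zero_window_segments(pcap_bytes, sender_ip):
    """Return (total, zero_window_with_data) for TCP segments from sender_ip.

    Assumes classic pcap, Ethernet link type, IPv4, no VLAN tags.
    """
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian host
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian host
    else:
        raise ValueError("not a classic pcap file")

    # The link type is the last 4 bytes of the 24-byte global header.
    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet captures are handled")

    sender = bytes(int(part) for part in sender_ip.split("."))
    offset = 24
    total = 0
    zero_window = 0

    while offset + 16 <= len(pcap_bytes):
        # Per-packet record header: ts_sec, ts_usec, captured len, original len.
        _sec, _usec, caplen, _origlen = struct.unpack(
            endian + "IIII", pcap_bytes[offset:offset + 16])
        frame = pcap_bytes[offset + 16:offset + 16 + caplen]
        offset += 16 + caplen

        # Need Ethernet (14) + minimal IPv4 (20); EtherType must be IPv4.
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4
        # Protocol 6 is TCP; bytes 12..15 are the source address.
        if ip[9] != 6 or ip[12:16] != sender or len(ip) < ihl + 20:
            continue

        ip_total_len = struct.unpack("!H", ip[2:4])[0]
        tcp = ip[ihl:]
        data_offset = (tcp[12] >> 4) * 4
        window = struct.unpack("!H", tcp[14:16])[0]
        payload_len = ip_total_len - ihl - data_offset

        total += 1
        if window == 0 and payload_len > 0:
            zero_window += 1

    return total, zero_window
```

To try it, read the capture with open("out.pcap", "rb").read() and pass
the FreeBSD server's IPv4 address as sender_ip. A raw window field of 0
means a zero window regardless of any window scaling, so no scale
factor is needed for this particular check.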

One thing I see is that the Linux window size is 24576. If TSO is
enabled in FreeBSD's net device, you might try disabling TSO, in
case it is sending too much or somehow getting confused.

Other than that, I think it would take a packet capture just when
the trouble starts to try and figure out how things get messed up.

I'm not good enough w.r.t. TCP to have any idea what might be
happening. Maybe someone conversant with TCP can look at the trace?

rick

> I will continue running some tests and trying to gather as much data
> as I can.
> 
> Regards,
> 
> Adam Guimont
> 


