Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 26 Aug 2015 15:32:15 -0700
From:      John-Mark Gurney <jmg@funkthat.com>
To:        Chris Stankevitz <chris@stankevitz.com>
Cc:        freebsd-net@freebsd.org
Subject:   Re: ssh over WAN: TCP window too small
Message-ID:  <20150826223215.GS33167@funkthat.com>
In-Reply-To: <55DE2FDF.5030707@stankevitz.com>
References:  <55DCF080.7080208@stankevitz.com> <20150826010323.GN33167@funkthat.com> <55DD2A98.2010605@stankevitz.com> <20150826082457.GQ33167@funkthat.com> <55DE2FDF.5030707@stankevitz.com>

next in thread | previous in thread | raw e-mail | index | archive | help
Chris Stankevitz wrote this message on Wed, Aug 26, 2015 at 14:30 -0700:
> Thanks again.  I appreciate you teaching me "how to fish".  I basically 
> spent all morning reading kdump output.

No worries, glad you're learning...

> On 8/26/15 1:24 AM, John-Mark Gurney wrote:
>  > It really looks like we should set TCPRcvBufPoll by default on
>  > FreeBSD...
> 
> According to /etc/ssh/sshd_config, TCPRcvBufPoll defaults to true. 
> ktrace (thank you for showing me this) shows the ssh process 
> continuously polling SO_RCVBUF.
> 
> In my case with TCPRcvBufPoll on and HPNBufferSize and TCPRcvBuf unset, 
> ssh never sets SO_RCVBUF or SO_SNDBUF so presumably FreeBSD has complete 
> control over those values (you saw the same thing).
> 
> But perhaps all this is moot in my case because netstat shows that the 
> sender and receiver have sufficiently large SND/RCV buffers:  (can you 
> confirm my interpretation is correct?)
> 
> Sender   S-BMAX  662816
> Receiver R-BMAX  528416

Yes, this is correct... So definately not an issue w/ window size of
socket buffer size...

> >> I will experiment with TCPRcvBuf.
> >
> > It does look like the values are in KB
> 
> Yes, README.hpn says they are in KB.  However, even though it is 
> described in README.hpn, I cannot set TCPRcvBuf in sshd_config:
> 
> /etc/ssh/sshd_config: line 141: Bad configuration option: TCPRcvBuf
> /etc/ssh/sshd_config: line 141: Bad configuration option: TcpRcvBuf

Looks like:
https://www.psc.edu/index.php/networking/695

Says that TCPRcvBuf is client side only..  Which is odd, but not an
issue w/ FreeBSD...

> However, as I described above, I believe the buffer size is a red herring.

Agreed....

> ktrace shows pretty clearly what is happening.  ssh isn't even bothering 
> to read from /dev/zero until select(2) gives the green light.  And in my 
> case select(2) blocks ssh for 0.1 second for every ~30KB of data:
> 
> 
>   94146 ssh      6.592042 CALL  getsockopt(0x3,SOL_SOCKET,SO_RCVBUF
>   94146 ssh      6.592054 RET   getsockopt 0
>   94146 ssh      6.592065 CALL  select(0x7,0x803817168,0x803817178,0,0)
>   94146 ssh      6.685873 RET   select 1
>   94146 ssh      6.685907 CALL  read(0x3,0x7fffffffae40,0x2000)
>   94146 ssh      6.685921 GIO   fd 3 read 36 bytes
>    [ read of fd 3 snipped]
>   94146 ssh      6.685931 RET   read 36/0x24
>   94146 ssh      6.685950 CALL  getpid
>   94146 ssh      6.685962 RET   getpid 94146/0x16fc2
>   94146 ssh      6.685974 CALL  getpid
>   94146 ssh      6.685984 RET   getpid 94146/0x16fc2
>   94146 ssh      6.685996 CALL  getpid
>   94146 ssh      6.686006 RET   getpid 94146/0x16fc2
>   94146 ssh      6.686017 CALL  getpid
>   94146 ssh      6.686027 RET   getpid 94146/0x16fc2
>   94146 ssh      6.686091 CALL  getsockopt(0x3,SOL_SOCKET,SO_RCVBUF
>   94146 ssh      6.686103 RET   getsockopt 0
>   94146 ssh      6.686116 CALL  select(0x7,0x803817168,0x803817178,0,0)
>   94146 ssh      6.686128 RET   select 2
>   94146 ssh      6.686140 CALL  read(0x4,0x7fffffff6c70,0x4000)
>   94146 ssh      6.686154 GIO   fd 4 read 4096 bytes
>    [ read of stdin (/dev/zero) snipped)

It would be interesting to know how long from the read of stdin (and is
it really reading stdin in 4k blocks?  If so, that should be fixed)
to the write out the socket... Basicly, how long does it take to encrypt
the data...

> Note in the above the blocking call to select at time 6.592065 that 
> takes ~0.1 second.  This is the reason my connection is slow.  The 
> content appears to be encrypted... I presume it's an application-level 
> ssh ack.
> 
> BTW when I disable HPN (HPNdisabled=yes) that same select happens, but 
> it blocks for less time (~0.05 second) and roughly doubles my throughput.
> 
> > Also, don't forget that if you set this in .ssh/config, you only set
> > the client size recive buffer, not the server side, so you'd probably
> > need to add this to the server's sshd_config to enable it for server
> > receive side...
> 
> I understand.
> 
> > ktrace -i ssh <params>...
> 
> Thank you, this is awesome.  Is there a way for me to ktrace 'b' in this 
> example: `a | b | c`?  I tried `ktrace a | b | c` and `ktrace test.sh` 

When you do this, make sure you ktrace -i to inherit the ktrace flag
to the children...  In the case of a | b | c, just doing:
a | ktrace b | c

Should work too...

> (which included a|b|c) but neither seemed to work.  I'm using stream 
> redirection now but it doesn't afford me the bs control of dd.  Perhaps 
> named pipes is the solution.
> 
> > Oh, I forgot to ask to make sure that net.inet.tcp.{send,recv}buf_auto
> > is enabled:
> 
> Unfortunately it is enabled :-/
> 
> > Maybe a dump of your net.inet.tcp might also be helpful...
> 
> This should all be standard out-of-the-box:

[...]

> net.inet.tcp.sendspace: 32768
> net.inet.tcp.recvspace: 65536

Try increasing these and reporting back...  the buf_auto should override
those, but this may be limiting it...

> net.inet.tcp.cc.algorithm: newreno
> net.inet.tcp.cc.available: newreno

Another option could be to look at other cc algorithms...  See mod_cc(4)
for more info...

[...]

> >> S-BCNT  57344
> >
> > You were probably unlucky when you sampled this value, and caught it at
> > a bad time...  Also, look at how much CPU time ssh uses...  ssh can
> > introduce additional latency that isn't apparent from the network...
> 
> S-BCNT is always ~60KB when HPN is enabled.  This jives with my "ssh is 
> spending all of its time waiting on select(2)" theory.

So, I looked at what getsockopt SO_RCVBUF returns, and it returns:
                case SO_RCVBUF:
                        optval = so->so_rcv.sb_hiwat;

Which is NOT S-BMAX, so it looks like OpenSSH only ever gets 66052 bytes
for the buffer...

If it's decided to base it's waiting for ack on SO_RCVBUF, then this
is probably where the issue is...

Try setting HPNBufferSize to something resonably large, like 1MB, or
above your bandwidth-delay product to see if this will make a difference..
Per:
  Conditions: HPNBufferSize SET, TCPRcvBufPoll enabled, TCPRcvBuf NOT Set
  Result: HPN Buffer Size = grows to HPNBufferSize
    The buffer will grow up to the maximum size specified here. 

This could be an interaction w/ the HPN buffering and the socket buffer
size...  As the receive buffer never grows large, HPN can't buffer enough
data to meet the bandwidth product delay, even though it should be able
to..

> > It's very possible that we don't set any of these values, so what
> > happens is that ssh reads the value of the receive buffer at startup,
> > which is 64k or so, and only does buffering in that size..  Then you
> > end up w/ a latency not of your network, but of the speed at which
> > your computer can encrypt at...  Just a thought, but you could also
> > measure latency between writes using ktrace to help figure this
> > out...
> 
> Yes, I believe something like this is happening now.
> 
> Thank you again for your help... this thread is proving to me one of 
> those quantum leap moments for me in terms of FreeBSD knowledge.

Glad I could be of help.

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20150826223215.GS33167>