Date:      Sat, 9 Aug 2014 21:53:55 -0700
From:      John-Mark Gurney <jmg@funkthat.com>
To:        Niu Zhixiong <kaiaixi@gmail.com>
Cc:        Michael Tuexen <Michael.Tuexen@lurchi.franken.de>, Bill Yuan <bycn82@gmail.com>, freebsd-net@freebsd.org
Subject:   Re: A problem on TCP in High RTT Environment.
Message-ID:  <20140810045355.GM83475@funkthat.com>
In-Reply-To: <CAOENNMA-dwPQr53bM4rzC=1eitoi-JAB4mCGx4zybFwUC=GMNg@mail.gmail.com>
References:  <20140809184232.GF83475@funkthat.com> <8AE1AC56-D52F-4F13-AAA3-BB96042B37DD@lurchi.franken.de> <20140809204500.GG83475@funkthat.com> <3F6BC212-4223-4AAC-8668-A27075DC55C2@lurchi.franken.de> <CAOENNMCPuiYS7LHwMfOczhZ4yisjGkpOmWzv2pcAoi9Hhzb7dw@mail.gmail.com> <20140810022350.GI83475@funkthat.com> <CAOENNMB3=FZx5kSHVPDPBTtMKbmYJ=c_XNMcuYuoLPe=6U%2Bkxg@mail.gmail.com> <CAOENNMARg36KH1Y%2B0wG8pd7sSf8XKnMf6g790_KiKaj3Mdwyjw@mail.gmail.com> <20140810033212.GL83475@funkthat.com> <CAOENNMA-dwPQr53bM4rzC=1eitoi-JAB4mCGx4zybFwUC=GMNg@mail.gmail.com>

Niu Zhixiong wrote this message on Sun, Aug 10, 2014 at 11:48 +0800:
> I am using an Intel I350-T4 NIC. LRO is disabled by default. By the way,
> when I run exactly the same test in a KVM-based virtual machine (virtio NIC),
> the results are the same.

Have you tried disabling TSO?  I asked about that in an earlier email, but
never heard back on whether it changed anything...

A lot of the trace looks like:
19:29:57.223574 IP 10.0.10.2.61010 > 10.0.10.3.9000: . 251521:257313(5792) ack 1 win 32783 <nop,nop,timestamp 51563557 1047294279>
19:29:57.223798 IP 10.0.10.3.9000 > 10.0.10.2.61010: . ack 257313 win 32745 <nop,nop,timestamp 1047294690 51563557>
19:29:57.225570 IP 10.0.10.2.61010 > 10.0.10.3.9000: . 257313:263105(5792) ack 1 win 32783 <nop,nop,timestamp 51563557 1047294279>

Notice how the ack comes back immediately, but for some reason, we decide to
wait almost 2ms before sending out the next frame...
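
To put numbers on that gap, here is a small sketch in Python. It is only
arithmetic on the three timestamps above; the variable and function names
are made up for the illustration:

```python
from datetime import datetime, timedelta

# Timestamps copied from the three trace lines above.
DATA_1 = "19:29:57.223574"  # 10.0.10.2 sends 251521:257313
ACK_1 = "19:29:57.223798"   # 10.0.10.3 acks 257313
DATA_2 = "19:29:57.225570"  # 10.0.10.2 sends 257313:263105


def usec_between(start, end):
    """Return the whole microseconds elapsed between two tcpdump timestamps."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta // timedelta(microseconds=1)


ack_delay_us = usec_between(DATA_1, ACK_1)  # data seen -> ack seen
send_gap_us = usec_between(ACK_1, DATA_2)   # ack seen -> next data seen

print("ack follows data after", ack_delay_us, "us")   # 224 us
print("next data follows ack after", send_gap_us, "us")  # 1772 us
```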

For some reason, we just aren't filling our window...  tcptrace's
graphs show the window at 2MB, but we only ever have 4 segments
outstanding at once...
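
For scale, a rough sketch of the bandwidth-delay product for this setup.
It assumes the 20Mbps / 200ms dummynet settings and the RTT=400ms figure
quoted further down, and takes 1448 bytes as the segment size (an
assumption: the 5792 in the trace is exactly 4 x 1448, which fits a 1500
MTU with timestamps). The names are made up for the illustration:

```python
LINK_BPS = 20_000_000            # dummynet bandwidth from the thread
RTT_MS = 400                     # 200ms delay, reported as RTT=400ms
WINDOW_BYTES = 2 * 1024 * 1024   # the 2MB window seen in the graphs
MSS_BYTES = 1448                 # assumed segment size (see note above)
OUTSTANDING_BYTES = 4 * MSS_BYTES  # 4 segments, i.e. the 5792 in the trace

# Bytes that must be in flight to keep the pipe full for one RTT.
bdp_bytes = LINK_BPS * RTT_MS // 1000 // 8

# How many full segments fit in that many bytes.
segments_in_pipe = bdp_bytes // MSS_BYTES

# Fraction of the pipe covered by what is actually outstanding.
outstanding_fraction = OUTSTANDING_BYTES / bdp_bytes

print("BDP:", bdp_bytes, "bytes")
print("window covers BDP:", WINDOW_BYTES >= bdp_bytes)
print("full segments needed in flight:", segments_in_pipe)
print("outstanding / BDP: {:.2%}".format(outstanding_fraction))
```

So the 2MB window is roughly twice what the path needs, while 4 segments
is well under 1% of it.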

> ifconfig igb0
> igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
> options=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>
>  ether a0:36:9f:38:27:d0
> inet 10.0.10.3 netmask 0xffffff00 broadcast 10.0.10.255
> inet6 fe80::a236:9fff:fe38:27d0%igb0 prefixlen 64 scopeid 0x1
>  nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
> media: Ethernet autoselect (1000baseT <full-duplex>)
>  status: active
> 
> Regards,
> Niu Zhixiong
>  kaiaixi@gmail.com
> 
> 
> On Sun, Aug 10, 2014 at 11:32 AM, John-Mark Gurney <jmg@funkthat.com> wrote:
> 
> > Niu Zhixiong wrote this message on Sun, Aug 10, 2014 at 10:50 +0800:
> > > I am sorry that I uploaded the WRONG SCTP capture. But the throughput is
> > > the same. SCTP is double TCP, about 18Mbps.
> > >  sctp_2.pcapng.gz
> > > <https://docs.google.com/file/d/0By8sTL79ob4tMlh4WDlTSndHX0k/edit?usp=drive_web>
> >
> > Ok, the owin graph is very interesting...  We do have a full 2MB window
> > on the receiver side, but for some reason, we only ever have just under
> > 6k outstanding on the connection...
> >
> > So, it looks like we send for a short period of time, and then stop
> > sending...  Do you have LRO enabled?  I think it might be related to:
> > https://svnweb.freebsd.org/changeset/base/r256920
> >
> > As I'm seeing >100ms gaps where the sender doesn't send any data, and
> > as soon as more than one ack comes in, the next segment goes out...  If
> > we only receive a single ack, then we wait for a timeout before sending
> > the next segment..
> >
> > Can you try to disable LRO on the receiving host?
> >
> > ifconfig <iface> -lro
> >
> > And see if that helps... If it does, try applying the patch, or compile
> > a more recent kernel from stable/10 that is after r257367, as that was
> > when the change was merged...
> >
> > > On Sun, Aug 10, 2014 at 10:42 AM, Niu Zhixiong <kaiaixi@gmail.com>
> > wrote:
> > >
> > > > I am sure that wnd is about 2MB all the time.
> > > > This is my latest capture, plz see Google Drive.
> > > > In the latest test, TCP(0s-120s) is about 9Mbps and SCTP(0s-120s) is
> > > > about 18Mbps.
> > > > (The bandwidth(20Mbps) and delay(200ms) is set by dummynet)
> > > > The SCTP and TCP are tested in same environment.
> > > >
> > > >  sctp.pcapng.gz
> > > > <https://docs.google.com/file/d/0By8sTL79ob4tYl9sM2V5a19iNVU/edit?usp=drive_web>
> > > >  tcp.pcapng.gz
> > > > <https://docs.google.com/file/d/0By8sTL79ob4tV0NMR1FYLUQ3MWs/edit?usp=drive_web>
> > > >
> > > >
> > > >
> > > > Regards,
> > > > Niu Zhixiong
> > > >  kaiaixi@gmail.com
> > > >
> > > >
> > > > On Sun, Aug 10, 2014 at 10:23 AM, John-Mark Gurney <jmg@funkthat.com>
> > > > wrote:
> > > >
> > > >> Niu Zhixiong wrote this message on Sun, Aug 10, 2014 at 10:12 +0800:
> > > >> > During the TCP4 transmission.
> > > >> > Proto Recv-Q Send-Q Local Address          Foreign Address
> > > >>  (state)
> > > >> > tcp4       0 2097346 10.0.10.2.13504        10.0.10.3.9000
> > > >> > ESTABLISHED
> > > >>
> > > >> Ok, so you are getting a full 2MB in there, and w/ that, you should
> > > >> easily be saturating your pipe...
> > > >>
> > > >> The next thing would be to get a tcpdump, and take a look at the
> > > >> window size.. Wireshark has lots of neat tools to make this analysis
> > > >> easy...  Another tool that is good is tcptrace..  It can output a
> > > >> variety of different graphs that will help you track down, and see
> > > >> what part of the system is the problem...
> > > >>
> > > >> You probably only need a few tens of seconds of the tcpdump...
> > > >>
> > > >> > On Sun, Aug 10, 2014 at 4:58 AM, Michael Tuexen <
> > > >> > Michael.Tuexen@lurchi.franken.de> wrote:
> > > >> >
> > > >> > >
> > > >> > > On 09 Aug 2014, at 22:45, John-Mark Gurney <jmg@funkthat.com>
> > wrote:
> > > >> > >
> > > >> > > > Michael Tuexen wrote this message on Sat, Aug 09, 2014 at 21:51
> > > >> +0200:
> > > >> > > >>
> > > >> > > >> On 09 Aug 2014, at 20:42, John-Mark Gurney <jmg@funkthat.com>
> > > >> wrote:
> > > >> > > >>
> > > >> > > >>> Niu Zhixiong wrote this message on Fri, Aug 08, 2014 at 20:34
> > > >> +0800:
> > > >> > > >>>> Dear all,
> > > >> > > >>>>
> > > >> > > >>>> Last month, I send problems related to FTP/TCP in a high RTT
> > > >> > > environment.
> > > >> > > >>>> After that, I setup a simulation environment(Dummynet) to
> > test
> > > >> TCP
> > > >> > > and SCTP
> > > >> > > >>>> in high delay environment. After finishing the test, I can
> > see
> > > >> TCP is
> > > >> > > >>>> always slower than SCTP. But, I think it is not possible.
> > (Plz
> > > >> see the
> > > >> > > >>>> figure in the attachment). When the delay is 200ms(means
> > > >> RTT=400ms).
> > > >> > > >>>> Besides, the TCP is extremely slow.
> > > >> > > >>>>
> > > >> > > >>>> ALL BW=20Mbps, DELAY= 0 ~ 200MS, Packet LOSS = 0 (by
> > dummynet)
> > > >> > > >>>>
> > > >> > > >>>> This is my parameters:
> > > >> > > >>>> FreeBSD vfreetest0 10.0-RELEASE FreeBSD 10.0-RELEASE #0: Thu
> > Aug
> > > >>  7
> > > >> > > >>>> 11:04:15 HKT 2014
> > > >> > > >>>>
> > > >> > > >>>> sysctl net.inet.tcp
> > > >> > > >>>
> > > >> > > >>> [...]
> > > >> > > >>>
> > > >> > > >>>> net.inet.tcp.recvbuf_auto: 0
> > > >> > > >>>
> > > >> > > >>> [...]
> > > >> > > >>>
> > > >> > > >>>> net.inet.tcp.sendbuf_auto: 0
> > > >> > > >>>
> > > >> > > >>> Try enabling this...  This should allow the buffer to grow
> > large
> > > >> enough
> > > >> > > >>> to deal w/ the higher latency...
> > > >> > > >>>
> > > >> > > >>> Also, make sure your program isn't setting the recv buffer
> > size
> > > >> as that
> > > >> > > >>> will disable the auto growing...
> > > >> > > >> I think the program sets the buffer to 2MB, which it also does
> > for
> > > >> SCTP.
> > > >> > > >> So having both statically at the same size makes sense for the
> > > >> > > comparison.
> > > >> > > >> I remember that there was a bug in the combination of LRO and
> > > >> delayed
> > > >> > > ACK,
> > > >> > > >> which was fixed, but I don't remember it was fixed before
> > 10.0...
> > > >> > > >
> > > >> > > > Sounds like disabling LRO and TSO would be a useful test to see
> > if
> > > >> that
> > > >> > > > improves things...  But hiren said that the fix made it, so...
> > > >> > > >
> > > >> > > >>> If you use netstat -a, you should be able to see the send-q
> > on the
> > > >> > > >>> sender grow as necessary...
> > > >> > > >
> > > >> > > > Also, getting the send-q output while it's running would let us
> > know
> > > >> > > > if the buffer is getting to 2MB or not...
> > > >> > > That is correct. Niu: Can you provide this?
> >
> > --
> >   John-Mark Gurney                              Voice: +1 415 225 5579
> >
> >      "All that I will do, has been done, All that I have, has not."
> >
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."


