Date: Tue, 12 Aug 2014 16:22:54 +0800
In-Reply-To: <20140811171517.GW83475@funkthat.com>
Subject: Re: A problem on TCP in High RTT Environment.
From: Niu Zhixiong
To: Niu Zhixiong, Michael Tuexen, "freebsd-net@freebsd.org", Bill Yuan

I used a switch and captured on a mirror of the sender's port, and I also
noticed that some ACKs appear before the segments they acknowledge. I am not
sure how to solve the problem. In my KVM-based virtual machine test
environment, however, there are no such issues.

testtest.tar.gz

Regards,
Niu Zhixiong
－－－－－－－－－－－－－－－
kaiaixi@gmail.com

On Tue, Aug 12, 2014 at 1:15 AM, John-Mark Gurney wrote:

> Niu Zhixiong wrote this message on Sun, Aug 10, 2014 at 20:27 +0800:
> > Hi, I am not sure whether my last email was filtered by the mailing list.
> > After disabling TSO, the speed became even poorer.
> > These are the packet captures. Plz see google drive.
> > tcp_with_tso_off.pcapng.gz
> > <https://docs.google.com/file/d/0By8sTL79ob4tYXQ0N0lZN0FUNVE/edit?usp=drive_web>
>
> So, the reason that this is also slow is that it only ever really has one
> segment on the wire at a time... This is similar to the previous
> packet capture...
>
> Which side was this captured on? Was this the receiving
> side? Because it looks like packets are getting merged still...
>
> 22:19:25.628087 IP 10.0.10.2.62995 > 10.0.10.3.9000: Flags [.], seq 149171:152067, ack 1, win 32783, options [nop,nop,TS val 61731427 ecr 2405797018], length 2896
>
> and as before:
> 22:19:25.634095 IP 10.0.10.2.62995 > 10.0.10.3.9000: Flags [.], seq 165099:166547, ack 1, win 32783, options [nop,nop,TS val 61731431 ecr 2405797022], length 1448
> 22:19:25.635084 IP 10.0.10.3.9000 > 10.0.10.2.62995: Flags [.], ack 167995, win 32745, options [nop,nop,TS val 2405797438 ecr 61731431], length 0
> 22:19:25.635097 IP 10.0.10.2.62995 > 10.0.10.3.9000: Flags [.], seq 166547:167995, ack 1, win 32783, options [nop,nop,TS val 61731431 ecr 2405797022], length 1448
> 22:19:25.636073 IP 10.0.10.2.62995 > 10.0.10.3.9000: Flags [.], seq 167995:170891, ack 1, win 32783, options [nop,nop,TS val 61731431 ecr 2405797022], length 2896
> 22:19:25.636266 IP 10.0.10.3.9000 > 10.0.10.2.62995: Flags [.], ack 170891, win 32745, options [nop,nop,TS val 2405797439 ecr 61731431], length 0
>
> Though the other thing I noticed is that we appear to be ack'ing before
> the segment was received, which is a bit odd... And it happens quite
> consistently...
>
> We really need someone who knows our TCP stack to comment on this...
>
> > On Sun, Aug 10, 2014 at 1:24 PM, Niu Zhixiong wrote:
> >
> > > Hi,
> > > After disabling TSO, the speed became even poorer.
> > > These are the packet captures. Plz see google drive.
> > > tcp_with_tso_off.pcapng.gz
> > > <https://docs.google.com/file/d/0By8sTL79ob4tYXQ0N0lZN0FUNVE/edit?usp=drive_web>
> > >
> > >
> > > On Sunday, August 10, 2014, John-Mark Gurney wrote:
> > >
> > > Niu Zhixiong wrote this message on Sun, Aug 10, 2014 at 11:48 +0800:
> > >> > I am using an Intel I350-T4 NIC. LRO is disabled by default. And by the way,
> > >> > when I use a KVM-based virtual machine (virtio NIC) to do exactly the same
> > >> > test, the results are the same.
> > >>
> > >> Have you tried disabling tso? I asked that in an earlier email, but
> > >> never heard from you if that changed anything...
> > >>
> > >> a lot of the trace looks like:
> > >> 19:29:57.223574 IP 10.0.10.2.61010 > 10.0.10.3.9000: . 251521:257313(5792) ack 1 win 32783 1047294279>
> > >> 19:29:57.223798 IP 10.0.10.3.9000 > 10.0.10.2.61010: . ack 257313 win 32745
> > >> 19:29:57.225570 IP 10.0.10.2.61010 > 10.0.10.3.9000: . 257313:263105(5792) ack 1 win 32783 1047294279>
> > >>
> > >> Notice how the ack comes back immediately, but for some reason, we decide to
> > >> wait almost 2ms before sending out the next frame...
> > >>
> > >> For some reason, we just aren't filling our window out... tcptrace's
> > >> graphs show the window at 2MB, but we only ever have 4 segments
> > >> outstanding at once...
> > >>
> > >> > ifconfig igb0
> > >> > igb0: flags=8843 metric 0 mtu 1500
> > >> >         options=403bb
> > >> >         ether a0:36:9f:38:27:d0
> > >> >         inet 10.0.10.3 netmask 0xffffff00 broadcast 10.0.10.255
> > >> >         inet6 fe80::a236:9fff:fe38:27d0%igb0 prefixlen 64 scopeid 0x1
> > >> >         nd6 options=29
> > >> >         media: Ethernet autoselect (1000baseT )
> > >> >         status: active
> > >> >
> > >> > Regards,
> > >> > Niu Zhixiong
> > >> > －－－－－－－－－－－－－－－
> > >> > kaiaixi@gmail.com
> > >> >
> > >> >
> > >> > On Sun, Aug 10, 2014 at 11:32 AM, John-Mark Gurney <jmg@funkthat.com> wrote:
> > >> >
> > >> > > Niu Zhixiong wrote this message on Sun, Aug 10, 2014 at 10:50 +0800:
> > >> > > > I am sorry that I uploaded the WRONG SCTP capture. But the throughput is the same.
> > >> > > > SCTP is double that of TCP, about 18Mbps.
> > >> > > > sctp_2.pcapng.gz
> > >> > > > <https://docs.google.com/file/d/0By8sTL79ob4tMlh4WDlTSndHX0k/edit?usp=drive_web>
> > >> > >
> > >> > > Ok, the owin graph is very interesting...
> > >> > > We do have a full 2MB window
> > >> > > on the receiver side, but for some reason, we only ever have just under
> > >> > > 6k outstanding on the connection...
> > >> > >
> > >> > > So, it looks like we send for a short period of time, and then stop
> > >> > > sending... Do you have LRO enabled? I think it might be related to:
> > >> > > https://svnweb.freebsd.org/changeset/base/r256920
> > >> > >
> > >> > > As I'm seeing >100ms gaps where the sender doesn't send any data, and
> > >> > > as soon as more than one ack comes in, the next segment goes out... If
> > >> > > we only receive a single ack, then we wait for a timeout before sending
> > >> > > the next segment..
> > >> > >
> > >> > > Can you try to disable LRO on the receiving host?
> > >> > >
> > >> > > ifconfig -lro
> > >> > >
> > >> > > And see if that helps... If it does, apply the patch, or compile
> > >> > > a more recent kernel from stable/10 that is after r257367, as that is
> > >> > > when the change was merged...
> > >> > >
> > >> > > > On Sun, Aug 10, 2014 at 10:42 AM, Niu Zhixiong <kaiaixi@gmail.com> wrote:
> > >> > > >
> > >> > > > > I am sure that wnd is about 2MB all the time.
> > >> > > > > This is my latest capture, plz see Google Drive.
> > >> > > > > In the latest test, TCP (0s-120s) is about 9Mbps and SCTP (0s-120s) is about
> > >> > > > > 18Mbps.
> > >> > > > > (The bandwidth (20Mbps) and delay (200ms) are set by dummynet.)
> > >> > > > > SCTP and TCP are tested in the same environment.
> > >> > > > >
> > >> > > > > sctp.pcapng.gz
> > >> > > > > <https://docs.google.com/file/d/0By8sTL79ob4tYl9sM2V5a19iNVU/edit?usp=drive_web>
> > >> > > > > tcp.pcapng.gz
> > >> > > > > <https://docs.google.com/file/d/0By8sTL79ob4tV0NMR1FYLUQ3MWs/edit?usp=drive_web>
> > >> > > > >
> > >> > > > >
> > >> > > > > Regards,
> > >> > > > > Niu Zhixiong
> > >> > > > > －－－－－－－－－－－－－－－
> > >> > > > > kaiaixi@gmail.com
> > >> > > > >
> > >> > > > >
> > >> > > > > On Sun, Aug 10, 2014 at 10:23 AM, John-Mark Gurney <jmg@funkthat.com> wrote:
> > >> > > > >
> > >> > > > >> Niu Zhixiong wrote this message on Sun, Aug 10, 2014 at 10:12 +0800:
> > >> > > > >> > During the TCP4 transmission.
> > >> > > > >> > Proto Recv-Q Send-Q  Local Address     Foreign Address   (state)
> > >> > > > >> > tcp4       0 2097346 10.0.10.2.13504   10.0.10.3.9000    ESTABLISHED
> > >> > > > >>
> > >> > > > >> Ok, so you are getting a full 2MB in there, and w/ that, you should
> > >> > > > >> easily be saturating your pipe...
> > >> > > > >>
> > >> > > > >> The next thing would be to get a tcpdump, and take a look at the
> > >> > > > >> window size.. Wireshark has lots of neat tools to make this analysis
> > >> > > > >> easy... Another tool that is good is tcptrace.. It can output a
> > >> > > > >> variety of different graphs that will help you track down, and see
> > >> > > > >> what part of the system is the problem...
> > >> > > > >>
> > >> > > > >> You probably only need a few tens of seconds of the tcpdump...
> > >> > > > >>
> > >> > > > >> > On Sun, Aug 10, 2014 at 4:58 AM, Michael Tuexen <Michael.Tuexen@lurchi.franken.de> wrote:
> > >> > > > >> >
> > >> > > > >> > >
> > >> > > > >> > > On 09 Aug 2014, at 22:45, John-Mark Gurney <jmg@funkthat.com> wrote:
> > >> > > > >> > >
> > >> > > > >> > > > Michael Tuexen wrote this message on Sat, Aug 09, 2014 at 21:51 +0200:
> > >> > > > >> > > >>
> > >> > > > >> > > >> On 09 Aug 2014, at 20:42, John-Mark Gurney <jmg@funkthat.com> wrote:
> > >> > > > >> > > >>
> > >> > > > >> > > >>> Niu Zhixiong wrote this message on Fri, Aug 08, 2014 at 20:34 +0800:
> > >> > > > >> > > >>>> Dear all,
> > >> > > > >> > > >>>>
> > >> > > > >> > > >>>> Last month, I reported problems related to FTP/TCP in a high RTT environment.
> > >> > > > >> > > >>>> After that, I set up a simulation environment (Dummynet) to test TCP and SCTP
> > >> > > > >> > > >>>> in a high delay environment. After finishing the test, I can see that TCP is
> > >> > > > >> > > >>>> always slower than SCTP. But I think that should not be possible. (Plz see the
> > >> > > > >> > > >>>> figure in the attachment.) When the delay is 200ms (meaning RTT=400ms),
> > >> > > > >> > > >>>> TCP is extremely slow.
> > >> > > > >> > > >>>>
> > >> > > > >> > > >>>> ALL BW=20Mbps, DELAY= 0 ~ 200MS, Packet LOSS = 0 (by dummynet)
> > >> > > > >> > > >>>>
> > >> > > > >> > > >>>> These are my parameters:
> > >> > > > >> > > >>>> FreeBSD vfreetest0 10.0-RELEASE FreeBSD 10.0-RELEASE #0: Thu Aug 7
> > >> > > > >> > > >>>> 11:04:15 HKT 2014
> > >> > > > >> > > >>>>
> > >> > > > >> > > >>>> sysctl net.inet.tcp
> > >> > > > >> > > >>>
> > >> > > > >> > > >>> [...]
> > >> > > > >> > > >>>
> > >> > > > >> > > >>>> net.inet.tcp.recvbuf_auto: 0
> > >> > > > >> > > >>>
> > >> > > > >> > > >>> [...]
> > >> > > > >> > > >>>
> > >> > > > >> > > >>>> net.inet.tcp.sendbuf_auto: 0
> > >> > > > >> > > >>>
> > >> > > > >> > > >>> Try enabling this... This should allow the buffer to grow large enough
> > >> > > > >> > > >>> to deal w/ the higher latency...
> > >> > > > >> > > >>>
> > >> > > > >> > > >>> Also, make sure your program isn't setting the recv buffer size as that
> > >> > > > >> > > >>> will disable the auto growing...
> > >> > > > >> > > >> I think the program sets the buffer to 2MB, which it also does for SCTP.
> > >> > > > >> > > >> So having both statically at the same size makes sense for the comparison.
> > >> > > > >> > > >> I remember that there was a bug in the combination of LRO and delayed ACK,
> > >> > > > >> > > >> which was fixed, but I don't remember whether it was fixed before 10.0...
> > >> > > > >> > > >
> > >> > > > >> > > > Sounds like disabling LRO and TSO would be a useful test to see if that
> > >> > > > >> > > > improves things... But hiren said that the fix made it, so...
> > >> > > > >> > > >
> > >> > > > >> > > >>> If you use netstat -a, you should be able to see the send-q on the
> > >> > > > >> > > >>> sender grow as necessary...
> > >> > > > >> > > >
> > >> > > > >> > > > Also, getting the send-q output while it's running would let us know
> > >> > > > >> > > > if the buffer is getting to 2MB or not...
> > >> > > > >> > > That is correct. Niu: Can you provide this?
>
> --
> John-Mark Gurney                              Voice: +1 415 225 5579
>
> "All that I will do, has been done, All that I have, has not."
>
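
For reference, the bandwidth-delay arithmetic behind the figures quoted in
this thread can be sketched as below. This is only a rough illustration, not
something taken from any of the messages: the function names are made up, and
the inputs assume the 20 Mbps dummynet pipe and the RTT of 400 ms stated in
the original post.

```python
# Bandwidth-delay arithmetic for the test setup described in this thread.
# Function names are hypothetical (for illustration only). The inputs are
# taken from the original post: 20 Mbps dummynet pipe, RTT stated as 400 ms.


def bdp_bytes(rate_bps: int, rtt_ms: int) -> int:
    """Bytes that must be in flight to sustain rate_bps over rtt_ms."""
    return rate_bps * rtt_ms // 8000  # (bits/s * ms) / (8 bits * 1000 ms/s)


def window_limited_bps(in_flight_bytes: int, rtt_ms: int) -> int:
    """Throughput ceiling if at most in_flight_bytes are unacknowledged."""
    return in_flight_bytes * 8 * 1000 // rtt_ms


if __name__ == "__main__":
    rtt_ms = 400
    print("bytes in flight to fill 20 Mbps:", bdp_bytes(20_000_000, rtt_ms))
    print("bytes in flight for 9 Mbps:", bdp_bytes(9_000_000, rtt_ms))
    print("ceiling with 5792 bytes in flight (bit/s):",
          window_limited_bps(5792, rtt_ms))
```

Under these assumptions the pipe holds about 1 MB, so a 2 MB socket buffer
should not be the limit, while a single 5792-byte burst per round trip would
cap throughput at roughly 116 kbit/s.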