From owner-freebsd-net@FreeBSD.ORG Sun Aug 10 12:27:02 2014
References: <20140809184232.GF83475@funkthat.com> <8AE1AC56-D52F-4F13-AAA3-BB96042B37DD@lurchi.franken.de> <20140809204500.GG83475@funkthat.com> <3F6BC212-4223-4AAC-8668-A27075DC55C2@lurchi.franken.de> <20140810022350.GI83475@funkthat.com> <20140810033212.GL83475@funkthat.com> <20140810045355.GM83475@funkthat.com>
Date: Sun, 10 Aug 2014 20:27:01 +0800
Subject: Re: A
problem on TCP in High RTT Environment.
From: Niu Zhixiong
To: Niu Zhixiong, Michael Tuexen, "freebsd-net@freebsd.org", Bill Yuan, John-Mark Gurney
List-Id: Networking and TCP/IP with FreeBSD

Hi,

I am not sure whether my last email was filtered by the mailing list.
After disabling TSO, the speed became even worse.
Here is the packet capture. Please see Google Drive.

tcp_with_tso_off.pcapng.gz

Regards,
Niu Zhixiong
－－－－－－－－－－－－－－－
kaiaixi@gmail.com

On Sun, Aug 10, 2014 at 1:24 PM, Niu Zhixiong wrote:

> Hi,
> After disabling TSO, the speed became even worse.
> Here is the packet capture. Please see Google Drive.
>
> tcp_with_tso_off.pcapng.gz
>
> John-Mark Gurney wrote on Sunday, August 10, 2014:
>
> Niu Zhixiong wrote this message on Sun, Aug 10, 2014 at 11:48 +0800:
>> > I am using an Intel I350-T4 NIC. LRO is disabled by default. By the way,
>> > when I run exactly the same test on a KVM-based virtual machine (virtio
>> > NIC), the results are the same.
>>
>> Have you tried disabling tso? I asked that in an earlier email, but
>> never heard from you if that changed anything...
>>
>> a lot of the trace looks like:
>> 19:29:57.223574 IP 10.0.10.2.61010 > 10.0.10.3.9000: .
>> 251521:257313(5792) ack 1 win 32783
>> 19:29:57.223798 IP 10.0.10.3.9000 > 10.0.10.2.61010: . ack 257313 win 32745
>> 19:29:57.225570 IP 10.0.10.2.61010 > 10.0.10.3.9000: .
>> 257313:263105(5792) ack 1 win 32783
>>
>> Notice how the ack comes back immediately, but for some reason, we decide to
>> wait almost 2ms before sending out the next frame...
>>
>> For some reason, we just aren't filling our window out... tcptrace's
>> graphs show the window at 2MB, but we only ever have 4 segments
>> outstanding at once...
>>
>> > ifconfig igb0
>> > igb0: flags=8843 metric 0 mtu 1500
>> >         options=403bb
>> >         ether a0:36:9f:38:27:d0
>> >         inet 10.0.10.3 netmask 0xffffff00 broadcast 10.0.10.255
>> >         inet6 fe80::a236:9fff:fe38:27d0%igb0 prefixlen 64 scopeid 0x1
>> >         nd6 options=29
>> >         media: Ethernet autoselect (1000baseT )
>> >         status: active
>> >
>> > Regards,
>> > Niu Zhixiong
>> > －－－－－－－－－－－－－－－
>> > kaiaixi@gmail.com
>> >
>> > On Sun, Aug 10, 2014 at 11:32 AM, John-Mark Gurney wrote:
>> >
>> > > Niu Zhixiong wrote this message on Sun, Aug 10, 2014 at 10:50 +0800:
>> > > > I am sorry that I uploaded the WRONG SCTP capture, but the throughput
>> > > > is the same. SCTP is about double TCP, at about 18Mbps.
>> > > >
>> > > > sctp_2.pcapng.gz
>> > > > <https://docs.google.com/file/d/0By8sTL79ob4tMlh4WDlTSndHX0k/edit?usp=drive_web>
>> > >
>> > > Ok, the owin graph is very interesting... We do have a full 2MB window
>> > > on the receiver side, but for some reason, we only ever have just under
>> > > 6k outstanding on the connection...
>> > >
>> > > So, it looks like we send for a short period of time, and then stop
>> > > sending... Do you have LRO enabled? I think it might be related to:
>> > > https://svnweb.freebsd.org/changeset/base/r256920
>> > >
>> > > As I'm seeing >100ms gaps where the sender doesn't send any data, and
>> > > as soon as more than one ack comes in, the next segment goes out... If
>> > > we only receive a single ack, then we wait for a timeout before sending
>> > > the next segment..
>> > >
>> > > Can you try to disable LRO on the receiving host?
>> > >
>> > > ifconfig -lro
>> > >
>> > > And see if that helps... If it does... Applying the patch, or compiling
>> > > a more recent kernel from stable/10 that is after r257367, as that was
>> > > when the change was merged...
>> > >
>> > > > On Sun, Aug 10, 2014 at 10:42 AM, Niu Zhixiong wrote:
>> > > >
>> > > > > I am sure that wnd is about 2MB all the time.
>> > > > > This is my latest capture; please see Google Drive.
>> > > > > In the latest test, TCP (0s-120s) is about 9Mbps and SCTP (0s-120s) is
>> > > > > about 18Mbps.
>> > > > > (The bandwidth (20Mbps) and delay (200ms) are set by dummynet.)
>> > > > > SCTP and TCP are tested in the same environment.
>> > > > >
>> > > > > sctp.pcapng.gz
>> > > > > <https://docs.google.com/file/d/0By8sTL79ob4tYl9sM2V5a19iNVU/edit?usp=drive_web>
>> > > > > tcp.pcapng.gz
>> > > > > <https://docs.google.com/file/d/0By8sTL79ob4tV0NMR1FYLUQ3MWs/edit?usp=drive_web>
>> > > > >
>> > > > > Regards,
>> > > > > Niu Zhixiong
>> > > > > －－－－－－－－－－－－－－－
>> > > > > kaiaixi@gmail.com
>> > > > >
>> > > > > On Sun, Aug 10, 2014 at 10:23 AM, John-Mark Gurney <jmg@funkthat.com> wrote:
>> > > > >
>> > > > >> Niu Zhixiong wrote this message on Sun, Aug 10, 2014 at 10:12 +0800:
>> > > > >> > During the TCP4 transmission.
>> > > > >> > Proto Recv-Q Send-Q  Local Address    Foreign Address  (state)
>> > > > >> > tcp4       0 2097346 10.0.10.2.13504  10.0.10.3.9000   ESTABLISHED
>> > > > >>
>> > > > >> Ok, so you are getting a full 2MB in there, and w/ that, you should
>> > > > >> easily be saturating your pipe...
>> > > > >>
>> > > > >> The next thing would be to get a tcpdump, and take a look at the
>> > > > >> window size..
>> > > > >> Wireshark has lots of neat tools to make this analysis
>> > > > >> easy... Another tool that is good is tcptrace.. It can output a
>> > > > >> variety of different graphs that will help you track down, and see
>> > > > >> what part of the system is the problem...
>> > > > >>
>> > > > >> You probably only need a few tens of seconds of the tcpdump...
>> > > > >>
>> > > > >> > On Sun, Aug 10, 2014 at 4:58 AM, Michael Tuexen <Michael.Tuexen@lurchi.franken.de> wrote:
>> > > > >> >
>> > > > >> > >
>> > > > >> > > On 09 Aug 2014, at 22:45, John-Mark Gurney wrote:
>> > > > >> > >
>> > > > >> > > > Michael Tuexen wrote this message on Sat, Aug 09, 2014 at 21:51 +0200:
>> > > > >> > > >>
>> > > > >> > > >> On 09 Aug 2014, at 20:42, John-Mark Gurney <jmg@funkthat.com> wrote:
>> > > > >> > > >>
>> > > > >> > > >>> Niu Zhixiong wrote this message on Fri, Aug 08, 2014 at 20:34 +0800:
>> > > > >> > > >>>> Dear all,
>> > > > >> > > >>>>
>> > > > >> > > >>>> Last month, I posted about problems with FTP/TCP in a high-RTT
>> > > > >> > > >>>> environment. After that, I set up a simulation environment (Dummynet)
>> > > > >> > > >>>> to test TCP and SCTP in a high-delay environment. After finishing the
>> > > > >> > > >>>> test, I can see that TCP is always slower than SCTP, which I think
>> > > > >> > > >>>> should not be possible (please see the figure in the attachment).
>> > > > >> > > >>>> Moreover, when the delay is 200ms (meaning RTT=400ms), TCP is
>> > > > >> > > >>>> extremely slow.
>> > > > >> > > >>>>
>> > > > >> > > >>>> ALL: BW=20Mbps, DELAY=0~200ms, Packet LOSS=0 (by dummynet)
>> > > > >> > > >>>>
>> > > > >> > > >>>> These are my parameters:
>> > > > >> > > >>>> FreeBSD vfreetest0 10.0-RELEASE FreeBSD 10.0-RELEASE #0: Thu Aug 7
>> > > > >> > > >>>> 11:04:15 HKT 2014
>> > > > >> > > >>>>
>> > > > >> > > >>>> sysctl net.inet.tcp
>> > > > >> > > >>>
>> > > > >> > > >>> [...]
>> > > > >> > > >>>
>> > > > >> > > >>>> net.inet.tcp.recvbuf_auto: 0
>> > > > >> > > >>>
>> > > > >> > > >>> [...]
>> > > > >> > > >>>
>> > > > >> > > >>>> net.inet.tcp.sendbuf_auto: 0
>> > > > >> > > >>>
>> > > > >> > > >>> Try enabling this... This should allow the buffer to grow large enough
>> > > > >> > > >>> to deal w/ the higher latency...
>> > > > >> > > >>>
>> > > > >> > > >>> Also, make sure your program isn't setting the recv buffer size as that
>> > > > >> > > >>> will disable the auto growing...
>> > > > >> > > >> I think the program sets the buffer to 2MB, which it also does for SCTP.
>> > > > >> > > >> So having both statically at the same size makes sense for the comparison.
>> > > > >> > > >> I remember that there was a bug in the combination of LRO and delayed ACK,
>> > > > >> > > >> which was fixed, but I don't remember whether it was fixed before 10.0...
>> > > > >> > > >
>> > > > >> > > > Sounds like disabling LRO and TSO would be a useful test to see if that
>> > > > >> > > > improves things... But hiren said that the fix made it, so...
>> > > > >> > > >
>> > > > >> > > >>> If you use netstat -a, you should be able to see the send-q on the
>> > > > >> > > >>> sender grow as necessary...
>> > > > >> > > >
>> > > > >> > > > Also, getting the send-q output while it's running would let us know
>> > > > >> > > > if the buffer is getting to 2MB or not...
>> > > > >> > > That is correct. Niu: Can you provide this?
>> > >
>> > > --
>> > > John-Mark Gurney                              Voice: +1 415 225 5579
>> > >
>> > > "All that I will do, has been done, All that I have, has not."
>> > >
>> --
>> John-Mark Gurney                              Voice: +1 415 225 5579
>>
>> "All that I will do, has been done, All that I have, has not."
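
For reference, here is a rough sketch of the bandwidth-delay arithmetic behind the numbers in this thread. It assumes only the figures quoted above (20 Mbps dummynet pipe, 200 ms delay each way so RTT = 400 ms, 2MB socket buffer). The helper names are made up for illustration and are not part of any FreeBSD tool.

```python
# Back-of-the-envelope TCP window arithmetic (illustrative only).
# Inputs are the figures quoted in this thread; the helper names are
# hypothetical and do not come from any FreeBSD utility.


def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bytes that must be in flight to keep the pipe full for one RTT."""
    return bandwidth_bps * rtt_ms // 8000  # (bits/s * ms) -> bytes


def window_limited_bps(inflight_bytes: int, rtt_ms: int) -> int:
    """Throughput ceiling when at most inflight_bytes are unacked per RTT."""
    return inflight_bytes * 8000 // rtt_ms  # (bytes / ms) -> bits/s


BANDWIDTH_BPS = 20_000_000       # dummynet pipe: 20 Mbps
RTT_MS = 400                     # 200 ms delay in each direction
BUFFER_BYTES = 2 * 1024 * 1024   # 2MB socket buffer set by the test program

if __name__ == "__main__":
    print("BDP:", bdp_bytes(BANDWIDTH_BPS, RTT_MS), "bytes")
    print("2MB window ceiling:", window_limited_bps(BUFFER_BYTES, RTT_MS), "bps")
```

With these inputs the bandwidth-delay product is 1,000,000 bytes, so a 2MB buffer should not be the limiting factor by itself. That is consistent with the comment quoted above that a full 2MB send queue should easily saturate the pipe.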