Date:      Mon, 21 Apr 2003 19:35:32 -0700
From:      "Jin Guojun [NCS]" <j_guojun@lbl.gov>
To:        bj@dc.luth.se
Cc:        freebsd-performance@freebsd.org
Subject:   Re: patch for test (Was: tcp_output starving -- is due to mbuf get  delay?)
Message-ID:  <3EA4AA74.F9993276@lbl.gov>
References:  <200304210827.h3L8Rx2F032265@dc.luth.se>

It is hard to compare your netstat outputs because the NetBSD sample is so short.
The NetBSD run is too short to tell whether TCP output has been saturated
(reached the maximum packets/sec?). At second 3, it reached 18.6Kpkt/s or 28KB/s,
which means MTU = 28KB / 18.5 = 1500, right?

FreeBSD seems to have done a better job: at second 2, it had already reached 72KB/s.
The pkt/s is low because you had jumbo frames. Setting net.inet.tcp.liondmask=7
has nearly doubled your TCP window from 1,314,022 to 2,204,667, which is a fully
opened cwnd for a 22ms, 1Gb/s path. Nothing can do better than that.
The only thing left to chew up your CPU is the memory copy.
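
As a rough cross-check of the window figures above, the bandwidth-delay product of the path can be worked out directly. This is only a back-of-the-envelope sketch: the function name is made up for illustration, the link is assumed to carry exactly 1Gb/s of payload (no header overhead), and the RTT is taken as exactly 22ms.

```python
def bdp_bytes(rate_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes for a link rate (bits/s) and an RTT (ms)."""
    # bits in flight = rate_bps * rtt_ms / 1000; divide by 8 to get bytes.
    return rate_bps * rtt_ms // 8000


# Assumed idealized path: 1 Gb/s, 22 ms round trip.
ideal_window = bdp_bytes(1_000_000_000, 22)

# Window reported above with liondmask=7, and the configured send space
# (net.inet.tcp.sendspace) from the quoted message.
observed_window = 2_204_667
sendspace = 3_217_968

print("ideal BDP (bytes):", ideal_window)
print("observed window / ideal BDP: %.2f" % (observed_window / ideal_window))
print("sendspace covers ideal BDP:", sendspace >= ideal_window)
```

The ideal figure is an upper bound; protocol headers and the host's achievable rate mean the window needed in practice is somewhat smaller.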

My web page shows that we have eliminated all mbuf chain overhead,
but there is still a second memory copy, which can also be eliminated.
However, this is no longer a patch. It requires modifying the mbuf operations.
I am BCCing this to core@freebsd.org, but I am not sure it will get through.

To eliminate the second memory copy, the mbuf structure needs another
flag -- EOP -- end of packet. With it, in xxx_usr_send(), we can simply copy
each t_maxseg of data into an mbuf chain, set the EOP bit in the mbuf flags, and chain
this mbuf onto sb_mb. In tcp_output(), where I modified the mbuf chain handling
for m_copydata & m_copy, we get rid of these two copy routines and simply hand
the m to the if_queue. Since we just pass the handle, when the NIC driver passes
the mbuf to m_free, m_free will do nothing for these mbufs because EOP is set.
Therefore, we will reduce the mbuf operations in both en_queue and m_free.
This will leave only one memory copy. For a system with a 64-bit PCI chipset, this will
not be a bottleneck at all.
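
The EOP idea above can be illustrated with a toy user-space model. This is a sketch only, not kernel code: the Mbuf class, the M_EOP value, and the simplified usr_send / tcp_output / m_free functions are all hypothetical stand-ins, and each segment is modelled as a single buffer rather than a real mbuf chain.

```python
from dataclasses import dataclass
from typing import List

M_EOP = 0x1  # hypothetical flag bit, not taken from any kernel header


@dataclass
class Mbuf:
    data: bytes
    flags: int = 0


def usr_send(payload: bytes, t_maxseg: int) -> List[Mbuf]:
    """The one remaining copy: split user data into t_maxseg-sized packets."""
    if t_maxseg <= 0:
        raise ValueError("t_maxseg must be positive")
    sb_mb = []
    for off in range(0, len(payload), t_maxseg):
        # Each packet-sized buffer is marked EOP and chained onto sb_mb.
        sb_mb.append(Mbuf(bytes(payload[off:off + t_maxseg]), M_EOP))
    return sb_mb


def tcp_output(sb_mb: List[Mbuf], if_queue: List[Mbuf]) -> None:
    """Hand the same buffers to the interface queue; no second copy is made."""
    for m in sb_mb:
        if_queue.append(m)


def m_free(m: Mbuf) -> bool:
    """Return True only if the buffer was really released."""
    if m.flags & M_EOP:
        return False  # still owned by the socket buffer; the driver's free is a no-op
    m.data = b""
    return True


sb_mb = usr_send(b"x" * 3000, t_maxseg=1460)
if_queue: List[Mbuf] = []
tcp_output(sb_mb, if_queue)
released = [m_free(m) for m in if_queue]
print([len(m.data) for m in sb_mb], released)
```

In this model the interface queue holds the very same objects as the socket buffer, which is the point of passing the handle instead of calling the copy routines.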

Of course, we can eliminate this last copy as well (to make zero-copy TCP).
Lock (wire) down the user buffer, and simply assign the user space to the
mcluster as E_USR_EXT. This may make complete sense, since future
computers will have large memory (at least 1GB), while applications rarely
write a buffer larger than 1MB at once (typically 64KB up to 640KB).
So locking down 0.1% of total system memory is not a bad thing.

If you want even better, the new TCP (Lion) stack is aiming for that goal, but it will
not be available until it has stabilized.

As Terry mentioned, for now you may want to try the NetBSD TCP stack first, since
you have seen that NetBSD does a better job, and provide some feedback.

    -Jin

Borje Josefsson wrote:

> On Sun, 20 Apr 2003 13:12:42 PDT "Jin Guojun [NCS]" wrote:
>
> > Now the patch is ready. It has been tested on both 4.7 and 4.8.
> > For 4.7, one has to manually add an empty line before the comment prior to the
> > tcp_output() routine. <there is no spacing between end of SYSCTL and
> > comment for beginning the tcp_output() in 4.7-RELEASE :-( >
> >
> > Some more hints for tracing: (net.inet.tcp.liondmask is a bitmap)
> > bit 0-1 (value 1, 2, or 3) is for enabling tcp_output() mbuf chain modification
> > bit 2 (value 4) is for enabling sbappend() mbuf chain modification
> > bit 3 (value 8) is for tcp_input (DO NOT TRY IT, it is not ready).
> >
> > bit 9 (value 512) is for enabling check routine (dump errors to /var/log/messag).
> >
> > If you do have a problem, set net.inet.tcp.liondmask to 512 and look at what the messages say.
> > If you would like to know which part is causing the problem or not working properly,
> > set net.inet.tcp.liondmask to 1, 2, 3 or 4 to test an individual module.
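
The bit assignments quoted above can be decoded mechanically. The following is a small illustrative helper, not part of the patch: the table and function names are made up here, and the descriptions are paraphrased from the hints above.

```python
# Bit values as described in the quoted hints for net.inet.tcp.liondmask.
LIOND_BITS = {
    0x001: "tcp_output() mbuf chain modification (bit 0)",
    0x002: "tcp_output() mbuf chain modification (bit 1)",
    0x004: "sbappend() mbuf chain modification (bit 2)",
    0x008: "tcp_input (bit 3, not ready)",
    0x200: "check routine (bit 9)",
}


def decode_liondmask(mask: int) -> list:
    """Return the descriptions of every known bit set in mask."""
    return [name for bit, name in sorted(LIOND_BITS.items()) if mask & bit]


for value in (7, 512):
    print(value, decode_liondmask(value))
```

With this table, liondmask=7 enables both tcp_output() bits plus the sbappend() bit, and 512 enables only the check routine.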
>
> Thanks!!
>
> This patch definitely works, and gives much higher PPS (32000 instead of
> 19000). This is on a low-end system (PIII 900MHz with 33MHz bus), I'll
> test one of my larger systems later today.
>
> One question though - is there any way of making the code more
> "aggressive"? As you see in the netstat output below, it takes ~35
> seconds(!) before reaching full speed. On NetBSD I reach maxPPS almost
> immediately. Even if we now (with your patch) can utilize the hardware
> much more, it only helps if you have connections that last for a very
> long time, so that the "ramping" time is not significant.
>
> *Note* (the very last output below) that this seems to be highly dependent
> on RTT. On a 2ms connection (~50 miles) I reach max PPS almost
> immediately. (I can't explain why I go to 51kpps and then fall back to
> 35kpps; this is repeatable.)
>
> Apart from vanilla 4.8R I have set:
>
> kern.ipc.maxsockbuf=8388608
> net.inet.tcp.sendspace=3217968
> net.inet.tcp.recvspace=3217968
> kern.ipc.nmbclusters=8192
>
> And this test is done on a connection with RTT in the order of 22 ms.
>
> --Börje
>
> =========== "netstat 1" **on NetBSD** (for comparison) =====
>
>  bge0 in       bge0 out              total in      total out
>  packets  errs  packets  errs colls   packets  errs  packets
>        1     0        1     0     0         1     0        1
>     7118     0    11315     0     0      7118     0    11315
>    18604     0    28014     0     0     18604     0    28014
>    18610     0    28005     0     0     18611     0    28005
>
> (NOTE that this example is using larger MTU, and not on the same hardware
> as below, but the behaviour of reaching maxPPS "immediately" is the same)
>
> =========== "netstat 1" with liondmask=7 ================
>
>             input        (Total)           output
>    packets  errs      bytes    packets  errs      bytes colls
>          6     0        540          3     0        228     0
>         37     0       2712         56     0      72216     0
>        646     0      42636        823     0    1244686     0
>       1548     0     102168       1966     0    2975188     0
>       2432     0     160512       3039     0    4604252     0
>       3301     0     217866       4193     0    6345352     0
>       4174     0     275484       5254     0    7950192     0
>       5011     0     330726       6373     0    9650414     0
>       5836     0     385176       7448     0   11271908     0
>       6675     0     440550       8519     0   12896430     0
>       7528     0     496848       9596     0   14527008     0
>       8408     0     554928      10626     0   16089456     0
>       9212     0     607992      11652     0   17636764     0
>       9962     0     657492      12698     0   19223436     0
>      10699     0     706134      13694     0   20731380     0
>      11368     0     750288      14648     0   22175736     0
>      12144     0     801504      15697     0   23768464     0
>      12802     0     844932      16693     0   25267324     0
>      13412     0     885192      17552     0   26576934     0
>      14001     0     924066      18495     0   28001608     0
>      14444     0     953304      19415     0   29384230     0
>      15041     0     992706      20275     0   30701070     0
>      15681     0    1034946      21327     0   32283200     0
>      16224     0    1070784      22202     0   33610978     0
>      16621     0    1096986      22888     0   34651096     0
>      17050     0    1125300      23568     0   35682130     0
>      17721     0    1169586      24573     0   37200672     0
>      18256     0    1204896      25361     0   38401274     0
>      18782     0    1239612      26128     0   39550400     0
>      19359     0    1277694      26972     0   40834272     0
>      20150     0    1329900      28015     0   42413374     0
>      20900     0    1379400      28962     0   43854702     0
>      21523     0    1420518      30024     0   45447430     0
>      22256     0    1468896      30891     0   46767638     0
>      22882     0    1510212      31655     0   47924334     0
>      23087     0    1523742      31865     0   48243788     0
>      23225     0    1532850      32038     0   48502682     0
>
> It seems that I reach the limit about here - 35-36 sec after start
>
>      23170     0    1529220      32121     0   48629858     0
>      23223     0    1532718      32036     0   48501168     0
>      23200     0    1531200      32121     0   48629858     0
>      23103     0    1524792      32122     0   48631372     0
>      23104     0    1524864      32080     0   48565096     0
>      23214     0    1532124      32079     0   48566270     0
>      23147     0    1527696      32036     0   48501168     0
>      10318     0     680988      13543     0   20495142     0
>          1     0         66          1     0        178     0
>          1     0         66          1     0        178     0
>
> =========== "netstat 1" with liondmask=7 ================
>
> With plain 4.8 (liondmask=0) I get:
>
> root@stinky 8# netstat 1
>             input        (Total)           output
>    packets  errs      bytes    packets  errs      bytes colls
>          7     0        732         10     0       2394     0
>        437     0      28842        556     0     840448     0
>       1343     0      88638       1669     0    2531586     0
>       2201     0     145266       2757     0    4166706     0
>       3082     0     203406       3857     0    5841190     0
>       4021     0     265386       4959     0    7503562     0
>       4877     0     321882       6017     0    9111430     0
>       5621     0     370986       7064     0   10690532     0
>       6471     0     427086       8136     0   12319596     0
>       7216     0     476256       9177     0   13889614     0
>       8006     0     528396      10181     0   15415726     0
>       8725     0     575850      11215     0   16975146     0
>       9482     0     625812      12259     0   18561818     0
>      10205     0     673530      13258     0   20071276     0
>      10846     0     715836      14115     0   21365746     0
>      11563     0     763158      15223     0   23046286     0
>      12399     0     818334      16266     0   24628416     0
>      13024     0     859584      17119     0   25913802     0
>      13609     0     898194      17949     0   27173450     0
>      14316     0     944856      18798     0   28458836     0
>      14391     0     949806      18842     0   28522764     0
>      14463     0     954558      19010     0   28779804     0
>
> Here I reach the limit after 20 seconds.
>
>      14500     0     957000      19095     0   28908494     0
>      14534     0     959244      19053     0   28844906     0
>      14599     0     963534      19052     0   28843392     0
>      14526     0     958716      19053     0   28844906     0
>      14484     0     955944      18967     0   28714702     0
>      14330     0     945780      18968     0   28716216     0
>      14581     0     962346      19137     0   28972082     0
>      14531     0     959046      19180     0   29037184     0
>      14465     0     954690      19095     0   28908494     0
>      14514     0     957924      19095     0   28908494     0
>      14403     0     950598      19095     0   28908494     0
>      14493     0     956538      19052     0   28843392     0
>      14544     0     959904      19095     0   28908494     0
>      14546     0     960036      19095     0   28908494     0
>      14558     0     960828      19095     0   28908494     0
>      14559     0     960894      19053     0   28844906     0
>      14597     0     963402      19094     0   28906980     0
>      14509     0     957594      19053     0   28844906     0
>      14527     0     958782      19137     0   28972082     0
>      14576     0     962016      19139     0   28973936     0
>      14575     0     961950      19096     0   28908494     0
>      14578     0     962148      19052     0   28843392     0
>      14519     0     958254      18968     0   28716216     0
>      14579     0     962214      19052     0   28843392     0
>      14533     0     959178      19095     0   28908494     0
>      14588     0     962808      19137     0   28972082     0
>      14503     0     957198      19053     0   28844906     0
>      14580     0     962280      19095     0   28908494     0
>      14479     0     955614      18968     0   28716216     0
>      14477     0     955482      19052     0   28843392     0
>      14618     0     964788      19137     0   28972082     0
>      14569     0     961554      19053     0   28844906     0
>      14586     0     962676      19095     0   28908494     0
>       4462     0     294492       5438     0    8224172     0
>
> ============ "netstat 1" with liondmask on a 2ms RTT connection ====
>
> root@stinky 17# netstat 1
>             input        (Total)           output
>    packets  errs      bytes    packets  errs      bytes colls
>          2     0        132          2     0          0     0
>       3908     0     258086       7004     0   10856439     0
>      29353     0    1937298      51940     0   78631282     0
>      29317     0    1934922      51911     0   78629768     0
>      29344     0    1936704      51894     0   78502592     0
>      29340     0    1936440      51841     0   78501078     0
>      29298     0    1933668      51860     0   78567694     0
>      29376     0    1938816      51947     0   78629768     0
>      29344     0    1936704      51928     0   78566180     0
>      20988     0    1385208      37580     0   56660114     0
>      19687     0    1299336      35473     0   53704786     0
>      19705     0    1300530      35431     0   53641198     0
>      19705     0    1300530      35431     0   53641198     0
>      19670     0    1298220      35346     0   53512508     0
>      19680     0    1298880      35388     0   53576096     0
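
The steady-state rows of the FreeBSD netstat tables quoted above can be reduced to an average packet size and a throughput figure. This is a minimal sketch: the helper name and the choice of sample rows are mine, and it assumes each row covers exactly one second of output traffic.

```python
def summarize(packets: int, nbytes: int):
    """Return (average bytes per packet, megabits per second) for a 1-second sample."""
    avg_size = nbytes / packets
    mbit_per_s = nbytes * 8 / 1e6
    return avg_size, mbit_per_s


# (output packets, output bytes) copied from rows of the quoted tables.
samples = {
    "liondmask=7, 22ms RTT": (32038, 48502682),
    "plain 4.8, 22ms RTT": (19095, 28908494),
    "liondmask set, 2ms RTT (peak)": (51940, 78631282),
}

results = {name: summarize(p, b) for name, (p, b) in samples.items()}
for name, (avg_size, mbit_per_s) in results.items():
    print("%-32s %7.1f bytes/pkt %7.1f Mb/s" % (name, avg_size, mbit_per_s))
```

All three samples come out at about 1514 bytes per packet, which would be consistent with a 1500-byte MTU plus a 14-byte Ethernet header, at roughly 388, 231 and 629 Mb/s respectively.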


