Date:      Wed, 15 Aug 2018 18:52:17 +0200
From:      Michio Honda <micchie@sfc.wide.ad.jp>
To:        Navdeep Parhar <np@FreeBSD.org>, freebsd-current@freebsd.org
Subject:   Re: TCP server app performance
Message-ID:  <9e67c516-799c-8a1e-0cc5-64ab1e582a98@sfc.wide.ad.jp>
In-Reply-To: <cc95d407-378c-392e-1f0f-67f22dc89f9e@FreeBSD.org>
References:  <CA+Sc9E3_6bw68odAMH2Y-SzR2-PjuxZG6jp8P0JZV6z9LgmwQw@mail.gmail.com> <cc95d407-378c-392e-1f0f-67f22dc89f9e@FreeBSD.org>



On 08/14/2018 09:30 PM, Navdeep Parhar wrote:
> On 8/12/18 9:50 AM, Honda Michio wrote:
>> Hi,
>>
>> I'm measuring TCP server app performance using my toy web server.
>> It just accepts TCP connections and responds with HTTP OK to the clients.
>> It monitors sockets using kqueue, and processes each ready descriptor using
>> a pair of read() and write(). (in more detail, it's
>> https://github.com/micchie/netmap/tree/paste/apps/phttpd)
>>
>> Using 100 persistent TCP connections (the client sends 44 B HTTP GET and
>> the server responds with 151 B of HTTP OK) and a single CPU core, I only
>> get 152K requests per second, which is 2.5x slower than Linux running the
>> same app (which uses epoll instead of kqueue).
>> I cannot explain this gap by myself. Does anybody have some intuition about
>> how much FreeBSD would get with such workloads?
>> I tried disabling TCP delayed ack and changing interrupt rates, but no
>> significant difference was observed.
>>
>> I use FreeBSD-CURRENT with GENERIC-NODEBUG (git commit hash: 3015145c3aa4b).
>> For hardware, the server has a Xeon Silver 4110 and an Intel X540 NIC (I
>> activate only a single queue since I test with a single CPU core). All the
>> offloads are disabled.
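[For reference, the tuning described above corresponds to roughly these commands. The interface name ix0 is an assumption, and the single-queue tunable is a boot-time loader setting, so this is a sketch rather than a verified recipe:]

```shell
# Disable TCP delayed ACK (runtime sysctl)
sysctl net.inet.tcp.delayed_ack=0

# Disable NIC offloads on the X540 (ix driver); interface name assumed
ifconfig ix0 -rxcsum -txcsum -tso -lro

# Restrict the ix driver to a single queue; this is a loader tunable,
# so it goes in /boot/loader.conf and takes effect on the next boot
echo 'hw.ix.num_queues=1' >> /boot/loader.conf
```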
> 
> I hope hw L3/L4 checksumming is still on?
They are off, but the exchanged messages are too small to benefit from these 
offloads anyway.
> 
> Are your results similar to what you get with 100 (same number as your
> test clients) netperf's doing TCP_RR on this setup, or wildly different?
I cannot find any netperf option that drives multiple connections with 
TCP_RR, and I don't find any equivalent in iperf either. But 
single-connection performance is similar to what I get with wrk and my 
HTTP server.

Running 100 netperf processes does not report aggregate throughput, and it 
also means 100 separate event loops rather than a single kevent() loop over 
all connections, which is quite different behaviour from what I want to 
measure.
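For what it's worth, the usual workaround is a small wrapper that launches N netperf instances in parallel and sums their per-instance transaction rates, in the spirit of the well-known super_netperf script. A sketch (the server address is a placeholder, and the position of the transactions/sec field in the -P 0 output line is an assumption):

```shell
#!/bin/sh
# Launch 100 parallel TCP_RR instances and sum their transaction
# rates, since netperf itself does not report an aggregate.
# -P 0 suppresses banners so each instance prints one result line;
# the last field of that line is assumed to be transactions/sec.
# -r 44,151 matches the 44 B request / 151 B response of the test.
SERVER=192.0.2.1	# placeholder test-server address
for i in $(seq 1 100); do
	netperf -H "$SERVER" -t TCP_RR -l 30 -P 0 -- -r 44,151 &
done |
awk '{ sum += $NF } END { printf "aggregate: %.0f trans/s\n", sum }'
```

This still runs 100 independent processes rather than one event loop over 100 sockets, so it measures stack throughput, not the single-loop server behaviour discussed above.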

Cheers,
- Michio

> 
> Regards,
> Navdeep
> 


