Date:      Thu, 2 Mar 2017 14:00:51 -0800
From:      Gleb Smirnoff <glebius@FreeBSD.org>
To:        Sepherosa Ziehau <sepherosa@gmail.com>
Cc:        Julien Charbon <jch@freebsd.org>, Jason Eggleston <jeggleston@llnw.com>, "freebsd-net@freebsd.org" <net@freebsd.org>, hiren@freebsd.org, jtl@freebsd.org, rrs@freebsd.org
Subject:   Re: listening sockets as non sockets
Message-ID:  <20170302220051.GU1044@FreeBSD.org>
In-Reply-To: <CAMOc5cyNA3cWjagQ6VHz5fy8F=j6J1_xGNxkr-GbqUxP6phLpg@mail.gmail.com>
References:  <20170127005251.GM2611@FreeBSD.org> <20170210063024.GE1973@FreeBSD.org> <20170216184903.GF58829@FreeBSD.org> <0858647a-ec3c-1a78-053f-d04397a82d8a@freebsd.org> <20170222232704.GJ8899@FreeBSD.org> <CAMOc5cyNA3cWjagQ6VHz5fy8F=j6J1_xGNxkr-GbqUxP6phLpg@mail.gmail.com>

On Sun, Feb 26, 2017 at 11:37:59PM +0800, Sepherosa Ziehau wrote:
S> r314268 -> solisten
S> 
S> 1KB:
S> Performance (reqs/s)
S> 77916.71 -> 26240.37
S> Latency average
S> 121ms -> 294ms
...
S> So what I have seen is solisten's performance is 1/3 of r314268, and
S> average latency doubles.

I did similar testing, and my results are the following, for three
consecutive runs:

            solisten            head (r306199)
req/s       63k, 63k, 63k       46k, 47k, 44k
latency     213, 214, 208       232, 233, 223

So I see neither a latency increase nor a req/s regression. I see
the opposite.
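
To put numbers on "the opposite", here is a small sketch that averages
the three runs from the table above and compares them with the figures
Sephe quoted. The function and variable names are mine, and the req/s
values are the rounded "k" figures, so treat the percentages as
approximate.

```python
from statistics import mean

# Three consecutive runs, copied from the table above (req/s in thousands).
solisten = {"req_s_k": [63, 63, 63], "latency": [213, 214, 208]}
head = {"req_s_k": [46, 47, 44], "latency": [232, 233, 223]}


def relative_change(new, old):
    """Percent change of mean(new) relative to mean(old)."""
    return (mean(new) - mean(old)) / mean(old) * 100.0


req_change = relative_change(solisten["req_s_k"], head["req_s_k"])
lat_change = relative_change(solisten["latency"], head["latency"])

# Sephe's 1KB figures (r314268 -> solisten), copied from the quoted text.
sephe_req_ratio = 26240.37 / 77916.71
sephe_lat_ratio = 294 / 121

print(f"my runs: req/s {req_change:+.1f}%, latency {lat_change:+.1f}%")
print(f"Sephe:   req/s x{sephe_req_ratio:.2f}, latency x{sephe_lat_ratio:.2f}")
```

With the rounded inputs this comes out at roughly a third more requests
per second and somewhat lower latency for solisten in my runs, against
about one third of the request rate in Sephe's.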

What is different about my test? First, this is NetflixBSD, for both
the head and the solisten installations. Head is based on r306199, and
solisten is based on r314150 plus
cb79de4fd2912450c4ab808c017ae395fd636bd8 from my github.

To my knowledge, the parts of the stack that differ in NetflixBSD
do not touch sonewconn(), accept4() or the other parts we are interested
in. I also didn't notice any drastic changes in head between r306199 and
r314150. So imho it is fair to attribute the difference to my change.

The hardware is different. It is a Supermicro X9SRH-7F/7TF with a
Xeon(R) CPU E5-2697 v2 @ 2.70GHz, 256 GB RAM and a Chelsio cxl(4)
at 40 Gbit/s. I have two boxes of this configuration, one running head
and the other solisten. The client box has the same CPU and mainboard,
but has a lagg of two cxls, capping it at 80 Gbit/s. The bandwidth isn't
important; what is important is that it provides more parallelism at
the sending side.

The nginx has multiple listening sockets, but we bombard only one, the
AF_INET socket at *:80. The nginx is configured with 64 worker processes
and accept_mutex is on. So even with one socket, it seems I got some
improvement.
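
For readers unfamiliar with the setup, the pattern under test is many
workers calling accept() on one shared listening socket, serialized by
a mutex. Below is a minimal sketch of that pattern. It is not nginx's
implementation: it uses threads rather than worker processes, and all
names in it (worker, run_demo, the tiny fixed response) are made up for
illustration.

```python
import socket
import threading

RESPONSE = b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok"


def worker(listener, accept_mutex, counts, idx, stop):
    """Accept connections from the shared listener, one worker at a time."""
    while not stop.is_set():
        # Only the mutex holder sits in accept(); the others wait here.
        with accept_mutex:
            try:
                conn, _addr = listener.accept()
            except socket.timeout:
                continue  # nothing arrived; release the mutex and retry
        with conn:
            conn.recv(1024)         # read (and ignore) the request
            conn.sendall(RESPONSE)  # serve a fixed reply, then close
        counts[idx] += 1


def run_demo(workers=4, requests=20):
    """Start the workers, send sequential requests, return per-worker counts."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # ephemeral port on loopback
    listener.listen(128)
    listener.settimeout(0.1)         # lets workers notice the stop flag
    port = listener.getsockname()[1]

    accept_mutex = threading.Lock()
    stop = threading.Event()
    counts = [0] * workers
    threads = [
        threading.Thread(target=worker,
                         args=(listener, accept_mutex, counts, i, stop))
        for i in range(workers)
    ]
    for t in threads:
        t.start()

    for _ in range(requests):
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(b"GET / HTTP/1.0\r\n\r\n")
            while client.recv(1024):  # drain until the server closes
                pass

    stop.set()
    for t in threads:
        t.join()
    listener.close()
    return counts
```

Calling run_demo() returns a list with the number of connections each
worker accepted; the sum equals the number of requests sent. Every new
connection goes through the same listening socket, which is the code
path that solisten changes.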

I ran your wrk at 498d70f6da5a201f109488eeaf31c8ba891dc163, and the
command used on the sending side is:

./wrk -c 15000 -t 48 -d 120s --delay --latency --connreqs 1 http://host/file

The only difference from your command is the thread count. My box has
many more cores.

The file is of size 1657 bytes.
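
As a rough sanity check that link bandwidth is not the limiting factor,
the payload rate implied by these numbers can be estimated as below.
This is a back-of-the-envelope sketch: it counts only the file body
(no HTTP or TCP/IP headers), uses the rounded req/s figures from my
table, and the names are mine.

```python
FILE_SIZE_BYTES = 1657   # size of the requested file
LINK_GBIT_S = 40         # server-side cxl(4) link speed


def payload_gbit_per_s(req_per_s, size_bytes=FILE_SIZE_BYTES):
    """Payload bandwidth in Gbit/s for a given request rate."""
    return req_per_s * size_bytes * 8 / 1e9


solisten_gbit = payload_gbit_per_s(63_000)
head_gbit = payload_gbit_per_s(46_000)
solisten_link_share = solisten_gbit / LINK_GBIT_S * 100.0

print(f"solisten: {solisten_gbit:.2f} Gbit/s "
      f"({solisten_link_share:.1f}% of the link)")
print(f"head:     {head_gbit:.2f} Gbit/s")
```

Both rates are well under 1 Gbit/s of payload, a small fraction of the
40 Gbit/s link, so the test is exercising connection setup rather than
bulk throughput.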

Sephe, can you please get hwpmc dumps from your test on solisten and
head, specifically from the test that shows solisten being 3x slower?

Julien, your testing will also be much appreciated. It looks like
Sephe's result shouldn't block you from trying it.

-- 
Totus tuus, Glebius.
