Date:      Sat, 21 Apr 2012 16:34:08 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        "K. Macy" <kmacy@FreeBSD.org>
Cc:        Andre Oppermann <andre@FreeBSD.org>, Luigi Rizzo <rizzo@iet.unipi.it>, current@FreeBSD.org, net@FreeBSD.org
Subject:   Re: Some performance measurements on the FreeBSD network stack
Message-ID:  <20120421155638.E982@besplex.bde.org>
In-Reply-To: <CAHM0Q_P3XvOZrfJW7dUa23H+YUMe608hoKY41DZ7BGGc=cKniQ@mail.gmail.com>
References:  <20120419133018.GA91364@onelab2.iet.unipi.it> <4F907011.9080602@freebsd.org> <20120419204622.GA94904@onelab2.iet.unipi.it> <CAHM0Q_M4wcEiWGkjWxE1OjLeziQN0vM+4_EYS_WComZ6=j5xhA@mail.gmail.com> <20120419212224.GA95459@onelab2.iet.unipi.it> <CAHM0Q_Md4M1YRA=RJD7-xVxehvwWFjU07PdA5vWFBR6PXE14Zw@mail.gmail.com> <20120420144410.GA3629@onelab2.iet.unipi.it> <CAHM0Q_P3XvOZrfJW7dUa23H+YUMe608hoKY41DZ7BGGc=cKniQ@mail.gmail.com>

On Fri, 20 Apr 2012, K. Macy wrote:

> On Fri, Apr 20, 2012 at 4:44 PM, Luigi Rizzo <rizzo@iet.unipi.it> wrote:

>> The small penalty when flowtable is disabled but compiled in is
>> probably because the net.flowtable.enable flag is checked
>> a bit deep in the code.
>>
>> The advantage with non-connect()ed sockets is huge. I don't
>> quite understand why disabling the flowtable still helps there.
>
> Do you mean having it compiled in but disabled still helps
> performance? Yes, that is extremely strange.

This reminds me that when I worked on this, I saw very large throughput
differences (in the 20-50% range) as a result of minor changes in
unrelated code.  I could get these changes intentionally by adding or
removing padding in unrelated unused text space, so the differences were
apparently related to text alignment.  I thought I had some significant
micro-optimizations, but it turned out that they were acting mainly by
changing the layout in related used text space where it is harder to
control.  Later, I suspected that the differences were more due to cache
misses for data than for text.  The CPU and its caching must affect this
significantly.  I tested on an AthlonXP and Athlon64, and the differences
were larger on the AthlonXP.  Both of these have a shared I/D cache so
pressure on the I part would affect the D part, but in this benchmark
the D part is much more active than the I part so it is unclear how
text layout could have such a large effect.
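The padding experiment described above can be reproduced in miniature. The following is a minimal sketch, assuming GCC or Clang targeting ELF (e.g. FreeBSD or Linux): a file-scope `asm` statement emits `PAD_BYTES` unused bytes into the text section, and rebuilding with different values (e.g. `-DPAD_BYTES=16`) may move the hot function relative to cache-line boundaries. `PAD_BYTES`, `hot_sum16()`, `hot_sum16_line_offset()` and the 64-byte line size are hypothetical, chosen for illustration; none of them come from the FreeBSD code discussed in this thread.

```c
#include <stddef.h>
#include <stdint.h>

#ifndef PAD_BYTES
#define PAD_BYTES 0     /* hypothetical knob: rebuild with -DPAD_BYTES=16, 32, ... */
#endif

#define STR_(x) #x
#define STR(x) STR_(x)

#define CACHE_LINE 64   /* assumption: 64-byte cache lines */

/*
 * Unreferenced bytes in the text section.  Nothing calls or reads them,
 * but they can shift the addresses of code emitted after them.  Where the
 * padding lands relative to other functions depends on the compiler's
 * output order.
 */
__asm__(".pushsection .text\n\t.skip " STR(PAD_BYTES) "\n\t.popsection\n");

/*
 * Hypothetical stand-in for a hot path: an RFC 1071-style 16-bit sum with
 * end-around carry (without the final complement).  Assumes n <= 65536 so
 * the 32-bit accumulator cannot overflow.
 */
uint32_t hot_sum16(const uint16_t *buf, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    while (sum >> 16)                       /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return sum;
}

/* Offset of hot_sum16()'s entry point within its (assumed) cache line. */
unsigned hot_sum16_line_offset(void)
{
    return (unsigned)((uintptr_t)&hot_sum16 % CACHE_LINE);
}
```

Timing `hot_sum16()` in builds that differ only in `PAD_BYTES` is one way to check whether a measured speedup survives a layout change. With GCC, `-fno-toplevel-reorder` keeps the padding in source order relative to the surrounding functions.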

Anyway, the large differences made it impossible to trust the results
of benchmarking any single micro-benchmark.  Also, ministat is useless
for understanding the results.  (I note that Luigi didn't provide any
standard deviations and neither would I. :-).  My results depended on
the cache behaviour but didn't change significantly when rerun, unless
the code was changed.
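The statistical trap described above can be made concrete. ministat compares two sets of runs with Student's t test using a pooled standard deviation. Below is a minimal sketch of that calculation, not ministat's source: if reruns of one binary barely vary, the pooled variance is tiny, so even a difference caused only by text or data layout yields a large t and is reported as significant. The test measures rerun noise; it cannot tell whether the intended code change or the layout it happened to produce is responsible. The names `struct run_stats`, `summarize()` and `pooled_t_squared()` are hypothetical, and the statistic is returned squared so the sketch needs no libm.

```c
#include <stddef.h>

/* Summary of one set of benchmark runs (e.g. one binary run n times). */
struct run_stats {
    size_t n;
    double mean;
    double var;     /* sample variance, n - 1 denominator */
};

/* Mean and sample variance of x[0..n-1]; requires n >= 2. */
struct run_stats summarize(const double *x, size_t n)
{
    struct run_stats s = { n, 0.0, 0.0 };

    for (size_t i = 0; i < n; i++)
        s.mean += x[i];
    s.mean /= (double)n;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - s.mean;
        s.var += d * d;
    }
    s.var /= (double)(n - 1);
    return s;
}

/*
 * Squared Student's t statistic for the difference of means, using the
 * pooled variance.  Compare it against the squared critical value from a
 * t table with a.n + b.n - 2 degrees of freedom.  The pooled variance
 * must be nonzero.
 */
double pooled_t_squared(struct run_stats a, struct run_stats b)
{
    double df = (double)(a.n + b.n - 2);
    double sp2 = ((double)(a.n - 1) * a.var + (double)(b.n - 1) * b.var) / df;
    double diff = b.mean - a.mean;

    return diff * diff / (sp2 * (1.0 / (double)a.n + 1.0 / (double)b.n));
}
```

Feeding in run times from two builds that differ only in padding in unused text would, under the behaviour described above, give a large `pooled_t_squared()` value even though no code on the measured path changed.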

Bruce


