Date:      Thu, 27 Jun 2013 00:24:53 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Gleb Smirnoff <glebius@FreeBSD.org>
Cc:        Konstantin Belousov <kostikbel@gmail.com>, svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, Bruce Evans <brde@optusnet.com.au>
Subject:   Re: svn commit: r252032 - head/sys/amd64/include
Message-ID:  <20130626233533.H2933@besplex.bde.org>
In-Reply-To: <20130626091055.GU1214@FreeBSD.org>
References:  <20130622124832.S2347@besplex.bde.org> <20130622174921.I3112@besplex.bde.org> <20130623073343.GY91021@kib.kiev.ua> <20130623181458.J2256@besplex.bde.org> <20130624170849.GH91021@kib.kiev.ua> <20130625102023.K899@besplex.bde.org> <20130625062039.GJ91021@kib.kiev.ua> <20130625190352.P986@besplex.bde.org> <20130625205826.GM91021@kib.kiev.ua> <20130626092955.B891@besplex.bde.org> <20130626091055.GU1214@FreeBSD.org>

On Wed, 26 Jun 2013, Gleb Smirnoff wrote:

> On Wed, Jun 26, 2013 at 11:42:39AM +1000, Bruce Evans wrote:
> B> > Anyway, as Gleb said, there is no point in
> B> > optimizing the i386 kernel.
> B>
> B> I said that there is every point in optimizing the i386 kernel.  This
> B> applies even more to other 32-bit arches.  Some CPUs are much slower
> B> than modern x86's.  They shouldn't be slowed down more by inefficient
> B> KPIs.
>
> I didn't mean that i386 arch is a relic and should be ignored at all.
>
> What I actually meant, is that the problem of performance drop due to
> cache poisoning and loss of statistics with simple "+=" operation can
> be observed only at extremely high event rates, with multiple processors
> involved.

I think you already fixed cache poisoning, and it has nothing to do with
whether the access is a simple "+=" operation.  amd64 still uses a simple
"+=" operation (written in asm so that it uses the right instructions),
so the slow cmpxchg8b used on i386 can't be doing anything good for the
cache.

> The counter(9) is solution for these conditions. Thus we are interested
> in optimising amd64, not i386. The latter isn't affected neither positively
> nor negatively with these changes, just because last i386 CPUs can't reach
> the event rates where need for counter(9) arises. Yes, you can tweak
> implementation and obtain better results with microbenchmarks, but I bet
> that any change in counter(9) implementation won't affect packet forwarding
> rate on any i386. What we claim for i386 (and all other arches) that
> counter(9) is lossless, and that's all.
>
> I second to Konstantin, that we don't have objections in any changes to
> i386 part of counter, including a daemon, but the changes shouldn't affect
> amd64.

amd64 should be changed too, to use 32-bit pcpu counters to avoid ifdefs
and to use less cache.  You can't reach event rates that overflow 32-bit
counters faster than a daemon can accumulate them.  For example, with
10 Gbps ethernet the maximum packet rate is about 14 Mpps.  Suppose that
is all handled and counted on 1 CPU, which isn't possible yet.  The
daemon must run at least once every 286 seconds to keep up with that.
Byte counts are more interesting.  Counting 1 G/second of anything
requires running the daemon at least once every 4 seconds.
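
The intervals above follow from dividing 2^32 by the event rate.  A
minimal user-space sketch of that arithmetic, plus one way a daemon
could fold a wrapping 32-bit counter into a 64-bit total, is below.
The names max_fold_interval(), struct pcpu_counter32 and fold() are
hypothetical and are not part of counter(9); the folding scheme
(remember the last value seen, add the unsigned difference) is only
one possible design, not necessarily the one that would be committed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Longest interval, in whole seconds, that a 32-bit counter can go
 * unread at a given event rate before it wraps (2^32 events).
 */
uint64_t
max_fold_interval(uint64_t events_per_sec)
{
	assert(events_per_sec > 0);
	return ((UINT64_C(1) << 32) / events_per_sec);
}

/* Hypothetical per-CPU slot: 32-bit hot counter plus daemon-side state. */
struct pcpu_counter32 {
	uint32_t hot;	/* bumped with a plain "+=" on the owning CPU */
	uint32_t last;	/* value of "hot" seen at the previous daemon pass */
};

/*
 * One daemon pass: add the events seen since the last pass to a 64-bit
 * total.  Unsigned subtraction yields the right delta across a wrap of
 * "hot", provided fewer than 2^32 events occurred between passes.
 */
void
fold(struct pcpu_counter32 *c, uint64_t *total)
{
	uint32_t now = c->hot;

	*total += (uint32_t)(now - c->last);
	c->last = now;
}
```

With these definitions, max_fold_interval(15000000) gives 286 and
max_fold_interval(1000000000) gives 4.  At the 14.88 Mpps line rate for
minimum-size frames on 10 Gbps ethernet the result is 288, so the
286-second figure is on the conservative side.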

I don't remember how you distributed the counters to avoid cache poisoning.
Is it one pcpu counter per cache line, so that counters never poison nor
benefit from caching for other counters for the same CPU?  Or is it just
separate cache lines for each CPU?  I think the latter.  So there can be
several 64-bit counters per cache line, or twice as many 32-bit counters.
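
To make the "twice as many" point concrete, here is a small C11 sketch
of the second layout (separate cache lines for each CPU, with counters
for the same CPU packed together).  The 64-byte line size and the names
CACHE_LINE, COUNTERS_PER_LINE, struct pcpu_line32 and struct pcpu_line64
are assumptions for illustration only; the real counter(9) allocation
may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line size; 64 bytes is typical for x86 but not universal. */
#define	CACHE_LINE	64

/* How many counters of a given width fit in one cache line. */
#define	COUNTERS_PER_LINE(type)	(CACHE_LINE / sizeof(type))

/*
 * Hypothetical per-CPU blocks: one line-aligned array per CPU, so that
 * CPUs never share a line but counters for the same CPU do.
 */
struct pcpu_line32 {
	_Alignas(CACHE_LINE) uint32_t c[COUNTERS_PER_LINE(uint32_t)];
};

struct pcpu_line64 {
	_Alignas(CACHE_LINE) uint64_t c[COUNTERS_PER_LINE(uint64_t)];
};

/* Each block occupies exactly one line under the assumed line size. */
static_assert(sizeof(struct pcpu_line32) == CACHE_LINE, "one line per CPU");
static_assert(sizeof(struct pcpu_line64) == CACHE_LINE, "one line per CPU");
```

Under the 64-byte assumption a line holds 8 64-bit counters or 16
32-bit counters, so the 32-bit layout touches half as many lines for
the same set of counters.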

Bruce


