Date:      Sat, 30 Mar 2002 15:10:20 +0100
From:      Poul-Henning Kamp <phk@critter.freebsd.dk>
To:        Robert Watson <rwatson@FreeBSD.ORG>
Cc:        Matthew Dillon <dillon@apollo.backplane.com>, John Baldwin <jhb@FreeBSD.ORG>, freebsd-smp@FreeBSD.ORG
Subject:   Re: Syscall contention tests return, userret() bugs/issues. 
Message-ID:  <76368.1017497420@critter.freebsd.dk>
In-Reply-To: Your message of "Sat, 30 Mar 2002 08:30:48 EST." <Pine.NEB.3.96L.1020330082409.73912V-100000@fledge.watson.org> 

In message <Pine.NEB.3.96L.1020330082409.73912V-100000@fledge.watson.org>, Robert Watson writes:

>That said, if getuid as the example micro-benchmark can be demonstrated
>to causally affect (optimize) the macro-benchmark, then the selection of
>micro-benchmark by implementation facility sounds reasonable to me. :-)

Well, my gripe with microbenchmarks like this is that they are very,
very hard to get right.

Matt obviously didn't get it right, as he himself noticed: one test
case ran faster despite the fact that it was doing more work.

This means that the behaviour of caches (of all sorts) was a larger
factor than his particular change to the code.

The elimination (practically or by calculation) of the effects of
caches on microbenchmarks is by now a science unto itself.

I am very afraid that we will see people optimize for the cache
footprint of their microbenchmarks rather than for the code the
microbenchmarks are supposed to measure.

Remember how Linux optimized for the wrong parameters because of
lmbench?

We don't want to go there...

The only credible way to get sensible results from a microbenchmark
that can be extrapolated to macro performance involves adding a known
or predictable, varying entropy load as a jitter factor and using
long integration times (>6 hours).  That automatically takes you into
the territory of temperature stabilization, atomic-referenced clock
signals and so on.
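
For illustration only, a crude sketch in C of what that could look
like, with getuid(2) standing in as the operation under test (to match
Matt's example).  The jitter thread, the buffer size and the batch
size are all assumptions of mine; a real setup would additionally need
the temperature and clock discipline just mentioned:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define	BUFSZ	(4 * 1024 * 1024)	/* big enough to spill the caches */

static volatile int done;

/* Background jitter: randomly sized copies keep the caches perturbed. */
static void *
jitter_load(void *arg)
{
	char *src, *dst;

	src = calloc(1, BUFSZ);
	dst = calloc(1, BUFSZ);
	while (!done)
		memcpy(dst, src, (size_t)rand() % BUFSZ);
	free(src);
	free(dst);
	return (NULL);
}

int
main(void)
{
	struct timespec t0, t1;
	pthread_t tid;
	double total;
	long i, j, samples;

	samples = 1000000;	/* crank up until the run takes hours */
	total = 0.0;
	pthread_create(&tid, NULL, jitter_load, NULL);
	for (i = 0; i < samples; i++) {
		clock_gettime(CLOCK_MONOTONIC, &t0);
		for (j = 0; j < 1000; j++)	/* amortize timer overhead */
			(void)getuid();
		clock_gettime(CLOCK_MONOTONIC, &t1);
		total += (t1.tv_sec - t0.tv_sec) +
		    (t1.tv_nsec - t0.tv_nsec) / 1e9;
	}
	done = 1;
	pthread_join(tid, NULL);
	printf("mean per call: %.9f sec\n", total / (samples * 1000.0));
	return (0);
}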

And quite frankly, having gone there and come back, I can personally
tell you that life isn't long enough for that.

(And no, just disabling the caches is not a solution, because then
you are not putting the CPU in a representative memory environment
anymore; that's like benchmarking car performance only in 1st gear.)

So right now I think that our requirement for doing optimizations
should be:

	1.  It simplifies the code significantly.
or
	2.  It carries undisputed theoretical improvement.
or
	3.  It gives a statistically significant macroscopic improvement
	    in a (reasonably) well-defined workload of relevance.

The practical guide to executing #3 should be:

	A = Time reference code
	B = Time modified code
	C = Time reference code
	D = Time modified code

Unless both B and D (the modified code) are lower than both A and C
(the reference), it will take a lot of carefully controlled test runs
to prove that there is a statistically significant improvement
(standard deviations and all that...)
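
To make the scheme concrete, here is a minimal sketch of such an
interleaved harness in C.  run_reference() and run_modified() are
hypothetical stand-ins for whatever exercises the reference and the
modified code; the final test implements the "both modified runs beat
both reference runs" criterion.  It is an illustration, not a
finished tool:

#include <stdio.h>
#include <time.h>

/* Hypothetical stand-ins for the real workloads. */
static void run_reference(void) { /* exercise the reference code */ }
static void run_modified(void)  { /* exercise the modified code */ }

static double
timed(void (*fn)(void))
{
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	fn();
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return ((t1.tv_sec - t0.tv_sec) +
	    (t1.tv_nsec - t0.tv_nsec) / 1e9);
}

int
main(void)
{
	double a, b, c, d;

	a = timed(run_reference);	/* A */
	b = timed(run_modified);	/* B */
	c = timed(run_reference);	/* C */
	d = timed(run_modified);	/* D */

	printf("A=%.6f B=%.6f C=%.6f D=%.6f\n", a, b, c, d);
	if (b < a && b < c && d < a && d < c)
		printf("modified faster in both runs\n");
	else
		printf("no clear win; run many controlled tests\n");
	return (0);
}

The interleaving is the point: if the machine drifts during the run
(thermals, background daemons), both code paths see the drift instead
of one of them absorbing all of it.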

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
