From owner-freebsd-performance@FreeBSD.ORG Sat Mar 12 10:02:27 2011
Message-ID: <4D7B44AF.7040406@FreeBSD.org>
Date: Sat, 12 Mar 2011 11:02:23 +0100
From: Martin Matuska <mm@FreeBSD.org>
To: Poul-Henning Kamp
Cc: freebsd-performance@FreeBSD.org, freebsd-current@FreeBSD.org
In-Reply-To: <98496.1299861978@critter.freebsd.dk>
References: <98496.1299861978@critter.freebsd.dk>
Subject: Re: FreeBSD Compiler Benchmark: gcc-base vs. gcc-ports vs. clang
List-Id: Performance/tuning

Hi Poul-Henning,

I have redone the test for the majority of the processors, this time taking
5 samples of each whole test run and calculating the average, standard
deviation, relative standard deviation, standard error and relative
standard error. The relative standard error is below 0.25% for ~91% of the
tests, between 0.25% and 0.5% for ~7%, between 0.5% and 1.0% for ~1%, and
between 1.0% and 2.0% for <1%. By a "test" I mean 5 runs with the same
settings of the same compiler on the same processor.

So let's say the string/base64 test for a Core i7 now shows the following
(score +/- standard deviation):

gcc421:        82.7892 points +/- 0.8314 (1.00%)
gcc45-nocona:  96.0882 points +/- 1.1652 (1.21%)

For a relative comparison of two settings on the same test I could
calculate the difference of the averages = 13.2990 points (16.06%) and the
sum of the standard deviations = 2.4834 points (3.00%).

Assuming normally distributed results, I could therefore say:

With 95% probability, gcc45-nocona is faster than gcc421 by at least
10.18% (16.06 - 1.96 x 3.00), or with 99.9% probability by at least
6.19% (16.06 - 3.2906 x 3.00).

So I should probably pick a significance level (e.g. 95%, 99% or 99.9%)
and normalize all the test scores for that level. Results outside the
interval (difference below zero) would then not be significant. What
significance level should I take?

I hope this approach is better :)

On 11.03.2011 17:46, Poul-Henning Kamp wrote:
> In message <4D7A42CC.8020807@FreeBSD.org>, Martin Matuska writes:
>
>> But what I can say, e.g.
>> for the Intel Atom processor, if there are performance gains in all but
>> one test (that falls 2% behind), generic perl code (the routines
>> benchmarked) on this processor is very likely to run faster with that
>> setup.
>
> No, actually you cannot say that, unless you run all the tests at least
> three times for each compiler(+flag), calculate the average and standard
> deviation of all the tests, and see which, if any, of the results are
> statistically significant.
>
> Until you do that, your numbers are meaningless, because we have no idea
> what the signal/noise ratio is.
>
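The scoring scheme described in the mail can be sketched in Python. This is a hypothetical illustration, not code from the thread; the function names and sample values are invented, and the speedup bound follows the mail's recipe (difference of averages minus z times the sum of the standard deviations, both as a percentage of the baseline):

```python
# Hypothetical sketch of the statistics described in the mail:
# per-test summary (mean, standard deviation, relative standard error)
# and a lower bound on the speedup at a chosen significance level.
import statistics

def summarize(samples):
    """Return mean, sample standard deviation, and relative standard
    error of the mean (in percent) for one test's repeated runs."""
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)           # sample standard deviation
    se = sd / len(samples) ** 0.5            # standard error of the mean
    return mean, sd, 100.0 * se / mean       # RSE in percent

def min_speedup(base_mean, base_sd, new_mean, new_sd, z=1.96):
    """Lower bound on the relative improvement, as in the mail:
    difference of the averages minus z times the sum of the standard
    deviations, both expressed as a percentage of the baseline score.
    z = 1.96 corresponds to the 95% level, 3.2906 to 99.9%."""
    diff_pct = 100.0 * (new_mean - base_mean) / base_mean
    noise_pct = 100.0 * (base_sd + new_sd) / base_mean
    return diff_pct - z * noise_pct

# Five runs of one (made-up) test:
mean, sd, rse = summarize([82.1, 83.5, 82.9, 83.0, 82.4])
print(round(mean, 4), round(rse, 2))

# Speedup bound for two settings with made-up scores:
print(round(min_speedup(100.0, 1.0, 110.0, 1.0), 2))
```

A result at or below zero at the chosen z would mean the difference is not significant at that level, matching the normalization proposed above.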