From owner-freebsd-threads@FreeBSD.ORG  Sun Oct  7 20:09:51 2007
Return-Path: <owner-freebsd-threads@FreeBSD.ORG>
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id F279716A418
	for <freebsd-threads@freebsd.org>; Sun,  7 Oct 2007 20:09:50 +0000 (UTC)
	(envelope-from gofdt-freebsd-threads@m.gmane.org)
Received: from ciao.gmane.org (main.gmane.org [80.91.229.2])
	by mx1.freebsd.org (Postfix) with ESMTP id 659F813C4A7
	for <freebsd-threads@freebsd.org>; Sun,  7 Oct 2007 20:09:49 +0000 (UTC)
	(envelope-from gofdt-freebsd-threads@m.gmane.org)
Received: from list by ciao.gmane.org with local (Exim 4.43)
	id 1IecRa-0006sD-Du
	for freebsd-threads@freebsd.org; Sun, 07 Oct 2007 20:09:42 +0000
Received: from 78-1-114-229.adsl.net.t-com.hr ([78.1.114.229])
	by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-threads@freebsd.org>; Sun, 07 Oct 2007 20:09:42 +0000
Received: from ivoras by 78-1-114-229.adsl.net.t-com.hr with local (Gmexim 0.1
	(Debian)) id 1AlnuQ-0007hv-00
	for <freebsd-threads@freebsd.org>; Sun, 07 Oct 2007 20:09:42 +0000
X-Injected-Via-Gmane: http://gmane.org/
To: freebsd-threads@freebsd.org
From: Ivan Voras <ivoras@freebsd.org>
Date: Sun, 07 Oct 2007 22:09:28 +0200
Lines: 99
Message-ID: <febedp$jv0$1@sea.gmane.org>
References: <fearqk$sot$1@sea.gmane.org> <200710071805.39399.tijl@ulyssis.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Complaints-To: usenet@sea.gmane.org
X-Gmane-NNTP-Posting-Host: 78-1-114-229.adsl.net.t-com.hr
User-Agent: Thunderbird 2.0.0.0 (X11/20070527)
In-Reply-To: <200710071805.39399.tijl@ulyssis.org>
Sender: news <news@sea.gmane.org>
Subject: Re: Unexpected threading performance result
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Threading on FreeBSD <freebsd-threads.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-threads>, 
	<mailto:freebsd-threads-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-threads>
List-Post: <mailto:freebsd-threads@freebsd.org>
List-Help: <mailto:freebsd-threads-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-threads>,
	<mailto:freebsd-threads-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 07 Oct 2007 20:09:51 -0000

Tijl Coosemans wrote:
> On Sunday 07 October 2007 16:52:03 Ivan Voras wrote:
>> For an unrelated purpose, I'm benchmarking performance of tree 
>> algorithms in SMP environments and my preliminary run has an unexpected 
>> result. Here's the typical output from the (small) benchmark program, 
>> run on a dual-core Athlon64 (i386 mode):
>>
>> Running benchmarks on small_nonuniform, 1000000 samples
>> Step 1: Running 100 loops
>> ** Step 1 benchmark completed 100 loops in 84.44 seconds.
>> Step 2: Running 2 threads with 100 loops each
>> ** Step 2 benchmark completed 100 loops in 2 threads in 167.46 seconds.
> 
> My guess is, that in the beginning of step1() and step2() you have to
> add a line "time_start = gettime();".

Of course I have to. I was so focused on the low-level stuff that I made 
exactly the mistake your suggestion points to. Thanks for the help!
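
For the archives, here is a minimal sketch of what the fix amounts to. It 
is not the actual benchmark source: gettime() is implemented here with 
gettimeofday() as an assumption, run_loops() is a hypothetical stand-in 
for the tree benchmark, and step2() is shown single-threaded to keep the 
example short. The only point is that each step resets time_start before 
it begins.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Assumed helper: wall-clock time in seconds (the name comes from the
 * quoted suggestion; this implementation is a guess). */
double gettime(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Start stamp shared by all steps; every step must reset it. */
double time_start;

/* Hypothetical stand-in for the real tree benchmark: burns some CPU. */
void run_loops(int loops)
{
    volatile unsigned long sink = 0;

    assert(loops > 0);
    for (int i = 0; i < loops; i++)
        for (unsigned long j = 0; j < 100000UL; j++)
            sink += j;
}

/* Step 1 restarts the clock, then measures only its own work. */
double step1(int loops)
{
    time_start = gettime();
    run_loops(loops);
    return gettime() - time_start;
}

/* Step 2 must restart the clock too. Without the first line, the
 * time already spent in step 1 is counted a second time. */
double step2(int loops)
{
    time_start = gettime();
    run_loops(loops);
    return gettime() - time_start;
}
```

With the reset in place, the two reported times add up to no more than 
the total wall time, which was not the case in my first post.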

The results make sense now, and if anyone's interested, I'm pasting them 
below. I made the extra effort and ran it under both the 4BSD and ULE 
schedulers on 7-CURRENT (SMP, dual-core).

-- 4BSD nonuniform samples --
Running benchmarks on small_nonuniform, 1000000 samples
Step 1: Running 100 loops
** Step 1 benchmark completed 100 loops in 86.33 seconds.
Step 2: Running 2 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 2 threads in 82.79 seconds.
Step 2: Running 3 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 3 threads in 124.67 seconds.
Step 2: Running 4 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 4 threads in 166.32 seconds.
Step 2: Running 5 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 5 threads in 210.67 seconds.
Step 2: Running 6 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 6 threads in 251.83 seconds.
Step 2: Running 7 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 7 threads in 291.25 seconds.

-- ULE nonuniform samples --
Running benchmarks on small_nonuniform, 1000000 samples
Step 1: Running 100 loops
** Step 1 benchmark completed 100 loops in 84.09 seconds.
Step 2: Running 2 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 2 threads in 83.43 seconds.
Step 2: Running 3 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 3 threads in 126.21 seconds.
Step 2: Running 4 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 4 threads in 166.66 seconds.
Step 2: Running 5 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 5 threads in 209.40 seconds.
Step 2: Running 6 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 6 threads in 250.36 seconds.
Step 2: Running 7 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 7 threads in 291.92 seconds.
Step 2: Running 8 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 8 threads in 333.42 seconds.

-- 4BSD uniform samples --
Running benchmarks on small_uniform, 1000000 samples
Step 1: Running 100 loops
** Step 1 benchmark completed 100 loops in 93.33 seconds.
Step 2: Running 2 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 2 threads in 89.33 seconds.
Step 2: Running 3 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 3 threads in 135.20 seconds.
Step 2: Running 4 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 4 threads in 179.96 seconds.
Step 2: Running 5 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 5 threads in 226.40 seconds.
Step 2: Running 6 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 6 threads in 269.57 seconds.
Step 2: Running 7 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 7 threads in 314.06 seconds.
Step 2: Running 8 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 8 threads in 358.67 seconds.

-- ULE uniform samples --
Running benchmarks on small_uniform, 1000000 samples
Step 1: Running 100 loops
** Step 1 benchmark completed 100 loops in 89.76 seconds.
Step 2: Running 2 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 2 threads in 89.90 seconds.
Step 2: Running 3 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 3 threads in 135.75 seconds.
Step 2: Running 4 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 4 threads in 179.72 seconds.
Step 2: Running 5 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 5 threads in 226.10 seconds.
Step 2: Running 6 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 6 threads in 269.63 seconds.
Step 2: Running 7 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 7 threads in 314.76 seconds.
Step 2: Running 8 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 8 threads in 359.44 seconds.

"uniform" / "nonuniform" describes the distribution of the random number 
function.
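
As a rough sanity check on the numbers above, here is a small sketch of 
the model I would expect them to follow. It assumes perfect scaling and 
fair scheduling of equal CPU-bound threads, which is an idealization, and 
the function names are made up for this example. With t1 as the Step 1 
time, N threads on C cores should take about t1 when N <= C and about 
t1 * N / C otherwise.

```c
#include <assert.h>

/* Ideal wall time for `threads` equal CPU-bound jobs on `cores` cores,
 * given the single-thread time t1. Assumes perfect scaling and a
 * scheduler that shares the cores fairly. */
double ideal_time(double t1, int threads, int cores)
{
    assert(t1 > 0.0 && threads > 0 && cores > 0);
    if (threads <= cores)
        return t1;
    return t1 * (double)threads / (double)cores;
}

/* Ratio of ideal to measured time. A value of 1.0 means the run matched
 * the model; above 1.0 means the run was faster than the model predicts. */
double scaling_ratio(double t1, double measured, int threads, int cores)
{
    assert(measured > 0.0);
    return ideal_time(t1, threads, cores) / measured;
}
```

For example, the 4BSD nonuniform run gives ideal_time(86.33, 4, 2) = 
172.66 seconds against a measured 166.32 seconds, so the measured times 
stay within a few percent of this simple model.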