Date: Sun, 07 Oct 2007 22:09:28 +0200
From: Ivan Voras <ivoras@freebsd.org>
To: freebsd-threads@freebsd.org
Subject: Re: Unexpected threading performance result
Message-ID: <febedp$jv0$1@sea.gmane.org>
In-Reply-To: <200710071805.39399.tijl@ulyssis.org>
References: <fearqk$sot$1@sea.gmane.org> <200710071805.39399.tijl@ulyssis.org>

Tijl Coosemans wrote:
> On Sunday 07 October 2007 16:52:03 Ivan Voras wrote:
>> For an unrelated purpose, I'm benchmarking performance of tree
>> algorithms in SMP environments and my preliminary run has an unexpected
>> result. Here's the typical output from the (small) benchmark program,
>> run on a dual-core Athlon64 (i386 mode):
>>
>> Running benchmarks on small_nonuniform, 1000000 samples
>> Step 1: Running 100 loops
>> ** Step 1 benchmark completed 100 loops in 84.44 seconds.
>> Step 2: Running 2 threads with 100 loops each
>> ** Step 2 benchmark completed 100 loops in 2 threads in 167.46 seconds.
>
> My guess is, that in the beginning of step1() and step2() you have to
> add a line "time_start = gettime();".

Of course I do. I was so focused on the low-level stuff that I made
exactly the kind of silly mistake your suggestion points at. Thanks for
the help!

The results make sense now, and if anyone's interested, I'm pasting them
below. I made an extra effort and ran it under both the 4BSD and ULE
schedulers on 7-CURRENT (SMP, dual-core).

-- 4BSD nonuniform samples --
Running benchmarks on small_nonuniform, 1000000 samples
Step 1: Running 100 loops
** Step 1 benchmark completed 100 loops in 86.33 seconds.
Step 2: Running 2 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 2 threads in 82.79 seconds.
Step 2: Running 3 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 3 threads in 124.67 seconds.
Step 2: Running 4 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 4 threads in 166.32 seconds.
Step 2: Running 5 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 5 threads in 210.67 seconds.
Step 2: Running 6 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 6 threads in 251.83 seconds.
Step 2: Running 7 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 7 threads in 291.25 seconds.

-- ULE nonuniform samples --
Running benchmarks on small_nonuniform, 1000000 samples
Step 1: Running 100 loops
** Step 1 benchmark completed 100 loops in 84.09 seconds.
Step 2: Running 2 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 2 threads in 83.43 seconds.
Step 2: Running 3 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 3 threads in 126.21 seconds.
Step 2: Running 4 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 4 threads in 166.66 seconds.
Step 2: Running 5 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 5 threads in 209.40 seconds.
Step 2: Running 6 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 6 threads in 250.36 seconds.
Step 2: Running 7 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 7 threads in 291.92 seconds.
Step 2: Running 8 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 8 threads in 333.42 seconds.

-- 4BSD uniform samples --
Running benchmarks on small_uniform, 1000000 samples
Step 1: Running 100 loops
** Step 1 benchmark completed 100 loops in 93.33 seconds.
Step 2: Running 2 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 2 threads in 89.33 seconds.
Step 2: Running 3 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 3 threads in 135.20 seconds.
Step 2: Running 4 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 4 threads in 179.96 seconds.
Step 2: Running 5 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 5 threads in 226.40 seconds.
Step 2: Running 6 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 6 threads in 269.57 seconds.
Step 2: Running 7 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 7 threads in 314.06 seconds.
Step 2: Running 8 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 8 threads in 358.67 seconds.

-- ULE uniform samples --
Running benchmarks on small_uniform, 1000000 samples
Step 1: Running 100 loops
** Step 1 benchmark completed 100 loops in 89.76 seconds.
Step 2: Running 2 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 2 threads in 89.90 seconds.
Step 2: Running 3 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 3 threads in 135.75 seconds.
Step 2: Running 4 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 4 threads in 179.72 seconds.
Step 2: Running 5 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 5 threads in 226.10 seconds.
Step 2: Running 6 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 6 threads in 269.63 seconds.
Step 2: Running 7 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 7 threads in 314.76 seconds.
Step 2: Running 8 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 8 threads in 359.44 seconds.

"uniform" / "nonuniform" describes the distribution of the random number function.
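
One way to sanity-check that the figures above "make sense" is to compare them with a simple ideal model. The sketch below is not part of the benchmark program. It takes the ULE nonuniform timings from this message and assumes that each thread does the same 100 loops of work and that the work divides evenly over the two cores with no scheduling overhead. The names ideal_seconds and scaling_table are hypothetical helpers made up for this illustration.

```python
# Wall-clock seconds copied from the "ULE nonuniform samples" run in this
# message. Key = thread count (1 is the single-threaded Step 1).
ule_nonuniform = {
    1: 84.09, 2: 83.43, 3: 126.21, 4: 166.66,
    5: 209.40, 6: 250.36, 7: 291.92, 8: 333.42,
}

CORES = 2               # dual-core machine, as stated in the message
LOOPS_PER_THREAD = 100  # every thread runs 100 loops


def ideal_seconds(threads, single, cores=CORES):
    """Ideal wall-clock time, ASSUMING work splits evenly over the cores
    and scheduling costs nothing. Up to `cores` threads run in parallel,
    so the time only grows once threads outnumber cores."""
    return single * max(1.0, threads / cores)


def scaling_table(times, cores=CORES):
    """Return (threads, measured, ideal, measured/ideal, loops_per_second)."""
    single = times[1]
    rows = []
    for threads, seconds in sorted(times.items()):
        ideal = ideal_seconds(threads, single, cores)
        throughput = threads * LOOPS_PER_THREAD / seconds
        rows.append((threads, seconds, ideal, seconds / ideal, throughput))
    return rows


for threads, seconds, ideal, ratio, throughput in scaling_table(ule_nonuniform):
    print(f"{threads} threads: measured {seconds:7.2f} s, "
          f"ideal {ideal:7.2f} s, ratio {ratio:.3f}, "
          f"{throughput:.2f} loops/s")
```

Under these assumptions every measured time falls within about one percent of the ideal, and total throughput roughly doubles from one thread to two and then stays flat, which is what a dual-core machine should show for a CPU-bound workload.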