From owner-freebsd-threads@FreeBSD.ORG  Sun Oct  7 14:52:40 2007
Return-Path: <owner-freebsd-threads@FreeBSD.ORG>
Delivered-To: freebsd-threads@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3CF2A16A417
	for <freebsd-threads@freebsd.org>; Sun,  7 Oct 2007 14:52:40 +0000 (UTC)
	(envelope-from gofdt-freebsd-threads@m.gmane.org)
Received: from ciao.gmane.org (main.gmane.org [80.91.229.2])
	by mx1.freebsd.org (Postfix) with ESMTP id C26D413C47E
	for <freebsd-threads@freebsd.org>; Sun,  7 Oct 2007 14:52:39 +0000 (UTC)
	(envelope-from gofdt-freebsd-threads@m.gmane.org)
Received: from list by ciao.gmane.org with local (Exim 4.43)
	id 1IeXUZ-0007dP-JZ
	for freebsd-threads@freebsd.org; Sun, 07 Oct 2007 14:52:27 +0000
Received: from 89-172-61-172.adsl.net.t-com.hr ([89.172.61.172])
	by main.gmane.org with esmtp (Gmexim 0.1 (Debian))
	id 1AlnuQ-0007hv-00
	for <freebsd-threads@freebsd.org>; Sun, 07 Oct 2007 14:52:27 +0000
Received: from ivoras by 89-172-61-172.adsl.net.t-com.hr with local (Gmexim
	0.1 (Debian)) id 1AlnuQ-0007hv-00
	for <freebsd-threads@freebsd.org>; Sun, 07 Oct 2007 14:52:27 +0000
X-Injected-Via-Gmane: http://gmane.org/
To: freebsd-threads@freebsd.org
From: Ivan Voras <ivoras@freebsd.org>
Date: Sun, 07 Oct 2007 16:52:03 +0200
Lines: 90
Message-ID: <fearqk$sot$1@sea.gmane.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Complaints-To: usenet@sea.gmane.org
X-Gmane-NNTP-Posting-Host: 89-172-61-172.adsl.net.t-com.hr
User-Agent: Thunderbird 2.0.0.0 (X11/20070527)
Sender: news <news@sea.gmane.org>
Subject: Unexpected threading performance result
X-BeenThere: freebsd-threads@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Threading on FreeBSD <freebsd-threads.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-threads>, 
	<mailto:freebsd-threads-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-threads>
List-Post: <mailto:freebsd-threads@freebsd.org>
List-Help: <mailto:freebsd-threads-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-threads>,
	<mailto:freebsd-threads-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 07 Oct 2007 14:52:40 -0000

Hi,

For an unrelated purpose, I'm benchmarking performance of tree 
algorithms in SMP environments and my preliminary run has an unexpected 
result. Here's the typical output from the (small) benchmark program, 
run on a dual-core Athlon64 (i386 mode):

Running benchmarks on small_nonuniform, 1000000 samples
Step 1: Running 100 loops
** Step 1 benchmark completed 100 loops in 84.44 seconds.
Step 2: Running 2 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 2 threads in 167.46 seconds.

The interpretation is: running the same loop twice, in two parallel 
threads doesn't result in a speedup, but it looks like the execution is 
serialized. The problem is: the loops are completely independent, no 
locking in their execution, and 'top' confirms that both threads in the 
program are running at 100% CPU each.

I verified this behaviour on:

- 7-CURRENT, i386, ULE
- 7-CURRENT, i386, 4BSD
- 6-STABLE, amd64, 4BSD

I can't really explain this behaviour, but it might not be related to 
FreeBSD - maybe I made a mistake in the program or there's a 
hardware-related reason for it (maybe CPU cache trashing from the tree 
traversal?). In any case, can someone shed some light on this?

The main part of the (small) program is pasted below.


  47 double time_start, time_b1, time_b2;
  48 int n_data, n_samples;
  49 int *data, *samples;
  50
  51
  52 void bench_loop()
  53 {
  54         int i;
  55         struct node *nd, find;
  56         for (i = 0; i < n_samples; i++) {
  57                 find.data = samples[i];
  58                 nd = RB_FIND(node_tree, &head, &find);
  59                 if (nd == NULL)
  60                         errx(1, "Sample %d was not found", find.data);
  61         }
  62 }
  63
  64 void step1()
  65 {
  66         int n;
  67         /* step 1 - simple tree traversal */
  68         printf("Step 1: Running %d loops\n", STEP1_ITER);
  69         for (n = 0; n < STEP1_ITER; n++)
  70                 bench_loop();
  71         time_b1 = gettime();
  72         printf("** Step 1 benchmark completed %d loops in %0.2lf 
seconds.\n", STEP1_ITER, time_b1 - time_start);
  73 }
  74
  75 void *step2_thread(void *arg) {
  76         int n;
  77         for (n = 0; n < STEP2_ITER; n++)
  78                 bench_loop();
  79         return NULL;
  80 }
  81
  82 void step2()
  83 {
  84         /* step 2 - run tree traversal in parallel threads */
  85         int n;
  86         pthread_t threads[STEP2_THREADS];
  87
  88         printf("Step 2: Running %d threads with %d loops each\n", 
STEP2_THREADS, STEP2_ITER);
  89         for (n = 0; n < STEP2_THREADS; n++) {
  90                 if (pthread_create(&threads[n], NULL, step2_thread, 
NULL) != 0)
  91                         err(1, "Cannot spawn thread");
  92         }
  93         for (n = 0; n < STEP2_THREADS; n++)
  94                 pthread_join(threads[n], NULL);
  95         time_b2 = gettime();
  96         printf("** Step 2 benchmark completed %d loops in %d 
threads in %0.2lf seconds.\n",
  97                         STEP2_ITER, STEP2_THREADS, time_b2 - 
time_start);
  98 }