To: freebsd-threads@freebsd.org
From: Ivan Voras
Date: Sun, 07 Oct 2007 16:52:03 +0200
Subject: Unexpected threading performance result

Hi,

For an unrelated purpose, I'm benchmarking the performance of tree algorithms in SMP environments, and my preliminary run has produced an unexpected result.

Here's the typical output from the (small) benchmark program, run on a dual-core Athlon64 (i386 mode):

Running benchmarks on small_nonuniform, 1000000 samples
Step 1: Running 100 loops
** Step 1 benchmark completed 100 loops in 84.44 seconds.
Step 2: Running 2 threads with 100 loops each
** Step 2 benchmark completed 100 loops in 2 threads in 167.46 seconds.

The interpretation is: running the same loop in two parallel threads doesn't result in a speedup; instead, it looks like the execution is serialized. The problem is that the loops are completely independent, there is no locking in their execution, and 'top' confirms that both threads in the program are running at 100% CPU each.

I verified this behaviour on:
- 7-CURRENT, i386, ULE
- 7-CURRENT, i386, 4BSD
- 6-STABLE, amd64, 4BSD

I can't really explain this behaviour, but it might not be related to FreeBSD: maybe I made a mistake in the program, or there's a hardware-related reason for it (maybe CPU cache thrashing from the tree traversal?). In any case, can someone shed some light on this? The main part of the (small) program is pasted below.

double time_start, time_b1, time_b2;
int n_data, n_samples;
int *data, *samples;


/* Look up every sample in the tree once; a missing sample is fatal. */
void bench_loop()
{
	int i;
	struct node *nd, find;
	for (i = 0; i < n_samples; i++) {
		find.data = samples[i];
		nd = RB_FIND(node_tree, &head, &find);
		if (nd == NULL)
			errx(1, "Sample %d was not found", find.data);
	}
}

void step1()
{
	int n;
	/* step 1 - simple tree traversal */
	printf("Step 1: Running %d loops\n", STEP1_ITER);
	for (n = 0; n < STEP1_ITER; n++)
		bench_loop();
	time_b1 = gettime();
	printf("** Step 1 benchmark completed %d loops in %0.2lf seconds.\n", STEP1_ITER, time_b1 - time_start);
}

/* Thread body: each thread runs the full set of STEP2_ITER loops. */
void *step2_thread(void *arg) {
	int n;
	for (n = 0; n < STEP2_ITER; n++)
		bench_loop();
	return NULL;
}

void step2()
{
	/* step 2 - run tree traversal in parallel threads */
	int n;
	pthread_t threads[STEP2_THREADS];

	printf("Step 2: Running %d threads with %d loops each\n", STEP2_THREADS, STEP2_ITER);
	for (n = 0; n < STEP2_THREADS; n++) {
		if (pthread_create(&threads[n], NULL, step2_thread, NULL) != 0)
			err(1, "Cannot spawn thread");
	}
	/* wait for all threads to finish before taking the end timestamp */
	for (n = 0; n < STEP2_THREADS; n++)
		pthread_join(threads[n], NULL);
	time_b2 = gettime();
	/* the elapsed time printed here is measured from time_start, as in step1() */
	printf("** Step 2 benchmark completed %d loops in %d threads in %0.2lf seconds.\n",
	    STEP2_ITER, STEP2_THREADS, time_b2 - time_start);
}
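
One thing worth ruling out in the listing: both printf calls subtract time_start. Assuming time_start is set once before Step 1 (the assignment is not in the pasted excerpt), the Step 2 figure would be cumulative, and 167.46 - 84.44 = 83.02 seconds would be the time spent in Step 2 alone. The sketch below shows one way to time each phase on its own. It is only an illustration: now_seconds() is a hypothetical stand-in for gettime(), whose definition is not shown, and busy_phase() is a placeholder workload, not the tree benchmark.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for the post's gettime(): wall-clock seconds
 * read from a monotonic clock. */
static double now_seconds(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Run one phase and return only the time spent inside it, so that
 * earlier phases cannot leak into the result. */
static double time_phase(void (*phase)(void))
{
	double begin;

	assert(phase != NULL);
	begin = now_seconds();
	phase();
	return now_seconds() - begin;
}

/* Placeholder workload (an assumption for this sketch); in the real
 * program this would be the body of step1() or step2(). */
static volatile unsigned long sink;

static void busy_phase(void)
{
	unsigned long i;

	for (i = 0; i < 1000000UL; i++)
		sink += i;
}

/* Print a per-phase result in the style of the benchmark output. */
static void report_phase(const char *name, double seconds)
{
	printf("** %s took %0.2f seconds.\n", name, seconds);
}
```

A call such as report_phase("Step 2", time_phase(busy_phase)) would then print the duration of that phase only. In the original program the equivalent change would be to subtract time_b1 instead of time_start in step2(), if the cumulative reading is indeed what happened.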