From owner-freebsd-arm@FreeBSD.ORG  Fri Jan  3 06:07:19 2014
Return-Path: <owner-freebsd-arm@FreeBSD.ORG>
Delivered-To: freebsd-arm@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id C47444BA
 for <freebsd-arm@freebsd.org>; Fri,  3 Jan 2014 06:07:19 +0000 (UTC)
Received: from alogt.com (alogt.com [69.36.191.58])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 9E2F21AD4
 for <freebsd-arm@freebsd.org>; Fri,  3 Jan 2014 06:07:19 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=alogt.com;
 s=default; 
 h=Content-Transfer-Encoding:Content-Type:Mime-Version:References:In-Reply-To:Message-ID:Subject:Cc:To:From:Date;
 bh=YEInKPeXyFm/K39F1/akU6ImmnTvtwD8rIGGwjKh0a4=; 
 b=dKHB7oQg8zuCalFcqxljCthvXh1g5edyhhGAetFogeIlkS2kpaMNl1tPGXGG4Hx4NttKTEYiWm/zMpvmpwZD+91BAjqARKLT5IXuBhT8DlbRJlfj6TMloTrZujyAYQ/5Abp0DWxY9MlyJu7R5nDaZW591AQYO7KNWMU6JcfzKpY=;
Received: from [120.174.102.146] (port=49819 helo=X220.alogt.com)
 by sl-508-2.slc.westdc.net with esmtpsa (SSLv3:DHE-RSA-AES128-SHA:128)
 (Exim 4.82) (envelope-from <erichsfreebsdlist@alogt.com>)
 id 1Vyxua-0014QJ-E0; Thu, 02 Jan 2014 23:07:13 -0700
Date: Fri, 3 Jan 2014 14:06:58 +0800
From: Erich Dollansky <erichsfreebsdlist@alogt.com>
To: chump1@hushmail.com
Subject: Re: Beagle recommendations
Message-ID: <20140103140658.071f970d@X220.alogt.com>
In-Reply-To: <20140103052201.E9397200F5@smtp.hushmail.com>
References: <20140103052201.E9397200F5@smtp.hushmail.com>
X-Mailer: Claws Mail 3.9.2 (GTK+ 2.24.19; amd64-portbld-freebsd10.0)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: freebsd-arm@freebsd.org
X-BeenThere: freebsd-arm@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: "Porting FreeBSD to ARM processors." <freebsd-arm.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arm>,
 <mailto:freebsd-arm-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arm/>
List-Post: <mailto:freebsd-arm@freebsd.org>
List-Help: <mailto:freebsd-arm-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arm>,
 <mailto:freebsd-arm-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Jan 2014 06:07:19 -0000

Hi,

On Fri, 03 Jan 2014 00:22:01 -0500
chump1@hushmail.com wrote:

I should say up front that my experience is with PA-RISC, SPARC and
x86 CPUs, not with ARM CPUs.
> 
> I have a fairly simple task that involves processing something in a
> 2D array, MxN times. I took a naive approach, 1x process 1x thread,
> and it took a little longer than desired. Well now, I could do better
> with some multi processing, especially on a multi core box, right?
> 
One process with one thread? If I understand you correctly, you should
not expect to gain much.
> 
> Well, I have not had much luck. At first I spawned M threads and had
> each iterate over each N in turn, with M between 25-35. It took much,
> much longer than the single thread. I figured contention and overhead
> were costing me big, and gave it a shot with a scaled down version of
> the problem, M=10. Still, much slower than the single thread. A
> little confused, I went back to the big problem set (25-35), and made
> a new program that spawned only two threads, and each is limited to
> processing only even or only odd data sets. Even that still takes
> twice as long as the single thread version! What is up with that?
> 
Did you try one process per row, with one thread per column?

Do the processes and threads have to interact, or can each element be
processed independently of the other elements?
> 
> More important asides, I am barely doing any real processing at all.
> It is basically a no-op, barely doing more than incrementing the
> counter. Should I expect to see performance gains once I am doing
> real work in the processing portion of my program? Should I expect to

You will not see performance drop if you do more processing, because
at the moment the context switches cost more time than anything else.

> see much different behavior on a different OS? Also I have one

With a real-time OS it might be possible, but I think it unlikely, as
your problem has nothing to do with reaction time.

> physical processor, two cores. Would I see better gains with more
> cores? How do you find processes and threads scale against hardware
> overall?

Your main problem seems to be that you keep the OS busy with context
switches. Do more of the work in loops inside each thread. You could
try one process with one thread per row, and have each thread loop
through the columns. Again, only if your problem allows this.
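
As a rough illustration of the one-thread-per-row idea, here is a
minimal Python sketch. The names process_cell, process_row and
process_grid_by_rows are made up for this example, and process_cell is
only a stand-in for the real work. It assumes the elements are
independent, so each thread touches only its own row and no locking is
needed. On standard CPython builds with the global interpreter lock the
threads will not run CPU-bound work in parallel, so read it as a sketch
of how the work is divided, not as a benchmark.

```python
import threading


def process_cell(value):
    # Hypothetical stand-in for the real per-element work.
    return value + 1


def process_row(grid, row_index):
    # One thread owns one row and loops over all of its columns.
    row = grid[row_index]
    for col in range(len(row)):
        row[col] = process_cell(row[col])


def process_grid_by_rows(grid):
    # Start one thread per row (M threads), not one per element (M*N).
    threads = [
        threading.Thread(target=process_row, args=(grid, r))
        for r in range(len(grid))
    ]
    for t in threads:
        t.start()
    # Wait until every row has been processed.
    for t in threads:
        t.join()
    return grid
```

Each thread is created once and then does a whole row of work, so the
cost of creating and switching threads is spread over N elements
instead of being paid for every single one.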

And never forget: if this is all in a single array, you could use a
single process and then try to find the proper mix between the number
of threads and the number of loop iterations per thread. This would
take some load off the CPU cache.
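
One possible way to experiment with that mix is to make the number of
threads a parameter and give each thread a contiguous block of rows to
loop over. The sketch below is only an assumption about how this could
be laid out; process_cell, process_rows and process_grid_in_chunks are
invented names, process_cell is again a placeholder for the real work,
and num_threads must be at least 1. As with any CPython thread example,
it shows the partitioning and not a guaranteed speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def process_cell(value):
    # Hypothetical stand-in for the real per-element work.
    return value + 1


def process_rows(grid, start, stop):
    # One worker loops over a contiguous block of rows [start, stop).
    for r in range(start, stop):
        row = grid[r]
        for c in range(len(row)):
            row[c] = process_cell(row[c])


def process_grid_in_chunks(grid, num_threads):
    rows = len(grid)
    # Rows per worker, rounded up; at least 1 so the range step is valid.
    chunk = max(1, (rows + num_threads - 1) // num_threads)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [
            pool.submit(process_rows, grid, start, min(start + chunk, rows))
            for start in range(0, rows, chunk)
        ]
        # Calling result() re-raises any exception from a worker.
        for f in futures:
            f.result()
    return grid
```

Running this with num_threads set to 1, 2, 4 and so on, and timing each
run on the real workload, is one way to look for the point where adding
threads stops helping.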

Erich