Date: Fri, 3 Jan 2014 14:06:58 +0800
From: Erich Dollansky
To: chump1@hushmail.com
Cc: freebsd-arm@freebsd.org
Subject: Re: Beagle recommendations

Hi,

On Fri, 03 Jan 2014 00:22:01 -0500
chump1@hushmail.com wrote:

I have to say that my experience is not with ARM CPUs but with PA-RISC,
SPARC and x86 CPUs.

>
> I have a fairly simple task that involves processing something in a
> 2D array, MxN times. I took a naive approach, 1x process 1x thread,
> and it took a little longer than desired. Well now, I could do better
> with some multi processing, especially on a multi core box, right?
>
One process and one thread? You should not gain much there, as far as I
understand your description.

>
> Well, I have not had much luck. At first I spawned M threads and had
> each iterate over each N in turn, with M between 25-35. It took much,
> much longer than the single thread. I figured contention and overhead
> were costing me big, and gave it a shot with a scaled down version of
> the problem, M=10. Still, much slower than the single thread. A
> little confused, I went back to the big problem set (25-35), and made
> a new program that spawned only two threads, and each is limited to
> processing only even or only odd data sets. Even that still takes
> twice as long as the single thread version! What is up with that?
>
Did you try one process per row, with one thread per column? Do the
processes and threads have to interact, or can each element be processed
independently of the other elements?

>
> More important asides, I am barely doing any real processing at all.
> It is basically a no-op, barely doing more than incrementing the
> counter.
> Should I expect to see performance gains once I am doing
> real work in the processing portion of my program? Should I expect to

You will not see performance drop when you add more processing, because
at the moment the context switches cost more time than anything else.

> see much different behavior on a different OS? Also I have one

If you used a real-time OS it might be possible, but I consider it
unlikely, as your problem has nothing to do with reaction time.

> physical processor, two cores. Would I see better gains with more
> cores? How do you find processes and threads scale against hardware
> overall?

Your main problem seems to be that you keep the OS busy with context
switches. Use more loops and fewer threads. You could try one process
with one thread per row and then loop through the columns. Again, only
if your problem allows this.

And never forget: if this is all in a single array, you could use a
single process and then try to find the proper mix between the number of
threads and loops. This would take some load off the CPU cache.

Erich
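
To illustrate the "few threads, more loops" idea: a minimal sketch, in
Python purely for brevity, since the language of the original program is
not stated. It gives each of a small, fixed number of threads one
contiguous block of rows and lets that thread loop over the columns
itself. The names split_rows, process_block and process_grid are
hypothetical, and the increment only stands in for the real work. Note
that CPython's global interpreter lock keeps CPU-bound threads from
running in parallel, so this shows the structure only, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(num_rows, num_workers):
    """Return (start, stop) row ranges, one contiguous block per worker."""
    base, extra = divmod(num_rows, num_workers)
    ranges = []
    start = 0
    for w in range(num_workers):
        # The first `extra` workers take one additional row each.
        stop = start + base + (1 if w < extra else 0)
        if stop > start:  # skip empty blocks when there are few rows
            ranges.append((start, stop))
        start = stop
    return ranges


def process_block(grid, start, stop):
    """Loop over rows [start, stop) and all their columns in one thread."""
    count = 0
    for r in range(start, stop):
        row = grid[r]
        for c in range(len(row)):
            row[c] += 1  # placeholder for the real per-element work
            count += 1
    return count


def process_grid(grid, num_workers=2):
    """Process every element once with a small, fixed number of threads."""
    blocks = split_rows(len(grid), num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Blocks cover disjoint rows, so the threads never touch the
        # same element and need no locking.
        futures = [pool.submit(process_block, grid, a, b) for a, b in blocks]
        return sum(f.result() for f in futures)
```

With 30 rows and two workers, split_rows(30, 2) hands each thread 15
consecutive rows, so only two threads exist no matter how large M is.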