From: chump1@hushmail.com
To: freebsd-arm@freebsd.org
Date: Fri, 03 Jan 2014 00:22:01 -0500
Subject: Beagle recommendations

I have a fairly simple task that involves processing each of the M x N entries in a 2D array. I took a naive approach, one process with one thread, and it took a little longer than I wanted. Surely I could do better with some parallelism, especially on a multi-core box, right? Well, I have not had much luck.
At first I spawned M threads (M is between 25 and 35) and had each one iterate over all N entries of its row. That took much, much longer than the single-threaded version. I figured contention and overhead were costing me a lot, so I tried a scaled-down version of the problem with M=10. It was still much slower than the single thread.

A little confused, I went back to the full problem size (M between 25 and 35) and wrote a new program that spawns only two threads, one limited to the even rows and the other to the odd rows. Even that still takes twice as long as the single-threaded version! What is going on?

Also, and perhaps more importantly, I am barely doing any real processing at all. The per-entry work is basically a no-op, barely more than incrementing a counter. Should I expect to see performance gains once I am doing real work in the processing portion of my program? Should I expect very different behavior on a different OS?

For reference, my machine has one physical processor with two cores. Would I see better gains with more cores? In your experience, how do processes and threads scale with the hardware in general?

Thanks!
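For illustration only: the post does not include any code, so the following is a hypothetical C sketch (POSIX threads) of the two-thread even/odd split described above, not the poster's actual program. The names `count_two_threads`, `process_rows`, and `struct parity_job` are made up, and the per-entry "work" is just a counter increment, matching the near no-op described in the post.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical work description for one thread: it visits the rows whose
 * index has the given parity (0 = even rows, 1 = odd rows). */
struct parity_job {
    size_t rows;          /* M in the post */
    size_t cols;          /* N in the post */
    size_t parity;        /* 0 or 1 */
    unsigned long count;  /* result, written only by the owning thread */
};

static void *process_rows(void *arg)
{
    struct parity_job *job = arg;
    unsigned long local = 0;  /* accumulate in a local, not in shared memory */

    for (size_t i = job->parity; i < job->rows; i += 2)
        for (size_t j = 0; j < job->cols; j++)
            local++;          /* placeholder for the real per-entry work */

    job->count = local;       /* publish the result once, after the loop */
    return NULL;
}

/* Splits an M x N grid between two threads by row parity and returns the
 * total number of entries visited, or 0 if a thread could not be created. */
unsigned long count_two_threads(size_t rows, size_t cols)
{
    struct parity_job jobs[2] = {
        { rows, cols, 0, 0 },
        { rows, cols, 1, 0 },
    };
    pthread_t tid[2];

    if (pthread_create(&tid[0], NULL, process_rows, &jobs[0]) != 0)
        return 0;
    if (pthread_create(&tid[1], NULL, process_rows, &jobs[1]) != 0) {
        pthread_join(tid[0], NULL);  /* clean up the thread that did start */
        return 0;
    }

    pthread_join(tid[0], NULL);
    pthread_join(tid[1], NULL);
    return jobs[0].count + jobs[1].count;
}
```

Calling `count_two_threads(30, 1000)` should return 30000; build with `cc -pthread`. Each thread accumulates into a local variable and writes to the shared `jobs` array only once, so the two threads are not repeatedly writing to the same cache line inside the loop. Note also that an optimizing compiler may reduce a loop that only increments a local counter to plain arithmetic, so timings of such a no-op body may say little about a real workload.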