Date: Fri, 3 Jan 2014 00:52:45 -0500
From: Rayson Ho <raysonlogin@gmail.com>
To: chump1@hushmail.com
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: Re: pthread basics and contention
Message-ID: <CAHwLALPNoerSBvP1wOiLQPE6oiTO4j%2Bk3nU0=mq6ogRfXQKZRg@mail.gmail.com>
In-Reply-To: <20140103051707.C11B6200F5@smtp.hushmail.com>
References: <20140103051707.C11B6200F5@smtp.hushmail.com>

It depends on how you partition the work items. If the even & odd data
end up sharing the same cacheline, then it can be slow...

You may want to google: cache ping pong effect

Rayson

==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html

On Fri, Jan 3, 2014 at 12:17 AM, <chump1@hushmail.com> wrote:
> I have a fairly simple task that involves processing something in a
> 2D array, MxN times. I took a naive approach, 1x process 1x thread,
> and it took a little longer than desired. Well now, I could do better
> with some multi processing, especially on a multi core box, right?
>
> Well, I have not had much luck. At first I spawned M threads and had
> each iterate over each N in turn, with M between 25-35. It took much,
> much longer than the single thread. I figured contention and overhead
> were costing me big, and gave it a shot with a scaled down version of
> the problem, M=10. Still, much slower than the single thread. A little
> confused, I went back to the big problem set (25-35), and made a new
> program that spawned only two threads, and each is limited to
> processing only even or only odd data sets. Even that still takes
> twice as long as the single thread version! What is up with that?
>
> More important asides, I am barely doing any real processing at all.
> It is basically a no-op, barely doing more than incrementing the
> counter. Should I expect to see performance gains once I am doing real
> work in the processing portion of my program? Should I expect to see
> much different behavior on a different OS? Also I have one physical
> processor, two cores. Would I see better gains with more cores? How do
> you find processes and threads scale against hardware overall?
>
> Thanks!
>
> Sent using Hushmail
>
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
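
For illustration, here is a rough sketch of the cache-line sharing described in the reply above. It is not the original poster's program: the struct names, the timed_pair() helper and the 64-byte CACHE_LINE value are all assumptions made up for this example (64 bytes is common on x86 but not guaranteed), and it assumes a C11 compiler. Two threads each increment their own counter; in one layout the counters sit side by side and probably share a cache line, in the other each counter is aligned to its own line.

```c
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() */
#include <assert.h>               /* static_assert */
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define CACHE_LINE 64  /* ASSUMPTION: cache-line size in bytes */

/* "Even" and "odd" counters packed together: likely on one cache line. */
struct shared_counters {
    volatile uint64_t even;
    volatile uint64_t odd;
};

/* Same counters, each aligned to its own (assumed) cache line. */
struct padded_counters {
    _Alignas(CACHE_LINE) volatile uint64_t even;
    _Alignas(CACHE_LINE) volatile uint64_t odd;
};

static_assert(sizeof(struct padded_counters) >= 2 * CACHE_LINE,
              "padded counters must not share a cache line");

/* Hypothetical per-thread work description. */
struct job {
    volatile uint64_t *counter;
    uint64_t iterations;
};

/* Near no-op work, as in the question: just bump a counter. */
static void *worker(void *arg)
{
    struct job *j = arg;
    for (uint64_t i = 0; i < j->iterations; i++)
        (*j->counter)++;
    return NULL;
}

/*
 * Run two threads, one per counter, and return the elapsed wall-clock
 * time in seconds, or -1.0 if a thread could not be created.
 */
static double timed_pair(volatile uint64_t *a, volatile uint64_t *b,
                         uint64_t iterations)
{
    struct job jobs[2] = { { a, iterations }, { b, iterations } };
    pthread_t t[2];
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    if (pthread_create(&t[0], NULL, worker, &jobs[0]) != 0)
        return -1.0;
    if (pthread_create(&t[1], NULL, worker, &jobs[1]) != 0) {
        pthread_join(t[0], NULL);
        return -1.0;
    }
    pthread_join(t[0], NULL);
    pthread_join(t[1], NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

To try it, call timed_pair(&s.even, &s.odd, n) on a struct shared_counters and timed_pair(&p.even, &p.odd, n) on a struct padded_counters from a small main(), with a large n. On a multi-core machine the shared layout may well come out slower, but how much (if at all) depends on the hardware, the compiler flags and the actual cache-line size.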