Date:      Fri, 3 Jan 2014 22:41:15 -0800
From:      Tim Kientzle <tim@kientzle.com>
To:        chump1@hushmail.com
Cc:        "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject:   Re: pthread basics and contention
Message-ID:  <4EFEA29F-4D6E-4B4A-8C26-E15FA62B574C@kientzle.com>
In-Reply-To: <CAHwLALPNoerSBvP1wOiLQPE6oiTO4j%2Bk3nU0=mq6ogRfXQKZRg@mail.gmail.com>
References:  <20140103051707.C11B6200F5@smtp.hushmail.com> <CAHwLALPNoerSBvP1wOiLQPE6oiTO4j%2Bk3nU0=mq6ogRfXQKZRg@mail.gmail.com>

Depending on the calculation involved, you may
be memory bus constrained, not CPU constrained.

On modern processors, it is often the case that
it takes longer to get the data to/from memory than
to actually compute anything.  In those cases, splitting
your work into threads just gives you more CPUs
waiting on the same slow memory.

Determining whether this is the issue requires good
processor-level profiling tools, i.e. ones that can read
the hardware performance counters (cache misses and the like).

Tim


On Fri, Jan 3, 2014 at 12:17 AM,  <chump1@hushmail.com> wrote:
> I have a fairly simple task that involves processing something in a 2D
> array, MxN times. I took a naive approach, 1x process 1x thread, and it
> took a little longer than desired. Well now, I could do better with some
> multi processing, especially on a multi core box, right?
>
> Well, I have not had much luck. At first I spawned M threads and had
> each iterate over each N in turn, with M between 25-35. It took much,
> much longer than the single thread. I figured contention and overhead
> were costing me big, and gave it a shot with a scaled down version of
> the problem, M=10. Still, much slower than the single thread. A little
> confused, I went back to the big problem set (25-35), and made a new
> program that spawned only two threads, and each is limited to processing
> only even or only odd data sets. Even that still takes twice as long as
> the single thread version! What is up with that?
>
> More important asides, I am barely doing any real processing at all.
> It is basically a no-op, barely doing more than incrementing the
> counter. Should I expect to see performance gains once I am doing real
> work in the processing portion of my program? Should I expect to see
> much different behavior on a different OS? Also I have one physical
> processor, two cores. Would I see better gains with more cores? How do
> you find processes and threads scale against hardware overall?
>
> Thanks!
>
> Sent using Hushmail




