Date:      Tue, 13 Dec 2011 00:48:38 +0100
From:      "O. Hartmann" <ohartman@zedat.fu-berlin.de>
To:        Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc:        Bruce Cran <bruce@cran.org.uk>, "O. Hartmann" <ohartman@mail.zedat.fu-berlin.de>, Current FreeBSD <freebsd-current@freebsd.org>, freebsd-stable@freebsd.org, freebsd-performance@freebsd.org
Subject:   Re: SCHED_ULE should not be the default
Message-ID:  <4EE692D6.5010208@zedat.fu-berlin.de>
In-Reply-To: <20111212170604.GA74044@troutmask.apl.washington.edu>
References:  <4EE1EAFE.3070408@m5p.com> <4EE22421.9060707@gmail.com> <4EE6060D.5060201@mail.zedat.fu-berlin.de> <20111212155159.GB73597@troutmask.apl.washington.edu> <4EE6295B.3020308@cran.org.uk> <20111212170604.GA74044@troutmask.apl.washington.edu>

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)

On 12/12/11 18:06, Steve Kargl wrote:
> On Mon, Dec 12, 2011 at 04:18:35PM +0000, Bruce Cran wrote:
>> On 12/12/2011 15:51, Steve Kargl wrote:
>>> This comes up every 9 months or so, and must be approaching FAQ
>>> status. In an HPC environment, I recommend 4BSD. Depending on the
>>> workload, ULE can cause a severe increase in turnaround time when
>>> doing already long computations. If you have an MPI application,
>>> simply launching ncpu+1 or more jobs can show the problem. PS:
>>> search the list archives for "kargl and ULE".
>>
>> This isn't something that can be fixed by tuning ULE? For example, for
>> desktop applications kern.sched.preempt_thresh should be set to 224 from
>> its default. I'm wondering if the installer should ask people what the
>> typical use will be, and tune the scheduler appropriately.
>>

Is the tuning of kern.sched.preempt_thresh, and a proper method of
estimating its correct value for the intended workload, documented
in the manpages, perhaps in tuning(7)?

I find it hard to crawl through the pros and cons scattered across
mailing lists to determine a correct value for this seemingly
important tunable.
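For reference, a minimal sketch of how the value quoted above would be made persistent, assuming a FreeBSD kernel built with the ULE scheduler (kern.sched.preempt_thresh is a ULE tunable). The figure 224 is only the desktop suggestion from this thread, not a value derived for any particular workload. The same setting can be tried at runtime with "sysctl kern.sched.preempt_thresh=224", and "sysctl kern.sched.name" reports which scheduler the running kernel uses.

```
# /etc/sysctl.conf
# Applied at boot; assumes a kernel built with SCHED_ULE.
# 224 is the desktop suggestion quoted above, not a measured optimum.
kern.sched.preempt_thresh=224
```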

>
> Tuning kern.sched.preempt_thresh did not seem to help for
> my workload.  My code is a classic master-slave OpenMPI
> application where the master runs on one node and all
> cpu-bound slaves are sent to a second node.  If I send
> ncpu+1 jobs to the 2nd node with ncpu cpus, then
> ncpu-1 jobs are assigned to the first ncpu-1 cpus.  The
> last two jobs are assigned to the ncpu'th cpu, and
> these ping-pong on this cpu.  AFAICT, it is a cpu
> affinity issue, where ULE is trying to keep each job
> associated with its initially assigned cpu.
>
> While one might suggest that starting ncpu+1 jobs
> is not prudent, my example is just that: an
> example showing that ULE has performance issues.
> So I can now either start only ncpu jobs on each node
> in the cluster and email all other users asking them
> not to use those nodes, or use 4BSD and not worry
> about loading issues.
>
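For the 4BSD route, the scheduler is selected when the kernel is built. A minimal sketch of a custom kernel configuration, assuming an amd64 system whose GENERIC config selects SCHED_ULE; the config name MYKERNEL is hypothetical.

```
# /usr/src/sys/amd64/conf/MYKERNEL  (hypothetical custom config name)
include   GENERIC
ident     MYKERNEL

nooptions SCHED_ULE    # drop the ULE scheduler selected by GENERIC
options   SCHED_4BSD   # use the traditional 4BSD scheduler instead
```

The kernel would then be built and installed from /usr/src with "make buildkernel KERNCONF=MYKERNEL" and "make installkernel KERNCONF=MYKERNEL", followed by a reboot.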



-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (FreeBSD)

iQEcBAEBAgAGBQJO5pLWAAoJEOgBcD7A/5N86FIIAMlp2MmSfYGAw+Gqn5MuN/s1
VxWt+47R+tii3x2I5rvjigs2+c5BbMhQ5B/+LS1qU8OspeAwWcvqYnXCXwKs7kUo
FG+8mmdyVaqt9s1hoh/W4tHgDgL/DCMxwkIfS3yVubjqOltDo7npcre7sMoUaEjL
lv0ySiLArwHbnD4mdrC3gJz/fW0enmNOl9wGYWWcUPcDdJ5XdYMSfSGk0W6bpSgA
ewDaoPtz1jh/CkLAVH59/cxcHowtsM9YcrdTOPKOIAI9amNChlvtuv8Sv8g2LC9e
RhgNHCE6RKVqAIpyIZLTFZ6pUfTtQeI6CtqWHDDAvhYAUEZxZmBDErazPkkirWQ=
=prJ+
-----END PGP SIGNATURE-----



