Date:      Thu, 22 Dec 2011 01:07:58 -0800
From:      Adrian Chadd <adrian@freebsd.org>
To:        Steve Kargl <sgk@troutmask.apl.washington.edu>
Cc:        Attilio Rao <attilio@freebsd.org>, Andrey Chernov <ache@nagual.pp.ru>, George Mitchell <george+freebsd@m5p.com>, Doug Barton <dougb@freebsd.org>, freebsd-stable@freebsd.org
Subject:   Re: SCHED_ULE should not be the default
Message-ID:  <CAJ-VmonkjXV-w52Ofbi7zrOYpCdrbjojkV-2kHBATe0JbTWikQ@mail.gmail.com>
In-Reply-To: <20111222005250.GA23115@troutmask.apl.washington.edu>
References:  <4EE1EAFE.3070408@m5p.com> <CAJ-FndBSOS3hKYqmPnVkoMhPmowBBqy9-%2BeJJEMTdoVjdMTEdw@mail.gmail.com> <20111215215554.GA87606@troutmask.apl.washington.edu> <CAJ-FndD0vFWUnRPxz6CTR5JBaEaY3gh9y7-Dy6Gds69_aRgfpg@mail.gmail.com> <20111222005250.GA23115@troutmask.apl.washington.edu>

Are you able to go through the emails here and grab out Attilio's
example for generating KTR scheduler traces?


Adrian

On 21 December 2011 16:52, Steve Kargl <sgk@troutmask.apl.washington.edu> wrote:
> On Fri, Dec 16, 2011 at 12:14:24PM +0100, Attilio Rao wrote:
>> 2011/12/15 Steve Kargl <sgk@troutmask.apl.washington.edu>:
>> > On Thu, Dec 15, 2011 at 05:25:51PM +0100, Attilio Rao wrote:
>> >>
>> >> I basically went through all the e-mail you just sent and identified 4
>> >> real reports on which we could work, and summarized them in the attached
>> >> Excel file.
>> >> I'd like George, Steve, Doug, Andrey and Mike to review the
>> >> data there and add more, if they want, or make more important
>> >> clarifications, in particular about the presence (or absence) of Xorg
>> >> in their workload.
>> >
>> > Your summary of my observations appears correct.
>> >
>> > I have grabbed an up-to-date /usr/src, built and
>> > installed world, and built and installed a new
>> > kernel on one of the nodes in my cluster.  It
>> > has
>> >
>>
>> It seems a perfect environment, just please make sure you made a
>> debug-free userland (setting MALLOC_PRODUCTION in jemalloc basically).
>>
>> The first thing is, can you try reproducing your case? As far as I got
>> it, for you it was enough to run N + small_amount of CPU-bound threads
>> to show performance penalty, so I'd ask you to start with using dnetc
>> or just your preferred cpu-bound workload and verify you can reproduce
>> the issue.
>> While it happens, please monitor thread bouncing and CPU utilization
>> via 'top' (you don't need to be 100% precise, just get an idea, and
>> keep an eye on things like excessive thread migration, excessive
>> thread binding, low throughput on CPU).
>> One note: if your workloads need to do I/O, please use a tmpfs or
>> memory storage to do so, in order to minimize I/O effects.
>> Also, verify this doesn't happen with 4BSD scheduler, just in case.
>>
>> Finally, if the problem is still in place, please recompile your
>> kernel by adding:
>> options KTR
>> options KTR_ENTRIES=262144
>> options KTR_COMPILE=(KTR_SCHED)
>> options KTR_MASK=(KTR_SCHED)
>>
>> And reproduce the issue.
>> When you are in the middle of the scheduling issue go with:
>> # ktrdump -ctf > ktr-ule-problem-YOURNAME.out
>>
>> and send it to the mailing list along with your dmesg and the
>> information on CPU utilization you gathered with top(1).
>>
>> That should cover it all, but if you have further questions, please
>> just go ahead.
>
> Attilio,
>
> I have placed several files at
>
> http://troutmask.apl.washington.edu/~kargl/freebsd
>
> dmesg.txt      --> dmesg for ULE kernel
> summary        --> A summary that includes top(1) output of all runs.
> sysctl.ule.txt --> sysctl -a for the ULE kernel
> ktr-ule-problem-kargl.out.gz
>
> I performed a series of tests with both 4BSD and ULE kernels.
> The 4BSD and ULE kernels are identical except of course for the
> scheduler.  Both witness and invariants are disabled, and malloc
> has been compiled without debugging.
>
> Here's what I did.  On the master node in my cluster, I ran an
> OpenMPI code that sends N jobs off to the node with the kernel
> of interest.  There is communication between the master and
> slaves to generate 16 independent chunks of data.  Note, there
> is no disk IO.  So, for example, N=4 will start 4 essentially
> identical numerically intensive jobs.  At the start of a run,
> the master node instructs each slave job to create a chunk of
> data.  After the data is created, the slave sends it back to the
> master and the master sends instructions to create the next chunk
> of data.  This communication continues until the 16 chunks have
> been assigned, computed, and returned to the master.
>
> Here is a rough measurement of the problem with ULE and numerically
> intensive loads.  This command is executed on the master
>
> time mpiexec -machinefile mf3 -np N sasmp sas.in
>
> Since time is executed on the master, only the 'real' time is of
> interest (the summary file includes user and sys times).  This
> command is run 5 times for each N value and up to 10 times for
> some N values with the ULE kernel.  The following table records
> the average 'real' time, and the number in (...) is the mean
> absolute deviation.
>
> #  N         ULE             4BSD
> # -------------------------------------
> #  4    223.27 (0.502)   221.76 (0.551)
> #  5    404.35 (73.82)   270.68 (0.866)
> #  6    627.56 (173.0)   247.23 (1.442)
> #  7    475.53 (84.07)   285.78 (1.421)
> #  8    429.45 (134.9)   223.64 (1.316)
>
> To me, these numbers demonstrate that ULE is not a good choice
> for an HPC workload.
>
> If you need more information, feel free to ask.  If you would
> like access to the node, I can probably arrange that.  But,
> we can discuss that off-line.
>
> --
> Steve
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
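
For anyone wanting to redo the arithmetic behind the table quoted above, here is a minimal Python sketch. The mean 'real' times are copied from Steve's table; the helper names (mean_abs_deviation, slowdown) and the per-run sample values are my own assumptions for illustration, not anything posted in the thread. It only shows how a mean absolute deviation and a ULE/4BSD ratio could be computed; it is not the script that produced the summary file.

```python
from statistics import mean


def mean_abs_deviation(samples):
    """Average absolute distance of each sample from the sample mean."""
    centre = mean(samples)
    return mean(abs(x - centre) for x in samples)


# Mean 'real' times from the quoted table: N -> (ULE, 4BSD).
table = {
    4: (223.27, 221.76),
    5: (404.35, 270.68),
    6: (627.56, 247.23),
    7: (475.53, 285.78),
    8: (429.45, 223.64),
}


def slowdown(times):
    """Ratio of the ULE mean time to the 4BSD mean time for each N."""
    return {n: ule / bsd for n, (ule, bsd) in times.items()}


if __name__ == "__main__":
    for n, ratio in slowdown(table).items():
        print(f"N={n}: ULE/4BSD = {ratio:.2f}")

    # Hypothetical per-run times, NOT taken from the thread; the real
    # per-run values are in the 'summary' file mentioned above.
    runs = [220.0, 221.0, 222.0, 223.0, 224.0]
    print(f"mean = {mean(runs):.2f}, MAD = {mean_abs_deviation(runs):.3f}")
```

Feeding the actual per-run 'real' times from the summary file into mean_abs_deviation should, if the same definition was used, reproduce the numbers shown in parentheses in the table.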


