From: Adrian Chadd
Date: Thu, 22 Dec 2011 01:07:58 -0800
To: Steve Kargl
Cc: Attilio Rao, Andrey Chernov, George Mitchell, Doug Barton, freebsd-stable@freebsd.org
Subject: Re: SCHED_ULE should not be the default
In-Reply-To: <20111222005250.GA23115@troutmask.apl.washington.edu>
References: <4EE1EAFE.3070408@m5p.com> <20111215215554.GA87606@troutmask.apl.washington.edu> <20111222005250.GA23115@troutmask.apl.washington.edu>

Are you able to go through the emails here and grab out Attilio's
example for generating KTR scheduler traces?


Adrian

On 21 December 2011 16:52, Steve Kargl wrote:
> On Fri, Dec 16, 2011 at 12:14:24PM +0100, Attilio Rao wrote:
>> 2011/12/15 Steve Kargl:
>> > On Thu, Dec 15, 2011 at 05:25:51PM +0100, Attilio Rao wrote:
>> >>
>> >> I basically went through all the e-mail you just sent and identified 4
>> >> real reports that we could work on, and summarized them in the attached
>> >> Excel file.
>> >> I'd like George, Steve, Doug, Andrey and Mike to review the
>> >> data there and add more, if they want, or make more important
>> >> clarifications, in particular about the presence (or absence) of Xorg
>> >> in their workload.
>> >
>> > Your summary of my observations appears correct.
>> >
>> > I have grabbed an up-to-date /usr/src, built and
>> > installed world, and built and installed a new
>> > kernel on one of the nodes in my cluster.  It
>> > has
>> >
>>
>> It seems a perfect environment, just please make sure you made a
>> debug-free userland (setting MALLOC_PRODUCTION in jemalloc basically).
>>
>> The first thing is, can you try reproducing your case? As far as I
>> understood, for you it was enough to run N + small_amount of CPU-bound
>> threads to show a performance penalty, so I'd ask you to start with dnetc
>> or just your preferred CPU-bound workload and verify you can reproduce
>> the issue.
>> While it happens, please monitor thread bouncing and CPU utilization
>> via 'top' (you don't need to be 100% precise, just get an idea, and
>> keep an eye on things like excessive thread migration, obsessive thread
>> binding, and low throughput on a CPU).
>> One note: if your workloads need to do I/O, please use tmpfs or
>> memory-backed storage for it, in order to reduce I/O effects as much
>> as possible.
>> Also, verify this doesn't happen with the 4BSD scheduler, just in case.
>>
>> Finally, if the problem is still in place, please recompile your
>> kernel by adding:
>> options KTR
>> options KTR_ENTRIES=262144
>> options KTR_COMPILE=(KTR_SCHED)
>> options KTR_MASK=(KTR_SCHED)
>>
>> And reproduce the issue.
>> When you are in the middle of the scheduling issue go with:
>> # ktrdump -ctf > ktr-ule-problem-YOURNAME.out
>>
>> and send it to the mailing list along with your dmesg and the
>> information on CPU utilization you gathered with top(1).
>>
>> That should cover it all, but if you have further questions, please
>> just go ahead.
>
> Attilio,
>
> I have placed several files at
>
> http://troutmask.apl.washington.edu/~kargl/freebsd
>
> dmesg.txt      --> dmesg for ULE kernel
> summary        --> A summary that includes top(1) output of all runs.
> sysctl.ule.txt --> sysctl -a for the ULE kernel
> ktr-ule-problem-kargl.out.gz
>
> I performed a series of tests with both 4BSD and ULE kernels.
> The 4BSD and ULE kernels are identical except of course for the
> scheduler.  Both witness and invariants are disabled, and malloc
> has been compiled without debugging.
>
> Here's what I did.  On the master node in my cluster, I ran an
> OpenMPI code that sends N jobs off to the node with the kernel
> of interest.  There is communication between the master and
> slaves to generate 16 independent chunks of data.  Note, there
> is no disk IO.  So, for example, N=4 will start 4 essentially
> identical numerically intensive jobs.  At the start of a run,
> the master node instructs each slave job to create a chunk of
> data.  After the data is created, the slave sends it back to the
> master and the master sends instructions to create the next chunk
> of data.  This communication continues until the 16 chunks have
> been assigned, computed, and returned to the master.
>
> Here is a rough measurement of the problem with ULE and numerically
> intensive loads.  This command is executed on the master
>
>   time mpiexec -machinefile mf3 -np N sasmp sas.in
>
> Since time is executed on the master, only the 'real' time is of
> interest (the summary file includes user and sys times).  This
> command is run 5 times for each N value and up to 10 times for
> some N values with the ULE kernel.  The following table records
> the average 'real' time, and the number in (...) is the mean
> absolute deviation.
>
> #  N        ULE              4BSD
> # -------------------------------------
> #  4   223.27 (0.502)   221.76 (0.551)
> #  5   404.35 (73.82)   270.68 (0.866)
> #  6   627.56 (173.0)   247.23 (1.442)
> #  7   475.53 (84.07)   285.78 (1.421)
> #  8   429.45 (134.9)   223.64 (1.316)
>
> These numbers to me demonstrate that ULE is not a good choice
> for an HPC workload.
>
> If you need more information, feel free to ask.  If you would
> like access to the node, I can probably arrange that.  But,
> we can discuss that off-line.
>
> --
> Steve
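
For anyone repeating the measurement, the two statistics quoted in the
table (average 'real' time and mean absolute deviation) can be recomputed
from the individual runs with something like the sketch below.  This is
only an illustration in Python: the function and variable names are made
up here, the per-run times live in the summary file and are not reproduced,
and the only numbers used are the averages copied from the table.  The
ULE/4BSD ratio is an extra convenience, not something computed in the
original message.

```python
from statistics import fmean

def mean_abs_deviation(samples):
    """Return (mean, mean absolute deviation) for a list of 'real' times."""
    if not samples:
        raise ValueError("need at least one sample")
    avg = fmean(samples)
    # Mean absolute deviation: average distance of each run from the mean.
    mad = fmean([abs(x - avg) for x in samples])
    return avg, mad

# Average 'real' times copied from the table: N -> (ULE, 4BSD).
REAL_TIMES = {
    4: (223.27, 221.76),
    5: (404.35, 270.68),
    6: (627.56, 247.23),
    7: (475.53, 285.78),
    8: (429.45, 223.64),
}

def slowdown(table):
    """Return N -> ratio of the ULE average to the 4BSD average."""
    return {n: ule / bsd for n, (ule, bsd) in table.items()}

if __name__ == "__main__":
    for n, ratio in sorted(slowdown(REAL_TIMES).items()):
        print(f"N={n}: ULE takes {ratio:.2f}x the 4BSD time")
```

To use it on real data, pass the list of 'real' times for one N value and
one kernel to mean_abs_deviation() and compare the result with the
corresponding table entry.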