From owner-freebsd-stable@FreeBSD.ORG  Mon Dec 12 17:06:07 2011
Date: Mon, 12 Dec 2011 09:06:04 -0800
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Bruce Cran <bruce@cran.org.uk>
Message-ID: <20111212170604.GA74044@troutmask.apl.washington.edu>
References: <4EE1EAFE.3070408@m5p.com> <4EE22421.9060707@gmail.com>
	<4EE6060D.5060201@mail.zedat.fu-berlin.de>
	<20111212155159.GB73597@troutmask.apl.washington.edu>
	<4EE6295B.3020308@cran.org.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4EE6295B.3020308@cran.org.uk>
User-Agent: Mutt/1.4.2.3i
Cc: "O. Hartmann" <ohartman@mail.zedat.fu-berlin.de>,
	Current FreeBSD <freebsd-current@freebsd.org>,
	freebsd-stable@freebsd.org, freebsd-performance@freebsd.org
Subject: Re: SCHED_ULE should not be the default

On Mon, Dec 12, 2011 at 04:18:35PM +0000, Bruce Cran wrote:
> On 12/12/2011 15:51, Steve Kargl wrote:
> >This comes up every 9 months or so, and must be approaching FAQ 
> >status. In a HPC environment, I recommend 4BSD. Depending on the 
> >workload, ULE can cause a severe increase in turn around time when 
> >doing already long computations. If you have an MPI application, 
> >simply launching greater than ncpu+1 jobs can show the problem. PS: 
> >search the list archives for "kargl and ULE". 
> 
> This isn't something that can be fixed by tuning ULE? For example for 
> desktop applications kern.sched.preempt_thresh should be set to 224 from 
> its default. I'm wondering if the installer should ask people what the 
> typical use will be, and tune the scheduler appropriately.
> 

Tuning kern.sched.preempt_thresh did not seem to help for
my workload.  My code is a classic master-slave OpenMPI
application where the master runs on one node and all
cpu-bound slaves are sent to a second node.  If I
send ncpu+1 jobs to the 2nd node, which has ncpu cpus,
then ncpu-1 jobs are assigned to the first ncpu-1 cpus.
The last two jobs are assigned to the ncpu'th cpu, and
these ping-pong on this cpu.  AFAICT, it is a cpu
affinity issue, where ULE is trying to keep each job
associated with its initially assigned cpu.
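
To put rough numbers on the above, here is a back-of-the-envelope
sketch.  It is an idealized model, not a measurement and not a
description of what ULE or 4BSD actually do internally.  It assumes
every job needs the same amount of cpu time, ignores context-switch
and migration costs, and uses a hypothetical 8-cpu node.  The function
names are made up for this illustration.

```python
import math


def pinned_turnaround(ncpu, njobs, work=1.0):
    """Time until the last job finishes if jobs never leave their first cpu.

    Assumes jobs are spread one per cpu first, so the busiest cpu ends up
    time-slicing ceil(njobs / ncpu) jobs.
    """
    return math.ceil(njobs / ncpu) * work


def shared_turnaround(ncpu, njobs, work=1.0):
    """Time until the last job finishes if jobs migrate freely.

    Assumes an ideal scheduler that gives every job an equal share of
    all cpus; a job can never finish faster than `work` on its own.
    """
    return max(1.0, njobs / ncpu) * work


ncpu = 8  # hypothetical node size, not taken from my cluster
for njobs in (ncpu, ncpu + 1):
    pinned = pinned_turnaround(ncpu, njobs)
    shared = shared_turnaround(ncpu, njobs)
    print(f"njobs={njobs}: pinned={pinned:.3f} shared={shared:.3f}")
```

Under these assumptions one extra job doubles the turnaround time of
the whole run when jobs stay pinned, because the MPI application has
to wait for the two jobs sharing a cpu, whereas ideal sharing would
cost only a factor of (ncpu+1)/ncpu.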

While one might suggest that starting ncpu+1 jobs
is not prudent, my example is just that.  It is an
example showing that ULE has performance issues. 
So, I can either start only ncpu jobs on each node
in the cluster and email all other users asking them
not to use those nodes, or use 4BSD and not worry
about loading issues.

-- 
Steve