From owner-freebsd-stable@FreeBSD.ORG  Mon Dec 12 19:26:38 2011
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 327C0106566B;
	Mon, 12 Dec 2011 19:26:38 +0000 (UTC)
	(envelope-from sgk@troutmask.apl.washington.edu)
Received: from troutmask.apl.washington.edu (troutmask.apl.washington.edu
	[128.95.76.21])
	by mx1.freebsd.org (Postfix) with ESMTP id E4EEC8FC17;
	Mon, 12 Dec 2011 19:26:37 +0000 (UTC)
Received: from troutmask.apl.washington.edu (localhost.apl.washington.edu
	[127.0.0.1])
	by troutmask.apl.washington.edu (8.14.5/8.14.5) with ESMTP id
	pBCJQbTv087779; Mon, 12 Dec 2011 11:26:37 -0800 (PST)
	(envelope-from sgk@troutmask.apl.washington.edu)
Received: (from sgk@localhost)
	by troutmask.apl.washington.edu (8.14.5/8.14.5/Submit) id
	pBCJQbD3087778; Mon, 12 Dec 2011 11:26:37 -0800 (PST)
	(envelope-from sgk)
Date: Mon, 12 Dec 2011 11:26:37 -0800
From: Steve Kargl <sgk@troutmask.apl.washington.edu>
To: Current FreeBSD <freebsd-current@freebsd.org>, freebsd-stable@freebsd.org, 
	freebsd-performance@freebsd.org
Message-ID: <20111212192637.GA87729@troutmask.apl.washington.edu>
References: <4EE1EAFE.3070408@m5p.com> <4EE22421.9060707@gmail.com>
	<4EE6060D.5060201@mail.zedat.fu-berlin.de>
	<20111212155159.GB73597@troutmask.apl.washington.edu>
	<4EE6295B.3020308@cran.org.uk>
	<20111212170604.GA74044@troutmask.apl.washington.edu>
	<20111212190330.GA69380@sysmon.tcworks.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20111212190330.GA69380@sysmon.tcworks.net>
User-Agent: Mutt/1.4.2.3i
Cc: 
Subject: Re: SCHED_ULE should not be the default
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 12 Dec 2011 19:26:38 -0000

On Mon, Dec 12, 2011 at 01:03:30PM -0600, Scott Lambert wrote:
> On Mon, Dec 12, 2011 at 09:06:04AM -0800, Steve Kargl wrote:
> > Tuning kern.sched.preempt_thresh did not seem to help for
> > my workload.  My code is a classic master-slave OpenMPI
> > application where the master runs on one node and all
> > cpu-bound slaves are sent to a second node.  If I
> > send ncpu+1 jobs to the 2nd node with ncpu's, then
> > ncpu-1 jobs are assigned to the 1st ncpu-1 cpus.  The
> > last two jobs are assigned to the ncpu'th cpu, and 
> > these ping-pong on this cpu.  AFAICT, it is a cpu
> > affinity issue, where ULE is trying to keep each job
> > associated with its initially assigned cpu.
> > 
> > While one might suggest that starting ncpu+1 jobs
> > is not prudent, my example is just that.  It is an
> > example showing that ULE has performance issues. 
> > So, I now can start only ncpu jobs on each node
> > in the cluster and send emails to all other users
> > to not use those nodes, or use 4BSD and not worry
> > about loading issues.
> 
> Does it meet your expectations if you start (j modulo ncpu) = 0
> jobs on a node?
> 

I've never tried to launch more than ncpu + 1 (or + 2)
jobs.  I suppose at the time I was investigating the issue,
it was determined that 4BSD allowed me to get my work done
in a more timely manner.  So, I took the path of least
resistance.
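
For what it's worth, here is a back-of-the-envelope sketch of what
the ncpu+1 case quoted above costs.  It is an idealized model, not a
measurement: it assumes equal-length cpu-bound jobs, no migration at
all in the "pinned" case, and perfect sharing of all cpus in the
"balanced" case.  The function names are made up for illustration
and are not part of any scheduler.

```python
from fractions import Fraction


def pinned_makespan(ncpu, work=1):
    """Time to finish ncpu+1 equal jobs when no job ever migrates.

    Assumed placement (as in the quoted description): ncpu-1 jobs each
    own a cpu, and the last two jobs share the remaining cpu.
    """
    if ncpu < 1:
        raise ValueError("ncpu must be at least 1")
    # Jobs with a cpu to themselves finish after `work` time units.
    alone = [Fraction(work)] * (ncpu - 1)
    # The two jobs sharing one cpu each get half of it.
    shared = [Fraction(2 * work)] * 2
    return max(alone + shared)


def balanced_makespan(ncpu, work=1):
    """Time to finish ncpu+1 equal jobs under ideal sharing.

    Assumes every job gets an equal ncpu/(ncpu+1) share of a cpu.
    """
    if ncpu < 1:
        raise ValueError("ncpu must be at least 1")
    return Fraction(work * (ncpu + 1), ncpu)


def slowdown(ncpu):
    """Ratio of pinned to balanced completion time in this model."""
    return pinned_makespan(ncpu) / balanced_makespan(ncpu)


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(n, pinned_makespan(n), balanced_makespan(n), slowdown(n))
```

Under these assumptions, on an 8-cpu node the pinned placement needs
2 time units against 9/8 for the balanced one, a ratio of 16/9, and
the ratio approaches 2 as ncpu grows.  Real schedulers sit somewhere
between the two extremes.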

-- 
Steve