From: Steve Kargl
To: Current FreeBSD , freebsd-stable@freebsd.org, freebsd-performance@freebsd.org
Date: Mon, 12 Dec 2011 11:26:37 -0800
Subject: Re: SCHED_ULE should not be the default

On Mon, Dec 12, 2011 at 01:03:30PM -0600, Scott Lambert wrote:
> On Mon, Dec 12, 2011 at 09:06:04AM -0800, Steve Kargl wrote:
> > Tuning kern.sched.preempt_thresh did not seem to help for
> > my workload. My code is a classic master-slave OpenMPI
> > application where the master runs on one node and all
> > cpu-bound slaves are sent to a second node. If I send
> > ncpu+1 jobs to the 2nd node, which has ncpu cpus, then
> > ncpu-1 jobs are assigned to the 1st ncpu-1 cpus. The
> > last two jobs are assigned to the ncpu'th cpu, and
> > these ping-pong on this cpu. AFAICT, it is a cpu
> > affinity issue, where ULE is trying to keep each job
> > associated with its initially assigned cpu.
> >
> > While one might suggest that starting ncpu+1 jobs
> > is not prudent, my example is just that. It is an
> > example showing that ULE has performance issues.
> > So, I can now start only ncpu jobs on each node
> > in the cluster and send emails to all other users
> > not to use those nodes, or use 4BSD and not worry
> > about loading issues.
>
> Does it meet your expectations if you start (j modulo ncpu) = 0
> jobs on a node?
>

I've never tried to launch more than ncpu + 1 (or + 2) jobs. I
suppose that at the time I was investigating the issue, it was
determined that 4BSD allowed me to get my work done in a more timely
manner. So, I took the path of least resistance.

-- 
Steve
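A rough way to see why the ncpu+1 case hurts, and why the (j modulo ncpu) = 0 question is relevant, is the toy Python model below. It assumes that every slave needs the same CPU time, that jobs sharing a CPU split it evenly, and that the master waits for the slowest slave. It ignores the context-switch and cache cost of the ping-pong, and the "migrating" case is an idealized fair share, not 4BSD's actual algorithm. All function names are hypothetical.

```python
from fractions import Fraction


def pinned_makespan(jobs_per_cpu: list[int], work: Fraction) -> Fraction:
    """Wall time when every job stays on the CPU it was first placed on.

    Jobs sharing a CPU split it evenly, so the busiest CPU finishes last,
    and the master (which waits for all slaves) waits for that CPU.
    """
    return max(jobs_per_cpu) * work


def migrating_makespan(njobs: int, ncpu: int, work: Fraction) -> Fraction:
    """Wall time under an idealized fair share: total work spread evenly
    over all CPUs, but no job finishes faster than running alone."""
    return max(work, Fraction(njobs, ncpu) * work)


def placement_from_mail(ncpu: int) -> list[int]:
    """ncpu+1 jobs as described above: one each on the first ncpu-1 CPUs,
    two on the ncpu'th CPU."""
    return [1] * (ncpu - 1) + [2]


def even_placement(njobs: int, ncpu: int) -> list[int]:
    """Round-robin placement: jobs are dealt out one per CPU in turn."""
    return [njobs // ncpu + (1 if i < njobs % ncpu else 0) for i in range(ncpu)]


if __name__ == "__main__":
    ncpu, work = 8, Fraction(100)  # hypothetical: 8 cpus, 100 s of CPU per job
    print(pinned_makespan(placement_from_mail(ncpu), work))       # 200
    print(migrating_makespan(ncpu + 1, ncpu, work))               # 225/2
    print(pinned_makespan(even_placement(2 * ncpu, ncpu), work))  # 200
    print(migrating_makespan(2 * ncpu, ncpu, work))               # 200
```

Under these assumptions, pinning ncpu+1 jobs makes the run take 2x the single-job time instead of (ncpu+1)/ncpu of it, while a job count that is a multiple of ncpu lets even a strictly pinned placement match the fair-share time.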