Date: Mon, 12 Dec 2011 09:06:04 -0800
From: Steve Kargl
To: Bruce Cran
Cc: "O. Hartmann", Current FreeBSD, freebsd-stable@freebsd.org, freebsd-performance@freebsd.org
Subject: Re: SCHED_ULE should not be the default
Message-ID: <20111212170604.GA74044@troutmask.apl.washington.edu>
In-Reply-To: <4EE6295B.3020308@cran.org.uk>
References: <4EE1EAFE.3070408@m5p.com> <4EE22421.9060707@gmail.com> <4EE6060D.5060201@mail.zedat.fu-berlin.de> <20111212155159.GB73597@troutmask.apl.washington.edu> <4EE6295B.3020308@cran.org.uk>

On Mon, Dec 12, 2011 at 04:18:35PM +0000, Bruce Cran wrote:
> On 12/12/2011 15:51, Steve Kargl wrote:
> >This comes up every 9 months or so, and must be approaching FAQ
> >status. In a HPC environment, I recommend 4BSD. Depending on the
> >workload, ULE can cause a severe increase in turn around time when
> >doing already long computations. If you have an MPI application,
> >simply launching greater than ncpu+1 jobs can show the problem. PS:
> >search the list archives for "kargl and ULE".
>
> This isn't something that can be fixed by tuning ULE? For example for
> desktop applications kern.sched.preempt_thresh should be set to 224 from
> its default. I'm wondering if the installer should ask people what the
> typical use will be, and tune the scheduler appropriately.
>

Tuning kern.sched.preempt_thresh did not seem to help for
my workload. My code is a classic master-slave OpenMPI
application where the master runs on one node and all
cpu-bound slaves are sent to a second node. If I send
ncpu+1 jobs to the 2nd node, which has ncpu cpus, then
ncpu-1 jobs are assigned to the 1st ncpu-1 cpus. The last
two jobs are assigned to the ncpu'th cpu, and these
ping-pong on this cpu. AFAICT, it is a cpu affinity
issue, where ULE is trying to keep each job associated
with its initially assigned cpu.

While one might suggest that starting ncpu+1 jobs is not
prudent, my example is just that. It is an example showing
that ULE has performance issues.
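
To put rough numbers on the placement described above, here is a
back-of-envelope sketch. It is not a measurement and models nothing
about ULE's internals. It assumes equal-length, purely cpu-bound jobs,
perfectly even time-slicing, and no migration cost; the function names
are hypothetical.

```python
def pinned_turnaround(ncpu: int, work: float) -> float:
    """Time until the last of ncpu+1 jobs finishes when jobs never migrate.

    Assumed placement (from the description above): ncpu-1 jobs each own
    a cpu, and the remaining two jobs share the last cpu, half each.
    """
    if ncpu < 1:
        raise ValueError("ncpu must be at least 1")
    jobs_on_last_cpu = 2
    # Each of the two sharing jobs runs at 1/2 speed, so it needs 2x the time.
    return work * jobs_on_last_cpu


def balanced_turnaround(ncpu: int, work: float) -> float:
    """Time until all ncpu+1 jobs finish under idealized fair sharing.

    Assumes the scheduler spreads the ncpu cpus evenly over all jobs,
    so each job gets ncpu/(ncpu+1) of a cpu.
    """
    if ncpu < 1:
        raise ValueError("ncpu must be at least 1")
    njobs = ncpu + 1
    return work * njobs / ncpu


if __name__ == "__main__":
    ncpu, work = 8, 100.0  # illustrative values only: 8 cpus, 100 s per job
    print("pinned:  ", pinned_turnaround(ncpu, work))
    print("balanced:", balanced_turnaround(ncpu, work))
```

Under these assumptions, with 8 cpus and 100 s of work per job, the
pinned case finishes in 200 s versus 112.5 s for the idealized
balanced case. Since the master has to wait for the slowest slave,
the two jobs sharing one cpu set the turnaround time for the whole run.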

So, I can now either start only ncpu jobs on each node in the
cluster and send emails to all other users asking them not to
use those nodes, or use 4BSD and not worry about loading issues.

--
Steve