Date:      Sat, 26 Aug 2017 11:12:02 -0700 (PDT)
From:      "Rodney W. Grimes" <freebsd-rwg@pdx.rh.CN85.dnsmgr.net>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        Don Lewis <truckman@freebsd.org>, avg@freebsd.org, freebsd-arch@freebsd.org
Subject:   Re: ULE steal_idle questions
Message-ID:  <201708261812.v7QIC2eJ074443@pdx.rh.CN85.dnsmgr.net>
In-Reply-To: <20170826094725.G1648@besplex.bde.org>

> On Fri, 25 Aug 2017, Don Lewis wrote:
> 
> > ...
> > Something else that I did not expect is how frequently threads are
> > stolen from the other SMT thread on the same core, even though I
> > increased steal_thresh from 2 to 3 to account for the off-by-one
> > problem.  This is true even right after the system has booted and no
> > significant load has been applied.  My best guess is that because of
> > affinity, both the parent and child processes run on the same CPU after
> > fork(), and if a number of processes are forked() in quick succession,
> > the run queue of that CPU can get really long.  Forcing a thread
> > migration in exec() might be a good solution.
> 
> Since you are trying a lot of combinations, maybe you can tell us which
> ones work best.  SCHED_4BSD works better for me on an old 2-core system.
> SCHED_ULE works better on a not-so old 4x2 core (Haswell) system, but I 
> don't like it due to its complexity.  It makes differences of at most
> +-2% except when mistuned it can give -5% for real time (but better for
> CPU and presumably power).
> 
> For SCHED_4BSD, I wrote fancy tuning for fork/exec and sometimes get
> everything to line up for a 3% improvement (803 seconds instead of 823
> on the old system, with -current much slower at 840+ and old versions
> of ULE before steal_idle taking 890+).  This is very resource (mainly
> cache associativity?) dependent and my tuning makes little difference
> on the newer system.  SCHED_ULE still has bugfeatures which tend to
> help large builds by reducing context switching, e.g., by bogusly
> clamping all CPU-bound threads to nearly maximal priority.

That last bugfeature is probably what makes current systems'
interactive performance tank rather badly under heavy
load.  Would it be hard to fix?


-- 
Rod Grimes                                                 rgrimes@freebsd.org


