Date:      Thu, 7 Jun 2007 14:39:53 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        src-committers@freebsd.org, John Baldwin <jhb@freebsd.org>, cvs-src@freebsd.org, cvs-all@freebsd.org, Attilio Rao <attilio@freebsd.org>, Kostik Belousov <kostikbel@gmail.com>, Jeff Roberson <jroberson@chesapeake.net>
Subject:   Re: cvs commit: src/sys/kern kern_mutex.c
Message-ID:  <20070607133524.S7002@besplex.bde.org>
In-Reply-To: <20070606154548.F3105@besplex.bde.org>
References:  <200706051420.l55EKEih018925@repoman.freebsd.org> <3bbf2fe10706050829o2d756a4cu22f98cf11c01f5e4@mail.gmail.com> <3bbf2fe10706050843x5aaafaafy284e339791bcfe42@mail.gmail.com> <200706051230.21242.jhb@freebsd.org> <20070606094354.E51708@delplex.bde.org> <20070605195839.I606@10.0.0.1> <20070606154548.F3105@besplex.bde.org>

On Wed, 6 Jun 2007, Bruce Evans wrote:

> On Tue, 5 Jun 2007, Jeff Roberson wrote:

>> You should try with kern.sched.pick_pri = 0.  I have changed this to be the 
>> default recently.  This weakens the preemption and speeds up some 
>> workloads.
>
> I haven't tried a new SCHED_ULE kernel yet.

Tried now.  In my makeworld benchmark, SCHED_ULE is now only 4% slower
than SCHED_4BSD, down from about 7% slower (although SCHED_4BSD itself
has lost 2% in the meantime).  The difference is still from CPUs idling
too much.
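
For anyone repeating this: kern.sched.pick_pri can be flipped at runtime
with sysctl(8), or programmatically as in the rough sketch below
(untested; I'm assuming the node is a plain writable int):

/*
 * Read kern.sched.pick_pri and set it to 0, with the same effect as
 * "sysctl kern.sched.pick_pri=0".  Assumes the node is an int.
 */
#include <sys/types.h>
#include <sys/sysctl.h>

#include <err.h>
#include <stdio.h>

int
main(void)
{
        int oldval, newval = 0;
        size_t oldlen = sizeof(oldval);

        if (sysctlbyname("kern.sched.pick_pri", &oldval, &oldlen,
            &newval, sizeof(newval)) == -1)
                err(1, "sysctlbyname(kern.sched.pick_pri)");
        printf("kern.sched.pick_pri: %d -> %d\n", oldval, newval);
        return (0);
}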

Best result ever (SCHED_4BSD, June 4 kernel, no PREEMPTION):
---
       827.48 real      1309.26 user       186.86 sys
    1332122  voluntary context switches
    1535129  involuntary context switches
pagezero time 6 seconds
---

After thread lock changes (SCHED_4BSD, no PREEMPTION):
---
       847.70 real      1309.83 user       169.39 sys
    2933415  voluntary context switches
    1501808  involuntary context switches
pagezero time 30 seconds.

Unlike what I wrote before, there is a scheduling bug that affects
pagezero directly.  The bug from last month, where pagezero loses its
priority of PRI_MAX_IDLE and runs at priority PUSER, is back.  It
seemed to be gone in the June 4 kernel, but actually it just happens
less often there.  It seems to cost 0.5-1.0% in real time.
---

After thread lock changes (SCHED_4BSD, now with PREEMPTION):
---
       843.34 real      1304.00 user       168.87 sys
    1651011  voluntary context switches
    1630988  involuntary context switches
pagezero time 27 seconds

The problem with the extra context switches is gone (these context switch
counts are like the ones in old kernels with PREEMPTION).  This result is
affected by pagezero getting its priority clobbered.  The best result for
an old kernel with PREEMPTION was about 840 seconds, before various
optimizations reduced this to 827 seconds (-0+4 seconds).
---

Old run with SCHED_ULE (Mar 18):
       899.50 real      1311.00 user       187.47 sys
    1566366  voluntary context switches
    1959436  involuntary context switches
pagezero time 19 seconds
---

Today with SCHED_ULE:
---
       883.65 real      1290.92 user       188.21 sys
    1658109  voluntary context switches
    1708148  involuntary context switches
pagezero time 7 seconds.
---

In all of these, the user + sys decomposition is very inaccurate, but the
(user + sys + pagezero_time) total is fairly accurate.  It is 1500+-2 for
SCHED_4BSD and 1500+-17 for SCHED_ULE (old ULE larger, current ULE smaller).
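
Counts like the voluntary/involuntary context switches above are the
rusage fields that /usr/bin/time -l prints.  A minimal wrapper that
collects them via wait4(2) -- not the exact tool behind the numbers
above, just a sketch of where the fields come from -- looks roughly
like this:

/*
 * Run a command and report real/user/sys time plus the context switch
 * counts from its rusage (summed over the command and its waited-for
 * children).
 */
#include <sys/types.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/wait.h>

#include <err.h>
#include <stdio.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
        struct rusage ru;
        struct timeval t0, t1;
        pid_t pid;
        int status;

        if (argc < 2)
                errx(1, "usage: %s command [args ...]", argv[0]);
        gettimeofday(&t0, NULL);
        if ((pid = fork()) == -1)
                err(1, "fork");
        if (pid == 0) {
                execvp(argv[1], argv + 1);
                err(1, "execvp");
        }
        if (wait4(pid, &status, 0, &ru) == -1)
                err(1, "wait4");
        gettimeofday(&t1, NULL);
        printf("%12.2f real %10.2f user %10.2f sys\n",
            (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6,
            ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6,
            ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6);
        printf("%11ld  voluntary context switches\n", ru.ru_nvcsw);
        printf("%11ld  involuntary context switches\n", ru.ru_nivcsw);
        return (0);
}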

SCHED_ULE now shows interesting behaviour for non-parallel kernel
builds on a 2-way SMP machine.  It is now slightly faster than SCHED_4BSD
for this, but still much slower for parallel kernel builds.  This might
be because it likes to leave 1 CPU idle while it waits to find a better
CPU to run on, which is actually an optimization when there is >= 1 CPU
to spare:

RELENG_4 kernel build on nfs, non-parallel make.
Best ever with SCHED_ULE (~June 4 kernel):
        62.55 real        55.30 user         3.65 sys
Current with SCHED_ULE:
        62.18 real        54.91 user         3.51 sys

RELENG_4 kernel build on nfs, make -j4.
Best ever for SCHED_ULE (~June 4 kernel):
        32.00 real        56.98 user         3.90 sys
Current with SCHED_ULE:
        33.11 real        56.01 user         4.12 sys
ULE has been about 1 second slower for this since at least last November.
It presumably reduces user+sys time by running pagezero more.

The slowdown is much larger for a build on ffs:

Non-parallel results not shown (little difference from above).

RELENG_4 kernel build on ffs, make -j4.
Best ever for SCHED_ULE (~June 4 kernel):
        29.94 real        56.03 user         3.12 sys
Current with SCHED_ULE:
        32.63 real        55.13 user         3.53 sys
Now 9% of the real time (= 18% of the cycles on one CPU = almost the
sys overhead) is apparently wasted by leaving one CPU idle.  This
benchmark is of course dominated by many instances of 2 gcc hogs which
should be scheduled to run in parallel with no idle cycles.  (In all
these kernel benchmarks, everything except disk writes is cached before
starting.  In other makeworld benchmarks, everything is cached before
starting on the nfs server, while on the client nothing is cached.)
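
To quantify the idle time directly instead of inferring it from the
real time, the kern.cp_time tick counters can be snapshotted around a
run and the CP_IDLE share compared.  Rough, untested sketch; note that
kern.cp_time is the aggregate over all CPUs, so on a 2-way machine one
wasted CPU shows up as roughly 50% idle:

/*
 * Run a command (given as one shell string) and report what fraction
 * of all CPU ticks were idle while it ran.
 */
#include <sys/types.h>
#include <sys/resource.h>       /* CPUSTATES, CP_IDLE */
#include <sys/sysctl.h>

#include <err.h>
#include <stdio.h>
#include <stdlib.h>

static void
sample(long ticks[CPUSTATES])
{
        size_t len = sizeof(long) * CPUSTATES;

        if (sysctlbyname("kern.cp_time", ticks, &len, NULL, 0) == -1)
                err(1, "sysctl(kern.cp_time)");
}

int
main(int argc, char **argv)
{
        long t0[CPUSTATES], t1[CPUSTATES], total;
        int i;

        if (argc != 2)
                errx(1, "usage: %s 'command'", argv[0]);
        sample(t0);
        if (system(argv[1]) == -1)
                err(1, "system");
        sample(t1);
        total = 0;
        for (i = 0; i < CPUSTATES; i++)
                total += t1[i] - t0[i];
        printf("idle: %.1f%% of all CPU ticks\n",
            total != 0 ? 100.0 * (t1[CP_IDLE] - t0[CP_IDLE]) / total : 0.0);
        return (0);
}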

I don't have context switch counts or pagezero times for the kernel builds.
stathz is 100 = hz.  Maybe SCHED_ULE doesn't like this.  hz = 100 is
about 1% faster than hz = 1000 for the makeworld benchmark.
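
hz is normally set with "options HZ=100" in the kernel config (or the
kern.hz loader tunable); what a running kernel actually ended up with
can be double-checked from kern.clockrate, which sysctl(8) prints
directly, or with a small sketch like this (untested):

/*
 * Print the running kernel's hz/stathz/profhz from kern.clockrate
 * (struct clockinfo from <sys/time.h>).
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <sys/time.h>

#include <err.h>
#include <stdio.h>

int
main(void)
{
        struct clockinfo ci;
        size_t len = sizeof(ci);

        if (sysctlbyname("kern.clockrate", &ci, &len, NULL, 0) == -1)
                err(1, "sysctl(kern.clockrate)");
        printf("hz = %d, stathz = %d, profhz = %d\n",
            ci.hz, ci.stathz, ci.profhz);
        return (0);
}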

Bruce


