Date:      Tue, 25 Apr 1995 14:40:21 +1000
From:      Bruce Evans <bde@zeta.org.au>
To:        terry@cs.weber.edu, toor@jsdinc.root.com
Cc:        geli.com!rcarter@implode.root.com, hackers@FreeBSD.org, jkh@violet.berkeley.edu
Subject:   Re: benchmark hell..
Message-ID:  <199504250440.OAA15562@godzilla.zeta.org.au>

>The correct way to run comparative benchmarks is to boot a DOS disk
>and fdisk/mbr the same machine and install on the same machine over
>and over with the different OS's.  Not "identical hardware", the same
>machine.

What is ``DOS''? :->  The correct way is to boot a boot disk for the
OS being tested and erase all traces of the previous OS...

>The first is context switch.  There are several significant differences
>in the way context switch takes place in BSD and Linux.  The BSD model
>for the actual switch itself is very close to the UnixWare/Solaris model,
>but is missing delayed storage of the FPU registers on a switch.  This is
>because BSD really doesn't have its act together regarding the FPU, and
>can't really be corrected until it does.  On hardware that does proper

Actually, this is because FreeBSD doesn't waste a whole 108 bytes in the
proc table for the FPU state and no one wants to handle the complications
and probable slowness of updating paged-out FPU contexts after delayed
FPU context switches.

>exception handling (like the Pentiums tested), the FPU context can be
>thrown out to the process it belongs to after being delayed over several
>context switches previous on the basis of "uses FPU" being set in the
>process or not, and a soft interrupt of the FPU as if trapping to an
>emulator to tag the first reference in each process.  Pretty much all
>the UNIX implementations and Linux do this, but BSD does not.

>It should be pretty obvious that for a benchmark, when there is a single
>program doing FPU crap, that the FPU delayed switchout means no switch
>actually occurs during the running of the benchmark.  You can think of
>this as a benchmark cheat, since it is a large locality of reference
>hack, in effect.

It takes a fairly special benchmark to demonstrate the speed advantages
of delayed context switches.  If there are multiple processes all using
the FPU then non-delayed switching is slightly faster.  If there are
many more processes not using the FPU than there are processes using it,
then most context switches don't switch the FPU.  For real processes,
those using the FPU a lot are likely to be CPU hogs that get context
switched very rarely so the extra cost for immediately switching the
FPU context is insignificant.
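
To make the trade-off concrete, here is a minimal sketch of a cost model
for the two policies.  It is not kernel code: the names (fpu_cost,
eager_cost, lazy_cost) are hypothetical, the schedule is just an array
of process indices, and it assumes the lazy policy traps once when a
process that does not currently own the FPU first touches it, and does
not trap when the incoming process already owns the FPU.

```c
#include <assert.h>
#include <stddef.h>

struct fpu_cost {
    unsigned saves;     /* FPU state written out to memory */
    unsigned restores;  /* FPU state loaded from memory */
    unsigned traps;     /* first-use traps (lazy policy only) */
};

/*
 * Eager (non-delayed) policy: on every switch, save the outgoing
 * process's FPU state if it uses the FPU and load the incoming one's.
 * Assumes the first scheduled process already has its state loaded.
 */
struct fpu_cost eager_cost(const int *uses_fpu, const int *sched, size_t n)
{
    struct fpu_cost c = {0, 0, 0};
    size_t i;

    assert(uses_fpu != NULL && sched != NULL);
    for (i = 1; i < n; i++) {
        int prev = sched[i - 1], next = sched[i];

        if (prev == next)
            continue;           /* no context switch happened */
        if (uses_fpu[prev])
            c.saves++;
        if (uses_fpu[next])
            c.restores++;
    }
    return c;
}

/*
 * Lazy (delayed) policy: the FPU state stays with its owner until a
 * different FPU-using process runs and traps on its first FPU use.
 */
struct fpu_cost lazy_cost(const int *uses_fpu, const int *sched, size_t n)
{
    struct fpu_cost c = {0, 0, 0};
    int owner = -1;             /* nobody owns the FPU yet */
    size_t i;

    assert(uses_fpu != NULL && sched != NULL);
    for (i = 0; i < n; i++) {
        int cur = sched[i];

        if (!uses_fpu[cur] || owner == cur)
            continue;           /* FPU untouched or already ours */
        c.traps++;              /* first FPU use in this slice traps */
        if (owner != -1)
            c.saves++;          /* write the old owner's state out */
        c.restores++;           /* load (or initialize) our state */
        owner = cur;
    }
    return c;
}
```

In this model, one FPU user rotating with two non-FPU processes costs
the lazy policy a single trap and load, while two FPU users alternating
cost it the same number of saves as the eager policy plus one trap per
time slice.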

FreeBSD's low level context switching is faster than Linux's because
hardware tasking is not used.  Perhaps there is a lot more bloat in
other layers of the context switching.  (Yes, there is.  E.g., calling
microtime() for each context switch is very expensive except on
Pentiums.  microtime() has to be called so that FreeBSD can do better
timing statistics and scheduling than Linux.)  However, for real
processes, context switching is relatively rare, so small differences
(less than a factor of 2-10) in the speed of context switching don't
matter.

>The system call overhead in BSD is typically larger.  This is because
>of address range checking for copyin/copyout operations.  Linux has

Actually, this is because FreeBSD has more layers.

>split this up into a separate check call and copy operations, which is
>more prone to programmer error leaving security holes than an integral
>copy/check, but they have an advantage when it comes to multiple use
>memory regions because of this (areas that are copied from several times
>or which are copied both in and out).

Actually, copyin/copyout are faster in FreeBSD, except on 386's.  For
copyin, the check consists of setting up a fault handler, checking
that the addresses are covered by the user segment registers, and
letting the h/w check for page faults.  For copyout, the page tables
have to be checked directly only for 386's.
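
The address part of that check is cheap.  A minimal user-space sketch of
it follows; it covers only the range test, since the fault handler setup
and the hardware page-fault check can't be reproduced outside the
kernel.  SKETCH_USER_LIMIT and user_range_ok() are hypothetical names,
and the limit value is an assumption, not FreeBSD's actual constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed top of the user address space; purely illustrative. */
#define SKETCH_USER_LIMIT ((uintptr_t)0xC0000000u)

/*
 * Return true if [addr, addr + len) lies entirely below the assumed
 * user limit.  The comparison is arranged so that addr + len is never
 * computed, which avoids wraparound for huge addr or len values.
 */
bool user_range_ok(uintptr_t addr, size_t len)
{
    if (len > SKETCH_USER_LIMIT)
        return false;
    return addr <= SKETCH_USER_LIMIT - len;
}
```

Everything past this test is left to the MMU: a bad but in-range address
just faults, and the previously installed handler turns the fault into
an error return.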

I think the Linux advantage for syscalls is that copyin is usually not
used at all.  The args are in registers.

>Linux, as part of this, has no copyinstr.  Instead, they use a routine
>called "getpathname".  This not only allows them to special case the
>code, it also allows them greater flexibility than traditional copyinstr
>implementations when it comes to internationalization.  Since the only

copyinstr() is poorly implemented in FreeBSD.  However, I've never seen
it showing up in profiling output.

>Finally, the pipe overhead is traceable to system call overhead, the pipe
>implementation itself, and the file system stack coalescing being a
>little less than desirable.

This seems likely.  The BYTE benchmark article didn't mention the exact
syscalls used so it's not clear if the pipe benchmark is valid.  lmbench
has a "syscall" overhead benchmark that actually tests i/o of one byte
to a file.  Linux is much faster because there are fewer vfs layers, not
because syscalls are faster.  Pipe benchmarks involving small amounts
of data (as would be best if pipes are being used for process
synchronization) are likely to have the same problem.  Pipe benchmarks
involving a large amount of data should reduce to benchmarking bcopy()
(at least if the implementation is naive enough to always actually do the
copy).
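
For reference, a one-byte i/o test of the kind described above can be
sketched in a few lines.  This is not lmbench's code; the function name
one_byte_write_ns() is hypothetical, and the choice of target file (for
example /dev/null) is an assumption.  Whatever it reports includes the
cost of every layer between the syscall entry and the file, not just
the syscall itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Write one byte to `path` `iterations` times and return the average
 * cost per write in nanoseconds, or -1.0 if the file can't be opened
 * or a write fails.
 */
double one_byte_write_ns(const char *path, long iterations)
{
    struct timespec t0, t1;
    char byte = 'x';
    long i;
    int fd;

    assert(path != NULL && iterations > 0);
    fd = open(path, O_WRONLY);
    if (fd == -1)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < iterations; i++) {
        if (write(fd, &byte, 1) != 1) {
            close(fd);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);

    /* Total elapsed nanoseconds divided by the number of writes. */
    return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
            (double)(t1.tv_nsec - t0.tv_nsec)) / (double)iterations;
}
```

Calling one_byte_write_ns("/dev/null", 100000) on each system gives a
number that is only comparable if both take the same path through
their file layers, which is the point being made above.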

Bruce


