Date:      Mon, 24 Nov 2014 16:34:45 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        David Chisnall <theraven@freebsd.org>
Cc:        src-committers@freebsd.org, Scott Long <scott4long@yahoo.com>, Rui Paulo <rpaulo@me.com>, svn-src-all@freebsd.org, Scott Long <scottl@freebsd.org>, svn-src-head@freebsd.org
Subject:   Re: svn commit: r274489 - in head/sys/amd64: amd64 include
Message-ID:  <20141124151802.L1037@besplex.bde.org>
In-Reply-To: <426D8696-801A-4C48-A2FE-74575B4B79E7@FreeBSD.org>
References:  <201411132211.sADMBjP3009246@svn.freebsd.org> <35E5EAD8-99C1-43C0-8D01-B3B5B86ECA25@me.com> <13EC3116-6146-42FC-8941-2C7C009224B3@yahoo.com> <426D8696-801A-4C48-A2FE-74575B4B79E7@FreeBSD.org>

On Sun, 23 Nov 2014, David Chisnall wrote:

> On 21 Nov 2014, at 23:26, Scott Long <scott4long@yahoo.com> wrote:
>
>> That's a good question to look further into.  I didn't see any
>> measurable differences with this change.  I think that the cost of the
>> function call itself masks the cost of a few extra instructions, but I
>> didn't test with switching it on/off for the entire kernel
>
> [ Note: The following is not specific to the kernel ]
>
> The overhead for preserving / omitting the frame pointer is decidedly
> nonlinear.  On a modern superscalar processor, it will usually be
> effectively zero, right up until the point that it pushes something out
> of the instruction cache on a hot path, at which point it jumps to
> 20-50%, depending on the workload.

It seems to work much the same as padding with nops for that.  I get the
following times for:

X int x;
X 
X asm("			\n\
X .p2align 6		\n\
X test:			\n\
X 	# pushl %ebp	\n\
X 	# movl %esp,%ebp	\n\
X 	nop		\n\
X 	nop		\n\
X 	nop		\n\
X 	nop		\n\
X 	nop		\n\
X 	nop		\n\
X 	nop		\n\
X 	nop		\n\
X 	nop		\n\
X 	nop		\n\
X 	nop		\n\
X 	nop		\n\
X 	nop		\n\
X 	# popl %ebp	\n\
X 	ret		\n\
X ");
X 
X main()
X {
X 	int i;
X 
X 	for (i = 0; i < 201000000; i++)
X 		test();
X }

on an old A64 at 2.01GHz in 32-bit mode:

- above code   (13 bytes of instruction prefetch needed in <test>): 7 cycles
- change 2 nops to pushl %ebp; popl %ebp (same ifetch size):        7 cycles
- change 4 nops to pushl/movl/popl (same ifetch size):              7 cycles
- change 4 nops to 2 * pushl/popl (same ifetch size):               8 cycles
- add 1-2 nops (14-15 bytes...):                                    8 cycles
- add 3 nops (16 bytes...):                                        10 cycles

So the cost is indeed 20-50% (actually 3/7 = 43%) in some cases, but only
in weird cases.  You just have to pack about 3 useful instructions (3 ~=
number of independent pipelines) together with the frame pointer
instructions in the first 13 bytes of every function, or maybe move the
frame pointer instructions later (only traps including NMIs would notice
if they are not done as soon as possible, provided they are done before
function calls).

OTOH, not using a frame pointer costs 1 byte per stack access.  This
might bust the icache anywhere in the function, but probably doesn't.
Busting is more likely at the beginning of the function where it does
a bunch of loads of args or a bunch of initializations.  At least gcc
likes to generate code like:

 	movl	$0,N(%esp)	# 7 bytes
 	movl	$0,N+4(%esp)
 	...

for initializations, even when the initializations are not at the start
of the function in the source code.  7 bytes is a large x86 instruction,
and just 3 of them may bust the ifetch.

> The performance difference was more pronounced on i386, where having an
> extra GPR for the register allocator to use could make a 10-20%
> performance difference on some fairly common code (the two big
> performance wins for x86-64 over IA32 were the increase in number of GPRs
> and an FPU ISA that wasn't batshit insane).

No, these only make small differences on modern superscalar processors.
More than 8 explicit GPRs or FPU registers are very rarely needed.  The
FPU ISA is not bad (it is just an enhanced stack ISA), and the ISA makes
little difference anyway.  Any inefficiencies in the ISA are hidden in
pipelines provided the ifetcher can keep up and register renaming doesn't
break down.  Managing the FPU stack is painful in asm but easy for
compilers.

> For ISAs with more GPRs, that's less of an issue, although after
> inlining being able to use %rbp as a GPR can sometimes make a noticeable
> difference in performance.  In particular, as %rbp is callee-save, it's
> very useful to be able to use it in non-leaf functions.

The limited number of registers works a bit like the limited ifetch at the
beginning of a function.  Very occasionally, having 1 extra register or
being 1 byte shorter makes a significant difference.

ifetch seems to be easier to optimize than a limited number of registers,
so my results above probably only apply to the CPU tested.  Even an A64 can
execute 2 instances of a function concurrently even when the function is
called sequentially.  It is a special case of branch prediction to fetch
from the branch target in advance.  The amount of prefetch may be limited
but it is always possible to cache the results of prefetching better.

Bruce
From owner-svn-src-all@FreeBSD.ORG  Mon Nov 24 07:23:19 2014
Date: Mon, 24 Nov 2014 08:23:17 +0100
From: Joel Dahl <joel@vnode.se>
To: Baptiste Daroussin <bapt@FreeBSD.org>
Subject: Re: svn commit: r274925 - in head: lib/libc/sys lib/libdpv sbin/ipfw
 share/man/man4 share/man/man4/man4.arm share/man/man9 sys/boot/common
 sys/boot/i386/gptzfsboot usr.bin/dpv
Message-ID: <20141124072316.GA27782@ymer.vnode.se>
Mail-Followup-To: Baptiste Daroussin <bapt@FreeBSD.org>,
 src-committers@freebsd.org, svn-src-all@freebsd.org,
 svn-src-head@freebsd.org
References: <201411232100.sANL00cG078781@svn.freebsd.org>
 <20141123210412.GG68776@ivaldir.etoilebsd.net>
In-Reply-To: <20141123210412.GG68776@ivaldir.etoilebsd.net>
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org,
 src-committers@freebsd.org

On Sun, Nov 23, 2014 at 10:04:12PM +0100, Baptiste Daroussin wrote:
> On Sun, Nov 23, 2014 at 09:00:00PM +0000, Joel Dahl wrote:
> > Author: joel (doc committer)
> > Date: Sun Nov 23 21:00:00 2014
> > New Revision: 274925
> > URL: https://svnweb.freebsd.org/changeset/base/274925
> > 
> > Log:
> >   Misc mdoc fixes:
> >   
> >   - Remove superfluous paragraph macros.
> >   - Remove/fix empty or incorrect macros.
> >   - Sort sections into conventional order.
> >   - Terminate quoted strings properly.
> >   - Remove EOL whitespace.
> > 
> > Modified:
> >   head/lib/libc/sys/poll.2
> >   head/lib/libdpv/dpv.3
> >   head/sbin/ipfw/ipfw.8
> >   head/share/man/man4/gre.4
> >   head/share/man/man4/man4.arm/cgem.4
> >   head/share/man/man4/me.4
> >   head/share/man/man4/netmap.4
> >   head/share/man/man9/get_cyclecount.9
> >   head/share/man/man9/malloc.9
> >   head/share/man/man9/sleepqueue.9
> >   head/sys/boot/common/zfsloader.8
> >   head/sys/boot/i386/gptzfsboot/gptzfsboot.8
> >   head/usr.bin/dpv/dpv.1
> > 
> 
> [...]
> 
> > +.Sh AUTHORS
> > +This manual page was written by
> > +.An Andriy Gapon Aq avg@FreeBSD.org .
>                        ^ There should be a Mt here to properly render in html
> 
> I just picked one in the middle of this commit. In general, Mt should be
> used for every mail address in any manpage.

Sure. Feel free to go over our manpages and fix them. It's a minor issue.

And while we're on the subject, there's a bit of background to this commit.
Back in 2012 I started fixing mandoc lint errors/warnings in our manpage
collection (excluding stuff from contrib/ and gnu/ etc.). I think I got them
down from around 4000 issues to almost zero. Quite a few manpages didn't
even work with mandoc at the time, due to how many syntactical mdoc errors
they had. The situation is still good, but I re-ran my scripts yesterday and
found a slew of new warnings. I fixed a few obvious ones, but if someone with
more time on his hands wants to help, please go ahead. A good starting point
would probably be the netmap.4 or ctl.conf.5 manpages; they seem to generate
quite a few warnings.

I'd also be grateful if everyone ran mandoc -Tlint on their manpages before
committing. :-)

-- 
Joel