Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 9 Sep 2017 08:14:30 +0100
From:      David Chisnall <theraven@FreeBSD.org>
To:        Mateusz Guzik <mjg@FreeBSD.org>
Cc:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r323329 - head/sys/sys
Message-ID:  <9BE7B1BD-DE1A-48FE-B247-A3CE7674648E@FreeBSD.org>
In-Reply-To: <201709082009.v88K9EGW006964@repo.freebsd.org>
References:  <201709082009.v88K9EGW006964@repo.freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On 8 Sep 2017, at 21:09, Mateusz Guzik <mjg@FreeBSD.org> wrote:
>=20
> Author: mjg
> Date: Fri Sep  8 20:09:14 2017
> New Revision: 323329
> URL: https://svnweb.freebsd.org/changeset/base/323329
>=20
> Log:
>  Allow __builtin_memset instead of bzero for small buffers of known =
size

This change seems redundant, because modern compilers already do this =
optimisation.  For example:

	#include <strings.h>
=09
	char buf[42];
=09
	void bz(void)
	{
        	bzero(buf, 42);
	}

With clang 4.0 on x86 compiles to:

        pushq   %rbp
        movq    %rsp, %rbp
        xorps   %xmm0, %xmm0
        movups  %xmm0, buf+26(%rip)
        movaps  %xmm0, buf+16(%rip)
        movaps  %xmm0, buf(%rip)
        popq    %rbp
        retq

On AArch64, it compiles to:

        adrp    x8, buf
        add     x8, x8, :lo12:buf
        strh    wzr, [x8, #40]
        stp     xzr, xzr, [x8, #24]
        stp     xzr, xzr, [x8, #8]
        str             xzr, [x8]
        ret

Neither contains a call, both have inlined the zeroing.  This change is =
strictly worse, because the compiler has some carefully tuned heuristics =
that are set per target for when to inline the memset / bzero and when =
to call the function.  These are based on both the size and the =
alignment, including whether the target supports misaligned accesses and =
whether misaligned accesses are cheap.  None of this is captured by this =
change.

In the kernel, this optimisation is disabled by -ffreestanding, however =
__builtin_memset will be turned into a memset call if the size is not =
constant or if the memset call would be more efficient (as determined by =
the aforementioned heuristics).  Simply using __builtin_memset in all =
cases should give better code, and is more likely to be forward =
compatible with future ISAs where the arbitrary constant picked in this =
patch may or may not be optimal.

David




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?9BE7B1BD-DE1A-48FE-B247-A3CE7674648E>