Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 28 Mar 2017 02:03:24 -0600
From:      Warner Losh <imp@bsdimp.com>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        Warner Losh <imp@freebsd.org>, src-committers <src-committers@freebsd.org>, "svn-src-all@freebsd.org" <svn-src-all@freebsd.org>,  "svn-src-head@freebsd.org" <svn-src-head@freebsd.org>
Subject:   Re: svn commit: r316064 - head/sys/boot/i386/boot2
Message-ID:  <CANCZdfrJNrk3h-vGzBcWkguWM3VimasYajxFgyOA9okPe5Ghrw@mail.gmail.com>
In-Reply-To: <20170328141213.T927@besplex.bde.org>
References:  <201703272253.v2RMra2L032487@repo.freebsd.org> <20170328141213.T927@besplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help
[[ sorry for the top post but it's quick ]]

Summary, in Bruce's own words

> the whole optimization step was silly.

I agree. on my -current system, clang compiled boot2 was 4 bytes
smaller after ripping it out. gcc was a whopping 7 bytes larger. Since
gcc has 156 still free, I think it's best to just retire this.

Maybe you can give me a hint as to which structs to look at to get
back those 7 bytes :)

Thanks for the writeup. It was most enlightening.

Warner

On Mon, Mar 27, 2017 at 11:53 PM, Bruce Evans <brde@optusnet.com.au> wrote:
> On Mon, 27 Mar 2017, Warner Losh wrote:
>
>> Log:
>>  Fix build with path names with 'align' or 'nop' in them.
>>
>>  clang is now inserting .file directives with the entire path in
>>  them. This is fine, except that our sed peephole optimizer removes
>>  them if ${SRCTOP} or ${OBJTOP} contains 'align' or 'nop', leading to
>>  build failures. The sed peephole optimizer removes useful things for
>>  boot2 when used with clang, so restrict its use to gcc. Also, gcc no
>>  longer generates nops to pad things, so there's no point in removing
>>  it. Specialize the optimization to just removing the .align 4 lines to
>>  preclude inadvertant path matching.
>>
>>  Sponsored by: Netflix
>>  Commit brought to you the path: /home/xxx/NCD-3592-logsynopts/FreeBSD
>>
>> Modified:
>>  head/sys/boot/i386/boot2/Makefile
>>
>> Modified: head/sys/boot/i386/boot2/Makefile
>>
>> ==============================================================================
>> --- head/sys/boot/i386/boot2/Makefile   Mon Mar 27 22:34:43 2017
>> (r316063)
>> +++ head/sys/boot/i386/boot2/Makefile   Mon Mar 27 22:53:36 2017
>> (r316064)
>> @@ -91,10 +91,18 @@ boot2.o: boot2.s
>>
>> SRCS=   boot2.c boot2.h
>>
>> +# Gcc (4.2.1 at least) benefits from removing the forced alignment
>> +# clang doesn't. Make the removal as specific as possible to avoid
>> +# false positives (like path names with odd names for debugging info).
>> +# In the past, gcc benefited from nop removal, but not in 4.2.1.
>> +# Think of this as a poor-man's peephole optimizer for gcc 4.2.1
>> boot2.s: boot2.c boot2.h ${.CURDIR}/../../common/ufsread.c
>>         ${CC} ${CFLAGS} -S -o boot2.s.tmp ${.CURDIR}/boot2.c
>> -       sed -e '/align/d' -e '/nop/d' < boot2.s.tmp > boot2.s
>> -       rm -f boot2.s.tmp
>> +.if ${COMPILER_TYPE} == "gcc"
>> +       sed -e '/\.align 4/d' < boot2.s.tmp > boot2.s
>> +.else
>> +       cp boot2.s.tmp boot2.s
>> +.endif
>>
>> boot2.h: boot1.out
>>         ${NM} -t d ${.ALLSRC} | awk '/([0-9])+ T xread/ \
>
>
> That's silly; if you just fix the regexps to actually be as specifice as
> possible, then they would just work with no need for large comments or
> ifdefs.  '\.align' instead of 'align' is a good start.  ' 4' instead of
> nothing is not so good.  There should also be a check for nothing
> except whitespace before '\.align'.  The whitespace before '4' might
> be a tab, but perhaps gcc only generates a space, and only generates
> unwanted alignments of precisely 4, with no fill bytes.  The general
> syntax would be too hard to parse, but it is easy to handle any digit,
> and also '\.p2align', and allow nothing except whitespace after the
> digit.
>
> Last time I looked at this about 5 years ago, but after clang, the
> whole optimization step was silly.  It made little difference for gcc
> and no difference for clang as you say.  The nop removal that you
> removed was silliest.  I think compilers stopped generating nops for
> padding many years before that code was written.  In text sections,
> They should generate something like '.p2align x,y,z', where x is the
> alignment, y is empty (it is the fill byte, which defaults to nop, but
> we don't want that but want a larger null instruction for multiple
> bytes) and z is a limit on the amount of padding (x is only a
> preference and z limits it).  This syntax would be hard to parse.
> But CFLAGS has -Os, and that works for preventing padding in the
> text section, so nop is never generated, not even with the spelling
> 'align'.
>
> -Os doesn't work so well for other things.  For kernels, it is completely
> broken with gcc-4.2.1 unless -fno-inline-functions is also used (it
> breaks the inlining parameters whose misconfiguration is probably the
> actual bug.  This doesn't seem to be a problem for boot code.  IIRC,
> the only simply broken thing is that alignment is still done for variables.
> This generates ".align 4".
>
> -Os is differently broken for clang, so the simple editing doesn't help.
> -Wa,-al to see what the assembler is doing is broken for clang.
> -mpreferred-stack-boundary doesn't work for clang, so is not used, but
> clang does better stack alignment by default.  This might still waste
> some instructions, but not for every function.  -mno-align-long-strings
> doesn't work for clang or for gcc > 4.2.1, so is not used.  clang does
> quite different string padding by default, and IIRC it is worse.  But
> boot2 hardly has any long strings.
>
> boot code is hard to test since it is not modular.  This is much more broken
> than when I last tested the size of boot2.o.  The most obvious new bugs are:
> - something like SRCTOP breaks finding the relative path to ufs headers in
>   a FreeBSD-11 sys tree
> - ${CLANG_OPT_SMALL} is added for all compilers.  This is defined
> out-of-tree
>   in /usr/share/mk/bsd.sys.mk.  (I tested on a host running nearly -current
>   with a nearly FreeBSD-10 sys tree).  In boot2, CLANG_P{T_SMALL is
>   wrong even for clang since it contains -mstack-alignment=8
>   unconditionally, but the stack alignment should be 4 or even 1 with
>   -m32 -Os.  Fixing this makes no difference for boot2.o compiled by
>   clang.  The corresponding -mpreferred-stack-boundary=2 for gcc is
>   configured correctly in boot2/Makefile.
>
> After fixing this, simply "CC=gcc-4.2.1 make boot2.o".  Adding -Wa,-al
> to CC also works (-Wa,-al is broken for clang, and clang produces a boot2.s
> which cannot be assembled by gas).
>
> The file sizes for FreeBSD-10 are:
>
>   text   data    bss     dec      hex   filename
>   5126     12   4940   10078   0x275e   clang/boot2-unhacked.o
>   5126      9   4940   10075   0x275b   clang/boot2.o
>   5036     32   4928    9996   0x270c   gcc/boot-unhacked.o
>   5032     29   4928    9989   0x2705   gcc/boot2.o
>
> In other words, the editing saves a whole 4 bytes of text and 3 bytes of
> data for gcc.  It also saves a whole 3 bytes of data for clang, and you
> just broke this :-).  (I would be surprised if the data isn't padded back
> to a multiple of 4 anyway.)
>
> After the reductions, the result with gcc is 86 bytes smaller than with
> clang, instead of only 82 bytes smaller.
>
> It is silly doing such small optimizations specially.
>
> The editing actually removes 22 alignment statements, all spelled
> "^<tab>\.align<space>4$'.  All of these are for data.  -Wa,-al shows
> that the padding is 2 instances of 2 bytes and 1 of 3 bytes.  At
> least half of the padding is for pessimal layout which is also a
> style bug in the source code.  Arrays of characters are mixed with
> 32-bit variables.  Style(9) says to sort on size (it means on alignment/
> element size), to minimize padding (mainly for structs) and get a conistent
> order as a side effect.  The padding usually doesn't matter except for
> structs.
>
> After fixing the source code, there would be just 1 odd byte in a
> character array that normally causes 3 bytes of padding after it.  It
> takes considerably more magic to avoid getting this padding anyway in
> the link step,  In the old a.out boot blocks, the linker forces 16-byte
> alignment at the end of every object file.  The old boot blocks put
> most of the data in a single object file to avoid an aerage of 8 bytes
> of padding between files.  Elf is more flexible, and usually gives
> 4-byte alignment.  I think it is possible to match an odd byte at the
> end of 1 object file with an odd 3 bytes at the beginning of the next
> object file, but this usually doesn't happen.  The new boot2 uses
> hacks like including ufsread.c to avoid having lots of object files.
>
> I use the old a.out boot blocks updated to elf and EDD and have squeezed
> them more extensively.  The squeezing is also silly since I could very
> easily switch to 64K ones (much larger than that is not easy, since
> they have to fit below 640K and a few multiples of 64K are already
> used for buffers).  The limit on 8K is mainly a historical mistake.
> A limit of 7.5K simplified booting from 15-sector floppies.  18-sector
> floppies allowed easy expansion to 9K, but were unportable for a small
> gain.  The default partition layout left only 8K below the ffs
> superblock, but was only a default.
>
> Bruce



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CANCZdfrJNrk3h-vGzBcWkguWM3VimasYajxFgyOA9okPe5Ghrw>