Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 15 Feb 2016 12:17:50 -0800
From:      Mark Millard <markmi@dsl-only.net>
To:        Roman Divacky <rdivacky@vlakno.cz>
Cc:        Nathan Whitehorn <nwhitehorn@freebsd.org>, FreeBSD Toolchain <freebsd-toolchain@freebsd.org>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>
Subject:   Re: I've submitted 207175 for a clang 3.8.0 va_list handling problem for powerpc
Message-ID:  <3A260EC5-E69A-4980-8F74-C04395F4E5F4@dsl-only.net>
In-Reply-To: <20160215191100.GA17387@vlakno.cz>
References:  <F6846682-10F7-4D0D-A691-ED8D4366805C@dsl-only.net> <20160214192903.GA96697@vlakno.cz> <70B405C4-E1AC-4F35-9786-051FDA2F8BE7@dsl-only.net> <D7D536A4-68B6-4506-BDFB-8C2C41E1C958@dsl-only.net> <20160215191100.GA17387@vlakno.cz>

next in thread | previous in thread | raw e-mail | index | archive | help
On 2016-Feb-15, at 11:11 AM, Roman Divacky <rdivacky at vlakno.cz> =
wrote:
>=20
> Mark, I believe you're right. What do you think about this patch?
>=20
> Index: tools/clang/lib/CodeGen/TargetInfo.cpp
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> --- tools/clang/lib/CodeGen/TargetInfo.cpp	(revision 260852)
> +++ tools/clang/lib/CodeGen/TargetInfo.cpp	(working copy)
> @@ -3599,6 +3599,8 @@
>   {
>     CGF.EmitBlock(UsingOverflow);
>=20
> +    Builder.CreateStore(Builder.getInt8(11), NumRegsAddr);
> +
>     // Everything in the overflow area is rounded up to a size of at =
least 4.
>     CharUnits OverflowAreaAlign =3D CharUnits::fromQuantity(4);
>=20
>=20
> Can you test it?

It may be later today before I can start the the test process.

While your change is not wrong as presented, it does seem to be based on =
the ABI document's numbering with the range 3 <=3D gr <12, where 3 <=3D =
gr < 11 cover r3-r10 use and gr=3D11 implies overflow stack area use. =
(gr being the ABI documents name.)

The clang code generation that I saw while analyzing the problem and the =
clang source that you had me look at did not use that numbering. Instead =
it seems to be based on 0 <=3D gpr < 9, where 0 <=3D gpr < 8 cover =
r3-r10 use and gpr=3D8 implies overflow stack area use. (gpr being what =
gdb showed me as I remember.) In other words: clang counts the number of =
"parameter registers" already in use as it goes along instead of =
tracking register numbers that have been used.

So assigning any value that appears to be positive and >=3D 8 should =
work, such as:

Builder.CreateStore(Builder.getInt8(8), NumRegsAddr);

A cross check on this is the clang source code below:

> llvm::Value *NumRegs =3D Builder.CreateLoad(NumRegsAddr, =
"numUsedRegs");
> . . .
> llvm::Value *CC =3D
>     Builder.CreateICmpULT(NumRegs, Builder.getInt8(8), "cond");
>=20
> llvm::BasicBlock *UsingRegs =3D CGF.createBasicBlock("using_regs");
> llvm::BasicBlock *UsingOverflow =3D =
CGF.createBasicBlock("using_overflow");
> llvm::BasicBlock *Cont =3D CGF.createBasicBlock("cont");
>=20
> Builder.CreateCondBr(CC, UsingRegs, UsingOverflow);

I'd guess that the Builder.CreateICmpULT(NumRegs, Builder.getInt8(8), =
"cond") for using in Builder.CreateCondBr is a test for < 8 (unsigned =
test?) picking UsingRegs and >=3D8 picking UsingOverflow. 11>=3D8 so 11 =
would work.

But the clang folks might prefer that the same figure be used in both =
places, possibly with the source code naming the value once and using =
the name in both places, not that the figure is likely to change in this =
already PowerPC specific code.

In analyzing the powerpc code absent knowledge of clang's code =
generation source code I would likely have been confused by seeing such =
differing numbers in the generated code if I'd run into such. That is =
another reason to use the same figure in both places.


I continue to provide some history below for reference.

=3D=3D=3D
Mark Millard
markmi at dsl-only.net

On Mon, Feb 15, 2016 at 12:52:15AM -0800, Mark Millard wrote:
>=20
> I'm top posting as the following can stand on its own fairly well.
>=20
> On Sun Feb 14 23:46:14 UTC 2016 Nathan Whitehorn wrote:
>=20
>> On 02/14/16 14:34, Mark Millard wrote:
>>> clang's code base is not familiar material for me nor do I have =
solid=20
>>> reference material for the FreeBSD TARGET_ARCH=3Dpowerpc ABI rules =
so=20
>>> the below has my guess work involved. The following code appears to=20=

>>> have hard wired a global, unvarying constant (8) into the test for=20=

>>> picking UsingRegs vs. UsingOverflow.
>>=20
>> For reference, we use the standard ELF ABI=20
>> (https://uclibc.org/docs/psABI-ppc.pdf).
>> -Nathan
>=20
> Reviewing the Parameter Passing material in that document shows that =
the problem is in the original specification.
>=20
> And there is a more modern specification that has a fix in its =
wording. (Which shows that I'm not likely to be wrong.) I'll reference =
and quote it later.
>=20
> First I'll explain the problem that is in psABI-ppc.pdf (the old =
SunSoft 1995 document).
>=20
> First a numbering point: psABI-ppc.pdf uses "gr" matching the numeral =
in r3, r4, . . . , r10, starting at r3 (i.e, 3). And gr indicates the =
next register to be used, not the last one already used.
>=20
> The document splits the algorithm for placement of parameters into 3 =
stages with the following structure, intended as they have it in the =
document but various less interesting details for my "8byte then 4byte" =
example omitted:
>=20
>> INITIALIZING:
>>     Set fr=3D1, gr=3D3, and starg to the address of
>>     parameter word 1.
>> SCAN:
>>     If there are no more arguments, terminate.
>>     Otherwise, select one of the following
>>     depending on the type of the next argument:
>>=20
>>     DOUBLE_OR_FLOAT
>>        If fr>8 ( . . .), go to OTHER. Otherwise,
>>        . . .
>>=20
>>     SIMPLE_ARG
>>        If gr>10, go to OTHER. Otherwise, load the
>>        argument value into general register gr,
>>        set gr to gr+1, can goto SCAN. . . .
>>=20
>>     LONG_LONG
>>        If gr>9, go to OTHER. Otherwise, . . .
>>=20
>> OTHER:
>>       Arguments not otherwise handled above are
>>       passed in the parameter words of the
>>       caller???s stack frame. . . . Set starg to
>>       starg+size, then go to SCAN.
>=20
> Note that gr is not incremented by LONG_LONG or by the later OTHER =
usage when gr>9. (That would be my example's 8 byte integer that is =
later followed by a 4 byte one.)
>=20
> That OTHER's "go to SCAN" would then lead to the following 4 byte =
integer in my example to be put in r10 and gr then being set to 11 =
instead of it being stored in a parameter word on the stack.
>=20
> The nasty thing about this for va_list/va_arg use is that the stored =
information does not indicate which was before vs. after in the argument =
order: the 4 byte r10 content or the 8 byte "OTHER" content: the two =
orders produce identical results.
>=20
> This can not be correct.
>=20
> The Power-Arch-32-bit-ABI-supp-1.0-Unified.pdf is more modern and =
explicitly deals with VR and other modern things. (Its terminology =
matching LONG_LONG above is DUAL_GP.) But for what I'm dealing with here =
it has the following extra wording at the very end of its OTHER section:
>=20
>> If gr>9 and the type is DUAL_GP ,or . . ., or . . ., then set gr =3D =
11 (to prevent subsequent SINGLE_GPs from being placed in registers =
after DUAL_GP, QUAD_GP, or EIGHT_GP arguments that would no longer fit =
in the registers).
>=20
>=20
>=20
> I've left the prior information below for reference.
>=20
> =3D=3D=3D
> Mark Millard
> markmi at dsl-only.net
>=20
>=20
>=20
> On 2016-Feb-14, at 2:34 PM, Mark Millard <markmi@dsl-only.net> wrote:
>>=20
>> . . .
>> clang's code base is not familiar material for me nor do I have solid =
reference material for the FreeBSD TARGET_ARCH=3Dpowerpc ABI rules so =
the below has my guess work involved.
>>=20
>> The following code appears to have hard wired a global, unvarying =
constant (8) into the test for picking UsingRegs vs. UsingOverflow.
>>=20
>>=20
>>> llvm::Value *NumRegs =3D Builder.CreateLoad(NumRegsAddr, =
"numUsedRegs");
>> . . .
>>> llvm::Value *CC =3D
>>>     Builder.CreateICmpULT(NumRegs, Builder.getInt8(8), "cond");
>>>=20
>>> llvm::BasicBlock *UsingRegs =3D CGF.createBasicBlock("using_regs");
>>> llvm::BasicBlock *UsingOverflow =3D =
CGF.createBasicBlock("using_overflow");
>>> llvm::BasicBlock *Cont =3D CGF.createBasicBlock("cont");
>>>=20
>>> Builder.CreateCondBr(CC, UsingRegs, UsingOverflow);
>> . . .
>>> // Case 1: consume registers.
>>> Address RegAddr =3D Address::invalid();
>>> {
>> . . .
>>>   // Increase the used-register count.
>>>   NumRegs =3D
>>>     Builder.CreateAdd(NumRegs,
>>>                       Builder.getInt8((isI64 || (isF64 && =
IsSoftFloatABI)) ? 2 : 1));
>>>   Builder.CreateStore(NumRegs, NumRegsAddr);. . .
>> . . .
>>> }
>>>=20
>>> // Case 2: consume space in the overflow area.
>>> Address MemAddr =3D Address::invalid();
>>> {
>> . . . (no adjustments to NumRegs) . . .
>>=20
>> If so the means of counting NumRegs (a.k.a. gpr) then needs to take =
into account an allocated but unused last UsingRegs "slot" sometimes. =
Imagine. . .
>>=20
>> r3, r4, r5, r6, r7, r8, r9 in use already so r10 is the last possible =
"UsingRegs" context.
>> (0  1   2   3   4   5   6, leaving r10 as position 7, the last < 8 =
value)
>>=20
>> Then the next two arguments are a 8 byte integer then a a 4 byte =
integer (in that order). That results in what should be:
>>=20
>> r10 "UsingRegs" slot reserved and un-accessed
>> In other words: counted as allocated so that the rest goes in in the =
overflow area
>> (so no position 7 usage)
>>=20
>> then
>>=20
>> overflow with the 8 byte integer then the 4 byte integer.
>>=20
>>=20
>> And, in fact, the memory content reflects this in the overflow area.
>>=20
>>=20
>> But the va_arg access code does not count r10's slot as allocated in =
"Using Regs" after the 8 byte integer. So later it tries to use r10's =
slot for the 4 byte integer that is actually in the UsingOverflow area.
>>=20
>> One fix of sorts is to have "Case 2: consume space in the overflow =
area." set NumRegs (a.k.a. gpr) to the bound from the =
Builder.CreateICmpULT (8 in this context). Then the first (or any/every) =
use of the UsingOverflow area forces no more use of the UsingRegs area =
(for the involved va_list).
>>=20
>>=20
>>=20
>> =3D=3D=3D
>> Mark Millard
>> markmi at dsl-only.net
>=20
> _______________________________________________
> freebsd-toolchain@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-toolchain
> To unsubscribe, send any mail to =
"freebsd-toolchain-unsubscribe@freebsd.org"




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3A260EC5-E69A-4980-8F74-C04395F4E5F4>