Date:      Mon, 6 May 2013 12:33:01 +0100
From:      Andrew Turner <andrew@fubar.geek.nz>
To:        Tim Kientzle <tim@kientzle.com>
Cc:        freebsd-arm@freebsd.org
Subject:   Re: Is this related to the general panic discussed in freebsd-current?
Message-ID:  <20130506123301.397bbfcd@bender.lan>
In-Reply-To: <B5B4C509-5CEC-4155-90BF-B40D7395F09B@kientzle.com>
References:  <51835891.4050409@thieprojects.ch> <03971BD1-4ADE-4435-BDD0-B94B62634F1D@bsdimp.com> <5183BF8C.4040406@thieprojects.ch> <CCABA43A-6D7E-4310-9F68-AEE54C88F431@kientzle.com> <6D0E82C9-79D1-4804-9B39-3440F99AA8FE@kientzle.com> <20130505140006.0d671ba5@bender> <D0B02568-E7AB-410E-8717-E9F9C745E6ED@kientzle.com> <20130505233729.63ac23bc@bender.lan> <B5B4C509-5CEC-4155-90BF-B40D7395F09B@kientzle.com>

On Sun, 5 May 2013 16:20:28 -0700
Tim Kientzle <tim@kientzle.com> wrote:

> 
> On May 5, 2013, at 3:37 PM, Andrew Turner wrote:
> 
> > On Sun, 5 May 2013 09:37:48 -0700
> > Tim Kientzle <tim@kientzle.com> wrote:
> >> On May 5, 2013, at 6:00 AM, Andrew Turner wrote:
> >> 
> >>> On Sat, 4 May 2013 15:44:37 -0700
> >>> Tim Kientzle <tim@kientzle.com> wrote:
> >>>> I'm baffled.  If I insert a printf into the loop in
> >>>> stack_capture, the kernel boots. But the generated assembly
> >>>> looks perfectly correct to me in either case.  So inserting the
> >>>> printf must have some side-effect.
> >>>> 
> >>>> The stack does end up aligned differently:  The failing version
> >>>> puts 16 bytes on the stack, the working version puts 24 bytes.
> >>>> But I can't figure out how that would explain what I'm seeing...
> >>> 
> >>> It feels like an alignment issue but those stack sizes should both
> >>> be valid. Are you able to send me the asm for the working and
> >>> broken versions of the function?
> >>> 
> >>> Also which ABI are you using? I have not been able to reproduce it
> >>> with EABI, but that may have been because I have a patched clang
> >>> I've been using to track down another issue.
> >> 
> >> I'm using whatever the default is in FreeBSD-CURRENT.  I've seen
> >> this consistently with both RaspberryPi and BeagleBone kernels
> >> for the last few weeks.
> > Ok, it's the old ABI. I note this function may be broken with EABI
> > as it makes assumptions about the layout of each frame.
> 
> Thought so.
> 
> >> /* Broken version */
> >> c0519cec <stack_save>:
> >> void
> >> stack_save(struct stack *st)
> >> {
> >> c0519cec:       e92d4830        push    {r4, r5, fp, lr}
> > 
> > This stack layout is incorrect. It should store (from a low address
> > to high address) r4, r5, fp, ip, lr and pc.
> 
> If I understand right, you're claiming that Clang is generating
> a wrong preamble for OABI functions which is manifesting
> as crashes in the stack-walking code.
Clang is generating an invalid preamble for APCS (ARM Procedure Call
Standard), at least according to the documentation I've read.

> 
> I'm not sure I understand the frame layout you're saying it
> should use, though.  Pushing PC seems a very strange thing
> to do on ARM.  (Though it would seem to match
> sys/arm/include/stack.h.)

The stack structure is described at [1]. The documentation I have says
the saved pc is there to identify the function the stack frame belongs to.
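
As a rough illustration of the frame record described at [1], here is a
minimal sketch in C of how a walker might follow such frames. It assumes
the classic APCS prologue (mov ip, sp; stmfd sp!, {fp, ip, lr, pc};
sub fp, ip, #4), so that fp points at the saved pc. The function name
walk_apcs_frames, the FR_* macro names, and the use of word indices into
a simulated stack are hypothetical and chosen for this example; they are
not taken from the kernel source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Word offsets relative to fp in a classic APCS frame record.
 * Assumption: fp points at the saved pc, the highest word of the record.
 */
#define FR_SAVED_PC	0
#define FR_SAVED_LR	(-1)
#define FR_SAVED_SP	(-2)
#define FR_SAVED_FP	(-3)

/*
 * Walk a chain of APCS frames in a simulated stack. "mem" is an array of
 * 32-bit words and frame pointers are word indices into it (a convention
 * made up for this sketch). A saved fp of 0 ends the chain. Stores each
 * frame's return address in "pcs" and returns the number stored.
 */
size_t
walk_apcs_frames(const uint32_t *mem, size_t nwords, uint32_t fp,
    uint32_t *pcs, size_t max)
{
	size_t depth = 0;

	assert(mem != NULL && pcs != NULL);
	while (depth < max && fp >= 3 && fp < nwords) {
		const uint32_t *frame = mem + fp;
		uint32_t next = frame[FR_SAVED_FP];

		pcs[depth++] = frame[FR_SAVED_LR];
		/* The stack grows down, so a caller's frame must be higher. */
		if (next <= fp)
			break;
		fp = next;
	}
	return (depth);
}
```

A walker like this only works if every function really stores fp, ip, lr
and pc in that order. Given a frame built with a different push sequence,
the same offsets read unrelated words, and whether the walk stops cleanly
depends on what happens to be on the stack.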


> It doesn't look like Clang/OABI is using the layout you suggest
> anywhere in the kernel code:  I grepped through the kernel
> disassembly and found only a single instance of "fp, ip, lr, pc"
> and that was from assembly.
> 
> It also looks like sys/arm/include/stack.h needs to be taught
> about the difference between EABI and OABI.
Yes, EABI will be difficult. We have four options: use the unwind code,
decode the stack instructions, tell clang to add a frame pointer, or
stop implementing the two functions that save the stack.

IA64 takes the last approach. I'm not sure how difficult it would be to
teach clang to store a frame pointer we can use. From a quick test, it
takes more than adding -fno-omit-frame-pointer to the command line.

> > The unwind code following is
> > incorrect for this stack layout.
> 
> Ah.  I'll take another look.  I hadn't tried to match up the offsets
> to see if they made sense for the stack layout.
> 
> I could probably change this stack-walking code to
> match the frame layout being used by Clang here,
> but I'm not sure whether that's the "right" fix.
I suspect the right fix is to move to EABI. I have one clang bug to fix
before this can be done: the stack is incorrectly aligned in 'leaf'
functions that access thread-local data. I have a workaround for it and
may commit that until a better solution can be found.

This may also explain why backtrace is broken. It is likely expecting
the same frame layout.

> > In your working code how deep is the stack you are printing? I
> > suspect you are getting lucky with the data on the stack.
> 
> Yes, almost certainly it's a matter of luck here.  I had noticed that
> when I added the printf, it became apparent that the function never
> walked more than one frame.  Now I understand why.
Welcome to the world of debugging stack issues where adding function
calls can magically fix things.

Andrew

[1] http://www.heyrick.co.uk/assembler/apcsintro.html


