Date:      Sun, 11 Jun 2017 16:02:53 -0700
From:      Mark Millard <markmi@dsl-only.net>
To:        Justin Hibbits <jhibbits@FreeBSD.org>, Nathan Whitehorn <nwhitehorn@freebsd.org>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>, freebsd-hackers@freebsd.org
Subject:   Re: A different 32-bit powerpc head -r317820 panic on old PowerMac G5: dual backtraces from "timeout stopping cpus" (dump failed though): any comments?
Message-ID:  <29CCA1EC-242D-42E7-97E9-6F2F67178DF3@dsl-only.net>
In-Reply-To: <1F1E52BD-375E-47CC-BF06-ECB1092121B4@dsl-only.net>
References:  <D69CB244-69E2-4319-BD63-07BC7F763279@dsl-only.net> <1F1E52BD-375E-47CC-BF06-ECB1092121B4@dsl-only.net>

On 2017-Jun-6, at 11:09 AM, Mark Millard <markmi@dsl-only.net> wrote:

> . . .
> FYI: I'm currently doing an approximate
> binary search for localizing part of the panic problem.

This effort failed. Details follow the reminder
below of the technique as it stood when I started
trying this.

> This is based on the classic panics that are instead
> from jumping to a non-code area. . .
> 
> At a given point in my other experiments I was
> getting:
> 
> srr0=0x90a0f0 etext+0xb8fc
> 
> Adding (unused) code somewhat before that etext
> (so increasing etext) got:
> 
> srr0=0x90a0f0 etext+0xb8a8
> (The additional code was larger than I now use.)
> 
> But instead adding some code earlier (by around
> 0x100000 in this example) got:
> 
> srr0=0x90a110 etext+0xb8fc
> 
> So comparing to the starting conditions in
> each case:
> 
> The bad-address accessed in one case stayed
> constant but the etext offset decreased: in essence
> the only thing that happened is etext increased
> (matching the offset decrease).
> 
> In the other case the etext offset stayed constant
> but the bad-address and etext increased by the
> same amount.
> 
> . . .
> 
> Currently I'm adding code by adding:
> 
> void HACKISH_EXTRA_CODE(void) {}
> 
> to one .c file from /usr/src/sys/. . . based on which
> file gets to within a ballpark of a more accurate
> binary search position. (Large binary search
> jumps currently: I'm not being picky about where
> in the .c the addition is made yet.)
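
The arithmetic behind the two quoted srr0/etext
comparisons can be checked with a small sketch. The
struct and function names below are made up for
illustration (nothing here is kernel code), and the
only data used are the three quoted reports:

```c
#include <stdint.h>

/* One panic report: the faulting address and its "etext+0x..." offset. */
struct fault_report {
    uint32_t srr0;         /* faulting address reported as srr0 */
    uint32_t etext_offset; /* distance of srr0 past etext */
};

/* The etext address implied by a report: srr0 minus the offset. */
static uint32_t implied_etext(struct fault_report r)
{
    return r.srr0 - r.etext_offset;
}

/* How much etext grew between two builds (size of the added code). */
static uint32_t etext_growth(struct fault_report before,
                             struct fault_report after)
{
    return implied_etext(after) - implied_etext(before);
}

/* How far the bad address itself moved between two builds. */
static uint32_t srr0_shift(struct fault_report before,
                           struct fault_report after)
{
    return after.srr0 - before.srr0;
}

/* The three quoted reports (labels are illustrative). */
static const struct fault_report baseline         = { 0x90a0f0, 0xb8fc };
static const struct fault_report added_near_etext = { 0x90a0f0, 0xb8a8 };
static const struct fault_report added_earlier    = { 0x90a110, 0xb8fc };
```

For the code added near etext, etext_growth() is 0x54
and srr0_shift() is 0: the bad address stayed put and
only etext moved. For the code added earlier, both are
0x20: the bad address moved along with etext.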

The reason for the failure is that the observed
behavior and failure modes changed depending on where
HACKISH_EXTRA_CODE was added (across a very wide
span of addresses where the code was tried).
Overall I was unable to find a criterion for
choosing between larger addresses and smaller
addresses in the search, one that would converge
on a boundary with one specific, distinct behavior
on each side.
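
What a binary search needs here can be stated as a
check: with the trial positions ordered by address,
the outcomes must form exactly two runs, one behavior
below the boundary and a different one above it. A
minimal sketch of that check follows. The outcome
labels are hypothetical placeholders, not a record of
the actual test runs:

```c
#include <stddef.h>

/* Hypothetical labels for what one trial boot did. */
enum outcome { BOOT_OK, PANIC_BAD_JUMP, PANIC_OTHER };

/*
 * outcomes[] holds one result per trial, ordered by the address at
 * which the extra code was inserted. Return 1 if the results change
 * exactly once along that order (one behavior on each side of a
 * single boundary), which is what a binary search relies on.
 * Return 0 otherwise, including when there is no change at all.
 */
static int has_single_boundary(const enum outcome *outcomes, size_t n)
{
    size_t transitions = 0;

    for (size_t i = 1; i < n; i++) {
        if (outcomes[i] != outcomes[i - 1])
            transitions++;
    }
    return transitions == 1;
}
```

A sequence such as OK, OK, bad-jump, bad-jump passes
the check. A sequence such as OK, other-panic, OK,
bad-jump does not, and gives no basis for choosing
which half to search next.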

Also, adding code to panic (instead of accessing
or changing inappropriate memory) at points where
some failures had been seen again changed the
observed behavior: the accesses and corruption no
longer happened the same way. So for the binary
search I had to revert such extra problem-detection
code.

The behavior is very memory-layout dependent.

At this point I'm not hopeful of providing any
better evidence than I have in my various
prior list messages. I doubt anyone can pick
anything out based on just those from the last
several weeks. At most, if a code problem is
later identified, the reports might be checked
against it: "would this now-identified code
problem have possibly contributed to those
reports?". (Even that use seems unlikely.)


===
Mark Millard
markmi at dsl-only.net





