Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 6 Jun 2017 21:49:42 -0700
From:      Mark Millard <markmi@dsl-only.net>
To:        Justin Hibbits <jhibbits@FreeBSD.org>, Nathan Whitehorn <nwhitehorn@freebsd.org>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>, freebsd-hackers@freebsd.org
Subject:   32-bit powerpc FreeBSD head -r317820 panic example: Example showing register r10 having been trashed during set_affinity's time frame
Message-ID:  <1738C902-7A9C-4711-A2F7-D302249C9AF8@dsl-only.net>

next in thread | raw e-mail | index | archive | help
Summary of the analysis below: Something
trashed the r10 value during set_affinity,
effectively replacing a struct td_sched *
value with what happened to be a struct
thread * value.

That in turn lead to what should have been
a cpuid (for one of 4 CPUs) to instead be
a struct td_sched * value.

And that in turn lead to a data storage
interrupt from the indexing into
cpuid_to_pcpu being way out of bounds.


Context: 32-bit powerpc head -r317820 used on
a old PowerMac G5 so-called "Quad Core".


The analysis from a panic's dump and from
pictures of ddb output and using the kernel
debug file. . .

I got a panic that it looked like I could back calculate
a ways to see what was odd somewhat earlier. This is
that back calculation sequence.

This will start in ipi_cpu where the panic occurred
and progress backwards into sched_affinity.

void
ipi_cpu(int cpu, u_int ipi)
{

        ipi_send(cpuid_to_pcpu[cpu], ipi);
}

was:

(kgdb) x/90i ipi_cpu
0x8eb0dc <ipi_cpu>:	stwu    r1,-32(r1)
0x8eb0e0 <ipi_cpu+4>:	mflr    r0
0x8eb0e4 <ipi_cpu+8>:	stw     r30,24(r1)
0x8eb0e8 <ipi_cpu+12>:	stw     r31,28(r1)
0x8eb0ec <ipi_cpu+16>:	stw     r0,36(r1)
0x8eb0f0 <ipi_cpu+20>:	mr      r31,r1
0x8eb0f4 <ipi_cpu+24>:	bcl-    20,4*cr7+so,0x8eb0f8 <ipi_cpu+28>
0x8eb0f8 <ipi_cpu+28>:	mflr    r30
0x8eb0fc <ipi_cpu+32>:	lwz     r0,-32(r30)
0x8eb100 <ipi_cpu+36>:	add     r30,r0,r30
0x8eb104 <ipi_cpu+40>:	rlwinm  r3,r3,2,0,29
0x8eb108 <ipi_cpu+44>:	lwz     r9,-32756(r30)
0x8eb10c <ipi_cpu+48>:	lwzx    r3,r3,r9
0x8eb110 <ipi_cpu+52>:	bl      0x8eb018 <ipi_send>
0x8eb114 <ipi_cpu+56>:	lwz     r11,0(r1)
0x8eb118 <ipi_cpu+60>:	lwz     r0,4(r11)
0x8eb11c <ipi_cpu+64>:	mtlr    r0
0x8eb120 <ipi_cpu+68>:	lwz     r30,-8(r11)
0x8eb124 <ipi_cpu+72>:	lwz     r31,-4(r11)
0x8eb128 <ipi_cpu+76>:	mr      r1,r11
0x8eb12c <ipi_cpu+80>:	blr
0x8eb130 <ipi_cpu+84>:	.long 0x43ce40

The point of failure (srr0 value) is:

0x8eb10c <ipi_cpu+48>:	lwzx    r3,r3,r9

(Note: This is a form of r3:=3DMEM(r3+r9,4).
Luckily r3 was not updated because of the
bad r3+r9 value.)

Note that the only prior instruction modifying
r3 in ipi_cpu is:

0x8eb104 <ipi_cpu+40>:	rlwinm  r3,r3,2,0,29

The "show reg r3" result was: 0x51f2880. So
prior to the shift by 2 and masking off of
the least significant 2 bits its value was:

r3=3D0x147CA20

I will note that the 0x51f2880 + the
"show reg r9" value 0xf2c9fc (i.e.,
&cpuid_to_pcpu) produces: 0x611f27c
-- and the exception was reported as
being for virtual address 0x611f27c .

By the source code structure r3 was
supposed to be one of the values: 0x0,
0x1, 0x2 or 0x3 instead. (The PowerMac G5
"Quad Core" has 4 cpus.)

With the "show reg r4" result being 0x1
it means that the call to ipi_cpu was
apparently (by values passed):

ipi_cpu(0x147CA20, 0x1)

"Show allpcpu" reported that cpuid=3D3 had:

curthread  =3D 0x147ca20: pid 11 tid 100006 "idle: cpu3"
idlethread =3D 0x147ca20: did 100006 "idle: cpu3"

where 0x147d000 (for cpuid 2) has 0x147d008
holding a 0x147ca20 value.

So somehow a thread address for cpuid 3 was
given to ipi_cpu instead of 0x3 .

The backtrace shows that the caller of
ipi_cpu was sched_affinity:

void
sched_affinity(struct thread *td)
{
#ifdef SMP
        struct td_sched *ts;

        THREAD_LOCK_ASSERT(td, MA_OWNED);
        ts =3D td_get_sched(td);
        if (THREAD_CAN_SCHED(td, ts->ts_cpu))
                return;
        if (TD_ON_RUNQ(td)) {
                sched_rem(td);
                sched_add(td, SRQ_BORING);
                return;
        }
        if (!TD_IS_RUNNING(td))
                return;
        /*
         * Force a switch before returning to userspace.  If the
         * target thread is not running locally send an ipi to force
         * the issue.
         */
        td->td_flags |=3D TDF_NEEDRESCHED;
        if (td !=3D curthread)
                ipi_cpu(ts->ts_cpu, IPI_PREEMPT);
#endif
}

So the implication is that ts->ts_cpu had
the value 0x147ca20 instead of the value 3
at the time of the call sequence.

sched_affinity was:

0x535ab8 <sched_affinity>:	stwu    r1,-32(r1)
0x535abc <sched_affinity+4>:	mflr    r0
0x535ac0 <sched_affinity+8>:	stw     r29,20(r1)
0x535ac4 <sched_affinity+12>:	stw     r30,24(r1)
0x535ac8 <sched_affinity+16>:	stw     r31,28(r1)
0x535acc <sched_affinity+20>:	stw     r0,36(r1)
0x535ad0 <sched_affinity+24>:	mr      r31,r1
0x535ad4 <sched_affinity+28>:	mr      r29,r3
0x535ad8 <sched_affinity+32>:	lwz     r0,0(r3)
0x535adc <sched_affinity+36>:	addi    r10,r3,808
0x535ae0 <sched_affinity+40>:	lwz     r9,8(r10)
0x535ae4 <sched_affinity+44>:	lwz     r11,48(r3)
0x535ae8 <sched_affinity+48>:	rlwinm  r0,r9,29,3,29
0x535aec <sched_affinity+52>:	clrlwi  r9,r9,27
0x535af0 <sched_affinity+56>:	lwzx    r0,r11,r0
0x535af4 <sched_affinity+60>:	sraw    r0,r0,r9
0x535af8 <sched_affinity+64>:	andi.   r9,r0,1
0x535afc <sched_affinity+68>:	bne-    0x535b4c <sched_affinity+148>
0x535b00 <sched_affinity+72>:	lwz     r0,652(r3)
0x535b04 <sched_affinity+76>:	cmpwi   cr7,r0,3
0x535b08 <sched_affinity+80>:	bne+    cr7,0x535b20 =
<sched_affinity+104>
0x535b0c <sched_affinity+84>:	bl      0x5341c0 <sched_rem>
0x535b10 <sched_affinity+88>:	mr      r3,r29
0x535b14 <sched_affinity+92>:	li      r4,0
0x535b18 <sched_affinity+96>:	bl      0x53583c <sched_add>
0x535b1c <sched_affinity+100>:	b       0x535b4c <sched_affinity+148>
0x535b20 <sched_affinity+104>:	cmpwi   cr7,r0,4
0x535b24 <sched_affinity+108>:	bne-    cr7,0x535b4c =
<sched_affinity+148>
0x535b28 <sched_affinity+112>:	lwz     r0,156(r3)
0x535b2c <sched_affinity+116>:	oris    r0,r0,1
0x535b30 <sched_affinity+120>:	stw     r0,156(r3)
0x535b34 <sched_affinity+124>:	mr      r0,r2
0x535b38 <sched_affinity+128>:	cmpw    cr7,r3,r0
0x535b3c <sched_affinity+132>:	beq-    cr7,0x535b4c =
<sched_affinity+148>
0x535b40 <sched_affinity+136>:	lwz     r3,8(r10)
0x535b44 <sched_affinity+140>:	li      r4,1
0x535b48 <sched_affinity+144>:	bl      0x8eb0dc <ipi_cpu>
0x535b4c <sched_affinity+148>:	lwz     r11,0(r1)
0x535b50 <sched_affinity+152>:	lwz     r0,4(r11)
0x535b54 <sched_affinity+156>:	mtlr    r0
0x535b58 <sched_affinity+160>:	lwz     r29,-12(r11)
0x535b5c <sched_affinity+164>:	lwz     r30,-8(r11)
0x535b60 <sched_affinity+168>:	lwz     r31,-4(r11)
0x535b64 <sched_affinity+172>:	mr      r1,r11
0x535b68 <sched_affinity+176>:	blr
0x535b6c <sched_affinity+180>:	.long 0x7e7f18

The only call executed by sched_affinity was to
ipi_cpu given what is inlined and the fact that
it made it to do the ipi_cpu bl instruction.

Since ipi_cpu leaves various registers alone
some register values from sched_affinity were
preserved for "show reg" use. r10 is an example
in:

0x535ad4 <sched_affinity+28>:	mr      r29,r3
. . .
0x535adc <sched_affinity+36>:	addi    r10,r3,808
. . .
0x535b40 <sched_affinity+136>:	lwz     r3,8(r10)
0x535b44 <sched_affinity+140>:	li      r4,1
0x535b48 <sched_affinity+144>:	bl      0x8eb0dc <ipi_cpu>

(r29 is another register value that is preserved
in what sched_affinity called. r29 reports what
r3 was when sched_affinity was called: 0x147d000,
which makes sense. Also 808 is in decimal notation.)

That sched_affinity code corresponds to (r10 for
ts below):

sched_affinity(struct thread *td)
. . .
        struct td_sched *ts;
. . .
        ts =3D td_get_sched(td);
. . .
                ipi_cpu(ts->ts_cpu, IPI_PREEMPT);

where "show reg r10" reported: 0x147d000, which
matched the "show allpcpu" for cpuid=3D2:

curthread =3D 0x147d000: pid11 tid 100005 "idle: cpu2"

But it should not match that. . .

r10 should have been 0x147d328 (0x147d00+0x328)
but is reported as 0x147d000 and the behavior
reported for ipi_cpu agrees with 0x147d000:
8(r10) then extracts a 0x0147ca20 value from
0x147d008 instead of a 0x0 from 0x147d328 .

So something trashed r10 before:

0x535b40 <sched_affinity+136>:	lwz     r3,8(r10)

and so effectively replaced ts with the td value.

Trashing r3 so that r3+808=3D=3D0x147d000 when
0x535adc executes seems unlikely but is a
technical possibility for how r10 ends up
trashed.


Other notes:

Some of the details not shown above are:

static __inline __pure2 struct td_sched *
td_get_sched(struct thread *td)
{
       =20
        return ((struct td_sched *)&td[1]);
}

and:

/*
 * Thread scheduler specific section.  All fields are protected
 * by the thread lock.
 */
struct td_sched {
        struct runq     *ts_runq;       /* Run-queue we're queued on. */
        short           ts_flags;       /* TSF_* flags. */
        int             ts_cpu;         /* CPU that we have affinity =
for. */
        int             ts_rltick;      /* Real last tick, for affinity. =
*/
        int             ts_slice;       /* Ticks of slice remaining. */
        u_int           ts_slptime;     /* Number of ticks we vol. slept =
*/
        u_int           ts_runtime;     /* Number of ticks we were =
running */
        int             ts_ltick;       /* Last tick that we were =
running on */
        int             ts_ftick;       /* First tick that we were =
running on */
        int             ts_ticks;       /* Tick count */
#ifdef KTR
        char            ts_name[TS_NAME_LEN];
#endif
};

where td_sched is a suffix after a "struct thread",
such as the space reserved in:

struct thread0_storage {
        struct thread t0st_thread;
        uint64_t t0st_sched[10];
};

and checked by:

_Static_assert(sizeof(struct thread) + sizeof(struct td_sched) <=3D
    sizeof(struct thread0_storage),
    "increase struct thread0_storage.t0st_sched size");





=3D=3D=3D
Mark Millard
markmi at dsl-only.net




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1738C902-7A9C-4711-A2F7-D302249C9AF8>