Date: Tue, 6 Jun 2017 21:49:42 -0700 From: Mark Millard <markmi@dsl-only.net> To: Justin Hibbits <jhibbits@FreeBSD.org>, Nathan Whitehorn <nwhitehorn@freebsd.org>, FreeBSD PowerPC ML <freebsd-ppc@freebsd.org>, freebsd-hackers@freebsd.org Subject: 32-bit powerpc FreeBSD head -r317820 panic example: Example showing register r10 having been trashed during set_affinity's time frame Message-ID: <1738C902-7A9C-4711-A2F7-D302249C9AF8@dsl-only.net>
next in thread | raw e-mail | index | archive | help
Summary of the analysis below: Something trashed the r10 value during set_affinity, effectively replacing a struct td_sched * value with what happened to be a struct thread * value. That in turn lead to what should have been a cpuid (for one of 4 CPUs) to instead be a struct td_sched * value. And that in turn lead to a data storage interrupt from the indexing into cpuid_to_pcpu being way out of bounds. Context: 32-bit powerpc head -r317820 used on a old PowerMac G5 so-called "Quad Core". The analysis from a panic's dump and from pictures of ddb output and using the kernel debug file. . . I got a panic that it looked like I could back calculate a ways to see what was odd somewhat earlier. This is that back calculation sequence. This will start in ipi_cpu where the panic occurred and progress backwards into sched_affinity. void ipi_cpu(int cpu, u_int ipi) { ipi_send(cpuid_to_pcpu[cpu], ipi); } was: (kgdb) x/90i ipi_cpu 0x8eb0dc <ipi_cpu>: stwu r1,-32(r1) 0x8eb0e0 <ipi_cpu+4>: mflr r0 0x8eb0e4 <ipi_cpu+8>: stw r30,24(r1) 0x8eb0e8 <ipi_cpu+12>: stw r31,28(r1) 0x8eb0ec <ipi_cpu+16>: stw r0,36(r1) 0x8eb0f0 <ipi_cpu+20>: mr r31,r1 0x8eb0f4 <ipi_cpu+24>: bcl- 20,4*cr7+so,0x8eb0f8 <ipi_cpu+28> 0x8eb0f8 <ipi_cpu+28>: mflr r30 0x8eb0fc <ipi_cpu+32>: lwz r0,-32(r30) 0x8eb100 <ipi_cpu+36>: add r30,r0,r30 0x8eb104 <ipi_cpu+40>: rlwinm r3,r3,2,0,29 0x8eb108 <ipi_cpu+44>: lwz r9,-32756(r30) 0x8eb10c <ipi_cpu+48>: lwzx r3,r3,r9 0x8eb110 <ipi_cpu+52>: bl 0x8eb018 <ipi_send> 0x8eb114 <ipi_cpu+56>: lwz r11,0(r1) 0x8eb118 <ipi_cpu+60>: lwz r0,4(r11) 0x8eb11c <ipi_cpu+64>: mtlr r0 0x8eb120 <ipi_cpu+68>: lwz r30,-8(r11) 0x8eb124 <ipi_cpu+72>: lwz r31,-4(r11) 0x8eb128 <ipi_cpu+76>: mr r1,r11 0x8eb12c <ipi_cpu+80>: blr 0x8eb130 <ipi_cpu+84>: .long 0x43ce40 The point of failure (srr0 value) is: 0x8eb10c <ipi_cpu+48>: lwzx r3,r3,r9 (Note: This is a form of r3:=3DMEM(r3+r9,4). Luckily r3 was not updated because of the bad r3+r9 value.) Note that the only prior instruction modifying r3 in ipi_cpu is: 0x8eb104 <ipi_cpu+40>: rlwinm r3,r3,2,0,29 The "show reg r3" result was: 0x51f2880. So prior to the shift by 2 and masking off of the least significant 2 bits its value was: r3=3D0x147CA20 I will note that the 0x51f2880 + the "show reg r9" value 0xf2c9fc (i.e., &cpuid_to_pcpu) produces: 0x611f27c -- and the exception was reported as being for virtual address 0x611f27c . By the source code structure r3 was supposed to be one of the values: 0x0, 0x1, 0x2 or 0x3 instead. (The PowerMac G5 "Quad Core" has 4 cpus.) With the "show reg r4" result being 0x1 it means that the call to ipi_cpu was apparently (by values passed): ipi_cpu(0x147CA20, 0x1) "Show allpcpu" reported that cpuid=3D3 had: curthread =3D 0x147ca20: pid 11 tid 100006 "idle: cpu3" idlethread =3D 0x147ca20: did 100006 "idle: cpu3" where 0x147d000 (for cpuid 2) has 0x147d008 holding a 0x147ca20 value. So somehow a thread address for cpuid 3 was given to ipi_cpu instead of 0x3 . The backtrace shows that the caller of ipi_cpu was sched_affinity: void sched_affinity(struct thread *td) { #ifdef SMP struct td_sched *ts; THREAD_LOCK_ASSERT(td, MA_OWNED); ts =3D td_get_sched(td); if (THREAD_CAN_SCHED(td, ts->ts_cpu)) return; if (TD_ON_RUNQ(td)) { sched_rem(td); sched_add(td, SRQ_BORING); return; } if (!TD_IS_RUNNING(td)) return; /* * Force a switch before returning to userspace. If the * target thread is not running locally send an ipi to force * the issue. */ td->td_flags |=3D TDF_NEEDRESCHED; if (td !=3D curthread) ipi_cpu(ts->ts_cpu, IPI_PREEMPT); #endif } So the implication is that ts->ts_cpu had the value 0x147ca20 instead of the value 3 at the time of the call sequence. sched_affinity was: 0x535ab8 <sched_affinity>: stwu r1,-32(r1) 0x535abc <sched_affinity+4>: mflr r0 0x535ac0 <sched_affinity+8>: stw r29,20(r1) 0x535ac4 <sched_affinity+12>: stw r30,24(r1) 0x535ac8 <sched_affinity+16>: stw r31,28(r1) 0x535acc <sched_affinity+20>: stw r0,36(r1) 0x535ad0 <sched_affinity+24>: mr r31,r1 0x535ad4 <sched_affinity+28>: mr r29,r3 0x535ad8 <sched_affinity+32>: lwz r0,0(r3) 0x535adc <sched_affinity+36>: addi r10,r3,808 0x535ae0 <sched_affinity+40>: lwz r9,8(r10) 0x535ae4 <sched_affinity+44>: lwz r11,48(r3) 0x535ae8 <sched_affinity+48>: rlwinm r0,r9,29,3,29 0x535aec <sched_affinity+52>: clrlwi r9,r9,27 0x535af0 <sched_affinity+56>: lwzx r0,r11,r0 0x535af4 <sched_affinity+60>: sraw r0,r0,r9 0x535af8 <sched_affinity+64>: andi. r9,r0,1 0x535afc <sched_affinity+68>: bne- 0x535b4c <sched_affinity+148> 0x535b00 <sched_affinity+72>: lwz r0,652(r3) 0x535b04 <sched_affinity+76>: cmpwi cr7,r0,3 0x535b08 <sched_affinity+80>: bne+ cr7,0x535b20 = <sched_affinity+104> 0x535b0c <sched_affinity+84>: bl 0x5341c0 <sched_rem> 0x535b10 <sched_affinity+88>: mr r3,r29 0x535b14 <sched_affinity+92>: li r4,0 0x535b18 <sched_affinity+96>: bl 0x53583c <sched_add> 0x535b1c <sched_affinity+100>: b 0x535b4c <sched_affinity+148> 0x535b20 <sched_affinity+104>: cmpwi cr7,r0,4 0x535b24 <sched_affinity+108>: bne- cr7,0x535b4c = <sched_affinity+148> 0x535b28 <sched_affinity+112>: lwz r0,156(r3) 0x535b2c <sched_affinity+116>: oris r0,r0,1 0x535b30 <sched_affinity+120>: stw r0,156(r3) 0x535b34 <sched_affinity+124>: mr r0,r2 0x535b38 <sched_affinity+128>: cmpw cr7,r3,r0 0x535b3c <sched_affinity+132>: beq- cr7,0x535b4c = <sched_affinity+148> 0x535b40 <sched_affinity+136>: lwz r3,8(r10) 0x535b44 <sched_affinity+140>: li r4,1 0x535b48 <sched_affinity+144>: bl 0x8eb0dc <ipi_cpu> 0x535b4c <sched_affinity+148>: lwz r11,0(r1) 0x535b50 <sched_affinity+152>: lwz r0,4(r11) 0x535b54 <sched_affinity+156>: mtlr r0 0x535b58 <sched_affinity+160>: lwz r29,-12(r11) 0x535b5c <sched_affinity+164>: lwz r30,-8(r11) 0x535b60 <sched_affinity+168>: lwz r31,-4(r11) 0x535b64 <sched_affinity+172>: mr r1,r11 0x535b68 <sched_affinity+176>: blr 0x535b6c <sched_affinity+180>: .long 0x7e7f18 The only call executed by sched_affinity was to ipi_cpu given what is inlined and the fact that it made it to do the ipi_cpu bl instruction. Since ipi_cpu leaves various registers alone some register values from sched_affinity were preserved for "show reg" use. r10 is an example in: 0x535ad4 <sched_affinity+28>: mr r29,r3 . . . 0x535adc <sched_affinity+36>: addi r10,r3,808 . . . 0x535b40 <sched_affinity+136>: lwz r3,8(r10) 0x535b44 <sched_affinity+140>: li r4,1 0x535b48 <sched_affinity+144>: bl 0x8eb0dc <ipi_cpu> (r29 is another register value that is preserved in what sched_affinity called. r29 reports what r3 was when sched_affinity was called: 0x147d000, which makes sense. Also 808 is in decimal notation.) That sched_affinity code corresponds to (r10 for ts below): sched_affinity(struct thread *td) . . . struct td_sched *ts; . . . ts =3D td_get_sched(td); . . . ipi_cpu(ts->ts_cpu, IPI_PREEMPT); where "show reg r10" reported: 0x147d000, which matched the "show allpcpu" for cpuid=3D2: curthread =3D 0x147d000: pid11 tid 100005 "idle: cpu2" But it should not match that. . . r10 should have been 0x147d328 (0x147d00+0x328) but is reported as 0x147d000 and the behavior reported for ipi_cpu agrees with 0x147d000: 8(r10) then extracts a 0x0147ca20 value from 0x147d008 instead of a 0x0 from 0x147d328 . So something trashed r10 before: 0x535b40 <sched_affinity+136>: lwz r3,8(r10) and so effectively replaced ts with the td value. Trashing r3 so that r3+808=3D=3D0x147d000 when 0x535adc executes seems unlikely but is a technical possibility for how r10 ends up trashed. Other notes: Some of the details not shown above are: static __inline __pure2 struct td_sched * td_get_sched(struct thread *td) { =20 return ((struct td_sched *)&td[1]); } and: /* * Thread scheduler specific section. All fields are protected * by the thread lock. */ struct td_sched { struct runq *ts_runq; /* Run-queue we're queued on. */ short ts_flags; /* TSF_* flags. */ int ts_cpu; /* CPU that we have affinity = for. */ int ts_rltick; /* Real last tick, for affinity. = */ int ts_slice; /* Ticks of slice remaining. */ u_int ts_slptime; /* Number of ticks we vol. slept = */ u_int ts_runtime; /* Number of ticks we were = running */ int ts_ltick; /* Last tick that we were = running on */ int ts_ftick; /* First tick that we were = running on */ int ts_ticks; /* Tick count */ #ifdef KTR char ts_name[TS_NAME_LEN]; #endif }; where td_sched is a suffix after a "struct thread", such as the space reserved in: struct thread0_storage { struct thread t0st_thread; uint64_t t0st_sched[10]; }; and checked by: _Static_assert(sizeof(struct thread) + sizeof(struct td_sched) <=3D sizeof(struct thread0_storage), "increase struct thread0_storage.t0st_sched size"); =3D=3D=3D Mark Millard markmi at dsl-only.net
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?1738C902-7A9C-4711-A2F7-D302249C9AF8>