Date:      Thu, 07 May 2015 18:10:21 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 198149] [hwpmc] pmcstat -P -t (top mode, process sampling) stops after a while
Message-ID:  <bug-198149-8-9DSbrQme3L@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-198149-8@https.bugs.freebsd.org/bugzilla/>
References:  <bug-198149-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=198149

--- Comment #10 from John Baldwin <jhb@FreeBSD.org> ---
I think I partly understand what is going wrong, though I'm not yet sure how to
fix it.  On a context switch out, the read_pmc operation returns a very large
value.  As a result, the PMC gets set to a value large enough that it won't
expire during the next slice.  The error compounds on every switch out/in, and
the PMC stops firing.  Some snippets of KTR traces show the error in action:

238280   1  268934513552654 MDP:REA:1: iaf-read cpu=1 ri=2 msr=0x40000002 -> v=5dd31
238271   1  268934513429846 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=4fffffff855b6 iafctrl=0 pmc=fffffff855b6
238262   1  268934513342102 MDP:REA:1: iaf-read cpu=1 ri=2 msr=0x40000002 -> v=2f2c
238247   1  268934510388202 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=2fffffffc9345 iafctrl=0 pmc=fffffffc9345
238238   1  268934510294742 MDP:REA:1: iaf-read cpu=1 ri=2 msr=0x40000002 -> v=4125
238229   1  268934510220562 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=1fffffffea910 iafctrl=0 pmc=fffffffea910
238220   1  268934510132922 MDP:REA:1: iaf-read cpu=1 ri=2 msr=0x40000002 -> v=fffffffffffe
238211   1  268934510048862 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=ffffffffd489 iafctrl=0 pmc=ffffffffd489
238202   1  268934509967030 MDP:REA:1: iaf-read cpu=1 ri=2 msr=0x40000002 -> v=b34d
238193   1  268934509880238 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=ffffffff109e iafctrl=0 pmc=ffffffff109e
238184   1  268934509789534 MDP:REA:1: iaf-read cpu=1 ri=2 msr=0x40000002 -> v=e848
238175   1  268934509749902 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=ffffffff942b iafctrl=0 pmc=ffffffff942b
238166   1  268934509673986 MDP:REA:1: iaf-read cpu=1 ri=2 msr=0x40000002 -> v=62dd
238157   1  268934508267090 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=ffffffff18a7 iafctrl=0 pmc=ffffffff18a7
238148   1  268934508103386 MDP:REA:1: iaf-read cpu=1 ri=2 msr=0x40000002 -> v=6d25

The error occurs at event 238220 (the trace lists the newest event first),
where "-2" is converted to the large unsigned value fffffffffffe.  After this
point, the PMC is programmed with a progressively larger value on each switch
in and never fires again.

By the end of the trace, when I killed my test program, it was quite far off:

116541   1  268945752955406 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=ffed2e3cfd33 iafctrl=0 pmc=ffed2e3cfd33
116448   1  268945715324794 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=ffef3dd2e030 iafctrl=0 pmc=ffef3dd2e030
116337   1  268945421271906 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=fff2cff21be4 iafctrl=0 pmc=fff2cff21be4
116321   1  268945168850926 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=fff2defc2fec iafctrl=0 pmc=fff2defc2fec
116276   1  268944964260070 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=fff3210cd42b iafctrl=0 pmc=fff3210cd42b
116241   1  268944442945530 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=fff353fadde0 iafctrl=0 pmc=fff353fadde0
116207   1  268944442823210 MDP:WRI:1: iaf-write cpu=1 ri=2 msr=0x40000002 v=fff3fa4fc2e3 iafctrl=0 pmc=fff3fa4fc2e3
...

I'm not really sure where the error is.  I think it might be that
iap_perfctr_value_to_reload_count needs to sign extend its return value so that
it can return -2 as the value of the PMC in this case instead of the large
unsigned value it returned.  Note that this seems specific to hwpmc_core.c.
hwpmc_amd.c uses a different approach: it sign extends the value it reads from
the PMC first and then negates it (which would have returned -2 in this case).
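
To illustrate the difference between the two approaches, here is a minimal
standalone sketch.  It is not the hwpmc code: the function names are
hypothetical, and the 48-bit counter width is an assumption based on the
values in the trace.  The first helper masks the raw counter and subtracts it
from 2^width, so it can never yield a negative count; the second sign extends
the raw counter and then negates it.

```c
#include <stdint.h>

/* Hypothetical helper: mask with the low `width` bits set (0 < width < 64). */
static uint64_t
counter_mask(unsigned width)
{
	return ((UINT64_C(1) << width) - 1);
}

/*
 * Mask-and-subtract conversion (illustrative only).  The result is always
 * in the range [1, 2^width], so a counter that has already wrapped past
 * zero shows up as a huge positive count rather than a small negative one.
 */
static uint64_t
reload_count_masked(uint64_t raw, unsigned width)
{
	raw &= counter_mask(width);
	return ((UINT64_C(1) << width) - raw);
}

/*
 * Sign-extend-then-negate conversion (illustrative only).  The raw value
 * is treated as a `width`-bit two's complement number, widened to 64 bits,
 * and negated, so a wrapped counter yields a small negative count.
 */
static int64_t
reload_count_signed(uint64_t raw, unsigned width)
{
	uint64_t sign = UINT64_C(1) << (width - 1);
	uint64_t ext;

	raw &= counter_mask(width);
	/* (raw ^ sign) - sign sign extends using only unsigned arithmetic. */
	ext = (raw ^ sign) - sign;
	/* Assumes the usual two's complement uint64_t -> int64_t conversion. */
	return (-(int64_t)ext);
}
```

With a raw counter of 2 (a counter that wrapped and kept counting) and a width
of 48, reload_count_masked() yields 0xfffffffffffe, the value seen at event
238220, while reload_count_signed() yields -2.  For a counter that has not yet
wrapped, such as 0xffffffffd489, both return 0x2b77.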



