Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 22 Jun 2013 01:48:34 GMT
From:      adrian chadd <adrian@FreeBSD.org>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   kern/179827: [hwpmc] process-mode counters aren't correctly read on multi-core machines
Message-ID:  <201306220148.r5M1mYaJ076069@oldred.freebsd.org>
Resent-Message-ID: <201306220150.r5M1o0xW076243@freefall.freebsd.org>

next in thread | raw e-mail | index | archive | help

>Number:         179827
>Category:       kern
>Synopsis:       [hwpmc] process-mode counters aren't correctly read on multi-core machines
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Jun 22 01:50:00 UTC 2013
>Closed-Date:
>Last-Modified:
>Originator:     adrian chadd
>Release:        9-STABLE, r252049, amd64 (and i386)
>Organization:
Netflix, Inc
>Environment:
>Description:
When using a process mode counter PMC on a multi-core machine, the pmc_read() -> PMC_OP_PMCRW event is not correctly reading the PMC.

What's going on in hwpmc_mod.c is thus:

* It grabs the current thread CPU;
* It then checks if the PMC is active;
* If it's currently active, it reads the PMC from the current CPU rather than the target CPU;
* .. otherwise it returns the old counter value.

When I enabled pmc debugging: DEBUG in the kernel, then:

# sysctl kern.hwpmc.debugflags="csw=swo,swi pmc=ops"

I'd see this:

* lots of swi, swo (swapin, swapout) of the process, which updates the cached PMC value
* a PMC,OPS event every second whilst it was reading the process mode counter
* .. but the PMC,OPS value would always be the "old" one, rather than the updated one.

The only time the counter value was updated was if the PMC,OPS (ie, the PMC_OP_PMCRW event) occured _during_ the running of the process - ie, the scheduler put the process being counted _and_ the pmcstat process on the same CPU, switched out the process under profiling  (which would switch it out and update the value) and switched in pmcstat to do the read.

This reads like it was really designed for reading counters from an active process - ie, where you'd put pmc_read() at specific points in a programs execution in order to count how long certain events were taking.

It's totally bogus for the SMP case, where your current process being run is on another CPU and doesn't get switched out for long stretches of time, whilst pmcstat wakes up once a second on another CPU to do a counter read.

>How-To-Repeat:
* grab a multi-core device;
* Grab some CPU chewing process that doesn't do any kind of blocking - ie, some minute long math problem;
* run pmcstat in counting mode - pmcstat -p instructions -w 1 ./process
* watch the counters look totally bogus.
* See above for the sysctl line to enable pmc debugging and you'll see how bogus the sampling is.

>Fix:
I'm not sure. There may be multiple problems here:

* One is ensuring the PMC stats are read from the CPU the pmc counter is actively active on, as it may be a different CPU to the thread calling pmc_read() -> PMC_OP_PMCRW;
* Another is checking to see if the PMC is actually active at the present moment on _any_ CPU and if it is, fetch the statistics from said correct CPU.
* I'm not sure what the deal is with running counting mode samples on a multi-core CPU with multiple threads running in the process under count. Ie, if you have 4 threads on 4 separate CPUs running and you're trying to count '-p instructions', then I think the right behaviour is to enable the instructions counter on -all- the CPUs with those threads active. So what should pmc_read() do? Does the kernel-side correctly read all the CPUs with threads active for that particular registered PMC? Or what is actually going on?


>Release-Note:
>Audit-Trail:
>Unformatted:



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?201306220148.r5M1mYaJ076069>