Date:      Wed, 20 Apr 2005 03:37:58 +0000 (GMT)
From:      jkoshy@FreeBSD.ORG (Joseph Koshy)
To:        obrien@FreeBSD.org
Cc:        Joseph Koshy <jkoshy@FreeBSD.org>
Subject:   Re: cvs commit: src/gnu/usr.bin/groff/tmac mdoc.local src/lib  Makefile src/share/doc/papers/hwpmc Makefile hwpmc.ms  src/share/examples/hwpmc README src/share/man/man4 Makefile ...
Message-ID:  <20050420033758.8711B16A4CF@hub.freebsd.org>
In-Reply-To: Message from "David O'Brien" <obrien@FreeBSD.org>  <20050419181128.GA27443@dragon.NUXI.org>



al> I assume this is like a portable version of the measurement backend in
al> Intels VTune... at least I assume VTune does something like this
al> itself.

I have not actually used Intel's VTune or AMD's CodeAnalyst so
please take my words with a pinch of salt.

From reading the publicly available documentation, VTune's backend
appears to do 'system-wide sampling'.

Our backend can do system-wide measurements as well as per-process
measurements (i.e., the counter hardware can be 'virtualized').
Another difference is that we support 'counting' as well as 'sampling'.
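
To make the counting/sampling distinction concrete, here is a toy
model in Python.  It does not touch hwpmc(4) or libpmc at all: the
event stream, the function names and the sample_every parameter are
all made up for illustration.  Counting mode only accumulates a
total; sampling mode records where execution was each time the
counter reaches a threshold.

```python
from collections import Counter


def run_counting(events):
    """Counting mode: accumulate a total of the events seen."""
    total = 0
    for _location in events:
        total += 1
    return total


def run_sampling(events, sample_every):
    """Sampling mode: every `sample_every` events, record the location."""
    histogram = Counter()
    pending = 0
    for location in events:
        pending += 1
        if pending == sample_every:  # the counter "overflows": take a sample
            histogram[location] += 1
            pending = 0
    return histogram


# Hypothetical event stream: each entry names the function that was
# executing when one hardware event (say, a cache miss) occurred.
events = ["parse"] * 600 + ["hash"] * 300 + ["main"] * 100

total = run_counting(events)                      # 1000 events in all
profile = run_sampling(events, sample_every=100)  # 6 parse, 3 hash, 1 main
```

The count tells you how many events happened; the sample histogram
tells you (approximately) where they happened.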

So our infrastructure currently supports four PMC usage styles:

  - process-private, counting

    o We could have a profiling runtime library that augments its
      data collection with data from the PMCs at function entry/exit.

    o Scientific applications could use this mode to measure hardware
      counts between two points of code.  I believe the scientific
      community uses an API named "PAPI" for performance measurements.
      We should be able to support PAPI in -current now.
    
  - system-wide, counting
    
    o You could allocate system-wide, counting PMCs and read these
      once a minute.  This operation would have near-zero overhead
      and could be used for collecting long-term data, say for making
      machine sizing decisions.

  - process-private, sampling

    o The standard 'profiling' function, with a couple of twists:
      you would not need to specially compile executables for
      profiling, and you could profile any process you could
      PMC_ATTACH a PMC to.

  - system-wide, sampling

    o This 'profiles' the whole system: applications, kernel and
      interrupt handlers.
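
As a rough sketch of the "read once a minute" idea in the
system-wide, counting case above: if you log the cumulative counter
value at each read, the per-interval event rate is just the
difference between successive readings divided by the interval.  The
function name, the 60-second interval and the readings below are
made up for illustration; how the readings are obtained (pmc(3), the
Python wrapper, etc.) is left out.

```python
def per_interval_rates(readings, interval_seconds=60.0):
    """Convert cumulative counter readings into events-per-second rates.

    `readings` holds the counter value observed at each periodic read;
    the counter is assumed to be monotonically increasing.
    """
    deltas = [cur - prev for prev, cur in zip(readings, readings[1:])]
    return [delta / interval_seconds for delta in deltas]


# Hypothetical cumulative readings taken one minute apart.
readings = [0, 1_200_000, 3_000_000, 3_600_000]
rates = per_interval_rates(readings)  # [20000.0, 30000.0, 10000.0]
```

Logged over days or weeks, rates like these are the sort of
long-term data one could feed into a machine sizing decision.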

The current snapshot in -current has sampling modes turned off as
they haven't been fully implemented.

obrien> Every modern CPU has event counters.  Some CPU's have as little as 2
obrien> (Pentium Pro), others have 4 (Athlon64 and Opteron), I think IA-64 has

The P4 has been the most complex so far: 18 counters, 45 event-select
registers, and many restrictions on what works with what.
Further, logical (HTT) CPUs share PMC resources, and some events
change semantics if HTT is enabled (TS/TI events) :(.

The userland library pmc(3) and the driver hwpmc(4) handle these
issues for you.

obrien> This PMC facility is much more similar to Linux's Oprofile than VTune or
obrien> AMD's CodeAnalyst.  It allows one to set and access the event counters.

Linux has Oprofile (for system-wide sampling) and many separate
'counting' mode implementations (Perfctr, Rabbit, Lperfex, etc.).

obrien> You will need to find the applicable CPU docs so you know what [public]
obrien> events exist, and any "options" those events have.

The PMC-specific sections of pmc(3) list the events and allowed
modifiers that our library understands.

You would still need to read the CPU docs: some of the events
measured by hardware only make sense in the context of a given CPU
architecture.

For folks who like Python, there is a Python wrapper around libpmc
that makes it easy to play around with this functionality.  You can
pick it up at:

  http://people.freebsd.org/~jkoshy/projects/perf-measurement/pypmc.html

Regards,
Koshy
<jkoshy@freebsd.org>


