Date:      Mon, 20 Apr 2009 00:50:22 +0100 (BST)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Ivan Voras <ivoras@freebsd.org>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject:   Re: svn commit: r191291 - in head: lib/libthr/thread libexec/rtld-elf/amd64 libexec/rtld-elf/arm libexec/rtld-elf/i386 	libexec/rtld-elf/ia64 libexec/rtld-elf/mips libexec/rtld-elf/powerpc 	libexec/rtld-...
Message-ID:  <alpine.BSF.2.00.0904200039250.71062@fledge.watson.org>
In-Reply-To: <9bbcef730904191630x4e4f2aeci2d6ac769fc1f73f8@mail.gmail.com>
References:  <200904192302.n3JN2o6Z023217@svn.freebsd.org> <alpine.BSF.2.00.0904200006290.71062@fledge.watson.org> <9bbcef730904191630x4e4f2aeci2d6ac769fc1f73f8@mail.gmail.com>

On Mon, 20 Apr 2009, Ivan Voras wrote:

> 2009/4/20 Robert Watson <rwatson@freebsd.org>:
>> On Sun, 19 Apr 2009, Robert Watson wrote:
>>
>>>  Now that the kernel defines CACHE_LINE_SIZE in machine/param.h, use  that 
>>> definition in the custom locking code for the run-time linker  rather than 
>>> local definitions.
>>
>> This actually changes the line size used by the rtld code for pre-pthreads 
>> locking for several architectures.  I think this is an improvement, but if 
>> architecture maintainers could comment on that, that would be helpful.
>
> Will there be infrastructure for creating per-CPU structures or is using 
> something like:
>
> int mycounter[MAXCPU] __attribute__ ((aligned(CACHE_LINE_SIZE)));

For now, yes, something along these lines.  I have a local prototype I'm using 
that has an API something like this:

    // Definitions
    struct foostat  *foostatp;
    void            *foostat_psp;

    // Module load
    if (pcpustat_alloc(&foostat_psp, "foostat", sizeof(struct foostat),
        sizeof(u_long)) != 0)
            panic("foostat_init: pcpustat_alloc failed");
    foostatp = pcpustat_getptr(foostat_psp);

    // Use the pointer for a statistic
    foostatp[curcpu].fs_counter1++;

    // Retrieve summary statistics and store in a single instance
    struct foostat fs;
    pcpustat_fetch(foostat_psp, &fs);

    // Reset summary statistics.
    pcpustat_reset(foostat_psp);

    // Module unload
    pcpustat_free(foostat_psp);
    foostatp = foostat_psp = NULL;

The problem with the [curcpu] model is that it embeds the assumption that it's 
a good idea to have per-CPU fields in adjacent cache lines within a page.  As 
the world rocks gently in the direction of NUMA, there's a legitimate question 
as to whether that's a good assumption to build in.  It's a better assumption 
than the assumption that it's a good idea to use a single stat across all CPUs 
in a single cache line, of course.  Depending on how we feel about the 
overhead of accessor interfaces, all this can be hidden easily enough.

A facility I'd like to have would be an API to allocate memory on all CPUs at 
once, with the memory on each CPU at a constant offset from a per-CPU base 
address.  That way you could calculate the location of the per-CPU structure 
using PCPU_GET(dynbase) + foostat_offset without adding an additional 
indirection.

Robert N M Watson
Computer Laboratory
University of Cambridge


