Date:      Fri, 16 Dec 2016 20:45:19 +0100
From:      Luigi Rizzo <rizzo@iet.unipi.it>
To:        David Chisnall <David.Chisnall@cl.cam.ac.uk>
Cc:        Alan Somers <asomers@FreeBSD.org>, "current@freebsd.org" <current@freebsd.org>
Subject:   Re: best approximation of getcpu() ?
Message-ID:  <20161216194519.GA71398@onelab2.iet.unipi.it>
In-Reply-To: <D9F98972-ED18-4B59-AB3A-73B89F3C220D@cl.cam.ac.uk>
References:  <20161216021719.GA63374@onelab2.iet.unipi.it> <CAOtMX2hdkCk3ho%2Byedpv7iPPi97be4eFViYm4%2Bmi8EC-iR2Uvg@mail.gmail.com> <D9F98972-ED18-4B59-AB3A-73B89F3C220D@cl.cam.ac.uk>

On Fri, Dec 16, 2016 at 09:29:15AM +0000, David Chisnall wrote:
> On 16 Dec 2016, at 03:10, Alan Somers <asomers@FreeBSD.org> wrote:
> > 
> > What about pthread_setaffinity(3) and friends?  You can use it to pin
> > a thread to a single CPU, and know that it will never migrate.
> 
> This is not a useable solution for anything that needs to live in a library and also doesn't solve the problem.
> 
> The Linux get_cpu() call is used for caches that are somewhere between global and thread-local.  Accessing them still requires a lock, but it's very likely to be uncontended (contention only happens when you're context switched at exactly the wrong time, or if a thread is migrated between cores in between the get_cpu() call and usage) and so you can use the userspace fast path for the lock and not suffer from cache contention effects.
> 
> On x86, you can use cpuid from userspace and get the current core ID.  I have some code that does this and re-checks every few hundred accesses, storing the current CPU ID in a thread-local variable.  Using the per-CPU caches is a lot faster than using the global cache (and reduces contention on the global cache).  It would be great if we could have a syscall to do this on FreeBSD (it would be even better if we could specify a TLS variable that the kernel automatically updates for the userspace thread when the scheduler migrates the thread between cores).

indeed the following line seems to do the job for x86
	asm volatile("cpuid" : "=d"(curcpu), "=a"(tmp), "=b"(tmp), "=c"(tmp) : "a"(0xb) );
(there must be a better way to tell the compiler that eax, ebx, ecx, edx are
all clobbered).
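
One possible way to avoid reusing tmp is to give each register its own
output operand, which tells the compiler that all four are written. A
minimal sketch, assuming GCC or Clang; the names cpuid_count() and
current_apic_id() are made up for illustration, and unlike the one-liner
it also sets ecx = 0 and checks that leaf 0xb is actually implemented:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: run CPUID for leaf/subleaf and store
 * eax, ebx, ecx, edx in regs[0..3].
 * Returns 0 on success, -1 when not compiled for x86.
 */
static int
cpuid_count(uint32_t leaf, uint32_t subleaf, uint32_t regs[4])
{
	assert(regs != NULL);
#if defined(__x86_64__) || defined(__i386__)
	/* Four distinct outputs: the compiler knows every register changes. */
	__asm__ volatile("cpuid"
	    : "=a"(regs[0]), "=b"(regs[1]), "=c"(regs[2]), "=d"(regs[3])
	    : "a"(leaf), "c"(subleaf));
	return (0);
#else
	(void)leaf;
	(void)subleaf;
	return (-1);
#endif
}

/*
 * Hypothetical helper: x2APIC id of the logical CPU executing the call,
 * or -1 if CPUID leaf 0xb is not available.
 */
static int64_t
current_apic_id(void)
{
	uint32_t r[4];

	/* Leaf 0 reports the highest supported basic leaf in eax. */
	if (cpuid_count(0, 0, r) != 0 || r[0] < 0xb)
		return (-1);
	/* Leaf 0xb is implemented only if ebx[15:0] is non-zero at subleaf 0. */
	if (cpuid_count(0xb, 0, r) != 0 || (r[1] & 0xffff) == 0)
		return (-1);
	return ((int64_t)r[3]);		/* edx holds the x2APIC id */
}
```

Callers would need to treat -1 as "unknown" and fall back to something
else, for example a single global cache.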

0xb is the CPUID leaf that returns, in edx, the x2APIC id of the
logical CPU executing the instruction (not necessarily matching the
OS core-id)

The only problem is that this instruction is serialising and slow:
it seems to take some 70-100ns on several of my machines, so you
cannot afford to call it every time and need the value cached
somewhere. Exposing it as thread-local storage, or a VDSO syscall,
would be nicer because the kernel knows when the value actually
changes.
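
Until something like that exists, the userspace caching David describes
could look roughly like the sketch below. It is only an illustration
under stated assumptions: the names (cached_cpu, cpu_query_fn,
CPU_RECHECK_INTERVAL) and the interval of 256 are invented here, and
counting_query() is a stand-in for the expensive CPUID call so the
behaviour can be seen on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: "every few hundred accesses" taken as 256 lookups. */
#define CPU_RECHECK_INTERVAL 256

/* Hypothetical type for the expensive query, e.g. a CPUID wrapper. */
typedef int64_t (*cpu_query_fn)(void);

struct cpu_cache {
	int64_t  cpu;		/* last value returned by the query */
	unsigned remaining;	/* lookups left before the next refresh */
};

/* Zero-initialised per thread, so the first lookup always refreshes. */
static _Thread_local struct cpu_cache tls_cpu_cache;

/* Return the cached CPU id, re-running query() once per interval. */
static int64_t
cached_cpu(cpu_query_fn query)
{
	struct cpu_cache *c = &tls_cpu_cache;

	assert(query != NULL);
	if (c->remaining == 0) {
		c->cpu = query();
		c->remaining = CPU_RECHECK_INTERVAL;
	}
	c->remaining--;
	return (c->cpu);
}

/* Stand-in for the slow query: counts calls and pretends we are on CPU 3. */
static unsigned query_calls;

static int64_t
counting_query(void)
{
	query_calls++;
	return (3);
}
```

Calling cached_cpu(counting_query) 256 times in a row from one thread
runs the query once; the next call triggers the refresh. The cached
value can be stale after a migration, so it is only a hint for picking
a per-CPU cache, and the lock on that cache is still needed.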

cheers
luigi


