Date:      Mon, 05 Aug 2002 15:14:15 -0700
From:      Peter Wemm <peter@wemm.org>
To:        Luigi Rizzo <rizzo@icir.org>
Cc:        Terry Lambert <tlambert2@mindspring.com>, smp@freebsd.org
Subject:   Re: how to create per-cpu variables in SMP kernels ? 
Message-ID:  <20020805221415.B0C732A7D6@canning.wemm.org>
In-Reply-To: <20020805015340.A17716@iguana.icir.org> 

Luigi Rizzo wrote:
> On Sun, Aug 04, 2002 at 11:44:27PM -0700, Terry Lambert wrote:
> > > I would like to know how does the FreeBSD kernel (both in -current
> > > and -stable) handle per-cpu variables such as curproc/curthread, cpuid,
> ...
> > > How expensive is to access them compared to regular variables ?
> > 
> > Depends on the specific variable's implementation.  If you are asking
> > because you want to add one, then don't.  8-).  They damage symmetry
> 
> i am asking because in the code I see several instance of things like
> 
> 	p = curproc;
> 	<code using p instead of curproc>
> 
> in a context where curproc is not supposed to change. Is there a
> performance bonus in doing this, or not ?

Sort of.  There is both a compile time issue and a runtime issue.

The %fs:variable segment override itself doesn't make a lot of difference,
but the accessors are written as "__asm volatile", so the compiler is
effectively wired to treat every use as a separate volatile read.

i.e.:
	p = curproc;
	foo(curproc);
	bar(curproc);
	return curproc;
.. will cause *4* memory references with segment overrides.  However:
	p = curproc;
	foo(p);
	bar(p);
	return p;
.. will use *1*.  Actually, this isn't quite correct on -current, since
there is no curproc per-cpu variable there.  It is really:
	#define curproc (curthread->td_proc)
so the first example actually makes 8 memory references vs 2 for the second.
Sure, you will probably hit the L1 cache, but there is no guarantee of that.
In the 'p' cases, it will probably end up in a register, but it is up
to the compiler to figure out the best use of resources.

Secondly, there is a compile time issue.   "curproc" and "curthread" expand
to monster macros that the compiler has to untangle and optimize.  Every
use costs compile time and memory to represent in the rtl tree.
Avoiding unnecessary uses adds up over time.

An example from -current..  This:

static __inline int
sigonstack(size_t sp)
{
        register struct thread *td = curthread;
        struct proc *p = td->td_proc;
 
        return ((p->p_flag & P_ALTSTACK) ?
            ((sp - (size_t)p->p_sigstk.ss_sp) < p->p_sigstk.ss_size)
            : 0);
}

Becomes:
static __inline int
sigonstack(size_t sp)
{
        register struct thread *td = ({ __typeof(((struct pcpu *)0)->pc_curthrea
d) __result; if (sizeof(__result) == 1) { u_char __b; __asm volatile("movb %%fs:
%1,%0" : "=r" (__b) : "m" (*(u_char *)(((size_t)(&((struct pcpu *)0)->pc_curthre
ad))))); __result = *(__typeof(((struct pcpu *)0)->pc_curthread) *)&__b; } else
if (sizeof(__result) == 2) { u_short __w; __asm volatile("movw %%fs:%1,%0" : "=r
" (__w) : "m" (*(u_short *)(((size_t)(&((struct pcpu *)0)->pc_curthread))))); __
result = *(__typeof(((struct pcpu *)0)->pc_curthread) *)&__w; } else if (sizeof(
__result) == 4) { u_int __i; __asm volatile("movl %%fs:%1,%0" : "=r" (__i) : "m"
 (*(u_int *)(((size_t)(&((struct pcpu *)0)->pc_curthread))))); __result = *(__ty
peof(((struct pcpu *)0)->pc_curthread) *)&__i; } else { __result = *({ __typeof(
((struct pcpu *)0)->pc_curthread) *__p; __asm volatile("movl %%fs:%1,%0; addl %2
,%0" : "=r" (__p) : "m" (*(struct pcpu *)(((size_t)(&((struct pcpu *)0)->pc_prvs
pace)))), "i" (((size_t)(&((struct pcpu *)0)->pc_curthread)))); __p; }); } __res
ult; });
        struct proc *p = td->td_proc;

        return ((p->p_flag & 0x4000000) ?
            ((sp - (size_t)p->p_sigstk.ss_sp) < p->p_sigstk.ss_size)
            : 0);
}



However, if I change it like this:
static __inline int
sigonstack(size_t sp)
{
        return ((curproc->p_flag & P_ALTSTACK) ?
            ((sp - (size_t)curproc->p_sigstk.ss_sp) < curproc->p_sigstk.ss_size)
            : 0);
}
it becomes:
static __inline int
sigonstack(size_t sp)
{
        return (((({ __typeof(((struct pcpu *)0)->pc_curthread) __result; if (si
zeof(__result) == 1) { u_char __b; __asm volatile("movb %%fs:%1,%0" : "=r" (__b)
 : "m" (*(u_char *)(((size_t)(&((struct pcpu *)0)->pc_curthread))))); __result =
 *(__typeof(((struct pcpu *)0)->pc_curthread) *)&__b; } else if (sizeof(__result
) == 2) { u_short __w; __asm volatile("movw %%fs:%1,%0" : "=r" (__w) : "m" (*(u_
short *)(((size_t)(&((struct pcpu *)0)->pc_curthread))))); __result = *(__typeof
(((struct pcpu *)0)->pc_curthread) *)&__w; } else if (sizeof(__result) == 4) { u
_int __i; __asm volatile("movl %%fs:%1,%0" : "=r" (__i) : "m" (*(u_int *)(((size
_t)(&((struct pcpu *)0)->pc_curthread))))); __result = *(__typeof(((struct pcpu 
*)0)->pc_curthread) *)&__i; } else { __result = *({ __typeof(((struct pcpu *)0)-
>pc_curthread) *__p; __asm volatile("movl %%fs:%1,%0; addl %2,%0" : "=r" (__p) :
 "m" (*(struct pcpu *)(((size_t)(&((struct pcpu *)0)->pc_prvspace)))), "i" (((si
ze_t)(&((struct pcpu *)0)->pc_curthread)))); __p; }); } __result; })->td_proc)->
p_flag & 0x4000000) ?
            ((sp - (size_t)(({ __typeof(((struct pcpu *)0)->pc_curthread) __resu
lt; if (sizeof(__result) == 1) { u_char __b; __asm volatile("movb %%fs:%1,%0" : 
"=r" (__b) : "m" (*(u_char *)(((size_t)(&((struct pcpu *)0)->pc_curthread))))); 
__result = *(__typeof(((struct pcpu *)0)->pc_curthread) *)&__b; } else if (sizeo
f(__result) == 2) { u_short __w; __asm volatile("movw %%fs:%1,%0" : "=r" (__w) :
 "m" (*(u_short *)(((size_t)(&((struct pcpu *)0)->pc_curthread))))); __result = 
*(__typeof(((struct pcpu *)0)->pc_curthread) *)&__w; } else if (sizeof(__result)
 == 4) { u_int __i; __asm volatile("movl %%fs:%1,%0" : "=r" (__i) : "m" (*(u_int
 *)(((size_t)(&((struct pcpu *)0)->pc_curthread))))); __result = *(__typeof(((st
ruct pcpu *)0)->pc_curthread) *)&__i; } else { __result = *({ __typeof(((struct 
pcpu *)0)->pc_curthread) *__p; __asm volatile("movl %%fs:%1,%0; addl %2,%0" : "=
r" (__p) : "m" (*(struct pcpu *)(((size_t)(&((struct pcpu *)0)->pc_prvspace)))),
 "i" (((size_t)(&((struct pcpu *)0)->pc_curthread)))); __p; }); } __result; })->
td_proc)->p_sigstk.ss_sp) < (({ __typeof(((struct pcpu *)0)->pc_curthread) __res
ult; if (sizeof(__result) == 1) { u_char __b; __asm volatile("movb %%fs:%1,%0" :
 "=r" (__b) : "m" (*(u_char *)(((size_t)(&((struct pcpu *)0)->pc_curthread)))));
 __result = *(__typeof(((struct pcpu *)0)->pc_curthread) *)&__b; } else if (size
of(__result) == 2) { u_short __w; __asm volatile("movw %%fs:%1,%0" : "=r" (__w) 
: "m" (*(u_short *)(((size_t)(&((struct pcpu *)0)->pc_curthread))))); __result =
 *(__typeof(((struct pcpu *)0)->pc_curthread) *)&__w; } else if (sizeof(__result
) == 4) { u_int __i; __asm volatile("movl %%fs:%1,%0" : "=r" (__i) : "m" (*(u_in
t *)(((size_t)(&((struct pcpu *)0)->pc_curthread))))); __result = *(__typeof(((s
truct pcpu *)0)->pc_curthread) *)&__i; } else { __result = *({ __typeof(((struct
 pcpu *)0)->pc_curthread) *__p; __asm volatile("movl %%fs:%1,%0; addl %2,%0" : "
=r" (__p) : "m" (*(struct pcpu *)(((size_t)(&((struct pcpu *)0)->pc_prvspace))))
, "i" (((size_t)(&((struct pcpu *)0)->pc_curthread)))); __p; }); } __result; })-
>td_proc)->p_sigstk.ss_size)
            : 0);
}

Also, when you get a syntax error due to a #define collision in the middle
of that mess, which version's preprocessor output would you rather be
trying to debug?

Cheers,
-Peter
--
Peter Wemm - peter@wemm.org; peter@FreeBSD.org; peter@yahoo-inc.com
"All of this is for nothing if we don't go to the stars" - JMS/B5

