Date:      Tue, 13 Jul 1999 13:37:15 +1000
From:      Peter Jeremy <jeremyp@gsmx07.alcatel.com.au>
To:        mike@smith.net.au
Cc:        freebsd-current@FreeBSD.ORG
Subject:   Re: "objtrm" problem probably found (was Re: Stuck in "objtrm")
Message-ID:  <99Jul13.134051est.40360@border.alcanet.com.au>
In-Reply-To: <199907130209.TAA03301@dingo.cdrom.com>

Mike Smith <mike@smith.net.au> wrote:
>> Although function calls are more expensive than inline code,
>> they aren't necessarily a lot more so, and function calls to
>> non-locked RMW operations are certainly much cheaper than
>> inline locked RMW operations.
>
>This is a fairly key statement in context, and an opinion here would 
>count for a lot; are function calls likely to become more or less 
>expensive in time?

Based on general computer architecture principles, I'd say that a lock
prefix is likely to become more expensive[1], whilst a function call
will become cheaper[2] over time.

I'm not sure that this is an important issue here.  The sole advantage
of moving to indirect function calls would be that the same object
code could be used on both UP and SMP configurations, without
incurring the overhead of the lock prefix in the UP configuration.
(This comes at the expense of an additional function call in all
configurations.)  We can't avoid the lock prefix overhead in the SMP case.

Based on the timings I did this morning, function calls are
(unacceptably, IMHO) expensive on all the CPUs I have to hand (i386,
Pentium and P-II) - the latter two presumably comprising the bulk of
current FreeBSD use.
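
A rough sketch of how such a comparison might be timed in userland.
This is not the code behind the timings above: the function names,
the iteration count and the use of clock() and the GCC/Clang
__atomic builtins are all assumptions for illustration, and no
particular result is implied.

```c
#include <time.h>

#define ITERS 1000000UL		/* arbitrary iteration count (assumption) */

/* noinline forces a real call/return per iteration (GCC/Clang attribute). */
__attribute__((noinline)) static void
add_call(volatile unsigned int *p, unsigned int v)
{
	*p += v;		/* non-locked RMW */
}

/* Average cost in ns of one inline locked RMW. */
static double
time_locked_inline(volatile unsigned int *p)
{
	clock_t c0, c1;
	unsigned long i;

	c0 = clock();
	for (i = 0; i < ITERS; i++)
		__atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST);
	c1 = clock();
	return ((double)(c1 - c0) / CLOCKS_PER_SEC * 1e9 / ITERS);
}

/* Average cost in ns of one function call to a non-locked RMW. */
static double
time_unlocked_call(volatile unsigned int *p)
{
	clock_t c0, c1;
	unsigned long i;

	c0 = clock();
	for (i = 0; i < ITERS; i++)
		add_call(p, 1);
	c1 = clock();
	return ((double)(c1 - c0) / CLOCKS_PER_SEC * 1e9 / ITERS);
}
```

Both figures include loop overhead, so only the difference between
them is meaningful, and it will vary with CPU and compiler flags.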

Currently the UP/SMP decision is made at compile time (and has
significant and widespread impact) - therefore there seems little (if
any) benefit in using function calls within the main kernel.

I believe that Matt's patched i386/include/atomic.h, with the addition
of code to only include the lock prefix when SMP is defined, is
currently the optimal approach for the kernel - and I can't see any
way a future IA-32 implementation could change that.
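
A minimal sketch of that approach, assuming GCC-style inline
assembly.  It is not taken from Matt's patch: the MPLOCKED macro
name, the single atomic_add_int() operation shown and the non-x86
fallback are assumptions for illustration only.

```c
/*
 * Sketch only: MPLOCKED is a hypothetical helper macro.  It expands
 * to the lock prefix when the build defines SMP, and to nothing for
 * a UP build.
 */
#ifdef SMP
#define MPLOCKED "lock; "
#else
#define MPLOCKED ""
#endif

static inline void
atomic_add_int(volatile unsigned int *p, unsigned int v)
{
#if defined(__i386__) || defined(__x86_64__)
	/* Single RMW instruction; locked only when SMP is defined. */
	__asm__ __volatile__(MPLOCKED "addl %1,%0"
	    : "+m" (*p)
	    : "ir" (v)
	    : "cc");
#else
	/* Assumption: compiler builtin so the sketch builds off x86. */
	__atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
#endif
}
```

The UP/SMP choice is resolved by the preprocessor, so a UP build
pays for neither the lock prefix nor a function call.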

The only benefit could be for kernel modules - a module could possibly
be compiled so the same LKM would run on either UP or SMP.  Note that
function calls for atomic operations may not be sufficient (by
themselves) to achieve this: one of the SMP gurus may be able to
confirm whether anything else prevents an SMP-compiled LKM running
with a UP kernel.

If the lock prefix overhead becomes an issue for LKMs, then we could
define a variant of i386/include/atomic.h (e.g. by using a #define which
is only true for compiling LKMs) which does use indirect function
calls (and add the appropriate initialisation code).  This is a
trivial exercise (which I'll demonstrate on request).
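
One way such a variant might look, as a sketch under stated
assumptions rather than a worked-out patch.  The pointer name, the
_up/_mp function names, the atomic_select_impl() initialisation hook
and its ncpus argument are all hypothetical, and only one operation
is shown.

```c
typedef void (*atomic_add_fn)(volatile unsigned int *, unsigned int);

/* UP version: a single unlocked RMW instruction. */
static void
atomic_add_int_up(volatile unsigned int *p, unsigned int v)
{
#if defined(__i386__) || defined(__x86_64__)
	__asm__ __volatile__("addl %1,%0" : "+m" (*p) : "ir" (v) : "cc");
#else
	/* Assumption: builtin fallback so the sketch builds off x86. */
	__atomic_fetch_add(p, v, __ATOMIC_RELAXED);
#endif
}

/* SMP version: the same instruction with the lock prefix. */
static void
atomic_add_int_mp(volatile unsigned int *p, unsigned int v)
{
#if defined(__i386__) || defined(__x86_64__)
	__asm__ __volatile__("lock; addl %1,%0" : "+m" (*p) : "ir" (v) : "cc");
#else
	__atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
#endif
}

/* Default to the locked version: correct (if slower) on either kernel. */
static atomic_add_fn atomic_add_int_ptr = atomic_add_int_mp;

/* Hypothetical initialisation hook, run once before first use. */
static void
atomic_select_impl(int ncpus)
{
	atomic_add_int_ptr = (ncpus > 1) ? atomic_add_int_mp :
	    atomic_add_int_up;
}

/* Module code keeps the usual spelling; the call is now indirect. */
#define atomic_add_int(p, v)	(atomic_add_int_ptr((p), (v)))
```

A module built this way pays for an indirect call on every atomic
operation, but the same object code avoids the lock prefix when
loaded into a UP kernel.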

[1] A locked instruction implies a synchronous RMW cycle.  In order
    to meet write-ordering guarantees (without which, a locked RMW
    cycle would be useless as a semaphore primitive), it implies a
    complete write serialisation, and probably some level of
    instruction serialisation.  Since write-back pipelines will get
    longer and parallel execution units more numerous, the cost of
    a serialisation operation will get relatively higher.  Also,
    lock instructions are relatively infrequent, therefore there is
    little incentive to expend valuable silicon on trying to make
    them more efficient (at least as seen by the executing CPU).

[2] Function calls _are_ fairly common, therefore it probably is
    worthwhile expending some effort in optimising them - and the
    stack updates associated with a leaf subroutine are fairly
    easy to totally hide in an on-chip write pipeline/cache.

Peter

