Date:      Mon, 1 Jan 2001 22:10:05 -0700 (MST)
From:      Kevin Van Maren <vanmaren@fast.cs.utah.edu>
To:        cp@bsdi.com, smp@FreeBSD.ORG
Subject:   Re: atomic increment?
Message-ID:  <200101020510.WAA13199@fast.cs.utah.edu>

I didn't see the Jason Evans flame thread on -arch.  Does
anyone have a pointer to it in the mail archive?

In the interim, I think atomic_{increment,decrement} should be
provided, even if they are just syntactic sugar for
atomic_{add,subtract}.  After all, we use "++" as syntactic sugar
for "+=1".  [The fact that gcc uses an intermediate register to add
an immediate constant is bogus, and not sufficient reason by itself
to use atomic_increment.]  On x86 the code is slightly better for
"++" vs. "+=1", while it really is just syntactic sugar on other
systems.
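
To make the "syntactic sugar" point concrete, here is a minimal
sketch.  It assumes C11 <stdatomic.h> as a userland stand-in for the
kernel primitives, and every sk_ name is hypothetical rather than an
existing API:

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for atomic_{add,subtract}_long, built on C11. */
#define sk_atomic_add_long(p, v)      ((void)atomic_fetch_add((p), (long)(v)))
#define sk_atomic_subtract_long(p, v) ((void)atomic_fetch_sub((p), (long)(v)))

/* The proposed sugar: increment/decrement are just add/subtract of 1. */
#define sk_atomic_increment_long(p)   sk_atomic_add_long((p), 1)
#define sk_atomic_decrement_long(p)   sk_atomic_subtract_long((p), 1)
```

Whether the increment form compiles to better code than the add form
is up to the compiler and the architecture; the sketch only shows the
interface.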

However, I also have another thought.  Often I need to modify a
value and also (atomically) determine its (old|new) value.
Primitives that use "xadd" instead of "add" or "sub" provide
atomicity and eliminate an extra read.  Yes, trashing the added
value may cause a register to spill if the value is needed again
later, but often it is never used again anyway.
[pre-processor can negate the "sub"; worst case we need an extra
"neg" instruction for atomic_subtract, but the subtraction will
still be atomic.]  Essentially, the (old) value is available for
free, so why not provide it?  It might even make sense to always
provide the old value for add/subtract, and have gcc throw away
the unused output (unless the input value is reused, when we'd
lose a whole register to the xadd).
Even if the processor does not support xadd-like operations, they
can be emulated (more expensively) using load, add, cmpxchg, and
loop-if-failed [similar to the code frag below, but with a while()
loop.]
But by providing a primitive, it can be optimized much further
than just C code using an atomic cmpxchg operation (and is
"negative cost" on x86 -- faster than the non-atomic version).

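A minimal sketch of that load/add/cmpxchg/retry emulation, assuming
C11 <stdatomic.h> rather than the kernel's own primitives;
emulated_fetch_add_long is a made-up name:

```c
#include <stdatomic.h>

/*
 * Hypothetical helper: atomically add `delta` to *p and return the OLD
 * value, using only a load and compare-and-swap (no native xadd).
 */
static long
emulated_fetch_add_long(_Atomic long *p, long delta)
{
	long old = atomic_load(p);

	/*
	 * If *p changed since we read it, the compare-exchange fails and
	 * rewrites `old` with the current contents of *p, so each retry
	 * recomputes old + delta from fresh data.
	 */
	while (!atomic_compare_exchange_weak(p, &old, old + delta))
		;
	return (old);
}
```

The new value is simply the return value plus delta, so one primitive
covers both the "old" and "new" cases.
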
For example, in Julian's acquire_reader:
 => atomic_add_long(&ngq->q_flags, READER_INCREMENT);
 => if ((ngq->q_flags & (~READER_MASK)) == 0) {
can be changed to (find a better macro name):
 => flags = atomic_read_add_long(&ngq->q_flags, READER_INCREMENT);
 => if ((flags & (~READER_MASK)) == 0) {
gcc will realize that the register it sticks READER_INCREMENT in
(which it shouldn't for normal atomic_adds, but does anyway) now
contains "flags", and will use that register again for the mask,
thus saving a memory access to "volatile" memory.

In Julian's acquire_writer, we need to do an atomic compare-and-swap
operation, instead of assuming two operations are atomic (because the
above acquire_reader code could be executed between the two following
statements):

 => if ((ngq->q_flags & (~SINGLE_THREAD_ONLY)) == 0) {
 => 	atomic_add_long(&ngq->q_flags, WRITER_ACTIVE);

Here is a possible code sequence to "just get it working" (at least
I *think* this fixes the alleged problem):
    [ register long flags; ]
 => flags = ngq->q_flags;
 => if (((flags & (~SINGLE_THREAD_ONLY)) == 0) &&
 =>     atomic_cmpset(&ngq->q_flags, flags, flags + WRITER_ACTIVE)) {

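For comparison, a compilable sketch of the same check-then-cmpset
idea, assuming C11 atomic_compare_exchange_strong() as a stand-in for
atomic_cmpset.  The struct and the flag values are hypothetical:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag values, for illustration only. */
#define SINGLE_THREAD_ONLY 0x01UL
#define WRITER_ACTIVE      0x02UL
#define READER_INCREMENT   0x10UL

/* Hypothetical stand-in for the queue structure. */
struct sk_queue {
	_Atomic unsigned long q_flags;
};

static bool
sk_try_acquire_writer(struct sk_queue *q)
{
	unsigned long flags = atomic_load(&q->q_flags);

	/* Give up if anything other than SINGLE_THREAD_ONLY is set. */
	if ((flags & ~SINGLE_THREAD_ONLY) != 0)
		return (false);

	/*
	 * Succeeds only if q_flags still equals the snapshot we tested,
	 * so a reader that slipped in after the load makes this fail
	 * instead of being silently ignored.
	 */
	return (atomic_compare_exchange_strong(&q->q_flags, &flags,
	    flags + WRITER_ACTIVE));
}
```

A caller that wants to keep trying would wrap this in a loop; the
single attempt matches the "if" form above.
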

One more thought on atomic operations: If we don't assume assignments
are atomic, and always use atomic_load and atomic_store, then (a) we
can easily provide atomic 64-bit operations on x86 (a quick hack would
be to use a single mutex for all 64-bit operations), and (b) we can
port to platforms where a mutex is required to protect the atomic_add
or atomic_cmpset sequence.  [Slow as molasses.]  On x86, the load/store
macros are NOPs, but using them also (c) makes it clear that we are
manipulating a variable we perform atomic operations on.
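
The single-mutex "quick hack" for 64-bit operations might look
roughly like this.  The sketch uses a userland pthread mutex as a
stand-in for a kernel mutex, and the sk_ function names are
hypothetical:

```c
#include <pthread.h>
#include <stdint.h>

/* One global lock serializes every emulated 64-bit atomic operation. */
static pthread_mutex_t sk_atomic64_mtx = PTHREAD_MUTEX_INITIALIZER;

static uint64_t
sk_atomic_load_64(const volatile uint64_t *p)
{
	uint64_t v;

	pthread_mutex_lock(&sk_atomic64_mtx);
	v = *p;			/* both 32-bit halves read under the lock */
	pthread_mutex_unlock(&sk_atomic64_mtx);
	return (v);
}

static void
sk_atomic_store_64(volatile uint64_t *p, uint64_t v)
{
	pthread_mutex_lock(&sk_atomic64_mtx);
	*p = v;
	pthread_mutex_unlock(&sk_atomic64_mtx);
}

static void
sk_atomic_add_64(volatile uint64_t *p, uint64_t v)
{
	pthread_mutex_lock(&sk_atomic64_mtx);
	*p += v;		/* read-modify-write cannot be interleaved */
	pthread_mutex_unlock(&sk_atomic64_mtx);
}
```

This only works if every access to the variable goes through these
functions, which is exactly why plain assignments cannot be assumed
atomic under this scheme.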

Kevin

