Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 12 Dec 2000 18:34:14 -0800
From:      Jason Evans <jasone@canonware.com>
To:        John Baldwin <jhb@FreeBSD.org>
Cc:        arch@FreeBSD.org
Subject:   Re: An opaque refcount type
Message-ID:  <20001212183414.K2312@canonware.com>
In-Reply-To: <XFMail.001212122450.jhb@FreeBSD.org>; from jhb@FreeBSD.org on Tue, Dec 12, 2000 at 12:24:50PM -0800
References:  <XFMail.001212122450.jhb@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, Dec 12, 2000 at 12:24:50PM -0800, John Baldwin wrote:
> Here's another bikeshed war for everyone to get in on:  I've implemented a
> relatively light weight and very simple opaque reference counter.  It defines
> an opaque refcount_t type.  In the INVARIANTS case, this maps to a structure
> that contains an integer and a mutex.  The mutex is used to protect the
> integer count as well as several KASSERT()'s.  In the non-INVARIANTS case, it
> maps to a single integer on all of our current platforms (x86, alpha, ia64) and
> is managed via atomic operations.  It defines a rather simple API:
> 
> void refcount_init(refcount_t *count, u_int value)
>  - Initializes the reference counter and sets its inital count to 'value'.
> void refcount_destroy(refcount_t *count)
>  - Cleans up any internals used in a refcount, frees resources, etc.
> void refcount_acquire(refcount_t *count)
>  - Atomically bump the reference count by one.
> int refcount_release(refcount_t *count)
>  - Atomically decerement the reference count by one and return true if the
>    count drops to zero.

As at least John and Alfred know, I'm opposed to an atomic refcount type,
for at least the following reasons:

1) The number of bits of accuracy varies from platform to platform.  It
   seems that the least common denominator we'd need to settle on is 24
   bits (sparc64), which is enough for most, but not all uses.  This is the
   kernel we're talking about here; I don't like the uncertainty of the
   accuracy.  Maybe 16.7 million is enough accuracy, but...

2) It is hard to say at this point since we still have a lot of conversion
   work to do, but I suspect that the number of places where a refcount is
   ideally manipulated outside of a mutex is small.

3) The refcount_init() and refcount_destroy() calls are mandatory, but are
   only really necessary for platforms that have to rely on mutexes for the
   internal implementation (or if INVARIANTS is defined).  For the most
   part, this just amounts to extra API complexity.

4) For platforms that can't support an atomic refcount with atomic
   operations, we have to use mutexes.  Now, instead of having using normal
   atomic operations for refcounts, we have a mutex structure in place of
   an integer.  So, in some cases, this API is actually a pessimization
   rather than an optimization, and any structures that use a refcount_t
   are suddenly much larger.  Naturally, this means that any structures
   with a refcount_t that are exposed to userland will potentially vary in
   size, depending on INVARIANTS.

5) At the moment we have mtx_*(), atomic_*(), [mt]sleep(), and the lockmgr.
   In the foreseeable future (maybe not by 5.0, but eventually) we will
   probably have mtx_*(), cv_*() (condition variables, replaces [mt]sleep()),
   and enhanced reader/writer (actually SIX) locks (replaces lockmgr).
   I'd like to see the number of synchronization-related interfaces get
   smaller and simpler, not larger.

In another email thread, I demonstrated how to safely manipulate refcounts
with atomic operations.  The main problem to solve is how to avoid a
cleanup race when decrementing the refcount to 0.  The following
pseudo-code demonstrates a way to avoid this race.

atomic_decrement(count)
if (compare_and_set(count, 0, 1)) { /* iff (count == 0) {count = 1}. */
	/* Clean up. */
}

This requires two atomic operations for refcount decrement, which, as John
has pointed out, is the same number of locked instructions as would be
required for a mutex acquire/release.  However, refcount increment still
only requires one locked instruction.  Therefore, this method requires 50%
more locked instructions than a refcount_t implementation would require on
a cooperative platform, and 33% fewer locked instructions than a refcount_t
implementation on an uncooperative platform.

My knowledge of memory bus locking isn't fresh, but I suspect that using
two locked instructions on the same address in short succession only costs
significantly more than one locked instruction if there is contention.  Of
course, if there is contention, then an implementation that only uses one
locked instruction is going to get hurt too, so I suspect that the actual
performance difference between this and a streamlined refcount_t
implementation is rather small.  (Someone please correct me if this is
incorrect.)

Even if performance is significantly worse for this method, point 2 above
could make it irrelevant anyway.  In summary, I have a number of specific
and general issues with introducing an atomic refcount type with associated
functions.

Jason


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20001212183414.K2312>