From owner-freebsd-current  Wed Jul 10 11:30:16 2002
From: Archie Cobbs <archie@dellroad.org>
Message-Id: <200207101815.g6AIFDm28655@arch20m.dellroad.org>
Subject: Re: Timeout and SMP race
In-Reply-To: <XFMail.20020710132809.jhb@FreeBSD.org> "from John Baldwin at Jul
 10, 2002 01:28:09 pm"
To: John Baldwin <jhb@FreeBSD.org>
Date: Wed, 10 Jul 2002 11:15:13 -0700 (PDT)
Cc: davidx@viasoft.com.cn, freebsd-arch@FreeBSD.org,
	julian@elischer.org


[ NOTE: I'm moving this discussion to freebsd-arch@freebsd.org ]

John Baldwin writes:
> > What do you think of the idea of letting the timer code (optionally)
> > handle all the locking and race conditions?
> 
> I'm not sure it can in a clean fashion since of the few cases I've known
> so far each client needs a customized solution.  I am open to ideas though.
> I'm also open to some redesign of how callouts work to begin with (maybe
> using other threads than the one running softclock() to actually execute
> callout handlers, etc.).

FWIW, here is an API I've used before. This handles all race
conditions and the 'other thread' question.

struct timer_event;				/* opaque structure */

typedef struct timer_event *timer_handle_t;	/* caller's timer "handle" */

typedef void timer_func_t(void *arg);		/* timeout function type */

/* flags for timer_start() */
#define TIMER_RECURRING		0x0001		/* timer is recurring */
#define TIMER_OWN_THREAD	0x0002		/* handle in separate thread */

extern int	timer_start(timer_handle_t *handlep, mutex_t *mutexp,
			timer_func_t tfunc, void *arg, u_int delay,
			int flags);
extern void	timer_cancel(timer_handle_t *handlep);
extern int	timer_remaining(timer_handle_t handle, u_int *delayp);

static inline int
timer_isrunning(timer_handle_t handle)
{
	return (handle != NULL);
}

Semantics:

  1. The caller supplies a pointer to the 'handle', which must initially
     be NULL. The handle != NULL if and only if the timer is running.
  2. timer_cancel() guarantees that tfunc() will not be called subsequently.
  3. *handlep is set to NULL by timer_cancel() and by the timer expiring,
     so *handlep is NULL when tfunc() is invoked (unless TIMER_RECURRING).
  4. Calling timer_start() or timer_cancel() from within tfunc() is OK.
  5. If TIMER_RECURRING, the timer is started again before tfunc() is
     called.
  6. If TIMER_OWN_THREAD, timer runs in a newly created thread (rather
     than the timer service thread), which means that tfunc() may sleep
     or be canceled. If tfunc() sleeps or the thread is canceled but
     TIMER_OWN_THREAD was not set -> panic.
  7. If mutexp != NULL, *mutexp is acquired before calling tfunc() and
     released after it returns.

Items 1 and 2 are guaranteed only if mutexp != NULL and the caller
acquires *mutexp before any calls to timer_start() or timer_cancel()
(you would normally be doing this anyway).

Errors:

  - timer_start() returns EBUSY if *handlep != NULL
  - timer_remaining() returns ESRCH if handle == NULL
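
To make the handle rules and error returns above concrete, here is a
minimal single-threaded mock. It is only a sketch under stated
assumptions, not the real implementation: pthread_mutex_t and unsigned
stand in for mutex_t and u_int, the flags argument is ignored,
mock_expire() is a hypothetical hook that plays the role of the timer
service at expiry, the ENOMEM return is my own addition, and no real
clock or concurrency is involved.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

struct timer_event {			/* mock: no real clock behind it */
	pthread_mutex_t	*mutexp;
	void		(*tfunc)(void *);
	void		*arg;
	unsigned	delay;
};
typedef struct timer_event *timer_handle_t;

int
timer_start(timer_handle_t *handlep, pthread_mutex_t *mutexp,
    void (*tfunc)(void *), void *arg, unsigned delay, int flags)
{
	struct timer_event *ev;

	(void)flags;				/* flags are ignored by this mock */
	if (*handlep != NULL)
		return (EBUSY);			/* timer already running */
	if ((ev = malloc(sizeof(*ev))) == NULL)
		return (ENOMEM);		/* assumption: not in the API above */
	*ev = (struct timer_event){ mutexp, tfunc, arg, delay };
	*handlep = ev;				/* non-NULL means "running" */
	return (0);
}

void
timer_cancel(timer_handle_t *handlep)
{
	free(*handlep);				/* free(NULL) is a no-op */
	*handlep = NULL;
}

int
timer_remaining(timer_handle_t handle, unsigned *delayp)
{
	if (handle == NULL)
		return (ESRCH);			/* no such timer */
	*delayp = handle->delay;		/* mock: time never advances */
	return (0);
}

/* Hypothetical hook: plays the timer service when the delay expires. */
void
mock_expire(timer_handle_t *handlep)
{
	struct timer_event ev;

	if (*handlep == NULL)
		return;				/* canceled: tfunc must not run */
	ev = **handlep;				/* copy before the event is freed */
	if (ev.mutexp != NULL)
		pthread_mutex_lock(ev.mutexp);
	timer_cancel(handlep);			/* item 3: handle is NULL in tfunc */
	ev.tfunc(ev.arg);			/* item 7: called with lock held */
	if (ev.mutexp != NULL)
		pthread_mutex_unlock(ev.mutexp);
}

static void
count_fire(void *arg)
{
	++*(int *)arg;
}

/* Walks through the semantics; returns how many times the handler ran. */
int
timer_demo(void)
{
	pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
	timer_handle_t h = NULL;
	unsigned left;
	int fired = 0, rc;

	pthread_mutex_lock(&m);
	rc = timer_start(&h, &m, count_fire, &fired, 100, 0);
	assert(rc == 0 && h != NULL);
	rc = timer_start(&h, &m, count_fire, &fired, 100, 0);
	assert(rc == EBUSY);			/* second start is refused */
	pthread_mutex_unlock(&m);

	mock_expire(&h);			/* fires once, clears the handle */
	assert(h == NULL && fired == 1);
	rc = timer_remaining(h, &left);
	assert(rc == ESRCH);			/* nothing is running any more */

	pthread_mutex_lock(&m);
	rc = timer_start(&h, &m, count_fire, &fired, 100, 0);
	assert(rc == 0);
	timer_cancel(&h);			/* item 2: handler must not run now */
	pthread_mutex_unlock(&m);
	mock_expire(&h);
	return (fired);
}
```

The second timer is canceled under the lock before it "expires", so
timer_demo() counts only the first firing.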

The model is: you have some object that has an associated lock and
one or more associated timers. The object is to be locked whenever
you muck with it (including when you start, stop, or handle a timer):

    struct foobar {
	mutex_t		mutex;
	timer_handle_t	timer1;
	timer_handle_t	timer2;
	...
    };

Then all calls to the timer_* routines are "well behaved" and the
timeout thread calling tfunc() never races with any other thread
that may be stopping or starting the timer, or destroying the object.
E.g., to destroy the object, the following suffices:

    void
    foobar_destroy(struct foobar *f)
    {
	mutex_lock(&f->mutex);
	timer_cancel(&f->timer1);
	timer_cancel(&f->timer2);
	mutex_unlock(&f->mutex);
	free(f);
    }

The only remaining complexity for the caller is that if you have
any TIMER_OWN_THREAD handlers which unlock the object (e.g., in order
to go to sleep), then you need to reference count the object and
have a FOOBAR_INVALID flag.
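
One possible shape for that reference count and FOOBAR_INVALID flag is
sketched below with POSIX threads. Everything beyond the flag name is
an assumption of mine: the refs, cv, ready and ticks fields, the
foobar_destroy_locked() and service_thread() helpers, and the choice to
have the destroyer wait until sleeping handlers have let go. That last
choice keeps the mutex alive until after tfunc() returns, which item 7
requires because the timer code still has to release it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define FOOBAR_INVALID	0x0001		/* destruction has begun */

struct foobar {
	pthread_mutex_t	mutex;
	pthread_cond_t	cv;		/* signaled when refs or flags change */
	int		refs;		/* handlers sleeping with f unlocked */
	int		flags;
	int		ready;		/* event the handler sleeps for */
	int		ticks;		/* example state the handler updates */
};

static int saw_invalid;			/* demo only: handler saw the flag */

/*
 * Timeout handler, assumed started with TIMER_OWN_THREAD. Per item 7 it
 * is entered with f->mutex held and must return with it held.
 */
static void
foobar_timeout(void *arg)
{
	struct foobar *f = arg;

	f->refs++;				/* pin f before sleeping */
	pthread_cond_broadcast(&f->cv);
	while (!f->ready && !(f->flags & FOOBAR_INVALID))
		pthread_cond_wait(&f->cv, &f->mutex);	/* unlocks f */
	if (f->flags & FOOBAR_INVALID)
		saw_invalid = 1;		/* destroyed meanwhile: back off */
	else
		f->ticks++;			/* still valid: safe to use */
	f->refs--;
	pthread_cond_broadcast(&f->cv);
}

/* Called with f->mutex held; frees f once no handler still pins it. */
static void
foobar_destroy_locked(struct foobar *f)
{
	f->flags |= FOOBAR_INVALID;
	/* timer_cancel(&f->timer1) and friends would go here */
	pthread_cond_broadcast(&f->cv);		/* wake sleeping handlers */
	while (f->refs > 0)
		pthread_cond_wait(&f->cv, &f->mutex);
	pthread_mutex_unlock(&f->mutex);
	pthread_cond_destroy(&f->cv);
	pthread_mutex_destroy(&f->mutex);
	free(f);
}

/* Stand-in for the timer's own thread: lock, call handler, unlock. */
static void *
service_thread(void *arg)
{
	struct foobar *f = arg;

	pthread_mutex_lock(&f->mutex);
	foobar_timeout(f);
	pthread_mutex_unlock(&f->mutex);
	return (NULL);
}

/* Destroys f while the handler sleeps; returns 1 if the handler noticed. */
int
foobar_demo(void)
{
	struct foobar *f = calloc(1, sizeof(*f));
	pthread_t t;
	int rc;

	assert(f != NULL);
	pthread_mutex_init(&f->mutex, NULL);
	pthread_cond_init(&f->cv, NULL);
	rc = pthread_create(&t, NULL, service_thread, f);
	assert(rc == 0);
	pthread_mutex_lock(&f->mutex);
	while (f->refs == 0)			/* wait until the handler pins f */
		pthread_cond_wait(&f->cv, &f->mutex);
	foobar_destroy_locked(f);
	pthread_join(t, NULL);
	return (saw_invalid);
}
```

In this sketch the handler never touches the object's state after it
wakes up and finds FOOBAR_INVALID set, and the memory is freed only
after the last sleeping handler has dropped its reference.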

If you are working under a different model then this API may not
be appropriate, but at least in my multi-threading experience this
model is very typical.

-Archie

__________________________________________________________________________
Archie Cobbs     *     Packet Design     *     http://www.packetdesign.com
