Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 27 Dec 2007 18:05:40 -0500
From:      John Baldwin <jhb@FreeBSD.org>
To:        freebsd-arch@FreeBSD.org
Cc:        Attilio Rao <attilio@FreeBSD.org>, arch@FreeBSD.org, Poul-Henning Kamp <phk@phk.freebsd.dk>, Robert Watson <rwatson@FreeBSD.org>, Andre Oppermann <andre@FreeBSD.org>
Subject:   Re: New "timeout" api, to replace callout
Message-ID:  <200712271805.40972.jhb@freebsd.org>
In-Reply-To: <4752AABE.6090006@freebsd.org>
References:  <18378.1196596684@critter.freebsd.dk> <4752AABE.6090006@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sunday 02 December 2007 07:53:18 am Andre Oppermann wrote:
> Poul-Henning Kamp wrote:
> > In message <4752998A.9030007@freebsd.org>, Andre Oppermann writes:
> >>  o TCP puts the timer into an allocated structure and upon close of the
> >>    session it has to be deallocated including stopping of all currently
> >>    running timers.
> >>    [...]
> >>     -> The timer facility should provide an atomic stop/remove call
> >>        that prevent any further callbacks upon return.  It should not
> >>        do a 'drain' where the callback may be run anyway.
> >>        Note: We hold the lock the callback would have to obtain.
> > 
> > It is my intent, that the implementation behind the new API will
> > only ever grab the specified lock when it calls the timeout function.
> 
> This is the same for the current one and pretty much a given.
> 
> > When you do a timeout_disable() or timeout_cleanup() you will be
> > sleeping on a mutex internal to the implementation, if the timeout
> > is currently executing.
> 
> This is the problematic part.  We can't sleep in TCP when cleaning up
> the timer.  We're not always called from userland but from interrupt
> context.  And when calling the cleanup we currently hold the lock the
> callout wants to obtain.  We can't drop it either as the race would
> be back again.  What you describe here is the equivalent of callout_
> drain().  This is unfortunately unworkable in TCP's context.  The
> callout has to go away even if it is already pending and waiting on
> the lock.  Maybe that can only be solved by a flag in the lock saying
> "give up and go away".

The reason you need to do a drain is to allow for safe destroying of the lock.  
Specifically, drivers tend to do this:

	FOO_LOCK(sc);
	...
	callout_stop(...);
	FOO_UNLOCK(sc);
	...
	callout_drain(...);
	...
	mtx_destroy(&sc->foo_mtx);

If you don't have the drain and softclock is trying to acquire the backing 
mutex while you have it held (before the callout_stop) then Bad Things can 
happen if you don't do the drain.  Having the lock just "give up" doesn't 
work either because if the memory containing the lock is free'd and 
reinitialized such that it looks enough like a valid lock then softclock (or 
its equivalent) will still try to obtain it.  Also, you need to do a drain so 
it is safe to free the callout structure to prevent it from being recycled 
and having weird races where it gets recycled and rescheduled but the timer 
code thinks it has a pending stop for that pointer and so it aborts the wrong 
instance of the timer, etc.

-- 
John Baldwin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200712271805.40972.jhb>