From owner-freebsd-arch@FreeBSD.ORG  Fri Jan  4 01:19:23 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 5FCF816A417;
	Fri,  4 Jan 2008 01:19:23 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from cyrus.watson.org (cyrus.watson.org [209.31.154.42])
	by mx1.freebsd.org (Postfix) with ESMTP id 2641213C465;
	Fri,  4 Jan 2008 01:19:23 +0000 (UTC)
	(envelope-from rwatson@FreeBSD.org)
Received: from fledge.watson.org (fledge.watson.org [209.31.154.41])
	by cyrus.watson.org (Postfix) with ESMTP id E320B47954;
	Thu,  3 Jan 2008 20:19:21 -0500 (EST)
Date: Fri, 4 Jan 2008 01:19:21 +0000 (GMT)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Poul-Henning Kamp <phk@phk.freebsd.dk>
In-Reply-To: <2296.1199319966@critter.freebsd.dk>
Message-ID: <20080104011149.M42109@fledge.watson.org>
References: <2296.1199319966@critter.freebsd.dk>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Attilio Rao <attilio@freebsd.org>, arch@freebsd.org,
	Andre Oppermann <andre@freebsd.org>, freebsd-arch@freebsd.org
Subject: Re: New "timeout" api, to replace callout 
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 04 Jan 2008 01:19:23 -0000


> In message <477C2A8A.8090604@freebsd.org>, Andre Oppermann writes:
>
>>>>>> I fear we have to go for the latter.  Getting a non-sleeping callout 
>>>>>> drain seems to be non-trivial.
>>>>> There is a crucial difference between "non-sleeping" and "not sleeping 
>>>>> on my lock" that you should be very careful about in this context.
>>>>>
>>>>> Which is your requirement ?
>>>> Calling timeout_drain() must not sleep and not drop the lock in this 
>>>> context (while making any pending timeout go away forever).
>>>
>>> Ok, so if the timeouts callback function is running you propose to do what 
>>> ?  spin until it returns ?
>>
>> As long as the spinning is bounded [...] I don't have a perfect solution 
>> handy.  That's why I try to state the requirement and hope the timeout 
>> gurus can work out how to do it.  ;-)
>
> It's all Alan Turings fault :-)
>
> You can't have that, it's that simple.
>
> What I'm proposing is that your thread will sleep on a plain, but unrelated 
> mutex (internal to the timeout code) until the function comes back.
>
> Based on your description above, you won't be able to tell the any 
> difference between this and what you wish for.

In a world of bounded mutex-length sleeps and unbounded I/O-length sleeps, 
waiting a mutex-length sleep for callout_stop() to cancel and drain the 
callout is fine.  However, callout_stop() needs to be done while holding locks 
that the callout may acquire (global tcbinfo and local inpcb locks), which in 
the current world order would lead rapidly to deadlock.  In the model you have 
in mind, is it OK to hold locks that the callout being canceled using the call 
to callout_stop() might acquire?  The following pseudo-code snippets captures 
what I'm probably putting into words poorly:

TCP input path:

 	mtx_lock(&tcbinfo);
 	...
 	mtx_lock(inp->inp_mtx);
 	...
 	/*
 	 * NB: We can't drop the locks here or there will be races, and we
 	 * also can't msleep() or related sorts of things as we're running in
 	 * an ithread.
 	 */
 	callout_stop(inp->inp_callout);
 	mtx_destroy(inp->inp_mtx);
 	free(inp);

TCP timer path:

 	mtx_lock(&tcbinfo);
 	...
 	mtx_lock(inp->inp_mtx);
 	...
 	mtx_unlock(inp->inp_mtx);
 	...
 	mtx_unlock(&tcbinfo);

Right now, we callout_stop() but not callout_drain() in the input path on the 
basis that we can't msleep(), and for better and mostly worse, we accept the 
resulting race.  We'd like to eliminate that race.  One way to do that is to 
hand the inpcb off to another kthread that *can* msleep() doing a 
callout_drain(), and have it GC the data structure when done.  That's sort of 
ugly, and means that we're freeing the memory asynchronously which is somewhat 
undesirable.  If it were possible to have the above callout_stop() also 
guarantee that once it returns, the timer is not currently running and will 
never run again, then we wouldn't need that.

We may also have cases where we want to free the inpcb from within the 
timer...

Robert N M Watson
Computer Laboratory
University of Cambridge