From owner-freebsd-arch@FreeBSD.ORG  Wed Jan  2 22:53:59 2008
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id E957C16A41A
	for <arch@FreeBSD.org>; Wed,  2 Jan 2008 22:53:59 +0000 (UTC)
	(envelope-from andre@freebsd.org)
Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2])
	by mx1.freebsd.org (Postfix) with ESMTP id 5407013C46A
	for <arch@FreeBSD.org>; Wed,  2 Jan 2008 22:53:59 +0000 (UTC)
	(envelope-from andre@freebsd.org)
Received: (qmail 81381 invoked from network); 2 Jan 2008 22:19:15 -0000
Received: from c00l3r.networx.ch (HELO [127.0.0.1]) ([62.48.2.2])
	(envelope-sender <andre@freebsd.org>)
	by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP
	for <jhb@FreeBSD.org>; 2 Jan 2008 22:19:15 -0000
Message-ID: <477C1604.2030905@freebsd.org>
Date: Wed, 02 Jan 2008 23:53:56 +0100
From: Andre Oppermann <andre@freebsd.org>
User-Agent: Thunderbird 1.5.0.14 (Windows/20071210)
MIME-Version: 1.0
To: John Baldwin <jhb@FreeBSD.org>
References: <18378.1196596684@critter.freebsd.dk>
	<4752AABE.6090006@freebsd.org> <200712271805.40972.jhb@freebsd.org>
In-Reply-To: <200712271805.40972.jhb@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Attilio Rao <attilio@FreeBSD.org>, arch@FreeBSD.org,
	Poul-Henning Kamp <phk@phk.freebsd.dk>,
	Robert Watson <rwatson@FreeBSD.org>, freebsd-arch@FreeBSD.org
Subject: Re: New "timeout" api, to replace callout
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 02 Jan 2008 22:54:00 -0000

John Baldwin wrote:
> On Sunday 02 December 2007 07:53:18 am Andre Oppermann wrote:
>> Poul-Henning Kamp wrote:
>>> In message <4752998A.9030007@freebsd.org>, Andre Oppermann writes:
>>>>  o TCP puts the timer into an allocated structure and upon close of the
>>>>    session it has to be deallocated including stopping of all currently
>>>>    running timers.
>>>>    [...]
>>>>     -> The timer facility should provide an atomic stop/remove call
>>>>        that prevents any further callbacks upon return.  It should not
>>>>        do a 'drain' where the callback may be run anyway.
>>>>        Note: We hold the lock the callback would have to obtain.
>>> It is my intent, that the implementation behind the new API will
>>> only ever grab the specified lock when it calls the timeout function.
>> This is the same for the current one and pretty much a given.
>>
>>> When you do a timeout_disable() or timeout_cleanup() you will be
>>> sleeping on a mutex internal to the implementation, if the timeout
>>> is currently executing.
>> This is the problematic part.  We can't sleep in TCP when cleaning up
>> the timer.  We're not always called from userland but from interrupt
>> context.  And when calling the cleanup we currently hold the lock the
>> callout wants to obtain.  We can't drop it either as the race would
>> be back again.  What you describe here is the equivalent of
>> callout_drain().  This is unfortunately unworkable in TCP's context.  The
>> callout has to go away even if it is already pending and waiting on
>> the lock.  Maybe that can only be solved by a flag in the lock saying
>> "give up and go away".
> 
> The reason you need to do a drain is to allow for safe destroying of the lock.  
> Specifically, drivers tend to do this:
> 
> 	FOO_LOCK(sc);
> 	...
> 	callout_stop(...);
> 	FOO_UNLOCK(sc);
> 	...
> 	callout_drain(...);
> 	...
> 	mtx_destroy(&sc->foo_mtx);
> 
> If you don't have the drain and softclock is trying to acquire the backing 
> mutex while you have it held (before the callout_stop) then Bad Things can 
> happen if you don't do the drain.  Having the lock just "give up" doesn't 
> work either because if the memory containing the lock is free'd and 
> reinitialized such that it looks enough like a valid lock then softclock (or 
> its equivalent) will still try to obtain it.  Also, you need to do a drain so 
> it is safe to free the callout structure to prevent it from being recycled 
> and having weird races where it gets recycled and rescheduled but the timer 
> code thinks it has a pending stop for that pointer and so it aborts the wrong 
> instance of the timer, etc.

This is all well known.  ;)  What isn't known is that this (the
sleep part) is a major problem for TCP, because TCP may be called
from interrupt context.  Hence the request for some kind of busy-drain
or other method to prevent the sleep.  A second, less severe problem
is the races that open up while the lock is dropped during the sleep:
some other part of TCP may go into the tcpcb scheduled for destruction.
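
To make the requested semantics concrete, here is a rough userland
sketch, not kernel code and not a proposal for the actual API.  It
assumes the dispatcher (standing in for softclock) re-checks an "armed"
flag after it obtains the lock, so a stop done while holding that lock
never sleeps and still guarantees the callback does not run.  All names
(struct session, dispatcher, timer_stop_locked, demo_stop_under_lock)
are made up for illustration, and pthreads stand in for kernel mutexes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userland stand-in for a tcpcb with a single timer. */
struct session {
	pthread_mutex_t lock;	/* lock the timer callback must hold */
	bool armed;		/* protected by lock; cleared by stop */
	int fired;		/* protected by lock; counts callback runs */
};

/* Models softclock: obtain the session lock, then re-check 'armed'. */
static void *
dispatcher(void *arg)
{
	struct session *s = arg;

	pthread_mutex_lock(&s->lock);
	if (s->armed) {
		s->armed = false;
		s->fired++;	/* the timer callback body would run here */
	}
	pthread_mutex_unlock(&s->lock);
	return (NULL);
}

/*
 * Caller must hold s->lock.  Never sleeps.  Returns true if a pending
 * timer was cancelled.
 */
static bool
timer_stop_locked(struct session *s)
{
	bool was_armed = s->armed;

	s->armed = false;
	return (was_armed);
}

/* Returns how often the callback ran despite the stop. */
static int
demo_stop_under_lock(void)
{
	struct session s = { .armed = true, .fired = 0 };
	pthread_t td;
	bool cancelled;
	int rc;

	pthread_mutex_init(&s.lock, NULL);
	pthread_mutex_lock(&s.lock);

	/* The dispatcher can only block on the lock we already hold. */
	rc = pthread_create(&td, NULL, dispatcher, &s);
	assert(rc == 0);
	(void)rc;

	cancelled = timer_stop_locked(&s);	/* no sleep, lock still held */
	pthread_mutex_unlock(&s.lock);
	assert(cancelled);
	(void)cancelled;

	/* This join is the 'drain': it may sleep, and it is still needed. */
	pthread_join(td, NULL);
	pthread_mutex_destroy(&s.lock);
	return (s.fired);
}
```

Calling demo_stop_under_lock() returns 0, because the dispatcher can
only get the lock after the flag has been cleared.  Note that the
sketch does not remove the lifetime problem you describe: the join
before pthread_mutex_destroy() plays the role of the drain, and that
is exactly the sleeping step that would have to be deferred or turned
into a busy-wait for TCP.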

-- 
Andre