From: Andre Oppermann
Date: Wed, 02 Jan 2008 23:53:56 +0100
To: John Baldwin
Cc: Attilio Rao, arch@FreeBSD.org, Poul-Henning Kamp, Robert Watson, freebsd-arch@FreeBSD.org
Subject: Re: New "timeout" api, to replace callout
Message-ID: <477C1604.2030905@freebsd.org>
In-Reply-To: <200712271805.40972.jhb@freebsd.org>

John Baldwin wrote:
> On Sunday 02 December 2007 07:53:18 am Andre Oppermann wrote:
>> Poul-Henning Kamp wrote:
>>> In message <4752998A.9030007@freebsd.org>, Andre Oppermann writes:
>>>> o TCP puts the timer into an allocated structure and upon close of the
>>>>   session it has to be deallocated including stopping of all currently
>>>>   running timers.
>>>>   [...]
>>>>   -> The timer facility should provide an atomic stop/remove call
>>>>      that prevent any further callbacks upon return. It should not
>>>>      do a 'drain' where the callback may be run anyway.
>>>>      Note: We hold the lock the callback would have to obtain.
>>> It is my intent, that the implementation behind the new API will
>>> only ever grab the specified lock when it calls the timeout function.
>> This is the same for the current one and pretty much a given.
>>
>>> When you do a timeout_disable() or timeout_cleanup() you will be
>>> sleeping on a mutex internal to the implementation, if the timeout
>>> is currently executing.
>> This is the problematic part. We can't sleep in TCP when cleaning up
>> the timer. We're not always called from userland but from interrupt
>> context. And when calling the cleanup we currently hold the lock the
>> callout wants to obtain. We can't drop it either as the race would
>> be back again. What you describe here is the equivalent of
>> callout_drain(). This is unfortunately unworkable in TCP's context. The
>> callout has to go away even if it is already pending and waiting on
>> the lock. Maybe that can only be solved by a flag in the lock saying
>> "give up and go away".
>
> The reason you need to do a drain is to allow for safe destroying of the lock.
> Specifically, drivers tend to do this:
>
>     FOO_LOCK(sc);
>     ...
>     callout_stop(...);
>     FOO_UNLOCK(sc);
>     ...
>     callout_drain(...);
>     ...
>     mtx_destroy(&sc->foo_mtx);
>
> If you don't have the drain and softclock is trying to acquire the backing
> mutex while you have it held (before the callout_stop) then Bad Things can
> happen if you don't do the drain. Having the lock just "give up" doesn't
> work either because if the memory containing the lock is free'd and
> reinitialized such that it looks enough like a valid lock then softclock (or
> its equivalent) will still try to obtain it.
> Also, you need to do a drain so
> it is safe to free the callout structure to prevent it from being recycled
> and having weird races where it gets recycled and rescheduled but the timer
> code thinks it has a pending stop for that pointer and so it aborts the wrong
> instance of the timer, etc.

This is all well known. ;)  What isn't known is that this (the sleep part)
is a major problem for TCP because it runs in interrupt context. Hence the
request for some kind of busy-drain or other method to prevent the sleep.

A second, less severe problem is the races that open up while the lock is
dropped during the sleep. Here some other part of TCP may go into the tcpcb
scheduled for destruction.

-- 
Andre
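
For illustration only, here is a small userland model of the "atomic stop"
behaviour asked for earlier in the thread: the dispatcher takes the caller's
lock first and only then checks whether the timer is still pending, so a stop
issued while holding that lock never has to sleep and no callback can run
afterwards. Everything here is an assumption made for the sketch: the
model_timer names are hypothetical, pthread mutexes stand in for kernel
mutexes, and model_timer_fire() stands in for softclock. It is not the
callout(9) API, and it does not address the lock and structure lifetime
problem John describes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland model; none of these names exist in FreeBSD. */
struct model_timer {
	pthread_mutex_t	*lock;		/* lock the callback runs under */
	void		(*func)(void *);	/* callback */
	void		*arg;		/* callback argument */
	bool		pending;	/* armed and not yet fired or stopped */
};

static void
model_timer_init(struct model_timer *t, pthread_mutex_t *lock,
    void (*func)(void *), void *arg)
{
	assert(t != NULL && lock != NULL && func != NULL);
	t->lock = lock;
	t->func = func;
	t->arg = arg;
	t->pending = false;
}

/* Arm the timer. The caller must hold t->lock. */
static void
model_timer_arm(struct model_timer *t)
{
	t->pending = true;
}

/*
 * Stop the timer. The caller must hold t->lock. Never sleeps.
 * Returns true if a pending callback was cancelled.
 */
static bool
model_timer_stop(struct model_timer *t)
{
	bool was_pending = t->pending;

	t->pending = false;
	return (was_pending);
}

/*
 * Stand-in for the dispatcher on expiry: take the lock first, then
 * re-check the pending flag. Returns true if the callback actually ran.
 */
static bool
model_timer_fire(struct model_timer *t)
{
	bool ran = false;

	pthread_mutex_lock(t->lock);
	if (t->pending) {
		t->pending = false;
		t->func(t->arg);
		ran = true;
	}
	pthread_mutex_unlock(t->lock);
	return (ran);
}

/* Example callback: counts how often it was invoked. */
static void
count_cb(void *arg)
{
	++*(int *)arg;
}
```

In this model the dispatcher still touches the lock after the stop returns,
which is exactly why the lock (and the memory holding the timer) must outlive
any in-flight expiry; the sketch only shows the "no callback after stop" half
of the problem.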