From owner-freebsd-arch@FreeBSD.ORG Thu Jan 3 16:28:32 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9C59F16A417; Thu, 3 Jan 2008 16:28:32 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from speedfactory.net (mail.speedfactory.net [66.23.216.219]) by mx1.freebsd.org (Postfix) with ESMTP id 02CB113C45A; Thu, 3 Jan 2008 16:28:31 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from server.baldwin.cx (unverified [66.23.211.162]) by speedfactory.net (SurgeMail 3.8q) with ESMTP id 227143444-1834499 for multiple; Thu, 03 Jan 2008 11:26:36 -0500 Received: from localhost.corp.yahoo.com (john@localhost [127.0.0.1]) (authenticated bits=0) by server.baldwin.cx (8.13.8/8.13.8) with ESMTP id m03GSLgu028899; Thu, 3 Jan 2008 11:28:22 -0500 (EST) (envelope-from jhb@freebsd.org) From: John Baldwin To: freebsd-arch@freebsd.org Date: Thu, 3 Jan 2008 11:05:50 -0500 User-Agent: KMail/1.9.6 References: <18378.1196596684@critter.freebsd.dk> <200712271805.40972.jhb@freebsd.org> <477C1604.2030905@freebsd.org> In-Reply-To: <477C1604.2030905@freebsd.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200801031105.52354.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by milter-greylist-2.0.2 (server.baldwin.cx [127.0.0.1]); Thu, 03 Jan 2008 11:28:22 -0500 (EST) X-Virus-Scanned: ClamAV 0.91.2/5348/Thu Jan 3 00:26:56 2008 on server.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-4.4 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.1.3 X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx Cc: Attilio Rao , arch@freebsd.org, Poul-Henning Kamp , Andre Oppermann , Robert Watson Subject: Re: New "timeout" api, to replace callout X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Jan 2008 16:28:32 -0000 On Wednesday 02 January 2008 05:53:56 pm Andre Oppermann wrote: > John Baldwin wrote: > > On Sunday 02 December 2007 07:53:18 am Andre Oppermann wrote: > >> Poul-Henning Kamp wrote: > >>> In message <4752998A.9030007@freebsd.org>, Andre Oppermann writes: > >>>> o TCP puts the timer into an allocated structure and upon close of the > >>>> session it has to be deallocated including stopping of all currently > >>>> running timers. > >>>> [...] > >>>> -> The timer facility should provide an atomic stop/remove call > >>>> that prevent any further callbacks upon return. It should not > >>>> do a 'drain' where the callback may be run anyway. > >>>> Note: We hold the lock the callback would have to obtain. > >>> It is my intent, that the implementation behind the new API will > >>> only ever grab the specified lock when it calls the timeout function. > >> This is the same for the current one and pretty much a given. > >> > >>> When you do a timeout_disable() or timeout_cleanup() you will be > >>> sleeping on a mutex internal to the implementation, if the timeout > >>> is currently executing. > >> This is the problematic part. We can't sleep in TCP when cleaning up > >> the timer. We're not always called from userland but from interrupt > >> context. And when calling the cleanup we currently hold the lock the > >> callout wants to obtain. We can't drop it either as the race would > >> be back again. What you describe here is the equivalent of callout_ > >> drain(). This is unfortunately unworkable in TCP's context. The > >> callout has to go away even if it is already pending and waiting on > >> the lock. Maybe that can only be solved by a flag in the lock saying > >> "give up and go away". > > > > The reason you need to do a drain is to allow for safe destroying of the lock. > > Specifically, drivers tend to do this: > > > > FOO_LOCK(sc); > > ... > > callout_stop(...); > > FOO_UNLOCK(sc); > > ... > > callout_drain(...); > > ... > > mtx_destroy(&sc->foo_mtx); > > > > If you don't have the drain and softclock is trying to acquire the backing > > mutex while you have it held (before the callout_stop) then Bad Things can > > happen if you don't do the drain. Having the lock just "give up" doesn't > > work either because if the memory containing the lock is free'd and > > reinitialized such that it looks enough like a valid lock then softclock (or > > its equivalent) will still try to obtain it. Also, you need to do a drain so > > it is safe to free the callout structure to prevent it from being recycled > > and having weird races where it gets recycled and rescheduled but the timer > > code thinks it has a pending stop for that pointer and so it aborts the wrong > > instance of the timer, etc. > > This is all well known. ;) What isn't known is that this (the > sleep part) is a major problem for TCP due to being run from > interrupt context. Hence the request for some kind of busy-drain > or other method prevent the sleep. A second less severe problem > are races while the lock is dropped during the sleep. Here some > other part of TCP may go into the tcpcb scheduled for destruction. My point is that there isn't really a good way to fix this that doesn't involve sleeping. If you just spin you may spin forever (netisr has a higher priority than softclock IIRC). One option is to not destroy pcb's directly in interrupt context but instead to queue them and let a task on a taskqueue finish the destruction in a context where it can sleep if necessary. -- John Baldwin