Date:      Fri, 11 Jan 2002 14:29:03 -0800 (PST)
From:      John Baldwin <jhb@FreeBSD.org>
To:        Skye Poier <skye@ffwd.cx>
Cc:        FreeBSD 31337 H4X0RZ <freebsd-hackers@freebsd.org>
Subject:   RE: Possible problem with timeouts?
Message-ID:  <XFMail.020111142903.jhb@FreeBSD.org>
In-Reply-To: <20020111132615.A36583@ffwd.cx>

On 11-Jan-02 Skye Poier wrote:
> Hello Hackers,
> 
> While doing an audit of the timer code in FreeBSD's kernel one of our
> developers came across a theoretical bug and I thought I'd run it by the
> gurus on this list before we hack around it.
> 
> It seems that it is possible to call untimeout and then have your timer
> called immediately thereafter. We haven't actually seen this in
> practice; it is a theoretical bug. If this is indeed the case, it will
> break lots of our code (misunderstood semantics..)

Yes.

> If you look at softclock, you will see that the callout_lock mutex is
> released after we have decided on the callback to call next, but right
> before we actually call it. Theoretically, the following seems possible:
> 
>     callout thread          our kern thread
>     --------------          ---------------
>                             acquire Giant
>                             try to acquire callout_lock
>     choose callout
>     release callout_lock
>     try to acquire Giant
>                             remove callout
>                             release callout_lock
>                             release Giant
>                             untimeout returns
>                             caller removes resource callout needs
>     acquire Giant
>     call callout
>     BIG TROUBLE FOR MOOSE
>     AND SQUIRREL
> 
> With these semantics, things get severely broken, because there's no
> reliable way to clean up after timeouts except to just never call
> untimeout and have the timeouts themselves realize they have been
> cancelled.

Incorrect, there is a reliable way. :)  However, it has been fixed since your
snapshot.  callout_stop() returns a boolean now: true if it successfully
removed the item, false otherwise.  We use this to work around just such a race
with msleep and the endtsleep timeout.  In msleep, when we resume, if
PS_TIMEOUT isn't set, we do a callout_stop().  If that fails, then we know
that we have lost the race (i.e., endtsleep is still out there waiting to wake
us up), so to work around the race, msleep sets PS_TIMEOUT and the thread
suspends.  In endtsleep, if PS_TIMEOUT is already set, then we know msleep lost
the race, so we just unsuspend the thread.  This synchronization is necessary
to avoid having endtsleep() wake up the wrong tsleep (imagine a while (!foo)
tsleep(.., timo);).  Note that this means that the calling code has to use
a flag (like msleep/endtsleep uses PS_TIMEOUT) to detect and handle this
condition in cases where it is a problem.  For some callouts, however, having
the handler run after the stop is harmless.
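
To make the handshake concrete, here is a rough single-threaded model of it in
userspace C.  It is only a sketch: every name in it (model_callout,
model_sleeper, model_callout_stop, model_softclock_choose, model_endtsleep,
model_msleep_resume, and the two replay functions) is made up for illustration
and is not the kernel's code.  The timeout_flag field plays the role of
PS_TIMEOUT, and the stop routine reports whether it actually removed the
callout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; these are not the kernel's structures. */
struct model_callout {
    bool pending;               /* still queued, not yet picked by softclock */
    void (*func)(void *);
    void *arg;
};

struct model_sleeper {
    bool timeout_flag;          /* plays the role of PS_TIMEOUT */
    bool suspended;             /* parked, waiting for the late handler */
    bool timed_out;             /* the sleep really ended by timeout */
};

/* Returns true only if the callout was still queued and has been removed. */
static bool model_callout_stop(struct model_callout *c)
{
    bool removed = c->pending;
    c->pending = false;
    return removed;
}

/* Softclock's first half: pick the callout and dequeue it. */
static bool model_softclock_choose(struct model_callout *c)
{
    bool chosen = c->pending;
    c->pending = false;
    return chosen;
}

/* Timeout handler, modeled loosely on the endtsleep logic described above. */
static void model_endtsleep(void *arg)
{
    struct model_sleeper *s = arg;

    if (s->timeout_flag) {
        /* The sleeper already woke and lost the race: just unpark it. */
        s->timeout_flag = false;
        s->suspended = false;
    } else {
        /* Genuine timeout: flag it so the sleeper sees it on resume. */
        s->timeout_flag = true;
    }
}

/* What the sleeper does when it resumes, modeled on the msleep logic. */
static void model_msleep_resume(struct model_sleeper *s,
                                struct model_callout *c)
{
    assert(!s->suspended);
    if (s->timeout_flag) {
        s->timeout_flag = false;
        s->timed_out = true;
    } else if (!model_callout_stop(c)) {
        /* Stop failed: the handler is in flight, so wait for it. */
        s->timeout_flag = true;
        s->suspended = true;
    }
}

/* Replays the losing interleaving: softclock dequeues, then we try to stop. */
bool replay_lost_race(void)
{
    struct model_sleeper s = { false, false, false };
    struct model_callout c = { true, model_endtsleep, &s };

    bool chosen = model_softclock_choose(&c);   /* callout lock dropped here */
    model_msleep_resume(&s, &c);                /* stop fails, sleeper parks */
    bool parked = s.suspended && s.timeout_flag;
    if (chosen)
        c.func(c.arg);                          /* late handler unparks it */
    return parked && !s.suspended && !s.timeout_flag && !s.timed_out;
}

/* Replays the winning interleaving: the stop removes the callout in time. */
bool replay_clean_stop(void)
{
    struct model_sleeper s = { false, false, false };
    struct model_callout c = { true, model_endtsleep, &s };

    model_msleep_resume(&s, &c);                /* stop succeeds */
    bool chosen = model_softclock_choose(&c);   /* nothing left to run */
    return !chosen && !s.suspended && !s.timeout_flag;
}
```

Both replay functions return true in this model.  The point of parking the
sleeper is that the late handler is consumed by the sleep that armed it, so it
cannot leak into a later sleep in the same loop.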

We have to drop the callout lock around the actual timeout function to prevent
lock order reversals and allow callouts to grab locks.
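
As a rough illustration of why the lock is dropped, here is a small userspace
model in C.  It is a sketch under assumptions, not the softclock source: the
names model_lock, model_wheel, model_schedule, model_softclock, ticker_fire and
run_ticker are invented for the example, and the "lock" is just a flag that
asserts if it is acquired twice.  The handler reschedules itself, which needs
the same lock the dispatch loop uses, so it can only work if the loop lets go
of the lock first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical non-recursive lock model: acquiring it twice is a bug. */
struct model_lock {
    bool held;
};

static void model_lock_acquire(struct model_lock *l)
{
    assert(!l->held);
    l->held = true;
}

static void model_lock_release(struct model_lock *l)
{
    assert(l->held);
    l->held = false;
}

struct model_callout {
    struct model_callout *next;
    void (*func)(void *);
    void *arg;
};

struct model_wheel {
    struct model_lock lock;         /* stands in for callout_lock */
    struct model_callout *head;     /* expired callouts, ready to run */
};

/* Queue a callout; takes the wheel lock, as a rescheduling handler would. */
static void model_schedule(struct model_wheel *w, struct model_callout *c)
{
    model_lock_acquire(&w->lock);
    c->next = w->head;
    w->head = c;
    model_lock_release(&w->lock);
}

/* Dispatch loop: choose under the lock, but call the handler without it. */
static int model_softclock(struct model_wheel *w)
{
    int ran = 0;

    model_lock_acquire(&w->lock);
    while (w->head != NULL) {
        struct model_callout *c = w->head;
        void (*func)(void *) = c->func;
        void *arg = c->arg;

        w->head = c->next;
        c->next = NULL;

        /* From here until the call is the window the race lives in. */
        model_lock_release(&w->lock);
        func(arg);
        ran++;
        model_lock_acquire(&w->lock);
    }
    model_lock_release(&w->lock);
    return ran;
}

struct ticker {
    struct model_wheel *wheel;
    struct model_callout callout;
    int remaining;
};

/* Handler that reschedules itself until its budget runs out. */
static void ticker_fire(void *arg)
{
    struct ticker *t = arg;

    if (--t->remaining > 0)
        model_schedule(t->wheel, &t->callout);  /* needs the wheel lock */
}

/* Runs a self-rescheduling callout and returns how many times it fired. */
int run_ticker(int times)
{
    struct model_wheel w = { { false }, NULL };
    struct ticker t;

    t.wheel = &w;
    t.remaining = times;
    t.callout.next = NULL;
    t.callout.func = ticker_fire;
    t.callout.arg = &t;

    model_schedule(&w, &t.callout);
    return model_softclock(&w);
}
```

If model_softclock kept the lock held across func(arg), the assert in
model_lock_acquire would trip on the first reschedule.  The price of dropping
it is exactly the window between choosing a callout and calling it.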

> If anyone has some insight into this it would be much appreciated.  I
> have a second question around softclock but I'll save it for later..

Hope this helps.

> Thanks,
> Skye Poier

-- 

John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
