Date:      Thu, 10 Jul 2003 15:22:12 -0400 (EDT)
From:      John Baldwin <jhb@FreeBSD.org>
To:        harti@FreeBSD.org
Cc:        hackers@FreeBSD.org
Subject:   RE: Race in kevent
Message-ID:  <XFMail.20030710152212.jhb@FreeBSD.org>
In-Reply-To: <20030710103146.R30571@beagle.fokus.fraunhofer.de>

On 10-Jul-2003 Harti Brandt wrote:
> On Wed, 9 Jul 2003, John Baldwin wrote:
> 
> JB>On 09-Jul-2003 Harti Brandt wrote:
> JB>>
> JB>> Hi,
> JB>>
> JB>> I just had a crash while typing ^C to a program that has a kevent timer
> JB>> running. The crash was:
> JB>>
> JB>> callout_stop
> JB>> callout_reset
> JB>> filt_timerexpire
> JB>> softclock
> JB>>
> JB>> and callout_stop was accessing freed memory (0xdeadc0e2). After looking
> JB>> some time at the filt_timerdetach, callout_stop and softclock I think the
> JB>> following happened:
> 
> JB>This is becoming a common race unfortunately. :(  See the hacks in
> JB>msleep() that use TDF_TIMEOUT in cooperation with endtsleep() and
> JB>the recent commit to the realtimer callout code for ways to work around
> JB>this race.
> 
> In both places the thread just sleeps until the timeout has fired (if I
> understand this correctly). While this is a possible workaround also for
> kevent() (which only holds Giant as far as I can see) this is by no means
> a solution for other callers. While looking through the tree I have found
> several issues with timeouts which probably should be resolved or they
> will hit us with SMP:

Yes, they sleep until the callout has finished executing.  Note that the
callout has _already_ fired.  The common case is that it is blocked on
the lock that the code trying to stop the callout is holding.  Thus, you
are going to have to have special case code in your callout handler
_anyway_ to handle these edge cases, so there really isn't a super-duper
easy-clean solution.
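
The kind of special casing meant here can be sketched in a small user-space C model. Everything in it is hypothetical (the `hs_` names, the flag layout, the single-threaded sequencing); it is not the kernel's callout code, only an illustration of a handler that re-checks state once it finally gets the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a callout; not the kernel's struct callout. */
struct hs_callout {
    bool pending;  /* queued, not yet picked up by the dispatcher */
    bool active;   /* the owner still wants the timer to run */
    int  runs;     /* times the handler body actually did its work */
};

/* Arm the timer. */
static void hs_reset(struct hs_callout *c)
{
    c->pending = true;
    c->active = true;
}

/* Cancel the timer. Returns false if it was already dispatched (the race). */
static bool hs_stop(struct hs_callout *c)
{
    bool was_pending = c->pending;

    c->pending = false;
    c->active = false;
    return was_pending;
}

/* Dispatcher takes the callout off the queue; the handler would now
 * block on the lock held by the code calling hs_stop(). */
static void hs_dispatch(struct hs_callout *c)
{
    c->pending = false;
}

/* Handler body, entered once the owner's lock has been obtained. */
static bool hs_handler(struct hs_callout *c)
{
    if (c->pending)   /* re-armed while blocked: a later run handles it */
        return false;
    if (!c->active)   /* stopped while blocked: back off */
        return false;
    c->active = false;
    c->runs++;
    return true;
}

/* Replays the race in order: fire, then stop, then the handler runs. */
static int hs_race_demo(void)
{
    struct hs_callout c = { false, false, 0 };

    hs_reset(&c);
    hs_dispatch(&c);        /* the callout has already fired ... */
    assert(!hs_stop(&c));   /* ... so the stop cannot dequeue it */
    hs_handler(&c);         /* the handler notices the stop */
    return c.runs;
}
```

In this model `hs_race_demo()` returns 0, because the handler sees that it was stopped and does nothing. The check is only safe while the structure holding the flags is still allocated, which is presumably why the msleep() and realtimer workarounds wait for the callout to finish executing.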

> - the CALLOUT_ACTIVE flag is not maintained correctly. softclock() fails
> to clear this flag after the timeout has fired. callout_stop() clears
> CALLOUT_ACTIVE if it finds the callout not PENDING. This is wrong if
> the callout is just about to be called (in this case it is !PENDING
> but ACTIVE). This makes callout_active() useless.

The problem is in the API.  One of the design goals is that a callout can
re-fire itself.  Thus, softclock can't touch the callout once it has fired
it.  This design goal is the reason for much of the confusion.
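
A rough user-space C model of that constraint, assuming nothing about the real softclock() beyond what is said above: the dispatcher copies what it needs, clears the pending state, calls the handler, and then leaves the callout alone because the handler may have re-armed it. All `rf_` names are made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callout model; not the kernel's struct callout. */
struct rf_callout {
    void (*func)(void *);
    void *arg;
    bool pending;
};

/* A timer that re-fires itself until it has ticked 'limit' times. */
struct rf_timer {
    struct rf_callout co;
    int ticks;
    int limit;
};

/* Arm (or re-arm) the callout. */
static void rf_reset(struct rf_callout *c, void (*func)(void *), void *arg)
{
    c->func = func;
    c->arg = arg;
    c->pending = true;
}

/* Dispatch one callout the way the design goal forces the dispatcher to. */
static void rf_softclock_one(struct rf_callout *c)
{
    void (*func)(void *);
    void *arg;

    assert(c->pending);
    func = c->func;         /* copy out before calling the handler */
    arg = c->arg;
    c->pending = false;     /* last write the dispatcher makes to *c */
    func(arg);
    /* No access to *c from here on: the handler may have re-armed it,
     * so any flag written now could clobber the new state. */
}

/* Handler that re-fires itself from inside its own invocation. */
static void rf_handler(void *arg)
{
    struct rf_timer *t = arg;

    t->ticks++;
    if (t->ticks < t->limit)
        rf_reset(&t->co, rf_handler, t);
}

/* Run the model until the timer stops re-arming; returns the tick count. */
static int rf_run(int limit)
{
    struct rf_timer t = { { NULL, NULL, false }, 0, limit };

    rf_reset(&t.co, rf_handler, &t);
    while (t.co.pending)
        rf_softclock_one(&t.co);
    return t.ticks;
}
```

If `rf_softclock_one()` cleared a flag after `func(arg)` returned, it would wipe out the state the handler had just set when re-arming, which is the conflict with maintaining CALLOUT_ACTIVE from softclock().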

> - using callout_active() on a callout_handle. Callouts for
> callout_handles (timeout(9)) are allocated from a common pool. So you may
> just check the wrong callout if the callout has already fired and has been
> reallocated to another user. Handles allocated with timeout(9) can only
> be passed to untimeout(9).

The idea is that timeout(9) and untimeout(9) are a deprecated interface and
code should be using the callout(9) API instead.  Note that callouts created
with timeout(9) can never be marked MPSAFE.

> I think we should try to make the callout interface usable without races
> for the !MPSAFE case (see mail from Eric Jacobs). For the MPSAFE case the
> caller should be responsible for this. And we should probably better
> document the interface.
> 
> Going to think about this...

Well, you need to consider the design goal above as it throws several
wrenches into the works.  One possibility is that we could ditch the
design goal.  Another possibility is that we could expand the callout
API to allow for periodic callouts and not just one-shot callouts.
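
To make the second possibility concrete, here is a purely hypothetical user-space sketch in C. No such interface exists in callout(9) as discussed here; the `ps_` names, the tick-based clock, and the semantics are all assumptions. The point is only that a dispatcher which owns the period can re-arm before calling the handler, so the handler never has to re-fire itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical periodic callout; not part of any real API. */
struct ps_callout {
    void (*func)(void *);
    void *arg;
    int next;      /* tick at which the callout fires next */
    int period;    /* ticks between firings; 0 means one-shot */
    bool pending;
};

/* Arm the callout: first firing after 'delay' ticks, then every 'period'. */
static void ps_start(struct ps_callout *c, int now, int delay, int period,
                     void (*func)(void *), void *arg)
{
    assert(delay > 0 && period >= 0);
    c->func = func;
    c->arg = arg;
    c->next = now + delay;
    c->period = period;
    c->pending = true;
}

/* Cancel the callout; no further firings are scheduled. */
static void ps_stop(struct ps_callout *c)
{
    c->pending = false;
}

/* Called once per tick by the (modelled) dispatcher. */
static void ps_tick(struct ps_callout *c, int now)
{
    void (*func)(void *);
    void *arg;

    if (!c->pending || now < c->next)
        return;
    func = c->func;
    arg = c->arg;
    if (c->period > 0)
        c->next = now + c->period;  /* dispatcher re-arms, not the handler */
    else
        c->pending = false;         /* one-shot: done after this firing */
    func(arg);
}

/* Example handler: counts how often it was called. */
static void ps_count(void *arg)
{
    (*(int *)arg)++;
}

/* Runs ticks 1..last_tick, stops the callout, and ticks once more. */
static int ps_demo(int delay, int period, int last_tick)
{
    struct ps_callout c;
    int fired = 0;
    int now;

    ps_start(&c, 0, delay, period, ps_count, &fired);
    for (now = 1; now <= last_tick; now++)
        ps_tick(&c, now);
    ps_stop(&c);
    ps_tick(&c, last_tick + delay + period);  /* stopped: must not fire */
    return fired;
}
```

With these assumptions, `ps_demo(3, 3, 10)` fires at ticks 3, 6 and 9 and returns 3, while `ps_demo(3, 0, 10)` behaves as a one-shot and returns 1. This model does not address the stop-versus-fire race itself; it only removes the need for the handler to re-arm.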

-- 

John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/


