Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 01 Oct 2004 12:32:02 -0600
From:      Scott Long <scottl@FreeBSD.org>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        Stephan Uphoff <ups@tree.com>
Subject:   Re: sched_switch (sched_4bsd) may be preempted
Message-ID:  <415DA2A2.5010309@FreeBSD.org>
In-Reply-To: <200410011815.i91IFJQq019971@apollo.backplane.com>
References:  <1096610130.21577.219.camel@palm.tree.com> <1096644427.25800.26.camel@palm.tree.com> <200410011815.i91IFJQq019971@apollo.backplane.com>

next in thread | previous in thread | raw e-mail | index | archive | help
Matthew Dillon wrote:
> :The core is wrapped in the sched_lock. And since it is a spin lock it is
> :running in a critical section with interrupts disabled.
> :
> :The additional (recursive) critical_enter is just an abusive way to tell
> :maybe_preempt* that it should not immediately switch.
> :( Yes - eventually there should be a better way to do this)
> 
>     Umm.  So you are saying that the code is intentionally breaking the
>     API that should otherwise be protecting it from a preemptive thread
>     switch by making the assumption that the only critical section count
>     will come from an intentional scheduler mutex obtained ONLY for the
>     purpose of calling sched_add(), and thus only two or more critical section
>     counts means that the originally interrupted code desires no preemption?
>     No wonder the scheduler is broken!  That sounds like a recipe for disaster!
> 
>     But the solution is simple enough... move the maybe_preempt() call out
>     of sched_add().  That is, remove the flawed assumption instead of adding
>     further hacks to work around the flawed assumption.  Instead, 
>     just conditionally set TDP_OWEPREEMPT there, don't actually try to switch.
>     Then simply check for TDP_OWEPREEMPT either just after the scheduler 
>     mutex is released, or just before it would otherwise be released.  The
>     recursion is happening because the original code was badly designed,
>     not because it is an inevitable consequence of implementing preemption.

This is a pretty bold assumption to make.  I agree that when I first 
looked at this code a while back I was quite confused by the semantics, 
but after discussing it with John it makes a whole lot more sense.  The
whole design is based on being allowed to be switched away from if
curthread->td_critnest is less than one.  Simply holding a single 
spinlock or critical section will not prevent this, but this is only a
problem from within the scheduler.  If a thread enters the scheduler
with a spinlock  or critical section held, the act of the scheduler
picking up sched_lock will bump up td_critnest and prevent preemption.
This of course leaves a hole where the scheduler is entered without
a spinlock held, and Stephen looks like he's cleaning up this hole,
and doing a pretty reasonable job at it.  The easy hack was to just
wrap setrunqueue() in a critical section, but there were still a few
problems with that.

I agree that there are better ways to deal with this in the long run,
but please don't distract us from making it work correctly right now.

>     But this problem looks *REALLY* easy to fix... NOT by adding more hacks,
>     but by fixing the originally flawed code.
> 
> :>     I would not reset TDP_OWEPREEMPT there.  If I understand its function
> :>     correctly you need to leave it intact in order to detect preemption
> :>     request races against the scheduler.  Since at that point newtd may
> :>     be non-NULL and thus not cause another scheduling queue check to be
> :>     made before the next switch, you cannot safely clear the flag where you
> :>     are clearing it.
> :
> :This is all running in critical section and we just decided to switch
> :and either have or will pick the best thread. Interrupts are locked. The
> :additional critical section just prevents recursion problems by delaying
> :unwanted switches in maybe_preempt* . Resetting TDP_OWEPREEMPT is
> :perfectly save since we switch to the thread chosen while everything has
> :been locked.
> :
> :	Stephan
> 
>     I sorta see that, but then again newtd is already set so you are
>     assuming that no side effects have occured (from calling other scheduler
>     related routines) since newtd was last chosen.  But it is clear that
>     there are a ton of opporunities for side effects either to occur or
>     to occur in the future as the code continues to be modified, which makes
>     this sort of assumption very dangerous and makes the resulting code very
>     fragile.  For example, if someone ever wanted to avoid physically 
>     disabling interrupts with a 'cli' in critical_enter() (and this is 
>     something that could very well happen since neither the original 4.x code
>     or the DragonFly code disables interrupts in this case, as an 
>     optimization), that breaks all of your assumptions.   In fact, the
>     interrupt disablement is being done in the machine-dependant (MD) code,
>     and you are assuming it in machine-independant (MI) code.  This makes
>     your assumption even MORE unsafe.
> 
>     Your goal, with all the problems that the scheduler is having now, should
>     be to make the code more robust, NOT make it more fragile.

Testing is showing that it is becoming more robust and following the
original design goals.  The real problem here is that the design wasn't
well documented, and what was documented wasn't being read by most
people.

Scott



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?415DA2A2.5010309>