Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 8 Apr 2005 18:37:03 GMT
From:      Steve Sears <stevenjsears@yahoo.com>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   kern/79693: SMP: msleep and sleepq_broadcast race
Message-ID:  <200504081837.j38Ib3Ap015022@www.freebsd.org>
Resent-Message-ID: <200504081840.j38Ie5aq054037@freefall.freebsd.org>

next in thread | raw e-mail | index | archive | help

>Number:         79693
>Category:       kern
>Synopsis:       SMP: msleep and sleepq_broadcast race
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Apr 08 18:40:05 GMT 2005
>Closed-Date:
>Last-Modified:
>Originator:     Steve Sears
>Release:        5.3-STABLE, 5.4-STABLE
>Organization:
self
>Environment:
FreeBSD sjs-bsd 5.4-STABLE FreeBSD 5.4-STABLE #17: Thu Apr  7 08:15:24 EDT 2005     root@sjs-bsd:/usr/src/sys-sjs/i386/compile/SJSKERN  i386
>Description:
      Failure symptom:

sleepq_broadcast spins in an infinite loop, due to its local temporary queue (list) getting corrupted.

Conditions which cause the failure:

1. msleep is being called in a loop with the PCATCH flag set
2. at the same time wakeup is called on the same identifier

wakeup calls sleepq_broadcast, which dequeues the thread from the sleepq and queues the thread on a temporary list, while holding the sleepq lock. The lock is then dropped, and threads on the temporary list are scheduled to run. So the td_slpq field of the thread struct is being manipulated while the sleepq lock is not being held. This is safe if you assume no one else can manipulate that field in the meantime, but it's an invalid assumption. The caller of msleep can return without ever going to sleep, and upon a subsequent call to msleep, queue itself on the sleepq again, which manipulates the td_slpq field, and corrupts the temporary queue used by sleepq_broadcast().

So the race is as follows:

1. threadA calls msleep with PCATCH set
2. threadA is queued on sleepq
3. threadB calls wakeup on same identifier, which calls sleepq_broadcast
4. threadB moves threadA from sleepq to local temporary queue (list)
5. threadA calls sleepq_timedwait_sig, which calls sleepq_switch
6. threadA determines it has been woken up since td->td_sleepqueue != NULL
7. threadA returns from sleepq_switch, sleepq_timed_wait, msleep
8. threadA calls msleep with PCATCH set
9. threadA is queued on sleepq [CORRUPTION] Note threadA is still on
sleepq_broadcast's local temporary queue (list). This results in threadA
pointing back to itself on the sleepq.
10. threadA goes to sleep
11. threadB dequeues threadA from its local temporary queue (list) and
schedules it.
12. goto 11


Problem verification:
1> add a 'volatile int test_flags' to the thread strucutre.
2> test_flags = 1, in the first loop of the sleepq_broadcast
3> test_flags = 0, in the second loop (local queue processing)
4> panic in sleepq_switch 
<snip> 
if (td->td_sleepqueue != NULL) {
    if (td->test_flags)
        panic("msleep returning without sleeping, and race with broadcast\n");
}
</snip>


>How-To-Repeat:
      Create a thread.  have this thread run a while loop where it tsleep(my_thread_wait_channel, PZERO | PCATCH, "my_wait", hz), where kernel hz is set to 1000 (i believe this is the x86 default).

Send wakeups to this thread from an ithread at a high rate (1K per second?).  ithread vs a regular thread may not matter.  Eventually, the thread sending the wakeups will enter an infinite loop caused by a cycle in a sleepq.  If there are multiple threads sending wakeups, then the system may hang or NMI.


>Fix:
      Here's one solution that is probably faster than the original code.

It addresses the race while minimizing lock overhead.
 

Change sleepq_broadcast to:

void
sleepq_broadcast(void *wchan, int flags, int pri)
{
    struct sleepqueue *sq;
    struct thread *td;

    CTR2(KTR_PROC, "sleepq_broadcast(%p, %d)", wchan, flags);
    KASSERT(wchan != NULL, ("%s: invalid NULL wait channel", __func__));
    sq = sleepq_lookup(wchan);
    if (sq == NULL) {
        sleepq_release(wchan);
        return;
    }
    KASSERT(sq->sq_type == (flags & SLEEPQ_TYPE),
        ("%s: mismatch between sleep/wakeup and cv_*", __func__));
    /* XXX: Do for all sleep queues eventually. */
    if (flags & SLEEPQ_CONDVAR)
        mtx_assert(sq->sq_lock, MA_OWNED);

    while (!TAILQ_EMPTY(&sq->sq_blocked)) {
        td = TAILQ_FIRST(&sq->sq_blocked);
        sleepq_remove_and_resume_thread(sq, td);
    }
    sleepq_release(wchan);
}

Add this function:

/*
 * Removes and resumes thread from a sleep queue.
 */
static void
sleepq_remove_and_resume_thread(struct sleepqueue *sq, struct thread *td, int pri)
{
    struct sleepqueue_chain *sc;

    MPASS(td != NULL);
    MPASS(sq->sq_wchan != NULL);
    MPASS(td->td_wchan == sq->sq_wchan);
    sc = SC_LOOKUP(sq->sq_wchan);
    mtx_assert(&sc->sc_lock, MA_OWNED);

    /* Remove the thread from the queue. */
    TAILQ_REMOVE(&sq->sq_blocked, td, td_slpq);

    /*
     * Get a sleep queue for this thread.  If this is the last waiter,
     * use the queue itself and take it out of the chain, otherwise,
     * remove a queue from the free list.
     */
    if (LIST_EMPTY(&sq->sq_free)) {
        td->td_sleepqueue = sq;
#ifdef INVARIANTS
        sq->sq_wchan = NULL;
#endif
#ifdef SLEEPQUEUE_PROFILING
        sc->sc_depth--;
#endif
    } else
        td->td_sleepqueue = LIST_FIRST(&sq->sq_free);
    LIST_REMOVE(td->td_sleepqueue, sq_hash);

    mtx_lock_spin(&sched_lock);

    td->td_wmesg = NULL;
    td->td_wchan = NULL;

    CTR3(KTR_PROC, "sleepq_wakeup: thread %p (pid %ld, %s)",
        (void *)td, (long)td->td_proc->p_pid, td->td_proc->p_comm);
    TD_CLR_SLEEPING(td);

    /* Adjust priority if requested. */
    MPASS(pri == -1 || (pri >= PRI_MIN && pri <= PRI_MAX));
    if (pri != -1 && td->td_priority > pri)
        sched_prio(td, pri);
    setrunnable(td);

    mtx_unlock_spin(&sched_lock);
}
>Release-Note:
>Audit-Trail:
>Unformatted:



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200504081837.j38Ib3Ap015022>