From owner-freebsd-bugs@FreeBSD.ORG Fri Apr 8 18:40:06 2005 Return-Path: Delivered-To: freebsd-bugs@hub.freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 8389216A4CE for ; Fri, 8 Apr 2005 18:40:06 +0000 (GMT) Received: from freefall.freebsd.org (freefall.freebsd.org [216.136.204.21]) by mx1.FreeBSD.org (Postfix) with ESMTP id 1AC5A43D49 for ; Fri, 8 Apr 2005 18:40:06 +0000 (GMT) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.13.3/8.13.3) with ESMTP id j38Ie5L3054038 for ; Fri, 8 Apr 2005 18:40:05 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.13.3/8.13.1/Submit) id j38Ie5aq054037; Fri, 8 Apr 2005 18:40:05 GMT (envelope-from gnats) Resent-Date: Fri, 8 Apr 2005 18:40:05 GMT Resent-Message-Id: <200504081840.j38Ie5aq054037@freefall.freebsd.org> Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer) Resent-To: freebsd-bugs@FreeBSD.org Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Steve Sears Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A2F3316A4CE for ; Fri, 8 Apr 2005 18:37:04 +0000 (GMT) Received: from www.freebsd.org (www.freebsd.org [216.136.204.117]) by mx1.FreeBSD.org (Postfix) with ESMTP id 2558043D5F for ; Fri, 8 Apr 2005 18:37:04 +0000 (GMT) (envelope-from nobody@FreeBSD.org) Received: from www.freebsd.org (localhost [127.0.0.1]) by www.freebsd.org (8.13.1/8.13.1) with ESMTP id j38Ib4L0015023 for ; Fri, 8 Apr 2005 18:37:04 GMT (envelope-from nobody@www.freebsd.org) Received: (from nobody@localhost) by www.freebsd.org (8.13.1/8.13.1/Submit) id j38Ib3Ap015022; Fri, 8 Apr 2005 18:37:03 GMT (envelope-from nobody) Message-Id: <200504081837.j38Ib3Ap015022@www.freebsd.org> Date: Fri, 8 Apr 2005 18:37:03 GMT From: Steve Sears To: freebsd-gnats-submit@FreeBSD.org X-Send-Pr-Version: www-2.3 Subject: kern/79693: SMP: msleep and sleepq_broadcast race X-BeenThere: freebsd-bugs@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Bug reports List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Apr 2005 18:40:06 -0000 >Number: 79693 >Category: kern >Synopsis: SMP: msleep and sleepq_broadcast race >Confidential: no >Severity: serious >Priority: high >Responsible: freebsd-bugs >State: open >Quarter: >Keywords: >Date-Required: >Class: sw-bug >Submitter-Id: current-users >Arrival-Date: Fri Apr 08 18:40:05 GMT 2005 >Closed-Date: >Last-Modified: >Originator: Steve Sears >Release: 5.3-STABLE, 5.4-STABLE >Organization: self >Environment: FreeBSD sjs-bsd 5.4-STABLE FreeBSD 5.4-STABLE #17: Thu Apr 7 08:15:24 EDT 2005 root@sjs-bsd:/usr/src/sys-sjs/i386/compile/SJSKERN i386 >Description: Failure symptom: sleepq_broadcast spins in an infinite loop, due to its local temporary queue (list) getting corrupted. Conditions which cause the failure: 1. msleep is being called in a loop with the PCATCH flag set 2. at the same time wakeup is called on the same identifier wakeup calls sleepq_broadcast, which dequeues the thread from the sleepq and queues the thread on a temporary list, while holding the sleepq lock. The lock is then dropped, and threads on the temporary list are scheduled to run. So the td_slpq field of the thread struct is being manipulated while the sleepq lock is not being held. This is safe if you assume no one else can manipulate that field in the meantime, but it's an invalid assumption. The caller of msleep can return without ever going to sleep, and upon a subsequent call to msleep, queue itself on the sleepq again, which manipulates the td_slpq field, and corrupts the temporary queue used by sleepq_broadcast(). So the race is as follows: 1. threadA calls msleep with PCATCH set 2. threadA is queued on sleepq 3. threadB calls wakeup on same identifier, which calls sleepq_broadcast 4. threadB moves threadA from sleepq to local temporary queue (list) 5. threadA calls sleepq_timedwait_sig, which calls sleepq_switch 6. threadA determines it has been woken up since td->td_sleepqueue != NULL 7. threadA returns from sleepq_switch, sleepq_timed_wait, msleep 8. threadA calls msleep with PCATCH set 9. threadA is queued on sleepq [CORRUPTION] Note threadA is still on sleepq_broadcast's local temporary queue (list). This results in threadA pointing back to itself on the sleepq. 10. threadA goes to sleep 11. threadB dequeues threadA from its local temporary queue (list) and schedules it. 12. goto 11 Problem verification: 1> add a 'volatile int test_flags' to the thread strucutre. 2> test_flags = 1, in the first loop of the sleepq_broadcast 3> test_flags = 0, in the second loop (local queue processing) 4> panic in sleepq_switch if (td->td_sleepqueue != NULL) { if (td->test_flags) panic("msleep returning without sleeping, and race with broadcast\n"); } >How-To-Repeat: Create a thread. have this thread run a while loop where it tsleep(my_thread_wait_channel, PZERO | PCATCH, "my_wait", hz), where kernel hz is set to 1000 (i believe this is the x86 default). Send wakeups to this thread from an ithread at a high rate (1K per second?). ithread vs a regular thread may not matter. Eventually, the thread sending the wakeups will enter an infinite loop caused by a cycle in a sleepq. If there are multiple threads sending wakeups, then the system may hang or NMI. >Fix: Here's one solution that is probably faster than the original code. It addresses the race while minimizing lock overhead. Change sleepq_broadcast to: void sleepq_broadcast(void *wchan, int flags, int pri) { struct sleepqueue *sq; struct thread *td; CTR2(KTR_PROC, "sleepq_broadcast(%p, %d)", wchan, flags); KASSERT(wchan != NULL, ("%s: invalid NULL wait channel", __func__)); sq = sleepq_lookup(wchan); if (sq == NULL) { sleepq_release(wchan); return; } KASSERT(sq->sq_type == (flags & SLEEPQ_TYPE), ("%s: mismatch between sleep/wakeup and cv_*", __func__)); /* XXX: Do for all sleep queues eventually. */ if (flags & SLEEPQ_CONDVAR) mtx_assert(sq->sq_lock, MA_OWNED); while (!TAILQ_EMPTY(&sq->sq_blocked)) { td = TAILQ_FIRST(&sq->sq_blocked); sleepq_remove_and_resume_thread(sq, td); } sleepq_release(wchan); } Add this function: /* * Removes and resumes thread from a sleep queue. */ static void sleepq_remove_and_resume_thread(struct sleepqueue *sq, struct thread *td, int pri) { struct sleepqueue_chain *sc; MPASS(td != NULL); MPASS(sq->sq_wchan != NULL); MPASS(td->td_wchan == sq->sq_wchan); sc = SC_LOOKUP(sq->sq_wchan); mtx_assert(&sc->sc_lock, MA_OWNED); /* Remove the thread from the queue. */ TAILQ_REMOVE(&sq->sq_blocked, td, td_slpq); /* * Get a sleep queue for this thread. If this is the last waiter, * use the queue itself and take it out of the chain, otherwise, * remove a queue from the free list. */ if (LIST_EMPTY(&sq->sq_free)) { td->td_sleepqueue = sq; #ifdef INVARIANTS sq->sq_wchan = NULL; #endif #ifdef SLEEPQUEUE_PROFILING sc->sc_depth--; #endif } else td->td_sleepqueue = LIST_FIRST(&sq->sq_free); LIST_REMOVE(td->td_sleepqueue, sq_hash); mtx_lock_spin(&sched_lock); td->td_wmesg = NULL; td->td_wchan = NULL; CTR3(KTR_PROC, "sleepq_wakeup: thread %p (pid %ld, %s)", (void *)td, (long)td->td_proc->p_pid, td->td_proc->p_comm); TD_CLR_SLEEPING(td); /* Adjust priority if requested. */ MPASS(pri == -1 || (pri >= PRI_MIN && pri <= PRI_MAX)); if (pri != -1 && td->td_priority > pri) sched_prio(td, pri); setrunnable(td); mtx_unlock_spin(&sched_lock); } >Release-Note: >Audit-Trail: >Unformatted: