From owner-svn-src-all@FreeBSD.ORG Tue Sep 15 16:56:18 2009 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 38E06106566C; Tue, 15 Sep 2009 16:56:18 +0000 (UTC) (envelope-from attilio@FreeBSD.org) Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:4f8:fff6::2c]) by mx1.freebsd.org (Postfix) with ESMTP id 27FE78FC08; Tue, 15 Sep 2009 16:56:18 +0000 (UTC) Received: from svn.freebsd.org (localhost [127.0.0.1]) by svn.freebsd.org (8.14.3/8.14.3) with ESMTP id n8FGuIDp095141; Tue, 15 Sep 2009 16:56:18 GMT (envelope-from attilio@svn.freebsd.org) Received: (from attilio@localhost) by svn.freebsd.org (8.14.3/8.14.3/Submit) id n8FGuIW3095139; Tue, 15 Sep 2009 16:56:18 GMT (envelope-from attilio@svn.freebsd.org) Message-Id: <200909151656.n8FGuIW3095139@svn.freebsd.org> From: Attilio Rao Date: Tue, 15 Sep 2009 16:56:18 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org X-SVN-Group: head MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cc: Subject: svn commit: r197223 - head/sys/kern X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Sep 2009 16:56:18 -0000 Author: attilio Date: Tue Sep 15 16:56:17 2009 New Revision: 197223 URL: http://svn.freebsd.org/changeset/base/197223 Log: Fix sched_switch_migrate(): - In 8.x and above the run-queue locks are nomore shared even in the HTT case, so remove the special case. - The deadlock explained in the removed comment here is still possible even with different locks, with the contribution of tdq_lock_pair(). An explanation is here: (hypotesis: a thread needs to migrate on another CPU, thread1 is doing sched_switch_migrate() and thread2 is the one handling the sched_switch() request or in other words, thread1 is the thread that needs to migrate and thread2 is a thread that is going to be preempted, most likely an idle thread. Also, 'old' is referred to the context (in terms of run-queue and CPU) thread1 is leaving and 'new' is referred to the context thread1 is going into. Finally, thread3 is doing tdq_idletd() or sched_balance() and definitively doing tdq_lock_pair()) * thread1 blocks its td_lock. Now td_lock is 'blocked' * thread1 drops its old runqueue lock * thread1 acquires the new runqueue lock * thread1 adds itself to the new runqueue and sends an IPI_PREEMPT through tdq_notify() to the new CPU * thread1 drops the new lock * thread3, scanning the runqueues, locks the old lock * thread2 received the IPI_PREEMPT and does thread_lock() with td_lock pointing to the new runqueue * thread3 wants to acquire the new runqueue lock, but it can't because it is held by thread2 so it spins * thread1 wants to acquire old lock, but as long as it is held by thread3 it can't * thread2 going further, at some point wants to switchin in thread1, but it will wait forever because thread1->td_lock is in blocked state This deadlock has been manifested mostly on 7.x and reported several time on mailing lists under the voice 'spinlock held too long'. Many thanks to des@ for having worked hard on producing suitable textdumps and Jeff for help on the comment wording. Reviewed by: jeff Reported by: des, others Tested by: des, Giovanni Trematerra (STABLE_7 based version) Modified: head/sys/kern/sched_ule.c Modified: head/sys/kern/sched_ule.c ============================================================================== --- head/sys/kern/sched_ule.c Tue Sep 15 12:51:22 2009 (r197222) +++ head/sys/kern/sched_ule.c Tue Sep 15 16:56:17 2009 (r197223) @@ -1749,19 +1749,19 @@ sched_switch_migrate(struct tdq *tdq, st */ spinlock_enter(); thread_block_switch(td); /* This releases the lock on tdq. */ - TDQ_LOCK(tdn); - tdq_add(tdn, td, flags); - tdq_notify(tdn, td); + /* - * After we unlock tdn the new cpu still can't switch into this - * thread until we've unblocked it in cpu_switch(). The lock - * pointers may match in the case of HTT cores. Don't unlock here - * or we can deadlock when the other CPU runs the IPI handler. + * Acquire both run-queue locks before placing the thread on the new + * run-queue to avoid deadlocks created by placing a thread with a + * blocked lock on the run-queue of a remote processor. The deadlock + * occurs when a third processor attempts to lock the two queues in + * question while the target processor is spinning with its own + * run-queue lock held while waiting for the blocked lock to clear. */ - if (TDQ_LOCKPTR(tdn) != TDQ_LOCKPTR(tdq)) { - TDQ_UNLOCK(tdn); - TDQ_LOCK(tdq); - } + tdq_lock_pair(tdn, tdq); + tdq_add(tdn, td, flags); + tdq_notify(tdn, td); + TDQ_UNLOCK(tdn); spinlock_exit(); #endif return (TDQ_LOCKPTR(tdn));