From owner-freebsd-current@FreeBSD.ORG Sun Jan 17 05:33:57 2010
Date: Sun, 17 Jan 2010 14:22:00 +0900 (JST)
Message-Id: <20100117.142200.321689433999177718.okuno.kohji@jp.panasonic.com>
To: freebsd-current@freebsd.org
From: Kohji Okuno
Organization: Panasonic Corporation
Subject: Bug about sched_4bsd?

Hello,

I think that sched_4bsd.c:sched_switch() has a problem.

I encountered a problem where a process could not exit.
That process has multiple threads, and its p_flag was as follows.

    p_flag : (P_INMEM|P_STOPPED_SINGLE|P_EXEC|P_SINGLE_EXIT|P_PPWAIT|P_CONTROLT)

One thread (THREAD A) was suspended with the following stack.

    sched_switch()
    mi_switch()
    thread_suspend_switch()
    thread_single()
    exit1()
    sys_exit()

    td_flags : TDF_NEEDSIGCHK|TDF_ASTPENDING|TDF_CANSWAP|TDF_INMEM

Another thread (THREAD B) was sleeping with the following stack.

    sched_switch()
    mi_switch()
    sleepq_switch()
    sleepq_catch_signals()
    sleepq_wait_sig()
    _cv_wait_sig()
    seltdwait()
    kern_select()
    select()

    td_flags : TDF_CANSWAP|TDF_SINTR|TDF_INMEM

The process could not exit because THREAD B never executed
thread_suspend_check(), while THREAD A stayed suspended. THREAD B's
td_flags above include neither TDF_NEEDSUSPCHK nor TDF_ASTPENDING.

I think that a race condition on td_flags occurred, as follows.

In sched_switch(), when td->td_lock is not &sched_lock, td->td_lock is
unlocked as shown below. After that, td->td_flags is modified without
holding td->td_lock.

kern_thread.c:

    int
    thread_single(int mode)
    {
            ...
            thread_lock(td2);
            td2->td_flags |= TDF_ASTPENDING | TDF_NEEDSUSPCHK;

    *** I think that td2 was THREAD B at this time.

sched_4bsd.c:

    void
    sched_switch(struct thread *td, struct thread *newtd, int flags)
    {
            ...
            /*
             * Switch to the sched lock to fix things up and pick
             * a new thread.
             */
            if (td->td_lock != &sched_lock) {
                    mtx_lock_spin(&sched_lock);
                    thread_unlock(td);
            }

    *** I think that td_lock was sleepqueue_chain->sc_lock.

            ...
            td->td_lastcpu = td->td_oncpu;
            td->td_flags &= ~TDF_NEEDRESCHED;
            td->td_owepreempt = 0;
            td->td_oncpu = NOCPU;

-- 
Thanks,
Kohji Okuno.
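
P.S. To illustrate the kind of lost update suspected above, here is a
minimal user-space sketch. It is not kernel code. The DEMO_* bit values
and the function lost_update_demo() are hypothetical names made up for
this illustration, and the interleaving is replayed by hand in a single
thread rather than produced by two CPUs. It assumes that
"td_flags &= ~TDF_NEEDRESCHED" is performed as a separate load, modify,
and store, and that the "|=" in thread_single() lands between the load
and the store.

```c
/* Illustrative bit values only; they are not taken from sys/proc.h. */
#define DEMO_ASTPENDING   0x1
#define DEMO_NEEDSUSPCHK  0x2
#define DEMO_NEEDRESCHED  0x4

/*
 * Replay the suspected interleaving step by step.
 * "switcher" stands for sched_switch() running for THREAD B.
 * "exiter" stands for thread_single() running for THREAD A.
 * Returns the final value of the simulated td_flags word.
 */
static int
lost_update_demo(void)
{
	int td_flags = DEMO_NEEDRESCHED;	/* state before the race */
	int tmp;

	/* switcher: load half of "td_flags &= ~NEEDRESCHED" */
	tmp = td_flags;

	/* exiter: complete update, done under the lock the switcher dropped */
	td_flags |= DEMO_ASTPENDING | DEMO_NEEDSUSPCHK;

	/* switcher: modify the stale copy and store it back */
	tmp &= ~DEMO_NEEDRESCHED;
	td_flags = tmp;

	return (td_flags);
}
```

With this ordering the store from the switcher overwrites the bits set
by the exiter, so the returned word no longer carries the suspend-check
request. That would match THREAD B's td_flags in the report, where
TDF_NEEDSUSPCHK is absent, but it is only one possible explanation.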