Date: Thu, 07 Dec 2000 19:39:41 -0800 (PST)
From: John Baldwin
To: The Hermit Hacker
Cc: freebsd-current@FreeBSD.org
Subject: Current Broken!

On 08-Dec-00 The Hermit Hacker wrote:
>
> Just upgraded the kernel, rebooted and it hung/panic'd with:
>
> panic: spin lock sched lock held by 0x0xc02a73el for > 5 seconds
> cpuid = 1; lapic.id = 01000000
> Debugger("panic")
>
> I have DDB enabled, and ctl-alt-esc doesn't break to the debugger, so its
> totally hung here ...
>
> dual-cpu celeron, smp enabled ...
>
> Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
> Systems Administrator @ hub.org
> primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org

Yes.  Something is broken with mutexes for the non-I386_CPU (and thus for SMP)
case in -current with the latest commit to i386/include/mutex.h.  Of course,
you can revert that commit and then your kernel won't compile...  In the code
I've looked at so far, it looks like possibly a weird register allocation bug
in gcc and/or another weird nuance in the register constraints.
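
For anyone not used to staring at register constraints, here is a rough sketch of the kind of inline-asm compare-and-swap the non-I386_CPU case depends on. This is NOT the code from mutex.h: cmpset_ptr() and cmpset_selfcheck() are made-up names, and it assumes a GCC-compatible compiler (on non-x86 targets it falls back to the __atomic builtin). The point is only that every register the instruction reads or overwrites has to be described to the compiler. cmpxchg rewrites the accumulator when the compare fails, so that operand is declared read-write.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from mutex.h): atomically store 'newval'
 * into *dst if *dst currently equals 'expected'.  Returns 1 on success,
 * 0 if the compare failed and *dst was left alone.
 */
static int
cmpset_ptr(volatile uintptr_t *dst, uintptr_t expected, uintptr_t newval)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
	unsigned char ok;

	__asm__ __volatile__(
	    "lock; cmpxchg %3, %1\n\t"	/* compare accumulator with *dst */
	    "sete %0"			/* ok = 1 if they matched */
	    : "=q" (ok),		/* byte-addressable register */
	      "+m" (*dst),		/* lock word is read and written */
	      "+a" (expected)		/* accumulator: read, and overwritten
					   with *dst if the compare fails */
	    : "r" (newval)		/* any general-purpose register */
	    : "memory", "cc");		/* compiler barrier; flags change */
	return (ok);
#else
	/* Assumption: GCC/Clang __atomic builtins are available. */
	return (__atomic_compare_exchange_n(dst, &expected, newval, 0,
	    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST) ? 1 : 0);
#endif
}

/* Small self-check: returns 1 if both cases behave as described. */
static int
cmpset_selfcheck(void)
{
	volatile uintptr_t lock = 0x1000;	/* pretend owner cookie */
	int released, refused;

	released = cmpset_ptr(&lock, 0x1000, 0);	/* matches: swapped */
	assert(lock == 0);
	refused = cmpset_ptr(&lock, 0x1000, 0);		/* stale: refused */
	assert(lock == 0);
	return (released == 1 && refused == 0);
}
```

If the accumulator were declared as a plain input instead of "+a", the compiler would be free to assume it still holds the old value after the asm, which is the sort of nuance that produces a stale register later on.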

In the specific case I am looking at, the mtx_exit() of Giant in STOPEVENT in
syscall2() failed to properly release an unrecursed, uncontested Giant and
fell through to mtx_exit_hard(), which assumes that Giant is either recursed
or contested.  When I disassembled the kernel and looked at the code, gcc had
assumed that after it looked up curproc for the mtx_enter() operation (which
executed ok as far as I can tell), it could leave the value of curproc cached
in %edi _across_ the call to the stopevent() function.  My only guess is that
%edi was clobbered during stopevent(), causing the cmpxchgl to fail and
throwing the code into mtx_exit_hard() when it shouldn't have. :(

If anyone is an expert at register constraints, etc., please feel free to look
at the macros in src/sys/i386/include/mutex.h

--

John Baldwin -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
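
P.S. For readers following along, the failure mode guessed at above can be modelled in portable C11. This is only a sketch under stated assumptions: struct sketch_mtx and the sketch_*() functions are invented for illustration, the lock word simply holds an owner cookie, and the "hard" path here just counts how often it is taken, where the real mtx_exit_hard() would go on to treat the lock as recursed or contested. The release succeeds inline only if the owner value handed to the compare still matches what was stored at acquire time.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define	LOCK_UNOWNED	((uintptr_t)0)

/* Hypothetical stand-in for a mutex: one owner word plus a counter. */
struct sketch_mtx {
	_Atomic uintptr_t lock;	/* owner cookie, or LOCK_UNOWNED */
	int hard_exits;		/* times the slow release path ran */
};

/* Uncontested acquire: swap UNOWNED for our owner cookie. */
static void
sketch_enter(struct sketch_mtx *m, uintptr_t owner)
{
	uintptr_t expected = LOCK_UNOWNED;
	int ok;

	ok = atomic_compare_exchange_strong(&m->lock, &expected, owner);
	assert(ok);		/* the sketch never contends */
	(void)ok;
}

/* Slow path: only counted here; the real one handles recursion etc. */
static void
sketch_exit_hard(struct sketch_mtx *m)
{
	m->hard_exits++;
	atomic_store(&m->lock, LOCK_UNOWNED);
}

/* Fast release: works only if 'owner' is what sketch_enter() stored. */
static void
sketch_exit(struct sketch_mtx *m, uintptr_t owner)
{
	uintptr_t expected = owner;

	if (!atomic_compare_exchange_strong(&m->lock, &expected,
	    LOCK_UNOWNED))
		sketch_exit_hard(m);
}

/*
 * Acquire with 'real', release with 'cached' (the value the compiler
 * believed was still sitting in a register).  Returns the number of
 * times the slow path was taken.
 */
static int
sketch_run(uintptr_t real, uintptr_t cached)
{
	struct sketch_mtx m;

	atomic_init(&m.lock, LOCK_UNOWNED);
	m.hard_exits = 0;
	sketch_enter(&m, real);
	sketch_exit(&m, cached);
	return (m.hard_exits);
}
```

With matching values, sketch_run(0xc0de, 0xc0de) never touches the slow path; with a clobbered cached value, sketch_run(0xc0de, 0xdead) takes it once, even though the lock was neither recursed nor contested.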