From: Bruce Evans
To: Don Bowman
Cc: current@freebsd.org
Date: Sun, 13 Jun 2004 23:45:19 +1000 (EST)
Subject: RE: kernel trap 19 with interrupts disabled: system hang

On Sun, 13 Jun 2004, Don Bowman wrote:

> ... OK, this did the trick, i got into db.
> ...
> The system was locked up, so when i pressed the key
> sequence to enter the debugger, it timed out stopping
> the other cpus. Everybody is in sched_switch and idle???
> ...
> siointr1(c554d800) at siointr1+0xd0
> db> t 0
> sched_switch(c074bfa0) at sched_switch+0x60
> mi_switch(1,0,1,c0c21d2c,c0562ba4) at mi_switch+0x1a0
> sleepq_switch(c074bde0,0,c0c21d54,c054dd12,c074bde0) at sleepq_switch+0x135
> sleepq_timedwait(c074bde0,0,23,0,0) at sleepq_timedwait+0xc
> msleep(c074bde0,0,44,c06ecd01,2710) at msleep+0x40a
> scheduler(0,c1ec00,c1e000,0,c0436065) at scheduler+0x167
> mi_startup() at mi_startup+0x96
> begin() at begin+0x2c
> db> t 1
> sched_switch(c53e0540) at sched_switch+0x60
> mi_switch(1,0,0,ed097c18,c0562b60) at mi_switch+0x1a0
> [... 4 CPUs at sched_switch+0x60]

sched_switch() holds sched_lock, which masks interrupts.  This accounts
for the processes not being stoppable.  I don't see how they can spin in
sched_switch(), or even all stop at the same place.  Perhaps they called
something that is looping and the traceback isn't showing everything.
This is most likely to happen in cpu_switch().

OTOH, IIRC there is a bug in stopping CPUs that breaks seeing where they
are stopped.  Try looking at where they are reported to be when they are
stopped for entering ddb while the system is running normally.

Bruce
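The point that a CPU holding sched_lock cannot be stopped can be illustrated with a toy model in C. This is a sketch under stated assumptions, not FreeBSD code: it assumes the debugger stops other CPUs by posting a stop request (an IPI) and waiting a bounded time for each CPU to acknowledge it, and that a CPU inside sched_switch() has interrupts masked because it holds sched_lock. The names toy_cpu, toy_cpu_step() and toy_stop_cpu() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one target CPU as seen by the CPU entering the debugger.
 * All names are hypothetical; this is not FreeBSD kernel code. */
struct toy_cpu {
    bool intr_enabled;   /* false while a spin lock such as sched_lock is held */
    bool stop_pending;   /* a stop request (IPI) has been posted to this CPU */
    bool stopped;        /* the CPU has acknowledged the stop request */
};

/* One step of the target CPU: a pending stop request is only serviced
 * when interrupts are enabled, just like any other masked interrupt. */
static void toy_cpu_step(struct toy_cpu *c)
{
    if (c->intr_enabled && c->stop_pending) {
        c->stop_pending = false;
        c->stopped = true;
    }
}

/* Post a stop request and wait a bounded number of steps for the
 * acknowledgement, mimicking a debugger that gives up with a timeout.
 * Returns true if the CPU stopped, false on timeout. */
static bool toy_stop_cpu(struct toy_cpu *c, int max_spins)
{
    assert(max_spins > 0);
    c->stop_pending = true;
    for (int i = 0; i < max_spins; i++) {
        toy_cpu_step(c);
        if (c->stopped)
            return true;
    }
    return false;   /* timed out: the request was never serviced */
}
```

A CPU modelled with intr_enabled set to false (spinning while holding sched_lock) makes toy_stop_cpu() return false however large max_spins is, which matches the "timed out stopping the other cpus" symptom; the same call on a CPU with interrupts enabled returns true on the first step.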