From owner-freebsd-current@FreeBSD.ORG  Sun Jun 13 13:46:14 2004
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 4684416A4CE
	for <current@FreeBSD.org>; Sun, 13 Jun 2004 13:46:14 +0000 (GMT)
Received: from mailout1.pacific.net.au (mailout1.pacific.net.au [61.8.0.84])
	by mx1.FreeBSD.org (Postfix) with ESMTP id C13B443D1D
	for <current@FreeBSD.org>; Sun, 13 Jun 2004 13:46:13 +0000 (GMT)
	(envelope-from bde@zeta.org.au)
Received: from mailproxy2.pacific.net.au (mailproxy2.pacific.net.au
	[61.8.0.87])i5DDjM4u009240;	Sun, 13 Jun 2004 23:45:22 +1000
Received: from gamplex.bde.org (katana.zip.com.au [61.8.7.246])
	i5DDjJLS028695;	Sun, 13 Jun 2004 23:45:21 +1000
Date: Sun, 13 Jun 2004 23:45:19 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@gamplex.bde.org
To: Don Bowman <don@sandvine.com>
In-Reply-To: <FE045D4D9F7AED4CBFF1B3B813C85337051D8FA9@mail.sandvine.com>
Message-ID: <20040613232622.D1173@gamplex.bde.org>
References: <FE045D4D9F7AED4CBFF1B3B813C85337051D8FA9@mail.sandvine.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: "'current@freebsd.org'" <current@FreeBSD.org>
Subject: RE: kernel trap 19 with interrupts disabled: system hang
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 13 Jun 2004 13:46:14 -0000

On Sun, 13 Jun 2004, Don Bowman wrote:

>  ... OK, this did the trick, i got into db.
> ...
> The system was locked up, so when i pressed the key
> sequence to enter the debugger, it timed out stopping
> the other cpus. Everybody is in sched_switch and idle???
> ...
> siointr1(c554d800) at siointr1+0xd0
> db> t 0
> sched_switch(c074bfa0) at sched_switch+0x60
> mi_switch(1,0,1,c0c21d2c,c0562ba4) at mi_switch+0x1a0
> sleepq_switch(c074bde0,0,c0c21d54,c054dd12,c074bde0) at sleepq_switch+0x135
> sleepq_timedwait(c074bde0,0,23,0,0) at sleepq_timedwait+0xc
> msleep(c074bde0,0,44,c06ecd01,2710) at msleep+0x40a
> scheduler(0,c1ec00,c1e000,0,c0436065) at scheduler+0x167
> mi_startup() at mi_startup+0x96
> begin() at begin+0x2c
> db> t 1
> sched_switch(c53e0540) at sched_switch+0x60
> mi_switch(1,0,0,ed097c18,c0562b60) at mi_switch+0x1a0
> [... 4 CPUs at sched_switch+0x60]

sched_switch() runs with sched_lock held, and holding a spin mutex like
sched_lock masks interrupts.  This accounts for the CPUs not being
stoppable.  I don't see how they can spin in sched_switch() or even how
they can all stop at the same place.  Perhaps they called something that
is looping and the traceback isn't showing everything.  The most likely
candidate for that is cpu_switch().  OTOH, IIRC there is a bug in
stopping CPUs that breaks seeing where they are stopped.  Try looking
at where they are reported to be when they are stopped for entering ddb
while the system is running normally.
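
To illustrate the first point, here is a toy user-space model of why a
CPU that has interrupts masked for a spin lock never acknowledges a stop
request.  It is only a sketch: it is not kernel code, all of the toy_*
names are made up for this example, and the real IPI and sched_lock
handling is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCPU 4

/* Toy model of one CPU.  None of these names are FreeBSD kernel APIs. */
struct toy_cpu {
	bool intr_enabled;	/* false while holding/spinning on a spin lock */
	bool ipi_pending;	/* stop request posted but not yet taken */
	bool stopped;		/* CPU has acknowledged the stop request */
};

/* Taking a spin lock masks interrupts for the whole hold time. */
static void
toy_spin_lock(struct toy_cpu *c)
{
	c->intr_enabled = false;
}

/* Releasing it unmasks interrupts; a pending stop request is taken then. */
static void
toy_spin_unlock(struct toy_cpu *c)
{
	assert(!c->intr_enabled);	/* must have been locked */
	c->intr_enabled = true;
	if (c->ipi_pending) {
		c->ipi_pending = false;
		c->stopped = true;
	}
}

/*
 * Post a stop request to every CPU and return how many acknowledged it.
 * A CPU with interrupts masked leaves the request pending, which is what
 * the debugger would see as a timeout.
 */
static int
toy_stop_cpus(struct toy_cpu *cpus, size_t n)
{
	int acked = 0;

	for (size_t i = 0; i < n; i++) {
		cpus[i].ipi_pending = true;
		if (cpus[i].intr_enabled) {
			cpus[i].ipi_pending = false;
			cpus[i].stopped = true;
		}
		if (cpus[i].stopped)
			acked++;
	}
	return (acked);
}
```

In this model, with one of the four CPUs inside toy_spin_lock(),
toy_stop_cpus() returns 3, and the remaining CPU stops only after it
calls toy_spin_unlock().  A CPU that loops forever with the lock held
therefore never stops.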

Bruce