From owner-freebsd-hackers@FreeBSD.ORG  Tue Sep 20 14:47:03 2005
Date: Tue, 20 Sep 2005 15:47:02 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
X-X-Sender: robert@fledge.watson.org
To: Koen Martens <fbsd@metro.cx>
In-Reply-To: <432F1310.80007@metro.cx>
Message-ID: <20050920153806.F34322@fledge.watson.org>
References: <2B3B2AA816369A4E87D7BE63EC9D2F269B7B4D@SDCEXCHANGE01.ad.amcc.com>
	<432F1310.80007@metro.cx>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-hackers@freebsd.org, Dimitry Andric <dimitry@andric.com>,
	Vinod Kashyap <vkashyap@amcc.com>
Subject: Re: panic in propagate_priority w/ postgresql under heavy load


On Mon, 19 Sep 2005, Koen Martens wrote:

> Without the debug stuff in the kernel, it crashed within 2 days, same 
> story: postgresql process, function propagate_priority. However, no dump 
> was written to disk :(
>
> Furthermore, i've been seeing the same crash (in propagate_priority) on 
> another box in mysql processes. Both servers seem to panic every 2-3 
> days. I have another server of the exact same hardware configuration, 
> but it is mainly idling most of the time. Haven't seen that one crash 
> yet.
>
> I am thinking now that it is a bug in the twa driver, so i'll have to 
> dig in to that. Furthermore, it seems to have to do with some sort of 
> concurrency issue or otherwise timing-sensitive issue, because slowing 
> the kernel down with debug code seems to avoid the panic. But, as i am 
> completely new to the freebsd kernel and don't even know what turnstiles 
> are, i imagine i will have a hard time. So if anyone can offer some 
> help, please :)
>
> Ok, thanks for your attention,

I can't speak to the problem with the core dumps, as it sounds like that 
is device/firmware related.  However, I probably can lend a hand in 
debugging the problems you're seeing.

First off, propagate_priority() is part of the priority propagation 
mechanism associated with mutexes, which are a locking primitive in the 
FreeBSD kernel.  Most panics in propagate_priority() are actually the 
result of a corrupted mutex: when the mutex code goes to perform 
priority propagation, it trips over bad pointers and panics in some form 
or another.  Often, this means the actual panic or failure did not 
occur in the thread that prints out the panic you see, but in another 
thread.  So the first task on hitting a propagate_priority() panic is to 
identify the thread that actually had the problem.
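
To make the failure mode concrete, here is a toy user-space model of 
priority propagation.  It is only a sketch and not the kernel's code: 
the toy_* names, the structure layouts and the priority values are all 
made up for illustration.  The point is that the walk dereferences the 
lock's owner pointer, so a lock scribbled on by one thread makes a 
different, innocent thread fault while propagating.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; every name here is hypothetical, not a kernel API. */
struct toy_lock;

struct toy_thread {
    int priority;                 /* lower value = more important */
    struct toy_lock *blocked_on;  /* lock being waited for, or NULL */
};

struct toy_lock {
    struct toy_thread *owner;     /* thread holding the lock, or NULL */
};

/*
 * Lend the waiter's priority to the chain of lock owners.
 * Returns the number of threads whose priority was raised.
 */
static int
toy_propagate_priority(struct toy_thread *waiter)
{
    struct toy_thread *td = waiter;
    int pri, boosted = 0;

    assert(waiter != NULL);
    pri = waiter->priority;

    while (td->blocked_on != NULL) {
        /*
         * If the lock's memory was corrupted, this is the load that
         * picks up a bad owner pointer, in the waiter's context.
         */
        struct toy_thread *owner = td->blocked_on->owner;

        if (owner == NULL || owner->priority <= pri)
            break;                /* nothing (more) to lend */
        owner->priority = pri;
        boosted++;
        td = owner;               /* the owner may itself be blocked */
    }
    return (boosted);
}

/* Build high -> lock a -> mid -> lock b -> low, then propagate. */
int
toy_demo(void)
{
    struct toy_thread low  = { 200, NULL };
    struct toy_thread mid  = { 120, NULL };
    struct toy_thread high = {  80, NULL };
    struct toy_lock a = { &mid };
    struct toy_lock b = { &low };

    high.blocked_on = &a;
    mid.blocked_on = &b;
    (void)toy_propagate_priority(&high);
    return (low.priority);        /* low inherits high's priority */
}
```

Calling toy_demo() from a small main() walks a two-lock chain and 
returns the priority that the lowest-priority thread ends up with.  If 
you overwrite a.owner with garbage before the call, the crash happens 
inside toy_propagate_priority(), even though that function did nothing 
wrong.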

Usually, I do this from DDB, rather than a core dump, because I find that 
DDB's tools for inspecting running state are a little easier to use.  First, 
I identify what code called into the mutex call that resulted in 
propagate_priority() being called.  The reason to do this is that what you 
want to do next is use "ps" and "trace" to identify other 
processes/threads in the same code, and hence likely to have caused a 
problem with the mutex storage in memory.  Generally, you're looking for a 
panic in another thread, so once you identify a set of threads that might 
be to blame, you can trace them to find one that is in panic().  Usually, 
that thread will be in the RUN state, or on an SMP box, possibly running 
on another CPU.  If you're running 6.x, the thread that panicked was 
likely preempted as it had problems, perhaps due to an untimely interrupt.

If you want to do this by e-mail so we can lend a hand, you probably want 
to hook up a serial console so you can copy and paste the debugging 
session.  Compile DDB into the kernel (this should have no performance 
overhead), and when the system panics, you'll (ideally) get a db> prompt. 
The panic message and any related context (such as trap information) are 
useful.  I usually then use "show pcpu" to see what CPU I'm running on, 
the thread that's running, etc.  I'll then use "trace" with no argument to 
see the stack of the thread.  If I'm trying to find another thread that 
may have been preempted, I'll use "ps" to show the running processes and 
threads, then "trace <pid>" to trace the main thread of processes that 
look interesting: generally those in the RUN state, because a preempted 
thread will still be runnable.

If you're running on an SMP system, you may occasionally find that the 
information needed to inspect the stacks of threads currently running on 
other processors is not consistently in memory -- e.g., it is still 
cached, the stack frame is only partially written, or whatever.  There's a 
kernel option, 
KDB_STOP_NMI, which when combined with a sysctl, will cause the debugger 
to deliver an NMI IPI instead of a debug IPI, which may help kick those 
processors into the debugger if they are stuck in spin locks.  However, 
the chances are fairly good this isn't the case so you're probably fine 
without it.
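
For reference, the kernel configuration lines for the options mentioned 
above would look roughly like the fragment below.  This is a sketch: the 
KDB line is an assumption about what DDB needs on recent releases, and 
the sysctl that pairs with KDB_STOP_NMI is deliberately not named here, 
so check the NOTES files for your release before relying on any of it.

```
options 	KDB		# kernel debugger framework (assumed required by DDB)
options 	DDB		# interactive debugger; gives the db> prompt
options 	KDB_STOP_NMI	# optional, SMP only: stop other CPUs via NMI
```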

Robert N M Watson