Date:      Tue, 13 Jan 2009 00:09:24 +0000 (GMT)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Pete French <petefrench@ticketswitch.com>
Cc:        freebsd-stable@freebsd.org, drosih@rpi.edu, rblayzor.bulk@inoc.net
Subject:   Re: Big problems with 7.1 locking up :-(
Message-ID:  <alpine.BSF.2.00.0901130005130.16794@fledge.watson.org>
In-Reply-To: <E1LMS1C-0002x6-Je@dilbert.ticketswitch.com>
References:  <E1LMS1C-0002x6-Je@dilbert.ticketswitch.com>


On Mon, 12 Jan 2009, Pete French wrote:

>> I'm not sure if you've done this already, but the normal suggestions apply: 
>> have you compiled with INVARIANTS/WITNESS/DDB/KDB/BREAK_TO_DEBUGGER, and do 
>> any panics or other results occur?  Sometimes these debugging tools are 
>> able to convert hangs into panics, which gives us much more ability to 
>> debug them.
>
> OK, I have now had a machine hang again, with the correct debug options in 
> the kernel. The screen looked like this when I went to restart it:
>
> 	http://toybox.twisted.org.uk/~pete/71_lor2.png
>
> It had not, however, dropped into any kind of debugger. Also, there appear to 
> be console messages after the lock order reversal. Is that normal?

Lock order reversals are warnings of potential deadlock due to a lock cycle, 
but deadlocks may not actually result, either because it's a false positive 
(some locking construct that is deadlock free but involves lock cycles), or 
because a cycle didn't actually form.  The message is suggestive, but if you 
have significant system activity after the message, then it may be unrelated.
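
To make the warning-versus-deadlock distinction concrete, here is a toy sketch of lock-order tracking. It is not WITNESS's actual implementation: the class and method names are hypothetical, it tracks a single thread's held locks, and it treats each lock name as its own class. It records a "held before" edge each time one lock is acquired while another is held, and flags an acquisition that would close a cycle in that graph. In the usage at the bottom, one thread takes A then B, and later B then A. The checker reports a reversal even though no deadlock can happen, because only one thread ever runs; a real deadlock would need two threads to interleave those orders at the same moment.

```python
from collections import defaultdict


class LockOrderChecker:
    """Toy lock-order checker (hypothetical, single-threaded sketch)."""

    def __init__(self):
        self.edges = defaultdict(set)  # lock -> locks observed acquired after it
        self.held = []                 # locks currently held, in acquire order
        self.reversals = []            # (held_lock, new_lock) pairs flagged

    def _reaches(self, src, dst):
        # Depth-first search: is dst reachable from src in the order graph?
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False

    def acquire(self, lock):
        for h in self.held:
            if self._reaches(lock, h):
                # A recorded path lock -> ... -> h exists, so taking `lock`
                # while holding `h` closes a cycle: report a reversal
                # instead of recording the reversed edge.
                self.reversals.append((h, lock))
            else:
                self.edges[h].add(lock)
        self.held.append(lock)

    def release(self, lock):
        self.held.remove(lock)


if __name__ == "__main__":
    c = LockOrderChecker()
    c.acquire("A"); c.acquire("B"); c.release("B"); c.release("A")
    c.acquire("B"); c.acquire("A"); c.release("A"); c.release("B")
    print(c.reversals)  # [('B', 'A')]
```

The checker works from the order graph rather than from any actual blocking, which is why it can warn about cycles that never turn into a hang.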

> The machine did stay up for a significant amount of time before doing this. I 
> notice that it is more or less identical to the one I posted when I had 
> WITNESS_KDB in the kernel too, so maybe those results aren't entirely 
> spurious after all?
>
> Given it hasn't dropped to a debugger, is there anything else I can try?

Features like WITNESS and INVARIANTS may change the timing of the kernel, 
making certain race conditions less likely to occur.  I'd run with them for a 
while and see whether you can reproduce the hang with them present; if you 
can, they will make debugging the problem a lot easier.

Robert N M Watson
Computer Laboratory
University of Cambridge


