Date: Tue, 13 Jan 2009 12:42:09 +0000 (GMT)
From: Robert Watson
To: Pete French
Cc: freebsd-stable@freebsd.org, drosih@rpi.edu, rblayzor.bulk@inoc.net
Subject: Re: Big problems with 7.1 locking up :-(

On Tue, 13 Jan 2009, Pete French wrote:

>> Features like WITNESS and INVARIANTS may change the timing of the kernel,
>> making certain race conditions less likely; I'd run with them for a bit and
>> see if you can reproduce the hang with them present, as they will make
>> debugging the problem a lot easier, if it's possible.
>
> Uh, the above *was* me reproducing the hang with them present ;-)) It quite
> happily hangs with those things in the kernel - indeed the next hang was
> immediately after I rebooted the machine. But even with WITNESS and
> INVARIANTS and all the rest it does not drop to a debugger, it simply locks
> up.
>
> That machine is currently turned off, but still has 7.1 installed. What
> would you like me to try now? I have a lockup I can reproduce pretty
> reliably now (just wait and it will always lock up). I also found that my
> other 7.1 box locks up fairly reliably when doing a buildworld.
>
> The only difference between these two machines and the ones which don't
> lock up is that these are serving DNS. The others don't. Note that all the
> hardware is identical, as is the installed software and the configuration.

If you have BREAK_TO_DEBUGGER compiled into the kernel, try pressing
ctrl-alt-break on the console to see if you can drop into the debugger, or
issue a serial break on a serial console. For reasons that are somewhat
complicated to explain, serial breaks are more effective at getting into the
debugger, so they are preferable; a serial console also makes it easier to
log output from the debugger.

If you are able to get into the debugger, the output of the usual commands
would be most helpful, especially if you can log the results:

    ps
    show lockedvnods
    show alllocks

Robert N M Watson
Computer Laboratory
University of Cambridge
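For reference, a kernel configuration fragment enabling the debugger facilities discussed in this thread might look like the following. This is a sketch that assumes a custom FreeBSD 7.x kernel configuration file. Check the option names against the NOTES file for your release before building.

```
# Kernel debugger framework plus the interactive DDB backend
options 	KDB
options 	DDB
# Allow a BREAK on a serial console to enter the debugger
options 	BREAK_TO_DEBUGGER
# Lock-order verification; also provides lock state for "show alllocks"
options 	WITNESS
# Additional kernel consistency checks (INVARIANTS needs INVARIANT_SUPPORT)
options 	INVARIANTS
options 	INVARIANT_SUPPORT
```

Once this kernel is built and installed, a serial break on the console should bring up the db> prompt, where you can run ps, show lockedvnods and show alllocks.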