From: John Baldwin
To: Mark Felder
Cc: freebsd-hackers@freebsd.org, freebsd-questions@freebsd.org
Date: Thu, 31 May 2012 10:48:45 -0400
Subject: Re: Please help me diagnose this crazy VMWare/FreeBSD 8.x crash

On Wednesday, May 30, 2012 3:56:02 pm Mark Felder wrote:
> On Wed, 30 May 2012 12:17:07 -0500, John Baldwin wrote:
>
> > Humm, can you test it with 2 CPUs?
>
> We primarily only run with 1 CPU. We have seen it crash on multiple CPU
> VMs. Also, Dane Foster appeared to have been using multiple CPUs in his
> video transcoding VMs.
>
> Unfortunately I can't give you more information at the moment.
> I'm working with Dane to compile easy to follow steps that recreate
> this failure. I have not been successful in getting this to crash on
> demand in my environment, but Dane has so we're trying to recreate his.

OK. It would be really helpful if we could get a crashdump, though I
realize that may not be doable. Otherwise, full 'ps' output from DDB
during a hang would be a good start. Primarily, I want to see what the
system is doing and why it isn't running the threads that are sitting
on the run queue.

It might also be useful to enable KTR_SCHED tracing in the kernel so
that we can dump the scheduler trace with 'show ktr' from DDB when it
hangs.

-- 
John Baldwin