From owner-cvs-src@FreeBSD.ORG Wed Dec 15 19:56:36 2004
Message-ID: <41C096EF.7050908@root.org>
Date: Wed, 15 Dec 2004 11:56:31 -0800
From: Nate Lawson
To: Kris Kennaway
References: <200411300618.iAU6IkQX065609@repoman.freebsd.org> <41BF6F44.2090407@root.org> <20041215001034.GA60875@xor.obsecurity.org> <200412142148.48019.jhb@FreeBSD.org> <20041215151526.GA3462@xor.obsecurity.org>
In-Reply-To: <20041215151526.GA3462@xor.obsecurity.org>
cc: cvs-src@FreeBSD.org
cc: src-committers@FreeBSD.org
cc: cvs-all@FreeBSD.org
cc: John Baldwin
Subject: Re: cvs commit: src/sys/i386/i386 vm_machdep.c

Kris Kennaway wrote:
> On Tue, Dec 14, 2004 at 09:48:48PM -0500, John Baldwin wrote:
>
>>On Tuesday 14 December 2004 07:10 pm, Kris Kennaway wrote:
>>
>>>On Tue, Dec 14, 2004 at 02:55:00PM -0800, Nate Lawson wrote:
>>>
>>>>>Erm, well, that's not always easy since sometimes when you panic you
>>>>>can't talk to the other CPUs for whatever reason. Putting back the
>>>>>proxy reset doesn't hurt for now but does restore functionality in at
>>>>>least some cases. I'd rather have that than certain hard panics not
>>>>>get into ddb because we couldn't get onto the BSP to run ddb.
>>>>
>>>>Perhaps you could give me some pointers on what is counted on to be
>>>>working when panic() is called? I can't come up with a situation where
>>>>the proxy code couldn't be used upon entry to ddb. If there were any
>>>>cases like this, the proxy code wouldn't work for cpu_reset() either.
>>>>Also, in such a case, it's hard to see how ddb could be usable since it
>>>>tries to stop other processors, which requires similar code to the proxy.
>>>>
>>>>Or in other words, if you have enough capability to call panic() or
>>>>break to ddb, then you have enough resources to do an IPI and get onto
>>>>the BSP.
>>>
>>>NB: DDB often isn't usable on SMP machines these days, and will hang
>>>when a panic tries to enter it.
>>
>>Try debug.kdb.stop_cpus=0 (sysctl and tunable) to prevent KDB from trying to
>>stop the other CPUs. Another possible fix that ups@ has talked about is
>>changing IPI_STOP to use an NMI rather than a vector (you can send NMI IPIs
>>via the local APIC) so that IPI_STOP is more reliable.
>
>
> This is already set, and it doesn't always fix the problem. I often
> get overlapping panics from the other CPUs on this machine, and it
> often locks up when trying to enter DDB, or while printing the panic
> string (the other day it only got as far as 'p' before hanging).
>
> Kris

Bruce had a patch that made a better effort to get all CPUs stopped in
panic without races or hangs. Perhaps he can repost it; it's been a
while since I saw it.

--
Nate