Date:      Wed, 15 Dec 2004 11:56:31 -0800
From:      Nate Lawson <nate@root.org>
To:        Kris Kennaway <kris@obsecurity.org>
Cc:        John Baldwin <jhb@FreeBSD.org>
Subject:   Re: cvs commit: src/sys/i386/i386 vm_machdep.c
Message-ID:  <41C096EF.7050908@root.org>
In-Reply-To: <20041215151526.GA3462@xor.obsecurity.org>
References:  <200411300618.iAU6IkQX065609@repoman.freebsd.org> <41BF6F44.2090407@root.org> <20041215001034.GA60875@xor.obsecurity.org> <200412142148.48019.jhb@FreeBSD.org> <20041215151526.GA3462@xor.obsecurity.org>

Kris Kennaway wrote:
> On Tue, Dec 14, 2004 at 09:48:48PM -0500, John Baldwin wrote:
> 
>>On Tuesday 14 December 2004 07:10 pm, Kris Kennaway wrote:
>>
>>>On Tue, Dec 14, 2004 at 02:55:00PM -0800, Nate Lawson wrote:
>>>
>>>>>Erm, well, that's not always easy since sometimes when you panic you
>>>>>can't talk to the other CPUs for whatever reason.  Putting back the
>>>>>proxy reset doesn't hurt for now but does restore functionality in at
>>>>>least some cases.  I'd rather have that than certain hard panics not
>>>>>get into ddb because we couldn't get onto the BSP to run ddb.
>>>>
>>>>Perhaps you could give me some pointers on what is counted on to be
>>>>working when panic() is called?  I can't come up with a situation where
>>>>the proxy code couldn't be used upon entry to ddb.  If there were any
>>>>cases like this, the proxy code wouldn't work for cpu_reset() either.
>>>>Also, in such a case, it's hard to see how ddb could be usable since it
>>>>tries to stop other processors, which requires similar code to the proxy.
>>>>
>>>>Or in other words, if you have enough capability to call panic() or
>>>>break to ddb, then you have enough resources to do an IPI and get onto
>>>>the BSP.
>>>
>>>NB: DDB often isn't usable on SMP machines these days, and will hang
>>>when a panic tries to enter it.
>>
>>Try debug.kdb.stop_cpus=0 (sysctl and tunable) to prevent KDB from trying to 
>>stop the other CPUs.  Another possible fix that ups@ has talked about is 
>>changing IPI_STOP to use an NMI rather than a vector (you can send NMI IPIs 
>>via the local APIC) so that IPI_STOP is more reliable.
> 
> 
> This is already set, and it doesn't always fix the problem.  I often
> get overlapping panics from the other CPUs on this machine, and it
> often locks up when trying to enter DDB, or while printing the panic
> string (the other day it only got as far as 'p' before hanging).
> 
> Kris

Bruce had a patch that made a better effort to get all CPUs stopped in
panic without races or hangs.  Perhaps he can repost it; it's been a
while since I saw it.

-- 
Nate


