From: Bruce Evans
Date: Wed, 15 Dec 2004 12:36:38 +1100 (EST)
To: Nate Lawson
cc: cvs-src@FreeBSD.org
cc: src-committers@FreeBSD.org
cc: cvs-all@FreeBSD.org
cc: John Baldwin
Subject: Re: cvs commit: src/sys/i386/i386 vm_machdep.c
Message-ID: <20041215115843.E45301@delplex.bde.org>
In-Reply-To: <41BF6F44.2090407@root.org>
References: <200411300618.iAU6IkQX065609@repoman.freebsd.org>
 <200412141333.06213.jhb@FreeBSD.org> <41BF48D4.8080305@root.org>
 <200412141719.10701.jhb@FreeBSD.org> <41BF6F44.2090407@root.org>

On Tue, 14 Dec 2004, Nate Lawson wrote:

> John Baldwin wrote:
> > On Tuesday 14 December 2004 03:11 pm, Nate Lawson wrote:
> >
> >>John Baldwin wrote:
> >>...
> >>>FYI, this breaks the 'reset' command from ddb if you panic on a cpu other
> >>>than the BSP.
> >>>boot() isn't the only function that calls cpu_reset(), so
> >>>perhaps this should be reverted (same for amd64)
> >>
> >>No, I think we should move forward instead of backward.  Entering the
> >>debugger should happen on the BSP and possibly other cpus need to be
> >>stopped by panic().
> >
> > Erm, well, that's not always easy since sometimes when you panic you can't
> > talk to the other CPUs for whatever reason.  Putting back the proxy reset

The most common reason is that at least one other CPU is looping with
interrupts disabled.  Then it won't see IPIs and stop_cpus() will loop
forever.

> > doesn't hurt for now but does restore functionality in at least some cases.
> > I'd rather have that then certain hard panics not get into ddb because we
> > couldn't get onto the BSP to run ddb.

However, when another CPU doesn't stop, you can't get into ddb anyway
since ddb always (at least in old versions) waits for all CPUs to stop.
Except when DIAGNOSTIC is configured -- then stop_cpus() has a hack which
makes it break out of the loop after a machine-dependent (but often far
too short -- 100000 memory accesses don't take long even for locked
accesses) number of iterations.  I use a larger hack to make the breakout
unconditional, increase the number of iterations, and print messages
about stop/start activity.  See old mail.

This is not nearly enough for ddb.  E.g., multiple CPUs may hit the same
breakpoint (this is very likely if the breakpoint is at a core function
like mtx_lock()).  Then they race trying to enter ddb and stop each other.
Since interrupts are disabled on hitting a breakpoint, the CPUs can't
possibly deliver IPIs to each other.  I use larger hacks to add some
ddb-specific locking and not try so hard to stop the other CPUs.  See old
mail.  This works if all the CPUs are trying to enter ddb -- then they can
rendezvous on the lock there -- but not if they are looping with
interrupts disabled elsewhere.
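
As a rough user-space illustration of the two ideas above (a stop loop
that gives up after a bounded number of polls, and an entry lock that
CPUs hitting the same breakpoint can rendezvous on), here is a minimal
C11 sketch.  It is not the FreeBSD code and not the hacks from the old
mail: ack_stop(), wait_for_stop(), debugger_try_enter(), debugger_exit()
and both globals are hypothetical names, and C11 atomics stand in for
whatever primitives the kernel would really use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CPUS 32
#define NO_OWNER (-1)

/* Hypothetical: bitmask of CPUs that have acknowledged a stop request. */
static atomic_uint_fast32_t stopped_mask;
/* Hypothetical: CPU currently inside the debugger, or NO_OWNER. */
static atomic_int debugger_owner = NO_OWNER;

/* A CPU calls this from its stop handler to acknowledge the request. */
void
ack_stop(int cpu)
{
	assert(cpu >= 0 && cpu < MAX_CPUS);
	atomic_fetch_or(&stopped_mask, (uint_fast32_t)1 << cpu);
}

/*
 * Poll until every CPU in `want` has acknowledged, but give up after
 * max_spins polls instead of looping forever.  Returns the mask of CPUs
 * that did not stop (0 if all of them did), so the caller can report it.
 */
uint_fast32_t
wait_for_stop(uint_fast32_t want, unsigned long max_spins)
{
	for (unsigned long i = 0; i < max_spins; i++) {
		if ((atomic_load(&stopped_mask) & want) == want)
			return (0);
	}
	return (want & ~atomic_load(&stopped_mask));
}

/*
 * Try to become the one CPU running the debugger.  Returns false if
 * another CPU already owns it; the loser should park itself instead of
 * trying to stop the winner.
 */
bool
debugger_try_enter(int cpu)
{
	int expected = NO_OWNER;

	return (atomic_compare_exchange_strong(&debugger_owner, &expected,
	    cpu));
}

/* Release ownership; does nothing if `cpu` is not the current owner. */
void
debugger_exit(int cpu)
{
	int expected = cpu;

	atomic_compare_exchange_strong(&debugger_owner, &expected, NO_OWNER);
}
```

In this sketch a CPU that loses debugger_try_enter() would call
ack_stop() and wait, while the winner calls wait_for_stop() and carries
on even if the result is nonzero, printing which CPUs never answered.
That still does nothing for a CPU spinning with interrupts disabled
elsewhere; it only keeps the entering CPU from hanging on it.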

> Perhaps you could give me some pointers on what is counted on to be
> working when panic() is called?

Nothing can or should be counted on.  ddb might work if you are lucky.
This doesn't take much more than having no bugs in console i/o or ddb.

> I can't come up with a situation where
> the proxy code couldn't be used upon entry to ddb.  If there were any
> cases like this, the proxy code wouldn't work for cpu_reset() either.
> Also, in such a case, it's hard to see how ddb could be usable since it
> tries to stop other processors, which requires similar code to the proxy.

This might work in an emergency but would complicate ddb and thus
increase the chance of bugs in it.  ddb must (appear to) have no state
and (appear to) run on the CPU that it was entered on.  Fully
virtualizing it wouldn't be easy.

> Or in other words, if you have enough capability to call panic() or
> break to ddb, then you have enough resources to do an IPI and get onto
> the BSP.

You usually have enough resources to send an IPI but less often enough
to make it have an effect.  For panic(), it's better to do any waiting
for other CPUs before calling ddb.  Both panic() and ddb should be
callable/enterable from any context, but panic() doesn't need to be
transparent or return like ddb, so handling errors in it is simpler.

Bruce