Date:      Wed, 15 Dec 2004 12:36:38 +1100 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Nate Lawson <nate@root.org>
Cc:        John Baldwin <jhb@FreeBSD.org>
Subject:   Re: cvs commit: src/sys/i386/i386 vm_machdep.c
Message-ID:  <20041215115843.E45301@delplex.bde.org>
In-Reply-To: <41BF6F44.2090407@root.org>
References:  <200411300618.iAU6IkQX065609@repoman.freebsd.org> <200412141333.06213.jhb@FreeBSD.org> <41BF48D4.8080305@root.org> <200412141719.10701.jhb@FreeBSD.org> <41BF6F44.2090407@root.org>

On Tue, 14 Dec 2004, Nate Lawson wrote:

> John Baldwin wrote:
> > On Tuesday 14 December 2004 03:11 pm, Nate Lawson wrote:
> >
> >>John Baldwin wrote:
> >>...
> >>>FYI, this breaks the 'reset' command from ddb if you panic on a cpu other
> >>>than the BSP.  boot() isn't the only function that calls cpu_reset(), so
> >>>perhaps this should be reverted (same for amd64)
> >>
> >>No, I think we should move forward instead of backward.  Entering the
> >>debugger should happen on the BSP and possibly other cpus need to be
> >>stopped by panic().
> >
> > Erm, well, that's not always easy since sometimes when you panic you can't
> > talk to the other CPUs for whatever reason.  Putting back the proxy reset

The most common reason is that at least one other CPU is looping with
interrupts disabled.  Then it won't see IPIs and stop_cpus() will loop
forever.

> > doesn't hurt for now but does restore functionality in at least some cases.
> > I'd rather have that then certain hard panics not get into ddb because we
> > couldn't get onto the BSP to run ddb.

However, when another CPU doesn't stop, you can't get into ddb anyway,
since ddb always (at least in old versions) waits for all CPUs to stop.
Except when DIAGNOSTIC is configured -- then stop_cpus() has a hack
that makes it break out of the loop after a machine-dependent number
of iterations (often far too few -- 100000 memory accesses don't take
long, even for locked accesses).  I use a larger hack to make
the breakout unconditional, increase the number of iterations, and print
messages about stop/start activity.  See old mail.
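
As a rough illustration of the bounded wait described above (this is
not the actual stop_cpus() code), the following userspace C sketch polls
a mask of stopped CPUs and gives up after a caller-chosen number of
iterations, printing a message on timeout.  The names stopped_mask and
wait_for_stop(), and the one-bit-per-CPU layout, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: bit N is set once CPU N has acknowledged the stop request. */
static volatile uint32_t stopped_mask;

/*
 * Poll until every CPU in `want` has stopped, but break out after
 * `max_spins` polls instead of looping forever when a CPU never answers
 * (e.g., because it is spinning with interrupts disabled).
 */
static bool
wait_for_stop(uint32_t want, unsigned long max_spins)
{
	unsigned long i;

	assert(max_spins > 0);
	for (i = 0; i < max_spins; i++) {
		if ((stopped_mask & want) == want)
			return (true);
	}
	/* Report the failure so the operator knows which CPUs are stuck. */
	printf("wait_for_stop: timed out; stopped %#x, wanted %#x\n",
	    (unsigned)stopped_mask, (unsigned)want);
	return (false);
}
```

The caller decides what to do on a false return; the point is only that
the loop terminates and says so.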

This is not nearly enough for ddb.  E.g., multiple CPUs may hit the
same breakpoint (this is very likely if the breakpoint is at a core
function like mtx_lock()).  Then they race trying to enter ddb and
stop each other.  Since interrupts are disabled on hitting a breakpoint,
the CPUs can't possibly deliver IPIs to each other.  I use larger hacks
to add some ddb-specific locking and not try so hard to stop the
other CPUs.  See old mail.  This works if all the CPUs are trying to
enter ddb -- then they can rendezvous on the lock there -- but not if
they are looping with interrupts disabled elsewhere.
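
The rendezvous idea can be sketched in userspace C11 as follows.  This
is only an assumed shape for such a lock, not the hack referred to
above: the names ddb_owner, ddb_try_enter() and ddb_exit() are
hypothetical.  One CPU wins a compare-and-swap and becomes the debugger
owner; the losers learn that immediately and can park themselves
without needing an IPI.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define	NO_OWNER	(-1)

/* Hypothetical: id of the CPU currently inside the debugger, if any. */
static atomic_int ddb_owner = NO_OWNER;

/* Return true if `cpu` won the race and now owns the debugger. */
static bool
ddb_try_enter(int cpu)
{
	int expected = NO_OWNER;

	assert(cpu >= 0);
	return (atomic_compare_exchange_strong(&ddb_owner, &expected, cpu));
}

/* Release ownership; has no effect if `cpu` is not the current owner. */
static void
ddb_exit(int cpu)
{
	int expected = cpu;

	(void)atomic_compare_exchange_strong(&ddb_owner, &expected, NO_OWNER);
}
```

A CPU that gets false from ddb_try_enter() would spin on ddb_owner until
it is released.  As noted above, this only helps CPUs that actually
reach the lock.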

> Perhaps you could give me some pointers on what is counted on to be
> working when panic() is called?

Nothing can or should be counted on.  ddb might work if you are lucky.
This doesn't take much more than having no bugs in console i/o or ddb.

> I can't come up with a situation where
> the proxy code couldn't be used upon entry to ddb.  If there were any
> cases like this, the proxy code wouldn't work for cpu_reset() either.
> Also, in such a case, it's hard to see how ddb could be usable since it
> tries to stop other processors, which requires similar code to the proxy.

This might work in an emergency but would complicate ddb and thus increase
the chance of bugs in it.  ddb must (appear to) have no state and
(appear to) run on the CPU that it was entered on.  Fully virtualizing
it wouldn't be easy.

> Or in other words, if you have enough capability to call panic() or
> break to ddb, then you have enough resources to do an IPI and get onto
> the BSP.

You usually have enough resources to send an IPI, but less often enough
for it to have an effect.

For panic(), it's better to do any waiting for other CPUs before calling
ddb.  Both panic() and ddb should be callable/enterable from any context,
but panic() doesn't need to be transparent or return like ddb so handling
errors in it is simpler.

Bruce


