Date:      Thu, 17 Nov 2011 12:11:07 -0500
From:      Andrew Boyer <aboyer@averesystems.com>
To:        Kostik Belousov <kostikbel@gmail.com>
Cc:        arch@freebsd.org, current@freebsd.org, avg@freebsd.org
Subject:   Re: Stop scheduler on panic
Message-ID:  <0850D6DB-386B-4588-A362-D53637D25F7D@averesystems.com>
In-Reply-To: <20111113083215.GV50300@deviant.kiev.zoral.com.ua>
References:  <20111113083215.GV50300@deviant.kiev.zoral.com.ua>


On Nov 13, 2011, at 3:32 AM, Kostik Belousov wrote:

> I was tricked into finishing the work by Andrey Gapon, who developed
> the patch to reliably stop other processors on panic.  The patch
> greatly improves the chances of getting dump on panic on SMP host.
> Several people already saw the patchset, and I remember that Andrey
> posted it to some lists.
>
> The change stops other (*) processors early upon the panic.  This way,
> no parallel manipulation of the kernel memory is performed by CPUs.
> In particular, the kernel memory map is static.  Patch prevents the
> panic thread from blocking and switching out.
>
> * - in the context of the description, other means not current.
>
> Since other threads are not run anymore, lock owner cannot release a
> lock which is required by panic thread.  Due to this, we need to fake
> a lock acquisition after the panic, which adds minimal overhead to the
> locking cost. The patch tries to not add any overhead on the fast path
> of the lock acquire.  The check for the after-panic condition was
> reduced to single memory access, done only when the quick cas lock
> attempt failed, and braced with __unlikely compiler hint.
>
> For now, the new mode of operation is disabled by default, since some
> further USB changes are needed to make USB keyboard usable in that
> environment.
>
> With the patch, getting a dump from the machine without debugger
> compiled in is much more realistic.  Please comment, I will commit the
> change in 2 weeks unless strong reasons not to are given.
>
> http://people.freebsd.org/~kib/misc/stop_cpus_on_panic.1.patch
>
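
For readers following along, the lock path described in the quoted text can be sketched roughly as below. This is a userspace illustration using C11 atomics, not the code from the patch: the names demo_lock, scheduler_stopped, demo_lock_acquire and demo_lock_release are made up for the example, and UNLIKELY relies on the GCC/Clang __builtin_expect builtin. It only shows the shape of the idea: the fast path is a single compare-and-swap with no extra check, and the after-panic flag is read only once that attempt has failed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Branch hint; assumes a GCC- or Clang-compatible compiler. */
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Hypothetical flag: set once by the panicking CPU after the others stop. */
static atomic_bool scheduler_stopped;

/* Hypothetical lock: owner is 0 when the lock is free. */
struct demo_lock {
	atomic_uintptr_t owner;
};

/*
 * Returns true if the lock was really acquired, false if the
 * acquisition was faked because the scheduler is stopped.
 */
static bool
demo_lock_acquire(struct demo_lock *l, uintptr_t self)
{
	uintptr_t expected = 0;

	/* Fast path: one compare-and-swap, no after-panic check. */
	if (atomic_compare_exchange_strong(&l->owner, &expected, self))
		return (true);

	/* Slow path: entered only after the quick attempt failed. */
	for (;;) {
		/*
		 * After a panic the owner will never run again, so
		 * waiting would hang.  Pretend the lock was taken.
		 */
		if (UNLIKELY(atomic_load(&scheduler_stopped)))
			return (false);
		expected = 0;
		if (atomic_compare_exchange_weak(&l->owner, &expected, self))
			return (true);
	}
}

/* Release only if we are the real owner; a faked acquire is a no-op. */
static void
demo_lock_release(struct demo_lock *l, uintptr_t self)
{
	uintptr_t expected = self;

	atomic_compare_exchange_strong(&l->owner, &expected, 0);
}
```

In this sketch, a thread that calls demo_lock_acquire() on a held lock after scheduler_stopped is set returns immediately without owning the lock, and the matching demo_lock_release() leaves the real owner in place.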

We have many systems running Andriy's latest version of the patch under
8.2.  I also brought in the related USB patch; without it, the system
hangs up while dumping almost every time.  With both patches in place*
it has worked flawlessly for us.

-Andrew

* - with one change: always do the critical_enter() / critical_exit().
Using spinlock_enter() blocks the software watchdog, which needs to
still be active in case the dump hangs for other reasons.  This is
obviously not ideal, but it is the best solution I have right now.  We
also stop all of the network interfaces at the beginning of boot().


--------------------------------------------------
Andrew Boyer	aboyer@averesystems.com






