Date: Thu, 15 Nov 2012 17:58:25 -0500
Message-ID: <CAFMmRNwb_rxYXHGtXgtcyVUJnFDx5PSeMmA_crBbeV_rtzL9Cg@mail.gmail.com>
Subject: stop_cpus_hard when multiple CPUs are panicking from an NMI
From: Ryan Stone <rysto32@gmail.com>
To: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>

At work we have some custom watchdog hardware that sends an NMI upon
expiry.  We've modified the kernel to panic when it receives the watchdog
NMI.  I've been trying the "stop_scheduler_on_panic" mode, and I've
discovered that when my watchdog expires, the system gets completely
wedged.  After some digging, I've discovered that multiple CPUs are
getting the watchdog NMI and trying to panic concurrently.  One of the CPUs
wins, and the rest spin forever in this code:

    /*
     * We don't want multiple CPU's to panic at the same time, so we
     * use panic_cpu as a simple spinlock.  We have to keep checking
     * panic_cpu if we are spinning in case the panic on the first
     * CPU is canceled.
     */
    if (panic_cpu != PCPU_GET(cpuid))
        while (atomic_cmpset_int(&panic_cpu, NOCPU,
            PCPU_GET(cpuid)) == 0)
            while (panic_cpu != NOCPU)
                ; /* nothing */

The system wedges when stop_cpus_hard() is called, which sends NMIs to all
of the other CPUs and waits for them to acknowledge that they are stopped
before returning.  However, the hardware will not deliver an NMI to a CPU
that is already handling an NMI, so the other CPUs that got a watchdog NMI and are
spinning will never go into the NMI handler and acknowledge that they are
stopped.

I've been able to work around this with the following hideous hack:

--- kern_shutdown.c     2012-08-17 10:25:02.000000000 -0400
+++ kern_shutdown.c     2012-11-15 17:04:10.000000000 -0500
@@ -658,11 +658,15 @@
         * panic_cpu if we are spinning in case the panic on the first
         * CPU is canceled.
         */
-       if (panic_cpu != PCPU_GET(cpuid))
+       if (panic_cpu != PCPU_GET(cpuid)) {
                while (atomic_cmpset_int(&panic_cpu, NOCPU,
-                   PCPU_GET(cpuid)) == 0)
+                   PCPU_GET(cpuid)) == 0) {
+                       atomic_set_int(&stopped_cpus, PCPU_GET(cpumask));
                        while (panic_cpu != NOCPU)
                                ; /* nothing */
+               }
+               atomic_clear_int(&stopped_cpus, PCPU_GET(cpumask));
+       }

        if (stop_scheduler_on_panic) {
                if (panicstr == NULL && !kdb_active)


But I'm hoping that somebody has some ideas on a better way to fix this
kind of problem.
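
For experimenting with the race outside the kernel, here is a rough
user-space model of the workaround above.  It is only a sketch, not kernel
code: pthreads stand in for CPUs, the winner's wait loop stands in for
stop_cpus_hard(), and the names cpu_panic(), simulate_panic(), panic_done
and all_cpus are made up for illustration.  Unlike the real panic path, the
losers here are released at the end so the threads can be joined.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAXCPU 16
#define NOCPU  (-1)

/* User-space stand-ins for the kernel globals (assumed names and types). */
static atomic_int  panic_cpu;
static atomic_uint stopped_cpus;
static atomic_int  panic_done;	/* hypothetical: lets the model terminate */
static unsigned    all_cpus;	/* hypothetical: mask of modelled CPUs */

/* One "CPU" that took the watchdog NMI and is entering panic(). */
static void *
cpu_panic(void *arg)
{
	int cpuid = (int)(intptr_t)arg;
	unsigned self = 1u << cpuid;
	int expected = NOCPU;

	if (!atomic_compare_exchange_strong(&panic_cpu, &expected, cpuid)) {
		/*
		 * Lost the race.  This CPU is already in NMI context and
		 * cannot take the stop NMI, so it acknowledges here.
		 */
		atomic_fetch_or(&stopped_cpus, self);
		while (!atomic_load(&panic_done))
			sched_yield();
		return (NULL);
	}

	/* Won the race: wait for every other CPU to report stopped. */
	unsigned others = all_cpus & ~self;
	while ((atomic_load(&stopped_cpus) & others) != others)
		sched_yield();
	atomic_store(&panic_done, 1);
	return (NULL);
}

/* Run ncpu concurrent panics; returns the id of the CPU that won. */
int
simulate_panic(int ncpu)
{
	pthread_t td[MAXCPU];
	int i, rc;

	assert(ncpu >= 1 && ncpu <= MAXCPU);
	all_cpus = (1u << ncpu) - 1;
	atomic_store(&panic_cpu, NOCPU);
	atomic_store(&stopped_cpus, 0);
	atomic_store(&panic_done, 0);

	for (i = 0; i < ncpu; i++) {
		rc = pthread_create(&td[i], NULL, cpu_panic,
		    (void *)(intptr_t)i);
		assert(rc == 0);
		(void)rc;
	}
	for (i = 0; i < ncpu; i++)
		pthread_join(td[i], NULL);
	return (atomic_load(&panic_cpu));
}
```

Build with -pthread and call simulate_panic(4) from a small main().  If the
atomic_fetch_or() in the losing path is removed, the winner never sees the
other CPUs as stopped and spins forever, which mirrors the wedge described
above.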