From owner-freebsd-hackers@FreeBSD.ORG  Thu Nov 15 23:47:40 2012
In-Reply-To: <CAJ-FndBQwO0syGpG9mSYF4tAEO8wu6vv7QKbvzQY-9uo_ZJWhA@mail.gmail.com>
References: <CAFMmRNwb_rxYXHGtXgtcyVUJnFDx5PSeMmA_crBbeV_rtzL9Cg@mail.gmail.com>
 <CAJ-FndBQwO0syGpG9mSYF4tAEO8wu6vv7QKbvzQY-9uo_ZJWhA@mail.gmail.com>
Date: Thu, 15 Nov 2012 18:47:38 -0500
Message-ID: <CAFMmRNx3Q_F02CnqHhYKF=HLMu=hhMVP2PhJscAydAFcQKU52w@mail.gmail.com>
Subject: Re: stop_cpus_hard when multiple CPUs are panicking from an NMI
From: Ryan Stone <rysto32@gmail.com>
To: Attilio Rao <attilio@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
X-Content-Filtered-By: Mailman/MimeDel 2.1.14
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>

On Thu, Nov 15, 2012 at 6:41 PM, Attilio Rao <attilio@freebsd.org> wrote:

> On Thu, Nov 15, 2012 at 10:58 PM, Ryan Stone <rysto32@gmail.com> wrote:
> > At work we have some custom watchdog hardware that sends an NMI upon
> > expiry.  We've modified the kernel to panic when it receives the watchdog
> > NMI.  I've been trying the "stop_scheduler_on_panic" mode, and I've
> > discovered that when my watchdog expires, the system gets completely
> > wedged.  After some digging, I've discovered that I have multiple CPUs
> > getting the watchdog NMI and trying to panic concurrently.  One of the CPUs
> > wins, and the rest spin forever in this code:
>
> Quick question: can you control the way your watchdog sends the NMI?
> Like only to BSP rather than broadcast, etc.
> This is tied to the very unique situation that you cannot really
> deliver the (second) NMI.
>
> Attilio

I don't believe that I can, but I can check.  In any case, I can imagine
other places where this could be an issue.  hwpmc works with NMIs, right?
So an hwpmc bug could trigger the same kind of issue if two CPUs that
concurrently called pmc_intr both tripped over the same bug.
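
To make the race concrete, here is a minimal user-space sketch of it, with
threads standing in for CPUs that all take the NMI at once.  It is not the
kernel code (the loop quoted earlier in the thread is not reproduced here),
and every name in it (sim_panic_cpu, sim_nmi_panic, sim_concurrent_panic,
and so on) is hypothetical.  It also assumes one possible way out of the
hang: a CPU that loses the race marks itself stopped, so the winner is not
left waiting on a CPU that cannot take a second NMI.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define SIM_MAX_CPUS 64
#define SIM_NO_CPU   (-1)

/* Hypothetical stand-ins for kernel state; none of these are FreeBSD names. */
static atomic_int sim_panic_cpu;   /* id of the CPU that owns the panic */
static atomic_int sim_parked_cpus; /* losers that have stopped themselves */
static atomic_int sim_winners;     /* number of CPUs that won the race */

struct sim_cpu {
	int id;
	int ncpus;
};

/* Each simulated CPU runs this when the (broadcast) watchdog NMI arrives. */
static void *
sim_nmi_panic(void *arg)
{
	struct sim_cpu *cpu = arg;
	int expected = SIM_NO_CPU;

	if (atomic_compare_exchange_strong(&sim_panic_cpu, &expected, cpu->id)) {
		/* Winner: wait until every other CPU has reported stopped. */
		atomic_fetch_add(&sim_winners, 1);
		while (atomic_load(&sim_parked_cpus) < cpu->ncpus - 1)
			sched_yield();
	} else {
		/*
		 * Loser: assumed behavior.  Instead of spinning forever,
		 * record that this CPU is stopped and leave the panic
		 * to the winner.
		 */
		atomic_fetch_add(&sim_parked_cpus, 1);
	}
	return (NULL);
}

/* Runs one simulated panic; returns the winner count, stores the parked count. */
int
sim_concurrent_panic(int ncpus, int *parked_out)
{
	pthread_t threads[SIM_MAX_CPUS];
	struct sim_cpu cpus[SIM_MAX_CPUS];
	int i, rc;

	assert(ncpus >= 1 && ncpus <= SIM_MAX_CPUS);
	atomic_store(&sim_panic_cpu, SIM_NO_CPU);
	atomic_store(&sim_parked_cpus, 0);
	atomic_store(&sim_winners, 0);

	for (i = 0; i < ncpus; i++) {
		cpus[i].id = i;
		cpus[i].ncpus = ncpus;
		rc = pthread_create(&threads[i], NULL, sim_nmi_panic, &cpus[i]);
		assert(rc == 0);
		(void)rc;
	}
	for (i = 0; i < ncpus; i++)
		pthread_join(threads[i], NULL);

	*parked_out = atomic_load(&sim_parked_cpus);
	return (atomic_load(&sim_winners));
}
```

Calling sim_concurrent_panic(4, &parked) from a small main() and building
with cc -pthread should report a single winner with the other three CPUs
parked.  If the losers spun unconditionally instead of parking themselves,
the winner's wait loop would never finish, which is the wedge described
above.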