Date:      Mon, 23 Apr 2018 15:04:31 -0700
From:      John Baldwin <jhb@freebsd.org>
To:        Mark Johnston <markj@freebsd.org>
Cc:        "Jonathan T. Looney" <jtl@freebsd.org>, cem@freebsd.org, src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r332860 - head/sys/kern
Message-ID:  <1739228.8pyHcvzasL@ralph.baldwin.cx>
In-Reply-To: <20180423180024.GC84833@raichu>
References:  <201804211705.w3LH50Dk056339@repo.freebsd.org> <CADrOrmvAxuoadBM==1EEbJc4PAPwtd-vPE4Tg-pM86CvwQnnwA@mail.gmail.com> <20180423180024.GC84833@raichu>

On Monday, April 23, 2018 02:00:24 PM Mark Johnston wrote:
> On Mon, Apr 23, 2018 at 11:12:32AM -0400, Jonathan T. Looney wrote:
> > Hi Mark,
> > 
> > Let me start by saying that I appreciate your well-reasoned response. (I
> > think) I understand your reasoning, I appreciate your well-explained
> > argument, and I respect your opinion. I just wanted to make that clear up
> > front.
> > 
> > On Sun, Apr 22, 2018 at 1:11 PM, Mark Johnston <markj@freebsd.org> wrote:
> > >
> > > > All too often, my ability to debug assertion violations is hindered
> > > > because the system trips over yet another assertion while dumping the
> > > > core. If we skip the assertion, nothing bad happens. (The post-panic
> > > > debugging code already needs to deal with systems that are
> > > > inconsistent, and it does a pretty good job at it.)
> > >
> > > I think we make a decent effort to fix such problems as they arise, but
> > > you are declaring defeat on behalf of everyone. Did you make some effort
> > > to fix or report these issues before resorting to the more drastic
> > > measure taken here?
> > 
> > We try to report or fix them as they arise. However, you don't know there
> > is a problem until you actually run into it. And, you don't run into the
> > problem until you can't get a core dump due to the assertion.
> > 
> > (And, with elusive problems, it isn't always easy to duplicate them. So,
> > fixing the assertion is sometimes "too late".)
> 
> Sure, this is true. But unless it's a problem in practice it's obviously
> preferable to keep assertions enabled. Kernel dumping itself is a
> fundamentally unreliable mechanism, but it works well enough to be
> useful. I basically never see problems with post-panic assertion
> failures, and I test the kernel dump code a fair bit. Isilon exercises
> that code quite a lot as well without any problems that I'm aware of,
> and I can't think of any reports of such assertion failures that weren't
> quickly fixed. So I'm wondering what problems exist in your specific
> environment that we might instead address surgically.
> 
> (I could very well be wrong about how widespread post-panic assertion
> failures are. We've had problems of this sort before, e.g., with the
> updated DRM graphics drivers, where the code to grab the console after a
> panic didn't work properly. There, the bandaid was to just disable that
> specific mechanism.)

I think this is actually a key question.  In my experience to date, I have not
encountered a large number of post-panic assertion failures.  Given that we
already break all locks and disable lock assertions after a panic, I'd be
curious which assertions are actually failing.  If it is a constrained set, my
inclination would be to explicitly ignore those, as we do for locking, rather
than blacklisting all of them.  However, I would be most interested in seeing
some examples of assertions that are failing.
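
To make the distinction concrete, here is a minimal userland sketch of the
two approaches.  It is not FreeBSD's actual KASSERT or lock-assertion code,
and every name in it (panicked, check_failed, MY_ASSERT,
MY_ASSERT_UNLESS_PANICKED) is hypothetical.  The idea is that only the
specific checks known to misfire after a panic opt in to being skipped,
while all other assertions stay live.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's "a panic has happened" state. */
static bool panicked = false;

/* Count failures instead of aborting so the sketch can be exercised. */
static int assert_failures = 0;

static void
check_failed(const char *msg)
{
	assert_failures++;
	fprintf(stderr, "assertion failed: %s\n", msg);
}

/* Ordinary assertion: still evaluated after a panic. */
#define	MY_ASSERT(cond, msg) do {				\
	if (!(cond))						\
		check_failed(msg);				\
} while (0)

/*
 * Targeted variant: skipped once a panic is in progress.  Only the
 * constrained set of checks known to be unreliable post-panic would
 * use this form; everything else keeps using MY_ASSERT.
 */
#define	MY_ASSERT_UNLESS_PANICKED(cond, msg) do {		\
	if (!panicked && !(cond))				\
		check_failed(msg);				\
} while (0)
```

Suppressing every assertion after a panic would amount to putting the
panicked test into MY_ASSERT itself, which is the broader change I would
prefer to avoid if the failing set turns out to be small.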

-- 
John Baldwin


