Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 22 Apr 2018 13:11:06 -0400
From:      Mark Johnston <markj@FreeBSD.org>
To:        Jonathan Looney <jonlooney@gmail.com>
Cc:        cem@freebsd.org, src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r332860 - head/sys/kern
Message-ID:  <20180422171106.GB84833@raichu>
In-Reply-To: <CADrOrmsmfjuj3_rGoVydF9ahvfuCunNyrHRs1VxKZWr=cUmLhQ@mail.gmail.com>
References:  <201804211705.w3LH50Dk056339@repo.freebsd.org> <CAG6CVpVAKAou%2B=9dv%2Bef8txmykR%2BMCpbmvteJua7ErhXvM7rCg@mail.gmail.com> <CADrOrmsmfjuj3_rGoVydF9ahvfuCunNyrHRs1VxKZWr=cUmLhQ@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Sat, Apr 21, 2018 at 04:33:33PM -0400, Jonathan Looney wrote:
> On Sat, Apr 21, 2018 at 1:53 PM, Conrad Meyer <cem@freebsd.org> wrote:
> >
> > I don't think this should be enabled by default.  Can we leave it
> > disabled by default and let consumers opt-in?
> 
> I'm willing to change the default if there's a good reason or consensus for
> that. However, it is not obvious to me that the default is actually wrong.
> 
> I think its important that we remember where we are when we hit this code.
> 
> By the time we hit this code, we've already panic'd (whether due to a
> "true" panic or a violated assertion). At this point, the system is already
> in an unknown/unexpected state and we want to preserve our ability to
> troubleshoot and/or recover. (And, since we're running a system with
> INVARIANTS, presumably the ability to troubleshoot is important.)
> 
> We may well violate a second assertion simply because the system is in an
> unknown/unexpected state. Once we do that, the question is whether we
> should try to preserve the ability to troubleshoot, or should give up and
> reset the system.
> 
> All too often, my ability to debug assertion violations is hindered because
> the system trips over yet another assertion while dumping the core. If we
> skip the assertion, nothing bad happens. (The post-panic debugging code
> already needs to deal with systems that are inconsistent, and it does a
> pretty good job at it.)

I think we make a decent effort to fix such problems as they arise, but
you are declaring defeat on behalf of everyone. Did you make some effort
to fix or report these issues before resorting to the more drastic
measure taken here?

> On the other hand, I really am not sure what you are worried might happen
> if we skip checking assertions after we've already panic'd. As far as I can
> tell, the likely worst case is that we hit a true panic of some kind. In
> that case, we're no worse off than before.
> 
> I think the one obvious exception is when we're purposely trying to
> validate the post-panic debugging code. In that case, you can change the
> sysctl/tunable to enable troubleshooting.

What about a user whose test system panics and fails to dump? With
assertions enabled, a developer has a better chance of spotting the
problem. Now we need at least one extra round trip to the user to
diagnose the problem, which may not be readily reproducible in the first
place.

> I would honestly appreciate someone explaining the dangers in disabling a
> response to assertion violations after we've already panic'd and are simply
> trying to troubleshoot, because they are not obvious to me. But, I could
> simply be missing them.

The assertions help identify code that is being executed during a dump
when it shouldn't be. In general we rely on users to opt in to running
INVARIANTS kernels because developers don't catch every single bug. With
this change it's harder to be confident in the kernel dump code. (Or in
any post-panic debugging code for that matter.)

I dislike the change and would prefer the default to be inverted. At the
very least I think we should print the assertion message rather than
returning silently from kassert_panic().



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20180422171106.GB84833>