Date:      Wed, 5 Oct 2016 11:19:10 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Gleb Smirnoff <glebius@freebsd.org>
Cc:        Eric van Gyzen <vangyzen@freebsd.org>, src-committers@freebsd.org,  svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r306346 - head/sys/kern
Message-ID:  <20161005101932.U984@besplex.bde.org>
In-Reply-To: <20161004205600.GN23123@FreeBSD.org>
References:  <201609261530.u8QFUUZd020174@repo.freebsd.org> <20161004205600.GN23123@FreeBSD.org>

On Tue, 4 Oct 2016, Gleb Smirnoff wrote:

> On Mon, Sep 26, 2016 at 03:30:30PM +0000, Eric van Gyzen wrote:
> E> ...
> E> Modified: head/sys/kern/kern_mutex.c
> E> ==============================================================================
> E> --- head/sys/kern/kern_mutex.c	Mon Sep 26 15:03:31 2016	(r306345)
> E> +++ head/sys/kern/kern_mutex.c	Mon Sep 26 15:30:30 2016	(r306346)
> E> @@ -924,7 +924,7 @@ __mtx_assert(const volatile uintptr_t *c
> E>  {
> E>  	const struct mtx *m;
> E>
> E> -	if (panicstr != NULL || dumping)
> E> +	if (panicstr != NULL || dumping || SCHEDULER_STOPPED())
> E>  		return;
>
> I wonder if all this disjunct can be reduced just to SCHEDULER_STOPPED()?
> Positive panicstr and dumping imply scheduler stopped.

'dumping' doesn't imply SCHEDULER_STOPPED().

Checking 'dumping' here seems to be just an old bug.  It just breaks
__mtx_assert(), while all other mutex operations work normally when dumping
without panicking.

kern doesn't have this bug anywhere else.  It just has style bugs for
most references to 'dumping':

X kern_mutex.c:		 * re-enable interrupts while dumping core.

This is under another recent fix involving SCHEDULER_STOPPED().

X kern_mutex.c:	if (panicstr != NULL || dumping || SCHEDULER_STOPPED())

Broken.

X kern_shutdown.c:int dumping;				/* system is dumping */

Banal comment.

X kern_shutdown.c:	if (dumping)
X kern_shutdown.c:	dumping++;
X kern_shutdown.c:	dumping--;

Obfuscation of a boolean by manually optimizing its setting for PDP-11.
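
As a rough illustration of the point about the counter, here is a minimal
userland sketch of the same guard written with a plain boolean.  The names
dumping_flag, do_dump(), noop_dumper() and nested_dumper() are hypothetical
stand-ins invented for this example; this is not the kern_shutdown.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's global 'dumping'. */
static bool dumping_flag = false;

/*
 * Hypothetical helper: run a dump callback with the flag set.
 * The flag is only ever set and cleared, never counted.
 */
static int
do_dump(int (*dumper)(void))
{
	int error;

	assert(dumper != NULL);
	if (dumping_flag)
		return (-1);		/* already dumping: refuse to recurse */
	dumping_flag = true;		/* instead of dumping++ */
	error = dumper();
	dumping_flag = false;		/* instead of dumping-- */
	return (error);
}

/* Hypothetical dumper that does nothing and reports success. */
static int
noop_dumper(void)
{
	return (0);
}

/* Hypothetical dumper that tries to start a second dump while dumping. */
static int
nested_dumper(void)
{
	return (do_dump(noop_dumper));
}
```

With these definitions, do_dump(noop_dumper) succeeds, and
do_dump(nested_dumper) fails because the inner call sees the flag already
set.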

X kern_shutdown.c:	if ((howto & (RB_HALT|RB_DUMP)) == RB_DUMP && !cold && !dumping)

Missing spaces around binary operator.

X sched_4bsd.c:	if (panicstr != NULL || pri >= cpri || cold /* || dumping */ ||

Here the bogus test for dumping is commented out.  This is in maybe_preempt().
There is a TD_IS_INHIBITED() check.  sched_ule.c has similar code without
the commented-out 'dumping'.  Neither checks SCHEDULER_STOPPED().  It is
certainly useless to preempt if SCHEDULER_STOPPED(), but perhaps checking it
is unnecessary.

I don't like the design or implementation of SCHEDULER_STOPPED().  It is
a hack to specially break mutexes while panicking.  Panicking stops the
scheduler and tries to stop all other CPUs
    (fixed in my version to either actually stop them all, with special
    stopping for NMI handlers, or hang waiting)
and we set the flag curthread->td_stopsched to indicate that the scheduler
is specially stopped for panic.  Bugs in the implementation of this
include:
- 2 bytes are wasted in struct thread to hold the flag.  This is a
   dubious obfuscation of an old version that used a global flag
- despite this optimization, all mutex operations should be slowed down
   by testing this flag
- however, the inlined mutex operations don't test this flag.  This gives
   inconsistencies.  __mtx_assert() needed the fix in this commit to not
   detect these inconsistencies
Bugs in the design of this include:
- SCHEDULER_STOPPED() doesn't really mean that the scheduler has stopped.
   It means that we are panicking and have (tried to) stop other CPUs and
   want to force all mutex operations to silently succeed without keeping
   the mutex state consistent.

This is fragile.  Nothing can depend on mutexes working or mutex assertions
finding inconsistencies or on mtx_owned() working, so the SCHEDULER_STOPPED()
check must be done in more than just the central mutex code.  It is confusing
that this condition doesn't mean that the scheduler is stopped.  It means that
a certain state in panicking has been reached.
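
The inconsistency described above can be sketched with a toy userland model.
Everything here (struct toy_mtx, toy_stopsched, TOY_SCHEDULER_STOPPED(),
toy_lock(), toy_owned(), toy_assert_owned()) is a hypothetical name made up
for the example and only mimics the shape of the behaviour; it is not the
kern_mutex.c implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy mutex: owner is a thread id, 0 means unowned. */
struct toy_mtx {
	uintptr_t owner;
};

/* Stands in for curthread->td_stopsched (assumption for this sketch). */
static bool toy_stopsched = false;
#define	TOY_SCHEDULER_STOPPED()	(toy_stopsched)

/*
 * Lock operation that "silently succeeds" once the flag is set:
 * it returns without recording an owner, so the mutex state no
 * longer says who holds it.
 */
static void
toy_lock(struct toy_mtx *m, uintptr_t tid)
{
	assert(m != NULL);
	if (TOY_SCHEDULER_STOPPED())
		return;
	m->owner = tid;		/* uncontested case only, in this toy */
}

/* Ownership query that does NOT test the flag. */
static bool
toy_owned(const struct toy_mtx *m, uintptr_t tid)
{
	return (m->owner == tid);
}

/*
 * Ownership assertion that tests the flag first, in the spirit of
 * the change in this commit.  Returns true if the assertion passes.
 */
static bool
toy_assert_owned(const struct toy_mtx *m, uintptr_t tid)
{
	if (TOY_SCHEDULER_STOPPED())
		return (true);	/* skip the check entirely */
	return (toy_owned(m, tid));
}
```

In this model, a lock taken after toy_stopsched is set leaves toy_owned()
false for the "owner", and toy_assert_owned() only passes because it checks
the flag itself.  Any other consumer of the mutex state would need the same
check.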

Bruce


