Date:      Wed, 19 Jul 2017 17:03:25 -0700
From:      Mark Johnston <markj@FreeBSD.org>
To:        Mark Martinec <Mark.Martinec+freebsd@ijs.si>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching
Message-ID:  <20170720000325.GB9198@wkstn-mjohnston.west.isilon.com>
In-Reply-To: <9b3563aae75aa954d7fe31ffe25e1d29@ijs.si>
References:  <e4acc16980fe65751325333870bf2b68@ijs.si> <20170717232434.GB21048@wkstn-mjohnston.west.isilon.com> <c8140f430fb2af93a6bc70a3df8cdadc@ijs.si> <9b3563aae75aa954d7fe31ffe25e1d29@ijs.si>

On Thu, Jul 20, 2017 at 01:46:33AM +0200, Mark Martinec wrote:
> More news on the matter. As reported yesterday the locally built
> kernel with options INVARIANTS and DDB works fine and somehow avoids
> the trouble at attaching the da (mps) disks on an LSI controller, so
> today I wanted to get back to a reproducible hang - and sure enough,
> reverting to the generic kernel as distributed brings back the hang.
> 
> So I tried rebuilding the kernel while experimenting with options
> like DDB and INVARIANTS.
> 
> A locally built GENERIC kernel behaves the same as the original
> kernel from the distribution (as installed by freebsd-upgrade),
> so no surprises there. It hangs trying to attach the first of the
> da disks (after first successfully attaching all the ada disks).
> Pressing Ctrl+Alt+Esc fails to enter the debugger when the hang occurs
> (possibly due to an unresponsive USB keyboard at that time),
> even though debug.kdb.break_to_debugger was set to 1 at the
> loader prompt. It needs loader "Safe mode" to be able to boot.
> 
> Next, a locally built kernel with DDB and INVARIANTS works well
> (the remaining options come from an included GENERIC).
> 
> Now the funny part: a locally built kernel with just the DDB
> option (and the rest included from GENERIC) *also* works well.
> Somehow the DDB option makes a difference, even though the kernel
> debugger is never activated.

One thing to try at this point would be to disable EARLY_AP_STARTUP in
the kernel config. That is, take a configuration with which you're able
to reproduce the hang during boot, and remove "options
EARLY_AP_STARTUP".
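
As a rough sketch, a config that inherits everything from GENERIC and
drops only that option could look like the following. The file name
and ident (NOEARLYAP) are made up for illustration, and the path
assumes an amd64 system:

    # sys/amd64/conf/NOEARLYAP (hypothetical name)
    # Start from the stock configuration that reproduces the hang.
    include         GENERIC
    ident           NOEARLYAP

    # Remove the option inherited from GENERIC.
    nooptions       EARLY_AP_STARTUP

It would then be built and installed in the usual way, e.g. with
"make buildkernel KERNCONF=NOEARLYAP" followed by "make installkernel
KERNCONF=NOEARLYAP" from /usr/src.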

This feature has a fairly large impact on the bootup process and has
had a few problems that manifested as hangs during boot. There was at
least one other case where an innocuous change to the kernel
configuration "fixed" the problem by introducing some second-order
effect (causing kernel threads to be scheduled in a different
order, for instance).

Regardless of whether the suggestion above makes a difference, it would
be helpful to see verbose dmesgs from both a clean boot and a boot that
hangs. If disabling EARLY_AP_STARTUP helps, then we can try adding some
assertions that will cause the system to panic when the hang occurs,
making it easier to see what's going on.
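
For the verbose output, either type "boot -v" at the loader prompt for
a one-off boot, or make it persistent with something like the fragment
below. On a boot that completes, the messages should end up in
/var/run/dmesg.boot. For the boot that hangs they will not reach the
disk, so capturing them probably needs a serial console or similar;
that part is an assumption about what your setup allows.

    # /boot/loader.conf
    # Ask the kernel for verbose boot messages on every boot.
    boot_verbose="YES"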

> 
> To reiterate: at the time of a hang the CPU fan starts revving up,
> and the USB keyboard is unresponsive (<scroll> does not enter scroll
> mode, caps lock and num lock do not toggle their LED indicators,
> Ctrl+Alt+Esc does not activate the kernel debugger). Loader "Safe mode"
> avoids the problem (presumably by disabling SMP).
> 
> Meanwhile I have successfully upgraded two other similar
> hosts from 11.0 to 11.1-RC3, no surprises there (but they do not
> have the same disk controller).
> 
> Not sure what to try next.
> 
>    Mark


