Date:      Thu, 20 Jul 2017 01:46:33 +0200
From:      Mark Martinec <Mark.Martinec+freebsd@ijs.si>
To:        freebsd-stable@freebsd.org
Subject:   Re: The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching
Message-ID:  <9b3563aae75aa954d7fe31ffe25e1d29@ijs.si>
In-Reply-To: <c8140f430fb2af93a6bc70a3df8cdadc@ijs.si>
References:  <e4acc16980fe65751325333870bf2b68@ijs.si> <20170717232434.GB21048@wkstn-mjohnston.west.isilon.com> <c8140f430fb2af93a6bc70a3df8cdadc@ijs.si>

More news on the matter. As reported yesterday, the locally built
kernel with options INVARIANTS and DDB works fine and somehow avoids
the trouble attaching the da (mps) disks on the LSI controller, so
today I wanted to get back to a reproducible hang - and sure enough,
reverting to the generic kernel as distributed brings back the hang.

So I tried rebuilding the kernel while experimenting with options
like DDB and INVARIANTS.

A locally built GENERIC kernel behaves the same as the original
kernel from the distribution (as installed by freebsd-update),
so no surprises there. It hangs trying to attach the first of the
da disks (after first successfully attaching all the ada disks).
Alt-ctrl-esc cannot enter the debugger once the hang occurs
(possibly because the USB keyboard is unresponsive at that point),
even though debug.kdb.break_to_debugger was set to 1 at the
loader prompt. This kernel needs loader "Safe mode" to boot.

Next, a locally built kernel with DDB and INVARIANTS works well
(the remaining options come from an included GENERIC).

Now the funny part: a locally built kernel with just the DDB
option (and the rest included from GENERIC) *also* works well.
Somehow the DDB option makes a difference, even though the kernel
debugger is never activated.
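
For anyone who wants to repeat the DDB-only test, a config along these
lines should be equivalent. This is a minimal sketch, not a copy of the
file actually used: the name DDBONLY and the amd64 path in the comment
are assumptions made for illustration.

```
# Hypothetical file: /usr/src/sys/amd64/conf/DDBONLY
# Pull in every option and device from the stock GENERIC config.
include GENERIC

# Give the kernel its own ident so it can be told apart from GENERIC.
ident   DDBONLY

# The only addition on top of GENERIC: the in-kernel debugger.
options DDB
```

Such a config would be built and installed from /usr/src with
"make buildkernel KERNCONF=DDBONLY" followed by
"make installkernel KERNCONF=DDBONLY".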

To reiterate: at the time of the hang the CPU fan starts revving up
and the USB keyboard is unresponsive (<scroll> does not enter scroll
mode, caps lock and num lock do not toggle their LED indicators,
alt-ctrl-esc does not activate the kernel debugger). Loader "Safe mode"
avoids the problem (presumably by disabling SMP).
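
One way to check the SMP guess would be to disable only SMP from the
loader prompt and leave the rest of the "Safe mode" settings alone.
A sketch, assuming kern.smp.disabled is the relevant tunable among
those that "Safe mode" applies (this has not been tried here):

```
set kern.smp.disabled=1
boot
```

If the generic kernel still hangs with only this set, then some other
"Safe mode" setting is what avoids the problem.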

Meanwhile I have successfully upgraded two other similar
hosts from 11.0 to 11.1-RC3, no surprises there (but they do not
have the same disk controller).

Not sure what to try next.

   Mark



2017-07-19 01:18, Mark Martinec wrote:
> 2017-07-18 01:24, Mark Johnston wrote:
>> Are you able to break into the debugger at this point? Try setting
>> debug.kdb.break_to_debugger=1 and debug.kdb.alt_break_to_debugger=1 at
>> the loader prompt, and hit the break key, or the key sequence
>> <CR> ~ ctrl-b once the hang occurs. At the debugger prompt, try
>> "bt" and "show allpcpu" to start.
> 
> Thank you for a prompt and good suggestion! I spent an afternoon
> fiddling with the machine, with mixed results. Your suggestion to
> break into debugger did not work, there was no reaction to <break>
> or to <CR> ~ ctrl-b.
> 
> So I embarked on rebuilding the RC3 kernel with
>   options KDB
>   options DDB
>   options BREAK_TO_DEBUGGER
>   options ALT_BREAK_TO_DEBUGGER
>   options INVARIANTS
>   options INVARIANT_SUPPORT
>   options WITNESS
>   options WITNESS_SKIPSPIN
> but then I realized the <debug> key is mapped to alt ctrl <esc>,
> which now does break into the debugger - but not as early as the
> point where the holdup occurs.
> 
> WITNESS produced some LOR warnings, but that is probably ok.
> I came across a trace just before the problem area, but it flows
> by so fast on a vt console that only the last 40 or so lines
> remain on the screen (I have a photo), and these do not seem to
> reveal much. Unfortunately this machine does not have a serial
> interface.
> 
> So in my last attempt I rebuilt a kernel with INVARIANTS but
> without WITNESS - and now I cannot reproduce the problem, with
> or without "safe mode". What is interesting here is that now
> the da0..da3 disks are attached first, and only then the ada
> disks - and even within the group of disks on the same
> controller their order has been shuffled - no idea what could
> have caused it - and it may have avoided the problem by doing so.
> 
> Will play some more with this tomorrow...
> 
>   Mark
> 
> 
>> On Tue, Jul 18, 2017 at 01:01:16AM +0200, Mark Martinec wrote:
>>> Upgrading 11.0-RELEASE-p11 to 11.1-RC3 using the usual freebsd-update
>>> upgrade method I ended up with a system which gets stuck while trying
>>> to attach the second set of disks. This happened already after the
>>> first phase of the upgrade procedure (installing and re-booting with
>>> a new kernel).
>>> 
>>> The first set of disks (ada0 .. ada2) are attached successfully, also
>>> a cd0, but then when the first of the set of four (a regular spinning
>>> disk) on an LSI controller is to be attached, the boot procedure just
>>> gets stuck there:
>>>    kernel: ada1: 300.000MB/s transfers (SATA 2.x, PIO4, PIO 8192bytes)
>>>    kernel: ada1: Command Queueing enabled
>>>    kernel: ada1: 305245MB (625142448 512 byte sectors)
>>>    kernel: ada2 at ahcich6 bus 0 scbus8 target 0 lun 0
>>>    kernel: ada2: <OCZ-VERTEX3 2.25> ATA8-ACS SATA 3.x device
>>>    kernel: ada2: Serial Number OCZ-O1L6RF591R09Z5C8
>>>    kernel: ada2: 300.000MB/s transfers (SATA 2.x, PIO4, PIO 8192bytes)
>>>    kernel: ada2: Command Queueing enabled
>>>    kernel: ada2: 114473MB (234441648 512 byte sectors)
>>>    kernel: ada2: quirks=0x1<4K>
>>>    kernel: da0 at mps0 bus 0 scbus0 target 2 lun 0
>>> 
>>> (stuck here, keyboard not responding, fans raising their pitch,
>>>   presumably CPU is spinning)
> [...]
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to 
> "freebsd-stable-unsubscribe@freebsd.org"


