Date:      Tue, 18 Jul 2017 01:01:16 +0200
From:      Mark Martinec <Mark.Martinec+freebsd@ijs.si>
To:        freebsd-stable@freebsd.org
Cc:        freebsd-hackers@freebsd.org
Subject:   The 11.1-RC3 can only boot and attach disks in "Safe mode", otherwise gets stuck attaching
Message-ID:  <e4acc16980fe65751325333870bf2b68@ijs.si>

Upgrading 11.0-RELEASE-p11 to 11.1-RC3 using the usual freebsd-update
upgrade method, I ended up with a system which gets stuck while trying
to attach the second set of disks. This happened right after the first
phase of the upgrade procedure (installing the new kernel and rebooting
with it).

The first set of disks (ada0 .. ada2) is attached successfully, as is
cd0, but when the first of the set of four (a regular spinning disk)
on an LSI controller is to be attached, the boot procedure just gets
stuck there:

   kernel: ada1: 300.000MB/s transfers (SATA 2.x, PIO4, PIO 8192bytes)
   kernel: ada1: Command Queueing enabled
   kernel: ada1: 305245MB (625142448 512 byte sectors)
   kernel: ada2 at ahcich6 bus 0 scbus8 target 0 lun 0
   kernel: ada2: <OCZ-VERTEX3 2.25> ATA8-ACS SATA 3.x device
   kernel: ada2: Serial Number OCZ-O1L6RF591R09Z5C8
   kernel: ada2: 300.000MB/s transfers (SATA 2.x, PIO4, PIO 8192bytes)
   kernel: ada2: Command Queueing enabled
   kernel: ada2: 114473MB (234441648 512 byte sectors)
   kernel: ada2: quirks=0x1<4K>
   kernel: da0 at mps0 bus 0 scbus0 target 2 lun 0

(stuck here, keyboard not responding, fans rising in pitch,
  presumably the CPU is spinning)

(instead of the normal continuation like:
   kernel: da0: <ATA ST32000645NS 0004> Fixed Direct Access SPC-4 SCSI device
   kernel: da0: Serial Number ....
   kernel: da0: 600.000MB/s transfers
   kernel: da0: Command Queueing enabled
   kernel: da0: 1907729MB (3907029168 512 byte sectors)
)

The controller for da0 .. da3 is an LSI:

   kernel: mps0: <Avago Technologies (LSI) SAS2004> port 0x4000-0x40ff mem 0xd1740000-0xd1743fff,0xd1300000-0xd133ffff irq 16 at device 0.0 on pci1
   kernel: mps0: Firmware: 14.00.01.00, Driver: 21.02.00.00-fbsd
   kernel: mps0: IOCCapabilities: 185c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,IR>
[...]
   kernel: mps0: SAS Address for SATA device = a4a4843003d0cf79
   kernel: mps0: SAS Address from SATA device = a4a4843003d0cf79
   kernel: mps0: SAS Address for SATA device = d3d48904eddff0d5
   kernel: mps0: SAS Address from SATA device = d3d48904eddff0d5
[...]
   kernel: mps0: SAS Address for SATA device = 2a021c07585c665b
   kernel: mps0: SAS Address from SATA device = 2a021c07585c665b
   kernel: mps0: SAS Address for SATA device = 2a021c0758637b7c
   kernel: mps0: SAS Address from SATA device = 2a021c0758637b7c

This host in this configuration worked perfectly well with 11.0 and
many older versions of the OS.

After some frustration I found out that the system boots fine
if the boot loader option "Safe mode" is set. This way I successfully
finished the upgrade procedure (installing world).

Playing with the loader options that "Safe mode" turns on
( /boot/menu-commands.4th ), it seems that kern.smp.disabled=1
is the crucial option, although my attempts at ruling out the remaining
options of "Safe mode" turned out inconclusive - perhaps there
is some randomness or a race condition involved. Anyway, in "Safe mode"
the machine always boots normally and attaches all disks.
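
For anyone who wants to try only the SMP part of "Safe mode", the
tunable can presumably be set on its own, either with
"set kern.smp.disabled=1" at the loader prompt before "boot", or
persistently in /boot/loader.conf. A minimal sketch, assuming that
disabling SMP alone is sufficient (which my tests above did not
conclusively establish):

```
# /boot/loader.conf
# Assumption: this single tunable is the relevant part of "Safe mode".
# It restricts the kernel to one CPU, so treat it as a diagnostic
# workaround rather than a fix.
kern.smp.disabled="1"
```

Removing the line again (or running "unset kern.smp.disabled" at the
loader prompt) restores normal multiprocessor boot.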

This experience is much like the one described in:
   https://forums.freebsd.org/threads/56524/
where the poster ended up disabling SMP to get a working host.

It is also somewhat similar to:
   https://lists.freebsd.org/pipermail/freebsd-hackers/2017-July/051258.html
where a FreeBSD 11.1 prerelease only boots on a single-CPU AWS host,
but fails to boot on a 2-core CPU, with various symptoms, including:
   ( https://lists.freebsd.org/pipermail/freebsd-hackers/2017-July/051260.html )
   Feeding entropy: .
   spin lock 0xffffffff80db45c0 (smp rendezvous) held by 0xfffff80004378560
   (tid 100074) too long
   timeout stopping cpus
   panic: spin lock held too long

Please advise, thanks
   Mark


