Date:      Mon, 2 Mar 2020 10:57:39 -0800
From:      David Christensen <dpchrist@holgerdanske.com>
To:        freebsd-questions@freebsd.org
Subject:   Re: ZFS i/o error on boot unable to start system
Message-ID:  <5457ae0b-326f-8e7a-eb7f-76dc43c02c8c@holgerdanske.com>
In-Reply-To: <3a085472-cfe5-5166-9426-db8ae3344ab8@sentex.net>
References:  <eb8f8f32fcf5559774daf3a772a1ad2e.squirrel@webmail.harte-lyne.ca> <fcc9f93f-3680-d000-840c-a9be86a53ceb@holgerdanske.com> <3a085472-cfe5-5166-9426-db8ae3344ab8@sentex.net>

On 2020-03-02 07:28, mike tancsa wrote:
> On 2/28/2020 2:04 PM, David Christensen wrote:
>>
>> The most likely explanation is that you broke rc.conf.
>>
> I dont think he is getting that far. This looks like the kernel etc are
> not even loaded yet.


I agree.  I spent many hours yesterday battling the same error 
message again.  The message appears to be generated late in the boot 
process, when the (third stage?) bootloader or kernel (?) is trying to 
mount root and/or other ZFS virtual devices.  Therefore, both root and 
/etc/rc.conf are unavailable when the message is printed to the console.


It appears that ZFS writes metadata to disk (disks?) that is read by 
(one or more stages of) the bootloader and/or kernel upon the next boot. 
If conditions during the next boot are "too different" from the 
conditions that existed when the metadata was written, the error message 
is printed and the boot process stops.
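

If you want to inspect that metadata, zdb(8) can dump the cached pool 
configuration.  A minimal example, assuming the default FreeBSD cache 
file location:

   zdb -C -U /boot/zfs/zpool.cache

That prints the pool/vdev configuration (including device paths) 
recorded at the last import, which you can compare against the 
hardware actually present at the next boot.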


So far, I have found a few techniques for dealing with this problem:

1.  Power off, remove all data disks, and boot.  This can work, sometimes.

2.  Power off, remove all disks, and boot the FreeBSD USB installer into 
single user mode.  I believe this has always worked.  Once at the 
installer root prompt, remount the installer root filesystem read-write, 
import the problem disk's bootpool, delete /boot/zfs/zpool.cache on the 
problem disk, export the bootpool, power off, unplug the installer 
media, and boot (a rough command sketch follows this list).  Again, 
sometimes.

3.  Reverse the ordering of the SATA port connections, so that the 
system disk is at one end or the other -- e.g. if my motherboard has 
ports SATA0 through SATA5, put the system disk at SATA0 if it was at 
SATA5, or put the system disk at SATA5 if it was at SATA0.  (Finding the 
first and last ports gets more complex when you have multiple chips 
and/or HBA's, and can change when you add more disks.)  Again, sometimes.

4.  If all else fails, move the system disk to another computer and 
start over.  (Or build a new one in the target computer.)
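

For technique 2, a rough sketch of the commands at the installer's 
single-user prompt, assuming the pool is named "bootpool" and its 
dataset mounts at /bootpool under the altroot (pool name and paths 
will differ with your layout):

   mount -u -o rw /                            # remount installer root read-write
   mkdir -p /tmp/bp
   zpool import -f -o altroot=/tmp/bp bootpool
   rm /tmp/bp/bootpool/boot/zfs/zpool.cache    # path depends on the pool's mountpoint
   zpool export bootpool
   shutdown -p now                             # then unplug the installer media and boot

This is only an outline of the procedure described above; adjust the 
pool name and cache file path to match your disks.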


By executing permutations of the above techniques, I was eventually able 
to get both computers to boot with one bulk data pool in each.


The dimensions of this problem appear to be:

1.  Firmware -- BIOS (myself, OP?) vs. (U)EFI

2.  Partitioning -- MBR (myself) vs. GPT (OP?)

3.  SATA port ordering -- system disk on first SATA port (unknown) vs. 
system disk on last SATA port (unknown)

4.  Boot file system -- UFS vs. ZFS boot (myself, OP?)

5.  Root file system -- UFS vs. ZFS root (myself, OP?)

6.  Number of VDEV's -- multiple devices in one VDEV with everything 
(OP?) vs. one device with a boot partition/VDEV and a root 
partition/VDEV, plus multiple devices in yet another VDEV with bulk 
data (myself).


So, 2**6 = 64 combinations using just the dimensions and choices above 
(and there are more).  My case may have some overlap with the OP's 
and/or the various respondents', but the differences make 
troubleshooting difficult, because what works for one combination may 
not apply to, or may not work for, another.


My next battle will be inserting and removing additional disks for 
rotating backups...


David


