Date: Mon, 2 Mar 2020 10:57:39 -0800
From: David Christensen <dpchrist@holgerdanske.com>
To: freebsd-questions@freebsd.org
Subject: Re: ZFS i/o error on boot unable to start system
Message-ID: <5457ae0b-326f-8e7a-eb7f-76dc43c02c8c@holgerdanske.com>
In-Reply-To: <3a085472-cfe5-5166-9426-db8ae3344ab8@sentex.net>
References: <eb8f8f32fcf5559774daf3a772a1ad2e.squirrel@webmail.harte-lyne.ca> <fcc9f93f-3680-d000-840c-a9be86a53ceb@holgerdanske.com> <3a085472-cfe5-5166-9426-db8ae3344ab8@sentex.net>
On 2020-03-02 07:28, mike tancsa wrote:
> On 2/28/2020 2:04 PM, David Christensen wrote:
>>
>> The most likely explanation is that you broke rc.conf.
>>
> I dont think he is getting that far. This looks like the kernel etc are
> not even loaded yet.

I agree. I spent many hours yesterday battling the same error message again.

The message appears to be generated late in the boot process, when the (third stage?) bootloader or kernel (?) is trying to mount root and/or other ZFS virtual devices. Therefore, both root and /etc/rc.conf are unavailable when the message is printed to the console.

It appears that ZFS writes metadata to disk (disks?) that is read by (one or more stages of) the bootloader and/or kernel upon the next boot. If conditions during the next boot are "too different" from the conditions that existed when the metadata was written, the error message is printed and the boot process stops.

So far, I have found a few techniques for dealing with this problem:

1. Power off, remove all data disks, and boot. This can work, sometimes.

2. Power off, remove all disks, and boot the FreeBSD USB installer into single user mode. I believe this has always worked. Once at the installer root prompt, remount the installer root filesystem read-write, import the problem disk bootpool, delete /boot/zfs/zpool.cache on the problem disk, export the problem disk bootpool, power off, unplug the installer media, and boot. Again, sometimes.

3. Reverse the ordering of the SATA port connections, so that the system disk is at one end or the other -- e.g. if my motherboard has ports SATA0 through SATA5, put the system disk at SATA0 if it was at SATA5, or put the system disk at SATA5 if it was at SATA0. (Finding the first and last ports gets more complex when you have multiple chips and/or HBA's, and can change when you add more disks.) Again, sometimes.

4. If all else fails, move the system disk to another computer and start over. (Or build a new one in the target computer.)
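For reference, technique 2 can be sketched roughly as below. The pool name "bootpool" and the /mnt altroot (and therefore the cache file path under it) are assumptions based on the common FreeBSD ZFS-on-root layout; adjust them for your own pool name and mountpoints.

```shell
# At the FreeBSD installer's single-user root prompt:

# Remount the installer's root file system read-write.
mount -u -o rw /

# Import the problem disk's boot pool under a temporary altroot (-R /mnt)
# so its datasets do not mount over the installer's own paths; -f forces
# the import if the pool looks like it is still in use by another system.
zpool import -f -R /mnt bootpool

# Delete the stale pool cache so it is regenerated on the next boot.
# (Path assumes the bootpool dataset mounts at /mnt/bootpool.)
rm /mnt/bootpool/boot/zfs/zpool.cache

# Export the pool cleanly so the on-disk state is consistent.
zpool export bootpool

# Power off; unplug the installer media before booting again.
shutdown -p now
```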
By executing permutations of the above techniques, I was eventually able to get both computers to boot with one bulk data pool in each.

The dimensions of this problem appear to be:

1. Firmware -- BIOS (myself, OP?) vs. (U)EFI

2. Partitioning -- MBR (myself) vs. GPT (OP?)

3. SATA port ordering -- system disk on first SATA port (unknown) vs. system disk on last SATA port (unknown)

4. Boot file system -- UFS vs. ZFS boot (myself, OP?)

5. Root file system -- UFS vs. ZFS root (myself, OP?)

6. Number of VDEV's -- multiple devices in one VDEV with everything (OP?) vs. one device with a boot partition/VDEV and a root partition/VDEV, plus multiple devices in yet another VDEV with bulk data (myself).

So, 2**6 = 64 combinations using just the dimensions and choices above (and there are more). My case may have some overlap with the OP and/or the various respondents, but the differences make troubleshooting difficult, because what works for one combination may not apply or may not work for another.

My next battle will be inserting and removing additional disks for rotating backups...

David