Date: Fri, 13 Dec 2019 15:57:24 -0500
From: Matthew Pounsett <matt@conundrum.com>
To: freebsd-questions@freebsd.org
Subject: Root volume renumbered unexpectedly, no longer boots
Message-ID: <CAAiTEH94JZFf6XpmXbAUFrWbjA8CXF-EpH231huzmxX%2BcjkvVQ@mail.gmail.com>
We have a large-ish FreeBSD 11.2-p7 file server with two 24-disk ZFS pools (20 live, 4 spares each) and a single SSD boot volume. Yesterday we pulled some dead drives from the ZFS pools and replaced them with new drives intended to become spares. After powering the system back up, it looks like the boot volume has been renumbered from da0 to da4. I thought renumbering like this wasn't supposed to happen for at least the last decade, since ATA_STATIC_ID was introduced to the kernel, but there's little doubt that's what's happened.

Automatic boot now fails and drops to the third stage loader prompt when the kernel tries to mount the root volume from ufs:/dev/da0p2. I can manually try to mount the root volume as ufs:/dev/da4p2, and the system begins to load the root volume, but then hangs. The only two lines printed after loading da4 are related to loading up the ZFS pools. I can't reproduce the messages right now (explained below), so I can't quote them verbatim, but they say that the ZFS version is behind and suggest that I upgrade the pools. The messages themselves are not unusual, and I've been seeing similar messages in the 'zpool status' output for a while now. What is unusual is that the system seems to hang at this point.

I'm concerned that the re-ordering of drives might be causing problems for the system trying to put the ZFS pools back together. I don't really know, though. Does anyone have any insight into what's going on here?

There is a new wrinkle... since booting from a USB stick so that I could get into the box, double-check some things, and confirm the location of the root volume, the BIOS no longer seems to see da4 as a potential boot volume. I'm hoping that goes back to the way it was once the USB stick is removed. At the moment I have no way to even get the box to try/fail to boot from its normal boot volume. The machine is many thousands of miles remote, so I haven't tried to do this yet... I can invoke some remote help once that's necessary.

BIOS issue aside, I'm hoping there's a way I can pin this drive back to da0. I don't know how that could be done, but if anyone has any suggestions I'd happily try them. Failing that, I suppose I can just insert a vfs.root.mountfrom option in loader.conf.

Can anyone clue me in to what's happening here, or suggest some further troubleshooting that will help me gain some insight?

Thanks!
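For reference, the loader.conf fallback I have in mind would look roughly like the fragment below. This is an untested sketch: it assumes the SSD keeps showing up as da4 and that the root filesystem is still partition 2 on it. If the numbering shifts again after the next drive swap, this line would be wrong in the same way the current da0p2 setting is.

```
# /boot/loader.conf
# Assumption: the boot SSD is currently enumerated as da4, with root on p2.
vfs.root.mountfrom="ufs:/dev/da4p2"
```

Presumably the / entry in /etc/fstab also names /dev/da0p2 and would need the same change.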