Date:      Tue, 10 Dec 2019 10:35:26 -0500
From:      Marc Branchaud <marcnarc@gmail.com>
To:        Mark Martinec <Mark.Martinec+freebsd@ijs.si>, freebsd-stable@freebsd.org
Subject:   Re: Boot loader stuck after first stage upgrading 11.2 to 12.0-RC2
Message-ID:  <f599acfb-20f7-fc53-9753-fcd37a923e8e@gmail.com>
In-Reply-To: <4c4019102b63054f8de93324dba0e776@ijs.si>
References:  <22f5b92a09ea4d62ac3feb74457067f7@ijs.si> <5EEBAFC0-4FA3-4219-A918-7376F4223656@me.com> <f2737ffb236d39761767aa10a603c084@ijs.si> <0F5FCC70-EADB-4F9E-A391-F1A73BE5608F@me.com> <dc762bdf408c92daae826425fdba98d9@ijs.si> <B3C7194D-93B8-406B-9E8E-BA55D49D657A@me.com> <1543954753.1860.243.camel@freebsd.org> <EC8DD049-8BBE-4E96-A68B-A2846CED00BA@me.com> <53ceda24-fa1b-8546-3511-bd500b440dfe@digiware.nl> <4c4019102b63054f8de93324dba0e776@ijs.si>

On 2019-12-10 9:18 a.m., Mark Martinec wrote:
> Commenting on a thread from 2018-12 and from 2019-09-20, with my solution
> to the boot problem at the end, in case anyone is still interested.

Thank you very much for this.  A couple of questions:

(1) Why do you say "raw devices for historical reasons"?  Glancing 
through the zpool man page and the Handbook, I see nothing recommending 
or requiring GPT partitions.

(2) Just to be 100% clear, my 11.3 non-root zpool looks like this:
         NAME        STATE     READ WRITE CKSUM
         storage     ONLINE       0     0     0
           raidz2-0  ONLINE       0     0     0
             ada2    ONLINE       0     0     0
             ada3    ONLINE       0     0     0
             ada4    ONLINE       0     0     0
             ada5    ONLINE       0     0     0
             ada6    ONLINE       0     0     0
             ada7    ONLINE       0     0     0
So this is using raw devices.  Are you saying that if I upgrade this 
machine to 12, it won't be able to boot?
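
For anyone wanting to check their own pools the same way, here is a minimal sketch that scans `zpool status` text for members that look like whole disks. The function name `raw_disk_members` and the name pattern are assumptions made for illustration (whole disks named like "ada2" or "da10", partitions carrying a suffix such as "ada2p1" or a label prefix such as "gpt/..."). It only reports how the pool is laid out; it says nothing about whether the loader will actually hang.

```python
import re

# Assumption: FreeBSD whole-disk names look like "ada2" or "da10", while
# partitions carry a suffix ("ada2p1") or a label prefix ("gpt/disk2").
WHOLE_DISK = re.compile(r"^(ada|da|nvd|nda)\d+$")


def raw_disk_members(status_text):
    """Return device names from `zpool status` text that look like whole disks."""
    members = []
    for line in status_text.splitlines():
        fields = line.split()
        # The first column holds the pool, vdev, or device name.
        if fields and WHOLE_DISK.match(fields[0]):
            members.append(fields[0])
    return members


sample = """\
        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
"""
print(raw_disk_members(sample))
```

A pool whose members show up as "ada2p1" or "gpt/..." would yield an empty list here.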

Thanks again!

		M.


> =======
> 
> On 2018-11-29 I wrote (after upgrading from 11.2 to 12.0):
>> While booting, the 'BTX loader' comes up, lists the BIOS drives,
>> then the spinner below the list comes up and begins turning,
>> stuttering, and after a couple of seconds it grinds to a standstill
>> and nothing happens afterwards.
>> At this point ZFS and the bootstrap loader are supposed to
>> come up, but they don't.
> [...] (on 2018-12-04):
>> The situation has not changed: the BTX loader lists all BIOS drives
>> C..J (disk0..disk7), then a spinner starts and gets stuck forever.
>> It never reaches the 'BIOS 635kB/3537856kB available memory' line.
>>
>> While trying to restore the old /boot from 11.2, I tried booting
>> a live image from a 12.0-RC3 memory stick - and the loader got
>> stuck again, same as when booting from a disk.
>> So I had to boot from an 11.2 memstick to be able to regain control.
> 
> =======
> 
> 2018-12-04, Ian Lepore writes:
>>   Toomas Soome wrote:
>> |    ok, if you could perform 2 tests:
>> |    1. from loader prompt enter 0x413 0xa000 - @w . cr
>> |    2. on first spinner, press space and type on boot: prompt:
>> |    /boot/loader_4th and see if that will do better
>> |    thanks, toomas
>> I don't think that will be an option.  If it hasn't gotten to the point
>> of saying how much BIOS available memory there is, it's only halfway
>> through loader main() and has hung before getting to interact().
>>
>> In fact, if that line hasn't printed, but some disk drives have been
>> listed, it pretty much has to be hung in the "March through the device
>> switch probing for things" loop. If all the disks are listed, then it
>> got through that entry in the devsw, and is likely hanging in the
>> dv_init calls for either the pxedisk or zfsdev devices.
> 
> =======
> 
> 2018-12-07 19:08, Willem Jan Withagen wrote:
>> Ended up more or less in the same situation this afternoon with
>> freebsd-upgrade to [12.0]-RC3
>> Boot stops after listing all DOS disks, in a spinner.
>> So that is no fix.
>>
>> I booted from USB 11.2 and replaced the /boot/zfs{boot,loader} by the 
>> 11.2 ones.
>> That makes my server again happy.
> 
> =======
> 
> 2019-09-19 16:02, Kurt Jaeger wrote:
> Subject: Re: Lockdown adaX numbers to allow booting ?
>> |  Kurt Jaeger writes:
>> |    The problem is that if all 10 disks are connected, the system
>> |    loses track of where it should boot from and fails to boot
>> |    (serial boot log):
>> |
>> |    Consoles: internal video/keyboard  serial port
>> |    BTX loader 1.00  BTX version is 1.02
>> |    Consoles: internal video/keyboard  serial port
>> |    BIOS drive C: is disk0
>> |    BIOS drive D: is disk1
>> |    BIOS drive E: is disk2
>> |    BIOS drive F: is disk3
>> |    BIOS drive G: is disk4
>> |    BIOS drive H: is disk5
>> |    BIOS drive I: is disk6
>> |    BIOS drive J: is disk7
>> |    BIOS drive K: is disk8
>> |    BIOS drive L: is disk9
>> |    //
>> |    [...]
>> |    The solution right now is to unplug all disks of the 'bck' pool,
>> |    reboot, and re-insert the data disks after the boot is finished.
>> |    [...]
>> |    No gpart on the bck pool, raw drives.
> 
> 2019-09-20 17:27, Mark Martinec wrote:
> Subject: Re: Lockdown adaX numbers to allow booting ?
>>
>> This sounds very much like my experience:
>>
>>   2018-11-29, Boot loader stuck after first stage upgrading 11.2 to 12.0-RC2
>>   https://lists.freebsd.org/pipermail/freebsd-stable/2018-November/090129.html
>>   https://lists.freebsd.org/pipermail/freebsd-stable/2018-December/090159.html
>>
>> I now have three SuperMicro machines which are unable to boot after
>> upgrading 11.2 to 12.0. After unsuccessfully fiddling with boot loaders,
>> I have reverted two back to 11.2 (which boots and works fine again),
>> and the third one is now at 12.0 but needs the boot hack as described
>> by Kurt, i.e. pull out half the disks (of the 'data' pool), boot the
>> system, plug the disks back in and zfs mount the remaining pool.
>>
>> Considering that the 11.2 boots and works fine on these machines,
>> I consider it a btx loader failure and not a BIOS issue.
>>
>> What is common with these three machines is that they have one pool
>> on raw devices for historical reasons (not on gpt partitions).
>> My guess is that the new loader gets confused by these raw disks.
> 
> =======
> 
> Ok, now to my current situation and solution/workaround.
> 
> What was common with these hosts (and similar) is that a machine
> has more than a couple of disks, with a zfs pool (non-root) on
> raw devices (for historical reasons), not on gpt partitions.
> 
> Three workarounds seem possible:
> 
> - replace the boot loader with the one from 11.2, or
> 
> - using the default loader from 12, disconnect a sufficient number
>    of data disks, boot, then reconnect the disks and zpool import the pool,
> 
> - or my current solution: zpool offline one disk at a time from
>    a data pool, wipe it, set up a gpt partition on it and
>    put it back into the pool with 'zpool replace', letting it resilver.
>    It was a painful and slightly risky procedure (9 hours of
>    resilvering each of the seven disks), but this procedure
>    has now salvaged our remaining hosts which could not be
>    upgraded from 11.2 to 12.
> 
> Mark
> 
> 
> 
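
The third workaround quoted above (take one disk offline, wipe it, put a GPT partition on it, replace, wait for the resilver, repeat) could be sketched roughly as the dry run below. It only prints commands and executes nothing. The pool name "data", the disk names, the "pool-disk" label scheme and the `-a 1m` alignment are placeholders and assumptions, not details taken from the message. The wipe step is left as a comment because the message does not say how the disks were wiped. Note also that a partition is slightly smaller than the whole disk, and this sketch does not check whether `zpool replace` will accept it.

```python
import shlex


def conversion_commands(pool, disk, label):
    """Return argv lists for moving one raw-disk pool member onto a GPT partition.

    Nothing is executed. The wipe step is deliberately not included here.
    """
    return [
        ["zpool", "offline", pool, disk],
        ["gpart", "create", "-s", "gpt", disk],
        ["gpart", "add", "-t", "freebsd-zfs", "-a", "1m", "-l", label, disk],
        ["zpool", "replace", pool, disk, f"gpt/{label}"],
    ]


def print_plan(pool, disks):
    """Print the per-disk plan, one disk at a time."""
    for disk in disks:
        label = f"{pool}-{disk}"  # hypothetical label scheme
        cmds = conversion_commands(pool, disk, label)
        print(shlex.join(cmds[0]))
        print(f"# wipe {disk} here (method not given in the message)")
        for argv in cmds[1:]:
            print(shlex.join(argv))
        print(f"# wait until 'zpool status {pool}' shows the resilver has finished")
        print()


print_plan("data", ["ada2", "ada3"])  # placeholder pool and disk names
```

Handling one disk per pass, and waiting for each resilver to finish before starting the next, keeps the pool's remaining redundancy intact for most of the procedure, which matches the "one disk at a time" wording above.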


