Date:      Fri, 15 Dec 2017 20:28:00 -0700
From:      Warner Losh <imp@bsdimp.com>
To:        Eric McCorkle <eric@metricspace.net>
Cc:        "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>, Warner Losh <imp@freebsd.org>, Allan Jude <allanjude@freebsd.org>
Subject:   Re: loader.efi architecture for replacing boot1.efi
Message-ID:  <CANCZdfr0=WzVkUb85o2aUT3eA7EAAx4MCnQy6gk8XdeJvb9tsA@mail.gmail.com>
In-Reply-To: <CANCZdfrpi3JTDxo17RBiLdZ=UjdPF3FgpqwmBepZ=8k5-P0F2g@mail.gmail.com>
References:  <1fa7edde-6ac0-1d4f-e75a-503b23a5d4dc@metricspace.net> <CANCZdfpJm9MjxvO4dPy7qZ4jjot44yAMj7NhaY_MQ5z7WVbd9A@mail.gmail.com> <46af04dd-8f74-b9dc-3d3a-343f022129ed@metricspace.net> <CANCZdfrpi3JTDxo17RBiLdZ=UjdPF3FgpqwmBepZ=8k5-P0F2g@mail.gmail.com>

On Fri, Dec 15, 2017 at 7:05 PM, Warner Losh <imp@bsdimp.com> wrote:

>
>
> On Dec 15, 2017 6:43 PM, "Eric McCorkle" <eric@metricspace.net> wrote:
>
> On 12/15/2017 20:09, Warner Losh wrote:
>
> > This should be second. UEFI variables trump all.
> >
> >     2) If not, then attempt to read EFI vars to determine the boot
> >     location
> >
> >     3) If no EFI vars are defined, and no partition was specified, fall
> >     back to looking for an installed system on devices
> >
> >
> > This is fine, so long as it is only on the device that the loader loaded
> > from.
>
> It's fine if it's configurable, but there needs to be sane behavior if
> the EFI vars aren't set.
>
>
> Where do we get this info for such a broken setup? Do you have actual
> examples?
>
> >     4) At the very last, do the legacy (what loader.efi currently does)
> >     behavior.
> >
> >
> > This is bogus. It violates the uefi boot loader protocol. We must
> > abandon this legacy behavior. The behavior is actively harmful since
> > something random will boot. This has caused actual operational issues at
> > Netflix. Guessing is really bad.
>
> We can't just ditch the current behavior and break everyone's existing
> install, though.  Legacy behavior should be supported at least until the
> next major release.
>
>
> What useful setups does this break? Absent a real example, we absolutely
> are breaking this. There is a real cost to doing this that as the de facto
> maintainer of stand I'm unwilling to maintain, test or commit to not
> breaking. The legacy behavior is broken and has caused me hours of pain in
> production. There has been no articulated use case this enables, especially
> since boot loader can be interrupted to specify something in recovery
> scenarios.
>
>
> >
> >     Step (3) is done by attempting to stat /boot/loader.conf and
> >     /boot/kernel.  First, all partitions on the same disk are searched,
> >     then all remaining partitions are searched.
> >
> >     This should allow mechanisms like EFI vars and command-line args to
> >     work without interference from the fallback mechanisms.  However, it
> >     also provides robustness in the face of failure modes and
> >     uninitialized systems (I personally ran into a problem a while back
> >     with a linux system, where I couldn't boot with EFI, because the EFI
> >     vars weren't set, because I couldn't set them if I couldn't boot with
> >     EFI; had to use Shell.efi to sort out the mess...)
> >
> >     More importantly, it provides a seamless transition from the way
> >     things are now to the way we want things to be.
> >
> >     Please provide comments and feedback.
> >
> >
> > Please listen when I say searching all devices is actively harmful. The
> > uefi boot manager, which I'm in the process of bringing in, offers a way
> > to specifically say what you want to boot. If someone needs something
> > complicated, they must use that moving forward. Part of what makes the
> > protocol work is loaders giving up early so the next one on the list can
> > be tried.
>
> We also have to deal with the reality that some EFI implementations are
> adversarial.  We have to be able to deal with implementations that make
> it difficult to set EFI vars, or which mess with their values (Lenovo is
> particularly notorious for this).
>
> You can disable fallback mechanisms with command-line args or macros or
> whatever, but they need to be there.
>
>
> No. Absent a sane use case, I refuse. Give me a reasonable use case, I
> will reconsider.
>
>
So the current behavior leads to absurd results that no other boot loader
produces, and that we don't produce for legacy boot:

If we boot loader.efi/boot1.efi off a hard drive, and find there's no
kernel, we'll load off a CD-ROM or a floppy if we happen to find a kernel
there. That's nuts. What's more, we'll load off a different device (say a
thumb drive), which is also crazy. The last thing you want is to
accidentally pick the thumb drive recovery kernel that happens to be in a
USB slot when you have a primary and secondary partition on two main disks,
but today's behavior chooses that. It's so crazy that I can see no benefit
from supporting, testing and maintaining this. If someone wants to recover
a system, they can do it at the boot loader prompt now (they couldn't
before). If someone really wants to boot his crazy thing, we have a new way
to specify it explicitly, without any ambiguity from how the devices might
move around.
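
To make the difference concrete, here is a rough model of the two policies.
It is only a sketch in Python, not loader code: the Partition record, the
device labels and both function names are made up for illustration, and the
real loader has many more cases.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Partition:
    device: str       # hypothetical label, e.g. "disk0" or "usb0"
    name: str         # hypothetical partition name
    has_kernel: bool  # stands in for "stat of /boot/kernel succeeded"


def legacy_pick(parts: List[Partition], boot_device: str) -> Optional[Partition]:
    """Model of today's behavior: same device first, then every other device."""
    # sorted() is stable, so partitions on the boot device keep their order
    # and simply move ahead of all the others.
    ordered = sorted(parts, key=lambda p: p.device != boot_device)
    for p in ordered:
        if p.has_kernel:
            return p
    return None


def strict_pick(parts: List[Partition], boot_device: str,
                efi_choice: Optional[Partition] = None) -> Optional[Partition]:
    """Model of the stricter policy: explicit choice, else boot device only."""
    if efi_choice is not None:
        return efi_choice          # an explicit setting wins
    for p in parts:
        if p.device == boot_device and p.has_kernel:
            return p
    return None                    # give up; the next boot entry gets a turn


# The main disk has no kernel; a recovery thumb drive happens to be plugged in.
parts = [
    Partition("disk0", "p2", has_kernel=False),
    Partition("usb0", "p1", has_kernel=True),
]
print(legacy_pick(parts, "disk0"))   # picks the thumb drive partition
print(strict_pick(parts, "disk0"))   # None: nothing random gets booted
```

In this model the strict policy returns nothing rather than guessing, which
is what lets the firmware move on to the next entry on its list.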

We already support about 100 boot scenarios that are hard enough to test. I
don't want to commit to supporting this and making it 120 or 150 once you
work out all the combinatorics. We have to trim the matrix of useless
things.  So absent a use case that makes sense, that people are actually
doing, I'm having a hard time justifying keeping it around as we transition.

Warner

P.S. On x86, we support geli/nogeli, gpt/mbr, ufs/zfs, and uefi/legacy/both
(24 combinations). Plus we support booting off CDROM, netbooting, etc. For
arm and arm64 a similar number are possible: zfs/ufs, u-boot/uefi, and
mbr/gpt (plus a number of different u-boot boards). For mips we have a
similar mix. On powerpc we support 4 or 6 ways. It's just too much to hope
to test and ensure it all works. Each new thing has a non-trivial cost, and
I see zero benefit from this one more thing, especially since it gets in
the way of UEFI boot manager support.
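
For anyone who wants to check the x86 figure, the count is just the product
of the four independent choices listed above. A quick sketch (the list names
are mine, and CDROM and netboot are left out):

```python
from itertools import product

# The four independent x86 choices named in the P.S.
encryption = ["geli", "nogeli"]
partitioning = ["gpt", "mbr"]
filesystem = ["ufs", "zfs"]
firmware = ["uefi", "legacy", "both"]

# Every combination is one scenario somebody has to test.
combos = list(product(encryption, partitioning, filesystem, firmware))
print(len(combos))  # 2 * 2 * 2 * 3 = 24
```

Any new on/off behavior that interacts with all of these doubles that
number, which is how the matrix gets out of hand.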


