Date: Sun, 21 Oct 2018 23:24:43 -0700
From: Mark Millard <marklmi@yahoo.com>
To: Warner Losh <imp@bsdimp.com>
Cc: Konstantin Belousov <kib@freebsd.org>, FreeBSD Current <freebsd-current@freebsd.org>, FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>
Subject: Re: head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated
Message-ID: <50C22D2F-0D72-4485-9AE2-E22EC336F8CB@yahoo.com>
In-Reply-To: <CANCZdfoHg8=FfuJchyPJ9qBDZBkR_7nYTWPiQedZkW4Cs1pR5A@mail.gmail.com>
References: <79973E2B-F5C4-4E7C-B92B-1C8D4441C7D1@yahoo.com> <ACBB38EF-9A6A-40E5-AB6C-EEB9E292A919@yahoo.com> <EDBFFACB-8582-4B16-AC1A-63F8C86C9BA4@yahoo.com> <CANCZdfo=uqLn16r0FShz=WEv3Z34LbmC1gqzKabwfr3gEUXsJg@mail.gmail.com> <CANCZdfoHg8=FfuJchyPJ9qBDZBkR_7nYTWPiQedZkW4Cs1pR5A@mail.gmail.com>

On 2018-Oct-21, at 8:30 PM, Warner Losh <imp at bsdimp.com> wrote:

> On Sun, Oct 21, 2018 at 9:28 PM Warner Losh <imp at bsdimp.com> wrote:
>
> On Sun, Oct 21, 2018 at 8:57 PM Mark Millard via freebsd-stable <freebsd-stable@freebsd.org> wrote:
>> [I built based on WITHOUT_ZFS= for other reasons. But,
>> after installing the build, Hyper-V based boots are
>> working.]
>>
>> On 2018-Oct-20, at 2:09 AM, Mark Millard <marklmi at yahoo.com> wrote:
>>
>> > On 2018-Oct-20, at 1:39 AM, Mark Millard <marklmi at yahoo.com> wrote:
>> >
>> >> I attempted to jump from head -r334014 to -r339076
>> >> on a threadripper 1950X board and the boot fails.
>> >> This is both native booting and under Hyper-V,
>> >> same machine and root file system in both cases.
>> >
>> > I did my investigation under Hyper-V after seeing
>> > a boot failure native.
>> >
>> > Looks like the native failure is even earlier,
>> > before db> is even possible, possibly during
>> > early loader activity.
>> >
>> > So this report is really for running under
>> > Hyper-V: -r338804 boots and -r338810 does
>> > not. By contrast -r334804 does not boot native.
>> > (But I've little information for that context.)
>> >
>> > Sorry for the confusion. I rushed the report
>> > in hopes of getting to sleep. It was not to be.
>> >
>> >> It fails just after the FreeBSD/SMP lines,
>> >> reporting "kernel trap 9 with interrupts disabled".
>> >>
>> >> It fails in pmap_force_invalidate_cache_range at
>> >> a clflusl (%rax) instruction that produces a
>> >> "Fatal trap 9: general protection fault while
>> >> in kernel mode".
>> >> cpuid=0 apic id= 00
>> >>
>> >> I used kernel.txz files from:
>> >>
>> >> https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/
>> >>
>> >> to narrow the range of kernel builds for working -> failing
>> >> and got:
>> >>
>> >> -r338804 boots fine
>> >> (no amd64 kernel builds between to try)
>> >> -r338810+ fails (any that I tried, anyway)
>> >>
>> >> In that range is -r338807 :
>> >>
>> >> QUOTE
>> >> Author: kib
>> >> Date: Wed Sep 19 19:35:02 2018
>> >> New Revision: 338807
>> >> URL:
>> >> https://svnweb.freebsd.org/changeset/base/338807
>> >>
>> >>
>> >> Log:
>> >> Convert x86 cache invalidation functions to ifuncs.
>> >>
>> >> This simplifies the runtime logic and reduces the number of
>> >> runtime-constant branches.
>> >>
>> >> Reviewed by: alc, markj
>> >> Sponsored by: The FreeBSD Foundation
>> >> Approved by: re (gjb)
>> >> Differential revision:
>> >> https://reviews.freebsd.org/D16736
>> >>
>> >> Modified:
>> >> head/sys/amd64/amd64/pmap.c
>> >> head/sys/amd64/include/pmap.h
>> >> head/sys/dev/drm2/drm_os_freebsd.c
>> >> head/sys/dev/drm2/i915/intel_ringbuffer.c
>> >> head/sys/i386/i386/pmap.c
>> >> head/sys/i386/i386/vm_machdep.c
>> >> head/sys/i386/include/pmap.h
>> >> head/sys/x86/iommu/intel_utils.c
>> >> END QUOTE
>> >>
>> >> There do seem to be changes associated with
>> >> clflush(...) use. Looking at:
>> >>
>> >> https://svnweb.freebsd.org/base/head/sys/amd64/amd64/pmap.c?annotate=339432
>> >>
>> >> it appears that pmap_force_invalidate_cache_range has not
>> >> changed since -r338807.
>> >>
>> >> It seems that -r338806 and -r338810 would be unlikely
>> >> contributors.
>> >
>>
>> I went after my native-boot loader problem first because I
>> could switch kernels via the loader for booting FreeBSD under
>> Hyper-V. Switching loaders is more of a problem.
>>
>> In order to avoid the loader-time crash I switched to building
>> and installing based on WITHOUT_ZFS= .
>> I've had no active use of
>> ZFS in years. (The old official-build loaders that worked were
>> non-ZFS ones.)
>>
>> This took care of the native-boot loader-crash --and, to my
>> surprise, also the Hyper-V-boot kernel-time crash.
>>
>> My private builds now boot the 1950X in both contexts just
>> fine.
>>
>> During my early investigation I did pick up specific changes
>> from after -r339076 that seemed to be tied to Ryzen and such.
>> (They made no difference to the boot problems at the time
>> but I saw no reason to remove them.)
>>
>> # uname -apKU
>> FreeBSD FBSDFSSD 12.0-ALPHA8 FreeBSD 12.0-ALPHA8 #5 r339076:339432M: Sun Oct 21 16:44:25 PDT 2018 markmi@FBSDFSSD:/usr/obj/amd64_clang/amd64.amd64/usr/src/amd64.amd64/sys/GENERIC-NODBG amd64 amd64 1200084 1200084
>>
>> (stupid gmail)
>
> The phrase "no active use" bothers me. What does that mean? Are there any ZFS pools or any disks that any whiff of ZFSish thing on it at all? Clearly, there's something in the zfs boot loader that's freaking out by something on your system, but absent that information I can't help you.

No ZFS pools: Strictly UFS for FreeBSD file systems
for the last few years, UFS before I had access to
the 1950X system.

I've never before bothered to use WITHOUT_ZFS= in
my builds. So the system had the ZFS support, such as kernel
modules, over all the time that this system had been in use.
Prior to the recent versions I saw no such problems. But the
default loader was not ZFS capable.

As seen in the under-Hyper-V use-context:

# gpart show -p
=>        40  937703008    da0  GPT  (447G)
          40       1024  da0p1  freebsd-boot  (512K)
        1064  746586112  da0p2  freebsd-ufs  (356G)
   746587176   31457280  da0p3  freebsd-swap  (15G)
   778044456  159383552  da0p4  freebsd-swap  (76G)
   937428008     275040         - free -  (134M)

=>        40  937703008    da1  GPT  (447G)
          40       1024  da1p1  freebsd-boot  (512K)
        1064  369098752  da1p2  freebsd-ufs  (176G)
   369099816  406846424  da1p3  freebsd-swap  (194G)
   775946240  130024488         - free -  (62G)
   905970728   31457280  da1p4  freebsd-swap  (15G)
   937428008     275040         - free -  (134M)

=>        40  419430320    da2  GPT  (200G)
          40       4056         - free -  (2.0M)
        4096  419426263  da2p1  freebsd-ufs  (200G)
   419430359          1         - free -  (512B)

=>         40  2000409184    da3  GPT  (954G)
           40        1024  da3p1  freebsd-boot  (512K)
         1064  2000408159  da3p2  freebsd-ufs  (954G)
   2000409223           1         - free -  (512B)

So no ZFS pools. The above context never had the
ZFS-capable loader problem but did have the
kernel problem. I was booting the 356G freebsd-ufs
partition: the only one that I have updated the
FreeBSD version on so far.

With FreeBSD booted natively, more drives are seen in
gpart show, some not from/for FreeBSD. But the
above drives are present and I was booting from
the same partition of the same drive: the 356G
freebsd-ufs partition.

Still no ZFS pools anywhere:

# gpart show -p
=>         34  4000797293    nvd0  GPT  (1.9T)
           34      262144  nvd0p1  ms-reserved  (128M)
       262178        2014          - free -  (1.0M)
       264192  3600451584  nvd0p2  ms-basic-data  (1.7T)
   3600715776   400081551          - free -  (191G)

=>        40  937703008    nvd1  GPT  (447G)
          40       1024  nvd1p1  freebsd-boot  (512K)
        1064  746586112  nvd1p2  freebsd-ufs  (356G)
   746587176   31457280  nvd1p3  freebsd-swap  (15G)
   778044456  159383552  nvd1p4  freebsd-swap  (76G)
   937428008     275040          - free -  (134M)

=>        40  937703008    nvd2  GPT  (447G)
          40       1024  nvd2p1  freebsd-boot  (512K)
        1064  369098752  nvd2p2  freebsd-ufs  (176G)
   369099816  406846424  nvd2p3  freebsd-swap  (194G)
   775946240  130024488          - free -  (62G)
   905970728   31457280  nvd2p4  freebsd-swap  (15G)
   937428008     275040          - free -  (134M)

=>         34  2000409197    nvd3  GPT  (954G)
           34        2014          - free -  (1.0M)
         2048     1021952  nvd3p1  ms-recovery  (499M)
      1024000      202752  nvd3p2  efi  (99M)
      1226752       32768  nvd3p3  ms-reserved  (16M)
      1259520  1859119104  nvd3p4  ms-basic-data  (886G)
   1860378624   140030607          - free -  (67G)

=>         40  2000409184    nvd4  GPT  (954G)
           40        1024  nvd4p1  freebsd-boot  (512K)
         1064  2000408159  nvd4p2  freebsd-ufs  (954G)
   2000409223           1          - free -  (512B)

=>         63  2000409201    ada0  MBR  (954G)
           63        1985          - free -  (993K)
         2048        4096  ada0s1  linux-data  (2.0M)
         6144     2093056          - free -  (1.0G)
      2099200  1998309376  ada0s2  linux-lvm  (953G)
   2000408576         688          - free -  (344K)

=>         34  2000409197    ada1  GPT  (954G)
           34      262144  ada1p1  ms-reserved  (128M)
       262178  2000147053          - free -  (954G)

=>         34  2000409197    ada2  GPT  (954G)
           34      262144  ada2p1  ms-reserved  (128M)
       262178  2000147053          - free -  (954G)

=>         34  1953497022    da0  GPT  (932G)
           34      262144  da0p1  ms-reserved  (128M)
       262178        2014         - free -  (1.0M)
       264192  1953230848  da0p2  ms-basic-data  (931G)
   1953495040        2016         - free -  (1.0M)

=>       1  60062499    da1  MBR  (29G)
         1        31         - free -  (16K)
        32  60062468  da1s1  fat32lba  (29G)

The 356G freebsd-ufs partition is the only one of
the freebsd-ufs partitions updated so far.

This is the context that had the problem with
the ZFS-capable loaders --but no later
kernel problem when a not-ZFS-capable loader
was used (via copying over an older one
--until I did the WITHOUT_ZFS= build/install).

As for the ZFS-capable loader: Maybe it has problems
when it sees one or more of:

ms-reserved (on GPT)
ms-basic-data (on GPT) (NTFS file system)
ms-recovery (on GPT)
efi (on GPT)
linux-data (on MBR)
linux-lvm (on MBR)
fat32lba (on MBR)

(given that none of these is available in the
Hyper-V context as the virtual machine has been
configured).

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
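
The comparison behind that list of partition types can be sketched as a set difference: take the types visible when booted natively and subtract the types visible under Hyper-V. This is only an illustration, assuming the `gpart show -p` row layout shown in this message. The helper name `partition_types` and the two sample strings are hypothetical, and the samples are trimmed excerpts of the listings, not the full output.

```python
def partition_types(gpart_output):
    """Collect the partition types named in `gpart show -p` style text."""
    types = set()
    for line in gpart_output.splitlines():
        fields = line.split()
        # Skip blank lines, "=>" provider headers, and "- free -" rows.
        if len(fields) < 5 or fields[0] == "=>" or fields[2] == "-":
            continue
        # Partition rows read: start, size, name, type, (human size).
        types.add(fields[3])
    return types


# Trimmed excerpt of the Hyper-V listing (assumed representative).
HYPERV = """
=> 40 937703008 da0 GPT (447G)
40 1024 da0p1 freebsd-boot (512K)
1064 746586112 da0p2 freebsd-ufs (356G)
746587176 31457280 da0p3 freebsd-swap (15G)
937428008 275040 - free - (134M)
"""

# Trimmed excerpt of the native-boot listing (assumed representative).
NATIVE = """
=> 34 4000797293 nvd0 GPT (1.9T)
34 262144 nvd0p1 ms-reserved (128M)
264192 3600451584 nvd0p2 ms-basic-data (1.7T)
=> 40 937703008 nvd1 GPT (447G)
40 1024 nvd1p1 freebsd-boot (512K)
1064 746586112 nvd1p2 freebsd-ufs (356G)
746587176 31457280 nvd1p3 freebsd-swap (15G)
=> 34 2000409197 nvd3 GPT (954G)
2048 1021952 nvd3p1 ms-recovery (499M)
1024000 202752 nvd3p2 efi (99M)
=> 63 2000409201 ada0 MBR (954G)
2048 4096 ada0s1 linux-data (2.0M)
2099200 1998309376 ada0s2 linux-lvm (953G)
=> 1 60062499 da1 MBR (29G)
32 60062468 da1s1 fat32lba (29G)
"""

# Types that only the native boot sees: candidates for upsetting the loader.
native_only = partition_types(NATIVE) - partition_types(HYPERV)
# True only if a freebsd-zfs partition shows up in either context.
seen_zfs = "freebsd-zfs" in (partition_types(NATIVE) | partition_types(HYPERV))

if __name__ == "__main__":
    print(sorted(native_only))
    print("freebsd-zfs present:", seen_zfs)
```

Running the same function over the complete listings from both boot contexts would give the candidate list without reading the tables by eye. It only narrows the suspects and says nothing about which type, if any, the ZFS-capable loader actually trips over.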