From owner-freebsd-stable@freebsd.org Mon Oct 22 09:28:24 2018
From: Toomas Soome
Message-id: <3CA4C94F-A062-44FE-B507-948A6F88C83D@me.com>
Subject: Re: head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated
Date: Mon, 22 Oct 2018 12:27:40 +0300
To: Mark Millard
Cc: Konstantin Belousov, FreeBSD Current, FreeBSD-STABLE Mailing List, Warner Losh
References: <79973E2B-F5C4-4E7C-B92B-1C8D4441C7D1@yahoo.com>
List-Id: Production branch of FreeBSD source code

> On 22 Oct 2018, at 06:30, Warner Losh wrote:
>
> On Sun, Oct 21, 2018 at 9:28 PM Warner Losh wrote:
>
>>
>>
>> On Sun, Oct 21, 2018 at 8:57 PM Mark Millard via freebsd-stable <
>> freebsd-stable@freebsd.org> wrote:
>>
>>> [I built based on WITHOUT_ZFS= for other reasons. But,
>>> after installing the build, Hyper-V based boots are
>>> working.]
>>>
>>> On 2018-Oct-20, at 2:09 AM, Mark Millard wrote:
>>>
>>>> On 2018-Oct-20, at 1:39 AM, Mark Millard wrote:
>>>>
>>>>> I attempted to jump from head -r334014 to -r339076
>>>>> on a threadripper 1950X board and the boot fails.
>>>>> This is both native booting and under Hyper-V,
>>>>> same machine and root file system in both cases.
>>>>
>>>> I did my investigation under Hyper-V after seeing
>>>> a boot failure native.
>>>>
>>>> Looks like the native failure is even earlier,
>>>> before db> is even possible, possibly during
>>>> early loader activity.
>>>>
>>>> So this report is really for running under
>>>> Hyper-V: -r338804 boots and -r338810 does
>>>> not. By contrast -r334804 does not boot native.
>>>> (But I've little information for that context.)
>>>>
>>>> Sorry for the confusion. I rushed the report
>>>> in hopes of getting to sleep. It was not to be.
>>>>
>>>>> It fails just after the FreeBSD/SMP lines,
>>>>> reporting "kernel trap 9 with interrupts disabled".
>>>>>
>>>>> It fails in pmap_force_invalidate_cache_range at
>>>>> a clflush (%rax) instruction that produces a
>>>>> "Fatal trap 9: general protection fault while
>>>>> in kernel mode". cpuid=0 apic id= 00
>>>>>
>>>>> I used kernel.txz files from:
>>>>>
>>>>> https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/
>>>>>
>>>>> to narrow the range of kernel builds for working -> failing
>>>>> and got:
>>>>>
>>>>> -r338804 boots fine
>>>>> (no amd64 kernel builds between to try)
>>>>> -r338810+ fails (any that I tried, anyway)
>>>>>
>>>>> In that range is -r338807 :
>>>>>
>>>>> QUOTE
>>>>> Author: kib
>>>>> Date: Wed Sep 19 19:35:02 2018
>>>>> New Revision: 338807
>>>>> URL:
>>>>> https://svnweb.freebsd.org/changeset/base/338807
>>>>>
>>>>>
>>>>> Log:
>>>>> Convert x86 cache invalidation functions to ifuncs.
>>>>>
>>>>> This simplifies the runtime logic and reduces the number of
>>>>> runtime-constant branches.
>>>>>
>>>>> Reviewed by: alc, markj
>>>>> Sponsored by: The FreeBSD Foundation
>>>>> Approved by: re (gjb)
>>>>> Differential revision:
>>>>> https://reviews.freebsd.org/D16736
>>>>>
>>>>> Modified:
>>>>> head/sys/amd64/amd64/pmap.c
>>>>> head/sys/amd64/include/pmap.h
>>>>> head/sys/dev/drm2/drm_os_freebsd.c
>>>>> head/sys/dev/drm2/i915/intel_ringbuffer.c
>>>>> head/sys/i386/i386/pmap.c
>>>>> head/sys/i386/i386/vm_machdep.c
>>>>> head/sys/i386/include/pmap.h
>>>>> head/sys/x86/iommu/intel_utils.c
>>>>> END QUOTE
>>>>>
>>>>> There do seem to be changes associated with
>>>>> clflush(...) use. Looking at:
>>>>>
>>>>>
>>>>> https://svnweb.freebsd.org/base/head/sys/amd64/amd64/pmap.c?annotate=339432
>>>>>
>>>>> it appears that pmap_force_invalidate_cache_range has not
>>>>> changed since -r338807.
>>>>>
>>>>> It seems that -r338806 and -r338810 would be unlikely
>>>>> contributors.
>>>>
>>>
>>> I went after my native-boot loader problem first because I
>>> could switch kernels via the loader for booting FreeBSD under
>>> Hyper-V. Switching loaders is more of a problem.
>>>
>>> In order to avoid the loader-time crash I switched to building
>>> and installing based on WITHOUT_ZFS= . I've had no active use of
>>> ZFS in years. (The old official-build loaders that worked were
>>> non-ZFS ones.)
>>>
>>> This took care of the native-boot loader-crash -- and, to my
>>> surprise, also the Hyper-V-boot kernel-time crash.
>>>
>>> My private builds now boot the 1950X in both contexts just
>>> fine.
>>>
>>> During my early investigation I did pick up specific changes
>>> from after -r339076 that seemed to be tied to Ryzen and such.
>>> (They made no difference to the boot problems at the time
>>> but I saw no reason to remove them.)
>>>
>>> # uname -apKU
>>> FreeBSD FBSDFSSD 12.0-ALPHA8 FreeBSD 12.0-ALPHA8 #5 r339076:339432M: Sun
>>> Oct 21 16:44:25 PDT 2018 markmi@FBSDFSSD:/usr/obj/amd64_clang/amd64.amd64/usr/src/amd64.amd64/sys/GENERIC-NODBG
>>> amd64 amd64 1200084 1200084
>>
>>
> (stupid gmail)
>
> The phrase "no active use" bothers me. What does that mean? Are there any
> ZFS pools, or any disks that have any whiff of a ZFSish thing on them at all?
> Clearly, there's something in the zfs boot loader that's freaking out over
> something on your system, but absent that information I can't help you.
>

It would help to get the output of the loader's lsdev -v command. Also,
if you could test the boot loader with UEFI (for example, get to the
loader prompt via a USB/CD boot), please capture the same lsdev -v
output there. I am interested in the sector size information and in
whether the UEFI loader also has issues. If it does, I’d like to see
the output of these commands:

zpool status
zpool import

thanks,
toomas