From owner-freebsd-current@freebsd.org Tue Apr 10 12:27:59 2018
Subject: Re: Re: Odd ZFS boot module issue on r332158
To: Allan Jude, freebsd-current@freebsd.org
References: <935ad20e-017c-5c34-61b4-9db58788a663@freebsd.org>
From: Andrew Gallatin
Message-ID: <5316e5ea-17a2-2f23-3c88-1671f41b5642@cs.duke.edu>
Date: Tue, 10 Apr 2018 08:27:56 -0400
In-Reply-To: <935ad20e-017c-5c34-61b4-9db58788a663@freebsd.org>
List-Id: Discussions about the use of FreeBSD-current

On 04/09/18 23:33, Allan Jude wrote:
> On 2018-04-09 19:11, Andrew Gallatin wrote:
>> I updated my main amd64 workstation to r332158 from something much
>> earlier (mid Jan).
>>
>> Upon reboot, all seemed well.  However, I later realized that the vmm.ko
>> module was not loaded at boot, because bhyve PCI passthru did not
>> work.  My loader.conf looks like (I'm passing a USB interface through):
>>
>> #######
>> vmm_load="YES"
>> opensolaris_load="YES"
>> zfs_load="YES"
>> nvidia_load="YES"
>> nvidia-modeset_load="YES"
>>
>> # Tune ZFS Arc Size - Change to adjust memory used for disk cache
>> vfs.zfs.arc_max="4096M"
>> hint.xhci.2.disabled="1"
>> pptdevs="8/0/0"
>> hw.dmar.enable="0"
>> cuse_load="YES"
>> #######
>>
>> The problem seems "random".  I rebooted into single-user to
>> see if somehow, vmm.ko was loaded at boot and something
>> was unloading vmm.ko.  However, on this boot it was loaded.  I then
>> ^D'ed and continued to multi-user, where X failed to start because
>> this time, the nvidia modules were not loaded.  (but nvidia had
>> been loaded on the 1st boot).
>>
>> So it *seems* like different modules are randomly not loaded by the
>> loader, at boot.
>> The ZFS config is:
>>
>> config:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         tank        ONLINE       0     0     0
>>           mirror-0  ONLINE       0     0     0
>>             ada0p2  ONLINE       0     0     0
>>             da3p2   ONLINE       0     0     0
>>           mirror-1  ONLINE       0     0     0
>>             ada1p2  ONLINE       0     0     0
>>             da0p2   ONLINE       0     0     0
>>         cache
>>           da2s1d    ONLINE       0     0     0
>>
>> The data drives in the pool are all exactly like this:
>>
>> =>        34  9767541101  ada0  GPT  (4.5T)
>>           34           6        - free -  (3.0K)
>>           40      204800     1  efi  (100M)
>>       204840  9763209216     2  freebsd-zfs  (4.5T)
>>   9763414056     4096000     3  freebsd-swap  (2.0G)
>>   9767510056       31079        - free -  (15M)
>>
>>
>> There is about 1.44T used in the pool.  I have no idea
>> how ZFS mirrors work, but I'm wondering if somehow this
>> is a 2T problem, and there are issues with blocks on
>> different sides of the mirror being across the 2T boundary.
>>
>> Sorry to be so vague.. but this is the one machine I *don't* have
>> a serial console on, so I don't have good logs.
>>
>> Drew
>>
>> _______________________________________________
>> freebsd-current@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
>
> What makes you think it is related to ZFS?
>
> Are there any error messages when the nvidia module did not load?
>

I think it is related to ZFS simply because I'm booting from ZFS and
it is not working reliably. Our systems at work, booting from UFS on
roughly the same svn rev, seem to still load modules reliably from
the loader. I know there has been a lot of work on the loader
recently, and in a UEFI + UFS context, I've seen it fail to boot
the right partition, etc.
However, I've never seen it fail to load just some modules. The one
difference between what I run at home and what we run at work is ZFS
vs UFS.

Given that it is a glass console, I have no confidence in my ability
to log error messages. However, I could have sworn that I saw
something like "io error" when it failed to load vmm.ko (I actually
rebooted several times when I was diagnosing it.. at first I thought
xhci was holding on to the pass-thru device)

I vaguely remembered reading something about this recently. I just
tracked it down to the "ZFS i/o error in recent 12.0" thread from
last month, and this message in particular:

https://lists.freebsd.org/pipermail/freebsd-current/2018-March/068890.html

I'm booting via UEFI into a ZFS system with a FS that extends across
2TB..

Is there something like tools/diag/prtblknos for ZFS?

Drew
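
As a rough sanity check of the "extends across 2TB" statement, here is a
minimal sketch of the arithmetic using the gpart numbers quoted above. It
assumes 512-byte sectors (consistent with 204800 sectors being shown as
100M) and takes "2TB" to mean 2 TiB, i.e. sector 2^32; both are
assumptions, not something the loader output confirmed. The function
names are made up for illustration.

```python
# Assumption: 512-byte sectors, as implied by "204800 ... efi (100M)".
SECTOR_SIZE = 512
# Assumption: the "2TB" boundary is 2 TiB (2**32 sectors of 512 bytes).
TWO_TIB = 2 * 1024**4

# freebsd-zfs partition from the quoted gpart output: start and size in sectors.
ZFS_START = 204840
ZFS_SIZE = 9763209216


def boundary_sector(boundary=TWO_TIB, sector_size=SECTOR_SIZE):
    """Disk sector number at which the byte boundary falls."""
    return boundary // sector_size


def crosses_boundary(start_sector, size_sectors,
                     boundary=TWO_TIB, sector_size=SECTOR_SIZE):
    """True if the partition starts below the boundary and ends above it."""
    start = start_sector * sector_size
    end = start + size_sectors * sector_size
    return start < boundary < end


if __name__ == "__main__":
    print("boundary at disk sector:", boundary_sector())
    print("zfs partition crosses it:", crosses_boundary(ZFS_START, ZFS_SIZE))
    print("offset inside partition (sectors):", boundary_sector() - ZFS_START)
```

This only shows that the freebsd-zfs partition itself spans the boundary.
It says nothing about where the blocks of vmm.ko or the nvidia modules
actually live, which is what a prtblknos-style tool would be needed for.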