From owner-freebsd-virtualization@freebsd.org Wed Jan 11 09:45:51 2017
Date: Wed, 11 Jan 2017 01:45:44 -0800
Subject: Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
Message-ID: <20170111014544.70670784@mscad14>
In-Reply-To: <20170110180117.7f246b5a@mscad14>
References: <20170110003332.7cf8ba15@mscad14> <0de7e0fe-5680-b1be-bd57-6bf446c2fd38@talk2dom.com> <0c927784-3e3f-7946-fba9-c25001f4156c@talk2dom.com> <20170110180117.7f246b5a@mscad14>
List-Id: "Discussion of various virtualization techniques FreeBSD supports."

> The problem appears to be in the area of assigning memory-mapped
> I/O ranges by bhyve for the VGA card to a region outside of the
> CPU's addressable space; i.e., bhyve does not check CPUID's
> 0x80000008 AL value (0x27 for my CPU, which is 39 bits -- while
> bhyve assigns 0xd000000000 & above for the large Prefetch Memory
> chunks, which requires 40 address bits). At least this is my
> understanding of why VGA passthrough does not work.

To test this, I tried writing to PCI BARs in FreeBSD guest using
`pciconf -w`. Not much use that was: I could read back the values
written to the registers (e.g., `pciconf -r pci0:0:4:0 0x14:48`),
but `pciconf -lvb` still showed the same huge base addresses --
they did not want to change.

OK, I had enough of that. So I went to dig in the source, and
changed the "#define PCI_EMUL_MEMBASE64" from '0xD000000000UL' to
'0x3400000000UL' in src/usr.sbin/bhyve/pci_emul.c. Recompiled bhyve,
booted up FreeBSD, and:

# pciconf -lvb
[...]
vgapci0@pci0:0:4:0: class=0x030000 card=0x084a10de chip=0x0dd810de rev=0xa1 hdr=0x00
    vendor     = 'NVIDIA Corporation'
    device     = 'GF106GL [Quadro 2000]'
    class      = display
    subclass   = VGA
    bar   [10] = type Memory, range 32, base 0xc2000000, size 33554432, enabled
    bar   [14] = type Prefetchable Memory, range 64, base 0x3400000000, size 134217728, enabled
    bar   [1c] = type Prefetchable Memory, range 64, base 0x3408000000, size 67108864, enabled
    bar   [24] = type I/O Port, range 32, base 0x2080, size 128, enabled

...a-a-and:

# kldload nvidia-modeset
Linux ELF exec handler installed
nvidia0: on vgapci0
vgapci0: child nvidia0 requested pci_enable_io
vgapci0: attempting to allocate 1 MSI vectors (1 supported)
msi: routing MSI IRQ 269 to local APIC 3 vector 51
vgapci0: using IRQ 269 for MSI
vgapci0: child nvidia0 requested pci_enable_io
random: harvesting attach, 8 bytes (4 bits) from nvidia0

# nvidia-smi
acquiring duplicate lock of same type: "os.lock_sx"
 1st os.lock_sx @ nvidia_os.c:599
 2nd os.lock_sx @ nvidia_os.c:599
stack backtrace:
#0 0xffffffff80aa6780 at witness_debugger+0x70
#1 0xffffffff80aa6683 at witness_checkorder+0xde3
#2 0xffffffff80a4fac2 at _sx_xlock+0x72
#3 0xffffffff82a515c2 at os_acquire_mutex+0x32
#4 0xffffffff82a21068 at _nv016673rm+0x18
Tue Jan 10 17:06:48 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.44                 Driver Version: 367.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro 2000         Off  | 0000:00:04.0     Off |                  N/A |
| 30%   35C    P8    N/A /  N/A |      0MiB /   963MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Beauty! It's very slow to execute, though. And Xorg is not in a
hurry to start working:

[   204.724] (--) PCI:*(0:0:4:0) 10de:0dd8:10de:084a rev 161, Mem @ 0xc2000000/33554432, 0x3400000000/134217728, 0x3408000000/67108864, I/O @ 0x00002080/128, BIOS @ 0x????????/65536
[...]
[   204.736] (**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32
[   204.736] (==) NVIDIA(0): RGB weight 888
[   204.736] (==) NVIDIA(0): Default visual is TrueColor
[   204.736] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
[   204.738] (**) NVIDIA(0): Enabling 2D acceleration
[   213.674] (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:0:4:0
[   213.674] (--) NVIDIA(0): CRT-0
[   213.674] (--) NVIDIA(0): DFP-0 (boot)
[   213.674] (--) NVIDIA(0): DFP-1
[   213.674] (--) NVIDIA(0): DFP-2
[   213.674] (--) NVIDIA(0): DFP-3
[   213.675] (--) NVIDIA(0): DFP-4
[   213.698] (--) NVIDIA(0): CRT-0: disconnected
[   213.698] (--) NVIDIA(0): CRT-0: 400.0 MHz maximum pixel clock
[   213.698] (--) NVIDIA(0):
[   213.743] (--) NVIDIA(0): DELL 2007FP (DFP-0): connected
[   213.743] (--) NVIDIA(0): DELL 2007FP (DFP-0): Internal TMDS
[   213.743] (--) NVIDIA(0): DELL 2007FP (DFP-0): 330.0 MHz maximum pixel clock
[...]
[   213.747] (II) NVIDIA(0): NVIDIA GPU Quadro 2000 (GF106GL) at PCI:0:4:0 (GPU-0)
[   213.747] (--) NVIDIA(0): Memory: 1048576 kBytes
[   213.747] (--) NVIDIA(0): VideoBIOS: 70.06.0d.00.02
[   213.747] (II) NVIDIA(0): Detected PCI Express Link width: 16X
[   213.748] (**) NVIDIA(0): Using HorizSync/VertRefresh ranges from the EDID for display
[   213.748] (**) NVIDIA(0): device DELL 2007FP (DFP-0) (Using EDID frequencies has
[   213.748] (**) NVIDIA(0): been enabled on all display devices.)
[...]
[   213.751] (II) NVIDIA(0): Virtual screen size determined to be 1600 x 1200
[   213.761] (--) NVIDIA(0): DPI set to (99, 98); computed from "UseEdidDpi" X config
[   213.761] (--) NVIDIA(0): option
[   213.761] (--) Depth 24 pixmap format is 32 bpp
[   213.767] (II) NVIDIA: Reserving 12288.00 MB of virtual memory for indirect memory
[   213.767] (II) NVIDIA: access.
[   216.789] (EE) NVIDIA(GPU-0): Failed to initialize DMA.
[   216.789] (EE) *** Aborting ***
[   216.791] (EE) NVIDIA(0): Failed to allocate push buffer
[   216.839] (EE) Fatal server error:
[   216.839] (EE) AddScreen/ScreenInit failed for driver 0

Linux still doesn't work (curse Ubuntu! what a mess. It tried to
start Xorg at boot, so I managed to disable that, but no matter
what, I couldn't stop it from trying to run 'nvidia-smi' at boot!
And trust me, I tried a lot. I removed all the scripts related to
nvidia, /etc/udev/ is basically empty [/etc just looks like a
pile-up of crap, wow!], yet /usr/bin/nvidia-smi still tried to run
by itself until I moved it away).
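
As a side note on the PCI_EMUL_MEMBASE64 change described earlier, the
address-width arithmetic can be checked with a few lines of Python.
This is only a sketch of the reasoning, not bhyve's allocation logic:
the helper names bits_needed() and window_fits() are made up for
illustration, and the inputs are the 39-bit CPUID value, the two
base addresses, and the two prefetchable BAR sizes quoted in this
thread.

```python
def bits_needed(addr: int) -> int:
    """Number of address bits needed to express addr."""
    return addr.bit_length()


def window_fits(base: int, size: int, phys_bits: int) -> bool:
    """True if [base, base + size) lies entirely below 2**phys_bits."""
    return base + size <= (1 << phys_bits)


# CPUID 0x80000008 AL value reported in this thread (0x27 == 39 bits).
PHYS_BITS = 0x27

OLD_MEMBASE64 = 0xD000000000  # original PCI_EMUL_MEMBASE64
NEW_MEMBASE64 = 0x3400000000  # value used in the rebuilt bhyve

# Sizes of bar [14] and bar [1c] from the pciconf -lvb output.
BAR_SIZES = [134217728, 67108864]
TOTAL = sum(BAR_SIZES)

for name, base in (("old", OLD_MEMBASE64), ("new", NEW_MEMBASE64)):
    print(f"{name}: base {base:#x} needs {bits_needed(base)} bits, "
          f"fits in {PHYS_BITS} bits: {window_fits(base, TOTAL, PHYS_BITS)}")
```

The old base needs 40 address bits and so falls outside a 39-bit
space, while the new base needs 38 bits and leaves room for both
prefetchable BARs. Whether other limits apply inside the guest is
not covered by this check.
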

dmesg:
[    1.390957] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    1.394715] nvidia 0000:00:04.0: can't derive routing for PCI INT A
[    1.395185] nvidia 0000:00:04.0: PCI INT A: no GSI
[    1.414173] vgaarb: device changed decodes: PCI:0000:00:04.0,olddecodes=io+mem,decodes=none:owns=io+mem
[    1.417062] nvidia-nvlink: Nvlink Core is being initialized, major device number 247
[    1.417609] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 375.26 Thu Dec 8 18:36:43 PST 2016 (using threaded interrupts)
[    1.419820] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 375.26 Thu Dec 8 18:04:14 PST 2016
[    1.422067] [drm] [nvidia-drm] [GPU ID 0x00000004] Loading driver
[...]
[    3.904893] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 246

# lspci -vvn
00:04.0 0300: 10de:0dd8 (rev a1) (prog-if 00 [VGA controller])
        Subsystem: 10de:084a
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-