Date:      Wed, 11 Jan 2017 19:54:02 -0800
From:      <soralx@cydem.org>
To:        <freebsd-virtualization@freebsd.org>
Subject:   Re: Issues with GTX960 on CentOS7 using bhyve PCI passthru (FreeBSD 11-RC2)
Message-ID:  <20170111195402.785f27c6@mscad14>
In-Reply-To: <93196ea2-5439-49ff-54fd-7b7273bdec85@freebsd.org>
References:  <20170110003332.7cf8ba15@mscad14> <0de7e0fe-5680-b1be-bd57-6bf446c2fd38@talk2dom.com> <0c927784-3e3f-7946-fba9-c25001f4156c@talk2dom.com> <20170110180117.7f246b5a@mscad14> <20170111014544.70670784@mscad14> <93196ea2-5439-49ff-54fd-7b7273bdec85@freebsd.org>


 I played around a bit more with the nVidia card in a FreeBSD guest.

 First, the diff of `nvidia-smi -q` output between host ('-') and guest
 ('+') [0] is interesting. It suggests that the card may be in an
 incompletely initialized state: notice the "Unknown Error" in place of
 the real UUID, and the P8 performance state. Could it be that the driver
 doesn't put the card's BIOS in the right state? The command was run in
 both host and guest without Xorg loaded.
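
 To make the comparison in [0] repeatable, here is a minimal Python sketch
 that flattens two saved `nvidia-smi -q` dumps into path/value pairs and
 reports the fields that differ. The function names (`parse_smi`,
 `diff_fields`) and the shortened sample text are illustrative assumptions,
 not part of any nVidia tool; the sample values are copied from [0].

```python
import re

# Matches the per-GPU section header, e.g. "GPU 0000:01:00.0".
GPU_HEADER = re.compile(r"GPU [0-9A-Fa-f]{4,8}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.[0-9A-Fa-f]")


def parse_smi(text):
    """Flatten `nvidia-smi -q` style text into {"Section/Sub/Key": value}."""
    fields = {}
    stack = []  # (indent, name) of the enclosing section headers
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("="):
            continue
        indent = len(line) - len(line.lstrip())
        # Leave any section that is at the same or a deeper indent level.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        key, sep, value = stripped.partition(" : ")
        if not sep:
            # A section header. The GPU header carries the bus address, which
            # legitimately differs between host and guest, so normalize it.
            name = "GPU" if GPU_HEADER.fullmatch(stripped) else stripped
            stack.append((indent, name))
            continue
        path = "/".join([name for _, name in stack] + [key.strip()])
        fields[path] = value.strip()
    return fields


def diff_fields(host, guest, ignore=("Timestamp",)):
    """Return {path: (host_value, guest_value)} for fields that differ."""
    changed = {}
    for path in sorted(set(host) | set(guest)):
        if path.split("/")[-1] in ignore:
            continue
        a, b = host.get(path), guest.get(path)
        if a != b:
            changed[path] = (a, b)
    return changed


# Shortened samples; values taken from the diff in [0].
HOST = """\
GPU 0000:01:00.0
    Product Name                    : Quadro 2000
    GPU UUID                        : GPU-f6c71b8e-f6c8-5a42-260d-1164720bf4f2
    Performance State               : P0
    Clocks
        Graphics                    : 625 MHz
"""
GUEST = """\
GPU 0000:00:04.0
    Product Name                    : Quadro 2000
    GPU UUID                        : Unknown Error
    Performance State               : P8
    Clocks
        Graphics                    : 405 MHz
"""

for path, (a, b) in diff_fields(parse_smi(HOST), parse_smi(GUEST)).items():
    print(f"{path}: host={a!r} guest={b!r}")
```

 Keying on the full section path rather than the bare field name matters
 here, because names such as "Current" and "Max" repeat under several
 sections of the real output.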

 Second, I was able to start Xorg by disabling rendering acceleration
 (Option "NoAccel"). Now nVidia's Xorg module no longer fails to allocate
 DMA (I guess it does not try?), but oddly it reserves 48 GB (!?) of
 virtual memory instead. Sadly, there is still no display for some reason.
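
 For reference, the 48 GB figure comes from the "Reserving 49152.00 MB"
 line in the Xorg log [2]. A small Python sketch of the arithmetic, assuming
 the driver's "MB" means MiB (an assumption, the log does not say); the
 helper name `reserved_gib` is made up for illustration:

```python
import re

# Both values are copied from the Xorg log in [2].
LOG_LINE = "[   668.144] (II) NVIDIA: Reserving 49152.00 MB of virtual memory for indirect memory"
CARD_KBYTES = 1048576  # from "(--) NVIDIA(0): Memory: 1048576 kBytes"


def reserved_gib(line):
    """Extract the reserved size from the log line and convert MiB to GiB."""
    m = re.search(r"Reserving\s+([\d.]+)\s+MB", line)
    return float(m.group(1)) / 1024 if m else None


card_gib = CARD_KBYTES / 1024 / 1024
print(f"reserved: {reserved_gib(LOG_LINE)} GiB, card memory: {card_gib} GiB")
```

 So the reservation is address space only, and is far larger than the
 card's own memory.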

 Relevant dmesg bits are below [1]. Of particular interest is the line
 "nvidia-modeset: Allocated GPU:0 () @ PCI:0000:00:00.0" -- the PCI
 address is obviously incorrect.
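
 The mismatch can be checked mechanically. Below is a minimal Python sketch
 that collects the PCI addresses each driver component prints in dmesg and
 flags disagreement. The names `pci_addresses` and `disagree` are
 hypothetical; the sample lines are copied from [1].

```python
import re
from collections import Counter

# Domain:bus:device, with an optional ".function" suffix.
PCI_RE = re.compile(
    r"PCI:([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})(?:\.([0-7]))?"
)


def pci_addresses(dmesg_text):
    """Map each message prefix (e.g. "NVRM") to a Counter of PCI addresses."""
    seen = {}
    for line in dmesg_text.splitlines():
        m = PCI_RE.search(line)
        if not m:
            continue
        source = line.split(":", 1)[0].strip()
        seen.setdefault(source, Counter())[m.group(1, 2, 3)] += 1
    return seen


def disagree(seen):
    """True if the components report more than one distinct address."""
    return len({addr for counts in seen.values() for addr in counts}) > 1


DMESG = """\
nvidia-modeset: Allocated GPU:0 () @ PCI:0000:00:00.0
nvidia-modeset: WARNING: GPU:0: Lost display notification; continuing.
NVRM: Xid (PCI:0000:00:04): 16, Head 00000000 Count 00000000
NVRM: Xid (PCI:0000:00:04): 16, Head 00000000 Count 00000001
"""

seen = pci_addresses(DMESG)
for source, counts in seen.items():
    for addr, n in counts.items():
        print(f"{source}: {':'.join(addr)} x{n}")
print("components disagree:", disagree(seen))
```

 On the excerpt above, NVRM reports device 04 (matching the slot seen by
 `nvidia-smi` and Xorg) while nvidia-modeset reports device 00.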

 Xorg log bits [2] show that X is up, but the monitor stays in sleep mode.
 With more options [3], I get this: [4]. Edit: actually, after a host
 reboot it behaves the same as with just "NoAccel", maybe.

 Clearly the driver is able to talk to the card: e.g., it attaches,
 responds to `nvidia-smi` (with the exception of the UUID), and reads EDID
 from the monitor. But some channel of communication is clearly missing or
 not working right. Any ideas on how to find out which one?
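
 One possible starting point, sketched in Python under the assumption that
 long pauses in the Xorg log mark places where the driver waited on the
 card: list the (EE)/(WW) entries and any gap between consecutive
 timestamps above a threshold. The helper names and the 1-second threshold
 are arbitrary choices for illustration; the sample lines are from [2].

```python
import re

# "[   659.609] (WW) message" -> (659.609, "(WW) message")
STAMP_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse(log_text):
    """Return (timestamp, message) for every timestamped Xorg log line."""
    entries = []
    for line in log_text.splitlines():
        m = STAMP_RE.match(line)
        if m:
            entries.append((float(m.group(1)), m.group(2)))
    return entries


def stalls(entries, threshold=1.0):
    """Return (gap_seconds, message_before, message_after) for long pauses."""
    return [
        (t2 - t1, msg1, msg2)
        for (t1, msg1), (t2, msg2) in zip(entries, entries[1:])
        if t2 - t1 > threshold
    ]


def errors(entries):
    """Return the entries flagged as errors (EE) or warnings (WW)."""
    return [(t, msg) for t, msg in entries if msg.startswith(("(EE)", "(WW)"))]


XORG_LOG = """\
[   659.609] (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
[   659.610] (**) NVIDIA(0): Disabling 2D acceleration
[   668.054] (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:0:4:0
[   668.370] (II) NVIDIA(0): Setting mode "DFP-0:nvidia-auto-select"
[   675.574] (==) NVIDIA(0): Disabling shared memory pixmaps
[   675.577] (EE) Failed to initialize GLX extension (Compatible NVIDIA X driver not found)
"""

entries = parse(XORG_LOG)
for gap, before, after in stalls(entries):
    print(f"{gap:.1f}s pause between: {before!r} -> {after!r}")
for t, msg in errors(entries):
    print(f"{t:.3f} {msg}")
```

 In the excerpt, the two long pauses fall before display detection and
 right after "Setting mode", which at least narrows down where to look.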

[0]
 ==============NVSMI LOG==============
 
-Timestamp                           : Wed Jan 11 19:40:54 2017
+Timestamp                           : Wed Jan 11 11:08:40 2017
 Driver Version                      : 367.44
 
 Attached GPUs                       : 1
-GPU 0000:01:00.0
+GPU 0000:00:04.0
     Product Name                    : Quadro 2000
     Product Brand                   : Quadro
     Display Mode                    : Enabled
@@ -17,11 +17,11 @@
         Current                     : N/A
         Pending                     : N/A
     Serial Number                   : N/A
-    GPU UUID                        : GPU-f6c71b8e-f6c8-5a42-260d-1164720bf4f2
+    GPU UUID                        : Unknown Error
     Minor Number                    : 0
     VBIOS Version                   : 70.06.0D.00.02
     MultiGPU Board                  : No
-    Board ID                        : 0x100
+    Board ID                        : 0x4
     GPU Part Number                 : N/A
     Inforom Version
         Image Version               : N/A
@@ -34,16 +34,16 @@
     GPU Virtualization Mode
         Virtualization mode         : None
     PCI
-        Bus                         : 0x01
-        Device                      : 0x00
+        Bus                         : 0x00
+        Device                      : 0x04
         Domain                      : 0x0000
         Device Id                   : 0x0DD810DE
-        Bus Id                      : 0000:01:00.0
+        Bus Id                      : 0000:00:04.0
         Sub System Id               : 0x084A10DE
         GPU Link Info
             PCIe Generation
                 Max                 : 2
-                Current             : 2
+                Current             : 1
             Link Width
                 Max                 : 16x
                 Current             : 16x
@@ -54,7 +54,7 @@
         Tx Throughput               : N/A
         Rx Throughput               : N/A
     Fan Speed                       : 30 %
-    Performance State               : P0
+    Performance State               : P8
     Clocks Throttle Reasons         : N/A
     FB Memory Usage
         Total                       : 963 MiB
@@ -113,7 +113,7 @@
         Double Bit ECC              : N/A
         Pending                     : N/A
     Temperature
-        GPU Current Temp            : 38 C
+        GPU Current Temp            : 35 C
         GPU Shutdown Temp           : N/A
         GPU Slowdown Temp           : N/A
     Power Readings
@@ -125,10 +125,10 @@
         Min Power Limit             : N/A
         Max Power Limit             : N/A
     Clocks
-        Graphics                    : 625 MHz
-        SM                          : 1251 MHz
-        Memory                      : 1304 MHz
-        Video                       : 540 MHz
+        Graphics                    : 405 MHz
+        SM                          : 810 MHz
+        Memory                      : 324 MHz
+        Video                       : 405 MHz
     Applications Clocks
         Graphics                    : N/A
         Memory                      : N/A


[1]
nvidia0: <Quadro 2000> on vgapci0
vgapci0: child nvidia0 requested pci_enable_io
vgapci0: attempting to allocate 1 MSI vectors (1 supported)
msi: routing MSI IRQ 269 to local APIC 3 vector 51
vgapci0: using IRQ 269 for MSI
vgapci0: child nvidia0 requested pci_enable_io
nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  367.44  Wed Aug 17 22:05:09 PDT 2016
acquiring duplicate lock of same type: "os.lock_sx"
 1st os.lock_sx @ nvidia_os.c:599
 2nd os.lock_sx @ nvidia_os.c:599
stack backtrace:
#0 0xffffffff80aa6780 at witness_debugger+0x70
#1 0xffffffff80aa6683 at witness_checkorder+0xde3
#2 0xffffffff80a4fac2 at _sx_xlock+0x72
#3 0xffffffff82a515c2 at os_acquire_mutex+0x32
#4 0xffffffff82a21068 at _nv016673rm+0x18
nvidia-modeset: Allocated GPU:0 () @ PCI:0000:00:00.0
nvidia-modeset: WARNING: GPU:0: Lost display notification; continuing.
NVRM: Xid (PCI:0000:00:04): 16, Head 00000000 Count 00000000
NVRM: Xid (PCI:0000:00:04): 16, Head 00000000 Count 00000001
NVRM: Xid (PCI:0000:00:04): 16, Head 00000000 Count 00000002
NVRM: Xid (PCI:0000:00:04): 16, Head 00000000 Count 00000003
NVRM: Xid (PCI:0000:00:04): 16, Head 00000000 Count 00000004
NVRM: Xid (PCI:0000:00:04): 16, Head 00000000 Count 00000005
NVRM: Xid (PCI:0000:00:04): 16, Head 00000000 Count 00000006
NVRM: Xid (PCI:0000:00:04): 16, Head 00000000 Count 00000007

When rebooting, I get this:
nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000857d:0:0:0x00000040

[2]
[   659.406] (--) PCI:*(0:0:4:0) 10de:0dd8:10de:084a rev 161, Mem @ 0xc2000000/33554432, 0x3400000000/134217728, 0x3408000000/67108864, I/O @ 0x00002080/128, BIOS @ 0x????????/65536
[   659.407] (II) LoadModule: "glx"
[   659.412] (II) Loading /usr/local/lib/xorg/modules/extensions/libglx.so
[   659.594] (II) Module glx: vendor="NVIDIA Corporation"
[   659.594]    compiled for 4.0.2, module version = 1.0.0
[   659.594]    Module class: X.Org Server Extension
[   659.594] (II) NVIDIA GLX Module  367.44  Wed Aug 17 22:01:17 PDT 2016
[   659.595] (II) LoadModule: "nvidia"
[   659.595] (II) Loading /usr/local/lib/xorg/modules/drivers/nvidia_drv.so
[   659.603] (II) Module nvidia: vendor="NVIDIA Corporation"
[   659.603]    compiled for 4.0.2, module version = 1.0.0
[   659.603]    Module class: X.Org Video Driver
[   659.604] (II) NVIDIA dlloader X Driver  367.44  Wed Aug 17 21:41:06 PDT 2016
[   659.604] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[   659.604] (--) Using syscons driver with X support (version 2.0)
[   659.604] (--) using VT number 9
[...]
[   659.609] (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
[   659.609] (**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32
[   659.609] (==) NVIDIA(0): RGB weight 888
[   659.609] (==) NVIDIA(0): Default visual is TrueColor
[   659.609] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
[   659.610] (**) NVIDIA(0): Option "NoPowerConnectorCheck" "yes"
[   659.610] (**) NVIDIA(0): Option "ThermalConfigurationCheck" "no"
[   659.610] (**) NVIDIA(0): Option "Accel" "no"
[   659.610] (**) NVIDIA(0): Disabling 2D acceleration
[   668.054] (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:0:4:0
[   668.054] (--) NVIDIA(0):     CRT-0
[   668.054] (--) NVIDIA(0):     DFP-0 (boot)
[...]
[   668.125] (II) NVIDIA(GPU-0): Skipping Power Connector Check.
[   668.125] (II) NVIDIA(GPU-0): Skipping Thermal Configuration Check.
[   668.125] (II) NVIDIA(GPU-0): Acceleration disabled.
[   668.125] (II) NVIDIA(0): NVIDIA GPU Quadro 2000 (GF106GL) at PCI:0:4:0 (GPU-0)
[   668.125] (--) NVIDIA(0): Memory: 1048576 kBytes
[   668.125] (--) NVIDIA(0): VideoBIOS: 70.06.0d.00.02
[   668.125] (II) NVIDIA(0): Detected PCI Express Link width: 16X
[   668.125] (**) NVIDIA(0): Using HorizSync/VertRefresh ranges from the EDID for display
[   668.125] (**) NVIDIA(0):     device DELL 2007FP (DFP-0) (Using EDID frequencies has
[   668.125] (**) NVIDIA(0):     been enabled on all display devices.)
[   668.127] (==) NVIDIA(0): 
[   668.127] (==) NVIDIA(0): No modes were requested; the default mode "nvidia-auto-select"
[   668.127] (==) NVIDIA(0):     will be used as the requested mode.
[   668.127] (==) NVIDIA(0): 
[   668.128] (II) NVIDIA(0): Validated MetaModes:
[   668.128] (II) NVIDIA(0):     "DFP-0:nvidia-auto-select"
[   668.128] (II) NVIDIA(0): Virtual screen size determined to be 1600 x 1200
[   668.139] (--) NVIDIA(0): DPI set to (99, 98); computed from "UseEdidDpi" X config
[   668.139] (--) NVIDIA(0):     option
[   668.139] (--) Depth 24 pixmap format is 32 bpp
[   668.144] (II) NVIDIA: Reserving 49152.00 MB of virtual memory for indirect memory
[   668.144] (II) NVIDIA:     access.
[   668.370] (II) NVIDIA(0): Setting mode "DFP-0:nvidia-auto-select"
[   675.574] (==) NVIDIA(0): Disabling shared memory pixmaps
[   675.574] (==) NVIDIA(0): Backing store enabled
[   675.574] (==) NVIDIA(0): Silken mouse enabled
[   675.575] (**) NVIDIA(0): DPMS enabled
[   675.575] (II) Loading sub module "dri2"
[   675.575] (II) LoadModule: "dri2"
[   675.575] (II) Module "dri2" already built-in
[   675.575] (II) NVIDIA(0): [DRI2] Setup complete
[   675.575] (II) NVIDIA(0): [DRI2]   VDPAU driver: nvidia
[   675.576] (--) RandR disabled
[   675.577] (EE) Failed to initialize GLX extension (Compatible NVIDIA X driver not found)
[   675.577] (II) Loading sub module "shadow"
[   675.577] (II) LoadModule: "shadow"
[   675.577] (II) Loading /usr/local/lib/xorg/modules/libshadow.so
[   675.577] (II) Module shadow: vendor="X.Org Foundation"
[   675.577]    compiled for 1.17.4, module version = 1.1.0
[   675.577]    ABI class: X.Org ANSI C Emulation, version 0.4
[   675.638] (II) config/devd: probing input devices...
[   675.638] (II) config/devd: adding input device (null) (/dev/kbdmux)
[   675.638] (II) LoadModule: "kbd"
[   675.639] (II) Loading /usr/local/lib/xorg/modules/input/kbd_drv.so
[   675.640] (II) Module kbd: vendor="X.Org Foundation"
[   675.640]    compiled for 1.17.4, module version = 1.8.1
[   675.640]    Module class: X.Org XInput Driver
[   675.640]    ABI class: X.Org XInput driver, version 21.0
[   675.640] (II) Using input driver 'kbd' for 'kbdmux'
[   675.640] (**) kbdmux: always reports core events
[   675.640] (**) kbdmux: always reports core events
[   675.640] (**) Option "Protocol" "standard"
[   675.640] (**) Option "XkbRules" "base"
[   675.640] (**) Option "XkbModel" "pc105"
[   675.640] (**) Option "XkbLayout" "us"
[   675.640] (**) Option "config_info" "devd:kbdmux"
[   675.640] (II) XINPUT: Adding extended input device "kbdmux" (type: KEYBOARD, id 6)
[   675.640] (II) config/devd: kbdmux is enabled, ignoring device atkbd0
[   675.640] (II) config/devd: adding input device (null) (/dev/sysmouse)
[   675.640] (II) LoadModule: "mouse"
[   675.640] (II) Loading /usr/local/lib/xorg/modules/input/mouse_drv.so
[   675.642] (II) Module mouse: vendor="X.Org Foundation"
[   675.642]    compiled for 1.17.4, module version = 1.9.1
[   675.642]    Module class: X.Org XInput Driver
[   675.642]    ABI class: X.Org XInput driver, version 21.0
[   675.642] (II) Using input driver 'mouse' for 'sysmouse'
[   675.642] (**) sysmouse: always reports core events
[   675.642] (**) Option "Device" "/dev/sysmouse"
[   675.642] (==) sysmouse: Protocol: "Auto"
[   675.642] (**) sysmouse: always reports core events
[   675.642] (==) sysmouse: Emulate3Buttons, Emulate3Timeout: 50
[   675.642] (**) sysmouse: ZAxisMapping: buttons 4 and 5
[   675.642] (**) sysmouse: Buttons: 5
[   675.642] (**) Option "config_info" "devd:sysmouse"
[   675.642] (II) XINPUT: Adding extended input device "sysmouse" (type: MOUSE, id 7)
[   675.642] (**) sysmouse: (accel) keeping acceleration scheme 1
[   675.642] (**) sysmouse: (accel) acceleration profile 0
[   675.642] (**) sysmouse: (accel) acceleration factor: 2.000
[   675.642] (**) sysmouse: (accel) acceleration threshold: 4
[   675.642] (II) sysmouse: SetupAuto: hw.iftype is 4, hw.model is 0
[   675.642] (II) sysmouse: SetupAuto: protocol is SysMouse


[3]
Option "Accel" "no"
Option "RenderAccel" "no"
Option "UBB" "no"
Option "NoFlip" "yes"
Option "Overlay" "no"
Option "MultisampleCompatibility" "off"
Option "NoPowerConnectorCheck" "yes"
Option "ThermalConfigurationCheck" "no"
Option "TripleBuffer" "off"
Option "ModeDebug" "yes"
Option "IndirectMemoryAccess" "no"
Option "UseSysmemPixmapAccel" "no"
#Option "UseDPLib" "off"


[4]
Jan 11 11:34:49 fbsd12tst kernel: nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed
Jan 11 11:34:49 fbsd12tst kernel: nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x20
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8243997b
stack pointer           = 0x28:0xfffffe007aec8ab8
frame pointer           = 0x28:0xfffff80006a58808
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 3
current process         = 631 (Xorg)
[ thread pid 631 tid 100062 ]
Stopped at      _nv002035kms+0x3b:      movq    0x20(%rax,%rdx,8),%rax
db> 


-- 
[SorAlx]  ridin' VN2000 Classic LT


