Date:      Sun, 11 Jul 2010 13:14:44 -0700
From:      Doug Barton <dougb@FreeBSD.org>
To:        Rene Ladan <rene@freebsd.org>
Cc:        danfe@FreeBSD.org, Christian Zander <czander@nvidia.com>, David Naylor <naylor.b.david@gmail.com>, Yuri Pankov <yuri.pankov@gmail.com>, freebsd-current@freebsd.org
Subject:   Re: nvidia-driver crashing kernel on head
Message-ID:  <4C3A2634.5050003@FreeBSD.org>
In-Reply-To: <4C36488A.6030203@freebsd.org>
References:  <201007021146.46542.naylor.b.david@gmail.com> <AANLkTimT4UwDzB6jF2eML4U7jQubOs1slwBPHwy_5U3b@mail.gmail.com> <201007021855.42103.naylor.b.david@gmail.com> <201007080826.32764.jhb@freebsd.org> <alpine.BSF.2.00.1007081304590.5061@yncgbc.qbhto.arg> <4C36488A.6030203@freebsd.org>

This is a multi-part message in MIME format.
--------------040401050506090804000701
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

On 07/08/10 14:52, Rene Ladan wrote:
> On 08-07-2010 22:09, Doug Barton wrote:
>> On Thu, 8 Jul 2010, John Baldwin wrote:
>>
>>> These freezes and panics are due to the driver using a spin mutex
>>> instead of a
>>> regular mutex for the per-file descriptor event_mtx.  If you patch the
>>> driver
>>> to change it to be a regular mutex I think that should fix the problems.
>>
>> Can you give an example? :) I don't mind creating a patch for all of
>> them if you can illustrate what needs to be changed.
>>
> See the attached patch

To use 195.36.15 I needed the patch Rene sent, jhb's earlier suggestion
to remove some locks, plus a bit more. The patch that got it working on
HEAD for me (specifically r209633) is attached. With that patch I could
start X and run it for a while, but performance was very poor, even
compared with the stock nv driver, and it crashed a couple of times
(although not nearly as badly as before).
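
For readers who want to see the shape of the locking change without
reading the whole diff, here is a minimal userland sketch of the
per-file event queue pattern. It is an analogy only, not the driver
code: pthread_mutex_t stands in for a kernel MTX_DEF mutex, malloc/free
for the kernel allocator, and the names filep_init, post_event and
get_event are hypothetical. Unlike the driver's mutex it is not
recursive.

```c
/* Userland analogy only: pthread_mutex_t stands in for a kernel
 * MTX_DEF ("regular") mutex. All names here are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/queue.h>

struct event {
    int index;                      /* payload, like et->event.index */
    STAILQ_ENTRY(event) queue;      /* linkage for the tail queue */
};

struct filep {
    pthread_mutex_t event_mtx;      /* protects event_queue */
    STAILQ_HEAD(event_head, event) event_queue;
};

/* Initialise the lock and the empty queue (compare mtx_init + STAILQ_INIT). */
void filep_init(struct filep *fp)
{
    assert(fp != NULL);
    pthread_mutex_init(&fp->event_mtx, NULL);
    STAILQ_INIT(&fp->event_queue);
}

/* Append an event; the lock is held only around the queue update. */
int post_event(struct filep *fp, int index)
{
    struct event *et = malloc(sizeof(*et));

    if (et == NULL)
        return -1;
    et->index = index;

    pthread_mutex_lock(&fp->event_mtx);
    STAILQ_INSERT_TAIL(&fp->event_queue, et, queue);
    pthread_mutex_unlock(&fp->event_mtx);
    return 0;
}

/* Pop the oldest event; returns 0 on success, -1 if the queue is empty.
 * The entry is freed only after the lock has been dropped. */
int get_event(struct filep *fp, int *index, bool *pending)
{
    struct event *et;

    pthread_mutex_lock(&fp->event_mtx);
    et = STAILQ_FIRST(&fp->event_queue);

    if (et != NULL) {
        *index = et->index;
        STAILQ_REMOVE_HEAD(&fp->event_queue, queue);
        *pending = !STAILQ_EMPTY(&fp->event_queue);

        pthread_mutex_unlock(&fp->event_mtx);
        free(et);
        return 0;
    }

    pthread_mutex_unlock(&fp->event_mtx);
    return -1;
}
```

In the real patches the equivalent change is just MTX_SPIN -> MTX_DEF in
mtx_init() and mtx_lock_spin()/mtx_unlock_spin() ->
mtx_lock()/mtx_unlock() at each call site.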

So based on other suggestions I tried the newest release version from
nvidia, 256.35. It needed some of the same locking changes; a patch for
the port, which includes the locking patch, is also attached. If you are
running an amd64 system you'll have to type 'make makesum' after
applying this patch to the port, since the distinfo in the patch only
has checksums for the x86 tarball. I'm not sure this patch is complete,
or what Alexey might want to do with the update, but it does create an
accurate plist, which means you can cleanly deinstall/pkg_delete when
you're done.

With 256.35 performance and stability have both been quite good,
comparable even to before the drama started. The only concern I have
at this point is that I'm periodically getting a strange sort of "flash"
popping up on my screen that I didn't get while I was running the nv
driver recently. It looks sort of like the default X background (the
tiny gray crosshatch) is popping through for just a split second.


hth,

Doug

-- 

	... and that's just a little bit of history repeating.
			-- Propellerheads

	Improve the effectiveness of your Internet presence with
	a domain name makeover!    http://SupersetSolutions.com/


--------------040401050506090804000701
Content-Type: text/plain;
 name="nvidia-port-locking-256-35.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="nvidia-port-locking-256-35.diff"

Index: Makefile
===================================================================
RCS file: /home/pcvs/ports/x11/nvidia-driver/Makefile,v
retrieving revision 1.98
diff -u -r1.98 Makefile
--- Makefile	24 May 2010 03:01:56 -0000	1.98
+++ Makefile	11 Jul 2010 20:02:47 -0000
@@ -6,7 +6,7 @@
 #
 
 PORTNAME=	nvidia-driver
-DISTVERSION?=	195.36.15
+DISTVERSION?=	256.35
 PORTREVISION?=	0			# As a reminder it can be overridden
 CATEGORIES=	x11 kld
 MASTER_SITES=	${MASTER_SITE_NVIDIA}
@@ -143,9 +143,6 @@
 .endif
 .if ${NVVERSION} < 1802900
 	@${REINPLACE_CMD} '/vdpau/d' ${TMPPLIST}
-.else
-	@${MKDIR} ${PREFIX}/include/vdpau
-	@${LN} -sf ${DOCSDIR}/vdpau*.h ${PREFIX}/include/vdpau
 .endif
 .if ${NVVERSION} < 1851829
 	@${REINPLACE_CMD} '/libcuda/d' ${TMPPLIST}
Index: distinfo
===================================================================
RCS file: /home/pcvs/ports/x11/nvidia-driver/distinfo,v
retrieving revision 1.36
diff -u -r1.36 distinfo
--- distinfo	10 Apr 2010 13:40:07 -0000	1.36
+++ distinfo	11 Jul 2010 20:02:47 -0000
@@ -1,15 +1,3 @@
-MD5 (NVIDIA-FreeBSD-x86-195.36.15.tar.gz) = 2537ca726240344c7eaa44857e2b134e
-SHA256 (NVIDIA-FreeBSD-x86-195.36.15.tar.gz) = 21fc89fa59e2cc96e560af856a3fa583ce4bfb7975465c71170c64962201e7a1
-SIZE (NVIDIA-FreeBSD-x86-195.36.15.tar.gz) = 25614326
-MD5 (NVIDIA-FreeBSD-x86_64-195.36.15.tar.gz) = 95af03aedc818a3dfd8ae9f289746ba4
-SHA256 (NVIDIA-FreeBSD-x86_64-195.36.15.tar.gz) = d64c664398cb4dade24af6b108e03607614f1f7584c71449230c646c313d0e7e
-SIZE (NVIDIA-FreeBSD-x86_64-195.36.15.tar.gz) = 26449559
-MD5 (NVIDIA-FreeBSD-x86-173.14.25.tar.gz) = 1eca3916a9ae86b953f54405e1881774
-SHA256 (NVIDIA-FreeBSD-x86-173.14.25.tar.gz) = c432ed94ce71e297b2d9304d9f34f906b58e2c7c4bc13d8dbac264ed52fd6261
-SIZE (NVIDIA-FreeBSD-x86-173.14.25.tar.gz) = 16682722
-MD5 (NVIDIA-FreeBSD-x86-96.43.16.tar.gz) = 3fc5c2bb537d4a7664d84a7a0df09c7c
-SHA256 (NVIDIA-FreeBSD-x86-96.43.16.tar.gz) = 38bf334284dc600d92d8436333c98d5577e34d69456ed71f1cccc75caa6dffcd
-SIZE (NVIDIA-FreeBSD-x86-96.43.16.tar.gz) = 11842453
-MD5 (NVIDIA-FreeBSD-x86-71.86.13.tar.gz) = 19000b906225ebd39ca3edc1b0c3c7a5
-SHA256 (NVIDIA-FreeBSD-x86-71.86.13.tar.gz) = 27ae01cd6fe050871f7785c2146b18e74ea882f6262e46dc965bf26061238447
-SIZE (NVIDIA-FreeBSD-x86-71.86.13.tar.gz) = 8066159
+MD5 (NVIDIA-FreeBSD-x86-256.35.tar.gz) = 599908c9ffd8999ab0333cab34ea15a0
+SHA256 (NVIDIA-FreeBSD-x86-256.35.tar.gz) = 897c711acdca188da26868aec510c732d34f415ae621c35e5556ed8de493f26e
+SIZE (NVIDIA-FreeBSD-x86-256.35.tar.gz) = 26047458
Index: pkg-plist
===================================================================
RCS file: /home/pcvs/ports/x11/nvidia-driver/pkg-plist,v
retrieving revision 1.27
diff -u -r1.27 pkg-plist
--- pkg-plist	10 Apr 2010 13:40:07 -0000	1.27
+++ pkg-plist	11 Jul 2010 20:02:47 -0000
@@ -10,15 +10,13 @@
 @unexec mv -f %D/%%MODULESDIR%%/extensions/XXX-libglx.so.%%%%.%%XSERVVERSION%% %D/%%MODULESDIR%%/extensions/libglx.so
 @exec mv -f %D/lib/libGL.so.1 %D/lib/XXX-libGL.so.1.%%%%.%%LIBGLVERSION%%
 @unexec mv -f %D/lib/XXX-libGL.so.1.%%%%.%%LIBGLVERSION%% %D/lib/libGL.so.1
-include/vdpau/vdpau.h
-include/vdpau/vdpau_x11.h
-@dirrm include/vdpau
+lib/libGL.so
+lib/libnvidia-glcore.so.1
+lib/libnvidia-glcore.so
 lib/libnvidia-tls.so.1
 lib/libnvidia-tls.so
 lib/libnvidia-cfg.so.1
 lib/libnvidia-cfg.so
-lib/libGLcore.so.1
-lib/libGLcore.so
 lib/libvdpau.so.1
 lib/libvdpau.so
 lib/vdpau/libvdpau_nvidia.so.1
@@ -41,12 +39,10 @@
 %%LINUX%%@cwd %%LINUXBASE%%
 %%LINUX%%usr/lib/libGL.so.%%SHLIB_VERSION%%
 %%LINUX%%usr/lib/libGL.so.1
-%%LINUX%%usr/lib/libGLcore.so.%%SHLIB_VERSION%%
-%%LINUX%%usr/lib/libGLcore.so.1
 %%LINUX%%usr/lib/libcuda.so.%%SHLIB_VERSION%%
 %%LINUX%%usr/lib/libcuda.so.1
+%%LINUX%%usr/lib/libnvidia-glcore.so.%%SHLIB_VERSION%%
 %%LINUX%%usr/lib/libnvidia-tls.so.%%SHLIB_VERSION%%
-%%LINUX%%usr/lib/libnvidia-tls.so.1
 %%LINUX%%usr/lib/libvdpau.so.%%SHLIB_VERSION%%
 %%LINUX%%usr/lib/libvdpau.so.1
 %%LINUX%%usr/lib/libvdpau_nvidia.so
Index: files/patch-nvidia-locking-256-35
===================================================================
RCS file: files/patch-nvidia-locking-256-35
diff -N files/patch-nvidia-locking-256-35
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ files/patch-nvidia-locking-256-35	11 Jul 2010 20:02:47 -0000
@@ -0,0 +1,100 @@
+diff -ur NVIDIA-FreeBSD-x86-256.35-port-patched/src/nvidia_ctl.c src/nvidia_ctl.c
+--- NVIDIA-FreeBSD-x86-256.35-port-patched/src/nvidia_ctl.c	2010-06-16 18:36:40.000000000 -0700
++++ src/nvidia_ctl.c	2010-07-11 01:22:55.000000000 -0700
+@@ -53,7 +53,7 @@
+     }
+ 
+     filep->nv = nv;
+-    mtx_init(&filep->event_mtx, "event_mtx", NULL, (MTX_SPIN | MTX_RECURSE));
++    mtx_init(&filep->event_mtx, "event_mtx", NULL, (MTX_DEF | MTX_RECURSE));
+     STAILQ_INIT(&filep->event_queue);
+ 
+     nv_lock_api(nv);
+@@ -123,7 +123,7 @@
+     if (status != 0)
+         return status;
+ 
+-    mtx_lock_spin(&filep->event_mtx);
++    mtx_lock(&filep->event_mtx);
+     et = STAILQ_FIRST(&filep->event_queue);
+ 
+     if (et == NULL)
+@@ -131,7 +131,7 @@
+     else
+         mask = (events & (POLLIN | POLLPRI | POLLRDNORM));
+ 
+-    mtx_unlock_spin(&filep->event_mtx);
++    mtx_unlock(&filep->event_mtx);
+ 
+     return mask;
+ }
+diff -ur NVIDIA-FreeBSD-x86-256.35-port-patched/src/nvidia_dev.c src/nvidia_dev.c
+--- NVIDIA-FreeBSD-x86-256.35-port-patched/src/nvidia_dev.c	2010-06-16 18:36:40.000000000 -0700
++++ src/nvidia_dev.c	2010-07-11 01:22:55.000000000 -0700
+@@ -52,7 +52,7 @@
+     }
+ 
+     filep->nv = nv;
+-    mtx_init(&filep->event_mtx, "event_mtx", NULL, (MTX_SPIN | MTX_RECURSE));
++    mtx_init(&filep->event_mtx, "event_mtx", NULL, (MTX_DEF | MTX_RECURSE));
+     STAILQ_INIT(&filep->event_queue);
+ 
+     nv_lock_api(nv);
+@@ -123,7 +123,7 @@
+     if (status != 0)
+         return status;
+ 
+-    mtx_lock_spin(&filep->event_mtx);
++    mtx_lock(&filep->event_mtx);
+     et = STAILQ_FIRST(&filep->event_queue);
+ 
+     if (et == NULL)
+@@ -131,7 +131,7 @@
+     else
+         mask = (events & (POLLIN | POLLPRI | POLLRDNORM));
+ 
+-    mtx_unlock_spin(&filep->event_mtx);
++    mtx_unlock(&filep->event_mtx);
+ 
+     return mask;
+ }
+diff -ur NVIDIA-FreeBSD-x86-256.35-port-patched/src/nvidia_subr.c src/nvidia_subr.c
+--- NVIDIA-FreeBSD-x86-256.35-port-patched/src/nvidia_subr.c	2010-06-16 18:36:40.000000000 -0700
++++ src/nvidia_subr.c	2010-07-11 01:22:55.000000000 -0700
+@@ -987,9 +987,9 @@
+     et->event.hObject = hObject;
+     et->event.index = index;
+ 
+-    mtx_lock_spin(&filep->event_mtx);
++    mtx_lock(&filep->event_mtx);
+     STAILQ_INSERT_TAIL(&filep->event_queue, et, queue);
+-    mtx_unlock_spin(&filep->event_mtx);
++    mtx_unlock(&filep->event_mtx);
+ 
+     selwakeup(&filep->event_rsel);
+ }
+@@ -1004,7 +1004,7 @@
+     struct nvidia_filep *filep = file;
+     struct nvidia_event *et;
+ 
+-    mtx_lock_spin(&filep->event_mtx);
++    mtx_lock(&filep->event_mtx);
+     et = STAILQ_FIRST(&filep->event_queue);
+ 
+     if (et != NULL) {
+@@ -1013,13 +1013,13 @@
+         STAILQ_REMOVE(&filep->event_queue, et, nvidia_event, queue);
+         *pending = !STAILQ_EMPTY(&filep->event_queue);
+ 
+-        mtx_unlock_spin(&filep->event_mtx);
++        mtx_unlock(&filep->event_mtx);
+         free(et, M_NVIDIA);
+ 
+         return RM_OK;
+     }
+ 
+-    mtx_unlock_spin(&filep->event_mtx);
++    mtx_unlock(&filep->event_mtx);
+     return RM_ERROR;
+ }
+ 

--------------040401050506090804000701
Content-Type: text/plain;
 name="patch-nvidia-locking-195-36-15"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="patch-nvidia-locking-195-36-15"

diff -ur NVIDIA-FreeBSD-x86-195.36.15-port-patched/src/nvidia_ctl.c src/nvidia_ctl.c
--- NVIDIA-FreeBSD-x86-195.36.15-port-patched/src/nvidia_ctl.c	2010-03-12 09:21:51.000000000 -0800
+++ src/nvidia_ctl.c	2010-07-10 16:36:49.000000000 -0700
@@ -54,7 +54,7 @@
     }
 
     filep->nv = nv;
-    mtx_init(&filep->event_mtx, "event_mtx", NULL, (MTX_SPIN | MTX_RECURSE));
+    mtx_init(&filep->event_mtx, "event_mtx", NULL, (MTX_DEF | MTX_RECURSE));
     STAILQ_INIT(&filep->event_queue);
 
     nv_lock_api(nv);
@@ -126,7 +126,7 @@
     if (status != 0)
         return status;
 
-    mtx_lock_spin(&filep->event_mtx);
+    mtx_lock(&filep->event_mtx);
     et = STAILQ_FIRST(&filep->event_queue);
 
     if (et == NULL)
@@ -134,7 +134,7 @@
     else
         mask = (events & (POLLIN | POLLPRI | POLLRDNORM));
 
-    mtx_unlock_spin(&filep->event_mtx);
+    mtx_unlock(&filep->event_mtx);
 
     return mask;
 }
diff -ur NVIDIA-FreeBSD-x86-195.36.15-port-patched/src/nvidia_dev.c src/nvidia_dev.c
--- NVIDIA-FreeBSD-x86-195.36.15-port-patched/src/nvidia_dev.c	2010-03-12 09:21:51.000000000 -0800
+++ src/nvidia_dev.c	2010-07-10 16:36:49.000000000 -0700
@@ -52,7 +52,7 @@
     }
 
     filep->nv = nv;
-    mtx_init(&filep->event_mtx, "event_mtx", NULL, (MTX_SPIN | MTX_RECURSE));
+    mtx_init(&filep->event_mtx, "event_mtx", NULL, (MTX_DEF | MTX_RECURSE));
     STAILQ_INIT(&filep->event_queue);
 
     nv_lock_api(nv);
@@ -125,7 +125,7 @@
     if (status != 0)
         return status;
 
-    mtx_lock_spin(&filep->event_mtx);
+    mtx_lock(&filep->event_mtx);
     et = STAILQ_FIRST(&filep->event_queue);
 
     if (et == NULL)
@@ -133,7 +133,7 @@
     else
         mask = (events & (POLLIN | POLLPRI | POLLRDNORM));
 
-    mtx_unlock_spin(&filep->event_mtx);
+    mtx_unlock(&filep->event_mtx);
 
     return mask;
 }
diff -ur NVIDIA-FreeBSD-x86-195.36.15-port-patched/src/nvidia_subr.c src/nvidia_subr.c
--- NVIDIA-FreeBSD-x86-195.36.15-port-patched/src/nvidia_subr.c	2010-03-12 09:21:52.000000000 -0800
+++ src/nvidia_subr.c	2010-07-10 16:37:43.000000000 -0700
@@ -967,9 +967,9 @@
     et->event.hObject = hObject;
     et->event.index = index;
 
-    mtx_lock_spin(&filep->event_mtx);
+    mtx_lock(&filep->event_mtx);
     STAILQ_INSERT_TAIL(&filep->event_queue, et, queue);
-    mtx_unlock_spin(&filep->event_mtx);
+    mtx_unlock(&filep->event_mtx);
 
     selwakeup(&filep->event_rsel);
 }
@@ -984,7 +984,7 @@
     struct nvidia_filep *filep = file;
     struct nvidia_event *et;
 
-    mtx_lock_spin(&filep->event_mtx);
+    mtx_lock(&filep->event_mtx);
     et = STAILQ_FIRST(&filep->event_queue);
 
     if (et != NULL) {
@@ -993,13 +993,13 @@
         STAILQ_REMOVE(&filep->event_queue, et, nvidia_event, queue);
         *pending = !STAILQ_EMPTY(&filep->event_queue);
 
-        mtx_unlock_spin(&filep->event_mtx);
+        mtx_unlock(&filep->event_mtx);
         free(et, M_NVIDIA);
 
         return RM_OK;
     }
 
-    mtx_unlock_spin(&filep->event_mtx);
+    mtx_unlock(&filep->event_mtx);
     return RM_ERROR;
 }
 
@@ -1301,9 +1301,6 @@
 
     for (i = 0; i < count; i++) {
         pte_array[i] = at->pte_array[i].physical_address;
-        vm_page_lock_queues();
-        vm_page_wire(PHYS_TO_VM_PAGE(pte_array[i]));
-        vm_page_unlock_queues();
         sglist_append_phys(at->sg_list, pte_array[i], PAGE_SIZE);
     }
 
@@ -1365,9 +1362,6 @@
         os_flush_cpu_cache();
 
     for (i = 0; i < count; i++) {
-        vm_page_lock_queues();
-        vm_page_unwire(PHYS_TO_VM_PAGE(at->pte_array[i].physical_address), 0);
-        vm_page_unlock_queues();
         kmem_free(kernel_map,
                 at->pte_array[i].virtual_address, PAGE_SIZE);
         malloc_type_freed(M_NVIDIA, PAGE_SIZE);

--------------040401050506090804000701--


