From: Rainer Hurling
Date: Sat, 10 Aug 2013 11:10:40 +0200
To: gljennjohn@googlemail.com
Cc: FreeBSD CURRENT
Subject: Re: CURRENT crashes with nvidia GPU BLOB : vm_radix_insert: key 23c078 is already present
Message-ID: <52060390.1040505@gwdg.de>
In-Reply-To: <20130810103705.022ce7be@ernst.home>

On 10.08.2013 10:37, Gary
Jennejohn:
> On Fri, 9 Aug 2013 10:12:37 -0700
> David Wolfskill wrote:
>
>> On Fri, Aug 09, 2013 at 07:32:51AM +0200, O. Hartmann wrote:
>>> ...
>>>>> On 8 August 2013 11:10, O. Hartmann
>>>>> wrote:
>>>>>> The most recent CURRENT doesn't work with the x11/nvidia-driver
>>>>>> (which is at 319.25 in the ports and 325.15 from nVidia).
>>>>>>
>>>>>> After build- and installworld AND successfully rebuilding port
>>>>>> x11/nvidia-driver, the system crashes immediately after a reboot
>>>>>> as soon as the kernel module nvidia.ko seems to get loaded (in my
>>>>>> case, I load nvidia.ko via /etc/rc.conf.local since the nVidia
>>>>>> BLOB doesn't load cleanly every time when loaded
>>>>>> from /boot/loader.conf).
>>>>>>
>>>>>> The crash occurs on systems with default compilation options set
>>>>>> while building world and with settings like -O3 -march=native. It
>>>>>> doesn't matter.
>>>>>>
>>>>>> FreeBSD and the port x11/nvidia-driver have been compiled with
>>>>>> CLANG.
>>>>>>
>>>>>> Most recent FreeBSD revision still crashing is r254097.
>>>>>>
>>>>>> When vmcore is saved, I always see something like
>>>>>>
>>>>>> savecore: reboot after panic: vm_radix_insert: key 23c078 is
>>>>>> already present
>>>>>>
>>>>>>
>>>>>> Does anyone have any idea what's going on?
>>>>>>
>>>>>> Thanks for helping in advance,
>>>>>>
>>>>>> Oliver
>>>>
>>>> I'm seeing a complete deadlock on my T520 with today's current and
>>>> latest portsnap'd versions of ports for the nvidia-driver updates.
>>>>
>>>> A little bisection and help from others seems to point the finger at
>>>> Jeff's r254025
>>>>
>>>> I'm getting a complete deadlock on X starting, but loading the module
>>>> seems to have no ill effects.
>>>>
>>>> Sean
>>>
>>> Right, I loaded the module also via /boot/loader.conf and it loads
>>> cleanly. I start xdm and then the deadlock occurs.
>>>
>>> I tried recompiling the whole xorg suite via "portmaster -f xorg xdm",
>>> it took a while, but no effect, still dying.
>>> .....
>>
>> Sorry to be rather late to the party; the Internet connection I'm using
>> at the moment is a bit flaky. (I'm out of town.)
>>
>> I managed to get head/i386 @r254135 built and booting ... by removing
>> the "options DEBUG_MEMGUARD" from my kernel.
>>
>> However, that merely prevented a (very!) early panic, and got me to the
>> point where trying to start xdm with the x11/nvidia-driver as the
>> display driver causes an immediate reboot (no crash dump, despite
>> 'dumpdev="AUTO"' in /etc/rc.conf). No drop to debugger, either.
>>
>> Booting & starting xdm with the nv driver works -- that's my present
>> environment as I am typing this.
>>
>> However, the panic with DEBUG_MEMGUARD may offer a clue. Unfortunately,
>> it's early enough that screen lock/scrolling doesn't work, and I only
>> had the patience to write down part of the panic information. (This is
>> on my laptop; no serial console, AFAICT -- and no device to capture the
>> output if I did, since I'm not at home.)
>>
>> The top line of the screen (at the panic) reads:
>>
>> s/kern/subr_vmem.c:1050
>>
>> The backtrace has the expected stuff near the top (about kbd, panic, and
>> memguard stuff); just below that is:
>>
>> vmem_alloc(c1226100,6681000,2,c1820cc0,3b5,...) at 0xc0ac5673=vmem_alloc+0x53/frame 0xc1820ca0
>>
>> Caveat: that was hand-transcribed from the screen to paper, then
>> hand-transcribed from paper to this email message. And my highest grade
>> in "Penmanship" was a D+.
>>
>> Be that as it may, here's the relevant section of subr_vmem.c with line
>> numbers (cut/pasted, so tabs get munged):
>>
>> 1039 /*
>> 1040  * vmem_alloc: allocate resource from the arena.
>> 1041  */
>> 1042 int
>> 1043 vmem_alloc(vmem_t *vm, vmem_size_t size, int flags, vmem_addr_t *addrp)
>> 1044 {
>> 1045     const int strat __unused = flags & VMEM_FITMASK;
>> 1046     qcache_t *qc;
>> 1047
>> 1048     flags &= VMEM_FLAGS;
>> 1049     MPASS(size > 0);
>> 1050     MPASS(strat == M_BESTFIT || strat == M_FIRSTFIT);
>> 1051     if ((flags & M_NOWAIT) == 0)
>> 1052         WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL, "vmem_alloc");
>> 1053
>> 1054     if (size <= vm->vm_qcache_max) {
>> 1055         qc = &vm->vm_qcache[(size - 1) >> vm->vm_quantum_shift];
>> 1056         *addrp = (vmem_addr_t)uma_zalloc(qc->qc_cache, flags);
>> 1057         if (*addrp == 0)
>> 1058             return (ENOMEM);
>> 1059         return (0);
>> 1060     }
>> 1061
>> 1062     return vmem_xalloc(vm, size, 0, 0, 0, VMEM_ADDR_MIN, VMEM_ADDR_MAX,
>> 1063         flags, addrp);
>> 1064 }
>>
>>
>> This is at r254025.
>>
>
> The REINPLACE_CMD at line 160 of nvidia-driver/Makefile is incorrect.
>
> How do I know that? Because I made a patch which results in a working
> nvidia-driver-319.32 with r254050. That's what I'm running right now.
>
> Here's the patch (loaded with :r in vi, so all spaces etc.
> are correct):
>
> --- src/nvidia_subr.c.orig 2013-08-09 11:32:26.000000000 +0200
> +++ src/nvidia_subr.c 2013-08-09 11:33:23.000000000 +0200
> @@ -945,7 +945,7 @@
>          return ENOMEM;
>      }
>
> -    address = kmem_alloc_contig(kernel_map, size, flags, 0,
> +    address = kmem_alloc_contig(kmem_arena, size, flags, 0,
>              sc->dma_mask, PAGE_SIZE, 0, attr);
>      if (!address) {
>          status = ENOMEM;
> @@ -994,7 +994,7 @@
>          os_flush_cpu_cache();
>
>      if (at->pte_array[0].virtual_address != NULL) {
> -        kmem_free(kernel_map,
> +        kmem_free(kmem_arena,
>                  at->pte_array[0].virtual_address, at->size);
>          malloc_type_freed(M_NVIDIA, at->size);
>      }
> @@ -1021,7 +1021,7 @@
>      if (at->attr != VM_MEMATTR_WRITE_BACK)
>          os_flush_cpu_cache();
>
> -    kmem_free(kernel_map, at->pte_array[0].virtual_address,
> +    kmem_free(kmem_arena, at->pte_array[0].virtual_address,
>              at->size);
>      malloc_type_freed(M_NVIDIA, at->size);
>
> @@ -1085,7 +1085,7 @@
>      }
>
>      for (i = 0; i < count; i++) {
> -        address = kmem_alloc_contig(kernel_map, PAGE_SIZE, flags, 0,
> +        address = kmem_alloc_contig(kmem_arena, PAGE_SIZE, flags, 0,
>                  sc->dma_mask, PAGE_SIZE, 0, attr);
>          if (!address) {
>              status = ENOMEM;
> @@ -1139,7 +1139,7 @@
>      for (i = 0; i < count; i++) {
>          if (at->pte_array[i].virtual_address == 0)
>              break;
> -        kmem_free(kernel_map,
> +        kmem_free(kmem_arena,
>                  at->pte_array[i].virtual_address, PAGE_SIZE);
>          malloc_type_freed(M_NVIDIA, PAGE_SIZE);
>      }
> @@ -1169,7 +1169,7 @@
>          os_flush_cpu_cache();
>
>      for (i = 0; i < count; i++) {
> -        kmem_free(kernel_map,
> +        kmem_free(kmem_arena,
>                  at->pte_array[i].virtual_address, PAGE_SIZE);
>          malloc_type_freed(M_NVIDIA, PAGE_SIZE);
>      }
>
> The primary differences are
> 1) use kmem_arena instead of kernel_map everywhere.
>    The REINPLACE_CMD uses kernel_arena
> 2) DO NOT use kva_free, but kmem_free as previously
>
> To use the patch
> Delete or comment out the 4 lines starting at 160 in Makefile
> Run ``make patch''
> cd work/NVIDIA-FreeBSD-x86_64-319.32/src
> patch < [wherever the patch is]
> cd ../../..
> make deinstall install clean
> kldunload the old nvidia.ko
> kldload the new nvidia.ko
> start X
>

Yes, I can confirm that it builds, installs and runs fine for me.

The patch should be placed as
x11/nvidia-driver/files/patch-src__nvidia_subr.c, shouldn't it?

Many thanks for this work.

Regards and a nice weekend,
Rainer Hurling