Date:      Thu, 8 Jul 2010 08:26:32 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-current@freebsd.org
Cc:        Yuri Pankov <yuri.pankov@gmail.com>, René Ladan <rene@freebsd.org>, David Naylor <naylor.b.david@gmail.com>
Subject:   Re: nvidia-driver crashing kernel on head
Message-ID:  <201007080826.32764.jhb@freebsd.org>
In-Reply-To: <201007021855.42103.naylor.b.david@gmail.com>
References:  <201007021146.46542.naylor.b.david@gmail.com> <AANLkTimT4UwDzB6jF2eML4U7jQubOs1slwBPHwy_5U3b@mail.gmail.com> <201007021855.42103.naylor.b.david@gmail.com>

On Friday, July 02, 2010 12:55:38 pm David Naylor wrote:
> On Friday 02 July 2010 14:57:35 René Ladan wrote:
> > 2010/7/2 Yuri Pankov <yuri.pankov@gmail.com>:
> > > On Fri, Jul 02, 2010 at 11:46:41AM +0200, David Naylor wrote:
> > >> Hi,
> > >>
> > >> I'm not sure this has been reported before but I am experiencing crashes
> > >> with nvidia-driver on -current (cvsup ~day ago).
> > >>
> > >> If I remove all the debugging options from the kernel config then it is
> > >> very usable.
> > >>
> > >> Here are the backtraces from two nvidia-driver versions:
> > >>
> > >> nvidia-driver-195.36.15 and GENERIC:
> > >> panic: mutex page lock not owned at
> > >> /home/freebsd9/src/sys/vm/vm_page.c:1638 cpuid = 1
> > >> KDB: enter: panic
> > >> [ thread pid 1815 tid 100097 ]
> > >> Stopped at      kdb_enter+0x3d: movq    $0,0x6bc27c(%rip)
> > >> db> bt
> > >> Tracing pid 1815 tid 100097 td 0xffffff00045af000
> > >> kdb_enter() at kdb_enter+0x3d
> > >> panic() at panic+0x176
> > >> assert_mtx() at assert_mtx
> > >> vm_page_wire() at vm_page_wire+0x37
> > >> nv_alloc_system_pages() at nv_alloc_system_pages+0x217
> > >> nv_alloc_pages() at nv_alloc_pages+0xcd
> > >> _nv019978rm() at _nv019978rm+0x7f
> > >>
> > >> nvidia-driver-256.35 and custom kernel:
> > >> panic: blockable sleep lock (sleep mutex) select mtxpool @
> > >> /home/freebsd9/src/sys/kern/sys_generic.c:1479
> > >> cpuid = 1
> > >> KDB: enter: panic
> > >> [ thread pid 1830 tid 100090 ]
> > >> Stopped at      kdb_enter+0x3d: movq    $0,0x51368c(%rip)
> > >> db> bt
> > >> Tracing pid 1830 tid 100090 td 0xffffff000456d3d0
> > >> kdb_enter() at kdb_enter+0x3d
> > >> panic() at panic+0x176
> > >> witness_checkorder() at witness_checkorder+0x913
> > >> _mtx_lock_flags() at _mtx_lock_flags+0x68
> > >> selrecord() at selrecord+0x71
> > >> nvidia_dev_poll() at nvidia_dev_poll+0x52
> > >> devfs_poll_f() at devfs_poll_f+0x55
> > >> kern_select() at kern_select+0x501
> > >> select() at select+0x54
> > >> syscallenter() at syscallenter+0x19b
> > >> syscall() at syscall+0x41
> > >> Xfast_syscall() at Xfast_syscall+0xe2
> > >> --- syscall (93, FreeBSD ELF64, select), rip = 0x801a17ddc, rsp =
> > >> 0x7fffffffe908, rbp = 0x100 ---
> > >>
> > >> Also of note is:
> > >> # grep '^C.*FLAGS' /etc/make.conf
> > >> CFLAGS+= -DNDEBUG
> > >>
> > >> As mentioned, without any debugging options the system is stable.
> > >>
> > >> Is there anything I can do to assist diagnosis?
> > >>
> > >> Regards,
> > >>
> > >> David
> > >
> > > http://lists.freebsd.org/pipermail/freebsd-current/2010-June/017936.html
> > > helps here, check the thread as well.
> > >
> > > You could also try to use 256.35 driver.
> >
> > The 256.35 driver works for me (without the above-referred patch), but
> > anywhere between 1 and 48 hours my laptop locks up hard without any
> > warning nor panic. This is with CURRENT r209581, GENERIC kernel, but with
> > debug.witness.watch=0. If I set debug.witness.watch to 1, the kernel
> > freezes when starting X.
>
> I experienced a lockup when using the 256.35 driver; I switched back to the
> 195.36.15 driver and have had no problems since.  The system also freezes up
> when launching k3b so I'm not sure what caused that particular freeze...
>
> Thanks for the debug.witness.watch hint.

These freezes and panics are due to the driver using a spin mutex instead of a
regular mutex for the per-file descriptor event_mtx.  If you patch the driver
to change it to be a regular mutex I think that should fix the problems.
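
To illustrate why the mutex class matters, here is a small userland model of
the rule that WITNESS enforces: a blockable sleep mutex may not be acquired
while a spin mutex is held.  This is only a sketch, not driver or kernel code.
The names model_mtx, model_lock, model_unlock and model_poll are hypothetical,
and the assumption that the poll handler holds event_mtx while calling
selrecord() is inferred from the second backtrace rather than from the driver
source.  In the real kernel the class is selected by the opts argument of
mtx_init(9) (MTX_SPIN versus MTX_DEF), and spin mutexes are taken with
mtx_lock_spin()/mtx_unlock_spin() rather than mtx_lock()/mtx_unlock(), so an
actual patch would presumably have to change both the initialization and the
lock/unlock calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Userland model only: the two mutex classes offered by mutex(9). */
enum lock_class { LC_SPIN, LC_SLEEP };

struct model_mtx {
	const char	*name;
	enum lock_class	 cls;
	bool		 held;
};

/* Number of spin mutexes held by the single modeled thread. */
static int spin_held;

/*
 * Try to acquire m.  Returns false in the case where a WITNESS kernel
 * would panic with "blockable sleep lock ...": a sleep mutex requested
 * while a spin mutex is already held.
 */
bool
model_lock(struct model_mtx *m)
{
	if (m->cls == LC_SLEEP && spin_held > 0)
		return (false);
	if (m->cls == LC_SPIN)
		spin_held++;
	m->held = true;
	return (true);
}

void
model_unlock(struct model_mtx *m)
{
	assert(m->held);
	if (m->cls == LC_SPIN)
		spin_held--;
	m->held = false;
}

/*
 * Assumed shape of the poll path: take the per-file event mutex, then
 * call selrecord(), which locks a sleep mutex from the select mtxpool.
 * Returns true if the sequence is allowed by the rule above.
 */
bool
model_poll(enum lock_class event_mtx_class)
{
	struct model_mtx event_mtx = { "event_mtx", event_mtx_class, false };
	struct model_mtx sel_mtx = { "select mtxpool", LC_SLEEP, false };
	bool ok;

	if (!model_lock(&event_mtx))
		return (false);
	ok = model_lock(&sel_mtx);	/* stands in for selrecord() */
	if (ok)
		model_unlock(&sel_mtx);
	model_unlock(&event_mtx);
	return (ok);
}
```

In this model, model_poll(LC_SPIN) is rejected and model_poll(LC_SLEEP) is
accepted, which matches the "blockable sleep lock (sleep mutex) select
mtxpool" panic appearing only while event_mtx is a spin mutex.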

-- 
John Baldwin


