Date:      Mon, 17 Nov 2014 10:01:16 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-ports-bugs@FreeBSD.org
Subject:   [Bug 195097] New: x11/nvidia-driver: Kernel panic after "NVRM: rm_init_adapter() failed!"
Message-ID:  <bug-195097-13@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195097

            Bug ID: 195097
           Summary: x11/nvidia-driver: Kernel panic after "NVRM:
                    rm_init_adapter() failed!"
           Product: Ports Tree
           Version: Latest
          Hardware: amd64
                OS: Any
            Status: Needs Triage
          Severity: Affects Some People
          Priority: ---
         Component: Individual Port(s)
          Assignee: danfe@FreeBSD.org
          Reporter: stefanf@FreeBSD.org
             Flags: maintainer-feedback?(danfe@FreeBSD.org)

With the update to nvidia-driver-340.46 on HEAD amd64, I now have a ~50% chance
of a kernel panic at `startx'. Just before the panic I get the errors

NVRM: RmInitAdapter failed! (0x26:0x2a:1224)
nvidia0: NVRM: rm_init_adapter() failed!

followed immediately by

fatal trap 12: page fault while in kernel mode within rm_free_unused_clients.

I took a look at the open source parts of the driver and found what I think is
an invalid NULL pointer use.

The driver does roughly this:

devfs_open
    nvidia_dev_open
        devfs_set_cdevpriv
        nvidia_open_dev
            NV_UMA_ZONE_ALLOC_STACK(sc->api_sp);
            rm_init_adapter -> fail
            NV_UMA_ZONE_FREE_STACK(sc->api_sp);

Here sc->api_sp is set to NULL after the rm_init_adapter failure.

    devfs_clear_cdevpriv
        devfs_fpdrop
            devfs_destroy_cdevpriv
                nvidia_dev_dtor
                    nvidia_close_dev
                        rm_free_unused_clients(sc->api_sp)

Here rm_free_unused_clients is called with a NULL pointer. This function is not
open source, but judging by the panic, my guess is that it does not cope with a
NULL argument.

I'm not sure about the best possible fix, but calling nvidia_close_dev after an
unsuccessful nvidia_open_dev seems wrong; it also decrements a refcnt that is
still 0, wrapping it to (uint32_t)-1.  Maybe nvidia_dev_dtor simply needs to
check the refcnt and avoid calling nvidia_close_dev if it's already 0.
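
To illustrate the refcnt check suggested above, here is a minimal user-space
model in C. It is only a sketch, not the driver code: model_softc,
model_open_dev, model_close_dev, model_dev_dtor and refcnt_after_failed_open
are hypothetical stand-ins, malloc/free replace the NV_UMA_ZONE_* macros, and
the closed-source rm_free_unused_clients call is reduced to a counter that
records when it would have received NULL. The assumption that the adapter is
initialised only on the first open is mine as well.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's per-device softc. */
struct model_softc {
    uint32_t refcnt;        /* number of successful opens */
    void    *api_sp;        /* stack handed to the resource manager */
    int      null_sp_calls; /* times the RM would have been given NULL */
};

/* Models nvidia_open_dev; init_ok == 0 simulates rm_init_adapter failing. */
static int model_open_dev(struct model_softc *sc, int init_ok)
{
    if (sc->refcnt == 0) {
        sc->api_sp = malloc(64);      /* stands in for NV_UMA_ZONE_ALLOC_STACK */
        if (sc->api_sp == NULL)
            return -1;
        if (!init_ok) {
            free(sc->api_sp);         /* stands in for NV_UMA_ZONE_FREE_STACK */
            sc->api_sp = NULL;
            return -1;                /* refcnt stays 0 */
        }
    }
    sc->refcnt++;
    return 0;
}

/* Models nvidia_close_dev. */
static void model_close_dev(struct model_softc *sc)
{
    if (sc->api_sp == NULL)
        sc->null_sp_calls++;          /* rm_free_unused_clients(NULL) here */
    sc->refcnt--;                     /* wraps to UINT32_MAX if it was 0 */
    if (sc->refcnt == 0) {
        free(sc->api_sp);
        sc->api_sp = NULL;
    }
}

/* Models nvidia_dev_dtor; guarded != 0 enables the suggested refcnt check. */
static void model_dev_dtor(struct model_softc *sc, int guarded)
{
    if (guarded && sc->refcnt == 0)
        return;                       /* open never succeeded: nothing to close */
    model_close_dev(sc);
}

/* Runs "open fails, then the cdevpriv destructor fires" and reports the result. */
static uint32_t refcnt_after_failed_open(int guarded, int *null_sp_calls)
{
    struct model_softc sc = { 0, NULL, 0 };

    (void)model_open_dev(&sc, 0);     /* rm_init_adapter -> fail */
    model_dev_dtor(&sc, guarded);     /* devfs_clear_cdevpriv path */
    *null_sp_calls = sc.null_sp_calls;
    return sc.refcnt;
}
```

In this model, refcnt_after_failed_open(0, &n) leaves the refcnt at UINT32_MAX
with one NULL call recorded, while refcnt_after_failed_open(1, &n) leaves it at
0 with none. Whether a check on the shared refcnt is sufficient in the real
driver depends on details this sketch does not capture.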

--- Comment #1 from Bugzilla Automation <bugzilla@FreeBSD.org> ---
Auto-assigned to maintainer danfe@FreeBSD.org

-- 
You are receiving this mail because:
You are the assignee for the bug.


