Date:      Tue, 5 Dec 2017 13:37:21 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        bugzilla-noreply@freebsd.org
Cc:        freebsd-bugs@freebsd.org
Subject:   Re: [Bug 224069] (Fix included) Use of uninitalized register value in vesa.ko, causing X, text console and suspend/resume to fail
Message-ID:  <20171205120519.R1457@besplex.bde.org>
In-Reply-To: <bug-224069-8-jTJiD5CPL5@https.bugs.freebsd.org/bugzilla/>
References:  <bug-224069-8@https.bugs.freebsd.org/bugzilla/> <bug-224069-8-jTJiD5CPL5@https.bugs.freebsd.org/bugzilla/>

On Mon, 4 Dec 2017 a bug that doesn't want replies@freebsd.org wrote:

> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224069
>
> --- Comment #9 from Jung-uk Kim <jkim@FreeBSD.org> ---
> ...
>> In the FreeBSD forums, there are constant complaints regarding Nvidia cards.
> ...
>> So I have the strong feeling that there is a serious problem with the vesa
>> module. But that is just my (possibly misleading) intuition.
>
> Ah, NVIDIA...  It has been a well-known problem for many years.  The root cause is
> that NVIDIA does not really support VESA BIOS re-POST and save/restore state calls.
> It just checks a few bits and returns immediately if they are set/clear.  I have

I wrote that my laptop with Nvidia video is the only one where
suspend/resume works (it only started working at age 9 in 2015, at
least on i386).  Actually, it has ATI video.

> a feeling that they intentionally did it to make reverse engineering harder.
> On top of that, they don't provide any information to improve the situation.
> Long ago, I even bought a couple of cheap NVIDIA controllers to fix the
> problem, but I soon gave up because I realized it was too complicated and I
> wasn't desperate.  So, no, the VESA module has no "serious" problem.  NVIDIA just
> doesn't want us to mess around with their hardware although we own it.
>
> We can add a few knobs in the driver to work around these issues and turn them on
> by default if an NVIDIA controller is found.

Intel graphics may have similar problems.  It crashes on my Sandybridge
laptop and crashes or fails on my Haswell desktop (the video ROM signature
indicates that the ROM is fairly generic for these CPUs).

I got a bit further debugging this.  On Haswell, int 0x10 starts with

 	outl(0xf000, 0x45400);
 	if (inl(0xf004) == 0xffffffff)
 		return (FAILURE);

for some calls (all calls?)

Initially, inl(0xf004) is not 0xffffffff, so the call doesn't fail.  IIRC,
it is 0, but I may have misread this.

At suspend time inl(0xf004) is 0xffffffff, so the call fails.

At resume time, the call succeeds, so inl(0xf004) can't be 0xffffffff.

From userland, inl(0xf004) is 0xc0000000.  This 0xc0000000 might be the
address of the video ROM in 16-bit segment:offset form.

The STATE_SAVE and STATE_LOAD calls are not normally made initially.
When I force them in debugging code, STATE_SAVE with the normal
vesa_state = 0xf crashes on a corrupt destination pointer (it mostly
points %es:%di into the target buffer and advances %di as it fills
the buffer, but it also uses %edi as a scratch variable and doesn't
restore it in some cases).  This bug seems to be in the standard ROM
-- not just some artifact of vm86 mode.

When I force STATE_SAVE with vesa_state = 0x7 initially, the call works.

The failure at suspend time prevents reaching the crash with the normal
vesa_state and the success with vesa_state = 0x7.  It also prevents the
screen coming back.  All this is without drm2.  drm2 is even more unusable
than drm since it only exists as a module.

Resuming using the power button on this system often bounces to another
suspend.

Intel NICs used to work on this system, but they work less well with
iflib.  After resume, they now often come back with very slow ping times
(usually hundreds of milliseconds, sometimes many seconds).

That was all on Haswell i386.  On Sandybridge i386:
- ports 0xf000 and 0xf004 aren't special according to userland tests (I
   didn't trace the int 0x10 call)
- suspend hangs without printing anything, even with debug.x86bios.int=1.

On Sandybridge amd64:
- the screen part seems to work, except the screen never comes back and
   the screen goes away before any output for debug.x86bios.int=1 is
   visible.  The output is especially hard to see without a serial console
   and with the buggy NIC resume.  I finally found it in /var/log/messages
   on reboot.  It shows that the SAVE/LOAD calls succeeded, but LOAD
   didn't actually work (the screen still didn't come back).

On Haswell amd64:
- everything behaves as on Haswell i386, except the NIC resume and power
   key bounce are smaller problems, and debug.x86bios.int=1 output is no
   longer visible on the console for suspend, perhaps just because the
   screen is turned off faster.  The output is easy to see on the serial
   console.  It shows that SAVE failed as on i386.

All this is with syscons.

Bruce


