Date:      Tue, 20 Feb 2007 18:21:52 -0600
From:      Billy Newsom <billy@nlcc.us>
To:        stable@freebsd.org
Subject:   The code for rebooting an SMP machine doesn't always work (still)
Message-ID:  <45DB90A0.4070906@nlcc.us>

When an SMP machine does not have an AT keyboard controller, there needs
to be a way to reboot it under FreeBSD!

I have another system that fails to reboot under FreeBSD. This time it
is a bleeding-edge, current system with FreeBSD 6.2-RELEASE. From what I
can tell, the reboot code has not really changed much in over ten
years. Something is definitely wrong with it, however, probably on SMP
systems.

Here is what happens. When I shut down and reboot this machine (which
lacks the keyboard controller), I get a notice that the keyboard reset
failed:

  Keyboard reset did not work, attempting CPU shutdown

Then the kernel attempts the CPU shutdown, which results in
"Fatal trap 12" (a page fault).

(For background, search for "triple fault reboot"; this may be related
to PR kern/94822.)

I have looked at the code closely to see what is happening. Looking at
vm_machdep.c, it is clear that the keyboard reset is the normal method,
with the alternate method used only as a last resort. (This is true
even for amd64, which is what I am using!) I also looked at the reboot
code in locore.s (written in assembler, I think), and it does not even
try the keyboard reset. The idea, it seems, is to cause what is known
as a triple fault in the CPU, which is supposed to force it to reset.
(I cross-referenced other operating systems, such as NetBSD.) In this
case, I suspect the CPU is somehow surviving the reset attempt under
certain conditions.
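To make the triple-fault idea concrete, here is a small userland model, not kernel code, of how an x86 CPU escalates faults: if it cannot deliver the page-fault handler it raises a double fault, and if it cannot deliver the double-fault handler either, it enters shutdown, which the chipset usually turns into a reset. The function `escalate` and the enum names are invented for illustration. Under this model, a reported "Fatal trap 12" would suggest the page-fault handler was still reachable when the reset path faulted, so no triple fault occurred.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of x86 fault escalation during the reset path. */
enum fault_outcome {
    TRAP_HANDLED,          /* #PF handler ran: kernel reports "Fatal trap 12" */
    DOUBLE_FAULT_HANDLED,  /* #PF could not be delivered, so #DF handler ran */
    TRIPLE_FAULT_RESET     /* #DF could not be delivered: shutdown, usually reset */
};

/*
 * pf_handler_reachable: could the CPU still reach the page-fault handler
 * (IDT entry, handler code and stack all mapped)?
 * df_handler_reachable: the same question for the double-fault handler.
 */
static enum fault_outcome
escalate(bool pf_handler_reachable, bool df_handler_reachable)
{
    if (pf_handler_reachable)
        return TRAP_HANDLED;          /* fault is simply caught */
    if (df_handler_reachable)
        return DOUBLE_FAULT_HANDLED;  /* second fault became #DF */
    return TRIPLE_FAULT_RESET;        /* third fault: CPU shuts down */
}
```

For example, `escalate(false, false)` yields `TRIPLE_FAULT_RESET`, while `escalate(true, false)` yields `TRAP_HANDLED`, the case that matches the symptom above.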

Would someone like to test this? Simply remove the portion of
vm_machdep.c that attempts the keyboard reset and see whether the
remaining C code works. After all, this bug only shows up on the odd
machine that has no KBC.
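The fallback order described above, and the proposed experiment of skipping the keyboard step, can be sketched as a userland model. This is a sketch under stated assumptions, not the real cpu_reset_real(): `struct machine`, `model_cpu_reset`, and the `skip_kbc` flag are hypothetical, and the comments only roughly summarize what the i386 source does in hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the reset fallback order; no hardware is touched. */
enum reset_method { RESET_NONE, RESET_KBC, RESET_TRIPLE_FAULT };

struct machine {
    bool has_kbc;              /* AT keyboard controller present? */
    bool triple_fault_resets;  /* does the fallback really reach shutdown? */
};

static enum reset_method
model_cpu_reset(const struct machine *m, bool skip_kbc)
{
    if (!skip_kbc) {
        /* Roughly what the i386 source does: write 0xFE to port 0x64
         * so the 8042 keyboard controller pulses the CPU RESET line. */
        if (m->has_kbc)
            return RESET_KBC;
        printf("Keyboard reset did not work, attempting CPU shutdown\n");
    }
    /* Last resort, roughly: clear the page tables and flush the TLB
     * so the next memory access faults, hoping for a triple fault. */
    if (m->triple_fault_resets)
        return RESET_TRIPLE_FAULT;
    return RESET_NONE;  /* machine survives, e.g. "Fatal trap 12" */
}
```

For the Mac Pro described here, `model_cpu_reset(&(struct machine){ false, false }, false)` prints the keyboard message and returns `RESET_NONE`. Passing `skip_kbc = true` corresponds to the suggested test of removing the keyboard portion.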

The code is easy to spot because of the comment

  /* "good night, sweet prince .... <THUNK!>" */

and it has been in the code since at least the 1990s.


Examples of affected machines? I am testing a Mac Pro with dual Xeons
and four cores. I believe blade servers often lack a keyboard
controller, too, and many embedded systems have no KBC. The other
example is a machine on which I still run FreeBSD 5: a dual Pentium Pro
200. Notice that both of my examples are running SMP, and this could
have a lot to do with being able to force a CPU to take three faults in
a row (a triple fault).

See cpu_reset_real() and its comments at 
http://fxr.watson.org/fxr/source//amd64/amd64/vm_machdep.c
http://fxr.watson.org/fxr/source/i386/i386/vm_machdep.c

In case anyone thinks I didn't try everything: I tried this in
device.hints:

# Billy removed these six things for Mac Pro
hint.atkbdc.0.disabled="1"
hint.atkbd.0.disabled="1"
hint.sio.0.disabled="1"
hint.sio.1.disabled="1"
hint.ppc.0.disabled="1"
hint.psm.0.disabled="1"
# Billy removed these six things for Mac Pro

I tried removing those devices from the kernel, too. There were fewer
errors at boot, but it still would never reboot. Here is what does work
on this machine:

halt -p (works)
Rebooting under Windows XP (same machine)
Rebooting from the boot loader prompt (type reboot, and it does that;
see locore.s)

In other words, amd64's vm_machdep.c is the problem, and I am fairly
confident the same is true for i386. My dual Pentium Pro stopped
rebooting properly when I upgraded it from FreeBSD 4.x to 5.2, and it
still won't reboot.

As a footnote, there is a kernel option called BROKEN_KEYBOARD_RESET.
Great, right? Well, someone disabled it for amd64, so the kernel won't
even build with that option. Shame on us for removing a simple way to
troubleshoot a problem. I would recommend adding it back as either a
device hint or a kernel option. It is still available for i386. All it
would do for me, though, is skip the keyboard reset attempt, which does
not freeze or panic this computer; it simply doesn't work.
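The recommendation to offer the switch as either a kernel option or a device hint could look something like the following userland sketch. The macro name BROKEN_KEYBOARD_RESET comes from the message; the function `reset_skips_kbc`, its `hint_value` parameter, and a hint such as `hint.cpu_reset.0.skip_kbc="1"` are invented for illustration and do not exist in FreeBSD. The design choice shown is that a hint set at boot overrides the compiled-in default, so no rebuild is needed to troubleshoot.

```c
#include <assert.h>
#include <stdbool.h>

/* Compile-time default, set by "options BROKEN_KEYBOARD_RESET". */
#ifdef BROKEN_KEYBOARD_RESET
#define SKIP_KBC_DEFAULT true
#else
#define SKIP_KBC_DEFAULT false
#endif

/*
 * Hypothetical: hint_value < 0 means no hint was set in device.hints;
 * otherwise the hint (0 or 1) overrides the compiled-in default.
 */
static bool
reset_skips_kbc(int hint_value)
{
    if (hint_value < 0)
        return SKIP_KBC_DEFAULT;
    return hint_value != 0;
}
```

In a kernel built without the option, `reset_skips_kbc(-1)` returns false and `reset_skips_kbc(1)` returns true.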

Thanks for any help. I have collected a lot of data if anyone is
interested, and I may post the dmesg output for this Macintosh for
reference anyway.


