Date:      Wed, 7 May 2014 17:15:43 -0600
From:      John Nielsen <lists@jnielsen.net>
To:        freebsd-hackers@freebsd.org
Cc:        freebsd-virtualization@freebsd.org
Subject:   consistent VM hang during reboot
Message-ID:  <BED233F2-EAFF-41A3-9C5B-869041A9AED8@jnielsen.net>

I am trying to solve a problem with amd64 FreeBSD virtual machines
running on a Linux+KVM hypervisor. To be honest I'm not sure if the
problem is in FreeBSD or the hypervisor, but I'm trying to rule out the
OS first.

The _second_ time FreeBSD boots in a virtual machine with more than one
core, the boot hangs just before the kernel would normally print e.g.
"SMP: AP CPU #1 Launched!" (The last line on the console is "usbus0:
12Mbps Full Speed USB v1.0", but the problem persists even without USB).
The VM will boot fine a first time, but running either "shutdown -r now"
OR "reboot" will lead to a hung second boot. Stopping and starting the
host qemu-kvm process is the only way to continue.

The problem seems to be triggered by something in the SMP portion of
cpu_reset() (from sys/amd64/amd64/vm_machdep.c). If I hit the virtual
"reset" button the next boot is fine. If I have 'kern.smp.disabled="1"'
set for the initial boot then subsequent boots are fine (but I can only
use one CPU core, of course). However, if I boot normally the first time
then set 'kern.smp.disabled="1"' for the second (re)boot, the problem
is triggered. Apparently something in the shutdown code is "poisoning
the well" for the next boot.
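
(For anyone reproducing this: one way to set that tunable persistently,
assuming it goes in /boot/loader.conf rather than being typed at the
loader prompt, would be a fragment like the following. The file
location is an illustrative assumption, not something specific to my
setup.)

    # /boot/loader.conf (assumed location for this illustration)
    # Disable SMP so the kernel uses a single CPU core.
    kern.smp.disabled="1"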

The problem is present in FreeBSD 8.4, 9.2, 10.0 and 11-CURRENT as of
yesterday.

This (heavy-handed and wrong) patch (to HEAD) lets me avoid the issue:

--- sys/amd64/amd64/vm_machdep.c.orig	2014-05-07 13:19:07.400981580 -0600
+++ sys/amd64/amd64/vm_machdep.c	2014-05-07 17:02:52.416783795 -0600
@@ -593,7 +593,7 @@
 void
 cpu_reset()
 {
-#ifdef SMP
+#if 0
 	cpuset_t map;
 	u_int cnt;

I've tried skipping or disabling smaller chunks of code within the #if
block but haven't found a consistent winner yet.

I'm hoping the list will have suggestions on how I can further narrow
down the problem, or theories on what might be going on.

Thanks!

JN



