Date:      Thu, 8 May 2014 18:42:24 +0000
From:      Andrew Duane <aduane@juniper.net>
To:        John Nielsen <lists@jnielsen.net>, John Baldwin <jhb@freebsd.org>
Cc:        "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>, "freebsd-virtualization@freebsd.org" <freebsd-virtualization@freebsd.org>
Subject:   RE: consistent VM hang during reboot
Message-ID:  <af0f4c6348d64ab0b5ea56d2ea777e99@BY2PR05MB582.namprd05.prod.outlook.com>
In-Reply-To: <E97C3027-79CF-45F9-B5ED-3339D7AE0B5F@jnielsen.net>
References:  <BED233F2-EAFF-41A3-9C5B-869041A9AED8@jnielsen.net> <201405081303.17079.jhb@freebsd.org> <E97C3027-79CF-45F9-B5ED-3339D7AE0B5F@jnielsen.net>

When I was doing some early work on some of the Octeon multi-core chips,
I encountered something similar. If I remember correctly, there was an
issue in the shutdown sequence that did not properly halt the cores and
set up the "start jump" vector. So the first core would start, and when
it tried to start the next ones it would hang waiting for the ACK that
they were running (since they didn't have a start vector and hence never
started). I know MIPS, not AMD, so I can't say what the equivalent would
be, but I'm sure there is one. Check that part, setting up the early
state.
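
To make that failure mode concrete, here is a minimal sketch in C of the
handshake described above. It is only a model: struct core, kick_core(),
start_secondaries() and MAX_SPINS are hypothetical names made up for
illustration, not Octeon or FreeBSD code, and the spin limit exists only
so that the sketch reports the stuck core instead of hanging the way the
real boot did.

```c
#include <stdbool.h>
#include <stdio.h>

#define MAX_SPINS 1000  /* bounded wait, so the sketch reports instead of hanging */

/* Hypothetical per-core state; not taken from Octeon or FreeBSD sources. */
struct core {
    bool has_start_vector;  /* left behind (or not) by the previous shutdown */
    bool running;           /* the "ACK" the boot core polls for */
};

/* Model of kicking a secondary core: it can only start if it has a vector. */
void kick_core(struct core *c)
{
    if (c->has_start_vector)
        c->running = true;
}

/*
 * Boot-core side of the handshake: kick each secondary core (index 1 and up)
 * and poll for its ACK. Returns how many secondaries acknowledged.
 */
int start_secondaries(struct core *cores, int ncores)
{
    int started = 0;

    for (int i = 1; i < ncores; i++) {
        int spins = 0;

        kick_core(&cores[i]);
        while (!cores[i].running && spins < MAX_SPINS)
            spins++;

        if (cores[i].running)
            started++;
        else
            printf("core %d never acknowledged startup\n", i);
    }
    return started;
}
```

Calling start_secondaries() on an array in which one secondary has
has_start_vector cleared returns one fewer than the number of
secondaries. Without the spin limit, the boot core would wait at that
core forever.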

If Juli and/or Adrian are reading this: do you remember anything about
that, something like 2 years ago?

....................................
Andrew L. Duane
AT&T Technical Lead
JNCIA - JUNOS
m    +1 603.770.7088
o    +1 408.933.6944 (2-6944)
skype: andrewlduane
aduane@juniper.net


-----Original Message-----
From: owner-freebsd-hackers@freebsd.org [mailto:owner-freebsd-hackers@freebsd.org] On Behalf Of John Nielsen
Sent: Thursday, May 08, 2014 1:56 PM
To: John Baldwin
Cc: freebsd-hackers@freebsd.org; freebsd-virtualization@freebsd.org
Subject: Re: consistent VM hang during reboot

On May 8, 2014, at 11:03 AM, John Baldwin <jhb@freebsd.org> wrote:

> On Wednesday, May 07, 2014 7:15:43 pm John Nielsen wrote:
>> I am trying to solve a problem with amd64 FreeBSD virtual machines
>> running on a Linux+KVM hypervisor. To be honest I'm not sure if the
>> problem is in FreeBSD or the hypervisor, but I'm trying to rule out
>> the OS first.
>>
>> The _second_ time FreeBSD boots in a virtual machine with more than
>> one core, the boot hangs just before the kernel would normally print
>> e.g. "SMP: AP CPU #1 Launched!" (The last line on the console is
>> "usbus0: 12Mbps Full Speed USB v1.0", but the problem persists even
>> without USB). The VM will boot fine a first time, but running either
>> "shutdown -r now" OR "reboot" will lead to a hung second boot.
>> Stopping and starting the host qemu-kvm process is the only way to
>> continue.
>>
>> The problem seems to be triggered by something in the SMP portion of
>> cpu_reset() (from sys/amd64/amd64/vm_machdep.c). If I hit the virtual
>> "reset" button the next boot is fine. If I have
>> 'kern.smp.disabled="1"' set for the initial boot then subsequent
>> boots are fine (but I can only use one CPU core, of course). However,
>> if I boot normally the first time then set 'kern.smp.disabled="1"'
>> for the second (re)boot, the problem is triggered. Apparently
>> something in the shutdown code is "poisoning the well" for the next
>> boot.
>>
>> The problem is present in FreeBSD 8.4, 9.2, 10.0 and 11-CURRENT as of
>> yesterday.
>>
>> This (heavy-handed and wrong) patch (to HEAD) lets me avoid the issue:
>>
>> --- sys/amd64/amd64/vm_machdep.c.orig	2014-05-07 13:19:07.400981580 -0600
>> +++ sys/amd64/amd64/vm_machdep.c	2014-05-07 17:02:52.416783795 -0600
>> @@ -593,7 +593,7 @@
>> void
>> cpu_reset()
>> {
>> -#ifdef SMP
>> +#if 0
>> 	cpuset_t map;
>> 	u_int cnt;
>>
>> I've tried skipping or disabling smaller chunks of code within the #if
>> block but haven't found a consistent winner yet.
>>
>> I'm hoping the list will have suggestions on how I can further narrow
>> down the problem, or theories on what might be going on.
>
> Can you try forcing the reboot to occur on the BSP (via 'cpuset -l 0
> reboot') or a non-BSP ('cpuset -l 1 reboot') to see if that has any
> effect?  It might not, but if it does it would help narrow down the
> code to consider.

Hello jhb, thanks for responding.

I tried your suggestion but unfortunately it does not make any
difference. The reboot hangs regardless of which CPU I assign the
command to.

Any other suggestions?

JN

_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"