Date: Thu, 01 Nov 2007 21:01:35 -0700
From: Nate Lawson <nate@root.org>
To: Glen <glen.leeder@nokia.com>
Cc: ACPI mailing list <freebsd-acpi@freebsd.org>
Subject: Re: SMP system shutdown hang (acpi_cpu_shutdown - smp_rendezvous)
Message-ID: <472AA11F.3080302@root.org>
In-Reply-To: <472A53B2.6030901@nokia.com>
References: <472A53B2.6030901@nokia.com>

Glen wrote:
> Hi,
>
> I have been seeing intermittent hangs in the acpi shutdown code on an
> Intel 2.4GHz 8 CPU system. I am running with a FreeBSD 6.1 code base
> but cannot see a reason why this can't happen in other FreeBSD versions.
> The hang is very irregular; I am recreating it using an expect script
> that repeatedly reboots the system. Sometimes I can do up to 200
> reboots before observing the hang; sometimes it happens after 5-20
> reboots.
>
> It has been difficult to pin down the hang as the system is not
> responding to NMI events, but using breakpoints I believe the hang is in
> acpi_cpu.c:acpi_cpu_shutdown with the call to smp_rendezvous.
>
> My theory is that one of the CPUs does not respond to ipi_all_but_self
> and that all the other CPUs are waiting for it in smp_rendezvous_action.
> The smp_rv_waiters[0] < mp_ncpus condition never gets met and the system
> hangs. This may happen due to other activity (or a deadlock?) on that
> CPU.
>
> I noticed a few threads relating to this and have already tried stuff
> like changing kern.sched.ipiwakeup.enabled & machdep.cpu_idle_hlt.
> Neither had any effect.
>
> 1) I tried removing the call to smp_rendezvous in acpi_cpu_shutdown and
> this stops the hang from happening. Does anyone know the purpose of this
> call in the shutdown code or if I might suffer some consequence by
> removing it?

I have one more thing I needed to consider. There's a race where a
thread could be entering acpi_cpu_idle() to read a C2-3 register but
that register state gets destroyed with the softc before the read. In
that case, I thought there could be a panic, which is why I originally
put in the smp_rendezvous(). However, I don't think device_shutdown()
frees softcs (need to look in the newbus code to be sure). So I should
still be able to remove this code after checking more closely.

--
Nate
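
For readers following the thread, the hang Glen describes can be pictured with a small user-space toy model. This is only a sketch, not the kernel's smp_rendezvous code: the struct, the function names (rv_model, rv_enter, rv_simulate), the responsive[] array and the max_spins budget are all invented here for illustration. The real loop presumably has no spin budget, which would be why a single CPU that never handles the IPI leaves the others waiting forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: field names echo the thread, but this is not kernel code. */
struct rv_model {
    int mp_ncpus;   /* number of CPUs expected at the entry barrier */
    int waiters0;   /* stands in for smp_rv_waiters[0] */
};

/* A CPU that handles the rendezvous IPI bumps the entry counter. */
static void rv_enter(struct rv_model *rv)
{
    rv->waiters0++;
}

/*
 * Returns true if every CPU reached the barrier, false if the
 * (hypothetical) spin budget ran out first. responsive[i] says
 * whether CPU i ever handles the IPI.
 */
static bool rv_simulate(int ncpus, const bool *responsive, int max_spins)
{
    assert(ncpus > 0);
    struct rv_model rv = { ncpus, 0 };

    /* Each responsive CPU arrives at the barrier exactly once. */
    for (int cpu = 0; cpu < ncpus; cpu++) {
        if (responsive[cpu])
            rv_enter(&rv);
    }

    /* Bounded version of: while (smp_rv_waiters[0] < mp_ncpus) spin; */
    for (int spin = 0; spin < max_spins; spin++) {
        if (rv.waiters0 >= rv.mp_ncpus)
            return true;
    }
    return false;   /* condition never met: the unbounded loop would hang */
}
```

Calling rv_simulate() for 8 CPUs with every entry of responsive[] set to true returns true; clearing any single entry makes it return false, which corresponds to the case where the wait condition is never satisfied.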