Date:      Thu, 01 Nov 2007 17:43:23 -0700
From:      Nate Lawson <nate@root.org>
To:        Glen <glen.leeder@nokia.com>
Cc:        ACPI mailing list <freebsd-acpi@freebsd.org>
Subject:   Re: SMP system shutdown hang (acpi_cpu_shutdown - smp_rendezvous)
Message-ID:  <472A72AB.4000809@root.org>
In-Reply-To: <472A53B2.6030901@nokia.com>
References:  <472A53B2.6030901@nokia.com>

Glen wrote:
> Hi,
> 
> I have been seeing intermittent hangs in the acpi shutdown code on an
> Intel 2.4GHz 8 CPU system. I am running with a FreeBSD 6.1 code base
> but cannot see a reason why this can't happen in other FreeBSD versions.
> The hang is very irregular; I am recreating it using an expect script
> that repeatedly reboots the system. Sometimes, I can do up to 200
> reboots before observing the hang, sometimes, it happens after 5-20
> reboots.
> 
> It has been difficult to pin down the hang as the system is not
> responding to NMI events but using breakpoints I believe the hang is in 
> acpi_cpu.c:acpi_cpu_shutdown with the call to smp_rendezvous.

First, thank you for your careful debugging help.  This is wonderful.

> My theory is that one of the CPUs does not respond to ipi_all_but_self
> and that all the other CPUs are waiting for it in smp_rendezvous_action.
> The smp_rv_waiters[0] < mp_ncpus condition never gets met and the system
> hangs. This may happen due to other activity (or a deadlock?) on that
> CPU.
> 
> I noticed a few threads relating to this and have already tried stuff
> like changing kern.sched.ipiwakeup.enabled & machdep.cpu_idle_hlt.
> Neither had any effect.

Very interesting.  I didn't think anything could prevent an IPI from
eventually being delivered, but during shutdown interrupts may be
disabled at some point.
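
To make the failure mode concrete, here is a toy user-space model of the
entry barrier Glen describes.  It is only a sketch, not the code in
kern_smp.c: rv_entry_completes() and the services_ipi[] array are
hypothetical names, and the real kernel spins on an atomic counter
instead of counting in a loop.  The model just evaluates the exit
condition (waiters no longer less than the CPU count) for a given set of
CPUs that do or do not service the IPI.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model, NOT FreeBSD code.  services_ipi[cpu] is a hypothetical flag
 * saying whether that CPU ever runs the rendezvous handler.  Each CPU
 * that does so bumps the waiter count once, standing in for the
 * increment of smp_rv_waiters[0].
 *
 * Returns true if the entry barrier would be released, false if the
 * spinning CPUs would wait forever.
 */
bool
rv_entry_completes(const bool *services_ipi, int ncpus)
{
	int waiters = 0;	/* stands in for smp_rv_waiters[0] */
	int cpu;

	assert(ncpus > 0);
	for (cpu = 0; cpu < ncpus; cpu++) {
		if (services_ipi[cpu])
			waiters++;
	}
	/* The spin loop keeps going while waiters < ncpus. */
	return (!(waiters < ncpus));
}
```

With eight CPUs and one that never services the IPI, the count stops at
seven, the loop condition stays true, and every other CPU spins forever.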

> 1) I tried removing the call to smp_rendezvous in acpi_cpu_shutdown and
> this stops the hang from happening. Does anyone know the purpose of this
> call in the shutdown code or if I might suffer some consequence by
> removing it?

Yes, I put it in to break all APs out of their potential C1-C3 sleep
states.  This way they are not halted when shutdown needs to synchronize
and stop them.  But the shutdown code sends its own IPI, so there is no
reason to do it again here.  I will remove smp_rendezvous() now.

> 2) Has anyone got any suggestions for debugging this further given that
> I can't break into the debugger? I thought I could maybe instrument some
> counters in i386/i386/local_apic.c & kern_smp.c with the aim of
> identifying a root cause.

Sounds reasonable.  Thanks again for a detailed problem report.
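
One possible shape for such counters, as a rough user-space sketch only:
count IPIs sent to and handled by each CPU, then look for a CPU whose
handled count lags.  All names here (struct ipi_counters, MODEL_MAXCPU,
the ipi_count_*() helpers) are made up for illustration and do not exist
in local_apic.c or kern_smp.c.  In the kernel the increments would need
to be atomic or per-CPU, and the counters would have to be readable
without the debugger.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAXCPU	32	/* hypothetical upper bound on CPUs */

/* Hypothetical per-CPU bookkeeping; not an existing kernel structure. */
struct ipi_counters {
	uint64_t sent[MODEL_MAXCPU];	/* IPIs targeted at each CPU */
	uint64_t handled[MODEL_MAXCPU];	/* IPIs each CPU actually serviced */
};

/* Would be called on the sending side, once per target CPU. */
void
ipi_count_sent(struct ipi_counters *c, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_MAXCPU);
	c->sent[cpu]++;
}

/* Would be called at the top of the IPI handler on the target CPU. */
void
ipi_count_handled(struct ipi_counters *c, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_MAXCPU);
	c->handled[cpu]++;
}

/* Return the first CPU with an unserviced IPI, or -1 if none lags. */
int
ipi_first_lagging_cpu(const struct ipi_counters *c, int ncpus)
{
	int cpu;

	assert(ncpus > 0 && ncpus <= MODEL_MAXCPU);
	for (cpu = 0; cpu < ncpus; cpu++) {
		if (c->handled[cpu] < c->sent[cpu])
			return (cpu);
	}
	return (-1);
}
```

If the hang reproduces, a CPU reported by ipi_first_lagging_cpu() would
be the one to look at first.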

-- 
Nate