Date:      Sat, 4 Jun 2011 18:35:28 -0400
From:      Attilio Rao <attilio@freebsd.org>
To:        Andriy Gapon <avg@freebsd.org>
Cc:        freebsd-stable@freebsd.org, freebsd-current@freebsd.org, "Robert N. M. Watson" <rwatson@freebsd.org>
Subject:   Re: [poll / rfc] kdb_stop_cpus
Message-ID:  <BANLkTik6vR_vG%2BiwxVd9p3rNAMAXv-FGZQ@mail.gmail.com>
In-Reply-To: <4DE9EB61.3000006@FreeBSD.org>
References:  <4DE8FA2E.4030202@FreeBSD.org> <5E4D0F56-4338-4157-8BC6-17EE2831725F@FreeBSD.org> <4DE9EB61.3000006@FreeBSD.org>

2011/6/4 Andriy Gapon <avg@freebsd.org>:
> on 03/06/2011 20:57 Robert N. M. Watson said the following:
>>
>> On 3 Jun 2011, at 16:13, Andriy Gapon wrote:
>>
>>> I wonder if anybody uses kdb_stop_cpus with a non-default value. If yes, I
>>> am very interested to learn about your use case for it.
>>
>> The issue that prompted the sysctl was non-NMI IPIs being used to enter the
>> debugger or reboot following a core hanging with interrupts disabled. With
>> the switch to NMI IPIs in some of those circumstances, life is better -- at
>> least, on hardware that supports non-maskable IPIs. I seem to recall sparc64
>> doesn't, however?
>
> Seems to be so as Nathan has also pointed out for PPC.
> For this I also plan the following change:
>
> commit 458ebd9aca7e91fc6e0825c727c7220ab9f61016
>
>     generic_stop_cpus: move timeout detection code from under DIAGNOSTIC
>
>     ... and also increase it a bit.
>     IMO it's better to detect and report the (rather serious) condition and
>     allow a system to proceed somehow rather than be stuck in an endless
>     loop.
>
> diff --git a/sys/kern/subr_smp.c b/sys/kern/subr_smp.c
> index ae52f4b..4bd766b 100644
> --- a/sys/kern/subr_smp.c
> +++ b/sys/kern/subr_smp.c
> @@ -232,12 +232,10 @@ generic_stop_cpus(cpumask_t map, u_int type)
>                  /* spin */
>                  cpu_spinwait();
>                  i++;
> -#ifdef DIAGNOSTIC
> -                if (i == 100000) {
> +                if (i == 100000000) {
>                          printf("timeout stopping cpus\n");
>                          break;
>                  }
> -#endif
>          }
>
>          stopping_cpu = NOCPU;
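The patch turns the stop-CPUs wait into a bounded spin: keep polling until every targeted CPU has set its bit in the stopped mask, and give up with a message once the iteration counter reaches the cap. Below is a minimal user-space model of that loop, not kernel code. `struct sim_cpu`, `sim_poll()`, `stop_cpus_bounded()`, `NCPU`, `STOP_SPIN_LIMIT` and the per-CPU `ack_delay` are hypothetical names invented for illustration. The model also assumes, following Robert's point, that a CPU spinning with interrupts disabled never takes a maskable IPI but does take an NMI.

```c
/*
 * User-space model (not kernel code) of the bounded spin-wait in
 * generic_stop_cpus() after the patch.  CPU acknowledgement is simulated
 * deterministically: each simulated CPU sets its bit in the stopped mask
 * once `ack_delay` polls have passed, unless it is wedged with interrupts
 * disabled and the IPI is maskable, in which case it never answers.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NCPU		4
#define STOP_SPIN_LIMIT	100000000u	/* cap used by the patch; tests pass less */

struct sim_cpu {
	bool		intr_disabled;	/* hung with interrupts off */
	unsigned	ack_delay;	/* polls until it would acknowledge */
};

/* One polling step: return the mask of CPUs that have stopped so far. */
static uint32_t
sim_poll(const struct sim_cpu *cpus, bool nmi, unsigned polls)
{
	uint32_t stopped = 0;

	for (int c = 0; c < NCPU; c++) {
		/* A maskable IPI cannot reach a CPU with interrupts disabled. */
		if (cpus[c].intr_disabled && !nmi)
			continue;
		if (polls >= cpus[c].ack_delay)
			stopped |= 1u << c;
	}
	return (stopped);
}

/* Return true if every CPU in `map` stopped before `limit` polls. */
static bool
stop_cpus_bounded(const struct sim_cpu *cpus, uint32_t map, bool nmi,
    unsigned limit)
{
	unsigned i = 0;

	assert(limit > 0);	/* i is checked after increment, so 0 never matches */
	while ((sim_poll(cpus, nmi, i) & map) != map) {
		/* spin: the kernel calls cpu_spinwait() here */
		i++;
		if (i == limit) {
			printf("timeout stopping cpus\n");
			return (false);
		}
	}
	return (true);
}
```

With CPU 3 modelled as `{ true, 30 }` (wedged, interrupts off) and a cap of 1000 polls, `stop_cpus_bounded(cpus, 0xEu, false, 1000)` prints the timeout message and returns false, while the NMI variant returns true once the slowest CPU acknowledges. Without the cap, the maskable case would spin forever, which is the endless loop the commit message wants to avoid.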

I'd also add the ability, once the deadlock is detected, to break into
KDB, and put that under DIAGNOSTIC.
I had such a patch and I used it to debug some deadlocks on shutdown
code, but now it seems I can't find it anymore.
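The suggestion could look roughly like this: on timeout, a DIAGNOSTIC build hands control to the debugger instead of just printing and carrying on. This is again a user-space sketch under stated assumptions. `debugger_enter()` is a hypothetical stand-in for a KDB entry point such as kdb_enter(), `wait_stopped()` is a hypothetical helper, and DIAGNOSTIC is defined locally rather than coming from the kernel configuration.

```c
/*
 * User-space model (not kernel code): on a stop-CPUs timeout, a DIAGNOSTIC
 * build drops into the debugger.  debugger_enter() is a hypothetical
 * stand-in for a KDB entry point; here it only counts and logs the call.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DIAGNOSTIC 1	/* assumption: set here rather than by kernel options */

static int debugger_entries;	/* number of times the debugger hook fired */

static void
debugger_enter(const char *msg)
{
	debugger_entries++;
	printf("debugger_enter: %s\n", msg);
}

/*
 * Poll `*stopped` until every bit of `map` is set, giving up after `limit`
 * polls.  In the kernel the target CPUs set these bits; here the caller does.
 */
static bool
wait_stopped(const volatile uint32_t *stopped, uint32_t map, unsigned limit)
{
	assert(stopped != NULL);
	for (unsigned i = 0; (*stopped & map) != map; i++) {
		if (i == limit) {
			printf("timeout stopping cpus\n");
#ifdef DIAGNOSTIC
			debugger_enter("timeout stopping cpus");
#endif
			return (false);
		}
	}
	return (true);
}
```

When the mask already covers the requested CPUs, `wait_stopped()` returns true without touching the hook; when a requested CPU never answers, it logs the timeout and calls `debugger_enter()` once. Building without DIAGNOSTIC keeps the patch's print-and-continue behaviour.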

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


