Date:      Tue, 07 Jun 2011 17:14:29 +0300
From:      Andriy Gapon <avg@FreeBSD.org>
To:        Attilio Rao <attilio@FreeBSD.org>
Cc:        freebsd-stable@FreeBSD.org, freebsd-current@FreeBSD.org, "Robert N. M. Watson" <rwatson@FreeBSD.org>
Subject:   Re: [poll / rfc] kdb_stop_cpus
Message-ID:  <4DEE3245.5030708@FreeBSD.org>
In-Reply-To: <BANLkTik6vR_vG%2BiwxVd9p3rNAMAXv-FGZQ@mail.gmail.com>
References:  <4DE8FA2E.4030202@FreeBSD.org>	<5E4D0F56-4338-4157-8BC6-17EE2831725F@FreeBSD.org>	<4DE9EB61.3000006@FreeBSD.org> <BANLkTik6vR_vG%2BiwxVd9p3rNAMAXv-FGZQ@mail.gmail.com>

on 05/06/2011 01:35 Attilio Rao said the following:
> 2011/6/4 Andriy Gapon <avg@freebsd.org>:
>> commit 458ebd9aca7e91fc6e0825c727c7220ab9f61016
>>
>>    generic_stop_cpus: move timeout detection code from under DIAGNOSTIC
>>
>>    ... and also increase it a bit.
>>    IMO it's better to detect and report the (rather serious) condition and
>>    allow a system to proceed somehow rather than be stuck in an endless
>>    loop.
>>
>> diff --git a/sys/kern/subr_smp.c b/sys/kern/subr_smp.c
>> index ae52f4b..4bd766b 100644
>> --- a/sys/kern/subr_smp.c
>> +++ b/sys/kern/subr_smp.c
>> @@ -232,12 +232,10 @@ generic_stop_cpus(cpumask_t map, u_int type)
>>                /* spin */
>>                cpu_spinwait();
>>                i++;
>> -#ifdef DIAGNOSTIC
>> -               if (i == 100000) {
>> +               if (i == 100000000) {
>>                        printf("timeout stopping cpus\n");
>>                        break;
>>                }
>> -#endif
>>        }
>>
>>        stopping_cpu = NOCPU;
> 
> I'd also add the ability, once the deadlock is detected, to break in
> KDB, and put that under DIAGNOSTIC.
> I had such a patch and I used it to debug some deadlocks on shutdown
> code, but now it seems I can't find it anymore.

I think that this could be useful.
Of course, it would have to honor KDB_UNATTENDED.
However, I am not sure how to implement it safely.  E.g. panic() should stop the
other CPUs before setting panicstr, and if some CPU is stuck for good, then we would
just keep calling panic() recursively until a triple fault.  Ditto for kdb_trap().
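
To illustrate the recursion concern, here is a rough userland model in C, not
kernel code.  All names in it (wait_for_stop, model_stop_cpus, model_kdb_enter,
in_timeout_path, model_unattended) are made up for the illustration; only the
timeout message and the iteration limit come from the diff above.  It assumes a
simple re-entrancy flag would be enough to keep the timeout path from entering
the debugger a second time, which may well not hold on a real SMP system.

```c
#include <stdbool.h>
#include <stdio.h>

/* Iteration limit taken from the patch; everything else is hypothetical. */
#define STOP_SPIN_LIMIT 100000000u

static bool in_timeout_path;   /* set while the timeout is being handled */
static bool model_unattended;  /* stands in for KDB_UNATTENDED */
static int debugger_entries;   /* counts simulated debugger entries */

static bool model_stop_cpus(const volatile unsigned *stopped, unsigned want,
    unsigned limit);

/*
 * Spin until every bit in 'want' shows up in '*stopped', or give up
 * after 'limit' iterations.  Returns true if all CPUs stopped.
 */
static bool
wait_for_stop(const volatile unsigned *stopped, unsigned want, unsigned limit)
{
	unsigned i;

	for (i = 0; (*stopped & want) != want; i++) {
		if (i == limit) {
			printf("timeout stopping cpus\n");
			return (false);
		}
	}
	return (true);
}

/* Simulated debugger entry: it tries to stop the other CPUs again. */
static void
model_kdb_enter(const volatile unsigned *stopped, unsigned want,
    unsigned limit)
{
	debugger_entries++;
	(void)model_stop_cpus(stopped, want, limit);
}

/*
 * Stop attempt with a guard: on timeout, enter the debugger at most
 * once, and never when running unattended.
 */
static bool
model_stop_cpus(const volatile unsigned *stopped, unsigned want,
    unsigned limit)
{
	if (wait_for_stop(stopped, want, limit))
		return (true);
	if (!in_timeout_path && !model_unattended) {
		in_timeout_path = true;
		model_kdb_enter(stopped, want, limit);
		in_timeout_path = false;
	}
	return (false);
}
```

Without the in_timeout_path check, a permanently stuck CPU would make
model_stop_cpus() and model_kdb_enter() call each other without end, which is
the userland analogue of the panic() recursion described above.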

So if you could dig up your code for implementing this, that would be useful.

-- 
Andriy Gapon


