From: Andriy Gapon
Date: Tue, 07 Jun 2011 17:14:29 +0300
To: Attilio Rao
Cc: freebsd-stable@FreeBSD.org, freebsd-current@FreeBSD.org, "Robert N. M. Watson"
Subject: Re: [poll / rfc] kdb_stop_cpus
Message-ID: <4DEE3245.5030708@FreeBSD.org>

on 05/06/2011 01:35 Attilio Rao said the following:
> 2011/6/4 Andriy Gapon:
>> commit 458ebd9aca7e91fc6e0825c727c7220ab9f61016
>>
>> generic_stop_cpus: move timeout detection code from under DIAGNOSTIC
>>
>> ... and also increase it a bit.
>> IMO it's better to detect and report the (rather serious) condition and
>> allow a system to proceed somehow rather than be stuck in an endless
>> loop.
>>
>> diff --git a/sys/kern/subr_smp.c b/sys/kern/subr_smp.c
>> index ae52f4b..4bd766b 100644
>> --- a/sys/kern/subr_smp.c
>> +++ b/sys/kern/subr_smp.c
>> @@ -232,12 +232,10 @@ generic_stop_cpus(cpumask_t map, u_int type)
>>  		/* spin */
>>  		cpu_spinwait();
>>  		i++;
>> -#ifdef DIAGNOSTIC
>> -		if (i == 100000) {
>> +		if (i == 100000000) {
>>  			printf("timeout stopping cpus\n");
>>  			break;
>>  		}
>> -#endif
>>  	}
>>
>>  	stopping_cpu = NOCPU;
>
> I'd also add the ability, once the deadlock is detected, to break in
> KDB, and put that under DIAGNOSTIC.
> I had such a patch and I used it to debug some deadlocks on shutdown
> code, but now it seems I can't find it anymore.

I think that this could be useful. Of course, it would have to honor
KDB_UNATTENDED.
However, I am not sure how to implement it safely. E.g., panic() should
stop the other CPUs before setting panicstr, and if some CPU is stuck for
good, we would just keep calling panic() recursively until the machine
triple-faults. Ditto for kdb_trap().
So if you could dig up your code for this, that would be useful.

-- 
Andriy Gapon
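To make the recursion hazard concrete, here is a minimal userland sketch, not FreeBSD kernel code. It pairs the bounded spin from the diff with a hypothetical re-entrancy guard, so that a timeout hook that itself tries to stop CPUs (as the message says panic() and kdb_trap() would) gets an immediate failure instead of recursing. All names (stopped_cpus, stop_in_progress, on_timeout, stop_cpus_bounded, debugger_hook) are illustrative assumptions, and the single-threaded simulation only models a CPU that never acknowledges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Userland sketch only; every name here is a hypothetical stand-in.
 * stopped_cpus mimics the acknowledgement mask, stop_in_progress is a
 * re-entrancy guard, and on_timeout models "enter the debugger or
 * panic when stopping times out".
 */
static volatile unsigned int stopped_cpus;	/* bit n set: CPU n stopped */
static bool stop_in_progress;			/* true while a stop request runs */
static void (*on_timeout)(void);		/* optional hook run after a timeout */

/*
 * Wait until every CPU in `map` has stopped or `limit` iterations pass
 * (the diff uses a fixed bound of 100000000).  Returns true on success,
 * false on timeout or when called while another stop is in progress.
 */
static bool
stop_cpus_bounded(unsigned int map, unsigned long limit)
{
	unsigned long i = 0;
	bool ok = true;

	assert(limit > 0);	/* limit 0 would make the loop effectively unbounded */
	if (stop_in_progress)
		return false;	/* nested request: fail instead of recursing */
	stop_in_progress = true;

	while ((stopped_cpus & map) != map) {
		/* the kernel loop calls cpu_spinwait() here */
		if (++i == limit) {
			printf("timeout stopping cpus\n");
			ok = false;
			break;
		}
	}

	/*
	 * The hook runs with the guard still set, so if it tries to stop
	 * CPUs again it gets an immediate failure rather than a new wait.
	 */
	if (!ok && on_timeout != NULL)
		on_timeout();

	stop_in_progress = false;
	return ok;
}

/* Hypothetical debugger entry that, like panic()/kdb_trap(), stops CPUs. */
static int nested_result = -1;

static void
debugger_hook(void)
{
	nested_result = stop_cpus_bounded(0x2u, 1000) ? 1 : 0;
}
```

For example, with stopped_cpus = 0x1 (CPU 1 never acknowledges) and on_timeout = debugger_hook, stop_cpus_bounded(0x3, 1000) prints "timeout stopping cpus" once and returns false, and the nested call made from the hook returns false at once, leaving nested_result at 0. The guard only turns the endless panic() -> stop -> panic() loop into a reported failure. It does not unstick the CPU, and honoring KDB_UNATTENDED would still be up to the caller.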