Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 06 Dec 2011 23:29:21 +0200
From:      Andriy Gapon <avg@FreeBSD.org>
To:        Attilio Rao <attilio@FreeBSD.org>
Cc:        Kostik Belousov <kostikbel@gmail.com>, arch@FreeBSD.org, current@FreeBSD.org
Subject:   Re: Stop scheduler on panic
Message-ID:  <4EDE8931.1080506@FreeBSD.org>
In-Reply-To: <CAJ-FndCz0wrFojK4_FOrcoQWk6Wd%2Btq%2Br%2BT9PXT%2BLn%2B7hvKB8A@mail.gmail.com>
References:  <20111113083215.GV50300@deviant.kiev.zoral.com.ua> <CAJ-FndCz0wrFojK4_FOrcoQWk6Wd%2Btq%2Br%2BT9PXT%2BLn%2B7hvKB8A@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
on 06/12/2011 20:34 Attilio Rao said the following:
[snip]
> - I'm not entirely sure, why we want to disable interrupts at this
> moment (before to stop other CPUs)?:

Because I believe that stop_cpus_hard() should run in a context with interrupts
and preemption disabled.  Also, I believe that the whole panic handling  code
should run in the same context.  So it was only natural for me to enter that
context at this point.

> @@ -547,13 +555,18 @@ panic(const char *fmt, ...)
>  {
>  #ifdef SMP
>  	static volatile u_int panic_cpu = NOCPU;
> +	cpuset_t other_cpus;
>  #endif
>  	struct thread *td = curthread;
>  	int bootopt, newpanic;
>  	va_list ap;
>  	static char buf[256];
> 
> -	critical_enter();
> +	if (stop_scheduler_on_panic)
> +		spinlock_enter();
> +	else
> +		critical_enter();
> +
> 
> - In this chunk I don't entirely understand the kdb_active check:
> @@ -566,11 +579,18 @@ panic(const char *fmt, ...)
>  		    PCPU_GET(cpuid)) == 0)
>  			while (panic_cpu != NOCPU)
>  				; /* nothing */
> +	if (stop_scheduler_on_panic) {
> +		if (panicstr == NULL && !kdb_active) {
> +			other_cpus = all_cpus;
> +			CPU_CLR(PCPU_GET(cpuid), &other_cpus);
> +			stop_cpus_hard(other_cpus);
> +		}
> +	}
>  #endif
> 
>  	bootopt = RB_AUTOBOOT;
>  	newpanic = 0;
> -	if (panicstr)
> +	if (panicstr != NULL)
>  		bootopt |= RB_NOSYNC;
>  	else {
>  		bootopt |= RB_DUMP;
> 
> Is it for avoiding to pass an empty mask to stop_cpus() in kdb_trap()
> (I saw you changed the policy there)?

Yes.  You know my position about elimination of the cpuset parameter to
stop_cpus_* and my intention to do so.  This is related to that.  Right now that
check is not strictly necessary,  but it doesn't do any harm either.  We know
that all other CPUs are already stopped when kdb_active (ditto for panicstr !=
NULL).

> Maybe we can find a better integration among the two.

What kind of integration?
Right now I have simple model - both stop all other CPUs.

> I'd also move the setting of stop_scheduler variable in the "if", it
> seems a bug to me to have it set otherwise.

Can you please explain what bug do you suspect here?
I do not see any.

[snip]
> - I'm not sure I like to change the policies on cpu stopping for KDB
> with this patchset.

I am not sure what exactly you mean by change in policies.  I do not see any
such change, entering kdb always stops all other CPUs, with or without the patch.

> I think we should discuss this furthermore, in
> particular in terms of reviewing what accesses points KDB has and if
> they are all covered.

Please review the code and see if anything is left to discuss :-)
My review shows that all access points are covered.  Essentially there is only
one access point - kdb_trap.  It's called either directly from a known trap
context or indirectly (via a debug trap) using kdb_enter.

> If someone can summary this up (and has already made the analysis) and
> would please share his findings I'd be happy about it, otherwise we
> should not commit the stop_cpu approach in kdb_trap() IMHO.

I hope that the above answers somewhat clear your concerns.  Just in case, if
you can point out any clear specific problems with this approach, then that can
block me from committing.  A fuzzy feeling that something might be wrong won't
do that :-)

-- 
Andriy Gapon



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4EDE8931.1080506>