Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 6 Dec 2011 23:11:01 +0100
From:      Attilio Rao <attilio@freebsd.org>
To:        Andriy Gapon <avg@freebsd.org>
Cc:        Kostik Belousov <kostikbel@gmail.com>, arch@freebsd.org, current@freebsd.org
Subject:   Re: Stop scheduler on panic
Message-ID:  <CAJ-FndA0f5v%2B7Y_HreqUXwxRwWyEqie1ip2c6==VKME4Bp1u5A@mail.gmail.com>
In-Reply-To: <4EDE8931.1080506@FreeBSD.org>
References:  <20111113083215.GV50300@deviant.kiev.zoral.com.ua> <CAJ-FndCz0wrFojK4_FOrcoQWk6Wd%2Btq%2Br%2BT9PXT%2BLn%2B7hvKB8A@mail.gmail.com> <4EDE8931.1080506@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
2011/12/6 Andriy Gapon <avg@freebsd.org>:
> on 06/12/2011 20:34 Attilio Rao said the following:
> [snip]
>> - I'm not entirely sure, why we want to disable interrupts at this
>> moment (before to stop other CPUs)?:
>
> Because I believe that stop_cpus_hard() should run in a context with inte=
rrupts
> and preemption disabled. =C2=A0Also, I believe that the whole panic handl=
ing =C2=A0code
> should run in the same context. =C2=A0So it was only natural for me to en=
ter that
> context at this point.

I'm not against that, I don't see anything wrong with having
interrupts disabled at that point.

>> @@ -547,13 +555,18 @@ panic(const char *fmt, ...)
>> =C2=A0{
>> =C2=A0#ifdef SMP
>> =C2=A0 =C2=A0 =C2=A0 static volatile u_int panic_cpu =3D NOCPU;
>> + =C2=A0 =C2=A0 cpuset_t other_cpus;
>> =C2=A0#endif
>> =C2=A0 =C2=A0 =C2=A0 struct thread *td =3D curthread;
>> =C2=A0 =C2=A0 =C2=A0 int bootopt, newpanic;
>> =C2=A0 =C2=A0 =C2=A0 va_list ap;
>> =C2=A0 =C2=A0 =C2=A0 static char buf[256];
>>
>> - =C2=A0 =C2=A0 critical_enter();
>> + =C2=A0 =C2=A0 if (stop_scheduler_on_panic)
>> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 spinlock_enter();
>> + =C2=A0 =C2=A0 else
>> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 critical_enter();
>> +
>>
>> - In this chunk I don't entirely understand the kdb_active check:
>> @@ -566,11 +579,18 @@ panic(const char *fmt, ...)
>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 PCPU_GET(=
cpuid)) =3D=3D 0)
>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 while (panic_cpu !=3D NOCPU)
>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ; /* nothing */
>> + =C2=A0 =C2=A0 if (stop_scheduler_on_panic) {
>> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (panicstr =3D=3D NULL && =
!kdb_active) {
>> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
other_cpus =3D all_cpus;
>> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
CPU_CLR(PCPU_GET(cpuid), &other_cpus);
>> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
stop_cpus_hard(other_cpus);
>> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 }
>> + =C2=A0 =C2=A0 }
>> =C2=A0#endif
>>
>> =C2=A0 =C2=A0 =C2=A0 bootopt =3D RB_AUTOBOOT;
>> =C2=A0 =C2=A0 =C2=A0 newpanic =3D 0;
>> - =C2=A0 =C2=A0 if (panicstr)
>> + =C2=A0 =C2=A0 if (panicstr !=3D NULL)
>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 bootopt |=3D RB_NOSYNC;
>> =C2=A0 =C2=A0 =C2=A0 else {
>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 bootopt |=3D RB_DUMP;
>>
>> Is it for avoiding to pass an empty mask to stop_cpus() in kdb_trap()
>> (I saw you changed the policy there)?
>
> Yes. =C2=A0You know my position about elimination of the cpuset parameter=
 to
> stop_cpus_* and my intention to do so. =C2=A0This is related to that. =C2=
=A0Right now that
> check is not strictly necessary, =C2=A0but it doesn't do any harm either.=
 =C2=A0We know
> that all other CPUs are already stopped when kdb_active (ditto for panics=
tr !=3D
> NULL).

I see there could be races with disabiling interrupts and having 2
different stopping mechanisms that want to stop cpus, even using
stop_cpus_hard(), on architectures that don't use a privileged channel
(NMI) as mostly of our !x86.
They are mostly very rare races (requiring kdb_trap() and panic() to
happen in parallel on different CPUs).

>> Maybe we can find a better integration among the two.
>
> What kind of integration?
> Right now I have simple model - both stop all other CPUs.

Well, there is no synchronization atm between panic stopping cpus and
the kdb_trap(). When kdb_trap() stop cpus it has interrupts disabled
and if panic already started they will both deadlock because IPI_STOP
won't be properly delivered.
However, I see all this as a problem with other arches not having/not
implementing a real dedicated channel for cpu_stop_hard(), so we
should not think about it now.

I think we may need some sort of control as panic already does with
panic_cpu before to disable interrupts, but don't worry about it now.

>> I'd also move the setting of stop_scheduler variable in the "if", it
>> seems a bug to me to have it set otherwise.
>
> Can you please explain what bug do you suspect here?
> I do not see any.

I just see more natural to move it within the above if (panicstr =3D=3D
NULL ...) condition.

> [snip]
>> - I'm not sure I like to change the policies on cpu stopping for KDB
>> with this patchset.
>
> I am not sure what exactly you mean by change in policies. =C2=A0I do not=
 see any
> such change, entering kdb always stops all other CPUs, with or without th=
e patch.

Yes, I was confused by older code did just stopped CPUs before
kdb_trap() manually, I think what kdb_trap() does now is ok (and you
just retain it).
I'd just change this check on panicstr:
@@ -606,9 +603,13 @@ kdb_trap(int type, int code, struct trapframe *tf)
 	intr =3D intr_disable();

 #ifdef SMP
-	other_cpus =3D all_cpus;
-	CPU_CLR(PCPU_GET(cpuid), &other_cpus);
-	stop_cpus_hard(other_cpus);
+	if (panicstr =3D=3D NULL) {
+		other_cpus =3D all_cpus;
+		CPU_CLR(PCPU_GET(cpuid), &other_cpus);
+		stop_cpus_hard(other_cpus);
+		did_stop_cpus =3D 1;
+	} else
+		did_stop_cpus =3D 0;

to be SCHEDULER_STOPPED().

If you agree I can fix the kern_mutex, kern_sx and kern_rwlock parts
and it should be done.

Attilio


--=20
Peace can only be achieved by understanding - A. Einstein



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJ-FndA0f5v%2B7Y_HreqUXwxRwWyEqie1ip2c6==VKME4Bp1u5A>