Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 17 May 2011 00:12:28 +0200
From:      Attilio Rao <attilio@freebsd.org>
To:        Max Laier <max@love2party.net>
Cc:        FreeBSD current <freebsd-current@freebsd.org>, Peter Grehan <grehan@freebsd.org>, Andriy Gapon <avg@freebsd.org>, neel@freebsd.org
Subject:   Re: proposed smp_rendezvous change
Message-ID:  <BANLkTi=QCZC%2BTuWjA9EzaP%2BBcn87cKW0Rg@mail.gmail.com>
In-Reply-To: <201105161805.51188.max@love2party.net>
References:  <4DCD357D.6000109@FreeBSD.org> <201105161738.53366.max@love2party.net> <BANLkTik8CqtiP9OgvBpL08dqK6Aj%2BLQ3OA@mail.gmail.com> <201105161805.51188.max@love2party.net>

next in thread | previous in thread | raw e-mail | index | archive | help
2011/5/17 Max Laier <max@love2party.net>:
> On Monday 16 May 2011 17:54:54 Attilio Rao wrote:
>> 2011/5/16 Max Laier <max@love2party.net>:
>> > On Monday 16 May 2011 16:46:03 John Baldwin wrote:
>> >> On Monday, May 16, 2011 4:30:44 pm Max Laier wrote:
>> >> > On Monday 16 May 2011 14:21:27 John Baldwin wrote:
>> >> > > Yes, we need to fix that. =C2=A0Humm, it doesn't preempt when you=
 do a
>> >> > > critical_exit() though? =C2=A0Or do you use a hand-rolled critica=
l exit
>> >> > > that doesn't do a deferred preemption?
>> >> >
>> >> > Right now I just did a manual td_critnest++/--, but I guess ...
>> >>
>> >> Ah, ok, so you would "lose" a preemption. =C2=A0That's not really ide=
al.
>> >>
>> >> > > Actually, I'm curious how the spin unlock inside the IPI could yi=
eld
>> >> > > the CPU. =C2=A0Oh, is rmlock doing a wakeup inside the IPI handle=
r? =C2=A0I
>> >> > > guess that is ok as long as the critical_exit() just defers the
>> >> > > preemption to the end of the IPI handler.
>> >> >
>> >> > ... the earliest point where it is safe to preempt is after doing t=
he
>> >> >
>> >> > =C2=A0 =C2=A0atomic_add_int(&smp_rv_waiters[2], 1);
>> >> >
>> >> > so that we can start other IPIs again. =C2=A0However, since we don'=
t accept
>> >> > new IPIs until we signal EOI in the MD code (on amd64), this might
>> >> > still not be a good place to do the yield?!?
>> >>
>> >> Hmm, yeah, you would want to do the EOI before you yield. =C2=A0Howev=
er, we
>> >> could actually move the EOI up before calling the MI code so long as =
we
>> >> leave interrupts disabled for the duration of the handler (which we d=
o).
>> >>
>> >> > The spin unlock boils down to a critical_exit() and unless we did a
>> >> > critical_enter() at some point during the redenvouz setup, we will
>> >> > yield() if we owepreempt. =C2=A0I'm not quite sure how that can hap=
pen, but
>> >> > it seems like there is a path that allows the scheduler to set it f=
rom
>> >> > a foreign CPU.
>> >>
>> >> No, it is only set on curthread by curthread. =C2=A0This is actually =
my main
>> >> question. =C2=A0I've no idea how this could happen unless the rmlock =
code is
>> >> actually triggering a wakeup or sched_add() in its rendezvous handler=
.
>> >>
>> >> I don't see anything in rm_cleanIPI() that would do that however.
>> >>
>> >> I wonder if your original issue was really fixed =C2=A0just by the fi=
rst
>> >> patch you had which fixed the race in smp_rendezvous()?
>> >
>> > I found the stack that lead me to this patch in the first place:
>> >
>> > #0 =C2=A0sched_switch (td=3D0xffffff011a970000, newtd=3D0xffffff006e67=
84b0,
>> > flags=3D4) at src/sys/kern/sched_ule.c:1939
>> > #1 =C2=A00xffffffff80285c7f in mi_switch (flags=3D6, newtd=3D0x0) at
>> > src/sys/kern/kern_synch.c:475
>> > #2 =C2=A00xffffffff802a2cb3 in critical_exit () at
>> > src/sys/kern/kern_switch.c:185 #3 =C2=A00xffffffff80465807 in spinlock=
_exit
>> > () at
>> > src/sys/amd64/amd64/machdep.c:1458
>> > #4 =C2=A00xffffffff8027adea in rm_cleanIPI (arg=3D<value optimized out=
>) at
>> > src/sys/kern/kern_rmlock.c:180
>> > #5 =C2=A00xffffffff802b9887 in smp_rendezvous_action () at
>> > src/sys/kern/subr_smp.c:402
>> > #6 =C2=A00xffffffff8045e2a4 in Xrendezvous () at
>> > src/sys/amd64/amd64/apic_vector.S:235
>> > #7 =C2=A00xffffffff802a2c6e in critical_exit () at
>> > src/sys/kern/kern_switch.c:179 #8 =C2=A00xffffffff804365ba in uma_zfre=
e_arg
>> > (zone=3D0xffffff009ff4b5a0, item=3D0xffffff000f34cd40,
>> > udata=3D0xffffff000f34ce08) at
>> > src/sys/vm/uma_core.c:2370
>> > .
>> > .
>> > .
>> >
>> > and now that I look at it again, it is clear that critical_exit() just
>> > isn't interrupt safe. =C2=A0I'm not sure how to fix that, yet ... but =
this:
>> >
>> >
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0if (td->td_critnest =3D=3D 1) {
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0td->td_critnest=
 =3D 0;
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if (td->td_owep=
reempt) {
>> > =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0td->td_critnest =3D 1;
>> >
>> > clearly doesn't work.
>>
>> I'm sorry if I didn't reply to the whole rendezvous thread, but as
>> long as there is so many people taking care of it, I'll stay hidden in
>> my corner.
>>
>> I just wanted to tell that I think you are misunderstanding what
>> critical section is supposed to do.
>>
>> When an interrupt fires, it goes on the old "interrupt/kernel context"
>> which means it has not a context of his own. That is the reason why we
>> disable interrupts on spinlocks (or similar workaround for !x86
>> architectures) and this is why spinlocks are the only protection
>> usable in code that runs in interrupt context.
>>
>> Preempting just means another thread will be scheduler in the middle
>> of another thread execution path.
>>
>> This code is perfectly fine if you consider curthread won't be
>> descheduled while it is executing.
>
> Well, no - it is not. =C2=A0With this you can end up with a curthread tha=
t has
> td_critnest=3D0 and td_owepreempt=3D1 in interrupt context. =C2=A0If you =
use a spinlock
> on such a thread, it will do the preemption at the point where you drop t=
he
> spinlock, this is bad in some circumstances. =C2=A0One example is the smp=
_rendevous
> case we are discussing here.

This circumstances are further protected by another call to
critical_enter(), by consumers or however upper layer calls.
This is why, for example, spinlock_enter() does call critical_enter()
even if it actually disables interrupts or why we disable preemption
in other cases where the interrupts are already disabled.

Attilio


--=20
Peace can only be achieved by understanding - A. Einstein



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?BANLkTi=QCZC%2BTuWjA9EzaP%2BBcn87cKW0Rg>