From: John Baldwin <jhb@freebsd.org>
To: Max Laier
Cc: neel@freebsd.org, Andriy Gapon, Attilio Rao, FreeBSD current, Stephan Uphoff, Peter Grehan
Date: Thu, 19 May 2011 08:07:34 -0400
Subject: Re: proposed smp_rendezvous change

On Wednesday, May 18, 2011 4:27:50 pm Max Laier wrote:
> On 05/17/2011 01:35 PM, John Baldwin wrote:
> ...
> > Yeah, I already have a patch to do that, but hadn't added atomic ops to
> > critical_enter() and critical_exit().  But it also wasn't as fancy in the
> > critical_exit() case.  Here is what I have and I think it might actually
> > be ok (it doesn't use an atomic read and clear, but I think it is safe).
> > Hmm, actually, it will need to use the read and clear:
>
> Looks good to me.  I was slightly surprised by this:
>
> > Index: kern/kern_synch.c
> > ===================================================================
> > --- kern/kern_synch.c	(revision 222024)
> > +++ kern/kern_synch.c	(working copy)
> > @@ -400,9 +400,7 @@
> >  	if (!TD_ON_LOCK(td) && !TD_IS_RUNNING(td))
> >  		mtx_assert(&Giant, MA_NOTOWNED);
> >  #endif
> > -	KASSERT(td->td_critnest == 1 || (td->td_critnest == 2 &&
> > -	    (td->td_owepreempt) && (flags & SW_INVOL) != 0 &&
> > -	    newtd == NULL) || panicstr,
> > +	KASSERT(td->td_critnest == 1 || panicstr,
> >  	    ("mi_switch: switch in a critical section"));
> >  	KASSERT((flags & (SW_INVOL | SW_VOL)) != 0,
> >  	    ("mi_switch: switch must be voluntary or involuntary"));
>
> part of the patch.  But that is in fact correct and much more expressive
> and safe than the version we had before.

Actually, this is something that was made obsolete by ups's change to
critical_exit() several years ago.

However, after sleeping on this last night, I think that critical_enter()
and critical_exit() are a very hot path and that it may be worth adding
some additional complexity to keep them as cheap as possible.
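To recap the mechanism in question, the deferred-preemption dance looks
roughly like this (a simplified sketch of the idea only, not the literal
kern_switch.c code; the scheduler and KTR details are elided):

void
critical_enter(void)
{
	struct thread *td;

	td = curthread;
	td->td_critnest++;	/* cheap on purpose: per-thread counter, no atomics */
}

void
critical_exit(void)
{
	struct thread *td;

	td = curthread;
	if (td->td_critnest == 1) {
		td->td_critnest = 0;
		/*
		 * The window: an interrupt (e.g. a rendezvous IPI) taken
		 * here runs with td_critnest == 0 while td_owepreempt may
		 * still be set.
		 */
		if (td->td_owepreempt) {
			/* Re-enter a critical section and perform the
			 * deferred preemption. */
			td->td_critnest = 1;
			thread_lock(td);
			td->td_critnest--;
			mi_switch(SW_INVOL | SW_PREEMPT, NULL);
			thread_unlock(td);
		}
	} else
		td->td_critnest--;
}

Any critical_exit() performed by an interrupt handler that lands in that
window can fire the deferred preemption at a point where we must not
switch, which is exactly the race below.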
Also, your original patch for rm locks is actually correct.  They will not
miss a preemption because rm_cleanIPI will not schedule a thread to run
while it holds the spin lock.  However, I'd like to be a bit more
future-proof so that other IPI handler authors don't have to deal with
this obscure race.  Given that, I've generalized your fix and moved it up
into the SMP rendezvous code itself, along with a big comment explaining
why it works and a KASSERT().  I think this is MFC'able back to 7 as well
(a rough sketch of the kind of IPI handler the change protects is at the
end of this mail):

Index: kern/subr_smp.c
===================================================================
--- kern/subr_smp.c	(revision 222032)
+++ kern/subr_smp.c	(working copy)
@@ -316,6 +316,9 @@
 	void (*local_action_func)(void*);
 	void (*local_teardown_func)(void*);
 	int generation;
+#ifdef INVARIANTS
+	int owepreempt;
+#endif
 
 	/* Ensure we have up-to-date values. */
 	atomic_add_acq_int(&smp_rv_waiters[0], 1);
@@ -330,6 +333,33 @@
 	generation = smp_rv_generation;
 
 	/*
+	 * Use a nested critical section to prevent any preemptions
+	 * from occurring during a rendezvous action routine.
+	 * Specifically, if a rendezvous handler is invoked via an IPI
+	 * and the interrupted thread was in the critical_exit()
+	 * function after setting td_critnest to 0 but before
+	 * performing a deferred preemption, this routine can be
+	 * invoked with td_critnest set to 0 and td_owepreempt true.
+	 * In that case, a critical_exit() during the rendezvous
+	 * action would trigger a preemption which is not permitted in
+	 * a rendezvous action.  To fix this, wrap all of the
+	 * rendezvous action handlers in a critical section.  We
+	 * cannot use a regular critical section however as having
+	 * critical_exit() preempt from this routine would also be
+	 * problematic (the preemption must not occur before the IPI
+	 * has been acknowledged via an EOI).  Instead, we
+	 * intentionally ignore td_owepreempt when leaving the
+	 * critical section.  This should be harmless because we do not
+	 * permit rendezvous action routines to schedule threads, and
+	 * thus td_owepreempt should never transition from 0 to 1
+	 * during this routine.
+	 */
+	td->td_critnest++;
+#ifdef INVARIANTS
+	owepreempt = td->td_owepreempt;
+#endif
+
+	/*
 	 * If requested, run a setup function before the main action
 	 * function.  Ensure all CPUs have completed the setup
 	 * function before moving on to the action function.
@@ -362,14 +392,18 @@
 	 */
 	MPASS(generation == smp_rv_generation);
 	atomic_add_int(&smp_rv_waiters[2], 1);
-	if (local_teardown_func == smp_no_rendevous_barrier)
-		return;
-	while (smp_rv_waiters[2] < smp_rv_ncpus &&
-	    generation == smp_rv_generation)
-		cpu_spinwait();
+	if (local_teardown_func != smp_no_rendevous_barrier) {
+		while (smp_rv_waiters[2] < smp_rv_ncpus &&
+		    generation == smp_rv_generation)
+			cpu_spinwait();
 
-	if (local_teardown_func != NULL)
-		local_teardown_func(local_func_arg);
+		if (local_teardown_func != NULL)
+			local_teardown_func(local_func_arg);
+	}
+
+	td->td_critnest--;
+	KASSERT(owepreempt == td->td_owepreempt,
+	    ("rendezvous action changed td_owepreempt"));
 }
 
 void

Index: kern/kern_synch.c
===================================================================
--- kern/kern_synch.c	(revision 222024)
+++ kern/kern_synch.c	(working copy)
@@ -400,9 +400,7 @@
 	if (!TD_ON_LOCK(td) && !TD_IS_RUNNING(td))
 		mtx_assert(&Giant, MA_NOTOWNED);
 #endif
-	KASSERT(td->td_critnest == 1 || (td->td_critnest == 2 &&
-	    (td->td_owepreempt) && (flags & SW_INVOL) != 0 &&
-	    newtd == NULL) || panicstr,
+	KASSERT(td->td_critnest == 1 || panicstr,
 	    ("mi_switch: switch in a critical section"));
 	KASSERT((flags & (SW_INVOL | SW_VOL)) != 0,
 	    ("mi_switch: switch must be voluntary or involuntary"));

-- 
John Baldwin
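P.S.  For other IPI handler authors, the case the new wrapper protects
against looks roughly like the following.  This is a made-up example
(example_mtx, example_counter and example_update_all_cpus are hypothetical
names); rm_cleanIPI in kern_rmlock.c is the real instance of the pattern.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/smp.h>

static struct mtx example_mtx;		/* hypothetical MTX_SPIN mutex */
static int example_counter;

/*
 * Rendezvous action routine.  mtx_lock_spin()/mtx_unlock_spin() do a
 * critical_enter()/critical_exit() pair under the hood, so without the
 * nested critical section in smp_rendezvous_action() the unlock could
 * preempt if the IPI interrupted a thread inside the critical_exit()
 * window shown earlier.
 */
static void
example_rendezvous_action(void *arg)
{

	mtx_lock_spin(&example_mtx);
	example_counter++;
	mtx_unlock_spin(&example_mtx);
}

static void
example_update_all_cpus(void)
{

	/* Run the action on every CPU; no setup or teardown barrier. */
	smp_rendezvous(smp_no_rendevous_barrier, example_rendezvous_action,
	    smp_no_rendevous_barrier, NULL);
}

With the change above, the action routine always runs with td_critnest
bumped, so the critical_exit() inside mtx_unlock_spin() just decrements
the count and never preempts from inside the rendezvous.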