From owner-freebsd-current@FreeBSD.ORG Sun May 15 16:11:18 2011
From: Max Laier <max@love2party.net>
To: John Baldwin
Cc: FreeBSD current, Peter Grehan, Andriy Gapon, neel@freebsd.org
Date: Sun, 15 May 2011 12:09:13 -0400
Subject: Re: proposed smp_rendezvous change
Message-Id: <201105151209.13846.max@love2party.net>
In-Reply-To: <4DCFEE33.5090808@FreeBSD.org>
References: <4DCD357D.6000109@FreeBSD.org> <4DCFE8FA.6080005@FreeBSD.org>
 <4DCFEE33.5090808@FreeBSD.org>

On Sunday 15 May 2011 11:16:03 John Baldwin wrote:
> On 5/15/11 10:53 AM, Andriy Gapon wrote:
> > on 15/05/2011 10:12 Andriy Gapon said the following:
> >> on 14/05/2011 18:25 John Baldwin said the following:
> >>> Hmmm, so this is not actually sufficient.  NetApp ran into a very
> >>> similar race with virtual CPUs in BHyVe.  In their case, because
> >>> virtual CPUs are threads that can be preempted, they have a chance
> >>> at a longer race.
> >>>
> >>> The problem that they see is that even though the values have been
> >>> updated, the next CPU to start a rendezvous can clear
> >>> smp_rv_waiters[2] to zero before one of the other CPUs notices that
> >>> it has finished.
> >>
> >> As a follow-up to my previous question: have you noticed that in my
> >> patch no slave CPU actually waits/spins on smp_rv_waiters[2]?  It's
> >> always only the master CPU (and under smp_ipi_mtx).
> >
> > Here's a cleaner version of my approach to the fix.
> > This one does not remove the initial wait on smp_rv_waiters[0] in
> > smp_rendezvous_action() and thus does not renumber all the
> > smp_rv_waiters[] members, so it should hopefully be clearer.
> >
> > Index: sys/kern/subr_smp.c
> > ===================================================================
> > --- sys/kern/subr_smp.c	(revision 221943)
> > +++ sys/kern/subr_smp.c	(working copy)
> > @@ -110,7 +110,7 @@ static void (*volatile smp_rv_setup_func)(void *ar
> >  static void (*volatile smp_rv_action_func)(void *arg);
> >  static void (*volatile smp_rv_teardown_func)(void *arg);
> >  static void *volatile smp_rv_func_arg;
> > -static volatile int smp_rv_waiters[3];
> > +static volatile int smp_rv_waiters[4];
> >
> >  /*
> >   * Shared mutex to restrict busywaits between smp_rendezvous() and
> > @@ -338,11 +338,15 @@ smp_rendezvous_action(void)
> >
> >  	/* spin on exit rendezvous */
> >  	atomic_add_int(&smp_rv_waiters[2], 1);
> > -	if (local_teardown_func == smp_no_rendevous_barrier)
> > +	if (local_teardown_func == smp_no_rendevous_barrier) {
> > +		atomic_add_int(&smp_rv_waiters[3], 1);
> >  		return;
> > +	}
> >  	while (smp_rv_waiters[2] < smp_rv_ncpus)
> >  		cpu_spinwait();
> > +	atomic_add_int(&smp_rv_waiters[3], 1);
> > +
> >  	/* teardown function */
> >  	if (local_teardown_func != NULL)
> >  		local_teardown_func(local_func_arg);
> > @@ -377,6 +381,9 @@ smp_rendezvous_cpus(cpumask_t map,
> >
> >  	/* obtain rendezvous lock */
> >  	mtx_lock_spin(&smp_ipi_mtx);
> >
> > +	while (smp_rv_waiters[3] < smp_rv_ncpus)
> > +		cpu_spinwait();
> > +
> >  	/* set static function pointers */
> >  	smp_rv_ncpus = ncpus;
> >  	smp_rv_setup_func = setup_func;
> > @@ -385,6 +392,7 @@ smp_rendezvous_cpus(cpumask_t map,
> >
> >  	smp_rv_func_arg = arg;
> >  	smp_rv_waiters[1] = 0;
> >  	smp_rv_waiters[2] = 0;
> > +	smp_rv_waiters[3] = 0;
> >  	atomic_store_rel_int(&smp_rv_waiters[0], 0);
> >
> >  	/* signal other processors, which will enter the IPI with
> >  	   interrupts off */
>
> Ahh, so the bump is after the change.  I do think this will still be ok
> and I probably just didn't explain it well to Neel.  I wonder though
> if the bump shouldn't happen until after the call of the local teardown
> function?

I don't think we ever intended to synchronize the local teardown part,
and I believe that is the correct behavior for this API.

This version is sufficiently close to what I have, so I am reasonably
sure that it will work for us.  It seems, however, that if we move the
check to after picking up the lock anyway, the generation approach has
even less impact, and I am starting to prefer that solution.

Andriy, is there any reason why you'd prefer your approach over the
generation version?

Thanks,
  Max
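
P.S.: For reference, here is a rough sketch of what I mean by the
generation approach.  This is not an actual patch; the counter name
smp_rv_generation and its exact placement are only illustrative, while
the other identifiers are the ones from the code quoted above:

	static volatile int smp_rv_generation;

	void
	smp_rendezvous_action(void)
	{
		int generation;
		...
		generation = smp_rv_generation;	/* snapshot on entry */
		...
		/* spin on exit rendezvous */
		atomic_add_int(&smp_rv_waiters[2], 1);
		if (local_teardown_func == smp_no_rendevous_barrier)
			return;
		/*
		 * Stop spinning once all CPUs have checked in, or once
		 * the master has started the next rendezvous.  In the
		 * latter case every CPU we were waiting for has already
		 * passed the increment above, so leaving the loop is
		 * safe even though smp_rv_waiters[2] was reset.
		 */
		while (smp_rv_waiters[2] < smp_rv_ncpus &&
		    generation == smp_rv_generation)
			cpu_spinwait();

		/* teardown function */
		if (local_teardown_func != NULL)
			local_teardown_func(local_func_arg);
	}

	/* in smp_rendezvous_cpus(), after mtx_lock_spin(&smp_ipi_mtx): */
	atomic_add_acq_int(&smp_rv_generation, 1);

With this, the master does not have to spin on a fourth counter before
it reuses the smp_rv_* variables, which is why I think the impact is
smaller.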