From owner-freebsd-current@FreeBSD.ORG Fri May 13 17:13:29 2011 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A9325106566B; Fri, 13 May 2011 17:13:29 +0000 (UTC) (envelope-from max@love2party.net) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.126.186]) by mx1.freebsd.org (Postfix) with ESMTP id 530A58FC0C; Fri, 13 May 2011 17:13:29 +0000 (UTC) Received: from maxlap.localnet ([192.75.139.253]) by mrelayeu.kundenserver.de (node=mreu3) with ESMTP (Nemesis) id 0LvdiS-1Pe5jW3MeC-017YWK; Fri, 13 May 2011 19:13:27 +0200 From: Max Laier To: freebsd-current@freebsd.org Date: Fri, 13 May 2011 13:13:11 -0400 User-Agent: KMail/1.13.6 (FreeBSD/9.0-CURRENT; KDE/4.6.1; amd64; ; ) References: <4DCD357D.6000109@FreeBSD.org> <4DCD4E21.7020800@FreeBSD.org> <201105131150.57548.max@love2party.net> In-Reply-To: <201105131150.57548.max@love2party.net> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201105131313.11677.max@love2party.net> X-Provags-ID: V02:K0:r50QccgkMB8WjpPtOG8pJYxHBO5+Q0k58RQhC+y7wFN jieoGg5pRLpgQYcX0p6bRZtbwzsyr2KvCtI42hjg7ZGUfhT1XH V2mRaMdVdExGT7jp6eJEzjEqWaDnjruH/UuckoPDyfDU/2eoPP /Ldjk+Vz3INos5eo7oSm4B4Yk/57Pucq5F82njFv9P5GLUdXcP yyknvVeUyBZykdPFT0Abg== Cc: Andriy Gapon Subject: Re: proposed smp_rendezvous change X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 May 2011 17:13:29 -0000 On Friday 13 May 2011 11:50:57 Max Laier wrote: > On Friday 13 May 2011 11:28:33 Andriy Gapon wrote: > > on 13/05/2011 17:41 Max Laier said the following: > > > this ncpus isn't the one you are looking for. > > > > Thank you! > > > Here's an updated patch: > Can you attach the patch, so I can apply it locally. This code is really > hard to read without context. Some more comments inline ... > > > Index: sys/kern/subr_smp.c > > =================================================================== > > --- sys/kern/subr_smp.c (revision 221835) > > +++ sys/kern/subr_smp.c (working copy) > > @@ -316,19 +316,14 @@ > > > > void (*local_action_func)(void*) = smp_rv_action_func; > > void (*local_teardown_func)(void*) = smp_rv_teardown_func; > > > > - /* Ensure we have up-to-date values. */ > > - atomic_add_acq_int(&smp_rv_waiters[0], 1); > > - while (smp_rv_waiters[0] < smp_rv_ncpus) > > - cpu_spinwait(); > > - > > You really need this for architectures that need the memory barrier to > ensure consistency. We also need to move the reads of smp_rv_* below this > point to provide a consistent view. > > > /* setup function */ > > if (local_setup_func != smp_no_rendevous_barrier) { > > > > if (smp_rv_setup_func != NULL) > > > > smp_rv_setup_func(smp_rv_func_arg); > > > > /* spin on entry rendezvous */ > > > > - atomic_add_int(&smp_rv_waiters[1], 1); > > - while (smp_rv_waiters[1] < smp_rv_ncpus) > > + atomic_add_int(&smp_rv_waiters[0], 1); > > + while (smp_rv_waiters[0] < smp_rv_ncpus) > > > > cpu_spinwait(); > > > > } > > > > @@ -337,12 +332,16 @@ > > > > local_action_func(local_func_arg); > > > > /* spin on exit rendezvous */ > > > > - atomic_add_int(&smp_rv_waiters[2], 1); > > - if (local_teardown_func == smp_no_rendevous_barrier) > > + atomic_add_int(&smp_rv_waiters[1], 1); > > + if (local_teardown_func == smp_no_rendevous_barrier) { > > + atomic_add_int(&smp_rv_waiters[2], 1); > > > > return; > > > > - while (smp_rv_waiters[2] < smp_rv_ncpus) > > + } > > + while (smp_rv_waiters[1] < smp_rv_ncpus) > > > > cpu_spinwait(); > > > > + atomic_add_int(&smp_rv_waiters[2], 1); > > + > > > > /* teardown function */ > > if (local_teardown_func != NULL) > > > > local_teardown_func(local_func_arg); > > > > @@ -377,6 +376,10 @@ > > > > /* obtain rendezvous lock */ > > mtx_lock_spin(&smp_ipi_mtx); > > > > + /* Wait for any previous unwaited rendezvous to finish. */ > > + while (atomic_load_acq_int(&smp_rv_waiters[2]) < smp_rv_ncpus) > > + cpu_spinwait(); > > + Disregard this ... I misread the diff. You are indeed using [2] correctly as the "all-clear" semaphore. I still believe, that it is safer/cleaner to do this spin before releasing the lock instead (see my patch). > This does not help you at all. Imagine the following (unlikely, but not > impossible) case: > > CPUA: start rendevouz including self, finish the action first (i.e. CPUA is > the first one to see smp_rv_waiters[2] == smp_rv_ncpus, drop the lock and > start a new rendevouz. smp_rv_waiters[2] == smp_rv_ncpus is still true on > that CPU, but ... > > CPUB might have increased smp_rv_waiters[2] for the first rendevouz, but > never saw smp_rv_waiters[2] == smp_rv_ncpus, still ... > > CPUA is allowed to start a new rendevouz which will leave CPUB stranded and > can lead to a deadlock. > > I think this is also possible with another CPU starting the second > rendevous. > > > /* set static function pointers */ > > smp_rv_ncpus = ncpus; > > smp_rv_setup_func = setup_func; > > > > @@ -395,7 +398,7 @@ > > > > smp_rendezvous_action(); > > > > if (teardown_func == smp_no_rendevous_barrier) > > > > - while (atomic_load_acq_int(&smp_rv_waiters[2]) < ncpus) > > + while (atomic_load_acq_int(&smp_rv_waiters[1]) < ncpus) > > > > cpu_spinwait(); > > > > /* release lock */ > > > > _______________________________________________ > > freebsd-current@freebsd.org mailing list > > http://lists.freebsd.org/mailman/listinfo/freebsd-current > > To unsubscribe, send any mail to > > "freebsd-current-unsubscribe@freebsd.org" > > _______________________________________________ > freebsd-current@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-current > To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"