Date: Tue, 21 Aug 2007 21:19:02 +0200
From: Pawel Jakub Dawidek <pjd@FreeBSD.org>
To: John Baldwin <jhb@freebsd.org>
Cc: Alfred Perlstein <alfred@freebsd.org>, freebsd-arch@freebsd.org
Subject: Re: Lockless uidinfo.
Message-ID: <20070821191902.GA4187@garage.freebsd.pl>
In-Reply-To: <200708211403.29293.jhb@freebsd.org>
References: <20070818120056.GA6498@garage.freebsd.pl> <20070818155041.GY90381@elvis.mu.org> <20070818161449.GE6498@garage.freebsd.pl> <200708211403.29293.jhb@freebsd.org>
On Tue, Aug 21, 2007 at 02:03:28PM -0400, John Baldwin wrote:
> On Saturday 18 August 2007 12:14:49 pm Pawel Jakub Dawidek wrote:
> > On Sat, Aug 18, 2007 at 08:50:41AM -0700, Alfred Perlstein wrote:
> > > * Pawel Jakub Dawidek <pjd@FreeBSD.org> [070818 07:59] wrote:
> > > > Yes, to look up uidinfo you need to hold the uihashtbl_mtx mutex, so once
> > > > you hold it and ui_ref is 0, no one will be able to reference it, because
> > > > it has to wait to look it up.
> > >
> > > And the field doesn't need to be volatile to prevent cached/opportunistic
> > > reads?
> >
> > The only chance of something like this will be the scenario below:
> >
> > thread1 (uifind)                     thread2 (uifree)
> > ----------------                     ----------------
> >                                      refcount_release(&uip->ui_ref))
> >                                      /* ui_ref == 0 */
> > mtx_lock(&uihashtbl_mtx);
> > refcount_acquire(&uip->ui_ref);
> > /* ui_ref == 1 */
> > mtx_unlock(&uihashtbl_mtx);
> >                                      mtx_lock(&uihashtbl_mtx);
> >                                      if (uip->ui_ref > 0) {
> >                                              mtx_unlock(&uihashtbl_mtx);
> >                                              return;
> >                                      }
> >
> > Now, you suggest that ui_ref in 'if (uip->ui_ref > 0)' may still have
> > cached 0? I don't think it is possible: first, refcount_acquire() uses
> > read memory barriers (but we may still need ui_ref to be volatile for this
> > to make any difference), and second, think of ui_ref as a field protected
> > by the uihashtbl_mtx mutex in this very case.
> >
> > Is my thinking correct?
>
> Memory barriers on another CPU don't mean anything about the CPU thread 2 is
> on. Memory barriers do not flush caches on other CPUs, etc. Normally when
> objects are refcounted in a table, the table holds a reference on the object,
> but that doesn't seem to be the case here. [...]
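In the timeline above, ui_ref reaches zero in uifree() before uihashtbl_mtx is taken, so a concurrent uifind() can still find the object and revive it, which is why the later re-check matters. A variant that never lets the count reach zero outside the lock, yet still costs a single atomic operation when the reference being dropped is not the last one, can be sketched in user space with C11 atomics and a POSIX mutex. Everything here is hypothetical: struct uidinfo_sketch, sketch_uifind(), sketch_uifree(), table_mtx (standing in for uihashtbl_mtx) and the single-bucket list are illustrative names, and the kernel's refcount(9), atomic(9) and mtx(9) primitives are replaced by user-space counterparts; this is not the patch under discussion.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for struct uidinfo. */
struct uidinfo_sketch {
	struct uidinfo_sketch	*next;	/* single-bucket "hash" chain */
	unsigned int		 uid;
	atomic_uint		 ref;	/* plays the role of ui_ref */
};

/* Hypothetical stand-in for uihashtbl_mtx and the hash table. */
static pthread_mutex_t table_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct uidinfo_sketch *table_head;

/* Lookup-or-create; new references are only handed out under table_mtx. */
static struct uidinfo_sketch *
sketch_uifind(unsigned int uid)
{
	struct uidinfo_sketch *uip;

	pthread_mutex_lock(&table_mtx);
	for (uip = table_head; uip != NULL; uip = uip->next)
		if (uip->uid == uid)
			break;
	if (uip == NULL) {
		uip = calloc(1, sizeof(*uip));
		if (uip == NULL) {
			pthread_mutex_unlock(&table_mtx);
			return NULL;
		}
		uip->uid = uid;
		atomic_init(&uip->ref, 0);
		uip->next = table_head;
		table_head = uip;
	}
	atomic_fetch_add_explicit(&uip->ref, 1, memory_order_relaxed);
	pthread_mutex_unlock(&table_mtx);
	return uip;
}

/* Unlink uip from the chain; the caller holds table_mtx. */
static void
sketch_unlink(struct uidinfo_sketch *uip)
{
	struct uidinfo_sketch **pp;

	for (pp = &table_head; *pp != uip; pp = &(*pp)->next)
		;
	*pp = uip->next;
}

/* Drop one reference; returns true if this call freed the object. */
static bool
sketch_uifree(struct uidinfo_sketch *uip)
{
	unsigned int old;

	/* Common case: not the last reference, so one CAS and no lock. */
	old = atomic_load_explicit(&uip->ref, memory_order_relaxed);
	while (old > 1) {
		if (atomic_compare_exchange_weak_explicit(&uip->ref, &old,
		    old - 1, memory_order_release, memory_order_relaxed))
			return false;
		/* A failed CAS reloaded old; retry only while it is > 1. */
	}

	/* Possibly the last reference: drop it with the table lock held. */
	pthread_mutex_lock(&table_mtx);
	if (atomic_fetch_sub_explicit(&uip->ref, 1,
	    memory_order_acq_rel) == 1) {
		sketch_unlink(uip);
		pthread_mutex_unlock(&table_mtx);
		free(uip);
		return true;
	}
	/* A lookup took a new reference before we got the lock. */
	pthread_mutex_unlock(&table_mtx);
	return false;
}
```

The lock-free path only ever moves the count from n to n - 1 with n > 1; a caller that might hold the last reference falls through to the locked path, where the decrement, the test for zero and the unlinking all happen while lookups are excluded.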
But the memory barrier from 'mtx_lock(&uihashtbl_mtx)' above 'if (uip->ui_ref > 0)' would do the trick, and I can safely avoid using an atomic read in this if statement, right?

> [...] Have you tried doing something
> very simple in uifree():
>
> {
>         mtx_lock(&uihashtbl_mtx);
>         if (refcount_release(...)) {
>                 LIST_REMOVE();
>                 mtx_unlock(&uihashtbl_mtx);
>                 ...
>                 free();
>         } else
>                 mtx_unlock(&uihashtbl_mtx);
> }
>
> I wouldn't use a more complex algo in uifree() unless the simple one is shown
> to perform badly. Needless complexity is a hindrance to future maintenance.

Of course we could do that, but I was trying really hard to remove contention in the common case. Previously we used UIDINFO_LOCK() in the common case; now you are suggesting a global lock here, and I'd really, really prefer to use only one atomic.

> Also, even if you do go with the more complex route, I'd rather you reduce
> diffs with the current code by keeping the test as 'uip->ui_ref == 0' and
> keeping the removal code in the if-block.

Will do.

> In chgproccnt() you should use atomic_fetchadd_long() to avoid a race when
> reading ui_proccnt.
>
>
> old = atomic_fetchadd_long(&uip->ui_proccnt, diff);
> if (old + diff < 0)
>         printf("....");

I'm aware of this race, but I don't consider closing it that important. We won't generate a false positive here. My vote is to leave it as it is, because atomic_fetchadd_long() is slower than atomic_add_long() on some archs, i.e. it is implemented there using an atomic_cmpset_long() loop. I checked by running 8 processes on an 8-way machine with older code that used an atomic_cmpset_long() loop in the 'diff > 0' case: there is almost one extra loop iteration on every call, which makes it about 6% slower.

> OTOH, atomic_fetchadd_long() doesn't yet exist, so you will need to fix that,
> or just always use an atomic_cmpset() loop.

I already implemented those.
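Pawel's cost argument rests on atomic_fetchadd_long() being a compare-and-swap loop on some architectures. The user-space C11 sketch below builds fetch-and-add that way and then uses the returned old value for the chgproccnt()-style sanity check John proposes, so the sum being tested is exactly the value this call produced rather than a second, racy read of the counter. sketch_fetchadd_long(), sketch_chgproccnt(), proccnt (standing in for uip->ui_proccnt) and the retries counter are hypothetical names, not FreeBSD's atomic(9) implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/*
 * Hypothetical fetch-and-add built from compare-and-swap, the way an
 * architecture without a native instruction might provide it.  Returns
 * the value the counter held before this call's addition.
 */
static long
sketch_fetchadd_long(atomic_long *p, long v, unsigned int *retries)
{
	long old = atomic_load_explicit(p, memory_order_relaxed);

	/*
	 * A failed CAS (another CPU changed *p, or a spurious failure of
	 * the weak form) reloads old and costs one more trip round the loop.
	 */
	while (!atomic_compare_exchange_weak_explicit(p, &old, old + v,
	    memory_order_relaxed, memory_order_relaxed))
		(*retries)++;
	return old;
}

/* Hypothetical stand-in for uip->ui_proccnt. */
static atomic_long proccnt;

/* chgproccnt()-style update; returns -1 if the count went negative. */
static int
sketch_chgproccnt(long diff, unsigned int *retries)
{
	long old = sketch_fetchadd_long(&proccnt, diff, retries);

	/* old + diff is the value this call produced: no second read. */
	if (old + diff < 0) {
		printf("negative proccnt %ld\n", old + diff);
		return -1;
	}
	return 0;
}
```

Under contention each failed compare-and-swap adds a loop iteration, which is the kind of per-call overhead behind the roughly 6% slowdown Pawel reports for the older cmpset-loop code; a plain atomic add that does not return the old value avoids the loop on such architectures but gives up the race-free check.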
-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!