Date:      Sun, 19 Aug 2007 09:57:50 +0200
From:      Pawel Jakub Dawidek <pjd@FreeBSD.org>
To:        Jeff Roberson <jroberson@chesapeake.net>
Cc:        freebsd-arch@freebsd.org
Subject:   Re: Lockless uidinfo.
Message-ID:  <20070819075750.GB11792@garage.freebsd.pl>
In-Reply-To: <20070818163503.T568@10.0.0.1>
References:  <20070818120056.GA6498@garage.freebsd.pl> <20070818220756.GH6498@garage.freebsd.pl> <20070818230917.GI6498@garage.freebsd.pl> <20070818163503.T568@10.0.0.1>

On Sat, Aug 18, 2007 at 04:35:42PM -0700, Jeff Roberson wrote:
> On Sun, 19 Aug 2007, Pawel Jakub Dawidek wrote:
> >Ok, after implementing atomic_fetchadd_long() on amd64, we get additional
> >6% of performance improvement:
> >
> >x ./uidinfo_lockfree.txt (atomic_cmpset_long loop)
> >+ ./uidinfo_waitfree.txt (atomic_fetchadd_long)
> >+------------------------------------------------------------------------------+
> >|                                                                             +|
> >|                                                                             +|
> >|x   xx    xx                                                              + ++|
> >|  |__MA___|                                                                |AM|
> >+------------------------------------------------------------------------------+
> >   N           Min           Max        Median           Avg        Stddev
> >x   5       1561566       1575987       1568964       1569767     5853.1399
> >+   5       1662362       1665936       1665810     1664881.8     1541.2693
> >Difference at 95.0% confidence
> >       95114.8 +/- 6241.96
> >       6.05917% +/- 0.397636%
> >       (Student's t, pooled s = 4279.88)
>
> How does this affect the single-threaded performance?  Do you attribute
> this to atomic fetchadd being cheaper than atomic cmpset?  What is your
> processor?

CPU: Intel(R) Xeon(R) CPU           E5310  @ 1.60GHz (1597.65-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x6f7  Stepping = 7
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x4e33d<SSE3,RSVD2,MON,DS_CPL,VMX,TM2,SSSE3,CX16,xTPR,PDCM,DCA>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  Cores per package: 4

Ok, I changed the code to something like this:

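	long old;
	/* diff (requested change) and max (limit) are computed in code omitted here. */
	int diff, loops;

	atomic_add_int(&uidinfo_cnt1, 1);	/* count every call */
	if (diff > 0) {
		/* Growing: lock-free compare-and-set loop that enforces max. */
		loops = 0;
		do {
			loops++;
			old = uip->ui_sbsize;
			if (old + diff > max)
				return (0);
		} while (atomic_cmpset_long(&uip->ui_sbsize, old, old + diff) == 0);
		/* Contended call: add its total number of attempts. */
		if (loops > 1)
			atomic_add_int(&uidinfo_cnt2, loops);
	} else {
		/* Shrinking cannot exceed the limit: one atomic add suffices. */
		atomic_add_long(&uip->ui_sbsize, (long)diff);
	}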

This lets me see how many additional loop iterations I do. With the
lock-free version we can still have contention and have to loop, which is
why the wait-free version is superior.

Actually, I was a bit surprised by the results:

debug.uidinfo.cnt1: 88746008
debug.uidinfo.cnt2: 31296304

(Running 8 processes.)

This means that, because of contention, we do 31296304 additional atomic
operations, about 35% more than the 88746008 calls.

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd@FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
