Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 22 May 2017 21:11:27 +0200
From:      Jens Stimpfle <jfs@jstimpfle.de>
To:        freebsd-hackers@freebsd.org
Subject:   Re: Improved Red-black tree implementation
Message-ID:  <20170522191127.GB27949@jstimpfle.de>
In-Reply-To: <59227456.9020805@embedded-brains.de>
References:  <20170514050220.GA768@jstimpfle.de> <591EA74A.3080500@embedded-brains.de> <20170520165303.GA22452@jstimpfle.de> <59227456.9020805@embedded-brains.de>

next in thread | previous in thread | raw e-mail | index | archive | help
On Mon, May 22, 2017 at 07:17:10AM +0200, Sebastian Huber wrote:
> >Considering memory savings, have you considered 2-pointer trees (no
> >parent pointer)? They need a temporary search stack for insertion and
> >deletion, but not for iteration ("threaded trees").
>
> The RB and Eternally Confuzzled variants don't use a parent pointer as far
> as I remember.

I don't know "RB", but a few more words since both do not do too well in
your bench, performance-wise. I know that Eternally Confuzzled uses a
possibly easier to implement ("top-down") algorithm which is expected to
be slower. A 2-ptr tree with a conventional implementation need not be
much slower than a 3-ptr, or could in fact be faster due to memory
savings. Here are measurements including an unpolished 2-ptr tree.

Amd X2-250 (x86-64):
====================

  SIL 3-pointer Red-black tree: 0.699s 0.646s 0.635s
  SIL 2-pointer Red-black tree: 0.856s 0.856s 0.856s
  STL std::set:                 0.750s 0.751s 0.750s
  BSD RB tree:                  0.804s 0.832s 0.828s

Raspberry Pi 2 (ARM ???)
========================

  SIL 3-pointer Red-black tree: 3.844s 3.841s 3.839s
  SIL 2-pointer Red-black tree: 4.002s 4.231s 3.954s
  STL std::set:                 3.742s 3.786s 3.776s
  BSD RB tree:                  4.112s 4.107s 4.105s

That particular benchmark generates a random permutation of [0,2^16) in
a flat array of C ints. It adds 2^15 random elements from that array to
an empty tree and then makes another 2^15 operations, randomly chosen to
be add, delete, or find (in 128-bursts) on random elements from the
array.

The variance in the measurements accounts for the randomly chosen array
elements. There is no variance if I chose neighbouring elements or when
the bench data is small enough to fit in the L1 cache (it's also about
100x faster).

The measurements do not account for the performance disadvantage of
2-ptr that is having to "find", i.e. create a search stack, an
(intrusively linked) element that you already hold in your hands, if you
want to delete it or use it as a hint to add a similar element. This is
to my knowledge the only case where maybe 2x slowdown can be expected.

Jens



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20170522191127.GB27949>