From owner-freebsd-current@FreeBSD.ORG  Fri Oct  1 09:20:42 2010
Date: Fri, 01 Oct 2010 11:21:03 +0200
From: Andre Oppermann <andre@freebsd.org>
To: freebsd-current@freebsd.org
Message-ID: <4CA5A7FF.5030602@freebsd.org>
In-Reply-To: <201010010449.o914nmVt024965@apollo.backplane.com>
Subject: Re: Examining the VM splay tree effectiveness

On 01.10.2010 06:49, Matthew Dillon wrote:
> I don't remember the reference but I read a comprehensive comparison
> between various indexing methods about a year ago and the splay tree
> did considerably better than a RB-tree.  The RB-tree actually did
> fairly poorly.

It depends heavily on the access pattern of the data structure.  Is it
lookup, modify or insert/delete dominated, or a mix of them?  How heavily
is the data structure shared across CPUs?  Without this information it is
impossible to make a qualified choice.

Making general comparative statements about indexing methods without
taking the access pattern and SMP/CPU cache behavior into account leads
to the wrong approach 90% of the time.  (I made that number up. ;-)

> Any binary tree-like structure makes fairly poor use of cpu caches.
> Splay trees work somewhat better as long as there is some locality
> of reference in the lookups, since the node being looked up is moved
> to the root.  It isn't a bad trade-off.

Again, it depends hugely on how good the locality is and how expensive
the CPU cache line dirtying of the splay rotations is.  You can quickly
fall off an amortization *cliff* here.

I agree that binary tree structures are a bit less optimal for CPU
caches because the tree node is embedded with the data element.  On the
plus side, not many other data structures do much better there.  And as
long as memory is only read it can be cached on multiple CPUs; modifying
it invalidates it everywhere else and causes a high-latency memory
access on the next read.
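To make the cache line argument concrete, here is a stripped-down sketch
of the rotation a splay step is built from (names made up for
illustration, not taken from sys/vm or any real tree code):

  /* Minimal illustrative sketch only, not real kernel code. */
  struct node {
          struct node     *left;
          struct node     *right;
          int              key;
          /* payload embedded with the tree node */
  };

  /* Rotate 'p' right so that its left child takes its place. */
  static struct node *
  rotate_right(struct node *p)
  {
          struct node *x = p->left;

          p->left = x->right;     /* store: dirties p's cache line */
          x->right = p;           /* store: dirties x's cache line */
          return (x);             /* new subtree root */
  }

A real splay performs such rotations in zig-zig/zig-zag pairs from the
found node all the way up to the root, so even a pure lookup stores into
nodes on every level of the search path, whereas an RB-tree lookup only
loads.  That is where the cache line dirtying comes from, and it is also
why read-mostly sharing or fine-grained locking doesn't work out: every
reader is effectively a writer.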
> On the negative side all binary-tree-like structures tend to be
> difficult to MP-lock in a fine-grained manner (not that very many
> structures are locked at that fine a grain anyway, but it's a data
> point).  Splay-trees are impossible to lock at a fine-grain due to
> the massive modifications made on any search and the movement
> of the root, and there are MP access issues too.

I doubt that fine-grained locking of such data structures is beneficial
in many cases.  Fine-grained locking implies more locks, more bus lock
cycles, more memory barriers and more CPU cache dirtying.  As long as a
data structure's global lock is not significantly contended, finer
locking doesn't buy any additional parallelism; it just creates a lot of
overhead for nothing.

> --
>
> What turned out to be the best indexing mechanism was a chained
> hash table whose hoppers were small linear arrays instead of single
> elements.  So instead of pointer-chaining each element you have a small
> for() loop for 4-8 elements before you chain.  The structure being
> indexed would NOT be integrated into the index directly, the index
> would point at the final structure from the hopper.

This makes a lot of sense if the index is sufficiently small, let's say
one or two ints.  When you go beyond that the advantage quickly fades
away.

> For our purposes such linear arrays would contain a pointer and
> an indexing value in as small an element as possible (8-16 bytes),
> the idea being that you make the absolute best use of your cache line
> and L1 cache / memory burst.  One random access (initial hash index),
> then linear accesses using a small indexing element, then one final
> random access to get to the result structure and validate that
> it's the one desired (at that point we'd be 99.9% sure that we have
> the right structure because we have already compared the index value
> stored in the hopper).  As a plus the initial hash index also makes
> MP locking the base of the chains easier.

Agreed, under the premise that the access pattern fits this behavior.

> I don't use arrayized chained hash tables in DFly either, but only
> because it's stayed fairly low on the priority list.  cpu isn't really
> a major issue on systems these days.  I/O is the bigger problem.
> RB-Trees are simply extremely convenient from the programming side,
> which is why we still use them.

Agreed, with the emphasis that lock/atomic cycles and the CPU cache
hierarchy should be counted as part of that I/O.

> I was surprised that splay trees did so well vs RB-trees, I never liked
> the memory writes splay trees do let alone the impossibility of
> fine-grained locking.  Originally I thought the writes would make
> performance worse, but they actually don't.  Still, if I were to change
> any topologies now I would definitely go with the chained-hash /
> small-linear-array / chain / small-linear-array / chain mechanic.  It
> seems to be the clear winner.

Without first studying the access pattern and applying it to the various
data structures this is a very speculative statement to make.

-- 
Andre