From owner-freebsd-current@FreeBSD.ORG Fri Oct 1 08:53:49 2010
Message-ID: <4CA5A1B1.7050902@freebsd.org>
Date: Fri, 01 Oct 2010 10:54:09 +0200
From: Andre Oppermann <andre@freebsd.org>
To: freebsd-current@freebsd.org
Cc: freebsd-hackers
References: <4CA4BCD2.4070303@freebsd.org>
Subject: Re: Examining the VM splay tree effectiveness

On 30.09.2010 19:51, Ivan Voras wrote:
> On 09/30/10 18:37, Andre Oppermann wrote:
>
>> Both the vmmap and page table make use of splay trees to manage the
>> entries and to speed up lookups compared to long-to-traverse linked
>> lists or more memory-expensive hash tables.  Some structures though
>> do have an additional linked list to simplify ordered traversals.
>
> The property of splay trees requiring *writes* for nearly every read
> really is a thorn in the eye for SMP.  It seems to me that even if
> the immediate benefits from converting to something else are not
> directly observable, it will still be worth doing.

Fully agreed.

> It's a shame that RCU is still a patent minefield :/
>
> http://mirror.leaseweb.com/kernel/people/npiggin/patches/lockless/2.6.16-rc5/radix-intro.pdf

I'm not convinced that RCU is the only scalable way of sharing a data
structure across a possibly large number of CPUs.

The term "lockless" is often used and frequently misunderstood.
Having a lockless data structure *doesn't* mean that it is performant,
scalable, or both.  That heavily depends on a number of additional
factors.  Many times "lockless" just replaces a simple lock/unlock
cycle with a number of atomic operations on the data structure.  This
can easily backfire, because an atomic operation merely hides the
computational complexity and it still dirties the CPU's cache lines
(the first sketch further below illustrates this).  On cache-coherent
architectures almost all of the atomic operation happens in hardware,
with bus lock cycles, bus snooping and whatnot.  While seemingly
simple from the programmer's point of view, the overhead and latency
are still there.  Needless to say, on architectures with more relaxed
memory models (SPARC, Alpha, ...) the overhead is even higher.  Also
important in the overall picture are the memory-barrier semantics of
locks.  Some newer CPUs have started to provide hardware-implemented
lock managers and combine them with SMT features, so that access to
an already held lock causes an immediate HW thread switch, and a
switch back on unlock.

We also have rm_locks (read-mostly locks) that do not require
synchronization on read but have a more expensive write lock (the
second sketch below shows the pattern).  In UMA we use a mix of
global pools of elements with per-CPU caching of free elements.  As
always, the best approach depends on the dominant access pattern of
the structure.
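To make the atomic-operations point concrete, here is a minimal
userland sketch using C11 atomics rather than the kernel's atomic(9)
API; the Treiber-stack shape and all names are made up for
illustration.  Even though no mutex is taken, the compare-and-swap is
still a locked read-modify-write on the cache line holding the list
head, so under contention every CPU keeps stealing that line from the
others:

#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct node {
	struct node	*next;
	int		 value;
};

/* Head of a lock-free LIFO (Treiber) stack. */
static _Atomic(struct node *) head;

/*
 * "Lockless" push: no lock/unlock pair, but the CAS below is still a
 * locked read-modify-write on the cache line holding 'head'.
 */
static void
push(int value)
{
	struct node *n = malloc(sizeof(*n));

	if (n == NULL)
		abort();
	n->value = value;
	n->next = atomic_load_explicit(&head, memory_order_relaxed);
	while (!atomic_compare_exchange_weak_explicit(&head, &n->next, n,
	    memory_order_release, memory_order_relaxed))
		;	/* n->next was reloaded with the current head */
}

int
main(void)
{
	for (int i = 0; i < 4; i++)
		push(i);
	for (struct node *n = atomic_load(&head); n != NULL; n = n->next)
		printf("%d\n", n->value);
	return (0);
}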
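And, for comparison, a sketch of the rm_lock pattern using the
rmlock(9) KPI.  This is kernel code, not a drop-in, and the example_*
names are hypothetical; the point is that the read side keeps its
tracker on the reader's own stack instead of writing to a shared
cache line, while the write side pays to synchronize with readers on
all CPUs:

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/rmlock.h>

/* Hypothetical read-mostly datum protected by an rmlock. */
static struct rmlock	example_lock;
static int		example_value;

static void
example_init(void)
{
	rm_init(&example_lock, "example");
}

/* Read side: cheap, no write to any shared cache line. */
static int
example_read(void)
{
	struct rm_priotracker tracker;
	int v;

	rm_rlock(&example_lock, &tracker);
	v = example_value;
	rm_runlock(&example_lock, &tracker);
	return (v);
}

/* Write side: expensive, synchronizes with readers on all CPUs. */
static void
example_write(int v)
{
	rm_wlock(&example_lock);
	example_value = v;
	rm_wunlock(&example_lock);
}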
It all comes down to the amortized cost of the different locking or
"lockless" strategies.  It's also important to make sure that
worst-case behavior doesn't bring down the box.  As a rule of thumb I
use this:

 a) make sure the lock is held only for a short time to avoid lock
    contention;
 b) do everything you can outside of the lock;
 c) if the lock is found to be heavily contended, rethink the whole
    approach and check whether other data structures can be used;
 d) minimize write accesses to memory within the lock-protected
    shared data structure;
 e) PROFILE, DON'T SPECULATE!  Measure the access pattern and measure
    the cost of the locking/data-access strategy in terms of CPU
    cycles consumed;
 f) on lookup-heavy data structures, avoid writing to memory and
    thereby dirtying the CPU caches;
 g) on modify-heavy data structures, avoid touching too many
    elements;
 h) on data structures that are both lookup- and modify-heavy and
    used across many CPUs, all bets are off and a different
    data-structure approach should be considered, ideally ending up
    in case f).

It all starts with the hypothesis that a data structure is not
optimally locked.

-- 
Andre