Date:      Thu, 27 Mar 2008 09:15:21 +0000 (GMT)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Julian Elischer <julian@elischer.org>
Cc:        vadim_nuclight@mail.ru, freebsd-hackers@freebsd.org, freebsd-ipfw@freebsd.org
Subject:   Re: [HEADS UP!] IPFW Ideas: possible SoC 2008 candidate
Message-ID:  <20080327090909.T34007@fledge.watson.org>
In-Reply-To: <47EA8860.3060709@elischer.org>
References:  <slrnfud9lu.1rus.vadim_nuclight@hostel.avtf.net> <47E79636.1000909@FreeBSD.org> <47E7EAA8.7020101@elischer.org> <slrnfuk3lf.egh.vadim_nuclight@hostel.avtf.net> <47EA8860.3060709@elischer.org>

On Wed, 26 Mar 2008, Julian Elischer wrote:

> it wouldn't.. you'd add them together before presenting them. but every time 
> a packet changes a counter that is shared, there is a chance that it is 
> being altered by another processor, so if you have fine grained locking in 
> ipfw, you really should use atomic adds, which are slow, or accept possible 
> collisions (which might be ok) but still cause a lot of cross cpu TLB 
> flushing.

In malloc(9) and uma(9), we maintain per-CPU stats, coalescing only for 
presentation, relying on soft critical sections rather than locks to protect 
consistency.  What's worth remembering, however, is that recent multicore 
machines have significantly optimized the cost of atomic operations on cache 
lines held for write by the current CPU, and so the cost of locking has 
dramatically fallen in the last few years.  This re-emphasizes the importance 
of careful cacheline management for per-CPU data structures (particularly, 
don't put data written by multiple CPUs in the same cacheline if you want the 
benefits of per-CPU access).
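
As a rough illustration of the per-CPU approach described above, here is a 
minimal userland sketch, not the actual malloc(9) or uma(9) code.  The names 
(rule_stats, rule_stats_add, rule_stats_fetch), the CPU count and the 64-byte 
cache-line size are all assumptions made for the example.  In the kernel the 
update would run inside a critical section on the current CPU; here the CPU 
index is simply passed in.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU        4    /* hypothetical CPU count for the sketch */
#define CACHE_LINE  64   /* assumed cache-line size in bytes */

/*
 * One slot per CPU, padded and aligned to a full cache line so that
 * no two CPUs ever write to the same line.
 */
struct pcpu_counter {
	uint64_t packets;
	uint64_t bytes;
	char     pad[CACHE_LINE - 2 * sizeof(uint64_t)];
} __attribute__((aligned(CACHE_LINE)));

static struct pcpu_counter rule_stats[NCPU];

/*
 * Update path: touches only the given CPU's slot, with no atomics and
 * no locks.  A kernel version would bracket this with a critical
 * section so the thread cannot migrate or be preempted mid-update.
 */
static void
rule_stats_add(unsigned cpu, uint64_t len)
{
	assert(cpu < NCPU);
	rule_stats[cpu].packets++;
	rule_stats[cpu].bytes += len;
}

/*
 * Presentation path: coalesce the per-CPU slots into one total.  The
 * result is a snapshot and may be slightly stale under concurrent
 * updates, which is usually acceptable for statistics.
 */
static void
rule_stats_fetch(uint64_t *packets, uint64_t *bytes)
{
	*packets = 0;
	*bytes = 0;
	for (unsigned i = 0; i < NCPU; i++) {
		*packets += rule_stats[i].packets;
		*bytes += rule_stats[i].bytes;
	}
}
```

The padding costs memory (one line per CPU per counter group), so in practice 
one would group all counters a CPU updates together into as few lines as 
possible.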

Where read-write locking is the best model, Stephan's recent work on rmlocks 
looks quite promising.  In my micro-benchmarks, on recent hardware it performs 
extremely well on SMP for read locks, but still requires optimization for 
UP-compiled kernels.  For stats and writable structures, such as per-CPU 
caches, rmlocks aren't very helpful, but when compared with replicating 
infrequently written data structures across many CPUs, rwlocks/rmlocks offer a 
much simpler and less error-prone programming model.  We need to see more 
optimization and measurement done on rmlocks for 8.x, and the lack of full 
priority propagation for rwlocks has to be kept in mind.

Robert N M Watson
Computer Laboratory
University of Cambridge