Date:      Mon, 04 Feb 2008 01:40:10 +0200
From:      Alexander Motin <mav@FreeBSD.org>
To:        Kris Kennaway <kris@FreeBSD.org>, Robert Watson <rwatson@FreeBSD.org>
Cc:        freebsd-hackers@freebsd.org, freebsd-performance@freebsd.org, Julian Elischer <julian@elischer.org>
Subject:   Re: Memory allocation performance
Message-ID:  <47A650DA.8020908@FreeBSD.org>
In-Reply-To: <47A4F1AF.9090306@FreeBSD.org>
References:  <47A25412.3010301@FreeBSD.org> <47A25A0D.2080508@elischer.org>	<47A2C2A2.5040109@FreeBSD.org>	<20080201185435.X88034@fledge.watson.org>	<47A43873.40801@FreeBSD.org>	<20080202095658.R63379@fledge.watson.org> <47A4E934.1050207@FreeBSD.org> <47A4F1AF.9090306@FreeBSD.org>

Kris Kennaway wrote:
> You can look at the raw output from pmcstat, which is a collection of 
> instruction pointers that you can feed to e.g. addr2line to find out 
> exactly where in those functions the events are occurring.  This will 
> often help to track down the precise causes.

Thanks for the hint. It was an interesting hunt, but it showed nothing.
The samples land on very simple lines like:
bucket = cache->uc_freebucket;
cache->uc_allocs++;
if (zone->uz_ctor != NULL) {
cache->uc_frees++;
and so on.
There are no loops, no inlines and no macros. Nothing! The only hint
is a huge number of "p4-resource-stall" events on those lines. I have
no idea what exactly that means, why it happens mostly here, or how to
fight it.

I would probably agree that it might be some profiler fluctuation, but
the performance benefits I got from my self-made caching of uma calls
look very real. :(

Robert Watson wrote:
 > There was, FYI, a report a few years ago that there was a measurable
 > improvement from allocating off the free bucket rather than maintaining
 > separate alloc and free buckets.  It sounded good at the time but I was
 > never able to reproduce the benefits in my test environment.  Now might
 > be a good time to try to revalidate that.  Basically, the goal would be
 > to make the pcpu cache FIFO as much as possible as that maximizes the
 > chances that the newly allocated object already has lines in the cache.
 > It's a fairly trivial tweak to the UMA allocation code.

I have tried this, but have not found a difference. Maybe it gives some
benefit, but not in this situation. Here profiling shows the delays in
the allocator itself, and since the allocator does not touch the data
objects themselves, this probably says more about caching of the
management structures' memory than about caching of the objects.
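
For reference, the tweak Robert describes can be sketched in userspace
roughly as follows. This is only an illustration under stated
assumptions, not the actual UMA code: the names pcpu_cache, bucket,
cache_alloc and cache_free are hypothetical, malloc()/free() stand in
for the zone's slow path, and locking and per-CPU selection are omitted.
The point is only that allocation pops the most recently freed item off
the single free bucket instead of using a separate alloc bucket.

```c
#include <stddef.h>
#include <stdlib.h>

#define BUCKET_SIZE 8   /* hypothetical bucket capacity */

/* Hypothetical stand-in for a UMA bucket: a small stack of item pointers. */
struct bucket {
    int   cnt;                  /* number of cached items */
    void *items[BUCKET_SIZE];   /* cached item pointers */
};

/* Hypothetical per-CPU cache with only a free bucket (no alloc bucket). */
struct pcpu_cache {
    struct bucket freebucket;
    size_t        item_size;
};

/* Allocate: reuse the most recently freed item if one is cached. */
static void *
cache_alloc(struct pcpu_cache *c)
{
    struct bucket *b = &c->freebucket;

    if (b->cnt > 0)
        return (b->items[--b->cnt]);
    /* Bucket empty: fall back to the slow path (malloc stands in here). */
    return (malloc(c->item_size));
}

/* Free: push the item onto the free bucket, or release it if full. */
static void
cache_free(struct pcpu_cache *c, void *item)
{
    struct bucket *b = &c->freebucket;

    if (b->cnt < BUCKET_SIZE)
        b->items[b->cnt++] = item;
    else
        free(item);
}
```

With a zero-initialized cache, calling cache_free() on an item and then
cache_alloc() returns that same item, which is the reuse pattern the
tweak is meant to encourage.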

I have one more crazy idea: the memory containing the zones may have
some special hardware or configuration attribute, like "noncaching" or
something similar. That could explain the slowdown in accessing it. But
as I can't prove it, it is just one more crazy theory. :(

-- 
Alexander Motin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?47A650DA.8020908>