Date:      Thu, 31 Jan 2008 23:07:48 -0800
From:      Julian Elischer <julian@elischer.org>
To:        Alexander Motin <mav@FreeBSD.org>
Cc:        freebsd-hackers@freebsd.org, freebsd-performance@freebsd.org
Subject:   Re: Memory allocation performance
Message-ID:  <47A2C544.4090303@elischer.org>
In-Reply-To: <47A2C2A2.5040109@FreeBSD.org>
References:  <47A25412.3010301@FreeBSD.org> <47A25A0D.2080508@elischer.org> <47A2C2A2.5040109@FreeBSD.org>

Alexander Motin wrote:
> Julian Elischer wrote:
>> Alexander Motin wrote:
>>> Hi.
>>>
>>> While profiling netgraph operation on a UP HEAD router I have found
>>> that a huge amount of time is spent on memory allocation/deallocation:
>>>
>>>         0.14  0.05  132119/545292      ip_forward <cycle 1> [12]
>>>         0.14  0.05  133127/545292      fxp_add_rfabuf [18]
>>>         0.27  0.10  266236/545292      ng_package_data [17]
>>> [9]14.1 0.56  0.21  545292         uma_zalloc_arg [9]
>>>         0.17  0.00  545292/1733401     critical_exit <cycle 2> [98]
>>>         0.01  0.00  275941/679675      generic_bzero [68]
>>>         0.01  0.00  133127/133127      mb_ctor_pack [103]
>>>
>>>         0.15  0.06  133100/545266      mb_free_ext [22]
>>>         0.15  0.06  133121/545266      m_freem [15]
>>>         0.29  0.11  266236/545266      ng_free_item [16]
>>> [8]15.2 0.60  0.23  545266         uma_zfree_arg [8]
>>>         0.17  0.00  545266/1733401     critical_exit <cycle 2> [98]
>>>         0.00  0.04  133100/133100      mb_dtor_pack [57]
>>>         0.00  0.00  134121/134121      mb_dtor_mbuf [111]
>>>
>>> I have already optimized every allocation call I could, and those
>>> that are left are practically unavoidable. But even after this, kgmon
>>> reports that 30% of CPU time is consumed by memory management.
>>>
>>> So I have some questions:
>>> 1) Is this a real situation or just a profiler mistake?
>>> 2) If it is real, then why is UMA so slow? I have tried replacing it
>>> in some places with a preallocated TAILQ of the required memory blocks
>>> protected by a mutex, and according to the profiler I got _much_
>>> better results. Would it be good practice to replace relatively small
>>> UMA zones with a preallocated queue to avoid some of the UMA calls?
>>> 3) I have seen that UMA does some kind of CPU cache affinity, but
>>> is that so expensive that it takes 30% of CPU time on a UP router?
>>
>> Given this information, I would add an 'item cache' in ng_base.c
>> (hmm, do I already have one?)
> 
> That was actually my second question. As there are only 512 items by
> default and they are small, I can easily preallocate them all at
> boot. But is that a good approach? Why can't UMA do just the same when
> I have created a zone with a specified element size and maximum number
> of objects? What is the principal difference?
> 

Who knows what UMA does... but if you do it yourself, you know what the
overhead is. :-)
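
For illustration only, here is a minimal userland sketch of the "do it
yourself" idea discussed above: a fixed set of items preallocated once and
kept on a TAILQ free list protected by a mutex. It is not the code from
ng_base.c. The names item_pool, pool_init, pool_get and pool_put, the
64-byte payload, and the nfree counter are all hypothetical, and a pthread
mutex stands in for a kernel mutex. Only the count of 512 items comes from
the thread.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

#define POOL_ITEMS 512              /* "512 items by default" per the thread */

struct item {
    TAILQ_ENTRY(item) link;         /* free-list linkage */
    char payload[64];               /* hypothetical payload size */
};

TAILQ_HEAD(item_list, item);

struct item_pool {
    struct item_list free;          /* items currently available */
    pthread_mutex_t lock;           /* userland stand-in for a kernel mutex */
    size_t nfree;                   /* number of items on the free list */
    struct item storage[POOL_ITEMS];/* all items, allocated up front */
};

/* Put every preallocated item on the free list. */
void pool_init(struct item_pool *p)
{
    TAILQ_INIT(&p->free);
    pthread_mutex_init(&p->lock, NULL);
    p->nfree = 0;
    for (size_t i = 0; i < POOL_ITEMS; i++) {
        TAILQ_INSERT_TAIL(&p->free, &p->storage[i], link);
        p->nfree++;
    }
}

/* Take one item, or return NULL if the pool is exhausted. */
struct item *pool_get(struct item_pool *p)
{
    struct item *it;

    pthread_mutex_lock(&p->lock);
    it = TAILQ_FIRST(&p->free);
    if (it != NULL) {
        TAILQ_REMOVE(&p->free, it, link);
        p->nfree--;
    }
    pthread_mutex_unlock(&p->lock);
    return it;
}

/* Return an item; inserting at the head reuses the most recent item first. */
void pool_put(struct item_pool *p, struct item *it)
{
    pthread_mutex_lock(&p->lock);
    TAILQ_INSERT_HEAD(&p->free, it, link);
    p->nfree++;
    pthread_mutex_unlock(&p->lock);
}
```

The cost per get/put in this sketch is one lock, a few pointer updates, and
one unlock, which is what makes the overhead easy to reason about. Unlike a
general-purpose allocator, it cannot grow: the caller has to handle a NULL
return when all items are in use.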



