Date:      Mon, 14 Jan 2013 17:51:27 +0200
From:      Alexander Motin <mav@FreeBSD.org>
To:        Andre Oppermann <andre@freebsd.org>
Cc:        Adrian Chadd <adrian@freebsd.org>, src-committers@freebsd.org, Alan Cox <alc@rice.edu>, "Jayachandran C." <jchandra@freebsd.org>, svn-src-all@freebsd.org, Alfred Perlstein <bright@mu.org>, Oleksandr Tymoshenko <gonzo@bluezbox.com>, freebsd-arch@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r243631 - in head/sys: kern sys
Message-ID:  <50F4297F.8050708@FreeBSD.org>
In-Reply-To: <50F41F8C.5030900@freebsd.org>
References:  <201211272119.qARLJxXV061083@svn.freebsd.org> <ABB3E29B-91F3-4C25-8FAB-869BBD7459E1@bluezbox.com> <50C1BC90.90106@freebsd.org> <50C25A27.4060007@bluezbox.com> <50C26331.6030504@freebsd.org> <50C26AE9.4020600@bluezbox.com> <50C3A3D3.9000804@freebsd.org> <50C3AF72.4010902@rice.edu> <330405A1-312A-45A5-BB86-4969478D8BBD@bluezbox.com> <50D03E83.8060908@rice.edu> <50DD081E.8000409@bluezbox.com> <50EB1841.5030006@bluezbox.com> <50EB22D2.6090103@rice.edu> <50EB415F.8020405@freebsd.org> <CA%2B7sy7CkdoyScOEDEXWuwJxjCS5zTcC8_fu9isCeTFxT8opNJQ@mail.gmail.com> <50F04FE5.7010406@rice.edu> <CA%2B7sy7D=ZjTLirGW3BVGcAu0h8-dWpib%2BYziUjEqegOL9J4adw@mail.gmail.com> <CAJ-VmonLoL4E3UsNwx87p2FuHXTbJe7wFs9hBn5Zmr7TTQOSkg@mail.gmail.com> <50F1BD69.4060104@mu.org> <CAJ-VmokjZ_vpcmYeD65pWJN5tfhqn6yDXrFFcXf8dvYc55tQtg@mail.gmail.com> <50F2F79C.7040109@mu.org> <50F41F8C.5030900@freebsd.org>

On 14.01.2013 17:09, Andre Oppermann wrote:
> On 13.01.2013 19:06, Alfred Perlstein wrote:
>> On 1/12/13 10:32 PM, Adrian Chadd wrote:
>>> On 12 January 2013 11:45, Alfred Perlstein <bright@mu.org> wrote:
>>>
>>>> I'm not sure if regressing to the waterfall method of development is
>>>> a good idea at this point.
>>>>
>>>> I see a light at the end of the tunnel and we need to continue to
>>>> just handle these minor corner cases as we progress.
>>>>
>>>> If we move to a model where a minor bug is grounds to completely remove
>>>> helpful code then nothing will ever get done.
>>>>
>>> Allocating 512MB worth of callwheels on a 16GB MIPS machine is a
>>> little silly, don't you think?
>>>
>>> That suggests to me that the extent to which maxfiles/maxusers/etc
>>> percolates through the codebase wasn't totally understood by those
>>> who wish to change it.
>>>
>>> I'd rather see some more investigative work into outlining things that
>>> need fixing and start fixing those, rather than "just change stuff and
>>> fix whatever issues creep up."
>>>
>>> I kinda hope we all understand what we're working on in the kernel a
>>> little better than that.
>>
>> Cool!   I'm glad people are now aware of the callwheel allocation
>> being insane with large maxusers.
>>
>> I saw this about a month ago (if not longer), but since there were
>> half a dozen people calling me an imbecile who hadn't really yet read
>> the code I didn't want to inflame them more by fixing that with
>> "a hack". (actually a simple fix).
>>
>> A simple fix is to clamp callwheel size to the previous result of a
>> maxusers of 384 and call it a day.
>>
>> However the simplicity of that approach would probably inflame too
>> many feelings so I am unsure as to how to proceed.
>>
>> Any ideas?
> 
> I noticed the callwheel dependency as well and asked mav@ about it
> in a short email exchange.  He said it has only little use and goes
> away with the calloutng import.  While that is outstanding we need
> to clamp it to a sane value.
> 
> However I don't know what a sane value would be and why its size is
> directly derived from maxproc and maxfiles.  If there can be one
> callout per process and open file descriptor in the system, then
> it probably has to be so big.  If it can deal with 'collisions'
> in the wheel it can be much smaller.

As I actually wrote, there are two different things:
 ncallout -- the number of preallocated callout structures used by
timeout() calls. That is a legacy API that is probably not used much
now, so the value doesn't need to be very big. But the allocation is
static, and if it is ever exhausted the system will panic. That is why
it was set quite high. The right way now would be to analyze where that
API is still used and estimate the number really required.
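
To illustrate why a static pool has to be sized generously, here is a
minimal sketch of a fixed, preallocated free list. It is not the kernel
code: POOL_SIZE (standing in for ncallout), struct entry, pool_init(),
pool_get() and pool_put() are all hypothetical names, and the sketch
returns NULL at the point where, as described above, the kernel would
panic.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4          /* hypothetical stand-in for ncallout */

struct entry {
    struct entry *next;      /* free-list link */
    int in_use;
};

static struct entry pool[POOL_SIZE];   /* allocated once, never grows */
static struct entry *free_head;

/* Link every preallocated entry into the free list. */
static void pool_init(void)
{
    free_head = NULL;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        pool[i].in_use = 0;
        pool[i].next = free_head;
        free_head = &pool[i];
    }
}

/* Take one entry. Returns NULL when the pool is exhausted; there is
 * no fallback allocation, which is the case that would be fatal. */
static struct entry *pool_get(void)
{
    struct entry *e = free_head;

    if (e == NULL)
        return NULL;
    free_head = e->next;
    e->in_use = 1;
    return e;
}

/* Return an entry to the free list. */
static void pool_put(struct entry *e)
{
    assert(e != NULL);
    e->in_use = 0;
    e->next = free_head;
    free_head = e;
}
```

With POOL_SIZE set to 4, the fifth pool_get() without an intervening
pool_put() fails, no matter how much memory the machine has.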

 callwheelsize -- the number of slots in the callwheel. That is purely
an optimization value. If set too low, it will just increase the number
of hash collisions, with no effect other than some slowdown. The optimal
value does depend on the number of callouts in the system, but not only
on that. Since the array index there is not really a hash, it is
practically useless to set the array size higher than the median callout
interval divided by the tick length (1/hz, or 1ms in calloutng). The
problem is estimating that median value, which depends completely on
the workload.
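
A small sketch of the indexing argument, assuming the slot is simply
the expiry tick masked by a power-of-two wheel size. The names
bucket_for(), buckets_used() and MAX_WHEEL are hypothetical and this is
not the actual kernel code; it only counts how many slots get used when
every callout expires within a given number of ticks.

```c
#include <assert.h>

#define MAX_WHEEL 4096       /* upper bound for this sketch only */

/* Slot for a callout expiring at `tick`; wheelsize must be a power
 * of two so that masking behaves like a modulo. */
static unsigned bucket_for(unsigned tick, unsigned wheelsize)
{
    return tick & (wheelsize - 1);
}

/* Count the distinct slots hit by callouts that expire
 * 1..max_interval ticks after `now`. */
static unsigned buckets_used(unsigned now, unsigned max_interval,
    unsigned wheelsize)
{
    unsigned char seen[MAX_WHEEL] = {0};
    unsigned used = 0;

    assert(wheelsize > 0 && wheelsize <= MAX_WHEEL);
    assert((wheelsize & (wheelsize - 1)) == 0);

    for (unsigned d = 1; d <= max_interval; d++) {
        unsigned b = bucket_for(now + d, wheelsize);

        if (!seen[b]) {
            seen[b] = 1;
            used++;
        }
    }
    return used;
}
```

For callouts that all fire within 100 ticks, buckets_used(1000, 100, 64)
gives 64 (every slot is shared), while wheel sizes of 128 and 4096 both
give 100: slots beyond the interval in ticks simply stay empty.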

Each ncallout entry costs 32-52 bytes, while each callwheelsize slot
costs only 8-16 and could probably be reduced to 4-8 by replacing TAILQ
with LIST. So it is ncallout and the respective timeout() API that
should be dealt with first.

-- 
Alexander Motin


