Date:      Mon, 14 Jan 2013 17:05:43 +0100
From:      Andre Oppermann <andre@freebsd.org>
To:        Alexander Motin <mav@FreeBSD.org>
Cc:        Adrian Chadd <adrian@freebsd.org>, src-committers@freebsd.org, Alan Cox <alc@rice.edu>, "Jayachandran C." <jchandra@freebsd.org>, svn-src-all@freebsd.org, Alfred Perlstein <bright@mu.org>, Oleksandr Tymoshenko <gonzo@bluezbox.com>, freebsd-arch@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r243631 - in head/sys: kern sys
Message-ID:  <50F42CD7.6020400@freebsd.org>
In-Reply-To: <50F4297F.8050708@FreeBSD.org>
References:  <201211272119.qARLJxXV061083@svn.freebsd.org> <ABB3E29B-91F3-4C25-8FAB-869BBD7459E1@bluezbox.com> <50C1BC90.90106@freebsd.org> <50C25A27.4060007@bluezbox.com> <50C26331.6030504@freebsd.org> <50C26AE9.4020600@bluezbox.com> <50C3A3D3.9000804@freebsd.org> <50C3AF72.4010902@rice.edu> <330405A1-312A-45A5-BB86-4969478D8BBD@bluezbox.com> <50D03E83.8060908@rice.edu> <50DD081E.8000409@bluezbox.com> <50EB1841.5030006@bluezbox.com> <50EB22D2.6090103@rice.edu> <50EB415F.8020405@freebsd.org> <CA%2B7sy7CkdoyScOEDEXWuwJxjCS5zTcC8_fu9isCeTFxT8opNJQ@mail.gmail.com> <50F04FE5.7010406@rice.edu> <CA%2B7sy7D=ZjTLirGW3BVGcAu0h8-dWpib%2BYziUjEqegOL9J4adw@mail.gmail.com> <CAJ-VmonLoL4E3UsNwx87p2FuHXTbJe7wFs9hBn5Zmr7TTQOSkg@mail.gmail.com> <50F1BD69.4060104@mu.org> <CAJ-VmokjZ_vpcmYeD65pWJN5tfhqn6yDXrFFcXf8dvYc55tQtg@mail.gmail.com> <50F2F79C.7040109@mu.org> <50F41F8C.5030900@freebsd.org> <50F4297F.8050708@FreeBSD.org>

On 14.01.2013 16:51, Alexander Motin wrote:
> On 14.01.2013 17:09, Andre Oppermann wrote:
>> On 13.01.2013 19:06, Alfred Perlstein wrote:
>>> On 1/12/13 10:32 PM, Adrian Chadd wrote:
>>>> On 12 January 2013 11:45, Alfred Perlstein <bright@mu.org> wrote:
>>>>
>>>>> I'm not sure if regressing to the waterfall method of development is
>>>>> a good
>>>>> idea at this point.
>>>>>
>>>>> I see a light at the end of the tunnel and we need to continue to just
>>>>> handle
>>>>> these minor corner cases as we progress.
>>>>>
>>>>> If we move to a model where a minor bug is grounds to completely remove
>>>>> helpful code then nothing will ever get done.
>>>>>
>>>> Allocating 512MB worth of callwheels on a 16GB MIPS machine is a
>>>> little silly, don't you think?
>>>>
>>>> That suggests to me that the extent to which maxfiles/maxusers/etc
>>>> percolate through the codebase wasn't totally understood by those who
>>>> wish to change it.
>>>>
>>>> I'd rather see some more investigative work into outlining things that
>>>> need fixing and start fixing those, rather than "just change stuff and
>>>> fix whatever issues creep up."
>>>>
>>>> I kinda hope we all understand what we're working on in the kernel a
>>>> little better than that.
>>>
>>> Cool!   I'm glad people are now aware of the callwheel allocation
>>> being insane with large maxusers.
>>>
>>> I saw this about a month ago (if not longer), but since there were
>>> half a dozen people calling me an
>>> imbecile who hadn't really yet read the code I didn't want to inflame
>>> them more by fixing that with
>>> "a hack". (actually a simple fix).
>>>
>>> A simple fix is to clamp callwheel size to the previous result of a
>>> maxusers of 384 and call it a day.
>>>
>>> However the simplicity of that approach would probably inflame too
>>> many feelings so I am unsure as
>>> how to proceed.
>>>
>>> Any ideas?
>>
>> I noticed the callwheel dependency as well and asked mav@ about it
>> in a short email exchange.  He said it has only little use and goes
>> away with the calloutng import.  While that is outstanding we need
>> to clamp it to a sane value.
>>
>> However I don't know what a sane value would be and why its size is
>> directly derived from maxproc and maxfiles.  If there can be one
>> callout per process and open file descriptor in the system, then
>> it probably has to be so big.  If it can deal with 'collisions'
>> in the wheel it can be much smaller.
>
> As I've actually written, there are two different things:
>   ncallout -- number of preallocated callout structures for purposes of
> timeout() calls. That is a legacy API that is probably not used very much
> now, so that value doesn't need to be too big. But that allocation is
> static, and if it is ever exhausted the system will panic. That is why
> it was set quite high. The right way now would be to analyze where that
> API is still used and estimate the number really required.

Can timeout() be emulated on top of another API so we can do away with it?

>   callwheelsize -- number of slots in the callwheel. That is purely an
> optimization value. If set too low, it will just increase the number of
> hash collisions, with no effect other than some slowdown. The optimal
> value here does depend on the number of callouts in the system, but not
> only on that. Since the array index there is not really a hash, it is
> practically useless to set the array size higher than the median callout
> interval divided by hz (or by 1ms in calloutng). The problem is to
> estimate that median value, which depends completely on the workload.

OK.  So for example a large number of TCP connections would use up a
large number of slots in the callwheel.  I'll try to come up with a
reasonably sane scaling value.
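
To make the clamping idea concrete, here is a rough userland sketch, not the
actual kernel code.  The names wheel_size() and wheel_slot() and the cap
passed in as max_slots are made up for illustration.  It assumes the wheel
size is kept a power of two so that the slot index is just the expiry tick
masked with (size - 1).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: round the requested number of callouts up to a
 * power of two, but never grow beyond max_slots.  max_slots must itself
 * be a nonzero power of two.
 */
static size_t
wheel_size(size_t ncallout, size_t max_slots)
{
	size_t size = 1;

	assert(max_slots != 0 && (max_slots & (max_slots - 1)) == 0);
	while (size < ncallout && size < max_slots)
		size <<= 1;
	return (size);
}

/*
 * Slot for a callout expiring at the given tick.  Callouts whose expiry
 * ticks differ by a multiple of the wheel size land in the same slot.
 * That is a collision: it costs some list walking, not correctness.
 */
static size_t
wheel_slot(unsigned long expire_tick, size_t size)
{
	return ((size_t)(expire_tick & (size - 1)));
}
```

With a cap in place, a huge maxproc + maxfiles no longer translates into a
huge wheel; the only cost of a wheel that is too small is more callouts
sharing a slot.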

> Each ncallout entry costs 32-52 bytes, while each callwheelsize slot costs
> only 8-16 and could probably be reduced to 4-8 by replacing TAILQ with
> LIST. So it is ncallout and the respective timeout() API that should be
> dealt with first.

I'll give it a try.
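
For a feel of the numbers, a back-of-the-envelope helper using the per-entry
sizes quoted above.  The function name timer_mem_bytes() and the example
counts are assumptions for illustration only.

```c
#include <stddef.h>

/*
 * Hypothetical estimate of the static memory taken by the preallocated
 * callout array plus the callwheel, given a per-entry size for each.
 * The thread quotes 32-52 bytes per callout and 8-16 bytes per slot.
 */
static size_t
timer_mem_bytes(size_t ncallout, size_t per_callout,
    size_t wheelsize, size_t per_slot)
{
	return (ncallout * per_callout + wheelsize * per_slot);
}
```

For example, timer_mem_bytes(1000, 52, 1024, 16) gives 52000 + 16384 = 68384
bytes.  At similar entry counts the callout array is the larger share, which
fits the point that ncallout is the one to tackle first.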

-- 
Andre



