From owner-freebsd-arch@FreeBSD.ORG Mon Jan 14 16:05:45 2013
Message-ID: <50F42CD7.6020400@freebsd.org>
Date: Mon, 14 Jan 2013 17:05:43 +0100
From: Andre Oppermann
To: Alexander Motin
Cc: Adrian Chadd, src-committers@freebsd.org, Alan Cox,
 "Jayachandran C.", svn-src-all@freebsd.org, Alfred Perlstein,
 Oleksandr Tymoshenko, freebsd-arch@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r243631 - in head/sys: kern sys
In-Reply-To: <50F4297F.8050708@FreeBSD.org>

On 14.01.2013 16:51, Alexander Motin wrote:
> On 14.01.2013 17:09, Andre Oppermann wrote:
>> On 13.01.2013 19:06, Alfred Perlstein wrote:
>>> On 1/12/13 10:32 PM, Adrian Chadd wrote:
>>>> On 12 January 2013 11:45, Alfred Perlstein wrote:
>>>>
>>>>> I'm not sure if regressing to the waterfall method of development
>>>>> is a good idea at this point.
>>>>>
>>>>> I see a light at the end of the tunnel and we need to continue to
>>>>> just handle these minor corner cases as we progress.
>>>>>
>>>>> If we move to a model where a minor bug is grounds to completely
>>>>> remove helpful code then nothing will ever get done.
>>>>>
>>>> Allocating 512MB worth of callwheels on a 16GB MIPS machine is a
>>>> little silly, don't you think?
>>>>
>>>> That suggests to me that the extent to which maxfiles/maxusers/etc.
>>>> percolates the codebase wasn't totally understood by those who wish
>>>> to change it.
>>>>
>>>> I'd rather see some more investigative work into outlining things
>>>> that need fixing and start fixing those, rather than "just change
>>>> stuff and fix whatever issues creep up."
>>>>
>>>> I kinda hope we all understand what we're working on in the kernel
>>>> a little better than that.
>>>
>>> Cool!
>>> I'm glad people are now aware of the callwheel allocation being
>>> insane with large maxusers.
>>>
>>> I saw this about a month ago (if not longer), but since there were
>>> half a dozen people calling me an imbecile who hadn't really yet
>>> read the code, I didn't want to inflame them more by fixing it with
>>> "a hack" (actually a simple fix).
>>>
>>> A simple fix is to clamp the callwheel size to the previous result
>>> of a maxusers of 384 and call it a day.
>>>
>>> However, the simplicity of that approach would probably inflame too
>>> many feelings, so I am unsure how to proceed.
>>>
>>> Any ideas?
>>
>> I noticed the callwheel dependency as well and asked mav@ about it
>> in a short email exchange.  He said it has little use and goes away
>> with the calloutng import.  While that is still outstanding we need
>> to clamp it to a sane value.
>>
>> However, I don't know what a sane value would be, or why its size is
>> directly derived from maxproc and maxfiles.  If there can be one
>> callout per process and open file descriptor in the system, then it
>> probably has to be that big.  If it can deal with 'collisions' in
>> the wheel it can be much smaller.
>
> As I've actually written, there are two different things:
>
> ncallout -- the number of preallocated callout structures for
> timeout() calls.  That is a legacy API that is probably not used very
> much now, so the value doesn't need to be big.  But the allocation is
> static, and if it is ever exhausted the system will panic.  That is
> why it was set quite high.  The right way now would be to analyze
> where that API is still used and estimate the number really required.

Can timeout() be emulated on top of another API so we can do away with
it?  A rough sketch of what a converted consumer could look like is at
the end of this mail.

> callwheelsize -- the number of slots in the callwheel.  That is a
> purely optimizational value.  If it is set too low, it just increases
> the number of hash collisions, with no effect other than some
> slowdown.  The optimal value depends on the number of callouts in the
> system, but not only on that.  Since the array index there is not
> really a hash, it is practically useless to make the array larger
> than the median callout interval divided by hz (or by 1ms in
> calloutng).  The problem is estimating that median value, which
> completely depends on the workload.

OK.  So, for example, a large number of TCP connections would use up a
large number of slots in the callwheel.  I'll try to come up with a
reasonably sane scaling value.  (The second sketch below shows the slot
collision as I understand it.)

> Each ncallout entry costs 32-52 bytes, while one callwheel slot costs
> only 8-16 and could probably be reduced to 4-8 by replacing the TAILQ
> with a LIST.  So it is ncallout and the respective timeout() API that
> should be dealt with first.

I'll give it a try.  A quick check of the TAILQ vs. LIST head sizes is
appended below as well.
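
First sketch: a hypothetical timeout(9) consumer converted to a
caller-owned callout(9).  The driver and function names are made up and
the code is untested; it is only meant to illustrate why the static
ncallout pool could eventually go away.

/* Hypothetical driver, illustration only (untested). */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/callout.h>

struct foo_softc {
    struct callout  sc_tick;    /* was: struct callout_handle sc_th; */
    /* ... */
};

static void foo_tick(void *);

static void
foo_start(struct foo_softc *sc)
{
    callout_init(&sc->sc_tick, CALLOUT_MPSAFE);
    /* was: sc->sc_th = timeout(foo_tick, sc, hz); */
    callout_reset(&sc->sc_tick, hz, foo_tick, sc);
}

static void
foo_tick(void *arg)
{
    struct foo_softc *sc = arg;

    /* ... periodic work ... */
    callout_reset(&sc->sc_tick, hz, foo_tick, sc);  /* rearm */
}

static void
foo_stop(struct foo_softc *sc)
{
    /* was: untimeout(foo_tick, sc, sc->sc_th); */
    callout_drain(&sc->sc_tick);
}

With the callout embedded in the softc nothing is drawn from the static
ncallout pool at all, so once the remaining timeout() users are
converted the pool can shrink to almost nothing.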
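
Second sketch: a user-space toy showing how I understand the wheel
indexing -- the slot is simply the absolute expiry tick masked with the
wheel mask (wheel size and tick values below are made up):

#include <stdio.h>

#define CALLWHEELSIZE   16384                   /* must be a power of two */
#define CALLWHEELMASK   (CALLWHEELSIZE - 1)

int
main(void)
{
    int hz = 1000;
    int now = 123456;                           /* current tick count */

    /* One callout 1 second out, one exactly a wheel revolution later. */
    int a = (now + 1 * hz) & CALLWHEELMASK;
    int b = (now + 1 * hz + CALLWHEELSIZE) & CALLWHEELMASK;

    printf("slot a = %d, slot b = %d\n", a, b); /* same slot */
    return (0);
}

Two callouts exactly one wheel revolution apart always land in the same
slot no matter how big the wheel is, which is why growing it past the
typical callout interval (in ticks) buys nothing.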
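
Third sketch: a quick user-space check of the per-bucket overhead you
mention -- a TAILQ head carries two pointers, a LIST head only one:

#include <stdio.h>
#include <sys/queue.h>

struct callout;                         /* opaque here */

TAILQ_HEAD(callout_tailq, callout);     /* what the wheel buckets use now */
LIST_HEAD(callout_list, callout);       /* what they could use instead */

int
main(void)
{
    printf("TAILQ head: %zu bytes, LIST head: %zu bytes\n",
        sizeof(struct callout_tailq), sizeof(struct callout_list));
    return (0);
}

On 64-bit that is 16 vs. 8 bytes per slot, matching the numbers you
gave.

-- 
Andre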