From: Alexander Motin
Date: Mon, 14 Jan 2013 17:51:27 +0200
To: Andre Oppermann
Cc: Adrian Chadd, src-committers@freebsd.org, Alan Cox, "Jayachandran C.", svn-src-all@freebsd.org, Alfred Perlstein, Oleksandr Tymoshenko, freebsd-arch@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r243631 - in head/sys: kern sys
Message-ID: <50F4297F.8050708@FreeBSD.org>
In-Reply-To: <50F41F8C.5030900@freebsd.org>

On 14.01.2013 17:09, Andre Oppermann wrote:
> On 13.01.2013 19:06, Alfred Perlstein wrote:
>> On 1/12/13 10:32 PM, Adrian Chadd wrote:
>>> On 12 January 2013 11:45, Alfred Perlstein wrote:
>>>
>>>> I'm not sure if regressing to the waterfall method of development is
>>>> a good idea at this point.
>>>>
>>>> I see a light at the end of the tunnel and we to continue to just
>>>> handle these minor corner cases as we progress.
>>>>
>>>> If we move to a model where a minor bug is grounds to completely remove
>>>> helpful code then nothing will ever get done.
>>>>
>>> Allocating 512MB worth of callwheels on a 16GB MIPS machine is a
>>> little silly, don't you think?
>>>
>>> That suggests to me that the extent of which maxfiles/maxusers/etc
>>> percolates the codebase wasn't totally understood by those who wish to
>>> change it.
>>>
>>> I'd rather see some more investigative work into outlining things that
>>> need fixing and start fixing those, rather than "just change stuff and
>>> fix whatever issues creep up."
>>>
>>> I kinda hope we all understand what we're working on in the kernel a
>>> little better than that.
>>
>> Cool! I'm glad people are now aware of the callwheel allocation
>> being insane with large maxusers.
>>
>> I saw this about a month ago (if not longer), but since there were
>> half a dozen people calling me an imbecile who hadn't really yet read
>> the code I didn't want to inflame them more by fixing that with
>> "a hack". (actually a simple fix).
>>
>> A simple fix is to clamp callwheel size to the previous result of a
>> maxusers of 384 and call it a day.
>>
>> However the simplicity of that approach would probably inflame too
>> many feelings so I am unsure as how to proceed.
>>
>> Any ideas?
>
> I noticed the callwheel dependency as well and asked mav@ about it
> in a short email exchange. He said it has only little use and goes
> away with the calloutng import. While that is outstanding we need
> to clamp it to a sane value.
>
> However I don't know what a sane value would be and why its size is
> directly derived from maxproc and maxfiles. If there can be one
> callout per process and open file descriptor in the system, then
> it probably has to be so big.
> If it can deal with 'collisions' in the wheel it can be much
> smaller.

As I've actually written, there are two different things:

ncallout -- the number of preallocated callout structures for the
purposes of timeout() calls. That is a legacy API that is probably not
used very much now, so that value doesn't need to be too big. But the
allocation is static, and if it is ever exhausted the system will
panic. That is why it was set quite high. The right way now would be to
analyze where that API is still used and estimate the number that is
really required.

callwheelsize -- the number of slots in the callwheel. That is purely
an optimization value. If set too low, it will just increase the number
of hash collisions, with no effect other than some slowdown. The
optimal value here does depend on the number of callouts in the system,
but not only on that. Since the array index there is not really a hash,
it is practically useless to set the array size higher than the median
callout interval divided by the tick length (1/hz, or 1ms in
calloutng). The problem is to estimate that median value, which depends
completely on the workload.

Each ncallout entry costs 32-52 bytes, while each callwheelsize slot
costs only 8-16 and could probably be reduced to 4-8 by replacing TAILQ
with LIST. So it is ncallout and the respective timeout() API that
should be dealt with first.

-- 
Alexander Motin
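
For reference, the per-slot figures quoted above (8-16 bytes with TAILQ, 4-8 with LIST) follow from the head layouts in <sys/queue.h>: a TAILQ head holds two pointers, a LIST head holds one. The sketch below only illustrates that arithmetic. The names demo_callout, demo_tailq_bucket, demo_list_bucket and the two helper functions are hypothetical and are not taken from the kernel sources.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a callout entry; NOT the kernel's struct callout. */
struct demo_callout {
	int expire_ticks;
};

/* One wheel slot declared with each macro family from <sys/queue.h>. */
TAILQ_HEAD(demo_tailq_bucket, demo_callout);	/* first pointer + last pointer */
LIST_HEAD(demo_list_bucket, demo_callout);	/* first pointer only */

/* Compile-time check of the layouts the size estimate relies on. */
static_assert(sizeof(struct demo_tailq_bucket) == 2 * sizeof(void *),
    "TAILQ head is expected to be two pointers");
static_assert(sizeof(struct demo_list_bucket) == sizeof(void *),
    "LIST head is expected to be one pointer");

/* Bytes taken by a wheel of nslots buckets using TAILQ heads. */
size_t
demo_wheel_bytes_tailq(size_t nslots)
{
	return (nslots * sizeof(struct demo_tailq_bucket));
}

/* Bytes taken by a wheel of nslots buckets using LIST heads. */
size_t
demo_wheel_bytes_list(size_t nslots)
{
	return (nslots * sizeof(struct demo_list_bucket));
}
```

Calling demo_wheel_bytes_tailq(n) and demo_wheel_bytes_list(n) for the same n shows the LIST variant using half the memory, on both 32-bit and 64-bit pointer sizes. The trade-off is that LIST has no tail pointer, so appending at the end of a bucket is no longer O(1).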