Date: Wed, 2 Jan 2013 11:57:30 +0100
From: Luigi Rizzo
To: Alexander Motin
Cc: Davide Italiano, Ian Lepore, FreeBSD Current, Marius Strobl, freebsd-arch@freebsd.org
Subject: Re: [RFC/RFT] calloutng
Message-ID: <20130102105730.GA42542@onelab2.iet.unipi.it>

On Mon, Dec 31, 2012 at 12:17:27PM +0200, Alexander Motin wrote:
> On 31.12.2012 08:17, Luigi Rizzo wrote:
> >On Sun, Dec 30, 2012 at 04:13:43PM -0700, Ian Lepore wrote:
...
> >>Then I noticed you had a 12_26 patchset so I tested
> >>that (after crudely fixing a couple uninitialized var warnings), and it
> >>all looks good on this arm (Raspberry Pi). I'll attach the results.
> >>
> >>It's so sweet to be able to do precision sleeps.
>
> Thank you for testing, Ian.
>
> >interesting numbers, but there seems to be some problem in computing
> >the exact interval; delays are much larger than expected.
> >
> >In this test, the original timer code used to round to the next multiple
> >of 1 tick and then add another tick (except for the kqueue case),
> >which is exactly what you see in the second set of measurements.
> >
> >The calloutng code however seems to do something odd:
> >in addition to fixed overhead (some 50us, which you can see in
> >the tests for 1us and 300us), all delays seem to be ~10% larger
> >than what is requested, upper bounded to 10ms (note, the
> >numbers are averages so i cannot tell whether all samples are
> >the same or there is some distribution of values).
> >
> >I am not sure if this error is peculiar to the ARM version or also
> >appears on x86/amd64 but I believe it should be fixed.
> >
> >If you look at the results below:
> >
> >1us possibly ok:
> >    for very short intervals i would expect some kind
> >    of 'reschedule' without actually firing a timer; maybe
> >    50us are what it takes to do a round through the scheduler ?
> >
> >300us probably ok
> >    i guess the extra 50-90us are what it takes to do a round
> >    through the scheduler
> >
> >1000us borderline (this is the case for poll and kqueue, which are
> >    rounded to 1ms)
> >    here intervals seem to be increased by 10%, and i cannot see
> >    a good reason for this (more below).
> >
> >3000us and above: wrong
> >    here again, the intervals seem to be 10% larger than what is
> >    requested, perhaps limiting the error to 10-20ms.
> >
> >
> >Maybe the 10% extension results from creating a default 'precision'
> >for legacy calls, but i do not think this is done correctly.
> >
> >First of all, if users do not specify a precision themselves, the
> >automatically generated value should never exceed one tick.
> >
> >Second, the only point of a 'precision' parameter is to merge
> >requests that may be close in time, so if there is already a
> >timer scheduled within [Treq, Treq+precision] i will get it;
> >but if there is no pending timer, then one should schedule it
> >for the requested interval.
> >
> >Davide/Alexander, any ideas ?
>
> All mentioned effects could be explained with implemented logic. 50us at
> 1us is probably sum of minimal latency of the hardware eventtimer on the
> specific platform and some software processing overhead (syscall,
> callout, timecounters, scheduler, etc). At later points system starts to
> noticeably use precision specified by kern.timecounter.alloweddeviation
> sysctl. It affects results from two sides: 1) extending intervals for
> specified percent of time to allow event aggregation, and 2) choosing
> time base between fast getbinuptime() and precise binuptime(). Extending
> interval is needed to aggregate not only callouts with each other, but
> also callouts with other system events, which are impossible to schedule
> in advance. It gives specified relative error, but no more than one CPU
> wakeup period in absolute: for busy CPU (not skipping hardclock() ticks)
> it is 1/hz, for completely idle one it can be up to 0.5s. Second point
> allows to reduce processing overhead by the cost of error up to 1/hz for
> long periods (>(100/allowed)*(1/hz)), when it is used.

i am not sure what you mean by "extending interval", but i believe the
logic should be the following:

- say user requests a timeout after X seconds and with a tolerance
  of D seconds (both X and D are fractional, so they can be short).
  Interpret this as "the system should do its best to generate an
  event between X and X+D seconds"

- convert X to an absolute time, T_X

- if there are any pending events already scheduled between T_X and T_X+D,
  then by definition they are acceptable. Attach the requested timeout
  to the earliest of these events.

- otherwise, schedule an event at time T_X (because there is no valid
  reason to generate a late event, and it makes no sense from an
  energy saving standpoint, either -- see below).

It seems to me that you are instead extending the requested interval
upfront, which causes some gratuitous pessimizations in scheduling
the callout.

Re. energy savings: the gain in extending the timeout cannot exceed
the value D/X. So while it may make sense to extend a 1us request to
50us to go (theoretically) from 1M callouts/s to 20K callouts/s,
it is completely pointless from an energy saving standpoint to
introduce a 10ms error on a 300ms request.

(even though i hate the idea that a 1us request defaults to a 50us
delay; but that is hopefully something that can be tuned in a
platform-specific way and perhaps at runtime).

cheers
luigi

> To get best possible precision kern.timecounter.alloweddeviation sysctl
> can be set to smaller value. Setting it to 0 will effectively disable
> all optimizations, but should give 50us precision in all cases.
>
> >>for t in 1 300 3000 30000 300000 ; do
> >>    for m in select poll usleep nanosleep kqueue kqueueto syscall ; do
> >>        ./testsleep $t $m
> >>    done
> >>done
> >>
> >>
> >>With calloutng_12_26.patch...
> >>
> >>                      HZ=100            HZ=250           HZ=1000
> >>----------  ----------------  ----------------  ----------------
> >>select           1     55.79       1     50.96       1     61.32
> >>poll             1   1109.46       1   1107.86       1   1114.38
> >>usleep           1     56.33       1     72.90       1     62.78
> >>nanosleep        1     52.66       1     55.23       1     64.23
> >>kqueue           1   1114.23       1   1113.81       1   1121.21
> >>kqueueto         1     65.44       1     71.00       1     75.01
> >>syscall          1      4.70       1      4.45       1      4.55
> >>select         300    355.79     300    357.76     300    362.35
> >>poll           300   1107.85     300   1122.55     300   1115.62
> >>usleep         300    355.28     300    357.28     300    360.79
> >>nanosleep      300    354.49     300    355.82     300    360.62
> >>kqueue         300   1112.57     300   1118.13     300   1117.16
> >>kqueueto       300    375.98     300    378.62     300    395.61
> >>syscall        300      4.41     300      4.45     300      4.54
> >>select        3000   3246.75    3000   3246.74    3000   3252.72
> >>poll          3000   3238.10    3000   3229.12    3000   3250.10
> >>usleep        3000   3242.47    3000   3237.06    3000   3249.61
> >>nanosleep     3000   3238.79    3000   3231.55    3000   3248.11
> >>kqueue        3000   3240.01    3000   3236.07    3000   3247.60
> >>kqueueto      3000   3265.36    3000   3267.22    3000   3274.96
> >>syscall       3000      4.69    3000      4.44    3000      4.50
> >>select       30000  31714.60   30000  31941.17   30000  32467.69
> >>poll         30000  31522.76   30000  31983.00   30000  32497.81
> >>usleep       30000  31459.67   30000  31980.76   30000  32458.71
> >>nanosleep    30000  31431.02   30000  31982.22   30000  32525.20
> >>kqueue       30000  31466.75   30000  31873.90   30000  31973.54
> >>kqueueto     30000  31564.67   30000  32522.35   30000  32475.59
> >>syscall      30000      4.70   30000      4.73   30000      4.89
> >>select      300000 319133.02  300000 311562.33  300000 309918.62
> >>poll        300000 319604.27  300000 311422.94  300000 310000.76
> >>usleep      300000 319314.60  300000 311269.69  300000 309996.34
> >>nanosleep   300000 319497.58  300000 311425.40  300000 309997.13
> >>kqueue      300000 309995.55  300000 303980.27  300000 309908.82
> >>kqueueto    300000 319505.88  300000 311424.97  300000 309996.16
> >>syscall     300000      4.41  300000      4.45  300000      4.89
> >>
> >>
> >>With no patches...
> >>
> >>                      HZ=100            HZ=250           HZ=1000
> >>----------  ----------------  ----------------  ----------------
> >>select           1  19941.70       1   7989.10       1   1999.16
> >>poll             1  19904.61       1   7987.32       1   1999.78
> >>usleep           1  19904.95       1   7993.30       1   1999.96
> >>nanosleep        1  19905.64       1   7993.71       1   1999.72
> >>kqueue           1  10001.61       1   4004.00       1   1000.27
> >>kqueueto         1  19904.00       1   7993.03       1   1999.54
> >>syscall          1      4.04       1      4.05       1      4.75
> >>select         300  19904.66     300   7998.39     300   2000.27
> >>poll           300  19904.35     300   7993.47     300   1999.86
> >>usleep         300  19903.96     300   7994.11     300   1999.81
> >>nanosleep      300  19904.48     300   7993.77     300   1999.80
> >>kqueue         300  10001.68     300   4004.18     300   1000.31
> >>kqueueto       300  19997.86     300   7993.37     300   1999.59
> >>syscall        300      4.01     300      4.00     300      4.32
> >>select        3000  19904.80    3000   7998.85    3000   3998.43
> >>poll          3000  19904.92    3000   8005.93    3000   3999.39
> >>usleep        3000  19904.50    3000   7992.88    3000   3999.44
> >>nanosleep     3000  19904.84    3000   7993.34    3000   3999.36
> >>kqueue        3000  10001.58    3000   4003.97    3000   3000.72
> >>kqueueto      3000  19903.56    3000   7993.24    3000   3999.34
> >>syscall       3000      4.02    3000      4.37    3000      4.29
> >>select       30000  39905.02   30000  35991.79   30000  31051.77
> >>poll         30000  39905.49   30000  35980.35   30000  30995.64
> >>usleep       30000  39903.78   30000  35979.48   30000  30995.23
> >>nanosleep    30000  39904.55   30000  35981.61   30000  30995.87
> >>kqueue       30000  30002.73   30000  32019.54   30000  30004.83
> >>kqueueto     30000  39903.59   30000  35979.64   30000  30996.05
> >>syscall      30000      4.44   30000      4.04   30000      4.31
> >>select      300000 310001.23  300000 303995.86  300000 300994.30
> >>poll        300000 309902.73  300000 303981.58  300000 300996.17
> >>usleep      300000 309903.64  300000 303980.17  300000 300997.42
> >>nanosleep   300000 309903.32  300000 303980.36  300000 300993.64
> >>kqueue      300000 300002.77  300000 300019.46  300000 300006.90
> >>kqueueto    300000 309903.31  300000 303978.10  300000 300996.84
> >>syscall     300000      4.01  300000      4.04  300000      4.29
> >
> --
> Alexander Motin