From owner-freebsd-arch@FreeBSD.ORG Wed Jan 2 10:58:35 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 14FE2A14; Wed, 2 Jan 2013 10:58:35 +0000 (UTC) (envelope-from luigi@onelab2.iet.unipi.it) Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238]) by mx1.freebsd.org (Postfix) with ESMTP id 6D1D68FC08; Wed, 2 Jan 2013 10:58:34 +0000 (UTC) Received: by onelab2.iet.unipi.it (Postfix, from userid 275) id 3209473027; Wed, 2 Jan 2013 11:57:30 +0100 (CET) Date: Wed, 2 Jan 2013 11:57:30 +0100 From: Luigi Rizzo To: Alexander Motin Subject: Re: [RFC/RFT] calloutng Message-ID: <20130102105730.GA42542@onelab2.iet.unipi.it> References: <50CCAB99.4040308@FreeBSD.org> <50CE5B54.3050905@FreeBSD.org> <50D03173.9080904@FreeBSD.org> <20121225232126.GA47692@alchemy.franken.de> <50DB4EFE.2020600@FreeBSD.org> <1356909223.54953.74.camel@revolution.hippie.lan> <20121231061735.GA5866@onelab2.iet.unipi.it> <50E16637.9070501@FreeBSD.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <50E16637.9070501@FreeBSD.org> User-Agent: Mutt/1.4.2.3i Cc: Davide Italiano , Ian Lepore , FreeBSD Current , Marius Strobl , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Jan 2013 10:58:35 -0000 On Mon, Dec 31, 2012 at 12:17:27PM +0200, Alexander Motin wrote: > On 31.12.2012 08:17, Luigi Rizzo wrote: > >On Sun, Dec 30, 2012 at 04:13:43PM -0700, Ian Lepore wrote: ... > >>Then I noticed you had a 12_26 patchset so I tested > >>that (after crudely fixing a couple uninitialized var warnings), and it > >>all looks good on this arm (Raspberry Pi). I'll attach the results. 
> >> > >>It's so sweet to be able to do precision sleeps. > > Thank you for testing, Ian. > > >interesting numbers, but there seems to be some problem in computing > >the exact interval; delays are much larger than expected. > > > >In this test, the original timer code used to round to the next multiple > >of 1 tick and then add another tick (except for the kqueue case), > >which is exactly what you see in the second set of measurements. > > > >The calloutng code however seems to do something odd: > >in addition to fixed overhead (some 50us, which you can see in > >the tests for 1us and 300us), all delay seem to be ~10% larger > >than what is requested, upper bounded to 10ms (note, the > >numbers are averages so i cannot tell whether all samples are > >the same or there is some distribution of values). > > > >I am not sure if this error is peculiar of the ARM version or also > >appears on x86/amd64 but I believe it should be fixed. > > > >If you look at the results below: > > > >1us possily ok: > > for very short intervals i would expect some kind > > of 'reschedule' without actually firing a timer; maybe > > 50us are what it takes to do a round through the scheduler ? > > > >300us probably ok > > i guess the extra 50-90us are what it takes to do a round > > through the scheduler > > > >1000us borderline (this is the case for poll and kqueue, which are > > rounded to 1ms) > > here intervals seem to be increased by 10%, and i cannot see > > a good reason for this (more below). > > > >3000us and above: wrong > > here again, the intervals seem to be 10% larger than what is > > requested, perhaps limiting the error to 10-20ms. > > > > > >Maybe the 10% extension results from creating a default 'precision' > >for legacy calls, but i do not think this is done correctly. > > > >First of all, if users do not specify a precision themselves, the > >automatically generated value should never exceed one tick. 
> > > >Second, the only point of a 'precision' parameter is to merge > >requests that may be close in time, so if there is already a > >timer scheduled within [Treq, Treq+precision] i will get it; > >but if there no pending timer, then one should schedule it > >for the requested interval. > > > >Davide/Alexander, any ideas ? > > All mentioned effects could be explained with implemented logic. 50us at > 1us is probably sum of minimal latency of the hardware eventtimer on the > specific platform and some software processing overhead (syscall, > callout, timecouters, scheduler, etc). At later points system starts to > noticeably use precision specified by kern.timecounter.alloweddeviation > sysctl. It affects results from two sides: 1) extending intervals for > specified percent of time to allow event aggregation, and 2) choosing > time base between fast getbinuptime() and precise binuptime(). Extending > interval is needed to aggregate not only callouts with each other, but > also callouts with other system events, which are impossible to schedule > in advance. It gives specified relative error, but no more then one CPU > wakeup period in absolute: for busy CPU (not skipping hardclock() ticks) > it is 1/hz, for completely idle one it can be up to 0.5s. Second point > allows to reduce processing overhead by the cost of error up to 1/hz for > long periods (>(100/allowed)*(1/hz)), when it is used. i am not sure what you mean by "extending interval", but i believe the logic should be the following: - say user requests a timeout after X seconds and with a tolerance of D second (both X and D are fractional, so they can be short). Interpret this as "the system should do its best to generate an event between X and X+D seconds" - convert X to an absolute time, T_X - if there are any pending events already scheduled between T_X and T_X+D, then by definition they are acceptable. Attach the requested timeout to the earliest of these events. 
- otherwise, schedule an event at time T_X (because there is no valid
  reason to generate a late event, and it makes no sense from an
  energy saving standpoint, either -- see below).

It seems to me that you are instead extending the requested interval
upfront, which causes some gratuitous pessimizations in scheduling
the callout.

Re. energy savings: the gain in extending the timeout cannot exceed
the value D/X. So while it may make sense to extend a 1us request
to 50us to go (theoretically) from 1M callouts/s to 20K callouts/s,
it is completely pointless from an energy saving standpoint to
introduce a 10ms error on a 300ms request.

(even though i hate the idea that a 1us request defaults to
a 50us delay; but that is hopefully something that can be tuned
in a platform-specific way and perhaps at runtime).

cheers
luigi

> To get best possible precision kern.timecounter.alloweddeviation sysctl
> can be set to smaller value. Setting it to 0 will effectively disable
> all optimizations, but should give 50us precision in all cases.
>
> >>for t in 1 300 3000 30000 300000 ; do
> >>  for m in select poll usleep nanosleep kqueue kqueueto syscall ; do
> >>    ./testsleep $t $m
> >>  done
> >>done
> >>
> >>
> >>With calloutng_12_26.patch...
> >>
> >>                HZ=100           HZ=250          HZ=1000
> >>---------- ---------------- ---------------- ----------------
> >>select          1     55.79      1     50.96      1     61.32
> >>poll            1   1109.46      1   1107.86      1   1114.38
> >>usleep          1     56.33      1     72.90      1     62.78
> >>nanosleep       1     52.66      1     55.23      1     64.23
> >>kqueue          1   1114.23      1   1113.81      1   1121.21
> >>kqueueto        1     65.44      1     71.00      1     75.01
> >>syscall         1      4.70      1      4.45      1      4.55
> >>select        300    355.79    300    357.76    300    362.35
> >>poll          300   1107.85    300   1122.55    300   1115.62
> >>usleep        300    355.28    300    357.28    300    360.79
> >>nanosleep     300    354.49    300    355.82    300    360.62
> >>kqueue        300   1112.57    300   1118.13    300   1117.16
> >>kqueueto      300    375.98    300    378.62    300    395.61
> >>syscall       300      4.41    300      4.45    300      4.54
> >>select       3000   3246.75   3000   3246.74   3000   3252.72
> >>poll         3000   3238.10   3000   3229.12   3000   3250.10
> >>usleep       3000   3242.47   3000   3237.06   3000   3249.61
> >>nanosleep    3000   3238.79   3000   3231.55   3000   3248.11
> >>kqueue       3000   3240.01   3000   3236.07   3000   3247.60
> >>kqueueto     3000   3265.36   3000   3267.22   3000   3274.96
> >>syscall      3000      4.69   3000      4.44   3000      4.50
> >>select      30000  31714.60  30000  31941.17  30000  32467.69
> >>poll        30000  31522.76  30000  31983.00  30000  32497.81
> >>usleep      30000  31459.67  30000  31980.76  30000  32458.71
> >>nanosleep   30000  31431.02  30000  31982.22  30000  32525.20
> >>kqueue      30000  31466.75  30000  31873.90  30000  31973.54
> >>kqueueto    30000  31564.67  30000  32522.35  30000  32475.59
> >>syscall     30000      4.70  30000      4.73  30000      4.89
> >>select     300000 319133.02 300000 311562.33 300000 309918.62
> >>poll       300000 319604.27 300000 311422.94 300000 310000.76
> >>usleep     300000 319314.60 300000 311269.69 300000 309996.34
> >>nanosleep  300000 319497.58 300000 311425.40 300000 309997.13
> >>kqueue     300000 309995.55 300000 303980.27 300000 309908.82
> >>kqueueto   300000 319505.88 300000 311424.97 300000 309996.16
> >>syscall    300000      4.41 300000      4.45 300000      4.89
> >>
> >>
> >>With no patches...
> >>
> >>                HZ=100           HZ=250          HZ=1000
> >>---------- ---------------- ---------------- ----------------
> >>select          1  19941.70      1   7989.10      1   1999.16
> >>poll            1  19904.61      1   7987.32      1   1999.78
> >>usleep          1  19904.95      1   7993.30      1   1999.96
> >>nanosleep       1  19905.64      1   7993.71      1   1999.72
> >>kqueue          1  10001.61      1   4004.00      1   1000.27
> >>kqueueto        1  19904.00      1   7993.03      1   1999.54
> >>syscall         1      4.04      1      4.05      1      4.75
> >>select        300  19904.66    300   7998.39    300   2000.27
> >>poll          300  19904.35    300   7993.47    300   1999.86
> >>usleep        300  19903.96    300   7994.11    300   1999.81
> >>nanosleep     300  19904.48    300   7993.77    300   1999.80
> >>kqueue        300  10001.68    300   4004.18    300   1000.31
> >>kqueueto      300  19997.86    300   7993.37    300   1999.59
> >>syscall       300      4.01    300      4.00    300      4.32
> >>select       3000  19904.80   3000   7998.85   3000   3998.43
> >>poll         3000  19904.92   3000   8005.93   3000   3999.39
> >>usleep       3000  19904.50   3000   7992.88   3000   3999.44
> >>nanosleep    3000  19904.84   3000   7993.34   3000   3999.36
> >>kqueue       3000  10001.58   3000   4003.97   3000   3000.72
> >>kqueueto     3000  19903.56   3000   7993.24   3000   3999.34
> >>syscall      3000      4.02   3000      4.37   3000      4.29
> >>select      30000  39905.02  30000  35991.79  30000  31051.77
> >>poll        30000  39905.49  30000  35980.35  30000  30995.64
> >>usleep      30000  39903.78  30000  35979.48  30000  30995.23
> >>nanosleep   30000  39904.55  30000  35981.61  30000  30995.87
> >>kqueue      30000  30002.73  30000  32019.54  30000  30004.83
> >>kqueueto    30000  39903.59  30000  35979.64  30000  30996.05
> >>syscall     30000      4.44  30000      4.04  30000      4.31
> >>select     300000 310001.23 300000 303995.86 300000 300994.30
> >>poll       300000 309902.73 300000 303981.58 300000 300996.17
> >>usleep     300000 309903.64 300000 303980.17 300000 300997.42
> >>nanosleep  300000 309903.32 300000 303980.36 300000 300993.64
> >>kqueue     300000 300002.77 300000 300019.46 300000 300006.90
> >>kqueueto   300000 309903.31 300000 303978.10 300000 300996.84
> >>syscall    300000      4.01 300000      4.04 300000      4.29
> >
>
> --
> Alexander Motin

From owner-freebsd-arch@FreeBSD.ORG Wed Jan 2 11:24:37 2013 Return-Path: Delivered-To:
freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id A3C63FF3; Wed, 2 Jan 2013 11:24:37 +0000 (UTC) (envelope-from mavbsd@gmail.com) Received: from mail-ea0-f178.google.com (mail-ea0-f178.google.com [209.85.215.178]) by mx1.freebsd.org (Postfix) with ESMTP id C58628FC14; Wed, 2 Jan 2013 11:24:36 +0000 (UTC) Received: by mail-ea0-f178.google.com with SMTP id k11so5884314eaa.23 for ; Wed, 02 Jan 2013 03:24:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:sender:message-id:date:from:user-agent:mime-version:to :cc:subject:references:in-reply-to:content-type :content-transfer-encoding; bh=6epBEEoEP6d+FP+YxO1bxtw+aDC9aFKnX7YDOkS3OPg=; b=d9FZUY+/cGhYonLELHujxH3ivyxhBqHU011TQ+2uVwdsHidt6JBQKEWn+VOdAkC7JR Ll8+wvIarXcXDIafVX9eDkk0RTvASajbSuMq0yxW50qcNwplgiR+8zB1MXGQh1zT+7BG nmKloM8yresyJvzireaFXksaCzon6SrFTILACgXvTmVSqQsQC4V7N3vodFg2rQw9Ptmz 9iF4D4+6fON8X16/LzMHfDp4F48GmI4spkH2Z3WZhkAEPnhUS/rL9KMzsqx5wsE8xVQ8 d2IilXIJMW4pS3TvyCe05eZiBr8mYishaKC5jM/CYa1ZOOIr4N8xhOi9B8SD+okGh0ce +yGQ== X-Received: by 10.14.215.6 with SMTP id d6mr124479099eep.40.1357125869943; Wed, 02 Jan 2013 03:24:29 -0800 (PST) Received: from mavbook.mavhome.dp.ua ([91.198.175.1]) by mx.google.com with ESMTPS id 43sm96922786eed.10.2013.01.02.03.24.27 (version=TLSv1/SSLv3 cipher=OTHER); Wed, 02 Jan 2013 03:24:28 -0800 (PST) Sender: Alexander Motin Message-ID: <50E418EA.7030801@FreeBSD.org> Date: Wed, 02 Jan 2013 13:24:26 +0200 From: Alexander Motin User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:13.0) Gecko/20120628 Thunderbird/13.0.1 MIME-Version: 1.0 To: Luigi Rizzo Subject: Re: [RFC/RFT] calloutng References: <50CCAB99.4040308@FreeBSD.org> <50CE5B54.3050905@FreeBSD.org> <50D03173.9080904@FreeBSD.org> <20121225232126.GA47692@alchemy.franken.de> <50DB4EFE.2020600@FreeBSD.org> <1356909223.54953.74.camel@revolution.hippie.lan> <20121231061735.GA5866@onelab2.iet.unipi.it> 
<50E16637.9070501@FreeBSD.org> <20130102105730.GA42542@onelab2.iet.unipi.it> In-Reply-To: <20130102105730.GA42542@onelab2.iet.unipi.it> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Davide Italiano , Ian Lepore , FreeBSD Current , Marius Strobl , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Jan 2013 11:24:37 -0000 On 02.01.2013 12:57, Luigi Rizzo wrote: > On Mon, Dec 31, 2012 at 12:17:27PM +0200, Alexander Motin wrote: >> On 31.12.2012 08:17, Luigi Rizzo wrote: >>> On Sun, Dec 30, 2012 at 04:13:43PM -0700, Ian Lepore wrote: > ... >>>> Then I noticed you had a 12_26 patchset so I tested >>>> that (after crudely fixing a couple uninitialized var warnings), and it >>>> all looks good on this arm (Raspberry Pi). I'll attach the results. >>>> >>>> It's so sweet to be able to do precision sleeps. >> >> Thank you for testing, Ian. >> >>> interesting numbers, but there seems to be some problem in computing >>> the exact interval; delays are much larger than expected. >>> >>> In this test, the original timer code used to round to the next multiple >>> of 1 tick and then add another tick (except for the kqueue case), >>> which is exactly what you see in the second set of measurements. >>> >>> The calloutng code however seems to do something odd: >>> in addition to fixed overhead (some 50us, which you can see in >>> the tests for 1us and 300us), all delay seem to be ~10% larger >>> than what is requested, upper bounded to 10ms (note, the >>> numbers are averages so i cannot tell whether all samples are >>> the same or there is some distribution of values). >>> >>> I am not sure if this error is peculiar of the ARM version or also >>> appears on x86/amd64 but I believe it should be fixed. 
>>> >>> If you look at the results below: >>> >>> 1us possily ok: >>> for very short intervals i would expect some kind >>> of 'reschedule' without actually firing a timer; maybe >>> 50us are what it takes to do a round through the scheduler ? >>> >>> 300us probably ok >>> i guess the extra 50-90us are what it takes to do a round >>> through the scheduler >>> >>> 1000us borderline (this is the case for poll and kqueue, which are >>> rounded to 1ms) >>> here intervals seem to be increased by 10%, and i cannot see >>> a good reason for this (more below). >>> >>> 3000us and above: wrong >>> here again, the intervals seem to be 10% larger than what is >>> requested, perhaps limiting the error to 10-20ms. >>> >>> >>> Maybe the 10% extension results from creating a default 'precision' >>> for legacy calls, but i do not think this is done correctly. >>> >>> First of all, if users do not specify a precision themselves, the >>> automatically generated value should never exceed one tick. >>> >>> Second, the only point of a 'precision' parameter is to merge >>> requests that may be close in time, so if there is already a >>> timer scheduled within [Treq, Treq+precision] i will get it; >>> but if there no pending timer, then one should schedule it >>> for the requested interval. >>> >>> Davide/Alexander, any ideas ? >> >> All mentioned effects could be explained with implemented logic. 50us at >> 1us is probably sum of minimal latency of the hardware eventtimer on the >> specific platform and some software processing overhead (syscall, >> callout, timecouters, scheduler, etc). At later points system starts to >> noticeably use precision specified by kern.timecounter.alloweddeviation >> sysctl. It affects results from two sides: 1) extending intervals for >> specified percent of time to allow event aggregation, and 2) choosing >> time base between fast getbinuptime() and precise binuptime(). 
Extending
>> interval is needed to aggregate not only callouts with each other, but
>> also callouts with other system events, which are impossible to schedule
>> in advance. It gives specified relative error, but no more than one CPU
>> wakeup period in absolute: for busy CPU (not skipping hardclock() ticks)
>> it is 1/hz, for completely idle one it can be up to 0.5s. Second point
>> allows to reduce processing overhead by the cost of error up to 1/hz for
>> long periods (>(100/allowed)*(1/hz)), when it is used.
>
> i am not sure what you mean by "extending interval", but i believe the
> logic should be the following:
>
> - say user requests a timeout after X seconds and with a tolerance of D seconds
>   (both X and D are fractional, so they can be short). Interpret this as
>
>   "the system should do its best to generate an event between X and X+D seconds"
>
> - convert X to an absolute time, T_X
>
> - if there are any pending events already scheduled between T_X and T_X+D,
>   then by definition they are acceptable. Attach the requested timeout
>   to the earliest of these events.

All of the above is true, but not what follows.

> - otherwise, schedule an event at time T_X (because there is no valid
>   reason to generate a late event, and it makes no sense from an
>   energy saving standpoint, either -- see below).

The system may have many interrupts besides the timer: network, disk, ...
WiFi cards generate interrupts at the AP beacon rate -- dozens of times per
second. It is not very efficient to wake up the CPU precisely at time T_X,
which may be just 100us earlier than the next hardware interrupt. That's why
timer interrupts are scheduled at min(T_X+D, 0.5s, next hardclock, next
statclock, ...). As a result, the event will be handled within the allowed
range, but the real delay will depend on current environment conditions.

> It seems to me that you are instead extending the requested interval
> upfront, which causes some gratuitous pessimizations in scheduling
> the callout.
>
> Re.
energy savings: the gain in extending the timeout cannot exceed
> the value D/X. So while it may make sense to extend a 1us request
> to 50us to go (theoretically) from 1M callouts/s to 20K callouts/s,
> it is completely pointless from an energy saving standpoint to
> introduce a 10ms error on a 300ms request.

I am not so sure about this. When the CPU package is in the C7 sleep state,
with all buses and caches shut down and memory set to self refresh, it
consumes very little power (a few milliwatts). Waking up from that state
takes 100us or even more, with power consumption much higher than the normal
operational one. Sure, if we compare it with the power consumption at 100%
CPU load, the difference between 10 and 100 wakeups per second may be small,
but when comparing the two to each other in a low-power environment on a
mostly idle system, it may be much more significant.

> (even though i hate the idea that a 1us request defaults to
> a 50us delay; but that is hopefully something that can be tuned
> in a platform-specific way and perhaps at runtime).

It is 50us on this ARM. On SandyBridge Core i7 it is only about 2us.
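
The two placement policies under discussion in this message can be sketched
as below. This is only an illustration and is not taken from the calloutng
patch: the function names, the plain list of pending wakeup times, and the
use of floating-point seconds are all assumptions made for the example.

```python
# Hypothetical sketch; none of these names exist in the FreeBSD kernel.
# Times are floats in seconds; `pending` lists already-scheduled wakeups.

def place_reuse_or_exact(t_x, d, pending):
    """Reuse the earliest pending event in [t_x, t_x + d];
    if there is none, program a new timer event at t_x."""
    candidates = [t for t in pending if t_x <= t <= t_x + d]
    return min(candidates) if candidates else t_x


def place_reuse_or_late(t_x, d, pending):
    """Reuse the earliest pending event in [t_x, t_x + d];
    if there is none, program the timer for the end of the window so
    that an unpredictable interrupt may still absorb the callout."""
    candidates = [t for t in pending if t_x <= t <= t_x + d]
    return min(candidates) if candidates else t_x + d


def fire_time(t_x, programmed, external_irqs):
    """Time at which the callout actually runs: the first external wakeup
    (network, disk, ...) at or after t_x that arrives no later than the
    programmed timer interrupt, otherwise the timer interrupt itself."""
    early = [t for t in external_irqs if t_x <= t <= programmed]
    return min(early) if early else programmed
```

With a 300ms request, a 10ms tolerance and an unrelated interrupt arriving
100us after T_X, the first policy wakes the CPU at T_X and then again for
the interrupt, while the second lets the interrupt carry the callout, so
the CPU wakes once.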
-- Alexander Motin From owner-freebsd-arch@FreeBSD.ORG Wed Jan 2 12:28:41 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id A35DFC3E; Wed, 2 Jan 2013 12:28:41 +0000 (UTC) (envelope-from luigi@onelab2.iet.unipi.it) Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238]) by mx1.freebsd.org (Postfix) with ESMTP id 1E42F8FC12; Wed, 2 Jan 2013 12:28:39 +0000 (UTC) Received: by onelab2.iet.unipi.it (Postfix, from userid 275) id 9B8E87300A; Wed, 2 Jan 2013 13:27:43 +0100 (CET) Date: Wed, 2 Jan 2013 13:27:43 +0100 From: Luigi Rizzo To: Alexander Motin Subject: Re: [RFC/RFT] calloutng Message-ID: <20130102122743.GA43241@onelab2.iet.unipi.it> References: <50CCAB99.4040308@FreeBSD.org> <50CE5B54.3050905@FreeBSD.org> <50D03173.9080904@FreeBSD.org> <20121225232126.GA47692@alchemy.franken.de> <50DB4EFE.2020600@FreeBSD.org> <1356909223.54953.74.camel@revolution.hippie.lan> <20121231061735.GA5866@onelab2.iet.unipi.it> <50E16637.9070501@FreeBSD.org> <20130102105730.GA42542@onelab2.iet.unipi.it> <50E418EA.7030801@FreeBSD.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <50E418EA.7030801@FreeBSD.org> User-Agent: Mutt/1.4.2.3i Cc: Davide Italiano , Ian Lepore , FreeBSD Current , Marius Strobl , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Jan 2013 12:28:41 -0000 On Wed, Jan 02, 2013 at 01:24:26PM +0200, Alexander Motin wrote: > On 02.01.2013 12:57, Luigi Rizzo wrote: ... 
> >i am not sure what you mean by "extending interval", but i believe the
> >logic should be the following:
> >
> >- say user requests a timeout after X seconds and with a tolerance of
> >  D seconds (both X and D are fractional, so they can be short).
> >  Interpret this as
> >
> >    "the system should do its best to generate an event between X and
> >    X+D seconds"
> >
> >- convert X to an absolute time, T_X
> >
> >- if there are any pending events already scheduled between T_X and T_X+D,
> >  then by definition they are acceptable. Attach the requested timeout
> >  to the earliest of these events.
>
> All of the above is true, but not what follows.
>
> >- otherwise, schedule an event at time T_X (because there is no valid
> >  reason to generate a late event, and it makes no sense from an
> >  energy saving standpoint, either -- see below).
>
> The system may have many interrupts besides the timer: network, disk, ...
> WiFi cards generate interrupts at the AP beacon rate -- dozens of times
> per second. It is not very efficient to wake up the CPU precisely at
> time T_X, which may be just 100us earlier than the next hardware
> interrupt. That's why timer interrupts are scheduled at min(T_X+D, 0.5s,
> next hardclock, next statclock, ...). As a result, the event will be
> handled within the allowed range, but the real delay will depend on
> current environment conditions.

I don't see why system events (hardclock, statclock, 0.5s,...) need to
be treated specially -- and i am saying this also in the interest of
simplifying the logic of the code.

First of all, if you know that there is already a hardclock/statclock/*
scheduled in [T_X, T_X+D] you just reuse that. This particular bullet
was "no event scheduled in [T_X, T_X+D]" so you need to generate
a new one.

Surely scheduling the event at T_X+D instead of T_X increases the
chance of merging events. But the savings become smaller and smaller
as the value X increases.
This particular client will only change its request rate from 1/X
to 1/(X+D) so in relative terms the gain is

    ( 1/X - 1/(X+D) ) / ( 1/(X+D) ) = D/X

Example: if X = 300ms, and D = 10ms (as in the test case)
you just save one interrupt out of every 31 (about one every 9.3 seconds)
by scheduling at T_X+D instead of T_X. Are we actually able to measure
the difference ?

Even at high interrupt rates (e.g. X = 1ms) you are not going
to save a lot unless the tolerance D is very large, which is
generally undesirable for other reasons (presumably, applications
are not going to be happy if you artificially double their timeouts).

Now, say your application requests timeouts every X = 300ms.

> >It seems to me that you are instead extending the requested interval
> >upfront, which causes some gratuitous pessimizations in scheduling
> >the callout.
> >
> >Re. energy savings: the gain in extending the timeout cannot exceed
> >the value D/X. So while it may make sense to extend a 1us request
> >to 50us to go (theoretically) from 1M callouts/s to 20K callouts/s,
> >it is completely pointless from an energy saving standpoint to
> >introduce a 10ms error on a 300ms request.
>
> I am not so sure about this. When the CPU package is in the C7 sleep
> state, with all buses and caches shut down and memory set to self
> refresh, it consumes very little power (a few milliwatts). Waking up
> from that state takes 100us or even more, with power consumption much
> higher than the normal operational one. Sure, if we compare it with the
> power consumption at 100% CPU load, the difference between 10 and 100
> wakeups per second may be small, but when comparing the two to each
> other in a low-power environment on a mostly idle system, it may be
> much more significant.
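
The D/X estimate earlier in this message can be checked numerically. The
helper below is a hypothetical illustration (its name and signature are not
from the thread); it assumes a single periodic client whose period X is
stretched to X+D, with both values in seconds.

```python
def wakeups_saved(x, d):
    """Return (relative_gain, saved_per_second) for a periodic client
    whose period x is stretched to x + d (both in seconds)."""
    rate_exact = 1.0 / x        # wakeups/s when firing at T_X
    rate_late = 1.0 / (x + d)   # wakeups/s when firing at T_X + D
    # Gain relative to the new rate; algebraically this equals d / x.
    relative_gain = (rate_exact - rate_late) / rate_late
    return relative_gain, rate_exact - rate_late


# The two cases mentioned in the thread.
for x, d in ((0.300, 0.010), (1e-6, 49e-6)):
    gain, saved = wakeups_saved(x, d)
    print(f"X={x:g}s D={d:g}s: relative gain {gain:.4f}, "
          f"{saved:.3f} wakeups/s saved")
```

For X = 300ms and D = 10ms the relative gain is D/X = 1/30: the client goes
from about 3.33 to about 3.23 wakeups per second. For a 1us request
stretched to 50us the gain is 49, matching the drop from 1M to 20K
callouts/s.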
see above -- at low rates the difference is not measurable,
at high rates the only obvious answer is "do not use C7
if the next interrupt is due in less than 2..5 milliseconds"

> >(even though i hate the idea that a 1us request defaults to
> >a 50us delay; but that is hopefully something that can be tuned
> >in a platform-specific way and perhaps at runtime).
>
> It is 50us on this ARM. On SandyBridge Core i7 it is only about 2us.

very good, i suspected something similar, just wanted to be sure :)

cheers
luigi

> --
> Alexander Motin

From owner-freebsd-arch@FreeBSD.ORG Wed Jan 2 14:03:06 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 9388A5B5; Wed, 2 Jan 2013 14:03:06 +0000 (UTC) (envelope-from freebsd@damnhippie.dyndns.org) Received: from duck.symmetricom.us (duck.symmetricom.us [206.168.13.214]) by mx1.freebsd.org (Postfix) with ESMTP id 7BA3B8FC08; Wed, 2 Jan 2013 14:03:04 +0000 (UTC) Received: from damnhippie.dyndns.org (daffy.symmetricom.us [206.168.13.218]) by duck.symmetricom.us (8.14.5/8.14.5) with ESMTP id r02E2waO021163; Wed, 2 Jan 2013 07:02:58 -0700 (MST) (envelope-from freebsd@damnhippie.dyndns.org) Received: from [172.22.42.240] (revolution.hippie.lan [172.22.42.240]) by damnhippie.dyndns.org (8.14.3/8.14.3) with ESMTP id r02E2sIQ087850; Wed, 2 Jan 2013 07:02:54 -0700 (MST) (envelope-from freebsd@damnhippie.dyndns.org) Subject: Re: [RFC/RFT] calloutng From: Ian Lepore To: Alexander Motin In-Reply-To: References: <50CCAB99.4040308@FreeBSD.org> <50CE5B54.3050905@FreeBSD.org> <50D03173.9080904@FreeBSD.org> <20121225232126.GA47692@alchemy.franken.de> <50DB4EFE.2020600@FreeBSD.org> <1356909223.54953.74.camel@revolution.hippie.lan> <20121231061735.GA5866@onelab2.iet.unipi.it> <50E16637.9070501@FreeBSD.org> <20130102105730.GA42542@onelab2.iet.unipi.it> <50E418EA.7030801@FreeBSD.org> <20130102122743.GA43241@onelab2.iet.unipi.it>
Content-Type: text/plain; charset="koi8-r" Date: Wed, 02 Jan 2013 07:02:54 -0700 Message-ID: <1357135374.54953.150.camel@revolution.hippie.lan> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port Content-Transfer-Encoding: 8bit Cc: Davide Italiano , Marius Strobl , FreeBSD Current , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Jan 2013 14:03:06 -0000

On Wed, 2013-01-02 at 15:11 +0200, Alexander Motin wrote:
> On 02.01.2013 14:28, "Luigi Rizzo" wrote:
> >
> > On Wed, Jan 02, 2013 at 01:24:26PM +0200, Alexander Motin wrote:
> > > On 02.01.2013 12:57, Luigi Rizzo wrote:
> > First of all, if you know that there is already a hardclock/statclock/*
> > scheduled in [T_X, T_X+D] you just reuse that. This particular bullet
> > was "no event scheduled in [T_X, T_X+D]" so you need to generate
> > a new one.
>
> That is true, but my main point was about merging with external events,
> which I can't predict, and the only way to merge is to increase the sleep
> period, hoping for the best.
>

This really is the crux of the problem, because you can't *by default*
dispatch an event earlier than requested: that's just a violation of the
usual rules of precision timing (where you expect to be late but never
early).

Sometimes there is no need for such precision, and an early wakeup is no
more or less detrimental than a late wakeup. In fact, that may even be
the majority case.

I wonder if it might make sense to allow the precision specification to
indicate whether it needs traditional "never early" behavior or whether
it can be interpreted as "plus or minus this amount is fine." Like
maybe negative precision is interpreted as "plus or minus
abs(precision)" or something like that. Or maybe even the other way
around...
you get "plus or minus" precision by default and the few things that really care about precision timing have a way of indicating that. (But in that case the userland sleeps would have to assume the traditional behavior because that's how they've always been documented.) -- Ian From owner-freebsd-arch@FreeBSD.ORG Wed Jan 2 16:23:04 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 192E191B; Wed, 2 Jan 2013 16:23:04 +0000 (UTC) (envelope-from luigi@onelab2.iet.unipi.it) Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238]) by mx1.freebsd.org (Postfix) with ESMTP id BF0181616; Wed, 2 Jan 2013 16:23:03 +0000 (UTC) Received: by onelab2.iet.unipi.it (Postfix, from userid 275) id 622AA7300A; Wed, 2 Jan 2013 17:22:06 +0100 (CET) Date: Wed, 2 Jan 2013 17:22:06 +0100 From: Luigi Rizzo To: Alexander Motin Subject: Re: [RFC/RFT] calloutng Message-ID: <20130102162206.GA45701@onelab2.iet.unipi.it> References: <50D03173.9080904@FreeBSD.org> <20121225232126.GA47692@alchemy.franken.de> <50DB4EFE.2020600@FreeBSD.org> <1356909223.54953.74.camel@revolution.hippie.lan> <20121231061735.GA5866@onelab2.iet.unipi.it> <50E16637.9070501@FreeBSD.org> <20130102105730.GA42542@onelab2.iet.unipi.it> <50E418EA.7030801@FreeBSD.org> <20130102122743.GA43241@onelab2.iet.unipi.it> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i Cc: Davide Italiano , Ian Lepore , freebsd-arch@freebsd.org, FreeBSD Current , Marius Strobl X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Jan 2013 16:23:04 -0000 On Wed, Jan 02, 2013 at 03:11:05PM +0200, Alexander Motin wrote: ... 
> > First of all, if you know that there is already a hardclock/statclock/* > > scheduled in [T_X, T_X+D] you just reuse that. This particular bullet > > was ""no event scheduled in [T_X, T_X+D]" so you need to generate > > a new one. > > That is true, but my main point was about merging with external events, > which I can't predict and the only way to merge is increase sleep period, > hoping for better. ok, now i understand why you want to schedule for T_X+D. Probably one way to close this discussion would be to provide a sysctl so the sysadmin can decide which point in the interval to pick when there is no suitable callout already scheduled. cheers luigi -----------------------------------------+------------------------------- Prof. Luigi RIZZO, rizzo@iet.unipi.it . Dip. di Ing. dell'Informazione http://www.iet.unipi.it/~luigi/ . Universita` di Pisa TEL +39-050-2211611 . via Diotisalvi 2 Mobile +39-338-6809875 . 56122 PISA (Italy) -----------------------------------------+------------------------------- From owner-freebsd-arch@FreeBSD.ORG Wed Jan 2 17:01:52 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id CE7CF890; Wed, 2 Jan 2013 17:01:52 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-we0-f175.google.com (mail-we0-f175.google.com [74.125.82.175]) by mx1.freebsd.org (Postfix) with ESMTP id BB21B17C1; Wed, 2 Jan 2013 17:01:51 +0000 (UTC) Received: by mail-we0-f175.google.com with SMTP id z53so6928298wey.34 for ; Wed, 02 Jan 2013 09:01:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=MQ2/8MvxPXr3Mfaq7+CzCJq3Asbu/F7W+iECw1K5ZLk=; b=BG0WgUU5j933tfR8FChP08xQshElk+KiP04OcjMogmx3Pw/Gq2mznTwTWYCo20JrZJ 
kX5eC2kiO9HEve+tA1EXv9CuCaMJMfYX8PQjPZDxGgbH82r2rOVZwu07A6XqjcquMJ5F fs6Tf9b4R+hWb7fpm+Bez/RaxmVCahfYCFAZk5x/c/OnFhdyQk+HmZt+X3ZiwaB5Rj/K vquQ6pZdKlXmbgghrTDbEap0VfXHHCSD523G2sPyxbPpu1VsUKhA7inpg8Nk/yS4q9ND TykABw8tCyoTtfXqVew4tJxif4oG97T9Q7fHCghEPUSjk4X33+ZdYEla0MuhxcdzPcRy o1IA== MIME-Version: 1.0 Received: by 10.194.83.36 with SMTP id n4mr73372240wjy.59.1357142883202; Wed, 02 Jan 2013 08:08:03 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.217.57.9 with HTTP; Wed, 2 Jan 2013 08:08:03 -0800 (PST) In-Reply-To: <1357135374.54953.150.camel@revolution.hippie.lan> References: <50CCAB99.4040308@FreeBSD.org> <50CE5B54.3050905@FreeBSD.org> <50D03173.9080904@FreeBSD.org> <20121225232126.GA47692@alchemy.franken.de> <50DB4EFE.2020600@FreeBSD.org> <1356909223.54953.74.camel@revolution.hippie.lan> <20121231061735.GA5866@onelab2.iet.unipi.it> <50E16637.9070501@FreeBSD.org> <20130102105730.GA42542@onelab2.iet.unipi.it> <50E418EA.7030801@FreeBSD.org> <20130102122743.GA43241@onelab2.iet.unipi.it> <1357135374.54953.150.camel@revolution.hippie.lan> Date: Wed, 2 Jan 2013 08:08:03 -0800 X-Google-Sender-Auth: ziUubih1EPjyqc6VtzirwSbcEKY Message-ID: Subject: Re: [RFC/RFT] calloutng From: Adrian Chadd To: Ian Lepore Content-Type: text/plain; charset=ISO-8859-1 Cc: Davide Italiano , Alexander Motin , Marius Strobl , FreeBSD Current , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Jan 2013 17:01:52 -0000 .. I'm pretty damned sure we're going to need to enforce a "never earlier than X" latency. Is there a more detailed writeup of calloutng somewhere, besides David's slides? The wiki page is rather empty. Eg - I think this work does coalesce wakeups, right? Or it can? 
So when in low-power scenarios you can end up with lower-resolution callout periods, but many less CPU wakeups a second? (Do we actually _expose_ wakeups-per-second somewhere?) Adrian From owner-freebsd-arch@FreeBSD.ORG Wed Jan 2 17:09:38 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id DBCC8DEE for ; Wed, 2 Jan 2013 17:09:38 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) by mx1.freebsd.org (Postfix) with ESMTP id 7F42E17E1 for ; Wed, 2 Jan 2013 17:09:38 +0000 (UTC) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.5/8.14.5) with ESMTP id r02H9YJs040574; Wed, 2 Jan 2013 19:09:34 +0200 (EET) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.7.3 kib.kiev.ua r02H9YJs040574 Received: (from kostik@localhost) by tom.home (8.14.5/8.14.5/Submit) id r02H9Y1b040573; Wed, 2 Jan 2013 19:09:34 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 2 Jan 2013 19:09:34 +0200 From: Konstantin Belousov To: Luigi Rizzo Subject: Re: [RFC/RFT] calloutng Message-ID: <20130102170934.GA82219@kib.kiev.ua> References: <20121225232126.GA47692@alchemy.franken.de> <50DB4EFE.2020600@FreeBSD.org> <1356909223.54953.74.camel@revolution.hippie.lan> <20121231061735.GA5866@onelab2.iet.unipi.it> <50E16637.9070501@FreeBSD.org> <20130102105730.GA42542@onelab2.iet.unipi.it> <50E418EA.7030801@FreeBSD.org> <20130102122743.GA43241@onelab2.iet.unipi.it> <20130102162206.GA45701@onelab2.iet.unipi.it> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="1XmnKQGVLLNJnMip" Content-Disposition: inline In-Reply-To: <20130102162206.GA45701@onelab2.iet.unipi.it> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Status: No, 
score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home Cc: Davide Italiano , Ian Lepore , Alexander Motin , freebsd-arch@freebsd.org, FreeBSD Current , Marius Strobl X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Jan 2013 17:09:38 -0000 --1XmnKQGVLLNJnMip Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable
On Wed, Jan 02, 2013 at 05:22:06PM +0100, Luigi Rizzo wrote:
> On Wed, Jan 02, 2013 at 03:11:05PM +0200, Alexander Motin wrote:
> ...
> > > First of all, if you know that there is already a hardclock/statclock/*
> > > scheduled in [T_X, T_X+D] you just reuse that. This particular bullet
> > > was "no event scheduled in [T_X, T_X+D]" so you need to generate
> > > a new one.
> >
> > That is true, but my main point was about merging with external events,
> > which I can't predict and the only way to merge is increase sleep period,
> > hoping for better.
>
> ok, now i understand why you want to schedule for T_X+D.
>
> Probably one way to close this discussion would be to provide
> a sysctl so the sysadmin can decide which point in the interval
> to pick when there is no suitable callout already scheduled.
Isn't trying to synchronize to the external events in this way unsafe ?
I remember, but cannot find the reference right now, a scheduler
exploit(s) which completely hide malicious thread from the time
accounting, by making it voluntary yielding right before statclock
should fire. If statistic gathering could be piggy-backed on the
external interrupt, and attacker can control the source of the external
events, wouldn't this give her a handle ?
--1XmnKQGVLLNJnMip Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (FreeBSD) iQIcBAEBAgAGBQJQ5GnNAAoJEJDCuSvBvK1BD7wP/AiejtH6wGLURiyHvqLN6rvo pjj0tVyGyBErD8Fk/iGaeT6Ok4mqMoKU2bNEGLE9FhC+9ixqxGJzZdGargtDSigq f5qDnnGx0ZYGRz9M/E66E8HJnUT0dLd4eb5zOPdHYUS4vceOdhXDOhRma68fRU4a pPsAPMVgZCFjh44MuL5Ip5hUYYXBvFqloMjsWX5QCm3oYm1mmAN6EX+45jO201sy BP10zANSKFD59QgFr8vWV9zBzx3dfeG6TjJFDKQ9UCV9OohxdrkFeanxE6cq7gxi kEjjUUNcCKiQ9XdzfdCawncYoO+9sioPrg+tHMLCaAPXOW+N8v9PGlPwUechS5kj HAdUwUhEOqFWppPSC9w89WaDGMUWDLo3DXSi+spk+towdaT3caZkKm+EVAmkCYqL xXwFNCmu/xHMLvqPUvnyH/6uq0DmNOHrtoJxKMMcP3fzRjdWFwzJYNMM6kMTDZGu /AcLEVJwU9+FgWGZgz+nAyQv62Z23YXk8nLOgUD33lS3zC4KV19onkhzYZ9XHQf2 TasP7/TFUgxjnU8l8QGwTgeb9oaHZHy2O4qH2jvPP3Eb22mkfoiaJHpVWrJ8iwek C7ZDUJVat0HIdazS6lE1geveBoDmINZemHyBK+f5XGYNfGUefVMqcSU77i8yOfFL 2h3QNlI93AdfFutRaPTU =oZCF -----END PGP SIGNATURE----- --1XmnKQGVLLNJnMip-- From owner-freebsd-arch@FreeBSD.ORG Wed Jan 2 13:11:08 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8CCCA27D; Wed, 2 Jan 2013 13:11:08 +0000 (UTC) (envelope-from mavbsd@gmail.com) Received: from mail-ia0-f171.google.com (mail-ia0-f171.google.com [209.85.210.171]) by mx1.freebsd.org (Postfix) with ESMTP id 2539B8FC08; Wed, 2 Jan 2013 13:11:07 +0000 (UTC) Received: by mail-ia0-f171.google.com with SMTP id k27so11814000iad.16 for ; Wed, 02 Jan 2013 05:11:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=Ge93wgbmwXzBy5wLHFxOPAMkTtllBOn/XYFxpSV6pMg=; b=MCWbvY3CpRUbFneMcabounMspKT0sSpk6f2qARWW9WOKXEfN53e3dzOf4uSoZpIkaS YY4SoT1V6FyIY4OLA/8XBSuS96ra6DJNnctXvT6A5aLO667HLTotfIWAk0H8n7kxBxcw Jujob+9zNT4UGv8XWAytj1JvqVgrkTScPLEueVstkob/tAssk9dF0QmB589AUrL0Y4qz 
fOeMFsSKEd8FUqQGJ3pRzsUwMqqPUjUNxGHujHOICcXBv+ezkSi4X47VBfZgNyFnPEV9 ydhMiPJmmff2ovijM4CbouCcBdtd6Nx1vT4JGe/wWwdZHCwtnsFZGp5AuCannINT5PoR PF9g== MIME-Version: 1.0 Received: by 10.50.190.199 with SMTP id gs7mr30698329igc.89.1357132266513; Wed, 02 Jan 2013 05:11:06 -0800 (PST) Sender: mavbsd@gmail.com Received: by 10.231.25.76 with HTTP; Wed, 2 Jan 2013 05:11:05 -0800 (PST) Received: by 10.231.25.76 with HTTP; Wed, 2 Jan 2013 05:11:05 -0800 (PST) In-Reply-To: <20130102122743.GA43241@onelab2.iet.unipi.it> References: <50CCAB99.4040308@FreeBSD.org> <50CE5B54.3050905@FreeBSD.org> <50D03173.9080904@FreeBSD.org> <20121225232126.GA47692@alchemy.franken.de> <50DB4EFE.2020600@FreeBSD.org> <1356909223.54953.74.camel@revolution.hippie.lan> <20121231061735.GA5866@onelab2.iet.unipi.it> <50E16637.9070501@FreeBSD.org> <20130102105730.GA42542@onelab2.iet.unipi.it> <50E418EA.7030801@FreeBSD.org> <20130102122743.GA43241@onelab2.iet.unipi.it> Date: Wed, 2 Jan 2013 15:11:05 +0200 X-Google-Sender-Auth: c0dsQh7t4Ciqz-RqfgDDC7ofqvc Message-ID: Subject: Re: [RFC/RFT] calloutng From: Alexander Motin To: Luigi Rizzo X-Mailman-Approved-At: Wed, 02 Jan 2013 21:16:12 +0000 Content-Type: text/plain; charset=KOI8-R Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.14 Cc: Davide Italiano , Ian Lepore , freebsd-arch@freebsd.org, FreeBSD Current , Marius Strobl X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Jan 2013 13:11:08 -0000 On 02.01.2013 14:28, "Luigi Rizzo" wrote: > > On Wed, Jan 02, 2013 at 01:24:26PM +0200, Alexander Motin wrote: > > On 02.01.2013 12:57, Luigi Rizzo wrote: > ...
> > >i am not sure what you mean by "extending interval", but i believe the
> > >logic should be the following:
> > >
> > >- say user requests a timeout after X seconds and with a tolerance of D
> > >  seconds
> > >  (both X and D are fractional, so they can be short). Interpret this as
> > >
> > >   "the system should do its best to generate an event between X and X+D
> > >   seconds"
> > >
> > >- convert X to an absolute time, T_X
> > >
> > >- if there are any pending events already scheduled between T_X and T_X+D,
> > >  then by definition they are acceptable. Attach the requested timeout
> > >  to the earliest of these events.
> >
> > All of the above is true, but not the following.
> >
> > >- otherwise, schedule an event at time T_X (because there is no valid
> > >  reason to generate a late event, and it makes no sense from an
> > >  energy saving standpoint, either -- see below).
> >
> > The system may have many interrupts besides the timer: network, disk, ... WiFi
> > cards generate interrupts at the AP beacon rate -- dozens of times per
> > second. It is not very efficient to wake up the CPU precisely at time T_X,
> > which may be just 100us earlier than the next hardware interrupt. That's why
> > timer interrupts are scheduled at min(T_X+D, 0.5s, next hardclock, next
> > statclock, ...). As a result, the event will be handled within the allowed range,
> > but the real delay will depend on current environment conditions.
>
> I don't see why system events (hardclock, statclock, 0.5s,...)
> need to be treated specially -- and i am saying this also in
> the interest of simplifying the logic of the code.

Sure. That is mostly for historical reasons. At some point they should
disappear, just not now, as the patch is already quite big.

> First of all, if you know that there is already a hardclock/statclock/*
> scheduled in [T_X, T_X+D] you just reuse that. This particular bullet
> was "no event scheduled in [T_X, T_X+D]" so you need to generate
> a new one.
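The interval rule described in the bullets above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the calloutng patch: the function name pick_fire_time, the pending list, and the new_event_at_end switch (standing in for the T_X versus T_X+D choice under discussion) are all hypothetical. Times are plain integers, e.g. microseconds.

```python
from typing import Iterable


def pick_fire_time(t_x: int, d: int, pending: Iterable[int],
                   new_event_at_end: bool = False) -> int:
    """Return when a timeout requested for t_x with allowed delay d fires.

    pending: times of events that are already scheduled (hypothetical input).
    new_event_at_end: hypothetical policy switch. False schedules a new
    event at t_x, True schedules it at t_x + d.
    """
    if d < 0:
        raise ValueError("allowed delay must be non-negative")
    # Any event already scheduled inside [t_x, t_x + d] is acceptable by
    # definition, so attach the request to the earliest one.
    candidates = [t for t in pending if t_x <= t <= t_x + d]
    if candidates:
        return min(candidates)
    # No suitable event exists, so a new one has to be generated.
    return t_x + d if new_event_at_end else t_x


if __name__ == "__main__":
    print(pick_fire_time(1000, 100, [500, 1050, 1080]))   # reuses an event
    print(pick_fire_time(1000, 100, [500, 2000]))          # new event at T_X
    print(pick_fire_time(1000, 100, [500, 2000], True))    # new event at T_X+D
```

In every branch the returned time stays inside [T_X, T_X+D], so the request never fires early and is never delayed by more than D.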
That is true, but my main point was about merging with external events,
which I can't predict; the only way to merge with them is to increase the
sleep period and hope for the best.

> Surely scheduling the event at T_X+D instead of T_X increases the
> chance of merging events. But the savings are smaller and smaller
> as the value X increases. This particular client will only
> change its request rate from 1/X to 1/(X+D) so in relative terms
> the gain is ( 1/X - 1/(X+D) ) / ( 1/(X+D) ) = D/X
>
> Example: if X = 300ms, and D = 10ms (as in the test case)
> you just save one interrupt every 30 periods or so (about 9 seconds)
> by scheduling at T_X+D instead of T_X. Are we actually able to
> measure the difference ?
>
> Even at high interrupt rates (e.g. X = 1ms) you are not
> going to save a lot unless the tolerance D is very large,
> which is generally undesirable for other reasons
> (presumably, applications are not going to be happy
> if you artificially double their timeouts).
>
Now, say your application requests timeouts every X = 300ms. With the default
precision set to 5%, the saving from the longer period is only 5%. But that is
absolutely not my goal! Imagine a different case: you have NIC interrupts at
1000Hz. You also have 100 callouts with a 100ms period each. If we program
the timer with absolute precision, you will get about 2000Hz of total
interrupt rate. But if we allow just 2% deviation, most of the callouts will be
grouped with NIC interrupts and the total rate will be 1000Hz. Losing _less_
than 2% of precision, we are cutting the interrupt rate in _half_!

> > >It seems to me that you are instead extending the requested interval
> > >upfront, which causes some gratuitous pessimizations in scheduling
> > >the callout.
> > >
> > >Re. energy savings: the gain in extending the timeout cannot exceed
> > >the value D/X. So while it may make sense to extend a 1us request
> > >to 50us to go (theoretically) from 1M callouts/s to 20K callouts/s,
> > >it is completely pointless from an energy saving standpoint to
> > >introduce a 10ms error on a 300ms request.
> >
> > I am not so sure about this. When the CPU package is in the C7 sleep state with all
> > buses and caches shut down and memory set to self refresh, it consumes
> > very little power (some milliwatts). Waking up from that state takes
> > 100us or even more, with power consumption much higher than the normal
> > operational one. Sure, if we compare it with the power consumption of 100%
> > CPU load, the difference between 10 and 100 wakeups per second may be small,
> > but when comparing them to each other in some low-power environment for a
> > mostly idle system it may be much more significant.
>
> see above -- at low rates the difference is not measurable,
> at high rates the only obvious answer is "do not use C7
> if the next interrupt is due in less than 2..5 milliseconds"
>
> > >(even though i hate the idea that a 1us request defaults to
> > >a 50us delay; but that is hopefully something that can be tuned
> > >in a platform-specific way and perhaps at runtime).
> >
> > It is 50us on this ARM. On SandyBridge Core i7 it is only about 2us.
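The two numerical arguments above (the D/X bound for a single client, and the 1000Hz NIC case) can be checked with a short Python sketch. The names and the callout phases are assumptions made for illustration: each of the 100 callouts is placed 500us after a NIC interrupt, so that with zero tolerance none of them coincides with one. That is the worst case for the exact-timing variant, not a measured workload.

```python
from fractions import Fraction


def relative_gain(x, d):
    """Relative rate reduction from firing at X+D instead of X; equals D/X."""
    rate_early = Fraction(1, x)
    rate_late = Fraction(1, x + d)
    return (rate_early - rate_late) / rate_late


def time_per_saved_interrupt(x, d):
    """Time (same unit as x) that passes for each interrupt saved."""
    return 1 / (Fraction(1, x) - Fraction(1, x + d))


def wakeups_per_second(tolerance_us):
    """Total wakeups in one second: 1000Hz NIC plus 100 callouts of 100ms."""
    nic_period_us = 1_000
    callout_period_us = 100_000
    window_us = 1_000_000
    unmerged = 0
    for i in range(100):
        offset_us = i * 1_000 + 500          # assumed phase, see lead-in
        for t in range(offset_us, window_us, callout_period_us):
            # First NIC interrupt at or after the requested time t.
            next_nic = -(-t // nic_period_us) * nic_period_us
            if next_nic - t > tolerance_us:
                unmerged += 1                # needs its own timer interrupt
    return window_us // nic_period_us + unmerged


if __name__ == "__main__":
    print(relative_gain(300, 10))             # D/X for X = 300ms, D = 10ms
    print(time_per_saved_interrupt(300, 10))  # in milliseconds
    print(wakeups_per_second(0), wakeups_per_second(2_000))
```

Under these assumptions a 2ms tolerance (2% of 100ms) lets every callout ride on a NIC interrupt, while the single 300ms client gains only D/X, about 3.3%.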
> > very good, i suspected something similar, just wanted to be sure :) > > cheers > luigi > > > -- > > Alexander Motin From owner-freebsd-arch@FreeBSD.ORG Wed Jan 2 21:39:21 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 2006AA95; Wed, 2 Jan 2013 21:39:21 +0000 (UTC) (envelope-from mavbsd@gmail.com) Received: from mail-ea0-f178.google.com (mail-ea0-f178.google.com [209.85.215.178]) by mx1.freebsd.org (Postfix) with ESMTP id 614D7623; Wed, 2 Jan 2013 21:39:20 +0000 (UTC) Received: by mail-ea0-f178.google.com with SMTP id k11so6115397eaa.37 for ; Wed, 02 Jan 2013 13:39:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:sender:message-id:date:from:user-agent:mime-version:to :cc:subject:references:in-reply-to:content-type :content-transfer-encoding; bh=Uo48LHfkJdD9FVWjrQOe4d6qpXXXfB1b+Md8xaa4JYg=; b=VCkGVKddQebRYa8Q1TsRe+UT1A7G4zqhQ5qE35Vl9FSrEYf3vuE0unLy0IDdMTVBcG neVE6n/PQOcvLFXPLDj07I9hsWv8lWOBq93PpRsG3d4XQjchEwSEPnafb6EJJt7vmUy3 AAViJ5qAHFBQXogJVkfMc4pk9uxwV//T3vJZQLdnqzHKs4d33w1LC7sGBl9NSA+CQVr+ fdrveDjF6t6PdJCwEPaUhBrSEzLpcatPTf3McPzGQ2r0L4bHiqnciDi6Wcv5WjnWrNPn 6S47+OLdrd2gVT++CtmLncq3JDNJsPvaYOsBXkTGLnVf/8fVRLJBtz5piVmVHxpZFW3+ bJdQ== X-Received: by 10.14.204.70 with SMTP id g46mr102666330eeo.15.1357162759189; Wed, 02 Jan 2013 13:39:19 -0800 (PST) Received: from mavbook.mavhome.dp.ua (mavhome.mavhome.dp.ua. 
[213.227.240.37]) by mx.google.com with ESMTPS id l3sm100090480eem.14.2013.01.02.13.39.16 (version=TLSv1/SSLv3 cipher=OTHER); Wed, 02 Jan 2013 13:39:17 -0800 (PST) Sender: Alexander Motin Message-ID: <50E4A902.4050307@FreeBSD.org> Date: Wed, 02 Jan 2013 23:39:14 +0200 From: Alexander Motin User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:13.0) Gecko/20120628 Thunderbird/13.0.1 MIME-Version: 1.0 To: Konstantin Belousov Subject: Re: [RFC/RFT] calloutng References: <20121225232126.GA47692@alchemy.franken.de> <50DB4EFE.2020600@FreeBSD.org> <1356909223.54953.74.camel@revolution.hippie.lan> <20121231061735.GA5866@onelab2.iet.unipi.it> <50E16637.9070501@FreeBSD.org> <20130102105730.GA42542@onelab2.iet.unipi.it> <50E418EA.7030801@FreeBSD.org> <20130102122743.GA43241@onelab2.iet.unipi.it> <20130102162206.GA45701@onelab2.iet.unipi.it> <20130102170934.GA82219@kib.kiev.ua> In-Reply-To: <20130102170934.GA82219@kib.kiev.ua> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Davide Italiano , Ian Lepore , freebsd-arch@freebsd.org, FreeBSD Current , Marius Strobl X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Jan 2013 21:39:21 -0000 On 02.01.2013 19:09, Konstantin Belousov wrote: > On Wed, Jan 02, 2013 at 05:22:06PM +0100, Luigi Rizzo wrote: >> Probably one way to close this discussion would be to provide >> a sysctl so the sysadmin can decide which point in the interval >> to pick when there is no suitable callout already scheduled. > Isn't trying to synchronize to the external events in this way unsafe ? > I remember, but cannot find the reference right now, a scheduler > exploit(s) which completely hide malicious thread from the time > accounting, by making it voluntary yielding right before statclock > should fire. 
If statistic gathering could be piggy-backed on the > external interrupt, and attacker can control the source of the external > events, wouldn't this give her a handle ?

There are many different kinds of accounting with different
characteristics. Run time for each thread is calculated using
high-resolution per-CPU clocks on each context switch. It is impossible to
hide from it. The system load average is updated using a callout and is
aligned with hardclock(). Hiding from it was easy before, but it can be made
more complicated (asynchronous) now. Per-CPU SYSTEM/INTERRUPT/USER/NICE/IDLE
counters are updated by statclock(), which is asynchronous to hardclock(),
so hiding from it is supposed to be more complicated. But it was possible
even before, since the hardclock() frequency is 8 times higher than the
statclock() one. More important for scheduling fairness, a thread's CPU
percentage is also based on hardclock(), and hiding from it was trivial
before, since all sleep primitives were strictly aligned to hardclock().
Now it is slightly less trivial, since this alignment was removed and
user-level APIs provide no easy way to enforce it.

The only way to get really safe accounting is to forget about sampling and
use other, potentially more expensive, ways. That was always prevented by
the lack of cheap and reliable clocks. But since the TSC is P-state invariant
on most CPUs available now, it could probably be reconsidered.
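The difference between sampled accounting and run time measured at context switches can be illustrated with a toy model in Python. Everything here is an assumption for illustration (the function names, the fixed 7812us tick period, the 50us margin); it is not how the kernel implements either mechanism. A thread that stays off the CPU around every sampling instant collects no ticks, while summing its on-CPU intervals still sees almost all of its run time.

```python
def account(run_intervals, tick_period_us, window_us):
    """Return (exact_run_time_us, sampled_ticks) for one thread.

    run_intervals: half-open (start, end) intervals in usec when the thread
    is on the CPU. Sampling happens at 0, tick_period_us, 2*tick_period_us, ...
    """
    # "Context switch" style accounting: sum the real on-CPU time.
    exact = sum(end - start for start, end in run_intervals)
    # "statclock" style accounting: charge a tick if running at the sample.
    ticks = 0
    for tick in range(0, window_us, tick_period_us):
        if any(start <= tick < end for start, end in run_intervals):
            ticks += 1
    return exact, ticks


def evasive_thread(tick_period_us, window_us, margin_us):
    """Intervals of a thread that avoids margin_us on each side of a tick."""
    return [(t + margin_us, t + tick_period_us - margin_us)
            for t in range(0, window_us, tick_period_us)]


if __name__ == "__main__":
    period, window = 7812, 7812 * 100
    print(account([(0, window)], period, window))                      # honest
    print(account(evasive_thread(period, window, 50), period, window))  # hiding
```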
-- Alexander Motin From owner-freebsd-arch@FreeBSD.ORG Wed Jan 2 22:06:18 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 43B89B05; Wed, 2 Jan 2013 22:06:18 +0000 (UTC) (envelope-from mavbsd@gmail.com) Received: from mail-ea0-f181.google.com (mail-ea0-f181.google.com [209.85.215.181]) by mx1.freebsd.org (Postfix) with ESMTP id 272BC742; Wed, 2 Jan 2013 22:06:16 +0000 (UTC) Received: by mail-ea0-f181.google.com with SMTP id k14so5830467eaa.26 for ; Wed, 02 Jan 2013 14:06:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:sender:message-id:date:from:user-agent:mime-version:to :cc:subject:references:in-reply-to:content-type :content-transfer-encoding; bh=3EcKGr55zw6+/irtMq+g7UAr+0y89PXbs2Myu69P0YQ=; b=Tv8eLhy6bKcOnHGEpyLTNFPl+wJaa2eKk4/HAH1XsZZE4Gwv21IwvSTBvEIhkDBL8s 1yssBkVYX9kSyeHoev85Q3s3HeNMXNtSH2DzX38QxtR6vw47RlAB93FE2LlA2T0jEAAp eQmqdTCMC0vEsNg1UrG37axDth1r4lG2CidlwvcH8ca+AQuZWFY5U6pCRbrV7PEVFu8W uIPuN090Wg0q3mE9mTcU5jsBxDFT1xtIetBH+BSQSurcEH/ixhxV04t2GDbKgqJY3bim eB/1j6yFuy0WVLYPE6B7q/0oBHFmvH7DG/QpvS6stDrkrfvL8fOWSbnzN4Dh8UJangIY dWyA== X-Received: by 10.14.0.133 with SMTP id 5mr127300082eeb.29.1357164370592; Wed, 02 Jan 2013 14:06:10 -0800 (PST) Received: from mavbook.mavhome.dp.ua (mavhome.mavhome.dp.ua. 
[213.227.240.37]) by mx.google.com with ESMTPS id w44sm100140868eep.6.2013.01.02.14.06.07 (version=TLSv1/SSLv3 cipher=OTHER); Wed, 02 Jan 2013 14:06:09 -0800 (PST) Sender: Alexander Motin Message-ID: <50E4AF4C.2070902@FreeBSD.org> Date: Thu, 03 Jan 2013 00:06:04 +0200 From: Alexander Motin User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:13.0) Gecko/20120628 Thunderbird/13.0.1 MIME-Version: 1.0 To: Adrian Chadd Subject: Re: [RFC/RFT] calloutng References: <50CCAB99.4040308@FreeBSD.org> <50CE5B54.3050905@FreeBSD.org> <50D03173.9080904@FreeBSD.org> <20121225232126.GA47692@alchemy.franken.de> <50DB4EFE.2020600@FreeBSD.org> <1356909223.54953.74.camel@revolution.hippie.lan> <20121231061735.GA5866@onelab2.iet.unipi.it> <50E16637.9070501@FreeBSD.org> <20130102105730.GA42542@onelab2.iet.unipi.it> <50E418EA.7030801@FreeBSD.org> <20130102122743.GA43241@onelab2.iet.unipi.it> <1357135374.54953.150.camel@revolution.hippie.lan> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Davide Italiano , Ian Lepore , Marius Strobl , FreeBSD Current , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Jan 2013 22:06:18 -0000 On 02.01.2013 18:08, Adrian Chadd wrote: > .. I'm pretty damned sure we're going to need to enforce a "never > earlier than X" latency. Do you mean here that we should never wake up before specified time (just as specified by the most of existing APIs), or that we should not allow sleep shorter then some value to avoid DoS? At least on x86 nanosleep(0) doesn't allow to block the system. Also there is already present mechanism for specifying minimum timer programming interval in eventtimers(9) KPI. > Is there a more detailed writeup of calloutng somewhere, besides > David's slides? 
The wiki page is rather empty.

There are updated manual pages in the patch. Also, Davide wrote some blog
posts during GSoC. Now we are working on papers for AsiaBSDCon.

> Eg - I think this work does coalesce wakeups, right? Or it can? So
> when in low-power scenarios you can end up with lower-resolution
> callout periods, but many less CPU wakeups a second?

This work does coalesce wakeups out of the box, but it also provides ways to
improve that further where possible. With additional tuning of some
kernel subsystems and drivers I was able to drop the total idle interrupt rate
down to 10-15Hz on arm and 20-30Hz on x86.

> (Do we actually _expose_ wakeups-per-second somewhere?)

On systems with ACPI, average per-CPU sleep times are exposed via the
sysctls dev.cpu.X.cx_usage. Also, the cpu_idle() call rate is calculated by
both schedulers for purposes of idle loop optimizations, but it is not
exposed outside now. Also, for an idle SMP system, enabling COUNT_IPIS should
give a number of interrupts in systat comparable to the number of wakeups. I
am mostly using the last way.
-- Alexander Motin From owner-freebsd-arch@FreeBSD.ORG Thu Jan 3 05:52:45 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id E965A71D; Thu, 3 Jan 2013 05:52:45 +0000 (UTC) (envelope-from kob6558@gmail.com) Received: from mail-ea0-f176.google.com (mail-ea0-f176.google.com [209.85.215.176]) by mx1.freebsd.org (Postfix) with ESMTP id E335C6CE; Thu, 3 Jan 2013 05:52:44 +0000 (UTC) Received: by mail-ea0-f176.google.com with SMTP id d13so5967972eaa.7 for ; Wed, 02 Jan 2013 21:52:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=YIee0nzPlls5BnzFjKLFHW7KoULSislsRykqse+eXTw=; b=PhzqX8ilrQY+1U+fo3LmWn5BIEhr+UvMkrclm3V7ctg08p26oOrMSmLRsy7YofqmGN U75qKZv4DoBeDgOiVn91Jk71GnGNOlDVo+/7Xs07TiPgs/KdKMxin+m8JUOkbkL+yZvA 33OnhVMRQwIRxc/mRBlwEM2ZI8eCr0NcIEb8kZNeiCM6RzosYQDY5G/qknV9iv4EHjz9 jTofSwPkY8GOX/Oq9i/9fznwptIZ1BNRGpMXu7uV9gU79nvYg2ZCW7rDyB24p5/tVpY/ +E9R8WjEUDbVAI7RpCJ3fofcO6NinxswQJPmk5EIfF+MUHBeXy2PCSLkbpmbY8xUH2RI Hg0g== MIME-Version: 1.0 Received: by 10.14.184.134 with SMTP id s6mr130647516eem.43.1357192358178; Wed, 02 Jan 2013 21:52:38 -0800 (PST) Received: by 10.223.170.193 with HTTP; Wed, 2 Jan 2013 21:52:37 -0800 (PST) In-Reply-To: <50E4AF4C.2070902@FreeBSD.org> References: <50CCAB99.4040308@FreeBSD.org> <50CE5B54.3050905@FreeBSD.org> <50D03173.9080904@FreeBSD.org> <20121225232126.GA47692@alchemy.franken.de> <50DB4EFE.2020600@FreeBSD.org> <1356909223.54953.74.camel@revolution.hippie.lan> <20121231061735.GA5866@onelab2.iet.unipi.it> <50E16637.9070501@FreeBSD.org> <20130102105730.GA42542@onelab2.iet.unipi.it> <50E418EA.7030801@FreeBSD.org> <20130102122743.GA43241@onelab2.iet.unipi.it> <1357135374.54953.150.camel@revolution.hippie.lan> <50E4AF4C.2070902@FreeBSD.org> Date: Wed, 2 Jan 2013 21:52:37 -0800 
Message-ID: Subject: Re: [RFC/RFT] calloutng From: Kevin Oberman To: Alexander Motin Content-Type: text/plain; charset=UTF-8 Cc: Davide Italiano , Ian Lepore , Adrian Chadd , Marius Strobl , FreeBSD Current , freebsd-arch@freebsd.org X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Jan 2013 05:52:46 -0000 On Wed, Jan 2, 2013 at 2:06 PM, Alexander Motin wrote: > On 02.01.2013 18:08, Adrian Chadd wrote: >> >> .. I'm pretty damned sure we're going to need to enforce a "never >> earlier than X" latency. > > > Do you mean here that we should never wake up before specified time (just as > specified by the most of existing APIs), or that we should not allow sleep > shorter then some value to avoid DoS? At least on x86 nanosleep(0) doesn't > allow to block the system. Also there is already present mechanism for > specifying minimum timer programming interval in eventtimers(9) KPI. I can see serious performance issues with some hardware (wireless comes to mind) if things happen too quickly. Intuition is that it could also play hob with VMs. I believe that the proper way is to wake between T_X and T_X + D. This assumes that D is max_wake_delay, not deviation, which leaves us at the original of (T_X) =< event_time =< (T_X + D). -- R. 
Kevin Oberman, Network Engineer E-mail: kob6558@gmail.com From owner-freebsd-arch@FreeBSD.ORG Thu Jan 3 08:42:25 2013 Return-Path: Delivered-To: freebsd-arch@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 5737B333; Thu, 3 Jan 2013 08:42:25 +0000 (UTC) (envelope-from luigi@onelab2.iet.unipi.it) Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238]) by mx1.freebsd.org (Postfix) with ESMTP id 06BBB6CF; Thu, 3 Jan 2013 08:42:24 +0000 (UTC) Received: by onelab2.iet.unipi.it (Postfix, from userid 275) id C099A7300A; Thu, 3 Jan 2013 09:41:26 +0100 (CET) Date: Thu, 3 Jan 2013 09:41:26 +0100 From: Luigi Rizzo To: Kevin Oberman Subject: Re: [RFC/RFT] calloutng Message-ID: <20130103084126.GC54360@onelab2.iet.unipi.it> References: <20121231061735.GA5866@onelab2.iet.unipi.it> <50E16637.9070501@FreeBSD.org> <20130102105730.GA42542@onelab2.iet.unipi.it> <50E418EA.7030801@FreeBSD.org> <20130102122743.GA43241@onelab2.iet.unipi.it> <1357135374.54953.150.camel@revolution.hippie.lan> <50E4AF4C.2070902@FreeBSD.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i Cc: Davide Italiano , Ian Lepore , Adrian Chadd , Alexander Motin , freebsd-arch@freebsd.org, FreeBSD Current , Marius Strobl X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Jan 2013 08:42:25 -0000 On Wed, Jan 02, 2013 at 09:52:37PM -0800, Kevin Oberman wrote: > On Wed, Jan 2, 2013 at 2:06 PM, Alexander Motin wrote: > > On 02.01.2013 18:08, Adrian Chadd wrote: > >> > >> .. I'm pretty damned sure we're going to need to enforce a "never > >> earlier than X" latency. 
> > > > > > Do you mean here that we should never wake up before specified time (just as > > specified by the most of existing APIs), or that we should not allow sleep > > shorter then some value to avoid DoS? At least on x86 nanosleep(0) doesn't > > allow to block the system. Also there is already present mechanism for > > specifying minimum timer programming interval in eventtimers(9) KPI. > > I can see serious performance issues with some hardware (wireless > comes to mind) if things happen too quickly. Intuition is that it > could also play hob with VMs. > > I believe that the proper way is to wake between T_X and T_X + D. > This assumes that D is max_wake_delay, not deviation, which leaves us > at the original of (T_X) =< event_time =< (T_X + D). i think "max delay" was the intended meaning of the D parameter. We picked bad names (tolerance, deviation,...) for it. cheers luigi From owner-freebsd-arch@FreeBSD.ORG Thu Jan 3 14:45:55 2013 Return-Path: Delivered-To: freebsd-arch@FreeBSD.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id 8ED726E5; Thu, 3 Jan 2013 14:45:55 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au [211.29.132.184]) by mx1.freebsd.org (Postfix) with ESMTP id F2217C2F; Thu, 3 Jan 2013 14:45:54 +0000 (UTC) Received: from c211-30-173-106.carlnfd1.nsw.optusnet.com.au (c211-30-173-106.carlnfd1.nsw.optusnet.com.au [211.30.173.106]) by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r03EjbNs031145 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 4 Jan 2013 01:45:39 +1100 Date: Fri, 4 Jan 2013 01:45:37 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Alexander Motin Subject: Re: [RFC/RFT] calloutng In-Reply-To: <50E4A902.4050307@FreeBSD.org> Message-ID: <20130103232413.O947@besplex.bde.org> References: <20121225232126.GA47692@alchemy.franken.de> <50DB4EFE.2020600@FreeBSD.org> 
<1356909223.54953.74.camel@revolution.hippie.lan> <20121231061735.GA5866@onelab2.iet.unipi.it> <50E16637.9070501@FreeBSD.org> <20130102105730.GA42542@onelab2.iet.unipi.it> <50E418EA.7030801@FreeBSD.org> <20130102122743.GA43241@onelab2.iet.unipi.it> <20130102162206.GA45701@onelab2.iet.unipi.it> <20130102170934.GA82219@kib.kiev.ua> <50E4A902.4050307@FreeBSD.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.0 cv=BrrFWvr5 c=1 sm=1 a=kj9zAlcOel0A:10 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=mUAV9h2nInsA:10 a=F1MZp-0HJ9x99sWca6YA:9 a=CjuIK1q_8ugA:10 a=TEtd8y5WR3g2ypngnwZWYw==:117 Cc: Davide Italiano , Ian Lepore , Marius Strobl , FreeBSD Current , freebsd-arch@FreeBSD.org, Konstantin Belousov X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 03 Jan 2013 14:45:55 -0000 On Wed, 2 Jan 2013, Alexander Motin wrote: > On 02.01.2013 19:09, Konstantin Belousov wrote: >> On Wed, Jan 02, 2013 at 05:22:06PM +0100, Luigi Rizzo wrote: >>> Probably one way to close this discussion would be to provide >>> a sysctl so the sysadmin can decide which point in the interval >>> to pick when there is no suitable callout already scheduled. >> Isn't trying to synchronize to the external events in this way unsafe ? >> I remember, but cannot find the reference right now, a scheduler >> exploit(s) which completely hide malicious thread from the time >> accounting, by making it voluntary yielding right before statclock >> should fire. If statistic gathering could be piggy-backed on the >> external interrupt, and attacker can control the source of the external >> events, wouldn't this give her a handle ? Fine-grained timeouts complete fully opening this security hole. 
Synchronization without fine-grained timeouts might allow the same, but is harder to exploit since you can't control the yielding points directly. With fine-grained timeouts, you just have to predict the statclock firing points. Use one timeout to arrange to yield just before statclock fires and another to regain control just after it has fired. If the timeout resolution is, say, 50 usec, then this can hope to run for all except 100 usec out of every 1/stathz seconds. With stathz = 128, 1/stathz is 7812 usec, so this gives 7712/7812 of the CPU with 0 statclock ticks. Since the scheduler never sees you running, your priority remains minimal, so the scheduler should prefer to run you whenever a timeout expires, with only round-robin with other minimal-priority threads preventing you from getting 7712/7812 of the (user non-rtprio) CPU.

The previous stage of fully opening this security hole was changing (the default) HZ from 100 to 1000. HZ must not be much larger than stathz, else the security hole is almost fully open. With HZ = 100 being less than stathz and timeout granularity limiting the fine control to 2/HZ = 20 msec (except you can use a periodic itimer to get a 1/HZ granularity at a minor cost of getting more SIGALRMs), it is impossible to get near 100% of the CPU with 0 statclock ticks. After yielding, you can't get control for another 10 or 20 msec. Since this exceeds 1/stathz = 7812 usec, you can only hide from statclock ticks by not running very often or for very long.

Limited hiding is possible by wasting even more CPU to determine when to hide: since the timeout granularity is large, it is also ineffective for determining when to yield. So when running, you must poll the current time a lot to determine when to yield. Yield just before statclock fires, as above. (Do it 50 usec early, as above, to avoid most races involving polling the time.) This actually has good chances of not limiting the hiding too much, depending on the details of the scheduling.
It yields just before a statclock tick. After this tick fires, if the scheduler reschedules for any reason, then the hiding process would most likely be run again, since its priority is minimal. But at least the old 4BSD scheduler doesn't reschedule after _every_ statclock tick. This depends on the bugfeature that the priority is not checked on _every_ return to user mode (sched_clock() does change the priority, but this is not acted on until much later). Without this bugfeature, there would be excessive context switches. OTOH, with timeouts, at least old non-fine-grained ones, you can force a rescheduling that is acted on soon enough simply by using timeouts (since timeouts give a context switch to the softclock thread, the scheduler has no option to skip checking the priority on return to user mode).

After the previous stage of changing HZ to 1000, the granularity is fine enough for using timeouts to hide from the scheduler. Using a periodic itimer to get a granularity of 1000 usec, start hiding 50-1000 usec before each statclock tick and regain control 1000 usec later. With stathz = 128, this gives 6812/7812 of the CPU with 0 statclock ticks. Not much worse (for the hider) than 7712/7812.

Statclock was supposed to be aperiodic to avoid hiding (see statclk-usenix93.ps), but this was never implemented in FreeBSD. With fine-grained timeouts, it would have to be very aperiodic, to the point of giving large inaccuracies, to limit the hiding very much. For example, suppose that it has an average period of 7812 usec with +-50% jitter. You would try to hide from it most of the time by running for a bit less than 7812/2 usec before yielding in most cases. If too much scheduling is done on each statclock tick, then you are likely to regain control after each one (as above) and then know that there is almost a full minimal period until the next one.
Otherwise, it seems to be necessary to determine when the previous statclock tick occurred, so as to determine the minimum time until the next one.

> There are many different kinds of accounting with different characteristics.
> Run time for each thread calculated using high resolution per-CPU clocks on
> each context switch. It is impossible to hide from it.

But this (td_runtime) is not used at all for scheduling, originally because the high-resolution per-CPU clocks didn't exist and now because reading them is expensive. However, the read is done on every context switch. This pessimizes context switches, with the pessimization reduced a little by using inaccurate cpu_ticks() timers instead of timecounters, so the expenses for using td_runtime for scheduling are mostly already paid for. The main case where it doesn't work is when there are few context switches, but there are usually more than a few for interrupts to ithreads for network and disk i/o. Updating td_runtime in statclock() (where it is not normally updated since there is no context switch) would probably make td_runtime usable for scheduling without increasing the pessimization much.

> System load average
> updated using callout and aligned with hardclock(). Hiding from it was easy
> before, but it can be made more complicated (asynchronous) now.

I think you mean it was easy because the timeout period is so long. Any aperiodicity in a timer of course means that it is harder to predict. The load average timeout was already randomized (I think it is 5 +- 2 seconds with 1/hz granularity). That is a large variance, but you can still hide from it about 3/5 of the time. loadav is not very important, and is not even used by SCHED_ULE. SCHED_ULE uses more fine-grained statistics based on statclock.

> Per-CPU
> SYSTEM/INTERRUPT/USER/NICE/IDLE counters are updated by statclock(), that is
> asynchronous to hardclock() and hiding from it is supposed to be more
> complicated. But even before it was possible, since hardclock() frequency is
> 8 times higher than statclock() one.

These are not used for scheduling. In SCHED_4BSD, little more than the event of a statclock tick is used for scheduling. sched_clock() uses this to update td_estcpu and then the priority but not much more. In SCHED_ULE, sched_clock() updates many more private variables. There are also statistics for memory use and similar things maintained by statclock(). These are not very important, like loadav for SCHED_ULE (purely informational, for userland). I don't see a better way than _periodic_ statclock ticks for maintaining these. Aperiodicity just makes them less accurate for the usual case where there is no hiding (unless this is not the usual case, due to accidental synchronization causing non-malicious hiding, and if this is a problem then it can be fixed using only a small amount of aperiodicity).

> More important for scheduling fairness
> thread's CPU percentage is also based on hardclock() and hiding from it was
> trivial before, since all sleep primitives were strictly aligned to
> hardclock(). Now it is slightly less trivial, since this alignment was
> removed and user-level APIs provide no easy way to enforce it.

%cpu is actually based on statclock(), and not even used for scheduling. The alignment of sleep primitives mainly increased the chances of accidental synchronization. (This was apparently a problem when all clocks were derived from the lapic timer. I didn't like the fixes for that.) For malicious hiding, it only makes the hiding less trivial for hardclock ticks to be periodic and somewhat aligned with statclock ticks. The phase will drift, so that there is no long-term alignment, unless the clocks are derived from the same one (and not randomized). So the statclock firings will sometimes be mispredicted, or the hiding would have to be conservative, or the prediction updated a lot.
It is simplest and probably most malicious to accept occasional mispredictions and recalibrate the prediction after missing. This is an easy case of recalibrating often for highly aperiodic statclocks.

> The only way to get really safe accounting is forget about sampling and use
> potentially more expensive other ways. It was always stopped by lack of cheap
> and reliable clocks. But since TSC is P-state invariant on most of CPUs
> present now it could probably be reconsidered.

And even if the clock is expensive and unreliable, it is already used for td_runtime, as described above. Large unreliability is actually less of a problem for scheduling than for td_runtime -- an error of 50% would barely affect scheduling since scheduling is only heuristic, but if td_runtime is off by 50% (as it often is without P-state invariance and with longstanding bugs in cpu_ticks() calibration), then it gives nonsense like user times exceeding real times by 50%.

When I last thought about using reliable clocks for statistics, I was most interested in simplifying things by removing all the tick counters and not in making schedulers fairer. I still can't see how to do the former. To do the user/sys/interrupt accounting accurately, the clock would have to be read on every syscall entry and exit, and that is too much (it is bad enough that it is read on almost every interrupt, without even making interrupt accounting actually work right). But for scheduling decisions, a fuzzy copy of the full td_runtime is enough, at least for SCHED_4BSD.
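
As a rough check of the hiding arithmetic earlier in this message, the following Python sketch computes the share of each statclock period that a thread could use without ever being sampled. It assumes a strictly periodic statclock and lumps the time given up around each tick (the early yield plus the delay before regaining control) into one parameter. The name hidden_cpu_fraction and its parameters are illustrative only and are not part of any FreeBSD interface.

```python
from fractions import Fraction


def hidden_cpu_fraction(stathz: int, lost_usec: int) -> Fraction:
    """Share of one statclock period usable without being sampled.

    stathz:    statclock frequency in Hz (assumed strictly periodic).
    lost_usec: microseconds given up around each tick, i.e. the early
               yield plus the delay before regaining control.
    """
    # 1/stathz truncated to whole microseconds: 7812 for stathz = 128.
    period_usec = 1_000_000 // stathz
    usable_usec = max(period_usec - lost_usec, 0)
    return Fraction(usable_usec, period_usec)


# Fine-grained timeouts: yield 50 usec early, regain control 50 usec late.
fine = hidden_cpu_fraction(128, 2 * 50)
# HZ = 1000 periodic itimer: about 1000 usec lost per statclock period.
coarse = hidden_cpu_fraction(128, 1000)

print(f"fine-grained timeouts: {float(fine):.1%} of the CPU")
print(f"1000 usec granularity: {float(coarse):.1%} of the CPU")
```

The two calls correspond to the 7712/7812 and 6812/7812 figures above. A loss at or beyond one full period (for example 10 or 20 msec with HZ = 100) clamps to zero, which matches the observation that coarse timeouts alone cannot hide a busy thread.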
Bruce

From owner-freebsd-arch@FreeBSD.ORG Thu Jan 3 15:51:15 2013
Date: Thu, 03 Jan 2013 16:51:12 +0100
From: Dag-Erling Smørgrav
To: arch@freebsd.org
Subject: BURN_BRIDGES in bsd.own.mk
Message-ID: <86zk0q1czz.fsf@ds4.des.no>

We still have code in bsd.own.mk to support a selection of old-style NO_FOO options, protected by .if !defined(BURN_BRIDGES) (which is not in any way related to the similarly-named kernel option). There is also bsd.compat.mk, which handles the even older-style NOFOO options (by translating them to NO_FOO and emitting a warning). These chunks of code date back to 2006 and 2004, respectively. I think it's way past time we nuked them. Any objections?
DES
--
Dag-Erling Smørgrav - des@des.no

From owner-freebsd-arch@FreeBSD.ORG Thu Jan 3 16:12:26 2013
Date: Thu, 03 Jan 2013 18:12:17 +0200
From: Alexander Motin
To: Bruce Evans
Cc: Davide Italiano, Ian Lepore, Marius Strobl, FreeBSD Current, freebsd-arch@FreeBSD.org, Konstantin Belousov
Subject: Re: [RFC/RFT] calloutng
Message-ID: <50E5ADE1.4020104@FreeBSD.org>
In-Reply-To: <20130103232413.O947@besplex.bde.org>
References: <20121225232126.GA47692@alchemy.franken.de> <50DB4EFE.2020600@FreeBSD.org> <1356909223.54953.74.camel@revolution.hippie.lan> <20121231061735.GA5866@onelab2.iet.unipi.it> <50E16637.9070501@FreeBSD.org> <20130102105730.GA42542@onelab2.iet.unipi.it> <50E418EA.7030801@FreeBSD.org> <20130102122743.GA43241@onelab2.iet.unipi.it> <20130102162206.GA45701@onelab2.iet.unipi.it> <20130102170934.GA82219@kib.kiev.ua> <50E4A902.4050307@FreeBSD.org> <20130103232413.O947@besplex.bde.org>

On 03.01.2013 16:45, Bruce Evans wrote:
> On Wed, 2 Jan 2013, Alexander Motin wrote:
>> More important for scheduling fairness thread's CPU percentage is also
>> based on hardclock() and hiding from it was trivial before, since all
>> sleep primitives were strictly aligned to hardclock(). Now it is
>> slightly less trivial, since this alignment was removed and user-level
>> APIs provide no easy way to enforce it.
>
> %cpu is actually based on statclock(), and not even used for scheduling.

Maybe for SCHED_4BSD, but not for SCHED_ULE.
In SCHED_ULE both %cpu and thread priority are based on the same ts_ticks counter, which uses hardclock() as its time source. The interactivity calculation uses similar logic and the same time source.

--
Alexander Motin

From owner-freebsd-arch@FreeBSD.ORG Thu Jan 3 16:55:07 2013
Date: Fri, 4 Jan 2013 03:54:55 +1100 (EST)
From: Bruce Evans
To: Alexander Motin
Cc: Davide Italiano, Ian Lepore, Marius Strobl, FreeBSD Current, freebsd-arch@FreeBSD.org, Konstantin Belousov
Subject: Re: [RFC/RFT] calloutng
In-Reply-To: <50E5ADE1.4020104@FreeBSD.org>
Message-ID: <20130104034917.O1929@besplex.bde.org>
References: <20121225232126.GA47692@alchemy.franken.de> <50DB4EFE.2020600@FreeBSD.org> <1356909223.54953.74.camel@revolution.hippie.lan> <20121231061735.GA5866@onelab2.iet.unipi.it> <50E16637.9070501@FreeBSD.org> <20130102105730.GA42542@onelab2.iet.unipi.it> <50E418EA.7030801@FreeBSD.org> <20130102122743.GA43241@onelab2.iet.unipi.it> <20130102162206.GA45701@onelab2.iet.unipi.it> <20130102170934.GA82219@kib.kiev.ua> <50E4A902.4050307@FreeBSD.org> <20130103232413.O947@besplex.bde.org> <50E5ADE1.4020104@FreeBSD.org>

On Thu, 3 Jan 2013, Alexander Motin wrote:

> On 03.01.2013 16:45, Bruce Evans wrote:
>> On Wed, 2 Jan 2013, Alexander Motin wrote:
>>> More important for scheduling fairness thread's CPU percentage is also
>>> based on hardclock() and hiding from it was trivial before, since all
>>> sleep primitives were strictly aligned to hardclock(). Now it is
>>> slightly less trivial, since this alignment was removed and user-level
>>> APIs provide no easy way to enforce it.
>>
>> %cpu is actually based on statclock(), and not even used for scheduling.
>
> Maybe for SCHED_4BSD, but not for SCHED_ULE. In SCHED_ULE both %cpu and
> thread priority are based on the same ts_ticks counter, which uses
> hardclock() as its time source. The interactivity calculation uses similar
> logic and the same time source.

Hmm. I missed this because it hacks on the 'ticks' global. It is clearer in intermediate versions which use the scheduler API sched_tick(), which is the hardclock analogue of sched_clock() for statclock. sched_tick() is now bogus since it is null for all schedulers.
Bruce

From owner-freebsd-arch@FreeBSD.ORG Thu Jan 3 18:15:06 2013
Date: Thu, 3 Jan 2013 13:14:29 -0500
From: Eitan Adler
To: Dag-Erling Smørgrav
Cc: arch@freebsd.org
Subject: Re: BURN_BRIDGES in bsd.own.mk
In-Reply-To: <86zk0q1czz.fsf@ds4.des.no>
References: <86zk0q1czz.fsf@ds4.des.no>

On 3 January 2013 10:51, Dag-Erling Smørgrav wrote:
> We still have code in bsd.own.mk to support a selection of old-style
> NO_FOO options, protected by .if !defined(BURN_BRIDGES) (which is not in
> any way related to the similarly-named kernel option). There is also
> bsd.compat.mk, which handles the even older-style NOFOO options (by
> translating them to NO_FOO and emitting a warning). These chunks of
> code date back to 2006 and 2004, respectively. I think it's way past
> time we nuked them. Any objections?

No. See http://www.freebsd.org/cgi/query-pr.cgi?pr=conf/155738

I've been told this required an exp-run last I asked.
--
Eitan Adler

From owner-freebsd-arch@FreeBSD.ORG Fri Jan 4 06:23:41 2013
Date: Fri, 4 Jan 2013 00:23:29 -0600
From: Warner Losh
To: Dag-Erling Smørgrav
Cc: arch@freebsd.org
Subject: Re: BURN_BRIDGES in bsd.own.mk
In-Reply-To: <86zk0q1czz.fsf@ds4.des.no>
Message-Id: <352CB28B-276F-4B3C-B153-02A3AD435938@bsdimp.com>
References: <86zk0q1czz.fsf@ds4.des.no>

On Jan 3, 2013, at 9:51 AM, Dag-Erling Smørgrav wrote:

> We still have code in bsd.own.mk to support a selection of old-style
> NO_FOO options, protected by .if !defined(BURN_BRIDGES) (which is not in
> any way related to the similarly-named kernel option). There is also
> bsd.compat.mk, which handles the even older-style NOFOO options (by
> translating them to NO_FOO and emitting a warning). These chunks of
> code date back to 2006 and 2004, respectively. I think it's way past
> time we nuked them. Any objections?

Kill them. When they were put in, they were there to designate things that shouldn't be relied upon and would be deleted soon...

Warner