From owner-freebsd-arch@FreeBSD.ORG  Wed Jan  2 10:58:35 2013
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 14FE2A14;
 Wed,  2 Jan 2013 10:58:35 +0000 (UTC)
 (envelope-from luigi@onelab2.iet.unipi.it)
Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238])
 by mx1.freebsd.org (Postfix) with ESMTP id 6D1D68FC08;
 Wed,  2 Jan 2013 10:58:34 +0000 (UTC)
Received: by onelab2.iet.unipi.it (Postfix, from userid 275)
 id 3209473027; Wed,  2 Jan 2013 11:57:30 +0100 (CET)
Date: Wed, 2 Jan 2013 11:57:30 +0100
From: Luigi Rizzo <rizzo@iet.unipi.it>
To: Alexander Motin <mav@FreeBSD.org>
Subject: Re: [RFC/RFT] calloutng
Message-ID: <20130102105730.GA42542@onelab2.iet.unipi.it>
References: <50CCAB99.4040308@FreeBSD.org> <50CE5B54.3050905@FreeBSD.org>
 <50D03173.9080904@FreeBSD.org> <20121225232126.GA47692@alchemy.franken.de>
 <50DB4EFE.2020600@FreeBSD.org>
 <1356909223.54953.74.camel@revolution.hippie.lan>
 <20121231061735.GA5866@onelab2.iet.unipi.it> <50E16637.9070501@FreeBSD.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <50E16637.9070501@FreeBSD.org>
User-Agent: Mutt/1.4.2.3i
Cc: Davide Italiano <davide@freebsd.org>,
 Ian Lepore <freebsd@damnhippie.dyndns.org>,
 FreeBSD Current <freebsd-current@freebsd.org>,
 Marius Strobl <marius@alchemy.franken.de>, freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 02 Jan 2013 10:58:35 -0000

On Mon, Dec 31, 2012 at 12:17:27PM +0200, Alexander Motin wrote:
> On 31.12.2012 08:17, Luigi Rizzo wrote:
> >On Sun, Dec 30, 2012 at 04:13:43PM -0700, Ian Lepore wrote:
...
> >>Then I noticed you had a 12_26 patchset so I tested
> >>that (after crudely fixing a couple uninitialized var warnings), and it
> >>all looks good on this arm (Raspberry Pi).  I'll attach the results.
> >>
> >>It's so sweet to be able to do precision sleeps.
> 
> Thank you for testing, Ian.
> 
> >interesting numbers, but there seems to be some problem in computing
> >the exact interval; delays are much larger than expected.
> >
> >In this test, the original timer code used to round to the next multiple
> >of 1 tick and then add another tick (except for the kqueue case),
> >which is exactly what you see in the second set of measurements.
> >
> >The calloutng code however seems to do something odd:
> >in addition to fixed overhead (some 50us, which you can see in
> >the tests for 1us and 300us), all delays seem to be ~10% larger
> >than what is requested, upper bounded to 10ms (note, the
> >numbers are averages so i cannot tell whether all samples are
> >the same or there is some distribution of values).
> >
> >I am not sure if this error is peculiar to the ARM version or also
> >appears on x86/amd64 but I believe it should be fixed.
> >
> >If you look at the results below:
> >
> >1us 	possibly ok:
> >	for very short intervals i would expect some kind
> >	of 'reschedule' without actually firing a timer; maybe
> >	50us are what it takes to do a round through the scheduler ?
> >
> >300us	probably ok
> >	i guess the extra 50-90us are what it takes to do a round
> >	through the scheduler
> >
> >1000us	borderline (this is the case for poll and kqueue, which are
> >	rounded to 1ms)
> >	here intervals seem to be increased by 10%, and i cannot see
> >	a good reason for this (more below).
> >
> >3000us and above: wrong
> >	here again, the intervals seem to be 10% larger than what is
> >	requested, perhaps limiting the error to 10-20ms.
> >
> >
> >Maybe the 10% extension results from creating a default 'precision'
> >for legacy calls, but i do not think this is done correctly.
> >
> >First of all, if users do not specify a precision themselves, the
> >automatically generated value should never exceed one tick.
> >
> >Second, the only point of a 'precision' parameter is to merge
> >requests that may be close in time, so if there is already a
> >timer scheduled within [Treq, Treq+precision] i will get it;
> >but if there is no pending timer, then one should schedule it
> >for the requested interval.
> >
> >Davide/Alexander, any ideas ?
> 
> All mentioned effects could be explained with implemented logic. 50us at 
> 1us is probably sum of minimal latency of the hardware eventtimer on the 
> specific platform and some software processing overhead (syscall, 
> callout, timecounters, scheduler, etc). At later points system starts to 
> noticeably use precision specified by kern.timecounter.alloweddeviation 
> sysctl. It affects results from two sides: 1) extending intervals for 
> specified percent of time to allow event aggregation, and 2) choosing 
> time base between fast getbinuptime() and precise binuptime(). Extending 
> interval is needed to aggregate not only callouts with each other, but 
> also callouts with other system events, which are impossible to schedule 
> in advance. It gives specified relative error, but no more than one CPU 
> wakeup period in absolute: for busy CPU (not skipping hardclock() ticks) 
> it is 1/hz, for completely idle one it can be up to 0.5s. Second point 
> allows to reduce processing overhead by the cost of error up to 1/hz for 
> long periods (>(100/allowed)*(1/hz)), when it is used.

i am not sure what you mean by "extending interval", but i believe the
logic should be the following:

- say user requests a timeout after X seconds and with a tolerance of D seconds
  (both X and D are fractional, so they can be short).  Interpret this as

   "the system should do its best to generate an event between X and X+D seconds"

- convert X to an absolute time, T_X

- if there are any pending events already scheduled between T_X and T_X+D,
  then by definition they are acceptable. Attach the requested timeout
  to the earliest of these events.

- otherwise, schedule an event at time T_X (because there is no valid
  reason to generate a late event, and it makes no sense from an
  energy saving standpoint, either -- see below).
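
The steps above can be sketched as follows. This is only an illustration
of the proposed policy, not the calloutng code: the function name
place_timeout and the sorted list `pending` of already-scheduled event
times are hypothetical, and all times are in microseconds.

```python
import bisect

def place_timeout(pending, now, x, d):
    """Return the absolute time at which the requested timeout will fire.

    pending -- sorted list of absolute times of already-scheduled events
               (hypothetical data structure, for illustration only)
    now     -- current absolute time
    x, d    -- requested delay and tolerance
    """
    t_x = now + x                          # convert X to an absolute time T_X
    i = bisect.bisect_left(pending, t_x)   # first pending event at or after T_X
    if i < len(pending) and pending[i] <= t_x + d:
        return pending[i]                  # earliest event in [T_X, T_X+D]: reuse it
    pending.insert(i, t_x)                 # nothing to merge with: schedule at T_X
    return t_x
```

With pending events at 100000, 305000 and 900000, a request for
X = 300000, D = 10000 issued at time 0 is attached to the event at
305000, while a request for X = 500000 gets a new event at 500000.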

It seems to me that you are instead extending the requested interval
upfront, which causes some gratuitous pessimizations in scheduling
the callout.

Re. energy savings: the gain in extending the timeout cannot exceed
the value D/X. So while it may make sense to extend a 1us request
to 50us to go (theoretically) from 1M callouts/s to 20K callouts/s,
it is completely pointless from an energy saving standpoint to
introduce a 10ms error on a 300ms request.

(even though i hate the idea that a 1us request defaults to
a 50us delay; but that is hopefully something that can be tuned
in a platform-specific way and perhaps at runtime).

cheers
luigi

> To get best possible precision kern.timecounter.alloweddeviation sysctl 
> can be set to smaller value. Setting it to 0 will effectively disable 
> all optimizations, but should give 50us precision in all cases.
> 
> >>for t in 1 300 3000 30000 300000 ; do
> >>   for m in select poll usleep nanosleep kqueue kqueueto syscall ; do
> >>     ./testsleep $t $m
> >>   done
> >>done
> >>
> >>
> >>With calloutng_12_26.patch...
> >>
> >>                 HZ=100               HZ=250               HZ=1000
> >>---------- ----------------     ----------------     ----------------
> >>select          1     55.79          1     50.96          1     61.32
> >>poll            1   1109.46          1   1107.86          1   1114.38
> >>usleep          1     56.33          1     72.90          1     62.78
> >>nanosleep       1     52.66          1     55.23          1     64.23
> >>kqueue          1   1114.23          1   1113.81          1   1121.21
> >>kqueueto        1     65.44          1     71.00          1     75.01
> >>syscall         1      4.70          1      4.45          1      4.55
> >>select        300    355.79        300    357.76        300    362.35
> >>poll          300   1107.85        300   1122.55        300   1115.62
> >>usleep        300    355.28        300    357.28        300    360.79
> >>nanosleep     300    354.49        300    355.82        300    360.62
> >>kqueue        300   1112.57        300   1118.13        300   1117.16
> >>kqueueto      300    375.98        300    378.62        300    395.61
> >>syscall       300      4.41        300      4.45        300      4.54
> >>select       3000   3246.75       3000   3246.74       3000   3252.72
> >>poll         3000   3238.10       3000   3229.12       3000   3250.10
> >>usleep       3000   3242.47       3000   3237.06       3000   3249.61
> >>nanosleep    3000   3238.79       3000   3231.55       3000   3248.11
> >>kqueue       3000   3240.01       3000   3236.07       3000   3247.60
> >>kqueueto     3000   3265.36       3000   3267.22       3000   3274.96
> >>syscall      3000      4.69       3000      4.44       3000      4.50
> >>select      30000  31714.60      30000  31941.17      30000  32467.69
> >>poll        30000  31522.76      30000  31983.00      30000  32497.81
> >>usleep      30000  31459.67      30000  31980.76      30000  32458.71
> >>nanosleep   30000  31431.02      30000  31982.22      30000  32525.20
> >>kqueue      30000  31466.75      30000  31873.90      30000  31973.54
> >>kqueueto    30000  31564.67      30000  32522.35      30000  32475.59
> >>syscall     30000      4.70      30000      4.73      30000      4.89
> >>select     300000 319133.02     300000 311562.33     300000 309918.62
> >>poll       300000 319604.27     300000 311422.94     300000 310000.76
> >>usleep     300000 319314.60     300000 311269.69     300000 309996.34
> >>nanosleep  300000 319497.58     300000 311425.40     300000 309997.13
> >>kqueue     300000 309995.55     300000 303980.27     300000 309908.82
> >>kqueueto   300000 319505.88     300000 311424.97     300000 309996.16
> >>syscall    300000      4.41     300000      4.45     300000      4.89
> >>
> >>
> >>With no patches...
> >>
> >>                 HZ=100               HZ=250               HZ=1000
> >>---------- ----------------     ----------------     ----------------
> >>select          1  19941.70          1   7989.10          1   1999.16
> >>poll            1  19904.61          1   7987.32          1   1999.78
> >>usleep          1  19904.95          1   7993.30          1   1999.96
> >>nanosleep       1  19905.64          1   7993.71          1   1999.72
> >>kqueue          1  10001.61          1   4004.00          1   1000.27
> >>kqueueto        1  19904.00          1   7993.03          1   1999.54
> >>syscall         1      4.04          1      4.05          1      4.75
> >>select        300  19904.66        300   7998.39        300   2000.27
> >>poll          300  19904.35        300   7993.47        300   1999.86
> >>usleep        300  19903.96        300   7994.11        300   1999.81
> >>nanosleep     300  19904.48        300   7993.77        300   1999.80
> >>kqueue        300  10001.68        300   4004.18        300   1000.31
> >>kqueueto      300  19997.86        300   7993.37        300   1999.59
> >>syscall       300      4.01        300      4.00        300      4.32
> >>select       3000  19904.80       3000   7998.85       3000   3998.43
> >>poll         3000  19904.92       3000   8005.93       3000   3999.39
> >>usleep       3000  19904.50       3000   7992.88       3000   3999.44
> >>nanosleep    3000  19904.84       3000   7993.34       3000   3999.36
> >>kqueue       3000  10001.58       3000   4003.97       3000   3000.72
> >>kqueueto     3000  19903.56       3000   7993.24       3000   3999.34
> >>syscall      3000      4.02       3000      4.37       3000      4.29
> >>select      30000  39905.02      30000  35991.79      30000  31051.77
> >>poll        30000  39905.49      30000  35980.35      30000  30995.64
> >>usleep      30000  39903.78      30000  35979.48      30000  30995.23
> >>nanosleep   30000  39904.55      30000  35981.61      30000  30995.87
> >>kqueue      30000  30002.73      30000  32019.54      30000  30004.83
> >>kqueueto    30000  39903.59      30000  35979.64      30000  30996.05
> >>syscall     30000      4.44      30000      4.04      30000      4.31
> >>select     300000 310001.23     300000 303995.86     300000 300994.30
> >>poll       300000 309902.73     300000 303981.58     300000 300996.17
> >>usleep     300000 309903.64     300000 303980.17     300000 300997.42
> >>nanosleep  300000 309903.32     300000 303980.36     300000 300993.64
> >>kqueue     300000 300002.77     300000 300019.46     300000 300006.90
> >>kqueueto   300000 309903.31     300000 303978.10     300000 300996.84
> >>syscall    300000      4.01     300000      4.04     300000      4.29
> 
> 
> -- 
> Alexander Motin

From owner-freebsd-arch@FreeBSD.ORG  Wed Jan  2 11:24:37 2013
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id A3C63FF3;
 Wed,  2 Jan 2013 11:24:37 +0000 (UTC)
 (envelope-from mavbsd@gmail.com)
Received: from mail-ea0-f178.google.com (mail-ea0-f178.google.com
 [209.85.215.178])
 by mx1.freebsd.org (Postfix) with ESMTP id C58628FC14;
 Wed,  2 Jan 2013 11:24:36 +0000 (UTC)
Received: by mail-ea0-f178.google.com with SMTP id k11so5884314eaa.23
 for <multiple recipients>; Wed, 02 Jan 2013 03:24:30 -0800 (PST)
X-Received: by 10.14.215.6 with SMTP id d6mr124479099eep.40.1357125869943;
 Wed, 02 Jan 2013 03:24:29 -0800 (PST)
Received: from mavbook.mavhome.dp.ua ([91.198.175.1])
 by mx.google.com with ESMTPS id 43sm96922786eed.10.2013.01.02.03.24.27
 (version=TLSv1/SSLv3 cipher=OTHER);
 Wed, 02 Jan 2013 03:24:28 -0800 (PST)
Sender: Alexander Motin <mavbsd@gmail.com>
Message-ID: <50E418EA.7030801@FreeBSD.org>
Date: Wed, 02 Jan 2013 13:24:26 +0200
From: Alexander Motin <mav@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:13.0) Gecko/20120628 Thunderbird/13.0.1
MIME-Version: 1.0
To: Luigi Rizzo <rizzo@iet.unipi.it>
Subject: Re: [RFC/RFT] calloutng
References: <50CCAB99.4040308@FreeBSD.org> <50CE5B54.3050905@FreeBSD.org>
 <50D03173.9080904@FreeBSD.org> <20121225232126.GA47692@alchemy.franken.de>
 <50DB4EFE.2020600@FreeBSD.org>
 <1356909223.54953.74.camel@revolution.hippie.lan>
 <20121231061735.GA5866@onelab2.iet.unipi.it> <50E16637.9070501@FreeBSD.org>
 <20130102105730.GA42542@onelab2.iet.unipi.it>
In-Reply-To: <20130102105730.GA42542@onelab2.iet.unipi.it>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Davide Italiano <davide@freebsd.org>,
 Ian Lepore <freebsd@damnhippie.dyndns.org>,
 FreeBSD Current <freebsd-current@freebsd.org>,
 Marius Strobl <marius@alchemy.franken.de>, freebsd-arch@freebsd.org
X-List-Received-Date: Wed, 02 Jan 2013 11:24:37 -0000

On 02.01.2013 12:57, Luigi Rizzo wrote:
> On Mon, Dec 31, 2012 at 12:17:27PM +0200, Alexander Motin wrote:
>> On 31.12.2012 08:17, Luigi Rizzo wrote:
>>> On Sun, Dec 30, 2012 at 04:13:43PM -0700, Ian Lepore wrote:
> ...
>>>> Then I noticed you had a 12_26 patchset so I tested
>>>> that (after crudely fixing a couple uninitialized var warnings), and it
>>>> all looks good on this arm (Raspberry Pi).  I'll attach the results.
>>>>
>>>> It's so sweet to be able to do precision sleeps.
>>
>> Thank you for testing, Ian.
>>
>>> interesting numbers, but there seems to be some problem in computing
>>> the exact interval; delays are much larger than expected.
>>>
>>> In this test, the original timer code used to round to the next multiple
>>> of 1 tick and then add another tick (except for the kqueue case),
>>> which is exactly what you see in the second set of measurements.
>>>
>>> The calloutng code however seems to do something odd:
>>> in addition to fixed overhead (some 50us, which you can see in
>>> the tests for 1us and 300us), all delays seem to be ~10% larger
>>> than what is requested, upper bounded to 10ms (note, the
>>> numbers are averages so i cannot tell whether all samples are
>>> the same or there is some distribution of values).
>>>
>>> I am not sure if this error is peculiar to the ARM version or also
>>> appears on x86/amd64 but I believe it should be fixed.
>>>
>>> If you look at the results below:
>>>
>>> 1us 	possibly ok:
>>> 	for very short intervals i would expect some kind
>>> 	of 'reschedule' without actually firing a timer; maybe
>>> 	50us are what it takes to do a round through the scheduler ?
>>>
>>> 300us	probably ok
>>> 	i guess the extra 50-90us are what it takes to do a round
>>> 	through the scheduler
>>>
>>> 1000us	borderline (this is the case for poll and kqueue, which are
>>> 	rounded to 1ms)
>>> 	here intervals seem to be increased by 10%, and i cannot see
>>> 	a good reason for this (more below).
>>>
>>> 3000us and above: wrong
>>> 	here again, the intervals seem to be 10% larger than what is
>>> 	requested, perhaps limiting the error to 10-20ms.
>>>
>>>
>>> Maybe the 10% extension results from creating a default 'precision'
>>> for legacy calls, but i do not think this is done correctly.
>>>
>>> First of all, if users do not specify a precision themselves, the
>>> automatically generated value should never exceed one tick.
>>>
>>> Second, the only point of a 'precision' parameter is to merge
>>> requests that may be close in time, so if there is already a
>>> timer scheduled within [Treq, Treq+precision] i will get it;
>>> but if there is no pending timer, then one should schedule it
>>> for the requested interval.
>>>
>>> Davide/Alexander, any ideas ?
>>
>> All mentioned effects could be explained with implemented logic. 50us at
>> 1us is probably sum of minimal latency of the hardware eventtimer on the
>> specific platform and some software processing overhead (syscall,
>> callout, timecounters, scheduler, etc). At later points system starts to
>> noticeably use precision specified by kern.timecounter.alloweddeviation
>> sysctl. It affects results from two sides: 1) extending intervals for
>> specified percent of time to allow event aggregation, and 2) choosing
>> time base between fast getbinuptime() and precise binuptime(). Extending
>> interval is needed to aggregate not only callouts with each other, but
>> also callouts with other system events, which are impossible to schedule
>> in advance. It gives specified relative error, but no more than one CPU
>> wakeup period in absolute: for busy CPU (not skipping hardclock() ticks)
>> it is 1/hz, for completely idle one it can be up to 0.5s. Second point
>> allows to reduce processing overhead by the cost of error up to 1/hz for
>> long periods (>(100/allowed)*(1/hz)), when it is used.
>
> i am not sure what you mean by "extending interval", but i believe the
> logic should be the following:
>
> - say user requests a timeout after X seconds and with a tolerance of D seconds
>    (both X and D are fractional, so they can be short).  Interpret this as
>
>     "the system should do its best to generate an event between X and X+D seconds"
>
> - convert X to an absolute time, T_X
>
> - if there are any pending events already scheduled between T_X and T_X+D,
>    then by definition they are acceptable. Attach the requested timeout
>    to the earliest of these events.

All of the above is true, but not what follows.

> - otherwise, schedule an event at time T_X (because there is no valid
>    reason to generate a late event, and it makes no sense from an
>    energy saving standpoint, either -- see below).

A system may have many interrupt sources besides the timer: network, disk, ...
WiFi cards generate interrupts at the AP beacon rate -- dozens of times per
second. It is not very efficient to wake up the CPU precisely at time T_X,
which may be just 100us earlier than the next hardware interrupt. That's why
timer interrupts are scheduled at min(T_X+D, 0.5s, next hardclock, next
statclock, ...). As a result, the event will be handled within the allowed
range, but the real delay will depend on current environment conditions.
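
A rough sketch of that rule, for illustration only and not the actual
kernel code: the function names are hypothetical, times are in
microseconds, and "0.5s" is assumed here to mean "at most 0.5s from now".

```python
def next_timer_interrupt(now, t_x, d, next_hardclock, next_statclock,
                         max_idle=500_000):
    """Time the timer hardware is programmed for: the earliest of T_X+D,
    now + 0.5s (assumed meaning of the 0.5s bound), and the next
    hardclock/statclock."""
    return min(t_x + d, now + max_idle, next_hardclock, next_statclock)

def handled_at(t_x, d, other_wakeups):
    """Time the callout actually runs: the first unrelated wakeup that
    falls in [T_X, T_X+D] if there is one, else the timer at T_X+D."""
    in_window = [w for w in other_wakeups if t_x <= w <= t_x + d]
    return min(in_window, default=t_x + d)
```

For example, with T_X = 300000 and D = 10000, a network interrupt at
300100 lets the callout run at 300100; with no interrupt in the window
it runs at 310000.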

> It seems to me that you are instead extending the requested interval
> upfront, which causes some gratuitous pessimizations in scheduling
> the callout.
>
> Re. energy savings: the gain in extending the timeout cannot exceed
> the value D/X. So while it may make sense to extend a 1us request
> to 50us to go (theoretically) from 1M callouts/s to 20K callouts/s,
> it is completely pointless from an energy saving standpoint to
> introduce a 10ms error on a 300ms request.

I am not so sure about this. When a CPU package is in the C7 sleep state with
all buses and caches shut down and memory set to self refresh, it consumes
very little power (a few milliwatts). Waking up from that state takes
100us or even more, with power consumption much higher than the normal
operational one. Sure, if we compare it with the power consumption at 100%
CPU load, the difference between 10 and 100 wakeups per second may be small,
but when comparing them to each other in some low-power environment on a
mostly idle system it may be much more significant.

> (even though i hate the idea that a 1us request defaults to
> a 50us delay; but that is hopefully something that can be tuned
> in a platform-specific way and perhaps at runtime).

It is 50us on this ARM. On SandyBridge Core i7 it is only about 2us.

-- 
Alexander Motin

From owner-freebsd-arch@FreeBSD.ORG  Wed Jan  2 12:28:41 2013
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id A35DFC3E;
 Wed,  2 Jan 2013 12:28:41 +0000 (UTC)
 (envelope-from luigi@onelab2.iet.unipi.it)
Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238])
 by mx1.freebsd.org (Postfix) with ESMTP id 1E42F8FC12;
 Wed,  2 Jan 2013 12:28:39 +0000 (UTC)
Received: by onelab2.iet.unipi.it (Postfix, from userid 275)
 id 9B8E87300A; Wed,  2 Jan 2013 13:27:43 +0100 (CET)
Date: Wed, 2 Jan 2013 13:27:43 +0100
From: Luigi Rizzo <rizzo@iet.unipi.it>
To: Alexander Motin <mav@FreeBSD.org>
Subject: Re: [RFC/RFT] calloutng
Message-ID: <20130102122743.GA43241@onelab2.iet.unipi.it>
References: <50CCAB99.4040308@FreeBSD.org> <50CE5B54.3050905@FreeBSD.org>
 <50D03173.9080904@FreeBSD.org> <20121225232126.GA47692@alchemy.franken.de>
 <50DB4EFE.2020600@FreeBSD.org>
 <1356909223.54953.74.camel@revolution.hippie.lan>
 <20121231061735.GA5866@onelab2.iet.unipi.it> <50E16637.9070501@FreeBSD.org>
 <20130102105730.GA42542@onelab2.iet.unipi.it> <50E418EA.7030801@FreeBSD.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <50E418EA.7030801@FreeBSD.org>
User-Agent: Mutt/1.4.2.3i
Cc: Davide Italiano <davide@freebsd.org>,
 Ian Lepore <freebsd@damnhippie.dyndns.org>,
 FreeBSD Current <freebsd-current@freebsd.org>,
 Marius Strobl <marius@alchemy.franken.de>, freebsd-arch@freebsd.org
X-List-Received-Date: Wed, 02 Jan 2013 12:28:41 -0000

On Wed, Jan 02, 2013 at 01:24:26PM +0200, Alexander Motin wrote:
> On 02.01.2013 12:57, Luigi Rizzo wrote:
...
> >i am not sure what you mean by "extending interval", but i believe the
> >logic should be the following:
> >
> >- say user requests a timeout after X seconds and with a tolerance of D 
> >seconds
> >   (both X and D are fractional, so they can be short).  Interpret this as
> >
> >    "the system should do its best to generate an event between X and X+D 
> >    seconds"
> >
> >- convert X to an absolute time, T_X
> >
> >- if there are any pending events already scheduled between T_X and T_X+D,
> >   then by definition they are acceptable. Attach the requested timeout
> >   to the earliest of these events.
> 
> All of the above is true, but not what follows.
> 
> >- otherwise, schedule an event at time T_X (because there is no valid
> >   reason to generate a late event, and it makes no sense from an
> >   energy saving standpoint, either -- see below).
> 
> A system may have many interrupt sources besides the timer: network, disk, ...
> WiFi cards generate interrupts at the AP beacon rate -- dozens of times per
> second. It is not very efficient to wake up the CPU precisely at time T_X,
> which may be just 100us earlier than the next hardware interrupt. That's why
> timer interrupts are scheduled at min(T_X+D, 0.5s, next hardclock, next
> statclock, ...). As a result, the event will be handled within the allowed
> range, but the real delay will depend on current environment conditions.

I don't see why system events (hardclock, statclock, 0.5s,...)
need to be treated specially -- and i am saying this also in
the interest of simplifying the logic of the code.

First of all, if you know that there is already a hardclock/statclock/*
scheduled in [T_X, T_X+D] you just reuse that. This particular bullet
was "no event scheduled in [T_X, T_X+D]" so you need to generate
a new one.

Surely scheduling the event at T_X+D instead of T_X increases the
chance of merging events. But the savings are smaller and smaller
as the value X increases. This particular client will only
change its request rate from 1/X to 1/(X+D) so in relative terms
the gain is ( 1/X - 1/(X+D) ) / ( 1/(X+D) ) = D/X

Example: if X = 300ms, and D = 10ms (as in the test case)
you just save about one interrupt in 30 (roughly one every 9 seconds)
by scheduling at T_X+D instead of T_X. Are we actually able to measure
the difference ?
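
The arithmetic can be checked with a few lines of Python. This is just
a sketch; the function names are made up for the example, and X and D
are in seconds.

```python
def relative_gain(x, d):
    """Relative reduction in wakeup rate when firing every X+D
    instead of every X; algebraically this equals D/X."""
    rate_at_tx = 1.0 / x          # wakeups/s when scheduling at T_X
    rate_at_txd = 1.0 / (x + d)   # wakeups/s when scheduling at T_X+D
    return (rate_at_tx - rate_at_txd) / rate_at_txd

def seconds_per_saved_interrupt(x, d):
    """How long it takes, on average, to save one interrupt."""
    return 1.0 / (1.0 / x - 1.0 / (x + d))

print(relative_gain(0.300, 0.010))                # X = 300ms, D = 10ms
print(seconds_per_saved_interrupt(0.300, 0.010))
print(relative_gain(1e-6, 49e-6))                 # 1us request served after 50us
```

The first case gives a gain of about 3%, the last one a gain of 49,
which is the 1M callouts/s versus 20K callouts/s figure.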

Even at high interrupt rates (e.g. X = 1ms) you are not
going to save a lot unless the tolerance D is very large,
which is generally undesirable for other reasons
(presumably, applications are not going to be happy
if you artificially double their timeouts).
Now, say your application requests timeouts every X = 300ms.

> >It seems to me that you are instead extending the requested interval
> >upfront, which causes some gratuitous pessimizations in scheduling
> >the callout.
> >
> >Re. energy savings: the gain in extending the timeout cannot exceed
> >the value D/X. So while it may make sense to extend a 1us request
> >to 50us to go (theoretically) from 1M callouts/s to 20K callouts/s,
> >it is completely pointless from an energy saving standpoint to
> >introduce a 10ms error on a 300ms request.
> 
> I am not so sure about this. When a CPU package is in the C7 sleep state with
> all buses and caches shut down and memory set to self refresh, it consumes
> very little power (a few milliwatts). Waking up from that state takes
> 100us or even more, with power consumption much higher than the normal
> operational one. Sure, if we compare it with the power consumption at 100%
> CPU load, the difference between 10 and 100 wakeups per second may be small,
> but when comparing them to each other in some low-power environment on a
> mostly idle system it may be much more significant.

see above -- at low rates the difference is not measurable,
at high rates the only obvious answer is "do not use C7
if the next interrupt is due in less than 2..5 milliseconds"
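
As a sketch of that rule (purely illustrative: the two-state model, the
"C1" shallow state, the function name and the 2ms default are
assumptions, with the threshold taken from the low end of the 2..5ms
range above):

```python
def pick_idle_state(us_to_next_interrupt, c7_min_idle_us=2000):
    """Choose an idle state from the time until the next known interrupt.

    Enter the deep C7 state only when the CPU is expected to stay idle
    for at least c7_min_idle_us; otherwise stay in a shallow state
    (called "C1" here as an assumption) to avoid the wakeup cost.
    """
    if us_to_next_interrupt < c7_min_idle_us:
        return "C1"
    return "C7"
```

So a timer due in 1ms keeps the CPU in the shallow state, while a timer
due in 300ms allows C7.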

> >(even though i hate the idea that a 1us request defaults to
> >a 50us delay; but that is hopefully something that can be tuned
> >in a platform-specific way and perhaps at runtime).
> 
> It is 50us on this ARM. On SandyBridge Core i7 it is only about 2us.

very good, i suspected something similar, just wanted to be sure :)

cheers
luigi

> -- 
> Alexander Motin

From owner-freebsd-arch@FreeBSD.ORG  Wed Jan  2 14:03:06 2013
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 9388A5B5;
 Wed,  2 Jan 2013 14:03:06 +0000 (UTC)
 (envelope-from freebsd@damnhippie.dyndns.org)
Received: from duck.symmetricom.us (duck.symmetricom.us [206.168.13.214])
 by mx1.freebsd.org (Postfix) with ESMTP id 7BA3B8FC08;
 Wed,  2 Jan 2013 14:03:04 +0000 (UTC)
Received: from damnhippie.dyndns.org (daffy.symmetricom.us [206.168.13.218])
 by duck.symmetricom.us (8.14.5/8.14.5) with ESMTP id r02E2waO021163;
 Wed, 2 Jan 2013 07:02:58 -0700 (MST)
 (envelope-from freebsd@damnhippie.dyndns.org)
Received: from [172.22.42.240] (revolution.hippie.lan [172.22.42.240])
 by damnhippie.dyndns.org (8.14.3/8.14.3) with ESMTP id r02E2sIQ087850;
 Wed, 2 Jan 2013 07:02:54 -0700 (MST)
 (envelope-from freebsd@damnhippie.dyndns.org)
Subject: Re: [RFC/RFT] calloutng
From: Ian Lepore <freebsd@damnhippie.dyndns.org>
To: Alexander Motin <mav@freebsd.org>
In-Reply-To: <CAO4K=PUHAH=UNzMde0V2TwkN5vj3gw9hHj5yCQxDvdUn+uqv7w@mail.gmail.com>
References: <50CCAB99.4040308@FreeBSD.org> <50CE5B54.3050905@FreeBSD.org>
 <50D03173.9080904@FreeBSD.org> <20121225232126.GA47692@alchemy.franken.de>
 <50DB4EFE.2020600@FreeBSD.org>
 <1356909223.54953.74.camel@revolution.hippie.lan>
 <20121231061735.GA5866@onelab2.iet.unipi.it> <50E16637.9070501@FreeBSD.org>
 <20130102105730.GA42542@onelab2.iet.unipi.it>
 <50E418EA.7030801@FreeBSD.org>
 <20130102122743.GA43241@onelab2.iet.unipi.it>
 <CAO4K=PUHAH=UNzMde0V2TwkN5vj3gw9hHj5yCQxDvdUn+uqv7w@mail.gmail.com>
Content-Type: text/plain; charset="koi8-r"
Date: Wed, 02 Jan 2013 07:02:54 -0700
Message-ID: <1357135374.54953.150.camel@revolution.hippie.lan>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port 
Content-Transfer-Encoding: 8bit
Cc: Davide Italiano <davide@freebsd.org>,
 Marius Strobl <marius@alchemy.franken.de>,
 FreeBSD Current <freebsd-current@freebsd.org>, freebsd-arch@freebsd.org
X-List-Received-Date: Wed, 02 Jan 2013 14:03:06 -0000

On Wed, 2013-01-02 at 15:11 +0200, Alexander Motin wrote:
> 02.01.2013 14:28, user "Luigi Rizzo" <rizzo@iet.unipi.it> wrote:
> >
> > On Wed, Jan 02, 2013 at 01:24:26PM +0200, Alexander Motin wrote:
> > > On 02.01.2013 12:57, Luigi Rizzo wrote:

> > First of all, if you know that there is already a hardclock/statclock/*
> > scheduled in [T_X, T_X+D] you just reuse that. This particular bullet
> > was "no event scheduled in [T_X, T_X+D]" so you need to generate
> > a new one.
> 
> That is true, but my main point was about merging with external events,
> which I can't predict and the only way to merge is increase sleep period,
> hoping for better.
> 

This really is the crux of the problem, because you can't *by default*
dispatch an event earlier than requested because that's just a violation
of the usual rules of precision timing (where you expect to be late but
never early).

Sometimes there is no need for such precision, and an early wakeup is no
more or less detrimental than a late wakeup.  In fact, that may even be
the majority case.  I wonder if it might make sense to allow the
precision specification to indicate whether it needs traditional "never
early" behavior or whether it can be interpreted as "plus or minus this
amount is fine."  Like maybe negative precision is interpreted as "plus
or minus abs(precision)" or something like that.

Or maybe even the other way around... you get "plus or minus" precision
by default and the few things that really care about precision timing
have a way of indicating that.  (But in that case the userland sleeps
would have to assume the traditional behavior because that's how they've
always been documented.)
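
A minimal sketch of the signed-precision idea, in C. Everything here is
hypothetical: `wake_window`, `precision_to_window`, and the nanosecond
units are illustrative choices, not part of the calloutng patch or of any
FreeBSD KPI. A non-negative precision keeps the traditional "never early"
window; a negative one is read as "plus or minus abs(precision)".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: acceptable wakeup window, absolute time in nanoseconds. */
struct wake_window {
    int64_t earliest;
    int64_t latest;
};

/*
 * Map a requested deadline and a signed precision to a window.
 *   precision >= 0: never early, up to `precision` late.
 *   precision <  0: up to abs(precision) early or late.
 */
static struct wake_window
precision_to_window(int64_t deadline, int64_t precision)
{
    struct wake_window w;

    assert(precision != INT64_MIN);     /* negation would overflow */
    if (precision >= 0) {
        w.earliest = deadline;
        w.latest = deadline + precision;
    } else {
        w.earliest = deadline + precision;  /* precision is negative */
        w.latest = deadline - precision;
    }
    return (w);
}
```

With the opposite default (plus or minus unless told otherwise), only the
sign test would flip; the window arithmetic stays the same.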

-- Ian


From owner-freebsd-arch@FreeBSD.ORG  Wed Jan  2 16:23:04 2013
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 192E191B;
 Wed,  2 Jan 2013 16:23:04 +0000 (UTC)
 (envelope-from luigi@onelab2.iet.unipi.it)
Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238])
 by mx1.freebsd.org (Postfix) with ESMTP id BF0181616;
 Wed,  2 Jan 2013 16:23:03 +0000 (UTC)
Received: by onelab2.iet.unipi.it (Postfix, from userid 275)
 id 622AA7300A; Wed,  2 Jan 2013 17:22:06 +0100 (CET)
Date: Wed, 2 Jan 2013 17:22:06 +0100
From: Luigi Rizzo <rizzo@iet.unipi.it>
To: Alexander Motin <mav@FreeBSD.org>
Subject: Re: [RFC/RFT] calloutng
Message-ID: <20130102162206.GA45701@onelab2.iet.unipi.it>
References: <50D03173.9080904@FreeBSD.org>
 <20121225232126.GA47692@alchemy.franken.de> <50DB4EFE.2020600@FreeBSD.org>
 <1356909223.54953.74.camel@revolution.hippie.lan>
 <20121231061735.GA5866@onelab2.iet.unipi.it> <50E16637.9070501@FreeBSD.org>
 <20130102105730.GA42542@onelab2.iet.unipi.it> <50E418EA.7030801@FreeBSD.org>
 <20130102122743.GA43241@onelab2.iet.unipi.it>
 <CAO4K=PUHAH=UNzMde0V2TwkN5vj3gw9hHj5yCQxDvdUn+uqv7w@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAO4K=PUHAH=UNzMde0V2TwkN5vj3gw9hHj5yCQxDvdUn+uqv7w@mail.gmail.com>
User-Agent: Mutt/1.4.2.3i
Cc: Davide Italiano <davide@freebsd.org>,
 Ian Lepore <freebsd@damnhippie.dyndns.org>, freebsd-arch@freebsd.org,
 FreeBSD Current <freebsd-current@freebsd.org>,
 Marius Strobl <marius@alchemy.franken.de>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 02 Jan 2013 16:23:04 -0000

On Wed, Jan 02, 2013 at 03:11:05PM +0200, Alexander Motin wrote:
...
> > First of all, if you know that there is already a hardclock/statclock/*
> > scheduled in [T_X, T_X+D] you just reuse that. This particular bullet
> > was "no event scheduled in [T_X, T_X+D]" so you need to generate
> > a new one.
> 
> That is true, but my main point was about merging with external events,
> which I can't predict and the only way to merge is increase sleep period,
> hoping for better.

ok, now i understand why you want to schedule for T_X+D.

Probably one way to close this discussion would be to provide
a sysctl so the sysadmin can decide which point in the interval
to pick when there is no suitable callout already scheduled.
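
The choice under discussion can be sketched as follows. This is an
illustration only: `pick_fire_time`, the `pending` array, and the
`late_percent` knob (standing in for the suggested sysctl) are
hypothetical names, not code from the patch. If an event is already
pending in [T_X, T_X+D], the earliest one is reused; otherwise the knob
selects a point in the interval, with 0 meaning T_X and 100 meaning T_X+D.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper. Times are absolute, in microseconds.
 *   pending[]    : firing times of events that are already scheduled
 *   t_x, d       : requested time and tolerance (d >= 0)
 *   late_percent : stand-in for the proposed sysctl, 0..100
 */
static int64_t
pick_fire_time(const int64_t *pending, size_t n, int64_t t_x, int64_t d,
    int late_percent)
{
    int64_t best = 0;
    int found = 0;
    size_t i;

    assert(d >= 0 && late_percent >= 0 && late_percent <= 100);

    /* Reuse the earliest already-scheduled event inside the window. */
    for (i = 0; i < n; i++) {
        if (pending[i] >= t_x && pending[i] <= t_x + d &&
            (!found || pending[i] < best)) {
            best = pending[i];
            found = 1;
        }
    }
    if (found)
        return (best);

    /* Nothing to merge with: let the knob pick a point in the window. */
    return (t_x + d * late_percent / 100);
}
```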

cheers
luigi

-----------------------------------------+-------------------------------
  Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
  http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
  TEL      +39-050-2211611               . via Diotisalvi 2
  Mobile   +39-338-6809875               . 56122 PISA (Italy)
-----------------------------------------+-------------------------------

From owner-freebsd-arch@FreeBSD.ORG  Wed Jan  2 17:01:52 2013
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id CE7CF890;
 Wed,  2 Jan 2013 17:01:52 +0000 (UTC)
 (envelope-from adrian.chadd@gmail.com)
Received: from mail-we0-f175.google.com (mail-we0-f175.google.com
 [74.125.82.175])
 by mx1.freebsd.org (Postfix) with ESMTP id BB21B17C1;
 Wed,  2 Jan 2013 17:01:51 +0000 (UTC)
Received: by mail-we0-f175.google.com with SMTP id z53so6928298wey.34
 for <multiple recipients>; Wed, 02 Jan 2013 09:01:45 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date
 :x-google-sender-auth:message-id:subject:from:to:cc:content-type;
 bh=MQ2/8MvxPXr3Mfaq7+CzCJq3Asbu/F7W+iECw1K5ZLk=;
 b=BG0WgUU5j933tfR8FChP08xQshElk+KiP04OcjMogmx3Pw/Gq2mznTwTWYCo20JrZJ
 kX5eC2kiO9HEve+tA1EXv9CuCaMJMfYX8PQjPZDxGgbH82r2rOVZwu07A6XqjcquMJ5F
 fs6Tf9b4R+hWb7fpm+Bez/RaxmVCahfYCFAZk5x/c/OnFhdyQk+HmZt+X3ZiwaB5Rj/K
 vquQ6pZdKlXmbgghrTDbEap0VfXHHCSD523G2sPyxbPpu1VsUKhA7inpg8Nk/yS4q9ND
 TykABw8tCyoTtfXqVew4tJxif4oG97T9Q7fHCghEPUSjk4X33+ZdYEla0MuhxcdzPcRy
 o1IA==
MIME-Version: 1.0
Received: by 10.194.83.36 with SMTP id n4mr73372240wjy.59.1357142883202; Wed,
 02 Jan 2013 08:08:03 -0800 (PST)
Sender: adrian.chadd@gmail.com
Received: by 10.217.57.9 with HTTP; Wed, 2 Jan 2013 08:08:03 -0800 (PST)
In-Reply-To: <1357135374.54953.150.camel@revolution.hippie.lan>
References: <50CCAB99.4040308@FreeBSD.org> <50CE5B54.3050905@FreeBSD.org>
 <50D03173.9080904@FreeBSD.org>
 <20121225232126.GA47692@alchemy.franken.de>
 <50DB4EFE.2020600@FreeBSD.org>
 <1356909223.54953.74.camel@revolution.hippie.lan>
 <20121231061735.GA5866@onelab2.iet.unipi.it>
 <50E16637.9070501@FreeBSD.org>
 <20130102105730.GA42542@onelab2.iet.unipi.it>
 <50E418EA.7030801@FreeBSD.org>
 <20130102122743.GA43241@onelab2.iet.unipi.it>
 <CAO4K=PUHAH=UNzMde0V2TwkN5vj3gw9hHj5yCQxDvdUn+uqv7w@mail.gmail.com>
 <1357135374.54953.150.camel@revolution.hippie.lan>
Date: Wed, 2 Jan 2013 08:08:03 -0800
X-Google-Sender-Auth: ziUubih1EPjyqc6VtzirwSbcEKY
Message-ID: <CAJ-Vmo=mmm5zhwHyzKeg1VEL8hSz6_LxJAaLh74ArHF3_9KWaQ@mail.gmail.com>
Subject: Re: [RFC/RFT] calloutng
From: Adrian Chadd <adrian@freebsd.org>
To: Ian Lepore <freebsd@damnhippie.dyndns.org>
Content-Type: text/plain; charset=ISO-8859-1
Cc: Davide Italiano <davide@freebsd.org>, Alexander Motin <mav@freebsd.org>,
 Marius Strobl <marius@alchemy.franken.de>,
 FreeBSD Current <freebsd-current@freebsd.org>, freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 02 Jan 2013 17:01:52 -0000

.. I'm pretty damned sure we're going to need to enforce a "never
earlier than X" latency.

Is there a more detailed writeup of calloutng somewhere, besides
David's slides? The wiki page is rather empty.

Eg - I think this work does coalesce wakeups, right? Or it can? So
when in low-power scenarios you can end up with lower-resolution
callout periods, but many less CPU wakeups a second?

(Do we actually _expose_ wakeups-per-second somewhere?)


Adrian

From owner-freebsd-arch@FreeBSD.ORG  Wed Jan  2 17:09:38 2013
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id DBCC8DEE
 for <freebsd-arch@freebsd.org>; Wed,  2 Jan 2013 17:09:38 +0000 (UTC)
 (envelope-from kostikbel@gmail.com)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 by mx1.freebsd.org (Postfix) with ESMTP id 7F42E17E1
 for <freebsd-arch@freebsd.org>; Wed,  2 Jan 2013 17:09:38 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.5/8.14.5) with ESMTP id r02H9YJs040574;
 Wed, 2 Jan 2013 19:09:34 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.7.3 kib.kiev.ua r02H9YJs040574
Received: (from kostik@localhost)
 by tom.home (8.14.5/8.14.5/Submit) id r02H9Y1b040573;
 Wed, 2 Jan 2013 19:09:34 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Wed, 2 Jan 2013 19:09:34 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Luigi Rizzo <rizzo@iet.unipi.it>
Subject: Re: [RFC/RFT] calloutng
Message-ID: <20130102170934.GA82219@kib.kiev.ua>
References: <20121225232126.GA47692@alchemy.franken.de>
 <50DB4EFE.2020600@FreeBSD.org>
 <1356909223.54953.74.camel@revolution.hippie.lan>
 <20121231061735.GA5866@onelab2.iet.unipi.it>
 <50E16637.9070501@FreeBSD.org>
 <20130102105730.GA42542@onelab2.iet.unipi.it>
 <50E418EA.7030801@FreeBSD.org>
 <20130102122743.GA43241@onelab2.iet.unipi.it>
 <CAO4K=PUHAH=UNzMde0V2TwkN5vj3gw9hHj5yCQxDvdUn+uqv7w@mail.gmail.com>
 <20130102162206.GA45701@onelab2.iet.unipi.it>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="1XmnKQGVLLNJnMip"
Content-Disposition: inline
In-Reply-To: <20130102162206.GA45701@onelab2.iet.unipi.it>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 version=3.3.2
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home
Cc: Davide Italiano <davide@freebsd.org>,
 Ian Lepore <freebsd@damnhippie.dyndns.org>, Alexander Motin <mav@freebsd.org>,
 freebsd-arch@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>,
 Marius Strobl <marius@alchemy.franken.de>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 02 Jan 2013 17:09:38 -0000


--1XmnKQGVLLNJnMip
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, Jan 02, 2013 at 05:22:06PM +0100, Luigi Rizzo wrote:
> On Wed, Jan 02, 2013 at 03:11:05PM +0200, Alexander Motin wrote:
> ...
> > > First of all, if you know that there is already a hardclock/statclock/*
> > > scheduled in [T_X, T_X+D] you just reuse that. This particular bullet
> > > was "no event scheduled in [T_X, T_X+D]" so you need to generate
> > > a new one.
> >
> > That is true, but my main point was about merging with external events,
> > which I can't predict and the only way to merge is increase sleep period,
> > hoping for better.
>
> ok, now i understand why you want to schedule for T_X+D.
>
> Probably one way to close this discussion would be to provide
> a sysctl so the sysadmin can decide which point in the interval
> to pick when there is no suitable callout already scheduled.
Isn't trying to synchronize to external events in this way unsafe?
I remember, but cannot find the reference right now, scheduler
exploit(s) which completely hide a malicious thread from the time
accounting by making it voluntarily yield right before statclock
should fire. If statistics gathering could be piggy-backed on an
external interrupt, and an attacker can control the source of the
external events, wouldn't this give her a handle?

--1XmnKQGVLLNJnMip
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)

iQIcBAEBAgAGBQJQ5GnNAAoJEJDCuSvBvK1BD7wP/AiejtH6wGLURiyHvqLN6rvo
pjj0tVyGyBErD8Fk/iGaeT6Ok4mqMoKU2bNEGLE9FhC+9ixqxGJzZdGargtDSigq
f5qDnnGx0ZYGRz9M/E66E8HJnUT0dLd4eb5zOPdHYUS4vceOdhXDOhRma68fRU4a
pPsAPMVgZCFjh44MuL5Ip5hUYYXBvFqloMjsWX5QCm3oYm1mmAN6EX+45jO201sy
BP10zANSKFD59QgFr8vWV9zBzx3dfeG6TjJFDKQ9UCV9OohxdrkFeanxE6cq7gxi
kEjjUUNcCKiQ9XdzfdCawncYoO+9sioPrg+tHMLCaAPXOW+N8v9PGlPwUechS5kj
HAdUwUhEOqFWppPSC9w89WaDGMUWDLo3DXSi+spk+towdaT3caZkKm+EVAmkCYqL
xXwFNCmu/xHMLvqPUvnyH/6uq0DmNOHrtoJxKMMcP3fzRjdWFwzJYNMM6kMTDZGu
/AcLEVJwU9+FgWGZgz+nAyQv62Z23YXk8nLOgUD33lS3zC4KV19onkhzYZ9XHQf2
TasP7/TFUgxjnU8l8QGwTgeb9oaHZHy2O4qH2jvPP3Eb22mkfoiaJHpVWrJ8iwek
C7ZDUJVat0HIdazS6lE1geveBoDmINZemHyBK+f5XGYNfGUefVMqcSU77i8yOfFL
2h3QNlI93AdfFutRaPTU
=oZCF
-----END PGP SIGNATURE-----

--1XmnKQGVLLNJnMip--

From owner-freebsd-arch@FreeBSD.ORG  Wed Jan  2 13:11:08 2013
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
 by hub.freebsd.org (Postfix) with ESMTP id 8CCCA27D;
 Wed,  2 Jan 2013 13:11:08 +0000 (UTC)
 (envelope-from mavbsd@gmail.com)
Received: from mail-ia0-f171.google.com (mail-ia0-f171.google.com
 [209.85.210.171])
 by mx1.freebsd.org (Postfix) with ESMTP id 2539B8FC08;
 Wed,  2 Jan 2013 13:11:07 +0000 (UTC)
Received: by mail-ia0-f171.google.com with SMTP id k27so11814000iad.16
 for <multiple recipients>; Wed, 02 Jan 2013 05:11:06 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date
 :x-google-sender-auth:message-id:subject:from:to:cc:content-type;
 bh=Ge93wgbmwXzBy5wLHFxOPAMkTtllBOn/XYFxpSV6pMg=;
 b=MCWbvY3CpRUbFneMcabounMspKT0sSpk6f2qARWW9WOKXEfN53e3dzOf4uSoZpIkaS
 YY4SoT1V6FyIY4OLA/8XBSuS96ra6DJNnctXvT6A5aLO667HLTotfIWAk0H8n7kxBxcw
 Jujob+9zNT4UGv8XWAytj1JvqVgrkTScPLEueVstkob/tAssk9dF0QmB589AUrL0Y4qz
 fOeMFsSKEd8FUqQGJ3pRzsUwMqqPUjUNxGHujHOICcXBv+ezkSi4X47VBfZgNyFnPEV9
 ydhMiPJmmff2ovijM4CbouCcBdtd6Nx1vT4JGe/wWwdZHCwtnsFZGp5AuCannINT5PoR
 PF9g==
MIME-Version: 1.0
Received: by 10.50.190.199 with SMTP id gs7mr30698329igc.89.1357132266513;
 Wed, 02 Jan 2013 05:11:06 -0800 (PST)
Sender: mavbsd@gmail.com
Received: by 10.231.25.76 with HTTP; Wed, 2 Jan 2013 05:11:05 -0800 (PST)
Received: by 10.231.25.76 with HTTP; Wed, 2 Jan 2013 05:11:05 -0800 (PST)
In-Reply-To: <20130102122743.GA43241@onelab2.iet.unipi.it>
References: <50CCAB99.4040308@FreeBSD.org> <50CE5B54.3050905@FreeBSD.org>
 <50D03173.9080904@FreeBSD.org>
 <20121225232126.GA47692@alchemy.franken.de>
 <50DB4EFE.2020600@FreeBSD.org>
 <1356909223.54953.74.camel@revolution.hippie.lan>
 <20121231061735.GA5866@onelab2.iet.unipi.it>
 <50E16637.9070501@FreeBSD.org>
 <20130102105730.GA42542@onelab2.iet.unipi.it>
 <50E418EA.7030801@FreeBSD.org>
 <20130102122743.GA43241@onelab2.iet.unipi.it>
Date: Wed, 2 Jan 2013 15:11:05 +0200
X-Google-Sender-Auth: c0dsQh7t4Ciqz-RqfgDDC7ofqvc
Message-ID: <CAO4K=PUHAH=UNzMde0V2TwkN5vj3gw9hHj5yCQxDvdUn+uqv7w@mail.gmail.com>
Subject: Re: [RFC/RFT] calloutng
From: Alexander Motin <mav@FreeBSD.org>
To: Luigi Rizzo <rizzo@iet.unipi.it>
X-Mailman-Approved-At: Wed, 02 Jan 2013 21:16:12 +0000
Content-Type: text/plain; charset=KOI8-R
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.14
Cc: Davide Italiano <davide@freebsd.org>,
 Ian Lepore <freebsd@damnhippie.dyndns.org>, freebsd-arch@freebsd.org,
 FreeBSD Current <freebsd-current@freebsd.org>,
 Marius Strobl <marius@alchemy.franken.de>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 02 Jan 2013 13:11:08 -0000

02.01.2013 14:28, user "Luigi Rizzo" <rizzo@iet.unipi.it> wrote:
>
> On Wed, Jan 02, 2013 at 01:24:26PM +0200, Alexander Motin wrote:
> > On 02.01.2013 12:57, Luigi Rizzo wrote:
> ...
> > >i am not sure what you mean by "extending interval", but i believe the
> > >logic should be the following:
> > >
> > >- say user requests a timeout after X seconds and with a tolerance
> > >   of D seconds (both X and D are fractional, so they can be short).
> > >   Interpret this as
> > >
> > >    "the system should do its best to generate an event between X and
> > >    X+D seconds"
> > >
> > >- convert X to an absolute time, T_X
> > >
> > >- if there are any pending events already scheduled between T_X and
> > >   T_X+D, then by definition they are acceptable. Attach the requested
> > >   timeout to the earliest of these events.
> >
> > All of the above is true, but not what follows.
> >
> > >- otherwise, schedule an event at time T_X (because there is no valid
> > >   reason to generate a late event, and it makes no sense from an
> > >   energy saving standpoint, either -- see below).
> >
> > The system may have many interrupts besides the timer: network, disk, ...
> > WiFi cards generate interrupts at the AP beacon rate -- dozens of times
> > per second. It is not very efficient to wake up the CPU precisely at
> > time T_X, which may be just 100us earlier than the next hardware
> > interrupt. That's why timer interrupts are scheduled at min(T_X+D, 0.5s,
> > next hardclock, next statclock, ...). As a result, the event will be
> > handled within the allowed range, but the real delay will depend on
> > current environment conditions.
>
> I don't see why system events (hardclock, statclock, 0.5s,...)
> need to be treated specially -- and i am saying this also in
> the interest of simplifying the logic of the code.

Sure. That is mostly for historical reasons. At some point they should
disappear, just not now, as the patch is already quite big.

> First of all, if you know that there is already a hardclock/statclock/*
> scheduled in [T_X, T_X+D] you just reuse that. This particular bullet
> was "no event scheduled in [T_X, T_X+D]" so you need to generate
> a new one.

That is true, but my main point was about merging with external events,
which I can't predict and the only way to merge is increase sleep period,
hoping for better.

> Surely scheduling the event at T_X+D instead of T_X increases the
> chance of merging events. But the saving are smaller and smaller
> as the value X increases. This particular client will only
> change its request rate from 1/X to 1/(X+D) so in relative terms
> the gain is ( 1/X - 1/(X+D) ) / (1/(X+D) ) = D/X
>
> Example: if X = 300ms, and D = 10ms (as in the test case)
> you just save one interrupt every 30 seconds by scheduling at
> T_X+D instead of T_X. Are we actually able to measure the
> difference ?
>
> Even at high interrupt rates (e.g. X = 1ms) you are not
> going to save a lot unless the tolerance D is very large,
> which is generally undesirable for other reasons
> (presumably, applications are not going to be happy
> if you artificially double their timeouts).
> Now, say your application requests timeouts every X = 300ms.

With the default precision set to 5%, lengthening the periods saves
only 5%. But that is absolutely not my goal!
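
For reference, the quoted D/X estimate can be checked numerically. The
sketch below is only arithmetic on the formula from the quoted text;
`timer_rate` and `relative_gain` are hypothetical helper names.

```c
#include <assert.h>

/* Interrupts per second for a periodic timeout with the given period (s). */
static double
timer_rate(double period)
{
    assert(period > 0.0);
    return (1.0 / period);
}

/*
 * Relative gain from firing every X+D seconds instead of every X seconds:
 * ( 1/X - 1/(X+D) ) / ( 1/(X+D) ), which simplifies to D/X.
 */
static double
relative_gain(double x, double d)
{
    double early = timer_rate(x);
    double late = timer_rate(x + d);

    return ((early - late) / late);
}
```

For X = 300ms and D = 10ms this gives a relative gain of about 3.3%,
that is, roughly 0.11 fewer timer interrupts per second; with D at 5% of
X the gain is 5%.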

Imagine a different case: you have NIC interrupts at 1000Hz. You also have
100 callouts with a 100ms period each. If we program the timer with absolute
precision, you will get a total interrupt rate of about 2000Hz. But if we
allow just 2% deviation, most of the callouts will be grouped with NIC
interrupts and the total rate will be 1000Hz. By losing _less_ than 2% of
precision we cut the interrupt rate in _half_!
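
A toy simulation of this scenario, as a sketch under simplifying
assumptions: NIC interrupts arrive exactly every 1ms, callout phases are
staggered and offset by half a NIC period, and callouts are never merged
with each other. `wakeups_per_second` and the constants are hypothetical
names; nothing here is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NIC_PERIOD_US   1000      /* 1000 Hz NIC interrupts */
#define CALLOUT_COUNT   100
#define CALLOUT_PERIOD  100000    /* 100 ms */
#define SIM_LENGTH_US   1000000   /* simulate one second */

/*
 * Count CPU wakeups in one simulated second. A callout due at time t is
 * served for free if a NIC interrupt lands in [t, t + tol_us]; otherwise
 * it costs a timer interrupt of its own.
 */
static int
wakeups_per_second(int64_t tol_us)
{
    int wakeups = SIM_LENGTH_US / NIC_PERIOD_US;    /* NIC interrupts */
    int64_t t, next_nic;
    int i;

    assert(tol_us >= 0);
    for (i = 0; i < CALLOUT_COUNT; i++) {
        /* Assumed phase: half a NIC period off, staggered 1 ms apart. */
        for (t = i * 1000 + 500; t < SIM_LENGTH_US; t += CALLOUT_PERIOD) {
            /* First NIC interrupt at or after t. */
            next_nic = (t + NIC_PERIOD_US - 1) / NIC_PERIOD_US *
                NIC_PERIOD_US;
            if (next_nic > t + tol_us)
                wakeups++;      /* nothing to piggy-back on */
        }
    }
    return (wakeups);
}
```

Under these assumptions, a tolerance of 0 yields 2000 wakeups per second,
while a tolerance of 2000us (2% of the 100ms period) yields 1000.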

> > >It seems to me that you are instead extending the requested interval
> > >upfront, which causes some gratuitous pessimizations in scheduling
> > >the callout.
> > >
> > >Re. energy savings: the gain in extending the timeout cannot exceed
> > >the value D/X. So while it may make sense to extend a 1us request
> > >to 50us to go (theoretically) from 1M callouts/s to 20K callouts/s,
> > >it is completely pointless from an energy saving standpoint to
> > >introduce a 10ms error on a 300ms request.
> >
> > I am not so sure in this. When CPU package is in C7 sleep state with all
> > buses and caches shut down and memory set to self refresh, it consumes
> > very few (some milli-Watts) of power. Wake up from that state takes
> > 100us or even more with power consumption much higher then normal
> > operational one. Sure, if we compare it with power consumption of 100%
> > CPU load, difference between 10 and 100 wakeups per second may be small,
> > but when comparing to each other in some low-power environment for
> > mostly idle system it may be much more significant.
>
> see above -- at low rates the difference is not measurable,
> at high rates the only obvious answer is "do not use C7 unless
> if the next interrupt is due in less than 2..5 milliseconds"
>
> > >(even though i hate the idea that a 1us request defaults to
> > >a 50us delay; but that is hopefully something that can be tuned
> > >in a platform-specific way and perhaps at runtime).
> >
> > It is 50us on this ARM. On SandyBridge Core i7 it is only about 2us.
>
> very good, i suspected something similar, just wanted to be sure :)
>
> cheers
> luigi
>
> > --
> > Alexander Motin

From owner-freebsd-arch@FreeBSD.ORG  Wed Jan  2 21:39:21 2013
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 2006AA95;
 Wed,  2 Jan 2013 21:39:21 +0000 (UTC)
 (envelope-from mavbsd@gmail.com)
Received: from mail-ea0-f178.google.com (mail-ea0-f178.google.com
 [209.85.215.178])
 by mx1.freebsd.org (Postfix) with ESMTP id 614D7623;
 Wed,  2 Jan 2013 21:39:20 +0000 (UTC)
Received: by mail-ea0-f178.google.com with SMTP id k11so6115397eaa.37
 for <multiple recipients>; Wed, 02 Jan 2013 13:39:19 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=x-received:sender:message-id:date:from:user-agent:mime-version:to
 :cc:subject:references:in-reply-to:content-type
 :content-transfer-encoding;
 bh=Uo48LHfkJdD9FVWjrQOe4d6qpXXXfB1b+Md8xaa4JYg=;
 b=VCkGVKddQebRYa8Q1TsRe+UT1A7G4zqhQ5qE35Vl9FSrEYf3vuE0unLy0IDdMTVBcG
 neVE6n/PQOcvLFXPLDj07I9hsWv8lWOBq93PpRsG3d4XQjchEwSEPnafb6EJJt7vmUy3
 AAViJ5qAHFBQXogJVkfMc4pk9uxwV//T3vJZQLdnqzHKs4d33w1LC7sGBl9NSA+CQVr+
 fdrveDjF6t6PdJCwEPaUhBrSEzLpcatPTf3McPzGQ2r0L4bHiqnciDi6Wcv5WjnWrNPn
 6S47+OLdrd2gVT++CtmLncq3JDNJsPvaYOsBXkTGLnVf/8fVRLJBtz5piVmVHxpZFW3+
 bJdQ==
X-Received: by 10.14.204.70 with SMTP id g46mr102666330eeo.15.1357162759189;
 Wed, 02 Jan 2013 13:39:19 -0800 (PST)
Received: from mavbook.mavhome.dp.ua (mavhome.mavhome.dp.ua. [213.227.240.37])
 by mx.google.com with ESMTPS id
 l3sm100090480eem.14.2013.01.02.13.39.16
 (version=TLSv1/SSLv3 cipher=OTHER);
 Wed, 02 Jan 2013 13:39:17 -0800 (PST)
Sender: Alexander Motin <mavbsd@gmail.com>
Message-ID: <50E4A902.4050307@FreeBSD.org>
Date: Wed, 02 Jan 2013 23:39:14 +0200
From: Alexander Motin <mav@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:13.0) Gecko/20120628 Thunderbird/13.0.1
MIME-Version: 1.0
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: [RFC/RFT] calloutng
References: <20121225232126.GA47692@alchemy.franken.de>
 <50DB4EFE.2020600@FreeBSD.org>
 <1356909223.54953.74.camel@revolution.hippie.lan>
 <20121231061735.GA5866@onelab2.iet.unipi.it> <50E16637.9070501@FreeBSD.org>
 <20130102105730.GA42542@onelab2.iet.unipi.it> <50E418EA.7030801@FreeBSD.org>
 <20130102122743.GA43241@onelab2.iet.unipi.it>
 <CAO4K=PUHAH=UNzMde0V2TwkN5vj3gw9hHj5yCQxDvdUn+uqv7w@mail.gmail.com>
 <20130102162206.GA45701@onelab2.iet.unipi.it>
 <20130102170934.GA82219@kib.kiev.ua>
In-Reply-To: <20130102170934.GA82219@kib.kiev.ua>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Davide Italiano <davide@freebsd.org>,
 Ian Lepore <freebsd@damnhippie.dyndns.org>, freebsd-arch@freebsd.org,
 FreeBSD Current <freebsd-current@freebsd.org>,
 Marius Strobl <marius@alchemy.franken.de>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 02 Jan 2013 21:39:21 -0000

On 02.01.2013 19:09, Konstantin Belousov wrote:
> On Wed, Jan 02, 2013 at 05:22:06PM +0100, Luigi Rizzo wrote:
>> Probably one way to close this discussion would be to provide
>> a sysctl so the sysadmin can decide which point in the interval
>> to pick when there is no suitable callout already scheduled.
> Isn't trying to synchronize to external events in this way unsafe?
> I remember, but cannot find the reference right now, scheduler
> exploit(s) which completely hide a malicious thread from the time
> accounting by making it voluntarily yield right before statclock
> should fire. If statistics gathering could be piggy-backed on an
> external interrupt, and an attacker can control the source of the
> external events, wouldn't this give her a handle?

There are many different kinds of accounting with different
characteristics. Run time for each thread is calculated using
high-resolution per-CPU clocks on each context switch; it is impossible
to hide from it. The system load average is updated using a callout and
is aligned with hardclock(). Hiding from it was easy before, but it can
be made more complicated (asynchronous) now. Per-CPU
SYSTEM/INTERRUPT/USER/NICE/IDLE counters are updated by statclock(),
which is asynchronous to hardclock(), so hiding from it is supposed to
be more complicated. But even before it was possible, since the
hardclock() frequency is 8 times higher than the statclock() one. More
important for scheduling fairness, a thread's CPU percentage is also
based on hardclock(), and hiding from it was trivial before, since all
sleep primitives were strictly aligned to hardclock(). Now it is
slightly less trivial, since this alignment was removed and user-level
APIs provide no easy way to enforce it.

The only way to get really safe accounting is to forget about sampling
and use other, potentially more expensive, methods. That was always
blocked by the lack of cheap and reliable clocks. But since the TSC is
P-state invariant on most CPUs available now, it could probably be
reconsidered.

-- 
Alexander Motin

From owner-freebsd-arch@FreeBSD.ORG  Wed Jan  2 22:06:18 2013
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 43B89B05;
 Wed,  2 Jan 2013 22:06:18 +0000 (UTC)
 (envelope-from mavbsd@gmail.com)
Received: from mail-ea0-f181.google.com (mail-ea0-f181.google.com
 [209.85.215.181])
 by mx1.freebsd.org (Postfix) with ESMTP id 272BC742;
 Wed,  2 Jan 2013 22:06:16 +0000 (UTC)
Received: by mail-ea0-f181.google.com with SMTP id k14so5830467eaa.26
 for <multiple recipients>; Wed, 02 Jan 2013 14:06:10 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=x-received:sender:message-id:date:from:user-agent:mime-version:to
 :cc:subject:references:in-reply-to:content-type
 :content-transfer-encoding;
 bh=3EcKGr55zw6+/irtMq+g7UAr+0y89PXbs2Myu69P0YQ=;
 b=Tv8eLhy6bKcOnHGEpyLTNFPl+wJaa2eKk4/HAH1XsZZE4Gwv21IwvSTBvEIhkDBL8s
 1yssBkVYX9kSyeHoev85Q3s3HeNMXNtSH2DzX38QxtR6vw47RlAB93FE2LlA2T0jEAAp
 eQmqdTCMC0vEsNg1UrG37axDth1r4lG2CidlwvcH8ca+AQuZWFY5U6pCRbrV7PEVFu8W
 uIPuN090Wg0q3mE9mTcU5jsBxDFT1xtIetBH+BSQSurcEH/ixhxV04t2GDbKgqJY3bim
 eB/1j6yFuy0WVLYPE6B7q/0oBHFmvH7DG/QpvS6stDrkrfvL8fOWSbnzN4Dh8UJangIY
 dWyA==
X-Received: by 10.14.0.133 with SMTP id 5mr127300082eeb.29.1357164370592;
 Wed, 02 Jan 2013 14:06:10 -0800 (PST)
Received: from mavbook.mavhome.dp.ua (mavhome.mavhome.dp.ua. [213.227.240.37])
 by mx.google.com with ESMTPS id
 w44sm100140868eep.6.2013.01.02.14.06.07
 (version=TLSv1/SSLv3 cipher=OTHER);
 Wed, 02 Jan 2013 14:06:09 -0800 (PST)
Sender: Alexander Motin <mavbsd@gmail.com>
Message-ID: <50E4AF4C.2070902@FreeBSD.org>
Date: Thu, 03 Jan 2013 00:06:04 +0200
From: Alexander Motin <mav@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:13.0) Gecko/20120628 Thunderbird/13.0.1
MIME-Version: 1.0
To: Adrian Chadd <adrian@freebsd.org>
Subject: Re: [RFC/RFT] calloutng
References: <50CCAB99.4040308@FreeBSD.org> <50CE5B54.3050905@FreeBSD.org>
 <50D03173.9080904@FreeBSD.org> <20121225232126.GA47692@alchemy.franken.de>
 <50DB4EFE.2020600@FreeBSD.org>
 <1356909223.54953.74.camel@revolution.hippie.lan>
 <20121231061735.GA5866@onelab2.iet.unipi.it> <50E16637.9070501@FreeBSD.org>
 <20130102105730.GA42542@onelab2.iet.unipi.it> <50E418EA.7030801@FreeBSD.org>
 <20130102122743.GA43241@onelab2.iet.unipi.it>
 <CAO4K=PUHAH=UNzMde0V2TwkN5vj3gw9hHj5yCQxDvdUn+uqv7w@mail.gmail.com>
 <1357135374.54953.150.camel@revolution.hippie.lan>
 <CAJ-Vmo=mmm5zhwHyzKeg1VEL8hSz6_LxJAaLh74ArHF3_9KWaQ@mail.gmail.com>
In-Reply-To: <CAJ-Vmo=mmm5zhwHyzKeg1VEL8hSz6_LxJAaLh74ArHF3_9KWaQ@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Davide Italiano <davide@freebsd.org>,
 Ian Lepore <freebsd@damnhippie.dyndns.org>,
 Marius Strobl <marius@alchemy.franken.de>,
 FreeBSD Current <freebsd-current@freebsd.org>, freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 02 Jan 2013 22:06:18 -0000

On 02.01.2013 18:08, Adrian Chadd wrote:
> .. I'm pretty damned sure we're going to need to enforce a "never
> earlier than X" latency.

Do you mean that we should never wake up before the specified time 
(as most existing APIs specify), or that we should not allow sleeps 
shorter than some value, to avoid DoS? At least on x86, nanosleep(0) 
cannot block the system. Also, the eventtimers(9) KPI already has a 
mechanism for specifying the minimum timer programming interval.

> Is there a more detailed writeup of calloutng somewhere, besides
> David's slides? The wiki page is rather empty.

There are updated manual pages in the patch. Davide also wrote some 
blog posts during GSoC. We are now working on papers for AsiaBSDCon.

> Eg - I think this work does coalesce wakeups, right? Or it can? So
> when in low-power scenarios you can end up with lower-resolution
> callout periods, but many less CPU wakeups a second?

This work coalesces wakeups out of the box, and also provides ways to 
improve coalescing further where possible. With additional tuning of 
some kernel subsystems and drivers, I was able to drop the total idle 
interrupt rate to 10-15Hz on arm and 20-30Hz on x86.

> (Do we actually _expose_ wakeups-per-second somewhere?)

On systems with ACPI, average per-CPU sleep times are exposed via the 
dev.cpu.X.cx_usage sysctls. The cpu_idle() call rate is also calculated 
by both schedulers for idle loop optimizations, but it is not currently 
exposed. Also, on an idle SMP system, enabling COUNT_IPIS should make 
the number of interrupts shown in systat comparable to the number of 
wakeups. I mostly use this last method.

-- 
Alexander Motin

From owner-freebsd-arch@FreeBSD.ORG  Thu Jan  3 05:52:45 2013
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id E965A71D;
 Thu,  3 Jan 2013 05:52:45 +0000 (UTC)
 (envelope-from kob6558@gmail.com)
Received: from mail-ea0-f176.google.com (mail-ea0-f176.google.com
 [209.85.215.176])
 by mx1.freebsd.org (Postfix) with ESMTP id E335C6CE;
 Thu,  3 Jan 2013 05:52:44 +0000 (UTC)
Received: by mail-ea0-f176.google.com with SMTP id d13so5967972eaa.7
 for <multiple recipients>; Wed, 02 Jan 2013 21:52:38 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :cc:content-type;
 bh=YIee0nzPlls5BnzFjKLFHW7KoULSislsRykqse+eXTw=;
 b=PhzqX8ilrQY+1U+fo3LmWn5BIEhr+UvMkrclm3V7ctg08p26oOrMSmLRsy7YofqmGN
 U75qKZv4DoBeDgOiVn91Jk71GnGNOlDVo+/7Xs07TiPgs/KdKMxin+m8JUOkbkL+yZvA
 33OnhVMRQwIRxc/mRBlwEM2ZI8eCr0NcIEb8kZNeiCM6RzosYQDY5G/qknV9iv4EHjz9
 jTofSwPkY8GOX/Oq9i/9fznwptIZ1BNRGpMXu7uV9gU79nvYg2ZCW7rDyB24p5/tVpY/
 +E9R8WjEUDbVAI7RpCJ3fofcO6NinxswQJPmk5EIfF+MUHBeXy2PCSLkbpmbY8xUH2RI
 Hg0g==
MIME-Version: 1.0
Received: by 10.14.184.134 with SMTP id s6mr130647516eem.43.1357192358178;
 Wed, 02 Jan 2013 21:52:38 -0800 (PST)
Received: by 10.223.170.193 with HTTP; Wed, 2 Jan 2013 21:52:37 -0800 (PST)
In-Reply-To: <50E4AF4C.2070902@FreeBSD.org>
References: <50CCAB99.4040308@FreeBSD.org> <50CE5B54.3050905@FreeBSD.org>
 <50D03173.9080904@FreeBSD.org>
 <20121225232126.GA47692@alchemy.franken.de>
 <50DB4EFE.2020600@FreeBSD.org>
 <1356909223.54953.74.camel@revolution.hippie.lan>
 <20121231061735.GA5866@onelab2.iet.unipi.it>
 <50E16637.9070501@FreeBSD.org>
 <20130102105730.GA42542@onelab2.iet.unipi.it>
 <50E418EA.7030801@FreeBSD.org>
 <20130102122743.GA43241@onelab2.iet.unipi.it>
 <CAO4K=PUHAH=UNzMde0V2TwkN5vj3gw9hHj5yCQxDvdUn+uqv7w@mail.gmail.com>
 <1357135374.54953.150.camel@revolution.hippie.lan>
 <CAJ-Vmo=mmm5zhwHyzKeg1VEL8hSz6_LxJAaLh74ArHF3_9KWaQ@mail.gmail.com>
 <50E4AF4C.2070902@FreeBSD.org>
Date: Wed, 2 Jan 2013 21:52:37 -0800
Message-ID: <CAN6yY1vRJN8EpKpYARfkShRzmPfC4VEw33O1mfppZ+D+8iebgQ@mail.gmail.com>
Subject: Re: [RFC/RFT] calloutng
From: Kevin Oberman <kob6558@gmail.com>
To: Alexander Motin <mav@freebsd.org>
Content-Type: text/plain; charset=UTF-8
Cc: Davide Italiano <davide@freebsd.org>,
 Ian Lepore <freebsd@damnhippie.dyndns.org>, Adrian Chadd <adrian@freebsd.org>,
 Marius Strobl <marius@alchemy.franken.de>,
 FreeBSD Current <freebsd-current@freebsd.org>, freebsd-arch@freebsd.org
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 03 Jan 2013 05:52:46 -0000

On Wed, Jan 2, 2013 at 2:06 PM, Alexander Motin <mav@freebsd.org> wrote:
> On 02.01.2013 18:08, Adrian Chadd wrote:
>>
>> .. I'm pretty damned sure we're going to need to enforce a "never
>> earlier than X" latency.
>
>
> Do you mean here that we should never wake up before specified time (just as
> specified by the most of existing APIs), or that we should not allow sleep
> shorter then some value to avoid DoS? At least on x86 nanosleep(0) doesn't
> allow to block the system. Also there is already present mechanism for
> specifying minimum timer programming interval in eventtimers(9) KPI.

I can see serious performance issues with some hardware (wireless
comes to mind) if things happen too quickly. Intuition is that it
could also play hob with VMs.

I believe that the proper way is to wake between T_X and T_X + D.
This assumes that D is max_wake_delay, not deviation, which leaves us
at the original formulation of (T_X) <= event_time <= (T_X + D).
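
The window above can be sketched in C as follows. This is only an
illustration under assumptions: tick_t, in_wake_window() and
pick_wake_time() are hypothetical names, not part of the calloutng
patch, and falling back to T_X when nothing can be coalesced is just
one possible policy.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical time unit; any monotonic counter would do. */
typedef uint64_t tick_t;

/* True if candidate satisfies T_X <= candidate <= T_X + D. */
static bool
in_wake_window(tick_t t_x, tick_t d, tick_t candidate)
{
	/* Subtract instead of computing t_x + d to avoid overflow. */
	return (candidate >= t_x && candidate - t_x <= d);
}

/*
 * Return the first already-pending wakeup that falls inside the
 * window, so the new event can share it.  If none fits, fall back
 * to T_X itself (an assumed policy, chosen only for this sketch).
 */
static tick_t
pick_wake_time(tick_t t_x, tick_t d, const tick_t *pending, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (in_wake_window(t_x, d, pending[i]))
			return (pending[i]);
	}
	return (t_x);
}
```

With D read as a maximum delay, the event never fires before T_X, and
a larger D only widens the set of pending wakeups it can share.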
-- 
R. Kevin Oberman, Network Engineer
E-mail: kob6558@gmail.com

From owner-freebsd-arch@FreeBSD.ORG  Thu Jan  3 08:42:25 2013
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 5737B333;
 Thu,  3 Jan 2013 08:42:25 +0000 (UTC)
 (envelope-from luigi@onelab2.iet.unipi.it)
Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238])
 by mx1.freebsd.org (Postfix) with ESMTP id 06BBB6CF;
 Thu,  3 Jan 2013 08:42:24 +0000 (UTC)
Received: by onelab2.iet.unipi.it (Postfix, from userid 275)
 id C099A7300A; Thu,  3 Jan 2013 09:41:26 +0100 (CET)
Date: Thu, 3 Jan 2013 09:41:26 +0100
From: Luigi Rizzo <rizzo@iet.unipi.it>
To: Kevin Oberman <kob6558@gmail.com>
Subject: Re: [RFC/RFT] calloutng
Message-ID: <20130103084126.GC54360@onelab2.iet.unipi.it>
References: <20121231061735.GA5866@onelab2.iet.unipi.it>
 <50E16637.9070501@FreeBSD.org> <20130102105730.GA42542@onelab2.iet.unipi.it>
 <50E418EA.7030801@FreeBSD.org> <20130102122743.GA43241@onelab2.iet.unipi.it>
 <CAO4K=PUHAH=UNzMde0V2TwkN5vj3gw9hHj5yCQxDvdUn+uqv7w@mail.gmail.com>
 <1357135374.54953.150.camel@revolution.hippie.lan>
 <CAJ-Vmo=mmm5zhwHyzKeg1VEL8hSz6_LxJAaLh74ArHF3_9KWaQ@mail.gmail.com>
 <50E4AF4C.2070902@FreeBSD.org>
 <CAN6yY1vRJN8EpKpYARfkShRzmPfC4VEw33O1mfppZ+D+8iebgQ@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAN6yY1vRJN8EpKpYARfkShRzmPfC4VEw33O1mfppZ+D+8iebgQ@mail.gmail.com>
User-Agent: Mutt/1.4.2.3i
Cc: Davide Italiano <davide@freebsd.org>,
 Ian Lepore <freebsd@damnhippie.dyndns.org>, Adrian Chadd <adrian@freebsd.org>,
 Alexander Motin <mav@freebsd.org>, freebsd-arch@freebsd.org,
 FreeBSD Current <freebsd-current@freebsd.org>,
 Marius Strobl <marius@alchemy.franken.de>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 03 Jan 2013 08:42:25 -0000

On Wed, Jan 02, 2013 at 09:52:37PM -0800, Kevin Oberman wrote:
> On Wed, Jan 2, 2013 at 2:06 PM, Alexander Motin <mav@freebsd.org> wrote:
> > On 02.01.2013 18:08, Adrian Chadd wrote:
> >>
> >> .. I'm pretty damned sure we're going to need to enforce a "never
> >> earlier than X" latency.
> >
> >
> > Do you mean here that we should never wake up before specified time (just as
> > specified by the most of existing APIs), or that we should not allow sleep
> > shorter then some value to avoid DoS? At least on x86 nanosleep(0) doesn't
> > allow to block the system. Also there is already present mechanism for
> > specifying minimum timer programming interval in eventtimers(9) KPI.
> 
> I can see serious performance issues with some hardware (wireless
> comes to mind) if things happen too quickly. Intuition is that it
> could also play hob with VMs.
> 
> I believe that the proper way is to wake between  T_X and T_X + D.
> This assumes that D is max_wake_delay, not deviation, which leaves us
> at the original of (T_X) =< event_time =< (T_X + D).

i think "max delay" was the intended meaning of the D parameter.
We picked bad names (tolerance, deviation,...) for it.

cheers
luigi

From owner-freebsd-arch@FreeBSD.ORG  Thu Jan  3 14:45:55 2013
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 8ED726E5;
 Thu,  3 Jan 2013 14:45:55 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au
 [211.29.132.184])
 by mx1.freebsd.org (Postfix) with ESMTP id F2217C2F;
 Thu,  3 Jan 2013 14:45:54 +0000 (UTC)
Received: from c211-30-173-106.carlnfd1.nsw.optusnet.com.au
 (c211-30-173-106.carlnfd1.nsw.optusnet.com.au [211.30.173.106])
 by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r03EjbNs031145
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Fri, 4 Jan 2013 01:45:39 +1100
Date: Fri, 4 Jan 2013 01:45:37 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Alexander Motin <mav@FreeBSD.org>
Subject: Re: [RFC/RFT] calloutng
In-Reply-To: <50E4A902.4050307@FreeBSD.org>
Message-ID: <20130103232413.O947@besplex.bde.org>
References: <20121225232126.GA47692@alchemy.franken.de>
 <50DB4EFE.2020600@FreeBSD.org>
 <1356909223.54953.74.camel@revolution.hippie.lan>
 <20121231061735.GA5866@onelab2.iet.unipi.it>
 <50E16637.9070501@FreeBSD.org> <20130102105730.GA42542@onelab2.iet.unipi.it>
 <50E418EA.7030801@FreeBSD.org> <20130102122743.GA43241@onelab2.iet.unipi.it>
 <CAO4K=PUHAH=UNzMde0V2TwkN5vj3gw9hHj5yCQxDvdUn+uqv7w@mail.gmail.com>
 <20130102162206.GA45701@onelab2.iet.unipi.it>
 <20130102170934.GA82219@kib.kiev.ua> <50E4A902.4050307@FreeBSD.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.0 cv=BrrFWvr5 c=1 sm=1 a=kj9zAlcOel0A:10
 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=mUAV9h2nInsA:10
 a=F1MZp-0HJ9x99sWca6YA:9 a=CjuIK1q_8ugA:10 a=TEtd8y5WR3g2ypngnwZWYw==:117
Cc: Davide Italiano <davide@FreeBSD.org>,
 Ian Lepore <freebsd@damnhippie.dyndns.org>,
 Marius Strobl <marius@alchemy.franken.de>,
 FreeBSD Current <freebsd-current@FreeBSD.org>, freebsd-arch@FreeBSD.org,
 Konstantin Belousov <kostikbel@gmail.com>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 03 Jan 2013 14:45:55 -0000

On Wed, 2 Jan 2013, Alexander Motin wrote:

> On 02.01.2013 19:09, Konstantin Belousov wrote:
>> On Wed, Jan 02, 2013 at 05:22:06PM +0100, Luigi Rizzo wrote:
>>> Probably one way to close this discussion would be to provide
>>> a sysctl so the sysadmin can decide which point in the interval
>>> to pick when there is no suitable callout already scheduled.
>> Isn't trying to synchronize to the external events in this way unsafe ?
>> I remember, but cannot find the reference right now, a scheduler
>> exploit(s) which completely hide malicious thread from the time
>> accounting, by making it voluntary yielding right before statclock
>> should fire. If statistic gathering could be piggy-backed on the
>> external interrupt, and attacker can control the source of the external
>> events, wouldn't this give her a handle ?

Fine-grained timeouts complete fully opening this security hole.
Synchronization without fine-grained timeouts might allow the same,
but is harder to exploit since you can't control the yielding points
directly.  With fine-grained timeouts, you just have to predict the
statclock firing points.  Use one timeout to arrange to yield just
before statclock fires and another to regain control just after it has
fired.  If the timeout resolution is say 50 usec, then this can hope
to run for all except 100 usec out of every 1/stathz seconds.  With
stathz = 128, 1/stathz is 7812 usec, so this gives 7712/7812 of the
CPU with 0 statclock ticks.  Since the scheduler never sees you running,
your priority remains minimal, so the scheduler should prefer to run
you whenever a timeout expires, with only round-robin with other
minimal-priority threads preventing you getting 7712/7812 of the (user
non-rtprio) CPU.
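
The arithmetic above can be checked with a small C sketch.  The
function names are made up for illustration, and the model is the
simplified one from the paragraph: a fixed number of microseconds is
given up around each statclock tick and the rest of the period runs
unobserved.

```c
#include <assert.h>
#include <stdint.h>

/* Statclock period in microseconds, truncated as in the text. */
static uint32_t
stat_period_usec(uint32_t stathz)
{
	assert(stathz > 0);
	return (1000000u / stathz);
}

/*
 * Microseconds per statclock period that a hider can run without
 * being sampled, if it gives up lost_usec around each tick
 * (time yielded before the tick plus time until it runs again).
 */
static uint32_t
hidden_usec(uint32_t stathz, uint32_t lost_usec)
{
	uint32_t period = stat_period_usec(stathz);

	return (lost_usec >= period ? 0 : period - lost_usec);
}
```

Under these assumptions hidden_usec(128, 100) is 7712 out of a period
of 7812, matching the figures above.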

The previous stage of fully opening this security hole was changing
(the default) HZ from 100 to 1000.  HZ must not be much smaller than
stathz, else the security hole is almost fully open.  With HZ = 100
being less than stathz and timeout granularity limiting the fine control
to 2/HZ = 20 msec (except you can use a periodic itimer to get a 1/HZ
granularity at a minor cost of getting more SIGALRMs), it is impossible
to get near 100% of the CPU with 0 statclock ticks.  After yielding,
you can't get control for another 10 or 20 msec.  Since this exceeds
1/stathz = 7812 usec, you can only hide from statclock ticks by not
running very often or for very long.  Limited hiding is possible by
wasting even more CPU to determine when to hide: since the timeout
granularity is large, it is also ineffective for determining when to
yield.  So when running, you must poll the current time a lot to
determine when to yield.  Yield just before statclock fires, as above.
(Do it 50 usec early, as above, to avoid most races involving polling
the time.)  This actually has good chances of not limiting the hiding
too much, depending on the details of the scheduling.  It yields just
before a statclock tick.  After this tick fires, if the scheduler
reschedules for any reason, then the hiding process would most likely
be run again, since its priority is minimal.  But at least the old
4BSD scheduler doesn't reschedule after _every_ statclock tick.  This
depends on the bugfeature that the priority is not checked on _every_
return to user mode (sched_clock() does change the priority, but this
is not acted on until much later).  Without this bugfeature, there
would be excessive context switches.  OTOH, with timeouts, at least
old non-fine-grained ones, you can force a rescheduling that is acted
on soon enough simply by using timeouts (since timeouts give a context
switch to the softclock thread, the scheduler has no option to skip
checking the priority on return to user mode).

After the previous stage of changing HZ to 1000, the granularity is fine
enough for using timeouts to hide from the scheduler.  Using a periodic
itimer to get a granularity of 1000 usec, start hiding 50-1000 usec
before each statclock tick and regain control 1000 usec later.  With
stathz = 128, this gives 6812/7812 of the CPU with 0 statclock ticks.
Not much worse (for the hider) than 7712/7812.

Statclock was supposed to be aperiodic to avoid hiding (see
statclk-usenix93.ps), but this was never implemented in FreeBSD.  With
fine-grained timeouts, it would have to be very aperiodic, to the point
of giving large inaccuracies, to limit the hiding very much.  For
example, suppose that it has an average period of 7812 usec with +-50%
jitter.  You would try to hide from it most of the time by running for
a bit less than 7812/2 usec before yielding in most cases.  If too
much scheduling is done on each statclock tick, then you are likely
to regain control after each one (as above) and then know that there
is almost a full minimal period until the next one.  Otherwise, it
seems to be necessary to determine when the previous statclock tick
occurred, so as to determine the minimum time until the next one.

> There are many different kinds of accounting with different characteristics. 
> Run time for each thread calculated using high resolution per-CPU clocks on 
> each context switch. It is impossible to hide from it.

But this (td_runtime) is not used at all for scheduling, since the
high-resolution per-CPU clocks didn't exist originally and now since
reading of them is expensive.  However, the read is done on every
context switch.  This pessimizes context switches, with the
pessimization reduced a little by using inaccurate cpu_ticks() timers
instead of timecounters, so the expenses for using td_runtime for
scheduling are mostly already paid for.  The main case where it doesn't
work is when there are few context switches, but there are usually
more than a few for interrupts to ithreads for network and disk i/o.
Updating td_runtime in statclock() (where it is not normally updated
since there is no context switch) would probably make td_runtime usable
for scheduling without increasing the pessimization much.

> System load average 
> updated using callout and aligned with hardclock(). Hiding from it was easy 
> before, but it can be made more complicated (asynchronous) now.

I think you mean it was easy because the timeout period is so long.  Any
aperiodicity in a timer of course means that it is harder to predict.  The
load average timeout was already randomized (I think it is 5 +- 2 seconds
with 1/hz granularity).  That is a large variance, but you can still hide
from it about 3/5 of the time.

loadav is not very important, and is not even used by SCHED_ULE.  SCHED_ULE
uses more fine-grained statistics based on statclock.

> Per-CPU 
> SYSTEM/INTERRUPT/USER/NICE/IDLE counters are updated by statcklock(), that is 
> asynchronous to hardclock() and hiding from it supposed to be more 
> complicated. But even before it was possible, since hardclock() frequency is 
> 8 times higher then statclock() one.

These are not used for scheduling.

In SCHED_4BSD, little more than the event of a statclock tick is used for
scheduling.  sched_clock() uses this to update td_estcpu and then the
priority but not much more.  In SCHED_ULE, sched_clock() updates many
more private variables.

There are also statistics for memory use and similar things maintained
by statclock().  These are not very important, like loadav for SCHED_ULE
(purely informational, for userland).  I don't see a better way than
_periodic_ statclock ticks for maintaining these.  Aperiodicity just
makes them less accurate for the usual case where there is no hiding
(unless this is not the usual case, due to accidental synchronization
causing non-malicious hiding, and if this is a problem then it can be
fixed using only a small amount of aperiodicity).

> More important for scheduling fairness 
> thread's CPU percentage is also based on hardclock() and hiding from it was 
> trivial before, since all sleep primitives were strictly aligned to 
> hardclock(). Now it is slightly less trivial, since this alignment was 
> removed and user-level APIs provide no easy way to enforce it.

%cpu is actually based on statclock(), and not even used for scheduling.

The alignment of sleep primitives mainly increased the chances of accidental
synchronization.  (This was apparently a problem when all clocks were
derived from the lapic timer.  I didn't like the fixes for that.)  For
malicious hiding, it only makes the hiding less trivial for hardclock ticks
to be periodic and somewhat aligned with statclock ticks.  The phase will
drift, so that there is no long-term alignment, unless the clocks are
derived from the same one (and not randomized).  So the statclock firings
will sometimes be mispredicted, or the hiding would have to be conservative,
or the prediction updated a lot.  It is simplest and probably most malicious
to accept occasional mispredictions and recalibrate the prediction after
missing.  This is an easy case of recalibrating often for highly aperiodic
statclocks.

> The only way to get really safe accounting is forget about sampling and use 
> potentially more expensive other ways. It was always stopped by lack of cheap 
> and reliable clocks. But since TSC is P-state invariant on most of CPUs 
> present now it could probably be reconsidered.

And even if the clock is expensive and unreliable, it is already used for
td_runtime, as described above.  Large unreliability is actually less of
a problem for scheduling than for td_runtime -- an error of 50% would
barely affect scheduling since scheduling is only heuristic, but if
td_runtime is off by 50% (as it often is without P-state invariance and
with longstanding bugs in cpu_ticks() calibration), then it gives nonsense
like user times exceeding real times by 50%.

When I last thought about using reliable clocks for statistics, I was
most interested in simplifying things by removing all the tick
counters and not in making schedulers fairer.  I still can't see how
to do the former.  To do the user/sys/interrupt accounting accurately,
the clock would have to be read on every syscall entry and exit, and
that is too much (it is bad enough that it is read on almost every
interrupt, without even making interrupt accounting actually work
right).  But for scheduling decisions, a fuzzy copy of the full
td_runtime is enough, at least for SCHED_4BSD.

Bruce

From owner-freebsd-arch@FreeBSD.ORG  Thu Jan  3 15:51:15 2013
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 0D15AD97
 for <arch@freebsd.org>; Thu,  3 Jan 2013 15:51:15 +0000 (UTC)
 (envelope-from des@des.no)
Received: from smtp.des.no (smtp.des.no [194.63.250.102])
 by mx1.freebsd.org (Postfix) with ESMTP id CA254C2
 for <arch@freebsd.org>; Thu,  3 Jan 2013 15:51:14 +0000 (UTC)
Received: from ds4.des.no (smtp.des.no [194.63.250.102])
 by smtp.des.no (Postfix) with ESMTP id 5F3016B84
 for <arch@freebsd.org>; Thu,  3 Jan 2013 16:51:13 +0100 (CET)
Received: by ds4.des.no (Postfix, from userid 1001)
 id 2C0AD8B5E; Thu,  3 Jan 2013 16:51:13 +0100 (CET)
From: =?utf-8?Q?Dag-Erling_Sm=C3=B8rgrav?= <des@des.no>
To: arch@freebsd.org
Subject: BURN_BRIDGES in bsd.own.mk
Date: Thu, 03 Jan 2013 16:51:12 +0100
Message-ID: <86zk0q1czz.fsf@ds4.des.no>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 03 Jan 2013 15:51:15 -0000

We still have code in bsd.own.mk to support a selection of old-style
NO_FOO options, protected by .if !defined(BURN_BRIDGES) (which is not in
any way related to the similarly-named kernel option).  There is also
bsd.compat.mk, which handles the even older-style NOFOO options (by
translating them to NO_FOO and emitting a warning).  These chunks of
code date back to 2006 and 2004, respectively.  I think it's way past
time we nuked them.  Any objections?

DES
--=20
Dag-Erling Sm=C3=B8rgrav - des@des.no

From owner-freebsd-arch@FreeBSD.ORG  Thu Jan  3 16:12:26 2013
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 32EE842C;
 Thu,  3 Jan 2013 16:12:26 +0000 (UTC)
 (envelope-from mavbsd@gmail.com)
Received: from mail-la0-f51.google.com (mail-la0-f51.google.com
 [209.85.215.51]) by mx1.freebsd.org (Postfix) with ESMTP id 4875F192;
 Thu,  3 Jan 2013 16:12:24 +0000 (UTC)
Received: by mail-la0-f51.google.com with SMTP id fj20so8223223lab.38
 for <multiple recipients>; Thu, 03 Jan 2013 08:12:24 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=x-received:sender:message-id:date:from:user-agent:mime-version:to
 :cc:subject:references:in-reply-to:content-type
 :content-transfer-encoding;
 bh=Ph9x+C/HqilLh+giuX2YFQtzManr/VaLiqQB1Y/XUuA=;
 b=cn7hfXcSx2tfRJOyNVaZxZxUQr62OM29cZRc+20Z8lsortEHEpc3O0v6F/KdH6SPhF
 EfKysg2/bmPoaAzkbcZPXk5LfevG2KebC4gbRIZVEh5W2L8O0hIaMCrE6x0ScscLY/d4
 p8l+U6cDUtmT8jHOSb4tlLXtNEK54wWv9FLz5mo+IYa0xuQU8Dt5GJ+7d1A3iVxYFRQF
 yfOl6Fp8+ee7Hnd5EuKnbjmF4if7tcy7sMAiV2YdY+NC76yq1gLARnD0Wzqq0Hz5P9sV
 oEm4Zlws7Ee08bZPSnmyjone6sze43OS+Bo489yXyhi4M0YmQ6IrMqAN4a6WU7/IbNSf
 pFoQ==
X-Received: by 10.112.38.72 with SMTP id e8mr15978420lbk.123.1357229543977;
 Thu, 03 Jan 2013 08:12:23 -0800 (PST)
Received: from mavbook.mavhome.dp.ua (mavhome.mavhome.dp.ua. [213.227.240.37])
 by mx.google.com with ESMTPS id
 hc20sm18566188lab.11.2013.01.03.08.12.18
 (version=TLSv1/SSLv3 cipher=OTHER);
 Thu, 03 Jan 2013 08:12:21 -0800 (PST)
Sender: Alexander Motin <mavbsd@gmail.com>
Message-ID: <50E5ADE1.4020104@FreeBSD.org>
Date: Thu, 03 Jan 2013 18:12:17 +0200
From: Alexander Motin <mav@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:13.0) Gecko/20120628 Thunderbird/13.0.1
MIME-Version: 1.0
To: Bruce Evans <brde@optusnet.com.au>
Subject: Re: [RFC/RFT] calloutng
References: <20121225232126.GA47692@alchemy.franken.de>
 <50DB4EFE.2020600@FreeBSD.org>
 <1356909223.54953.74.camel@revolution.hippie.lan>
 <20121231061735.GA5866@onelab2.iet.unipi.it> <50E16637.9070501@FreeBSD.org>
 <20130102105730.GA42542@onelab2.iet.unipi.it> <50E418EA.7030801@FreeBSD.org>
 <20130102122743.GA43241@onelab2.iet.unipi.it>
 <CAO4K=PUHAH=UNzMde0V2TwkN5vj3gw9hHj5yCQxDvdUn+uqv7w@mail.gmail.com>
 <20130102162206.GA45701@onelab2.iet.unipi.it>
 <20130102170934.GA82219@kib.kiev.ua> <50E4A902.4050307@FreeBSD.org>
 <20130103232413.O947@besplex.bde.org>
In-Reply-To: <20130103232413.O947@besplex.bde.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Davide Italiano <davide@FreeBSD.org>,
 Ian Lepore <freebsd@damnhippie.dyndns.org>,
 Marius Strobl <marius@alchemy.franken.de>,
 FreeBSD Current <freebsd-current@FreeBSD.org>, freebsd-arch@FreeBSD.org,
 Konstantin Belousov <kostikbel@gmail.com>
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 03 Jan 2013 16:12:26 -0000

On 03.01.2013 16:45, Bruce Evans wrote:
> On Wed, 2 Jan 2013, Alexander Motin wrote:
>> More important for scheduling fairness thread's CPU percentage is also
>> based on hardclock() and hiding from it was trivial before, since all
>> sleep primitives were strictly aligned to hardclock(). Now it is
>> slightly less trivial, since this alignment was removed and user-level
>> APIs provide no easy way to enforce it.
>
> %cpu is actually based on statclock(), and not even used for scheduling.

Maybe for SCHED_4BSD, but not for SCHED_ULE.  In SCHED_ULE both %cpu 
and thread priority are based on the same ts_ticks counter, which uses 
hardclock() as its time source. The interactivity calculation uses 
similar logic and the same time source.

-- 
Alexander Motin

From owner-freebsd-arch@FreeBSD.ORG  Thu Jan  3 16:55:07 2013
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: freebsd-arch@FreeBSD.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id 8A6F27E4;
 Thu,  3 Jan 2013 16:55:07 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au
 [211.29.132.184])
 by mx1.freebsd.org (Postfix) with ESMTP id 21F17364;
 Thu,  3 Jan 2013 16:55:06 +0000 (UTC)
Received: from c211-30-173-106.carlnfd1.nsw.optusnet.com.au
 (c211-30-173-106.carlnfd1.nsw.optusnet.com.au [211.30.173.106])
 by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id r03GstDD010598
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Fri, 4 Jan 2013 03:54:56 +1100
Date: Fri, 4 Jan 2013 03:54:55 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Alexander Motin <mav@FreeBSD.org>
Subject: Re: [RFC/RFT] calloutng
In-Reply-To: <50E5ADE1.4020104@FreeBSD.org>
Message-ID: <20130104034917.O1929@besplex.bde.org>
References: <20121225232126.GA47692@alchemy.franken.de>
 <50DB4EFE.2020600@FreeBSD.org>
 <1356909223.54953.74.camel@revolution.hippie.lan>
 <20121231061735.GA5866@onelab2.iet.unipi.it>
 <50E16637.9070501@FreeBSD.org> <20130102105730.GA42542@onelab2.iet.unipi.it>
 <50E418EA.7030801@FreeBSD.org> <20130102122743.GA43241@onelab2.iet.unipi.it>
 <CAO4K=PUHAH=UNzMde0V2TwkN5vj3gw9hHj5yCQxDvdUn+uqv7w@mail.gmail.com>
 <20130102162206.GA45701@onelab2.iet.unipi.it>
 <20130102170934.GA82219@kib.kiev.ua>
 <50E4A902.4050307@FreeBSD.org> <20130103232413.O947@besplex.bde.org>
 <50E5ADE1.4020104@FreeBSD.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Davide Italiano <davide@FreeBSD.org>,
 Ian Lepore <freebsd@damnhippie.dyndns.org>,
 Marius Strobl <marius@alchemy.franken.de>,
 FreeBSD Current <freebsd-current@FreeBSD.org>, freebsd-arch@FreeBSD.org,
 Konstantin Belousov <kostikbel@gmail.com>
X-List-Received-Date: Thu, 03 Jan 2013 16:55:07 -0000

On Thu, 3 Jan 2013, Alexander Motin wrote:

> On 03.01.2013 16:45, Bruce Evans wrote:
>> On Wed, 2 Jan 2013, Alexander Motin wrote:
>>> More importantly for scheduling fairness, a thread's CPU percentage is
>>> also based on hardclock(), and hiding from it was trivial before, since
>>> all sleep primitives were strictly aligned to hardclock(). Now it is
>>> slightly less trivial, since this alignment was removed and user-level
>>> APIs provide no easy way to enforce it.
>> 
>> %cpu is actually based on statclock(), and not even used for scheduling.
>
> Maybe for SCHED_4BSD, but not for SCHED_ULE.  In SCHED_ULE both %cpu and
> thread priority are based on the same ts_ticks counter, which is based on
> hardclock() as its time source. The interactivity calculation uses similar
> logic and the same time source.

Hmm.  I missed this because it hacks on the 'ticks' global.  It is clearer
in intermediate versions which use the scheduler API sched_tick(), which
is the hardclock analogue of sched_clock() for statclock.  sched_tick() is
now bogus since it is null for all schedulers.
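
As a rough illustration of the "hiding" effect discussed above, here is a
toy user-space model of tick-sampled accounting. It is only a sketch under
stated assumptions: it is not the SCHED_ULE code, the 1000 us tick period
is assumed, and sampled_ticks() and its parameters are hypothetical names
introduced for this example. The idea is that a counter bumped only at tick
instants never sees a thread that is always asleep at those instants.

```c
#include <assert.h>

/* Toy user-space model; every name here is hypothetical, not kernel code. */
#define TICK_US 1000	/* assumed tick period, i.e. hz = 1000 */

/*
 * Count the tick instants in [0, total_us) at which the thread is found
 * running.  The thread repeats a cycle of period_us microseconds: it runs
 * during [start_us, start_us + run_us) of each cycle and sleeps otherwise.
 * A tick-sampling accountant charges one tick per such hit.
 */
int
sampled_ticks(int total_us, int period_us, int start_us, int run_us)
{
	int charged = 0;

	assert(period_us > 0 && start_us >= 0 && run_us >= 0);
	assert(start_us + run_us <= period_us);
	for (int t = 0; t < total_us; t += TICK_US) {
		int phase = t % period_us;	/* position inside the cycle */

		if (phase >= start_us && phase < start_us + run_us)
			charged++;
	}
	return (charged);
}
```

For example, sampled_ticks(1000000, 1000, 50, 900) models a thread that
uses 90% of the CPU but is asleep at every tick instant, and it returns 0.
The same load with a 1010 us cycle (run_us = 909) drifts across the tick
instants and is charged on most of them.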

Bruce

From owner-freebsd-arch@FreeBSD.ORG  Thu Jan  3 18:15:06 2013
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 1EF354B5
 for <arch@freebsd.org>; Thu,  3 Jan 2013 18:15:06 +0000 (UTC)
 (envelope-from lists@eitanadler.com)
Received: from mail-la0-f53.google.com (mail-la0-f53.google.com
 [209.85.215.53]) by mx1.freebsd.org (Postfix) with ESMTP id 9F80B94C
 for <arch@freebsd.org>; Thu,  3 Jan 2013 18:15:05 +0000 (UTC)
Received: by mail-la0-f53.google.com with SMTP id fn20so8301645lab.40
 for <arch@freebsd.org>; Thu, 03 Jan 2013 10:14:59 -0800 (PST)
Received: by 10.112.50.138 with SMTP id c10mr20274441lbo.104.1357236899187;
 Thu, 03 Jan 2013 10:14:59 -0800 (PST)
MIME-Version: 1.0
Received: by 10.112.75.200 with HTTP; Thu, 3 Jan 2013 10:14:29 -0800 (PST)
In-Reply-To: <86zk0q1czz.fsf@ds4.des.no>
References: <86zk0q1czz.fsf@ds4.des.no>
From: Eitan Adler <lists@eitanadler.com>
Date: Thu, 3 Jan 2013 13:14:29 -0500
Message-ID: <CAF6rxg=ERVyrKdOZCO6AVWchynUD1uRwgb9js1T_SS1UJg4GzQ@mail.gmail.com>
Subject: Re: BURN_BRIDGES in bsd.own.mk
To: =?UTF-8?Q?Dag=2DErling_Sm=C3=B8rgrav?= <des@des.no>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: arch@freebsd.org
X-List-Received-Date: Thu, 03 Jan 2013 18:15:06 -0000

On 3 January 2013 10:51, Dag-Erling Smørgrav <des@des.no> wrote:
> We still have code in bsd.own.mk to support a selection of old-style
> NO_FOO options, protected by .if !defined(BURN_BRIDGES) (which is not in
> any way related to the similarly-named kernel option).  There is also
> bsd.compat.mk, which handles the even older-style NOFOO options (by
> translating them to NO_FOO and emitting a warning).  These chunks of
> code date back to 2006 and 2004, respectively.  I think it's way past
> time we nuked them.  Any objections?

No.  See http://www.freebsd.org/cgi/query-pr.cgi?pr=conf/155738

I've been told this required an exp-run last I asked.


-- 
Eitan Adler

From owner-freebsd-arch@FreeBSD.ORG  Fri Jan  4 06:23:41 2013
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id B56BD2A0
 for <arch@freebsd.org>; Fri,  4 Jan 2013 06:23:41 +0000 (UTC)
 (envelope-from imp@bsdimp.com)
Received: from mail-ie0-f172.google.com (mail-ie0-f172.google.com
 [209.85.223.172]) by mx1.freebsd.org (Postfix) with ESMTP id 8A3CBEB7
 for <arch@freebsd.org>; Fri,  4 Jan 2013 06:23:41 +0000 (UTC)
Received: by mail-ie0-f172.google.com with SMTP id c13so19470937ieb.3
 for <arch@freebsd.org>; Thu, 03 Jan 2013 22:23:41 -0800 (PST)
X-Received: by 10.42.51.142 with SMTP id e14mr39413161icg.2.1357280620881;
 Thu, 03 Jan 2013 22:23:40 -0800 (PST)
Received: from [192.168.43.239] (me62836d0.tmodns.net. [208.54.40.230])
 by mx.google.com with ESMTPS id xn10sm46361139igb.4.2013.01.03.22.23.38
 (version=TLSv1/SSLv3 cipher=OTHER);
 Thu, 03 Jan 2013 22:23:39 -0800 (PST)
Sender: Warner Losh <wlosh@bsdimp.com>
Subject: Re: BURN_BRIDGES in bsd.own.mk
Mime-Version: 1.0 (Apple Message framework v1085)
Content-Type: text/plain; charset=iso-8859-1
From: Warner Losh <imp@bsdimp.com>
In-Reply-To: <86zk0q1czz.fsf@ds4.des.no>
Date: Fri, 4 Jan 2013 00:23:29 -0600
Content-Transfer-Encoding: quoted-printable
Message-Id: <352CB28B-276F-4B3C-B153-02A3AD435938@bsdimp.com>
References: <86zk0q1czz.fsf@ds4.des.no>
To: =?iso-8859-1?Q?Dag-Erling_Sm=F8rgrav?= <des@des.no>
X-Mailer: Apple Mail (2.1085)
Cc: arch@freebsd.org
X-List-Received-Date: Fri, 04 Jan 2013 06:23:41 -0000


On Jan 3, 2013, at 9:51 AM, Dag-Erling Smørgrav wrote:

> We still have code in bsd.own.mk to support a selection of old-style
> NO_FOO options, protected by .if !defined(BURN_BRIDGES) (which is not in
> any way related to the similarly-named kernel option).  There is also
> bsd.compat.mk, which handles the even older-style NOFOO options (by
> translating them to NO_FOO and emitting a warning).  These chunks of
> code date back to 2006 and 2004, respectively.  I think it's way past
> time we nuked them.  Any objections?

Kill them.

When they were put in, they were there to designate things that
shouldn't be relied upon and would be deleted soon...

Warner