Date:      Wed, 2 Jan 2013 15:11:05 +0200
From:      Alexander Motin <mav@FreeBSD.org>
To:        Luigi Rizzo <rizzo@iet.unipi.it>
Cc:        Davide Italiano <davide@freebsd.org>, Ian Lepore <freebsd@damnhippie.dyndns.org>, freebsd-arch@freebsd.org, FreeBSD Current <freebsd-current@freebsd.org>, Marius Strobl <marius@alchemy.franken.de>
Subject:   Re: [RFC/RFT] calloutng
Message-ID:  <CAO4K=PUHAH=UNzMde0V2TwkN5vj3gw9hHj5yCQxDvdUn%2Buqv7w@mail.gmail.com>
In-Reply-To: <20130102122743.GA43241@onelab2.iet.unipi.it>
References:  <50CCAB99.4040308@FreeBSD.org> <50CE5B54.3050905@FreeBSD.org> <50D03173.9080904@FreeBSD.org> <20121225232126.GA47692@alchemy.franken.de> <50DB4EFE.2020600@FreeBSD.org> <1356909223.54953.74.camel@revolution.hippie.lan> <20121231061735.GA5866@onelab2.iet.unipi.it> <50E16637.9070501@FreeBSD.org> <20130102105730.GA42542@onelab2.iet.unipi.it> <50E418EA.7030801@FreeBSD.org> <20130102122743.GA43241@onelab2.iet.unipi.it>

On 02.01.2013 14:28, "Luigi Rizzo" <rizzo@iet.unipi.it> wrote:
>
> On Wed, Jan 02, 2013 at 01:24:26PM +0200, Alexander Motin wrote:
> > On 02.01.2013 12:57, Luigi Rizzo wrote:
> ...
> > >i am not sure what you mean by "extending interval", but i believe the
> > >logic should be the following:
> > >
> > >- say user requests a timeout after X seconds and with a tolerance of D
> > >   seconds
> > >   (both X and D are fractional, so they can be short).  Interpret this as
> > >
> > >    "the system should do its best to generate an event between X and X+D
> > >    seconds"
> > >
> > >- convert X to an absolute time, T_X
> > >
> > >- if there are any pending events already scheduled between T_X and T_X+D,
> > >   then by definition they are acceptable. Attach the requested timeout
> > >   to the earliest of these events.
> >
> > All of the above is true, but not the following.
> >
> > >- otherwise, schedule an event at time T_X (because there is no valid
> > >   reason to generate a late event, and it makes no sense from an
> > >   energy saving standpoint, either -- see below).
> >
> > The system may have many interrupts besides the timer: network, disk, ...
> > WiFi cards generate interrupts at the AP beacon rate -- dozens of times per
> > second. It is not very efficient to wake up the CPU precisely at T_X,
> > which may be just 100us earlier than the next hardware interrupt. That's why
> > timer interrupts are scheduled at min(T_X+D, 0.5s, next hardclock, next
> > statclock, ...). As a result, the event will be handled within the allowed
> > range, but the real delay will depend on current environment conditions.
>
> I don't see why system events (hardclock, statclock, 0.5s,...)
> need to be treated specially -- and i am saying this also in
> the interest of simplifying the logic of the code.

Sure. That is mostly for historical reasons. At some point they should
disappear, just not now, as the patch is already quite big.

> First of all, if you know that there is already a hardclock/statclock/*
> scheduled in [T_X, T_X+D] you just reuse that. This particular bullet
> was "no event scheduled in [T_X, T_X+D]" so you need to generate
> a new one.

That is true, but my main point was about merging with external events,
which I can't predict. The only way to merge with them is to increase the
sleep period and hope for the best.
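
To make the difference between the two policies concrete, here is a minimal
Python sketch. It is only an illustration, not the calloutng code: the
function name, its parameters and the `merge_with_external` switch are
hypothetical, times are plain integers in an arbitrary unit, and the extra
terms of the min() quoted above (0.5s, hardclock, statclock) are left out.

```python
def pick_timer_time(t_x, d, pending, merge_with_external=True):
    """Pick the absolute time at which a callout requested for t_x with
    tolerance d gets serviced.

    pending: absolute times of events that are already scheduled.
    merge_with_external: hypothetical switch between the two policies
    discussed in this thread.
    """
    # Any already scheduled event inside [t_x, t_x + d] is acceptable.
    in_window = [t for t in pending if t_x <= t <= t_x + d]
    if in_window:
        return min(in_window)  # attach to the earliest of them

    # Nothing known inside the window, so a new timer event is needed.
    if merge_with_external:
        # Late end of the window: an unpredictable external interrupt
        # (NIC, disk, WiFi beacon) may arrive first and service the callout.
        return t_x + d
    # Early end of the window, as proposed in the quoted text.
    return t_x


print(pick_timer_time(300, 10, [305, 308]))        # 305
print(pick_timer_time(300, 10, [400]))             # 310
print(pick_timer_time(300, 10, [400], False))      # 300
```

In both policies the callout is handled inside the allowed range; only the
case with no known event inside the window differs.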

> Surely scheduling the event at T_X+D instead of T_X increases the
> chance of merging events. But the savings are smaller and smaller
> as the value X increases. This particular client will only
> change its request rate from 1/X to 1/(X+D) so in relative terms
> the gain is ( 1/X - 1/(X+D) ) / (1/(X+D) ) = D/X
>
> Example: if X = 300ms, and D = 10ms (as in the test case)
> you just save one interrupt every 30 seconds by scheduling at
> T_X+D instead of T_X. Are we actually able to measure the
> difference ?
>
> Even at high interrupt rates (e.g. X = 1ms) you are not
> going to save a lot unless the tolerance D is very large,
> which is generally undesirable for other reasons
> (presumably, applications are not going to be happy
> if you artificially double their timeouts).
> Now, say your application requests timeouts every X = 300ms.

With the default precision set to 5%, stretching the periods alone saves
only about 5%. But that is absolutely not my goal!
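
For reference, the D/X estimate from the quoted text can be checked
numerically. The sketch below is only arithmetic on the quoted formula;
the function name is made up for illustration and times are in
milliseconds.

```python
def relative_gain(x_ms, d_ms):
    """Relative reduction in wakeup rate when a periodic request is
    serviced every x_ms + d_ms instead of every x_ms.

    Implements ( 1/X - 1/(X+D) ) / ( 1/(X+D) ), which simplifies to D/X.
    """
    rate_exact = 1.0 / x_ms
    rate_late = 1.0 / (x_ms + d_ms)
    return (rate_exact - rate_late) / rate_late


# X = 300ms, D = 10ms: about 0.033, i.e. D/X = 10/300.
print(relative_gain(300, 10))

# 5% tolerance: about 0.05, whatever the period is.
print(relative_gain(100, 5))

# Absolute saving for X = 300ms, D = 10ms: about 0.11 wakeups per second.
print(1000.0 / 300 - 1000.0 / 310)
```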

Imagine a different case: you have NIC interrupts at 1000Hz. You also have
100 callouts with a 100ms period each. If we program the timer with absolute
precision, you will get about 2000Hz of total interrupt rate. But if we
allow just 2% deviation, most of the callouts will be grouped with NIC
interrupts and the total rate will be 1000Hz. Losing _less_ than 2% of
precision, we cut the interrupt rate in _half_!
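
The numbers in this example can be reproduced with a toy model. This is a
sketch under strong assumptions, not a measurement: the NIC is taken to be
perfectly periodic (real external interrupts are not predictable), callout
phases are spread out by hand so that none of them lands exactly on a NIC
tick, and each callout keeps its nominal deadlines. All names are
hypothetical; times are integer microseconds.

```python
def total_wakeups(nic_period_us, callout_period_us, n_callouts,
                  tolerance_us, duration_us):
    """Count distinct CPU wakeups in [0, duration_us) for the toy model."""
    # Every NIC interrupt wakes the CPU regardless of the callouts.
    wakeups = set(range(0, duration_us, nic_period_us))

    for i in range(n_callouts):
        # Assumed phase: one callout per NIC period, half a period off.
        phase = i * nic_period_us + nic_period_us // 2
        for deadline in range(phase, duration_us, callout_period_us):
            # First NIC interrupt at or after the deadline (ceiling division).
            next_nic = -(-deadline // nic_period_us) * nic_period_us
            if next_nic <= deadline + tolerance_us:
                service_time = next_nic   # handled from the NIC interrupt
            else:
                service_time = deadline   # needs its own timer interrupt
            if service_time < duration_us:
                wakeups.add(service_time)

    return len(wakeups)


# One simulated second: NIC at 1000Hz, 100 callouts with a 100ms period.
exact = total_wakeups(1000, 100_000, 100, 0, 1_000_000)
relaxed = total_wakeups(1000, 100_000, 100, 2_000, 1_000_000)  # 2% of 100ms
print(exact, relaxed)  # 2000 1000
```

With zero tolerance every callout needs its own timer interrupt; with 2ms of
tolerance each deadline has a NIC interrupt inside its window, so in this
model no extra wakeups are left.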

> > >It seems to me that you are instead extending the requested interval
> > >upfront, which causes some gratuitous pessimizations in scheduling
> > >the callout.
> > >
> > >Re. energy savings: the gain in extending the timeout cannot exceed
> > >the value D/X. So while it may make sense to extend a 1us request
> > >to 50us to go (theoretically) from 1M callouts/s to 20K callouts/s,
> > >it is completely pointless from an energy saving standpoint to
> > >introduce a 10ms error on a 300ms request.
> >
> > I am not so sure about this. When the CPU package is in C7 sleep state
> > with all buses and caches shut down and memory set to self refresh, it
> > consumes very little power (some milliwatts). Wake up from that state
> > takes 100us or even more, with power consumption much higher than the
> > normal operational one. Sure, if we compare it with the power consumption
> > of 100% CPU load, the difference between 10 and 100 wakeups per second may
> > be small, but when comparing them to each other in some low-power
> > environment for a mostly idle system it may be much more significant.
>
> see above -- at low rates the difference is not measurable,
> at high rates the only obvious answer is "do not use C7
> if the next interrupt is due in less than 2..5 milliseconds"
>
> > >(even though i hate the idea that a 1us request defaults to
> > >a 50us delay; but that is hopefully something that can be tuned
> > >in a platform-specific way and perhaps at runtime).
> >
> > It is 50us on this ARM. On SandyBridge Core i7 it is only about 2us.
>
> very good, i suspected something similar, just wanted to be sure :)
>
> cheers
> luigi
>
> > --
> > Alexander Motin


