Date:      Sun, 17 Jan 2010 01:44:09 +1100 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Attilio Rao <attilio@freebsd.org>
Cc:        FreeBSD Arch <arch@freebsd.org>, Ed Maste <emaste@freebsd.org>
Subject:   Re: [PATCH] Statclock aliasing by LAPIC
Message-ID:  <20100116235558.E64689@delplex.bde.org>
In-Reply-To: <3bbf2fe11001160409w1dfdbb9j36458c52d596c92a@mail.gmail.com>
References:  <3bbf2fe10911271542h2b179874qa0d9a4a7224dcb2f@mail.gmail.com>  <200911301305.30572.jhb@freebsd.org> <3bbf2fe11001150706y765159a2jbd37c7ae4cf378f0@mail.gmail.com> <20100116205752.J64514@delplex.bde.org> <3bbf2fe11001160409w1dfdbb9j36458c52d596c92a@mail.gmail.com>

On Sat, 16 Jan 2010, Attilio Rao wrote:

> 2010/1/16 Bruce Evans <brde@optusnet.com.au>:
>> On Fri, 15 Jan 2010, Attilio Rao wrote:
>>> ... There is an updated patch:
>>>
>>> http://www.freebsd.org/~attilio/Sandvine/STABLE_8/statclock_aliasing/statclock_aliasing4.diff
>>
>> It seems to have the same fundamental bugs as the previous version.
>> The atrtc interrupt is too slow to use for anything, so it should never
>> be used if there is something better like the lapic timer available
>> (even the i8254 is better), and using it here doesn't even fix the
>> problem (malicious applications can very easily hide from statclock
>> by default since the default hz is much larger than the default stathz,
>> and malicious applications can not so easily hide from statclock
>> irrespective of the misconfiguration of hz, since statclock is not
>> random).  See my
>> previous reply and ftp://ftp.ee.lbl.gov/papers/statclk-usenix93.ps.Z for
>> more details.
>
> Well, the primary thing I wanted to fix is not the hiding of
> malicious programs but the clock aliasing created when handling all
> the clocks by the same source.

That could easily be a scheduler bug, or at least fixable there.

> About the slowness -- I'm fine with whatever additional source to
> LAPIC we would eventually use thus would you feel better if i8254 is
> used replacing atrtc?

You can probably just use the LAPIC with programmed
pseudo-not-very-randomness (applied to delivery, not to the LAPIC
interrupts, which probably need to remain periodic).

> Also note that atrtc is the default if LAPIC cannot be used. I don't
> understand why another source, even simpler (eg. i8254) would have
> been used in that specific case by the 'old' code.

The i8254 restarts itself automatically.  It only needs an EOI, which
is fairly efficient if it is on APIC.  Thus interrupting at say 2 kHz
with i8254 hopefully has less overhead than interrupting at 128 Hz
with atrtc, provided the i8254 interrupt handler does no more than the
lapic_timer one.  The i8254 is also programmable, so you can change its
period easily though not efficiently to program randomness with a
resolution of a few microseconds.
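
For a sense of the numbers, here is a minimal sketch of the reload-count
arithmetic, assuming the standard PC i8254 input clock of 1193182 Hz.  One
count is then about 0.84 microseconds, so a few microseconds of jitter is a
handful of counts.  The function name is hypothetical, and the (slow) port
writes that actually load the count into the chip are not shown.

```c
#include <assert.h>
#include <stdint.h>

#define	I8254_FREQ	1193182u	/* assumed standard i8254 input clock, Hz */

/*
 * Reload count for a period of period_us microseconds, rounded to the
 * nearest count.  Hypothetical helper: it only does the arithmetic.
 */
static uint16_t
i8254_count(uint32_t period_us)
{
	uint64_t count;

	count = ((uint64_t)I8254_FREQ * period_us + 500000) / 1000000;
	/* The counter is 16 bits wide; reject periods it cannot express. */
	assert(count >= 1 && count <= 65535);
	return ((uint16_t)count);
}
```

A 2 kHz interrupt is a 500 microsecond period; stretching one period to 503
microseconds changes the count by only 3.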

> What I mean, then is: I see your points, I'm not arguing that at all,
> but the old code has other problems that get fixed with this patch
> (having different sources makes the whole system more flexible) while
> the new things it does introduce are secondary (but still: I'm fine
> with whatever second source is picked up for statclock, profclock) if
> you really see a concern wrt atrtc slowness.

Did you see the points in my more detailed review?

We want to remove support for the old clock sources eventually, not
have more code to select them and more non-default configuration to
avoid them again.

I think I understand the actual bug now.  It is in lapic_handle_timer().
Statclock interrupts should never be delivered on the same lapic timer
interrupt as a hardclock interrupt (this is possible, at least with
default hz's, since hardclock interrupts are delivered every second
lapic timer interrupt), but they are.  This at best results in every
second statclock interrupt being in perfect sync with some hardclock
interrupt (I think it actually gives bunches of about
lapic_timer_hz/stathz/2 (default 7 or 8) in or out of perfect sync.  Maybe
the bunches are
what makes the problem serious).  To fix this, statclock interrupts
should be delayed until the next lapic timer interrupt if a hardclock
interrupt was just delivered, or done early if the next delay would
be more than half a lapic timer period.  This delay/advance also gives
some free pseudo-randomness.
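
The delay half of that fix can be sketched as a userland simulation (not a
kernel patch).  It assumes that lapic_handle_timer() hands out ticks with one
accumulator per clock, and that lapic_timer_hz = 2000 with hz = 1000 and
stathz = 133.  Those values and every name in the code are assumptions for
illustration, and the "done early" half is left out.

```c
#include <assert.h>

/* Assumed defaults; not taken from the patch under review. */
enum { LAPIC_TIMER_HZ = 2000, HZ = 1000, STATHZ = 133 };

struct sim_result {
	int hard;	/* hardclock ticks delivered */
	int stat;	/* statclock ticks delivered */
	int both;	/* timer interrupts that delivered both */
};

/*
 * Simulate nticks lapic timer interrupts with accumulator-style delivery.
 * If delay_stat is nonzero, a statclock tick that falls due on an interrupt
 * that just delivered hardclock is held over to the next interrupt.
 */
static struct sim_result
simulate(int nticks, int delay_stat)
{
	struct sim_result r = { 0, 0, 0 };
	int hard_acc = 0, stat_acc = 0, pending = 0;

	assert(nticks >= 0);
	for (int i = 0; i < nticks; i++) {
		int did_hard = 0, did_stat = 0;

		hard_acc += HZ;
		if (hard_acc >= LAPIC_TIMER_HZ) {
			hard_acc -= LAPIC_TIMER_HZ;
			did_hard = 1;
		}
		stat_acc += STATHZ;
		if (stat_acc >= LAPIC_TIMER_HZ) {
			stat_acc -= LAPIC_TIMER_HZ;
			pending = 1;	/* a statclock tick is now due */
		}
		if (pending && !(delay_stat && did_hard)) {
			pending = 0;
			did_stat = 1;
		}
		r.hard += did_hard;
		r.stat += did_stat;
		r.both += did_hard && did_stat;
	}
	return (r);
}
```

Under these assumptions, simulate(2000, 0) reports some interrupts that
deliver both clocks, while simulate(2001, 1) delivers the same number of
statclock ticks with none of them sharing an interrupt with hardclock.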

See my more detailed review about statistics utilities not liking any
randomness.  Too bad for them.  The non-divisibility of lapic_timer_hz
by stathz with defaults already gives large delays.  You can even
reduce the maximum jitter using delay/advance instead of the current
method (from almost 1 lapic timer period (always late) to +- 1/2 of
1 lapic timer period (early or late)).
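
The jitter claim can be checked with a little integer arithmetic.  The sketch
below assumes that statclock tick j is ideally due j * timer_hz / stathz lapic
timer periods after the start, and compares delivery on the first interrupt at
or after that point with delivery on the nearest interrupt.  The function name
is hypothetical, and collisions with hardclock are ignored here.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Worst-case statclock delivery error over nticks ticks, in units of
 * 1/stathz of a lapic timer period.  With nearest == 0 a tick is delivered
 * on the first timer interrupt at or after its ideal time (always late);
 * with nearest != 0 it is delivered on the closest interrupt (early or
 * late).
 */
static long
max_abs_error(long timer_hz, long stathz, long nticks, int nearest)
{
	long worst = 0;

	assert(timer_hz > 0 && stathz > 0 && nticks >= 0);
	for (long j = 1; j <= nticks; j++) {
		long due = j * timer_hz;	/* ideal time, scaled by stathz */
		long k = nearest ? (2 * due + stathz) / (2 * stathz) :
		    (due + stathz - 1) / stathz;
		long err = labs(k * stathz - due);

		if (err > worst)
			worst = err;
	}
	return (worst);
}
```

With the assumed values timer_hz = 2000 and stathz = 133, the worst case over
133 ticks is 132/133 of a period for late-only delivery and 66/133 of a period
for nearest-interrupt delivery.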

I don't see any reason to keep using stathz = 133 or even a non-multiple
or non-divisor of hz (but it needs to remain nearly 128 for historical
reasons until other layers are changed).  The non-multiple is to ensure
that independent clocks don't stay in sync for long.  Programmed
non-sync ensures this better.  Malicious programs just have different
problems predicting the 2 types of pseudo-not-very-randomness.  With
statclock ticks occurring exactly half-way between hardclock ticks,
malicious programs can a bit too easily wake up on a hardclock tick
and run for a half less epsilon of a hardclock tick without getting
accounted.  Oops, non-malicious programs can also do this a bit too
easily -- you can have a thundering herd wake up and all finish before
the accounting.  More pseudo-randomness seems to be needed.  I don't
see a good way to handle the thundering herd case.  For that you
actually want the statclock tick immediately after the hardclock tick
(but not in sync) quite often.  The following might work except for
inefficiency (time/power): make lapic_timer_hz say 10 times larger and
distribute statclock delivery randomly about hardclock delivery in the
39 slots 0, +-lapic_timer_period, ... +-19*lapic_timer_period, instead
of only in the 2 +-lapic_timer_period slots.  First try using the 0
slot with just these 2.

Perhaps similarly for profclock, but when it is fixed it should be
much larger than hz (10-10000 kHz according to machine speed and/or
sysctl), so lapic_timer_hz would have to be enormous to give a small
relative jitter for profclock.

Bruce