Date:      Wed, 18 Jul 2001 22:13:21 +1000 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Ian Dowse <iedowse@maths.tcd.ie>
Cc:        freebsd-current@FreeBSD.ORG
Subject:   Re: Load average synchronisation and phantom loads 
Message-ID:  <Pine.BSF.4.21.0107182018440.95253-100000@besplex.bde.org>
In-Reply-To: <200107172119.aa30092@salmon.maths.tcd.ie>

On Tue, 17 Jul 2001, Ian Dowse wrote:

> In message <Pine.BSF.4.21.0107180242310.67803-100000@besplex.bde.org>, Bruce Evans writes:
> >
> >I think that is far too much variation.  5 seconds is hard-coded into
> >the computation of the load average (constants in cexp[]), so even a
> >variation of +-1 tick breaks the computation slightly.
> 
> I have not changed the mean inter-sample time from 5 seconds (*),
> so is this really a problem? There will be a slight time-warping
> effect in the load calculation, but even for the shorter 5-minute
> timescale, this will average out to typically no more than a few
> percent (i.e. the "5 minutes" will instead normally be approx 4.8
> to 5.2 minutes). Apart from a marginally more wobbly xload display,
> this should not make any real difference.

It should average out to precisely the same as before if you haven't
changed the mean (mean = average :-).  The real difference may be
small, but I think it is an unnecessary regression.

> If the variation was much smaller than it is in the proposed patch,
> you could get a noticeable drifting in and out of phase with processes
> that have a regular run-pause pattern. Obviously this is a much

No, phase matches will be very rare in practice, no matter what the
random variation is.  The average difference between the actual sampling
time for the N'th sample and (5 * N) will drift away from 0.  I think
the average of the square of this difference is proportional to N
provided the variation is uniformly distributed.  The main problem
with a small variation is that the average difference won't drift away
from 0 very fast.  Even with a large variation, the drift might not be
fast enough.

> The alternative that I considered was to sample the processes once
> during every 5-second interval, but to place the sampling point
> randomly over the interval. That probably results in a better
> synchronisation-avoidance behaviour. However, to incorporate the
> sample into the load average requires either waiting until the end
> of the interval, or updating the load average at the time of
> sampling. The former introduces a new delay into the load average
> computation, and the latter results in a lot of very noticeable
> jitter on the inter-sample interval.

I rather like this.  With immediate update, it's almost equivalent to
your current method with a random variation of between -5 and 5 seconds
instead of between -1 and 1 seconds.  Your current method doesn't
really reduce the jitter -- it just concentrates it into a smaller
interval.

> (*) Actually, I have changed the mean by 0.5 ticks, but that is a
> bug that I will fix. The "4 + random() % (hz * 2)" should be "4 +
> random() % (hz * 2 + 1)" instead.

I think we can do better by making the bug a feature and using a sample
time difference of slightly more than 5 seconds, e.g. 5.01 seconds.
Then processes that wake up every second and run for less than 1 tick
would be in phase precisely every 500 or 501 seconds, which is good
(we want them to be in phase in a uniform way so that they get counted).
Processes that wake up every 1.002 seconds would then be in phase too
much, but 1.002 is much less magic than 1.000, so such processes are
hopefully rare.  Use a (small) random variation to reduce phase effects
for such processes.  I think there are none in the kernel.  I would try
using the following magic numbers:

    sample interval = 5.02 seconds (approx) (not 5.01, so that the random
                                             variation never gives a multiple
                                             of 1.00)
    random variation = 0+-0.01 seconds (approx)
    cexp[] = adjusted for 5.02 instead of 5.00

Note: sleep(), select() and non-periodic setitimer() calls also add 1 tick
to the timeout.  This should help reduce phase effects in userland.

> >Not another SYSINIT (all SYSINITs are evil IMO).  SI_SUB_PSEUDO is
> >bogus here -- there are no pseudo ttys here.  sched_setup() is a
> >good place to do this initialization.
> 
> John Baldwin suggested moving the load average calculation into
> kern_synch.c, so it would certainly make sense to initialise it
> from sched_setup() then. This seems like a good idea to me; does
> that sound OK?

Yes.

Bruce

