Date:      Thu, 27 May 2010 21:39:18 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Neel Natu <neelnatu@gmail.com>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, Alexander Motin <mav@freebsd.org>, src-committers@freebsd.org, Neel Natu <neel@freebsd.org>
Subject:   Re: svn commit: r208585 - head/sys/mips/mips
Message-ID:  <20100527200033.K1376@besplex.bde.org>
In-Reply-To: <AANLkTinZEhKiUIz2pQqPQvrAL3N2LX0e1zrc8oJcBoKa@mail.gmail.com>
References:  <201005270127.o4R1RPaT016558@svn.freebsd.org> <4BFDE4E3.4060300@FreeBSD.org> <AANLkTinZEhKiUIz2pQqPQvrAL3N2LX0e1zrc8oJcBoKa@mail.gmail.com>

On Wed, 26 May 2010, Neel Natu wrote:

> On Wed, May 26, 2010 at 8:20 PM, Alexander Motin <mav@freebsd.org> wrote:
>> Neel Natu wrote:

>> Also, as soon as you run timer1 on frequency higher than hz - it is
>> strange to see
>>         stathz = hz;
>>         profhz = hz;
>> there. It is just useless. Better would be to do same as for x86:
>>         profhz = timer1hz;
>>         if (timer1hz < 128)
>>                 stathz = timer1hz;
>>         else
>>                 stathz = timer1hz / (timer1hz / 128);
>>

This is almost unreadable due to \xa0.

        stathz = timer1hz / (timer1hz / 128);

only works right if timer1hz is a multiple of 128, or at least a
multiple of the final stathz.  Otherwise, there may be significant
rounding error in the calculation, and if the final stathz is not an
exact divisor of timer1hz it is impossible to generate stathz from
timer1hz by dividing it.  (This has always been broken for the
lapic timer on amd64 and i386.  stathz = 133 is only nearly a divisor
of 1000 or 2000, and 128 is even further from being a divisor of any
timer frequency that can generate hz 1000.  The effects of this can
be seen in systat(1) -v 1 output -- the reported lapic timer interrupt
frequencies jump every ~(lapic_timer_hz / stathz) seconds when the divider
compensates for the multiple not being exact.  Another bug visible in systat
-v and vmstat -i output on ref9-amd64 right now is that the lapic timer
interrupt frequencies are all reported as 960.  hz is reported to be 1000,
but it is impossible to generate 1000 from 960.  Another bug in the lapic
timer code on amd64 and i386 is that it doesn't change the lapic timer
frequency to generate a high enough profhz.  profhz = 8192, which is
generated by the RTC on amd64 and i386, was adequate in 1990, and it
needs to be 100-1000 times larger now, but the lapic doesn't even generate
that; it claims to generate 1024, and this is even more impossible to
divide down from 960 than is 1000.)
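
To make the rounding problem concrete, here is a small sketch in C of the
quoted calculation.  The helper names (derive_stathz, divides_exactly) are
made up for illustration and are not kernel functions; only the arithmetic
comes from the quoted code.

```c
#include <stdbool.h>

/*
 * The stathz calculation quoted above, using truncating integer
 * division.  derive_stathz() is a hypothetical helper name.
 */
static int
derive_stathz(int timer1hz)
{
	if (timer1hz < 128)
		return (timer1hz);
	return (timer1hz / (timer1hz / 128));
}

/*
 * True if stathz can be generated from timer1hz by a fixed integer
 * divider, i.e. if the derived stathz is an exact divisor of timer1hz.
 */
static bool
divides_exactly(int timer1hz)
{
	return (timer1hz % derive_stathz(timer1hz) == 0);
}
```

With these definitions, derive_stathz(1024) is 128 and divides exactly, but
derive_stathz(2000) is 133 (2000 / 15, truncated), derive_stathz(1000) is 142
and derive_stathz(960) is 137, and none of those divides its timer frequency.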

> I see your point with the profiling timer. I'll fix that to be like x86.
>
> However it is not immediately obvious why we prefer to run the
> statistics timer at (or very close to) 128Hz. Any pointers?

At least SCHED_4BSD requires stathz to be almost 128.  More precisely,
it requires a clock of frequency about 16 Hz and divides stathz
internally by INVERSE_ESTCPU_WEIGHT = (8 * smp_cpus) to get this.  It
gets some extra resolution by accumulating ticks at stathz but has to
divide the result by 8 before feeding it to the priority adjustment,
else the adjustment would be too sensitive to recent activity, and/or
would overflow (overflow is avoided by clamping to the limit, but this
is bad too).  Dividing by smp_cpus is a hack to avoid the overflow
at a cost of reducing sensitivity.  The requirement for stathz to
be almost 128 is pushed to the clock generator(s) to avoid having
dividers (other than the simple/historical division by 8) in both
the clock generator(s) and the scheduler(s).
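
As a rough sketch of that arithmetic (this is not the actual SCHED_4BSD
code; the function names and the explicit ncpus parameter are assumptions
made for illustration), the division and the clamping look something like
this:

```c
/*
 * Hypothetical stand-in for INVERSE_ESTCPU_WEIGHT = (8 * smp_cpus);
 * the CPU count is passed explicitly instead of read from a global.
 */
static int
inverse_estcpu_weight(int ncpus)
{
	return (8 * ncpus);
}

/* Effective clock rate seen by the priority adjustment, in Hz. */
static int
effective_sched_hz(int stathz, int ncpus)
{
	return (stathz / inverse_estcpu_weight(ncpus));
}

/*
 * Accumulate one statclock tick into an estcpu-like counter, clamping
 * at limit instead of overflowing.  Once clamped, further CPU use is
 * no longer distinguishable, which is why clamping is bad too.
 */
static int
estcpu_tick(int estcpu, int limit)
{
	return (estcpu < limit ? estcpu + 1 : limit);
}

/* Scale the accumulated ticks down before they affect the priority. */
static int
priority_adjustment(int estcpu, int ncpus)
{
	return (estcpu / inverse_estcpu_weight(ncpus));
}
```

With stathz = 128 on one CPU the effective rate is 128 / 8 = 16 Hz; with two
CPUs it drops to 8 Hz, which is the reduced sensitivity mentioned above.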

When using lapic timers, I normally use lapic_timer_hz = hz = stathz =
profhz = 100, and don't worry about the completely broken profhz or the
scheduling problems from having stathz = hz.  The scheduling problems
are mostly caused by the hardware clocks behind stathz and hz being
identical.  When they are identical, having stathz != hz doesn't
help much, at least without the changes that I suggested a few months
ago (statclock() and hardclock() should never be called from the same
hardware interrupt).  There are 2 types of scheduling/statistics problems:
- malicious applications may hide from scheduling/statistics interrupts
   by arranging that they don't run across the interrupts.  This is easy
   to do while running for most of the time if hz is much larger than
   stathz (now the default :-().
- even non-malicious applications may hide from scheduling/statistics
   interrupts if the statclock and hardclock interrupts are too synchronous.
   This is a problem with the lapic timer interrupts in practice.  I think
   it takes almost perfect synchronization for there to be a problem in
   practice, and I can't see how the synchronization was perfect enough.
   For hz = 1000, lapic_timer_hz was 2000 and hardclock was called every
   second interrupt, while statclock was called every 2000/(stathz=133) =
   15th or 16th interrupt.  Since 15 is not a multiple of 2, statclock was
   normally called from the same lapic timer interrupt as hardclock only
   every second time.  This should be asynchronous enough.  I don't know
   the details
   of the current or previous implementation (where lapic_timer_hz is not
   2000) but IIRC the dividers don't know anything about the synchronicity
   problem so they could easily make it worse.
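
The hz = 1000, lapic_timer_hz = 2000, stathz = 133 case can be checked with
a small simulation.  This is only a sketch: it assumes accumulator-style
dividers for both hardclock and statclock, and the struct and function names
are invented here, not taken from the lapic code.

```c
struct clock_sim {
	int stat_calls;	/* simulated statclock() calls */
	int hard_calls;	/* simulated hardclock() calls */
	int shared;	/* statclock calls sharing an interrupt with hardclock */
};

/*
 * Simulate one second of timer interrupts.  Each interrupt adds hz and
 * stathz to separate accumulators; a clock "fires" whenever its
 * accumulator reaches lapic_timer_hz (assumed divider scheme).
 */
static struct clock_sim
simulate_one_second(int lapic_timer_hz, int hz, int stathz)
{
	struct clock_sim r = { 0, 0, 0 };
	int hard_acc = 0, stat_acc = 0;

	for (int i = 0; i < lapic_timer_hz; i++) {
		int hard = 0;

		hard_acc += hz;
		if (hard_acc >= lapic_timer_hz) {
			hard_acc -= lapic_timer_hz;
			hard = 1;
			r.hard_calls++;
		}
		stat_acc += stathz;
		if (stat_acc >= lapic_timer_hz) {
			stat_acc -= lapic_timer_hz;
			r.stat_calls++;
			r.shared += hard;
		}
	}
	return (r);
}
```

Under these assumptions, simulate_one_second(2000, 1000, 133) gives 1000
hardclock calls and 133 statclock calls, of which about half share an
interrupt with hardclock.  With lapic_timer_hz = hz = stathz = 100, every
statclock call shares its interrupt with hardclock.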

Bruce


