From owner-freebsd-arch@FreeBSD.ORG  Tue Jan 19 19:19:54 2010
Return-Path: <owner-freebsd-arch@FreeBSD.ORG>
Delivered-To: arch@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id B1BE01065670;
	Tue, 19 Jan 2010 19:19:54 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail01.syd.optusnet.com.au (mail01.syd.optusnet.com.au
	[211.29.132.182])
	by mx1.freebsd.org (Postfix) with ESMTP id 422C08FC18;
	Tue, 19 Jan 2010 19:19:53 +0000 (UTC)
Received: from c220-239-227-214.carlnfd1.nsw.optusnet.com.au
	(c220-239-227-214.carlnfd1.nsw.optusnet.com.au [220.239.227.214])
	by mail01.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	o0JJJo1h020005
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 20 Jan 2010 06:19:51 +1100
Date: Wed, 20 Jan 2010 06:19:50 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@delplex.bde.org
To: Attilio Rao <attilio@freebsd.org>
In-Reply-To: <3bbf2fe11001190941s37f62c48tb91be0061b658b2c@mail.gmail.com>
Message-ID: <20100120055636.U68115@delplex.bde.org>
References: <3bbf2fe10911271542h2b179874qa0d9a4a7224dcb2f@mail.gmail.com>
	<20100116205752.J64514@delplex.bde.org>
	<3bbf2fe11001160409w1dfdbb9j36458c52d596c92a@mail.gmail.com>
	<201001191144.23299.jhb@freebsd.org>
	<3bbf2fe11001190927m10f73775p7b68eb4d3ce0470a@mail.gmail.com>
	<274B568B-81D9-4554-8C3A-888FF0CD7B08@samsco.org>
	<3bbf2fe11001190941s37f62c48tb91be0061b658b2c@mail.gmail.com>
MIME-Version: 1.0
Content-Type: MULTIPART/MIXED; BOUNDARY="0-2077489133-1263928790=:68115"
Cc: FreeBSD Arch <arch@freebsd.org>, Scott Long <scottl@samsco.org>,
	Ed Maste <emaste@freebsd.org>
Subject: Re: [PATCH] Statclock aliasing by LAPIC
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
	<mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 19 Jan 2010 19:19:54 -0000

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

--0-2077489133-1263928790=:68115
Content-Type: TEXT/PLAIN; charset=X-UNKNOWN; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE

On Tue, 19 Jan 2010, Attilio Rao wrote:

> 2010/1/19 Scott Long <scottl@samsco.org>:
>> On Jan 19, 2010, at 10:27 AM, Attilio Rao wrote:
>>>
>>> 2010/1/19 John Baldwin <jhb@freebsd.org>:
>>>> My feeling, btw, is that the real solution is to not use a sampling
>>>> clock for per-process stats, but to just use the cycle counter and
>>>> keep separate user, system, and interrupt cycle counts (like the
>>>> rux_runtime we have now).  This makes calcru() trivial and
>>>> eliminates many of the weird "going backwards", etc. problems.  The
>>>> only issue with this approach is that not all platforms have a
>>>> cheap cycle counter (many embedded platforms lack one I think), so
>>>> you would almost need to support both modes of operation and maybe
>>>> have an #define in <machine/param.h> to choose between the two
>>>> modes.
>>>
>>> Generally that would be a good idea, but the problem is not only for
>>> the architectures not supporting it, but also for architectures that
>>> do (eg. TSC de-synchronization in some SMP environment).
>>>
>>
>> For process stats, TSC desync isn't a big problem.  As a process
>> migrates from one CPU to the other, its stats from the old cpu will
>> be recorded, then stats will be started on the new cpu.  The only
>> problem here is with normalizing the different TSC's to a common
>> reference.  Maybe that can be done when computing cp_times?  This is
>> definitely a case where 'perfect' is the enemy of 'a hell of a lot
>> better than we have now'.
>

Only the frequencies would need normalization, since the TSCs are per-CPU
and they hopefully don't get reset by suspend etc.  Separate frequencies
for separate CPUs are not supported now.
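
To illustrate what normalizing only the frequencies could look like, here
is a rough sketch.  It is not existing kernel code: the names
cycles_to_usec() and runtime_usec(), and the idea of keeping one cycle
total per CPU next to a per-CPU frequency table, are assumptions made up
for the example.  Each CPU's cycles are scaled by that CPU's own frequency
before being summed, so no common TSC origin is needed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a cycle count taken on one CPU to microseconds, using that
 * CPU's own TSC frequency (hypothetical helper, not a kernel API).
 * The division is split so that cycles * 1000000 cannot overflow.
 */
static uint64_t
cycles_to_usec(uint64_t cycles, uint64_t hz)
{
	assert(hz != 0);
	return ((cycles / hz) * 1000000 + ((cycles % hz) * 1000000) / hz);
}

/*
 * Sum the runtime of a thread that accumulated cycles[i] cycles while
 * running on CPU i, where CPU i's TSC ticks at hz[i].  Only the
 * frequencies are normalized; the absolute TSC values never meet.
 */
static uint64_t
runtime_usec(const uint64_t *cycles, const uint64_t *hz, int ncpu)
{
	uint64_t usec = 0;

	for (int i = 0; i < ncpu; i++)
		usec += cycles_to_usec(cycles[i], hz[i]);
	return (usec);
}
```

With 2000000000 cycles on a 2 GHz CPU and 1500000000 cycles on a 3 GHz
CPU, the two contributions are 1 second and 0.5 seconds.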

> I wouldn't like to be mistaken, but IIRC in some benchmarks kris@ did
> in the past years we were seeing TSC timers literally going backwards
> after the de-synchronization (even on absolute measurement).

Do you really mean individual TSCs going backwards?  P-state-invariance
(?) should prevent the desync.  If the TSCs actually desync, then TSC
timecounters are sure to break, with timecounters going backwards being
a typical result (certain calculations overflow if time deltas are
unexpectedly large).  Timecounters used to be used for the equivalent
of rux_runtime.  There were/are no checks for timecounters themselves
going backwards, but sanity checks in the use of rux_runtime detected
this.  Now TSCs (if available) are normally used for rux_runtime.
Recalibration of the TSC's assumed-common frequency is buggy and can
easily cause bizarre user times when the frequency is changed.
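
A small sketch of why a backwards step shows up as an unexpectedly large
delta, and of the kind of sanity check a consumer can apply.  This is not
the timecounter or rux_runtime code; counter_delta(), delta_is_sane(),
accumulate_runtime() and the max_plausible threshold are all hypothetical
names chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Unsigned subtraction gives the right answer across a counter wrap,
 * but if `now` is slightly behind `then` the result is close to 2^64.
 */
static uint64_t
counter_delta(uint64_t then, uint64_t now)
{
	return (now - then);
}

/* Hypothetical check: a delta above max_plausible is not believable. */
static bool
delta_is_sane(uint64_t delta, uint64_t max_plausible)
{
	return (delta <= max_plausible);
}

/*
 * Add the interval [then, now] to a running total, dropping intervals
 * that fail the sanity check instead of adding a huge bogus value.
 */
static uint64_t
accumulate_runtime(uint64_t total, uint64_t then, uint64_t now,
    uint64_t max_plausible)
{
	uint64_t delta = counter_delta(then, now);

	if (!delta_is_sane(delta, max_plausible))
		return (total);
	return (total + delta);
}
```

A counter read of 900 after an earlier read of 1000 gives a delta of
2^64 - 100, which the check rejects; dropping the interval is only one
possible policy.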

Apart from that, rux_runtime is correct.  Good enough for scheduling
even when incorrect.

Bruce
--0-2077489133-1263928790=:68115--