From: Jung-uk Kim
To: Bruce Evans
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Date: Mon, 20 Jun 2011 19:41:00 -0400
Subject: Re: svn commit: r222866 - head/sys/x86/x86
Message-Id: <201106201941.03393.jkim@FreeBSD.org>
In-Reply-To: <20110618210815.W889@besplex.bde.org>
References: <201106081938.p58JcWuB044252@svn.freebsd.org> <201106081913.09272.jkim@FreeBSD.org> <20110618210815.W889@besplex.bde.org>

On Saturday 18 June 2011 08:05 am, Bruce Evans wrote:
> Long ago, On Wed, 8 Jun 2011, Jung-uk Kim wrote:
> > On Wednesday 08 June 2011 04:55 pm, Bruce Evans wrote:
> >> On Wed, 8 Jun 2011, Jung-uk Kim wrote:
> >>> Log:
> >>>   Introduce low-resolution TSC timecounter "TSC-low".  It
> >>>   replaces the normal TSC timecounter if TSC frequency is higher
> >>>   than ~4.29 GHz (or 2^32-1 Hz) or
> >>
> >> It should be a separate timecounter so that the user can choose
> >> it independently, at least in the SMP case where it is very low
> >> (at most ~4.29 GHz >> 8 ~= 17 MHz).
> >
> > As I noted in the log, it is still higher than the previous
> > default ACPI-fast, which is ~3.68 MHz and I've never heard of any
> > complaint about ACPI-fast being too low.
> > ;-)
>
> That's because it is too low to measure itself being low :-).
>
> > Nothing prevents us from making a separate timecounter, though.
> > In fact, we can do the same for ACPI-fast/ACPI-safe.  However,
> > that'll only confuse users, IMHO.
>
> TSC/TSC-low sort of corresponds to ACPI-fast/ACPI-safe.  Users can
> switch between the latter.

How do we do that?

        if (j == 10) {
                acpi_timer_timecounter.tc_name = "ACPI-fast";
                acpi_timer_timecounter.tc_get_timecount =
                    acpi_timer_get_timecount;
                acpi_timer_timecounter.tc_quality = 900;
        } else {
                acpi_timer_timecounter.tc_name = "ACPI-safe";
                acpi_timer_timecounter.tc_get_timecount =
                    acpi_timer_get_timecount_safe;
                acpi_timer_timecounter.tc_quality = 850;
        }

We didn't have any code to influence this selection as far as I can
remember.

> What they can't do is run both concurrently, either to compare them
> or use the best one that works in the current context.  That would
> be more for developers and is not implemented mainly because it has
> more complexity (only a tiny amount of extra overhead I think,
> provided you don't try to keep the 2 times coherent -- just an extra
> windup for each active timecounter).
>
> >>>  static void tsc_levels_changed(void *arg, int unit);
> >>>
> >>>  static struct timecounter tsc_timecounter = {
> >>> @@ -392,11 +393,19 @@ test_smp_tsc(void)
> >>>  static void
> >>>  init_TSC_tc(void)
> >>
> >> This seems to only be called once at boot time.  So the lowness
> >> may be much lower than necessary if the levels are reduced
> >> significantly later.
> >
> > It'll only happen when the CPU is started at the highest
> > frequency and TSC is not invariant.  In this case, its quality
> > will be set to 800 and HPET or ACPI timecounter will be selected
> > by default.  I don't see much problem with the default choice
> > here.
>
> Can the CPU be started at a low frequency and throttled up later?

Yes, Intel mobile parts may do that.

> I agree that the non-invariant case is not very important.

Exactly.
> >>>  {
> >>> +        uint64_t max_freq;
> >>> +        int shift;
> >>>
> >>>          if ((cpu_feature & CPUID_TSC) == 0 || tsc_disabled)
> >>>                  return;
> >>>
> >>>          /*
> >>> +         * Limit timecounter frequency to fit in an int and prevent it from
> >>> +         * overflowing too fast.
> >>> +         */
> >>> +        max_freq = UINT_MAX;
> >>> +
> >>> +        /*
> >>>           * We can not use the TSC if we support APM.  Precise timekeeping
> >>>           * on an APM'ed machine is at best a fools pursuit, since
> >>>           * any and all of the time spent in various SMM code can't
> >>> @@ -418,13 +427,27 @@ init_TSC_tc(void)
> >>>           * We can not use the TSC in SMP mode unless the TSCs on all CPUs are
> >>>           * synchronized.  If the user is sure that the system has synchronized
> >>>           * TSCs, set kern.timecounter.smp_tsc tunable to a non-zero value.
> >>> +         * We also limit the frequency even lower to avoid "temporal anomalies"
> >>> +         * as much as possible.
> >>>           */
> >>> -        if (smp_cpus > 1)
> >>> +        if (smp_cpus > 1) {
> >>>                  tsc_timecounter.tc_quality = test_smp_tsc();
> >>> +                max_freq >>= 8;
> >>> +        }
> >>
> >> This gives especially low lowness if the levels are reduced
> >> significantly.  Maybe as low as 100 MHz >> 8 = ~390 KHz = lower
> >> than an i8254.
> >
> > I don't remember any SMP-capable x86 ever running at 100 MHz
> > unless it is seriously under-clocked.  Even if it existed, it
> > won't be available today. :-P
>
> Doesn't throttling give underclocking?

T-state *usually* does not change CPU frequency itself.  Only P-state
can change TSC frequency.  However, some broken implementations *may*
just stop incrementing TSC in very low T-state (or C-state).  AMD does
not have this problem for invariant TSCs.  It seems Intel also fixed it
for recent processors.  Nehalem or Sandy Bridge, I am not sure, though.

> Maybe not as low as 100 MHz, but quite low.  Only a possible problem
> for the non-invariant case anyway.

Agreed.
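
To make the frequency limit discussed above concrete, here is a minimal
sketch of how a shift count could be derived from max_freq.  The helper
names tsc_max_freq() and tsc_low_shift() are hypothetical, and the loop
is an assumption: the quoted diff shows only how max_freq is set, not
how the shift is computed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: frequency ceiling as described in the quoted
 * diff.  UINT32_MAX stands in for UINT_MAX on x86.
 */
uint64_t
tsc_max_freq(int smp_cpus)
{
	uint64_t max_freq = UINT32_MAX;

	if (smp_cpus > 1)
		max_freq >>= 8;		/* SMP: about 16.7 MHz */
	return (max_freq);
}

/*
 * Hypothetical helper (assumed logic, not taken from the commit):
 * smallest right shift that brings freq down to at most max_freq.
 */
int
tsc_low_shift(uint64_t freq, uint64_t max_freq)
{
	int shift;

	assert(max_freq > 0);
	for (shift = 0; shift < 32 && (freq >> shift) > max_freq; shift++)
		;
	return (shift);
}
```

Under these assumptions a 2 GHz SMP machine would get a shift of 7 and a
4 GHz one a shift of 8, while a uniprocessor below ~4.29 GHz would get a
shift of 0.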
> >> OTOH, maybe the temporal anomalies scale with the TSC frequency,
> >> so you need to right shift by a few irrespective of the TSC
> >> frequency.  A shift count of 8 seems too much, but if the initial
> >> TSC frequency is already < 2**32 shifted by 8, then the final
> >> shift is 0.
>
> This is my main point.  How can it be right to reduce the extra
> shift for SMP (if this shift is needed at all) just because the
> initial TSC frequency is low?  All instructions are clocked, so
> non-temporalness within a core scales with the current frequency.
> Oops, this leads back to my previous point that the scaling should
> depend on the current frequency and not just on the initial
> frequency.  Across cores, it isn't so clear what the
> non-temporalness scales with.  The non-temporalness is FUD so its
> scaling could be anything :-).

My questions to you:

a) Why do we care about the TSC timecounter when it is not invariant,
   where we *know* it is unusable and set to negative quality?
b) Why do we complicate the code when invariant frequency == current
   frequency == initial frequency?

> >> ...
> >> Perhaps the levels can also be increased significantly later.
> >> Then the timecounter frequency may exceed 4.29 GHz despite its
> >> scaling.
> >
> > Again, it can only happen when the CPU was started at low
> > frequency and the TSC is not invariant.  For that case, TSC won't
> > be selected by default unless both HPET and ACPI timers are
> > disabled/unavailable.
>
> But users can select it, and since users can't control the scaling
> or even select between TSC/TSC-low, TSC-low must be scaled properly
> initially to have the best chance of working later.

Maybe we should not allow users to select a negative-quality
timecounter in the first place.  Or maybe we should print scary warning
messages if they try foot-shooting.  Sigh...
> >>> @@ -520,8 +545,15 @@ SYSCTL_PROC(_machdep, OID_AUTO, tsc_freq
> >>>      0, 0, sysctl_machdep_tsc_freq, "QU", "Time Stamp Counter frequency");
> >>>
> >>>  static u_int
> >>> -tsc_get_timecount(struct timecounter *tc)
> >>> +tsc_get_timecount(struct timecounter *tc __unused)
> >>>  {
> >>>
> >>>          return (rdtsc32());
> >>>  }
> >>> +
> >>> +static u_int
> >>> +tsc_get_timecount_lowres(struct timecounter *tc)
> >>> +{
> >>> +
> >>> +        return (rdtsc() >> (int)(intptr_t)tc->tc_priv);
> >>
> >> This forces a slow 64-bit shift (shrdl; shrl) in all cases.
> >
> > Yes, it does, unfortunately.
> >
> > I have no clue why AMD didn't implement native 64-bit RDTSC (and
> > RDMSR/WRMSR) in the first place. :-(
>
> I didn't notice before that it still goes to a register pair on
> amd64.
>
> >> rdtsc32() with a scaled tc_counter_mask should work OK
> >> (essentially the same as the non-low timecounter except for
> >> reduced accuracy; the only loss is a decrease in the time until
> >> counter overflow to the same as for the non-low timecounter).
> >
> > I thought about that but I didn't like that idea, i.e., losing
> > resolution and accuracy at the same time.
>
> But it doesn't lose any more resolution or accuracy than any shift
> necessarily uses.  It only loses wrap time, which is of no interest
> for a small reduction.  See another reply.
>
> The shift of 8 for SMP still seems far too much.  clock_gettime()
> with a TSC timecounter on an old 2GHz system takes about 250 nS.  I
> hope it takes only 1/2 that on a newer system.  nanouptime() in the
> kernel takes more like 30 nS on the old system.  It should at least
> try to have enough resolution for sequential calls to it to never
> return the same time (even ACPI-fast has this property -- about
> 1000 nS per call and a resolution of about 250 nS).  rdtsc on old
> Athlons takes only 12 (9?) cycles so you could almost use it to
> time individual instructions (modulo out of order execution).
> The invariant versions have to be much slower for synchronization
> :-(.  They take at least 42 cycles AFAIR.  A shift count of 5 would
> lose less resolution than an invariant TSC really has so it would be
> good if it is enough to hide the nontemporalness.  A shift count of
> 6 would be OK too.  But a shift count of 8 lets you execute about 4
> nanouptime()'s for every change in the time returned.  OTOH, 256
> cycles at 4 GHz is about 64 nS and clock_gettime() unfortunately
> takes longer (except on Linux? :-(), so a shift count of 8 is OK
> for it.
>
> My clock measurement program (mostly an old program by Wollman)
> shows the following histogram of times for a non-invariant TSC
> timecounter on a 2GHz UP system:
>
> % min 273, max 265102, mean 273.998217, std 79.069534
> % 1th: 273 (1727219 observations)
> % 2th: 274 (265607 observations)
> % 3th: 275 (6984 observations)
> % 4th: 280 (11 observations)
> % 5th: 290 (8 observations)
>
> The variance is small, and differences of a single nS can be seen
> clearly.  With the SMP shift of 8 on a 4GHz system, the minimum
> difference would be 64 nS so it would be impossible to see the
> details of the distribution about the mean of 273.998 nS.

Thanks for the info,

Jung-uk Kim
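
The resolution arithmetic in the reply above can be checked with a
short sketch.  The helper name tsc_low_resolution_ns() is hypothetical;
it assumes the shifted counter advances once every 2^shift TSC cycles
and rounds the result down to whole nanoseconds.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: nanoseconds between visible changes of a TSC
 * value that has been right-shifted by `shift`, rounded down.
 */
uint64_t
tsc_low_resolution_ns(uint64_t tsc_freq_hz, int shift)
{

	assert(tsc_freq_hz > 0 && shift >= 0 && shift < 32);
	/* 2^shift cycles per tick, converted to nanoseconds. */
	return ((UINT64_C(1000000000) << shift) / tsc_freq_hz);
}
```

With a shift of 8 this gives 64 nS at 4 GHz and 128 nS at 2 GHz; the
latter is roughly four 30 nS nanouptime() calls per visible tick, in
line with the estimate quoted above.  A shift of 5 at 4 GHz gives 8 nS.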