Message-ID: <43553162.5040802@root.org>
Date: Tue, 18 Oct 2005 10:31:14 -0700
From: Nate Lawson
To: Scott Long
Cc: cvs-src@FreeBSD.org, src-committers@FreeBSD.org, Andrew Gallatin,
    cvs-all@FreeBSD.org, David Xu
Subject: Re: cvs commit: src/sys/amd64/amd64 cpu_switch.S machdep.c
References: <200510172310.j9HNAVPL013057@repoman.freebsd.org>
    <20051018094402.A29138@grasshopper.cs.duke.edu>
    <435501B9.4070401@samsco.org>
In-Reply-To: <435501B9.4070401@samsco.org>
List-Id: CVS commit messages for the src tree

Scott Long wrote:
> Andrew Gallatin wrote:
>> Nice.  This reduces lmbench context switch latency by about 0.4us (7.2
>> -> 6.8us), and reduces TCP loopback latency by about 0.9us (36.1 ->
>> 35.2) on my dual core 3800+
>>
>> It is a shame we can't find a way to use the TSC as a timecounter on
>> SMP systems.
>> It seems that about 40% of the context switch time is
>> spent just waiting for the PIO read of the ACPI-fast or i8254 to
>> return.
>
> The TSC represents the clock rate of the CPU, and thus can vary wildly
> when thermal and power management controls kick in, and there is no way
> to know when it changes.  Because of this, I think that it's
> practically useless on Pentium-Mobile and Pentium-M chips, among many
> others.

This is a myth.  It is not so dismal as you portray and cpufreq(4) gives
both the kernel and userland a way of getting the necessary info in an
MI way (including notification of clock rate changes) and control it
when possible.  There are a number of mechanisms actually in the world
today:

* SMM-based clock switching: most laptops have SMM code (i.e. BIOS)
  that checks the power line status on boot and sets the base clock
  rate.  They use the standard platform mechanism (i.e. enh speedstep,
  speedstep-ich) to set the frequency and cpufreq(4) allows the user or
  kernel to freely override it at runtime.  All that is left to do is
  for timecounters to export a "re-calibrate" option that works at
  runtime and for cpufreq(4) to call it when the frequency is changed by
  the kernel/usermode.  bde@ supplied some code I hope to import soon
  once I have it well tested that implements such a runtime calibration,
  although it is just used internally by cpufreq(4), not hooked into
  timecounters at the moment.  Note that no BIOS I know of actually
  changes the value after boot, so TSC is reliable unless we change it
  ourselves.

* p4tcc: thermal control circuit.  Version 1 does x/8 throttling of the
  CPU by an internal stop clock cycle, where "x" is an integer.
  Version 2 also can step the clock rate via enh speedstep.  There are
  two parts to this, the platform (BIOS) setting and "on demand"
  (kernel) setting.  The OS can use the on demand setting via cpufreq(4)
  to save power or for passive cooling.
  We initiate this ourselves, so once the timecounter interface can
  accept an updated calibration, there is no issue here.

  The platform setting is worse in that we don't know when it kicks in.
  However, it is intended as an emergency measure like if a fan dies.
  All known BIOSen set this value just below the thermal shutdown
  circuit (i.e. the processor stops operation completely).  As such,
  this is an edge case that we do not have to handle particularly
  efficiently.  It suffices to periodically check the calibration of TSC
  (perhaps every 10 seconds?) via the ACPI timer and update our settings
  if it has changed.  Since cpufreq(4) knows all the possible settings,
  it suffices to just measure the clock rate and compare it to a table
  of valid settings.  There is no ambiguity (yet) since every CPU
  control mechanism has discrete settings.

> There is also the issue of multiple CPUs having to keep their
> TSC's somewhat in sync in order to get consistent counting in the
> system.  The best that you can do is to periodically read a stable
> counter and try to recalibrate, but then you'll likely start getting
> wild operational variances.

> It's a shame that a PIO read is still so
> expensive.  I'd hate to see just how bad your benchmark becomes when
> ACPI-slow is used instead of ACPI-fast.

ACPI-slow should not be used at all.  If the acpi timer is unreliable,
use a different one.  Also, I think most systems that had unreliable
acpi timers were older and not likely to have variable CPU clocks.  So
I'd prefer TSC on such systems anyway.

> I wonder if moving to HZ=1000 on amd64 and i386 was really all that good
> of an idea.  Having preemption in the kernel means that ithreads can run
> right away instead of having to wait for a tick, and various fixes to
> 4BSD in the past year have eliminated bugs that would make the CPU wait
> for up to a tick to schedule a thread.  So all we're getting now is a
> 10x increase in scheduler overhead, including reading the timecounters.

I use hz=100 on my systems due to the 1 kHz noise from C3 sleep.
Windows has the same problem.

-- 
Nate
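
The periodic check proposed above (measure the TSC rate against the ACPI timer, then compare the result to the table of valid cpufreq(4) settings) might be sketched roughly as follows. This is a userland Python illustration under stated assumptions, not kernel code and not the bde@ calibration code mentioned in the message. The function names, the 5% tolerance, and the sample frequency table in the usage note are all hypothetical. The only external fact used is the ACPI PM timer's nominal rate of 3.579545 MHz. The tick deltas are assumed to be taken over the same interval and already corrected for counter wrap.

```python
# Hypothetical sketch: snap a measured TSC rate to a table of discrete
# CPU frequency settings. All names and the tolerance are assumptions.

ACPI_PM_TIMER_HZ = 3_579_545  # nominal ACPI PM timer frequency


def measure_tsc_hz(tsc_delta, acpi_delta, acpi_hz=ACPI_PM_TIMER_HZ):
    """Estimate the TSC rate from tick deltas over the same interval."""
    if acpi_delta <= 0:
        raise ValueError("acpi_delta must be positive")
    return tsc_delta * acpi_hz / acpi_delta


def snap_to_setting(measured_hz, valid_settings_hz, tolerance=0.05):
    """Return the closest valid setting, or None if none is close enough."""
    best = min(valid_settings_hz, key=lambda hz: abs(hz - measured_hz))
    if abs(best - measured_hz) > tolerance * best:
        return None  # measurement does not match any known setting
    return best


def check_calibration(current_hz, tsc_delta, acpi_delta, valid_settings_hz):
    """Return the frequency the timecounter should use after this check."""
    measured = measure_tsc_hz(tsc_delta, acpi_delta)
    snapped = snap_to_setting(measured, valid_settings_hz)
    if snapped is None:
        return current_hz  # ambiguous reading: keep the old calibration
    return snapped
```

For example, with a hypothetical table of 600, 800, 1000 and 1600 MHz, a reading of roughly 800.5 MHz snaps to the 800 MHz entry, while a reading of 700 MHz matches nothing within tolerance and leaves the current calibration alone. Because the settings are discrete, the measurement only needs to be accurate enough to tell neighbouring entries apart.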