From: "Poul-Henning Kamp"
To: Scott Long
Cc: cvs-src@FreeBSD.org, src-committers@FreeBSD.org, Andrew Gallatin, cvs-all@FreeBSD.org, David Xu
Date: Tue, 18 Oct 2005 17:31:31 +0200
Subject: Re: cvs commit: src/sys/amd64/amd64 cpu_switch.S machdep.c
In-Reply-To: Your message of "Tue, 18 Oct 2005 08:34:52 MDT." <4355080C.302@samsco.org>
Message-ID: <69026.1129649491@critter.freebsd.dk>

In message <4355080C.302@samsco.org>, Scott Long writes:

[At the risk of repeating myself once more...]

>Steering multiple TSC's together isn't that hard and there are plenty of
>examples, as you point out.  Accounting for the changes due to thermal
>and power management (note that this isn't the same problem as suspend
>and resume) is what worries me.

It all depends on what you mean by "hard" and what benefit you expect to gain.

One of the things you have to realize is that once you go down this road, you need a lot of code for all the conditionals.
For instance, you need to make sure that every new timestamp you hand out is not earlier than any one handed out before, no matter what is happening to the clocks.  Imagine one CPU throttling because of heat: that CPU will be handing out timestamps in the past until its TSC slowdown has been corrected, while the other CPU in the system churns on at full speed.

To solve this, you need to pessimize every timestamp with an inter-CPU lock, compare it against the previous timestamp and, if it is less, do the Lamport trick and return "previous timestamp + epsilon".

Then there is the question of how you adapt: a stepwise adaptation is hard to get right without overshoot, and stability is far from a given.

Dave Mills implemented a scheme on Alpha with per-CPU PLLs which were clocked by a common interrupt from the RTC.  The results were interesting, but hardly revolutionary, and performance-wise it sucked.

So, yes, it may not be "hard" in the "write an OS from scratch" sense of "hard", but it is certainly far from trivial, comes with a heavy penalty in complexity, and has a notable shortage of successful prior art.

One of the things we pride ourselves on in FreeBSD is stability, and the current code (finally!) provides that: it has been a long time since we last had timecounter issues with broken hardware.

But if people are certain their TSCs are good and sound, they can override the default safe selection of ACPI with a sysctl (kern.timecounter.hardware), and in doing so they take a calculated risk.

That, IMO, is the correct "FreeBSD way" to handle this: "Safe out of the box.  Informed tweaking may be profitable."

I would hate to go to the other side, where the fraction of users who happen to use hardware with problems in this space have to disable something to get stable operation or to avoid unexplained, undesirable transient phenomena.

>> It seems like reading ACPI-fast is "only" 3us or so, but when the ctx
>> switch is otherwise 4us, it adds up.
>> i8254 is much worse on this
>> system (6.5us).

i8254 is always bad, and about as bad as it can get.  That is mostly because of the need to disable interrupts (actually, that's a critical section today, isn't it?), and also because it is hobbled by the three 8-bit ISA-bus(-like) accesses needed.

>> > I wonder if moving to HZ=1000 on amd64 and i386 was really all that good
>> > of an idea.

The main benefit was getting more precise timeouts, something we have at various times thought about implementing with deadline counters on platforms that have them.  Nobody has done it, though.

So, instead of looking for "quick fixes", let's look at this from a designer's or architect's point of view:

On a busy system the scheduler runs a hundred thousand times per second, but on most systems nobody ever looks at the times(2) data.

The smart solution is therefore to postpone the heavy stuff into times(2) and make the scheduler work as fast as it can.

So the scheduler should read the TSC[1] and schedule in TSC ticks.

times(2) will then have to convert this to clock_t-compatible numbers.

According to The Open Group, clock_t is in microseconds as a result of historical standards mistakes.  However, I can see nowhere that would collide with an interpretation that said "clock_t is microseconds PROVIDED the CPU had run at full speed", so a simple once-per-second routine that latches the highest number of TSC ticks we've seen in a second would be sufficient to generate the conversion factor.

And in many ways this would be a much more useful metric to offer (in top(1)) than the current rubber-band CPU seconds.

Poul-Henning

[1] A problem with this plan, of course, is that some CPUs don't have TSCs, but a fallback mechanism could use whatever timecounter is active in place of the TSC.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.