From owner-svn-src-all@FreeBSD.ORG Wed Jul 25 14:44:07 2012 Return-Path: Delivered-To: svn-src-all@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 3A5BC106566C; Wed, 25 Jul 2012 14:44:07 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail01.syd.optusnet.com.au (mail01.syd.optusnet.com.au [211.29.132.182]) by mx1.freebsd.org (Postfix) with ESMTP id BF88A8FC0C; Wed, 25 Jul 2012 14:44:06 +0000 (UTC) Received: from c122-106-171-246.carlnfd1.nsw.optusnet.com.au (c122-106-171-246.carlnfd1.nsw.optusnet.com.au [122.106.171.246]) by mail01.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q6PEi4Jp023193 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 26 Jul 2012 00:44:05 +1000 Date: Thu, 26 Jul 2012 00:44:04 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Andriy Gapon In-Reply-To: <500FE6AE.8070706@FreeBSD.org> Message-ID: <20120726001659.M5406@besplex.bde.org> References: <201207242210.q6OMACqV079603@svn.freebsd.org> <500F9E22.4080608@FreeBSD.org> <20120725102130.GH2676@deviant.kiev.zoral.com.ua> <500FE6AE.8070706@FreeBSD.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Konstantin Belousov , Jim Harris , svn-src-all@FreeBSD.org, src-committers@FreeBSD.org, svn-src-head@FreeBSD.org Subject: Re: svn commit: r238755 - head/sys/x86/x86 X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 Jul 2012 14:44:07 -0000 On Wed, 25 Jul 2012, Andriy Gapon wrote: > on 25/07/2012 13:21 Konstantin Belousov said the following: >> ... >> diff --git a/sys/x86/x86/tsc.c b/sys/x86/x86/tsc.c >> index 085c339..229b351 100644 >> --- a/sys/x86/x86/tsc.c >> +++ b/sys/x86/x86/tsc.c >> @@ -594,6 +594,7 @@ static u_int >> tsc_get_timecount(struct timecounter *tc __unused) >> { >> >> + rmb(); >> return (rdtsc32()); >> } > > This makes sense to me. We probably want correctness over performance here. > [BTW, I originally thought that the change was here; brain malfunction] And I liked the original change because it wasn't here :-). >> @@ -602,8 +603,9 @@ tsc_get_timecount_low(struct timecounter *tc) >> { >> uint32_t rv; >> >> + rmb(); >> __asm __volatile("rdtsc; shrd %%cl, %%edx, %0" >> - : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx"); >> + : "=a" (rv) : "c" ((int)(intptr_t)tc->tc_priv) : "edx"); >> return (rv); >> } >> > > It would correct here too, but not sure if it would make any difference given that > some lower bits are discarded anyway. Probably depends on exact CPU. It is needed to pessimize this too. :-) As I have complained before, the loss of resolution from the shift is easy to see by reading the time from userland, even with syscall overhead taking 10-20 times longer than the read. On core2 with TSC-low, a clock- checking utility gives: % min 481, max 12031, mean 530.589452, std 51.633626 % 1th: 550 (1296487 observations) % 2th: 481 (448425 observations) % 3th: 482 (142650 observations) % 4th: 549 (61945 observations) % 5th: 551 (47619 observations) The numbers are diffences in nanoseconds measured by clock_gettime(). The jump from 481 to 549 is 68. From this I can tell that the clock frequency is 1.86 Ghz and the shift is 128, or the clock frequency is 3.72 Ghz and the shift is 256. On AthlonXP with TSC: % min 273, max 29075, mean 274.412811, std 80.425963 % 1th: 273 (853962 observations) % 2th: 274 (745606 observations) % 3th: 275 (400212 observations) % 4th: 276 (20 observations) % 5th: 280 (10 observations) Now the numbers cluster about the mean. Although syscalls take much longer than the loss of resolution with TSC-low, and even the core2 TSC takes almost as long to read as the loss, it is still possible to see things happening at the limits of the resolution (~0.5 nsec). > And, oh hmm, I read AMD Software Optimization Guide for AMD Family 10h Processors > and they suggest using cpuid (with a note that it may be intercepted in > virtualized environments) or _mfence_ in the discussed role (Appendix F of the > document). > Googling for 'rdtsc mfence lfence' yields some interesting results. The second hit was for the shrd pessimization/loss of resolution and a memory access hack in lkml in 2011. I now seem to remember jkim mentioning the memory access hack. rmb() on i386 has a related memory access hack, but now with a lock prefix that defeats the point of the 2011 hack (it wanted to save 5 nsec by removing fences). rmb() on amd64 uses lfence. Some of the other hits are a bit old. The 8th one was by me in the thread about kib@ implementing gettimeofday() in userland. Bruce