Date: Sat, 22 Oct 2005 00:23:03 +1000 (EST)
From: Bruce Evans
To: Poul-Henning Kamp
Cc: cvs-src@FreeBSD.org, src-committers@FreeBSD.org, Andre Oppermann,
    cvs-all@FreeBSD.org
Subject: Re: Timekeeping [Was: Re: cvs commit: src/usr.bin/vmstat vmstat.c
    src/usr.bin/w w.c]
Message-ID: <20051021230751.Q5110@delplex.bde.org>
In-Reply-To: <27345.1129842256@critter.freebsd.dk>
References: <27345.1129842256@critter.freebsd.dk>

On Thu, 20 Oct 2005, Poul-Henning Kamp wrote:

> ...
> If people do stupid things like use hard steps (*settime*()) to
> correct rate problems, then they get what they deserve, including
> potentially backwards jumps in time, but the integral over time of
> all steps apart from the first one amounts to a rate correction.

Using *settime*() isn't stupid.  It is always done by ntpdate -b and
sometimes done by ntpd.  (I use ntpd -x to prevent stepping, but -x
shouldn't be used except for debugging, since stepping is the best way
to correct large errors, and at least old versions of ntpd are broken
if they would prefer to step but are prevented from doing so by -x.)

> In summary: CLOCK_MONOTONIC is our best estimate of how many SI
> seconds the system have been runing [3].

Actual testing shows that CLOCK_MONOTONIC, or possibly CLOCK_REALTIME
less the boot time, gives a very bad estimate of how long the system
has been running.  The difference between these clocks was about 500
seconds on all systems tested:

% sledge:
%  1:03PM  up 22:45, 1 user, load averages: 0.23, 0.08, 0.02
% uptime 1 81900
% uptime 2 82887
%
% pluto1:
%  1:05PM  up 15 days, 10:18, 1 user, load averages: 1.28, 1.15, 1.26
% uptime 1 1333090
% uptime 2 1333540
%
% pluto2:
%  1:06PM  up 10 days,  7:19, 1 user, load averages: 1.95, 1.83, 1.80
% uptime 1 890323
% uptime 2 890721

These are FreeBSD machines.  "uptime 1" is from gettimeofday() less
boottime.  "uptime 2" is from CLOCK_MONOTONIC.  I don't know what root
has been doing to mess up the clocks on these machines.

% delplex:
% 11:00PM  up 31 days,  4:37, 2 users, load averages: 0.06, 0.02, 0.00
% uptime 1 2695028
% uptime 2 2695926
%
% epsplex:
% 11:00PM  up  3:34, 4 users, load averages: 0.00, 0.00, 0.00
% uptime 1 12856
% uptime 2 13390
%
% besplex:
% 11:01PM  up 26 days,  1:09, 1 user, load averages: 0.00, 0.00, 0.00
% uptime 1 2250584
% uptime 2 2251311

These are my local machines.
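
For anyone who wants to repeat the comparison, the two "uptime" figures above can be sketched roughly as follows.  This is only an illustration, not the program that produced the numbers: the function names are made up, and the boot time is taken as an input (on FreeBSD it would come from the kern.boottime sysctl).

```python
import time


def uptime_from_realtime(boot_time, now=None):
    """'uptime 1': wall-clock time less the recorded boot time, in seconds.

    boot_time is an assumed input (seconds since the epoch).
    """
    if now is None:
        now = time.time()  # the same wall clock that gettimeofday() reads
    return now - boot_time


def uptime_from_monotonic():
    """'uptime 2': CLOCK_MONOTONIC, which is never stepped."""
    return time.clock_gettime(time.CLOCK_MONOTONIC)


def clock_disagreement(boot_time, now=None, monotonic=None):
    """'uptime 2' minus 'uptime 1': how far the two estimates disagree."""
    if monotonic is None:
        monotonic = uptime_from_monotonic()
    return monotonic - uptime_from_realtime(boot_time, now)


# Replaying the epsplex figures (boot time taken as 0 so that "now" is
# simply 'uptime 1'):
print(clock_disagreement(0, now=12856, monotonic=13390))  # prints 534
```

If the two clocks tracked each other, the disagreement would stay near zero for the life of the system.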
Root did a lot of ntpdate -b's on delplex and besplex when they
rebooted after a power failure 26 days ago, but the steps were much
smaller than 500 seconds and there haven't been any since.  epsplex
has the ~500 second difference after not doing any steps except:

% Oct 21 19:26:59 epsplex kernel: tc_windup: large step 1129922814

Usual step from 0 to year 2005 on startup.

% Oct 21 19:26:59 epsplex kernel: tc_windup: negative step 36000

Usual step by adjkerntz to fix up hardware clock being on local time.
Doesn't affect deltas.

% Oct 21 19:27:01 epsplex kernel: tc_windup: negative step 2

By ntpdate to sync with delplex.

A large, fairly machine-independent difference is hard to explain.  I
will reboot after sending this to see if one of the values is much
larger than the uptime when the uptime is < 60 seconds.

> Given that CLOCK_MONOTONIC is our best guess how long the kernel
> has been running, it follows that CLOCK_REALTIME - CLOCK_MONOTONIC
> must be our best estimate of what time the kernel booted.

Not given, and not true.  After syncing with an accurate external clock
by a step, we know the real time very accurately.  Normally we sync
soon after booting.  Then we know the boot time very accurately (it is
the current real time less CLOCK_MONOTONIC).  Then if we resync with
the external clock later using a step, we again know the real time very
accurately, and our best guess at the uptime is the current real time
less the previously determined boot time (with a non-broken time_t or
difftime() restoring leap seconds).  CLOCK_MONOTONIC cannot track this
because it cannot jump.

You might say that the uptime cannot jump either.  This is OK, but then
it (like CLOCK_MONOTONIC) should be slewed to catch up with the jump.

Bruce
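
To put numbers on the boot-time argument above, here is a toy model.  It is a sketch under loud assumptions, none of which come from the measurements in this mail: the local oscillator runs a constant 100 ppm fast and is never rate-corrected, every step sets the real time exactly, and the epoch and function name are arbitrary.

```python
def uptime_estimates(true_uptime, drift_ppm=100.0, first_sync_uptime=60.0):
    """Return (monotonic_uptime, stepped_uptime) for a toy clock model.

    Assumptions (hypothetical, for illustration only):
      - the monotonic clock runs drift_ppm fast and never jumps;
      - each step sync sets the real time exactly right.
    """
    rate = 1.0 + drift_ppm * 1e-6
    true_boot = 1_000_000.0  # arbitrary epoch for the true boot time

    # First sync shortly after boot: real time is now exact, so
    # boot time = real time - CLOCK_MONOTONIC is very accurate.
    monotonic_at_sync = first_sync_uptime * rate
    realtime_at_sync = true_boot + first_sync_uptime
    boot_estimate = realtime_at_sync - monotonic_at_sync

    # Much later we step again.  Real time is exact once more, but the
    # monotonic clock has kept drifting and cannot jump to catch up.
    monotonic_now = true_uptime * rate
    realtime_now = true_boot + true_uptime
    return monotonic_now, realtime_now - boot_estimate


# About ten days of true uptime:
mono, stepped = uptime_estimates(864060.0)
print(mono - 864060.0)     # monotonic estimate is off by roughly 86 seconds
print(stepped - 864060.0)  # real time less boot time is off by about 0.006
```

In this model the only error in "real time less the previously determined boot time" is the drift accumulated before the first sync, while the unstepped clock's error keeps growing with uptime.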