From owner-cvs-src@FreeBSD.ORG Tue Oct 18 05:04:57 2005 Return-Path: X-Original-To: cvs-src@FreeBSD.org Delivered-To: cvs-src@FreeBSD.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6CF7116A41F; Tue, 18 Oct 2005 05:04:57 +0000 (GMT) (envelope-from bde@zeta.org.au) Received: from mailout1.pacific.net.au (mailout1.pacific.net.au [61.8.0.84]) by mx1.FreeBSD.org (Postfix) with ESMTP id B5A8843D46; Tue, 18 Oct 2005 05:04:56 +0000 (GMT) (envelope-from bde@zeta.org.au) Received: from mailproxy2.pacific.net.au (mailproxy2.pacific.net.au [61.8.0.87]) by mailout1.pacific.net.au (8.13.4/8.13.4/Debian-3) with ESMTP id j9I54tOb014186; Tue, 18 Oct 2005 15:04:55 +1000 Received: from katana.zip.com.au (katana.zip.com.au [61.8.7.246]) by mailproxy2.pacific.net.au (8.13.4/8.13.4/Debian-3) with ESMTP id j9I54qQ7024637; Tue, 18 Oct 2005 15:04:53 +1000 Date: Tue, 18 Oct 2005 15:04:54 +1000 (EST) From: Bruce Evans X-X-Sender: bde@delplex.bde.org To: Andre Oppermann In-Reply-To: <200510171537.j9HFbMdW073713@repoman.freebsd.org> Message-ID: <20051018135821.L93164@delplex.bde.org> References: <200510171537.j9HFbMdW073713@repoman.freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: cvs-src@FreeBSD.org, src-committers@FreeBSD.org, cvs-all@FreeBSD.org Subject: Re: cvs commit: src/usr.bin/vmstat vmstat.c src/usr.bin/w w.c X-BeenThere: cvs-src@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: CVS commit messages for the src tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Oct 2005 05:04:57 -0000 On Mon, 17 Oct 2005, Andre Oppermann wrote: > andre 2005-10-17 15:37:22 UTC > > FreeBSD src repository > > Modified files: > usr.bin/vmstat vmstat.c > usr.bin/w w.c > Log: > Obtain true uptime through clock_gettime(CLOCK_MONOTONIC, struct *timespec) > instead of subtracting 'bootime' from 'now'. This is bogus, and it breaks vmstat some more in the dead kernel case. In the live kernel case, clock_gettime() returns the time since an unspecified point in the past. It is still necessary to subtract the boottime, one measured by the same clock, especially under systems like FreeBSD where the "unspecified point in the past" is undocumented. In the dead kernel case, clock_gettime() cannot be used, and using it gives a wrong uptime (that of the running system, not that of the dead kernel). The old code is wrong too. It subtracts the boot time from the current time, where the current time is for the live kernel and the boot time is for the live or dead kernel. This gives an even wronger uptime for the dead kernel case, since a live kernel may be run longer after the boot time of a dead kernel than after its own boot time. I don't know of any good way to determine the uptime of a dead kernel now. I only know of a wrong way: kern_shutdown.c has a low quality function print_uptime() which does user-interface things that don't belong in the kernel. It prints the uptime to the console. To ensure its low quality, it does this after the dump has completed, so that its output doesn't get printed in the dump via the message buffer. Thus its output can only be seen if you are watching the shutdown in some way, and is soon forgotten unless your watcher has logging features. The death time for a dead kernel should be saved in a variable near its boottime variable so that utilities like vmstat can determine it easily. For live kernels, subtracting the boot time from the current _real_ time using difftime() is the correct method. Both times are relative to the same clock, so their difference gives the elasped time unless there are bugs in this clock. The clock should be the real time clock and not the monotonic clock, since uptimes are in real time. In practice there should only be a difference of a leap second or two and no one would be able to see it. However, there are bugs. For the real time clock, the boot time at least used to be bogusly adjusted to compensate for adjustements to the real time by settime(), mainly so that subtracting the boot time from the current time gives a value near the uptime. This is correct if the adjustment is to compensate for an initial error in the real time as is normal if the system clock is on local time (here local time is 36000 seconds in advance of UTC, so both the real time and the boot time start up 36000 too high; then the settime() in adjkerntz sets them both back 36000; if the boottime were not adjusted then the uptime shown by vmstat would be off by 36000 seconds). This is wrong if the adjustment is to compensate for clock drift. Then the uptime really should change to adjust for the error in the real time, but instead the boot time is messed up to keep the uptime the same. Using the monotonic clock for the uptime gives essentially the same bug -- in general, the real time clock and the monotonic clock will drift at different rates, and since only the real time clock can be synced with an external clock, the inaccuracy can only be bounded for the real time clock. There are also some style bugs in vmstat.c in the changes and old ones nearby: % Index: vmstat.c % =================================================================== % RCS file: /home/ncvs/src/usr.bin/vmstat/vmstat.c,v % retrieving revision 1.90 % retrieving revision 1.91 % diff -u -2 -r1.90 -r1.91 % --- vmstat.c 6 Aug 2005 13:56:21 -0000 1.90 % +++ vmstat.c 17 Oct 2005 15:37:22 -0000 1.91 % @@ -397,24 +397,12 @@ % getuptime(void) % { % - static struct timeval boottime; % - static time_t now; % + struct timespec sp; % time_t uptime; % % - if (boottime.tv_sec == 0) { % - if (kd != NULL) { % - kread(X_BOOTTIME, &boottime, sizeof(boottime)); These leaves dead code in the nlist. % - } else { % - size_t size; % - % - size = sizeof(boottime); % - mysysctl("kern.boottime", &boottime, &size, NULL, 0); % - if (size != sizeof(boottime)) % - errx(1, "kern.boottime size mismatch"); There was error checking for the sysctl. % - } % - } % - (void)time(&now); % - uptime = now - boottime.tv_sec; % + (void)clock_gettime(CLOCK_MONOTONIC, &sp); There was/is no error checking for the time calls. % + uptime = sp.tv_sec; % if (uptime <= 0 || uptime > 60*60*24*365*10) % errx(1, "time makes no sense; namelist must be wrong"); This check is now bogus. Dead kernels are no longer used to determine the uptime; in particular the namelist is not used. The only possibility of an error is clock_gettime() but errors from it were voided. % + Extra blank line. % return(uptime); No space before return expression. % } Bruce