Date: Wed, 21 Jul 2004 23:01:20 +1000 (EST)
From: Bruce Evans
To: Brian Fundakowski Feldman
cc: current@freebsd.org
Subject: Re: nanosleep returning early
Message-ID: <20040721220405.Y2346@epsplex.bde.org>

On Wed, 21 Jul 2004, Brian Fundakowski Feldman wrote:

> On Wed, Jul 21, 2004 at 06:13:10PM +1000, John Birrell wrote:
> >
> > Today I increased HZ in a current kernel to 1000 when adding dummynet.
> > Now I find that nanosleep regularly comes back a little early.
> > Can anyone explain why?

The most obvious bug is that nanosleep() uses the low-accuracy interface
getnanouptime().  I can't see why the problem is more obvious with large
HZ or why it affects short sleeps.
From kern_time.c 1.170:

% static int
% nanosleep1(struct thread *td, struct timespec *rqt, struct timespec *rmt)
% {
% 	struct timespec ts, ts2, ts3;
% 	struct timeval tv;
% 	int error;
%
% 	if (rqt->tv_nsec < 0 || rqt->tv_nsec >= 1000000000)
% 		return (EINVAL);
% 	if (rqt->tv_sec < 0 || (rqt->tv_sec == 0 && rqt->tv_nsec == 0))
% 		return (0);
% 	getnanouptime(&ts);

This may lag the actual (up)time by 1/HZ seconds.

% 	timespecadd(&ts, rqt);

So we get a final time that may be 1/HZ seconds too small.

% 	TIMESPEC_TO_TIMEVAL(&tv, rqt);

Rounding to microseconds doesn't make much difference since things take
on the order of 1 us.

% 	for (;;) {
% 		error = tsleep(&nanowait, PWAIT | PCATCH, "nanslp",
% 		    tvtohz(&tv));

We only converted to a timeval so that we could use tvtohz() here.
tvtohz() rounds up to the tick boundary after the next one to allow for
the 1/HZ resolution of tsleep().  This should also mask the inaccuracy
of getnanouptime() unless the sleep returns early due to a signal -- in
the usual case of not very long sleeps that are not killed by a signal,
the tsleep() guarantees sleeping long enough and sleeps (1/2*1/HZ) extra
on average.

% 		getnanouptime(&ts2);
% 		if (error != EWOULDBLOCK) {
% 			if (error == ERESTART)
% 				error = EINTR;
% 			if (rmt != NULL) {
% 				timespecsub(&ts, &ts2);
% 				if (ts.tv_sec < 0)
% 					timespecclear(&ts);
% 				*rmt = ts;

This handles the case of being killed by a signal.  Then we always return
early, and the bug is just that the returned time-not-slept is inaccurate.

% 			}
% 			return (error);
% 		}
% 		if (timespeccmp(&ts2, &ts, >=))
% 			return (0);

This handles the case where the timeout expires.  We check that the
specified sleep time has expired, not just that some number of ticks
expired, since the latter may be too short for long sleeps even after
rounding it up.

% 		ts3 = ts;
% 		timespecsub(&ts3, &ts2);
% 		TIMESPEC_TO_TIMEVAL(&tv, &ts3);

Errors may accumulate (or cancel?) for the next iteration.

% 	}
% }

> > I would have expected that the *overrun* beyond the required time to vary,
> > but never that it would come back early.
>
> Is this a difference from clock_gettime(CLOCK_MONOTONIC)?  You really
> shouldn't be using gettimeofday() for internal timing since the
> system clock can be adjusted by NTP.

The monotonic clock can also be adjusted by NTP, and normally is if there
are any NTP adjustments at all (the uptime and the time use the same
timecounter which is adjusted by NTP).  NTP's adjustments are only limited
to CLOCK_REALTIME when NTP steps the clock for initialization.  Stepping
the clock causes other time warps and should never be used.

Bruce