From owner-freebsd-current@FreeBSD.ORG  Wed Jul 21 13:01:25 2004
Date: Wed, 21 Jul 2004 23:01:20 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
To: Brian Fundakowski Feldman <green@freebsd.org>
In-Reply-To: <20040721102620.GF1009@green.homeunix.org>
Message-ID: <20040721220405.Y2346@epsplex.bde.org>
References: <20040721081310.GJ22160@freebsd3.cimlogic.com.au>
 <20040721102620.GF1009@green.homeunix.org>
cc: current@freebsd.org
Subject: Re: nanosleep returning early

On Wed, 21 Jul 2004, Brian Fundakowski Feldman wrote:

> On Wed, Jul 21, 2004 at 06:13:10PM +1000, John Birrell wrote:
> >
> > Today I increased HZ in a current kernel to 1000 when adding dummynet.
> > Now I find that nanosleep regularly comes back a little early.
> > Can anyone explain why?

The most obvious bug is that nanosleep() uses the low-accuracy interface
getnanouptime().  I can't see why the problem is more obvious with
large HZ or why it affects short sleeps.  From kern_time.c 1.170:

% static int
% nanosleep1(struct thread *td, struct timespec *rqt, struct timespec *rmt)
% {
% 	struct timespec ts, ts2, ts3;
% 	struct timeval tv;
% 	int error;
%
% 	if (rqt->tv_nsec < 0 || rqt->tv_nsec >= 1000000000)
% 		return (EINVAL);
% 	if (rqt->tv_sec < 0 || (rqt->tv_sec == 0 && rqt->tv_nsec == 0))
% 		return (0);
% 	getnanouptime(&ts);

This may lag the actual (up)time by up to 1/HZ seconds.

% 	timespecadd(&ts, rqt);

So the computed deadline may be up to 1/HZ seconds too early.

% 	TIMESPEC_TO_TIMEVAL(&tv, rqt);

Rounding (actually truncating) to microseconds doesn't make much
difference, since things take on the order of 1 us anyway.

% 	for (;;) {
% 		error = tsleep(&nanowait, PWAIT | PCATCH, "nanslp",
% 		    tvtohz(&tv));

We only converted to a timeval so that we could use tvtohz() here.
tvtohz() rounds up to the tick boundary after the next one to allow
for the 1/HZ resolution of tsleep().  This should also mask the
inaccuracy of getnanouptime() unless the sleep returns early due
to a signal.  In the usual case of sleeps that are not very long and
are not interrupted by a signal, tsleep() guarantees sleeping long
enough and sleeps an extra 1/(2*HZ) seconds on average.

% 		getnanouptime(&ts2);
% 		if (error != EWOULDBLOCK) {
% 			if (error == ERESTART)
% 				error = EINTR;
% 			if (rmt != NULL) {
% 				timespecsub(&ts, &ts2);
% 				if (ts.tv_sec < 0)
% 					timespecclear(&ts);
% 				*rmt = ts;

This handles the case of being interrupted by a signal.  Then we always
return early, and the bug is just that the returned time-not-slept is
inaccurate.

% 			}
% 			return (error);
% 		}
% 		if (timespeccmp(&ts2, &ts, >=))
% 			return (0);

This handles the case where the timeout expires.  We check that the
specified sleep time has expired, not just that some number of ticks
expired, since the latter may be too short for long sleeps even after
rounding it up.

% 		ts3 = ts;
% 		timespecsub(&ts3, &ts2);
% 		TIMESPEC_TO_TIMEVAL(&tv, &ts3);

Errors may accumulate (or cancel?) for the next iteration.

% 	}
% }

> > I would have expected the *overrun* beyond the required time to vary,
> > but never that it would come back early.
>
> Is this a difference from clock_gettime(CLOCK_MONOTONIC)?  You really
> shouldn't be using gettimeofday() for internal timing since the
> system clock can be adjusted by NTP.

The monotonic clock can also be adjusted by NTP, and normally is if there
are any NTP adjustments at all (the uptime and the time use the same
timecounter, which NTP adjusts).  NTP's adjustments are limited to
CLOCK_REALTIME only when NTP steps the clock for initialization.  Stepping
the clock causes other time warps and should never be used.

Bruce