From owner-cvs-src@FreeBSD.ORG  Sat Oct 22 10:17:26 2005
Return-Path: <owner-cvs-src@FreeBSD.ORG>
X-Original-To: cvs-src@freebsd.org
Delivered-To: cvs-src@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id C144616A41F;
	Sat, 22 Oct 2005 10:17:26 +0000 (GMT) (envelope-from bde@zeta.org.au)
Received: from mailout1.pacific.net.au (mailout1.pacific.net.au [61.8.0.84])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 4D35243D6B;
	Sat, 22 Oct 2005 10:17:21 +0000 (GMT) (envelope-from bde@zeta.org.au)
Received: from mailproxy1.pacific.net.au (mailproxy1.pacific.net.au
	[61.8.0.86])
	by mailout1.pacific.net.au (8.13.4/8.13.4/Debian-3) with ESMTP id
	j9MAHKbd012534; Sat, 22 Oct 2005 20:17:20 +1000
Received: from katana.zip.com.au (katana.zip.com.au [61.8.7.246])
	by mailproxy1.pacific.net.au (8.13.4/8.13.4/Debian-3) with ESMTP id
	j9MAHGv0015850; Sat, 22 Oct 2005 20:17:17 +1000
Date: Sat, 22 Oct 2005 20:17:16 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@delplex.bde.org
To: Poul-Henning Kamp <phk@phk.freebsd.dk>
In-Reply-To: <31753.1129924404@critter.freebsd.dk>
Message-ID: <20051022193119.R8350@delplex.bde.org>
References: <31753.1129924404@critter.freebsd.dk>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: src-committers@freebsd.org, Andre Oppermann <andre@freebsd.org>,
	cvs-src@freebsd.org, cvs-all@freebsd.org,
	Marcel Moolenaar <marcel@xcllnt.net>,
	Andre Oppermann <oppermann@networx.ch>
Subject: Re: Timekeeping [Was: Re: cvs commit: src/usr.bin/vmstat vmstat.c
 src/usr.bin/w w.c] 
X-BeenThere: cvs-src@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: CVS commit messages for the src tree <cvs-src.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-src>,
	<mailto:cvs-src-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/cvs-src>
List-Post: <mailto:cvs-src@freebsd.org>
List-Help: <mailto:cvs-src-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-src>,
	<mailto:cvs-src-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 22 Oct 2005 10:17:27 -0000

On Fri, 21 Oct 2005, Poul-Henning Kamp wrote:

> In message <01DFB595-5279-4D3A-BEDA-5F0285E9519B@xcllnt.net>, Marcel Moolenaar
> writes:
>
>>> I think we need the definition to consider if (process- ?)state is
>>> retained while the system is unconcious or not.
>>
>> I'm not sure. I think that might be what makes the definition
>> complex.
>
> Actually I don't think it does, it simplifies it.

I agree.  Except for statistics programs, it is necessary to keep as much
history as practical; in particular, don't forget the original boot time,
and keep supporting averages since boot in vmstat and systat.

> If a process survives across the "unconcious" period, then it follows
> that CLOCK_MONOTONIC cannot be reset to zero in relation to the
> unconcious period.

What is survival?  Everything might be restarted virtually.

> But we are only just scratching the surface here, there are tons of
> ambiguities we need to resolve, for instance:
>
> 	select(...., {3m0s})
> 	suspend
> 	[ 2 minutes pass ]
> 	resume
>
> When does select time out ?
>
>    One minute after the resume ?
>
>    Three minutes after the resume ?
>
>    Right after the resume with a special errno ?

As close as possible to 3m0s after select() was called.
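
The candidate answers can be compared with a toy model.  This is only a
sketch in Python, not kernel code: the function name, its parameters and the
assumption that nothing can fire while the machine is suspended are all
hypothetical and chosen for illustration.

```python
def expiry_wall_time(timeout, suspend_start, suspend_len, counts_suspend):
    """Real time (seconds after the call) at which a timeout fires.

    counts_suspend=True models a clock that keeps advancing during suspend;
    counts_suspend=False models an uptime-style clock that stops.
    All names here are hypothetical, for illustration only.
    """
    suspend_end = suspend_start + suspend_len
    if counts_suspend:
        # The deadline is fixed in real time, but nothing fires while suspended.
        deadline = timeout
        if suspend_start <= deadline < suspend_end:
            return suspend_end  # fire as soon as possible after resume
        return deadline
    # Uptime-style clock: the suspend interval does not count.
    if timeout <= suspend_start:
        return timeout
    return timeout + suspend_len


# select(..., {3m0s}) called at t=0, suspend starts at once and lasts 2 minutes.
with_suspend = expiry_wall_time(180, 0, 120, counts_suspend=True)
without_suspend = expiry_wall_time(180, 0, 120, counts_suspend=False)
```

In this model a clock that counts suspend time fires 180 seconds after the
call (one minute after the resume), while an uptime-style clock fires 300
seconds after the call (three minutes after the resume).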

There are many longstanding bugs in this area.  I remember the following:
- the stillborn non-option APM_FIXUP_CALLTODO attempts to fix some of
   them, by reducing all timeouts by the suspend time.  (It was stillborn
   because it is for the pre-callwheel implementation of timeouts but was
   committed after callwheel timeouts, so it never compiled in any committed
   version.  The uselessness of APM_FIXUP_CALLTODO was hidden by not making
   it a normal option.)

   The problem of wrong timeouts after suspend is very old.  Not fixing it
   avoids thundering herds of timeout expiries after suspend.

- nanosleep(), select() and poll() use getnanouptime(), getmicrouptime() and
   getmicrouptime(), respectively, to not-so-carefully check that the timeout
   has expired after they wake up (the wakeup is sometimes early or late due
   to minor
   inaccuracies; when it is early, we detect that not-so-carefully and go
   back to sleep; when it is late, we can't recover so we should request
   the timeout to always be a little early so that we can be as close to
   on time as possible).  These syscalls should use non-get*() versions
   and non-*uptime() versions so that they actually know if the timeout
   expired.  Using *uptime() doesn't work because it doesn't count suspend
   time.  Using non-*uptime() doesn't quite work either, since the system's
   best idea of the real time may jump backwards.  A monotonic clock that
   jumps forwards by the suspend time is needed.

- realitexpire() has the same bug as nanosleep() and friends.  The very
   name of this function shows that it should not be using *uptime().
   According to setitimer(2), "ITIMER_REAL decrements in real time".
   Using get*() in it is more justified than in nanosleep() since it is
   lower level so its efficiency may be important.
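
The recheck-after-wakeup problem can be shown with a toy model.  This is a
sketch in Python under stated assumptions, not the kernel code: SimClock,
monotonic_with_suspend() and time_left() are hypothetical names, and the
"uptime" field merely stands in for what the *uptime() functions report.

```python
class SimClock:
    """Toy clock: 'uptime' stops during suspend, the other reading does not."""

    def __init__(self):
        self.uptime = 0      # stands in for *uptime(): excludes suspend time
        self.suspended = 0   # total seconds spent suspended

    def run(self, seconds):
        self.uptime += seconds

    def suspend(self, seconds):
        self.suspended += seconds

    def monotonic_with_suspend(self):
        # Hypothetical clock: never goes backwards, jumps forward by the
        # suspend time on resume.
        return self.uptime + self.suspended


def time_left(start, timeout, now):
    """Seconds still to sleep after a wakeup; <= 0 means the timeout expired."""
    return start + timeout - now


clock = SimClock()
start_uptime = clock.uptime
start_mono = clock.monotonic_with_suspend()

# A 180 second sleep: 30 s running, 120 s suspended, 30 s running.
clock.run(30)
clock.suspend(120)
clock.run(30)

left_uptime = time_left(start_uptime, 180, clock.uptime)
left_mono = time_left(start_mono, 180, clock.monotonic_with_suspend())
```

After 180 seconds of real time, the uptime-based check still sees 120
seconds left and would go back to sleep, while the check against the clock
that counts suspend time sees the timeout as expired.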

> Some code should obviously know about the suspend/resume event,
> dhclient, wep, wpa, bgpd, sshd, just to mention a few

Code like cron should get enough notification by having timeouts expire
as soon as possible after resume (if they would have expired during the
suspend interval if there was no suspend).  Such code can then check the
actual time on the correct clock like nanosleep() and friends to see if
a critical time has been reached.
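
A sketch of that pattern in Python, purely for illustration: due_jobs() and
the job schedule are hypothetical, and the real-time reading is passed in
as a plain number rather than taken from any particular clock.

```python
def due_jobs(schedule, now):
    """Split a schedule into jobs that are due and entries still pending.

    schedule is a list of (when, name) pairs in real time; now is the real
    time read after the timeout woke us up.  Hypothetical helper.
    """
    ordered = sorted(schedule)
    due = [name for when, name in ordered if when <= now]
    pending = [(when, name) for when, name in ordered if when > now]
    return due, pending


# Jobs scheduled for t=60, t=100 and t=300.  The machine was suspended from
# t=30 to t=150, so the timeout for the first job fires right after resume.
schedule = [(300, "c"), (60, "a"), (100, "b")]
due, pending = due_jobs(schedule, now=150)
```

Here one late wakeup at t=150 is enough: checking the real time shows that
both "a" and "b" are due, and "c" stays pending with its original time.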

Bruce