From owner-freebsd-current@FreeBSD.ORG  Mon Jun 21 05:01:42 2004
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id A518116A4CE; Mon, 21 Jun 2004 05:01:42 +0000 (GMT)
Received: from mailout2.pacific.net.au (mailout2.pacific.net.au [61.8.0.85])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 2962C43D58; Mon, 21 Jun 2004 05:01:42 +0000 (GMT)
	(envelope-from bde@zeta.org.au)
Received: from mailproxy2.pacific.net.au (mailproxy2.pacific.net.au
	[61.8.0.87])i5L51e5v008152;	Mon, 21 Jun 2004 15:01:41 +1000
Received: from gamplex.bde.org (katana.zip.com.au [61.8.7.246])
	i5L51cnl031991;	Mon, 21 Jun 2004 15:01:39 +1000
Date: Mon, 21 Jun 2004 15:01:38 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@gamplex.bde.org
To: Don Lewis <truckman@freebsd.org>
In-Reply-To: <200406210258.i5L2wTKF047144@gw.catspoiler.org>
Message-ID: <20040621132119.Q8596@gamplex.bde.org>
References: <200406210258.i5L2wTKF047144@gw.catspoiler.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: threads@freebsd.org
cc: rwatson@freebsd.org
cc: current@freebsd.org
Subject: Re: calcru: negative time ... followed by freeze
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
	<mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 21 Jun 2004 05:01:42 -0000

On Sun, 20 Jun 2004, Don Lewis wrote:

> On 20 Jun, Robert Watson wrote:
> > FYI, this is a Xeon box with two physical processors, each with two
> > logical processors, and the problem could well be SMP-related.
>
> The hang might be SMP-related, but I just started getting the calcru
> messages on my UP Athlon XP box.  I just upgraded to today's
> -CURRENT from a week-old version, and my console got spammed by a bunch
> of these messages while I was running portupgrade.
>
> calcru: negative time of 0 usec for pid 21261 (sh)
> calcru: negative time of 0 usec for pid 21261 (sh)
> ...
> calcru: negative time of 3917 usec for pid 49504 (sh)
> calcru: negative time of 3917 usec for pid 49504 (sh)

Hmm, these are nonnegative times.

The extra messages in -current might be caused by fill_kinfo() now calling
calcru().

Misreporting of negative times is by the following too-simple dianostic:

% 	tu = (u_int64_t)tv.tv_sec * 1000000 + tv.tv_usec;
% 	ptu = p->p_uu + p->p_su + p->p_iu;
% 	if (tu < ptu || (int64_t)tu < 0) {
% 		printf("calcru: negative time of %jd usec for pid %d (%s)\n",
% 		    (intmax_t)tu, p->p_pid, p->p_comm);
% 		tu = ptu;
% 	}

The message is also printed for the tu < ptu case, which is what you are
getting.

I fixed the messages when I got a lot of them due to a local bug.  The local
bug was from double rounding for calcru() on child times (which -current
doesn't do).  I can't see how this could be the problem in -current, since
the components of ptu are rounded down and there is a KASSERT that they
added up to less than tu in the previous call.

Ah, here is a likely cause of the bug in -current:

% 	if (p == curthread->td_proc) {
% 		/*
% 		 * Adjust for the current time slice.  This is actually fairly
% 		 * important since the error here is on the order of a time
% 		 * quantum, which is much greater than the sampling error.
% 		 * XXXKSE use a different test due to threads on other
% 		 * processors also being 'current'.
% 		 */
% 		binuptime(&bt);
% 		bintime_sub(&bt, PCPU_PTR(switchtime));
% 		bintime_add(&bt, &p->p_runtime);
% 	} else
% 		bt = p->p_runtime;

The XXXKSE comment is correct that this might be broken.  If the (p
!= curthread->td_proc) case happens at all for a running process, then
it gives a wrong (out of date) timestamp in bt.  This wrongness will
be detected if calcru() is was called called earlier in the current
timeslice and took the other path here.

The recent change to fill_kinfo() is quite likely to trigger detection
of this bug.  fill_kinfo() is often used to iterate over all processes
for ps, so it will call calcru() with (p != curthread->td_proc) for
all processes other than the one running it, and give a bt that is out
of date for all such processes that are actually running.  Since there
can be at most one running process per CPU, this bug only affects SMP.

The call to calcru() from ttyinfo() may be the only other trigger.
ttyinfo() picks a process and should rarely or never pick the ithread
running it, so it will almost always take the (p != curthread->td_proc)
path.  Again, this is only a problem for the SMP case since in the !SMP
case the picked process must have been switched away from to run the
ithread, so it cannot be running.

Bruce