From owner-freebsd-current@FreeBSD.ORG  Mon Jun 21 08:05:36 2004
Date: Mon, 21 Jun 2004 18:05:02 +1000 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@gamplex.bde.org
To: Julian Elischer <julian@elischer.org>
In-Reply-To: <Pine.BSF.4.21.0406210040150.69164-100000@InterJet.elischer.org>
Message-ID: <20040621174821.B979@gamplex.bde.org>
References: <Pine.BSF.4.21.0406210040150.69164-100000@InterJet.elischer.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: threads@freebsd.org
cc: Don Lewis <truckman@freebsd.org>
cc: rwatson@freebsd.org
cc: current@freebsd.org
Subject: Re: calcru: negative time ... followed by freeze

On Mon, 21 Jun 2004, Julian Elischer wrote:

> On Mon, 21 Jun 2004, Bruce Evans wrote:
>
> > Ah, here is a likely cause of the bug in -current:
> >
> > % 	if (p == curthread->td_proc) {
> > % 		/*
> > % 		 * Adjust for the current time slice.  This is actually fairly
> > % 		 * important since the error here is on the order of a time
> > % 		 * quantum, which is much greater than the sampling error.
> > % 		 * XXXKSE use a different test due to threads on other
> > % 		 * processors also being 'current'.
> > % 		 */
> > % 		binuptime(&bt);
> > % 		bintime_sub(&bt, PCPU_PTR(switchtime));
> > % 		bintime_add(&bt, &p->p_runtime);
> > % 	} else
> > % 		bt = p->p_runtime;
> >
> > The XXXKSE comment is correct that this might be broken.  If the (p
> > != curthread->td_proc) case happens at all for a running process, then
> > it gives a wrong (out of date) timestamp in bt.  This wrongness will
> > be detected if calcru() was called earlier in the current
> > timeslice and took the other path here.
>
> It should be fairly easy as there is now a thread state that indicates
> that it is actually running now.

It's not so easy [to fix] since the switchtime for threads running on other
CPUs is inaccessible (it is in the CPU's pcpu data).

The bug seems to be unrelated to KSE.  It is related to SMP.  RELENG_4 has
the bug too, and pre-KSE versions have a proc state that tells us when we
have a running process that can't be handled right.
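
To make the failure mode concrete, here is a rough userland model of the
branch quoted above.  It is only a sketch: struct model_proc and
model_total() are hypothetical names invented for illustration, plain
integers stand in for struct bintime, and nothing here is kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userland model; these are not the kernel's names. */
struct model_proc {
	uint64_t runtime;	/* time accumulated up to the last switch */
};

/*
 * Mimics the two paths of the quoted code.  Only the calling CPU's
 * switchtime is visible, so a process that is running on another CPU
 * takes the second path and gets no credit for its current time slice.
 */
static uint64_t
model_total(const struct model_proc *p, int is_curproc, uint64_t now,
    uint64_t my_switchtime)
{
	if (is_curproc) {
		assert(now >= my_switchtime);
		return (p->runtime + (now - my_switchtime));
	}
	return (p->runtime);	/* stale if p is running elsewhere */
}
```

With runtime = 1000, a call on the CPU running the process at now = 5400
with my_switchtime = 5000 yields 1400.  A later call from a different CPU,
made before any context switch has updated runtime, yields 1000, so the
total appears to have gone backwards by 400.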

I will turn off the check in the known broken case, and maybe change the
printf() to a log() since the error is not very important and syscons's
console output routine is suspect when called with sched_lock held.

Bruce