Date: Mon, 21 Jun 2004 18:05:02 +1000 (EST)
From: Bruce Evans
To: Julian Elischer
Cc: threads@freebsd.org, Don Lewis, rwatson@freebsd.org, current@freebsd.org
Subject: Re: calcru: negative time ... followed by freeze
Message-ID: <20040621174821.B979@gamplex.bde.org>

On Mon, 21 Jun 2004, Julian Elischer wrote:

> On Mon, 21 Jun 2004, Bruce Evans wrote:
>
> > Ah, here is a likely cause of the bug in -current:
> >
> > % 	if (p == curthread->td_proc) {
> > % 		/*
> > % 		 * Adjust for the current time slice.  This is actually fairly
> > % 		 * important since the error here is on the order of a time
> > % 		 * quantum, which is much greater than the sampling error.
> > % 		 * XXXKSE use a different test due to threads on other
> > % 		 * processors also being 'current'.
> > % 		 */
> > % 		binuptime(&bt);
> > % 		bintime_sub(&bt, PCPU_PTR(switchtime));
> > % 		bintime_add(&bt, &p->p_runtime);
> > % 	} else
> > % 		bt = p->p_runtime;
> >
> > The XXXKSE comment is correct that this might be broken.  If the (p
> > != curthread->td_proc) case happens at all for a running process, then
> > it gives a wrong (out of date) timestamp in bt.  This wrongness will
> > be detected if calcru() was called earlier in the current timeslice
> > and took the other path here.
>
> It should be fairly easy as there is now a thread state that indicates
> that it is actually running now..

It's not so easy [to fix], since the switchtime for threads running on
other CPUs is inaccessible (it is in that CPU's pcpu data).

The bug seems to be unrelated to KSE.  It is related to SMP.  RELENG_4
has the bug, and pre-KSE versions have a proc state that indicates if we
have a running process, i.e., the case which can't be handled right.

I will turn off the check in the known broken case, and maybe change the
printf() to a log(), since the error is not very important and syscons's
console output routine is suspect when called with sched_lock held.

Bruce