From owner-freebsd-current@FreeBSD.ORG  Sun Aug 19 23:49:33 2007
Date: Sun, 19 Aug 2007 16:52:12 -0700 (PDT)
From: Jeff Roberson <jroberson@chesapeake.net>
X-X-Sender: jroberson@10.0.0.1
To: Diomidis Spinellis <dds@aueb.gr>
In-Reply-To: <46C8C2E6.9030405@aueb.gr>
Message-ID: <20070819164823.H568@10.0.0.1>
References: <20070819214953.I1099@fledge.watson.org> <46C8C2E6.9030405@aueb.gr>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: attilio@FreeBSD.ORG, jeff@FreeBSD.ORG, Robert Watson <rwatson@FreeBSD.ORG>,
	current@FreeBSD.ORG
Subject: Re: Crash in accounting code: encode_long(), due to bad rusage data?
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>

On Mon, 20 Aug 2007, Diomidis Spinellis wrote:

> Robert Watson wrote:
>> I recently upgraded two servers from FreeBSD 6-STABLE to FreeBSD 7-CURRENT 
>> in anticipation of the forthcoming release.  Both of them run with 
>> accounting enabled at all times.  When a large pine session was exiting on 
>> one of the two boxes, I ran into the following panic:
>> 
>> panic: encode_long: -ve value -32749
>
> Getting rid of the panic is easy:
>
> --- kern_acct.c	2007-08-20 01:15:18.000000000 +0300
> +++ kern_acct.c.new	2007-08-20 01:16:06.000000000 +0300
> @@ -523,8 +523,7 @@
> 	int norm_exp;	/* Normalized exponent */
> 	int shift;
>
> -	KASSERT(val >= 0,  ("encode_long: -ve value %ld", val));
> -	if (val == 0)
> +	if (val <= 0)
> 		return (0);
> 	norm_exp = fls(val) - 1;
> 	shift = FLT_MANT_DIG - norm_exp - 1;
>
> However, as you wrote, this doesn't fix the root of the problem.
>
>> I find the large negative value in ru_idrss somewhat sad to contemplate, 
>> and while this could well be a problem with capturing of process runtime 
>> information, I'd like it if the accounting code were robust against this 
>> sort of bug, rather than panicking, and I guess I'd also rather that the 
>> process runtime information be correctly captured :-).
>
> Do you think it makes any sense for encode_long to be correctly encoding 
> negative numbers, or should we concentrate on locating and fixing the 
> negative ru_idrss problem?
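
For reference, a user-space sketch of what an encode_long-style routine does, assuming it packs a non-negative long into the bit pattern of an IEEE-754 single-precision float (inferred from the FLT_MANT_DIG arithmetic in the patch; this is not a copy of kern_acct.c). Zero and negative input map to 0, as in the patch. The names encode_long_sketch() and my_fls() are hypothetical; my_fls() stands in for the kernel's fls().

```c
#include <float.h>
#include <stdint.h>

/* User-space stand-in for fls(): 1-based index of the most significant
 * set bit, or 0 when v is 0. */
static int
my_fls(unsigned long v)
{
	int bit = 0;

	while (v != 0) {
		bit++;
		v >>= 1;
	}
	return (bit);
}

/*
 * Hypothetical encode_long_sketch(): pack a long into the bit pattern of an
 * IEEE-754 single-precision float, truncating mantissa bits that do not fit.
 * Zero and negative (overflowed) values encode as 0, as in the patch.
 */
static uint32_t
encode_long_sketch(long val)
{
	int norm_exp, shift;
	unsigned long mant;

	if (val <= 0)
		return (0);
	norm_exp = my_fls((unsigned long)val) - 1;	/* 2^norm_exp <= val */
	shift = FLT_MANT_DIG - norm_exp - 1;		/* leading 1 to bit 23 */
	if (shift >= 0)
		mant = (unsigned long)val << shift;
	else
		mant = (unsigned long)val >> -shift;
	/* Biased exponent, then the mantissa without its implicit leading 1. */
	return (((uint32_t)(norm_exp + 127) << (FLT_MANT_DIG - 1)) |
	    ((uint32_t)mant & ((UINT32_C(1) << (FLT_MANT_DIG - 1)) - 1)));
}
```

For example, encode_long_sketch(1000) gives 0x447A0000, the bit pattern of 1000.0f. Folding negatives to 0 only hides the symptom; it cannot recover a counter that has already wrapped.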

The number overflowed.  Based on information from Robert on IRC, it 
probably wrapped more than once, so the data is meaningless.  For this to 
remain useful we'd have to make irss and drss 64-bit on 32-bit 
platforms.  The problem probably doesn't occur on 64-bit machines, where 
long is already 64 bits.
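
A minimal user-space sketch of the wrap-around, assuming the counter is a 32-bit signed long that receives the resident size in kilobytes on every tick. The helper names and the 200 MB / 60000-tick figures are illustrative, not taken from Robert's box. Signed overflow is undefined in C, so the sketch does the 32-bit arithmetic in uint32_t and reinterprets the result as two's complement.

```c
#include <stdint.h>

/* Interpret a uint32_t as a two's-complement int32_t without relying on
 * implementation-defined conversions. */
static int32_t
as_int32(uint32_t u)
{
	if (u <= INT32_MAX)
		return ((int32_t)u);
	return ((int32_t)(u - UINT32_C(2147483648)) - INT32_MAX - 1);
}

/*
 * Hypothetical accumulate(): add rss_kb once per tick for `ticks` ticks, to a
 * counter that wraps at 32 bits (like long on i386) and to a 64-bit one.
 */
static void
accumulate(uint32_t rss_kb, uint64_t ticks, int32_t *c32, int64_t *c64)
{
	uint32_t u32 = 0;	/* unsigned, so wrap-around is well defined */
	int64_t s64 = 0;
	uint64_t t;

	for (t = 0; t < ticks; t++) {
		u32 += rss_kb;
		s64 += rss_kb;
	}
	*c32 = as_int32(u32);
	*c64 = s64;
}
```

accumulate(204800, 60000, ...) models a 200 MB process for one minute at 1000 ticks per second: the 64-bit counter ends at 12288000000, while the 32-bit one has passed 2^32 twice and reads -596901888.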

Basically irss/drss accumulate kilobytes per tick: the resident size in 
kilobytes is added on every tick.  With 1000 ticks per second, the 
counter grows by almost the resident size in bytes for every second of 
runtime.  So you can see how easily this overflows with a long-running, 
high-memory application like pine.

What do you think about simply clamping to a maximum value if we 
overflow?  We could then note this in the process accounting docs.  We 
might want to fix this in rusage as well.
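
A minimal sketch of the clamping idea, assuming the counters stay long and every per-tick sample is non-negative. rss_add_saturating() is a hypothetical name, not an existing FreeBSD interface.

```c
#include <limits.h>

/*
 * Hypothetical rss_add_saturating(): add a non-negative per-tick sample to a
 * long counter, pinning the result at LONG_MAX instead of letting it wrap.
 * Negative samples are ignored.
 */
static long
rss_add_saturating(long counter, long delta)
{
	if (delta <= 0)
		return (counter);
	if (counter > LONG_MAX - delta)
		return (LONG_MAX);	/* would overflow: saturate */
	return (counter + delta);
}
```

Because a pinned counter can no longer grow, LONG_MAX in an accounting record would only mean "at least this much", which is the caveat the documentation note would need to spell out.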

Jeff


>
> Diomidis Spinellis - http://www.spinellis.gr
>