From: Jeff Roberson
Date: Sun, 19 Aug 2007 16:52:12 -0700 (PDT)
To: Diomidis Spinellis
Cc: attilio@FreeBSD.ORG, jeff@FreeBSD.ORG, Robert Watson, current@FreeBSD.ORG
Subject: Re: Crash in accounting code: encode_long(), due to bad rusage data?
Message-ID: <20070819164823.H568@10.0.0.1>
In-Reply-To: <46C8C2E6.9030405@aueb.gr>

On Mon, 20 Aug 2007, Diomidis Spinellis wrote:

> Robert Watson wrote:
>> I recently upgraded two servers from FreeBSD 6-STABLE to FreeBSD 7-CURRENT
>> in anticipation of the forthcoming release. Both of them run with
>> accounting enabled at all times.
>> When a large pine session was exiting on one of the two boxes, I ran
>> into the following panic:
>>
>> panic: encode_long: -ve value -32749
>
> Getting rid of the panic is easy:
>
> --- kern_acct.c 2007-08-20 01:15:18.000000000 +0300
> +++ kern_acct.c.new 2007-08-20 01:16:06.000000000 +0300
> @@ -523,8 +523,7 @@
>       int norm_exp;   /* Normalized exponent */
>       int shift;
>
> -     KASSERT(val >= 0, ("encode_long: -ve value %ld", val));
> -     if (val == 0)
> +     if (val <= 0)
>               return (0);
>       norm_exp = fls(val) - 1;
>       shift = FLT_MANT_DIG - norm_exp - 1;
>
> However, as you wrote, this doesn't fix the root of the problem.
>
>> I find the large negative value in ru_idrss somewhat sad to contemplate,
>> and while this could well be a problem with capturing of process runtime
>> information, I'd like it if the accounting code were robust against this
>> sort of bug, rather than panicking, and I guess I'd also rather that the
>> process runtime information also be correctly captured :-).
>
> Do you think it makes any sense for encode_long to be correctly encoding
> negative numbers, or should we concentrate on locating and fixing the
> negative ru_idrss problem?

The number overflowed. Based on information from Robert on IRC, it
probably wrapped more than once, so the data is meaningless. For this to
continue to be useful we'd have to make irss and drss 64-bit on 32-bit
platforms. The problem probably doesn't occur on 64-bit machines.

Basically irss/drss are kilobytes per tick. If there are 1000 ticks per
second, that works out to almost one unit per byte per second of runtime.
So you can see how this easily overflows with a long-running, high-memory
application like pine.

What do you think about simply putting in a max value if we overflow? We
could then make a note about it in the process accounting docs. We might
want to fix this in rusage as well.

Jeff

>
> Diomidis Spinellis - http://www.spinellis.gr
>
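
The "max value if we overflow" idea and the overflow arithmetic in the reply can be sketched roughly as follows. This is only an illustration, not code from the FreeBSD tree: the helper names sat_add_long() and seconds_until_wrap32() are hypothetical, and the model (the counter grows by the size in kilobytes on every tick) is taken from the description in the message rather than from the kernel source.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of FreeBSD): add a positive increment
 * to an accumulator, clamping at LONG_MAX instead of wrapping negative.
 * Non-positive increments are ignored.
 */
static long
sat_add_long(long acc, long inc)
{
	if (inc <= 0)
		return (acc);
	if (acc > LONG_MAX - inc)
		return (LONG_MAX);	/* saturate rather than overflow */
	return (acc + inc);
}

/*
 * Hypothetical helper: whole seconds of runtime before a 32-bit signed
 * counter would exceed INT32_MAX, assuming it grows by size_kb on each
 * of hz ticks per second. Returns -1 for invalid input.
 */
static long long
seconds_until_wrap32(long long size_kb, long long hz)
{
	if (size_kb <= 0 || hz <= 0)
		return (-1);
	return (INT32_MAX / (size_kb * hz));
}
```

Under those assumptions, a process charged for 51200 KB (50 MB) at 1000 ticks per second would pass the 32-bit limit after about 41 seconds, which fits the observation that the value probably wrapped more than once. A clamped counter would then report LONG_MAX, which the accounting docs could describe as "overflowed".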