Date: Tue, 3 May 2011 11:22:57 +0200
From: Daniel Hartmeier
To: Jeremy Chadwick
Cc: freebsd-stable@freebsd.org, freebsd-pf@freebsd.org
Subject: Re: RELENG_8 pf stack issue (state count spiraling out of control)
Message-ID: <20110503092257.GC9657@insomnia.benzedrine.cx>
In-Reply-To: <20110503015854.GA31444@icarus.home.lan>

On Mon, May 02, 2011 at 06:58:54PM -0700, Jeremy Chadwick wrote:

> Here's one piece of core.0.txt which makes no sense to me -- the "rate"
> column. I have a very hard time believing that was the interrupt rate
> of all the relevant devices at the time (way too high). Maybe this data
> becomes wrong only during a coredump? The total column I could believe.
>
> ------------------------------------------------------------------------
> vmstat -i
>
> interrupt                          total       rate
> irq4: uart0                        54768        912
> irq6: fdc0                             1          0
> irq17: uhci1+                        172          2
> irq23: uhci3 ehci1+                 2367         39
> cpu0: timer                  13183882632  219731377
> irq256: em0                    260491055    4341517
> irq257: em1                    127555036    2125917
> irq258: ahci0                  225923164    3765386
> cpu2: timer                  13183881837  219731363
> cpu1: timer                  13002196469  216703274
> cpu3: timer                  13183881783  219731363
> Total                        53167869284  886131154
> ------------------------------------------------------------------------
>
> Here's what a normal "vmstat -i" shows from the command-line:
>
> # vmstat -i
> interrupt                          total       rate
> irq4: uart0                          518          0
> irq6: fdc0                             1          0
> irq23: uhci3 ehci1+                  145          0
> cpu0: timer                     19041199       1999
> irq256: em0                       614280         64
> irq257: em1                       168529         17
> irq258: ahci0                     355536         37
> cpu2: timer                     19040462       1999
> cpu1: timer                     19040458       1999
> cpu3: timer                     19040454       1999
> Total                           77301582       8119

The cpu0-3 timer totals seem consistent in the first output:
13183881783/1999/60/60/24 matches 76 days of uptime.

The high "rate" values in the first output come from vmstat.c's
dointr(), which divides each total by the uptime:

    struct timespec sp;

    clock_gettime(CLOCK_MONOTONIC, &sp);
    uptime = sp.tv_sec;

    for (i = 0; i < nintr; i++) {
        printf("%-*s %20lu %10lu\n", istrnamlen, intrname,
            *intrcnt, *intrcnt / uptime);
    }

From this we can deduce that the value of uptime must have been
13183881783/219731363 = 60 (seconds).
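A quick standalone check of that arithmetic (the constants are taken
verbatim from the vmstat output above; this is not vmstat code, and the
variable names are just for this example -- it only spells out the two
divisions):

    #include <stdio.h>
    #include <stdint.h>

    int
    main(void)
    {
        /* cpu3: timer total and "rate" from the first vmstat -i output */
        uint64_t total = 13183881783ULL;
        uint64_t bogus_rate = 219731363ULL;
        uint64_t normal_hz = 1999ULL;   /* rate shown by a normal vmstat -i */

        /* uptime that vmstat must have used: total / rate */
        printf("implied uptime: %ju s\n", (uintmax_t)(total / bogus_rate));

        /* uptime implied by a 1999 Hz timer: total / hz, in days */
        printf("real uptime:    %.1f days\n",
            (double)total / normal_hz / 60 / 60 / 24);
        return (0);
    }

Compiled and run, it prints an implied uptime of 60 seconds and a real
uptime of roughly 76 days.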
Since the uptime was 76 days (and not just 60 seconds), the CLOCK_MONOTONIC
clock must have reset, wrapped, or been overwritten. I don't know how that's
possible, but if it means that the kernel variable time_second also went
backwards, that could very well have messed up pf's state purging.

Daniel
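As an illustration of why a backwards time jump would stall timeout-based
purging (a rough sketch only, not pf's actual code; the struct and function
names here are made up): a state is dropped once the current time reaches
its expiry, so if the clock feeding that comparison falls back below the
value it had when the state was created, the check never fires and states
accumulate instead of being purged.

    #include <stdio.h>
    #include <time.h>

    /* Hypothetical state entry: expires some seconds after creation. */
    struct state {
        time_t expire;
    };

    /*
     * Sketch of a timeout-based purge check: a state is considered
     * expired once "now" reaches its expiry time.  If "now" jumps
     * backwards, this stays false and the state is never purged.
     */
    static int
    state_expired(const struct state *st, time_t now)
    {
        return (now >= st->expire);
    }

    int
    main(void)
    {
        /* created at t=1000 with a 60 second timeout */
        struct state st = { .expire = 1000 + 60 };

        printf("now=1061: expired=%d\n", state_expired(&st, 1061)); /* purged */
        printf("now=42:   expired=%d\n", state_expired(&st, 42));   /* clock went back */
        return (0);
    }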