From owner-freebsd-pf@FreeBSD.ORG  Tue May  3 09:16:23 2011
Return-Path: <owner-freebsd-pf@FreeBSD.ORG>
Delivered-To: freebsd-pf@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 0FA39106566C
	for <freebsd-pf@freebsd.org>; Tue,  3 May 2011 09:16:23 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from QMTA11.westchester.pa.mail.comcast.net
	(qmta11.westchester.pa.mail.comcast.net [76.96.59.211])
	by mx1.freebsd.org (Postfix) with ESMTP id AB5D38FC20
	for <freebsd-pf@freebsd.org>; Tue,  3 May 2011 09:16:22 +0000 (UTC)
Received: from omta03.westchester.pa.mail.comcast.net ([76.96.62.27])
	by QMTA11.westchester.pa.mail.comcast.net with comcast
	id exFN1g0010bG4ec5BxGNsW; Tue, 03 May 2011 09:16:22 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta03.westchester.pa.mail.comcast.net with comcast
	id exGL1g00A1t3BNj3PxGMoV; Tue, 03 May 2011 09:16:21 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 486319B418; Tue,  3 May 2011 02:16:19 -0700 (PDT)
Date: Tue, 3 May 2011 02:16:19 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Daniel Hartmeier <daniel@benzedrine.cx>
Message-ID: <20110503091619.GA39329@icarus.home.lan>
References: <20110503015854.GA31444@icarus.home.lan>
	<20110503084800.GB9657@insomnia.benzedrine.cx>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110503084800.GB9657@insomnia.benzedrine.cx>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-stable@freebsd.org, freebsd-pf@freebsd.org
Subject: Re: RELENG_8 pf stack issue (state count spiraling out of control)
X-BeenThere: freebsd-pf@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: "Technical discussion and general questions about packet filter
	\(pf\)" <freebsd-pf.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-pf>,
	<mailto:freebsd-pf-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-pf>
List-Post: <mailto:freebsd-pf@freebsd.org>
List-Help: <mailto:freebsd-pf-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-pf>,
	<mailto:freebsd-pf-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 03 May 2011 09:16:23 -0000

On Tue, May 03, 2011 at 10:48:00AM +0200, Daniel Hartmeier wrote:
> On Mon, May 02, 2011 at 06:58:54PM -0700, Jeremy Chadwick wrote:
> 
> > Status: Enabled for 76 days 06:49:10          Debug: Urgent
> 
> > The "pf uptime" shown above, by the way, matches system uptime.
> 
> > ps -axl
> > 
> >   UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
> >     0   422     0   0 -16  0     0     0 pftm   DL    ??  1362773081:04.00 [pfpurge]
> 
> This looks weird, too. 1362773081 minutes would be >2500 years.
> 
> Usually, you should see [idle] with almost uptime in minutes, and
> [pfpurge] with much less, like in
> 
>   # uptime
>   10:22AM  up 87 days, 19:36, 1 user, load averages: 0.00, 0.03, 0.05
>   # echo "((87*24)+19)*60+36" | bc
>   126456
> 
>   # ps -axl
>   UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
>     0     7     0   0  44  0     0     8 pftm   DL    ??    0:13.16 [pfpurge]
>     0    11     0   0 171  0     0     8 -      RL    ??  124311:23.04 [idle]

Agreed -- and that's exactly how things look on the same box right now:

$ ps -axl | egrep 'UID|pfpurge|idle'
  UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
    0    11     0   0 171  0     0    64 -      RL    ??  2375:15.91 [idle]
    0   422     0   0 -16  0     0    16 pftm   DL    ??    0:00.28 [pfpurge]

The ps -axl output I provided earlier came from /var/crash/core.0.txt.
So it's interesting that both ps -axl and vmstat -i showed something
off-the-wall.  I wonder if this can happen from within ddb?
Unsure.  I do have the core from "call doadump", so I should be able to
go back and re-examine it with kgdb.  I just wish I knew what to poke
around looking for in there.
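
For anyone wanting to repeat the sanity check on the TIME column, here
is a minimal sh sketch.  The function names are made up for
illustration, and it assumes the TIME field has the MINUTES:SECONDS.xx
form shown in the ps output above.  It only makes sense for a
single-threaded kernel process like [pfpurge]; I believe [idle] on a
multi-core box can legitimately accumulate more than wall-clock uptime.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any FreeBSD tool.

# Strip everything from the first ':' onward, leaving whole minutes
# from a ps(1) TIME field such as "124311:23.04".
ps_time_to_minutes() {
    printf '%s\n' "${1%%:*}"
}

# Convert minutes to whole years, assuming 365.25-day years
# (60 * 24 * 365.25 = 525960 minutes per year).
minutes_to_years() {
    echo $(( $1 / 525960 ))
}

# Print "suspect" if accumulated CPU minutes exceed uptime minutes,
# otherwise "ok".  Arguments: cpu_minutes uptime_minutes
check_cpu_time() {
    if [ "$1" -gt "$2" ]; then
        echo suspect
    else
        echo ok
    fi
}

# pf uptime of 76 days 06:49:10 is (76*24 + 6)*60 + 49 = 109849 minutes.
mins=$(ps_time_to_minutes "1362773081:04.00")
minutes_to_years "$mins"        # prints 2591
check_cpu_time "$mins" 109849   # prints suspect
```

So the bogus [pfpurge] figure works out to roughly 2591 years of CPU
time on a box that had been up for about 76 days.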

Sadly, I don't see a way to monitor things like interrupt usage with
bsnmpd(8); otherwise I'd be graphing that.  The more monitoring the
better; at least then I could say "wow, interrupts really did shoot
through the roof -- the box went crazy!" and RMA the thing.  :-)
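
Lacking SNMP support, a crude alternative is to sample vmstat -i from
cron and graph the deltas.  The following is only a sketch: the
function names and the log path are hypothetical, and it assumes the
usual vmstat -i layout where the last line starts with "Total" and the
cumulative count is the second-to-last column.

```shell
#!/bin/sh
# Hypothetical helpers for logging the cumulative interrupt count.

# Read `vmstat -i` output on stdin and print the cumulative count
# from the "Total" line (second-to-last column).
irq_total() {
    awk '$1 == "Total" { print $(NF - 1) }'
}

# Append "epoch-seconds total" to the log file given as $1.
# Intended to be run periodically, e.g. from cron.
log_irq_total() {
    printf '%s %s\n' "$(date +%s)" "$(vmstat -i | irq_total)" >> "$1"
}
```

Calling something like log_irq_total /var/log/irq_total.log once a
minute gives a two-column file; the difference between consecutive
rows is the number of interrupts in that interval.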

> How is time handled on your machine? ntpdate on boot and then ntpd?

Yep, you got it:

ntpdate_enable="yes"
ntpdate_config="/conf/ME/ntp.conf"
ntpd_enable="yes"
ntpd_config="/conf/ME/ntp.conf"

I don't use ntpd_sync_on_start because I've never had reason to.  I
always set the system/BIOS clock to UTC time when building a system.  I
use ntpd's complaint about excessive offset as an indicator that
something bad happened.  /conf/ME/ntp.conf on this machine syncs from
another on the private network (em1) only, and that machine syncs from
a series of geographically-diverse stratum 2 servers and one stratum 1
server.  I've never seen high delays, offsets, or jitter using "ntpq -c
peers" on any box we have.
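
That check can be automated with a few lines of sh.  This is a sketch
under assumptions: the function name is made up, the threshold is
arbitrary, and it expects the standard peers billboard where the first
two lines are the header and the offset (in milliseconds) is the ninth
column.

```shell
#!/bin/sh
# Hypothetical helper: read `ntpq -c peers` output on stdin and print
# "peer offset" for every peer whose absolute offset in milliseconds
# exceeds the threshold given as $1.
ntp_bad_offsets() {
    awk -v max="$1" 'NR > 2 {
        off = $9 + 0                 # offset column, milliseconds
        if (off < 0) off = -off      # absolute value
        if (off > max) print $1, $9
    }'
}
```

Usage would be along the lines of: ntpq -c peers | ntp_bad_offsets 100
which prints nothing when every peer is within 100 ms.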

Actual timecounters (not time itself) are handled by ACPI-safe or
ACPI-fast (varies per boot; I've talked to jhb@ about this before and
it's normal).

powerd is in use on all our systems, and this box also uses processor
sleep states (lowest state = C2; the physical CPU only supports C0-C2
and I wouldn't go any lower than that anyway :-) ).  The
/boot/loader.conf entries that pertain to it:

# Enable use of P-state CPU frequency throttling.
# http://wiki.freebsd.org/TuningPowerConsumption
hint.p4tcc.0.disabled="1"
hint.acpi_throttle.0.disabled="1"

There are numerous other systems exactly like this one (literally same
model of hardware, RAM amount, CPU model, BIOS version and settings, and
system configuration, including pf) that have much higher load and fire
many more interrupts (particularly the NFS server!) that haven't
exhibited any problems.  This box had an uptime of 72 days, and prior to
that around 100 (before being taken down for world/kernel upgrades).
All machines have ECC RAM too, and MCA/MCE is in use.

You don't know how badly I'd love to blame this on a hardware issue
(it's always possible in some way or another), but the way this
manifested itself was extremely specific.  The problem could be super
rare and
something triggered it that hasn't been seen before by developers.  So
far there's only 1 other user who has seen this behaviour but his was
attributed to use of "reassemble tcp" which I wasn't using; so the true
problem could still be out there.  I feel better knowing I'm not the
only one who's seen this oddity.

Since his post, I've removed all scrub rules from all of our machines as
a precaution.  If it ever happens again we'll have one more thing to
safely rule out.

We have other machines (different hardware, running RELENG_7 i386) which
have had 1+ year uptimes also using pf, so the possibility of just some
"crazy fluke" is plausible to me.

> Any manual time changes since the last boot?

None unless adjkerntz did something during the PST->PDT switchover, but
that would manifest itself as a +1 hour offset difference.

Since the reboot, the system has synced its time without issue and
well within an acceptable delta (1.075993 sec).  I did not power-cycle the
box during any of this; pure soft reboots.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.               PGP 4BD6C0CB |