From owner-freebsd-pf@FreeBSD.ORG Tue May  3 07:32:12 2011
Date: Tue, 3 May 2011 00:32:09 -0700
From: Jeremy Chadwick
To: Daniel Hartmeier
Cc: freebsd-stable@freebsd.org, freebsd-pf@freebsd.org
Subject: Re: RELENG_8 pf stack issue (state count spiraling out of control)
Message-ID: <20110503073209.GA37676@icarus.home.lan>
In-Reply-To: <20110503070042.GA9657@insomnia.benzedrine.cx>
List-Id: "Technical discussion and general questions about packet filter (pf)"

On Tue, May 03, 2011 at 09:00:42AM +0200, Daniel Hartmeier wrote:
> I read those graphs differently: the problem doesn't arise slowly,
> but rather seems to start suddenly at 13:00.
>
> Right after 13:00, traffic on em0 drops, i.e. the firewall seems
> to stop forwarding packets completely.
>
> Yet, at the same time, the states start to increase, almost linearly
> at about one state every two seconds, until the limit of 10,000 is
> reached. Reaching the limit seems to be only a side-effect of a
> problem that started at 13:00.
>
> > Here's one piece of core.0.txt which makes no sense to me -- the "rate"
> > column. I have a very hard time believing that was the interrupt rate
> > of all the relevant devices at the time (way too high). Maybe this data
> > becomes wrong only during a coredump? The total column I could believe.
> >
> > ------------------------------------------------------------------------
> > vmstat -i
> >
> > interrupt                          total       rate
> > irq4: uart0                        54768        912
> > irq6: fdc0                             1          0
> > irq17: uhci1+                        172          2
> > irq23: uhci3 ehci1+                 2367         39
> > cpu0: timer                  13183882632  219731377
> > irq256: em0                    260491055    4341517
> > irq257: em1                    127555036    2125917
> > irq258: ahci0                  225923164    3765386
> > cpu2: timer                  13183881837  219731363
> > cpu1: timer                  13002196469  216703274
> > cpu3: timer                  13183881783  219731363
> > Total                        53167869284  886131154
> > ------------------------------------------------------------------------
>
> I find this suspect as well, but I don't have an explanation yet.
>
> Are you using anything non-GENERIC related to timers, like changing
> HZ or enabling polling?

HZ is standard (1000 is the default, I believe), and I do not use
polling.

> Are you sure the problem didn't start right at 13:00, cause complete
> packet loss for the entire period, and grow gradually worse instead?

It's hard to discern from the graphs, but I can tell you exactly what I
saw TCP-wise, since I had some already-established TCP connections to
the box (i.e. connections which already had ESTABLISHED states according
to "pfctl -s state") when it began exhibiting issues.
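As a sanity check on the quoted vmstat output (my own back-of-the-envelope
arithmetic, not anything taken from the coredump): vmstat(8) derives the
"rate" column as roughly total / uptime, so the reported rate lets us
recover the uptime vmstat *thought* the system had. The numbers below
suggest it divided by about one minute rather than the real uptime --
consistent with the rate column being bogus in a coredump:

```python
# Back-of-the-envelope check of the quoted "vmstat -i" output.
# vmstat computes rate ~= total / uptime, so the reported rate lets us
# recover the uptime vmstat used as its divisor.

cpu0_total = 13183882632   # cpu0: timer, "total" column
cpu0_rate = 219731377      # cpu0: timer, "rate" column

# Uptime implied by the reported rate:
implied_uptime = cpu0_total / cpu0_rate
print(f"implied uptime: {implied_uptime:.1f} s")   # ~60 s, i.e. one minute

# Uptime implied by a standard HZ=1000 clock interrupt rate:
hz = 1000
expected_uptime_days = cpu0_total / hz / 86400
print(f"uptime at HZ=1000: {expected_uptime_days:.0f} days")  # ~153 days
```

In other words, the "total" column is consistent with HZ=1000 over an
uptime of about five months, while the "rate" column is only consistent
with a ~60-second window; whether that divisor really is 60 seconds is
my guess, but the internal inconsistency is not.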
Any packets which already had existing state entries in pf's state table
continued to work, bidirectionally. New inbound connections to the box
via em0 would result in no response/timeout (and, as indicated by pfctl,
such packets were being dropped due to the state limit being reached).
Outbound connections from the box via em0 to the outside world also
resulted in no response/timeout. I will show you evidence of the latter.

The first indication of a problem in syslog is the following message
from named -- this is the first time in my life I've ever seen this
message, but it seems to indicate that some kind of internal watchdog
fired within named itself. The log I'm looking at, by the way, is
/var/log/all.log -- yes, I do turn that on (for reasons exactly like
this). This box is a secondary nameserver (public), so keep that in
mind too. Anyway:

May  1 12:50:14 isis named[728]: *** POKED TIMER ***

Seconds later, I see unexpected RCODE messages, lame server messages,
etc. -- all of which indicate packets are working to some degree ("the
usual" badly-configured nameservers on the Internet). A few minutes
later:

May  1 12:53:15 isis named[728]: *** POKED TIMER ***
May  1 12:53:54 isis named[728]: *** POKED TIMER ***

With more of the usual unexpected RCODE/SERVFAIL messages after that.
The next messages:

May  1 13:28:55 isis named[728]: *** POKED TIMER ***
May  1 13:29:13 isis named[728]: *** POKED TIMER ***
May  1 13:30:11 isis last message repeated 3 times

Then more RCODE/SERVFAIL and something called "FORMERR", but that could
be normal as well. Remember, all from named.

This "cycle" of behaviour continued, with the number of POKED TIMER
messages gradually increasing as time went on. By 16:07 on May 1st,
these messages were arriving in "bursts" of 5 or 6.
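The symptom described above -- established flows keep passing while new
connections silently time out -- is exactly what a full state table
produces in any stateful filter. A toy sketch of the mechanism (the
class name, flow keys, and tiny limit are made up for illustration; this
is not pf's implementation):

```python
# Toy model of a stateful filter with a hard state limit, illustrating
# why established connections kept working while new ones timed out.
# StateTable and the 2-entry limit are invented for this sketch.

class StateTable:
    def __init__(self, limit):
        self.limit = limit
        self.states = set()

    def pass_packet(self, flow):
        """Return True if the packet passes, False if it is dropped."""
        if flow in self.states:              # existing state: always passes
            return True
        if len(self.states) >= self.limit:   # table full: new flow dropped,
            return False                     # no RST/ICMP, so clients time out
        self.states.add(flow)                # room left: create state, pass
        return True

table = StateTable(limit=2)
assert table.pass_packet(("10.0.0.1", 12345))       # new state created
assert table.pass_packet(("10.0.0.2", 23456))       # new state created
assert table.pass_packet(("10.0.0.1", 12345))       # established: still passes
assert not table.pass_packet(("10.0.0.3", 34567))   # limit hit: dropped
```

In pf the limit corresponds to "set limit states" in pf.conf (10,000 in
this case), and "pfctl -s info" shows the current counter against it.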
Things finally "exploded", from named's perspective, here (with slaved
zones X'd out):

May  1 19:23:21 isis named[728]: *** POKED TIMER ***
May  1 19:28:59 isis named[728]: zone XXXXXXXX/IN: refresh: failure trying master x.x.x.x#53 (source x.x.x.x#0): operation canceled
May  1 19:35:32 isis named[728]: host unreachable resolving 'dns2.djaweb.dz/A/IN': 213.179.160.66#53
May  1 19:35:32 isis named[728]: host unreachable resolving 'dns2.djaweb.dz/A/IN': 193.0.12.4#53
May  1 19:35:32 isis named[728]: host unreachable resolving 'dns2.djaweb.dz/A/IN': 193.194.64.242#53
May  1 19:35:32 isis named[728]: host unreachable resolving 'dns2.djaweb.dz/A/IN': 192.134.0.49#53

Many other slaved zones reported the exact same error. The hostnames
which couldn't be resolved have no relevance; honestly, there is no
consistency in any of them, and given what our service does, the
randomness is expected. (Read: I just picked consecutive log lines
above; don't focus on dns2.djaweb.dz like it's somehow responsible for
this.)

To me, this indicates a serious IP stack, firewall, or network
exhaustion situation. The "host unreachable" messages became more and
more regular, along with the occasional POKED TIMER. So what I'm saying
is that the "most chatty" of all the daemons on the machine during this
time was named, which does not surprise me given the machine's role. I
finally logged in on May 2nd at 13:41 and tried to put an end to the
madness.

Other daemons on the machine that do their own (non-syslog) logging show
no signs of issues, I/O errors (network or otherwise), etc. Again, this
doesn't surprise me, because nobody with existing connections to the
machine was complaining (and hadn't been). Only when I was told by a
user that *new* connections were failing did I begin to investigate.

So it seems likely that network I/O mostly stopped because the IP stack
stopped handling new I/O due to whatever was going on with pf.
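For what it's worth, the timeline above is roughly consistent with the
growth rate Daniel read off the graphs. My own arithmetic (the one
state per two seconds figure is his estimate, not a measurement):

```python
# If states grew at ~1 every 2 seconds starting around 13:00, when
# would the 10,000-state limit have been reached?

limit = 10000          # "set limit states" value on this box
states_per_sec = 0.5   # Daniel's estimate from the graphs

hours_to_limit = limit / states_per_sec / 3600
print(f"{hours_to_limit:.1f} hours")   # ~5.6 hours after 13:00, i.e. ~18:30
```

That puts the table filling up around 18:30, shortly before named's
refresh failures and "host unreachable" storm at 19:23-19:35 -- which
fits, though I wouldn't read too much precision into it.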
Meanwhile, pf's state counter started gradually incrementing for reasons
unknown -- an indicator that something bad was happening, almost
certainly within pf itself or somewhere within the kernel. I'm inclined
to blame pf, because existing/ESTABLISHED stateful entries continued to
be processed okay.

Therefore, to me, the graphs are indicators that the problem started at
around 12:50 PDT (meaning impact began at 12:50 PDT), and something very
bad continued to happen within the pf stack until I managed to
intervene.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP 4BD6C0CB |