Date: Mon, 5 Jun 2006 18:40:32 -0500
From: David DeSimone
To: freebsd-pf@freebsd.org
Subject: pfsync after reboot does not synchronize
Message-ID: <20060605234031.GA4787@verio.net>

I tried posting some messages about PF to the freebsd-net mailing list,
but they seemed to be ignored, so I thought I would try sending my
questions here.

I am trying to figure out why pfsync does not seem to work correctly
when one of my cluster nodes reboots.

When I reboot one of the cluster members, the state tables do appear to
synchronize, sort of, and populate with some of the same connection
states, but not all of them. That is, "pfctl -ss" on the two cluster
members shows a different number of state entries, vastly different if
the new member has only been up for a minute or two.

In particular, long-lived, existing connections (such as IRC server
connections) never seem to show up in the rebooted member's state table,
even though the connections continue to update their state on the
current carp master.

I figured that doing ifconfig down/up would send some sort of "full
sync" message between the two members, to cause the entire state table
to be sent in bulk. Eventually I learned that the way to do this is to
use "ifconfig syncdev" to force a bulk update:

    ifconfig pfsync0 syncdev fxp0    # $pfsync_syncdev

When I run the above command, I see the following debug output (when PF
is configured at "misc" or "loud" debug level).

On the cluster member receiving the requests:

    pfsync: received bulk update request
    pfsync: received bulk update request
    pfsync: received bulk update request
    pfsync: received bulk update request
    pfsync: received bulk update request
    pfsync: received bulk update request
    pfsync: received bulk update request
    pfsync: received bulk update request
    pfsync: received bulk update request
    pfsync: received bulk update request
    pfsync: received bulk update request
    pfsync: received bulk update request
    pfsync: received bulk update request

On the cluster member making the request (where syncdev was just
ifconfig'd):

    pfsync: requesting bulk update
    pfsync: received bulk update start
    pfsync: received bulk update start
    pfsync: received bulk update start
    pfsync: received bulk update start
    pfsync: received bulk update start
    pfsync: received bulk update start
    pfsync: received bulk update start
    pfsync: received bulk update start
    pfsync: received bulk update start
    pfsync: received bulk update start
    pfsync: received bulk update start
    pfsync: received bulk update start
    pfsync: received bulk update start
    pfsync: failed to receive bulk update status

After performing this manual action, I find the state table is much
better populated, and the two firewalls appear to be synchronized.

However, the messages above bother me. It looks to me like the cluster
member making the request repeats it over and over again, and finally
gives up after PFSYNC_MAX_BULKTRIES (12) attempts. Shouldn't that be
something that only happens in exceptional conditions? Yet I can make it
happen every time, even on a test cluster with no traffic (and thus an
almost empty state table).

Does anyone have any insight as to why I see these problems?

1.  Why does pfsync synchronize the state tables when I use the
    "ifconfig syncdev" trick to force a bulk update, yet not do so when
    the system is booting up?

2.  Why does pfsync keep repeating the bulk update request and then give
    up? What message is not getting through? The two cluster members
    have a direct cross-cable between them.

My PF policy has these settings:

    set skip on pfsync0
    pass quick on fxp0 proto pfsync    # $pfsync_syncdev

-- 
David DeSimone == Network Admin == fox@verio.net
  "It took me fifteen years to discover that I had no talent for
   writing, but I couldn't give it up because by that time I was too
   famous."  -- Robert Benchley
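
To make question 2 concrete, the requester-side behaviour I seem to be
observing can be modelled roughly as below. This is only a simplified
sketch in Python, not the kernel code: the function name
request_bulk_update, the peer_replies argument, and the "start"/"end"
labels are hypothetical, and the count of one initial request plus
PFSYNC_MAX_BULKTRIES retries is inferred from the 13 log lines above
rather than taken from the source.

```python
PFSYNC_MAX_BULKTRIES = 12  # value quoted in the message above


def request_bulk_update(peer_replies, max_tries=PFSYNC_MAX_BULKTRIES):
    """Model the requesting member's retry loop (illustrative only).

    peer_replies is a hypothetical iterable; each item is the set of
    reply kinds ("start", "end") seen after one request was sent.
    Returns (succeeded, log_lines).
    """
    log = ["pfsync: requesting bulk update"]
    replies = iter(peer_replies)

    # Assumption: one initial request followed by up to max_tries retries.
    for _ in range(1 + max_tries):
        received = next(replies, set())  # nothing heard -> empty set
        if "start" in received:
            log.append("pfsync: received bulk update start")
        if "end" in received:
            # The peer finished the bulk transfer, so stop retrying.
            return True, log

    # No "end" ever arrived: give up, as in the debug output above.
    log.append("pfsync: failed to receive bulk update status")
    return False, log
```

Feeding this model a peer that always answers with "start" but never
"end" reproduces the shape of the debug output above, which is why I
suspect the message that is not getting through is the one that marks
the end of the bulk update.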