From owner-freebsd-pf@FreeBSD.ORG  Tue Jun  6 00:46:25 2006
Date: Mon, 5 Jun 2006 18:40:32 -0500
From: David DeSimone <fox@verio.net>
To: freebsd-pf@freebsd.org
Message-ID: <20060605234031.GA4787@verio.net>
Subject: pfsync after reboot does not synchronize

I tried posting some messages about PF to the freebsd-net mailing list,
but they seemed to be ignored.  So I thought I would try sending my
questions here.

I am trying to figure out why pfsync does not seem to work correctly
when one of my cluster nodes reboots.

When I reboot one of the cluster members, the state tables do appear to
synchronize, sort of, and populate with some of the same connection
states, but not all of them.

That is, "pfctl -ss" shows a different number of state entries on each
cluster member, and a vastly different number if the rebooted member has
only been up for a minute or two.
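
To put numbers on the mismatch, something like the sketch below could be
run on each member.  It assumes the default "pfctl -ss" output with one
state per line (no -v); count_states and state_delta are hypothetical
helper names, and "peer" is a placeholder for the other member's host name.

```shell
#!/bin/sh
# Count state entries in "pfctl -ss" output read from stdin.
# Assumes one state per line; blank lines are ignored.
count_states() {
    awk 'NF { n++ } END { print n + 0 }'
}

# Hypothetical helper: print the absolute difference between two counts.
state_delta() {
    a=$1
    b=$2
    if [ "$a" -ge "$b" ]; then
        echo $((a - b))
    else
        echo $((b - a))
    fi
}

# Usage on a live firewall (needs root); "peer" is a placeholder:
#   local_n=$(pfctl -ss | count_states)
#   peer_n=$(ssh peer pfctl -ss | count_states)
#   state_delta "$local_n" "$peer_n"
```

On a busy pair the two counts will rarely match exactly, so a small delta
is expected; a large and persistent one is the symptom described here.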

In particular, long-lived, extant connections (such as IRC server
connections) seem to never show up in the rebooted member's state table,
even though the connections continue to update their state on the
current carp master.

I figured that doing ifconfig down/up would send some sort of "full
sync" message between the two members, to cause the entire state table
to be sent in bulk.  Eventually I learned that the method to do this is
to use "ifconfig syncdev" to force a bulk update:

    ifconfig pfsync0 syncdev fxp0   # $pfsync_syncdev

When I perform the above command, I see the following debug output (when
PF is configured at "misc" or "loud" debug level):

    On the cluster member receiving the requests:

	pfsync: received bulk update request
	pfsync: received bulk update request
	pfsync: received bulk update request
	pfsync: received bulk update request
	pfsync: received bulk update request
	pfsync: received bulk update request
	pfsync: received bulk update request
	pfsync: received bulk update request
	pfsync: received bulk update request
	pfsync: received bulk update request
	pfsync: received bulk update request
	pfsync: received bulk update request
	pfsync: received bulk update request

    On the cluster member making the request (where syncdev was just
    ifconfig'd):

	pfsync: requesting bulk update
	pfsync: received bulk update start
	pfsync: received bulk update start
	pfsync: received bulk update start
	pfsync: received bulk update start
	pfsync: received bulk update start
	pfsync: received bulk update start
	pfsync: received bulk update start
	pfsync: received bulk update start
	pfsync: received bulk update start
	pfsync: received bulk update start
	pfsync: received bulk update start
	pfsync: received bulk update start
	pfsync: received bulk update start
	pfsync: failed to receive bulk update status

After performing this manual action, I find the state table is much
better populated, and the two firewalls appear to be synchronized.
However, the messages above bother me.  It looks to me like the cluster
member making the request repeats it over and over again (13 requests in
the output above, which I read as the initial one plus 12 retries), and
finally gives up once PFSYNC_MAX_BULKTRIES (12) is exhausted.  Shouldn't
that be something that only happens in exceptional conditions?  Yet I
can make it happen every time, even on a test cluster with no traffic
(and thus an almost empty state table).
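
Counting the messages by hand is error-prone, so here is a rough sketch
of a tally.  It assumes the debug output ends up in the kernel message
buffer (readable with dmesg) after raising the debug level with
"pfctl -x misc"; tally_bulk is a hypothetical name, and the patterns are
copied from the messages quoted above.

```shell
#!/bin/sh
# Tally pfsync bulk-update debug messages read from stdin.
# Counters that never match are printed as 0.
tally_bulk() {
    awk '
        /pfsync: requesting bulk update/               { req++ }
        /pfsync: received bulk update request/         { rcvd++ }
        /pfsync: received bulk update start/           { start++ }
        /pfsync: failed to receive bulk update status/ { fail++ }
        END {
            printf "requested=%d request_rcvd=%d start=%d failed=%d\n",
                req, rcvd, start, fail
        }'
}

# Usage (assumption: messages are visible in dmesg):
#   pfctl -x misc
#   ifconfig pfsync0 syncdev fxp0
#   dmesg | tally_bulk
```

Run on both members after forcing a bulk update, this shows at a glance
how many requests and start messages were seen and whether the requester
ended with a failure.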

Does anyone have any insight as to why I see these problems?

1.  Why does pfsync synchronize the state tables when I use the
    "ifconfig syncdev" trick to force a bulk update, yet it does
    not do this when the system is booting up?

2.  Why does pfsync keep repeating the bulk update request and then give
    up?  What message is not getting through?


The two cluster members are connected by a direct crossover cable.  My
PF policy has these settings:

    set skip on pfsync0

    pass quick on fxp0 proto pfsync	# $pfsync_syncdev
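
The "# $pfsync_syncdev" comments above refer to the rc.conf variable of
that name.  For reference, a boot-time configuration matching this setup
would look roughly like the fragment below.  This is a sketch that
assumes the stock rc.d pfsync script and the fxp0 interface named above,
not a copy of the actual rc.conf on these machines.

```shell
# /etc/rc.conf fragment (sketch; values are assumptions based on the
# interface name used in this message)
pfsync_enable="YES"      # bring up pfsync0 at boot
pfsync_syncdev="fxp0"    # interface that carries pfsync traffic
```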

-- 
David DeSimone == Network Admin == fox@verio.net
  "It took me fifteen years to discover that I had no
   talent for writing, but I couldn't give it up because
   by that time I was too famous."  -- Robert Benchley