Date:      Sat, 8 Jul 2006 10:43:43 +0200
From:      Daniel Hartmeier <daniel@benzedrine.cx>
To:        "Douglas K. Rand" <rand@meridian-enviro.com>
Cc:        mcbride@openbsd.org, freebsd-pf@freebsd.org
Subject:   Re: pfsync & carp problems
Message-ID:  <20060708084343.GA32262@insomnia.benzedrine.cx>
In-Reply-To: <87zmfl466d.fsf@delta.meridian-enviro.com>
References:  <87ejwx1edf.wl%rand@meridian-enviro.com> <87zmfl466d.fsf@delta.meridian-enviro.com>

On Fri, Jul 07, 2006 at 01:32:26PM -0500, Douglas K. Rand wrote:

> Some more information after I discovered the -x loud option to
> pfctl. When the master firewall goes down and the already established
> TCP session hangs, I get these messages on the slave:
> 
> pf: BAD state: TCP 67.134.74.224:52173 67.134.74.224:52173 204.152.184.134:80 [lo=2943781408 high=2943846943 win=33304 modulator=0 wscale=1] [lo=3255565389 high=3255629101 win=65535 modulator=0 wscale=0] 4:4 A seq=3255634893 ack=2943781408 len=1448 ackskew=0 pkts=21109:24835 dir=in,rev
> pf: State failure on: 1       |

This means the web server is trying to send data to the client that is
out of (what pf thinks is legal for) its window.

The last ACK from the client that pf's state saw was 3255562493
(advertising th_win 33304 wscale factor 2^1), hence the upper boundary
of what the client accepts is 3255562493 + 2*33304 == seqhi 3255629101.

The packet's end, th_seq 3255634893 + len 1448 == 3255636341 is larger
than the client's seqhi 3255629101 (by 7240, which is 5*1448). Hence it
is blocked.
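To make the window check above concrete, here is a minimal Python sketch that redoes the arithmetic with the numbers from the log. It only mirrors the calculation described in this message and is not pf's actual code. The helper names (seq_leq, upper_bound, overshoot) are hypothetical; seq_leq imitates the modulo-2^32 comparison done by the SEQ_LEQ macro from BSD's tcp_seq.h, so that sequence-number wraparound is handled.

```python
MOD = 1 << 32  # TCP sequence numbers wrap at 2**32


def seq_leq(a: int, b: int) -> bool:
    """True if a <= b in modulo-2**32 sequence space (like SEQ_LEQ)."""
    diff = (a - b) % MOD
    # diff == 0 means equal; diff >= 2**31 means a lies "before" b.
    return diff == 0 or diff >= 1 << 31


def upper_bound(last_ack: int, win: int, wscale: int) -> int:
    """Highest sequence number the receiver accepts: ack + (win << wscale)."""
    return (last_ack + (win << wscale)) % MOD


def overshoot(seq: int, length: int, seqhi: int) -> int:
    """Bytes by which the segment's end exceeds seqhi (0 if it fits)."""
    end = (seq + length) % MOD
    return 0 if seq_leq(end, seqhi) else (end - seqhi) % MOD


# Numbers taken from the "BAD state" log line quoted above.
seqhi = upper_bound(3255562493, 33304, 1)
excess = overshoot(3255634893, 1448, seqhi)
print(seqhi, excess, excess // 1448)  # 3255629101 7240 5
```

A segment that ends at or below seqhi yields an overshoot of 0; the retransmitted segment from the log overshoots by exactly five full-sized segments.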

The fact that the server retransmits the same segment over and over
without going back to older segments probably means that it has gotten
an ACK from the client for 3255634893.

So how can the server have received an ACK up to 3255634893 when pf's
state has only seen an ACK for 3255562493?

I guess this depends on how you shut down the master in the first place.
For instance, this would be possible if its kernel kept forwarding
packets for a brief period of time after pf had stopped seeing them.
Also, there's a certain latency between pf updating its state entry
based on a passing packet and pfsync actually transmitting that update
to the slave. If the box was shutting down at precisely that moment and
an update was lost, I guess there is a chance for such a race.
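The update race described above can be sketched as a toy model in Python. Everything here is hypothetical: Firewall and SyncLink are made-up stand-ins (not pf or pfsync internals) that track only the server's upper bound, and the later client ACK (3255634893, again with window 33304 and wscale 1) is assumed, since its real window is not known. The model relies only on what the paragraph says: the master updates its own state immediately, while the update reaches the slave some time later.

```python
MOD = 1 << 32  # TCP sequence numbers wrap at 2**32


def seq_leq(a: int, b: int) -> bool:
    """True if a <= b in modulo-2**32 sequence space."""
    diff = (a - b) % MOD
    return diff == 0 or diff >= 1 << 31


class Firewall:
    """Toy state entry: tracks only the highest seq the server may send."""

    def __init__(self) -> None:
        self.seqhi = 0

    def on_client_ack(self, ack: int, win: int, wscale: int) -> None:
        # Upper bound = client's ACK plus its scaled window.
        self.seqhi = (ack + (win << wscale)) % MOD

    def allows(self, seq: int, length: int) -> bool:
        # Pass the segment only if its end stays within the bound.
        return seq_leq((seq + length) % MOD, self.seqhi)


class SyncLink:
    """Hypothetical pfsync stand-in: master updates at once, slave later."""

    def __init__(self, master: Firewall, slave: Firewall) -> None:
        self.master = master
        self.slave = slave
        self.queue: list[tuple[int, int, int]] = []

    def client_ack(self, ack: int, win: int, wscale: int) -> None:
        self.master.on_client_ack(ack, win, wscale)  # applied immediately
        self.queue.append((ack, win, wscale))  # sent to the slave later

    def flush(self) -> None:
        for ack, win, wscale in self.queue:
            self.slave.on_client_ack(ack, win, wscale)
        self.queue.clear()


master, slave = Firewall(), Firewall()
link = SyncLink(master, slave)

link.client_ack(3255562493, 33304, 1)
link.flush()  # slave now agrees: seqhi 3255629101
link.client_ack(3255634893, 33304, 1)  # assumed later ACK, forwarded by master
# ...master goes down here, before the queued update is flushed.

print(master.allows(3255634893, 1448), slave.allows(3255634893, 1448))  # True False
```

Once the master dies with the update still queued, the slave's state still expects at most 3255629101, so it rejects exactly the segment that shows up in the BAD state log, even though the server has legitimately seen the later ACK.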

How are you disconnecting the master? Does this occur when you physically
disconnect the ethernet cable towards the server first?

I'm not sure if there's any code that should try to prevent this
scenario in a normal shutdown/reboot case (like disabling forwarding or
taking down interfaces in a certain order first).

Ryan, do we address this, or is this just a rare but expected case?
Or did I miss something, and this shouldn't occur for some reason?

Daniel


