Date:      Thu, 23 Aug 2007 00:06:16 +0200
From:      Max Laier <max@love2party.net>
To:        freebsd-pf@freebsd.org
Subject:   Re: pfsync errors
Message-ID:  <200708230006.32294.max@love2party.net>
In-Reply-To: <55e8a96c0708221242h2d5e7d15q847e6fac7cf60554@mail.gmail.com>
References:  <55e8a96c0708221242h2d5e7d15q847e6fac7cf60554@mail.gmail.com>

--nextPart4352611.joD4R0DG9A
Content-Type: multipart/mixed;
  boundary="Boundary-01=_ZNLzG1Tc2e5FeEr"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

--Boundary-01=_ZNLzG1Tc2e5FeEr
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

On Wednesday 22 August 2007, Bill Marquette wrote:
> For the last two days I've been troubleshooting a weird issue where my
> secondary firewall in a pfsync/carp cluster isn't maintaining a state
> table similar in size to the primary - it's slowly increasing to the
> max size.  I think I've finally tracked it down to ip_output()
> returning an error, but at this point I'm lost.  The interfaces show
> no errors, this box happily ran OpenBSD for the last three years with
> no similar errors and has only started exhibiting this behavior after
> converting it.  I'm seeing this on multiple boxes, but am spending my
> time troubleshooting just one.  Any advice/assistance would be greatly
> appreciated, I'm at a loss and this is affecting my production
> environment.
>
> We're running RELENG_6_2; the NICs are Intel PRO/1000s (copper, but the
> cat-5e cable is a direct run to the 6513 switch one cabinet over -
> 15ft cable).
>
> This is a netstat from the primary machine, the secondary has been
> failed over to a couple times and looks similar (although
> interestingly the cluster seems to handle being on the secondary box
> better)
> # netstat -s -p pfsync
> pfsync:
>         409302985 packets received (IPv4)
>         0 packets received (IPv6)
>                 0 packets discarded for bad interface
>                 0 packets discarded for bad ttl
>                 0 packets shorter than header
>                 0 packets discarded for bad version
>                 0 packets discarded for bad HMAC
>                 0 packets discarded for bad action
>                 0 packets discarded for short packet
>                 0 states discarded for bad values
>                 0 stale states
>                 16980281 failed state lookup/inserts
>         1541416698 packets sent (IPv4)
>         0 packets sent (IPv6)
>                 0 send failed due to mbuf memory error
>                 182754275 send error

The send error counter is incremented in two cases: either the internal
deferred work queue is full, or ip_output() fails.  Could you locate
"pfsyncstats.pfsyncs_oerrors++" in your source code and replace each
occurrence with a printf()?  Maybe use the attached patch.  This way we
will know exactly what fails and, if it is ip_output(), why.
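
Once the patched kernel prints "pfsync_senddef: ip_output N", the number N
is an errno-style value. A minimal userland sketch for turning it into text
follows; describe_ip_output_error() is a hypothetical helper name, not
anything in pfsync. Errno numbers differ between operating systems, so this
should be built and run on the FreeBSD box that produced the message.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of pfsync): format the errno-style value
 * printed by the patched pfsync_senddef() as human-readable text.
 * The result is written into buf and buf is returned.
 */
const char *
describe_ip_output_error(int error, char *buf, size_t buflen)
{
	/* strerror() maps the number to the local system's message text. */
	snprintf(buf, buflen, "ip_output %d: %s", error, strerror(error));
	return (buf);
}
```

For example, calling describe_ip_output_error(ENOBUFS, buf, sizeof(buf))
fills buf with the ENOBUFS number and its message as defined on the local
system.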

> # netstat -i -Iem2
> Name    Mtu Network       Address              Ipkts Ierrs      Opkts Oerrs  Coll
> em2    1500 <Link#3>      00:04:23:a6:b7:be 409328713    27 1359271127     0     0
> em2    1500 192.168.100.2 l4dupfw140-sync   409327567     - 1359270884     -     -
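
To put the pfsync counters quoted earlier in proportion, here is a small
sketch that computes the share of send errors among packets sent. It assumes
the two counters are directly comparable, which the netstat output does not
confirm; percent_of() and pfsync_send_error_percent() are hypothetical names
used only for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: share of `part` in `total`, as a percentage. */
double
percent_of(uint64_t part, uint64_t total)
{
	if (total == 0)
		return (0.0);
	return (100.0 * (double)part / (double)total);
}

/* Counters copied from the quoted "netstat -s -p pfsync" output. */
double
pfsync_send_error_percent(void)
{
	const uint64_t sent = 1541416698ULL;       /* packets sent (IPv4) */
	const uint64_t send_errors = 182754275ULL; /* send error */

	return (percent_of(send_errors, sent));
}

/* Print the ratio with two decimal places. */
void
print_pfsync_send_error_percent(void)
{
	printf("send errors / packets sent: %.2f%%\n",
	    pfsync_send_error_percent());
}
```

With these numbers the ratio comes out just under 12 percent, while the em2
interface itself reports 0 in its Oerrs column.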



-- 
/"\  Best regards,                      | mlaier@freebsd.org
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mlaier@EFnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News

--Boundary-01=_ZNLzG1Tc2e5FeEr
Content-Type: text/x-diff;
  charset="iso-8859-1";
  name="pfsync_error.diff"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
	filename="pfsync_error.diff"

Index: if_pfsync.c
===================================================================
RCS file: /usr/store/mlaier/fcvs/src/sys/contrib/pf/net/if_pfsync.c,v
retrieving revision 1.19.2.5
diff -u -r1.19.2.5 if_pfsync.c
--- if_pfsync.c	19 Jan 2007 23:01:26 -0000	1.19.2.5
+++ if_pfsync.c	22 Aug 2007 22:05:04 -0000
@@ -1842,13 +1842,14 @@
 {
 	struct pfsync_softc *sc = (struct pfsync_softc *)arg;
 	struct mbuf *m;
+	int error;
 
 	for(;;) {
 		IF_DEQUEUE(&sc->sc_ifq, m);
 		if (m == NULL)
 			break;
-		if (ip_output(m, NULL, NULL, IP_RAWOUTPUT, &sc->sc_imo, NULL))
-			pfsyncstats.pfsyncs_oerrors++;
+		if ((error = ip_output(m, NULL, NULL, IP_RAWOUTPUT, &sc->sc_imo, NULL)))
+			printf("pfsync_senddef: ip_output %d\n", error);
 	}
 }
 

--Boundary-01=_ZNLzG1Tc2e5FeEr--

--nextPart4352611.joD4R0DG9A
Content-Type: application/pgp-signature; name=signature.asc 
Content-Description: This is a digitally signed message part.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.4 (FreeBSD)

iD8DBQBGzLNoXyyEoT62BG0RAumTAJ9vPlcsOSvv6Yk1MFJfCVSXexlshACePK7U
CjFj//r6E77RGekurLnNOoc=
=scAm
-----END PGP SIGNATURE-----

--nextPart4352611.joD4R0DG9A--


