Date:      Fri, 21 Jul 2006 02:05:45 +0200
From:      Max Laier <max@love2party.net>
To:        freebsd-stable@freebsd.org
Cc:        Michal Mertl <mime@traveller.cz>, freebsd-pf@freebsd.org
Subject:   Re: Kernel panic with PF
Message-ID:  <200607210205.51614.max@love2party.net>
In-Reply-To: <1153410809.1126.66.camel@genius.i.cz>
References:  <1153410809.1126.66.camel@genius.i.cz>

[CC'ing -pf]

On Thursday 20 July 2006 17:53, Michal Mertl wrote:
> Hello,
>
> I am deploying a FreeBSD-based firewall built on application proxies
> (www.kernun.com, but not much English there) and am seeing frequent
> panics of RELENG_6_1 under load. The server has IP forwarding disabled.
>
> I've got two machines in a carp cluster and the transparent proxies use
> PF to get the data.

Which proxies are you using?  The "pool_ticket: 1429 != 1430" messages you
quote below indicate a synchronization problem within the app talking to pf
via ioctls.  Tickets are used to ensure atomic commits for operations that
require more than one ioctl.  If your proxy app runs in parallel, it might
screw up the internal state and thus leave it undefined afterwards.  I grant
you that this shouldn't cause a kernel problem, but if we could fix the app,
we could probably find the right sanity check more easily.
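
To illustrate the ticket idea, here is a toy userland model in Python.  It is
only a sketch under assumptions, not the pf kernel code: PoolBuffer,
begin_addrs, add_addr, add_rule and add_nat_rule are hypothetical names,
loosely modelled on a "begin / add address / add rule" ioctl sequence, and
EBUSY is just a placeholder error value for the model.  It shows how two
callers that interleave their sequences invalidate each other's ticket, and
how serializing the whole sequence avoids that.

```python
import errno
import threading


class PoolBuffer:
    """Toy stand-in for a ticket-protected staging buffer (not real pf code)."""

    def __init__(self):
        self.ticket = 0   # bumped every time a new transaction begins
        self.addrs = []   # addresses staged for the next rule

    def begin_addrs(self):
        # Start a transaction: drop anything staged and hand out a new ticket.
        self.addrs.clear()
        self.ticket += 1
        return self.ticket

    def add_addr(self, ticket, addr):
        # Only the holder of the current ticket may stage an address.
        if ticket != self.ticket:
            return errno.EBUSY
        self.addrs.append(addr)
        return 0

    def add_rule(self, pool_ticket, rules):
        # Commit the staged addresses into a rule if the ticket is current.
        if pool_ticket != self.ticket:
            print(f"pool_ticket: {pool_ticket} != {self.ticket}")
            return errno.EBUSY
        rules.append(list(self.addrs))
        self.addrs.clear()
        return 0


def interleaved_demo():
    """Caller B begins before caller A has committed, so A's ticket is stale."""
    buf, rules = PoolBuffer(), []
    t_a = buf.begin_addrs()
    t_b = buf.begin_addrs()
    rc_a_add = buf.add_addr(t_a, "193.179.161.10")  # stale ticket
    rc_b_add = buf.add_addr(t_b, "193.179.161.10")
    rc_a = buf.add_rule(t_a, rules)                 # prints the mismatch
    rc_b = buf.add_rule(t_b, rules)
    return rc_a_add, rc_b_add, rc_a, rc_b, rules


def add_nat_rule(buf, rules, addr, lock):
    """Run the whole begin/add/commit sequence under one lock."""
    with lock:
        ticket = buf.begin_addrs()
        rc = buf.add_addr(ticket, addr)
        if rc == 0:
            rc = buf.add_rule(ticket, rules)
        return rc


if __name__ == "__main__":
    print(interleaved_demo())
    buf, rules, lock = PoolBuffer(), [], threading.Lock()
    print(add_nat_rule(buf, rules, "193.179.161.10", lock), rules)
```

A threading.Lock only covers threads inside one process.  If the proxies run
as separate processes, the same idea would need a lock that all of them
share.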

> I don't know much about kernel internals and PF but from the following
> backtrace I understand that the crash happens because rpool->cur on line
> 2158 in src/sys/contrib/pf/net/pf.c is NULL and is dereferenced. It
> probably shouldn't happen yet it does.
>
> The machines are SMP and were running an SMP kernel. The only places where
> pool.cur (or pool->cur) is assigned to are in pf_ioctl.c. It seems there
> are some lock operations though, so it is probably believed that the
> code is properly locked.
>
> I ran with kern.smp.disabled=1 for a moment before I put the old
> firewall back in place and didn't see the panic, but the time was
> definitely too short to make me believe it fixes the issue. Can setting
> debug.mpsafenet to 0 possibly also help?
>
> I could probably band-aid this particular failure mode by returning
> failure instead of panicking, but the bug is probably elsewhere.
>
> I've lost the debug kernel that produced this backtrace and therefore
> can't investigate much further :-(. Unfortunately, so far I can only
> reproduce the problem in production, and for obvious reasons I can't put
> it there.
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x28
> fault code              = supervisor read, page not present
> instruction pointer     = 0x8:0xffffffff801ab528
> stack pointer           = 0x10:0xffffffffb1ade650
> frame pointer           = 0x10:0xffffff004cc7cc30
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 15 (swi1: net)
> trap number             = 12
> panic: page fault
>
> #0  doadump () at pcpu.h:172
> #1  0x0000000000000004 in ?? ()
> #2  0xffffffff803d5137 in boot (howto=260)
>     at ../../../kern/kern_shutdown.c:402
> #3  0xffffffff803d58a1 in panic (fmt=0xffffff007ba32000 "@\223<A3>{")
>     at ../../../kern/kern_shutdown.c:558
> #4  0xffffffff80543b3f in trap_fatal (frame=0xffffff007ba32000,
>     eva=18446742976272241472) at ../../../amd64/amd64/trap.c:660
> #5  0xffffffff80543e5f in trap_pfault (frame=0xffffffffb1ade5a0, usermode=0)
>     at ../../../amd64/amd64/trap.c:573
> #6  0xffffffff80544113 in trap (frame=
>       {tf_rdi = 2, tf_rsi = -1098223465792, tf_rdx = -1098439497700,
>        tf_rcx = -1314002464, tf_r8 = 0, tf_r9 = -1314002776, tf_rax = 0,
>        tf_rbx = 0, tf_rbp = -1098223465424, tf_r10 = 1, tf_r11 = 257,
>        tf_r12 = -1098439497700, tf_r13 = -1314002776, tf_r14 = 2,
>        tf_r15 = -1314002464, tf_trapno = 12, tf_addr = 40,
>        tf_flags = 216171684640539392, tf_err = 0, tf_rip = -2145733336,
>        tf_cs = 8, tf_rflags = 66118, tf_rsp = -1314003360, tf_ss = 16})
>     at ../../../amd64/amd64/trap.c:352
> #7  0xffffffff8052feab in calltrap ()
>     at ../../../amd64/amd64/exception.S:168
> #8  0xffffffff801ab528 in pf_map_addr (af=2 '\002', r=0xffffff004cc7cac0,
>     saddr=0xffffff003fe7681c, naddr=0xffffffffb1ade9e0, init_addr=0x0,
>     sn=0xffffffffb1ade8a8) at ../../../contrib/pf/net/pf.c:2163
> #9  0xffffffff801acab6 in pf_get_translation (pd=0xffffffffb1ade9c0,
>     m=0xffffff0042ede900, off=20, direction=1, kif=0xffffff007b038a00,
>     sn=0xffffffffb1ade8a8, saddr=0xffffff003fe7681c, sport=0,
>     daddr=0xffffff003fe76820, dport=50881, naddr=0xffffffffb1ade9e0,
>     nport=0xffffffffb1ade8b6) at ../../../contrib/pf/net/pf.c:2618
> #10 0xffffffff801b315b in pf_test_tcp (rm=0xffffffffb1ade960,
>     sm=0xffffffffb1ade950, direction=1, kif=0xffffff007b038a00,
>     m=0xffffff0042ede900, off=20, h=0xffffff003fe76810,
>     pd=0xffffffffb1ade9c0, am=0xffffffffb1ade968, rsm=0xffffffffb1ade970,
>     ifq=0x2, inp=0x0) at ../../../contrib/pf/net/pf.c:3013
> #11 0xffffffff801b5694 in pf_test (dir=1, ifp=0xffffff0000bee800,
>     m0=0xffffffffb1adeaa0, eh=0xffffffffb1ade97e, inp=0x0)
>     at ../../../contrib/pf/net/pf.c:6449
> #12 0xffffffff801bafb2 in pf_check_in (arg=0x2, m=0xffffffffb1adeaa0,
>     ifp=0xffffff004cc7cac0, dir=-1314002464, inp=0xffffffffb1ade9e0)
>     at ../../../contrib/pf/net/pf_ioctl.c:3358
> #13 0xffffffff80461c2e in pfil_run_hooks (ph=0xffffffff807e0920,
>     mp=0xffffffffb1adeb28, ifp=0xffffff0000bee800, dir=1, inp=0x0)
>     at ../../../net/pfil.c:139
> #14 0xffffffff8048d225 in ip_input (m=0xffffff0042ede900)
>     at ../../../netinet/ip_input.c:465
> #15 0xffffffff8046180c in netisr_processqueue (ni=0xffffffff807df690)
>     at ../../../net/netisr.c:236
> #16 0xffffffff80461abd in swi_net (dummy=0x2)
>     at ../../../net/netisr.c:349
> #17 0xffffffff803bbd99 in ithread_loop (arg=0xffffff00000506a0)
>     at ../../../kern/kern_intr.c:684
> #18 0xffffffff803ba527 in fork_exit (
>     callout=0xffffffff803bbc50 <ithread_loop>, arg=0xffffff00000506a0,
>     frame=0xffffffffb1adec50) at ../../../kern/kern_fork.c:805
> #19 0xffffffff8053020e in fork_trampoline ()
>     at ../../../amd64/amd64/exception.S:394
> #20 0x0000000000000000 in ?? ()
>
> The firewall also reports lots of PF problems during operation:
>
> Jul 20 10:44:11 fw1 kernel: Jul 20 10:44:11 fw1 HTTP[7607]: KERN-100-E
> [natutil.c:770] ioctl(): Invalid argument (EINVAL=22)
> Jul 20 10:44:11 fw1 kernel: Jul 20 10:44:11 fw1 HTTP[7607]: NATT-111-E
> add_rule(): PF ioctl DIOCADDRULE failed
> Jul 20 10:44:11 fw1 kernel: Jul 20 10:44:11 fw1 HTTP[7607]: NATT-701-E
> addnatmap out(): Adding TCP NAT MAP from [127.0.0.1]:60860 to
> [212.80.76.13]:80 -> [193.179.161.10]:60860 failed
> Jul 20 10:44:11 fw1 kernel: Jul 20 10:44:11 fw1 HTTP[7607]: NETL-210-E
> netbind(server,10): NAT binding failed
>
> The kernel often reports "pool_ticket: 1429 != 1430" (with increasing
> numbers over time).
>
> Thank you very much for any advice.
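
As a cross-check on the NULL-dereference reading of the quoted trap output,
the following small Python sketch converts the signed decimal values the
debugger prints in the trap frame back into 64-bit hex.  The helper name
as_u64_hex is made up for this example; the two input values are copied
from the quoted frame (tf_addr and tf_rip).

```python
def as_u64_hex(value):
    """Render a (possibly negative) register value as unsigned 64-bit hex."""
    return hex(value & 0xFFFFFFFFFFFFFFFF)


# tf_addr from the quoted trap frame: the faulting address.
print(as_u64_hex(40))           # 0x28
# tf_rip from the quoted trap frame: the faulting instruction.
print(as_u64_hex(-2145733336))  # 0xffffffff801ab528
```

The first value matches "fault virtual address = 0x28", a small offset from
address zero, which is what one would expect from reading a structure member
through a NULL pointer.  The second matches both the reported instruction
pointer and the address of frame #8 (pf_map_addr), so the fault did occur
inside that function.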

-- 
/"\  Best regards,                      | mlaier@freebsd.org
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mlaier@EFnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News
