From: Max Laier
Organization: FreeBSD
To: freebsd-stable@freebsd.org
Cc: Michal Mertl, freebsd-pf@freebsd.org
Date: Fri, 21 Jul 2006 02:05:45 +0200
Subject: Re: Kernel panic with PF
Message-Id: <200607210205.51614.max@love2party.net>
In-Reply-To: <1153410809.1126.66.camel@genius.i.cz>

[CC'ing -pf]

On Thursday 20 July 2006 17:53, Michal Mertl wrote:
> Hello,
>
> I am deploying a FreeBSD-based firewall built on application proxies
> (www.kernun.com, but not much English there) and am having frequent
> panics of RELENG_6_1 under load. The server has IP forwarding disabled.
>
> I've got two machines in a carp cluster and the transparent proxies use
> PF to get the data.

Which proxies are you using? The "pool_ticket: 1429 != 1430" messages you
quote below indicate a synchronization problem within the app talking to pf
via ioctls. Tickets are used to ensure atomic commits for operations that
require more than one ioctl. If your proxy app runs in parallel, it might
screw up the internal state and thus leave it undefined afterwards. I grant
you that this shouldn't cause a kernel problem, but if we can fix the app
we can probably find the right sanity check more easily.

> I don't know much about kernel internals and PF, but from the following
> backtrace I understand that the crash happens because rpool->cur on line
> 2158 in src/sys/contrib/pf/net/pf.c is NULL and is dereferenced. It
> probably shouldn't happen, yet it does.
>
> The machines are SMP and were running an SMP kernel. The only places
> where pool.cur (or pool->cur) is assigned to are in pf_ioctl.c. It seems
> there are some lock operations though, so it is probably believed that
> the code is properly locked.
>
> I have been running with kern.smp.disabled=1 for a moment before I put
> the old firewall in place and haven't seen the panic, but the time was
> definitely too short to make me believe it fixes the issue. Can setting
> debug.mpsafenet to 0 possibly also help?
>
> I could probably band-aid this particular failure mode by returning
> failure instead of panicking, but the bug is probably elsewhere.
>
> I've lost the debug kernel from which this backtrace is and therefore
> can't continue much :-(. Unfortunately, so far I can only reproduce
> the problem in production, and for obvious reasons I can't put it there.
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x28
> fault code              = supervisor read, page not present
> instruction pointer     = 0x8:0xffffffff801ab528
> stack pointer           = 0x10:0xffffffffb1ade650
> frame pointer           = 0x10:0xffffff004cc7cc30
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 15 (swi1: net)
> trap number             = 12
> panic: page fault
>
> #0  doadump () at pcpu.h:172
> #1  0x0000000000000004 in ?? ()
> #2  0xffffffff803d5137 in boot (howto=260)
>     at ../../../kern/kern_shutdown.c:402
> #3  0xffffffff803d58a1 in panic (fmt=0xffffff007ba32000 "@\223{")
>     at ../../../kern/kern_shutdown.c:558
> #4  0xffffffff80543b3f in trap_fatal (frame=0xffffff007ba32000,
>     eva=18446742976272241472) at ../../../amd64/amd64/trap.c:660
> #5  0xffffffff80543e5f in trap_pfault (frame=0xffffffffb1ade5a0,
>     usermode=0) at ../../../amd64/amd64/trap.c:573
> #6  0xffffffff80544113 in trap (frame=
>       {tf_rdi = 2, tf_rsi = -1098223465792, tf_rdx = -1098439497700,
>        tf_rcx = -1314002464, tf_r8 = 0, tf_r9 = -1314002776, tf_rax = 0,
>        tf_rbx = 0, tf_rbp = -1098223465424, tf_r10 = 1, tf_r11 = 257,
>        tf_r12 = -1098439497700, tf_r13 = -1314002776, tf_r14 = 2,
>        tf_r15 = -1314002464, tf_trapno = 12, tf_addr = 40,
>        tf_flags = 216171684640539392, tf_err = 0, tf_rip = -2145733336,
>        tf_cs = 8, tf_rflags = 66118, tf_rsp = -1314003360, tf_ss = 16})
>     at ../../../amd64/amd64/trap.c:352
> #7  0xffffffff8052feab in calltrap ()
>     at ../../../amd64/amd64/exception.S:168
> #8  0xffffffff801ab528 in pf_map_addr (af=2 '\002',
>     r=0xffffff004cc7cac0, saddr=0xffffff003fe7681c,
>     naddr=0xffffffffb1ade9e0, init_addr=0x0, sn=0xffffffffb1ade8a8)
>     at ../../../contrib/pf/net/pf.c:2163
> #9  0xffffffff801acab6 in pf_get_translation (pd=0xffffffffb1ade9c0,
>     m=0xffffff0042ede900, off=20, direction=1, kif=0xffffff007b038a00,
>     sn=0xffffffffb1ade8a8, saddr=0xffffff003fe7681c, sport=0,
>     daddr=0xffffff003fe76820, dport=50881, naddr=0xffffffffb1ade9e0,
>     nport=0xffffffffb1ade8b6) at ../../../contrib/pf/net/pf.c:2618
> #10 0xffffffff801b315b in pf_test_tcp (rm=0xffffffffb1ade960,
>     sm=0xffffffffb1ade950, direction=1, kif=0xffffff007b038a00,
>     m=0xffffff0042ede900, off=20, h=0xffffff003fe76810,
>     pd=0xffffffffb1ade9c0, am=0xffffffffb1ade968,
>     rsm=0xffffffffb1ade970, ifq=0x2, inp=0x0)
>     at ../../../contrib/pf/net/pf.c:3013
> #11 0xffffffff801b5694 in pf_test (dir=1, ifp=0xffffff0000bee800,
>     m0=0xffffffffb1adeaa0, eh=0xffffffffb1ade97e, inp=0x0)
>     at ../../../contrib/pf/net/pf.c:6449
> #12 0xffffffff801bafb2 in pf_check_in (arg=0x2, m=0xffffffffb1adeaa0,
>     ifp=0xffffff004cc7cac0, dir=-1314002464, inp=0xffffffffb1ade9e0)
>     at ../../../contrib/pf/net/pf_ioctl.c:3358
> #13 0xffffffff80461c2e in pfil_run_hooks (ph=0xffffffff807e0920,
>     mp=0xffffffffb1adeb28, ifp=0xffffff0000bee800, dir=1, inp=0x0)
>     at ../../../net/pfil.c:139
> #14 0xffffffff8048d225 in ip_input (m=0xffffff0042ede900)
>     at ../../../netinet/ip_input.c:465
> #15 0xffffffff8046180c in netisr_processqueue (ni=0xffffffff807df690)
>     at ../../../net/netisr.c:236
> #16 0xffffffff80461abd in swi_net (dummy=0x2)
>     at ../../../net/netisr.c:349
> #17 0xffffffff803bbd99 in ithread_loop (arg=0xffffff00000506a0)
>     at ../../../kern/kern_intr.c:684
> #18 0xffffffff803ba527 in fork_exit (
>     callout=0xffffffff803bbc50, arg=0xffffff00000506a0,
>     frame=0xffffffffb1adec50) at ../../../kern/kern_fork.c:805
> #19 0xffffffff8053020e in fork_trampoline ()
>     at ../../../amd64/amd64/exception.S:394
> #20 0x0000000000000000 in ?? ()
>
> The firewall also reports lots of PF problems during operation:
>
> Jul 20 10:44:11 fw1 kernel: Jul 20 10:44:11 fw1 HTTP[7607]: KERN-100-E [natutil.c:770] ioctl(): Invalid argument (EINVAL=22)
> Jul 20 10:44:11 fw1 kernel: Jul 20 10:44:11 fw1 HTTP[7607]: NATT-111-E add_rule(): PF ioctl DIOCADDRULE failed
> Jul 20 10:44:11 fw1 kernel: Jul 20 10:44:11 fw1 HTTP[7607]: NATT-701-E addnatmap out(): Adding TCP NAT MAP from [127.0.0.1]:60860 to [212.80.76.13]:80 -> [193.179.161.10]:60860 failed
> Jul 20 10:44:11 fw1 kernel: Jul 20 10:44:11 fw1 HTTP[7607]: NETL-210-E netbind(server,10): NAT binding failed
>
> Kernel often reports "pool_ticket: 1429 != 1430" (with increasing
> numbers over time).
>
> Thank you very much for any advice.

-- 
/"\  Best regards,                      | mlaier@freebsd.org
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mlaier@EFnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News
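
To illustrate the ticket mismatch discussed in this message, here is a minimal userspace sketch of a ticket-guarded, multi-step update. It is not pf's code: the struct, the model_* functions and the EBUSY return value are all hypothetical, and the pf(4) ioctl names in the comments are only a loose analogy for the begin / add / commit steps. It shows how a second client calling "begin" invalidates the first client's ticket, which would produce an off-by-one mismatch of the "1429 != 1430" kind.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of a ticket-guarded, multi-step update.
 * None of these names exist in pf; they only illustrate the idea.
 */
struct pool_model {
	uint32_t ticket;	/* ticket issued by the most recent begin */
	int	 staged;	/* entries added since that begin */
	int	 active;	/* entries made live by the last commit */
};

/* Step 1 (compare DIOCBEGINADDRS): drop staged entries, issue a ticket. */
static uint32_t
model_begin(struct pool_model *p)
{
	p->staged = 0;
	return (++p->ticket);
}

/* Step 2 (compare DIOCADDADDR): stage one entry if the ticket is current. */
static int
model_add(struct pool_model *p, uint32_t ticket)
{
	if (ticket != p->ticket)
		return (EBUSY);	/* someone else called begin in between */
	p->staged++;
	return (0);
}

/* Step 3 (compare DIOCADDRULE): make the staged entries live. */
static int
model_commit(struct pool_model *p, uint32_t ticket)
{
	if (ticket != p->ticket)
		return (EBUSY);
	p->active = p->staged;
	p->staged = 0;
	return (0);
}

/* Two unsynchronized clients: B's begin invalidates A's ticket. */
static int
model_interleaved(void)
{
	struct pool_model p = { 0, 0, 0 };
	uint32_t a = model_begin(&p);	/* client A obtains a ticket */
	uint32_t b = model_begin(&p);	/* client B obtains the next one */

	assert(b == a + 1);		/* tickets differ by exactly one */
	return (model_add(&p, a));	/* A's add is rejected as stale */
}
```

If the proxy processes each run their begin / add / commit sequence without excluding one another (for example with a lock file or a single helper process that talks to /dev/pf), this interleaving is the expected result under load. Serializing the whole sequence on the application side would be one way to test that theory.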