Date: Thu, 20 Jul 2006 17:53:29 +0200
From: Michal Mertl <mime@traveller.cz>
To: freebsd-stable@freebsd.org
Subject: Kernel panic with PF
Message-ID: <1153410809.1126.66.camel@genius.i.cz>

Hello,

I am deploying a firewall built on FreeBSD-based application proxies (www.kernun.com, but not much English there) and am having frequent panics of RELENG_6_1 under load. The server has IP forwarding disabled. I've got two machines in a carp cluster, and the transparent proxies use PF to get the data.

I don't know much about kernel internals or PF, but from the following backtrace I understand that the crash happens because rpool->cur on line 2158 in src/sys/contrib/pf/net/pf.c is NULL and is dereferenced. It probably shouldn't happen, yet it does.

The machines are SMP and were running an SMP kernel. The only places where pool.cur (or pool->cur) is assigned are in pf_ioctl.c. There seem to be some lock operations there, though, so the code is presumably believed to be properly locked.

I ran with kern.smp.disabled=1 for a short while before I put the old firewall in place and didn't see the panic, but the time was definitely too short to make me believe it fixes the issue. Could setting debug.mpsafenet to 0 possibly also help?

I could probably band-aid this particular failure mode by returning failure instead of panicking, but the bug is probably elsewhere. I've lost the debug kernel that this backtrace came from and therefore can't investigate much further :-(. Unfortunately, so far I can only reproduce the problem in production, and for obvious reasons I can't put it there.

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x28
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xffffffff801ab528
stack pointer           = 0x10:0xffffffffb1ade650
frame pointer           = 0x10:0xffffff004cc7cc30
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 15 (swi1: net)
trap number             = 12
panic: page fault

#0  doadump () at pcpu.h:172
#1  0x0000000000000004 in ?? ()
#2  0xffffffff803d5137 in boot (howto=260)
    at ../../../kern/kern_shutdown.c:402
#3  0xffffffff803d58a1 in panic (fmt=0xffffff007ba32000 "@\223<A3>{")
    at ../../../kern/kern_shutdown.c:558
#4  0xffffffff80543b3f in trap_fatal (frame=0xffffff007ba32000,
    eva=18446742976272241472) at ../../../amd64/amd64/trap.c:660
#5  0xffffffff80543e5f in trap_pfault (frame=0xffffffffb1ade5a0, usermode=0)
    at ../../../amd64/amd64/trap.c:573
#6  0xffffffff80544113 in trap (frame=
      {tf_rdi = 2, tf_rsi = -1098223465792, tf_rdx = -1098439497700,
      tf_rcx = -1314002464, tf_r8 = 0, tf_r9 = -1314002776, tf_rax = 0,
      tf_rbx = 0, tf_rbp = -1098223465424, tf_r10 = 1, tf_r11 = 257,
      tf_r12 = -1098439497700, tf_r13 = -1314002776, tf_r14 = 2,
      tf_r15 = -1314002464, tf_trapno = 12, tf_addr = 40,
      tf_flags = 216171684640539392, tf_err = 0, tf_rip = -2145733336,
      tf_cs = 8, tf_rflags = 66118, tf_rsp = -1314003360, tf_ss = 16})
    at ../../../amd64/amd64/trap.c:352
#7  0xffffffff8052feab in calltrap ()
    at ../../../amd64/amd64/exception.S:168
#8  0xffffffff801ab528 in pf_map_addr (af=2 '\002', r=0xffffff004cc7cac0,
    saddr=0xffffff003fe7681c, naddr=0xffffffffb1ade9e0, init_addr=0x0,
    sn=0xffffffffb1ade8a8) at ../../../contrib/pf/net/pf.c:2163
#9  0xffffffff801acab6 in pf_get_translation (pd=0xffffffffb1ade9c0,
    m=0xffffff0042ede900, off=20, direction=1, kif=0xffffff007b038a00,
    sn=0xffffffffb1ade8a8, saddr=0xffffff003fe7681c, sport=0,
    daddr=0xffffff003fe76820, dport=50881, naddr=0xffffffffb1ade9e0,
    nport=0xffffffffb1ade8b6) at ../../../contrib/pf/net/pf.c:2618
#10 0xffffffff801b315b in pf_test_tcp (rm=0xffffffffb1ade960,
    sm=0xffffffffb1ade950, direction=1, kif=0xffffff007b038a00,
    m=0xffffff0042ede900, off=20, h=0xffffff003fe76810,
    pd=0xffffffffb1ade9c0, am=0xffffffffb1ade968, rsm=0xffffffffb1ade970,
    ifq=0x2, inp=0x0) at ../../../contrib/pf/net/pf.c:3013
#11 0xffffffff801b5694 in pf_test (dir=1, ifp=0xffffff0000bee800,
    m0=0xffffffffb1adeaa0, eh=0xffffffffb1ade97e, inp=0x0)
    at ../../../contrib/pf/net/pf.c:6449
#12 0xffffffff801bafb2 in pf_check_in (arg=0x2, m=0xffffffffb1adeaa0,
    ifp=0xffffff004cc7cac0, dir=-1314002464, inp=0xffffffffb1ade9e0)
    at ../../../contrib/pf/net/pf_ioctl.c:3358
#13 0xffffffff80461c2e in pfil_run_hooks (ph=0xffffffff807e0920,
    mp=0xffffffffb1adeb28, ifp=0xffffff0000bee800, dir=1, inp=0x0)
    at ../../../net/pfil.c:139
#14 0xffffffff8048d225 in ip_input (m=0xffffff0042ede900)
    at ../../../netinet/ip_input.c:465
#15 0xffffffff8046180c in netisr_processqueue (ni=0xffffffff807df690)
    at ../../../net/netisr.c:236
#16 0xffffffff80461abd in swi_net (dummy=0x2)
    at ../../../net/netisr.c:349
#17 0xffffffff803bbd99 in ithread_loop (arg=0xffffff00000506a0)
    at ../../../kern/kern_intr.c:684
#18 0xffffffff803ba527 in fork_exit (
    callout=0xffffffff803bbc50 <ithread_loop>, arg=0xffffff00000506a0,
    frame=0xffffffffb1adec50) at ../../../kern/kern_fork.c:805
#19 0xffffffff8053020e in fork_trampoline ()
    at ../../../amd64/amd64/exception.S:394
#20 0x0000000000000000 in ?? ()

The firewall also reports lots of PF problems during operation:

Jul 20 10:44:11 fw1 kernel: Jul 20 10:44:11 fw1 HTTP[7607]: KERN-100-E [natutil.c:770] ioctl(): Invalid argument (EINVAL=22)
Jul 20 10:44:11 fw1 kernel: Jul 20 10:44:11 fw1 HTTP[7607]: NATT-111-E add_rule(): PF ioctl DIOCADDRULE failed
Jul 20 10:44:11 fw1 kernel: Jul 20 10:44:11 fw1 HTTP[7607]: NATT-701-E addnatmap out(): Adding TCP NAT MAP from [127.0.0.1]:60860 to [212.80.76.13]:80 -> [193.179.161.10]:60860 failed
Jul 20 10:44:11 fw1 kernel: Jul 20 10:44:11 fw1 HTTP[7607]: NETL-210-E netbind(server,10): NAT binding failed

The kernel often reports "pool_ticket: 1429 != 1430" (with the numbers increasing over time).

Thank you very much for any advice.

Regards

Michal