From: "Catalin Miclaus"
To: "FreeBSD-Questions"
Date: Sun, 8 Jun 2008 20:31:11 +0100
Subject: kernel crash coredump help
Message-ID: <3A0AA7018522134597ED63B3B794C92A0213C690@STA-HQ-S001.starcomms.local>

Hi,

We installed a new Dell 2950 server with FreeBSD 7.0 and have run into a problem with it.

Hardware: Dell 2950, Intel(R) Xeon(R) Dual CPU Quad-Core E5335 @ 2.00GHz (1995.01-MHz K8-class CPU), 4079 MB RAM, 2 x 250GB SATA HDD.

This was a normal server install using the developer distribution (all sources, without games). We upgraded to 7.0-p1#, then recompiled the kernel from GENERIC plus:

device pf
device pfsync
device pflog
device carp
options HZ=1000
options DEVICE_POLLING

The server runs as the secondary PF firewall with CARP/PFSYNC/IFSTATED. The additional services running on it are bind, net-snmp and ssh. We have 7 other servers running similar services on FreeBSD 6.2 and 7.0, all running fine.

Later the same day the server crashed. Traffic was on the MASTER CARP server when the crash happened, and the server was not under load: according to the NMS reports, CPU was at 0% and memory at 10%.

We were able to get a crash dump:

[root@fw2 FW]# kgdb kernel.debug /var/crash/vmcore.0
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".

Unread portion of the kernel message buffer:
<7>arp_rtrequest: bad gateway 196.3.61.14 (!AF_LINK)

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xda040020
fault code              = supervisor read data, page not present
instruction pointer     = 0x8:0xffffffff80666070
stack pointer           = 0x10:0xffffffffac3e0650
frame pointer           = 0x10:0xffffff00cfb42820
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 19 (swi1: net)
trap number             = 12
panic: page fault
cpuid = 0
Uptime: 2h10m11s
Physical memory: 4079 MB
Dumping 425 MB: 410 394 378 362 346 330 314 298 282 266 250 234 218 202 186 170 154 138 122 106 90 74 58 42 26 10

#0 doadump () at pcpu.h:194
194             __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) list *0xffffffff80666070
0xffffffff80666070 is in uma_zfree_internal (uma_int.h:368).
363             int hval;
364
365             hval = UMA_HASH(hash, data);
366
367             SLIST_FOREACH(slab, &hash->uh_slab_hash[hval], us_hlink) {
368                     if ((u_int8_t *)slab->us_data == data)
369                             return (slab);
370             }
371             return (NULL);
372     }
(kgdb) backtrace
#0 doadump () at pcpu.h:194
#1 0x0000000000000004 in ?? ()
#2 0xffffffff80497ea9 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#3 0xffffffff804982ad in panic (fmt=0x104
) at /usr/src/sys/kern/kern_shutdown.c:563
#4 0xffffffff8071ad64 in trap_fatal (frame=0xffffff00010e0340, eva=18446742974215697512) at /usr/src/sys/amd64/amd64/trap.c:724
#5 0xffffffff8071b135 in trap_pfault (frame=0xffffffffac3e05a0, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:641
#6 0xffffffff8071ba78 in trap (frame=0xffffffffac3e05a0) at /usr/src/sys/amd64/amd64/trap.c:410
#7 0xffffffff807016de in calltrap () at /usr/src/sys/amd64/amd64/exception.S:169
#8 0xffffffff80666070 in uma_zfree_internal (zone=0xffffff00cfb42820, item=0xffffff0003b2e000, udata=0x0, skip=Variable "skip" is not available.) at uma_int.h:367
#9 0xffffffff8066909b in uma_zfree_arg (zone=0xffffff00cfb42820, item=0xffffff0003b2e000, udata=0x0) at /usr/src/sys/vm/uma_core.c:2405
#10 0xffffffff80665fe4 in uma_zfree_internal (zone=0xffffff00cfb429c0, item=0xffffff0003a86600, udata=0x0, skip=Variable "skip" is not available.) at /usr/src/sys/vm/uma_core.c:2434
#11 0xffffffff80666bba in bucket_drain (zone=0xffffff00cfb429c0, bucket=0xffffff0003a94830) at /usr/src/sys/vm/uma_core.c:595
#12 0xffffffff80666cab in bucket_cache_drain (zone=0xffffff00cfb429c0) at /usr/src/sys/vm/uma_core.c:662
#13 0xffffffff8066996b in zone_drain (zone=0xffffff00cfb429c0) at /usr/src/sys/vm/uma_core.c:710
#14 0xffffffff801b7f95 in pfsync_get_mbuf (sc=0xffffff0003573400, action=2 '\002', sp=0xffffff0003573570) at mbuf.h:529
#15 0xffffffff801b8208 in pfsync_pack_state (action=Variable "action" is not available.) at /usr/src/sys/contrib/pf/net/if_pfsync.c:1512
#16 0xffffffff801ce863 in pf_test (dir=1, ifp=0xffffff000128d800, m0=0xffffffffac3e0a00, eh=Variable "eh" is not available.) at /usr/src/sys/contrib/pf/net/pf.c:6955
#17 0xffffffff801d360a in pf_check_in (arg=Variable "arg" is not available.) at /usr/src/sys/contrib/pf/net/pf_ioctl.c:3533
#18 0xffffffff80539561 in pfil_run_hooks (ph=Variable "ph" is not available.)
at /usr/src/sys/net/pfil.c:78
#19 0xffffffff80574e2b in ip_input (m=0xffffff0036190500) at /usr/src/sys/netinet/ip_input.c:417
#20 0xffffffff8052dee1 in ether_demux (ifp=0xffffff000128d800, m=0xffffff0036190500) at /usr/src/sys/net/if_ethersubr.c:834
#21 0xffffffff8052e181 in ether_input (ifp=0xffffff000128d800, m=0xffffff0036190500) at /usr/src/sys/net/if_ethersubr.c:692
#22 0xffffffff802d77ac in em_rxeof (adapter=0xffffff000122f000, count=119) at /usr/src/sys/dev/em/if_em.c:4542
#23 0xffffffff802d84d7 in em_poll (ifp=0xffffff000128d800, cmd=Variable "cmd" is not available.) at /usr/src/sys/dev/em/if_em.c:1433
#24 0xffffffff8048dd8d in netisr_poll () at /usr/src/sys/kern/kern_poll.c:432
#25 0xffffffff80537e8a in swi_net (dummy=Variable "dummy" is not available.) at /usr/src/sys/net/netisr.c:254
#26 0xffffffff8047b5a0 in ithread_loop (arg=0xffffff00010d9b80) at /usr/src/sys/kern/kern_intr.c:1036
#27 0xffffffff80478673 in fork_exit (callout=0xffffffff8047b430, arg=0xffffff00010d9b80, frame=0xffffffffac3e0c80) at /usr/src/sys/kern/kern_fork.c:781
#28 0xffffffff80701aae in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:415
#29 0x0000000000000000 in ?? ()
#30 0x0000000000000000 in ?? ()
#31 0x0000000000000001 in ?? ()
#32 0x0000000000000000 in ?? ()
#33 0x0000000000000000 in ?? ()
#34 0x0000000000000000 in ?? ()
#35 0x0000000000000000 in ?? ()
#36 0x0000000000000000 in ?? ()
#37 0x0000000000000000 in ?? ()
#38 0x0000000000000000 in ?? ()
#39 0x0000000000000000 in ?? ()
#40 0x0000000000000000 in ?? ()
#41 0x0000000000000000 in ?? ()
#42 0x0000000000000000 in ?? ()
#43 0x0000000000000000 in ?? ()
#44 0x0000000000000000 in ?? ()
#45 0x0000000000000000 in ?? ()
#46 0x0000000000000000 in ?? ()
#47 0x0000000000000000 in ?? ()
#48 0x0000000000000000 in ?? ()
#49 0x0000000000000000 in ?? ()
#50 0x0000000000000000 in ?? ()
#51 0x0000000000000000 in ?? ()
#52 0x0000000000000000 in ?? ()
#53 0x0000000000c9c000 in ??
()
#54 0xffffff00010e0340 in ?? ()
#55 0x0000000000000001 in ?? ()
#56 0xffffff00010f3468 in ?? ()
#57 0xffffff00010e0680 in ?? ()
#58 0xffffff00010e0340 in ?? ()
#59 0xffffffffac3e0b58 in ?? ()
#60 0xffffff00010e0340 in ?? ()
#61 0xffffffff804b5b69 in sched_switch (td=0xffffff00010d9b80, newtd=0xffffffff8047b430, flags=0) at /usr/src/sys/kern/sched_4bsd.c:905
#62 0x0000000000000000 in ?? ()
#63 0x0000000000000000 in ?? ()
#64 0x0000000000000000 in ?? ()
#65 0x0000000000000000 in ?? ()
#66 0x0000000000000000 in ?? ()
#67 0x0000000000000000 in ?? ()
#68 0x0000000000000000 in ?? ()
#69 0x0000000000000000 in ?? ()
#70 0x0000000000000000 in ?? ()
#71 0x0000000000000000 in ?? ()
#72 0x0000000000000000 in ?? ()
#73 0x0000000000000000 in ?? ()
#74 0x0000000000000000 in ?? ()
#75 0x0000000000000000 in ?? ()
#76 0x0000000000000000 in ?? ()
#77 0x0000000000000000 in ?? ()
#78 0x0000000000000000 in ?? ()
#79 0x0000000000000000 in ?? ()
#80 0x0000000000000000 in ?? ()
#81 0x0000000000000000 in ?? ()
#82 0x0000000000000000 in ?? ()
#83 0x0000000000000000 in ?? ()
#84 0x0000000000000000 in ?? ()
#85 0x0000000000000000 in ?? ()
#86 0x0000000000000000 in ?? ()
#87 0x0000000000000000 in ?? ()
#88 0x0000000000000000 in ?? ()
#89 0x0000000000000000 in ?? ()
#90 0x0000000000000000 in ?? ()
#91 0x0000000000000000 in ?? ()
#92 0x0000000000000000 in ?? ()
#93 0x0000000000000000 in ?? ()
#94 0x0000000000000000 in ?? ()
#95 0x0000000000000000 in ?? ()
#96 0x0000000000000000 in ?? ()
#97 0x0000000000000000 in ?? ()
#98 0x0000000000000000 in ?? ()
#99 0x0000000000000000 in ?? ()
#100 0x0000000000000000 in ?? ()
#101 0x0000000000000000 in ?? ()
#102 0x0000000000000000 in ?? ()
#103 0x0000000000000000 in ?? ()
#104 0x0000000000000000 in ?? ()
#105 0x0000000000000000 in ?? ()
#106 0x0000000000000000 in ?? ()
#107 0x0000000000000000 in ?? ()
#108 0x0000000000000000 in ?? ()
#109 0x0000000000000000 in ?? ()
#110 0x0000000000000000 in ?? ()
#111 0x0000000000000000 in ??
()
#112 0x0000000000000000 in ?? ()
#113 0x0000000000000000 in ?? ()
#114 0x0000000000000000 in ?? ()
#115 0x0000000000000000 in ?? ()
#116 0x0000000000000000 in ?? ()
#117 0x0000000000000000 in ?? ()
#118 0x0000000000000000 in ?? ()
#119 0x0000000000000000 in ?? ()
#120 0x0000000000000000 in ?? ()
#121 0x0000000000000000 in ?? ()
#122 0x0000000000000000 in ?? ()
#123 0x0000000000000000 in ?? ()
#124 0x0000000000000000 in ?? ()
#125 0x0000000000000000 in ?? ()
#126 0x0000000000000000 in ?? ()
#127 0x0000000000000000 in ?? ()
#128 0x0000000000000000 in ?? ()
#129 0x0000000000000000 in ?? ()
#130 0x0000000000000000 in ?? ()
#131 0x0000000000000000 in ?? ()
#132 0x0000000000000000 in ?? ()
#133 0x0000000000000000 in ?? ()
Cannot access memory at address 0xffffffffac3e1000
(kgdb)

We would appreciate your help in determining whether this is a hardware failure or whether we have simply hit a bug.

Best Regards
Catalin Miclaus
Network/Security ISP-Data
Starcomms Ltd.