From owner-freebsd-stable@FreeBSD.ORG Tue Jun 7 16:36:27 2005
Date: Tue, 7 Jun 2005 11:32:04 -0500
Message-ID: <28FCC7CB4CF6EA43AF83BCA2096E97D013E55C@AUSEX2VS1.seton.org>
From: "Grooms, Matthew"
Subject: RE: 5.4-RELEASE lockups on amd64 SMP
List-Id: Production branch of FreeBSD source code

All,

Here is an update with more info. In addition to the lock order reversal, this is the third panic I have seen that looked like this ...

Tracing id 110 tid 100089 td 0xffffff012f3f0c80
kdb_enter() at kdb_enter+0x2f
panic() at panic+0x249
uma_dbg_free() at uma_dbg_free+0x188
uma_zfree_arg() at uma_zfree_arg+0x1b0
pf_purge_expired_states() at pf_purge_expired_states+0x41
pfsync_input at pfsync_input+xb35
pf_input() at ip_input+0x10f
netisr_processqueue() at netisr_processqueue+0x17
swi_net() at swi_net+0xa8
ithread_loop() at ithread_loop+0xd9
fork_exit() at fork_exit+0xc3
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffb44f9d00, rbp = 0 ---
db> continue
boot() called on cpu#0
Uptime: 13h42m43s
Dumping 4864 MB
 16 32 ...

I was hoping to get a crash dump but unfortunately can't seem to get one to complete. In any case, this particular install is now toast. I'm surprised it lasted as long as it did considering all the punishment it took. I'm not a kernel hacker, but it would seem to me that something is up with pfsync. Should it matter that I am running an AMD64 kernel in SMP mode?

Matthew Grooms

-----Original Message-----
From: Grooms, Matthew
Sent: Mon 6/6/2005 6:54 PM
To: freebsd-stable@freebsd.org
Subject: 5.4-RELEASE lockups on amd64 SMP

My apologies. With the debug options listed in my previous post (it should have read 5.4, not 5.3), I got a lock order reversal. After a while, it panicked and spat out this ...

lock order reversal
 1st 0xffffffff80752ec0 pf task mtx (pf task mtx) @ contrib/pf/net/if_pfsync.c:1621
 2nd 0xffffffff8076e9f0 user map (user man) @ vm/vm_map.c:2998
KDB: stack backtrace:
witness_checkorder() at witness_checkorder+0x654
_sx_xlock() at _sx_xlock+0x51
vm_map_lookup() at vm_map_lookup+0x44
vm_fault() at vm_fault+0xba
trap() at trap+0x1c5
alltraps_with_regs_pushed() at alltraps_with_regs_pushed+0x5
pf_state_tree_lan_ext_RB_REMOVE() at pf_state_tree_lan_ext_RB_REMOVE+0x10c
pf_purge_expired_states() at pf_purge_expired_states+0xab
pfsync_input() at ip_input+0x10f
netisr_processqueue() at netisr_processqueue+0x17
swi_net() at swi_net+0xa8
ithread_loop() at ithread_loop+0xd9
fork_exit() at fork_exit+0xc3
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffb44f9d00, rbp = 0 ---
KDB: enter: withness_ckeckorder
[thread pid 110 tid 100089]
Stopped at kdb_enter+0x2f: nop
db> panic blockable sleep lock (sleep mutex) tty @ kern/kern_event.c:1453
cpuid = 0
boot() called on cpu#0
Uptime: 10m40s
Dumping 4864 mB
 16 32 .........

After a reboot, I received another panic.

Tracing pid 603 tid 100140 td 0xffffff012efda500
kdb_enter() at kdb_enter+02f
panic() at panic+0x249
ffs_blkfree() at ffs_blkfree+0x483
indir_trunc() at indir_trunc+0x190
indir_trunc() at indir_trunc+0x1fb
handle_workitem_freeblocks() at handle_workitem_freeblocks+0x228
softdep_setup_freeblocks() at softdep_setup_freeblocks+0x730
ffs_truncate() at ffs_truncate+0x1c9
ffs_snapshot() at ffs_snapshot+0x717
ffs_omount() at ffs_omount+0x16e
vfs_domount() at vfs_domount+0x5a0
mount() at mount+0xd8
syscall() at syscall+0x1fb
Xfast_syscall() at Xfast_syscall+0xa8
--- syscall(21, FreeBSD ELF64, mount), rip = 0800697580, rsp = 0x7fffffffec58, fbp = 0x515b10 ---

I am guessing this is related to background fsck processes being launched, because it happened consistently until I disabled background fsck and performed one manually in single user mode.
Now I can boot normally into multi user mode.

Not sure where to go from here except to watch the system and wait for more kernel debug output.

BTW: To answer a reply to my previous post, I have 6 em interfaces.

-Matthew

-----Original Message-----
From: Grooms, Matthew
Sent: Mon 6/6/2005 12:06 PM
To: freebsd-stable@freebsd.org
Subject: Debug help - 5.3 lockups on amd64 SMP

All,

I am experiencing lockups on a production 5.4 amd64 SMP system. It's lightly loaded and seems to last about 3-5 days before it stops responding to network or even console interaction. The system is acting as a firewall and runs a mostly stock kernel with IPV6 removed and SMP, PF, PFLOG, CARP and ALTQ added. The only other thing I can think to note is that tcpdump is running constantly on the pflog interface to coax human readable firewall logs out of pf.

I have an identical hot spare server with SMP disabled that has taken over flawlessly every time the live lock occurs, so I am willing to leave the primary in the production environment to do testing and gather debug info. I have added the following options to the primary fw kernel config ...

# Debug Options
makeoptions DEBUG=-g
options DDB
options KDB
options BREAK_TO_DEBUGGER
options INVARIANT_SUPPORT
options INVARIANTS
options WITNESS
options WITNESS_KDB
options WITNESS_SKIPSPIN

... and the following to the rc.conf ...

dumpdev="/dev/amrd0s1h"
dumpdir="/var/crash"

Will this do it or should I add anything else?

Thanks in advance,

-Matthew
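
One thing that may be worth ruling out with a dumpdev setting like the one above is whether the dump device is big enough. The following is only a minimal sketch, assuming a full memory dump needs a device at least as large as physical memory and that the dump device is also configured as swap (so it shows up in swapinfo). The helper name dump_fits is hypothetical and not part of FreeBSD; the device name is taken from the rc.conf lines above.

```shell
#!/bin/sh
# Sketch: check whether a dump device can hold a full memory dump.
# dump_fits is a hypothetical helper, not a FreeBSD tool.

# dump_fits PHYSMEM_BYTES DEVICE_KBYTES
# Succeeds (exit 0) when the device is at least as large as physical memory.
dump_fits() {
    physmem_kb=$(( ($1 + 1023) / 1024 ))   # round bytes up to whole KB
    [ "$2" -ge "$physmem_kb" ]
}

# Only query the live system on FreeBSD; elsewhere the function is just defined.
if [ "$(uname -s)" = "FreeBSD" ]; then
    dev=amrd0s1h                            # dump device from the rc.conf above
    physmem=$(sysctl -n hw.physmem)         # physical memory in bytes
    # swapinfo -k lists each swap device with its size in 1K blocks (column 2).
    dev_kb=$(swapinfo -k | awk -v d="/dev/$dev" '$1 == d { print $2 }')
    if [ -z "$dev_kb" ]; then
        echo "/dev/$dev is not listed as a swap device; check its size by hand"
    elif dump_fits "$physmem" "$dev_kb"; then
        echo "/dev/$dev is large enough for a full dump"
    else
        echo "/dev/$dev is smaller than physical memory"
    fi
fi
```

For the 4864 MB reported in the "Dumping" lines, dump_fits 5100273664 5242880 (a 5 GB device) succeeds, while dump_fits 5100273664 4194304 (a 4 GB device) fails.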