Date:      Tue, 7 Jun 2005 11:32:04 -0500
From:      "Grooms, Matthew" <MGrooms@seton.org>
To:        <freebsd-stable@freebsd.org>, <freebsd-stable@freebsd.org>, <max@love2party.net>
Subject:   RE: 5.4-RELEASE lockups on amd64 SMP
Message-ID:  <28FCC7CB4CF6EA43AF83BCA2096E97D013E55C@AUSEX2VS1.seton.org>

All,

     Here is an update with more info. In addition to the lock order
reversal, this is the third panic that I have seen that looked like
this ...

Tracing pid 110 tid 100089 td 0xffffff012f3f0c80
kdb_enter() at kdb_enter+0x2f
panic() at panic+0x249
uma_dbg_free() at uma_dbg_free+0x188
uma_zfree_arg() at uma_zfree_arg+0x1b0
pf_purge_expired_states() at pf_purge_expired_states+0x41
pfsync_input() at pfsync_input+0xb35
ip_input() at ip_input+0x10f
netisr_processqueue() at netisr_processqueue+0x17
swi_net() at swi_net+0xa8
ithread_loop() at ithread_loop+0xd9
fork_exit() at fork_exit+0xc3
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffb44f9d00, rbp = 0 ---
db> continue
boot() called on cpu#0
Uptime: 13h42m43s
Dumping 4864 MB
 16 32 ...

     I was hoping to get a crash dump but unfortunately can't seem to
get one to complete. In any case, this particular install is now toast.
I'm surprised it lasted as long as it did considering all the punishment
it took. I'm not a kernel hacker, but it would seem to me that something
is up with pfsync. Should it matter that I am running an AMD64 kernel in
SMP mode?

Matthew Grooms

-----Original Message-----
From: Grooms, Matthew
Sent: Mon 6/6/2005 6:54 PM
To: freebsd-stable@freebsd.org
Subject: 5.4-RELEASE lockups on amd64 SMP

My apologies. With the debug options listed in my previous post (which
should have read 5.4, not 5.3), I got a lock order reversal. After a
while, it panicked and spat out this ...

lock order reversal
1st 0xffffffff80752ec0 pf task mtx (pf task mtx) @ contrib/pf/net/if_pfsync.c:1621
2nd 0xffffffff8076e9f0 user map (user map) @ vm/vm_map.c:2998
KDB: stack backtrace:
witness_checkorder() at witness_checkorder+0x654
_sx_xlock() at _sx_xlock+0x51
vm_map_lookup() at vm_map_lookup+0x44
vm_fault() at vm_fault+0xba
trap() at trap+0x1c5
alltraps_with_regs_pushed() at alltraps_with_regs_pushed+0x5
pf_state_tree_lan_ext_RB_REMOVE() at pf_state_tree_lan_ext_RB_REMOVE+0x10c
pf_purge_expired_states() at pf_purge_expired_states+0xab
pfsync_input() at ip_input+0x10f
netisr_processqueue() at netisr_processqueue+0x17
swi_net() at swi_net+0xa8
ithread_loop() at ithread_loop+0xd9
fork_exit() at fork_exit+0xc3
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffb44f9d00, rbp = 0 ---
KDB: enter: witness_checkorder
[thread pid 110 tid 100089]
Stopped at      kdb_enter+0x2f: nop
db> panic blockable sleep lock (sleep mutex) tty @ kern/kern_event.c:1453
cpuid = 0
boot() called on cpu#0
Uptime: 10m40s
Dumping 4864 MB
 16 32 .........

After a reboot, I received another panic.

Tracing pid 603 tid 100140 td 0xffffff012efda500
kdb_enter() at kdb_enter+0x2f
panic() at panic+0x249
ffs_blkfree() at ffs_blkfree+0x483
indir_trunc() at indir_trunc+0x190
indir_trunc() at indir_trunc+0x1fb
handle_workitem_freeblocks() at handle_workitem_freeblocks+0x228
softdep_setup_freeblocks() at softdep_setup_freeblocks+0x730
ffs_truncate() at ffs_truncate+0x1c9
ffs_snapshot() at ffs_snapshot+0x717
ffs_omount() at ffs_omount+0x16e
vfs_domount() at vfs_domount+0x5a0
mount() at mount+0xd8
syscall() at syscall+0x1fb
Xfast_syscall() at Xfast_syscall+0xa8
--- syscall(21, FreeBSD ELF64, mount), rip = 0x800697580, rsp = 0x7fffffffec58, rbp = 0x515b10 ---

I am guessing this is related to background fsck processes being
launched, because it happened consistently until I disabled background
fsck and ran fsck manually in single-user mode. Now I can boot
normally into multi-user mode.
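
For reference, a minimal sketch of how background fsck is usually
turned off. It assumes the stock background_fsck knob in /etc/rc.conf;
the exact setting used on this machine is not shown in this post.

```shell
# /etc/rc.conf fragment (assumed, not copied from the affected system).
# With this set to NO, the boot-time fsck runs in the foreground before
# the filesystems are mounted read-write, instead of as a background
# check against a snapshot of the live filesystem.
background_fsck="NO"
```

rc.conf is read as sh variable assignments, so the line can be added
as-is alongside the dumpdev and dumpdir settings.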

Not sure where to go from here except to watch the system and wait for
more kernel debug output.

BTW: To answer a reply to my previous post, I have 6 em interfaces.

-Matthew

-----Original Message-----
From: Grooms, Matthew
Sent: Mon 6/6/2005 12:06 PM
To: freebsd-stable@freebsd.org
Subject: Debug help - 5.3 lockups on amd64 SMP

All,

      I am experiencing lockups on a production 5.4 amd64 SMP system.
It's lightly loaded and seems to last about 3-5 days before it stops
responding to network or even console interaction. The system is acting
as a firewall and runs a mostly stock kernel with IPv6 removed and SMP,
PF, PFLOG, CARP and ALTQ added. The only other thing I can think to note
is that tcpdump is running constantly on the pflog interface to coax
human-readable firewall logs out of pf.

     I have an identical hot spare server with SMP disabled that has
taken over flawlessly every time the livelock occurs, so I am willing to
leave the primary in the production environment to do testing and gather
debug info. I have added the following options to the primary firewall's
kernel config ...

# Debug Options
makeoptions     DEBUG=-g
options         DDB
options         KDB
options         BREAK_TO_DEBUGGER
options         INVARIANT_SUPPORT
options         INVARIANTS
options         WITNESS
options         WITNESS_KDB
options         WITNESS_SKIPSPIN

... and the following to the rc.conf ...

dumpdev="/dev/amrd0s1h"
dumpdir="/var/crash"

Will this do it or should I add anything else?

Thanks in advance,

-Matthew






