Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 8 Oct 2009 04:37:23 +0400
From:      Maxim Dounin <mdounin@mdounin.ru>
To:        freebsd-net@freebsd.org
Subject:   deadlock with pf uid rules + syncache
Message-ID:  <20091008003723.GD1229@mdounin.ru>

next in thread | raw e-mail | index | archive | help
Hello!

We with ru@ investigated lockups our colleagues observing on 
RELENG_7 machines, and was able to track [at least some of] them 
down to the following deadlock:

1. syncache does syn+ack retransmit on timeout. it holds 
tcp_sc_head mutex, and output packet processed by pf which in 
turn waits for tcp lock to lookup uid.

2. em0 dispatches new ack packet to listen socket, tcp_input() 
holds tcp lock, syncache lookup requires tcp_sc_head.

Here is data from ddb:

db> bt 19

Tracing pid 19 tid 100011 td 0xc6ca0000
sched_switch(c6ca0000,0,1,c06388b0,a38b7a66,...) at sched_switch+0x320
mi_switch(1,0,c08455c6,2e3,c091d634,...) at mi_switch+0x143
turnstile_wait(c7881140,c6d76d80,1,bd5,c6d76d86,...) at turnstile_wait+0x234
_rw_rlock(c0966b8c,c73619f6,bd5,c736392c,ce3,...) at _rw_rlock+0x166
pf_socket_lookup(2,c6bc6a80,0,ce3,c6bc6a1c,...) at pf_socket_lookup+0xc8
pf_test_tcp(c6bc6b18,c6bc6b14,2,c7236400,c73c7800,...) at pf_test_tcp+0x86f
pf_test(2,c704e800,c6bc6b74,0,0,...) at pf_test+0xde1
pf_check_out(0,c6bc6b74,c704e800,2,0,...) at pf_check_out+0x43
pfil_run_hooks(c0965e40,c6bc6c04,c704e800,2,0,...) at pfil_run_hooks+0x78
ip_output(c73c7800,0,c6bc6bd8,0,0,...) at ip_output+0x7c2
syncache_respond(c8929418,0,0,0,c0842ffa,...) at syncache_respond+0x255
syncache_timer(c6fd7ac0,0,c0842ffa,f1,1,...) at syncache_timer+0x127
softclock(0,0,c083f980,4cb,c6ce5070,...) at softclock+0x21f
ithread_loop(c6c95330,c6bc6d38,c083f823,323,c6c97840,...) at ithread_loop+0x145
fork_exit(c05e7520,c6c95330,c6bc6d38) at fork_exit+0x94
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xc6bc6d70, ebp = 0 ---
db> bt 31

Tracing pid 31 tid 100031 td 0xc6d76d80
sched_switch(c6d76d80,0,1,c06388ee,a38981fa,...) at sched_switch+0x320
mi_switch(1,0,c08455c6,2e3,c091d7d8,...) at mi_switch+0x143
turnstile_wait(c7881be0,c6ca0000,0,c6fd7ac0,1f0,...) at turnstile_wait+0x234
_mtx_lock_sleep(c6fd7ac0,c6d76d80,0,c0850acb,1f0,...) at _mtx_lock_sleep+0x10e
_mtx_lock_flags(c6fd7ac0,0,c0850acb,1f0,e6f87b14,...) at _mtx_lock_flags+0x67
syncache_lookup(e6f87aec,e6f87a8c,e6f8799c,c07db700,c091d8b8,...) at syncache_lookup+0x5e
syncache_expand(e6f87aec,e6f87b14,c7593024,e6f87b2c,c753d600,...) at syncache_expand+0x1e
tcp_input(c753d600,14,c704e800,1,0,...) at tcp_input+0x459
ip_input(c753d600,c057d932,800,c704e800,800,...) at ip_input+0x620
netisr_dispatch(2,c753d600,0,c7257537,3ac,...) at netisr_dispatch+0x55
ether_demux(c704e800,c753d600,3,0,3,...) at ether_demux+0x1aa
ether_input(c704e800,c753d600,c6d76e18,c08baa84,c082d68e,...) at ether_input+0x343
ether_demux(c6da6000,c753d600,3,0,3,...) at ether_demux+0x107
ether_input(c6da6000,c753d600,c082d68e,11ba,c084572e,...) at ether_input+0x343
em_rxeof(e6f87ca4,c05f860d,c6da159c,8,c6dc832c,...) at em_rxeof+0x499
em_handle_rxtx(c6dc4000,1,c0845360,52,c6da159c,...) at em_handle_rxtx+0x2e
taskqueue_run(c6da1580,c6da159c,c0836cc3,0,c083f823,...) at taskqueue_run+0x10b
taskqueue_thread_loop(c6dc835c,e6f87d38,c083f823,323,c6dcbb00,...) at taskqueue_thread_loop+0x68
fork_exit(c0636ee0,c6dc835c,e6f87d38) at fork_exit+0x94
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xe6f87d70, ebp = 0 ---
db> show turnstile 0xc7881be0

Lock: 0xc6fd7ac0 - (sleep mutex) tcp_sc_head
Lock Owner: 0xc6ca0000 (tid 100011, pid 19, "swi4: clock sio")
Shared Waiters:
        empty
Exclusive Waiters:
        0xc6d76d80 (tid 100031, pid 31, "em0 taskq")
Pending Threads:
        empty
db> show turnstile 0xc7881140

Lock: 0xc0966b8c - (rw) tcp
Lock Owner: 0xc6d76d80 (tid 100031, pid 31, "em0 taskq")
Shared Waiters:
        0xc6ca0000 (tid 100011, pid 19, "swi4: clock sio")
        0xc7994d80 (tid 100192, pid 1833, "httpd")
        0xc7883b40 (tid 100282, pid 1831, "httpd")
        0xc7b1a000 (tid 100252, pid 1832, "httpd")
Exclusive Waiters:
        0xc6c9ab40 (tid 100013, pid 21, "swi1: net")
        0xc7971000 (tid 100202, pid 1120, "nginx")
        0xc7a04480 (tid 100204, pid 1122, "nginx")
        0xc79a9000 (tid 100278, pid 1830, "httpd")
Pending Threads:
        empty

Unfortunately "show alllocks" output wasn't properly captured by 
textdump (as witness_list_lock() uses printf() for output, not 
db_printf() which is captured), but here is some reconstruction 
(lines from textdump's ddb.txt and msgbuf.txt, my memory about 
their relevant order):

db> show alllocks

Process 1319 (postgres) thread 0xc804a480 (100326)
exclusive sx so_rcv_sx r = 0 (0xc7b088ac) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
Process 1117 (sshd) thread 0xc708b240 (100057)
exclusive sx so_rcv_sx r = 0 (0xc7b06a4c) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
Process 956 (sshd) thread 0xc72476c0 (100119)
exclusive sx so_rcv_sx r = 0 (0xc7d1bd8c) locked @ /usr/src/sys/kern/uipc_sockbuf.c:148
Process 55 (g_mirror home) thread 0xc70c2d80 (100055)
exclusive sx gmirror:lock r = 0 (0xc705442c) locked @ /usr/src/sys/modules/geom/geom_mirror/../../../geom/mirror/g_mirror.c:1858
Process 31 (em0 taskq) thread 0xc6d76d80 (100031)
exclusive rw tcpinp r = 0 (0xc751d36c) locked @ /usr/src/sys/netinet/tcp_input.c:480
exclusive rw tcp r = 0 (0xc0966b8c) locked @ /usr/src/sys/netinet/tcp_input.c:400
Process 27 (swi6: Giant taskq) thread 0xc6ca06c0 (100027)
exclusive sleep mutex Giant r = 0 (0xc0910630) locked @ /usr/src/sys/kern/kern_intr.c:1125
Process 19 (swi4: clock sio) thread 0xc6ca0000 (100011)
shared rw PFil hook read/write mutex r = 0 (0xc0965e58) locked @ /usr/src/sys/net/pfil.c:73
exclusive sleep mutex tcp_sc_head r = 0 (0xc6fd7ac0) locked @ /usr/src/sys/kern/kern_timeout.c:241


The machine in question runs RELENG_7 from something about 
2009-10-05.

Please let me know if you need some more info.

Maxim Dounin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20091008003723.GD1229>