From owner-freebsd-stable@FreeBSD.ORG Sat Mar 14 08:51:07 2009 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 19ACD106566B; Sat, 14 Mar 2009 08:51:07 +0000 (UTC) (envelope-from nick@nickwithers.com) Received: from mail.nickwithers.com (mail.nickwithers.com [123.243.228.66]) by mx1.freebsd.org (Postfix) with ESMTP id 309A78FC14; Sat, 14 Mar 2009 08:51:05 +0000 (UTC) (envelope-from nick@nickwithers.com) Received: from [10.0.0.245] (presario.shmon.net [10.0.0.245]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.nickwithers.com (Postfix) with ESMTPSA id B079A44; Sat, 14 Mar 2009 19:50:51 +1100 (EST) From: Nick Withers To: Robert Watson In-Reply-To: References: <1236920519.1490.30.camel@localhost> Content-Type: multipart/signed; micalg="pgp-sha1"; protocol="application/pgp-signature"; boundary="=-bRTMx26oWXyROZCDBgdl" Date: Sat, 14 Mar 2009 19:50:45 +1100 Message-Id: <1237020646.1532.24.camel@localhost> Mime-Version: 1.0 X-Mailer: Evolution 2.24.5 FreeBSD GNOME Team Port X-MailScanner-ID: B079A44.394BC X-nickwithers-MailScanner: Found to be clean X-nickwithers-MailScanner-From: nick@nickwithers.com Cc: freebsd-stable@freebsd.org Subject: Re: NICs locking up, "*tcp_sc_h" X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 14 Mar 2009 08:51:07 -0000 --=-bRTMx26oWXyROZCDBgdl Content-Type: text/plain Content-Transfer-Encoding: quoted-printable On Fri, 2009-03-13 at 09:49 +0000, Robert Watson wrote: > On Fri, 13 Mar 2009, Robert Watson wrote: >=20 > > Sounds like a lock leak -- if you're running INVARIANTS, then "show all= ocks" > ^^^^ should read WITNES= S > > and "show allchains" would be useful. I've had a report of a TCP lock = leak=20 > > possibly in tcp_input(), but haven't managed to track it down yet -- th= is=20 > > could well be it as well. Right, here we go! Here's "show alllocks"' output: ____ Process 31 (irp20: fxp0+) thread 0xffffff00012016e0 (100030) exclusive rw tcpinp r =3D 0 (0xffffff000392d570) locked @ /usr/src/sys/netinet/tcp_input.c:480 exclusive rw tcp r =3D 0 (0xffffffff806dcbe8) locked @ /usr/src/sys/netinet/tcp_input.c:400 Process 17 (swi6: Giant taskq) thread 0xffffff0001173000 (100016) exclusive sleep mutex Giant r =3D 0 (0xffffffff80652520) locked @ /usr/src/sys/kern/kern_intr.c:1087 Process 12 (swi4: clock) thread 0xffffff00010c6370 (100003) shared rw IPFW static rules r =3D 0 (0xffffffff806db2b8) locked @ /usr/src/sys/netinet/ip_fw2.c:2460 shared rw PFil hoow read/write mutex r =3D 0 (0xffffffff806dba28) locked @ /usr/src/sys/net/pfil.c:73 exclusive sleep mutex tcp_sc_head r =3D 0 (0xfffffffe8036c8f8) locked @ /usr/src/sys/kern/kern_timeout.c:241 ____ ...and here's "show allchains"': ____ chain 1: thread 100031 (pid 32, irp22: rl0) blocked on lock 0xffffffff806dcbe8 (rw) "tcp" thread 100030 (pid 31, irq20: fxp0+) blocked on lock 0xfffffffe8036c8f8 (sleep mutex) "tcp_sc_head" thread 100003 (pid 12, swi4: clock) blocked on lock 0xffffffff806dcbe8 (rw) "tcp" thread 100030 (pid 31, irq20: fxp0+) blocked on lock 0xfffffffe8036c8f8 (sleep mutex) "tcp_sc_head" thread 100003 (pid 12, swi4: clock) blocked on lock 0xffffffff806dcbe8 (rw) "tcp" thread 100030 (pid 31, irq20: fxp0+) blocked on lock 0xfffffffe8036c8f8 (sleep mutex) "tcp_sc_head" thread 100003 (pid 12, swi4: clock) blocked on lock 0xffffffff806dcbe8 (rw) "tcp" (...and so on, these last two seeming to go on forever.) ____ This is with fxp0 and rl0 lagg(4)-ed For completeness... - "ps" in DDB shows that: - PID 32 ("[irp22: rl0]") is in state "LL" on "*tcp" - PID 31 ("[irq20: fxp0+]") is in state "LL" on "*tcp_sc_h" - PID 12 ("[swi4: clock]") is in state "LL" on "*tcp" - "where 32" gives: ____ sched_switch() at sched_switch+0xdf mi_switch() at mi_switch+0x18b turnstile_wait() at turnstile_wait+0x1c4 _rw_lock_hard() at _rw_lock_hard+0xa3 _rw_wlock() at _rw_wlock+0x54 tcp_input() at tcp_input+0x318 ip_input() at ip_input+0xaf ether_demux() at ether_demux+0x1b9 ether_input() at ether_input+0x1bb rl_rxeof() at rl_rxeof+0x1c8 rl_intr() at rl_intr+0x138 ithread_loop() at ithread_loop+0xe9 fork_exit() at fork_exit+0x112 fork_trampoline() at fork_trampoline+0xe --- trap 0, rip =3D 0, rsp =3D 0xfffffffe91e0cd30, rbp =3D 0 --- ____ - "where 31" gives: ____ Tracing PID 31 tid 100030 td 0xffffff00012016e0 sched_switch() at sched_switch+0xdf mi_switch() at mi_switch+0x18b turnstile_wait() at turnstile_wait+0x1c4 _mtx_lock_sleep() at _mtx_lock_sleep+0x76 _mtx_lock_flags() at _mtx_lock_flags+0x95 syncache_lookup() at syncache_lookup+0xee syncache_expand() at syncache_expand+0x38 tcp_input() at tcp_input+0x99b ip_input() at ip_input+0xaf ether_demux() at ether_demux+0x1b9 ether_input() at ether_input+0x1bb fxp_intr() at fxp_intr+0x224 ithread_loop() at ithread_loop+0xe9 fork_exit() at fork_exit+0x112 fork_trampoline() at fork_trampoline+0xe --- trap 0, rip =3D 0, rsp =3D 0xfffffffe80174d30, rbp =3D 0 --- ____ - "where 12" gives: ____ sched_switch() at sched_switch+0xdf mi_switch() at mi_switch+0x18b turnstile_wait() at turnstile_wait+0x1c4 _rw_rlock() at _rw_rlock+0x9c ipfw_chk() at ipfw_chk+0x3ac1 ipfw_check_out() at ipfw_check_out+0xb1 pfil_run_hooks() at pfil_run_hooks+0xac ip_output() at ip_output+0x357 syncache_respond() at syncache_respond+0x2fd syncache_timer() at syncache_timer+0x15a softclock() at softclock+0x270 ithread_loop() at ithread_loop+0xe9 fork_exit() at fork_exit+0x112 fork_trampoline() at fork_trampoline+0xe --- trap 0, rip =3D 0, rsp =3D 0xfffffffe80017d30, rbp =3D 0 --- ____ - Before having entered the debugger, the following were logged: ____ lock order reversal: 1st 0xffffff00032947b0 tcpinp (tcpinp) @ /usr/src/sys/netinet/tcp_timer.c:169 2nd 0xffffffff806dba28 PFil hook read/write mutex (PFil hook read/write mutex) @ /usr/src/sys/net/pfil.c:73 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a witness_checkorder() at witness_checkorder+0x565 _rw_rlock() at _rw_rlock+0x25 pfil_run_hooks() at pfil_run_hooks+0x44 ip_output() at ip_output+0x357 tcp_output() at tcp_output+0xa1d tcp_timer_delack() at tcp_timer_delack+0xc7 softclock() at softclock+0x270 ithread_loop() at ithread_loop+0xe9 fork_exit() at fork_exit+0x112 fork_trampoline() at fork_trampoline+0xe --- trap 0, rip =3D 0, rsp =3D 0xfffffffe80017d30, rbp =3D 0 --- ____ ...and: ____ lock order reversal: 1st 0xfffffffe80365df0 tcp_sc_head (tcp_sc_head) @ /usr/src/sys/kern/kern_timeout.c:241 2nd 0xffffffff806dba28 PFil hook read/write mutex (PFil hook read/write mutex) @ /usr/src/sys/net/pfil.c:73 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a witness_checkorder() at witness_checkorder+0x565 _rw_rlock() at _rw_rlock+0x25 pfil_run_hooks() at pfil_run_hooks+0x44 ip_output() at ip_output+0x357 syncache_respond() at syncache_respond+0x2fd syncache_timer() at syncache_timer+0x15a softclock() at softclock+0x270 ithread_loop() at ithread_loop+0xe9 fork_exit() at fork_exit+0x112 fork_trampoline() at fork_trampoline+0xe --- trap 0, rip =3D 0, rsp =3D 0xfffffffe80017d30, rbp =3D 0 --- ____ On reboot: ____ lock order reversal: 1st 0xffffffff806db2b8 IPFW static rules (IPFW static rules) @ /usr/src/sys/netinet/ip_fw2.c:2460 2nd 0xffffffff806dcbe8 tcp (tcp) @ /usr/src/sys/netinet/ip_fw2.c:2009 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a witness_checkorder() at witness_checkorder+0x565 _rw_rlock() at _rw_rlock+0x25 ipfw_chk() at ipfw_chk+0x3ac1 ipfw_check_out() at ipfw_check_out+0xb1 pfil_run_hooks() at pfil_run_hooks+0xac ip_output() at ip_output+0x357 tcp_respond() at tcp_respond+0x33a tcp_dropwithreset() at tcp_dropwithreset+0x131 tcp_do_segment() at tcp_do_segment+0xd93 tcp_input() at tcp_input+0x8dd ip_input() at ip_input+0xaf ether_demux() at ether_demux+0x1b9 ether_input() at ether_input+0x1bb fxp_intr() at fxp_intr+0x224 ithread_loop() at ithread_loop+0xe9 fork_exit() at fork_exit+0x112 fork_trampoline() at fork_trampoline+0xe --- trap 0, rip =3D 0, rsp =3D 0xfffffffe80174d30, rbp =3D 0 --- ____ (Haven't actually checked whether any of these are known innocuous). Anything else that might be useful? > Robert N M Watson > Computer Laboratory > University of Cambridge --=20 Nick Withers email: nick@nickwithers.com Web: http://www.nickwithers.com Mobile: +61 414 397 446 --=-bRTMx26oWXyROZCDBgdl Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.11 (FreeBSD) iEYEABECAAYFAkm7b+UACgkQ3wcG/Pf4WrgyBACbBKT1O58l51oz/TJqkMWG++as ZfkAoM3EPPnRY1kJFL7QodDeDWWrQq4S =nzyk -----END PGP SIGNATURE----- --=-bRTMx26oWXyROZCDBgdl--