Date: Wed, 21 May 2008 20:11:18 +0100 (BST)
From: Neil Hoggarth <neil@hoggarth.me.uk>
To: freebsd-net@freebsd.org
Subject: [em][panic] recursed on non-recursive mutex em0

Hi Folks,

I opened PR kern/122928 last month, describing my problems with an Intel
PRO/1000 MT adaptor on 7-STABLE with v6.7.3 of the em driver: every so
often the machine would get into a state where it would repeatedly log
watchdog timeouts on the em0 interface, and the interface would stop
receiving packets.

A few weeks ago I cvsupped a newer version of the 7-STABLE source tree,
after v6.9.0 of the em driver was MFCed. Since I started running the newer
build, the watchdog timeouts appear to be followed by a system panic rather
than the interface freezing up. I have no hard evidence, but it "feels
like" the panics are happening under the same sort of circumstances that
previously triggered my watchdog problem (basically network activity
combined with CPU load). A similar observation was made by Yani Karydis on
the freebsd-stable mailing list earlier in the month:

http://lists.freebsd.org/pipermail/freebsd-stable/2008-May/042311.html

I've now got a serial console set up, and I'm running a kernel with DDB
support compiled in. I managed to capture the information below from the
most recent panic. Is anyone able to use this to debug the situation? What
else might I try in order to gather useful information?
em0: watchdog timeout -- resetting
panic: _mtx_lock_sleep: recursed on non-recursive mutex em0 @ /usr/src/sys/dev/em/if_em.c:1348
cpuid = 0
KDB: enter: panic
[thread pid 14 tid 100005 ]
Stopped at kdb_enter_why+0x3d: movq $0,0x5efc08(%rip)
db> bt
Tracing pid 14 tid 100005 td 0xffffff00012406a0
kdb_enter_why() at kdb_enter_why+0x3d
panic() at panic+0x176
_mtx_lock_sleep() at _mtx_lock_sleep+0x181
_mtx_lock_flags() at _mtx_lock_flags+0xe1
em_init_locked() at em_init_locked+0x61
em_local_timer() at em_local_timer+0x195
softclock() at softclock+0x299
ithread_loop() at ithread_loop+0xda
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffd5599d30, rbp = 0 ---
db> show locks
exclusive sleep mutex em0 (EM TX Lock) r = 0 (0xffffff00036754a8) locked @ /usr/src/sys/dev/em/if_em.c:1302
exclusive sleep mutex em0 (EM Core Lock) r = 0 (0xffffff0003675478) locked @ /usr/src/sys/kern/kern_timeout.c:241
db> ps
pid ppid pgrp uid state wmesg wchan cmd
50494 50493 41749 0 R+ make
50493 41749 41749 0 S+ wait 0xffffff01196a6468 sh
45511 45510 45511 1284 Ss sbwait 0xffffff00150257c4 sftp-server
45510 45135 45135 1284 S select 0xffffffff80b2f930 sshd
45135 932 45135 0 Ss sbwait 0xffffff001502c984 sshd
41749 1070 41749 0 S+ piperd 0xffffff0015ac45d0 initial thread
34233 34223 34073 0 S+ nanslp 0xffffffff80a9da68 initial thread
34223 34221 34073 0 S+ wait 0xffffff0009b62468 make
34221 34073 34073 0 S+ wait 0xffffff0015aa0468 sh
34073 33858 34073 0 S+ wait 0xffffff0125b6f8d0 make
33858 33834 33858 1284 Ss+ wait 0xffffff0009d528d0 bash
33834 33833 33833 1284 S select 0xffffffff80b2f930 xterm
33833 1066 33833 1284 Ss wait 0xffffff0125b6c468 sh
7250 1110 1110 1284 R+ initial thread
5215 1 1007 1284 S+ select 0xffffffff80b2f930 initial thread
5213 5209 1007 1284 S+ (threaded) firefox-bin
 100203 S ucond 0xffffff009a318b00 firefox-bin
 100202 S ucond 0xffffff00a0a25400 firefox-bin
 100201 S ucond 0xffffff00a1dd4c00 firefox-bin
 100200 S ucond 0xffffff00a4297580 firefox-bin
 100195 S ucond 0xffffff0003e86900 firefox-bin
 100194 S select 0xffffffff80b2f930 firefox-bin
 100190 S select 0xffffffff80b2f930 initial thread
5209 5205 1007 1284 S+ wait 0xffffff0015aa58d0 sh
5205 5204 1007 1284 S+ wait 0xffffff0015aa1000 sh
5204 1034 1007 1284 S+ wait 0xffffff0015aa2468 sh
4977 4976 4976 1284 S+ (threaded) alpine
 100147 S select 0xffffffff80b2f930 initial thread
4976 4973 4976 1284 S+ wait 0xffffff0015c6c000 sh
4973 4971 4973 1284 Ss+ wait 0xffffff0015c6b000 bash
4971 1034 1007 1284 S+ select 0xffffffff80b2f930 xterm
3249 919 919 80 S accept 0xffffff000929c5ce httpd
1110 1107 1110 1284 S+ wait 0xffffff0009b65000 sh
1107 1042 1107 1284 Ss+ wait 0xffffff000379e000 bash
1070 1068 1070 1284 Ss+ wait 0xffffff0009d51000 bash
1068 1067 1067 1284 S select 0xffffffff80b2f930 xterm
1067 1066 1067 1284 Ss wait 0xffffff0009d51468 sh
1066 1059 1059 1284 S+ select 0xffffffff80b2f930 ratpoison
1059 1052 1059 1284 S+ wait 0xffffff00037b28d0 sh
1054 0 0 0 RL CPU 1 [irq16: drm0]
1053 1052 1053 1284 S+ select 0xffffffff80b2f930 initial thread
1052 1051 1051 1284 S+ wait 0xffffff0009d54000 xinit
1051 1050 1051 1284 S+ wait 0xffffff0009d54468 sh
1050 1042 1050 1284 Ss+ wait 0xffffff0009b66000 bash
1048 1045 1048 1284 S+ kqread 0xffffff0003e60e00 rtorrent
1045 1042 1045 1284 Ss+ wait 0xffffff0003a45468 bash
1042 1041 1042 1284 Ss select 0xffffffff80b2f930 screen
1041 1006 1041 1284 S+ pause 0xffffff0009b620c0 screen
1034 1020 1007 1284 S+ select 0xffffffff80b2f930 initial thread
1033 1020 1033 1284 S+ select 0xffffffff80b2f930 initial thread
1027 1020 1007 1284 S+ select 0xffffffff80b2f930 initial thread
1020 1 1007 1284 S+ select 0xffffffff80b2f930 initial thread
1016 1 1007 1284 S+ select 0xffffffff80b2f930 Xvnc
1006 986 1006 1284 S+ wait 0xffffff000379f8d0 bash
1001 872 872 0 S select 0xffffffff80b2f930 perl5.8.8
1000 872 872 0 S select 0xffffffff80b2f930 perl5.8.8
999 919 919 80 S accept 0xffffff000929c5ce httpd
998 919 919 80 S accept 0xffffff000929c5ce httpd
997 919 919 80 S accept 0xffffff000929c5ce httpd
996 919 919 80 S accept 0xffffff000929c5ce httpd
995 919 919 80 S accept 0xffffff000929c5ce httpd
994 1 994 0 Ss+ ttyin 0xffffff0003689010 getty
993 1 993 0 Ss+ ttyin 0xffffff00036ab410 getty
992 1 992 0 Ss+ ttyin 0xffffff00036ac410 getty
991 1 991 0 Ss+ ttyin 0xffffff00036ac010 getty
990 1 990 0 Ss+ ttyin 0xffffff00036aac10 getty
989 1 989 0 Ss+ ttyin 0xffffff00036aa810 getty
988 1 988 0 Ss+ ttyin 0xffffff000369c810 getty
987 1 987 0 Ss+ ttyin 0xffffff000369cc10 getty
986 1 986 0 Ss+ wait 0xffffff0003a42000 login
965 1 965 0 Ss select 0xffffffff80b2f930 inetd
937 1 937 0 Ss nanslp 0xffffffff80a9da68 cron
932 1 932 0 Ss select 0xffffffff80b2f930 sshd
919 1 919 0 Ss select 0xffffffff80b2f930 httpd
894 1 894 25 Ss pause 0xffffff00037b0990 sendmail
890 1 890 0 Ss select 0xffffffff80b2f930 sendmail
872 1 872 0 Ss select 0xffffffff80b2f930 perl5.8.8
861 856 847 1003 S sbwait 0xffffff00092c16cc tinydns
860 857 847 1002 S piperd 0xffffff0003c27ba0 multilog
859 854 847 1001 S select 0xffffffff80b2f930 dnscache
858 855 847 1002 S piperd 0xffffff0003c39000 multilog
857 849 847 0 S select 0xffffffff80b2f930 supervise
856 849 847 0 S select 0xffffffff80b2f930 supervise
855 849 847 0 S select 0xffffffff80b2f930 supervise
854 849 847 0 S select 0xffffffff80b2f930 supervise
850 1 847 0 S piperd 0xffffff0003c278b8 readproctitle
849 1 847 0 S nanslp 0xffffffff80a9da68 svscan
833 1 833 1285 Ss select 0xffffffff80b2f930 dhcpd
799 1 799 0 Ss select 0xffffffff80b2f930 ntpd
775 1 775 0 Ss select 0xffffffff80b2f930 lpd
707 1 707 0 Ss select 0xffffffff80b2f930 syslogd
677 0 0 0 SL mdwait 0xffffff0003caa000 [md0]
633 1 633 0 Ss select 0xffffffff80b2f930 devd
169 0 0 0 SL zfs:(&tq 0xffffff0001349bc0 [zil_clean]
168 0 0 0 SL zfs:(&tq 0xffffff0001349940 [zil_clean]
167 0 0 0 SL zfs:(&tq 0xffffff00013496c0 [zil_clean]
105 0 0 0 SL zfs:(&tq 0xffffff0001349440 [zil_clean]
104 0 0 0 SL zfs:(&tx 0xffffff00038451a0 [txg_thread_enter]
103 0 0 0 SL zfs:(&tx 0xffffff0003845160 [txg_thread_enter]
102 0 0 0 SL zfs:(&tx 0xffffff0003845180 [txg_thread_enter]
101 0 0 0 SL vgeom:io 0xffffff00037c5810 [vdev:worker ad10]
100 0 0 0 SL vgeom:io 0xffffff0003760390 [vdev:worker ad8]
99 0 0 0 SL zfs:(&tq 0xffffff0001348080 [spa_zio_intr_5]
98 0 0 0 SL zfs:(&tq 0xffffff0001348080 [spa_zio_intr_5]
97 0 0 0 SL zfs:(&tq 0xffffff00013481c0 [spa_zio_issue_5]
96 0 0 0 SL zfs:(&tq 0xffffff00013481c0 [spa_zio_issue_5]
95 0 0 0 SL zfs:(&tq 0xffffff0001348300 [spa_zio_intr_4]
94 0 0 0 SL zfs:(&tq 0xffffff0001348300 [spa_zio_intr_4]
93 0 0 0 SL zfs:(&tq 0xffffff0001348440 [spa_zio_issue_4]
92 0 0 0 SL zfs:(&tq 0xffffff0001348440 [spa_zio_issue_4]
91 0 0 0 SL zfs:(&tq 0xffffff0001348580 [spa_zio_intr_3]
90 0 0 0 SL zfs:(&tq 0xffffff0001348580 [spa_zio_intr_3]
89 0 0 0 SL zfs:(&tq 0xffffff00013486c0 [spa_zio_issue_3]
88 0 0 0 SL zfs:(&tq 0xffffff00013486c0 [spa_zio_issue_3]
87 0 0 0 SL zfs:(&tq 0xffffff0001348800 [spa_zio_intr_2]
86 0 0 0 SL zfs:(&tq 0xffffff0001348800 [spa_zio_intr_2]
85 0 0 0 SL zfs:(&tq 0xffffff0001348940 [spa_zio_issue_2]
84 0 0 0 SL zfs:(&tq 0xffffff0001348940 [spa_zio_issue_2]
83 0 0 0 SL zfs:(&tq 0xffffff0001348a80 [spa_zio_intr_1]
82 0 0 0 SL zfs:(&tq 0xffffff0001348a80 [spa_zio_intr_1]
81 0 0 0 SL zfs:(&tq 0xffffff0001348bc0 [spa_zio_issue_1]
80 0 0 0 SL zfs:(&tq 0xffffff0001348bc0 [spa_zio_issue_1]
79 0 0 0 SL zfs:(&tq 0xffffff0001348d00 [spa_zio_intr_0]
78 0 0 0 SL zfs:(&tq 0xffffff0001348d00 [spa_zio_intr_0]
77 0 0 0 SL zfs:(&tq 0xffffff0001348e40 [spa_zio_issue_0]
76 0 0 0 SL zfs:(&tq 0xffffff0001348e40 [spa_zio_issue_0]
49 0 0 0 SL sdflush 0xffffffff80b3fc38 [softdepflush]
48 0 0 0 SL vlruwt 0xffffff00036a3468 [vnlru]
47 0 0 0 SL syncer 0xffffffff80a9d6e0 [syncer]
46 0 0 0 SL psleep 0xffffffff80b3013c [bufdaemon]
45 0 0 0 SL pgzero 0xffffffff80b416c4 [pagezero]
44 0 0 0 SL psleep 0xffffffff80b40a08 [vmdaemon]
43 0 0 0 SL psleep 0xffffffff80b409cc [pagedaemon]
42 0 0 0 SL zfs:(&ar 0xffffffff80d2a330 [arc_reclaim_thread]
41 0 0 0 SL waiting_ 0xffffffff80b33868 [sctp_iterator]
40 0 0 0 WL [irq1: atkbd0]
39 0 0 0 WL [swi0: sio]
38 0 0 0 WL [irq7: ppbus0 ppc0]
37 0 0 0 SL - 0xffffff000364cd80 [em0 taskq]
36 0 0 0 WL [irq17: atapci2+]
35 0 0 0 WL [irq15: ata1]
34 0 0 0 WL [irq14: ata0]
33 0 0 0 SL - 0xffffff0003600280 [nfe0 taskq]
32 0 0 0 SL usbevt 0xffffff00033da420 [usb2]
31 0 0 0 WL [irq22: ehci0]
30 0 0 0 SL usbevt 0xffffffff8103b420 [usb1]
29 0 0 0 WL [irq21: ohci1+]
28 0 0 0 SL usbtsk 0xffffffff80a98e48 [usbtask-dr]
27 0 0 0 SL usbtsk 0xffffffff80a98e20 [usbtask-hc]
26 0 0 0 SL usbevt 0xffffffff81039420 [usb0]
25 0 0 0 WL [irq20: nfe0 ohci0]
24 0 0 0 WL [irq9: acpi0]
23 0 0 0 SL - 0xffffff000138b000 [kqueue taskq]
22 0 0 0 SL - 0xffffff000138b080 [acpi_task_2]
21 0 0 0 SL - 0xffffff000138b080 [acpi_task_1]
9 0 0 0 SL - 0xffffff000138b080 [acpi_task_0]
20 0 0 0 WL [swi2: cambio]
8 0 0 0 SL ccb_scan 0xffffffff80a66fe0 [xpt_thrd]
19 0 0 0 WL [swi6: task queue]
18 0 0 0 WL [swi6: Giant taskq]
7 0 0 0 SL - 0xffffff0001337600 [thread taskq]
17 0 0 0 WL [swi5: +]
6 0 0 0 SL zfs:(&tq 0xffffff0001349080 [system_taskq]
5 0 0 0 SL zfs:(&tq 0xffffff0001349080 [system_taskq]
16 0 0 0 SL - 0xffffffff80a9d6e8 [yarrow]
4 0 0 0 SL - 0xffffffff80a99778 [g_down]
3 0 0 0 SL - 0xffffffff80a99770 [g_up]
2 0 0 0 SL - 0xffffffff80a99760 [g_event]
15 0 0 0 WL [swi3: vm]
14 0 0 0 RL CPU 0 [swi4: clock sio]
13 0 0 0 WL [swi1: net]
12 0 0 0 RL [idle: cpu0]
11 0 0 0 RL [idle: cpu1]
1 0 1 0 SLs wait 0xffffff000122f8d0 [init]
10 0 0 0 SL audit_wo 0xffffffff80b3f0c0 [audit]
0 0 0 0 WLs [swapper]
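PS. For anyone skimming the trace: the fragment below is not the real if_em.c
code, and the function and variable names in it are invented for illustration.
It is just a minimal sketch of the mtx(9) pattern that the backtrace and the
"show locks" output seem to describe. The callout path (em_local_timer ->
em_init_locked) appears to acquire the driver's TX lock a second time while it
is already held from if_em.c:1302, and a mutex created without MTX_RECURSE
cannot be re-acquired by the thread that already owns it, which is exactly the
"recursed on non-recursive mutex" panic above.

/*
 * Illustrative sketch only, not the actual em driver code.  The names
 * tx_mtx, reinit_path, timer_path and setup are made up.  It shows the
 * mtx(9) behaviour implied by the backtrace: a mutex initialised without
 * MTX_RECURSE may not be acquired twice by the same thread.
 */
#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>

static struct mtx tx_mtx;		/* stand-in for the "EM TX Lock" */

static void
reinit_path(void)
{
	/*
	 * Second acquisition by the owning thread.  Because tx_mtx was not
	 * created with MTX_RECURSE, this is where the kernel reports
	 * "_mtx_lock_sleep: recursed on non-recursive mutex".
	 */
	mtx_lock(&tx_mtx);
	/* ... reprogram the hardware ... */
	mtx_unlock(&tx_mtx);
}

static void
timer_path(void)
{
	mtx_lock(&tx_mtx);	/* analogous to the lock held from if_em.c:1302 */
	reinit_path();		/* watchdog expired, so re-initialise */
	mtx_unlock(&tx_mtx);
}

static void
setup(void)
{
	/* MTX_DEF without MTX_RECURSE: recursion is not permitted. */
	mtx_init(&tx_mtx, "em0", "EM TX Lock", MTX_DEF);
}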