From: Robert Watson
Date: Sun, 10 Dec 2006 12:57:14 +0000 (GMT)
To: Maxim Konovalov
Cc: freebsd-current@FreeBSD.org, yal
Subject: Re: CURRENT freezes on Laitude D520

On Sun, 10 Dec 2006, Maxim Konovalov wrote:

>>> I didn't suggest to turn off mpsafenet forever and forget, I just wanted
>>> to check my guess. I would like to help to debug the problem but I need
>>> some initial instructions to start. There is a firewire console. What do
>>> I need to check?
>>
>> Start with the information in my followup e-mail to Andrew:
>>
>> - Configure WITNESS and see if you get any console output regarding
>>   lock order problems.
>
> Yes, there is one:
>
> lock order reversal
> 1st 0xd0f277c8 inp (rawinp) @ /usr/src/sys/netinet/raw_ip.c
> 2nd 0xd0ecbb54 wi0 (network driver) @ /usr/src/sys/modules/wi/../../dev/wi/if_wi.c
> KDB
> db_trace_self_wrapper(ce626f9d) at db_trace_self_wrapper+0x25
> kdb_backtrace(ffffffff,ce6a6378,ce6a6b20,ce65bd24,ce6e4ed0,...) at kdb_backtrace+0x29
> witness_checkorder(d0ecbb54,9,d0e73d13,388) at witness_checkorder+0x4db
> _mtx_lock_flags(d0ecbb54,0,d0e73d13,388,ce4d8cdd,...) at _mtx_lock_flags+0x1e
> wi_start(d0e05800) at wi_start+0x32
> if_start(d0e05800) at if_start+0x53
> ether_output_frame(d0e05800,d0d18100,0,1,0,...) at ether_output_frame+0x180
> ether_output(d0e05800,d0d18100,d0e652b0,d0e61bb8,ce6e6b18,...) at ether_output+0x3c0
> ieee80211_output(d0e05800,d0d18100,d0e652b0,d0e61bb8,0,...) at ieee80211_output+0x33
> ip_output(d0d18100,0,e1afbb38,20,0,...) at ip_output+0x7f0
> rip_output(d0d18100,d102ee44,1d2722c3,2000,e1afbbf0,...) at rip_output+0x29b
> rip_send(d102ee44,0,d0d18100,0,0,...) at rip_send+0x4f
> sosend_generic(d102ee44,0,0,d0d18100,0,...) at sosend_generic+0x3e1
> sosend(d102ee44,0,0,d0d18100,0,...) at sosend+0x22
> ng_ksocket_rcvdata(d10ab280,d104f750,1,e1afbc78,0,...) at ng_ksocket_rcvdata+0xa3
> ng_apply_item(d10ab200,d104f750,0,0,d10ab200,...) at ng_apply_item+0xf8
> ngintr(0) at ngintr+0x13d
> swi_net(0) at swi_net+0xba
> ithread_execute_handlers(d09acb40,d09dba00) at ithread_execute_handlers+0xce
> ithread_loop(d09dc180,e1afbd38,ce697af0,0,ce622832,328) at ithread_loop+0x4f
> fork_exit(ce4cdf0c,d09dc180,e1afbd38) at fork_exit+0x68
> fork_trampoline() at fork_trampoline+0x8
> --- trap 0x1, eip = 0, esp = 0xe1afbd6c, ebp = 0 ---
>
> At this point ifconfig wlan0 hangs, reboot hangs.
>
>> - Try setting net.isr.direct=0 and see if the problem goes away.
>
> This indeed help.
> LOR has gone and wireless works.
>
>> - Try removing options PREEMPTION and see if the problem goes away.
>
> Haven't try.

As speculated by others, this is a bug in the if_wi driver, which improperly
holds a device driver lock over a call into the network stack. While this can
result in a deadlock under other circumstances, net.isr.direct makes the
chances of that deadlock much greater. It appears also that you have netgraph
in the mix somehow, which might well also increase the chances of the deadlock
triggering.

Someone(tm) needs to fix if_wi to operate properly with respect to the network
stack lock order; another feature likely to trigger the same device driver bug
is IP fast forwarding from a wireless interface. Sam has mentioned to me that
this same bug exists in several wireless drivers.

Robert N M Watson
Computer Laboratory
University of Cambridge
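
For readers unfamiliar with lock order reversals, the following is a toy
illustration only, not how WITNESS or if_wi is actually implemented. It
assumes a hypothetical OrderChecker class and hypothetical rx_buggy, rx_fixed
and tx helpers; the lock names "inp" and "wi0" are borrowed from the report
above. The buggy receive path keeps the driver lock while entering the
stack, so the later send path (stack lock first, then driver lock) is flagged.
Dropping the driver lock before calling up avoids the conflicting order.

```python
import threading


class OrderChecker:
    """Toy lock-order recorder (hypothetical, single-threaded demo only)."""

    def __init__(self):
        self.seen = set()               # (held_first, taken_second) pairs observed
        self.held = threading.local()   # per-thread stack of held lock names
        self.reversals = []             # (1st, 2nd) pairs that contradict an earlier order

    def acquire(self, name):
        stack = getattr(self.held, "stack", None)
        if stack is None:
            stack = self.held.stack = []
        for earlier in stack:
            # If the opposite order was recorded before, this is a reversal.
            if (name, earlier) in self.seen:
                self.reversals.append((earlier, name))
            self.seen.add((earlier, name))
        stack.append(name)

    def release(self, name):
        self.held.stack.remove(name)


def rx_buggy(c):
    # Driver lock is still held while the packet is handed to the stack.
    c.acquire("wi0")
    c.acquire("inp")
    c.release("inp")
    c.release("wi0")


def rx_fixed(c):
    # Driver lock is dropped before calling into the stack.
    c.acquire("wi0")
    c.release("wi0")
    c.acquire("inp")
    c.release("inp")


def tx(c):
    # Send path: protocol lock first, then the driver lock in the start routine.
    c.acquire("inp")
    c.acquire("wi0")
    c.release("wi0")
    c.release("inp")


buggy = OrderChecker()
rx_buggy(buggy)
tx(buggy)

fixed = OrderChecker()
rx_fixed(fixed)
tx(fixed)

print(buggy.reversals)  # [('inp', 'wi0')]
print(fixed.reversals)  # []
```

The reported pair in the buggy case has the same shape as the "1st inp, 2nd
wi0" console output quoted above.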