Message-ID: <47757413.6010807@intersonic.se>
Date: Fri, 28 Dec 2007 23:09:23 +0100
From: Per olof Ljungmark
Organization: Intersonic AB
To: Stefan Lambrev
Cc: Maxime Henrion, freebsd-net@FreeBSD.org, freebsd-current
Subject: Re: [Fwd: Re: rtfree: 0xc5caad98 has 2 refs]

Stefan Lambrev wrote:
> Hi,
>
> Can you replace all calls to rtfree() with RTFREE_LOCKED() in those files:
>
> netinet/if_ether.c
> netinet6/nd6_nbr.c
> netinet6/in6_ifattach.c
> netinet6/in6_gif.c
>
> Of course do not forget net/route.c with the patch from the PR.
> Recompile the kernel and check if this will cure your hangs?
>
> I'm not sure about the lock order reversal, maybe it was introduced
> with kdb_backtrace().
> You can remove it from route.c, replace rtfree() and build kernel with
> debug, to see if the LOR is gone.
>
> It seems that the panic is caused by rtalloc1() called in route.c line
> 333 :
> rt = rtalloc1(dst, 0, 0UL); /* NB: rt is locked */
>
> most probably because rt is not locked :)
> I'm out of ideas how to check if it is really locked, but you can
> experiment with RT_LOCK() and RT_UNLOCK().
> Maybe mtx_trylock() can help too.
>
> Please share your findings with -net & -current if you did not before.
>
> =cut=

Unfortunately I ran out of time before I could complete the test.

However, I can report one more interesting finding from today: the ICMP
packets that trigger the bug probably come either from a Cisco router
or from the setup itself.

Late today our network topology was changed.

Previous setup:

affected hosts      ISP's router (default gw) .1
LAN ------------ router-------- wlan 1 (via ISP)
      |
192.168.3.0    our firewall .254
      |
      fw ----------wlan 2
      |             172.16.2.0 (isakmpd)
      |
   Internet

Current setup:

affected hosts     our fw (OpenBSD) .1
192.168.3.0
LAN ------------ router------ wlan 1 (isakmpd)
                   |
                   |         172.16.2.0 (isakmpd)
                   | --------wlan 2
                   |
                   |
                Internet

and this "fixed" the problem! We have no access to the Cisco, so I don't
know its configuration. But: no lockups, no "rtfree" messages.

If the bug is still unresolved in mid-January I can continue testing then.

Thanks to all for your suggestions and help!

--per
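
For reference, the mtx_trylock() idea from the quoted mail can be sketched in userland. This is only an illustration under stated assumptions: a POSIX mutex stands in for the kernel mutex, and struct route_entry, route_entry_init() and route_entry_is_locked() are hypothetical names, not anything from net/route.c. The probe tells you whether somebody holds the lock, not who holds it; inside the kernel, an ownership assertion such as mtx_assert() is probably the more direct check.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a route entry; only the lock matters here. */
struct route_entry {
    pthread_mutex_t mtx;
};

/* Initialise the entry's mutex with default (non-recursive) attributes. */
int route_entry_init(struct route_entry *rt)
{
    return pthread_mutex_init(&rt->mtx, NULL);
}

/*
 * Probe the lock without blocking.
 * Returns 1 if the mutex is currently held, 0 if it was free,
 * and -1 if the probe itself failed.
 */
int route_entry_is_locked(struct route_entry *rt)
{
    int rc = pthread_mutex_trylock(&rt->mtx);

    if (rc == 0) {
        /* We acquired it, so it was free; release it again at once. */
        pthread_mutex_unlock(&rt->mtx);
        return 0;
    }
    return (rc == EBUSY) ? 1 : -1;
}
```

The answer is only a snapshot: another thread may take or drop the lock right after the probe returns, so this is a debugging aid rather than a synchronisation primitive.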