Date: Tue, 21 Apr 2009 18:53:54 +0100 (BST)
From: Robert Watson
To: Mike Tancsa
Cc: freebsd-stable@FreeBSD.org, Ruslan Ermilov, John Baldwin
Subject: Re: RELENG_7 crash

On Tue, 21 Apr 2009, Mike Tancsa wrote:

> At 11:31 AM 4/21/2009, Ruslan Ermilov wrote:
>> :
>> : Note that these changes simply close races around use of ifindex_table,
>> : and make no attempt to solve the problem of disappearing ifnets.  Further
>>       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> : refinement of this work, including with respect to ifindex_table
>> : resizing, is still required.
>> :
>> : In a future change, the ifnet lock should be converted from a mutex to an
>> : rwlock in order to reduce contention.
>
> Thanks for the info!  In the meantime, apart from disabling snmpwalking, is
> there anything I can do to mitigate triggering this bug?  The box runs
> ospf/zebra for routing daemons and mpd53 for l2tp LNS termination.

There are several bugs here.  One is difficult to fix (lack of
refcounting), but there is also stuff like ifp being derived from an
interface number twice but checked against NULL only the first time (line
85 checks for NULL; line 88 re-queries with no check).  Fixing the top bit
of the function to query the ifp only once, and check it for NULL then,
would be a good idea.  More fundamentally, we do need to refcount ifnets
when they are used from the management path, which is not all that hard a
change, but it is preferable to try the easy way first given where we are
in the release cycle.

However, I wonder if your debugger is being totally honest with you.  Line
127 comes after several other dereferences of ifp, and there are calls to
functions with locking, so the compiler really shouldn't have reordered the
post-sysctl calls to be before the pre-sysctl calls that also dereference
it.  Could you try using addr2line and see if it gives you a different line
number, and/or check source and object file dates?

Robert N M Watson
Computer Laboratory
University of Cambridge
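
The "query the ifp only once and check it for NULL" suggestion above can be
sketched roughly as follows.  This is an illustration only, not the code
under discussion: the struct, the table, and the names fake_ifnet,
fake_ifnet_byindex and fake_get_mtu are all hypothetical stand-ins.  It also
does nothing about the harder lifetime problem (refcounting), since the
pointer could still go away after the check in a real kernel.

```c
#include <stddef.h>

/* Hypothetical stand-in for an interface structure; not the FreeBSD ifnet. */
struct fake_ifnet {
	int if_mtu;
};

#define FAKE_TABLE_SIZE 4

struct fake_ifnet fake_if1 = { 1500 };

/* Slot 1 is populated; every other slot models an absent interface. */
struct fake_ifnet *fake_table[FAKE_TABLE_SIZE] = { NULL, &fake_if1, NULL, NULL };

/* Return the interface for an index, or NULL if out of range or empty. */
struct fake_ifnet *
fake_ifnet_byindex(int idx)
{
	if (idx < 0 || idx >= FAKE_TABLE_SIZE)
		return (NULL);
	return (fake_table[idx]);
}

/*
 * Look the interface up exactly once, check the result, and use only that
 * pointer afterwards.  The buggy shape would call fake_ifnet_byindex() a
 * second time and dereference the new result without checking it.
 * Returns 0 on success, -1 if there is no such interface.
 */
int
fake_get_mtu(int idx, int *mtu)
{
	struct fake_ifnet *ifp;

	ifp = fake_ifnet_byindex(idx);
	if (ifp == NULL)
		return (-1);
	*mtu = ifp->if_mtu;
	return (0);
}
```

Holding the one checked pointer in a local keeps the NULL test and every
later dereference tied to the same lookup result.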