Date:      Tue, 21 Apr 2009 18:53:54 +0100 (BST)
From:      Robert Watson <rwatson@FreeBSD.org>
To:        Mike Tancsa <mike@sentex.net>
Cc:        freebsd-stable@FreeBSD.org, Ruslan Ermilov <ru@FreeBSD.org>, John Baldwin <jhb@FreeBSD.org>
Subject:   Re: RELENG_7 crash
Message-ID:  <alpine.BSF.2.00.0904211851040.67705@fledge.watson.org>
In-Reply-To: <200904211610.n3LGAYll090970@lava.sentex.ca>
References:  <200904210524.n3L5O9YS086865@lava.sentex.ca> <200904211111.57295.jhb@freebsd.org> <200904211519.n3LFJFsk090691@lava.sentex.ca> <20090421153112.GA47589@edoofus.dev.vega.ru> <200904211610.n3LGAYll090970@lava.sentex.ca>

On Tue, 21 Apr 2009, Mike Tancsa wrote:

> At 11:31 AM 4/21/2009, Ruslan Ermilov wrote:
>> :
>> : Note that these changes simply close races around use of ifindex_table,
>> : and make no attempt to solve the problem of disappearing ifnets.  Further
>>       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> : refinement of this work, including with respect to ifindex_table
>> : resizing, is still required.
>> :
>> : In a future change, the ifnet lock should be converted from a mutex to an
>> : rwlock in order to reduce contention.
>
>        Thanks for the info!  In the meantime, apart from disabling
> snmpwalking, is there anything I can do to avoid triggering this bug?
> The box runs ospf/zebra for routing daemons and mpd53 for l2tp LNS
> termination.

There are several bugs here.  One is difficult to fix (lack of refcounting),
but there is also stuff like ifp being derived from an interface number twice
but checked against NULL only the first time (line 85 checks for NULL; line 88
re-queries with no check).  Fixing the top bit of the function to query the
ifp only once and check it for NULL then would be a good idea.  More
fundamentally, we do need to refcount ifnets when they are used from the
management path.  That is not all that hard a change, but it is preferable to
try the easy way first given where we are in the release cycle.
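
To illustrate the "query once, check once" fix described above, here is a minimal userspace sketch. It is not the actual kernel code: struct ifnet is reduced to two fields, and if_table, lookup_ifp() and get_mtu() are hypothetical stand-ins for ifindex_table, the kernel's index lookup, and the sysctl handler.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for the kernel's struct ifnet. */
struct ifnet {
	int if_index;
	int if_mtu;
};

/* Hypothetical stand-in for ifindex_table. */
#define IF_TABLE_SIZE 4
static struct ifnet *if_table[IF_TABLE_SIZE];

/* Hypothetical lookup: NULL if the index is out of range or the slot is empty. */
static struct ifnet *
lookup_ifp(int idx)
{
	if (idx < 0 || idx >= IF_TABLE_SIZE)
		return (NULL);
	return (if_table[idx]);
}

/*
 * Look the interface up exactly once, check the result, and use only that
 * pointer afterwards.  Returns 0 and stores the MTU on success, or -1 if
 * the interface does not exist.
 */
static int
get_mtu(int idx, int *mtu)
{
	struct ifnet *ifp = lookup_ifp(idx);

	if (ifp == NULL)
		return (-1);
	*mtu = ifp->if_mtu;	/* reuse ifp; no second, unchecked lookup */
	return (0);
}
```

This only removes the unchecked second lookup.  Without a reference held on the ifnet, the pointer could still go stale after the check, which is the harder refcounting problem.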

However, I wonder if your debugger is being totally honest with you.  Line 127 
is after several other dereferences of ifp, and there are calls to functions 
with locking, so the compiler really shouldn't have reordered the post-sysctl 
calls to be before the pre-sysctl calls that also dereference it.  Could you 
try using addr2line and see if it gives you a different line number, and/or 
check source and object file dates?

Robert N M Watson
Computer Laboratory
University of Cambridge
