From: Kris Kennaway
To: Kris Kennaway
Cc: gnn@freebsd.org, Hajimu UMEMOTO, net@FreeBSD.org
Date: Mon, 6 Mar 2006 18:15:56 -0500
Subject: Re: ipv6 panic in 6.0 ([kris@FreeBSD.org: kern/85780: 'panic: bogus refcnt 0' in routing/ipv6])
Message-ID: <20060306231556.GA54600@xor.obsecurity.org>
In-Reply-To: <20051005174012.GB36638@xor.obsecurity.org>

I've been adding KTR debugging to try and track down the cause of this
recurring problem (FYI:
debug.mpsafenet=0 is no longer working around it).

To refresh your memory, here is the panic:

db> wh
Tracing pid 24 tid 100012 td 0xfffff802be9fa560
panic() at panic+0x164
rtfree() at rtfree+0xb4
nd6_na_output() at nd6_na_output+0x540
nd6_ns_input() at nd6_ns_input+0x738
icmp6_input() at icmp6_input+0xc38
ip6_input() at ip6_input+0x1038
netisr_processqueue() at netisr_processqueue+0x7c
swi_net() at swi_net+0xdc
ithread_execute_handlers() at ithread_execute_handlers+0x144
ithread_loop() at ithread_loop+0xa4
fork_exit() at fork_exit+0x94
fork_trampoline() at fork_trampoline+0x8
db>

It's always in nd6_na_output(), although the trace beyond this point
varies. However, that doesn't tell us what leaked the reference count
prior to this stack trace.

So far I have narrowed it down to:

db> show ktr/v
                   Timestamp --v
9320 (0xfffff802be9fa560:cpu5) 1815572139270 net/route.c.247: Removing ref -> 0 0xfffff80227cefc20

^-- This is the cause of the panic in rtfree(), since it tries to
decrement from 0.

9319 (0xfffff802be9fa560:cpu5) 1815572138338 netinet6/nd6_nbr.c.1028: Freeing route 0xfffff80227cefc20 with ref 0

^-- This is the call to rtfree() above, which is here at the end of
nd6_na_output():

        if (ro.ro_rt) {         /* we don't cache this route. */
                RTFREE(ro.ro_rt);
        }
        return;

9318 (0xfffff802be9fa560:cpu5) 1815572070306 net/route.c.247: Removing ref -> 1 0xfffff80227cefc20

This is the previous time rtfree() was run.

9317 (0xfffff802be9fa560:cpu5) 1815572068930 netinet6/in6_src.c.703: rtfree 0xfffff80227cefc20

^-- This is the call to rtfree() in 9318, which is at the end of
in6_selectif():

        if (rt && rt == sro.ro_rt)
                RTFREE(rt);
        return (0);

My next step is to add KTR logging to all the callers of
in6_selectif() to backtrace another level, but perhaps someone has
ideas about what can be going wrong from the partial trace already.
9316 (0xfffff802be9fa560:cpu5) 1815572067244 net/route.c.198: Adding ref -> 0 0xfffff80227cefc20

This is in rtalloc1():

        } else {
                KASSERT(rt == newrt, ("locking wrong route"));
                RT_LOCK(newrt);
                RT_ADDREF(newrt);

I suppose I also need to add KTR logging to the callers of rtalloc1().

9315 (0xfffff802be9fa560:cpu5) 1815572057262 netinet6/nd6.c.877: Removing ref -> 1 0xfffff80227cefc20

This is in nd6_lookup():

        }
        RT_LOCK_ASSERT(rt);
        RT_REMREF(rt);
        /*
         * Validation for the entry.
         * Note that the check for rt_llinfo is necessary because a cloned
         * route from a parent route that has the L flag (e.g. the default

NB: The RT_LOCK_ASSERT() is superfluous here since RT_REMREF() already
asserts it.

9314 (0xfffff802be9fa560:cpu5) 1815572046008 net/route.c.198: Adding ref -> 0 0xfffff80227cefc20

Kris

P.S. This comment in netinet6/ip6_output.c appears to be bogus, since
RTFREE is only a single statement:

        if (ro == &ip6route && ro->ro_rt) { /* brace necessary for RTFREE */
                RTFREE(ro->ro_rt);
        } else if (ro_pmtu == &ip6route && ro_pmtu->ro_rt) {
                RTFREE(ro_pmtu->ro_rt);
        }