From: Marko Zec
To: Vijay Singh
Cc: "freebsd-net@freebsd.org"
Subject: Re: vnet deletion panic
Date: Tue, 4 Feb 2014 05:52:29 +0100
Organization: FER
Message-ID: <20140204055229.4a52ec15@x23.lan>
List-Id: Networking and TCP/IP with FreeBSD

On Mon, 3 Feb 2014 19:33:21 -0800
Vijay Singh wrote:

> I'm running into a crash on vnet deletion in the presence of
> routing sockets.
> The root cause seems to originate from:
>
> if_detach_internal() -> if_down(ifp) -> if_unroute() -> rt_ifmsg() ->
> rt_dispatch()
>
> In rt_dispatch() we have:
>
> #ifdef VIMAGE
>         if (V_loif)
>                 m->m_pkthdr.rcvif = V_loif;
> #endif
>         netisr_queue(NETISR_ROUTE, m);
>
> Now since this would be processed async, and the ifp above is the
> loopback of the vnet being deleted, we run into accessing a freed
> pointer (ifp) when netisr picks up the mbuf. So I am wondering how to
> fix this. I am thinking that we could do something like the following
> in rt_dispatch():
>
> #ifdef VIMAGE
>         if (V_loif) {
>                 if ((ifp == V_loif) && !IS_DEFAULT_VNET(curvnet)) {
>                         CURVNET_SET_QUIET(vnet0);
>                         m->m_pkthdr.rcvif = V_loif;
>                         CURVNET_RESTORE();
>                 } else
>                         m->m_pkthdr.rcvif = V_loif;
>         }
> #endif
>
> So basically switch to the default vnet for the mbuf with the routing
> socket message. Thoughts?

By design, the vnet teardown procedure should not commence before the
last socket attached to that vnet is closed, so I doubt that the
proposed approach would actually cure the panics you're observing.
Furthermore, it would certainly cause bogus routing messages to appear
in vnet0 and possibly confuse routing socket consumers running there.
Plus, rt_dispatch() has no ifp context at all to check against V_loif,
which is what your patch proposes to do.

Perhaps it would be possible to walk through all the netisr queues just
before V_loif gets destroyed, and prune all queued mbufs which have
m->m_pkthdr.rcvif pointing to V_loif? The vnet teardown procedure
cannot be initiated before all (routing) sockets attached to that vnet
have been closed. So once all other ifnets except V_loif have also been
destroyed, it should not be possible for new mbufs to be queued with
rcvif pointing back to V_loif, and at least conceptually that approach
might work correctly.

Marko
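
For illustration only, the pruning idea suggested above could be
modelled in userland C roughly as follows. This is a sketch under loud
assumptions: struct fake_ifnet, struct fake_mbuf, struct fake_queue,
fake_enqueue() and fake_prune_rcvif() are all hypothetical stand-ins
invented here, not kernel types or netisr(9) interfaces, and a real
implementation would also have to deal with netisr locking and per-CPU
workstreams, which are not modelled.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; NOT the kernel's struct ifnet / struct mbuf. */
struct fake_ifnet {
	int if_index;
};

struct fake_mbuf {
	struct fake_mbuf  *next;
	struct fake_ifnet *rcvif;	/* plays the role of m->m_pkthdr.rcvif */
};

struct fake_queue {
	struct fake_mbuf *head;
	size_t            len;
};

/* Append a message tagged with rcvif to the tail; 0 on success, -1 on ENOMEM. */
static int
fake_enqueue(struct fake_queue *q, struct fake_ifnet *rcvif)
{
	struct fake_mbuf *m = calloc(1, sizeof(*m));
	struct fake_mbuf **pp;

	if (m == NULL)
		return (-1);
	m->rcvif = rcvif;
	for (pp = &q->head; *pp != NULL; pp = &(*pp)->next)
		;
	*pp = m;
	q->len++;
	return (0);
}

/*
 * Unlink and free every queued message whose rcvif points at ifp, so
 * that nothing left in the queue refers to ifp once it is destroyed.
 * Returns the number of messages pruned.
 */
static size_t
fake_prune_rcvif(struct fake_queue *q, const struct fake_ifnet *ifp)
{
	struct fake_mbuf **pp = &q->head;
	size_t pruned = 0;

	while (*pp != NULL) {
		struct fake_mbuf *m = *pp;

		if (m->rcvif == ifp) {
			*pp = m->next;	/* unlink before freeing */
			free(m);
			q->len--;
			pruned++;
		} else
			pp = &m->next;
	}
	return (pruned);
}
```

In this model the teardown path would call fake_prune_rcvif(&q, loif)
once for each queue, after all other ifnets are gone and immediately
before freeing loif.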