From owner-svn-src-all@FreeBSD.ORG Fri Feb 13 23:38:22 2015 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id DF5F79B2; Fri, 13 Feb 2015 23:38:22 +0000 (UTC) Received: from cell.glebius.int.ru (glebius.int.ru [81.19.69.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "cell.glebius.int.ru", Issuer "cell.glebius.int.ru" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 6199AC47; Fri, 13 Feb 2015 23:38:21 +0000 (UTC) Received: from cell.glebius.int.ru (localhost [127.0.0.1]) by cell.glebius.int.ru (8.14.9/8.14.9) with ESMTP id t1DNcCtw033825 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Sat, 14 Feb 2015 02:38:12 +0300 (MSK) (envelope-from glebius@FreeBSD.org) Received: (from glebius@localhost) by cell.glebius.int.ru (8.14.9/8.14.9/Submit) id t1DNcB49033824; Sat, 14 Feb 2015 02:38:11 +0300 (MSK) (envelope-from glebius@FreeBSD.org) X-Authentication-Warning: cell.glebius.int.ru: glebius set sender to glebius@FreeBSD.org using -f Date: Sat, 14 Feb 2015 02:38:11 +0300 From: Gleb Smirnoff To: Randall Stewart Subject: Re: svn commit: r278472 - in head/sys: netinet netinet6 Message-ID: <20150213233811.GH15484@FreeBSD.org> References: <201502091928.t19JSC5P066293@svn.freebsd.org> <1903622.i3cFFNVYcL@ralph.baldwin.cx> <20150213212128.GG15484@FreeBSD.org> <89BA29F2-C617-40DE-90E5-F3EC9C17AEC8@netflix.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <89BA29F2-C617-40DE-90E5-F3EC9C17AEC8@netflix.com> User-Agent: Mutt/1.5.23 (2014-03-12) Cc: src-committers@freebsd.org, John Baldwin , Eric van Gyzen , svn-src-all@freebsd.org, Randall Stewart , svn-src-head@freebsd.org, zont@FreeBSD.org, rstone@FreeBSD.org X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 Feb 2015 23:38:23 -0000 Randall, thanks a lot for investigating that. On Fri, Feb 13, 2015 at 05:27:40PM -0500, Randall Stewart wrote: R> Gleb: R> R> Ok here is the diff of the arp timer function that this changed made (238990): R> **************************** R> arptimer(void *arg) R> { R> + struct llentry *lle = (struct llentry *)arg; R> struct ifnet *ifp; R> - struct llentry *lle; R> - int pkts_dropped; R> + size_t pkts_dropped; R> R> - KASSERT(arg != NULL, ("%s: arg NULL", __func__)); R> - lle = (struct llentry *)arg; R> + if (lle->la_flags & LLE_STATIC) { R> + LLE_WUNLOCK(lle); R> + return; R> + } R> + R> ifp = lle->lle_tbl->llt_ifp; R> CURVNET_SET(ifp->if_vnet); R> + R> + if (lle->la_flags != LLE_DELETED) { R> + int evt; R> + R> + if (lle->la_flags & LLE_VALID) R> + evt = LLENTRY_EXPIRED; R> + else R> + evt = LLENTRY_TIMEDOUT; R> + EVENTHANDLER_INVOKE(lle_event, lle, evt); R> + } R> + R> + callout_stop(&lle->la_timer); R> + R> + /* XXX: LOR avoidance. We still have ref on lle. */ R> + LLE_WUNLOCK(lle); R> IF_AFDATA_LOCK(ifp); R> LLE_WLOCK(lle); R> - if (lle->la_flags & LLE_STATIC) R> - LLE_WUNLOCK(lle); R> - else { R> - if (!callout_pending(&lle->la_timer) && R> - callout_active(&lle->la_timer)) { R> - callout_stop(&lle->la_timer); R> - LLE_REMREF(lle); R> R> - if (lle->la_flags != LLE_DELETED) { R> - int evt; R> - R> - if (lle->la_flags & LLE_VALID) R> - evt = LLENTRY_EXPIRED; R> - else R> - evt = LLENTRY_TIMEDOUT; R> - EVENTHANDLER_INVOKE(lle_event, lle, evt); R> - } R> - R> - pkts_dropped = llentry_free(lle); R> - ARPSTAT_ADD(dropped, pkts_dropped); R> - ARPSTAT_INC(timeouts); R> - } else { R> -#ifdef DIAGNOSTIC R> - struct sockaddr *l3addr = L3_ADDR(lle); R> - log(LOG_INFO, R> - "arptimer issue: %p, IPv4 address: \"%s\"\n", lle, R> - inet_ntoa( R> - ((const struct sockaddr_in *)l3addr)->sin_addr)); R> -#endif R> - LLE_WUNLOCK(lle); R> - } R> - } R> + LLE_REMREF(lle); R> + pkts_dropped = llentry_free(lle); R> IF_AFDATA_UNLOCK(ifp); R> + ARPSTAT_ADD(dropped, pkts_dropped); R> + ARPSTAT_INC(timeouts); R> CURVNET_RESTORE(); R> } R> ****************************** R> R> And I can see *what* the problem was.. If you look at the lines: R> - if (!callout_pending(&lle->la_timer) && R> - callout_active(&lle->la_timer)) { R> R> This is the *incorrect* way to write this code it should have been: R> if (callout_pending(&lle->la_timer) && R> !callout_active(&lle->la_timer)) { R> R> To properly do the MPSAFE thing.. take a look at the callout manual.. R> So the original author just mixed up the test.. R> R> The idea is after you get the lock you check if its pending again, if R> so someone has restarted it.. and if its not active, then someone has R> stopped it. R> R> They check if it was *not* pending.. which is almost always the case R> since the callout code removes the pending flag and thats what you want R> to be there. So not pending would always be true.. R> R> I don’t see that this old code did the callout_deactive().. but I think R> the callout_stop() does that I guess.. R> R> I think the problem was that the original code was wrong and that R> the fix yes, avoided one race but put in another. R> R> I do think a callout_async_drain is the right thing, but that will take a while R> to get right. Peter Holm (thank you so much) has been pounding on this, if you R> could find another test to add.. that would be great. I think this will work R> the way it is.. R> R> R R> R> R> R> R> R> R> On Feb 13, 2015, at 4:21 PM, Gleb Smirnoff wrote: R> R> > 165863 R> R> -------- R> Randall Stewart R> rrs@netflix.com R> 803-317-4952 R> R> R> R> R> -- Totus tuus, Glebius.