From owner-freebsd-net@FreeBSD.ORG Thu Mar 29 21:59:39 2012 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id BC2F7106566C; Thu, 29 Mar 2012 21:59:39 +0000 (UTC) (envelope-from rysto32@gmail.com) Received: from mail-wi0-f178.google.com (mail-wi0-f178.google.com [209.85.212.178]) by mx1.freebsd.org (Postfix) with ESMTP id 2AC488FC14; Thu, 29 Mar 2012 21:59:38 +0000 (UTC) Received: by wibhq7 with SMTP id hq7so3560wib.13 for ; Thu, 29 Mar 2012 14:59:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=GHlz166u2NjlKnc3cB0dVFb8oLC9r/HtwBvHvsklFE0=; b=06ipmV/a8hLLumYMQ1xMEi/sClQurzqxKRw4a1oaf83W9vzQy7F9Fkk3Llbat+y5rM iUxko/MXQ4+UUnIGs1dgf0R7yp+k0LCzsK37sWckDqsJuU3vbofvcQTogUQGPYhmw5ER hgdnoSeSJg+/27bvuqYoL6YWZoicdJIr01BkMEKEM6XlPGThhsCuOvvy7yme//Av4tu0 JkMZAloe3nJ22453dJz2OxV9PGyJ4cfE8VpOVAEBDprAFEAZtzgnjIA9QyPbKYpjSLs1 ALbicDpI0y5pHH5dEUmexWrZpcU1O4vOCNRItBSRLaj/58qAMGJ89eKyUOezD0qsFcgv FzYA== MIME-Version: 1.0 Received: by 10.180.95.74 with SMTP id di10mr2266419wib.1.1333058378223; Thu, 29 Mar 2012 14:59:38 -0700 (PDT) Received: by 10.180.79.137 with HTTP; Thu, 29 Mar 2012 14:59:38 -0700 (PDT) In-Reply-To: <201203090930.q299UCPX057950@freefall.freebsd.org> References: <201203090930.q299UCPX057950@freefall.freebsd.org> Date: Thu, 29 Mar 2012 17:59:38 -0400 Message-ID: From: Ryan Stone To: Gleb Smirnoff Content-Type: text/plain; charset=ISO-8859-1 Cc: freebsd-net@freebsd.org Subject: Re: kern/165863 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Mar 2012 21:59:39 -0000 Ok, I think that I have an approach that will work. This is heavily based off of glebius' proposal. The big difference is that instead of initializing the arptimer callout with the ll_entry's lock, I initialize it with the if_afdata_lock. This eliminates the LOR problem in arptimer. It also nicely prevents any races between arptimer and in_lltable_prefix_free. Either arptimer will run and ensure that prefix_free can never see an entry that arptimer is in the process of destroying, or prefix_free will stop the callout before arptimer gets a chance to start. I'll admit that it feels a bit dirty to be doing anything if the ifp at this level, but I'd argue that is an artifact of using a lock in the ifp to protect the lltable. The patch below is not yet complete; it doesn't fix the IPv6 case. IPv6 is looking much trickier as when an NDP entry is timed out nd6_free() will drop the LLE_WLOCK, acquire the IF_AFDATA_LOCK and then re-acquire the LLE_WLOCK. It's not immediately clear to me what the best way to handle the race between in6_lltable_prefix_free and nd6_llinfo_timer is. Holding the afdata_lock across all of nd6_llinfo_timer feels like overkill. Any comments on this approach? Am I going in the wrong direction? diff --git a/sys/netinet/if_ether.c b/sys/netinet/if_ether.c index bdb4efc..02d9af4 100644 --- a/sys/netinet/if_ether.c +++ b/sys/netinet/if_ether.c @@ -165,36 +165,26 @@ arptimer(void *arg) { struct ifnet *ifp; struct llentry *lle; - int pkts_dropped; + size_t pkts_dropped; KASSERT(arg != NULL, ("%s: arg NULL", __func__)); lle = (struct llentry *)arg; + + if (lle->la_flags & LLE_STATIC) + return; + ifp = lle->lle_tbl->llt_ifp; CURVNET_SET(ifp->if_vnet); - IF_AFDATA_LOCK(ifp); + IF_AFDATA_LOCK_ASSERT(ifp); LLE_WLOCK(lle); - if (lle->la_flags & LLE_STATIC) - LLE_WUNLOCK(lle); - else { - if (!callout_pending(&lle->la_timer) && - callout_active(&lle->la_timer)) { - callout_stop(&lle->la_timer); - LLE_REMREF(lle); - pkts_dropped = llentry_free(lle); - ARPSTAT_ADD(dropped, pkts_dropped); - ARPSTAT_INC(timeouts); - } else { -#ifdef DIAGNOSTIC - struct sockaddr *l3addr = L3_ADDR(lle); - log(LOG_INFO, - "arptimer issue: %p, IPv4 address: \"%s\"\n", lle, - inet_ntoa( - ((const struct sockaddr_in *)l3addr)->sin_addr)); -#endif - LLE_WUNLOCK(lle); - } - } - IF_AFDATA_UNLOCK(ifp); + LLE_REMREF(lle); + + /* llentry_free drops the LLE_WLOCK */ + pkts_dropped = llentry_free(lle); + + ARPSTAT_ADD(dropped, pkts_dropped); + ARPSTAT_INC(timeouts); + CURVNET_RESTORE(); } diff --git a/sys/netinet/in.c b/sys/netinet/in.c index ac0aada..0d5d80e 100644 --- a/sys/netinet/in.c +++ b/sys/netinet/in.c @@ -1275,15 +1275,17 @@ in_lltable_free(struct lltable *llt, struct llentry *lle) } static struct llentry * -in_lltable_new(const struct sockaddr *l3addr, u_int flags) +in_lltable_new(struct lltable *llt, const struct sockaddr *l3addr, u_int flags) { struct in_llentry *lle; + struct ifnet *ifp; lle = malloc(sizeof(struct in_llentry), M_LLTABLE, M_DONTWAIT | M_ZERO); if (lle == NULL) /* NB: caller generates msg */ return NULL; - callout_init(&lle->base.la_timer, CALLOUT_MPSAFE); + ifp = llt->llt_ifp; + callout_init_rw(&lle->base.la_timer, &ifp->if_afdata_lock, 0); /* * For IPv4 this will trigger "arpresolve" to generate * an ARP request. @@ -1292,6 +1294,7 @@ in_lltable_new(const struct sockaddr *l3addr, u_int flags) lle->l3_addr4 = *(const struct sockaddr_in *)l3addr; lle->base.lle_refcnt = 1; lle->base.lle_free = in_lltable_free; + lle->base.lle_tbl = llt; LLE_LOCK_INIT(&lle->base); return &lle->base; } @@ -1311,6 +1314,7 @@ in_lltable_prefix_free(struct lltable *llt, register int i; size_t pkts_dropped; + IF_AFDATA_WLOCK(llt->llt_ifp); for (i=0; i < LLTBL_HASHTBL_SIZE; i++) { LIST_FOREACH_SAFE(lle, &llt->lle_head[i], lle_next, next) { @@ -1318,20 +1322,19 @@ in_lltable_prefix_free(struct lltable *llt, * (flags & LLE_STATIC) means deleting all entries * including static ARP entries */ - if (IN_ARE_MASKED_ADDR_EQUAL((struct sockaddr_in *)L3_ADDR(lle), - pfx, msk) && - ((flags & LLE_STATIC) || !(lle->la_flags & LLE_STATIC))) { - int canceled; + if (IN_ARE_MASKED_ADDR_EQUAL(satosin(L3_ADDR(lle)), + pfx, msk) && ((flags & LLE_STATIC) || + !(lle->la_flags & LLE_STATIC))) { - canceled = callout_drain(&lle->la_timer); LLE_WLOCK(lle); - if (canceled) + if (callout_stop(&lle->la_timer)) LLE_REMREF(lle); pkts_dropped = llentry_free(lle); ARPSTAT_ADD(dropped, pkts_dropped); } } } + IF_AFDATA_WUNLOCK(llt->llt_ifp); } @@ -1451,7 +1454,7 @@ in_lltable_lookup(struct lltable *llt, u_int flags, const struct sockaddr *l3add in_lltable_rtcheck(ifp, flags, l3addr) != 0) goto done; - lle = in_lltable_new(l3addr, flags); + lle = in_lltable_new(llt, l3addr, flags); if (lle == NULL) { log(LOG_INFO, "lla_lookup: new lle malloc failed\n"); goto done; @@ -1462,7 +1465,6 @@ in_lltable_lookup(struct lltable *llt, u_int flags, const struct sockaddr *l3add lle->la_flags |= (LLE_VALID | LLE_STATIC); } - lle->lle_tbl = llt; lle->lle_head = lleh; LIST_INSERT_HEAD(lleh, lle, lle_next); } else if (flags & LLE_DELETE) {