From owner-svn-src-all@freebsd.org Fri Jun 8 05:03:06 2018 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id DA8F0101022F; Fri, 8 Jun 2018 05:03:05 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-qt0-x241.google.com (mail-qt0-x241.google.com [IPv6:2607:f8b0:400d:c0d::241]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 74F9D72B2F; Fri, 8 Jun 2018 05:03:05 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by mail-qt0-x241.google.com with SMTP id d3-v6so12316481qto.1; Thu, 07 Jun 2018 22:03:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=gXCMWOxqFAkbidGNU9RKHmHQAsT5Tc524QWYSp9eqBk=; b=ckhsSD0As07JuG4gcSy7JQfpKdSZDQufDgyijTZOg1zjFf74plOYtkHOGFOUarGeQK iem4YZdt9rH0l9HM6CiZyGnEHribCt6L9yB8adw84t9mtajMwknPDEMoEXrkL5Ei27B9 7QkrlqUX0vZTIIg7ATHLO6SUa6LWrwp775/UA1+MmCR6/MQ+ZmjDa/wqALLGfz5zd7xC 8dUrw2MnSCfjuH0NgB7j7cHChkzESIrRAVP18sptBv2RjKnWniUd8Tp6Yupo+lBzvARH J0O+NG5w8aVLGztuWy9WZ7BshOf4uJAL0+Vzz7b9BAd067ldJf0nEgBXJOqUSpyfXrP9 HV9g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=gXCMWOxqFAkbidGNU9RKHmHQAsT5Tc524QWYSp9eqBk=; b=MxbLHJ8R3bDv1DQgL9tjQuyWcoIAAA5ab2Ln3zSVQyL9Z9Bd+YDGWpC6YVQ/jWHYSx 1siMz0VdXcKt2GIkFrCekMfI/HA7atvXavoDVRkXOB9B7S3qtWE5vqFQ7/+6+nUjYzzq beYQ08S8Li151zi2Pk4HktKhh4QXVHKIDCogeLcH3Amur9VPNCKvEJJ0YokuNTjSUxCL RP5V//ZN++HMuJuTGAJayBO777y/vnUuSYFAwJ+sUrijROzx7qpxZZz/XObAYzb7h5Hn t86WOM3f1xKo4fYh6A4vo8tmBKjh/AZlt7c8/DoIDY33ipQUFmahsG/11B2lANoPHsxA RIhA== X-Gm-Message-State: APt69E2VAUd8LVf3JD2TNnsuzHq0hjQ7MK5eTSh60s366wK+w6xqGTeS rmTfKP8AixNvQNyDwgSyhJ+9cCq4QW6BWK1s/MQ= X-Google-Smtp-Source: ADUXVKJQEP+fdmv1oryv3QaX5e1PFpAYF85yv5vzI6fY9mjc3QQ1bDGbZzEeZ1n+0yK7dcKZDXHQU/EH59aYOUPl0No= X-Received: by 2002:ac8:2779:: with SMTP id h54-v6mr4386857qth.85.1528434184934; Thu, 07 Jun 2018 22:03:04 -0700 (PDT) MIME-Version: 1.0 Received: by 2002:ac8:1c4e:0:0:0:0:0 with HTTP; Thu, 7 Jun 2018 22:03:04 -0700 (PDT) In-Reply-To: References: <201806061257.w56CvCwq089369@repo.freebsd.org> <20180606140311.GU2450@kib.kiev.ua> <20180608033242.GA54099@pesky> From: Mateusz Guzik Date: Fri, 8 Jun 2018 07:03:04 +0200 Message-ID: Subject: Re: svn commit: r334708 - head/sys/kern To: Ryan Libby Cc: Mark Johnston , Konstantin Belousov , Justin Hibbits , src-committers , svn-src-all@freebsd.org, svn-src-head@freebsd.org Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.26 X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.26 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Jun 2018 05:03:06 -0000 On Fri, Jun 8, 2018 at 6:29 AM, Ryan Libby wrote: > On Thu, Jun 7, 2018 at 8:32 PM, Mark Johnston wrote: > > On Wed, Jun 06, 2018 at 05:03:11PM +0300, Konstantin Belousov wrote: > >> On Wed, Jun 06, 2018 at 12:57:12PM +0000, Justin Hibbits wrote: > >> > Author: jhibbits > >> > Date: Wed Jun 6 12:57:11 2018 > >> > New Revision: 334708 > >> > URL: https://svnweb.freebsd.org/changeset/base/334708 > >> > > >> > Log: > >> > Add a memory barrier after taking a reference on the vnode holdcnt > in _vhold > >> > > >> > This is needed to avoid a race between the VNASSERT() below, and > another > >> > thread updating the VI_FREE flag, on weakly-ordered architectures. > >> > > >> > On a 72-thread POWER9, without this barrier a 'make -j72 > buildworld' would > >> > panic on the assert regularly. > >> > > >> > It may be possible to use a weaker barrier, and I'll investigate > that once > >> > all stability issues are worked out on POWER9. > >> > > >> > Modified: > >> > head/sys/kern/vfs_subr.c > >> > > >> > Modified: head/sys/kern/vfs_subr.c > >> > ============================================================ > ================== > >> > --- head/sys/kern/vfs_subr.c Wed Jun 6 10:46:24 2018 > (r334707) > >> > +++ head/sys/kern/vfs_subr.c Wed Jun 6 12:57:11 2018 > (r334708) > >> > @@ -2807,6 +2807,9 @@ _vhold(struct vnode *vp, bool locked) > >> > CTR2(KTR_VFS, "%s: vp %p", __func__, vp); > >> > if (!locked) { > >> > if (refcount_acquire_if_not_zero(&vp->v_holdcnt)) { > >> > +#if !defined(__amd64__) && !defined(__i386__) > >> > + mb(); > >> > +#endif > >> > VNASSERT((vp->v_iflag & VI_FREE) == 0, vp, > >> > ("_vhold: vnode with holdcnt is free")); > >> > return; > >> First, mb() must not be used in the FreeBSD code at all. > >> Look at atomic_thread_fenceXXX(9) KPI. > >> > >> Second, you need the reciprocal fence between clearing of VI_FREE and > >> refcount_acquire(), otherwise the added barrier is nop. Most likely, > >> you got away with it because there is a mutex unlock between clearing > >> of VI_FREE and acquire, which release semantic you abused. > > > > I note that vnlru_free_locked() clears VI_FREE and increments v_holdcnt > > without an intervening release fence. At this point the caller has not > > purged the vnode from the name cache, so it seems possible that the > > panicking thread observed the two stores out of order. In particular, it > > seems to me that the patch below is necessary, but possibly (probably?) > > not sufficient: > > Mark, Justin, and I looked at this. > > I think that the VNASSERT itself is not quite valid, since it is > checking the value of v_iflag without the vnode interlock held (and > without the vop lock also). If the rule is that we normally need the > vnode interlock to check VI_FREE, then I don't think that possible > reordering between the refcount_acquire and VI_FREE clearing in > vnlru_free_locked is necessarily a problem in production. > Checking it without any locks is perfectly valid in this case. It is done after v_holdcnt gets bumped from a non-zero value. So at that time it is at least two. Of course that result is stale as an arbitrary number of other threads could have bumped and dropped the ref past that point. The minimum value is 1 since we hold the ref. But this means the vnode must not be on the free list and that's what the assertion is verifying. The problem is indeed lack of ordering against the code clearing the flag for the case where 2 threads to vhold and one does the 0->1 transition. That said, the fence is required for the assertion to work. > > It might just be that unlocked assertions about v_iflag actually need > additional synchronization (although it would be a little unfortunate to > add synchronization only to INVARIANTS builds). > > > > >> Does the fence needed for the non-invariants case ? > >> > >> Fourth, doesn't v_usecount has the same issues WRT inactivation ? > > > > diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c > > index 286a871c3631..c97a8ba63612 100644 > > --- a/sys/kern/vfs_subr.c > > +++ b/sys/kern/vfs_subr.c > > @@ -1018,6 +1018,7 @@ vnlru_free_locked(int count, struct vfsops *mnt_op) > > */ > > freevnodes--; > > vp->v_iflag &= ~VI_FREE; > > + atomic_thread_fence_rel(); > This probably has no added value for non-debug kernels. > > refcount_acquire(&vp->v_holdcnt); > > > > mtx_unlock(&vnode_free_list_mtx); > > @@ -2807,9 +2808,7 @@ _vhold(struct vnode *vp, bool locked) > > CTR2(KTR_VFS, "%s: vp %p", __func__, vp); > > if (!locked) { > > if (refcount_acquire_if_not_zero(&vp->v_holdcnt)) { > > -#if !defined(__amd64__) && !defined(__i386__) > > - mb(); > > -#endif > > + atomic_thread_fence_acq(); > Same as above. > > VNASSERT((vp->v_iflag & VI_FREE) == 0, vp, > > ("_vhold: vnode with holdcnt is free")); > > return; > The main codepath which runs into this (... -> cache_lookup -> vhold) most definitely does not need the fence for production operation. What is unclear without audit is whether there are vhold users which need one. I think the patch is fine to go in if the other VI_FREE place gets a comment about a fence stemming from mtx_unlock and there is another one prior to the assertion explaining that this orders against v_iflag tests and may or may not be needed for other consumers. In general the code is *full* of data races and accidental reliance of ordering provided by amd64. Weeding this all out will be a painful exercise. Part of the problem is lack of primitives like READ_ONCE/WRITE_ONCE as seen in the linux kernel, someone should hack up equivalents. -- Mateusz Guzik