From owner-svn-src-all@FreeBSD.ORG  Wed May 20 17:49:40 2015
Return-Path: <owner-svn-src-all@FreeBSD.ORG>
Delivered-To: svn-src-all@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id CD13E244;
 Wed, 20 May 2015 17:49:40 +0000 (UTC)
Received: from mail-lb0-x229.google.com (mail-lb0-x229.google.com
 [IPv6:2a00:1450:4010:c04::229])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 3BF1D1AF4;
 Wed, 20 May 2015 17:49:40 +0000 (UTC)
Received: by lbbuc2 with SMTP id uc2so259305lbb.2;
 Wed, 20 May 2015 10:49:38 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date:message-id:subject
 :from:to:cc:content-type;
 bh=qCwIhfNgwnH5zswek0XJzjtOn1kTy0FLys4XPTqGv6Q=;
 b=iuT0bUAQ6fcFd1bl2qQirYkvl4vDqkKxQE841m0it7doKBibhGpDGbwkkBm7YbjQFt
 53IrVdE0GJMibq4nm7k8iidbQldpNyu0pqKSjSM/OEd4gOT+b8HdfNed20gYeOmZ3eo6
 udhkX1rz9uTaLaCphkO09sUvVSE2ZcjxraGF00U3UxPUeEU59ppkdz+UHuv8uI67b2va
 FAa2uPyO5MpLHSPGiBZXy5JBqJ/51NPkGh9fRNMZRF6VUi08HGULc+wItic8X5DhD8oH
 GW2RYSVbI4rmCVMpQo4Tqlep/6tB/fL9qkQUE9ltHDclWsZQjwNbs55BLvImhSBpYBQ+
 339Q==
MIME-Version: 1.0
X-Received: by 10.152.42.211 with SMTP id q19mr27716551lal.78.1432144178266;
 Wed, 20 May 2015 10:49:38 -0700 (PDT)
Sender: mahrens@gmail.com
Received: by 10.112.188.164 with HTTP; Wed, 20 May 2015 10:49:38 -0700 (PDT)
In-Reply-To: <20150520134101.69e555d7@kan>
References: <201505151350.t4FDocQT054144@svn.freebsd.org>
 <20150520120046.268dde86@kan>
 <CAKUb7iv0xTtivBb9TXMG_iTBJp2m-E8i87cDLutMfhk4BJnK4w@mail.gmail.com>
 <20150520134101.69e555d7@kan>
Date: Wed, 20 May 2015 10:49:38 -0700
X-Google-Sender-Auth: dK7iJ3NVKt7HDkR12NsvL3XkNWc
Message-ID: <CAKUb7ivud+SEx9N3NPtWff7xSaKAprsFOVCpERdjZ8K-jHtZWA@mail.gmail.com>
Subject: Re: svn commit: r282971 - in head/sys: kern sys
From: Matthew Ahrens <matt@mahrens.org>
To: Alexander Kabaev <kabaev@gmail.com>
Cc: John Baldwin <jhb@freebsd.org>, 
 "src-committers@freebsd.org" <src-committers@freebsd.org>,
 svn-src-all@freebsd.org, svn-src-head@freebsd.org
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.20
X-BeenThere: svn-src-all@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: "SVN commit messages for the entire src tree \(except for &quot;
 user&quot; and &quot; projects&quot; \)" <svn-src-all.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/svn-src-all>,
 <mailto:svn-src-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-all/>
List-Post: <mailto:svn-src-all@freebsd.org>
List-Help: <mailto:svn-src-all-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-all>,
 <mailto:svn-src-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 20 May 2015 17:49:41 -0000

On Wed, May 20, 2015 at 10:41 AM, Alexander Kabaev <kabaev@gmail.com> wrote:

> On Wed, 20 May 2015 09:54:45 -0700
> Matthew Ahrens <matt@mahrens.org> wrote:
>
> > On Wed, May 20, 2015 at 9:00 AM, Alexander Kabaev <kabaev@gmail.com>
> > wrote:
> >
> > > On Fri, 15 May 2015 13:50:38 +0000 (UTC)
> > > John Baldwin <jhb@FreeBSD.org> wrote:
> > >
> > > > Author: jhb
> > > > Date: Fri May 15 13:50:37 2015
> > > > New Revision: 282971
> > > > URL: https://svnweb.freebsd.org/changeset/base/282971
> > > >
> > > > Log:
> > > >   Previously, cv_waiters was only updated by cv_signal or
> > > > cv_wait. If a thread awakened due to a time out, then cv_waiters
> > > > was not decremented. If INT_MAX threads timed out on a cv without
> > > > an intervening cv_broadcast, then cv_waiters could overflow. To
> > > > fix this, have each sleeping thread decrement cv_waiters when it
> > > > resumes.
> > > >
> > > >   Note that previously cv_waiters was protected by the sleepq
> > > > chain lock. However, that lock is not held when threads resume
> > > > from sleep. In addition, the interlock is also not always
> > > > reacquired after resuming (cv_wait_unlock), nor is it always held
> > > > by callers of cv_signal() or cv_broadcast(). Instead, use atomic
> > > > ops to update cv_waiters. Since the sleepq chain lock is still
> > > > held on every increment, it should still be safe to compare
> > > > cv_waiters against zero while holding the lock in the wakeup
> > > > routines as the only way the race should be lost would result in
> > > > extra calls to sleepq_signal() or sleepq_broadcast().
> > > >   Differential Revision:      https://reviews.freebsd.org/D2427
> > > >   Reviewed by:        benno
> > > >   Reported by:        benno (wrap of cv_waiters in the field)
> > > >   MFC after:  2 weeks
> > > >
> > > > Modified:
> > > >   head/sys/kern/kern_condvar.c
> > > >   head/sys/sys/condvar.h
> > > >
> > >
> > > This breaks ZFS range locking code, which expects to be able to
> > > wakeup everyone on the condition variable and then free the
> > > structure that contains it. Having woken up threads modify
> > > cv_waiters results in a race that leads to already freed memory to
> > > be accessed.
> > >
> > > It is debatable just how correct ZFS code in its expectations, but I
> > > think this commit should probably be reverted until either ZFS is
> > > changed not to expect cv modifiable by waking threads or until
> > > alternative solution is found to the cv_waiters overflow issue
> > > fixed by this commit.
> > >
> > >
> > It isn't clear to me how the zfs_range_unlock() code could know when
> > all the waiters have woken up and updated the CV, and thus it's safe
> > to destroy/free the CV.  Would the woken threads ask, "was I the last
> > thread to be woken by this CV" and if so free the struct containing
> > the CV? Obviously such a check would need to ensure that the other
> > threads have completed their updates to the CV.
> >
> > --matt
>
> Assuming other threads _need_ to update cv after they have been woken
> up. Clearly Solaris implementation managed to do without and our code
> changed that breaking range locks implementation we took directly from
> OpenSolaris (or illumos). What was previously possible now isn't. As I
> wrote before, while merits of this expectations are debatable and it is
> not hard to see where Solaris way is advantageous, that is really
> besides the point. Are you arguing that we should leave kernel in known
> broken state until 'proper' fix makes its way through possible upstream
> detour?
>

Not at all.  Breaking ZFS is not OK.  I was just trying to understand if
it's even possible to fix the breakage within ZFS.  If it's not
possible/reasonable, then the CV semantics would clearly have to be
reverted.
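
For concreteness, here is a rough userland sketch of the "last thread out
frees the struct" idea raised earlier in the thread. It uses POSIX threads,
not the kernel's cv(9) API, and it is not ZFS or FreeBSD code: struct rl,
rl_rele(), waiter(), and run_demo() are hypothetical names invented for
illustration. The assumption is that every waiter takes a reference before
it sleeps, so the structure holding the CV is freed only once all woken
threads are done touching it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define MAX_WAITERS 16

/* Hypothetical stand-in for a range lock that embeds a condition variable. */
struct rl {
	pthread_mutex_t mtx;
	pthread_cond_t  cv;
	int             released;	/* predicate the waiters sleep on */
	int             refs;		/* one per waiter, plus one for the owner */
};

static atomic_int rl_frees;		/* how many times a struct rl was freed */

/* Drop one reference; whoever drops the last one destroys and frees. */
static void
rl_rele(struct rl *rl)
{
	int last;

	pthread_mutex_lock(&rl->mtx);
	last = (--rl->refs == 0);
	pthread_mutex_unlock(&rl->mtx);
	if (last) {
		/* No other thread holds a reference, so nobody can touch rl now. */
		pthread_cond_destroy(&rl->cv);
		pthread_mutex_destroy(&rl->mtx);
		free(rl);
		atomic_fetch_add(&rl_frees, 1);
	}
}

static void *
waiter(void *arg)
{
	struct rl *rl = arg;

	pthread_mutex_lock(&rl->mtx);
	while (!rl->released)
		pthread_cond_wait(&rl->cv, &rl->mtx);
	pthread_mutex_unlock(&rl->mtx);
	rl_rele(rl);			/* last touch of rl by this thread */
	return (NULL);
}

/* Returns how many times the structure was freed during this run. */
int
run_demo(int nwaiters)
{
	pthread_t tids[MAX_WAITERS];
	struct rl *rl;
	int before, i;

	assert(nwaiters >= 0 && nwaiters <= MAX_WAITERS);
	rl = malloc(sizeof(*rl));
	assert(rl != NULL);
	pthread_mutex_init(&rl->mtx, NULL);
	pthread_cond_init(&rl->cv, NULL);
	rl->released = 0;
	rl->refs = nwaiters + 1;	/* references taken before any thread runs */
	before = atomic_load(&rl_frees);

	for (i = 0; i < nwaiters; i++) {
		int error = pthread_create(&tids[i], NULL, waiter, rl);
		assert(error == 0);
	}

	/* Owner: wake everyone, then drop its own reference instead of freeing. */
	pthread_mutex_lock(&rl->mtx);
	rl->released = 1;
	pthread_cond_broadcast(&rl->cv);
	pthread_mutex_unlock(&rl->mtx);
	rl_rele(rl);

	for (i = 0; i < nwaiters; i++)
		pthread_join(tids[i], NULL);
	return (atomic_load(&rl_frees) - before);
}
```

The cost is an extra counter and an extra lock round trip per waiter, and
every consumer that frees a CV right after cv_broadcast() would need the same
treatment, which is why it is unclear whether fixing this inside ZFS is
reasonable.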


>
> Also, we have large code base taken from Solaris and chances are this is
> not the only place that might be affected. I think we are better off
> with this commit temporarily reverted until necessary repairs and
> auditing are complete for it to be safely re-enabled.
>
>
Agreed that the risk is large (a huge amount of code is potentially
impacted, probably not only Solaris-derived code), and that it does not
seem to have been analyzed before this change landed.

--matt