From owner-freebsd-hackers@FreeBSD.ORG  Thu Jan  2 22:11:23 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 07B30D2E
 for <freebsd-hackers@freebsd.org>; Thu,  2 Jan 2014 22:11:23 +0000 (UTC)
Received: from mx12.netapp.com (mx12.netapp.com [216.240.18.77])
 (using TLSv1 with cipher RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id DE7F318B8
 for <freebsd-hackers@freebsd.org>; Thu,  2 Jan 2014 22:11:22 +0000 (UTC)
X-IronPort-AV: E=Sophos;i="4.95,593,1384329600"; 
 d="scan'208,217";a="134151895"
Received: from vmwexceht03-prd.hq.netapp.com ([10.106.76.241])
 by mx12-out.netapp.com with ESMTP; 02 Jan 2014 14:11:21 -0800
Received: from SACEXCMBX04-PRD.hq.netapp.com ([169.254.6.58]) by
 vmwexceht03-prd.hq.netapp.com ([10.106.76.241]) with mapi id 14.03.0123.003;
 Thu, 2 Jan 2014 14:11:21 -0800
From: "Gumpula, Suresh" <Suresh.Gumpula@netapp.com>
To: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: Reference count race window
Thread-Topic: Reference count race window
Thread-Index: Ac8IB4AfLgR9XBy+SMCYAizpAjCJRg==
Date: Thu, 2 Jan 2014 22:11:20 +0000
Message-ID: <D29CB80EBA4DEA4D91181928AAF51538438C0D3B@SACEXCMBX04-PRD.hq.netapp.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [10.106.53.51]
MIME-Version: 1.0
X-Mailman-Approved-At: Fri, 03 Jan 2014 04:38:41 +0000
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.17
Cc: "Gumpula, Suresh" <Suresh.Gumpula@netapp.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Jan 2014 22:11:23 -0000

Hi,
  I am Suresh from NetAPP and I have  questions/queries related to the refe=
rence count usage in the BSD kernel. We are seeing some corruptions/use aft=
er free
  issues and while  debugging we found that the corruption pattern is a ucr=
ed/crgroups structure and started looking at ucred reference count implemen=
ation.

This is my understanding of ref count race window,  please correct me if I =
am wrong.


It seems there is a timing window exposed by the FreeBSD reference count us=
age/implementation. Let's start with the definitions of the acquire and rel=
ease routines
in freebsd/sys/sys/refcount.h

static __inline void
refcount_acquire(volatile u_int *count)
{

        atomic_add_acq_int(count, 1);
}

static __inline int
refcount_release(volatile u_int *count)
{
        u_int old;

        /* XXX: Should this have a rel membar? */
        old =3D atomic_fetchadd_int(count, -1);
        KASSERT(old > 0, ("negative refcount %p", count));
        return (old =3D=3D 1);
}

As implemented, a call to refcount_acquire  atomically increments the refer=
ence count while refcount_release decrements
the reference count and returns true if this release dropped the reference =
count to zero.

Consider the following sequence of events in the absence of other external =
synchronization:

* Object foo has a refcount of 1
* Thread a on processor m calls refcount_release on foo.
* Very soon after (in CPU terms) thread b on processor n calls refcount_acq=
uire on foo.
* atomic_fetchadd_int operating in thread a stalls the atomic_add_acq_int i=
n thread b,
  decrementing foo's refcount to zero and setting old to 1. refcount_releas=
e returns true.
* atomic_add_acq_int in thread b increments the reference count to 1!
* thread a, seeing refcount_release return success, frees foo.
* thread b, believing it has a reference count on foo, continues to use it.

* The major hole here is that refcount_acquire is a void function. If it al=
so returned status,
   calling software could determine that it had a valid reference and take =
appropriate action if it failed to acquire.


One such implementation might look like:
static __inline int
refcount_acquire(volatile u_int *count)
{
        u_int old;

        old =3D atomic_fetchadd_int(count, 1);
        return (old !=3D 0);
}

This change would require modification of all calls to refcount_acquire and=
 determining appropriate action in the case of a non-success return.


Without changing the return-value semantics of refcount_acquire, we have in=
troduced a panic if we detected a race as below.
static __inline void
refcount_acquire(volatile u_int *count)
{
        u_int old;

        old =3D atomic_fetchadd_int(count, 1);
        if (old =3D=3D 0) {
          panic("refcount_acquire race condition detected!\n");
        }
}

After this change , we have seen this panic in one of our systems. Could so=
meone look at my understanding and give me some ways to narrow down this pr=
oblem.
As I mentioned earlier, one option is to change refcount_acquire to be non =
void and change all the callers, but it seems there are many paths to be ch=
anged on failure case.


Thank you
Suresh