From owner-freebsd-hackers@FreeBSD.ORG  Wed Jan  1 21:20:58 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 236ACE53;
 Wed,  1 Jan 2014 21:20:58 +0000 (UTC)
Received: from mail-oa0-x229.google.com (mail-oa0-x229.google.com
 [IPv6:2607:f8b0:4003:c02::229])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id D698A1611;
 Wed,  1 Jan 2014 21:20:57 +0000 (UTC)
Received: by mail-oa0-f41.google.com with SMTP id j17so14253092oag.14
 for <multiple recipients>; Wed, 01 Jan 2014 13:20:57 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:date:message-id:subject:from:to:content-type;
 bh=qw4fJY9NTpa2WTFbgIG1Ny52kpxvxA/XduyNE1CHqBk=;
 b=wQL5w7MEeXkwfDHphx6UhJomrSxbBwHnj0wfXwdqr7x2zM63Nk30Z/3qEWRu4a9ajb
 qIeMTlThEwU0uwMab2I3YByUK6DRDGGzL16dQyFDvf89zuGX6ZsNwd9VzBaklEgOaWoD
 ZEYPRvlNumNwR+OmYu15PgIZwlqKosjIsEuPTEkysGwp0g0/zC1wEZj07vdHvutkv/7/
 N705EtpLN+m25V9oFm/dCWdfBJOOFL2K8QxBk1FS4ecgHNBM64CCjZFGM6Bi8W7W3QEW
 +iFEMUc3UgCpgx0QN2h1jOBH4BDroCJIChNsqskAO14vRv3agv5D8t/jbEJJ6XfQlEqt
 9Fyw==
MIME-Version: 1.0
X-Received: by 10.60.33.7 with SMTP id n7mr53196429oei.25.1388611257171; Wed,
 01 Jan 2014 13:20:57 -0800 (PST)
Sender: pali.gabor@gmail.com
Received: by 10.182.22.44 with HTTP; Wed, 1 Jan 2014 13:20:57 -0800 (PST)
Date: Wed, 1 Jan 2014 22:20:57 +0100
X-Google-Sender-Auth: nZ-MT4eiPxM_TEwxpbJTFGhdjNk
Message-ID: <CAHnG2CzANn+d2Qkoh+Ld9G=40SUB59hdbjp8My9zvkoPDGusAw@mail.gmail.com>
Subject: Re: Call for FreeBSD 2013Q4 (October-December) Status Reports
From: Gabor Pali <pgj@FreeBSD.org>
To: hackers@freebsd.org, current@freebsd.org
Content-Type: text/plain; charset=ISO-8859-1
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 01 Jan 2014 21:20:58 -0000

Dear FreeBSD Community,

Please note that the submission date for the October to December
Quarterly Status Reports is January 14th, 2014, a little less than two
weeks away.  Please consult my previous message for the details:

On Sat, Dec 14, 2013 at 2:05 PM, Gabor Pali <pgj@freebsd.org> wrote:
> Dear FreeBSD Community,
>
> Please note that the next submission date for the October to December
> Quarterly Status Reports is January 14th, 2014, about a month away.
>
> They do not have to be very long -- basically they may be about
> anything that lets people know what is going on around the FreeBSD
> Project.  Submission of reports is not restricted to committers:
> Anyone who is doing anything interesting and FreeBSD-related can (and
> is therefore encouraged to) write one!
>
> The preferred and easiest submission method is to use the XML
> generator [1] with the result emailed as an attachment to us, that is,
> monthly@FreeBSD.org [2].  There is also an XML template [3] which can
> be filled out manually and attached if preferred.  For the expected
> content and style, please study our guidelines on how to write good
> status reports [4].
>
> To enable compilation and publication of the Q4 report as soon as
> possible for the January 14th deadline, please be prompt with any
> report submissions you may have.
>
> We are looking forward to all of your 2013Q4 reports!
>
> Thanks,
> Gabor
>
>
> [1] http://www.freebsd.org/cgi/monthly.cgi
> [2] mailto:monthly@freebsd.org
> [3] http://www.freebsd.org/news/status/report-sample.xml
> [4] http://www.freebsd.org/news/status/howto.html

From owner-freebsd-hackers@FreeBSD.ORG  Thu Jan  2 22:17:39 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id EDD41F99
 for <freebsd-hackers@freebsd.org>; Thu,  2 Jan 2014 22:17:39 +0000 (UTC)
Received: from mx11.netapp.com (mx11.netapp.com [216.240.18.76])
 (using TLSv1 with cipher RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id D068D18FE
 for <freebsd-hackers@freebsd.org>; Thu,  2 Jan 2014 22:17:39 +0000 (UTC)
X-IronPort-AV: E=Sophos;i="4.95,593,1384329600"; d="scan'208,217";a="93311489"
Received: from vmwexceht03-prd.hq.netapp.com ([10.106.76.241])
 by mx11-out.netapp.com with ESMTP; 02 Jan 2014 14:17:39 -0800
Received: from SACEXCMBX04-PRD.hq.netapp.com ([169.254.6.58]) by
 vmwexceht03-prd.hq.netapp.com ([10.106.76.241]) with mapi id 14.03.0123.003;
 Thu, 2 Jan 2014 14:17:39 -0800
From: "Gumpula, Suresh" <Suresh.Gumpula@netapp.com>
To: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: Reference count race window
Thread-Topic: Reference count race window
Thread-Index: Ac8ICHHs+v5D+0UyRX2b4aawkpE+qg==
Date: Thu, 2 Jan 2014 22:17:38 +0000
Message-ID: <D29CB80EBA4DEA4D91181928AAF51538438C0D8B@SACEXCMBX04-PRD.hq.netapp.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [10.106.53.51]
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.17
Cc: "Gumpula, Suresh" <Suresh.Gumpula@netapp.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Jan 2014 22:17:40 -0000

Hi,
  I am Suresh from NetApp and I have questions about reference count
usage in the BSD kernel. We are seeing some corruption/use-after-free
issues, and while debugging we found that the corruption pattern matches
a ucred/crgroups structure, so we started looking at the ucred reference
count implementation.

This is my understanding of the reference count race window; please
correct me if I am wrong.


It seems there is a timing window exposed by the FreeBSD reference count
usage/implementation. Let's start with the definitions of the acquire and
release routines in freebsd/sys/sys/refcount.h:

static __inline void
refcount_acquire(volatile u_int *count)
{

        atomic_add_acq_int(count, 1);
}

static __inline int
refcount_release(volatile u_int *count)
{
        u_int old;

        /* XXX: Should this have a rel membar? */
        old = atomic_fetchadd_int(count, -1);
        KASSERT(old > 0, ("negative refcount %p", count));
        return (old == 1);
}

As implemented, a call to refcount_acquire atomically increments the
reference count, while refcount_release decrements the reference count
and returns true if this release dropped the reference count to zero.

Consider the following sequence of events in the absence of other
external synchronization:

* Object foo has a refcount of 1
* Thread a on processor m calls refcount_release on foo.
* Very soon after (in CPU terms) thread b on processor n calls refcount_acquire on foo.
* atomic_fetchadd_int operating in thread a stalls the atomic_add_acq_int in thread b,
  decrementing foo's refcount to zero and setting old to 1. refcount_release returns true.
* atomic_add_acq_int in thread b increments the reference count to 1!
* thread a, seeing refcount_release return success, frees foo.
* thread b, believing it has a reference count on foo, continues to use it.

* The major hole here is that refcount_acquire is a void function. If it also returned a status,
  the caller could determine whether it had obtained a valid reference and take
  appropriate action if it had failed to acquire one.


One such implementation might look like:
static __inline int
refcount_acquire(volatile u_int *count)
{
        u_int old;

        old = atomic_fetchadd_int(count, 1);
        return (old != 0);
}

This change would require modifying all callers of refcount_acquire and
determining the appropriate action for each in the case of a non-success
return.


Without changing the return-value semantics of refcount_acquire, we have
introduced a panic that fires when we detect the race, as below:
static __inline void
refcount_acquire(volatile u_int *count)
{
        u_int old;

        old = atomic_fetchadd_int(count, 1);
        if (old == 0) {
                panic("refcount_acquire race condition detected!\n");
        }
}

After this change, we have seen this panic on one of our systems. Could
someone check my understanding and suggest some ways to narrow down this
problem? As I mentioned earlier, one option is to change refcount_acquire
to be non-void and change all the callers, but it seems there are many
paths that would have to be changed to handle the failure case.


Thank you
Suresh


From owner-freebsd-hackers@FreeBSD.ORG  Thu Jan  2 23:00:28 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id AEE25886
 for <freebsd-hackers@freebsd.org>; Thu,  2 Jan 2014 23:00:28 +0000 (UTC)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
 by mx1.freebsd.org (Postfix) with ESMTP id 9A2821CE9
 for <freebsd-hackers@freebsd.org>; Thu,  2 Jan 2014 23:00:28 +0000 (UTC)
Received: from Alfreds-MacBook-Pro.local
 (50-204-88-5-static.hfc.comcastbusiness.net [50.204.88.5])
 by elvis.mu.org (Postfix) with ESMTPSA id 658481A3C35;
 Thu,  2 Jan 2014 14:50:39 -0800 (PST)
Message-ID: <52C5ED3E.4020805@mu.org>
Date: Thu, 02 Jan 2014 14:50:38 -0800
From: Alfred Perlstein <bright@mu.org>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9;
 rv:24.0) Gecko/20100101 Thunderbird/24.2.0
MIME-Version: 1.0
To: "Gumpula, Suresh" <Suresh.Gumpula@netapp.com>, 
 "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: Re: Reference count race window
References: <D29CB80EBA4DEA4D91181928AAF51538438C0D8B@SACEXCMBX04-PRD.hq.netapp.com>
In-Reply-To: <D29CB80EBA4DEA4D91181928AAF51538438C0D8B@SACEXCMBX04-PRD.hq.netapp.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Jan 2014 23:00:28 -0000


On 1/2/14, 2:17 PM, Gumpula, Suresh wrote:
> Hi,
>    I am Suresh from NetApp and I have questions/queries related to the reference count usage in the BSD kernel. We are seeing some corruptions/use after free
>    issues and while debugging we found that the corruption pattern is a ucred/crgroups structure and started looking at ucred reference count implementation.
>
> This is my understanding of ref count race window,  please correct me if I am wrong.
>
>
> It seems there is a timing window exposed by the FreeBSD reference count usage/implementation. Let's start with the definitions of the acquire and release routines
> in freebsd/sys/sys/refcount.h
>
> static __inline void
> refcount_acquire(volatile u_int *count)
> {
>
>          atomic_add_acq_int(count, 1);
> }
>
> static __inline int
> refcount_release(volatile u_int *count)
> {
>          u_int old;
>
>          /* XXX: Should this have a rel membar? */
>          old = atomic_fetchadd_int(count, -1);
>          KASSERT(old > 0, ("negative refcount %p", count));
>          return (old == 1);
> }
>
> As implemented, a call to refcount_acquire  atomically increments the reference count while refcount_release decrements
> the reference count and returns true if this release dropped the reference count to zero.
>
> Consider the following sequence of events in the absence of other external synchronization:
>
> * Object foo has a refcount of 1
> * Thread a on processor m calls refcount_release on foo.
> * Very soon after (in CPU terms) thread b on processor n calls refcount_acquire on foo.
> * atomic_fetchadd_int operating in thread a stalls the atomic_add_acq_int in thread b,
>    decrementing foo's refcount to zero and setting old to 1. refcount_release returns true.
> * atomic_add_acq_int in thread b increments the reference count to 1!
> * thread a, seeing refcount_release return success, frees foo.
> * thread b, believing it has a reference count on foo, continues to use it.
>
> * The major hole here is that refcount_acquire is a void function. If it also returned status,
>     calling software could determine that it had a valid reference and take appropriate action if it failed to acquire.
>
>
> One such implementation might look like:
> static __inline int
> refcount_acquire(volatile u_int *count)
> {
>          u_int old;
>
>          old = atomic_fetchadd_int(count, 1);
>          return (old != 0);
> }
>
> This change would require modification of all calls to refcount_acquire and determining appropriate action in the case of a non-success return.
>
>
> Without changing the return-value semantics of refcount_acquire, we have introduced a panic if we detected a race as below.
> static __inline void
> refcount_acquire(volatile u_int *count)
> {
>          u_int old;
>
>          old = atomic_fetchadd_int(count, 1);
>          if (old == 0) {
>            panic("refcount_acquire race condition detected!\n");
>          }
> }
>
> After this change , we have seen this panic in one of our systems. Could someone look at my understanding and give me some ways to narrow down this problem.
> As I mentioned earlier, one option is to change refcount_acquire to be non void and change all the callers, but it seems there are many paths to be changed on failure case.
>
>
> Thank you
> Suresh
>
> _________________________
Hey Suresh,

In theory this shouldn't happen due to pointer/thread ownership of the 
resource.

This means that usually a cred is copied via refcount to another object 
and by the time the refcount hits 1 then only that one object should be 
pointing to it.

That means that if someone is raising the refcount at the same time then 
they are looking into an object that is in the process of being destroyed!

Going back to your example:

* Object foo has a refcount of 1
* Thread a on processor m calls refcount_release on foo.
* Very soon after (in CPU terms) thread b on processor n calls refcount_acquire on foo.

    ^--- This should not be happening, as "foo" should no longer be accessible to other subsystems.
   Imagine some other CPU calling rfork() on a process that is in the middle of exit().  This should *not* happen.

* atomic_fetchadd_int operating in thread a stalls the atomic_add_acq_int in thread b,
   decrementing foo's refcount to zero and setting old to 1. refcount_release returns true.
* atomic_add_acq_int in thread b increments the reference count to 1!
* thread a, seeing refcount_release return success, frees foo.
* thread b, believing it has a reference count on foo, continues to use it.


While it's possible that there *may* be a bug here, I think it would 
make sense for you to add more instrumentation to your code.

Are you testing with INVARIANTS enabled?  With it enabled, 
refcount_release should be panicking due to the KASSERT!

static __inline int
refcount_release(volatile u_int *count)
{
         u_int old;

         /* XXX: Should this have a rel membar? */
         old = atomic_fetchadd_int(count, -1);
         KASSERT(old > 0, ("negative refcount %p", count));
         return (old == 1);
}

Perhaps you should either enable INVARIANTS... or you can turn that one 
single KASSERT into an unconditional test like so:


static __inline int
refcount_release(volatile u_int *count)
{
         u_int old;

         /* XXX: Should this have a rel membar? */
         old = atomic_fetchadd_int(count, -1);
         /* old is unsigned, so the over-release case is old == 0. */
         if (old == 0) panic("negative refcount %p", count);
         return (old == 1);
}

That ought to help you catch the bug.

-Alfred



From owner-freebsd-hackers@FreeBSD.ORG  Thu Jan  2 23:39:28 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 4AC24410
 for <freebsd-hackers@freebsd.org>; Thu,  2 Jan 2014 23:39:28 +0000 (UTC)
Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 111821F39
 for <freebsd-hackers@freebsd.org>; Thu,  2 Jan 2014 23:39:27 +0000 (UTC)
Received: from [192.168.1.73] (254C510A.nat.pool.telekom.hu [37.76.81.10])
 (authenticated bits=0)
 by vps1.elischer.org (8.14.7/8.14.7) with ESMTP id s02NdLM2019146
 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO);
 Thu, 2 Jan 2014 15:39:25 -0800 (PST)
 (envelope-from julian@freebsd.org)
Message-ID: <52C5F8A3.9000902@freebsd.org>
Date: Fri, 03 Jan 2014 00:39:15 +0100
From: Julian Elischer <julian@freebsd.org>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9;
 rv:24.0) Gecko/20100101 Thunderbird/24.2.0
MIME-Version: 1.0
To: Alfred Perlstein <bright@mu.org>,
 "Gumpula, Suresh" <Suresh.Gumpula@netapp.com>,
 "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: Re: Reference count race window
References: <D29CB80EBA4DEA4D91181928AAF51538438C0D8B@SACEXCMBX04-PRD.hq.netapp.com>
 <52C5ED3E.4020805@mu.org>
In-Reply-To: <52C5ED3E.4020805@mu.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Jan 2014 23:39:28 -0000

On 1/2/14, 11:50 PM, Alfred Perlstein wrote:
>
> On 1/2/14, 2:17 PM, Gumpula, Suresh wrote:
>> Hi,
>>    I am Suresh from NetApp and I have questions/queries related to 
>> the reference count usage in the BSD kernel. We are seeing some 
>> corruptions/use after free
>>    issues and while debugging we found that the corruption pattern 
>> is a ucred/crgroups structure and started looking at ucred 
>> reference count implementation.
>>
>> This is my understanding of ref count race window,  please correct 
>> me if I am wrong.
>>
>>
>> It seems there is a timing window exposed by the FreeBSD reference 
>> count usage/implementation. Let's start with the definitions of the 
>> acquire and release routines
>> in freebsd/sys/sys/refcount.h
>>
>> static __inline void
>> refcount_acquire(volatile u_int *count)
>> {
>>
>>          atomic_add_acq_int(count, 1);
>> }
>>
>> static __inline int
>> refcount_release(volatile u_int *count)
>> {
>>          u_int old;
>>
>>          /* XXX: Should this have a rel membar? */
>>          old = atomic_fetchadd_int(count, -1);
>>          KASSERT(old > 0, ("negative refcount %p", count));
>>          return (old == 1);
>> }
>>
>> As implemented, a call to refcount_acquire  atomically increments 
>> the reference count while refcount_release decrements
>> the reference count and returns true if this release dropped the 
>> reference count to zero.
>>
>> Consider the following sequence of events in the absence of other 
>> external synchronization:
>>
>> * Object foo has a refcount of 1
>> * Thread a on processor m calls refcount_release on foo.
>> * Very soon after (in CPU terms) thread b on processor n calls 
>> refcount_acquire on foo.
>> * atomic_fetchadd_int operating in thread a stalls the 
>> atomic_add_acq_int in thread b,
>>    decrementing foo's refcount to zero and setting old to 1. 
>> refcount_release returns true.
>> * atomic_add_acq_int in thread b increments the reference count to 1!
>> * thread a, seeing refcount_release return success, frees foo.
>> * thread b, believing it has a reference count on foo, continues to 
>> use it.
>>
>> * The major hole here is that refcount_acquire is a void function. 
>> If it also returned status,
>>     calling software could determine that it had a valid reference 
>> and take appropriate action if it failed to acquire.
>>
>>
>> One such implementation might look like:
>> static __inline int
>> refcount_acquire(volatile u_int *count)
>> {
>>          u_int old;
>>
>>          old = atomic_fetchadd_int(count, 1);
>>          return (old != 0);
>> }
>>
>> This change would require modification of all calls to 
>> refcount_acquire and determining appropriate action in the case of 
>> a non-success return.
>>
>>
>> Without changing the return-value semantics of refcount_acquire, we 
>> have introduced a panic if we detected a race as below.
>> static __inline void
>> refcount_acquire(volatile u_int *count)
>> {
>>          u_int old;
>>
>>          old = atomic_fetchadd_int(count, 1);
>>          if (old == 0) {
>>            panic("refcount_acquire race condition detected!\n");
>>          }

so what is the stacktrace of the panic?
>> }
>>
>> After this change , we have seen this panic in one of our systems. 
>> Could someone look at my understanding and give me some ways to 
>> narrow down this problem.
>> As I mentioned earlier, one option is to change refcount_acquire to 
>> be non void and change all the callers, but it seems there are many 
>> paths to be changed on failure case.
>>
>>
>> Thank you
>> Suresh
>>
>> _________________________
> Hey Suresh,
>
> In theory this shouldn't happen due to pointer/thread ownership of 
> the resource.
>
My memory is that the refcount infrastructure makes some assumptions 
about how it is called, and the cred code makes some assumptions about 
what is going on too.
I do agree that there is a race as you outlined, but I believe that 
it was supposed to be impossible to reach because creds were only 
actually released in special cases. In those cases we can guarantee 
that no one else should be able to have a pointer to that cred, as the 
pointer is supposed to be found only after locking the appropriate 
proc/thread structure.  Is it possible that you have changed the 
places where creds are released?

It is possible that we ourselves have broken this. I have not looked 
at it for some years.
Maybe it is time to change the way that the refcount interface is used 
here so that we do know if we ended up holding the only reference. 
It would probably require recoding, because there is always a 
legitimate place to get a reference count of 1 (the initial setup), 
and initial and subsequent acquisition of reference counts is often 
achieved with the same code.

> This means that usually a cred is copied via refcount to another 
> object and by the time the refcount hits 1 then only that one object 
> should be pointing to it.
>
> That means that if someone is raising the refcount at the same time 
> then they are looking into an object that is in the process of being 
> destroyed!
>
> Going back to your example:
>
> * Object foo has a refcount of 1
> * Thread a on processor m calls refcount_release on foo.
> * Very soon after (in CPU terms) thread b on processor n calls 
> refcount_acquire on foo.
>
>    ^--- this should not be happening as "foo" should no longer be 
> accessible to other subsystems.
>   imagine this would be like some other CPU calling rfork() on a 
> process that is in the middle of exit().  This should *not* happen.
To expand: there are locks that are supposed to stop this from 
happening. Exit should not be able to proceed until it is sure that 
the proc structure is accessed only by itself, and the cred pointer 
should never be cached without adding a reference. This means that the 
count can only be 1 in this case while the lock is held. If this 
has been changed, then yes, there is a bug.
You may want to check the status of the various locks when removing 
the last reference.

>
> * atomic_fetchadd_int operating in thread a stalls the 
> atomic_add_acq_int in thread b,
>   decrementing foo's refcount to zero and setting old to 1. 
> refcount_release returns true.
> * atomic_add_acq_int in thread b increments the reference count to 1!
> * thread a, seeing refcount_release return success, frees foo.
> * thread b, believing it has a reference count on foo, continues to 
> use it.
>
>
> While it's possible that there *may* be a bug here, I think it would 
> make sense for you to add more instrumentation to your code.
>
> Are you testing with INVARIANTS enabled?  otherwise refcount_release 
> should be panic'ing due to the KASSERT!
>
> static __inline int
> refcount_release(volatile u_int *count)
> {
>         u_int old;
>
>         /* XXX: Should this have a rel membar? */
>         old = atomic_fetchadd_int(count, -1);
>         KASSERT(old > 0, ("negative refcount %p", count));
>         return (old == 1);
> }
>
> Perhaps you should either enable INVARIANTS... or you can turn that 
> one single KASSERT into an unconditional test like so:
>
>
> static __inline int
> refcount_release(volatile u_int *count)
> {
>         u_int old;
>
>         /* XXX: Should this have a rel membar? */
>         old = atomic_fetchadd_int(count, -1);
>         /* old is unsigned, so the over-release case is old == 0. */
>         if (old == 0) panic("negative refcount %p", count);
>         return (old == 1);
> }
>
> That ought to help you catch the bug.
>
> -Alfred


From owner-freebsd-hackers@FreeBSD.ORG  Thu Jan  2 23:54:01 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 75D4C6CB;
 Thu,  2 Jan 2014 23:54:01 +0000 (UTC)
Received: from mx11.netapp.com (mx11.netapp.com [216.240.18.76])
 (using TLSv1 with cipher RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 55E641077;
 Thu,  2 Jan 2014 23:54:01 +0000 (UTC)
X-IronPort-AV: E=Sophos;i="4.95,593,1384329600"; d="scan'208";a="93338555"
Received: from vmwexceht06-prd.hq.netapp.com ([10.106.77.104])
 by mx11-out.netapp.com with ESMTP; 02 Jan 2014 15:54:00 -0800
Received: from SACEXCMBX04-PRD.hq.netapp.com ([169.254.6.58]) by
 vmwexceht06-prd.hq.netapp.com ([10.106.77.104]) with mapi id 14.03.0123.003;
 Thu, 2 Jan 2014 15:54:00 -0800
From: "Gumpula, Suresh" <Suresh.Gumpula@netapp.com>
To: Julian Elischer <julian@freebsd.org>, Alfred Perlstein <bright@mu.org>,
 "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: RE: Reference count race window
Thread-Topic: Reference count race window
Thread-Index: AQHPCA0ePlCdkUTb2Eq8swVe1xATjJpynbOA//98gUA=
Date: Thu, 2 Jan 2014 23:53:59 +0000
Message-ID: <D29CB80EBA4DEA4D91181928AAF51538438C0DF8@SACEXCMBX04-PRD.hq.netapp.com>
References: <D29CB80EBA4DEA4D91181928AAF51538438C0D8B@SACEXCMBX04-PRD.hq.netapp.com>
 <52C5ED3E.4020805@mu.org> <52C5F8A3.9000902@freebsd.org>
In-Reply-To: <52C5F8A3.9000902@freebsd.org>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [10.106.53.51]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Jan 2014 23:54:01 -0000

>> Without changing the return-value semantics of refcount_acquire, we
>> have introduced a panic if we detected a race as below.
>> static __inline void
>> refcount_acquire(volatile u_int *count) {
>>          u_int old;
>>
>>          old = atomic_fetchadd_int(count, 1);
>>          if (old == 0) {
>>            panic("refcount_acquire race condition detected!\n");
>>          }

>>>>>so what is the stacktrace of the panic?

It's from the socket code calling crhold. It's a non-debug build (no INVARIANTS).

#4  0xffffffff80331d34 in panic (fmt=0xffffffff805c1e60 "refcount_acquire race condition detected!\n") at ../../../../sys/kern/kern_shutdown.c:1009
#5  0xffffffff80326662 in refcount_acquire (count=<optimized out>) at ../../../../sys/sys/refcount.h:65
#6  crhold (cr=<optimized out>) at ../../../../sys/kern/kern_prot.c:1814
#7  0xffffffff803aa0d9 in socreate (dom=<optimized out>, aso=0xffffff80345c1b00, type=<optimized out>, proto=0, cred=0xffffff0017d7aa00, td=0xffffff000b294410)
    at ../../../../sys/kern/uipc_socket.c:441
#8  0xffffffff803b2e5c in socket (td=0xffffff000b294410, uap=0xffffff80345c1be0) at ../../../../sys/kern/uipc_syscalls.c:201
#9  0xffffffff80539ecb in syscall (frame=0xffffff80345c1c80) at ../../../../sys/amd64/amd64/trap.c:1260


Thanks
Suresh

From owner-freebsd-hackers@FreeBSD.ORG  Fri Jan  3 01:21:18 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id C3E94CC3;
 Fri,  3 Jan 2014 01:21:18 +0000 (UTC)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
 by mx1.freebsd.org (Postfix) with ESMTP id AD77816D3;
 Fri,  3 Jan 2014 01:21:18 +0000 (UTC)
Received: from Alfreds-MacBook-Pro.local
 (50-204-88-5-static.hfc.comcastbusiness.net [50.204.88.5])
 by elvis.mu.org (Postfix) with ESMTPSA id 575521A3C19;
 Thu,  2 Jan 2014 17:21:14 -0800 (PST)
Message-ID: <52C61088.3080703@mu.org>
Date: Thu, 02 Jan 2014 17:21:12 -0800
From: Alfred Perlstein <bright@mu.org>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9;
 rv:24.0) Gecko/20100101 Thunderbird/24.2.0
MIME-Version: 1.0
To: "Gumpula, Suresh" <Suresh.Gumpula@netapp.com>, 
 Julian Elischer <julian@freebsd.org>,
 "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: Re: Reference count race window
References: <D29CB80EBA4DEA4D91181928AAF51538438C0D8B@SACEXCMBX04-PRD.hq.netapp.com>
 <52C5ED3E.4020805@mu.org> <52C5F8A3.9000902@freebsd.org>
 <D29CB80EBA4DEA4D91181928AAF51538438C0DF8@SACEXCMBX04-PRD.hq.netapp.com>
In-Reply-To: <D29CB80EBA4DEA4D91181928AAF51538438C0DF8@SACEXCMBX04-PRD.hq.netapp.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Jan 2014 01:21:18 -0000


On 1/2/14, 3:53 PM, Gumpula, Suresh wrote:
>>> Without changing the return-value semantics of refcount_acquire, we
>>> have introduced a panic if we detected a race as below.
>>> static __inline void
>>> refcount_acquire(volatile u_int *count) {
>>>           u_int old;
>>>
>>>           old = atomic_fetchadd_int(count, 1);
>>>           if (old == 0) {
>>>             panic("refcount_acquire race condition detected!\n");
>>>           }
>>>>>> so what is the stacktrace of the panic?
> It's from the socket code calling crhold.   It's a non debug build( NO INVARIANTS )
>
> #4  0xffffffff80331d34 in panic (fmt=0xffffffff805c1e60 "refcount_acquire race condition detected!\n") at ../../../../sys/kern/kern_shutdown.c:1009
> #5  0xffffffff80326662 in refcount_acquire (count=<optimized out>) at ../../../../sys/sys/refcount.h:65
> #6  crhold (cr=<optimized out>) at ../../../../sys/kern/kern_prot.c:1814
> #7  0xffffffff803aa0d9 in socreate (dom=<optimized out>, aso=0xffffff80345c1b00, type=<optimized out>, proto=0, cred=0xffffff0017d7aa00, td=0xffffff000b294410)
> at ../../../../sys/kern/uipc_socket.c:441
> #8  0xffffffff803b2e5c in socket (td=0xffffff000b294410, uap=0xffffff80345c1be0) at ../../../../sys/kern/uipc_syscalls.c:201
> #9  0xffffffff80539ecb in syscall (frame=0xffffff80345c1c80) at ../../../../sys/amd64/amd64/trap.c:1260
>
If it's a non-debug build then how do you know that someone isn't 
incorrectly lowering the refcount?

Please try some invariants or at least manually turn on the one KASSERT 
I mentioned.

Another trick would be to add an array of char* + int pairs for the last
few places that decremented; you can use the returned refcount as an
index into that array to track who may be doing the extra frees.

-Alfred


From owner-freebsd-hackers@FreeBSD.ORG  Fri Jan  3 02:38:25 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 39A84F63;
 Fri,  3 Jan 2014 02:38:25 +0000 (UTC)
Received: from mx12.netapp.com (mx12.netapp.com [216.240.18.77])
 (using TLSv1 with cipher RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 17C1C1C9A;
 Fri,  3 Jan 2014 02:38:24 +0000 (UTC)
X-IronPort-AV: E=Sophos;i="4.95,595,1384329600"; d="scan'208";a="134232804"
Received: from vmwexceht05-prd.hq.netapp.com ([10.106.77.35])
 by mx12-out.netapp.com with ESMTP; 02 Jan 2014 18:38:19 -0800
Received: from SACEXCMBX04-PRD.hq.netapp.com ([169.254.6.58]) by
 vmwexceht05-prd.hq.netapp.com ([10.106.77.35]) with mapi id 14.03.0123.003;
 Thu, 2 Jan 2014 18:38:19 -0800
From: "Gumpula, Suresh" <Suresh.Gumpula@netapp.com>
To: Alfred Perlstein <bright@mu.org>, Julian Elischer <julian@freebsd.org>,
 "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: RE: Reference count race window
Thread-Topic: Reference count race window
Thread-Index: AQHPCA0ePlCdkUTb2Eq8swVe1xATjJpynbOA//98gUCAAJ/7AP//hPzA
Date: Fri, 3 Jan 2014 02:38:18 +0000
Message-ID: <D29CB80EBA4DEA4D91181928AAF51538438C0F09@SACEXCMBX04-PRD.hq.netapp.com>
References: <D29CB80EBA4DEA4D91181928AAF51538438C0D8B@SACEXCMBX04-PRD.hq.netapp.com>
 <52C5ED3E.4020805@mu.org> <52C5F8A3.9000902@freebsd.org>
 <D29CB80EBA4DEA4D91181928AAF51538438C0DF8@SACEXCMBX04-PRD.hq.netapp.com>
 <52C61088.3080703@mu.org>
In-Reply-To: <52C61088.3080703@mu.org>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [10.106.53.51]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Jan 2014 02:38:25 -0000

Hi Alfred,
      I agree that there could have been an extra/invalid crfree() that decremented the count, so that what looks like a valid crhold() (acquire) from the socket code panicked in my case. As per your suggestion, if we replace the assert in release with an if condition, we will end up panicking when the actual crfree() happens. But we still may not know who did the first, invalid crfree(). Am I correct? I will try your suggestion.

Can you please explain your array trick a bit more?

Thanks
Suresh



-----Original Message-----
From: owner-freebsd-hackers@freebsd.org [mailto:owner-freebsd-hackers@freebsd.org] On Behalf Of Alfred Perlstein
Sent: Thursday, January 02, 2014 8:21 PM
To: Gumpula, Suresh; Julian Elischer; freebsd-hackers@freebsd.org
Subject: Re: Reference count race window


On 1/2/14, 3:53 PM, Gumpula, Suresh wrote:
>>> Without changing the return-value semantics of refcount_acquire, we
>>> have introduced a panic if we detected a race as below.
>>> static __inline void
>>> refcount_acquire(volatile u_int *count) {
>>>           u_int old;
>>>
>>>           old = atomic_fetchadd_int(count, 1);
>>>           if (old == 0) {
>>>             panic("refcount_acquire race condition detected!\n");
>>>           }
>>>>>> so what is the stacktrace of the panic?
> It's from the socket code calling crhold.   It's a non debug build( NO INVARIANTS )
>
> #4  0xffffffff80331d34 in panic (fmt=0xffffffff805c1e60
> "refcount_acquire race condition detected!\n") at
> ../../../../sys/kern/kern_shutdown.c:1009
> #5  0xffffffff80326662 in refcount_acquire (count=<optimized out>) at
> ../../../../sys/sys/refcount.h:65
> #6  crhold (cr=<optimized out>) at
> ../../../../sys/kern/kern_prot.c:1814
> #7  0xffffffff803aa0d9 in socreate (dom=<optimized out>,
> aso=0xffffff80345c1b00, type=<optimized out>, proto=0,
> cred=0xffffff0017d7aa00, td=0xffffff000b294410) at
> ../../../../sys/kern/uipc_socket.c:441
> #8  0xffffffff803b2e5c in socket (td=0xffffff000b294410,
> uap=0xffffff80345c1be0) at ../../../../sys/kern/uipc_syscalls.c:201
> #9  0xffffffff80539ecb in syscall (frame=0xffffff80345c1c80) at
> ../../../../sys/amd64/amd64/trap.c:1260
>
If it's a non-debug build then how do you know that someone isn't incorrectly lowering the refcount?

Please try some invariants or at least manually turn on the one KASSERT I mentioned.

Another trick would be to add a an array of char*+int for the last few places that decremented, you can use the returned refcount as an index to that array to track who may be doing the extra frees.

-Alfred

_______________________________________________
freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"

From owner-freebsd-hackers@FreeBSD.ORG  Fri Jan  3 02:53:23 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id B981932B;
 Fri,  3 Jan 2014 02:53:23 +0000 (UTC)
Received: from elvis.mu.org (elvis.mu.org [192.203.228.196])
 by mx1.freebsd.org (Postfix) with ESMTP id 97C091D8C;
 Fri,  3 Jan 2014 02:53:23 +0000 (UTC)
Received: from Alfreds-MacBook-Pro.local
 (50-204-88-5-static.hfc.comcastbusiness.net [50.204.88.5])
 by elvis.mu.org (Postfix) with ESMTPSA id 66D1F1A3C19;
 Thu,  2 Jan 2014 18:53:12 -0800 (PST)
Message-ID: <52C62617.7020304@mu.org>
Date: Thu, 02 Jan 2014 18:53:11 -0800
From: Alfred Perlstein <bright@mu.org>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9;
 rv:24.0) Gecko/20100101 Thunderbird/24.2.0
MIME-Version: 1.0
To: "Gumpula, Suresh" <Suresh.Gumpula@netapp.com>, 
 Julian Elischer <julian@freebsd.org>,
 "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: Re: Reference count race window
References: <D29CB80EBA4DEA4D91181928AAF51538438C0D8B@SACEXCMBX04-PRD.hq.netapp.com>
 <52C5ED3E.4020805@mu.org> <52C5F8A3.9000902@freebsd.org>
 <D29CB80EBA4DEA4D91181928AAF51538438C0DF8@SACEXCMBX04-PRD.hq.netapp.com>
 <52C61088.3080703@mu.org>
 <D29CB80EBA4DEA4D91181928AAF51538438C0F09@SACEXCMBX04-PRD.hq.netapp.com>
In-Reply-To: <D29CB80EBA4DEA4D91181928AAF51538438C0F09@SACEXCMBX04-PRD.hq.netapp.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Jan 2014 02:53:23 -0000


On 1/2/14, 6:38 PM, Gumpula, Suresh wrote:
> Hi Alfred,
>        I agree that there could have been an extra/invalid  crfree() which decremented the count  and  looks valid  crhold(acquire) from socket code  panic'ed in my case. As per your suggestion if we
>   replace  the assert with if condition in release,  we will end up panicing  when the actual  crfree() happens.  But we may not be knowing who crfree'ed() in the first and invalid place.  Am I correct?   I will try your suggestion.
>
> Can you please bit more explain your array trick ?

I think the simplest thing to do would be to replace crfree with a macro
that passes __FILE__ and __LINE__ down into the actual crfree (or you can
use __builtin_return_address(0) to get the caller's return address
instead).

Then either keep an array of {file, line} tuples or just an array of
return addresses in the struct.

Just extend struct ucred and add this:

#define MAX_PREV 10
const char *files[MAX_PREV];
int lines[MAX_PREV];

Then hack your own version of refcount_release(), call it 
refcount_release2() but have it take a pointer to an integer that it 
will write the value of the old refcount into.

Then you can use that return value like so:

if (old_refcount < MAX_PREV) {
   cred->files[old_refcount] = pointer_to_file_name;
   cred->lines[old_refcount] = line_number;
}

Then add the assert. Or better yet, turn on INVARIANTS (or maybe try
each option in turn, as INVARIANTS might hide the bug).

When you crash you can then see the last few callers who did free inside 
the struct.

Since the refcount "old_refcount" is atomically manipulated, you should
see the last few frees that sent the count negative.
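
The steps above can be sketched in user space roughly as follows. This is
only an illustration under stated assumptions, not kernel code: it uses C11
<stdatomic.h> in place of atomic_fetchadd_int, and struct cred, cred_free_at
and the cred_free macro are hypothetical stand-ins for struct ucred and
crfree. Only MAX_PREV, files, lines and refcount_release2 come from the
description above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_PREV 10

/* Hypothetical stand-in for struct ucred with the extra debug fields. */
struct cred {
	atomic_uint ref;
	const char *files[MAX_PREV];	/* file of the release that saw count i */
	int lines[MAX_PREV];		/* line of that release */
};

/* Like refcount_release(), but also reports the pre-decrement count. */
static int
refcount_release2(atomic_uint *count, unsigned int *oldp)
{
	*oldp = atomic_fetch_sub(count, 1);
	return (*oldp == 1);
}

/*
 * Records the caller in the slot indexed by the old count.
 * Returns 1 when this call dropped the last reference.
 */
static int
cred_free_at(struct cred *cr, const char *file, int line)
{
	unsigned int old;
	int last = refcount_release2(&cr->ref, &old);

	if (old < MAX_PREV) {
		cr->files[old] = file;
		cr->lines[old] = line;
	}
	return (last);
}

/* Call sites keep writing cred_free(cr); the macro supplies the location. */
#define cred_free(cr) cred_free_at((cr), __FILE__, __LINE__)
```

With this layout, slot 1 holds the release that took the count to zero and
slot 0 holds any release that ran after that, which is the extra free being
hunted.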

-Alfred

>
> Thanks
> Suresh
>
>
>
> -----Original Message-----
> From: owner-freebsd-hackers@freebsd.org [mailto:owner-freebsd-hackers@freebsd.org] On Behalf Of Alfred Perlstein
> Sent: Thursday, January 02, 2014 8:21 PM
> To: Gumpula, Suresh; Julian Elischer; freebsd-hackers@freebsd.org
> Subject: Re: Reference count race window
>
>
> On 1/2/14, 3:53 PM, Gumpula, Suresh wrote:
>>>> Without changing the return-value semantics of refcount_acquire, we
>>>> have introduced a panic if we detected a race as below.
>>>> static __inline void
>>>> refcount_acquire(volatile u_int *count) {
>>>>            u_int old;
>>>>
>>>>            old = atomic_fetchadd_int(count, 1);
>>>>            if (old == 0) {
>>>>              panic("refcount_acquire race condition detected!\n");
>>>>            }
>>>>>>> so what is the stacktrace of the panic?
>> It's from the socket code calling crhold.   It's a non debug build( NO INVARIANTS )
>>
>> #4  0xffffffff80331d34 in panic (fmt=0xffffffff805c1e60
>> "refcount_acquire race condition detected!\n") at
>> ../../../../sys/kern/kern_shutdown.c:1009
>> #5  0xffffffff80326662 in refcount_acquire (count=<optimized out>) at
>> ../../../../sys/sys/refcount.h:65
>> #6  crhold (cr=<optimized out>) at
>> ../../../../sys/kern/kern_prot.c:1814
>> #7  0xffffffff803aa0d9 in socreate (dom=<optimized out>,
>> aso=0xffffff80345c1b00, type=<optimized out>, proto=0,
>> cred=0xffffff0017d7aa00, td=0xffffff000b294410) at
>> ../../../../sys/kern/uipc_socket.c:441
>> #8  0xffffffff803b2e5c in socket (td=0xffffff000b294410,
>> uap=0xffffff80345c1be0) at ../../../../sys/kern/uipc_syscalls.c:201
>> #9  0xffffffff80539ecb in syscall (frame=0xffffff80345c1c80) at
>> ../../../../sys/amd64/amd64/trap.c:1260
>>
> If it's a non-debug build then how do you know that someone isn't incorrectly lowering the refcount?
>
> Please try some invariants or at least manually turn on the one KASSERT I mentioned.
>
> Another trick would be to add a an array of char*+int for the last few places that decremented, you can use the returned refcount as an index to that array to track who may be doing the extra frees.
>
> -Alfred
>
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
>


From owner-freebsd-hackers@FreeBSD.ORG  Fri Jan  3 04:00:25 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 4B9064A1
 for <freebsd-hackers@freebsd.org>; Fri,  3 Jan 2014 04:00:25 +0000 (UTC)
Received: from nm20.bullet.mail.bf1.yahoo.com (nm20.bullet.mail.bf1.yahoo.com
 [98.139.212.179]) by mx1.freebsd.org (Postfix) with SMTP id D23931286
 for <freebsd-hackers@freebsd.org>; Fri,  3 Jan 2014 04:00:24 +0000 (UTC)
Received: from [66.196.81.170] by nm20.bullet.mail.bf1.yahoo.com with NNFMP;
 03 Jan 2014 04:00:18 -0000
Received: from [98.139.213.8] by tm16.bullet.mail.bf1.yahoo.com with NNFMP;
 03 Jan 2014 04:00:18 -0000
Received: from [127.0.0.1] by smtp108.mail.bf1.yahoo.com with NNFMP;
 03 Jan 2014 04:00:18 -0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024;
 t=1388721618; bh=bFriKQOv0BcElDs+L8odPU+1tifFeZclJ+da2Tp/NOM=;
 h=X-Yahoo-Newman-Id:X-Yahoo-Newman-Property:X-YMail-OSG:X-Yahoo-SMTP:X-Rocket-Received:Subject:From:Content-Type:X-Mailer:Message-Id:Date:To:Content-Transfer-Encoding:Mime-Version;
 b=0A3TUKMwiRqdxZzqlD3Y3yU2bnidbMUW8fTd5eOvj1WxP8aPZGXl+hawSEBWhWpuTco1wAK0wojMfmUw9ay9RcBAX+5+fg997TDfC0IC8bZyfkZMp0WVUgOULW8IiY9Bv22P9IldZe2qfq7PofvWqU+elQJFER3/KGGFCx5AE5E=
X-Yahoo-Newman-Id: 701339.92781.bm@smtp108.mail.bf1.yahoo.com
X-Yahoo-Newman-Property: ymail-3
X-YMail-OSG: vPFXwj8VM1nNu8xnsv8sAvDnOQBPNZb_7oWkpRoZxLLg8Lb
 ACnG56BTtU64KAq0RKtro67zoqc_kfYqy91JUnoUC1_w.GLeQv05SqiIxn7G
 Mxw6A6u.kaRLgqTwOkgxWl9I4L.No7z7nUBNjQwaHIvB77tWA_OTegtJFROA
 t3z1j4C1SJYX2TyAh38ZK9FrYzUjwMsxsnQh6VgewVxmJ.5NHeHGlw.TQK8t
 2yoT_9DJImpK7aKbsdOXn6RB1RHYXywrEpdzZF4.YOovTpAuL5ByvKjIl9aY
 sRgDlg2ZUQUAHXfZ7yV5Ha6dkl.hMFZXl_eJoImkayiFVowMftVXK.N.g9KJ
 3uBOA5siG2rZL8hzt0kpEtzYQuCQ6N6CG4S_Oh4Qdfs.gXpYAHhAFhsS7DJ8
 vb7RZIpjTCtukE0kaoaoamWGEZIcdu5Ti80l4kzaGQyCiH.2VgrFsySLz0BT
 0LTogth413PN21dbeaJfXENYqkM_hkQK.NVZtWtUCHa3ppV7YDmuZctOd0tZ
 03NDlPuRmssMqwWtgToUle3oDXHGK1hLj
X-Yahoo-SMTP: LAFNfTaswBDguI7meB90l2l3wOU-
X-Rocket-Received: from [10.111.176.199] (free7by@117.136.24.75 with xymcookie
 [106.10.149.123])
 by smtp108.mail.bf1.yahoo.com with SMTP; 02 Jan 2014 20:00:18 -0800 PST
Subject: Strange keyboard mistake
From: by <free7by@yahoo.com>
Content-Type: text/plain;
	charset=us-ascii
X-Mailer: iPhone Mail (11B554a)
Message-Id: <D8FDBCA7-8B7F-41DB-A526-924F45D7EA51@yahoo.com>
Date: Fri, 3 Jan 2014 12:00:07 +0800
To: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Content-Transfer-Encoding: quoted-printable
Mime-Version: 1.0 (1.0)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Jan 2014 04:00:25 -0000

      Hi,
      I have a very strange problem.
      I got another keyboard for my laptop, and everything went well for days. But today, for some reason, while in a csh environment, I unplugged my new keyboard from my laptop's USB port, and just a few minutes later, after I plugged it back in, the keyboard started misbehaving.
      For example, when I typed 'b', it became a "smiley face", and other keys produced other strange symbols too!
      What is strangest is that my laptop's built-in keyboard started doing the same!
      I had no idea what to do, so I hit Ctrl+Alt+Delete to reboot; after the reboot, everything was back to normal.
      Does anyone have any ideas about this strange behavior? Or, if I do not want to reboot, what should I do when I encounter this situation again?
      By the way, I use FreeBSD 8.4-RELEASE and my new keyboard is a Logitech K310.
      Thanks.
----by

From owner-freebsd-hackers@FreeBSD.ORG  Thu Jan  2 22:11:23 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 07B30D2E
 for <freebsd-hackers@freebsd.org>; Thu,  2 Jan 2014 22:11:23 +0000 (UTC)
Received: from mx12.netapp.com (mx12.netapp.com [216.240.18.77])
 (using TLSv1 with cipher RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id DE7F318B8
 for <freebsd-hackers@freebsd.org>; Thu,  2 Jan 2014 22:11:22 +0000 (UTC)
X-IronPort-AV: E=Sophos;i="4.95,593,1384329600"; 
 d="scan'208,217";a="134151895"
Received: from vmwexceht03-prd.hq.netapp.com ([10.106.76.241])
 by mx12-out.netapp.com with ESMTP; 02 Jan 2014 14:11:21 -0800
Received: from SACEXCMBX04-PRD.hq.netapp.com ([169.254.6.58]) by
 vmwexceht03-prd.hq.netapp.com ([10.106.76.241]) with mapi id 14.03.0123.003;
 Thu, 2 Jan 2014 14:11:21 -0800
From: "Gumpula, Suresh" <Suresh.Gumpula@netapp.com>
To: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: Reference count race window
Thread-Topic: Reference count race window
Thread-Index: Ac8IB4AfLgR9XBy+SMCYAizpAjCJRg==
Date: Thu, 2 Jan 2014 22:11:20 +0000
Message-ID: <D29CB80EBA4DEA4D91181928AAF51538438C0D3B@SACEXCMBX04-PRD.hq.netapp.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [10.106.53.51]
MIME-Version: 1.0
X-Mailman-Approved-At: Fri, 03 Jan 2014 04:38:41 +0000
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.17
Cc: "Gumpula, Suresh" <Suresh.Gumpula@netapp.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Jan 2014 22:11:23 -0000

Hi,
  I am Suresh from NetApp, and I have questions related to reference count usage in the BSD kernel. We are seeing some corruption/use-after-free issues, and while debugging we found that the corruption pattern is a ucred/crgroups structure, so we started looking at the ucred reference count implementation.

This is my understanding of the refcount race window; please correct me if I am wrong.


It seems there is a timing window exposed by the FreeBSD reference count usage/implementation. Let's start with the definitions of the acquire and release routines
in freebsd/sys/sys/refcount.h:

static __inline void
refcount_acquire(volatile u_int *count)
{

        atomic_add_acq_int(count, 1);
}

static __inline int
refcount_release(volatile u_int *count)
{
        u_int old;

        /* XXX: Should this have a rel membar? */
        old = atomic_fetchadd_int(count, -1);
        KASSERT(old > 0, ("negative refcount %p", count));
        return (old == 1);
}

As implemented, a call to refcount_acquire atomically increments the reference count, while refcount_release decrements
the reference count and returns true if this release dropped the reference count to zero.

Consider the following sequence of events in the absence of other external synchronization:

* Object foo has a refcount of 1.
* Thread a on processor m calls refcount_release on foo.
* Very soon after (in CPU terms), thread b on processor n calls refcount_acquire on foo.
* atomic_fetchadd_int in thread a stalls the atomic_add_acq_int in thread b,
  decrementing foo's refcount to zero and setting old to 1. refcount_release returns true.
* atomic_add_acq_int in thread b increments the reference count to 1!
* Thread a, seeing refcount_release return success, frees foo.
* Thread b, believing it has a reference count on foo, continues to use it.
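
The sequence above can be replayed deterministically in user space. The
following is only a sketch under assumptions: it uses C11 <stdatomic.h>
rather than the kernel atomics, runs both "threads" in order on a single
thread, and struct foo, foo_acquire, foo_release and replay_race are
hypothetical names introduced for the illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted kernel object. */
struct foo {
	atomic_uint refs;
	bool freed;	/* set instead of really freeing, so it can be inspected */
};

/* Mirrors refcount_acquire(): unconditional increment, no status. */
static void
foo_acquire(struct foo *f)
{
	atomic_fetch_add(&f->refs, 1);
}

/* Mirrors refcount_release(): true when the last reference was dropped. */
static bool
foo_release(struct foo *f)
{
	return (atomic_fetch_sub(&f->refs, 1) == 1);
}

/*
 * Replays the listed steps in order.  Returns true if the object ends
 * up "freed" while its count still claims one live reference.
 */
static bool
replay_race(void)
{
	struct foo f;
	bool last;

	atomic_init(&f.refs, 1);	/* foo starts with one reference */
	f.freed = false;

	last = foo_release(&f);		/* thread a: 1 -> 0, last is true */
	foo_acquire(&f);		/* thread b: 0 -> 1, cannot notice */
	if (last)
		f.freed = true;		/* thread a frees foo */

	return (f.freed && atomic_load(&f.refs) == 1);
}
```

Nothing in the acquire path can observe that the count passed through zero,
which is the window being described.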

* The major hole here is that refcount_acquire is a void function. If it also returned status,
   calling software could determine whether it had a valid reference and take appropriate action if it failed to acquire one.


One such implementation might look like:
static __inline int
refcount_acquire(volatile u_int *count)
{
        u_int old;

        old = atomic_fetchadd_int(count, 1);
        return (old != 0);
}

This change would require modifying all callers of refcount_acquire and determining the appropriate action for a non-success return.


Without changing the return-value semantics of refcount_acquire, we have introduced a panic if we detect a race, as below:
static __inline void
refcount_acquire(volatile u_int *count)
{
        u_int old;

        old = atomic_fetchadd_int(count, 1);
        if (old == 0) {
          panic("refcount_acquire race condition detected!\n");
        }
}

After this change, we have seen this panic on one of our systems. Could someone check my understanding and suggest some ways to narrow down this problem?
As I mentioned earlier, one option is to change refcount_acquire to be non-void and change all the callers, but it seems there are many paths to be changed for the failure case.


Thank you
Suresh


From owner-freebsd-hackers@FreeBSD.ORG  Thu Jan  2 23:07:50 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 77B079B8
 for <freebsd-hackers@freebsd.org>; Thu,  2 Jan 2014 23:07:50 +0000 (UTC)
Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222])
 by mx1.freebsd.org (Postfix) with ESMTP id 3E09A1D71
 for <freebsd-hackers@freebsd.org>; Thu,  2 Jan 2014 23:07:50 +0000 (UTC)
Received: from critter.freebsd.dk (critter.freebsd.dk [192.168.61.3])
 by phk.freebsd.dk (Postfix) with ESMTP id 520703EB30;
 Thu,  2 Jan 2014 23:07:49 +0000 (UTC)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
 by critter.freebsd.dk (8.14.7/8.14.7) with ESMTP id s02N7mts012214;
 Thu, 2 Jan 2014 23:07:48 GMT (envelope-from phk@phk.freebsd.dk)
To: "Gumpula, Suresh" <Suresh.Gumpula@netapp.com>
Subject: Re: Reference count race window
In-reply-to: <D29CB80EBA4DEA4D91181928AAF51538438C0D8B@SACEXCMBX04-PRD.hq.netapp.com>
From: "Poul-Henning Kamp" <phk@phk.freebsd.dk>
References: <D29CB80EBA4DEA4D91181928AAF51538438C0D8B@SACEXCMBX04-PRD.hq.netapp.com>
Content-Type: text/plain; charset=ISO-8859-1
Date: Thu, 02 Jan 2014 23:07:48 +0000
Message-ID: <12213.1388704068@critter.freebsd.dk>
X-Mailman-Approved-At: Fri, 03 Jan 2014 04:39:36 +0000
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Jan 2014 23:07:50 -0000

In message <D29CB80EBA4DEA4D91181928AAF51538438C0D8B@SACEXCMBX04-PRD.hq.netapp.
com>, "Gumpula, Suresh" writes:


>One such implementation might look like:
>static __inline int
>refcount_acquire(volatile u_int *count)
>{
>        u_int old;
>
>        old = atomic_fetchadd_int(count, 1);
>        return (old != 0);
>}

This would still not be safe, as it would increment the count even if
it failed, and thereby just move the race to the next thread to come
past this counter.
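
A small user-space illustration of that point, assuming C11
<stdatomic.h> as a stand-in for atomic_fetchadd_int; acquire_fetchadd
and two_late_callers are hypothetical names used only for this sketch:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* The quoted proposal, written with C11 atomics (user-space stand-in). */
static bool
acquire_fetchadd(atomic_uint *count)
{
	unsigned int old = atomic_fetch_add(count, 1);

	return (old != 0);
}

/*
 * Starting from a dead object (count == 0), report what two
 * back-to-back callers are told.
 */
static void
two_late_callers(bool *first_ok, bool *second_ok)
{
	atomic_uint count;

	atomic_init(&count, 0);
	*first_ok = acquire_fetchadd(&count);	/* sees 0: reports failure */
	assert(atomic_load(&count) == 1);	/* ...but the count was bumped */
	*second_ok = acquire_fetchadd(&count);	/* sees 1: reports success */
}
```

The first caller is correctly refused, but the second one is told it holds
a valid reference to an object that is already gone.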

I agree that refcount_acquire() needs to return failure (either as
return value or panic) if the refcount was zero, but unless it
panics it SHALL also leave the refcount intact in that case.

I don't think there is any way to implement failure-detecting
refcounts correctly, except by using a compare-exchange style atomic,
which is less efficient than the atomic add.
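
A minimal sketch of what such a compare-exchange acquire could look like,
written against C11 <stdatomic.h> rather than the kernel's atomic(9)
primitives. The name acquire_if_not_zero is hypothetical, and counter
overflow is deliberately not handled here:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical acquire that refuses to resurrect a dead object: it only
 * moves the count from n to n + 1 when n != 0, and leaves the count
 * untouched otherwise.
 */
static bool
acquire_if_not_zero(atomic_uint *count)
{
	unsigned int old = atomic_load(count);

	do {
		if (old == 0)
			return (false);	/* count left intact */
		/* On failure, old is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(count, &old, old + 1));

	return (true);
}
```

The loop is where the extra cost comes from: under contention the
compare-exchange can fail and retry, whereas a plain atomic add always
completes in one step.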

For that reason, it can be argued that the present design is
faster and that users of the refcount API are required to use
some other means to ensure that grabbing a reference is always
safe.

However, in my experience that usually becomes even more inefficient.

So overall I would probably vote for the compare-exchange model with a
return value for failure.


-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

From owner-freebsd-hackers@FreeBSD.ORG  Fri Jan  3 05:47:16 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 8CDFEA18
 for <freebsd-hackers@freebsd.org>; Fri,  3 Jan 2014 05:47:16 +0000 (UTC)
Received: from smtp2.hushmail.com (smtp2a.hushmail.com [65.39.178.237])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 72D271975
 for <freebsd-hackers@freebsd.org>; Fri,  3 Jan 2014 05:47:15 +0000 (UTC)
Received: from smtp2.hushmail.com (smtp2a.hushmail.com [65.39.178.237])
 by smtp2.hushmail.com (Postfix) with SMTP id DCEEBA0214
 for <freebsd-hackers@freebsd.org>; Fri,  3 Jan 2014 05:17:07 +0000 (UTC)
Received: from smtp.hushmail.com (w7.hushmail.com [65.39.178.32])
 by smtp2.hushmail.com (Postfix) with ESMTP
 for <freebsd-hackers@freebsd.org>; Fri,  3 Jan 2014 05:17:07 +0000 (UTC)
Received: by smtp.hushmail.com (Postfix, from userid 99)
 id C11B6200F5; Fri,  3 Jan 2014 05:17:07 +0000 (UTC)
MIME-Version: 1.0
Date: Fri, 03 Jan 2014 00:17:07 -0500
To: freebsd-hackers@freebsd.org
Subject: pthread basics and contention
From: chump1@hushmail.com
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="UTF-8"
Message-Id: <20140103051707.C11B6200F5@smtp.hushmail.com>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Jan 2014 05:47:16 -0000


I have a fairly simple task that involves processing something in a 2D array, MxN times. I took a naive approach, 1x process 1x thread, and it took a little longer than desired. Well now, I could do better with some multi processing, especially on a multi core box, right?




Well, I have not had much luck. At first I spawned M threads and had each iterate over each N in turn, with M between 25-35. It took much, much longer than the single thread. I figured contention and overhead were costing me big, and gave it a shot with a scaled down version of the problem, M=10. Still, much slower than the single thread. A little confused, I went back to the big problem set (25-35), and made a new program that spawned only two threads, and each is limited to processing only even or only odd data sets. Even that still takes twice as long as the single thread version! What is up with that?




More important asides, I am barely doing any real processing at all. It is basically a no-op, barely doing more than incrementing the counter. Should I expect to see performance gains once I am doing real work in the processing portion of my program? Should I expect to see much different behavior on a different OS? Also I have one physical processor, two cores. Would I see better gains with more cores? How do you find processes and threads scale against hardware overall?




Thanks!


Sent using Hushmail


From owner-freebsd-hackers@FreeBSD.ORG  Fri Jan  3 05:52:48 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 35010BDF
 for <freebsd-hackers@freebsd.org>; Fri,  3 Jan 2014 05:52:48 +0000 (UTC)
Received: from mail-we0-x22d.google.com (mail-we0-x22d.google.com
 [IPv6:2a00:1450:400c:c03::22d])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id C30931A02
 for <freebsd-hackers@freebsd.org>; Fri,  3 Jan 2014 05:52:47 +0000 (UTC)
Received: by mail-we0-f173.google.com with SMTP id u57so13089000wes.18
 for <freebsd-hackers@freebsd.org>; Thu, 02 Jan 2014 21:52:45 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :cc:content-type:content-transfer-encoding;
 bh=Iz7IULqqHivY11RfFDM4ueaKz78qDFOFOx9oCUQf4Bo=;
 b=gLahNKfVeBZWSPOFJ7OUcOTuOKsV8Qykf6kH9Mfzp7cV8bQwnLwXo0/ZTQ4wKcdZUk
 vBS6tAg6kUr9BdQ2jgNtFAdY54nPlcJV9Xjjr800w3YEncXGQnobOCfT518zFwc1nGL4
 aJ6H6oJ9J6J1Zxv5QTdCTKzQ5HMnR4Ed4Y/JA4PoId2AlrqJRNsznJ7cE/xyn24Yje2I
 9g0+s19hdSOigLSE2oANvZ8U7sE5OzdxhKIkbmiPuZFeXcidvdKT6RmXW60bbXwU0/D/
 V7HphTdof+vx4uP4/O7hys4cof/7lYdC5FrMQP4ociz9qD7lLt68XXMhrYxlWlt7SNT1
 fgAQ==
MIME-Version: 1.0
X-Received: by 10.180.39.43 with SMTP id m11mr532191wik.8.1388728365233; Thu,
 02 Jan 2014 21:52:45 -0800 (PST)
Received: by 10.194.187.136 with HTTP; Thu, 2 Jan 2014 21:52:45 -0800 (PST)
In-Reply-To: <20140103051707.C11B6200F5@smtp.hushmail.com>
References: <20140103051707.C11B6200F5@smtp.hushmail.com>
Date: Fri, 3 Jan 2014 00:52:45 -0500
Message-ID: <CAHwLALPNoerSBvP1wOiLQPE6oiTO4j+k3nU0=mq6ogRfXQKZRg@mail.gmail.com>
Subject: Re: pthread basics and contention
From: Rayson Ho <raysonlogin@gmail.com>
To: chump1@hushmail.com
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Jan 2014 05:52:48 -0000

It depends on how you partition the work items. If the even & odd data
end up sharing the same cacheline, then it can be slow... You may want
to google: cache ping pong effect
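
As a rough illustration of the cache ping-pong effect, here is a minimal C sketch. It is not the original poster's program (which was not posted); the 64-byte line size, the iteration count, and every name in it are assumptions made for the example. Two threads each increment only their own counter. In the packed layout the two counters very likely share one cache line, so the line bounces between cores; in the padded layout each counter is aligned to its own line.

```c
#include <pthread.h>
#include <stdalign.h>
#include <stdint.h>

#define CACHE_LINE 64		/* assumed line size; check your CPU */
#define ITERATIONS 1000000UL	/* arbitrary amount of "work" */

/* Packed: both counters very likely sit in the same cache line. */
struct packed_counters {
	volatile uint64_t value[2];
};

/* Padded: each counter is aligned to its own (assumed) cache line. */
struct padded_counter {
	alignas(CACHE_LINE) volatile uint64_t value;
};

static struct packed_counters packed;
static struct padded_counter padded[2];

/* Thread body: increments only the counter it was handed. */
static void *
bump(void *arg)
{
	volatile uint64_t *counter = arg;

	for (unsigned long i = 0; i < ITERATIONS; i++)
		(*counter)++;
	return (NULL);
}

/* Run two threads, one per counter; returns 0 on success. */
static int
run_pair(volatile uint64_t *a, volatile uint64_t *b)
{
	pthread_t t[2];

	if (pthread_create(&t[0], NULL, bump, (void *)a) != 0)
		return (-1);
	if (pthread_create(&t[1], NULL, bump, (void *)b) != 0) {
		pthread_join(t[0], NULL);
		return (-1);
	}
	pthread_join(t[0], NULL);
	pthread_join(t[1], NULL);
	return (0);
}
```

Timing run_pair(&packed.value[0], &packed.value[1]) against run_pair(&padded[0].value, &padded[1].value), for example with clock_gettime(), is one way to check whether false sharing matters on a given machine. How large the difference is depends on the hardware.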

Rayson

==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html


On Fri, Jan 3, 2014 at 12:17 AM,  <chump1@hushmail.com> wrote:
>
> I have a fairly simple task that involves processing something in a 2D array, MxN times. I took a naive approach, 1x process 1x thread, and it took a little longer than desired. Well now, I could do better with some multi processing, especially on a multi core box, right?
>
>
>
>
> Well, I have not had much luck. At first I spawned M threads and had each iterate over each N in turn, with M between 25-35. It took much, much longer than the single thread. I figured contention and overhead were costing me big, and gave it a shot with a scaled down version of the problem, M=10. Still, much slower than the single thread. A little confused, I went back to the big problem set (25-35), and made a new program that spawned only two threads, and each is limited to processing only even or only odd data sets. Even that still takes twice as long as the single thread version! What is up with that?
>
>
>
>
> More important asides, I am barely doing any real processing at all. It is basically a no-op, barely doing more than incrementing the counter. Should I expect to see performance gains once I am doing real work in the processing portion of my program? Should I expect to see much different behavior on a different OS? Also I have one physical processor, two cores. Would I see better gains with more cores? How do you find processes and threads scale against hardware overall?
>
>
>
>
> Thanks!
>
>
> Sent using Hushmail
>
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"

From owner-freebsd-hackers@FreeBSD.ORG  Fri Jan  3 17:40:20 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 8470F1A5;
 Fri,  3 Jan 2014 17:40:20 +0000 (UTC)
Received: from mx11.netapp.com (mx11.netapp.com [216.240.18.76])
 (using TLSv1 with cipher RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 5F04611A8;
 Fri,  3 Jan 2014 17:40:19 +0000 (UTC)
X-IronPort-AV: E=Sophos;i="4.95,598,1384329600"; d="scan'208";a="93491130"
Received: from vmwexceht06-prd.hq.netapp.com ([10.106.77.104])
 by mx11-out.netapp.com with ESMTP; 03 Jan 2014 09:40:19 -0800
Received: from SACEXCMBX04-PRD.hq.netapp.com ([169.254.6.58]) by
 vmwexceht06-prd.hq.netapp.com ([10.106.77.104]) with mapi id 14.03.0123.003;
 Fri, 3 Jan 2014 09:40:19 -0800
From: "Gumpula, Suresh" <Suresh.Gumpula@netapp.com>
To: Alfred Perlstein <bright@mu.org>, Julian Elischer <julian@freebsd.org>,
 "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject: RE: Reference count race window
Thread-Topic: Reference count race window
Thread-Index: AQHPCA0ePlCdkUTb2Eq8swVe1xATjJpynbOA//98gUCAAJ/7AP//hPzAgACUt4CAAGawEA==
Date: Fri, 3 Jan 2014 17:40:18 +0000
Message-ID: <D29CB80EBA4DEA4D91181928AAF51538438C13E4@SACEXCMBX04-PRD.hq.netapp.com>
References: <D29CB80EBA4DEA4D91181928AAF51538438C0D8B@SACEXCMBX04-PRD.hq.netapp.com>
 <52C5ED3E.4020805@mu.org> <52C5F8A3.9000902@freebsd.org>
 <D29CB80EBA4DEA4D91181928AAF51538438C0DF8@SACEXCMBX04-PRD.hq.netapp.com>
 <52C61088.3080703@mu.org>
 <D29CB80EBA4DEA4D91181928AAF51538438C0F09@SACEXCMBX04-PRD.hq.netapp.com>
 <52C62617.7020304@mu.org>
In-Reply-To: <52C62617.7020304@mu.org>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [10.106.53.53]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Jan 2014 17:40:20 -0000

Thanks a lot for the suggestions and comments, Alfred, Julian, and Poul Henning. I will instrument as Alfred suggested first and see whether misuse of crfree() is the root cause of the corruption rather than a race window.

By the way, what ways do we have in FreeBSD to debug memory corruption? I am aware of options DEBUG_REDZONE (overflow detection), DEBUG_MEMGUARD (for a malloc type), MALLOC_DEBUG_MAXZONES, and some INVARIANTS code in kern_malloc.c which caches the malloc_type that most recently freed. Are there any other corruption-debugging methods we already have?

Thanks
Suresh

-----Original Message-----
From: Alfred Perlstein [mailto:bright@mu.org]
Sent: Thursday, January 02, 2014 9:53 PM
To: Gumpula, Suresh; Julian Elischer; freebsd-hackers@freebsd.org
Subject: Re: Reference count race window


On 1/2/14, 6:38 PM, Gumpula, Suresh wrote:
> Hi Alfred,
>
> I agree that there could have been an extra/invalid crfree() which decremented the count, and that a valid-looking crhold() (acquire) from the socket code then panicked in my case. As per your suggestion, if we replace the assert with an if condition in release, we will end up panicking when the actual crfree() happens. But we may still not know who called crfree() in the first, invalid place. Am I correct? I will try your suggestion.
>
> Can you please explain your array trick a bit more?

I think the simplest thing to do would be to replace crfree with a macro to pass __FILE__ and __LINE__ down into the actual crfree (or you can use __builtin_return_address(0) to get the return address of the caller instead).

Then either keep an array of {file,line} tuples or just store return addresses in the struct.

Just extend struct ucred and add this:

#define MAX_PREV 10
const char *files[MAX_PREV];
int lines[MAX_PREV];

Then hack your own version of refcount_release(), call it
refcount_release2(), but have it take a pointer to an integer into which it
writes the old value of the refcount.

Then you can use that return value like so:

if (old_refcount < MAX_PREV) {
   cred->files[old_refcount] = pointer_to_file_name;
   cred->lines[old_refcount] = line_number;
}

Then add the assert, or better yet, turn on INVARIANTS (or maybe try each option in turn, as INVARIANTS might hide the bug).

When you crash you can then see, inside the struct, the last few callers who did a free.

Since the refcount "old_refcount" is atomically manipulated, you should see the last few frees that send you negative.
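
Pulling the pieces above together, a userland sketch of the trick could look like the following. It is only an approximation under stated assumptions: it uses C11 atomics instead of the kernel's atomic_fetchadd_int(), struct cred_dbg is a hypothetical stand-in for struct ucred, and crfree_dbg() is an invented name that returns a flag instead of freeing anything.

```c
#include <stdatomic.h>
#include <stddef.h>

#define MAX_PREV 10

/* Hypothetical stand-in for struct ucred with the debug fields added. */
struct cred_dbg {
	atomic_uint ref;
	const char *files[MAX_PREV];
	int lines[MAX_PREV];
};

/*
 * Like refcount_release(), but also reports the count as it was
 * before the decrement.  Returns 1 when the last reference is dropped.
 */
static int
refcount_release2(atomic_uint *count, unsigned int *oldp)
{
	unsigned int old = atomic_fetch_sub(count, 1);

	*oldp = old;
	return (old == 1);
}

/* Record the caller in the slot indexed by the old refcount. */
static int
crfree_dbg(struct cred_dbg *cred, const char *file, int line)
{
	unsigned int old;
	int last = refcount_release2(&cred->ref, &old);

	if (old < MAX_PREV) {
		cred->files[old] = file;
		cred->lines[old] = line;
	}
	return (last);
}

/* Callers keep writing crfree(cred); the macro adds the location. */
#define crfree(cred) crfree_dbg((cred), __FILE__, __LINE__)
```

A release that finds the count already at 0 lands in slot 0, so after a crash files[0]/lines[0] point at the extra free, and slots 1, 2, ... at the releases that preceded it.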

-Alfred

>
> Thanks
> Suresh
>
>
>
> -----Original Message-----
> From: owner-freebsd-hackers@freebsd.org
> [mailto:owner-freebsd-hackers@freebsd.org] On Behalf Of Alfred
> Perlstein
> Sent: Thursday, January 02, 2014 8:21 PM
> To: Gumpula, Suresh; Julian Elischer; freebsd-hackers@freebsd.org
> Subject: Re: Reference count race window
>
>
> On 1/2/14, 3:53 PM, Gumpula, Suresh wrote:
>>>> Without changing the return-value semantics of refcount_acquire, we
>>>> have introduced a panic if we detected a race as below.
>>>> static __inline void
>>>> refcount_acquire(volatile u_int *count) {
>>>>            u_int old;
>>>>
>>>>            old = atomic_fetchadd_int(count, 1);
>>>>            if (old == 0) {
>>>>              panic("refcount_acquire race condition detected!\n");
>>>>            }
>>>>>>> so what is the stacktrace of the panic?
>> It's from the socket code calling crhold.   It's a non debug build( NO INVARIANTS )
>>
>> #4  0xffffffff80331d34 in panic (fmt=0xffffffff805c1e60
>> "refcount_acquire race condition detected!\n") at
>> ../../../../sys/kern/kern_shutdown.c:1009
>> #5  0xffffffff80326662 in refcount_acquire (count=<optimized out>) at
>> ../../../../sys/sys/refcount.h:65
>> #6  crhold (cr=<optimized out>) at
>> ../../../../sys/kern/kern_prot.c:1814
>> #7  0xffffffff803aa0d9 in socreate (dom=<optimized out>,
>> aso=0xffffff80345c1b00, type=<optimized out>, proto=0,
>> cred=0xffffff0017d7aa00, td=0xffffff000b294410) at
>> ../../../../sys/kern/uipc_socket.c:441
>> #8  0xffffffff803b2e5c in socket (td=0xffffff000b294410,
>> uap=0xffffff80345c1be0) at ../../../../sys/kern/uipc_syscalls.c:201
>> #9  0xffffffff80539ecb in syscall (frame=0xffffff80345c1c80) at
>> ../../../../sys/amd64/amd64/trap.c:1260
>>
> If it's a non-debug build then how do you know that someone isn't incorrectly lowering the refcount?
>
> Please try some invariants or at least manually turn on the one KASSERT I mentioned.
>
> Another trick would be to add an array of char*+int for the last few places that decremented; you can use the returned refcount as an index into that array to track who may be doing the extra frees.
>
> -Alfred
>


From owner-freebsd-hackers@FreeBSD.ORG  Fri Jan  3 21:01:49 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 8496E49D
 for <freebsd-hackers@freebsd.org>; Fri,  3 Jan 2014 21:01:49 +0000 (UTC)
Received: from lath.rinet.ru (lath.rinet.ru [195.54.192.90])
 by mx1.freebsd.org (Postfix) with ESMTP id 476E812BA
 for <freebsd-hackers@freebsd.org>; Fri,  3 Jan 2014 21:01:49 +0000 (UTC)
Received: by lath.rinet.ru (Postfix, from userid 222)
 id 59E738BE1; Sat,  4 Jan 2014 00:51:59 +0400 (MSK)
Date: Sat, 4 Jan 2014 00:51:59 +0400
From: Oleg Bulyzhin <oleg@freebsd.org>
To: freebsd-hackers@freebsd.org
Subject: atomic_load_acq @ i386/amd64
Message-ID: <20140103205159.GA99722@lath.rinet.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Jan 2014 21:01:49 -0000


Hello.

I've got a question: why is atomic_load_acq_* implemented on the i386/amd64 archs
with a locked cmpxchg instruction? The comment about this
(in /sys/(amd64|i386)/include/atomic.h) looks wrong to me. I believe
acquire/release semantics do not require a StoreLoad barrier, so a simple aligned
load should be enough (because acquire/release semantics do not guarantee
sequential consistency).

-- 
Oleg.

================================================================
=== Oleg Bulyzhin -- OBUL-RIPN -- OBUL-RIPE -- oleg@rinet.ru ===
================================================================


From owner-freebsd-hackers@FreeBSD.ORG  Sat Jan  4 06:41:22 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 409BAAA6
 for <freebsd-hackers@freebsd.org>; Sat,  4 Jan 2014 06:41:22 +0000 (UTC)
Received: from mail-pb0-f42.google.com (mail-pb0-f42.google.com
 [209.85.160.42])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id 128691831
 for <freebsd-hackers@freebsd.org>; Sat,  4 Jan 2014 06:41:21 +0000 (UTC)
Received: by mail-pb0-f42.google.com with SMTP id uo5so16503541pbc.15
 for <freebsd-hackers@freebsd.org>; Fri, 03 Jan 2014 22:41:21 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:content-type:mime-version:subject:from
 :in-reply-to:date:cc:content-transfer-encoding:message-id:references
 :to; bh=kgeRFR3+AzOkEo09wPlisHPzxcuyKYTwB+pZUcR6aOk=;
 b=f0cygry4gkENkt4+CiLmptS9qRTiLA4hfJwTFoUBzmvoWoHN787Ay0tjvqe92CBj/z
 FucUC0AIPNPe2qb2S+Cdkb/NlBXhfJj7DIud2XyYSXSMpj5nog9uEx0s+AHhbZJQdhH/
 B4hRm+kRAwp2AeeBXNX6vF3HS4UWw3q3wPpvjBgo1jMErd6v79GsJlIZlqcDVi3qmpKO
 yG6jpez9hynY6BxUksD4nHT46GjKkT/SsY6J/eMbD2ePHbOxr9T5nmI6MdVgkkvu7bhC
 9tM1EQvGLFg+nUHu2e5wVUyZqgGjqVm+WncM/Jb0rZjAmgumq19NjFhb5KTcaBZqRcAL
 dOIQ==
X-Gm-Message-State: ALoCoQndkDfUJ/9p33qiV5FGvEFDSGh8l166Jqwf3fv+wB/aHLrce6q7RPyzC9IpAJxg89qnG6wN
X-Received: by 10.68.189.133 with SMTP id gi5mr101475889pbc.57.1388817681188; 
 Fri, 03 Jan 2014 22:41:21 -0800 (PST)
Received: from [192.168.2.136] (99-74-169-43.lightspeed.sntcca.sbcglobal.net.
 [99.74.169.43])
 by mx.google.com with ESMTPSA id de1sm113285379pbc.7.2014.01.03.22.41.18
 for <multiple recipients>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Fri, 03 Jan 2014 22:41:19 -0800 (PST)
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 7.1 \(1827\))
Subject: Re: pthread basics and contention
From: Tim Kientzle <tim@kientzle.com>
In-Reply-To: <CAHwLALPNoerSBvP1wOiLQPE6oiTO4j+k3nU0=mq6ogRfXQKZRg@mail.gmail.com>
Date: Fri, 3 Jan 2014 22:41:15 -0800
Content-Transfer-Encoding: quoted-printable
Message-Id: <4EFEA29F-4D6E-4B4A-8C26-E15FA62B574C@kientzle.com>
References: <20140103051707.C11B6200F5@smtp.hushmail.com>
 <CAHwLALPNoerSBvP1wOiLQPE6oiTO4j+k3nU0=mq6ogRfXQKZRg@mail.gmail.com>
To: chump1@hushmail.com
X-Mailer: Apple Mail (2.1827)
Cc: "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 04 Jan 2014 06:41:22 -0000

Depending on the calculation involved, you may
be memory bus constrained, not CPU constrained.

On modern processors, it is often the case that
it takes longer to get the data to/from memory than
to actually compute anything.  In those cases, splitting
your work into threads just gives you more CPUs
waiting on the same slow memory.

Deciding whether this is the issue or not requires
good processor-level profiling tools.

Tim


On Fri, Jan 3, 2014 at 12:17 AM,  <chump1@hushmail.com> wrote:
>
> I have a fairly simple task that involves processing something in a 2D array, MxN times. I took a naive approach, 1x process 1x thread, and it took a little longer than desired. Well now, I could do better with some multi processing, especially on a multi core box, right?
>
> Well, I have not had much luck. At first I spawned M threads and had each iterate over each N in turn, with M between 25-35. It took much, much longer than the single thread. I figured contention and overhead were costing me big, and gave it a shot with a scaled down version of the problem, M=10. Still, much slower than the single thread. A little confused, I went back to the big problem set (25-35), and made a new program that spawned only two threads, and each is limited to processing only even or only odd data sets. Even that still takes twice as long as the single thread version! What is up with that?
>
> More important asides, I am barely doing any real processing at all. It is basically a no-op, barely doing more than incrementing the counter. Should I expect to see performance gains once I am doing real work in the processing portion of my program? Should I expect to see much different behavior on a different OS? Also I have one physical processor, two cores. Would I see better gains with more cores? How do you find processes and threads scale against hardware overall?
>
> Thanks!
>
> Sent using Hushmail



From owner-freebsd-hackers@FreeBSD.ORG  Sat Jan  4 17:29:33 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 2ED75F46;
 Sat,  4 Jan 2014 17:29:33 +0000 (UTC)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.freebsd.org (Postfix) with ESMTPS id AE2C51193;
 Sat,  4 Jan 2014 17:29:32 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
 by kib.kiev.ua (8.14.7/8.14.7) with ESMTP id s04HTNqL097496;
 Sat, 4 Jan 2014 19:29:23 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.8.3 kib.kiev.ua s04HTNqL097496
Received: (from kostik@localhost)
 by tom.home (8.14.7/8.14.7/Submit) id s04HTNQm097495;
 Sat, 4 Jan 2014 19:29:23 +0200 (EET)
 (envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com
 using -f
Date: Sat, 4 Jan 2014 19:29:23 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Oleg Bulyzhin <oleg@freebsd.org>
Subject: Re: atomic_load_acq @ i386/amd64
Message-ID: <20140104172923.GY59496@kib.kiev.ua>
References: <20140103205159.GA99722@lath.rinet.ru>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="pfhDleuqWB4Kh3F0"
Content-Disposition: inline
In-Reply-To: <20140103205159.GA99722@lath.rinet.ru>
User-Agent: Mutt/1.5.22 (2013-10-16)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
 DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
 version=3.3.2
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on tom.home
Cc: freebsd-hackers@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 04 Jan 2014 17:29:33 -0000


--pfhDleuqWB4Kh3F0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sat, Jan 04, 2014 at 12:51:59AM +0400, Oleg Bulyzhin wrote:
>
> Hello.
>
> I've got a question: why is atomic_load_acq_* implemented on the i386/amd64 archs
> with a locked cmpxchg instruction? The comment about this
> (in /sys/(amd64|i386)/include/atomic.h) looks wrong to me. I believe
> acquire/release semantics do not require a StoreLoad barrier, so a simple aligned
> load should be enough (because acquire/release semantics do not guarantee
> sequential consistency).

You did not explicitly state which statement in the comment is false, in
your opinion.

FreeBSD assumes a property of the _acq/_rel operations which is sometimes called
'total lock ordering'. It is indeed a sort of sequential consistency, but
only for atomic+membar ops. Were atomic_load_acq() implemented as a plain
load, it could pass stores, in particular stores from the _rel op, which
would break the guarantee.

For x86, there are indeed two possible schemes for implementing a critical
section. One is lock cmpxchg for get() and a plain store for release(),
which is what we use. The other is a plain load for get() and xchg for
release(). The load_acq() must then be adapted so as not to break the acq/rel
consistency, and since we use a plain store for release(), load_acq must
use a serializing instruction.
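
The case where a load passes an earlier store is the classic "store buffering" litmus test. The sketch below uses C11 atomics as a stand-in for FreeBSD's atomic(9) primitives, so it shows the ordering question rather than the kernel's actual implementation; the variable and function names are made up for the example. Each thread stores to one variable and then loads the other. With sequentially consistent operations, as written, at least one thread must observe the other's store. If the stores were weakened to memory_order_release and the loads to memory_order_acquire, both loads would be allowed to return 0, although whether that is ever observed depends on the hardware and on timing.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int x, y;
static int r0, r1;		/* read by the parent only after join */

static void *
thread0(void *arg)
{
	(void)arg;
	atomic_store_explicit(&x, 1, memory_order_seq_cst);
	r0 = atomic_load_explicit(&y, memory_order_seq_cst);
	return (NULL);
}

static void *
thread1(void *arg)
{
	(void)arg;
	atomic_store_explicit(&y, 1, memory_order_seq_cst);
	r1 = atomic_load_explicit(&x, memory_order_seq_cst);
	return (NULL);
}

/* Returns how many trials ended with both threads reading 0, or -1. */
static int
count_both_zero(int trials)
{
	int both_zero = 0;

	for (int i = 0; i < trials; i++) {
		pthread_t a, b;

		atomic_store(&x, 0);
		atomic_store(&y, 0);
		if (pthread_create(&a, NULL, thread0, NULL) != 0)
			return (-1);
		if (pthread_create(&b, NULL, thread1, NULL) != 0) {
			pthread_join(a, NULL);
			return (-1);
		}
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		if (r0 == 0 && r1 == 0)
			both_zero++;	/* a load passed the other store */
	}
	return (both_zero);
}
```

With the seq_cst orderings shown, count_both_zero() should always return 0. Creating threads for every trial keeps the sketch short but makes the race window small, so a weakened variant may need many trials before the reordered outcome shows up, if it shows up at all.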

--pfhDleuqWB4Kh3F0
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (FreeBSD)

iQIcBAEBAgAGBQJSyETyAAoJEJDCuSvBvK1Bo5QP/1A+2IP95QtfUmdMb5KYY1XT
w2oQIODKTsJ9pTjreuNhj4ShdcPJ5IhxilYrLY4lcUycdY/LQzVypO0/2M/1L/TJ
5KcHOYdcsxvEd7gQHqIzgMJLLnHtLK0CT3D2VJ/Tee67FB/fGbCOa55JIL0OWbeD
E4gUvqZovhIUE7tjqZW7Dcco6IfPWtvMnr5CIIRR3b7s4Yud4gW5dI1NUfL/jvl9
PwcJQo/KOeFL+7ZkGR6EN5pY9q8e/dNLsJGLbGYjmKboYZN6GfPIZ5Blri0v1yEM
nQCs6j+Smhthc1x3Uvi5HdUSc4PcvzRDkHltKAW5+Tuo2gQPMEoIr75AjPWjYdTw
cCOjP9mHnZPcSkv5CoDGh+LrbFBr3adgSRa6wD08GJxEZ24wgeXwtBW+jYX+IoRF
Ze9nNW91pMsfWKhwxPGs+RSJCMeRgenLRCppg86yGHJ33gUGwRIguijqiH87MOLt
IhHrhJV6pk3uZIWB6/Ktv+C4TsTxRtyoIQ1ZZnqq5aIv6uxg+4HTm2UB1fc7vTCy
vHi26KTpCrGU5daPRoEJvS8P41Zuw/Ghpc1Ky/DV7ZqxoRLgNz6MlISsscxoZBWU
UMCcW/HEcu3tHe8yeZ5rB95H6/r5LObnPX4/f5JSkexjSWuEv78Dj1xP/2zGDCdV
pZXfiQ/9oLZVxkV6Z5wy
=NHEV
-----END PGP SIGNATURE-----

--pfhDleuqWB4Kh3F0--

From owner-freebsd-hackers@FreeBSD.ORG  Sat Jan  4 23:29:17 2014
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 4D22AF33;
 Sat,  4 Jan 2014 23:29:17 +0000 (UTC)
Received: from lath.rinet.ru (lath.rinet.ru [195.54.192.90])
 by mx1.freebsd.org (Postfix) with ESMTP id 09FE01A61;
 Sat,  4 Jan 2014 23:29:16 +0000 (UTC)
Received: by lath.rinet.ru (Postfix, from userid 222)
 id 604588210; Sun,  5 Jan 2014 03:29:10 +0400 (MSK)
Date: Sun, 5 Jan 2014 03:29:10 +0400
From: Oleg Bulyzhin <oleg@FreeBSD.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Subject: Re: atomic_load_acq @ i386/amd64
Message-ID: <20140104232910.GA12331@lath.rinet.ru>
References: <20140103205159.GA99722@lath.rinet.ru>
 <20140104172923.GY59496@kib.kiev.ua>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20140104172923.GY59496@kib.kiev.ua>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-hackers@freebsd.org, Oleg Bulyzhin <oleg@freebsd.org>
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers/>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 04 Jan 2014 23:29:17 -0000

On Sat, Jan 04, 2014 at 07:29:23PM +0200, Konstantin Belousov wrote:
> On Sat, Jan 04, 2014 at 12:51:59AM +0400, Oleg Bulyzhin wrote:
> > 
> > Hello.
> > 
> > I've got a question: why is atomic_load_acq_* implemented on the i386/amd64 archs
> > with a locked cmpxchg instruction? The comment about this
> > (in /sys/(amd64|i386)/include/atomic.h) looks wrong to me. I believe
> > acquire/release semantics do not require a StoreLoad barrier, so a simple aligned
> > load should be enough (because acquire/release semantics do not guarantee
> > sequential consistency).
> 
> You did not explicitly state which statement in the comment is false, in
> your opinion.

> 
> FreeBSD assumes a property of the _acq/_rel operations which is sometimes called
> 'total lock ordering'. It is indeed a sort of sequential consistency, but
> only for atomic+membar ops. Were atomic_load_acq() implemented as a plain
> load, it could pass stores, in particular stores from the _rel op, which
> would break the guarantee.
> 
> For x86, there are indeed two possible schemes for implementing a critical
> section. One is lock cmpxchg for get() and a plain store for release(),
> which is what we use. The other is a plain load for get() and xchg for
> release(). The load_acq() must then be adapted so as not to break the acq/rel
> consistency, and since we use a plain store for release(), load_acq must
> use a serializing instruction.

Perhaps I was not clear enough; I'm talking about this one:
"However, loads may pass stores, so for atomic_load_acq we have to
 ensure a Store/Load barrier to do the load in SMP kernels."

As far as I know, acquire/release semantics guarantee the following.
If we have this code:
<prev_code>
_acq
<some code>
_rel
<post_code>

the following statements are true:
1) <some code> cannot leave the acq/rel block (due to reordering)
2) <prev_code> may leak past _acq
3) <post_code> may leak before _rel
So neither _acq nor _rel requires a full membar. I.e.
op_acq is:
<op>
<one-way membar, down->up reordering is prohibited>
op_rel is:
<one-way membar, up->down reordering is prohibited>
<op>
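
The one-way barriers described above correspond to the usual message-passing pattern. Here is a minimal sketch with C11 atomics and pthreads, used here in place of the atomic(9) _acq/_rel operations; the names payload, ready, and run_handoff() are invented for the example. The producer writes plain data and then does a release store; the consumer spins on an acquire load and only then reads the data.

```c
#include <pthread.h>
#include <stdatomic.h>

static int payload;		/* plain data, published via the flag */
static atomic_int ready;

static void *
producer(void *arg)
{
	(void)arg;
	payload = 42;
	/* op_rel: the payload write may not move below this store. */
	atomic_store_explicit(&ready, 1, memory_order_release);
	return (NULL);
}

static void *
consumer(void *arg)
{
	int *out = arg;

	/* op_acq: the payload read may not move above this load. */
	while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
		;
	*out = payload;
	return (NULL);
}

/* Returns the value the consumer saw, or -1 if a thread failed to start. */
static int
run_handoff(void)
{
	pthread_t p, c;
	int seen = -1;

	atomic_store(&ready, 0);
	payload = 0;
	if (pthread_create(&c, NULL, consumer, &seen) != 0)
		return (-1);
	if (pthread_create(&p, NULL, producer, NULL) != 0) {
		atomic_store(&ready, 1);	/* unblock the spinning consumer */
		pthread_join(c, NULL);
		return (-1);
	}
	pthread_join(p, NULL);
	pthread_join(c, NULL);
	return (seen);
}
```

This pattern needs no StoreLoad barrier: the guarantee is only that a consumer which sees ready == 1 also sees payload == 42. It says nothing about a store followed by a load of a different location, which is the case the quoted atomic.h comment is concerned with.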

Intel's documentation says that the only thing that can be reordered (for simple
loads/stores) is: "Reads may be reordered with older writes to different locations
but not with older writes to the same location."

So, if an older store can pass our load_acq(), it would not break the requirements.
And I do not understand how the load op from load_acq() can pass the store op from
store_rel(); the Intel doc says: "Writes are not reordered with older reads".

Well, while writing this email I realized what is disturbing me: it's the atomic(9)
"Multiple Processors" section. It claims atomics are not atomic in the common MP
case and says atomics are atomic on i386. It looks strange to me:
1) I guess it's not "atomic" even for i386/MP without proper membar pairing.
2) If we have acq/rel modifiers for atomics, why can't we guarantee "atomicity"
   for any MP arch?

P.S. Please correct me if I'm wrong in my statements; I'm spending my New Year
holidays on ignorance elimination. ;)

-- 
Oleg.

================================================================
=== Oleg Bulyzhin -- OBUL-RIPN -- OBUL-RIPE -- oleg@rinet.ru ===
================================================================