From owner-freebsd-hackers@FreeBSD.ORG Wed Jan 1 21:20:58 2014 Return-Path: Delivered-To: hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 236ACE53; Wed, 1 Jan 2014 21:20:58 +0000 (UTC) Received: from mail-oa0-x229.google.com (mail-oa0-x229.google.com [IPv6:2607:f8b0:4003:c02::229]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id D698A1611; Wed, 1 Jan 2014 21:20:57 +0000 (UTC) Received: by mail-oa0-f41.google.com with SMTP id j17so14253092oag.14 for ; Wed, 01 Jan 2014 13:20:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:date:message-id:subject:from:to:content-type; bh=qw4fJY9NTpa2WTFbgIG1Ny52kpxvxA/XduyNE1CHqBk=; b=wQL5w7MEeXkwfDHphx6UhJomrSxbBwHnj0wfXwdqr7x2zM63Nk30Z/3qEWRu4a9ajb qIeMTlThEwU0uwMab2I3YByUK6DRDGGzL16dQyFDvf89zuGX6ZsNwd9VzBaklEgOaWoD ZEYPRvlNumNwR+OmYu15PgIZwlqKosjIsEuPTEkysGwp0g0/zC1wEZj07vdHvutkv/7/ N705EtpLN+m25V9oFm/dCWdfBJOOFL2K8QxBk1FS4ecgHNBM64CCjZFGM6Bi8W7W3QEW +iFEMUc3UgCpgx0QN2h1jOBH4BDroCJIChNsqskAO14vRv3agv5D8t/jbEJJ6XfQlEqt 9Fyw== MIME-Version: 1.0 X-Received: by 10.60.33.7 with SMTP id n7mr53196429oei.25.1388611257171; Wed, 01 Jan 2014 13:20:57 -0800 (PST) Sender: pali.gabor@gmail.com Received: by 10.182.22.44 with HTTP; Wed, 1 Jan 2014 13:20:57 -0800 (PST) Date: Wed, 1 Jan 2014 22:20:57 +0100 X-Google-Sender-Auth: nZ-MT4eiPxM_TEwxpbJTFGhdjNk Message-ID: Subject: Re: Call for FreeBSD 2013Q4 (October-December) Status Reports From: Gabor Pali To: hackers@freebsd.org, current@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Wed, 01 Jan 2014 21:20:58 -0000

Dear FreeBSD Community,

Please note that the submission date for the October to December Quarterly Status Reports is January 14th, 2014, a little less than two weeks away. Please consult my previous message for the details:

On Sat, Dec 14, 2013 at 2:05 PM, Gabor Pali wrote:
> Dear FreeBSD Community,
>
> Please note that the next submission date for the October to December
> Quarterly Status Reports is January 14th, 2014, about a month away.
>
> They do not have to be very long -- basically they may be about
> anything that lets people know what is going on around the FreeBSD
> Project. Submission of reports is not restricted to committers:
> Anyone who is doing anything interesting and FreeBSD-related can (and
> is therefore encouraged to) write one!
>
> The preferred and easiest submission method is to use the XML
> generator [1] with the result emailed as an attachment to us, that is,
> monthly@FreeBSD.org [2]. There is also an XML template [3] which can
> be filled out manually and attached if preferred. For the expected
> content and style, please study our guidelines on how to write good
> status reports [4].
>
> To enable compilation and publication of the Q4 report as soon as
> possible after the January 14th deadline, please be prompt with any
> report submissions you may have.
>
> We are looking forward to all of your 2013Q4 reports!
> > Thanks, > Gabor > > > [1] http://www.freebsd.org/cgi/monthly.cgi > [2] mailto:monthly@freebsd.org > [3] http://www.freebsd.org/news/status/report-sample.xml > [4] http://www.freebsd.org/news/status/howto.html From owner-freebsd-hackers@FreeBSD.ORG Thu Jan 2 22:17:39 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id EDD41F99 for ; Thu, 2 Jan 2014 22:17:39 +0000 (UTC) Received: from mx11.netapp.com (mx11.netapp.com [216.240.18.76]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id D068D18FE for ; Thu, 2 Jan 2014 22:17:39 +0000 (UTC) X-IronPort-AV: E=Sophos;i="4.95,593,1384329600"; d="scan'208,217";a="93311489" Received: from vmwexceht03-prd.hq.netapp.com ([10.106.76.241]) by mx11-out.netapp.com with ESMTP; 02 Jan 2014 14:17:39 -0800 Received: from SACEXCMBX04-PRD.hq.netapp.com ([169.254.6.58]) by vmwexceht03-prd.hq.netapp.com ([10.106.76.241]) with mapi id 14.03.0123.003; Thu, 2 Jan 2014 14:17:39 -0800 From: "Gumpula, Suresh" To: "freebsd-hackers@freebsd.org" Subject: Reference count race window Thread-Topic: Reference count race window Thread-Index: Ac8ICHHs+v5D+0UyRX2b4aawkpE+qg== Date: Thu, 2 Jan 2014 22:17:38 +0000 Message-ID: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.106.53.51] MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.17 Cc: "Gumpula, Suresh" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Jan 2014 22:17:40 -0000 Hi, I 
am Suresh from NetApp and I have questions related to the reference count usage in the BSD kernel. We are seeing some corruption/use-after-free issues, and while debugging we found that the corrupted memory is a ucred/crgroups structure, so we started looking at the ucred reference count implementation.

This is my understanding of the ref count race window; please correct me if I am wrong.

It seems there is a timing window exposed by the FreeBSD reference count usage/implementation. Let's start with the definitions of the acquire and release routines in freebsd/sys/sys/refcount.h:

static __inline void
refcount_acquire(volatile u_int *count)
{

    atomic_add_acq_int(count, 1);
}

static __inline int
refcount_release(volatile u_int *count)
{
    u_int old;

    /* XXX: Should this have a rel membar? */
    old = atomic_fetchadd_int(count, -1);
    KASSERT(old > 0, ("negative refcount %p", count));
    return (old == 1);
}

As implemented, a call to refcount_acquire atomically increments the reference count, while refcount_release decrements the reference count and returns true if this release dropped the reference count to zero.

Consider the following sequence of events in the absence of other external synchronization:

* Object foo has a refcount of 1.
* Thread a on processor m calls refcount_release on foo.
* Very soon after (in CPU terms) thread b on processor n calls refcount_acquire on foo.
* atomic_fetchadd_int operating in thread a stalls the atomic_add_acq_int in thread b,
  decrementing foo's refcount to zero and setting old to 1. refcount_release returns true.
* atomic_add_acq_int in thread b increments the reference count to 1!
* Thread a, seeing refcount_release return success, frees foo.
* Thread b, believing it has a reference count on foo, continues to use it.

* The major hole here is that refcount_acquire is a void function. If it also returned status,
  calling software could determine whether it had a valid reference and take appropriate action if it failed to acquire.

One such implementation might look like:

static __inline int
refcount_acquire(volatile u_int *count)
{
    u_int old;

    old = atomic_fetchadd_int(count, 1);
    return (old != 0);
}

This change would require modifying all calls to refcount_acquire and determining the appropriate action in the case of a non-success return.

Without changing the return-value semantics of refcount_acquire, we have introduced a panic if we detect a race, as below:

static __inline void
refcount_acquire(volatile u_int *count)
{
    u_int old;

    old = atomic_fetchadd_int(count, 1);
    if (old == 0) {
        panic("refcount_acquire race condition detected!\n");
    }
}

After this change, we have seen this panic on one of our systems. Could someone check my understanding and suggest some ways to narrow down this problem?
As I mentioned earlier, one option is to change refcount_acquire to be non-void and change all the callers, but it seems there are many paths that would have to handle the failure case.
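
A status-returning acquire built on a plain fetch-and-add still bumps the counter even when it was already zero. A minimal user-space sketch of an alternative follows, assuming C11 <stdatomic.h> in place of the kernel's atomic(9) primitives. The names ref_acquire_unless_zero and ref_release are hypothetical and are not taken from the quoted refcount.h. The acquire uses a compare-and-swap loop so that a zero count is never resurrected.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not from the quoted refcount.h): take a reference
 * only if the count is still non-zero.
 */
static bool
ref_acquire_unless_zero(atomic_uint *count)
{
    unsigned int old = atomic_load_explicit(count, memory_order_relaxed);

    while (old != 0) {
        /* On failure the CAS reloads 'old' with the current value. */
        if (atomic_compare_exchange_weak_explicit(count, &old, old + 1,
            memory_order_acquire, memory_order_relaxed))
            return (true);
    }
    return (false);     /* count was zero: the object is going away */
}

/* Same contract as the quoted refcount_release(): true on the last drop. */
static bool
ref_release(atomic_uint *count)
{
    unsigned int old = atomic_fetch_sub_explicit(count, 1,
        memory_order_acq_rel);

    assert(old > 0);    /* stands in for the kernel KASSERT */
    return (old == 1);
}
```

Even this check reads the counter stored inside the object, so it only helps while the memory holding the counter is still valid; it does not replace the locking that is supposed to keep the pointer reachable.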
Thank you Suresh From owner-freebsd-hackers@FreeBSD.ORG Thu Jan 2 23:00:28 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id AEE25886 for ; Thu, 2 Jan 2014 23:00:28 +0000 (UTC) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 9A2821CE9 for ; Thu, 2 Jan 2014 23:00:28 +0000 (UTC) Received: from Alfreds-MacBook-Pro.local (50-204-88-5-static.hfc.comcastbusiness.net [50.204.88.5]) by elvis.mu.org (Postfix) with ESMTPSA id 658481A3C35; Thu, 2 Jan 2014 14:50:39 -0800 (PST) Message-ID: <52C5ED3E.4020805@mu.org> Date: Thu, 02 Jan 2014 14:50:38 -0800 From: Alfred Perlstein User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: "Gumpula, Suresh" , "freebsd-hackers@freebsd.org" Subject: Re: Reference count race window References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Jan 2014 23:00:28 -0000 On 1/2/14, 2:17 PM, Gumpula, Suresh wrote: > Hi, > I am Suresh from NetAPP and I have questions/queries related to the reference count usage in the BSD kernel. We are seeing some corruptions/use after free > issues and while debugging we found that the corruption pattern is a ucred/crgroups structure and started looking at ucred reference count implemenation. > > This is my understanding of ref count race window, please correct me if I am wrong. > > > It seems there is a timing window exposed by the FreeBSD reference count usage/implementation. 
Let's start with the definitions of the acquire and release routines > in freebsd/sys/sys/refcount.h > > static __inline void > refcount_acquire(volatile u_int *count) > { > > atomic_add_acq_int(count, 1); > } > > static __inline int > refcount_release(volatile u_int *count) > { > u_int old; > > /* XXX: Should this have a rel membar? */ > old = atomic_fetchadd_int(count, -1); > KASSERT(old > 0, ("negative refcount %p", count)); > return (old == 1); > } > > As implemented, a call to refcount_acquire atomically increments the reference count while refcount_release decrements > the reference count and returns true if this release dropped the reference count to zero. > > Consider the following sequence of events in the absence of other external synchronization: > > * Object foo has a refcount of 1 > * Thread a on processor m calls refcount_release on foo. > * Very soon after (in CPU terms) thread b on processor n calls refcount_acquire on foo. > * atomic_fetchadd_int operating in thread a stalls the atomic_add_acq_int in thread b, > decrementing foo's refcount to zero and setting old to 1. refcount_release returns true. > * atomic_add_acq_int in thread b increments the reference count to 1! > * thread a, seeing refcount_release return success, frees foo. > * thread b, believing it has a reference count on foo, continues to use it. > > * The major hole here is that refcount_acquire is a void function. If it also returned status, > calling software could determine that it had a valid reference and take appropriate action if it failed to acquire. > > > One such implementation might look like: > static __inline int > refcount_acquire(volatile u_int *count) > { > u_int old; > > old = atomic_fetchadd_int(count, 1); > return (old != 0); > } > > This change would require modification of all calls to refcount_acquire and determining appropriate action in the case of a non-success return. 
>
>
> Without changing the return-value semantics of refcount_acquire, we have introduced a panic if we detected a race as below.
> static __inline void
> refcount_acquire(volatile u_int *count)
> {
>     u_int old;
>
>     old = atomic_fetchadd_int(count, 1);
>     if (old == 0) {
>         panic("refcount_acquire race condition detected!\n");
>     }
> }
>
> After this change , we have seen this panic in one of our systems. Could someone look at my understanding and give me some ways to narrow down this problem.
> As I mentioned earlier, one option is to change refcount_acquire to be non void and change all the callers, but it seems there are many paths to be changed on failure case.
>
>
> Thank you
> Suresh
>
> _________________________

Hey Suresh,

In theory this shouldn't happen due to pointer/thread ownership of the resource.

This means that usually a cred is copied via refcount to another object, and by the time the refcount hits 1 only that one object should be pointing to it.

That means that if someone is raising the refcount at the same time, then they are looking into an object that is in the process of being destroyed!

Going back to your example:

* Object foo has a refcount of 1
* Thread a on processor m calls refcount_release on foo.
* Very soon after (in CPU terms) thread b on processor n calls refcount_acquire on foo.

  ^--- This should not be happening, as "foo" should no longer be accessible to other subsystems.
       Imagine some other CPU calling rfork() on a process that is in the middle of exit(). This should *not* happen.

* atomic_fetchadd_int operating in thread a stalls the atomic_add_acq_int in thread b,
  decrementing foo's refcount to zero and setting old to 1. refcount_release returns true.
* atomic_add_acq_int in thread b increments the reference count to 1!
* thread a, seeing refcount_release return success, frees foo.
* thread b, believing it has a reference count on foo, continues to use it.
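
The ownership rule above can be sketched in user space. This is only an illustration under stated assumptions: struct obj, struct holder, holder_get and holder_clear are hypothetical names, and a pthread mutex stands in for whatever proc/thread lock protects the pointer in the kernel. A new reference is taken only through a pointer that itself owns a reference, under the lock, and the pointer is unpublished before its reference is dropped. The count therefore cannot reach zero while anyone can still find the object.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct obj {
    atomic_uint refs;
    int payload;
};

/* A published pointer; it owns one reference while non-NULL. */
struct holder {
    pthread_mutex_t lock;
    struct obj *ptr;
};

static struct obj *
obj_alloc(int payload)
{
    struct obj *o = malloc(sizeof(*o));

    assert(o != NULL);
    atomic_init(&o->refs, 1);   /* the creator's reference */
    o->payload = payload;
    return (o);
}

/* Drop one reference; returns 1 if this call freed the object. */
static int
obj_drop(struct obj *o)
{
    if (atomic_fetch_sub(&o->refs, 1) != 1)
        return (0);
    free(o);
    return (1);
}

/* The lock keeps the holder's own reference alive while we add ours. */
static struct obj *
holder_get(struct holder *h)
{
    struct obj *o;

    pthread_mutex_lock(&h->lock);
    o = h->ptr;
    if (o != NULL)
        atomic_fetch_add(&o->refs, 1);
    pthread_mutex_unlock(&h->lock);
    return (o);
}

/* Unpublish first, then drop the reference the holder owned. */
static int
holder_clear(struct holder *h)
{
    struct obj *o;

    pthread_mutex_lock(&h->lock);
    o = h->ptr;
    h->ptr = NULL;
    pthread_mutex_unlock(&h->lock);
    return (o != NULL ? obj_drop(o) : 0);
}
```

With this discipline an acquire can never observe a zero count, so a panic in acquire points at a caller that broke the rule (an extra release, or a pointer cached without a reference) rather than at the refcount primitives.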
While it's possible that there *may* be a bug here, I think it would make sense for you to add more instrumentation to your code.

Are you testing with INVARIANTS enabled? If you were, a release on a count that is already zero should make refcount_release panic because of the KASSERT:

static __inline int
refcount_release(volatile u_int *count)
{
    u_int old;

    /* XXX: Should this have a rel membar? */
    old = atomic_fetchadd_int(count, -1);
    KASSERT(old > 0, ("negative refcount %p", count));
    return (old == 1);
}

Perhaps you should either enable INVARIANTS... or you can turn that one single KASSERT into an unconditional test like so:

static __inline int
refcount_release(volatile u_int *count)
{
    u_int old;

    /* XXX: Should this have a rel membar? */
    old = atomic_fetchadd_int(count, -1);
    /* old is unsigned, so test for zero rather than for a negative value */
    if (old == 0) panic("negative refcount %p", count);
    return (old == 1);
}

That ought to help you catch the bug.

-Alfred

From owner-freebsd-hackers@FreeBSD.ORG Thu Jan 2 23:39:28 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4AC24410 for ; Thu, 2 Jan 2014 23:39:28 +0000 (UTC) Received: from vps1.elischer.org (vps1.elischer.org [204.109.63.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 111821F39 for ; Thu, 2 Jan 2014 23:39:27 +0000 (UTC) Received: from [192.168.1.73] (254C510A.nat.pool.telekom.hu [37.76.81.10]) (authenticated bits=0) by vps1.elischer.org (8.14.7/8.14.7) with ESMTP id s02NdLM2019146 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 2 Jan 2014 15:39:25 -0800 (PST) (envelope-from julian@freebsd.org) Message-ID: <52C5F8A3.9000902@freebsd.org> Date: Fri, 03 Jan 2014 00:39:15 +0100 From: Julian Elischer User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:24.0) Gecko/20100101
Thunderbird/24.2.0 MIME-Version: 1.0 To: Alfred Perlstein , "Gumpula, Suresh" , "freebsd-hackers@freebsd.org" Subject: Re: Reference count race window References: <52C5ED3E.4020805@mu.org> In-Reply-To: <52C5ED3E.4020805@mu.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Jan 2014 23:39:28 -0000 On 1/2/14, 11:50 PM, Alfred Perlstein wrote: > > On 1/2/14, 2:17 PM, Gumpula, Suresh wrote: >> Hi, >> I am Suresh from NetAPP and I have questions/queries related to >> the reference count usage in the BSD kernel. We are seeing some >> corruptions/use after free >> issues and while debugging we found that the corruption pattern >> is a ucred/crgroups structure and started looking at ucred >> reference count implemenation. >> >> This is my understanding of ref count race window, please correct >> me if I am wrong. >> >> >> It seems there is a timing window exposed by the FreeBSD reference >> count usage/implementation. Let's start with the definitions of the >> acquire and release routines >> in freebsd/sys/sys/refcount.h >> >> static __inline void >> refcount_acquire(volatile u_int *count) >> { >> >> atomic_add_acq_int(count, 1); >> } >> >> static __inline int >> refcount_release(volatile u_int *count) >> { >> u_int old; >> >> /* XXX: Should this have a rel membar? */ >> old = atomic_fetchadd_int(count, -1); >> KASSERT(old > 0, ("negative refcount %p", count)); >> return (old == 1); >> } >> >> As implemented, a call to refcount_acquire atomically increments >> the reference count while refcount_release decrements >> the reference count and returns true if this release dropped the >> reference count to zero. 
>> >> Consider the following sequence of events in the absence of other >> external synchronization: >> >> * Object foo has a refcount of 1 >> * Thread a on processor m calls refcount_release on foo. >> * Very soon after (in CPU terms) thread b on processor n calls >> refcount_acquire on foo. >> * atomic_fetchadd_int operating in thread a stalls the >> atomic_add_acq_int in thread b, >> decrementing foo's refcount to zero and setting old to 1. >> refcount_release returns true. >> * atomic_add_acq_int in thread b increments the reference count to 1! >> * thread a, seeing refcount_release return success, frees foo. >> * thread b, believing it has a reference count on foo, continues to >> use it. >> >> * The major hole here is that refcount_acquire is a void function. >> If it also returned status, >> calling software could determine that it had a valid reference >> and take appropriate action if it failed to acquire. >> >> >> One such implementation might look like: >> static __inline int >> refcount_acquire(volatile u_int *count) >> { >> u_int old; >> >> old = atomic_fetchadd_int(count, 1); >> return (old != 0); >> } >> >> This change would require modification of all calls to >> refcount_acquire and determining appropriate action in the case of >> a non-success return. >> >> >> Without changing the return-value semantics of refcount_acquire, we >> have introduced a panic if we detected a race as below. >> static __inline void >> refcount_acquire(volatile u_int *count) >> { >> u_int old; >> >> old = atomic_fetchadd_int(count, 1); >> if (old == 0) { >> panic("refcount_acquire race condition detected!\n"); >> } so what is the stacktrace of the panic? >> } >> >> After this change , we have seen this panic in one of our systems. >> Could someone look at my understanding and give me some ways to >> narrow down this problem. 
>> As I mentioned earlier, one option is to change refcount_acquire to
>> be non void and change all the callers, but it seems there are many
>> paths to be changed on failure case.
>>
>>
>> Thank you
>> Suresh
>>
>> _________________________
> Hey Suresh,
>
> In theory this shouldn't happen due to pointer/thread ownership of
> the resource.
>

My memory is that the refcount infrastructure makes some assumptions about how it is called, and the cred code makes some assumptions about what is going on too.

I do agree that there is a race as outlined by you, but I believe that it was supposed to be impossible to reach, because creds were only actually released in special cases. In those cases we can guarantee that no one else should be able to have a pointer to that cred, as the pointer is supposed to be found only after locking the appropriate proc/thread structure.

Is it possible that you have changed the places where creds are released? It is also possible that we ourselves have broken this. I have not looked at it for some years.

Maybe it is time to change the way that the refcount interface is used here so that we do know whether we succeeded in getting the only reference. It would probably require recoding, because there is always a legitimate place to get a reference count of 1 (the initial setup), and initial and subsequent acquisition of reference counts is often achieved with the same code.

> This means that usually a cred is copied via refcount to another
> object and by the time the refcount hits 1 then only that one object
> should be pointing to it.
>
> That means that if someone is raising the refcount at the same time
> then they are looking into an object that is in the process of being
> destroyed!
>
> Going back to your example:
>
> * Object foo has a refcount of 1
> * Thread a on processor m calls refcount_release on foo.
> * Very soon after (in CPU terms) thread b on processor n calls
> refcount_acquire on foo.
>
> ^--- this should not be happening as "foo" should no longer be
> accessible to other subsystems.
> imagine this would be like some other CPU calling rfork() on a
> process that is in the middle of exit(). This should *not* happen.

To expand: there are locks that are supposed to stop this from happening. Exit should not be able to proceed until it is sure that the proc structure is only accessed by itself, and the cred pointer should never be cached without adding a reference. That means the count can only be 1 in this case while the lock is held. If this has been changed, then yes, there is a bug. You may want to check the status of the various locks when removing the last reference.

>
> * atomic_fetchadd_int operating in thread a stalls the
> atomic_add_acq_int in thread b,
> decrementing foo's refcount to zero and setting old to 1.
> refcount_release returns true.
> * atomic_add_acq_int in thread b increments the reference count to 1!
> * thread a, seeing refcount_release return success, frees foo.
> * thread b, believing it has a reference count on foo, continues to
> use it.
>
>
> While it's possible that there *may* be a bug here, I think it would
> make sense for you to add more instrumentation to your code.
>
> Are you testing with INVARIANTS enabled? otherwise refcount_release
> should be panic'ing due to the KASSERT!
>
> static __inline int
> refcount_release(volatile u_int *count)
> {
>     u_int old;
>
>     /* XXX: Should this have a rel membar? */
>     old = atomic_fetchadd_int(count, -1);
>     KASSERT(old > 0, ("negative refcount %p", count));
>     return (old == 1);
> }
>
> Perhaps you should either enable INVARIANTS... or you can turn that
> one single KASSERT into an unconditional test like so:
>
>
> static __inline int
> refcount_release(volatile u_int *count)
> {
>     u_int old;
>
>     /* XXX: Should this have a rel membar?
*/ > old = atomic_fetchadd_int(count, -1); > if (old < 0) panic("negative refcount %p", count); > return (old == 1); > } > > That ought to help you catch the bug. > > -Alfred > > > _______________________________________________ > freebsd-hackers@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to > "freebsd-hackers-unsubscribe@freebsd.org" > > From owner-freebsd-hackers@FreeBSD.ORG Thu Jan 2 23:54:01 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 75D4C6CB; Thu, 2 Jan 2014 23:54:01 +0000 (UTC) Received: from mx11.netapp.com (mx11.netapp.com [216.240.18.76]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 55E641077; Thu, 2 Jan 2014 23:54:01 +0000 (UTC) X-IronPort-AV: E=Sophos;i="4.95,593,1384329600"; d="scan'208";a="93338555" Received: from vmwexceht06-prd.hq.netapp.com ([10.106.77.104]) by mx11-out.netapp.com with ESMTP; 02 Jan 2014 15:54:00 -0800 Received: from SACEXCMBX04-PRD.hq.netapp.com ([169.254.6.58]) by vmwexceht06-prd.hq.netapp.com ([10.106.77.104]) with mapi id 14.03.0123.003; Thu, 2 Jan 2014 15:54:00 -0800 From: "Gumpula, Suresh" To: Julian Elischer , Alfred Perlstein , "freebsd-hackers@freebsd.org" Subject: RE: Reference count race window Thread-Topic: Reference count race window Thread-Index: AQHPCA0ePlCdkUTb2Eq8swVe1xATjJpynbOA//98gUA= Date: Thu, 2 Jan 2014 23:53:59 +0000 Message-ID: References: <52C5ED3E.4020805@mu.org> <52C5F8A3.9000902@freebsd.org> In-Reply-To: <52C5F8A3.9000902@freebsd.org> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.106.53.51] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable 
MIME-Version: 1.0 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Jan 2014 23:54:01 -0000

>> Without changing the return-value semantics of refcount_acquire, we
>> have introduced a panic if we detected a race as below.
>> static __inline void
>> refcount_acquire(volatile u_int *count) {
>>     u_int old;
>>
>>     old = atomic_fetchadd_int(count, 1);
>>     if (old == 0) {
>>         panic("refcount_acquire race condition detected!\n");
>>     }

>>>>>so what is the stacktrace of the panic?

It's from the socket code calling crhold. It's a non-debug build (no INVARIANTS).

#4 0xffffffff80331d34 in panic (fmt=0xffffffff805c1e60 "refcount_acquire race condition detected!\n") at ../../../../sys/kern/kern_shutdown.c:1009
#5 0xffffffff80326662 in refcount_acquire (count=) at ../../../../sys/sys/refcount.h:65
#6 crhold (cr=) at ../../../../sys/kern/kern_prot.c:1814
#7 0xffffffff803aa0d9 in socreate (dom=, aso=0xffffff80345c1b00, type=, proto=0, cred=0xffffff0017d7aa00, td=0xffffff000b294410)
   at ../../../../sys/kern/uipc_socket.c:441
#8 0xffffffff803b2e5c in socket (td=0xffffff000b294410, uap=0xffffff80345c1be0) at ../../../../sys/kern/uipc_syscalls.c:201
#9 0xffffffff80539ecb in syscall (frame=0xffffff80345c1c80) at ../../../../sys/amd64/amd64/trap.c:1260

Thanks
Suresh

From owner-freebsd-hackers@FreeBSD.ORG Fri Jan 3 01:21:18 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C3E94CC3; Fri, 3 Jan 2014 01:21:18 +0000 (UTC) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id AD77816D3; Fri, 3 Jan 2014 01:21:18
+0000 (UTC) Received: from Alfreds-MacBook-Pro.local (50-204-88-5-static.hfc.comcastbusiness.net [50.204.88.5]) by elvis.mu.org (Postfix) with ESMTPSA id 575521A3C19; Thu, 2 Jan 2014 17:21:14 -0800 (PST) Message-ID: <52C61088.3080703@mu.org> Date: Thu, 02 Jan 2014 17:21:12 -0800 From: Alfred Perlstein User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: "Gumpula, Suresh" , Julian Elischer , "freebsd-hackers@freebsd.org" Subject: Re: Reference count race window References: <52C5ED3E.4020805@mu.org> <52C5F8A3.9000902@freebsd.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 03 Jan 2014 01:21:18 -0000 On 1/2/14, 3:53 PM, Gumpula, Suresh wrote: >>> Without changing the return-value semantics of refcount_acquire, we >>> have introduced a panic if we detected a race as below. >>> static __inline void >>> refcount_acquire(volatile u_int *count) { >>> u_int old; >>> >>> old = atomic_fetchadd_int(count, 1); >>> if (old == 0) { >>> panic("refcount_acquire race condition detected!\n"); >>> } >>>>>> so what is the stacktrace of the panic? > It's from the socket code calling crhold. 
It's a non debug build( NO INVARIANTS )
>
> #4 0xffffffff80331d34 in panic (fmt=0xffffffff805c1e60 "refcount_acquire race condition detected!\n") at ../../../../sys/kern/kern_shutdown.c:1009
> #5 0xffffffff80326662 in refcount_acquire (count=) at ../../../../sys/sys/refcount.h:65
> #6 crhold (cr=) at ../../../../sys/kern/kern_prot.c:1814
> #7 0xffffffff803aa0d9 in socreate (dom=, aso=0xffffff80345c1b00, type=, proto=0, cred=0xffffff0017d7aa00, td=0xffffff000b294410)
> at ../../../../sys/kern/uipc_socket.c:441
> #8 0xffffffff803b2e5c in socket (td=0xffffff000b294410, uap=0xffffff80345c1be0) at ../../../../sys/kern/uipc_syscalls.c:201
> #9 0xffffffff80539ecb in syscall (frame=0xffffff80345c1c80) at ../../../../sys/amd64/amd64/trap.c:1260
>

If it's a non-debug build then how do you know that someone isn't incorrectly lowering the refcount?

Please try a kernel with INVARIANTS, or at least manually turn on the one KASSERT I mentioned.

Another trick would be to add an array of char*+int for the last few places that decremented; you can use the returned refcount as an index into that array to track who may be doing the extra frees.
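
A rough user-space sketch of that array trick follows. It assumes the char*+int pair is a file name and line number, and it uses C11 atomics instead of the kernel's atomic_fetchadd_int; struct traced_ref, traced_release and TRACED_RELEASE are hypothetical names. Each release records its call site in the slot selected by the pre-decrement count, so slot 1 names whoever dropped the count to zero and slot 0 names a release that ran when the count was already zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define REL_SLOTS 8     /* how many distinct count values to remember */

struct rel_site {
    const char *file;
    int line;
};

struct traced_ref {
    atomic_uint count;
    struct rel_site last[REL_SLOTS];    /* indexed by pre-decrement count */
};

static void
traced_init(struct traced_ref *r, unsigned int initial)
{
    assert(r != NULL);
    atomic_init(&r->count, initial);
    for (size_t i = 0; i < REL_SLOTS; i++) {
        r->last[i].file = NULL;
        r->last[i].line = 0;
    }
}

/* Returns 1 when this call dropped the last reference. */
static int
traced_release(struct traced_ref *r, const char *file, int line)
{
    unsigned int old = atomic_fetch_sub(&r->count, 1);
    struct rel_site *s = &r->last[old % REL_SLOTS];

    /* Debug aid only: the record itself is not written atomically. */
    s->file = file;
    s->line = line;
    return (old == 1);
}

#define TRACED_RELEASE(r) traced_release((r), __FILE__, __LINE__)
```

After a panic, the slots at indexes 1 and 0 can be read from the dump to see which call sites performed the final and the surplus release.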
-Alfred From owner-freebsd-hackers@FreeBSD.ORG Fri Jan 3 02:38:25 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 39A84F63; Fri, 3 Jan 2014 02:38:25 +0000 (UTC) Received: from mx12.netapp.com (mx12.netapp.com [216.240.18.77]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 17C1C1C9A; Fri, 3 Jan 2014 02:38:24 +0000 (UTC) X-IronPort-AV: E=Sophos;i="4.95,595,1384329600"; d="scan'208";a="134232804" Received: from vmwexceht05-prd.hq.netapp.com ([10.106.77.35]) by mx12-out.netapp.com with ESMTP; 02 Jan 2014 18:38:19 -0800 Received: from SACEXCMBX04-PRD.hq.netapp.com ([169.254.6.58]) by vmwexceht05-prd.hq.netapp.com ([10.106.77.35]) with mapi id 14.03.0123.003; Thu, 2 Jan 2014 18:38:19 -0800 From: "Gumpula, Suresh" To: Alfred Perlstein , Julian Elischer , "freebsd-hackers@freebsd.org" Subject: RE: Reference count race window Thread-Topic: Reference count race window Thread-Index: AQHPCA0ePlCdkUTb2Eq8swVe1xATjJpynbOA//98gUCAAJ/7AP//hPzA Date: Fri, 3 Jan 2014 02:38:18 +0000 Message-ID: References: <52C5ED3E.4020805@mu.org> <52C5F8A3.9000902@freebsd.org> <52C61088.3080703@mu.org> In-Reply-To: <52C61088.3080703@mu.org> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.106.53.51] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 03 Jan 2014 02:38:25 -0000 Hi Alfred, I agree that there could have been an extra/invalid crfree() which d= ecremented the 
count, and a seemingly valid crhold() (acquire) from the socket code panicked in my case. As per your suggestion, if we
replace the assert with an if condition in release, we will end up panicking when the actual crfree() happens. But we may not know who called crfree() in the first, invalid place. Am I correct? I will try your suggestion.

Can you please explain your array trick a bit more?

Thanks
Suresh

-----Original Message-----
From: owner-freebsd-hackers@freebsd.org [mailto:owner-freebsd-hackers@freebsd.org] On Behalf Of Alfred Perlstein
Sent: Thursday, January 02, 2014 8:21 PM
To: Gumpula, Suresh; Julian Elischer; freebsd-hackers@freebsd.org
Subject: Re: Reference count race window

On 1/2/14, 3:53 PM, Gumpula, Suresh wrote:
>>> Without changing the return-value semantics of refcount_acquire, we
>>> have introduced a panic if we detected a race as below.
>>> static __inline void
>>> refcount_acquire(volatile u_int *count) {
>>>	u_int old;
>>>
>>>	old = atomic_fetchadd_int(count, 1);
>>>	if (old == 0) {
>>>		panic("refcount_acquire race condition detected!\n");
>>>	}
>>>>>> so what is the stacktrace of the panic?
> It's from the socket code calling crhold.
It's a non-debug build (NO INVARIANTS)
>
> #4 0xffffffff80331d34 in panic (fmt=0xffffffff805c1e60
> "refcount_acquire race condition detected!\n") at
> ../../../../sys/kern/kern_shutdown.c:1009
> #5 0xffffffff80326662 in refcount_acquire (count=) at
> ../../../../sys/sys/refcount.h:65
> #6 crhold (cr=) at
> ../../../../sys/kern/kern_prot.c:1814
> #7 0xffffffff803aa0d9 in socreate (dom=,
> aso=0xffffff80345c1b00, type=, proto=0,
> cred=0xffffff0017d7aa00, td=0xffffff000b294410) at
> ../../../../sys/kern/uipc_socket.c:441
> #8 0xffffffff803b2e5c in socket (td=0xffffff000b294410,
> uap=0xffffff80345c1be0) at ../../../../sys/kern/uipc_syscalls.c:201
> #9 0xffffffff80539ecb in syscall (frame=0xffffff80345c1c80) at
> ../../../../sys/amd64/amd64/trap.c:1260
>
If it's a non-debug build then how do you know that someone isn't incorrectly lowering the refcount?

Please try some invariants or at least manually turn on the one KASSERT I mentioned.

Another trick would be to add an array of char*+int for the last few places that decremented; you can use the returned refcount as an index into that array to track who may be doing the extra frees.
-Alfred
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"

From owner-freebsd-hackers@FreeBSD.ORG Fri Jan 3 02:53:23 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id B981932B; Fri, 3 Jan 2014 02:53:23 +0000 (UTC) Received: from elvis.mu.org (elvis.mu.org [192.203.228.196]) by mx1.freebsd.org (Postfix) with ESMTP id 97C091D8C; Fri, 3 Jan 2014 02:53:23 +0000 (UTC) Received: from Alfreds-MacBook-Pro.local (50-204-88-5-static.hfc.comcastbusiness.net [50.204.88.5]) by elvis.mu.org (Postfix) with ESMTPSA id 66D1F1A3C19; Thu, 2 Jan 2014 18:53:12 -0800 (PST) Message-ID: <52C62617.7020304@mu.org> Date: Thu, 02 Jan 2014 18:53:11 -0800 From: Alfred Perlstein User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:24.0) Gecko/20100101 Thunderbird/24.2.0 MIME-Version: 1.0 To: "Gumpula, Suresh" , Julian Elischer , "freebsd-hackers@freebsd.org" Subject: Re: Reference count race window References: <52C5ED3E.4020805@mu.org> <52C5F8A3.9000902@freebsd.org> <52C61088.3080703@mu.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 03 Jan 2014 02:53:23 -0000

On 1/2/14, 6:38 PM, Gumpula, Suresh wrote:
> Hi Alfred,
> I agree that there could have been an extra/invalid crfree() which decremented the count, and a seemingly valid crhold() (acquire) from the socket code panicked in my case.
As per your suggestion, if we
> replace the assert with an if condition in release, we will end up panicking when the actual crfree() happens. But we may not know who called crfree() in the first, invalid place. Am I correct? I will try your suggestion.
>
> Can you please explain your array trick a bit more?

I think the simplest thing to do would be to replace crfree with a macro that passes __FILE__ and __LINE__ down into the actual crfree (or you can use __builtin_return_address to get the return address of the caller instead).

Then just either have an array of {file,line} tuples or instead just return addresses in the struct.

Just extend struct ucred and add this:

#define MAX_PREV 10
const char *files[MAX_PREV];
int lines[MAX_PREV];

Then hack your own version of refcount_release(), call it refcount_release2(), but have it take a pointer to an integer that it will write the value of the old refcount into.

Then you can use that return value like so:

if (old_refcount < MAX_PREV) {
	cred->files[old_refcount] = pointer_to_file_name;
	cred->lines[old_refcount] = line_number;
}

Then add the assert, or better yet, turn on INVARIANTS (or maybe try each option in turn, as INVARIANTS might hide the bug).

When you crash you can then see the last few callers who did free inside the struct.

Since the refcount "old_refcount" is atomically manipulated you should see the last few frees that send you negative.

-Alfred

>
> Thanks
> Suresh
>
>
>
> -----Original Message-----
> From: owner-freebsd-hackers@freebsd.org [mailto:owner-freebsd-hackers@freebsd.org] On Behalf Of Alfred Perlstein
> Sent: Thursday, January 02, 2014 8:21 PM
> To: Gumpula, Suresh; Julian Elischer; freebsd-hackers@freebsd.org
> Subject: Re: Reference count race window
>
>
> On 1/2/14, 3:53 PM, Gumpula, Suresh wrote:
>>>> Without changing the return-value semantics of refcount_acquire, we
>>>> have introduced a panic if we detected a race as below.
>>>> static __inline void
>>>> refcount_acquire(volatile u_int *count) {
>>>>	u_int old;
>>>>
>>>>	old = atomic_fetchadd_int(count, 1);
>>>>	if (old == 0) {
>>>>		panic("refcount_acquire race condition detected!\n");
>>>>	}
>>>>>>> so what is the stacktrace of the panic?
>> It's from the socket code calling crhold. It's a non-debug build (NO INVARIANTS)
>>
>> #4 0xffffffff80331d34 in panic (fmt=0xffffffff805c1e60
>> "refcount_acquire race condition detected!\n") at
>> ../../../../sys/kern/kern_shutdown.c:1009
>> #5 0xffffffff80326662 in refcount_acquire (count=) at
>> ../../../../sys/sys/refcount.h:65
>> #6 crhold (cr=) at
>> ../../../../sys/kern/kern_prot.c:1814
>> #7 0xffffffff803aa0d9 in socreate (dom=,
>> aso=0xffffff80345c1b00, type=, proto=0,
>> cred=0xffffff0017d7aa00, td=0xffffff000b294410) at
>> ../../../../sys/kern/uipc_socket.c:441
>> #8 0xffffffff803b2e5c in socket (td=0xffffff000b294410,
>> uap=0xffffff80345c1be0) at ../../../../sys/kern/uipc_syscalls.c:201
>> #9 0xffffffff80539ecb in syscall (frame=0xffffff80345c1c80) at
>> ../../../../sys/amd64/amd64/trap.c:1260
>>
> If it's a non-debug build then how do you know that someone isn't incorrectly lowering the refcount?
>
> Please try some invariants or at least manually turn on the one KASSERT I mentioned.
>
> Another trick would be to add an array of char*+int for the last few places that decremented; you can use the returned refcount as an index into that array to track who may be doing the extra frees.
> > -Alfred > > _______________________________________________ > freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers > To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org" > From owner-freebsd-hackers@FreeBSD.ORG Fri Jan 3 04:00:25 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4B9064A1 for ; Fri, 3 Jan 2014 04:00:25 +0000 (UTC) Received: from nm20.bullet.mail.bf1.yahoo.com (nm20.bullet.mail.bf1.yahoo.com [98.139.212.179]) by mx1.freebsd.org (Postfix) with SMTP id D23931286 for ; Fri, 3 Jan 2014 04:00:24 +0000 (UTC) Received: from [66.196.81.170] by nm20.bullet.mail.bf1.yahoo.com with NNFMP; 03 Jan 2014 04:00:18 -0000 Received: from [98.139.213.8] by tm16.bullet.mail.bf1.yahoo.com with NNFMP; 03 Jan 2014 04:00:18 -0000 Received: from [127.0.0.1] by smtp108.mail.bf1.yahoo.com with NNFMP; 03 Jan 2014 04:00:18 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1388721618; bh=bFriKQOv0BcElDs+L8odPU+1tifFeZclJ+da2Tp/NOM=; h=X-Yahoo-Newman-Id:X-Yahoo-Newman-Property:X-YMail-OSG:X-Yahoo-SMTP:X-Rocket-Received:Subject:From:Content-Type:X-Mailer:Message-Id:Date:To:Content-Transfer-Encoding:Mime-Version; b=0A3TUKMwiRqdxZzqlD3Y3yU2bnidbMUW8fTd5eOvj1WxP8aPZGXl+hawSEBWhWpuTco1wAK0wojMfmUw9ay9RcBAX+5+fg997TDfC0IC8bZyfkZMp0WVUgOULW8IiY9Bv22P9IldZe2qfq7PofvWqU+elQJFER3/KGGFCx5AE5E= X-Yahoo-Newman-Id: 701339.92781.bm@smtp108.mail.bf1.yahoo.com X-Yahoo-Newman-Property: ymail-3 X-YMail-OSG: vPFXwj8VM1nNu8xnsv8sAvDnOQBPNZb_7oWkpRoZxLLg8Lb ACnG56BTtU64KAq0RKtro67zoqc_kfYqy91JUnoUC1_w.GLeQv05SqiIxn7G Mxw6A6u.kaRLgqTwOkgxWl9I4L.No7z7nUBNjQwaHIvB77tWA_OTegtJFROA t3z1j4C1SJYX2TyAh38ZK9FrYzUjwMsxsnQh6VgewVxmJ.5NHeHGlw.TQK8t 2yoT_9DJImpK7aKbsdOXn6RB1RHYXywrEpdzZF4.YOovTpAuL5ByvKjIl9aY 
sRgDlg2ZUQUAHXfZ7yV5Ha6dkl.hMFZXl_eJoImkayiFVowMftVXK.N.g9KJ 3uBOA5siG2rZL8hzt0kpEtzYQuCQ6N6CG4S_Oh4Qdfs.gXpYAHhAFhsS7DJ8 vb7RZIpjTCtukE0kaoaoamWGEZIcdu5Ti80l4kzaGQyCiH.2VgrFsySLz0BT 0LTogth413PN21dbeaJfXENYqkM_hkQK.NVZtWtUCHa3ppV7YDmuZctOd0tZ 03NDlPuRmssMqwWtgToUle3oDXHGK1hLj X-Yahoo-SMTP: LAFNfTaswBDguI7meB90l2l3wOU- X-Rocket-Received: from [10.111.176.199] (free7by@117.136.24.75 with xymcookie [106.10.149.123]) by smtp108.mail.bf1.yahoo.com with SMTP; 02 Jan 2014 20:00:18 -0800 PST Subject: Strange keyboard mistake From: by Content-Type: text/plain; charset=us-ascii X-Mailer: iPhone Mail (11B554a) Message-Id: Date: Fri, 3 Jan 2014 12:00:07 +0800 To: "freebsd-hackers@freebsd.org" Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (1.0) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 03 Jan 2014 04:00:25 -0000

Hi,
I have run into a very strange problem.
I got another keyboard for my laptop, and everything went well for days. But today, for some reason, while in a csh session, I unplugged my new keyboard from my laptop's USB port, and a few minutes later, after I plugged it back in, the keyboard started misbehaving.
For example, when I typed 'b', it became a "smiley face", and other keys produced other strange symbols too!
What is strangest is that the laptop's built-in keyboard started behaving the same way!
I had no idea what to do, so I hit Ctrl+Alt+Delete to reboot; after the reboot, everything was back to normal.
Does anyone have any ideas about this strange behavior? And if I do not want to reboot, what should I do when I encounter this situation again?
By the way, I use FreeBSD 8.4-RELEASE and my new keyboard is a Logitech K310.
Thanks.
----by

From owner-freebsd-hackers@FreeBSD.ORG Thu Jan 2 22:11:23 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 07B30D2E for ; Thu, 2 Jan 2014 22:11:23 +0000 (UTC) Received: from mx12.netapp.com (mx12.netapp.com [216.240.18.77]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id DE7F318B8 for ; Thu, 2 Jan 2014 22:11:22 +0000 (UTC) X-IronPort-AV: E=Sophos;i="4.95,593,1384329600"; d="scan'208,217";a="134151895" Received: from vmwexceht03-prd.hq.netapp.com ([10.106.76.241]) by mx12-out.netapp.com with ESMTP; 02 Jan 2014 14:11:21 -0800 Received: from SACEXCMBX04-PRD.hq.netapp.com ([169.254.6.58]) by vmwexceht03-prd.hq.netapp.com ([10.106.76.241]) with mapi id 14.03.0123.003; Thu, 2 Jan 2014 14:11:21 -0800 From: "Gumpula, Suresh" To: "freebsd-hackers@freebsd.org" Subject: Reference count race window Thread-Topic: Reference count race window Thread-Index: Ac8IB4AfLgR9XBy+SMCYAizpAjCJRg== Date: Thu, 2 Jan 2014 22:11:20 +0000 Message-ID: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.106.53.51] MIME-Version: 1.0 X-Mailman-Approved-At: Fri, 03 Jan 2014 04:38:41 +0000 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.17 Cc: "Gumpula, Suresh" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Jan 2014 22:11:23 -0000

Hi,
I am Suresh from NetApp and I have questions related to the reference count usage in the BSD kernel.
We are seeing some corruption/use-after-free issues, and while debugging we found that the corruption pattern is a ucred/crgroups structure, so we started looking at the ucred reference count implementation.

This is my understanding of the refcount race window; please correct me if I am wrong.

It seems there is a timing window exposed by the FreeBSD reference count usage/implementation. Let's start with the definitions of the acquire and release routines in freebsd/sys/sys/refcount.h:

static __inline void
refcount_acquire(volatile u_int *count)
{

	atomic_add_acq_int(count, 1);
}

static __inline int
refcount_release(volatile u_int *count)
{
	u_int old;

	/* XXX: Should this have a rel membar? */
	old = atomic_fetchadd_int(count, -1);
	KASSERT(old > 0, ("negative refcount %p", count));
	return (old == 1);
}

As implemented, a call to refcount_acquire atomically increments the reference count, while refcount_release decrements the reference count and returns true if this release dropped the reference count to zero.

Consider the following sequence of events in the absence of other external synchronization:

* Object foo has a refcount of 1.
* Thread a on processor m calls refcount_release on foo.
* Very soon after (in CPU terms) thread b on processor n calls refcount_acquire on foo.
* atomic_fetchadd_int operating in thread a stalls the atomic_add_acq_int in thread b, decrementing foo's refcount to zero and setting old to 1. refcount_release returns true.
* atomic_add_acq_int in thread b increments the reference count to 1!
* Thread a, seeing refcount_release return success, frees foo.
* Thread b, believing it has a reference count on foo, continues to use it.
* The major hole here is that refcount_acquire is a void function. If it also returned status, calling software could determine that it had a valid reference and take appropriate action if it failed to acquire.
One such implementation might look like:

static __inline int
refcount_acquire(volatile u_int *count)
{
	u_int old;

	old = atomic_fetchadd_int(count, 1);
	return (old != 0);
}

This change would require modifying all calls to refcount_acquire and determining the appropriate action in the case of a non-success return.

Without changing the return-value semantics of refcount_acquire, we have introduced a panic if we detect a race, as below.

static __inline void
refcount_acquire(volatile u_int *count)
{
	u_int old;

	old = atomic_fetchadd_int(count, 1);
	if (old == 0) {
		panic("refcount_acquire race condition detected!\n");
	}
}

After this change, we have seen this panic on one of our systems. Could someone check my understanding and suggest some ways to narrow down this problem?

As I mentioned earlier, one option is to change refcount_acquire to be non-void and change all the callers, but it seems there are many paths that would have to change to handle the failure case.

Thank you
Suresh

From owner-freebsd-hackers@FreeBSD.ORG Thu Jan 2 23:07:50 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 77B079B8 for ; Thu, 2 Jan 2014 23:07:50 +0000 (UTC) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.freebsd.org (Postfix) with ESMTP id 3E09A1D71 for ; Thu, 2 Jan 2014 23:07:50 +0000 (UTC) Received: from critter.freebsd.dk (critter.freebsd.dk [192.168.61.3]) by phk.freebsd.dk (Postfix) with ESMTP id 520703EB30; Thu, 2 Jan 2014 23:07:49 +0000 (UTC) Received: from critter.freebsd.dk (localhost [127.0.0.1]) by critter.freebsd.dk (8.14.7/8.14.7) with ESMTP id s02N7mts012214; Thu, 2 Jan 2014 23:07:48 GMT (envelope-from phk@phk.freebsd.dk) To: "Gumpula, Suresh" Subject: Re: Reference count race window In-reply-to: From: "Poul-Henning Kamp" References: Content-Type:
text/plain; charset=ISO-8859-1 Date: Thu, 02 Jan 2014 23:07:48 +0000 Message-ID: <12213.1388704068@critter.freebsd.dk> X-Mailman-Approved-At: Fri, 03 Jan 2014 04:39:36 +0000 Cc: "freebsd-hackers@freebsd.org" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Jan 2014 23:07:50 -0000

In message , "Gumpula, Suresh" writes:

>One such implementation might look like:
>static __inline int
>refcount_acquire(volatile u_int *count)
>{
>	u_int old;
>
>	old = atomic_fetchadd_int(count, 1);
>	return (old != 0);
>}

This would still not be safe, as it would increment the count even if it failed, and thereby just move the race to the next thread to come past this counter.

I agree that refcount_acquire() needs to report failure (either as a return value or a panic) if the refcount was zero, but unless it panics it SHALL also leave the refcount intact in that case.

I don't think there is any way to implement failure-detecting refcounts correctly, except by using a compare-exchange style atomic, which is less efficient than the atomic add.

For that reason, it can be argued that the present design is faster and that users of the refcount API are required to use some other means to ensure that grabbing a reference is always safe.

However, in my experience that usually becomes even more inefficient.

So overall I would probably vote for the compare-exchange model with a return value for failure.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
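
For illustration, here is a minimal userland sketch of the compare-exchange model described in the message above. It uses C11 <stdatomic.h> rather than the kernel's atomic(9) primitives, and the name refcount_acquire_if_not_zero() is hypothetical (it is not an existing FreeBSD function). The only point it tries to show is that a failed acquire leaves the count untouched, so the race is not simply moved to the next caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical helper (assumption, not the FreeBSD API): take a
 * reference only if the count is currently non-zero.  On failure the
 * count is left at zero.
 */
static bool
refcount_acquire_if_not_zero(atomic_uint *count)
{
	unsigned int old = atomic_load_explicit(count, memory_order_relaxed);

	while (old != 0) {
		/* If the exchange fails, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak_explicit(count, &old, old + 1,
		    memory_order_acquire, memory_order_relaxed))
			return (true);
	}
	return (false);
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool
refcount_release(atomic_uint *count)
{
	unsigned int old;

	old = atomic_fetch_sub_explicit(count, 1, memory_order_acq_rel);
	assert(old > 0);	/* stands in for the kernel's KASSERT */
	return (old == 1);
}
```

A caller would treat a false return from refcount_acquire_if_not_zero() as "the object is already being torn down" and back out, which is the extra failure path every call site would need to grow. The loop is the cost mentioned above: under contention it may retry, where a plain atomic add never does.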
From owner-freebsd-hackers@FreeBSD.ORG Fri Jan 3 05:47:16 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8CDFEA18 for ; Fri, 3 Jan 2014 05:47:16 +0000 (UTC) Received: from smtp2.hushmail.com (smtp2a.hushmail.com [65.39.178.237]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 72D271975 for ; Fri, 3 Jan 2014 05:47:15 +0000 (UTC) Received: from smtp2.hushmail.com (smtp2a.hushmail.com [65.39.178.237]) by smtp2.hushmail.com (Postfix) with SMTP id DCEEBA0214 for ; Fri, 3 Jan 2014 05:17:07 +0000 (UTC) Received: from smtp.hushmail.com (w7.hushmail.com [65.39.178.32]) by smtp2.hushmail.com (Postfix) with ESMTP for ; Fri, 3 Jan 2014 05:17:07 +0000 (UTC) Received: by smtp.hushmail.com (Postfix, from userid 99) id C11B6200F5; Fri, 3 Jan 2014 05:17:07 +0000 (UTC) MIME-Version: 1.0 Date: Fri, 03 Jan 2014 00:17:07 -0500 To: freebsd-hackers@freebsd.org Subject: pthread basics and contention From: chump1@hushmail.com Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset="UTF-8" Message-Id: <20140103051707.C11B6200F5@smtp.hushmail.com> X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 03 Jan 2014 05:47:16 -0000 I have a fairly simple task that involves processing something in a 2D array, MxN times. I took a naive approach, 1x process 1x thread, and it took a little longer than desired. Well now, I could do better with some multi processing, especially on a multi core box, right? Well, I have not had much luck. 
At first I spawned M threads and had each iterate over each N in turn, with M between 25-35. It took much, much longer than the single thread. I figured contention and overhead were costing me big, and gave it a shot with a scaled down version of the problem, M=10. Still, much slower than the single thread. A little confused, I went back to the big problem set (25-35), and made a new program that spawned only two threads, and each is limited to processing only even or only odd data sets. Even that still takes twice as long as the single thread version! What is up with that?

More important asides, I am barely doing any real processing at all. It is basically a no-op, barely doing more than incrementing the counter. Should I expect to see performance gains once I am doing real work in the processing portion of my program? Should I expect to see much different behavior on a different OS? Also I have one physical processor, two cores. Would I see better gains with more cores? How do you find processes and threads scale against hardware overall?

Thanks!
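
One possible contributor to a slowdown like the one described above is false sharing: two threads incrementing counters that happen to sit in the same cache line. The original program is not shown, so the following is only a sketch under assumptions: a 64-byte cache line, and illustrative names (CACHE_LINE, NTHREADS, struct padded_counter, count_work()) that are not taken from the poster's code. It gives each thread's counter its own cache line.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE	64	/* assumed cache-line size in bytes */
#define NTHREADS	2	/* e.g. one "even" and one "odd" worker */

/* alignas() on the member pads and aligns the whole struct to one line. */
struct padded_counter {
	alignas(CACHE_LINE) unsigned long value;
};

/* One slot per thread; neighbouring slots never share a cache line. */
static struct padded_counter counters[NTHREADS];

/* Work loop for thread tid: touches only its own slot. */
static void
count_work(size_t tid, unsigned long iterations)
{
	for (unsigned long i = 0; i < iterations; i++)
		counters[tid].value++;
}
```

Each worker thread would call count_work() with its own index and the totals would be summed after the threads are joined. Whether this is what slows the posted program down cannot be told from the message; measuring with and without the padding is the way to check.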
Sent using Hushmail From owner-freebsd-hackers@FreeBSD.ORG Fri Jan 3 05:52:48 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 35010BDF for ; Fri, 3 Jan 2014 05:52:48 +0000 (UTC) Received: from mail-we0-x22d.google.com (mail-we0-x22d.google.com [IPv6:2a00:1450:400c:c03::22d]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id C30931A02 for ; Fri, 3 Jan 2014 05:52:47 +0000 (UTC) Received: by mail-we0-f173.google.com with SMTP id u57so13089000wes.18 for ; Thu, 02 Jan 2014 21:52:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=Iz7IULqqHivY11RfFDM4ueaKz78qDFOFOx9oCUQf4Bo=; b=gLahNKfVeBZWSPOFJ7OUcOTuOKsV8Qykf6kH9Mfzp7cV8bQwnLwXo0/ZTQ4wKcdZUk vBS6tAg6kUr9BdQ2jgNtFAdY54nPlcJV9Xjjr800w3YEncXGQnobOCfT518zFwc1nGL4 aJ6H6oJ9J6J1Zxv5QTdCTKzQ5HMnR4Ed4Y/JA4PoId2AlrqJRNsznJ7cE/xyn24Yje2I 9g0+s19hdSOigLSE2oANvZ8U7sE5OzdxhKIkbmiPuZFeXcidvdKT6RmXW60bbXwU0/D/ V7HphTdof+vx4uP4/O7hys4cof/7lYdC5FrMQP4ociz9qD7lLt68XXMhrYxlWlt7SNT1 fgAQ== MIME-Version: 1.0 X-Received: by 10.180.39.43 with SMTP id m11mr532191wik.8.1388728365233; Thu, 02 Jan 2014 21:52:45 -0800 (PST) Received: by 10.194.187.136 with HTTP; Thu, 2 Jan 2014 21:52:45 -0800 (PST) In-Reply-To: <20140103051707.C11B6200F5@smtp.hushmail.com> References: <20140103051707.C11B6200F5@smtp.hushmail.com> Date: Fri, 3 Jan 2014 00:52:45 -0500 Message-ID: Subject: Re: pthread basics and contention From: Rayson Ho To: chump1@hushmail.com Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Cc: "freebsd-hackers@freebsd.org" X-BeenThere: 
freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 03 Jan 2014 05:52:48 -0000

It depends on how you partition the work items. If the even & odd data end up sharing the same cacheline, then it can be slow...

You may want to google: cache ping pong effect

Rayson

==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html

On Fri, Jan 3, 2014 at 12:17 AM, wrote:
>
> I have a fairly simple task that involves processing something in a 2D array, MxN times. I took a naive approach, 1x process 1x thread, and it took a little longer than desired. Well now, I could do better with some multi processing, especially on a multi core box, right?
>
> Well, I have not had much luck. At first I spawned M threads and had each iterate over each N in turn, with M between 25-35. It took much, much longer than the single thread. I figured contention and overhead were costing me big, and gave it a shot with a scaled down version of the problem, M=10. Still, much slower than the single thread. A little confused, I went back to the big problem set (25-35), and made a new program that spawned only two threads, and each is limited to processing only even or only odd data sets. Even that still takes twice as long as the single thread version! What is up with that?
>
> More important asides, I am barely doing any real processing at all. It is basically a no-op, barely doing more than incrementing the counter. Should I expect to see performance gains once I am doing real work in the processing portion of my program?
Should I expect to see much different behavior on a different OS? Also I have one physical processor, two cores. Would I see better gains with more cores? How do you find processes and threads scale against hardware overall?
>
> Thanks!
>
> Sent using Hushmail
>
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"

From owner-freebsd-hackers@FreeBSD.ORG Fri Jan 3 17:40:20 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 8470F1A5; Fri, 3 Jan 2014 17:40:20 +0000 (UTC) Received: from mx11.netapp.com (mx11.netapp.com [216.240.18.76]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id 5F04611A8; Fri, 3 Jan 2014 17:40:19 +0000 (UTC) X-IronPort-AV: E=Sophos;i="4.95,598,1384329600"; d="scan'208";a="93491130" Received: from vmwexceht06-prd.hq.netapp.com ([10.106.77.104]) by mx11-out.netapp.com with ESMTP; 03 Jan 2014 09:40:19 -0800 Received: from SACEXCMBX04-PRD.hq.netapp.com ([169.254.6.58]) by vmwexceht06-prd.hq.netapp.com ([10.106.77.104]) with mapi id 14.03.0123.003; Fri, 3 Jan 2014 09:40:19 -0800 From: "Gumpula, Suresh" To: Alfred Perlstein , Julian Elischer , "freebsd-hackers@freebsd.org" Subject: RE: Reference count race window Thread-Topic: Reference count race window Thread-Index: AQHPCA0ePlCdkUTb2Eq8swVe1xATjJpynbOA//98gUCAAJ/7AP//hPzAgACUt4CAAGawEA== Date: Fri, 3 Jan 2014 17:40:18 +0000 Message-ID: References: <52C5ED3E.4020805@mu.org> <52C5F8A3.9000902@freebsd.org> <52C61088.3080703@mu.org> <52C62617.7020304@mu.org> In-Reply-To: <52C62617.7020304@mu.org> Accept-Language: en-US Content-Language:
en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.106.53.53] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 03 Jan 2014 17:40:20 -0000

Thanks a lot for the suggestions/comments, Alfred/Julian/Poul-Henning. I will instrument as per Alfred's suggestion first and will see whether misuse of crfree is the root cause of the corruption rather than a race window.

By the way, what are all the ways we have in FreeBSD to debug memory corruption? I am aware of options DEBUG_REDZONE (overflow detection), DEBUG_MEMGUARD (for a malloc type), MALLOC_DEBUG_MAXZONES, and some INVARIANTS code in kern_malloc.c which caches the malloc_type that most recently freed the memory. Are there any more corruption debugging methods we already have?

Thanks
Suresh

-----Original Message-----
From: Alfred Perlstein [mailto:bright@mu.org]
Sent: Thursday, January 02, 2014 9:53 PM
To: Gumpula, Suresh; Julian Elischer; freebsd-hackers@freebsd.org
Subject: Re: Reference count race window

On 1/2/14, 6:38 PM, Gumpula, Suresh wrote:
> Hi Alfred,
> I agree that there could have been an extra/invalid crfree() which decremented the count, and a seemingly valid crhold() (acquire) from the socket code panicked in my case. As per your suggestion, if we
> replace the assert with an if condition in release, we will end up panicking when the actual crfree() happens. But we may not know who called crfree() in the first, invalid place. Am I correct? I will try your suggestion.
>
> Can you please explain your array trick a bit more?
I think the simplest thing to do would be to replace crfree with a macro that passes __FILE__ and __LINE__ down into the actual crfree (or you can use __builtin_return_address to get the return address of the caller instead).

Then just either have an array of {file,line} tuples or instead just return addresses in the struct.

Just extend struct ucred and add this:

#define MAX_PREV 10
const char *files[MAX_PREV];
int lines[MAX_PREV];

Then hack your own version of refcount_release(), call it refcount_release2(), but have it take a pointer to an integer that it will write the value of the old refcount into.

Then you can use that return value like so:

if (old_refcount < MAX_PREV) {
	cred->files[old_refcount] = pointer_to_file_name;
	cred->lines[old_refcount] = line_number;
}

Then add the assert, or better yet, turn on INVARIANTS (or maybe try each option in turn, as INVARIANTS might hide the bug).

When you crash you can then see the last few callers who did free inside the struct.

Since the refcount "old_refcount" is atomically manipulated you should see the last few frees that send you negative.

-Alfred

>
> Thanks
> Suresh
>
>
>
> -----Original Message-----
> From: owner-freebsd-hackers@freebsd.org
> [mailto:owner-freebsd-hackers@freebsd.org] On Behalf Of Alfred
> Perlstein
> Sent: Thursday, January 02, 2014 8:21 PM
> To: Gumpula, Suresh; Julian Elischer; freebsd-hackers@freebsd.org
> Subject: Re: Reference count race window
>
>
> On 1/2/14, 3:53 PM, Gumpula, Suresh wrote:
>>>> Without changing the return-value semantics of refcount_acquire, we
>>>> have introduced a panic if we detected a race as below.
>>>> static __inline void
>>>> refcount_acquire(volatile u_int *count) {
>>>>	u_int old;
>>>>
>>>>	old = atomic_fetchadd_int(count, 1);
>>>>	if (old == 0) {
>>>>		panic("refcount_acquire race condition detected!\n");
>>>>	}
>>>>>>> so what is the stacktrace of the panic?
>> It's from the socket code calling crhold.
It's a non-debug build (NO INVARIANTS)
>>
>> #4 0xffffffff80331d34 in panic (fmt=0xffffffff805c1e60
>> "refcount_acquire race condition detected!\n") at
>> ../../../../sys/kern/kern_shutdown.c:1009
>> #5 0xffffffff80326662 in refcount_acquire (count=) at
>> ../../../../sys/sys/refcount.h:65
>> #6 crhold (cr=) at
>> ../../../../sys/kern/kern_prot.c:1814
>> #7 0xffffffff803aa0d9 in socreate (dom=,
>> aso=0xffffff80345c1b00, type=, proto=0,
>> cred=0xffffff0017d7aa00, td=0xffffff000b294410) at
>> ../../../../sys/kern/uipc_socket.c:441
>> #8 0xffffffff803b2e5c in socket (td=0xffffff000b294410,
>> uap=0xffffff80345c1be0) at ../../../../sys/kern/uipc_syscalls.c:201
>> #9 0xffffffff80539ecb in syscall (frame=0xffffff80345c1c80) at
>> ../../../../sys/amd64/amd64/trap.c:1260
>>
> If it's a non-debug build then how do you know that someone isn't incorrectly lowering the refcount?
>
> Please try some invariants or at least manually turn on the one KASSERT I mentioned.
>
> Another trick would be to add an array of char*+int for the last few places that decremented; you can use the returned refcount as an index into that array to track who may be doing the extra frees.
>
> -Alfred

From owner-freebsd-hackers@FreeBSD.ORG Fri Jan 3 21:01:49 2014
Date: Sat, 4 Jan 2014 00:51:59 +0400
From: Oleg Bulyzhin
To: freebsd-hackers@freebsd.org
Subject: atomic_load_acq @ i386/amd64
Message-ID: <20140103205159.GA99722@lath.rinet.ru>

Hello.

I've got a question: why is atomic_load_acq_* implemented on i386/amd64 archs
with a locked cmpxchg instruction? The comment about this
(in /sys/(amd64|i386)/include/atomic.h) looks wrong to me. I believe
acquire/release semantics do not require a StoreLoad barrier, so a simple
aligned load should be enough (because acquire/release semantics do not
guarantee sequential consistency).

-- 
Oleg.
================================================================
=== Oleg Bulyzhin -- OBUL-RIPN -- OBUL-RIPE -- oleg@rinet.ru ===
================================================================

From owner-freebsd-hackers@FreeBSD.ORG Sat Jan 4 06:41:22 2014
Received: from [192.168.2.136] (99-74-169-43.lightspeed.sntcca.sbcglobal.net.
 [99.74.169.43]) by mx.google.com with ESMTPSA id de1sm113285379pbc.7.2014.01.03.22.41.18 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Fri, 03 Jan 2014 22:41:19 -0800 (PST)
Subject: Re: pthread basics and contention
From: Tim Kientzle
Date: Fri, 3 Jan 2014 22:41:15 -0800
Message-Id: <4EFEA29F-4D6E-4B4A-8C26-E15FA62B574C@kientzle.com>
References: <20140103051707.C11B6200F5@smtp.hushmail.com>
To: chump1@hushmail.com
Cc: "freebsd-hackers@freebsd.org"

Depending on the calculation involved, you may be memory bus
constrained, not CPU constrained.

On modern processors, it is often the case that it takes
longer to get the data to/from memory than to actually
compute anything. In those cases, splitting your work into
threads just gives you more CPUs waiting on the same slow
memory.

Deciding whether this is the issue or not requires good
processor-level profiling tools.

Tim

On Fri, Jan 3, 2014 at 12:17 AM, wrote:
>
> I have a fairly simple task that involves processing something in a 2D
> array, MxN times. I took a naive approach, 1x process 1x thread, and it
> took a little longer than desired. Well now, I could do better with some
> multi processing, especially on a multi core box, right?
>
> Well, I have not had much luck. At first I spawned M threads and had
> each iterate over each N in turn, with M between 25-35. It took much,
> much longer than the single thread. I figured contention and overhead
> were costing me big, and gave it a shot with a scaled down version of
> the problem, M=10.
> Still, much slower than the single thread. A little
> confused, I went back to the big problem set (25-35), and made a new
> program that spawned only two threads, and each is limited to processing
> only even or only odd data sets. Even that still takes twice as long as
> the single thread version! What is up with that?
>
> More important asides, I am barely doing any real processing at all.
> It is basically a no-op, barely doing more than incrementing the
> counter. Should I expect to see performance gains once I am doing real
> work in the processing portion of my program? Should I expect to see
> much different behavior on a different OS? Also I have one physical
> processor, two cores. Would I see better gains with more cores? How do
> you find processes and threads scale against hardware overall?
>
> Thanks!
>
> Sent using Hushmail

From owner-freebsd-hackers@FreeBSD.ORG Sat Jan 4 17:29:33 2014
Date: Sat, 4 Jan 2014
 19:29:23 +0200
From: Konstantin Belousov
To: Oleg Bulyzhin
Subject: Re: atomic_load_acq @ i386/amd64
Message-ID: <20140104172923.GY59496@kib.kiev.ua>
References: <20140103205159.GA99722@lath.rinet.ru>
In-Reply-To: <20140103205159.GA99722@lath.rinet.ru>
Cc: freebsd-hackers@freebsd.org

On Sat, Jan 04, 2014 at 12:51:59AM +0400, Oleg Bulyzhin wrote:
>
> Hello.
>
> I've got a question: why is atomic_load_acq_* implemented on i386/amd64 archs
> with a locked cmpxchg instruction? The comment about this
> (in /sys/(amd64|i386)/include/atomic.h) looks wrong to me. I believe
> acquire/release semantics do not require a StoreLoad barrier, so a simple
> aligned load should be enough (because acquire/release semantics do not
> guarantee sequential consistency).

You did not explicitly write which statement in the comment is false, in
your opinion.

FreeBSD assumes a property of _acq/_rel stuff which is sometimes called
'total lock ordering'. It is indeed a sort of sequential consistency, but
only for atomic+membar ops. If atomic_load_acq() were implemented as a plain
load, it could pass stores, in particular stores from the _rel op, which
would break the guarantee.

For x86, there are indeed two possible schemes for implementing a critical
section: one is lock cmpxchg for get() and a plain store for release(),
which is what we use. The other is a plain load for get() and xchg for
release(). The load_acq() must then be adapted so as not to break the acq/rel
consistency, and since we use a plain store for release(), load_acq must
use a serializing instruction.

From owner-freebsd-hackers@FreeBSD.ORG Sat Jan 4 23:29:17 2014
Date: Sun, 5 Jan 2014 03:29:10 +0400
From: Oleg Bulyzhin
To:
 Konstantin Belousov
Subject: Re: atomic_load_acq @ i386/amd64
Message-ID: <20140104232910.GA12331@lath.rinet.ru>
References: <20140103205159.GA99722@lath.rinet.ru> <20140104172923.GY59496@kib.kiev.ua>
In-Reply-To: <20140104172923.GY59496@kib.kiev.ua>
Cc: freebsd-hackers@freebsd.org, Oleg Bulyzhin

On Sat, Jan 04, 2014 at 07:29:23PM +0200, Konstantin Belousov wrote:
> On Sat, Jan 04, 2014 at 12:51:59AM +0400, Oleg Bulyzhin wrote:
> >
> > Hello.
> >
> > I've got a question: why is atomic_load_acq_* implemented on i386/amd64 archs
> > with a locked cmpxchg instruction? The comment about this
> > (in /sys/(amd64|i386)/include/atomic.h) looks wrong to me. I believe
> > acquire/release semantics do not require a StoreLoad barrier, so a simple
> > aligned load should be enough (because acquire/release semantics do not
> > guarantee sequential consistency).
>
> You did not explicitly write which statement in the comment is false, in
> your opinion.
>
> FreeBSD assumes a property of _acq/_rel stuff which is sometimes called
> 'total lock ordering'. It is indeed a sort of sequential consistency, but
> only for atomic+membar ops. If atomic_load_acq() were implemented as a plain
> load, it could pass stores, in particular stores from the _rel op, which
> would break the guarantee.
>
> For x86, there are indeed two possible schemes for implementing a critical
> section: one is lock cmpxchg for get() and a plain store for release(),
> which is what we use. The other is a plain load for get() and xchg for
> release().
> The load_acq() must then be adapted so as not to break the acq/rel
> consistency, and since we use a plain store for release(), load_acq must
> use a serializing instruction.

Perhaps I was not clear enough; I'm talking about this one:
"However, loads may pass stores, so for atomic_load_acq we have to
ensure a Store/Load barrier to do the load in SMP kernels."

As far as I know, acquire/release semantics guarantee the following:
if we have code before _acq, code between _acq and _rel, and code after
_rel, the following statements are true:
1) code between _acq and _rel cannot leave (due to reordering) the acq/rel block
2) code before _acq may leak past _acq
3) code after _rel may leak before _rel

So neither _acq nor _rel requires a full membar. I.e.
op_acq is: the op, followed by a barrier (up reordering is prohibited)
op_rel is: a barrier (down reordering is prohibited), followed by the op

The Intel documentation says that only one thing (for simple loads/stores)
can be reordered:
"Reads may be reordered with older writes to different locations but not
with older writes to the same location."
So, if an older store can pass our load_acq(), it would not break the
requirements.
And I do not understand how the load op from load_acq() can pass the store
op from store_rel(); the Intel doc says: "Writes are not reordered with
older reads".

Well, while writing this email I realized what is disturbing me: it's the
atomic(9) "Multiple Processors" section. It claims atomics are not atomic in
the common MP case and says atomics are atomic on i386. That looks strange
to me:
1) I guess it's not "atomic" even for i386/MP without proper membar pairing.
2) If we have acq/rel modifiers for atomics, why can't we guarantee
   "atomicity" for any MP arch?

P.S. Please correct me if I'm wrong in my statements; I'm spending my New
     Year holidays on ignorance elimination. ;)

-- 
Oleg.

================================================================
=== Oleg Bulyzhin -- OBUL-RIPN -- OBUL-RIPE -- oleg@rinet.ru ===
================================================================
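
To make the two x86 schemes from this thread concrete, the following is a
small userland sketch in C11 atomics. It is not the kernel's atomic.h: the
function names are hypothetical, and the mapping to instructions is an
assumption about what GCC and Clang typically emit on x86 (lock cmpxchg for
the compare-exchange, a plain mov for the acquire load and the release
store, xchg for the exchange).

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Scheme 1 (described in the thread as the one in use): acquire load done
 * with a locked compare-and-swap, release done with a plain store.
 */
static unsigned
load_acq_locked(atomic_uint *p)
{
	unsigned expected = 0;

	/*
	 * If *p is 0 this writes 0 over 0; otherwise the exchange fails and
	 * copies *p into expected. Either way memory is unchanged and
	 * expected holds the current value.
	 */
	(void)atomic_compare_exchange_strong(p, &expected, 0);
	return (expected);
}

static void
store_rel_plain(atomic_uint *p, unsigned v)
{
	atomic_store_explicit(p, v, memory_order_release);
}

/* Scheme 2: plain acquire load, release done with an atomic exchange. */
static unsigned
load_acq_plain(atomic_uint *p)
{
	return (atomic_load_explicit(p, memory_order_acquire));
}

static void
store_rel_xchg(atomic_uint *p, unsigned v)
{
	(void)atomic_exchange(p, v);
}

/* Single-threaded sanity check: each scheme reads back what it stored. */
static void
schemes_selftest(void)
{
	atomic_uint v;

	atomic_init(&v, 0);
	assert(load_acq_locked(&v) == 0);
	store_rel_plain(&v, 7);
	assert(load_acq_locked(&v) == 7);
	store_rel_xchg(&v, 9);
	assert(load_acq_plain(&v) == 9);
}
```

In both schemes exactly one side of the acquire/release pair is a locked
read-modify-write, which is the point made in the reply quoted above. The
self-test only checks values on one thread; it says nothing about ordering
between CPUs, which would need a multi-threaded litmus test.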