From owner-freebsd-hackers@FreeBSD.ORG Thu Jan 2 22:11:23 2014 Return-Path: Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 07B30D2E for ; Thu, 2 Jan 2014 22:11:23 +0000 (UTC) Received: from mx12.netapp.com (mx12.netapp.com [216.240.18.77]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client certificate requested) by mx1.freebsd.org (Postfix) with ESMTPS id DE7F318B8 for ; Thu, 2 Jan 2014 22:11:22 +0000 (UTC) X-IronPort-AV: E=Sophos;i="4.95,593,1384329600"; d="scan'208,217";a="134151895" Received: from vmwexceht03-prd.hq.netapp.com ([10.106.76.241]) by mx12-out.netapp.com with ESMTP; 02 Jan 2014 14:11:21 -0800 Received: from SACEXCMBX04-PRD.hq.netapp.com ([169.254.6.58]) by vmwexceht03-prd.hq.netapp.com ([10.106.76.241]) with mapi id 14.03.0123.003; Thu, 2 Jan 2014 14:11:21 -0800 From: "Gumpula, Suresh" To: "freebsd-hackers@freebsd.org" Subject: Reference count race window Thread-Topic: Reference count race window Thread-Index: Ac8IB4AfLgR9XBy+SMCYAizpAjCJRg== Date: Thu, 2 Jan 2014 22:11:20 +0000 Message-ID: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.106.53.51] MIME-Version: 1.0 X-Mailman-Approved-At: Fri, 03 Jan 2014 04:38:41 +0000 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.17 Cc: "Gumpula, Suresh" X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.17 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 02 Jan 2014 22:11:23 -0000 Hi, I am Suresh from NetAPP and I have questions/queries related to the refe= rence count usage in the BSD kernel. We are seeing some corruptions/use aft= er free issues and while debugging we found that the corruption pattern is a ucr= ed/crgroups structure and started looking at ucred reference count implemen= ation. This is my understanding of ref count race window, please correct me if I = am wrong. It seems there is a timing window exposed by the FreeBSD reference count us= age/implementation. Let's start with the definitions of the acquire and rel= ease routines in freebsd/sys/sys/refcount.h static __inline void refcount_acquire(volatile u_int *count) { atomic_add_acq_int(count, 1); } static __inline int refcount_release(volatile u_int *count) { u_int old; /* XXX: Should this have a rel membar? */ old =3D atomic_fetchadd_int(count, -1); KASSERT(old > 0, ("negative refcount %p", count)); return (old =3D=3D 1); } As implemented, a call to refcount_acquire atomically increments the refer= ence count while refcount_release decrements the reference count and returns true if this release dropped the reference = count to zero. Consider the following sequence of events in the absence of other external = synchronization: * Object foo has a refcount of 1 * Thread a on processor m calls refcount_release on foo. * Very soon after (in CPU terms) thread b on processor n calls refcount_acq= uire on foo. * atomic_fetchadd_int operating in thread a stalls the atomic_add_acq_int i= n thread b, decrementing foo's refcount to zero and setting old to 1. refcount_releas= e returns true. * atomic_add_acq_int in thread b increments the reference count to 1! * thread a, seeing refcount_release return success, frees foo. * thread b, believing it has a reference count on foo, continues to use it. * The major hole here is that refcount_acquire is a void function. If it al= so returned status, calling software could determine that it had a valid reference and take = appropriate action if it failed to acquire. One such implementation might look like: static __inline int refcount_acquire(volatile u_int *count) { u_int old; old =3D atomic_fetchadd_int(count, 1); return (old !=3D 0); } This change would require modification of all calls to refcount_acquire and= determining appropriate action in the case of a non-success return. Without changing the return-value semantics of refcount_acquire, we have in= troduced a panic if we detected a race as below. static __inline void refcount_acquire(volatile u_int *count) { u_int old; old =3D atomic_fetchadd_int(count, 1); if (old =3D=3D 0) { panic("refcount_acquire race condition detected!\n"); } } After this change , we have seen this panic in one of our systems. Could so= meone look at my understanding and give me some ways to narrow down this pr= oblem. As I mentioned earlier, one option is to change refcount_acquire to be non = void and change all the callers, but it seems there are many paths to be ch= anged on failure case. Thank you Suresh