Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 02 Jan 2014 14:50:38 -0800
From:      Alfred Perlstein <bright@mu.org>
To:        "Gumpula, Suresh" <Suresh.Gumpula@netapp.com>,  "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject:   Re: Reference count race window
Message-ID:  <52C5ED3E.4020805@mu.org>
In-Reply-To: <D29CB80EBA4DEA4D91181928AAF51538438C0D8B@SACEXCMBX04-PRD.hq.netapp.com>
References:  <D29CB80EBA4DEA4D91181928AAF51538438C0D8B@SACEXCMBX04-PRD.hq.netapp.com>

next in thread | previous in thread | raw e-mail | index | archive | help

On 1/2/14, 2:17 PM, Gumpula, Suresh wrote:
> Hi,
>    I am Suresh from NetAPP and I have  questions/queries related to the reference count usage in the BSD kernel. We are seeing some corruptions/use after free
>    issues and while  debugging we found that the corruption pattern is a ucred/crgroups structure and started looking at ucred reference count implemenation.
>
> This is my understanding of ref count race window,  please correct me if I am wrong.
>
>
> It seems there is a timing window exposed by the FreeBSD reference count usage/implementation. Let's start with the definitions of the acquire and release routines
> in freebsd/sys/sys/refcount.h
>
> static __inline void
> refcount_acquire(volatile u_int *count)
> {
>
>          atomic_add_acq_int(count, 1);
> }
>
> static __inline int
> refcount_release(volatile u_int *count)
> {
>          u_int old;
>
>          /* XXX: Should this have a rel membar? */
>          old = atomic_fetchadd_int(count, -1);
>          KASSERT(old > 0, ("negative refcount %p", count));
>          return (old == 1);
> }
>
> As implemented, a call to refcount_acquire  atomically increments the reference count while refcount_release decrements
> the reference count and returns true if this release dropped the reference count to zero.
>
> Consider the following sequence of events in the absence of other external synchronization:
>
> * Object foo has a refcount of 1
> * Thread a on processor m calls refcount_release on foo.
> * Very soon after (in CPU terms) thread b on processor n calls refcount_acquire on foo.
> * atomic_fetchadd_int operating in thread a stalls the atomic_add_acq_int in thread b,
>    decrementing foo's refcount to zero and setting old to 1. refcount_release returns true.
> * atomic_add_acq_int in thread b increments the reference count to 1!
> * thread a, seeing refcount_release return success, frees foo.
> * thread b, believing it has a reference count on foo, continues to use it.
>
> * The major hole here is that refcount_acquire is a void function. If it also returned status,
>     calling software could determine that it had a valid reference and take appropriate action if it failed to acquire.
>
>
> One such implementation might look like:
> static __inline int
> refcount_acquire(volatile u_int *count)
> {
>          u_int old;
>
>          old = atomic_fetchadd_int(count, 1);
>          return (old != 0);
> }
>
> This change would require modification of all calls to refcount_acquire and determining appropriate action in the case of a non-success return.
>
>
> Without changing the return-value semantics of refcount_acquire, we have introduced a panic if we detected a race as below.
> static __inline void
> refcount_acquire(volatile u_int *count)
> {
>          u_int old;
>
>          old = atomic_fetchadd_int(count, 1);
>          if (old == 0) {
>            panic("refcount_acquire race condition detected!\n");
>          }
> }
>
> After this change , we have seen this panic in one of our systems. Could someone look at my understanding and give me some ways to narrow down this problem.
> As I mentioned earlier, one option is to change refcount_acquire to be non void and change all the callers, but it seems there are many paths to be changed on failure case.
>
>
> Thank you
> Suresh
>
> _________________________
Hey Suresh,

In theory this shouldn't happen due to pointer/thread ownership of the 
resource.

This means that usually a cred is copied via refcount to another object 
and by the time the refcount hits 1 then only that one object should be 
pointing to it.

That means that if someone is raising the refcount at the same time then 
they are looking into an object that is in the process of being destroyed!

Going back to your example:

* Object foo has a refcount of 1
* Thread a on processor m calls refcount_release on foo.
* Very soon after (in CPU terms) thread b on processor n calls refcount_acquire on foo.

    ^--- this should not be happening as "foo" should no longer be accessible to other subsystems.
   imagine this would be like some other CPU calling rfork() on a process that is in the middle of exit().  This should *not* happen.

* atomic_fetchadd_int operating in thread a stalls the atomic_add_acq_int in thread b,
   decrementing foo's refcount to zero and setting old to 1. refcount_release returns true.
* atomic_add_acq_int in thread b increments the reference count to 1!
* thread a, seeing refcount_release return success, frees foo.
* thread b, believing it has a reference count on foo, continues to use it.


While it's possible that there *may* be a bug here, I think it would 
make sense for you to add more instrumentation to your code.

Are you testing with INVARIANTS enabled?  otherwise refcount_release 
should be panic'ing due to the KASSERT!

static __inline int
refcount_release(volatile u_int *count)
{
         u_int old;

         /* XXX: Should this have a rel membar? */
         old = atomic_fetchadd_int(count, -1);
         KASSERT(old > 0, ("negative refcount %p", count));
         return (old == 1);
}

Perhaps you should either enable INVARIANTS... or you can turn that one 
single KASSERT into an unconditional test like so:


static __inline int
refcount_release(volatile u_int *count)
{
         u_int old;

         /* XXX: Should this have a rel membar? */
         old = atomic_fetchadd_int(count, -1);
         if (old < 0) panic("negative refcount %p", count);
         return (old == 1);
}

That ought to help you catch the bug.

-Alfred





Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?52C5ED3E.4020805>