Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 02 Jan 2014 18:53:11 -0800
From:      Alfred Perlstein <bright@mu.org>
To:        "Gumpula, Suresh" <Suresh.Gumpula@netapp.com>,  Julian Elischer <julian@freebsd.org>, "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject:   Re: Reference count race window
Message-ID:  <52C62617.7020304@mu.org>
In-Reply-To: <D29CB80EBA4DEA4D91181928AAF51538438C0F09@SACEXCMBX04-PRD.hq.netapp.com>
References:  <D29CB80EBA4DEA4D91181928AAF51538438C0D8B@SACEXCMBX04-PRD.hq.netapp.com> <52C5ED3E.4020805@mu.org> <52C5F8A3.9000902@freebsd.org> <D29CB80EBA4DEA4D91181928AAF51538438C0DF8@SACEXCMBX04-PRD.hq.netapp.com> <52C61088.3080703@mu.org> <D29CB80EBA4DEA4D91181928AAF51538438C0F09@SACEXCMBX04-PRD.hq.netapp.com>

next in thread | previous in thread | raw e-mail | index | archive | help

On 1/2/14, 6:38 PM, Gumpula, Suresh wrote:
> Hi Alfred,
>        I agree that there could have been an extra/invalid  crfree() which decremented the count  and  looks valid  crhold(acquire) from socket code  panic'ed in my case. As per your suggestion if we
>   replace  the assert with if condition in release,  we will end up panicing  when the actual  crfree() happens.  But we may not be knowing who crfree'ed() in the first and invalid place.  Am I correct?   I will try your suggestion.
>
> Can you please bit more explain your array trick ?

I think the simplest thing to do would be to replace crfree with a macro 
to pass __FILE__ and __LINE__ down into the actual crfree (or you can 
use builtin_return_address to get the stack address of the caller instead.

Then just either have an array of {file,line} tuples or instead just 
return addresses in the struct.

Just extend struct ucred and add this:

#define MAX_PREV 10
const char *files[MAX_PREV];
int lines[MAX_PREV];

Then hack your own version of refcount_release(), call it 
refcount_release2() but have it take a pointer to an integer that it 
will write the value of the old refcount into.

Then you can use that return value like so:

if (old_refcount < MAX_PREV) {
   cred->files[old_refcount] = pointer_to_file_name;
   cred->lines[old_refcount] = line_number;
}

Then add the assert.  or better yet, turn on INVARIANTS (or maybe try 
each option in turn as INVARIANTS might hide the bug).

When you crash you can then see the last few callers who did free inside 
the struct.

Since the refcount "old_refcount" is atomically manipulated you should 
see the last few frees that send you negative.

-Alfred

>
> Thanks
> Suresh
>
>
>
> -----Original Message-----
> From: owner-freebsd-hackers@freebsd.org [mailto:owner-freebsd-hackers@freebsd.org] On Behalf Of Alfred Perlstein
> Sent: Thursday, January 02, 2014 8:21 PM
> To: Gumpula, Suresh; Julian Elischer; freebsd-hackers@freebsd.org
> Subject: Re: Reference count race window
>
>
> On 1/2/14, 3:53 PM, Gumpula, Suresh wrote:
>>>> Without changing the return-value semantics of refcount_acquire, we
>>>> have introduced a panic if we detected a race as below.
>>>> static __inline void
>>>> refcount_acquire(volatile u_int *count) {
>>>>            u_int old;
>>>>
>>>>            old = atomic_fetchadd_int(count, 1);
>>>>            if (old == 0) {
>>>>              panic("refcount_acquire race condition detected!\n");
>>>>            }
>>>>>>> so what is the stacktrace of the panic?
>> It's from the socket code calling crhold.   It's a non debug build( NO INVARIANTS )
>>
>> #4  0xffffffff80331d34 in panic (fmt=0xffffffff805c1e60
>> "refcount_acquire race condition detected!\n") at
>> ../../../../sys/kern/kern_shutdown.c:1009
>> #5  0xffffffff80326662 in refcount_acquire (count=<optimized out>) at
>> ../../../../sys/sys/refcount.h:65
>> #6  crhold (cr=<optimized out>) at
>> ../../../../sys/kern/kern_prot.c:1814
>> #7  0xffffffff803aa0d9 in socreate (dom=<optimized out>,
>> aso=0xffffff80345c1b00, type=<optimized out>, proto=0,
>> cred=0xffffff0017d7aa00, td=0xffffff000b294410) at
>> ../../../../sys/kern/uipc_socket.c:441
>> #8  0xffffffff803b2e5c in socket (td=0xffffff000b294410,
>> uap=0xffffff80345c1be0) at ../../../../sys/kern/uipc_syscalls.c:201
>> #9  0xffffffff80539ecb in syscall (frame=0xffffff80345c1c80) at
>> ../../../../sys/amd64/amd64/trap.c:1260
>>
> If it's a non-debug build then how do you know that someone isn't incorrectly lowering the refcount?
>
> Please try some invariants or at least manually turn on the one KASSERT I mentioned.
>
> Another trick would be to add a an array of char*+int for the last few places that decremented, you can use the returned refcount as an index to that array to track who may be doing the extra frees.
>
> -Alfred
>
> _______________________________________________
> freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"
>




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?52C62617.7020304>