Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 2 Jan 2014 09:46:03 -0800
From:      Adrian Chadd <adrian@freebsd.org>
To:        Eric van Gyzen <eric@vangyzen.net>
Cc:        FreeBSD Net <freebsd-net@freebsd.org>
Subject:   Re: ipv6 lock contention with parallel socket io
Message-ID:  <CAJ-Vmom-qghnACRdU9xX5wg9jZLYqvUQe=aD2GkheMNtseekzg@mail.gmail.com>
In-Reply-To: <52C5A020.9010404@vangyzen.net>
References:  <CAJ-VmoknC7TszvhonHgFq6mJNpgUmvXAjAH3uK9TMU0uMzcocw@mail.gmail.com> <CAJ-VmonNDMZpCrojXRP92cUxzMKox_QX_5tBXjGM6kjei=oqjw@mail.gmail.com> <52C5A020.9010404@vangyzen.net>

next in thread | previous in thread | raw e-mail | index | archive | help
On 2 January 2014 09:21, Eric van Gyzen <eric@vangyzen.net> wrote:
> On 12/31/2013 11:53, Adrian Chadd wrote:
>> On 30 December 2013 23:35, Adrian Chadd <adrian@freebsd.org> wrote:
>>> Hi,
>>>
>>> I've noticed a bunch of lock contention occurs when doing highly
>>> parallel (> 4096 sockets) TCP IPv6 traffic.
>>>
>>> The contention is here:
>>>
>> [snip]
>>
>>> .. it's the IF_ADATA lock surrounding the lla_lookup() calls.
>>>
>>> Now:
>>>
>>> * is there any reason this isn't an rmlock?
>>> * the instance early on in nd6_output_lle() isn't taking the read
>>> lock, it's taking the full lock. Is there any reason for this?
>>>
>>> I don't have much experience or time to spend on optimising the ipv6
>>> code to scale better but this seems like one of those things that'll
>>> bite us in the ass as the amount of ipv6 deployed increases.
>>>
>>> Does anyone have any ideas/suggestions on how we could improve things?
>> This improves things quite a bit - from 1.9gbyte/sec @ 100% cpu usage
>> to 2.7gbyte/sec @ 85% CPU usage. It's not perfect - the lock
>> contention is still there - but it's much less of an issue now.
>>
>> Are there any issues with it?
>>
>> Index: sys/netinet6/nd6.c
>> ===================================================================
>> --- sys/netinet6/nd6.c  (revision 259475)
>> +++ sys/netinet6/nd6.c  (working copy)
>> @@ -1891,9 +1891,9 @@
>>         flags = ((m != NULL) || (lle != NULL)) ? LLE_EXCLUSIVE : 0;
>>         if (ln == NULL) {
>>         retry:
>> -               IF_AFDATA_LOCK(ifp);
>> +               IF_AFDATA_RLOCK(ifp);
>>                 ln = lla_lookup(LLTABLE6(ifp), flags, (struct sockaddr *)dst);
>> -               IF_AFDATA_UNLOCK(ifp);
>> +               IF_AFDATA_RUNLOCK(ifp);
>>                 if ((ln == NULL) && nd6_is_addr_neighbor(dst, ifp))  {
>>                         /*
>>                          * Since nd6_is_addr_neighbor() internally
>> calls nd6_lookup(),
>
> This change looks safe, since flags can only contain LLE_EXCLUSIVE.
> However, the assertion in in6_lltable_lookup() seems insufficient.  It
> asserts any lock on afdata.  I think it should also assert a write-lock
> in the LLE_CREATE case.
>
> Granted, this is not strictly related to your change.

Would you mind doing up a quick patch to demonstrate?

I've heard mumblings at work that ipv6 panics if you change the
routing tables during active traffic so if we're missing lock
assertions I'd like to add them now and try to pick up the problem as
it happens in testing.

Thanks!


-adrian



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJ-Vmom-qghnACRdU9xX5wg9jZLYqvUQe=aD2GkheMNtseekzg>