Date:      Sat, 04 Jan 2014 11:45:34 +0800
From:      David Xu <listlog2011@gmail.com>
To:        Adrian Chadd <adrian@freebsd.org>
Cc:        David Xu <davidxu@freebsd.org>, "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject:   Re: Acquiring a lock on the same CPU that holds it - what can be done?
Message-ID:  <52C783DE.1060102@gmail.com>
In-Reply-To: <CAJ-Vmok=VSLiwzh-626qUWUuqJC1rtg58mwB_zqT2oQd64oo_Q@mail.gmail.com>
References:  <CAJ-Vmok-AJkz0THu72ThTdRhO2h1CnHwffq=cFZGZkbC=cWJZA@mail.gmail.com> <52C77DB8.5020305@gmail.com> <CAJ-Vmok=VSLiwzh-626qUWUuqJC1rtg58mwB_zqT2oQd64oo_Q@mail.gmail.com>

On 2014/01/04 11:25, Adrian Chadd wrote:
> Doesn't critical_enter / exit enable/disable interrupts?
>
> We don't necessarily want to do -that-, as it can be expensive. Simply
> not scheduling the tasks that would interfere would be good
> enough.
>
>
> -a

Does critical_enter disable interrupts? As far as I recall, it does not.
If I remember correctly, spinlock_enter disables interrupts, while
critical_enter still allows interrupts; the current thread simply cannot
be preempted, and any pending preemption is deferred until critical_exit.
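
Roughly, the difference looks like this (a sketch from memory; the exact
headers and declarations may differ between versions):

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/proc.h>

static void
no_preemption_example(void)
{
	critical_enter();	/* interrupts still fire on this CPU... */
	/* ...but any preemption they would trigger is deferred */
	critical_exit();	/* deferred preemption happens here */
}

static void
no_interrupt_example(void)
{
	spinlock_enter();	/* disables interrupts on this CPU */
	/* no interrupt handler can run on this CPU in here */
	spinlock_exit();	/* restores the saved interrupt state */
}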


>
> On 3 January 2014 19:19, David Xu <listlog2011@gmail.com> wrote:
>> On 2014/01/04 08:55, Adrian Chadd wrote:
>>> Hi,
>>>
>>> So here's a fun one.
>>>
>>> When doing TCP traffic + socket affinity + thread pinning experiments,
>>> I seem to hit this very annoying scenario that caps my performance and
>>> scalability.
>>>
>>> Assume I've lined up everything relating to a socket to run on the
>>> same CPU (ie, TX, RX, TCP timers, userland thread):
>>>
>>> * userland code calls something, let's say "kqueue"
>>> * the kqueue lock gets grabbed
>>> * an interrupt comes in for the NIC
>>> * the NIC code runs some RX code, and eventually hits something that
>>> wants to push a knote up
>>> * and the knote is for the same kqueue above
>>> * .. so it tries to grab the lock..
>>> * .. and contends with the holder..
>>> * Then the scheduler flips us back to the original userland thread doing TX
>>> * The userland thread finishes its kqueue manipulation and releases
>>> the queue lock
>>> * .. the scheduler then immediately flips back to the NIC thread
>>> waiting for the lock, grabs the lock, does a bit of work, then
>>> releases the lock
>>>
>>> I see this on kqueue locks, sendfile locks (for sendfile notification)
>>> and vm locks (for the VM page referencing/dereferencing.)
>>>
>>> This happens very frequently. It's very noticeable with large numbers
>>> of sockets, as the chances of hitting a lock in the NIC RX path that
>>> overlaps with something in the userland TX path that you are currently
>>> fiddling with (eg kqueue manipulation) or sending data (eg vm_page
>>> locks or sendfile locks for things you're currently transmitting) are
>>> very high. As I increase traffic and the number of sockets, the number
>>> of context switches goes way up (to 300,000+) and the lock contention
>>> / time spent doing locking is non-trivial.
>>>
>>> Linux doesn't "have this" problem - the lock primitives let you
>>> disable driver bottom halves. So, in this instance, I'd just grab the
>>> lock with spin_lock_bh() and all the driver bottom halves would not be
>>> run. I'd thus not have this scheduler ping-ponging and lock contention
>>> as it'd never get a chance to happen.
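
For reference, the Linux pattern Adrian describes looks roughly like this
(a minimal sketch; the lock name is made up):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(kq_lock);	/* hypothetical lock */

static void userland_path(void)
{
	spin_lock_bh(&kq_lock);		/* also disables local softirqs */
	/* the NIC RX bottom half cannot run on this CPU and contend
	 * for kq_lock until we release it */
	spin_unlock_bh(&kq_lock);	/* re-enables softirqs */
}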
>>>
>>> So, does anyone have any ideas? Has anyone seen this? Shall we just
>>> implement a way of doing selective thread disabling, a la
>>> spin_lock_bh() mixed with spl${foo}() style stuff?
>>>
>>> Thanks,
>>>
>>>
>>> -adrian
>>>
>> This is how a turnstile-based mutex works. AFAIK it is designed for
>> realtime, like a POSIX pthread priority-inheritance mutex, and realtime
>> does not mean high performance; in fact, it introduces more context
>> switches and hurts throughput. I think the default mutex could be
>> patched to call critical_enter when mtx_lock is called, spin until the
>> lock is free, and call critical_exit when the mutex is unlocked,
>> bypassing the turnstile entirely.
>> The turnstile design assumes the whole system must be scheduled on
>> global thread priority, but who says a system must be based on that?
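
Something like this is what I have in mind (an untested sketch; the
struct and function names are made up). Note that unlike a real spin
mutex this does not disable interrupts, so it must never be taken from
an interrupt filter that could interrupt a holder on the same CPU:

#include <sys/param.h>
#include <sys/systm.h>
#include <machine/atomic.h>
#include <machine/cpu.h>

struct spin_only_mtx {
	volatile uintptr_t owner;	/* 0 == unowned */
};

static void
spin_only_lock(struct spin_only_mtx *m, uintptr_t tid)
{
	critical_enter();		/* holder can no longer be preempted */
	while (!atomic_cmpset_acq_ptr(&m->owner, 0, tid))
		cpu_spinwait();		/* spin; never block on a turnstile */
}

static void
spin_only_unlock(struct spin_only_mtx *m)
{
	atomic_store_rel_ptr(&m->owner, 0);
	critical_exit();		/* deferred preemption may run now */
}

Here tid would typically be (uintptr_t)curthread, as in the existing mtx
code. Because the holder runs inside a critical section, a contender on
another CPU only spins for the length of the holder's critical section,
and no priority propagation is needed.
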
>> Recently I ported a Linux CFS-like scheduler to FreeBSD in our
>> perforce server. It is based on start-time fair queueing, and I found
>> the turnstile to be a real obstacle: it prevents me from scheduling
>> threads strictly by class (rt > timeshare > idle) and instead forces
>> me to cope with global thread priority changes.
>> I have stopped porting it, although it now fully works on UP. It
>> supports nested group scheduling; I can watch video smoothly while
>> running "make -j10 buildworld" on the same UP machine. My scheduler
>> does not work on SMP; the amount of priority propagation work drove me
>> away. A non-preemptive spinlock works well for such a system, since
>> propagating thread weight across a scheduler tree is not practical.
>>
>> Regards,
>> David Xu
>>



