Date: Sat, 04 Jan 2014 11:45:34 +0800
From: David Xu <listlog2011@gmail.com>
To: Adrian Chadd <adrian@freebsd.org>
Cc: David Xu <davidxu@freebsd.org>, "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: Acquiring a lock on the same CPU that holds it - what can be done?
Message-ID: <52C783DE.1060102@gmail.com>
In-Reply-To: <CAJ-Vmok=VSLiwzh-626qUWUuqJC1rtg58mwB_zqT2oQd64oo_Q@mail.gmail.com>
References: <CAJ-Vmok-AJkz0THu72ThTdRhO2h1CnHwffq=cFZGZkbC=cWJZA@mail.gmail.com> <52C77DB8.5020305@gmail.com> <CAJ-Vmok=VSLiwzh-626qUWUuqJC1rtg58mwB_zqT2oQd64oo_Q@mail.gmail.com>
On 2014/01/04 11:25, Adrian Chadd wrote:
> Doesn't critical_enter / exit enable/disable interrupts?
>
> We don't necessarily want to do -that-, as that can be expensive. Just
> not scheduling certain tasks that would interfere would be good
> enough.
>
> -a

Does critical_enter disable interrupts? A long time ago I saw that it does not. If I remember correctly, spinlock_enter disables interrupts, while critical_enter still allows interrupts; the current thread just cannot be preempted, the preemption is deferred.

> On 3 January 2014 19:19, David Xu <listlog2011@gmail.com> wrote:
>> On 2014/01/04 08:55, Adrian Chadd wrote:
>>> Hi,
>>>
>>> So here's a fun one.
>>>
>>> When doing TCP traffic + socket affinity + thread pinning experiments,
>>> I seem to hit this very annoying scenario that caps my performance and
>>> scalability.
>>>
>>> Assume I've lined up everything relating to a socket to run on the
>>> same CPU (ie, TX, RX, TCP timers, userland thread):
>>>
>>> * userland code calls something, let's say "kqueue"
>>> * the kqueue lock gets grabbed
>>> * an interrupt comes in for the NIC
>>> * the NIC code runs some RX code, and eventually hits something that
>>>   wants to push a knote up
>>> * and the knote is for the same kqueue above
>>> * .. so it grabs the lock..
>>> * .. contests..
>>> * Then the scheduler flips us back to the original userland thread doing TX
>>> * The userland thread finishes its kqueue manipulation and releases
>>>   the queue lock
>>> * .. the scheduler then immediately flips back to the NIC thread
>>>   waiting for the lock, grabs the lock, does a bit of work, then
>>>   releases the lock
>>>
>>> I see this on kqueue locks, sendfile locks (for sendfile notification)
>>> and vm locks (for the VM page referencing/dereferencing.)
>>>
>>> This happens very frequently.
>>> It's very noticeable with large numbers
>>> of sockets, as the chances of hitting a lock in the NIC RX path that
>>> overlaps with something in the userland TX path that you are currently
>>> fiddling with (eg kqueue manipulation) or sending data (eg vm_page
>>> locks or sendfile locks for things you're currently transmitting) is
>>> very high. As I increase traffic and the number of sockets, the amount
>>> of context switches goes way up (to 300,000+) and the lock contention
>>> / time spent doing locking is non-trivial.
>>>
>>> Linux doesn't "have this" problem - the lock primitives let you
>>> disable driver bottom halves. So, in this instance, I'd just grab the
>>> lock with spin_lock_bh() and all the driver bottom halves would not be
>>> run. I'd thus not have this scheduler ping-ponging and lock contention,
>>> as it'd never get a chance to happen.
>>>
>>> So, does anyone have any ideas? Has anyone seen this? Shall we just
>>> implement a way of doing selective thread disabling, a la
>>> spin_lock_bh() mixed with spl${foo}() style stuff?
>>>
>>> Thanks,
>>>
>>> -adrian
>>>
>> This is how a turnstile-based mutex works. AFAIK it is for realtime,
>> the same as a POSIX pthread priority-inheritance mutex; realtime does
>> not mean high performance - in fact, it introduces more context
>> switches and hurts throughput. I think the default mutex could be
>> patched to call critical_enter when mutex_lock is called, spin
>> forever, and call critical_exit when the mutex is unlocked, bypassing
>> the turnstile. The turnstile design assumes the whole system must be
>> scheduled on global thread priority, but who said a system must be
>> based on this? Recently I ported a Linux CFS-like scheduler to FreeBSD
>> on our perforce server. It is based on a start-time fair queue, and I
>> found the turnstile is such a bad thing: it prevents me from
>> scheduling threads by class (rt > timeshare > idle), and instead I
>> must cope with global thread priority changes.
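[For reference, the spin_lock_bh() behaviour Adrian describes above can be cartooned in userland C. Everything here is illustrative - the flag names, raise_bh(), the single pending-work slot - and not a real kernel API (Linux actually masks per-CPU softirqs); the sketch only shows the ordering it buys you: while the "top half" holds the lock with bottom halves masked, newly raised bottom-half work is deferred instead of contending, and it runs once the lock is released.]

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative state: one "bottom halves masked" flag and one deferred-work
 * slot.  A single-threaded cartoon, not the real per-CPU softirq machinery. */
static bool bh_disabled;   /* bottom halves masked while the lock is held */
static bool bh_pending;    /* work that arrived while masked */
static int  bh_runs;       /* how many times the bottom half actually ran */

static void bh_handler(void) { bh_runs++; }

/* A bottom half is raised (e.g. by NIC RX): run it immediately unless the
 * top half has it masked, in which case remember it for later. */
static void raise_bh(void) {
    if (bh_disabled)
        bh_pending = true;   /* defer instead of contending for the lock */
    else
        bh_handler();
}

/* spin_lock_bh() analogue: take the "lock" and mask bottom halves, so the
 * RX path cannot contend with us mid-critical-section. */
static void lock_bh(void) { bh_disabled = true; }

/* spin_unlock_bh() analogue: drop the "lock", then run anything that was
 * deferred while we held it. */
static void unlock_bh(void) {
    bh_disabled = false;
    if (bh_pending) {
        bh_pending = false;
        bh_handler();        /* deferred bottom half runs only now */
    }
}
```

[With this shape, the kqueue-manipulation path never sees the RX knote path spinning on the same lock; the deferred work simply runs after the unlock, so there is no scheduler ping-pong to begin with.]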
>> I have stopped porting it. Although it now fully works on UP - it
>> supports nested group scheduling, and I can watch video smoothly while
>> doing "make -j10 buildworld" on the same UP machine - my scheduler
>> does not work on SMP: too much priority-propagation work drove me
>> away. A non-preemption spinlock works well for such a system;
>> propagating thread weight up a scheduler tree is not practical.
>>
>> Regards,
>> David Xu
>>
>> _______________________________________________
>> freebsd-arch@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-arch
>> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
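[David's distinction at the top of this message - spinlock_enter disables interrupts, critical_enter only defers preemption - can be cartooned in userland C. The names mirror the kernel interface but the code below is purely illustrative, not the kernel implementation: events still arrive while the nest count is nonzero, but the resulting context switch is remembered and taken only at the exit.]

```c
#include <assert.h>
#include <stdbool.h>

/* Cartoon of deferred preemption: a nesting counter plus an "owed
 * preemption" flag.  Illustrative state, not the real per-thread fields. */
static int  critnest;        /* critical-section nesting depth */
static bool owe_preempt;     /* a preemption arrived while nested */
static int  switches;        /* context switches actually taken */

static void do_switch(void) { switches++; }

/* Called when something (say, an interrupt handler waking a
 * higher-priority thread) wants to preempt the current thread. */
static void maybe_preempt(void) {
    if (critnest > 0)
        owe_preempt = true;  /* defer: we are inside a critical section */
    else
        do_switch();
}

static void critical_enter(void) { critnest++; }

static void critical_exit(void) {
    if (--critnest == 0 && owe_preempt) {
        owe_preempt = false;
        do_switch();         /* the deferred preemption happens here */
    }
}
```

[Under this model, a mutex that spins between critical_enter() and critical_exit(), as David proposes, never sleeps on a turnstile: the holder cannot be preempted mid-hold, so a contending spinner's wait stays bounded without any priority propagation.]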