Date:      Fri, 3 Jan 2014 16:55:48 -0800
From:      Adrian Chadd <adrian@freebsd.org>
To:        "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject:   Acquiring a lock on the same CPU that holds it - what can be done?
Message-ID:  <CAJ-Vmok-AJkz0THu72ThTdRhO2h1CnHwffq=cFZGZkbC=cWJZA@mail.gmail.com>

Hi,

So here's a fun one.

When doing TCP traffic + socket affinity + thread pinning experiments,
I seem to hit this very annoying scenario that caps my performance and
scalability.

Assume I've lined up everything relating to a socket to run on the
same CPU (i.e. TX, RX, TCP timers, userland thread); the two colliding
code paths are sketched after this list:

* userland code calls something, let's say "kqueue"
* the kqueue lock gets grabbed
* an interrupt comes in for the NIC
* the NIC code runs some RX code, and eventually hits something that
wants to push a knote up
* and the knote is for the same kqueue above
* .. so it tries to grab the lock ..
* .. and contends ..
* Then the scheduler flips us back to the original userland thread doing TX
* The userland thread finishes its kqueue manipulation and releases
the kqueue lock
* .. the scheduler then immediately flips back to the NIC thread
waiting for the lock, which grabs it, does a bit of work, then
releases it
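
Roughly, the two colliding paths look like this (a simplified sketch
only, not the actual kqueue code; the struct and function names are
made up, only mtx_lock()/mtx_unlock() are real):

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>

/* Stand-in for the real kqueue; kq_lock plays the role of the kqueue lock. */
struct my_kqueue {
	struct mtx	kq_lock;
	/* ... knote lists, etc ... */
};

/* Userland-driven path: the pinned thread manipulating its kqueue. */
static void
userland_kqueue_path(struct my_kqueue *kq)
{
	mtx_lock(&kq->kq_lock);		/* (1) user thread takes the lock */
	/* (2) NIC interrupt fires here; the ithread preempts us */
	/* ... walk/update knotes ... */
	mtx_unlock(&kq->kq_lock);	/* (4) release; the scheduler
					 *     immediately switches back to
					 *     the ithread */
}

/* Interrupt-driven path: NIC RX completion pushing a knote up. */
static void
nic_rx_knote_path(struct my_kqueue *kq)
{
	mtx_lock(&kq->kq_lock);		/* (3) same lock: contends, blocks,
					 *     and we switch back to (2)'s
					 *     owner */
	/* ... activate the knote ... */
	mtx_unlock(&kq->kq_lock);
}

Each such collision is the ping-pong described above: extra context
switches plus a lock hand-off, all on the one CPU.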

I see this on kqueue locks, sendfile locks (for sendfile notification)
and VM locks (for VM page referencing/dereferencing).

This happens very frequently. It's very noticeable with large numbers
of sockets, as the chance that a lock taken in the NIC RX path
overlaps with something the userland TX path is currently fiddling
with (e.g. kqueue manipulation) or sending (e.g. vm_page locks or
sendfile locks for data you're currently transmitting) is very high.
As I increase traffic and the number of sockets, the number of context
switches goes way up (to 300,000+) and the lock contention / time
spent doing locking is non-trivial.

Linux doesn't "have" this problem: its lock primitives let you
disable driver bottom halves on the local CPU. In this instance I'd
just grab the lock with spin_lock_bh(), and none of the driver bottom
halves would run until I released it, so the scheduler ping-ponging
and lock contention would never get a chance to happen.
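
For reference, the Linux pattern is roughly this (a minimal sketch;
my_lock and the function names are made up, but spin_lock_bh() and
spin_unlock_bh() are the real primitives):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(my_lock);	/* protects the kqueue-ish shared state */

/* Process-context path: the equivalent of the userland/TX side. */
static void process_context_update(void)
{
	spin_lock_bh(&my_lock);		/* take the lock AND disable bottom-half
					 * (softirq) processing on this CPU */
	/* ... manipulate the shared state ... */
	spin_unlock_bh(&my_lock);	/* re-enable bottom halves; any pending
					 * RX softirq runs only now */
}

/* Bottom-half path: the equivalent of the NIC RX side. */
static void rx_softirq_update(void)
{
	spin_lock(&my_lock);		/* already in BH context; plain lock is fine */
	/* ... deliver the event ... */
	spin_unlock(&my_lock);
}

While the process-context side holds the lock, the RX bottom half
simply can't run on that CPU, so the contention never happens in the
first place.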

So, does anyone have any ideas? Has anyone else seen this? Shall we
just implement a way of doing selective (interrupt) thread disabling
while a lock is held, a la spin_lock_bh() mixed with spl${foo}()-style
interfaces?
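
Purely as a strawman, and with every name below being hypothetical
(none of this exists today), I'm imagining something along the lines
of:

/*
 * Hypothetical sketch only: mtx_lock_swi()/mtx_unlock_swi() don't exist.
 * The idea is a spin_lock_bh()-style primitive that defers scheduling of
 * the selected software-interrupt/ithread classes on the local CPU while
 * the lock is held, in the spirit of the old spl${foo}() interfaces.
 */
void	mtx_lock_swi(struct mtx *m, int swi_classes);
void	mtx_unlock_swi(struct mtx *m, int swi_classes);

static void
userland_kqueue_path(struct my_kqueue *kq)
{
	/* Hold the kqueue lock with net ithreads deferred on this CPU. */
	mtx_lock_swi(&kq->kq_lock, MY_SWI_NET);		/* hypothetical */
	/* ... manipulate knotes; the local NIC ithread can't preempt us ... */
	mtx_unlock_swi(&kq->kq_lock, MY_SWI_NET);	/* deferred ithreads run now */
}

That's just to illustrate the shape of the thing; whether and how to
actually implement it is the question.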

Thanks,


-adrian


