From: Adrian Chadd <adrian.chadd@gmail.com>
Date: Thu, 9 Jan 2014 10:44:51 -0800
To: John Baldwin
Cc: "freebsd-arch@freebsd.org"
Subject: Re: Acquiring a lock on the same CPU that holds it - what can be done?

On 9 January 2014 10:31, John Baldwin wrote:
> On Friday, January 03, 2014 04:55:48 PM Adrian Chadd wrote:
>> Hi,
>>
>> So here's a fun one.
>>
>> When doing TCP traffic + socket affinity + thread pinning experiments,
>> I seem to hit this very annoying scenario that caps my performance and
>> scalability.
>>
>> Assume I've lined up everything relating to a socket to run on the
>> same CPU (ie, TX, RX, TCP timers, userland thread):
>
> Are you sure this is really the best setup?  Especially if you have free
> CPUs in the system the time you lose in context switches fighting over the
> one assigned CPU for a flow when you have idle CPUs is quite wasteful.  I
> know that tying all of the work for a given flow to a single CPU is all the
> rage right now, but I wonder if you had considered assigning a pair of CPUs
> to a flow, one CPU to do the top-half (TX and userland thread) and one CPU
> to do the bottom-half (RX and timers).  This would remove the context
> switches you see and replace it with spinning in the times when the two
> cores actually contend.  It may also be fairly well suited to SMT (which I
> suspect you might have turned off currently).  If you do have SMT turned
> off, then you can get a pair of CPUs for each queue without having to
> reduce the number of queues you are using.  I'm not sure this would work
> better than creating one queue for every CPU, but I think it is probably
> something worth trying for your use case at least.
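
For concreteness, the userland side of that top-half/bottom-half split
would look roughly like the sketch below. cpuset_setaffinity(2) is the
real interface; the "the SMT sibling is cpu_id + 1" assumption and the
helper name are only for illustration:

#include <sys/param.h>
#include <sys/cpuset.h>
#include <err.h>

/*
 * Pin the calling (TX/userland) thread to one CPU of a pair and leave
 * its sibling free for the driver's RX and timer work.
 */
static void
pin_top_half(int cpu_id)
{
	cpuset_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu_id, &set);

	/* id == -1 means "the calling thread" at the CPU_WHICH_TID level. */
	if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
	    sizeof(set), &set) != 0)
		err(1, "cpuset_setaffinity");

	/*
	 * The bottom half (RX ithread, TCP callouts) would then be steered
	 * to cpu_id + 1 -- assumed here to be the SMT sibling -- via the
	 * driver's queue binding; that part isn't shown.
	 */
}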
>
> BTW, the problem with just slapping critical enter into mutexes is you
> will run afoul of assertions the first time you contend on a mutex and
> have to block.  It may be that only the assertions would break and
> nothing else, but I'm not certain there aren't other assumptions about
> critical sections and not ever context switching for any reason,
> voluntary or otherwise.

It's all the rage because it turns out it bounds the system behaviour
rather nicely. The idea is to scale upwards of 60,000 active TCP sockets;
some people are looking at upwards of 100,000 active concurrent sockets.
The amount of contention is non-trivial if the work isn't all lined up on
the same CPU.

And yeah, I'm aware of the problem of just slapping critical sections
around mutexes (see the ps below for the failure mode). I've run into the
same thing on Linux; it's part of why doing this kind of thing is much
more fragile there. :-P


-a
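
ps. to make that failure mode concrete, here's roughly what the naive
version looks like. critical_enter()/critical_exit(), mtx_lock() and the
turnstile path are the real bits; "somelock" is made up and the assertion
behaviour is described from memory, not copied from the tree:

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/proc.h>

/* A hypothetical MTX_DEF mutex, assumed to be mtx_init()'d elsewhere. */
static struct mtx somelock;

static void
naive_pinned_section(void)
{
	critical_enter();
	/*
	 * Uncontested, this appears to work.  The moment another CPU owns
	 * somelock, mtx_lock() falls into __mtx_lock_sleep() ->
	 * turnstile_wait() -> mi_switch(), and mi_switch() asserts it is
	 * not being asked to switch from inside a critical section -- so
	 * a contested acquire panics an INVARIANTS kernel, and on a
	 * non-INVARIANTS kernel it quietly breaks the no-preemption
	 * promise critical_enter() is supposed to give you.
	 */
	mtx_lock(&somelock);
	/* ... the per-flow work we wanted to keep on this CPU ... */
	mtx_unlock(&somelock);
	critical_exit();
}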