From: Adrian Chadd <adrian.chadd@gmail.com>
Date: Thu, 9 Jan 2014 10:44:51 -0800
To: John Baldwin
Cc: "freebsd-arch@freebsd.org"
Subject: Re: Acquiring a lock on the same CPU that holds it - what can be done?

On 9 January 2014 10:31, John Baldwin wrote:
> On Friday, January 03, 2014 04:55:48 PM Adrian Chadd wrote:
>> Hi,
>>
>> So here's a fun one.
>>
>> When doing TCP traffic + socket affinity + thread pinning experiments,
>> I seem to hit this very annoying scenario that caps my performance and
>> scalability.
>>
>> Assume I've lined up everything relating to a socket to run on the
>> same CPU (ie, TX, RX, TCP timers, userland thread):
>
> Are you sure this is really the best setup?  Especially if you have free
> CPUs in the system the time you lose in context switches fighting over the
> one assigned CPU for a flow when you have idle CPUs is quite wasteful.  I
> know that tying all of the work for a given flow to a single CPU is all the
> rage right now, but I wonder if you had considered assigning a pair of CPUs
> to a flow, one CPU to do the top-half (TX and userland thread) and one CPU
> to do the bottom-half (RX and timers).  This would remove the context
> switches you see and replace it with spinning in the times when the two
> cores actually contend.  It may also be fairly well suited to SMT (which I
> suspect you might have turned off currently).  If you do have SMT turned
> off, then you can get a pair of CPUs for each queue without having to
> reduce the number of queues you are using.  I'm not sure this would work
> better than creating one queue for every CPU, but I think it is probably
> something worth trying for your use case at least.
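
For concreteness, the userland side of that top-half/bottom-half split
would look roughly like the sketch below. cpuset_setaffinity(2) is the
real interface; the "the SMT sibling is cpu_id + 1" assumption and the
helper name are only for illustration:

#include <sys/param.h>
#include <sys/cpuset.h>
#include <err.h>

/*
 * Pin the calling (TX/userland) thread to one CPU of a pair and leave
 * its sibling free for the driver's RX and timer work.
 */
static void
pin_top_half(int cpu_id)
{
	cpuset_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu_id, &set);

	/* id == -1 means "the calling thread" at the CPU_WHICH_TID level. */
	if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
	    sizeof(set), &set) != 0)
		err(1, "cpuset_setaffinity");

	/*
	 * The bottom half (RX ithread, TCP callouts) would then be steered
	 * to cpu_id + 1 -- assumed here to be the SMT sibling -- via the
	 * driver's queue binding; that part isn't shown.
	 */
}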
>
> BTW, the problem with just slapping critical enter into mutexes is you
> will run afoul of assertions the first time you contend on a mutex and
> have to block.  It may be that only the assertions would break and
> nothing else, but I'm not certain there aren't other assumptions about
> critical sections and not ever context switching for any reason,
> voluntary or otherwise.

It's all the rage because it turns out it bounds the system behaviour
rather nicely. The idea is to scale upwards of 60,000 active TCP sockets;
some people are looking at upwards of 100,000 active concurrent sockets.
The amount of contention is non-trivial if the work isn't all lined up on
the same CPU.

And yeah, I'm aware of the problem of just slapping critical sections
around mutexes (see the ps below for the failure mode). I've run into the
same thing on Linux; it's part of why doing this kind of thing is much
more fragile there. :-P


-a
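
ps. to make that failure mode concrete, here's roughly what the naive
version looks like. critical_enter()/critical_exit(), mtx_lock() and the
turnstile path are the real bits; "somelock" is made up and the assertion
behaviour is described from memory, not copied from the tree:

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/proc.h>

/* A hypothetical MTX_DEF mutex, assumed to be mtx_init()'d elsewhere. */
static struct mtx somelock;

static void
naive_pinned_section(void)
{
	critical_enter();
	/*
	 * Uncontested, this appears to work.  The moment another CPU owns
	 * somelock, mtx_lock() falls into __mtx_lock_sleep() ->
	 * turnstile_wait() -> mi_switch(), and mi_switch() asserts it is
	 * not being asked to switch from inside a critical section -- so
	 * a contested acquire panics an INVARIANTS kernel, and on a
	 * non-INVARIANTS kernel it quietly breaks the no-preemption
	 * promise critical_enter() is supposed to give you.
	 */
	mtx_lock(&somelock);
	/* ... the per-flow work we wanted to keep on this CPU ... */
	mtx_unlock(&somelock);
	critical_exit();
}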