Date: Mon, 13 Jan 2014 14:43:52 -0500
From: John Baldwin <jhb@freebsd.org>
To: Adrian Chadd <adrian@freebsd.org>
Cc: "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: Acquiring a lock on the same CPU that holds it - what can be done?
Message-ID: <201401131443.52550.jhb@freebsd.org>
In-Reply-To: <CAJ-Vmo=rayYvUYsNLs2A-T=a7WbrSA%2BTUPgDoGCHdbQjeJ9ynw@mail.gmail.com>
References: <CAJ-Vmok-AJkz0THu72ThTdRhO2h1CnHwffq=cFZGZkbC=cWJZA@mail.gmail.com> <9508909.MMfryVDtI5@ralph.baldwin.cx> <CAJ-Vmo=rayYvUYsNLs2A-T=a7WbrSA%2BTUPgDoGCHdbQjeJ9ynw@mail.gmail.com>

On Thursday, January 09, 2014 1:44:51 pm Adrian Chadd wrote:
> On 9 January 2014 10:31, John Baldwin <jhb@freebsd.org> wrote:
> > On Friday, January 03, 2014 04:55:48 PM Adrian Chadd wrote:
> >> Hi,
> >>
> >> So here's a fun one.
> >>
> >> When doing TCP traffic + socket affinity + thread pinning experiments,
> >> I seem to hit this very annoying scenario that caps my performance and
> >> scalability.
> >>
> >> Assume I've lined up everything relating to a socket to run on the
> >> same CPU (ie, TX, RX, TCP timers, userland thread):
> >
> > Are you sure this is really the best setup?  Especially if you have free CPUs
> > in the system the time you lose in context switches fighting over the one
> > assigned CPU for a flow when you have idle CPUs is quite wasteful.  I know
> > that tying all of the work for a given flow to a single CPU is all the rage
> > right now, but I wonder if you had considered assigning a pair of CPUs to a
> > flow, one CPU to do the top-half (TX and userland thread) and one CPU to
> > do the bottom-half (RX and timers).  This would remove the context switches
> > you see and replace it with spinning in the times when the two cores actually
> > contend.  It may also be fairly well suited to SMT (which I suspect you might
> > have turned off currently).  If you do have SMT turned off, then you can get
> > a pair of CPUs for each queue without having to reduce the number of queues
> > you are using.  I'm not sure this would work better than creating one queue
> > for every CPU, but I think it is probably something worth trying for your use
> > case at least.
> >
> > BTW, the problem with just slapping critical enter into mutexes is you will
> > run afoul of assertions the first time you contend on a mutex and have to
> > block.  It may be that only the assertions would break and nothing else, but
> > I'm not certain there aren't other assumptions about critical sections and
> > not ever context switching for any reason, voluntary or otherwise.
>
> It's the rage because it turns out it bounds the system behaviour rather
> nicely.

Yes, but are you willing to try the suggestion?  This doesn't restrict you to
a single queue-pair.  It might net you 1 per core (instead of 1 per thread),
but that's still more than 1.

-- 
John Baldwin
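
To make the "pair of CPUs per flow" suggestion concrete, here is a minimal
sketch of how queues might be mapped onto CPU pairs.  It is not from the
thread: the function name assign_cpu_pairs is hypothetical, and it assumes
that CPUs 2k and 2k+1 are the two SMT threads of core k, which depends on the
platform's CPU numbering and should be checked against the real topology.
It only computes the mapping; the actual pinning (for example with cpuset(1)
or cpuset_setaffinity(2) on FreeBSD) is left out.

```python
def assign_cpu_pairs(ncpus, nqueues=None):
    """Map each queue to a (top_half_cpu, bottom_half_cpu) pair.

    Assumption (hypothetical, verify on real hardware): CPUs 2k and 2k+1
    are the two SMT threads of core k.
    top half    = TX and the userland thread
    bottom half = RX and timers
    """
    if ncpus < 2:
        raise ValueError("need at least two CPUs to form a pair")
    max_queues = ncpus // 2          # one queue per core, not per thread
    if nqueues is None:
        nqueues = max_queues
    if not 1 <= nqueues <= max_queues:
        raise ValueError("nqueues must be between 1 and ncpus // 2")
    # Queue q gets both hardware threads of core q.
    return {q: (2 * q, 2 * q + 1) for q in range(nqueues)}


if __name__ == "__main__":
    for queue, (top, bottom) in assign_cpu_pairs(8).items():
        print(f"queue {queue}: top-half on CPU {top}, bottom-half on CPU {bottom}")
```

With 8 hardware threads this yields 4 queue-pairs, i.e. one per core rather
than one per thread, which is the trade-off described above.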