From owner-svn-src-all@FreeBSD.ORG  Sat May 10 00:53:37 2014
Delivered-To: svn-src-all@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
	(using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by hub.freebsd.org (Postfix) with ESMTPS id 653C196F;
	Sat, 10 May 2014 00:53:37 +0000 (UTC)
Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:1900:2254:2068::e6a:0])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client did not present a certificate)
	by mx1.freebsd.org (Postfix) with ESMTPS id 527D3DD0;
	Sat, 10 May 2014 00:53:37 +0000 (UTC)
Received: from svn.freebsd.org ([127.0.1.70])
	by svn.freebsd.org (8.14.8/8.14.8) with ESMTP id s4A0rbbY080572;
	Sat, 10 May 2014 00:53:37 GMT (envelope-from adrian@svn.freebsd.org)
Received: (from adrian@localhost)
	by svn.freebsd.org (8.14.8/8.14.8/Submit) id s4A0rbF9080571;
	Sat, 10 May 2014 00:53:37 GMT (envelope-from adrian@svn.freebsd.org)
Message-Id: <201405100053.s4A0rbF9080571@svn.freebsd.org>
From: Adrian Chadd
Date: Sat, 10 May 2014 00:53:37 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org,
	svn-src-head@freebsd.org
Subject: svn commit: r265792 - head/sys/kern
X-SVN-Group: head
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-BeenThere: svn-src-all@freebsd.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: "SVN commit messages for the entire src tree \(except for \"user\" and \"projects\"\)"
X-List-Received-Date: Sat, 10 May 2014 00:53:37 -0000

Author: adrian
Date: Sat May 10 00:53:36 2014
New Revision: 265792
URL: http://svnweb.freebsd.org/changeset/base/265792

Log:
  Add in support to optionally pin the swi threads.

  Under enough load, the swi's can actually be preempted and migrated
  to other currently free cores.
  When doing RSS experiments, this led to the per-CPU TCP timers not
  lining up any more with the RX CPU said flows were ending up on,
  leading to increased lock contention.

  Since there was a little pushback on flipping them on by default,
  I've left the default at "don't pin."

  The other less obvious problem here is that the default swi
  is also the same as the destination swi for CPU #0.  So if one
  pins the swi on CPU #0, there's no default floating swi.

  A nice future project would be to create a separate swi for
  the "default" floating swi, as well as per-CPU swis that are
  (optionally) pinned.

  Tested:

  * parallel TCP tests (2 x 1g unfortunately for now);
    CPU: Intel(R) Xeon(R) CPU E5-2650

  Note: This is based on some initial investigation into RSS/TCP
  stack lock contention on FreeBSD-HEAD whilst at Netflix in January 2014.

Modified:
  head/sys/kern/kern_timeout.c

Modified: head/sys/kern/kern_timeout.c
==============================================================================
--- head/sys/kern/kern_timeout.c	Sat May 10 00:42:43 2014	(r265791)
+++ head/sys/kern/kern_timeout.c	Sat May 10 00:53:36 2014	(r265792)
@@ -104,6 +104,14 @@ static int ncallout;
 SYSCTL_INT(_kern, OID_AUTO, ncallout, CTLFLAG_RDTUN, &ncallout, 0,
     "Number of entries in callwheel and size of timeout() preallocation");
 
+static int pin_default_swi = 0;
+static int pin_pcpu_swi = 0;
+
+SYSCTL_INT(_kern, OID_AUTO, pin_default_swi, CTLFLAG_RDTUN, &pin_default_swi,
+    0, "Pin the default (non-per-cpu) swi (shared with PCPU 0 swi)");
+SYSCTL_INT(_kern, OID_AUTO, pin_pcpu_swi, CTLFLAG_RDTUN, &pin_pcpu_swi,
+    0, "Pin the per-CPU swis (except PCPU 0, which is also default");
+
 /*
  * TODO:
  *	allocate more timeout table slots when table overflows.
@@ -273,6 +281,12 @@ callout_callwheel_init(void *dummy)
 	callwheelmask = callwheelsize - 1;
 
 	/*
+	 * Fetch whether we're pinning the swi's or not.
+	 */
+	TUNABLE_INT_FETCH("kern.pin_default_swi", &pin_default_swi);
+	TUNABLE_INT_FETCH("kern.pin_pcpu_swi", &pin_pcpu_swi);
+
+	/*
 	 * Only cpu0 handles timeout(9) and receives a preallocation.
 	 *
 	 * XXX: Once all timeout(9) consumers are converted this can
@@ -355,6 +369,7 @@ start_softclock(void *dummy)
 	char name[MAXCOMLEN];
 #ifdef SMP
 	int cpu;
+	struct intr_event *ie;
 #endif
 
 	cc = CC_CPU(timeout_cpu);
@@ -362,6 +377,13 @@ start_softclock(void *dummy)
 	if (swi_add(&clk_intr_event, name, softclock, cc, SWI_CLOCK,
 	    INTR_MPSAFE, &cc->cc_cookie))
 		panic("died while creating standard software ithreads");
+	if (pin_default_swi &&
+	    (intr_event_bind(clk_intr_event, timeout_cpu) != 0)) {
+		printf("%s: timeout clock couldn't be pinned to cpu %d\n",
+		    __func__,
+		    timeout_cpu);
+	}
+
 #ifdef SMP
 	CPU_FOREACH(cpu) {
 		if (cpu == timeout_cpu)
@@ -370,9 +392,16 @@ start_softclock(void *dummy)
 		cc->cc_callout = NULL;	/* Only cpu0 handles timeout(9). */
 		callout_cpu_init(cc);
 		snprintf(name, sizeof(name), "clock (%d)", cpu);
-		if (swi_add(NULL, name, softclock, cc, SWI_CLOCK,
+		ie = NULL;
+		if (swi_add(&ie, name, softclock, cc, SWI_CLOCK,
 		    INTR_MPSAFE, &cc->cc_cookie))
 			panic("died while creating standard software ithreads");
+		if (pin_pcpu_swi && (intr_event_bind(ie, cpu) != 0)) {
+			printf("%s: per-cpu clock couldn't be pinned to "
+			    "cpu %d\n",
+			    __func__,
+			    cpu);
+		}
 	}
 #endif
 }
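
As a rough illustration of the behaviour this change introduces, the sketch below models in plain userland C which swi threads would end up pinned for a given pair of tunable values. It is not kernel code: the helper names swi_should_pin() and swi_count_pinned() are hypothetical and exist only for this example, whereas the kernel makes the same decision inline in start_softclock() through intr_event_bind().

```c
#include <stdbool.h>

/*
 * Userland model of the pinning decision made in start_softclock().
 * swi_should_pin() is a hypothetical helper for illustration only;
 * it is not part of the FreeBSD kernel.
 */
bool
swi_should_pin(int cpu, int timeout_cpu, int pin_default_swi,
    int pin_pcpu_swi)
{
	/* The default swi doubles as the swi for timeout_cpu (CPU 0). */
	if (cpu == timeout_cpu)
		return (pin_default_swi != 0);

	/* Every other CPU has its own swi, governed by the per-CPU knob. */
	return (pin_pcpu_swi != 0);
}

/*
 * Count how many swi threads would be pinned on a system with ncpus
 * CPUs numbered 0..ncpus-1 (also a hypothetical helper).
 */
int
swi_count_pinned(int ncpus, int timeout_cpu, int pin_default_swi,
    int pin_pcpu_swi)
{
	int cpu, pinned = 0;

	for (cpu = 0; cpu < ncpus; cpu++) {
		if (swi_should_pin(cpu, timeout_cpu, pin_default_swi,
		    pin_pcpu_swi))
			pinned++;
	}
	return (pinned);
}
```

With kern.pin_pcpu_swi=1 and kern.pin_default_swi=0 on a 4-CPU machine, this model reports three pinned swis and leaves the one shared with CPU 0 floating. Both sysctls are declared CTLFLAG_RDTUN and read with TUNABLE_INT_FETCH, so they are presumably meant to be set as boot-time tunables (for example from loader.conf) rather than changed at runtime.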