From owner-svn-src-all@freebsd.org Thu Apr 25 21:24:57 2019 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 93D2915A2D18; Thu, 25 Apr 2019 21:24:57 +0000 (UTC) (envelope-from shurd@FreeBSD.org) Received: from mxrelay.nyi.freebsd.org (mxrelay.nyi.freebsd.org [IPv6:2610:1c1:1:606c::19:3]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "mxrelay.nyi.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 2EA616C623; Thu, 25 Apr 2019 21:24:57 +0000 (UTC) (envelope-from shurd@FreeBSD.org) Received: from repo.freebsd.org (repo.freebsd.org [IPv6:2610:1c1:1:6068::e6a:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mxrelay.nyi.freebsd.org (Postfix) with ESMTPS id 134AB19A4; Thu, 25 Apr 2019 21:24:57 +0000 (UTC) (envelope-from shurd@FreeBSD.org) Received: from repo.freebsd.org ([127.0.1.37]) by repo.freebsd.org (8.15.2/8.15.2) with ESMTP id x3PLOumF081665; Thu, 25 Apr 2019 21:24:56 GMT (envelope-from shurd@FreeBSD.org) Received: (from shurd@localhost) by repo.freebsd.org (8.15.2/8.15.2/Submit) id x3PLOuC2081663; Thu, 25 Apr 2019 21:24:56 GMT (envelope-from shurd@FreeBSD.org) Message-Id: <201904252124.x3PLOuC2081663@repo.freebsd.org> X-Authentication-Warning: repo.freebsd.org: shurd set sender to shurd@FreeBSD.org using -f From: Stephen Hurd Date: Thu, 25 Apr 2019 21:24:56 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: svn commit: r346708 - in head: share/man/man4 sys/net X-SVN-Group: head X-SVN-Commit-Author: shurd X-SVN-Commit-Paths: in head: share/man/man4 sys/net X-SVN-Commit-Revision: 346708 X-SVN-Commit-Repository: base MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Rspamd-Queue-Id: 2EA616C623 X-Spamd-Bar: -- Authentication-Results: mx1.freebsd.org X-Spamd-Result: default: False [-2.96 / 15.00]; local_wl_from(0.00)[FreeBSD.org]; NEURAL_HAM_SHORT(-0.96)[-0.964,0]; ASN(0.00)[asn:11403, ipnet:2610:1c1:1::/48, country:US]; NEURAL_HAM_MEDIUM(-0.99)[-0.995,0]; NEURAL_HAM_LONG(-1.00)[-1.000,0] X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 25 Apr 2019 21:24:57 -0000 Author: shurd Date: Thu Apr 25 21:24:56 2019 New Revision: 346708 URL: https://svnweb.freebsd.org/changeset/base/346708 Log: iflib: Better control over queue core assignment By default, cores are now assigned to queues in a sequential manner rather than all NICs starting at the first core. On a four-core system with two NICs each using two queue pairs, the nic:queue -> core mapping has changed from this: 0:0 -> 0, 0:1 -> 1 1:0 -> 0, 1:1 -> 1 To this: 0:0 -> 0, 0:1 -> 1 1:0 -> 2, 1:1 -> 3 Additionally, a device can now be configured to use separate cores for TX and RX queues. Two new tunables have been added, dev.X.Y.iflib.separate_txrx and dev.X.Y.iflib.core_offset. If core_offset is set, the NIC is not part of the auto-assigned sequence. Reviewed by: marius MFC after: 2 weeks Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D20029 Modified: head/share/man/man4/iflib.4 head/sys/net/iflib.c Modified: head/share/man/man4/iflib.4 ============================================================================== --- head/share/man/man4/iflib.4 Thu Apr 25 21:09:07 2019 (r346707) +++ head/share/man/man4/iflib.4 Thu Apr 25 21:24:56 2019 (r346708) @@ -55,6 +55,16 @@ If zero, the number of TX queues is derived from the n socket connected to the controller. .It Va disable_msix Disables MSI-X interrupts for the device. +.It Va core_offset +Specifies a starting core offset to assign queues to. +If the value is unspecified or 65535, cores are assigned sequentially across +controllers. +.It Va separate_txrx +Requests that RX and TX queues not be paired on the same core. +If this is zero or not set, an RX and TX queue pair will be assigned to each +core. +When set to a non-zero value, TX queues are assigned to cores following the +last RX queue. .El .Pp These Modified: head/sys/net/iflib.c ============================================================================== --- head/sys/net/iflib.c Thu Apr 25 21:09:07 2019 (r346707) +++ head/sys/net/iflib.c Thu Apr 25 21:24:56 2019 (r346708) @@ -189,6 +189,9 @@ struct iflib_ctx { uint16_t ifc_sysctl_qs_eq_override; uint16_t ifc_sysctl_rx_budget; uint16_t ifc_sysctl_tx_abdicate; + uint16_t ifc_sysctl_core_offset; +#define CORE_OFFSET_UNSPECIFIED 0xffff + uint8_t ifc_sysctl_separate_txrx; qidx_t ifc_sysctl_ntxds[8]; qidx_t ifc_sysctl_nrxds[8]; @@ -723,6 +726,18 @@ static void iflib_free_intr_mem(if_ctx_t ctx); static struct mbuf * iflib_fixup_rx(struct mbuf *m); #endif +static SLIST_HEAD(cpu_offset_list, cpu_offset) cpu_offsets = + SLIST_HEAD_INITIALIZER(cpu_offsets); +struct cpu_offset { + SLIST_ENTRY(cpu_offset) entries; + cpuset_t set; + unsigned int refcount; + uint16_t offset; +}; +static struct mtx cpu_offset_mtx; +MTX_SYSINIT(iflib_cpu_offset, &cpu_offset_mtx, "iflib_cpu_offset lock", + MTX_DEF); + NETDUMP_DEFINE(iflib); #ifdef DEV_NETMAP @@ -4461,6 +4476,71 @@ iflib_rem_pfil(if_ctx_t ctx) pfil_head_unregister(pfil); } +static uint16_t +get_ctx_core_offset(if_ctx_t ctx) +{ + if_softc_ctx_t scctx = &ctx->ifc_softc_ctx; + struct cpu_offset *op; + uint16_t qc; + uint16_t ret = ctx->ifc_sysctl_core_offset; + + if (ret != CORE_OFFSET_UNSPECIFIED) + return (ret); + + if (ctx->ifc_sysctl_separate_txrx) + qc = scctx->isc_ntxqsets + scctx->isc_nrxqsets; + else + qc = max(scctx->isc_ntxqsets, scctx->isc_nrxqsets); + + mtx_lock(&cpu_offset_mtx); + SLIST_FOREACH(op, &cpu_offsets, entries) { + if (CPU_CMP(&ctx->ifc_cpus, &op->set) == 0) { + ret = op->offset; + op->offset += qc; + MPASS(op->refcount < UINT_MAX); + op->refcount++; + break; + } + } + if (ret == CORE_OFFSET_UNSPECIFIED) { + ret = 0; + op = malloc(sizeof(struct cpu_offset), M_IFLIB, + M_NOWAIT | M_ZERO); + if (op == NULL) { + device_printf(ctx->ifc_dev, + "allocation for cpu offset failed.\n"); + } else { + op->offset = qc; + op->refcount = 1; + CPU_COPY(&ctx->ifc_cpus, &op->set); + SLIST_INSERT_HEAD(&cpu_offsets, op, entries); + } + } + mtx_unlock(&cpu_offset_mtx); + + return (ret); +} + +static void +unref_ctx_core_offset(if_ctx_t ctx) +{ + struct cpu_offset *op, *top; + + mtx_lock(&cpu_offset_mtx); + SLIST_FOREACH_SAFE(op, &cpu_offsets, entries, top) { + if (CPU_CMP(&ctx->ifc_cpus, &op->set) == 0) { + MPASS(op->refcount > 0); + op->refcount--; + if (op->refcount == 0) { + SLIST_REMOVE(&cpu_offsets, op, cpu_offset, entries); + free(op, M_IFLIB); + } + break; + } + } + mtx_unlock(&cpu_offset_mtx); +} + int iflib_device_register(device_t dev, void *sc, if_shared_ctx_t sctx, if_ctx_t *ctxp) { @@ -4613,6 +4693,11 @@ iflib_device_register(device_t dev, void *sc, if_share goto fail_queues; /* + * Now that we know how many queues there are, get the core offset. + */ + ctx->ifc_sysctl_core_offset = get_ctx_core_offset(ctx); + + /* * Group taskqueues aren't properly set up until SMP is started, * so we disable interrupts until we can handle them post * SI_SUB_SMP. @@ -5037,6 +5122,7 @@ iflib_device_deregister(if_ctx_t ctx) iflib_rx_structures_free(ctx); if (ctx->ifc_flags & IFC_SC_ALLOCATED) free(ctx->ifc_softc, M_IFLIB); + unref_ctx_core_offset(ctx); STATE_LOCK_DESTROY(ctx); free(ctx, M_IFLIB); return (0); @@ -5655,7 +5741,7 @@ find_child_with_core(int cpu, struct cpu_group *grp) * Find the nth "close" core to the specified core * "close" is defined as the deepest level that shares * at least an L2 cache. With threads, this will be - * threads on the same core. If the sahred cache is L3 + * threads on the same core. If the shared cache is L3 * or higher, simply returns the same core. */ static int @@ -5739,10 +5825,13 @@ iflib_irq_set_affinity(if_ctx_t ctx, if_irq_t irq, ifl const char *name) { device_t dev; - int err, cpuid, tid; + int co, cpuid, err, tid; dev = ctx->ifc_dev; - cpuid = find_nth(ctx, qid); + co = ctx->ifc_sysctl_core_offset; + if (ctx->ifc_sysctl_separate_txrx && type == IFLIB_INTR_TX) + co += ctx->ifc_softc_ctx.isc_nrxqsets; + cpuid = find_nth(ctx, qid + co); tid = get_core_offset(ctx, type, qid); MPASS(tid >= 0); cpuid = find_close_core(cpuid, tid); @@ -6344,6 +6433,13 @@ iflib_add_device_sysctl_pre(if_ctx_t ctx) SYSCTL_ADD_U16(ctx_list, oid_list, OID_AUTO, "tx_abdicate", CTLFLAG_RWTUN, &ctx->ifc_sysctl_tx_abdicate, 0, "cause tx to abdicate instead of running to completion"); + ctx->ifc_sysctl_core_offset = CORE_OFFSET_UNSPECIFIED; + SYSCTL_ADD_U16(ctx_list, oid_list, OID_AUTO, "core_offset", + CTLFLAG_RDTUN, &ctx->ifc_sysctl_core_offset, 0, + "offset to start using cores at"); + SYSCTL_ADD_U8(ctx_list, oid_list, OID_AUTO, "separate_txrx", + CTLFLAG_RDTUN, &ctx->ifc_sysctl_separate_txrx, 0, + "use separate cores for TX and RX"); /* XXX change for per-queue sizes */ SYSCTL_ADD_PROC(ctx_list, oid_list, OID_AUTO, "override_ntxds",