From: George Neville-Neil
To: "hackers@freebsd.org"
Cc: "devsummit@freebsd.org"
Subject: Network Receive Performance Working Group
Date: Sun, 9 Jun 2013 17:04:49 -0400
Message-Id: <8537DE82-46F4-4E11-AECA-42F118AB179F@neville-neil.com>

Howdy,

At the Network Receive Performance working group at BSDCan we covered a narrower set of topics than we normally do, which seems to have resulted in a
reasonably sized work list for improving our systems in this area. The main issues relate to getting a good API that addresses multi-queue NICs. The notes are on the wiki page as well as reproduced here.

Best,
George

https://wiki.freebsd.org/201305DevSummit/NetworkReceivePerformance

The discussion opened with an attempt to constrain the problem we were trying to solve, including pointing out that any KPI/API suggested needed to be achievable in the next six months.

Some of the existing solutions to the problem of talking to hardware with multiple queues, which all high-end NICs currently have, were:

• Connection Groups
    • Not really a KPI
    • RSS vs. Flow Table is an issue to solve; we have things for the former, but little for the latter
    • Socket affinity is also an issue
• NAPI
    • This is an API in Linux. It uses upcalls.
• Flow table mapping. Chelsio may have some of this.
• SRIO
• VLL Cloner

There are several ways to map flows, including: 4 tuple, MAC filter, arbitrary offset. An API that only handles offset, length, value is too simple from the standpoint of getting the right data into the hardware. We need something richer on the kernel side of the API so that driver writers don't have to figure out our intentions.

Some methods that a good KPI/API ought to have include:

• Query Device for information about its queues, including how many exist, and how they are mapped to other resources, including CPU and memory
• Map CPUID to a Flow
• Setup RSS
• Request RxRing local memory
• Solaris Mapping API might be a way to go (http://www.oracle.com/technetwork/articles/servers-storage-admin/crossbowsetup-191326.pdf)

Some consumers of such an API include: performance, affinity, virtualization, policy, kernel bypass, QoS, and VIMAGE.
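
To make the flow-mapping and query points above a bit more concrete, here is a rough userland sketch in C. Everything in it is hypothetical: the names (flow_match, rxq_info, nic_dev, nic_query_nqueues, nic_query_queue, nic_map_flow_to_cpu) do not exist in FreeBSD, and the "device" is just an in-memory table. It only tries to show two of the ideas from the notes: a typed flow description (4 tuple, MAC filter, or raw offset/length/value) that keeps the caller's intent visible to the driver, and a query interface that reports how queues map to CPU and memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: none of these names exist in FreeBSD. */

/* The three ways of mapping a flow mentioned in the notes. */
enum flow_match_kind { FLOW_MATCH_4TUPLE, FLOW_MATCH_MAC, FLOW_MATCH_OFFSET };

/* A typed match, so the driver sees intent rather than raw bytes. */
struct flow_match {
	enum flow_match_kind kind;
	union {
		struct { uint32_t src_ip, dst_ip; uint16_t src_port, dst_port; } tuple;
		struct { uint8_t addr[6]; } mac;
		struct { uint16_t offset; uint8_t len; uint8_t value[8]; } raw;
	} u;
};

/* What a "query device" call might report for one receive queue. */
struct rxq_info {
	int queue_id;
	int cpu_id;        /* CPU that services this queue */
	int mem_domain;    /* memory domain backing the RX ring */
	int has_flow;      /* non-zero once a flow has been steered here */
	struct flow_match flow;
};

/* Stand-in for a multi-queue NIC; a real driver would own this state. */
struct nic_dev {
	int nqueues;
	struct rxq_info queues[8];
};

/* Query: how many receive queues does the device have? */
int
nic_query_nqueues(const struct nic_dev *dev)
{
	return (dev->nqueues);
}

/* Query: describe one queue, or return NULL if q is out of range. */
const struct rxq_info *
nic_query_queue(const struct nic_dev *dev, int q)
{
	if (q < 0 || q >= dev->nqueues)
		return (NULL);
	return (&dev->queues[q]);
}

/*
 * Steer a flow to the first queue serviced by cpu_id.
 * Returns the queue id, or -1 if no queue is bound to that CPU.
 */
int
nic_map_flow_to_cpu(struct nic_dev *dev, const struct flow_match *m, int cpu_id)
{
	assert(dev != NULL && m != NULL);
	for (int q = 0; q < dev->nqueues; q++) {
		if (dev->queues[q].cpu_id == cpu_id) {
			dev->queues[q].flow = *m;
			dev->queues[q].has_flow = 1;
			return (q);
		}
	}
	return (-1);
}
```

A caller would fill in a flow_match with kind FLOW_MATCH_4TUPLE, call nic_map_flow_to_cpu() with the CPU its consumer thread runs on, and then use nic_query_queue() to see which queue and memory domain it ended up with. The tagged union is just one possible shape; the point is that a driver handed a 4 tuple does not have to reverse-engineer it from an offset/length/value triple.
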

We have two patches, for different bits, to start from, including Vijay's [RobertWatson] and Randall's [RandallStewart], [GeorgeNevilleNeil].

We need quite a few things, including:

• Per connection flow table
• Describing queues in the stack such that we can expose interesting parts via netstat.
• Packet Batching. This was not overwhelmingly popular.

A straw person API includes:

• MBUF Flag
• Hash Value
    • The whole thing may be used as opaque
    • Used by the stack for inpcb
• Get number of buckets
• Map bucket to RSS
• Map queue/ithread to CPU
• Get width of the hash
• RSS get CPU
• RSS get hash algo
• Pick hash inputs
• Get and set key
• Rebalance
• Software hash table
• Query queue length
• Get queue affinity
• Set mask (CPUSET) on socket
• Set policy on CPU/socket
• Queue event reporting
• Load distribution stats
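
For anyone who wants something to poke at, a few of the straw person entries above (set key, hash value, get number of buckets, get width of the hash, map bucket to CPU, RSS get CPU) could look roughly like the userland C sketch below. All names here (rss_state, rss_set_key, rss_hash, and so on) are made up for illustration and are not an existing FreeBSD KPI. The hash is a software Toeplitz hash, which is the algorithm commonly used for RSS; the bucket count of 8 and the simple mask-based bucket selection are assumptions of the sketch, not something the working group settled on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: these names are not an existing FreeBSD KPI. */

#define RSS_KEY_MAX	40
#define RSS_NBUCKETS	8	/* assumed; must be a power of two */

struct rss_state {
	uint8_t	key[RSS_KEY_MAX];
	size_t	keylen;				/* 0 until a key is set */
	int	bucket_cpu[RSS_NBUCKETS];	/* bucket -> CPU mapping */
};

/* "Get and set key": install a hash key; leaves state untouched on error. */
int
rss_set_key(struct rss_state *rs, const uint8_t *key, size_t len)
{
	if (len < 4 || len > RSS_KEY_MAX)
		return (-1);
	memcpy(rs->key, key, len);
	rs->keylen = len;
	return (0);
}

/*
 * "Hash value": software Toeplitz hash over the chosen inputs.
 * For every set bit in the input, XOR in the 32-bit window of the key
 * that starts at that bit position. Key bits past the end read as zero.
 */
uint32_t
rss_hash(const struct rss_state *rs, const uint8_t *data, size_t len)
{
	uint32_t hash = 0, v;

	if (rs->keylen < 4)
		return (0);
	v = ((uint32_t)rs->key[0] << 24) | ((uint32_t)rs->key[1] << 16) |
	    ((uint32_t)rs->key[2] << 8) | (uint32_t)rs->key[3];
	for (size_t i = 0; i < len; i++) {
		for (int b = 0; b < 8; b++) {
			if (data[i] & (1u << (7 - b)))
				hash ^= v;
			/* Slide the key window left by one bit. */
			v <<= 1;
			if (i + 4 < rs->keylen &&
			    (rs->key[i + 4] & (1u << (7 - b))))
				v |= 1;
		}
	}
	return (hash);
}

/* "Get number of buckets" and "Get width of the hash". */
int rss_get_nbuckets(void) { return (RSS_NBUCKETS); }
int rss_get_hash_width(void) { return (32); }

/* Reduce a hash to a bucket index using the low-order bits. */
int
rss_hash_to_bucket(uint32_t hash)
{
	return ((int)(hash & (RSS_NBUCKETS - 1)));
}

/* "Map bucket to CPU": returns 0 on success, -1 on a bad bucket. */
int
rss_map_bucket_to_cpu(struct rss_state *rs, int bucket, int cpu)
{
	if (bucket < 0 || bucket >= RSS_NBUCKETS)
		return (-1);
	rs->bucket_cpu[bucket] = cpu;
	return (0);
}

/* "RSS get CPU": which CPU should handle a packet with this hash? */
int
rss_get_cpu(const struct rss_state *rs, uint32_t hash)
{
	return (rs->bucket_cpu[rss_hash_to_bucket(hash)]);
}
```

In this shape the stack can treat the 32-bit value as opaque (for example as an inpcb lookup key) while still being able to ask which bucket and CPU it lands on. Rebalancing would then amount to rewriting bucket_cpu[] entries rather than touching the key or the hash.
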