Date: Sat, 20 Aug 2011 12:38:26 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: Lev Serebryakov
Cc: freebsd-arch@freebsd.org
Subject: 10gbps scalability (was: Re: FreeBSD problems and preliminary ways to solve)

On Sat, 20 Aug 2011, Lev Serebryakov wrote:

>> Can you honestly say the same about handling line rate packet forwarding
>> for multiple 10G cards?
>
> I agree with you.  I've not said that 10G routing is very important for
> many users.  My comment about 10G was an answer to the statement that "The
> niche for routers & traffic analysis is still ours."  I wanted to say that
> it may be so now, but not for long.

Part of the key here will be reworking things like ipfw(4) and pf(4) to scale 
better than they do currently.  For pf(4), it's particularly important that we 
align hardware work distribution via RSS with state management for TCP 
connections.  I've been working on this for the base system TCP implementation 
over the last few years, and got most of it into 9.x (but not the actual RSS 
driver interface, as I wasn't convinced it was a stable KPI in the form in 
which I prototyped it).  Post-9.0, I'll try to get the RSS KPI cleaned up so 
that we can merge it and get our device drivers updated.

There's also related work-in-progress of mine that teaches the network stack 
to program NIC filters (usually implemented as TCAMs on Chelsio hardware, or 
hardware hash tables on Solarflare hardware) with connection affinity 
information from the stack.  My plan is to work on making this substantially 
more real once the RSS patches are in.  (Those are, themselves, fairly minor: 
we already have connection groups in 9.0, and the RSS changes simply cause the 
existing software-side hash tables to align with hardware-side hashing; the 
tricky bit is a sustainable KPI for device driver writers.)
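To make the alignment idea concrete, here is a rough userland sketch of the 
Toeplitz hash that RSS-capable NICs compute over a TCP/IPv4 4-tuple, and of 
taking a bucket from its low-order bits.  This is purely illustrative and not 
the in-tree KPI; names such as rss_hash_tcp4() and RSS_BUCKETS are invented 
for the example, and the key shown is the well-known example key from the RSS 
specification rather than whatever key a given NIC has actually been 
programmed with:

/*
 * Illustrative sketch: compute the Microsoft Toeplitz RSS hash over a
 * TCP/IPv4 4-tuple and derive a bucket from its low-order bits, the same
 * property a software connection-group table can rely on to agree with
 * the NIC about which CPU "owns" a given connection.
 */
#include <netinet/in.h>
#include <arpa/inet.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define RSS_BUCKETS     32      /* illustrative; a power of two for the mask */

/* Example key from the RSS specification; real systems use the NIC's key. */
static const uint8_t rss_key[40] = {
        0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
        0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
        0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
        0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
        0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/* Toeplitz: for every set input bit, XOR in a 32-bit sliding window of the key. */
static uint32_t
toeplitz_hash(const uint8_t *key, size_t keylen, const uint8_t *data, size_t len)
{
        uint32_t hash = 0, window;
        size_t i;
        int bit;

        /* Initial 32-bit window is the first four key bytes. */
        window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
            ((uint32_t)key[2] << 8) | key[3];
        for (i = 0; i < len; i++) {
                for (bit = 7; bit >= 0; bit--) {
                        if (data[i] & (1U << bit))
                                hash ^= window;
                        window <<= 1;   /* slide the key window one bit left */
                        if (i + 4 < keylen && (key[i + 4] & (1U << bit)))
                                window |= 1;
                }
        }
        return (hash);
}

/* Hash a TCP/IPv4 4-tuple in the order RSS defines: saddr, daddr, sport, dport. */
static uint32_t
rss_hash_tcp4(struct in_addr src, struct in_addr dst, uint16_t sport, uint16_t dport)
{
        uint8_t buf[12];

        memcpy(&buf[0], &src.s_addr, 4);        /* already network byte order */
        memcpy(&buf[4], &dst.s_addr, 4);
        buf[8] = sport >> 8;  buf[9] = sport & 0xff;
        buf[10] = dport >> 8; buf[11] = dport & 0xff;
        return (toeplitz_hash(rss_key, sizeof(rss_key), buf, sizeof(buf)));
}

int
main(void)
{
        struct in_addr src, dst;
        uint32_t hash;

        inet_pton(AF_INET, "192.0.2.1", &src);
        inet_pton(AF_INET, "198.51.100.2", &dst);
        hash = rss_hash_tcp4(src, dst, 40000, 80);

        /*
         * Both the NIC's indirection table and a software connection-group
         * table can take their index from the low-order bits of the same
         * hash, so both sides steer this connection to the same bucket/CPU.
         */
        printf("hash 0x%08" PRIx32 " -> bucket %" PRIu32 "\n",
            hash, hash & (RSS_BUCKETS - 1));
        return (0);
}

The property that matters is simply that the software lookup path derives its 
bucket from the same hash function and key the NIC uses, so packets for a 
given connection arrive on the CPU whose connection group already holds that 
connection's state, and the fast path avoids cross-CPU contention.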
These are closely related to the issue of userspace networking, which Luigi is 
starting to explore with netmap.  Ideally, you could use the same NIC for both 
kernel network stack traffic and userspace applications, using hardware 
filters to decide whether individual packets go to a descriptor ring in the 
kernel or in userspace.  Solarflare's Open Onload is an interesting potential 
model there, although perhaps not the exact model we want (they rely on 
network stacks shared between kernel and userspace, and for most of our 
purposes, less sharing is not only sufficient, but perhaps better).

Robert