From: "Bruce M. Simpson"
To: Andre Oppermann
Cc: freebsd-net@freebsd.org, Tiffany Snyder
Date: Wed, 02 Jan 2008 23:00:06 +0000
Subject: Re: Routing SMP benefit
Message-ID: <477C1776.2080002@FreeBSD.org>

Andre Oppermann wrote:
> So far the PPS rate limit has primarily been the cache miss penalties
> on the packet access.
> Multiple CPUs can help here, of course, for bidirectional traffic.
> Hardware-based packet header cache prefetching, as done by some
> embedded MIPS-based network processors, at least doubles the
> performance. Intel has something like this for a couple of chipset and
> network chip combinations. We don't support that feature yet, though.

What sort of work is needed in order to support header prefetch?

>
> Many of the things you mention here are planned for FreeBSD 8.0 in the
> same or different form. Work in progress is the separation of the ARP
> table from the kernel routing table. If we can generally prevent
> references to radix nodes, almost all locking can be done away with.
> Instead, only a global rmlock (read-mostly lock) could govern the
> entire routing table. Obtaining the rmlock for reading is essentially
> free.

This is exactly what I'm thinking; this feels like the right way
forward. A single rwlock should be fine: route table updates should
generally only happen from one process, and thus from a single thread,
at any given time.

> Table changes
> are very infrequent compared to lookups (roughly 700,000 lookups to
> 300-400 changes) in default-free Internet routing. The radix trie
> nodes are rather big and could use some more trimming to make them fit
> in a single cache line. I already removed some stuff a couple of years
> ago, and more can be done.
>
> It's very important to keep this in mind: "profile, don't speculate".

Beware, though, that functionality isn't sacrificed in pursuit of this.
For example, it would be very, very useful to be able to merge the
multicast routing implementation with the unicast one, with the proviso,
of course, that mBGP requires that RPF checks can be performed against a
separate set of FIB entries from the unicast FIB.

That is workable, of course, if the next-hops themselves are held in a
container referenced separately from the radix node, such as a simple
linked list as in the OpenBSD code. If we ensure the parent radix trie
node object fits in a cache line, then that's fine.
[I am looking at some stuff in the dynamic/ad-hoc/mesh space which is
really going to need support for multipath similar to this.]

later
BMS