From: Andre Oppermann
Date: Wed, 02 Jan 2008 23:46:12 +0100
To: Tiffany Snyder
Cc: freebsd-net@freebsd.org
Subject: Re: Routing SMP benefit
Message-ID: <477C1434.80106@freebsd.org>
References: <43B45EEF.6060800@x-trader.de> <43B47CB5.3C0F1632@freebsd.org>

Tiffany Snyder wrote:
> Hi Andre,
> are those numbers for small (64 bytes) packets? Good job on pushing
> the base numbers higher on the same HW.

Yes, 64 bytes. Haven't measured lately, but I assume PCI-E hardware
instead of PCI-X could push quite some more.

> What piqued my attention was the note that our forwarding
> performance doesn't scale with multiple CPUs. Which means there's a lot of
> work to be done :-) Have we taken a look at OpenSolaris' Surya
> (http://www.opensolaris.org/os/community/networking/surya-design.pdf)
> project?
> They allow multiple readers/single writer on the radix_node_head
> (and not a mutex as we do) and we may be able to do the same to gain some
> parallelism. There are other things in Surya that exploit multiple CPUs.
> It's definitely worth a read. DragonFlyBSD seems to achieve parallelism by
> classifying packets into flows and then redirecting the flows to different
> CPUs. OpenSolaris also does something similar. We can definitely think along
> those lines.

So far the PPS rate limit has primarily been the cache miss penalties
on the packet access. Multiple CPUs can help here of course for
bidirectional traffic. Hardware based packet header cache prefetching
as done by some embedded MIPS based network processors at least doubles
the performance. Intel has something like this for a couple of chipset
and network chip combinations. We don't support that feature yet though.

Many of the things you mention here are planned for FreeBSD 8.0 in the
same or different form. Work in progress is the separation of the ARP
table from the kernel routing table. If we can prevent references to
radix nodes generally, almost all locking can be done away with. Instead
only a global rmlock (read-mostly) could govern the entire routing table.
Obtaining the rmlock for reading is essentially free. Table changes
are very infrequent compared to lookups (like 700,000 to 300-400) in
default-free Internet routing. The radix trie nodes are rather big
and could use some more trimming to make them fit a single cache line.
I've already removed some stuff a couple of years ago and more can be
done.

It's very important to keep this in mind: "profile, don't speculate".

For example, while the DragonFly model may seem good in theory, it so far
did not show itself to be faster. Back when I had the Agilent N2X network
tester, DFBSD was the poorest performer in the test set of FreeBSD, OpenBSD
and DFBSD.
Haven't tested Solaris yet, and haven't retested the others either, but
until that is done we should not jump to conclusions. At the time we
were more than two to three times faster than the other BSDs.

> NOTE:
> 1) I said multiple instead of dual CPUs on purpose.
> 2) I mentioned OpenSolaris and DragonFlyBSD as examples and to acknowledge
> the work they are doing and to show that FreeBSD is far behind and is losing
> its lustre on continuing to be the networking platform of choice.

Like I said. Don't jump to conclusions without real testing and
profiling. Reality may turn out to be different from theory. ;)

-- 
Andre

> Thanks,
>
> Tiffany.
>
>
> On 12/29/05, Andre Oppermann wrote:
>
>> Markus Oestreicher wrote:
>>> Currently running a few routers on 5-STABLE I have read the
>>> recent changes in the network stack with interest.
>> You should run 6.0R. It contains many improvements over 5-STABLE.
>>
>>> A few questions come to my mind:
>>>
>>> - Can a machine that mainly routes packets between two em(4)
>>> interfaces benefit from a second CPU and SMP kernel? Can both
>>> CPUs process packets from the same interface in parallel?
>> My testing has shown that a machine can benefit from it but not
>> much in the forwarding performance. The main benefit is the
>> prevention of livelock if you have very high packet loads. The
>> second CPU on SMP keeps on doing all userland tasks and running
>> routing protocols. Otherwise your BGP sessions or OSPF hellos
>> would stop and remove you from the routing cloud.
>>
>>> - From reading the lists it appears that net.isr.direct
>>> and net.ip.fastforwarding are doing similar things. Should
>>> they be used together or rather not?
>> net.inet.ip.fastforwarding has precedence over net.isr.direct and
>> enabling both at the same time doesn't gain you anything. Fastforwarding
>> is about 30% faster than all other methods available, including
>> polling.
>> On my test machine with two em(4) and an AMD Opteron 852
>> (2.6GHz) I can route 580'000 pps with zero packet loss on -CURRENT.
>> An upcoming optimization that will go into -CURRENT in the next
>> few days pushes that to 714'000 pps. Further optimizations are
>> underway to make a stock kernel do close to or above 1'000'000 pps
>> on the same hardware.
>>
>> --
>> Andre
>> _______________________________________________
>> freebsd-net@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"
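
For a sense of scale, the packet rates quoted in this thread can be turned
into a per-packet CPU cycle budget. The following is only a rough sketch,
not a measurement. It takes the 2.6 GHz clock and the 580'000, 714'000 and
1'000'000 pps figures from the thread; the gigabit link speed, the 20 bytes
of per-frame wire overhead and the function names are assumptions added
here for illustration.

```python
# Per-packet budgets implied by the forwarding rates quoted in the thread.
# Assumptions (not stated in the thread): gigabit Ethernet links and
# 20 bytes of wire overhead per frame (7 preamble + 1 SFD + 12 inter-frame gap).

CPU_HZ = 2_600_000_000        # AMD Opteron 852 at 2.6 GHz, as quoted
WIRE_OVERHEAD_BYTES = 20      # assumed Ethernet framing overhead on the wire


def line_rate_pps(link_bps, frame_bytes):
    """Whole frames per second that fit on a link of link_bps bits/s."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps // bits_on_wire


def cycles_per_packet(cpu_hz, pps):
    """Average CPU cycles available per packet on one CPU at a given rate."""
    return cpu_hz / pps


if __name__ == "__main__":
    print("64-byte line rate at 1 Gbit/s:", line_rate_pps(1_000_000_000, 64), "pps")
    for pps in (580_000, 714_000, 1_000_000):
        print(pps, "pps ->", round(cycles_per_packet(CPU_HZ, pps)), "cycles/packet")
```

At 1'000'000 pps a single 2.6 GHz CPU has 2600 cycles per packet on
average, so a few cache misses on the packet header or the radix nodes
take a noticeable share of that budget. That fits the remark above that
cache miss penalties have been the main limit so far.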