From: Paul <paul@gtcomm.net>
To: FreeBSD Net <freebsd-net@freebsd.org>
Date: Sun, 29 Jun 2008 04:04:29 -0400
Subject: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

This is just a question, but who can get more than 400k pps forwarding performance? I have tested FreeBSD 6/7/8 so far with many different configs (all using an Intel PCI-Express NIC and SMP). 7-stable and 8 (current) seem to be the fastest, and they always hit this ceiling of 400k pps. As soon as it hits that I get errors galore: receive-no-buffers, missed packets, rx overruns. It's because 'em0 taskq' is at 90% CPU or so. Now, while this is happening I have two CPUs 100% idle, and the other two CPUs are at about 60% and 20%. So why in the world can't it use more CPUs?

Simple test setup: packet generator on em0, destination out em1. You have to have ip forwarding and fastforwarding on (fastforwarding definitely makes a big difference, another 100k pps or so; without it I can barely hit 300k). Packets are TCP, with randomized source addresses, randomized ports for src and dst, and a single destination IP.

I even tried the yandex driver in FreeBSD 6, but it could barely get 200k pps, it had a lot of weird issues, and FreeBSD 6 couldn't hit 400k pps by itself. I am not using polling; that seems to make no difference (I tried that too).

So, the question: what can I do for more performance (SMP)? Are there any good kernel options?

If I disable ip forwarding I can do 750k pps with no errors, because the packets aren't going anywhere; em0 taskq CPU usage is less than half of what it is when forwarding. So obviously the issue is somewhere in the forwarding path, and fastforwarding greatly helps!! See below.
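For clarity, a minimal sketch of the forwarding knobs in question (these are the standard sysctls on the 6.x/7/8 versions tested here; much later releases folded the fast path in and dropped the second sysctl):

   # enable the normal IP forwarding path
   sysctl net.inet.ip.forwarding=1

   # enable the optimized fast-forwarding path
   sysctl net.inet.ip.fastforwarding=1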
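A note on how the displays below were produced (my best guess at the exact invocations; the column layout matches the per-interface netstat counter view, and the process listing matches a threaded system-wide top view):

   # per-second in/out counters for em0
   netstat -w 1 -I em0

   # kernel threads, incl. per-CPU idle threads and the em taskqueues
   top -SH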
forwarding off:

            input          (em0)           output
   packets  errs      bytes    packets  errs      bytes colls
    757223     0   46947830          1     0        226     0
    753551     0   46720166          1     0        178     0
    756359     0   46894262          1     0        178     0
    757570     0   46969344          1     0        178     0
    753724     0   46730830          1     0        178     0
    745372     0   46213130          1     0        178     0

(For the forwarding tests I had to slow down the packet generation to about 420-430 kpps.)

forwarding on:

            input          (em0)           output
   packets  errs      bytes    packets  errs      bytes colls
    285918 151029   17726936        460     0      25410     0
    284929 146151   17665602        417     0      22642     0
    284253 147000   17623690        442     0      23884     0
    285438 147765   17697160        448     0      24316     0
    286582 147171   17768088        456     0      24748     0
    287194 147088   17806032        422     0      22912     0
    285812 141713   17720348        440     0      23884     0
    284958 137579   17667412        457     0      25104     0

fastforwarding on:

            input          (em0)           output
   packets  errs      bytes    packets  errs      bytes colls
    399795  22790   24787310        459     0      25130     0
    397425  25254   24640354        434     0      23560     0
    403223  26937   24999830        431     0      23452     0
    396587  21431   24588398        467     0      25288     0
    400970  25776   24860144        459     0      24910     0
    397819  23657   24664782        432     0      23452     0
    406222  27418   25185768        432     0      23506     0
    406718  12407   25216520        461     0      25018     0

  PID USERNAME  PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   11 root      171 ki31     0K    64K CPU1   1  29:24 100.00% {idle: cpu1}
   11 root      171 ki31     0K    64K RUN    0  28:46 100.00% {idle: cpu0}
   11 root      171 ki31     0K    64K CPU3   3  24:32  84.62% {idle: cpu3}
    0 root      -68    0     0K   128K CPU2   2  12:59  84.13% {em0 taskq}
    0 root      -68    0     0K   128K -      3   2:12  19.92% {em1 taskq}
   11 root      171 ki31     0K    64K RUN    2  19:46  19.63% {idle: cpu2}

Well, if anything, at least it's a good demonstration of the difference fastforwarding makes!! :)

My configuration (summarized in the sketch at the end of this message):

- options NO_ADAPTIVE_MUTEXES  ## improve routing performance?
- options STOP_NMI             # stop CPUs using NMI instead of IPI
- no IPv6, no firewall loaded, no netgraph
- HZ is 4000
- em driver is set to 4096 receive descriptors
- using VLAN devices (em1 output)

Tested on Xeon and Opteron processors; I don't have exact results for the Xeon. The results above are from a dual Opteron 2212 running FreeBSD 8.0-CURRENT #0: Sat Jun 28 23:37:39 CDT 2008.

Well, I'm curious about the results of others. Thanks for reading!! :)
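For anyone who wants to reproduce the setup, a rough sketch of the configuration above in one place, assuming the usual kernel-config / loader.conf / rc.conf mechanisms (the hw.em.rxd tunable name is an assumption based on the em(4) driver of this era):

   # custom kernel config:
   options HZ=4000              # 4000 Hz clock
   options NO_ADAPTIVE_MUTEXES  # improve routing performance?
   options STOP_NMI             # stop CPUs using NMI instead of IPI

   # /boot/loader.conf:
   hw.em.rxd="4096"             # em(4) receive descriptors (assumed tunable name)

   # /etc/rc.conf:
   gateway_enable="YES"         # turns on net.inet.ip.forwarding at boot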