From owner-freebsd-performance@FreeBSD.ORG Tue Jul 3 16:12:24 2012
Delivered-To: performance@freebsd.org
Received: from mx2.freebsd.org (mx2.freebsd.org [69.147.83.53]) by hub.freebsd.org (Postfix) with ESMTP id 13DD6106566C; Tue, 3 Jul 2012 16:12:24 +0000 (UTC) (envelope-from melifaro@FreeBSD.org)
Received: from dhcp170-36-red.yandex.net (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx2.freebsd.org (Postfix) with ESMTP id 333D114DE4F; Tue, 3 Jul 2012 16:12:21 +0000 (UTC)
Message-ID: <4FF319A2.6070905@FreeBSD.org>
Date: Tue, 03 Jul 2012 20:11:14 +0400
From: "Alexander V. Chernikov"
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:12.0) Gecko/20120511 Thunderbird/12.0.1
MIME-Version: 1.0
To: net@freebsd.org, hackers@freebsd.org, performance@freebsd.org
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: FreeBSD 10G forwarding performance @Intel
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning
X-List-Received-Date: Tue, 03 Jul 2012 16:12:24 -0000

Hello list!

I'm quite stuck with bad forwarding performance on many FreeBSD boxes doing firewalling.

Typical configuration is E5645 / E5675 @ Intel 82599 NIC. HT is turned off. (Configs and tunables are below.)
I'm mostly concerned with unidirectional traffic flowing to a single interface (i.e. using a single route entry).

In most cases the system can forward no more than 700 (or 1400) kpps, which is quite a bad number (Linux does, say, 5 Mpps on nearly the same hardware).

Test scenario:

Ixia XM2 (traffic generator) <> ix0 (FreeBSD).

Ixia sends 64-byte IP packets from vlan10 (10.100.0.64 - 10.100.0.156) to destinations in vlan11 (10.100.1.128 - 10.100.1.192). Static ARP entries are configured for all destination addresses. The offered traffic level is slightly above or slightly below what the system can forward.
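For scale, a back-of-the-envelope sketch (not part of the original post): assuming the Ixia sends minimum-size 64-byte Ethernet frames, each frame also costs 20 bytes of preamble, start-of-frame delimiter and inter-frame gap on the wire, so one 10GbE port tops out at roughly 14.88 Mpps. The helper name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes each frame costs on the wire in addition to the frame itself:
 * 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap. */
#define ETH_WIRE_OVERHEAD 20u

/* Hypothetical helper: maximum frames per second on a link of link_bps bit/s
 * when every frame is frame_len bytes long (including the 4-byte FCS). */
static uint64_t eth_line_rate_pps(uint64_t link_bps, uint32_t frame_len)
{
    assert(frame_len >= 64); /* minimum Ethernet frame size */
    return link_bps / ((uint64_t)(frame_len + ETH_WIRE_OVERHEAD) * 8u);
}
```

With these assumptions eth_line_rate_pps(10000000000ULL, 64) gives 14880952, so the 700 kpps to 1.4 Mpps reported here is below a tenth of 10GbE line rate.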
================= Test 1 =======================
Kernel: FreeBSD-8-S r237994, stock drivers, stock routing, no FLOWTABLE, no firewall
Traffic: 1-1 flow (1 src, 1 dst) (this is actually a bit different from the scenario described above)

Result:
            input          (ix0)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
      878k   48k      0        59M       878k     0        56M     0
      874k   48k      0        59M       874k     0        56M     0
      875k   48k      0        59M       875k     0        56M     0

16:41 [0] test15# top -nCHSIzs1 | awk '$5 ~ /(K|SIZE)/ { printf " %7s %2s %6s %10s %15s %s\n", $7, $8, $9, $10, $11, $12}'
   STATE  C   TIME        CPU         COMMAND
    CPU6  6  17:28    100.00%      kernel{ix0 que}
    CPU9  9  20:42     60.06%    intr{irq265: ix0:que

16:41 [0] test15# vmstat -i | grep ix0
irq256: ix0:que 0         500796        167
irq257: ix0:que 1        6693573       2245
irq258: ix0:que 2        2572380        862
irq259: ix0:que 3        3166273       1062
irq260: ix0:que 4        9691706       3251
irq261: ix0:que 5       10766434       3611
irq262: ix0:que 6        8933774       2996
irq263: ix0:que 7        5246879       1760
irq264: ix0:que 8        3548930       1190
irq265: ix0:que 9       11817986       3964
irq266: ix0:que 10        227561         76
irq267: ix0:link               1          0

Note that the system is using 2 cores to forward, so 12 cores should be able to forward 4+ Mpps, which is more or less consistent with the Linux results. Note that interrupts on all queues are (as far as I understand from the fact that AIM is turned off and the interrupt rates are the same as in the previous test). Additionally, despite hw.intr_storm_threshold = 200k, I'm constantly getting the "interrupt storm detected on "irq265:"; throttling interrupt source" message.
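The "4+ Mpps" estimate above is a linear extrapolation: about 875 kpps over two busy cores is roughly 437 kpps per core, multiplied by 12 cores. A minimal sketch of that arithmetic; it assumes per-core throughput stays constant as cores are added (no shared locks or contended cache lines), which Test 2 suggests does not hold on this box. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: naive linear-scaling estimate of forwarding rate.
 * Assumes every added core forwards as fast as the measured ones,
 * i.e. no contention between cores. */
static uint64_t linear_scaling_pps(uint64_t measured_pps, unsigned busy_cores,
                                   unsigned total_cores)
{
    assert(busy_cores > 0);
    return measured_pps / busy_cores * total_cores;
}
```

linear_scaling_pps(875000, 2, 12) yields 5250000, about 5.25 Mpps, in line with the Linux figure quoted above.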
================= Test 2 =======================
Kernel: FreeBSD-8-S r237994, stock drivers, stock routing, no FLOWTABLE, no firewall
Traffic: Unidirectional many-2-many

16:20 [0] test15# netstat -I ix0 -hw 1
            input          (ix0)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
      507k  651k      0        74M       508k     0        32M     0
      506k  652k      0        74M       507k     0        28M     0
      509k  652k      0        74M       508k     0        37M     0

16:28 [0] test15# top -nCHSIzs1 | awk '$5 ~ /(K|SIZE)/ { printf " %7s %2s %6s %10s %15s %s\n", $7, $8, $9, $10, $11, $12}'
   STATE  C   TIME        CPU         COMMAND
   CPU10  6   0:40    100.00%      kernel{ix0 que}
    CPU2  2  11:47     84.86%    intr{irq258: ix0:que
    CPU3  3  11:50     81.88%    intr{irq259: ix0:que
    CPU8  8  11:38     77.69%    intr{irq264: ix0:que
    CPU7  7  11:24     77.10%    intr{irq263: ix0:que
    WAIT  1  10:10     74.76%    intr{irq257: ix0:que
    CPU4  4   8:57     63.48%    intr{irq260: ix0:que
    CPU6  6   8:35     61.96%    intr{irq262: ix0:que
    CPU9  9  14:01     60.79%    intr{irq265: ix0:que
     RUN  0   9:07     59.67%    intr{irq256: ix0:que
    WAIT  5   6:13     43.26%    intr{irq261: ix0:que
   CPU11 11   5:19     35.89%      kernel{ix0 que}
       -  4   3:41     25.49%      kernel{ix0 que}
       -  1   3:22     21.78%      kernel{ix0 que}
       -  1   2:55     17.68%      kernel{ix0 que}
       -  4   2:24     16.55%      kernel{ix0 que}
       -  1   9:54     14.99%      kernel{ix0 que}
    CPU0 11   2:13     14.26%      kernel{ix0 que}

16:07 [0] test15# vmstat -i | grep ix0
irq256: ix0:que 0          13654         15
irq257: ix0:que 1          87043         96
irq258: ix0:que 2          39604         44
irq259: ix0:que 3          48308         53
irq260: ix0:que 4         138002        153
irq261: ix0:que 5         169596        188
irq262: ix0:que 6         107679        119
irq263: ix0:que 7          72769         81
irq264: ix0:que 8          30878         34
irq265: ix0:que 9        1002032       1115
irq266: ix0:que 10         10967         12
irq267: ix0:link               1          0

Note that all cores are loaded more or less evenly, but the result is _worse_. The first reason for this is the mtx_lock which is acquired twice on every lookup (once in in_matroute(), where it can possibly be removed, and once again in rtalloc1_fib()). The latter is addressed by andre@ in r234650. Additionally, although the ithreads are each bound to a single CPU, the kernel "ix0 que" threads are not bound in the stock setup.
However, a configuration with 5 queues and 5 kernel threads bound to different CPUs gives the same bad results.

================= Test 3 =======================
Kernel: FreeBSD-8-S June 4 SVN, +merged ifaddrlock, stock drivers, stock routing, no FLOWTABLE, no firewall

   packets  errs idrops      bytes    packets  errs      bytes colls
      580k   18k      0        38M       579k     0        37M     0
      581k   26k      0        39M       580k     0        37M     0
      580k   24k      0        39M       580k     0        37M     0
................

Enabling ipfw _increases_ performance a bit:

      604k     0      0        39M       604k     0        39M     0
      604k     0      0        39M       604k     0        39M     0
      582k   19k      0        38M       568k     0        37M     0
      527k   81k      0        39M       530k     0        34M     0
      605k    28      0        39M       605k     0        39M     0

================= Test 3.1 =======================
Same as test 3, the only difference is the following:
route add -net 10.100.1.160/27 -iface vlan11

            input          (ix0)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
      543k  879k      0        91M       544k     0        35M     0
      547k  870k      0        91M       545k     0        35M     0
      541k  870k      0        91M       539k     0        30M     0
      952k  565k      0        97M       962k     0        48M     0
      1.2M  228k      0        91M       1.2M     0        92M     0
      1.2M  226k      0        90M       1.1M     0        76M     0
      1.1M  228k      0        91M       1.2M     0        76M     0
      1.2M  233k      0        90M       1.2M     0        76M     0

================= Test 3.2 =======================
Same as test 3, splitting the destination range into 4 smaller route entries:
route add -net 10.100.1.128/28 -iface vlan11
route add -net 10.100.1.144/28 -iface vlan11
route add -net 10.100.1.160/28 -iface vlan11
route add -net 10.100.1.176/28 -iface vlan11

            input          (ix0)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
      1.4M     0      0       106M       1.6M     0       106M     0
      1.8M     0      0       106M       1.6M     0        71M     0
      1.6M     0      0       106M       1.6M     0        71M     0
      1.6M     0      0        87M       1.6M     0        71M     0
      1.6M     0      0       126M       1.6M     0       212M     0

================= Test 3.3 =======================
Same as test 3, splitting the destination range into 16 smaller route entries:

            input          (ix0)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
      1.6M     0      0       118M       1.8M     0       118M     0
      2.0M     0      0       118M       1.8M     0       119M     0
      1.8M     0      0       119M       1.8M     0        79M     0
      1.8M     0      0       117M       1.8M     0       157M     0

================= Test 4 =======================
Kernel: FreeBSD-8-S June 4 SVN, stock drivers, routing patch 1, no FLOWTABLE, no firewall

            input          (ix0)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
      1.8M     0      0       114M       1.9M     0       114M     0
      1.7M     0      0       114M       1.7M     0       114M     0
      1.8M     0      0       114M       1.8M     0       114M     0
      1.7M     0      0       114M       1.7M     0       114M     0
      1.8M     0      0       114M       1.8M     0        74M     0
      1.5M     0      0       114M       1.8M     0        74M     0
        2M     0      0       114M       1.8M     0       194M     0

Patch 1 eliminates mtx_lock from the fastforwarding path entirely, to get an idea of how much performance we can achieve. The result is nearly the same as in Test 3.3.

================= Test 4.1 =======================
Same as test 4, same traffic level, with the firewall enabled and a single allow rule (evaluating RLOCK performance):

22:35 [0] test15# netstat -I ix0 -hw 1
            input          (ix0)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
      1.8M  149k      0       114M       1.6M     0       142M     0
      1.4M  148k      0        85M       1.6M     0       104M     0
      1.8M  149k      0       143M       1.6M     0       104M     0
      1.6M  151k      0       114M       1.6M     0       104M     0
      1.6M  151k      0       114M       1.6M     0       104M     0
      1.4M  152k      0       114M       1.6M     0       104M     0

That is, roughly a 10% performance loss.

================= Test 4.2 =======================
Same as test 4, varying the number of queues.
5 queues, same traffic level:

      1.5M  225k      0       114M       1.5M     0        99M     0

================= Test 4.3 =======================
Same as test 4, HT on, number of queues = 16:

            input          (ix0)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
      2.4M     0      0       157M       2.4M     0       156M     0
      2.4M     0      0       156M       2.4M     0       157M     0

However, enabling the firewall immediately drops the rate to 1.9 Mpps, which is nearly the same as in 4.1 (and a complicated firewall ruleset would probably saturate the HT cores much faster).

================= Test 4.4 =======================
Same as test 4, with the kernel "ix0 que" TX threads bound to specific CPUs (corresponding to RX):

18:02 [0] test15# procstat -ak | grep ix0 | sort -nk 2
   12 100045 intr             irq256: ix0:que
    0 100046 kernel           ix0 que
   12 100047 intr             irq257: ix0:que
    0 100048 kernel           ix0 que          mi_switch sleepq_wait msleep_spin taskqueue_thread_loop fork_exit fork_trampoline
   12 100049 intr             irq258: ix0:que
..
test15# for i in `jot 12 0`; do cpuset -l $i -t $((100046+2*$i)); done

Result:
            input          (ix0)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
      2.1M     0      0       139M         2M     0       193M     0
      2.1M     0      0       139M       2.3M     0       139M     0
      2.1M     0      0       139M       2.1M     0        85M     0
      2.1M     0      0       139M       2.1M     0       193M     0

A considerable increase; however, this works well only for a uniform traffic distribution.

================= Test 5 =======================
Same as test 4, with radix using rmlock (r234648, r234649).

Result: 1.7 Mpps.

================= Test 6 =======================
Same as test 4 + FLOWTABLE.

Result: 1.7 Mpps.

================= Test 7 =======================
Same as test 4, built with GCC 4.7.

Result: No performance gain.

Further investigations:

================= Test 8 =======================
Test 4 setup with the kernel built with LOCK_PROFILING.

17:46 [0] test15# sysctl debug.lock.prof.enable=1 ; sleep 2 ; sysctl debug.lock.prof.enable=0
      920k     0      0        59M       920k     0        59M     0
      875k     0      0        59M       920k     0        59M     0
      628k     0      0        39M       566k     0        45M     0
       79k  2.7M      0       186M        57k     0       6.5M     0
       71k  878k      0        61M        73k     0       4.0M     0
      891k  254k      0        72M       917k     0        54M     0
      920k     0      0        59M       920k     0        59M     0

When profiling is enabled, forwarding performance drops to about 60 kpps. It was enabled for 2 seconds (so only about 130k packets were actually forwarded); the results are attached as a separate file. There are several hundred lock contentions in ixgbe, and that's all.

================= Test 9 =======================
Same as test 4 setup, with hwpmc. Results attached.

================= Test 10 =======================
Kernel: FreeBSD-9-S. No major difference.

My preliminary conclusions:
1) The rte mtx_lock should (and can) be eliminated from the stock kernel (and it can be done more or less easily for in_matroute()).
2) The performance difference between rmlock and rwlock is insignificant (maybe because of 3).
3) There is lock contention between the ixgbe taskq threads and the ithreads. I'm not sure the taskq threads are necessary for packet forwarding, as opposed to traffic generation.

Maybe I'm missing something else (L2 cache misses or other things)?
What else can I do to debug this further?

Relevant files:

http://static.ipfw.ru/files/fbsd10g/0001-no-rt-mutex.patch
http://static.ipfw.ru/files/fbsd10g/kernel.gprof.txt
http://static.ipfw.ru/files/fbsd10g/prof_stats.txt

============= CONFIGS ====================

sysctl.conf:
kern.ipc.maxsockbuf=33554432
net.inet.udp.maxdgram=65535
net.inet.udp.recvspace=16777216
net.inet.tcp.sendbuf_auto=0
net.inet.tcp.recvbuf_auto=0
net.inet.tcp.sendspace=16777216
net.inet.tcp.recvspace=16777216
net.inet.ip.maxfragsperpacket=64
kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.point_to_point=0
kern.random.sys.harvest.interrupt=0
net.inet.ip.forwarding=1
net.inet.ip.fastforwarding=1
net.inet.ip.redirect=0
hw.intr_storm_threshold=20000

loader.conf:
kern.ipc.nmbclusters="512000"
ixgbe_load="YES"
hw.ixgbe.rx_process_limit="300"
hw.ixgbe.nojumbobuf="1"
hw.ixgbe.max_loop="100"
hw.ixgbe.max_interrupt_rate="20000"
hw.ixgbe.num_queues="11"
hw.ixgbe.txd=4096
hw.ixgbe.rxd=4096
kern.hwpmc.nbuffers=2048
debug.debugger_on_panic=1
net.inet.ip.fw.default_to_accept=1

kernel:
cpu          HAMMER
ident        CORE_RELENG_7
options      COMPAT_IA32
makeoptions  DEBUG=-g             # Build kernel with gdb(1) debug symbols
options      SCHED_ULE            # ULE scheduler
options      PREEMPTION           # Enable kernel thread preemption
options      INET                 # InterNETworking
options      INET6                # IPv6 communications protocols
options      SCTP                 # Stream Control Transmission Protocol
options      FFS                  # Berkeley Fast Filesystem
options      SOFTUPDATES          # Enable FFS soft updates support
options      UFS_ACL              # Support for access control lists
options      UFS_DIRHASH          # Improve performance on big directories
options      UFS_GJOURNAL         # Enable gjournal-based UFS journaling
options      MD_ROOT              # MD is a potential root device
options      PROCFS               # Process filesystem (requires PSEUDOFS)
options      PSEUDOFS             # Pseudo-filesystem framework
options      GEOM_PART_GPT        # GUID Partition Tables.
options      GEOM_LABEL           # Provides labelization
options      COMPAT_43TTY         # BSD 4.3 TTY compat [KEEP THIS!]
options      COMPAT_FREEBSD4      # Compatible with FreeBSD4
options      COMPAT_FREEBSD5      # Compatible with FreeBSD5
options      COMPAT_FREEBSD6      # Compatible with FreeBSD6
options      COMPAT_FREEBSD7      # Compatible with FreeBSD7
options      COMPAT_FREEBSD32
options      SCSI_DELAY=4000      # Delay (in ms) before probing SCSI
options      KTRACE               # ktrace(1) support
options      STACK                # stack(9) support
options      SYSVSHM              # SYSV-style shared memory
options      SYSVMSG              # SYSV-style message queues
options      SYSVSEM              # SYSV-style semaphores
options      _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
options      KBD_INSTALL_CDEV     # install a CDEV entry in /dev
options      AUDIT                # Security event auditing
options      HWPMC_HOOKS
options      GEOM_MIRROR
options      MROUTING
options      PRINTF_BUFR_SIZE=100

# To make an SMP kernel, the next two lines are needed
options      SMP                  # Symmetric MultiProcessor Kernel

# CPU frequency control
device       cpufreq

# Bus support.
device       acpi
device       pci
device       ada
device       ahci

# SCSI Controllers
device       ahd                  # AHA39320/29320 and onboard AIC79xx devices
options      AHD_REG_PRETTY_PRINT # Print register bitfields in debug
                                  # output. Adds ~215k to driver.
device       mpt                  # LSI-Logic MPT-Fusion

# SCSI peripherals
device       scbus                # SCSI bus (required for SCSI)
device       da                   # Direct Access (disks)
device       pass                 # Passthrough device (direct SCSI access)
device       ses                  # SCSI Environmental Services (and SAF-TE)

# RAID controllers
device       mfi                  # LSI MegaRAID SAS

# atkbdc0 controls both the keyboard and the PS/2 mouse
device       atkbdc               # AT keyboard controller
device       atkbd                # AT keyboard
device       psm                  # PS/2 mouse
device       kbdmux               # keyboard multiplexer
device       vga                  # VGA video card driver
device       splash               # Splash screen and screen saver support

# syscons is the default console driver, resembling an SCO console
device       sc
device       agp                  # support several AGP chipsets

## Power management support (see NOTES for more options)
#device      apm
## Add suspend/resume support for the i8254.
#device      pmtimer

# Serial (COM) ports
#device      sio                  # 8250, 16[45]50 based serial ports
device       uart                 # Generic UART driver

# If you've got a "dumb" serial or parallel PCI card that is
# supported by the puc(4) glue driver, uncomment the following
# line to enable it (connects to sio, uart and/or ppc drivers):
#device      puc

# PCI Ethernet NICs.
device       em                   # Intel PRO/1000 adapter Gigabit Ethernet Card
device       bce
#device      ixgb                 # Intel PRO/10GbE Ethernet Card
#device      ixgbe

# PCI Ethernet NICs that use the common MII bus controller code.
# NOTE: Be sure to keep the 'device miibus' line in order to use these NICs!
device       miibus               # MII bus support

# Pseudo devices.
device       loop                 # Network loopback
device       random               # Entropy device
device       ether                # Ethernet support
device       pty                  # Pseudo-ttys (telnet etc)
device       md                   # Memory "disks"
device       firmware             # firmware assist module
device       lagg

# The `bpf' device enables the Berkeley Packet Filter.
# Be aware of the administrative consequences of enabling this!
# Note that 'bpf' is required for DHCP.
device       bpf                  # Berkeley packet filter

# USB support
device       uhci                 # UHCI PCI->USB interface
device       ohci                 # OHCI PCI->USB interface
device       ehci                 # EHCI PCI->USB interface (USB 2.0)
device       usb                  # USB Bus (required)
#device      udbp                 # USB Double Bulk Pipe devices
device       uhid                 # "Human Interface Devices"
device       ukbd                 # Keyboard
device       umass                # Disks/Mass storage - Requires scbus and da
device       ums                  # Mouse
# USB Serial devices
device       ucom                 # Generic com ttys

options      INCLUDE_CONFIG_FILE
options      KDB
options      KDB_UNATTENDED
options      DDB
options      ALT_BREAK_TO_DEBUGGER

options      IPFIREWALL           #firewall
options      IPFIREWALL_FORWARD   #packet destination changes
options      IPFIREWALL_VERBOSE   #print information about
                                  # dropped packets
options      IPFIREWALL_VERBOSE_LIMIT=10000 #limit verbosity

# MRT support
options      ROUTETABLES=16

device       vlan                 #VLAN support

# Size of the kernel message buffer. Should be N * pagesize.
options      MSGBUF_SIZE=4096000
options      SW_WATCHDOG
options      PANIC_REBOOT_WAIT_TIME=4

#
# Hardware watchdog timers:
#
# ichwd: Intel ICH watchdog timer
#
#device      ichwd

device       smbus
device       ichsmb
device       ipmi

-- 
WBR, Alexander

From owner-freebsd-performance@FreeBSD.ORG Tue Jul 3 16:35:49 2012
Delivered-To: performance@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 06444106564A; Tue, 3 Jul 2012 16:35:49 +0000 (UTC) (envelope-from luigi@onelab2.iet.unipi.it)
Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238]) by mx1.freebsd.org (Postfix) with ESMTP id B85F88FC14; Tue, 3 Jul 2012 16:35:48 +0000 (UTC)
Received: by onelab2.iet.unipi.it (Postfix, from userid 275) id EB07573027; Tue, 3 Jul 2012 18:55:06 +0200 (CEST)
Date: Tue, 3 Jul 2012 18:55:06 +0200
From: Luigi Rizzo
To: "Alexander V. Chernikov"
Message-ID: <20120703165506.GA90114@onelab2.iet.unipi.it>
References: <4FF319A2.6070905@FreeBSD.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4FF319A2.6070905@FreeBSD.org>
User-Agent: Mutt/1.4.2.3i
X-Mailman-Approved-At: Tue, 03 Jul 2012 16:43:17 +0000
Cc: hackers@freebsd.org, performance@freebsd.org, net@freebsd.org
Subject: Re: FreeBSD 10G forwarding performance @Intel
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning
X-List-Received-Date: Tue, 03 Jul 2012 16:35:49 -0000

On Tue, Jul 03, 2012 at 08:11:14PM +0400, Alexander V. Chernikov wrote:
> Hello list!
>
> I'm quite stuck with bad forwarding performance on many FreeBSD boxes
> doing firewalling.
...
> In most cases system can forward no more than 700 (or 1400) kpps which
> is quite a bad number (Linux does, say, 5MPPs on nearly the same hardware).
among the many interesting tests you have run, i am curious
if you have tried to remove the update of the counters on route
entries. They might be another severe contention point.

cheers
luigi

From owner-freebsd-performance@FreeBSD.ORG Tue Jul 3 17:38:49 2012
Delivered-To: performance@freebsd.org
Received: from mx2.freebsd.org (mx2.freebsd.org [69.147.83.53]) by hub.freebsd.org (Postfix) with ESMTP id 0E168106572A; Tue, 3 Jul 2012 17:38:48 +0000 (UTC) (envelope-from melifaro@FreeBSD.org)
Received: from dhcp170-36-red.yandex.net (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx2.freebsd.org (Postfix) with ESMTP id AD66314E588; Tue, 3 Jul 2012 17:38:46 +0000 (UTC)
Message-ID: <4FF32DE2.2010606@FreeBSD.org>
Date: Tue, 03 Jul 2012 21:37:38 +0400
From: "Alexander V. Chernikov"
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:12.0) Gecko/20120511 Thunderbird/12.0.1
MIME-Version: 1.0
To: Luigi Rizzo
References: <4FF319A2.6070905@FreeBSD.org> <20120703165506.GA90114@onelab2.iet.unipi.it>
In-Reply-To: <20120703165506.GA90114@onelab2.iet.unipi.it>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: hackers@freebsd.org, performance@freebsd.org, net@freebsd.org
Subject: Re: FreeBSD 10G forwarding performance @Intel
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning
X-List-Received-Date: Tue, 03 Jul 2012 17:38:49 -0000

On 03.07.2012 20:55, Luigi Rizzo wrote:
> On Tue, Jul 03, 2012 at 08:11:14PM +0400, Alexander V. Chernikov wrote:
>> Hello list!
>>
>> I'm quite stuck with bad forwarding performance on many FreeBSD boxes
>> doing firewalling.
> ...
>> In most cases system can forward no more than 700 (or 1400) kpps which
>> is quite a bad number (Linux does, say, 5MPPs on nearly the same hardware).
>
> among the many interesting tests you have run, i am curious
> if you have tried to remove the update of the counters on route
> entries. They might be another severe contention point.

21:47 [0] m@test15 netstat -I ix0 -w 1
            input          (ix0)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
   1785514 52785      0  121318340    1784650     0  117874854     0
   1773126 52437      0  120701470    1772977     0  117584736     0
   1781948 52154      0  121060126    1778271     0   75029554     0
   1786169 52982      0  121451160    1787312     0  160967392     0
21:47 [0] test15# sysctl net.rt_count=0
net.rt_count: 1 -> 0
   1814465 22546      0  121302076    1814291     0   76860092     0
   1817769 14272      0  120984922    1816254     0  163643534     0
   1815311 13113      0  120831970    1815340     0  120159118     0
   1814059 13698      0  120799132    1813738     0  120172092     0
   1818030 13513      0  120960140    1814578     0  120332662     0
   1814169 14351      0  120836182    1814003     0  120164310     0

Thanks, another good point. I forgot to merge this option from andre's patch.

Another 30-40-50kpps to win.

+u_int rt_count = 1;
+SYSCTL_INT(_net, OID_AUTO, rt_count, CTLFLAG_RW, &rt_count, 1, "");

@@ -601,17 +625,20 @@ passout:
        if (error != 0)
                IPSTAT_INC(ips_odropped);
        else {
-               ro.ro_rt->rt_rmx.rmx_pksent++;
+               if (rt_count)
+                       ro.ro_rt->rt_rmx.rmx_pksent++;
                IPSTAT_INC(ips_forward);
                IPSTAT_INC(ips_fastforward);

>
> cheers
> luigi
>

-- 
WBR, Alexander

From owner-freebsd-performance@FreeBSD.ORG Tue Jul 3 20:08:33 2012
Delivered-To: performance@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BEA80106564A; Tue, 3 Jul 2012 20:08:33 +0000 (UTC) (envelope-from luigi@onelab2.iet.unipi.it)
Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238]) by mx1.freebsd.org (Postfix) with ESMTP id 7995F8FC0A; Tue, 3 Jul 2012 20:08:33 +0000 (UTC)
Received: by onelab2.iet.unipi.it (Postfix, from userid 275) id E232273029; Tue, 3 Jul 2012 22:27:57 +0200 (CEST)
Date: Tue, 3 Jul 2012 22:27:57 +0200
From: Luigi Rizzo
To: "Alexander V. Chernikov"
Message-ID: <20120703202757.GA90741@onelab2.iet.unipi.it>
References: <4FF319A2.6070905@FreeBSD.org> <20120703165506.GA90114@onelab2.iet.unipi.it> <4FF32DE2.2010606@FreeBSD.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4FF32DE2.2010606@FreeBSD.org>
User-Agent: Mutt/1.4.2.3i
X-Mailman-Approved-At: Tue, 03 Jul 2012 20:21:35 +0000
Cc: hackers@freebsd.org, performance@freebsd.org, net@freebsd.org
Subject: Re: FreeBSD 10G forwarding performance @Intel
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning
X-List-Received-Date: Tue, 03 Jul 2012 20:08:33 -0000

On Tue, Jul 03, 2012 at 09:37:38PM +0400, Alexander V. Chernikov wrote:
...
> Thanks, another good point. I forgot to merge this option from andre's
> patch.
>
> Another 30-40-50kpps to win.

not much gain though.
What about the other IPSTAT_INC counters ?
I think the IPSTAT_INC macros were introduced (by rwatson ?)
following a discussion on how to make the counters per-cpu
and avoid the contention on cache lines.
But they are still implemented as a single instance,
and neither volatile nor atomic, so it is not even clear
that they can give reliable results, let alone the fact
that you are likely to get some cache misses.

the relevant macro is in ip_var.h.
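Luigi's point about shared, non-atomic counters can be illustrated in user space. A minimal sketch, assuming C11 <stdatomic.h> and POSIX threads; all names are hypothetical and this is not the ip_var.h macro: several threads bump one shared counter. A plain `counter++` on shared memory from several threads is a data race and can lose updates; an atomic increment gives an exact total, but every CPU still writes the same cache line, which is the contention concern raised above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define NTHREADS 4
#define NINCR    100000

/* One counter shared by all threads, like a single ipstat instance. */
static _Atomic uint64_t shared_forwarded;

/* Each worker models a forwarding thread counting the packets it sends. */
static void *count_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NINCR; i++)
        /* Exact under concurrency, but each increment needs exclusive
         * ownership of the cache line holding shared_forwarded. */
        atomic_fetch_add_explicit(&shared_forwarded, 1, memory_order_relaxed);
    return NULL;
}

/* Run NTHREADS workers concurrently and return the final count. */
static uint64_t run_shared_counter(void)
{
    pthread_t tids[NTHREADS];

    atomic_store(&shared_forwarded, 0);
    for (int i = 0; i < NTHREADS; i++) {
        int rc = pthread_create(&tids[i], NULL, count_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tids[i], NULL);
    return atomic_load(&shared_forwarded);
}
```

run_shared_counter() always returns NTHREADS * NINCR here; replacing the atomic with a plain `uint64_t` and `++` removes that guarantee, while keeping the atomic keeps the cache-line ping-pong.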
Cheers
luigi
>
> +u_int rt_count = 1;
> +SYSCTL_INT(_net, OID_AUTO, rt_count, CTLFLAG_RW, &rt_count, 1, "");
>
> @@ -601,17 +625,20 @@ passout:
>         if (error != 0)
>                 IPSTAT_INC(ips_odropped);
>         else {
> -               ro.ro_rt->rt_rmx.rmx_pksent++;
> +               if (rt_count)
> +                       ro.ro_rt->rt_rmx.rmx_pksent++;
>                 IPSTAT_INC(ips_forward);
>                 IPSTAT_INC(ips_fastforward);
>
> >
> >cheers
> >luigi
> >
>
> -- 
> WBR, Alexander
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org"

From owner-freebsd-performance@FreeBSD.ORG Tue Jul 3 20:33:20 2012
Delivered-To: performance@freebsd.org
Received: from mx2.freebsd.org (mx2.freebsd.org [69.147.83.53]) by hub.freebsd.org (Postfix) with ESMTP id F0D10106567E; Tue, 3 Jul 2012 20:33:20 +0000 (UTC) (envelope-from melifaro@FreeBSD.org)
Received: from dhcp170-36-red.yandex.net (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx2.freebsd.org (Postfix) with ESMTP id 25AC9158FDC; Tue, 3 Jul 2012 20:33:04 +0000 (UTC)
Message-ID: <4FF356BC.2060306@FreeBSD.org>
Date: Wed, 04 Jul 2012 00:31:56 +0400
From: "Alexander V. Chernikov"
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:12.0) Gecko/20120511 Thunderbird/12.0.1
MIME-Version: 1.0
To: Luigi Rizzo
References: <4FF319A2.6070905@FreeBSD.org> <20120703165506.GA90114@onelab2.iet.unipi.it> <4FF32DE2.2010606@FreeBSD.org> <20120703202757.GA90741@onelab2.iet.unipi.it>
In-Reply-To: <20120703202757.GA90741@onelab2.iet.unipi.it>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: hackers@freebsd.org, performance@freebsd.org, net@freebsd.org
Subject: Re: FreeBSD 10G forwarding performance @Intel
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning
X-List-Received-Date: Tue, 03 Jul 2012 20:33:21 -0000

On 04.07.2012 00:27, Luigi Rizzo wrote:
> On Tue, Jul 03, 2012 at 09:37:38PM +0400, Alexander V. Chernikov wrote:
> ...
>> Thanks, another good point. I forgot to merge this option from andre's
>> patch.
>>
>> Another 30-40-50kpps to win.
>
> not much gain though.
> What about the other IPSTAT_INC counters ?
Well, then we should remove all such counters (total, forwarded) and the per-interface statistics too (at least for forwarded packets).
> I think the IPSTAT_INC macros were introduced (by rwatson ?)
> following a discussion on how to make the counters per-cpu
> and avoid the contention on cache lines.
> But they are still implemented as a single instance,
> and neither volatile nor atomic, so it is not even clear
> that they can give reliable results, let alone the fact
> that you are likely to get some cache misses.
>
> the relevant macro is in ip_var.h.
Hm. This seems to be just a per-vnet structure instance.
We've got some more real DPCPU stuff (sys/pcpu.h && kern/subr_pcpu.c) which could be used for the global ipstat structure; however, since it is allocated from a single area with no way to free it, we can't use it for per-interface counters.
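The DPCPU-style approach for the global ipstat structure can be modelled in user space. This is an illustrative sketch, not the DPCPU(9) or counter API: a fixed, never-freed array of per-CPU statistics blocks, each padded to an assumed 64-byte cache line so that each CPU writes only its own line, with readers summing all slots. MODEL_MAXCPU, CACHE_LINE, the structs and the functions are hypothetical. Because the array is sized at compile time and lives for the whole run, it fits one global structure but not per-interface counters that are created and destroyed, which is the limitation described above.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAXCPU 16  /* assumed upper bound on CPUs */
#define CACHE_LINE   64  /* assumed cache-line size in bytes */

/* A few ipstat-like counters; only the owning CPU writes its copy. */
struct ipstat_model {
    uint64_t ips_forward;
    uint64_t ips_fastforward;
    uint64_t ips_odropped;
};

/* Pad each per-CPU copy to a full cache line so two CPUs never share one. */
struct ipstat_slot {
    _Alignas(CACHE_LINE) struct ipstat_model st;
};

/* Global, statically allocated, never freed: one slot per possible CPU. */
static struct ipstat_slot ipstat_pcpu[MODEL_MAXCPU];

/* Fast path: the caller passes the id of the CPU it runs on. Plain
 * increments are correct only if a single thread ever writes a slot. */
static void ipstat_inc_forward(unsigned cpu)
{
    assert(cpu < MODEL_MAXCPU);
    ipstat_pcpu[cpu].st.ips_forward++;
    ipstat_pcpu[cpu].st.ips_fastforward++;
}

/* Slow path (e.g. when statistics are read): fold all per-CPU copies. */
static struct ipstat_model ipstat_snapshot(void)
{
    struct ipstat_model sum = {0, 0, 0};

    for (unsigned c = 0; c < MODEL_MAXCPU; c++) {
        sum.ips_forward     += ipstat_pcpu[c].st.ips_forward;
        sum.ips_fastforward += ipstat_pcpu[c].st.ips_fastforward;
        sum.ips_odropped    += ipstat_pcpu[c].st.ips_odropped;
    }
    return sum;
}
```

The design trades a slightly more expensive read (a loop over all CPUs) for a fast path that never writes a cache line shared with another CPU.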
I'll try to run the tests without any possibly contended counters and report the results on Thursday.
>
> Cheers
> luigi
>
>>
>> +u_int rt_count = 1;
>> +SYSCTL_INT(_net, OID_AUTO, rt_count, CTLFLAG_RW,&rt_count, 1, "");
>>
>> @@ -601,17 +625,20 @@ passout:
>>         if (error != 0)
>>                 IPSTAT_INC(ips_odropped);
>>         else {
>> -               ro.ro_rt->rt_rmx.rmx_pksent++;
>> +               if (rt_count)
>> +                       ro.ro_rt->rt_rmx.rmx_pksent++;
>>                 IPSTAT_INC(ips_forward);
>>                 IPSTAT_INC(ips_fastforward);
>>
>>
>>>
>>> cheers
>>> luigi
>>>
>>
>>
>> -- 
>> WBR, Alexander
>

-- 
WBR, Alexander

From owner-freebsd-performance@FreeBSD.ORG Tue Jul 3 21:08:52 2012
Delivered-To: performance@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E0390106564A; Tue, 3 Jul 2012 21:08:52 +0000 (UTC) (envelope-from luigi@onelab2.iet.unipi.it)
Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238]) by mx1.freebsd.org (Postfix) with ESMTP id 97C338FC0C; Tue, 3 Jul 2012 21:08:52 +0000 (UTC)
Received: by onelab2.iet.unipi.it (Postfix, from userid 275) id 2AFCB73027; Tue, 3 Jul 2012 23:28:16 +0200 (CEST)
Date: Tue, 3 Jul 2012 23:28:16 +0200
From: Luigi Rizzo
To: "Alexander V. Chernikov"
Message-ID: <20120703212816.GA92445@onelab2.iet.unipi.it>
References: <4FF319A2.6070905@FreeBSD.org> <20120703165506.GA90114@onelab2.iet.unipi.it> <4FF32DE2.2010606@FreeBSD.org> <20120703202757.GA90741@onelab2.iet.unipi.it> <4FF356BC.2060306@FreeBSD.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4FF356BC.2060306@FreeBSD.org>
User-Agent: Mutt/1.4.2.3i
X-Mailman-Approved-At: Tue, 03 Jul 2012 21:16:23 +0000
Cc: hackers@freebsd.org, performance@freebsd.org, net@freebsd.org
Subject: Re: FreeBSD 10G forwarding performance @Intel
X-BeenThere: freebsd-performance@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Performance/tuning
X-List-Received-Date: Tue, 03 Jul 2012 21:08:53 -0000

On Wed, Jul 04, 2012 at 12:31:56AM +0400, Alexander V. Chernikov wrote:
> On 04.07.2012 00:27, Luigi Rizzo wrote:
> >On Tue, Jul 03, 2012 at 09:37:38PM +0400, Alexander V. Chernikov wrote:
> >...
> >>Thanks, another good point. I forgot to merge this option from andre's
> >>patch.
> >>
> >>Another 30-40-50kpps to win.
> >
> >not much gain though.
> >What about the other IPSTAT_INC counters ?
> Well, we should then remove all such counters (total, forwarded) and
> per-interface statistics (at least for forwarded packets).

I am not saying to remove them for good, but at least have a try
at what we can hope to save by implementing them on a per-cpu basis.

There is a chance that one will not see big gains until the majority
of such shared counters are fixed (there are probably 3-4 at least
on the non-error path for forwarded packets), plus the per-interface
ones that are not even wrapped in macros (see if_ethersubr.c)

> >I think the IPSTAT_INC macros were introduced (by rwatson ?)
> >following a discussion on how to make the counters per-cpu
> >and avoid the contention on cache lines.
> >But they are still implemented as a single instance,
> >and neither volatile nor atomic, so it is not even clear
> >that they can give reliable results, let alone the fact
> >that you are likely to get some cache misses.
> >
> >the relevant macro is in ip_var.h.
> Hm. This seems to be just per-vnet structure instance.

yes but essentially they are still shared by all threads within a vnet
(besides you probably ran your tests in the main instance)

> We've got some more real DPCPU stuff (sys/pcpu.h && kern/subr_pcpu.c)
> which can be used for global ipstat structure, however since it is
> allocated from single area without possibility to free we can't use it
> for per-interface counters.

yes, those should be moved to a private, dynamically allocated
region of the ifnet (the number of CPUs is known at driver init
time, i hope). But again for a quick test disabling the
if_{i|o}{bytes|packets} should do the job, if you can count the
received rate by some other means.

> I'll try to run tests without any possibly contested counters and report
> the results on Thursday.

great, that would be really useful info.

cheers
luigi
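Luigi's suggestion for per-interface counters can be sketched as a hypothetical user-space model (not the ifnet API): each interface gets its own dynamically allocated array of cache-line-aligned counter slots, sized from the CPU count known at attach time and freed at detach, which avoids the never-freed static area of DPCPU. The if_model names, the ncpu parameter and the 64-byte line size are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE 64 /* assumed cache-line size in bytes */

/* Per-CPU slice of the interface counters, padded to one cache line. */
struct if_pcpu_stats {
    _Alignas(CACHE_LINE) uint64_t ipackets;
    uint64_t ibytes;
    uint64_t opackets;
    uint64_t obytes;
};

/* Hypothetical stand-in for the private part of an ifnet. */
struct if_model {
    unsigned ncpu;
    struct if_pcpu_stats *stats; /* ncpu slots, one per CPU */
};

/* Attach: allocate one zeroed slot per CPU. Returns 0 on success. */
static int if_model_attach(struct if_model *ifp, unsigned ncpu)
{
    assert(ncpu > 0);
    size_t len = (size_t)ncpu * sizeof(struct if_pcpu_stats);

    /* aligned_alloc wants len to be a multiple of the alignment; it is,
     * because sizeof(struct if_pcpu_stats) is a multiple of CACHE_LINE. */
    ifp->stats = aligned_alloc(CACHE_LINE, len);
    if (ifp->stats == NULL)
        return -1;
    memset(ifp->stats, 0, len);
    ifp->ncpu = ncpu;
    return 0;
}

/* Detach: unlike a static per-CPU area, this memory can be released. */
static void if_model_detach(struct if_model *ifp)
{
    free(ifp->stats);
    ifp->stats = NULL;
    ifp->ncpu = 0;
}

/* Fast path: assumes only the thread running on `cpu` writes this slot. */
static void if_model_count_out(struct if_model *ifp, unsigned cpu,
                               uint64_t bytes)
{
    assert(cpu < ifp->ncpu);
    ifp->stats[cpu].opackets++;
    ifp->stats[cpu].obytes += bytes;
}

/* Slow path: sum the per-CPU slots when statistics are read. */
static uint64_t if_model_opackets(const struct if_model *ifp)
{
    uint64_t sum = 0;

    for (unsigned c = 0; c < ifp->ncpu; c++)
        sum += ifp->stats[c].opackets;
    return sum;
}
```

For a quick experiment of the kind suggested above, the same effect as disabling the shared if_{i|o}{bytes|packets} updates is obtained here by never touching a line owned by another CPU on the fast path.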