Date: Mon, 07 Jul 2008 11:11:37 +0200
From: Andre Oppermann <andre@freebsd.org>
To: Robert Watson
Cc: FreeBSD Net, Bart Van Kerckhove, Ingo Flaschberger, Paul
Subject: Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]

Robert Watson wrote:
> On Mon, 7 Jul 2008, Andre Oppermann wrote:
>
>> Robert Watson wrote:
>>> Experience suggests that forwarding workloads see significant lock
>>> contention in the routing and transmit queue code.  The former needs
>>> some kernel hacking to address in order to improve parallelism for
>>> routing lookups.  The latter is harder to address given the hardware
>>> you're using: modern 10gbps cards frequently offer multiple transmit
>>> queues that can be used independently (which our cxgb driver
>>> supports), but 1gbps cards generally don't.
>>
>> Actually the routing code is not contended.  The workload in a router
>> is mostly serialized without much opportunity for contention.  With
>> many interfaces and any-to-any traffic patterns it may get some
>> contention.  The locking overhead per packet is always there and has
>> some impact, though.
>
> Yes, I don't see any real sources of contention until we reach the
> output code, which will run in the input if_em taskqueue threads, as
> the input path generates little or no contention if the packets are
> not destined for local delivery.

The interface output was the second largest block after the cache misses,
IIRC.  The output path seems to have received only moderate attention and
detailed performance analysis compared to the interface input path.  Most
network drivers do a write to the hardware for every packet sent, in
addition to other overhead that may be necessary for their transmit DMA
rings.  That adds significant overhead compared to the RX path, where
those costs are amortized over a larger number of packets.
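To make that cost concrete, here is a rough sketch of the usual TX pattern
(the descriptor and ring layout below are made up for illustration, not
taken from any real driver): every transmitted packet pays for a descriptor
setup plus an uncached MMIO doorbell write, while on receive a single
interrupt and register access typically cover a whole batch of packets.

#include <stdint.h>

/* Toy TX ring -- hypothetical layout, for illustration only. */
struct txdesc {
        uint64_t        addr;           /* bus address of the packet data */
        uint32_t        len;
        uint32_t        flags;
};

struct txring {
        struct txdesc   desc[256];
        uint32_t        head;
        volatile uint32_t *doorbell;    /* MMIO tail register */
};

static void
tx_enqueue(struct txring *r, uint64_t busaddr, uint32_t len)
{
        struct txdesc *d = &r->desc[r->head];

        d->addr = busaddr;
        d->len = len;
        d->flags = 1;                   /* "descriptor ready" bit */
        r->head = (r->head + 1) % 256;

        /*
         * One PCI register write per packet.  This is the part that does
         * not amortize; queueing several packets and writing the doorbell
         * once per batch would remove most of it.
         */
        *r->doorbell = r->head;
}

The RX interrupt handler, by contrast, typically drains everything that has
accumulated in the ring before it touches a register again.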
> I was a little concerned about mention of degrading performance as
> firewall complexity grows -- I suspect there's a nice project for
> someone to do looking at why this is the case.  I was under the
> impression that, in 7.x and later, we use rwlocks to protect firewall
> state, and that unless stateful firewall rules are used, these are
> locked read-only rather than writable...

Just looking at the packet (twice) in ipfw or another firewall package is
a large overhead by itself.  The main loop of ipfw is a very large block
of code.  Unless one implements compilation of the firewall ruleset to
native machine code, there is not much that can be done.  With LLVM we
will see some very interesting opportunities in that area.  Other than
that, the ipfw instruction overhead per rule seems to be quite close to
the optimum.  I'm not saying one shouldn't take a close look with a
profiler to verify that this is actually the case.

-- 
Andre
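P.S.  To make the interpreter overhead concrete, here is a toy sketch of a
rule-evaluation loop in the style of a packet filter.  The opcodes,
structures and default policy below are invented for illustration and are
not ipfw's actual instructions; the point is the per-opcode dispatch and
per-rule loop that every packet pays for, and that compiling a given
ruleset down to native code would largely remove.

#include <stdbool.h>
#include <stdint.h>

enum fw_opcode { O_PROTO, O_DST_PORT };
enum fw_action { FW_PASS, FW_DENY };

struct fw_match {
        enum fw_opcode  op;
        uint32_t        arg;
};

struct fw_rule {
        int             nmatch;
        struct fw_match match[4];       /* all conditions must hold */
        enum fw_action  action;
};

struct pkt_info {
        uint8_t         proto;
        uint16_t        dst_port;
};

static enum fw_action
fw_eval(const struct fw_rule *rules, int nrules, const struct pkt_info *p)
{
        for (int i = 0; i < nrules; i++) {
                const struct fw_rule *r = &rules[i];
                bool hit = true;

                /*
                 * Interpreted dispatch: one switch per opcode, per rule,
                 * per packet.
                 */
                for (int j = 0; j < r->nmatch && hit; j++) {
                        switch (r->match[j].op) {
                        case O_PROTO:
                                hit = (p->proto == r->match[j].arg);
                                break;
                        case O_DST_PORT:
                                hit = (p->dst_port == r->match[j].arg);
                                break;
                        }
                }
                if (hit)
                        return (r->action);
        }
        return (FW_PASS);               /* made-up default policy */
}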