From owner-freebsd-net@FreeBSD.ORG Tue Jul 17 10:35:12 2012
Delivered-To: net@freebsd.org
Received: from mx2.freebsd.org (mx2.freebsd.org [IPv6:2001:4f8:fff6::35]) by hub.freebsd.org (Postfix) with ESMTP id 1E3C2106566C; Tue, 17 Jul 2012 10:35:12 +0000 (UTC) (envelope-from melifaro@FreeBSD.org)
Received: from dhcp170-36-red.yandex.net (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx2.freebsd.org (Postfix) with ESMTP id 7F2C7151915; Tue, 17 Jul 2012 10:35:10 +0000 (UTC)
Message-ID: <50053F74.7040101@FreeBSD.org>
Date: Tue, 17 Jul 2012 14:33:24 +0400
From: "Alexander V. Chernikov"
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:12.0) Gecko/20120511 Thunderbird/12.0.1
MIME-Version: 1.0
To: Konstantin Belousov
References: <4FF36438.2030902@FreeBSD.org> <4FF3E2C4.7050701@FreeBSD.org> <4FF3FB14.8020006@FreeBSD.org> <4FF402D1.4000505@FreeBSD.org> <20120704091241.GA99164@onelab2.iet.unipi.it> <4FF412B9.3000406@FreeBSD.org> <20120704154856.GC3680@onelab2.iet.unipi.it> <4FF59955.5090406@FreeBSD.org> <20120706061126.GA65432@onelab2.iet.unipi.it> <500452A5.3070501@FreeBSD.org> <20120716232352.GE2676@deviant.kiev.zoral.com.ua>
In-Reply-To: <20120716232352.GE2676@deviant.kiev.zoral.com.ua>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Doug Barton, Luigi Rizzo, net@freebsd.org
Subject: Re: FreeBSD 10G forwarding performance @Intel
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
X-List-Received-Date: Tue, 17 Jul 2012 10:35:12 -0000

On 17.07.2012 03:23, Konstantin Belousov wrote:
> On Mon, Jul 16, 2012 at 09:43:01PM +0400, Alexander V. Chernikov wrote:
>> On 06.07.2012 10:11, Luigi Rizzo wrote:
>>> On Thu, Jul 05, 2012 at 05:40:37PM +0400, Alexander V. Chernikov wrote:
>>>> On 04.07.2012 19:48, Luigi Rizzo wrote:
>>> the thing discussed a few years ago (at least the one I took out of the
>>> discussion) was that the counter fields in rules should hold the
>>> index of a per-cpu counter associated with the rule. So CTR_INC(rule->ctr)
>>> becomes something like pcpu->ipfw_ctrs[rule->ctr]++
>>> Once you create a new rule you also grab one free index from ipfw_ctrs[],
>>> and the same should go for dummynet counters.
>>
>> Old kernel from previous messages, same setup:
>>
>> net.inet.ip.fw.enable=0
>> 2.3 MPPS
>> net.inet.ip.fw.update_counters=0
>> net.inet.ip.fw.enable=1
>> 1.93 MPPS
>> net.inet.ip.fw.update_counters=1
>> 1.74 MPPS
>>
>> Kernel with ipfw pcpu counters:
>>
>> net.inet.ip.fw.enable=0
>> 2.3 MPPS
>> net.inet.ip.fw.update_counters=0
>> net.inet.ip.fw.enable=1
>> 1.93 MPPS
>> net.inet.ip.fw.update_counters=1
>> 1.93 MPPS
>>
>> The counters seem to work without any (significant) overhead.
>> (Maybe I'm wrong somewhere?)
> I do not think that your 'per-cpu' counters are correct. The thread
> migration or rescheduling causes the fetch or update of the wrong

It is typical in the networking stack to bind interface queues to CPUs.
This is done by the netisr code and by some network drivers such as ixgbe.
(Usually you should also do it by hand if you want maximum performance.)

> per-cpu structure. This allows parallel updates with undefined
> consequences.

Yes. However, the current ipfw counter implementation is not protected by
any lock either, so I'm not sure we really need to provide _very_
fine-grained counters.

>
> At the very least, you need to disable preemption around the counter
> structure dereference and increment.
Same test with critical_enter() and critical_exit():

   packets    errs idrops      bytes    packets  errs      bytes colls
   2412016       0      0  159027422    2413575     0   98996894     0
   2412603       0      0  159762580    2413196     0  343078548     0
   2413501       0      0  159094602    2411561     0  159208034     0
   2413818       0      0  158894876    2412041     0  159579354     0
   2411331       0      0  159867612    2412699     0   98690770     0
   2413578       0      0  159565910    2413256     0  220508472     0

net.inet.ip.fw.update_counters=0
net.inet.ip.fw.enable=1

   2109719  200318      0  155246592    2101593     0  141714254     0
   2043039  373932      0  159586476    2042970     0  135004566     0
   2042629  371124      0  159429254    2042308     0   84790780     0
   packets    errs idrops      bytes    packets  errs      bytes colls
   2033687       0      0  134218324    2034435     0  134524224     0
   2044290       0      0  134721534    2043947     0   85143014     0
   2047714       0      0  135502190    2050383     0   85434406     0

net.inet.ip.fw.update_counters=0

   2008526       0      0  132737890    2009535     0  281671228     0
   1977217   13550      0  132278298    1968571     0  130585076     0
   1975823   43522      0  133355986    1973620     0  130319542     0
   1974159   40715      0  133124772    1977259     0  130552612     0
   1969801   42073      0  132911194    1969426     0  130451906     0
   1964142   21919      0  131242870    1966925     0  129959256     0
   1961748       0      0  129548688    1966168     0   82793086     0

So the overhead (~80 kpps) is now more noticeable. I'm not sure whether
this is acceptable.

-- 
WBR, Alexander