From owner-freebsd-net@FreeBSD.ORG  Fri Jul  6 05:52:05 2012
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2E08A106564A;
	Fri,  6 Jul 2012 05:52:05 +0000 (UTC)
	(envelope-from luigi@onelab2.iet.unipi.it)
Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238])
	by mx1.freebsd.org (Postfix) with ESMTP id D93BE8FC1C;
	Fri,  6 Jul 2012 05:52:04 +0000 (UTC)
Received: by onelab2.iet.unipi.it (Postfix, from userid 275)
	id E118E7300A; Fri,  6 Jul 2012 08:11:26 +0200 (CEST)
Date: Fri, 6 Jul 2012 08:11:26 +0200
From: Luigi Rizzo <rizzo@iet.unipi.it>
To: "Alexander V. Chernikov" <melifaro@FreeBSD.org>
Message-ID: <20120706061126.GA65432@onelab2.iet.unipi.it>
References: <4FF361CA.4000506@FreeBSD.org>
	<20120703214419.GC92445@onelab2.iet.unipi.it>
	<4FF36438.2030902@FreeBSD.org> <4FF3E2C4.7050701@FreeBSD.org>
	<4FF3FB14.8020006@FreeBSD.org> <4FF402D1.4000505@FreeBSD.org>
	<20120704091241.GA99164@onelab2.iet.unipi.it>
	<4FF412B9.3000406@FreeBSD.org>
	<20120704154856.GC3680@onelab2.iet.unipi.it>
	<4FF59955.5090406@FreeBSD.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4FF59955.5090406@FreeBSD.org>
User-Agent: Mutt/1.4.2.3i
Cc: Doug Barton <dougb@freebsd.org>, net@freebsd.org
Subject: Re: FreeBSD 10G forwarding performance @Intel
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 06 Jul 2012 05:52:05 -0000

On Thu, Jul 05, 2012 at 05:40:37PM +0400, Alexander V. Chernikov wrote:
> On 04.07.2012 19:48, Luigi Rizzo wrote:
...
> Traffic stats with most possible counters eliminated:
> (there is a possibility in ixgbe code to update rx/tx packets once per 
> rx_process_limit (which is 100 by default)):
> 
>             input          (ix0)           output
>    packets  errs idrops      bytes    packets  errs      bytes colls
>       2.8M     0     0       186M       2.8M     0       186M     0
>       2.8M     0     0       187M       2.8M     0       186M     0
> 
> And it seems that netstat uses 1024 as the divisor (no HN_DIVISOR_1000 
> is passed in if.c to show_stat), so the real frame count on the Ixia side 
> is much closer to 3 Mpps (~2.961600).
...
> IPFW contention:
> Same setup as shown above, same traffic level
> 
> 17:48 [0] test15# ipfw show
> 00100 0 0 allow ip from any to any
> 65535 0 0 deny ip from any to any
> 
> net.inet.ip.fw.enable: 0 -> 1
>             input          (ix0)           output
>    packets  errs idrops      bytes    packets  errs      bytes colls
>       2.1M  734k     0       187M       2.1M     0       139M     0
>       2.1M  736k     0       187M       2.1M     0       139M     0
>       2.1M  737k     0       187M       2.1M     0        89M     0
>       2.1M  735k     0       187M       2.1M     0       189M     0
> net.inet.ip.fw.update_counters: 1 -> 0
>       2.3M  636k     0       187M       2.3M     0       148M     0
>       2.5M  343k     0       187M       2.5M     0       164M     0
>       2.5M  351k     0       187M       2.5M     0       164M     0
>       2.5M  345k     0       187M       2.5M     0       164M     0
...
> It seems that ipfw counters are suffering from this problem, too.
> Unfortunately, there is no DPCPU allocator in our kernel.
> I'm planning to make a very simple per-cpu counters patch:
> (
> allocate 65k*(u64_bytes+u64_packets) memory for each CPU per vnet 
> instance init and make ipfw use it as counter backend.
> 
> There is a problem with several rules residing in single entry. This can 
> (probably) be worked-around by using fast counters for the first such 
> rule (or not using fast counters for such rules at all)
> )
> 
> What do you think about this?

what was discussed a few years ago (at least what i took away from the
discussion) is that the counter field in a rule should hold the
index of a per-cpu counter associated with the rule. So CTR_INC(rule->ctr)
becomes something like pcpu->ipfw_ctrs[rule->ctr]++
When you create a new rule you also grab one free index from ipfw_ctrs[],
and the same should go for dummynet counters.
The alternative would be to allocate a set of per-cpu counters
within the rule itself, but that costs 64 bytes (one cache line) per core
per rule to avoid cache contention.
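
to make the index idea concrete, here is a minimal userspace sketch, not
kernel code. Everything in it apart from the ipfw_ctrs[rule->ctr] indexing
is an assumption made for illustration: NCPU, IPFW_NCTRS, struct ipfw_ctr,
struct rule, and the ctr_alloc/ctr_free/ctr_inc/ctr_read helpers are
hypothetical names, the "cpu" argument stands in for the current cpu, and
locking of the allocator is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NCPU        4        /* hypothetical core count */
#define IPFW_NCTRS  65536    /* one slot per possible rule number (assumption) */

struct ipfw_ctr {
	uint64_t packets;
	uint64_t bytes;
};

/* One private counter array per CPU; stands in for pcpu->ipfw_ctrs[]. */
static struct ipfw_ctr ipfw_ctrs[NCPU][IPFW_NCTRS];
/* Slot allocation map, shared by all CPUs (touched only on rule add/delete). */
static uint8_t ctr_used[IPFW_NCTRS];

/* The rule stores only the index of its counter, not the counter itself. */
struct rule {
	uint32_t ctr;
};

/* Grab one free index for a new rule; returns -1 if none is left. */
static int
ctr_alloc(void)
{
	for (int i = 0; i < IPFW_NCTRS; i++) {
		if (!ctr_used[i]) {
			ctr_used[i] = 1;
			return (i);
		}
	}
	return (-1);
}

/* Release an index and clear the slot on every CPU so it can be reused. */
static void
ctr_free(uint32_t idx)
{
	assert(idx < IPFW_NCTRS);
	for (int cpu = 0; cpu < NCPU; cpu++)
		memset(&ipfw_ctrs[cpu][idx], 0, sizeof(struct ipfw_ctr));
	ctr_used[idx] = 0;
}

/* Fast path: each CPU writes only to its own array, so no sharing occurs. */
static void
ctr_inc(int cpu, const struct rule *r, uint64_t len)
{
	assert(cpu >= 0 && cpu < NCPU && r->ctr < IPFW_NCTRS);
	ipfw_ctrs[cpu][r->ctr].packets++;
	ipfw_ctrs[cpu][r->ctr].bytes += len;
}

/* Slow path (e.g. "ipfw show"): sum the per-CPU slots for one rule. */
static struct ipfw_ctr
ctr_read(const struct rule *r)
{
	struct ipfw_ctr sum = { 0, 0 };

	assert(r->ctr < IPFW_NCTRS);
	for (int cpu = 0; cpu < NCPU; cpu++) {
		sum.packets += ipfw_ctrs[cpu][r->ctr].packets;
		sum.bytes += ipfw_ctrs[cpu][r->ctr].bytes;
	}
	return (sum);
}
```

in this layout the counters of one rule on one cpu take 16 bytes, and
neighbouring slots belong to other rules on the same cpu, so no padding
is needed. Embedding the counters in the rule would instead require
padding each per-cpu copy to a full cache line.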

cheers
luigi