From owner-freebsd-ipfw@FreeBSD.ORG  Wed Apr 24 19:51:38 2013
Message-ID: <51783798.4020004@FreeBSD.org>
Date: Wed, 24 Apr 2013 23:50:48 +0400
From: "Alexander V. Chernikov" <melifaro@FreeBSD.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:17.0) Gecko/20130418 Thunderbird/17.0.5
MIME-Version: 1.0
To: Luigi Rizzo <rizzo@iet.unipi.it>
Subject: Re: [patch] ipfw interface tracking and opcode rewriting
References: <517801D3.5040502@FreeBSD.org>
 <20130424162349.GA8439@onelab2.iet.unipi.it> <51780C49.7000204@FreeBSD.org>
 <20130424190930.GA10395@onelab2.iet.unipi.it>
In-Reply-To: <20130424190930.GA10395@onelab2.iet.unipi.it>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-ipfw@freebsd.org, luigi@freebsd.org
List-Id: IPFW Technical Discussions <freebsd-ipfw.freebsd.org>

On 24.04.2013 23:09, Luigi Rizzo wrote:
> On Wed, Apr 24, 2013 at 08:46:01PM +0400, Alexander V. Chernikov wrote:
>> On 24.04.2013 20:23, Luigi Rizzo wrote:
> ...
>>>> version) in the middle of the next week.
>>> hmmm.... this is quite a large change, and from the description it
>>> is a bit unclear to me how the "opcode rewriting" thing relates to
>>> the use of strings vs index for name matching.
>> sorry, I haven't described this explicitly.
>> Index matching is done by storing the interface index in the p.glob field of
>> the ipfw_insn_if instruction.
> understood. the reasons why i did not use the index is that
> one could specify a non-existing interface name, and also interfaces
> can be renamed. If you want to use indexes, you should add
> (perhaps you do, i haven't checked)
Yes, this is done (without 'good' renaming handling), but still.
> hooks to the interface add/rename/delete code in order to
> update the ruleset upon changes on the if list, and it
> seemed to me a bad idea to add this dependency
> (lockingwise, too).
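For reference, a minimal sketch of what index-based matching with such hooks could look like. All names here (struct if_rule, if_rule_attach, if_rule_detach, if_rule_match) are hypothetical and are not taken from the patch or from ipfw; the patch itself keeps the index in p.glob of ipfw_insn_if. The sketch only assumes that index 0 is never a valid interface index and ignores locking entirely.

```c
#include <assert.h>
#include <string.h>

#define NAME_LEN 16	/* assumed to equal IFNAMSIZ */

/* Hypothetical per-rule interface reference. */
struct if_rule {
	char name[NAME_LEN];	/* name as given in the rule */
	int  ifindex;		/* cached index, 0 = interface not present */
};

/* Hypothetical hook: an interface with this name appeared (or was renamed to it). */
static void
if_rule_attach(struct if_rule *r, const char *name, int ifindex)
{
	assert(ifindex > 0);
	if (strncmp(r->name, name, NAME_LEN) == 0)
		r->ifindex = ifindex;
}

/* Hypothetical hook: the interface with this index went away (or was renamed). */
static void
if_rule_detach(struct if_rule *r, int ifindex)
{
	if (r->ifindex == ifindex)
		r->ifindex = 0;
}

/* Per-packet path: a single integer comparison, no string work. */
static int
if_rule_match(const struct if_rule *r, int pkt_ifindex)
{
	return (r->ifindex != 0 && r->ifindex == pkt_ifindex);
}
```

A rule naming a non-existing interface simply keeps ifindex at 0 and never matches until the attach hook fires for that name.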
>
> Really, with 16-byte fixed size interface names, the match
> is as simple as this:
>
>      #if CAN_DO_FAST_MATCH && IFNAMSIZ == 16 /* archs with no align requirements */
> 	{
> 		uint64_t *a = (uint64_t *)ifp->if_xname;
> 		uint64_t *b = (uint64_t *)cmd->name;
> 		if (a[0] == b[0] && a[1] == b[1])
> 			return 1;
> 	}
>      #else
> 	if (strncmp(ifp->if_xname, cmd->name, IFNAMSIZ) == 0)
> 		return 1;
>      #endif
>
> (assuming the names are zero-padded, which should be the case already).
> Since you have the measurement infrastructure in place, perhaps
> you have an easy way to try this patch and see how effective
> it is in terms of performance.
I'll try this tomorrow, thanks.
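A standalone sketch of the same two-word comparison, written so that it can be tried outside the kernel. It is an illustration under assumptions, not the code from the patch: NAME_LEN stands in for IFNAMSIZ == 16, name_pad() and name_eq() are hypothetical helpers, and memcpy() is used instead of pointer casts so that the code does not depend on alignment (compilers typically turn a fixed 16-byte memcpy into plain loads).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NAME_LEN 16	/* assumed to equal IFNAMSIZ */

/* Copy a name into a fixed 16-byte buffer, zero-padding the tail. */
static void
name_pad(char dst[NAME_LEN], const char *src)
{
	assert(strlen(src) < NAME_LEN);
	memset(dst, 0, NAME_LEN);
	memcpy(dst, src, strlen(src));
}

/* Compare two zero-padded 16-byte names as two 64-bit words. */
static int
name_eq(const char a[NAME_LEN], const char b[NAME_LEN])
{
	uint64_t wa[2], wb[2];

	memcpy(wa, a, NAME_LEN);
	memcpy(wb, b, NAME_LEN);
	return (wa[0] == wb[0] && wa[1] == wb[1]);
}
```

The comparison is only equivalent to strncmp() if both buffers are fully zero-padded; stale bytes after the terminating NUL would make equal names compare as different.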
>
>>> Additionally, i wonder if there isn't a better way to replace strncmp
>>> with some two 64-bit comparisons (the name is 16 bytes) by making
>>> sure that the fields are zero-padded and suitably aligned.
>>> At this point, on the machines you care about (able to sustain
>>> 1+ Mpps) the two comparisons should have the same cost as
>>> the index comparison, without the need to track updates to the names.
>> Well, actually I'm thinking of the next 2 steps:
>> 1) making the kernel rule header more compact (20 bytes instead of 48) and
>> making it invisible to userland.
>> This requires rule counters to be stored separately (and possibly as
>> pcpu-based ones).
>> 2) since the ruleset is now nearly read-only and more or less compact, we can
>> try to store it in a
>> contiguous address space to optimize cache line usage.
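A rough sketch of the "counters stored separately" idea from step 1, assuming fixed hypothetical sizes (NCPU, NRULES) and hypothetical names (struct rule_cntr, rule_hit, rule_total). It is not the layout from the patch and does not use the kernel's counter API. Each CPU updates only its own row, so the rules themselves stay read-only on the packet path, and totals are summed on demand.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU	4	/* hypothetical CPU count */
#define NRULES	8	/* hypothetical ruleset size */

/* Packet and byte counters for one rule on one CPU. */
struct rule_cntr {
	uint64_t pcnt;
	uint64_t bcnt;
};

/* One row per CPU, kept apart from the read-mostly rule bodies. */
static struct rule_cntr cntrs[NCPU][NRULES];

/* Packet path: touch only the current CPU's row. */
static void
rule_hit(unsigned cpu, unsigned rule, uint32_t len)
{
	assert(cpu < NCPU && rule < NRULES);
	cntrs[cpu][rule].pcnt++;
	cntrs[cpu][rule].bcnt += len;
}

/* Slow path (e.g. when listing rules): sum the rows of all CPUs. */
static struct rule_cntr
rule_total(unsigned rule)
{
	struct rule_cntr t = { 0, 0 };

	assert(rule < NRULES);
	for (unsigned cpu = 0; cpu < NCPU; cpu++) {
		t.pcnt += cntrs[cpu][rule].pcnt;
		t.bcnt += cntrs[cpu][rule].bcnt;
	}
	return (t);
}
```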
> certainly a worthwhile goal (also using gleb's new counters)
> but i suspect that compacting rules is a second order effect.
> I am a bit skeptical it makes a big difference on the in-kernel
> version of ipfw. You might see some difference in the
My current numbers are ~5 Mpps of IPv4 forwarding with ipfw turned on
(1 rule) for VLANs over ixgbe, at 60% CPU usage (2x E5646).
For lagg with 2x ixgbe it is ~7 Mpps at the same 60% usage.
(And roughly 70% of CPU usage in our production is ipfw, despite a low
number of rules.)
> userspace version, which runs on top of netmap.
We are preparing to move forward in this direction (with 20-30 Mpps as
our goal).
(And I hope some changes from the kernel-based version can migrate to the
userland one :))
>
> cheers
> luigi
>