Date:      Fri, 15 Aug 2014 00:28:27 +0400
From:      "Alexander V. Chernikov" <melifaro@yandex-team.ru>
To:        Luigi Rizzo <rizzo@iet.unipi.it>
Cc:        "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, Luigi Rizzo <luigi@freebsd.org>, "Andrey V. Elsukov" <ae@freebsd.org>, freebsd-ipfw <freebsd-ipfw@freebsd.org>
Subject:   Re: [CFT] new tables for ipfw
Message-ID:  <53ED1BEB.7000409@yandex-team.ru>
In-Reply-To: <53ECA302.8010100@yandex-team.ru>
References:  <53EBC687.9050503@yandex-team.ru> <CA+hQ2+g=A_rLHCVpBqn0AtFLu_gNGtzbmXvc-7JhpLqPSWw44A@mail.gmail.com> <53EC880B.3020903@yandex-team.ru> <CA+hQ2+iPPhy47eN0=KaSYBaNMdObY20yko7dRY1MMuP_mfnmOQ@mail.gmail.com> <53EC960A.1030603@yandex-team.ru> <CA+hQ2+gxVYmXb+HOw4qUm6tykmEvBRkrV0RhZsnC6B08FLKvdA@mail.gmail.com> <53ECA302.8010100@yandex-team.ru>

On 14.08.2014 15:52, Alexander V. Chernikov wrote:
> On 14.08.2014 15:15, Luigi Rizzo wrote:
>>
>>
>>
>> On Thu, Aug 14, 2014 at 12:57 PM, Alexander V. Chernikov
>> <melifaro@yandex-team.ru <mailto:melifaro@yandex-team.ru>> wrote:
>>
>>     On 14.08.2014 14:44, Luigi Rizzo wrote:
>>>
>>>
>>>
>>>     On Thu, Aug 14, 2014 at 11:57 AM, Alexander V. Chernikov
>>>     <melifaro@yandex-team.ru <mailto:melifaro@yandex-team.ru>> wrote:
>>>
>>>         On 14.08.2014 13:23, Luigi Rizzo wrote:
>>>>
>>>>
>>>>
>>>>         On Wed, Aug 13, 2014 at 10:11 PM, Alexander V. Chernikov
>>>>         <melifaro@yandex-team.ru <mailto:melifaro@yandex-team.ru>>
>>>>         wrote:
>>>>
>>>>             Hello list.
>>>>
>>>>             I've been hacking ipfw for a while and it seems there
>>>>             is something ready to test/review in the projects/ipfw branch.
>>>>
>>>>
>>>>         this is a fantastic piece of work, thanks for doing it and for
>>>>         integrating the feedback.
>>>>
>>>>         I have some detailed feedback that I will send you privately,
>>>>         but just a curiosity:
>>>>
>>>>             ...
>>>>
>>>>             Some examples (see ipfw(8) manual page for the
>>>>             description):
>>>>
>>>>              
>>>>             ...
>>>>
>>>>
>>>>               ipfw table mi_test create type cidr algo "cidr:hash
>>>>             masks=/30,/64"
>>>>
>>>>
>>>>         why do we need to specify mask lengths in the above?
>>>         Well, since we're hashing IP we have to know mask to cut
>>>         host bits in advance.
>>>         (And the real reason is that I'm too lazy to implement
>>>         hierarchical  matching (check /32, then /31, then /30) like
>>>         how, for example,
>>>
>>>
>>>     oh well for that we should use cidr:radix
>>>
>>>     Research results have never shown a strong superiority of
>>>     hierarchical hash tables over good radix implementations,
>>>     and in those cases one usually adopts partial prefix
>>>     expansion so you only have, say, masks that are a
>>>     multiple of 2..8 bits so you only need a small number of
>>>     hash lookups.
>>     Definitely, especially for IPv6. So I was actually thinking about
>>     covering some special sparse cases (e.g. someone having a bunch
>>     of /32 and a bunch of /30 and that's all).
>>
>>     Btw, since we're talking about "good radix implementation": what
>>     license does DXR have? :)
>>     Is it OK to merge it as another cidr implementation?
>>
>>  
>> "cidr" is a very ugly name, i'd rather use "addr"
> Ok, no problem with that. "addr" really sounds better.
>>
>> DXR has a BSD license and of course it is possible to use it.
>> You should ask Marko Zec for his latest version of the code
>> (and probably make sure we have one copy of the code in the source tree).
> Great! I'll ask him :)
>>
>> Speaking of features, one thing that would be nice is the ability
>> for tables to reference the in-kernel tables (e.g. fibs, socket
>> lists, interface lists...), perhaps in readonly mode.
>> How complex do you think that would be ?
Well, the biggest problem is that the table-handling code assumes that
we know the number of items in advance, and since we're holding locks it
won't change, so we don't need a large contiguous buffer to dump data to.
This is not the case with "external" tables, so we can't _reliably_ dump
them (the same situation as with dynamic states).
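
To illustrate the dump problem, here is a minimal user-space sketch in
Python. It is not the ipfw code: the class names, dump() and the retry
policy are all made up for illustration. The dump asks for the size
first and then copies the entries; that only works reliably if the
contents cannot change in between.

```python
class StableTable:
    """Hypothetical table whose contents cannot change during a dump."""

    def __init__(self, items):
        self._items = list(items)

    def count(self):
        return len(self._items)

    def entries(self):
        return list(self._items)


class GrowingTable(StableTable):
    """Hypothetical external source: gains one entry right after each of
    the first `growth` size queries, as kernel-owned data might."""

    def __init__(self, items, growth):
        super().__init__(items)
        self._growth = growth

    def count(self):
        n = len(self._items)
        if self._growth > 0:
            self._growth -= 1
            self._items.append("new-entry-%d" % n)
        return n


def dump(table, max_retries=3):
    """Two-phase dump: size the buffer, then copy.

    Returns (entries, attempts). Retries when more entries show up than
    the size query promised, and gives up after max_retries attempts.
    """
    for attempt in range(1, max_retries + 1):
        expected = table.count()     # phase 1: how big must the buffer be?
        entries = table.entries()    # phase 2: copy the data out
        if len(entries) <= expected:
            return entries, attempt
    raise RuntimeError("table kept changing during dump")
```

A StableTable is dumped on the first attempt. A GrowingTable needs at
least one retry and may never succeed, which is the "not reliable" part.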
Anyway, I've added cidr:kfib algo (
http://svnweb.freebsd.org/base?view=revision&revision=270001 ) and it
looks funny.
Quoting commit message:

# ipfw table fib2 create algo "cidr:kfib fib=2"
# ipfw table fib2 info                        
+++ table(fib2), set(0) +++
 kindex: 2, type: cidr, locked
 valtype: number, references: 0
 algorithm: cidr:kfib fib=2
 items: 11, size: 288
# ipfw table fib2 list
+++ table(fib2), set(0) +++
10.0.0.0/24 0
127.0.0.1/32 0
::/96 0
::1/128 0
::ffff:0.0.0.0/96 0
2a02:978:2::/112 0
fe80::/10 0
fe80:1::/64 0
fe80:2::/64 0
fe80:3::/64 0
ff02::/16 0
# ipfw table fib2 lookup 10.0.0.5
10.0.0.0/24 0
# ipfw table fib2 lookup 2a02:978:2::11 
2a02:978:2::/112 0
# ipfw table fib2 detail              
+++ table(fib2), set(0) +++
 kindex: 2, type: cidr, locked
 valtype: number, references: 0
 algorithm: cidr:kfib fib=2
 items: 11, size: 288
 IPv4 algorithm radix info
  items: 0 itemsize: 200
 IPv6 algorithm radix info
  items: 0 itemsize: 200
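
The lookup results above are longest-prefix matches against the listed
routes. A rough user-space sketch of the same selection rule, assuming
Python's standard ipaddress module and a linear scan (the kernel uses a
radix tree, not this); lookup() and PREFIXES are names made up here:

```python
import ipaddress

# Prefixes copied from the "ipfw table fib2 list" output above.
PREFIXES = [
    "10.0.0.0/24", "127.0.0.1/32", "::/96", "::1/128",
    "::ffff:0.0.0.0/96", "2a02:978:2::/112", "fe80::/10",
    "fe80:1::/64", "fe80:2::/64", "fe80:3::/64", "ff02::/16",
]
NETWORKS = [ipaddress.ip_network(p) for p in PREFIXES]


def lookup(address):
    """Return the most specific prefix covering `address`, or None."""
    addr = ipaddress.ip_address(address)
    # Only compare against prefixes of the same address family.
    candidates = [
        net for net in NETWORKS
        if net.version == addr.version and addr in net
    ]
    if not candidates:
        return None
    # The longest prefix length wins.
    return str(max(candidates, key=lambda net: net.prefixlen))
```

With this data, lookup("10.0.0.5") selects 10.0.0.0/24 and
lookup("2a02:978:2::11") selects 2a02:978:2::/112, the same prefixes
the two ipfw lookups returned.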

> Implementing algo support for a particular provider like sockets/iflists
> shouldn't be hard. Most of the algorithms' complexity lies in table
> modifications. Here we only have to support
> lookup and dump operations, so it is a question of providing the
> necessary bindings to existing mechanisms (via some direct binding or
> utilizing things like kernel_sysctl for dump support).
>
> It looks like the following maps well to current table concept:
> * such tables are not created by default
> * user issues
>  `ipfw table kfib create type addr algo "addr:kernel fib=0"`
> or
>  `ipfw table ktcp create type flow algo "flow:kernel_tcp fib=0"`
> or
> `ipfw table kiface create type iface algo "iface:kernel"`
> * tables have special "readonly" type, flush_all requests are ignored
> * no state stored internally
>
> So generic table handling code needs to be modified to support
> read-only tables (and making more callbacks optional).
> Additionally, we might need to proxy the "info" request into an algo callback
> (optional, "real" algorithms won't implement it) to be able to show the
> number of items (and some other info) to the user.
>
>
>
>>
>> cheers
>> luigi
>>
>
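
The read-only table idea quoted above (mandatory lookup/dump, optional
modification and "info" callbacks, flush requests ignored) could be
sketched roughly as follows. This is a user-space Python illustration
under assumptions, not the ipfw structures: TableAlgo, table_add(),
table_flush(), table_info() and the interface list are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class TableAlgo:
    """Hypothetical callback set; only lookup and dump are mandatory."""
    name: str
    lookup: Callable[[str], Optional[int]]
    dump: Callable[[], List[str]]
    add: Optional[Callable[[str, int], None]] = None
    flush: Optional[Callable[[], None]] = None
    info: Optional[Callable[[], Dict[str, int]]] = None


def table_add(algo, key, value):
    """Generic code refuses modifications when the algo has no add()."""
    if algo.add is None:
        raise PermissionError("table algorithm %s is read-only" % algo.name)
    algo.add(key, value)


def table_flush(algo):
    """Flush is silently ignored for read-only algos; returns True if run."""
    if algo.flush is None:
        return False
    algo.flush()
    return True


def table_info(algo):
    """Proxy the info request to the algo when it provides a callback."""
    result = {"algorithm": algo.name}
    if algo.info is not None:
        result.update(algo.info())
    return result


# Stand-in for a kernel-owned interface list (made-up data).
KERNEL_IFACES = ["em0", "lo0"]

kiface = TableAlgo(
    name="iface:kernel",
    lookup=lambda key: 0 if key in KERNEL_IFACES else None,
    dump=lambda: list(KERNEL_IFACES),
    info=lambda: {"items": len(KERNEL_IFACES)},
)
```

The table keeps no state of its own: lookup, dump and info all read the
external list, while add and flush are simply absent.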



