From owner-freebsd-ipfw@FreeBSD.ORG Mon Aug 4 09:45:26 2014 Return-Path: Delivered-To: freebsd-ipfw@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 19E92FC0; Mon, 4 Aug 2014 09:45:26 +0000 (UTC) Received: from mail.ipfw.ru (mail.ipfw.ru [IPv6:2a01:4f8:120:6141::2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 865CC2F38; Mon, 4 Aug 2014 09:45:25 +0000 (UTC) Received: from [2a02:6b8:0:401:222:4dff:fe50:cd2f] (helo=ptichko.yndx.net) by mail.ipfw.ru with esmtpsa (TLSv1:DHE-RSA-AES128-SHA:128) (Exim 4.82 (FreeBSD)) (envelope-from ) id 1XEAsQ-000COY-5a; Mon, 04 Aug 2014 09:32:06 +0400 Message-ID: <53DF55FA.8010303@FreeBSD.org> Date: Mon, 04 Aug 2014 13:44:26 +0400 From: "Alexander V. Chernikov" User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: Luigi Rizzo Subject: Re: ipfw named objejcts, table values and syntax change References: <53DC01DE.3000000@FreeBSD.org> <53DCA25C.1000108@FreeBSD.org> In-Reply-To: <53DCA25C.1000108@FreeBSD.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Cc: freebsd-ipfw , "freebsd-net@freebsd.org" X-BeenThere: freebsd-ipfw@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: IPFW Technical Discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Aug 2014 09:45:26 -0000 On 02.08.2014 12:33, Alexander V. Chernikov wrote: > On 02.08.2014 10:33, Luigi Rizzo wrote: >> >> >> On Fri, Aug 1, 2014 at 11:08 PM, Alexander V. Chernikov >> > wrote: >> >> Hello all. >> >> I'm currently working on to enhance ipfw in some areas. >> The most notable (and user-visible) change is named table support. >> The other one is support for different lookup algorithms for different >> key types. >> >> For example, new ipfw permits writing this: >> >> ipfw table tb1 create type cidr >> ipfw add allow ip from table(tl1) to any >> ipfw add allow ip from any lookup dst-ip tb1 >> >> ipfw table if1 create type iface >> ipfw add skipto tablearg ip from any to any via table(if1) >> >> or even this: >> ipfw table fl1 create type flow:src-ip,proto,dst-ip,dst-port >> ipfw table fl1 add 10.0.0.5,tcp,10.0.0.6,80 4444 >> ipfw add allow ip from any to any flow table(fl1) >> >> all these changes fully preserve backward compatibility. >> (actually tables needs now to be created before use and their type needs >> to match with opcode used, but new ipfw(8) performs auto-creation >> for cidr tables). >> >> There is another thing I'm going to change and I'm not sure I can keep >> the same compatibility level. >> >> Table values, from one point of view, can be classified to the following >> types: >> >> - skipto argument >> - fwd argument (*) >> - link to another object (nat, pipe, queue) >> - plain u32 (not bound to any object) >> (divert/tee,netgraph,tag/utag,limit) >> >> There are the following reasons why I think it is necessary to implement >> explicit table values typing (like tables): >> - Implementing fwd tablearg for IPv6 hosts requires indirection table >> - Converting nat/pipe instance ids to names renders values unusable >> - retiring old hack with storing saved pointer of found object/rule >> inside rule w/o proper locking >> - making faster skipto >> >> >> ​​i don't buy the idea that you need typed arguments >> for all the cases above. Maybe the case that >> may make sense is the fwd argument (and in the future >> something else). >> We already discussed, i think, the fact that now it >> is legal to have references to non existing things >> (skipto, pipes etc.) implemented as u32. >> Removing that would break configurations. > It depends on actual implementation. This can be preserved by > auto-creating necessary objects in kernel and/or in userspace, so > we can (and should) avoid breaking in this particular way. Can you please explain your vision on values another time? As far as I understand, you're not against it in general, but the details matter: * IP address can be one of the types (it won't break much, and we can simply skip that one for MFC) * what about typing for nat/pipes ? we're not going to convert their ids to names? (or maybe you can suggest other non-disruptive way?) * everything else is type "u32" >> Efficiency is not affected, even for skipto, > It depends on workload. While binary search is fast in terms of cpu, it > is may be not so fast in terms of memory (since each of the rule is > allocated by separate malloc() (and that is another thing which is worth > discussing)). > >> and while i agree that unprotected writes to the pointers >> in rules should not happen, these pointers are changed >> infrequently so a global read-mostly lock should be >> sufficient to protect all changes to the rules. >> >> cheers >> luigi >> >> >> So, as the result, table will have lookup key type (already done), >> value type ('skipto', 'nexthop', 'nat', 'pipe', 'number', ..) and some >> additional restrictions (like inability to add non-existing nat instance >> id). >> >> This change will break (at least) scenarios where people are >> using one table for both nat/pipe instances (and keep nat ids in sync >> with pipe ones). For example: >> >> ipfw table 1 add 10.0.10.0/24 110 >> ipfw table 1 add 10.0.20.0/24 120 >> >> ipfw add 100 nat tablearg from table(1) to any via vlanX in >> .. >> ipfw add 500 pipe tablearg from table(1) to any via ix0 out >> >> It looks like it is not so easy to bind values for given table to >> different objects (or different tasks) (and lack of compatibility kills >> hope for MFC). >> >> Ideas? >> >> >> >> >> >> >> _______________________________________________ >> freebsd-ipfw@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-ipfw >> To unsubscribe, send any mail to >> "freebsd-ipfw-unsubscribe@freebsd.org >> " >> >> >> >> >> -- >> -----------------------------------------+------------------------------- >> Prof. Luigi RIZZO, rizzo@iet.unipi.it . >> Dip. di Ing. dell'Informazione >> http://www.iet.unipi.it/~luigi/ . Universita` di Pisa >> TEL +39-050-2211611 . via >> Diotisalvi 2 >> Mobile +39-338-6809875 . 56122 >> PISA (Italy) >> -----------------------------------------+------------------------------- From owner-freebsd-ipfw@FreeBSD.ORG Mon Aug 4 11:54:38 2014 Return-Path: Delivered-To: freebsd-ipfw@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 9DBBC294; Mon, 4 Aug 2014 11:54:38 +0000 (UTC) Received: from onelab2.iet.unipi.it (onelab2.iet.unipi.it [131.114.59.238]) by mx1.freebsd.org (Postfix) with ESMTP id ED1F82022; Mon, 4 Aug 2014 11:54:37 +0000 (UTC) Received: by onelab2.iet.unipi.it (Postfix, from userid 275) id 990DF73029; Mon, 4 Aug 2014 13:58:17 +0200 (CEST) Date: Mon, 4 Aug 2014 13:58:17 +0200 From: Luigi Rizzo To: "Alexander V. Chernikov" Subject: Re: ipfw named objejcts, table values and syntax change Message-ID: <20140804115817.GA13814@onelab2.iet.unipi.it> References: <53DC01DE.3000000@FreeBSD.org> <53DCA25C.1000108@FreeBSD.org> <53DF55FA.8010303@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <53DF55FA.8010303@FreeBSD.org> User-Agent: Mutt/1.5.20 (2009-06-14) Cc: freebsd-ipfw , "freebsd-net@freebsd.org" X-BeenThere: freebsd-ipfw@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: IPFW Technical Discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Aug 2014 11:54:38 -0000 On Mon, Aug 04, 2014 at 01:44:26PM +0400, Alexander V. Chernikov wrote: > On 02.08.2014 12:33, Alexander V. Chernikov wrote: > > On 02.08.2014 10:33, Luigi Rizzo wrote: > >> > >> > >> On Fri, Aug 1, 2014 at 11:08 PM, Alexander V. Chernikov > >> > wrote: > >> > >> Hello all. > >> > >> I'm currently working on to enhance ipfw in some areas. > >> The most notable (and user-visible) change is named table support. > >> The other one is support for different lookup algorithms for different > >> key types. > >> > >> For example, new ipfw permits writing this: > >> > >> ipfw table tb1 create type cidr > >> ipfw add allow ip from table(tl1) to any > >> ipfw add allow ip from any lookup dst-ip tb1 > >> > >> ipfw table if1 create type iface > >> ipfw add skipto tablearg ip from any to any via table(if1) > >> > >> or even this: > >> ipfw table fl1 create type flow:src-ip,proto,dst-ip,dst-port > >> ipfw table fl1 add 10.0.0.5,tcp,10.0.0.6,80 4444 > >> ipfw add allow ip from any to any flow table(fl1) > >> > >> all these changes fully preserve backward compatibility. > >> (actually tables needs now to be created before use and their type needs > >> to match with opcode used, but new ipfw(8) performs auto-creation > >> for cidr tables). > >> > >> There is another thing I'm going to change and I'm not sure I can keep > >> the same compatibility level. > >> > >> Table values, from one point of view, can be classified to the following > >> types: > >> > >> - skipto argument > >> - fwd argument (*) > >> - link to another object (nat, pipe, queue) > >> - plain u32 (not bound to any object) > >> (divert/tee,netgraph,tag/utag,limit) > >> > >> There are the following reasons why I think it is necessary to implement > >> explicit table values typing (like tables): > >> - Implementing fwd tablearg for IPv6 hosts requires indirection table > >> - Converting nat/pipe instance ids to names renders values unusable > >> - retiring old hack with storing saved pointer of found object/rule > >> inside rule w/o proper locking > >> - making faster skipto > >> > >> > >> ??????i don't buy the idea that you need typed arguments > >> for all the cases above. Maybe the case that > >> may make sense is the fwd argument (and in the future > >> something else). > >> We already discussed, i think, the fact that now it > >> is legal to have references to non existing things > >> (skipto, pipes etc.) implemented as u32. > >> Removing that would break configurations. > > It depends on actual implementation. This can be preserved by > > auto-creating necessary objects in kernel and/or in userspace, so > > we can (and should) avoid breaking in this particular way. > Can you please explain your vision on values another time? > As far as I understand, you're not against it in general, but the > details matter: > * IP address can be one of the types (it won't break much, and we can > simply skip that one for MFC) > * what about typing for nat/pipes ? we're not going to convert their ids > to names? (or maybe you can suggest other non-disruptive way?) > * everything else is type "u32" Correct, I am mostly concerned about the details, not on the general concept. To summarize the discussion Alexander and I had about converting identifiers from numbers to arbitrary strings (this is partly related to the values stored in tables, but I think we should have a coherent behaviour) 1. CURRENTLY ipfw uses numeric identifiers in a small range (16 bits or less) for rules, pipes, queues, tables, probably nat instances. 2. CURRENTLY, in all the above contexts, it is legal to reference a non existing object (rule, pipe, table names, etc.), and the kernel will do something reasonable, namely jump to the next rule, drop traffic for non existing pipes, and so on. 3. of course we want to preserve backward compatibility both for the ioctl interface, and for user configurations. 4. The in-kernel representation of identifiers is not visible to users, so we can use a numeric representation in the kernel for identifiers. Strings like "12345" are converted with atoi() or the like, whereas for other identifiers or numbers outside of the 2^16 range the kernel manages a translation table, allocating new numeric identifiers if a new string appears. This permits backward compatibility for old rulesets, and does not impact performance because the translation table is only used during rules additions or deletion. With this in mind, i think we should follow a similar approach for objects stored in tables, hence if an u32 value was available in the past, it must be available also in the new implementation. The issue with tables is that some convoluted configuration could use the same table to reference pipes _and_ rules _and_ perhaps other things represented as numbers (the former is not too strange, if i have a large configuration i might place sections at rules 12000, 13000, 14000... and associate pipes with the same numberic identifier to each block of rules). Typed table values would clearly disturb backward compatibility in the above configurations. However it should not be difficult to accept arbitrary strings as the values stored in tables, and then store multiple representations as appropriate, including: - the string representation, unconditionally - for names that can be resolved by DNS, the ipv6 and ipv4 address(es) associated with them. ipfw already translates hostnames in rules so this is POLA - for other strings, a u32 from the translation table as previously indicated - and for numeric values, the u32 representation (truncated if needed, according to whatever is the existing behaviour) - If we cannot generate an u32 we will put some value (e.g. 0) that hopefully will not cause confusion. If we do it this way, we should be able to preserve backward compatibility _and_ add features that people may need. cheers luigi From owner-freebsd-ipfw@FreeBSD.ORG Mon Aug 4 19:51:38 2014 Return-Path: Delivered-To: freebsd-ipfw@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id C0BF68E8; Mon, 4 Aug 2014 19:51:38 +0000 (UTC) Received: from mail.ipfw.ru (mail.ipfw.ru [IPv6:2a01:4f8:120:6141::2]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4CC542177; Mon, 4 Aug 2014 19:51:38 +0000 (UTC) Received: from v6.mpls.in ([2a02:978:2::5] helo=ws.su29.net) by mail.ipfw.ru with esmtpsa (TLSv1:DHE-RSA-AES128-SHA:128) (Exim 4.82 (FreeBSD)) (envelope-from ) id 1XEKL6-000JDe-9n; Mon, 04 Aug 2014 19:38:20 +0400 Message-ID: <53DFE438.5050209@FreeBSD.org> Date: Mon, 04 Aug 2014 23:51:20 +0400 From: "Alexander V. Chernikov" User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: Luigi Rizzo Subject: Re: ipfw named objejcts, table values and syntax change References: <53DC01DE.3000000@FreeBSD.org> <53DCA25C.1000108@FreeBSD.org> <53DF55FA.8010303@FreeBSD.org> <20140804115817.GA13814@onelab2.iet.unipi.it> In-Reply-To: <20140804115817.GA13814@onelab2.iet.unipi.it> X-Enigmail-Version: 1.6 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="pHf6c1bQV38KmqoIE4V1KKOaAxNm7R47H" Cc: freebsd-ipfw , "freebsd-net@freebsd.org" X-BeenThere: freebsd-ipfw@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: IPFW Technical Discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Aug 2014 19:51:38 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --pHf6c1bQV38KmqoIE4V1KKOaAxNm7R47H Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable On 04.08.2014 15:58, Luigi Rizzo wrote: > On Mon, Aug 04, 2014 at 01:44:26PM +0400, Alexander V. Chernikov wrote:= >> On 02.08.2014 12:33, Alexander V. Chernikov wrote: >>> On 02.08.2014 10:33, Luigi Rizzo wrote: >>>> >>>> >>>> On Fri, Aug 1, 2014 at 11:08 PM, Alexander V. Chernikov >>>> > wrote: >>>> >>>> Hello all. >>>> >>>> I'm currently working on to enhance ipfw in some areas. >>>> The most notable (and user-visible) change is named table suppo= rt. >>>> The other one is support for different lookup algorithms for di= fferent >>>> key types. >>>> >>>> For example, new ipfw permits writing this: >>>> >>>> ipfw table tb1 create type cidr >>>> ipfw add allow ip from table(tl1) to any >>>> ipfw add allow ip from any lookup dst-ip tb1 >>>> >>>> ipfw table if1 create type iface >>>> ipfw add skipto tablearg ip from any to any via table(if1) >>>> >>>> or even this: >>>> ipfw table fl1 create type flow:src-ip,proto,dst-ip,dst-port >>>> ipfw table fl1 add 10.0.0.5,tcp,10.0.0.6,80 4444 >>>> ipfw add allow ip from any to any flow table(fl1) >>>> >>>> all these changes fully preserve backward compatibility. >>>> (actually tables needs now to be created before use and their t= ype needs >>>> to match with opcode used, but new ipfw(8) performs auto-creati= on >>>> for cidr tables). >>>> >>>> There is another thing I'm going to change and I'm not sure I c= an keep >>>> the same compatibility level. >>>> >>>> Table values, from one point of view, can be classified to the = following >>>> types: >>>> >>>> - skipto argument >>>> - fwd argument (*) >>>> - link to another object (nat, pipe, queue) >>>> - plain u32 (not bound to any object) >>>> (divert/tee,netgraph,tag/utag,limit) >>>> >>>> There are the following reasons why I think it is necessary to = implement >>>> explicit table values typing (like tables): >>>> - Implementing fwd tablearg for IPv6 hosts requires indirection= table >>>> - Converting nat/pipe instance ids to names renders values unus= able >>>> - retiring old hack with storing saved pointer of found object/= rule >>>> inside rule w/o proper locking >>>> - making faster skipto >>>> >>>> >>>> ??????i don't buy the idea that you need typed arguments >>>> for all the cases above. Maybe the case that >>>> may make sense is the fwd argument (and in the future >>>> something else). >>>> We already discussed, i think, the fact that now it >>>> is legal to have references to non existing things >>>> (skipto, pipes etc.) implemented as u32. >>>> Removing that would break configurations. >>> It depends on actual implementation. This can be preserved by >>> auto-creating necessary objects in kernel and/or in userspace, so >>> we can (and should) avoid breaking in this particular way. >> Can you please explain your vision on values another time? >> As far as I understand, you're not against it in general, but the=20 >> details matter: >> * IP address can be one of the types (it won't break much, and we can = >> simply skip that one for MFC) >> * what about typing for nat/pipes ? we're not going to convert their i= ds=20 >> to names? (or maybe you can suggest other non-disruptive way?) >> * everything else is type "u32" >=20 > Correct, I am mostly concerned about the details, not on the general co= ncept. >=20 > To summarize the discussion Alexander and I had about converting > identifiers from numbers to arbitrary strings (this is partly related > to the values stored in tables, but I think we should have a coherent > behaviour) >=20 > 1. CURRENTLY ipfw uses numeric identifiers in a small range (16 bits or= less) > for rules, pipes, queues, tables, probably nat instances. >=20 > 2. CURRENTLY, in all the above contexts, it is legal to reference a > non existing object (rule, pipe, table names, etc.), > and the kernel will do something reasonable, namely jump to the > next rule, drop traffic for non existing pipes, and so on. >=20 > 3. of course we want to preserve backward compatibility both for > the ioctl interface, and for user configurations. >=20 > 4. The in-kernel representation of identifiers is not visible to users,= > so we can use a numeric representation in the kernel for identifiers= =2E > Strings like "12345" are converted with atoi() or the like, > whereas for other identifiers or numbers outside of the 2^16 range > the kernel manages a translation table, allocating new numeric > identifiers if a new string appears. > This permits backward compatibility for old rulesets, and does not > impact performance because the translation table is only > used during rules additions or deletion. Yes. However this requires either holding either (1) 2 pointers (old&new arrays), or (2) 65k+ index array, or (3) chained hash table. (1) would require additional pointers for each subsystem (and some additional management), (2) will definitely upset embedded guys and (3) is worse in terms of performance >=20 > With this in mind, i think we should follow a similar approach for > objects stored in tables, hence >=20 > if an u32 value was available in the past, it must be > available also in the new implementation. >=20 > The issue with tables is that some convoluted configuration could > use the same table to reference pipes _and_ rules _and_ perhaps > other things represented as numbers (the former is not too strange, > if i have a large configuration i might place sections at rules > 12000, 13000, 14000... and associate pipes with the same numberic > identifier to each block of rules). >=20 > Typed table values would clearly disturb backward compatibility > in the above configurations. However it should not be difficult > to accept arbitrary strings as the values stored in tables, and > then store multiple representations as appropriate, including: Well, I've thought about thas one. It may be an option, but the details are not so promising (below) > - the string representation, unconditionally > - for names that can be resolved by DNS, the ipv6 and ipv4 address(es) > associated with them. ipfw already translates hostnames in rules > so this is POLA I'm not happy what ipfw(8) is doing instead of translation. The proper way would be not simply using first AF_INET answer but saving ALL IPv4+IPv6 records inside rule (and some more tracking should be done afterwards, but that's totally different story). Additionally, I'm unsure if we really need next-hop value expressed as hostname (how can we deal with multiple addresses and diffrent AFs?). We may store strings (and I think we should do it) but I'm unsure about this particular option of interpreting them. > - for other strings, a u32 from the translation table as previously > indicated > - and for numeric values, the u32 representation (truncated if needed, > according to whatever is the existing behaviour) > - > If we cannot generate an u32 we will put some value (e.g. 0) > that hopefully will not cause confusion. As far as I understand, we accept some string "s" as table value inside the kernel, than, we have some logic that says: oh, dummynet pipe has the same name "s"s, oh, nat entity with name "s" has just been created, let's save indices. That would require additional indirection table like: index | [ skipto idx | nat idx | pipe idx | queue idx | fwd index ] ( so we will have 2-level indirection table for fwd if we do IPv6) We can optimize this if we use "same name -> same kidx" approach regardless of kernel object we're refering to. That might require some more memory, but that's OK from my point of view. So we end up with int [ skipto idx | fwd idx | obj idx ] idx "0" is special value which means the same as 2.CURRENT That looks better, but still way to complex. I do care about compatibility, but it's hard to improve things without changing. I'd like to propose the following: * Split values into 3 types ("ip|nexthop", "number", "object") * Do not insist on object existence, use value "0" to mimic 2.CURRENT behavior. * Retain full compatibility by introducing special value type "legacy" which matches any type and is backed by given indirection table. * Issue warning in ipfw(8) binary on all auto-created tables that auto-creation is legacy and this behavior will be dropped in next major release (e.g. 11.0) * Save this behavior in MFC but drop "legacy" tables in head after a month after actual MFC. That do you think? >=20 > If we do it this way, we should be able to preserve backward > compatibility _and_ add features that people may need. >=20 > cheers > luigi >=20 --pHf6c1bQV38KmqoIE4V1KKOaAxNm7R47H Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iEYEARECAAYFAlPf5DwACgkQwcJ4iSZ1q2lHjwCbBFGnZgeSxmyWiYo+sI0c12W0 z4UAoKRb5YoqU5WQKKMlxG4l+wEbqMKk =rPfj -----END PGP SIGNATURE----- --pHf6c1bQV38KmqoIE4V1KKOaAxNm7R47H-- From owner-freebsd-ipfw@FreeBSD.ORG Fri Aug 8 12:50:20 2014 Return-Path: Delivered-To: freebsd-ipfw@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id 4FC34D9B; Fri, 8 Aug 2014 12:50:20 +0000 (UTC) Received: from mail.rlan.ru (mail.rlan.ru [213.234.25.10]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C23AA2DBA; Fri, 8 Aug 2014 12:50:19 +0000 (UTC) Message-ID: <53E4BE62.4050303@rlan.ru> Date: Fri, 08 Aug 2014 16:11:14 +0400 From: Dmitry Selivanov User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:24.0) Gecko/20100101 Thunderbird/24.1.1 MIME-Version: 1.0 To: "Alexander V. Chernikov" Subject: Re: ipfw named objejcts, table values and syntax change References: <53DC01DE.3000000@FreeBSD.org> <53DCA25C.1000108@FreeBSD.org> <53DF55FA.8010303@FreeBSD.org> <20140804115817.GA13814@onelab2.iet.unipi.it> <53DFE438.5050209@FreeBSD.org> In-Reply-To: <53DFE438.5050209@FreeBSD.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Cc: freebsd-ipfw , "freebsd-net@freebsd.org" X-BeenThere: freebsd-ipfw@freebsd.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: IPFW Technical Discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 08 Aug 2014 12:50:20 -0000 04.08.2014 23:51, Alexander V. Chernikov пишет: > On 04.08.2014 15:58, Luigi Rizzo wrote: >> On Mon, Aug 04, 2014 at 01:44:26PM +0400, Alexander V. Chernikov wrote: >>> On 02.08.2014 12:33, Alexander V. Chernikov wrote: >>>> On 02.08.2014 10:33, Luigi Rizzo wrote: >>>>> >>>>> >>>>> On Fri, Aug 1, 2014 at 11:08 PM, Alexander V. Chernikov >>>>> > wrote: >>>>> >>>>> Hello all. >>>>> >>>>> I'm currently working on to enhance ipfw in some areas. >>>>> The most notable (and user-visible) change is named table support. >>>>> The other one is support for different lookup algorithms for different >>>>> key types. >>>>> >>>>> For example, new ipfw permits writing this: >>>>> >>>>> ipfw table tb1 create type cidr >>>>> ipfw add allow ip from table(tl1) to any >>>>> ipfw add allow ip from any lookup dst-ip tb1 >>>>> >>>>> ipfw table if1 create type iface >>>>> ipfw add skipto tablearg ip from any to any via table(if1) >>>>> >>>>> or even this: >>>>> ipfw table fl1 create type flow:src-ip,proto,dst-ip,dst-port >>>>> ipfw table fl1 add 10.0.0.5,tcp,10.0.0.6,80 4444 >>>>> ipfw add allow ip from any to any flow table(fl1) >>>>> >>>>> all these changes fully preserve backward compatibility. >>>>> (actually tables needs now to be created before use and their type needs >>>>> to match with opcode used, but new ipfw(8) performs auto-creation >>>>> for cidr tables). >>>>> >>>>> There is another thing I'm going to change and I'm not sure I can keep >>>>> the same compatibility level. >>>>> >>>>> Table values, from one point of view, can be classified to the following >>>>> types: >>>>> >>>>> - skipto argument >>>>> - fwd argument (*) >>>>> - link to another object (nat, pipe, queue) >>>>> - plain u32 (not bound to any object) >>>>> (divert/tee,netgraph,tag/utag,limit) >>>>> >>>>> There are the following reasons why I think it is necessary to implement >>>>> explicit table values typing (like tables): >>>>> - Implementing fwd tablearg for IPv6 hosts requires indirection table >>>>> - Converting nat/pipe instance ids to names renders values unusable >>>>> - retiring old hack with storing saved pointer of found object/rule >>>>> inside rule w/o proper locking >>>>> - making faster skipto >>>>> >>>>> >>>>> ??????i don't buy the idea that you need typed arguments >>>>> for all the cases above. Maybe the case that >>>>> may make sense is the fwd argument (and in the future >>>>> something else). >>>>> We already discussed, i think, the fact that now it >>>>> is legal to have references to non existing things >>>>> (skipto, pipes etc.) implemented as u32. >>>>> Removing that would break configurations. >>>> It depends on actual implementation. This can be preserved by >>>> auto-creating necessary objects in kernel and/or in userspace, so >>>> we can (and should) avoid breaking in this particular way. >>> Can you please explain your vision on values another time? >>> As far as I understand, you're not against it in general, but the >>> details matter: >>> * IP address can be one of the types (it won't break much, and we can >>> simply skip that one for MFC) >>> * what about typing for nat/pipes ? we're not going to convert their ids >>> to names? (or maybe you can suggest other non-disruptive way?) >>> * everything else is type "u32" >> >> Correct, I am mostly concerned about the details, not on the general concept. >> >> To summarize the discussion Alexander and I had about converting >> identifiers from numbers to arbitrary strings (this is partly related >> to the values stored in tables, but I think we should have a coherent >> behaviour) >> >> 1. CURRENTLY ipfw uses numeric identifiers in a small range (16 bits or less) >> for rules, pipes, queues, tables, probably nat instances. >> >> 2. CURRENTLY, in all the above contexts, it is legal to reference a >> non existing object (rule, pipe, table names, etc.), >> and the kernel will do something reasonable, namely jump to the >> next rule, drop traffic for non existing pipes, and so on. >> >> 3. of course we want to preserve backward compatibility both for >> the ioctl interface, and for user configurations. >> >> 4. The in-kernel representation of identifiers is not visible to users, >> so we can use a numeric representation in the kernel for identifiers. >> Strings like "12345" are converted with atoi() or the like, >> whereas for other identifiers or numbers outside of the 2^16 range >> the kernel manages a translation table, allocating new numeric >> identifiers if a new string appears. >> This permits backward compatibility for old rulesets, and does not >> impact performance because the translation table is only >> used during rules additions or deletion. > Yes. However this requires either holding either (1) 2 pointers (old&new > arrays), or (2) 65k+ index array, or (3) chained hash table. > (1) would require additional pointers for each subsystem (and some > additional management), > (2) will definitely upset embedded guys and > (3) is worse in terms of performance >> >> With this in mind, i think we should follow a similar approach for >> objects stored in tables, hence >> >> if an u32 value was available in the past, it must be >> available also in the new implementation. >> >> The issue with tables is that some convoluted configuration could >> use the same table to reference pipes _and_ rules _and_ perhaps >> other things represented as numbers (the former is not too strange, >> if i have a large configuration i might place sections at rules >> 12000, 13000, 14000... and associate pipes with the same numberic >> identifier to each block of rules). >> >> Typed table values would clearly disturb backward compatibility >> in the above configurations. However it should not be difficult >> to accept arbitrary strings as the values stored in tables, and >> then store multiple representations as appropriate, including: > Well, I've thought about thas one. It may be an option, but the details > are not so promising (below) >> - the string representation, unconditionally >> - for names that can be resolved by DNS, the ipv6 and ipv4 address(es) >> associated with them. ipfw already translates hostnames in rules >> so this is POLA > I'm not happy what ipfw(8) is doing instead of translation. The proper > way would be not simply using first AF_INET answer but saving ALL > IPv4+IPv6 records inside rule (and some more tracking should be done > afterwards, but that's totally different story). Additionally, I'm > unsure if we really need next-hop value expressed as hostname (how can > we deal with multiple addresses and diffrent AFs?). We may store strings > (and I think we should do it) but I'm unsure about this particular > option of interpreting them. >> - for other strings, a u32 from the translation table as previously >> indicated >> - and for numeric values, the u32 representation (truncated if needed, >> according to whatever is the existing behaviour) >> - >> If we cannot generate an u32 we will put some value (e.g. 0) >> that hopefully will not cause confusion. > As far as I understand, we accept some string "s" as table value inside > the kernel, than, we have some logic that says: > oh, dummynet pipe has the same name "s"s, oh, nat entity with name "s" > has just been created, let's save indices. > > That would require additional indirection table like: > > index | [ skipto idx | nat idx | pipe idx | queue idx | fwd index ] > ( so we will have 2-level indirection table for fwd if we do IPv6) > > We can optimize this if we use "same name -> same kidx" approach > regardless of kernel object we're refering to. That might require some > more memory, but that's OK from my point of view. > > So we end up with > int [ skipto idx | fwd idx | obj idx ] > > idx "0" is special value which means the same as 2.CURRENT > > That looks better, but still way to complex. > I do care about compatibility, but it's hard to improve things without > changing. > > I'd like to propose the following: > * Split values into 3 types ("ip|nexthop", "number", "object") > * Do not insist on object existence, use value "0" to mimic 2.CURRENT > behavior. > * Retain full compatibility by introducing special value type "legacy" > which matches any type and is backed by given indirection table. > * Issue warning in ipfw(8) binary on all auto-created tables that > auto-creation is legacy and this behavior will be dropped in next major > release (e.g. 11.0) > * Save this behavior in MFC but drop "legacy" tables in head after a > month after actual MFC. > > That do you think? >> >> If we do it this way, we should be able to preserve backward >> compatibility _and_ add features that people may need. >> >> cheers >> luigi >> Here is my idea: tablearg should contain more than one value. I think getting several values from one table lookup is faster than several table lookups with one value. Let tablearg be not just uint32, but array with different value types inside it. For example I have many such rules: allow src-ip 1.2.3.4 MAC any 11:22:33:44:55:66 recv vlan1234 dst-ip 1.1.1.1 These rules can be replaced with such construction: allow src-ip table(1) MAC any tablearg[1] recv tablearg[2] dst-ip tablearg[3] But I don't think indexing by value is a good idea. I think index==starting byte is a better way: allow src-ip table(1) MAC any tablearg:0 recv tablearg:6 dst-ip tablearg:32 where MAC's 6 bytes are from 0 to 5 in tablearg; iface string is from 6 and till \0, but less than 26 bytes; and IPv4's 4 bytes are from 32 to 35. So we need to create table for it: table 1 set MAC:0 string:6:26 ip:32 table 1 add 1.2.3.4 11:22:33:44:55:66 vlan1234 1.1.1.1 String can be used both for iface and comment. Other possible value types: uint16 for nat, pipe, skipto and other 2-bytes actions IPv4 4 bytes CIDRv4 5 bytes IPv6 16 bytes CIDRv6 17 bytes table_id 2 bytes - link to another table Table value length can be set for example with loader tunable like net.inet.ip.fw.table_value_length. Even with default uint32 value length we can get 2 uint16 values or 4 uint8 values, this can help in some configurations. This way is more complex, but much more flexible. It's like netgraph subsystem. I think it suites both Alexander and Luigi requests.