Date:      Mon, 1 Aug 2016 23:00:40 +0800
From:      Julian Elischer <julian@freebsd.org>
To:        "Dr. Rolf Jansen" <rj@obsigna.com>, freebsd-ipfw@freebsd.org
Subject:   Re: ipfw divert filter for IPv4 geo-blocking
Message-ID:  <7ca15b3c-e0af-2759-6d43-96ecc9990d2b@freebsd.org>
In-Reply-To: <AB669E85-445A-4DFC-8DFA-D868A569432D@obsigna.com>
References:  <61DFB3E2-6E34-4EEA-8AC6-70094CEACA72@cyclaero.com> <CAHu1Y739PvFqqEKE74BjzgLa7NNG6Kh55NPnU5MaA-8HsrjkFw@mail.gmail.com> <4D047727-F7D0-4BEE-BD42-2501F44C9550@obsigna.com> <c2cd797d-66db-8673-af4e-552dfa916a76@freebsd.org> <9641D08A-0501-4AA2-9DF6-D5AFE6CB2975@obsigna.com> <4d76a492-17ae-cbff-f92f-5bbbb1339aad@freebsd.org> <C0CC7001-16FE-40BF-A96A-1FA51A0AFBA7@obsigna.com> <677900fb-c717-743f-fcfe-86b603466e33@freebsd.org> <0D3C9016-7A4A-46BA-B35F-3844D07562A8@obsigna.com> <CAFPNf59w6BHgDjLNHW=rQckZAFG4gqPHL49vLXiDmMAxVPOcKg@mail.gmail.com> <1E1DB7E0-D354-4D7A-B657-0ECF94C12CE0@obsigna.com> <50d405a4-3f8f-a706-9cac-d1162925e56a@freebsd.org> <c62fa048-63c8-aef6-5bad-b0a6719f6acb@freebsd.org> <9222BB10-C700-4DE7-83A3-BE7A38A11713@obsigna.com> <1B36CAD7-A139-436B-B7EC-0FFF232F9C6A@obsigna.com> <3eedb126-ac78-ddad-27d3-132c742b2fdb@freebsd.org> <AB669E85-445A-4DFC-8DFA-D868A569432D@obsigna.com>

On 1/08/2016 7:16 PM, Dr. Rolf Jansen wrote:
>> Am 01.08.2016 um 03:17 schrieb Julian Elischer <julian@freebsd.org>:
>> On 30/07/2016 10:17 PM, Dr. Rolf Jansen wrote:
>>> I have finished the work on CIDR conformity of the IP range tables generated by the geoip tool. The main constraint is that the start and end address of an IP block given by the delegation files MUST BE PRESERVED during the transformation to a set of CIDR records. This is achieved by:
>>>
>>>   1. Find the largest common netmask boundary of the start address using
>>>      int(log2(addr_count)), then iterate like Euclid's algorithm for
>>>      computing a GCD.
>>>
>>>   2. Output the CIDR with the given start address and the masklen belonging
>>>      to the found netmask.
>>>
>>>   3. If the CIDR does not cover the whole original IP range, then set the start
>>>      address of the next CIDR block to the next boundary of the common netmask,
>>>      and loop back to step 1 until the original range has been covered.
>>>
>> check out the appletalk code I pointed out  to you.. I wrote that in 93 or so but I remember sweating blood
>> over it to get it right.
> I read the description of the code, and the following sentence made me doubt that aa_dorangeroute() would guarantee that the above-mentioned main constraint "start and end address of an IP block given by the delegation files MUST BE PRESERVED" can be met. According to the description, the start/end addresses can be anything (even undefined), but not fixed.
So you get your data as ranges (i.e. a start and an end address)
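Rolf's three quoted steps can be read as a greedy loop. The following is a minimal Python sketch under that reading, not the geoip tool's actual source; `range_to_cidrs` is a hypothetical name, and instead of a literal Euclid-style iteration it simply halves the block size until the start address falls on a block boundary.

```python
import ipaddress

def range_to_cidrs(start: int, count: int) -> list[str]:
    """Split the IPv4 range [start, start + count) into CIDR blocks
    without moving its first or last address."""
    blocks = []
    while count > 0:
        # Step 1: largest block the remaining length allows (2**int(log2(count))),
        # shrunk until the start address sits on that block's boundary.
        size = 1 << (count.bit_length() - 1)
        while start % size:
            size >>= 1
        # Step 2: emit start/masklen for this block.
        masklen = 32 - (size.bit_length() - 1)
        blocks.append(f"{ipaddress.IPv4Address(start)}/{masklen}")
        # Step 3: continue at the next boundary until the range is covered.
        start += size
        count -= size
    return blocks

first = int(ipaddress.IPv4Address("10.0.1.0"))
last = int(ipaddress.IPv4Address("10.0.62.255"))
print(range_to_cidrs(first, last - first + 1))
```

Since the start address and the remaining count only ever advance by whole blocks, the first and last addresses of the original range are preserved. For 10.0.1.0 through 10.0.62.255 this prints ten blocks, from 10.0.1.0/24 to 10.0.62.0/24.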
>
>     ...
>     Split the range into two subranges such that the middle
>     of the two ranges is the point where the highest bit of difference
>     between the two addresses makes its transition.
>     ...
>
> I do not want this.
I think I may explain it better by example..

Any range of addresses can conceptually be broken into an optimal 
set of binary chunks.
For example, for the range 1 ... 62 we have an optimal set of 
binary subranges:
  1-1  2-3  4-7 8-15 16-31 (*) 32-47 48-55 56-59 60-61 and 62-62
There is always a point (*) where we separate the left hand side from 
the right hand side. Ranges not immediately adjacent to the (*) point 
can never be merged into any other range. The two items either side 
of the (*) point could only be merged if they had the same width and 
together started on a boundary of their combined width, which only 
happens when the whole range is itself a single cidr block.
In some ranges, there are no items to the right, and for some ranges 
there are no items to the left. There may also be skipped ranges. The 
above example is "worst case": all 10 subranges are present, and none 
can be merged (16-31 and 32-47 have the same width, but 16-47 does not 
start on a 32-address boundary), so 10 binary subranges are needed to 
correctly express the whole range.
So the term "middle" is misleading. It however allows you to take an 
arbitrary range of addresses and generate the optimal set of cidr 
addresses.
In this case if you had 10.0.1.0 through 10.0.62.255 you would need 10 
cidr addresses to express the range correctly.
That is what the code referred to does.
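A minimal Python sketch of the split described above, assuming addresses are handled as plain integers. It is not the aa_dorangeroute() code itself, only an illustration of cutting a range at the point where the highest differing bit of its two ends changes; `split_range` and `to_cidr` are hypothetical helpers. For 1 ... 62 it returns the ten subranges listed above.

```python
import ipaddress

def split_range(lo: int, hi: int) -> list[tuple[int, int]]:
    """Recursively split [lo, hi] at the (*) point, i.e. where the highest
    differing bit of lo and hi makes its transition."""
    diff = lo ^ hi
    if diff == 0:
        return [(lo, hi)]                  # a single address
    k = diff.bit_length() - 1              # highest bit of difference
    size = 1 << (k + 1)
    if lo % size == 0 and hi == lo + size - 1:
        return [(lo, hi)]                  # already one aligned block
    mid = (hi >> k) << k                   # the (*) point
    return split_range(lo, mid - 1) + split_range(mid, hi)

def to_cidr(lo: int, hi: int) -> str:
    """Format an aligned power-of-two block (as produced above) as a.b.c.d/n."""
    masklen = 32 - ((hi - lo + 1).bit_length() - 1)
    return f"{ipaddress.IPv4Address(lo)}/{masklen}"

print(split_range(1, 62))
# [(1, 1), (2, 3), (4, 7), (8, 15), (16, 31), (32, 47), (48, 55), (56, 59), (60, 61), (62, 62)]
```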

if you have exceptions as you do in routing tables, you can express 
it as 3 cidr addresses: 0-63 MINUS 0 and MINUS 63,
i.e. 10.0.0.0/18 minus 10.0.0.0/24 minus 10.0.63.0/24. However this is 
probably not of interest to your use case.
It however IS possible to do this in ipfw since it allows overlapping 
ranges of different widths. (you will note that the code pointed to 
does NOT do that. it would be an order of magnitude (or more) harder 
to do.)
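The exception form above can be checked with a toy longest-prefix lookup. This is an illustration only, assuming a table of (network, value) pairs in which `None` plays the role of an "EXCEPT" entry; it is a plain linear scan, whereas real lookup tables use tree structures. `TABLE` and `lookup` are hypothetical names.

```python
import ipaddress

# The /18 matches; the two /24s carve holes in it (None = "EXCEPT", no match).
TABLE = [
    (ipaddress.ip_network("10.0.0.0/18"), "A"),
    (ipaddress.ip_network("10.0.0.0/24"), None),
    (ipaddress.ip_network("10.0.63.0/24"), None),
]

def lookup(addr: str, table=TABLE):
    """Most specific (longest-prefix) match wins, as in a routing table."""
    ip = ipaddress.ip_address(addr)
    best = None
    for net, value in table:
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, value)
    return best[1] if best else None

print(lookup("10.0.1.0"), lookup("10.0.62.255"), lookup("10.0.63.1"))
# A A None
```

With this table, every address from 10.0.1.0 through 10.0.62.255 resolves to "A", while 10.0.0.x and 10.0.63.x fall through to `None`.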

>
>>> I carefully tested the algorithm, and a table that I pipe from the new geoip tool into ipfw is 100 % identical to the output of the ipfw command 'table N list'.
>> though that doesn't mean it is semantically identical to the original table, due to 'most specific rule wins' behaviour.
>>
>> for example:
>> if you type in:
>>
>> 1.2.3.0/24 -> A
>> and
>> 1.2.3.0/26 -> B
>> then both rules will be listed the same as what you put in
>> but if you wanted to get all rules that point to A, without having rules that point to B, then you would have to export
>> 1.2.3.64/26  -> A
>> 1.2.3.128/25 -> A
>>   (i.e. TWO rules)
> This is definitely not the use case. The data passed to the ipfw tables originates from RIR delegation statistics files, which are guaranteed to be consolidated, namely with overlaps resolved and adjacent ranges joined, long before any tables for ipfw are generated. Each range entry has a well-defined, i.e. fixed, starting address, and anything that changes the starting address of a range renders the table useless. Every entry also has a well-defined range length, and that must not be changed either, or the table would be useless as well.
I think you are misunderstanding what I meant.
>
> In addition, we are talking about the automatic generation of thousands of entries, and I would never rely on something like 'most specific rule wins' behaviour. I want the behaviour to be as explicit as possible, and for this reason I am happy with 'INPUT is 100 % identical to the OUTPUT'.

I agree. I was saying that getting the same rules out of ipfw as you 
put INTO ipfw does not guarantee that you have correctly converted the 
ranges given in the RIR databases into accurate CIDR ranges.  It just 
means you have correctly normalised the data you are putting into the 
firewall table.
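For the 1.2.3.0/24 -> A and 1.2.3.0/26 -> B example quoted above, the explicit rules that really resolve to A can be computed by subtracting the more specific block. A small Python illustration using the standard `ipaddress` module (not ipfw itself):

```python
import ipaddress

a = ipaddress.ip_network("1.2.3.0/24")   # 1.2.3.0/24 -> A
b = ipaddress.ip_network("1.2.3.0/26")   # 1.2.3.0/26 -> B (more specific, wins)

# What actually resolves to A is the /24 with the /26 removed.
# address_exclude() yields the pieces in no particular order, so sort them.
only_a = sorted(a.address_exclude(b))
for net in only_a:
    print(f"{net} -> A")
# prints 1.2.3.64/26 -> A, then 1.2.3.128/25 -> A
```

Listing the table back from the firewall would show the two original entries either way; only computing the effective coverage reveals the two A-only rules.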
>
>> you could also export
>> 1.2.3.0/24 -> A
>> 1.2.3.0/26 -> 0  (think of it as an "EXCEPT for these" rule)
>>
>> which is ALSO two rules but you would need to be sure that the receiver knows what to do with them.
> This is simply a ridiculous example in this context; it sounds like you are suggesting fuzzing the input data in order to push ipfw to its limits. That makes life less boring, doesn't it? No thanks.
noooooo
I'm saying that in a different use case, data can be (and sometimes is) 
given on the assumption that exceptions work. But such use cases must 
always have the entire database to work from, and you cannot extract 
smaller parts of it without risking leaving out one of the exception 
clauses (unless you specifically code for that).
(I've been doing this for 30 years; you can assume I'm not talking 
about monte-carlo routing.. life is interesting enough, thank you.. :-)
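A sketch of the "de-optimising" step discussed in this thread: given the whole overlapping table, rewrite it as explicit, non-overlapping rules, after which any subset (for example everything that resolves to A) can be extracted safely. It assumes the table is held as (network, value) pairs; `flatten`, `rules_for` and the 10.1.x.x addresses standing in for the a.b.x.x example are hypothetical. It only works because it sees every exception entry, which is exactly the point made above.

```python
import ipaddress

def _minus(piece, hole):
    """Addresses of `piece` not covered by `hole` (CIDR blocks nest or are disjoint)."""
    if piece.subnet_of(hole):
        return []                                   # fully overridden
    if hole.subnet_of(piece):
        return list(piece.address_exclude(hole))
    return [piece]                                  # disjoint, untouched

def flatten(table):
    """Rewrite an overlapping 'most specific rule wins' table as explicit,
    non-overlapping (network, value) rules."""
    out = []
    for net, value in table:
        pieces = [net]
        for other, _ in table:
            # Only strictly more specific entries override this one.
            if other.prefixlen > net.prefixlen and other.subnet_of(net):
                pieces = [p for piece in pieces for p in _minus(piece, other)]
        out.extend((p, value) for p in pieces)
    return sorted(out, key=lambda rule: rule[0])

def rules_for(table, value):
    """The required subset: only the rules that really resolve to `value`."""
    return [str(net) for net, v in flatten(table) if v == value]

# Hypothetical addresses standing in for the a.b.x.x example.
TABLE = [
    (ipaddress.ip_network("10.1.0.0/22"), "A"),
    (ipaddress.ip_network("10.1.0.0/24"), "B"),
    (ipaddress.ip_network("10.1.3.0/24"), "B"),
]
print(rules_for(TABLE, "A"))   # ['10.1.1.0/24', '10.1.2.0/24']
```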



>
>>> It is worth noting that the original RIR delegation files already contain 457 non-CIDR-conforming IPv4 ranges out of a total of 165815 records. I guess that this number will increase in the future, because the RIRs have run out of new IPv4 ranges and are urged to subdivide returned old ranges for new delegations. The above algorithm is ready for this.
>>>
>>> Generally, CIDR-conforming tables are more than twice as large as optimized (joined adjacencies) IP range tables. All of these changes have already been pushed to GitHub.
>>>
>> Unfortunately there is no way to specify (using a single cidr address) a.b.1.x AND a.b.2.x without also including a.b.[03].x.
>>
>> HOWEVER
>> if you specified the FULL table you could use the "except" feature of routing table behaviour where
>> a.b.0.x/22  -> A
>> a.b.0.x/24  -> B
>> a.b.3.x/24  -> B
>> gives you the same thing because of the 'most specific rule wins' nature of routing table evaluation.
>> I believe this is the case in the tables you imported.
>> the trick is to be able to take an "optimised" table such as that above and produce, given a required subset, just the required part, while changing the rules as needed on the fly to "de-optimise" them enough to maintain correctness.
> Again, this is not the use case.
that is pretty much the case that you showed with Argentina and Brazil.
four /22 regions, where the last three were aggregated..
Did they come to you already joined together, or did you join them on 
receipt?

in what form do you get and keep the data?
you showed:

$ geoip 201.222.20.1
--> 201.222.20.1 in 201.222.20.0-201.222.31.255 in BR

$ geoip 201.222.16.1
--> 201.222.16.1 in 201.222.16.0-201.222.19.255 in AR

do you keep it internally as a cidr address? or as a range?
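On the range-versus-cidr question: a range is a single cidr block only when its length is a power of two and its start address is a multiple of that length. The Python sketch below (hypothetical helper `is_cidr`, standard `ipaddress.summarize_address_range` for the split) applies this to the two geoip answers above; the AR range is one /22, while the BR range needs two entries.

```python
import ipaddress

def is_cidr(first: str, last: str) -> bool:
    """True if first..last is exactly one CIDR block: a power-of-two
    length starting on a multiple of that length."""
    lo = int(ipaddress.IPv4Address(first))
    hi = int(ipaddress.IPv4Address(last))
    count = hi - lo + 1
    return (count & (count - 1)) == 0 and lo % count == 0

for first, last, cc in [
    ("201.222.16.0", "201.222.19.255", "AR"),
    ("201.222.20.0", "201.222.31.255", "BR"),
]:
    cidrs = ipaddress.summarize_address_range(
        ipaddress.IPv4Address(first), ipaddress.IPv4Address(last))
    print(cc, is_cidr(first, last), [str(n) for n in cidrs])
# AR True ['201.222.16.0/22']
# BR False ['201.222.20.0/22', '201.222.24.0/21']
```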
>
>>> I am still a little bit amazed that ipfw accepts incorrect CIDR ranges and arbitrarily moves the start/end addresses in order to achieve CIDR conformity, without any further notice, given that ipfw can be considered quite relevant to system security. Or may I assume that ipfw always knows better than the user what should be allowed or denied? Otherwise, perhaps I am the only one who has ever input incorrect CIDR ranges for processing by ipfw.
>> I answered this before but can't see the answer in my out box, plus I have added info..
>>
>> The ipfw code is derived from the routing code. The a.b.c.d/n form is shorthand notation for a.b.c.d [netmask e.f.g.h]
>> there is nothing that says that a.b.c.d need be the first address in the range. (though some vendors may require that.)
>> to quote wikipedia on the topic (yes, I know, not an authoritative source)
>>
>> ==== quote ====
>> The address may denote a single, distinct interface address or the beginning address of an entire network. The maximum size of the network is given by the number of addresses that are possible with the remaining, least-significant bits below the prefix. The aggregation of these bits is often called the host identifier.
>>
>> For example:
>>
>> 	• 192.168.100.14/24 represents the IPv4 address 192.168.100.14 and its associated routing prefix 192.168.100.0, or equivalently, its subnet mask 255.255.255.0, which has 24 leading 1-bits.
>> I use this all the time when parsing information that contains a hostname, and I know the netmask width. It saves me from having to have complicated shell code to pull apart the address and zero out the host bits of the address.
> I got it; anyway, this is no longer an issue for the new geoip table generation.
>
> Best regards
>
> Rolf
>
>
>



