From owner-freebsd-ipfw@FreeBSD.ORG Wed Jul 2 14:43:25 2003 Return-Path: Delivered-To: freebsd-ipfw@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 4CD3D37B401; Wed, 2 Jul 2003 14:43:25 -0700 (PDT) Received: from mail2.fishnavy.inf.cu (mail2.fishnavy.inf.cu [200.55.133.86]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5BECC43FDF; Wed, 2 Jul 2003 14:43:09 -0700 (PDT) (envelope-from jemilio@fishnavy.inf.cu) Received: from jemilio (proxy.cubamar.cu [200.55.133.82] (may be forged)) h62MlJ9N000193; Wed, 2 Jul 2003 17:47:28 -0500 (EST) Message-ID: <00f501c340e3$9a4d31d0$85060a0a@CONDOR.CU> From: =?iso-8859-1?Q?Jos=E9_Emilio_N=FA=F1ez_May=E1ns?= To: "Luigi Rizzo" , , Date: Wed, 2 Jul 2003 17:47:49 -0400 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 8bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1106 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106 Subject: Another algorithm for matching static firewall rules on a network packet X-BeenThere: freebsd-ipfw@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: IPFW Technical Discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 02 Jul 2003 21:43:25 -0000 Another algorithm for matching static firewall rules on a network packet PREFACE Associated with each firewall package is an evaluation algorithm that decides what action to do (deny, allow, count, etc.) with a network packet according to the existent fw-rules. This evaluation algorithm consists basically in a search through the firewall rules for a matching one(s) on the packet. In FreeBSD 4.5, this is the function "ip_fw_chk" in the file "ip_fw.c", IMHO. What is to be discussed next is a possible (?) firewall evaluation algorithm. I'm sorry, but I have no previous experience with the FreeBSD kernel to have a working implementation, so I can't run any real-life comparison. The algorithm may be UNPOLISHED. I hope you have more luck than I had. It's a complex algorithm, so the final idea is communicated in incremental steps, in order to let you correct more easily any error, bug or misunderstanding that, I'm positive, may arise. If I could ask you patience, I shall. The algorithm shown here shouldn't be covered by a patent currently, because it could be in use in the RDBMS arena for many years now. my best wishes, sincerely, José Emilio Núñez Mayáns PS: Although this e-mail is sent to: - professor Luigi Rizzo, by its extensive and professional work on the dynamic (stateful) firewall (and a lot of others); and to - Robert Watson, by the "ugidfw(8)" (File System Firewall Policy in FreeBSD-5.1-current), another kind firewall (on filesystem objects) (TrustedBSD project); - /* it should have been sent to others that have collaborated in the firewall code, Ugen J. S. Antsilevich, Poul-Henning Kamp, Alex Nash, Archie Cobbs... */ it's an incomplete work and FreeBSD is an open project. And I apologize for the unintentional mistakes I have made, and consuming your spare time. TOPICS SECTION 1. Introduction to the proposed algorithm SECTION 1.1. The basic algorithm's steps (ip_fw_chk function) SECTION 1.2. The search data structure SECTION 2. Informal initial sketch of the algorithm in C SECTION 3. Details and improvements of the algorithm SECTION 3.2. Irrelevant fields (omitted or equal to ANY) in a rule SECTION 3.3. Negated (NOT) values in a fw-rule SECTION 3.4. Dealing with ranges and masks of values SECTION 3.7. Frequently-matched rules near the start of the rules list SECTION 3.7.1. A first approach using a partial merge SECTION 3.7.2. Another approach using the fields' structure among the rules, without a merge of the indexing lists SECTION 3.7.3. Consolidating both variants of the algorithm SECTION 3.8. Multi-pass firewalls SECTION 4. Final words ANNEX 1. The merge algorithm (of section 3.2) ANNEX 2. IP_FW_CHK from section 3.7.2 ANNEX 3. How to build the array "break_rule" of each (static) rule (from section 3.7.2) ANNEX 4. IP_FW_CHK from section 3.7.2 ANNEX 90. A useful idea from the dynamic rules SECTION 1. INTRODUCTION TO THE PROPOSED ALGORITHM When a list of (static) firewall rules is set up, all the problematic values (restricted and/or allowed addresses, ports, etc.), are informed to the IPFW subsystem. When a packet arrives, all the values of its fields are known. Since those values don't change while scanning (searching) for deciding what to do with the packet, then, if we can find a way to use this fact in the search, we may obtain some saving. The proposed evaluation strategy tries to economize comparisons, which, in the common sequential-search algorithm may be repeated if two rules share a common field(s), e.g., in these two rules, allow tcp from 192.168.1.0/25 to me smtp allow udp from 192.168.3.0/25 to me syslog the IP-protocol, source and destination addresses, and destination port, are compared twice versus the respective ones in the packet, but no value has changed. This could be expensive if IP6 addresses are to be compared. (But note that, by trying to use the mentioned fact and by trying to save comparisons between the packet's fields and the firewall's, a waste in operations can have occurred in some other sense.) SECTION 1.1. THE BASIC ALGORITHM'S STEPS (ip_fw_chk FUNCTION) The idea consists basically in an indexing schema. 1.- Separate initially each field of the packet, because the comparisons are ultimately done field by field (addresses, ports, flags, ...) between its real value in the packet and what was specified in a fw-rule. 2.- Employ a quick search method for each of the (independent) fields in the (projection on that field of the) fw-rules. (For example, for each field, a hash-based lookup, or, what is also mentioned in the following, a search on a B-tree). Each value that appears in the search data structure of a field is there because it occurs in a fw-rule in such field. Each value (entry or node in the search data structure) has associated an ordered (by "fw_number" fw-rule#) list of the rules that have such value for the field (cf. section 1.2). These ordered lists were built when the search data structure (hash, B-tree, etc) containing the values was. In this step, each search for a field returns the above-mentioned list of fw-rules associated with the value of the field. I.e., this step as a whole returns the ordered lists of rules associated with the values of the fields in the packet. 3.- Merge the lists extracting the fw-rules that are common to all the fields (coincidences), and these are the fw-rules that have to be applied on the packet. As this common sublist will be also ordered by increasing rule number ("fw_number"), low-numbered rules will be applied first on the packet, preserving the existent semantics of the fw-subsystem. 4.- Finally, apply the list of common fw-rules -from step 3- on the packet, until an allow/deny rule is found in the list and then the transversal stops. More exactly, in the words of the ipfw(8) man page, the transversal on the list ends immediately after a "the search terminates" rule is applied (allow, deny, reject, reset, divert, etc.), and continues after a "the search continues" rule (count, skip, etc.) If no "the search terminates" fw-rule is found in the list, then, after the transversal, the default fw-rule is applied to the packet. SECTION 1.2. THE SEARCH DATA STRUCTURE Before continuing to the sketch of the basic algorithm in C, let's briefly expose the search data structure employed in step 2. It would have the following form: Overall perspective: a field main fw rule another field search data struct. list search data struct. | | ("ip_fw_chain_head") | | V ... V | V ... V list list V list list | |... | ----...-........-> rule | |... ... | | -----...--..- | | ... | ----...->.. | V | ---------...->.... -------> ... <----...------- | V ... | V default fw-rule And detailed for each field, search data structure (B-tree, hash, etc.) for the field | (/* containing several values of the field, mentioned in the | firewall rules -the projection of the fw-rules on that | field */) | a node or entry: a single value of the field, mentioned by the | fw-rules | (points to) | -> an ordered -by "fw_number"- list of the fw-rules matching that value for that field. Each entry in the list contains a pointer to a "struct ip_fw" in the "ip_fw_chain_head" list. A list of this type will be returned, if any, by the step 2 (searches) for each field. Step 3 (merge) may not change these lists. So, a node in the lists has this structure: struct list_entry { struct list_entry * next; struct ip_fw * rule; /* to its node on the "ip_fw_chain_head" * list */ }; (Since the lists of rules are built only at the same time the data structure is, the dynamic rules generated in a stateful firewall should not be represented by this algorithm, if it were considerable the time consumed in updating the search data structure and its associated lists when the fw-subsystem adds and removes dynamic rules.) In section 3, there are topics that complement in minor aspects these data structures. But the above diagram will be valid. SECTION 2. INFORMAL INITIAL SKETCH OF THE ALGORITHM IN C To make the idea of the algorithm clearer, let's expose informally an **initial** sketch of it in C: static int ip_fw_chk( struct ip ** pip, /* rest of parameters here*/ ...) { /* (This function is syntactically incorrect.) */ /*** 1. GET THE FIELDS OF THE PACKET ***/ get_packet_fields( pip, hlen, oif, &packet_src_address, &packet_dest_address, &packet_protocol, &packet_in_interface, &packet_out_interface, ...); /*** 2. MAKE A FAST SEARCH FOR EACH FIELD OF THE PACKET ON THE PROJECTION *** OF THE FW-RULES ON THAT FIELD ***/ struct list_entry * ordered_list_for_field[ MAX_FIELDS ]; for each of the fields of the packet /* src_address, dest_address, protocol, interfaces, ... */ { ordered_list_for_field[ fieldX ] = search_in_projection_of_fwrules_on_FIELDx ( packet_fieldX_value ); /* this search could be binary or hash-based. For example, using a * B-tree on the values of the field. There would be then several * B-trees, one for each field. You may use a type of data * data structure for a field and a distinct one for another * field. * * To improve performance, the lists are NOT built by the * searches. Each list returned by each search was set up when its * search data structure -hash, B-tree-, was. * * (Thereafter, the overhead of a stateful firewall -adding and * removing dynamic rules- could put a load in updating the search * data structures and their associated ordered lists, if this * algorithm were used for a stateful firewall.) */ } /*** 3. EXTRACT THE COINCIDENCES IN ALL THE SEARCHES (A MERGE) ***/ struct list_entry * ordered_common_sublist = merge_and_extract_coincidences( ordered_list_for_field ); /*** 4. APPLY THE COINCIDENT RULES ON THE PACKET ***/ if ( ! empty( ordered_common_sublist ) ) { /* The following function, "apply_fw_rules", finally applies the * rules on the packet. * * It returns TRUE if a "the search terminates" (allow, deny, reset, * unreach, ...) rule was applied -and it would have been the only * one of that type to have been applied-, and returns FALSE if all * the rules applied belonged to "the search continues" class (and * therefore the default fw rule has to be applied next). * * This function employs some parameters of "ip_fw_chk", e.g., the * mbuf "m" and the "next_hop", since "apply_fw_rules()" may modify * them if requested by a fw-rule. * * The last parameter to "apply_fw_rules" is really the return value * of this function "ip_fw_chk", if the former returns TRUE. */ int decision_found, ip_fw_chk_returnval ; decision_found = apply_fw_rules( ordered_common_sublist, /* other parameters to ip_fw_chk are passed * here, e.g., the mbuf "m", the * "next_hop", etc, since they may be * modified here */ &ip_fw_chk_returnval ); if ( decision_found ) return ip_fw_chk_returnval; } /* default firewall rule */ /* other stuff here: free the mbuf parameter "m", increment default * rule's statistics, etc. */ return IP_FW_PORT_DENY_FLAG; } /**** END of function ip_fw_chk() ****/ SECTION 3. DETAILS AND IMPROVEMENTS OF THE ALGORITHM There are several details that modify the implementation. SECTION 3.2. IRRELEVANT FIELDS (OMITTED OR EQUAL TO any) IN A RULE Normally, there are fw-rules that don't specify a value for a field. E.g., the rule allow tcp from any to me 80 setup ... doesn't specify any values for the source IP-address/TCP-port. Let's call these rules by "fieldX is irrelevant" in them. When a search data structure (B-tree, hash, etc) for a field is built, it contains known values, explicitly mentioned in the fw-rules. In the case of a "fieldX is irrelevant" rule, all the implicit, missing values of the fieldX domain can be generated into the data structure, but, when the domain is large (IP4 or IP6 addresses), the search on fieldX would be slow, IMHO. A solution is to have, for each fieldX, an entry "REST of the domain", with the list of the "fieldX is irrelevant" fw-rules. This entry is independent of fieldX's search data structure. If the search on the data structure is unsuccessful, then the value of fieldX in the network packet is classified as being in the "REST of the domain", and hence, that list is returned in step 2 for fieldX. (Of course, since these "fieldX is irrelevant" rules can also match every explicit value in the fieldX's search data structure, such rules can be in the explicit values' lists too, as required from section 1.2 which stated "an ordered list of fw-rules *matching* that -explicit- value for that field. Cf section 3.3) -- fieldX ------------------------------------------------------------- | | ------------------------------------------------------------------ | | "REST of the domain" entry | | | | | (points to) | | | | | -> ordered list of the "fieldX is irrelevant" fw-rules. This list | | is returned *only when* the search on the data structure for | | an explicit value was unsuccessful. | ------------------------------------------------------------------ | | ------------------------------------------------------------------ | | search data structure on fw's explicit fieldX's values (cf. section | | 1.2.) | | Note: the rules in the "REST of the domain" entry may also be | | included in the explicit values' lists (cf. section 3.3). | ------------------------------------------------------------------ ------------------------------------------------------ For example, in these two rules, add 1000 allow ip from me to 192.168.1.1 add 2000 allow tcp from any to any established taking in consideration the field "protocol" **only**, then the first rule is in the "REST" entry, because it didn't request a specific IP-family protocol. The second rule is in the list associated to entry "tcp" in the search data structure for field "protocol". Since a TCP packet can match the first rule also -from the field "protocol" point of view, because it specified "ip" that means irrelevant-, then rule 1000 is in the list of the "tcp" entry too -before the second one, by the "fw_number" order: field "protocol": - "REST of the domain" entry ---> list( ... 1000, ...) - search data structure (hash, B-tree, etc.) | | /* has an entry for protocol TCP ...*/ V "tcp" entry ---> list( ... 1000, 2000, ... ) A performance penalty of a "REST of the domain" entry (as employed for the "fieldX is irrelevant" problem of this section 3.2, but see section 3.3) is that, since some rules in the "REST" list can be included in each explicit value's list too, these lists can be longer, and the next step, step 3 -the merge-, will have more work to do exhausting longer lists. For formal reasons only, the "REST of the domain" solution was shown in this section, but it is reserved to be applied to section 3.3. The proposed solution to the "fieldX is irrelevant" problem is that each fw-rule have a bit-vector indicating which fields the rule needs to be matched (the rule has explicit values for them) and which fields it ignores. (This bitmask is very similar to the "fw_flg" and "fw_ipflg" existing in "struct ip_fw", but without the IP_FW_F_COMMAND and some other bits -e.g., the IP_FW_F_KEEP_S and IP_FW_F_CHECK_S bits.) With this bitmask, step 2 (the searches) are unaffected, but in the merge phase, the coincidence on a tentative fw-rule occurs when it appears in all the lists of its needed (explicit) fields, and the lists of its irrelevant fields are ignored. The coincidences with a bitmask for relevant fields (step 2-searches) As usual, array "ordered_list_for_field[ ]" in the code above: [field1] -- points to -> ordered list of fw-rules returned by the search on field1 (having its explicit value matching that in the network packet) ... [fieldN] -- points to -> ordered list of fw-rules returned by the search on field1 (having its explicit value matching that in the network packet) -------- (step 3-merge) Be a tentative fw-rule R (with needed-fields bitmask RB): R is a coincidence of the searches if every field in R's bitmask RB has rule R in the head of its list (in the code above, it is pointed to by "ordered_list_for_field[field]->rule") In Annex 1 there is a possible implementation of this variant for step 3-merge. As a detail, it is immune to "static" rules having no relevant fields at all (from the "static" firewall point of view), e.g. the rule allow check-state because the merge in Annex 1 always tries to discard a minimum (possible coincidence) as soon as possible, but that rule can't be discarded since it doesn't need any match on any field for it to match the packet: it can only be replaced by a lesser "fw_number" matching rule. (For such rules, a question arises in step 2 -the searches-: if the rule depends on no field, then, in which field/list is it to be stored, for the rule can be returned later for its processing in step 3 -merge? At first sight, it seems that in all rules and fields. See section 3.7 for an alternative.) NOTE: about the IPFW2 expressive power and fields both explicit and irrelevant in an IPFW2 rule ... In IPFW2, there may be fields both explicitly requested and irrelevant in a same rule, e.g., the protocol and destination address in deny proto icmp or dst-ip me because, though they are in different OR-blocks, the merge may incorrectly interprets the bitmask as an AND. IMHO, this IPFW2 OR-blocks difficulty could be solved by representing internally the IPFW2 rule in a canonical equivalent set of simpler rules where each field is either explicitly given or irrelevant, but not both. A practical interpretation of such "canonical equivalent set of rules" is not only merging (step 3: extract coincidences) according to a common fw-rule, but requiring further a coincidence in the OR-block in that IPFW2 rule. Thus, the ordered list associated with each explicit value in the search data structure would not be one of rules, but one of OR-blocks in rules: a coincidence needs everyone to point to the same OR-block in a rule. Thus, the canonical set is a kind of disjunctive normal form. SECTION 3.3. NEGATED (NOT) VALUES IN A FW-RULE The problem of negated values for a field in a rule and their matching affects the algorithm. A simple example to proceed: add 10000 deny ip from not 192.168.12.0/25 to any In it, the "IP-SourceAddress" value is negated. (There may be other, better, solutions to this problem than the following, IMHO.) Suppose there is a bitmask "fw_notflg" in the "struct ip_fw" indicating which fields are negated in a rule. That bitmask would be processed in step 3 (merge) or in step 4 (apply) to exclude a negated rule that matched the value in the packet. In step 2 (searches), the rule with the negated value would be in the list of that value, as usual. In the example above: - field: "IP source Address" search data structure | | 192.168.12.0/25 --> list( ..., 10000, ...) - If a packet with IP-SourceAddress == "192.168.12.0/25" arrives, - In step 2 (searches) that value is matched and its list returned, as usual. - In step 3 (merge) or 4 (apply), since rule 10000 in the list has the "IP-SourceAddress" field negated (ON in the "fw_notflg" bitmask), then rule 10000 is discarded. But, what happens on a packet not having that value in the rule? E.g., in the example above, with "IP-SourceAddress"== 192.168.3.3. Then, the value wouldn't be found in the search data structure (or perhaps yes, but rule 10000 wouldn't be in its list), and this is not the intended behavior, because rule 10000 -for field "IP-SourceAddress"- matches indeed, since the value in the packet -192.168.3.3- does match rule 10000's "not 192.168.12.0/25" condition. To avoid this "loss", such fw-rule (in the example, 10000) has to be put in the list of every other explicit value for that field in its search data structure. E.g., if there are these two rules # In real life, it may be after the rule with the "not" add 5000 allow tcp from 192.168.55.0/24 to me smtp setup add 10000 deny ip from not 192.168.12.0/25 to any then we would have that rule 10000 matches for the "Ip-SourceAddress" field == 192.168.55.0/24, and really rule 10000 matches every other explicit value for field "Ip-SourceAddress" that doesn't intersect with rule 10000's negated value "192.168.12.0/25". - field: "IP source Address" search data structure | | ------------|----------- | | | ... | ... | 192.168.55.0/24 --> list( ... 5000, 10000, ...) But, what happens if a packet arrives not having any of the explicit values for the field? E.g., in the example above, with an "IP-SourceAddress" not being "192.168.12.0/25", "192.168.55.0/24" or whatever else is listed in the firewall. Despite such surprise, the negated rule matches on the packet for the field. (In the example, rule 10000 matches on the "unexpected" packet for the field "IP-SourceAddress"). So a "REST of the domain" list is needed for field "IP-SourceAddress" (cf section 3.2). So, in the negated field, the rule with the negated value has to appear in all other explicit values' lists not intersecting the negated value and also in the "REST of the domain" list. Going back to the beginning of this section, why has the rule to appear in the negated value's list, if it is precisely that case that shouldn't match? The negated rule can merely be omitted in its negated value's list. In the example above, - IP source Address (that is the negated field in rule 10000) - "REST of the domain" entry --> ( ..., 10000, ...) - search data structure | /* original version: complex, with a bitmask */ | 192.168.12.0/25 /* the negated value */ | --> list( ...X, 10000, Y,...) /* new version: simpler: negated rule doesn't appear * in its own negated value's list */ | 192.168.12.0/25 --> list( ...X, Y, ...) Therefore, the field "fw_notflag" in "struct ip_fw" is unnecessary for the problem of this section. That's the solution proposed for the negated value problem: - to employ a "REST of the domain" list for (only) the negated field; and - with respect to the negated field, to put the rule with the negated value in: o the list of every other explicit value that doesn't intersect the negated value; and also in o the list of the "REST of the domain" for that field. (A bitmask for representing which fields are negated and which are "positively" required is not needed.) There was a critic in section 3.2 against the "REST of the domain" list in using it for solving the "irrelevant field" problem. It was that the rules in that "REST" list (which were those where the field was irrelevant for them) can be included again in every explicit values' list, so these lists can be longer and this makes the merge analyze more rules. That critic also applies here to this "REST of the domain" list for the "negated values" problem, but noting that negated values in rules are not so common in practice as irrelevant fields, so the "REST of the domain" list, containing only the rules that negate its field, should be shorter than that in the "irrelevant field" counterpart. For example, this brief IPFW2 rule deny proto icmp on the "IP-protocol" field only, is making all other fields in a packet irrelevant for matching the rule. For the "irrelevant field" problem, this rule would have to be in every list of every irrelevant field in it if a relevant fields bitmask weren't used. (E.g., the field "IP source address" would have to have this rule in its explicit values' and "REST" lists.) For the "negated value" problem, this rule doesn't have to be included, since it doesn't have any negated value (and if so it were, then the rule would be duplicated in some lists in the negated field's data structure only, not in every other field's). NOTE: about negated values and IPFW2 OR-blocks A natural constraint in any algorithm pretending to be compatible with the existing sequential search, is that the former has to match the same rule that the latter with the same input. E.g., consider allow src-ip not 192.168.12.0/24 or src-ip not 192.169.13.0/25 If the source IP of the packet is 192.168.13.1, should it be allowed? This rule shouldn't be in the list of the node 192.168.12.0/24 or in that of 192.169.13.0/25 either, because they are negated there; or to the contrary, because both masks don't intersect each other and the subsequent cross-over, the rule should be in both lists. Which to choose? The answer is: choose the one that coincides with the existing sequential search. SECTION 3.4. DEALING WITH RANGES AND MASKS OF VALUES The problem of dealing with ranges and masks is going to be reduced to how to represent them, in order that step 2 (the searches) returns the associated list, and no particular processing is done by step 3 (merge) or 4 (apply rule). (Note that the problem of irrelevant fields -section 3.2- was solved by the latter approach -in the merge-, while the negated values one was by the former method -in the data structure and the searches.) Ranges are dealt first. Let all the values mentioned in a firewall for a field, including those limiting its ranges used in the rules, be put in order: |------------X--------------X------X--.....--X----------| domain-min value1 value2 ... valueN domain-max (values mentioned in the firewall) Let's call a "minimum segment" that between two consecutive values in the domain, without including any of both that is also used as a value alone in the fw. (E.g., a minimum domain would be, in general, a segment from "value(I)+1" to "value(I+1)-1" where the sum/rest of 1 depends on whether the value appears in the fw as a value alone or only delimiting a range; if "domain-min" and "domain-max" are not mentioned in the rules, then another two minimum segments would be: - from "domain-min" to "value(1)-1" - from "value(N)+1" to "domain-max" where there is a similar criterion to the one exposed above on whether to sum/ /rest 1 from a value delimiting these extreme minimum segments.) Let a B-tree where each interior node is a value, and each leaf is either a value, or a minimum segment. E.g., in the port numbers domain -supposing the values used in the fw for this field are 20, 21, 22, 25, 80, 110, 443, 514, 1024 and 3128- the B-tree might be: 443 < | > ------------------------------------ | | 80 1024 | | ------------- --------------------- | | | | 22 110 514 3128 | | | | ---------- ---------- ----------- ----------- | | | | | | | | 20 25 81- 114- 444- 515- 1025- 3129- | | -109 -442 -513 -1023 -3127 -65535 -------- ------- | | | | 1- 21 23- 26- -19 -24 -79 In the example above, any node of the form "A-B", where A and B are two numbers, is a minimum segment, and note they are all leafs in the tree. In practice, there is not need to represent all minimum segments of the set of single values in the firewall for the field. They have to be made only if they are inside of a range specified by a rule, e.g., in minimum minimum segment A segment B ... X---------------X----------------X ... a range's | the range's min value | max value | another value in the fw that is incidentally inside the range. there are only two segments, and if there isn't any other range in the fw for the field, there won't be more minimal segments required. (In other words, the domain of the field doesn't have to be represented into a full partition: a minimum segment only have to be represented in a leaf in the B-tree if it is inside a range mentioned by a rule.) (A "REST of the domain" entry might be appealed for the other, meaningless segments in the domain.) In the example, if the only range mentioned by the firewall rules is that from 1024 to 65535, then the only minimum segments required in the B-tree are 1025-3127 and 3129-65535. Back to the B-tree. The fw-rules in the list associated to a node in the B-tree are those such that the node's value (or minimum segment) is equal to or *contained in* the explicit value or range for the field in them. Note that, since a real range in a fw-rule may be built up of several simpler segments in the B-tree, then a same fw-rule may be in several nodes' lists at the same time. E.g., in the tree above and the rule, add 20000 allow tcp from my-net to any 1024-65535 setup ... the rule's range, 1024-65535, is composed of several nodes in the tree above -such nodes are: 1024, 3128, 1025-3127 and 3129-65535. Therefore, rule 20000 has to be in the lists of these four nodes. There are also many solutions possible to represent masks of values. For example, they can be converted into ranges of values. What is peculiar to masks, though, is that, for each bit in one of them: o if the bit in a position in the mask is 1, then the corresponding bits in its masked value and the real value have to be equal; o if the bit in a position in the mask is 0, the corresponding bit in the real value is ignored. >From this analysis, a pair (mask, masked-value) defines a kind of string matching, IMHO, in the form of a regular expression of the form of ([01]*.*)* where the whole length of the pattern is known before-hand. E.g., 192.168.0.0/255.255.0.255 means the following binary pattern: 11000000'10101000'........'00000000 (length == 32) where "." matches any bit-value, and "'" is used to separate the bytes. (Of course, the term "string matching" refers to this class of problems, no conversion of the real value to a string should be necessary.) What can be extracted from a firewall is then a set of masks (patterns) and the problem is matching the real value in this set of patterns. One solution to this problem is to use a trie of keywords and to build a finite-state machine to recognize the patterns in the set. (More properly, a solution for this search in a set of masks **might** be obtained using ideas from the string matching algorithms.) Trouble is, first, what have to be matched are patterns, not keywords. (Although they aren't general reg-exs, they just specify a bit or ignore it.) Second, ranges of values can also be extracted from the firewall for the same field. It is difficult to represent a range of values into the finite-state machine to recognize it. A solution could be to transform the range of values into a set of masks, but this would augment the number of states in the automata. SECTION 3.7. FREQUENTLY-MATCHED RULES NEAR THE START OF THE RULES LIST Some issues derived from the syntax of a rule has been exposed till now because they should be taken into account. The order of the rules in the overall fw-rules list ("ip_fw_chain_head") is another thing that affects the efficiency of the algorithm. For example, it is common that the first rules in the list are frequent matches, and they only need a couple of fields for their matching. The opportunity is to improve the algorithm using these statistical facts. The template given in "/etc/rc.firewall" -extended to deny the IANA reserved networks and other measures-, begins with the rules allow ip from any to any via lo0 deny ip from any to 127.0.0.0/8 deny ip from 127.0.0.0/8 to any deny ip from ${inet} to any in recv ${oif} deny ip from ${onet} to any in recv ${iif} deny ip from any to 10.0.0.0/8 via ${oif} deny ip from any to 172.16.0.0/12 via ${oif} ... deny networks not assigned by IANA to the public via ${oif} *** divert 8668 ip from any to any via ${oif} deny ip from 10.0.0.0/8 to any via ${oif} deny ip from 172.16.0.0/12 to any via ${oif} ... deny networks not assigned by IANA to the public via ${oif} *** allow tcp from any to any established # or check-state ... ... rest of the rules would follow where the "***"-marked rules are frequent hits under normal circumstances, and only a couple of fields are relevant for all these rules. The point here is not whether the template is used, is that, since the frequently-matched rules should tend to be as close as possible to the beginning of the list of rules to improve the existing sequential search algorithm, and since the lists indexing **all** the fields in the packet don't need to be known there, the point is how to use this ordering. SECTION 3.7.1. A FIRST APPROACH USING A PARTIAL MERGE The simple use here is to limit the searches on the fields of the packet (step 2) -where their lists are obtained-, to only those fields that are strictly relevant, and then merging the lists obtained so far. The basis for this laziness is that the searches in the data structures *might* be more time- consuming than the merge (?). But a problem arises when only the lists of a subset of fields are merged: the case where, even if a feasible -though partial- minimum is produced by the merge, there is a rule before that feasible minimum that doesn't need any of these known fields, so that rule isn't included in any of the lists obtained so far. Therefore, the merge of the subset of lists will never give that rule as a feasible minimum, silently ignoring it. E.g., in allow tcp from any to any 25 setup ... allow udp from any to any 53 ... allow ip from trusted-net to any ... the third rule doesn't need any of the fields relevant for its two previous rules. If only the lists of the fields relevant for the first two rules were merged, the result (coincidence) would always be different (perhaps greater) than the third rule. In short, the set of rules possible to be returned from a partial merge can be discontinuous in the whole list ("ip_fw_chain_head") of rules, IMHO. A solution to this problem is that the partial merge be done and its result known anyway, but each rule before the computed, partial minimum in the "ip_fw_chain_head" list has to be visited to make sure that the rule doesn't have a whole different subset of relevant fields. In other words, to keep visiting each successive rule, skip the unfeasible ones, and if feasible -"feasible" according to the current minimum and the lists merged that coincided in it-, then to search for the lists of its still-unknown relevant fields and to merge with the previously-known lists, until one of these newer lists proves the rule doesn't match, or all the lists of its relevant fields prove the rule is matched: for each successive rule in the "ip_fw_chain_head" list: begin if the rule is unfeasible according to one of its relevant fields which is already known and merged into the partial minimum, then continue with the next rule in "ip_fw_chain_head". for each still-unknown field, relevant for the rule: begin search for the value of the field in the packet and obtain its list of matching rules (make the field "known") merge this list with the lists already known and obtain a partial minimum (coincident rule) in all them if the resulting rule (the partial minimum) proves this rule is unfeasible, then continue with the next rule in "ip_fw_chain_head" end-for the rule has been satisfied by all its fields: apply it on the packet (the transversal may finish here if it is an ipfw(8)-"the search terminates" rule else the search on "ip_fw_chain_head" has to continue, discard current minimum because it was applied on the packet, and merge all known lists again to obtain a new coincident minimum rule.) end-for An implementation is in Annex 2. (Note: this solution also solves the problem of in what list associated to the data structures a rule without relevant fields at all should be stored? (cf section 3.2) Examples of rules without relevant fields are: check-state count ip from any to any deny ip from any to any # the default fw-rule Note that their "rfm" -relevant fields mask- is 0. The answer in this section, by the previous algorithm, is that these rules aren't in any list in the search data structures, they are only referenced through the main "ip_fw_chain_head" list, where they will be visited if necessary. I.e., these rules are not indexed by the search data structures. The next algorithm also preserves this situation for these rules.) SECTION 3.7.2. ANOTHER APPROACH USING THE FIELDS' STRUCTURE AMONG THE RULES, WITHOUT A MERGE OF THE INDEXING LISTS There is a different solution, that while it keeps restricting its analysis to only the fields relevant for the rule, it doesn't do a real merge of the known lists to obtain the next feasible -though partial- minimum. Instead, it recourses to a simpler analysis for not visiting each successive rule, really jumping over some of them. In a way, it tries to improve the "lookup_next_rule" function, defining which rule in "ip_fw_chain_head" will be the one analyzed in the next iteration, by using the lists of matching rules returned by the independent searches for the values in the packet (step 2). To explain this variant, a new concept has to play its role here. It is that of the longest (forward) continuous sequence of rules that share a relevant field with the current rule (the base point of the sequence). The concept is intuitive, but may be formalized as follows: 1.- the longest continuous sequence for a field that starts in a given rule is only defined if that field is relevant in that rule. 2.- if the list of rules consists of only one rule, the longest continuous sequence for each of its relevant fields is just the rule itself. (I.e., the sequences are of length 1.) 3.- if the list of rules consists of more than one rule, and taking without losing generality the base point as the first rule in the list, then the longest continuous sequence for each of the base point's relevant fields depends on whether the field is also relevant for the rule after the base point (it would be the "second" rule in the list starting at the base point): 3a.- if the field is not relevant in the next ("second") rule, then the sequence is just the first rule itself (the base point), as in point 2. 3b.- if the field is also relevant in the next ("second") rule, then the sequence for the field starting in the base point is composed of the base point followed by the longest continuous sequence for the field that starts at the next ("second") rule. Example: in these three rules, allow tcp from my-net to any setup allow udp from another-net to a-net ... deny ip from another-net2 to another-net3 taking as the base point of the sequences the first rule, the longest continuous sequence for the field: o "IP-protocol" ends in the second rule; o "Ip-SourceAddress" ends in the third rule; o "TCP-flags" field is composed of just that first rule; o there is no sequence for field "IP-DestinAddress" starting at the first rule, since that field isn't relevant there. The concept is structural to the list of rules, it doesn't depend on a real network packet. A practical representation for the longest continuous sequence is not explicitly by the sequence per-se, but implicitly by giving which is the rule that breaks -ends- the sequence by not having the field as relevant (point 3a.). (I.e., the rule immediately following in "ip_fw_chain_head" that ending the sequence.) This breaking-rule always exists, because it is at least the default fw-rule, where all fields are irrelevant (by point 3a, it breaks any sequence that reaches just before it). For this reason also, there is no longest continuous sequence starting at the default fw-rule for any field (point 1 in the definition). (I.e., the default fw-rule is unable to be a base-point because it doesn't have any relevant field -point 1 in the definition above.) An implementation of how to build these sequences for a base rule is given in Annex 3. The usefulness of this concept is related to jumping in a safe way over some rules in the "ip_fw_chain_head" list using the known lists for the relevant fields of the rule being analyzed by the loop. "Jumping in a safe way" is jumping over some rules without missing a still-feasible rule. This is detailed here. Suppose the current rule in the main iteration on the "ip_fw_chain_head" list is "R", the transversal has to continue, and that the next rules in the relevant lists for "R" are "N[1]", "N[2]", ... "N[m]". (I.e., "N[i]" is the rule in the head of the relevant list for field "F[i]" in "R" that matches the value in the packet for "F[i]", and further, that "N[i]" follows "R", i.e., "N[i].fw_number > R.fw_number" since the iteration has to continue after "R".) Let "B[i]" be the rule that breaks the longest continuous sequence for field "F[i]" that start at rule "R".(Of course, "B[i].fw_number > R.fw_number", by point 3 of the definition of a longest continuous sequence for field "F[i]".) Suppose "J" is a rule after "R". May a jump be done in a safe way from "R" to "J"? What is the safe "J" that is farthest away from rule "R"? The solution is in the longest continuous sequences starting at "R" for its fields. Consider a rule "N[i]" -that is ultimately taken from the value of "F[i]" in the network packet: o if "N[i]" is inside the longest continuous sequence of its field "F[i]", then jumping directly to "N[i]" from "R" is safe because all rules from "R->next" to the previous to "N[i]" require field "F[i]" by definition of the longest sequence, but they won't be a coincidence because the next matched rule according to the value of field "F[i]" in the packet is "N[i]". (I.e., in this case, "N[i]" is the nearest rule from "R" that is still feasible, according to field "F[i]".) o by the same reason, if a "N[i]" is outside its sequence, all the rules in the longest continuous sequence starting at "R" of its field "F[i]" may be ignored because the field "F[i]" in the packet matched on rule "N[i]"; but the jump can be no further that sequence, because field "F[i]" is irrelevant for the rule after this sequence (its breaking-rule "B[i]"). (I.e., in this case, "B[i]" is the nearest rule from "R" that is still feasible, according to field "F[i]".) Generalizing this reasoning, a safe jump may be to the farthest -greater "fw_number"- of the set of these rules: - the "N[i]"s that are inside the longest continuous sequences starting at "R" for their fields "F[i]" (N[i].fw_number < B[i].fw_number); - the "B[i]"s for those "N[i]"s outside the longest continuous sequences starting at "R" for their fields "F[i]" (N[i].fw_number >= B[i].fw_number). analyzed for every field "F[i]" relevant at "R". Farther than this maximum, it is unsafe, i.e., the jump may be over a feasible rule. In a simple expression: farthest safe jump = max( { min( N[i], B[i] ) } ) where the "i" ranges over the relevant fields of "R" whose lists are known. The "ip_fw_chk" function is then: while (TRUE) begin for each of the relevant fields for the rule: begin if its list of matching rules for the value of the field in the packet is still unknown, then search for the value of the field in the packet and obtain that list (make the field "known") else get its already-known list from the array-cache skip (sync) the rules in (the starting sublist of) this list that are lesser ("fw_number") than the rule being analyzed if the rule in the head of this list coincides with the rule being analyzed then good, another coincidence on this rule else next rule to analyze = farthest, safe jump from this rule that was being analyzed continue main "while" loop on this new rule end-for the rule has been satisfied by all its fields: apply it on the packet (the transversal may finish here if it is an ipfw(8)-"the search terminates" rule else the search on "ip_fw_chain_head" has to continue, next rule to analyze = farthest, safe jump from this rule that was applied on the packet continue main "while" loop on this new rule) end-while In Annex 4 there is an "ip_fw_chk" function using this algorithm. (Note, though, that the farthest, safe jump approach is very dependent on how long a continuous sequence of rules for a field is, and may be innocently torpedoed by breaking the continuous sequence with the insertion of a rule with a different set of relevant fields, or none at all, e.g., by inserting this rule count ip from any to any # relevant fields == none in the middle of the continuous sequence for a field.) SECTION 3.7.3. CONSOLIDATING BOTH VARIANTS OF THE ALGORITHM Comparing both algorithms (section 3.7.1/annex 2 vs section 3.7.2/annex 4), the PROS of each are: o The merge - The merge has a better point of analysis for discarding rules before "real_m", since it takes into account -in the merge- all the known lists, not only those relevant at current, base point, rule. - Its simple comparisons in UNFEASIBLE are done at the beginning of the loop on the current "target_rule", continuing immediately with the next rule in "ip_fw_chain_head" if appropriate. o The farthest, safe, jump - It jumps over some continuous sequence of rules, not needing to visit each successive rule testing whether is feasible (and analyzing it) or not. - It's simpler, but nevertheless, tries to search for a list of a unknown field favoring those fields that may give the farthest, safe, jump. So, a natural step is to combine both algorithms, the merge and the farthest safe jump, basically by leaving the latter compute which rule should be the next in the iteration on "ip_fw_chain_head", and the former -through its check UNFEASIBLE- to quickly discard that jump, given all the known lists it has merged. In practice, both algorithms fit especially nicely, IMHO, into the beginning of the list of rules in the existing template in "/etc/rc.firewall": (a) allow ip from any to any via lo0 (b) deny ip from any to 127.0.0.0/8 (c) deny ip from 127.0.0.0/8 to any deny ip from ${inet} to any in recv ${oif} deny ip from ${onet} to any in recv ${iif} (d) deny ip from any to 10.0.0.0/8 via ${oif} (d) deny ip from any to 172.16.0.0/12 via ${oif} (d) ... deny networks not assigned by IANA to the public via ${oif} *** divert 8668 ip from any to any via ${oif} (e) deny ip from 10.0.0.0/8 to any via ${oif} (e) deny ip from 172.16.0.0/12 to any via ${oif} (e) ... deny networks not assigned by IANA to the public via ${oif} *** allow tcp from any to any established # or check-state ... In the rules (a), (b) and (c), the lists of the rules matching the fields "Input-Interface" (maybe "Output-Interface" if exists), "IP-DestinAddress" and "IP-SourceAddress" in the packet, have to be known in both algorithms. In the long sequence of rules in (d) before the "divert natd" rule, the required fields are the same and are already known, and: o UNFEASIBLE in the merge algorithm may quickly decide if each rule in (d) has to be immediately discarded; and o the farthest, safe jump may skip over the whole sequence of (d)s to the "divert natd" rule directly. The same applies to the long sequence of (e) before the other frequently- matched rule. (We may be even luckier, if the packet doesn't refer to the interface "${oif}", required for the match of the sequence (d), the "divert natd" rule, and the sequence (e), because, if so, the farthest safe jump may be directly to the "allow tcp ... established" or "check-state" rule.) SECTION 3.8. MULTI-PASS FIREWALLS (Luigi Rizzo -et al.- may especially help a ^^LOT^^ here.) Advanced firewalls can recourse to multiple passes on the list of rules (e.g., on "ip_fw_chain_head") before deciding finally to accept or not the packet. At the end of each intermediate pass, the matched rule in it points to another kernel-module which has to process the packet. In a way, a rule in the firewall, re-using its matching capabilities, instructs it to "route" the packet through another kernel-module for additional processing, and after that, another pass in the firewall may begin. (Cf. Luigi Rizzo's work.) This technique is present in "ipfw(8)" to implement: o dummynet (temporary transfer of control to it), if the sysctl-var "net.inet.fw.one_pass" is not set (if set, then the dummynet's action means "[pipe/queue] and accept the packet"); o packet tee'ing, if the search had to continue for action "tee". A processing loop for the packet in "ipfw" is avoided by transferring the state of the search between successive passes. E.g., in the current implementation, this is attained by saving the rule where the previous pass ended, and then, in the next pass in "ip_fw_chk", resuming processing of the packet in the next rule. The state of the search between passes is saved in a local variable in the stack frame of a caller function of "ip_fw_chk", and a pointer to it is passed down to 'ip_fw_chk". In the algorithm shown in this paper, the state may also be saved and transferred between successive passes in the firewall. If a merge is used, then after the rule was applied on the packet but before the pass returns, the next "real_m" **could** be calculated for the next pass (the old "real_m" might be the rule just applied and if so, has to be discarded): ... /* refer to Annex 2 */ decision_found = apply( target_rule, packet, &ip_fw_chk_returnval ); if( ( ! decision_found /* the search has to continue in this pass */ || ( fw_one_pass == 0 && /* there will be a return and then ... */ ( ip_fw_chk_returnval & /* ... another pass in the fw */ (IP_FW_PORT_DYNT_FLAG|IP_FW_PORT_TEE_FLAG))) && ( real_m.fw_number == target_rule->fw_number ) ) { /* refer to the existing code in "ip_fw_chk" in Annex 2 to pop a * feasible minimum from the stack if possible, and then merging all * known lists. */ ... ... } if( decision_found ) return ip_fw_chk_returnval; There would be a minor change in semantics in this code to what is done in the current implementation, where "fw_one_pass" -giving whether the firewall is multi-pass or not- is only checked at the beginning of the new pass. The difference, in practice, only appears on packets that are being processed just when a change to "fw_one_pass" happens between the passes, a random event. The alternative, also viable, is to update "real_m" in the beginning of the new pass, if it is equal to the "last_rule_processed" by the previous pass. The code to be inserted at the beginning of "ip_fw_chk" to update it is very similar to what is done at the end of it (pop from the stack and merge the lists). In either case, if the merge is used, the state to be saved between passes is: struct state_of_the_search { struct ip_fw * last_rule_processed; struct list_entry * ordered_list_for_field[ MAX_FIELDS ]; struct minimum_entry minimums_stack[ MAX_FIELDS ]; u_int32 merge_done_mask, empty_lists_mask; struct t_real_m real_m; u_int8 is_real_m_set, old_fw_one_pass, map_size, map[ MAX_FIELDS ]; }; Note: if a new "real_m" is computed only at the beginning of the new pass, instead of at the end of the previous, the members "real_m", "is_real_m_set" and "old_fw_one_pass" are unnecessary in the saved state. In the version of Annex 4 (with the farthest jump, without any merge at all), the state to be saved is simpler: struct state_of_the_search { struct ip_fw * last_rule_processed; struct list_entry * ordered_list_for_field[ MAX_FIELDS ]; u_int32 search_done_mask, empty_lists_mask; }; In either implementation, what has to be done at the beginning of "ip_fw_chk" to resume a pass consists mainly in: if( state->last_rule_processed == NULL ) /* first pass */ target_rule = LIST_FIRST( &ip_fw_chain_head ); else { /* a successive pass */ if ( fw_one_pass ) /* or "state->old_fw_one_pass" */ return 0; /* multi-pass is OFF */ target_rule = LIST_NEXT( state->last_rule_processed, next ); /* Remember the note above, if the merge of the lists is used * and "real_m" was not computed at the end of the previous pass * and saved in the state. The code mentioned there to compute * the new "real_m" has to be put here. * * (If that is the case but "real_m" weren't updated here, since * the old "real_m" would be "last_rule_processed", then * * real_m.fw_number < target_rule->fw_number * * and then the UNFEASIBLE macro wouldn't work. Consult the * comment before the definition of UNFEASIBLE in Annex 2.) */ } ... main FOR loop on "target_rule" (the same that in Annex 2) ... A problem with the approach of transferring the state between the passes of either variant of this algorithm is the big size of the state to be saved. Some members in the structure could be discarded at the end of a pass, not saving them (specially if a merge is used) and then easily calculated again in the next pass but the size is anyway between 100-500 bytes. For example: struct tiny_stack_entry { struct ip_fw * minimum_rule; u_int8 map_index; /* Don't save the copies "fw_number" and "rfm" in the entry */ }; struct state_of_the_search { u_int16 last_rule_processed_fw_number; struct list_entry * ordered_list_for_field[ MAX_FIELDS ]; struct tiny_stack_entry minimums_stack[ MAX_FIELDS ]; u_int8 map[ MAX_FIELDS ]; /* Perhaps "is_real_m_set", "real_m" and "old_fw_one_pass" might * be here too. */ }; ... ... /* In the beginning of "ip_fw_chk", if a next pass is detected */ else { /* a successive pass */ if ( fw_one_pass /* "state->old_fw_one_pass" */ ) return 0; target_rule = locate_in_the_ip_fw_chain_head_list_the_rule_after ( state->last_rule_processed_fw_number ); merge_done_mask = empty_lists_mask = map_size = 0; while( map_size < MAX_FIELDS && ((field_no = state->map[map_size]) != FIELD_NOT_ASSIGNED)) { merge_done_mask |= 1<ordered_list_for_field[map_size] == NULL ) empty_lists_mask |= 1<next; \ } /****************************************************/ /* Begin of the merge */ ordered_common_sublist = NULL; empty_lists_mask = 0; lookup_for_a_new_minimum: /* a goto label- there are only two in this code */ rule_minimum = IPFW_DEFAULT_RULE; /* no minimum so far */ rule_min_ptr = NULL; min_relevant_fields = 0; occurrence_mask = 0; /* no minimum, no occurrence */ field_no = -1; /* start again from field index == 0 */ while ( ++ field_no < Packet_protocols_Qtty_fields ) { field_mask = 1 << field_no; /* the mask for the field */ if( empty_lists_mask & field_mask != 0 ) /* this list had been already detected to be empty */ continue; try_next_value_in_this_list: /* the other goto target */ if( ordered_list_for_field[field_no] == NULL ) { /* this list has been emptied */ empty_lists_mask |= field_mask; /* don't waste time on the current minimum if this list was * relevant for it */ if ( rule_minimum < IPFW_DEFAULT_RULE && min_relevant_fields & empty_lists_mask != 0 ) { /* this field (list) was relevant for the current coincidence: * discard this partial coincidence */ /* "11111111" */ ADVANCE_HEAD_OF_LISTS( occurrence_mask ); goto lookup_for_a_new_minimum; } else /* this list belongs to a field irrelevant for the current * minimum (partial coincident rule) */ continue; /* ignore this list for this minimum */ } assert( ordered_list_for_field[field_no] != NULL ); /* Get the fw-rule pointed to by the head of this list */ current_rule_ptr = ordered_list_for_field[field_no]->rule; current_rule = current_rule_ptr->fw_number ; current_relevant_fields = current_rule_ptr->relevant_fields_mask; if( current_rule < rule_minimum ) { /* a new minimum for the merge: replace the new minimum, IF * POSSIBLE, according to circumstances */ /* Useful to keep in mind: since field_mask is 1<next; /* NOTE: we advanced our head pointer but didn't change the * list in the search data structure */ goto try_next_value_in_this_list; } else { /* Yes, it is a new minimum and a possible coincidence */ /* "22222222" */ rule_minimum = current_rule; rule_min_ptr = current_rule_ptr; min_relevant_fields = current_relevant_fields; /* init its occurrence mask */ occurrence_mask = field_mask; } } else if ( current_rule == rule_minimum ) { /* good, another coincidence for the current minimum. May be * useless (?), but we register it anyway */ occurrence_mask |= field_mask; } else { assert( current_rule > rule_minimum ); /* Was this field relevant for the current minimum? */ if( min_relevant_fields & field_mask != 0 ) { /* sorry, it was relevant: discard the current minimum */ /* "33333333" */ ADVANCE_HEAD_OF_LISTS( occurrence_mask ); goto lookup_for_a_new_minimum; } } } /* end of the WHILE loop */ if ( rule_minimum < IPFW_DEFAULT_RULE ) { /* a new minimum found and with all its required coincidences */ assert( min_relevant_fields & occurrence_mask == min_relevant_fields ); /* emit this new minimum (its "struct ip_fw" pointer) for the common * sublist for step 4 (apply the rules on the packet) */ append( ordered_common_sublist, rule_min_ptr ); /* "44444444" */ ADVANCE_HEAD_OF_LISTS( occurrence_mask ); if( is_there_any_remaining_list( empty_lists_mask ) ) goto lookup_for_a_new_minimum; } return ordered_common_sublist; } /* end of the merge phase */ There is an observation that may lead to improve the run-time and it is that, when a new minimum is found that replaces an existing minimum, then the new minimum can finally be appended to the common list or discarded, but anyway, the replaced -old- minimum can be employed again because it still **is** the minimum for the first lists checked until the other, newer and lesser, minimum that replaced it. I.e., the replaced minimum should be saved somewhere, and this minimum should be the first one to be resumed again -when a new *partial* minimum is wanted for resuming the merge. Since the old minimum have to be saved last but restored first, a LIFO policy, then a stack for saving the replaced minimum may be an optimization, IMHO. Note that the contents of the stack would be ordered by "fw_number" rule# from its top incrementally to its bottom. In the following patches, a stack "stack_minimums_history" is used, that would be declared local at the beginning of the merge. The contents of an entry in this stack are (old_minimum_ptr, occurrence_mask_til_replaced, field_where_it_was_replaced) Excuse me very deeply for my poor form of back-referencing for patching: /* look for this comment in the code above: * "11111111" * for the following patch */ /* start context */ ADVANCE_HEAD_OF_LISTS( occurrence_mask ); /* changes start here */ while( !empty( stack_minimums_history ) ) { /* the current minimum can't hold its requirements */ /* We try, with the extra visits we have done with the just * discarded minimum -before this "while" loop-, to discard also * the minimums we met in the first lists (lower "field_no"s) * and are in the stack -and were defeated by the minimum just * discarded- for not visiting them again. If one of those * previously met minimums (the lowest at the top of the stack) * can't be discarded because it has some chance to be a real * coincidence, then we instead resume visiting, with it as the * new minimum, from the "field_no"s where it was replaced and * pushed onto the stack. * * The alternative to this would be to optimistically continue * visiting the higher "field_no" lists hoping to meet a new * minimum there. The visiting of the whole array of lists would * be then circular, wrapping from * "Packet_protocols_Qtty_fields - 1" down to 0 inside the * "while" loop, without the need of a goto to * "lookup_for_a_new_minimum", and only finishing the "while" * when "empty_fields_mask" is all "1"s. * * Note that our choice, in other words, causes us to exhaust * first, if possible, the lower "field_no"s lists */ pop( stack_minimums_history, &rule_minimum_ptr, &occurrence_mask, &field_no ); min_relevant_fields = rule_minimum_ptr->relevant_fields_mask; if( min_relevant_fields & empty_lists_mask != 0 ) { /* "empty_lists_mask" is the currently known empty lists * mask, i.e., it was not popped from the stack */ ADVANCE_HEAD_OF_LISTS( occurrence_mask ); } else break; /* We have just found a possible minimum */ } if( min_relevant_fields & empty_lists_mask != 0 ) /* the stack_minimums_history was emptied and none of its * recorded minimum could keep at least a hope according to the * already empty lists */ goto lookup_for_a_new_minimum; /* from the beginning */ else { rule_minimum = rule_minimum_ptr->fw_number ; field_mask = 1 << field_no; goto try_next_value_in_this_list; /* of the just popped * "field_no" */ } /* end of patch for "11111111" */ .... .... /* look for this comment in the code above: * "22222222" * for the following patch */ /* save the minimum just to be replaced */ if ( rule_minimum < IPFW_DEFAULT_RULE ) push( stack_minimums_history, rule_min_ptr, occurrence_mask, field_no ); /* keep doing what was previously indicated, no change */ rule_minimum = current_rule; /* put the new minimum */ rule_min_ptr = current_rule_ptr; min_relevant_fields = current_relevant_fields; /* init its occurrence mask */ occurrence_mask = field_mask; /* end of patch for "22222222" */ .... .... /* for the code after the comments "33333333" and "44444444" we have * to apply the same patch that for "11111111". */ The main (outermost) "while" in the merge -the one immediately after the label "lookup_for_a_new_minimum"- always start with field "field_no = 0", and increment it consecutively until "field_no" reaches the number of fields in the packet's protocols. I.e., there is no wrapping of "field_no" inside the "while" loop back to 0, this can only happen by a goto to "lookup_for_a_new_minimum", before the "while", where "field_no" is reset to 0. In the previous patches, a goto to "lookup_for_a_new_minimum" happens because the "stack_minimums_history" was emptied. So, combining both facts, it can be inferred that the maximum depth that the stack may have is the number of fields in the packet ("Packet_protocols_Qtty_fields"), that only occurs when every new field (visited incrementally by its index) defines a new possible minimum that replaces the immediately previous. Thereafter, "stack_minimums_history" may be declared in the code as a fixed-size (pre-allocated) local array: typedef u_int mask; #define MAX_FIELDS ( sizeof(mask) * 8 ) /* Really, the number of different fields in IPFW2 is 27, IMHO */ struct minimum_entry { struct ip_fw * partial_minimum; int field_no_where_it_lost; } stack_minimums_history[ MAX_FIELDS ]; That avoids to call the heap allocation routines for the stack, that would be relatively expensive since merely a network packet is being matched. I.e., from the perspective of the overhead incurred, the use of a stack is feasible, IMHO. (Also, a little patch to keep the algorithm's state clear in that label: lookup_for_a_new_minimum: assert( empty( stack_minimums_history ) ); ) A simplification follows the use of the stack to save previous minimums. In the code above, "occurrence_mask" is used to remember where the current minimum, "rule_minimum", has occurred so far, in order to use "ADVANCE_HEAD_OF_LISTS(occurrence_mask)" to discard that minimum when it proves to be unfeasible. If another minimum improved (replaced) this old minimum, then the occurrences of the old minimum couldn't be skipped in the heads of its lists because that was the way to remember that it existed -i.e., to find it again in the first lists if needed to be. But with the "stack_minimums_history", the previous minimum is saved (pushed) into the stack, so it won't be lost -really, it may be popped out later-, so not skipping the heads of the lists where that minimum occurred is a very conservative move. That heads may be skipped, because that minimum won't be lost, either discarded or saved into the stack. But if the heads may be skipped, "ADVANCE_HEAD_OF_LISTS(occurrence_mask)" can't be used, and "occurrence_mask" is useless. So, simply, where an assignment to "occurrence_mask" happened to record in which lists the minimum occurred, now it's replaced by an advance in the head of that list. ANNEX 2. IP_FW_CHK FROM SECTION 3.7.1 struct t_real_m { u_int16 fw_number; /* a copy of the rule's fw_number */ u_int32 rfm; /* and its relevant fields mask */ struct ip_fw * fw_rule_ptr; /* the rule in ip_fw_chain_head */ }; struct minimum_entry { struct t_real_m old_minimum; u_int8 map_index; /* the "field_no" where it lost */ u_int32 merge_done_mask; }; static void merge( struct list_entry ** ordered_list_for_field, u_int8 starting_at_map_index, u_int8 * map, u_int8 map_size, struct minimum_entry * minimums_stack, u_int8 * is_real_m_set, struct t_real_m * real_m, u_int32 * empty_lists_mask, u_int32 * merge_done_mask ); /* * If "is_real_m_set" is TRUE ("real_m" has a value), then "real_m" has to be * the current rule being visited by the iteration, or greater than it. * ("real_m" may not be lesser than it because it should be then previously * discarded or applied on the packet, but nevertheless such "real_m" would be * history now.) In symbols, what was expressed here is that * * rule_being_visited->fw_number <= real_m.fw_number * * if "is_real_m_set" is TRUE. * * Feasibility of rule being visited, i.e., that the rule may be a partial * coincidence when we sequentially transverse the "ip_fw_chain_head" list, * requires both ("rfm" is the "relevant fields mask" of the rule): * * o rule->rfm & empty_lists_mask == 0 * * o rule->rfm & merge_done_mask != 0 ===> (implies) * ===> is_real_m_set && real_m.fw_number == rule->fw_number * * These are necessary conditions for possible feasibility of a given fw-rule. * Really, the first condition is implied by the second, as can be easily * proved: * * is_real_m_set && real_m.fw_number == rule->fw_number ===> * ===> rule->rfm & empty_lists_mask == 0 * * otherwise the merge would have produced a feasible "real_m" with one of its * relevant list known and empty. But, in real-life, since the * "merge_done_mask" may be restored from the old minimums stack, but the * "empty_lists_mask" not (i.e., "empty_lists_mask" may be ahead of * "merge_done_mask" when the latter is retrieved from the stack), the first * condition is left explicitly checked. * * Since we try to discard a rule from analysis for coincidence as soon as * possible, we check for its unfeasibility, i.e., the negation of the above * conditions. */ #define UNFEASIBLE(r) \ ( \ ( (r)->nfm & merge_done_mask != 0 && \ ( ! is_real_m_set || (r)->fw_number < real_m.fw_number )) \ || \ ( (r)->nfm & empty_lists_mask != 0 ) \ ) static int ip_fw_chk( struct ip ** pip, /* rest of parameters here ... */ ) { u_int32 empty_lists_mask, merge_done_mask, preferred_fields, field_mask; u_int8 is_real_m_set, map_size, map_index; struct t_real_m real_m; struct list_entry * ordered_list_for_field[ MAX_FIELDS ]; struct minimum_entry minimums_stack[ MAX_FIELDS ]; u_int8 map[ MAX_FIELDS ]; struct ip_fw * target_rule; struct list_entry * list4value; /* The list for the value of the field in * the network packet. */ /************************************************************/ /* * Begin of the work of "ip_fw_chk" */ /************************************************************/ empty_lists_mask = 0; merge_done_mask = 0; is_real_m_set = FALSE; map_size = 0; target_rule = LIST_FIRST( &ip_fw_chain_head ); for(; targe_rule; target_rule = LIST_NEXT(target_rule, next)) { /* We cannot have a minimum "real_m" that is before this rule * "target_rule", because we should have applied that minimum when we * visited it, since all of its relevant fields had been merged into * "real_m" and hence assured it as a full coincidence. * * is_real_m_set ===> (implies) * ===> target_rule->fw_number <= real_m.fw_number */ assert( ! is_real_m_set || target_rule->fw_number <= real_m.fw_number ); /* We cannot have the "minimums_stack" non-empty if there is no best * current minimum set * * ! empty( minimums_stack ) ===> is_real_m_set */ assert( empty( minimums_stack ) || is_real_m_set ); /* This is how "merge_done_mask" might be rebuilt by each iteration if * needed to be (since it's in practice preserved among the iterations and * by the minimums' stack). */ merge_mask_temp = 0; /* just for the assertion */ for( m=0; m < map_size; m++ ) merge_mask_temp |= 1 << map[m]; assert( merge_mask_temp == merge_done_mask ); /* End of the assertions, back to work ... */ if( UNFEASIBLE( target_rule ) ) continue; /* Nothing to do with this rule */ /* We calculate which fields are pending for deciding whether this rule * applies or has to be discarded. * * There are two passes here over the set of relevant fields for this rule * "target_rule". In the first pass we try to analyze only those of its * fields that are also relevant for rule "real_m" but are still unknown, * because, as we know we have to search for some unknown fields for this * rule "target_rule", we prefer those that also affect "real_m". * * (Note: the condition * * target_rule->fw_number != real_m.fw_number * * implies, by "! UNFEASIBLE(target_rule)", that * * target_rule->rfm & ~merge_done_mask == target_rule->rfm * * i.e., there are no coincidences between both. But with * "~merge_done_mask" is clearer. */ preferred_fields = (is_real_m_set && target_rule->fw_number != real_m.fw_number) ? (target_rule->rfm & real_m.rfm & ~merge_done_mask) : 0; /* else no preference */ for( importance = 2; importance; importance -- ) { if( preferred_fields == 0 ) goto skip_FORloop_on_field_no; for( field_no = 0; preferred_fields; field_no ++ ) { field_mask = 1 << field_no; if( field_mask & preferred_fields == 0 ) continue; /* This field is not required by the rule or is * not preferred in this pass by its importance */ preferred_fields ^= field_mask; /* clear its bit */ /* Search this field in its data structure for the first and unique * time */ list4value = search( field_no, packet->fields[field_no] ); /* We don't know anything yet about the list just returned */ assert( list4value == NULL || list4value != NULL ); /* We store this list as a new entry in our array. We don't try to * store it at index "field_no", because then the array would be * dispersed, and we prefer it to be condensed, growing from 0, to * facilitate our merge */ map_index = map_size ++; map[map_index] = field_no; ordered_list_for_field[map_index] = list4value; /* Merge this newly known list with the ones we already have from * previous iterations on other "target_rules", and from previous * fields from this same rule. * * (I don't think a CALL instruction is cheap here, but to make the * code clearer.) */ merge( ordered_list_for_field, map_index, map, map_size, minimums_stack, &is_real_m_set, &real_m, &empty_lists_mask, &merge_done_mask ); if( ! is_real_m_set || target_rule->fw_number < real_m.fw_number ) { /* there is nothing left we can do for this "target_rule", * because we have included one of its distinctive fields in * the merge, and, if it were a coincidence (in "real-m"), then * "target_rule" should be the "real_m". You may check by * yourself that this condition in this "if" is equivalent to * UNFEASIBLE( target_rule ) by the asserts above. */ assert( UNFEASIBLE( target_rule ) ); goto NEXT_TARGET_RULE; } assert( is_real_m_set && target_rule->fw_number == real_m.fw_number ); assert( ! UNFEASIBLE( target_rule ) ); } /* end FOR(field_no) merging unknown fields for "target_rule" */ /* next cycle of lower importance: the remaining unknown fields of * "target_rule". Note here that it is possible that * * target_rule->fw_number == real_m.fw_number * * by the end of the previous FOR(field_no) loop, so we mayn't use * "real_m" here since it may have been changed. We simply use * "merge_done_mask" that has been updated too. */ skip_FORloop_on_field_no: preferred_fields = target_rule->rfm & ~merge_done_mask; } /* end of the FOR(importance) loop */ /* By the last two assertions above, here we know this "target_rule" has * been matched by the packet, and since we are walking through the * "ip_fw_chain_head" list respecting its "fw_number" order, we know we * may apply "target_rule" on the packet. We'll immediately proceed to do * so, because this rule may be a "the search terminates" one. */ assert( (target_rule->rfm == 0) /* A very unusual case, e.g., the rule * "count ip from any to any" * that the merge will always ignore */ || ( is_real_m_set && real_m.fw_number == target_rule->fw_number && real_m.rfm & merge_done_mask == real_m.rfm) ); decision_found = apply( target_rule, packet, &ip_fw_chk_returnval ); if( ! decision_found && real_m.fw_number == target_rule->fw_number /* i.e., equivalent to * target_rule->rfm != 0 */ ) { /* The rule applied was a "the search continues" rule. * * Since "target_rule" is a real coincidence in the merge but it has * been applied on the packet -it can't be applied twice on it-, we * have to forget it. */ do { if( empty( minimums_stack )) break; pop( minimums_stack, &real_m, &map_index, &merge_done_mask ); } while( real_m.rfm & empty_lists_mask != 0 ); if( real_m.fw_number == target_rule->fw_number || real_m.rfm & empty_lists_mask != 0 ) { /* nothing could be found in the minimums stack */ map_index = 0; merge_done_mask = 0; is_real_m_set = FALSE; } merge( ordered_list_for_field, map_index, map, map_size, minimums_stack, &is_real_m_set, &real_m, &empty_lists_mask, &merge_done_mask ); } if( decision_found ) /* "decision_found" is the same that from the return * of the "apply" function above */ return ip_fw_chk_returnval; NEXT_TARGET_RULE: ; } /* end of the FOR on "target_rule" */ /* The default fw-rule is linked into the "ip_fw_chain_head" list and is * visited in the "for" loop if it is the rule matched. The specificities of * the default fw-rule -free the mbuf parameter "m", etc- are dealt with by * the "apply" rule function. */ } /* end of the "ip_fw_chk" function */ /***********************************************************/ /* * * The "merge" function. * */ /***********************************************************/ static void merge( struct list_entry ** ordered_list_for_field, u_int8 starting_at_map_index, u_int8 * map, u_int8 map_size, struct minimum_entry * minimums_stack, u_int8 * is_real_m_set_ptr, struct t_real_m * real_m_ptr, u_int32 * empty_lists_mask_ptr, u_int32 * merge_done_mas_ptr ) { /* Avoid pointer references, by using local copies. (In real life, this * independent function "merge" shouldn't exist; the following should * be directly into "ip_fw_chk".) */ u_int32 empty_lists_mask, merge_done_mask, field_mask; u_int8 is_real_m_set, field_no; struct t_real_m real_m; struct list_entry * list4value; empty_lists_mask = *empty_lists_mask_ptr; merge_done_mask = *merge_done_mask_ptr; is_real_m_set = *is_real_m_set_ptr; if( is_real_m_set ) real_m = *real_m_ptr; do { DO_LOOP: field_no = map[starting_at_map_index]; list4value = ordered_list_for_field[starting_at_map_index]; field_mask = 1 << field_no; /* The bitmask "merge_done_mask" represents over which fields the * merge has been concluded -and "real_m" extracted. Here, we are * starting a merge on a field at "starting_at_map_index", so... */ merge_mask_temp = 0; for( m=0; m < starting_at_map_index; m++ ) merge_mask_temp |= 1 << map[m]; assert( merge_mask_temp == merge_done_mask ); /* We can have a possible minimum, i.e., have "is_real_m_set" == TRUE, * only when we have merged at least one list, i.e., * starting_at_map_index >= 1: * * is_real_m_set ===> (implies) ===> starting_at_map_index >= 1 * * This is really a consequence of the above assert, i.e., that a * "real_m" can only exists when there has been at least a merge, * "merge_done_mask" > 0 */ assert( ! is_real_m_set || starting_at_map_index >= 1 ); /* This comes from the Annex 1: the "minimums_stack" can only save at * most one minimum for each visited field, since we don't wrap * circularly our merge from "MAX_FIELDs" down to 0. */ assert( depth( minimums_stack ) <= starting_at_map_index ); /* Real work to be done */ while( list4value != NULL && UNFEASIBLE( list4value->rule )) list4value = LIST_NEXT( list4value, next ); assert( list4value == NULL || ! UNFEASIBLE( list4value->rule ) ); /* Save back in our array what is remaining from the list */ ordered_list_for_field[starting_at_map_index] = list4value; /* Analyze all, IMHO, possible cases in the merge. */ if( ! is_real_m_set ) { /* No "real_m" yet */ if( list4value != NULL ) { /* "list4value->rule" is a feasible rule according to the * previous "while" loop. */ assign_min( real_m, list4value->rule ); is_real_m_set = TRUE; /* we have just set it */ } else empty_lists_mask |= field_mask; } else /* we had a previous feasible minimum in "real_m" */ if( list4value == NULL ) { empty_lists_mask |= field_mask; /* How this empty list interacts with the previous minimum */ if( real_m.rfm & empty_lists_mask != 0 ) { /* This list was relevant for the previous minimum: discard * it, and try to rescue another feasible minimum from the * stack. * * Note that the "UNFEASIBLE( of the top of the * minimums_stack )" reduces only to test if one of its * relevant lists is empty, because here we should have * "is_real_m_set == FALSE" from the "if" immediately above * this comment. */ while( real_m.rfm & empty_lists_mask != 0 && ! empty( minimums_stack )) pop( minimums_stack, &real_m, &starting_at_map_index, &merge_done_mask); if( real_m.rfm & empty_lists_mask != 0 ) { /* Nothing was found in the stack */ is_real_m_set = FALSE; /* We have to merge all known fields again */ starting_at_map_index = 0; merge_done_mask = 0; } goto DO_LOOP; } } else /* we had a previous feasible minimum in "real_m" and this new * "list4value" != NULL * * What is remaining is to compare "list4value->rule->fw_number" * against the current feasible minimum */ if( list4value->rule->fw_number < real_m.fw_number ) { /* We know from the "while" before this large "if" that * "list4value->rule" may be FEASIBLE. */ /* save the old minimum in the stack */ push( minimums_stack, real_m, starting_at_map_index, merge_done_mask ); assign_min( real_m, list4value->rule ); /* And we don't need this rule any more, since we have put it * in "real_m". If it is defeated by a higher * "starting_at_map_index" minimum, then it will be saved then * in the "minimums_stack". */ ordered_list_for_field[starting_at_map_index] = LIST_NEXT( list4value, next ); } else /* we had a previous feasible minimum in "real_m" and this new * "list4value" != NULL and * "list4value->rule->fw_number" >= "real_m.fw_number" */ if( list4value->rule->fw_number == real_m.fw_number ) /* A coincidence on "real_m": skip this rule, because we * already knew it (in "real_m"). */ ordered_list_for_field[starting_at_map_index] = LIST_NEXT( list4value, next ); else /* we had a previous feasible minimum in "real_m" and this new * "list4value" != NULL and * "list4value->rule->fw_number" > "real_m.fw_number" */ /* See if this list was relevant for the current feasible * minimum, because it hasn't got a coincidence in this list. */ if( real_m.rfm & field_mask != 0 ) { /* This list (field) was relevant for the current feasible * minimum: discard it and try to rescue another minimum * from the "minimums_stack" */ real_m.rfm = empty_lists_mask; /* Discard "real_m" */ do { if( empty( minimums_stack )) break; pop( minimums_stack, &real_m, &starting_at_map_index, &merge_done_mask ); } while( real_m.rfm & empty_lists_mask != 0 ); if( real_m.rfm & empty_lists_mask != 0 ) { /* No minimum in the stack was feasible */ is_real_m_set = FALSE; starting_at_map_index = 0; /* merge all lists again */ merge_done_mask = 0; } goto DO_LOOP; } /* no more cases to analyze for the "if" of the merge */ merge_done_mask |= field_mask; /* If this list was found before this "if" to be non-empty then a * feasible minimum in "real_m" has to be set, since at least we know * that the rule at the "list4value->rule" was FEASIBLE according the * "while" just before the "if". * * field_mask & empty_lists_mask == 0 ===> is_real_m_set */ assert( field_mask & empty_lists_mask != 0 || is_real_m_set ); } while ( ++ starting_at_map_index < map_size ); /* Copy back all the local values to the pointer parameters */ *empty_lists_mask_ptr = empty_lists_mask; *merge_done_mask_ptr = merge_done_mask; *is_real_m_set_ptr = is_real_m_set; if( is_real_m_set ) *real_m_ptr = real_m; } /* end of function "merge" */ ANNEX 3. HOW TO BUILD THE ARRAY "break_rule" OF EACH (STATIC) RULE (FROM SECTION 3.7.2) Let's call the array with the break rules for the longest continuous sequences starting at a rule by "break_rule" (cf. section 3.7). This array is indexed by relevant field in the rule, so in the code, to keep the array compact, another array "relevant_field" for a rule gives the real field number. Both arrays have size "number_relevant_fields" for its rule. We suppose that the "ip_fw_chain_head" list is doubly-linked (there is a "previous" pointer in the "struct ip_fw", analogous to the existing "next" pointer there). There is no circularity in the "ip_fw_chain_head" list. The cases analyzed are when a new fw-rule is added (the function "add_entry" in "ip_fw.c") or deleted ("del_entry"). CASE 1: A new (static) fw-rule is added. static int add_entry( struct ip_fw_head * head, struct ip_fw * new_rule ) { ... ... /* We suppose here that the rule "new_rule" has been already inserted in * its correct place into the "head" list, the pointers in the list have * been updated, and that both arrays, "new_rule->relevant_field" and * "new_rule->break_rule", have been allocated in memory to their size of * "new_rule->number_relevant_fields" */ u_int32 interesting_fields_mask; u_int8 seq, seq1, field_no; struct ip_fw * rule_after; /* Update both arrays for "new_rule". (I.e., "new_rule" as the base point * for the longest continuous sequences -their starting point.) * * "new_rule" inherits the same breaking rules than those of the rule * after it, for those fields that both rules have in common. The common * sequences of the other fields of "new_rule", not in "rule_after", are * broken by "rule_after" itself. * * (We re-use the existing order in the arrays of "rule_after", that are * not modified, to avoid an explicit sort on the "new_rule"'s arrays.) */ rule_after = LIST_NEXT( new_rule, next ); if( rule_after == NULL ) /* "new_rule" is the default fw-rule */ goto skip_arrays_build; seq = 0; /* how many sequences have been filled in "new_rule" */ interesting_fields_mask = new_rule->rfm & rule_after->rfm ; for( seq1 = 0; seq1 < rule_after->number_relevant_fields; seq1++ ) { field_no = rule_after->relevant_field[seq1]; if( ( 1 << field_no ) & interesting_fields_mask == 0 ) /* this field of "rule_after" is not in "new_rule" */ continue; /* "new_rule" and "rule_after" share this field: inherit */ new_rule->relevant_field[seq] = field_no; new_rule->break_rule[seq] = rule_after->break_rule[seq1]; seq++; /* a new entry inserted in the arrays of "new_rule" */ } /* Now to the fields of "new_rule" that are not in "rule_after" */ interesting_fields_mask = new_rule->rfm & ( ~rule_after->rfm ); for( field_no = 0; field_no < MAX_FIELDS; field_no ++ ) if( ( 1 << field_no ) & interesting_fields_mask != 0 ) { /* The field is relevant in " new_rule" but not in "rule_after" */ new_rule->relevant_field[seq] = field_no; new_rule->break_rule[seq] = rule_after; } /* The arrays of "new_rule" have been calculated. * * Now, update the arrays of the rules before "new_rule". I.e., iterate * over the previous rules taking each of them as the base point of the * longest continuous sequences starting at each. */ /* The fields whose continuous sequences can't reach "new_rule" because * they have been broken in the middle. */ u_int32 accum_middle_broken_sequences_mask; struct ip_fw * previous_rule; accum_middle_broken_sequences_mask = 0; previous_rule = LIST_PREVIOUS( new_rule, previous ); while( accum_middle_broken_sequences_mask != ((1 << MAX_FIELDS ) - 1) /* There is a field whose continuous sequence is not broken yet * and hence that sequence of "previous_rule", if present, could * reach "new_rule". */ && previous_rule != NULL ) { int sort_needed = FALSE; for( seq = 0; seq < previous_rule->number_relevant_fields; seq++) { field_no = previous_rule->relevant_field[seq]; field_mask = 1 << field_no; if( field_mask & new_rule->rfm != 0 ) /* The continuous sequence of this "field_no" of * "previous_rule" can't be broken at any rate by "new_rule", * even if the sequence reaches "new_rule", because that * field is also relevant in "new_rule". */ continue; /* Next field of "previous_rule" */ /* The continuous sequence of this field will be broken by * "new_rule", if it reaches there by not being broken by the * rules in the middle, between "previous_rule" and "new_rule". */ if( field_mask & accum_middle_broken_sequences_mask == 0 ) { /* The continuous sequence of "field_no" that starts at * "previous_rule" is not broken by any rule in the middle, * so it reaches "new_rule", and is broken there because * "new_rule" doesn't need the sequence's field. */ assert( previous_rule->break_rule[seq]->fw_number > new_rule->fw_number ); previous_rule->break_rule[seq] = new_rule; /* break */ sort_needed = TRUE; /* We have changed the array */ } } /* End of the "for(seq = all fields in previous_rule)" */ if( sort_needed ) { sort previous_rule->relevant_field and previous_rule->break_rule arrays, in descending order according to the value of previous_rule->break_rule[i]->fw_number } /* This rule "previous_rule" itself breaks all continuous sequence * for the fields it doesn't need. */ accum_middle_broken_sequences_mask |= ~ previous_rule->rfm; previous_rule = LIST_PREVIOUS( previous_rule, previous ); } /* End of the "while( accum_middle_broken_sequences_mask ...)" */ skip_arrays_build: ... /* Ey!! Remember to update the search data structures for the fields and * values found in this new fw-rule, and the lists associated to their * nodes. */ } /* End of the function "add_entry" */ CASE 2: An existing (static) fw-rule is deleted. Note: for simplicity, we suppose here that the "del_entry" function, responsible for deleting a rule(s), takes the pointer to the rule it has to delete, instead that the "fw_number" of all the rules it has to delete. (The general case used really in the function in "ip_fw.c" is dealt with similarly to what is done here: simply locate where the block of the given "fw_number" rules begins and ends. "previous_rule" in the following would start just before that block, and "rule_after" would be the rule after the block.) static int del_entry( struct ip_fw_head * chainptr, struct ip_fw * rule_to_delete ) { u_int8 seq, field_no; struct ip_fw * rule_after; u_int8 rule_after_inverted_index[MAX_FIELDS]; rule_after = LIST_NEXT( rule_to_delete, next ); /* Create the inverted index of "field_no"s for "rule_after */ for( seq = 0; seq < rule_after->number_relevant_fields; seq++ ) { field_no = rule_after->relevant_field[seq]; rule_after_inverted_index[field_no] = seq; } /* Update the arrays for the rules previous to "rule_to_delete" * * The idea for this is very similar to the one used in "add_entry" */ u_int8 seq1; u_int32 accum_middle_broken_sequences_mask; struct ip_fw * previous_rule; accum_middle_broken_sequences_mask = 0; previous_rule = LIST_PREVIOUS( previous_rule, previous ); while ( accum_middle_broken_sequences_mask != (( 1 << MAX_FIELDS ) - 1 ) && previous_rule != NULL ) { u_int sort_needed = FALSE; for( seq = 0; seq < previous_rule->number_relevant_fields; seq++ ) { field_no = previous_rule->relevant_field[seq]; if( previous_rule->break_rule[seq] == rule_to_delete ) { /* The current breaking rule for this sequence is going to * be deleted soon. Find which rule after "rule_to_delete" * will be the next breaking rule for this sequence, using * that of "rule_after". */ if( (1<rfm == 0) /* "rule_after" doesn't have this field, so it will * break this continuous sequence itself. */ previous_rule->break_rule[seq] = rule_after; else { /* "rule_after" has this field, so it belongs to the * sequence, and therefore, "previous_rule" inherits * its same breaking rule for this field. */ seq1 = rule_after_inverted_index[field_no]; previous_rule->break_rule[seq] = rule_after->break_rule[seq1]; } sort_needed = TRUE; } /* End of the "if" */ } /* End of the "for( seq )" */ if( sort_needed ) { sort previous_rule->relevant_field and previous_rule->break_rule arrays, in descending order according to the value of previous_rule->break_rule[i]->fw_number } /* This rule "previous_rule" itself breaks any continuous sequence * for the fields it doesn't need. */ accum_middle_broken_sequences_mask |= ~ previous_rule->rfm; previous_rule = LIST_PREVIOUS( previous_rule, previous ); } /* End of the while( accum_middle_broken_sequences_mask ... )" */ /* Normal work of freeing "rule_to_delete"s arrays and of deleting * "rule_to_delete" from the list. * Remember to update the search data structures and their ordered lists * deleting from them this "rule_to_delete" and the values exclusive to it. */ ... ... } /* End of function "del_entry" */ ANNEX 4. IP_FW_CHK FROM SECTION 3.7.2 static int ip_fw_chk( struct ip ** pip, /* rest of parameters here ... */ ) { u_int32 empty_lists_mask, search_done_mask, field_mask, preferred_fields; u_int8 seq, seq1, field_no; struct ip_fw * block_head, farthest_rule, break_rule, rule4packet; struct list_entry * ordered_list_for_field[ MAX_FIELDS ]; struct list_entry * list4value; /* The list for the value of the field in * the network packet. */ /* our version of the "lookup_next_rule" function */ #define SAFE_FARTHEST_JUMP( base_rule, last_synced_seq ) \ \ do { \ \ farthest_rule = LIST_NEXT( (base_rule), next ); /* default jump; \ * no rule missed, so \ * it is a safe jump \ */ \ \ /* look if there is a farther (and safe) jump than merely \ * "base_rule->next" */ \ \ for( seq1 = 0; seq1 < (base_rule)->number_relevant_fields; \ seq1++ ) { \ \ field_no = (base_rule)->relevant_field[seq1]; \ \ field_mask = 1 << field_no; \ \ if( field_mask & search_done_mask == 0 ) \ \ /* The search for this field hasn't been done. \ * \ * We regard searches on the field's data structure as \ * costly operations (why?) \ */ \ \ continue; /* next relevant field */ \ \ /* This list is known (its field has been searched for it */ \ \ break_rule = (base_rule)->break_rule[seq1]; \ \ if( break_rule->fw_number <= farthest_rule->fw_number ) \ \ /* We have found that the previously existing \ * "farthest_rule" is farther (and still safe) than the \ * break rule (the "B[i]" in section 3.7.2) the current \ * field at "seq1". \ * \ * Since the array "base_rule->break_rule" is decreasingly \ * ordered by "base_rule->break_rule[i]->fw_number", then \ * by transitivity, for any index "i" > "seq1" \ * \ * base_rule->break_rule[i]->fw_number <= \ * farthest_rule->fw_number \ * \ * and since, for those fields for any index "i" > "seq1" \ * their corresponding "list4value" in the network packet \ * has to be in the longest continuous sequence for \ * "field_no".(I.e., it has to be safe.) \ * \ * list4value->rule->fw_number < \ * base_rule->break_rule[i]->fw_number \ * \ * for being considered in the selection for the farthest \ * rule, so, for these "i" > "seq1", \ * \ * list4value->rule->fw_number < farthest_rule->fw_number \ * \ * then we know that the current "farthest_rule" is the \ * farthest jump (and it is safe) \ */ \ \ break; /* this "for(seq1)" loop */ \ \ else /* farthest_rule->fw_number < break_rule->fw_number \ * \ * We have to consult the rule in the list of the value in \ * the packet (the "N[i]" in section 3.7.2), perhaps it's \ * farther than "farthest_rule" and also safe. \ */ \ \ if( empty_lists_mask & field_mask == 0 ) { \ /* "ordered_list_for_field[field_no]" not empty */ \ \ list4value = ordered_list_for_field[field_no]; \ \ if( seq1 > (last_synced_seq) ) { \ \ /* We have to sync here the list4value->rule to \ * pass this "base_rule" rule, since the jump is \ * forward (in section 3.7.2 terminology: \ * \ * R.fw_number < N[i].fw_number \ */ \ \ while( list4value != NULL && \ list4value->rule->fw_number <= \ base_rule->fw_number ) \ \ list4value = LIST_NEXT( list4value, next ); \ \ /* sync done: store it back */ \ ordered_list_for_field[field_no] = list4value; \ \ if( list4value == NULL ) { \ empty_lists_mask |= field_mask; \ continue; /* no rule to take from list */ \ } \ } \ \ rule4packet = list4value->rule; \ \ if( rule4packet->fw_number < break_rule->fw_number \ /* the rule in the list for the packet's value is \ * in the longest continuous sequence for \ * "field_no" (i.e., it's safe) \ */ \ && farthest_rule->fw_number < \ rule4packet->fw_number ) \ \ farthest_rule = rule4packet; \ } \ \ } /* end of the for( seq1 ... ) loop */ \ \ } while( 0 ); /***************************************************************/ /* * Begin of the "ip_fw_chk" function * */ /***************************************************************/ search_done_mask = 0; empty_lists_mask = 0; block_head = LIST_FIRST( &ip_fw_chain_head ); /* We have the default fw-rule in the "ip_fw_chain_head" list, so at least one * rule will always match on the packet. Therefore, this loop is not endless. */ while( 1 ) { start_of_loop_on_current_block_head: /* at least a rule will match in the "ip_fw_chain_head" list */ assert( block_head != NULL ); if( block_head->rfm & empty_lists_mask != 0 ) { /* there are some relevant lists that are already empty, so this rule * "block_head" has some relevant parts missing and can't be a * coincidence. But this fact may avoid us useless searches on the * data structures of its relevant fields which are still unknown. * * Jump to the farthest away rule that is still feasible from this * unfeasible "block_head" rule, according only to its known lists and * its already empty lists. */ /* -1 means to sync all the known lists of this * "block_head" to pass it */ SAFE_FARTHEST_JUMP( block_head, -1 ); block_head = farthest_rule; continue; /* analyze this new "block_head" rule */ } /* This rule "block_head" is still feasible since it doesn't have any * relevant lists empty: visit the lists of its relevant fields. * * Prioritize the analysis on the known lists first, in order of their * longest continuous sequence that starts at "block_head". The second * pass is on the unknown lists, searching them first. This preference * tries to avoid (or just delay) unnecessary searches. */ preferred_fields = block_head->rfm & search_done_mask; for( importance = 2; importance ; importance--) { if( preferred_fields == 0 ) goto skip_FORloop_on_the_fields; for( seq=0; preferred_fields ; seq++ ) { field_no = block_head->relevant_field[seq]; field_mask = 1 << field_no; if( field_mask & preferred_fields == 0 ) continue; /* this field doesn't have this "importance" */ preferred_fields ^= field_mask; /* clear its bit */ if( field_mask & search_done_mask == 0 ) { /* a still-unknown list (field) (importance == 1) */ list4value = search( field_no, packet ); search_done_mask |= field_mask; } else /* a known list (field) (importance == 2) */ list4value = ordered_list_for_field[field_no]; /* put in sync "list4value->rule" to be at least equal to * "block_head->fw_number" */ while( list4value != NULL && list4value->rule->fw_number < block_head->fw_number ) list4value = LIST_NEXT( list4value, next ); ordered_list_for_field[field_no] = list4value; /* Here, the merge is merely reduced to checking for coincidence * on "block_head" */ if( list4value != NULL && list4value->rule->fw_number == block_head->fw_number ) /* Good! A coincidence on this rule "block_head". * Do a favor: sync the list to pass this coincidence. */ ordered_list_for_field[field_no] = LIST_NEXT( list4value, next ); else { if( list4value == NULL ) empty_lists_mask |= field_mask; /* We don't have a coincidence on "block_head", so we have to * discard it. The next "block_head" rule will be the farthest * (but safe) jump from this old "block_head". */ SAFE_FARTHEST_JUMP( block_head, seq ); block_head = farthest_rule; /* analyze this rule */ goto start_of_loop_on_current_block_head; } } /* end of the "for( seq < block_head->number_relevant_fields )" */ /* We go to the next iteration on the fields of "block_head" that are * still unknown, where we'll have to search for their values in the * packet to get their associated lists. */ skip_FORloop_on_the_fields: preferred_fields = block_head->rfm & (~search_done_mask); } /* end of "for(importance)" */ /* all the relevant fields for "block_head" give it as a coincidence, so * we may apply "block_head" right now. */ decision_found = apply( block_head, packet, &ip_fh_chk_returnval ); if( decision_found ) return ip_fh_chk_returnval; else { /* It was a "the search continues" rule. (No list is needed to be * synced for the jump, that's why * "block_head->number_relevant_fields" is specified: it means "all * lists synced") */ SAFE_FARTHEST_JUMP( block_head, block_head->number_relevant_fields ); block_head = farthest_rule; } } /* end of the "while( 1 )" loop */ ANNEX 90. A USEFUL IDEA FROM THE DYNAMIC RULES As it was mentioned above, a stateful firewall should not be expressed by this algorithm because, when a dynamic rule is added or removed, the search data structures and the ordered lists associated with the values in them, have to be updated, and this, IMHO, may be overwhelming. (I don't know... do you know a data structure that allows fast searches on it, and besides, fast inserts and then deletions possibly just after a net.inet.ip.fw.dyn_X_lifetime period of time? I give up.) But, returning to the dynamic rules, the search currently implemented on them -mainly the hash on the packet's addresses and ports- might be considered a generalization of the proposed algorithm, because *each independent* field in the packet isn't being searched for, but a *unified key of some fields* in the packet is: Proposed algorithm: independent searches for each field and then a merge looking for coincidences Current search algorithm in a dynamic firewall: an initial search for a compound key of some fields, and then a precise search in the short list returned by the hash-lookup. Of course, a dynamic rule only expresses real values for addresses and ports, that may be exactly found in a packet, whereas a fw-rule normally expresses a mask of addresses -a mask that by itself will not be found in any packet-, a list or range of ports, etc. But the idea of the current implementation is certainly applicable, instead of searching for each independent field, you combine some fields and search for the compound key. This may save searches if fields with a small values-domain are packed in a key, e.g., the packet's IP and/or TCP options, the IP precedence, and/or TCP flags. (Since there are not so many -practical- combinations of options and flags, a direct indexing schema on the key may be sufficient and no extra searches -as in the case of a hash- would be necessary.) Furthermore, since only one list would be returned by a search for a key of some fields in step 2, less number of fields (lists) would have to be merged in step 3.