Date: Sat, 1 Sep 2007 12:51:38 +0400
From: Gleb Smirnoff <glebius@FreeBSD.org>
To: Weiguang Shi
Cc: maxim@FreeBSD.org, freebsd-net@FreeBSD.org
Subject: Re: questions wrt ng_netflow

Weiguang,

sorry for the late answer; I'm overloaded with my day job.

On Thu, Aug 23, 2007 at 09:40:30AM -0700, Weiguang Shi wrote:
W> I've been reading netflow.c in FreeBSD-6.2 and this piece of code confuses me.
W>
W>     /*
W>      * Go through hash and find our entry. If we encounter an
W>      * entry, that should be expired, purge it. We do a reverse
W>      * search since most active entries are first, and most
W>      * searches are done on most active entries.
W>      */
W>     TAILQ_FOREACH_REVERSE_SAFE(fle, &hsh->head, fhead, fle_hash, fle1) {
W>         if (bcmp(&r, &fle->f.r, sizeof(struct flow_rec)) == 0)
W>             break;
W>         if ((INACTIVE(fle) && SMALL(fle)) || AGED(fle)) {
W>             TAILQ_REMOVE(&hsh->head, fle, fle_hash);
W>             expire_flow(priv, &item, fle, NG_QUEUE);
W>             atomic_add_32(&priv->info.nfinfo_act_exp, 1);
W>         }
W>     }
W>
W> +-------------+     +--------+     +--------+     +--------+     +--------+
W> | Bucket Head |---->|  RecA  |---->|  RecB  |---->|  RecC  |---->|  RecD  |
W> +-------------+     +--------+     +--------+     +--------+     +--------+
W>
W> In the figure above, let's say our packet matches RecC. So before the
W> match, RecD is examined to see if it's AGED, i.e., it's lasted for too
W> long, or if it's too small and inactive. As the match is found, the
W> code stops searching.
W>
W> First, isn't INACTIVE alone enough to expire a flow? Why must INACTIVE
W> _and_ SMALL?

No. The NetFlow engine tries to minimise the number of export records sent
and to avoid splitting one long flow into several records. Thus, if we have
enough space in the cache, we keep inactive flows, because they can become
active again. For example, a TCP ssh session becomes inactive some time
after you stop typing to read the text, but it continues when you start
typing again.

We make an exception for SMALL flows, to avoid blowing up the cache due to
continuous Internet scanning by worms:

/*
 * 4 is a magical number: statistically number of 4-packet flows is
 * bigger than 5,6,7...-packet flows by an order of magnitude. Most UDP/ICMP
 * scans are 1 packet (~ 90% of flow cache). TCP scans are 2-packet in case
 * of reachable host and 4-packet otherwise.
 */
#define SMALL(fle)	(fle->f.packets <= 4)

W> RecA and RecB would not be examined for expiration but since they are
W> to the beginning of the queue and therefore actually less recently
W> accessed, they are more likely to be INACTIVE and could be more AGED.
W> I must be missing something, but what justifies examining RecD but not
W> RecA and RecB?

Because we are in the interrupt thread. Our aim is to finish processing one
IP packet as fast as possible and return, not to expire as many flows as
possible. However, we do examine the flows that we have just bcmp()'ed:
these entries are already in the CPU cache, so we can check them quickly.

The periodic expiry routine walks the TAILQ in the opposite order, starting
from the head, so it reaches the oldest flows first.

-- 
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE
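The division of labour described in this reply can be sketched in C. This is a simplified model under stated assumptions, not the ng_netflow code.

The following names are hypothetical:
* struct flow and struct bucket
* lookup(), sweep(), expire() and EXPIRABLE()

Other simplifications and assumptions:
* The 15 s and 1800 s timeouts are example values only.
* In the real code expire_flow() queues an export record. Here expire() only unlinks the entry and marks it.
* SMALL() matches the definition quoted above. INACTIVE() and AGED() are written from their description in this thread.
* The sketch assumes that a hit moves the entry to the tail, so the head end holds the least recently active flows.
* The list is hand-rolled instead of using <sys/queue.h>, because not every libc provides TAILQ_FOREACH_REVERSE_SAFE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Example timeouts in seconds (assumed values; ng_netflow makes them tunable). */
#define INACTIVE_TIMEOUT 15L
#define ACTIVE_TIMEOUT   1800L

/* Hypothetical, simplified flow entry; not the real struct flow_entry. */
struct flow {
    uint32_t     key;       /* stands in for struct flow_rec */
    uint32_t     packets;   /* packets accounted to this flow */
    long         first;     /* time of the first packet */
    long         last;      /* time of the most recent packet */
    int          exported;  /* set once the flow has been expired */
    struct flow *prev, *next;
};

/* One hash bucket: head = least recently active, tail = most recent. */
struct bucket {
    struct flow *head, *tail;
};

#define SMALL(f)          ((f)->packets <= 4)
#define INACTIVE(f, now)  ((now) - (f)->last > INACTIVE_TIMEOUT)
#define AGED(f, now)      ((now) - (f)->first > ACTIVE_TIMEOUT)
#define EXPIRABLE(f, now) ((INACTIVE(f, now) && SMALL(f)) || AGED(f, now))

static void unlink_flow(struct bucket *b, struct flow *f)
{
    assert(b->head != NULL);    /* unlinking from an empty bucket is a bug */
    if (f->prev) f->prev->next = f->next; else b->head = f->next;
    if (f->next) f->next->prev = f->prev; else b->tail = f->prev;
    f->prev = f->next = NULL;
}

static void append_flow(struct bucket *b, struct flow *f)
{
    f->next = NULL;
    f->prev = b->tail;
    if (b->tail) b->tail->next = f; else b->head = f;
    b->tail = f;
}

/* Hypothetical stand-in for expire_flow(): unlink and mark, no export. */
static void expire(struct bucket *b, struct flow *f)
{
    unlink_flow(b, f);
    f->exported = 1;
}

/* Packet path: scan from the tail and stop at the match. Only entries we
 * pass over (already compared, hence cache-hot) are checked for expiry. */
static struct flow *lookup(struct bucket *b, uint32_t key, long now)
{
    struct flow *f, *prev;

    for (f = b->tail; f != NULL; f = prev) {
        prev = f->prev;             /* saved first: f may be unlinked */
        if (f->key == key)
            break;
        if (EXPIRABLE(f, now))
            expire(b, f);
    }
    if (f != NULL) {                /* account the packet, move to tail */
        f->packets++;
        f->last = now;
        if (f != b->tail) {
            unlink_flow(b, f);
            append_flow(b, f);
        }
    }
    return f;                       /* NULL: caller would create a new flow */
}

/* Periodic sweep: walk from the head, so the oldest flows are seen first.
 * A real sweep might stop early; this one walks the whole bucket. */
static int sweep(struct bucket *b, long now)
{
    struct flow *f, *next;
    int n = 0;

    for (f = b->head; f != NULL; f = next) {
        next = f->next;
        if (EXPIRABLE(f, now)) {
            expire(b, f);
            n++;
        }
    }
    return n;
}
```

Usage example: start with four small flows A..D (in that order, all idle since time 0) and look up C at time 100. The packet path expires only D, the entry it passed over. A and B stay in place until sweep() runs, and the sweep reaches them first because it starts at the head. A large idle flow survives the sweep until AGED() fires.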