Date: Sat, 21 Apr 2012 16:34:08 +1000 (EST)
From: Bruce Evans
To: "K. Macy"
Cc: Andre Oppermann, Luigi Rizzo, current@FreeBSD.org, net@FreeBSD.org
Subject: Re: Some performance measurements on the FreeBSD network stack
Message-ID: <20120421155638.E982@besplex.bde.org>

On Fri, 20 Apr 2012, K. Macy wrote:

> On Fri, Apr 20, 2012 at 4:44 PM, Luigi Rizzo wrote:
>> The small penalty when flowtable is disabled but compiled in is
>> probably because the net.flowtable.enable flag is checked
>> a bit deep in the code.
>>
>> The advantage with non-connect()ed sockets is huge.
>> I don't quite understand why disabling the flowtable still helps there.
>
> Do you mean having it compiled in but disabled still helps
> performance? Yes, that is extremely strange.

This reminds me that when I worked on this, I saw very large throughput
differences (in the 20-50% range) as a result of minor changes in
unrelated code. I could get these changes intentionally by adding or
removing padding in unrelated unused text space, so the differences were
apparently related to text alignment. I thought I had some significant
micro-optimizations, but it turned out that they were acting mainly by
changing the layout in related used text space where it is harder to
control.

Later, I suspected that the differences were more due to cache misses
for data than for text. The CPU and its caching must affect this
significantly. I tested on an AthlonXP and Athlon64, and the differences
were larger on the AthlonXP. Both of these have a shared I/D cache so
pressure on the I part would affect the D part, but in this benchmark
the D part is much more active than the I part so it is unclear how text
layout could have such a large effect.

Anyway, the large differences made it impossible to trust the results of
benchmarking any single micro-benchmark. Also, ministat is useless for
understanding the results. (I note that luigi didn't provide any standard
deviations and neither would I. :-). My results depended on the cache
behaviour but didn't change significantly when rerun, unless the code was
changed.

Bruce
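
The point about reruns being stable while builds differ can be sketched numerically. The following is a minimal illustration in Python, not anything from the thread: the throughput figures are invented, the two "builds" are hypothetical (same source, one with extra unused padding in text space), and `summarize` and `relative_change` are names made up for this example.

```python
import statistics


def summarize(samples):
    """Return (mean, sample standard deviation) for one build's reruns."""
    return statistics.mean(samples), statistics.stdev(samples)


def relative_change(base_mean, other_mean):
    """Fractional change of other_mean relative to base_mean."""
    return (other_mean - base_mean) / base_mean


# Invented throughput numbers (arbitrary units), NOT measurements from
# this thread. Assumption: build_b differs from build_a only by unused
# padding that shifts the text layout.
build_a = [400.0, 402.0, 398.0, 401.0, 399.0]
build_b = [520.0, 518.0, 522.0, 521.0, 519.0]

mean_a, sd_a = summarize(build_a)
mean_b, sd_b = summarize(build_b)
change = relative_change(mean_a, mean_b)

# Reruns of one build agree closely (small standard deviation) ...
print(f"build A: mean={mean_a:.1f} stdev={sd_a:.2f}")
print(f"build B: mean={mean_b:.1f} stdev={sd_b:.2f}")
# ... yet the builds differ by far more than the rerun noise.
print(f"relative change B vs A: {change:.0%}")
```

With data shaped like this, a comparison of the two sample sets would report a clear difference, but it cannot say whether the difference comes from the change under test or from a layout side effect, which is why small standard deviations alone do not make a single micro-benchmark trustworthy.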