Date: Tue, 8 Jul 2008 05:15:38 +1000 (EST)
From: Bruce Evans
To: Bruce Evans
Cc: FreeBSD Net, Andre Oppermann, Bart Van Kerckhove, Ingo Flaschberger, Paul
Subject: Re: Freebsd IP Forwarding performance (question, and some info) [7-stable, current, em, smp]
Message-ID: <20080708045135.V1022@besplex.bde.org>
In-Reply-To: <20080708034304.R21502@delplex.bde.org>

On Tue, 8 Jul 2008, Bruce Evans wrote:

> On Mon, 7 Jul 2008, Andre Oppermann wrote:
>
>> Bruce Evans wrote:
>>> So it seems that the major overheads are not near the driver (as I already
>>> knew), and upper layers are responsible for most of the cache misses.
>>> The packet header is accessed even in monitor mode, so I think most of
>>> the cache misses in upper layers are not related to the packet header.
>>> Maybe they are due mainly to perfect non-locality for mbufs.
>>
>> Monitor mode doesn't access the payload packet header.  It only looks
>> at the mbuf (which has a structure called mbuf packet header).  The mbuf
>> header is hot in the cache because the driver just touched it and filled
>> in the information.  The packet content (the payload) is cold and just
>> arrived via DMA in DRAM.
>
> Why does it use ntohs() then? :-).  From if_ethersubr.c:
> ...
> % 	eh = mtod(m, struct ether_header *);
>
> Point outside of mbuf header.
>
> % 	etype = ntohs(eh->ether_type);
>
> First access outside of mbuf header.
> ...
> %
> % 	/* Allow monitor mode to claim this frame, after stats are updated. */
> % 	if (ifp->if_flags & IFF_MONITOR) {
> % 		m_freem(m);
> % 		return;
> % 	}
>
> Finally return in monitor mode.
>
> I don't see any stats update before here except for the stray if_imcasts
> one.

There are some error stats with printfs, but I've never seen these do
anything except with a buggy sk driver.

Testing verifies that accessing eh above gives a cache miss.  Under ~5.2
receiving on bge0 at 397 kpps (cm/p = cache misses per packet):

-monitor: 17% idle  19 cm/p  (18% less idle than under -current)
 monitor: 66% idle   8 cm/p  (17% less idle than under -current)
+monitor: 71% idle   7 cm/p  (idle time under -current not measured)

+monitor is monitor mode with the exit moved to the top of ether_input().
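
The difference between "monitor" and "+monitor" can be sketched in userland C. This is only an illustration under stated assumptions: every name below (sketch_ifnet, sketch_frame, SKETCH_IFF_MONITOR and the sketch_* functions) is hypothetical and not the FreeBSD kernel API, and the data_reads counter merely counts accesses to the frame bytes as a stand-in for touching cold memory. It does not measure real cache misses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel types; NOT the real FreeBSD definitions. */
#define SKETCH_IFF_MONITOR 0x1u

struct sketch_ifnet {
    unsigned flags;            /* interface flags, e.g. SKETCH_IFF_MONITOR */
};

struct sketch_frame {
    const uint8_t *data;       /* frame bytes: cold in cache after DMA */
    size_t len;                /* number of valid bytes in data */
    unsigned data_reads;       /* counts accesses to data (proxy for a miss) */
};

/* Read the big-endian ethertype at offset 12; this touches the frame data. */
uint16_t sketch_read_ethertype(struct sketch_frame *f)
{
    assert(f->len >= 14);      /* an Ethernet header is 14 bytes */
    f->data_reads++;
    return (uint16_t)((f->data[12] << 8) | f->data[13]);
}

/*
 * "monitor" ordering: the header is read first, and only then does
 * monitor mode claim the frame.  Returns 1 if the frame was claimed.
 */
int sketch_input_stock(const struct sketch_ifnet *ifp, struct sketch_frame *f)
{
    uint16_t etype = sketch_read_ethertype(f);

    (void)etype;               /* a real input path would dispatch on this */
    if (ifp->flags & SKETCH_IFF_MONITOR)
        return 1;
    return 0;
}

/*
 * "+monitor" ordering: the monitor check is moved to the top, so a
 * claimed frame is dropped without its data ever being touched.
 */
int sketch_input_early(const struct sketch_ifnet *ifp, struct sketch_frame *f)
{
    if (ifp->flags & SKETCH_IFF_MONITOR)
        return 1;
    (void)sketch_read_ethertype(f);
    return 0;
}
```

With the monitor flag set, sketch_input_stock() leaves data_reads at 1 while sketch_input_early() leaves it at 0, which mirrors the one cache miss per packet that separates the two measured rows.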

If the cache miss takes the time measured by lmbench2 (42 ns), then 397 k
of these per second gives 17 ms per second, or 1.7% CPU, which is vaguely
consistent with the improvement of 5% by not taking this cache miss.
Avoiding most of the 19 cache misses should give much more than a 5%
improvement.  Maybe -current gets its 17% improvement by avoiding some.

More stats weirdness in userland:
- in monitor mode, em0 gives byte counts delayed while bge0 gives byte
  counts always 0.
- netstat -I 1 seems to be broken in ~5.2 in all modes -- it gives output
  for interfaces with drivers but no hardware.

All this is for UP.  An SMP kernel on the same UP system loses < 5% for
at least tx.

Bruce
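
The cache-miss estimate above (42 ns per miss at 397 kpps) can be checked with a small calculation. This is a rough sketch, assuming every miss costs the same lmbench2 latency and that misses do not overlap; the function name miss_cpu_fraction is made up for this example.

```c
#include <assert.h>

/*
 * Fraction of one CPU (0.0 .. 1.0) spent waiting on cache misses,
 * ASSUMING each miss costs miss_ns nanoseconds and none overlap.
 */
double miss_cpu_fraction(double miss_ns, double packets_per_sec,
                         double misses_per_packet)
{
    assert(miss_ns >= 0.0);
    assert(packets_per_sec >= 0.0);
    assert(misses_per_packet >= 0.0);

    /* seconds per miss * misses per second = busy seconds per second */
    return miss_ns * 1e-9 * packets_per_sec * misses_per_packet;
}
```

miss_cpu_fraction(42.0, 397e3, 1.0) is about 0.0167, i.e. the 17 ms per second or 1.7% CPU quoted above.  Under the same assumption, all 19 misses per packet would come to roughly 0.32 of a CPU, which is why avoiding most of them should be worth much more than 5%.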