Date: Fri, 4 May 2012 08:45:48 +0200
From: Luigi Rizzo
To: Andrew Reilly
Cc: current@freebsd.org, net@frebsd.org
Subject: Re: fast bcopy...
List-Id: Discussions about the use of FreeBSD-current

On Fri, May 04, 2012 at 09:44:15AM +1000, Andrew Reilly wrote:
> On Wed, May 02, 2012 at 08:25:57PM +0200, Luigi Rizzo wrote:
> > as part of my netmap investigations, i was looking at how
> > expensive are memory copies, and here are a couple of findings
> > (first one is obvious, the second one less so)
>
> Most C compilers (well, the ones I regularly use) inline small,
> constant-length memcpy operations of the sort you're describing
> here. I would expect techniques like that to beat any amount of
> hand-tuning in a elf-linkage bcopy subroutine.
>
> Sure, you want a good implementation for your variable-length
> copies, and data layout and alignment is tremendously important
> these days, so there's no single silver bullet here.

The two things i was addressing in my message cannot be solved
by a compiler: the memcpy/bcopy has variable length in the places
i was looking at, and the compiler cannot infer that it is allowed
to extend the copy to full words or cache lines instead of stopping
at the exact boundary.

I don't even dare anymore to hand-optimize code: too many times
i have been beaten by the compiler.

cheers
luigi
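
To make the point about extending a copy concrete, here is a minimal
sketch, not code from this thread. It assumes the caller guarantees that
both buffers are padded to a multiple of 64 bytes (an assumed cache-line
size) and do not overlap. The names CHUNK, round_up_chunk and padded_copy
are hypothetical. The length is variable, but every memcpy in the loop has
a constant size, which is the case compilers usually inline well.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 64	/* assumed cache-line size; hypothetical constant */

/* Round len up to the next multiple of CHUNK (CHUNK is a power of two). */
static inline size_t
round_up_chunk(size_t len)
{
	return (len + CHUNK - 1) & ~(size_t)(CHUNK - 1);
}

/*
 * Hypothetical helper: copy at least len bytes in whole CHUNK-sized
 * blocks, possibly writing past len up to the rounded size.
 * Assumption the compiler cannot infer on its own: both buffers hold
 * at least cap bytes, cap covers the rounded length, and the buffers
 * do not overlap.
 */
static inline void
padded_copy(void *restrict dst, const void *restrict src, size_t len,
    size_t cap)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t n = round_up_chunk(len);

	assert(n <= cap);	/* the padding promise made by the caller */
	for (; n > 0; n -= CHUNK, d += CHUNK, s += CHUNK)
		memcpy(d, s, CHUNK);	/* constant length per call */
}
```

A caller with two 128-byte padded buffers could write
padded_copy(dst, src, 70, 128) and all 128 bytes would be copied. Whether
this beats the libc bcopy on a given machine is something to measure, not
assume.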