From owner-svn-src-all@FreeBSD.ORG Wed Aug 18 21:25:49 2010
Message-ID: <4C6C4FDD.8080803@andric.com>
Date: Wed, 18 Aug 2010 23:25:49 +0200
From: Dimitry Andric
To: mdf@FreeBSD.org
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Gabor Kovesdan
References: <201008181740.o7IHeA4c075984@svn.freebsd.org>
Subject: Re: svn commit: r211463 - head/usr.bin/grep

On 2010-08-18 22:48, mdf@FreeBSD.org wrote:
>> - Refactor file reading code to use pure syscalls and an internal buffer
>>   instead of stdio.  This gives BSD grep a very big performance boost,
>>   its speed is now almost comparable to GNU grep.
>
> I didn't read all of the details in the profiling mails in the thread,
> but does this mean that work on stdio would give a performance boost
> to many apps?  Or is there something specific about how grep(1) is
> using its input that makes it a horse of a different color?

Originally, grep was reading files one character at a time using
fgetc(3), the locking version, even.  This is usually not the fastest
way to read a large file with stdio. :)

If grep did not have to support .gz or .bz2 files, we could simply have
plugged in stdio's fgetln(3).  I tried this approach first on some
uncompressed files, and it performed much better than fgetc'ing.

The reading code that has now been committed is basically the same
algorithm that fgetln() uses internally, but it can also handle gzip and
bzip2 input.
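
As a rough illustration of the approach described above (this is not the
code committed in r211463), an fgetln(3)-style reader built on read(2)
and an internal buffer might look like the sketch below.  The names
linereader, lr_init, lr_getln and lr_free are hypothetical, as is the
32 KB starting buffer size.  The sketch only handles plain file
descriptors; the gzip and bzip2 support mentioned above is not shown.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical reader state; names are illustrative, not taken from grep. */
struct linereader {
	int	 fd;	/* descriptor to read from */
	char	*buf;	/* internal buffer */
	size_t	 cap;	/* allocated size of buf */
	size_t	 start;	/* offset of the first unconsumed byte */
	size_t	 end;	/* offset one past the last valid byte */
	int	 eof;	/* set once read(2) has returned 0 */
};

static int
lr_init(struct linereader *lr, int fd)
{
	assert(lr != NULL);
	lr->fd = fd;
	lr->cap = 32 * 1024;	/* assumed starting size */
	lr->buf = malloc(lr->cap);
	lr->start = lr->end = 0;
	lr->eof = 0;
	return (lr->buf == NULL ? -1 : 0);
}

static void
lr_free(struct linereader *lr)
{
	free(lr->buf);
	lr->buf = NULL;
}

/*
 * Return a pointer to the next line (including its '\n', if any) and
 * store the length in *len.  The pointer is only valid until the next
 * call.  Returns NULL at end of input or on error.
 */
static char *
lr_getln(struct linereader *lr, size_t *len)
{
	size_t scanned = 0;	/* bytes already searched for '\n' */

	for (;;) {
		size_t avail = lr->end - lr->start;
		char *p = lr->buf + lr->start;
		char *nl = memchr(p + scanned, '\n', avail - scanned);

		if (nl != NULL || (lr->eof && avail > 0)) {
			/* Complete line, or an unterminated last line. */
			*len = (nl != NULL) ? (size_t)(nl - p) + 1 : avail;
			lr->start += *len;
			return (p);
		}
		if (lr->eof)
			return (NULL);
		scanned = avail;

		/* Need more data: move the partial line to the front... */
		if (lr->start > 0) {
			memmove(lr->buf, p, avail);
			lr->start = 0;
			lr->end = avail;
		}
		/* ...grow the buffer if the line fills it completely... */
		if (lr->end == lr->cap) {
			char *nbuf = realloc(lr->buf, lr->cap * 2);
			if (nbuf == NULL)
				return (NULL);
			lr->buf = nbuf;
			lr->cap *= 2;
		}
		/* ...and refill with one read(2) call. */
		ssize_t n = read(lr->fd, lr->buf + lr->end, lr->cap - lr->end);
		if (n < 0)
			return (NULL);	/* sketch: EINTR is not retried */
		if (n == 0)
			lr->eof = 1;
		else
			lr->end += (size_t)n;
	}
}
```

A caller would open(2) the file, call lr_init() on the descriptor, and
loop on lr_getln() until it returns NULL.  The point of the design is
that there is one read(2) per buffer refill and one memchr(3) scan per
line, instead of one locked stdio call per character.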