Date:      Tue, 17 Aug 2010 17:28:08 +0200
From:      Dimitry Andric <dimitry@andric.com>
To:        Dag-Erling Smørgrav <des@des.no>
Cc:        Doug Barton <dougb@FreeBSD.org>, Justin Hibbits <chmeeedalf@gmail.com>, core@freebsd.org, delphij@freebsd.org, Gabor Kovesdan <gabor@freebsd.org>, Steve Kargl <sgk@troutmask.apl.washington.edu>, current@freebsd.org
Subject:   Re: Official request: Please make GNU grep the default
Message-ID:  <4C6AAA88.5080606@andric.com>
In-Reply-To: <86sk2faqdl.fsf@ds4.des.no>
References:  <4C6505A4.9060203@FreeBSD.org>	<20100813085235.GA16268@freebsd.org> <4C66C010.3040308@FreeBSD.org>	<4C673F02.8000805@FreeBSD.org>	<20100815013438.GA8958@troutmask.apl.washington.edu>	<4C67492C.5020206@FreeBSD.org>	<B7A05068-9578-4341-851B-86BD9BC7A2DA@gmail.com>	<8639ufd78w.fsf@ds4.des.no> <4C6844D8.5070602@andric.com> <86sk2faqdl.fsf@ds4.des.no>

On 2010-08-16 10:55, Dag-Erling Smørgrav wrote:
> Dimitry Andric <dimitry@andric.com> writes:
>> - Uses plain file descriptors instead of struct FILE, since the
>>   buffering is done manually anyway, and it makes it easier to support
>>   gzip and bzip2.
> It might be worth a shot adding mmap(2) support as well, i.e. when
> processing an uncompressed regular file, try to mmap(2) it first, and if
> that fails, fall back to the plain buffered read(2) method.

I added simple mmap support to grep and timed it, but the mmap version
was somewhat slower than the regular version.  I understood from Kostik
Belousov that readahead does not work properly with mmap, so mmap should
not be used for "one-time" reads.
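
For reference, the "try mmap(2) first, fall back to read(2)" approach
suggested in the quoted text could look roughly like the sketch below.
This is not the code from the grep patch; scan_file(), count_newlines()
and READ_BUFSIZE are hypothetical names, and counting newlines only
stands in for the real matching work.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define READ_BUFSIZE 32768	/* assumption: the 32k figure from this thread */

/* Count '\n' bytes in the range [p, p + len). */
static long
count_newlines(const char *p, size_t len)
{
	const char *end = p + len;
	long n = 0;

	while (p < end && (p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
		n++;
		p++;
	}
	return (n);
}

/*
 * Hypothetical helper: returns the number of newlines in path, or -1 on
 * error.  *used_mmap is set to 1 if the file was mapped, 0 otherwise.
 */
static long
scan_file(const char *path, int *used_mmap)
{
	static char buf[READ_BUFSIZE];
	struct stat st;
	ssize_t nr;
	long total = 0;
	int fd;

	*used_mmap = 0;
	if ((fd = open(path, O_RDONLY)) == -1)
		return (-1);

	/* Only attempt mmap(2) on non-empty regular files. */
	if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode) && st.st_size > 0) {
		void *map = mmap(NULL, (size_t)st.st_size, PROT_READ,
		    MAP_PRIVATE, fd, 0);
		if (map != MAP_FAILED) {
			total = count_newlines(map, (size_t)st.st_size);
			munmap(map, (size_t)st.st_size);
			close(fd);
			*used_mmap = 1;
			return (total);
		}
	}

	/* Fallback: plain buffered read(2). */
	while ((nr = read(fd, buf, sizeof(buf))) > 0)
		total += count_newlines(buf, (size_t)nr);
	close(fd);
	return (nr < 0 ? -1 : total);
}
```

Calling scan_file("some.log", &used) returns the line count and reports
which path was taken, so both variants can be timed on the same file.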

I also experimented with different buffer sizes on the same big test
file, and this gives the following results (times in s):

buffer size     test1   test2   test3   average
===========     =====   =====   =====   =======
        512     467     484     465     472
      1,024     391     415     392     399
      2,048     361     356     365     361
      4,096     353     353     356     354
      8,192     348     345     357     350
     16,384     341     373     350     354
     32,768     339     348     346     344
     65,536     336     359     371     355
    262,144     334     352     350     345
  1,048,576     334     350     351     345
  2,097,152     339     342     369     350
373,293,056     544     547     559     550

So the 32k buffer size that I borrowed from GNU grep seems to be
reasonable enough.  Buffers larger than that give no measurable speedup,
so there is no point in spending huge amounts of memory on them.
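
A buffer-size experiment of this kind can be sketched as follows.  This
is only an illustration, not the harness used for the numbers above:
read_with_bufsize() is a hypothetical helper, and it times the bare
read(2) loop without doing any matching.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: read path to EOF using a buffer of bufsize bytes.
 * Returns the number of bytes read, or -1 on error.  If elapsed is not
 * NULL, it receives the wall-clock time of the read loop in seconds.
 */
static long long
read_with_bufsize(const char *path, size_t bufsize, double *elapsed)
{
	struct timespec t0, t1;
	long long total = 0;
	ssize_t nr;
	char *buf;
	int fd;

	assert(bufsize > 0);
	if ((buf = malloc(bufsize)) == NULL)
		return (-1);
	if ((fd = open(path, O_RDONLY)) == -1) {
		free(buf);
		return (-1);
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	while ((nr = read(fd, buf, bufsize)) > 0)
		total += nr;	/* a real grep would scan buf[0..nr) here */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	close(fd);
	free(buf);
	if (elapsed != NULL)
		*elapsed = (double)(t1.tv_sec - t0.tv_sec) +
		    (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
	return (nr < 0 ? -1 : total);
}
```

Looping over the sizes in the table (512, 1024, ... 2097152) and calling
read_with_bufsize() several times per size gives per-run timings that can
be averaged the same way.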
