Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 22 Aug 2010 11:30:01 -0500 (CDT)
From:      "Sean C. Farley" <scf@FreeBSD.org>
To:        =?ISO-8859-15?Q?Dag-Erling_Sm=F8rgrav?= <des@des.no>
Cc:        freebsd-current@FreeBSD.org, Mike Haertel <mike@ducky.net>, gabor@FreeBSD.org
Subject:   Re: why GNU grep is fast
Message-ID:  <alpine.BSF.2.00.1008221111300.1989@terminus>
In-Reply-To: <86k4nikglg.fsf@ds4.des.no>
References:  <201008210231.o7L2VRvI031700@ducky.net> <86k4nikglg.fsf@ds4.des.no>

next in thread | previous in thread | raw e-mail | index | archive | help
  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

--4255886656-1237390696-1282494601=:1989
Content-Type: TEXT/PLAIN; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8BIT

On Sun, 22 Aug 2010, Dag-Erling Smørgrav wrote:

> Mike Haertel <mike@ducky.net> writes:
>> GNU grep uses the well-known Boyer-Moore algorithm, which looks first 
>> for the final letter of the target string, and uses a lookup table to 
>> tell it how far ahead it can skip in the input whenever it finds a 
>> non-matching character.
>
> Boyer-Moore is for fixed search strings.  I don't see how that 
> optimization can work with a regexp search unless the regexp is so 
> simple that you break it down into a small number of cases with known 
> length and final character.

When I was working on making FreeGrep faster (years ago), I wrote down a 
few notes about possible algorithms, especially those that could be 
useful for fgrep functionality.  I am just passing these onto the list.

Some algorithms:
1. http://en.wikipedia.org/wiki/Aho-Corasick_string_matching_algorithm
2. http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm
3. GNU fgrep:  Commentz-Walter
4. GLIMPSE:  http://webglimpse.net/pubs/TR94-17.pdf (Boyer-Moore variant)

Also, this may be a useful book:
http://www.dcc.uchile.cl/~gnavarro/FPMbook/

Sean
-- 
scf@FreeBSD.org
--4255886656-1237390696-1282494601=:1989--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?alpine.BSF.2.00.1008221111300.1989>