Date:      Mon, 23 Aug 2010 12:23:05 +0200
From:      Gabor Kovesdan <gabor@FreeBSD.org>
To:        "Sean C. Farley" <scf@FreeBSD.org>
Cc:        Dag-Erling Smørgrav <des@des.no>, freebsd-current@FreeBSD.org, Mike Haertel <mike@ducky.net>
Subject:   Re: why GNU grep is fast
Message-ID:  <4C724C09.6090104@FreeBSD.org>
In-Reply-To: <alpine.BSF.2.00.1008222030080.93799@thor.farley.org>
References:  <201008210231.o7L2VRvI031700@ducky.net> <86k4nikglg.fsf@ds4.des.no>	<alpine.BSF.2.00.1008221111300.1989@terminus>	<628366E1-AF71-4A22-95AF-BC77A21C21A8@kientzle.com> <alpine.BSF.2.00.1008222030080.93799@thor.farley.org>


>
>> Later on, he summarizes some of the existing implementations, 
>> including comments about the Plan 9 implementation and his own RE2, 
>> both of which efficiently handle international text (which seems to 
>> be a major concern of Gabor's).
>
> I believe Gabor is considering TRE for a good replacement regex library.
Yes. Oniguruma is slow; Google RE2 supports only Perl and fgrep syntax,
not standard regex; and the Plan 9 implementation, IIRC, supports only
fgrep syntax and Unicode, not wchar_t in general.
>
>> The key comment in Mike's GNU grep notes is the one about not 
>> breaking into lines.  That's simply double-scanning the input; 
>> instead, run the matcher over blocks of text and, when it finds a 
>> match, work backwards from the match to find the appropriate line 
>> beginning.  This is efficient because most lines don't match.
>
> I do like the idea.
So do I.
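
For illustration, here is a minimal sketch of that idea in C. It assumes
a fixed-string pattern that contains no newline and a block that is
already in memory. The names find_literal and next_matching_line are
hypothetical; this is not code from GNU grep or bsdgrep, and the naive
search stands in for whatever matcher (Boyer-Moore, a regex engine) a
real grep would use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: naive search for a fixed string in a byte buffer. */
static const char *
find_literal(const char *buf, size_t len, const char *needle, size_t nlen)
{
	assert(nlen > 0);
	if (nlen > len)
		return NULL;
	for (size_t i = 0; i + nlen <= len; i++)
		if (memcmp(buf + i, needle, nlen) == 0)
			return buf + i;
	return NULL;
}

/*
 * Find the next line containing needle, searching from offset start.
 * The buffer is never split into lines up front: the matcher runs over
 * the raw block, and line boundaries are located only around a match.
 * On success, stores the line as offsets [*bol, *eol) and returns 1;
 * returns 0 when there is no further match.
 */
int
next_matching_line(const char *buf, size_t len, size_t start,
    const char *needle, size_t *bol, size_t *eol)
{
	if (start >= len)
		return 0;

	const char *end = buf + len;
	const char *hit = find_literal(buf + start, len - start,
	    needle, strlen(needle));
	if (hit == NULL)
		return 0;

	/* Work backwards from the match to the beginning of its line. */
	const char *b = hit;
	while (b > buf && b[-1] != '\n')
		b--;

	/* Work forwards to the newline that ends the line (or end of block). */
	const char *e = memchr(hit, '\n', (size_t)(end - hit));
	if (e == NULL)
		e = end;

	*bol = (size_t)(b - buf);
	*eol = (size_t)(e - buf);
	return 1;
}
```

A caller would loop, passing *eol + 1 as the next start, so each
matching line is reported once and non-matching lines are never
examined individually.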
>
> BTW, the fastgrep portion of bsdgrep is my fault/contribution to do a 
> faster search bypassing the regex library.  :)  It certainly was not 
> written with any encodings in mind; it was purely ASCII.  As I have 
> not kept up with it, I do not know if anyone improved it or not.
>
It has been made wchar-compliant.

Gabor


