Date:      Sun, 08 May 2011 18:49:38 -0700
From:      Bakul Shah <bakul@bitblocks.com>
To:        Gabor Kovesdan <gabor@kovesdan.org>
Cc:        "Pedro F. Giffuni" <giffunip@yahoo.com>, hackers@FreeBSD.org, Brooks Davis <brooks@freebsd.org>
Subject:   Re: [RFC] Replacing our regex implementation 
Message-ID:  <20110509014938.EE292B827@mail.bitblocks.com>
In-Reply-To: Your message of "Mon, 09 May 2011 02:37:10 BST." <4DC74546.1060902@kovesdan.org> 
References:  <4DC7356C.20905@kovesdan.org> <20110509011709.5455CB827@mail.bitblocks.com> <4DC74546.1060902@kovesdan.org>

On Mon, 09 May 2011 02:37:10 BST Gabor Kovesdan <gabor@kovesdan.org>  wrote:
> On 09-05-2011 02:17, Bakul Shah wrote:
> > As per the following URLs re2 is much faster than TRE (on the
> > benchmarks they ran):
> >
> > http://lh3lh3.users.sourceforge.net/reb.shtml
> > http://sljit.sourceforge.net/regex_perf.html
> >
> > re2 is in C++ & has a PCRE API, while TRE is in C & has a
> > POSIX API.  Both have BSD copyright. Is it worth considering
> > making re2 posix compliant?
> Is it wchar-clean and is it actively maintained? C++ is quite 
> anticipated for the base system and I'm not very skilled in it so atm I 
> couldn't promise to use re2 instead of TRE. And anyway, can C++ go into 
> libc? According to POSIX, the regex code has to be there. But let's see 
> what others say... If we happen to use re2 later, my extensions that I 
> talked about in points 2, and 3, would still be useful.
> 
> Anyway, according to some earlier vague measures, TRE seems to be slower 
> in small matching tasks but scales well. These tests seem to compare 
> only short runs with the same regex. It should be seen how they compare 
> e.g. if you grep the whole ports tree with the same pattern. If the 
> matching scales well once the pattern is compiled, that's more important 
> than the overall result for such short tasks, imho.

re2 is certainly maintained. Don't know about wchar cleanliness.
See 
    http://code.google.com/p/re2/
Also check out Russ Cox's excellent articles on implementing it
    http://swtch.com/~rsc/regexp/
and this:
    http://google-opensource.blogspot.com/2010/03/re2-principled-approach-to-regular.html

C++ may be an impediment for it to go into libc but one can
certainly put a C interface on a C++ library.
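
A minimal sketch of what such a C interface could look like, under
stated assumptions: std::regex stands in for the C++ engine (it is not
re2), and the cre_* names are hypothetical, invented only for this
illustration. The usual pattern is an opaque handle plus extern "C"
functions that catch every exception before returning to C.

```cpp
// Hypothetical C wrapper around a C++ regex engine.
// ASSUMPTIONS: std::regex is a stand-in for the real C++ library,
// and every cre_* name below is made up for this sketch.
#include <regex>

extern "C" {

// Opaque handle: C callers only ever hold a pointer to it.
typedef struct cre_pattern cre_pattern;

// These declarations are what a C header would expose.
cre_pattern *cre_compile(const char *pattern);
int cre_search(const cre_pattern *p, const char *text);
void cre_free(cre_pattern *p);

}

// The definition stays on the C++ side of the boundary.
struct cre_pattern {
    std::regex re;
};

// Compile a POSIX extended pattern; returns NULL on any failure.
extern "C" cre_pattern *cre_compile(const char *pattern) {
    if (pattern == nullptr) return nullptr;
    try {
        return new cre_pattern{std::regex(pattern, std::regex::extended)};
    } catch (...) {
        // Exceptions must never propagate into C code.
        return nullptr;
    }
}

// Returns 1 if text contains a match, 0 if not, -1 on error.
extern "C" int cre_search(const cre_pattern *p, const char *text) {
    if (p == nullptr || text == nullptr) return -1;
    try {
        return std::regex_search(text, p->re) ? 1 : 0;
    } catch (...) {
        return -1;
    }
}

// Release a pattern obtained from cre_compile; NULL is accepted.
extern "C" void cre_free(cre_pattern *p) {
    delete p;
}
```

A C caller would compile once with cre_compile(), call cre_search() as
often as needed, then cre_free(). Note that the wrapper only hides the
C++ types; the resulting library still depends on the C++ runtime, so
this does not by itself settle the libc question.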


