From owner-freebsd-current@FreeBSD.ORG  Tue Aug 24 01:16:10 2010
MIME-Version: 1.0
In-Reply-To: <4C728DE5.4060809@FreeBSD.org>
References: <201008210231.o7L2VRvI031700@ducky.net>
	<4C728DE5.4060809@FreeBSD.org>
Date: Tue, 24 Aug 2010 03:16:09 +0200
Message-ID: <AANLkTi=ksoBptimSgnYUxp8+wYwOjidZ03uJyBFTTwz7@mail.gmail.com>
From: "C. P. Ghost" <cpghost@cordula.ws>
To: Gabor Kovesdan <gabor@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
Cc: freebsd-current@freebsd.org
Subject: Re: What to learn from the BSD grep case [Was: why GNU grep is fast]

On Mon, Aug 23, 2010 at 5:04 PM, Gabor Kovesdan <gabor@freebsd.org> wrote:
> 4. We really need a good regex library. From the comments, it seems there's
> no such library in the open-source world. GNU libregex isn't efficient
> either; GNU grep relies on the workarounds that Mike kindly pointed out.
> Oniguruma was extremely slow when I checked it. PCRE supports Perl-style
> syntax with a POSIX-like API, but not POSIX regex syntax. Google RE2 is
> similar, with additional egrep syntax, but doesn't support standard POSIX
> regexes either. Plan 9 regex only supports egrep syntax. It seems that TRE
> is the best choice: it is BSD-licensed, supports wchar and POSIX(ish)
> regexes, and it is quite fast.

I know it's C++ and not exactly what you need, but have you evaluated
Boost.Regex? Could some of its code be retrofitted into a C library?

http://www.boost.org/doc/libs/1_44_0/libs/regex/doc/html/index.html

> I don't know the theoretical background of regex engines, but I'm wondering
> if it's possible to provide an alternative API with byte-counted buffers
> and use the heuristic fixed-string-matching speedup. As Mike pointed out,
> the POSIX API is quite limiting because it works on NUL-terminated strings
> rather than byte-counted buffers, so we couldn't do this with a purely
> POSIX-conformant interface, but it would be nice if we could implement it
> in such a library through an alternative interface.
>
> Gabor
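To make the fixed-string idea concrete, here is a rough C sketch (my own illustration, not code taken from BSD grep, GNU grep or TRE). It assumes the caller already knows a literal substring that every match must contain; the helpers bmh_search() and count_matching_lines() are hypothetical names. The literal is located with a Boyer-Moore-Horspool scan over a byte-counted buffer (NUL bytes are fine there), and only the line around each hit is handed to the regex engine. Note the copy into a temporary NUL-terminated string before regexec(): that is exactly the overhead the plain POSIX interface forces on us.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Boyer-Moore-Horspool search for needle in a byte-counted buffer.
 * Returns the offset of the first occurrence, or -1 if there is none.
 * Works on arbitrary bytes, including embedded NULs. */
static long bmh_search(const unsigned char *hay, size_t hlen,
                       const unsigned char *needle, size_t nlen)
{
    size_t skip[256];

    if (nlen == 0)
        return 0;
    if (nlen > hlen)
        return -1;
    /* Shift distance keyed on the byte under the window's last position. */
    for (size_t i = 0; i < 256; i++)
        skip[i] = nlen;
    for (size_t i = 0; i + 1 < nlen; i++)
        skip[needle[i]] = nlen - 1 - i;

    size_t pos = 0;
    while (pos <= hlen - nlen) {
        if (memcmp(hay + pos, needle, nlen) == 0)
            return (long)pos;
        pos += skip[hay[pos + nlen - 1]];
    }
    return -1;
}

/* Hypothetical helper: count the lines of buf[0..len) matched by re,
 * using `must` (a literal every match is assumed to contain) as a
 * prefilter, so regexec() only ever sees candidate lines. */
static size_t count_matching_lines(const regex_t *re, const char *buf,
                                   size_t len, const char *must)
{
    size_t count = 0, start = 0;   /* start is always a line start */
    size_t mlen = strlen(must);

    while (start < len) {
        long hit = bmh_search((const unsigned char *)buf + start, len - start,
                              (const unsigned char *)must, mlen);
        if (hit < 0)
            break;                 /* no candidate left: skip the regex */
        size_t h = start + (size_t)hit;

        /* Expand the hit to the enclosing line [ls, le). */
        size_t ls = h;
        while (ls > start && buf[ls - 1] != '\n')
            ls--;
        const char *nl = memchr(buf + h, '\n', len - h);
        size_t le = nl ? (size_t)(nl - buf) : len;

        /* POSIX regexec() wants a NUL-terminated string, so copy the line. */
        size_t llen = le - ls;
        char *line = malloc(llen + 1);
        if (line == NULL)
            break;                 /* out of memory: return partial count */
        memcpy(line, buf + ls, llen);
        line[llen] = '\0';
        if (regexec(re, line, 0, NULL, 0) == 0)
            count++;
        free(line);

        start = le + 1;            /* continue after this line */
    }
    return count;
}
```

For example, after compiling "foo ba[rz]" with REG_EXTENDED | REG_NOSUB, calling count_matching_lines(&re, buf, len, "foo") on a buffer holding the lines "foo bar", "hello world", "foo baz qux" and "nothing here" should count 2 lines. Extracting the required literal from the pattern automatically is the hard part and is not shown. Some implementations, FreeBSD's included, offer a non-POSIX REG_STARTEND flag to regexec() that can avoid the copy, which is the kind of byte-counted interface being asked for here.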

-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/