Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 29 Jan 1997 17:29:19 -0700 (MST)
From:      Patrick Giagnocavo <support@xinside.com>
To:        hackers@freebsd.org
Subject:   [Fwd: freebsd performance.]
Message-ID:  <199701300029.RAA02431@chon.xinside.com>
In-Reply-To: <32EFD2E2.167EB0E7@whistle.com>
References:  <32EFD2E2.167EB0E7@whistle.com>

next in thread | previous in thread | raw e-mail | index | archive | help
Julian Elischer writes:

 > The posix regex library is VERY VERY slow.
 > 
 > I have a program that uses a large regex to parse some input.
 > 
 > I have a version in perl and a version in C++ using the freebsd posix
 > regex library.
 > 
 > The perl version is 100X faster that the C++ version.
 > 
 > gprof on the C++ version shows 99% of the spend in:
 > 
 >  91.53     46.17    46.17  1152366     0.04     0.04  lstep
 >   6.52     49.46     3.29    98497     0.03     0.47  lslow

I am surprised that some of our more erudite members on the list have
not jumped on this.  So, a definitely non-erudite person will.

The best book to get on the subject is the newly released 'Mastering
Regular Expressions' by Jeffrey Freidl; it is published by O'Reilly
and Associates.

Posix regex is explained in the book; as well, there are a number of
tests, gambits, and special cases that can be used to substantially
speed up searches.

There are two different 'engines' - NFA (nondeterministic finite
automaton) and DFA (deterministic finite automaton).  Perl is
'traditional NFA' according to Mr. Friedl, while POSIX leans towards
DFA-like behavior in all cases (always returns 'leftmost-longest' that
matches - perl returns I believe the first part that matches).

Also, Perl does not 'do' POSIX IIRC; so the results can actually be
different when using the same regex string.  Perl does however have
some very powerful features for its regex - covered in the book. 

Anyways, get the book - it is well written and very informative.

Another book that covers this is the 'Red Dragon' compiler book by Aho
Sethi and Ullman.  Page 113 and following in the version I have.

Maybe this fellow should post all or part of his regex strings - there
may well be a way to optimize them.  I am sure that there is a way to
speed it up to get it faster.  

100X difference is just too great to say that it is the library, IMHO.

Cordially

Patrick Giagnocavo
patrick@xinside.com

-Opinions expressed are not even mine, let alone my employers'!-



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199701300029.RAA02431>