Date: Wed, 29 Jan 1997 17:29:19 -0700 (MST) From: Patrick Giagnocavo <support@xinside.com> To: hackers@freebsd.org Subject: [Fwd: freebsd performance.] Message-ID: <199701300029.RAA02431@chon.xinside.com> In-Reply-To: <32EFD2E2.167EB0E7@whistle.com> References: <32EFD2E2.167EB0E7@whistle.com>
next in thread | previous in thread | raw e-mail | index | archive | help
Julian Elischer writes: > The posix regex library is VERY VERY slow. > > I have a program that uses a large regex to parse some input. > > I have a version in perl and a version in C++ using the freebsd posix > regex library. > > The perl version is 100X faster that the C++ version. > > gprof on the C++ version shows 99% of the spend in: > > 91.53 46.17 46.17 1152366 0.04 0.04 lstep > 6.52 49.46 3.29 98497 0.03 0.47 lslow I am surprised that some of our more erudite members on the list have not jumped on this. So, a definitely non-erudite person will. The best book to get on the subject is the newly released 'Mastering Regular Expressions' by Jeffrey Freidl; it is published by O'Reilly and Associates. Posix regex is explained in the book; as well, there are a number of tests, gambits, and special cases that can be used to substantially speed up searches. There are two different 'engines' - NFA (nondeterministic finite automaton) and DFA (deterministic finite automaton). Perl is 'traditional NFA' according to Mr. Friedl, while POSIX leans towards DFA-like behavior in all cases (always returns 'leftmost-longest' that matches - perl returns I believe the first part that matches). Also, Perl does not 'do' POSIX IIRC; so the results can actually be different when using the same regex string. Perl does however have some very powerful features for its regex - covered in the book. Anyways, get the book - it is well written and very informative. Another book that covers this is the 'Red Dragon' compiler book by Aho Sethi and Ullman. Page 113 and following in the version I have. Maybe this fellow should post all or part of his regex strings - there may well be a way to optimize them. I am sure that there is a way to speed it up to get it faster. 100X difference is just too great to say that it is the library, IMHO. Cordially Patrick Giagnocavo patrick@xinside.com -Opinions expressed are not even mine, let alone my employers'!-
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199701300029.RAA02431>