Date:      Fri, 23 Dec 2011 14:39:31 +0000 (UTC)
From:      Gabor Kovesdan <gabor@FreeBSD.org>
To:        src-committers@freebsd.org, svn-src-user@freebsd.org
Subject:   svn commit: r228842 - user/gabor/tre-integration/lib/libc/regex
Message-ID:  <201112231439.pBNEdVI7071003@svn.freebsd.org>

Author: gabor
Date: Fri Dec 23 14:39:30 2011
New Revision: 228842
URL: http://svn.freebsd.org/changeset/base/228842

Log:
  - Minor rewording of some existing parts
  - Document some TRE-specific features

Modified:
  user/gabor/tre-integration/lib/libc/regex/re_format.7

Modified: user/gabor/tre-integration/lib/libc/regex/re_format.7
==============================================================================
--- user/gabor/tre-integration/lib/libc/regex/re_format.7	Fri Dec 23 13:50:33 2011	(r228841)
+++ user/gabor/tre-integration/lib/libc/regex/re_format.7	Fri Dec 23 14:39:30 2011	(r228842)
@@ -37,7 +37,7 @@
 .\"	@(#)re_format.7	8.3 (Berkeley) 3/20/94
 .\" $FreeBSD$
 .\"
-.Dd October 6, 2011
+.Dd December 23, 2011
 .Dt RE_FORMAT 7
 .Os
 .Sh NAME
@@ -69,13 +69,13 @@ so this manual will describe the behavio
 instead of just reproducing the same information that is already
 available in the standard.
 .Pp
-An extended regular expression is one or more non-empty
+An extended regular expression is constructed from one or more non-empty
 .Em branches ,
 separated by
 .Ql \&| .
 It matches anything that matches one of the branches.
 .Pp
-A branch is one or more
+A branch consists of one or more
 .Em pieces ,
 concatenated.
 It matches a match for the first, followed by a match for the second, etc.
@@ -284,7 +284,7 @@ The reverse, matching any character that
 class, the negation operator of bracket expressions may be used:
 .Ql [^[:class:]] .
 .Pp
-In the event that a regular expression  could match more than one
+In the event that a regular expression could match more than one
 substring of a given string,
 the regular expression matches the one starting earliest in the string.
 If the regular expression could match more than one substring starting
@@ -343,7 +343,77 @@ longer than 256 bytes,
 as an implementation can refuse to accept such regular expressions and
 remain POSIX-compliant.
 .Pp
+As described above,
+repetition operators and bounds are greedy by definition.
+This implementation provides non-greedy operators and bounds that
+are formed by adding an extra
+.Ql \&?
+after the repetition.
+For example,
+.Ql a*?
+is non-greedy,
+that is, it matches as few characters as possible.
+.Pp
+This implementation also supports a set of non-standard anchors
+and character class shorthands:
+.Bl -tag -width BBBB
+.It Ql \e<
+Beginning of a word
+.It Ql \e>
+End of a word
+.It Ql \eb
+Word boundary
+.It Ql \eB
+Non-word boundary
+.It Ql \ed
+Digit (equivalent to [[:digit:]])
+.It Ql \eD
+Non-digit (equivalent to [^[:digit:]])
+.It Ql \es
+Space (equivalent to [[:space:]])
+.It Ql \eS
+Non-space (equivalent to [^[:space:]])
+.It Ql \ew
+Word character (equivalent to [[:alnum:]])
+.It Ql \eW
+Non-word character (equivalent to [^[:alnum:]])
+.El
+.Pp
+Apart from plain literals and escaped special characters,
+characters can also be expressed with an extended notation.
+It is possible to specify 8\-bit characters in hexadecimal notation
+.No ( e.g. Ql \ex1B )
+or wide characters in hexadecimal notation
+.No ( e.g. Ql \ex{263a} ) .
+With this notation,
+every character can be included in a regular expression.
+Some common non\-printable characters have an escaped shorthand,
+as well:
+.Bl -tag -width BBBB
+.It Ql \ea
+Bell character (ASCII code 7)
+.It Ql \ee
+Escape character (ASCII code 27)
+.It Ql \ef
+Form\-feed character (ASCII code 12)
+.It Ql \en
+Newline character (ASCII code 10)
+.It Ql \er
+Carriage return character (ASCII code 13)
+.It Ql \et
+Horizontal tab character (ASCII code 9)
+.El
+.Pp
 Basic regular expressions differ in several respects.
+The delimiters for bounds are
+.Ql \e{
+and
+.Ql \e} ,
+with
+.Ql \&{
+and
+.Ql \&}
+by themselves ordinary characters.
 .Ql \&|
 is an ordinary character and there is no equivalent
 for its functionality.
@@ -352,23 +422,14 @@ and
 .Ql ?\&
 are ordinary characters, and their functionality
 can be expressed using bounds
-.No ( Ql {1,}
+.No ( Ql \e{1,\e}
 or
-.Ql {0,1}
+.Ql \e{0,1\e}
 respectively).
 Also note that
 .Ql x+
 in extended regular expressions is equivalent to
 .Ql xx* .
-The delimiters for bounds are
-.Ql \e{
-and
-.Ql \e} ,
-with
-.Ql \&{
-and
-.Ql \&}
-by themselves ordinary characters.
 The parentheses for nested subexpressions are
 .Ql \e(
 and
@@ -426,6 +487,8 @@ This manual was originally written by
 for an older implementation and later extended and
 tailored for TRE by
 .An Gabor Kovesdan .
+The description of TRE\-specific extensions is based on the original
+TRE documentation.
 The regex implementation comes from the TRE project
 and it was included first in
 .Fx 10-CURRENT.

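The added text lists the TRE shorthands \d, \D, \s, \S, \w and \W as equivalent to POSIX bracket expressions. A pattern written with them could therefore be rewritten for a regex library that lacks the TRE extensions. The following is a minimal sketch under that assumption. `expand_shorthands` is a hypothetical helper, and it follows the equivalences exactly as listed in this commit. It assumes the shorthands never appear inside a bracket expression.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Bracket expression listed as equivalent to the shorthand \c,
   or NULL if c is not one of d, D, s, S, w, W. */
static const char *shorthand_class(char c)
{
    switch (c) {
    case 'd': return "[[:digit:]]";
    case 'D': return "[^[:digit:]]";
    case 's': return "[[:space:]]";
    case 'S': return "[^[:space:]]";
    case 'w': return "[[:alnum:]]";
    case 'W': return "[^[:alnum:]]";
    default:  return NULL;
    }
}

/* Hypothetical helper: returns a newly allocated copy of `pattern` in
   which every \d, \D, \s, \S, \w and \W is replaced by its bracket
   expression; all other escapes are copied unchanged. The caller must
   free() the result. Returns NULL on allocation failure. */
char *expand_shorthands(const char *pattern)
{
    assert(pattern != NULL);
    size_t len = strlen(pattern);
    /* Worst case: every two-byte shorthand grows to 12 bytes. */
    char *out = malloc(len * 6 + 1);
    if (out == NULL)
        return NULL;

    char *o = out;
    for (size_t i = 0; i < len; i++) {
        if (pattern[i] == '\\' && i + 1 < len) {
            const char *cls = shorthand_class(pattern[i + 1]);
            if (cls != NULL) {
                size_t n = strlen(cls);
                memcpy(o, cls, n);
                o += n;
            } else {
                /* Keep other escapes such as \. or \\ as they are. */
                *o++ = pattern[i];
                *o++ = pattern[i + 1];
            }
            i++;    /* the escaped character has been consumed */
        } else {
            *o++ = pattern[i];
        }
    }
    *o = '\0';
    return out;
}
```

The word anchors \<, \>, \b and \B are left untouched. They match positions rather than characters, so they have no bracket-expression equivalent.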

