Regular Expression Operators
----------------------------
You can combine regular expressions with the following characters,
called "regular expression operators", or "metacharacters", to increase
the power and versatility of regular expressions.
Here is a table of metacharacters. All characters not listed in the
table stand for themselves.
`^'
This matches the beginning of the string or the beginning of a line
within the string. For example:
^@chapter
matches the `@chapter' at the beginning of a string, and can be
used to identify chapter beginnings in Texinfo source files.
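As a quick check of the anchor, the sketch below pipes three invented
records through `awk'; the input text is made up for illustration.
Only the record that begins with `@chapter' is selected.

```shell
# Invented input: only the first record begins with "@chapter".
out=$(printf '%s\n' '@chapter Intro' 'See @chapter two' '@section Usage' |
      awk '/^@chapter/ { print }')
echo "$out"
```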
`$'
This is similar to `^', but it matches only at the end of a string
or the end of a line within the string. For example:
p$
matches a record that ends with a `p'.
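A minimal sketch of the same idea, using three invented one-word
records; only those whose last character is `p' should be selected.

```shell
# Invented input: "stop" and "pump" end with "p"; "spot" does not.
out=$(printf '%s\n' 'stop' 'spot' 'pump' | awk '/p$/ { print }')
echo "$out"
```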
`.'
This matches any single character except a newline. For example:
.P
matches any single character followed by a `P' in a string. Using
concatenation we can make regular expressions like `U.A', which
matches any three-character sequence that begins with `U' and ends
with `A'.
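To see that `.' stands for exactly one character, the following
sketch tries `U.A' against a few invented records.

```shell
# Invented input: "UA" has no character between "U" and "A", and
# "UCLA" has two, so neither contains a match for U.A.
out=$(printf '%s\n' 'USA' 'UA' 'UCLA' 'U-A' | awk '/U.A/ { print }')
echo "$out"
```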
`[...]'
This is called a "character set". It matches any one of the
characters that are enclosed in the square brackets. For example:
[MVX]
matches any one of the characters `M', `V', or `X' in a string.
Ranges of characters are indicated by using a hyphen between the
beginning and ending characters, and enclosing the whole thing in
brackets. For example:
[0-9]
matches any digit.
To include the character `\', `]', `-' or `^' in a character set,
put a `\' in front of it. For example:
[d\]]
matches either `d', or `]'.
This treatment of `\' is compatible with other `awk'
implementations, and is also mandated by the POSIX Command Language
and Utilities standard. The regular expressions in `awk' are a
superset of the POSIX specification for Extended Regular
Expressions (EREs). POSIX EREs are based on the regular
expressions accepted by the traditional `egrep' utility.
In `egrep' syntax, backslash is not syntactically special within
square brackets. This means that special tricks have to be used to
represent the characters `]', `-' and `^' as members of a
character set.
In `egrep' syntax, to match `-', write it as `---', which is a
range containing only `-'. You may also give `-' as the first or
last character in the set. To match `^', put it anywhere except
as the first character of a set. To match a `]', make it the
first character in the set. For example:
[]d^]
matches either `]', `d' or `^'.
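The character sets shown in this entry can be tried side by side.
The sketch below is illustrative only: the input records and the
labels `letter:', `digit:' and `special:' are invented, and the last
rule uses the `egrep'-style set with `]' placed first.

```shell
# Each rule labels the records that its character set matches.
out=$(printf '%s\n' 'XIV' 'page 7' 'a]b' 'plain' |
      awk '/[MVX]/ { print "letter:", $0 }
           /[0-9]/ { print "digit:", $0 }
           /[]d^]/ { print "special:", $0 }')
echo "$out"
```

The record `plain' contains none of the listed characters, so no
rule prints it.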
`[^ ...]'
This is a "complemented character set". The first character after
the `[' *must* be a `^'. It matches any character *except* those
in the square brackets (or newline). For example:
[^0-9]
matches any character that is not a digit.
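Note that an unanchored `[^0-9]' selects any record that contains
at least one non-digit, not only records made up entirely of
non-digits. A small sketch with invented input:

```shell
# Invented input: only "12a45" contains a character that is not a digit.
out=$(printf '%s\n' '12345' '12a45' '007' | awk '/[^0-9]/ { print }')
echo "$out"
```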
`|'
This is the "alternation operator" and it is used to specify
alternatives. For example:
^P|[0-9]
matches any string that matches either `^P' or `[0-9]'. This
means it matches any string that contains a digit or starts with
`P'.
The alternation applies to the largest possible regexps on either
side.
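Because the alternation takes the largest possible regexps, the `^'
in `^P|[0-9]' belongs only to the `P' alternative. The sketch below
uses invented records to show this.

```shell
# "St. Paul" contains a "P", but not at the start, and has no digit,
# so it is not selected; "room 9" is selected because of the digit.
out=$(printf '%s\n' 'Paris' 'room 9' 'London' 'St. Paul' |
      awk '/^P|[0-9]/ { print }')
echo "$out"
```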
`(...)'
Parentheses are used for grouping in regular expressions as in
arithmetic. They can be used to concatenate regular expressions
containing the alternation operator, `|'.
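As an illustration, wrapping the alternatives of `^P|[0-9]' in
parentheses makes the `^' apply to both of them, so the regexp
`^(P|[0-9])' matches only at the beginning. The input records are
invented.

```shell
# With grouping, the digit must also be at the start of the record,
# so "room 9" is no longer selected.
out=$(printf '%s\n' 'Paris' 'room 9' '9 lives' |
      awk '/^(P|[0-9])/ { print }')
echo "$out"
```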
`*'
This symbol means that the preceding regular expression is to be
repeated as many times as necessary (zero or more) to find a match.
For example:
ph*
applies the `*' symbol to the preceding `h' and looks for matches
to one `p' followed by any number of `h's. This will also match
just `p' if no `h's are present.
The `*' repeats the *smallest* possible preceding expression.
(Use parentheses if you wish to repeat a larger expression.) It
finds as many repetitions as possible. For example:
awk '/\(c[ad][ad]*r x\)/ { print }' sample
prints every record in the input containing a string of the form
`(car x)', `(cdr x)', `(cadr x)', and so on.
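The difference between repeating the smallest preceding expression
and repeating a parenthesized one can be sketched as follows. The
records and the labels `group:' and `single:' are invented, and the
regexps are anchored with `^' and `$' so that the whole record must
match.

```shell
# (ab)* repeats the two-character group; ab* repeats only the "b".
out=$(printf '%s\n' 'abab' 'abbb' 'a' |
      awk '/^(ab)*$/ { print "group:", $0 }
           /^ab*$/   { print "single:", $0 }')
echo "$out"
```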
`+'
This symbol is similar to `*', but the preceding expression must be
matched at least once. This means that:
wh+y
would match `why' and `whhy' but not `wy', whereas `wh*y' would
match all three of these strings. This is a simpler way of
writing the last `*' example:
awk '/\(c[ad]+r x\)/ { print }' sample
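The `why' comparison can be run directly. In this sketch the three
input records come from the text above, while the labels `plus:' and
`star:' are invented for readability.

```shell
# wh+y needs at least one "h"; wh*y also accepts none.
out=$(printf '%s\n' 'why' 'whhy' 'wy' |
      awk '/wh+y/ { print "plus:", $0 }
           /wh*y/ { print "star:", $0 }')
echo "$out"
```

Only the `star:' rule prints `wy'.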
`?'
This symbol is similar to `*', but the preceding expression can be
matched once or not at all. For example:
fe?d
will match `fed' and `fd', but nothing else.
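A short sketch with invented input shows that a second `e' is one
repetition too many for `?'.

```shell
# "feed" has two "e"s between "f" and "d", so fe?d does not match it.
out=$(printf '%s\n' 'fed' 'fd' 'feed' | awk '/fe?d/ { print }')
echo "$out"
```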
`\'
This is used to suppress the special meaning of a character when
matching. For example:
\$
matches the character `$'.
The escape sequences used for string constants (*note Constant
Expressions: Constants.) are valid in regular expressions as well;
they are also introduced by a `\'.
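Both uses of `\' can be tried from the shell. The sketch below is
illustrative: the input is invented, the first command matches a
literal `$', and the second uses the string-constant escape `\t' to
match a tab character. The single quotes keep the shell from
interpreting the backslashes.

```shell
# \$ matches a literal dollar sign instead of anchoring at the end.
out=$(printf 'cost: $5\ncost: 5\n' | awk '/\$/ { print }')
echo "$out"

# \t inside the regexp stands for a tab, as in a string constant.
tab=$(printf 'a\tb\nab\n' | awk '/a\tb/ { print "tab found" }')
echo "$tab"
```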
In regular expressions, the `*', `+', and `?' operators have the
highest precedence, followed by concatenation, and finally by `|'. As
in arithmetic, parentheses can change how operators are grouped.