Date:      Tue, 27 Sep 2011 08:25:08 +0200
From:      joost@jodocus.org
To:        "grarpamp" <grarpamp@gmail.com>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: Regex Wizards
Message-ID:  <8bfa48b43bf616c25623fec80fe3c6aa.squirrel@webmail.jodocus.org>

> Under the ERE implementation in RELENG_8, I'm having
> trouble figuring out how to group and backreference this.
> Given a line, where:
>  If AAA is present, CCC will be too, and B may appear in between. If AAA
> is not present, neither CCC nor B will be present.
>  DDDD is always present.
>  Junk may be present.
>  Match good lines and output in chunks.
> echo junkAAAABCCCDDDDjunk | \
> This works as expected:
> sed -E -n 's,^.*(AAAB?CCC)(DDDD).*$,1 \1 2 \2,p'
> 1 AAABCCC 2 DDDD
> But making the leading bits optional per spec does not work:
> sed -E -n 's,^.*(AAAB?CCC)?(DDDD).*$,1 \1 2 \2,p'
> 1  2 DDDD
> Nor does adding the usual grouping parens:
> sed -E -n 's,^.*((AAAB?CCC)?)(DDDD).*$,1 \1 2 \2,p'
> 1 2
> How do I group off the leading bits?
> Or is this a limitation of ERE's?
> Or a bug?
> Thanks.

Regular expressions are greedy by default. In your second and third
examples the leading .* is matching "junkAAAABCCC", which leaves nothing
for the optional group, so that group matches the empty string.

Try `sed -E -n 's,^(.*)(AAAB?CCC)?(DDDD).*$,1 \1 2 \2 3 \3,p'` and you'll
see what I mean.
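
To make that concrete, here is the same command wrapped in a small function and fed your sample line. The function name show_groups is just something I made up for illustration, and the output in the comment is what I would expect from a POSIX-conforming sed, so treat it as a sketch rather than gospel:

```shell
#!/bin/sh
# Diagnostic only: capture whatever the leading .* swallows as group 1,
# so it is visible that the optional group 2 ends up empty.
# show_groups is a hypothetical helper name, not part of sed.
show_groups() {
    sed -E -n 's,^(.*)(AAAB?CCC)?(DDDD).*$,1 \1 2 \2 3 \3,p'
}

echo junkAAAABCCCDDDDjunk | show_groups
# prints: 1 junkAAAABCCC 2  3 DDDD
```

Group 1 has taken everything up to DDDD, including the AAABCCC you wanted in group 2.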

In Perl I'd tell you to use .*? instead of .*, but I have no idea what the
POSIX equivalent is, if one exists.
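
One possible workaround that does not need a non-greedy operator at all is to try the pattern with the AAA...CCC part required first, and only fall back to the DDDD-only pattern if that fails. This is a rough sketch, assuming your spec holds (CCC and B never appear without AAA); the function name chunk_line is made up, and the two spaces in the fallback replacement just stand in for the empty first chunk:

```shell
#!/bin/sh
# Two-pass sketch: required group first, DDDD-only fallback second.
# chunk_line is a hypothetical helper name.
chunk_line() {
    sed -E -n \
        -e 's,^.*(AAAB?CCC)(DDDD).*$,1 \1 2 \2,p' \
        -e t \
        -e 's,^.*(DDDD).*$,1  2 \1,p'
}

echo junkAAAABCCCDDDDjunk | chunk_line
# prints: 1 AAABCCC 2 DDDD
echo junkDDDDjunk | chunk_line
# prints: 1  2 DDDD
```

The bare t branches to the end of the script when the first substitution succeeded, so the fallback only runs on lines that have no AAA...CCC in front of DDDD. Lines without DDDD print nothing.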


Hope this helps.


Joost Bekkers






