Date: Tue, 27 Sep 2011 08:25:08 +0200
From: joost@jodocus.org
To: "grarpamp" <grarpamp@gmail.com>
Cc: freebsd-questions@freebsd.org
Subject: Re: Regex Wizards
Message-ID: <8bfa48b43bf616c25623fec80fe3c6aa.squirrel@webmail.jodocus.org>

> Under the ERE implementation in RELENG_8, I'm having
> trouble figuring out how to group and backreference this.
>
> Given a line, where:
> If AAA is present, CCC will be too, and B may appear in between.
> If AAA is not present, neither CCC nor B will be present.
> DDDD is always present.
> Junk may be present.
> Match good lines and output in chunks.
>
> echo junkAAAABCCCDDDDjunk | \
>
> This works as expected:
> sed -E -n 's,^.*(AAAB?CCC)(DDDD).*$,1 \1 2 \2,p'
> 1 AAABCCC 2 DDDD
>
> But making the leading bits optional per spec does not work:
> sed -E -n 's,^.*(AAAB?CCC)?(DDDD).*$,1 \1 2 \2,p'
> 1 2 DDDD
>
> Nor does adding the usual grouping parens:
> sed -E -n 's,^.*((AAAB?CCC)?)(DDDD).*$,1 \1 2 \2,p'
> 1 2
>
> How do I group off the leading bits?
> Or is this a limitation of ERE's?
> Or a bug?
>
> Thanks.

Regular expressions are greedy by default. The leading .* is matching
"junkAAAABCCC" in your second and third examples, so the optional group
is left matching the empty string.

Try

    sed -E -n 's,^(.*)(AAAB?CCC)?(DDDD).*$,1 \1 2 \2 3 \3,p'

and you'll see what I mean.

In Perl I'd tell you to use .*? instead of .*, but I have no idea what
the POSIX equivalent is, if it exists.

Hope this helps.

Joost Bekkers
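
POSIX ERE does not define a non-greedy quantifier, so one possible
workaround is to try the stricter pattern first and fall back to the
DDDD-only pattern when it does not match. The following is only a sketch
under that assumption, not a tested recipe for RELENG_8; the function
name split_chunks is made up for illustration, and it assumes the
AAAB?CCC part, when present, sits directly before DDDD as in the sample
line.

```shell
# Hypothetical helper: reads lines on stdin, prints the chunks.
split_chunks() {
    sed -E -n '
        # First try: the leading AAAB?CCC part is required here, so the
        # greedy .* has to leave it for group 1.
        s,^.*(AAAB?CCC)(DDDD).*$,1 \1 2 \2,p
        # If that substitution succeeded, skip the rest of the script.
        t
        # Fallback: no AAA...CCC part, so only DDDD is captured and
        # chunk 1 is printed empty.
        s,^.*(DDDD).*$,1  2 \1,p
    '
}

echo junkAAAABCCCDDDDjunk | split_chunks
echo junkDDDDjunk | split_chunks
```

Lines that contain no DDDD at all match neither substitution and, because
of -n, print nothing.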