Date: Tue, 27 Apr 2021 16:05:42 +0200
From: Fernando Apesteguía <fernape@freebsd.org>
To: Mark Millard <marklmi@yahoo.com>
Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject: Re: Regular expression compilation fail in current
Message-ID: <CAGwOe2YxS08kZqodK2GzqbPVjW_=e%2BVrs9_peWFsL4-7KTpc6Q@mail.gmail.com>
In-Reply-To: <CC70147B-9A24-433A-8678-31BD183DEE7F@yahoo.com>
References: <CAGwOe2bwyLihdOzyxVYgdaSUTwGzELANARSh=HGQoou=5FgG%2Bg@mail.gmail.com> <CC70147B-9A24-433A-8678-31BD183DEE7F@yahoo.com>
On Tue, Apr 27, 2021 at 5:14 AM Mark Millard <marklmi@yahoo.com> wrote:
>
>
>
> On 2021-Apr-26, at 06:31, Fernando Apesteguía <fernape at freebsd.org> wrote:
>
> > Hi there,
> >
> > I'm working with this port PR
> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=255182
> >
> > and the problem seems to boil down to a regular expression that does
> > not compile on current but it does in 12.2.
> >
> > The minimum repro is this one:
> >
> > #include <regex.h>
> > #include <stdio.h>
> >
> > int
> > main()
> > {
> >     regex_t regexp;
> >     int ret = regcomp(&regexp, "\\s*", REG_EXTENDED | REG_ICASE |
> >         REG_NOSUB);
>
> Here is my stab at notes for this . . .
>
> It is not all that uncommon for error cases to be
> initially mistreated but later toolchains to reject
> instead of mistreating the same. I suspect that is
> what is going on here. But the details seem to be
> as follows.
>
> Using C++11's raw string literal notation to specify
> string content, "\\s*" is:
>
> R"%(\s*)%"
>
> In other words, the content of the string is just:
>
> \s*
>
> (3 characters, plus a terminating '\0' present).
> It is this latter string content that the regcomp
> 2nd parameter points to and that leads to the
> error report.
>
> The "s" is not valid after the backslash for Basic
> Regular Expressions or for Extended Regular Expressions.
> ( https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html )
>
> REG_EESCAPE is described at:
>
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/regcomp.html
>
> as:
>
> QUOTE
> REG_EESCAPE
> Trailing <backslash> character in pattern.
> END QUOTE
>
> In other words: an extra backslash not paired
> with anything valid just after it, so it is
> trailing whatever was before it.
>
> If you meant the parameter received to point in
> memory to:
>
> \\s*
>
> ( 4 characters, plus a terminating '\0' after it,
> a.k.a. R"%(\\s*)%" ) you likely want the C-string:
>
> "\\\\s*"
>
> as the argument, shown below:
>
> regcomp(&regexp, "\\\\s*", REG_EXTENDED | REG_ICASE | REG_NOSUB)
>
> If you meant some other character sequence in memory, I'd
> have to know what it was to try to back-translate it to
> C-source that would produce the correct content in the
> memory pointed to.
>
> >     if ( ret != 0) {
> >         printf("regexp compilation failed: %d\n", ret);
> >     }
> >
> >     return 0;
> > }
> >
> > This one works in 12.2
>
> It might not be rejected, but what does it do? And is that
> conformant with:
>
> https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
>
> ?
>
> > but fails to compile the regexp in FreeBSD
> > 14.0-CURRENT #11 main-n245984-15221c552b3c with error 5 REG_EESCAPE
> > `\' applied to unescapable character.
> >
> > Any help is appreciated.
>
> Note: While I used C++11's notation as one way of
> indicating string content, no C standard has the
> notation to my knowledge.

Thanks for the explanation, Mark.

>
> ===
> Mark Millard
> marklmi at yahoo.com
> ( dsl-only.net went
> away in early 2018-Mar)
>