From: Fernando Apesteguía
Date: Tue, 27 Apr 2021 16:05:42 +0200
Subject: Re: Regular expression compilation fail in current
To: Mark Millard
Cc: FreeBSD Hackers

On Tue, Apr 27, 2021 at 5:14 AM Mark Millard wrote:
>
> On 2021-Apr-26, at 06:31, Fernando Apesteguía wrote:
>
> > Hi there,
> >
> > I'm working with this port PR
> > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=255182
> >
> > and the problem seems to boil down to a regular expression that does
> > not compile on current but it does in 12.2.
> >
> > The minimum repro is this one:
> >
> > #include <regex.h>
> > #include <stdio.h>
> >
> > int
> > main()
> > {
> >         regex_t regexp;
> >         int ret = regcomp(&regexp, "\\s*", REG_EXTENDED | REG_ICASE |
> > REG_NOSUB);
>
> Here is my stab at notes for this . . .
>
> It is not all that uncommon for error cases to be
> initially mistreated, with later toolchains rejecting
> them instead. I suspect that is what is going on
> here. But the details seem to be as follows.
>
> Using C++11's raw string literal notation to specify
> string content, "\\s*" is:
>
> R"%(\s*)%"
>
> In other words, the content of the string is just:
>
> \s*
>
> (3 characters, plus a terminating '\0').
> It is this latter string content that the regcomp
> 2nd parameter points to and that leads to the
> error report.
>
> The "s" is not valid after the backslash for Basic
> Regular Expressions or for Extended Regular Expressions.
> ( https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html )
>
> REG_EESCAPE is described at:
>
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/regcomp.html
>
> as:
>
> QUOTE
> REG_EESCAPE
> Trailing <backslash> character in pattern.
> END QUOTE
>
> In other words: an extra backslash not paired
> with anything valid just after it --so it is
> trailing whatever was before it.
>
> If you meant the parameter received to point in
> memory to:
>
> \\s*
>
> ( 4 characters, plus a terminating '\0' after it,
> a.k.a. R"%(\\s*)%" ) you likely want the C-string:
>
> "\\\\s*"
>
> as the argument, shown below:
>
> regcomp(&regexp, "\\\\s*", REG_EXTENDED | REG_ICASE | REG_NOSUB)
>
> If you meant some other character sequence in memory, I'd
> have to know what it was to try to back-translate it to
> C-source that would produce the correct content in the
> memory pointed to.
>
> >         if (ret != 0) {
> >                 printf("regexp compilation failed: %d\n", ret);
> >         }
> >
> >         return 0;
> > }
> >
> > This one works in 12.2
>
> It might not be rejected, but what does it do? And is that
> conformant with:
>
> https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
>
> ?
>
> > but fails to compile the regexp in FreeBSD
> > 14.0-CURRENT #11 main-n245984-15221c552b3c with error 5 REG_EESCAPE
> > `\' applied to unescapable character.
> >
> > Any help is appreciated.
>
> Note: While I used C++11's notation as one way of
> indicating string content, no C standard has the
> notation to my knowledge.

Thanks for the explanation, Mark.

>
> ===
> Mark Millard
> marklmi at yahoo.com
> ( dsl-only.net went
> away in early 2018-Mar)
>