Date:      Sun, 24 Jan 2021 03:05:07 GMT
From:      Kyle Evans <kevans@FreeBSD.org>
To:        src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-branches@FreeBSD.org
Subject:   git: 912086c27f9a - stable/12 - libc: regex: rework unsafe pointer arithmetic
Message-ID:  <202101240305.10O357eR014704@gitrepo.freebsd.org>

The branch stable/12 has been updated by kevans:

URL: https://cgit.FreeBSD.org/src/commit/?id=912086c27f9ab75253af8ae7914ae6001035a1b2

commit 912086c27f9ab75253af8ae7914ae6001035a1b2
Author:     Miod Vallat <miod@online.fr>
AuthorDate: 2021-01-08 18:59:00 +0000
Commit:     Kyle Evans <kevans@FreeBSD.org>
CommitDate: 2021-01-24 03:04:58 +0000

    libc: regex: rework unsafe pointer arithmetic
    
    regcomp.c uses the "start + count < end" idiom to check that there are
    "count" bytes available in an array of char "start" and "end" both point to.
    
    This is fine, unless "start + count" goes beyond the one-past-the-end
    position of the array. In this case, the C standard leaves the pointer
    addition itself undefined, before the comparison against "end" is even
    evaluated, and optimizers from hell will happily remove as much code as
    possible because of this.
    
    An example of this occurs in regcomp.c's bothcases(), which defines
    bracket[3], sets "next" to "bracket" and "end" to "bracket + 2". Then it
    invokes p_bracket(), which starts with "if (p->next + 5 < p->end)"...
    
    Because bothcases() and p_bracket() are static functions in regcomp.c, there
    is a real risk of miscompilation if aggressive inlining happens.
    
    The following diff rewrites the "start + count < end" constructs into "end -
    start > count". Assuming "end" and "start" are always pointing in the array
    (such as "bracket[3]" above), "end - start" is well-defined and can be
    compared without trouble.
    
    As a bonus, MORE2() implies MORE() therefore SEETWO() can be simplified a
    bit.
    
    PR:             252403
    (cherry picked from commit d36b5dbe28d8ebab219fa29db533734d47f0c4a3)
---
 lib/libc/regex/regcomp.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/lib/libc/regex/regcomp.c b/lib/libc/regex/regcomp.c
index 00ab6a77141b..fc66ea32046a 100644
--- a/lib/libc/regex/regcomp.c
+++ b/lib/libc/regex/regcomp.c
@@ -177,10 +177,10 @@ static char nuls[10];		/* place to point scanner in event of error */
  */
 #define	PEEK()	(*p->next)
 #define	PEEK2()	(*(p->next+1))
-#define	MORE()	(p->next < p->end)
-#define	MORE2()	(p->next+1 < p->end)
+#define	MORE()	(p->end - p->next > 0)
+#define	MORE2()	(p->end - p->next > 1)
 #define	SEE(c)	(MORE() && PEEK() == (c))
-#define	SEETWO(a, b)	(MORE() && MORE2() && PEEK() == (a) && PEEK2() == (b))
+#define	SEETWO(a, b)	(MORE2() && PEEK() == (a) && PEEK2() == (b))
 #define	SEESPEC(a)	(p->bre ? SEETWO('\\', a) : SEE(a))
 #define	EAT(c)	((SEE(c)) ? (NEXT(), 1) : 0)
 #define	EATTWO(a, b)	((SEETWO(a, b)) ? (NEXT2(), 1) : 0)
@@ -997,15 +997,17 @@ p_bracket(struct parse *p)
 	wint_t ch;
 
 	/* Dept of Truly Sickening Special-Case Kludges */
-	if (p->next + 5 < p->end && strncmp(p->next, "[:<:]]", 6) == 0) {
-		EMIT(OBOW, 0);
-		NEXTn(6);
-		return;
-	}
-	if (p->next + 5 < p->end && strncmp(p->next, "[:>:]]", 6) == 0) {
-		EMIT(OEOW, 0);
-		NEXTn(6);
-		return;
+	if (p->end - p->next > 5) {
+		if (strncmp(p->next, "[:<:]]", 6) == 0) {
+			EMIT(OBOW, 0);
+			NEXTn(6);
+			return;
+		}
+		if (strncmp(p->next, "[:>:]]", 6) == 0) {
+			EMIT(OEOW, 0);
+			NEXTn(6);
+			return;
+		}
 	}
 
 	if ((cs = allocset(p)) == NULL)
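
For readers who want to try the idiom outside regcomp.c, the following is a
minimal sketch of the pointer-difference check described in the commit
message. It is not the libc code: "struct scan", more_than() and
starts_with6() are hypothetical names made up for this illustration, and the
MORE()/MORE2() macros here take an explicit argument, unlike the originals.
The sketch assumes "next" and "end" always point into, or one past the end
of, the same array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the parser state; only the fields used here. */
struct scan {
	const char *next;	/* next unread byte */
	const char *end;	/* one past the last byte of input */
};

/*
 * Pointer-difference forms of the checks. "end - next" is well-defined
 * as long as both pointers refer to the same array (or one past it).
 */
#define	MORE(p)		((p)->end - (p)->next > 0)
#define	MORE2(p)	((p)->end - (p)->next > 1)

/* Hypothetical helper: nonzero if more than n bytes remain unread. */
static int
more_than(const struct scan *p, ptrdiff_t n)
{
	assert(n >= 0);
	return (p->end - p->next > n);
}

/*
 * Hypothetical helper: does the unread input start with the 6-byte token?
 * The length check runs first, so strncmp() is only reached when at
 * least 6 bytes are available.
 */
static int
starts_with6(const struct scan *p, const char *tok)
{
	return (more_than(p, 5) && strncmp(p->next, tok, 6) == 0);
}
```

With a 3-byte buffer such as the "bracket[3]" case above, more_than(p, 5)
is simply false; no pointer beyond the array is ever formed. Because
MORE2() being true means at least two bytes remain, it also covers what
MORE() tests, which is why a separate MORE() check is redundant there.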


