Date:      Tue, 23 Jan 2018 03:53:19 +0300
From:      Yuri Pankov <yuripv@icloud.com>
To:        freebsd-hackers <freebsd-hackers@freebsd.org>, Kyle Evans <kevans@FreeBSD.org>
Subject:   libc/regex: r302824 added invalid check breaking collating ranges
Message-ID:  <a0d9abd8-19b8-cdf6-5451-e184fa182b38@icloud.com>

(CCing Kyle as he's working on regex at the moment and not because he 
broke something)

Hi,

r302824 added an invalid check which breaks collating ranges:

-if (table->__collate_load_error) {
-    (void)REQUIRE((uch)start <= (uch)finish, REG_ERANGE);
+if (table->__collate_load_error || MB_CUR_MAX > 1) {
+    (void)REQUIRE(start <= finish, REG_ERANGE);

The "MB_CUR_MAX > 1" condition is wrong: we should be comparing the range
endpoints according to the current locale's collation, not simply
comparing the wchar_t values.
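
To make the difference concrete, here is a minimal sketch of the two
kinds of range check. This is not the regcomp.c code; the function
names are made up for illustration. The first function orders the
endpoints by wchar_t value, the second asks wcscoll() and so follows
whatever LC_COLLATE is in effect.

```c
#include <assert.h>
#include <locale.h>
#include <wchar.h>

/* Hypothetical: accept the range if start <= finish by wchar_t value. */
int
range_ok_codepoint(wchar_t start, wchar_t finish)
{
    return (start <= finish);
}

/* Hypothetical: accept the range if start collates at or before finish. */
int
range_ok_collate(wchar_t start, wchar_t finish)
{
    wchar_t s[2] = { start, L'\0' };
    wchar_t f[2] = { finish, L'\0' };

    return (wcscoll(s, f) <= 0);
}

/* Checks that should hold in the "C" locale, where wcscoll() acts like wcscmp(). */
void
range_check_demo(void)
{
    const wchar_t o_umlaut = 0x00F6;    /* U+00F6, o with diaeresis */

    setlocale(LC_COLLATE, "C");
    assert(range_ok_codepoint(L'z', o_umlaut));     /* 0x7A <= 0xF6 */
    assert(!range_ok_codepoint(o_umlaut, L'z'));
    assert(range_ok_collate(L'a', L'z'));
    assert(!range_ok_collate(L'z', L'a'));
}
```

With a locale whose collation places U+00F6 before 'z', the two
functions would be expected to disagree on that pair, which is the
case the wchar_t comparison gets wrong.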

Example -- see Table 1 in http://www.unicode.org/reports/tr10/:

Let's try Swedish collation:
$ echo 'test' | LC_COLLATE=se_SE.UTF-8 grep '[ö-z]'
grep: invalid character range
$ echo 'test' | LC_COLLATE=se_SE.UTF-8 grep '[z-ö]'

OK, the above seems to be correct, 'ö' > 'z' in Swedish collation, but 
we just got lucky here, as wchar_t comparison gives us the same result.

Now the German one:
$ echo 'test' | LC_COLLATE=de_DE.UTF-8 grep '[ö-z]'
grep: invalid character range
$ echo 'test' | LC_COLLATE=de_DE.UTF-8 grep '[z-ö]'

Same result, but according to the table, 'ö' < 'z' in German collation,
so '[ö-z]' should be accepted and '[z-ö]' rejected!

I think the fix here would be to drop the "if
(table->__collate_load_error || MB_CUR_MAX > 1)" block entirely. We no
longer use "table", so there's no point in getting it and checking it
for a load error; wcscoll(), which is eventually called from
p_range_cmp(), does the table handling itself. And we can't use the
direct comparison for anything other than the 'C' locale (I'm not sure
it's applicable even there).
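
As a rough sketch of what an unconditional, collation-based check could
look like (an illustration only, not a patch against regcomp.c;
range_cmp_sketch() and check_range_sketch() are hypothetical names, and
the helper merely assumes a wcscoll()-based comparison on one-character
strings):

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>
#include <wchar.h>

/* Hypothetical helper: compare two characters as one-character strings. */
int
range_cmp_sketch(wchar_t c1, wchar_t c2)
{
    wchar_t s1[2] = { c1, L'\0' };
    wchar_t s2[2] = { c2, L'\0' };

    return (wcscoll(s1, s2));
}

/* Hypothetical check: 0 if start..finish is a valid range, else REG_ERANGE. */
int
check_range_sketch(wchar_t start, wchar_t finish)
{
    /* No locale-table lookup and no MB_CUR_MAX special case. */
    return (range_cmp_sketch(start, finish) <= 0 ? 0 : REG_ERANGE);
}

/* Checks that should hold in the "C" locale. */
void
check_range_demo(void)
{
    setlocale(LC_COLLATE, "C");
    assert(check_range_sketch(L'a', L'z') == 0);
    assert(check_range_sketch(L'z', L'a') == REG_ERANGE);
}
```

The point of the sketch is that the decision depends only on the
collation order, so the same code path serves every locale.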


