Date: Tue, 5 Jun 2007 03:10:39 +0900 (JST)
From: Kazuaki ODA <kazuaki@aliceblue.jp>
To: FreeBSD-gnats-submit@FreeBSD.org
Subject: gnu/113343: [PATCH] grep(1) outputs NOT-matched lines (with multi-bytes characters)
Message-ID: <200706041810.l54IAd1D048113@eyes.aliceblue.jp>
Resent-Message-ID: <200706041830.l54IU6tD093262@freefall.freebsd.org>

>Number:         113343
>Category:       gnu
>Synopsis:       [PATCH] grep(1) outputs NOT-matched lines (with multi-bytes characters)
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jun 04 18:30:05 GMT 2007
>Closed-Date:
>Last-Modified:
>Originator:     Kazuaki ODA
>Release:        FreeBSD 6.2-RELEASE-p5 i386
>Organization:
>Environment:
System: FreeBSD eyes.aliceblue.jp 6.2-RELEASE-p5 FreeBSD 6.2-RELEASE-p5 #3: Sat May 26 12:45:48 JST 2007 kazuaki@eyes.aliceblue.jp:/usr/obj/usr/src/sys/EYES i386
>Description:
Our grep(1) is slightly broken with multibyte characters.

If a byte sequence matches the searched pattern, grep(1) outputs the line
containing that sequence. This is fine for single-byte characters, but it
can be wrong for multibyte characters: if the matched sequence consists of
the second byte of one character and the first byte of the next character,
it is not a real match and grep(1) should not output the line.

Because our grep(1) supports multibyte characters (and locales), it usually
rejects such false matches, but it sometimes still outputs the line.
>How-To-Repeat:
>Fix:
Apply the attached patch. I think mbstate_t should be reinitialized
whenever mbrlen() returns -2.

--- search.c.diff begins here ---
--- gnu/usr.bin/grep/search.c.orig	Wed Mar 22 05:51:35 2006
+++ gnu/usr.bin/grep/search.c	Tue Jun  5 01:09:24 2007
@@ -400,9 +400,12 @@
                         }
 
                       if (mlen == (size_t) -2)
-                        /* Offset points inside multibyte character:
-                         * no good. */
-                        break;
+                        {
+                          /* Offset points inside multibyte character:
+                           * no good. */
+                          memset (&mbs, '\0', sizeof (mbstate_t));
+                          break;
+                        }
 
                       beg += mlen;
                       bytes_left -= mlen;
@@ -462,9 +465,12 @@
                         }
 
                       if (mlen == (size_t) -2)
-                        /* Offset points inside multibyte character:
-                         * no good. */
-                        break;
+                        {
+                          /* Offset points inside multibyte character:
+                           * no good. */
+                          memset (&mbs, '\0', sizeof (mbstate_t));
+                          break;
+                        }
 
                       beg += mlen;
                       bytes_left -= mlen;
@@ -925,15 +931,21 @@
                         }
 
                       if (mlen == (size_t) -2)
-                        /* Offset points inside multibyte character: no good.  */
-                        break;
+                        {
+                          /* Offset points inside multibyte character: no good.  */
+                          memset (&mbs, '\0', sizeof (mbstate_t));
+                          break;
+                        }
 
                       beg += mlen;
                       bytes_left -= mlen;
                     }
 
                   if (bytes_left)
-                    continue;
+                    {
+                      beg += bytes_left;
+                      continue;
+                    }
                 }
               else
 #endif /* MBS_SUPPORT */
@@ -1051,6 +1063,7 @@
                     {
                       /* Offset points inside multibyte character:
                        * no good. */
+                      memset (&mbs, '\0', sizeof (mbstate_t));
                       break;
                     }
 
--- search.c.diff ends here ---

>Release-Note:
>Audit-Trail:
>Unformatted:
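
The idea behind the fix can be illustrated outside of grep. The following is only a minimal sketch, not the code from search.c: the function names `use_utf8_locale` and `on_char_boundary` are hypothetical, and the locale names tried are assumptions about what is installed. It walks a buffer with mbrlen() using a caller-supplied mbstate_t that is reused between calls, and reinitializes that state whenever mbrlen() returns (size_t) -2, since the state then describes a partially consumed character.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: try a few common UTF-8 locale names (assumed,
   not guaranteed to be installed).  Returns 1 on success, 0 otherwise. */
int use_utf8_locale(void)
{
    static const char *const names[] = { "C.UTF-8", "en_US.UTF-8", "UTF-8" };
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return 1;
    return 0;
}

/* Hypothetical sketch: return 1 if `offset` is a character boundary in
   `buf` under the current locale, 0 if it points inside a multibyte
   character.  `mbs` is shared between calls, so it must be left in the
   initial state on every return path. */
int on_char_boundary(const char *buf, size_t offset, mbstate_t *mbs)
{
    size_t pos = 0;

    while (pos < offset) {
        /* Only look at the bytes before `offset`. */
        size_t mlen = mbrlen(buf + pos, offset - pos, mbs);

        if (mlen == (size_t) -2) {
            /* The bytes before `offset` end in an incomplete character,
               so `offset` is inside it.  The state now holds that
               partial character: reset it before it is reused. */
            memset(mbs, '\0', sizeof (mbstate_t));
            return 0;
        }
        if (mlen == (size_t) -1 || mlen == 0) {
            /* Invalid sequence or NUL byte: treat as a single byte and
               reset the state, which is unspecified after an error. */
            memset(mbs, '\0', sizeof (mbstate_t));
            mlen = 1;
        }
        pos += mlen;
    }
    return 1;
}
```

With a UTF-8 locale in effect and a buffer holding two three-byte characters, a call with offset 1 is rejected and a following call with offset 3, using the same state object, is accepted. If the memset() in the -2 branch is left out, the second call starts from the stale partial state and can misjudge a valid boundary, which is the kind of failure described in this report.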