Date:      Sat, 15 Sep 2007 09:08:01 GMT
From:      Petr Hroudny <petr.hroudny@gmail.com>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   gnu/116363: isspace broken for UTF-8 locales
Message-ID:  <200709150908.l8F981jj075109@www.freebsd.org>
Resent-Message-ID: <200709150910.l8F9A2b4063466@freefall.freebsd.org>

>Number:         116363
>Category:       gnu
>Synopsis:       isspace broken for UTF-8 locales
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sat Sep 15 09:10:02 GMT 2007
>Closed-Date:
>Last-Modified:
>Originator:     Petr Hroudny
>Release:        6-stable, 7-current
>Organization:
>Environment:
>Description:
In UTF-8 locales, isspace(0xA0) returns 1, which is wrong.

In UTF-8, 0xA0 can only occur as a continuation byte (the second or a later byte) of a multibyte character; on its own it never encodes a space.
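
The byte-level argument can be checked directly: in UTF-8, every byte with the bit pattern 10xxxxxx (0x80 through 0xBF) is a continuation byte. The following is only a minimal sketch; the helper name is_utf8_continuation is hypothetical and not part of libc.

```c
#include <stdbool.h>

/* Hypothetical helper (not a libc function): true when the byte has the
 * bit pattern 10xxxxxx, i.e. it can only appear as a non-leading byte
 * of a multibyte UTF-8 sequence. */
static bool is_utf8_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}
```

With this check, 0xA0 is classified as a continuation byte, while 0x20 (ASCII space) and 0xC5 (a lead byte) are not.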

As a consequence, operations like str.upper() and str.split() break when they
encounter a UTF-8 character that contains a 0xA0 byte.

An example of such a character is Scaron (U+0160, UTF-8 encoding 0xC5 0xA0).
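
To illustrate how a byte-wise split goes wrong on Scaron, here is a small sketch. It does not call the system isspace(); instead, the two predicates space_with_a0 and space_ascii_only are hypothetical stand-ins for the reported behaviour and the expected behaviour, and count_fields is an illustrative helper, not the code of any real split() implementation.

```c
#include <stddef.h>

/* Stand-in for the reported behaviour: 0xA0 is classified as whitespace. */
static int space_with_a0(int c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == 0xA0;
}

/* Stand-in for the expected behaviour: bytes >= 0x80 are never whitespace. */
static int space_ascii_only(int c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/* Count whitespace-separated fields in a byte string, using the given
 * predicate to decide which bytes are separators. */
static size_t count_fields(const unsigned char *s, size_t len,
                           int (*is_sep)(int))
{
    size_t fields = 0;
    int in_field = 0;

    for (size_t i = 0; i < len; i++) {
        if (is_sep(s[i])) {
            in_field = 0;
        } else if (!in_field) {
            in_field = 1;
            fields++;
        }
    }
    return fields;
}
```

For the four bytes 'a', 0xC5, 0xA0, 'b' (a single word containing Scaron), the first predicate yields two fields, cutting the character in half, while the second yields one.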
>How-To-Repeat:
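No reproduction steps were supplied with the report. One possible way to check the behaviour, offered only as a sketch, is to query isspace() under a chosen LC_CTYPE locale. The helper name isspace_in_locale is hypothetical, and any locale name passed to it (for example "en_US.UTF-8") is an assumption about what is installed on the test system.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper: returns 1 if isspace(c) is true under the named
 * LC_CTYPE locale, 0 if it is false, and -1 if the locale cannot be set.
 * Note that it leaves the process locale changed. c must be EOF or a
 * value representable as unsigned char. */
static int isspace_in_locale(const char *name, int c)
{
    if (setlocale(LC_CTYPE, name) == NULL)
        return -1;
    return isspace(c) != 0;
}
```

According to the description above, calling isspace_in_locale with a UTF-8 locale name and 0xA0 would return 1 on an affected system; in the "C" locale it returns 0.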

>Fix:
For UTF-8 locales, 0xA0 should never be considered to be a space.
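
As an application-side illustration of that rule, and not the actual libc change, a caller could guard isspace() so that non-ASCII bytes are never treated as spaces when the current locale uses a multibyte encoding. The wrapper name isspace_byte_safe is hypothetical, and the MB_CUR_MAX test is a simplification: it matches any multibyte locale, not only UTF-8.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical wrapper, not the committed fix: in a multibyte locale
 * (MB_CUR_MAX > 1), a byte >= 0x80 is part of a multibyte sequence and
 * is reported as non-space; otherwise defer to isspace().
 * c must be EOF or a value representable as unsigned char. */
static int isspace_byte_safe(int c)
{
    if (MB_CUR_MAX > 1 && c >= 0x80)
        return 0;
    return isspace(c);
}
```

Code that needs real Unicode whitespace classification would have to decode the bytes first and use the wide-character functions such as iswspace(); this wrapper only avoids misclassifying individual bytes.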

>Release-Note:
>Audit-Trail:
>Unformatted:


