From owner-freebsd-i18n@FreeBSD.ORG Mon Sep 17 17:01:09 2007
Date: Tue, 18 Sep 2007 02:01:00 +0900
From: "YAMAMOTO, Taku"
To: Andrey Chernov
Cc: current@freebsd.org, i18n@freebsd.org, Petr Hroudný, perky@freebsd.org
Subject: Re: Ctype patch for review
Message-Id: <20070918020100.d43beb0b.taku@tackymt.homeip.net>
In-Reply-To: <20070917092130.GA24424@nagual.pp.ru>
References: <20070916192924.GA12678@nagual.pp.ru> <20070917092130.GA24424@nagual.pp.ru>
Organization: Trans New Technology, Inc.

On Mon, 17 Sep 2007 13:21:30 +0400
Andrey Chernov wrote:

> On Mon, Sep 17, 2007 at 10:29:21AM +0200, Petr Hroudný wrote:
> > 2007/9/16, Andrey Chernov:
> > > The problem is: currently our single byte ctype functions are broken for
> > > wide characters locales in the argument range >= 0x80 - they may return
> > > false positives.
> > >
> > > For example, for UTF-8 locale we currently have:
> > > iswspace(0xA0)==1 and isspace(0xA0)==1
> > > (because iswspace() and isspace() are the same code)
> > > but must have
> > > isspace(0xA0)==0
> >
> > This is exactly what happens on other OSes and I agree this is the
> > right behaviour for UTF-8. However, we must ensure, that:
> >
> > for C locale: isspace(0xA0)==0
> > for ISO8859-* locales: isspace(0xA0)==1
> > for UTF-8 locales: isspace(0xA0)==0
>
> The patch test for wide char locale presence first (__mb_cur_max > 1), so
> does not affect single byte locales like ISO8859-*
>

Checking for __mb_cur_max is not enough for certain locales.
For example, SJIS has the following range for JIS X0201 (a.k.a. HALFWIDTH KANA).

/*
 * JIS X201
 */
PUNCT 0xa1-0xa5
SPACE 0xa0
BLANK 0xa0
SPECIAL 0xa1-0xdf
PHONOGRAM 0xa6-0xdf
SWIDTH1 0xa0-0xdf

-- 
-|-__ YAMAMOTO, Taku
 | __

 - A chicken is an egg's way of producing more eggs. -
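
To make the SJIS objection concrete, here is a minimal sketch in C. It is
not the patch under review and not FreeBSD's libc code. The struct
toy_locale, its fields, and both isspace_* functions are hypothetical names
invented for illustration, and the toy per-byte table models only the SPACE
entry quoted above. It assumes a locale carries its own single-byte class
table next to a value standing in for __mb_cur_max.

```c
#include <string.h>

enum { CT_SPACE = 1 };              /* hypothetical class bit */

/* Hypothetical stand-in for a locale: not a real libc structure. */
struct toy_locale {
    int mb_cur_max;                 /* plays the role of __mb_cur_max */
    unsigned char cls[256];         /* class bits for single-byte codes */
};

/* Fill in the ASCII whitespace set, and optionally mark 0xA0 as SPACE. */
void toy_init(struct toy_locale *l, int mb_cur_max, int a0_is_space)
{
    static const char ascii_space[] = " \t\n\v\f\r";
    size_t i;

    memset(l, 0, sizeof(*l));
    l->mb_cur_max = mb_cur_max;
    for (i = 0; ascii_space[i] != '\0'; i++)
        l->cls[(unsigned char)ascii_space[i]] |= CT_SPACE;
    if (a0_is_space)
        l->cls[0xa0] |= CT_SPACE;
}

/* Shortcut in the style discussed in the thread: any multibyte locale
 * rejects every argument >= 0x80. */
int isspace_mbcurmax_only(const struct toy_locale *l, int c)
{
    if (c < 0 || c > 0xff)
        return 0;
    if (l->mb_cur_max > 1 && c >= 0x80)
        return 0;
    return (l->cls[c] & CT_SPACE) != 0;
}

/* Alternative: trust only the locale's single-byte table, whatever the
 * value of mb_cur_max is. */
int isspace_table(const struct toy_locale *l, int c)
{
    if (c < 0 || c > 0xff)
        return 0;
    return (l->cls[c] & CT_SPACE) != 0;
}
```

With a toy SJIS set up as toy_init(&sjis, 2, 1), the shortcut returns 0 for
0xA0 even though the locale definition lists it as SPACE, while the table
lookup returns 1. A toy UTF-8 (toy_init(&utf8, 4, 0), where 4 is just an
arbitrary value greater than 1) gives 0 under both variants, and a toy
ISO8859-* (toy_init(&latin1, 1, 1)) gives 1 under both. Only the SJIS-like
case tells the two approaches apart.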