From: Jun-ichiro itojun Itoh
To: Konstantin Chuguev
Cc: Gary Kline, Terry Lambert, hackers@FreeBSD.ORG
Subject: Re: internationalization
Date: Thu, 11 Jun 1998 16:44:15 +0900
Message-ID: <11417.897551055@coconut.itojun.org>
In-reply-to: joy's message of Thu, 11 Jun 1998 12:14:40 +0600. <357F75D0.CEBAC766@urc.ac.ru>

>> Yes, iso-2022 families are quite important for supporting
>> asian languages. Unicode is, for us Japanese, quite incomplete and
>> unexpandable.
>Do you mean Unicode does not cover all the CJK characters?

Unicode maps different Chinese/Japanese/Korean letters onto the same
code point. The actual appearance (glyph) is determined by the choice
of font, so there will be one font just for Chinese, one just for
Japanese, and one just for Korean. Therefore, while Unicode may be
sufficient for supporting a single Asian language (for example,
Japanization), it is not sufficient for multilingualization (C/J/K
support at the same time). The text itself does not say which language
a letter belongs to, so with Unicode you will never be able to write
plaintext with C/J/K letters mixed.
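
A minimal sketch of the unification point above, assuming Python 3 and
its bundled euc_jp and gb2312 codecs as stand-ins (neither is part of
the library discussed in this thread, and the helper name
to_code_points is made up for illustration). The same letter has
different bytes in the Japanese and Chinese national character sets,
but both decode to one Unicode code point, so the code point alone no
longer records which language the letter came from:

```python
# Illustration only: Python's bundled codecs stand in for the national
# character sets; this is not the rune library discussed in the thread.
ZHI = "\u76f4"  # a letter drawn differently in Japanese and Chinese fonts


def to_code_points(raw, codec):
    """Decode legacy bytes and list the resulting Unicode code points."""
    return ["U+%04X" % ord(c) for c in raw.decode(codec)]


# The legacy encodings use different bytes for the letter...
ja_bytes = ZHI.encode("euc_jp")   # Japanese (JIS X 0208 via EUC-JP)
zh_bytes = ZHI.encode("gb2312")   # Chinese (GB 2312 via EUC-CN)

# ...but decoding either one yields the same single code point.
ja_points = to_code_points(ja_bytes, "euc_jp")
zh_points = to_code_points(zh_bytes, "gb2312")

print(ja_bytes.hex(), zh_bytes.hex())
print(ja_points, zh_points)
```

Which glyph is finally shown is left to the font, which is exactly the
dependency described above.
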
For example, I frequently write plaintext that mixes them: a list of
dishes at a Chinese restaurant, with a description in Japanese
attached. Such plaintext cannot be generated with Unicode.

>What is "unexpandable"?

Unicode people promoted Unicode because of its "fixed bitwidth" nature.
Therefore, they basically cannot support more than 2^16 letters.
Recently Unicode introduced the "surrogate pair", which makes Unicode a
variable-bitwidth character set. This breaks the key feature of
Unicode, and it shows that Unicode is not expandable by nature.
(Correct me if I'm wrong about "surrogate pair"...)

iso-2022 is well designed to accommodate new character sets that appear
later. Even its simplest subset can accommodate a bunch of character
sets. Handling a bare iso-2022 string is somewhat hard to implement
because it is variable length (yes, I agree). If we can provide a good
library for iso-2022, then there's no reason for us to migrate to
Unicode.

>> Yes, for Japanese, Chinese and Korean iso-2022 based model (euc-xx
>> falls into the category) is really important. However, I
>Why not to support both ISO 2022 and Unicode? Yes, it is more difficult
>to implement. But otherwise we can lose compatibility with other systems.

Of course my library supports both of them. If you say
setrunelocale("UTF2"), the internal and external representation will
become Unicode. If you say setrunelocale("ja_JP.iso-2022-jp"), it will
become the Japanese iso-2022-jp encoding.

I'll try to release my library with a sample application soon.
I think I can give you the tarball at New Orleans :-)

itojun
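
The two variable-width points made in this message (surrogate pairs in
16-bit Unicode, and character-set switching in iso-2022) can be seen in
a short sketch. It assumes Python 3 and its bundled utf-16-be and
iso2022_jp codecs; it does not use setrunelocale or the library
mentioned above, and the helper name utf16_units is made up for
illustration:

```python
import struct


def utf16_units(ch):
    """Return the 16-bit code units that UTF-16 uses for one character."""
    raw = ch.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(raw) // 2), raw))


# A letter inside the original 2^16 range takes one 16-bit unit.
bmp_units = utf16_units("\u4e2d")

# A code point beyond U+FFFF takes two units: a surrogate pair.
extra_units = utf16_units("\U00020000")

# iso-2022-jp is variable length too: it switches character sets
# in-line with escape sequences (ESC is the byte 0x1b).
mixed = "A\u4e2dB".encode("iso2022_jp")
escapes = mixed.count(b"\x1b")

print([hex(u) for u in bmp_units], [hex(u) for u in extra_units])
print(mixed, escapes)
```

In the second case the decoder has to track which character set is
currently selected, which is the implementation difficulty conceded
above.
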