From owner-freebsd-hackers  Thu Jun 11 00:50:21 1998
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id AAA14346
          for freebsd-hackers-outgoing; Thu, 11 Jun 1998 00:50:21 -0700 (PDT)
          (envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from coconut.itojun.org (root@coconut.itojun.org [210.160.95.97])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id AAA14317
          for <hackers@freebsd.org>; Thu, 11 Jun 1998 00:50:15 -0700 (PDT)
          (envelope-from itojun@itojun.org)
Received: from localhost (itojun@localhost.itojun.org [127.0.0.1])
	by coconut.itojun.org (8.8.8+3.0Wbeta12/3.6W) with ESMTP id QAA11421;
	Thu, 11 Jun 1998 16:44:15 +0900 (JST)
To: Konstantin Chuguev <joy@urc.ac.ru>
cc: Gary Kline <kline@tao.thought.org>, Terry Lambert <tlambert@primenet.com>,
        hackers@FreeBSD.ORG
In-reply-to: joy's message of Thu, 11 Jun 1998 12:14:40 +0600.
      <357F75D0.CEBAC766@urc.ac.ru> 
X-Template-Reply-To: itojun@itojun.org
X-Template-Return-Receipt-To: itojun@itojun.org
X-PGP-Fingerprint: F8 24 B4 2C 8C 98 57 FD  90 5F B4 60 79 54 16 E2
Subject: Re: internationalization 
From: Jun-ichiro itojun Itoh <itojun@iijlab.net>
Date: Thu, 11 Jun 1998 16:44:15 +0900
Message-ID: <11417.897551055@coconut.itojun.org>
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

>>         Yes, iso-2022 families are quite important for supporting
>>         asian languages.  Unicode is, for us Japanese, quite incomplete and
>>         unexpandable.
>Do you mean Unicode does not cover all the CJK characters?

	Unicode maps different Chinese/Japanese/Korean letters onto the same
	codepoint.  The actual appearance (glyph) is determined by the
	choice of font (so there will be one font just for Chinese, one
	just for Japanese, and one just for Korean).

	Therefore, while Unicode may be sufficient for supporting a single
	Asian language (for example, Japanization), it is not sufficient for
	multilingualization (C/J/K support at the same time).  With Unicode,
	you will never be able to write plaintext with C/J/K letters mixed,
	because the text itself does not say which language each letter
	belongs to.  For example, I frequently write such plaintext: a list
	of dishes at a Chinese restaurant, with descriptions in Japanese
	attached.  Such plaintext cannot be generated with Unicode.

>What is "unexpandable"?

	Unicode advocates promoted Unicode for its "fixed bitwidth" nature.
	With a fixed 16-bit width, they basically cannot support more than
	2^16 letters.
	Recently Unicode introduced the "surrogate pair", which makes Unicode
	a variable bitwidth character set.  This breaks the key feature of
	Unicode, and it shows that Unicode is not expandable by nature.
	(Correct me if I'm wrong about "surrogate pair"...)
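
	For reference, a minimal sketch of the surrogate-pair arithmetic as
	defined for UTF-16: a codepoint above U+FFFF is reduced to 20 bits
	and split across two 16-bit units.  The function names below are
	hypothetical, chosen only for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper: split a codepoint in U+10000..U+10FFFF into a
 * UTF-16 surrogate pair.  Returns 0 on success, -1 if cp is outside
 * that range (such codepoints are not encoded as a pair). */
int to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    if (cp < 0x10000 || cp > 0x10FFFF)
        return -1;
    cp -= 0x10000;                            /* 20 bits remain */
    *hi = (uint16_t)(0xD800 + (cp >> 10));    /* top 10 bits */
    *lo = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low 10 bits */
    return 0;
}

/* Hypothetical helper: recombine a valid high/low surrogate pair. */
uint32_t from_surrogates(uint16_t hi, uint16_t lo)
{
    return 0x10000
         + ((uint32_t)(hi - 0xD800) << 10)
         + (uint32_t)(lo - 0xDC00);
}
```

	With this scheme one character occupies either one or two 16-bit
	units, so code that indexes a string by unit can no longer assume
	one unit per letter.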

	iso-2022 is well designed to accommodate new character sets that
	appear later.  Even its simplest subset can accommodate a bunch of
	character sets.
	Handling a bare iso-2022 string is somewhat hard to implement because
	it is variable length (yes, I agree).  If we can provide a good library
	for iso-2022, then there's no reason for us to migrate to Unicode.
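
	To illustrate the variable-length handling, here is a rough sketch
	that counts characters in an iso-2022-jp byte string.  It assumes
	only the four designations of basic iso-2022-jp (ESC ( B for ASCII,
	ESC ( J for JIS X 0201 Roman, ESC $ @ and ESC $ B for JIS X 0208).
	The function name is hypothetical and is not taken from the library
	discussed in this message.

```c
#include <stddef.h>

/* Hypothetical helper: count characters in an ISO-2022-JP byte string.
 * Escape sequences switch the current character set and are not
 * counted themselves.  Unrecognized escapes are treated as ordinary
 * bytes; a real decoder would report them as errors. */
size_t iso2022jp_count_chars(const unsigned char *s, size_t n)
{
    size_t i = 0, count = 0;
    size_t width = 1;   /* bytes per character; the initial set is ASCII */

    while (i < n) {
        if (s[i] == 0x1B && i + 2 < n) {
            /* ESC ( B / ESC ( J : switch to a single-byte set */
            if (s[i + 1] == '(' && (s[i + 2] == 'B' || s[i + 2] == 'J')) {
                width = 1;
                i += 3;
                continue;
            }
            /* ESC $ @ / ESC $ B : switch to a double-byte set */
            if (s[i + 1] == '$' && (s[i + 2] == '@' || s[i + 2] == 'B')) {
                width = 2;
                i += 3;
                continue;
            }
        }
        /* One character takes `width` bytes; a truncated tail counts once. */
        i += (n - i < width) ? (n - i) : width;
        count++;
    }
    return count;
}
```

	The state carried in `width` is what makes bare iso-2022 strings
	awkward: a byte's meaning depends on the escape sequences seen before
	it, so a library has to hide that state from applications.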

>>         Yes, for Japanese, Chinese and Korean iso-2022 based model (euc-xx
>>         falls into the category) is really important.  However, I 
>Why not to support both ISO 2022 and Unicode? Yes, it is more difficult
>to implement. But otherwise we can lose compatibility with other systems.

	Of course, my library supports both of them.  If you say
	setrunelocale("UTF2"), the internal and external representation
	will become Unicode.  If you say setrunelocale("ja_JP.iso-2022-jp"),
	it will become the Japanese iso-2022-jp encoding.

	I'll try to release my library with a sample application soon.
	I think I can give you the tarball at New Orleans :-)

itojun
