Date:      Thu, 11 Jun 1998 09:46:43 +0900
From:      Jun-ichiro itojun Itoh <itojun@itojun.org>
To:        Gary Kline <kline@tao.thought.org>
Cc:        tlambert@primenet.com (Terry Lambert), hackers@FreeBSD.ORG
Subject:   Re: internationalization 
Message-ID:  <6351.897526003@coconut.itojun.org>
In-Reply-To: kline's message of Wed, 10 Jun 1998 17:15:33 MST. <199806110015.RAA09151@tao.thought.org> 


	Hello,

>> Another part of the problem is that XPG/4 is encoded multibyte, which
>> is bad from a number of major perspectives, starting with ISO2022.
>		We've got v 2.0 of the xpg4 library in 2.2.6.
>		Do you know if any other flavor of BSD has more
>		complete support?

	I've been working on iso-2022 encoding support for the runelocale (xpg4)
	library.  At the moment I'm working on some specific packages
	(for example, nvi, or the scheduler software called "sch"), but I will
	be able to merge the modifications into the xpg4 library.

>> I would prefer going to a full-on Unicode implementation to support
>> all known human languages.
>		This was my first leaning, but I'm increasingly
>		going toward the ISO families.

	Yes, the iso-2022 family is quite important for supporting
	Asian languages.  For us Japanese, Unicode is quite incomplete and
	not expandable.

>> I would suggest an initial 16 bit wchar_t with an assumption of a
>> zero valued code page designator.  If ISO ever gets around to adding
>> other code pages, we can deal with that at that time using page
>> selection.  Meanwhile, we'll be able to interoperate with Microsoft
>> and Java, which use 16 bit wchar_t encodings.

	I would like wchar_t to be 32 bits, OR MORE.  We will see more and
	more 96x96 character pages in use at the same time, so 16 bits is
	really not enough.  The modified xpg4 library assumes that wchar_t is
	at least 32 bits; otherwise I cannot encode the iso-2022 variant
	character sets into it.

>> The last time I converted csh, this was absolute hell because the
>> code was badly organized for internationalization.
>> The next hardest step is the editors, starting with "vi".  They have
>> to be able to support Unicode.
>		nvi/nex already have been tweaked for 8-bit international
>		support.  I learned this accidentally.  Was quite
>		surprised to see messages in French and German.  :-)
>		Nonetheless, I see why you like the Unicode solution.
>		Someone said, ``Well, French support is great, but how
>		are you going to handle Japanese?''

	Do you mean internationalization of the messages displayed by nvi,
	or of file content?  If the latter, please install nvi-m17n
	from /usr/ports/{japanese,korean,chinese}/nvi-* and see how it works.
	(I'm responsible for nvi-m17n...)

>> I have had FS-based Unicode support working for a very long time,
>> though it has failed to be committed.  One big issue is that directory
>> entry blocks must grow from 512b to 1k.  This has a number of
>> implications for the soft updates work currently in progress.  This is
>> because, in order to support a maximally sized path component, 512 + 24
>> bytes are needed for Unicode, as opposed to 256 + 24 (which fits in 512b)
>> for an 8 bit character set.
>		:-( !
>		How does the ISO2022 model work here?  Isn't it the
>		same for Japanese and Chinese?  

	Yes, for Japanese, Chinese and Korean, the iso-2022 based model (euc-xx
	falls into this category) is really important.  However, I personally
	believe that filenames must be kept in the C locale for simplicity...

itojun
