FreeBSD Mail Archives

Date:      Wed, 10 Jun 1998 19:31:42 -0700 (PDT)
From:      Gary Kline <kline@tao.thought.org>
To:        itojun@itojun.org (Jun-ichiro itojun Itoh)
Cc:        tlambert@primenet.com, hackers@FreeBSD.ORG
Subject:   Re: internationalization
Message-ID:  <199806110231.TAA09494@tao.thought.org>
In-Reply-To: <6351.897526003@coconut.itojun.org> from Jun-ichiro itojun Itoh at "Jun 11, 98 09:46:43 am"

According to Jun-ichiro itojun Itoh:
> 
> 	Hello,
> 
> >> Another part of the problem is that XPG/4 is encoded multibyte, which
> >> is bad from a number of major perspectives, starting with ISO2022.
> >		We've got v 2.0 of the xpg4 library in 2.2.6.
> >		Do you know if any other flavor of BSD has more
> >		complete support?
> 
> 	I've been working on iso-2022 encoding support for runelocale (xpg4)
> 	library.  At this moment I'm working on some specific packages
> 	(for example, nvi or scheduler software called "sch") but will be
> 	able to merge the modification into xpg4 library part.


		Wonderful!  With the broadly international reach of
		FreeBSD I was hoping that someone in China|Japan|Taiwan
		would be into this.  There may be a broader need for
		wide character support--say Sanskrit and Thai.  ...


> 
> >> I would prefer going to a full-on Unicode implementation to support
> >> all known human languages.
> >		This was my first leaning, but I'm increasingly
> >		going toward the ISO families.
> 
> 	Yes, iso-2022 families are quite important for supporting
> 	Asian languages.  Unicode is, for us Japanese, quite incomplete and
> 	unexpandable.


		Is there a way of explaining (briefly :) how the
		iso-2022 character set is displayed?  This point 
		came up the other day and I guessed that it was 
		done by a ((large)) table-lookup under X.   



> 
> >> I would suggest an initial 16 bit wchar_t with an assumption of a
> >> zero valued code page designator.  If ISO ever gets around to adding
> >> other code pages, we can deal with that at that time using page
> >> selection.  Meanwhile, we'll be able to interoperate with Microsoft
> >> and JAVA, which use 16 bit wchar_t encodings.
> 
> 	I would like wchar_t to be 32bit, OR MORE.  We will see multiple
> 	96x96 character pages at the same time so 16bit is really not enough.
> 	Modified xpg4 library assumes wchar_t to be at least 32bit.
> 	Otherwise I cannot encode iso-2022 variant character sets into it.
> 


		Hm!  In my world, our wchar_t is 32-bits.  So your
		library would work.  Since wchar_t can be redefined,
		I ought to be able to build it anywhere.
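
The arithmetic behind the 16-bit versus 32-bit argument can be sketched as
follows. The packing scheme here (a character-set id in the high bits, row
and column in the two low bytes) is purely hypothetical and is not the
layout used by the modified xpg4 library; it only shows that a set
designator plus a 96x96 cell does not fit in 16 bits.

```python
# Sketch: how many 96x96 pages fit in a wchar_t of a given width.
PAGE_CELLS = 96 * 96  # code points in one 96x96 page


def pages_that_fit(bits: int) -> int:
    """Number of whole 96x96 pages addressable with `bits` bits."""
    return (1 << bits) // PAGE_CELLS


# Hypothetical packing, for illustration only.
def pack(charset_id: int, row: int, col: int) -> int:
    if not (0 <= row < 96 and 0 <= col < 96):
        raise ValueError("row and col must be in 0..95")
    return (charset_id << 16) | (row << 8) | col


def unpack(wc: int) -> tuple:
    return wc >> 16, (wc >> 8) & 0xFF, wc & 0xFF


if __name__ == "__main__":
    print(pages_that_fit(16))    # only a handful of pages
    wc = pack(2, 0x30, 0x21)
    print(hex(wc), wc < (1 << 16), wc < (1 << 32))
```

With 16 bits, only seven such pages fit before any room is left for a
designator, which is the constraint described above.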


> >		nvi/nex already have been tweaked for 8-bit international
> >		support.  I learned this accidentally.  Was quite
> >		surprised to see messages in French and German.  :-)
> >		Nonetheless, I see why you like the Unicode solution.
> >		Someone said, ``Well, French support is great, but how
> >		are you going to handle Japanese?''
> 
> 	Do you mean the internationalization of messages displayed by nvi?
> 	or file content?  If it is the latter one, please install nvi-m17n
> 	from /usr/ports/{japanese,korean,chinese}/nvi-* and see how it works.
> 	(I'm responsible for nvi-m17n...)
> 


		The messages.  And probably the display, too.  For 
		the 8-bit character set languages, they can be coded
		in standard 8859-1 with \hex and catalogued.  If 
		iso-2022 can be similarly catalogued, then my initial
		idea is valid---however iso-2022 is displayed.
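
A minimal sketch of the cataloguing idea, assuming a hypothetical in-memory
catalogue rather than any real catgets/NLS file format: each locale carries
an encoding name plus message bytes, with 8859-1 text written using \x
escapes. The message ids, strings, and the CATALOG layout are all invented
for illustration, and the Japanese entry is generated with Python's
iso2022_jp codec only to show that the same table shape can hold it.

```python
# Hypothetical catalogue: locale -> (encoding, {message id -> encoded bytes}).
CATALOG = {
    "fr": ("iso8859-1", {"not_found": b"Fichier non trouv\xe9"}),
    "de": ("iso8859-1", {"not_found": b"Ung\xfcltiger Dateiname"}),
    # Built at load time for this sketch; a real catalogue would store the bytes.
    "ja": ("iso2022_jp", {"not_found": "ファイルがありません".encode("iso2022_jp")}),
}


def message(locale: str, key: str, default: str) -> str:
    """Look up a message and decode it with the locale's own encoding."""
    entry = CATALOG.get(locale)
    if entry is None or key not in entry[1]:
        return default
    encoding, table = entry
    return table[key].decode(encoding)


if __name__ == "__main__":
    for loc in ("fr", "de", "ja", "xx"):
        print(loc, message(loc, "not_found", "File not found"))
```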

		Thanks for the pointer; I'll ftp your port and see.

> >> I have had FS-based Unicode support working for a very long time,
> >> though it has failed to be committed.  One big issue is that directory
> >> entry blocks must grow from 512b to 1k.  This has a number of
> >> implications to the soft updates work currently in progress.  This is
> >> because, in order to support a maximally sized path component, 512 + 24
> >> bytes is needed for Unicode, as opposed to 256 + 24 (which fits in 512b)
> >> for an 8 bit character set.
> >		:-( !
> >		How does the ISO2022 model work here?  Isn't it the
> >		same for Japanese and Chinese?  
> 
> 	Yes, for Japanese, Chinese and Korean iso-2022 based model (euc-xx
> 	falls into the category) is really important.  However, I personally
> 	believe that filenames must be kept in C locale for simplicity...
> 
> itojun
> 
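
The directory-entry sizes quoted above work out as in this short sketch. The
24-byte overhead and the 256-unit maximum name come from the quoted figures;
the power-of-two block search and the function names are assumptions of this
note, not a description of the actual FFS code.

```python
# Sketch: why 16-bit names push directory entry blocks from 512b to 1k.
HEADER_BYTES = 24      # per-entry overhead, as quoted in the thread
MAX_NAME_UNITS = 256   # maximally sized path component, per the quoted figures


def entry_bytes(bytes_per_char: int) -> int:
    """Worst-case size of one directory entry."""
    return MAX_NAME_UNITS * bytes_per_char + HEADER_BYTES


def smallest_block(size: int, start: int = 512) -> int:
    """Smallest power-of-two block, starting at `start`, that holds `size`."""
    block = start
    while block < size:
        block *= 2
    return block


if __name__ == "__main__":
    for width in (1, 2):
        need = entry_bytes(width)
        print(width, need, smallest_block(need))
```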

		I'll check out iso-2022 further; if you know of any
		English-language docs on this, please send me a 
		pointer.

		gary


-- 
   Gary D. Kline         kline@tao.thought.org          Public service uNix


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199806110231.TAA09494>
