Date:      Thu, 11 Jun 1998 13:56:17 +0600
From:      Konstantin Chuguev <joy@urc.ac.ru>
To:        Chen Hsiung Chan <frankch@waru.life.nthu.edu.tw>
Cc:        freebsd-hackers@FreeBSD.ORG
Subject:   Re: i18n - what I can do for it?
Message-ID:  <357F8DA1.EDDBB6D0@urc.ac.ru>
References:  <19980611135643.05642@waru.life.nthu.edu.tw>

Chen Hsiung Chan wrote:
> 
>     I am not sure about the way it is done. In fact big5 is not a
>     good encoding (not conform to ISO-2022), but I can not get rid
>     of it (it is the de facto standard in Taiwan now).
> 
There are no bad charset encodings, just incompatible ones :-)
And no charset encoding is compatible with all the others in practical
use. Leaving 8-bit charsets aside, the two major candidates for a
universal encoding are Unicode and ISO 2022.
(Is big5 compatible with, or convertible to, Unicode?)

As we can see, neither of them satisfies all users. While both are
still being developed, there is a chance (but no guarantee) that one of
them will eventually satisfy everyone. We cannot say now which one.
So the OS needs support for multiple charsets and a powerful charset
conversion mechanism.

IMO it's worth choosing from what has already been developed in this
area instead of building something completely from scratch.

I wonder how many i18n APIs (or just pieces of source code) are
available for public use, including charset conversion, gettext,
etc.?

BTW, TCL-8.1's design looks very attractive. It uses Unicode (UTF-8)
for its internal string representation, and has a powerful and flexible
charset conversion mechanism. There is one "system" character set
(taken from the locale), and each of TCL's channels (its abstraction
of files, sockets etc.) can have its own charset, different from
"system".
Currently supported charsets (not counting Unicode and UTF-8) are:
ascii, big5, cp1250, cp1251, cp1252, cp1253, cp1254, cp1255, cp1256,
cp1257, cp1258, cp437, cp737, cp775, cp850, cp852, cp855, cp857, cp860,
cp861, cp862, cp863, cp864, cp865, cp866, cp869, cp874, cp932, cp936,
cp949, cp950, dingbats, euc-jp, gb12345, gb1988, gb2312, iso2022-jp,
iso2022-kr, iso2022, iso8859-1, iso8859-2, iso8859-3, iso8859-4,
iso8859-5, iso8859-6, iso8859-7, iso8859-8, iso8859-9, jis0201,
jis0208, jis0212, macCentEuro, macCroatian, macCyrillic, macDingbats,
macGreek, macIceland, macJapan, macRoman, macRomania, macThai,
macTurkish, macUkraine, shiftjis, symbol.
All of them are defined in external files.
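
The channel model described above can be sketched in a few lines. This
is only an illustration in Python, not TCL's implementation: text is
held internally as Unicode, the "system" charset is taken from the
locale, and each channel carries its own charset. The helper names
(open_channel, write_text, read_text) and the in-memory buffers standing
in for files or sockets are assumptions made for the example.

```python
import io
import locale

# The "system" charset, taken from the locale.
SYSTEM_ENCODING = locale.getpreferredencoding(False)


def open_channel(raw, encoding=None):
    """Wrap a byte stream in a text channel with its own charset.

    Hypothetical helper: falls back to the "system" charset when no
    encoding is given.
    """
    return io.TextIOWrapper(raw, encoding=encoding or SYSTEM_ENCODING,
                            newline="")


def write_text(text, encoding):
    """Return the bytes a channel with the given charset emits for text."""
    raw = io.BytesIO()
    channel = open_channel(raw, encoding)
    channel.write(text)
    channel.flush()
    return raw.getvalue()


def read_text(data, encoding):
    """Decode bytes arriving on a channel with the given charset."""
    return open_channel(io.BytesIO(data), encoding).read()


message = "Привет"                       # internal Unicode string
koi8 = write_text(message, "koi8_r")     # one channel speaks koi8-r
cp1251 = write_text(message, "cp1251")   # another speaks cp1251
```

The same internal string yields different bytes on each channel, and
reading those bytes back through a channel with the matching charset
recovers the original text.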

As for TK-8.1, it has a built-in mechanism for accepting keycodes
in the system locale, and for mapping its internal Unicode onto the
charsets of the fonts available in the system.

I have added koi8-r to this list, and after lightly patching the
tcl-8.1 sources, made the Zircon IRC client able to speak any of
these charsets :-) I like it very much.
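
What such a conversion amounts to can be sketched as follows: bytes in
one charset are decoded to Unicode and re-encoded in another, with
Unicode as the pivot. This is an illustration in Python, not the TCL
patch itself; the function name transcode and the sample text are made
up for the example.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes between two charsets, using Unicode as the pivot."""
    text = data.decode(source)    # source charset -> Unicode
    return text.encode(target)    # Unicode -> target charset


koi8_line = "Привет, мир".encode("koi8_r")
cp866_line = transcode(koi8_line, "koi8_r", "cp866")
```

With the default strict error handling, a character that has no
representation in the target charset raises UnicodeEncodeError rather
than being silently dropped, which is where "incompatible" encodings
show up in practice.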

I understand that implementing i18n in a high-level language such as
TCL is much simpler than in C and the kernel. But the latter is not
impossible; it just needs a clear specification.

There is also TERENA's MAITS internationalization API, but there is
little information about it on the Internet, and I don't know its
license terms or copyright.

Does anybody know of other examples?

--
	Konstantin V. Chuguev.		System administrator of Southern
	http://www.urc.ac.ru/~joy/	Ural Regional Center of FREEnet,
	mailto:joy@urc.ac.ru		Chelyabinsk, Russia.



