From owner-freebsd-hackers  Tue Mar 11 20:05:25 1997
Return-Path: <owner-hackers>
Received: (from root@localhost)
          by freefall.freebsd.org (8.8.5/8.8.5) id UAA12648
          for hackers-outgoing; Tue, 11 Mar 1997 20:05:25 -0800 (PST)
Received: from fallout.campusview.indiana.edu (fallout.campusview.indiana.edu [149.159.1.1])
          by freefall.freebsd.org (8.8.5/8.8.5) with ESMTP id UAA12641
          for <hackers@freebsd.org>; Tue, 11 Mar 1997 20:05:20 -0800 (PST)
Received: from localhost (jfieber@localhost)
	by fallout.campusview.indiana.edu (8.8.5/8.8.5) with SMTP id XAA29802;
	Tue, 11 Mar 1997 23:03:29 -0500 (EST)
Date: Tue, 11 Mar 1997 23:03:29 -0500 (EST)
From: John Fieber <jfieber@indiana.edu>
To: Terry Lambert <terry@lambert.org>
cc: pam@polynet.lviv.ua, hackers@freebsd.org
Subject: Re: Q: Locale - is it possible to change on the fly?
In-Reply-To: <199703111800.LAA25667@phaeton.artisoft.com>
Message-ID: <Pine.BSF.3.95q.970311220457.26807G-100000@fallout.campusview.indiana.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Tue, 11 Mar 1997, Terry Lambert wrote:

> Be forewarned: locale is a mechanism for localizing software to a
> single locale, not for localizing software to multiple locales
> simultaneously.

Yes.

> Like Unicode, it is a tool for localization, not multinationalization;
> tools for multinationalization don't really exist, per se, since their
> application is limited to language researchers and translators.  The

Huh?

The Unicode 2.0 standard explicitly states multilingual computing
as the primary goal of the development effort. (First sentence in
section 1.1: Design Goals.)

The problem with locales is that they address the operating
environment for software, but blindly assume it to be appropriate
for whatever data is encountered.  Some dimensions of the locale
may remain "local", but other parts need to be driven by the
data, not the LANG environment variable.  For well-behaved MIME
mail messages this can work pretty well, but it does not work in
the general case.
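
A minimal sketch of what "driven by the data" can look like for
the MIME case, in Python with only the standard library.  The
helper name body_text and the sample message are made up for
illustration; the point is that the charset label carried by the
message, not LANG, picks the decoder.

```python
import base64
from email import message_from_bytes

def body_text(raw: bytes) -> str:
    """Decode a single-part message body using its own charset label."""
    msg = message_from_bytes(raw)
    # Fall back to US-ASCII when the message does not declare a charset.
    charset = msg.get_content_charset() or "us-ascii"
    payload = msg.get_payload(decode=True)  # undoes base64 / quoted-printable
    return payload.decode(charset, errors="replace")

# Hypothetical sample: a Russian word, KOI8-R encoded, base64 transfer encoding.
word = "\u043f\u0440\u0438\u0432\u0435\u0442"
raw = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=koi8-r\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    + base64.b64encode(word.encode("koi8-r")) + b"\r\n"
)

text = body_text(raw)
```

This only helps when the label is present and honest, which is
exactly the "well-behaved" caveat above.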

Unicode attempts to help out here by providing a
locale-independent data coding scheme.  With an en_US.ISO_8859-1
locale, a document in Russian (KOI8-R) cannot be properly
processed.  If I want to index it, how do I know what codes
constitute word boundaries?  What if I want to combine Russian
and French in the same index, or, heaven forbid, in the same
document?  Now, if I had an en_US.UTF locale (I actually do, but
it is a little buggy) and the Russian and French document was in
Unicode, I could sensibly process it in a useful manner even
though my preferred locale was different.
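
To make the word-boundary problem concrete, here is a rough
Python sketch.  The function name words and the sample strings
are assumptions for illustration, and the regular expression is a
crude stand-in for a real indexer's tokenizer.  The same KOI8-R
bytes are read once as ISO 8859-1 and once correctly, then split
into words using Unicode character properties.

```python
import re

def words(text: str) -> list:
    # \w on a Python str uses Unicode properties, not the process locale.
    return re.findall(r"\w+", text)

russian = "\u043f\u0440\u0438\u0432\u0435\u0442"  # a Russian word
french = "\u00e9t\u00e9"                          # a French word

koi8_bytes = russian.encode("koi8-r")

# Reading the bytes under the wrong charset: the KOI8-R byte 0xD7 is a
# letter, but in ISO 8859-1 the same byte is the multiplication sign, so
# the single word falls apart into two pieces.
misread = koi8_bytes.decode("iso-8859-1")
broken = words(misread)

# Once everything is Unicode, Russian and French can share one index.
correct = koi8_bytes.decode("koi8-r")
index_terms = words(correct + " " + french)
```

Here broken has two entries where there should be one, while
index_terms holds the Russian and French words side by side.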

Granted, some things like collating sequence are language
dependent.  In Unicode, many languages share the same code space
(just like many languages share 8859-1), and explicit tagging of
languages is left to an unspecified higher-level protocol.

Multilingual applications limited to linguists?  I suspect there
are plenty of people who know and use languages that don't share
the same character encoding. :)  Unicode also provides a rich
assortment of other things useful regardless of your language.
How many times have you seen web pages with the telltale signs of
"smart quotes"?  Box drawing characters that are portable across
platforms?  Wheee!  Math symbols?  Lots of people could use a
richer set than + - / * and ^.

> best you can hope for is picking a single round-trip character set
> that supports both your languages.  You will never find one of these
> for, for example, Chinese and Japanese.

I gather it is possible to round-trip CJK conversions through
Unicode by utilizing the private use area.  I don't speak from
direct experience on this, however.

-john