From owner-freebsd-hackers Tue Mar 11 20:05:25 1997
Return-Path:
Received: (from root@localhost) by freefall.freebsd.org (8.8.5/8.8.5) id UAA12648 for hackers-outgoing; Tue, 11 Mar 1997 20:05:25 -0800 (PST)
Received: from fallout.campusview.indiana.edu (fallout.campusview.indiana.edu [149.159.1.1]) by freefall.freebsd.org (8.8.5/8.8.5) with ESMTP id UAA12641 for ; Tue, 11 Mar 1997 20:05:20 -0800 (PST)
Received: from localhost (jfieber@localhost) by fallout.campusview.indiana.edu (8.8.5/8.8.5) with SMTP id XAA29802; Tue, 11 Mar 1997 23:03:29 -0500 (EST)
Date: Tue, 11 Mar 1997 23:03:29 -0500 (EST)
From: John Fieber
To: Terry Lambert
cc: pam@polynet.lviv.ua, hackers@freebsd.org
Subject: Re: Q: Locale - is it possible to change on the fly?
In-Reply-To: <199703111800.LAA25667@phaeton.artisoft.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Tue, 11 Mar 1997, Terry Lambert wrote:

> Be forewarned: locale is a mechanism for localizing software to a
> single locale, not for localizing software to multiple locales
> simultaneously.

Yes.

> Like Unicode, it is a tool for localization, not multinationalization;
> tools for multinationalization don't really exist, per se, since their
> application is limited to language researchers and translators.  The

Huh?  The Unicode 2.0 standard explicitly states multilingual
computing as the primary goal of the development effort.  (First
sentence in section 1.1: Design Goals.)

The problem with locales is that they address the operating
environment for software, but blindly assume it to be appropriate for
whatever data is encountered.  Some dimensions of the locale may
remain "local", but other parts need to be driven by the data, not the
LANG environment variable.  For well-behaved MIME mail messages this
can work pretty well, but it does not work in the general case.
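
A minimal sketch of that mismatch, assuming Python 3 and its standard
codecs.  The words() helper and the sample string are hypothetical,
chosen only for illustration: Russian text stored as KOI8-R bytes is
split into words once with the data's own encoding and once under
ISO 8859-1 rules, as a Latin-1 locale would read it.

```python
import re


def words(raw, encoding):
    """Decode raw bytes with the given encoding and split into words."""
    return re.findall(r"\w+", raw.decode(encoding))


# Two Russian words, written with escapes so this source stays ASCII.
text = "\u043f\u0440\u0438\u0432\u0435\u0442 \u043c\u0438\u0440"
raw = text.encode("koi8_r")

# Driven by the data: decode with the encoding the bytes were written in.
right = words(raw, "koi8_r")

# Driven by the environment: read the same bytes as ISO 8859-1.
wrong = words(raw, "latin-1")

print(len(right), right)
print(len(wrong), wrong)
```

The KOI8-R byte for one of the Cyrillic letters (0xD7) is the
multiplication sign in ISO 8859-1, which is not a letter, so the
first word is cut in two and the same bytes yield three "words"
instead of two.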
Unicode attempts to help out here by providing a locale-independent
data coding scheme.

With an en_US.ISO_8859-1 locale, a document in Russian (KOI8-R) cannot
be properly processed.  If I want to index it, how do I know what
codes constitute word boundaries?  What if I want to combine Russian
and French in the same index, or, heaven forbid, in the same document?
Now, if I had an en_US.UTF locale (I actually do, but it is a little
buggy) and the Russian and French document was in Unicode, I could
process it sensibly even though my preferred locale was different.

Granted, some things like collating sequence are language dependent.
In Unicode, many languages share the same code space (just like many
languages share 8859-1), and explicit tagging of languages is left to
an unspecified higher-level protocol.

Multilingual applications limited to linguists?  I suspect there are
plenty of people who know and use languages that don't share the same
character encoding.  :)

Unicode also provides a rich assortment of other things useful
regardless of your language.  How many times have you seen web pages
with the telltale signs of "smart quotes"?  Box drawing characters
that are portable across platforms?  Wheee!  Math symbols?  Lots of
people could use a richer set than + - / * and ^.

> best you can hope for is picking a single round-trip character set
> that supports both your languages.  You will never find one of these
> for, for example, Chinese and Japanese.

I gather it is possible to round-trip CJK conversions through Unicode
by utilizing the private use area.  I don't speak from direct
experience on this, however.

-john