Date: Tue, 4 Apr 2000 19:19:06 -0700 (PDT)
From: Alex Belits
To: "G. Adam Stanislav"
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: Unicode on FreeBSD
In-Reply-To: <20000404201412.C261@whizkidtech.net>

On Tue, 4 Apr 2000, G. Adam Stanislav wrote:

> On Tue, Apr 04, 2000 at 05:05:05PM -0700, Alex Belits wrote:
> >   The existing "market" of multilingual application is so small, and it's
> >based on so simplistic requirements (to be able to display and print
> >characters, and make multilingual "web pages"), that even solution so much
> >flawed as standardization on Unicode can survive. Unicode is positioned as
> >the _replacement_ for languages/charsets handling infrastructure -- "we
> >know all the characters, so we can write all the words, right?".
>
> Not so. Unicode is a character map. One of many. It just happens to be
> the most inclusive one in existence.

  It is. However, if you look at the current efforts at its "adoption", it
is not used as one. It's touted as the solution to all language-related
problems, as a replacement for language/charset labeling infrastructure, and
as the necessary prerequisite for any multilingual text processing.

[skipped]

> It does not, for example, provide sorting order. It cannot. Unicode is
> not about linguistics, it is about mapping characters regardless of their
> use in specific languages.
> And different languages sort characters
> differently. For example, in Slovak, "ch" is considered a character
> which belongs after the "h". In other languages it is sorted differently.
> And in most languages, it is just two unrelated characters.

  This is the kind of work that the currently nonexistent language support
infrastructure should do -- when some language is encountered in a
"multilingual" document/protocol/..., its name can be used to load the
procedures (in this case sorting, but it may be hyphenation, phonetic match,
etc.) for that particular language, and if no matching language is known or
supported, the data should just be left alone. The same infrastructure can be
designed to support charsets and encodings, doing conversion between them
(and Unicode) only where possible and necessary, and providing the text in
either the "original" or the "preferred", "supported", etc. encoding for the
language, for the particular operation that should be performed on the text.

  If such a thing is implemented, all the charset-specific routines that
now exist in various places can be reused, and compatibility with existing
software can be achieved without any significant pain.

> Unicode is not simplistic. It does what its stated goal is, and it does
> it well. How we use it, is up to us.
>
> Cheers,
> Adam
>
> P.S. Hmmm... Interesting. I noticed my random quote contains a C-caron.
> I wonder how it is going to be handled. :)

  It was handled pretty well for such a primitive system as pine in xterm.
Since your charset was ISO 8859-2, it was marked as such in the Content-Type
header of the message. pine gave me a warning:

---8<---
  [ The following text is in the "iso-8859-2" character set. ]
  [ Your display is set for the "koi8-r" character set. ]
  [ Some characters may be displayed incorrectly. ]
--->8---

and displayed the text. xterm used the default font, which happened to be in
the koi8-r charset, displaying C-caron as Cyrillic ha.
I read the warning, manually switched xterm to a font in the ISO 8859-2
charset, and the text was displayed correctly. If I had used a GUI-based MUA
such as Netscape (which I didn't, because Netscape Messenger sucks for reasons
that have nothing to do with its charset support), it would just have
displayed the message in the charset defined in the header.

--
Alex

----------------------------------------------------------------------
 Excellent.. now give users the option to cut your hair you hippie!
                                                  -- Anonymous Coward
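
Illustrative note on the C-caron mix-up described in the message: the sketch
below reproduces it with Python's standard codecs. It assumes the character
was a capital C-caron (byte 0xC8 in ISO 8859-2); the message does not say
which case it was. The helper decode_labeled is a hypothetical name, not part
of pine, xterm, or any library. It only shows the idea of trusting the charset
label and leaving the data alone when the label is not recognized.

```python
import unicodedata

# Assumption: capital C-caron, which is the single byte 0xC8 in ISO 8859-2.
raw = b"\xc8"

# What the sender meant (message labeled iso-8859-2).
as_latin2 = raw.decode("iso8859_2")
# What a koi8-r font shows for the very same byte.
as_koi8r = raw.decode("koi8_r")

print(unicodedata.name(as_latin2))
print(unicodedata.name(as_koi8r))


def decode_labeled(data, charset):
    """Hypothetical helper: decode bytes using the labeled charset.

    If the charset name is unknown to this Python installation,
    return the bytes untouched instead of guessing.
    """
    try:
        return data.decode(charset)
    except LookupError:
        return data
```

The point of the sketch is only that the bytes are identical in both cases;
the charset label is what decides which character the reader sees.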
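
Illustrative note on the per-language procedures discussed in the message: a
minimal sketch, assuming a registry keyed by language name that is consulted
for a sorting routine, with the data left in its original order when the
language is not supported. All names here (SORTERS, slovak_key, sort_words)
are hypothetical, and the Slovak rule is deliberately simplified: it only
places "ch" after "h" and ignores diacritics and every other collation rule.

```python
import string

# Simplified Slovak alphabet: plain ASCII letters with "ch" placed after "h".
_SK_ALPHABET = list(string.ascii_lowercase)
_SK_ALPHABET.insert(_SK_ALPHABET.index("h") + 1, "ch")
_SK_RANK = {letter: i for i, letter in enumerate(_SK_ALPHABET)}


def slovak_key(word):
    """Build a sort key that treats "ch" as one letter ranked after "h"."""
    w = word.lower()
    key = []
    i = 0
    while i < len(w):
        if w.startswith("ch", i):
            key.append(_SK_RANK["ch"])
            i += 2
        else:
            # Characters outside the simplified alphabet sort last.
            key.append(_SK_RANK.get(w[i], len(_SK_ALPHABET)))
            i += 1
    return key


# Hypothetical registry: language name -> sort-key procedure.
SORTERS = {"sk": slovak_key}


def sort_words(words, language):
    """Sort with the language's procedure; leave data alone if unsupported."""
    key = SORTERS.get(language)
    if key is None:
        return list(words)
    return sorted(words, key=key)


print(sort_words(["chata", "cena", "hora", "igla"], "sk"))
print(sort_words(["chata", "cena", "hora", "igla"], "xx"))
```

With the "sk" procedure "chata" lands after "hora"; with an unknown language
name the list comes back in the order it was given.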