Date: Thu, 13 Mar 1997 10:51:34 -0500 (EST)
From: John Fieber <jfieber@indiana.edu>
To: Terry Lambert <terry@lambert.org>
Cc: pam@polynet.lviv.ua, chat@freebsd.org
Subject: Re: Q: Locale - is it possible to change on the fly?
Message-ID: <Pine.BSF.3.95q.970312143119.26807O-100000@fallout.campusview.indiana.edu>
In-Reply-To: <199703121800.LAA27652@phaeton.artisoft.com>
On Wed, 12 Mar 1997, Terry Lambert wrote:

> > How many times have you seen web pages with the telltale signs of
> > "smart quotes"?  Box drawing characters that are portable across
> > platforms?  Wheee!  Math symbols?  Lots of people could use a
> > richer set than + - / * and ^.
>
> You can't use Unicode for this... how can you attribute fonts on, for
> instance, a Japanese www page on Chinese poetry?  Any character sets
> which have mutually unified code points that have different glyphs
> can not be simultaneously represented without font attribution.  The
> Unicode standard is not a glyph encoding standard.

In the current world, numerous glyph encodings are used to represent
documents.  Correct?  These differing glyph encodings often share the
same code space, and thus it is essential that an encoding switch
signal, a font tag for example, be present.  In Unicode terms, these
font tags constitute a "higher level protocol".  If you need to convert
a document from one higher-level protocol to another, MS Word to HTML
for example, the all-important encoding information stands a good chance
of getting lost and/or mangled.  If you used dingbats, math symbols,
smart quotes, or any other such encodings, you are SOL.  Your document
has just become rubbish.

You suggest that Unicode has the same problem, and I'll agree, but only
to a limited extent.  A Unicode document should have language tags for
optimal rendering, processing, input method selection, and so on, but if
that information is lost in a higher-level protocol conversion, your
document is hardly turned to rubbish!  First, because of the unified
character encoding, your smart quotes (0x201C, 0x201D) will never be
mistaken for something else, as they would be in the multiple glyph
encoding schemes we have to use now.  Second, although Unicode makes a
clear distinction between character and glyph encoding, and Unicode is a
character encoding, it is also true that many of the Unicode character
blocks have direct, language-independent glyph mappings in practice.
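The smart-quote point above can be sketched concretely.  A minimal
illustration (the byte 0x93 and the specific code pages here are my
choices for the example, not from the original mail): the same legacy
byte decodes to different characters depending on which encoding the
reader assumes, while the Unicode code point U+201C is unambiguous.

```python
# Sketch of the ambiguity described above: in legacy byte encodings the
# byte 0x93 means different characters depending on the assumed code
# page, so losing the "font tag" (higher-level protocol) garbles the
# text.  In Unicode, U+201C is one unambiguous code point.
# (The encodings cp1252 and mac_roman are illustrative choices.)

raw = b"\x93smart quotes\x94"

# Same bytes, two different interpretations:
as_cp1252 = raw.decode("cp1252")        # Windows: curly double quotes
as_macroman = raw.decode("mac_roman")   # Mac: accented vowels instead

print(repr(as_cp1252))    # '\u201csmart quotes\u201d'
print(repr(as_macroman))  # 'ìsmart quotesî'

# With Unicode, the code point itself carries the identity; any correct
# UTF-8 decoder recovers exactly U+201C / U+201D, no font tag needed.
utf8 = as_cp1252.encode("utf-8")
assert utf8.decode("utf-8") == "\u201csmart quotes\u201d"
```

The round trip through UTF-8 is the crux: the character survives any
number of protocol conversions because its identity is in the code
point, not in an out-of-band encoding label.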
Certainly, other scripts do not have direct glyph mappings, and in some
cases glyph mapping is affected by the language, but I hardly think this
is grounds for declaring Unicode useless for multinational computing.
In the absence of higher-level protocols, you cannot handle all possible
languages simultaneously, but you can easily handle a heck of a lot more
than you can with the current collection of glyph encoding standards.
Is that not a contribution to multinational computing?

Let me also re-state that Unicode by itself is not a complete
multinational computing solution, any more than US-ASCII is a complete
solution for American English.  I never stated it as such, and certainly
never meant to imply it.

-john