Date: Tue, 25 Mar 2003 04:21:47 -0800
From: Terry Lambert
To: Pete French
Cc: daved@nostrum.com, stable@freebsd.org
Subject: Re: Resolver Issues (non valid hostname characters)
Message-ID: <3E8049DB.F29A2FBD@mindspring.com>

Pete French wrote:
> > The specific reasons were for support of Big5 due to increased
> > political pressure coming from China.  See the ICANN web site
> > for details.
>
> Big-5 ? I thought they were switching to doing it all in UTF-8
> so that all the unpleasant "what character set is it" stuff goes
> away!
>
> Is this another great opportunity for Unicode ending up lost for
> no good reason ? :-(

You'll notice that most of the Chinese SPAM you get is not
Unicode-encoded; this is for a good reason.  ;^).

The Japanese hate Unicode because it's in Chinese dictionary order;
they also hate it because you can't "grep -v" a mixed language file
to get rid of the Chinese.

The Chinese hate it for their own reasons.  Basically, it's because
it's a character set standard that's not useful for font encoding.

Oh yeah: forget the fact that Japanese dictionary order had no way
of classifying Chinese characters, and there were two of them to
choose from, while Chinese could classify previously unknown
Japanese glyphs, and there was only one order, "stroke-radical"...
;^).

For ligatured scripts, like Tamil, Devanagari, Arabic, and
non-simplified Hebrew (and cursive German and English, for that
matter), there is an inherent bias in the Unicode standard: it does
not intersperse "private use" areas.  What this means is that it's
hard to use these languages with fixed-cell rendering technologies,
like the X Window System.

This was on purpose, since the Apple/IBM "Pink OS" and the resulting
company, Taligent, had an intentional bias towards Adobe rendering
technology, where the brains live in the device, not in the font
supplier, and ligatures are written on the fly, using pixel-poking
(e.g. Display PostScript), which is only inexpensive if you are
going to a local frame buffer.

So it's not very appreciated in countries/regions with ligatured
languages, or non-spacing diacriticals, e.g. Hangul (Korean), etc.

There's also the little problem of UTF-encoding ("Use The Force" is
what we called it when it was first proposed in comp.std.internat),
which renders all fixed-size record storage formats (like, oh, say,
those used by ALL existing COBOL, Ada, and some FORTRAN data)
useless: you can't "stat" the file and divide by sizeof(struct foo)
to get the record count.
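The fixed-size record point can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the 20-byte
record layout (a 4-byte id plus a 16-byte name field), the file names,
and the helper names are all made up here, and do not come from any
real COBOL, Ada, or FORTRAN format.  It writes the same three rows once
as padded fixed-size records and once with the name stored as raw UTF-8
of whatever length it needs, then tries the "stat and divide" trick on
both.

```python
import os
import struct
import tempfile

# Hypothetical fixed-size record: 4-byte little-endian id + 16-byte name field.
RECORD = struct.Struct("<I16s")


def record_count(path):
    """Count records the classic way: file size divided by record size."""
    return os.stat(path).st_size // RECORD.size


def write_fixed(path, rows):
    """Write rows as fixed-size records; short names are NUL-padded to 16 bytes."""
    with open(path, "wb") as f:
        for rec_id, name in rows:
            f.write(RECORD.pack(rec_id, name.encode("utf-8")))


def write_variable(path, rows):
    """Write rows with the name stored as unpadded, variable-length UTF-8."""
    with open(path, "wb") as f:
        for rec_id, name in rows:
            f.write(struct.pack("<I", rec_id) + name.encode("utf-8"))


# Three names of four characters each: ASCII, CJK, and accented Latin.
rows = [
    (1, "abcd"),
    (2, "\u65e5\u672c\u8a9e\u6587"),
    (3, "\u00e5\u00e4\u00f6\u00fc"),
]

with tempfile.TemporaryDirectory() as tmp:
    fixed = os.path.join(tmp, "fixed.dat")
    variable = os.path.join(tmp, "variable.dat")
    write_fixed(fixed, rows)
    write_variable(variable, rows)
    fixed_count = record_count(fixed)
    variable_size = os.stat(variable).st_size

# Same character count per name, different byte count once UTF-8 encoded.
name_bytes = [len(name.encode("utf-8")) for _, name in rows]

print("record size:", RECORD.size)
print("fixed file record count:", fixed_count)
print("UTF-8 bytes per 4-character name:", name_bytes)
print("variable file size:", variable_size)
```

With the padded layout the division gives back the three records that
were written.  With the unpadded layout the file is 36 bytes, which is
not even a multiple of the 20-byte record size, so the size alone says
nothing about how many records are in it.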
Finally, when ISO standardized it as ISO 10646, they caved in to
the Japanese desire to be able to attribute characters by language,
and made it 32 bits, with only code page zero allocated initially.
This gives them the ability to "grep -v" non-Japanese text out, so
they do not have to see it (I suppose it would also work for
non-French text, come to that ;^)).

This resulted in the Windows-vs.-UNIX debate about "Is wchar_t 16
bits or 32 bits?" (it's 16 on Windows, but 32 on UNIX, a clear
division between the de facto and the paper standards camps).

So there's a huge amount of political water under the Unicode bridge
that makes people not like using it.  Plus, it's not like you could
make a "Unicode Font".

--

As far as DNS goes, though, it's hierarchical; so as long as all the
DNS servers under a given suffix use the same encoding, it doesn't
matter.

For example, the first DNSINT deployment in Japan used UTF-5 encoded
JIS-208 + JIS-212.  That forced the characters into the allowable
DNS character set range, so instead of having to change the DNS
servers, caches, and all the resolver libraries in creation, you
just changed the clients.  The cost was that you bloated each
character out to up to 7 characters for transfer.

Going by history, I think it will take a while to work out, and
until it works out for Windows, I think people are going to pretty
much ignore it... just like IPv6.

PS: My pet IPv6 conspiracy theory is that the NSA doesn't want it
deployed because of the built-in strong crypto support, and so they
got the DOJ to back off Microsoft in trade for not deploying it.

PPS: No, I did not originate that conspiracy theory... 8-) 8-).

-- Terry