Date:      Tue, 25 Mar 2003 04:21:47 -0800
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Pete French <pfrench@firstcallgroup.co.uk>
Cc:        daved@nostrum.com, stable@freebsd.org
Subject:   Re: Resolver Issues (non valid hostname characters)
Message-ID:  <3E8049DB.F29A2FBD@mindspring.com>
References:  <E18xmYm-0000IT-00@mailhost.firstcallgroup.co.uk>

Pete French wrote:
> > The specific reasons were for support of Big5 due to increased
> > political pressure coming from China.  See the ICANN web site
> > for details.
> 
> Big-5 ? I thought they were switching to doing it all in UTF-8
> so that all the unpleasant "what character set is it" stuff goes
> away!
> 
> Is this another great opportunity for Unicode ending up lost for
> no good reason ? :-(

You'll notice that most of the Chinese SPAM you get is not
Unicode-encoded; this is for a good reason.  ;^).

The Japanese hate Unicode because it's in Chinese dictionary order;
they also hate it because you can't "grep -v" a mixed language file
to get rid of the Chinese.  The Chinese hate it for their own reasons.
Basically, it's because it's a character set standard that's not
useful for font encoding.  Oh yeah: forget the fact that Japanese
dictionary order had no way of classifying Chinese characters, and
there were two of them to choose from, while Chinese could classify
previously unknown Japanese glyphs, and there was only one order,
"stroke-radical"... ;^).

For ligatured fonts, like Tamil, Devanagari, Arabic, and non-simplified
Hebrew (and cursive German and English, for that matter), there is an
inherent bias in the Unicode standard not interspersing "private use"
areas.  What this means is that it's hard to use these languages with
fixed-cell rendering technologies, like X Windows.  This was on purpose,
since the Apple/IBM "Pink OS" and the resulting company, Taligent had
an intentional bias towards Adobe rendering technology, where the brains
live in the device, not in the font supplier, and ligatures are written
on the fly, using pixel-poking (e.g. Display PostScript), which is only
inexpensive if you are going to a local frame buffer.  So it's not very
appreciated in countries/regions with ligatured languages, or non-spacing
diacriticals, e.g. Hangul (Korean), etc.

There's also the little problem of UTF encoding ("Use The Force" is
what we called it when it was first proposed in comp.std.internat),
which renders all fixed-size record storage formats (like, oh, say,
that used by ALL existing COBOL, Ada, and some FORTRAN data) useless:
you can't "stat" the file and divide by sizeof(struct foo) to get the
record count.
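
A minimal sketch of the stat-and-divide idiom referred to above, assuming
a POSIX system.  The record layout "struct foo" and the helper names
records_in() and record_count() are hypothetical, chosen only for
illustration.  The idiom works only while every character occupies a
fixed number of bytes; with a variable-length encoding a field no longer
has a fixed byte width, so the division stops being meaningful.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical fixed-size record; the field widths are illustrative only. */
struct foo {
    char name[16];  /* 16 single-byte characters, space padded */
    char code[4];   /* 4 single-byte characters */
};

/*
 * Number of whole records in nbytes of data, or -1 if nbytes is negative
 * or not an exact multiple of the record size.
 */
long records_in(long nbytes, size_t recsize)
{
    assert(recsize > 0);
    if (nbytes < 0 || (size_t)nbytes % recsize != 0)
        return -1;
    return (long)((size_t)nbytes / recsize);
}

/*
 * Record count for the file at path, taken from its size alone.
 * Returns -1 if stat() fails or the size is not a multiple of
 * sizeof(struct foo).
 */
long record_count(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return records_in((long)st.st_size, sizeof(struct foo));
}
```

Calling record_count() on a data file gives the count without reading a
single record, which is exactly the property that a variable-width
encoding takes away.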

Finally, when ISO standardized it as ISO-10646, they caved in to the
Japanese desire to be able to attribute characters by language, and
made it 32 bits, with only code page zero allocated initially.  This
gives them the ability to "grep -v" non-Japanese text out, so they
do not have to see it (I suppose it would also work for non-French
text, come to that ;^)).  This resulted in the Windows-vs.-UNIX
debate about "Is wchar_t 16 bits or 32 bits?" (it's 16 on Windows,
but 32 on UNIX, a clear division between the de facto and the paper
standards camps).
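
The width of wchar_t is implementation-defined in C, so the simplest way
to see which camp a given platform is in is to ask the compiler.  A small
sketch; the function name wchar_bits() is made up for this example.

```c
#include <limits.h>
#include <stddef.h>

/* Width of wchar_t, in bits, on the platform this is compiled for. */
int wchar_bits(void)
{
    return (int)(sizeof(wchar_t) * CHAR_BIT);
}
```

Code that assumes one wchar_t holds any ISO-10646 character is only
portable to platforms where this returns 32.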

So there's a huge amount of political water under the Unicode bridge
that makes people not like using it.

Plus, it's not like you could make a "Unicode Font".

--

As far as DNS goes, though, it's hierarchical; so as long as all
the DNS servers under a given suffix use the same encoding, it
doesn't matter.  For example, the first DNSINT deployment in
Japan used UTF-5 encoded JIS-208 + JIS-212: to force the characters
into the allowable DNS character set range, instead of having to
change the DNS servers, caches, and all the resolver libraries in
creation, you just changed the clients.  So you bloated each
character out to up to 7 characters for transfer.
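
To make the bloat concrete, here is a rough sketch of UTF-5 as I
understand the expired Internet-Draft: the code value is split into
4-bit nibbles with leading zero nibbles dropped, the first nibble is
written as a letter 'G'..'V', and the remaining nibbles as hex digits
'0'..'9', 'A'..'F'.  The function name utf5_encode() is hypothetical,
and the sketch takes a bare numeric code value; how the JIS code points
were mapped to numbers in the deployment described above is not covered
here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Encode one code value as UTF-5 text into out (NUL-terminated).
 * Returns the number of characters written, excluding the NUL.
 * out must have room for at least 9 bytes (8 nibbles plus the NUL).
 */
size_t utf5_encode(unsigned long code, char *out)
{
    static const char first[] = "GHIJKLMNOPQRSTUV"; /* leading nibble 0..15 */
    static const char rest[]  = "0123456789ABCDEF"; /* following nibbles */
    int shift = 28;
    size_t n = 0;

    assert(out != NULL);
    code &= 0xFFFFFFFFUL;  /* only the low 32 bits are encoded */

    /* Skip leading zero nibbles, but always keep the last nibble. */
    while (shift > 0 && ((code >> shift) & 0xF) == 0)
        shift -= 4;

    /* The leading nibble uses a letter so character boundaries are visible. */
    out[n++] = first[(code >> shift) & 0xF];
    for (shift -= 4; shift >= 0; shift -= 4)
        out[n++] = rest[(code >> shift) & 0xF];
    out[n] = '\0';
    return n;
}
```

Every output character is a letter or digit, so the result fits in an
ordinary DNS label without touching servers or caches, at the cost of
one output character per nibble of input.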

Historically, I think it will take a while to work out, and until
it works out for Windows, I think people are going to pretty much
ignore it... just like IPv6.


PS: My pet IPv6 conspiracy theory is that the NSA don't want it
    deployed because of the built-in strong crypto support, and
    so they got the DOJ to back off Microsoft in trade for not
    deploying.

PPS: No, I did not originate that conspiracy theory... 8-) 8-).

-- Terry




