Date:      Wed, 21 Jan 1998 10:33:54 +0100
From:      Pierre.Beyssac@hsc.fr (Pierre Beyssac)
To:        tlambert@primenet.com (Terry Lambert)
Cc:        Pierre.Beyssac@hsc.fr (Pierre Beyssac), louie@TransSys.COM, daniel_sobral@voga.com.br, hackers@FreeBSD.ORG
Subject:   Re: Wide characters on tcp connections
Message-ID:  <19980121103354.EB02816@mars.hsc.fr>
In-Reply-To: <199801202118.OAA27310@usr06.primenet.com>; from Terry Lambert on Jan 20, 1998 21:18:36 +0000
References:  <19980120120216.OB37901@mars.hsc.fr> <199801202118.OAA27310@usr06.primenet.com>

According to Terry Lambert:
[ UTF-8 ]
> It will take up to 3 bytes to resync, since it can take up to 5
> bytes to represent a single 16 bit value.

I assume you mean 32 bit? I think (I don't have the draft handy) that
it's a little more complicated than that, because, if I remember
correctly, there are "collisions" between prefix codes and multibyte
encodings. But that's the idea.
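
For reference, here is a minimal sketch of the encoding and the resync step, assuming the original UTF-8 definition (RFC 2279), which allowed values up to 31 bits in 1 to 6 bytes. The function names `utf8_encode` and `resync` are hypothetical, made up for illustration. Python's built-in codec stops at U+10FFFF, so the encoder is written out by hand:

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one value with the original (RFC 2279) UTF-8 scheme:
    up to 31 bits in at most 6 bytes."""
    if not 0 <= cp < 1 << 31:
        raise ValueError("value out of range")
    if cp < 0x80:
        return bytes([cp])                      # plain ASCII, 1 byte
    # (exclusive upper bound, total bytes, lead-byte marker bits)
    for limit, n, lead in ((1 << 11, 2, 0xC0), (1 << 16, 3, 0xE0),
                           (1 << 21, 4, 0xF0), (1 << 26, 5, 0xF8),
                           (1 << 31, 6, 0xFC)):
        if cp < limit:
            out = bytearray(n)
            for i in range(n - 1, 0, -1):
                out[i] = 0x80 | (cp & 0x3F)     # continuation byte 10xxxxxx
                cp >>= 6
            out[0] = lead | cp                  # lead byte carries the rest
            return bytes(out)
    raise AssertionError("unreachable")


def resync(buf: bytes, pos: int) -> int:
    """Return the first character boundary at or after pos by skipping
    continuation bytes (those of the form 10xxxxxx)."""
    while pos < len(buf) and buf[pos] & 0xC0 == 0x80:
        pos += 1
    return pos
```

With this scheme a 16-bit value (at most 0xFFFF) takes at most 3 bytes, and a reader dropped into the middle of a 6-byte sequence skips at most 5 bytes before it finds the next boundary.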

> This assumes you are willing to push an arbitrary number of bytes
> to get a 16 bit value to the other end of the pipe, and that you are
> willing to take the computational overhead of the conversion,

Yes, but you have to accept some computational overhead anyway, even
with fixed-width characters, if you have to convert them to network
byte order.
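
To make that concrete, a sketch of a fixed-width alternative, assuming each character is sent as a 32-bit big-endian integer (the per-character equivalent of calling htonl() in C). The names `encode_fixed32` and `decode_fixed32` are hypothetical:

```python
import struct


def encode_fixed32(text: str) -> bytes:
    """Send each character as a 32-bit integer in network (big-endian)
    byte order; on a little-endian host every character gets swapped."""
    cps = [ord(c) for c in text]
    return struct.pack("!%dI" % len(cps), *cps)


def decode_fixed32(data: bytes) -> str:
    """Inverse of encode_fixed32; rejects a partial trailing character."""
    if len(data) % 4:
        raise ValueError("truncated 32-bit character")
    cps = struct.unpack("!%dI" % (len(data) // 4), data)
    return "".join(chr(cp) for cp in cps)
```

Both directions still touch every character, so fixed width does not make the conversion cost go away.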

> and
> that you are willing to treat your values as a stream instead of
> an external data representation of a structure (ie: you are willing
> to give up being able to tell the other end to expect a certain number
> of bytes in a transaction).

In the case of a telnet connection or a mostly-ASCII transfer, this makes
sense: I'm certainly not ready to take a fourfold performance loss
(four bytes per character instead of one) due to wider characters :-)
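
A quick way to see both sides of this trade-off, sketched under the assumption that the fixed-width alternative uses 4 bytes per character. The helper `wire_sizes` is hypothetical:

```python
from typing import Tuple


def wire_sizes(text: str) -> Tuple[int, int]:
    """Return (UTF-8 byte count, fixed 32-bit byte count) for text."""
    return len(text.encode("utf-8")), 4 * len(text)


print(wire_sizes("login: root\r\n"))   # pure ASCII
print(wire_sizes("na\u00efve"))        # one non-ASCII character
```

For pure ASCII the fixed-width form is exactly four times larger (13 vs. 52 bytes for the first string). The second string shows the other side of the argument: its UTF-8 size (6 bytes for 5 characters) depends on the content, so the sender cannot announce a byte count from the character count alone.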

When putting this in a database system, you obviously don't _have_ to
use UTF-8 internally, that's purely an implementation issue.

Now, I agree that using UTF-8 in RPCs can be difficult, but after all,
isn't the RPC layer supposed to hide exactly these kinds of things from
the application programmer?
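
One way an RPC layer could hide this, sketched under the assumption of an XDR-style counted string (RFC 4506: a 4-byte big-endian length, the bytes, then zero padding to a 4-byte boundary) carrying UTF-8. This is an illustration, not what any particular RPC package does. The names `xdr_pack_utf8` and `xdr_unpack_utf8` are hypothetical:

```python
import struct
from typing import Tuple


def xdr_pack_utf8(text: str) -> bytes:
    """Length-prefix the UTF-8 bytes so the receiver knows exactly how
    many bytes to read, whatever the character count."""
    data = text.encode("utf-8")
    pad = (4 - len(data) % 4) % 4
    return struct.pack("!I", len(data)) + data + b"\0" * pad


def xdr_unpack_utf8(buf: bytes) -> Tuple[str, int]:
    """Return (text, bytes consumed) for one counted string at buf[0:]."""
    (n,) = struct.unpack_from("!I", buf, 0)
    pad = (4 - n % 4) % 4
    end = 4 + n
    if len(buf) < end + pad:
        raise ValueError("short buffer")
    return buf[4:end].decode("utf-8"), end + pad
```

The application hands in and gets back ordinary strings. The variable-length encoding is visible only inside the marshalling code, and the transaction still has a byte count known in advance.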

> The people who like UTF encoding are the people who've already had
> their mail forwarded to Hell,

I'm quite sure you mean X400 :-). Don't worry about me, I'm not a
UTF-8 specialist, not a UTF-8 user and even less a UTF-8 advocate
(not to mention I hate X400).

I was just pointing out that it would be silly to reinvent the wheel
if the result is going to be something similar to UTF-8.
-- 
Pierre.Beyssac@hsc.fr


