Date:      Tue, 20 Jan 1998 23:20:38 -0500
From:      "Louis A. Mamakos" <louie@TransSys.COM>
To:        Terry Lambert <tlambert@primenet.com>
Cc:        daniel_sobral@voga.com.br, hackers@FreeBSD.ORG
Subject:   Re: Wide characters on tcp connections 
Message-ID:  <199801210420.XAA23356@whizzo.TransSys.COM>
In-Reply-To: Your message of "Tue, 20 Jan 1998 19:35:21 GMT." <199801201935.MAA27183@usr04.primenet.com> 
References:  <199801201935.MAA27183@usr04.primenet.com> 

> > > The issue is one of stream synchronization.  This is my main problem
> > > with UTF over non-error-checked links.  If you have an implicit value
> > > boundary, then you are guaranteed a synchronized stream.
> > 
> > Not applicable.  TCP *is* an error checked link.  Absent application
> > implementation errors, you shouldn't get unsynchronized.
> 
> Uh, byte order?

Oh, come now.  It's not like the problem of how to move multi-octet
quantities across an octet-oriented communications channel hasn't
been solved for quite a long time.  For example, we manage to move
32-bit TCP sequence numbers (unsigned integers) without too many
byte-order implementation issues.
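
To make that concrete, here is a minimal sketch of how a 32-bit
unsigned value can be put on the wire most significant octet first,
regardless of the host's native byte order.  The helper names
put_u32_be() and get_u32_be() are made up for this illustration; the
standard htonl()/ntohl() routines serve the same purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: store a 32-bit value as four octets,
 * most significant octet first ("network byte order").
 * Shifting works on values, not memory, so the result is the
 * same on big-endian and little-endian hosts.
 */
static void put_u32_be(uint8_t *buf, uint32_t v)
{
    assert(buf != NULL);
    buf[0] = (uint8_t)(v >> 24);
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >> 8);
    buf[3] = (uint8_t)v;
}

/* Hypothetical helper: rebuild the value from four big-endian octets. */
static uint32_t get_u32_be(const uint8_t *buf)
{
    assert(buf != NULL);
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |
            (uint32_t)buf[3];
}
```

The sender calls put_u32_be() before write(), the receiver calls
get_u32_be() after reading four octets, and neither needs to know the
other's architecture.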

If you're unwilling to specify an encoding convention of your own,
there are plenty to choose from which provide a portable encoding
format suitable for many different implementation architectures.

You could use XDR.

You could use ASN.1 - plenty of rope here to hang yourself with.

You could choose "big-endian" byte order.

You could choose "little-endian" byte order.

You could choose to make this problem much more difficult than anyone
might possibly imagine.

The point I made, which seems to have been completely lost, is that a
reliable octet stream transport protocol (like TCP) is not the place to
specify multibyte character encoding standards.  No one is (should be?)
surprised that the RS-232 standard is silent on this issue.

> > > Re: the FS example: a better example is to perhaps ask if a UNIX
> > > FS has provisions for storing "wide characters" (or preferably,
> > > 16-bit wchar_t values from ISO10646 aka Unicode) in *directory
> > > entries* (the current answer is "no, namei is too stupid").
> > 
> > Why is this a better example?  It's not like we're trying to name
> > transport endpoints with any sort of character strings; the issue
> > is "awareness" of the underlying {transport,storage} mechanism.
> > 
> > There's really no point in reimplementing a transport protocol given
> > the literally thousands of man-hours of work by a lot of clever
> > people over more than a decade to make TCP work well.
> 
> The question is "what is the network representation of the byte values";
> see the other part of this thread...

My comment was in response to the original poster's question: if TCP
wasn't going to do this, would it be a better idea to implement a
scheme using UDP or directly over IP?

If I had to choose, I'd use UTF-8 encodings in big-endian byte order.  This
is, I believe, what the IETF has chosen when dealing with multi-byte
characters which are embedded within other protocols.
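
For illustration only, here is a minimal sketch of a UTF-8 encoder.
The function name utf8_encode() is hypothetical, and the sketch assumes
the four-octet limit (code points up to U+10FFFF) rather than the
longer sequences older UTF-8 definitions allowed.  Note that the
encoding itself fixes the order of the octets, lead octet first with
the most significant bits, so both ends of a connection produce and
expect the same octet stream whatever their native byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode one code point as UTF-8.
 * Returns the number of octets written to out (1 to 4), or 0 if
 * cp is a surrogate or lies beyond U+10FFFF.
 */
static size_t utf8_encode(uint32_t cp, uint8_t out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                      /* 0xxxxxxx */
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not characters */
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 11110xxx followed by 3 x 10xxxxxx */
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* out of range */
}
```

The resulting octets can be handed straight to write() on a TCP
socket; the transport neither knows nor cares that they encode
characters.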