Date: Tue, 20 Jan 1998 23:20:38 -0500
From: "Louis A. Mamakos" <louie@TransSys.COM>
To: Terry Lambert <tlambert@primenet.com>
Cc: daniel_sobral@voga.com.br, hackers@FreeBSD.ORG
Subject: Re: Wide characters on tcp connections
Message-ID: <199801210420.XAA23356@whizzo.TransSys.COM>
In-Reply-To: Your message of "Tue, 20 Jan 1998 19:35:21 GMT." <199801201935.MAA27183@usr04.primenet.com>
References: <199801201935.MAA27183@usr04.primenet.com>
> > > The issue is one of stream synchronization. This is my main problem
> > > with UTF over non-error-checked links. If you have an implicit value
> > > boundary, then you are guaranteed a synchronized stream.
> >
> > Not applicable. TCP *is* an error-checked link. Absent application
> > implementation errors, you shouldn't get unsynchronized.
>
> Uh, byte order?

Oh, come now. It's not as if the problem of moving multi-octet quantities across an octet-oriented communications channel hasn't been solved for quite a long time. For example, we manage to move 32-bit TCP sequence numbers (unsigned integers) without too many byte-order implementation issues.

If you're unwilling to specify an encoding convention of your own, there are plenty to choose from that provide a portable encoding format suitable for many different implementation architectures. You could use XDR. You could use ASN.1; there's plenty of rope here to hang yourself with. You could choose "big-endian" byte order. You could choose "little-endian" byte order. You could choose to make this problem much more difficult than anyone might possibly imagine.

The point I made, which seems to have been completely lost, is that a reliable octet-stream transport protocol (like TCP) is not the place where you specify multibyte character encoding standards. No one is (should be?) surprised that the RS-232 standard is silent on this issue.

> > > Re: the FS example: a better example is to perhaps ask if a UNIX
> > > FS has provisions for storing "wide characters" (or preferably,
> > > 16-bit wchar_t values from ISO 10646 aka Unicode) in *directory
> > > entries* (the current answer is "no, namei is too stupid").
> >
> > Why is this a better example? It's not like we're trying to name
> > transport endpoints with any sort of character strings; the issue
> > is "awareness" of the underlying {transport,storage} mechanism.
> >
> > There's really no point in reimplementing a transport protocol given
> > the literally thousands of man-hours of work by a lot of clever
> > people over more than a decade to make TCP work well.
>
> The question is "what is the network representation of the byte values";
> see the other part of this thread...

My comment was in response to the original poster's question of whether, if TCP wasn't going to do this, it would be better to implement a scheme using UDP or directly over IP.

If I had to choose, I'd use UTF-8 encodings in big-endian byte order. This is, I believe, what the IETF has chosen when dealing with multi-byte characters embedded within other protocols.
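To make the layering argument concrete, here is a minimal sketch, assuming a hypothetical application protocol that sends each string as a 32-bit octet count in network (big-endian) byte order followed by the UTF-8 octets. TCP carries the octets unchanged; the character encoding and the byte order of multi-octet fields are decided entirely by the application. The names `frame_string` and `read_framed_string` are illustrative only. Note that UTF-8 is itself a sequence of single octets and has no byte order, so the big-endian convention here applies only to the length field (it would also matter for an encoding like UCS-2).

```python
import io
import struct

# Hypothetical framing: 4-byte unsigned length in network byte order
# ("!I" = big-endian unsigned 32-bit), then the UTF-8 encoded payload.
HEADER = struct.Struct("!I")


def frame_string(text: str) -> bytes:
    """Encode text as UTF-8 and prepend its octet length (big-endian)."""
    payload = text.encode("utf-8")
    return HEADER.pack(len(payload)) + payload


def _read_exact(stream, n: int) -> bytes:
    """Read exactly n octets; some streams return short reads, so loop."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("stream closed mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_framed_string(stream) -> str:
    """Read one framed string from a binary, octet-oriented stream."""
    (length,) = HEADER.unpack(_read_exact(stream, HEADER.size))
    return _read_exact(stream, length).decode("utf-8")


if __name__ == "__main__":
    # io.BytesIO stands in for a socket's file object (sock.makefile("rb")).
    stream = io.BytesIO(frame_string("naïve") + frame_string("日本語"))
    print(read_framed_string(stream))
    print(read_framed_string(stream))
```

Because the length is counted in octets rather than characters, the receiver can find message boundaries without understanding UTF-8 at all; only the final `decode` step cares about the character encoding.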
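On the stream-synchronization concern quoted at the top of the message: UTF-8 is self-synchronizing, because every continuation octet has the bit pattern 10xxxxxx (0x80 to 0xBF) and no lead or ASCII octet does. A reader that starts at an arbitrary offset can skip continuation octets and lose at most one character, whereas in a fixed two-octet encoding such as UCS-2 a single lost octet shifts every following pair. A minimal sketch; the helper names `next_char_boundary` and `decode_from` are hypothetical.

```python
def next_char_boundary(buf: bytes, offset: int) -> int:
    """Return the first index >= offset that starts a UTF-8 character.

    UTF-8 continuation octets always match the bit pattern 10xxxxxx
    (0x80-0xBF), so skipping them lands on an ASCII octet or a lead octet.
    """
    while offset < len(buf) and (buf[offset] & 0xC0) == 0x80:
        offset += 1
    return offset


def decode_from(buf: bytes, offset: int) -> str:
    """Decode buf starting at an arbitrary octet offset, resynchronizing first."""
    return buf[next_char_boundary(buf, offset):].decode("utf-8")


if __name__ == "__main__":
    data = "日本語".encode("utf-8")  # three characters, three octets each
    # Starting mid-character (offset 1) loses only the damaged character.
    print(decode_from(data, 1))
```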