Date: Tue, 20 Jan 1998 21:18:36 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: Pierre.Beyssac@hsc.fr (Pierre Beyssac)
Cc: louie@TransSys.COM, tlambert@primenet.com, daniel_sobral@voga.com.br, hackers@FreeBSD.ORG
Subject: Re: Wide characters on tcp connections
Message-ID: <199801202118.OAA27310@usr06.primenet.com>
In-Reply-To: <19980120120216.OB37901@mars.hsc.fr> from "Pierre Beyssac" at Jan 20, 98 12:02:16 pm
> I can add that, if I've understood UTF-8 right, it's fairly easy to
> resynchronize in case you happen to lose sync. It just takes one or
> two lost or garbled chars. I think that UTF-8 is one of the ways to
> go. Its only drawback is that it's not compatible with "pure" 8 bits
> ISO-Latin-1 streams as it reuses 0x80-0xff.

It will take up to 3 bytes to resync, since it can take up to 5 bytes
to represent a single 16 bit value.

This assumes you are willing to push an arbitrary number of bytes to
get a 16 bit value to the other end of the pipe, that you are willing
to take the computational overhead of the conversion, and that you are
willing to treat your values as a stream instead of as an external data
representation of a structure (ie: you are willing to give up being
able to tell the other end to expect a certain number of bytes in a
transaction).

UTF encoding is evil personified if you are doing database work. You
never know how many "real" characters (16 bit values) can be stored in
any N bytes of a fixed field. This makes input complicated, since you
must veto input based on whether or not the UTF encoding fills up the
field. It also makes it impossible to fully specify field length in a
schema, and in general it makes life Hell.

The people who like UTF encoding are the people who've already had
their mail forwarded to Hell, mostly through already losing these
programmatically useful abilities to some other evil, like EUC
encoding and ISO 2022.

FWIW, CIFS (aka SMB) ships long (Unicode) names over the wire as
wchar_t's in x86 (little-endian) byte order.

                                        Terry Lambert
                                        terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
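The following is a minimal Python sketch of the trade-offs discussed above. It is not from the original thread, and the helper names `utf8_len`, `resync` and `fit_field`, along with the 12-byte field width, are hypothetical. It follows the UTF-8 rules as later standardized in RFC 3629. Under those rules, any 16-bit value takes one to three bytes. A decoder that loses sync recovers by skipping continuation bytes (bit pattern 10xxxxxx) until the next lead byte. The number of whole characters that fit in a fixed N-byte field depends on which characters the text contains.

For contrast, the sketch also shows a CIFS-style fixed two-byte little-endian encoding. It models that encoding with Python's `utf-16-le` codec, which matches UCS-2 for 16-bit values outside the surrogate range. With that encoding, the field always holds N/2 characters.

```python
"""Sketch: UTF-8 length, resync, and fixed-field fitting (RFC 3629 rules).

All function names here are hypothetical, chosen for illustration.
"""


def utf8_len(cp: int) -> int:
    """Return how many bytes UTF-8 uses for code point cp (RFC 3629)."""
    if cp < 0x80:
        return 1      # 0xxxxxxx
    if cp < 0x800:
        return 2      # 110xxxxx 10xxxxxx
    if cp < 0x10000:
        return 3      # 1110xxxx 10xxxxxx 10xxxxxx (every 16-bit value)
    return 4          # 11110xxx followed by three continuation bytes


def resync(buf: bytes, pos: int) -> int:
    """Advance pos past continuation bytes (10xxxxxx) to the next
    character boundary, as a decoder does after losing sync."""
    while pos < len(buf) and (buf[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


def fit_field(text: str, nbytes: int) -> tuple[bytes, int]:
    """Pack as many whole characters of text as fit in an nbytes field.

    Returns the encoded bytes and the number of characters kept; the
    count depends on which characters the text contains.
    """
    out = bytearray()
    kept = 0
    for ch in text:
        enc = ch.encode("utf-8")
        if len(out) + len(enc) > nbytes:
            break  # the next character would overflow the field
        out += enc
        kept += 1
    return bytes(out), kept


if __name__ == "__main__":
    field = 12  # hypothetical fixed field width in bytes
    for sample in ("abcdefghij", "é" * 10, "漢字" * 5):
        _, kept = fit_field(sample, field)
        # UTF-8 capacity varies with content (10, 6, 4 here); a two-byte
        # little-endian encoding always fits field // 2 characters.
        print(f"UTF-8 keeps {kept} chars; UTF-16LE keeps {field // 2}")

    buf = "aé漢".encode("utf-8")  # 61 C3 A9 E6 BC A2
    # Starting mid-character, resync lands on the next lead byte (3)
    # or on the end of the buffer (6).
    print(resync(buf, 2), resync(buf, 4))
```

Truncating at a character boundary rather than at a raw byte count keeps the stored field decodable. The price is the one the post complains about: a schema can promise a byte width, but not a character count.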