Date:      Mon, 25 Aug 2008 21:37:41 -0700
From:      Tim Kientzle <kientzle@freebsd.org>
To:        Svavar Lúthersson <svavar@kjarrval.is>
Cc:        freebsd-current@freebsd.org
Subject:   Re: Unicode-based FreeBSD
Message-ID:  <48B38895.9040000@freebsd.org>
In-Reply-To: <48B3544B.4020601@kjarrval.is>
References:  <3cb459ed0808221700w335b0906g6901d8b8bec4dad9@mail.gmail.com>		<200808241415.31812.mitchell@wyatt672earp.force9.co.uk>		<6a7033710808241239p1cbdc7adwd4f87814b428b10b@mail.gmail.com>		<3cb459ed0808241958v552eafejf7841f0f9993928e@mail.gmail.com>		<48B28B8D.9030305@kjarrval.is>		<3cb459ed0808250621s28a1b825u1cc16939951bb157@mail.gmail.com>		<48B336D8.2030300@kjarrval.is>	<3cb459ed0808251656l5716ee51y5bddf34fb8809b0c@mail.gmail.com> <48B3544B.4020601@kjarrval.is>

> Going to UTF-8 might fix some of the character issues
> but we would be in the same shoes when it comes to characters
> which are in -16 and -32 but not in -8.

You need to read the Unicode/ISO10646 standards again;
you do not understand them.

There are no characters in UTF-32 that are not in UTF-8.

UTF-32, UTF-16, and UTF-8 all use exactly the same characters.

UTF-8 encodes Unicode characters from U+000000 to U+10FFFF, using 1 to 4 
bytes per character.

UTF-16 encodes Unicode characters from U+000000 to U+10FFFF, using 2 or
4 bytes per character.

UTF-32 encodes Unicode characters from U+000000 to U+10FFFF, using 4 
bytes per character.
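
The byte counts above can be checked with a short Python sketch. The
sample characters and the helper name encoded_lengths are illustrative
choices for this example, not part of the original message.

```python
# Sample characters chosen for illustration: one from each UTF-8 length class.
samples = ["A", "\u00fa", "\u20ac", "\U0001F600"]

def encoded_lengths(ch):
    """Return the byte counts of one character in UTF-8, UTF-16, and UTF-32."""
    # The "-le" codecs are used so that no byte order mark inflates the count.
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )

for ch in samples:
    u8, u16, u32 = encoded_lengths(ch)
    print(f"U+{ord(ch):06X}  utf-8={u8}  utf-16={u16}  utf-32={u32}")
```

Only the number of bytes changes between encodings; the code point being
encoded is the same in every column.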

Practically speaking, UTF-8 is a bit more convenient for file
storage and transmission (including terminal support), while
UTF-16 or UTF-32 can be slightly more convenient for internal
string manipulation.  But all three encodings use exactly
the same characters.
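
As a rough illustration of that last point, the following Python sketch
round-trips the same text through all three encodings. The function name
round_trips and the test string are assumptions made for this example.

```python
def round_trips(text):
    """Return True if text survives encode/decode in all three encodings."""
    return all(
        text.encode(codec).decode(codec) == text
        for codec in ("utf-8", "utf-16", "utf-32")
    )

# Endpoints of the range named in the message, plus some ordinary text.
lowest = chr(0x000000)
highest = chr(0x10FFFF)
print(round_trips(lowest + "L\u00fathersson" + highest))
```

If some character existed in UTF-32 but not in UTF-8, the UTF-8 round
trip would fail for it; for the characters tried here, none does.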

Tim Kientzle


