FreeBSD Mail Archives

Date:      Tue, 26 Aug 2008 09:39:59 +0000
From:      Svavar Lúthersson <svavar@kjarrval.is>
To:        Tim Kientzle <kientzle@freebsd.org>
Cc:        freebsd-current@freebsd.org
Subject:   Re: Unicode-based FreeBSD
Message-ID:  <48B3CF6F.5020202@kjarrval.is>
In-Reply-To: <48B38895.9040000@freebsd.org>
References:  <3cb459ed0808221700w335b0906g6901d8b8bec4dad9@mail.gmail.com>		<200808241415.31812.mitchell@wyatt672earp.force9.co.uk>		<6a7033710808241239p1cbdc7adwd4f87814b428b10b@mail.gmail.com>		<3cb459ed0808241958v552eafejf7841f0f9993928e@mail.gmail.com>		<48B28B8D.9030305@kjarrval.is>		<3cb459ed0808250621s28a1b825u1cc16939951bb157@mail.gmail.com>		<48B336D8.2030300@kjarrval.is>	<3cb459ed0808251656l5716ee51y5bddf34fb8809b0c@mail.gmail.com> <48B3544B.4020601@kjarrval.is> <48B38895.9040000@freebsd.org>

Tim Kientzle wrote:
>> Going to UTF-8 might fix some of the character issues
>> but we would be in the same shoes when it comes to characters
>> which are in -16 and -32 but not in -8.
>
> You need to read the Unicode/ISO10646 standards again;
> you do not understand them.
You are right, I do not understand them. As I mentioned, I am not a 
Unicode expert and I have never claimed to be one.
>
> There are no characters in UTF-32 that are not in UTF-8.
>
> UTF-32, UTF-16, and UTF-8 all use exactly the same characters.
>
> UTF-8 encodes Unicode characters from U+000000 to U+10FFFF, using 1 to 
> 4 bytes per character.
>
> UTF-16 encodes Unicode characters from U+000000 to U+10FFFF, using 2 
> or 4 bytes per character.
>
> UTF-32 encodes Unicode characters from U+000000 to U+10FFFF, using 4 
> bytes per character.
>
> Practically speaking, UTF-8 is a bit more convenient for file
> storage and transmission (including terminal support), UTF-16
> or UTF-32 can be slightly more convenient for internal
> string manipulation.  But all three encodings use exactly
> the same characters.
>
> Tim Kientzle
I cannot confirm that you are 100% right because I am not an expert in 
Unicode. However, after some reading, I can see that there is no 
"character loss" from using one Unicode encoding form rather than 
another. Therefore, I stand corrected on that issue. I still think 
FreeBSD in general should support UTF-16 and UTF-32, but that is outside 
the scope of this topic (Unicode in syscons).
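
For anyone who wants to check the "no character loss" point without 
reading the standard, here is a minimal Python sketch. It is only an 
illustration, not anything from FreeBSD itself: the sample characters and 
the helper names encoded_sizes and round_trips are my own choices, and I 
use the little-endian codec names so that no byte order mark is counted.

```python
# Illustration only: the sample characters and helper names are hypothetical.
# The "-le" codec variants are used so that no byte order mark is added.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")

# One character from each size class: ASCII, Latin-1 range,
# elsewhere in the Basic Multilingual Plane, and beyond it.
SAMPLES = ["A", "\u00fa", "\u20ac", "\U0001d11e"]


def encoded_sizes(ch):
    """Return the byte length of ch in UTF-8, UTF-16 and UTF-32."""
    return tuple(len(ch.encode(enc)) for enc in ENCODINGS)


def round_trips(ch):
    """Return True if ch survives encode/decode in all three encodings."""
    return all(ch.encode(enc).decode(enc) == ch for enc in ENCODINGS)


if __name__ == "__main__":
    for ch in SAMPLES:
        print(f"U+{ord(ch):06X}", encoded_sizes(ch), round_trips(ch))
```

Every sample decodes back to the same character in all three encodings; 
only the number of bytes differs.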

Tz-Huan Huang wrote:
> How do you define ``support''?
>
> If you mean software-level support, vim supports UTF-16, firefox
> supports UTF-16/UTF-32, perl supports UTF-16/UTF-32, etc.
>
> If you mean system-level support, there are two cases:
>
> 1. The system internal text representation is still in UTF-8, just add
> UTF-16/32 support for terminal, stdin/stdout/stderr, etc. I think it's
> not so hard (I might be wrong because I don't know terminal at all) but
> I don't see any reason to set locale to UTF-16 or UTF-32.
>
> 2. The system internal text representation is changed to UTF-16 or UTF-32.
> This is another story and I have no comment on it.
>
>
By support I meant full handling of Unicode characters, which covers both 
1 and 2. However, given what I learned above, I think it is better if 
the internal handling continues to be done in UTF-8.


Með kveðju / With regards,
Svavar Kjarrval (svavar@kjarrval.is)
s. 863-9900

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?48B3CF6F.5020202>
