Date:      Wed, 12 Mar 2014 16:45:29 +0100
From:      Rolf Nielsen <rmg70swe@yahoo.com>
To:        stable@freebsd.org
Cc:        Gerhard Schmidt <estartu@ze.tum.de>
Subject:   Re: UTF-8 Sorting
Message-ID:  <53208119.6060009@yahoo.com>
In-Reply-To: <53207613.2090801@ze.tum.de>
References:  <5320297F.1080400@ze.tum.de> <53207451.3010305@yahoo.com> <53207613.2090801@ze.tum.de>



Gerhard Schmidt wrote on 2014-03-12 15:58:
> On 12.03.2014 15:50, Rolf Nielsen wrote:
>>
>>
>> Gerhard Schmidt wrote on 2014-03-12 10:31:
>>> Hi,
>>>
>>> I have a problem with FreeBSD, UTF-8 and sorting.
>>>
>>> For example, take a file with the following content:
>>>
>>> Meier
>>> Müller
>>> Öger
>>> Ofner
>>> Schmidt
>>>
>>> When I set my terminal to ISO-8859-1 encoding and call sort on
>>> this file, I get the following output:
>>>
>>> Meier
>>> Müller
>>> Ofner
>>> Öger
>>> Schmidt
>>>
>>> which is correctly sorted.
>>>
>>> When I change my terminal to UTF-8 encoding, convert the file
>>> to UTF-8, and call sort again, I get the following output:
>>>
>>> Meier
>>> Müller
>>> Ofner
>>> Schmidt
>>> Öger
>>>
>>> which is wrong.
>>>
>>> The problem seems to be that the LC_COLLATE file in the
>>> de_DE.UTF-8 locale is linked to ../la_LN.US-ASCII/LC_COLLATE (as
>>> are all LC_COLLATE files in every UTF-8 locale).
>>>
>>> After some research I found a mail from Kuba Lida from December
>>> 2008 (yeah, that's 5 years ago) describing the same problem; it
>>> got no response.
>>>
>>> Why isn't there a UTF-8 LC_COLLATE file for any language? Kuba
>>> Lida believed there was a problem with multibyte collate files in
>>> FreeBSD. Is this true, and are there plans to fix this problem?
>>>
>>> The same test under Linux works without problems.
>>>
>>> Regards
>>> Estartu
>>>
>>
>> Hi,
>>
>> Hmm, to me the result that you claim is wrong looks perfectly
>> correct; however, it may of course differ between languages. In
>> Swedish, Ö is a separate letter, located last in the alphabet (from
>> A to Z we have exactly the same alphabet as English, and then come Å,
>> Ä and Ö, in that order).
>
> Yeah, Swedish sorts these characters after Z, but in German Ö equals
> Oe in names and O in all other cases. There have to be collation
> tables for different languages, just as there are different ones for
> different languages in the ISO encodings. I know that the distinction
> between names and non-names will not be implementable, but the default
> would already be a big improvement.
>
> The same kind of difference exists between German German (de_DE) and
> Austrian German (de_AT).
>
> Regards
>     Estartu

I see. Well, different countries, different customs. :)

(I should have included the list in my previous reply, but I hit the 
wrong button. I apologise for that).

Regards,
Rolf
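
The UTF-8 result in the original report is exactly what plain byte-order comparison produces, which fits the observation that the UTF-8 locales reuse the US-ASCII LC_COLLATE table. The Python sketch below assumes, for illustration only, that this fallback amounts to comparing the UTF-8 encoded bytes; `byte_order` is a hypothetical helper, not part of sort(1) or libc.

```python
# Names from the example file (one per line in the original input).
names = ["Meier", "Müller", "Öger", "Ofner", "Schmidt"]


def byte_order(word: str) -> bytes:
    # Assumption: with no real UTF-8 collation table, comparison falls
    # back to raw byte order, so each name is compared by its UTF-8 bytes.
    return word.encode("utf-8")


# "Ö" encodes as 0xC3 0x96, which is greater than every ASCII letter,
# so "Öger" lands after "Schmidt".
print(sorted(names, key=byte_order))
# -> ['Meier', 'Müller', 'Ofner', 'Schmidt', 'Öger']
```

Because UTF-8 preserves code-point order, a plain `sorted(names)` yields the same list.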

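Gerhard's rule (Ö sorts as O in ordinary words but as Oe in names) can be sketched as a collation key. This is a hypothetical, simplified Python helper, `german_key`, and not how FreeBSD's locale data is actually built; it only folds the umlauts and ß. The two modes correspond roughly to the two orderings described in DIN 5007.

```python
# Hypothetical, simplified German collation key; not a FreeBSD or libc API.
# Only umlauts and ß are handled; a real LC_COLLATE table covers much more.
GENERAL = str.maketrans({"Ä": "A", "Ö": "O", "Ü": "U",
                         "ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
NAMES = str.maketrans({"Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
                       "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def german_key(word: str, names: bool = False) -> tuple[str, str]:
    # Primary key: the folded spelling, compared case-insensitively.
    # Secondary key: the original word, so two words that fold to the
    # same spelling still sort deterministically.
    table = NAMES if names else GENERAL
    return (word.translate(table).casefold(), word)


words = ["Meier", "Müller", "Öger", "Ofner", "Schmidt"]
print(sorted(words, key=german_key))
# -> ['Meier', 'Müller', 'Ofner', 'Öger', 'Schmidt']
print(sorted(words, key=lambda w: german_key(w, names=True)))
# -> ['Meier', 'Müller', 'Öger', 'Ofner', 'Schmidt']
```

With the general rule, the sketch reproduces the ISO-8859-1 output from the report; with the names rule, "Öger" (folded to "Oeger") sorts before "Ofner".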

