Date:      Fri, 19 Dec 2003 17:30:19 +0100 (CET)
From:      Martin Horcicka <horcicka@freebsd.cz>
To:        Hiroki Sato <hrs@FreeBSD.org>
Cc:        freebsd-doc@FreeBSD.org
Subject:   Re: Problems with mirrors.xml and advisories.xml
Message-ID:  <20031219170659.G6706@www.freebsd.cz>
In-Reply-To: <20031219.204308.35475294.hrs@eos.ocn.ne.jp>
References:  <20031215174940.B38847@www.freebsd.cz> <20031218010852.A44498@sumuk.de> <20031219.204308.35475294.hrs@eos.ocn.ne.jp>

Hiroki Sato (2003-12-19 20:43 +0900):

> Martin Heinen <martin@sumuk.de> wrote
>   in <20031218010852.A44498@sumuk.de>:
>
> martin> I encountered the same problem when sorting by
> martin> translated country names.  Attached is a simple
> martin> test case:  Running "xsltproc sort.xsl names.xml"
> martin> will produce the following list:
>
>  Please try the attached stylesheet?  This includes a quick hack
>  to fix the sort order based on the order of accent marks in
>  Unicode code map.  I do not know if this is a reasonable order
>  or not because my knowledge of languages spoken in European
>  countries is very limited.
>
>  The mechanism used in the quick hack is that accent marks in a target
>  string are replaced with alphabets included in US-ASCII, and the set of
>  strings are sorted based on the replaced string first, and on the
>  original string after that.

If I understand correctly, you are trying to do something like what
strxfrm(3) does, but with the translation rules specified manually. As you
wrote, it is a hack, not a general solution. It will not work even for Czech:
in Czech collation the string 'ch' is treated as a single letter that sorts
between 'h' and 'i', so this list is correctly sorted in Czech:

cihla
hudba
chlap
idea

And there are probably other weird rules in other languages.
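
To make the 'ch' rule concrete, here is a minimal Python sketch. The helper
czech_key and the ALPHABET table are hypothetical names made up for this
illustration; the key handles only lowercase ASCII letters plus the digraph
'ch' and ignores accents, case and every other Czech rule, so it is a toy,
not a real collation.

```python
# Toy Czech collation key (hypothetical helper, illustration only).
# Assumption: input words are lowercase ASCII; accents are not handled.
ALPHABET = list("abcdefgh") + ["ch"] + list("ijklmnopqrstuvwxyz")
RANK = {letter: position for position, letter in enumerate(ALPHABET)}


def czech_key(word):
    """Map a word to a list of ranks, treating 'ch' as one letter after 'h'."""
    key = []
    i = 0
    while i < len(word):
        if word[i:i + 2] == "ch":
            key.append(RANK["ch"])
            i += 2
        else:
            key.append(RANK[word[i]])
            i += 1
    return key


words = ["cihla", "hudba", "chlap", "idea"]

# Plain code-point order puts 'chlap' first, because 'h' < 'i'.
ascii_sorted = sorted(words)

# With 'ch' as a single letter between 'h' and 'i', the list above is in order.
czech_sorted = sorted(words, key=czech_key)

print(ascii_sorted)
print(czech_sorted)
```

A per-character replacement table cannot express this, because the rank of
'c' depends on the character that follows it.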

In my opinion the only right way to sort automatically is to use the system
locale database somehow. What about simply calling something external, like:

env -i LANG=cs_CZ.ISO_8859-2 sort

And similarly for other languages?
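
The same idea can also be sketched in-process, using the locale database
through strxfrm(3) instead of an external sort. This is only a sketch under
assumptions: sort_with_locale is a hypothetical helper, and the locale name
"cs_CZ.UTF-8" is a guess that may not exist on a given system (locale names
differ between platforms), so the code falls back to the C locale.

```python
import locale


def sort_with_locale(words, locale_name):
    """Sort words by the collation rules of locale_name.

    Returns None if the system does not provide that locale.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # locale.strxfrm wraps strxfrm(3): it transforms each string so that
        # ordinary comparison follows the active LC_COLLATE rules.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


words = ["cihla", "hudba", "chlap", "idea"]

# "cs_CZ.UTF-8" is an assumed locale name, not taken from this thread.
result = sort_with_locale(words, "cs_CZ.UTF-8")
if result is None:
    print("Czech locale not available, using the C locale instead")
    result = sort_with_locale(words, "C")
print(result)
```

Whether the Czech result really places 'chlap' after 'hudba' depends on the
collation data shipped with the system, which is exactly the point of
delegating to the locale database.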

Martin


