Date: Fri, 19 Dec 2003 17:30:19 +0100 (CET)
From: Martin Horcicka
To: Hiroki Sato
Cc: freebsd-doc@FreeBSD.org
Subject: Re: Problems with mirrors.xml and advisories.xml

Hiroki Sato (2003-12-19 20:43 +0900):

> Martin Heinen wrote
>   in <20031218010852.A44498@sumuk.de>:
>
> martin> I encountered the same problem when sorting by
> martin> translated country names. Attached is a simple
> martin> test case: Running "xsltproc sort.xsl names.xml"
> martin> will produce the following list:
>
> Please try the attached stylesheet? This includes a quick hack
> to fix the sort order based on the order of accent marks in
> Unicode code map. I do not know if this is a reasonable order
> or not because my knowledge of languages spoken in European
> countries is very limited.
>
> The mechanism used in the quick hack is that accent marks in a target
> string are replaced with alphabets included in US-ASCII, and the set of
> strings are sorted based on the replaced string first, and on the
> original string after that.

If I understand correctly, you are trying to do something like what
strxfrm(3) does, but you specify the translation rules manually. As you
wrote - it is a hack, not a general solution. It will not work even for
Czech - e.g. in Czech sorting the string 'ch' is treated as one letter
that goes between 'h' and 'i', i.e. this list is correctly sorted in Czech:

  cihla
  hudba
  chlap
  idea

And there are probably other weird rules in other languages. In my
opinion the only right way to sort automatically is to use the system
locale database somehow. What about simply running something like this
externally:

  env -i LANG=cs_CZ.ISO_8859-2 sort

And similarly for other languages?

Martin
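
To make the difference concrete, here is a minimal Python sketch of both approaches. It is only an illustration, not the attached stylesheet: the helper names (strip_accents, hack_sorted, locale_sorted) are hypothetical, accents are removed by Unicode decomposition rather than by a hand-written translation table, and the locale name cs_CZ.UTF-8 is an assumption that may not be installed on a given system.

```python
import locale
import unicodedata


def strip_accents(text):
    """Replace accented letters with their unaccented base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def hack_sorted(words):
    """Quick-hack order: accent-stripped string first, original second."""
    return sorted(words, key=lambda w: (strip_accents(w), w))


def locale_sorted(words, locale_name):
    """Sort via the system locale database (strxfrm).

    Returns None when the requested locale is not available.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


czech = ["cihla", "hudba", "chlap", "idea"]

# The hack compares letter by letter, so 'chlap' lands before 'cihla'.
print(hack_sorted(czech))

# With a Czech locale installed, the locale database decides the order;
# the locale name here is an assumption and the result may be None.
print(locale_sorted(czech, "cs_CZ.UTF-8"))
```

The hack puts 'chlap' first because it only knows about single characters, while a locale-based key can treat 'ch' as one collating element, provided the system's Czech collation data implements that rule.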