Message-ID: <533F5DF5.9020803@freebsd.org>
Date: Sat, 05 Apr 2014 05:35:49 +0400
From: Andrey Chernov <ache@freebsd.org>
To: Gleb Smirnoff <glebius@FreeBSD.org>, i18n@freebsd.org
Subject: Re: login.conf --> UTF-8
Cc: sbruno@freebsd.org,
 "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>

On 04.04.2014 16:46, Gleb Smirnoff wrote:
> On Thu, Apr 03, 2014 at 01:34:33AM +0400, Andrey Chernov wrote:
> A> On 02.04.2014 21:15, Gleb Smirnoff wrote:
> A> > S> +	:lang=en_US.UTF-8:\
> A> > S> +	:charset=UTF-8:
> A> > 
> A> > And I'd like to do same change for the 'russian' login class
> A> > in /etc/login.conf.
> A> 
> A> Please everybody remember that we don't have UTF-8 collation
> A> implemented, just fallback to bytecode comparison.
> 
> Any objections on checking in FreeBSD-compatible[1] UTF-8 collation
> implementation from Alex Tutubalin?
> 
> http://blog.lexa.ru/2008/03/03/freebsd_utf8_russian_collate_vtoraja_popitka.html
> 

Even his "version 2" has problems that I object to. I already replied to
Alex about this in 2008. In short:
1) There is an error: almost all single chars above ASCII should be
"chains", i.e. two bytes minimum, since there are almost no
intersections with ISO8859-1 as a UTF-8 subset.
2) The table itself is very incomplete, e.g. it covers neither the whole
of KOI8-R, nor ISO8859-5, nor CP866. It is made from CP1251 with all its
restrictions. So switching from, e.g., KOI8-R to UTF-8 will cause a
sorting regression. Russian UTF-8 collation should be able to sort all
the major Russian charsets mentioned, i.e. we need a combined table.
3) The "charmap map.ISO8859-1" declaration is missing (needed mainly for
using the mnemonic names of pure ASCII chars).
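
To make points 1 and 2 concrete, here is a small Python sketch (not part
of Alex's table or of any FreeBSD code). The helper name
charset_repertoire and the sample words are made up for illustration. It
checks that Russian letters take two bytes in UTF-8, shows what a plain
byte comparison does to the letter "ё", and lists what the other Russian
charsets contain that CP1251 does not.

```python
# Hypothetical helper: collect the non-ASCII characters that a
# single-byte charset can encode, using Python's bundled codecs.
def charset_repertoire(codec_name):
    chars = set()
    for byte in range(0x80, 0x100):
        try:
            chars.add(bytes([byte]).decode(codec_name))
        except UnicodeDecodeError:
            pass  # this byte is unassigned in the charset
    return chars

# Point 1: each of these letters needs two bytes in UTF-8, so a table
# keyed on single bytes has to describe them as chains.
letters = ["а", "я", "ё", "Ё"]
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in letters}

# Byte-order comparison, i.e. what is left when no collation exists.
# "ё" (U+0451) encodes above "я" (U+044F), so it sorts last.
words = ["ёлка", "яблоко", "арбуз"]
byte_sorted = sorted(words, key=lambda w: w.encode("utf-8"))

# Point 2: characters present in the other charsets but absent from
# CP1251, which a CP1251-derived table therefore cannot order.
cp1251 = charset_repertoire("cp1251")
missing_from_cp1251 = {
    name: charset_repertoire(name) - cp1251
    for name in ("koi8_r", "iso8859_5", "cp866")
}
```

The union of all four repertoires is roughly what a combined table would
have to cover.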

Even if the errors mentioned above are fixed and the code is committed
afterwards, we should understand that this approach (implementing
multibyte collation via the single-byte one), while possible, is a big
hack and slows sorting down by up to 10 times.

A proper "Unicode Collation Algorithm" is already implemented by ICU and
other projects. See
http://unicode.org/reports/tr10/
It would be better if someone adopted it instead of hacks.
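
For readers unfamiliar with the idea, the Unicode Collation Algorithm
compares strings via multi-level sort keys rather than raw bytes. The
Python sketch below is only a toy in that spirit, not UCA and not ICU:
the hand-written alphabet string, the two levels chosen, and the name
toy_sort_key are all assumptions for illustration, whereas the real
algorithm takes its weights from the Unicode data tables.

```python
# Toy two-level sort key. The alphabet order is written by hand here;
# it is an illustrative assumption, not data taken from UCA or ICU.
ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}

def toy_sort_key(word):
    # Primary level: position in the alphabet, ignoring case.
    # Characters outside the alphabet are placed after all letters.
    primary = tuple(
        RANK.get(ch.lower(), len(ALPHABET) + ord(ch)) for ch in word
    )
    # Secondary level: case, lowercase first, used only to break ties.
    secondary = tuple(ch.isupper() for ch in word)
    return (primary, secondary)

words = ["яблоко", "Ёлка", "арбуз", "ёлка"]
toy_sorted = sorted(words, key=toy_sort_key)
```

Unlike the byte comparison, this ordering does not depend on how the
letters happen to be encoded.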

-- 
http://ache.vniz.net/