From owner-freebsd-i18n@FreeBSD.ORG  Tue Aug 26 21:09:20 2014
Return-Path: <owner-freebsd-i18n@FreeBSD.ORG>
Delivered-To: freebsd-i18n@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id C7FCA7A9;
 Tue, 26 Aug 2014 21:09:20 +0000 (UTC)
Received: from mail-wi0-x235.google.com (mail-wi0-x235.google.com
 [IPv6:2a00:1450:400c:c05::235])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id C46CB3DE1;
 Tue, 26 Aug 2014 21:09:19 +0000 (UTC)
Received: by mail-wi0-f181.google.com with SMTP id bs8so4809314wib.14
 for <multiple recipients>; Tue, 26 Aug 2014 14:09:18 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:reply-to:from:date:message-id:subject:to:content-type;
 bh=cnr/8mqoU7mb6qYvKc4seokOzUYjWnJMKr03dvVdyy0=;
 b=POPCFLK6cYQ5XsUK+ftFfMaN2wT1kmDRk5YA38YNdjvCM+/6KC10ZDsj+rgdoprLfY
 qYHeeFS9G6C1eOLvkC/StQ5pUW7q2wu1yubFYqkKOxQjeGG8MQz5j1YM/ISy4G4sLZji
 O/Vm6q3/rWRNN6rg+FbTR+vX+aiea1QrABC6AbS1t59nccKYOoXJvuFm8FaLJbZdCEYm
 NfXhZgXiL7uWBp+ywkTX8zEFKZEp77r1NYwowiV+EGxtr3w8AOfGfQ66/evSfs0AmG0J
 8NGukji3nyrTlDD5rWI6V+JOdZr5J3/mwW5qaRYzdd4zODUWpYIBTiyFbFOkcuC/pD7U
 KwNQ==
X-Received: by 10.180.20.40 with SMTP id k8mr24479098wie.38.1409087358120;
 Tue, 26 Aug 2014 14:09:18 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.194.48.9 with HTTP; Tue, 26 Aug 2014 14:08:58 -0700 (PDT)
Reply-To: ghostmansd@gmail.com
From: Dmitry Selyutin <ghostman.sd@gmail.com>
Date: Wed, 27 Aug 2014 01:08:58 +0400
Message-ID: <CAMqzjesx=uhUzmTEJEq8zoxkhWXBtYOXVXQ1bmiTiEw0=-gF0w@mail.gmail.com>
Subject: Report #9: Unicode support
To: soc-status@freebsd.org, Pedro Giffuni <pfg@freebsd.org>, 
 David Chisnall <theraven@freebsd.org>, Konrad Jankowski <versus@freebsd.org>,
 freebsd-i18n@freebsd.org
Content-Type: text/plain; charset=UTF-8
X-BeenThere: freebsd-i18n@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: FreeBSD Internationalization Effort <freebsd-i18n.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-i18n>,
 <mailto:freebsd-i18n-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-i18n/>
List-Post: <mailto:freebsd-i18n@freebsd.org>
List-Help: <mailto:freebsd-i18n-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-i18n>,
 <mailto:freebsd-i18n-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 26 Aug 2014 21:09:21 -0000

Hello everyone!

Here is the latest news about the Unicode support project[0].
You can always check my repository[1].

Over the past few days I had hardware problems (my HDD peacefully
died), so development didn't progress as much as before. However, I've
resolved these problems and tried to fix bugs and reorganize the code
as much as possible. Everything should compile now.

I decided to use __attribute__((constructor)) and
__attribute__((destructor)), since I don't know of a better way to
open a file once at startup and close it when all routines have
finished. I've found one or two occurrences of this construct in the
FreeBSD code; AFAICT it is rather common with clang and gcc, so I
decided to use it. Hopefully it will also allow us to use the root
collation database on embedded systems (if any such system really
needs the collation algorithm).
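
For illustration, here is a minimal sketch of the constructor/destructor
pattern described above. It is not the actual colldb code: the path, the
function names and the init counter are made up for the example, and
fopen() merely stands in for whatever call the library really uses to
open the database.

```c
#include <stdio.h>

/* Hypothetical location; the real database path is not given here. */
#define	COLLDB_ROOT_PATH	"/usr/share/colldb/root.db"

static FILE *colldb_root;	/* stays NULL if the file cannot be opened */
static int colldb_init_count;	/* how many times the constructor has run */

/* Called automatically before main(), or when the shared object is loaded. */
static void __attribute__((constructor))
colldb_root_open(void)
{
	colldb_init_count++;
	colldb_root = fopen(COLLDB_ROOT_PATH, "rb");
}

/* Called automatically on normal exit, or when the shared object is unloaded. */
static void __attribute__((destructor))
colldb_root_close(void)
{
	if (colldb_root != NULL) {
		fclose(colldb_root);
		colldb_root = NULL;
	}
}

/* Hypothetical accessors; callers must cope with a NULL handle. */
FILE *
colldb_root_handle(void)
{
	return (colldb_root);
}

int
colldb_root_init_count(void)
{
	return (colldb_init_count);
}
```

Both attributes are compiler extensions rather than standard C, and the
destructor does not run if the process ends via _exit() or a fatal
signal, so the code should not rely on it for anything beyond cleanup.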

As you may know, we need a tool that can convert the collation text
files obtained from unicode.org to the new collation database (colldb)
format. There is a version of this tool written in Python
(share/examples/colldb/colldb.py). IIRC we can't use Python in the
base system though, so it seems that we need to write such a tool in
C. I was thinking of a lex/yacc combo; I've never tried it, but I
think it shouldn't be too hard to write a tool using it. I'd like to
know your opinions about this task.
I've already written a man page (bin/colldb/colldb.1). The only thing
that seems dubious is that I decided to use the same name as for the
library itself (well, it seems I lack imagination). So we have both
colldb.1 and colldb.3 man pages.
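
To give a feel for the size of the converter task, below is a rough
sketch of parsing a single table line in plain C with sscanf(), as a
possible alternative to lex/yacc. It assumes the line layout used by
the allkeys.txt files from unicode.org, e.g.
"0041 ; [.1C47.0020.0008] # LATIN CAPITAL LETTER A", and handles only
a single code point with three weights in its first collation element;
contractions and further elements are ignored. The struct and function
names are hypothetical and do not come from the colldb sources.

```c
#include <stdio.h>

/* Hypothetical in-memory form of one table entry. */
struct colldb_entry {
	unsigned int codepoint;
	int variable;		/* 1 if the element is marked '*' (variable) */
	unsigned int primary;
	unsigned int secondary;
	unsigned int tertiary;
};

/*
 * Parse "XXXX ; [.PPPP.SSSS.TTTT" from the start of a line.
 * Returns 0 on success and -1 for comments, directives and any
 * line that does not match this simplified layout.
 */
int
colldb_parse_line(const char *line, struct colldb_entry *e)
{
	unsigned int cp, w1, w2, w3;
	char marker;

	if (sscanf(line, "%x ; [%c%x.%x.%x", &cp, &marker, &w1, &w2, &w3) != 5)
		return (-1);
	if (marker != '.' && marker != '*')
		return (-1);

	e->codepoint = cp;
	e->variable = (marker == '*');
	e->primary = w1;
	e->secondary = w2;
	e->tertiary = w3;
	return (0);
}
```

A real converter would also need to handle contractions (several code
points before the semicolon) and entries with more than one collation
element, which is where a generated lexer may start to pay off.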

The other thing I'd really like to do is to enforce network byte
order in the collation database format (I'm sure I've seen a way to do
it in Berkeley databases). It's a pity that I have no platform with
big-endian (or even PDP!) byte order. Any help here is highly
appreciated (as well as your thoughts about lex/yacc, i.e. whether it
fits my task well).
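
One way to get a byte-order-independent format is to serialize every
integer byte by byte with shifts instead of writing it straight from
memory; such code behaves identically on little-endian, big-endian and
PDP-endian hosts. A minimal sketch follows; colldb_put32() and
colldb_get32() are hypothetical helper names, not part of the existing
code.

```c
#include <stdint.h>

/* Store v in buf[0..3] in network (big-endian) order on any host. */
void
colldb_put32(unsigned char *buf, uint32_t v)
{
	buf[0] = (unsigned char)(v >> 24);
	buf[1] = (unsigned char)(v >> 16);
	buf[2] = (unsigned char)(v >> 8);
	buf[3] = (unsigned char)v;
}

/* Read a 32-bit value stored in network order at buf[0..3]. */
uint32_t
colldb_get32(const unsigned char *buf)
{
	return ((uint32_t)buf[0] << 24 | (uint32_t)buf[1] << 16 |
	    (uint32_t)buf[2] << 8 | (uint32_t)buf[3]);
}
```

The same effect can be had with htonl()/ntohl() from <arpa/inet.h>
around each read and write; the shift-based form simply avoids any
dependence on how the host lays out integers in memory.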

Since the Google Summer of Code period has ended, I'd like to thank
both my mentors, Pedro and David, who gave me a helping hand during
this project, and especially Konrad Jankowski, who found time to
answer my questions and help me too. Though GSoC is over, I'd like to
stay with the FreeBSD project. First of all, I want to finish and
polish this project: I don't think it's really finished, especially
its testing part, though it seems that the new collation algorithm can
already be used. Then I'd like to work on other parts of the project,
especially the internationalization parts. I'd also like to improve my
own library, qc, to provide a rich API for *BSD and POSIX systems,
since I acutely feel the lack of such an API. If it is possible to
stay with the project, I'd be very happy to do so. :-)

P.S. Does anyone know how to get a diff for my branch only (i.e. for
my part of the repository)? svn diff -r $FIRST:$LAST seems to give
everything that all of FreeBSD's GSoC students have done, so I need
some other command. Thanks for your help!

[0] https://wiki.freebsd.org/SummerOfCode2014/Unicode
[1] https://socsvn.freebsd.org/socsvn/soc2014/ghostmansd

-- 
With best regards,
Dmitry Selyutin

From owner-freebsd-i18n@FreeBSD.ORG  Tue Aug 26 22:16:16 2014
Return-Path: <owner-freebsd-i18n@FreeBSD.ORG>
Delivered-To: freebsd-i18n@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id B07CEA04;
 Tue, 26 Aug 2014 22:16:16 +0000 (UTC)
Received: from mail-wi0-x232.google.com (mail-wi0-x232.google.com
 [IPv6:2a00:1450:400c:c05::232])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id A2FCB35BC;
 Tue, 26 Aug 2014 22:16:15 +0000 (UTC)
Received: by mail-wi0-f178.google.com with SMTP id hi2so4876668wib.5
 for <multiple recipients>; Tue, 26 Aug 2014 15:16:13 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=sender:date:from:to:cc:subject:message-id:references:mime-version
 :content-type:content-disposition:in-reply-to:user-agent;
 bh=ARZ6+7JpQ5CLPbInYPEajOm/CEh+32hGx7okdeynTgs=;
 b=oB2CL8XVmaPDovtDZOo+01pNdU73gtZTl9Nb76s3ayoeKR3yYhQDr85rAr6WhVUsqz
 +DkoTQQJlEr3lK8eTe0DXbczzQIPyWlGvL0/J1/+GcTxm/arXzDGEtbayrnstGIFA/qT
 J6mgrV0bchA3KHeFmkJA3wmuk5NXnrLtbu4kfpM/8CayEGaV3zr8mZlm9YYIpFnpWhFI
 Lrf7z6gQsWw7jVR20G4+fvHRm4iI9ez42jaX9lzPWSuQ4vnmOmM5PuTbK/mFsuPJ2/bs
 0UiTqmYzaiCvFwHFGBQo868vhAm0i7RduJfXLRiIMCtyn2bDaQ8Z1c7JKAoDRouP09QW
 RlIw==
X-Received: by 10.180.149.169 with SMTP id ub9mr24451406wib.32.1409091373535; 
 Tue, 26 Aug 2014 15:16:13 -0700 (PDT)
Received: from ivaldir.etoilebsd.net ([2001:41d0:8:db4c::1])
 by mx.google.com with ESMTPSA id hi4sm11541340wjb.46.2014.08.26.15.16.12
 for <multiple recipients>
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Tue, 26 Aug 2014 15:16:12 -0700 (PDT)
Sender: Baptiste Daroussin <baptiste.daroussin@gmail.com>
Date: Wed, 27 Aug 2014 00:16:10 +0200
From: Baptiste Daroussin <bapt@FreeBSD.org>
To: ghostmansd@gmail.com
Subject: Re: Report #9: Unicode support
Message-ID: <20140826221610.GD65120@ivaldir.etoilebsd.net>
References: <CAMqzjesx=uhUzmTEJEq8zoxkhWXBtYOXVXQ1bmiTiEw0=-gF0w@mail.gmail.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature"; boundary="sXc4Kmr5FA7axrvy"
Content-Disposition: inline
In-Reply-To: <CAMqzjesx=uhUzmTEJEq8zoxkhWXBtYOXVXQ1bmiTiEw0=-gF0w@mail.gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Cc: David Chisnall <theraven@freebsd.org>, soc-status@freebsd.org,
 Pedro Giffuni <pfg@freebsd.org>, Konrad Jankowski <versus@freebsd.org>,
 freebsd-i18n@freebsd.org
X-BeenThere: freebsd-i18n@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: FreeBSD Internationalization Effort <freebsd-i18n.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-i18n>,
 <mailto:freebsd-i18n-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-i18n/>
List-Post: <mailto:freebsd-i18n@freebsd.org>
List-Help: <mailto:freebsd-i18n-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-i18n>,
 <mailto:freebsd-i18n-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 26 Aug 2014 22:16:16 -0000


--sXc4Kmr5FA7axrvy
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

First, thank you very much for your work on this subject; it is highly needed.

Concerning the db format, have you thought about using the new NetBSD constant
database format?

It has a simple API that is way easier to use, the db format is endian-safe,
and the final file is smaller than the equivalent in bdb format.

Lots of areas of FreeBSD could benefit from using this cdb format as well,
IMHO.

regards,
Bapt

--sXc4Kmr5FA7axrvy
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iEYEARECAAYFAlP9ByoACgkQ8kTtMUmk6Ez2IACgjTEpHU5zDDx4IdA99j7/O1Ty
KT0AnjcnBEstTI1ZjNe8yurWOur1fi3l
=taUl
-----END PGP SIGNATURE-----

--sXc4Kmr5FA7axrvy--

From owner-freebsd-i18n@FreeBSD.ORG  Tue Aug 26 23:17:40 2014
Return-Path: <owner-freebsd-i18n@FreeBSD.ORG>
Delivered-To: freebsd-i18n@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 6BFF8D2A
 for <freebsd-i18n@freebsd.org>; Tue, 26 Aug 2014 23:17:40 +0000 (UTC)
Received: from nm17-vm1.bullet.mail.bf1.yahoo.com
 (nm17-vm1.bullet.mail.bf1.yahoo.com [98.139.213.55])
 by mx1.freebsd.org (Postfix) with ESMTP id 1A69C3EFD
 for <freebsd-i18n@freebsd.org>; Tue, 26 Aug 2014 23:17:39 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s2048;
 t=1409095052; bh=AaEvnKkmGqLM3zNYd0bc/iD94kjsBKCiZVWdxKNX8ic=;
 h=Received:Received:Received:X-Yahoo-Newman-Id:X-Yahoo-Newman-Property:X-YMail-OSG:X-Yahoo-SMTP:Message-ID:Date:From:User-Agent:MIME-Version:To:CC:Subject:References:In-Reply-To:Content-Type:Content-Transfer-Encoding:From:Subject;
 b=turmXWapLknz59OM/vyTcbisxwK+cT+we6SPVM42c5Y4VQ+o/OnxgDgi9kGymEhiOGjABd4LZWXNuhD5dGbg3D1M1vSrvDtYV3DPa2OeDsQGmgqWsUfgpi8fQWLAwKWz0GvcM9XKWbSmcuaBlwAVkgmj97W81ZtgJlR9BnmBTajJi7KpSnfwv+Rh+C9r9GWdPt04Zqwq1j1iqzxmp6MjbO7FTzprDcnnZVWw+QEaZW/rAPpJPVQbcl2hCkMFXlJYidy4Zpr/Yl7RXMAuKBMIpkntYHeWVGpE5xkO2G6aMBkBwyIFmw7CpQ0++N7KYLqDgJzGBxPGqWhl2nlJRk1Xxg==
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s2048; d=yahoo.com;
 b=WeEHAzoYnG5rDKI/vMCD/y4YbUAAhmjxqoy5XDNU0HYAJkKxnR3KqS+UgPIPm2fSak65KoxlKNy/kUeyM+/KbVYud3W8Nd791iia2B2fSnNycuFpd2vXuDtJMf9jEpqdImsaMSntQ3uMkofWtSpcXzSL9CdCwVJsQBq89V676xHRH5AzSujxiGvwDRTmZNhLAjholMaiRA/G+LgZqNPg6xBJrC7EgJxURBPB2QXZLSYAZmVSgbn86ONdH03cfnr3n5jeZA/XhLDtlcTRW/dXUYgGp7M7rUwkkwLUbM5AauKZ8ovQJWQa4kQ+nKmbU547YwiNoQ9oaMCGr05MHx1z3g==;
Received: from [98.139.215.143] by nm17.bullet.mail.bf1.yahoo.com with NNFMP;
 26 Aug 2014 23:17:32 -0000
Received: from [68.142.230.69] by tm14.bullet.mail.bf1.yahoo.com with NNFMP;
 26 Aug 2014 23:17:32 -0000
Received: from [127.0.0.1] by smtp226.mail.bf1.yahoo.com with NNFMP;
 26 Aug 2014 23:17:32 -0000
X-Yahoo-Newman-Id: 116492.29038.bm@smtp226.mail.bf1.yahoo.com
X-Yahoo-Newman-Property: ymail-3
X-YMail-OSG: iGE.ZTAVM1l3c8F6nX2XuCSeGcDmwdS3i_Uc3sRrywgW6tb
 Z_wY2XEiG5yCFwqyhTxZZHy0irdNX4F31Jwq6CzEzV8.GVfCfs0Z.NOndtTT
 GsrZzS3eBiFJwrt2FyW5Dvt9PKuwCSpU.cPWeuCFJwvDpND5sYdJi2zYyJT.
 CiGn78PpoYNiLqcJLjSjjdj8Rygqygc_OhPyPZ94YuMbN3PtJRIycOC8MjS8
 VD7U4EUbft.CpThnGm_nJD7UXtOdeJD6lB3CcJkzAIalFfOtDXwBuwVb.q.q
 qXS.afV5MS2QuY89Exx87lbQ3F.KXCNuzjXPtABM8yvlo5VwGskjD4n2Tj5f
 xijQQBEaJuTQra_CVWOW.pGZ6LJIZNAij.N2drb82eK9Cx5mgW59k1AfuulI
 UK3RL2ED4JdgHGL50e93KiLTmcccjSuot9DylUkuWeF6sHALrnioGdY_b8a9
 WRHj9nfdwWLp5RKag3A0WwNjAeimjqdX4TIEuhgsBF18e5dQXQWeHD3WTyv7
 InVfEPnT2SFnGe1o.0Ii7o1Nk_eU2mWtXgRPMctzMgNxwJWGbfxLPAF_jDaL
 3gZaW6aIhdN2JweKYO11sHnRUJYZsLub2wVFVaamf.LA7Z8rqucwdKQE.sef
 j7YeWf9AWOxnIwLX9Ijxn8gSrdlLWs5bZvqKdZ9fKDDl3ImyA1SOcm8KNrPa
 pcCAFEWcMtrd2NCMG2980HwylgjWrqhZnKb7M.CNxSCUR0MCWH96NwNN_bss
 3EvJCSah4d1EHZpQr8dS.t1HfZJ63bZ9DJJqPW25UbMcdb8QLkNNfl5cid0q
 3xzWXJsBNajmUAjWKGtWw7dSl8OjIZJOMV3C_6bweml48O48smBfW2TBNKVU
 g55av_T6AqHeTqeMucTRUzBPEStJ9y6.GMMs-
X-Yahoo-SMTP: xcjD0guswBAZaPPIbxpWwLcp9Unf
Message-ID: <53FD1599.7040708@freebsd.org>
Date: Tue, 26 Aug 2014 18:17:45 -0500
From: Pedro Giffuni <pfg@freebsd.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:24.0) Gecko/20100101 Thunderbird/24.6.0
MIME-Version: 1.0
To: Baptiste Daroussin <bapt@FreeBSD.org>, ghostmansd@gmail.com
Subject: Re: Report #9: Unicode support
References: <CAMqzjesx=uhUzmTEJEq8zoxkhWXBtYOXVXQ1bmiTiEw0=-gF0w@mail.gmail.com>
 <20140826221610.GD65120@ivaldir.etoilebsd.net>
In-Reply-To: <20140826221610.GD65120@ivaldir.etoilebsd.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: soc-status@freebsd.org, David Chisnall <theraven@freebsd.org>,
 Konrad Jankowski <versus@freebsd.org>, freebsd-i18n@freebsd.org
X-BeenThere: freebsd-i18n@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: FreeBSD Internationalization Effort <freebsd-i18n.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-i18n>,
 <mailto:freebsd-i18n-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-i18n/>
List-Post: <mailto:freebsd-i18n@freebsd.org>
List-Help: <mailto:freebsd-i18n-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-i18n>,
 <mailto:freebsd-i18n-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 26 Aug 2014 23:17:40 -0000

Hi Baptiste;

On 08/26/14 17:16, Baptiste Daroussin wrote:
> First thank you very much for your work on this subject this is highly needed.
>
> Concerning the db format have you thought about using the new netbsd constant
> database format?
>
> It has simple API way easier to use, the db format is endian safe and final file
> is smaller than equivalent in bdb format.
>
> Lots of areas of FreeBSD could benefit from using this cdb format as well imho.

While here, let me congratulate Dmitry. The Unicode Collation Algorithm is
not something easy/fun to work with.

Indeed both David and Konrad suggested it (or tinycdb). The reason for
going bdb was that we had time constraints and bdb is already in libc.

FWIW, Nexenta kindly re-licensed localedef [1] and their collation support
in Illumos which basically implements their own very efficient format. We
ended up re-using the tools that libc already has to better focus on the
collation part.

Changing it to use NetBSD's cdb support [2] shouldn't be difficult.

As Dmitry noted, there are still details to work out, and we have to run tests
and get the code reviewed, but all in all I am very satisfied with the progress
made in this GSoC.

Best regards,

Pedro.

[1] https://github.com/Nexenta/illumos-nexenta/tree/republish-localedef
[2] http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/cdb/


From owner-freebsd-i18n@FreeBSD.ORG  Wed Aug 27 10:48:34 2014
Return-Path: <owner-freebsd-i18n@FreeBSD.ORG>
Delivered-To: freebsd-i18n@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id C2974547;
 Wed, 27 Aug 2014 10:48:34 +0000 (UTC)
Received: from mail-we0-x234.google.com (mail-we0-x234.google.com
 [IPv6:2a00:1450:400c:c03::234])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 99FAA3080;
 Wed, 27 Aug 2014 10:48:33 +0000 (UTC)
Received: by mail-we0-f180.google.com with SMTP id w61so23678wes.25
 for <multiple recipients>; Wed, 27 Aug 2014 03:48:31 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:reply-to:in-reply-to:references:from:date:message-id
 :subject:to:cc:content-type:content-transfer-encoding;
 bh=Ky5Mvmo/MVtD047GUVKgYLt+vchmDkt9mK4XZjv0wbU=;
 b=cr0+j3JPcDkRVb8sviKNOpT1iClpF8bYedBEjIQPYxJtnRrYi4lHfWeME2+0Fb21gB
 Upcyfk0Uk9TCt0ClopBTi2wtcTRjidro4R0dq+ntov2wHqEYY3PpIT7/B6rux+kk3JPH
 c+2zF4BTUylwRTmsDK3WfRxn87vhXFEm8RQlOZvuCBSgstEQG7Ae8Sd5cKKPwkli1APX
 tpdB9Iy9Zodv0xbysko0c7sraOvKl152q+7tJ1YxYwlqZKcgG5Mq6SnaGCKINX+MZXBG
 CLUsR7qGxNshRTCaiiSh80VLkbYpyfJKVUeuy5/mqqI3zg7pDdKYMUD/qvuwyjbbhGeQ
 X+lQ==
X-Received: by 10.180.92.134 with SMTP id cm6mr28091097wib.72.1409136510007;
 Wed, 27 Aug 2014 03:48:30 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.194.48.9 with HTTP; Wed, 27 Aug 2014 03:48:09 -0700 (PDT)
Reply-To: ghostmansd@gmail.com
In-Reply-To: <53FD1599.7040708@freebsd.org>
References: <CAMqzjesx=uhUzmTEJEq8zoxkhWXBtYOXVXQ1bmiTiEw0=-gF0w@mail.gmail.com>
 <20140826221610.GD65120@ivaldir.etoilebsd.net> <53FD1599.7040708@freebsd.org>
From: Dmitry Selyutin <ghostman.sd@gmail.com>
Date: Wed, 27 Aug 2014 14:48:09 +0400
Message-ID: <CAMqzjesGZmpXgHHvOQqOHzTwZJK=KZNyDaC9QkTX+6j=wpO7zw@mail.gmail.com>
Subject: Re: Report #9: Unicode support
To: Pedro Giffuni <pfg@freebsd.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: Baptiste Daroussin <bapt@freebsd.org>, soc-status@freebsd.org,
 David Chisnall <theraven@freebsd.org>, Konrad Jankowski <versus@freebsd.org>,
 freebsd-i18n@freebsd.org
X-BeenThere: freebsd-i18n@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: FreeBSD Internationalization Effort <freebsd-i18n.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-i18n>,
 <mailto:freebsd-i18n-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-i18n/>
List-Post: <mailto:freebsd-i18n@freebsd.org>
List-Help: <mailto:freebsd-i18n-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-i18n>,
 <mailto:freebsd-i18n-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 27 Aug 2014 10:48:34 -0000

Hi, Pedro, Baptiste,

first of all, thanks for your congratulations and kind words! The
project was really harder than anything I've ever encountered in my
life, but at the same time it was the most interesting one. :-) And it
still is! ;-)

> That is not really uncommon :)
Well, so I can leave it as it is. :-)

> The project does have access to sparc64 machines so if you have some
> self-contained test we can run it for you or we can test it as a routine libc
> test after committing.
Hopefully I can finish it today or in the next two days.

> You never answered my question concerning the fallback options.
Really? I thought that I had answered. :-D Well, I'll try to explain
again. DUCET seems to be a somewhat obsolete collation table, which
can be used more or less successfully with real languages. However, in
the real world it is completely unusable, so ICU and others use the
CLDR collation table, which supports more levels. I started with DUCET
since there was much more information about it, but then I found that
it doesn't fit well, so I switched to CLDR. We have the DUCET table
somewhere in our revisions though; as a fallback option it may still
be useful, so I can restore it if you want.

> Changing it to use the NetBSD's cdb support[1] shouldn't be difficult.
Well, I think I'll do it right after my exams. AFAIK bdb is deprecated
on Linux (though it can still be used as bdb46 or something similar).
I don't know why they did that; it would be great if we could use a
tool that works on different platforms without modifications and tons
of conditional #defines and #undefs.

> It has simple API way easier to use, the db format is endian safe and final file
> is smaller than equivalent in bdb format.
It sounds great!

> I do want to encourage you to go to EuroBSDCon 2014 in Sofia. The
> FreeBSD Foundation will be allocating funds for students that want to go.
> I won't be there (I am a bit far away) but David and other developers will
> likely be.
Well, that depends on whether I pass my exams for the postgraduate
course or not. I'd really like to listen to more experienced
developers and maybe even talk to other people about the work I did,
to better understand the community's opinions.

2014-08-27 3:17 GMT+04:00 Pedro Giffuni <pfg@freebsd.org>:
> Hi Baptiste;
>
>
> On 08/26/14 17:16, Baptiste Daroussin wrote:
>>
>> On Wed, Aug 27, 2014 at 01:08:58AM +0400, Dmitry Selyutin wrote:
>>>
>>> Hello everyone!
>>>
>>> Here are the last news about the Unicode support project[0].
>>> You can always check my repository[1].
>>>
>>> During these days I had hardware problems (my HDD peacefully died), so
>>> development didn't progress so much as before. However, I've
>>> eliminated these problems, so I tried to fix bugs and reorganize the
>>> code as much as possible. Now everything shall compile.
>>>
>>> I decided to use __attribute__((constructor)) and
>>> __attribute__((destructor)), since I don't know if there exist a
>>> better way to open a file once in the startup and closing it when all
>>> routines close. I've found one or two occurrences of this construction
>>> in FreeBSD code; AFAICT it is rather common in clang and gcc, so I
>>> decided to use it. Hopefully it will also allow us to use root
>>> collation database in the embedded systems (if any such system really
>>> needs collation algorithm).
>>>
>>> As you may know we need a tool that can convert collation text files
>>> obtained from unicode.org to new collation database (colldb) format.
>>> There is a version of this tool written in Python
>>> (share/examples/colldb/colldb.py). IIRC we can't use Python when we
>>> have a base system though, so it seems that we need to written such
>>> tool using C language. I was thinking of lex/yacc combo; I've never
>>> tried it, but I think it shouldn't be too hard to write a tool using
>>> it. I'd like to know your opinions about this task.
>>> I've already written a man page (bin/colldb/colldb.1). The only thing
>>> which seems dubious is that I decided to use the same name as for the
>>> library itself (well, it seems I have a lack of imagination). So we
>>> have both colldb.1 and colldb.3 man pages.
>>>
>>> The other thing I'd really like to do is to really force network byte
>>> order in collation database format (I'm sure I've seen a way to do it
>>> in Berkley databases). It's a pity that I have no platform with
>>> big-endian (or even PDP!) byte order. Any help here is highly
>>> appreciated (as well as your thoughts about lex/yacc, i.e. thoughts
>>> whether it fits well to my task).
>>>
>>> Since Google Summer of Code period has passed, I'd like to thank both
>>> my mentors, Pedro and David, who gave me a helping hand during this
>>> project, and especially Konrad Jankowski, who found time to answer my
>>> questions and help me too. Though GSoC is closed, I'd like to stay
>>> with FreeBSD project. First of all, I want to finish and bring to mind
>>> this project: I don't think it's really finished, especially its
>>> testing part, though it seems that new collation algorithm can already
>>> be used. Then I'd like to work in other parts of my project,
>>> especially in internationalization parts. I'd also like to improve my
>>> own library, qc, to provide a rich API for *BSD and POSIX systems,
>>> since I acutely feel the lack of such API. If it is possible to stay
>>> with project, I'd be very happy to do it. :-)
>>>
>>> P.S. Does anyone knows how to get diff between only for my branch
>>> (i.e. for my part of repository)? svn diff -r $FIRST:$LAST seems to
>>> give everything what all FreeBSD's GSoC have done, so I need some
>>> other command. Thanks for your help!
>>>
>>> [0] https://wiki.freebsd.org/SummerOfCode2014/Unicode
>>> [1] https://socsvn.freebsd.org/socsvn/soc2014/ghostmansd
>>>
>> First, thank you very much for your work on this subject; it is highly
>> needed.
>>
>> Concerning the db format, have you thought about using the new NetBSD
>> constant database format?
>>
>> It has a simple API that is much easier to use, the db format is endian
>> safe, and the final file is smaller than the equivalent in bdb format.
>>
>> Lots of areas of FreeBSD could benefit from using this cdb format as well
>> imho.
>
>
> While here, let me congratulate Dmitry. The Unicode Collation Algorithm is
> not something easy/fun to work with.
>
> Indeed both David and Konrad suggested it (or tinycdb). The reason for
> going bdb was that we had time constraints and bdb is already in libc.
>
> FWIW, Nexenta kindly re-licensed localedef [1] and their collation support
> in Illumos which basically implements their own very efficient format. We
> ended up re-using the tools that libc already has to better focus on the
> collation part.
>
> Changing it to use the NetBSD's cdb support [2] shouldn't be difficult.
>
> As Dmitry noted there are still details to work out and we have to run tests
> and get the code reviewed but all in all I am very satisfied with the
> progress in this GSoC.
>
> Best regards,
>
> Pedro.
>
> [1] https://github.com/Nexenta/illumos-nexenta/tree/republish-localedef
> [2] http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/cdb/
>



-- 
With best regards,
Dmitry Selyutin

From owner-freebsd-i18n@FreeBSD.ORG  Wed Aug 27 10:51:24 2014
Return-Path: <owner-freebsd-i18n@FreeBSD.ORG>
Delivered-To: freebsd-i18n@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 328135C5;
 Wed, 27 Aug 2014 10:51:24 +0000 (UTC)
Received: from mail-wg0-x22a.google.com (mail-wg0-x22a.google.com
 [IPv6:2a00:1450:400c:c00::22a])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 0D58B3118;
 Wed, 27 Aug 2014 10:51:22 +0000 (UTC)
Received: by mail-wg0-f42.google.com with SMTP id l18so26061wgh.25
 for <multiple recipients>; Wed, 27 Aug 2014 03:51:21 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:reply-to:in-reply-to:references:from:date:message-id
 :subject:to:cc:content-type:content-transfer-encoding;
 bh=xs76FovOPQA6k17V9jDJw/utYC9/mTQ60nnu07rrW6A=;
 b=E/ddpuRoflzeJ9ReuzKCnabiaEVnwS5LHyY5ZH1iWNFxv90SwBrf6n/YUtZeT1qmh8
 7WchhmhGnEC9Dnct57EEmHKClntGqDnBHUkTVteqfRnvlvunODYH0vY1buDraDmbVgks
 QqVJ8973EFYs0uel6uhhcx4PEeVKHYdw7LKyd3HoHq5I4F8lJgr1PZnZLsvoS2NjDq4T
 OG4epC8uriVesasH3kP9OBy0NBWmURAjD9ytr+vqmMs2BQxOBimFRk6r0h9a/w6b4iUR
 9Hpvqr0jv6Qdrmwh5xgl2cGH8ee46M2rhczmT/TquZNZuNicaYTXkQPCV7sTmsC8OPef
 0i9A==
X-Received: by 10.180.92.134 with SMTP id cm6mr28112076wib.72.1409136681311;
 Wed, 27 Aug 2014 03:51:21 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.194.48.9 with HTTP; Wed, 27 Aug 2014 03:51:01 -0700 (PDT)
Reply-To: ghostmansd@gmail.com
In-Reply-To: <CAMqzjesGZmpXgHHvOQqOHzTwZJK=KZNyDaC9QkTX+6j=wpO7zw@mail.gmail.com>
References: <CAMqzjesx=uhUzmTEJEq8zoxkhWXBtYOXVXQ1bmiTiEw0=-gF0w@mail.gmail.com>
 <20140826221610.GD65120@ivaldir.etoilebsd.net> <53FD1599.7040708@freebsd.org>
 <CAMqzjesGZmpXgHHvOQqOHzTwZJK=KZNyDaC9QkTX+6j=wpO7zw@mail.gmail.com>
From: Dmitry Selyutin <ghostman.sd@gmail.com>
Date: Wed, 27 Aug 2014 14:51:01 +0400
Message-ID: <CAMqzjeuUrpOfkX41bTY62NRNap0NetCKzTpSv5JaUC4Qvh59sA@mail.gmail.com>
Subject: Re: Report #9: Unicode support
To: Pedro Giffuni <pfg@freebsd.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: Baptiste Daroussin <bapt@freebsd.org>, soc-status@freebsd.org,
 David Chisnall <theraven@freebsd.org>, Konrad Jankowski <versus@freebsd.org>,
 freebsd-i18n@freebsd.org
X-BeenThere: freebsd-i18n@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: FreeBSD Internationalization Effort <freebsd-i18n.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-i18n>,
 <mailto:freebsd-i18n-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-i18n/>
List-Post: <mailto:freebsd-i18n@freebsd.org>
List-Help: <mailto:freebsd-i18n-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-i18n>,
 <mailto:freebsd-i18n-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 27 Aug 2014 10:51:24 -0000

I've just seen EuroBSDCon's calendar page and it seems that it is
no longer possible to join (i.e. I missed the application deadline).[0]
Well, maybe next year? :-)

2014-08-27 14:48 GMT+04:00 Dmitry Selyutin <ghostman.sd@gmail.com>:
> Hi, Pedro, Baptiste,
>
> first of all, thanks for your congratulations and kind words! The
> project was really harder than anything I've ever faced in my life, but
> at the same time it was the most interesting one. :-) And it still
> is! ;-)
>
>> That is not really uncommon :)
> Well, so I can leave it as it is. :-)
>
>> The project does have access to sparc64 machines so if you have some
>> self-contained test we can run it for you or we can test it as a routine
>> libc test after committing.
> Hopefully I can finish it today or in the next two days.
>
>> You never answered my question concerning the fallback options.
> Really? I thought that I answered. :-D Well, I'll try to explain
> again. DUCET seems to be a somewhat obsolete collation table, which can
> be used more or less successfully with real languages. In the real
> world, however, it is completely unusable, so ICU and others use the CLDR
> collation table, which supports more levels. I started with DUCET since
> there was much more information about it, but then I found that it doesn't
> fit well, so I switched to CLDR. We have the DUCET table somewhere in our
> revisions though; as a fallback option, it still may be useful, so I
> can restore it if you want.
>
>> Changing it to use the NetBSD's cdb support [2] shouldn't be difficult.
> Well, I think I'll do it right after exams. bdb AFAIK is deprecated
> on Linux (though it can be used as bdb46 or something similar). I
> don't know why they did that; it would be great if we
> could use a tool which can be used on different platforms without
> modifications and tons of conditional defines and undefs.
>
>> It has simple API way easier to use, the db format is endian safe and final
>> file is smaller than equivalent in bdb format.
> It sounds great!
>
>> I do want to encourage you to go to EuroBSDCon 2014 in Sofia. The
>> FreeBSD Foundation will be allocating funds for students that want to go.
>> I won’t be there (I am a bit far away) but David and other developers will
>> likely be.
> Well, that depends on whether I pass my exams for the postgraduate
> course or not. I'd really like to listen to more experienced
> developers and maybe even talk to other people about the work I did
> to better understand the community's opinions.
>
> 2014-08-27 3:17 GMT+04:00 Pedro Giffuni <pfg@freebsd.org>:
>> Hi Baptiste;
>>
>>
>> On 08/26/14 17:16, Baptiste Daroussin wrote:
>>>
>>> On Wed, Aug 27, 2014 at 01:08:58AM +0400, Dmitry Selyutin wrote:
>>>>
>>>> Hello everyone!
>>>>
>>>> Here is the latest news about the Unicode support project[0].
>>>> You can always check my repository[1].
>>>>
>>>> During these days I had hardware problems (my HDD peacefully died), so
>>>> development didn't progress as much as before. However, I've
>>>> eliminated these problems, so I tried to fix bugs and reorganize the
>>>> code as much as possible. Now everything should compile.
>>>>
>>>> I decided to use __attribute__((constructor)) and
>>>> __attribute__((destructor)), since I don't know whether there exists a
>>>> better way to open a file once at startup and close it when all
>>>> routines have finished. I've found one or two occurrences of this
>>>> construction in FreeBSD code; AFAICT it is rather common in clang and
>>>> gcc, so I decided to use it. Hopefully it will also allow us to use the
>>>> root collation database in embedded systems (if any such system really
>>>> needs the collation algorithm).
>>>>
>>>> As you may know, we need a tool that can convert collation text files
>>>> obtained from unicode.org to the new collation database (colldb) format.
>>>> There is a version of this tool written in Python
>>>> (share/examples/colldb/colldb.py). IIRC we can't use Python in the
>>>> base system though, so it seems that we need to write such a
>>>> tool in C. I was thinking of the lex/yacc combo; I've never
>>>> tried it, but I think it shouldn't be too hard to write a tool using
>>>> it. I'd like to know your opinions about this task.
>>>> I've already written a man page (bin/colldb/colldb.1). The only thing
>>>> which seems dubious is that I decided to use the same name as for the
>>>> library itself (well, it seems I have a lack of imagination). So we
>>>> have both colldb.1 and colldb.3 man pages.
>>>>
>>>> The other thing I'd really like to do is to force network byte
>>>> order in the collation database format (I'm sure I've seen a way to do
>>>> it in Berkeley databases). It's a pity that I have no platform with
>>>> big-endian (or even PDP!) byte order. Any help here is highly
>>>> appreciated (as well as your thoughts about lex/yacc, i.e. whether
>>>> it fits my task well).
>>>>
>>>> Since the Google Summer of Code period has passed, I'd like to thank
>>>> both my mentors, Pedro and David, who gave me a helping hand during this
>>>> project, and especially Konrad Jankowski, who found time to answer my
>>>> questions and help me too. Though GSoC is closed, I'd like to stay
>>>> with the FreeBSD project. First of all, I want to bring this project
>>>> to completion: I don't think it's really finished, especially its
>>>> testing part, though it seems that the new collation algorithm can
>>>> already be used. Then I'd like to work on other parts of my project,
>>>> especially the internationalization parts. I'd also like to improve my
>>>> own library, qc, to provide a rich API for *BSD and POSIX systems,
>>>> since I acutely feel the lack of such an API. If it is possible to stay
>>>> with the project, I'd be very happy to do it. :-)
>>>>
>>>> P.S. Does anyone know how to get a diff for my branch only
>>>> (i.e. for my part of the repository)? svn diff -r $FIRST:$LAST seems to
>>>> give everything that all of FreeBSD's GSoC students have done, so I
>>>> need some other command. Thanks for your help!
>>>>
>>>> [0] https://wiki.freebsd.org/SummerOfCode2014/Unicode
>>>> [1] https://socsvn.freebsd.org/socsvn/soc2014/ghostmansd
>>>>
>>> First, thank you very much for your work on this subject; it is highly
>>> needed.
>>>
>>> Concerning the db format, have you thought about using the new NetBSD
>>> constant database format?
>>>
>>> It has a simple API that is much easier to use, the db format is endian
>>> safe, and the final file is smaller than the equivalent in bdb format.
>>>
>>> Lots of areas of FreeBSD could benefit from using this cdb format as well
>>> imho.
>>
>>
>> While here, let me congratulate Dmitry. The Unicode Collation Algorithm is
>> not something easy/fun to work with.
>>
>> Indeed both David and Konrad suggested it (or tinycdb). The reason for
>> going bdb was that we had time constraints and bdb is already in libc.
>>
>> FWIW, Nexenta kindly re-licensed localedef [1] and their collation support
>> in Illumos which basically implements their own very efficient format. We
>> ended up re-using the tools that libc already has to better focus on the
>> collation part.
>>
>> Changing it to use the NetBSD's cdb support [2] shouldn't be difficult.
>>
>> As Dmitry noted there are still details to work out and we have to run tests
>> and get the code reviewed but all in all I am very satisfied with the
>> progress in this GSoC.
>>
>> Best regards,
>>
>> Pedro.
>>
>> [1] https://github.com/Nexenta/illumos-nexenta/tree/republish-localedef
>> [2] http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/cdb/
>>
>
>
>
> --
> With best regards,
> Dmitry Selyutin
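
For readers unfamiliar with the constructor/destructor attributes mentioned
in Report #9, here is a minimal sketch of the pattern as gcc and clang
support it. It is not the libcolldb source: COLLDB_PATH is a placeholder
(the thread does not say where the database lives), and colldb_startup,
colldb_shutdown, colldb_handle and colldb_startup_calls are hypothetical
names introduced only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Placeholder path: the real database location is not stated in the thread. */
#define COLLDB_PATH "/dev/null"

static FILE *colldb_file;        /* shared handle, NULL if the open failed */
static int colldb_startup_calls; /* counts constructor runs, for illustration */

/* Runs once before main() when built with gcc or clang. */
__attribute__((constructor))
static void
colldb_startup(void)
{
	colldb_startup_calls++;
	colldb_file = fopen(COLLDB_PATH, "rb");
}

/* Runs once at normal program exit. */
__attribute__((destructor))
static void
colldb_shutdown(void)
{
	if (colldb_file != NULL) {
		fclose(colldb_file);
		colldb_file = NULL;
	}
}

/* Accessor for library routines; may return NULL if the open failed. */
FILE *
colldb_handle(void)
{
	assert(colldb_startup_calls > 0);
	return colldb_file;
}
```

The attributes are compiler extensions rather than standard C, which is the
portability trade-off raised in the report; callers still have to cope with
a NULL handle because a constructor has no way to report failure.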



-- 
With best regards,
Dmitry Selyutin

From owner-freebsd-i18n@FreeBSD.ORG  Wed Aug 27 15:28:02 2014
Return-Path: <owner-freebsd-i18n@FreeBSD.ORG>
Delivered-To: freebsd-i18n@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id A4C11C54
 for <freebsd-i18n@freebsd.org>; Wed, 27 Aug 2014 15:28:02 +0000 (UTC)
Received: from nm10-vm0.bullet.mail.bf1.yahoo.com
 (nm10-vm0.bullet.mail.bf1.yahoo.com [98.139.213.147])
 by mx1.freebsd.org (Postfix) with ESMTP id 3E9C030FA
 for <freebsd-i18n@freebsd.org>; Wed, 27 Aug 2014 15:28:02 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s2048;
 t=1409153274; bh=Me8NL3cQAZHXGM5FkxOvDZHQ8eMG0v6cmBpF9FcL+20=;
 h=Received:Received:Received:X-Yahoo-Newman-Id:X-Yahoo-Newman-Property:X-YMail-OSG:X-Yahoo-SMTP:Message-ID:Date:From:User-Agent:MIME-Version:To:CC:Subject:References:In-Reply-To:Content-Type:Content-Transfer-Encoding:From:Subject;
 b=B/RrSfAIMQXsoO0CMuVdgNPqI8TcpU6JuGzAFZfRZiQxhg4yfXUBsYESDd/u7Fc3M8xnSK1Gqof3xrb1PNASqG/0Yeo70JsWoAvpCibw1JqSG2NUCS3N6b/SvLHtgup9LxwSJXEkLZltQZfQf6534tn6APZbqgTdDl/RFofHzGXQUIY7dWE4JzocOLiJ9s1YuUWFK7sDrWRmZ/hECSR/HYfnQxof9ZFYnPsGBfZH8+prXT3xSe0sUOGUXSFC24GUXgZINITmgoM+bRB+B0l7Ct0eM4VCrnmA6suDawBnoEx/NgA9htvhyTN/tR3Zquv+eXEz5kY3+W7RHHp6yaME6w==
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s2048; d=yahoo.com;
 b=qjoNU/sQeUdcX+eRSRPwCV6EP7MrPpthstQ3n/CSCzUuciFnfjaQpAWYqqLyO0bidTv14hH4sD4GN6RVhPHCSmrI5hRMo+aAYP2ZlwWogT1JE6q+k45q8/PmGyo6JNlMUk/CJL7viULLanaVm4caPgJm4L/omu84XAY/mAE7AD2hT90m5jbu+ABUzZAD6WDKsWtMgX4Vy/iAm2RNbgK+i5OXxih1+MSJv4Nmy0YczcRNTIvqoY94iFE1yArf2+ms54dC1qDYHeT9AldQm1cixC9yn5ZhjWl0a2RNsBnSkpun6EjCWF47ttRqvOu5BkxfKsqT/6Oi5TfZUy4m0gXNZA==;
Received: from [66.196.81.170] by nm10.bullet.mail.bf1.yahoo.com with NNFMP;
 27 Aug 2014 15:27:54 -0000
Received: from [98.139.211.198] by tm16.bullet.mail.bf1.yahoo.com with NNFMP;
 27 Aug 2014 15:27:54 -0000
Received: from [127.0.0.1] by smtp207.mail.bf1.yahoo.com with NNFMP;
 27 Aug 2014 15:27:54 -0000
X-Yahoo-Newman-Id: 879918.25685.bm@smtp207.mail.bf1.yahoo.com
X-Yahoo-Newman-Property: ymail-3
X-YMail-OSG: K9VwH5kVM1nnsjGa3mRli6RRmWaA6ztzGinNhN9vOIg.faN
 JblK24uJyCv034snOhc7N6jnheVsP2tX4.afv_sooDzJ2_Gxah9oM.IfEMq.
 xbThr21Zaf50yg88JAe_WRgM.q_dspNb.KwFZxuDlH2LtP2AnNrGA.oQoqsY
 KR1_Cjm3JRtzvLT99QycwyqmjIURC3TCnPcrLS_0pgbQKZeXRY.4f5gzpzyk
 Ie1G23R5Slyaz863sQ0Y85CBEvcqPDQCUrHkgwvFyMb88JSBT_GObSGDVbg2
 0RM0gGkZIz.sqUaHrmMNAiQ686XZfX8QZQgoFnrwxgHKterCnZN8yluHgU_X
 AQLi9g98VJZmMookhFBcogEqlbQbsokAsNBBZd7ib7Tn87p.qYZ6porkRUeL
 Q9Vj3ukBkNUpCPDMkWBsxFyRjRGrqN9LT4Ong5g_j4xZu38t1uW1G2ggRsrv
 cIMdXBoRcQ6kAgTK6ZFurLrzc4pcPLBrRkcwaV2n4hfbVsrVYhrL6wPsAJwG
 GuWH8OJZrokCfZtBLw4aW6sF0.40K2ohqEg--
X-Yahoo-SMTP: xcjD0guswBAZaPPIbxpWwLcp9Unf
Message-ID: <53FDF90B.4030400@freebsd.org>
Date: Wed, 27 Aug 2014 10:28:11 -0500
From: Pedro Giffuni <pfg@freebsd.org>
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64;
 rv:24.0) Gecko/20100101 Thunderbird/24.6.0
MIME-Version: 1.0
To: ghostmansd@gmail.com
Subject: Re: Report #9: Unicode support
References: <CAMqzjesx=uhUzmTEJEq8zoxkhWXBtYOXVXQ1bmiTiEw0=-gF0w@mail.gmail.com>
 <20140826221610.GD65120@ivaldir.etoilebsd.net> <53FD1599.7040708@freebsd.org>
 <CAMqzjesGZmpXgHHvOQqOHzTwZJK=KZNyDaC9QkTX+6j=wpO7zw@mail.gmail.com>
 <CAMqzjeuUrpOfkX41bTY62NRNap0NetCKzTpSv5JaUC4Qvh59sA@mail.gmail.com>
In-Reply-To: <CAMqzjeuUrpOfkX41bTY62NRNap0NetCKzTpSv5JaUC4Qvh59sA@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Baptiste Daroussin <bapt@freebsd.org>, soc-status@freebsd.org,
 David Chisnall <theraven@freebsd.org>, Konrad Jankowski <versus@freebsd.org>,
 freebsd-i18n@freebsd.org
X-BeenThere: freebsd-i18n@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: FreeBSD Internationalization Effort <freebsd-i18n.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-i18n>,
 <mailto:freebsd-i18n-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-i18n/>
List-Post: <mailto:freebsd-i18n@freebsd.org>
List-Help: <mailto:freebsd-i18n-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-i18n>,
 <mailto:freebsd-i18n-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 27 Aug 2014 15:28:02 -0000


On 08/27/14 05:51, Dmitry Selyutin wrote:
> ...
>>> You never answered my question concerning the fallback options.
>> Really? I thought that I answered. :-D Well, I'll try to explain
>> again. DUCET seems to be a somewhat obsolete collation table, which can
>> be used more or less successfully with real languages. In the real
>> world, however, it is completely unusable, so ICU and others use the CLDR
>> collation table, which supports more levels. I started with DUCET since
>> there was much more information about it, but then I found that it doesn't
>> fit well, so I switched to CLDR. We have the DUCET table somewhere in our
>> revisions though; as a fallback option, it still may be useful, so I
>> can restore it if you want.


I don't see DUCET ever being used, but we are setting the old
algorithm as a fallback for CLDR.

I was just wondering how DUCET compares to the existing
algorithm. Given that DUCET is in the standard and that you
already implemented it, I thought it would be a better fallback
than the old code. It's your call though.

Pedro.

From owner-freebsd-i18n@FreeBSD.ORG  Thu Aug 28 20:23:41 2014
Return-Path: <owner-freebsd-i18n@FreeBSD.ORG>
Delivered-To: freebsd-i18n@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id D05BEB2E;
 Thu, 28 Aug 2014 20:23:41 +0000 (UTC)
Received: from mail-wg0-x232.google.com (mail-wg0-x232.google.com
 [IPv6:2a00:1450:400c:c00::232])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id CF6B81C48;
 Thu, 28 Aug 2014 20:23:40 +0000 (UTC)
Received: by mail-wg0-f50.google.com with SMTP id x12so1270708wgg.33
 for <multiple recipients>; Thu, 28 Aug 2014 13:23:39 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:reply-to:from:date:message-id:subject:to:content-type;
 bh=Y0XI8uGyRfwy1Gw5foSW4L/pgNWAtHXLsY0Bw+L28fk=;
 b=mISzyEKaBe5+JYY8Cl4CxrEiUI6U6/hLTiWSiJ1YwkQZhfk+FpqmfPyq/+WQYG4n6E
 241ohD223b+dvylM6cX5/C+reHc214JjgnugBgalMBvH8uiT/lU92X5otsb7WberrA/c
 1fBFGF8OClOB0cI3o3/9MNnvXndUEKU03PXTJ6c06QAzTWmQHdJ/KBSD3pxUo7xYkCJp
 FS6ZyyLL1NJ/6Kjri74NKeuKpZu0AA+9BVtSolymt4ZfnZgOPWJ0Ah91vU5JBYX6ftCZ
 tRhbT4I7DyGPag2ePQLWeor/i9HlRTXzVBEW3ZrM9b5ZXuSo2jEAAG3zk+zkUvkpEpxm
 yHJw==
X-Received: by 10.180.92.134 with SMTP id cm6mr9245601wib.72.1409257419141;
 Thu, 28 Aug 2014 13:23:39 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.194.48.9 with HTTP; Thu, 28 Aug 2014 13:23:18 -0700 (PDT)
Reply-To: ghostmansd@gmail.com
From: Dmitry Selyutin <ghostman.sd@gmail.com>
Date: Fri, 29 Aug 2014 00:23:18 +0400
Message-ID: <CAMqzjev6ZcUTJ+fJLOPnfOt5La-Zg8OUm1tPZh4g4U+N+TPmSg@mail.gmail.com>
Subject: Report #10: Unicode support
To: soc-status@freebsd.org, Pedro Giffuni <pfg@freebsd.org>, 
 David Chisnall <theraven@freebsd.org>, Konrad Jankowski <versus@freebsd.org>,
 freebsd-i18n@freebsd.org
Content-Type: text/plain; charset=UTF-8
X-BeenThere: freebsd-i18n@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: FreeBSD Internationalization Effort <freebsd-i18n.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-i18n>,
 <mailto:freebsd-i18n-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-i18n/>
List-Post: <mailto:freebsd-i18n@freebsd.org>
List-Help: <mailto:freebsd-i18n-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-i18n>,
 <mailto:freebsd-i18n-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 28 Aug 2014 20:23:41 -0000

Hello everyone!

I've written the colldb tool in C, so one may now use colldb
after compiling the sources. I decided not to use lex/yacc here since it
seemed to be overkill for such a simple task, so all you need is
C plus libcolldb (written in C too). I've also written the colldb.1 manual
page and fixed the libcolldb build (previously one could not compile it
without UNICODE=YES in make.conf).
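
To illustrate why a hand-written parser can be enough here, the following
is a minimal sketch of parsing one entry without lex/yacc. It is not the
colldb source. It assumes input lines in the style of the unicode.org
allkeys.txt file, handles only single-code-point entries, and the names
coll_elem, coll_entry, parse_line and MAX_ELEMS are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ELEMS 18	/* arbitrary cap for this sketch */

struct coll_elem {
	int variable;		/* 1 if the element starts with '*' */
	unsigned weights[3];	/* primary, secondary, tertiary */
};

struct coll_entry {
	unsigned long codepoint;
	size_t count;
	struct coll_elem elems[MAX_ELEMS];
};

/*
 * Parse one line such as
 *   0041 ; [.1C47.0020.0008] # LATIN CAPITAL LETTER A
 * Returns 0 on success and -1 for comments, blank lines, directives,
 * contractions (several code points) and malformed input.
 */
static int
parse_line(const char *line, struct coll_entry *out)
{
	const char *p, *stop;
	char *end;
	char kind;

	assert(line != NULL && out != NULL);
	out->codepoint = strtoul(line, &end, 16);
	if (end == line)
		return -1;		/* no leading hex code point */
	for (p = end; *p == ' ' || *p == '\t'; p++)
		continue;
	if (*p != ';')
		return -1;		/* contraction or garbage */
	stop = strchr(p, '#');		/* ignore brackets inside the comment */
	for (out->count = 0; (p = strchr(p, '[')) != NULL; p++) {
		struct coll_elem *e;

		if (stop != NULL && p > stop)
			break;
		if (out->count == MAX_ELEMS)
			return -1;
		e = &out->elems[out->count];
		if (sscanf(p, "[%c%x.%x.%x]", &kind, &e->weights[0],
		    &e->weights[1], &e->weights[2]) != 4)
			return -1;
		if (kind != '.' && kind != '*')
			return -1;
		e->variable = (kind == '*');
		out->count++;
	}
	return out->count > 0 ? 0 : -1;
}
```

A real converter would also need contractions, directives such as the
version line, and stricter error reporting, but the grammar stays flat
enough that strtoul and sscanf cover it.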

Due to my exams (they start in a few days) I have to take a
break to prepare. Thanks to everyone who helped me during this
summer! I'd like to thank the FreeBSD community: guys, you are amazing,
it's really pleasant to work with you! I'm going to continue my work
after September 20th, so stay tuned! ;-)

P.S. Repository is here as usual:
https://socsvn.freebsd.org/socsvn/soc2014/ghostmansd

-- 
With best regards,
Dmitry Selyutin