From owner-freebsd-i18n@FreeBSD.ORG Tue Aug 26 21:09:20 2014
Reply-To: ghostmansd@gmail.com
From: Dmitry Selyutin
Date: Wed, 27 Aug 2014 01:08:58 +0400
Subject: Report #9: Unicode support
To: soc-status@freebsd.org, Pedro Giffuni, David Chisnall, Konrad Jankowski, freebsd-i18n@freebsd.org
List-Id: FreeBSD Internationalization Effort

Hello everyone!

Here is the latest news about the Unicode support project[0].
You can always check my repository[1].

During these days I had hardware problems (my HDD peacefully died), so
development didn't progress as much as before. However, I've
eliminated these problems, so I tried to fix bugs and reorganize the
code as much as possible. Now everything should compile.

I decided to use __attribute__((constructor)) and
__attribute__((destructor)), since I don't know if there exists a
better way to open a file once at startup and close it when all
routines have finished. I've found one or two occurrences of this
construct in FreeBSD code; AFAICT it is rather common in clang and gcc,
so I decided to use it. Hopefully it will also allow us to use the root
collation database on embedded systems (if any such system really
needs the collation algorithm).

As you may know, we need a tool that can convert collation text files
obtained from unicode.org to the new collation database (colldb) format.
There is a version of this tool written in Python
(share/examples/colldb/colldb.py). IIRC we can't use Python in the
base system though, so it seems that we need to write such a
tool in C. I was thinking of the lex/yacc combo; I've never
tried it, but I think it shouldn't be too hard to write a tool using
it. I'd like to know your opinions about this task.
I've already written a man page (bin/colldb/colldb.1). The only thing
which seems dubious is that I decided to use the same name as for the
library itself (well, it seems I have a lack of imagination). So we
have both colldb.1 and colldb.3 man pages.

The other thing I'd really like to do is to enforce network byte
order in the collation database format (I'm sure I've seen a way to do
it in Berkeley databases). It's a pity that I have no platform with
big-endian (or even PDP!) byte order. Any help here is highly
appreciated (as well as your thoughts about lex/yacc, i.e. whether it
fits my task well).

Since the Google Summer of Code period has ended, I'd like to thank both
my mentors, Pedro and David, who gave me a helping hand during this
project, and especially Konrad Jankowski, who found time to answer my
questions and help me too. Though GSoC is over, I'd like to stay
with the FreeBSD project. First of all, I want to finish and polish
this project: I don't think it's really finished, especially its
testing part, though it seems that the new collation algorithm can
already be used. Then I'd like to work on other parts of my project,
especially the internationalization parts. I'd also like to improve my
own library, qc, to provide a rich API for *BSD and POSIX systems,
since I acutely feel the lack of such an API. If it is possible to stay
with the project, I'd be very happy to do it. :-)

P.S. Does anyone know how to get a diff for my branch only
(i.e. for my part of the repository)? svn diff -r $FIRST:$LAST seems to
give everything that all of FreeBSD's GSoC students have done, so I
need some other command. Thanks for your help!
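
The wish above to enforce network byte order in the colldb format can be sketched as follows. This is only an illustration under stated assumptions: the record layout (one 32-bit code point followed by three 16-bit weights), the sample weight values, and the names RECORD, pack_record and unpack_record are hypothetical and are not the actual colldb format. Python is used because the existing converter (share/examples/colldb/colldb.py) is written in it; the "!" prefix of the standard struct module selects network (big-endian) byte order with no padding, whatever the host CPU is.

```python
import struct

# Hypothetical record layout (NOT the real colldb format):
# one 32-bit code point followed by three 16-bit collation weights.
# "!" forces network (big-endian) byte order and standard sizes
# without padding, independent of the host byte order.
RECORD = struct.Struct("!IHHH")


def pack_record(code_point, weights):
    """Serialize one record in network byte order."""
    primary, secondary, tertiary = weights
    return RECORD.pack(code_point, primary, secondary, tertiary)


def unpack_record(data):
    """Parse a record produced by pack_record, on any host."""
    code_point, primary, secondary, tertiary = RECORD.unpack(data)
    return code_point, (primary, secondary, tertiary)


if __name__ == "__main__":
    # Arbitrary sample values, chosen only to make the byte order visible.
    raw = pack_record(0x0041, (0x1234, 0x0020, 0x0008))
    print(raw.hex())           # most significant byte of each field first
    print(unpack_record(raw))  # round-trips to the original values
```

Because the byte order is fixed by the writer, a file generated on a little-endian machine would be read identically on a big-endian one; a C reader could decode the same fields with ntohl()/ntohs().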

[0] https://wiki.freebsd.org/SummerOfCode2014/Unicode
[1] https://socsvn.freebsd.org/socsvn/soc2014/ghostmansd

--
With best regards,
Dmitry Selyutin

From owner-freebsd-i18n@FreeBSD.ORG Tue Aug 26 22:16:16 2014
Date: Wed, 27 Aug 2014 00:16:10 +0200
From: Baptiste Daroussin
To: ghostmansd@gmail.com
Cc: David Chisnall, soc-status@freebsd.org, Pedro Giffuni, Konrad Jankowski, freebsd-i18n@freebsd.org
Subject: Re: Report #9: Unicode support
Message-ID: <20140826221610.GD65120@ivaldir.etoilebsd.net>

First, thank you very much for your work on this subject; it is highly
needed.

Concerning the db format, have you thought about using the new NetBSD
constant database format?

It has a simple API that is way easier to use, the db format is endian
safe, and the final file is smaller than the equivalent in bdb format.

Lots of areas of FreeBSD could benefit from using this cdb format as
well, imho.

regards,
Bapt

From owner-freebsd-i18n@FreeBSD.ORG Tue Aug 26 23:17:40 2014
Message-ID: <53FD1599.7040708@freebsd.org>
Date: Tue, 26 Aug 2014 18:17:45 -0500
From: Pedro Giffuni
To: Baptiste Daroussin, ghostmansd@gmail.com
Cc: soc-status@freebsd.org, David Chisnall, Konrad Jankowski, freebsd-i18n@freebsd.org
Subject: Re: Report #9: Unicode support
References: <20140826221610.GD65120@ivaldir.etoilebsd.net>
In-Reply-To: <20140826221610.GD65120@ivaldir.etoilebsd.net>

Hi Baptiste;

On 08/26/14 17:16, Baptiste Daroussin wrote:
> First thank you very much for your work on this subject this is highly needed.
>
> Concerning the db format have you thought about using the new netbsd constant
> database format?
>
> It has simple API way easier to use, the db format is endian safe and final file
> is smaller than equivalent in bdb format.
>
> Lots of areas of FreeBSD could benefit from using this cdb format as well imho.

While here, let me congratulate Dmitry. The Unicode Collation Algorithm is
not something easy/fun to work with.

Indeed both David and Konrad suggested it (or tinycdb). The reason for
going with bdb was that we had time constraints and bdb is already in libc.

FWIW, Nexenta kindly re-licensed localedef [1] and their collation support
in Illumos, which basically implements their own very efficient format. We
ended up re-using the tools that libc already has to better focus on the
collation part.

Changing it to use NetBSD's cdb support [2] shouldn't be difficult.

As Dmitry noted, there are still details to work out and we have to run
tests and get the code reviewed, but all in all I am very satisfied with
the progress in this GSoC.

Best regards,

Pedro.

[1] https://github.com/Nexenta/illumos-nexenta/tree/republish-localedef
[2] http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/cdb/

From owner-freebsd-i18n@FreeBSD.ORG Wed Aug 27 10:48:34 2014
Reply-To: ghostmansd@gmail.com
In-Reply-To: <53FD1599.7040708@freebsd.org>
References: <20140826221610.GD65120@ivaldir.etoilebsd.net> <53FD1599.7040708@freebsd.org>
From: Dmitry Selyutin
Date: Wed, 27 Aug 2014 14:48:09 +0400
Subject: Re: Report #9: Unicode support
To: Pedro Giffuni
Cc: Baptiste Daroussin, soc-status@freebsd.org, David Chisnall, Konrad Jankowski, freebsd-i18n@freebsd.org

Hi, Pedro, Baptiste,

first of all thanks for your congratulations and kind words! The
project was really harder than anything I've ever met in my life, but
at the same time it was the most interesting one. :-) And still
remains! ;-)

> That is not really uncommon :)
Well, so I can leave it as it is. :-)

> The project does have access to sparc64 machines so if you have some
> self-contained test we can run it for you or we can test it as a routine libc
> test after committing.
Hopefully I can finish it today or in the next two days.

> You never answered my question concerning the fallback options.
Really? I thought that I answered. :-D Well, I'll try to explain
again. DUCET seems to be a somewhat obsolete collation table, which can
be used more or less successfully with real languages. However, in the
real world it is completely unusable, so ICU and others use the CLDR
collation table, which supports more levels. I started with DUCET since
there was much more information about it, but then I found that it
doesn't fit well, so I switched to CLDR. We have the DUCET table
somewhere in our revisions though; as a fallback option, it may still
be useful, so I can restore it if you want.

> Changing it to use the NetBSD's cdb support[1] shouldn't be difficult.
Well, I think I'll do it right after exams. bdb AFAIK is deprecated
on Linux (though it can be used as bdb46 or something similar). I
don't know the reasons why they did such a thing; it would be great if
we could use a tool which can be used on different platforms without
modifications and tons of conditional #define's and #undef's.

> It has simple API way easier to use, the db format is endian safe and final file
> is smaller than equivalent in bdb format.
It sounds great!

> I do want to encourage you to go to EuroBSDCon 2014 in Sofia. The
> FreeBSD Foundation will be allocating funds for students that want to go.
> I won’t be there (I am a bit far away) but David and other developers will
> likely be.
Well, that depends on whether I pass my exams for the postgraduate
course or not. I'd really like to listen to more experienced
developers and maybe even talk to other people about the work I did,
to better understand the community's opinions.

--
With best regards,
Dmitry Selyutin

From owner-freebsd-i18n@FreeBSD.ORG Wed Aug 27 10:51:24 2014
Reply-To: ghostmansd@gmail.com
References: <20140826221610.GD65120@ivaldir.etoilebsd.net> <53FD1599.7040708@freebsd.org>
From: Dmitry Selyutin
Date: Wed, 27 Aug 2014 14:51:01 +0400
Subject: Re: Report #9: Unicode support
To: Pedro Giffuni
Cc: Baptiste Daroussin, soc-status@freebsd.org, David Chisnall, Konrad Jankowski, freebsd-i18n@freebsd.org

I've just seen EuroBSDCon's calendar page and it seems that it is
impossible to join it (i.e. I missed the application deadline).[0]
Well, maybe next year? :-)

2014-08-27 14:48 GMT+04:00 Dmitry Selyutin :
> Hi, Pedro, Baptiste,
>
> first of all thanks for your congratulations and kind words! The
> project was really harder that anything I've ever met in my life, but
> at the same time it was the most interesting one. :-) And still
> remains! ;-)
>
>> That is not really uncommon :)
> Well, so I can leave it as it is. :-)
>
>> The project does have access to sparc64 machines so if you have some
>> self-contained test we can run it for you or we can test it as a routine libc
>> test after committing.
> Hopefully I can finish it today or in the next two days.
>
>> You never answered my question concerning the fallback options.
> Really? I thought that I answered. :-D Well, I'll try to explain
> again. DUCET seems to be a bit obsolete collation table, which can be
> more or less successfully used with real languages. However, in real
> world it is completely unusable, so ICU and other use CLDR collation
> table, which supports more levels.
> I started with DUCET since there
> was much more information about it, but then I found that it doesn't
> fit well, so I switched to CLDR. We have DUCET table somewhere in our
> revisions though; as a fallback option, it still may be useful, so I
> can restore it if you want.
>
>> Changing it to use the NetBSD's cdb support[1] shouldn't be difficult.
> Well, I think I'll do it right after exams. bdb AFAIK is deprecated
> from Linux (though it can be used as bdb46 or something similar). I
> don't know reasons why they did such thing; it would be great if we
> could use a tool which can be used on different platforms without
> modifications and tons of conditional define's and undef's.
>
>> It has simple API way easier to use, the db format is endian safe and final file
>> is smaller than equivalent in bdb format.
> It sounds great!
>
>> I do want to encourage you to go to EuroBSDCon 2014 in Sofia. The
>> FreeBSD Foundation will be allocating funds for students that want to go.
>> I won’t be there (I am a bit far away) but David and other developers will
>> likely be.
> Well, that depends on whether I pass my exams for the postgraduate
> course or not. I'd really like to listen to more experienced
> developers and may be even talk to other people about work which I did
> to better understand the community's opinions.
>
> 2014-08-27 3:17 GMT+04:00 Pedro Giffuni:
>> Hi Baptiste;
>>
>>
>> On 08/26/14 17:16, Baptiste Daroussin wrote:
>>>
>>> On Wed, Aug 27, 2014 at 01:08:58AM +0400, Dmitry Selyutin wrote:
>>>>
>>>> Hello everyone!
>>>>
>>>> Here are the last news about the Unicode support project[0].
>>>> You can always check my repository[1].
>>>>
>>>> During these days I had hardware problems (my HDD peacefully died), so
>>>> development didn't progress so much as before. However, I've
>>>> eliminated these problems, so I tried to fix bugs and reorganize the
>>>> code as much as possible. Now everything shall compile.
>>>>
>>>> I decided to use __attribute__((constructor)) and
>>>> __attribute__((destructor)), since I don't know if there exist a
>>>> better way to open a file once in the startup and closing it when all
>>>> routines close. I've found one or two occurrences of this construction
>>>> in FreeBSD code; AFAICT it is rather common in clang and gcc, so I
>>>> decided to use it. Hopefully it will also allow us to use root
>>>> collation database in the embedded systems (if any such system really
>>>> needs collation algorithm).
>>>>
>>>> As you may know we need a tool that can convert collation text files
>>>> obtained from unicode.org to new collation database (colldb) format.
>>>> There is a version of this tool written in Python
>>>> (share/examples/colldb/colldb.py). IIRC we can't use Python when we
>>>> have a base system though, so it seems that we need to written such
>>>> tool using C language. I was thinking of lex/yacc combo; I've never
>>>> tried it, but I think it shouldn't be too hard to write a tool using
>>>> it. I'd like to know your opinions about this task.
>>>> I've already written a man page (bin/colldb/colldb.1). The only thing
>>>> which seems dubious is that I decided to use the same name as for the
>>>> library itself (well, it seems I have a lack of imagination). So we
>>>> have both colldb.1 and colldb.3 man pages.
>>>>
>>>> The other thing I'd really like to do is to really force network byte
>>>> order in collation database format (I'm sure I've seen a way to do it
>>>> in Berkley databases). It's a pity that I have no platform with
>>>> big-endian (or even PDP!) byte order. Any help here is highly
>>>> appreciated (as well as your thoughts about lex/yacc, i.e. thoughts
>>>> whether it fits well to my task).
>>>>
>>>> Since Google Summer of Code period has passed, I'd like to thank both
>>>> my mentors, Pedro and David, who gave me a helping hand during this
>>>> project, and especially Konrad Jankowski, who found time to answer my
>>>> questions and help me too. Though GSoC is closed, I'd like to stay
>>>> with FreeBSD project. First of all, I want to finish and bring to mind
>>>> this project: I don't think it's really finished, especially its
>>>> testing part, though it seems that new collation algorithm can already
>>>> be used. Then I'd like to work in other parts of my project,
>>>> especially in internationalization parts. I'd also like to improve my
>>>> own library, qc, to provide a rich API for *BSD and POSIX systems,
>>>> since I acutely feel the lack of such API. If it is possible to stay
>>>> with project, I'd be very happy to do it. :-)
>>>>
>>>> P.S. Does anyone knows how to get diff between only for my branch
>>>> (i.e. for my part of repository)? svn diff -r $FIRST:$LAST seems to
>>>> give everything what all FreeBSD's GSoC have done, so I need some
>>>> other command. Thanks for your help!
>>>>
>>>> [0] https://wiki.freebsd.org/SummerOfCode2014/Unicode
>>>> [1] https://socsvn.freebsd.org/socsvn/soc2014/ghostmansd
>>>>
>>> First thank you very much for your work on this subject this is highly
>>> needed.
>>>
>>> Concerning the db format have you thought about using the new netbsd
>>> constant
>>> database format?
>>>
>>> It has simple API way easier to use, the db format is endian safe and
>>> final file
>>> is smaller than equivalent in bdb format.
>>>
>>> Lots of areas of FreeBSD could benefit from using this cdb format as well
>>> imho.
>>
>>
>> While here, let me congratulate Dmitry. The Unicode Collation Algorithm is
>> not something easy/fun to work with.
>>
>> Indeed both David and Konrad suggested it (or tinycdb). The reason for
>> going bdb was that we had time constraints and bdb is already in libc.
>>
>> FWIW, Nexenta kindly re-licensed localedef [1] and their collation support
>> in Illumos which basically implements their own very efficient format. We
>> ended up re-using the tools that libc already has to better focus on the
>> collation part.
>>
>> Changing it to use the NetBSD's cdb support[1] shouldn't be difficult.
>>
>> As Dmitry noted there are still details to work out and we have to run tests
>> and get the code reviewed but all in all I am very satisfied with the
>> advance
>> in this GSoC.
>>
>> Best regards,
>>
>> Pedro.
>>
>> [1] https://github.com/Nexenta/illumos-nexenta/tree/republish-localedef
>> [2] http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/cdb/
>>
>
>
>
> --
> With best regards,
> Dmitry Selyutin

-- 
With best regards,
Dmitry Selyutin

From owner-freebsd-i18n@FreeBSD.ORG Wed Aug 27 15:28:02 2014
Message-ID: <53FDF90B.4030400@freebsd.org>
Date: Wed, 27 Aug 2014 10:28:11 -0500
From: Pedro Giffuni
To: ghostmansd@gmail.com
Subject: Re: Report #9: Unicode support
References: <20140826221610.GD65120@ivaldir.etoilebsd.net> <53FD1599.7040708@freebsd.org>
Cc: Baptiste Daroussin, soc-status@freebsd.org, David Chisnall, Konrad Jankowski, freebsd-i18n@freebsd.org
List-Id: FreeBSD Internationalization Effort

On 08/27/14 05:51, Dmitry Selyutin wrote:
> ...
>>> You never answered my question concerning the fallback options.
>> Really? I thought that I answered. :-D Well, I'll try to explain
>> again. DUCET seems to be a bit obsolete collation table, which can be
>> more or less successfully used with real languages. However, in real
>> world it is completely unusable, so ICU and other use CLDR collation
>> table, which supports more levels. I started with DUCET since there
>> was much more information about it, but then I found that it doesn't
>> fit well, so I switched to CLDR.
>> We have DUCET table somewhere in our
>> revisions though; as a fallback option, it still may be useful, so I
>> can restore it if you want.

I don't see DUCET ever being used, but we are setting the old algorithm
as a fallback for CLDR. I was just wondering how DUCET compares to the
existing algorithm. Given that DUCET is in the standard and that you
already implemented it, I thought it would be a better fallback than the
old code.

It's your call though.

Pedro.

From owner-freebsd-i18n@FreeBSD.ORG Thu Aug 28 20:23:41 2014
Reply-To: ghostmansd@gmail.com
From: Dmitry Selyutin
Date: Fri, 29 Aug 2014 00:23:18 +0400
Subject: Report #10: Unicode support
To: soc-status@freebsd.org, Pedro Giffuni, David Chisnall, Konrad Jankowski, freebsd-i18n@freebsd.org
List-Id: FreeBSD Internationalization Effort

Hello everyone!

I've written the colldb tool in C, so it can now be used after
compiling the sources. I decided not to use lex/yacc here since it
seemed like overkill for such a simple task, so all you need is C plus
libcolldb (also written in C). I've also written the colldb.1 manual
page and fixed the libcolldb build (previously it could not be compiled
without UNICODE=YES in make.conf).

Because of my exams (they start in a few days), I have to take a break
to prepare. Thanks to everyone who helped me during this summer! I'd
like to thank FreeBSD's community: guys, you are amazing, it's really
pleasant to work with you! I'm going to continue my work after
September 20th, so stay tuned! ;-)

P.S. Repository is here as usual:
https://socsvn.freebsd.org/socsvn/soc2014/ghostmansd

-- 
With best regards,
Dmitry Selyutin
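
For readers wondering what a hand-written converter without lex/yacc
might look like, here is a minimal sketch in C. It is not the actual
colldb code. It assumes a simplified input line modelled on the
unicode.org collation tables ("0041 ; [.1C47.0020.0008] # comment"),
and the names coll_entry, coll_parse_line and coll_encode, as well as
the 10-byte record layout, are hypothetical. The encoder writes every
field most significant byte first, which is one way to get the network
byte order wished for in Report #9 regardless of the host platform.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical in-memory record: one code point, three collation weights. */
struct coll_entry {
	uint32_t codepoint;
	uint16_t weights[3];
};

/*
 * Parse one line of the assumed form "0041 ; [.1C47.0020.0008] # comment".
 * Returns 0 on success and -1 on malformed or out-of-range input.
 */
int
coll_parse_line(const char *line, struct coll_entry *out)
{
	unsigned int cp, w[3];
	char marker;
	int end = -1;

	/* %n is only reached (and end set) if the closing ']' matched. */
	if (sscanf(line, " %6x ; [%c%4x.%4x.%4x]%n",
	    &cp, &marker, &w[0], &w[1], &w[2], &end) != 5 || end < 0)
		return (-1);
	/* '.' marks a regular element, '*' a variable one. */
	if (marker != '.' && marker != '*')
		return (-1);
	if (cp > 0x10FFFF)
		return (-1);
	out->codepoint = cp;
	for (int i = 0; i < 3; i++)
		out->weights[i] = (uint16_t)w[i];
	return (0);
}

/*
 * Serialize an entry as 10 bytes in network (big-endian) byte order:
 * 4 bytes of code point followed by three 2-byte weights.
 */
size_t
coll_encode(const struct coll_entry *e, unsigned char buf[10])
{
	size_t n = 0;

	buf[n++] = (unsigned char)(e->codepoint >> 24);
	buf[n++] = (unsigned char)(e->codepoint >> 16);
	buf[n++] = (unsigned char)(e->codepoint >> 8);
	buf[n++] = (unsigned char)e->codepoint;
	for (int i = 0; i < 3; i++) {
		buf[n++] = (unsigned char)(e->weights[i] >> 8);
		buf[n++] = (unsigned char)e->weights[i];
	}
	return (n);
}
```

A caller would read the text file line by line (for example with
fgets), skip comment and blank lines, pass each remaining line to
coll_parse_line, and write the bytes produced by coll_encode into the
database. Real collation tables also contain multi-code-point
contractions and several collation elements per entry, which this
sketch deliberately ignores.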