From: Alexander Nedotsukov
Date: Thu, 10 Jan 2008 13:18:18 +0900
To: "Alexandre \"Sunny\" Kovalenko"
Cc: ports@FreeBSD.org, gnome@FreeBSD.org
Subject: Re: [patch] glib20, UTF-8 and string collation
Message-ID: <47859C8A.6060908@FreeBSD.org>
In-Reply-To: <1199930945.46097.11.camel@RabbitsDen>

Alexandre,

The problem you exposed has its roots
in Evo code. The g_utf8_* functions are defined to work on *UTF-8*
strings only and have undefined behaviour on MBCS strings. It may sound
stupid, but crashes are allowed in this case :-)
Even if we apply your latest patch, the real fix will only be postponed.
We have to continue with the Evo source: find the subject parser and
ensure that it produces a UTF-8 encoded string at the end.

All the best,
Alexander.

Alexandre "Sunny" Kovalenko wrote:
> On Wed, 2008-01-09 at 20:16 -0500, Joe Marcus Clarke wrote:
>
>> On Wed, 2008-01-09 at 19:40 -0500, Alexandre "Sunny" Kovalenko wrote:
>>
>>> On Wed, 2008-01-09 at 12:35 -0500, Joe Marcus Clarke wrote:
>>>
>>>> On Wed, 2008-01-09 at 10:53 -0500, Alexandre "Sunny" Kovalenko wrote:
>>>>
>>>>> I have seen recent commit WRT string collation in devel/glib20 by
>>>>> marcus, so I have decided to check if there is an interest to fix SEGV
>>>>> in g_utf8_collate when it is given 8-bit non-UTF-8 string(s) to collate.
>>>>>
>>>> Any commits I have made in the area of UTF-8 are completely accidental.
>>>> I am not the UTF-8 guy. Both bland and jylefort have expressed interest
>>>> in this. Perhaps one of them will comment.
>>>>
>>> I hope so. Just in case, they would decide to, I have reduced the
>>> situation to the small program below. I get
>>>
>>> GLib-CRITICAL **: g_convert: assertion `str != NULL' failed
>>>
>>> and no core dump from this simple program, whereas Evolution manages to
>>> pass NULL to strcoll further down in g_utf8_collate and get SEGV for its
>>> pains.
>>>
>> That sounds like a no-no for Evolution to be dereferencing a NULL
>> pointer. Hopefully they'd fix this to prevent the problem.
>>
>
> It's not Evolution, it is glib, specifically g_utf8_collate, which would
> call strcoll(3) blindly on the return of g_utf8_normalize inside
> gunicollate.c.
> And now, I can get core dumped out of this simple program
> as well, merely by setting CHARSET=en_US.UTF-8 (I had it as ASCII in the
> terminal window, which would trigger a different path within
> g_utf8_collate).
>
>
>>> Conversely, if the answer still is "Evolution should not have done
>>> that", I will happily crawl back under my rock and keep my patch
>>> locally.
>>>
>> I can't imagine you're alone in this. But then again, any Cyrillic mail
>> that comes my way is always spam, so what do I know.
>>
>
> More importantly, it is UTF-8 spam -- in order to trigger this, you need
> KOI8-R or CP1251, and in the sorted column to boot. I suspect that
> Latin1 or ShiftJIS would do the trick too.
>
> Now, how about this: would you be amenable to this Really Harmless(tm)
> patch, which merely adds error checking along the lines used in the same
> function, about a dozen lines up ;)
>
> --- glib/gunicollate.c.B 2008-01-09 20:48:25.000000000 -0500
> +++ glib/gunicollate.c 2008-01-09 20:49:35.000000000 -0500
> @@ -166,6 +166,9 @@
>    str1_norm = g_utf8_normalize (str1, -1, G_NORMALIZE_ALL_COMPOSE);
>    str2_norm = g_utf8_normalize (str2, -1, G_NORMALIZE_ALL_COMPOSE);
>  
> +  g_return_val_if_fail (str1_norm != NULL, 0);
> +  g_return_val_if_fail (str2_norm != NULL, 0);
> +
>    if (g_get_charset (&charset))
>      {
>        result = strcoll (str1_norm, str2_norm);
>
> I can add it to your files/extra-patch-glib_gunicollate.c, or package
> it separately -- I really hate it when I start Evolution after portupgrade
> to write some E-mails real quick, only to find out that I have forgotten
> to patch glib... again.
>
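
The point made in this thread (the caller has to guarantee that a string really is UTF-8 before it reaches collation) can be sketched in plain C without GLib. This is only an illustration under stated assumptions: is_valid_utf8 and checked_collate are hypothetical names, not GLib or Evolution functions, and in real GLib-based code the check would more likely be done with g_utf8_validate. Returning 0 for invalid input simply mirrors the fallback value used in the patch above; it does not sort such strings meaningfully.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of GLib): returns true if the
 * NUL-terminated string s is well-formed UTF-8. Rejects overlong
 * forms, UTF-16 surrogates and code points above U+10FFFF. */
static bool is_valid_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        unsigned int cp, min;
        int extra;

        if (*p < 0x80) {                     /* plain ASCII byte */
            p++;
            continue;
        } else if ((*p & 0xE0) == 0xC0) {    /* 2-byte sequence */
            cp = *p & 0x1F; extra = 1; min = 0x80;
        } else if ((*p & 0xF0) == 0xE0) {    /* 3-byte sequence */
            cp = *p & 0x0F; extra = 2; min = 0x800;
        } else if ((*p & 0xF8) == 0xF0) {    /* 4-byte sequence */
            cp = *p & 0x07; extra = 3; min = 0x10000;
        } else {
            return false;                    /* stray continuation or bad lead byte */
        }

        p++;
        for (int i = 0; i < extra; i++, p++) {
            /* A NUL here fails the test too, so a truncated
             * sequence never reads past the end of the string. */
            if ((*p & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (*p & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
    }
    return true;
}

/* Hypothetical wrapper: collate only when both inputs are valid UTF-8.
 * Falls back to 0 ("equal") otherwise, like the patch quoted above. */
static int checked_collate(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    if (!is_valid_utf8(a) || !is_valid_utf8(b))
        return 0;
    return strcoll(a, b);
}
```

With this check, a KOI8-R subject such as the bytes "\xf0\xd2\xc9\xd7\xc5\xd4" is rejected up front instead of reaching strcoll as a NULL pointer. The longer-term fix suggested in the reply is still to convert such input to UTF-8 at the point where it is parsed.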