Date:      Thu, 17 Jan 2008 16:13:29 +0100
From:      Rafaël Carré <funman@videolan.org>
To:        questions@freebsd.org
Subject:   Some UTF-8 characters are not representable on FreeBSD7
Message-ID:  <20080117161329.69fe4135@zod.zod>

Hello,

I noticed I couldn't use some characters with libncursesw, namely ⚑, ⚐
and ⏏.
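For reference, decoded by hand these three are U+2691 BLACK FLAG, U+2690 WHITE FLAG and U+23CF EJECT SYMBOL. The sketch below is a minimal, locale-independent UTF-8 decoder that confirms which code points the byte sequences encode without going through mbtowc(). The function utf8_decode() is a hypothetical helper written for this note, not part of any library.

```c
#include <stddef.h>

/* Hypothetical helper: decode one UTF-8 sequence from s (at most n bytes)
 * into *cp. Returns the number of bytes consumed, or -1 for an invalid or
 * truncated sequence. Overlong forms, surrogates and values above
 * U+10FFFF are rejected. */
static int utf8_decode(const unsigned char *s, size_t n, unsigned long *cp)
{
    int len, i;
    unsigned long c;

    if (n == 0)
        return -1;
    if (s[0] < 0x80) {                       /* plain ASCII */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {      /* 110xxxxx: 2-byte form */
        len = 2; c = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {      /* 1110xxxx: 3-byte form */
        len = 3; c = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {      /* 11110xxx: 4-byte form */
        len = 4; c = s[0] & 0x07;
    } else {
        return -1;                           /* stray continuation byte */
    }

    if (n < (size_t)len)
        return -1;                           /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)           /* must be 10xxxxxx */
            return -1;
        c = (c << 6) | (s[i] & 0x3F);
    }
    if ((len == 2 && c < 0x80) || (len == 3 && c < 0x800) ||
        (len == 4 && c < 0x10000) || (c >= 0xD800 && c <= 0xDFFF) ||
        c > 0x10FFFF)
        return -1;
    *cp = c;
    return len;
}
```

Fed the bytes E2 9A 91, E2 9A 90 and E2 8F 8F, it yields 0x2691, 0x2690 and 0x23CF, each consuming 3 bytes.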

I ran some tests and found that some characters were reported as
unprintable on FreeBSD, while on Linux everything was fine.

I found it extremely strange since those characters would show up in my
terminal (gnome-terminal) when I pasted them.

Here are the results of the test I ran on Linux and FreeBSD:

[fun@zod ~]% uname -a ;./test
FreeBSD zod 7.0-BETA3 FreeBSD 7.0-BETA3 #0: Sun Dec  2 02:30:18 CET 2007     root@zod:/media/externe/usr/src/sys/ZOD  i386
Locale: fr_FR.UTF-8
OK a : 1
OK ⚑ : 0
OK ö : 1
OK ↑ : 1
OK © : 1
OK ⚐ : 0
OK é : 1
OK ⏏ : 0

[fun@zod ~]% uname -a ; LANG=fr_FR.ISO8859-15 ./test
FreeBSD zod 7.0-BETA3 FreeBSD 7.0-BETA3 #0: Sun Dec  2 02:30:18 CET 2007     root@zod:/media/externe/usr/src/sys/ZOD  i386
Locale: fr_FR.ISO8859-15
OK a : 1
OK ⚑ : 1
OK ö : 1
OK ↑ : 1
OK © : 1
OK ⚐ : 1
OK é : 1
OK ⏏ : 1


16:03 funman@altair  ~% uname -a ; ./test
Linux altair 2.6.22-2-amd64 #1 SMP Thu Aug 30 23:43:59 UTC 2007 x86_64 GNU/Linux
Locale: fr_FR.UTF-8
OK a : 32768
OK ⚑ : 1
OK ö : 1
OK ↑ : 1
OK © : 1
OK ⚐ : 1
OK é : 1
OK ⏏ : 1


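A value of 0 means unprintable; any nonzero value means printable (the
character has a graphical representation). iswgraph() only promises a
nonzero result, which is why glibc prints 32768 where FreeBSD prints 1.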

And here is the test I used:

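#include <stdio.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>     /* iswgraph() is declared here, not in <wchar.h> */

#define MAX 8

int main(void)
{
    /* Use the locale named by $LANG. If LANG is unset, getenv() returns
     * NULL and setlocale() only reports the current locale; setlocale()
     * itself returns NULL if the locale is not available. */
    const char *loc = setlocale( LC_ALL, getenv( "LANG" ) );
    printf( "Locale: %s\n", loc ? loc : "(setlocale failed)" );

    /* UTF-8 test strings: a, U+2691, U+00F6, U+2191, U+00A9,
     * U+2690, U+00E9, U+23CF */
    const char tab[MAX][6] = {
        "a", "⚑", "ö", "↑", "©", "⚐", "é", "⏏"
    };

    int i;
    wchar_t wc;
    for( i = 0; i < MAX; i++ )
    {
        /* mbtowc() returns -1 (which is true in a boolean test) for an
         * invalid sequence, so check for > 0; wc is only meaningful after
         * a successful conversion. */
        int len = mbtowc( &wc, tab[i], sizeof tab[i] );
        if( len > 0 )
            printf( "OK %s : %d\n", tab[i], iswgraph( wc ) );
        else
            printf( "KO %s\n", tab[i] );
    }

    return 0;
}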
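The all-1 results of the fr_FR.ISO8859-15 run most likely say nothing about the flag characters themselves: ISO 8859-15 is a single-byte encoding, so mbtowc() converts only the first byte of each UTF-8 sequence (0xE2, 0xC3 or 0xC2, which are â, Ã and Â in that charset), and those letters are graphic. A small diagnostic along the following lines would make this visible by printing how many bytes mbtowc() consumed and the resulting wchar_t value. The helpers probe() and report() are hypothetical, and the printed value is the raw wchar_t, which equals the Unicode code point only where the C library uses Unicode for wchar_t.

```c
#include <stdio.h>
#include <stdlib.h>   /* mbtowc() */
#include <string.h>   /* strlen() */
#include <wchar.h>
#include <wctype.h>   /* iswgraph() */

/* Hypothetical helper: convert the first character of s with mbtowc() in
 * the current LC_CTYPE locale. Returns the number of bytes consumed,
 * 0 for an empty string, or -1 if s does not start with a valid character. */
static int probe(const char *s, wchar_t *wc)
{
    size_t n = strlen(s);

    if (n == 0) {
        *wc = L'\0';
        return 0;
    }
    mbtowc(NULL, NULL, 0);          /* reset any shift state */
    return mbtowc(wc, s, n);
}

/* Hypothetical helper: print what mbtowc() and iswgraph() make of s. */
static void report(const char *s)
{
    wchar_t wc;
    int len = probe(s, &wc);

    if (len < 0)
        printf("%s : invalid in this locale\n", s);
    else
        printf("%s : %d byte(s) -> wchar_t 0x%lX, iswgraph %s\n",
               s, len, (unsigned long)wc,
               iswgraph((wint_t)wc) ? "yes" : "no");
}
```

Called after setlocale(LC_ALL, "") on each string of tab[], report() should show 3 bytes consumed for ⚑ in a UTF-8 locale but only 1 byte in an ISO8859-15 locale.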


I suppose this is a bug in the FreeBSD UTF-8 locale: I tested with several
values of $LANG ending in "UTF-8" and got the same result each time.
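To check whether the problem lies in the locale's character-class data rather than in the multibyte conversion, one could bypass mbtowc() and ask iswgraph() about code points directly. A minimal sketch, assuming (as on FreeBSD and glibc) that wchar_t values in a UTF-8 locale are Unicode code points, which the C standard does not require; count_graphic() and list_nongraphic() are hypothetical helpers, and any ranges passed to them are arbitrary examples.

```c
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>   /* iswgraph(), wint_t */

/* Hypothetical helper: count the code points in [lo, hi] that iswgraph()
 * classifies as graphic in the current locale. hi must be below WINT_MAX,
 * otherwise the loop never terminates. */
static unsigned count_graphic(wint_t lo, wint_t hi)
{
    unsigned n = 0;
    wint_t c;

    for (c = lo; c <= hi; c++)
        if (iswgraph(c))
            n++;
    return n;
}

/* Hypothetical helper: print every code point in [lo, hi] that the current
 * locale does not consider graphic. iswgraph() is false for the space
 * character by definition, so U+0020 is always listed. */
static void list_nongraphic(wint_t lo, wint_t hi)
{
    wint_t c;

    for (c = lo; c <= hi; c++)
        if (!iswgraph(c))
            printf("U+%04lX is not graphic in this locale\n",
                   (unsigned long)c);
}
```

For example, after setlocale(LC_ALL, "fr_FR.UTF-8"), list_nongraphic(0x2680, 0x269F) would list each code point in that slice of the Miscellaneous Symbols block that the locale does not classify as graphic; running it on both systems would show where their tables differ.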

Am I right that a Unicode character should always have a graphical
representation in a UTF-8 locale?

Thanks

-- 
Rafaël Carré
