Date:      Wed, 20 Jul 2016 16:07:41 +0200
From:      Baptiste Daroussin <bapt@FreeBSD.org>
To:        Jonathan Anderson <jonathan@FreeBSD.org>
Cc:        Tim Čas <darkuranium@gmail.com>, freebsd-current@freebsd.org
Subject:   Re: UTF-8 by default?
Message-ID:  <20160720140741.yi7vfgmmqtg6eprx@ivaldir.etoilebsd.net>
In-Reply-To: <B68D48ED-66CA-4E5B-8ED2-555B397AC73E@FreeBSD.org>
References:  <CANd9X8f5wHvdwN_XZ2y0qsiydYyb=NKLXF0k65S0_TiuWHeGKA@mail.gmail.com> <B68D48ED-66CA-4E5B-8ED2-555B397AC73E@FreeBSD.org>

On Wed, Jul 20, 2016 at 10:47:45AM -0230, Jonathan Anderson wrote:
> On 20 Jul 2016, at 9:13, Tim Čas wrote:
>
> > So, without further ado:
> > 1) What are the reasons that UTF-8 isn't the default yet?
> > 2) Would it be possible to make this the default in 11.0? What about
> > 12.0?
> > 3) Assuming an effort is started towards making UTF-8 the default,
> > what changes would be required?
>
> At least according to one of my students (who makes more extensive use of
> i18n than I do), enabling UTF-8 by default is pretty straightforward:
>
> https://github.com/musec/freebsd/wiki/Common-setup#utf-8-support

The LC_COLLATE=C setting is no longer needed with FreeBSD 11+.

>
> If there's anything missing there, I'd love to hear about it.
>
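
As background for the LC_COLLATE=C remark: in the C locale, strings are compared byte by byte, which for UTF-8 text is the same as ordering by code point. The sketch below is only an illustration in Python (not FreeBSD code), and the word list is made up. It shows that this ordering places every non-ASCII word after all ASCII words, whereas a language-aware collation would typically sort "Čas" near "car".

```python
# Illustrative only: Python compares str values by code point, which gives
# the same order as byte-wise comparison of the UTF-8 encodings.
words = ["Zebra", "apple", "Čas", "car"]   # hypothetical sample data

by_code_point = sorted(words)
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

print(by_code_point)                    # uppercase ASCII, lowercase ASCII, then non-ASCII
print(by_code_point == by_utf8_bytes)   # both orderings agree
```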

A lot of work was done during 11.0 development, and the following issues
were fixed:

* /bin/sh was not able to handle UTF-8 (fixed by fixing the bug in libedit)
* no Unicode collation: fixed, but the code is still very fresh
* vi: there was potential corruption when opening a file in a non-Unicode
  encoding in a Unicode environment; it no longer corrupts anything, but it
  still says it is unhappy
* finger(1) has been fixed for multibyte names (I know no one cares about
  that one :))
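
To illustrate the kind of hazard behind the vi item above (a non-Unicode file opened in a Unicode environment), here is a minimal sketch in Python. It does not reflect how vi is implemented, and the sample text and encoding are assumptions chosen for the example. It shows only that decoding Latin-1 bytes as UTF-8 either fails outright or, if done lossily, cannot be written back without changing the file.

```python
# Hypothetical file content: "café" saved as ISO-8859-1 (Latin-1), not UTF-8.
latin1_bytes = "café".encode("latin-1")   # the "é" is the single byte 0xE9

# Strict decoding as UTF-8 rejects the data instead of guessing.
try:
    latin1_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Lossy decoding substitutes U+FFFD for the invalid byte; saving the result
# as UTF-8 no longer reproduces the original bytes.
lossy = latin1_bytes.decode("utf-8", errors="replace")
round_trip = lossy.encode("utf-8")

print(strict_ok)
print(round_trip == latin1_bytes)
```

Refusing or warning on such input keeps the original bytes intact, which matches the "does not corrupt but is unhappy" behaviour described above.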

On the list of still known issues:
* important:
  - csh does not handle unicode
  - regex in libc: it does not handle Unicode correctly (unless I have missed
    something) and needs to be either fixed or switched to libtre + custom
    patches (there was a Summer of Code project about it long ago, and
    DragonFly went that way)
  - Unicode support in our old groff is pretty bad; I plan to replace it with
    heirloom-doctools, which does handle Unicode properly (as far as I have
    tested, at least)
  - edit(1) does not handle multibyte

* medium (minor?):
  - login(1) does not handle unicode properly

* minor:
  - lots of base tools (minor ones like nl and friends) are not multibyte
    aware in a lot of cases; merging the work done by Ingo Schwarze on
    those tools in OpenBSD might be useful, but I have no plans to do it
  - vi needs improvement in multi-encoding support; I haven't checked the
    latest modifications to vi upstream in that area

There might be more, but that is all that comes to mind right now.

Best regards,
Bapt
