Date: Wed, 20 Jul 2016 16:07:41 +0200
From: Baptiste Daroussin
To: Jonathan Anderson
Cc: Tim Čas, freebsd-current@freebsd.org
Subject: Re: UTF-8 by default?
Message-ID: <20160720140741.yi7vfgmmqtg6eprx@ivaldir.etoilebsd.net>

On Wed, Jul 20, 2016 at 10:47:45AM -0230, Jonathan Anderson wrote:
> On 20 Jul 2016, at 9:13, Tim Čas wrote:
>
> > So, without further ado:
> > 1) What are the reasons that UTF-8 isn't the default yet?
> > 2) Would it be possible to make this the default in 11.0? What about
> > 12.0?
> > 3) Assuming an effort is started towards making UTF-8 the default,
> > what changes would be required?
>
> At least according to one of my students (who makes more extensive use of
> i18n than I do), enabling UTF-8 by default is pretty straightforward:
>
> https://github.com/musec/freebsd/wiki/Common-setup#utf-8-support

The LC_COLLATE=C setting is no longer needed with FreeBSD 11 and later.

>
> If there's anything missing there, I'd love to hear about it.
>

A lot of work was done during 11.0 development, and the following issues were fixed:

- /bin/sh was not able to handle UTF-8 (fixed by fixing the bug in libedit)
- no Unicode collation: fixed, but the code is still very fresh
- vi: there was potential corruption when opening a file in a non-Unicode encoding within a Unicode environment; it no longer corrupts anything, but it still says it is unhappy
- finger(1) has been fixed for multibyte names (I know no one cares about that one :))

On the list of still-known issues:

* important:
  - csh does not handle Unicode
  - regex in libc: it does not handle Unicode correctly (unless I have missed something) and needs to be either fixed or switched to libtre plus custom patches (there was a Summer of Code project about it long ago, and DragonFly BSD went that way)
  - Unicode support in our old groff is pretty bad; I plan to replace it with heirloom-doctools, which does handle Unicode properly (as far as I have tested, at least)
  - edit(1) does not handle multibyte characters
* medium (minor?):
  - login(1) does not handle Unicode properly
* minor:
  - lots of base tools (minor ones like nl and friends) are not multibyte-aware in many cases; merging the work Ingo Schwarze did on those tools in OpenBSD might be useful, but I have no plan to do it
  - vi needs improvement in its multi-encoding support; I haven't checked the latest upstream vi modifications in that area

There might be more, but that is all that comes to mind right now.

Best regards,
Bapt