From: John Marino
Reply-To: marino@freebsd.org
To: Andrey Chernov, Baptiste Daroussin, arch@FreeBSD.org
Subject: Re: Question about ASCII and nl_langinfo (locale work)
Date: Sat, 14 Nov 2015 13:19:04 +0100
Message-ID: <564726B8.7060308@marino.st>
In-Reply-To: <564373D4.9060403@freebsd.org>
References: <20151110222636.GN10134@ivaldir.etoilebsd.net> <564373D4.9060403@freebsd.org>
List-Id: Discussion related to FreeBSD architecture

On 11/11/2015 5:59 PM, Andrey Chernov wrote:
> On 11.11.2015 1:26, Baptiste Daroussin wrote:
>> The thing is not all are aware that FreeBSD uses US-ASCII, for example tcl does
>> not. which means tcl is not able to determine what encoding is needed for the C
>> and POSIX locales.
>>
>> On Linux they to return ANSI_X3.4-1968 (also known as US-ASCII) and most
>> application knows what linux returns.
>>
>> That means we need to teach all upstream about US-ASCII all the time.
>>
>> The proposals are:
>> - Do not change what we have always done.
>> - Change it to something that makes sense "C" (what we tried with "POSIX" which
>> was a very bad idea, but "C" seems to be commonly recognised by application as
>> ASCII)
>> - Let's report the same as Linux, that will simplify portability
>> - Let's be obvious and report ASCII (also commonly recognised by applications)
>
> Just repeating my opinion in this new thread.
>
> Since POSIX don't tell anything certain, we should be Linux compatible
> here to have less surprise, i.e.:
> 1) Return "ANSI_X3.4-1968" for C/POSIX locale (was "US-ASCII").
> 2) Return "ASCII" for *.US-ASCII locales (was "US-ASCII").
> Typical Linux program knows nothing about our "US-ASCII", and porting
> handles it rarely.
>
> Not doing that leads to hidden, hard to find bugs like still present
> right now in our tcl ports. For all that years tcl don't understand
> FreeBSD-native nl_langinfo() "US-ASCII" and falls back to "iso8859-1"
> (it understands Linux "ANSI_X3.4-1968" and "ASCII" of course).
>

As a DragonFly representative (and probably the person that would
implement it), I can accept Andrey's proposal.

What it would mean:

1) "ANSI_X3.4-1968" would be the one return value of
   nl_langinfo(CODESET) that is not in the output of "locale -m"
2) This would require an alteration to usr.bin/locale to add
   "ANSI_X3.4-1968" if not found (similar to how it's done for
   US-ASCII)
3) At the same time usr.bin/locale would be modified to change its
   check from "US-ASCII" to "ASCII"
4) The locale tools would have to be modified to change all source and
   map references from "US-ASCII" to "ASCII", and the six LC*
   generating makefiles would have to be regenerated
5) nl_langinfo would be changed to return "ANSI_X3.4-1968" instead of
   "US-ASCII" if the encoding equals "NONE"
6) The "make upgrade" utility would need to remove *.US-ASCII locales
7) Do we really need 6 ".ASCII" locales? They have very limited use;
   I'd suggest just having "en_US.ASCII" and that's it. Dump en_AU,
   en_ZA, en_GB, etc. We can keep all 6 if we want, but if we are
   removing US-ASCII anyway, we should limit the locales to what makes
   sense.

Alternatively FreeBSD could link US-ASCII => ASCII and have both
variations, but I think DragonFly will just drop US-ASCII in this case.

What nl_langinfo(CODESET) returns has to be reflected in the locale
name (with the exception of "ANSI_X3.4-1968"), so there has to be e.g.
en_US.ASCII as a valid locale if US-ASCII is changed.

There might be other changes necessary if "US-ASCII" is changed; I'd
have to do a thorough review.

To get started, I think this needs to be decided:

A) Confirm we want locale -m and nl_langinfo(CODESET) to return
   "ANSI_X3.4-1968" for C/POSIX locales
B) Confirm renaming US-ASCII locales to ASCII
C) (FreeBSD only) Decide if you want to conserve US-ASCII locales with
   symlinks. nl_langinfo(CODESET) will return "ASCII" for these
   symlinked locales
D) Decide which "ASCII" locales are really needed. (I suggest one,
   en_US.ASCII)

Thanks,
John
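
For illustration only, the naming rules discussed above (points 4 and 5, and Andrey's items 1 and 2) might be sketched in C roughly as follows. This is not the actual libc code: the function names proposed_codeset() and is_ascii_codeset() are hypothetical, and the sketch assumes the locale data hands over its encoding name as a plain string, with "NONE" standing for the C/POSIX locale as described in point 5.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not the real nl_langinfo() implementation:
 * map a locale's encoding name to the string that
 * nl_langinfo(CODESET) would report under the proposal.
 */
const char *
proposed_codeset(const char *encoding)
{
	assert(encoding != NULL);

	/* C/POSIX locale (encoding "NONE"): report the Linux name. */
	if (strcmp(encoding, "NONE") == 0)
		return ("ANSI_X3.4-1968");

	/* Former *.US-ASCII locales would be reported as "ASCII". */
	if (strcmp(encoding, "US-ASCII") == 0)
		return ("ASCII");

	/* Every other encoding name is passed through unchanged. */
	return (encoding);
}

/*
 * Hypothetical application-side check: accept the old FreeBSD name
 * as well as both names from the proposal as 7-bit ASCII.
 * Returns 1 for a recognised ASCII name, 0 otherwise.
 */
int
is_ascii_codeset(const char *codeset)
{
	assert(codeset != NULL);

	return (strcmp(codeset, "ANSI_X3.4-1968") == 0 ||
	    strcmp(codeset, "ASCII") == 0 ||
	    strcmp(codeset, "US-ASCII") == 0);
}
```

A program that currently only knows the Linux names would get a usable answer from the first function, while the second shows the kind of check a port would otherwise need to carry for "US-ASCII".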