Date: Sat, 14 Nov 2015 13:19:04 +0100
From: John Marino <freebsd.contact@marino.st>
To: Andrey Chernov <ache@freebsd.org>, Baptiste Daroussin <bapt@FreeBSD.org>, arch@FreeBSD.org
Subject: Re: Question about ASCII and nl_langinfo (locale work)
Message-ID: <564726B8.7060308@marino.st>
In-Reply-To: <564373D4.9060403@freebsd.org>
References: <20151110222636.GN10134@ivaldir.etoilebsd.net> <564373D4.9060403@freebsd.org>
On 11/11/2015 5:59 PM, Andrey Chernov wrote:
> On 11.11.2015 1:26, Baptiste Daroussin wrote:
>> The thing is not all are aware that FreeBSD uses US-ASCII, for example tcl does
>> not. which means tcl is not able to determine what encoding is needed for the C
>> and POSIX locales.
>>
>> On Linux they to return ANSI_X3.4-1968 (also known as US-ASCII) and most
>> application knows what linux returns.
>>
>> That means we need to teach all upstream about US-ASCII all the time.
>>
>> The proposals are:
>> - Do not change what we have always done.
>> - Change it to something that makes sense "C" (what we tried with "POSIX" which
>> was a very bad idea, but "C" seems to be commonly recognised by application as
>> ASCII)
>> - Let's report the same as Linux, that will simplify portability
>> - Let's be obvious and report ASCII (also commonly recognised by applications)
>
> Just repeating my opinion in this new thread.
>
> Since POSIX don't tell anything certain, we should be Linux compatible
> here to have less surprise, i.e.:
> 1) Return "ANSI_X3.4-1968" for C/POSIX locale (was "US-ASCII").
> 2) Return "ASCII" for *.US-ASCII locales (was "US-ASCII").
> Typical Linux program knows nothing about our "US-ASCII", and porting
> handles it rarely.
>
> Not doing that leads to hidden, hard to find bugs like still present
> right now in our tcl ports. For all that years tcl don't understand
> FreeBSD-native nl_langinfo() "US-ASCII" and falls back to "iso8859-1"
> (it understands Linux "ANSI_X3.4-1968" and "ASCII" of course).
>

As a DragonFly representative (and probably the person who would implement it), I can accept Andrey's proposal.
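The mapping in Andrey's proposal, and the application-side problem it targets, can be sketched in C. This is a minimal, hypothetical illustration only: `proposed_codeset()` and `is_ascii_codeset()` are names invented here, the input is assumed to be a plain LC_CTYPE locale name such as "en_US.US-ASCII" (no @modifier handling), and none of it is FreeBSD's or DragonFly's actual libc code. The second helper shows how an application that recognizes every spelling discussed in this thread would avoid the tcl-style fallback to "iso8859-1".

```c
#define _POSIX_C_SOURCE 200809L  /* for strcasecmp() in <strings.h> */
#include <string.h>
#include <strings.h>

/*
 * Hypothetical helper: maps a locale name to the CODESET string that
 * the proposal would report. Not the real nl_langinfo() implementation.
 * The returned pointer is either a string literal or points into
 * locale_name (the suffix after the first '.').
 */
const char *proposed_codeset(const char *locale_name)
{
    const char *dot;

    /* Proposal 1): C/POSIX locale reports the Linux-compatible name. */
    if (strcmp(locale_name, "C") == 0 || strcmp(locale_name, "POSIX") == 0)
        return "ANSI_X3.4-1968";

    dot = strchr(locale_name, '.');
    if (dot == NULL)
        return "";  /* assumption: no codeset suffix means "unknown" */

    /* Proposal 2): *.US-ASCII locales report "ASCII". */
    if (strcmp(dot + 1, "US-ASCII") == 0)
        return "ASCII";

    /* Otherwise the codeset is simply the locale-name suffix. */
    return dot + 1;
}

/*
 * Hypothetical application-side check: treat every ASCII spelling from
 * this thread as 7-bit ASCII, compared case-insensitively.
 */
int is_ascii_codeset(const char *codeset)
{
    static const char *const aliases[] = {
        "ANSI_X3.4-1968", "ASCII", "US-ASCII",
    };
    size_t i;

    for (i = 0; i < sizeof(aliases) / sizeof(aliases[0]); i++)
        if (strcasecmp(codeset, aliases[i]) == 0)
            return 1;
    return 0;
}
```

For example, `proposed_codeset("en_US.US-ASCII")` yields "ASCII", and `is_ascii_codeset()` accepts that as well as "US-ASCII" and "ANSI_X3.4-1968". What a given system actually reports can be checked by calling `setlocale(LC_ALL, "C")` and then `nl_langinfo(CODESET)` from `<langinfo.h>`.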
What it would mean:

1) "ANSI_X3.4-1968" would be the one return value of nl_langinfo(CODESET) that is not in the output of "locale -m".
2) This would require an alteration to usr.bin/locale to add "ANSI_X3.4-1968" if it is not found (similar to how it's done for US-ASCII).
3) At the same time, usr.bin/locale would be modified to change the check from "US-ASCII" to "ASCII".
4) The locale tools would have to be modified to change all source and map references from "US-ASCII" to "ASCII", and the six LC* generating makefiles regenerated.
5) nl_langinfo would be changed to return "ANSI_X3.4-1968" instead of "US-ASCII" if the encoding equals "NONE".
6) The "make upgrade" utility would need to remove the *.US-ASCII locales.
7) Do we really need six ".ASCII" locales? They have very limited use; I'd suggest having just "en_US.ASCII" and that's it. Dump en_AU, en_ZA, en_GB, etc. We can keep all six if we want, but if we are removing US-ASCII anyway, we should limit the locales to what makes sense.

Alternatively, FreeBSD could link US-ASCII => ASCII and have both variations, but I think DragonFly will just drop US-ASCII in this case.

What nl_langinfo(CODESET) returns has to be reflected in the locale name (with the exception of "ANSI_X3.4-1968"), so there has to be, e.g., an en_US.ASCII locale if US-ASCII is changed. There might be other changes necessary if "US-ASCII" is changed; I'd have to do a thorough review.

To get started, I think these need to be decided:

A) Confirm we want locale -m and nl_langinfo(CODESET) to return "ANSI_X3.4-1968" for the C/POSIX locales.
B) Confirm renaming the US-ASCII locales to ASCII.
C) (FreeBSD only) Decide whether to keep the US-ASCII locales as symlinks. nl_langinfo(CODESET) will return "ASCII" for these symlinked locales.
D) Decide which "ASCII" locales are really needed. (I suggest one, en_US.ASCII.)

Thanks,
John