Subject: Re: sort is broken
From: Per Hedeland
To: "John R. Levine"
Cc: "Ronald F. Guilmette", freebsd-questions@freebsd.org
Date: Sun, 3 Nov 2019 13:03:57 +0100
Message-ID: <19f67a18-b23d-9dca-661c-a541cda19dd0@hedeland.org>

On 2019-11-03 05:19, John R. Levine wrote:
>> In my env, LC_ALL is not set at all.
>>
>> I do have these, but not sure if they make any difference:
>>
>> LANG=en_US.UTF-8
>> XTERM_LOCALE=en_US.UTF-8
>> LESSCHARSET=utf-8
>
> Try this and see if it's happier:
>
> export LC_ALL=en_US.UTF-8

According to https://pubs.opengroup.org/onlinepubs/7908799/xbd/envvar.html
(as well as the sort(1) man page, actually), if no LC_* variables are set,
the LANG setting (if any) is used. And if LC_ALL is set, the settings of
LANG and of all the other LC_* variables are ignored. I.e. your setting of
LC_ALL to the same value as LANG, when no other LC_* variables are set,
should be a no-op.

> I think your problem is that the default C locale is ASCII only.

That is not relevant to Ronald's problem, since his LANG setting means the
C locale isn't used, but the above page says:

    If the locale value is "C" or "POSIX", the POSIX locale is used and the
    standard utilities behave in accordance with the rules in POSIX Locale,
    for the associated category.

where "POSIX Locale" is a link to
https://pubs.opengroup.org/onlinepubs/7908799/xbd/locale.html#tag_005_002
which says:

    The tables in Locale Definition describe the characteristics and
    behaviour of the POSIX locale for data consisting entirely of
    characters from the portable character set and the control character
    set. For other characters, the behaviour is unspecified.

    For C-language programs, the POSIX locale is the default locale when
    the setlocale() function is not called.

I.e. it does indeed specify the behavior only for ASCII ("the portable
character set and the control character set"), so in principle 'sort'
could give an error if characters outside that set are present. But as I
showed in an earlier posting, 'sort' has no problem with Ronald's
ISO-8859-1, non-ASCII character when LANG is set to "C"; presumably it just
compares the full 8-bit byte values, since that gives the correct ordering
for ASCII.

--Per
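The precedence rule and the byte-order guess above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how sort(1) or libc actually implement it: `effective_locale` and `c_locale_sort` are hypothetical names, an empty variable is treated as unset (as later POSIX editions phrase it), the fallback default is assumed to be "C", and plain byte-value ordering is only the *presumed* C-locale behaviour for non-ASCII bytes, which POSIX leaves unspecified.

```python
"""Sketch of POSIX locale-variable precedence and C-locale byte ordering.

Assumptions (not taken from sort(1) itself): the helper names are
hypothetical, an empty variable counts as unset, and the default is "C".
"""
from __future__ import annotations


def effective_locale(category: str, env: dict[str, str]) -> str:
    """Return the locale that applies to one category, e.g. "LC_COLLATE"."""
    # LC_ALL wins over everything; then the category's own LC_* variable;
    # then LANG. The first set, non-empty value is used.
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:
            return value
    # Nothing set: fall back to the POSIX ("C") locale.
    return "C"


def c_locale_sort(lines: list[bytes]) -> list[bytes]:
    """Sort lines by raw 8-bit byte values (presumed C-locale behaviour)."""
    # bytes objects compare element-wise as unsigned integers 0..255,
    # so ASCII keeps its usual order and bytes >= 0x80 sort after it.
    return sorted(lines)


if __name__ == "__main__":
    # Ronald's variables; XTERM_LOCALE and LESSCHARSET play no part here.
    ronald = {"LANG": "en_US.UTF-8", "XTERM_LOCALE": "en_US.UTF-8",
              "LESSCHARSET": "utf-8"}
    with_lc_all = dict(ronald, LC_ALL="en_US.UTF-8")
    # Setting LC_ALL to the same value as LANG changes nothing.
    print(effective_locale("LC_COLLATE", ronald))       # en_US.UTF-8
    print(effective_locale("LC_COLLATE", with_lc_all))  # en_US.UTF-8
    # LC_ALL overrides a conflicting category variable.
    override = {"LC_ALL": "C", "LC_COLLATE": "en_US.UTF-8"}
    print(effective_locale("LC_COLLATE", override))     # C

    # An ISO-8859-1 word: 0xE9 ("e" with acute) sorts after ASCII "z" (0x7A).
    words = [b"zebra", "\u00e9t\u00e9".encode("latin-1"), b"apple"]
    print(c_locale_sort(words))  # [b'apple', b'zebra', b'\xe9t\xe9']
```

The loop over ("LC_ALL", category, "LANG") mirrors the lookup order quoted from the spec, which is why adding LC_ALL with LANG's value is a no-op when no other LC_* variable is set.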