Date:      Sun, 3 Nov 2019 03:38:25 +0100
From:      Per Hedeland <per@hedeland.org>
To:        "Ronald F. Guilmette" <rfg@tristatelogic.com>
Cc:        freebsd-questions@freebsd.org
Subject:   Re: sort is broken
Message-ID:  <f416a932-7084-bec3-8a7a-8efaaebc2952@hedeland.org>
In-Reply-To: <8847.1572745058@segfault.tristatelogic.com>
References:  <8847.1572745058@segfault.tristatelogic.com>

On 2019-11-03 02:37, Ronald F. Guilmette wrote:
> In message <20191102233528.CFE66E4728E@ary.local>, you wrote:
> 
>> In article <7668.1572729288@segfault.tristatelogic.com> you write:
>>> Not a question, just an expression of grief and deep dismay.
>>>
>>> It is a sad day when even very fundamental tools, used in billions
>>> of scripts, such as /usr/bin/sort turn up broken.
>>>
>>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=241679
>>
>> I tried it on 11.3 and 12.0 and it works fine.
>>
>> What's in your environment, particularly what's LC_ALL set to?
> 
> In my env, LC_ALL is not set at all.
> 
> I do have these, but not sure if they make any difference:
> 
> LANG=en_US.UTF-8

This, in combination with trying to sort a file with contents that
*aren't* valid UTF-8, is the reason for the behavior you observe - see
my previous post.

The specification of how LANG and the LC_* variables (should) interact
can be found at
https://pubs.opengroup.org/onlinepubs/7908799/xbd/envvar.html - I
believe setting only LANG is the "normal" way to specify a locale.
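
As a rough sketch of the precedence described in that specification
(LC_ALL first, then the category-specific variable, then LANG), the
choice for collation could be written out in shell as below. The
function name effective_collate is made up for illustration, and the
final fallback to "C" is an assumption, since the default is left to
the implementation.

```shell
# Hypothetical helper: print the locale name that would apply to
# collation, following the precedence LC_ALL > LC_COLLATE > LANG.
# A variable that is unset or empty is skipped.
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        printf '%s\n' "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Assumption: the implementation-defined default is "C".
        printf '%s\n' "C"
    fi
}

echo "collation locale: $(effective_collate)"
```

With only LANG=en_US.UTF-8 set, as in your environment, this prints
en_US.UTF-8, i.e. 'sort' will be collating according to a UTF-8 locale.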

If you convert your file to UTF-8, e.g. using the strange behavior of
'sort':

$ sort test > test.utf8

- or more "properly" (assuming you have the libiconv package
installed):

$ iconv -f ISO-8859-1 -t UTF-8 test > test.utf8

- you will find that the test.utf8 file is handled correctly by
'sort', both as filename argument and as stdin.
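
To check whether a file is valid UTF-8 in the first place, one option
is to ask iconv to convert it from UTF-8 to UTF-8 and look at the exit
status. The following is only a sketch: the helper name is_valid_utf8
and the sample file contents are made up for illustration, and it
assumes that the original file is ISO-8859-1.

```shell
# Hypothetical helper: succeed only if the file decodes cleanly as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

tmpdir=$(mktemp -d)

# Sample data (made up): "caf\351" is "cafe" with an e-acute in
# ISO-8859-1. The lone byte 0xE9 is not a valid UTF-8 sequence.
printf 'caf\351\nabc\n' > "$tmpdir/test"

if is_valid_utf8 "$tmpdir/test"; then before=valid; else before=invalid; fi

# Same conversion as above, assuming the input is ISO-8859-1.
iconv -f ISO-8859-1 -t UTF-8 "$tmpdir/test" > "$tmpdir/test.utf8"

if is_valid_utf8 "$tmpdir/test.utf8"; then after=valid; else after=invalid; fi

echo "before: $before, after: $after"

rm -rf "$tmpdir"
```

The check says nothing about which encoding an invalid file actually
uses, so the -f argument to the real conversion is still up to you.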

> XTERM_LOCALE=en_US.UTF-8

This - which is actually set by xterm based on how it was started -
implies that your xterm will decode UTF-8 and display the "real"
character.

--Per Hedeland


