Date:      Tue, 07 Feb 2006 02:53:46 +0100
From:      Martin Krzysiak <cinek@gmx.de>
To:        freebsd-stable@FreeBSD.ORG
Subject:   Re: tr(1) buggy with de_DE.ISO8859-1(5) locale?
Message-ID:  <43E7FDAA.3010409@gmx.de>
In-Reply-To: <200602061658.k16GwqLr068150@lurza.secnetix.de>
References:  <200602061658.k16GwqLr068150@lurza.secnetix.de>

Oliver Fromme wrote:

> It's not a bug.  It's perfectly POSIX-compatible.

From some documents I found, I think POSIX leaves this
behavior "undefined". That is not the same thing.

> To convert lower case to upper case, use the command
> "tr '[:lower:]' '[:upper:]'" (or enumerate all letters
> explicitly, like "tr abcdef ABCDEF").  Scripts that
> use things like "tr a-z A-Z" are broken and need to be
> fixed.

Case conversion is not the only thing that behaves oddly.
Try "echo wxyz | tr w-z a-d". In my opinion, ranges are
generally broken in the ISO locales.
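
A minimal sketch of the alternatives that do not depend on how the
locale orders a range, assuming a POSIX sh and tr(1). The function
names are made up for illustration only:

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical.

# Case conversion via POSIX character classes, as suggested in the
# quoted text above.
to_upper_class() {
    tr '[:lower:]' '[:upper:]'
}

# Map w, x, y, z to a, b, c, d by listing every character, so no
# range expression is involved at all.
shift_wz_explicit() {
    tr 'wxyz' 'abcd'
}

# Same mapping written with ranges, but with the "C" locale forced
# for this single command only.
shift_wz_c_locale() {
    LC_ALL=C tr 'w-z' 'a-d'
}

echo hello | to_upper_class
echo wxyz | shift_wz_explicit
echo wxyz | shift_wz_c_locale
```

Setting LC_ALL=C in front of one command leaves the rest of the
environment untouched, so the German locale stays in effect elsewhere.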

> By the way:  Do not set LANG or LC_ALL, especially for
> the root user, and especially when compiling things.

One thing I like about FreeBSD is that I have my German
environment. But you are right. The only locale that is
expected to work correctly is "C".

> Not only will tr behave in unexpected ways when used
> like above, but also other things might break.  For
> example, German month names appear in "ls -l", which
> will break scripts that try to parse them.

Don't tell me about localization problems. I've seen
lots of stupid things. The latest one was a localized
"Date:" header produced by a commercial application.

> Some tools
> use decimal commas instead of decimal points, which
> can lead to further confusion, etc.  Yes, scripts
> which try to do that are broken, but they do exist.

Yes. You are right.

How often have you used tr(1) to convert text to upper
or lower case? Did you expect it to work correctly?
I would prefer to write "tr a-zäöü A-ZÄÖÜ",
_if_ I ever needed to do it.

> If you only need support for German umlauts, then only
> set LC_CTYPE.  That shouldn't break anything.

I really, really appreciate that FreeBSD supports
German locales.
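
As a rough sketch of that advice, assuming a Bourne-style shell and a
startup file such as ~/.profile (the choice of file is my assumption),
the setup could look like this:

```shell
# Sketch for a Bourne-style startup file such as ~/.profile
# (assumed location).
# Leave LANG and LC_ALL unset so that collation, dates and number
# formats stay in the default "C" locale.
unset LANG LC_ALL

# Only character classification follows the German locale.
LC_CTYPE=de_DE.ISO8859-1
export LC_CTYPE
```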

Let's stop arguing. I just wanted to ask about the behavior.
Now I know that something might be fishy with tr(1), and I
understand how to avoid the problem. That's all I need to
know.

For people who are interested in a simple workaround:
don't use de_DE.ISO8859-1(5); use de_DE.UTF-8 instead.
tr(1)'s ranges work as expected there.

Martin


