Date:      Wed, 21 Feb 2018 13:31:13 +0100
From:      Eivind Nicolay Evensen <eivinde@terraplane.org>
To:        Brandon Allbery <allbery.b@gmail.com>
Cc:        freebsd-stable <freebsd-stable@freebsd.org>
Subject:   Re: Locale problem updating 10.3 to 11.1
Message-ID:  <20180221123112.GB75251@klump.hjerdalen.lokalnett>
In-Reply-To: <CAKFCL4WgvbTfg9Hxh_Rvd_C4BSoUETczP_P75nWPg2HvJOSHOA@mail.gmail.com>
References:  <20180218230251.GA60727@klump.hjerdalen.lokalnett> <alpine.BSF.2.21.1802190032250.24158@mail.fig.ol.no> <20180219081129.GB62932@klump.hjerdalen.lokalnett> <20180220230822.GA72560@klump.hjerdalen.lokalnett> <CAKFCL4VDs6YYUNkMPPo6sWHMne2rtbzeKsfEQKK96BQyHPZkjg@mail.gmail.com> <20180221120811.GA75251@klump.hjerdalen.lokalnett> <CAKFCL4WgvbTfg9Hxh_Rvd_C4BSoUETczP_P75nWPg2HvJOSHOA@mail.gmail.com>

On Wed, Feb 21, 2018 at 07:16:49AM -0500, Brandon Allbery wrote:
> A locale mapping is basically a lookup table (with complications for things
> like ß). A single-byte lookup table will be 256 entries, each holding one
> or more (because of combining characters) Unicode codepoints representing
> the mapping from the locale character set to the underlying common
> character set (Unicode). (There may also be a reverse lookup table for
> mapping Unicode codepoints to locale codepoints.)

That's fine. It doesn't make my life miserable the way directly using
multibyte character sets would, as long as it doesn't negatively
affect performance.
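
For illustration, the table described in the quote could be sketched roughly
as below. This is only a minimal sketch, not how any libc actually stores its
locale data: ISO-8859-15 is an assumed example charset, and build_tables,
to_unicode and from_unicode are hypothetical names made up for the example.

```python
# Sketch of a single-byte locale mapping as a 256-entry lookup table.
# Assumption: ISO-8859-15 stands in for "the locale character set".

def build_tables(encoding="iso8859-15"):
    # Forward table: index = locale byte, value = tuple of Unicode codepoints.
    # A tuple is used because an entry may in general hold more than one
    # codepoint; in this charset every entry happens to hold exactly one.
    forward = [
        tuple(ord(ch) for ch in bytes([byte]).decode(encoding))
        for byte in range(256)
    ]
    # Reverse table: tuple of codepoints -> locale byte.
    reverse = {codepoints: byte for byte, codepoints in enumerate(forward)}
    return forward, reverse


FORWARD, REVERSE = build_tables()


def to_unicode(data):
    # One table lookup per input byte.
    return "".join(chr(cp) for byte in data for cp in FORWARD[byte])


def from_unicode(text):
    # Only handles single-codepoint entries; raises KeyError for characters
    # the locale charset cannot represent.
    return bytes(REVERSE[(ord(ch),)] for ch in text)
```

Per character this is a single indexed lookup, so the cost of the mapping
itself should be small compared with whatever else is done with the text.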

> Without this, every program would have to deal directly with every possible
> character set.

Or only handle what one cares about.

> (Complications include things like: depending on encoding/locale details,
> German lowercase ß will uppercase to either SS or ẞ.

While German is not my main language, I've never seen a situation
where an uppercase variant of ß would make sense, though I understand
the example.
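
As a small illustration of why this case is awkward for a plain one-to-one
table, here is what Python's built-in Unicode case mapping does with ß. This
shows Python's str behaviour only, not what any particular FreeBSD locale
does; the variable names are just for the example.

```python
# Uppercasing ß with Python's default (locale-independent) case mapping.
word = "stra\u00dfe"              # "straße"
upper = word.upper()              # ß expands to two characters, "SS"

# The string grows by one character, so a one-entry-per-character
# table cannot express this mapping.
print(upper, len(word), len(upper))

# Unicode also has a capital sharp s, U+1E9E, which lowercases to ß.
capital_sharp_s = "\u1e9e"
print(capital_sharp_s.lower() == "\u00df")

# Case-insensitive comparison folds ß to "ss".
print("\u00df".casefold())
```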

> And that's one of the
> simpler ones; for some locales, things can get *really* weird. Not to
> mention fun stuff like Arabic having 4 representations of every character:
> initial, medial, final, standalone.)

Complications I don't want or need. That nicely points out what I
dislike about Unicode, although I can understand an OS wanting to
support it in order to be useful in more situations.
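
For reference, the positional forms mentioned in the quote are visible in the
Unicode character database as separate compatibility codepoints. A minimal
sketch using Python's unicodedata module follows; the letter BEH is picked
arbitrarily as an example, and describe is a hypothetical helper name.

```python
import unicodedata

# The base letter and its four presentation forms (isolated, final,
# initial, medial) in the Arabic Presentation Forms-B block.
BEH = "\u0628"
FORMS = [chr(cp) for cp in range(0xFE8F, 0xFE93)]


def describe(ch):
    # Returns the character name and its compatibility decomposition,
    # which names the positional form and points back at the base letter.
    return unicodedata.name(ch), unicodedata.decomposition(ch)


for form in FORMS:
    print(describe(form))

# NFKC normalization folds all four forms back to the single base letter.
folded = {unicodedata.normalize("NFKC", form) for form in FORMS}
print(folded == {BEH})
```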



-- 
Eivind


