Date:      18 Oct 2020 14:05:46 -0400
From:      "John R. Levine" <johnl@iecc.com>
To:        "Steve O'Hara-Smith" <steve@sohara.org>
Cc:        freebsd-questions@freebsd.org, naddy@mips.inka.de
Subject:   Re: printf(1) and UTF-8 multi-byte chars
Message-ID:  <3c62a326-887f-4f4e-dbb2-56666f7571a0@iecc.com>
In-Reply-To: <20201018182309.490ff752536eae2092533c5a@sohara.org>
References:  <slrnroo8n9.1iu4.naddy@lorvorc.mips.inka.de> <20201018154838.49CBC239CEDF@ary.qy> <20201018182309.490ff752536eae2092533c5a@sohara.org>

> 	There are good reasons for using all three levels, here are some:
>
> Bytes: Content length headers, malloc calls - storage related

Sure.

> Glyphs: Truncation, apparent length, sorting - appearance related

Not so much.  I suppose it's preferable to truncate at a glyph boundary,
but sorting UTF-8 bytes gives you the same order as sorting the glyphs by
code point, and for useful sorting you need to deal with issues like
normalized forms and case folding anyway.  Not sure what use apparent
length would be, since the number of glyphs tells you neither the number
of visible characters nor how wide they are.
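
As a rough illustration of the sorting and width points, here is a minimal
Python sketch.  The word list is made up for the example, and it leans on
the fact that Python compares str values code point by code point.  The
width check uses the East Asian Width property only as a hint; what a
terminal actually draws is a separate question.

```python
import unicodedata

# Hypothetical sample data, chosen to mix ASCII and multi-byte characters.
words = ["zebra", "éclair", "apple", "Ωmega", "日本"]

# Sort by the raw UTF-8 bytes of each word.
by_bytes = sorted(words, key=lambda s: s.encode("utf-8"))

# Sort by code point (Python's default str comparison).
by_code_points = sorted(words)

# UTF-8 preserves code point order, so the two results agree.
same_order = by_bytes == by_code_points

# Neither order is what a reader wants: "éclair" sorts after "zebra".
eclair_after_zebra = by_code_points.index("éclair") > by_code_points.index("zebra")

# Counting units says little about width: "日本" is 6 bytes and 2 code
# points, and both characters are classed as wide ("W").
sample = "日本"
byte_count = len(sample.encode("utf-8"))
code_point_count = len(sample)
width_classes = [unicodedata.east_asian_width(ch) for ch in sample]
```

Getting an order a person would accept takes locale-aware collation on
top of normalization and case folding, which is outside this sketch.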

> Unicode Characters: UTF-8/16/32 conversions - encoding related

That and a lot of composition and display issues.
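
One small example of a composition issue, sketched in Python with the
standard unicodedata module: the same visible character can be one code
point or two, and the two spellings only compare equal after
normalization.  The variable names are illustrative.

```python
import unicodedata

composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# Both display as the same character, but the raw strings differ.
equal_raw = composed == decomposed

# Normalizing both to NFC makes them compare equal.
equal_nfc = (unicodedata.normalize("NFC", composed)
             == unicodedata.normalize("NFC", decomposed))

# Code point counts and UTF-8 byte counts differ between the two forms.
code_points = (len(composed), len(decomposed))
utf8_bytes = (len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))
```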

Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly


