Date:      18 Oct 2020 11:48:37 -0400
From:      "John Levine" <johnl@iecc.com>
To:        freebsd-questions@freebsd.org
Cc:        naddy@mips.inka.de
Subject:   Re: printf(1) and UTF-8 multi-byte chars
Message-ID:  <20201018154838.49CBC239CEDF@ary.qy>
In-Reply-To: <slrnroo8n9.1iu4.naddy@lorvorc.mips.inka.de>

In article <slrnroo8n9.1iu4.naddy@lorvorc.mips.inka.de> you write:
>On 2020-10-17, Matthias Apitz <guru@unixarea.de> wrote:
>
>> This means the output of printf(1) is byte oriented and not
>> character oriented.
>
>This conforms to POSIX.

I don't think there is any useful middle ground between counting bytes
and full Unicode typesetting. Some Unicode characters are half- or
double-width, particularly in East Asian languages, and many combine
with adjacent characters depending on context, e.g., the character ö
can be the single character xF6, which is two UTF-8 bytes, or a
lower case o x6F followed by a combining diaeresis x308, which is three
UTF-8 bytes, but one space wide either way.
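
To make the three different counts concrete, here is a rough Python sketch that compares bytes, code points, and an estimated column width for both spellings of ö. The display_width helper is my own illustration, not anything printf(1) does; it only skips combining marks and counts wide or fullwidth characters as two columns, which is nowhere near full Unicode typesetting.

```python
import unicodedata

precomposed = "\u00f6"   # ö as the single code point U+00F6
decomposed = "o\u0308"   # o (U+006F) followed by combining diaeresis (U+0308)

def display_width(s):
    """Rough column estimate (illustrative helper, not a standard API).

    Combining marks take no column; East Asian wide and fullwidth
    characters take two; everything else is assumed to take one.
    """
    width = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

for s in (precomposed, decomposed):
    # UTF-8 bytes, code points, estimated columns
    print(len(s.encode("utf-8")), len(s), display_width(s))
```

The two strings differ in both byte count and code point count yet occupy the same single column, so counting code points instead of bytes would still not pad a field correctly.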



