Date:      18 Oct 2020 11:48:37 -0400
From:      "John Levine" <>
Subject:   Re: printf(1) and UTF-8 multi-byte chars
Message-ID:  <20201018154838.49CBC239CEDF@ary.qy>
In-Reply-To: <>

In article <> you write:
>On 2020-10-17, Matthias Apitz <> wrote:
>> This means the output of printf(1) is byte oriented and not
>> character oriented.
>This conforms to POSIX.

I don't think there is any useful middle ground between counting bytes
and full Unicode typesetting. Some Unicode characters are half- or
double-width, particularly in East Asian languages, and many combine
with adjacent characters depending on context, e.g., the character ö
can be the single code point xF6, which is two UTF-8 bytes, or a
lower case o x6F followed by a combining diaeresis x308, which is
three UTF-8 bytes, but it is one space wide either way.
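
To make the three different counts concrete, here is a rough Python
sketch. The helper name describe() is made up for illustration, and the
width rule (skip combining marks, count East Asian Wide and Fullwidth
characters as two columns) is a simplifying assumption, not what any
printf(1) does. It ignores control characters, zero-width joiners,
emoji sequences and ambiguous-width characters, which is part of why
the middle ground is hard to define.

```python
import unicodedata

def describe(s):
    """Return (UTF-8 byte count, code point count, approximate column width)."""
    width = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue  # combining marks occupy no column of their own
        # East Asian Wide (W) and Fullwidth (F) characters take two columns
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return len(s.encode("utf-8")), len(s), width

precomposed = "\u00f6"   # o with diaeresis as a single code point
decomposed = "o\u0308"   # lower case o followed by combining diaeresis
wide = "\u65e5"          # a double-width CJK ideograph

print(describe(precomposed))  # (2, 1, 1)
print(describe(decomposed))   # (3, 2, 1)
print(describe(wide))         # (3, 1, 2)
```

The two spellings of the same letter differ in both bytes and code
points yet take one column each, while the ideograph is one code point
but two columns, so none of the simple counts gives the padding a
reader would expect.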
