Date: 18 Oct 2020 14:05:46 -0400
Message-ID: <3c62a326-887f-4f4e-dbb2-56666f7571a0@iecc.com>
From: "John R. Levine"
To: "Steve O'Hara-Smith"
Cc: freebsd-questions@freebsd.org, naddy@mips.inka.de
Subject: Re: printf(1) and UTF-8 multi-byte chars
In-Reply-To: <20201018182309.490ff752536eae2092533c5a@sohara.org>
References: <20201018154838.49CBC239CEDF@ary.qy> <20201018182309.490ff752536eae2092533c5a@sohara.org>

> There are good reasons for using all three levels, here are some:
>
> Bytes: Content length headers, malloc calls - storage related

Sure.

> Glyphs: Truncation, apparent length, sorting - appearance related

Not so much.
I suppose it's preferable to truncate at a glyph boundary, but sorting UTF-8 bytes gives you the same order as sorting the glyphs, and for useful sorting you need to deal with issues like normalized forms and case folding. Not sure what use apparent length would be since the number of glyphs tells you neither the number of visible characters nor how wide they are.

> Unicode Characters: UTF-8/16/32 conversions - encoding related

That and a lot of composition and display issues.

Regards,
John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly
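
The sorting and apparent-length remarks above can be checked with a short sketch. This is not from the thread: the sample strings and the `display_width` helper are made up for illustration, and the width rule (0 columns for combining marks, 2 for East Asian wide or fullwidth characters, 1 otherwise) is only a rough approximation of what a terminal does. The sort comparison is done by code point, which is how the byte-order property of UTF-8 is usually stated.

```python
import unicodedata

# Illustrative sample strings (hypothetical, not taken from the thread).
words = ["zebra", "\u00e9clair", "apple", "\u65e5\u672c", "Zulu", "\U0001F600"]

# Sorting the UTF-8 encodings bytewise gives the same order as sorting
# the strings by code point (Python compares str values by code point).
by_bytes = sorted(words, key=lambda s: s.encode("utf-8"))
by_code_points = sorted(words)


def display_width(s):
    """Rough column count: an approximation, not a terminal's real logic."""
    width = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue  # combining marks occupy no column of their own
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


composed = "\u00e9"      # e-acute as a single code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent

for label, s in (("composed", composed),
                 ("decomposed", decomposed),
                 ("wide", "\u65e5\u672c")):
    print(label,
          "bytes:", len(s.encode("utf-8")),
          "code points:", len(s),
          "columns:", display_width(s))

# Normalization is what makes the two spellings compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)
```

Neither sort order is useful to a human reader ("Zulu" lands before "apple"), which is the point about needing normalization and case folding; and the byte count, code point count, and column count disagree with one another for all three sample strings.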