From: Bakul Shah
To: Farhan Khan
Cc: freebsd-hackers@freebsd.org
Subject: Re: Printing UTF-8 characters
Date: Thu, 01 Feb 2018 19:51:15 -0800

On Thu, 01 Feb 2018 10:42:36 -0500 Farhan Khan wrote:
> Sorry, that was a poorly phrased question on my part. Let me try again.
> I am trying to make text align in columns in a terminal. My
> understanding is that characters above 0x7E are 3 bytes in length. A
> modern terminal will render that as either a single question-mark or
> the character itself, making terminal column alignment easy. But how
> would an older terminal display a 3-byte character?
> I am worried that
> would render as 3 question marks and throw off column alignment. If
> so, is there a proper way to perform alignment for both newer and
> older terminals?

UTF-8 can use up to 4 bytes to encode a Unicode code point, depending
on the script. For what you want, you can use OpenOffice-like programs
that understand Unicode and can do complex text layout. Normal
terminal programs, which typically use monospace (fixed-width) fonts,
are simply not capable of what you want. The assumption that one
character means one rectangular cell on the screen is too deeply woven
into them. For Indic languages in particular this just doesn't work:
you may have N Unicode code points, each of which requires 3 bytes,
that all together map to a single glyph.
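
To make the byte-count point concrete, here is a minimal Python sketch.
It only assumes the standard unicodedata module. The helper names
(utf8_len, approx_columns, pad) are made up for illustration, and the
column estimate is a rough heuristic, not what any particular terminal
actually does.

```python
import unicodedata

def utf8_len(ch):
    """Number of bytes UTF-8 needs for one code point."""
    return len(ch.encode("utf-8"))

def approx_columns(text):
    """Rough cell-count guess (hypothetical helper, not a real wcwidth)."""
    cols = 0
    for ch in text:
        # Nonspacing/enclosing marks and format characters take no cell.
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        # East Asian Wide/Fullwidth characters usually take two cells.
        cols += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return cols

def pad(text, width):
    """Left-align text in a field of `width` estimated cells."""
    return text + " " * max(0, width - approx_columns(text))

# One code point can take 1, 2, 3 or 4 bytes in UTF-8.
for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    print("U+%04X: %d byte(s)" % (ord(ch), utf8_len(ch)))

# A Devanagari cluster: 4 code points, 3 bytes each.
cluster = "\u0915\u094d\u0937\u093f"
print(len(cluster), "code points,", len(cluster.encode("utf-8")), "bytes")
```

Padding with pad() instead of counting bytes gets simple cases right
(accents, CJK), but it still counts the Devanagari cluster code point
by code point. It has no idea how many cells a shaping renderer will
use for it, which is exactly the limitation described above.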