From: Farhan Khan
Date: Tue, 19 Jun 2018 21:34:09 -0400
Subject: Re: Printing UTF-8 characters
To: freebsd-hackers@freebsd.org
In-Reply-To: <20180202035130.C51F8156E80B@mail.bitblocks.com>
References: <20180201072831.GA2239@c720-r314251> <20180202035130.C51F8156E80B@mail.bitblocks.com>

On Thu, Feb 1, 2018 at 10:51 PM, Bakul Shah wrote:
> On Thu, 01 Feb 2018 10:42:36 -0500 Farhan Khan wrote:
>> Sorry, that was a poorly phrased question on my part. Let me try again.
>> I am trying to make text align in columns in a terminal. My
>> understanding is that characters above 0x7E are 3 bytes in length. A
>> modern terminal will render that as either a single question-mark or
>> the character itself, making terminal column alignment easy. But how
>> would an older terminal display a 3-byte character? I am worried that
>> would render as 3 question marks and throw off column alignment. If
>> so, is there a proper way to perform alignment for both newer and
>> older terminals?
>
> UTF-8 can use up to 4 bytes to encode a Unicode code point,
> depending on the script.
>
> For what you want, you can use OpenOffice-like programs that
> understand Unicode and can do complex text layout. Normal
> terminal programs typically use monospace (fixed-width) fonts
> and are simply not capable of what you want. The assumption that
> one char means one rectangular cell on the screen is too
> deeply woven into them. Particularly for Indic languages this
> just doesn't work. You may have N Unicode code points, each of
> which requires 3 bytes, that all together map to one single glyph.

Hi all,

To follow up on my earlier, poorly asked question from a few months
back: how do I determine whether the terminal is capable of printing
UTF-8 encoded strings, and/or Unicode in general? The obvious answer is
to check the LANG variable via getenv(3), but what if you are using
"en_US.UTF-8" vs "en_GB.UTF-8"? Should I just check for the string
"UTF-8" in the LANG variable?

My concern is that printing characters above 0x7F on terminals or
encodings that cannot display them will result in unusual behavior.

Thanks,
--
Farhan Khan
PGP Fingerprint: B28D 2726 E2BC A97E 3854 5ABE 9A9F 00BC D525 16EE
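
One possible approach to the question above, as a minimal sketch rather
than a definitive answer: instead of searching LANG for the substring
"UTF-8", let the C library resolve the locale (LC_ALL, then LC_CTYPE,
then LANG) with setlocale(3) and ask for the resulting codeset with
nl_langinfo(3). The helper names codeset_is_utf8() and locale_is_utf8()
are hypothetical, and accepting the spelling "UTF8" is an assumption
about other systems. This reports what the locale claims, not what the
terminal emulator can actually render.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <strings.h>

/*
 * Hypothetical helper: does a codeset name denote UTF-8?
 * The comparison is case-insensitive; accepting "UTF8" (no hyphen)
 * is an assumption about how other systems may spell the name.
 */
static int
codeset_is_utf8(const char *codeset)
{
	if (codeset == NULL)
		return (0);
	return (strcasecmp(codeset, "UTF-8") == 0 ||
	    strcasecmp(codeset, "UTF8") == 0);
}

/*
 * Hypothetical helper: adopt the user's character-type locale from
 * the environment and report whether its encoding is UTF-8.
 * This covers "en_US.UTF-8" and "en_GB.UTF-8" alike, and also honors
 * LC_ALL and LC_CTYPE, which a check of LANG alone would miss.
 */
static int
locale_is_utf8(void)
{
	if (setlocale(LC_CTYPE, "") == NULL)
		return (0);	/* unknown locale: treat as not UTF-8 */
	return (codeset_is_utf8(nl_langinfo(CODESET)));
}
```

A program could call locale_is_utf8() once at startup and fall back to
ASCII-only output when it returns 0.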