Date:      Tue, 03 Jul 2018 02:09:56 +0900 (JST)
From:      Hiroki Sato <hrs@FreeBSD.org>
To:        daichigoto@icloud.com
Cc:        lists@eitanadler.com, daichi@freebsd.org, gnn@FreeBSD.org, cem@freebsd.org, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r335836 - head/usr.bin/top
Message-ID:  <20180703.020956.859981414196673670.hrs@allbsd.org>
In-Reply-To: <459BD898-8072-426E-A968-96C1382AC616@icloud.com>
References:  <CAF6rxg=Zjkf6EbSgt1fBQBUDHGKWwLf=n9ZJweJH%2BDi800kJ3w@mail.gmail.com> <20180702.155529.1102410939281120947.hrs@allbsd.org> <459BD898-8072-426E-A968-96C1382AC616@icloud.com>

Daichi Goto <daichigoto@icloud.com> wrote
  in <459BD898-8072-426E-A968-96C1382AC616@icloud.com>:

da>
da>
da> > On 2018/07/02 15:55, Hiroki Sato <hrs@FreeBSD.org> wrote:
da> >
da> > Eitan Adler <lists@eitanadler.com> wrote
da> >  in <CAF6rxg=Zjkf6EbSgt1fBQBUDHGKWwLf=n9ZJweJH+Di800kJ3w@mail.gmail.com>:
da> >
da> > li> On 1 July 2018 at 10:08, Conrad Meyer <cem@freebsd.org> wrote:
da> > li> > Hi Daichi,
da> > li> >
da> > li> >
da> > li> >
da> > li> > I don't think code to decode UTF-8 belongs in top(1).  I don't know
da> > li> > what the goal of this routine is, but I doubt this is the right way to
da> > li> > accomplish it.
da> > li>
da> > li> For the record, I agree. This is why I didn't click "accept" on the
da> > li> revision. I don't fully oppose leaving it in top(1) for now as we work
da> > li> out the API, but long term it's the wrong place.
da> > li>
da> > li> https://reviews.freebsd.org/D16058 is the review.
da> >
da> > I strongly object to this kind of encoding-specific routine.  Please
da> > back it out.  The problem is that top(1) does not support multibyte
da> > encodings in its printing functions, and using the C99 wide/multibyte
da> > character manipulation API, such as iswprint(3), is the way to solve
da> > it.  Calling getenv("LANG") and assuming an encoding based on it is a
da> > very bad practice for internationalizing software.
da> >
da> > -- Hiroki
da>
da> I respect what you mean.
da>
da> Once I have backed it out, I will begin implementing it in a
da> different way.  Please advise which functions should be used for the
da> implementation (iswprint(3), and what other functions?)

 Roughly speaking, the POSIX/XPG/C99 I18N model requires the following
 steps:

 1. Call setlocale(LC_ALL, "") first.

 2. Use the mbs<->wcs and/or mb<->wc conversion functions in C95/C99 to
    manipulate characters and strings, depending on what you want to
    do.  The printable() function should use mbtowc(3) and
    iswprint(3), for example.  And wcslen(3) should be used instead of
    strlen() to determine the number of characters to be printed.

    Note that if an mbs->wcs or mb->wc conversion fails with EILSEQ at
    some point, some of the characters are invalid for printing.
    This can happen because command-line parameters in top(1) are not
    always encoded in the encoding specified by LC_CTYPE or LANG.
    Such characters should also be handled as non-printable.  To make
    matters worse, a process does not always use the same locale as
    top(1).  A process invoked with LANG=ja_JP.eucJP may have EUC-JP
    characters in its ARGV array even if top(1) is run by another user
    whose LANG is en_US.UTF-8.  You have to determine which locale
    should be used before doing the mb->wc conversion.  It is not so
    simple.

 3. Print the multibyte characters by using the strvisx(3) family,
    which supports multibyte characters, or the swprintf(3) family if
    you want to format wide characters directly.  Note that the buffer
    length for strvisx(3) must be calculated by using MB_LEN_MAX.
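
 As a rough illustration of steps 1 and 2, here is a minimal sketch in
 standard C.  It is not the top(1) code: the names setup_locale() and
 sanitize_mb() are hypothetical, it uses the restartable mbrtowc(3)
 rather than mbtowc(3), and it assumes that every string is encoded in
 top(1)'s own locale, which, as noted above, is not always true.
 Invalid sequences (EILSEQ) and non-printable characters are both
 replaced with '?'.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Step 1: adopt the user's locale once at startup.  Returns 1 on success. */
int
setup_locale(void)
{
	return (setlocale(LC_ALL, "") != NULL);
}

/*
 * Hypothetical helper: copy src to dst, replacing each character that
 * is invalid or non-printable in the current LC_CTYPE locale with '?'.
 * Returns the number of characters (not bytes) stored in dst,
 * excluding the terminating NUL.  Output is truncated to fit dstsize.
 */
size_t
sanitize_mb(const char *src, char *dst, size_t dstsize)
{
	mbstate_t st;
	size_t remaining, used = 0, nchars = 0;

	assert(src != NULL && dst != NULL);
	if (dstsize == 0)
		return (0);
	remaining = strlen(src);
	memset(&st, 0, sizeof(st));

	while (remaining > 0) {
		wchar_t wc = L'\0';
		size_t n = mbrtowc(&wc, src, remaining, &st);
		size_t need;
		int ok = 1;

		if (n == (size_t)-1 || n == (size_t)-2) {
			/*
			 * Invalid (EILSEQ) or truncated sequence: treat one
			 * byte as non-printable and restart the decoder.
			 */
			memset(&st, 0, sizeof(st));
			n = 1;
			ok = 0;
		} else if (n == 0) {
			break;		/* decoded a NUL; stop */
		} else if (!iswprint((wint_t)wc)) {
			ok = 0;		/* valid but not printable */
		}

		need = ok ? n : 1;
		if (used + need >= dstsize)
			break;		/* keep room for the NUL */
		if (ok)
			memcpy(dst + used, src, n);
		else
			dst[used] = '?';
		used += need;
		nchars++;
		src += n;
		remaining -= n;
	}
	dst[used] = '\0';
	return (nchars);
}
```

 A caller would run setup_locale() once and then pass each string
 through sanitize_mb() before printing it.  Choosing a per-process
 locale, as discussed in step 2, is deliberately left out of this
 sketch.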

 I recommend learning about I18N by reading the following documents,
 since this involves an I18N programming model, not just a matter of
 which function should be used.  While they are quite old and contain
 system-specific topics, they are still useful for understanding the
 general overview of how XPG4 and the relevant C95/C99 APIs work:

 [1] Developer's Guide to Internationalization (801-6660)
     https://docs.oracle.com/cd/E19457-01/801-6660/801-6660.pdf

 [2] Software Internationalization Guide (526225-002)
     https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-c02131936

 [3] ISO/IEC 9899:TC2 draft (p.204, Sec. 7.11 Localization)
     http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf

 [4] Internationalization Guide, Version 2
     ISBN: 978-0133535419

-- Hiroki



