FreeBSD Mail Archives

Date:      Wed, 04 Jul 2018 10:42:52 +0900 (JST)
From:      Hiroki Sato <hrs@FreeBSD.org>
To:        jilles@stack.nl
Cc:        daichigoto@icloud.com, lists@eitanadler.com, daichi@freebsd.org, gnn@FreeBSD.org, cem@freebsd.org, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject:   Re: svn commit: r335836 - head/usr.bin/top
Message-ID:  <20180704.104252.1616889858955681927.hrs@allbsd.org>
In-Reply-To: <20180703211002.GA11832@stack.nl>
References:  <459BD898-8072-426E-A968-96C1382AC616@icloud.com> <20180703.020956.859981414196673670.hrs@allbsd.org> <20180703211002.GA11832@stack.nl>

----Security_Multipart(Wed_Jul__4_10_42_52_2018_164)--
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Jilles Tjoelker <jilles@stack.nl> wrote
  in <20180703211002.GA11832@stack.nl>:

ji> >  3. Print the multibyte characters by using strvisx(3) family, which
ji> >     supports multibyte character, or swprintf(3) family if you want to
ji> >     format wide characters directly.  Note that buffer length for
ji> >     strvisx(3) must be calculated by using MB_LEN_MAX.
ji>
ji> In this case, calling setlocale() and then using strvisx() seems the
ji> right solution. If locales differ across processes this may result in
ji> mojibake but that cannot really be helped. Even analyzing other
ji> processes' locale variables is not fully reliable, since strings may be
ji> incorrectly encoded even in the process's real locale, environment
ji> variables cannot be read across users and the environment block may be
ji> overwritten by a program.
ji>
ji> In general, although using conversion to wide characters allows users a
ji> lot of flexibility, I don't think it is the best in all situations:
ji>
ji> * The result of mbstowcs() is a UTF-32 string which consumes a lot of
ji>   memory. A loop with mbrtowc() may also be slow. Many operations can be
ji>   done directly on UTF-8 strings with no or little additional complexity
ji>   compared to byte strings.
ji>
ji> * If there is an invalid multibyte character, there is little
ji>   flexibility to handle this usefully and securely, since so little is
ji>   known about the encoding. The best handling may depend on the context.
ji>
ji> Therefore, in /bin/sh, I have only implemented multibyte support for
ji> UTF-8. All other encodings have bytes treated as characters.
ji>
ji> However, I do agree that getenv("LANG") is bad. Instead, setlocale()
ji> should be used. After that, nl_langinfo(CODESET) can be called and the
ji> result compared to "UTF-8".
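
 A minimal sketch of the check described in the quoted paragraph above,
 assuming a POSIX system.  The helper name codeset_is_utf8() is
 hypothetical and is not taken from top(1) or sh(1); it only reports
 the codeset of whatever LC_CTYPE locale is currently active.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if the codeset of the currently
 * active LC_CTYPE locale is UTF-8.  It inspects the locale installed
 * by setlocale(3) and never reads getenv("LANG") itself.
 */
static bool
codeset_is_utf8(void)
{
    /* nl_langinfo(CODESET) names the active character encoding. */
    return (strcmp(nl_langinfo(CODESET), "UTF-8") == 0);
}
```

 A caller would typically run setlocale(LC_CTYPE, "") once at startup,
 so that LC_ALL, LC_CTYPE and LANG are resolved by libc in the usual
 order, and only then call codeset_is_utf8().  If setlocale() returns
 NULL the process stays in the "C" locale and the helper reports false.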

 Yes, I agree that mb->wc conversion is not always the best choice, and
 that using strvisx() for cmdbuf, not only for argv, is enough in this
 case.  I thought it was difficult to avoid iswprint() because I was
 not sure of the goal of r335836, and it looked to me as if it aimed to
 keep the original printable() function.  As you mentioned, it may
 not be worth trying to correctly detect or support the locales of
 other processes, either.  Probably one of the simplest approaches
 would be to rely on LC_CTYPE plus strvisx() and to document how top(1)
 handles multibyte characters in the manual page.

-- Hiroki

----Security_Multipart(Wed_Jul__4_10_42_52_2018_164)--
Content-Type: application/pgp-signature
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNATURE-----

iEYEABECAAYFAls8JhwACgkQTyzT2CeTzy1IeQCaAodTCzM9gOB5rqO81+Gy24Q1
O60AnRmFR2/cYK0ov6a3d5Tma6vk/zff
=MhXt
-----END PGP SIGNATURE-----

----Security_Multipart(Wed_Jul__4_10_42_52_2018_164)----

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20180704.104252.1616889858955681927.hrs>
