Date: Wed, 20 Jul 2016 13:23:46 -0700 (PDT)
From: Don Lewis <truckman@FreeBSD.org>
To: darkuranium@gmail.com
Cc: freebsd-current@freebsd.org
Subject: Re: UTF-8 by default?
Message-ID: <201607202023.u6KKNksl055230@gw.catspoiler.org>
In-Reply-To: <CANd9X8fFB8OAmc1oasJNb8HxANmh6qKQqYbmhHwiBv1=K3w%2Bmw@mail.gmail.com>

On 20 Jul, Tim Čas wrote:
> On 20 July 2016 at 20:33, Don Lewis <truckman@freebsd.org> wrote:
>> wc(1) has problems with its multibyte support pointed out by Coverity
>> as I recall.
>
> Not sure how critical that issue is (e.g. byte counts [`-c`], line
> counts [`-l`], and such should still work as intended; whether word
> counts work or not depends on whether we should count Unicode
> whitespace as, well, whitespace). I do wonder if everyone agrees that
> an effort should be made towards UTF-8 default, though?

It passes a fixed-length non-NUL terminated buffer (returned by read(2))
to mbrtowc().  In addition to the lack of termination, the buffer could
also contain a partial character at its beginning or end if the contents
are UTF-8.

The Coverity ID is 978825.
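
To illustrate the partial-character problem described above, here is a
minimal sketch of one way to decode read(2)-sized chunks safely. It is
not the code in wc(1) and not necessarily how the Coverity report would
be resolved. It relies on two standard properties of mbrtowc(): it
examines at most the number of bytes it is told about, so no NUL
terminator is needed as long as the remaining length is passed, and it
returns (size_t)-2 when the bytes form an incomplete character, keeping
the partial state in the mbstate_t for the next call. The names
count_chars_chunk() and use_utf8_locale() are hypothetical, and the
locale names tried are assumptions that vary by system.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: try to select a UTF-8 locale. The locale names
 * are assumptions; returns nonzero on success.
 */
int
use_utf8_locale(void)
{
	return (setlocale(LC_CTYPE, "C.UTF-8") != NULL ||
	    setlocale(LC_CTYPE, "en_US.UTF-8") != NULL);
}

/*
 * Hypothetical helper (not taken from wc.c): count the characters that
 * are completed within one chunk of input. The caller zero-initializes
 * *st once and reuses it for every chunk, so a character split across
 * two chunks is counted when its last byte arrives. Each invalid byte
 * is counted as one character.
 */
size_t
count_chars_chunk(mbstate_t *st, const char *buf, size_t len)
{
	size_t count = 0;

	while (len > 0) {
		wchar_t wc;
		/* Pass the remaining length, so no terminator is needed. */
		size_t n = mbrtowc(&wc, buf, len, st);

		if (n == (size_t)-2) {
			/* Chunk ends mid-character; *st remembers it. */
			break;
		}
		if (n == (size_t)-1) {
			/* Invalid sequence: reset the state, skip a byte. */
			memset(st, 0, sizeof(*st));
			n = 1;
		} else if (n == 0) {
			/* A decoded NUL character occupies one byte. */
			n = 1;
		}
		buf += n;
		len -= n;
		count++;
	}
	return (count);
}
```

A caller would zero an mbstate_t before the first read(2), then add up
the return values of count_chars_chunk() for each buffer, passing the
byte count that read(2) returned as len.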