FreeBSD Mail Archives

Date:      Thu, 31 Jul 2014 10:55:47 -0700
From:      John-Mark Gurney <jmg@funkthat.com>
To:        Phil Shafer <phil@juniper.net>
Cc:        sjg@freebsd.org, arch@freebsd.org, marcel@freebsd.org
Subject:   Re: XML Output: libxo - provide single API to output TXT, XML, JSON and HTML
Message-ID:  <20140731175547.GO43962@funkthat.com>
In-Reply-To: <201407302324.s6UNOB2H087915@idle.juniper.net>
References:  <20140730193819.GM43962@funkthat.com> <201407302324.s6UNOB2H087915@idle.juniper.net>

Phil Shafer wrote this message on Wed, Jul 30, 2014 at 19:24 -0400:
> John-Mark Gurney writes:
> >My vote would be to use and *enforce* UTF-8 by the API.   That means if
> >someone passes a string in, it must be properly formed UTF-8...
> 
> I can certainly see making this an option, detecting the high-bit
> and inspecting the following 1-5 bytes to ensure the corresponding
> high two bits are set appropriately.  But what action would you
> expect the library to take when invalid strings are passed in?

Return an error?  printf can return an error, yet most people don't
check it.. so no real difference in API/bugs...
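
As a rough illustration of "return an error", here is a minimal sketch of
a well-formedness check. utf8_validate() is a hypothetical name and is not
part of libxo. It follows RFC 3629, which caps a sequence at four bytes,
so a lead byte is followed by at most three continuation bytes. It also
rejects overlong forms, surrogates, and values above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a libxo function): returns 0 if buf[0..len)
 * is well-formed UTF-8 per RFC 3629, or -1 at the first malformed
 * sequence.
 */
int
utf8_validate(const unsigned char *buf, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = buf[i];
		size_t need;		/* continuation bytes expected */
		uint32_t cp, min;	/* code point, smallest legal value */

		if (c < 0x80) {		/* plain ASCII */
			i++;
			continue;
		} else if ((c & 0xE0) == 0xC0) {
			need = 1; cp = c & 0x1F; min = 0x80;
		} else if ((c & 0xF0) == 0xE0) {
			need = 2; cp = c & 0x0F; min = 0x800;
		} else if ((c & 0xF8) == 0xF0) {
			need = 3; cp = c & 0x07; min = 0x10000;
		} else {
			return (-1);	/* stray continuation or bad lead byte */
		}

		if (len - i - 1 < need)
			return (-1);	/* sequence truncated */

		for (size_t k = 1; k <= need; k++) {
			unsigned char cc = buf[i + k];

			if ((cc & 0xC0) != 0x80)
				return (-1);	/* top two bits must be 10 */
			cp = (cp << 6) | (cc & 0x3F);
		}

		if (cp < min || cp > 0x10FFFF)
			return (-1);	/* overlong form or out of range */
		if (cp >= 0xD800 && cp <= 0xDFFF)
			return (-1);	/* UTF-16 surrogate */

		i += need + 1;
	}
	return (0);
}
```

A caller would treat a -1 result the same way it treats a negative return
from printf: the string was not emitted, and it is up to the caller to
notice.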

The reason I even suggest this is that JSON requires the output to be
in Unicode... Not some special locale encoding..  See section 3 of:
https://www.ietf.org/rfc/rfc4627.txt

Besides we should finally move to UTF-8 for file system and other
parts of the system...  I do like the idea of random binary filenames,
but we really should stop sticking our head in the sand.. We will only
make ourselves look silly when 2020 rolls around if we don't...

> libxo supports a warning flag, that will trigger warnings on stderr
> for things like invalid or malformed format strings, but I'm not
> sure I'd be happy if the library skipped invalid strings.

printf may skip parts of your strings if you don't check its return
value...  Plus, if the API states you must pass in UTF-8 strings,
and someone doesn't properly encode/convert to UTF-8, it's their
bug, not the library's bug...  We have too many encoding issues
already in our source tree, and we need to get better about making
sure we don't have them, and this will help...

> BTW, this issue is driven by "w"s use of wide characters (for
> days of the week).

Plus, enforcing UTF-8 will make the w versions easier, and allow
the library to output other widths of UTF if wanted/requested..
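
To make the "other widths" point concrete: once the input is known to be
well-formed UTF-8, re-encoding it is mechanical. The sketch below converts
to UTF-16 code units. utf8_to_utf16() is a hypothetical name, not a libxo
function, and it assumes the caller has already validated the input, so
it does no error checking on the byte sequences themselves.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a libxo function): convert already-validated
 * UTF-8 in in[0..len) to UTF-16 code units in out[0..cap).  Returns the
 * number of units written, or (size_t)-1 if out is too small.
 */
size_t
utf8_to_utf16(const unsigned char *in, size_t len, uint16_t *out, size_t cap)
{
	size_t i = 0, n = 0;

	while (i < len) {
		unsigned char c = in[i++];
		uint32_t cp;
		int need;	/* continuation bytes still to read */

		if (c < 0x80) {
			cp = c; need = 0;
		} else if ((c & 0xE0) == 0xC0) {
			cp = c & 0x1F; need = 1;
		} else if ((c & 0xF0) == 0xE0) {
			cp = c & 0x0F; need = 2;
		} else {
			cp = c & 0x07; need = 3;
		}

		/* Input is assumed valid, so the bytes are known to exist. */
		while (need-- > 0)
			cp = (cp << 6) | (in[i++] & 0x3F);

		if (cp < 0x10000) {
			if (n + 1 > cap)
				return ((size_t)-1);
			out[n++] = (uint16_t)cp;
		} else {
			/* Above the BMP: emit a surrogate pair. */
			if (n + 2 > cap)
				return ((size_t)-1);
			cp -= 0x10000;
			out[n++] = (uint16_t)(0xD800 | (cp >> 10));
			out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
		}
	}
	return (n);
}
```

The same decode loop could feed a UTF-32 buffer by storing cp directly
instead of splitting it into surrogates.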

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20140731175547.GO43962>
