Date:      Thu, 31 Jul 2014 14:09:37 -0700
From:      John-Mark Gurney <jmg@funkthat.com>
To:        Phil Shafer <phil@juniper.net>
Cc:        sjg@freebsd.org, arch@freebsd.org, marcel@freebsd.org
Subject:   Re: XML Output: libxo - provide single API to output TXT, XML, JSON and HTML
Message-ID:  <20140731210937.GV43962@funkthat.com>
In-Reply-To: <201407311839.s6VIdlMK096434@idle.juniper.net>
References:  <20140731175547.GO43962@funkthat.com> <201407311839.s6VIdlMK096434@idle.juniper.net>

Phil Shafer wrote this message on Thu, Jul 31, 2014 at 14:39 -0400:
> John-Mark Gurney writes:
> >Return an error?  printf can return an error, yet most people don't
> >check it.. so no real difference in API/bugs...
> 
> My concern is emitting half a string, where the half we don't emit
> is something important.  I don't want to make the opposite of an
> injection attack, where arranging some daemon to call xo_emit with
> a broken UTF-8 string allows an evil-doer to fix their evil content
> into the other half of the string.
> 
> I'm escaping XML, JSON, and HTML content already, so the simplest
> scheme is to:
> 
> a) UTF-8 check the format string;
>    if it fails, nothing is emitted
> b) for each format descriptor, check the content generated;
>    if it fails, nothing is emitted from the xo_emit call
>       anything already generated is discarded
> 
> Simple and easy.  Seem reasonable?  The other option would be to
> discard only that specific format descriptor or only that field
> description.
> 
>     xo_emit("{:good/%d}{:bad/%d%s}{:ugly}", 0, 55, "\xff\x01\xff", "cat");
> 
> Does the "<ugly>cat</ugly>" get emitted?  Is "<bad>55</bad>" emitted?
> 
> If "ugly" was <run-this-command-as-user>phil</...>, and the bogus
> string blocked the generation of that vital bit of info, life could
> be bad.

I agree...
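
For what it's worth, the all-or-nothing scheme above could look roughly
like the sketch below.  This is not libxo code: utf8_valid() and
emit_all_or_nothing() are hypothetical names made up for illustration,
the values are assumed to be already-formatted strings, and XML escaping
is left out to keep it short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Return true if the len bytes at s are well-formed UTF-8 (RFC 3629). */
static bool
utf8_valid(const unsigned char *s, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = s[i];
		size_t need;		/* continuation bytes expected */
		unsigned long cp;	/* decoded code point */
		unsigned long min;	/* smallest value legal at this length */

		if (c < 0x80) {
			i++;
			continue;
		} else if ((c & 0xe0) == 0xc0) {
			need = 1; cp = c & 0x1f; min = 0x80;
		} else if ((c & 0xf0) == 0xe0) {
			need = 2; cp = c & 0x0f; min = 0x800;
		} else if ((c & 0xf8) == 0xf0) {
			need = 3; cp = c & 0x07; min = 0x10000;
		} else
			return false;	/* stray continuation or 0xf8..0xff */

		if (len - i <= need)
			return false;	/* sequence is truncated */
		for (size_t k = 1; k <= need; k++) {
			if ((s[i + k] & 0xc0) != 0x80)
				return false;
			cp = (cp << 6) | (s[i + k] & 0x3f);
		}
		/* Reject overlong forms, surrogates, and out-of-range values. */
		if (cp < min || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff))
			return false;
		i += need + 1;
	}
	return true;
}

/*
 * Hypothetical helper, not part of libxo: render n <name>value</name>
 * pairs into out.  If any value is not valid UTF-8, or the result would
 * not fit, out is reset to the empty string and -1 is returned, so the
 * caller never sees a partial record.
 */
static int
emit_all_or_nothing(char *out, size_t outsz, const char *const *names,
    const char *const *values, size_t n)
{
	size_t used = 0;

	if (outsz == 0)
		return -1;
	out[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		int w;

		if (!utf8_valid((const unsigned char *)values[i],
		    strlen(values[i])))
			goto fail;
		w = snprintf(out + used, outsz - used, "<%s>%s</%s>",
		    names[i], values[i], names[i]);
		if (w < 0 || (size_t)w >= outsz - used)
			goto fail;
		used += (size_t)w;
	}
	return 0;
fail:
	out[0] = '\0';	/* discard anything already generated */
	return -1;
}
```

With the good/bad/ugly example, the bad "\xff\x01\xff" value makes the
whole call fail, so "ugly" is never emitted on its own without the
caller finding out from the return value.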

> Unfortunately, even this isn't a simple fix for "w", which wants
> to call wcsftime() to get wide values for month and day-of-the-week
> names.  Does wcsrtombs() convert this to UTF-8?  Is there a locale
> for UTF-8?

Well, from my understanding there can't be a "locale" that is UTF-8
as a locale contains more than just character encoding...  It also
includes month/day names, sorting, etc...  I think you can get a
C locale (the default) w/ UTF-8 by setting the correct environment
variables, but I don't know them well enough to say...  Should we add
a locale that does this?  There is UTF-8 in /usr/share/locale, but if
you set LANG to it, things don't work..
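
On the wcsrtombs() question: it converts to whatever encoding the
current LC_CTYPE uses, so you only get UTF-8 out if LC_CTYPE is a UTF-8
locale.  A rough sketch of setting just LC_CTYPE and leaving the other
categories alone is below.  The locale names tried are assumptions
(which ones exist varies by system), and select_utf8_ctype() and
wide_to_mb() are hypothetical helper names, not existing API.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Try a few locale names that may select UTF-8 for LC_CTYPE only, so
 * month/day names, collation, etc. stay as they were.  Returns the name
 * that worked, or NULL (with LC_CTYPE reset to "C") if none did.
 */
static const char *
select_utf8_ctype(void)
{
	static const char *const candidates[] = {
		"C.UTF-8", "en_US.UTF-8", "UTF-8"	/* assumed names */
	};

	for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]);
	    i++) {
		if (setlocale(LC_CTYPE, candidates[i]) != NULL &&
		    strcmp(nl_langinfo(CODESET), "UTF-8") == 0)
			return candidates[i];
	}
	setlocale(LC_CTYPE, "C");
	return NULL;
}

/*
 * Convert a wide string to multibyte using the current LC_CTYPE.
 * Returns 0 on success, -1 if a character cannot be converted or the
 * result (including the terminating NUL) does not fit in out.
 */
static int
wide_to_mb(const wchar_t *ws, char *out, size_t outsz)
{
	mbstate_t st;
	size_t n;

	memset(&st, 0, sizeof(st));
	n = wcsrtombs(out, &ws, outsz, &st);
	if (n == (size_t)-1 || ws != NULL)
		return -1;
	return 0;
}
```

If select_utf8_ctype() returns NULL, the caller is still in the plain C
locale and wide_to_mb() will fail on anything non-ASCII, which at least
is an error you can check for.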

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."


