From owner-freebsd-current@FreeBSD.ORG  Fri Feb 15 15:52:23 2013
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: current@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by hub.freebsd.org (Postfix) with ESMTP id 288833B0
 for <current@freebsd.org>; Fri, 15 Feb 2013 15:52:23 +0000 (UTC)
 (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net
 [IPv6:2001:470:1f10:75::2])
 by mx1.freebsd.org (Postfix) with ESMTP id C99FC3E4
 for <current@freebsd.org>; Fri, 15 Feb 2013 15:52:22 +0000 (UTC)
Received: from pakbsde14.localnet (unknown [38.105.238.108])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id 36B7FB94B;
 Fri, 15 Feb 2013 10:52:22 -0500 (EST)
From: John Baldwin <jhb@freebsd.org>
To: Jilles Tjoelker <jilles@stack.nl>
Subject: Re: [PATCH] open_memstream() and open_wmemstream()
Date: Fri, 15 Feb 2013 10:41:14 -0500
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p25; KDE/4.5.5; amd64; ; )
References: <201302051546.43839.jhb@freebsd.org>
 <201302131144.19617.jhb@freebsd.org> <20130215151011.GA16538@stack.nl>
In-Reply-To: <20130215151011.GA16538@stack.nl>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201302151041.14359.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Fri, 15 Feb 2013 10:52:22 -0500 (EST)
Cc: current@freebsd.org
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
 <freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-current>, 
 <mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
 <mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 15 Feb 2013 15:52:23 -0000

On Friday, February 15, 2013 10:10:12 am Jilles Tjoelker wrote:
> On Wed, Feb 13, 2013 at 11:44:19AM -0500, John Baldwin wrote:
> > On Thursday, February 07, 2013 4:12:22 pm Jilles Tjoelker wrote:
> > > On Tue, Feb 05, 2013 at 03:46:43PM -0500, John Baldwin wrote:
> > > > I've written an implementation of open_memstream() and
> > > > open_wmemstream() along with a set of regression tests.  I'm pretty
> > > > sure open_memstream() is correct, and I believe open_wmemstream() is
> > > > correct for expected usage.  The latter might even do the right thing
> > > > if you split a multi-byte character across multiple writes.  One
> > > > question I have is if my choice to discard any pending multi-byte
> > > > state in the stream anytime a seek changes the effective position in
> > > > the output stream.  I think this is correct as stdio will flush any
> > > > pending data before doing a seek, so if there is a partially parsed
> > > > character we aren't going to get the rest of it.
> 
> > > I don't think partially parsed characters can happen with a correct
> > > application. As per C99, an application must not call byte output
> > > functions on a wide-oriented stream, and vice versa.
> 
> > Oh, interesting.  Should I remove those tests for using byte output
> > functions from the open_wmemstream test?
> 
> Yes. Unless real applications are identified that use this undefined
> behaviour, such a test would test the implementation and not the
> specification.

Ok.

> I briefly tried to run the tests against the glibc implementation
> (Ubuntu 10.04) and got many failures. Glibc does not check the
> parameters for null pointers and simply stores them blindly which is
> POSIX-compliant. Things like bad length 6 for "foo" appear to be glibc
> bugs, and test-open_wmemstream even segfaults.

Eh, the open group page requires EINVAL:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/open_memstream.html

> > > Discarding the shift state on fseek()/fseeko() is permitted (but should
> > > be documented as this is implementation-defined behaviour).
> > > State-dependent encodings (where this is relevant) are rarely used
> > > nowadays.
> 
> > > The conversion to bytes and back probably makes open_wmemstream() quite
> > > slow but I don't think that is very important.
> 
> > Note that we do this already for swprintf().  I've added this note:
> 
> > IMPLEMENTATION NOTES
> >      Internally all I/O streams are effectively byte-oriented, so using wide-
> >      oriented operations to write to a stream opened via open_wmemstream()
> >      results in wide characters being expanded to a stream of multibyte char-
> >      acters in stdio's internal buffers.  These multibyte characters are then
> >      converted back to wide characters when written into the stream.  As a
> >      result, the wide-oriented streams maintain an internal multibyte charac-
> >      ter conversion state that is cleared on any seek opertion that changes
> >      the current position.  This should have no effect unless byte-oriented
> >      output operations are used on a wide-oriented stream.
> 
> This appears to "approve" the use of byte-oriented operations on a
> wide-oriented stream which is in fact undefined.

Hmm, I can remove the last bit ("unless..."), but am struggling to think of good
wording.  Maybe:

    This should only affect use of byte-oriented output operations on a wide-
    oriented stream which is an undefined operation and should be avoided.

> > > > http://www.FreeBSD.org/~jhb/patches/open_memstream.patch
> 
> > > The seek functions should check for overflow in the addition (for
> > > SEEK_CUR and SEEK_END) and the conversion to size_t.
> 
> > Note that SEEK_CUR is effectively only called with an offset of 0
> > via __ftello().  I'm not sure of the best way to check for overflow, do I have 
> > to do something like this:
> 
> > static int
> > will_overflow(size_t off, fpos_t pos)
> > {
> > 	if (pos < 0)
> > 		return (-pos > off);
> > 	return (SIZE_MAX - off > pos);
> > }
> 
> > For SEEK_CUR I would test 'will_overflow(ms->offset, pos)' and
> > for SEEK_END 'will_overflow(ms->len, pos)'
> 
> > Presumably SEEK_SET should test for the requested position being
> > beyond SIZE_MAX as well?
> 
> > If my above test is ok then I end up with this:
> 
> Hmm, perhaps it is better to restrict to SSIZE_MAX instead of SIZE_MAX
> as this greatly simplifies the code. With the above code, comparisons
> may be signed or unsigned and there may be no greater type that fits
> both a size_t and an fpos_t. Negating a negative fpos_t is also
> incorrect because it may overflow. If you restrict the memory buffer to
> ssize_t, an ssize_t can be safely converted to an fpos_t.

Humm.  Presumably I have to check for overflow in the expression to grow
the stream as well?  I have no idea how to write these checks. :(

Of course, practically speaking (and I can make assumptions about the
implementation I think since this is libc internals after all),
fpos_t == off_t and sizeof(size_t) <= sizeof(off_t) for all platforms
we support (and likely will always be true, files can always be larger
than memory).  Does that simplify things if I assume that?

-- 
John Baldwin