From owner-freebsd-current@FreeBSD.ORG  Wed Feb 13 20:04:22 2013
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: current@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id AE9459B7
 for <current@freebsd.org>; Wed, 13 Feb 2013 20:04:22 +0000 (UTC)
 (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net
 [IPv6:2001:470:1f10:75::2])
 by mx1.freebsd.org (Postfix) with ESMTP id 85832CB5
 for <current@freebsd.org>; Wed, 13 Feb 2013 20:04:22 +0000 (UTC)
Received: from pakbsde14.localnet (unknown [38.105.238.108])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id B29AEB98A;
 Wed, 13 Feb 2013 15:04:21 -0500 (EST)
From: John Baldwin <jhb@freebsd.org>
To: Jilles Tjoelker <jilles@stack.nl>
Subject: Re: [PATCH] open_memstream() and open_wmemstream()
Date: Wed, 13 Feb 2013 11:44:19 -0500
User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p25; KDE/4.5.5; amd64; ; )
References: <201302051546.43839.jhb@freebsd.org>
 <20130207211222.GA98989@stack.nl>
In-Reply-To: <20130207211222.GA98989@stack.nl>
MIME-Version: 1.0
Content-Type: Text/Plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201302131144.19617.jhb@freebsd.org>
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Wed, 13 Feb 2013 15:04:21 -0500 (EST)
Cc: current@freebsd.org
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
 <freebsd-current.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-current>, 
 <mailto:freebsd-current-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-current>
List-Post: <mailto:freebsd-current@freebsd.org>
List-Help: <mailto:freebsd-current-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-current>,
 <mailto:freebsd-current-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 13 Feb 2013 20:04:22 -0000

On Thursday, February 07, 2013 4:12:22 pm Jilles Tjoelker wrote:
> On Tue, Feb 05, 2013 at 03:46:43PM -0500, John Baldwin wrote:
> > I've written an implementation of open_memstream() and
> > open_wmemstream() along with a set of regression tests.  I'm pretty
> > sure open_memstream() is correct, and I believe open_wmemstream() is
> > correct for expected usage.  The latter might even do the right thing
> > if you split a multi-byte character across multiple writes.  One
> > question I have is if my choice to discard any pending multi-byte
> > state in the stream anytime a seek changes the effective position in
> > the output stream.  I think this is correct as stdio will flush any
> > pending data before doing a seek, so if there is a partially parsed
> > character we aren't going to get the rest of it.
> 
> I don't think partially parsed characters can happen with a correct
> application. As per C99, an application must not call byte output
> functions on a wide-oriented stream, and vice versa.

Oh, interesting.  Should I remove those tests for using byte output
functions from the open_wmemstream test?

> Discarding the shift state on fseek()/fseeko() is permitted (but should
> be documented as this is implementation-defined behaviour).
> State-dependent encodings (where this is relevant) are rarely used
> nowadays.
> 
> The conversion to bytes and back probably makes open_wmemstream() quite
> slow but I don't think that is very important.

Note that we do this already for swprintf().  I've added this note:

IMPLEMENTATION NOTES
     Internally all I/O streams are effectively byte-oriented, so using wide-
     oriented operations to write to a stream opened via open_wmemstream()
     results in wide characters being expanded to a stream of multibyte char-
     acters in stdio's internal buffers.  These multibyte characters are then
     converted back to wide characters when written into the stream.  As a
     result, the wide-oriented streams maintain an internal multibyte charac-
     ter conversion state that is cleared on any seek opertion that changes
     the current position.  This should have no effect unless byte-oriented
     output operations are used on a wide-oriented stream.
 
> > http://www.FreeBSD.org/~jhb/patches/open_memstream.patch
> 
> The seek functions should check for overflow in the addition (for
> SEEK_CUR and SEEK_END) and the conversion to size_t.

Note that SEEK_CUR is effectively only called with an offset of 0
via __ftello().  I'm not sure of the best way to check for overflow, do I have 
to do something like this:

static int
will_overflow(size_t off, fpos_t pos)
{
	if (pos < 0)
		return (-pos > off);
	return (SIZE_MAX - off > pos);
}

For SEEK_CUR I would test 'will_overflow(ms->offset, pos)' and
for SEEK_END 'will_overflow(ms->len, pos)'

Presumably SEEK_SET should test for the requested position being
beyond SIZE_MAX as well?

If my above test is ok then I end up with this:

static int
will_overflow(size_t offset, fpos_t pos)
{
	if (pos < 0) {
		if (-pos > off) {
			errno = EINVAL;
			return (1);
		}
	} else {
		if (SIZE_MAX - off > pos) {
			errno = EOVERFLOW;
			return (1);
		}
	}
	return (0);
}

static fpos_t
memstream_seek(void *cookie, fpos_t pos, int whence)
{
	struct memstream *ms;
#ifdef DEBUG
	size_t old;
#endif

	ms = cookie;
#ifdef DEBUG
	old = ms->offset;
#endif
	switch (whence) {
	case SEEK_SET:
		if (pos > SIZE_MAX) {
			errno = EOVERFLOW;
			return (-1);
		}
		ms->offset = pos;
		break;
	case SEEK_CUR:
		if (will_overflow(ms->offset, pos))
			return (-1);
		ms->offset += pos;
		break;
	case SEEK_END:
		if (will_overflow(ms->len, pos))
			return (-1);
		ms->offset = ms->len + pos;
		break;
	}
	memstream_update(ms);
#ifdef DEBUG
	fprintf(stderr, "MS: seek(%p, %zd, %d) %zd -> %zd\n", ms, pos, whence,
	    old, ms->offset);
#endif
	return (ms->offset);
}

Thanks for reviewing this so far.

-- 
John Baldwin