From owner-freebsd-current@FreeBSD.ORG Wed Feb 13 20:04:22 2013 Return-Path: Delivered-To: current@freebsd.org Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115]) by hub.freebsd.org (Postfix) with ESMTP id AE9459B7 for ; Wed, 13 Feb 2013 20:04:22 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id 85832CB5 for ; Wed, 13 Feb 2013 20:04:22 +0000 (UTC) Received: from pakbsde14.localnet (unknown [38.105.238.108]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id B29AEB98A; Wed, 13 Feb 2013 15:04:21 -0500 (EST) From: John Baldwin To: Jilles Tjoelker Subject: Re: [PATCH] open_memstream() and open_wmemstream() Date: Wed, 13 Feb 2013 11:44:19 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p25; KDE/4.5.5; amd64; ; ) References: <201302051546.43839.jhb@freebsd.org> <20130207211222.GA98989@stack.nl> In-Reply-To: <20130207211222.GA98989@stack.nl> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201302131144.19617.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Wed, 13 Feb 2013 15:04:21 -0500 (EST) Cc: current@freebsd.org X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 13 Feb 2013 20:04:22 -0000 On Thursday, February 07, 2013 4:12:22 pm Jilles Tjoelker wrote: > On Tue, Feb 05, 2013 at 03:46:43PM -0500, John Baldwin wrote: > > I've written an implementation of open_memstream() and > > open_wmemstream() along with a set of regression tests. I'm pretty > > sure open_memstream() is correct, and I believe open_wmemstream() is > > correct for expected usage. The latter might even do the right thing > > if you split a multi-byte character across multiple writes. One > > question I have is if my choice to discard any pending multi-byte > > state in the stream anytime a seek changes the effective position in > > the output stream. I think this is correct as stdio will flush any > > pending data before doing a seek, so if there is a partially parsed > > character we aren't going to get the rest of it. > > I don't think partially parsed characters can happen with a correct > application. As per C99, an application must not call byte output > functions on a wide-oriented stream, and vice versa. Oh, interesting. Should I remove those tests for using byte output functions from the open_wmemstream test? > Discarding the shift state on fseek()/fseeko() is permitted (but should > be documented as this is implementation-defined behaviour). > State-dependent encodings (where this is relevant) are rarely used > nowadays. > > The conversion to bytes and back probably makes open_wmemstream() quite > slow but I don't think that is very important. Note that we do this already for swprintf(). I've added this note: IMPLEMENTATION NOTES Internally all I/O streams are effectively byte-oriented, so using wide- oriented operations to write to a stream opened via open_wmemstream() results in wide characters being expanded to a stream of multibyte char- acters in stdio's internal buffers. These multibyte characters are then converted back to wide characters when written into the stream. As a result, the wide-oriented streams maintain an internal multibyte charac- ter conversion state that is cleared on any seek opertion that changes the current position. This should have no effect unless byte-oriented output operations are used on a wide-oriented stream. > > http://www.FreeBSD.org/~jhb/patches/open_memstream.patch > > The seek functions should check for overflow in the addition (for > SEEK_CUR and SEEK_END) and the conversion to size_t. Note that SEEK_CUR is effectively only called with an offset of 0 via __ftello(). I'm not sure of the best way to check for overflow, do I have to do something like this: static int will_overflow(size_t off, fpos_t pos) { if (pos < 0) return (-pos > off); return (SIZE_MAX - off > pos); } For SEEK_CUR I would test 'will_overflow(ms->offset, pos)' and for SEEK_END 'will_overflow(ms->len, pos)' Presumably SEEK_SET should test for the requested position being beyond SIZE_MAX as well? If my above test is ok then I end up with this: static int will_overflow(size_t offset, fpos_t pos) { if (pos < 0) { if (-pos > off) { errno = EINVAL; return (1); } } else { if (SIZE_MAX - off > pos) { errno = EOVERFLOW; return (1); } } return (0); } static fpos_t memstream_seek(void *cookie, fpos_t pos, int whence) { struct memstream *ms; #ifdef DEBUG size_t old; #endif ms = cookie; #ifdef DEBUG old = ms->offset; #endif switch (whence) { case SEEK_SET: if (pos > SIZE_MAX) { errno = EOVERFLOW; return (-1); } ms->offset = pos; break; case SEEK_CUR: if (will_overflow(ms->offset, pos)) return (-1); ms->offset += pos; break; case SEEK_END: if (will_overflow(ms->len, pos)) return (-1); ms->offset = ms->len + pos; break; } memstream_update(ms); #ifdef DEBUG fprintf(stderr, "MS: seek(%p, %zd, %d) %zd -> %zd\n", ms, pos, whence, old, ms->offset); #endif return (ms->offset); } Thanks for reviewing this so far. -- John Baldwin