From owner-freebsd-current@FreeBSD.ORG Fri Feb 15 15:52:23 2013 Return-Path: Delivered-To: current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id 288833B0 for ; Fri, 15 Feb 2013 15:52:23 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigknife-pt.tunnel.tserv9.chi1.ipv6.he.net [IPv6:2001:470:1f10:75::2]) by mx1.freebsd.org (Postfix) with ESMTP id C99FC3E4 for ; Fri, 15 Feb 2013 15:52:22 +0000 (UTC) Received: from pakbsde14.localnet (unknown [38.105.238.108]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 36B7FB94B; Fri, 15 Feb 2013 10:52:22 -0500 (EST) From: John Baldwin To: Jilles Tjoelker Subject: Re: [PATCH] open_memstream() and open_wmemstream() Date: Fri, 15 Feb 2013 10:41:14 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p25; KDE/4.5.5; amd64; ; ) References: <201302051546.43839.jhb@freebsd.org> <201302131144.19617.jhb@freebsd.org> <20130215151011.GA16538@stack.nl> In-Reply-To: <20130215151011.GA16538@stack.nl> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201302151041.14359.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Fri, 15 Feb 2013 10:52:22 -0500 (EST) Cc: current@freebsd.org X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 15 Feb 2013 15:52:23 -0000 On Friday, February 15, 2013 10:10:12 am Jilles Tjoelker wrote: > On Wed, Feb 13, 2013 at 11:44:19AM -0500, John Baldwin wrote: > > On Thursday, February 07, 2013 4:12:22 pm Jilles Tjoelker wrote: > > > On Tue, Feb 05, 2013 at 03:46:43PM -0500, John Baldwin wrote: > > > > I've written an implementation of open_memstream() and > > > > open_wmemstream() along with a set of regression tests. I'm pretty > > > > sure open_memstream() is correct, and I believe open_wmemstream() is > > > > correct for expected usage. The latter might even do the right thing > > > > if you split a multi-byte character across multiple writes. One > > > > question I have is if my choice to discard any pending multi-byte > > > > state in the stream anytime a seek changes the effective position in > > > > the output stream. I think this is correct as stdio will flush any > > > > pending data before doing a seek, so if there is a partially parsed > > > > character we aren't going to get the rest of it. > > > > I don't think partially parsed characters can happen with a correct > > > application. As per C99, an application must not call byte output > > > functions on a wide-oriented stream, and vice versa. > > > Oh, interesting. Should I remove those tests for using byte output > > functions from the open_wmemstream test? > > Yes. Unless real applications are identified that use this undefined > behaviour, such a test would test the implementation and not the > specification. Ok. > I briefly tried to run the tests against the glibc implementation > (Ubuntu 10.04) and got many failures. Glibc does not check the > parameters for null pointers and simply stores them blindly which is > POSIX-compliant. Things like bad length 6 for "foo" appear to be glibc > bugs, and test-open_wmemstream even segfaults. Eh, the open group page requires EINVAL: http://pubs.opengroup.org/onlinepubs/9699919799/functions/open_memstream.html > > > Discarding the shift state on fseek()/fseeko() is permitted (but should > > > be documented as this is implementation-defined behaviour). > > > State-dependent encodings (where this is relevant) are rarely used > > > nowadays. > > > > The conversion to bytes and back probably makes open_wmemstream() quite > > > slow but I don't think that is very important. > > > Note that we do this already for swprintf(). I've added this note: > > > IMPLEMENTATION NOTES > > Internally all I/O streams are effectively byte-oriented, so using wide- > > oriented operations to write to a stream opened via open_wmemstream() > > results in wide characters being expanded to a stream of multibyte char- > > acters in stdio's internal buffers. These multibyte characters are then > > converted back to wide characters when written into the stream. As a > > result, the wide-oriented streams maintain an internal multibyte charac- > > ter conversion state that is cleared on any seek opertion that changes > > the current position. This should have no effect unless byte-oriented > > output operations are used on a wide-oriented stream. > > This appears to "approve" the use of byte-oriented operations on a > wide-oriented stream which is in fact undefined. Hmm, I can remove the last bit ("unless..."), but am struggling to think of good wording. Maybe: This should only affect use of byte-oriented output operations on a wide- oriented stream which is an undefined operation and should be avoided. > > > > http://www.FreeBSD.org/~jhb/patches/open_memstream.patch > > > > The seek functions should check for overflow in the addition (for > > > SEEK_CUR and SEEK_END) and the conversion to size_t. > > > Note that SEEK_CUR is effectively only called with an offset of 0 > > via __ftello(). I'm not sure of the best way to check for overflow, do I have > > to do something like this: > > > static int > > will_overflow(size_t off, fpos_t pos) > > { > > if (pos < 0) > > return (-pos > off); > > return (SIZE_MAX - off > pos); > > } > > > For SEEK_CUR I would test 'will_overflow(ms->offset, pos)' and > > for SEEK_END 'will_overflow(ms->len, pos)' > > > Presumably SEEK_SET should test for the requested position being > > beyond SIZE_MAX as well? > > > If my above test is ok then I end up with this: > > Hmm, perhaps it is better to restrict to SSIZE_MAX instead of SIZE_MAX > as this greatly simplifies the code. With the above code, comparisons > may be signed or unsigned and there may be no greater type that fits > both a size_t and an fpos_t. Negating a negative fpos_t is also > incorrect because it may overflow. If you restrict the memory buffer to > ssize_t, an ssize_t can be safely converted to an fpos_t. Humm. Presumably I have to check for overflow in the expression to grow the stream as well? I have no idea how to write these checks. :( Of course, practically speaking (and I can make assumptions about the implementation I think since this is libc internals after all), fpos_t == off_t and sizeof(size_t) <= sizeof(off_t) for all platforms we support (and likely will always be true, files can always be larger than memory). Does that simplify things if I assume that? -- John Baldwin