From owner-svn-src-head@FreeBSD.ORG Tue Jan 31 15:20:10 2012 Return-Path: Delivered-To: svn-src-head@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3D8011065670; Tue, 31 Jan 2012 15:20:10 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail09.syd.optusnet.com.au (mail09.syd.optusnet.com.au [211.29.132.190]) by mx1.freebsd.org (Postfix) with ESMTP id C56238FC19; Tue, 31 Jan 2012 15:20:09 +0000 (UTC) Received: from c211-30-171-136.carlnfd1.nsw.optusnet.com.au (c211-30-171-136.carlnfd1.nsw.optusnet.com.au [211.30.171.136]) by mail09.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id q0VFK6Fi003512 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 1 Feb 2012 02:20:07 +1100 Date: Wed, 1 Feb 2012 02:20:06 +1100 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Konstantin Belousov In-Reply-To: <20120131105431.GB3283@deviant.kiev.zoral.com.ua> Message-ID: <20120201013621.V2889@besplex.bde.org> References: <20120126153641.GA68112@FreeBSD.org> <20120127194612.H1547@besplex.bde.org> <20120127091244.GZ2726@deviant.kiev.zoral.com.ua> <20120127194221.GA25723@zim.MIT.EDU> <20120128123748.GD2726@deviant.kiev.zoral.com.ua> <20120129001225.GA32220@zim.MIT.EDU> <20120129062327.GK2726@deviant.kiev.zoral.com.ua> <20120129223904.GA37483@zim.MIT.EDU> <20120130063034.GU2726@deviant.kiev.zoral.com.ua> <20120130190703.GA44663@zim.MIT.EDU> <20120131105431.GB3283@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, Gleb Smirnoff , src-committers@freebsd.org, Bruce Evans Subject: Re: svn commit: r230583 - head/sys/kern X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 31 Jan 2012 15:20:10 -0000 On Tue, 31 Jan 2012, Konstantin Belousov wrote: > On Mon, Jan 30, 2012 at 02:07:03PM -0500, David Schultz wrote: >> That's why I'm glad I'm not committing it. :) A more conservative >> change (also known as "kicking the can down the road") would be to >> add a VFS flag, e.g., VFCF_LONGIO, and only set it on file systems >> that have been thoroughly reviewed. The VFS layer could cap the size >> at INT_MAX for file systems without the flag. > At least I will get more mail after the commit, I hope. > > I disagree with the VFCF_LONGIO approach. It will cause much head-scratching > for unsuspecting user who would try to use > 4GB transfers. No one cared when this was broken for almost 10 years by 4.4BSD changing the type of iov_len from int to size_t. You can almost equally safely break it again :(. Previously (in FreeBSD-1 and presumably in Net/2) iov_len had the same type as uio_resid, and the code in readv(), etc., depended on this and also on benign undefined behaviour on overflow to detect overflow (after it gave undefined behaviour by occurring) when adding up iov_len's to get uio_resid. The bug didn't require passing a buffer of size > INT_MAX to reach -- it just required a single or multiple iov_len whose total size exceeded INT_MAX. FreeBSD fixed this in 2003. size_t for iov_len is bogus even if the limit is SSIZE max. It allows a single iov to have a size that cannot work, and n iov's to have a size almost 2*n times too large to work. Unfortunately, POSIX standardizes the type of iov_len as size_t. It is interesting that POSIX says "shall fail" for readv() when the sum of the iov_len values overflowed [sic] an ssize_t. For read(), the behaviour when the count exceeds {SSIZE_MAX} is implementation-defined. This gives the silly possibility that read() can work for a size that is almost twice as large as readv(), using a single count instead of an array of counts. It is also strange that readv() refers to ssize_t while read() refers to {SSIZE_MAX}. POSIX standardizes the bug that {SSIZE_MAX} is the limit of the type, although that may be unrelated to sizes that can work. If these APIs had been correctly designed, then the limit would be {READ_MAX} and unrelated to any size type (except that the type must be large enough to represent {READ_MAX}). It would normally be INT_MAX or much smaller. readv() also has the bogus specification that if iovcnt is <= 0 or > {IOV_MAX}, then failure is optional. So failure is optional for a hard error like iovcnt > {IOV_MAX}, but is required for a condition that is not required to be an error for a similar API (count > {SSIZE_MAX}). I only checked a 2001 draft for this. Maybe some of these bugs have been fixed. > What I can do, is to commit all changes except removals of the checks > for INT_MAX. After type changes settle, I can try to gather enough > bravery to flip the checks in HEAD, possibly with temporary sysctl > to return to old behaviour for emergency (AKA hole). > >> >>> diff --git a/sys/kern/sys_pipe.c b/sys/kern/sys_pipe.c >>> index 9edcb74..332ec37 100644 >>> --- a/sys/kern/sys_pipe.c >>> +++ b/sys/kern/sys_pipe.c >> [...] >>> @@ -757,14 +757,14 @@ pipe_build_write_buffer(wpipe, uio) >>> struct pipe *wpipe; >>> struct uio *uio; >>> { >>> - u_int size; >>> + size_t size; >>> int i; >>> >>> PIPE_LOCK_ASSERT(wpipe, MA_NOTOWNED); >>> KASSERT(wpipe->pipe_state & PIPE_DIRECTW, >>> ("Clone attempt on non-direct write pipe!")); >>> >>> - size = (u_int) uio->uio_iov->iov_len; >>> + size = uio->uio_iov->iov_len; >>> if (size > wpipe->pipe_buffer.size) >>> size = wpipe->pipe_buffer.size; >> >> The transfer can't be bigger than the max pipe buffer size (64k), >> so `size = (int)MIN(uio->uio_iov->iov_len, wpipe->pipe_buffer.size)' >> should suffice. The same comment applies elsewhere in the file. > > True. If you much prefer this version, I will change the patch. But I do > think that my changes are cleaner. size_t is a bogus type for `size', since it must still be a u_int to work. For example, when it is assigned to the u_int wpipe->pipe_map.cnt. This shouldn't be "fixed" by bloating the type to size_t throughout (any global change should be to int to give suitably undefined behaviour on overflow). There must be an implicit or explicit check on it somewhere that it fits. This check is provided by the comparison with wpipe->pipe_buffer.size. But the above is a bad way to write it -- it should be checked before assignment, to avoid any possibility of overflow or sign extension of either operand before the check: u_int size; if (uio->uio_iov->iov_len > wpipe->pipe_buffer.size) size = wpipe->pipe_buffer.size; else size = uio->uio_iov->iov_len; This is the same as the MIN() expression (without a bogus cast). I think I want to write it out verbosely like this and not use MIN(), even if MIN() were not a style bug, to make it clear that this is a correct bounds check. Bruce