From owner-cvs-src@FreeBSD.ORG  Fri Jan 21 14:39:07 2005
Return-Path: <owner-cvs-src@FreeBSD.ORG>
Delivered-To: cvs-src@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id ECE8916A4CE; Fri, 21 Jan 2005 14:39:07 +0000 (GMT)
Received: from mailout1.pacific.net.au (mailout1.pacific.net.au [61.8.0.84])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 40E0843D1D; Fri, 21 Jan 2005 14:39:07 +0000 (GMT)
	(envelope-from bde@zeta.org.au)
Received: from mailproxy2.pacific.net.au (mailproxy2.pacific.net.au
	[61.8.0.87])j0LEd6A6022793;	Sat, 22 Jan 2005 01:39:06 +1100
Received: from katana.zip.com.au (katana.zip.com.au [61.8.7.246])
	j0LEd3xH009024;	Sat, 22 Jan 2005 01:39:04 +1100
Date: Sat, 22 Jan 2005 01:39:03 +1100 (EST)
From: Bruce Evans <bde@zeta.org.au>
X-X-Sender: bde@delplex.bde.org
To: Peter Wemm <peter@freebsd.org>
In-Reply-To: <200501210123.j0L1NP1u009492@repoman.freebsd.org>
Message-ID: <20050122003920.U37564@delplex.bde.org>
References: <200501210123.j0L1NP1u009492@repoman.freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: cvs-src@freebsd.org
cc: src-committers@freebsd.org
cc: cvs-all@freebsd.org
Subject: Re: cvs commit: src/sys/nfs nfs_vfsops.c
X-BeenThere: cvs-src@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: CVS commit messages for the src tree <cvs-src.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-src>,
	<mailto:cvs-src-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/cvs-src>
List-Post: <mailto:cvs-src@freebsd.org>
List-Help: <mailto:cvs-src-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/cvs-src>,
	<mailto:cvs-src-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Jan 2005 14:39:08 -0000

On Fri, 21 Jan 2005, Peter Wemm wrote:

> peter       2005-01-21 01:23:25 UTC
>
>   FreeBSD src repository
>
>   Modified files:        (Branch: RELENG_4)
>     sys/nfs              nfs_vfsops.c
>   Log:
>   Yet another pass on trying to fix the nfs statfs large-fs blocksize scaler.
>   Casting the result of the 64 bit division to 32 bits (thus discarding the
>   upper 32 bits) and then looking at the truncated result to try and figure
>   out if the untruncated result would fit in 32 bits was utterly useless.
>
>   I am still not sure that it is right, but it has a chance of working now.
>   I'm not at all sure about the sign handling.  NFSv3 only reports positive
>   values here, but correctness of handling the 63/64 bit signs on nfs
>   volumes is not a problem we'll likely have to deal with for some time. I
>   think the "most correct" test is for an unsigned division testing for
>   exceeding LONG_MAX, since we should never end up with a negative number to
>   compare against LONG_MIN.

I sent many mails with better sign handling and many other details,
including ones about the panic implemented in this commit.  The panic
here is just a minor variation of the one in rev.1.133 which was "fixed"
in rev.1.135 using bogus casts to annull buggy code.  Now the modified
casts have no effect so the code is active again.  It was reactivated
in a slightly different way in -current a month or two ago, but has
been backed out there, thus reducing the main bug in this area in
-current to wrong values for negative available space (e.g., 2^55-1
instead of -1 for -1 available blocks).

>   values here, but correctness of handling the 63/64 bit signs on nfs
>   volumes is not a problem we'll likely have to deal with for some time. I
>   think the "most correct" test is for an unsigned division testing for
>   exceeding LONG_MAX, since we should never end up with a negative number to
>   compare against LONG_MIN.

No, negative numbers happen every day for fs_bavail with non-broken servers.
The v3 server has been broken in -current but remains unbroken in RELENG_4.

% Index: nfs_vfsops.c
% ===================================================================
% RCS file: /home/ncvs/src/sys/nfs/Attic/nfs_vfsops.c,v
% retrieving revision 1.91.2.8
% retrieving revision 1.91.2.9
% diff -u -2 -r1.91.2.8 -r1.91.2.9
% --- nfs_vfsops.c	10 Jul 2004 10:28:08 -0000	1.91.2.8
% +++ nfs_vfsops.c	21 Jan 2005 01:23:25 -0000	1.91.2.9
% @@ -294,16 +294,16 @@
%  			sbp->f_bsize = bsize;
%  			tquad = fxdr_hyper(&sfp->sf_tbytes);
% -			if (((long)(tquad / bsize) > LONG_MAX) ||
% -			    ((long)(tquad / bsize) < LONG_MIN))
% +			if (((quad_t)(tquad / bsize) > LONG_MAX) ||
% +			    ((quad_t)(tquad / bsize) < LONG_MIN))
%  				continue;

Casting to quad_t has no effect, since bsize begins at 512 so dividing
by it always reduces tquad far below QUAD_MAX.  The result of the
division alone is always nonnegative so the comparison with LONG_MIN
has no effect either.  Removing the null code and obfuscatory parentheses
gives:

			if (tquad / bsize > LONG_MAX)
				continue;

This is the same as in rev.1.133 except for it not having excessive
parentheses.

A panic can occur due to continuing here, but is unlikely.  If tquad is
nearly 2^64, then there is no value of bsize that can reduce the block
count to less than LONG_MAX on machines with 32-bit longs.  Then
bsize wants to be doubled endlessly, but it first overflows on doubling
of 2^30 and then causes a panic for division by 0 on doubling of 2^31.
But physical file systems cannot have sizes anywhere near 2^64 bytes yet.

My version avoids the panic using explicit bounds checking for bsize.

%  			sbp->f_blocks = tquad / bsize;
%  			tquad = fxdr_hyper(&sfp->sf_fbytes);
% -			if (((long)(tquad / bsize) > LONG_MAX) ||
% -			    ((long)(tquad / bsize) < LONG_MIN))
% +			if (((quad_t)(tquad / bsize) > LONG_MAX) ||
% +			    ((quad_t)(tquad / bsize) < LONG_MIN))
%  				continue;

As above.

%  			sbp->f_bfree = tquad / bsize;
%  			tquad = fxdr_hyper(&sfp->sf_abytes);
% -			if (((long)(tquad / bsize) > LONG_MAX) ||
% -			    ((long)(tquad / bsize) < LONG_MIN))
% +			if (((quad_t)(tquad / bsize) > LONG_MAX) ||
% +			    ((quad_t)(tquad / bsize) < LONG_MIN))
%  				continue;

Now the panic can very easily occur.  Non-broken servers pass negative
available space counts mod 2^64.  Then tquad is 2^64-epsilon, so
tquad / bsize is initially 2^55-epsilon1 and never smaller than
2^33-epsilon2.  So on machines with 32-bit longs, there is no possible
value of bsize which can reduce 2^64-epsilon to less that LONG_MAX,
and doubling bsize causes a panic as above.

Fixing the bugs consists mainly of removing parentheses:

			if ((quad_t)tquad / bsize > LONG_MAX ||
			    (quad_t)tquad / bsize < LONG_MIN)
				continue;

%  			sbp->f_bavail = tquad / bsize;

tquad must also be converted to a signed type before this division too.
In my version, tquad is a quad_t and holds the divided value so that
the division and its ugly cast doesn't have to be repeated for the
range checking, but this is not quite right since tquad really should
be an unsigned type except for handling f_bavail.

Bruce