From owner-freebsd-fs@FreeBSD.ORG Tue May 3 08:30:56 2011 Return-Path: Delivered-To: fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 56A621065674 for ; Tue, 3 May 2011 08:30:56 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au [211.29.132.184]) by mx1.freebsd.org (Postfix) with ESMTP id E77168FC08 for ; Tue, 3 May 2011 08:30:55 +0000 (UTC) Received: from c122-106-155-58.carlnfd1.nsw.optusnet.com.au (c122-106-155-58.carlnfd1.nsw.optusnet.com.au [122.106.155.58]) by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p438UpOZ008619 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 3 May 2011 18:30:52 +1000 Date: Tue, 3 May 2011 18:30:51 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Rick Macklem In-Reply-To: <2119325179.903923.1304380636687.JavaMail.root@erie.cs.uoguelph.ca> Message-ID: <20110503174200.V1050@besplex.bde.org> References: <2119325179.903923.1304380636687.JavaMail.root@erie.cs.uoguelph.ca> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: fs@FreeBSD.org Subject: Re: newnfs client and statfs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 03 May 2011 08:30:56 -0000 On Mon, 2 May 2011, Rick Macklem wrote: > I have attached a version of the patch that I intend to commit > unless it doesn't work for Kostik's test case. Kostik, could > you please test this one. > > Yes, Bruce, I realize you won't like it, but I > have put some comments in it > to try and clarify why it is coded the way it is. > (The arithmetic seems to work the way I would expect it to for > i386, which is the only arch I have for testing.) Sigh. % --- fs/nfsclient/nfs_clport.c.sav 2011-04-30 20:16:39.000000000 -0400 % +++ fs/nfsclient/nfs_clport.c 2011-05-02 19:32:31.000000000 -0400 % @@ -838,21 +838,33 @@ void % nfscl_loadsbinfo(struct nfsmount *nmp, struct nfsstatfs *sfp, void *statfs) % { % struct statfs *sbp = (struct statfs *)statfs; % - nfsquad_t tquad; % % if (nmp->nm_flag & (NFSMNT_NFSV3 | NFSMNT_NFSV4)) { % sbp->f_bsize = NFS_FABLKSIZE; % - tquad.qval = sfp->sf_tbytes; % - sbp->f_blocks = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE)); % - tquad.qval = sfp->sf_fbytes; % - sbp->f_bfree = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE)); % - tquad.qval = sfp->sf_abytes; % - sbp->f_bavail = (long)(tquad.qval / ((u_quad_t)NFS_FABLKSIZE)); % - tquad.qval = sfp->sf_tfiles; % - sbp->f_files = (tquad.lval[0] & 0x7fffffff); % - tquad.qval = sfp->sf_ffiles; % - sbp->f_ffree = (tquad.lval[0] & 0x7fffffff); % + sbp->f_blocks = sfp->sf_tbytes / NFS_FABLKSIZE; % + sbp->f_bfree = sfp->sf_fbytes / NFS_FABLKSIZE; % + /* % + * Although sf_abytes is uint64_t and f_bavail is int64_t, % + * the value after dividing by NFS_FABLKSIZE is small % + * enough that it will fit in 63bits, so it is ok to % + * assign it to f_bavail without fear that it will become % + * negative. % + */ % + sbp->f_bavail = sfp->sf_abytes / NFS_FABLKSIZE; % + sbp->f_files = sfp->sf_tfiles; % + /* Since f_ffree is int64_t, clip it to 63bits. */ % + if (sfp->sf_ffiles > (uint64_t)INT64_MAX) This cast has no effect. INT64_MAX has type int64_t. sf_ffiles has uint64_t. The default binary promotions cause both types to be promoted to the minimally larger common type. This type is uint64_t. Thus INT64_MAX is converted automatically to the correct type. % + sbp->f_ffree = INT64_MAX; % + else % + sbp->f_ffree = sfp->sf_ffiles; % } else if ((nmp->nm_flag & NFSMNT_NFSV4) == 0) { % + /* % + * The type casts to (int32_t) ensure that this code is % + * compatible with the old NFS client, in that it will % + * sign extend a value with bit31 set. This may or may % + * not be correct for NFSv2, but since it is a legacy % + * environment, I'd rather retain backwards compatibility. % + */ % sbp->f_bsize = (int32_t)sfp->sf_bsize; % sbp->f_blocks = (int32_t)sfp->sf_blocks; % sbp->f_bfree = (int32_t)sfp->sf_bfree; It won't sign extend, but will propagate bit31 as an unsigned bit. For example, sfp->sf_blocks = 0x80000000 becomes sbp->f_blocks = 0xFFFFFFFF80000000, which is massively different. Again, omitting the cast gives the correct result if the wire insists on its values being unsigned. The result is only backwards compatible with relatively recent FreeBSD nfs clients. All FreeBSD clients are completely broken if bit31 is set, and compatibility with this brokenness is not useful (but as I pointed out in another reply, we would never have seen the broken case when the old clients weren't old, since it takes a server file system size of about 32TB for bit 31 to be set). The details of the brokenness vary: Net/2, FreeBSD-1, 4.4BSD-Lite, FreeBSD-[2-4]: f_blocks was plain long: if long is 32 bits, then sfp->sf_blocks = 0x80000000 becomes sbp->f_blocks = -0x7fffffff - 1 (LONG_MIN) if long is 64 bits, then sfp->sf_blocks = 0x80000000 becomes sbp->f_blocks = -0x80000000L (INT32_MIN (same as 32-bit LONG_MIN) FreeBSD-current after 2003/11/12, FreeBSD-[5-9]: f_blocks is now uint64_t: changing it (and others from a signed type to an unsigned type mainly gave lots of sign extension bugs, including here. The bugs remain mostly unfixed. sfp->sf_blocks = 0x80000000 becomes sbp->f_blocks = 0xFFFFFFFF80000000 ((uint64_t)INT32_MIN) on all arches. Neither of the garbage values INT32_MIN, ((uint64_t)INT_MIN) gives useful behaviour. The former is negative, though the wire value cannot be negative (not sure about this for v2). Applications that are naive enough to believe this value should assume that the the file system has a negative size and never try to write anything. The latter is enormous and positive. If the wire count really is 0x80000000, then that is already very large, so believing that the value is 0xFFFFFFFF80000000 should make little difference. The bugs are a little different for signed fields like f_bavail. Now there are no sign extension bugs or version-dependent misbehaviours. There are just overflow bugs in the bogus casts. (int32_t)0x80000000 overflows to INT32_MIN (only on 2's complement machines, but no others are supported), and assignment to sbp->f_bavail doesn't change this garbage value. Now the bugs are even further off, since it takes about a 400 TB ffs server file system to reach them. (400 TB with 8% minfree gives a 32TB reserve for root. After using all 32TB of this reserve, there would be -32TB available for non-root. -32TB is INT32_MIN in 16K-blocks.) Bruce