Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 06 Nov 2003 16:25:04 -0800
From:      Kirk McKusick <mckusick@beastie.mckusick.com>
To:        Bruce Evans <bde@zeta.org.au>
Cc:        Peter Wemm <peter@freebsd.org>
Subject:   Re: >0x7fffffff blocksize filesystem reporting 
Message-ID:  <200311070025.hA70P4eN035719@beastie.mckusick.com>
In-Reply-To: Your message of "Thu, 06 Nov 2003 22:44:33 %2B1100." <20031106211046.J7380@gamplex.bde.org> 

next in thread | previous in thread | raw e-mail | index | archive | help
> Date: Thu, 6 Nov 2003 22:44:33 +1100 (EST)
> From: Bruce Evans <bde@zeta.org.au>
> To: Kirk McKusick <mckusick@mckusick.com>
> cc: Peter Wemm <peter@freebsd.org>, Robert Watson <rwatson@nailabs.com>,
>         arch@freebsd.org
> Subject: Re: >0x7fffffff blocksize filesystem reporting
> X-ASK-Info: Whitelist match
> 
> On Wed, 5 Nov 2003, Kirk McKusick wrote:
> 
> > + #define	MNAMELEN	80		/* size of on/from name bufs */
> 
> As pointed out by tjr, there are buffer overflows from bcopy() MNAMELEN
> bytes.  This is because the above MNAMELEN Is smaller than the old MNAMELEN
> on systems with sizeof(long) > 4.  It should be either the historical
> MNAMELEN of 90 or 88, or larger than that since we have room for expansion,
> or no larger than the previous smallest MNAMELEN for all machines, depending
> on whether the overflow bug is fixed by truncation on conversion or avoided
> by truncation when the name is copied in.

Per my earlier message to this list, I now copy the smaller of
MNAMELEN and OMNAMELEN. Per your suggestion, I have increased
MNAMELEN to 88 (see revised statfs structure below).

> > + #define STATFS_VERSION	0x20030518	/* current version number */
> 
> Style bug (space instead of tab).
> 
> > + struct statfs {
> > + 	u_int32_t f_version;		/* structure version number */
> > + 	u_int32_t f_type;		/* type of filesystem */
> > + 	u_int64_t f_flags;		/* copy of mount exported flags */
> > + 	u_int64_t f_bsize;		/* filesystem fragment size */
> > + 	u_int64_t f_iosize;		/* optimal transfer block size */
> > + 	u_int64_t f_blocks;		/* total data blocks in filesystem */
> > + 	u_int64_t f_bfree;		/* free blocks in filesystem */
> > + 	int64_t	  f_bavail;		/* free blocks avail to non-superuser */
> > + 	u_int64_t f_files;		/* total file nodes in filesystem */
> > + 	int64_t	  f_ffree;		/* free nodes avail to non-superuser */
> > + 	u_int64_t f_syncwrites;		/* count of sync writes since mount */
> > + 	u_int64_t f_asyncwrites;	/* count of async writes since mount */
> > + 	u_int64_t f_syncreads;		/* count of sync reads since mount */
> > + 	u_int64_t f_asyncreads;		/* count of async reads since mount */
> > + 	u_int64_t f_spare[10];		/* unused spare */
> > + 	u_int32_t f_namemax;		/* maximum filename length */
> 
> I disklike all these unsigned types, and to a lesser extent, typedefed types
> and some of the 64-bit types.
> 
> Unsigned types limit possibilties for trapping overflow if overflow
> actually occurs.  They should not be used to get an extra bit of
> precision unless the extra bit is very important.  The old code got
> this right by using only signed types.  f_bavail and f_ffree need to
> go negative so they are correctly signed types, but this means that
> the extra bit for the 64-bit unsigned block and node counts cannot
> actually be used (since values that use it would cause overflow if
> most of the blocks or nodes are free).
> 
> Typedefed types are difficult to print, as shown by printf format errors
> in most of the formats changed in this patch.  But forwards compatibility
> at the source level requires them.

I tend to agree with you about using signed types. However, the
general sentiment when this was discussed on the arch list several
months ago were that unsigned types should be used. So, I went with
majority (or at least most vocal) opinion there. The fixed sizes 
are for the reasons that you note.

> Why 64-bit types for f_bsize and f_iosize?

It is concievable that 32-bits would not be enough, so why risk
getting it wrong when it is so easy to fix now.

> If unsigned fixed-width types are used, then they should be named in
> a standard way (uintN_t, not u_intN_t).

Done (see revised statfs structure below).

> There are also some standard POSIX types for block counts, etc.  These
> are used in POSIX's variant of statfs().  E.g., there is blkcnt_t, which
> is a signed integral type not yet implemented in FreeBSD.  In the XSI
> extension there is also fsblkcnt_t, which is an _unsigned_ integral
> type implemented as uint64_t in FreeBSD.  Its unsignedness is apparently
> derived from the old XSI type of unsigned long for block counts.
> 
> > + 	uid_t	  f_owner;		/* user that mounted the filesystem */
> > + 	fsid_t	  f_fsid;		/* filesystem id */
> > + 	char	  f_charspare[76];	    /* spare string space */
> > + 	char	  f_fstypename[MFSNAMELEN]; /* filesystem type name */
> > + 	char	  f_mntfromname[MNAMELEN];  /* mounted filesystem */
> > + 	char	  f_mntonname[MNAMELEN];    /* directory on which mounted */
> > + };
> 
> f_charspare would be better at the end.  If explicit padding is needed
> after f_fsid_t, then it should be in terms of fsid_t spares.  Similarly
> for spares after f_namemax and f_owner.  These are only packed because
> uid_t happens to be 32-bit.  fsid_t is 2*32-bit so it doesn't need
> padding either.  76 is an odd amount of padding.  It gives 252 bytes
> of char arrays altogether and a struct size of 452 on i386's.  452 is
> 4 mod 8, so there are 4 bytes of unnamed padding at the end of the
> struct on most 64-bit arches (if not internal padding).

I miss-calculated the size of fsid_t. I agree that the structure should
be mod 8 == 0. I changed the f_charspare to 80 per your suggestion. I
put the spare character space between the int32's and the character
arrays so that it could be used for either additions int32's or new
character arrays. We could use it for extra int32's even at the end,
but I feel it is more intuitive to keep the int32's together.

> > Index: sys/kern/vfs_syscalls.c
> > ===================================================================
> > RCS file: /usr/ncvs/src/sys/kern/vfs_syscalls.c,v
> > retrieving revision 1.332
> > diff -c -r1.332 vfs_syscalls.c
> > *** sys/kern/vfs_syscalls.c	19 Oct 2003 20:41:07 -0000	1.332
> > --- sys/kern/vfs_syscalls.c	5 Nov 2003 05:10:54 -0000
> > ...
> > + /*
> > +  * Convert a new format statfs structure to an old format statfs structure.
> > +  */
> > + static void
> > + cvtstatfs(td, nsp, osp)
> > + 	struct thread *td;
> > + 	struct statfs *nsp;
> > + 	struct ostatfs *osp;
> > + {
> > +
> > + 	bzero(osp, sizeof(*osp));
> > + 	osp->f_bsize = nsp->f_bsize;
> > + 	osp->f_iosize = nsp->f_iosize;
> > + 	osp->f_blocks = nsp->f_blocks;
> > + 	osp->f_bfree = nsp->f_bfree;
> > + 	osp->f_bavail = nsp->f_bavail;
> > + 	osp->f_files = nsp->f_files;
> > + 	osp->f_ffree = nsp->f_ffree;
> 
> tjr suggested setting the values to LONG_MAX instead of blindly truncating
> them  I think POSIX has some methods for dealing with such overflows
> (from LFS).  I thing they would reduce to returning -1/EOVERFLOW here,
> which is not very useful.
> 
> tjr also suggested using a fake block size.  This is already done in nfs,
> but the implementation is broken (nfs3 has similar 32-bit limits in the
> client-server interface).

Per my earlier message, I now cap the size in the old structure to
LONG_MAX. I chose not to play games with the block size as the old
interface will be going away and I would rather leave it as it has
historically been done.

> > Index: sys/kern/vfs_bio.c
> > ===================================================================
> > RCS file: /usr/ncvs/src/sys/kern/vfs_bio.c,v
> > retrieving revision 1.420
> > diff -c -r1.420 vfs_bio.c
> > *** sys/kern/vfs_bio.c	4 Nov 2003 06:30:00 -0000	1.420
> > --- sys/kern/vfs_bio.c	5 Nov 2003 05:10:54 -0000
> > ***************
> > *** 3239,3245 ****
> >   				    (int) m->pindex, (int)(foff >> 32),
> >   						(int) foff & 0xffffffff, resid, i);
> >   				if (!vn_isdisk(vp, NULL))
> > ! 					printf(" iosize: %ld, lblkno: %jd, flags: 0x%x, npages: %d\n",
> >   					    bp->b_vp->v_mount->mnt_stat.f_iosize,
> >   					    (intmax_t) bp->b_lblkno,
> >   					    bp->b_flags, bp->b_npages);
> > --- 3239,3245 ----
> >   				    (int) m->pindex, (int)(foff >> 32),
> >   						(int) foff & 0xffffffff, resid, i);
> >   				if (!vn_isdisk(vp, NULL))
> > ! 					printf(" iosize: %jd, lblkno: %jd, flags: 0x%x, npages: %d\n",
> >   					    bp->b_vp->v_mount->mnt_stat.f_iosize,
> >   					    (intmax_t) bp->b_lblkno,
> >   					    bp->b_flags, bp->b_npages);
> 
> Example of a printf format error.  The long was easy to print using %ld,
> but now there is a a u_int64_t.  Using %jd gives a sign mismatch on all
> machines and a size mismatch on machines with
> sizeof(u_int64_t) != sizeof(intmax_t).

So true, I need to do a lot of casting to (intmax_t). I wish there
were a better way, sigh.

> > Index: bin/df/df.c
> > ===================================================================
> > RCS file: /usr/ncvs/src/bin/df/df.c,v
> > retrieving revision 1.51
> > diff -c -r1.51 df.c
> > *** bin/df/df.c	13 Sep 2003 20:46:58 -0000	1.51
> > --- bin/df/df.c	5 Nov 2003 19:22:11 -0000
> > ...
> 
> This has many more examples of printf format errors.  The old code has
> many related bugs.
> 
> Bruce

Thanks for your prompt feedback. The revised statfs structure is given
below.

	Kirk McKusick

=-=-=-=-=-=

+ #define	MFSNAMELEN	16		/* length of type name including null */
+ #define	MNAMELEN	88		/* size of on/from name bufs */
+ #define	STATFS_VERSION	0x20030518	/* current version number */
+ struct statfs {
+ 	uint32_t f_version;		/* structure version number */
+ 	uint32_t f_type;		/* type of filesystem */
+ 	uint64_t f_flags;		/* copy of mount exported flags */
+ 	uint64_t f_bsize;		/* filesystem fragment size */
+ 	uint64_t f_iosize;		/* optimal transfer block size */
+ 	uint64_t f_blocks;		/* total data blocks in filesystem */
+ 	uint64_t f_bfree;		/* free blocks in filesystem */
+ 	int64_t	 f_bavail;		/* free blocks avail to non-superuser */
+ 	uint64_t f_files;		/* total file nodes in filesystem */
+ 	int64_t	 f_ffree;		/* free nodes avail to non-superuser */
+ 	uint64_t f_syncwrites;		/* count of sync writes since mount */
+ 	uint64_t f_asyncwrites;		/* count of async writes since mount */
+ 	uint64_t f_syncreads;		/* count of sync reads since mount */
+ 	uint64_t f_asyncreads;		/* count of async reads since mount */
+ 	uint64_t f_spare[10];		/* unused spare */
+ 	uint32_t f_namemax;		/* maximum filename length */
+ 	uid_t	  f_owner;		/* user that mounted the filesystem */
+ 	fsid_t	  f_fsid;		/* filesystem id */
+ 	char	  f_charspare[80];	    /* spare string space */
+ 	char	  f_fstypename[MFSNAMELEN]; /* filesystem type name */
+ 	char	  f_mntfromname[MNAMELEN];  /* mounted filesystem */
+ 	char	  f_mntonname[MNAMELEN];    /* directory on which mounted */
+ };



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200311070025.hA70P4eN035719>