Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 26 Feb 2002 15:40:36 +1100 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Bill Fenner <fenner@research.att.com>
Cc:        <standards@FreeBSD.ORG>
Subject:   Re: scanf(3) patches for review
Message-ID:  <20020226141642.A43061-100000@gamplex.bde.org>
In-Reply-To: <200201300531.g0U5VEh48095@stash.attlabs.att.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 29 Jan 2002, Bill Fenner wrote:

> Here are some long-standing scanf(3) patches, which add the new c99
> size modifiers to scanf(3).  They've been languishing in my tree waiting

Here is a belated review.

> for me to get around to implementing %n$, but I am clearly not getting
> to it so there's nothing to win by sitting on 'em.
>
> One thing that I haven't decided yet is whether it makes sense to
> rewrite the size modifier narrative to a table in the same way as
> I did for printf(3).  Input is solicited.

I didn't look at the mdoc parts.

> Index: vfscanf.c
> ===================================================================
> RCS file: /home/ncvs/src/lib/libc/stdio/vfscanf.c,v
> retrieving revision 1.19
> diff -u -r1.19 vfscanf.c
> --- vfscanf.c	29 Nov 2001 03:03:55 -0000	1.19
> +++ vfscanf.c	30 Jan 2002 05:16:58 -0000
> @@ -45,6 +45,7 @@
>  #include "namespace.h"
>  #include <stdio.h>
>  #include <stdlib.h>
> +#include <stddef.h>
>  #include <ctype.h>
>  #if __STDC__
>  #include <stdarg.h>
> @@ -52,6 +53,7 @@
>  #include <varargs.h>
>  #endif
>  #include <string.h>
> +#include <inttypes.h>
>  #include "un-namespace.h"
>
>  #include "collate.h"

Includes more disordered than before.

> @@ -76,7 +78,11 @@
>  #define	SUPPRESS	0x08	/* suppress assignment */
>  #define	POINTER		0x10	/* weird %p pointer (`fake hex') */
>  #define	NOSKIP		0x20	/* do not skip blanks */
> -#define	QUAD		0x400
> +#define	LONGLONG	0x400	/* ll: long long (+ deprecated q: quad) */
> +#define	INTMAXT		0x800	/* j: intmax_t */
> +#define	PTRDIFFT	0x1000	/* t: ptrdiff_t */
> +#define	SIZET		0x2000	/* z: size_t */
> +#define	SHORTSHORT	0x4000	/* hh: char */
>
>  /*
>   * The following are used in numeric conversions only:

Please change all the old comments to have the same format (e.g.,
"p: void * (`fake hex')" for POINTER.

> @@ -98,13 +104,10 @@
>  #define	CT_CHAR		0	/* %c conversion */
>  #define	CT_CCL		1	/* %[...] conversion */
>  #define	CT_STRING	2	/* %s conversion */
> -#define	CT_INT		3	/* integer, i.e., strtoq or strtouq */
> +#define	CT_INT		3	/* integer, i.e., strtoimax or strtoumax */
>  #define	CT_FLOAT	4	/* floating, i.e., strtod */

I think naming the conversion function(s) is not as useful as naming
all the format specifiers.

> @@ -136,8 +139,8 @@
>  	int nassigned;		/* number of fields assigned */
>  	int nconversions;	/* number of conversions */
>  	int nread;		/* number of characters consumed from fp */
> -	int base;		/* base argument to strtoq/strtouq */
> -	u_quad_t(*ccfn)();	/* conversion function (strtoq/strtouq) */
> +	int base;		/* base argument to strtoimax/strtoumax */
> +	uintmax_t(*ccfn)();	/* conversion function (strtoimax/strtoumax) */

Missing space after uintmax_t (from style bug in rev.1.11).

Naming the conversion functions all over is even less useful than above.

We should start fixing incomplete prototypes in libc.  They are more common
for function pointers like the above than for functions.

> @@ -205,61 +225,49 @@
>
>  		/*
>  		 * Conversions.
> -		 * Those marked `compat' are for 4.[123]BSD compatibility.
> -		 *
> -		 * (According to ANSI, E and X formats are supposed
> -		 * to the same as e and x.  Sorry about that.)
>  		 */
> -		case 'D':	/* compat */
> -			flags |= LONG;
> -			/* FALLTHROUGH */

Maybe abort() for obsolete formats.

>  		case 'd':
>  			c = CT_INT;
> -			ccfn = (u_quad_t (*)())strtoq;
> +			ccfn = (uintmax_t (*)())strtoimax;
>  			base = 10;
>  			break;

I was unhappy with the corresponding global change from strto[u]l to
strto[u]q in rev.1.11.  It pessimized the usual case and changes the
handling of overflow.  scanf'ing the integer one larger than LONG_MAX
using %ld used to give LONG_MAX and set errno = ERANGE, but it gives
undefined behaviour (usually silent truncation to LONG_MIN).  Rev.1.1
had the same problem with the integer one larger than INT_MIN if
INT_MIN < LONG_MIN.

I just learned that all these misbehaviours are standards conformant.
The behaviour is undefined if the integer (parsed in the same way as
strto[u]l execpt with infinite precsion) is too large to be representatable
by the specified type.  So we can parse all integers using strto[u]max()
and blindly truncate them.  No programs should notice the inefficiency
for this, since no programs should use the scanf family, at least with
integer formats, since there is no way to determine if the scan really
worked :-).

>  #ifdef FLOATING_POINT
> -		case 'E':	/* compat   XXX */
> -		case 'F':	/* compat */
> -			flags |= LONG;
> -			/* FALLTHROUGH */
> +		case 'E': case 'F': case 'G':


Oops, we can't abort in most cases, since the specifier is used for
something else.

> @@ -278,7 +289,7 @@
>  		case 'p':	/* pointer format is like hex */
>  			flags |= POINTER | PFXOK;
>  			c = CT_INT;
> -			ccfn = strtouq;
> +			ccfn = strtoumax;
>  			base = 16;
>  			break;
>

Strictly, pointers might need more precision than strtoumax(), and we might
even have a function to support this if we had a machine that needed it,
but the function wouldn't fit into the current framework.

> ...
> @@ -451,7 +465,7 @@
>  			continue;
>
>  		case CT_INT:
> -			/* scan an integer as if by strtoq/strtouq */
> +			/* scan an integer as if by strtoimax/strtoumax */

As above about naming the conversion functions all over.  The standard
actually only says that the subject sequence is in the same format as
expected by strtol().

> @@ -569,19 +583,27 @@
>  				(void) __ungetc(c, fp);
>  			}
>  			if ((flags & SUPPRESS) == 0) {
> -				u_quad_t res;
> +				uintmax_t res;
>
>  				*p = 0;
>  				res = (*ccfn)(buf, (char **)NULL, base);

This invokes undefined behaviour (calling through a function pointer with
a type different from the original function).  This can be fixed easily by
throwing away the function pointer and all the dubious casts to it.  Just
set an UNSIGNED flag if we don't already have a suitable flag, and use it
here as follows:

				if (flags & UNSIGNED)
					res = strtoumax(buf, NULL, base);
				else
					res = strtoimax(buf, NULL, base);

Better code would use a a specialized function for each result type and
not risk gratutous sign extension bugs for storing all the results in a
common type.  About 15 specialized functgions would be needed.  This is
too much for a function that should never be used, so I'm not seriously
suggesting it :-).

>  				if (flags & POINTER)
>  					*va_arg(ap, void **) =
> -						(void *)(u_long)res;
> +							(void *)(uintptr_t)res;

Strictly, uintmax_t is not suitable for storing pointers.  The cast to
uintptr_t is main to break warnings about this.  Better code would use:

 					*va_arg(ap, void **) =
						strtovoidstar(...);

This is even simpler, except strtovoidstar() doesn't exist.

> +				else if (flags & SHORTSHORT)
> +					*va_arg(ap, char *) = res;

Better code for this would not be as simple as for pointers, since it
would require 2 elseif clauases instead of 1 (for the signed and unsigned
cases).  Similarly for all the other integer types except ptrdiff_t and
size_t.

I like most of your changes but didn't check them against the standard.

Bruce


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-standards" in the body of the message




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20020226141642.A43061-100000>