Date:      Wed, 14 Dec 2005 00:54:05 +1100 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Peter Jeremy <PeterJeremy@optushome.com.au>
Cc:        freebsd-arch@FreeBSD.org
Subject:   Re: printf behaviour with illegal or malformed format string
Message-ID:  <20051213234201.E3248@epsplex.bde.org>
In-Reply-To: <20051213091656.GD77268@cirb503493.alcatel.com.au>
References:  <1023.1134389663@critter.freebsd.dk> <200512121643.39236.max@love2party.net> <20051213175413.H80942@delplex.bde.org> <20051213091656.GD77268@cirb503493.alcatel.com.au>

On Tue, 13 Dec 2005, Peter Jeremy wrote:

> On Tue, 2005-Dec-13 18:53:15 +1100, Bruce Evans wrote:
>>>> Our first line of defence against this kind of error is compile-time
>>>> checking by GCC, but we cannot rely on the string being sane in libc,
>>>> we still need to do error checking.
>>
>> There is also fmtcheck(3).
>
> I'm probably not the only person who hadn't heard of this function
> before reading your mail.  I'm not sure it is of much use other than
> validating user-supplied format strings (and maybe internationalisation
> translations via catgets(3)).

I think internationalisation is its main use now.

>> I think most checking belongs in the compiler and in fmtcheck(3).
>
> fmtcheck(3) can't check that the arguments passed to printf are valid.
> All it can do is verify that two arbitrary format strings expect the
> same arguments and the documentation states it doesn't even manage that:
>     The fmtcheck() function does not understand all of the conversions
>     that printf(3) does.

Its example also starts by saying that %p is compatible with %lu.
Punning "void *" as "unsigned long" might be safe enough if these types
have the same size, but fmtcheck()'s code doesn't seem to have any
size checks.
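
A minimal sketch of the kind of size check that seems to be missing,
assuming a hosted C99 environment that provides uintptr_t.  The helper
names ptr_pun_size_ok() and format_ptr() are made up for illustration
and are not part of fmtcheck(3):

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: nonzero if "void *" and "unsigned long" have the
 * same size, so that reading a pointer argument as "%lu" would at least
 * fetch the right number of bytes.  This says nothing about whether the
 * two types are passed the same way in a variable argument list.
 */
static int
ptr_pun_size_ok(void)
{
	return (sizeof(void *) == sizeof(unsigned long));
}

/*
 * Hypothetical helper: avoid the pun entirely by converting the pointer
 * to an integer explicitly and printing it with a matching conversion.
 * Returns whatever snprintf() returns.
 */
static int
format_ptr(char *buf, size_t len, const void *p)
{
	return (snprintf(buf, len, "%ju", (uintmax_t)(uintptr_t)p));
}
```

A caller could refuse to accept a "%lu" translation of a "%p" default
whenever ptr_pun_size_ok() returns 0.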

> If a programmer (mistakenly) believes that "%hf" is sensible then
> fmtcheck(3) isn't going to help.

This can be checked statically -- check the default at compile time;
then any format that actually consumes the same arg types as the
default will consume the ones passed.

fmtcheck() could also rewrite the suspect format -- you could put
all type info in the default and all other info in the suspect format
and merge at runtime.  E.g., afmtcheck("%-*I %#8I", "%*s%jx") =
strdup("%-*s %#8jx").
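
A rough sketch of how such a merge might work.  afmtcheck() does not
exist anywhere; the code below is only one possible reading of the idea
and makes several assumptions: 'I' is a placeholder conversion in the
suspect format, positional ("%1$d") arguments are not handled, and a
truncated conversion is treated like the end of the format.  The helper
next_conv() is likewise hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Find the next conversion at or after p, skipping "%%".  On success
 * return a pointer to its '%', point *type at its length modifier or
 * conversion character, point *end one past the conversion character,
 * and store the number of '*' fields in *stars.  Return NULL if no
 * complete conversion remains.
 */
static const char *
next_conv(const char *p, const char **type, const char **end, int *stars)
{
	const char *start;

	for (;;) {
		if ((p = strchr(p, '%')) == NULL)
			return (NULL);
		if (p[1] != '%')
			break;
		p += 2;
	}
	start = p++;
	*stars = 0;
	/* Flags, width and precision. */
	for (; *p != '\0' && strchr("-+ #0123456789.*'", *p) != NULL; p++)
		if (*p == '*')
			(*stars)++;
	*type = p;
	/* Length modifier, if any. */
	while (*p != '\0' && strchr("hljztLq", *p) != NULL)
		p++;
	if (*p == '\0')
		return (NULL);
	*end = p + 1;
	return (start);
}

/*
 * Merge flags, widths and precisions from "suspect" with the length
 * modifiers and conversion characters from "dflt".  Returns a malloc()ed
 * string, or NULL if the two formats do not line up.
 */
char *
afmtcheck(const char *suspect, const char *dflt)
{
	const char *s = suspect, *d = dflt;
	const char *stype, *send, *dtype, *dend;
	int sstars, dstars;
	size_t len = 0;
	char *out;

	/* The result is never longer than both inputs together. */
	if ((out = malloc(strlen(suspect) + strlen(dflt) + 1)) == NULL)
		return (NULL);
	while (next_conv(s, &stype, &send, &sstars) != NULL) {
		if (next_conv(d, &dtype, &dend, &dstars) == NULL ||
		    *stype != 'I' || sstars != dstars) {
			free(out);
			return (NULL);
		}
		/* Literal text, '%', flags, width, precision from suspect. */
		memcpy(out + len, s, (size_t)(stype - s));
		len += (size_t)(stype - s);
		/* Length modifier and conversion character from default. */
		memcpy(out + len, dtype, (size_t)(dend - dtype));
		len += (size_t)(dend - dtype);
		s = send;
		d = dend;
	}
	/* The default must not consume more args than the suspect. */
	if (next_conv(d, &dtype, &dend, &dstars) != NULL) {
		free(out);
		return (NULL);
	}
	strcpy(out + len, s);
	return (out);
}
```

Requiring the same number of '*' fields in each pair of conversions
keeps the merged format consuming exactly the args that the default
consumes.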

>> printf() itself cannot detect most type errors without compiler
>> support that would basically involve passing it the types of all
>> args so that it could call fmtcheck(3).
>
> This is an interesting idea but fmtcheck(3) isn't adequate.  In
> particular, consider:
> 	int	i, j;
> 	const char	*fmt, *string;
> 	...
> 	fmt = "%*.*s %p";
> 	...
> 	printf(fmt, i, j, string, string);
>
> Based on the printf() arguments, the compiler can't really do
> better than:
> 	printf(fmtcheck(fmt, "%d %d %s %s"), i, j, string, string);
> but fmtcheck(3) doesn't consider that "%*.*s %p" and "%d %d %s %s" are
> equivalent.

So more than ordinary C type info is needed.  The compiler would have to
parse the format string (which is possible in the above since fmt is
const and visible, but not in all cases) and pass "%*.*s %p", not
"%d %d %s %s".

> What could be useful is a function that takes a format string and
> a set of argument types and verifies that the argument types match
> the format string.  The compiler could then expand the printf() to:
> 	if (fmtvalidate(fmt, "int;int;char*;char*"))
> 		printf(fmt, i, j, string, string);
> 	else
> 		abort();
> (though it's not clear how the compiler should handle struct/union
> pointers which might make sense to a user extension).

Both this and my afmtcheck() example could be merged into printf()
itself.  printf("%*.*s %p", width, prec, s, s1) could be transformed
into printf("%C%*.*s %p", "%*.*s%p", width, prec, s, (void *)s1).  %C
indicates a check arg and that arg is "%*.*s%p".  The check arg is
almost exactly the original string here but could be very different if
the compiler doesn't understand the format or the check arg is from a
message catalog.  Here the compiler understands the format and does
the following transformations:
- %*.*s -> itself.  No problem since the args match.
- " " -> "".  Not related to args.
- "%s" -> "%p".  The args don't match, but this is harmless for char *
   vs void * and the compiler has silently fixed the arg type to match
   the format.  Perhaps it should leave this to printf().
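
The run-time half of this (whether done by a separate fmtvalidate() or
by printf() itself on seeing a check arg) might look roughly like the
sketch below.  fmtvalidate() is not an existing libc function.  The
"int;int;char*;char*" type string is taken from the quoted proposal;
everything else is an assumption, including the helper take_type(), the
very small set of conversions understood (%d %i %c %s %p and '*'), and
the choice to accept "char*" for %p:

```c
#include <string.h>

/*
 * Hypothetical helper: if the next ';'-separated entry in *types is
 * exactly "name", consume it and return 1; otherwise return 0.
 */
static int
take_type(const char **types, const char *name)
{
	size_t n = strlen(name);

	if (strncmp(*types, name, n) != 0)
		return (0);
	if ((*types)[n] == ';')
		*types += n + 1;
	else if ((*types)[n] == '\0')
		*types += n;
	else
		return (0);
	return (1);
}

/*
 * Return 1 if "fmt" consumes exactly the argument types listed in
 * "types", and 0 if it does not or if it uses anything that this
 * sketch does not understand.
 */
int
fmtvalidate(const char *fmt, const char *types)
{
	const char *p = fmt;

	while ((p = strchr(p, '%')) != NULL) {
		p++;
		if (*p == '%') {	/* "%%" consumes no argument */
			p++;
			continue;
		}
		/* Flags, width, precision; each '*' consumes an int. */
		for (; *p != '\0' && strchr("-+ #0123456789.*", *p) != NULL; p++)
			if (*p == '*' && !take_type(&types, "int"))
				return (0);
		switch (*p) {
		case 'd': case 'i': case 'c':
			if (!take_type(&types, "int"))
				return (0);
			break;
		case 's':
			if (!take_type(&types, "char*"))
				return (0);
			break;
		case 'p':	/* assumption: char * is harmless for %p */
			if (!take_type(&types, "void*") &&
			    !take_type(&types, "char*"))
				return (0);
			break;
		default:	/* unknown, unsupported or truncated */
			return (0);
		}
		p++;
	}
	/* Every listed argument must have been consumed. */
	return (*types == '\0');
}
```

With this sketch, fmtvalidate("%*.*s %p", "int;int;char*;char*")
succeeds, and a format that it does not understand, such as "%hf", is
rejected rather than guessed at.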

I don't see any special problem with struct/union pointers.  The
compiler just won't usually be able to understand the pointed-to
objects or parts of the format string related to them.  An escape
sequence might be needed to tell it not to warn about extensions,
but pass the escaped format and args directly to the format
checker.

>>>> If the variable is set, a bogus format string will result in abort(2).
>>
>> This sometimes breaks defined behaviour.

I meant in malloc().

> I thought printf(3) documented the behaviour for invalid conversion
> specifiers and modifiers but I can't find it in the man page right now.

The C standard mostly says "undefined" for nonstandard things in fprintf().
From n869.txt:

        [#9] If a conversion specification is invalid, the  behavior
        is  undefined.225)  ...

Footnote 225 points to future library extensions.  Conversion specs
have to be undefined so that everyone can define inconsistent extensions :).

Bruce


