Date:      Mon, 16 Aug 2010 20:28:53 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Bakul Shah <bakul@bitblocks.com>
Cc:        mdf@FreeBSD.org, freebsd-arch@FreeBSD.org
Subject:   Re: RFC: replace vm_offset_t with uintptr_t and vm_size_t with size_t
Message-ID:  <20100816193035.A896@besplex.bde.org>
In-Reply-To: <20100815162730.BC15B5B04@mail.bitblocks.com>
References:  <AANLkTik_2pXA1LP9dq-iOLkFrQBG7jP=4yUXBjtDOBF3@mail.gmail.com> <20100813191149.V12776@delplex.bde.org> <20100815162730.BC15B5B04@mail.bitblocks.com>

On Sun, 15 Aug 2010, Bakul Shah wrote:

> On Fri, 13 Aug 2010 19:46:42 +1000 Bruce Evans <brde@optusnet.com.au>  wrote:
>>
>> I prefer to fix printf.  There should be a %I format something like in
>> sfio.  Unfortunately C99 only standardized the PRI* mistake.  I never
>> learned exactly what %I does, but think it should have 2 variations,
>> something like a plain %I causing the compiler to rewrite the literal
>> format string, replacing %I with the appropriate Standard format, and
>> %I<width> for interpretation by the library.  -Wformat has done all the
>> work needed to determine the correct replacement for 10-20 years.  So
>> vm_offset's and size_t's would normally be printed using "%I[xu]" (no
>> need for %z), but in message catalogues they would be printed using
>>
>>      ("...%I*[xu] %I*[xu]...", ...
>>      sizeof(vm_offset_t) * CHAR_BIT, var_of_type_vm_offset_t,
>>      sizeof(size_t) * CHAR_BIT,  var_of_type_size_t, ...)
>>
>> Except that came out too painful (almost as bad as using PRI*).  I
>> think I would usually avoid using %I<width> if it were as messy as
>> this, and use %j and require all integer args to be of type [u]intmax_t.
>>
>> %I could also be a more global directive (either at the front of every
>> literal format string that wants rewriting for all args, or in CFLAGS
>> for all strings in a file).
>
> Have you looked at plan9's fmtinstall(3)? Basically

No, but it doesn't sound too good.

> fmtinstall('x', foo) will install function foo to be called
> when %...x is seen in a format string. foo takes struct Fmt*
> which points to stuff needed for formatting. Things like
> width, precision, flags, whether output buffer is runes or
> chars etc.  Once you install the formats you need, their use
> becomes pretty painless.  Perhaps kernel's printf can be

How is this painless?  The user still has the burden of making the
variadic arg match 'x'.  You can cast all variadic args to a small
subset of common types (standard ones and fmtinstall()'ed ones), but
you can do that for ordinary printf() too.

> extended (or rebuilt) using this idea?  Seems to me something
> like that would be better and much more extensible than
> inflicting %I*[xu].  This does not require compiler magic but
> you also lose the -Wformat crutch.

So you have hundreds if not thousands more possibilities for type
mismatches (1 for each fmtinstall()'ed foo), and no way to check
them.  -Wformat is better.  FreeBSD also has fmtcheck(), which
does dynamic checking and thus shares some problems with fmtinstall()
(it doesn't extend, but would be more needed with dynamic extension;
it is already missing too much of what -Wformat does and would
be missing more with dynamic extension).

%I* is painful, but not plain %I.  Compiler magic is the right way.

Another way that is not painless but is perfectly portable and
only depends on compiler magic for optimizations is to conventionally
cast all args to a common maximal type.  Code like:

 	printf("int: %jd unsigned: %ju: FP %Lf\n",
 	    (intmax_t)i, (uintmax_t)u, (long double)f);

may be unnecessarily inefficient (i might be 8 bits and intmax_t might
be 1024 bits).  However, printf() is a standard function so the compiler
can reduce this to:

 	printf("int: %d unsigned: %u: FP %f\n", i, u, f);

if it knows that the default promotion of i is int, etc.  gcc already does
the related optimization of replacing printf("foo\n") by puts("foo"), and
it should also replace a big printf by lots of little ones with a special
function for each arg if that were actually an optimization.  The analysis
for this is essentially what is done by -Wformat.

This is still painful to read and write because it requires the programmer
to figure out the default promotion of each arg and write out all the
casts explicitly; then readers have to read all the casts.  %I in the
format string is essentially a short way of figuring out, writing and
reading these casts.
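
A minimal sketch of the cast-everything convention, for comparison with
the hand-matched form.  The function names and the choice of parameter
types (signed char, size_t, float) are assumptions made for illustration;
snprintf() is used instead of printf() only so that the results can be
compared.  The hand-matched variant uses %zu for the size_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Cast every arg to a maximal type; the format never depends on typedefs. */
int
format_maximal(char *buf, size_t size, signed char i, size_t u, float f)
{
	int n;

	n = snprintf(buf, size, "int: %jd unsigned: %ju: FP %Lf",
	    (intmax_t)i, (uintmax_t)u, (long double)f);
	assert(n >= 0);
	return (n);
}

/*
 * Hand-matched form: the programmer must know that i promotes to int,
 * that f promotes to double, and that size_t needs %zu.
 */
int
format_matched(char *buf, size_t size, signed char i, size_t u, float f)
{
	int n;

	n = snprintf(buf, size, "int: %d unsigned: %zu: FP %f", i, u, f);
	assert(n >= 0);
	return (n);
}

/* Returns 1 if both forms produce the same text for the given values. */
int
formats_agree(signed char i, size_t u, float f)
{
	char a[128], b[128];

	format_maximal(a, sizeof(a), i, u, f);
	format_matched(b, sizeof(b), i, u, f);
	return (strcmp(a, b) == 0);
}
```

Both functions produce identical output; the difference is only in who
has to work out the promotions, the programmer or the casts.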

Another way that doesn't quite work without compiler magic is to have
a non-variadic printf which takes only args of common maximal types:

 	xprintf("int: %d unsigned: %u: FP %f\n", i, u, f);

Here %d, %u and %f means the maximal specifiers %jd, %ju and %Lf,
respectively, not the usual specifiers, and xprintf() has the magic
prototype xprintf(const char * restrict fmt, ^^^) which says that
all variadic args are to be (as if) promoted to their maximal type.
There is a minor problem with args like field widths that don't need
promotion (however, this problem is a feature too, since it avoids the
common printf format error of passing an uncast size_t for a field
width).  The `as if' rule allows reducing xprintf() to printf() as
above, or avoiding promotion for field widths only.
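
A rough sketch of the library half of this idea, under stated
assumptions: xsnprintf() is a hypothetical name, it handles only bare
%d, %u, %f and %% (no flags, widths or precisions), and since the ^^^
prototype does not exist, the caller must still write the casts that the
compiler would otherwise supply.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical xsnprintf(): %d, %u and %f consume intmax_t, uintmax_t
 * and long double respectively.  Returns the number of characters
 * stored, not counting the terminating NUL.  Output is truncated to fit.
 */
size_t
xsnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	size_t len = 0, room, run;
	int n;

	assert(buf != NULL && fmt != NULL);
	if (size == 0)
		return (0);
	va_start(ap, fmt);
	while (*fmt != '\0' && len + 1 < size) {
		room = size - len;		/* includes space for NUL */
		if (*fmt != '%') {
			/* Copy the literal run up to the next '%'. */
			run = strcspn(fmt, "%");
			if (run > room - 1)
				run = room - 1;
			memcpy(buf + len, fmt, run);
			len += run;
			fmt += run;
			continue;
		}
		fmt++;				/* skip the '%' */
		switch (*fmt) {
		case 'd':
			n = snprintf(buf + len, room, "%jd",
			    va_arg(ap, intmax_t));
			break;
		case 'u':
			n = snprintf(buf + len, room, "%ju",
			    va_arg(ap, uintmax_t));
			break;
		case 'f':
			n = snprintf(buf + len, room, "%Lf",
			    va_arg(ap, long double));
			break;
		case '%':
			n = snprintf(buf + len, room, "%%");
			break;
		default:
			n = -1;			/* unsupported conversion */
			break;
		}
		if (n < 0)
			break;
		if ((size_t)n >= room) {	/* truncated by snprintf() */
			len = size - 1;
			break;
		}
		len += (size_t)n;
		fmt++;
	}
	va_end(ap);
	buf[len] = '\0';
	return (len);
}
```

A call would look like xsnprintf(buf, sizeof(buf), "int: %d unsigned: %u:
FP %f", (intmax_t)i, (uintmax_t)u, (long double)f); with the magic
prototype, the three casts would disappear from the source.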

Note that K&R functions have a similar problem with requiring either
too many casts, or manual matching of arg types with parameter types,
or both.  This was fixed using compiler magic named prototypes
(except that allowing downcasting and crosscasting without a diagnostic
gives many new bugs).

Bruce
