From owner-freebsd-current@FreeBSD.ORG Fri Dec 16 21:01:16 2005 Return-Path: X-Original-To: current@freebsd.org Delivered-To: freebsd-current@FreeBSD.ORG Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7F22016A41F for ; Fri, 16 Dec 2005 21:01:16 +0000 (GMT) (envelope-from phk@critter.freebsd.dk) Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222]) by mx1.FreeBSD.org (Postfix) with ESMTP id BD44443D7C for ; Fri, 16 Dec 2005 21:01:11 +0000 (GMT) (envelope-from phk@critter.freebsd.dk) Received: from critter.freebsd.dk (unknown [192.168.48.2]) by phk.freebsd.dk (Postfix) with ESMTP id CE918BC66 for ; Fri, 16 Dec 2005 21:01:03 +0000 (UTC) To: current@freebsd.org From: "Poul-Henning Kamp" In-Reply-To: Your message of "Fri, 16 Dec 2005 18:56:39 GMT." <200512161856.jBGIudig093608@repoman.freebsd.org> Date: Fri, 16 Dec 2005 22:01:03 +0100 Message-ID: <8509.1134766863@critter.freebsd.dk> Sender: phk@critter.freebsd.dk Cc: Subject: About extensible prinf(3), a slightly long X-mas card X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 16 Dec 2005 21:01:16 -0000 Those of you following CVS commits will have have noticed that I just added an extensible printf(3) implemention as an experiment. This email is intended to provide some insight and background. First, what is it ? ------------------- In a program I'm writing right now, I have some measurement points which are identified by a triplet of numbers, usually encoded in a single integer. That means I have to write things like printf("%03u.%03u.%03u", SITE_OF(p), GROUP_OF(p), POINT_OF(p)); where each of the three macros basically does "(X >> a) & b" for varying a and b. I could have made a function: char * fmt_U(unsigned p) { static char buf[12]; sprintf(buf, "%03u.%03u.%03u", SITE_OF(p), GROUP_OF(p), POINT_OF(p)); return (buf); } and that would mostly work, only not where it is most needed because of the static buffer: fprintf(stderr, "WARNING: configuring %03u.%03u.%03u as analog-input" " conflicts with pin usage on %03u.%03u.%03u thru " "%03u.%03u.%03u\n", SITE_OF(p), GROUP_OF(p), POINT_OF(p), SITE_OF(a), GROUP_OF(a), POINT_OF(a), SITE_OF(b), GROUP_OF(b), POINT_OF(b)); Obviously, it would be much smarter if I could tell printf that %U in my program means format an integer this particular way and write the much more readable: fprintf(stderr, "WARNING: configuring %U as analog-input" " conflicts with pin usage on %U thru %U\n", a, b, c); What ABI to use ? ----------------- GLIBC got here before us, so, lets examine their ABI: The GLIBC ABI consists of a function int register_printf_function( int spec, printf_function *render, printf_arginfo_function *arginfo); The arginfo function is called first, to determine the kinds of arguments, and the render function is called to do the actual formatting. This two-step approach is necessary because at some point some madhatter in some standardsbody or other added a positional argument facility to printf() without thinking about the consequences: Did you know that printf("%2$s %1$s\n", "world\n", "Hello"); is legal ? There are semi-quasi-good reasons for allowing something like this, but the cost in performance is high: One is forced to scan the format string at least twice to first determine the types for the arguments and then again in order to actually render them to the output. In practice one is forced to scan the format string _three_ times, once to find the number of arguments so that memory can be allocated for the second pass which finds the type so that this info can be made available for the third pass. Sigh... Well, anyway, back to my example: The code to implement my points would in GLIBC's world look like this: First a function which says that we take one argument of type int: int printf_arginfo_point(const struct printf_info *pi, size_t n, int *argt) { assert(n >= 1); argt[0] = PA_INT; return (1); } and then a function to render that: int printf_render_point(FILE *fp, const struct printf_info *pi, const void *const *arg) { unsigned u; u = *((unsigned **)arg[0]); return (fprintf(fp, "%03u.%03u.%03u", SITE_OF(u), GROUP_OF(u), POINT_OF(u))); } And finally a call to register these two; i = register_printf_function('U', printf_render_point, printf_arginfo_point); I can live with that, so I have adopted that ABI. Implementation & performance ---------------------------- The next thing was to teach our printf(3) implementation about this, and boy, what a monster. Take a peek for yourself in: src/lib/libc/stdio/vfprintf.c This is why we in FreeBSD have such a damn good performance: every trick in the book have been used where it matters. Attempting to shoe-horn extensibility into this piece of code would be suicide, both from a performance point of view and for my sanity. Instead what I did was write a new implementation of the core task: parsing the format string, and then extracted the integer, floating-point, string and char stuff from the vfprintf.c code and put it into separate files and aligned those with the GLIBC ABI. Until somebody specifically ask for it (with a global variable or an environment variable) or registers an extension, we keep running on our good old fast spaghetti vprintf, but once extensibility is called for, we switch to my code. Unfortunately, the extensible printf had a totally unsatisfactory perforamnce in comparison with the vprintf.c one and something had to happen. The problem is that printing for instance a floating point number consists of marshalling up to 6 pieces of the string: * [+|-| ] [0x|0X] MMM . NNN [e|E|p|P] [+|-] ZZ * A B ---C--- D E F (taken from an exceptionally good comment in vfprintf.c) and in addition there may be leading and trailing padding to the requested field width. That means that worst (sensible) case, 8 separate I/O requests will be made for one floating point number. Ouch. The vprintf.c implementation has its own private struct uio implementation for this exact reason. I reimplmented that so that it was compatible with the extensibility of the new printf, but that unfortunately breaks the GLIBC ABI because now the render function must take a "struct __printf_io *" instead of a "FILE *" So right now we have two ABIs, the GLIBC and the FreeBSD. I'm not happy with that, and I may drop the FreeBSD one again and decide to live with the performance hit. Another trick which vfprintf.c does is that if the file is unbuffered (setbuf(fp, NULL)) it creates a local buffer so that each of the I/Os won't result in a write(2) system call, and only at the end of the printf(3) operation, does the buffer get written to the file. Sneaky. I've added that as well and that helped performance a good deal too. Default extensions ------------------ I also decided that there were some pieces of code that I have written far too many times to be funny, and therefore I added three sets of arginfo/render functions for things I have missed more times than I could count: %H -- Hexdump Takes a pointer and a length: printf("Received %H\n", packet, packet_len); 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 If you add a width, that will control how many columns are printed: printf("Received %4H\n", packet, packet_len); 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 Adding the printf flag for alternate will add ascii dump and a "+" will give addresses: printf("Received %+#8H\n", packet, packet_len); 0000 00 01 02 03 04 05 06 07 |........| 0008 08 09 0a 0b 0c 0d 0e 0f |........| 0010 10 11 12 13 |.... | %V -- Stringvis Expands non-printable characters to C-style back-slash escapes, octal back-slash escapes or HTTP stype %xx escapes: printf("%V", recv_string) Hello\tWor\377ld printf("%0V", recv_string) Hello\011Wor\377ld printf("%+V", recv_string) Hello%09Wor%FFld %T -- time_t/timeval/timespec Making sense of time intervals takes a lot of printf work, and I do this a lot: printf("%T\n", &delta_t); /* time_t */ 20349 printf("%#T\n", &delta_t); /* time_t */ 5h39m9 printf("%#lT\n", &tv); /* struct timeval */ 5h39m9.245000 printf("%#.3llT\n", &ts); /* struct timespec */ 5h39m9.245 You can enable these with register_printf_render_std("HVT"); Style(9) and GCC issues ----------------------- Obviously, such extensions are not condoned by style(9) but probably more damning, GCC's printf format checker knows nothing about them, so it will take a fit if you use them. For these reasons, and for general reasons of sanity, printf format extensions should _not_ be added to FreeBSD programs at this time, if ever. Conclusion & future -------------------- Well, there you have it: Some of you will, like me, have implemented your own printf many times to do stuff like this, now FreeBSD offers a sane way to do that. If this proves useful, it'll stay in the tree and be part of 7.0 (no MFCs!) it people break out in bikesheds about it, it will not. Keep me posted... -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence.