Date: Wed, 17 Aug 2005 12:52:20 +1000 (EST) From: Bruce Evans <bde@zeta.org.au> To: Divacky Roman <xdivac02@stud.fit.vutbr.cz> Cc: freebsd-bugs@freebsd.org, FreeBSD-gnats-submit@freebsd.org Subject: Re: bin/84992: gcc4.x cleanup of usr.bin/hexdump Message-ID: <20050817111104.C1553@epsplex.bde.org> In-Reply-To: <200508161451.j7GEpX1I014969@eva.fit.vutbr.cz> References: <200508161451.j7GEpX1I014969@eva.fit.vutbr.cz>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 16 Aug 2005, Divacky Roman wrote: >> Description: > gcc4.x (tested with gcc41) cleanup of usr.bin/hexdump >> How-To-Repeat: > apply the patch >> Fix: > > Index: conv.c > =================================================================== > RCS file: /home/ncvs/src/usr.bin/hexdump/conv.c,v > retrieving revision 1.8 > diff -u -r1.8 conv.c > --- conv.c 16 Jul 2004 11:07:07 -0000 1.8 > +++ conv.c 11 Aug 2005 13:22:00 -0000 > @@ -103,7 +103,7 @@ > if (odmode && MB_CUR_MAX > 1) { > oclen = 0; > retry: > - clen = mbrtowc(&wc, p, bufsize, &pr->mbstate); > + clen = mbrtowc(&wc, (const char *)p, bufsize, &pr->mbstate); Adding casts is a cleandown. p should have type "char *" to begin with. If p actually needs to be "unsigned char *" then it is unclear that mbrtowc() can handle it correctly and casting it to "char *" breaks the warning about this. The possible critical differences between unsigned chars and signed chars are especially clear in hexdump. hexdump wants to print raw bytes so it should use unsigned chars just like it already does except without the type mismatches from passes pointers to string functions that want signed chars. Signed chars may have any number of padding bits that wouldn't be printed right if the bytes are accessed as signed chars. The padding bits may contain any number of trap representations which would abort the program before it prints anything. Signed chars must have a single sign bit, but it may be anywhere and may be printed weirdly. See section 6.2.6.2 of the draft C99 standard. It is clear that string functions can't in general handle arrays of unsigned chars via type punning as above, since the unsigned chars may have trap representations when punned to signed chars -- explicit conversion at the level of chars is required, although it might not work right for hexdump since the conversion might lose bits (it _must_ lose any trap bits). What is unclear is whether the problems can actually occur, even in theory. Note that the above code is only executed in the odmode case due to the "hd" mode wanting to be rawer. > if (clen == 0) > clen = 1; > else if (clen == (size_t)-1 || (clen == (size_t)-2 && > @@ -118,7 +118,7 @@ > * can complete it. > */ > oclen = bufsize; > - bufsize = peek(p = peekbuf, MB_CUR_MAX); > + bufsize = peek(p = (u_char *)peekbuf, MB_CUR_MAX); > goto retry; > } > clen += oclen; Similarly, in the reverse direction. peek() wants unsigned chars so using possibly-signed chars for its buffers is nonsense. It is unclear if peek() actually needs unsigned chars. > Index: parse.c > =================================================================== > RCS file: /home/ncvs/src/usr.bin/hexdump/parse.c,v > retrieving revision 1.13 > diff -u -r1.13 parse.c > --- parse.c 22 Jul 2004 13:14:42 -0000 1.13 > +++ parse.c 11 Aug 2005 13:22:00 -0000 > @@ -54,7 +54,7 @@ > void > addfile(char *name) > { > - unsigned char *p; > + char *p; > FILE *fp; > int ch; > char buf[2048 + 1]; > @@ -79,7 +79,7 @@ > void > add(const char *fmt) > { > - unsigned const char *p, *savep; > + const char *p, *savep; > static FS **nextfs; > FS *tfs; > FU *tfu, **nextfu; Changing the basic type to plain char gives bugs even on normal machines (2's complement with 8-bit chars). The data will have been read using read(2) or passed on the command line via char **argv so it naturally has type char, but hexdump depends critically on the reverse of the bogus casts above -- it puns the plain chars to unsigned chars to get rid of the sign bit. Here p points into the format string so the type pun is more bogus than for file data, but it is intentional -- p had type plain "char *" in rev.1.1, but it was changed to "unsigned char *" in rev.1.3 to work around many sign extension bugs that occur in practice. The first one in add() is the well-known bug of using isspace() on a possibly-signed char. This gives undefined behaviour if the value is negative but different from EOF, and bogus defined behaviour if the value is EOF. The format string can be specified on the command line so it is easy for users who use more than the ASCII character set to put a negative char in it. [... Similarly to undo most of the work-arounds in rev.1.3]. Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20050817111104.C1553>