From owner-svn-src-head@FreeBSD.ORG Thu Aug 29 13:53:40 2013 Return-Path: Delivered-To: svn-src-head@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTP id E7ED55F5; Thu, 29 Aug 2013 13:53:40 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail105.syd.optusnet.com.au (mail105.syd.optusnet.com.au [211.29.132.249]) by mx1.freebsd.org (Postfix) with ESMTP id 91737290E; Thu, 29 Aug 2013 13:53:40 +0000 (UTC) Received: from c122-106-156-23.carlnfd1.nsw.optusnet.com.au (c122-106-156-23.carlnfd1.nsw.optusnet.com.au [122.106.156.23]) by mail105.syd.optusnet.com.au (Postfix) with ESMTPS id 90A991045118; Thu, 29 Aug 2013 23:26:06 +1000 (EST) Date: Thu, 29 Aug 2013 23:26:05 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Sergey Kandaurov Subject: Re: svn commit: r254600 - head/lib/libutil In-Reply-To: Message-ID: <20130829220230.W1258@besplex.bde.org> References: <201308211646.r7LGk6eV051215@svn.freebsd.org> <5214F72B.7070006@freebsd.org> <20130821190309.GB52908@omg> <20130821202725.GA4991@stack.nl> <20130821212413.GC52908@omg> <20130821213755.GA8052@stack.nl> <20130822185756.Y1297@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=YYGEuWhf c=1 sm=1 tr=0 a=ebeQFi2P/qHVC0Yw9JDJ4g==:117 a=PO7r1zJSAAAA:8 a=aw4-S10CjX0A:10 a=kj9zAlcOel0A:10 a=JzwRw_2MAAAA:8 a=H7VN2zDOXXgA:10 a=6eX2XVolbth28Z6AywQA:9 a=Qu08HLGT9pfdkBNc:21 a=XgiEenO_gzH1qr_s:21 a=CjuIK1q_8ugA:10 Cc: src-committers@freebsd.org, Andrey Chernov , Jilles Tjoelker , svn-src-all@freebsd.org, Bruce Evans , svn-src-head@freebsd.org X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 29 Aug 2013 13:53:41 -0000 On Thu, 29 Aug 2013, Sergey Kandaurov wrote: > On 22 August 2013 14:51, Bruce Evans wrote: >> expand_number() remains a very badly designed and implemented function. >> Its design errors start with its name. It doesn't expand numbers. It >> ... > [...] >> >> Some of the other bugs in the old version: >> >> % /* >> % * Convert an expression of the following forms to a uint64_t. >> % * 1) A positive decimal number. >> % * 2) A positive decimal number followed by a 'b' or 'B' (mult by 1). >> % ... >> This garbage was copied from a similar comment in dd/args.c. The number is >> actually neither positive nor decimal, especially in dd where there is a >> variant of this function that returns negative numbers. Another design >> error >> in this function is that it can't return negative numbers. > > Well, I see no useful contents in this comment. It is obvious and verbose. > Probably it should be trimmed. Stating the format in the man page is enough. I don't like the man page much either, and have complained about it and dehumanize_number(3) at length elsewhere. But expand_number() is much simpler, so most of the bugs mostly affect the other man page. (It is too simple to be able to translate numbers printed by dehumanize_number(), since it doesn't support power of 10 "prefixes".) Looking at expand_number(3) now, I see only the following bugs: - the format of the string isn't really documented - poor grammar: - "the buf string" - "the address pointed out" - confusing terminology "prefix". This means an "SI power of 2" prefix, e.g., "kilo" or "k". (I think SI only really has power if 10 prefixes. Anyway, "k" should always mean 1000 and "K" always 1024. This is backwards in the man page.) "kilo" really is a prefix when it modifies a unit, "e.g.", kilobytes. But this function deals with pure numbers, so there are no units, and the "prefixes" are always uses as suffixes, e.g., "10K" (sic). - the code supports both cases for all "prefixes", with literal lower case, but the man page only documents one case (always upper case, except for "k" where this is backwards). Comments in the code describes the "prefixes" as "units". That is better, but is too confusing since there may also be units like bytes added later. - the man page describes the multipliers in non-dehumanized form, i.e., as decimal numbers. Even I prefer a dehumanized form here, since I can only recognize small decimal numbers as powers of 2. The man page should give the numbers in hex or powers of 2 first and then in decimal if there is space. - in the ERRORS section: - the description of EINVAL is incomplete. See strtoumax() for more. - the description of ERANGE has poor grammar. It should start with "The", as os normal and as is done for EINVAL, and perhaps give more details, as in strto*(). I think most utilities and all library functions that parse numbers should say when they use strto*() to do this. Utilities should mostly be covered by intro(1). The default should be to accept numbers in any base. POSIX unfortunately doesn't require very much, so bases other than 10 are unportable at best. Grepping in utitilies man pages shows the following: - I patched mknod(8) with the following when I fixed its strto*() parsing: % Index: mknod.8 % =================================================================== % RCS file: /home/ncvs/src/sbin/mknod/mknod.8,v % retrieving revision 1.3 % retrieving revision 1.4 % diff -r1.3 -r1.4 % 94,95c94,96 % < Major and minor device numbers can be given using the normal C % < notation, so that a leading % --- % > Major and minor device numbers can be given in any format acceptable to % > .Xr strtoul 3 , % > so that a leading This is still fuzzy since it doesn't say that any base is allowed. - sleep(1) refers to strtod(3). sleep(1) actually uses sscanf() instead of strtod(3), but I think the reference is correct. - ctladm(8) refers to strtoull(3) for lba's, with details about the base (but expressed too verbosely, and duplicated for read and write). dd needs a similar amount of detail and more, but has less. ctladm(3) does actually use strtoull(), but that is another bug since on 32-bit machines strtoull() only gives 32-bit numbers but lba's need more. - etherswitchcfg(8) refers to strtol(4) for phy's and gives details about accepting any base. strtol(4) of course doesn't exist. It is a typo for strtol(3). > >> >> % */ >> % >> >> This bogus blank line detaches the comment from the code that it describes, >> resulting in the comment describing null code. > > Sorry, I didn't see such blank line in head. Oops, false alarm. > ... > Fixing the verbose comment above would result in adding more verboseness > citing from man strtoumax(3) wrt. signedness and base details. > Yet another point to trim it. Yes. We add the detail that the number is restricted to 64 bits (else ERANGE), but already document that well enough (in the man page), and after these fixes the man page should match the code. > Thanks for your valuable input. What about this change (on top of head)? > > Index: expand_number.c > =================================================================== > --- expand_number.c (revision 255017) > +++ expand_number.c (working copy) > @@ -35,42 +35,29 @@ > #include > #include > > -/* > - * Convert an expression of the following forms to a uint64_t. > - * 1) A positive decimal number. > - * 2) A positive decimal number followed by a 'b' or 'B' (mult by 1). > - * 3) A positive decimal number followed by a 'k' or 'K' (mult by 1 << 10). > - * 4) A positive decimal number followed by a 'm' or 'M' (mult by 1 << 20). > - * 5) A positive decimal number followed by a 'g' or 'G' (mult by 1 << 30). > - * 6) A positive decimal number followed by a 't' or 'T' (mult by 1 << 40). > - * 7) A positive decimal number followed by a 'p' or 'P' (mult by 1 << 50). > - * 8) A positive decimal number followed by a 'e' or 'E' (mult by 1 << 60). > - */ > int > expand_number(const char *buf, uint64_t *num) > { > + char *endptr; > + uintmax_t umaxval; > uint64_t number; > - int saved_errno; > unsigned shift; > - char *endptr; > + int serrno; > > - saved_errno = errno; > + serrno = errno; > errno = 0; > - > - number = strtoumax(buf, &endptr, 0); > - > - if (number == UINTMAX_MAX && errno == ERANGE) { > - return (-1); > - } > - > - if (errno == 0) > - errno = saved_errno; > - > + umaxval = strtoumax(buf, &endptr, 0); > if (endptr == buf) { > /* No valid digits. */ > errno = EINVAL; > return (-1); > } I want to remove this if clause (depend on the POSIX extension). > + if (umaxval > UINT64_MAX) > + errno = ERANGE; > + if (errno != 0) > + return (-1); > + errno = serrno; > + number = umaxval; > > switch (tolower((unsigned char)*endptr)) { > case 'e': Seems OK. Tabs got corrupted in the mail. I complained about missing overflow checking for the shift, but there is some. It seems to be correct except for a style bug (a blank line that separates the check from the final evaluation. We avoid changing *num when failing, and don't use a temporary variable for holding the result of the shift, so the shift expression is written twice, but that is a negative reason to separate its evaluations). Bruce