Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 30 Jun 2011 18:32:45 +0200
From:      Stefan Esser <se@freebsd.org>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        Alexander Best <arundel@freebsd.org>, Poul-Henning Kamp <phk@phk.freebsd.dk>, standards@freebsd.org, Bruce Evans <bde@freebsd.org>
Subject:   Re: [RFC] Consistent numeric range for "expr" on all architectures
Message-ID:  <4E0CA52D.7020508@freebsd.org>
In-Reply-To: <4E0C2F41.2060302@freebsd.org>
References:  <99048.1309258976@critter.freebsd.dk> <4E0A0774.3090004@freebsd.org> <20110629082103.O1084@besplex.bde.org> <4E0B1C47.4010201@freebsd.org> <20110630073705.P1117@besplex.bde.org> <4E0C2F41.2060302@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help
The attached patch takes the feedback from Bruce into account.
See details inline in the quoted mail, below.

Any further comments?

On 30.06.2011 10:09, Stefan Esser wrote:
> On 30.06.2011 01:32, Bruce Evans wrote:
>> On Wed, 29 Jun 2011, Stefan Esser wrote:
>>> Am 29.06.2011 01:06, schrieb Bruce Evans:
>>>> Other points:
>>>> - `expr -e 10000000000000000000 + 0' (19 zeros) gives "Result too large",
>>>>   but it isn't the result that is too large, but the arg that is too large.
>>>>   This message is strerror(ERANGE) after strtoimax() sets errno to ERANGE.
>>>>   `expr -e 1000000000000000000 \* 10' gives "overflow".  This message is
>>>>   correct, but it is in a different style to strerror() (uncapitalized,
>>>>   and more concise).
>>>
>>> The patch that I sent with my first message fixes this. The message is
>>> changed from "Result too large" to "Not a valid integer: %s".
>>>
>>> ("non-numeric argument" is used in other places and I could adapt to
>>> that, though I prefer to also see which argument was rejected. But I
>>> think that "not a valid integer" better describes the situation.)
>>
>> I prefer "operand too large".  A decimal integer operand that is too
>> large to be represented by intmax_t is not really invalid, but is just
>> too large.  We already have a different message for invalid.
> 
> Yes, I also thought the "non-numeric" message was misleading and that
> "not a valid integer" was just slightly less misleading ;-)
> 
> But "operand too large" correctly describes the situation and I'll just
> add a test for that case (instead of just relying on strtoimax() to decide
> about integer-ness).

The patch introduces a function is_integer(), which performs a simple 
syntax check on operands to determine whether they are decimals. This 
function replaces the use of strtoimax() for the purpose of testing the 
operand, which fails to accept decimals too long to be represented by 
intmax_t.

> According to POSIX, "+1" is no valid integer operand in "expr", should a
> leading "+" also be detected and rejected?

The is_integer() function only permits an optional leading minus sign 
followed by digits. The case of a only a minus sign passed as operand 
is already caught by the parser. 

(Aside: Is it correct to reject "expr -- - : -" as a syntax error?) 

>>> From an old (?) version of the man page:
>>
>> %    Arithmetic operations are performed using signed integer math.  If the -e
>> %    flag is specified, arithmetic uses the C intmax_t data type (the largest
>>                         ^^^^^^^^^^ actual arithmentic, not just parsing; but
>>                                    for expr [-e] 10000000000000000000 we are
>>                                    doing actual arithmetic -- see below 
>> %    integral type available), and expr will detect arithmetic overflow and
>> %    return an error indication.  If a numeric operand is specified which is
>>
>> "Numeric" is not defined anywhere in the man page.  This is the only
>> use of it in the man page.  It means "decimal integer" and should say
>> precisely that.  The only hint about this in the man page is the statement
>> that "all integer operands are interpreted in base 10".  The fuzziness
>> extends to error messages saying "non-numeric argument" instead of
>> "operand not a decimal integer [used in integer context]".
> 
> Ok, all numeric arguments should be called "decimal integer" in the
> documentation and error messages (if applicable).

I have changed the wording in the man page. Critical review is welcome.

>> %    so large as to overflow conversion to an integer, it is parsed as a
>> %    string instead.  If -e is not specified, arithmetic operations and pars-
>>
>> This specificially says that large operands are parsed as strings.
>> Strangely, since large operands are only checked for with -e, only -e can
>> get this right; without -e, large operands are not even detected.
>> However, this is a bug in the man page -- see below.
> 
> I had already wondered about this case (large numbers being considered
> strings), but I thought it was in compliance with POSIX. This will be
> fixed by checking operands to be POSIX decimal integers.

Both program and man page are fixed.

[...]
>>   Therefore, "expr -e 1000000000000000000" is not a syntax error as seems
>>   to be required by the man page; the arg in it forms a degenerate
>>   expression.  The arg is not a <string> since it is an <integer>.
>>   Therefore, the expression is numeric (I didn't check that POSIX says
>>   this explicitly).  Therefore, we are justified in applying strtoimax()
>>   to all the operands in the expression (all 1 of them) and getting a
>>   range error.

The current behaviour was irritating in so far as the first the operand 
was printed, then an error message on the next line, and an exit code of 2 
is returned. The error message comes from the check for 0 or "", which 
shall result in an exit code of 1, but for too large decimals an overflow 
is detected during these tests.

This is fixed by first checking for overflow and then printing the result 
of the evaluated expression.

A side effect is that "expr 0000000000010000000" now correctly prints 
"10000000" instead of "0000000000010000000".

> I think this is rational behaviour of the code, but I'm not convinced,
> that <identifier> can not be <integer> and <string> at the same time.
> See for example:
> 
> $ expr \( 222 \) : "2*"
> 3
> $ expr \( 111 + 111 \) : "2*"
> 3
> 
> The numeric result is taken as string to perform the RE matching and the
> length of the first match is returned. Therefore, a clear integer can
> still be interpreted as string, if operand to a string expression.
> The degenerate case of just a too long integer might still be considered
> a valid string and the missing operation (or identity?) does not change
> that type. Therefore it might be argued, that a very long numeric
> argument should just be printed as string in that case. (???)

I have had a look at the POSIX page for EXRP and found that I had a wrong 
understanding of some terms. Each operand is either an integer or a string, 
but the result of an evaluation can be converted to either.

Thus it is correct to enforce numeric interpretation on input that is 
syntactically parsed as integer number.

>> The syntax is still broken as designed, since it doesn't allow +1 to
>> be an integer, and it requires octal intgers to be misinterpreted
>> as decimal integers although no reasonable specification of decimal
>> integers allows them to start with a '0', and it doesn't support
>> non-decimal integers...
> 
> We *do* accept "+1" as synonymous to "1", though, as pointed out above.
> I'd like to change this.

Changed, as explained above.

>>> Does POSIX require that expr exits on illegal arguments?
>>
>> Not sure.  It requires an exit status of 2 for invalid expressions.
>> For the "+" operator, the operands are required to be decimal
>> integers, but the error handling isn't so clearly specified.  For
>> the "&" operator, operands(s) are allowed to be null.

The patch I prepared supports empty operands for arithmetic operations,
they are treated as "0":

$ expr "" + 1
1

This is easily changed by enforcing at least one digit in is_integer().
(I have tested both versions, but since this appears to be traditional
behaviour at least on FreeBSD, I'm worried that enforcing at least one 
digit for numeric operands will break existing shell scripts.)

Please review the attached patch. It is again created with "-w" to ignore 
whitespace changes (indentation).

Regards, STefan



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?4E0CA52D.7020508>