From owner-freebsd-standards@FreeBSD.ORG  Mon Jun 27 11:07:12 2011
Return-Path: <owner-freebsd-standards@FreeBSD.ORG>
Delivered-To: freebsd-standards@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 54516106564A
	for <freebsd-standards@FreeBSD.org>;
	Mon, 27 Jun 2011 11:07:12 +0000 (UTC)
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: from freefall.freebsd.org (freefall.freebsd.org
	[IPv6:2001:4f8:fff6::28])
	by mx1.freebsd.org (Postfix) with ESMTP id 3B3978FC24
	for <freebsd-standards@FreeBSD.org>;
	Mon, 27 Jun 2011 11:07:12 +0000 (UTC)
Received: from freefall.freebsd.org (localhost [127.0.0.1])
	by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p5RB7CoB071970
	for <freebsd-standards@FreeBSD.org>; Mon, 27 Jun 2011 11:07:12 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p5RB7BTe071968
	for freebsd-standards@FreeBSD.org; Mon, 27 Jun 2011 11:07:11 GMT
	(envelope-from owner-bugmaster@FreeBSD.org)
Date: Mon, 27 Jun 2011 11:07:11 GMT
Message-Id: <201106271107.p5RB7BTe071968@freefall.freebsd.org>
X-Authentication-Warning: freefall.freebsd.org: gnats set sender to
	owner-bugmaster@FreeBSD.org using -f
From: FreeBSD bugmaster <bugmaster@FreeBSD.org>
To: freebsd-standards@FreeBSD.org
Cc: 
Subject: Current problem reports assigned to freebsd-standards@FreeBSD.org
X-BeenThere: freebsd-standards@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Standards compliance <freebsd-standards.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-standards>
List-Post: <mailto:freebsd-standards@freebsd.org>
List-Help: <mailto:freebsd-standards-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Jun 2011 11:07:12 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.


S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o stand/157050 standards  OSS implementation lacks AFMT_FLOAT
o stand/154842 standards  invalid request authenticator in the second and subseq
o stand/150093 standards  C++ std::locale support is broken
a stand/149980 standards  [libc] [patch] negative value integer to nanosleep(2) 
o stand/147210 standards  xmmintrin.h and cstdlib conflicts with each other with
o docs/143472  standards  gethostname(3) references undefined value: HOST_NAME_M
s stand/141705 standards  [libc] [request] libc lacks cexp (and friends)
o stand/130067 standards  Wrong numeric limits in system headers?
o stand/124860 standards  flockfile(3) doesn't work when the memory has been exh
o stand/121921 standards  [patch] Add leap second support to at(1), atrun(8)
o stand/116477 standards  rm(1): rm behaves unexpectedly when using -r and relat
o bin/116413   standards  incorrect getconf(1) handling of unsigned constants gi
o stand/116081 standards  make does not work with the directive sinclude
p stand/107561 standards  [libc] [patch] [request] Missing SUS function tcgetsid
o stand/100017 standards  [Patch] Add fuser(1) functionality to fstat(1)
o stand/96236  standards  [patch] [posix] sed(1) incorrectly describes a functio
o stand/94729  standards  [libc] fcntl() throws undocumented ENOTTY
a stand/86484  standards  [patch] mkfifo(1) uses wrong permissions
o stand/82654  standards  C99 long double math functions are missing
o stand/81287  standards  [patch] fingerd(8) might send a line not ending in CRL
a stand/80293  standards  sysconf() does not support well-defined unistd values
o stand/79056  standards  [feature request] [atch] regex(3) regression tests
o stand/70813  standards  [patch] ls(1) not Posix compliant
o stand/66357  standards  make POSIX conformance problem ('sh -e' & '+' command-
s kern/64875   standards  [libc] [patch] [request] add a system call: fdatasync(
o stand/56476  standards  [patch] cd9660 unicode support simple hack
o stand/54410  standards  one-true-awk not POSIX compliant (no extended REs)
o stand/46119  standards  Priority problems for SCHED_OTHER using pthreads
o stand/44365  standards  [headers] [patch] [request] introduce ulong and unchar
a stand/41576  standards  ln(1): replacing old dir-symlinks
o stand/39256  standards  snprintf/vsnprintf aren't POSIX-conformant for strings
a docs/26003   standards  getgroups(2) lists NGROUPS_MAX but not syslimits.h
s stand/24590  standards  timezone function not compatible witn Single Unix Spec
o stand/21519  standards  sys/dir.h should be deprecated some more
s bin/14925    standards  getsubopt isn't poisonous enough

35 problems total.


From owner-freebsd-standards@FreeBSD.ORG  Wed Jun 29 01:10:01 2011
Return-Path: <owner-freebsd-standards@FreeBSD.ORG>
Delivered-To: standards@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 239661065670;
	Wed, 29 Jun 2011 01:10:01 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from fallbackmx07.syd.optusnet.com.au
	(fallbackmx07.syd.optusnet.com.au [211.29.132.9])
	by mx1.freebsd.org (Postfix) with ESMTP id 486E28FC16;
	Wed, 29 Jun 2011 01:09:59 +0000 (UTC)
Received: from mail07.syd.optusnet.com.au (mail07.syd.optusnet.com.au
	[211.29.132.188])
	by fallbackmx07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p5SN73QD001264; Wed, 29 Jun 2011 09:07:03 +1000
Received: from c122-106-165-191.carlnfd1.nsw.optusnet.com.au
	(c122-106-165-191.carlnfd1.nsw.optusnet.com.au [122.106.165.191])
	by mail07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p5SN6v60014950
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Wed, 29 Jun 2011 09:06:58 +1000
Date: Wed, 29 Jun 2011 09:06:57 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Stefan Esser <se@FreeBSD.org>
In-Reply-To: <4E0A0774.3090004@freebsd.org>
Message-ID: <20110629082103.O1084@besplex.bde.org>
References: <99048.1309258976@critter.freebsd.dk>
	<4E0A0774.3090004@freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Alexander Best <arundel@FreeBSD.org>,
	Poul-Henning Kamp <phk@phk.freebsd.dk>, standards@FreeBSD.org,
	Bruce Evans <bde@FreeBSD.org>
Subject: Re: [RFC] Consistent numeric range for "expr" on all architectures
X-BeenThere: freebsd-standards@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Standards compliance <freebsd-standards.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-standards>
List-Post: <mailto:freebsd-standards@freebsd.org>
List-Help: <mailto:freebsd-standards-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Jun 2011 01:10:01 -0000

[I changed developers to standards instead of removing it]

On Tue, 28 Jun 2011, Stefan Esser wrote:

> Am 28.06.2011 13:02, schrieb Poul-Henning Kamp:
>> In message <4E09AF8E.5010509@freebsd.org>, "Stefan Esser" writes:
>>
>>> Due to (false, according  to BDE) considerations for POSIX compliance,
>>> the 64bit code was made conditional on a command line option in 2002.
>>
>> I think 64bit is the wrong thing to focus on, shouldn't it be
>> "intmax_t" so we will not have to revisit this again ?
>
> Well, actually it already *is* intmax_t, which happens to be 64bit
> on all architectures I checked ;-)
>
> My proposal is just to not produce overflows when easily avoidable.
> This takes little effort, simplifies the code and makes scripts more
> portable accross architectures.
>
> Are there any supported architectures with intmax_t smaller than 64bit?

There cannot be, since C99 requires long long to be at least 64 bits
(counting the sign bit) and it requires intmax_t to be capable of
representing any value of any signed integer type.

Which checking this, I noticed that:
- preprocessor arithmetic is done using intmax_t or uintmax_t.  This causes
   portability problems related to ones for expr -- expressions like
   ULONG_MAX + ULONG_MAX suddenly started in 1999 giving twice ULONG_MAX
   instead of ULONG_MAX-1, but only on arches where ULONG_MAX < UINTMAX_MAX.
   (I use unsigned values in this example to give defined behaviour on
   overflow, so that the expression ULONG_MAX + ULONG_MAX is not just a bug.
   expr doesn't have this complication.)
- C99 doesn't require intmax_t to be the logically longest type.  Thus it
   permits FreeBSD's rather bizarre implementation of intmax_t being plain
   long which is logically shorter than long long.

Other points:
- `expr -e 10000000000000000000 + 0' (19 zeros) gives "Result too large",
   but it isn't the result that is too large, but the arg that is too large.
   This message is strerror(ERANGE) after strtoimax() sets errno to ERANGE.
   `expr -e 1000000000000000000 \* 10' gives "overflow".  This message is
   correct, but it is in a different style to strerror() (uncapitalized,
   and more concise).
- `expr 10000000000000000000' (19 or even 119 zeros) gives no error.  It
   is documented that the arg is parsed as a string in this case, and the
   documentation for -e doesn't clearly say that -e changes this.  And -e
   doesn't change this if the arg clearly isn't a number
   (e.g., if it is 10000000000000000000mumble), or even if it is a non-decimal
   number (e.g., if is 010, 0x10 or 10.0).  If the arg isn't a decimal integer,
   then (except for -e on decimal integers), there is an error irrespective
   of -e when arithmetic is attempted (e.g., adding 0).  The error message
   for this bogusly says "non-numeric argument" when the arg is numeric but
   not a decimal integer.
- POSIX requires brokenness for bases other than 10, but I wonder if an
   arg like 0x10 invokes undefined behaviour and thus can be made to
   work.  (I wanted to use a hex number since I can never remember what
   INTMAX_MAX is in decimal and wanted to type it in hex for checking
   the range and overflow errors.)  Allowing hex args causes fewer
   problems than allowing decimal args larger than INT32_MAX, since
   they are obviously unportable.  Some FreeBSD utilities, e.g., dd,
   support hex args and don't worry about POSIX restricting them.
- POSIX unfortunately requires args larger than INT32_MAX to be unportable
   (to work if longs are longer than 32 bits, else to give undefined (?)
   behaviour.  For portability there could be a -p switch that limits args
   to INT32_MAX even if longs are longer than 32 bits.
- I hope POSIX doesn't require benign overflow.  Thus treating all overflows
   as errors is good for portability and doesn't require any switch.

Bruce

From owner-freebsd-standards@FreeBSD.ORG  Wed Jun 29 12:49:34 2011
Return-Path: <owner-freebsd-standards@FreeBSD.ORG>
Delivered-To: standards@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2C3721065675
	for <standards@FreeBSD.org>; Wed, 29 Jun 2011 12:49:34 +0000 (UTC)
	(envelope-from se@freebsd.org)
Received: from nm20-vm3.bullet.mail.ne1.yahoo.com
	(nm20-vm3.bullet.mail.ne1.yahoo.com [98.138.91.150])
	by mx1.freebsd.org (Postfix) with SMTP id E123E8FC19
	for <standards@FreeBSD.org>; Wed, 29 Jun 2011 12:49:33 +0000 (UTC)
Received: from [98.138.90.52] by nm20.bullet.mail.ne1.yahoo.com with NNFMP;
	29 Jun 2011 12:36:25 -0000
Received: from [98.138.84.37] by tm5.bullet.mail.ne1.yahoo.com with NNFMP;
	29 Jun 2011 12:36:25 -0000
Received: from [127.0.0.1] by smtp105.mail.ne1.yahoo.com with NNFMP;
	29 Jun 2011 12:36:25 -0000
X-Yahoo-Newman-Id: 760581.86118.bm@smtp105.mail.ne1.yahoo.com
Received: from [192.168.119.20] (se@81.173.144.90 with plain)
	by smtp105.mail.ne1.yahoo.com with SMTP; 29 Jun 2011 05:36:24 -0700 PDT
X-Yahoo-SMTP: iDf2N9.swBDAhYEh7VHfpgq0lnq.
X-YMail-OSG: 0Mu8oeUVM1l5toojlmSMG5uDs4_AoT2uXC__MjGRoKmxOF.
	WEEUaDQKCmGQ58g2xLcRlsvlZsa38i.RUQ1SaJgrKVR_z8FUqQ2rYE9gvZGI
	Ho0XRfdOop1hqOgsdGAwYyex_ghqInuiqctkx5jG1nupzC5CpTrocm9ad.vN
	WPNK8w1N2mFRI0IU89lc2IzSWrQCduzKxlQS0Sop.nh9Ht40g4t7xpM9Dssr
	LzfXral9LKDAT0HgU2gUamsS0KcbN5auoKLXpKBlAkXBFdwgpeUVxfxZnqiN
	TGa9NOpxsJi_NrTSlBuXEKey7vaGqbRj15u.yzveV5cre6Z4y071LVKRM1iL
	7jDeAScChe.VVKhk3kep1nqqKptbHw0g4PdbCxA--
X-Yahoo-Newman-Property: ymail-3
Message-ID: <4E0B1C47.4010201@freebsd.org>
Date: Wed, 29 Jun 2011 14:36:23 +0200
From: Stefan Esser <se@freebsd.org>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
	rv:5.0) Gecko/20110624 Thunderbird/5.0
MIME-Version: 1.0
To: Bruce Evans <brde@optusnet.com.au>
References: <99048.1309258976@critter.freebsd.dk>
	<4E0A0774.3090004@freebsd.org>
	<20110629082103.O1084@besplex.bde.org>
In-Reply-To: <20110629082103.O1084@besplex.bde.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Alexander Best <arundel@FreeBSD.org>,
	Poul-Henning Kamp <phk@phk.freebsd.dk>,
	Stefan Esser <se@localhost.FreeBSD.org>, standards@FreeBSD.org,
	Bruce Evans <bde@FreeBSD.org>
Subject: Re: [RFC] Consistent numeric range for "expr" on all architectures
X-BeenThere: freebsd-standards@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Standards compliance <freebsd-standards.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-standards>
List-Post: <mailto:freebsd-standards@freebsd.org>
List-Help: <mailto:freebsd-standards-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Jun 2011 12:49:34 -0000

Am 29.06.2011 01:06, schrieb Bruce Evans:
> [I changed developers to standards instead of removing it]

Thanks Bruce. I just subcribed to standards@ and was about to move the
thread there, too.

> On Tue, 28 Jun 2011, Stefan Esser wrote:
>> Are there any supported architectures with intmax_t smaller than 64bit?
> 
> There cannot be, since C99 requires long long to be at least 64 bits
> (counting the sign bit) and it requires intmax_t to be capable of
> representing any value of any signed integer type.
> 
> Which checking this, I noticed that:
> - preprocessor arithmetic is done using intmax_t or uintmax_t.  This causes
>   portability problems related to ones for expr -- expressions like
>   ULONG_MAX + ULONG_MAX suddenly started in 1999 giving twice ULONG_MAX
>   instead of ULONG_MAX-1, but only on arches where ULONG_MAX < UINTMAX_MAX.
>   (I use unsigned values in this example to give defined behaviour on
>   overflow, so that the expression ULONG_MAX + ULONG_MAX is not just a bug.
>   expr doesn't have this complication.)
> - C99 doesn't require intmax_t to be the logically longest type.  Thus it
>   permits FreeBSD's rather bizarre implementation of intmax_t being plain
>   long which is logically shorter than long long.
> 
> Other points:
> - `expr -e 10000000000000000000 + 0' (19 zeros) gives "Result too large",
>   but it isn't the result that is too large, but the arg that is too large.
>   This message is strerror(ERANGE) after strtoimax() sets errno to ERANGE.
>   `expr -e 1000000000000000000 \* 10' gives "overflow".  This message is
>   correct, but it is in a different style to strerror() (uncapitalized,
>   and more concise).

The patch that I sent with my first message fixes this. The message is
changed from "Result too large" to "Not a valid integer: %s".

("non-numeric argument" is used in other places and I could adapt to
that, though I prefer to also see which argument was rejected. But I
think that "not a valid integer" better describes the situation.)

Without "-e" the numeric result is "undefined" and no error is signaled,
since there was no test whether the conversion succeeded before I added
it back in 2000.


BTW: The current number parsing is not fully POSIX compliant, since it
allows to write "expr +1 - +1".

This is an artefact of using strtoimax(..., 10) for conversion and
validity checking at the same time (and strtoimax accepts leading "+",
which POSIX forbids). This could be fixed by a simple test of the first
non-blank character of the parameter.

> - `expr 10000000000000000000' (19 or even 119 zeros) gives no error.  It
>   is documented that the arg is parsed as a string in this case, and the
>   documentation for -e doesn't clearly say that -e changes this.  And -e
>   doesn't change this if the arg clearly isn't a number
>   (e.g., if it is 10000000000000000000mumble), or even if it is a non-decimal
>   number (e.g., if is 010, 0x10 or 10.0).  If the arg isn't a decimal integer,
>   then (except for -e on decimal integers), there is an error irrespective
>   of -e when arithmetic is attempted (e.g., adding 0).  The error message
>   for this bogusly says "non-numeric argument" when the arg is numeric but
>   not a decimal integer.

Yes, the reason is the conversion from string to intmax_t just before
the operation. You are right in pointing out that strtoimax can fail for
(out of range) integer numbers.

The error message should probably be changed to "not a valid integer"
instead (or suggest a better form; but the shorter "invalid integer"
seems wrong to me ...).

> - POSIX requires brokenness for bases other than 10, but I wonder if an
>   arg like 0x10 invokes undefined behaviour and thus can be made to
>   work.  (I wanted to use a hex number since I can never remember what
>   INTMAX_MAX is in decimal and wanted to type it in hex for checking
>   the range and overflow errors.)  Allowing hex args causes fewer
>   problems than allowing decimal args larger than INT32_MAX, since
>   they are obviously unportable.  Some FreeBSD utilities, e.g., dd,
>   support hex args and don't worry about POSIX restricting them.

Does POSIX require that expr exits on illegal arguments?

Else we'll have undefined behaviour and might be allowed to add parsing
of hex numbers. I'd be glad to add it (but this would be a different
commit). But octals must not be supported, since 010 must be treated as
decimal number (i.e., we must not just make strtoimax accept other bases
than 10, but must special case for decimal and hex).

> - POSIX unfortunately requires args larger than INT32_MAX to be unportable
>   (to work if longs are longer than 32 bits, else to give undefined (?)
>   behaviour.  For portability there could be a -p switch that limits args
>   to INT32_MAX even if longs are longer than 32 bits.

Well, undefined behaviour can always be to return the correct result ;-)

I'd be willing to add "-p" (effectively just make "-e" the default that
can be overridden).

> - I hope POSIX doesn't require benign overflow.  Thus treating all overflows 
>   as errors is good for portability and doesn't require any switch.

So do I ... But if we agreed on making intmax_t the default and on "-p"
for strict POSIX compliance, then we could of course keep the undetected
overflows for that code path.

Thank you for the comments!

Best regards, STefan

From owner-freebsd-standards@FreeBSD.ORG  Wed Jun 29 20:21:31 2011
Return-Path: <owner-freebsd-standards@FreeBSD.ORG>
Delivered-To: freebsd-standards@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 624781065670
	for <freebsd-standards@freebsd.org>;
	Wed, 29 Jun 2011 20:21:31 +0000 (UTC) (envelope-from se@freebsd.org)
Received: from nm17.bullet.mail.bf1.yahoo.com (nm17.bullet.mail.bf1.yahoo.com
	[98.139.212.176])
	by mx1.freebsd.org (Postfix) with SMTP id EB35C8FC13
	for <freebsd-standards@freebsd.org>;
	Wed, 29 Jun 2011 20:21:30 +0000 (UTC)
Received: from [98.139.212.148] by nm17.bullet.mail.bf1.yahoo.com with NNFMP;
	29 Jun 2011 20:07:45 -0000
Received: from [98.139.211.204] by tm5.bullet.mail.bf1.yahoo.com with NNFMP;
	29 Jun 2011 20:07:44 -0000
Received: from [127.0.0.1] by smtp213.mail.bf1.yahoo.com with NNFMP;
	29 Jun 2011 20:07:44 -0000
X-Yahoo-Newman-Id: 918647.82623.bm@smtp213.mail.bf1.yahoo.com
Received: from [192.168.119.20] (se@81.173.158.168 with plain)
	by smtp213.mail.bf1.yahoo.com with SMTP; 29 Jun 2011 13:07:44 -0700 PDT
X-Yahoo-SMTP: iDf2N9.swBDAhYEh7VHfpgq0lnq.
X-YMail-OSG: sGJ0h0IVM1nweiP28hoWO89pdFFk1nd4Z8Ev0wEjDjGKCvH
	YMjSumln2ZrDnEKu0HicXuRQ5CuwsoSzyVByKJFeYekKk3k2t7VHaj3lmPRN
	9qwr20HhoRb4cOL2FQPYUdThKfWMHoFBOan5aTCiUeXWdm0bFGOsHkUtHQXP
	wnCePLgCtDud43lFIhTFBXezw9lOeLOwc8pKBjttwal4jdHJxreQQcymGNVK
	ZbTdZ0MgHJ4FDe8hzRDmG5xppSElyeADCwJbo3MvCdso.NTO2gHqykSgiR9R
	yoKttCk5nt991VGkkqIZtcP8.C0LZx87x_JkqOXENmQ1R5EXSTFXjMEC6Uv2
	0EgRdv..HVKgJ9rmYNlIZGZkBo0od.1tcLbDl1A--
X-Yahoo-Newman-Property: ymail-3
Message-ID: <4E0B860E.50504@freebsd.org>
Date: Wed, 29 Jun 2011 22:07:42 +0200
From: Stefan Esser <se@freebsd.org>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
	rv:5.0) Gecko/20110624 Thunderbird/5.0
MIME-Version: 1.0
To: freebsd-standards@freebsd.org
References: <4E09AF8E.5010509@freebsd.org>
In-Reply-To: <4E09AF8E.5010509@freebsd.org>
X-Forwarded-Message-Id: <4E09AF8E.5010509@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
Subject: Fwd: [RFC] Consistent numeric range for "expr" on all architectures
X-BeenThere: freebsd-standards@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Standards compliance <freebsd-standards.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-standards>
List-Post: <mailto:freebsd-standards@freebsd.org>
List-Help: <mailto:freebsd-standards-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Jun 2011 20:21:31 -0000

Since BDE moved the thread over to this list I'm reposting the first
message under the assumption, that many list members have not seen it, yet.

Replies received in private were in support of the change, except of the
warning of possible POSIX compliance issues received from GWollman,
who therefore asked to move this thread to the standards list.


-------- Original Message --------
Subject: [RFC] Consistent numeric range for "expr" on all architectures
Date: Tue, 28 Jun 2011 12:40:14 +0200
From: Stefan Esser <st_esser@t-online.de>
To: FreeBSD Developers <developers@freebsd.org>
CC: Bruce Evans <bde@freebsd.org>, Alexander Best <arundel@freebsd.org>

Hi,

this point has been on my ToDO list for too long and I'd like to gather
opinions whether we should clean up "expr" a bit.

Back in 2000, I found that expr silently failed in an installer for a
Linux software demo due to restricted range. It was a bug in that
installer, but since it was just a few lines of code change, I made
"expr", "test" and "/bin/sh" support 64bit range on all architectures.
At the same time I added range checks for all calculations as suggested
by Bruce Evans.

Due to (false, according  to BDE) considerations for POSIX compliance,
the 64bit code was made conditional on a command line option in 2002.
The expr builtin in /bin/sh has been decoupled from the "expr" sources,
some time ago, so it has become a separate issue.


Given the simple functionality, the behaviour of "expr" is quite
complex, depending on CPU architecture, environment and options.
The following list shows supported numeric range depending on the CPU:

command               | ILP32   | I32LP64 | specialties
--------------------------------------------------------------------
expr                  | 32bit   | 64bit   | wrap around
expr -e               | 64bit   | 64bit   | overflow check
EXPR_COMPAT=1 expr    | 64bit   | 64bit   | overflow check
EXPR_COMPAT=1 expr -e | error   | error   | "-e" seen as 1st operand

(EXPR_COMPAT allows support of "expr -1 + 1", where "-1" is treated as
an option by POSIX expr and "expr -- -1 + 1" must be written instead.)


Examples on i386:

# --- 32 bit range, no range checks:
$ expr 2000000000 + 2000000000
-294967296

# --- 64 bit range and overflow checks via "-e" or EXPR_COMAPT:
$ expr -e 2000000000 + 2000000000
4000000000
$ EXPR_COMPAT=1 expr 2000000000 + 2000000000
4000000000

$ expr -e 5000000000000000000 + 5000000000000000000
expr: overflow
$ echo $?
2

$ EXPR_COMPAT=1 expr -e 1 + 1
expr: syntax error
$ echo $?
2


Examples on amd64:

# --- always 64 bit range, but wrap around controlled by -e or ENV:
$ expr 5000000000000000000 + 5000000000000000000
-8446744073709551616
$ expr -e 5000000000000000000 + 5000000000000000000
expr: overflow
$ EXPR_COMPAT=1 expr 5000000000000000000 + 5000000000000000000
expr: overflow


Why do I consider the current situation bad:

1) Shell scripts using "expr" have inconsistent behaviour, depending
   on the word size of the CPU architecture. This makes it possible
   to develop a shell script on amd64 and be sure it works on i386.

2) You can not just put "expr -e" into shell scripts to get consistent
   64bit range over all architectures, since that will fail with a
   syntax error if EXPR_COMPAT is set in the environment.

3) Range checks are very useful, since silent overflow in shell scripts
   can lead to data loss or other malfunction that is hard to diagnose.
   With 64 bit range, overflows are prevented even in cases, where they
   occur with 32 bit range.

4) The original reason for 64 bit support was a silently failing test
   in a commercial installer. It happened to fail in such a way, that
   the result of the computation was ignored under Linux but not in the
   port I was preparing at that time.

5) The current code is made complex by support of different ranges and
   overflow behaviour, depending on the options passed and environment.
   Consistent behavious is not only easier to understand and use, but
   does even reduce code size and complexity of the sources.


Regarding the POSIX compatibility of the proposed change, I received a
message from BDE:

On Tue, 15 Jun 2010, Bruce Evans wrote:
>On Tue, 15 Jun 2010, Stefan Esser wrote: 
>> I extended expr and test (standalone binaries and /bin/sh builtins) to
>> handle 64 bit on i386 a looong time ago (and I also added overflow
>> checks to all calculations). But then it was decided, that POSIX demands
>> 32 bit range on 32 bit machines, so support for 64 bit range was made
>> conditional on a switch (expr -e if memory serves me right ...). So yes,
>> 64 bit range is available on all platforms.
> 
> Hmm, POSIX doesn't require this brokenness for the shell, though it
> requires practically equivalent brokenness for all operands and option-
> arguments generally: it requires "all" integer arithmetic to use
> signed long variables and "all" floating point arithmetic to use double
> variables, with Standard C semantics.  signed long may have any size,
> but normally has 31 or 63 value bits as above.  It is unclear how a
> variable can ever be floating point.  Then it contradicts^Woverrides
> "all" with certain extensions for the shell.  However, these extensions
> are almost useless since they mainly make a difference when the signed
> long arithmetic would overflow, in which case the behaviour is undefined
> (?) so it can DTRT using any extension that it wants.

I take this to mean, that the shell should use "signed long" integers,
but that it is up to the implementor, what action is taken on overflow
and we are free to implement an extension (and enable it by default),
which supports 64bit range instead of returning bogus values.


My suggestion is to make the following modifications to expr:

1) Always compute with 64 bit range and overflow checks enabled. This
   means, that the "-e" option is always assumed.

2) Continue to support the "-e" option as a NOP for compatibility with
   scripts that knew about that FreeBSD extension.

This suggestion applies to -CURRENT - I'm afraid a MFC might be
considered too much of a POLA.

If there are no strong objections, I'd like to commit this change before
the code freeze. Of course, I'm willing to adapt the man page at the
time of the code commit.

The suggested patch just removes casts of operands to "long" that
currently exist only to reduce range. All stored operands are 64bit,
anyway. This patch if for review purposes only ("diff -w" to ignore
indent level, to show that just complexity is removed).

Best regards, STefan


--- /usr/src/bin/expr/expr.y	2009-10-30 23:02:40.000000000 +0100
+++ ./expr.y	2011-04-28 16:14:52.000000000 +0200
@@ -71,7 +71,6 @@
 int		yylex(void);
 int		yyparse(void);

-static int	eflag;
 char **av;
 %}

@@ -156,10 +155,7 @@
 	 * non-digits MUST NOT be considered integers.  strtoimax() will
 	 * figure this out for us.
 	 */
-	if (eflag)
 		(void)strtoimax(s, &ep, 10);
-	else
-		(void)strtol(s, &ep, 10);

 	if (*ep != '\0')
 		vp->type = string;
@@ -191,13 +187,9 @@

 	/* vp->type == numeric_string, make it numeric */
 	errno = 0;
-	if (eflag) {
 		i  = strtoimax(vp->u.s, (char **)NULL, 10);
 		if (errno == ERANGE)
-			err(ERR_EXIT, NULL);
-	} else {
-		i = strtol(vp->u.s, (char **)NULL, 10);
-	}
+		errx(ERR_EXIT, "Not a valid integer: %s", vp->u.s);

 	free (vp->u.s);
 	vp->u.i = i;
@@ -282,12 +274,11 @@
 	if (getenv("EXPR_COMPAT") != NULL
 	    || check_utility_compat("expr")) {
 		av = argv + 1;
-		eflag = 1;
 	} else {
 		while ((c = getopt(argc, argv, "e")) != -1)
 			switch (c) {
 			case 'e':
-				eflag = 1;
+				/* ignore for backwards compatibility */
 				break;

 			default:
@@ -483,13 +474,10 @@
 		errx(ERR_EXIT, "non-numeric argument");
 	}

-	if (eflag) {
 		r = make_integer(a->u.i + b->u.i);
 		if (chk_plus(a->u.i, b->u.i, r->u.i)) {
 			errx(ERR_EXIT, "overflow");
 		}
-	} else
-		r = make_integer((long)a->u.i + (long)b->u.i);

 	free_value (a);
 	free_value (b);
@@ -520,13 +508,10 @@
 		errx(ERR_EXIT, "non-numeric argument");
 	}

-	if (eflag) {
 		r = make_integer(a->u.i - b->u.i);
 		if (chk_minus(a->u.i, b->u.i, r->u.i)) {
 			errx(ERR_EXIT, "overflow");
 		}
-	} else
-		r = make_integer((long)a->u.i - (long)b->u.i);

 	free_value (a);
 	free_value (b);
@@ -554,13 +539,10 @@
 		errx(ERR_EXIT, "non-numeric argument");
 	}

-	if (eflag) {
 		r = make_integer(a->u.i * b->u.i);
 		if (chk_times(a->u.i, b->u.i, r->u.i)) {
 			errx(ERR_EXIT, "overflow");
 		}
-	} else
-		r = make_integer((long)a->u.i * (long)b->u.i);

 	free_value (a);
 	free_value (b);
@@ -591,13 +573,10 @@
 		errx(ERR_EXIT, "division by zero");
 	}

-	if (eflag) {
 		r = make_integer(a->u.i / b->u.i);
 		if (chk_div(a->u.i, b->u.i)) {
 			errx(ERR_EXIT, "overflow");
 		}
-	} else
-		r = make_integer((long)a->u.i / (long)b->u.i);

 	free_value (a);
 	free_value (b);
@@ -617,11 +596,8 @@
 		errx(ERR_EXIT, "division by zero");
 	}

-	if (eflag)
 		r = make_integer(a->u.i % b->u.i);
 	        /* chk_rem necessary ??? */
-	else
-		r = make_integer((long)a->u.i % (long)b->u.i);

 	free_value (a);
 	free_value (b);


From owner-freebsd-standards@FreeBSD.ORG  Wed Jun 29 23:32:18 2011
Return-Path: <owner-freebsd-standards@FreeBSD.ORG>
Delivered-To: standards@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 083471065673;
	Wed, 29 Jun 2011 23:32:18 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au
	[211.29.132.184])
	by mx1.freebsd.org (Postfix) with ESMTP id 7DF828FC12;
	Wed, 29 Jun 2011 23:32:17 +0000 (UTC)
Received: from c122-106-165-191.carlnfd1.nsw.optusnet.com.au
	(c122-106-165-191.carlnfd1.nsw.optusnet.com.au [122.106.165.191])
	by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p5TNWBg6004343
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Thu, 30 Jun 2011 09:32:12 +1000
Date: Thu, 30 Jun 2011 09:32:11 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Stefan Esser <se@freebsd.org>
In-Reply-To: <4E0B1C47.4010201@freebsd.org>
Message-ID: <20110630073705.P1117@besplex.bde.org>
References: <99048.1309258976@critter.freebsd.dk>
	<4E0A0774.3090004@freebsd.org>
	<20110629082103.O1084@besplex.bde.org> <4E0B1C47.4010201@freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: standards@freebsd.org, Poul-Henning Kamp <phk@phk.freebsd.dk>,
	Stefan Esser <se@localhost.FreeBSD.org>, Bruce Evans <bde@freebsd.org>,
	Alexander Best <arundel@freebsd.org>
Subject: Re: [RFC] Consistent numeric range for "expr" on all architectures
X-BeenThere: freebsd-standards@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Standards compliance <freebsd-standards.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-standards>
List-Post: <mailto:freebsd-standards@freebsd.org>
List-Help: <mailto:freebsd-standards-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Jun 2011 23:32:18 -0000

On Wed, 29 Jun 2011, Stefan Esser wrote:

> Am 29.06.2011 01:06, schrieb Bruce Evans:
>> Other points:
>> - `expr -e 10000000000000000000 + 0' (19 zeros) gives "Result too large",
>>   but it isn't the result that is too large, but the arg that is too large.
>>   This message is strerror(ERANGE) after strtoimax() sets errno to ERANGE.
>>   `expr -e 1000000000000000000 \* 10' gives "overflow".  This message is
>>   correct, but it is in a different style to strerror() (uncapitalized,
>>   and more concise).
>
> The patch that I sent with my first message fixes this. The message is
> changed from "Result too large" to "Not a valid integer: %s".
>
> ("non-numeric argument" is used in other places and I could adapt to
> that, though I prefer to also see which argument was rejected. But I
> think that "not a valid integer" better describes the situation.)

I prefer "operand too large".  A decimal integer operand that is too
large to be represented by intmax_t is not really invalid, but is just
too large.  We already have a different message for invalid.

>From an old (?) version of the man page:

%    Arithmetic operations are performed using signed integer math.  If the -e
%    flag is specified, arithmetic uses the C intmax_t data type (the largest
                         ^^^^^^^^^^ actual arithmentic, not just parsing; but
                                    for expr [-e] 10000000000000000000 we are
                                    doing actual arithmetic -- see below
%    integral type available), and expr will detect arithmetic overflow and
%    return an error indication.  If a numeric operand is specified which is

"Numeric" is not defined anywhere in the man page.  This is the only
use of it in the man page.  It means "decimal integer" and should say
precisely that.  The only hint about this in the man page is the statement
that "all integer operands are interpreted in base 10".  The fuzziness
extends to error messages saying "non-numeric argument" instead of
"operand not a decimal integer [used in integer context]".

%    so large as to overflow conversion to an integer, it is parsed as a
%    string instead.  If -e is not specified, arithmetic operations and pars-

This specificially says that large operands are parsed as strings.
Strangely, since large operands are only checked for with -e, only -e can
get this right; without -e, large operands are not even detected.
However, this is a bug in the man page -- see below.

%    ing of integer arguments will overflow silently according to the rules of
%    the C standard, using the long data type.

This says that the -e case is broken, but doesn't override the statement
that large operands are parsed as strings.  Since the man page is wrong,
no override is needed.

I originally though of using "argument" instead of "operand", but got the
better word from the above section of the man page.

> Without "-e" the numeric result is "undefined" and no error is signaled,
> since there was no test whether the conversion succeeded before I added
> it back in 2000.

I first though that the error reporting must be delayed to when an operand
is used in an expression, even with -e.  But it is already delayed, and
the parsing works as specified in POSIX.  The parsing is just poorly or
incorrectly documented in the man page.

- The syntax in the man page doesn't seem to mention the degenerate
   expression <identifer>.  POSIX specifies this of course.  <identifier>
   can be either <integer> or <string>, where <integer> is an optional
   unary minus followed by digits, and <string> is any argument that is
   not an <integer> and not an operator symbol.

   Therefore, "expr -e 1000000000000000000" is not a syntax error as seems
   to be required by the man page; the arg in it forms a degenerate
   expression.  The arg is not a <string> since it is an <integer>.
   Therefore, the expression is numeric (I didn't check that POSIX says
   this explicitly).  Therefore, we are justified in applying strtoimax()
   to all the operands in the expression (all 1 of them) and getting a
   range error.

- The man page is broken in saying that unrepresentable numeric operands
   are parsed as strings instead.  Whether an operand is numeric is
   determined by the POSIX syntax which is purely lexical and doesn't
   depend on representability.  So for the degnerate expression with 1
   operand, the type of the expression is determined by the type of the
   operand which is detemined lexically as described above.  Similarly
   for parenthesized degenerate expressions.  For non-degenerate
   expressions, the types of the operators and of the result are again
   mostly or always determined lexically by the types of the results.
   For example, '=' means equality of integers if both operands are
   integers, but equality of of strings if one or both operands is not
   an integer.

- In all cases, whether an operand is an integer is context-dependent,
   so args must not be classified early.  This seems to be done correctly,
   so the code conforms to the POSIX syntax although the man page doesn't.

The syntax is still broken as designed, since it doesn't allow +1 to
be an integer, and it requires octal intgers to be misinterpreted
as decimal integers although no reasonable specification of decimal
integers allows them to start with a '0', and it doesn't support
non-decimal integers...

>> - POSIX requires brokenness for bases other than 10, but I wonder if an
>>   arg like 0x10 invokes undefined behaviour and thus can be made to
>>   work.  (I wanted to use a hex number since I can never remember what
>>   INTMAX_MAX is in decimal and wanted to type it in hex for checking
>>   the range and overflow errors.)  Allowing hex args causes fewer
>>   problems than allowing decimal args larger than INT32_MAX, since
>>   they are obviously unportable.  Some FreeBSD utilities, e.g., dd,
>>   support hex args and don't worry about POSIX restricting them.
>
> Does POSIX require that expr exits on illegal arguments?

Not sure.  It requires an exit status of 2 for invalid expressions.
For the "+" operator, the operands are required to be decimal
integers, but the error handling isn't so clearly specified.  For
the "&" operator, operands(s) are allowed to be null.

Anyway, hex numbers can't be put through this gap.  Since they are not
decimal integers, they are required to be interpreted as strings in
some contexts.  So "expr 0x10 \< 2" gives 1 because the string "0" is
less then the string "2".  This conflicts with "expr 16 \< 2" giving
0 since both operands are intgers and of course 16 = 0x10 is not less
than 2.

>> - POSIX unfortunately requires args larger than INT32_MAX to be unportable
>>   (to work if longs are longer than 32 bits, else to give undefined (?)
>>   behaviour.  For portability there could be a -p switch that limits args
>>   to INT32_MAX even if longs are longer than 32 bits.
>
> Well, undefined behaviour can always be to return the correct result ;-)
>
> I'd be willing to add "-p" (effectively just make "-e" the default that
> can be overridden).

I now don't see any problem with -e.  Not even the one for degenerate
expressions that I thought I saw.

POSIX says that shell expressions should be prefered to expr, and for
shell expressions it has a non-null discussion of representability and
overflows.  It basically says that only long arithmetic is supported,
without even C's type suffixes which are needed to extend to unsigned
long arithmetic, but extensions are encouraged.

Bruce

From owner-freebsd-standards@FreeBSD.ORG  Thu Jun 30 08:23:34 2011
Return-Path: <owner-freebsd-standards@FreeBSD.ORG>
Delivered-To: standards@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3F0BD1065673
	for <standards@freebsd.org>; Thu, 30 Jun 2011 08:23:34 +0000 (UTC)
	(envelope-from se@freebsd.org)
Received: from nm16-vm0.bullet.mail.sp2.yahoo.com
	(nm16-vm0.bullet.mail.sp2.yahoo.com [98.139.91.210])
	by mx1.freebsd.org (Postfix) with SMTP id 1CBEA8FC0C
	for <standards@freebsd.org>; Thu, 30 Jun 2011 08:23:34 +0000 (UTC)
Received: from [98.139.91.63] by nm16.bullet.mail.sp2.yahoo.com with NNFMP;
	30 Jun 2011 08:09:40 -0000
Received: from [208.71.42.200] by tm3.bullet.mail.sp2.yahoo.com with NNFMP;
	30 Jun 2011 08:09:40 -0000
Received: from [127.0.0.1] by smtp211.mail.gq1.yahoo.com with NNFMP;
	30 Jun 2011 08:09:40 -0000
X-Yahoo-Newman-Id: 632559.16409.bm@smtp211.mail.gq1.yahoo.com
Received: from [192.168.119.20] (se@81.173.152.201 with plain)
	by smtp211.mail.gq1.yahoo.com with SMTP; 30 Jun 2011 01:09:39 -0700 PDT
X-Yahoo-SMTP: iDf2N9.swBDAhYEh7VHfpgq0lnq.
X-YMail-OSG: oacrv1cVM1l64zQLXGULrH9xaOXV.S8jMke.EWwb20ChH9f
	xbr89FWDxKcIeOhfEfTxOeqQroIMbmYhEeq.3eucPreft5a.h4DtpVXlI.YA
	nwhzoBc8TXUUlDBekaE5bwQIb535vBTtAVvZlJ1wW00b1PLkW5mAK2H3ZEj_
	M1M.WoOteceOkBuAzYcESNoymNyqi9f86Kz_i5e.HKgz7pux4uVnf7MGowIr
	rD5mp_NV0mEWeCI_SrFKfTYq2l4COlz42B3Sn57oE_HMMmm_S4z6rAzU9F.Q
	HDq6PqG2FQaEDgWF0Sk.H.Sy5ehbYpRBOKiZE6MObZSAKmV1Y8whOYvfAJhV
	hXDDUn.IjKuYLM_5pt2E4do_QcA--
X-Yahoo-Newman-Property: ymail-3
Message-ID: <4E0C2F41.2060302@freebsd.org>
Date: Thu, 30 Jun 2011 10:09:37 +0200
From: Stefan Esser <se@freebsd.org>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
	rv:5.0) Gecko/20110624 Thunderbird/5.0
MIME-Version: 1.0
To: Bruce Evans <brde@optusnet.com.au>
References: <99048.1309258976@critter.freebsd.dk>
	<4E0A0774.3090004@freebsd.org>
	<20110629082103.O1084@besplex.bde.org>
	<4E0B1C47.4010201@freebsd.org>
	<20110630073705.P1117@besplex.bde.org>
In-Reply-To: <20110630073705.P1117@besplex.bde.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Alexander Best <arundel@freebsd.org>,
	Poul-Henning Kamp <phk@phk.freebsd.dk>, standards@freebsd.org,
	Bruce Evans <bde@freebsd.org>
Subject: Re: [RFC] Consistent numeric range for "expr" on all architectures
X-BeenThere: freebsd-standards@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Standards compliance <freebsd-standards.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-standards>
List-Post: <mailto:freebsd-standards@freebsd.org>
List-Help: <mailto:freebsd-standards-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Jun 2011 08:23:34 -0000

Hi Bruce,

thank you for the detailed analysis!

On 30.06.2011 01:32, Bruce Evans wrote:
> On Wed, 29 Jun 2011, Stefan Esser wrote:
>> Am 29.06.2011 01:06, schrieb Bruce Evans:
>>> Other points:
>>> - `expr -e 10000000000000000000 + 0' (19 zeros) gives "Result too large",
>>>   but it isn't the result that is too large, but the arg that is too large.
>>>   This message is strerror(ERANGE) after strtoimax() sets errno to ERANGE.
>>>   `expr -e 1000000000000000000 \* 10' gives "overflow".  This message is
>>>   correct, but it is in a different style to strerror() (uncapitalized,
>>>   and more concise).
>>
>> The patch that I sent with my first message fixes this. The message is
>> changed from "Result too large" to "Not a valid integer: %s".
>>
>> ("non-numeric argument" is used in other places and I could adapt to
>> that, though I prefer to also see which argument was rejected. But I
>> think that "not a valid integer" better describes the situation.)
> 
> I prefer "operand too large".  A decimal integer operand that is too
> large to be represented by intmax_t is not really invalid, but is just
> too large.  We already have a different message for invalid.

Yes, I also thought the "non-numeric" message was misleading and that
"not a valid integer" was just slightly less misleading ;-)

But "operand too large" correctly describes the situation and I'll just
a test for that case (instead of just relying on strtoimax() to decide
about integer-ness).

According to POSIX, "+1" is no valid integer operand in "expr", should a
leading "+" also be detected and rejected?

>> From an old (?) version of the man page:
> 
> %    Arithmetic operations are performed using signed integer math.  If the -e
> %    flag is specified, arithmetic uses the C intmax_t data type (the largest
>                         ^^^^^^^^^^ actual arithmentic, not just parsing; but
>                                    for expr [-e] 10000000000000000000 we are
>                                    doing actual arithmetic -- see below 
> %    integral type available), and expr will detect arithmetic overflow and
> %    return an error indication.  If a numeric operand is specified which is
> 
> "Numeric" is not defined anywhere in the man page.  This is the only
> use of it in the man page.  It means "decimal integer" and should say
> precisely that.  The only hint about this in the man page is the statement
> that "all integer operands are interpreted in base 10".  The fuzziness
> extends to error messages saying "non-numeric argument" instead of
> "operand not a decimal integer [used in integer context]".

Ok, all numeric arguments should be called "decimal integer" in the
documentation and error messages (if applicable).

> %    so large as to overflow conversion to an integer, it is parsed as a
> %    string instead.  If -e is not specified, arithmetic operations and pars-
> 
> This specificially says that large operands are parsed as strings.
> Strangely, since large operands are only checked for with -e, only -e can
> get this right; without -e, large operands are not even detected.
> However, this is a bug in the man page -- see below.

I had already wondered about this case (large numbers being considered
strings), but I thought it was in compliance with POSIX. This will be
fixed by checking operands to be POSIX decimal integers.

> %    ing of integer arguments will overflow silently according to the rules of
> %    the C standard, using the long data type.
> 
> This says that the -e case is broken, but doesn't override the statement
> that large operands are parsed as strings.  Since the man page is wrong,
> no override is needed.

Is the -e case OK if decimal numbers are correctly detected (not just
those accepted by strtoimax(), but all matching "-?[0-9]+")?

> I originally though of using "argument" instead of "operand", but got the
> better word from the above section of the man page.
> 
>> Without "-e" the numeric result is "undefined" and no error is signaled,
>> since there was no test whether the conversion succeeded before I added
>> it back in 2000.
> 
> I first though that the error reporting must be delayed to when an operand
> is used in an expression, even with -e.  But it is already delayed, and
> the parsing works as specified in POSIX.  The parsing is just poorly or
> incorrectly documented in the man page.

Yes, the reporting is delayed, since any operand (even number) might be
used in a RE matching operation (":"). I'll see whether I find a better
wording for the man page but this might be easier for a native speaker
of English.

> - The syntax in the man page doesn't seem to mention the degenerate
>   expression <identifer>.  POSIX specifies this of course.  <identifier>
>   can be either <integer> or <string>, where <integer> is an optional
>   unary minus followed by digits, and <string> is any argument that is
>   not an <integer> and not an operator symbol.

Yes, that definition makes "+1" an invalid <integer> ...

>   Therefore, "expr -e 1000000000000000000" is not a syntax error as seems
>   to be required by the man page; the arg in it forms a degenerate
>   expression.  The arg is not a <string> since it is an <integer>.
>   Therefore, the expression is numeric (I didn't check that POSIX says
>   this explicitly).  Therefore, we are justified in applying strtoimax()
>   to all the operands in the expression (all 1 of them) and getting a
>   range error.

I think this is rational behaviour of the code, but I'm not convinced,
that <identifier> can not be <integer> and <string> at the same time.
See for example:

$ expr \( 222 \) : "2*"
3
$ expr \( 111 + 111 \) : "2*"
3

The numeric result is taken as string to perform the RE matching and the
length of the first match is returned. Therefore, a clear integer can
still be interpreted as string, if operand to a string expression.
The degenerate case of just a too long integer might still be considered
a valid string and the missing operation (or identity?) does not change
that type. Therefore it might be argued, that a very long numeric
argument should just be printed as string in that case. (???)

> - The man page is broken in saying that unrepresentable numeric operands
>   are parsed as strings instead.  Whether an operand is numeric is
>   determined by the POSIX syntax which is purely lexical and doesn't
>   depend on representability.  So for the degnerate expression with 1
>   operand, the type of the expression is determined by the type of the
>   operand which is detemined lexically as described above.  Similarly

I do not fully agree on the type being determined in that way. It is
obvious, that over-long numbers may still be valid strings (and can thus
be used as operands for the RE matching operator ":"). This clearly
shows, that non-numeric arguments can be determined to be of type
string, but numeric arguments are of indetermined  type until used in an
operation (i.e. they are both, valid integer and string, like a mixed
state in quantum mechanics ;-) ).

>   for parenthesized degenerate expressions.  For non-degenerate
>   expressions, the types of the operators and of the result are again
>   mostly or always determined lexically by the types of the results.
>   For example, '=' means equality of integers if both operands are
>   integers, but equality of of strings if one or both operands is not
>   an integer.

Hmm, I think you are right in this case and numeric overflow should
occur (and be detected) if at least one operand is a large decimal
(outside the supported numeric range).

> - In all cases, whether an operand is an integer is context-dependent,
>   so args must not be classified early.  This seems to be done correctly,
>   so the code conforms to the POSIX syntax although the man page doesn't.

I agree, but I'm still not sure about "expr 10000000000000000000000"
causing an integer overflow.

> The syntax is still broken as designed, since it doesn't allow +1 to
> be an integer, and it requires octal intgers to be misinterpreted
> as decimal integers although no reasonable specification of decimal
> integers allows them to start with a '0', and it doesn't support
> non-decimal integers...

We *do* accept "+1" as synonymous to "1", though, as pointed out above.
I'd like to change this.

>>> - POSIX requires brokenness for bases other than 10, but I wonder if an
>>>   arg like 0x10 invokes undefined behaviour and thus can be made to
>>>   work.  (I wanted to use a hex number since I can never remember what
>>>   INTMAX_MAX is in decimal and wanted to type it in hex for checking
>>>   the range and overflow errors.)  Allowing hex args causes fewer
>>>   problems than allowing decimal args larger than INT32_MAX, since
>>>   they are obviously unportable.  Some FreeBSD utilities, e.g., dd,
>>>   support hex args and don't worry about POSIX restricting them.
>>
>> Does POSIX require that expr exits on illegal arguments?
> 
> Not sure.  It requires an exit status of 2 for invalid expressions.
> For the "+" operator, the operands are required to be decimal
> integers, but the error handling isn't so clearly specified.  For
> the "&" operator, operands(s) are allowed to be null.

There is an asymmetry in the handling of logical AND and OR, which may
or may not be mandated by POSIX:

$ expr "" \| ""

$ expr "" \& ""
0

In the case of OR, the second argument is returned, if the first one is
"0" or "". In the example above this results in "" being printed.

In the case of AND, "0" is returned, if both arguments are "0" or "".

This makes a difference if results of logical operations are not used as
logical operands (where "" and "0" have the same value), but as
arguments to other (numeric or string) operations.

Is this mandated by POSIX?
If not: Should we make the behaviour symmetric?

I prefer the way OR operates in that it always returns one of its
operands, the second one if both are FALSE.

For AND we should return the second operand, if the first one evaluated
to TRUE, IMHO (if permitted by POSIX). But since the current behaviour
has existed in FreeBSD for at least 10 years, we might just keep it
(although it slightly complicates the code).

> Anyway, hex numbers can't be put through this gap.  Since they are not
> decimal integers, they are required to be interpreted as strings in
> some contexts.  So "expr 0x10 \< 2" gives 1 because the string "0" is
> less then the string "2".  This conflicts with "expr 16 \< 2" giving
> 0 since both operands are intgers and of course 16 = 0x10 is not less
> than 2.

Too bad, actually ...

>>> - POSIX unfortunately requires args larger than INT32_MAX to be unportable
>>>   (to work if longs are longer than 32 bits, else to give undefined (?)
>>>   behaviour.  For portability there could be a -p switch that limits args
>>>   to INT32_MAX even if longs are longer than 32 bits.
>>
>> Well, undefined behaviour can always be to return the correct result ;-)
>>
>> I'd be willing to add "-p" (effectively just make "-e" the default that
>> can be overridden).
> 
> I now don't see any problem with -e.  Not even the one for degenerate
> expressions that I thought I saw.

That's great!

> POSIX says that shell expressions should be prefered to expr, and for
> shell expressions it has a non-null discussion of representability and
> overflows.  It basically says that only long arithmetic is supported,
> without even C's type suffixes which are needed to extend to unsigned
> long arithmetic, but extensions are encouraged.

Yes, I know that shell expressions are highly preferable to "expr" ...

BTW: /bin/sh does support 64bit numeric range on 32bit architectures in
shell expressions, but without range checks being performed.

I just checked a few shells on i386:

SH:
$ echo $((2000000000000000000000 + 10000000000))
-9223372026854775809

BASH:
$ echo $((2000000000000000000000 + 10000000000))
7751640049368425472

ZSH:
$ echo $((2000000000000000000000 + 10000000000))
zsh: number truncated after 19 digits: 2000000000000000000000 + 10000000000
2000000010000000000

So: How about range checks for $(( )) in our /bin/sh ???

Best regards, STefan

From owner-freebsd-standards@FreeBSD.ORG  Thu Jun 30 16:32:49 2011
Return-Path: <owner-freebsd-standards@FreeBSD.ORG>
Delivered-To: standards@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 2B5F61065670
	for <standards@freebsd.org>; Thu, 30 Jun 2011 16:32:49 +0000 (UTC)
	(envelope-from se@freebsd.org)
Received: from nm11.bullet.mail.sp2.yahoo.com (nm11.bullet.mail.sp2.yahoo.com
	[98.139.91.81]) by mx1.freebsd.org (Postfix) with SMTP id 0A37C8FC15
	for <standards@freebsd.org>; Thu, 30 Jun 2011 16:32:49 +0000 (UTC)
Received: from [98.139.91.66] by nm11.bullet.mail.sp2.yahoo.com with NNFMP;
	30 Jun 2011 16:32:48 -0000
Received: from [98.136.185.45] by tm6.bullet.mail.sp2.yahoo.com with NNFMP;
	30 Jun 2011 16:32:48 -0000
Received: from [127.0.0.1] by smtp106.mail.gq1.yahoo.com with NNFMP;
	30 Jun 2011 16:32:48 -0000
X-Yahoo-Newman-Id: 343301.838.bm@smtp106.mail.gq1.yahoo.com
Received: from [192.168.119.20] (se@81.173.186.241 with plain)
	by smtp106.mail.gq1.yahoo.com with SMTP; 30 Jun 2011 09:32:47 -0700 PDT
X-Yahoo-SMTP: iDf2N9.swBDAhYEh7VHfpgq0lnq.
X-YMail-OSG: 2j7c05gVM1n5qIunuqRCSr5GR8jVgSovWmKxEvvGWIcMv7B
	7sTP6WK61KcygplnVYxtKaCPsC7h.8zNZdxC9RKkhF1TEJEJH9lUMxvn40kG
	l1JxopArcHye4Zpbj4MF5PE8s885RGTPUvj_WD2ppXOnV34Byvs9JvV6MSwN
	ypXZtWDAMFRWDyv5S8zdIwRqS1oIh3oTlSIQH8eZpPut2CLcxMf5XR73dDqS
	WekUAF0pPAl_M3qR.ALoDUA5qVOSFQd0DDPXSan535sFQWb8iKloxw3i6fHt
	lxnAaPKc7UK02crRHMh8j.TXD4sLkyh_ebtz8uU2IzRKtVhxqtZuEfYxerfi
	nZAEYXMEI7ypE3czj36IkQuV62w--
X-Yahoo-Newman-Property: ymail-3
Message-ID: <4E0CA52D.7020508@freebsd.org>
Date: Thu, 30 Jun 2011 18:32:45 +0200
From: Stefan Esser <se@freebsd.org>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
	rv:5.0) Gecko/20110624 Thunderbird/5.0
MIME-Version: 1.0
To: Bruce Evans <brde@optusnet.com.au>
References: <99048.1309258976@critter.freebsd.dk>
	<4E0A0774.3090004@freebsd.org>
	<20110629082103.O1084@besplex.bde.org>
	<4E0B1C47.4010201@freebsd.org>
	<20110630073705.P1117@besplex.bde.org>
	<4E0C2F41.2060302@freebsd.org>
In-Reply-To: <4E0C2F41.2060302@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Alexander Best <arundel@freebsd.org>,
	Poul-Henning Kamp <phk@phk.freebsd.dk>, standards@freebsd.org,
	Bruce Evans <bde@freebsd.org>
Subject: Re: [RFC] Consistent numeric range for "expr" on all architectures
X-BeenThere: freebsd-standards@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Standards compliance <freebsd-standards.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-standards>
List-Post: <mailto:freebsd-standards@freebsd.org>
List-Help: <mailto:freebsd-standards-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Jun 2011 16:32:49 -0000

The attached patch takes the feedback from Bruce into account.
See details inline in the quoted mail, below.

Any further comments?

On 30.06.2011 10:09, Stefan Esser wrote:
> On 30.06.2011 01:32, Bruce Evans wrote:
>> On Wed, 29 Jun 2011, Stefan Esser wrote:
>>> Am 29.06.2011 01:06, schrieb Bruce Evans:
>>>> Other points:
>>>> - `expr -e 10000000000000000000 + 0' (19 zeros) gives "Result too large",
>>>>   but it isn't the result that is too large, but the arg that is too large.
>>>>   This message is strerror(ERANGE) after strtoimax() sets errno to ERANGE.
>>>>   `expr -e 1000000000000000000 \* 10' gives "overflow".  This message is
>>>>   correct, but it is in a different style to strerror() (uncapitalized,
>>>>   and more concise).
>>>
>>> The patch that I sent with my first message fixes this. The message is
>>> changed from "Result too large" to "Not a valid integer: %s".
>>>
>>> ("non-numeric argument" is used in other places and I could adapt to
>>> that, though I prefer to also see which argument was rejected. But I
>>> think that "not a valid integer" better describes the situation.)
>>
>> I prefer "operand too large".  A decimal integer operand that is too
>> large to be represented by intmax_t is not really invalid, but is just
>> too large.  We already have a different message for invalid.
> 
> Yes, I also thought the "non-numeric" message was misleading and that
> "not a valid integer" was just slightly less misleading ;-)
> 
> But "operand too large" correctly describes the situation and I'll just
> add a test for that case (instead of just relying on strtoimax() to decide
> about integer-ness).

The patch introduces a function is_integer(), which performs a simple 
syntax check on operands to determine whether they are decimals. This 
function replaces the use of strtoimax() for the purpose of testing the 
operand, which fails to accept decimals too long to be represented by 
intmax_t.

> According to POSIX, "+1" is no valid integer operand in "expr", should a
> leading "+" also be detected and rejected?

The is_integer() function only permits an optional leading minus sign 
followed by digits. The case of a only a minus sign passed as operand 
is already caught by the parser. 

(Aside: Is it correct to reject "expr -- - : -" as a syntax error?) 

>>> From an old (?) version of the man page:
>>
>> %    Arithmetic operations are performed using signed integer math.  If the -e
>> %    flag is specified, arithmetic uses the C intmax_t data type (the largest
>>                         ^^^^^^^^^^ actual arithmentic, not just parsing; but
>>                                    for expr [-e] 10000000000000000000 we are
>>                                    doing actual arithmetic -- see below 
>> %    integral type available), and expr will detect arithmetic overflow and
>> %    return an error indication.  If a numeric operand is specified which is
>>
>> "Numeric" is not defined anywhere in the man page.  This is the only
>> use of it in the man page.  It means "decimal integer" and should say
>> precisely that.  The only hint about this in the man page is the statement
>> that "all integer operands are interpreted in base 10".  The fuzziness
>> extends to error messages saying "non-numeric argument" instead of
>> "operand not a decimal integer [used in integer context]".
> 
> Ok, all numeric arguments should be called "decimal integer" in the
> documentation and error messages (if applicable).

I have changed the wording in the man page. Critical review is welcome.

>> %    so large as to overflow conversion to an integer, it is parsed as a
>> %    string instead.  If -e is not specified, arithmetic operations and pars-
>>
>> This specificially says that large operands are parsed as strings.
>> Strangely, since large operands are only checked for with -e, only -e can
>> get this right; without -e, large operands are not even detected.
>> However, this is a bug in the man page -- see below.
> 
> I had already wondered about this case (large numbers being considered
> strings), but I thought it was in compliance with POSIX. This will be
> fixed by checking operands to be POSIX decimal integers.

Both program and man page are fixed.

[...]
>>   Therefore, "expr -e 1000000000000000000" is not a syntax error as seems
>>   to be required by the man page; the arg in it forms a degenerate
>>   expression.  The arg is not a <string> since it is an <integer>.
>>   Therefore, the expression is numeric (I didn't check that POSIX says
>>   this explicitly).  Therefore, we are justified in applying strtoimax()
>>   to all the operands in the expression (all 1 of them) and getting a
>>   range error.

The current behaviour was irritating in so far as the first the operand 
was printed, then an error message on the next line, and an exit code of 2 
is returned. The error message comes from the check for 0 or "", which 
shall result in an exit code of 1, but for too large decimals an overflow 
is detected during these tests.

This is fixed by first checking for overflow and then printing the result 
of the evaluated expression.

A side effect is that "expr 0000000000010000000" now correctly prints 
"10000000" instead of "0000000000010000000".

> I think this is rational behaviour of the code, but I'm not convinced,
> that <identifier> can not be <integer> and <string> at the same time.
> See for example:
> 
> $ expr \( 222 \) : "2*"
> 3
> $ expr \( 111 + 111 \) : "2*"
> 3
> 
> The numeric result is taken as string to perform the RE matching and the
> length of the first match is returned. Therefore, a clear integer can
> still be interpreted as string, if operand to a string expression.
> The degenerate case of just a too long integer might still be considered
> a valid string and the missing operation (or identity?) does not change
> that type. Therefore it might be argued, that a very long numeric
> argument should just be printed as string in that case. (???)

I have had a look at the POSIX page for EXRP and found that I had a wrong 
understanding of some terms. Each operand is either an integer or a string, 
but the result of an evaluation can be converted to either.

Thus it is correct to enforce numeric interpretation on input that is 
syntactically parsed as integer number.

>> The syntax is still broken as designed, since it doesn't allow +1 to
>> be an integer, and it requires octal intgers to be misinterpreted
>> as decimal integers although no reasonable specification of decimal
>> integers allows them to start with a '0', and it doesn't support
>> non-decimal integers...
> 
> We *do* accept "+1" as synonymous to "1", though, as pointed out above.
> I'd like to change this.

Changed, as explained above.

>>> Does POSIX require that expr exits on illegal arguments?
>>
>> Not sure.  It requires an exit status of 2 for invalid expressions.
>> For the "+" operator, the operands are required to be decimal
>> integers, but the error handling isn't so clearly specified.  For
>> the "&" operator, operands(s) are allowed to be null.

The patch I prepared supports empty operands for arithmetic operations,
they are treated as "0":

$ expr "" + 1
1

This is easily changed by enforcing at least one digit in is_integer().
(I have tested both versions, but since this appears to be traditional
behaviour at least on FreeBSD, I'm worried that enforcing at least one 
digit for numeric operands will break existing shell scripts.)

Please review the attached patch. It is again created with "-w" to ignore 
whitespace changes (indentation).

Regards, STefan

From owner-freebsd-standards@FreeBSD.ORG  Thu Jun 30 16:33:38 2011
Return-Path: <owner-freebsd-standards@FreeBSD.ORG>
Delivered-To: standards@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 54AC41065674
	for <standards@freebsd.org>; Thu, 30 Jun 2011 16:33:38 +0000 (UTC)
	(envelope-from se@freebsd.org)
Received: from nm19.bullet.mail.sp2.yahoo.com (nm19.bullet.mail.sp2.yahoo.com
	[98.139.91.89]) by mx1.freebsd.org (Postfix) with SMTP id 2ED4F8FC1A
	for <standards@freebsd.org>; Thu, 30 Jun 2011 16:33:38 +0000 (UTC)
Received: from [98.139.91.67] by nm19.bullet.mail.sp2.yahoo.com with NNFMP;
	30 Jun 2011 16:33:37 -0000
Received: from [208.71.42.200] by tm7.bullet.mail.sp2.yahoo.com with NNFMP;
	30 Jun 2011 16:33:37 -0000
Received: from [127.0.0.1] by smtp211.mail.gq1.yahoo.com with NNFMP;
	30 Jun 2011 16:33:37 -0000
X-Yahoo-Newman-Id: 326617.47511.bm@smtp211.mail.gq1.yahoo.com
Received: from [192.168.119.20] (se@81.173.186.241 with plain)
	by smtp211.mail.gq1.yahoo.com with SMTP; 30 Jun 2011 09:33:36 -0700 PDT
X-Yahoo-SMTP: iDf2N9.swBDAhYEh7VHfpgq0lnq.
X-YMail-OSG: _JDLKuIVM1k_rHOWHKkYnf4QIzBgpT0NrnSbF1HiUld8JvT
	ErT9FTe9ZyNnyz0UgI0LZOQOszQLCcUR3o0YyRfxHGAgNl4cXBZMSwc.2Jop
	OVC_THnzorKkUGylY5NuuAxr6wLQ5WUMvVyLLGvMImCoFU0G0n98VHBemQpz
	wwmPVCuxgRMqzn.Ro_brwKoInZwfjcTRe7MeiW9iIdj8u.OSOZunMgK4nCvF
	vWelmzjCrqfFDDtob28zy9le3TfT59CrZlPlUvKkDJ.qOOnrNuivO02xoFRR
	VyMpAoNLYwtJkx7.tIe5W2JITl4XBprhiKsIbVT69ZGT8pxnAGEyAnDFewgV
	yoOZSlP7AsRdIJy9qJcOJ17QO8A--
X-Yahoo-Newman-Property: ymail-3
Message-ID: <4E0CA55E.2090102@freebsd.org>
Date: Thu, 30 Jun 2011 18:33:34 +0200
From: Stefan Esser <se@freebsd.org>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
	rv:5.0) Gecko/20110624 Thunderbird/5.0
MIME-Version: 1.0
To: Bruce Evans <brde@optusnet.com.au>
References: <99048.1309258976@critter.freebsd.dk>
	<4E0A0774.3090004@freebsd.org>
	<20110629082103.O1084@besplex.bde.org>
	<4E0B1C47.4010201@freebsd.org>
	<20110630073705.P1117@besplex.bde.org>
	<4E0C2F41.2060302@freebsd.org>
In-Reply-To: <4E0C2F41.2060302@freebsd.org>
Content-Type: multipart/mixed; boundary="------------070607050902030603000606"
Cc: Alexander Best <arundel@freebsd.org>,
	Poul-Henning Kamp <phk@phk.freebsd.dk>, standards@freebsd.org,
	Bruce Evans <bde@freebsd.org>
Subject: RESENT with patch ... Re: [RFC] Consistent numeric range for "expr"
 on all architectures
X-BeenThere: freebsd-standards@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Standards compliance <freebsd-standards.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-standards>
List-Post: <mailto:freebsd-standards@freebsd.org>
List-Help: <mailto:freebsd-standards-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Jun 2011 16:33:38 -0000

This is a multi-part message in MIME format.
--------------070607050902030603000606
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

The attached patch takes the feedback from Bruce into account.
See details inline in the quoted mail, below.

Any further comments?

On 30.06.2011 10:09, Stefan Esser wrote:
> On 30.06.2011 01:32, Bruce Evans wrote:
>> On Wed, 29 Jun 2011, Stefan Esser wrote:
>>> Am 29.06.2011 01:06, schrieb Bruce Evans:
>>>> Other points:
>>>> - `expr -e 10000000000000000000 + 0' (19 zeros) gives "Result too large",
>>>>   but it isn't the result that is too large, but the arg that is too large.
>>>>   This message is strerror(ERANGE) after strtoimax() sets errno to ERANGE.
>>>>   `expr -e 1000000000000000000 \* 10' gives "overflow".  This message is
>>>>   correct, but it is in a different style to strerror() (uncapitalized,
>>>>   and more concise).
>>>
>>> The patch that I sent with my first message fixes this. The message is
>>> changed from "Result too large" to "Not a valid integer: %s".
>>>
>>> ("non-numeric argument" is used in other places and I could adapt to
>>> that, though I prefer to also see which argument was rejected. But I
>>> think that "not a valid integer" better describes the situation.)
>>
>> I prefer "operand too large".  A decimal integer operand that is too
>> large to be represented by intmax_t is not really invalid, but is just
>> too large.  We already have a different message for invalid.
> 
> Yes, I also thought the "non-numeric" message was misleading and that
> "not a valid integer" was just slightly less misleading ;-)
> 
> But "operand too large" correctly describes the situation and I'll just
> add a test for that case (instead of just relying on strtoimax() to decide
> about integer-ness).

The patch introduces a function is_integer(), which performs a simple
syntax check on operands to determine whether they are decimals. This
function replaces the use of strtoimax() for the purpose of testing the
operand, which fails to accept decimals too long to be represented by
intmax_t.

> According to POSIX, "+1" is no valid integer operand in "expr", should a
> leading "+" also be detected and rejected?

The is_integer() function only permits an optional leading minus sign
followed by digits. The case of a only a minus sign passed as operand is
already caught by the parser.
(Aside: Is it correct to reject "expr -- - : -" as a syntax error?)
>>> From an old (?) version of the man page:
>>
>> %    Arithmetic operations are performed using signed integer math.  If the -e
>> %    flag is specified, arithmetic uses the C intmax_t data type (the largest
>>                         ^^^^^^^^^^ actual arithmentic, not just parsing; but
>>                                    for expr [-e] 10000000000000000000 we are
>>                                    doing actual arithmetic -- see below 
>> %    integral type available), and expr will detect arithmetic overflow and
>> %    return an error indication.  If a numeric operand is specified which is
>>
>> "Numeric" is not defined anywhere in the man page.  This is the only
>> use of it in the man page.  It means "decimal integer" and should say
>> precisely that.  The only hint about this in the man page is the statement
>> that "all integer operands are interpreted in base 10".  The fuzziness
>> extends to error messages saying "non-numeric argument" instead of
>> "operand not a decimal integer [used in integer context]".
> 
> Ok, all numeric arguments should be called "decimal integer" in the
> documentation and error messages (if applicable).

I have changed the wording in the man page. Critical review is welcome.

>> %    so large as to overflow conversion to an integer, it is parsed as a
>> %    string instead.  If -e is not specified, arithmetic operations and pars-
>>
>> This specificially says that large operands are parsed as strings.
>> Strangely, since large operands are only checked for with -e, only -e can
>> get this right; without -e, large operands are not even detected.
>> However, this is a bug in the man page -- see below.
> 
> I had already wondered about this case (large numbers being considered
> strings), but I thought it was in compliance with POSIX. This will be
> fixed by checking operands to be POSIX decimal integers.

Both program and man page are fixed.

[...]
>>   Therefore, "expr -e 1000000000000000000" is not a syntax error as seems
>>   to be required by the man page; the arg in it forms a degenerate
>>   expression.  The arg is not a <string> since it is an <integer>.
>>   Therefore, the expression is numeric (I didn't check that POSIX says
>>   this explicitly).  Therefore, we are justified in applying strtoimax()
>>   to all the operands in the expression (all 1 of them) and getting a
>>   range error.

The current behaviour was irritating in so far as the first the operand
was printed, then an error message on the next line, and an exit code of
2 is returned. The error message comes from the check for 0 or "", which
shall result in an exit code of 1, but for too large decimals an
overflow is detected during these tests.

This is fixed by first checking for overflow and then printing the
result of the evaluated expression.

A side effect is that "expr 0000000000010000000" now correctly prints
"10000000" instead of "0000000000010000000".

> I think this is rational behaviour of the code, but I'm not convinced,
> that <identifier> can not be <integer> and <string> at the same time.
> See for example:
> 
> $ expr \( 222 \) : "2*"
> 3
> $ expr \( 111 + 111 \) : "2*"
> 3
> 
> The numeric result is taken as string to perform the RE matching and the
> length of the first match is returned. Therefore, a clear integer can
> still be interpreted as string, if operand to a string expression.
> The degenerate case of just a too long integer might still be considered
> a valid string and the missing operation (or identity?) does not change
> that type. Therefore it might be argued, that a very long numeric
> argument should just be printed as string in that case. (???)

I have had a look at the POSIX page for EXRP and found that I had a
wrong understanding of some terms. Each operand is either an integer or
a string, but the result of an evaluation can be converted to either.

Thus it is correct to enforce numeric interpretation on input that is
syntactically parsed as integer number.

>> The syntax is still broken as designed, since it doesn't allow +1 to
>> be an integer, and it requires octal intgers to be misinterpreted
>> as decimal integers although no reasonable specification of decimal
>> integers allows them to start with a '0', and it doesn't support
>> non-decimal integers...
> 
> We *do* accept "+1" as synonymous to "1", though, as pointed out above.
> I'd like to change this.

Changed, as explained above.

>>> Does POSIX require that expr exits on illegal arguments?
>>
>> Not sure.  It requires an exit status of 2 for invalid expressions.
>> For the "+" operator, the operands are required to be decimal
>> integers, but the error handling isn't so clearly specified.  For
>> the "&" operator, operands(s) are allowed to be null.

The patch I prepared supports empty operands for arithmetic operations,
they are treated as "0":

$ expr "" + 1
1

This is easily changed by enforcing at least one digit in is_integer().
(I have tested both versions, but since this appears to be traditional
behaviour at least on FreeBSD, I'm worried that enforcing at least one
digit for numeric operands will break existing shell scripts.)

Please review the attached patch. It is again created with "-w" to
ignore whitespace changes (indentation).

Regards, STefan

--------------070607050902030603000606
Content-Type: text/plain;
 name="expr.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="expr.diff"

--- expr.y~
+++ expr.y	2011-06-30 17:44:35.914960240 +0200
@@ -48,6 +48,7 @@
 int		chk_times(intmax_t, intmax_t, intmax_t);
 void		free_value(struct val *);
 int		is_zero_or_null(struct val *);
+int		is_integer(const char *);
 int		isstring(struct val *);
 struct val	*make_integer(intmax_t);
 struct val	*make_str(const char *);
@@ -65,13 +66,12 @@
 struct val	*op_plus(struct val *, struct val *);
 struct val	*op_rem(struct val *, struct val *);
 struct val	*op_times(struct val *, struct val *);
-intmax_t	to_integer(struct val *);
+int		to_integer(struct val *);
 void		to_string(struct val *);
 int		yyerror(const char *);
 int		yylex(void);
 int		yyparse(void);
 
-static int	eflag;
 char **av;
 %}
 
@@ -134,37 +134,15 @@
 make_str(const char *s)
 {
 	struct val *vp;
-	char *ep;
 
 	vp = (struct val *) malloc (sizeof (*vp));
 	if (vp == NULL || ((vp->u.s = strdup (s)) == NULL)) {
 		errx(ERR_EXIT, "malloc() failed");
 	}
-
-	/*
-	 * Previously we tried to scan the string to see if it ``looked like''
-	 * an integer (erroneously, as it happened).  Let strtoimax() do the
-	 * dirty work.  We could cache the value, except that we are using
-	 * a union and need to preserve the original string form until we
-	 * are certain that it is not needed.
-	 *
-	 * IEEE Std.1003.1-2001 says:
-	 * /integer/ An argument consisting only of an (optional) unary minus  
-	 *	     followed by digits.          
-	 *
-	 * This means that arguments which consist of digits followed by
-	 * non-digits MUST NOT be considered integers.  strtoimax() will
-	 * figure this out for us.
-	 */
-	if (eflag)
-		(void)strtoimax(s, &ep, 10);
+	if (is_integer(s))
+		vp->type = numeric_string;
 	else
-		(void)strtol(s, &ep, 10);
-
-	if (*ep != '\0')
 		vp->type = string;
-	else	
-		vp->type = numeric_string;
 
 	return vp;
 }
@@ -178,7 +156,7 @@
 }
 
 
-intmax_t
+int
 to_integer(struct val *vp)
 {
 	intmax_t i;
@@ -191,13 +169,9 @@
 
 	/* vp->type == numeric_string, make it numeric */
 	errno = 0;
-	if (eflag) {
-		i  = strtoimax(vp->u.s, (char **)NULL, 10);
-		if (errno == ERANGE)
-			err(ERR_EXIT, NULL);
-	} else {
-		i = strtol(vp->u.s, (char **)NULL, 10);
-	}
+	i  = strtoimax(vp->u.s, (char **)NULL, 10);
+	if (errno == ERANGE)
+		errx(ERR_EXIT, "operand too large: '%s'", vp->u.s);
 
 	free (vp->u.s);
 	vp->u.i = i;
@@ -230,6 +204,17 @@
 
 
 int
+is_integer(const char *s)
+{
+	if (*s == '-')
+		s++;
+	while (isdigit(*s))
+		s++;
+	return *s == '\0';
+}
+
+
+int
 isstring(struct val *vp)
 {
 	/* only TRUE if this string is not a valid integer */
@@ -282,12 +267,11 @@
 	if (getenv("EXPR_COMPAT") != NULL
 	    || check_utility_compat("expr")) {
 		av = argv + 1;
-		eflag = 1;
 	} else {
 		while ((c = getopt(argc, argv, "e")) != -1)
 			switch (c) {
 			case 'e':
-				eflag = 1;
+				/* ignore for backwards compatibility */
 				break;
 
 			default:
@@ -300,6 +284,8 @@
 
 	yyparse();
 
+	/* convert type numeric_string to integer; NOP for other types */
+	(void)to_integer(result);
 	if (result->type == integer)
 		printf("%jd\n", result->u.i);
 	else
@@ -483,13 +469,10 @@
 		errx(ERR_EXIT, "non-numeric argument");
 	}
 
-	if (eflag) {
-		r = make_integer(a->u.i + b->u.i);
-		if (chk_plus(a->u.i, b->u.i, r->u.i)) {
-			errx(ERR_EXIT, "overflow");
-		}
-	} else
-		r = make_integer((long)a->u.i + (long)b->u.i);
+	r = make_integer(a->u.i + b->u.i);
+	if (chk_plus(a->u.i, b->u.i, r->u.i)) {
+		errx(ERR_EXIT, "overflow");
+	}
 
 	free_value (a);
 	free_value (b);
@@ -520,13 +503,10 @@
 		errx(ERR_EXIT, "non-numeric argument");
 	}
 
-	if (eflag) {
-		r = make_integer(a->u.i - b->u.i);
-		if (chk_minus(a->u.i, b->u.i, r->u.i)) {
-			errx(ERR_EXIT, "overflow");
-		}
-	} else
-		r = make_integer((long)a->u.i - (long)b->u.i);
+	r = make_integer(a->u.i - b->u.i);
+	if (chk_minus(a->u.i, b->u.i, r->u.i)) {
+		errx(ERR_EXIT, "overflow");
+	}
 
 	free_value (a);
 	free_value (b);
@@ -554,13 +534,10 @@
 		errx(ERR_EXIT, "non-numeric argument");
 	}
 
-	if (eflag) {
-		r = make_integer(a->u.i * b->u.i);
-		if (chk_times(a->u.i, b->u.i, r->u.i)) {
-			errx(ERR_EXIT, "overflow");
-		}
-	} else
-		r = make_integer((long)a->u.i * (long)b->u.i);
+	r = make_integer(a->u.i * b->u.i);
+	if (chk_times(a->u.i, b->u.i, r->u.i)) {
+		errx(ERR_EXIT, "overflow");
+	}
 
 	free_value (a);
 	free_value (b);
@@ -591,13 +568,10 @@
 		errx(ERR_EXIT, "division by zero");
 	}
 
-	if (eflag) {
-		r = make_integer(a->u.i / b->u.i);
-		if (chk_div(a->u.i, b->u.i)) {
-			errx(ERR_EXIT, "overflow");
-		}
-	} else
-		r = make_integer((long)a->u.i / (long)b->u.i);
+	r = make_integer(a->u.i / b->u.i);
+	if (chk_div(a->u.i, b->u.i)) {
+		errx(ERR_EXIT, "overflow");
+	}
 
 	free_value (a);
 	free_value (b);
@@ -617,11 +591,8 @@
 		errx(ERR_EXIT, "division by zero");
 	}
 
-	if (eflag)
-		r = make_integer(a->u.i % b->u.i);
-	        /* chk_rem necessary ??? */
-	else
-		r = make_integer((long)a->u.i % (long)b->u.i);
+	r = make_integer(a->u.i % b->u.i);
+	/* chk_rem necessary ??? */
 
 	free_value (a);
 	free_value (b);
--- expr.1~
+++ expr.1	2011-06-30 16:06:50.338086000 +0200
@@ -50,25 +50,19 @@
 All operators and operands must be passed as separate arguments.
 Several of the operators have special meaning to command interpreters
 and must therefore be quoted appropriately.
-All integer operands are interpreted in base 10.
+All integer operands are interpreted in base 10 and must only consist
+of an optional leading minus sign followed by one or more digits.
 .Pp
-Arithmetic operations are performed using signed integer math.
-If the
-.Fl e
-flag is specified, arithmetic uses the C
+Arithmetic operations are performed using signed integer math with a
+range according to the C
 .Vt intmax_t
-data type (the largest integral type available), and
-.Nm
-will detect arithmetic overflow and return an error indication.
-If a numeric operand is specified which is so large as to overflow
-conversion to an integer, it is parsed as a string instead.
-If
+data type (the largest signed integral type available). All conversions and
+operations are checked for overflow, which will result in program termination
+with an error message on stdout and with error status being returned.
+.Pp
+The
 .Fl e
-is not specified, arithmetic operations and parsing of integer
-arguments will overflow silently according to the rules of the C
-standard, using the
-.Vt long
-data type.
+option is accepted for backwards compatibility but has no effect.
 .Pp
 Operators are listed below in order of increasing precedence; all
 are left-associative.
@@ -272,6 +266,8 @@
 utility conforms to
 .St -p1003.1-2001 ,
 provided that compatibility mode is not enabled.
+Extended arithmetic range and overflow checks apply to cases of
+undefined behavior according to POSIX.
 The
 .Fl e
 flag is an extension.

--------------070607050902030603000606--

From owner-freebsd-standards@FreeBSD.ORG  Thu Jun 30 16:59:05 2011
Return-Path: <owner-freebsd-standards@FreeBSD.ORG>
Delivered-To: freebsd-standards@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 21E9C106566C
	for <freebsd-standards@FreeBSD.ORG>;
	Thu, 30 Jun 2011 16:59:05 +0000 (UTC) (envelope-from das@FreeBSD.ORG)
Received: from zim.MIT.EDU (ZIM.MIT.EDU [18.95.3.101])
	by mx1.freebsd.org (Postfix) with ESMTP id DFF218FC15
	for <freebsd-standards@FreeBSD.ORG>;
	Thu, 30 Jun 2011 16:59:04 +0000 (UTC)
Received: from zim.MIT.EDU (localhost [127.0.0.1])
	by zim.MIT.EDU (8.14.4/8.14.2) with ESMTP id p5UGkr7W083426;
	Thu, 30 Jun 2011 12:46:53 -0400 (EDT) (envelope-from das@FreeBSD.ORG)
Received: (from das@localhost)
	by zim.MIT.EDU (8.14.4/8.14.2/Submit) id p5UGkrOM083425;
	Thu, 30 Jun 2011 12:46:53 -0400 (EDT) (envelope-from das@FreeBSD.ORG)
Date: Thu, 30 Jun 2011 12:46:53 -0400
From: David Schultz <das@FreeBSD.ORG>
To: Stefan Esser <se@FreeBSD.ORG>
Message-ID: <20110630164653.GA82980@zim.MIT.EDU>
Mail-Followup-To: Stefan Esser <se@freebsd.org>, freebsd-standards@freebsd.org
References: <4E09AF8E.5010509@freebsd.org> <4E0B860E.50504@freebsd.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4E0B860E.50504@freebsd.org>
Cc: freebsd-standards@FreeBSD.ORG
Subject: Re: Fwd: [RFC] Consistent numeric range for "expr" on all
	architectures
X-BeenThere: freebsd-standards@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Standards compliance <freebsd-standards.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-standards>
List-Post: <mailto:freebsd-standards@freebsd.org>
List-Help: <mailto:freebsd-standards-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Jun 2011 16:59:05 -0000

On Wed, Jun 29, 2011, Stefan Esser wrote:
> My suggestion is to make the following modifications to expr:
> 
> 1) Always compute with 64 bit range and overflow checks enabled. This
>    means, that the "-e" option is always assumed.
> 
> 2) Continue to support the "-e" option as a NOP for compatibility with
>    scripts that knew about that FreeBSD extension.

Using 64-bit signed integer arithmetic consistently in expr sounds
like a reasonable idea.  There's no reason shell scripts ought to
be exposed to architectural details.

There are two problems, however.  First, the implementation
doesn't do this: it uses intmax_t, which has platform-dependent
width, at least in theory.  Second, it sounds like POSIX doesn't
allow this (although I don't know where this requirement comes
from):

| r96367 | wollman | 2002-05-10 18:59:29 -0400 (Fri, 10 May 2002) | 5 lines
| 
| The response to my POSIX interpretation request says that `expr'
| is required to be oblivious to overflow and to use the data type `long'.
| (Division by zero is undefined in ISO C so it's still OK to check for it
| here.)  Add a new `-e' flag to get the old, more useful behavior.

Nevertheless, moving to int64_t arithmetic by default seems like a
good idea, with low potential for complications.

Overflow checking is a separate concern, and one that is more
likely to cause problems.  I'd be more careful about changing the
default behavior there, because some scripts might rely on modular
arithmetic without overflow checks.

Another thing that is likely to cause confusion is expr(1)
behaving differently from the shell builtin, and differently from
the shell's arithmetic.  Above all else, I suggest trying to make
everything consistent...perhaps talk to jilles@ about the shell
side of things.

From owner-freebsd-standards@FreeBSD.ORG  Thu Jun 30 17:12:47 2011
Return-Path: <owner-freebsd-standards@FreeBSD.ORG>
Delivered-To: standards@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 3DD5E1065677
	for <standards@freebsd.org>; Thu, 30 Jun 2011 17:12:47 +0000 (UTC)
	(envelope-from se@freebsd.org)
Received: from nm16.bullet.mail.sp2.yahoo.com (nm16.bullet.mail.sp2.yahoo.com
	[98.139.91.86]) by mx1.freebsd.org (Postfix) with SMTP id 1C5858FC1B
	for <standards@freebsd.org>; Thu, 30 Jun 2011 17:12:47 +0000 (UTC)
Received: from [98.139.91.61] by nm16.bullet.mail.sp2.yahoo.com with NNFMP;
	30 Jun 2011 17:12:46 -0000
Received: from [208.71.42.192] by tm1.bullet.mail.sp2.yahoo.com with NNFMP;
	30 Jun 2011 17:12:46 -0000
Received: from [127.0.0.1] by smtp203.mail.gq1.yahoo.com with NNFMP;
	30 Jun 2011 17:12:46 -0000
X-Yahoo-Newman-Id: 493635.11874.bm@smtp203.mail.gq1.yahoo.com
Received: from [192.168.119.20] (se@81.173.186.241 with plain)
	by smtp203.mail.gq1.yahoo.com with SMTP; 30 Jun 2011 10:12:46 -0700 PDT
X-Yahoo-SMTP: iDf2N9.swBDAhYEh7VHfpgq0lnq.
X-YMail-OSG: DUS1HSYVM1mONaYXbrRosb.IOMHctps.pGx6EOc1ZIXX8fX
	NORcnMFHxeeHSI6UwLo7DKALBnNKotLqPYerfgGfbGrm2aGLVDScn3AZOfNS
	DbU7hx4LkGGO_fIuBHZgziDvizLrsnWfT8bZHH6m4STJuWVmsEhrE7F1lN1L
	Mfhnmgj.F4EBQFHch7.Pqb8Q.iuU3uIHNaXIuoDtvjocsawFoGTTm.7tQx00
	x7l.KA1TOPdyGsAwh1oX7N5ZEHeFj2aCX2nyTahRhEJ3Obk_VmWlxYryWGUh
	gfAmoMhp_2S7ITmo2GfkxpYunn0v3gXUgXhiYJvkKDfO1QkVcGOszth8sUOB
	CXaPhkyIR._auVLpbeINclZB29g--
X-Yahoo-Newman-Property: ymail-3
Message-ID: <4E0CAE8B.6030309@freebsd.org>
Date: Thu, 30 Jun 2011 19:12:43 +0200
From: Stefan Esser <se@freebsd.org>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
	rv:5.0) Gecko/20110624 Thunderbird/5.0
MIME-Version: 1.0
To: Bruce Evans <brde@optusnet.com.au>, Alexander Best <arundel@freebsd.org>,
	Poul-Henning Kamp <phk@phk.freebsd.dk>, standards@freebsd.org, 
	Bruce Evans <bde@freebsd.org>
References: <99048.1309258976@critter.freebsd.dk>
	<4E0A0774.3090004@freebsd.org>
	<20110629082103.O1084@besplex.bde.org>
	<4E0B1C47.4010201@freebsd.org>
	<20110630073705.P1117@besplex.bde.org>
	<4E0C2F41.2060302@freebsd.org> <4E0CA55E.2090102@freebsd.org>
	<20110630165050.GB82980@zim.MIT.EDU>
In-Reply-To: <20110630165050.GB82980@zim.MIT.EDU>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: 
Subject: Re: RESENT with patch ... Re: [RFC] Consistent numeric range for
 "expr" on all architectures
X-BeenThere: freebsd-standards@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Standards compliance <freebsd-standards.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-standards>
List-Post: <mailto:freebsd-standards@freebsd.org>
List-Help: <mailto:freebsd-standards-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Jun 2011 17:12:47 -0000

Am 30.06.2011 18:50, schrieb David Schultz:
> On Thu, Jun 30, 2011, Stefan Esser wrote:
>>  int
>> +is_integer(const char *s)
>> +{
>> +	if (*s == '-')
>> +		s++;
>> +	while (isdigit(*s))
>> +		s++;
>> +	return *s == '\0';
>> +}
> 
> I only glanced at the patch for a few seconds, but your
> is_integer() routine will accept a bare minus sign ("-") as an
> integer.

I mentioned in the message, that I do not test for empty numeric
operands. I had considered:

int
is_integer(const char *s)
{
        if (*s == '-')
                s++;
        if (!isdigit(*s++)
                return 0;
        while (isdigit(*s))
                s++;
        return *s == '\0';
}

But this will break script that do

	while :
	do
		i=`expr "$i" + 1`
	done

without first setting i=0. And I assume, such scripts exist in huge numbers.

But "-" will not be accepted as operand (neither integer nor string).

Didn't I give the example "expr -- - : -" which fails with a syntax
error?

Therefore, I do not need to check for actual digits in the input (and
thus may accept "" as 0 for backwards compatibility), but "-" will not
be allowed as parameter to expr by the parser ("syntax error").

Regards, STefan

From owner-freebsd-standards@FreeBSD.ORG  Thu Jun 30 17:18:59 2011
Return-Path: <owner-freebsd-standards@FreeBSD.ORG>
Delivered-To: freebsd-standards@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 74F4C106566C
	for <freebsd-standards@freebsd.org>;
	Thu, 30 Jun 2011 17:18:59 +0000 (UTC) (envelope-from se@freebsd.org)
Received: from nm30.bullet.mail.sp2.yahoo.com (nm30.bullet.mail.sp2.yahoo.com
	[98.139.91.100]) by mx1.freebsd.org (Postfix) with SMTP id 580618FC1A
	for <freebsd-standards@freebsd.org>;
	Thu, 30 Jun 2011 17:18:59 +0000 (UTC)
Received: from [98.139.91.67] by nm30.bullet.mail.sp2.yahoo.com with NNFMP;
	30 Jun 2011 17:06:08 -0000
Received: from [98.136.185.47] by tm7.bullet.mail.sp2.yahoo.com with NNFMP;
	30 Jun 2011 17:06:08 -0000
Received: from [127.0.0.1] by smtp108.mail.gq1.yahoo.com with NNFMP;
	30 Jun 2011 17:06:08 -0000
X-Yahoo-Newman-Id: 913956.36989.bm@smtp108.mail.gq1.yahoo.com
Received: from [192.168.119.20] (se@81.173.186.241 with plain)
	by smtp108.mail.gq1.yahoo.com with SMTP; 30 Jun 2011 10:06:06 -0700 PDT
X-Yahoo-SMTP: iDf2N9.swBDAhYEh7VHfpgq0lnq.
X-YMail-OSG: QxSXANIVM1njiZAbBuil7_IdtFmU6BPtQApEsK6n6bG2zXU
	Wg8scr_X1q8JtnZCRtyWOTTZ_Pu8kmZ2glahJLLcKyZPaRaiVKjZigVAu_rl
	eeKaCR0v3X1Nw.kD9AqpUn0iSj3AsYL9vuIRJT5W7Cme7Oqao_tPkzxA0wEW
	ILapMHU_854MxKoqAaZjzfzvJud2_a33c492KEHM.sIbCW324mBlq5mZT3In
	mYXjmTxRcH5dHg3XGhbQSPbucJDQXHtkaEuOebn3HgAIDKt2wvaRrzMxwW49
	zfhGRazFsC.8imzswD4kKNfLSEHO9IeBMzpNlLR3FHBopeBdsrcB5EUoK.Tg
	Bx9AQ77tEBDqo8DYwQ__SHhscSw--
X-Yahoo-Newman-Property: ymail-3
Message-ID: <4E0CACFB.8040906@freebsd.org>
Date: Thu, 30 Jun 2011 19:06:03 +0200
From: Stefan Esser <se@freebsd.org>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
	rv:5.0) Gecko/20110624 Thunderbird/5.0
MIME-Version: 1.0
To: freebsd-standards@freebsd.org
References: <4E09AF8E.5010509@freebsd.org> <4E0B860E.50504@freebsd.org>
	<20110630164653.GA82980@zim.MIT.EDU>
In-Reply-To: <20110630164653.GA82980@zim.MIT.EDU>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: Fwd: [RFC] Consistent numeric range for "expr" on all
	architectures
X-BeenThere: freebsd-standards@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Standards compliance <freebsd-standards.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-standards>
List-Post: <mailto:freebsd-standards@freebsd.org>
List-Help: <mailto:freebsd-standards-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Jun 2011 17:18:59 -0000

Am 30.06.2011 18:46, schrieb David Schultz:
> On Wed, Jun 29, 2011, Stefan Esser wrote:
>> My suggestion is to make the following modifications to expr:
>>
>> 1) Always compute with 64 bit range and overflow checks enabled. This
>>    means, that the "-e" option is always assumed.
>>
>> 2) Continue to support the "-e" option as a NOP for compatibility with
>>    scripts that knew about that FreeBSD extension.
> 
> Using 64-bit signed integer arithmetic consistently in expr sounds
> like a reasonable idea.  There's no reason shell scripts ought to
> be exposed to architectural details.

Yes, especially when you move them from e.g. amd64 to i386 ...

> There are two problems, however.  First, the implementation
> doesn't do this: it uses intmax_t, which has platform-dependent
> width, at least in theory.  Second, it sounds like POSIX doesn't

Yes, but it seems that intmax_t is guaranteed to be at least 64bit wide.

> allow this (although I don't know where this requirement comes
> from):
> 
> | r96367 | wollman | 2002-05-10 18:59:29 -0400 (Fri, 10 May 2002) | 5 lines
> | 
> | The response to my POSIX interpretation request says that `expr'
> | is required to be oblivious to overflow and to use the data type `long'.
> | (Division by zero is undefined in ISO C so it's still OK to check for it
> | here.)  Add a new `-e' flag to get the old, more useful behavior.

Well, that's the frustrating part: I had implemented 64bit range in expr
back in 2000, but this extended range (on 32bit archs) has been made
optional, some two years later with the commit message you quote.

In between, a number of people were hurt by scripts that silently
truncated numeric operands or results, and during a mail exchange in
that context, the above requirement was said to not really apply.

The reasoning is, that overflow beyond the range of signed long leads to
unspecified behaviour. And one possibility to deal with unspecified
behaviour it to just return the "correct" result without wrap-aroung.

The range check is also possible in that case, AFAIK. Once you are
outside the specified range, POSIX allows extensions that do just what I
propose (and what already was in FreeBSD, 10 years ago).

> Nevertheless, moving to int64_t arithmetic by default seems like a
> good idea, with low potential for complications.

On amd64 and other 64 bit architectures the numeric range already is 64bit.

> Overflow checking is a separate concern, and one that is more
> likely to cause problems.  I'd be more careful about changing the
> default behavior there, because some scripts might rely on modular
> arithmetic without overflow checks.

You cannot portably rely in overflow, since you have no guarantee that
overflow occurs at a specific boundary.

> Another thing that is likely to cause confusion is expr(1)
> behaving differently from the shell builtin, and differently from
> the shell's arithmetic.  Above all else, I suggest trying to make
> everything consistent...perhaps talk to jilles@ about the shell
> side of things.

My change in 2000 made the shell builtin do 64bit arithmetic as well (in
fact, the shell build just included the expr.y file ...).

But there is no expr builtin anymore, so there is no risk here ;-)

Regards, STefan

From owner-freebsd-standards@FreeBSD.ORG  Thu Jun 30 17:29:04 2011
Return-Path: <owner-freebsd-standards@FreeBSD.ORG>
Delivered-To: standards@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id ECCD51065674;
	Thu, 30 Jun 2011 17:29:04 +0000 (UTC) (envelope-from das@FreeBSD.ORG)
Received: from zim.MIT.EDU (ZIM.MIT.EDU [18.95.3.101])
	by mx1.freebsd.org (Postfix) with ESMTP id 85A6D8FC13;
	Thu, 30 Jun 2011 17:29:04 +0000 (UTC)
Received: from zim.MIT.EDU (localhost [127.0.0.1])
	by zim.MIT.EDU (8.14.4/8.14.2) with ESMTP id p5UGooOk083449;
	Thu, 30 Jun 2011 12:50:50 -0400 (EDT) (envelope-from das@FreeBSD.ORG)
Received: (from das@localhost)
	by zim.MIT.EDU (8.14.4/8.14.2/Submit) id p5UGoo1X083448;
	Thu, 30 Jun 2011 12:50:50 -0400 (EDT) (envelope-from das@FreeBSD.ORG)
Date: Thu, 30 Jun 2011 12:50:50 -0400
From: David Schultz <das@FreeBSD.ORG>
To: Stefan Esser <se@FreeBSD.ORG>
Message-ID: <20110630165050.GB82980@zim.MIT.EDU>
Mail-Followup-To: Stefan Esser <se@freebsd.org>,
	Bruce Evans <brde@optusnet.com.au>,
	Alexander Best <arundel@freebsd.org>,
	Poul-Henning Kamp <phk@phk.freebsd.dk>, standards@freebsd.org,
	Bruce Evans <bde@freebsd.org>
References: <99048.1309258976@critter.freebsd.dk>
	<4E0A0774.3090004@freebsd.org>
	<20110629082103.O1084@besplex.bde.org>
	<4E0B1C47.4010201@freebsd.org>
	<20110630073705.P1117@besplex.bde.org>
	<4E0C2F41.2060302@freebsd.org> <4E0CA55E.2090102@freebsd.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4E0CA55E.2090102@freebsd.org>
Cc: Alexander Best <arundel@FreeBSD.ORG>,
	Poul-Henning Kamp <phk@phk.freebsd.dk>, standards@FreeBSD.ORG,
	Bruce Evans <bde@FreeBSD.ORG>
Subject: Re: RESENT with patch ... Re: [RFC] Consistent numeric range for
	"expr" on all architectures
X-BeenThere: freebsd-standards@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Standards compliance <freebsd-standards.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-standards>
List-Post: <mailto:freebsd-standards@freebsd.org>
List-Help: <mailto:freebsd-standards-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Jun 2011 17:29:05 -0000

On Thu, Jun 30, 2011, Stefan Esser wrote:
>  int
> +is_integer(const char *s)
> +{
> +	if (*s == '-')
> +		s++;
> +	while (isdigit(*s))
> +		s++;
> +	return *s == '\0';
> +}

I only glanced at the patch for a few seconds, but your
is_integer() routine will accept a bare minus sign ("-") as an
integer.

From owner-freebsd-standards@FreeBSD.ORG  Thu Jun 30 17:42:09 2011
Return-Path: <owner-freebsd-standards@FreeBSD.ORG>
Delivered-To: freebsd-standards@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1BE8C106564A;
	Thu, 30 Jun 2011 17:42:09 +0000 (UTC) (envelope-from das@FreeBSD.ORG)
Received: from zim.MIT.EDU (ZIM.MIT.EDU [18.95.3.101])
	by mx1.freebsd.org (Postfix) with ESMTP id 8828A8FC14;
	Thu, 30 Jun 2011 17:42:08 +0000 (UTC)
Received: from zim.MIT.EDU (localhost [127.0.0.1])
	by zim.MIT.EDU (8.14.4/8.14.2) with ESMTP id p5UHg8VQ083790;
	Thu, 30 Jun 2011 13:42:08 -0400 (EDT) (envelope-from das@FreeBSD.ORG)
Received: (from das@localhost)
	by zim.MIT.EDU (8.14.4/8.14.2/Submit) id p5UHg8FI083789;
	Thu, 30 Jun 2011 13:42:08 -0400 (EDT) (envelope-from das@FreeBSD.ORG)
Date: Thu, 30 Jun 2011 13:42:08 -0400
From: David Schultz <das@FreeBSD.ORG>
To: Stefan Esser <se@FreeBSD.ORG>
Message-ID: <20110630174208.GA83700@zim.MIT.EDU>
Mail-Followup-To: Stefan Esser <se@freebsd.org>, freebsd-standards@freebsd.org
References: <4E09AF8E.5010509@freebsd.org> <4E0B860E.50504@freebsd.org>
	<20110630164653.GA82980@zim.MIT.EDU> <4E0CACFB.8040906@freebsd.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4E0CACFB.8040906@freebsd.org>
Cc: freebsd-standards@FreeBSD.ORG
Subject: Re: Fwd: [RFC] Consistent numeric range for "expr" on all
	architectures
X-BeenThere: freebsd-standards@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Standards compliance <freebsd-standards.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-standards>
List-Post: <mailto:freebsd-standards@freebsd.org>
List-Help: <mailto:freebsd-standards-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Jun 2011 17:42:09 -0000

On Thu, Jun 30, 2011, Stefan Esser wrote:
> Am 30.06.2011 18:46, schrieb David Schultz:
> > There are two problems, however.  First, the implementation
> > doesn't do this: it uses intmax_t, which has platform-dependent
> > width, at least in theory.  Second, it sounds like POSIX doesn't
> 
> Yes, but it seems that intmax_t is guaranteed to be at least 64bit wide.

The distinction is that int64_t is always exactly 64 bits, whereas
intmax_t could be wider.  If the goal is to guarantee that expr's
arithmetic is architecture-independent, then int64_t is a more
appropriate type.  (This requires a narrowing conversion and range
check after strtoimax(), but the compiler should optimize it
away.)

> > Overflow checking is a separate concern, and one that is more
> > likely to cause problems.  I'd be more careful about changing the
> > default behavior there, because some scripts might rely on modular
> > arithmetic without overflow checks.
> 
> You cannot portably rely in overflow, since you have no guarantee that
> overflow occurs at a specific boundary.

My point was that some scripts might rely on the lack of overflow
checking anyway.  This is a separate and potentially more
problematic change.  But I'm not trying to argue that it's
necessarily a bad idea.

> > Another thing that is likely to cause confusion is expr(1)
> > behaving differently from the shell builtin, and differently from
> > the shell's arithmetic.  Above all else, I suggest trying to make
> > everything consistent...perhaps talk to jilles@ about the shell
> > side of things.
> 
> My change in 2000 made the shell builtin do 64bit arithmetic as well (in
> fact, the shell build just included the expr.y file ...).

I don't know the state of the shell these days.  Didn't jilles@
change the shell to do that more recently?  My point here is that
working to make both the shell and expr behave identically would
be ideal.

From owner-freebsd-standards@FreeBSD.ORG  Thu Jun 30 18:04:50 2011
Return-Path: <owner-freebsd-standards@FreeBSD.ORG>
Delivered-To: freebsd-standards@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 419D61065672
	for <freebsd-standards@freebsd.org>;
	Thu, 30 Jun 2011 18:04:50 +0000 (UTC)
	(envelope-from wollman@khavrinen.csail.mit.edu)
Received: from khavrinen.csail.mit.edu (khavrinen.csail.mit.edu [128.30.28.20])
	by mx1.freebsd.org (Postfix) with ESMTP id 036C78FC0C
	for <freebsd-standards@freebsd.org>;
	Thu, 30 Jun 2011 18:04:49 +0000 (UTC)
Received: from khavrinen.csail.mit.edu (localhost [127.0.0.1])
	by khavrinen.csail.mit.edu (8.14.4/8.14.4) with ESMTP id p5UHqslg094336
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256
	verify=FAIL CN=khavrinen.csail.mit.edu issuer=Client+20CA);
	Thu, 30 Jun 2011 13:52:54 -0400 (EDT)
	(envelope-from wollman@khavrinen.csail.mit.edu)
Received: (from wollman@localhost)
	by khavrinen.csail.mit.edu (8.14.4/8.14.4/Submit) id p5UHqsNq094333;
	Thu, 30 Jun 2011 13:52:54 -0400 (EDT) (envelope-from wollman)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <19980.47094.184089.398324@khavrinen.csail.mit.edu>
Date: Thu, 30 Jun 2011 13:52:54 -0400
From: Garrett Wollman <wollman@csail.mit.edu>
To: Stefan Esser <se@freebsd.org>
In-Reply-To: <4E0CACFB.8040906@freebsd.org>
References: <4E09AF8E.5010509@freebsd.org> <4E0B860E.50504@freebsd.org>
	<20110630164653.GA82980@zim.MIT.EDU> <4E0CACFB.8040906@freebsd.org>
X-Mailer: VM 7.17 under 21.4 (patch 22) "Instant Classic" XEmacs Lucid
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7
	(khavrinen.csail.mit.edu [127.0.0.1]);
	Thu, 30 Jun 2011 13:52:54 -0400 (EDT)
Cc: freebsd-standards@freebsd.org
Subject: Re: Fwd: [RFC] Consistent numeric range for "expr" on all
	architectures
X-BeenThere: freebsd-standards@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Standards compliance <freebsd-standards.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-standards>
List-Post: <mailto:freebsd-standards@freebsd.org>
List-Help: <mailto:freebsd-standards-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Jun 2011 18:04:50 -0000

<<On Thu, 30 Jun 2011 19:06:03 +0200, Stefan Esser <se@freebsd.org> said:

> Well, that's the frustrating part: I had implemented 64bit range in expr
> back in 2000, but this extended range (on 32bit archs) has been made
> optional, some two years later with the commit message you quote.

The 2001 POSIX standard states:

> Integer variables and constants, including the values of operands
> and option-arguments, used by the standard utilities listed in this
> volume of IEEE Std 1003.1-2001 shall be implemented as equivalent to
> the ISO C standard signed long data type; floating point shall be
> implemented as equivalent to the ISO C standard double
> type. Conversions between types shall be as described in the ISO C
> standard. All variables shall be initialized to zero if they are not
> otherwise assigned by the input to the application.

> Arithmetic operators and control flow keywords shall be implemented
> as equivalent to those in the cited ISO C standard section, as
> listed in Table 1-2 (on page 8).

However, for arithemtic expansion in the shell (but *not* the expr
utility), it says:

> As an extension, the shell may recognize arithmetic expressions
> beyond those listed. The shell may use a signed integer type with a
> rank larger than the rank of signed long. The shell may use a
> real-floating type instead of signed long as long as it does not
> affect the results in cases where there is no overflow.

The language in the 2008 version of the standard is the same.

For expr, the following definitions are relevant (from XCU7 page
2715):
> integer An argument consisting only of an (optional) unary minus
>         followed by digits.
> string  A string argument; see below.
[...]
> A string argument is an argument that cannot be identified as an
> integer argument or as one of the expression operator symbols shown
> in the OPERANDS section.

I'm unable to find the response to my interpretation request
in the Austin Group mail archives.

-GAWollman


From owner-freebsd-standards@FreeBSD.ORG  Thu Jun 30 22:52:43 2011
Return-Path: <owner-freebsd-standards@FreeBSD.ORG>
Delivered-To: freebsd-standards@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id F007A106566B;
	Thu, 30 Jun 2011 22:52:42 +0000 (UTC) (envelope-from jilles@stack.nl)
Received: from mx1.stack.nl (relay04.stack.nl [IPv6:2001:610:1108:5010::107])
	by mx1.freebsd.org (Postfix) with ESMTP id 8E8C88FC14;
	Thu, 30 Jun 2011 22:52:42 +0000 (UTC)
Received: from turtle.stack.nl (turtle.stack.nl [IPv6:2001:610:1108:5010::132])
	by mx1.stack.nl (Postfix) with ESMTP id 34AF51DD7BF;
	Fri,  1 Jul 2011 00:52:41 +0200 (CEST)
Received: by turtle.stack.nl (Postfix, from userid 1677)
	id 22478174D1; Fri,  1 Jul 2011 00:52:41 +0200 (CEST)
Date: Fri, 1 Jul 2011 00:52:41 +0200
From: Jilles Tjoelker <jilles@stack.nl>
To: Stefan Esser <se@freebsd.org>, freebsd-standards@freebsd.org
Message-ID: <20110630225240.GA59192@stack.nl>
References: <4E09AF8E.5010509@freebsd.org> <4E0B860E.50504@freebsd.org>
	<20110630164653.GA82980@zim.MIT.EDU> <4E0CACFB.8040906@freebsd.org>
	<20110630174208.GA83700@zim.MIT.EDU>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110630174208.GA83700@zim.MIT.EDU>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: 
Subject: Re: Fwd: [RFC] Consistent numeric range for "expr" on all
 architectures
X-BeenThere: freebsd-standards@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Standards compliance <freebsd-standards.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-standards>
List-Post: <mailto:freebsd-standards@freebsd.org>
List-Help: <mailto:freebsd-standards-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Jun 2011 22:52:43 -0000

On Thu, Jun 30, 2011 at 01:42:08PM -0400, David Schultz wrote:
> > > Overflow checking is a separate concern, and one that is more
> > > likely to cause problems.  I'd be more careful about changing the
> > > default behavior there, because some scripts might rely on modular
> > > arithmetic without overflow checks.

> > You cannot portably rely in overflow, since you have no guarantee that
> > overflow occurs at a specific boundary.

> My point was that some scripts might rely on the lack of overflow
> checking anyway.  This is a separate and potentially more
> problematic change.  But I'm not trying to argue that it's
> necessarily a bad idea.

tools/regression/bin/sh/expansion/arith11.0 depends on two's complement
overflow behaviour, to test dividing the smallest integer by -1.

> > > Another thing that is likely to cause confusion is expr(1)
> > > behaving differently from the shell builtin, and differently from
> > > the shell's arithmetic.  Above all else, I suggest trying to make
> > > everything consistent...perhaps talk to jilles@ about the shell
> > > side of things.

> > My change in 2000 made the shell builtin do 64bit arithmetic as well (in
> > fact, the shell build just included the expr.y file ...).

As far as I can see from the history, there has never been an expr(1)
builtin in FreeBSD. The commented line in builtins.def is deceiving: it
was for Almquist's combined expr and test (which cannot possibly work
reliably and has never been enabled in FreeBSD) but test and [ were
removed from the line when a test(1) builtin was added.

There is a let builtin, with an alias "exp" in 8.x and older, but this
simply concatenates its arguments and evaluates the result as an
arithmetic expression $((...)). I strongly discourage using "let".

> I don't know the state of the shell these days.  Didn't jilles@
> change the shell to do that more recently?  My point here is that
> working to make both the shell and expr behave identically would
> be ideal.

Shell arithmetic has been intmax_t for a while.

At least in all versions with intmax_t, the smallest integer cannot be
specified as a literal. This is because the minus sign is not part of
the literal and the absolute value of the smallest integer (like
9223372036854775808) does not fit in an intmax_t and is
silently clipped (for example to 9223372036854775807).

This problem does not occur when the number is stored in a variable and
the variable is used in the expression without a dollar sign. The
smallest integer will be processed correctly in that case as the whole
string is passed to strtoimax().

The new arithmetic code in 9.0 has brought some subtle changes. If a
variable is used in an expression without a dollar sign, out of range
values are now errors (previously such values were silently clipped).
Apparently ports and other scripts do not mind this check. Regular
literals are still silently clipped.

This usage of strtoimax() also means that things like
  echo $(($(printf "%#x" -1) ))
or
  v=$(printf "%#x" -1); echo $((v))
do not work as might be expected.

A different overflow behaviour for arithmetic operations might be
dangerous, breaking ports and/or other scripts.

Also note that errors in expr(1) and shell arithmetic are handled rather
differently. The former are usually ignored (though an error message
will probably be printed) while the latter will almost always cause the
shell to stop executing the subshell or script.

-- 
Jilles Tjoelker

From owner-freebsd-standards@FreeBSD.ORG  Fri Jul  1 00:27:06 2011
Return-Path: <owner-freebsd-standards@FreeBSD.ORG>
Delivered-To: freebsd-standards@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 9D14A106566C;
	Fri,  1 Jul 2011 00:27:06 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail07.syd.optusnet.com.au (mail07.syd.optusnet.com.au
	[211.29.132.188])
	by mx1.freebsd.org (Postfix) with ESMTP id 398788FC13;
	Fri,  1 Jul 2011 00:27:05 +0000 (UTC)
Received: from c122-106-165-191.carlnfd1.nsw.optusnet.com.au
	(c122-106-165-191.carlnfd1.nsw.optusnet.com.au [122.106.165.191])
	by mail07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p610R3vF023067
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Fri, 1 Jul 2011 10:27:04 +1000
Date: Fri, 1 Jul 2011 10:27:03 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Stefan Esser <se@freebsd.org>
In-Reply-To: <4E0CACFB.8040906@freebsd.org>
Message-ID: <20110701100701.U1279@besplex.bde.org>
References: <4E09AF8E.5010509@freebsd.org> <4E0B860E.50504@freebsd.org>
	<20110630164653.GA82980@zim.MIT.EDU> <4E0CACFB.8040906@freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-standards@freebsd.org
Subject: Re: Fwd: [RFC] Consistent numeric range for "expr" on all
	architectures
X-BeenThere: freebsd-standards@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Standards compliance <freebsd-standards.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-standards>
List-Post: <mailto:freebsd-standards@freebsd.org>
List-Help: <mailto:freebsd-standards-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Jul 2011 00:27:06 -0000

On Thu, 30 Jun 2011, Stefan Esser wrote:

> Am 30.06.2011 18:46, schrieb David Schultz:
>> On Wed, Jun 29, 2011, Stefan Esser wrote:
>>> My suggestion is to make the following modifications to expr:
>>>
>>> 1) Always compute with 64 bit range and overflow checks enabled. This
>>>    means, that the "-e" option is always assumed.
>>>
>>> 2) Continue to support the "-e" option as a NOP for compatibility with
>>>    scripts that knew about that FreeBSD extension.
>>
>> Using 64-bit signed integer arithmetic consistently in expr sounds
>> like a reasonable idea.  There's no reason shell scripts ought to
>> be exposed to architectural details.
>
> Yes, especially when you move them from e.g. amd64 to i386 ...
>
>> There are two problems, however.  First, the implementation
>> doesn't do this: it uses intmax_t, which has platform-dependent
>> width, at least in theory.  Second, it sounds like POSIX doesn't
>
> Yes, but it seems that intmax_t is guaranteed to be at least 64bit wide.

It was intentional to use intmax_t instead of int64_t.  This allows for
future expansion.  The range checking will detect unportabilities in
scripts that depend on intmax_t being even larger than int64_t, and
hopefully prevent existence of such scripts.

>> allow this (although I don't know where this requirement comes
>> from):
>>
>> | r96367 | wollman | 2002-05-10 18:59:29 -0400 (Fri, 10 May 2002) | 5 lines
>> |
>> | The response to my POSIX interpretation request says that `expr'
>> | is required to be oblivious to overflow and to use the data type `long'.

I don't think POSIX is that broken :-).

>> | (Division by zero is undefined in ISO C so it's still OK to check for it
>> | here.)  Add a new `-e' flag to get the old, more useful behavior.

Behaviour on overflow is equally undefined.

>> Overflow checking is a separate concern, and one that is more
>> likely to cause problems.  I'd be more careful about changing the
>> default behavior there, because some scripts might rely on modular
>> arithmetic without overflow checks.
>
> You cannot portably rely in overflow, since you have no guarantee that
> overflow occurs at a specific boundary.

And that is when it is implementation-defined.  FreeBSD's man page
attempts to say what the FreeBSD implementation definition is, but
ends up saying that the behaviour is undefined -- it says "will overflow
silently according to the rules of the C standard, using the long data
type", but the C standard says that the behaviour is undefined, and
"undefined" of course doesn't include silence.

Bruce

From owner-freebsd-standards@FreeBSD.ORG  Fri Jul  1 01:09:42 2011
Return-Path: <owner-freebsd-standards@FreeBSD.ORG>
Delivered-To: standards@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 4064F106566C;
	Fri,  1 Jul 2011 01:09:42 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from mail07.syd.optusnet.com.au (mail07.syd.optusnet.com.au
	[211.29.132.188])
	by mx1.freebsd.org (Postfix) with ESMTP id CADF08FC13;
	Fri,  1 Jul 2011 01:09:41 +0000 (UTC)
Received: from c122-106-165-191.carlnfd1.nsw.optusnet.com.au
	(c122-106-165-191.carlnfd1.nsw.optusnet.com.au [122.106.165.191])
	by mail07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p6119a0R000663
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Fri, 1 Jul 2011 11:09:37 +1000
Date: Fri, 1 Jul 2011 11:09:36 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Stefan Esser <se@freebsd.org>
In-Reply-To: <4E0CAE8B.6030309@freebsd.org>
Message-ID: <20110701102746.I1335@besplex.bde.org>
References: <99048.1309258976@critter.freebsd.dk>
	<4E0A0774.3090004@freebsd.org>
	<20110629082103.O1084@besplex.bde.org> <4E0B1C47.4010201@freebsd.org>
	<20110630073705.P1117@besplex.bde.org> <4E0C2F41.2060302@freebsd.org>
	<4E0CA55E.2090102@freebsd.org> <20110630165050.GB82980@zim.MIT.EDU>
	<4E0CAE8B.6030309@freebsd.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: Alexander Best <arundel@freebsd.org>,
	Poul-Henning Kamp <phk@phk.freebsd.dk>, standards@freebsd.org,
	Bruce Evans <bde@freebsd.org>
Subject: Re: RESENT with patch ... Re: [RFC] Consistent numeric range for
 "expr" on all architectures
X-BeenThere: freebsd-standards@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Standards compliance <freebsd-standards.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-standards>
List-Post: <mailto:freebsd-standards@freebsd.org>
List-Help: <mailto:freebsd-standards-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Jul 2011 01:09:42 -0000

On Thu, 30 Jun 2011, Stefan Esser wrote:

> Am 30.06.2011 18:50, schrieb David Schultz:
>> On Thu, Jun 30, 2011, Stefan Esser wrote:
>>>  int
>>> +is_integer(const char *s)
>>> +{
>>> +	if (*s == '-')
>>> +		s++;
>>> +	while (isdigit(*s))
>>> +		s++;
>>> +	return *s == '\0';
>>> +}
>>
>> I only glanced at the patch for a few seconds, but your
>> is_integer() routine will accept a bare minus sign ("-") as an
>> integer.

It also invokes undefined behaviour by applying isdigit() to a value
that is not necessarily representable as an unsigned char or equal
to the value EOF, and gives wrong behaviour if there is a digit
whose value as a plain char promotes to an int with the value EOF
(perhaps the latter can't happen).  strto*() is more careful.

It also has a style bug (missing silly parentheses around return value).

Does you current version use only the above to classify integers, so
that leading whitespace is not permitted?  "expr [-e] ' 2' + ' 2'"
now gives 4 since strtoimax() ignores the leading whitespace, but I
think the syntax doesn't allow this.  This is another example of
why expr's syntax is broken as designed.  Shell arithmetic seems to
ignore leading whitespace.  This might be just because the whitespace
is kept when a variable is expanded, but has no significance in
shell expressions.

Trailing whitespace is already disallowed in expr.  This is presumably
just because strto*() stops parsing digits at the whitespace and then
the check for trailing garbage treats even whitespace as garbage.
Shell arithmetic seems to ignore trailing whitespace.

> I mentioned in the message, that I do not test for empty numeric
> operands. I had considered:

This seems to be another bug in old expr.  It gives 4 for
"expr [-e] 2 + '' + 2".

> int
> is_integer(const char *s)
> {
>        if (*s == '-')
>                s++;
>        if (!isdigit(*s++)
>                return 0;
>        while (isdigit(*s))
>                s++;
>        return *s == '\0';
> }
>
> But this will break script that do
>
> 	while :
> 	do
> 		i=`expr "$i" + 1`
> 	done
>
> without first setting i=0. And I assume, such scripts exist in huge numbers.

Eeek,  Doesn't the syntax require them to break?  The resulting expression
is similar to unary plus, which is useful for similar things.

> But "-" will not be accepted as operand (neither integer nor string).
>
> Didn't I give the example "expr -- - : -" which fails with a syntax
> error?
>
> Therefore, I do not need to check for actual digits in the input (and
> thus may accept "" as 0 for backwards compatibility), but "-" will not
> be allowed as parameter to expr by the parser ("syntax error").

I tried a Linux expr (from a FreeBSD package in 2002): the following are
all syntax errors ("non-numeric argument"):
     '+1' + 1 (unary plus)
     ' 1' + 1 (leading whitespace)
     '1 ' + 1 (trailing whitespace)
     ''   + 1 (no digits)
The following wrap silently, but not as signed long (this is on i386, but
int64_t is apparently used):
     1111111111111111111 + 1111111111111111111 ->  2222222222222222222
     9223372036854775806 + 1                   ->  9223372036854775807
     9223372036854775806 + 2                   -> -9223372036854775808

We have the compat mode for the broken scripts, but since they would have
broken in Linux in 2002, hopefully there aren't any.  Someone should
check a current Linux expr.  I guess it hasn't changed for unary plus
and whitepsace, since it was perfectly broken to POSIX spec in 2002
and the spec hasn't changed.  It might have changed the behaviour on
overflow (say to make it nonsilent), since the spec doesn't require any
particular brokenness then.

Bruce

From owner-freebsd-standards@FreeBSD.ORG  Fri Jul  1 02:24:14 2011
Return-Path: <owner-freebsd-standards@FreeBSD.ORG>
Delivered-To: freebsd-standards@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 539DA1065672
	for <freebsd-standards@freebsd.org>;
	Fri,  1 Jul 2011 02:24:14 +0000 (UTC)
	(envelope-from brde@optusnet.com.au)
Received: from fallbackmx07.syd.optusnet.com.au
	(fallbackmx07.syd.optusnet.com.au [211.29.132.9])
	by mx1.freebsd.org (Postfix) with ESMTP id 9F8288FC17
	for <freebsd-standards@freebsd.org>;
	Fri,  1 Jul 2011 02:24:13 +0000 (UTC)
Received: from mail08.syd.optusnet.com.au (mail08.syd.optusnet.com.au
	[211.29.132.189])
	by fallbackmx07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p5UNwt4m014690
	for <freebsd-standards@freebsd.org>; Fri, 1 Jul 2011 09:58:55 +1000
Received: from c122-106-165-191.carlnfd1.nsw.optusnet.com.au
	(c122-106-165-191.carlnfd1.nsw.optusnet.com.au [122.106.165.191])
	by mail08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id
	p5UNwc6E019477
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Fri, 1 Jul 2011 09:58:41 +1000
Date: Fri, 1 Jul 2011 09:58:38 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Garrett Wollman <wollman@csail.mit.edu>
In-Reply-To: <19980.47094.184089.398324@khavrinen.csail.mit.edu>
Message-ID: <20110701092909.F1164@besplex.bde.org>
References: <4E09AF8E.5010509@freebsd.org> <4E0B860E.50504@freebsd.org>
	<20110630164653.GA82980@zim.MIT.EDU> <4E0CACFB.8040906@freebsd.org>
	<19980.47094.184089.398324@khavrinen.csail.mit.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Cc: freebsd-standards@freebsd.org
Subject: Re: Fwd: [RFC] Consistent numeric range for "expr" on all
	architectures
X-BeenThere: freebsd-standards@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Standards compliance <freebsd-standards.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-standards>
List-Post: <mailto:freebsd-standards@freebsd.org>
List-Help: <mailto:freebsd-standards-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Jul 2011 02:24:14 -0000

On Thu, 30 Jun 2011, Garrett Wollman wrote:

> <<On Thu, 30 Jun 2011 19:06:03 +0200, Stefan Esser <se@freebsd.org> said:
>
>> Well, that's the frustrating part: I had implemented 64bit range in expr
>> back in 2000, but this extended range (on 32bit archs) has been made
>> optional, some two years later with the commit message you quote.
>
> The 2001 POSIX standard states:
>
>> Integer variables and constants, including the values of operands
>> and option-arguments, used by the standard utilities listed in this
>> volume of IEEE Std 1003.1-2001 shall be implemented as equivalent to
>> the ISO C standard signed long data type; floating point shall be

Thus, overflow can occur, and the behaviour on overflow is undefined.

>> implemented as equivalent to the ISO C standard double
>> type. Conversions between types shall be as described in the ISO C
>> standard. All variables shall be initialized to zero if they are not
>> otherwise assigned by the input to the application.
>
>> Arithmetic operators and control flow keywords shall be implemented
>> as equivalent to those in the cited ISO C standard section, as
>> listed in Table 1-2 (on page 8).
>
> However, for arithemtic expansion in the shell (but *not* the expr
> utility), it says:
>
>> As an extension, the shell may recognize arithmetic expressions
>> beyond those listed. The shell may use a signed integer type with a
>> rank larger than the rank of signed long. The shell may use a
>> real-floating type instead of signed long as long as it does not
>> affect the results in cases where there is no overflow.

The part about recognizing different syntaxes is an extension and
requires explicit permission.  The part about "using" a rank larger
than that of signed long requires no explicit permission, since it
only affects the result if there is overflow, but then the behaviour
is undefined.  So any utility can do the latter.

The as-if rule always allowed any utility to use any type to implement
arithmetic, provided that the final result is no different if it is
defined.  Thus for example, as an implementation detail real-double
can be used on i386 but not on amd64, since it has 53 value bits on
both, while signed long has fewer (31) value bits on i386 and more
(63) value bits on amd64.  long double could be used on amd64.  When
the FP arithmetic produces an unrepresentable preliminary result like
Inf or NaN due to overflow or worse, conversion of that result to
signed long gives undefined behaviour (typically a garbage value, with
the garbage being amazingly inconsistent depending on the sizes of the
FP and integer types in the conversion).

> The language in the 2008 version of the standard is the same.
>
> For expr, the following definitions are relevant (from XCU7 page
> 2715):
>> integer An argument consisting only of an (optional) unary minus
>>         followed by digits.
>> string  A string argument; see below.
> [...]
>> A string argument is an argument that cannot be identified as an
>> integer argument or as one of the expression operator symbols shown
>> in the OPERANDS section.

Found another bug in FreeBSD expr: it accepts leading whitespace in
integers (since strto*() does), but the above doesn't allow this.
Leading whitespace might even be useful for turning numbers into
strings (it is sort of the reverse of turning strings into numbers
in awk and even in expr by adding 0 to them).  The FreeBSD man page
of course gives null detail about this requirement of the syntax.

Found another bug of commission in the man page:

%      Arithmetic operations are performed using signed integer math.  If the -e
%      flag is specified, arithmetic uses the C intmax_t data type (the largest
%      integral type available), and expr will detect arithmetic overflow and
%      return an error indication.  If a numeric operand is specified which is
%      so large as to overflow conversion to an integer, it is parsed as a
%      string instead.  If -e is not specified, arithmetic operations and pars-
%      ing of integer arguments will overflow silently according to the rules of
%      the C standard, using the long data type.

The rule of the C standard is that overflow gives undefined behaviour, not
silence.

Bruce

From owner-freebsd-standards@FreeBSD.ORG  Fri Jul  1 20:05:27 2011
Return-Path: <owner-freebsd-standards@FreeBSD.ORG>
Delivered-To: freebsd-standards@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 977AB106564A
	for <freebsd-standards@freebsd.org>;
	Fri,  1 Jul 2011 20:05:27 +0000 (UTC) (envelope-from se@freebsd.org)
Received: from nm7-vm1.bullet.mail.ne1.yahoo.com
	(nm7-vm1.bullet.mail.ne1.yahoo.com [98.138.90.250])
	by mx1.freebsd.org (Postfix) with SMTP id 550EF8FC0A
	for <freebsd-standards@freebsd.org>;
	Fri,  1 Jul 2011 20:05:27 +0000 (UTC)
Received: from [98.138.90.57] by nm7.bullet.mail.ne1.yahoo.com with NNFMP;
	01 Jul 2011 19:51:30 -0000
Received: from [98.138.226.127] by tm10.bullet.mail.ne1.yahoo.com with NNFMP;
	01 Jul 2011 19:51:30 -0000
Received: from [127.0.0.1] by smtp206.mail.ne1.yahoo.com with NNFMP;
	01 Jul 2011 19:51:30 -0000
X-Yahoo-Newman-Id: 673043.42219.bm@smtp206.mail.ne1.yahoo.com
Received: from [192.168.119.20] (se@81.173.154.254 with plain)
	by smtp206.mail.ne1.yahoo.com with SMTP; 01 Jul 2011 12:51:28 -0700 PDT
X-Yahoo-SMTP: iDf2N9.swBDAhYEh7VHfpgq0lnq.
X-YMail-OSG: _YNjbRsVM1lk6Y8Ub8R79OlAfayrMJXCKBDrf9fqA3z2kP2
	976lOD0K7rYUonFIX77M69vqA9DqVzZHLcORgN5LvRne6xhMbGDXoOJspZxM
	QS_XRwzvw45DuLdgIETTCJzVMySGgNFHyKWP72181RXg3y7c8r.NumEWlCF9
	RBZILJV6kkfoqEonBNPUYeMsmgPEE8ME6pNefozcl9NE3F73Y0yvZQogIiOz
	9Y2ZkeXJuX3j3YEALYHqZg6sUHlWoJxd33LF6BsJ_grAIy92Vh0sHocVUVph
	PQr3tvo90fsWAkJFaik_EjhsoaksBGJ32mYULQ0BauWjTYLbKZtpG8Of61O9
	Jos36qADTTfD6DV5MUX.O7UNvCiCVdWZg_TCqpQOgHBVwVF9OMVeKNYxhz0F
	QL3rhAfhziL9l5vL6GAUBhHVWKWshUY6x5x5v
X-Yahoo-Newman-Property: ymail-3
Message-ID: <4E0E2535.1060204@freebsd.org>
Date: Fri, 01 Jul 2011 21:51:17 +0200
From: Stefan Esser <se@freebsd.org>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
	rv:5.0) Gecko/20110624 Thunderbird/5.0
MIME-Version: 1.0
To: freebsd-standards@freebsd.org
References: <4E09AF8E.5010509@freebsd.org> <4E0B860E.50504@freebsd.org>
	<20110630164653.GA82980@zim.MIT.EDU> <4E0CACFB.8040906@freebsd.org>
	<19980.47094.184089.398324@khavrinen.csail.mit.edu>
	<20110701092909.F1164@besplex.bde.org>
In-Reply-To: <20110701092909.F1164@besplex.bde.org>
Content-Type: multipart/mixed; boundary="------------060600010302060201060806"
Cc: Garrett Wollman <wollman@csail.mit.edu>
Subject: New patches for expr (was: Re: Fwd: [RFC] Consistent numeric range
 for "expr" on all architectures)
X-BeenThere: freebsd-standards@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Standards compliance <freebsd-standards.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-standards>
List-Post: <mailto:freebsd-standards@freebsd.org>
List-Help: <mailto:freebsd-standards-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-standards>, 
	<mailto:freebsd-standards-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Jul 2011 20:05:27 -0000

This is a multi-part message in MIME format.
--------------060600010302060201060806
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

On 01.07.2011 01:58, Bruce Evans wrote:
[...]
> The part about recognizing different syntaxes is an extension and
> requires explicit permission.  The part about "using" a rank larger
> than that of signed long requires no explicit permission, since it
> only affects the result if there is overflow, but then the behaviour
> is undefined.  So any utility can do the latter.

I have attached a new patch which does the following for overflow:

1) Numeric range is extended to intmax_t
2) Overflow is caught in all arithmetic operations
3) Integers not used in arithmetic operations are treated like strings

> Found another bug in FreeBSD expr: it accepts leading whitespace in
> integers (since strto*() does), but the above doesn't allow this.
> Leading whitespace might even be useful for turning numbers into
> strings (it is sort of the reverse of turning strings into numbers
> in awk and even in expr by adding 0 to them).  The FreeBSD man page
> of course gives null detail about this requirement of the syntax.

4) Leading whitespace in numbers is only accepted in compat mode
   ("expr -e" or EXPR_COMPAT defined in the process environment or
   check_utility_compat("expr") == true).

<<<<<< Aside: Is check_utility_compat() still used at all?

It provides compatibility with FreeBSD-4 if the environment variable
_COMPAT_FreeBSD_4 exists and contains "expr" as member of a comma
separated list, or if the following symlink exists in /etc:

	compat-FreeBSD-4-util -> expr

In fact, the symlink is also parsed as comma separated list, but expr is
the only program in FreeBSD that calls check_utility_compat() ...

Since expr is the only user of check_utility_compat() and the same
effect can be had by specifying EXPR_COMPAT, the check_utility_compat
function should be considered deprecated and only be kept in libc for
backwards compatibility of old binaries, IMHO. This function adds one
unnecessary call of getenv() and one of readlink() per invocation of
expr and adds complexity to the man-page for expr.

>>>>>>>

> Found another bug of commission in the man page:
> 
> %      Arithmetic operations are performed using signed integer math. If the -e
> %      flag is specified, arithmetic uses the C intmax_t data type (the largest
> %      integral type available), and expr will detect arithmetic overflow and
> %      return an error indication.  If a numeric operand is specified which is
> %      so large as to overflow conversion to an integer, it is parsed as a
> %      string instead.  If -e is not specified, arithmetic operations and pars-
> %      ing of integer arguments will overflow silently according to the rules of
> %      the C standard, using the long data type.
> 
> The rule of the C standard is that overflow gives undefined behaviour, not
> silence.

I have prepared a number of patches, which I attach to this message. All
patches are relative to the current SVN version of expr.y (i.e. do *not*
try to apply the higher numbered ones on a patched source file).

The first file (expr.y.diff1) contains a "diff -w -u3" of the functional
changes that I suggest. It is only meant to show the actual code changes
(applying this diff results in mis-indented code).

The second file (expr.y.diff2) has been created with the same command
and contains some refactoring of functions. It is functionally identical
to the first version.

The third file (expr.y.diff3) contains the diff with whitespace and
style changes that I suggest as final version. I plan to commit the
changes in sequence (with correct indentation of each step, of course),
in order to keep the change history separate for each intended purpose.

The final version is some 40 lines (plus some 25 lines of stripped
comments and whitespace) shorter than the current version (607 vs. 674
lines of code) and it complies with style(9), AFAICT.

I also include a SHAR of regression tests meant to be committed to
/usr/src/tools/regression/bin/expr. All tests pass with the current
patched version of expr (while the currently shipped version dumps core
in one test case).

Please let me know, whether there are any objections to the
functionality and the patches. If there are no strong objections, then I
intend to commit the changes on July 6th.

I plan to commit all three versions in short sequence (not by applying
the patches, as explained above they are created with "-w" for a minimal
diff and therefore lead to wrong indentation).

I also plan to commit the regression tests. Suggestions for further
tests are welcome. I know that tests for comparison operators are mostly
missing and plan to add a few (especially string vs. decimal), but
perhaps you want to suggest special boundary cases for the arithmetic tests.

There have been comments regarding the wording of the man page and I
want to commit an updated man page with the source file. But before
finalizing the man page, I want to have finalized the functionality to
be described in the man page ...

Best regards, STefan

--------------060600010302060201060806
Content-Type: text/plain;
 name="expr-test.sh"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="expr-test.sh"

# This is a shell archive.  Save it in a file, remove anything before
# this line, and then unpack it by entering "sh file".  Note, it may
# create directories; files and directories will be owned by you and
# have default permissions.
#
# This archive contains:
#
#	expr/Makefile
#	expr/regress.sh
#	expr/regress.t
#
echo x - expr/Makefile
sed 's/^X//' >expr/Makefile << 'f9600fcce6c6ed7d48045c253fb4f463'
X# $FreeBSD: head/tools/regression/bin/test/Makefile 215022 2010-11-08 23:15:10Z jilles $
X
Xall:
X	sh regress.sh
f9600fcce6c6ed7d48045c253fb4f463
echo x - expr/regress.sh
sed 's/^X//' >expr/regress.sh << '78ebeeef8ff008d708ef3dfc31f2d4b1'
X#!/bin/sh
X
X#-
X# Copyright (c) 2011 Stefan Esser <se@freebsd.org>
X# All rights reserved. 
X#
X# Redistribution and use in source and binary forms, with or without
X# modification, are permitted provided that the following conditions
X# are met:
X# 1. Redistributions of source code must retain the above copyright
X#    notice, this list of conditions and the following disclaimer.
X# 2. Redistributions in binary form must reproduce the above copyright
X#    notice, this list of conditions and the following disclaimer in the
X#    documentation and/or other materials provided with the distribution.
X#
X# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
X# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
X# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
X# ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
X# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
X# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
X# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
X# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
X# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
X# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
X# SUCH DAMAGE.
X
X#
X# expr.sh - check if test(1) or builtin test works
X#
X# $FreeBSD:$
X
X# force a specified test program, e.g. `env test=/bin/expr sh regress.sh'
X: ${test=expr}
X
Xt ()
X{
X	# $1 -> exit code
X	# $2 -> result written to stdout
X	# $3 -> $test expression
X
X	count=$((count+1))
X	# check for syntax errors
X	output="`eval $test $3 2>/dev/null`"
X	ret=$?
X	if [ "$ret" -ne "$1" ]; then
X		printf "not ok %s - (got $ret, expected $1)\n" "$count $3"
X	elif [ "$output" != "$2" ]; then
X		printf "not ok %s - (unexpected output)\n" "$count $3"
X	else
X		printf "ok %s\n" "$count $3"
X	fi
X}
X
Xcount=0
Xecho "1..55"
X
Xt 0 xyzzy '\( xyzzy \)'
Xt 2 '' 'xyzzy + 0'
X
Xt 0 000100000 '000100000'
Xt 0 00010000000000000000000 '00010000000000000000000'
X
Xt 0 1 '0 + 1'
Xt 1 0 '0 + 0'
Xt 0 2 '1 + 1'
Xt 0 -2 '-- -1 + -1'
Xt 1 0 '-- -1 + 1'
Xt 2 '' '-'			# syntax error
Xt 2 '' '-1'			# syntax error
Xt 2 '' '-1 + 1'			# syntax error
Xt 1 0 '-- -1 + 1'
Xt 2 '' '" 	  1" + 1'	# syntax error
Xt 0 2 '-e " 	  1" + 1'	# allow leading white space with -e
Xt 0 1 '"" + 1'			# empty string is interpreted as 0
X
Xt 0 9223372036854775807 '9223372036854775807 + 0'
Xt 2 '' '9223372036854775807 + 1'
Xt 0 -9223372036854775808 '-- -9223372036854775807 - 1'
Xt 0 -9223372036854775808 '-- -9223372036854775808 \* 1'
Xt 2 '' '-- -9223372036854775808 \* -1'
Xt 0 9223372036854775807 '-- -9223372036854775807 \* -1'
Xt 0 10000000000000000000 '10000000000000000000'
Xt 2 '' '10000000000000000000 + 0'
Xt 0 100001 '000100000 + 1'
Xt 0 9223372036854775807 '4611686018427387904 - 1 - -4611686018427387904'
Xt 0 -9223372036854775808 '-- -4611686018427387904 + -4611686018427387904'
Xt 0 -9223372036854775808 '-- -9223372036854775808 / 1'
Xt 2 '' '-- -9223372036854775808 / -1'
Xt 0 -9223372036854775807 '9223372036854775807 / -1'
Xt 0 9223372036854775806 '4611686018427387903 \* 2'
Xt 2 '' '4611686018427387904 \* 2'
Xt 0 -9223372036854775806 '4611686018427387903 \* -2'
Xt 0 -9223372036854775808 '4611686018427387904 \* -2'
Xt 2 '' '4611686018427387905 \* -2'
Xt 0 -7 '-- -9223372036854775807 % 10'
Xt 0 -7 '-- -9223372036854775807 % -10'
Xt 0 7 '9223372036854775807 % 10'
Xt 0 7 '9223372036854775807 % -10'
Xt 0 64 '\( 2 \* 2 \) \* \( 2 \* 2 \) \* 2 \* 2'
Xt 0 10 '5 \* 7 - 5 \* 5'
Xt 0 50 '5 \* \( 7 - 5 \) \* 5'
X
Xt 0 abc '123abc123 : ".*\(a[^0-9]*\)"'
Xt 0 124 '123abc123 : ".*a[^0-9]*\(.*\)" + 1'
Xt 0 1 '123abc123 : ".*a[^0-9]*\(.*\)" = 123'
Xt 0 7 '123abc123 : ".*a[^0-9]*" + 1'
Xt 1 0 '123Abc123 : ".*a[^0-9]*"'
Xt 0 1 '123Abc123 : ".*a[^0-9]*" + 1'
Xt 0 1 '123Abc123 : ".*a[^0-9]*" = 0'
X
Xt 1 0 '2 + 0 "<" 1'
Xt 0 1 '0 "<" 1 + 2'
Xt 0 3 '\( 0 "<" 1 \) + 2'
X
XEXPR_COMPAT=1; export EXPR_COMPAT
X
Xt 0 -2 '-1 + -1'
Xt 1 0 '-1 + 1'
Xt 0 2 '" 1" + 1'
78ebeeef8ff008d708ef3dfc31f2d4b1
echo x - expr/regress.t
sed 's/^X//' >expr/regress.t << '85ccbaa69b7bc1caaef6199f47c0ae78'
X#!/bin/sh
X# $FreeBSD: head/tools/regression/bin/test/regress.t 215022 2010-11-08 23:15:10Z jilles $
X
Xcd `dirname $0`
X
Xsh regress.sh
85ccbaa69b7bc1caaef6199f47c0ae78
exit


--------------060600010302060201060806
Content-Type: text/plain;
 name="expr.y.diff1"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="expr.y.diff1"

--- expr.y.ORIG	2011-07-01 20:22:29.119607269 +0200
+++ expr.y.NEW1	2011-07-01 20:27:35.146459819 +0200
@@ -42,13 +42,15 @@
 
 struct val *result;
 
+void		assert_to_integer(struct val *);
 int		chk_div(intmax_t, intmax_t);
 int		chk_minus(intmax_t, intmax_t, intmax_t);
 int		chk_plus(intmax_t, intmax_t, intmax_t);
 int		chk_times(intmax_t, intmax_t, intmax_t);
 void		free_value(struct val *);
-int		is_zero_or_null(struct val *);
+int		is_integer(const char *);
 int		isstring(struct val *);
+int		is_zero_or_null(struct val *);
 struct val	*make_integer(intmax_t);
 struct val	*make_str(const char *);
 struct val	*op_and(struct val *, struct val *);
@@ -65,7 +67,7 @@
 struct val	*op_plus(struct val *, struct val *);
 struct val	*op_rem(struct val *, struct val *);
 struct val	*op_times(struct val *, struct val *);
-intmax_t	to_integer(struct val *);
+int		to_integer(struct val *);
 void		to_string(struct val *);
 int		yyerror(const char *);
 int		yylex(void);
@@ -134,37 +136,16 @@
 make_str(const char *s)
 {
 	struct val *vp;
-	char *ep;
 
 	vp = (struct val *) malloc (sizeof (*vp));
 	if (vp == NULL || ((vp->u.s = strdup (s)) == NULL)) {
 		errx(ERR_EXIT, "malloc() failed");
 	}
 
-	/*
-	 * Previously we tried to scan the string to see if it ``looked like''
-	 * an integer (erroneously, as it happened).  Let strtoimax() do the
-	 * dirty work.  We could cache the value, except that we are using
-	 * a union and need to preserve the original string form until we
-	 * are certain that it is not needed.
-	 *
-	 * IEEE Std.1003.1-2001 says:
-	 * /integer/ An argument consisting only of an (optional) unary minus  
-	 *	     followed by digits.          
-	 *
-	 * This means that arguments which consist of digits followed by
-	 * non-digits MUST NOT be considered integers.  strtoimax() will
-	 * figure this out for us.
-	 */
-	if (eflag)
-		(void)strtoimax(s, &ep, 10);
+	if (is_integer(s))
+		vp->type = numeric_string;
 	else
-		(void)strtol(s, &ep, 10);
-
-	if (*ep != '\0')
 		vp->type = string;
-	else	
-		vp->type = numeric_string;
 
 	return vp;
 }
@@ -178,31 +159,33 @@
 }
 
 
-intmax_t
+int
 to_integer(struct val *vp)
 {
 	intmax_t i;
 
-	if (vp->type == integer)
-		return 1;
-
-	if (vp->type == string)
-		return 0;
-
-	/* vp->type == numeric_string, make it numeric */
+	/* we can only convert numeric_string to integer, here */
+	if (vp->type == numeric_string) {
 	errno = 0;
-	if (eflag) {
 		i  = strtoimax(vp->u.s, (char **)NULL, 10);
-		if (errno == ERANGE)
-			err(ERR_EXIT, NULL);
-	} else {
-		i = strtol(vp->u.s, (char **)NULL, 10);
-	}
-
+		/* just keep as numeric_string, if the conversion fails */
+		if (errno != ERANGE) {
 	free (vp->u.s);
 	vp->u.i = i;
 	vp->type = integer;
-	return 1;
+		}
+	}
+	return (vp->type == integer);
+}
+
+
+void
+assert_to_integer(struct val *vp)
+{
+	if (vp->type == string)
+		errx(ERR_EXIT, "not a decimal number: '%s'", vp->u.s);
+	if (!to_integer(vp))
+		errx(ERR_EXIT, "operand too large: '%s'", vp->u.s);
 }
 
 void
@@ -230,6 +213,20 @@
 
 
 int
+is_integer(const char *s)
+{
+	if (eflag)
+		while (isspace((unsigned char)*s))
+			s++;
+	if (*s == '-')
+		s++;
+	while (isdigit((unsigned char)*s))
+		s++;
+	return (*s == '\0');
+}
+
+
+int
 isstring(struct val *vp)
 {
 	/* only TRUE if this string is not a valid integer */
@@ -350,8 +347,8 @@
 		to_string (b);	
 		r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) == 0));
 	} else {
-		(void)to_integer(a);
-		(void)to_integer(b);
+		assert_to_integer(a);
+		assert_to_integer(b);
 		r = make_integer ((intmax_t)(a->u.i == b->u.i));
 	}
 
@@ -370,8 +367,8 @@
 		to_string (b);
 		r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) > 0));
 	} else {
-		(void)to_integer(a);
-		(void)to_integer(b);
+		assert_to_integer(a);
+		assert_to_integer(b);
 		r = make_integer ((intmax_t)(a->u.i > b->u.i));
 	}
 
@@ -390,8 +387,8 @@
 		to_string (b);
 		r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) < 0));
 	} else {
-		(void)to_integer(a);
-		(void)to_integer(b);
+		assert_to_integer(a);
+		assert_to_integer(b);
 		r = make_integer ((intmax_t)(a->u.i < b->u.i));
 	}
 
@@ -410,8 +407,8 @@
 		to_string (b);
 		r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) >= 0));
 	} else {
-		(void)to_integer(a);
-		(void)to_integer(b);
+		assert_to_integer(a);
+		assert_to_integer(b);
 		r = make_integer ((intmax_t)(a->u.i >= b->u.i));
 	}
 
@@ -430,8 +427,8 @@
 		to_string (b);
 		r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) <= 0));
 	} else {
-		(void)to_integer(a);
-		(void)to_integer(b);
+		assert_to_integer(a);
+		assert_to_integer(b);
 		r = make_integer ((intmax_t)(a->u.i <= b->u.i));
 	}
 
@@ -450,8 +447,8 @@
 		to_string (b);
 		r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) != 0));
 	} else {
-		(void)to_integer(a);
-		(void)to_integer(b);
+		assert_to_integer(a);
+		assert_to_integer(b);
 		r = make_integer ((intmax_t)(a->u.i != b->u.i));
 	}
 
@@ -479,17 +476,13 @@
 {
 	struct val *r;
 
-	if (!to_integer(a) || !to_integer(b)) {
-		errx(ERR_EXIT, "non-numeric argument");
-	}
+	assert_to_integer(a);
+	assert_to_integer(b);
 
-	if (eflag) {
 		r = make_integer(a->u.i + b->u.i);
 		if (chk_plus(a->u.i, b->u.i, r->u.i)) {
 			errx(ERR_EXIT, "overflow");
 		}
-	} else
-		r = make_integer((long)a->u.i + (long)b->u.i);
 
 	free_value (a);
 	free_value (b);
@@ -516,17 +509,13 @@
 {
 	struct val *r;
 
-	if (!to_integer(a) || !to_integer(b)) {
-		errx(ERR_EXIT, "non-numeric argument");
-	}
+	assert_to_integer(a);
+	assert_to_integer(b);
 
-	if (eflag) {
 		r = make_integer(a->u.i - b->u.i);
 		if (chk_minus(a->u.i, b->u.i, r->u.i)) {
 			errx(ERR_EXIT, "overflow");
 		}
-	} else
-		r = make_integer((long)a->u.i - (long)b->u.i);
 
 	free_value (a);
 	free_value (b);
@@ -550,17 +539,13 @@
 {
 	struct val *r;
 
-	if (!to_integer(a) || !to_integer(b)) {
-		errx(ERR_EXIT, "non-numeric argument");
-	}
+	assert_to_integer(a);
+	assert_to_integer(b);
 
-	if (eflag) {
 		r = make_integer(a->u.i * b->u.i);
 		if (chk_times(a->u.i, b->u.i, r->u.i)) {
 			errx(ERR_EXIT, "overflow");
 		}
-	} else
-		r = make_integer((long)a->u.i * (long)b->u.i);
 
 	free_value (a);
 	free_value (b);
@@ -583,21 +568,16 @@
 {
 	struct val *r;
 
-	if (!to_integer(a) || !to_integer(b)) {
-		errx(ERR_EXIT, "non-numeric argument");
-	}
+	assert_to_integer(a);
+	assert_to_integer(b);
 
 	if (b->u.i == 0) {
 		errx(ERR_EXIT, "division by zero");
 	}
-
-	if (eflag) {
-		r = make_integer(a->u.i / b->u.i);
 		if (chk_div(a->u.i, b->u.i)) {
 			errx(ERR_EXIT, "overflow");
 		}
-	} else
-		r = make_integer((long)a->u.i / (long)b->u.i);
+	r = make_integer(a->u.i / b->u.i);
 
 	free_value (a);
 	free_value (b);
@@ -609,19 +589,12 @@
 {
 	struct val *r;
 
-	if (!to_integer(a) || !to_integer(b)) {
-		errx(ERR_EXIT, "non-numeric argument");
-	}
-
+	assert_to_integer(a);
+	assert_to_integer(b);
 	if (b->u.i == 0) {
 		errx(ERR_EXIT, "division by zero");
 	}
-
-	if (eflag)
 		r = make_integer(a->u.i % b->u.i);
-	        /* chk_rem necessary ??? */
-	else
-		r = make_integer((long)a->u.i % (long)b->u.i);
 
 	free_value (a);
 	free_value (b);

--------------060600010302060201060806
Content-Type: text/plain;
 name="expr.y.diff2"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="expr.y.diff2"

--- expr.y.ORIG	2011-07-01 20:22:29.119607269 +0200
+++ expr.y.NEW2	2011-07-01 20:26:47.212460189 +0200
@@ -40,15 +40,19 @@
 	} u;
 } ;
 
+char		**av;
+int		eflag;
 struct val *result;
 
-int		chk_div(intmax_t, intmax_t);
-int		chk_minus(intmax_t, intmax_t, intmax_t);
-int		chk_plus(intmax_t, intmax_t, intmax_t);
-int		chk_times(intmax_t, intmax_t, intmax_t);
+void		assert_to_integer(struct val *);
+void		assert_div(intmax_t, intmax_t);
+void		assert_minus(intmax_t, intmax_t, intmax_t);
+void		assert_plus(intmax_t, intmax_t, intmax_t);
+void		assert_times(intmax_t, intmax_t, intmax_t);
 void		free_value(struct val *);
-int		is_zero_or_null(struct val *);
+int		is_integer(const char *);
 int		isstring(struct val *);
+int		is_zero_or_null(struct val *);
 struct val	*make_integer(intmax_t);
 struct val	*make_str(const char *);
 struct val	*op_and(struct val *, struct val *);
@@ -65,14 +69,12 @@
 struct val	*op_plus(struct val *, struct val *);
 struct val	*op_rem(struct val *, struct val *);
 struct val	*op_times(struct val *, struct val *);
-intmax_t	to_integer(struct val *);
+int		to_integer(struct val *);
 void		to_string(struct val *);
 int		yyerror(const char *);
 int		yylex(void);
 int		yyparse(void);
 
-static int	eflag;
-char **av;
 %}
 
 %union
@@ -134,37 +136,16 @@
 make_str(const char *s)
 {
 	struct val *vp;
-	char *ep;
 
 	vp = (struct val *) malloc (sizeof (*vp));
 	if (vp == NULL || ((vp->u.s = strdup (s)) == NULL)) {
 		errx(ERR_EXIT, "malloc() failed");
 	}
 
-	/*
-	 * Previously we tried to scan the string to see if it ``looked like''
-	 * an integer (erroneously, as it happened).  Let strtoimax() do the
-	 * dirty work.  We could cache the value, except that we are using
-	 * a union and need to preserve the original string form until we
-	 * are certain that it is not needed.
-	 *
-	 * IEEE Std.1003.1-2001 says:
-	 * /integer/ An argument consisting only of an (optional) unary minus  
-	 *	     followed by digits.          
-	 *
-	 * This means that arguments which consist of digits followed by
-	 * non-digits MUST NOT be considered integers.  strtoimax() will
-	 * figure this out for us.
-	 */
-	if (eflag)
-		(void)strtoimax(s, &ep, 10);
+	if (is_integer(s))
+		vp->type = numeric_string;
 	else
-		(void)strtol(s, &ep, 10);
-
-	if (*ep != '\0')
 		vp->type = string;
-	else	
-		vp->type = numeric_string;
 
 	return vp;
 }
@@ -178,31 +159,33 @@
 }
 
 
-intmax_t
+int
 to_integer(struct val *vp)
 {
 	intmax_t i;
 
-	if (vp->type == integer)
-		return 1;
-
-	if (vp->type == string)
-		return 0;
-
-	/* vp->type == numeric_string, make it numeric */
-	errno = 0;
-	if (eflag) {
+	/* we can only convert numeric_string to integer, here */
+	if (vp->type == numeric_string) {
+		errno = 0;
 		i  = strtoimax(vp->u.s, (char **)NULL, 10);
-		if (errno == ERANGE)
-			err(ERR_EXIT, NULL);
-	} else {
-		i = strtol(vp->u.s, (char **)NULL, 10);
+		/* just keep as numeric_string, if the conversion fails */
+		if (errno != ERANGE) {
+			free (vp->u.s);
+			vp->u.i = i;
+			vp->type = integer;
+		}
 	}
+	return (vp->type == integer);
+}
 
-	free (vp->u.s);
-	vp->u.i = i;
-	vp->type = integer;
-	return 1;
+
+void
+assert_to_integer(struct val *vp)
+{
+	if (vp->type == string)
+		errx(ERR_EXIT, "not a decimal number: '%s'", vp->u.s);
+	if (!to_integer(vp))
+		errx(ERR_EXIT, "operand too large: '%s'", vp->u.s);
 }
 
 void
@@ -230,6 +213,20 @@
 
 
 int
+is_integer(const char *s)
+{
+	if (eflag)
+		while (isspace((unsigned char)*s))
+			s++;
+	if (*s == '-')
+		s++;
+	while (isdigit((unsigned char)*s))
+		s++;
+	return (*s == '\0');
+}
+
+
+int
 isstring(struct val *vp)
 {
 	/* only TRUE if this string is not a valid integer */
@@ -350,8 +347,8 @@
 		to_string (b);	
 		r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) == 0));
 	} else {
-		(void)to_integer(a);
-		(void)to_integer(b);
+		assert_to_integer(a);
+		assert_to_integer(b);
 		r = make_integer ((intmax_t)(a->u.i == b->u.i));
 	}
 
@@ -370,8 +367,8 @@
 		to_string (b);
 		r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) > 0));
 	} else {
-		(void)to_integer(a);
-		(void)to_integer(b);
+		assert_to_integer(a);
+		assert_to_integer(b);
 		r = make_integer ((intmax_t)(a->u.i > b->u.i));
 	}
 
@@ -390,8 +387,8 @@
 		to_string (b);
 		r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) < 0));
 	} else {
-		(void)to_integer(a);
-		(void)to_integer(b);
+		assert_to_integer(a);
+		assert_to_integer(b);
 		r = make_integer ((intmax_t)(a->u.i < b->u.i));
 	}
 
@@ -410,8 +407,8 @@
 		to_string (b);
 		r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) >= 0));
 	} else {
-		(void)to_integer(a);
-		(void)to_integer(b);
+		assert_to_integer(a);
+		assert_to_integer(b);
 		r = make_integer ((intmax_t)(a->u.i >= b->u.i));
 	}
 
@@ -430,8 +427,8 @@
 		to_string (b);
 		r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) <= 0));
 	} else {
-		(void)to_integer(a);
-		(void)to_integer(b);
+		assert_to_integer(a);
+		assert_to_integer(b);
 		r = make_integer ((intmax_t)(a->u.i <= b->u.i));
 	}
 
@@ -450,8 +447,8 @@
 		to_string (b);
 		r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) != 0));
 	} else {
-		(void)to_integer(a);
-		(void)to_integer(b);
+		assert_to_integer(a);
+		assert_to_integer(b);
 		r = make_integer ((intmax_t)(a->u.i != b->u.i));
 	}
 
@@ -460,18 +457,16 @@
 	return r;
 }
 
-int
-chk_plus(intmax_t a, intmax_t b, intmax_t r)
+void
+assert_plus(intmax_t a, intmax_t b, intmax_t r)
 {
-
-	/* sum of two positive numbers must be positive */
-	if (a > 0 && b > 0 && r <= 0)
-		return 1;
-	/* sum of two negative numbers must be negative */
-	if (a < 0 && b < 0 && r >= 0)
-		return 1;
-	/* all other cases are OK */
-	return 0;
+	/*
+	 * sum of two positive numbers must be positive,
+	 * sum of two negative numbers must be negative
+	 */
+	if ((a > 0 && b > 0 && r <= 0) ||
+	    (a < 0 && b < 0 && r >= 0))
+		errx(ERR_EXIT, "overflow");
 }
 
 struct val *
@@ -479,36 +474,26 @@
 {
 	struct val *r;
 
-	if (!to_integer(a) || !to_integer(b)) {
-		errx(ERR_EXIT, "non-numeric argument");
-	}
+	assert_to_integer(a);
+	assert_to_integer(b);
 
-	if (eflag) {
-		r = make_integer(a->u.i + b->u.i);
-		if (chk_plus(a->u.i, b->u.i, r->u.i)) {
-			errx(ERR_EXIT, "overflow");
-		}
-	} else
-		r = make_integer((long)a->u.i + (long)b->u.i);
+	r = make_integer(a->u.i + b->u.i);
+	assert_plus(a->u.i, b->u.i, r->u.i);
 
 	free_value (a);
 	free_value (b);
 	return r;
 }
 
-int
-chk_minus(intmax_t a, intmax_t b, intmax_t r)
+void
+assert_minus(intmax_t a, intmax_t b, intmax_t r)
 {
 
 	/* special case subtraction of INTMAX_MIN */
-	if (b == INTMAX_MIN) {
-		if (a >= 0)
-			return 1;
-		else
-			return 0;
-	}
-	/* this is allowed for b != INTMAX_MIN */
-	return chk_plus (a, -b, r);
+	if (b == INTMAX_MIN && a < 0)
+		errx(ERR_EXIT, "overflow");
+	/* check addition of negative subtrahend */
+	assert_plus(a, -b, r);
 }
 
 struct val *
@@ -516,33 +501,26 @@
 {
 	struct val *r;
 
-	if (!to_integer(a) || !to_integer(b)) {
-		errx(ERR_EXIT, "non-numeric argument");
-	}
+	assert_to_integer(a);
+	assert_to_integer(b);
 
-	if (eflag) {
-		r = make_integer(a->u.i - b->u.i);
-		if (chk_minus(a->u.i, b->u.i, r->u.i)) {
-			errx(ERR_EXIT, "overflow");
-		}
-	} else
-		r = make_integer((long)a->u.i - (long)b->u.i);
+	r = make_integer(a->u.i - b->u.i);
+	assert_minus(a->u.i, b->u.i, r->u.i);
 
 	free_value (a);
 	free_value (b);
 	return r;
 }
 
-int
-chk_times(intmax_t a, intmax_t b, intmax_t r)
+void
+assert_times(intmax_t a, intmax_t b, intmax_t r)
 {
-	/* special case: first operand is 0, no overflow possible */
-	if (a == 0)
-		return 0;
-	/* verify that result of division matches second operand */
-	if (r / a != b)
-		return 1;
-	return 0;
+	/*
+	 * if first operand is 0, no overflow is possible,
+	 * else result of division test must match second operand
+	 */
+	if (a != 0 && r / a != b)
+		errx(ERR_EXIT, "overflow");
 }
 
 struct val *
@@ -550,32 +528,25 @@
 {
 	struct val *r;
 
-	if (!to_integer(a) || !to_integer(b)) {
-		errx(ERR_EXIT, "non-numeric argument");
-	}
+	assert_to_integer(a);
+	assert_to_integer(b);
 
-	if (eflag) {
-		r = make_integer(a->u.i * b->u.i);
-		if (chk_times(a->u.i, b->u.i, r->u.i)) {
-			errx(ERR_EXIT, "overflow");
-		}
-	} else
-		r = make_integer((long)a->u.i * (long)b->u.i);
+	r = make_integer(a->u.i * b->u.i);
+	assert_times(a->u.i, b->u.i, r->u.i);
 
 	free_value (a);
 	free_value (b);
 	return (r);
 }
 
-int
-chk_div(intmax_t a, intmax_t b)
+void
+assert_div(intmax_t a, intmax_t b)
 {
-	/* div by zero has been taken care of before */
+	if (b == 0)
+		errx(ERR_EXIT, "division by zero");
 	/* only INTMAX_MIN / -1 causes overflow */
 	if (a == INTMAX_MIN && b == -1)
-		return 1;
-	/* everything else is OK */
-	return 0;
+		errx(ERR_EXIT, "overflow");
 }
 
 struct val *
@@ -583,21 +554,12 @@
 {
 	struct val *r;
 
-	if (!to_integer(a) || !to_integer(b)) {
-		errx(ERR_EXIT, "non-numeric argument");
-	}
+	assert_to_integer(a);
+	assert_to_integer(b);
 
-	if (b->u.i == 0) {
-		errx(ERR_EXIT, "division by zero");
-	}
-
-	if (eflag) {
-		r = make_integer(a->u.i / b->u.i);
-		if (chk_div(a->u.i, b->u.i)) {
-			errx(ERR_EXIT, "overflow");
-		}
-	} else
-		r = make_integer((long)a->u.i / (long)b->u.i);
+	/* assert based on operands only, not on result */
+	assert_div(a->u.i, b->u.i);
+	r = make_integer(a->u.i / b->u.i);
 
 	free_value (a);
 	free_value (b);
@@ -609,19 +571,11 @@
 {
 	struct val *r;
 
-	if (!to_integer(a) || !to_integer(b)) {
-		errx(ERR_EXIT, "non-numeric argument");
-	}
-
-	if (b->u.i == 0) {
-		errx(ERR_EXIT, "division by zero");
-	}
-
-	if (eflag)
-		r = make_integer(a->u.i % b->u.i);
-	        /* chk_rem necessary ??? */
-	else
-		r = make_integer((long)a->u.i % (long)b->u.i);
+	assert_to_integer(a);
+	assert_to_integer(b);
+	/* pass a=1 to only check for div by zero */
+	assert_div(1, b->u.i);
+	r = make_integer(a->u.i % b->u.i);
 
 	free_value (a);
 	free_value (b);

--------------060600010302060201060806
Content-Type: text/plain;
 name="expr.y.diff3"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="expr.y.diff3"

--- expr.y.ORIG	2011-07-01 20:22:29.119607269 +0200
+++ expr.y.FINAL	2011-07-01 20:25:23.527484480 +0200
@@ -1,6 +1,6 @@
 %{
 /*-
- * Written by Pace Willisson (pace@blitz.com) 
+ * Written by Pace Willisson (pace@blitz.com)
  * and placed in the public domain.
  *
  * Largely rewritten by J.T. Conklin (jtc@wimsey.com)
@@ -21,7 +21,7 @@
 #include <string.h>
 #include <regex.h>
 #include <unistd.h>
-  
+
 /*
  * POSIX specifies a specific error code for syntax errors.  We exit
  * with this code for all errors.
@@ -40,15 +40,19 @@
 	} u;
 } ;
 
-struct val *result;
-
-int		chk_div(intmax_t, intmax_t);
-int		chk_minus(intmax_t, intmax_t, intmax_t);
-int		chk_plus(intmax_t, intmax_t, intmax_t);
-int		chk_times(intmax_t, intmax_t, intmax_t);
+char		**av;
+int		eflag;
+struct val	*result;
+
+void		assert_to_integer(struct val *);
+void		assert_div(intmax_t, intmax_t);
+void		assert_minus(intmax_t, intmax_t, intmax_t);
+void		assert_plus(intmax_t, intmax_t, intmax_t);
+void		assert_times(intmax_t, intmax_t, intmax_t);
 void		free_value(struct val *);
+int		is_integer(const char *);
+int		is_string(struct val *);
 int		is_zero_or_null(struct val *);
-int		isstring(struct val *);
 struct val	*make_integer(intmax_t);
 struct val	*make_str(const char *);
 struct val	*op_and(struct val *, struct val *);
@@ -65,14 +69,12 @@
 struct val	*op_plus(struct val *, struct val *);
 struct val	*op_rem(struct val *, struct val *);
 struct val	*op_times(struct val *, struct val *);
-intmax_t	to_integer(struct val *);
+int		to_integer(struct val *);
 void		to_string(struct val *);
 int		yyerror(const char *);
 int		yylex(void);
 int		yyparse(void);
 
-static int	eflag;
-char **av;
 %}
 
 %union
@@ -96,23 +98,22 @@
 
 expr:	TOKEN
 	| '(' expr ')' { $$ = $2; }
-	| expr '|' expr { $$ = op_or ($1, $3); }
-	| expr '&' expr { $$ = op_and ($1, $3); }
-	| expr '=' expr { $$ = op_eq ($1, $3); }
-	| expr '>' expr { $$ = op_gt ($1, $3); }
-	| expr '<' expr { $$ = op_lt ($1, $3); }
-	| expr GE expr  { $$ = op_ge ($1, $3); }
-	| expr LE expr  { $$ = op_le ($1, $3); }
-	| expr NE expr  { $$ = op_ne ($1, $3); }
-	| expr '+' expr { $$ = op_plus ($1, $3); }
-	| expr '-' expr { $$ = op_minus ($1, $3); }
-	| expr '*' expr { $$ = op_times ($1, $3); }
-	| expr '/' expr { $$ = op_div ($1, $3); }
-	| expr '%' expr { $$ = op_rem ($1, $3); }
-	| expr ':' expr { $$ = op_colon ($1, $3); }
+	| expr '|' expr { $$ = op_or($1, $3); }
+	| expr '&' expr { $$ = op_and($1, $3); }
+	| expr '=' expr { $$ = op_eq($1, $3); }
+	| expr '>' expr { $$ = op_gt($1, $3); }
+	| expr '<' expr { $$ = op_lt($1, $3); }
+	| expr GE expr  { $$ = op_ge($1, $3); }
+	| expr LE expr  { $$ = op_le($1, $3); }
+	| expr NE expr  { $$ = op_ne($1, $3); }
+	| expr '+' expr { $$ = op_plus($1, $3); }
+	| expr '-' expr { $$ = op_minus($1, $3); }
+	| expr '*' expr { $$ = op_times($1, $3); }
+	| expr '/' expr { $$ = op_div($1, $3); }
+	| expr '%' expr { $$ = op_rem($1, $3); }
+	| expr ':' expr { $$ = op_colon($1, $3); }
 	;
 
-
 %%
 
 struct val *
@@ -120,89 +121,65 @@
 {
 	struct val *vp;
 
-	vp = (struct val *) malloc (sizeof (*vp));
-	if (vp == NULL) {
+	vp = (struct val *) malloc(sizeof(*vp));
+	if (vp == NULL)
 		errx(ERR_EXIT, "malloc() failed");
-	}
 
 	vp->type = integer;
 	vp->u.i  = i;
-	return vp; 
+	return (vp);
 }
 
 struct val *
 make_str(const char *s)
 {
 	struct val *vp;
-	char *ep;
 
-	vp = (struct val *) malloc (sizeof (*vp));
-	if (vp == NULL || ((vp->u.s = strdup (s)) == NULL)) {
+	vp = (struct val *) malloc(sizeof(*vp));
+	if (vp == NULL || ((vp->u.s = strdup(s)) == NULL))
 		errx(ERR_EXIT, "malloc() failed");
-	}
 
-	/*
-	 * Previously we tried to scan the string to see if it ``looked like''
-	 * an integer (erroneously, as it happened).  Let strtoimax() do the
-	 * dirty work.  We could cache the value, except that we are using
-	 * a union and need to preserve the original string form until we
-	 * are certain that it is not needed.
-	 *
-	 * IEEE Std.1003.1-2001 says:
-	 * /integer/ An argument consisting only of an (optional) unary minus  
-	 *	     followed by digits.          
-	 *
-	 * This means that arguments which consist of digits followed by
-	 * non-digits MUST NOT be considered integers.  strtoimax() will
-	 * figure this out for us.
-	 */
-	if (eflag)
-		(void)strtoimax(s, &ep, 10);
+	if (is_integer(s))
+		vp->type = numeric_string;
 	else
-		(void)strtol(s, &ep, 10);
-
-	if (*ep != '\0')
 		vp->type = string;
-	else	
-		vp->type = numeric_string;
 
-	return vp;
+	return (vp);
 }
 
-
 void
 free_value(struct val *vp)
 {
 	if (vp->type == string || vp->type == numeric_string)
-		free (vp->u.s);	
+		free(vp->u.s);
 }
 
-
-intmax_t
+int
 to_integer(struct val *vp)
 {
 	intmax_t i;
 
-	if (vp->type == integer)
-		return 1;
-
-	if (vp->type == string)
-		return 0;
-
-	/* vp->type == numeric_string, make it numeric */
-	errno = 0;
-	if (eflag) {
+	/* we can only convert numeric_string to integer, here */
+	if (vp->type == numeric_string) {
+		errno = 0;
 		i  = strtoimax(vp->u.s, (char **)NULL, 10);
-		if (errno == ERANGE)
-			err(ERR_EXIT, NULL);
-	} else {
-		i = strtol(vp->u.s, (char **)NULL, 10);
+		/* just keep as numeric_string, if the conversion fails */
+		if (errno != ERANGE) {
+			free(vp->u.s);
+			vp->u.i = i;
+			vp->type = integer;
+		}
 	}
+	return (vp->type == integer);
+}
 
-	free (vp->u.s);
-	vp->u.i = i;
-	vp->type = integer;
-	return 1;
+void
+assert_to_integer(struct val *vp)
+{
+	if (vp->type == string)
+		errx(ERR_EXIT, "not a decimal number: '%s'", vp->u.s);
+	if (!to_integer(vp))
+		errx(ERR_EXIT, "operand too large: '%s'", vp->u.s);
 }
 
 void
@@ -228,15 +205,26 @@
 	vp->u.s  = tmp;
 }
 
+int
+is_integer(const char *s)
+{
+	if (eflag)
+		while (isspace((unsigned char)*s))
+			s++;
+	if (*s == '-')
+		s++;
+	while (isdigit((unsigned char)*s))
+		s++;
+	return (*s == '\0');
+}
 
 int
-isstring(struct val *vp)
+is_string(struct val *vp)
 {
 	/* only TRUE if this string is not a valid integer */
 	return (vp->type == string);
 }
 
-
 int
 yylex(void)
 {
@@ -247,10 +235,10 @@
 
 	p = *av++;
 
-	if (strlen (p) == 1) {
-		if (strchr ("|&=<>+-*/%:()", *p))
+	if (strlen(p) == 1) {
+		if (strchr("|&=<>+-*/%:()", *p))
 			return (*p);
-	} else if (strlen (p) == 2 && p[1] == '=') {
+	} else if (strlen(p) == 2 && p[1] == '=') {
 		switch (*p) {
 		case '>': return (GE);
 		case '<': return (LE);
@@ -258,19 +246,17 @@
 		}
 	}
 
-	yylval.val = make_str (p);
+	yylval.val = make_str(p);
 	return (TOKEN);
 }
 
 int
 is_zero_or_null(struct val *vp)
 {
-	if (vp->type == integer) {
+	if (vp->type == integer)
 		return (vp->u.i == 0);
-	} else {
-		return (*vp->u.s == 0 || (to_integer (vp) && vp->u.i == 0));
-	}
-	/* NOTREACHED */
+
+	return (*vp->u.s == 0 || (to_integer(vp) && vp->u.i == 0));
 }
 
 int
@@ -278,7 +264,7 @@
 {
 	int c;
 
-	setlocale (LC_ALL, "");
+	setlocale(LC_ALL, "");
 	if (getenv("EXPR_COMPAT") != NULL
 	    || check_utility_compat("expr")) {
 		av = argv + 1;
@@ -291,9 +277,8 @@
 				break;
 
 			default:
-				fprintf(stderr,
+				errx(ERR_EXIT,
 				    "usage: expr [-e] expression\n");
-				exit(ERR_EXIT);
 			}
 		av = argv + optind;
 	}
@@ -314,28 +299,27 @@
 	errx(ERR_EXIT, "syntax error");
 }
 
-
 struct val *
 op_or(struct val *a, struct val *b)
 {
-	if (is_zero_or_null (a)) {
-		free_value (a);
+	if (is_zero_or_null(a)) {
+		free_value(a);
 		return (b);
 	} else {
-		free_value (b);
+		free_value(b);
 		return (a);
 	}
 }
-		
+
 struct val *
 op_and(struct val *a, struct val *b)
 {
-	if (is_zero_or_null (a) || is_zero_or_null (b)) {
-		free_value (a);
-		free_value (b);
-		return (make_integer ((intmax_t)0));
+	if (is_zero_or_null(a) || is_zero_or_null(b)) {
+		free_value(a);
+		free_value(b);
+		return (make_integer((intmax_t)0));
 	} else {
-		free_value (b);
+		free_value(b);
 		return (a);
 	}
 }
@@ -345,19 +329,19 @@
 {
 	struct val *r;
 
-	if (isstring (a) || isstring (b)) {
-		to_string (a);
-		to_string (b);	
-		r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) == 0));
-	} else {
-		(void)to_integer(a);
-		(void)to_integer(b);
-		r = make_integer ((intmax_t)(a->u.i == b->u.i));
+	if (is_string(a) || is_string(b)) {
+		to_string(a);
+		to_string(b);
+		r = make_integer((intmax_t)(strcoll(a->u.s, b->u.s) == 0));
+	} else {
+		assert_to_integer(a);
+		assert_to_integer(b);
+		r = make_integer((intmax_t)(a->u.i == b->u.i));
 	}
 
-	free_value (a);
-	free_value (b);
-	return r;
+	free_value(a);
+	free_value(b);
+	return (r);
 }
 
 struct val *
@@ -365,19 +349,19 @@
 {
 	struct val *r;
 
-	if (isstring (a) || isstring (b)) {
-		to_string (a);
-		to_string (b);
-		r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) > 0));
-	} else {
-		(void)to_integer(a);
-		(void)to_integer(b);
-		r = make_integer ((intmax_t)(a->u.i > b->u.i));
+	if (is_string(a) || is_string(b)) {
+		to_string(a);
+		to_string(b);
+		r = make_integer((intmax_t)(strcoll(a->u.s, b->u.s) > 0));
+	} else {
+		assert_to_integer(a);
+		assert_to_integer(b);
+		r = make_integer((intmax_t)(a->u.i > b->u.i));
 	}
 
-	free_value (a);
-	free_value (b);
-	return r;
+	free_value(a);
+	free_value(b);
+	return (r);
 }
 
 struct val *
@@ -385,19 +369,19 @@
 {
 	struct val *r;
 
-	if (isstring (a) || isstring (b)) {
-		to_string (a);
-		to_string (b);
-		r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) < 0));
-	} else {
-		(void)to_integer(a);
-		(void)to_integer(b);
-		r = make_integer ((intmax_t)(a->u.i < b->u.i));
+	if (is_string(a) || is_string(b)) {
+		to_string(a);
+		to_string(b);
+		r = make_integer((intmax_t)(strcoll(a->u.s, b->u.s) < 0));
+	} else {
+		assert_to_integer(a);
+		assert_to_integer(b);
+		r = make_integer((intmax_t)(a->u.i < b->u.i));
 	}
 
-	free_value (a);
-	free_value (b);
-	return r;
+	free_value(a);
+	free_value(b);
+	return (r);
 }
 
 struct val *
@@ -405,19 +389,19 @@
 {
 	struct val *r;
 
-	if (isstring (a) || isstring (b)) {
-		to_string (a);
-		to_string (b);
-		r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) >= 0));
-	} else {
-		(void)to_integer(a);
-		(void)to_integer(b);
-		r = make_integer ((intmax_t)(a->u.i >= b->u.i));
+	if (is_string(a) || is_string(b)) {
+		to_string(a);
+		to_string(b);
+		r = make_integer((intmax_t)(strcoll(a->u.s, b->u.s) >= 0));
+	} else {
+		assert_to_integer(a);
+		assert_to_integer(b);
+		r = make_integer((intmax_t)(a->u.i >= b->u.i));
 	}
 
-	free_value (a);
-	free_value (b);
-	return r;
+	free_value(a);
+	free_value(b);
+	return (r);
 }
 
 struct val *
@@ -425,19 +409,19 @@
 {
 	struct val *r;
 
-	if (isstring (a) || isstring (b)) {
-		to_string (a);
-		to_string (b);
-		r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) <= 0));
-	} else {
-		(void)to_integer(a);
-		(void)to_integer(b);
-		r = make_integer ((intmax_t)(a->u.i <= b->u.i));
+	if (is_string(a) || is_string(b)) {
+		to_string(a);
+		to_string(b);
+		r = make_integer((intmax_t)(strcoll(a->u.s, b->u.s) <= 0));
+	} else {
+		assert_to_integer(a);
+		assert_to_integer(b);
+		r = make_integer((intmax_t)(a->u.i <= b->u.i));
 	}
 
-	free_value (a);
-	free_value (b);
-	return r;
+	free_value(a);
+	free_value(b);
+	return (r);
 }
 
 struct val *
@@ -445,33 +429,31 @@
 {
 	struct val *r;
 
-	if (isstring (a) || isstring (b)) {
-		to_string (a);
-		to_string (b);
-		r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) != 0));
-	} else {
-		(void)to_integer(a);
-		(void)to_integer(b);
-		r = make_integer ((intmax_t)(a->u.i != b->u.i));
+	if (is_string(a) || is_string(b)) {
+		to_string(a);
+		to_string(b);
+		r = make_integer((intmax_t)(strcoll(a->u.s, b->u.s) != 0));
+	} else {
+		assert_to_integer(a);
+		assert_to_integer(b);
+		r = make_integer((intmax_t)(a->u.i != b->u.i));
 	}
 
-	free_value (a);
-	free_value (b);
-	return r;
+	free_value(a);
+	free_value(b);
+	return (r);
 }
 
-int
-chk_plus(intmax_t a, intmax_t b, intmax_t r)
+void
+assert_plus(intmax_t a, intmax_t b, intmax_t r)
 {
-
-	/* sum of two positive numbers must be positive */
-	if (a > 0 && b > 0 && r <= 0)
-		return 1;
-	/* sum of two negative numbers must be negative */
-	if (a < 0 && b < 0 && r >= 0)
-		return 1;
-	/* all other cases are OK */
-	return 0;
+	/*
+	 * sum of two positive numbers must be positive,
+	 * sum of two negative numbers must be negative
+	 */
+	if ((a > 0 && b > 0 && r <= 0) ||
+	    (a < 0 && b < 0 && r >= 0))
+		errx(ERR_EXIT, "overflow");
 }
 
 struct val *
@@ -479,36 +461,24 @@
 {
 	struct val *r;
 
-	if (!to_integer(a) || !to_integer(b)) {
-		errx(ERR_EXIT, "non-numeric argument");
-	}
+	assert_to_integer(a);
+	assert_to_integer(b);
+	r = make_integer(a->u.i + b->u.i);
+	assert_plus(a->u.i, b->u.i, r->u.i);
 
-	if (eflag) {
-		r = make_integer(a->u.i + b->u.i);
-		if (chk_plus(a->u.i, b->u.i, r->u.i)) {
-			errx(ERR_EXIT, "overflow");
-		}
-	} else
-		r = make_integer((long)a->u.i + (long)b->u.i);
-
-	free_value (a);
-	free_value (b);
-	return r;
+	free_value(a);
+	free_value(b);
+	return (r);
 }
 
-int
-chk_minus(intmax_t a, intmax_t b, intmax_t r)
+void
+assert_minus(intmax_t a, intmax_t b, intmax_t r)
 {
-
 	/* special case subtraction of INTMAX_MIN */
-	if (b == INTMAX_MIN) {
-		if (a >= 0)
-			return 1;
-		else
-			return 0;
-	}
-	/* this is allowed for b != INTMAX_MIN */
-	return chk_plus (a, -b, r);
+	if (b == INTMAX_MIN && a < 0)
+		errx(ERR_EXIT, "overflow");
+	/* check addition of negative subtrahend */
+	assert_plus(a, -b, r);
 }
 
 struct val *
@@ -516,33 +486,25 @@
 {
 	struct val *r;
 
-	if (!to_integer(a) || !to_integer(b)) {
-		errx(ERR_EXIT, "non-numeric argument");
-	}
+	assert_to_integer(a);
+	assert_to_integer(b);
+	r = make_integer(a->u.i - b->u.i);
+	assert_minus(a->u.i, b->u.i, r->u.i);
 
-	if (eflag) {
-		r = make_integer(a->u.i - b->u.i);
-		if (chk_minus(a->u.i, b->u.i, r->u.i)) {
-			errx(ERR_EXIT, "overflow");
-		}
-	} else
-		r = make_integer((long)a->u.i - (long)b->u.i);
-
-	free_value (a);
-	free_value (b);
-	return r;
+	free_value(a);
+	free_value(b);
+	return (r);
 }
 
-int
-chk_times(intmax_t a, intmax_t b, intmax_t r)
+void
+assert_times(intmax_t a, intmax_t b, intmax_t r)
 {
-	/* special case: first operand is 0, no overflow possible */
-	if (a == 0)
-		return 0;
-	/* verify that result of division matches second operand */
-	if (r / a != b)
-		return 1;
-	return 0;
+	/*
+	 * if first operand is 0, no overflow is possible,
+	 * else result of division test must match second operand
+	 */
+	if (a != 0 && r / a != b)
+		errx(ERR_EXIT, "overflow");
 }
 
 struct val *
@@ -550,32 +512,24 @@
 {
 	struct val *r;
 
-	if (!to_integer(a) || !to_integer(b)) {
-		errx(ERR_EXIT, "non-numeric argument");
-	}
-
-	if (eflag) {
-		r = make_integer(a->u.i * b->u.i);
-		if (chk_times(a->u.i, b->u.i, r->u.i)) {
-			errx(ERR_EXIT, "overflow");
-		}
-	} else
-		r = make_integer((long)a->u.i * (long)b->u.i);
+	assert_to_integer(a);
+	assert_to_integer(b);
+	r = make_integer(a->u.i * b->u.i);
+	assert_times(a->u.i, b->u.i, r->u.i);
 
-	free_value (a);
-	free_value (b);
+	free_value(a);
+	free_value(b);
 	return (r);
 }
 
-int
-chk_div(intmax_t a, intmax_t b)
+void
+assert_div(intmax_t a, intmax_t b)
 {
-	/* div by zero has been taken care of before */
+	if (b == 0)
+		errx(ERR_EXIT, "division by zero");
 	/* only INTMAX_MIN / -1 causes overflow */
 	if (a == INTMAX_MIN && b == -1)
-		return 1;
-	/* everything else is OK */
-	return 0;
+		errx(ERR_EXIT, "overflow");
 }
 
 struct val *
@@ -583,51 +537,33 @@
 {
 	struct val *r;
 
-	if (!to_integer(a) || !to_integer(b)) {
-		errx(ERR_EXIT, "non-numeric argument");
-	}
-
-	if (b->u.i == 0) {
-		errx(ERR_EXIT, "division by zero");
-	}
-
-	if (eflag) {
-		r = make_integer(a->u.i / b->u.i);
-		if (chk_div(a->u.i, b->u.i)) {
-			errx(ERR_EXIT, "overflow");
-		}
-	} else
-		r = make_integer((long)a->u.i / (long)b->u.i);
+	assert_to_integer(a);
+	assert_to_integer(b);
+	/* assert based on operands only, not on result */
+	assert_div(a->u.i, b->u.i);
+	r = make_integer(a->u.i / b->u.i);
 
-	free_value (a);
-	free_value (b);
-	return r;
+	free_value(a);
+	free_value(b);
+	return (r);
 }
-	
+
 struct val *
 op_rem(struct val *a, struct val *b)
 {
 	struct val *r;
 
-	if (!to_integer(a) || !to_integer(b)) {
-		errx(ERR_EXIT, "non-numeric argument");
-	}
-
-	if (b->u.i == 0) {
-		errx(ERR_EXIT, "division by zero");
-	}
-
-	if (eflag)
-		r = make_integer(a->u.i % b->u.i);
-	        /* chk_rem necessary ??? */
-	else
-		r = make_integer((long)a->u.i % (long)b->u.i);
+	assert_to_integer(a);
+	assert_to_integer(b);
+	/* pass a=1 to only check for div by zero */
+	assert_div(1, b->u.i);
+	r = make_integer(a->u.i % b->u.i);
 
-	free_value (a);
-	free_value (b);
-	return r;
+	free_value(a);
+	free_value(b);
+	return (r);
 }
-	
+
 struct val *
 op_colon(struct val *a, struct val *b)
 {
@@ -642,33 +578,30 @@
 	to_string(b);
 
 	/* compile regular expression */
-	if ((eval = regcomp (&rp, b->u.s, 0)) != 0) {
-		regerror (eval, &rp, errbuf, sizeof(errbuf));
+	if ((eval = regcomp(&rp, b->u.s, 0)) != 0) {
+		regerror(eval, &rp, errbuf, sizeof(errbuf));
 		errx(ERR_EXIT, "%s", errbuf);
 	}
 
 	/* compare string against pattern */
 	/* remember that patterns are anchored to the beginning of the line */
-	if (regexec(&rp, a->u.s, (size_t)2, rm, 0) == 0 && rm[0].rm_so == 0) {
+	if (regexec(&rp, a->u.s, (size_t)2, rm, 0) == 0 && rm[0].rm_so == 0)
 		if (rm[1].rm_so >= 0) {
 			*(a->u.s + rm[1].rm_eo) = '\0';
-			v = make_str (a->u.s + rm[1].rm_so);
+			v = make_str(a->u.s + rm[1].rm_so);
 
-		} else {
-			v = make_integer ((intmax_t)(rm[0].rm_eo - rm[0].rm_so));
-		}
-	} else {
-		if (rp.re_nsub == 0) {
-			v = make_integer ((intmax_t)0);
-		} else {
-			v = make_str ("");
-		}
-	}
+		} else
+			v = make_integer((intmax_t)(rm[0].rm_eo - rm[0].rm_so));
+	else
+		if (rp.re_nsub == 0)
+			v = make_integer((intmax_t)0);
+		else
+			v = make_str("");
 
 	/* free arguments and pattern buffer */
-	free_value (a);
-	free_value (b);
-	regfree (&rp);
+	free_value(a);
+	free_value(b);
+	regfree(&rp);
 
-	return v;
+	return (v);
 }

--------------060600010302060201060806--