From owner-freebsd-standards@FreeBSD.ORG Mon Jun 27 11:07:12 2011 Return-Path: Delivered-To: freebsd-standards@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 54516106564A for ; Mon, 27 Jun 2011 11:07:12 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 3B3978FC24 for ; Mon, 27 Jun 2011 11:07:12 +0000 (UTC) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.4/8.14.4) with ESMTP id p5RB7CoB071970 for ; Mon, 27 Jun 2011 11:07:12 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.4/8.14.4/Submit) id p5RB7BTe071968 for freebsd-standards@FreeBSD.org; Mon, 27 Jun 2011 11:07:11 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 27 Jun 2011 11:07:11 GMT Message-Id: <201106271107.p5RB7BTe071968@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-standards@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-standards@FreeBSD.org X-BeenThere: freebsd-standards@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Standards compliance List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 27 Jun 2011 11:07:12 -0000 Note: to view an individual PR, use: http://www.freebsd.org/cgi/query-pr.cgi?pr=(number). The following is a listing of current problems submitted by FreeBSD users. These represent problem reports covering all versions including experimental development code and obsolete releases. S Tracker Resp. Description -------------------------------------------------------------------------------- o stand/157050 standards OSS implementation lacks AFMT_FLOAT o stand/154842 standards invalid request authenticator in the second and subseq o stand/150093 standards C++ std::locale support is broken a stand/149980 standards [libc] [patch] negative value integer to nanosleep(2) o stand/147210 standards xmmintrin.h and cstdlib conflicts with each other with o docs/143472 standards gethostname(3) references undefined value: HOST_NAME_M s stand/141705 standards [libc] [request] libc lacks cexp (and friends) o stand/130067 standards Wrong numeric limits in system headers? o stand/124860 standards flockfile(3) doesn't work when the memory has been exh o stand/121921 standards [patch] Add leap second support to at(1), atrun(8) o stand/116477 standards rm(1): rm behaves unexpectedly when using -r and relat o bin/116413 standards incorrect getconf(1) handling of unsigned constants gi o stand/116081 standards make does not work with the directive sinclude p stand/107561 standards [libc] [patch] [request] Missing SUS function tcgetsid o stand/100017 standards [Patch] Add fuser(1) functionality to fstat(1) o stand/96236 standards [patch] [posix] sed(1) incorrectly describes a functio o stand/94729 standards [libc] fcntl() throws undocumented ENOTTY a stand/86484 standards [patch] mkfifo(1) uses wrong permissions o stand/82654 standards C99 long double math functions are missing o stand/81287 standards [patch] fingerd(8) might send a line not ending in CRL a stand/80293 standards sysconf() does not support well-defined unistd values o stand/79056 standards [feature request] [atch] regex(3) regression tests o stand/70813 standards [patch] ls(1) not Posix compliant o stand/66357 standards make POSIX conformance problem ('sh -e' & '+' command- s kern/64875 standards [libc] [patch] [request] add a system call: fdatasync( o stand/56476 standards [patch] cd9660 unicode support simple hack o stand/54410 standards one-true-awk not POSIX compliant (no extended REs) o stand/46119 standards Priority problems for SCHED_OTHER using pthreads o stand/44365 standards [headers] [patch] [request] introduce ulong and unchar a stand/41576 standards ln(1): replacing old dir-symlinks o stand/39256 standards snprintf/vsnprintf aren't POSIX-conformant for strings a docs/26003 standards getgroups(2) lists NGROUPS_MAX but not syslimits.h s stand/24590 standards timezone function not compatible witn Single Unix Spec o stand/21519 standards sys/dir.h should be deprecated some more s bin/14925 standards getsubopt isn't poisonous enough 35 problems total. From owner-freebsd-standards@FreeBSD.ORG Wed Jun 29 01:10:01 2011 Return-Path: Delivered-To: standards@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 239661065670; Wed, 29 Jun 2011 01:10:01 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx07.syd.optusnet.com.au (fallbackmx07.syd.optusnet.com.au [211.29.132.9]) by mx1.freebsd.org (Postfix) with ESMTP id 486E28FC16; Wed, 29 Jun 2011 01:09:59 +0000 (UTC) Received: from mail07.syd.optusnet.com.au (mail07.syd.optusnet.com.au [211.29.132.188]) by fallbackmx07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p5SN73QD001264; Wed, 29 Jun 2011 09:07:03 +1000 Received: from c122-106-165-191.carlnfd1.nsw.optusnet.com.au (c122-106-165-191.carlnfd1.nsw.optusnet.com.au [122.106.165.191]) by mail07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p5SN6v60014950 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Wed, 29 Jun 2011 09:06:58 +1000 Date: Wed, 29 Jun 2011 09:06:57 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stefan Esser In-Reply-To: <4E0A0774.3090004@freebsd.org> Message-ID: <20110629082103.O1084@besplex.bde.org> References: <99048.1309258976@critter.freebsd.dk> <4E0A0774.3090004@freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Alexander Best , Poul-Henning Kamp , standards@FreeBSD.org, Bruce Evans Subject: Re: [RFC] Consistent numeric range for "expr" on all architectures X-BeenThere: freebsd-standards@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Standards compliance List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Jun 2011 01:10:01 -0000 [I changed developers to standards instead of removing it] On Tue, 28 Jun 2011, Stefan Esser wrote: > Am 28.06.2011 13:02, schrieb Poul-Henning Kamp: >> In message <4E09AF8E.5010509@freebsd.org>, "Stefan Esser" writes: >> >>> Due to (false, according to BDE) considerations for POSIX compliance, >>> the 64bit code was made conditional on a command line option in 2002. >> >> I think 64bit is the wrong thing to focus on, shouldn't it be >> "intmax_t" so we will not have to revisit this again ? > > Well, actually it already *is* intmax_t, which happens to be 64bit > on all architectures I checked ;-) > > My proposal is just to not produce overflows when easily avoidable. > This takes little effort, simplifies the code and makes scripts more > portable accross architectures. > > Are there any supported architectures with intmax_t smaller than 64bit? There cannot be, since C99 requires long long to be at least 64 bits (counting the sign bit) and it requires intmax_t to be capable of representing any value of any signed integer type. Which checking this, I noticed that: - preprocessor arithmetic is done using intmax_t or uintmax_t. This causes portability problems related to ones for expr -- expressions like ULONG_MAX + ULONG_MAX suddenly started in 1999 giving twice ULONG_MAX instead of ULONG_MAX-1, but only on arches where ULONG_MAX < UINTMAX_MAX. (I use unsigned values in this example to give defined behaviour on overflow, so that the expression ULONG_MAX + ULONG_MAX is not just a bug. expr doesn't have this complication.) - C99 doesn't require intmax_t to be the logically longest type. Thus it permits FreeBSD's rather bizarre implementation of intmax_t being plain long which is logically shorter than long long. Other points: - `expr -e 10000000000000000000 + 0' (19 zeros) gives "Result too large", but it isn't the result that is too large, but the arg that is too large. This message is strerror(ERANGE) after strtoimax() sets errno to ERANGE. `expr -e 1000000000000000000 \* 10' gives "overflow". This message is correct, but it is in a different style to strerror() (uncapitalized, and more concise). - `expr 10000000000000000000' (19 or even 119 zeros) gives no error. It is documented that the arg is parsed as a string in this case, and the documentation for -e doesn't clearly say that -e changes this. And -e doesn't change this if the arg clearly isn't a number (e.g., if it is 10000000000000000000mumble), or even if it is a non-decimal number (e.g., if is 010, 0x10 or 10.0). If the arg isn't a decimal integer, then (except for -e on decimal integers), there is an error irrespective of -e when arithmetic is attempted (e.g., adding 0). The error message for this bogusly says "non-numeric argument" when the arg is numeric but not a decimal integer. - POSIX requires brokenness for bases other than 10, but I wonder if an arg like 0x10 invokes undefined behaviour and thus can be made to work. (I wanted to use a hex number since I can never remember what INTMAX_MAX is in decimal and wanted to type it in hex for checking the range and overflow errors.) Allowing hex args causes fewer problems than allowing decimal args larger than INT32_MAX, since they are obviously unportable. Some FreeBSD utilities, e.g., dd, support hex args and don't worry about POSIX restricting them. - POSIX unfortunately requires args larger than INT32_MAX to be unportable (to work if longs are longer than 32 bits, else to give undefined (?) behaviour. For portability there could be a -p switch that limits args to INT32_MAX even if longs are longer than 32 bits. - I hope POSIX doesn't require benign overflow. Thus treating all overflows as errors is good for portability and doesn't require any switch. Bruce From owner-freebsd-standards@FreeBSD.ORG Wed Jun 29 12:49:34 2011 Return-Path: Delivered-To: standards@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2C3721065675 for ; Wed, 29 Jun 2011 12:49:34 +0000 (UTC) (envelope-from se@freebsd.org) Received: from nm20-vm3.bullet.mail.ne1.yahoo.com (nm20-vm3.bullet.mail.ne1.yahoo.com [98.138.91.150]) by mx1.freebsd.org (Postfix) with SMTP id E123E8FC19 for ; Wed, 29 Jun 2011 12:49:33 +0000 (UTC) Received: from [98.138.90.52] by nm20.bullet.mail.ne1.yahoo.com with NNFMP; 29 Jun 2011 12:36:25 -0000 Received: from [98.138.84.37] by tm5.bullet.mail.ne1.yahoo.com with NNFMP; 29 Jun 2011 12:36:25 -0000 Received: from [127.0.0.1] by smtp105.mail.ne1.yahoo.com with NNFMP; 29 Jun 2011 12:36:25 -0000 X-Yahoo-Newman-Id: 760581.86118.bm@smtp105.mail.ne1.yahoo.com Received: from [192.168.119.20] (se@81.173.144.90 with plain) by smtp105.mail.ne1.yahoo.com with SMTP; 29 Jun 2011 05:36:24 -0700 PDT X-Yahoo-SMTP: iDf2N9.swBDAhYEh7VHfpgq0lnq. X-YMail-OSG: 0Mu8oeUVM1l5toojlmSMG5uDs4_AoT2uXC__MjGRoKmxOF. WEEUaDQKCmGQ58g2xLcRlsvlZsa38i.RUQ1SaJgrKVR_z8FUqQ2rYE9gvZGI Ho0XRfdOop1hqOgsdGAwYyex_ghqInuiqctkx5jG1nupzC5CpTrocm9ad.vN WPNK8w1N2mFRI0IU89lc2IzSWrQCduzKxlQS0Sop.nh9Ht40g4t7xpM9Dssr LzfXral9LKDAT0HgU2gUamsS0KcbN5auoKLXpKBlAkXBFdwgpeUVxfxZnqiN TGa9NOpxsJi_NrTSlBuXEKey7vaGqbRj15u.yzveV5cre6Z4y071LVKRM1iL 7jDeAScChe.VVKhk3kep1nqqKptbHw0g4PdbCxA-- X-Yahoo-Newman-Property: ymail-3 Message-ID: <4E0B1C47.4010201@freebsd.org> Date: Wed, 29 Jun 2011 14:36:23 +0200 From: Stefan Esser User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20110624 Thunderbird/5.0 MIME-Version: 1.0 To: Bruce Evans References: <99048.1309258976@critter.freebsd.dk> <4E0A0774.3090004@freebsd.org> <20110629082103.O1084@besplex.bde.org> In-Reply-To: <20110629082103.O1084@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Alexander Best , Poul-Henning Kamp , Stefan Esser , standards@FreeBSD.org, Bruce Evans Subject: Re: [RFC] Consistent numeric range for "expr" on all architectures X-BeenThere: freebsd-standards@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Standards compliance List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Jun 2011 12:49:34 -0000 Am 29.06.2011 01:06, schrieb Bruce Evans: > [I changed developers to standards instead of removing it] Thanks Bruce. I just subcribed to standards@ and was about to move the thread there, too. > On Tue, 28 Jun 2011, Stefan Esser wrote: >> Are there any supported architectures with intmax_t smaller than 64bit? > > There cannot be, since C99 requires long long to be at least 64 bits > (counting the sign bit) and it requires intmax_t to be capable of > representing any value of any signed integer type. > > Which checking this, I noticed that: > - preprocessor arithmetic is done using intmax_t or uintmax_t. This causes > portability problems related to ones for expr -- expressions like > ULONG_MAX + ULONG_MAX suddenly started in 1999 giving twice ULONG_MAX > instead of ULONG_MAX-1, but only on arches where ULONG_MAX < UINTMAX_MAX. > (I use unsigned values in this example to give defined behaviour on > overflow, so that the expression ULONG_MAX + ULONG_MAX is not just a bug. > expr doesn't have this complication.) > - C99 doesn't require intmax_t to be the logically longest type. Thus it > permits FreeBSD's rather bizarre implementation of intmax_t being plain > long which is logically shorter than long long. > > Other points: > - `expr -e 10000000000000000000 + 0' (19 zeros) gives "Result too large", > but it isn't the result that is too large, but the arg that is too large. > This message is strerror(ERANGE) after strtoimax() sets errno to ERANGE. > `expr -e 1000000000000000000 \* 10' gives "overflow". This message is > correct, but it is in a different style to strerror() (uncapitalized, > and more concise). The patch that I sent with my first message fixes this. The message is changed from "Result too large" to "Not a valid integer: %s". ("non-numeric argument" is used in other places and I could adapt to that, though I prefer to also see which argument was rejected. But I think that "not a valid integer" better describes the situation.) Without "-e" the numeric result is "undefined" and no error is signaled, since there was no test whether the conversion succeeded before I added it back in 2000. BTW: The current number parsing is not fully POSIX compliant, since it allows to write "expr +1 - +1". This is an artefact of using strtoimax(..., 10) for conversion and validity checking at the same time (and strtoimax accepts leading "+", which POSIX forbids). This could be fixed by a simple test of the first non-blank character of the parameter. > - `expr 10000000000000000000' (19 or even 119 zeros) gives no error. It > is documented that the arg is parsed as a string in this case, and the > documentation for -e doesn't clearly say that -e changes this. And -e > doesn't change this if the arg clearly isn't a number > (e.g., if it is 10000000000000000000mumble), or even if it is a non-decimal > number (e.g., if is 010, 0x10 or 10.0). If the arg isn't a decimal integer, > then (except for -e on decimal integers), there is an error irrespective > of -e when arithmetic is attempted (e.g., adding 0). The error message > for this bogusly says "non-numeric argument" when the arg is numeric but > not a decimal integer. Yes, the reason is the conversion from string to intmax_t just before the operation. You are right in pointing out that strtoimax can fail for (out of range) integer numbers. The error message should probably be changed to "not a valid integer" instead (or suggest a better form; but the shorter "invalid integer" seems wrong to me ...). > - POSIX requires brokenness for bases other than 10, but I wonder if an > arg like 0x10 invokes undefined behaviour and thus can be made to > work. (I wanted to use a hex number since I can never remember what > INTMAX_MAX is in decimal and wanted to type it in hex for checking > the range and overflow errors.) Allowing hex args causes fewer > problems than allowing decimal args larger than INT32_MAX, since > they are obviously unportable. Some FreeBSD utilities, e.g., dd, > support hex args and don't worry about POSIX restricting them. Does POSIX require that expr exits on illegal arguments? Else we'll have undefined behaviour and might be allowed to add parsing of hex numbers. I'd be glad to add it (but this would be a different commit). But octals must not be supported, since 010 must be treated as decimal number (i.e., we must not just make strtoimax accept other bases than 10, but must special case for decimal and hex). > - POSIX unfortunately requires args larger than INT32_MAX to be unportable > (to work if longs are longer than 32 bits, else to give undefined (?) > behaviour. For portability there could be a -p switch that limits args > to INT32_MAX even if longs are longer than 32 bits. Well, undefined behaviour can always be to return the correct result ;-) I'd be willing to add "-p" (effectively just make "-e" the default that can be overridden). > - I hope POSIX doesn't require benign overflow. Thus treating all overflows > as errors is good for portability and doesn't require any switch. So do I ... But if we agreed on making intmax_t the default and on "-p" for strict POSIX compliance, then we could of course keep the undetected overflows for that code path. Thank you for the comments! Best regards, STefan From owner-freebsd-standards@FreeBSD.ORG Wed Jun 29 20:21:31 2011 Return-Path: Delivered-To: freebsd-standards@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 624781065670 for ; Wed, 29 Jun 2011 20:21:31 +0000 (UTC) (envelope-from se@freebsd.org) Received: from nm17.bullet.mail.bf1.yahoo.com (nm17.bullet.mail.bf1.yahoo.com [98.139.212.176]) by mx1.freebsd.org (Postfix) with SMTP id EB35C8FC13 for ; Wed, 29 Jun 2011 20:21:30 +0000 (UTC) Received: from [98.139.212.148] by nm17.bullet.mail.bf1.yahoo.com with NNFMP; 29 Jun 2011 20:07:45 -0000 Received: from [98.139.211.204] by tm5.bullet.mail.bf1.yahoo.com with NNFMP; 29 Jun 2011 20:07:44 -0000 Received: from [127.0.0.1] by smtp213.mail.bf1.yahoo.com with NNFMP; 29 Jun 2011 20:07:44 -0000 X-Yahoo-Newman-Id: 918647.82623.bm@smtp213.mail.bf1.yahoo.com Received: from [192.168.119.20] (se@81.173.158.168 with plain) by smtp213.mail.bf1.yahoo.com with SMTP; 29 Jun 2011 13:07:44 -0700 PDT X-Yahoo-SMTP: iDf2N9.swBDAhYEh7VHfpgq0lnq. X-YMail-OSG: sGJ0h0IVM1nweiP28hoWO89pdFFk1nd4Z8Ev0wEjDjGKCvH YMjSumln2ZrDnEKu0HicXuRQ5CuwsoSzyVByKJFeYekKk3k2t7VHaj3lmPRN 9qwr20HhoRb4cOL2FQPYUdThKfWMHoFBOan5aTCiUeXWdm0bFGOsHkUtHQXP wnCePLgCtDud43lFIhTFBXezw9lOeLOwc8pKBjttwal4jdHJxreQQcymGNVK ZbTdZ0MgHJ4FDe8hzRDmG5xppSElyeADCwJbo3MvCdso.NTO2gHqykSgiR9R yoKttCk5nt991VGkkqIZtcP8.C0LZx87x_JkqOXENmQ1R5EXSTFXjMEC6Uv2 0EgRdv..HVKgJ9rmYNlIZGZkBo0od.1tcLbDl1A-- X-Yahoo-Newman-Property: ymail-3 Message-ID: <4E0B860E.50504@freebsd.org> Date: Wed, 29 Jun 2011 22:07:42 +0200 From: Stefan Esser User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20110624 Thunderbird/5.0 MIME-Version: 1.0 To: freebsd-standards@freebsd.org References: <4E09AF8E.5010509@freebsd.org> In-Reply-To: <4E09AF8E.5010509@freebsd.org> X-Forwarded-Message-Id: <4E09AF8E.5010509@freebsd.org> Content-Type: text/plain; charset=ISO-8859-15 Content-Transfer-Encoding: 7bit Subject: Fwd: [RFC] Consistent numeric range for "expr" on all architectures X-BeenThere: freebsd-standards@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Standards compliance List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Jun 2011 20:21:31 -0000 Since BDE moved the thread over to this list I'm reposting the first message under the assumption, that many list members have not seen it, yet. Replies received in private were in support of the change, except of the warning of possible POSIX compliance issues received from GWollman, who therefore asked to move this thread to the standards list. -------- Original Message -------- Subject: [RFC] Consistent numeric range for "expr" on all architectures Date: Tue, 28 Jun 2011 12:40:14 +0200 From: Stefan Esser To: FreeBSD Developers CC: Bruce Evans , Alexander Best Hi, this point has been on my ToDO list for too long and I'd like to gather opinions whether we should clean up "expr" a bit. Back in 2000, I found that expr silently failed in an installer for a Linux software demo due to restricted range. It was a bug in that installer, but since it was just a few lines of code change, I made "expr", "test" and "/bin/sh" support 64bit range on all architectures. At the same time I added range checks for all calculations as suggested by Bruce Evans. Due to (false, according to BDE) considerations for POSIX compliance, the 64bit code was made conditional on a command line option in 2002. The expr builtin in /bin/sh has been decoupled from the "expr" sources, some time ago, so it has become a separate issue. Given the simple functionality, the behaviour of "expr" is quite complex, depending on CPU architecture, environment and options. The following list shows supported numeric range depending on the CPU: command | ILP32 | I32LP64 | specialties -------------------------------------------------------------------- expr | 32bit | 64bit | wrap around expr -e | 64bit | 64bit | overflow check EXPR_COMPAT=1 expr | 64bit | 64bit | overflow check EXPR_COMPAT=1 expr -e | error | error | "-e" seen as 1st operand (EXPR_COMPAT allows support of "expr -1 + 1", where "-1" is treated as an option by POSIX expr and "expr -- -1 + 1" must be written instead.) Examples on i386: # --- 32 bit range, no range checks: $ expr 2000000000 + 2000000000 -294967296 # --- 64 bit range and overflow checks via "-e" or EXPR_COMAPT: $ expr -e 2000000000 + 2000000000 4000000000 $ EXPR_COMPAT=1 expr 2000000000 + 2000000000 4000000000 $ expr -e 5000000000000000000 + 5000000000000000000 expr: overflow $ echo $? 2 $ EXPR_COMPAT=1 expr -e 1 + 1 expr: syntax error $ echo $? 2 Examples on amd64: # --- always 64 bit range, but wrap around controlled by -e or ENV: $ expr 5000000000000000000 + 5000000000000000000 -8446744073709551616 $ expr -e 5000000000000000000 + 5000000000000000000 expr: overflow $ EXPR_COMPAT=1 expr 5000000000000000000 + 5000000000000000000 expr: overflow Why do I consider the current situation bad: 1) Shell scripts using "expr" have inconsistent behaviour, depending on the word size of the CPU architecture. This makes it possible to develop a shell script on amd64 and be sure it works on i386. 2) You can not just put "expr -e" into shell scripts to get consistent 64bit range over all architectures, since that will fail with a syntax error if EXPR_COMPAT is set in the environment. 3) Range checks are very useful, since silent overflow in shell scripts can lead to data loss or other malfunction that is hard to diagnose. With 64 bit range, overflows are prevented even in cases, where they occur with 32 bit range. 4) The original reason for 64 bit support was a silently failing test in a commercial installer. It happened to fail in such a way, that the result of the computation was ignored under Linux but not in the port I was preparing at that time. 5) The current code is made complex by support of different ranges and overflow behaviour, depending on the options passed and environment. Consistent behavious is not only easier to understand and use, but does even reduce code size and complexity of the sources. Regarding the POSIX compatibility of the proposed change, I received a message from BDE: On Tue, 15 Jun 2010, Bruce Evans wrote: >On Tue, 15 Jun 2010, Stefan Esser wrote: >> I extended expr and test (standalone binaries and /bin/sh builtins) to >> handle 64 bit on i386 a looong time ago (and I also added overflow >> checks to all calculations). But then it was decided, that POSIX demands >> 32 bit range on 32 bit machines, so support for 64 bit range was made >> conditional on a switch (expr -e if memory serves me right ...). So yes, >> 64 bit range is available on all platforms. > > Hmm, POSIX doesn't require this brokenness for the shell, though it > requires practically equivalent brokenness for all operands and option- > arguments generally: it requires "all" integer arithmetic to use > signed long variables and "all" floating point arithmetic to use double > variables, with Standard C semantics. signed long may have any size, > but normally has 31 or 63 value bits as above. It is unclear how a > variable can ever be floating point. Then it contradicts^Woverrides > "all" with certain extensions for the shell. However, these extensions > are almost useless since they mainly make a difference when the signed > long arithmetic would overflow, in which case the behaviour is undefined > (?) so it can DTRT using any extension that it wants. I take this to mean, that the shell should use "signed long" integers, but that it is up to the implementor, what action is taken on overflow and we are free to implement an extension (and enable it by default), which supports 64bit range instead of returning bogus values. My suggestion is to make the following modifications to expr: 1) Always compute with 64 bit range and overflow checks enabled. This means, that the "-e" option is always assumed. 2) Continue to support the "-e" option as a NOP for compatibility with scripts that knew about that FreeBSD extension. This suggestion applies to -CURRENT - I'm afraid a MFC might be considered too much of a POLA. If there are no strong objections, I'd like to commit this change before the code freeze. Of course, I'm willing to adapt the man page at the time of the code commit. The suggested patch just removes casts of operands to "long" that currently exist only to reduce range. All stored operands are 64bit, anyway. This patch if for review purposes only ("diff -w" to ignore indent level, to show that just complexity is removed). Best regards, STefan --- /usr/src/bin/expr/expr.y 2009-10-30 23:02:40.000000000 +0100 +++ ./expr.y 2011-04-28 16:14:52.000000000 +0200 @@ -71,7 +71,6 @@ int yylex(void); int yyparse(void); -static int eflag; char **av; %} @@ -156,10 +155,7 @@ * non-digits MUST NOT be considered integers. strtoimax() will * figure this out for us. */ - if (eflag) (void)strtoimax(s, &ep, 10); - else - (void)strtol(s, &ep, 10); if (*ep != '\0') vp->type = string; @@ -191,13 +187,9 @@ /* vp->type == numeric_string, make it numeric */ errno = 0; - if (eflag) { i = strtoimax(vp->u.s, (char **)NULL, 10); if (errno == ERANGE) - err(ERR_EXIT, NULL); - } else { - i = strtol(vp->u.s, (char **)NULL, 10); - } + errx(ERR_EXIT, "Not a valid integer: %s", vp->u.s); free (vp->u.s); vp->u.i = i; @@ -282,12 +274,11 @@ if (getenv("EXPR_COMPAT") != NULL || check_utility_compat("expr")) { av = argv + 1; - eflag = 1; } else { while ((c = getopt(argc, argv, "e")) != -1) switch (c) { case 'e': - eflag = 1; + /* ignore for backwards compatibility */ break; default: @@ -483,13 +474,10 @@ errx(ERR_EXIT, "non-numeric argument"); } - if (eflag) { r = make_integer(a->u.i + b->u.i); if (chk_plus(a->u.i, b->u.i, r->u.i)) { errx(ERR_EXIT, "overflow"); } - } else - r = make_integer((long)a->u.i + (long)b->u.i); free_value (a); free_value (b); @@ -520,13 +508,10 @@ errx(ERR_EXIT, "non-numeric argument"); } - if (eflag) { r = make_integer(a->u.i - b->u.i); if (chk_minus(a->u.i, b->u.i, r->u.i)) { errx(ERR_EXIT, "overflow"); } - } else - r = make_integer((long)a->u.i - (long)b->u.i); free_value (a); free_value (b); @@ -554,13 +539,10 @@ errx(ERR_EXIT, "non-numeric argument"); } - if (eflag) { r = make_integer(a->u.i * b->u.i); if (chk_times(a->u.i, b->u.i, r->u.i)) { errx(ERR_EXIT, "overflow"); } - } else - r = make_integer((long)a->u.i * (long)b->u.i); free_value (a); free_value (b); @@ -591,13 +573,10 @@ errx(ERR_EXIT, "division by zero"); } - if (eflag) { r = make_integer(a->u.i / b->u.i); if (chk_div(a->u.i, b->u.i)) { errx(ERR_EXIT, "overflow"); } - } else - r = make_integer((long)a->u.i / (long)b->u.i); free_value (a); free_value (b); @@ -617,11 +596,8 @@ errx(ERR_EXIT, "division by zero"); } - if (eflag) r = make_integer(a->u.i % b->u.i); /* chk_rem necessary ??? */ - else - r = make_integer((long)a->u.i % (long)b->u.i); free_value (a); free_value (b); From owner-freebsd-standards@FreeBSD.ORG Wed Jun 29 23:32:18 2011 Return-Path: Delivered-To: standards@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 083471065673; Wed, 29 Jun 2011 23:32:18 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail03.syd.optusnet.com.au (mail03.syd.optusnet.com.au [211.29.132.184]) by mx1.freebsd.org (Postfix) with ESMTP id 7DF828FC12; Wed, 29 Jun 2011 23:32:17 +0000 (UTC) Received: from c122-106-165-191.carlnfd1.nsw.optusnet.com.au (c122-106-165-191.carlnfd1.nsw.optusnet.com.au [122.106.165.191]) by mail03.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p5TNWBg6004343 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Thu, 30 Jun 2011 09:32:12 +1000 Date: Thu, 30 Jun 2011 09:32:11 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stefan Esser In-Reply-To: <4E0B1C47.4010201@freebsd.org> Message-ID: <20110630073705.P1117@besplex.bde.org> References: <99048.1309258976@critter.freebsd.dk> <4E0A0774.3090004@freebsd.org> <20110629082103.O1084@besplex.bde.org> <4E0B1C47.4010201@freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: standards@freebsd.org, Poul-Henning Kamp , Stefan Esser , Bruce Evans , Alexander Best Subject: Re: [RFC] Consistent numeric range for "expr" on all architectures X-BeenThere: freebsd-standards@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Standards compliance List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Jun 2011 23:32:18 -0000 On Wed, 29 Jun 2011, Stefan Esser wrote: > Am 29.06.2011 01:06, schrieb Bruce Evans: >> Other points: >> - `expr -e 10000000000000000000 + 0' (19 zeros) gives "Result too large", >> but it isn't the result that is too large, but the arg that is too large. >> This message is strerror(ERANGE) after strtoimax() sets errno to ERANGE. >> `expr -e 1000000000000000000 \* 10' gives "overflow". This message is >> correct, but it is in a different style to strerror() (uncapitalized, >> and more concise). > > The patch that I sent with my first message fixes this. The message is > changed from "Result too large" to "Not a valid integer: %s". > > ("non-numeric argument" is used in other places and I could adapt to > that, though I prefer to also see which argument was rejected. But I > think that "not a valid integer" better describes the situation.) I prefer "operand too large". A decimal integer operand that is too large to be represented by intmax_t is not really invalid, but is just too large. We already have a different message for invalid. >From an old (?) version of the man page: % Arithmetic operations are performed using signed integer math. If the -e % flag is specified, arithmetic uses the C intmax_t data type (the largest ^^^^^^^^^^ actual arithmentic, not just parsing; but for expr [-e] 10000000000000000000 we are doing actual arithmetic -- see below % integral type available), and expr will detect arithmetic overflow and % return an error indication. If a numeric operand is specified which is "Numeric" is not defined anywhere in the man page. This is the only use of it in the man page. It means "decimal integer" and should say precisely that. The only hint about this in the man page is the statement that "all integer operands are interpreted in base 10". The fuzziness extends to error messages saying "non-numeric argument" instead of "operand not a decimal integer [used in integer context]". % so large as to overflow conversion to an integer, it is parsed as a % string instead. If -e is not specified, arithmetic operations and pars- This specificially says that large operands are parsed as strings. Strangely, since large operands are only checked for with -e, only -e can get this right; without -e, large operands are not even detected. However, this is a bug in the man page -- see below. % ing of integer arguments will overflow silently according to the rules of % the C standard, using the long data type. This says that the -e case is broken, but doesn't override the statement that large operands are parsed as strings. Since the man page is wrong, no override is needed. I originally though of using "argument" instead of "operand", but got the better word from the above section of the man page. > Without "-e" the numeric result is "undefined" and no error is signaled, > since there was no test whether the conversion succeeded before I added > it back in 2000. I first though that the error reporting must be delayed to when an operand is used in an expression, even with -e. But it is already delayed, and the parsing works as specified in POSIX. The parsing is just poorly or incorrectly documented in the man page. - The syntax in the man page doesn't seem to mention the degenerate expression . POSIX specifies this of course. can be either or , where is an optional unary minus followed by digits, and is any argument that is not an and not an operator symbol. Therefore, "expr -e 1000000000000000000" is not a syntax error as seems to be required by the man page; the arg in it forms a degenerate expression. The arg is not a since it is an . Therefore, the expression is numeric (I didn't check that POSIX says this explicitly). Therefore, we are justified in applying strtoimax() to all the operands in the expression (all 1 of them) and getting a range error. - The man page is broken in saying that unrepresentable numeric operands are parsed as strings instead. Whether an operand is numeric is determined by the POSIX syntax which is purely lexical and doesn't depend on representability. So for the degnerate expression with 1 operand, the type of the expression is determined by the type of the operand which is detemined lexically as described above. Similarly for parenthesized degenerate expressions. For non-degenerate expressions, the types of the operators and of the result are again mostly or always determined lexically by the types of the results. For example, '=' means equality of integers if both operands are integers, but equality of of strings if one or both operands is not an integer. - In all cases, whether an operand is an integer is context-dependent, so args must not be classified early. This seems to be done correctly, so the code conforms to the POSIX syntax although the man page doesn't. The syntax is still broken as designed, since it doesn't allow +1 to be an integer, and it requires octal intgers to be misinterpreted as decimal integers although no reasonable specification of decimal integers allows them to start with a '0', and it doesn't support non-decimal integers... >> - POSIX requires brokenness for bases other than 10, but I wonder if an >> arg like 0x10 invokes undefined behaviour and thus can be made to >> work. (I wanted to use a hex number since I can never remember what >> INTMAX_MAX is in decimal and wanted to type it in hex for checking >> the range and overflow errors.) Allowing hex args causes fewer >> problems than allowing decimal args larger than INT32_MAX, since >> they are obviously unportable. Some FreeBSD utilities, e.g., dd, >> support hex args and don't worry about POSIX restricting them. > > Does POSIX require that expr exits on illegal arguments? Not sure. It requires an exit status of 2 for invalid expressions. For the "+" operator, the operands are required to be decimal integers, but the error handling isn't so clearly specified. For the "&" operator, operands(s) are allowed to be null. Anyway, hex numbers can't be put through this gap. Since they are not decimal integers, they are required to be interpreted as strings in some contexts. So "expr 0x10 \< 2" gives 1 because the string "0" is less then the string "2". This conflicts with "expr 16 \< 2" giving 0 since both operands are intgers and of course 16 = 0x10 is not less than 2. >> - POSIX unfortunately requires args larger than INT32_MAX to be unportable >> (to work if longs are longer than 32 bits, else to give undefined (?) >> behaviour. For portability there could be a -p switch that limits args >> to INT32_MAX even if longs are longer than 32 bits. > > Well, undefined behaviour can always be to return the correct result ;-) > > I'd be willing to add "-p" (effectively just make "-e" the default that > can be overridden). I now don't see any problem with -e. Not even the one for degenerate expressions that I thought I saw. POSIX says that shell expressions should be prefered to expr, and for shell expressions it has a non-null discussion of representability and overflows. It basically says that only long arithmetic is supported, without even C's type suffixes which are needed to extend to unsigned long arithmetic, but extensions are encouraged. Bruce From owner-freebsd-standards@FreeBSD.ORG Thu Jun 30 08:23:34 2011 Return-Path: Delivered-To: standards@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3F0BD1065673 for ; Thu, 30 Jun 2011 08:23:34 +0000 (UTC) (envelope-from se@freebsd.org) Received: from nm16-vm0.bullet.mail.sp2.yahoo.com (nm16-vm0.bullet.mail.sp2.yahoo.com [98.139.91.210]) by mx1.freebsd.org (Postfix) with SMTP id 1CBEA8FC0C for ; Thu, 30 Jun 2011 08:23:34 +0000 (UTC) Received: from [98.139.91.63] by nm16.bullet.mail.sp2.yahoo.com with NNFMP; 30 Jun 2011 08:09:40 -0000 Received: from [208.71.42.200] by tm3.bullet.mail.sp2.yahoo.com with NNFMP; 30 Jun 2011 08:09:40 -0000 Received: from [127.0.0.1] by smtp211.mail.gq1.yahoo.com with NNFMP; 30 Jun 2011 08:09:40 -0000 X-Yahoo-Newman-Id: 632559.16409.bm@smtp211.mail.gq1.yahoo.com Received: from [192.168.119.20] (se@81.173.152.201 with plain) by smtp211.mail.gq1.yahoo.com with SMTP; 30 Jun 2011 01:09:39 -0700 PDT X-Yahoo-SMTP: iDf2N9.swBDAhYEh7VHfpgq0lnq. X-YMail-OSG: oacrv1cVM1l64zQLXGULrH9xaOXV.S8jMke.EWwb20ChH9f xbr89FWDxKcIeOhfEfTxOeqQroIMbmYhEeq.3eucPreft5a.h4DtpVXlI.YA nwhzoBc8TXUUlDBekaE5bwQIb535vBTtAVvZlJ1wW00b1PLkW5mAK2H3ZEj_ M1M.WoOteceOkBuAzYcESNoymNyqi9f86Kz_i5e.HKgz7pux4uVnf7MGowIr rD5mp_NV0mEWeCI_SrFKfTYq2l4COlz42B3Sn57oE_HMMmm_S4z6rAzU9F.Q HDq6PqG2FQaEDgWF0Sk.H.Sy5ehbYpRBOKiZE6MObZSAKmV1Y8whOYvfAJhV hXDDUn.IjKuYLM_5pt2E4do_QcA-- X-Yahoo-Newman-Property: ymail-3 Message-ID: <4E0C2F41.2060302@freebsd.org> Date: Thu, 30 Jun 2011 10:09:37 +0200 From: Stefan Esser User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20110624 Thunderbird/5.0 MIME-Version: 1.0 To: Bruce Evans References: <99048.1309258976@critter.freebsd.dk> <4E0A0774.3090004@freebsd.org> <20110629082103.O1084@besplex.bde.org> <4E0B1C47.4010201@freebsd.org> <20110630073705.P1117@besplex.bde.org> In-Reply-To: <20110630073705.P1117@besplex.bde.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Alexander Best , Poul-Henning Kamp , standards@freebsd.org, Bruce Evans Subject: Re: [RFC] Consistent numeric range for "expr" on all architectures X-BeenThere: freebsd-standards@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Standards compliance List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Jun 2011 08:23:34 -0000 Hi Bruce, thank you for the detailed analysis! On 30.06.2011 01:32, Bruce Evans wrote: > On Wed, 29 Jun 2011, Stefan Esser wrote: >> Am 29.06.2011 01:06, schrieb Bruce Evans: >>> Other points: >>> - `expr -e 10000000000000000000 + 0' (19 zeros) gives "Result too large", >>> but it isn't the result that is too large, but the arg that is too large. >>> This message is strerror(ERANGE) after strtoimax() sets errno to ERANGE. >>> `expr -e 1000000000000000000 \* 10' gives "overflow". This message is >>> correct, but it is in a different style to strerror() (uncapitalized, >>> and more concise). >> >> The patch that I sent with my first message fixes this. The message is >> changed from "Result too large" to "Not a valid integer: %s". >> >> ("non-numeric argument" is used in other places and I could adapt to >> that, though I prefer to also see which argument was rejected. But I >> think that "not a valid integer" better describes the situation.) > > I prefer "operand too large". A decimal integer operand that is too > large to be represented by intmax_t is not really invalid, but is just > too large. We already have a different message for invalid. Yes, I also thought the "non-numeric" message was misleading and that "not a valid integer" was just slightly less misleading ;-) But "operand too large" correctly describes the situation and I'll just a test for that case (instead of just relying on strtoimax() to decide about integer-ness). According to POSIX, "+1" is no valid integer operand in "expr", should a leading "+" also be detected and rejected? >> From an old (?) version of the man page: > > % Arithmetic operations are performed using signed integer math. If the -e > % flag is specified, arithmetic uses the C intmax_t data type (the largest > ^^^^^^^^^^ actual arithmentic, not just parsing; but > for expr [-e] 10000000000000000000 we are > doing actual arithmetic -- see below > % integral type available), and expr will detect arithmetic overflow and > % return an error indication. If a numeric operand is specified which is > > "Numeric" is not defined anywhere in the man page. This is the only > use of it in the man page. It means "decimal integer" and should say > precisely that. The only hint about this in the man page is the statement > that "all integer operands are interpreted in base 10". The fuzziness > extends to error messages saying "non-numeric argument" instead of > "operand not a decimal integer [used in integer context]". Ok, all numeric arguments should be called "decimal integer" in the documentation and error messages (if applicable). > % so large as to overflow conversion to an integer, it is parsed as a > % string instead. If -e is not specified, arithmetic operations and pars- > > This specificially says that large operands are parsed as strings. > Strangely, since large operands are only checked for with -e, only -e can > get this right; without -e, large operands are not even detected. > However, this is a bug in the man page -- see below. I had already wondered about this case (large numbers being considered strings), but I thought it was in compliance with POSIX. This will be fixed by checking operands to be POSIX decimal integers. > % ing of integer arguments will overflow silently according to the rules of > % the C standard, using the long data type. > > This says that the -e case is broken, but doesn't override the statement > that large operands are parsed as strings. Since the man page is wrong, > no override is needed. Is the -e case OK if decimal numbers are correctly detected (not just those accepted by strtoimax(), but all matching "-?[0-9]+")? > I originally though of using "argument" instead of "operand", but got the > better word from the above section of the man page. > >> Without "-e" the numeric result is "undefined" and no error is signaled, >> since there was no test whether the conversion succeeded before I added >> it back in 2000. > > I first though that the error reporting must be delayed to when an operand > is used in an expression, even with -e. But it is already delayed, and > the parsing works as specified in POSIX. The parsing is just poorly or > incorrectly documented in the man page. Yes, the reporting is delayed, since any operand (even number) might be used in a RE matching operation (":"). I'll see whether I find a better wording for the man page but this might be easier for a native speaker of English. > - The syntax in the man page doesn't seem to mention the degenerate > expression . POSIX specifies this of course. > can be either or , where is an optional > unary minus followed by digits, and is any argument that is > not an and not an operator symbol. Yes, that definition makes "+1" an invalid ... > Therefore, "expr -e 1000000000000000000" is not a syntax error as seems > to be required by the man page; the arg in it forms a degenerate > expression. The arg is not a since it is an . > Therefore, the expression is numeric (I didn't check that POSIX says > this explicitly). Therefore, we are justified in applying strtoimax() > to all the operands in the expression (all 1 of them) and getting a > range error. I think this is rational behaviour of the code, but I'm not convinced, that can not be and at the same time. See for example: $ expr \( 222 \) : "2*" 3 $ expr \( 111 + 111 \) : "2*" 3 The numeric result is taken as string to perform the RE matching and the length of the first match is returned. Therefore, a clear integer can still be interpreted as string, if operand to a string expression. The degenerate case of just a too long integer might still be considered a valid string and the missing operation (or identity?) does not change that type. Therefore it might be argued, that a very long numeric argument should just be printed as string in that case. (???) > - The man page is broken in saying that unrepresentable numeric operands > are parsed as strings instead. Whether an operand is numeric is > determined by the POSIX syntax which is purely lexical and doesn't > depend on representability. So for the degnerate expression with 1 > operand, the type of the expression is determined by the type of the > operand which is detemined lexically as described above. Similarly I do not fully agree on the type being determined in that way. It is obvious, that over-long numbers may still be valid strings (and can thus be used as operands for the RE matching operator ":"). This clearly shows, that non-numeric arguments can be determined to be of type string, but numeric arguments are of indetermined type until used in an operation (i.e. they are both, valid integer and string, like a mixed state in quantum mechanics ;-) ). > for parenthesized degenerate expressions. For non-degenerate > expressions, the types of the operators and of the result are again > mostly or always determined lexically by the types of the results. > For example, '=' means equality of integers if both operands are > integers, but equality of of strings if one or both operands is not > an integer. Hmm, I think you are right in this case and numeric overflow should occur (and be detected) if at least one operand is a large decimal (outside the supported numeric range). > - In all cases, whether an operand is an integer is context-dependent, > so args must not be classified early. This seems to be done correctly, > so the code conforms to the POSIX syntax although the man page doesn't. I agree, but I'm still not sure about "expr 10000000000000000000000" causing an integer overflow. > The syntax is still broken as designed, since it doesn't allow +1 to > be an integer, and it requires octal intgers to be misinterpreted > as decimal integers although no reasonable specification of decimal > integers allows them to start with a '0', and it doesn't support > non-decimal integers... We *do* accept "+1" as synonymous to "1", though, as pointed out above. I'd like to change this. >>> - POSIX requires brokenness for bases other than 10, but I wonder if an >>> arg like 0x10 invokes undefined behaviour and thus can be made to >>> work. (I wanted to use a hex number since I can never remember what >>> INTMAX_MAX is in decimal and wanted to type it in hex for checking >>> the range and overflow errors.) Allowing hex args causes fewer >>> problems than allowing decimal args larger than INT32_MAX, since >>> they are obviously unportable. Some FreeBSD utilities, e.g., dd, >>> support hex args and don't worry about POSIX restricting them. >> >> Does POSIX require that expr exits on illegal arguments? > > Not sure. It requires an exit status of 2 for invalid expressions. > For the "+" operator, the operands are required to be decimal > integers, but the error handling isn't so clearly specified. For > the "&" operator, operands(s) are allowed to be null. There is an asymmetry in the handling of logical AND and OR, which may or may not be mandated by POSIX: $ expr "" \| "" $ expr "" \& "" 0 In the case of OR, the second argument is returned, if the first one is "0" or "". In the example above this results in "" being printed. In the case of AND, "0" is returned, if both arguments are "0" or "". This makes a difference if results of logical operations are not used as logical operands (where "" and "0" have the same value), but as arguments to other (numeric or string) operations. Is this mandated by POSIX? If not: Should we make the behaviour symmetric? I prefer the way OR operates in that it always returns one of its operands, the second one if both are FALSE. For AND we should return the second operand, if the first one evaluated to TRUE, IMHO (if permitted by POSIX). But since the current behaviour has existed in FreeBSD for at least 10 years, we might just keep it (although it slightly complicates the code). > Anyway, hex numbers can't be put through this gap. Since they are not > decimal integers, they are required to be interpreted as strings in > some contexts. So "expr 0x10 \< 2" gives 1 because the string "0" is > less then the string "2". This conflicts with "expr 16 \< 2" giving > 0 since both operands are intgers and of course 16 = 0x10 is not less > than 2. Too bad, actually ... >>> - POSIX unfortunately requires args larger than INT32_MAX to be unportable >>> (to work if longs are longer than 32 bits, else to give undefined (?) >>> behaviour. For portability there could be a -p switch that limits args >>> to INT32_MAX even if longs are longer than 32 bits. >> >> Well, undefined behaviour can always be to return the correct result ;-) >> >> I'd be willing to add "-p" (effectively just make "-e" the default that >> can be overridden). > > I now don't see any problem with -e. Not even the one for degenerate > expressions that I thought I saw. That's great! > POSIX says that shell expressions should be prefered to expr, and for > shell expressions it has a non-null discussion of representability and > overflows. It basically says that only long arithmetic is supported, > without even C's type suffixes which are needed to extend to unsigned > long arithmetic, but extensions are encouraged. Yes, I know that shell expressions are highly preferable to "expr" ... BTW: /bin/sh does support 64bit numeric range on 32bit architectures in shell expressions, but without range checks being performed. I just checked a few shells on i386: SH: $ echo $((2000000000000000000000 + 10000000000)) -9223372026854775809 BASH: $ echo $((2000000000000000000000 + 10000000000)) 7751640049368425472 ZSH: $ echo $((2000000000000000000000 + 10000000000)) zsh: number truncated after 19 digits: 2000000000000000000000 + 10000000000 2000000010000000000 So: How about range checks for $(( )) in our /bin/sh ??? Best regards, STefan From owner-freebsd-standards@FreeBSD.ORG Thu Jun 30 16:32:49 2011 Return-Path: Delivered-To: standards@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2B5F61065670 for ; Thu, 30 Jun 2011 16:32:49 +0000 (UTC) (envelope-from se@freebsd.org) Received: from nm11.bullet.mail.sp2.yahoo.com (nm11.bullet.mail.sp2.yahoo.com [98.139.91.81]) by mx1.freebsd.org (Postfix) with SMTP id 0A37C8FC15 for ; Thu, 30 Jun 2011 16:32:49 +0000 (UTC) Received: from [98.139.91.66] by nm11.bullet.mail.sp2.yahoo.com with NNFMP; 30 Jun 2011 16:32:48 -0000 Received: from [98.136.185.45] by tm6.bullet.mail.sp2.yahoo.com with NNFMP; 30 Jun 2011 16:32:48 -0000 Received: from [127.0.0.1] by smtp106.mail.gq1.yahoo.com with NNFMP; 30 Jun 2011 16:32:48 -0000 X-Yahoo-Newman-Id: 343301.838.bm@smtp106.mail.gq1.yahoo.com Received: from [192.168.119.20] (se@81.173.186.241 with plain) by smtp106.mail.gq1.yahoo.com with SMTP; 30 Jun 2011 09:32:47 -0700 PDT X-Yahoo-SMTP: iDf2N9.swBDAhYEh7VHfpgq0lnq. X-YMail-OSG: 2j7c05gVM1n5qIunuqRCSr5GR8jVgSovWmKxEvvGWIcMv7B 7sTP6WK61KcygplnVYxtKaCPsC7h.8zNZdxC9RKkhF1TEJEJH9lUMxvn40kG l1JxopArcHye4Zpbj4MF5PE8s885RGTPUvj_WD2ppXOnV34Byvs9JvV6MSwN ypXZtWDAMFRWDyv5S8zdIwRqS1oIh3oTlSIQH8eZpPut2CLcxMf5XR73dDqS WekUAF0pPAl_M3qR.ALoDUA5qVOSFQd0DDPXSan535sFQWb8iKloxw3i6fHt lxnAaPKc7UK02crRHMh8j.TXD4sLkyh_ebtz8uU2IzRKtVhxqtZuEfYxerfi nZAEYXMEI7ypE3czj36IkQuV62w-- X-Yahoo-Newman-Property: ymail-3 Message-ID: <4E0CA52D.7020508@freebsd.org> Date: Thu, 30 Jun 2011 18:32:45 +0200 From: Stefan Esser User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20110624 Thunderbird/5.0 MIME-Version: 1.0 To: Bruce Evans References: <99048.1309258976@critter.freebsd.dk> <4E0A0774.3090004@freebsd.org> <20110629082103.O1084@besplex.bde.org> <4E0B1C47.4010201@freebsd.org> <20110630073705.P1117@besplex.bde.org> <4E0C2F41.2060302@freebsd.org> In-Reply-To: <4E0C2F41.2060302@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Alexander Best , Poul-Henning Kamp , standards@freebsd.org, Bruce Evans Subject: Re: [RFC] Consistent numeric range for "expr" on all architectures X-BeenThere: freebsd-standards@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Standards compliance List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Jun 2011 16:32:49 -0000 The attached patch takes the feedback from Bruce into account. See details inline in the quoted mail, below. Any further comments? On 30.06.2011 10:09, Stefan Esser wrote: > On 30.06.2011 01:32, Bruce Evans wrote: >> On Wed, 29 Jun 2011, Stefan Esser wrote: >>> Am 29.06.2011 01:06, schrieb Bruce Evans: >>>> Other points: >>>> - `expr -e 10000000000000000000 + 0' (19 zeros) gives "Result too large", >>>> but it isn't the result that is too large, but the arg that is too large. >>>> This message is strerror(ERANGE) after strtoimax() sets errno to ERANGE. >>>> `expr -e 1000000000000000000 \* 10' gives "overflow". This message is >>>> correct, but it is in a different style to strerror() (uncapitalized, >>>> and more concise). >>> >>> The patch that I sent with my first message fixes this. The message is >>> changed from "Result too large" to "Not a valid integer: %s". >>> >>> ("non-numeric argument" is used in other places and I could adapt to >>> that, though I prefer to also see which argument was rejected. But I >>> think that "not a valid integer" better describes the situation.) >> >> I prefer "operand too large". A decimal integer operand that is too >> large to be represented by intmax_t is not really invalid, but is just >> too large. We already have a different message for invalid. > > Yes, I also thought the "non-numeric" message was misleading and that > "not a valid integer" was just slightly less misleading ;-) > > But "operand too large" correctly describes the situation and I'll just > add a test for that case (instead of just relying on strtoimax() to decide > about integer-ness). The patch introduces a function is_integer(), which performs a simple syntax check on operands to determine whether they are decimals. This function replaces the use of strtoimax() for the purpose of testing the operand, which fails to accept decimals too long to be represented by intmax_t. > According to POSIX, "+1" is no valid integer operand in "expr", should a > leading "+" also be detected and rejected? The is_integer() function only permits an optional leading minus sign followed by digits. The case of a only a minus sign passed as operand is already caught by the parser. (Aside: Is it correct to reject "expr -- - : -" as a syntax error?) >>> From an old (?) version of the man page: >> >> % Arithmetic operations are performed using signed integer math. If the -e >> % flag is specified, arithmetic uses the C intmax_t data type (the largest >> ^^^^^^^^^^ actual arithmentic, not just parsing; but >> for expr [-e] 10000000000000000000 we are >> doing actual arithmetic -- see below >> % integral type available), and expr will detect arithmetic overflow and >> % return an error indication. If a numeric operand is specified which is >> >> "Numeric" is not defined anywhere in the man page. This is the only >> use of it in the man page. It means "decimal integer" and should say >> precisely that. The only hint about this in the man page is the statement >> that "all integer operands are interpreted in base 10". The fuzziness >> extends to error messages saying "non-numeric argument" instead of >> "operand not a decimal integer [used in integer context]". > > Ok, all numeric arguments should be called "decimal integer" in the > documentation and error messages (if applicable). I have changed the wording in the man page. Critical review is welcome. >> % so large as to overflow conversion to an integer, it is parsed as a >> % string instead. If -e is not specified, arithmetic operations and pars- >> >> This specificially says that large operands are parsed as strings. >> Strangely, since large operands are only checked for with -e, only -e can >> get this right; without -e, large operands are not even detected. >> However, this is a bug in the man page -- see below. > > I had already wondered about this case (large numbers being considered > strings), but I thought it was in compliance with POSIX. This will be > fixed by checking operands to be POSIX decimal integers. Both program and man page are fixed. [...] >> Therefore, "expr -e 1000000000000000000" is not a syntax error as seems >> to be required by the man page; the arg in it forms a degenerate >> expression. The arg is not a since it is an . >> Therefore, the expression is numeric (I didn't check that POSIX says >> this explicitly). Therefore, we are justified in applying strtoimax() >> to all the operands in the expression (all 1 of them) and getting a >> range error. The current behaviour was irritating in so far as the first the operand was printed, then an error message on the next line, and an exit code of 2 is returned. The error message comes from the check for 0 or "", which shall result in an exit code of 1, but for too large decimals an overflow is detected during these tests. This is fixed by first checking for overflow and then printing the result of the evaluated expression. A side effect is that "expr 0000000000010000000" now correctly prints "10000000" instead of "0000000000010000000". > I think this is rational behaviour of the code, but I'm not convinced, > that can not be and at the same time. > See for example: > > $ expr \( 222 \) : "2*" > 3 > $ expr \( 111 + 111 \) : "2*" > 3 > > The numeric result is taken as string to perform the RE matching and the > length of the first match is returned. Therefore, a clear integer can > still be interpreted as string, if operand to a string expression. > The degenerate case of just a too long integer might still be considered > a valid string and the missing operation (or identity?) does not change > that type. Therefore it might be argued, that a very long numeric > argument should just be printed as string in that case. (???) I have had a look at the POSIX page for EXRP and found that I had a wrong understanding of some terms. Each operand is either an integer or a string, but the result of an evaluation can be converted to either. Thus it is correct to enforce numeric interpretation on input that is syntactically parsed as integer number. >> The syntax is still broken as designed, since it doesn't allow +1 to >> be an integer, and it requires octal intgers to be misinterpreted >> as decimal integers although no reasonable specification of decimal >> integers allows them to start with a '0', and it doesn't support >> non-decimal integers... > > We *do* accept "+1" as synonymous to "1", though, as pointed out above. > I'd like to change this. Changed, as explained above. >>> Does POSIX require that expr exits on illegal arguments? >> >> Not sure. It requires an exit status of 2 for invalid expressions. >> For the "+" operator, the operands are required to be decimal >> integers, but the error handling isn't so clearly specified. For >> the "&" operator, operands(s) are allowed to be null. The patch I prepared supports empty operands for arithmetic operations, they are treated as "0": $ expr "" + 1 1 This is easily changed by enforcing at least one digit in is_integer(). (I have tested both versions, but since this appears to be traditional behaviour at least on FreeBSD, I'm worried that enforcing at least one digit for numeric operands will break existing shell scripts.) Please review the attached patch. It is again created with "-w" to ignore whitespace changes (indentation). Regards, STefan From owner-freebsd-standards@FreeBSD.ORG Thu Jun 30 16:33:38 2011 Return-Path: Delivered-To: standards@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 54AC41065674 for ; Thu, 30 Jun 2011 16:33:38 +0000 (UTC) (envelope-from se@freebsd.org) Received: from nm19.bullet.mail.sp2.yahoo.com (nm19.bullet.mail.sp2.yahoo.com [98.139.91.89]) by mx1.freebsd.org (Postfix) with SMTP id 2ED4F8FC1A for ; Thu, 30 Jun 2011 16:33:38 +0000 (UTC) Received: from [98.139.91.67] by nm19.bullet.mail.sp2.yahoo.com with NNFMP; 30 Jun 2011 16:33:37 -0000 Received: from [208.71.42.200] by tm7.bullet.mail.sp2.yahoo.com with NNFMP; 30 Jun 2011 16:33:37 -0000 Received: from [127.0.0.1] by smtp211.mail.gq1.yahoo.com with NNFMP; 30 Jun 2011 16:33:37 -0000 X-Yahoo-Newman-Id: 326617.47511.bm@smtp211.mail.gq1.yahoo.com Received: from [192.168.119.20] (se@81.173.186.241 with plain) by smtp211.mail.gq1.yahoo.com with SMTP; 30 Jun 2011 09:33:36 -0700 PDT X-Yahoo-SMTP: iDf2N9.swBDAhYEh7VHfpgq0lnq. X-YMail-OSG: _JDLKuIVM1k_rHOWHKkYnf4QIzBgpT0NrnSbF1HiUld8JvT ErT9FTe9ZyNnyz0UgI0LZOQOszQLCcUR3o0YyRfxHGAgNl4cXBZMSwc.2Jop OVC_THnzorKkUGylY5NuuAxr6wLQ5WUMvVyLLGvMImCoFU0G0n98VHBemQpz wwmPVCuxgRMqzn.Ro_brwKoInZwfjcTRe7MeiW9iIdj8u.OSOZunMgK4nCvF vWelmzjCrqfFDDtob28zy9le3TfT59CrZlPlUvKkDJ.qOOnrNuivO02xoFRR VyMpAoNLYwtJkx7.tIe5W2JITl4XBprhiKsIbVT69ZGT8pxnAGEyAnDFewgV yoOZSlP7AsRdIJy9qJcOJ17QO8A-- X-Yahoo-Newman-Property: ymail-3 Message-ID: <4E0CA55E.2090102@freebsd.org> Date: Thu, 30 Jun 2011 18:33:34 +0200 From: Stefan Esser User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20110624 Thunderbird/5.0 MIME-Version: 1.0 To: Bruce Evans References: <99048.1309258976@critter.freebsd.dk> <4E0A0774.3090004@freebsd.org> <20110629082103.O1084@besplex.bde.org> <4E0B1C47.4010201@freebsd.org> <20110630073705.P1117@besplex.bde.org> <4E0C2F41.2060302@freebsd.org> In-Reply-To: <4E0C2F41.2060302@freebsd.org> Content-Type: multipart/mixed; boundary="------------070607050902030603000606" Cc: Alexander Best , Poul-Henning Kamp , standards@freebsd.org, Bruce Evans Subject: RESENT with patch ... Re: [RFC] Consistent numeric range for "expr" on all architectures X-BeenThere: freebsd-standards@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Standards compliance List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Jun 2011 16:33:38 -0000 This is a multi-part message in MIME format. --------------070607050902030603000606 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit The attached patch takes the feedback from Bruce into account. See details inline in the quoted mail, below. Any further comments? On 30.06.2011 10:09, Stefan Esser wrote: > On 30.06.2011 01:32, Bruce Evans wrote: >> On Wed, 29 Jun 2011, Stefan Esser wrote: >>> Am 29.06.2011 01:06, schrieb Bruce Evans: >>>> Other points: >>>> - `expr -e 10000000000000000000 + 0' (19 zeros) gives "Result too large", >>>> but it isn't the result that is too large, but the arg that is too large. >>>> This message is strerror(ERANGE) after strtoimax() sets errno to ERANGE. >>>> `expr -e 1000000000000000000 \* 10' gives "overflow". This message is >>>> correct, but it is in a different style to strerror() (uncapitalized, >>>> and more concise). >>> >>> The patch that I sent with my first message fixes this. The message is >>> changed from "Result too large" to "Not a valid integer: %s". >>> >>> ("non-numeric argument" is used in other places and I could adapt to >>> that, though I prefer to also see which argument was rejected. But I >>> think that "not a valid integer" better describes the situation.) >> >> I prefer "operand too large". A decimal integer operand that is too >> large to be represented by intmax_t is not really invalid, but is just >> too large. We already have a different message for invalid. > > Yes, I also thought the "non-numeric" message was misleading and that > "not a valid integer" was just slightly less misleading ;-) > > But "operand too large" correctly describes the situation and I'll just > add a test for that case (instead of just relying on strtoimax() to decide > about integer-ness). The patch introduces a function is_integer(), which performs a simple syntax check on operands to determine whether they are decimals. This function replaces the use of strtoimax() for the purpose of testing the operand, which fails to accept decimals too long to be represented by intmax_t. > According to POSIX, "+1" is no valid integer operand in "expr", should a > leading "+" also be detected and rejected? The is_integer() function only permits an optional leading minus sign followed by digits. The case of a only a minus sign passed as operand is already caught by the parser. (Aside: Is it correct to reject "expr -- - : -" as a syntax error?) >>> From an old (?) version of the man page: >> >> % Arithmetic operations are performed using signed integer math. If the -e >> % flag is specified, arithmetic uses the C intmax_t data type (the largest >> ^^^^^^^^^^ actual arithmentic, not just parsing; but >> for expr [-e] 10000000000000000000 we are >> doing actual arithmetic -- see below >> % integral type available), and expr will detect arithmetic overflow and >> % return an error indication. If a numeric operand is specified which is >> >> "Numeric" is not defined anywhere in the man page. This is the only >> use of it in the man page. It means "decimal integer" and should say >> precisely that. The only hint about this in the man page is the statement >> that "all integer operands are interpreted in base 10". The fuzziness >> extends to error messages saying "non-numeric argument" instead of >> "operand not a decimal integer [used in integer context]". > > Ok, all numeric arguments should be called "decimal integer" in the > documentation and error messages (if applicable). I have changed the wording in the man page. Critical review is welcome. >> % so large as to overflow conversion to an integer, it is parsed as a >> % string instead. If -e is not specified, arithmetic operations and pars- >> >> This specificially says that large operands are parsed as strings. >> Strangely, since large operands are only checked for with -e, only -e can >> get this right; without -e, large operands are not even detected. >> However, this is a bug in the man page -- see below. > > I had already wondered about this case (large numbers being considered > strings), but I thought it was in compliance with POSIX. This will be > fixed by checking operands to be POSIX decimal integers. Both program and man page are fixed. [...] >> Therefore, "expr -e 1000000000000000000" is not a syntax error as seems >> to be required by the man page; the arg in it forms a degenerate >> expression. The arg is not a since it is an . >> Therefore, the expression is numeric (I didn't check that POSIX says >> this explicitly). Therefore, we are justified in applying strtoimax() >> to all the operands in the expression (all 1 of them) and getting a >> range error. The current behaviour was irritating in so far as the first the operand was printed, then an error message on the next line, and an exit code of 2 is returned. The error message comes from the check for 0 or "", which shall result in an exit code of 1, but for too large decimals an overflow is detected during these tests. This is fixed by first checking for overflow and then printing the result of the evaluated expression. A side effect is that "expr 0000000000010000000" now correctly prints "10000000" instead of "0000000000010000000". > I think this is rational behaviour of the code, but I'm not convinced, > that can not be and at the same time. > See for example: > > $ expr \( 222 \) : "2*" > 3 > $ expr \( 111 + 111 \) : "2*" > 3 > > The numeric result is taken as string to perform the RE matching and the > length of the first match is returned. Therefore, a clear integer can > still be interpreted as string, if operand to a string expression. > The degenerate case of just a too long integer might still be considered > a valid string and the missing operation (or identity?) does not change > that type. Therefore it might be argued, that a very long numeric > argument should just be printed as string in that case. (???) I have had a look at the POSIX page for EXRP and found that I had a wrong understanding of some terms. Each operand is either an integer or a string, but the result of an evaluation can be converted to either. Thus it is correct to enforce numeric interpretation on input that is syntactically parsed as integer number. >> The syntax is still broken as designed, since it doesn't allow +1 to >> be an integer, and it requires octal intgers to be misinterpreted >> as decimal integers although no reasonable specification of decimal >> integers allows them to start with a '0', and it doesn't support >> non-decimal integers... > > We *do* accept "+1" as synonymous to "1", though, as pointed out above. > I'd like to change this. Changed, as explained above. >>> Does POSIX require that expr exits on illegal arguments? >> >> Not sure. It requires an exit status of 2 for invalid expressions. >> For the "+" operator, the operands are required to be decimal >> integers, but the error handling isn't so clearly specified. For >> the "&" operator, operands(s) are allowed to be null. The patch I prepared supports empty operands for arithmetic operations, they are treated as "0": $ expr "" + 1 1 This is easily changed by enforcing at least one digit in is_integer(). (I have tested both versions, but since this appears to be traditional behaviour at least on FreeBSD, I'm worried that enforcing at least one digit for numeric operands will break existing shell scripts.) Please review the attached patch. It is again created with "-w" to ignore whitespace changes (indentation). Regards, STefan --------------070607050902030603000606 Content-Type: text/plain; name="expr.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="expr.diff" --- expr.y~ +++ expr.y 2011-06-30 17:44:35.914960240 +0200 @@ -48,6 +48,7 @@ int chk_times(intmax_t, intmax_t, intmax_t); void free_value(struct val *); int is_zero_or_null(struct val *); +int is_integer(const char *); int isstring(struct val *); struct val *make_integer(intmax_t); struct val *make_str(const char *); @@ -65,13 +66,12 @@ struct val *op_plus(struct val *, struct val *); struct val *op_rem(struct val *, struct val *); struct val *op_times(struct val *, struct val *); -intmax_t to_integer(struct val *); +int to_integer(struct val *); void to_string(struct val *); int yyerror(const char *); int yylex(void); int yyparse(void); -static int eflag; char **av; %} @@ -134,37 +134,15 @@ make_str(const char *s) { struct val *vp; - char *ep; vp = (struct val *) malloc (sizeof (*vp)); if (vp == NULL || ((vp->u.s = strdup (s)) == NULL)) { errx(ERR_EXIT, "malloc() failed"); } - - /* - * Previously we tried to scan the string to see if it ``looked like'' - * an integer (erroneously, as it happened). Let strtoimax() do the - * dirty work. We could cache the value, except that we are using - * a union and need to preserve the original string form until we - * are certain that it is not needed. - * - * IEEE Std.1003.1-2001 says: - * /integer/ An argument consisting only of an (optional) unary minus - * followed by digits. - * - * This means that arguments which consist of digits followed by - * non-digits MUST NOT be considered integers. strtoimax() will - * figure this out for us. - */ - if (eflag) - (void)strtoimax(s, &ep, 10); + if (is_integer(s)) + vp->type = numeric_string; else - (void)strtol(s, &ep, 10); - - if (*ep != '\0') vp->type = string; - else - vp->type = numeric_string; return vp; } @@ -178,7 +156,7 @@ } -intmax_t +int to_integer(struct val *vp) { intmax_t i; @@ -191,13 +169,9 @@ /* vp->type == numeric_string, make it numeric */ errno = 0; - if (eflag) { - i = strtoimax(vp->u.s, (char **)NULL, 10); - if (errno == ERANGE) - err(ERR_EXIT, NULL); - } else { - i = strtol(vp->u.s, (char **)NULL, 10); - } + i = strtoimax(vp->u.s, (char **)NULL, 10); + if (errno == ERANGE) + errx(ERR_EXIT, "operand too large: '%s'", vp->u.s); free (vp->u.s); vp->u.i = i; @@ -230,6 +204,17 @@ int +is_integer(const char *s) +{ + if (*s == '-') + s++; + while (isdigit(*s)) + s++; + return *s == '\0'; +} + + +int isstring(struct val *vp) { /* only TRUE if this string is not a valid integer */ @@ -282,12 +267,11 @@ if (getenv("EXPR_COMPAT") != NULL || check_utility_compat("expr")) { av = argv + 1; - eflag = 1; } else { while ((c = getopt(argc, argv, "e")) != -1) switch (c) { case 'e': - eflag = 1; + /* ignore for backwards compatibility */ break; default: @@ -300,6 +284,8 @@ yyparse(); + /* convert type numeric_string to integer; NOP for other types */ + (void)to_integer(result); if (result->type == integer) printf("%jd\n", result->u.i); else @@ -483,13 +469,10 @@ errx(ERR_EXIT, "non-numeric argument"); } - if (eflag) { - r = make_integer(a->u.i + b->u.i); - if (chk_plus(a->u.i, b->u.i, r->u.i)) { - errx(ERR_EXIT, "overflow"); - } - } else - r = make_integer((long)a->u.i + (long)b->u.i); + r = make_integer(a->u.i + b->u.i); + if (chk_plus(a->u.i, b->u.i, r->u.i)) { + errx(ERR_EXIT, "overflow"); + } free_value (a); free_value (b); @@ -520,13 +503,10 @@ errx(ERR_EXIT, "non-numeric argument"); } - if (eflag) { - r = make_integer(a->u.i - b->u.i); - if (chk_minus(a->u.i, b->u.i, r->u.i)) { - errx(ERR_EXIT, "overflow"); - } - } else - r = make_integer((long)a->u.i - (long)b->u.i); + r = make_integer(a->u.i - b->u.i); + if (chk_minus(a->u.i, b->u.i, r->u.i)) { + errx(ERR_EXIT, "overflow"); + } free_value (a); free_value (b); @@ -554,13 +534,10 @@ errx(ERR_EXIT, "non-numeric argument"); } - if (eflag) { - r = make_integer(a->u.i * b->u.i); - if (chk_times(a->u.i, b->u.i, r->u.i)) { - errx(ERR_EXIT, "overflow"); - } - } else - r = make_integer((long)a->u.i * (long)b->u.i); + r = make_integer(a->u.i * b->u.i); + if (chk_times(a->u.i, b->u.i, r->u.i)) { + errx(ERR_EXIT, "overflow"); + } free_value (a); free_value (b); @@ -591,13 +568,10 @@ errx(ERR_EXIT, "division by zero"); } - if (eflag) { - r = make_integer(a->u.i / b->u.i); - if (chk_div(a->u.i, b->u.i)) { - errx(ERR_EXIT, "overflow"); - } - } else - r = make_integer((long)a->u.i / (long)b->u.i); + r = make_integer(a->u.i / b->u.i); + if (chk_div(a->u.i, b->u.i)) { + errx(ERR_EXIT, "overflow"); + } free_value (a); free_value (b); @@ -617,11 +591,8 @@ errx(ERR_EXIT, "division by zero"); } - if (eflag) - r = make_integer(a->u.i % b->u.i); - /* chk_rem necessary ??? */ - else - r = make_integer((long)a->u.i % (long)b->u.i); + r = make_integer(a->u.i % b->u.i); + /* chk_rem necessary ??? */ free_value (a); free_value (b); --- expr.1~ +++ expr.1 2011-06-30 16:06:50.338086000 +0200 @@ -50,25 +50,19 @@ All operators and operands must be passed as separate arguments. Several of the operators have special meaning to command interpreters and must therefore be quoted appropriately. -All integer operands are interpreted in base 10. +All integer operands are interpreted in base 10 and must only consist +of an optional leading minus sign followed by one or more digits. .Pp -Arithmetic operations are performed using signed integer math. -If the -.Fl e -flag is specified, arithmetic uses the C +Arithmetic operations are performed using signed integer math with a +range according to the C .Vt intmax_t -data type (the largest integral type available), and -.Nm -will detect arithmetic overflow and return an error indication. -If a numeric operand is specified which is so large as to overflow -conversion to an integer, it is parsed as a string instead. -If +data type (the largest signed integral type available). All conversions and +operations are checked for overflow, which will result in program termination +with an error message on stdout and with error status being returned. +.Pp +The .Fl e -is not specified, arithmetic operations and parsing of integer -arguments will overflow silently according to the rules of the C -standard, using the -.Vt long -data type. +option is accepted for backwards compatibility but has no effect. .Pp Operators are listed below in order of increasing precedence; all are left-associative. @@ -272,6 +266,8 @@ utility conforms to .St -p1003.1-2001 , provided that compatibility mode is not enabled. +Extended arithmetic range and overflow checks apply to cases of +undefined behavior according to POSIX. The .Fl e flag is an extension. --------------070607050902030603000606-- From owner-freebsd-standards@FreeBSD.ORG Thu Jun 30 16:59:05 2011 Return-Path: Delivered-To: freebsd-standards@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 21E9C106566C for ; Thu, 30 Jun 2011 16:59:05 +0000 (UTC) (envelope-from das@FreeBSD.ORG) Received: from zim.MIT.EDU (ZIM.MIT.EDU [18.95.3.101]) by mx1.freebsd.org (Postfix) with ESMTP id DFF218FC15 for ; Thu, 30 Jun 2011 16:59:04 +0000 (UTC) Received: from zim.MIT.EDU (localhost [127.0.0.1]) by zim.MIT.EDU (8.14.4/8.14.2) with ESMTP id p5UGkr7W083426; Thu, 30 Jun 2011 12:46:53 -0400 (EDT) (envelope-from das@FreeBSD.ORG) Received: (from das@localhost) by zim.MIT.EDU (8.14.4/8.14.2/Submit) id p5UGkrOM083425; Thu, 30 Jun 2011 12:46:53 -0400 (EDT) (envelope-from das@FreeBSD.ORG) Date: Thu, 30 Jun 2011 12:46:53 -0400 From: David Schultz To: Stefan Esser Message-ID: <20110630164653.GA82980@zim.MIT.EDU> Mail-Followup-To: Stefan Esser , freebsd-standards@freebsd.org References: <4E09AF8E.5010509@freebsd.org> <4E0B860E.50504@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4E0B860E.50504@freebsd.org> Cc: freebsd-standards@FreeBSD.ORG Subject: Re: Fwd: [RFC] Consistent numeric range for "expr" on all architectures X-BeenThere: freebsd-standards@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Standards compliance List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Jun 2011 16:59:05 -0000 On Wed, Jun 29, 2011, Stefan Esser wrote: > My suggestion is to make the following modifications to expr: > > 1) Always compute with 64 bit range and overflow checks enabled. This > means, that the "-e" option is always assumed. > > 2) Continue to support the "-e" option as a NOP for compatibility with > scripts that knew about that FreeBSD extension. Using 64-bit signed integer arithmetic consistently in expr sounds like a reasonable idea. There's no reason shell scripts ought to be exposed to architectural details. There are two problems, however. First, the implementation doesn't do this: it uses intmax_t, which has platform-dependent width, at least in theory. Second, it sounds like POSIX doesn't allow this (although I don't know where this requirement comes from): | r96367 | wollman | 2002-05-10 18:59:29 -0400 (Fri, 10 May 2002) | 5 lines | | The response to my POSIX interpretation request says that `expr' | is required to be oblivious to overflow and to use the data type `long'. | (Division by zero is undefined in ISO C so it's still OK to check for it | here.) Add a new `-e' flag to get the old, more useful behavior. Nevertheless, moving to int64_t arithmetic by default seems like a good idea, with low potential for complications. Overflow checking is a separate concern, and one that is more likely to cause problems. I'd be more careful about changing the default behavior there, because some scripts might rely on modular arithmetic without overflow checks. Another thing that is likely to cause confusion is expr(1) behaving differently from the shell builtin, and differently from the shell's arithmetic. Above all else, I suggest trying to make everything consistent...perhaps talk to jilles@ about the shell side of things. From owner-freebsd-standards@FreeBSD.ORG Thu Jun 30 17:12:47 2011 Return-Path: Delivered-To: standards@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3DD5E1065677 for ; Thu, 30 Jun 2011 17:12:47 +0000 (UTC) (envelope-from se@freebsd.org) Received: from nm16.bullet.mail.sp2.yahoo.com (nm16.bullet.mail.sp2.yahoo.com [98.139.91.86]) by mx1.freebsd.org (Postfix) with SMTP id 1C5858FC1B for ; Thu, 30 Jun 2011 17:12:47 +0000 (UTC) Received: from [98.139.91.61] by nm16.bullet.mail.sp2.yahoo.com with NNFMP; 30 Jun 2011 17:12:46 -0000 Received: from [208.71.42.192] by tm1.bullet.mail.sp2.yahoo.com with NNFMP; 30 Jun 2011 17:12:46 -0000 Received: from [127.0.0.1] by smtp203.mail.gq1.yahoo.com with NNFMP; 30 Jun 2011 17:12:46 -0000 X-Yahoo-Newman-Id: 493635.11874.bm@smtp203.mail.gq1.yahoo.com Received: from [192.168.119.20] (se@81.173.186.241 with plain) by smtp203.mail.gq1.yahoo.com with SMTP; 30 Jun 2011 10:12:46 -0700 PDT X-Yahoo-SMTP: iDf2N9.swBDAhYEh7VHfpgq0lnq. X-YMail-OSG: DUS1HSYVM1mONaYXbrRosb.IOMHctps.pGx6EOc1ZIXX8fX NORcnMFHxeeHSI6UwLo7DKALBnNKotLqPYerfgGfbGrm2aGLVDScn3AZOfNS DbU7hx4LkGGO_fIuBHZgziDvizLrsnWfT8bZHH6m4STJuWVmsEhrE7F1lN1L Mfhnmgj.F4EBQFHch7.Pqb8Q.iuU3uIHNaXIuoDtvjocsawFoGTTm.7tQx00 x7l.KA1TOPdyGsAwh1oX7N5ZEHeFj2aCX2nyTahRhEJ3Obk_VmWlxYryWGUh gfAmoMhp_2S7ITmo2GfkxpYunn0v3gXUgXhiYJvkKDfO1QkVcGOszth8sUOB CXaPhkyIR._auVLpbeINclZB29g-- X-Yahoo-Newman-Property: ymail-3 Message-ID: <4E0CAE8B.6030309@freebsd.org> Date: Thu, 30 Jun 2011 19:12:43 +0200 From: Stefan Esser User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20110624 Thunderbird/5.0 MIME-Version: 1.0 To: Bruce Evans , Alexander Best , Poul-Henning Kamp , standards@freebsd.org, Bruce Evans References: <99048.1309258976@critter.freebsd.dk> <4E0A0774.3090004@freebsd.org> <20110629082103.O1084@besplex.bde.org> <4E0B1C47.4010201@freebsd.org> <20110630073705.P1117@besplex.bde.org> <4E0C2F41.2060302@freebsd.org> <4E0CA55E.2090102@freebsd.org> <20110630165050.GB82980@zim.MIT.EDU> In-Reply-To: <20110630165050.GB82980@zim.MIT.EDU> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: Subject: Re: RESENT with patch ... Re: [RFC] Consistent numeric range for "expr" on all architectures X-BeenThere: freebsd-standards@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Standards compliance List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Jun 2011 17:12:47 -0000 Am 30.06.2011 18:50, schrieb David Schultz: > On Thu, Jun 30, 2011, Stefan Esser wrote: >> int >> +is_integer(const char *s) >> +{ >> + if (*s == '-') >> + s++; >> + while (isdigit(*s)) >> + s++; >> + return *s == '\0'; >> +} > > I only glanced at the patch for a few seconds, but your > is_integer() routine will accept a bare minus sign ("-") as an > integer. I mentioned in the message, that I do not test for empty numeric operands. I had considered: int is_integer(const char *s) { if (*s == '-') s++; if (!isdigit(*s++) return 0; while (isdigit(*s)) s++; return *s == '\0'; } But this will break script that do while : do i=`expr "$i" + 1` done without first setting i=0. And I assume, such scripts exist in huge numbers. But "-" will not be accepted as operand (neither integer nor string). Didn't I give the example "expr -- - : -" which fails with a syntax error? Therefore, I do not need to check for actual digits in the input (and thus may accept "" as 0 for backwards compatibility), but "-" will not be allowed as parameter to expr by the parser ("syntax error"). Regards, STefan From owner-freebsd-standards@FreeBSD.ORG Thu Jun 30 17:18:59 2011 Return-Path: Delivered-To: freebsd-standards@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 74F4C106566C for ; Thu, 30 Jun 2011 17:18:59 +0000 (UTC) (envelope-from se@freebsd.org) Received: from nm30.bullet.mail.sp2.yahoo.com (nm30.bullet.mail.sp2.yahoo.com [98.139.91.100]) by mx1.freebsd.org (Postfix) with SMTP id 580618FC1A for ; Thu, 30 Jun 2011 17:18:59 +0000 (UTC) Received: from [98.139.91.67] by nm30.bullet.mail.sp2.yahoo.com with NNFMP; 30 Jun 2011 17:06:08 -0000 Received: from [98.136.185.47] by tm7.bullet.mail.sp2.yahoo.com with NNFMP; 30 Jun 2011 17:06:08 -0000 Received: from [127.0.0.1] by smtp108.mail.gq1.yahoo.com with NNFMP; 30 Jun 2011 17:06:08 -0000 X-Yahoo-Newman-Id: 913956.36989.bm@smtp108.mail.gq1.yahoo.com Received: from [192.168.119.20] (se@81.173.186.241 with plain) by smtp108.mail.gq1.yahoo.com with SMTP; 30 Jun 2011 10:06:06 -0700 PDT X-Yahoo-SMTP: iDf2N9.swBDAhYEh7VHfpgq0lnq. X-YMail-OSG: QxSXANIVM1njiZAbBuil7_IdtFmU6BPtQApEsK6n6bG2zXU Wg8scr_X1q8JtnZCRtyWOTTZ_Pu8kmZ2glahJLLcKyZPaRaiVKjZigVAu_rl eeKaCR0v3X1Nw.kD9AqpUn0iSj3AsYL9vuIRJT5W7Cme7Oqao_tPkzxA0wEW ILapMHU_854MxKoqAaZjzfzvJud2_a33c492KEHM.sIbCW324mBlq5mZT3In mYXjmTxRcH5dHg3XGhbQSPbucJDQXHtkaEuOebn3HgAIDKt2wvaRrzMxwW49 zfhGRazFsC.8imzswD4kKNfLSEHO9IeBMzpNlLR3FHBopeBdsrcB5EUoK.Tg Bx9AQ77tEBDqo8DYwQ__SHhscSw-- X-Yahoo-Newman-Property: ymail-3 Message-ID: <4E0CACFB.8040906@freebsd.org> Date: Thu, 30 Jun 2011 19:06:03 +0200 From: Stefan Esser User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20110624 Thunderbird/5.0 MIME-Version: 1.0 To: freebsd-standards@freebsd.org References: <4E09AF8E.5010509@freebsd.org> <4E0B860E.50504@freebsd.org> <20110630164653.GA82980@zim.MIT.EDU> In-Reply-To: <20110630164653.GA82980@zim.MIT.EDU> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Subject: Re: Fwd: [RFC] Consistent numeric range for "expr" on all architectures X-BeenThere: freebsd-standards@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Standards compliance List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Jun 2011 17:18:59 -0000 Am 30.06.2011 18:46, schrieb David Schultz: > On Wed, Jun 29, 2011, Stefan Esser wrote: >> My suggestion is to make the following modifications to expr: >> >> 1) Always compute with 64 bit range and overflow checks enabled. This >> means, that the "-e" option is always assumed. >> >> 2) Continue to support the "-e" option as a NOP for compatibility with >> scripts that knew about that FreeBSD extension. > > Using 64-bit signed integer arithmetic consistently in expr sounds > like a reasonable idea. There's no reason shell scripts ought to > be exposed to architectural details. Yes, especially when you move them from e.g. amd64 to i386 ... > There are two problems, however. First, the implementation > doesn't do this: it uses intmax_t, which has platform-dependent > width, at least in theory. Second, it sounds like POSIX doesn't Yes, but it seems that intmax_t is guaranteed to be at least 64bit wide. > allow this (although I don't know where this requirement comes > from): > > | r96367 | wollman | 2002-05-10 18:59:29 -0400 (Fri, 10 May 2002) | 5 lines > | > | The response to my POSIX interpretation request says that `expr' > | is required to be oblivious to overflow and to use the data type `long'. > | (Division by zero is undefined in ISO C so it's still OK to check for it > | here.) Add a new `-e' flag to get the old, more useful behavior. Well, that's the frustrating part: I had implemented 64bit range in expr back in 2000, but this extended range (on 32bit archs) has been made optional, some two years later with the commit message you quote. In between, a number of people were hurt by scripts that silently truncated numeric operands or results, and during a mail exchange in that context, the above requirement was said to not really apply. The reasoning is, that overflow beyond the range of signed long leads to unspecified behaviour. And one possibility to deal with unspecified behaviour it to just return the "correct" result without wrap-aroung. The range check is also possible in that case, AFAIK. Once you are outside the specified range, POSIX allows extensions that do just what I propose (and what already was in FreeBSD, 10 years ago). > Nevertheless, moving to int64_t arithmetic by default seems like a > good idea, with low potential for complications. On amd64 and other 64 bit architectures the numeric range already is 64bit. > Overflow checking is a separate concern, and one that is more > likely to cause problems. I'd be more careful about changing the > default behavior there, because some scripts might rely on modular > arithmetic without overflow checks. You cannot portably rely in overflow, since you have no guarantee that overflow occurs at a specific boundary. > Another thing that is likely to cause confusion is expr(1) > behaving differently from the shell builtin, and differently from > the shell's arithmetic. Above all else, I suggest trying to make > everything consistent...perhaps talk to jilles@ about the shell > side of things. My change in 2000 made the shell builtin do 64bit arithmetic as well (in fact, the shell build just included the expr.y file ...). But there is no expr builtin anymore, so there is no risk here ;-) Regards, STefan From owner-freebsd-standards@FreeBSD.ORG Thu Jun 30 17:29:04 2011 Return-Path: Delivered-To: standards@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id ECCD51065674; Thu, 30 Jun 2011 17:29:04 +0000 (UTC) (envelope-from das@FreeBSD.ORG) Received: from zim.MIT.EDU (ZIM.MIT.EDU [18.95.3.101]) by mx1.freebsd.org (Postfix) with ESMTP id 85A6D8FC13; Thu, 30 Jun 2011 17:29:04 +0000 (UTC) Received: from zim.MIT.EDU (localhost [127.0.0.1]) by zim.MIT.EDU (8.14.4/8.14.2) with ESMTP id p5UGooOk083449; Thu, 30 Jun 2011 12:50:50 -0400 (EDT) (envelope-from das@FreeBSD.ORG) Received: (from das@localhost) by zim.MIT.EDU (8.14.4/8.14.2/Submit) id p5UGoo1X083448; Thu, 30 Jun 2011 12:50:50 -0400 (EDT) (envelope-from das@FreeBSD.ORG) Date: Thu, 30 Jun 2011 12:50:50 -0400 From: David Schultz To: Stefan Esser Message-ID: <20110630165050.GB82980@zim.MIT.EDU> Mail-Followup-To: Stefan Esser , Bruce Evans , Alexander Best , Poul-Henning Kamp , standards@freebsd.org, Bruce Evans References: <99048.1309258976@critter.freebsd.dk> <4E0A0774.3090004@freebsd.org> <20110629082103.O1084@besplex.bde.org> <4E0B1C47.4010201@freebsd.org> <20110630073705.P1117@besplex.bde.org> <4E0C2F41.2060302@freebsd.org> <4E0CA55E.2090102@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4E0CA55E.2090102@freebsd.org> Cc: Alexander Best , Poul-Henning Kamp , standards@FreeBSD.ORG, Bruce Evans Subject: Re: RESENT with patch ... Re: [RFC] Consistent numeric range for "expr" on all architectures X-BeenThere: freebsd-standards@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Standards compliance List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Jun 2011 17:29:05 -0000 On Thu, Jun 30, 2011, Stefan Esser wrote: > int > +is_integer(const char *s) > +{ > + if (*s == '-') > + s++; > + while (isdigit(*s)) > + s++; > + return *s == '\0'; > +} I only glanced at the patch for a few seconds, but your is_integer() routine will accept a bare minus sign ("-") as an integer. From owner-freebsd-standards@FreeBSD.ORG Thu Jun 30 17:42:09 2011 Return-Path: Delivered-To: freebsd-standards@FreeBSD.ORG Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1BE8C106564A; Thu, 30 Jun 2011 17:42:09 +0000 (UTC) (envelope-from das@FreeBSD.ORG) Received: from zim.MIT.EDU (ZIM.MIT.EDU [18.95.3.101]) by mx1.freebsd.org (Postfix) with ESMTP id 8828A8FC14; Thu, 30 Jun 2011 17:42:08 +0000 (UTC) Received: from zim.MIT.EDU (localhost [127.0.0.1]) by zim.MIT.EDU (8.14.4/8.14.2) with ESMTP id p5UHg8VQ083790; Thu, 30 Jun 2011 13:42:08 -0400 (EDT) (envelope-from das@FreeBSD.ORG) Received: (from das@localhost) by zim.MIT.EDU (8.14.4/8.14.2/Submit) id p5UHg8FI083789; Thu, 30 Jun 2011 13:42:08 -0400 (EDT) (envelope-from das@FreeBSD.ORG) Date: Thu, 30 Jun 2011 13:42:08 -0400 From: David Schultz To: Stefan Esser Message-ID: <20110630174208.GA83700@zim.MIT.EDU> Mail-Followup-To: Stefan Esser , freebsd-standards@freebsd.org References: <4E09AF8E.5010509@freebsd.org> <4E0B860E.50504@freebsd.org> <20110630164653.GA82980@zim.MIT.EDU> <4E0CACFB.8040906@freebsd.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4E0CACFB.8040906@freebsd.org> Cc: freebsd-standards@FreeBSD.ORG Subject: Re: Fwd: [RFC] Consistent numeric range for "expr" on all architectures X-BeenThere: freebsd-standards@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Standards compliance List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Jun 2011 17:42:09 -0000 On Thu, Jun 30, 2011, Stefan Esser wrote: > Am 30.06.2011 18:46, schrieb David Schultz: > > There are two problems, however. First, the implementation > > doesn't do this: it uses intmax_t, which has platform-dependent > > width, at least in theory. Second, it sounds like POSIX doesn't > > Yes, but it seems that intmax_t is guaranteed to be at least 64bit wide. The distinction is that int64_t is always exactly 64 bits, whereas intmax_t could be wider. If the goal is to guarantee that expr's arithmetic is architecture-independent, then int64_t is a more appropriate type. (This requires a narrowing conversion and range check after strtoimax(), but the compiler should optimize it away.) > > Overflow checking is a separate concern, and one that is more > > likely to cause problems. I'd be more careful about changing the > > default behavior there, because some scripts might rely on modular > > arithmetic without overflow checks. > > You cannot portably rely in overflow, since you have no guarantee that > overflow occurs at a specific boundary. My point was that some scripts might rely on the lack of overflow checking anyway. This is a separate and potentially more problematic change. But I'm not trying to argue that it's necessarily a bad idea. > > Another thing that is likely to cause confusion is expr(1) > > behaving differently from the shell builtin, and differently from > > the shell's arithmetic. Above all else, I suggest trying to make > > everything consistent...perhaps talk to jilles@ about the shell > > side of things. > > My change in 2000 made the shell builtin do 64bit arithmetic as well (in > fact, the shell build just included the expr.y file ...). I don't know the state of the shell these days. Didn't jilles@ change the shell to do that more recently? My point here is that working to make both the shell and expr behave identically would be ideal. From owner-freebsd-standards@FreeBSD.ORG Thu Jun 30 18:04:50 2011 Return-Path: Delivered-To: freebsd-standards@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 419D61065672 for ; Thu, 30 Jun 2011 18:04:50 +0000 (UTC) (envelope-from wollman@khavrinen.csail.mit.edu) Received: from khavrinen.csail.mit.edu (khavrinen.csail.mit.edu [128.30.28.20]) by mx1.freebsd.org (Postfix) with ESMTP id 036C78FC0C for ; Thu, 30 Jun 2011 18:04:49 +0000 (UTC) Received: from khavrinen.csail.mit.edu (localhost [127.0.0.1]) by khavrinen.csail.mit.edu (8.14.4/8.14.4) with ESMTP id p5UHqslg094336 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL CN=khavrinen.csail.mit.edu issuer=Client+20CA); Thu, 30 Jun 2011 13:52:54 -0400 (EDT) (envelope-from wollman@khavrinen.csail.mit.edu) Received: (from wollman@localhost) by khavrinen.csail.mit.edu (8.14.4/8.14.4/Submit) id p5UHqsNq094333; Thu, 30 Jun 2011 13:52:54 -0400 (EDT) (envelope-from wollman) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <19980.47094.184089.398324@khavrinen.csail.mit.edu> Date: Thu, 30 Jun 2011 13:52:54 -0400 From: Garrett Wollman To: Stefan Esser In-Reply-To: <4E0CACFB.8040906@freebsd.org> References: <4E09AF8E.5010509@freebsd.org> <4E0B860E.50504@freebsd.org> <20110630164653.GA82980@zim.MIT.EDU> <4E0CACFB.8040906@freebsd.org> X-Mailer: VM 7.17 under 21.4 (patch 22) "Instant Classic" XEmacs Lucid X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7 (khavrinen.csail.mit.edu [127.0.0.1]); Thu, 30 Jun 2011 13:52:54 -0400 (EDT) Cc: freebsd-standards@freebsd.org Subject: Re: Fwd: [RFC] Consistent numeric range for "expr" on all architectures X-BeenThere: freebsd-standards@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Standards compliance List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Jun 2011 18:04:50 -0000 < said: > Well, that's the frustrating part: I had implemented 64bit range in expr > back in 2000, but this extended range (on 32bit archs) has been made > optional, some two years later with the commit message you quote. The 2001 POSIX standard states: > Integer variables and constants, including the values of operands > and option-arguments, used by the standard utilities listed in this > volume of IEEE Std 1003.1-2001 shall be implemented as equivalent to > the ISO C standard signed long data type; floating point shall be > implemented as equivalent to the ISO C standard double > type. Conversions between types shall be as described in the ISO C > standard. All variables shall be initialized to zero if they are not > otherwise assigned by the input to the application. > Arithmetic operators and control flow keywords shall be implemented > as equivalent to those in the cited ISO C standard section, as > listed in Table 1-2 (on page 8). However, for arithemtic expansion in the shell (but *not* the expr utility), it says: > As an extension, the shell may recognize arithmetic expressions > beyond those listed. The shell may use a signed integer type with a > rank larger than the rank of signed long. The shell may use a > real-floating type instead of signed long as long as it does not > affect the results in cases where there is no overflow. The language in the 2008 version of the standard is the same. For expr, the following definitions are relevant (from XCU7 page 2715): > integer An argument consisting only of an (optional) unary minus > followed by digits. > string A string argument; see below. [...] > A string argument is an argument that cannot be identified as an > integer argument or as one of the expression operator symbols shown > in the OPERANDS section. I'm unable to find the response to my interpretation request in the Austin Group mail archives. -GAWollman From owner-freebsd-standards@FreeBSD.ORG Thu Jun 30 22:52:43 2011 Return-Path: Delivered-To: freebsd-standards@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F007A106566B; Thu, 30 Jun 2011 22:52:42 +0000 (UTC) (envelope-from jilles@stack.nl) Received: from mx1.stack.nl (relay04.stack.nl [IPv6:2001:610:1108:5010::107]) by mx1.freebsd.org (Postfix) with ESMTP id 8E8C88FC14; Thu, 30 Jun 2011 22:52:42 +0000 (UTC) Received: from turtle.stack.nl (turtle.stack.nl [IPv6:2001:610:1108:5010::132]) by mx1.stack.nl (Postfix) with ESMTP id 34AF51DD7BF; Fri, 1 Jul 2011 00:52:41 +0200 (CEST) Received: by turtle.stack.nl (Postfix, from userid 1677) id 22478174D1; Fri, 1 Jul 2011 00:52:41 +0200 (CEST) Date: Fri, 1 Jul 2011 00:52:41 +0200 From: Jilles Tjoelker To: Stefan Esser , freebsd-standards@freebsd.org Message-ID: <20110630225240.GA59192@stack.nl> References: <4E09AF8E.5010509@freebsd.org> <4E0B860E.50504@freebsd.org> <20110630164653.GA82980@zim.MIT.EDU> <4E0CACFB.8040906@freebsd.org> <20110630174208.GA83700@zim.MIT.EDU> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110630174208.GA83700@zim.MIT.EDU> User-Agent: Mutt/1.5.21 (2010-09-15) Cc: Subject: Re: Fwd: [RFC] Consistent numeric range for "expr" on all architectures X-BeenThere: freebsd-standards@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Standards compliance List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 30 Jun 2011 22:52:43 -0000 On Thu, Jun 30, 2011 at 01:42:08PM -0400, David Schultz wrote: > > > Overflow checking is a separate concern, and one that is more > > > likely to cause problems. I'd be more careful about changing the > > > default behavior there, because some scripts might rely on modular > > > arithmetic without overflow checks. > > You cannot portably rely in overflow, since you have no guarantee that > > overflow occurs at a specific boundary. > My point was that some scripts might rely on the lack of overflow > checking anyway. This is a separate and potentially more > problematic change. But I'm not trying to argue that it's > necessarily a bad idea. tools/regression/bin/sh/expansion/arith11.0 depends on two's complement overflow behaviour, to test dividing the smallest integer by -1. > > > Another thing that is likely to cause confusion is expr(1) > > > behaving differently from the shell builtin, and differently from > > > the shell's arithmetic. Above all else, I suggest trying to make > > > everything consistent...perhaps talk to jilles@ about the shell > > > side of things. > > My change in 2000 made the shell builtin do 64bit arithmetic as well (in > > fact, the shell build just included the expr.y file ...). As far as I can see from the history, there has never been an expr(1) builtin in FreeBSD. The commented line in builtins.def is deceiving: it was for Almquist's combined expr and test (which cannot possibly work reliably and has never been enabled in FreeBSD) but test and [ were removed from the line when a test(1) builtin was added. There is a let builtin, with an alias "exp" in 8.x and older, but this simply concatenates its arguments and evaluates the result as an arithmetic expression $((...)). I strongly discourage using "let". > I don't know the state of the shell these days. Didn't jilles@ > change the shell to do that more recently? My point here is that > working to make both the shell and expr behave identically would > be ideal. Shell arithmetic has been intmax_t for a while. At least in all versions with intmax_t, the smallest integer cannot be specified as a literal. This is because the minus sign is not part of the literal and the absolute value of the smallest integer (like 9223372036854775808) does not fit in an intmax_t and is silently clipped (for example to 9223372036854775807). This problem does not occur when the number is stored in a variable and the variable is used in the expression without a dollar sign. The smallest integer will be processed correctly in that case as the whole string is passed to strtoimax(). The new arithmetic code in 9.0 has brought some subtle changes. If a variable is used in an expression without a dollar sign, out of range values are now errors (previously such values were silently clipped). Apparently ports and other scripts do not mind this check. Regular literals are still silently clipped. This usage of strtoimax() also means that things like echo $(($(printf "%#x" -1) )) or v=$(printf "%#x" -1); echo $((v)) do not work as might be expected. A different overflow behaviour for arithmetic operations might be dangerous, breaking ports and/or other scripts. Also note that errors in expr(1) and shell arithmetic are handled rather differently. The former are usually ignored (though an error message will probably be printed) while the latter will almost always cause the shell to stop executing the subshell or script. -- Jilles Tjoelker From owner-freebsd-standards@FreeBSD.ORG Fri Jul 1 00:27:06 2011 Return-Path: Delivered-To: freebsd-standards@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9D14A106566C; Fri, 1 Jul 2011 00:27:06 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail07.syd.optusnet.com.au (mail07.syd.optusnet.com.au [211.29.132.188]) by mx1.freebsd.org (Postfix) with ESMTP id 398788FC13; Fri, 1 Jul 2011 00:27:05 +0000 (UTC) Received: from c122-106-165-191.carlnfd1.nsw.optusnet.com.au (c122-106-165-191.carlnfd1.nsw.optusnet.com.au [122.106.165.191]) by mail07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p610R3vF023067 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 1 Jul 2011 10:27:04 +1000 Date: Fri, 1 Jul 2011 10:27:03 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stefan Esser In-Reply-To: <4E0CACFB.8040906@freebsd.org> Message-ID: <20110701100701.U1279@besplex.bde.org> References: <4E09AF8E.5010509@freebsd.org> <4E0B860E.50504@freebsd.org> <20110630164653.GA82980@zim.MIT.EDU> <4E0CACFB.8040906@freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-standards@freebsd.org Subject: Re: Fwd: [RFC] Consistent numeric range for "expr" on all architectures X-BeenThere: freebsd-standards@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Standards compliance List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Jul 2011 00:27:06 -0000 On Thu, 30 Jun 2011, Stefan Esser wrote: > Am 30.06.2011 18:46, schrieb David Schultz: >> On Wed, Jun 29, 2011, Stefan Esser wrote: >>> My suggestion is to make the following modifications to expr: >>> >>> 1) Always compute with 64 bit range and overflow checks enabled. This >>> means, that the "-e" option is always assumed. >>> >>> 2) Continue to support the "-e" option as a NOP for compatibility with >>> scripts that knew about that FreeBSD extension. >> >> Using 64-bit signed integer arithmetic consistently in expr sounds >> like a reasonable idea. There's no reason shell scripts ought to >> be exposed to architectural details. > > Yes, especially when you move them from e.g. amd64 to i386 ... > >> There are two problems, however. First, the implementation >> doesn't do this: it uses intmax_t, which has platform-dependent >> width, at least in theory. Second, it sounds like POSIX doesn't > > Yes, but it seems that intmax_t is guaranteed to be at least 64bit wide. It was intentional to use intmax_t instead of int64_t. This allows for future expansion. The range checking will detect unportabilities in scripts that depend on intmax_t being even larger than int64_t, and hopefully prevent existence of such scripts. >> allow this (although I don't know where this requirement comes >> from): >> >> | r96367 | wollman | 2002-05-10 18:59:29 -0400 (Fri, 10 May 2002) | 5 lines >> | >> | The response to my POSIX interpretation request says that `expr' >> | is required to be oblivious to overflow and to use the data type `long'. I don't think POSIX is that broken :-). >> | (Division by zero is undefined in ISO C so it's still OK to check for it >> | here.) Add a new `-e' flag to get the old, more useful behavior. Behaviour on overflow is equally undefined. >> Overflow checking is a separate concern, and one that is more >> likely to cause problems. I'd be more careful about changing the >> default behavior there, because some scripts might rely on modular >> arithmetic without overflow checks. > > You cannot portably rely in overflow, since you have no guarantee that > overflow occurs at a specific boundary. And that is when it is implementation-defined. FreeBSD's man page attempts to say what the FreeBSD implementation definition is, but ends up saying that the behaviour is undefined -- it says "will overflow silently according to the rules of the C standard, using the long data type", but the C standard says that the behaviour is undefined, and "undefined" of course doesn't include silence. Bruce From owner-freebsd-standards@FreeBSD.ORG Fri Jul 1 01:09:42 2011 Return-Path: Delivered-To: standards@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4064F106566C; Fri, 1 Jul 2011 01:09:42 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail07.syd.optusnet.com.au (mail07.syd.optusnet.com.au [211.29.132.188]) by mx1.freebsd.org (Postfix) with ESMTP id CADF08FC13; Fri, 1 Jul 2011 01:09:41 +0000 (UTC) Received: from c122-106-165-191.carlnfd1.nsw.optusnet.com.au (c122-106-165-191.carlnfd1.nsw.optusnet.com.au [122.106.165.191]) by mail07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p6119a0R000663 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 1 Jul 2011 11:09:37 +1000 Date: Fri, 1 Jul 2011 11:09:36 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Stefan Esser In-Reply-To: <4E0CAE8B.6030309@freebsd.org> Message-ID: <20110701102746.I1335@besplex.bde.org> References: <99048.1309258976@critter.freebsd.dk> <4E0A0774.3090004@freebsd.org> <20110629082103.O1084@besplex.bde.org> <4E0B1C47.4010201@freebsd.org> <20110630073705.P1117@besplex.bde.org> <4E0C2F41.2060302@freebsd.org> <4E0CA55E.2090102@freebsd.org> <20110630165050.GB82980@zim.MIT.EDU> <4E0CAE8B.6030309@freebsd.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Alexander Best , Poul-Henning Kamp , standards@freebsd.org, Bruce Evans Subject: Re: RESENT with patch ... Re: [RFC] Consistent numeric range for "expr" on all architectures X-BeenThere: freebsd-standards@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Standards compliance List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Jul 2011 01:09:42 -0000 On Thu, 30 Jun 2011, Stefan Esser wrote: > Am 30.06.2011 18:50, schrieb David Schultz: >> On Thu, Jun 30, 2011, Stefan Esser wrote: >>> int >>> +is_integer(const char *s) >>> +{ >>> + if (*s == '-') >>> + s++; >>> + while (isdigit(*s)) >>> + s++; >>> + return *s == '\0'; >>> +} >> >> I only glanced at the patch for a few seconds, but your >> is_integer() routine will accept a bare minus sign ("-") as an >> integer. It also invokes undefined behaviour by applying isdigit() to a value that is not necessarily representable as an unsigned char or equal to the value EOF, and gives wrong behaviour if there is a digit whose value as a plain char promotes to an int with the value EOF (perhaps the latter can't happen). strto*() is more careful. It also has a style bug (missing silly parentheses around return value). Does you current version use only the above to classify integers, so that leading whitespace is not permitted? "expr [-e] ' 2' + ' 2'" now gives 4 since strtoimax() ignores the leading whitespace, but I think the syntax doesn't allow this. This is another example of why expr's syntax is broken as designed. Shell arithmetic seems to ignore leading whitespace. This might be just because the whitespace is kept when a variable is expanded, but has no significance in shell expressions. Trailing whitespace is already disallowed in expr. This is presumably just because strto*() stops parsing digits at the whitespace and then the check for trailing garbage treats even whitespace as garbage. Shell arithmetic seems to ignore trailing whitespace. > I mentioned in the message, that I do not test for empty numeric > operands. I had considered: This seems to be another bug in old expr. It gives 4 for "expr [-e] 2 + '' + 2". > int > is_integer(const char *s) > { > if (*s == '-') > s++; > if (!isdigit(*s++) > return 0; > while (isdigit(*s)) > s++; > return *s == '\0'; > } > > But this will break script that do > > while : > do > i=`expr "$i" + 1` > done > > without first setting i=0. And I assume, such scripts exist in huge numbers. Eeek, Doesn't the syntax require them to break? The resulting expression is similar to unary plus, which is useful for similar things. > But "-" will not be accepted as operand (neither integer nor string). > > Didn't I give the example "expr -- - : -" which fails with a syntax > error? > > Therefore, I do not need to check for actual digits in the input (and > thus may accept "" as 0 for backwards compatibility), but "-" will not > be allowed as parameter to expr by the parser ("syntax error"). I tried a Linux expr (from a FreeBSD package in 2002): the following are all syntax errors ("non-numeric argument"): '+1' + 1 (unary plus) ' 1' + 1 (leading whitespace) '1 ' + 1 (trailing whitespace) '' + 1 (no digits) The following wrap silently, but not as signed long (this is on i386, but int64_t is apparently used): 1111111111111111111 + 1111111111111111111 -> 2222222222222222222 9223372036854775806 + 1 -> 9223372036854775807 9223372036854775806 + 2 -> -9223372036854775808 We have the compat mode for the broken scripts, but since they would have broken in Linux in 2002, hopefully there aren't any. Someone should check a current Linux expr. I guess it hasn't changed for unary plus and whitepsace, since it was perfectly broken to POSIX spec in 2002 and the spec hasn't changed. It might have changed the behaviour on overflow (say to make it nonsilent), since the spec doesn't require any particular brokenness then. Bruce From owner-freebsd-standards@FreeBSD.ORG Fri Jul 1 02:24:14 2011 Return-Path: Delivered-To: freebsd-standards@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 539DA1065672 for ; Fri, 1 Jul 2011 02:24:14 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from fallbackmx07.syd.optusnet.com.au (fallbackmx07.syd.optusnet.com.au [211.29.132.9]) by mx1.freebsd.org (Postfix) with ESMTP id 9F8288FC17 for ; Fri, 1 Jul 2011 02:24:13 +0000 (UTC) Received: from mail08.syd.optusnet.com.au (mail08.syd.optusnet.com.au [211.29.132.189]) by fallbackmx07.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p5UNwt4m014690 for ; Fri, 1 Jul 2011 09:58:55 +1000 Received: from c122-106-165-191.carlnfd1.nsw.optusnet.com.au (c122-106-165-191.carlnfd1.nsw.optusnet.com.au [122.106.165.191]) by mail08.syd.optusnet.com.au (8.13.1/8.13.1) with ESMTP id p5UNwc6E019477 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 1 Jul 2011 09:58:41 +1000 Date: Fri, 1 Jul 2011 09:58:38 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Garrett Wollman In-Reply-To: <19980.47094.184089.398324@khavrinen.csail.mit.edu> Message-ID: <20110701092909.F1164@besplex.bde.org> References: <4E09AF8E.5010509@freebsd.org> <4E0B860E.50504@freebsd.org> <20110630164653.GA82980@zim.MIT.EDU> <4E0CACFB.8040906@freebsd.org> <19980.47094.184089.398324@khavrinen.csail.mit.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-standards@freebsd.org Subject: Re: Fwd: [RFC] Consistent numeric range for "expr" on all architectures X-BeenThere: freebsd-standards@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Standards compliance List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Jul 2011 02:24:14 -0000 On Thu, 30 Jun 2011, Garrett Wollman wrote: > < said: > >> Well, that's the frustrating part: I had implemented 64bit range in expr >> back in 2000, but this extended range (on 32bit archs) has been made >> optional, some two years later with the commit message you quote. > > The 2001 POSIX standard states: > >> Integer variables and constants, including the values of operands >> and option-arguments, used by the standard utilities listed in this >> volume of IEEE Std 1003.1-2001 shall be implemented as equivalent to >> the ISO C standard signed long data type; floating point shall be Thus, overflow can occur, and the behaviour on overflow is undefined. >> implemented as equivalent to the ISO C standard double >> type. Conversions between types shall be as described in the ISO C >> standard. All variables shall be initialized to zero if they are not >> otherwise assigned by the input to the application. > >> Arithmetic operators and control flow keywords shall be implemented >> as equivalent to those in the cited ISO C standard section, as >> listed in Table 1-2 (on page 8). > > However, for arithemtic expansion in the shell (but *not* the expr > utility), it says: > >> As an extension, the shell may recognize arithmetic expressions >> beyond those listed. The shell may use a signed integer type with a >> rank larger than the rank of signed long. The shell may use a >> real-floating type instead of signed long as long as it does not >> affect the results in cases where there is no overflow. The part about recognizing different syntaxes is an extension and requires explicit permission. The part about "using" a rank larger than that of signed long requires no explicit permission, since it only affects the result if there is overflow, but then the behaviour is undefined. So any utility can do the latter. The as-if rule always allowed any utility to use any type to implement arithmetic, provided that the final result is no different if it is defined. Thus for example, as an implementation detail real-double can be used on i386 but not on amd64, since it has 53 value bits on both, while signed long has fewer (31) value bits on i386 and more (63) value bits on amd64. long double could be used on amd64. When the FP arithmetic produces an unrepresentable preliminary result like Inf or NaN due to overflow or worse, conversion of that result to signed long gives undefined behaviour (typically a garbage value, with the garbage being amazingly inconsistent depending on the sizes of the FP and integer types in the conversion). > The language in the 2008 version of the standard is the same. > > For expr, the following definitions are relevant (from XCU7 page > 2715): >> integer An argument consisting only of an (optional) unary minus >> followed by digits. >> string A string argument; see below. > [...] >> A string argument is an argument that cannot be identified as an >> integer argument or as one of the expression operator symbols shown >> in the OPERANDS section. Found another bug in FreeBSD expr: it accepts leading whitespace in integers (since strto*() does), but the above doesn't allow this. Leading whitespace might even be useful for turning numbers into strings (it is sort of the reverse of turning strings into numbers in awk and even in expr by adding 0 to them). The FreeBSD man page of course gives null detail about this requirement of the syntax. Found another bug of commission in the man page: % Arithmetic operations are performed using signed integer math. If the -e % flag is specified, arithmetic uses the C intmax_t data type (the largest % integral type available), and expr will detect arithmetic overflow and % return an error indication. If a numeric operand is specified which is % so large as to overflow conversion to an integer, it is parsed as a % string instead. If -e is not specified, arithmetic operations and pars- % ing of integer arguments will overflow silently according to the rules of % the C standard, using the long data type. The rule of the C standard is that overflow gives undefined behaviour, not silence. Bruce From owner-freebsd-standards@FreeBSD.ORG Fri Jul 1 20:05:27 2011 Return-Path: Delivered-To: freebsd-standards@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 977AB106564A for ; Fri, 1 Jul 2011 20:05:27 +0000 (UTC) (envelope-from se@freebsd.org) Received: from nm7-vm1.bullet.mail.ne1.yahoo.com (nm7-vm1.bullet.mail.ne1.yahoo.com [98.138.90.250]) by mx1.freebsd.org (Postfix) with SMTP id 550EF8FC0A for ; Fri, 1 Jul 2011 20:05:27 +0000 (UTC) Received: from [98.138.90.57] by nm7.bullet.mail.ne1.yahoo.com with NNFMP; 01 Jul 2011 19:51:30 -0000 Received: from [98.138.226.127] by tm10.bullet.mail.ne1.yahoo.com with NNFMP; 01 Jul 2011 19:51:30 -0000 Received: from [127.0.0.1] by smtp206.mail.ne1.yahoo.com with NNFMP; 01 Jul 2011 19:51:30 -0000 X-Yahoo-Newman-Id: 673043.42219.bm@smtp206.mail.ne1.yahoo.com Received: from [192.168.119.20] (se@81.173.154.254 with plain) by smtp206.mail.ne1.yahoo.com with SMTP; 01 Jul 2011 12:51:28 -0700 PDT X-Yahoo-SMTP: iDf2N9.swBDAhYEh7VHfpgq0lnq. X-YMail-OSG: _YNjbRsVM1lk6Y8Ub8R79OlAfayrMJXCKBDrf9fqA3z2kP2 976lOD0K7rYUonFIX77M69vqA9DqVzZHLcORgN5LvRne6xhMbGDXoOJspZxM QS_XRwzvw45DuLdgIETTCJzVMySGgNFHyKWP72181RXg3y7c8r.NumEWlCF9 RBZILJV6kkfoqEonBNPUYeMsmgPEE8ME6pNefozcl9NE3F73Y0yvZQogIiOz 9Y2ZkeXJuX3j3YEALYHqZg6sUHlWoJxd33LF6BsJ_grAIy92Vh0sHocVUVph PQr3tvo90fsWAkJFaik_EjhsoaksBGJ32mYULQ0BauWjTYLbKZtpG8Of61O9 Jos36qADTTfD6DV5MUX.O7UNvCiCVdWZg_TCqpQOgHBVwVF9OMVeKNYxhz0F QL3rhAfhziL9l5vL6GAUBhHVWKWshUY6x5x5v X-Yahoo-Newman-Property: ymail-3 Message-ID: <4E0E2535.1060204@freebsd.org> Date: Fri, 01 Jul 2011 21:51:17 +0200 From: Stefan Esser User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:5.0) Gecko/20110624 Thunderbird/5.0 MIME-Version: 1.0 To: freebsd-standards@freebsd.org References: <4E09AF8E.5010509@freebsd.org> <4E0B860E.50504@freebsd.org> <20110630164653.GA82980@zim.MIT.EDU> <4E0CACFB.8040906@freebsd.org> <19980.47094.184089.398324@khavrinen.csail.mit.edu> <20110701092909.F1164@besplex.bde.org> In-Reply-To: <20110701092909.F1164@besplex.bde.org> Content-Type: multipart/mixed; boundary="------------060600010302060201060806" Cc: Garrett Wollman Subject: New patches for expr (was: Re: Fwd: [RFC] Consistent numeric range for "expr" on all architectures) X-BeenThere: freebsd-standards@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Standards compliance List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 01 Jul 2011 20:05:27 -0000 This is a multi-part message in MIME format. --------------060600010302060201060806 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit On 01.07.2011 01:58, Bruce Evans wrote: [...] > The part about recognizing different syntaxes is an extension and > requires explicit permission. The part about "using" a rank larger > than that of signed long requires no explicit permission, since it > only affects the result if there is overflow, but then the behaviour > is undefined. So any utility can do the latter. I have attached a new patch which does the following for overflow: 1) Numeric range is extended to intmax_t 2) Overflow is caught in all arithmetic operations 3) Integers not used in arithmetic operations are treated like strings > Found another bug in FreeBSD expr: it accepts leading whitespace in > integers (since strto*() does), but the above doesn't allow this. > Leading whitespace might even be useful for turning numbers into > strings (it is sort of the reverse of turning strings into numbers > in awk and even in expr by adding 0 to them). The FreeBSD man page > of course gives null detail about this requirement of the syntax. 4) Leading whitespace in numbers is only accepted in compat mode ("expr -e" or EXPR_COMPAT defined in the process environment or check_utility_compat("expr") == true). <<<<<< Aside: Is check_utility_compat() still used at all? It provides compatibility with FreeBSD-4 if the environment variable _COMPAT_FreeBSD_4 exists and contains "expr" as member of a comma separated list, or if the following symlink exists in /etc: compat-FreeBSD-4-util -> expr In fact, the symlink is also parsed as comma separated list, but expr is the only program in FreeBSD that calls check_utility_compat() ... Since expr is the only user of check_utility_compat() and the same effect can be had by specifying EXPR_COMPAT, the check_utility_compat function should be considered deprecated and only be kept in libc for backwards compatibility of old binaries, IMHO. This function adds one unnecessary call of getenv() and one of readlink() per invocation of expr and adds complexity to the man-page for expr. >>>>>>> > Found another bug of commission in the man page: > > % Arithmetic operations are performed using signed integer math. If the -e > % flag is specified, arithmetic uses the C intmax_t data type (the largest > % integral type available), and expr will detect arithmetic overflow and > % return an error indication. If a numeric operand is specified which is > % so large as to overflow conversion to an integer, it is parsed as a > % string instead. If -e is not specified, arithmetic operations and pars- > % ing of integer arguments will overflow silently according to the rules of > % the C standard, using the long data type. > > The rule of the C standard is that overflow gives undefined behaviour, not > silence. I have prepared a number of patches, which I attach to this message. All patches are relative to the current SVN version of expr.y (i.e. do *not* try to apply the higher numbered ones on a patched source file). The first file (expr.y.diff1) contains a "diff -w -u3" of the functional changes that I suggest. It is only meant to show the actual code changes (applying this diff results in mis-indented code). The second file (expr.y.diff2) has been created with the same command and contains some refactoring of functions. It is functionally identical to the first version. The third file (expr.y.diff3) contains the diff with whitespace and style changes that I suggest as final version. I plan to commit the changes in sequence (with correct indentation of each step, of course), in order to keep the change history separate for each intended purpose. The final version is some 40 lines (plus some 25 lines of stripped comments and whitespace) shorter than the current version (607 vs. 674 lines of code) and it complies with style(9), AFAICT. I also include a SHAR of regression tests meant to be committed to /usr/src/tools/regression/bin/expr. All tests pass with the current patched version of expr (while the currently shipped version dumps core in one test case). Please let me know, whether there are any objections to the functionality and the patches. If there are no strong objections, then I intend to commit the changes on July 6th. I plan to commit all three versions in short sequence (not by applying the patches, as explained above they are created with "-w" for a minimal diff and therefore lead to wrong indentation). I also plan to commit the regression tests. Suggestions for further tests are welcome. I know that tests for comparison operators are mostly missing and plan to add a few (especially string vs. decimal), but perhaps you want to suggest special boundary cases for the arithmetic tests. There have been comments regarding the wording of the man page and I want to commit an updated man page with the source file. But before finalizing the man page, I want to have finalized the functionality to be described in the man page ... Best regards, STefan --------------060600010302060201060806 Content-Type: text/plain; name="expr-test.sh" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="expr-test.sh" # This is a shell archive. Save it in a file, remove anything before # this line, and then unpack it by entering "sh file". Note, it may # create directories; files and directories will be owned by you and # have default permissions. # # This archive contains: # # expr/Makefile # expr/regress.sh # expr/regress.t # echo x - expr/Makefile sed 's/^X//' >expr/Makefile << 'f9600fcce6c6ed7d48045c253fb4f463' X# $FreeBSD: head/tools/regression/bin/test/Makefile 215022 2010-11-08 23:15:10Z jilles $ X Xall: X sh regress.sh f9600fcce6c6ed7d48045c253fb4f463 echo x - expr/regress.sh sed 's/^X//' >expr/regress.sh << '78ebeeef8ff008d708ef3dfc31f2d4b1' X#!/bin/sh X X#- X# Copyright (c) 2011 Stefan Esser X# All rights reserved. X# X# Redistribution and use in source and binary forms, with or without X# modification, are permitted provided that the following conditions X# are met: X# 1. Redistributions of source code must retain the above copyright X# notice, this list of conditions and the following disclaimer. X# 2. Redistributions in binary form must reproduce the above copyright X# notice, this list of conditions and the following disclaimer in the X# documentation and/or other materials provided with the distribution. X# X# THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND X# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE X# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE X# ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE X# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL X# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS X# OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) X# HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT X# LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY X# OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF X# SUCH DAMAGE. X X# X# expr.sh - check if test(1) or builtin test works X# X# $FreeBSD:$ X X# force a specified test program, e.g. `env test=/bin/expr sh regress.sh' X: ${test=expr} X Xt () X{ X # $1 -> exit code X # $2 -> result written to stdout X # $3 -> $test expression X X count=$((count+1)) X # check for syntax errors X output="`eval $test $3 2>/dev/null`" X ret=$? X if [ "$ret" -ne "$1" ]; then X printf "not ok %s - (got $ret, expected $1)\n" "$count $3" X elif [ "$output" != "$2" ]; then X printf "not ok %s - (unexpected output)\n" "$count $3" X else X printf "ok %s\n" "$count $3" X fi X} X Xcount=0 Xecho "1..55" X Xt 0 xyzzy '\( xyzzy \)' Xt 2 '' 'xyzzy + 0' X Xt 0 000100000 '000100000' Xt 0 00010000000000000000000 '00010000000000000000000' X Xt 0 1 '0 + 1' Xt 1 0 '0 + 0' Xt 0 2 '1 + 1' Xt 0 -2 '-- -1 + -1' Xt 1 0 '-- -1 + 1' Xt 2 '' '-' # syntax error Xt 2 '' '-1' # syntax error Xt 2 '' '-1 + 1' # syntax error Xt 1 0 '-- -1 + 1' Xt 2 '' '" 1" + 1' # syntax error Xt 0 2 '-e " 1" + 1' # allow leading white space with -e Xt 0 1 '"" + 1' # empty string is interpreted as 0 X Xt 0 9223372036854775807 '9223372036854775807 + 0' Xt 2 '' '9223372036854775807 + 1' Xt 0 -9223372036854775808 '-- -9223372036854775807 - 1' Xt 0 -9223372036854775808 '-- -9223372036854775808 \* 1' Xt 2 '' '-- -9223372036854775808 \* -1' Xt 0 9223372036854775807 '-- -9223372036854775807 \* -1' Xt 0 10000000000000000000 '10000000000000000000' Xt 2 '' '10000000000000000000 + 0' Xt 0 100001 '000100000 + 1' Xt 0 9223372036854775807 '4611686018427387904 - 1 - -4611686018427387904' Xt 0 -9223372036854775808 '-- -4611686018427387904 + -4611686018427387904' Xt 0 -9223372036854775808 '-- -9223372036854775808 / 1' Xt 2 '' '-- -9223372036854775808 / -1' Xt 0 -9223372036854775807 '9223372036854775807 / -1' Xt 0 9223372036854775806 '4611686018427387903 \* 2' Xt 2 '' '4611686018427387904 \* 2' Xt 0 -9223372036854775806 '4611686018427387903 \* -2' Xt 0 -9223372036854775808 '4611686018427387904 \* -2' Xt 2 '' '4611686018427387905 \* -2' Xt 0 -7 '-- -9223372036854775807 % 10' Xt 0 -7 '-- -9223372036854775807 % -10' Xt 0 7 '9223372036854775807 % 10' Xt 0 7 '9223372036854775807 % -10' Xt 0 64 '\( 2 \* 2 \) \* \( 2 \* 2 \) \* 2 \* 2' Xt 0 10 '5 \* 7 - 5 \* 5' Xt 0 50 '5 \* \( 7 - 5 \) \* 5' X Xt 0 abc '123abc123 : ".*\(a[^0-9]*\)"' Xt 0 124 '123abc123 : ".*a[^0-9]*\(.*\)" + 1' Xt 0 1 '123abc123 : ".*a[^0-9]*\(.*\)" = 123' Xt 0 7 '123abc123 : ".*a[^0-9]*" + 1' Xt 1 0 '123Abc123 : ".*a[^0-9]*"' Xt 0 1 '123Abc123 : ".*a[^0-9]*" + 1' Xt 0 1 '123Abc123 : ".*a[^0-9]*" = 0' X Xt 1 0 '2 + 0 "<" 1' Xt 0 1 '0 "<" 1 + 2' Xt 0 3 '\( 0 "<" 1 \) + 2' X XEXPR_COMPAT=1; export EXPR_COMPAT X Xt 0 -2 '-1 + -1' Xt 1 0 '-1 + 1' Xt 0 2 '" 1" + 1' 78ebeeef8ff008d708ef3dfc31f2d4b1 echo x - expr/regress.t sed 's/^X//' >expr/regress.t << '85ccbaa69b7bc1caaef6199f47c0ae78' X#!/bin/sh X# $FreeBSD: head/tools/regression/bin/test/regress.t 215022 2010-11-08 23:15:10Z jilles $ X Xcd `dirname $0` X Xsh regress.sh 85ccbaa69b7bc1caaef6199f47c0ae78 exit --------------060600010302060201060806 Content-Type: text/plain; name="expr.y.diff1" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="expr.y.diff1" --- expr.y.ORIG 2011-07-01 20:22:29.119607269 +0200 +++ expr.y.NEW1 2011-07-01 20:27:35.146459819 +0200 @@ -42,13 +42,15 @@ struct val *result; +void assert_to_integer(struct val *); int chk_div(intmax_t, intmax_t); int chk_minus(intmax_t, intmax_t, intmax_t); int chk_plus(intmax_t, intmax_t, intmax_t); int chk_times(intmax_t, intmax_t, intmax_t); void free_value(struct val *); -int is_zero_or_null(struct val *); +int is_integer(const char *); int isstring(struct val *); +int is_zero_or_null(struct val *); struct val *make_integer(intmax_t); struct val *make_str(const char *); struct val *op_and(struct val *, struct val *); @@ -65,7 +67,7 @@ struct val *op_plus(struct val *, struct val *); struct val *op_rem(struct val *, struct val *); struct val *op_times(struct val *, struct val *); -intmax_t to_integer(struct val *); +int to_integer(struct val *); void to_string(struct val *); int yyerror(const char *); int yylex(void); @@ -134,37 +136,16 @@ make_str(const char *s) { struct val *vp; - char *ep; vp = (struct val *) malloc (sizeof (*vp)); if (vp == NULL || ((vp->u.s = strdup (s)) == NULL)) { errx(ERR_EXIT, "malloc() failed"); } - /* - * Previously we tried to scan the string to see if it ``looked like'' - * an integer (erroneously, as it happened). Let strtoimax() do the - * dirty work. We could cache the value, except that we are using - * a union and need to preserve the original string form until we - * are certain that it is not needed. - * - * IEEE Std.1003.1-2001 says: - * /integer/ An argument consisting only of an (optional) unary minus - * followed by digits. - * - * This means that arguments which consist of digits followed by - * non-digits MUST NOT be considered integers. strtoimax() will - * figure this out for us. - */ - if (eflag) - (void)strtoimax(s, &ep, 10); + if (is_integer(s)) + vp->type = numeric_string; else - (void)strtol(s, &ep, 10); - - if (*ep != '\0') vp->type = string; - else - vp->type = numeric_string; return vp; } @@ -178,31 +159,33 @@ } -intmax_t +int to_integer(struct val *vp) { intmax_t i; - if (vp->type == integer) - return 1; - - if (vp->type == string) - return 0; - - /* vp->type == numeric_string, make it numeric */ + /* we can only convert numeric_string to integer, here */ + if (vp->type == numeric_string) { errno = 0; - if (eflag) { i = strtoimax(vp->u.s, (char **)NULL, 10); - if (errno == ERANGE) - err(ERR_EXIT, NULL); - } else { - i = strtol(vp->u.s, (char **)NULL, 10); - } - + /* just keep as numeric_string, if the conversion fails */ + if (errno != ERANGE) { free (vp->u.s); vp->u.i = i; vp->type = integer; - return 1; + } + } + return (vp->type == integer); +} + + +void +assert_to_integer(struct val *vp) +{ + if (vp->type == string) + errx(ERR_EXIT, "not a decimal number: '%s'", vp->u.s); + if (!to_integer(vp)) + errx(ERR_EXIT, "operand too large: '%s'", vp->u.s); } void @@ -230,6 +213,20 @@ int +is_integer(const char *s) +{ + if (eflag) + while (isspace((unsigned char)*s)) + s++; + if (*s == '-') + s++; + while (isdigit((unsigned char)*s)) + s++; + return (*s == '\0'); +} + + +int isstring(struct val *vp) { /* only TRUE if this string is not a valid integer */ @@ -350,8 +347,8 @@ to_string (b); r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) == 0)); } else { - (void)to_integer(a); - (void)to_integer(b); + assert_to_integer(a); + assert_to_integer(b); r = make_integer ((intmax_t)(a->u.i == b->u.i)); } @@ -370,8 +367,8 @@ to_string (b); r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) > 0)); } else { - (void)to_integer(a); - (void)to_integer(b); + assert_to_integer(a); + assert_to_integer(b); r = make_integer ((intmax_t)(a->u.i > b->u.i)); } @@ -390,8 +387,8 @@ to_string (b); r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) < 0)); } else { - (void)to_integer(a); - (void)to_integer(b); + assert_to_integer(a); + assert_to_integer(b); r = make_integer ((intmax_t)(a->u.i < b->u.i)); } @@ -410,8 +407,8 @@ to_string (b); r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) >= 0)); } else { - (void)to_integer(a); - (void)to_integer(b); + assert_to_integer(a); + assert_to_integer(b); r = make_integer ((intmax_t)(a->u.i >= b->u.i)); } @@ -430,8 +427,8 @@ to_string (b); r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) <= 0)); } else { - (void)to_integer(a); - (void)to_integer(b); + assert_to_integer(a); + assert_to_integer(b); r = make_integer ((intmax_t)(a->u.i <= b->u.i)); } @@ -450,8 +447,8 @@ to_string (b); r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) != 0)); } else { - (void)to_integer(a); - (void)to_integer(b); + assert_to_integer(a); + assert_to_integer(b); r = make_integer ((intmax_t)(a->u.i != b->u.i)); } @@ -479,17 +476,13 @@ { struct val *r; - if (!to_integer(a) || !to_integer(b)) { - errx(ERR_EXIT, "non-numeric argument"); - } + assert_to_integer(a); + assert_to_integer(b); - if (eflag) { r = make_integer(a->u.i + b->u.i); if (chk_plus(a->u.i, b->u.i, r->u.i)) { errx(ERR_EXIT, "overflow"); } - } else - r = make_integer((long)a->u.i + (long)b->u.i); free_value (a); free_value (b); @@ -516,17 +509,13 @@ { struct val *r; - if (!to_integer(a) || !to_integer(b)) { - errx(ERR_EXIT, "non-numeric argument"); - } + assert_to_integer(a); + assert_to_integer(b); - if (eflag) { r = make_integer(a->u.i - b->u.i); if (chk_minus(a->u.i, b->u.i, r->u.i)) { errx(ERR_EXIT, "overflow"); } - } else - r = make_integer((long)a->u.i - (long)b->u.i); free_value (a); free_value (b); @@ -550,17 +539,13 @@ { struct val *r; - if (!to_integer(a) || !to_integer(b)) { - errx(ERR_EXIT, "non-numeric argument"); - } + assert_to_integer(a); + assert_to_integer(b); - if (eflag) { r = make_integer(a->u.i * b->u.i); if (chk_times(a->u.i, b->u.i, r->u.i)) { errx(ERR_EXIT, "overflow"); } - } else - r = make_integer((long)a->u.i * (long)b->u.i); free_value (a); free_value (b); @@ -583,21 +568,16 @@ { struct val *r; - if (!to_integer(a) || !to_integer(b)) { - errx(ERR_EXIT, "non-numeric argument"); - } + assert_to_integer(a); + assert_to_integer(b); if (b->u.i == 0) { errx(ERR_EXIT, "division by zero"); } - - if (eflag) { - r = make_integer(a->u.i / b->u.i); if (chk_div(a->u.i, b->u.i)) { errx(ERR_EXIT, "overflow"); } - } else - r = make_integer((long)a->u.i / (long)b->u.i); + r = make_integer(a->u.i / b->u.i); free_value (a); free_value (b); @@ -609,19 +589,12 @@ { struct val *r; - if (!to_integer(a) || !to_integer(b)) { - errx(ERR_EXIT, "non-numeric argument"); - } - + assert_to_integer(a); + assert_to_integer(b); if (b->u.i == 0) { errx(ERR_EXIT, "division by zero"); } - - if (eflag) r = make_integer(a->u.i % b->u.i); - /* chk_rem necessary ??? */ - else - r = make_integer((long)a->u.i % (long)b->u.i); free_value (a); free_value (b); --------------060600010302060201060806 Content-Type: text/plain; name="expr.y.diff2" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="expr.y.diff2" --- expr.y.ORIG 2011-07-01 20:22:29.119607269 +0200 +++ expr.y.NEW2 2011-07-01 20:26:47.212460189 +0200 @@ -40,15 +40,19 @@ } u; } ; +char **av; +int eflag; struct val *result; -int chk_div(intmax_t, intmax_t); -int chk_minus(intmax_t, intmax_t, intmax_t); -int chk_plus(intmax_t, intmax_t, intmax_t); -int chk_times(intmax_t, intmax_t, intmax_t); +void assert_to_integer(struct val *); +void assert_div(intmax_t, intmax_t); +void assert_minus(intmax_t, intmax_t, intmax_t); +void assert_plus(intmax_t, intmax_t, intmax_t); +void assert_times(intmax_t, intmax_t, intmax_t); void free_value(struct val *); -int is_zero_or_null(struct val *); +int is_integer(const char *); int isstring(struct val *); +int is_zero_or_null(struct val *); struct val *make_integer(intmax_t); struct val *make_str(const char *); struct val *op_and(struct val *, struct val *); @@ -65,14 +69,12 @@ struct val *op_plus(struct val *, struct val *); struct val *op_rem(struct val *, struct val *); struct val *op_times(struct val *, struct val *); -intmax_t to_integer(struct val *); +int to_integer(struct val *); void to_string(struct val *); int yyerror(const char *); int yylex(void); int yyparse(void); -static int eflag; -char **av; %} %union @@ -134,37 +136,16 @@ make_str(const char *s) { struct val *vp; - char *ep; vp = (struct val *) malloc (sizeof (*vp)); if (vp == NULL || ((vp->u.s = strdup (s)) == NULL)) { errx(ERR_EXIT, "malloc() failed"); } - /* - * Previously we tried to scan the string to see if it ``looked like'' - * an integer (erroneously, as it happened). Let strtoimax() do the - * dirty work. We could cache the value, except that we are using - * a union and need to preserve the original string form until we - * are certain that it is not needed. - * - * IEEE Std.1003.1-2001 says: - * /integer/ An argument consisting only of an (optional) unary minus - * followed by digits. - * - * This means that arguments which consist of digits followed by - * non-digits MUST NOT be considered integers. strtoimax() will - * figure this out for us. - */ - if (eflag) - (void)strtoimax(s, &ep, 10); + if (is_integer(s)) + vp->type = numeric_string; else - (void)strtol(s, &ep, 10); - - if (*ep != '\0') vp->type = string; - else - vp->type = numeric_string; return vp; } @@ -178,31 +159,33 @@ } -intmax_t +int to_integer(struct val *vp) { intmax_t i; - if (vp->type == integer) - return 1; - - if (vp->type == string) - return 0; - - /* vp->type == numeric_string, make it numeric */ - errno = 0; - if (eflag) { + /* we can only convert numeric_string to integer, here */ + if (vp->type == numeric_string) { + errno = 0; i = strtoimax(vp->u.s, (char **)NULL, 10); - if (errno == ERANGE) - err(ERR_EXIT, NULL); - } else { - i = strtol(vp->u.s, (char **)NULL, 10); + /* just keep as numeric_string, if the conversion fails */ + if (errno != ERANGE) { + free (vp->u.s); + vp->u.i = i; + vp->type = integer; + } } + return (vp->type == integer); +} - free (vp->u.s); - vp->u.i = i; - vp->type = integer; - return 1; + +void +assert_to_integer(struct val *vp) +{ + if (vp->type == string) + errx(ERR_EXIT, "not a decimal number: '%s'", vp->u.s); + if (!to_integer(vp)) + errx(ERR_EXIT, "operand too large: '%s'", vp->u.s); } void @@ -230,6 +213,20 @@ int +is_integer(const char *s) +{ + if (eflag) + while (isspace((unsigned char)*s)) + s++; + if (*s == '-') + s++; + while (isdigit((unsigned char)*s)) + s++; + return (*s == '\0'); +} + + +int isstring(struct val *vp) { /* only TRUE if this string is not a valid integer */ @@ -350,8 +347,8 @@ to_string (b); r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) == 0)); } else { - (void)to_integer(a); - (void)to_integer(b); + assert_to_integer(a); + assert_to_integer(b); r = make_integer ((intmax_t)(a->u.i == b->u.i)); } @@ -370,8 +367,8 @@ to_string (b); r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) > 0)); } else { - (void)to_integer(a); - (void)to_integer(b); + assert_to_integer(a); + assert_to_integer(b); r = make_integer ((intmax_t)(a->u.i > b->u.i)); } @@ -390,8 +387,8 @@ to_string (b); r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) < 0)); } else { - (void)to_integer(a); - (void)to_integer(b); + assert_to_integer(a); + assert_to_integer(b); r = make_integer ((intmax_t)(a->u.i < b->u.i)); } @@ -410,8 +407,8 @@ to_string (b); r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) >= 0)); } else { - (void)to_integer(a); - (void)to_integer(b); + assert_to_integer(a); + assert_to_integer(b); r = make_integer ((intmax_t)(a->u.i >= b->u.i)); } @@ -430,8 +427,8 @@ to_string (b); r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) <= 0)); } else { - (void)to_integer(a); - (void)to_integer(b); + assert_to_integer(a); + assert_to_integer(b); r = make_integer ((intmax_t)(a->u.i <= b->u.i)); } @@ -450,8 +447,8 @@ to_string (b); r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) != 0)); } else { - (void)to_integer(a); - (void)to_integer(b); + assert_to_integer(a); + assert_to_integer(b); r = make_integer ((intmax_t)(a->u.i != b->u.i)); } @@ -460,18 +457,16 @@ return r; } -int -chk_plus(intmax_t a, intmax_t b, intmax_t r) +void +assert_plus(intmax_t a, intmax_t b, intmax_t r) { - - /* sum of two positive numbers must be positive */ - if (a > 0 && b > 0 && r <= 0) - return 1; - /* sum of two negative numbers must be negative */ - if (a < 0 && b < 0 && r >= 0) - return 1; - /* all other cases are OK */ - return 0; + /* + * sum of two positive numbers must be positive, + * sum of two negative numbers must be negative + */ + if ((a > 0 && b > 0 && r <= 0) || + (a < 0 && b < 0 && r >= 0)) + errx(ERR_EXIT, "overflow"); } struct val * @@ -479,36 +474,26 @@ { struct val *r; - if (!to_integer(a) || !to_integer(b)) { - errx(ERR_EXIT, "non-numeric argument"); - } + assert_to_integer(a); + assert_to_integer(b); - if (eflag) { - r = make_integer(a->u.i + b->u.i); - if (chk_plus(a->u.i, b->u.i, r->u.i)) { - errx(ERR_EXIT, "overflow"); - } - } else - r = make_integer((long)a->u.i + (long)b->u.i); + r = make_integer(a->u.i + b->u.i); + assert_plus(a->u.i, b->u.i, r->u.i); free_value (a); free_value (b); return r; } -int -chk_minus(intmax_t a, intmax_t b, intmax_t r) +void +assert_minus(intmax_t a, intmax_t b, intmax_t r) { /* special case subtraction of INTMAX_MIN */ - if (b == INTMAX_MIN) { - if (a >= 0) - return 1; - else - return 0; - } - /* this is allowed for b != INTMAX_MIN */ - return chk_plus (a, -b, r); + if (b == INTMAX_MIN && a < 0) + errx(ERR_EXIT, "overflow"); + /* check addition of negative subtrahend */ + assert_plus(a, -b, r); } struct val * @@ -516,33 +501,26 @@ { struct val *r; - if (!to_integer(a) || !to_integer(b)) { - errx(ERR_EXIT, "non-numeric argument"); - } + assert_to_integer(a); + assert_to_integer(b); - if (eflag) { - r = make_integer(a->u.i - b->u.i); - if (chk_minus(a->u.i, b->u.i, r->u.i)) { - errx(ERR_EXIT, "overflow"); - } - } else - r = make_integer((long)a->u.i - (long)b->u.i); + r = make_integer(a->u.i - b->u.i); + assert_minus(a->u.i, b->u.i, r->u.i); free_value (a); free_value (b); return r; } -int -chk_times(intmax_t a, intmax_t b, intmax_t r) +void +assert_times(intmax_t a, intmax_t b, intmax_t r) { - /* special case: first operand is 0, no overflow possible */ - if (a == 0) - return 0; - /* verify that result of division matches second operand */ - if (r / a != b) - return 1; - return 0; + /* + * if first operand is 0, no overflow is possible, + * else result of division test must match second operand + */ + if (a != 0 && r / a != b) + errx(ERR_EXIT, "overflow"); } struct val * @@ -550,32 +528,25 @@ { struct val *r; - if (!to_integer(a) || !to_integer(b)) { - errx(ERR_EXIT, "non-numeric argument"); - } + assert_to_integer(a); + assert_to_integer(b); - if (eflag) { - r = make_integer(a->u.i * b->u.i); - if (chk_times(a->u.i, b->u.i, r->u.i)) { - errx(ERR_EXIT, "overflow"); - } - } else - r = make_integer((long)a->u.i * (long)b->u.i); + r = make_integer(a->u.i * b->u.i); + assert_times(a->u.i, b->u.i, r->u.i); free_value (a); free_value (b); return (r); } -int -chk_div(intmax_t a, intmax_t b) +void +assert_div(intmax_t a, intmax_t b) { - /* div by zero has been taken care of before */ + if (b == 0) + errx(ERR_EXIT, "division by zero"); /* only INTMAX_MIN / -1 causes overflow */ if (a == INTMAX_MIN && b == -1) - return 1; - /* everything else is OK */ - return 0; + errx(ERR_EXIT, "overflow"); } struct val * @@ -583,21 +554,12 @@ { struct val *r; - if (!to_integer(a) || !to_integer(b)) { - errx(ERR_EXIT, "non-numeric argument"); - } + assert_to_integer(a); + assert_to_integer(b); - if (b->u.i == 0) { - errx(ERR_EXIT, "division by zero"); - } - - if (eflag) { - r = make_integer(a->u.i / b->u.i); - if (chk_div(a->u.i, b->u.i)) { - errx(ERR_EXIT, "overflow"); - } - } else - r = make_integer((long)a->u.i / (long)b->u.i); + /* assert based on operands only, not on result */ + assert_div(a->u.i, b->u.i); + r = make_integer(a->u.i / b->u.i); free_value (a); free_value (b); @@ -609,19 +571,11 @@ { struct val *r; - if (!to_integer(a) || !to_integer(b)) { - errx(ERR_EXIT, "non-numeric argument"); - } - - if (b->u.i == 0) { - errx(ERR_EXIT, "division by zero"); - } - - if (eflag) - r = make_integer(a->u.i % b->u.i); - /* chk_rem necessary ??? */ - else - r = make_integer((long)a->u.i % (long)b->u.i); + assert_to_integer(a); + assert_to_integer(b); + /* pass a=1 to only check for div by zero */ + assert_div(1, b->u.i); + r = make_integer(a->u.i % b->u.i); free_value (a); free_value (b); --------------060600010302060201060806 Content-Type: text/plain; name="expr.y.diff3" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="expr.y.diff3" --- expr.y.ORIG 2011-07-01 20:22:29.119607269 +0200 +++ expr.y.FINAL 2011-07-01 20:25:23.527484480 +0200 @@ -1,6 +1,6 @@ %{ /*- - * Written by Pace Willisson (pace@blitz.com) + * Written by Pace Willisson (pace@blitz.com) * and placed in the public domain. * * Largely rewritten by J.T. Conklin (jtc@wimsey.com) @@ -21,7 +21,7 @@ #include #include #include - + /* * POSIX specifies a specific error code for syntax errors. We exit * with this code for all errors. @@ -40,15 +40,19 @@ } u; } ; -struct val *result; - -int chk_div(intmax_t, intmax_t); -int chk_minus(intmax_t, intmax_t, intmax_t); -int chk_plus(intmax_t, intmax_t, intmax_t); -int chk_times(intmax_t, intmax_t, intmax_t); +char **av; +int eflag; +struct val *result; + +void assert_to_integer(struct val *); +void assert_div(intmax_t, intmax_t); +void assert_minus(intmax_t, intmax_t, intmax_t); +void assert_plus(intmax_t, intmax_t, intmax_t); +void assert_times(intmax_t, intmax_t, intmax_t); void free_value(struct val *); +int is_integer(const char *); +int is_string(struct val *); int is_zero_or_null(struct val *); -int isstring(struct val *); struct val *make_integer(intmax_t); struct val *make_str(const char *); struct val *op_and(struct val *, struct val *); @@ -65,14 +69,12 @@ struct val *op_plus(struct val *, struct val *); struct val *op_rem(struct val *, struct val *); struct val *op_times(struct val *, struct val *); -intmax_t to_integer(struct val *); +int to_integer(struct val *); void to_string(struct val *); int yyerror(const char *); int yylex(void); int yyparse(void); -static int eflag; -char **av; %} %union @@ -96,23 +98,22 @@ expr: TOKEN | '(' expr ')' { $$ = $2; } - | expr '|' expr { $$ = op_or ($1, $3); } - | expr '&' expr { $$ = op_and ($1, $3); } - | expr '=' expr { $$ = op_eq ($1, $3); } - | expr '>' expr { $$ = op_gt ($1, $3); } - | expr '<' expr { $$ = op_lt ($1, $3); } - | expr GE expr { $$ = op_ge ($1, $3); } - | expr LE expr { $$ = op_le ($1, $3); } - | expr NE expr { $$ = op_ne ($1, $3); } - | expr '+' expr { $$ = op_plus ($1, $3); } - | expr '-' expr { $$ = op_minus ($1, $3); } - | expr '*' expr { $$ = op_times ($1, $3); } - | expr '/' expr { $$ = op_div ($1, $3); } - | expr '%' expr { $$ = op_rem ($1, $3); } - | expr ':' expr { $$ = op_colon ($1, $3); } + | expr '|' expr { $$ = op_or($1, $3); } + | expr '&' expr { $$ = op_and($1, $3); } + | expr '=' expr { $$ = op_eq($1, $3); } + | expr '>' expr { $$ = op_gt($1, $3); } + | expr '<' expr { $$ = op_lt($1, $3); } + | expr GE expr { $$ = op_ge($1, $3); } + | expr LE expr { $$ = op_le($1, $3); } + | expr NE expr { $$ = op_ne($1, $3); } + | expr '+' expr { $$ = op_plus($1, $3); } + | expr '-' expr { $$ = op_minus($1, $3); } + | expr '*' expr { $$ = op_times($1, $3); } + | expr '/' expr { $$ = op_div($1, $3); } + | expr '%' expr { $$ = op_rem($1, $3); } + | expr ':' expr { $$ = op_colon($1, $3); } ; - %% struct val * @@ -120,89 +121,65 @@ { struct val *vp; - vp = (struct val *) malloc (sizeof (*vp)); - if (vp == NULL) { + vp = (struct val *) malloc(sizeof(*vp)); + if (vp == NULL) errx(ERR_EXIT, "malloc() failed"); - } vp->type = integer; vp->u.i = i; - return vp; + return (vp); } struct val * make_str(const char *s) { struct val *vp; - char *ep; - vp = (struct val *) malloc (sizeof (*vp)); - if (vp == NULL || ((vp->u.s = strdup (s)) == NULL)) { + vp = (struct val *) malloc(sizeof(*vp)); + if (vp == NULL || ((vp->u.s = strdup(s)) == NULL)) errx(ERR_EXIT, "malloc() failed"); - } - /* - * Previously we tried to scan the string to see if it ``looked like'' - * an integer (erroneously, as it happened). Let strtoimax() do the - * dirty work. We could cache the value, except that we are using - * a union and need to preserve the original string form until we - * are certain that it is not needed. - * - * IEEE Std.1003.1-2001 says: - * /integer/ An argument consisting only of an (optional) unary minus - * followed by digits. - * - * This means that arguments which consist of digits followed by - * non-digits MUST NOT be considered integers. strtoimax() will - * figure this out for us. - */ - if (eflag) - (void)strtoimax(s, &ep, 10); + if (is_integer(s)) + vp->type = numeric_string; else - (void)strtol(s, &ep, 10); - - if (*ep != '\0') vp->type = string; - else - vp->type = numeric_string; - return vp; + return (vp); } - void free_value(struct val *vp) { if (vp->type == string || vp->type == numeric_string) - free (vp->u.s); + free(vp->u.s); } - -intmax_t +int to_integer(struct val *vp) { intmax_t i; - if (vp->type == integer) - return 1; - - if (vp->type == string) - return 0; - - /* vp->type == numeric_string, make it numeric */ - errno = 0; - if (eflag) { + /* we can only convert numeric_string to integer, here */ + if (vp->type == numeric_string) { + errno = 0; i = strtoimax(vp->u.s, (char **)NULL, 10); - if (errno == ERANGE) - err(ERR_EXIT, NULL); - } else { - i = strtol(vp->u.s, (char **)NULL, 10); + /* just keep as numeric_string, if the conversion fails */ + if (errno != ERANGE) { + free(vp->u.s); + vp->u.i = i; + vp->type = integer; + } } + return (vp->type == integer); +} - free (vp->u.s); - vp->u.i = i; - vp->type = integer; - return 1; +void +assert_to_integer(struct val *vp) +{ + if (vp->type == string) + errx(ERR_EXIT, "not a decimal number: '%s'", vp->u.s); + if (!to_integer(vp)) + errx(ERR_EXIT, "operand too large: '%s'", vp->u.s); } void @@ -228,15 +205,26 @@ vp->u.s = tmp; } +int +is_integer(const char *s) +{ + if (eflag) + while (isspace((unsigned char)*s)) + s++; + if (*s == '-') + s++; + while (isdigit((unsigned char)*s)) + s++; + return (*s == '\0'); +} int -isstring(struct val *vp) +is_string(struct val *vp) { /* only TRUE if this string is not a valid integer */ return (vp->type == string); } - int yylex(void) { @@ -247,10 +235,10 @@ p = *av++; - if (strlen (p) == 1) { - if (strchr ("|&=<>+-*/%:()", *p)) + if (strlen(p) == 1) { + if (strchr("|&=<>+-*/%:()", *p)) return (*p); - } else if (strlen (p) == 2 && p[1] == '=') { + } else if (strlen(p) == 2 && p[1] == '=') { switch (*p) { case '>': return (GE); case '<': return (LE); @@ -258,19 +246,17 @@ } } - yylval.val = make_str (p); + yylval.val = make_str(p); return (TOKEN); } int is_zero_or_null(struct val *vp) { - if (vp->type == integer) { + if (vp->type == integer) return (vp->u.i == 0); - } else { - return (*vp->u.s == 0 || (to_integer (vp) && vp->u.i == 0)); - } - /* NOTREACHED */ + + return (*vp->u.s == 0 || (to_integer(vp) && vp->u.i == 0)); } int @@ -278,7 +264,7 @@ { int c; - setlocale (LC_ALL, ""); + setlocale(LC_ALL, ""); if (getenv("EXPR_COMPAT") != NULL || check_utility_compat("expr")) { av = argv + 1; @@ -291,9 +277,8 @@ break; default: - fprintf(stderr, + errx(ERR_EXIT, "usage: expr [-e] expression\n"); - exit(ERR_EXIT); } av = argv + optind; } @@ -314,28 +299,27 @@ errx(ERR_EXIT, "syntax error"); } - struct val * op_or(struct val *a, struct val *b) { - if (is_zero_or_null (a)) { - free_value (a); + if (is_zero_or_null(a)) { + free_value(a); return (b); } else { - free_value (b); + free_value(b); return (a); } } - + struct val * op_and(struct val *a, struct val *b) { - if (is_zero_or_null (a) || is_zero_or_null (b)) { - free_value (a); - free_value (b); - return (make_integer ((intmax_t)0)); + if (is_zero_or_null(a) || is_zero_or_null(b)) { + free_value(a); + free_value(b); + return (make_integer((intmax_t)0)); } else { - free_value (b); + free_value(b); return (a); } } @@ -345,19 +329,19 @@ { struct val *r; - if (isstring (a) || isstring (b)) { - to_string (a); - to_string (b); - r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) == 0)); - } else { - (void)to_integer(a); - (void)to_integer(b); - r = make_integer ((intmax_t)(a->u.i == b->u.i)); + if (is_string(a) || is_string(b)) { + to_string(a); + to_string(b); + r = make_integer((intmax_t)(strcoll(a->u.s, b->u.s) == 0)); + } else { + assert_to_integer(a); + assert_to_integer(b); + r = make_integer((intmax_t)(a->u.i == b->u.i)); } - free_value (a); - free_value (b); - return r; + free_value(a); + free_value(b); + return (r); } struct val * @@ -365,19 +349,19 @@ { struct val *r; - if (isstring (a) || isstring (b)) { - to_string (a); - to_string (b); - r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) > 0)); - } else { - (void)to_integer(a); - (void)to_integer(b); - r = make_integer ((intmax_t)(a->u.i > b->u.i)); + if (is_string(a) || is_string(b)) { + to_string(a); + to_string(b); + r = make_integer((intmax_t)(strcoll(a->u.s, b->u.s) > 0)); + } else { + assert_to_integer(a); + assert_to_integer(b); + r = make_integer((intmax_t)(a->u.i > b->u.i)); } - free_value (a); - free_value (b); - return r; + free_value(a); + free_value(b); + return (r); } struct val * @@ -385,19 +369,19 @@ { struct val *r; - if (isstring (a) || isstring (b)) { - to_string (a); - to_string (b); - r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) < 0)); - } else { - (void)to_integer(a); - (void)to_integer(b); - r = make_integer ((intmax_t)(a->u.i < b->u.i)); + if (is_string(a) || is_string(b)) { + to_string(a); + to_string(b); + r = make_integer((intmax_t)(strcoll(a->u.s, b->u.s) < 0)); + } else { + assert_to_integer(a); + assert_to_integer(b); + r = make_integer((intmax_t)(a->u.i < b->u.i)); } - free_value (a); - free_value (b); - return r; + free_value(a); + free_value(b); + return (r); } struct val * @@ -405,19 +389,19 @@ { struct val *r; - if (isstring (a) || isstring (b)) { - to_string (a); - to_string (b); - r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) >= 0)); - } else { - (void)to_integer(a); - (void)to_integer(b); - r = make_integer ((intmax_t)(a->u.i >= b->u.i)); + if (is_string(a) || is_string(b)) { + to_string(a); + to_string(b); + r = make_integer((intmax_t)(strcoll(a->u.s, b->u.s) >= 0)); + } else { + assert_to_integer(a); + assert_to_integer(b); + r = make_integer((intmax_t)(a->u.i >= b->u.i)); } - free_value (a); - free_value (b); - return r; + free_value(a); + free_value(b); + return (r); } struct val * @@ -425,19 +409,19 @@ { struct val *r; - if (isstring (a) || isstring (b)) { - to_string (a); - to_string (b); - r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) <= 0)); - } else { - (void)to_integer(a); - (void)to_integer(b); - r = make_integer ((intmax_t)(a->u.i <= b->u.i)); + if (is_string(a) || is_string(b)) { + to_string(a); + to_string(b); + r = make_integer((intmax_t)(strcoll(a->u.s, b->u.s) <= 0)); + } else { + assert_to_integer(a); + assert_to_integer(b); + r = make_integer((intmax_t)(a->u.i <= b->u.i)); } - free_value (a); - free_value (b); - return r; + free_value(a); + free_value(b); + return (r); } struct val * @@ -445,33 +429,31 @@ { struct val *r; - if (isstring (a) || isstring (b)) { - to_string (a); - to_string (b); - r = make_integer ((intmax_t)(strcoll (a->u.s, b->u.s) != 0)); - } else { - (void)to_integer(a); - (void)to_integer(b); - r = make_integer ((intmax_t)(a->u.i != b->u.i)); + if (is_string(a) || is_string(b)) { + to_string(a); + to_string(b); + r = make_integer((intmax_t)(strcoll(a->u.s, b->u.s) != 0)); + } else { + assert_to_integer(a); + assert_to_integer(b); + r = make_integer((intmax_t)(a->u.i != b->u.i)); } - free_value (a); - free_value (b); - return r; + free_value(a); + free_value(b); + return (r); } -int -chk_plus(intmax_t a, intmax_t b, intmax_t r) +void +assert_plus(intmax_t a, intmax_t b, intmax_t r) { - - /* sum of two positive numbers must be positive */ - if (a > 0 && b > 0 && r <= 0) - return 1; - /* sum of two negative numbers must be negative */ - if (a < 0 && b < 0 && r >= 0) - return 1; - /* all other cases are OK */ - return 0; + /* + * sum of two positive numbers must be positive, + * sum of two negative numbers must be negative + */ + if ((a > 0 && b > 0 && r <= 0) || + (a < 0 && b < 0 && r >= 0)) + errx(ERR_EXIT, "overflow"); } struct val * @@ -479,36 +461,24 @@ { struct val *r; - if (!to_integer(a) || !to_integer(b)) { - errx(ERR_EXIT, "non-numeric argument"); - } + assert_to_integer(a); + assert_to_integer(b); + r = make_integer(a->u.i + b->u.i); + assert_plus(a->u.i, b->u.i, r->u.i); - if (eflag) { - r = make_integer(a->u.i + b->u.i); - if (chk_plus(a->u.i, b->u.i, r->u.i)) { - errx(ERR_EXIT, "overflow"); - } - } else - r = make_integer((long)a->u.i + (long)b->u.i); - - free_value (a); - free_value (b); - return r; + free_value(a); + free_value(b); + return (r); } -int -chk_minus(intmax_t a, intmax_t b, intmax_t r) +void +assert_minus(intmax_t a, intmax_t b, intmax_t r) { - /* special case subtraction of INTMAX_MIN */ - if (b == INTMAX_MIN) { - if (a >= 0) - return 1; - else - return 0; - } - /* this is allowed for b != INTMAX_MIN */ - return chk_plus (a, -b, r); + if (b == INTMAX_MIN && a < 0) + errx(ERR_EXIT, "overflow"); + /* check addition of negative subtrahend */ + assert_plus(a, -b, r); } struct val * @@ -516,33 +486,25 @@ { struct val *r; - if (!to_integer(a) || !to_integer(b)) { - errx(ERR_EXIT, "non-numeric argument"); - } + assert_to_integer(a); + assert_to_integer(b); + r = make_integer(a->u.i - b->u.i); + assert_minus(a->u.i, b->u.i, r->u.i); - if (eflag) { - r = make_integer(a->u.i - b->u.i); - if (chk_minus(a->u.i, b->u.i, r->u.i)) { - errx(ERR_EXIT, "overflow"); - } - } else - r = make_integer((long)a->u.i - (long)b->u.i); - - free_value (a); - free_value (b); - return r; + free_value(a); + free_value(b); + return (r); } -int -chk_times(intmax_t a, intmax_t b, intmax_t r) +void +assert_times(intmax_t a, intmax_t b, intmax_t r) { - /* special case: first operand is 0, no overflow possible */ - if (a == 0) - return 0; - /* verify that result of division matches second operand */ - if (r / a != b) - return 1; - return 0; + /* + * if first operand is 0, no overflow is possible, + * else result of division test must match second operand + */ + if (a != 0 && r / a != b) + errx(ERR_EXIT, "overflow"); } struct val * @@ -550,32 +512,24 @@ { struct val *r; - if (!to_integer(a) || !to_integer(b)) { - errx(ERR_EXIT, "non-numeric argument"); - } - - if (eflag) { - r = make_integer(a->u.i * b->u.i); - if (chk_times(a->u.i, b->u.i, r->u.i)) { - errx(ERR_EXIT, "overflow"); - } - } else - r = make_integer((long)a->u.i * (long)b->u.i); + assert_to_integer(a); + assert_to_integer(b); + r = make_integer(a->u.i * b->u.i); + assert_times(a->u.i, b->u.i, r->u.i); - free_value (a); - free_value (b); + free_value(a); + free_value(b); return (r); } -int -chk_div(intmax_t a, intmax_t b) +void +assert_div(intmax_t a, intmax_t b) { - /* div by zero has been taken care of before */ + if (b == 0) + errx(ERR_EXIT, "division by zero"); /* only INTMAX_MIN / -1 causes overflow */ if (a == INTMAX_MIN && b == -1) - return 1; - /* everything else is OK */ - return 0; + errx(ERR_EXIT, "overflow"); } struct val * @@ -583,51 +537,33 @@ { struct val *r; - if (!to_integer(a) || !to_integer(b)) { - errx(ERR_EXIT, "non-numeric argument"); - } - - if (b->u.i == 0) { - errx(ERR_EXIT, "division by zero"); - } - - if (eflag) { - r = make_integer(a->u.i / b->u.i); - if (chk_div(a->u.i, b->u.i)) { - errx(ERR_EXIT, "overflow"); - } - } else - r = make_integer((long)a->u.i / (long)b->u.i); + assert_to_integer(a); + assert_to_integer(b); + /* assert based on operands only, not on result */ + assert_div(a->u.i, b->u.i); + r = make_integer(a->u.i / b->u.i); - free_value (a); - free_value (b); - return r; + free_value(a); + free_value(b); + return (r); } - + struct val * op_rem(struct val *a, struct val *b) { struct val *r; - if (!to_integer(a) || !to_integer(b)) { - errx(ERR_EXIT, "non-numeric argument"); - } - - if (b->u.i == 0) { - errx(ERR_EXIT, "division by zero"); - } - - if (eflag) - r = make_integer(a->u.i % b->u.i); - /* chk_rem necessary ??? */ - else - r = make_integer((long)a->u.i % (long)b->u.i); + assert_to_integer(a); + assert_to_integer(b); + /* pass a=1 to only check for div by zero */ + assert_div(1, b->u.i); + r = make_integer(a->u.i % b->u.i); - free_value (a); - free_value (b); - return r; + free_value(a); + free_value(b); + return (r); } - + struct val * op_colon(struct val *a, struct val *b) { @@ -642,33 +578,30 @@ to_string(b); /* compile regular expression */ - if ((eval = regcomp (&rp, b->u.s, 0)) != 0) { - regerror (eval, &rp, errbuf, sizeof(errbuf)); + if ((eval = regcomp(&rp, b->u.s, 0)) != 0) { + regerror(eval, &rp, errbuf, sizeof(errbuf)); errx(ERR_EXIT, "%s", errbuf); } /* compare string against pattern */ /* remember that patterns are anchored to the beginning of the line */ - if (regexec(&rp, a->u.s, (size_t)2, rm, 0) == 0 && rm[0].rm_so == 0) { + if (regexec(&rp, a->u.s, (size_t)2, rm, 0) == 0 && rm[0].rm_so == 0) if (rm[1].rm_so >= 0) { *(a->u.s + rm[1].rm_eo) = '\0'; - v = make_str (a->u.s + rm[1].rm_so); + v = make_str(a->u.s + rm[1].rm_so); - } else { - v = make_integer ((intmax_t)(rm[0].rm_eo - rm[0].rm_so)); - } - } else { - if (rp.re_nsub == 0) { - v = make_integer ((intmax_t)0); - } else { - v = make_str (""); - } - } + } else + v = make_integer((intmax_t)(rm[0].rm_eo - rm[0].rm_so)); + else + if (rp.re_nsub == 0) + v = make_integer((intmax_t)0); + else + v = make_str(""); /* free arguments and pattern buffer */ - free_value (a); - free_value (b); - regfree (&rp); + free_value(a); + free_value(b); + regfree(&rp); - return v; + return (v); } --------------060600010302060201060806--