From owner-freebsd-questions@FreeBSD.ORG Sun Aug 3 12:34:31 2003 Return-Path: Delivered-To: freebsd-questions@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 2BF1E37B417 for ; Sun, 3 Aug 2003 12:34:31 -0700 (PDT) Received: from out003.verizon.net (out003pub.verizon.net [206.46.170.103]) by mx1.FreeBSD.org (Postfix) with ESMTP id 29D8443F3F for ; Sun, 3 Aug 2003 12:34:30 -0700 (PDT) (envelope-from cswiger@mac.com) Received: from mac.com ([151.205.189.55]) by out003.verizon.net (InterMail vM.5.01.05.33 201-253-122-126-133-20030313) with ESMTP id <20030803193429.IVAC21766.out003.verizon.net@mac.com> for ; Sun, 3 Aug 2003 14:34:29 -0500 Message-ID: <3F2D63C2.7010501@mac.com> Date: Sun, 03 Aug 2003 15:34:26 -0400 From: Chuck Swiger Organization: The Courts of Chaos User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 X-Accept-Language: en-us, en MIME-Version: 1.0 Cc: freebsd-questions@freebsd.org References: <3F1322A9.8080805@mac.com> <20030731225137.GA15353@rot13.obsecurity.org> <3F29C399.6070108@mac.com> <20030801020842.GA16234@rot13.obsecurity.org> <3F29D0E1.30800@mac.com> <20030801030039.GA87206@falcon.midgard.homeip.net> <3F2BE47A.5080302@mac.com> <20030802174536.GA14507@falcon.midgard.homeip.net> <3F2C1679.5010702@mac.com> <20030802215728.GA16558@falcon.midgard.homeip.net> In-Reply-To: <20030802215728.GA16558@falcon.midgard.homeip.net> X-Enigmail-Version: 0.76.4.0 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-Authentication-Info: Submitted using SMTP AUTH at out003.verizon.net from [151.205.189.55] at Sun, 3 Aug 2003 14:34:28 -0500 Subject: Re: buggy optimization levels... X-BeenThere: freebsd-questions@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: User questions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 03 Aug 2003 19:34:31 -0000 Erik Trulsson wrote: > On Sat, Aug 02, 2003 at 03:52:25PM -0400, Chuck Swiger wrote: [ ... ] > That wasn't my real point anyway. I was trying to refute your statement > that "Even if the code contains a bug, "cc -O" and "cc -O -fgcse" > should produce the same results." > > I claim that if the code has a bug that results in undefined behaviour > then the compiler is allowed to produce different results when invoked > with different optimization flags. If the code being compiled has a bug that results in undefined behavior, the compiler is allowed to produce different results when invoked with different optimization flags. While true, that doesn't refute my statement: what the compiler is allowed to do, and what the compiler should do, is required not to diverge for code which does not involve undefined behavior. [ ... ] >>Page 586 of _Compilers: Principles, Techniques, and Tools_ states: >> >>First, a transformation must preserve the meaning of programs. That is, an >>"optimization" must not change the output produced by a program for a given >>input, or cause an error, such as a division by zero, that was not present >>in the original program. The influence of this criterion prevades this >>chapter; at all times we take the "safe" approach of missing an opportunity >>to apply a transformation rather than risk changing what the program does. >> >> -- >> >> Like your divide-by-zero example above, or this paragraph about "semantics" >> vs "meaning", I'm not going to disagree if you want to state that the >> running time of a program is part of the "behavior". However, using the >> terms in such a fashion precludes you from understanding the intended >> meaning in this particular context. > > I understand the intended meaning. I just don't agree completely with > your reasoning. I would say that the compiler is not allowed to > *introduce* errors, or to change the meaning of *correct* programs. So far, so good. > (Correct here essentially meaning programs that do not invoke undefined > behaviour.) For programs that do not have a defined 'meaning' the > compiler is free to do anyting. This is wrong. The fact that integer divide-by-zero by not defined by ANSI C (C89, et al), means the following program does not have well-defined behavior: int main() { return 120 / 0; } ...however, what happens if you remove the semicolon after the zero? That results in a syntax error, and the compiler is _not_ "free to do anything" it likes in such a case: it is required to identify the syntax error in code which fails to parse by issuing an error message: d.c: In function `main': d.c:3: syntax error before `}' There is more to the point just above than just identifying syntax errors. If you have 5 input source files (call them "translation units"), of which four are valid code, and the fifth contains an error resulting in undefined behavior, the compiler is required to compile four of the five source files in a well-defined fashion. One can repeat that analysis for code within the fifth input source file: if you moved only the function containing the bug/undefined code to a sixth file, one would discover that the compiler will handle the rest of the code in that file in a well-defined fashion. One could repeat the analysis yet again, using units known as "basic blocks", which may either be a single intermediate code instruction (that's intermediate code within the AST, not the input source code), or a sequence of instructions terminated by a flow-of-control instruction like a branch, return from subroutine, memory protection/barrier commands, etc. > If a compiler was not allowed to change the result of running a program > having undefined behaviour when compiled with different optimization > flags, then this would preclude doing just about any optimization. No, it would not. Code optimization consists of transformations to the AST based on algebraic invariants, liveness analysis used by CSE and dead-code removal, loop invariants, and other techniques which are universal (platform-independant), as well as register allocation, peephole analysis, and other transformations to the target code which are platform-specific. What happens to source code with undefined semantics isn't particularly useful to the person writing compiler: valid optimization techniques are required to not change the meaning of any possible well-defined input source code, so being able to behave differently in the case of undefined semantics doesn't help. > I am fairly certain that for *all* the specific optimizations available > in gcc it is possible to find *some* program that will give different > results depending on if that optimization ws used or not. Are you claiming that every specific optimization available in gcc contains bugs? :-) > If one were to accept your claims that the compiler should not perform > any optimizations that could change the behaviour any program, then it > would not be able to do any optimization at all, which is clearly not a > desirable situation. It is entirely possible to write an optimizing compiler which does not make any changes to the behavior of any valid input source program. Aho, Sethi, and Ullman spend about 150 pages discussing how to do so in the reference I provided. A simple example would be: if (0 && (expression)) statement; DeMorgan's laws and short-circuit evaluation in C let you simply that to: if (0) statement; ...which will then be removed entirely by dead-code analysis. > The reason why the C standard does not define the behaviour of certain > constructions or code sequences, and why compilers give different > results when compiling such code, is that doing so allows compilers to > perform more aggressive optimizations on correct code. People sometimes compile without using the optimizer at all, yes? The issues of why "the C standard does not define the behavior..." and "code optimization" are orthogonal. For the most part, the reason why C standard leaves things undefined is because of platform-specific differences in how the underlying hardware behaves-- things like register, long, and pointer size, byte order. It's generally easier to write optimizations where things are well-defined than where they are not: in particular, the pointer aliasing issues in C make liveness analysis a nightmare compared to Java or Python. -- -Chuck