From: Christoph Mallon
Date: Thu, 27 Nov 2008 21:39:45 +0100
To: freebsd-hackers@freebsd.org
Message-ID: <492F0591.7050807@gmx.de>
Subject: New C compiler and analyzer lang/cparser in ports

A few days ago libFIRM[1] and cparser were added to the ports tree. If you want to see what compilers other than GCC have to offer, this might be of interest to you. libFIRM is a modern optimizing intermediate representation (IR) library. cparser is a C compiler that provides many useful warnings and uses libFIRM for optimization and code generation.

libFIRM is a library implementing a modern graph-based IR for compilers, which operates exclusively on program graphs in SSA form.
It is developed at the IPD at the Universität Karlsruhe (TH)[2]. SSA form is maintained from the construction of the IR graph until assembler code is emitted. libFIRM offers many analyses and optimizations, provides extensive debugging support and includes a backend framework. At the moment it can produce code for 32-bit x86 processors. There are also unfinished backends for ARM, MIPS and PPC32. The backend uses a novel SSA-based register allocator and includes several algorithms for instruction scheduling, spilling, copy coalescing and basic block scheduling. All stages of the compilation process can be dumped and inspected with our graph viewer yComp[3].

cparser is a C compiler that can parse C89 and C99 as well as many GCC and some MSVC extensions. The handled GCC extensions include __attribute__, inline assembler, computed goto and statement expressions. The compiler driver is compatible with GCC (-fxxx, -Wxxx, -M, ...). It also provides many useful analyses for warnings.

Take a look at the examples below to get a glimpse of what cparser and libFIRM have to offer. You can find more information about them at http://www.libfirm.org. Of course, they are capable of compiling non-trivial programs like the SPEC INT2000 suite and Quake 3[4].

If I have sparked your interest in cparser and libFIRM, try the port lang/cparser. If you have any questions, suggestions or problems regarding cparser and libFIRM, please feel free to contact us.

Regards
	Christoph


Examples
--------

One goal is to provide concise error messages that help to pinpoint the cause of a problem. This is achieved by using a handwritten recursive descent parser, which includes some heuristics to produce sensible error messages. A further aim is to resume parsing immediately after a parse error, in order to avoid follow-up errors. In this way, the number of stray errors after common problems, like a missing #include or a misspelled type name, is minimized.
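
To make the recovery idea concrete, here is a minimal sketch in C. It is not cparser's code: the function parse_declarations, the pre-split token array and the "two identifiers in a row" heuristic are assumptions made up for illustration. It only shows how a hand-written parser might discard one stray symbol inside the declaration specifiers and carry on, so that a single misspelled keyword yields a single error.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Illustrative only: a tiny keyword table, not cparser's real one. */
static int is_specifier(const char *tok)
{
    static const char *const keywords[] = {
        "char", "int", "long", "short", "signed", "unsigned"
    };
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; ++i) {
        if (strcmp(tok, keywords[i]) == 0)
            return 1;
    }
    return 0;
}

/* Anything that is neither the end marker, ';' nor a keyword. */
static int is_identifier(const char *tok)
{
    return tok != NULL && strcmp(tok, ";") != 0 && !is_specifier(tok);
}

/*
 * Hypothetical helper: parse "<specifiers> <name> ;" declarations from a
 * NULL-terminated token array. Returns the number of errors; *ndecls
 * receives the number of declarators that were recognised.
 */
int parse_declarations(const char *const *tok, int *ndecls)
{
    int errors = 0;
    *ndecls = 0;

    while (*tok != NULL) {
        /* Declaration specifiers. */
        for (;;) {
            if (*tok != NULL && is_specifier(*tok)) {
                ++tok;
            } else if (is_identifier(tok[0]) && is_identifier(tok[1])) {
                /* Two identifiers in a row: assume the first one is a
                 * misspelled keyword, drop it and keep going. */
                fprintf(stderr,
                        "error: discarding stray '%s' in declaration specifier\n",
                        *tok);
                ++errors;
                ++tok;
            } else {
                break;
            }
        }

        /* Declarator: a single name in this sketch. */
        if (is_identifier(*tok)) {
            ++*ndecls;
            ++tok;
        } else {
            fprintf(stderr, "error: expected a name\n");
            ++errors;
        }

        /* Terminator; on failure skip ahead to the next ';'. */
        if (*tok != NULL && strcmp(*tok, ";") == 0) {
            ++tok;
        } else {
            fprintf(stderr, "error: expected ';'\n");
            ++errors;
            while (*tok != NULL && strcmp(*tok, ";") != 0)
                ++tok;
            if (*tok != NULL)
                ++tok;
        }
    }
    return errors;
}
```

Called on the tokens of "unsigned lng x; unsigned int y;", this sketch reports one error for 'lng' and still recognises both declarations, instead of losing track of the rest of the input.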
Here is a small example with several mistakes, followed by the messages cparser prints for it:

  1  struct X {
  2      unsigned lng x;
  3      unsigned int y;
  4  };
  5  vodi f(struct X* p) { p.x = 23; }
  6  void g(struct X* p) { p->y = p.z; }

parse_error.c:2: error: discarding stray 'symbol lng' in declaration specifier
parse_error.c:5: error: 'symbol vodi' does not name a type
parse_error.c:5: error: request for member 'x' in something not a struct or union, but 'struct X*'
parse_error.c:6: error: request for member 'z' in something not a struct or union, but 'struct X*'
parse_error.c:6: error: 'struct X*' has no member named 'z'

For comparison, these are the error messages produced by GCC 4.2.1:

parse_error.c:2: error: expected ':', ',', ';', '}' or '__attribute__' before 'x'
parse_error.c:5: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'f'
parse_error.c: In function 'g':
parse_error.c:6: error: 'struct X' has no member named 'y'
parse_error.c:6: error: request for member 'z' in something not a structure or union

cparser can detect a large number of undesirable constructs and warns about them. Here are a few examples:

* unused and never read variables

  1  #include <stdlib.h>
  2  void f(void)
  3  {
  4      int x;
  5      if ((x = rand()))
  6          x = x + 1;
  7  }

write_only.c:4: warning: variable 'x' is never read

cparser warns about 'x' because it is only read to calculate its own new value. GCC does not detect this. Cases similar to line 5 were recently found by Coverity in the FreeBSD source code[5].

* unreachable code

  1  #include <stdio.h>
  2  #include <stdlib.h>
  3  int main(void)
  4  {
  5      for (int i = 0; i != 10; ++i) {
  6          if (rand() == 23) /* mind the missing {} */
  7              puts("ouch");
  8              return 1;
  9      }
 10      return 0;
 11  }

unreachable.c:5: warning: step of for-statement is unreachable

GCC does not warn here.

* format string checker for char and wide char (wchar_t) functions

  1  #include <stdio.h>
  2  #include <wchar.h>
  3  int main(int argc, char* argv[])
  4  {
  5      printf("%u %s", argc);
  6      wprintf(L"%S", "hello", 42);
  7      return 0;
  8  }

format.c:5: warning: argument type 'int' does not match conversion specifier '%u' at position 1
format.c:5: warning: too few arguments for format string
format.c:6: warning: argument type 'char*' does not match conversion specifier '%S' at position 1
format.c:6: warning: 2 arguments but only 1 format specifier

GCC can only check char format strings, not wchar_t ones.

* missing return statements

  1  int f(int x)
  2  {
  3      switch (x) {
  4          case 42: return 0;
  5          case 23: break;
  6          default: return 1;
  7      }
  8  }

missing_return.c:5: warning: control reaches end of non-void function

The location is pinpointed exactly: the last statement before leaving the function is the break. In contrast, GCC refers to line 8.

* typical security warnings

  1  #include
  2  #include <unistd.h>
  3  void f(char* x)
  4  {
  5      if (getpid || &*&x || x == "bla") {}
  6  }

security.c:5: warning: the address of 'getpid' will always evaluate as 'true'
security.c:5: warning: the address of 'x' will always evaluate as 'true'
security.c:5: warning: comparison with string literal results in unspecified behaviour

* missing and redundant declarations

  1  static void f(void);
  2  int x;
  3  static void f(void) {}

decl.c:2: warning: no previous declaration for 'int x'
decl.c:1: warning: unnecessary static forward declaration for 'void f(void)'

The warning about 'x' typically indicates a missing header include or a variable that should be static. The first declaration of 'f' is unnecessary, because the function is not used before its definition.

The generated assembler is heavily annotated.
Some example C source code:

  1  #include <stdio.h>
  2  #include <stdlib.h>
  3  int f(void)
  4  {
  5      int x = rand();
  6      if (rand())
  7          puts("bla");
  8      return x;
  9  }

In the resulting assembler output below you see:

* a list of predecessors and the estimated execution frequency for every basic block

    /* .L897: preds: 1034, freq: 0.500000 */

* comments naming the corresponding node in the intermediate representation and, if available, the original source line of an instruction. The numbers in [] are internal node counters, which can be referred to in yComp or be used in conjunction with the debugging features of libFIRM.

    /* ia32_CallT[1086:53] annotate.c:6 */

* function begin/end markers (good for scripting)

    # -- Begin  f
    # -- End  f

The resulting assembler output:

    .section    .text
.Ltext0:
# -- Begin  f
    .p2align 4,,15
.globl f
    .type   f, @function
f:
    /* .L1034: preds: none, freq: 1.000000 */
    subl $4, %esp                    /* be_IncSPIu[1001:12] */
    call rand                        /* ia32_CallT[1059:26] ann.c:5 */
    movl %eax, (%esp)                /* ia32_StoreM[1213:171] */
    call rand                        /* ia32_CallT[1086:53] ann.c:6 */
    testl %eax, %eax                 /* ia32_TestIu[1220:178] ann.c:6 */
    jne .L776                        /* ProjX[805:139] ann.c:6 */
    /* fallthrough to .L897 */       /* ProjX[806:140] ann.c:6 */
    /* .L897: preds: 1034, freq: 0.500000 */
    popl %eax                        /* ia32_PopT[1218:176] */
    ret                              /* be_ReturnX[1031:125] ann.c:8 */
    .p2align 4,,7
.L776: /* preds: 1034, freq: 0.500000 */
    pushl $Lstr.168                  /* ia32_PushT[1223:181] ann.c:7 */
    call puts                        /* ia32_CallT[1118:85] ann.c:7 */
    addl $4, %esp                    /* be_IncSPIu[982:113] */
    popl %eax                        /* ia32_PopT[1221:179] */
    ret                              /* be_ReturnX[1020:107] ann.c:8 */
    .size   f, .-f
# -- End  f

    .section    .rodata
    .type   Lstr.168, @object
    .size   Lstr.168, 4
Lstr.168:
    .string "bla"

----
[1] http://www.libfirm.org/
[2] http://www.info.uni-karlsruhe.de/
[3] http://www.info.uni-karlsruhe.de/software.php/id=6&lang=en
[4] specifically ioquake3
[5] for example see http://lists.freebsd.org/pipermail/svn-src-all/2008-November/001623.html