Date: Thu, 29 Jun 2006 06:03:22 GMT
Message-Id: <200606290603.k5T63MfB015464@repoman.freebsd.org>
From: Spencer Whitman
To: Perforce Change Reviews
Subject: PERFORCE change 100265 for review

http://perforce.freebsd.org/chv.cgi?CH=100265

Change 100265 by swhitman@swhitman_joethecat on 2006/06/29 06:03:09

	Commented files

Affected files ...

.. //depot/projects/soc2006/swhitman-K_Kernel_Meta-Language/k/SocTask1#4 edit
.. //depot/projects/soc2006/swhitman-K_Kernel_Meta-Language/k/cpp.c#4 edit
.. //depot/projects/soc2006/swhitman-K_Kernel_Meta-Language/k/file.c#4 edit
.. //depot/projects/soc2006/swhitman-K_Kernel_Meta-Language/k/k.c#5 edit
.. //depot/projects/soc2006/swhitman-K_Kernel_Meta-Language/k/k.h#4 edit
.. //depot/projects/soc2006/swhitman-K_Kernel_Meta-Language/k/string.c#4 edit

Differences ...

==== //depot/projects/soc2006/swhitman-K_Kernel_Meta-Language/k/SocTask1#4 (text+ko) ====

@@ -55,3 +55,163 @@
 beneficially in the FreeBSD kernel source code.

 Implement it
+
+
+
+____________________________________________________________________
+
+Domain specific languages have been known for over three decades and
+have been a widely accepted paradigm for more than half of that time.
+
+So the time has come to create a DSL for kernel coding.
+
+Only problem is, we're not quite sure what it should do and migration
+is a tricky business on its own, because people are so damned
+conservative in this project.
+
+In FreeBSD we do not want to get any further into the compiler
+business than we have to. GCC has always been a major headache for
+us over the years, and we just wish we didn't have to even think
+about compilers at all.
+
+So the "real" task for the secret K-language conspiracy (largely
+myself as the GodFather with George (gnn@) and Diomedis (dss@)
+as my henchmen) is to sneak 'K' in by the backdoor, by making
+the developers' lives a little bit easier with every step we take.
+
+If we look at the endlösung we're aiming at, it consists of a
+"kcc" compiler which compiles .h, .c and .k files into C language
+which a C compiler (likely GCC) will turn into executable code
+for us.
+
+One benefit appears right there: No matter how they screw up
+GCC in the future, we have a layer where we can isolate our
+source code from these screwups. (Or imagine the STDC people
+suddenly making "lock" a reserved word or something).
+
+It also follows from the above that 'K' itself must be a superset
+of 'C', with the footnote that the 'C' we talk about is the subset
+of STDC which we have settled on using for the FreeBSD kernel.
+(There are things in STDC we don't use in the kernel. Trigraphs,
+floating point and wide characters being the most prominent examples.)
+
+So in order to get anywhere, what we need to do is to be able to insert
+a program in the compiler chain which will not affect the compilation,
+but which will give us a place to start implementing and experimenting
+with the K extensions to C.
+
+Inserting such a program will slow compilation down a bit,
+so we need to bring some benefit to justify this slowdown.
+
+But there is another avenue in: In FreeBSD we have the style(9)
+coding style, and we could gain some traction if we provided
+a program which would warn about transgressions on style(9)
+in the same way as lint(1) warns about transgressions on C.
+
+This is a less heavy burden to lift because we do not need to
+generate code, only messages based on our analysis.
+
+
+This is where we are right now: trying to write that program
+and trying to identify and implement those benefits.
+
+
+The code is basically a small lexer&parser for the FreeBSD subset of STDC.
+If you run the FreeBSD kernel sources through a CPP macroprocessor
+first, my code will lex and parse the kernel sources correctly.
+
+It doesn't generate any code at this point, it merely avoids
+barfing.
+
+
+So your first task is to implement the necessary CPP macro processor
+facilities so that we can avoid using an external CPP to run FreeBSD
+kernel sources through.
+
+This basically means #define, #if, #ifdef ... #endif and macro
+expansion.
+
+The good news is that it shouldn't take too long, CPP is a pretty
+simple concept, although some of the STDC decisions foul up some
+corner cases.
+
+The bad news is that there always seems to be some piece of code
+which relies on any particular weird corner case of the CPP language.
+
+
+Next step is to look for tangible benefits.
+
+The #! expansion is my first guess (but better ideas are very
+welcome!) and after that we should probably see if we can detect
+unused #include files (a continuing problem in FreeBSD) and after
+that look for things in style(9) which we can detect with the
+full cpp/lexer/parser combo.
+
+However, these are merely my ideas, and if you have or come across
+better ideas I am all ears.
+
+
+I hope you understand that all these weird restrictions are not
+put in place to make your life miserable. Introducing a new
+language for kernel coding in a conservative project like FreeBSD
+takes some careful planning and there are many toes we need to
+avoid stomping on. But 14 years of experience with this crew
+has taught me that making their life easier in the long run will
+always win the hearts in the end.
+
+
+Now that you have studied the code a bit, I hope you can see how I
+tried to avoid copying data around more than necessary, for instance
+by pointing from the tokens into the original file rather than copying
+them into the token structure etc.
+
+This is an attempt to try to drag modern performance programming
+practice into the compiler, in the hope that we will end up with
+a compiler which can run very efficiently on modern multi-core
+cpus.
+
+A traditionally partitioned compiler like GCC runs as three
+processes with pipes between them:
+
+	cpp | cc1 | asm
+
+The pipes mean that the process has to dive into the kernel,
+fiddle around with locking there etc.
+
+In the K compiler, I still want to have distinct stages for reasons
+of structure, but I want them to live in the same process and hand
+data over without bothering the kernel if at all possible.
+
+
+The other thing which is important to me is that we build a graph
+so we can track backwards for error reporting.
+
+Some of the more horrible macros can give quite unhelpful diagnostics
+if used wrong, because the error is emitted from one of the middle
+layers of a stack of macro expansions.
+
+I would like the compiler to emit very detailed error messages, showing
+step by step how it ended up with the tokens it tried to process.
+Something like this mock-up of an error message:
+
+	Syntax error: Identifier expected, found floating number:
+		4.56 += 3.14;
+		^^^^
+	expanded from macro ADDFP(a,b)
+		defined at fooinclude.h line 8
+		called from fooinclude.h line 12
+		ADDFP(4.56, 3.14);
+	expanded from macro PLUSPI(aa)
+		defined at fooinclude.h line 9
+		called from mymacros.h line 123
+		g = PLUSPI(4.56);
+	#included from mysrc.c line 4
+		#include "mymacros.h"
+
+To do this, you have to build a tree for the macro expansions so
+that you can backtrack to generate these messages. But do keep in
+mind, most of the time the messages will not be emitted, so you
+should design the tree to be fast in the normal case where all that
+info will never be used. It doesn't matter if linear searches are
+necessary to generate the diagnostic message, the programmer will
+be wasting far more time to fix the mistake anyway.
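
The expansion tree asked for in SocTask1 could be sketched roughly as
follows. This is only an illustration, not code from the depot: the
names struct origin, struct tok and format_backtrace() are hypothetical,
and the real token structure in k.h may look quite different. The idea
is that each token carries a single pointer to the expansion or #include
that produced it, so the normal case costs one pointer store, and the
chain of parent pointers is walked linearly only when a diagnostic is
actually emitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical origin record: one per macro expansion or #include. */
enum origin_kind { ORIGIN_MACRO, ORIGIN_INCLUDE };

struct origin {
	enum origin_kind	 kind;
	const char		*name;		/* macro name or included file */
	const char		*def_file;	/* macro definition site (macros only) */
	int			 def_line;
	const char		*use_file;	/* call site or #include site */
	int			 use_line;
	const struct origin	*parent;	/* enclosing expansion, or NULL */
};

/* Hypothetical token: points into the source text and at its origin. */
struct tok {
	const char		*b, *e;		/* begin/end in the file buffer */
	const struct origin	*from;		/* NULL for plain source tokens */
};

/*
 * Format the chain of origins for a token into buf, innermost first.
 * Only used on the error path, so a linear walk is acceptable.
 * Returns the number of frames written in full.
 */
static int
format_backtrace(const struct tok *t, char *buf, size_t len)
{
	const struct origin *o;
	size_t used = 0;
	int n, frames = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (o = t->from; o != NULL; o = o->parent, frames++) {
		if (o->kind == ORIGIN_MACRO)
			n = snprintf(buf + used, len - used,
			    "expanded from macro %s\n"
			    "\tdefined at %s line %d\n"
			    "\tcalled from %s line %d\n",
			    o->name, o->def_file, o->def_line,
			    o->use_file, o->use_line);
		else
			n = snprintf(buf + used, len - used,
			    "#included from %s line %d\n",
			    o->use_file, o->use_line);
		if (n < 0 || (size_t)n >= len - used)
			break;		/* buffer full: stop at this frame */
		used += (size_t)n;
	}
	return (frames);
}
```

With one origin record each for ADDFP, PLUSPI and the #include of
mymacros.h, linked through their parent pointers, a token produced by
ADDFP yields the three-level trace of the mock-up above. Nothing in the
sketch is touched while lexing or parsing succeeds.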

==== //depot/projects/soc2006/swhitman-K_Kernel_Meta-Language/k/cpp.c#4 (text+ko) ====


==== //depot/projects/soc2006/swhitman-K_Kernel_Meta-Language/k/file.c#4 (text+ko) ====

@@ -19,9 +19,12 @@
 	int fd;

 	filename = String(filename, NULL);
+
+	/* Check if this file has been loaded already */
 	TAILQ_FOREACH(s, &sourcefile_head, list)
 		if (s->filename == filename)
 			return (s);
+
 	fd = open(filename, O_RDONLY);
 	if (fd < 0)
 		return (NULL);

==== //depot/projects/soc2006/swhitman-K_Kernel_Meta-Language/k/k.c#5 (text+ko) ====

@@ -111,21 +111,32 @@
 	struct h *hf, *hg;
 	char *p;

+	/* Set up print stuff */
 	register_printf_render('T', printf_render_token, printf_arginfo_token);
 	register_printf_render_std("HVQM");
 	setbuf(stdout, NULL);

+	/* Set up string tokens */
 	InitString();

+	/* Create a new list of tokens and initialize the symbol lists */
 	hg = NewH();
 	hg->sym = NewSymScope();

+	/* Initialize type information */
 	InitTypes();

 #if 0
 	CppIarg("-I/usr/include");
 #endif
-
+	/* Get command line arguments
+	 * D: Not implemented
+	 * U: Not implemented
+	 * I: Include file optarg
+	 * W: Not implemented
+	 * c: Not implemented
+	 * default: print usage
+	 */
 	while ((ch = getopt(argc, argv, "cD:U:I:W:")) != -1) {
 		switch (ch) {
 		case 'D': CppDUarg(hg, optarg, 1); break;
@@ -140,27 +151,38 @@
 	}
 	argc -= optind;
 	argv += optind;
+	/* Exit in case of no file */
 	if (argc < 1)
 		errx(1, "Missing file argument(s)");

 	for (ch = 0; ch < argc; ch++) {
-printf("argv[%d] = %Q\n", ch, argv[ch]);
-		p = strrchr(argv[ch], '.');
-		if (p == NULL)
-			errx(1, "No '.' in filename %Q", argv[ch]);
-		if (p[1] == 'h') {
-printf("H file\n");
-			hf = hg;
-		} else if (p[1] == 'c') {
-printf("C file\n");
-			hf = NewH();
-			hf->sym = hg->sym;
-			PushSymScope(hf);
-		} else
-			errx(1, "Unknown filename suffix %Q", p);
-		Cpp(hf, argv[ch]);
-		if (0)
-			DumpRefs(stdout, hf);
+		printf("argv[%d] = %Q\n", ch, argv[ch]);
+
+		/* Determine what type of file has been passed to K */
+		p = strrchr(argv[ch], '.');
+
+		if (p == NULL)
+			errx(1, "No '.' in filename %Q", argv[ch]);
+
+		if (p[1] == 'h') {
+			printf("H file\n");
+			/* Use hg's token and symbol lists */
+			hf = hg;
+		} else if (p[1] == 'c') {
+			printf("C file\n");
+			/* Create a new list of tokens */
+			hf = NewH();
+			/* Set the symbol list to hg's */
+			hf->sym = hg->sym;
+			/* Add a new symbol scope to hf */
+			PushSymScope(hf);
+		} else
+			errx(1, "Unknown filename suffix %Q", p);
+
+		Cpp(hf, argv[ch]);
+
+		if (0)
+			DumpRefs(stdout, hf);
 		if (0)
 			DumpTokens(stdout, hf);
 		if (p[1] == 'c') {
@@ -169,7 +191,7 @@
 			PopSymScope(hf);
 		}
 	}
-
+
 	return (0);
 }

==== //depot/projects/soc2006/swhitman-K_Kernel_Meta-Language/k/k.h#4 (text+ko) ====

@@ -5,8 +5,8 @@
 /* -------------------------------------------------------------------*/

 struct s {
-	const char	*b;
-	const char	*e;
+	const char	*b;	/* Beginning of a file (in file.c) */
+	const char	*e;	/* End of the file (in file.c) */
 	struct ref	*r;
 };

==== //depot/projects/soc2006/swhitman-K_Kernel_Meta-Language/k/string.c#4 (text+ko) ====

@@ -36,7 +36,7 @@
 {
 	struct string *s;
 	struct string_head *h;
-	unsigned l, hash;
+	unsigned l, hash;	/* XXX hash is unused here */

 	assert(b != NULL);
 	if (e == NULL) {
@@ -50,6 +50,7 @@
 	hash = *b;
 	if (l > 1)
 		hash = (hash << 8) | b[1];
+	/* Have we already inserted this string into the hash table? */
 	h = &strings[*b % NHASH];
 	LIST_FOREACH(s, h, list) {
 		if (b == s->string)
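
The file.c hunk compares s->filename == filename by pointer right after
passing the name through String(), which suggests that String() hands
back one canonical copy per distinct string. The body of String() is not
shown in this change, so the following is only a sketch of that
interning technique under that assumption. The names intern(), struct
interned and NBUCKET are hypothetical, and a plain singly linked bucket
list stands in for the queue(3) macros used in string.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKET	64		/* hypothetical table size */

struct interned {
	struct interned	*next;
	size_t		 len;
	char		 text[];	/* NUL-terminated canonical copy */
};

static struct interned *buckets[NBUCKET];

/*
 * Return the canonical copy of the bytes [b, e).  If e is NULL, b is
 * taken to be a NUL-terminated string.  Equal inputs always yield the
 * same pointer, so callers can compare with == instead of strcmp().
 * Returns NULL if memory cannot be allocated.
 */
static const char *
intern(const char *b, const char *e)
{
	struct interned *s;
	size_t len;
	unsigned hash;

	assert(b != NULL);
	if (e == NULL)
		e = b + strlen(b);
	len = (size_t)(e - b);

	/* Cheap hash over the first one or two bytes. */
	hash = len > 0 ? (unsigned char)b[0] : 0;
	if (len > 1)
		hash = (hash << 8) | (unsigned char)b[1];

	/* Have we already inserted this string? Compare contents. */
	for (s = buckets[hash % NBUCKET]; s != NULL; s = s->next)
		if (s->len == len && memcmp(s->text, b, len) == 0)
			return (s->text);

	/* Not found: make the canonical copy and link it in. */
	s = malloc(sizeof(*s) + len + 1);
	if (s == NULL)
		return (NULL);
	s->len = len;
	memcpy(s->text, b, len);
	s->text[len] = '\0';
	s->next = buckets[hash % NBUCKET];
	buckets[hash % NBUCKET] = s;
	return (s->text);
}
```

Once every filename has gone through such a function, the "has this file
been loaded already" check reduces to a pointer comparison per list
entry. The sketch looks strings up by content; whether the real String()
does the same is not visible in this diff.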