From: MarkoSchuetz@web.de
To: qa@freebsd.org
Subject: Tool support...
Reply-To: marko@ki.informatik.uni-frankfurt.de
Date: Tue, 21 Nov 2000 15:40:30 +0100

IIRC, I proposed this once before. Here is some text that advocates Aegis. I have used it myself, albeit on a single-person project, and found the aggregation of automated tests in the repository to be an invaluable help.

What do others think? How could such a tool be integrated into FreeBSD's development model?
Marko

-----snip-----

Proposal: Aegis to manage Linux kernel development
Peter Miller (millerp@canb.auug.org.au)
Fri, 26 Mar 1999 12:27:53 +0100

Purpose of this Posting

Recently there have been discussions about how to manage the Linux kernel sources, rapidly side-tracking into how CVS isn't sufficiently capable of doing the job. These discussions appear in numerous places on the Internet, and have even appeared in more public forums, such as the recent Linux Expo. I would like to suggest a candidate for serious consideration: Aegis.

This post is rather long, and I apologize in advance if you feel this topic is an inappropriate use of linux-kernel bandwidth. While it is a "meta" issue, about management of the kernel sources rather than about the kernel itself, no other forum would appear more appropriate.

Summary for the Impatient

Source management is not enough. The Linux kernel is more than the aggregation of its source files. A tool which supports the software development process for large teams is required.

Aegis supports large teams and large projects. Aegis is designed around change sets. Aegis is designed around repository security (availability, integrity and confidentiality). Aegis' distributed development uses this existing mature functionality to keep two or more repositories synchronized.
Aegis supports multiple repositories, multiple lines of development, multiple distributed copies of repositories, and disconnected operation, and is security conscious. Aegis is licensed under the GNU GPL.

Aegis is mature software. It is 8 years old. It has users all around the world. It is actively being maintained and enhanced.

Aegis is easy to use. It -is- big, and it -does- have a lot of functionality, but the essential process can be learned in less than a day.

Aegis is available from http://www.canb.auug.org.au/~millerp/aegis/ Please download it, plus one of the template projects, to get a feel for the environment. If you would like more information, a Reference Manual and User Guide are also available from the same place.

Source Management is not Sufficient

In looking for a better way to manage the Linux kernel sources, it is necessary to look beyond the obvious and perennial file bashing, to see if there could be a larger picture. In writing software, there is one basic underlying activity, repeated again and again:

    edit, build, test, check, commit

Different textbooks and tools call the various steps different things, like edit, make, Unit Test, Peer Review, check-in. For single-person projects, some of these steps are so abbreviated as to be almost invisible, especially when you simply jump in and edit the files in the master source directly. And the activities are rarely so pure: usually there are iterations and backtracking, which also serve to obscure the underlying commonality of software development. The review step, in particular, often moves around a great deal.
For the maintainer of an Internet project, the activities are remarkably similar:

    edit: apply an incoming patch
    build: also serves to make sure it is consistent with itself and the rest of the project
    test: make sure it works (does the thing right)
    review: make sure it is appropriate (does the right thing)
    commit: yes, I'll accept this

The term ``source management'' carries with it a focus on the source files, but the activities outlined above only talk about files indirectly! Source management alone is not enough. Tools like RCS and SCCS concentrate exclusively on single files. CVS also concentrates on files, only at a slightly higher level.

Enter the Change Set

One of the most obvious things about the software development process outlined above is that it is about *sets* of files. You almost always edit several files to fix a bug or add a new feature; you then build them to stitch them together into the project; you test them as a set; if there is a review, they will be reviewed as a set; and you commit them together. A project makes progress by applying a series of these change sets, so tracking them is the only way to re-create self-consistent prior versions of the project.

Software developers, however, frequently work on several changes at once. Figuring out where one change set stops and another starts requires a modicum of discipline. The fuzziness of the boundaries often serves to obscure the underlying presence of change sets.

But are change sets enough? Change sets are, after all, a way of aggregating the right versions of sets of *files*, and the software development process above only mentions change sets indirectly.

What Could be More than Change Sets?

For many developers, even those working in large companies and in large teams, change sets are the best tool they have. They work, day in and day out, with change sets. And they get the job done.
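The "sets of files, tracked as one unit" idea can be sketched with nothing more than a manifest. The layout below, and the use of content checksums as stand-in version identifiers, are illustrative assumptions for this sketch only, not Aegis's actual format:

```shell
# Minimal sketch: a change set recorded as a manifest of files plus
# version identifiers. Content checksums (cksum) stand in for real
# version numbers; this is NOT how Aegis stores change sets.
work=$(mktemp -d)
cd "$work"
mkdir src
printf 'int main(void){return 0;}\n' > src/main.c
printf 'all:\n\tcc -o prog main.c\n'  > src/Makefile
# The change set is the *set*: both files, with their versions, as one unit.
( cd src && cksum main.c Makefile ) > changeset.manifest
cat changeset.manifest
```

Committing or reverting the manifest as a whole is what lets you re-create a self-consistent prior version of the project.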
But take a look, for a moment, at what the project maintainer does:

    if the patch doesn't apply cleanly, don't accept it
    if the patch doesn't build, don't accept it
    if the patch doesn't test OK, don't accept it
    if the patch doesn't look right, don't accept it
    else commit

Stepping back a bit, you will notice that these apply equally to work within a software house. How often have we all seen stuff which was allowed to skip one of the validations, only to get yanked and re-fixed later?

The next step in improving the development process is automating the tracking of these steps, to make sure each one has been done. Some tools merely beep at you if you skip a step; others make the validations mandatory before a commit may occur. Mandatory things usually get developers riled up, and prevent introduction of the tools. But these validations are done for a purpose: they are there to catch stuff-ups *before* they reach the repository. They exist to defend the quality of the product. They are not arbitrary rules; they merely check that we are doing the things we already say we are doing. The pay-back for such a tool is to detect such process blunders before they introduce defects into the project. Fixing them before they are committed is less effort than fixing them after they are released (if we are to believe cumulative experience *and* the numerous studies).

Let's look at the maintainer's role again for a moment. The first three steps (patch, build, test) can be automated. I would not suggest for a moment that the commit should be unconditional! Thus the fourth step, the code review, is the essential work of the maintainer. The pay-back of this is also clear: less mindless tedium for the maintainer.

What ELSE Could be More than Change Sets?

Most folks are not convinced by any of this. It's just a crock. They can do it perfectly well manually. They *have* been doing it manually for a decade or more - with more flexibility, too!
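The apply/build/test gate above can be automated in a few lines of shell. The toy project, the generated patch, and the `sh -n` "build" step below are stand-ins invented so the sketch runs on its own; a real gate would run the project's actual build and test suite:

```shell
# Sketch of the maintainer's gate: apply the patch, build it, test it,
# and only then declare it ready for the human step (review).
set -e
work=$(mktemp -d)
cd "$work"
mkdir proj proj.new
printf 'echo hello\n'        > proj/hello.sh       # the "master source"
printf 'echo hello, world\n' > proj.new/hello.sh   # the contributor's version
diff -u proj/hello.sh proj.new/hello.sh > change.patch || true  # diff exits 1 on difference
cd proj
patch -p1 < ../change.patch                  # step 1: does it apply cleanly?
sh -n hello.sh                               # step 2: does it "build" (syntax check)?
test "$(sh hello.sh)" = 'hello, world'       # step 3: does it pass its test?
cd ..
echo 'OK: ready for review'                  # step 4, review, stays with a human
```

With `set -e`, any failing step aborts before the "ready for review" line is printed, which is exactly the don't-accept-it behaviour of the pseudo-code above.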
Working in a team comes with a number of costs. The most obvious cost is that you need to manage the interactions between the developers. It rapidly becomes obvious that they can't all just leap into the source tree and edit the files directly, because pretty much instantly nothing compiles for anyone, and the change sets are obfuscated beyond redemption.

That's what work areas are for. They've been re-invented thousands of times, under zillions of different names (e.g. sand boxes), but they all do the same thing: each developer gets their own work area, and they leave the master source alone. They do all their work there, and only when they are ready to commit do files get modified in the master source.

Notice the strong correlation between work areas and change sets? Different tools make this correlation weaker or stronger, depending on what they are trying to achieve. The basic concept, however, is that change sets have meaning even after the files are committed, whereas a work area is where change sets are created and reproduced. A tool which seeks to do more than just manage files, or even change sets, needs to address work areas, too. This is particularly true when one of the validations (build, test or review) *fails*: you don't want the master source polluted.

Work areas are only half the story, though. Teams almost immediately lead to the next problem: file conflicts. No matter how you implement file locking, at some point you have to merge the competing edits. Different tools do this at different points in the software development process, but they all do it. The tool needs to track file versions in work areas, so you know whether the file is up-to-date (whether someone has committed a competing edit ahead of you). This isn't a big problem, because change sets must record file versions anyway.
If the file isn't up-to-date, you need a 3-way merge to bring it up-to-date (and you have the 3 versions: the one copied, the one in the work area, and the one most recently committed). Most tools prevent commit from occurring if the file needs to be merged. (You could prevent build and test, too, but that's a bit too officious - there are often good reasons for working with outdated sources.)

Software Configuration Management

``Nuh, uh. No way! I've tried BarfCase and it always crashed / went far too slowly / harassed me. Not going there!''

This is a common reaction to tools which attempt to do more than baby-sit files. On the whole, it's a very reasonable reaction, considering what some of them do to you and your system. However, SCM is the correct term (in the textbooks, anyway) for looking after the process and not just the files.

To look after more, you need to actually track the progress of change sets as they work their way through the process. Some tools are *very* invasive about this, and some are more subtle. There are things the SCM tool needs to know to do its job:

    * when a change set is created (this often implies the creation of a work area)
    * when a file is added to a change set, so the version can be recorded (this often implies a copy into the work area)
    * when files are created, deleted or renamed as part of a change set
    * the results of building the change set (either for warnings, or for errors if a commit is tried against a failed build)
    * the results of testing the change set (either for warnings, or for errors if a commit is tried against a failed test)
    * the results of reviewing the change set (either for warnings, or for errors if a commit is tried against a failed review)
    * when the change set is committed or abandoned (i.e. when it is finished)

None of these things are new. All of us are doing all of them already. Sometimes some of the steps are pretty short, but they are all there.
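Returning to the 3-way merge mentioned above: the standard `diff3` tool performs exactly that merge, given the common ancestor, the work-area copy, and the newly committed version. The three tiny files here are invented purely for illustration:

```shell
# Sketch of the 3-way merge: "base" is the version originally copied,
# "mine" is the work-area edit, "theirs" is the competing committed edit.
# diff3 -m takes MYFILE OLDFILE YOURFILE and merges non-overlapping changes.
work=$(mktemp -d)
cd "$work"
printf 'one\ntwo\nthree\n' > base
printf 'ONE\ntwo\nthree\n' > mine     # my edit touches line 1
printf 'one\ntwo\nTHREE\n' > theirs   # the competing edit touches line 3
diff3 -m mine base theirs > merged
cat merged
```

Because the two edits touch different lines, the merge is clean and `merged` contains both; had they touched the same line, diff3 would emit conflict markers and the tool could refuse the commit until a human resolved them.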
Distributed Development

Once you have change sets, you have the basics of distributed development. You can use their information about files and file versions to package them up and sling them across the net. But what do you do when you are the recipient of a change set? There is no way you are going to apply the damn thing to your repository sight unseen. You are going to check it all the ways you can: you will build it, you will test it, you will review it, and maybe decide to commit it. You need *process*.

Even when you are working alone, when you are the only user on a single PC, participation in a distributed development project is a -team- activity, and you need an SCM tool which is designed for working in teams. Source management alone is not enough.

Aegis

Aegis is a software configuration management system. It does all of the above and more besides, but it delegates as much as possible, so as to give you access to the other development tools you need:

    * the build step is watched, but what it does, and what tool you use to do it, is up to you. Yes, you can use make.
    * file merges are watched, but what they do, and what tool you use to do them, is up to you.
    * the test step is watched, but what it does, and what tool you use to do it, is up to you. It's also optional.
    * the review step is watched, but what it does, and what tool you use to do it, is up to you.
    * the commit step is watched, but what it does, and what tool you use to do it, is up to you. Yes, you can use RCS. Yes, you can use SCCS.

Aegis does all this, but introduces a bare minimum of commands. Most of them perform functions developers are already intimately familiar with, and the others have an obvious purpose in a process like the one described above. Some of them are described here:

    aenc (new change), aedb (develop begin) - used to create a change set, and to create its work area.
    aecp (copy files) - analogous to RCS ``co'', used to copy files into the change set, and to remember the version.
    aeb (build) - used to run the build tool of your choice, and wait for the exit status.
    aed (diff) - used to see the differences between the baseline and the change set.
    aede (develop end) - used to say the change set is ready for review.
    aerpass (review pass) - used to say a change set has passed review.

The commands are different (e.g. aeb vs make, aecp vs co), but the activities are familiar. Aegis is easy to use - believe it or not, you've just seen all of the *routine* commands necessary for a developer to submit a change (there are only a couple more routine commands for change set integrators, and they are often automated).

One more command... The aedist command is used to package change sets for sending, and to unpackage them on receipt.

    aedist -send -change N | mail linus

will take change set N and mail it somewhere. Easy. To apply it at the other end (I use MH in this example) you simply say

    show | aedist -receive

The change set will be unpacked into a separate work area, built, and tested (if tests are enabled). If the change set has no problems, it will then stop and wait for review. Similar things can be done with aedist for web servers and clients.

Where to from Here?

Can Aegis do the job? I believe that it can, but you should not take my word for it! Download a copy and start playing. Get a feel for it. You can get Aegis from http://www.canb.auug.org.au/~millerp/aegis/ If you would like to read some manuals, there are PostScript copies of the User Guide and Reference Manual available for download from the same place.

Once you have Aegis installed, download one of the template projects, available from the same place. These template projects get you up and running very quickly. (They also exercise the distributed development functionality to do so: your first taste.)

In order to have an informed discussion of the merits of Aegis, a number of people need to download Aegis and try it out - and also try out distributing change sets with it.
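Mechanically, "distributing a change set" as aedist does boils down to packaging a self-contained bundle on one side of a pipe and unpacking it on the other. This toy sketch uses plain tar and invented sender/receiver directories; the real aedist does far more, notably replaying the build/test/review process on receipt:

```shell
# Sketch of change-set distribution as a self-contained package
# piped from sender to receiver. Directory names and file contents
# are invented for the sketch; this is not aedist's wire format.
work=$(mktemp -d)
cd "$work"
mkdir -p sender/changeset receiver
printf 'fix: off-by-one in foo()\n' > sender/changeset/DESCRIPTION
printf 'better contents\n'          > sender/changeset/foo.c
# "send" and "receive" are the two ends of one pipe
( cd sender && tar cf - changeset ) | ( cd receiver && tar xf - )
cat receiver/changeset/DESCRIPTION
```

The pipe could just as easily be mail or HTTP, which is exactly why the aedist examples above compose with `mail` and `show`.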
Once this has happened, it will be possible to discuss whether or not it is suitable for Linux kernel development, and if so, how to implement it. I look forward to your thoughtful comments and suggestions.

Regards

Peter Miller
E-Mail: millerp@canb.auug.org.au
WWW: http://www.canb.auug.org.au/~millerp/
Disclaimer: The opinions expressed here are personal and do not necessarily reflect the opinion of my employer or the opinions of my colleagues.

-----snip-----