From: Mark Linimon
Date: Sat, 17 Apr 2004 13:40:53 -0500 (CDT)
To: Garance A Drosihn
cc: freebsd-ports@freebsd.org
Subject: Re: Second "RFC" on pkg-data idea for ports

On Mon, 12 Apr 2004, Garance A Drosihn wrote:

> Back in January I send out a long-ish email asking for feedback
> on some ideas I had for the ports-collection [...]

> The basic idea is to collapse many of the separate files for a
> port into a single pkg-data file. The web pages explain why I
> think this might be worth doing. Please check them out at:
>
> http://people.freebsd.org/~gad/PkgData/

My reaction to this proposed change is: I suppose it depends on what
problem you're really trying to solve.

You've mentioned (in a later message than this one) that you have some
ideas about future directions that could spring from this work, but
that they are not yet fully formed enough to be written down. While
that's fair enough, until that's done, it's really hard to weigh the
tradeoffs involved in doing all this (IMHO) extensive work.
But if that's the case, then what you're trying to address is not just
the inodes problem.

Lacking that, what we have is a proposal to address the inodes problem.
Assuming, for the sake of argument, that the number of inodes is
affecting a large-enough class of people, let's think about some
alternative ways to address that, ways that might involve less
reworking of the infrastructure. (In the colorful folk saying, "don't
raise the bridge -- lower the water").

1. (easy) If the distinfo lines were moved into the Makefiles, that
would save 9568 files out of the 60075 files in 10149 ports, or about
16%. (Note: I'm using the numbers from an old tree, but the percentage
has probably not changed significantly). (Disclaimer: although I
personally am not really fond of this solution due to the repo-churn it
would create, I know that other people are pushing for this to be
done).

2. (intermediate) Let's change the way we think about patchfiles.
Instead of seeing them as a permanent part of the port, perhaps we
should instead be thinking about each one as a temporary measure until
we can get the original software's authors to incorporate them
upstream.

Now, there's no question that working through each and every port,
sending email to its author(s) (if, indeed, the software is even still
being maintained), is nowhere near as exciting or fun as reworking
infrastructure :-) . However, think of the benefits: getting the
patches incorporated upstream means less work for each individual port
maintainer during each port update; also, in many cases, the patches
will help out maintainers on other OSes (in particular, the other BSDs,
but the gcc3.3 patches and patches for 64-bit problems will also help
Debian and some of the other Linux distributions). In this scenario,
everybody wins.

3. (advanced) Right now our default assumption is that to install any
ports, you have to install the entire ports collection.
This is true whether you install ports via downloading and unzipping
the tarball from our main site, or use cvsup. Perhaps it's time to
reevaluate this assumption.

Right now, some of our ports tools rely on having an up-to-date INDEX
file, and since it is updated much more rarely than ports are added,
moved, or deleted, that implies needing the ability to generate the
INDEX file locally -- and, due to the cross-dependencies between ports
categories, generating that file doesn't work if you don't have all the
Makefiles. (There are exceptions to this: a few categories really are
'leaf categories' but they're fewer in number than you might suspect:
most (but not all!) of the language ports; and, IIRC, astro,
benchmarks, biology, finance, mbone, picobsd, and maybe x11-themes).

3a. (hard) Figure out some way to do away with the INDEX file. This
probably means creating some kind of Berkeley db-based solution (AFAIK
that's the only database included in the base system.) As you are
learning, getting consensus on what type of technology to bring in is
not so easy ... that's why I list this as "hard". Nevertheless, I think
it would be an interesting line of research, but at my current rate I
will not be getting around to it myself for months.

3b. (somewhat easier) Figure out ways to not have to have the entire
hierarchy loaded. The way that has occurred to me to do this is to
figure out which ports in which categories require which other ports
from which other categories. My first attempt to do this, which led to
the conclusions about "leaf categories" above, was just some sh
scripts. Although informative, it led me to the conclusion that the
gain from partitioning out the "easy cases" was on the order of 9% of
the inodes. I haven't pursued it further, because 9% didn't sound
super-attractive to me; but again, I do not see the inodes as quite so
pressing a problem in the first place, so maybe it's worth doing
regardless ...
But the only way to get more than that 9% gain is to understand the
cross-category dependencies, to lead to possible further repartitioning
of the tree (really, only the filesystem is a tree; the dependencies
are a very messy graph). (As an example, my other conclusion from that
shell-script run was "everything depends on devel, and devel depends on
everything else". Since devel has 1184 ports in it, it's difficult to
attack the overall problem without attacking devel ...)

I honestly don't think anyone in the FreeBSD project really has a
handle on what that dependency graph looks like. And this is where I
think your desire to have someone work on the inodes problem, someone
who doesn't have an intricate knowledge of coding to the existing
infrastructure, could be invaluable.

There are various ports in the tree (graphics/graphviz;
graphics/meshviewer; graphics/vcg) that might be really useful to shed
some light on the data structures. To my knowledge, no one has ever
done this for the FreeBSD ports, if, indeed, for any of the various
open-source OSes at all. Since these things take data-file input, they
might not need heavy-duty programming experience to come up with useful
results.

So this is where I'd like to suggest that some work by a dedicated
volunteer could produce some immediate short-term results that would
help out the users: this would help us to define what the underlying
problem actually _is_ that we're trying to solve. And that's always a
good thing.

mcl
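
To make the cross-category analysis in 3b a little more concrete, here
is a minimal Python sketch. It is not one of the sh scripts mentioned
above. It assumes the port-level dependencies have already been
extracted somehow into a mapping from each port origin to the origins
it depends on. The sample origins are invented, and the function names
(category_graph, leaf_categories, to_dot) are hypothetical, not part of
any existing ports tool. It also assumes that a "leaf category" means
one that no other category depends on, which is only one reading of the
term. The sketch collapses the port graph into a category graph, lists
the leaf categories, and writes DOT text of the kind graphics/graphviz
takes as data-file input.

```python
from collections import defaultdict

# Invented sample data: origin -> origins it depends on (not real ports).
SAMPLE_DEPS = {
    "astro/port-a": ["devel/port-b", "x11/port-c"],
    "devel/port-b": ["lang/port-d"],
    "lang/port-d": ["devel/port-e"],
    "devel/port-e": [],
    "x11/port-c": ["devel/port-b"],
    "finance/port-f": ["devel/port-e"],
}


def category_graph(port_deps):
    """Collapse port-level dependencies into category-level edges.

    Returns {category: set of other categories it depends on}.
    """
    graph = defaultdict(set)
    for origin, deps in port_deps.items():
        cat = origin.split("/", 1)[0]
        graph[cat]  # register the category even if it has no edges
        for dep in deps:
            dep_cat = dep.split("/", 1)[0]
            graph[dep_cat]
            if dep_cat != cat:  # ignore dependencies inside one category
                graph[cat].add(dep_cat)
    return dict(graph)


def leaf_categories(graph):
    """Categories that no other category depends on (assumed meaning)."""
    needed = set()
    for targets in graph.values():
        needed |= targets
    return sorted(set(graph) - needed)


def to_dot(graph):
    """Render the category graph as DOT text for graphviz."""
    lines = ["digraph categories {"]
    for cat in sorted(graph):
        for dep in sorted(graph[cat]):
            lines.append(f'    "{cat}" -> "{dep}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    g = category_graph(SAMPLE_DEPS)
    print("leaf categories:", leaf_categories(g))
    print(to_dot(g))
```

With the invented sample, astro and finance come out as leaves, while
devel and lang depend on each other, a small version of the "everything
depends on devel" tangle. The DOT output can be saved to a file and
rendered with graphviz, for example with dot -Tpng.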