From: Tim Kientzle
Date: Mon, 04 May 2009 08:35:18 -0700
To: Jeremy Lea
Cc: David Forsythe, freebsd-hackers@freebsd.org
Subject: Re: SoC2009: libpkg, pkg tools rewrite

Jeremy Lea:
> 1. The library needs a global "package manager". This needs to perform
> all of the tasks, and it should ideally do this through a task queue
> (which I didn't implement). See the lib/lib.h header in FreePKG.

This sounds like a good idea to implement ... eventually.
The first job is to identify the tasks and a way to implement them.
It's relatively straightforward to add a task queue wrapper later on.

> 4. bsd.port.mk should do everything through the tools. It should have
> no knowledge of the contents of /var/db/pkg.

I agree with this in principle, but I think it would be better
to avoid touching bsd.port.mk at all in this iteration. Certainly,
it's worthwhile to read bsd.port.mk carefully to try to
understand the capabilities missing from the current tools
with an eye to supporting those in a future iteration, but
I think it's more important to keep the scope of this summer's
work as narrow as possible.

> 1. I made the file->pkg database too sensitive. If there is a miss it
> rebuilds the database from scratch - it should do a search through the
> +CONTENTS files and only rebuild it if there was a hit (meaning the
> database was wrong).

I'm going to show my ignorance here: Why is this database
necessary at all? On my system with just over 500 packages,
it takes <1s to read all of /var/db/pkg with a warm cache
(~10s with a cold cache), and I can only think of a couple of cases
where the file->pkg database is useful at all. I fear that
maintaining a file->pkg database is a lot of extra effort for
very little gain.

I would be more interested in experimenting with using
extended attributes directly on the files to record what
package they came from. In particular, that provides
much more robust handling for a variety of use cases,
including conflicts, stale files, and manually-updated files.

> 2. There needs to be a pkgname and origin database, which can be loaded
> at startup to prime the package manager. The dependency graph should
> also be stored in a database. These should be rebuilt if any directory
> in /var/db/pkg has a mtime later than the database (so could the file
> database).

Again, I'm a little skeptical of the need for separate databases
here at all.
In the current system, /var/db/pkg and the package
archives themselves have to be authoritative, so any package
system has to be able to work directly from that information.
Building a separate database from that information seems
like a lot of extra work that will pay off someday but
can be largely avoided for the current iteration.

Remember that the Single Point of Truth for any package
manager is the files on disk. If someone deletes the
files from disk, then the package is not installed,
regardless of what any "package database" might say.
In the case of the FreeBSD package system, /var/db/pkg
is therefore secondary data and any additional databases
you compute from /var/db/pkg are tertiary. This is getting
pretty far from the SPoT and is going to lead to increasing
consistency problems. If we can avoid these additional
databases, we'll get a simpler, more robust system.

> 3. There needs to be a set of flags which indicate how a package got
> installed (as a dependency or by the user), and if it has been upgraded
> in-place and might have old leftover libraries. These could go in
> +CONTENTS.

Yes, these are the kind of features that should be easy to
add after the package tools have been rewritten around libpkg.
But first we need a basic libpkg that works and a set of
package tools that use it.

> In addition I also began the design of a new on disk package format.

I'm pretty firmly opposed to this. I think that the INDEX,
/var/db/pkg, and binary format for package archives all
need to be kept essentially unchanged, simply because of
the growing number of third-party tools that interact with
them. Eventually, such tools may all be rewritten to use
libpkg, at which point it may be reasonable to reconsider,
but for the foreseeable future, any changes to these would
cause a lot of pain for little gain.
There may be some incremental changes (such as the installation
flags you mention above) that would help, but I think we're stuck
with the current formats for the time being.

I'll also observe that some of the concerns that drove your
new package archive design are no longer relevant; libarchive
allows us to use tar format without forking the tar executable.

Tim
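
As an illustration of the point above about working directly from /var/db/pkg instead of keeping a file->pkg database, here is a minimal Python sketch of a direct scan that answers "which package owns this file?". It is only a sketch under stated assumptions: the function name owners_of and the pkg_db parameter are made up for this example, and the +CONTENTS parsing is deliberately simplified. It tracks only @cwd/@cd and treats every other non-@ line as a path relative to the current directory; real packing lists carry more directives than this handles.

```python
import os


def owners_of(path, pkg_db="/var/db/pkg"):
    """Return the package directories whose +CONTENTS lists `path`.

    Hypothetical helper for illustration; not part of any existing tool.
    """
    target = os.path.normpath(path)
    owners = []
    for pkg in sorted(os.listdir(pkg_db)):
        contents = os.path.join(pkg_db, pkg, "+CONTENTS")
        if not os.path.isfile(contents):
            continue
        # Assumption: resolve against "/" until the first @cwd is seen.
        cwd = "/"
        with open(contents, encoding="utf-8", errors="replace") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                if line.startswith("@"):
                    directive, _, arg = line.partition(" ")
                    # Only directory changes matter for this lookup;
                    # every other directive is skipped.
                    if directive in ("@cwd", "@cd") and arg:
                        cwd = arg.strip()
                    continue
                if os.path.normpath(os.path.join(cwd, line)) == target:
                    owners.append(pkg)
                    break
    return owners
```

Calling owners_of("/usr/local/bin/foo") would walk every package directory once. The function returns a list rather than a single name so that a conflict (two packages claiming the same path) shows up directly in the result.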