Date:      Thu, 1 Mar 2007 16:48:02 -0800 (PST)
From:      youshi10@u.washington.edu
To:        freebsd-ports@freebsd.org
Subject:   Re: portupgrade O(n^m)?
Message-ID:  <Pine.LNX.4.43.0703011648020.15513@hymn09.u.washington.edu>
In-Reply-To: <346a80220703010923n483e47fcw1444a84814eca4a4@mail.gmail.com>

On Thu, 1 Mar 2007, Coleman Kane wrote:

> On 2/28/07, Sean Bryant <sean@cyberwang.net> wrote:
>> 
>> youshi10@u.washington.edu wrote:
>> > On Thu, 15 Feb 2007, Michel Talon wrote:
>> >
>> >>> Give me a few weeks, and if I can band together with a few people I
>> >>> wanted to try and port sections of portupgrade and its related tools to
>> >>> C++ (and maybe do some code tweaks along the way). Most of the ruby
>> >>> files are over 400 lines long, sparsely commented, and I don't know ruby
>> >>> enough to port right now, but I've been making some headway lately so
>> >>> I'll try porting some stuff soon.
>> >>
>> >> I think that porting portupgrade to C++ would be time spent in vain. In
>> >> my opinion, some of the basic ideas of portupgrade are deeply flawed,
>> >> and as much as one polishes the algorithms it will not gain much. The
>> >> idea of keeping state in databases is deeply flawed, it is constantly
>> >> broken, and doesn't help in speed at all. This was one of the
>> >> motivations of portmaster, get rid of database dependencies. In my
>> >> opinion, upgrading progressively, that is, port by port, is deeply
>> >> flawed. There is a 90% chance that something will go wrong in the middle
>> >> and you will be stuck with a half-upgraded system.
>> >>
>> >> So in my opinion, what is needed is thinking radically new about the
>> >> problem, write a prototype in a scripting language to experiment with
>> >> the solutions, and then code it in C++. Personally I have done that, I
>> >> have written a python script, which can be found here:
>> >> http://www.lpthe.jussieu.fr/~talon/pkgupgrade
>> >> (it needs the companion
>> >> http://www.lpthe.jussieu.fr/~talon/save_pkg.py).
>> >> For the time being, I still have bugs that I am working on, but at
>> >> least these bugs show that the problem is vastly more complicated than
>> >> one can imagine at first.
>> >>
>> >> Why python? Because it is much more readable than perl or ruby, and much
>> >> more performant than ruby. In my opinion ruby is vastly hyperhyped, it
>> >> is much closer to rubbish than anything else.
>> >> What ideas? Don't use any database or database connector, do everything
>> >> in memory, recompute needed information on the fly. It works very well,
>> >> one can count on something of the order of 1mn to 2mn to perform the
>> >> necessary analysis for 700 ports. Second, download as many precompiled
>> >> packages as possible, at full speed, that is with the same connection to
>> >> the ftp server. This works very well, if you have a good internet
>> >> connection, in 15 mn to 20 mn you have your packages.
>> >>
>> >> Why packages?
>> >> because packages don't break when compiling. Compiling from source is
>> >> asking for problems. If you minimise the number of compilations you
>> >> minimise the risk of breakage. Moreover simultaneously with downloading
>> >> one can back up old packages, and so, gain time. By contrast, for every
>> >> package, portupgrade first does dependency analysis that could be done
>> >> once, then does backup, then fetches the binary package or compiles,
>> >> then installs it, then discards backup. All this is a terrible loss of
>> >> time.
>> >>
>> >> Finally my script produces a shell script able to do the upgrade. So you
>> >> can see in written form *exactly* what will be removed, what will be
>> >> installed by binary packages, and what will be compiled. All necessary
>> >> packages for installation are already present on the machine. There is
>> >> absolutely no element of surprise, you can evaluate the risk soundly.
>> >> These are the ideas I have explored.
>> >>
>> >> Now, performance wise, when you run the shell script it takes around 2
>> >> hours. This is entirely time spent by pkg_delete ( roughly 15 mn) and
>> >> pkg_add (roughly 1h45mn) for around 500 ports replaced. This is very
>> >> long, sure, but it can be optimized only by working on pkg_delete and
>> >> pkg_add. No amount of work on portupgrade or a replacement will help in
>> >> any way.
>> >>
>> >> As for the remaining bugs I have, they are entirely due to the crappy
>> >> complexity that FreeBSD port developers introduce by constantly
>> >> modifying the origins of the ports. So for a given program, I can have 3
>> >> different origins, one when the port was previously installed on the
>> >> machine, another one when the last RELEASE was produced, and the last
>> >> one if I compile the port now on the machine with the present state of
>> >> the ports tree. These 3 origins may be different, I have examples.
>> >> These morons are *constantly* modifying the names, as an exercise in
>> >> bikeshed painting. For example pan -> pan2 -> pan, etc. Cycles don't
>> >> worry them at all!
>> >> Of course, for a given software, you may have all combinations, such as
>> >> nonexistent or existent at the time the machine was installed, at the
>> >> time of the release, or at present.
>> >>
>> >> Compare that to the situation for Debian apt-get. The names are
>> >> conserved. They have strict rules about package naming, they stick to
>> >> them and don't change them arbitrarily. All packages exist in compiled
>> >> form, you don't have to worry about prepackaged or "to be compiled, so
>> >> has 50% chance to break". You have only 2 states to consider instead of
>> >> 3: the state on the machine and the state on the repository. Things are
>> >> vastly simpler. No wonder that apt-get works and portupgrade doesn't.
>> >> This has nothing to do with the fact that apt-get is written in C++.
>> >
>> > (sorry to cross post, but this thread is just as relevant to @ports as
>> > it is to @hackers)
>> >
>> > Well, since you brought up Debian's apt-get system I thought it'd be a
>> > good idea to take a look at the Gentoo Linux emerge / portage system
>> > (patterned after FreeBSD):
>> >
>> > =====
>> > Pros:
>> > =====
>> > -It's written in python (portable).
>> > -It's a system which focuses on ports compilation from source, not
>> > binary package installation.
>> > -Stores information in a db format (not Berkeley DB, but something
>> > different) for the entire system in a common file; stores installed leaf
>> > package information in another simple text file.
>> > -Has flags for stability reasons, since some packages are alpha or beta
>> > and don't compile under certain architectures.
>> > -Portage files are fetched via rsync.
>> > -Has separate portage files which are phased out over time, in case the
>> > portage maintainers move the files in one release. The maintainers then
>> > create an informative message which describes what's going on while
>> > emerging the package or going through the portage database. If possible
>> > the outdated package is pruned and the newer, more recent dependency is
>> > merged.
>> >
>> > =====
>> > Cons:
>> > =====
>> > -It's written in python (not fast).
>> > -Uses rsync.
>> >
>> > ======
>> > Point:
>> > ======
>> > Apart from what's listed in the above paragraph, Gentoo's portage may
>> > have several things that are better than FreeBSD's port system:
>> >
>> > -Limited life cycle for versioning, which doesn't force server / desktop
>> > owners to fix a number of machines all at once, but instead gives them a
>> > heads up before a big change occurs and automatically unmerges old
>> > dependencies and emerges new items, if possible.
>> > -One common interface for package / portage management--not 10 little
>> > tools which do basically the same thing, or are specialized for specific
>> > tasks.
>> > -One common file for all installed packages / ports, not a series of
>> > directories and files.
>> > -Separate versioning for files, which doesn't break things nearly as
>> > much as one common ports Makefile for each file.
>> > -A means to search for portage items and their descriptions, without
>> > having to deal with a tool that doesn't really work reliably.
>> >
>> > It's not so much that I'm trying to bash on FreeBSD, but there's
>> > definitely a revision that needs to be made to the way that ports /
>> > packages are done, because it seems that the committee in charge of ports
>> > planning and the overall roadmap has let things get a bit off
>> > track, just because of the sheer number of ports items available.
>> > Something can be fixed and should be. I can only do a portion of the
>> > load myself in so much time, since I'm going to work and school right now.
>> >
>> > =======
>> > In light of previous statement:
>> > =======
>> >
>> > I wasn't trying to port the pkg_* and port* utils to C++ thinking that I
>> > would magically get more optimized code. Sure, C++ is much better than
>> > ruby at optimizations if done correctly, but C++ is also easier to screw
>> > up than ruby or perl or python, because you have the power to shoot
>> > yourself in the foot easier (not as much as C or ASM, but close).
>> >
>> > The point was that with C++ we could finally get a set of standardized
>> > tools and a common interface for FreeBSD for managing ports / packages
>> > which could be included in the base system, not a bunch of little
>> > specialized tools and packages.
>> >
>> > I'll have to approach this problem from a black box perspective and be
>> > careful in planning this out, but my goal is to be as backwards
>> > compatible as possible or at least provide migration tools to
>> > ease the move from the old system to the new one.
>> >
>> > Again, if anyone is interested in helping me out, it would be more than
>> > welcome. That way we could ensure that the project gets done in a timely
>> > manner and can reduce bugs and think of better solutions (more people
>> > can help in thinking out of the box, the larger the group).
>> >
>> > Thanks,
>> > -Garrett
>> >
>> > PS Please reply on the @hackers list, if possible.
>> >
>> Honestly I'd be more interested in a package building system. Maybe be a
>> little bit more liberal in the default building of ports. It doesn't
>> need to build a package of every port, just common ones. That way it's
>> easier to get up and running with things. Things like xorg, gnome and
>> KDE take ages to build, and it would be awesome if there was a decent
>> package fetching system: something like apt-get, where you could add some
>> kind of repository and just pull down a list of packages and
>> choose what you want. This can be emulated in a way using portupgrade -P
>> and changing pkgtools.conf to have some more mirrors to fetch from. A
>> pointyhat macro is there, but it probably shouldn't be abused, as it's
>> there to look for problems, not to build packages for us consumers; that's
>> just a side effect, or at least this is how it was explained to me. A neat
>> thing might be a distributed package building project, where packages are
>> picked apart and pieces are built all over the place. Get enough places
>> to donate CPU and package building might be a thing of the past, but
>> those are just pipe dreams right now.
>> 
>> The slowness affects me after a mass upgrade, after that I'm fine. Maybe
>> someone can look into profiling portupgrade and seeing whether the
>> slowness is in portupgrade itself or in the pkg_* tools.
>
>
> One of FreeBSD's strengths, from my POV, is the power it affords you from
> the ports system. One of my biggest beefs w/ Linux has always been the
> prebuilt-binary-centric package systems. In addition to performance, this
> was one other thing that drove me to The Beastie. With the newer, faster
> processors, my personal bottleneck (and I think this is true for many others
> in this thread) has moved from the compilation stage of ports into the
> meta-data management. It can be disheartening for a geek to see that the
> process of "registering package", or updating the "information repository to
> get/build packages" seems to take significantly longer than the process of
> actually building and installing the software.
>
> I have worked on other projects where "splitting up" the work has been
> meritorious. I am looking at the pkgdb/portsdb BDB files on my system right
> now, and I see the following:
> usr/ports/INDEX-7.db: 36658176 bytes (~35MB)
> var/db/pkg/pkgdb.db: 34974720 bytes (~33.3MB)
>
> What if we were to divide up the pkgdb.db and the INDEX-7.db files into
> multiple .db files (perhaps one file for each category directory in ports),
> and then force the package names to be
> "{category}/{packagename}-{versioninfo}" everywhere they are referenced? I
> do not know if currently packages record the category name for their
> dependencies, but it seems that if they did we could reduce the search space
> down to only the ports in the same category as the port in question.
>
> While we're at it, adding some more meta-information to package .tbz files
> would be nice. I don't know if any of this is packaged up, but it would be
> useful for FreeBSD binary package distributors to have some of the make
> environment variables ($CFLAGS, $CC, $CXX, $CPP, etc...) embedded in the
> package metadata as well as any defined WITH_* option variables or other
> port-knobs. If it can/has be(en) done, then a package system could take
> these things into account and perhaps offer the user a screen similar to
> "make config" for which toggles to get with the desired port-package.
>
> Let me know how all of this sounds, if I am on the right track or just
> blowing smoke. Of course, I have no time, just like everybody else...
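To make the per-category split proposed above concrete, here is a rough Python sketch. It is only an illustration under assumptions: the ShardedIndex class, the split_key helper, the record layout and the version numbers are all made up for the example, and plain in-memory dicts stand in for the separate .db files. It is not how pkgdb or portsdb work today.

```python
from collections import defaultdict


def split_key(key):
    """Split "{category}/{packagename}-{versioninfo}" into (category, rest)."""
    category, sep, pkg = key.partition("/")
    if not (sep and category and pkg):
        raise ValueError(f"expected 'category/name-version', got {key!r}")
    return category, pkg


class ShardedIndex:
    """Toy index: one dict per category stands in for one .db file each."""

    def __init__(self):
        self.shards = defaultdict(dict)  # category -> {name-version: record}
        self.touched = []                # categories consulted by lookups

    def add(self, key, record):
        category, pkg = split_key(key)
        self.shards[category][pkg] = record

    def lookup(self, key):
        category, pkg = split_key(key)
        self.touched.append(category)
        # .get() avoids creating an empty shard for an unknown category
        return self.shards.get(category, {}).get(pkg)


# Hypothetical entries; names and versions are for illustration only.
index = ShardedIndex()
index.add("news/pan-0.14", {"deps": ["x11-toolkits/gtk20-2.10"]})
index.add("x11-toolkits/gtk20-2.10", {"deps": []})
print(index.lookup("news/pan-0.14"))
print(index.touched)
```

Each lookup only consults the shard named by the key's category prefix. Note that this only works if dependencies are recorded with their category, and a full dependency walk would still touch several shards because dependencies usually cross categories.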

Gabor's working on restructuring the ports system for the Google SoC: <http://wiki.freebsd.org/G%c3%a1borK%c3%b6vesd%c3%a1n>.

I'm proposing my changes to Kris (kris@freebsd.org) so I can first optimize the pkg system. Then, time allowing, I'll collaborate with Gabor to move port management from ports "packages" to a series of apps in the base system by porting the Ruby code to C++ (that was the only hierarchical language I was told could be used for base system projects, since Perl's no longer part of the base system).

In the meantime I plan to rewrite the scripts in Perl to get the general idea down, and also because I've been writing a ton of Perl lately, so getting those changes out should be quicker in Perl in the short term. That way Ruby can be removed from the equation. Full backwards compatibility would be provided with the other Ruby ports scripts (portupgrade, portinstall, etc.).

Sound good?

-Garrett



