Date:      Wed, 10 Jul 2002 00:01:41 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Jordan K Hubbard <jkh@queasyweasel.com>
Cc:        Dan Nelson <dnelson@allantgroup.com>, Archie Cobbs <archie@dellroad.org>, Dan Moschuk <dan@FreeBSD.ORG>, Dag-Erling Smorgrav <des@ofug.org>, Wes Peters <wes@softweyr.com>, arch@FreeBSD.ORG
Subject:   Re: Package system flaws?
Message-ID:  <3D2BDBD5.CA71A2C1@mindspring.com>
References:  <FFFB5387-93BA-11D6-AACD-0003938C7B7E@queasyweasel.com>

Jordan K Hubbard wrote:
> Oh dear, why do I find myself so unable to resist this thread? :)

Occupational hazard?  8-).

> I'm borrowing a bit of Macintosh nomenclature there (though I'm sure
> Terry will come along and correct me by pointing out that IBM was the
> first to introduce "Fat binaries" with VM/CMS or something :) but I'm
> sure people get the idea.  If you're distributing an Emacs or TeX
> package which weighs in at some hefty percentage of the New York phone
> book in size, and with KDE and Gnome one doesn't even have to look to
> Emacs anymore for a good example of a "really big, honkin' package", you
> naturally want to save on disk space if at all possible both to minimize
> load on the archives and make those poor Australian users with their
> 9600 baud Telstra link to the US happy.   Compression is certainly a
> good start, but when you start distributing those packages for 3 or 4
> different architectures (as FreeBSD is definitely not far away from
> doing) you also would like to not distribute 3 or 4 different copies of
> the same architecture-neutral bits if you can possibly help it.  That's
> where the idea of munging attributes into the dictionary namespace
> starts making more and more sense, and not just for representing
> different architectures but also things like "experimental vs stable"
> variants, "mix-ins" (like all the various versions of Apache which have
> various bits of compiled-in smarts) and what have you.  If you introduce
> the concept of install-time attributes, some of which may be implicit
> (like architecture) and some of which may be explicit (like "give me the
> experimental version please"), you conceivably end up with mangled
> pathnames within the package which are demangled on the way out,
> C++-style.  This allows you to have, say, just one copy of all the Emacs
> lisp files and documentation but 3 or 4 different "bin/emacs" files
> which don't collide internally and are properly selected for on
> extraction.

Apple has dealt with this for the 68K vs. PPC binaries by stuffing
the binaries into the same package, as a unit, and putting them in
different "code resources" in the resource fork of the same file,
while sharing the "data fork" between them.

The historical canonical answer is "ANDF" -- Architecture Neutral
Distribution Format.

In ANDF, one would compute the quad tree for a compiled program (for
example), but not do the code generation.  Instead, one would store
the quad tree, and generate the actual code, at install time.

ANDF is actually a brilliant idea, since the resulting program is not
source code.  It also doesn't have the problem of the Apple approach,
which is bloating the file size by the unused portion(s) of the code
contained therein.

Unfortunately, creation of intermediate languages is against the
policy of the FSF, due to its ability to weaken the effect of the
GPL on programs.  This is the same reason the data dictionary work
that was discussed on -advocacy and -chat recently is frowned upon
by RMS, and why he posted what he did about sweeping the research
under the rug to avoid exposure.

At this point in time, ANDF is not an option because of license
politics having nothing to do with available technology.


I would like to think that a pure copy of the Apple approach was,
likewise, not an option.  Specifically -- and it looks like this is
what you are actually advocating here -- extracting only the applicable
portions of the package at installation time, rather than installing
the whole thing, is probably the most correct approach.  It means
that you only end up with the bits you actually care about on your disk.
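
To make that concrete, here is a minimal sketch of what the attribute
mangling/demangling might look like at extraction time.  The "@@key=value"
suffix convention, the attribute list, and the entry names are all invented
for illustration -- nothing in libh or the existing package tools actually
works this way:

/*
 * Hypothetical sketch: select and demangle attribute-tagged package
 * entries at extraction time.  The "@@key=value" suffix convention is
 * invented for illustration; a real scheme would also have to handle
 * multiple attributes per entry.
 */
#include <stdio.h>
#include <string.h>

/* Attributes supplied (implicitly or explicitly) at install time. */
static const char *install_attrs[] = { "arch=i386", "variant=stable", NULL };

/*
 * If "name" carries an attribute suffix ("bin/emacs@@arch=i386"), copy
 * the demangled path ("bin/emacs") into "out" and return 1 only when
 * the tag matches one of the install-time attributes.  Untagged entries
 * (shared lisp files, documentation) always match.
 */
static int
demangle_entry(const char *name, char *out, size_t outlen)
{
        const char *sep = strstr(name, "@@");
        const char **a;

        if (sep == NULL) {              /* architecture-neutral entry */
                snprintf(out, outlen, "%s", name);
                return (1);
        }
        for (a = install_attrs; *a != NULL; a++) {
                if (strcmp(sep + 2, *a) == 0) {
                        snprintf(out, outlen, "%.*s", (int)(sep - name), name);
                        return (1);
                }
        }
        return (0);                     /* wrong arch/variant: skip it */
}

int
main(void)
{
        const char *entries[] = {
                "bin/emacs@@arch=i386",
                "bin/emacs@@arch=alpha",
                "share/emacs/site-lisp/default.el",
        };
        char path[256];
        size_t i;

        for (i = 0; i < sizeof(entries) / sizeof(entries[0]); i++) {
                if (demangle_entry(entries[i], path, sizeof(path)))
                        printf("extract %s -> %s\n", entries[i], path);
        }
        return (0);
}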

It also has the (dubious, in some people's eyes) attribute of making
it impossible to take an installed package and recreate the package
as a distribution -- you can't go to any machine with the software
installed on it, and simply plug in your iPirate or whatever, and
suck the package down to ensure redistributability to any target
platform supported by the original distribution.


The one real drawback with this approach is linear media.  That is,
if I have a DVD distribution, it's not a problem.  But if I have a
28K modem... it is.  A CDROM distribution is mildly problematic,
since putting i386, PPC, SPARC, Alpha, and IA64 (and S/360? 8-))
binaries all in the same package could require a lot of CDROMs.  But
that's still less of a problem than linear media, like a TCP connection
stream or a magnetic tape.


There are two workarounds for this... assuming you can get the metadata
early, which means that it's at a known location in every file, which
pretty much dictates the front of the file.  The first is that the FTP
protocol supports the concept of a "REGET", and does not check that the
receiver has the most recent version of the file ... it only cares about
the byte offset, AND it supports interruption of transfers.  The second
is that the HTTP 1.1 protocol supports GET with a Range header giving
a byte range within a server object.  Both of these options effectively
support random access of known locations within a file, at a known offset.

Not all FTP and HTTP servers support these facilities, but enough do that it
might be OK to rely upon them.  HTTP is particularly attractive, due to
firewall issues at large companies, where other protocols are blocked.
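
For the HTTP case, fetching just the metadata region is a single ranged
GET.  A rough sketch follows; the host name, the path, and the assumption
that the metadata fits in the first 4K are all invented, and a real client
would also have to check for a "206 Partial Content" reply, since a server
that ignores Range simply returns the whole object with "200 OK":

/*
 * Rough sketch: fetch only the first 4K of a package file with an
 * HTTP/1.1 ranged GET.  Host, path, and the 4K metadata size are
 * assumptions for illustration only.
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
        struct addrinfo hints, *res;
        char req[512], buf[4096];
        ssize_t n;
        int s;

        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_UNSPEC;
        hints.ai_socktype = SOCK_STREAM;
        if (getaddrinfo("packages.example.org", "http", &hints, &res) != 0)
                return (1);
        s = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (s < 0 || connect(s, res->ai_addr, res->ai_addrlen) < 0)
                return (1);

        snprintf(req, sizeof(req),
            "GET /packages/emacs-21.2.tgz HTTP/1.1\r\n"
            "Host: packages.example.org\r\n"
            "Range: bytes=0-4095\r\n"           /* just the metadata block */
            "Connection: close\r\n"
            "\r\n");
        write(s, req, strlen(req));

        /* Dump the reply headers plus the 4K metadata body. */
        while ((n = read(s, buf, sizeof(buf))) > 0)
                fwrite(buf, 1, (size_t)n, stdout);

        close(s);
        freeaddrinfo(res);
        return (0);
}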


A corollary to using random access at offsets taken from the metadata
at the front of the file is that per-file MD5 (or other cryptographic)
signatures cannot be applied to the file as a whole.  Magnetic tape may
not have to worry about this... since you are going to have to read
everything to byte-count to the offset-based extents anyway... but for
other media, where the intervening bytes are never traversed, it *is* a
problem for those of us without corporate network connectivity.  Even
a cable modem or DSL could become onerous for a large package (your
Emacs example would probably strain a T1 user's patience ;^)).
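
One way around that -- just a sketch, and the manifest layout is invented --
is to record a digest per package member in the front-of-file metadata, so
that each extent a client actually fetched can be verified on its own.
FreeBSD's libmd makes the hashing part nearly trivial (compile with -lmd):

/*
 * Sketch: per-member MD5 digests instead of one digest over the whole
 * package, so a member fetched by byte range can be verified on its
 * own.  The member list stands in for real archive contents; the
 * digests would be written into the metadata block at the front of
 * the package, and the client would re-hash each extent it fetched.
 */
#include <md5.h>        /* FreeBSD libmd: MD5Data() */
#include <stdio.h>
#include <string.h>

struct member {
        const char *path;       /* demangled destination path */
        const char *data;       /* member contents (stand-in) */
};

int
main(void)
{
        struct member members[] = {
                { "bin/emacs", "i386 executable bits" },
                { "share/emacs/site-lisp/default.el",
                  "(setq inhibit-startup-message t)" },
        };
        char digest[33];        /* 32 hex characters plus NUL */
        size_t i;

        for (i = 0; i < sizeof(members) / sizeof(members[0]); i++) {
                MD5Data((const unsigned char *)members[i].data,
                    (unsigned int)strlen(members[i].data), digest);
                printf("MD5 (%s) = %s\n", members[i].path, digest);
        }
        return (0);
}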


> Anyway, wish-list items like this are why it's a good idea to define the
> goals first and the package format second. :-)

Yes.


> P.S. I also agree with jhb's assertion that some folks really need to
> take a good look at libh since it takes a number of things like this
> into account, including all the "occludes, obsoletes, upgrades, ..."
> attributes that people were recently demanding as package metadata.

The current system uses libh.  We would like to (or *I* would like to 8^))
exceed the capabilities of the current system.

The thing that strikes me about libh is that there is a lot of human
effort involved in the dependency tracking mechanism, if it's going to
be possible to perform some of the relationship tracking that some of
the posters have already requested.

In particular, I think that there is not sufficient differentiation
between "necessary" and "sufficient".

Part of the problem there is that the ports system has always been
based on the idea that most of the things in it get built by the
user as needed, so no matter what, it's always "sufficient".  Moving
to a system that can support binary-only installation means that those
implicit guarantees are no longer there, and you need to consider
that "just rebuild" will not be an option.

I don't mean to call the demands that were made incomplete or naive,
but... well, yes I *do* mean to call them naive, or at *least* call
them incomplete.  8-).


Finally, I think that people often confuse design with implementation,
and assume that just because a system is *capable* of solving a particular
problem, the initial implementation must be delayed until it *actually
solves the problem*.  I have a really big wishlist, and adding everyone's
wishlists together yields a *huge* wishlist.  It's clear that implementing
everything isn't possible at the present time.  But that doesn't mean that
a design should not take this into account as potential future work.

Even if something ends up falling on the floor (I rather expect almost
*everything* will *have to*, and that the first revision will only solve
one or two fundamental issues, like the "packaging the base system"
problem), whatever the final design is, it needs to not *preclude*
solving certain problems in the future, without having to reinvent the
framework yet again.

I guess we should start a separate thread specifically for a "wishlist";
no offense to the person who volunteered to collect and summarize this,
but I'd like to see the information captured without any editorializing
by a single person; I think it will be more useful as raw data.

-- Terry
