Date:      Thu, 07 Apr 2005 00:36:09 -0700
From:      Colin Percival <colin.percival@wadham.ox.ac.uk>
To:        Olaf Wagner <wagner@luthien.in-berlin.de>
Cc:        freebsd-arch@freebsd.org
Subject:   Re: Adding bsdiff to the base system
Message-ID:  <4254E2E9.2090504@wadham.ox.ac.uk>
In-Reply-To: <200504070644.j376imwB027984@luthien.iceflower.in-berlin.de>
References:  <200504070644.j376imwB027984@luthien.iceflower.in-berlin.de>

Olaf Wagner wrote:
> In article <424BD4FB.1050304@wadham.ox.ac.uk> you wrote:
>>At present portsnap is the only mechanism
>>available by which most users can securely maintain an up-to-date copy
>>of the FreeBSD ports tree; it also provides some other advantages over
>>cvsup (reduced bandwidth and ports INDEX/INDEX-5/INDEX-6 files).
> 
> Just out of interest: how does it do that? I've not tested it yet,
> but what intelligence or knowledge does it use to be so much more
> efficient (1/10) than CVSup? (I myself haven't found anything as
> efficient as CVSup yet, at least for replicating CVS repositories...)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exactly.  CVSup is a tool for replicating CVS repositories; portsnap is
a tool for checking out the latest version of all the files in the
repository.  CVSup is solving a very difficult problem; portsnap is
solving a very simple problem -- so it's not all that surprising that
portsnap can be a bit more efficient.

The reason portsnap is more efficient lies in how portsnap and CVSup
determine which files need to be updated.  The ports tree contains
roughly 71000 files, and the first thing the CVSup client does is list
all of these files and send that list to the server.

In contrast, portsnap has an index file -- containing, roughly speaking,
that same list -- and the portsnap client merely sends the SHA-256 hash of
this index file to the server, which responds with either "I recognize
that index -- here's a patch which will turn it into the latest index"
or "I don't recognize that -- here's the new index".  Because these
indices have no user-serviceable parts (in fact, mucking about with the
files in /usr/local/portsnap at all is strongly discouraged), there is
a very good chance that the portsnap server will have a useful patch.
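
The exchange described above can be sketched roughly as follows.  This is
a toy model in Python, not portsnap's actual code or wire format: the
Server class, the "path|hash" index lines and the line-level patch
helpers (make_patch, apply_patch) are all hypothetical names invented for
illustration, and the patch here is a plain set difference on sorted
index lines rather than a real binary patch.

```python
import hashlib


def index_hash(index_text: str) -> str:
    """SHA-256 of the index file; the only thing the client sends."""
    return hashlib.sha256(index_text.encode("utf-8")).hexdigest()


def make_patch(old: str, new: str) -> dict:
    """Toy patch (assumption): index lines to remove and lines to add."""
    old_lines, new_lines = set(old.splitlines()), set(new.splitlines())
    return {
        "remove": sorted(old_lines - new_lines),
        "add": sorted(new_lines - old_lines),
    }


def apply_patch(old: str, patch: dict) -> str:
    """Rebuild the new index from the old one; assumes sorted, unique lines."""
    lines = (set(old.splitlines()) - set(patch["remove"])) | set(patch["add"])
    return "\n".join(sorted(lines)) + "\n"


class Server:
    """Hypothetical server that remembers every index it has published."""

    def __init__(self, latest_index: str):
        self.latest = latest_index
        self.known = {index_hash(latest_index): latest_index}

    def publish(self, new_index: str) -> None:
        self.known[index_hash(new_index)] = new_index
        self.latest = new_index

    def respond(self, client_hash: str):
        old = self.known.get(client_hash)
        if old is None:
            # "I don't recognize that -- here's the new index"
            return ("full", self.latest)
        # "I recognize that index -- here's a patch"
        return ("patch", make_patch(old, self.latest))


# One made-up index entry per file: path, then a placeholder content hash.
old_index = "a/Makefile|h1\nb/Makefile|h2\n"
new_index = "a/Makefile|h1\nb/Makefile|h3\nc/Makefile|h4\n"

server = Server(old_index)
server.publish(new_index)

kind, payload = server.respond(index_hash(old_index))
client_index = apply_patch(old_index, payload) if kind == "patch" else payload
```

A client whose index has been edited by hand sends a hash the server has
never seen, so it falls into the "full" branch; that is why leaving the
index files alone keeps the cheap path available.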

As a result, while CVSup uses (in this initial stage) bandwidth which
is proportional to the number of files in the ports tree, portsnap uses
bandwidth proportional to the number of files which have been modified,
which is typically around 1% of the tree per day.
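
To put rough numbers on that, here is a back-of-the-envelope sketch in
Python using only the two figures given above (about 71000 files, about
1% modified per day).  It counts list entries rather than bytes, since
the size of an entry is not stated here, and the function names are made
up for illustration.

```python
TOTAL_FILES = 71_000        # approximate size of the ports tree
DAILY_CHANGE_RATE = 0.01    # roughly 1% of the tree modified per day


def entries_full_listing(total_files: int) -> int:
    """Initial stage when the client lists every file: one entry per file."""
    return total_files


def entries_index_patch(total_files: int, change_rate: float) -> int:
    """Initial stage when only changed index entries are sent back."""
    return round(total_files * change_rate)


full = entries_full_listing(TOTAL_FILES)
patched = entries_index_patch(TOTAL_FILES, DAILY_CHANGE_RATE)
print(f"full listing: {full} entries; index patch: about {patched} entries")
```

This covers only the stage of identifying which files need updating; it
says nothing about the cost of transferring the file contents themselves.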

When it comes to the actual distribution of patches to files in the tree,
portsnap is also marginally more efficient than CVSup, due to differences
in how they encode the patches, but the real gains come in the process of
identifying which files need to be updated.

Colin Percival
PS. CVSup's inefficiency in dealing with large trees containing a small
number of updated files isn't only relevant in the context of updating a
ports tree; it is even more notable when tracking the security branches
of the src tree.  In the paper in which I introduced FreeBSD Update, I
gave an example of where FreeBSD Update -- which distributes binary updates
to the base system -- used less than half of the bandwidth needed by CVSup
for the task of applying the corresponding updates to the src tree.


