Date:      Fri, 8 Feb 2008 09:17:56 -0600
From:      Brooks Davis <brooks@freebsd.org>
To:        Erik Cederstrand <erik@cederstrand.dk>
Cc:        freebsd-performance@freebsd.org, Brooks Davis <brooks@freebsd.org>, kris@freebsd.org
Subject:   Re: Performance Tracker project update
Message-ID:  <20080208151756.GA35423@lor.one-eyed-alien.net>
In-Reply-To: <47AC15A5.5020009@cederstrand.dk>
References:  <4796C717.9000507@cederstrand.dk> <20080123193400.N63024@fledge.watson.org> <4797A245.7080202@cederstrand.dk> <20080123202433.E63024@fledge.watson.org> <4797A802.8060509@FreeBSD.org> <47A0BFE7.4070708@cederstrand.dk> <20080130190000.GA18333@lor.one-eyed-alien.net> <47AC15A5.5020009@cederstrand.dk>


On Fri, Feb 08, 2008 at 09:41:09AM +0100, Erik Cederstrand wrote:
> Brooks Davis wrote:
>> On Wed, Jan 30, 2008 at 07:20:23PM +0100, Erik Cederstrand wrote:
>>>
>>> I'd like a situation where I can very quickly set up a slave with a
>>> specific version of FreeBSD to run additional tests or provide shell
>>> access to a developer. This currently involves adding an entry to a
>>> queue, rebooting and waiting 2 minutes. Quick and easy, but the archiving
>>> strategy is obviously very inefficient.
>>>
>>> I'm thinking of a couple of options:
>>> 1. Having one full install per month and archiving the rest as diffs
>>>    against that by recursively bsdiff'ing every file in the tree (I
>>>    could bsdiff a whole tarball, but bsdiff is very memory-intensive).
>>>    Quick test: 25 mins.
>>> 2. Make a hash of all files and only store the binaries where the hash
>>>    is different from the monthly tarball. Faster than 1., but less
>>>    effective. Quick test: 5 mins.
>>> 3. Use some kind of VCS. My experience with Subversion and binary files
>>>    is that it's very slow.
>>> 4. Throw hardware at the problem.
>>>
>>> I'd say it should not take more than 10 mins to recreate an archived
>>> version. Any thoughts?
>> It seems like you should be able to combine 1 and 2 with checksums to
>> decide if you need to run diffs.  I'd think that would be quite fast.
>
> I finally got around to testing this, and with a combination of mtree
> comparing md5 hashes, bsdiff compacting changed files and hardlinking
> unchanged files I get a reduction in size from 256MB to 10MB. Pretty good,
> and the whole operation only takes a few minutes.

Cool!
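
For anyone following along, the scheme described above could be sketched
roughly like this.  It is only an illustration, not Erik's actual scripts:
it uses Python's hashlib where he used mtree, the function names
(archive_tree, bsdiff_patch, copy_whole) are made up for the example, and it
ignores symlinks, permissions and files deleted since the baseline.  The
only external command assumed is bsdiff(1), called as
"bsdiff oldfile newfile patchfile".

```python
import hashlib
import os
import shutil
import subprocess


def md5sum(path):
    """Return the md5 hex digest of a file, read in 1 MB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def bsdiff_patch(old, new, out):
    """Store a changed file as a binary patch (requires bsdiff(1))."""
    subprocess.run(["bsdiff", old, new, out + ".bsdiff"], check=True)


def copy_whole(old, new, out):
    """Fallback for changed files: store the new file unmodified."""
    shutil.copy2(new, out)


def archive_tree(baseline, build, archive, store_changed=copy_whole):
    """Archive `build` relative to `baseline`; return per-category counts."""
    counts = {"linked": 0, "changed": 0, "new": 0}
    for root, _dirs, files in os.walk(build):
        rel = os.path.relpath(root, build)
        os.makedirs(os.path.normpath(os.path.join(archive, rel)),
                    exist_ok=True)
        for name in files:
            new = os.path.join(root, name)
            old = os.path.normpath(os.path.join(baseline, rel, name))
            out = os.path.normpath(os.path.join(archive, rel, name))
            if not os.path.isfile(old):
                # Not in the baseline at all: keep a full copy.
                shutil.copy2(new, out)
                counts["new"] += 1
            elif md5sum(old) == md5sum(new):
                # Identical content: a hardlink costs no extra space.
                os.link(old, out)
                counts["linked"] += 1
            else:
                # Content differs: hand over to the diff/copy step.
                store_changed(old, new, out)
                counts["changed"] += 1
    return counts
```

Passing store_changed=bsdiff_patch switches the changed files over to binary
patches.  The hardlinks only work if the baseline and the archive live on
the same filesystem.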

> I have one peculiarity, though. I install python2.5 into the directory
> containing the build, and even though the python version has not changed, I
> still get mismatching md5 sums on every .pyo and .pyc file. Any thoughts on
> this?

I'm not a python guru by any means, but I think .pyc files probably have data
about the .py they are generated from because there's some sort of
auto-generation available.  It may be possible to not store them at all and
just generate them before you use them or add some magic build flags to cause
them to store some sort of cooked values.  I'm not sure where the .pyo files
come from.
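
For what it's worth, my understanding is that CPython 2.x starts each
compiled file with a 4-byte magic number followed by the 4-byte modification
time of the source .py, and that .pyo files are the same thing written when
python runs with -O.  If so, a fresh install changes the checksum even when
the bytecode is identical.  A minimal sketch of comparing only the part
after that header follows; the names split_pyc and body_md5 are made up for
the example, and the 8-byte layout is an assumption that holds for 2.x only
(newer interpreters use a longer header).  The body can still differ if the
install path changes, since code objects record their source file name.

```python
import hashlib
import struct

# Assumed Python 2.x layout: 4-byte magic, 4-byte little-endian source
# mtime, then the marshalled code object.
PY2_HEADER_LEN = 8


def split_pyc(data):
    """Return (magic, source_mtime, body) for a 2.x-style .pyc/.pyo blob."""
    if len(data) < PY2_HEADER_LEN:
        raise ValueError("too short to be a compiled python file")
    magic, mtime = struct.unpack("<4sI", data[:PY2_HEADER_LEN])
    return magic, mtime, data[PY2_HEADER_LEN:]


def body_md5(data):
    """md5 of the marshalled body only, ignoring magic and timestamp."""
    return hashlib.md5(split_pyc(data)[2]).hexdigest()
```

Feeding body_md5 the contents of the two .pyc files from different builds
would show whether only the timestamp changed.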

-- Brooks



