From owner-freebsd-questions@FreeBSD.ORG Wed Jan 14 02:57:36 2009
Date: Wed, 14 Jan 2009 03:57:32 +0100
From: cpghost
To: freebsd-questions@freebsd.org
Message-ID: <20090114025732.GA98196@phenom.cordula.ws>
In-Reply-To: <20090102164412.GA1258@phenom.cordula.ws>
References: <20090102164412.GA1258@phenom.cordula.ws>
User-Agent: Mutt/1.5.18 (2008-05-17)
Subject: Re: Foiling MITM attacks on source and ports trees

On Fri, Jan 02, 2009 at 05:44:12PM +0100, cpghost wrote:
> Any idea? Could this be implemented as a plugin to Subversion (since
> it must access previous revisions of files and previously computed
> digests)? Given read-only access to the repository, a set of simple
> Python scripts or C/C++ programs could easily implement the basic
> functionality and cache the results for fast retrieval by other
> scripts. But how well will all this scale?

Sorry to revive this thread by replying to myself, but nothing has
materialized out of it (yet).
Considering all that has been said up to now, it boils down to this:

Issue #1 was signing the list. With or without SSL/TLS certificates,
the (compressed) list could be signed by a web-of-trust-anchored GnuPG
Project Key, so let's assume it will be, and deal later (if at all)
with transmission over SSL and how to obtain a certificate for the
server(s).

Issue #2 was how to generate the list out of the repository. A script
with (read-only) access to the Subversion repo would first compute, in
batch mode, md5/sha256 checksums for *all* existing revisions. That
may take some time, but so what? It's a one-time job, so let it run
overnight to checksum the few GBs. The results could be stored in an
arbitrary database. Then another script would be hooked into
Subversion, so that each commit computes the md5/sha256 checksums of
the newly added revisions and stores them in the database as well.
That shouldn't burden the server much: even if commits arrive in
bursts, the bytes in each commit can be checksummed very quickly and
saved to the database (I think / hope). It doesn't look like an overly
expensive operation.

Issue #3 was how to generate the list on demand. That's a simple
database query script that selects a subset of files, revisions, and
checksums from the database, compresses the result, signs it with the
GnuPG Project Key, and returns it to the user. This scales well to
many concurrent client queries, because the database is independent of
the Subversion server and can run on separate hardware -- and even be
replicated if need be.

Issue #4 was how to get the checksums on the client side. A simple app
could connect to the "checksum server" (the one defined in Issue #3)
-- or one of its mirrors if need be -- and fetch a signed list for a
specific subrange (say, from now back to 24h in the past).
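The checksum database at the center of Issues #2 and #3 could start as
little more than a table keyed on (path, revision). Here's a minimal
Python/sqlite3 sketch of that idea -- the names (record_revision,
lookup) and the schema are mine, not anything that exists yet, and in
a real deployment the file bytes would of course come out of the
Subversion repo (e.g. via svnlook in a post-commit hook) rather than
being passed in directly:

```python
import hashlib
import sqlite3

# Hypothetical schema: one row per (path, revision) pair.
SCHEMA = """
CREATE TABLE IF NOT EXISTS checksums (
    path     TEXT NOT NULL,
    revision INTEGER NOT NULL,
    sha256   TEXT NOT NULL,
    PRIMARY KEY (path, revision)
)
"""

def open_db(dbfile=":memory:"):
    db = sqlite3.connect(dbfile)
    db.execute(SCHEMA)
    return db

def record_revision(db, path, revision, data):
    """Checksum one revision of one file and store it.

    In deployment this would be driven by the batch job (for old
    revisions) or by a post-commit hook (for new ones).
    """
    digest = hashlib.sha256(data).hexdigest()
    db.execute("INSERT OR REPLACE INTO checksums VALUES (?, ?, ?)",
               (path, revision, digest))
    db.commit()
    return digest

def lookup(db, path, revision):
    """The core of the Issue #3 query script: fetch one checksum."""
    row = db.execute(
        "SELECT sha256 FROM checksums WHERE path = ? AND revision = ?",
        (path, revision)).fetchone()
    return row[0] if row else None
```

The Issue #3 server would then just run a range query over this table,
compress the rows, and pipe them through gpg --sign before returning
them to the client.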
It would verify the signature using the public Project Key (obtained
through a secure channel -- but let's worry about that later, once the
infrastructure is in place). This app could factor the tasks of
querying the server and checking the signature out into a library that
could also be used by an extended version of csup. The idea is that
csup, called with a special flag, would verify the checksums of all
files downloaded in the current run, while the main app could still
check the integrity of a tree fetched 2 months ago, provided it is
called with the right time stamp.

Issue #5 was how to identify the revisions of files stored locally.
That's a tough one, AFAICS. How to solve it? Ideas? For old trees it's
kinda hopeless (but read below); new invocations of a modified csup
could save metadata, including revision numbers, somewhere
(/var/db/sup perhaps), and use that metadata later. For old(er) trees,
checksums could be computed locally and sent to the "checksum server"
for identification. The server would match the paths and checksums
obtained from the client and return a revision number (if any) out of
the database. That in turn could be stored after the fact in
/var/db/sup, and everything could proceed as above.

So... implementation should now be easy as pie: just a few lines of
Python, or a few more lines of C, and a couple of little programs --
and of course read-only access to the repository for deployment once
it's ready. Or is it not yet?

Thanks,
-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/
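P.S. The Issue #5 matching step, on either side of the wire, could be
as simple as hashing the local file and searching the
(signature-verified) checksum list for that digest. A rough Python
sketch; identify_revision and the dict-shaped list are my own
assumptions for illustration, not an existing API:

```python
import hashlib

def identify_revision(signed_list, path, local_data):
    """Find which revision(s) a local file's contents match.

    signed_list maps (path, revision) -> sha256 hex digest, i.e. the
    already-verified list fetched from the hypothetical checksum
    server.  Returns a sorted list of candidate revision numbers,
    which the client could then cache (in /var/db/sup, say).
    """
    digest = hashlib.sha256(local_data).hexdigest()
    return sorted(rev for (p, rev), d in signed_list.items()
                  if p == path and d == digest)
```

If the returned list is empty, the local file matches no known
revision -- exactly the tampering (or local modification) signal we're
after.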