Date: Mon, 27 Sep 2004 08:45:11 -0600
From: "David G. Andersen"
To: Giorgos Keramidas
cc: freebsd-security@freebsd.org
cc: Colin Percival
Subject: Re: compare-by-hash (was Re: sharing /etc/passwd)
Message-ID: <20040927084511.E75411@cs.utah.edu>
In-Reply-To: <20040927091710.GC914@orion.daedalusnetworks.priv>; from keramida@freebsd.org on Mon, Sep 27, 2004 at 12:17:10PM +0300
References: <20011107211316.A7830@nomad.lets.net> <20040925140242.GB78219@gothmog.gr> <41575DFC.9020206@wadham.ox.ac.uk> <20040927091710.GC914@orion.daedalusnetworks.priv>

Giorgos Keramidas just mooed:
>
> What I pointed out was that when a non-zero possibility of two data
> blocks comparing as equal (even though they are not) exists, the method
> in question should not be used for password data or other sensitive bits
> of information.  A larger hash key will never yield a possibility of
> zero, so it doesn't mean that you can sleep untroubled at night while
> the rsync server overwrites /etc/*pwd.db files periodically.

P(hash collision) << P(gamma ray memory bit-flip)
              also << P(disk block error)
              also << P(undetected network error w/TCP)

You're worried about the wrong thing.  Unless you're talking about
malicious hash collision generation with a broken hash function,
the random hash collision probability, particularly with something
like sha-1, really is small enough to be insignificant.

The section in the paper dealing with this is pure bunk.
_everything_ is a probability -- it's just a matter of how much
you're willing to spend (bandwidth, computation, storage) to
drive that probability low.

Illustrative example from the paper:

  "The empirically observed rate of undetected errors in TCP packets
   is about 0.0000005% ... or we could slightly worsen that rate
   by sending only the hash"

What's the error rate when sending only the hash?  Since the
probabilities are small, we can effectively add them.

  P(undetected TCP error) = 0.000000005
  P(hash collision)       = 1/1208925819614629174706176
                         =~ 0.0000000000000000000000008  (about 8.3e-25)

  "Worsening"             = 0.0000000050000000000000008

Now, if I were a smart programmer, I'd look at that and say,
"If I'm worried about reliability, then TCP is my enemy.  Hashes
are my friend -- because I can send _two_ different hashes
of the same data for _way_ less than the cost of sending the data,
and that way I can protect myself against undetected TCP errors!"

  -dave

--
work: dga@lcs.mit.edu                          me:  dga@pobox.com
      MIT Laboratory for Computer Science           http://www.angio.net/
      I do not accept unsolicited commercial email.  Do not spam me.