Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 27 Sep 2004 08:45:11 -0600
From:      "David G. Andersen" <danderse@cs.utah.edu>
To:        Giorgos Keramidas <keramida@freebsd.org>
Cc:        Colin Percival <cperciva@wadham.ox.ac.uk>
Subject:   Re: compare-by-hash (was Re: sharing /etc/passwd)
Message-ID:  <20040927084511.E75411@cs.utah.edu>
In-Reply-To: <20040927091710.GC914@orion.daedalusnetworks.priv>; from keramida@freebsd.org on Mon, Sep 27, 2004 at 12:17:10PM %2B0300
References:  <Pine.LNX.4.33.0111071900280.24824-100000@moroni.pp.asu.edu> <20011107211316.A7830@nomad.lets.net> <20040925140242.GB78219@gothmog.gr> <41575DFC.9020206@wadham.ox.ac.uk> <20040927091710.GC914@orion.daedalusnetworks.priv>

next in thread | previous in thread | raw e-mail | index | archive | help
Giorgos Keramidas just mooed:
> 
> What I pointed out was that when a non-zero possibility of two data
> blocks comparing as equal (even though they are no) exists, the method
> in question should not be used for password data or other sensitive bits
> of information.  A larger hash key will never yield a possibility of
> zero, so it doesn't mean that you can sleep untroubled at night while
> the rsync server overwrites /etc/*pwd.db files periodically.

  P(hash collision) << p(gamma ray memory bit-flip)
      also          << p(disk block error)
      also          << p(undetected network error w/TCP)

You're worried about the wrong thing.  Unless you're talking about malicious
hash collision generation with a broken hash function, the random hash
collision probability, particularly with something like sha-1,
really is small enough as to be insignificant.  The section in the
paper dealing with this is pure bunk.  _everything_ is a probability --
it's just a matter of how much you're willing to spend (bandwidth,
computation, storage) to drive that probability low.  Illustrative
example from the paper:

   "The empirically observed rate of undetected errors in TCP
    packets is about 0.0000005% ... or we could slightly worsen
    that rate by sending only the hash"

What's the error rate when sending only the hash?  Since the
probabilities are small, we can effectively add them.

P(undetected TCP error) =  0.000000005
P(hash collision)       =  1/1208925819614629174706176
                        =~ 0.00000000000000000000001

"Worsening"             =  0.00000000500000000000001

Now, if I were a smart programmer, I'd look at that and say,
"If I'm worried about reliability, then TCP is my enemy.  Hashes
are my friend -- because I can send _two_ different hashes
of the same data for _way_ less then the cost of sending the
data, and that way I can protect myself against undetected
TCP errors!"

  -dave

-- 
work: dga@lcs.mit.edu                          me:  dga@pobox.com
      MIT Laboratory for Computer Science           http://www.angio.net/
      I do not accept unsolicited commercial email.  Do not spam me.



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20040927084511.E75411>