Date:      Sun, 15 Nov 2009 19:41:18 +0000
From:      "b. f." <bf1783@googlemail.com>
To:        Chris <christopher-ml@telting.org>
Cc:        freebsd-questions@FreeBSD.org
Subject:   Re: Produce identical packages for checksum comparison?
Message-ID:  <d873d5be0911151141r65f182axd48a2c767d2486d5@mail.gmail.com>
In-Reply-To: <4B002741.4000403@telting.org>
References:  <d873d5be0911141823o40f16depea7f6dc5090801a3@mail.gmail.com> <4B002741.4000403@telting.org>

On 11/15/09, Chris <christopher-ml@telting.org> wrote:
> b. f. wrote:
>> Chris wrote:

...

>> Even if you edited your
>> filesystem or archives to change the timestamps of package files, the
>>
> I think that could be accomplished through the port makefiles.

I think that the exact reproduction of whole archives will be
problematic, unless you have a means of changing the ctime of the
binaries that have been built to a predetermined value.
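
To illustrate why the ctime is the awkward part, here is a minimal
Python sketch (mine, not taken from any port or from the patches
mentioned in this thread).  It assumes a POSIX-like system, and the
helper name set_mtime_and_report is made up for the example.  It sets
a file's atime and mtime to a fixed value with os.utime() and then
reports both mtime and ctime:

```python
import os
import tempfile


def set_mtime_and_report(path, when=0):
    """Set atime and mtime of `path` to `when` (seconds since the epoch)
    and return the resulting (mtime, ctime) pair.

    Hypothetical helper for illustration only.
    """
    os.utime(path, (when, when))
    st = os.stat(path)
    return st.st_mtime, st.st_ctime


if __name__ == "__main__":
    # Create a scratch file so the demo does not touch real build output.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"example")
        name = f.name
    try:
        mtime, ctime = set_mtime_and_report(name, 0)
        print("mtime:", mtime)  # the value we asked for
        print("ctime:", ctime)  # set by the kernel, not by us
    finally:
        os.unlink(name)
```

On the systems I would expect this to run on, the mtime takes the
requested value while the ctime is updated by the kernel to the time
of the utime call itself, so the ctime cannot be pinned this way.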

>> toolchain used to create the binary files in packages often injects
>> random seeds, timestamps, file paths, uid/gid information, etc. that
>>
> I can understand file paths with debug info.  Timestamps?  Ok sure for a
> timestamp file being generated during a make that auto increments version
> numbers.  What would change about uid/gid?  I can't imagine why that
> might be in the binaries.

ar(1) and some of the other utilities inject this information into
certain binary files.  Try running 'objdump -a'  on, for example,
some static archive like /usr/lib/libc.a.  Of course this information
can be manipulated, but you have to do it.  See the patches in the
link I cited earlier for other examples.
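
If objdump is not handy, the same fields can be read directly from the
archive.  The following is only a rough Python sketch, assuming the
common ar layout: an 8-byte "!<arch>" magic line followed by 60-byte
member headers with fixed-width ASCII fields.  The function name
list_ar_members is made up for the example, and it makes no attempt to
decode BSD "#1/" or GNU "//" long names; it just reports the raw name
field.

```python
import sys

AR_MAGIC = b"!<arch>\n"
HEADER_LEN = 60


def _num(field, base=10):
    """Parse a space-padded ASCII numeric field; empty fields count as 0."""
    text = field.decode("ascii").strip()
    return int(text, base) if text else 0


def list_ar_members(data):
    """Return a list of dicts describing each member header in `data`.

    Hypothetical helper for illustration; assumes the common ar layout.
    """
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = []
    pos = len(AR_MAGIC)
    while pos + HEADER_LEN <= len(data):
        hdr = data[pos:pos + HEADER_LEN]
        if hdr[58:60] != b"`\n":
            raise ValueError("bad member header at offset %d" % pos)
        size = _num(hdr[48:58])
        members.append({
            "name": hdr[0:16].decode("ascii", "replace").rstrip(),
            "mtime": _num(hdr[16:28]),    # seconds since the epoch
            "uid": _num(hdr[28:34]),
            "gid": _num(hdr[34:40]),
            "mode": _num(hdr[40:48], 8),  # stored in octal
            "size": size,
        })
        # Member data is padded to an even offset.
        pos += HEADER_LEN + size + (size & 1)
    return members


if __name__ == "__main__":
    # Usage: python thisfile.py /usr/lib/libc.a
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for m in list_ar_members(f.read()):
                print(m["name"], m["mtime"], m["uid"], m["gid"], oct(m["mode"]))
```

Running it on two builds of the same static library should show
whether the mtime, uid and gid fields differ even when the object code
is the same.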

...

> Why would the build tools be injecting random numbers into binaries?

Usually to provide some degree of uniqueness.  I'm not saying that it
is always done, just that it _may_ be done.  See, for example, the gcc
sources or the -frandom-seed option description in gcc(1).  And it may
not be just the compiler toolchain -- a port may do it.

Occasionally, there are other sources of non-determinism.  For
example, in a recent thesis, a researcher who was trying to use
reproducible builds to defeat a longstanding security threat found
that the tcc compiler produced non-deterministic builds because of a
defect in sign-extending some casts, and a problem with long double
output.  He also cited another researcher's finding that a certain
Java compiler's output was dependent upon the heap memory addresses
used during compilation.  See:

http://www.dwheeler.com/trusting-trust/dissertation/wheeler-trusting-trust-ddc.pdf

...

> If I concentrated on one problem at a time I would never get anything done.

?! :)


b.


