Date:      Tue, 8 Dec 2015 02:59:52 -0800
From:      Yuri <yuri@rawbw.com>
To:        Freebsd hackers list <freebsd-hackers@FreeBSD.org>
Subject:   How to get the deterministic result for FreeBSD tar(1)?
Message-ID:  <5666B828.5000306@rawbw.com>

I have two identical directories (no diffs, all mtime attributes 
identical), each archived with this command:
find dir -print0 | LC_ALL=C sort -z | \
    tar cf archive.tgz --format=bsdtar --no-recursion --null -T -

The resulting archives differ: 3 files out of 10,000 carry a pax ctime 
attribute, and its value differs between the two archives:
-27 ctime=1449566560.642715
+27 ctime=1449566903.167521
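For anyone who wants to check their own archives, here is a minimal sketch that lists the entries carrying a given pax key. It uses Python's standard tarfile module rather than libarchive, and the function name members_with_pax_key is made up for illustration. Note that tarfile also merges any global pax headers into each member's pax_headers, so a global record would show up for every entry.

```python
import tarfile


def members_with_pax_key(archive_path, key="ctime"):
    """Return {member name: value} for entries whose pax header has `key`.

    Hypothetical helper for illustration; not part of tar(1) or libarchive.
    """
    found = {}
    # "r:*" lets tarfile detect compression (if any) transparently.
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            # pax_headers holds the extended-header records as strings.
            if key in member.pax_headers:
                found[member.name] = member.pax_headers[key]
    return found
```

Running it on both archives and comparing the two dictionaries should show exactly the entries whose ctime records differ.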

src/contrib/libarchive/archive_write_set_format_by_name.c suggests that 
format=bsdtar should force ARCHIVE_FORMAT_TAR_PAX_RESTRICTED format (no 
attributes), unless need_extension=1 is set on a per-file basis in 
archive_write_set_format_pax.c.

need_extension=1 is triggered by these conditions:
* too long or non-ASCII path
* too long or non-ASCII link
* too large file
* too long GID or UID
* too long or non-ASCII group name or user name
* ACL entries and extended attributes
* sparse info
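To guess in advance which paths would hit the first condition, a rough sketch follows. It assumes the standard ustar limits (100-byte name field, 155-byte prefix field, split at a '/'); the actual test in archive_write_set_format_pax.c may differ in details such as trailing slashes on directories, so treat this as an approximation. The function name path_fits_ustar is hypothetical.

```python
def path_fits_ustar(path: str) -> bool:
    """Approximate check: is `path` ASCII and storable in ustar name/prefix?

    Assumes the POSIX ustar field sizes (name: 100 bytes, prefix: 155 bytes).
    This is a simplification, not libarchive's exact logic.
    """
    try:
        raw = path.encode("ascii")
    except UnicodeEncodeError:
        return False  # a non-ASCII path needs a pax "path" record
    if len(raw) <= 100:
        return True  # fits in the name field alone
    # Otherwise look for a '/' that splits the path into prefix + name.
    for i, byte in enumerate(raw):
        if byte != ord("/"):
            continue
        prefix_len = i
        name_len = len(raw) - i - 1
        if 0 < prefix_len <= 155 and 0 < name_len <= 100:
            return True
    return False
```

Paths for which this returns False are the likely candidates for an extended header, and so for the extra attributes that come along with it.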

In my case the file hierarchy is indeed very deep, and these three files 
also have the "path" attribute set.

I think it is a bug that archive_write_set_format_pax.c writes the ctime 
attribute whenever one of the above conditions is satisfied: ctime can't 
be controlled by the user, so it will always cause a difference between 
otherwise identical archives.

So I have two questions:
1. How do I actually get deterministic output from tar(1)?
2. Is there agreement that it is a bug for a too long or non-ASCII 
path name to trigger the leakage of ctime into a tar file?
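As a fallback for question 1, in case tar(1) itself cannot be made deterministic, here is a sketch that sidesteps bsdtar entirely and writes the archive with Python's standard tarfile module. This is a different tool, not a fix for libarchive. It assumes that tarfile's pax writer emits only the records an entry needs (such as "path") and never ctime, and that truncating mtime to whole seconds is acceptable. The function name write_sorted_tar is hypothetical.

```python
import os
import tarfile


def write_sorted_tar(root, out_path):
    """Archive `root` in byte-sorted order without recursion (sketch only).

    `out_path` must live outside `root`, or the archive would include itself.
    """
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, n) for n in dirnames + filenames)
    paths.sort(key=os.fsencode)  # byte order, similar to LC_ALL=C sort

    with tarfile.open(out_path, "w", format=tarfile.PAX_FORMAT) as tar:
        for path in paths:
            info = tar.gettarinfo(path)
            if info is None:
                continue  # unsupported file type (e.g. a socket)
            # Assumption: sub-second mtime is not needed; dropping it avoids
            # an extra pax "mtime" record on every entry.
            info.mtime = int(info.mtime)
            if info.isreg():
                with open(path, "rb") as fh:
                    tar.addfile(info, fh)
            else:
                tar.addfile(info)
```

Calling write_sorted_tar("dir", "archive.tar") on each of the two directories should then give byte-identical archives, provided names, contents, modes, owners and mtimes match.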

Yuri


