Date: Tue, 8 Dec 2015 02:59:52 -0800
From: Yuri <yuri@rawbw.com>
To: Freebsd hackers list <freebsd-hackers@FreeBSD.org>
Subject: How to get the deterministic result for FreeBSD tar(1)?
Message-ID: <5666B828.5000306@rawbw.com>

I have two identical directories (no diffs, all identical mtime attributes) compressed by this command:

    find dir -print0 | LC_ALL=C sort -z | tar cf archive.tgz --format=bsdtar --no-recursion --null -T -

The results are different: 3 files out of 10,000 have pax attributes set that differ:

    -27 ctime=1449566560.642715
    +27 ctime=1449566903.167521

src/contrib/libarchive/archive_write_set_format_by_name.c suggests that format=bsdtar should force the ARCHIVE_FORMAT_TAR_PAX_RESTRICTED format (no attributes), unless need_extension=1 is set on a per-file basis in archive_write_set_format_pax.c.

need_extension=1 is triggered by these conditions:
* too long or non-ASCII path
* too long or non-ASCII link
* too large file
* too long GID or UID
* too long or non-ASCII group name or user name
* ACL entries and extended attributes
* sparse info

In my case the file hierarchy is indeed very deep, and these three files also have the "path" attribute.

I think it is a bug that archive_write_set_format_pax.c writes the ctime attribute whenever one of the above conditions is satisfied, because ctime can't be controlled by the user and will always cause a difference.

So I have two questions:
1. How do I actually achieve output determinism for tar(1)?
2. Is there agreement that it is a bug that a too long or non-ASCII path name triggers the leakage of ctime into a tar file?

Yuri
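
To find which archive members carry the differing attribute, the pax extended headers can be listed per member. The following is only a sketch of such a check, using Python's standard tarfile module rather than tar(1) itself. The function name members_with_pax_keys and the default key list are assumptions made for illustration; they do not come from libarchive.

```python
import tarfile

# Hypothetical default: pax keywords whose values the user cannot easily pin.
VOLATILE_KEYS = ("ctime", "atime")


def members_with_pax_keys(path, keys=VOLATILE_KEYS):
    """Map member name -> {keyword: value} for every archive member whose
    pax extended header contains at least one of the given keywords."""
    found = {}
    # Mode "r" lets tarfile detect compression (or its absence) on its own.
    with tarfile.open(path, "r") as archive:
        for member in archive:
            # pax_headers holds the raw keyword/value pairs read for this member.
            hits = {k: v for k, v in member.pax_headers.items() if k in keys}
            if hits:
                found[member.name] = hits
    return found
```

Calling members_with_pax_keys("archive.tgz") on each of the two archives and comparing the returned dictionaries should show the members whose ctime was recorded, which in the case above would be the three files with very long paths.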
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?5666B828.5000306>