Date:      Tue, 23 Oct 2007 03:37:11 -0700
From:      Tim Kientzle <kientzle@freebsd.org>
To:        josh.carroll@gmail.com
Cc:        Bruce Cran <bruce@cran.org.uk>, current@freebsd.org
Subject:   Re: bsdtar can't handle files >8GB
Message-ID:  <471DCED7.2020500@freebsd.org>
In-Reply-To: <8cb6106e0710222017p133ddccyc973c6ebcd23e270@mail.gmail.com>
References:  <471CF3F3.6070803@cran.org.uk> <8cb6106e0710222017p133ddccyc973c6ebcd23e270@mail.gmail.com>

Josh Carroll wrote:
>>tar: Unrecognized archive format: Inappropriate file type or format
>>tar: Error exit delayed from previous errors.
> 
> Confirmed in RELENG_7 as well. Interestingly enough, if the file
> inside the tarball is nothing but zeros (dd if=/dev/zero ...), I don't
> get this error.  However, it doesn't work either. The resulting file
> is 0 bytes, rather than 10 GB of \0.

Try the attached patch, which I think fixes this
problem.

I need to do some more testing before I commit it,
but your feedback will certainly help.  The failure
here is that libarchive was erroneously interpreting
the large file as having a zero-byte body, then generating
the error above when it tried to read the next header.

This bug crept in when I was working on read support
for GNU tar's new pax sparse format (--format=pax --sparse).
I need to test that in the case where a sparse entry has
more than 8 GB of non-hole data.

Tim Kientzle

Attached patch: archive_tar_largefile.patch

Index: archive_read_support_format_tar.c
===================================================================
--- archive_read_support_format_tar.c	(revision 510)
+++ archive_read_support_format_tar.c	(working copy)
@@ -164,6 +164,7 @@
 	struct sparse_block	*sparse_last;
 	int64_t			 sparse_offset;
 	int64_t			 sparse_numbytes;
+	int64_t			 sparse_realsize;
 	int			 sparse_gnu_major;
 	int			 sparse_gnu_minor;
 	char			 sparse_gnu_pending;
@@ -440,6 +441,7 @@
 		free(sp);
 	}
 	tar->sparse_last = NULL;
+	tar->sparse_realsize = -1; /* Mark this as "unset" */
 
 	r = tar_read_header(a, tar, entry);
 
@@ -1388,9 +1390,10 @@
 		}
 		if (wcscmp(key, L"GNU.sparse.name") == 0)
 			archive_entry_copy_pathname_w(entry, value);
-		if (wcscmp(key, L"GNU.sparse.realsize") == 0)
-			archive_entry_set_size(entry,
-			    tar_atol10(value, wcslen(value)));
+		if (wcscmp(key, L"GNU.sparse.realsize") == 0) {
+			tar->sparse_realsize = tar_atol10(value, wcslen(value));
+			archive_entry_set_size(entry, tar->sparse_realsize);
+		}
 		break;
 	case 'L':
 		/* Our extensions */
@@ -1471,11 +1474,22 @@
 		/* POSIX has reserved 'security.*' */
 		/* Someday: if (wcscmp(key, L"security.acl")==0) { ... } */
 		if (wcscmp(key, L"size")==0) {
-			tar->entry_bytes_remaining = tar_atol10(value, wcslen(value));
-			archive_entry_set_size(entry, tar->entry_bytes_remaining);
+			/* "size" is the size of the data in the entry. */
+			tar->entry_bytes_remaining
+			    = tar_atol10(value, wcslen(value));
+			/*
+			 * But, "size" is not necessarily the size of
+			 * the file on disk; if this is a sparse file,
+			 * the disk size may have already been set from
+			 * GNU.sparse.realsize.
+			 */
+			if (tar->sparse_realsize < 0) {
+				archive_entry_set_size(entry,
+				    tar->entry_bytes_remaining);
+				tar->sparse_realsize
+				    = tar->entry_bytes_remaining;
+			}
 		}
-		tar->entry_bytes_remaining = 0;
-
 		break;
 	case 'u':
 		if (wcscmp(key, L"uid")==0)
