From: Daniel Lang
To: Jan Grant
Cc: Peter Jeremy, Stefan Bethke, current@freebsd.org
Date: Wed, 21 Jul 2004 17:14:27 +0200
Subject: Re: NEW TAR
Message-ID: <20040721151427.GC54664@atrbg11.informatik.tu-muenchen.de>

Hi,

Jan Grant wrote on Wed, Jul 21, 2004 at 02:44:42PM +0100:
[..]
> You're correct, in that filesystem semantics don't require an archiver
> to recreate holes. There are storage efficiency gains to be made in
> identifying holes, that's true - particularly in the case of absolutely
> whopping but extremely sparse files.
> In those cases, a simple
> userland-view-of-the-filesystem-semantics approach to identifying areas
> that _might_ be holes (just for archive efficiency) can still be
> expensive and might involve the scanning of multiple gigabytes of
> "virtual" zeroes.
>
> Solaris offers an fcntl to identify holes (IIRC) for just this purpose.
> If the underlying filesystem can't be made to support it, there's an
> efficiency loss but otherwise it's no great shakes.

I don't get it. I assume that for any consumer it is totally
transparent whether possibly existing chunks of 0-bytes are actually
blocks full of zeroes or just non-allocated blocks, correct?

Second, it is true that there is a gain in terms of occupied
disk space if chunks of zeroes are not allocated at all, correct?

So, from my point of view, when a sparse file is archived and then
extracted, it is totally irrelevant whether the areas that contain
zeroes consist of unallocated blocks in exactly the same places
as before.

So, all I guess an archiver must do is:

- read the file
- scan the file for consecutive blocks of zeroes
- archive these blocks in an efficient way
- on extraction, create a sparse file with the previously
  identified empty blocks, regardless of whether these blocks were
  'sparse' blocks in the original file or not.

I do not see why it is important whether the original file was sparse
at all, or maybe sparse in different places.

Cheers,
 Daniel
-- 
IRCnet: Mr-Spock
 - My name is Pentium of Borg, division is futile, you will be approximated. -
Daniel Lang * dl@leo.org * +49 89 289 18532 * http://www.leo.org/~dl/
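
P.S. The four archiver steps listed above can be sketched roughly as
follows. This is only an illustration in Python, not how any actual tar
implementation works: the names scan_extents, extract and BLOCK are
made up for the example, the 4096-byte scan granularity is an
assumption, and the "archive" is just an in-memory list of
(offset, data) pairs. Note that it still reads every byte of the
source file, which is exactly the cost Jan pointed out.

```python
BLOCK = 4096  # assumed scan granularity, chosen only for this sketch


def scan_extents(path, block=BLOCK):
    """Read the file and keep only the blocks that contain non-zero bytes.

    Returns (size, extents), where extents is a list of (offset, data)
    pairs. All-zero blocks are not stored, whether or not they were
    real holes in the original file.
    """
    extents = []
    size = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block)
            if not chunk:
                break
            if chunk != bytes(len(chunk)):  # block holds some real data
                extents.append((size, chunk))
            size += len(chunk)
    return size, extents


def extract(path, size, extents):
    """Recreate the file, seeking past every all-zero block.

    Whether the skipped ranges end up as unallocated blocks depends on
    the operating system and filesystem; the file contents are the
    same either way.
    """
    with open(path, "wb") as f:
        for offset, data in extents:
            f.seek(offset)
            f.write(data)
        f.truncate(size)  # restores the length if the file ends in zeroes
```

A consumer reading the extracted file sees the same bytes as in the
original; only the on-disk allocation may differ.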