Date: Wed, 21 Jul 2004 17:30:45 +0200 (CEST)
From: Harti Brandt
To: Daniel Lang
cc: Jan Grant, current@freebsd.org, Peter Jeremy
Subject: Re: NEW TAR
Message-ID: <20040721172047.C99248@beagle.kn.op.dlr.de>
In-Reply-To: <20040721151427.GC54664@atrbg11.informatik.tu-muenchen.de>

On Wed, 21 Jul 2004, Daniel Lang wrote:

DL>Hi,
DL>
DL>Jan Grant wrote on Wed, Jul 21, 2004 at 02:44:42PM +0100:
DL>[..]
DL>> You're correct, in that filesystem semantics don't require an archiver
DL>> to recreate holes.
DL>> There are storage efficiency gains to be made in
DL>> identifying holes, that's true - particularly in the case of absolutely
DL>> whopping but extremely sparse files. In those cases, a simple
DL>> userland-view-of-the-filesystem-semantics approach to identifying areas
DL>> that _might_ be holes (just for archive efficiency) can still be
DL>> expensive and might involve the scanning of multiple gigabytes of
DL>> "virtual" zeroes.
DL>>
DL>> Solaris offers an fcntl to identify holes (IIRC) for just this purpose.
DL>> If the underlying filesystem can't be made to support it, there's an
DL>> efficiency loss but otherwise it's no great shakes.
DL>
DL>I don't get it.
DL>
DL>I assume, that for any consumer it is totally transparent if
DL>possibly existing chunks of 0-bytes are actually blocks full of
DL>zeroes or just non-allocated blocks, correct?
DL>
DL>Second, it is true, that there is a gain in terms of occupied disk
DL>space, if chunks of zeroes are not allocated at all, correct?
DL>
DL>So, from my point of view it is totally irrelevant, if a sparse file
DL>is archived and then extracted, if the areas, which contain zeroes
DL>are exactly in the same manner consisting of unallocated blocks
DL>or not.
DL>
DL>So, all I guess an archiver must do is:
DL>
DL> - read the file
DL> - scan the file for consecutive blocks of zeroes
DL> - archive these blocks in an efficient way
DL> - on extraction, create a sparse file with the previously
DL>   identified empty blocks, regardless if these blocks
DL>   have been 'sparse' blocks in the original file or not.
DL>
DL>I do not see, why it is important if the original file was sparse
DL>at all or maybe in different places.

It may simply be a good deal faster to use existing hole information
(if it exists) than to scan the file.

Also there is a difference between holes and actual zeroes: it's like
overcommitting memory. You may have a 1TB file consisting of a large
hole on a 10GB disk.
Once you start writing to it you will get an error at some point,
even when writing into the middle of the file, simply because the FS
needs to allocate blocks. I could imagine an application that knows its
access pattern to a large sparse file allocating zeroed blocks in
advance, while skipping blocks that it knows it will not write, just to
make sure the blocks are there when it writes later on. But that's a
rather hypothetical application.

harti
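
For illustration only, the scan-based approach from the quoted list (read
the file, find runs of zero blocks, skip them on extraction) could look
roughly like the sketch below. It is not taken from any tar
implementation. The names scan_extents and copy_sparse and the 4096-byte
block size are assumptions made for the example. It also shows the cost
discussed above: every byte of the source is read, including "virtual"
zeroes, and the copy ends up with holes wherever the source had a full
block of zeroes, whether or not that block was allocated in the original.

```python
BLOCK = 4096  # assumed scan granularity, not a value from this thread


def scan_extents(path, block=BLOCK):
    """Return (size, extents); extents lists (offset, length) of non-zero runs."""
    extents = []
    zero = bytes(block)
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block)
            if not chunk:
                break
            if chunk != zero[:len(chunk)]:
                if extents and extents[-1][0] + extents[-1][1] == offset:
                    # Adjacent to the previous data run: extend it.
                    start, length = extents[-1]
                    extents[-1] = (start, length + len(chunk))
                else:
                    extents.append((offset, len(chunk)))
            offset += len(chunk)
    return offset, extents


def copy_sparse(src, dst, block=BLOCK):
    """Copy src to dst, seeking over all-zero blocks instead of writing them."""
    size, extents = scan_extents(src, block)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for start, length in extents:
            fin.seek(start)
            fout.seek(start)
            remaining = length
            while remaining:
                buf = fin.read(min(block, remaining))
                if not buf:  # source shrank underneath us; stop this run
                    break
                fout.write(buf)
                remaining -= len(buf)
        # Set the final length; a trailing zero run becomes a hole
        # if the filesystem supports sparse files.
        fout.truncate(size)
    return size, extents
```

Whether the skipped ranges really end up unallocated depends on the
filesystem. Where the platform exposes hole information directly, the
scan could be replaced by queries to the filesystem, which is the faster
path mentioned above.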