Date:      Wed, 21 Jul 2004 17:14:27 +0200
From:      Daniel Lang <dl@leo.org>
To:        Jan Grant <Jan.Grant@bristol.ac.uk>
Cc:        current@freebsd.org
Subject:   Re: NEW TAR
Message-ID:  <20040721151427.GC54664@atrbg11.informatik.tu-muenchen.de>
In-Reply-To: <Pine.GSO.4.61.0407211440210.28037@mail.ilrt.bris.ac.uk>
References:  <40F963D8.6010201@freebsd.org> <20040719060730.GA87697@nagual.pp.ru> <20040720081051.GB3001@cirb503493.alcatel.com.au> <B82A97D5-DA91-11D8-B0C4-000A95C893E4@lassitu.de> <Pine.GSO.4.61.0407211440210.28037@mail.ilrt.bris.ac.uk>

Hi,

Jan Grant wrote on Wed, Jul 21, 2004 at 02:44:42PM +0100:
[..]
> You're correct, in that filesystem semantics don't require an archiver 
> to recreate holes. There are storage efficiency gains to be made in 
> identifying holes, that's true - particularly in the case of absolutely 
> whopping but extremely sparse files. In those cases, a simple 
> userland-view-of-the-filesystem-semantics approach to identifying areas 
> that _might_ be holes (just for archive efficiency) can still be 
> expensive and might involve the scanning of multiple gigabytes of 
> "virtual" zeroes.
> 
> Solaris offers an fcntl to identify holes (IIRC) for just this purpose. 
> If the underlying filesystem can't be made to support it, there's an 
> efficiency loss but otherwise it's no great shakes.

I don't get it.

I assume that, for any consumer, it is completely transparent whether
runs of zero bytes are actually allocated blocks full of zeroes or
just unallocated blocks (holes), correct?

Second, it is true that disk space is saved if chunks of zeroes are
not allocated at all, correct?

So, from my point of view, when a sparse file is archived and then
extracted, it is irrelevant whether the zero-filled areas end up as
unallocated blocks in exactly the same places as in the original.

So I guess all an archiver must do is:

 - read the file 
 - scan the file for consecutive blocks of zeroes
 - archive these blocks in an efficient way
 - on extraction, create a sparse file with the previously
   identified empty blocks, regardless of whether these blocks
   were 'sparse' blocks in the original file or not.
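
The steps above could be sketched roughly as follows. This is only an
illustration, not code from tar or from this thread: the names
find_extents, archive and extract, the in-memory "archive" layout, and
the 4096-byte scan block are all assumptions made for the example.
Whether the extracted file really ends up with unallocated blocks
depends on the filesystem.

```python
import os

BLOCK = 4096  # assumed scan granularity; not taken from the thread


def find_extents(path, block=BLOCK):
    """Scan a file and return (size, [(offset, length), ...]) of non-zero runs."""
    extents = []
    offset = 0
    zero = bytes(block)
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block)
            if not chunk:
                break
            if chunk != zero[:len(chunk)]:
                if extents and extents[-1][0] + extents[-1][1] == offset:
                    # Adjacent to the previous data run: extend it.
                    extents[-1] = (extents[-1][0], extents[-1][1] + len(chunk))
                else:
                    extents.append((offset, len(chunk)))
            offset += len(chunk)
    return offset, extents


def archive(path):
    """Hypothetical in-memory archive entry: total size plus the data runs only."""
    size, extents = find_extents(path)
    runs = []
    with open(path, "rb") as f:
        for off, length in extents:
            f.seek(off)
            runs.append((off, f.read(length)))
    return size, runs


def extract(entry, dest):
    """Recreate the file, seeking over zero runs instead of writing them."""
    size, runs = entry
    with open(dest, "wb") as f:
        for off, data in runs:
            f.seek(off)
            f.write(data)
        # Restore the full length, including any trailing run of zeroes.
        f.truncate(size)
```

Note that find_extents still has to read every byte, including the
"virtual" zeroes, which is the cost Jan mentions for very large,
very sparse files.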

I do not see why it matters whether the original file was sparse at
all, or whether it was sparse in different places.

Cheers,
 Daniel
-- 
IRCnet: Mr-Spock  - My name is Pentium of Borg, division is futile, you
                                                will be approximated. - 
 Daniel Lang * dl@leo.org * +49 89 289 18532 * http://www.leo.org/~dl/


