From: Brooks Davis
To: Daniel Lang
Cc: current@freebsd.org, Jan Grant, Stefan Bethke, Peter Jeremy
Date: Wed, 21 Jul 2004 09:27:06 -0700
Subject: Re: NEW TAR
Message-ID: <20040721162706.GA12760@Odin.AC.HMC.Edu>
In-Reply-To: <20040721151427.GC54664@atrbg11.informatik.tu-muenchen.de>

On Wed, Jul 21, 2004 at 05:14:27PM +0200, Daniel Lang wrote:
> Hi,
>
> Jan Grant wrote on Wed, Jul 21, 2004 at 02:44:42PM +0100:
> [..]
> > You're correct, in that filesystem semantics don't require an archiver
> > to recreate holes. There are storage efficiency gains to be made in
> > identifying holes, that's true - particularly in the case of absolutely
> > whopping but extremely sparse files. In those cases, a simple
> > userland-view-of-the-filesystem-semantics approach to identifying areas
> > that _might_ be holes (just for archive efficiency) can still be
> > expensive and might involve the scanning of multiple gigabytes of
> > "virtual" zeroes.
> >
> > Solaris offers an fcntl to identify holes (IIRC) for just this purpose.
> > If the underlying filesystem can't be made to support it, there's an
> > efficiency loss but otherwise it's no great shakes.
>
> I don't get it.
>
> I assume, that for any consumer it is totally transparent if
> possibly existing chunks of 0-bytes are actually blocks full of
> zeroes or just non-allocated blocks, correct?
>
> Second, it is true, that there is a gain in terms of occupied disk
> space, if chunks of zeroes are not allocated at all, correct?
>
> So, from my point of view it is totally irrelevant, if a sparse file
> is archived and then extracted, if the areas, which contain zeroes
> are exactly in the same manner consisting of unallocated blocks
> or not.
>
> So, all I guess an archiver must do is:
>
> - read the file
> - scan the file for consecutive blocks of zeroes
> - archive these blocks in an efficient way
> - on extraction, create a sparse file with the previously
>   identified empty blocks, regardless if these blocks
>   have been 'sparse' blocks in the original file or not.
>
> I do not see, why it is important if the original file was sparse
> at all or maybe in different places.

Since sparse files overcommit the disk, they should only be created
deliberately. Otherwise you can easily get into trouble if you try to use
reserved space later, since it won't actually be reserved.
Consider the case of a file system image created with
"dd if=/dev/zero ...; newfs ...". If your archiver decides to be "smart"
and restores a copy of that file sparse, and you then use up the
available blocks on your disk, you're going to be in a world of hurt.
I wouldn't be surprised if that resulted in a panic.

-- Brooks

--
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529 9BF0 5D8E 8BE9 F238 1AD4
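
To make the scenario above concrete, here is a rough Python sketch. It
is not how any tar implementation works. It only mimics the two steps
under discussion: writing real zeroes to a file (as "dd if=/dev/zero"
would) and then doing the "smart" copy that seeks over all-zero chunks.
The function names, the 64 KiB chunk size and the 4 MiB image size are
all made up for illustration, and the st_blocks field (counted in
512-byte units) is assumed to be available, which holds on POSIX systems
but not everywhere.

```python
import os
import tempfile

BLOCK = 64 * 1024  # chunk size used when scanning for zeroes (arbitrary choice)


def allocated_bytes(path):
    """Bytes actually allocated on disk; st_blocks counts 512-byte units (POSIX)."""
    return os.stat(path).st_blocks * 512


def write_dense_zeroes(path, size):
    """Like 'dd if=/dev/zero': write real zero bytes so the space is reserved."""
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(BLOCK, remaining)
            f.write(bytes(n))
            remaining -= n


def copy_making_holes(src, dst):
    """The 'smart' restore: seek over all-zero chunks instead of writing them."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(BLOCK)
            if not chunk:
                break
            if chunk.count(0) == len(chunk):
                fout.seek(len(chunk), os.SEEK_CUR)  # skip ahead, leaving a hole
            else:
                fout.write(chunk)
        fout.truncate(fin.tell())  # set the final length if the file ends in a hole


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        image = os.path.join(d, "image")        # hypothetical file names
        restored = os.path.join(d, "restored")
        write_dense_zeroes(image, 4 * 1024 * 1024)
        copy_making_holes(image, restored)
        for p in (image, restored):
            print(os.path.basename(p), os.path.getsize(p), allocated_bytes(p))
```

Both files report the same length and read back identically. Whether the
restored copy shows fewer allocated bytes depends on the filesystem, but
where holes are supported, the space the original had reserved is no
longer set aside for the copy, which is the overcommit problem described
in the message.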