From owner-freebsd-current@FreeBSD.ORG Thu Jul 22 07:34:37 2004
From: Danny Braniss
To: Brooks Davis
Cc: Jan Grant, Peter Jeremy, current@freebsd.org, Daniel Lang, Stefan Bethke
Subject: Re: NEW TAR
Date: Thu, 22 Jul 2004 10:34:34 +0300
In-Reply-To: Message from Brooks Davis <20040721162706.GA12760@Odin.AC.HMC.Edu>
Message-Id: <20040722073437.4274843D5D@mx1.FreeBSD.org>
List-Id: Discussions about the use of FreeBSD-current

> On Wed, Jul 21, 2004 at 05:14:27PM +0200, Daniel Lang wrote:
> > Hi,
> >
> > Jan Grant wrote on Wed, Jul 21, 2004 at 02:44:42PM +0100:
> > [..]
> > > You're correct, in that filesystem semantics don't require an archiver
> > > to recreate holes. There are storage efficiency gains to be made in
> > > identifying holes, that's true - particularly in the case of absolutely
> > > whopping but extremely sparse files.
> > > In those cases, a simple
> > > userland-view-of-the-filesystem-semantics approach to identifying areas
> > > that _might_ be holes (just for archive efficiency) can still be
> > > expensive and might involve the scanning of multiple gigabytes of
> > > "virtual" zeroes.
> > >
> > > Solaris offers an fcntl to identify holes (IIRC) for just this purpose.
> > > If the underlying filesystem can't be made to support it, there's an
> > > efficiency loss but otherwise it's no great shakes.
> >
> > I don't get it.
> >
> > I assume that for any consumer it is totally transparent whether
> > possibly existing chunks of 0-bytes are actually blocks full of
> > zeroes or just non-allocated blocks, correct?
> >
> > Second, it is true that there is a gain in terms of occupied disk
> > space if chunks of zeroes are not allocated at all, correct?
> >
> > So, from my point of view it is totally irrelevant, if a sparse file
> > is archived and then extracted, whether the areas which contain zeroes
> > consist of unallocated blocks in exactly the same manner or not.
> >
> > So, all I guess an archiver must do is:
> >
> > - read the file
> > - scan the file for consecutive blocks of zeroes
> > - archive these blocks in an efficient way
> > - on extraction, create a sparse file with the previously
> >   identified empty blocks, regardless of whether these blocks
> >   were 'sparse' blocks in the original file or not.
> >
> > I do not see why it is important whether the original file was sparse
> > at all, or maybe sparse in different places.
>
> Since sparse files overcommit the disk, they should only be created
> deliberately. Otherwise you can easily get in trouble if you try to use
> reserved space later since it won't actually be reserved. Consider the
> case of a file system image created with "dd if=/dev/zero ...; newfs
> ...".
> If your archiver decides to be "smart" and restore a copy of that
> file sparse and then you use up the available blocks on your disk you're
> going to be in a world of hurt. I wouldn't be surprised if that resulted
> in a panic.

If the file has 'holes' and they are read as zero, then doesn't
compressing the tar file nicely reduce it?

	dd if=/dev/zero of=junk count=100
	tar czf junk.tar.gz junk
	ls -ls junk*
	50 -rw-r--r--  1 danny  wheel  51200 Jul 22 10:28 junk
	 2 -rw-r--r--  1 danny  wheel    170 Jul 22 10:33 junk.tar.gz

danny
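
For reference, the scan-and-recreate steps Daniel lists in the quoted text could look roughly like the Python sketch below. It is only an illustration under stated assumptions, not how any tar implementation actually does it: the function names and the 512-byte block size (chosen to match dd's default in the example above) are made up here. It also shows Brooks's concern directly, because the copy leaves holes wherever the source reads as zeroes, whether or not the source blocks were ever allocated.

```python
import io

BLOCK_SIZE = 512  # assumption: same as dd's default block size


def zero_block_runs(stream, block_size=BLOCK_SIZE):
    """Yield (offset, length) for each maximal run of all-zero blocks.

    This is the userland view only: it cannot tell a real hole from
    allocated blocks that happen to contain zeroes.
    """
    offset = 0
    run_start = None
    while True:
        block = stream.read(block_size)
        if not block:
            break
        if not any(block):              # every byte in the block is zero
            if run_start is None:
                run_start = offset
        elif run_start is not None:
            yield (run_start, offset - run_start)
            run_start = None
        offset += len(block)
    if run_start is not None:           # the stream ended inside a zero run
        yield (run_start, offset - run_start)


def copy_with_holes(src, dst, block_size=BLOCK_SIZE):
    """Copy src to dst, seeking over all-zero blocks instead of writing them.

    dst should be a real file opened for binary writing. On filesystems
    that support sparse files the skipped ranges become holes.
    """
    size = 0
    while True:
        block = src.read(block_size)
        if not block:
            break
        if not any(block):
            dst.seek(len(block), io.SEEK_CUR)   # skip ahead, write nothing
        else:
            dst.write(block)
        size += len(block)
    dst.truncate(size)   # extend to full length if the data ends in a zero run
    return size
```

Note that the scan still has to read every byte, which is the cost Jan mentions for very large sparse files. Compressing the archive, as in the dd/tar example, shrinks the archive but does not by itself decide whether the extracted file gets holes.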