From owner-freebsd-questions Fri Sep 15 09:59:27 1995
Return-Path: owner-questions
Received: (from root@localhost) by freefall.freebsd.org (8.6.12/8.6.6) id JAA22349 for questions-outgoing; Fri, 15 Sep 1995 09:59:27 -0700
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.6.12/8.6.6) with ESMTP id JAA22327 for ; Fri, 15 Sep 1995 09:59:13 -0700
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id JAA01332; Fri, 15 Sep 1995 09:57:15 -0700
From: Terry Lambert
Message-Id: <199509151657.JAA01332@phaeton.artisoft.com>
Subject: Re: Discrepance between df/du/tar!
To: wollman@lcs.mit.edu (Garrett A. Wollman)
Date: Fri, 15 Sep 1995 09:57:14 -0700 (MST)
Cc: terry@lambert.org, lars.koeller@odie.physik2.uni-rostock.de, freebsd-questions@freefall.freebsd.org
In-Reply-To: <9509151613.AA06621@halloran-eldar.lcs.mit.edu> from "Garrett A. Wollman" at Sep 15, 95 12:13:51 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 2757
Sender: owner-questions@FreeBSD.org
Precedence: bulk

> > Most likely you have several sparse file on your box, including your
> > password databases and mail aliases.
>
> db(3) does not create sparse files, and in any case this does not
> account for the discrepancy he is seeing.  Here is the CORRECT answer:
>
> `du' figures out disk usage by walking the directory tree and
> examining every file listed.  However, it is possible for a file to
> exist, but not be listed in any directory (because it was unlinked
> after it was opened and is still open).  This explains the difference
> between what `du' was able to determine by examining the directory
> structure and what `df' found out by asking the kernel how much space
> is free.
>
> `tar' pads the end of each file in the archive with null characters to
> make it a multiple of the block size you specified, or 10k if you
> didn't specify one.
> So, if we assume that the residual (i.e., size
> mod blocksize) is uniformly distributed, then you can expect a `tar'
> archive of N files to contain N * blocksize / 2 bytes of padding.  (In
> reality, the distribution is not uniform, so this estimate is not
> accurate, but it gives you an idea.)  And, of course, `tar' has to
> output directory information about the files it writes.  This explains
> the discrepancy between `du' and what `tar' wrote.

The tar padding is correct.  It will perturb the tar block count.

However, I find unlinked temp files less likely than sparse files.  You
are saying that you can guarantee that none of his phantom blocks are
the result of sparse files?  An amazing remote diagnosis, but one which,
if true, would allow you to CORRECT me in such certain terms.

BTW: A file which is unlinked does not have a directory entry.  It will
not be seen by du.  But a file which is a hard link will be seen *twice*
by du and once by df.

The 'df' output has an internal discrepancy between total blocks and
blocks available + blocks used, which accounts for the inodes.  The du
command does not account for inodes.  Neither does the tar block count.

The 'du' command reports the block count for '..' for each directory
in the tree traversed:

	% cd /tmp
	% mkdir foo
	% cd foo
	% du
	2	.

This will inflate the block count by n times the block size of the
current directory for each subdirectory.  Similarly, it will count the
current directory once in the parent and once in itself.

Arguably this is a bug in 'du', and 'du' should be modified to ignore
'.' and '..' (but count '.' for the root of the tree being traversed)
to get an accurate block count.

All of this is more information than the poster was probably prepared
to deal with.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
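
For anyone who wants to play with the padding arithmetic from the quoted
paragraph, here is a rough sketch in Python.  It takes the quoted
description at face value: each file is padded independently up to a
multiple of the block size.  The function names and the sample file sizes
are made up for illustration, and the block size is left as a parameter;
512 (the tar record size) and 10240 (the 10k default mentioned above) are
simply the two values tried here.

```python
# Sketch only: compares the exact per-file padding with the
# N * blocksize / 2 rule of thumb from the quoted text.
# All names and sample sizes below are hypothetical.

def padding_for(size, blocksize):
    """Null bytes needed to round `size` up to a multiple of `blocksize`."""
    return -size % blocksize


def actual_padding(sizes, blocksize):
    """Total padding if every file is padded to `blocksize` on its own."""
    return sum(padding_for(s, blocksize) for s in sizes)


def expected_padding(n_files, blocksize):
    """The N * blocksize / 2 estimate, which assumes that the residual
    (size mod blocksize) is uniformly distributed."""
    return n_files * blocksize / 2


if __name__ == "__main__":
    sizes = [0, 1, 300, 512, 5000, 10240, 123456]  # made-up file sizes
    for blocksize in (512, 10240):
        print("blocksize", blocksize,
              "actual", actual_padding(sizes, blocksize),
              "estimate", expected_padding(len(sizes), blocksize))
```

With only a handful of files the two numbers can differ a lot, which is
the point of the parenthetical in the quoted text: the estimate only gives
you an idea.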