Date:      Fri, 15 Sep 1995 09:57:14 -0700 (MST)
From:      Terry Lambert <>
To: (Garrett A. Wollman)
Subject:   Re: Discrepance between df/du/tar!
Message-ID:  <>
In-Reply-To: <> from "Garrett A. Wollman" at Sep 15, 95 12:13:51 pm

> > Most likely you have several sparse files on your box, including your
> > password databases and mail aliases.
> db(3) does not create sparse files, and in any case this does not
> account for the discrepancy he is seeing.  Here is the CORRECT answer:
> `du' figures out disk usage by walking the directory tree and
> examining every file listed.  However, it is possible for a file to
> exist, but not be listed in any directory (because it was unlinked
> after it was opened and is still open).  This explains the difference
> between what `du' was able to determine by examining the directory
> structure and what `df' found out by asking the kernel how much space
> is free.
> `tar' pads the end of each file in the archive with null characters to
> make it a multiple of the block size you specified, or 10k if you
> didn't specify one.  So, if we assume that the residual (i.e., size
> mod blocksize) is uniformly distributed, then you can expect a `tar'
> archive of N files to contain N * blocksize / 2 bytes of padding.  (In
> reality, the distribution is not uniform, so this estimate is not
> accurate, but it gives you an idea.)  And, of course, `tar' has to
> output directory information about the files it writes.  This explains
> the discrepancy between `du' and what `tar' wrote.

The tar padding is correct.  It will perturb the tar block count.
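The padding arithmetic in the quoted text can be checked with a few lines of Python.  This is a minimal sketch that assumes the quoted model exactly: each file is padded with nulls to a multiple of one block size, 10k by default.  Header records are ignored.  The function names are hypothetical, and none of this is taken from any tar implementation.

```python
from typing import Iterable

# Model from the quoted text: each file is padded with nulls up to a
# multiple of the block size, which defaults to 10k.  Header records and
# the end-of-archive marker are deliberately ignored in this sketch.
DEFAULT_BLOCKSIZE = 10 * 1024


def padding_bytes(size: int, blocksize: int = DEFAULT_BLOCKSIZE) -> int:
    """Null bytes needed to round `size` up to a multiple of `blocksize`."""
    return -size % blocksize


def total_padding(sizes: Iterable[int], blocksize: int = DEFAULT_BLOCKSIZE) -> int:
    """Exact padding for a set of file sizes under this model."""
    return sum(padding_bytes(s, blocksize) for s in sizes)


def estimated_padding(n_files: int, blocksize: int = DEFAULT_BLOCKSIZE) -> float:
    """The rough N * blocksize / 2 estimate, assuming uniform residuals."""
    return n_files * blocksize / 2


if __name__ == "__main__":
    sizes = [1, 5000, 10240]
    print(total_padding(sizes))           # 10239 + 5240 + 0 = 15479
    print(estimated_padding(len(sizes)))  # 15360.0
```

A file that is already a whole number of blocks gets no padding.  A 1-byte file costs almost a full block, which is why the estimate is only a rough one.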

However, I find unlinked temp files less likely than sparse files.

You are saying that you can guarantee that none of his phantom blocks
are the result of sparse files?
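A sparse file is one where the apparent length (what tar reads, holes and all) differs from the blocks actually allocated (what du and df count).  The sketch below shows that difference.  It assumes a Unix-like system where `os.stat` provides `st_blocks` in 512-byte units, as on BSD and Linux.  On a filesystem that supports holes, the allocated figure is typically far smaller than the apparent size.  The helper names are hypothetical.

```python
import os
import tempfile

ONE_MIB = 1024 * 1024


def make_sparse_file(path: str, hole_bytes: int) -> None:
    """Write a file consisting of a `hole_bytes` hole followed by one byte."""
    with open(path, "wb") as f:
        f.seek(hole_bytes)  # move past end-of-file without writing anything
        f.write(b"x")       # only this last byte forces a data block


def apparent_and_allocated(path: str) -> tuple:
    """Return (apparent size, allocated bytes) for `path`.

    st_blocks is reported in 512-byte units on BSD and Linux; it is not
    available on every platform (e.g. Windows).
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sparse")
        make_sparse_file(path, ONE_MIB)
        size, allocated = apparent_and_allocated(path)
        with open(path, "rb") as f:
            head = f.read(16)  # reading the hole yields zeros, as tar would
        return size, allocated, head


if __name__ == "__main__":
    size, allocated, head = demo()
    print("apparent:", size, "allocated:", allocated, "head:", head)
```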

An amazing remote diagnosis, but one which, if true, would allow you to
CORRECT me in such certain terms.

BTW: A file which is unlinked does not have a directory entry.  It will
not be seen by du.  But a file which is a hard link will be seen *twice*
by du and once by df.
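Both cases in the paragraph above can be reproduced with a small experiment.  The sketch contrasts two tree walks:

* a hypothetical naive walk that sums `st_blocks` for every file name it finds, so a hard link is counted once per name;
* a walk that remembers `(st_dev, st_ino)` pairs and counts each inode once.

It then unlinks a file that is still open.  The file's link count drops to 0 and neither walk can reach it, although its data still occupies space until the descriptor is closed.  This assumes a POSIX system with hard links.  It illustrates the mechanism and does not claim to reproduce any particular du.

```python
import os
import tempfile


def naive_blocks(root: str) -> int:
    """Sum st_blocks over every file name found by walking the tree, with
    no memory of which inodes have already been counted."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.lstat(os.path.join(dirpath, name)).st_blocks
    return total


def inode_aware_blocks(root: str) -> int:
    """The same walk, but each (device, inode) pair is counted only once."""
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                total += st.st_blocks
    return total


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as d:
        # Two names for one inode: "a" and its hard link "b".
        a = os.path.join(d, "a")
        with open(a, "wb") as f:
            f.write(b"x" * 65536)
        os.link(a, os.path.join(d, "b"))
        naive, aware = naive_blocks(d), inode_aware_blocks(d)

        # A file that is still open but has been unlinked: its data still
        # occupies disk space, yet no directory entry leads to it.
        c = os.path.join(d, "c")
        fd = os.open(c, os.O_CREAT | os.O_WRONLY, 0o600)
        try:
            os.write(fd, b"y" * 65536)
            os.unlink(c)
            nlink = os.fstat(fd).st_nlink
            listed = "c" in os.listdir(d)
            walk_after = naive_blocks(d)  # the walk cannot see "c" at all
        finally:
            os.close(fd)
        return naive, aware, nlink, listed, walk_after


if __name__ == "__main__":
    print(demo())
```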

The 'df' output has an internal discrepancy between total blocks and
blocks available + blocks used which accounts for the inodes.  The du
command does not account for inodes.  Neither does the tar block count.

The 'du' command reports the block count for '..' for each directory in
the tree traversed:

% cd /tmp
% mkdir foo
% cd foo
% du
2	.

For a directory with n subdirectories, this will inflate the block count
by n times the block size of that directory, once for each subdirectory's
'..'.  Similarly, it will count the current directory once in the parent
and once in itself.

Arguably this is a bug in 'du', and 'du' should be modified to ignore '.'
and '..' (but count '.' for the root of the tree being traversed) to get
an accurate block count.
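A toy model may make the overcount and the proposed fix concrete.  It assumes a hypothetical walker that, in every directory, counts:

* '.' (the directory itself);
* '..' (skipped for the traversal root, whose parent lies outside the tree);
* every file;
* every subdirectory entry.

The `Dir` class and both functions are illustrative only.  They are not code from any real du.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dir:
    """Hypothetical in-memory directory: its own size in blocks, the total
    blocks of the plain files it holds, and its subdirectories."""
    blocks: int
    file_blocks: int = 0
    children: list[Dir] = field(default_factory=list)


def naive_count(d: Dir, parent_blocks: int | None = None) -> int:
    """Count every entry: '.' (the directory itself), '..' (the parent,
    again), the files, and each child's entry here in addition to the
    child's own '.'.  The root is called with no parent."""
    total = d.blocks + d.file_blocks          # '.' plus the files
    if parent_blocks is not None:
        total += parent_blocks                # '..' re-counts the parent
    for child in d.children:
        total += child.blocks                 # this directory's entry for child
        total += naive_count(child, d.blocks)
    return total


def fixed_count(d: Dir) -> int:
    """Ignore '.' and '..' everywhere except '.' of the root, so that each
    directory is counted exactly once."""
    return d.blocks + d.file_blocks + sum(fixed_count(c) for c in d.children)


if __name__ == "__main__":
    empty_foo = Dir(blocks=2)  # like the empty /tmp/foo above
    tree = Dir(2, children=[Dir(2, 4), Dir(2, children=[Dir(2, 1)])])
    print(naive_count(empty_foo), fixed_count(empty_foo))  # 2 2
    print(naive_count(tree), fixed_count(tree))            # 25 13
```

In this model each subdirectory adds its own size plus its parent's size to the naive total.  The three 2-block subdirectories in `tree` therefore account for the 12-block difference.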

All of this being more information than the poster was probably prepared
to deal with.

					Terry Lambert
Any opinions in this posting are my own and not those of my present
or previous employers.
