Date:      Mon, 26 Aug 2002 16:32:35 -0700
From:      Terry Lambert <tlambert2@mindspring.com>
To:        Chris Ptacek <cptacek@sitaranetworks.com>
Cc:        'David Schultz' <dschultz@uclink.Berkeley.EDU>, Giorgos Keramidas <keramida@FreeBSD.ORG>, Carlos Carnero <zopewiz@yahoo.com>, freebsd-questions@FreeBSD.ORG, freebsd-fs@FreeBSD.ORG
Subject:   Re: optimization changed from TIME to SPACE ?!
Message-ID:  <3D6ABA93.704F54A0@mindspring.com>
References:  <31269226357BD211979E00A0C9866DAB02BB998B@rios.sitaranetworks.com>

Chris Ptacek wrote:
> I had a few questions...

You would do well to read the FFS design paper.  It is available
at: http://citeseer.nj.nec.com/mckusick84fast.html in any format
you could probably want (PDF, etc.).


> What actually causes the fragmentation to occur?

There are two types of fragmentation: external fragmentation,
which can be a performance issue, and internal fragmentation,
which is based on the smallest permissible allocation unit used
in the filesystem.

In FFS, there are only two causes of external fragmentation:

1)	Putting more data on a disk than the free reserve would
	permit (can only be done by root, and you will see it
	complain and state "changed optimization from time to
	space").

2)	Filling up a disk, and then using "growfs", without also
	doing a backup/restore to spread the pre-existing data
	out over the whole space, instead of just the front of
	the disk.
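
For what it's worth, the backup/restore cycle in (2) could look
roughly like the sketch below; the device, mount point, and dump
file names are placeholders (and the dump file has to live on a
different filesystem), so adjust to taste:

	# as root, after growfs has already grown the filesystem
	dump -0af /var/tmp/cache.dump /cache
	rm -rf /cache/*
	cd /cache && restore -rf /var/tmp/cache.dump
	# restore re-allocates the files, laying them out over the
	# whole of the (now larger) disk instead of just the front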

Internal fragmentation occurs based on the "frag size"; this is
normally 1/8 of the file system block size.  The current defaults
for block size and frag size, respectively, are 16K and 2K.  This
means that the smallest amount of space you can allocate is 2K,
and that any file will, on average, allocate 2K/2 (or 1K) of space
that is not usable for other files.
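
As a purely illustrative example (the 5000 byte file size below is
made up, not anything from the original question):

	a 5000 byte file needs roundup(5000 / 2048) = 3 frags
	3 * 2048 = 6144 bytes actually allocated on disk
	6144 - 5000 = 1144 bytes of the last frag are wasted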

All filesystems have internal fragmentation; historically, FFS used
a 4K/512b sizing, rather than a 16K/2K sizing.  Since a physical
block is 512b, and that is the smallest addressable unit on a disk
device, that is as close to optimal as it is possible to get on
internal fragmentation.  At this "frag size", the average "wasted space"
per file is 256b.  For large FSs, and for modern data cache sizes --
on disks, controllers, and in system memory itself -- the larger size
is generally more efficient.  Also, it is necessary for incredibly
large disks to use a larger filesystem block size in order to be
able to span all that space.

You can select the filesystem block size/frag size ratio to be 1:1,
1:2, 1:4, or 1:8 at the time you newfs the disk.  These are the
only permissible values, and 1:8 is almost always the correct one.
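
For instance, the current defaults could be requested explicitly at
newfs time like this (the device name is only a placeholder):

	newfs -b 16384 -f 2048 /dev/da0s1e	# 16K blocks, 2K frags, 1:8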


> I have tried just copying a small file over and
> over and this results in no fragmentation.  This
> leads me to believe that the fragmentation is a
> result of simultaneous open files or at least
> different file sizes.

No.  It is a result of discontiguous small free areas; a small
file being copied would almost never demonstrate any external
fragmentation.  The best possible demonstration is a bunch of
allocations in two sizes, large and small, together spanning
2/3 + 1 of a cluster of 9 disk blocks, then deleting the small
ones and repeating the process with only large ones, as root,
until the disk is physically full.
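
A rough sketch of that kind of exercise is below; the sizes and file
names are purely illustrative, and it will deliberately fill the
partition, so only try it on a scratch filesystem:

	i=0
	# alternate large and small files until the write fails
	while dd if=/dev/zero of=big.$i bs=14k count=1 2>/dev/null &&
	      dd if=/dev/zero of=small.$i bs=2k count=1 2>/dev/null
	do
		i=$((i + 1))
	done
	rm small.*	# leaves small holes scattered over the disk
	# now, as root, keep writing large files into those holes
	# until the disk is physically full; the allocator is forced
	# to chop them up into the discontiguous free areas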


> Also it seems that when we switch to SPACE
> optimization is based on the % fragmentation based
> on the minfree setting.  Can I change the minfree
> for the filesystem (I have a dedicated cache
> partition) to like 27% (8 is default) so that I
> am much less likely to hit the SPACE case?  My
> question is other than reserving 27% of my disk
> space, will this cause any other problems or
> performance decreases?

15% is the best number.  A Perfect Hash does not suffer any
collisions until an 85% fill has been achieved.  Anything
more than 15% is a waste; anything less will exponentially
degrade performance.  The current default value of 8% is a
compromise (it used to be 10% by default) for people who
believe that the reserved space is "wasted" because they do
not understand statistics or hashing.  It was picked because
it's the largest number that "seems to be a small percentage".
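
If you do decide to change it, tunefs(8) will adjust minfree on an
existing filesystem (the device name is a placeholder; ideally do
this with the filesystem unmounted), or you can pass -m to newfs
when you create it:

	tunefs -m 15 /dev/da0s1e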

The switch to space optimization occurs at 5%.  If you are
getting to the point that space optimization is occurring, it
means you are using 3% of your free reserve (with the default
8% minfree, 8% - 5% = 3%).

Since only root can use the free reserve ("for emergencies"),
that means that whatever you are doing, you are doing as root.

Thus the easiest way to avoid switching to space optimization
is to not run your programs as root.  For programs which *must*
run as root (not even logging "must" run as root, once the
reserved port has been obtained), don't share the FS between
the root program and user programs.

-- Terry
